http://git-wip-us.apache.org/repos/asf/spark-website/blob/d2bcf185/site/docs/2.1.0/sql-programming-guide.html
----------------------------------------------------------------------
diff --git a/site/docs/2.1.0/sql-programming-guide.html 
b/site/docs/2.1.0/sql-programming-guide.html
index 17f5981..4534a98 100644
--- a/site/docs/2.1.0/sql-programming-guide.html
+++ b/site/docs/2.1.0/sql-programming-guide.html
@@ -127,95 +127,95 @@
                     
 
                     <ul id="markdown-toc">
-  <li><a href="#overview" id="markdown-toc-overview">Overview</a>    <ul>
-      <li><a href="#sql" id="markdown-toc-sql">SQL</a></li>
-      <li><a href="#datasets-and-dataframes" 
id="markdown-toc-datasets-and-dataframes">Datasets and DataFrames</a></li>
+  <li><a href="#overview">Overview</a>    <ul>
+      <li><a href="#sql">SQL</a></li>
+      <li><a href="#datasets-and-dataframes">Datasets and DataFrames</a></li>
     </ul>
   </li>
-  <li><a href="#getting-started" id="markdown-toc-getting-started">Getting 
Started</a>    <ul>
-      <li><a href="#starting-point-sparksession" 
id="markdown-toc-starting-point-sparksession">Starting Point: 
SparkSession</a></li>
-      <li><a href="#creating-dataframes" 
id="markdown-toc-creating-dataframes">Creating DataFrames</a></li>
-      <li><a href="#untyped-dataset-operations-aka-dataframe-operations" 
id="markdown-toc-untyped-dataset-operations-aka-dataframe-operations">Untyped 
Dataset Operations (aka DataFrame Operations)</a></li>
-      <li><a href="#running-sql-queries-programmatically" 
id="markdown-toc-running-sql-queries-programmatically">Running SQL Queries 
Programmatically</a></li>
-      <li><a href="#global-temporary-view" 
id="markdown-toc-global-temporary-view">Global Temporary View</a></li>
-      <li><a href="#creating-datasets" 
id="markdown-toc-creating-datasets">Creating Datasets</a></li>
-      <li><a href="#interoperating-with-rdds" 
id="markdown-toc-interoperating-with-rdds">Interoperating with RDDs</a>        
<ul>
-          <li><a href="#inferring-the-schema-using-reflection" 
id="markdown-toc-inferring-the-schema-using-reflection">Inferring the Schema 
Using Reflection</a></li>
-          <li><a href="#programmatically-specifying-the-schema" 
id="markdown-toc-programmatically-specifying-the-schema">Programmatically 
Specifying the Schema</a></li>
+  <li><a href="#getting-started">Getting Started</a>    <ul>
+      <li><a href="#starting-point-sparksession">Starting Point: 
SparkSession</a></li>
+      <li><a href="#creating-dataframes">Creating DataFrames</a></li>
+      <li><a 
href="#untyped-dataset-operations-aka-dataframe-operations">Untyped Dataset 
Operations (aka DataFrame Operations)</a></li>
+      <li><a href="#running-sql-queries-programmatically">Running SQL Queries 
Programmatically</a></li>
+      <li><a href="#global-temporary-view">Global Temporary View</a></li>
+      <li><a href="#creating-datasets">Creating Datasets</a></li>
+      <li><a href="#interoperating-with-rdds">Interoperating with RDDs</a>     
   <ul>
+          <li><a href="#inferring-the-schema-using-reflection">Inferring the 
Schema Using Reflection</a></li>
+          <li><a 
href="#programmatically-specifying-the-schema">Programmatically Specifying the 
Schema</a></li>
         </ul>
       </li>
     </ul>
   </li>
-  <li><a href="#data-sources" id="markdown-toc-data-sources">Data Sources</a>  
  <ul>
-      <li><a href="#generic-loadsave-functions" 
id="markdown-toc-generic-loadsave-functions">Generic Load/Save Functions</a>    
    <ul>
-          <li><a href="#manually-specifying-options" 
id="markdown-toc-manually-specifying-options">Manually Specifying 
Options</a></li>
-          <li><a href="#run-sql-on-files-directly" 
id="markdown-toc-run-sql-on-files-directly">Run SQL on files directly</a></li>
-          <li><a href="#save-modes" id="markdown-toc-save-modes">Save 
Modes</a></li>
-          <li><a href="#saving-to-persistent-tables" 
id="markdown-toc-saving-to-persistent-tables">Saving to Persistent 
Tables</a></li>
+  <li><a href="#data-sources">Data Sources</a>    <ul>
+      <li><a href="#generic-loadsave-functions">Generic Load/Save 
Functions</a>        <ul>
+          <li><a href="#manually-specifying-options">Manually Specifying 
Options</a></li>
+          <li><a href="#run-sql-on-files-directly">Run SQL on files 
directly</a></li>
+          <li><a href="#save-modes">Save Modes</a></li>
+          <li><a href="#saving-to-persistent-tables">Saving to Persistent 
Tables</a></li>
         </ul>
       </li>
-      <li><a href="#parquet-files" id="markdown-toc-parquet-files">Parquet 
Files</a>        <ul>
-          <li><a href="#loading-data-programmatically" 
id="markdown-toc-loading-data-programmatically">Loading Data 
Programmatically</a></li>
-          <li><a href="#partition-discovery" 
id="markdown-toc-partition-discovery">Partition Discovery</a></li>
-          <li><a href="#schema-merging" 
id="markdown-toc-schema-merging">Schema Merging</a></li>
-          <li><a href="#hive-metastore-parquet-table-conversion" 
id="markdown-toc-hive-metastore-parquet-table-conversion">Hive metastore 
Parquet table conversion</a>            <ul>
-              <li><a href="#hiveparquet-schema-reconciliation" 
id="markdown-toc-hiveparquet-schema-reconciliation">Hive/Parquet Schema 
Reconciliation</a></li>
-              <li><a href="#metadata-refreshing" 
id="markdown-toc-metadata-refreshing">Metadata Refreshing</a></li>
+      <li><a href="#parquet-files">Parquet Files</a>        <ul>
+          <li><a href="#loading-data-programmatically">Loading Data 
Programmatically</a></li>
+          <li><a href="#partition-discovery">Partition Discovery</a></li>
+          <li><a href="#schema-merging">Schema Merging</a></li>
+          <li><a href="#hive-metastore-parquet-table-conversion">Hive 
metastore Parquet table conversion</a>            <ul>
+              <li><a href="#hiveparquet-schema-reconciliation">Hive/Parquet 
Schema Reconciliation</a></li>
+              <li><a href="#metadata-refreshing">Metadata Refreshing</a></li>
             </ul>
           </li>
-          <li><a href="#configuration" 
id="markdown-toc-configuration">Configuration</a></li>
+          <li><a href="#configuration">Configuration</a></li>
         </ul>
       </li>
-      <li><a href="#json-datasets" id="markdown-toc-json-datasets">JSON 
Datasets</a></li>
-      <li><a href="#hive-tables" id="markdown-toc-hive-tables">Hive Tables</a> 
       <ul>
-          <li><a href="#interacting-with-different-versions-of-hive-metastore" 
id="markdown-toc-interacting-with-different-versions-of-hive-metastore">Interacting
 with Different Versions of Hive Metastore</a></li>
+      <li><a href="#json-datasets">JSON Datasets</a></li>
+      <li><a href="#hive-tables">Hive Tables</a>        <ul>
+          <li><a 
href="#interacting-with-different-versions-of-hive-metastore">Interacting with 
Different Versions of Hive Metastore</a></li>
         </ul>
       </li>
-      <li><a href="#jdbc-to-other-databases" 
id="markdown-toc-jdbc-to-other-databases">JDBC To Other Databases</a></li>
-      <li><a href="#troubleshooting" 
id="markdown-toc-troubleshooting">Troubleshooting</a></li>
+      <li><a href="#jdbc-to-other-databases">JDBC To Other Databases</a></li>
+      <li><a href="#troubleshooting">Troubleshooting</a></li>
     </ul>
   </li>
-  <li><a href="#performance-tuning" 
id="markdown-toc-performance-tuning">Performance Tuning</a>    <ul>
-      <li><a href="#caching-data-in-memory" 
id="markdown-toc-caching-data-in-memory">Caching Data In Memory</a></li>
-      <li><a href="#other-configuration-options" 
id="markdown-toc-other-configuration-options">Other Configuration 
Options</a></li>
+  <li><a href="#performance-tuning">Performance Tuning</a>    <ul>
+      <li><a href="#caching-data-in-memory">Caching Data In Memory</a></li>
+      <li><a href="#other-configuration-options">Other Configuration 
Options</a></li>
     </ul>
   </li>
-  <li><a href="#distributed-sql-engine" 
id="markdown-toc-distributed-sql-engine">Distributed SQL Engine</a>    <ul>
-      <li><a href="#running-the-thrift-jdbcodbc-server" 
id="markdown-toc-running-the-thrift-jdbcodbc-server">Running the Thrift 
JDBC/ODBC server</a></li>
-      <li><a href="#running-the-spark-sql-cli" 
id="markdown-toc-running-the-spark-sql-cli">Running the Spark SQL CLI</a></li>
+  <li><a href="#distributed-sql-engine">Distributed SQL Engine</a>    <ul>
+      <li><a href="#running-the-thrift-jdbcodbc-server">Running the Thrift 
JDBC/ODBC server</a></li>
+      <li><a href="#running-the-spark-sql-cli">Running the Spark SQL 
CLI</a></li>
     </ul>
   </li>
-  <li><a href="#migration-guide" id="markdown-toc-migration-guide">Migration 
Guide</a>    <ul>
-      <li><a href="#upgrading-from-spark-sql-20-to-21" 
id="markdown-toc-upgrading-from-spark-sql-20-to-21">Upgrading From Spark SQL 
2.0 to 2.1</a></li>
-      <li><a href="#upgrading-from-spark-sql-16-to-20" 
id="markdown-toc-upgrading-from-spark-sql-16-to-20">Upgrading From Spark SQL 
1.6 to 2.0</a></li>
-      <li><a href="#upgrading-from-spark-sql-15-to-16" 
id="markdown-toc-upgrading-from-spark-sql-15-to-16">Upgrading From Spark SQL 
1.5 to 1.6</a></li>
-      <li><a href="#upgrading-from-spark-sql-14-to-15" 
id="markdown-toc-upgrading-from-spark-sql-14-to-15">Upgrading From Spark SQL 
1.4 to 1.5</a></li>
-      <li><a href="#upgrading-from-spark-sql-13-to-14" 
id="markdown-toc-upgrading-from-spark-sql-13-to-14">Upgrading from Spark SQL 
1.3 to 1.4</a>        <ul>
-          <li><a href="#dataframe-data-readerwriter-interface" 
id="markdown-toc-dataframe-data-readerwriter-interface">DataFrame data 
reader/writer interface</a></li>
-          <li><a href="#dataframegroupby-retains-grouping-columns" 
id="markdown-toc-dataframegroupby-retains-grouping-columns">DataFrame.groupBy 
retains grouping columns</a></li>
-          <li><a href="#behavior-change-on-dataframewithcolumn" 
id="markdown-toc-behavior-change-on-dataframewithcolumn">Behavior change on 
DataFrame.withColumn</a></li>
+  <li><a href="#migration-guide">Migration Guide</a>    <ul>
+      <li><a href="#upgrading-from-spark-sql-20-to-21">Upgrading From Spark 
SQL 2.0 to 2.1</a></li>
+      <li><a href="#upgrading-from-spark-sql-16-to-20">Upgrading From Spark 
SQL 1.6 to 2.0</a></li>
+      <li><a href="#upgrading-from-spark-sql-15-to-16">Upgrading From Spark 
SQL 1.5 to 1.6</a></li>
+      <li><a href="#upgrading-from-spark-sql-14-to-15">Upgrading From Spark 
SQL 1.4 to 1.5</a></li>
+      <li><a href="#upgrading-from-spark-sql-13-to-14">Upgrading from Spark 
SQL 1.3 to 1.4</a>        <ul>
+          <li><a href="#dataframe-data-readerwriter-interface">DataFrame data 
reader/writer interface</a></li>
+          <li><a 
href="#dataframegroupby-retains-grouping-columns">DataFrame.groupBy retains 
grouping columns</a></li>
+          <li><a href="#behavior-change-on-dataframewithcolumn">Behavior 
change on DataFrame.withColumn</a></li>
         </ul>
       </li>
-      <li><a href="#upgrading-from-spark-sql-10-12-to-13" 
id="markdown-toc-upgrading-from-spark-sql-10-12-to-13">Upgrading from Spark SQL 
1.0-1.2 to 1.3</a>        <ul>
-          <li><a href="#rename-of-schemardd-to-dataframe" 
id="markdown-toc-rename-of-schemardd-to-dataframe">Rename of SchemaRDD to 
DataFrame</a></li>
-          <li><a href="#unification-of-the-java-and-scala-apis" 
id="markdown-toc-unification-of-the-java-and-scala-apis">Unification of the 
Java and Scala APIs</a></li>
-          <li><a 
href="#isolation-of-implicit-conversions-and-removal-of-dsl-package-scala-only" 
id="markdown-toc-isolation-of-implicit-conversions-and-removal-of-dsl-package-scala-only">Isolation
 of Implicit Conversions and Removal of dsl Package (Scala-only)</a></li>
-          <li><a 
href="#removal-of-the-type-aliases-in-orgapachesparksql-for-datatype-scala-only"
 
id="markdown-toc-removal-of-the-type-aliases-in-orgapachesparksql-for-datatype-scala-only">Removal
 of the type aliases in org.apache.spark.sql for DataType (Scala-only)</a></li>
-          <li><a href="#udf-registration-moved-to-sqlcontextudf-java--scala" 
id="markdown-toc-udf-registration-moved-to-sqlcontextudf-java--scala">UDF 
Registration Moved to <code>sqlContext.udf</code> (Java &amp; Scala)</a></li>
-          <li><a href="#python-datatypes-no-longer-singletons" 
id="markdown-toc-python-datatypes-no-longer-singletons">Python DataTypes No 
Longer Singletons</a></li>
+      <li><a href="#upgrading-from-spark-sql-10-12-to-13">Upgrading from Spark 
SQL 1.0-1.2 to 1.3</a>        <ul>
+          <li><a href="#rename-of-schemardd-to-dataframe">Rename of SchemaRDD 
to DataFrame</a></li>
+          <li><a href="#unification-of-the-java-and-scala-apis">Unification of 
the Java and Scala APIs</a></li>
+          <li><a 
href="#isolation-of-implicit-conversions-and-removal-of-dsl-package-scala-only">Isolation
 of Implicit Conversions and Removal of dsl Package (Scala-only)</a></li>
+          <li><a 
href="#removal-of-the-type-aliases-in-orgapachesparksql-for-datatype-scala-only">Removal
 of the type aliases in org.apache.spark.sql for DataType (Scala-only)</a></li>
+          <li><a 
href="#udf-registration-moved-to-sqlcontextudf-java--scala">UDF Registration 
Moved to <code>sqlContext.udf</code> (Java &amp; Scala)</a></li>
+          <li><a href="#python-datatypes-no-longer-singletons">Python 
DataTypes No Longer Singletons</a></li>
         </ul>
       </li>
-      <li><a href="#compatibility-with-apache-hive" 
id="markdown-toc-compatibility-with-apache-hive">Compatibility with Apache 
Hive</a>        <ul>
-          <li><a href="#deploying-in-existing-hive-warehouses" 
id="markdown-toc-deploying-in-existing-hive-warehouses">Deploying in Existing 
Hive Warehouses</a></li>
-          <li><a href="#supported-hive-features" 
id="markdown-toc-supported-hive-features">Supported Hive Features</a></li>
-          <li><a href="#unsupported-hive-functionality" 
id="markdown-toc-unsupported-hive-functionality">Unsupported Hive 
Functionality</a></li>
+      <li><a href="#compatibility-with-apache-hive">Compatibility with Apache 
Hive</a>        <ul>
+          <li><a href="#deploying-in-existing-hive-warehouses">Deploying in 
Existing Hive Warehouses</a></li>
+          <li><a href="#supported-hive-features">Supported Hive 
Features</a></li>
+          <li><a href="#unsupported-hive-functionality">Unsupported Hive 
Functionality</a></li>
         </ul>
       </li>
     </ul>
   </li>
-  <li><a href="#reference" id="markdown-toc-reference">Reference</a>    <ul>
-      <li><a href="#data-types" id="markdown-toc-data-types">Data 
Types</a></li>
-      <li><a href="#nan-semantics" id="markdown-toc-nan-semantics">NaN 
Semantics</a></li>
+  <li><a href="#reference">Reference</a>    <ul>
+      <li><a href="#data-types">Data Types</a></li>
+      <li><a href="#nan-semantics">NaN Semantics</a></li>
     </ul>
   </li>
 </ul>
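The TOC hunk above only strips the redundant `id="markdown-toc-…"` attributes from the anchors; every `href` target is unchanged, so in-page navigation still resolves. A minimal stdlib sketch confirming that both variants expose the same link target (the two one-line snippets are taken from the hunk, not the full page):

```python
from html.parser import HTMLParser

class HrefCollector(HTMLParser):
    """Collects href attribute values from <a> tags."""
    def __init__(self):
        super().__init__()
        self.hrefs = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            self.hrefs.extend(v for k, v in attrs if k == "href")

def hrefs(markup):
    parser = HrefCollector()
    parser.feed(markup)
    return parser.hrefs

# Old markup (with id) vs. new markup (id removed), as in the hunk above
old = '<li><a href="#overview" id="markdown-toc-overview">Overview</a></li>'
new = '<li><a href="#overview">Overview</a></li>'

assert hrefs(old) == hrefs(new) == ["#overview"]
print(hrefs(new))  # ['#overview']
```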
@@ -275,7 +275,7 @@ While, in <a 
href="api/java/index.html?org/apache/spark/sql/Dataset.html">Java A
 
     <p>The entry point into all functionality in Spark is the <a 
href="api/scala/index.html#org.apache.spark.sql.SparkSession"><code>SparkSession</code></a>
 class. To create a basic <code>SparkSession</code>, just use 
<code>SparkSession.builder()</code>:</p>
 
-    <div class="highlight"><pre><span class="k">import</span> <span 
class="nn">org.apache.spark.sql.SparkSession</span>
+    <div class="highlight"><pre><span></span><span class="k">import</span> 
<span class="nn">org.apache.spark.sql.SparkSession</span>
 
 <span class="k">val</span> <span class="n">spark</span> <span 
class="k">=</span> <span class="nc">SparkSession</span>
   <span class="o">.</span><span class="n">builder</span><span 
class="o">()</span>
@@ -293,7 +293,7 @@ While, in <a 
href="api/java/index.html?org/apache/spark/sql/Dataset.html">Java A
 
     <p>The entry point into all functionality in Spark is the <a 
href="api/java/index.html#org.apache.spark.sql.SparkSession"><code>SparkSession</code></a>
 class. To create a basic <code>SparkSession</code>, just use 
<code>SparkSession.builder()</code>:</p>
 
-    <div class="highlight"><pre><span class="kn">import</span> <span 
class="nn">org.apache.spark.sql.SparkSession</span><span class="o">;</span>
+    <div class="highlight"><pre><span></span><span class="kn">import</span> 
<span class="nn">org.apache.spark.sql.SparkSession</span><span 
class="o">;</span>
 
 <span class="n">SparkSession</span> <span class="n">spark</span> <span 
class="o">=</span> <span class="n">SparkSession</span>
   <span class="o">.</span><span class="na">builder</span><span 
class="o">()</span>
@@ -308,12 +308,12 @@ While, in <a 
href="api/java/index.html?org/apache/spark/sql/Dataset.html">Java A
 
     <p>The entry point into all functionality in Spark is the <a 
href="api/python/pyspark.sql.html#pyspark.sql.SparkSession"><code>SparkSession</code></a>
 class. To create a basic <code>SparkSession</code>, just use 
<code>SparkSession.builder</code>:</p>
 
-    <div class="highlight"><pre><span class="kn">from</span> <span 
class="nn">pyspark.sql</span> <span class="kn">import</span> <span 
class="n">SparkSession</span>
+    <div class="highlight"><pre><span></span><span class="kn">from</span> 
<span class="nn">pyspark.sql</span> <span class="kn">import</span> <span 
class="n">SparkSession</span>
 
 <span class="n">spark</span> <span class="o">=</span> <span 
class="n">SparkSession</span> \
     <span class="o">.</span><span class="n">builder</span> \
-    <span class="o">.</span><span class="n">appName</span><span 
class="p">(</span><span class="s">&quot;Python Spark SQL basic 
example&quot;</span><span class="p">)</span> \
-    <span class="o">.</span><span class="n">config</span><span 
class="p">(</span><span 
class="s">&quot;spark.some.config.option&quot;</span><span class="p">,</span> 
<span class="s">&quot;some-value&quot;</span><span class="p">)</span> \
+    <span class="o">.</span><span class="n">appName</span><span 
class="p">(</span><span class="s2">&quot;Python Spark SQL basic 
example&quot;</span><span class="p">)</span> \
+    <span class="o">.</span><span class="n">config</span><span 
class="p">(</span><span 
class="s2">&quot;spark.some.config.option&quot;</span><span class="p">,</span> 
<span class="s2">&quot;some-value&quot;</span><span class="p">)</span> \
     <span class="o">.</span><span class="n">getOrCreate</span><span 
class="p">()</span>
 </pre></div>
     <div><small>Find full example code at 
"examples/src/main/python/sql/basic.py" in the Spark repo.</small></div>
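Stripping the `<span>` wrappers and unescaping the entities in the hunk above recovers the plain PySpark snippet being highlighted. A minimal stdlib round-trip, operating on one escaped fragment copied from the markup (not the whole page):

```python
import html
import re

# One highlighted fragment from the hunk above, HTML-escaped
fragment = (
    '<span class="o">.</span><span class="n">appName</span>'
    '<span class="p">(</span><span class="s2">&quot;Python Spark SQL basic '
    'example&quot;</span><span class="p">)</span>'
)

def to_source(markup):
    """Drop the highlighting tags, then decode HTML entities."""
    return html.unescape(re.sub(r"</?span[^>]*>", "", markup))

print(to_source(fragment))  # .appName("Python Spark SQL basic example")
```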
@@ -323,7 +323,7 @@ While, in <a 
href="api/java/index.html?org/apache/spark/sql/Dataset.html">Java A
 
     <p>The entry point into all functionality in Spark is the <a 
href="api/R/sparkR.session.html"><code>SparkSession</code></a> class. To 
initialize a basic <code>SparkSession</code>, just call 
<code>sparkR.session()</code>:</p>
 
-    <div class="highlight"><pre>sparkR.session<span class="p">(</span>appName 
<span class="o">=</span> <span class="s">&quot;R Spark SQL basic 
example&quot;</span><span class="p">,</span> sparkConfig <span 
class="o">=</span> <span class="kt">list</span><span 
class="p">(</span>spark.some.config.option <span class="o">=</span> <span 
class="s">&quot;some-value&quot;</span><span class="p">))</span>
+    <div class="highlight"><pre><span></span>sparkR.session<span 
class="p">(</span>appName <span class="o">=</span> <span class="s">&quot;R 
Spark SQL basic example&quot;</span><span class="p">,</span> sparkConfig <span 
class="o">=</span> <span class="kt">list</span><span 
class="p">(</span>spark.some.config.option <span class="o">=</span> <span 
class="s">&quot;some-value&quot;</span><span class="p">))</span>
 </pre></div>
     <div><small>Find full example code at 
"examples/src/main/r/RSparkSQLExample.R" in the Spark repo.</small></div>
 
@@ -344,7 +344,7 @@ from a Hive table, or from <a href="#data-sources">Spark 
data sources</a>.</p>
 
     <p>As an example, the following creates a DataFrame based on the content 
of a JSON file:</p>
 
-    <div class="highlight"><pre><span class="k">val</span> <span 
class="n">df</span> <span class="k">=</span> <span class="n">spark</span><span 
class="o">.</span><span class="n">read</span><span class="o">.</span><span 
class="n">json</span><span class="o">(</span><span 
class="s">&quot;examples/src/main/resources/people.json&quot;</span><span 
class="o">)</span>
+    <div class="highlight"><pre><span></span><span class="k">val</span> <span 
class="n">df</span> <span class="k">=</span> <span class="n">spark</span><span 
class="o">.</span><span class="n">read</span><span class="o">.</span><span 
class="n">json</span><span class="o">(</span><span 
class="s">&quot;examples/src/main/resources/people.json&quot;</span><span 
class="o">)</span>
 
 <span class="c1">// Displays the content of the DataFrame to stdout</span>
 <span class="n">df</span><span class="o">.</span><span 
class="n">show</span><span class="o">()</span>
@@ -365,7 +365,7 @@ from a Hive table, or from <a href="#data-sources">Spark 
data sources</a>.</p>
 
     <p>As an example, the following creates a DataFrame based on the content 
of a JSON file:</p>
 
-    <div class="highlight"><pre><span class="kn">import</span> <span 
class="nn">org.apache.spark.sql.Dataset</span><span class="o">;</span>
+    <div class="highlight"><pre><span></span><span class="kn">import</span> 
<span class="nn">org.apache.spark.sql.Dataset</span><span class="o">;</span>
 <span class="kn">import</span> <span 
class="nn">org.apache.spark.sql.Row</span><span class="o">;</span>
 
 <span class="n">Dataset</span><span class="o">&lt;</span><span 
class="n">Row</span><span class="o">&gt;</span> <span class="n">df</span> <span 
class="o">=</span> <span class="n">spark</span><span class="o">.</span><span 
class="na">read</span><span class="o">().</span><span 
class="na">json</span><span class="o">(</span><span 
class="s">&quot;examples/src/main/resources/people.json&quot;</span><span 
class="o">);</span>
@@ -389,17 +389,17 @@ from a Hive table, or from <a href="#data-sources">Spark 
data sources</a>.</p>
 
     <p>As an example, the following creates a DataFrame based on the content 
of a JSON file:</p>
 
-    <div class="highlight"><pre><span class="c"># spark is an existing 
SparkSession</span>
-<span class="n">df</span> <span class="o">=</span> <span 
class="n">spark</span><span class="o">.</span><span class="n">read</span><span 
class="o">.</span><span class="n">json</span><span class="p">(</span><span 
class="s">&quot;examples/src/main/resources/people.json&quot;</span><span 
class="p">)</span>
-<span class="c"># Displays the content of the DataFrame to stdout</span>
+    <div class="highlight"><pre><span></span><span class="c1"># spark is an 
existing SparkSession</span>
+<span class="n">df</span> <span class="o">=</span> <span 
class="n">spark</span><span class="o">.</span><span class="n">read</span><span 
class="o">.</span><span class="n">json</span><span class="p">(</span><span 
class="s2">&quot;examples/src/main/resources/people.json&quot;</span><span 
class="p">)</span>
+<span class="c1"># Displays the content of the DataFrame to stdout</span>
 <span class="n">df</span><span class="o">.</span><span 
class="n">show</span><span class="p">()</span>
-<span class="c"># +----+-------+</span>
-<span class="c"># | age|   name|</span>
-<span class="c"># +----+-------+</span>
-<span class="c"># |null|Michael|</span>
-<span class="c"># |  30|   Andy|</span>
-<span class="c"># |  19| Justin|</span>
-<span class="c"># +----+-------+</span>
+<span class="c1"># +----+-------+</span>
+<span class="c1"># | age|   name|</span>
+<span class="c1"># +----+-------+</span>
+<span class="c1"># |null|Michael|</span>
+<span class="c1"># |  30|   Andy|</span>
+<span class="c1"># |  19| Justin|</span>
+<span class="c1"># +----+-------+</span>
 </pre></div>
     <div><small>Find full example code at 
"examples/src/main/python/sql/basic.py" in the Spark repo.</small></div>
   </div>
@@ -410,7 +410,7 @@ from a Hive table, or from <a href="#data-sources">Spark 
data sources</a>.</p>
 
     <p>As an example, the following creates a DataFrame based on the content 
of a JSON file:</p>
 
-    <div class="highlight"><pre>df <span class="o">&lt;-</span> read.json<span 
class="p">(</span><span 
class="s">&quot;examples/src/main/resources/people.json&quot;</span><span 
class="p">)</span>
+    <div class="highlight"><pre><span></span>df <span class="o">&lt;-</span> 
read.json<span class="p">(</span><span 
class="s">&quot;examples/src/main/resources/people.json&quot;</span><span 
class="p">)</span>
 
 <span class="c1"># Displays the content of the DataFrame</span>
 <span class="kp">head</span><span class="p">(</span>df<span class="p">)</span>
@@ -444,7 +444,7 @@ showDF<span class="p">(</span>df<span class="p">)</span>
 
 <div class="codetabs">
 <div data-lang="scala">
-    <div class="highlight"><pre><span class="c1">// This import is needed to 
use the $-notation</span>
+    <div class="highlight"><pre><span></span><span class="c1">// This import 
is needed to use the $-notation</span>
 <span class="k">import</span> <span class="nn">spark.implicits._</span>
 <span class="c1">// Print the schema in a tree format</span>
 <span class="n">df</span><span class="o">.</span><span 
class="n">printSchema</span><span class="o">()</span>
@@ -499,8 +499,8 @@ showDF<span class="p">(</span>df<span class="p">)</span>
 
 <div data-lang="java">
 
-    <div class="highlight"><pre><span class="c1">// col(&quot;...&quot;) is 
preferable to df.col(&quot;...&quot;)</span>
-<span class="kn">import</span> <span class="nn">static</span> <span 
class="n">org</span><span class="o">.</span><span class="na">apache</span><span 
class="o">.</span><span class="na">spark</span><span class="o">.</span><span 
class="na">sql</span><span class="o">.</span><span 
class="na">functions</span><span class="o">.</span><span 
class="na">col</span><span class="o">;</span>
+    <div class="highlight"><pre><span></span><span class="c1">// 
col(&quot;...&quot;) is preferable to df.col(&quot;...&quot;)</span>
+<span class="kn">import static</span> <span 
class="nn">org.apache.spark.sql.functions.col</span><span class="o">;</span>
 
 <span class="c1">// Print the schema in a tree format</span>
 <span class="n">df</span><span class="o">.</span><span 
class="na">printSchema</span><span class="o">();</span>
@@ -560,50 +560,50 @@ interactive data exploration, users are highly encouraged 
to use the
 latter form, which is future proof and won&#8217;t break with column names that
 are also attributes on the DataFrame class.</p>
 
-    <div class="highlight"><pre><span class="c"># spark, df are from the 
previous example</span>
-<span class="c"># Print the schema in a tree format</span>
+    <div class="highlight"><pre><span></span><span class="c1"># spark, df are 
from the previous example</span>
+<span class="c1"># Print the schema in a tree format</span>
 <span class="n">df</span><span class="o">.</span><span 
class="n">printSchema</span><span class="p">()</span>
-<span class="c"># root</span>
-<span class="c"># |-- age: long (nullable = true)</span>
-<span class="c"># |-- name: string (nullable = true)</span>
-
-<span class="c"># Select only the &quot;name&quot; column</span>
-<span class="n">df</span><span class="o">.</span><span 
class="n">select</span><span class="p">(</span><span 
class="s">&quot;name&quot;</span><span class="p">)</span><span 
class="o">.</span><span class="n">show</span><span class="p">()</span>
-<span class="c"># +-------+</span>
-<span class="c"># |   name|</span>
-<span class="c"># +-------+</span>
-<span class="c"># |Michael|</span>
-<span class="c"># |   Andy|</span>
-<span class="c"># | Justin|</span>
-<span class="c"># +-------+</span>
-
-<span class="c"># Select everybody, but increment the age by 1</span>
-<span class="n">df</span><span class="o">.</span><span 
class="n">select</span><span class="p">(</span><span class="n">df</span><span 
class="p">[</span><span class="s">&#39;name&#39;</span><span 
class="p">],</span> <span class="n">df</span><span class="p">[</span><span 
class="s">&#39;age&#39;</span><span class="p">]</span> <span class="o">+</span> 
<span class="mi">1</span><span class="p">)</span><span class="o">.</span><span 
class="n">show</span><span class="p">()</span>
-<span class="c"># +-------+---------+</span>
-<span class="c"># |   name|(age + 1)|</span>
-<span class="c"># +-------+---------+</span>
-<span class="c"># |Michael|     null|</span>
-<span class="c"># |   Andy|       31|</span>
-<span class="c"># | Justin|       20|</span>
-<span class="c"># +-------+---------+</span>
-
-<span class="c"># Select people older than 21</span>
-<span class="n">df</span><span class="o">.</span><span 
class="n">filter</span><span class="p">(</span><span class="n">df</span><span 
class="p">[</span><span class="s">&#39;age&#39;</span><span class="p">]</span> 
<span class="o">&gt;</span> <span class="mi">21</span><span 
class="p">)</span><span class="o">.</span><span class="n">show</span><span 
class="p">()</span>
-<span class="c"># +---+----+</span>
-<span class="c"># |age|name|</span>
-<span class="c"># +---+----+</span>
-<span class="c"># | 30|Andy|</span>
-<span class="c"># +---+----+</span>
-
-<span class="c"># Count people by age</span>
-<span class="n">df</span><span class="o">.</span><span 
class="n">groupBy</span><span class="p">(</span><span 
class="s">&quot;age&quot;</span><span class="p">)</span><span 
class="o">.</span><span class="n">count</span><span class="p">()</span><span 
class="o">.</span><span class="n">show</span><span class="p">()</span>
-<span class="c"># +----+-----+</span>
-<span class="c"># | age|count|</span>
-<span class="c"># +----+-----+</span>
-<span class="c"># |  19|    1|</span>
-<span class="c"># |null|    1|</span>
-<span class="c"># |  30|    1|</span>
-<span class="c"># +----+-----+</span>
+<span class="c1"># root</span>
+<span class="c1"># |-- age: long (nullable = true)</span>
+<span class="c1"># |-- name: string (nullable = true)</span>
+
+<span class="c1"># Select only the &quot;name&quot; column</span>
+<span class="n">df</span><span class="o">.</span><span 
class="n">select</span><span class="p">(</span><span 
class="s2">&quot;name&quot;</span><span class="p">)</span><span 
class="o">.</span><span class="n">show</span><span class="p">()</span>
+<span class="c1"># +-------+</span>
+<span class="c1"># |   name|</span>
+<span class="c1"># +-------+</span>
+<span class="c1"># |Michael|</span>
+<span class="c1"># |   Andy|</span>
+<span class="c1"># | Justin|</span>
+<span class="c1"># +-------+</span>
+
+<span class="c1"># Select everybody, but increment the age by 1</span>
+<span class="n">df</span><span class="o">.</span><span 
class="n">select</span><span class="p">(</span><span class="n">df</span><span 
class="p">[</span><span class="s1">&#39;name&#39;</span><span 
class="p">],</span> <span class="n">df</span><span class="p">[</span><span 
class="s1">&#39;age&#39;</span><span class="p">]</span> <span 
class="o">+</span> <span class="mi">1</span><span class="p">)</span><span 
class="o">.</span><span class="n">show</span><span class="p">()</span>
+<span class="c1"># +-------+---------+</span>
+<span class="c1"># |   name|(age + 1)|</span>
+<span class="c1"># +-------+---------+</span>
+<span class="c1"># |Michael|     null|</span>
+<span class="c1"># |   Andy|       31|</span>
+<span class="c1"># | Justin|       20|</span>
+<span class="c1"># +-------+---------+</span>
+
+<span class="c1"># Select people older than 21</span>
+<span class="n">df</span><span class="o">.</span><span class="n">filter</span><span class="p">(</span><span class="n">df</span><span class="p">[</span><span class="s1">&#39;age&#39;</span><span class="p">]</span> <span class="o">&gt;</span> <span class="mi">21</span><span class="p">)</span><span class="o">.</span><span class="n">show</span><span class="p">()</span>
+<span class="c1"># +---+----+</span>
+<span class="c1"># |age|name|</span>
+<span class="c1"># +---+----+</span>
+<span class="c1"># | 30|Andy|</span>
+<span class="c1"># +---+----+</span>
+
+<span class="c1"># Count people by age</span>
+<span class="n">df</span><span class="o">.</span><span class="n">groupBy</span><span class="p">(</span><span class="s2">&quot;age&quot;</span><span class="p">)</span><span class="o">.</span><span class="n">count</span><span class="p">()</span><span class="o">.</span><span class="n">show</span><span class="p">()</span>
+<span class="c1"># +----+-----+</span>
+<span class="c1"># | age|count|</span>
+<span class="c1"># +----+-----+</span>
+<span class="c1"># |  19|    1|</span>
+<span class="c1"># |null|    1|</span>
+<span class="c1"># |  30|    1|</span>
+<span class="c1"># +----+-----+</span>
 </pre></div>
    <div><small>Find full example code at "examples/src/main/python/sql/basic.py" in the Spark repo.</small></div>
    <p>For a complete list of the types of operations that can be performed on a DataFrame refer to the <a href="api/python/pyspark.sql.html#pyspark.sql.DataFrame">API Documentation</a>.</p>
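The DataFrame operations shown in the Python tab above can be sketched in plain Python over the same sample rows, as a rough illustration of what each call computes. This uses plain lists and dicts, not Spark; note how a `null` (`None`) age propagates through arithmetic and drops out of comparisons:

```python
from collections import Counter

# Same sample rows as people.json; a missing age is None (Spark's null)
people = [
    {"name": "Michael", "age": None},
    {"name": "Andy", "age": 30},
    {"name": "Justin", "age": 19},
]

# df.select("name") -- project a single column
names = [p["name"] for p in people]

# df.select(df['name'], df['age'] + 1) -- null propagates through arithmetic
name_age1 = [(p["name"], None if p["age"] is None else p["age"] + 1)
             for p in people]

# df.filter(df['age'] > 21) -- a null comparison is never true, so Michael drops
over_21 = [p["name"] for p in people if p["age"] is not None and p["age"] > 21]

# df.groupBy("age").count() -- one count per distinct age, including null
counts = Counter(p["age"] for p in people)
```

This mirrors the expected output in the comments above: `names` keeps all three rows, `name_age1` carries `None` through for Michael, and only Andy survives the `age > 21` filter.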
@@ -614,7 +614,7 @@ are also attributes on the DataFrame class.</p>
 
 <div data-lang="r">
 
-    <div class="highlight"><pre><span class="c1"># Create the DataFrame</span>
+    <div class="highlight"><pre><span></span><span class="c1"># Create the DataFrame</span>
 df <span class="o">&lt;-</span> read.json<span class="p">(</span><span class="s">&quot;examples/src/main/resources/people.json&quot;</span><span class="p">)</span>
 
 <span class="c1"># Show the content of the DataFrame</span>
@@ -673,7 +673,7 @@ printSchema<span class="p">(</span>df<span class="p">)</span>
 <div data-lang="scala">
    <p>The <code>sql</code> function on a <code>SparkSession</code> enables applications to run SQL queries programmatically and returns the result as a <code>DataFrame</code>.</p>
 
-    <div class="highlight"><pre><span class="c1">// Register the DataFrame as a SQL temporary view</span>
+    <div class="highlight"><pre><span></span><span class="c1">// Register the DataFrame as a SQL temporary view</span>
 <span class="n">df</span><span class="o">.</span><span class="n">createOrReplaceTempView</span><span class="o">(</span><span class="s">&quot;people&quot;</span><span class="o">)</span>

 <span class="k">val</span> <span class="n">sqlDF</span> <span class="k">=</span> <span class="n">spark</span><span class="o">.</span><span class="n">sql</span><span class="o">(</span><span class="s">&quot;SELECT * FROM people&quot;</span><span class="o">)</span>
@@ -692,7 +692,7 @@ printSchema<span class="p">(</span>df<span class="p">)</span>
 <div data-lang="java">
    <p>The <code>sql</code> function on a <code>SparkSession</code> enables applications to run SQL queries programmatically and returns the result as a <code>Dataset&lt;Row&gt;</code>.</p>

-    <div class="highlight"><pre><span class="kn">import</span> <span class="nn">org.apache.spark.sql.Dataset</span><span class="o">;</span>
+    <div class="highlight"><pre><span></span><span class="kn">import</span> <span class="nn">org.apache.spark.sql.Dataset</span><span class="o">;</span>
 <span class="kn">import</span> <span class="nn">org.apache.spark.sql.Row</span><span class="o">;</span>
 
 <span class="c1">// Register the DataFrame as a SQL temporary view</span>
@@ -714,18 +714,18 @@ printSchema<span class="p">(</span>df<span class="p">)</span>
 <div data-lang="python">
    <p>The <code>sql</code> function on a <code>SparkSession</code> enables applications to run SQL queries programmatically and returns the result as a <code>DataFrame</code>.</p>

-    <div class="highlight"><pre><span class="c"># Register the DataFrame as a SQL temporary view</span>
-<span class="n">df</span><span class="o">.</span><span class="n">createOrReplaceTempView</span><span class="p">(</span><span class="s">&quot;people&quot;</span><span class="p">)</span>
+    <div class="highlight"><pre><span></span><span class="c1"># Register the DataFrame as a SQL temporary view</span>
+<span class="n">df</span><span class="o">.</span><span class="n">createOrReplaceTempView</span><span class="p">(</span><span class="s2">&quot;people&quot;</span><span class="p">)</span>

-<span class="n">sqlDF</span> <span class="o">=</span> <span class="n">spark</span><span class="o">.</span><span class="n">sql</span><span class="p">(</span><span class="s">&quot;SELECT * FROM people&quot;</span><span class="p">)</span>
+<span class="n">sqlDF</span> <span class="o">=</span> <span class="n">spark</span><span class="o">.</span><span class="n">sql</span><span class="p">(</span><span class="s2">&quot;SELECT * FROM people&quot;</span><span class="p">)</span>
 <span class="n">sqlDF</span><span class="o">.</span><span class="n">show</span><span class="p">()</span>
-<span class="c"># +----+-------+</span>
-<span class="c"># | age|   name|</span>
-<span class="c"># +----+-------+</span>
-<span class="c"># |null|Michael|</span>
-<span class="c"># |  30|   Andy|</span>
-<span class="c"># |  19| Justin|</span>
-<span class="c"># +----+-------+</span>
+<span class="c1"># +----+-------+</span>
+<span class="c1"># | age|   name|</span>
+<span class="c1"># +----+-------+</span>
+<span class="c1"># |null|Michael|</span>
+<span class="c1"># |  30|   Andy|</span>
+<span class="c1"># |  19| Justin|</span>
+<span class="c1"># +----+-------+</span>
 </pre></div>
    <div><small>Find full example code at "examples/src/main/python/sql/basic.py" in the Spark repo.</small></div>
   </div>
@@ -733,7 +733,7 @@ printSchema<span class="p">(</span>df<span class="p">)</span>
 <div data-lang="r">
    <p>The <code>sql</code> function enables applications to run SQL queries programmatically and returns the result as a <code>SparkDataFrame</code>.</p>

-    <div class="highlight"><pre>df <span class="o">&lt;-</span> sql<span class="p">(</span><span class="s">&quot;SELECT * FROM table&quot;</span><span class="p">)</span>
+    <div class="highlight"><pre><span></span>df <span class="o">&lt;-</span> sql<span class="p">(</span><span class="s">&quot;SELECT * FROM table&quot;</span><span class="p">)</span>
 </pre></div>
    <div><small>Find full example code at "examples/src/main/r/RSparkSQLExample.R" in the Spark repo.</small></div>
 
@@ -750,7 +750,7 @@ refer it, e.g. <code>SELECT * FROM global_temp.view1</code>.</p>
 
 <div class="codetabs">
 <div data-lang="scala">
-    <div class="highlight"><pre><span class="c1">// Register the DataFrame as a global temporary view</span>
+    <div class="highlight"><pre><span></span><span class="c1">// Register the DataFrame as a global temporary view</span>
 <span class="n">df</span><span class="o">.</span><span class="n">createGlobalTempView</span><span class="o">(</span><span class="s">&quot;people&quot;</span><span class="o">)</span>

 <span class="c1">// Global temporary view is tied to a system preserved database `global_temp`</span>
@@ -777,7 +777,7 @@ refer it, e.g. <code>SELECT * FROM global_temp.view1</code>.</p>
   </div>
 
 <div data-lang="java">
-    <div class="highlight"><pre><span class="c1">// Register the DataFrame as a global temporary view</span>
+    <div class="highlight"><pre><span></span><span class="c1">// Register the DataFrame as a global temporary view</span>
 <span class="n">df</span><span class="o">.</span><span class="na">createGlobalTempView</span><span class="o">(</span><span class="s">&quot;people&quot;</span><span class="o">);</span>

 <span class="c1">// Global temporary view is tied to a system preserved database `global_temp`</span>
@@ -804,37 +804,37 @@ refer it, e.g. <code>SELECT * FROM global_temp.view1</code>.</p>
   </div>
 
 <div data-lang="python">
-    <div class="highlight"><pre><span class="c"># Register the DataFrame as a global temporary view</span>
-<span class="n">df</span><span class="o">.</span><span class="n">createGlobalTempView</span><span class="p">(</span><span class="s">&quot;people&quot;</span><span class="p">)</span>
-
-<span class="c"># Global temporary view is tied to a system preserved database `global_temp`</span>
-<span class="n">spark</span><span class="o">.</span><span class="n">sql</span><span class="p">(</span><span class="s">&quot;SELECT * FROM global_temp.people&quot;</span><span class="p">)</span><span class="o">.</span><span class="n">show</span><span class="p">()</span>
-<span class="c"># +----+-------+</span>
-<span class="c"># | age|   name|</span>
-<span class="c"># +----+-------+</span>
-<span class="c"># |null|Michael|</span>
-<span class="c"># |  30|   Andy|</span>
-<span class="c"># |  19| Justin|</span>
-<span class="c"># +----+-------+</span>
-
-<span class="c"># Global temporary view is cross-session</span>
-<span class="n">spark</span><span class="o">.</span><span class="n">newSession</span><span class="p">()</span><span class="o">.</span><span class="n">sql</span><span class="p">(</span><span class="s">&quot;SELECT * FROM global_temp.people&quot;</span><span class="p">)</span><span class="o">.</span><span class="n">show</span><span class="p">()</span>
-<span class="c"># +----+-------+</span>
-<span class="c"># | age|   name|</span>
-<span class="c"># +----+-------+</span>
-<span class="c"># |null|Michael|</span>
-<span class="c"># |  30|   Andy|</span>
-<span class="c"># |  19| Justin|</span>
-<span class="c"># +----+-------+</span>
+    <div class="highlight"><pre><span></span><span class="c1"># Register the DataFrame as a global temporary view</span>
+<span class="n">df</span><span class="o">.</span><span class="n">createGlobalTempView</span><span class="p">(</span><span class="s2">&quot;people&quot;</span><span class="p">)</span>
+
+<span class="c1"># Global temporary view is tied to a system preserved database `global_temp`</span>
+<span class="n">spark</span><span class="o">.</span><span class="n">sql</span><span class="p">(</span><span class="s2">&quot;SELECT * FROM global_temp.people&quot;</span><span class="p">)</span><span class="o">.</span><span class="n">show</span><span class="p">()</span>
+<span class="c1"># +----+-------+</span>
+<span class="c1"># | age|   name|</span>
+<span class="c1"># +----+-------+</span>
+<span class="c1"># |null|Michael|</span>
+<span class="c1"># |  30|   Andy|</span>
+<span class="c1"># |  19| Justin|</span>
+<span class="c1"># +----+-------+</span>
+
+<span class="c1"># Global temporary view is cross-session</span>
+<span class="n">spark</span><span class="o">.</span><span class="n">newSession</span><span class="p">()</span><span class="o">.</span><span class="n">sql</span><span class="p">(</span><span class="s2">&quot;SELECT * FROM global_temp.people&quot;</span><span class="p">)</span><span class="o">.</span><span class="n">show</span><span class="p">()</span>
+<span class="c1"># +----+-------+</span>
+<span class="c1"># | age|   name|</span>
+<span class="c1"># +----+-------+</span>
+<span class="c1"># |null|Michael|</span>
+<span class="c1"># |  30|   Andy|</span>
+<span class="c1"># |  19| Justin|</span>
+<span class="c1"># +----+-------+</span>
 </pre></div>
    <div><small>Find full example code at "examples/src/main/python/sql/basic.py" in the Spark repo.</small></div>
   </div>
 
 <div data-lang="sql">
 
-    <div class="highlight"><pre><code class="language-sql" data-lang="sql"><span class="k">CREATE</span> <span class="k">GLOBAL</span> <span class="k">TEMPORARY</span> <span class="k">VIEW</span> <span class="n">temp_view</span> <span class="k">AS</span> <span class="k">SELECT</span> <span class="n">a</span> <span class="o">+</span> <span class="mi">1</span><span class="p">,</span> <span class="n">b</span> <span class="o">*</span> <span class="mi">2</span> <span class="k">FROM</span> <span class="n">tbl</span>
+    <figure class="highlight"><pre><code class="language-sql" data-lang="sql"><span></span><span class="k">CREATE</span> <span class="k">GLOBAL</span> <span class="k">TEMPORARY</span> <span class="k">VIEW</span> <span class="n">temp_view</span> <span class="k">AS</span> <span class="k">SELECT</span> <span class="n">a</span> <span class="o">+</span> <span class="mi">1</span><span class="p">,</span> <span class="n">b</span> <span class="o">*</span> <span class="mi">2</span> <span class="k">FROM</span> <span class="n">tbl</span>

-<span class="k">SELECT</span> <span class="o">*</span> <span class="k">FROM</span> <span class="n">global_temp</span><span class="p">.</span><span class="n">temp_view</span></code></pre></div>
+<span class="k">SELECT</span> <span class="o">*</span> <span class="k">FROM</span> <span class="n">global_temp</span><span class="p">.</span><span class="n">temp_view</span></code></pre></figure>
 
   </div>
 </div>
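The scoping rules described above (session-local temp views disappearing with their session, global ones surviving in the shared `global_temp` database) can be mimicked with two plain-Python namespaces. The `ToySession` class and its method names below are illustrative only, not Spark API:

```python
# Toy model of view scoping: one shared namespace for global temp views,
# one per-session namespace for ordinary temp views.
_global_temp = {}

class ToySession:
    def __init__(self):
        self._temp_views = {}          # dies with this session

    def create_temp_view(self, name, rows):
        self._temp_views[name] = rows

    def create_global_temp_view(self, name, rows):
        _global_temp[name] = rows      # visible to every session

    def lookup(self, qualified_name):
        # "global_temp.x" resolves in the shared database, "x" locally
        if qualified_name.startswith("global_temp."):
            return _global_temp[qualified_name.split(".", 1)[1]]
        return self._temp_views[qualified_name]

s1 = ToySession()
s1.create_temp_view("people", ["Justin"])
s1.create_global_temp_view("people", ["Michael", "Andy", "Justin"])

s2 = ToySession()                       # a brand-new session
rows = s2.lookup("global_temp.people")  # still sees the global view
```

Looking up the plain name `people` from `s2` fails, while the `global_temp.`-qualified lookup succeeds, which is exactly the behavior `spark.newSession().sql("SELECT * FROM global_temp.people")` demonstrates above.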
@@ -850,7 +850,7 @@ the bytes back into an object.</p>
 
 <div class="codetabs">
 <div data-lang="scala">
-    <div class="highlight"><pre><span class="c1">// Note: Case classes in Scala 2.10 can support only up to 22 fields. To work around this limit,</span>
+    <div class="highlight"><pre><span></span><span class="c1">// Note: Case classes in Scala 2.10 can support only up to 22 fields. To work around this limit,</span>
 <span class="c1">// you can use custom classes that implement the Product interface</span>
 <span class="k">case</span> <span class="k">class</span> <span class="nc">Person</span><span class="o">(</span><span class="n">name</span><span class="k">:</span> <span class="kt">String</span><span class="o">,</span> <span class="n">age</span><span class="k">:</span> <span class="kt">Long</span><span class="o">)</span>
 
@@ -883,7 +883,7 @@ the bytes back into an object.</p>
   </div>
 
 <div data-lang="java">
-    <div class="highlight"><pre><span class="kn">import</span> <span class="nn">java.util.Arrays</span><span class="o">;</span>
+    <div class="highlight"><pre><span></span><span class="kn">import</span> <span class="nn">java.util.Arrays</span><span class="o">;</span>
 <span class="kn">import</span> <span class="nn">java.util.Collections</span><span class="o">;</span>
 <span class="kn">import</span> <span class="nn">java.io.Serializable</span><span class="o">;</span>
 
@@ -915,7 +915,7 @@ the bytes back into an object.</p>
 <span class="o">}</span>
 
 <span class="c1">// Create an instance of a Bean class</span>
-<span class="n">Person</span> <span class="n">person</span> <span class="o">=</span> <span class="k">new</span> <span class="nf">Person</span><span class="o">();</span>
+<span class="n">Person</span> <span class="n">person</span> <span class="o">=</span> <span class="k">new</span> <span class="n">Person</span><span class="o">();</span>
 <span class="n">person</span><span class="o">.</span><span class="na">setName</span><span class="o">(</span><span class="s">&quot;Andy&quot;</span><span class="o">);</span>
 <span class="n">person</span><span class="o">.</span><span class="na">setAge</span><span class="o">(</span><span class="mi">32</span><span class="o">);</span>
 
@@ -982,7 +982,7 @@ reflection and become the names of the columns. Case classes can also be nested
 types such as <code>Seq</code>s or <code>Array</code>s. This RDD can be implicitly converted to a DataFrame and then be
 registered as a table. Tables can be used in subsequent SQL statements.</p>
 
-    <div class="highlight"><pre><span class="k">import</span> <span class="nn">org.apache.spark.sql.catalyst.encoders.ExpressionEncoder</span>
+    <div class="highlight"><pre><span></span><span class="k">import</span> <span class="nn">org.apache.spark.sql.catalyst.encoders.ExpressionEncoder</span>
 <span class="k">import</span> <span class="nn">org.apache.spark.sql.Encoder</span>
 
 <span class="c1">// For implicit conversions from RDDs to DataFrames</span>
@@ -1037,7 +1037,7 @@ does not support JavaBeans that contain <code>Map</code> field(s). Nested JavaBe
 fields are supported though. You can create a JavaBean by creating a class that implements
 Serializable and has getters and setters for all of its fields.</p>
 
-    <div class="highlight"><pre><span class="kn">import</span> <span class="nn">org.apache.spark.api.java.JavaRDD</span><span class="o">;</span>
+    <div class="highlight"><pre><span></span><span class="kn">import</span> <span class="nn">org.apache.spark.api.java.JavaRDD</span><span class="o">;</span>
 <span class="kn">import</span> <span class="nn">org.apache.spark.api.java.function.Function</span><span class="o">;</span>
 <span class="kn">import</span> <span class="nn">org.apache.spark.api.java.function.MapFunction</span><span class="o">;</span>
 <span class="kn">import</span> <span class="nn">org.apache.spark.sql.Dataset</span><span class="o">;</span>
@@ -1053,7 +1053,7 @@ Serializable and has getters and setters for all of its fields.</p>
     <span class="nd">@Override</span>
     <span class="kd">public</span> <span class="n">Person</span> <span 
class="nf">call</span><span class="o">(</span><span class="n">String</span> 
<span class="n">line</span><span class="o">)</span> <span 
class="kd">throws</span> <span class="n">Exception</span> <span 
class="o">{</span>
       <span class="n">String</span><span class="o">[]</span> <span 
class="n">parts</span> <span class="o">=</span> <span 
class="n">line</span><span class="o">.</span><span class="na">split</span><span 
class="o">(</span><span class="s">&quot;,&quot;</span><span class="o">);</span>
-      <span class="n">Person</span> <span class="n">person</span> <span 
class="o">=</span> <span class="k">new</span> <span 
class="nf">Person</span><span class="o">();</span>
+      <span class="n">Person</span> <span class="n">person</span> <span 
class="o">=</span> <span class="k">new</span> <span 
class="n">Person</span><span class="o">();</span>
       <span class="n">person</span><span class="o">.</span><span 
class="na">setName</span><span class="o">(</span><span 
class="n">parts</span><span class="o">[</span><span class="mi">0</span><span 
class="o">]);</span>
       <span class="n">person</span><span class="o">.</span><span 
class="na">setAge</span><span class="o">(</span><span 
class="n">Integer</span><span class="o">.</span><span 
class="na">parseInt</span><span class="o">(</span><span 
class="n">parts</span><span class="o">[</span><span class="mi">1</span><span 
class="o">].</span><span class="na">trim</span><span class="o">()));</span>
       <span class="k">return</span> <span class="n">person</span><span 
class="o">;</span>
@@ -1106,28 +1106,28 @@ Serializable and has getters and setters for all of its fields.</p>
 key/value pairs as kwargs to the Row class. The keys of this list define the column names of the table,
 and the types are inferred by sampling the whole dataset, similar to the inference that is performed on JSON files.</p>
 
-    <div class="highlight"><pre><span class="kn">from</span> <span class="nn">pyspark.sql</span> <span class="kn">import</span> <span class="n">Row</span>
+    <div class="highlight"><pre><span></span><span class="kn">from</span> <span class="nn">pyspark.sql</span> <span class="kn">import</span> <span class="n">Row</span>
 
 <span class="n">sc</span> <span class="o">=</span> <span class="n">spark</span><span class="o">.</span><span class="n">sparkContext</span>
 
-<span class="c"># Load a text file and convert each line to a Row.</span>
-<span class="n">lines</span> <span class="o">=</span> <span class="n">sc</span><span class="o">.</span><span class="n">textFile</span><span class="p">(</span><span class="s">&quot;examples/src/main/resources/people.txt&quot;</span><span class="p">)</span>
-<span class="n">parts</span> <span class="o">=</span> <span class="n">lines</span><span class="o">.</span><span class="n">map</span><span class="p">(</span><span class="k">lambda</span> <span class="n">l</span><span class="p">:</span> <span class="n">l</span><span class="o">.</span><span class="n">split</span><span class="p">(</span><span class="s">&quot;,&quot;</span><span class="p">))</span>
+<span class="c1"># Load a text file and convert each line to a Row.</span>
+<span class="n">lines</span> <span class="o">=</span> <span class="n">sc</span><span class="o">.</span><span class="n">textFile</span><span class="p">(</span><span class="s2">&quot;examples/src/main/resources/people.txt&quot;</span><span class="p">)</span>
+<span class="n">parts</span> <span class="o">=</span> <span class="n">lines</span><span class="o">.</span><span class="n">map</span><span class="p">(</span><span class="k">lambda</span> <span class="n">l</span><span class="p">:</span> <span class="n">l</span><span class="o">.</span><span class="n">split</span><span class="p">(</span><span class="s2">&quot;,&quot;</span><span class="p">))</span>
 <span class="n">people</span> <span class="o">=</span> <span class="n">parts</span><span class="o">.</span><span class="n">map</span><span class="p">(</span><span class="k">lambda</span> <span class="n">p</span><span class="p">:</span> <span class="n">Row</span><span class="p">(</span><span class="n">name</span><span class="o">=</span><span class="n">p</span><span class="p">[</span><span class="mi">0</span><span class="p">],</span> <span class="n">age</span><span class="o">=</span><span class="nb">int</span><span class="p">(</span><span class="n">p</span><span class="p">[</span><span class="mi">1</span><span class="p">])))</span>
 
-<span class="c"># Infer the schema, and register the DataFrame as a table.</span>
+<span class="c1"># Infer the schema, and register the DataFrame as a table.</span>
 <span class="n">schemaPeople</span> <span class="o">=</span> <span class="n">spark</span><span class="o">.</span><span class="n">createDataFrame</span><span class="p">(</span><span class="n">people</span><span class="p">)</span>
-<span class="n">schemaPeople</span><span class="o">.</span><span class="n">createOrReplaceTempView</span><span class="p">(</span><span class="s">&quot;people&quot;</span><span class="p">)</span>
+<span class="n">schemaPeople</span><span class="o">.</span><span class="n">createOrReplaceTempView</span><span class="p">(</span><span class="s2">&quot;people&quot;</span><span class="p">)</span>
 
-<span class="c"># SQL can be run over DataFrames that have been registered as a table.</span>
-<span class="n">teenagers</span> <span class="o">=</span> <span class="n">spark</span><span class="o">.</span><span class="n">sql</span><span class="p">(</span><span class="s">&quot;SELECT name FROM people WHERE age &gt;= 13 AND age &lt;= 19&quot;</span><span class="p">)</span>
+<span class="c1"># SQL can be run over DataFrames that have been registered as a table.</span>
+<span class="n">teenagers</span> <span class="o">=</span> <span class="n">spark</span><span class="o">.</span><span class="n">sql</span><span class="p">(</span><span class="s2">&quot;SELECT name FROM people WHERE age &gt;= 13 AND age &lt;= 19&quot;</span><span class="p">)</span>
 
-<span class="c"># The results of SQL queries are Dataframe objects.</span>
-<span class="c"># rdd returns the content as an :class:`pyspark.RDD` of :class:`Row`.</span>
-<span class="n">teenNames</span> <span class="o">=</span> <span class="n">teenagers</span><span class="o">.</span><span class="n">rdd</span><span class="o">.</span><span class="n">map</span><span class="p">(</span><span class="k">lambda</span> <span class="n">p</span><span class="p">:</span> <span class="s">&quot;Name: &quot;</span> <span class="o">+</span> <span class="n">p</span><span class="o">.</span><span class="n">name</span><span class="p">)</span><span class="o">.</span><span class="n">collect</span><span class="p">()</span>
+<span class="c1"># The results of SQL queries are DataFrame objects.</span>
+<span class="c1"># rdd returns the content as an :class:`pyspark.RDD` of :class:`Row`.</span>
+<span class="n">teenNames</span> <span class="o">=</span> <span class="n">teenagers</span><span class="o">.</span><span class="n">rdd</span><span class="o">.</span><span class="n">map</span><span class="p">(</span><span class="k">lambda</span> <span class="n">p</span><span class="p">:</span> <span class="s2">&quot;Name: &quot;</span> <span class="o">+</span> <span class="n">p</span><span class="o">.</span><span class="n">name</span><span class="p">)</span><span class="o">.</span><span class="n">collect</span><span class="p">()</span>
 <span class="k">for</span> <span class="n">name</span> <span class="ow">in</span> <span class="n">teenNames</span><span class="p">:</span>
    <span class="k">print</span><span class="p">(</span><span class="n">name</span><span class="p">)</span>
-<span class="c"># Name: Justin</span>
+<span class="c1"># Name: Justin</span>
 </pre></div>
    <div><small>Find full example code at "examples/src/main/python/sql/basic.py" in the Spark repo.</small></div>
   </div>
@@ -1155,7 +1155,7 @@ by <code>SparkSession</code>.</li>
 
     <p>For example:</p>
 
-    <div class="highlight"><pre><span class="k">import</span> <span class="nn">org.apache.spark.sql.types._</span>
+    <div class="highlight"><pre><span></span><span class="k">import</span> <span class="nn">org.apache.spark.sql.types._</span>

 <span class="c1">// Create an RDD</span>
 <span class="k">val</span> <span class="n">peopleRDD</span> <span class="k">=</span> <span class="n">spark</span><span class="o">.</span><span class="n">sparkContext</span><span class="o">.</span><span class="n">textFile</span><span class="o">(</span><span class="s">&quot;examples/src/main/resources/people.txt&quot;</span><span class="o">)</span>
@@ -1213,7 +1213,7 @@ by <code>SparkSession</code>.</li>
 
     <p>For example:</p>
 
-    <div class="highlight"><pre><span class="kn">import</span> <span class="nn">java.util.ArrayList</span><span class="o">;</span>
+    <div class="highlight"><pre><span></span><span class="kn">import</span> <span class="nn">java.util.ArrayList</span><span class="o">;</span>
 <span class="kn">import</span> <span class="nn">java.util.List</span><span class="o">;</span>

 <span class="kn">import</span> <span class="nn">org.apache.spark.api.java.JavaRDD</span><span class="o">;</span>
@@ -1296,43 +1296,43 @@ tuples or lists in the RDD created in the step 1.</li>
 
     <p>For example:</p>
 
-    <div class="highlight"><pre><span class="c"># Import data types</span>
+    <div class="highlight"><pre><span></span><span class="c1"># Import data types</span>
 <span class="kn">from</span> <span class="nn">pyspark.sql.types</span> <span class="kn">import</span> <span class="o">*</span>

 <span class="n">sc</span> <span class="o">=</span> <span class="n">spark</span><span class="o">.</span><span class="n">sparkContext</span>
 
-<span class="c"># Load a text file and convert each line to a Row.</span>
-<span class="n">lines</span> <span class="o">=</span> <span class="n">sc</span><span class="o">.</span><span class="n">textFile</span><span class="p">(</span><span class="s">&quot;examples/src/main/resources/people.txt&quot;</span><span class="p">)</span>
-<span class="n">parts</span> <span class="o">=</span> <span class="n">lines</span><span class="o">.</span><span class="n">map</span><span class="p">(</span><span class="k">lambda</span> <span class="n">l</span><span class="p">:</span> <span class="n">l</span><span class="o">.</span><span class="n">split</span><span class="p">(</span><span class="s">&quot;,&quot;</span><span class="p">))</span>
-<span class="c"># Each line is converted to a tuple.</span>
+<span class="c1"># Load a text file and convert each line to a Row.</span>
+<span class="n">lines</span> <span class="o">=</span> <span class="n">sc</span><span class="o">.</span><span class="n">textFile</span><span class="p">(</span><span class="s2">&quot;examples/src/main/resources/people.txt&quot;</span><span class="p">)</span>
+<span class="n">parts</span> <span class="o">=</span> <span class="n">lines</span><span class="o">.</span><span class="n">map</span><span class="p">(</span><span class="k">lambda</span> <span class="n">l</span><span class="p">:</span> <span class="n">l</span><span class="o">.</span><span class="n">split</span><span class="p">(</span><span class="s2">&quot;,&quot;</span><span class="p">))</span>
+<span class="c1"># Each line is converted to a tuple.</span>
 <span class="n">people</span> <span class="o">=</span> <span class="n">parts</span><span class="o">.</span><span class="n">map</span><span class="p">(</span><span class="k">lambda</span> <span class="n">p</span><span class="p">:</span> <span class="p">(</span><span class="n">p</span><span class="p">[</span><span class="mi">0</span><span class="p">],</span> <span class="n">p</span><span class="p">[</span><span class="mi">1</span><span class="p">]</span><span class="o">.</span><span class="n">strip</span><span class="p">()))</span>

-<span class="c"># The schema is encoded in a string.</span>
-<span class="n">schemaString</span> <span class="o">=</span> <span class="s">&quot;name age&quot;</span>
+<span class="c1"># The schema is encoded in a string.</span>
+<span class="n">schemaString</span> <span class="o">=</span> <span class="s2">&quot;name age&quot;</span>

 <span class="n">fields</span> <span class="o">=</span> <span class="p">[</span><span class="n">StructField</span><span class="p">(</span><span class="n">field_name</span><span class="p">,</span> <span class="n">StringType</span><span class="p">(),</span> <span class="bp">True</span><span class="p">)</span> <span class="k">for</span> <span class="n">field_name</span> <span class="ow">in</span> <span class="n">schemaString</span><span class="o">.</span><span class="n">split</span><span class="p">()]</span>
 <span class="n">schema</span> <span class="o">=</span> <span class="n">StructType</span><span class="p">(</span><span class="n">fields</span><span class="p">)</span>

-<span class="c"># Apply the schema to the RDD.</span>
+<span class="c1"># Apply the schema to the RDD.</span>
 <span class="n">schemaPeople</span> <span class="o">=</span> <span class="n">spark</span><span class="o">.</span><span class="n">createDataFrame</span><span class="p">(</span><span class="n">people</span><span class="p">,</span> <span class="n">schema</span><span class="p">)</span>

-<span class="c"># Creates a temporary view using the DataFrame</span>
-<span class="n">schemaPeople</span><span class="o">.</span><span class="n">createOrReplaceTempView</span><span class="p">(</span><span class="s">&quot;people&quot;</span><span class="p">)</span>
+<span class="c1"># Creates a temporary view using the DataFrame</span>
+<span class="n">schemaPeople</span><span class="o">.</span><span class="n">createOrReplaceTempView</span><span class="p">(</span><span class="s2">&quot;people&quot;</span><span class="p">)</span>
 
-<span class="c"># SQL can be run over DataFrames that have been registered as a table.</span>
-<span class="n">results</span> <span class="o">=</span> <span class="n">spark</span><span class="o">.</span><span class="n">sql</span><span class="p">(</span><span class="s">&quot;SELECT name FROM people&quot;</span><span class="p">)</span>
+<span class="c1"># SQL can be run over DataFrames that have been registered as a table.</span>
+<span class="n">results</span> <span class="o">=</span> <span class="n">spark</span><span class="o">.</span><span class="n">sql</span><span class="p">(</span><span class="s2">&quot;SELECT name FROM people&quot;</span><span class="p">)</span>

 <span class="n">results</span><span class="o">.</span><span class="n">show</span><span class="p">()</span>
-<span class="c"># +-------+</span>
-<span class="c"># |   name|</span>
-<span class="c"># +-------+</span>
-<span class="c"># |Michael|</span>
-<span class="c"># |   Andy|</span>
-<span class="c"># | Justin|</span>
-<span class="c"># +-------+</span>
+<span class="c1"># +-------+</span>
+<span class="c1"># |   name|</span>
+<span class="c1"># +-------+</span>
+<span class="c1"># |Michael|</span>
+<span class="c1"># |   Andy|</span>
+<span class="c1"># | Justin|</span>
+<span class="c1"># +-------+</span>
 </pre></div>
     <div><small>Find full example code at 
"examples/src/main/python/sql/basic.py" in the Spark repo.</small></div>
   </div>
@@ -1354,14 +1354,14 @@ goes into specific options that are available for the 
built-in data sources.</p>
 
 <div class="codetabs">
 <div data-lang="scala">
-    <div class="highlight"><pre><span class="k">val</span> <span 
class="n">usersDF</span> <span class="k">=</span> <span 
class="n">spark</span><span class="o">.</span><span class="n">read</span><span 
class="o">.</span><span class="n">load</span><span class="o">(</span><span 
class="s">&quot;examples/src/main/resources/users.parquet&quot;</span><span 
class="o">)</span>
+    <div class="highlight"><pre><span></span><span class="k">val</span> <span 
class="n">usersDF</span> <span class="k">=</span> <span 
class="n">spark</span><span class="o">.</span><span class="n">read</span><span 
class="o">.</span><span class="n">load</span><span class="o">(</span><span 
class="s">&quot;examples/src/main/resources/users.parquet&quot;</span><span 
class="o">)</span>
 <span class="n">usersDF</span><span class="o">.</span><span 
class="n">select</span><span class="o">(</span><span 
class="s">&quot;name&quot;</span><span class="o">,</span> <span 
class="s">&quot;favorite_color&quot;</span><span class="o">).</span><span 
class="n">write</span><span class="o">.</span><span class="n">save</span><span 
class="o">(</span><span 
class="s">&quot;namesAndFavColors.parquet&quot;</span><span class="o">)</span>
 </pre></div>
     <div><small>Find full example code at 
"examples/src/main/scala/org/apache/spark/examples/sql/SQLDataSourceExample.scala"
 in the Spark repo.</small></div>
   </div>
 
 <div data-lang="java">
-    <div class="highlight"><pre><span class="n">Dataset</span><span 
class="o">&lt;</span><span class="n">Row</span><span class="o">&gt;</span> 
<span class="n">usersDF</span> <span class="o">=</span> <span 
class="n">spark</span><span class="o">.</span><span class="na">read</span><span 
class="o">().</span><span class="na">load</span><span class="o">(</span><span 
class="s">&quot;examples/src/main/resources/users.parquet&quot;</span><span 
class="o">);</span>
+    <div class="highlight"><pre><span></span><span 
class="n">Dataset</span><span class="o">&lt;</span><span 
class="n">Row</span><span class="o">&gt;</span> <span class="n">usersDF</span> 
<span class="o">=</span> <span class="n">spark</span><span 
class="o">.</span><span class="na">read</span><span class="o">().</span><span 
class="na">load</span><span class="o">(</span><span 
class="s">&quot;examples/src/main/resources/users.parquet&quot;</span><span 
class="o">);</span>
 <span class="n">usersDF</span><span class="o">.</span><span 
class="na">select</span><span class="o">(</span><span 
class="s">&quot;name&quot;</span><span class="o">,</span> <span 
class="s">&quot;favorite_color&quot;</span><span class="o">).</span><span 
class="na">write</span><span class="o">().</span><span 
class="na">save</span><span class="o">(</span><span 
class="s">&quot;namesAndFavColors.parquet&quot;</span><span class="o">);</span>
 </pre></div>
     <div><small>Find full example code at 
"examples/src/main/java/org/apache/spark/examples/sql/JavaSQLDataSourceExample.java"
 in the Spark repo.</small></div>
@@ -1369,15 +1369,15 @@ goes into specific options that are available for the 
built-in data sources.</p>
 
 <div data-lang="python">
 
-    <div class="highlight"><pre><span class="n">df</span> <span 
class="o">=</span> <span class="n">spark</span><span class="o">.</span><span 
class="n">read</span><span class="o">.</span><span class="n">load</span><span 
class="p">(</span><span 
class="s">&quot;examples/src/main/resources/users.parquet&quot;</span><span 
class="p">)</span>
-<span class="n">df</span><span class="o">.</span><span 
class="n">select</span><span class="p">(</span><span 
class="s">&quot;name&quot;</span><span class="p">,</span> <span 
class="s">&quot;favorite_color&quot;</span><span class="p">)</span><span 
class="o">.</span><span class="n">write</span><span class="o">.</span><span 
class="n">save</span><span class="p">(</span><span 
class="s">&quot;namesAndFavColors.parquet&quot;</span><span class="p">)</span>
+    <div class="highlight"><pre><span></span><span class="n">df</span> <span 
class="o">=</span> <span class="n">spark</span><span class="o">.</span><span 
class="n">read</span><span class="o">.</span><span class="n">load</span><span 
class="p">(</span><span 
class="s2">&quot;examples/src/main/resources/users.parquet&quot;</span><span 
class="p">)</span>
+<span class="n">df</span><span class="o">.</span><span 
class="n">select</span><span class="p">(</span><span 
class="s2">&quot;name&quot;</span><span class="p">,</span> <span 
class="s2">&quot;favorite_color&quot;</span><span class="p">)</span><span 
class="o">.</span><span class="n">write</span><span class="o">.</span><span 
class="n">save</span><span class="p">(</span><span 
class="s2">&quot;namesAndFavColors.parquet&quot;</span><span class="p">)</span>
 </pre></div>
     <div><small>Find full example code at 
"examples/src/main/python/sql/datasource.py" in the Spark repo.</small></div>
   </div>
 
 <div data-lang="r">
 
-    <div class="highlight"><pre>df <span class="o">&lt;-</span> read.df<span 
class="p">(</span><span 
class="s">&quot;examples/src/main/resources/users.parquet&quot;</span><span 
class="p">)</span>
+    <div class="highlight"><pre><span></span>df <span class="o">&lt;-</span> 
read.df<span class="p">(</span><span 
class="s">&quot;examples/src/main/resources/users.parquet&quot;</span><span 
class="p">)</span>
 write.df<span class="p">(</span>select<span class="p">(</span>df<span 
class="p">,</span> <span class="s">&quot;name&quot;</span><span 
class="p">,</span> <span class="s">&quot;favorite_color&quot;</span><span 
class="p">),</span> <span 
class="s">&quot;namesAndFavColors.parquet&quot;</span><span class="p">)</span>
 </pre></div>
     <div><small>Find full example code at 
"examples/src/main/r/RSparkSQLExample.R" in the Spark repo.</small></div>
@@ -1395,14 +1395,14 @@ source type can be converted into other types using 
this syntax.</p>
 
 <div class="codetabs">
 <div data-lang="scala">
-    <div class="highlight"><pre><span class="k">val</span> <span 
class="n">peopleDF</span> <span class="k">=</span> <span 
class="n">spark</span><span class="o">.</span><span class="n">read</span><span 
class="o">.</span><span class="n">format</span><span class="o">(</span><span 
class="s">&quot;json&quot;</span><span class="o">).</span><span 
class="n">load</span><span class="o">(</span><span 
class="s">&quot;examples/src/main/resources/people.json&quot;</span><span 
class="o">)</span>
+    <div class="highlight"><pre><span></span><span class="k">val</span> <span 
class="n">peopleDF</span> <span class="k">=</span> <span 
class="n">spark</span><span class="o">.</span><span class="n">read</span><span 
class="o">.</span><span class="n">format</span><span class="o">(</span><span 
class="s">&quot;json&quot;</span><span class="o">).</span><span 
class="n">load</span><span class="o">(</span><span 
class="s">&quot;examples/src/main/resources/people.json&quot;</span><span 
class="o">)</span>
 <span class="n">peopleDF</span><span class="o">.</span><span 
class="n">select</span><span class="o">(</span><span 
class="s">&quot;name&quot;</span><span class="o">,</span> <span 
class="s">&quot;age&quot;</span><span class="o">).</span><span 
class="n">write</span><span class="o">.</span><span 
class="n">format</span><span class="o">(</span><span 
class="s">&quot;parquet&quot;</span><span class="o">).</span><span 
class="n">save</span><span class="o">(</span><span 
class="s">&quot;namesAndAges.parquet&quot;</span><span class="o">)</span>
 </pre></div>
     <div><small>Find full example code at 
"examples/src/main/scala/org/apache/spark/examples/sql/SQLDataSourceExample.scala"
 in the Spark repo.</small></div>
   </div>
 
 <div data-lang="java">
-    <div class="highlight"><pre><span class="n">Dataset</span><span 
class="o">&lt;</span><span class="n">Row</span><span class="o">&gt;</span> 
<span class="n">peopleDF</span> <span class="o">=</span>
+    <div class="highlight"><pre><span></span><span 
class="n">Dataset</span><span class="o">&lt;</span><span 
class="n">Row</span><span class="o">&gt;</span> <span class="n">peopleDF</span> 
<span class="o">=</span>
   <span class="n">spark</span><span class="o">.</span><span 
class="na">read</span><span class="o">().</span><span 
class="na">format</span><span class="o">(</span><span 
class="s">&quot;json&quot;</span><span class="o">).</span><span 
class="na">load</span><span class="o">(</span><span 
class="s">&quot;examples/src/main/resources/people.json&quot;</span><span 
class="o">);</span>
 <span class="n">peopleDF</span><span class="o">.</span><span 
class="na">select</span><span class="o">(</span><span 
class="s">&quot;name&quot;</span><span class="o">,</span> <span 
class="s">&quot;age&quot;</span><span class="o">).</span><span 
class="na">write</span><span class="o">().</span><span 
class="na">format</span><span class="o">(</span><span 
class="s">&quot;parquet&quot;</span><span class="o">).</span><span 
class="na">save</span><span class="o">(</span><span 
class="s">&quot;namesAndAges.parquet&quot;</span><span class="o">);</span>
 </pre></div>
@@ -1410,14 +1410,14 @@ source type can be converted into other types using 
this syntax.</p>
   </div>
 
 <div data-lang="python">
-    <div class="highlight"><pre><span class="n">df</span> <span 
class="o">=</span> <span class="n">spark</span><span class="o">.</span><span 
class="n">read</span><span class="o">.</span><span class="n">load</span><span 
class="p">(</span><span 
class="s">&quot;examples/src/main/resources/people.json&quot;</span><span 
class="p">,</span> <span class="n">format</span><span class="o">=</span><span 
class="s">&quot;json&quot;</span><span class="p">)</span>
-<span class="n">df</span><span class="o">.</span><span 
class="n">select</span><span class="p">(</span><span 
class="s">&quot;name&quot;</span><span class="p">,</span> <span 
class="s">&quot;age&quot;</span><span class="p">)</span><span 
class="o">.</span><span class="n">write</span><span class="o">.</span><span 
class="n">save</span><span class="p">(</span><span 
class="s">&quot;namesAndAges.parquet&quot;</span><span class="p">,</span> <span 
class="n">format</span><span class="o">=</span><span 
class="s">&quot;parquet&quot;</span><span class="p">)</span>
+    <div class="highlight"><pre><span></span><span class="n">df</span> <span 
class="o">=</span> <span class="n">spark</span><span class="o">.</span><span 
class="n">read</span><span class="o">.</span><span class="n">load</span><span 
class="p">(</span><span 
class="s2">&quot;examples/src/main/resources/people.json&quot;</span><span 
class="p">,</span> <span class="n">format</span><span class="o">=</span><span 
class="s2">&quot;json&quot;</span><span class="p">)</span>
+<span class="n">df</span><span class="o">.</span><span 
class="n">select</span><span class="p">(</span><span 
class="s2">&quot;name&quot;</span><span class="p">,</span> <span 
class="s2">&quot;age&quot;</span><span class="p">)</span><span 
class="o">.</span><span class="n">write</span><span class="o">.</span><span 
class="n">save</span><span class="p">(</span><span 
class="s2">&quot;namesAndAges.parquet&quot;</span><span class="p">,</span> 
<span class="n">format</span><span class="o">=</span><span 
class="s2">&quot;parquet&quot;</span><span class="p">)</span>
 </pre></div>
     <div><small>Find full example code at 
"examples/src/main/python/sql/datasource.py" in the Spark repo.</small></div>
   </div>
 
 <div data-lang="r">
-    <div class="highlight"><pre>df <span class="o">&lt;-</span> read.df<span 
class="p">(</span><span 
class="s">&quot;examples/src/main/resources/people.json&quot;</span><span 
class="p">,</span> <span class="s">&quot;json&quot;</span><span 
class="p">)</span>
+    <div class="highlight"><pre><span></span>df <span class="o">&lt;-</span> 
read.df<span class="p">(</span><span 
class="s">&quot;examples/src/main/resources/people.json&quot;</span><span 
class="p">,</span> <span class="s">&quot;json&quot;</span><span 
class="p">)</span>
 namesAndAges <span class="o">&lt;-</span> select<span 
class="p">(</span>df<span class="p">,</span> <span 
class="s">&quot;name&quot;</span><span class="p">,</span> <span 
class="s">&quot;age&quot;</span><span class="p">)</span>
 write.df<span class="p">(</span>namesAndAges<span class="p">,</span> <span 
class="s">&quot;namesAndAges.parquet&quot;</span><span class="p">,</span> <span 
class="s">&quot;parquet&quot;</span><span class="p">)</span>
 </pre></div>
@@ -1432,26 +1432,26 @@ file directly with SQL.</p>
 
 <div class="codetabs">
 <div data-lang="scala">
-    <div class="highlight"><pre><span class="k">val</span> <span 
class="n">sqlDF</span> <span class="k">=</span> <span 
class="n">spark</span><span class="o">.</span><span class="n">sql</span><span 
class="o">(</span><span class="s">&quot;SELECT * FROM 
parquet.`examples/src/main/resources/users.parquet`&quot;</span><span 
class="o">)</span>
+    <div class="highlight"><pre><span></span><span class="k">val</span> <span 
class="n">sqlDF</span> <span class="k">=</span> <span 
class="n">spark</span><span class="o">.</span><span class="n">sql</span><span 
class="o">(</span><span class="s">&quot;SELECT * FROM 
parquet.`examples/src/main/resources/users.parquet`&quot;</span><span 
class="o">)</span>
 </pre></div>
     <div><small>Find full example code at 
"examples/src/main/scala/org/apache/spark/examples/sql/SQLDataSourceExample.scala"
 in the Spark repo.</small></div>
   </div>
 
 <div data-lang="java">
-    <div class="highlight"><pre><span class="n">Dataset</span><span 
class="o">&lt;</span><span class="n">Row</span><span class="o">&gt;</span> 
<span class="n">sqlDF</span> <span class="o">=</span>
+    <div class="highlight"><pre><span></span><span 
class="n">Dataset</span><span class="o">&lt;</span><span 
class="n">Row</span><span class="o">&gt;</span> <span class="n">sqlDF</span> 
<span class="o">=</span>
   <span class="n">spark</span><span class="o">.</span><span 
class="na">sql</span><span class="o">(</span><span class="s">&quot;SELECT * 
FROM parquet.`examples/src/main/resources/users.parquet`&quot;</span><span 
class="o">);</span>
 </pre></div>
     <div><small>Find full example code at 
"examples/src/main/java/org/apache/spark/examples/sql/JavaSQLDataSourceExample.java"
 in the Spark repo.</small></div>
   </div>
 
 <div data-lang="python">
-    <div class="highlight"><pre><span class="n">df</span> <span 
class="o">=</span> <span class="n">spark</span><span class="o">.</span><span 
class="n">sql</span><span class="p">(</span><span class="s">&quot;SELECT * FROM 
parquet.`examples/src/main/resources/users.parquet`&quot;</span><span 
class="p">)</span>
+    <div class="highlight"><pre><span></span><span class="n">df</span> <span 
class="o">=</span> <span class="n">spark</span><span class="o">.</span><span 
class="n">sql</span><span class="p">(</span><span class="s2">&quot;SELECT * 
FROM parquet.`examples/src/main/resources/users.parquet`&quot;</span><span 
class="p">)</span>
 </pre></div>
     <div><small>Find full example code at 
"examples/src/main/python/sql/datasource.py" in the Spark repo.</small></div>
   </div>
 
 <div data-lang="r">
-    <div class="highlight"><pre>df <span class="o">&lt;-</span> sql<span 
class="p">(</span><span class="s">&quot;SELECT * FROM 
parquet.`examples/src/main/resources/users.parquet`&quot;</span><span 
class="p">)</span>
+    <div class="highlight"><pre><span></span>df <span class="o">&lt;-</span> 
sql<span class="p">(</span><span class="s">&quot;SELECT * FROM 
parquet.`examples/src/main/resources/users.parquet`&quot;</span><span 
class="p">)</span>
 </pre></div>
     <div><small>Find full example code at 
"examples/src/main/r/RSparkSQLExample.R" in the Spark repo.</small></div>
 
@@ -1531,7 +1531,7 @@ compatibility reasons.</p>
 <div class="codetabs">
 
 <div data-lang="scala">
-    <div class="highlight"><pre><span class="c1">// Encoders for most common 
types are automatically provided by importing spark.implicits._</span>
+    <div class="highlight"><pre><span></span><span class="c1">// Encoders for 
most common types are automatically provided by importing 
spark.implicits._</span>
 <span class="k">import</span> <span class="nn">spark.implicits._</span>
 
 <span class="k">val</span> <span class="n">peopleDF</span> <span 
class="k">=</span> <span class="n">spark</span><span class="o">.</span><span 
class="n">read</span><span class="o">.</span><span class="n">json</span><span 
class="o">(</span><span 
class="s">&quot;examples/src/main/resources/people.json&quot;</span><span 
class="o">)</span>
@@ -1558,7 +1558,7 @@ compatibility reasons.</p>
   </div>
 
 <div data-lang="java">
-    <div class="highlight"><pre><span class="kn">import</span> <span 
class="nn">org.apache.spark.api.java.JavaRDD</span><span class="o">;</span>
+    <div class="highlight"><pre><span></span><span class="kn">import</span> 
<span class="nn">org.apache.spark.api.java.JavaRDD</span><span 
class="o">;</span>
 <span class="kn">import</span> <span 
class="nn">org.apache.spark.api.java.JavaSparkContext</span><span 
class="o">;</span>
 <span class="kn">import</span> <span 
class="nn">org.apache.spark.api.java.function.MapFunction</span><span 
class="o">;</span>
 <span class="kn">import</span> <span 
class="nn">org.apache.spark.sql.Encoders</span><span class="o">;</span>
@@ -1595,32 +1595,32 @@ compatibility reasons.</p>
 
 <div data-lang="python">
 
-    <div class="highlight"><pre><span class="n">peopleDF</span> <span 
class="o">=</span> <span class="n">spark</span><span class="o">.</span><span 
class="n">read</span><span class="o">.</span><span class="n">json</span><span 
class="p">(</span><span 
class="s">&quot;examples/src/main/resources/people.json&quot;</span><span 
class="p">)</span>
+    <div class="highlight"><pre><span></span><span class="n">peopleDF</span> 
<span class="o">=</span> <span class="n">spark</span><span 
class="o">.</span><span class="n">read</span><span class="o">.</span><span 
class="n">json</span><span class="p">(</span><span 
class="s2">&quot;examples/src/main/resources/people.json&quot;</span><span 
class="p">)</span>
 
-<span class="c"># DataFrames can be saved as Parquet files, maintaining the 
schema information.</span>
-<span class="n">peopleDF</span><span class="o">.</span><span 
class="n">write</span><span class="o">.</span><span 
class="n">parquet</span><span class="p">(</span><span 
class="s">&quot;people.parquet&quot;</span><span class="p">)</span>
+<span class="c1"># DataFrames can be saved as Parquet files, maintaining the 
schema information.</span>
+<span class="n">peopleDF</span><span class="o">.</span><span 
class="n">write</span><span class="o">.</span><span 
class="n">parquet</span><span class="p">(</span><span 
class="s2">&quot;people.parquet&quot;</span><span class="p">)</span>
 
-<span class="c"># Read in the Parquet file created above.</span>
-<span class="c"># Parquet files are self-describing so the schema is 
preserved.</span>
-<span class="c"># The result of loading a parquet file is also a 
DataFrame.</span>
-<span class="n">parquetFile</span> <span class="o">=</span> <span 
class="n">spark</span><span class="o">.</span><span class="n">read</span><span 
class="o">.</span><span class="n">parquet</span><span class="p">(</span><span 
class="s">&quot;people.parquet&quot;</span><span class="p">)</span>
+<span class="c1"># Read in the Parquet file created above.</span>
+<span class="c1"># Parquet files are self-describing so the schema is 
preserved.</span>
+<span class="c1"># The result of loading a parquet file is also a 
DataFrame.</span>
+<span class="n">parquetFile</span> <span class="o">=</span> <span 
class="n">spark</span><span class="o">.</span><span class="n">read</span><span 
class="o">.</span><span class="n">parquet</span><span class="p">(</span><span 
class="s2">&quot;people.parquet&quot;</span><span class="p">)</span>
 
-<span class="c"># Parquet files can also be used to create a temporary view 
and then used in SQL statements.</span>
-<span class="n">parquetFile</span><span class="o">.</span><span 
class="n">createOrReplaceTempView</span><span class="p">(</span><span 
class="s">&quot;parquetFile&quot;</span><span class="p">)</span>
-<span class="n">teenagers</span> <span class="o">=</span> <span 
class="n">spark</span><span class="o">.</span><span class="n">sql</span><span 
class="p">(</span><span class="s">&quot;SELECT name FROM parquetFile WHERE age 
&gt;= 13 AND age &lt;= 19&quot;</span><span class="p">)</span>
+<span class="c1"># Parquet files can also be used to create a temporary view 
and then used in SQL statements.</span>
+<span class="n">parquetFile</span><span class="o">.</span><span 
class="n">createOrReplaceTempView</span><span class="p">(</span><span 
class="s2">&quot;parquetFile&quot;</span><span class="p">)</span>
+<span class="n">teenagers</span> <span class="o">=</span> <span 
class="n">spark</span><span class="o">.</span><span class="n">sql</span><span 
class="p">(</span><span class="s2">&quot;SELECT name FROM parquetFile WHERE age 
&gt;= 13 AND age &lt;= 19&quot;</span><span class="p">)</span>
 <span class="n">teenagers</span><span class="o">.</span><span 
class="n">show</span><span class="p">()</span>
-<span class="c"># +------+</span>
-<span class="c"># |  name|</span>
-<span class="c"># +------+</span>
-<span class="c"># |Justin|</span>
-<span class="c"># +------+</span>
+<span class="c1"># +------+</span>
+<span class="c1"># |  name|</span>
+<span class="c1"># +------+</span>
+<span class="c1"># |Justin|</span>
+<span class="c1"># +------+</span>
 </pre></div>
     <div><small>Find full example code at 
"examples/src/main/python/sql/datasource.py" in the Spark repo.</small></div>
   </div>
 
 <div data-lang="r">
 
-    <div class="highlight"><pre>df <span class="o">&lt;-</span> read.df<span 
class="p">(</span><span 
class="s">&quot;examples/src/main/resources/people.json&quot;</span><span 
class="p">,</span> <span class="s">&quot;json&quot;</span><span 
class="p">)</span>
+    <div class="highlight"><pre><span></span>df <span class="o">&lt;-</span> 
read.df<span class="p">(</span><span 
class="s">&quot;examples/src/main/resources/people.json&quot;</span><span 
class="p">,</span> <span class="s">&quot;json&quot;</span><span 
class="p">)</span>
 
 <span class="c1"># SparkDataFrame can be saved as Parquet files, maintaining 
the schema information.</span>
 write.parquet<span class="p">(</span>df<span class="p">,</span> <span 
class="s">&quot;people.parquet&quot;</span><span class="p">)</span>
@@ -1652,13 +1652,13 @@ teenNames <span class="o">&lt;-</span> dapply<span 
class="p">(</span>df<span cla
 
 <div data-lang="sql">
 
-    <div class="highlight"><pre><code class="language-sql" 
data-lang="sql"><span class="k">CREATE</span> <span class="k">TEMPORARY</span> 
<span class="k">VIEW</span> <span class="n">parquetTable</span>
+    <figure class="highlight"><pre><code class="language-sql" 
data-lang="sql"><span></span><span class="k">CREATE</span> <span 
class="k">TEMPORARY</span> <span class="k">VIEW</span> <span 
class="n">parquetTable</span>
 <span class="k">USING</span> <span class="n">org</span><span 
class="p">.</span><span class="n">apache</span><span class="p">.</span><span 
class="n">spark</span><span class="p">.</span><span class="k">sql</span><span 
class="p">.</span><span class="n">parquet</span>
 <span class="k">OPTIONS</span> <span class="p">(</span>
   <span class="n">path</span> <span 
class="ss">&quot;examples/src/main/resources/people.parquet&quot;</span>
 <span class="p">)</span>
 
-<span class="k">SELECT</span> <span class="o">*</span> <span 
class="k">FROM</span> <span class="n">parquetTable</span></code></pre></div>
+<span class="k">SELECT</span> <span class="o">*</span> <span 
class="k">FROM</span> <span class="n">parquetTable</span></code></pre></figure>
 
   </div>
 
@@ -1673,7 +1673,7 @@ partitioning information automatically. For example, we 
can store all our previo
 population data into a partitioned table using the following directory 
structure, with two extra
 columns, <code>gender</code> and <code>country</code> as partitioning 
columns:</p>
 
-<div class="highlight"><pre><code class="language-text" data-lang="text">path
+<figure class="highlight"><pre><code class="language-text" 
data-lang="text"><span></span>path
 └── to
     └── table
         ├── gender=male
@@ -1691,17 +1691,17 @@ columns, <code>gender</code> and <code>country</code> 
as partitioning columns:</
             │   └── data.parquet
             ├── country=CN
             │   └── data.parquet
-            └── ...</code></pre></div>
+            └── ...</code></pre></figure>
 
 <p>By passing <code>path/to/table</code> to either 
<code>SparkSession.read.parquet</code> or <code>SparkSession.read.load</code>, 
Spark SQL
 will automatically extract the partitioning information from the paths.
 Now the schema of the returned DataFrame becomes:</p>
 
-<div class="highlight"><pre><code class="language-text" data-lang="text">root
+<figure class="highlight"><pre><code class="language-text" 
data-lang="text"><span></span>root
 |-- name: string (nullable = true)
 |-- age: long (nullable = true)
 |-- gender: strin

<TRUNCATED>
