Repository: spark
Updated Branches:
  refs/heads/branch-2.0 7177e1843 -> af70ad028
[SPARK-15863][SQL][DOC][FOLLOW-UP] Update SQL programming guide.

## What changes were proposed in this pull request?

This PR makes several updates to SQL programming guide.

Author: Yin Huai <yh...@databricks.com>

Closes #13938 from yhuai/doc.

(cherry picked from commit dd6b7dbe7043f3fa3d2e3993d2e13f87231a59ca)
Signed-off-by: Reynold Xin <r...@databricks.com>


Project: http://git-wip-us.apache.org/repos/asf/spark/repo
Commit: http://git-wip-us.apache.org/repos/asf/spark/commit/af70ad02
Tree: http://git-wip-us.apache.org/repos/asf/spark/tree/af70ad02
Diff: http://git-wip-us.apache.org/repos/asf/spark/diff/af70ad02

Branch: refs/heads/branch-2.0
Commit: af70ad02859900e8c890e38b6fec0d12d42461f2
Parents: 7177e18
Author: Yin Huai <yh...@databricks.com>
Authored: Mon Jun 27 22:44:08 2016 -0700
Committer: Reynold Xin <r...@databricks.com>
Committed: Mon Jun 27 22:44:13 2016 -0700

----------------------------------------------------------------------
 docs/sql-programming-guide.md | 34 ++++++++++++++++------------------
 1 file changed, 16 insertions(+), 18 deletions(-)
----------------------------------------------------------------------


http://git-wip-us.apache.org/repos/asf/spark/blob/af70ad02/docs/sql-programming-guide.md
----------------------------------------------------------------------
diff --git a/docs/sql-programming-guide.md b/docs/sql-programming-guide.md
index 4b52c94..6c6bc8d 100644
--- a/docs/sql-programming-guide.md
+++ b/docs/sql-programming-guide.md
@@ -25,29 +25,35 @@ the `spark-shell`, `pyspark` shell, or `sparkR` shell.
 
 One use of Spark SQL is to execute SQL queries. Spark SQL can also be used to read data
 from an existing Hive installation. For more on how to configure this feature, please refer to the
 [Hive Tables](#hive-tables) section. When running
-SQL from within another programming language the results will be returned as a [DataFrame](#datasets-and-dataframes).
+SQL from within another programming language the results will be returned as a [Dataset/DataFrame](#datasets-and-dataframes).
 You can also interact with the SQL interface using the [command-line](#running-the-spark-sql-cli)
 or over [JDBC/ODBC](#running-the-thrift-jdbcodbc-server).
 
 ## Datasets and DataFrames
 
-A Dataset is a new interface added in Spark 1.6 that tries to provide the benefits of RDDs (strong
+A Dataset is a distributed collection of data.
+Dataset is a new interface added in Spark 1.6 that provides the benefits of RDDs (strong
 typing, ability to use powerful lambda functions) with the benefits of Spark SQL's optimized
 execution engine. A Dataset can be [constructed](#creating-datasets) from JVM objects and then
 manipulated using functional transformations (`map`, `flatMap`, `filter`, etc.).
+The Dataset API is available in [Scala][scala-datasets] and
+[Java][java-datasets]. Python does not have the support for the Dataset API. But due to Python's dynamic nature,
+many of the benefits of the Dataset API are already available (i.e. you can access the field of a row by name naturally
+`row.columnName`). The case for R is similar.
 
-The Dataset API is the successor of the DataFrame API, which was introduced in Spark 1.3. In Spark
-2.0, Datasets and DataFrames are unified, and DataFrames are now equivalent to Datasets of `Row`s.
-In fact, `DataFrame` is simply a type alias of `Dataset[Row]` in [the Scala API][scala-datasets].
-However, [Java API][java-datasets] users must use `Dataset<Row>` instead.
+A DataFrame is a *Dataset* organized into named columns. It is conceptually
+equivalent to a table in a relational database or a data frame in R/Python, but with richer
+optimizations under the hood. DataFrames can be constructed from a wide array of [sources](#data-sources) such
+as: structured data files, tables in Hive, external databases, or existing RDDs.
+The DataFrame API is available in Scala,
+Java, [Python](api/python/pyspark.sql.html#pyspark.sql.DataFrame), and [R](api/R/index.html).
+In Scala and Java, a DataFrame is represented by a Dataset of `Row`s.
+In [the Scala API][scala-datasets], `DataFrame` is simply a type alias of `Dataset[Row]`.
+While, in [Java API][java-datasets], users need to use `Dataset<Row>` to represent a `DataFrame`.
 
 [scala-datasets]: api/scala/index.html#org.apache.spark.sql.Dataset
 [java-datasets]: api/java/index.html?org/apache/spark/sql/Dataset.html
 
-Python does not have support for the Dataset API, but due to its dynamic nature many of the
-benefits are already available (i.e. you can access the field of a row by name naturally
-`row.columnName`). The case for R is similar.
-
 Throughout this document, we will often refer to Scala/Java Datasets of `Row`s as DataFrames.
 
 # Getting Started
@@ -2043,14 +2049,6 @@ that these options will be deprecated in future release as more optimizations ar
   </td>
 </tr>
 <tr>
-  <td><code>spark.sql.tungsten.enabled</code></td>
-  <td>true</td>
-  <td>
-    When true, use the optimized Tungsten physical execution backend which explicitly manages memory
-    and dynamically generates bytecode for expression evaluation.
-  </td>
-</tr>
-<tr>
   <td><code>spark.sql.shuffle.partitions</code></td>
   <td>200</td>
   <td>
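

For readers skimming the diff, here is a minimal Scala sketch of the relationship the updated section describes: a strongly typed Dataset built from JVM objects alongside a DataFrame, which in the Scala API is simply the type alias Dataset[Row]. This is an illustrative assumption-laden example, not part of the commit; the Person case class, object name, and local master setting are made up for the sketch.

import org.apache.spark.sql.{DataFrame, Dataset, Row, SparkSession}

// Hypothetical record type, used only for this illustration.
case class Person(name: String, age: Long)

object DatasetDataFrameSketch {
  def main(args: Array[String]): Unit = {
    // Assumption: a local session, only so the sketch runs standalone.
    val spark = SparkSession.builder()
      .appName("dataset-dataframe-sketch")
      .master("local[*]")
      .getOrCreate()
    import spark.implicits._

    // A strongly typed Dataset[Person], constructed from JVM objects.
    val people: Dataset[Person] = Seq(Person("Ann", 30), Person("Bob", 25)).toDS()

    // In the Scala API, DataFrame is a type alias of Dataset[Row]:
    // the same data viewed as rows organized into named columns.
    val df: DataFrame = people.toDF()
    val rows: Dataset[Row] = df // compiles because DataFrame = Dataset[Row]

    // Functional transformations on the typed view ...
    people.filter(_.age > 26).map(_.name).show()
    // ... and column-based operations on the untyped view.
    rows.select("name", "age").show()

    spark.stop()
  }
}

Converting back with df.as[Person] returns to the typed view, which mirrors the guide's statement that Scala/Java DataFrames are just Datasets of Rows; in Java the same data would be handled as Dataset<Row>.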
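
The second hunk touches the options table that lists spark.sql.shuffle.partitions (default 200). As a rough sketch of how such an option can be supplied, assuming a local session and arbitrary example values chosen only for illustration:

import org.apache.spark.sql.SparkSession

object ShufflePartitionsSketch {
  def main(args: Array[String]): Unit = {
    // The value can be supplied when the session is built ...
    val spark = SparkSession.builder()
      .appName("shuffle-partitions-sketch")
      .master("local[*]")                            // assumption: local run for the sketch
      .config("spark.sql.shuffle.partitions", "64")  // 64 is an arbitrary example; the documented default is 200
      .getOrCreate()

    // ... or changed at runtime through the session's runtime config.
    spark.conf.set("spark.sql.shuffle.partitions", "32")
    println(spark.conf.get("spark.sql.shuffle.partitions")) // prints 32

    spark.stop()
  }
}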