Repository: spark
Updated Branches:
  refs/heads/branch-1.4 152f4465d -> bd9bbd611


[SPARK-8462] [DOCS] Documentation fixes for Spark SQL

This fixes various minor documentation issues on the Spark SQL page

Author: Lars Francke <lars.fran...@gmail.com>

Closes #6890 from lfrancke/SPARK-8462 and squashes the following commits:

dd7e302 [Lars Francke] Merge branch 'master' into SPARK-8462
34eff2c [Lars Francke] Minor documentation fixes

(cherry picked from commit 4ce3bab89f6bdf6208fdad2fbfaba0b53d1954e3)
Signed-off-by: Josh Rosen <joshro...@databricks.com>


Project: http://git-wip-us.apache.org/repos/asf/spark/repo
Commit: http://git-wip-us.apache.org/repos/asf/spark/commit/bd9bbd61
Tree: http://git-wip-us.apache.org/repos/asf/spark/tree/bd9bbd61
Diff: http://git-wip-us.apache.org/repos/asf/spark/diff/bd9bbd61

Branch: refs/heads/branch-1.4
Commit: bd9bbd61197ae7164ed93e70f00e82d832902404
Parents: 152f446
Author: Lars Francke <lars.fran...@gmail.com>
Authored: Thu Jun 18 19:40:32 2015 -0700
Committer: Josh Rosen <joshro...@databricks.com>
Committed: Thu Jun 18 19:40:55 2015 -0700

----------------------------------------------------------------------
 docs/sql-programming-guide.md | 28 ++++++++++++++--------------
 1 file changed, 14 insertions(+), 14 deletions(-)
----------------------------------------------------------------------


http://git-wip-us.apache.org/repos/asf/spark/blob/bd9bbd61/docs/sql-programming-guide.md
----------------------------------------------------------------------
diff --git a/docs/sql-programming-guide.md b/docs/sql-programming-guide.md
index cbcee8b..572c678 100644
--- a/docs/sql-programming-guide.md
+++ b/docs/sql-programming-guide.md
@@ -819,8 +819,8 @@ saveDF(select(df, "name", "age"), "namesAndAges.parquet")
 
 You can also manually specify the data source that will be used along with any extra options
 that you would like to pass to the data source.  Data sources are specified by their fully qualified
-name (i.e., `org.apache.spark.sql.parquet`), but for built-in sources you can also use the shorted
-name (`json`, `parquet`, `jdbc`).  DataFrames of any type can be converted into other types
+name (i.e., `org.apache.spark.sql.parquet`), but for built-in sources you can also use their short
+names (`json`, `parquet`, `jdbc`).  DataFrames of any type can be converted into other types
 using this syntax.
 
 <div class="codetabs">
@@ -828,7 +828,7 @@ using this syntax.
 
 {% highlight scala %}
 val df = sqlContext.read.format("json").load("examples/src/main/resources/people.json")
-df.select("name", "age").write.format("json").save("namesAndAges.parquet")
+df.select("name", "age").write.format("json").save("namesAndAges.json")
 {% endhighlight %}
 
 </div>
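
For context, a minimal Scala sketch of the same call chain using a fully qualified source name instead of a short name (the paths and the `sqlContext` value are assumed, as in the guide's other examples):

{% highlight scala %}
// Fully qualified source name; equivalent to the short name "parquet".
// The input path is illustrative only.
val users = sqlContext.read
  .format("org.apache.spark.sql.parquet")
  .load("examples/src/main/resources/users.parquet")
users.select("name").write.format("json").save("names.json")
{% endhighlight %}
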
@@ -975,7 +975,7 @@ schemaPeople.write().parquet("people.parquet");
 // The result of loading a parquet file is also a DataFrame.
 DataFrame parquetFile = sqlContext.read().parquet("people.parquet");
 
-//Parquet files can also be registered as tables and then used in SQL statements.
+// Parquet files can also be registered as tables and then used in SQL statements.
 parquetFile.registerTempTable("parquetFile");
 DataFrame teenagers = sqlContext.sql("SELECT name FROM parquetFile WHERE age >= 13 AND age <= 19");
 List<String> teenagerNames = teenagers.javaRDD().map(new Function<Row, String>() {
@@ -1059,7 +1059,7 @@ SELECT * FROM parquetTable
 Table partitioning is a common optimization approach used in systems like Hive.  In a partitioned
 table, data are usually stored in different directories, with partitioning column values encoded in
 the path of each partition directory.  The Parquet data source is now able to discover and infer
-partitioning information automatically.  For exmaple, we can store all our previously used
+partitioning information automatically.  For example, we can store all our previously used
 population data into a partitioned table using the following directory structure, with two extra
 columns, `gender` and `country` as partitioning columns:
 
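A short Scala sketch of the partition-discovery behaviour described above, assuming a `people` DataFrame with `gender` and `country` columns and an illustrative output path:

{% highlight scala %}
// partitionBy writes .../gender=.../country=... subdirectories; reading the
// root path back lets the Parquet source rediscover both partition columns.
people.write.partitionBy("gender", "country").parquet("data/population")
val partitioned = sqlContext.read.parquet("data/population")
partitioned.printSchema()  // schema now includes gender and country
{% endhighlight %}
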
@@ -1121,12 +1121,12 @@ source is now able to automatically detect this case and merge schemas of all th
 import sqlContext.implicits._
 
 // Create a simple DataFrame, stored into a partition directory
-val df1 = sparkContext.makeRDD(1 to 5).map(i => (i, i * 2)).toDF("single", "double")
+val df1 = sc.makeRDD(1 to 5).map(i => (i, i * 2)).toDF("single", "double")
 df1.write.parquet("data/test_table/key=1")
 
 // Create another DataFrame in a new partition directory,
 // adding a new column and dropping an existing column
-val df2 = sparkContext.makeRDD(6 to 10).map(i => (i, i * 3)).toDF("single", "triple")
+val df2 = sc.makeRDD(6 to 10).map(i => (i, i * 3)).toDF("single", "triple")
 df2.write.parquet("data/test_table/key=2")
 
 // Read the partitioned table
@@ -1134,7 +1134,7 @@ val df3 = sqlContext.read.parquet("data/test_table")
 df3.printSchema()
 
 // The final schema consists of all 3 columns in the Parquet files together
-// with the partiioning column appeared in the partition directory paths.
+// with the partitioning column appeared in the partition directory paths.
 // root
 // |-- single: int (nullable = true)
 // |-- double: int (nullable = true)
@@ -1165,7 +1165,7 @@ df3 = sqlContext.load("data/test_table", "parquet")
 df3.printSchema()
 
 # The final schema consists of all 3 columns in the Parquet files together
-# with the partiioning column appeared in the partition directory paths.
+# with the partitioning column appeared in the partition directory paths.
 # root
 # |-- single: int (nullable = true)
 # |-- double: int (nullable = true)
@@ -1192,7 +1192,7 @@ df3 <- loadDF(sqlContext, "data/test_table", "parquet")
 printSchema(df3)
 
 # The final schema consists of all 3 columns in the Parquet files together
-# with the partiioning column appeared in the partition directory paths.
+# with the partitioning column appeared in the partition directory paths.
 # root
 # |-- single: int (nullable = true)
 # |-- double: int (nullable = true)
@@ -1249,7 +1249,7 @@ Configuration of Parquet can be done using the `setConf` method on `SQLContext`
   <td>false</td>
   <td>
     Turn on Parquet filter pushdown optimization. This feature is turned off by default because of a known
-    bug in Paruet 1.6.0rc3 (<a href="https://issues.apache.org/jira/browse/PARQUET-136">PARQUET-136</a>).
+    bug in Parquet 1.6.0rc3 (<a href="https://issues.apache.org/jira/browse/PARQUET-136">PARQUET-136</a>).
     However, if your table doesn't contain any nullable string or binary columns, it's still safe to turn
     this feature on.
   </td>
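
As the table notes, the flag can be flipped through `setConf`; a one-line sketch (Scala, assuming the usual `sqlContext`):

{% highlight scala %}
// Opt in to Parquet filter pushdown, which is off by default in this release.
sqlContext.setConf("spark.sql.parquet.filterPushdown", "true")
{% endhighlight %}
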
@@ -1398,7 +1398,7 @@ sqlContext <- sparkRSQL.init(sc)
 # The path can be either a single text file or a directory storing text files.
 path <- "examples/src/main/resources/people.json"
 # Create a DataFrame from the file(s) pointed to by path
-people <- jsonFile(sqlContex,t path)
+people <- jsonFile(sqlContext, path)
 
 # The inferred schema can be visualized using the printSchema() method.
 printSchema(people)
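
For comparison, the same JSON load in Scala (a sketch; the path matches the R snippet):

{% highlight scala %}
// read.json accepts a single JSON-lines file or a directory of such files.
val people = sqlContext.read.json("examples/src/main/resources/people.json")
people.printSchema()
{% endhighlight %}
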
@@ -1470,7 +1470,7 @@ sqlContext.sql("FROM src SELECT key, value").collect().foreach(println)
 
 When working with Hive one must construct a `HiveContext`, which inherits from `SQLContext`, and
 adds support for finding tables in the MetaStore and writing queries using HiveQL. In addition to
-the `sql` method a `HiveContext` also provides an `hql` methods, which allows queries to be
+the `sql` method a `HiveContext` also provides an `hql` method, which allows queries to be
 expressed in HiveQL.
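
A minimal Scala sketch of constructing a `HiveContext` as described (assumes an existing `SparkContext` named `sc`):

{% highlight scala %}
import org.apache.spark.sql.hive.HiveContext

// HiveContext inherits from SQLContext and understands HiveQL.
val hiveContext = new HiveContext(sc)
hiveContext.sql("FROM src SELECT key, value").collect().foreach(println)
{% endhighlight %}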
 
 {% highlight java %}
@@ -2766,7 +2766,7 @@ from pyspark.sql.types import *
 </tr>
 <tr>
   <td> <b>MapType</b> </td>
-  <td> enviroment </td>
+  <td> environment </td>
   <td>
  list(type="map", keyType=<i>keyType</i>, valueType=<i>valueType</i>, valueContainsNull=[<i>valueContainsNull</i>])<br />
   <b>Note:</b> The default value of <i>valueContainsNull</i> is <i>True</i>.
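
For reference, the corresponding type in Scala (a sketch; the table row above covers the R representation):

{% highlight scala %}
import org.apache.spark.sql.types.{IntegerType, MapType, StringType}

// valueContainsNull defaults to true, matching the note above.
val defaultMap = MapType(StringType, IntegerType)
val strictMap  = MapType(StringType, IntegerType, valueContainsNull = false)
{% endhighlight %}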

