Github user xuanyuanking commented on a diff in the pull request: https://github.com/apache/spark/pull/22746#discussion_r225794477 --- Diff: docs/sql-getting-started.md --- @@ -0,0 +1,369 @@ +--- +layout: global +title: Getting Started +displayTitle: Getting Started +--- + +* Table of contents +{:toc} + +## Starting Point: SparkSession + +<div class="codetabs"> +<div data-lang="scala" markdown="1"> + +The entry point into all functionality in Spark is the [`SparkSession`](api/scala/index.html#org.apache.spark.sql.SparkSession) class. To create a basic `SparkSession`, just use `SparkSession.builder()`: + +{% include_example init_session scala/org/apache/spark/examples/sql/SparkSQLExample.scala %} +</div> + +<div data-lang="java" markdown="1"> + +The entry point into all functionality in Spark is the [`SparkSession`](api/java/index.html#org.apache.spark.sql.SparkSession) class. To create a basic `SparkSession`, just use `SparkSession.builder()`: + +{% include_example init_session java/org/apache/spark/examples/sql/JavaSparkSQLExample.java %} +</div> + +<div data-lang="python" markdown="1"> + +The entry point into all functionality in Spark is the [`SparkSession`](api/python/pyspark.sql.html#pyspark.sql.SparkSession) class. To create a basic `SparkSession`, just use `SparkSession.builder`: + +{% include_example init_session python/sql/basic.py %} +</div> + +<div data-lang="r" markdown="1"> + +The entry point into all functionality in Spark is the [`SparkSession`](api/R/sparkR.session.html) class. To initialize a basic `SparkSession`, just call `sparkR.session()`: + +{% include_example init_session r/RSparkSQLExample.R %} + +Note that when invoked for the first time, `sparkR.session()` initializes a global `SparkSession` singleton instance, and always returns a reference to this instance for successive invocations. In this way, users only need to initialize the `SparkSession` once, then SparkR functions like `read.df` will be able to access this global instance implicitly, and users don't need to pass the `SparkSession` instance around. +</div> +</div> + +`SparkSession` in Spark 2.0 provides builtin support for Hive features including the ability to +write queries using HiveQL, access to Hive UDFs, and the ability to read data from Hive tables. +To use these features, you do not need to have an existing Hive setup. + +## Creating DataFrames + +<div class="codetabs"> +<div data-lang="scala" markdown="1"> +With a `SparkSession`, applications can create DataFrames from an [existing `RDD`](#interoperating-with-rdds), +from a Hive table, or from [Spark data sources](#data-sources). --- End diff -- Done in 58115e5, also fix link in ml-pipeline.md\sparkr.md\structured-streaming-programming-guide.md
--- --------------------------------------------------------------------- To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org