[GitHub] spark issue #13653: [SPARK-15933][SQL][STREAMING] Refactored DF reader-write...

2016-06-13 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/13653
  
**[Test build #60452 has 
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/60452/consoleFull)**
 for PR 13653 at commit 
[`a59498b`](https://github.com/apache/spark/commit/a59498bbe86609bc206de9b229052f35071049cb).
 * This patch **fails PySpark unit tests**.
 * This patch merges cleanly.
 * This patch adds no public classes.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request #13655: [SPARK-15935][PySpark]Enable test for sql/streami...

2016-06-13 Thread brkyvz
Github user brkyvz commented on a diff in the pull request:

https://github.com/apache/spark/pull/13655#discussion_r66895475
  
--- Diff: dev/sparktestsupport/modules.py ---
@@ -337,6 +337,7 @@ def __hash__(self):
 "pyspark.sql.group",
 "pyspark.sql.functions",
 "pyspark.sql.readwriter",
+"pyspark.sql.streaming",
--- End diff --

This is pretty significant. I had no idea this file existed. I wonder if we could add a check to the linter that catches new Python files that are not also added to this file? It wouldn't need to be a blocker, just an advisory message.
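
A rough sketch of such a check, written here in Scala purely for illustration (Spark's real lint tooling under `dev/` is Python and shell, and the object and helper names below are hypothetical, not part of the project):

```scala
import java.io.File
import scala.io.Source

object PySparkModuleCheck {
  // Collect dotted module names for every non-private .py file under `dir`.
  def pythonModules(dir: File, prefix: String): Seq[String] =
    Option(dir.listFiles()).getOrElse(Array.empty[File]).toSeq.flatMap {
      case d if d.isDirectory => pythonModules(d, s"$prefix${d.getName}.")
      case f if f.getName.endsWith(".py") && !f.getName.startsWith("_") =>
        Seq(prefix + f.getName.stripSuffix(".py"))
      case _ => Seq.empty[String]
    }

  def main(args: Array[String]): Unit = {
    // Run from the repository root: warn about modules that modules.py never mentions.
    val registered = Source.fromFile("dev/sparktestsupport/modules.py").mkString
    val unregistered = pythonModules(new File("python/pyspark"), "pyspark.")
      .filterNot(m => registered.contains("\"" + m + "\""))
    unregistered.sorted.foreach { m =>
      println(s"advisory: $m is not listed in dev/sparktestsupport/modules.py")
    }
  }
}
```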





[GitHub] spark issue #13037: [SPARK-1301] [Web UI] Added anchor links to Accumulators...

2016-06-13 Thread kayousterhout
Github user kayousterhout commented on the issue:

https://github.com/apache/spark/pull/13037
  
The screenshot looks great to me (haven't looked at the code yet) -- 
@andrewor14 / @srowen, thoughts?





[GitHub] spark issue #13638: [SPARK-15915][SQL] CacheManager should use canonicalized...

2016-06-13 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/13638
  
**[Test build #60458 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/60458/consoleFull)**
 for PR 13638 at commit 
[`3d27607`](https://github.com/apache/spark/commit/3d27607982a3794c438221153b8e078389da146b).





[GitHub] spark issue #13620: [SPARK-15590] [WEBUI] Paginate Job Table in Jobs tab

2016-06-13 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/13620
  
Test FAILed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/60444/
Test FAILed.





[GitHub] spark issue #13620: [SPARK-15590] [WEBUI] Paginate Job Table in Jobs tab

2016-06-13 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/13620
  
Merged build finished. Test FAILed.





[GitHub] spark pull request #13561: [SPARK-15824][SQL] Run 'with ... insert ... selec...

2016-06-13 Thread hvanhovell
Github user hvanhovell commented on a diff in the pull request:

https://github.com/apache/spark/pull/13561#discussion_r66895096
  
--- Diff: sql/hive-thriftserver/src/main/scala/org/apache/spark/sql/hive/thriftserver/SparkExecuteStatementOperation.scala ---
@@ -225,7 +225,10 @@ private[hive] class SparkExecuteStatementOperation(
 if (useIncrementalCollect) {
   result.toLocalIterator.asScala
 } else {
-  result.collect().iterator
+  sqlContext.sessionState.executePlan(result.logicalPlan).executedPlan match {
--- End diff --

use `result.queryExecution.executedPlan`?
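
As a minimal sketch of this suggestion (assuming a local `SparkSession`; the surrounding thrift-server code is not reproduced here): a Dataset already carries its `QueryExecution`, so the executed plan can be read from it directly instead of re-planning the logical plan.

```scala
import org.apache.spark.sql.SparkSession

object ExecutedPlanSketch {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder().master("local[1]").appName("executed-plan").getOrCreate()
    val result = spark.range(10).toDF("id")

    // Equivalent to sessionState.executePlan(result.logicalPlan).executedPlan,
    // but reuses the QueryExecution the Dataset already holds.
    val executedPlan = result.queryExecution.executedPlan
    println(executedPlan)

    spark.stop()
  }
}
```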





[GitHub] spark issue #13620: [SPARK-15590] [WEBUI] Paginate Job Table in Jobs tab

2016-06-13 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/13620
  
**[Test build #60444 has 
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/60444/consoleFull)**
 for PR 13620 at commit 
[`4db0e09`](https://github.com/apache/spark/commit/4db0e0978cd2cc57814733e6a6360407f10cb37b).
 * This patch **fails Spark unit tests**.
 * This patch merges cleanly.
 * This patch adds no public classes.





[GitHub] spark pull request #13638: [SPARK-15915][SQL] CacheManager should use canoni...

2016-06-13 Thread ueshin
Github user ueshin commented on a diff in the pull request:

https://github.com/apache/spark/pull/13638#discussion_r66895037
  
--- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/CacheManager.scala ---
@@ -87,7 +87,7 @@ private[sql] class CacheManager extends Logging {
   query: Dataset[_],
   tableName: Option[String] = None,
   storageLevel: StorageLevel = MEMORY_AND_DISK): Unit = writeLock {
-val planToCache = query.queryExecution.analyzed
+val planToCache = query.queryExecution.analyzed.canonicalized
--- End diff --

It seems we don't need them for now.
I'll revert the changes.
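
A small self-contained sketch of what canonicalized plan comparison buys a cache lookup (assuming a local `SparkSession`; this only illustrates the plan comparison, not `CacheManager` itself):

```scala
import org.apache.spark.sql.SparkSession

object CanonicalizedPlanSketch {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder().master("local[1]").appName("canonicalized").getOrCreate()

    // Two independently built but semantically identical queries get different
    // expression IDs, so plain equality of the analyzed plans typically fails...
    val p1 = spark.range(100).filter("id > 5").queryExecution.analyzed
    val p2 = spark.range(100).filter("id > 5").queryExecution.analyzed
    println(p1 == p2)          // typically false: attribute IDs differ

    // ...while sameResult compares the plans in a canonical form and should
    // recognise them as equivalent, which is what a cache lookup needs.
    println(p1.sameResult(p2))

    spark.stop()
  }
}
```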





[GitHub] spark issue #13655: [SPARK-15935][PySpark]Enable test for sql/streaming.py a...

2016-06-13 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/13655
  
**[Test build #60457 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/60457/consoleFull)**
 for PR 13655 at commit 
[`0b15d88`](https://github.com/apache/spark/commit/0b15d88b0cde790e9aa58eb34aa22c57642b8886).





[GitHub] spark issue #13632: [SPARK-15910][SQL] Check schema consistency when using K...

2016-06-13 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/13632
  
Test PASSed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/60443/
Test PASSed.





[GitHub] spark pull request #13620: [SPARK-15590] [WEBUI] Paginate Job Table in Jobs ...

2016-06-13 Thread nblintao
Github user nblintao commented on a diff in the pull request:

https://github.com/apache/spark/pull/13620#discussion_r66894846
  
--- Diff: core/src/main/scala/org/apache/spark/ui/jobs/AllJobsPage.scala ---
@@ -369,3 +361,246 @@ private[ui] class AllJobsPage(parent: JobsTab) extends WebUIPage("") {
 }
   }
 }
+
+private[ui] class JobTableRowData(
+val jobData: JobUIData,
+val lastStageName: String,
+val lastStageDescription: String,
+val duration: Long,
+val formattedDuration: String,
+val submissionTime: Long,
+val formattedSubmissionTime: String,
+val jobDescription: NodeSeq,
+val detailUrl: String)
+
+private[ui] class JobDataSource(
+jobs: Seq[JobUIData],
+stageIdToInfo: HashMap[Int, StageInfo],
+stageIdToData: HashMap[(Int, Int), StageUIData],
+basePath: String,
+currentTime: Long,
+pageSize: Int,
+sortColumn: String,
+desc: Boolean) extends PagedDataSource[JobTableRowData](pageSize) {
+
+  // Convert JobUIData to JobTableRowData which contains the final contents to show in the table
+  // so that we can avoid creating duplicate contents during sorting the data
+  private val data = jobs.map(jobRow).sorted(ordering(sortColumn, desc))
+
+  private var _slicedJobIds: Set[Int] = null
+
+  override def dataSize: Int = data.size
+
+  override def sliceData(from: Int, to: Int): Seq[JobTableRowData] = {
+val r = data.slice(from, to)
+_slicedJobIds = r.map(_.jobData.jobId).toSet
+r
+  }
+
+  def slicedJobIds: Set[Int] = _slicedJobIds
--- End diff --

Sorry, it's deprecated. I will remove it. Thanks.
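
For readers skimming the quoted diff, here is a stripped-down sketch of the sort-once, slice-per-page pattern it introduces (the class and method names are illustrative, not the actual Spark UI classes):

```scala
// Illustrative only: rows are converted and sorted once, and each page is a slice.
class SimplePagedDataSource[T](rows: Seq[T], pageSize: Int, ordering: Ordering[T]) {
  private val sorted = rows.sorted(ordering)

  def dataSize: Int = sorted.size

  def pageData(page: Int): Seq[T] = {
    val from = (page - 1) * pageSize
    sorted.slice(from, from + pageSize)
  }
}

object PagedDataSourceSketch extends App {
  // Page 2 of ten job IDs sorted descending.
  val source = new SimplePagedDataSource[Int](1 to 10, 3, Ordering.Int.reverse)
  println(source.pageData(2)) // 7, 6, 5
}
```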





[GitHub] spark issue #13632: [SPARK-15910][SQL] Check schema consistency when using K...

2016-06-13 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/13632
  
Merged build finished. Test PASSed.





[GitHub] spark issue #13655: [SPARK-15935][PySpark]Enable test for sql/streaming.py a...

2016-06-13 Thread zsxwing
Github user zsxwing commented on the issue:

https://github.com/apache/spark/pull/13655
  
/cc @tdas @brkyvz 





[GitHub] spark pull request #13655: [SPARK-15935][PySpark]Enable test for sql/streami...

2016-06-13 Thread zsxwing
GitHub user zsxwing opened a pull request:

https://github.com/apache/spark/pull/13655

[SPARK-15935][PySpark]Enable test for sql/streaming.py and fix these tests

## What changes were proposed in this pull request?

This PR just enables tests for sql/streaming.py and also fixes the failures.

## How was this patch tested?

Existing unit tests.






You can merge this pull request into a Git repository by running:

$ git pull https://github.com/zsxwing/spark python-streaming-test

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/spark/pull/13655.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #13655


commit 0b15d88b0cde790e9aa58eb34aa22c57642b8886
Author: Shixiong Zhu 
Date:   2016-06-14T01:00:13Z

Enable test for sql/streaming.py and fix these tests







[GitHub] spark issue #13632: [SPARK-15910][SQL] Check schema consistency when using K...

2016-06-13 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/13632
  
**[Test build #60443 has 
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/60443/consoleFull)**
 for PR 13632 at commit 
[`e889f85`](https://github.com/apache/spark/commit/e889f85165acbb1d685e70c959abe04955d24a17).
 * This patch passes all tests.
 * This patch merges cleanly.
 * This patch adds no public classes.





[GitHub] spark pull request #13592: [SPARK-15863][SQL][DOC] Initial SQL programming g...

2016-06-13 Thread clockfly
Github user clockfly commented on a diff in the pull request:

https://github.com/apache/spark/pull/13592#discussion_r66894492
  
--- Diff: docs/sql-programming-guide.md ---
@@ -12,130 +12,130 @@ title: Spark SQL and DataFrames
 Spark SQL is a Spark module for structured data processing. Unlike the 
basic Spark RDD API, the interfaces provided
 by Spark SQL provide Spark with more information about the structure of 
both the data and the computation being performed. Internally,
 Spark SQL uses this extra information to perform extra optimizations. 
There are several ways to
-interact with Spark SQL including SQL, the DataFrames API and the Datasets 
API. When computing a result
+interact with Spark SQL including SQL and the Datasets API. When computing 
a result
 the same execution engine is used, independent of which API/language you 
are using to express the
-computation. This unification means that developers can easily switch back 
and forth between the
-various APIs based on which provides the most natural way to express a 
given transformation.
+computation. This unification means that developers can easily switch back 
and forth between
+different APIs based on which provides the most natural way to express a 
given transformation.
 
 All of the examples on this page use sample data included in the Spark 
distribution and can be run in
 the `spark-shell`, `pyspark` shell, or `sparkR` shell.
 
 ## SQL
 
-One use of Spark SQL is to execute SQL queries written using either a 
basic SQL syntax or HiveQL.
+One use of Spark SQL is to execute SQL queries.
 Spark SQL can also be used to read data from an existing Hive 
installation. For more on how to
 configure this feature, please refer to the [Hive Tables](#hive-tables) 
section. When running
-SQL from within another programming language the results will be returned 
as a [DataFrame](#DataFrames).
+SQL from within another programming language the results will be returned 
as a [Dataset\[Row\]](#datasets).
 You can also interact with the SQL interface using the 
[command-line](#running-the-spark-sql-cli)
 or over [JDBC/ODBC](#running-the-thrift-jdbcodbc-server).
 
-## DataFrames
+## Datasets and DataFrames
 
-A DataFrame is a distributed collection of data organized into named 
columns. It is conceptually
-equivalent to a table in a relational database or a data frame in 
R/Python, but with richer
-optimizations under the hood. DataFrames can be constructed from a wide 
array of [sources](#data-sources) such
-as: structured data files, tables in Hive, external databases, or existing 
RDDs.
+A Dataset is a new interface added in Spark 1.6 that tries to provide the 
benefits of RDDs (strong
+typing, ability to use powerful lambda functions) with the benefits of 
Spark SQL's optimized
+execution engine. A Dataset can be [constructed](#creating-datasets) from 
JVM objects and then
+manipulated using functional transformations (map, flatMap, filter, etc.).
 
-The DataFrame API is available in 
[Scala](api/scala/index.html#org.apache.spark.sql.DataFrame),
-[Java](api/java/index.html?org/apache/spark/sql/DataFrame.html),
-[Python](api/python/pyspark.sql.html#pyspark.sql.DataFrame), and 
[R](api/R/index.html).
+The Dataset API is the successor of the DataFrame API, which was 
introduced in Spark 1.3. In Spark
+2.0, Datasets and DataFrames are unified, and DataFrames are now 
equivalent to Datasets of `Row`s.
+In fact, `DataFrame` is simply a type alias of `Dataset[Row]` in [the 
Scala API][scala-datasets].
+However, [Java API][java-datasets] users must use `Dataset` instead.
 
-## Datasets
+[scala-datasets]: api/scala/index.html#org.apache.spark.sql.Dataset
+[java-datasets]: api/java/index.html?org/apache/spark/sql/Dataset.html
 
-A Dataset is a new experimental interface added in Spark 1.6 that tries to 
provide the benefits of
-RDDs (strong typing, ability to use powerful lambda functions) with the 
benefits of Spark SQL's
-optimized execution engine. A Dataset can be 
[constructed](#creating-datasets) from JVM objects and then manipulated
-using functional transformations (map, flatMap, filter, etc.).
+Python does not have support for the Dataset API, but due to its dynamic 
nature many of the
+benefits are already available (i.e. you can access the field of a row by 
name naturally
+`row.columnName`). The case for R is similar.
 
-The unified Dataset API can be used both in 
[Scala](api/scala/index.html#org.apache.spark.sql.Dataset) and
-[Java](api/java/index.html?org/apache/spark/sql/Dataset.html). Python does 
not yet have support for
-the Dataset API, but due to its dynamic nature many of the benefits are 
already available (i.e. you can
-access the field of a row by name naturally `row.columnName`). 

[GitHub] spark issue #12938: [SPARK-15162][SPARK-15164][PySpark][DOCS][ML] update som...

2016-06-13 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/12938
  
**[Test build #60456 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/60456/consoleFull)**
 for PR 12938 at commit 
[`d925f38`](https://github.com/apache/spark/commit/d925f3861ac96ddb09682b863160b18adcb4473d).





[GitHub] spark pull request #13592: [SPARK-15863][SQL][DOC] Initial SQL programming g...

2016-06-13 Thread clockfly
Github user clockfly commented on a diff in the pull request:

https://github.com/apache/spark/pull/13592#discussion_r66894292
  
--- Diff: docs/sql-programming-guide.md ---
@@ -12,130 +12,130 @@ title: Spark SQL and DataFrames
 Spark SQL is a Spark module for structured data processing. Unlike the 
basic Spark RDD API, the interfaces provided
 by Spark SQL provide Spark with more information about the structure of 
both the data and the computation being performed. Internally,
 Spark SQL uses this extra information to perform extra optimizations. 
There are several ways to
-interact with Spark SQL including SQL, the DataFrames API and the Datasets 
API. When computing a result
+interact with Spark SQL including SQL and the Datasets API. When computing 
a result
 the same execution engine is used, independent of which API/language you 
are using to express the
-computation. This unification means that developers can easily switch back 
and forth between the
-various APIs based on which provides the most natural way to express a 
given transformation.
+computation. This unification means that developers can easily switch back 
and forth between
+different APIs based on which provides the most natural way to express a 
given transformation.
 
 All of the examples on this page use sample data included in the Spark 
distribution and can be run in
 the `spark-shell`, `pyspark` shell, or `sparkR` shell.
 
 ## SQL
 
-One use of Spark SQL is to execute SQL queries written using either a 
basic SQL syntax or HiveQL.
+One use of Spark SQL is to execute SQL queries.
 Spark SQL can also be used to read data from an existing Hive 
installation. For more on how to
 configure this feature, please refer to the [Hive Tables](#hive-tables) 
section. When running
-SQL from within another programming language the results will be returned 
as a [DataFrame](#DataFrames).
+SQL from within another programming language the results will be returned 
as a [Dataset\[Row\]](#datasets).
 You can also interact with the SQL interface using the 
[command-line](#running-the-spark-sql-cli)
 or over [JDBC/ODBC](#running-the-thrift-jdbcodbc-server).
 
-## DataFrames
+## Datasets and DataFrames
 
-A DataFrame is a distributed collection of data organized into named 
columns. It is conceptually
-equivalent to a table in a relational database or a data frame in 
R/Python, but with richer
-optimizations under the hood. DataFrames can be constructed from a wide 
array of [sources](#data-sources) such
-as: structured data files, tables in Hive, external databases, or existing 
RDDs.
+A Dataset is a new interface added in Spark 1.6 that tries to provide the 
benefits of RDDs (strong
+typing, ability to use powerful lambda functions) with the benefits of 
Spark SQL's optimized
+execution engine. A Dataset can be [constructed](#creating-datasets) from 
JVM objects and then
+manipulated using functional transformations (map, flatMap, filter, etc.).
 
-The DataFrame API is available in 
[Scala](api/scala/index.html#org.apache.spark.sql.DataFrame),
-[Java](api/java/index.html?org/apache/spark/sql/DataFrame.html),
-[Python](api/python/pyspark.sql.html#pyspark.sql.DataFrame), and 
[R](api/R/index.html).
+The Dataset API is the successor of the DataFrame API, which was 
introduced in Spark 1.3. In Spark
+2.0, Datasets and DataFrames are unified, and DataFrames are now 
equivalent to Datasets of `Row`s.
+In fact, `DataFrame` is simply a type alias of `Dataset[Row]` in [the 
Scala API][scala-datasets].
+However, [Java API][java-datasets] users must use `Dataset` instead.
 
-## Datasets
+[scala-datasets]: api/scala/index.html#org.apache.spark.sql.Dataset
+[java-datasets]: api/java/index.html?org/apache/spark/sql/Dataset.html
 
-A Dataset is a new experimental interface added in Spark 1.6 that tries to 
provide the benefits of
-RDDs (strong typing, ability to use powerful lambda functions) with the 
benefits of Spark SQL's
-optimized execution engine. A Dataset can be 
[constructed](#creating-datasets) from JVM objects and then manipulated
-using functional transformations (map, flatMap, filter, etc.).
+Python does not have support for the Dataset API, but due to its dynamic 
nature many of the
+benefits are already available (i.e. you can access the field of a row by 
name naturally
+`row.columnName`). The case for R is similar.
 
-The unified Dataset API can be used both in 
[Scala](api/scala/index.html#org.apache.spark.sql.Dataset) and
-[Java](api/java/index.html?org/apache/spark/sql/Dataset.html). Python does 
not yet have support for
-the Dataset API, but due to its dynamic nature many of the benefits are 
already available (i.e. you can
-access the field of a row by name naturally `row.columnName`). 

[GitHub] spark issue #13563: [SPARK-15826] [CORE] PipedRDD to allow configurable char...

2016-06-13 Thread tejasapatil
Github user tejasapatil commented on the issue:

https://github.com/apache/spark/pull/13563
  
Jenkins retest please





[GitHub] spark pull request #13592: [SPARK-15863][SQL][DOC] Initial SQL programming g...

2016-06-13 Thread clockfly
Github user clockfly commented on a diff in the pull request:

https://github.com/apache/spark/pull/13592#discussion_r66894132
  
--- Diff: docs/sql-programming-guide.md ---
@@ -12,130 +12,130 @@ title: Spark SQL and DataFrames
 Spark SQL is a Spark module for structured data processing. Unlike the 
basic Spark RDD API, the interfaces provided
 by Spark SQL provide Spark with more information about the structure of 
both the data and the computation being performed. Internally,
 Spark SQL uses this extra information to perform extra optimizations. 
There are several ways to
-interact with Spark SQL including SQL, the DataFrames API and the Datasets 
API. When computing a result
+interact with Spark SQL including SQL and the Datasets API. When computing 
a result
 the same execution engine is used, independent of which API/language you 
are using to express the
-computation. This unification means that developers can easily switch back 
and forth between the
-various APIs based on which provides the most natural way to express a 
given transformation.
+computation. This unification means that developers can easily switch back 
and forth between
+different APIs based on which provides the most natural way to express a 
given transformation.
 
 All of the examples on this page use sample data included in the Spark 
distribution and can be run in
 the `spark-shell`, `pyspark` shell, or `sparkR` shell.
 
 ## SQL
 
-One use of Spark SQL is to execute SQL queries written using either a 
basic SQL syntax or HiveQL.
+One use of Spark SQL is to execute SQL queries.
 Spark SQL can also be used to read data from an existing Hive 
installation. For more on how to
 configure this feature, please refer to the [Hive Tables](#hive-tables) 
section. When running
-SQL from within another programming language the results will be returned 
as a [DataFrame](#DataFrames).
+SQL from within another programming language the results will be returned 
as a [Dataset\[Row\]](#datasets).
 You can also interact with the SQL interface using the 
[command-line](#running-the-spark-sql-cli)
 or over [JDBC/ODBC](#running-the-thrift-jdbcodbc-server).
 
-## DataFrames
+## Datasets and DataFrames
 
-A DataFrame is a distributed collection of data organized into named 
columns. It is conceptually
-equivalent to a table in a relational database or a data frame in 
R/Python, but with richer
-optimizations under the hood. DataFrames can be constructed from a wide 
array of [sources](#data-sources) such
-as: structured data files, tables in Hive, external databases, or existing 
RDDs.
+A Dataset is a new interface added in Spark 1.6 that tries to provide the 
benefits of RDDs (strong
+typing, ability to use powerful lambda functions) with the benefits of 
Spark SQL's optimized
+execution engine. A Dataset can be [constructed](#creating-datasets) from 
JVM objects and then
+manipulated using functional transformations (map, flatMap, filter, etc.).
 
-The DataFrame API is available in 
[Scala](api/scala/index.html#org.apache.spark.sql.DataFrame),
-[Java](api/java/index.html?org/apache/spark/sql/DataFrame.html),
-[Python](api/python/pyspark.sql.html#pyspark.sql.DataFrame), and 
[R](api/R/index.html).
+The Dataset API is the successor of the DataFrame API, which was 
introduced in Spark 1.3. In Spark
+2.0, Datasets and DataFrames are unified, and DataFrames are now 
equivalent to Datasets of `Row`s.
+In fact, `DataFrame` is simply a type alias of `Dataset[Row]` in [the 
Scala API][scala-datasets].
+However, [Java API][java-datasets] users must use `Dataset` instead.
 
-## Datasets
+[scala-datasets]: api/scala/index.html#org.apache.spark.sql.Dataset
+[java-datasets]: api/java/index.html?org/apache/spark/sql/Dataset.html
 
-A Dataset is a new experimental interface added in Spark 1.6 that tries to 
provide the benefits of
-RDDs (strong typing, ability to use powerful lambda functions) with the 
benefits of Spark SQL's
-optimized execution engine. A Dataset can be 
[constructed](#creating-datasets) from JVM objects and then manipulated
-using functional transformations (map, flatMap, filter, etc.).
+Python does not have support for the Dataset API, but due to its dynamic 
nature many of the
+benefits are already available (i.e. you can access the field of a row by 
name naturally
+`row.columnName`). The case for R is similar.
 
-The unified Dataset API can be used both in 
[Scala](api/scala/index.html#org.apache.spark.sql.Dataset) and
-[Java](api/java/index.html?org/apache/spark/sql/Dataset.html). Python does 
not yet have support for
-the Dataset API, but due to its dynamic nature many of the benefits are 
already available (i.e. you can
-access the field of a row by name naturally `row.columnName`). 

[GitHub] spark issue #13654: [SPARK-15868] [Web UI] Executors table in Executors tab ...

2016-06-13 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/13654
  
**[Test build #60455 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/60455/consoleFull)**
 for PR 13654 at commit 
[`35b280a`](https://github.com/apache/spark/commit/35b280a1882fd3f5ae34c71cb00a12bf12852a90).





[GitHub] spark issue #13648: [SPARK-15932][SQL][DOC] document the contract of encoder...

2016-06-13 Thread hvanhovell
Github user hvanhovell commented on the issue:

https://github.com/apache/spark/pull/13648
  
LGTM





[GitHub] spark pull request #13592: [SPARK-15863][SQL][DOC] Initial SQL programming g...

2016-06-13 Thread clockfly
Github user clockfly commented on a diff in the pull request:

https://github.com/apache/spark/pull/13592#discussion_r66893943
  
--- Diff: docs/sql-programming-guide.md ---
@@ -12,130 +12,130 @@ title: Spark SQL and DataFrames
 Spark SQL is a Spark module for structured data processing. Unlike the 
basic Spark RDD API, the interfaces provided
 by Spark SQL provide Spark with more information about the structure of 
both the data and the computation being performed. Internally,
 Spark SQL uses this extra information to perform extra optimizations. 
There are several ways to
-interact with Spark SQL including SQL, the DataFrames API and the Datasets 
API. When computing a result
+interact with Spark SQL including SQL and the Datasets API. When computing 
a result
 the same execution engine is used, independent of which API/language you 
are using to express the
-computation. This unification means that developers can easily switch back 
and forth between the
-various APIs based on which provides the most natural way to express a 
given transformation.
+computation. This unification means that developers can easily switch back 
and forth between
+different APIs based on which provides the most natural way to express a 
given transformation.
 
 All of the examples on this page use sample data included in the Spark 
distribution and can be run in
 the `spark-shell`, `pyspark` shell, or `sparkR` shell.
 
 ## SQL
 
-One use of Spark SQL is to execute SQL queries written using either a 
basic SQL syntax or HiveQL.
+One use of Spark SQL is to execute SQL queries.
 Spark SQL can also be used to read data from an existing Hive 
installation. For more on how to
 configure this feature, please refer to the [Hive Tables](#hive-tables) 
section. When running
-SQL from within another programming language the results will be returned 
as a [DataFrame](#DataFrames).
+SQL from within another programming language the results will be returned 
as a [Dataset\[Row\]](#datasets).
 You can also interact with the SQL interface using the 
[command-line](#running-the-spark-sql-cli)
 or over [JDBC/ODBC](#running-the-thrift-jdbcodbc-server).
 
-## DataFrames
+## Datasets and DataFrames
 
-A DataFrame is a distributed collection of data organized into named 
columns. It is conceptually
-equivalent to a table in a relational database or a data frame in 
R/Python, but with richer
-optimizations under the hood. DataFrames can be constructed from a wide 
array of [sources](#data-sources) such
-as: structured data files, tables in Hive, external databases, or existing 
RDDs.
+A Dataset is a new interface added in Spark 1.6 that tries to provide the 
benefits of RDDs (strong
+typing, ability to use powerful lambda functions) with the benefits of 
Spark SQL's optimized
+execution engine. A Dataset can be [constructed](#creating-datasets) from 
JVM objects and then
+manipulated using functional transformations (map, flatMap, filter, etc.).
 
-The DataFrame API is available in 
[Scala](api/scala/index.html#org.apache.spark.sql.DataFrame),
-[Java](api/java/index.html?org/apache/spark/sql/DataFrame.html),
-[Python](api/python/pyspark.sql.html#pyspark.sql.DataFrame), and 
[R](api/R/index.html).
+The Dataset API is the successor of the DataFrame API, which was 
introduced in Spark 1.3. In Spark
+2.0, Datasets and DataFrames are unified, and DataFrames are now 
equivalent to Datasets of `Row`s.
+In fact, `DataFrame` is simply a type alias of `Dataset[Row]` in [the 
Scala API][scala-datasets].
+However, [Java API][java-datasets] users must use `Dataset` instead.
 
-## Datasets
+[scala-datasets]: api/scala/index.html#org.apache.spark.sql.Dataset
+[java-datasets]: api/java/index.html?org/apache/spark/sql/Dataset.html
 
-A Dataset is a new experimental interface added in Spark 1.6 that tries to 
provide the benefits of
-RDDs (strong typing, ability to use powerful lambda functions) with the 
benefits of Spark SQL's
-optimized execution engine. A Dataset can be 
[constructed](#creating-datasets) from JVM objects and then manipulated
-using functional transformations (map, flatMap, filter, etc.).
+Python does not have support for the Dataset API, but due to its dynamic 
nature many of the
+benefits are already available (i.e. you can access the field of a row by 
name naturally
+`row.columnName`). The case for R is similar.
 
-The unified Dataset API can be used both in 
[Scala](api/scala/index.html#org.apache.spark.sql.Dataset) and
-[Java](api/java/index.html?org/apache/spark/sql/Dataset.html). Python does 
not yet have support for
-the Dataset API, but due to its dynamic nature many of the benefits are 
already available (i.e. you can
-access the field of a row by name naturally `row.columnName`). 

[GitHub] spark pull request #13592: [SPARK-15863][SQL][DOC] Initial SQL programming g...

2016-06-13 Thread clockfly
Github user clockfly commented on a diff in the pull request:

https://github.com/apache/spark/pull/13592#discussion_r66893918
  
--- Diff: docs/sql-programming-guide.md ---
@@ -12,130 +12,130 @@ title: Spark SQL and DataFrames
 Spark SQL is a Spark module for structured data processing. Unlike the 
basic Spark RDD API, the interfaces provided
 by Spark SQL provide Spark with more information about the structure of 
both the data and the computation being performed. Internally,
 Spark SQL uses this extra information to perform extra optimizations. 
There are several ways to
-interact with Spark SQL including SQL, the DataFrames API and the Datasets 
API. When computing a result
+interact with Spark SQL including SQL and the Datasets API. When computing 
a result
 the same execution engine is used, independent of which API/language you 
are using to express the
-computation. This unification means that developers can easily switch back 
and forth between the
-various APIs based on which provides the most natural way to express a 
given transformation.
+computation. This unification means that developers can easily switch back 
and forth between
+different APIs based on which provides the most natural way to express a 
given transformation.
 
 All of the examples on this page use sample data included in the Spark 
distribution and can be run in
 the `spark-shell`, `pyspark` shell, or `sparkR` shell.
 
 ## SQL
 
-One use of Spark SQL is to execute SQL queries written using either a 
basic SQL syntax or HiveQL.
+One use of Spark SQL is to execute SQL queries.
 Spark SQL can also be used to read data from an existing Hive 
installation. For more on how to
 configure this feature, please refer to the [Hive Tables](#hive-tables) 
section. When running
-SQL from within another programming language the results will be returned 
as a [DataFrame](#DataFrames).
+SQL from within another programming language the results will be returned 
as a [Dataset\[Row\]](#datasets).
 You can also interact with the SQL interface using the 
[command-line](#running-the-spark-sql-cli)
 or over [JDBC/ODBC](#running-the-thrift-jdbcodbc-server).
 
-## DataFrames
+## Datasets and DataFrames
 
-A DataFrame is a distributed collection of data organized into named 
columns. It is conceptually
-equivalent to a table in a relational database or a data frame in 
R/Python, but with richer
-optimizations under the hood. DataFrames can be constructed from a wide 
array of [sources](#data-sources) such
-as: structured data files, tables in Hive, external databases, or existing 
RDDs.
+A Dataset is a new interface added in Spark 1.6 that tries to provide the 
benefits of RDDs (strong
+typing, ability to use powerful lambda functions) with the benefits of 
Spark SQL's optimized
+execution engine. A Dataset can be [constructed](#creating-datasets) from 
JVM objects and then
+manipulated using functional transformations (map, flatMap, filter, etc.).
 
-The DataFrame API is available in 
[Scala](api/scala/index.html#org.apache.spark.sql.DataFrame),
-[Java](api/java/index.html?org/apache/spark/sql/DataFrame.html),
-[Python](api/python/pyspark.sql.html#pyspark.sql.DataFrame), and 
[R](api/R/index.html).
+The Dataset API is the successor of the DataFrame API, which was 
introduced in Spark 1.3. In Spark
+2.0, Datasets and DataFrames are unified, and DataFrames are now 
equivalent to Datasets of `Row`s.
+In fact, `DataFrame` is simply a type alias of `Dataset[Row]` in [the 
Scala API][scala-datasets].
+However, [Java API][java-datasets] users must use `Dataset` instead.
 
-## Datasets
+[scala-datasets]: api/scala/index.html#org.apache.spark.sql.Dataset
+[java-datasets]: api/java/index.html?org/apache/spark/sql/Dataset.html
 
-A Dataset is a new experimental interface added in Spark 1.6 that tries to 
provide the benefits of
-RDDs (strong typing, ability to use powerful lambda functions) with the 
benefits of Spark SQL's
-optimized execution engine. A Dataset can be 
[constructed](#creating-datasets) from JVM objects and then manipulated
-using functional transformations (map, flatMap, filter, etc.).
+Python does not have support for the Dataset API, but due to its dynamic 
nature many of the
+benefits are already available (i.e. you can access the field of a row by 
name naturally
+`row.columnName`). The case for R is similar.
 
-The unified Dataset API can be used both in 
[Scala](api/scala/index.html#org.apache.spark.sql.Dataset) and
-[Java](api/java/index.html?org/apache/spark/sql/Dataset.html). Python does 
not yet have support for
-the Dataset API, but due to its dynamic nature many of the benefits are 
already available (i.e. you can
-access the field of a row by name naturally `row.columnName`). 

[GitHub] spark pull request #13654: [SPARK-15868] [Web UI] Executors table in Executo...

2016-06-13 Thread ajbozarth
GitHub user ajbozarth opened a pull request:

https://github.com/apache/spark/pull/13654

[SPARK-15868] [Web UI] Executors table in Executors tab should sort 
Executor IDs in numerical order

## What changes were proposed in this pull request?

Currently the Executors table sorts by id using a string sort (since that is how the id is stored). Since the id is a number (other than the driver), we should be sorting numerically. I have changed both the initial sort on page load and the table sort to order ids numerically, treating non-numeric strings (like the driver) as "-1".
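
A tiny sketch of the sorting rule described above (illustrative only; the real change lives in the Web UI code, and `executorIdSortKey` is a hypothetical name):

```scala
import scala.util.Try

object ExecutorIdSortSketch extends App {
  // Numeric executor IDs sort numerically; non-numeric IDs such as "driver"
  // map to -1 so they sort ahead of every executor.
  def executorIdSortKey(id: String): Long = Try(id.toLong).getOrElse(-1L)

  println(Seq("driver", "10", "2", "1").sortBy(executorIdSortKey))
  // List(driver, 1, 2, 10)
}
```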

## How was this patch tested?

Manually tested and dev/run-tests


![pageload](https://cloud.githubusercontent.com/assets/13952758/16027882/d32edd0a-318e-11e6-9faf-fc972b7c36ab.png)

![sorted](https://cloud.githubusercontent.com/assets/13952758/16027883/d34541c6-318e-11e6-9ed7-6bfc0cd4152e.png)

You can merge this pull request into a Git repository by running:

$ git pull https://github.com/ajbozarth/spark spark15868

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/spark/pull/13654.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #13654


commit 35b280a1882fd3f5ae34c71cb00a12bf12852a90
Author: Alex Bozarth 
Date:   2016-06-14T00:08:52Z

Fixed sorting on executor id







[GitHub] spark pull request #13632: [SPARK-15910][SQL] Check schema consistency when ...

2016-06-13 Thread asfgit
Github user asfgit closed the pull request at:

https://github.com/apache/spark/pull/13632





[GitHub] spark issue #13632: [SPARK-15910][SQL] Check schema consistency when using K...

2016-06-13 Thread cloud-fan
Github user cloud-fan commented on the issue:

https://github.com/apache/spark/pull/13632
  
thanks, merging to master/2.0





[GitHub] spark issue #13652: [SPARK-] Fix incorrect days to millis conversion

2016-06-13 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/13652
  
Merged build finished. Test PASSed.





[GitHub] spark issue #13652: [SPARK-] Fix incorrect days to millis conversion

2016-06-13 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/13652
  
Test PASSed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/60441/
Test PASSed.





[GitHub] spark issue #13652: [SPARK-] Fix incorrect days to millis conversion

2016-06-13 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/13652
  
**[Test build #60441 has 
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/60441/consoleFull)**
 for PR 13652 at commit 
[`20904c4`](https://github.com/apache/spark/commit/20904c4323c5f908a42ea1b4cea6701626cebe28).
 * This patch passes all tests.
 * This patch merges cleanly.
 * This patch adds no public classes.





[GitHub] spark issue #13632: [SPARK-15910][SQL] Check schema consistency when using K...

2016-06-13 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/13632
  
Merged build finished. Test PASSed.





[GitHub] spark issue #13632: [SPARK-15910][SQL] Check schema consistency when using K...

2016-06-13 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/13632
  
Test PASSed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/60442/
Test PASSed.





[GitHub] spark pull request #13592: [SPARK-15863][SQL][DOC] Initial SQL programming g...

2016-06-13 Thread clockfly
Github user clockfly commented on a diff in the pull request:

https://github.com/apache/spark/pull/13592#discussion_r66893224
  
--- Diff: docs/sql-programming-guide.md ---
@@ -12,130 +12,130 @@ title: Spark SQL and DataFrames
 Spark SQL is a Spark module for structured data processing. Unlike the 
basic Spark RDD API, the interfaces provided
 by Spark SQL provide Spark with more information about the structure of 
both the data and the computation being performed. Internally,
 Spark SQL uses this extra information to perform extra optimizations. 
There are several ways to
-interact with Spark SQL including SQL, the DataFrames API and the Datasets 
API. When computing a result
+interact with Spark SQL including SQL and the Datasets API. When computing 
a result
 the same execution engine is used, independent of which API/language you 
are using to express the
-computation. This unification means that developers can easily switch back 
and forth between the
-various APIs based on which provides the most natural way to express a 
given transformation.
+computation. This unification means that developers can easily switch back 
and forth between
+different APIs based on which provides the most natural way to express a 
given transformation.
 
 All of the examples on this page use sample data included in the Spark 
distribution and can be run in
 the `spark-shell`, `pyspark` shell, or `sparkR` shell.
 
 ## SQL
 
-One use of Spark SQL is to execute SQL queries written using either a 
basic SQL syntax or HiveQL.
+One use of Spark SQL is to execute SQL queries.
 Spark SQL can also be used to read data from an existing Hive 
installation. For more on how to
 configure this feature, please refer to the [Hive Tables](#hive-tables) 
section. When running
-SQL from within another programming language the results will be returned 
as a [DataFrame](#DataFrames).
+SQL from within another programming language the results will be returned 
as a [Dataset\[Row\]](#datasets).
 You can also interact with the SQL interface using the 
[command-line](#running-the-spark-sql-cli)
 or over [JDBC/ODBC](#running-the-thrift-jdbcodbc-server).
 
-## DataFrames
+## Datasets and DataFrames
 
-A DataFrame is a distributed collection of data organized into named 
columns. It is conceptually
-equivalent to a table in a relational database or a data frame in 
R/Python, but with richer
-optimizations under the hood. DataFrames can be constructed from a wide 
array of [sources](#data-sources) such
-as: structured data files, tables in Hive, external databases, or existing 
RDDs.
+A Dataset is a new interface added in Spark 1.6 that tries to provide the 
benefits of RDDs (strong
+typing, ability to use powerful lambda functions) with the benefits of 
Spark SQL's optimized
+execution engine. A Dataset can be [constructed](#creating-datasets) from 
JVM objects and then
+manipulated using functional transformations (map, flatMap, filter, etc.).
 
-The DataFrame API is available in 
[Scala](api/scala/index.html#org.apache.spark.sql.DataFrame),
-[Java](api/java/index.html?org/apache/spark/sql/DataFrame.html),
-[Python](api/python/pyspark.sql.html#pyspark.sql.DataFrame), and 
[R](api/R/index.html).
+The Dataset API is the successor of the DataFrame API, which was 
introduced in Spark 1.3. In Spark
--- End diff --

I will remove this line ~~The Dataset API is the successor of the DataFrame 
API, which was introduced in Spark 1.3.~~ 





[GitHub] spark issue #13632: [SPARK-15910][SQL] Check schema consistency when using K...

2016-06-13 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/13632
  
**[Test build #60442 has 
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/60442/consoleFull)**
 for PR 13632 at commit 
[`35a1ee0`](https://github.com/apache/spark/commit/35a1ee09b82d4d66a95d4e88d8dda4e056cd0e11).
 * This patch passes all tests.
 * This patch merges cleanly.
 * This patch adds no public classes.





[GitHub] spark issue #13037: [SPARK-1301] [Web UI] Added anchor links to Accumulators...

2016-06-13 Thread ajbozarth
Github user ajbozarth commented on the issue:

https://github.com/apache/spark/pull/13037
  

![collapseopen](https://cloud.githubusercontent.com/assets/13952758/16027758/6dd2b5fe-318d-11e6-9c56-3862af2fe479.png)

![collapseclosed](https://cloud.githubusercontent.com/assets/13952758/16027759/6de5fc04-318d-11e6-8827-e0e1c1e80e72.png)






[GitHub] spark issue #13037: [SPARK-1301] [Web UI] Added anchor links to Accumulators...

2016-06-13 Thread ajbozarth
Github user ajbozarth commented on the issue:

https://github.com/apache/spark/pull/13037
  
Ran into some bad merge/rebase issues and had to do a force push with my 
new code (sorry, the previous commits are gone, but so are all the changes in 
them anyway).

I've added both of @kayousterhout's ideas: the link to the tasks table via "X 
Completed Tasks", as well as a collapsible table feature that I wrote to be 
reusable with minimal effort elsewhere in the UI. Since the link is a bit 
redundant I can remove it if everyone wants; I don't think it hurts to have 
both, though. I also left the table open by default (since users are currently 
used to seeing it).

I will be adding screenshots in a bit; I forgot to take them before 
dealing with all the rebasing issues.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #12876: [SPARK-15095] [SQL] drop binary mode in ThriftServer

2016-06-13 Thread epahomov
Github user epahomov commented on the issue:

https://github.com/apache/spark/pull/12876
  
I've created a ticket to revert these changes - 
https://issues.apache.org/jira/browse/SPARK-15934


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request #13592: [SPARK-15863][SQL][DOC] Initial SQL programming g...

2016-06-13 Thread clockfly
Github user clockfly commented on a diff in the pull request:

https://github.com/apache/spark/pull/13592#discussion_r66892238
  
--- Diff: docs/sql-programming-guide.md ---
@@ -12,130 +12,130 @@ title: Spark SQL and DataFrames
 Spark SQL is a Spark module for structured data processing. Unlike the 
basic Spark RDD API, the interfaces provided
 by Spark SQL provide Spark with more information about the structure of 
both the data and the computation being performed. Internally,
 Spark SQL uses this extra information to perform extra optimizations. 
There are several ways to
-interact with Spark SQL including SQL, the DataFrames API and the Datasets 
API. When computing a result
+interact with Spark SQL including SQL and the Datasets API. When computing 
a result
 the same execution engine is used, independent of which API/language you 
are using to express the
-computation. This unification means that developers can easily switch back 
and forth between the
-various APIs based on which provides the most natural way to express a 
given transformation.
+computation. This unification means that developers can easily switch back 
and forth between
+different APIs based on which provides the most natural way to express a 
given transformation.
 
 All of the examples on this page use sample data included in the Spark 
distribution and can be run in
 the `spark-shell`, `pyspark` shell, or `sparkR` shell.
 
 ## SQL
 
-One use of Spark SQL is to execute SQL queries written using either a 
basic SQL syntax or HiveQL.
+One use of Spark SQL is to execute SQL queries.
 Spark SQL can also be used to read data from an existing Hive 
installation. For more on how to
 configure this feature, please refer to the [Hive Tables](#hive-tables) 
section. When running
-SQL from within another programming language the results will be returned 
as a [DataFrame](#DataFrames).
+SQL from within another programming language the results will be returned 
as a [Dataset\[Row\]](#datasets).
 You can also interact with the SQL interface using the 
[command-line](#running-the-spark-sql-cli)
 or over [JDBC/ODBC](#running-the-thrift-jdbcodbc-server).
 
-## DataFrames
+## Datasets and DataFrames
 
-A DataFrame is a distributed collection of data organized into named 
columns. It is conceptually
-equivalent to a table in a relational database or a data frame in 
R/Python, but with richer
-optimizations under the hood. DataFrames can be constructed from a wide 
array of [sources](#data-sources) such
-as: structured data files, tables in Hive, external databases, or existing 
RDDs.
+A Dataset is a new interface added in Spark 1.6 that tries to provide the 
benefits of RDDs (strong
+typing, ability to use powerful lambda functions) with the benefits of 
Spark SQL's optimized
+execution engine. A Dataset can be [constructed](#creating-datasets) from 
JVM objects and then
--- End diff --

Maybe `[constructed](#creating-datasets) from JVM objects` => `[created 
from JVM objects](#creating-datasets)`


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #13037: [SPARK-1301] [Web UI] Added anchor links to Accumulators...

2016-06-13 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/13037
  
**[Test build #60454 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/60454/consoleFull)**
 for PR 13037 at commit 
[`9e62eb3`](https://github.com/apache/spark/commit/9e62eb3344e41814b12a783ec51bec871dc8320b).


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request #13592: [SPARK-15863][SQL][DOC] Initial SQL programming g...

2016-06-13 Thread clockfly
Github user clockfly commented on a diff in the pull request:

https://github.com/apache/spark/pull/13592#discussion_r66891847
  
--- Diff: docs/sql-programming-guide.md ---
@@ -12,130 +12,130 @@ title: Spark SQL and DataFrames
 Spark SQL is a Spark module for structured data processing. Unlike the 
basic Spark RDD API, the interfaces provided
 by Spark SQL provide Spark with more information about the structure of 
both the data and the computation being performed. Internally,
 Spark SQL uses this extra information to perform extra optimizations. 
There are several ways to
-interact with Spark SQL including SQL, the DataFrames API and the Datasets 
API. When computing a result
+interact with Spark SQL including SQL and the Datasets API. When computing 
a result
 the same execution engine is used, independent of which API/language you 
are using to express the
-computation. This unification means that developers can easily switch back 
and forth between the
-various APIs based on which provides the most natural way to express a 
given transformation.
+computation. This unification means that developers can easily switch back 
and forth between
+different APIs based on which provides the most natural way to express a 
given transformation.
 
 All of the examples on this page use sample data included in the Spark 
distribution and can be run in
 the `spark-shell`, `pyspark` shell, or `sparkR` shell.
 
 ## SQL
 
-One use of Spark SQL is to execute SQL queries written using either a 
basic SQL syntax or HiveQL.
+One use of Spark SQL is to execute SQL queries.
 Spark SQL can also be used to read data from an existing Hive 
installation. For more on how to
 configure this feature, please refer to the [Hive Tables](#hive-tables) 
section. When running
-SQL from within another programming language the results will be returned 
as a [DataFrame](#DataFrames).
+SQL from within another programming language the results will be returned 
as a [Dataset\[Row\]](#datasets).
 You can also interact with the SQL interface using the 
[command-line](#running-the-spark-sql-cli)
 or over [JDBC/ODBC](#running-the-thrift-jdbcodbc-server).
 
-## DataFrames
+## Datasets and DataFrames
 
-A DataFrame is a distributed collection of data organized into named 
columns. It is conceptually
-equivalent to a table in a relational database or a data frame in 
R/Python, but with richer
-optimizations under the hood. DataFrames can be constructed from a wide 
array of [sources](#data-sources) such
-as: structured data files, tables in Hive, external databases, or existing 
RDDs.
+A Dataset is a new interface added in Spark 1.6 that tries to provide the 
benefits of RDDs (strong
+typing, ability to use powerful lambda functions) with the benefits of 
Spark SQL's optimized
--- End diff --

"with the benefits of" => "as well as" ?


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request #13592: [SPARK-15863][SQL][DOC] Initial SQL programming g...

2016-06-13 Thread clockfly
Github user clockfly commented on a diff in the pull request:

https://github.com/apache/spark/pull/13592#discussion_r66891502
  
--- Diff: docs/sql-programming-guide.md ---
@@ -12,130 +12,130 @@ title: Spark SQL and DataFrames
 Spark SQL is a Spark module for structured data processing. Unlike the 
basic Spark RDD API, the interfaces provided
 by Spark SQL provide Spark with more information about the structure of 
both the data and the computation being performed. Internally,
 Spark SQL uses this extra information to perform extra optimizations. 
There are several ways to
-interact with Spark SQL including SQL, the DataFrames API and the Datasets 
API. When computing a result
+interact with Spark SQL including SQL and the Datasets API. When computing 
a result
--- End diff --

"Dataset API" instead of "Datasets API"?


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #13563: [SPARK-15826] [CORE] PipedRDD to allow configurable char...

2016-06-13 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/13563
  
Test FAILed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/60440/
Test FAILed.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #13563: [SPARK-15826] [CORE] PipedRDD to allow configurable char...

2016-06-13 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/13563
  
Merged build finished. Test FAILed.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #13563: [SPARK-15826] [CORE] PipedRDD to allow configurable char...

2016-06-13 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/13563
  
**[Test build #60440 has 
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/60440/consoleFull)**
 for PR 13563 at commit 
[`fecd730`](https://github.com/apache/spark/commit/fecd730e982063bb19208d7a05bed8d6bbcbe776).
 * This patch **fails Spark unit tests**.
 * This patch merges cleanly.
 * This patch adds no public classes.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #12938: [SPARK-15162][SPARK-15164][PySpark][DOCS][ML] update som...

2016-06-13 Thread holdenk
Github user holdenk commented on the issue:

https://github.com/apache/spark/pull/12938
  
Which of course seems to be more a case of flaky tests rather than that 
commit actually introducing the failure :(


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #13498: [SPARK-15011][SQL] Re-enable 'analyze MetastoreRelations...

2016-06-13 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/13498
  
**[Test build #3092 has 
finished](https://amplab.cs.berkeley.edu/jenkins/job/NewSparkPullRequestBuilder/3092/consoleFull)**
 for PR 13498 at commit 
[`655b0c7`](https://github.com/apache/spark/commit/655b0c73a54cbad3ac3c611a3c869feffbe9a1b5).
 * This patch passes all tests.
 * This patch merges cleanly.
 * This patch adds no public classes.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request #13649: [SPARK-15929] Fix portability of DataFrameSuite p...

2016-06-13 Thread asfgit
Github user asfgit closed the pull request at:

https://github.com/apache/spark/pull/13649


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #13037: [SPARK-1301] [Web UI] Added anchor links to Accumulators...

2016-06-13 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/13037
  
**[Test build #60453 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/60453/consoleFull)**
 for PR 13037 at commit 
[`5b12cff`](https://github.com/apache/spark/commit/5b12cff99a7252193da1e555185da50d4b973750).


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #13649: [SPARK-15929] Fix portability of DataFrameSuite path glo...

2016-06-13 Thread liancheng
Github user liancheng commented on the issue:

https://github.com/apache/spark/pull/13649
  
Merging to master and branch-2.0.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #13636: [SPARK-15637][SPARKR] Remove R version check since maske...

2016-06-13 Thread liancheng
Github user liancheng commented on the issue:

https://github.com/apache/spark/pull/13636
  
@shivaram True.

@felixcheung Could you please also add SPARK-15931 to the PR title if this 
PR also targets that one? Thanks.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #13651: [SPARK-15776][SQL] Divide Expression inside Aggregation ...

2016-06-13 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/13651
  
Test FAILed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/60439/
Test FAILed.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #13651: [SPARK-15776][SQL] Divide Expression inside Aggregation ...

2016-06-13 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/13651
  
Merged build finished. Test FAILed.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #13651: [SPARK-15776][SQL] Divide Expression inside Aggregation ...

2016-06-13 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/13651
  
**[Test build #60439 has 
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/60439/consoleFull)**
 for PR 13651 at commit 
[`df08eea`](https://github.com/apache/spark/commit/df08eeacd85187ca5a71463fc5d25f63426ebe84).
 * This patch **fails Spark unit tests**.
 * This patch merges cleanly.
 * This patch adds no public classes.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #13492: [SPARK-15749][SQL]make the error message more meaningful

2016-06-13 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/13492
  
Merged build finished. Test FAILed.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #13492: [SPARK-15749][SQL]make the error message more meaningful

2016-06-13 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/13492
  
**[Test build #60449 has 
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/60449/consoleFull)**
 for PR 13492 at commit 
[`d0fcd33`](https://github.com/apache/spark/commit/d0fcd330c7a427fc3dccdbdb457a2139f6a238f0).
 * This patch **fails Spark unit tests**.
 * This patch merges cleanly.
 * This patch adds no public classes.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #13492: [SPARK-15749][SQL]make the error message more meaningful

2016-06-13 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/13492
  
Test FAILed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/60449/
Test FAILed.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #13649: [SPARK-15929] Fix portability of DataFrameSuite path glo...

2016-06-13 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/13649
  
Merged build finished. Test PASSed.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #13649: [SPARK-15929] Fix portability of DataFrameSuite path glo...

2016-06-13 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/13649
  
Test PASSed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/60438/
Test PASSed.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #13498: [SPARK-15011][SQL] Re-enable 'analyze MetastoreRelations...

2016-06-13 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/13498
  
Test PASSed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/60437/
Test PASSed.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #13649: [SPARK-15929] Fix portability of DataFrameSuite path glo...

2016-06-13 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/13649
  
**[Test build #60438 has 
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/60438/consoleFull)**
 for PR 13649 at commit 
[`a466517`](https://github.com/apache/spark/commit/a46651794d701370d673b362019274fe76a2ff29).
 * This patch passes all tests.
 * This patch merges cleanly.
 * This patch adds no public classes.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #13498: [SPARK-15011][SQL] Re-enable 'analyze MetastoreRelations...

2016-06-13 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/13498
  
Merged build finished. Test PASSed.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #12938: [SPARK-15162][SPARK-15164][PySpark][DOCS][ML] update som...

2016-06-13 Thread holdenk
Github user holdenk commented on the issue:

https://github.com/apache/spark/pull/12938
  
Git bisect ends up blaming e2ab79d5ea00af45c083cc9a6607d2f0905f9908 - I'll 
poke at it some


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #13498: [SPARK-15011][SQL] Re-enable 'analyze MetastoreRelations...

2016-06-13 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/13498
  
**[Test build #3089 has 
finished](https://amplab.cs.berkeley.edu/jenkins/job/NewSparkPullRequestBuilder/3089/consoleFull)**
 for PR 13498 at commit 
[`655b0c7`](https://github.com/apache/spark/commit/655b0c73a54cbad3ac3c611a3c869feffbe9a1b5).
 * This patch passes all tests.
 * This patch merges cleanly.
 * This patch adds no public classes.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #13498: [SPARK-15011][SQL] Re-enable 'analyze MetastoreRelations...

2016-06-13 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/13498
  
**[Test build #60437 has 
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/60437/consoleFull)**
 for PR 13498 at commit 
[`655b0c7`](https://github.com/apache/spark/commit/655b0c73a54cbad3ac3c611a3c869feffbe9a1b5).
 * This patch passes all tests.
 * This patch merges cleanly.
 * This patch adds no public classes.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #13527: [SPARK-15782] [CORE] Set spark.jars system property in c...

2016-06-13 Thread nezihyigitbasi
Github user nezihyigitbasi commented on the issue:

https://github.com/apache/spark/pull/13527
  
thanks @vanzin, addressed your comments.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #13498: [SPARK-15011][SQL] Re-enable 'analyze MetastoreRelations...

2016-06-13 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/13498
  
**[Test build #3090 has 
finished](https://amplab.cs.berkeley.edu/jenkins/job/NewSparkPullRequestBuilder/3090/consoleFull)**
 for PR 13498 at commit 
[`655b0c7`](https://github.com/apache/spark/commit/655b0c73a54cbad3ac3c611a3c869feffbe9a1b5).
 * This patch passes all tests.
 * This patch merges cleanly.
 * This patch adds no public classes.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request #13653: [SPARK-15933][SQL][STREAMING] Refactored DF reade...

2016-06-13 Thread tdas
Github user tdas commented on a diff in the pull request:

https://github.com/apache/spark/pull/13653#discussion_r66888549
  
--- Diff: 
sql/core/src/test/scala/org/apache/spark/sql/test/DataFrameReaderWriterSuite.scala
 ---
@@ -0,0 +1,231 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements.  See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License.  You may obtain a copy of the License at
+ *
+ *http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.spark.sql.test
+
+import org.apache.spark.sql._
+import org.apache.spark.sql.sources._
+import org.apache.spark.sql.types.{StringType, StructField, StructType}
+import org.apache.spark.util.Utils
+
+
+object LastOptions {
+
+  var parameters: Map[String, String] = null
+  var schema: Option[StructType] = null
+  var saveMode: SaveMode = null
+
+  def clear(): Unit = {
+    parameters = null
+    schema = null
+    saveMode = null
+  }
+}
+
+
+/** Dummy provider. */
+class DefaultSource
+  extends RelationProvider
+  with SchemaRelationProvider
+  with CreatableRelationProvider {
+
+  case class FakeRelation(sqlContext: SQLContext) extends BaseRelation {
+    override def schema: StructType = StructType(Seq(StructField("a", StringType)))
+  }
+
+  override def createRelation(
+      sqlContext: SQLContext,
+      parameters: Map[String, String],
+      schema: StructType
+    ): BaseRelation = {
+    LastOptions.parameters = parameters
+    LastOptions.schema = Some(schema)
+    FakeRelation(sqlContext)
+  }
+
+  override def createRelation(
+      sqlContext: SQLContext,
+      parameters: Map[String, String]
+    ): BaseRelation = {
+    LastOptions.parameters = parameters
+    LastOptions.schema = None
+    FakeRelation(sqlContext)
+  }
+
+  override def createRelation(
+      sqlContext: SQLContext,
+      mode: SaveMode,
+      parameters: Map[String, String],
+      data: DataFrame): BaseRelation = {
+    LastOptions.parameters = parameters
+    LastOptions.schema = None
+    LastOptions.saveMode = mode
+    FakeRelation(sqlContext)
+  }
+}
+
+
+class DataFrameReaderWriterSuite extends QueryTest with SharedSQLContext {
--- End diff --

These tests are a very rudimentary set of tests for the DataFrameReader 
code path. Note that in Spark 1.6 there were no tests at all. Also, I believe 
that a lot of the DF functionality is tested through other test suites (e.g. 
partitioning columns are tested through PartitionedParquetSuite).
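
To make the intent concrete, a rough sketch of the kind of check these
rudimentary tests can express, assuming the suite's SharedSQLContext exposes a
`spark` session (the option name and value are made up):

    LastOptions.clear()

    // The format string resolves to the dummy DefaultSource above by package name.
    spark.read
      .format("org.apache.spark.sql.test")
      .option("opt1", "1")
      .load()

    // The dummy source recorded the reader options, so the test can assert on them.
    assert(LastOptions.parameters("opt1") == "1")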


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request #13653: [SPARK-15933][SQL][STREAMING] Refactored DF reade...

2016-06-13 Thread tdas
Github user tdas commented on a diff in the pull request:

https://github.com/apache/spark/pull/13653#discussion_r66888413
  
--- Diff: 
sql/core/src/test/scala/org/apache/spark/sql/streaming/test/DataStreamReaderWriterSuite.scala
 ---
@@ -371,76 +384,12 @@ class DataFrameReaderWriterSuite extends StreamTest 
with BeforeAndAfter {
 
   private def newTextInput = Utils.createTempDir(namePrefix = 
"text").getCanonicalPath
 
-  test("check trigger() can only be called on continuous queries") {
--- End diff --

Most of these tests just check whether a method is called on the wrong type 
of DF, so all of them can be removed.
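
For context, the removed checks were roughly of this shape (a hedged sketch,
not the exact test code; it relies on the suite's existing imports and the
`newTextInput` helper shown in the diff):

    // trigger() only applies to continuous queries, so calling it on a
    // batch DataFrame's writer should fail analysis.
    val df = spark.read.text(newTextInput)
    intercept[AnalysisException] {
      df.write.trigger(ProcessingTime("10 seconds"))
    }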


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request #13653: [SPARK-15933][SQL][STREAMING] Refactored DF reade...

2016-06-13 Thread tdas
Github user tdas commented on a diff in the pull request:

https://github.com/apache/spark/pull/13653#discussion_r66888223
  
--- Diff: 
sql/core/src/main/scala/org/apache/spark/sql/streaming/DataStreamWriter.scala 
---
@@ -0,0 +1,401 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements.  See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License.  You may obtain a copy of the License at
+ *
+ *http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.spark.sql.streaming
+
+import scala.collection.JavaConverters._
+
+import org.apache.spark.annotation.Experimental
+import org.apache.spark.sql.{AnalysisException, DataFrame, Dataset, 
ForeachWriter}
+import org.apache.spark.sql.execution.datasources.DataSource
+import org.apache.spark.sql.execution.streaming.{ForeachSink, MemoryPlan, 
MemorySink}
+
+/**
+ * :: Experimental ::
+ * Interface used to write a streaming [[Dataset]] to external storage 
systems (e.g. file systems,
+ * key-value stores, etc). Use [[Dataset.write]] to access this.
+ *
+ * @since 2.0.0
+ */
+@Experimental
+final class DataStreamWriter[T] private[sql](ds: Dataset[T]) {
+
+  private val df = ds.toDF()
+
+  /**
+   * :: Experimental ::
+   * Specifies how data of a streaming DataFrame/Dataset is written to a 
streaming sink.
+   *   - `OutputMode.Append()`: only the new rows in the streaming 
DataFrame/Dataset will be
+   *written to the sink
+   *   - `OutputMode.Complete()`: all the rows in the streaming 
DataFrame/Dataset will be written
+   *  to the sink every time there are some updates
+   *
+   * @since 2.0.0
+   */
+  @Experimental
+  def outputMode(outputMode: OutputMode): DataStreamWriter[T] = {
+this.outputMode = outputMode
+this
+  }
+
+
+  /**
+   * :: Experimental ::
+   * Specifies how data of a streaming DataFrame/Dataset is written to a 
streaming sink.
+   *   - `append`:   only the new rows in the streaming DataFrame/Dataset 
will be written to
+   * the sink
+   *   - `complete`: all the rows in the streaming DataFrame/Dataset will 
be written to the sink
+   * every time there are some updates
+   *
+   * @since 2.0.0
+   */
+  @Experimental
+  def outputMode(outputMode: String): DataStreamWriter[T] = {
--- End diff --

should this be shortened to just `mode`?
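
For comparison, this is how the two overloads read at a call site under the
API as quoted in this diff (sink configuration elided; `streamingDF` is a
hypothetical streaming Dataset):

    // String form, parsed by the overload above:
    streamingDF.write
      .outputMode("append")
      .trigger(ProcessingTime("10 seconds"))
      .queryName("my_counts")   // hypothetical query name
      .startStream()            // startStream() as referenced in the queryName scaladoc

    // Typed form:
    streamingDF.write.outputMode(OutputMode.Complete())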


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #13558: [SPARK-15820][PySpark][SQL]Add Catalog.refreshTable into...

2016-06-13 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/13558
  
Merged build finished. Test FAILed.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #13558: [SPARK-15820][PySpark][SQL]Add Catalog.refreshTable into...

2016-06-13 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/13558
  
Test FAILed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/60446/
Test FAILed.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #13558: [SPARK-15820][PySpark][SQL]Add Catalog.refreshTable into...

2016-06-13 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/13558
  
**[Test build #60446 has 
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/60446/consoleFull)**
 for PR 13558 at commit 
[`8d87c0f`](https://github.com/apache/spark/commit/8d87c0f2bd9140928915f835fd7d21b178422c69).
 * This patch **fails PySpark unit tests**.
 * This patch merges cleanly.
 * This patch adds no public classes.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request #13653: [SPARK-15933][SQL][STREAMING] Refactored DF reade...

2016-06-13 Thread tdas
Github user tdas commented on a diff in the pull request:

https://github.com/apache/spark/pull/13653#discussion_r66888185
  
--- Diff: 
sql/core/src/main/scala/org/apache/spark/sql/streaming/DataStreamWriter.scala 
---
@@ -0,0 +1,401 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements.  See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License.  You may obtain a copy of the License at
+ *
+ *http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.spark.sql.streaming
+
+import scala.collection.JavaConverters._
+
+import org.apache.spark.annotation.Experimental
+import org.apache.spark.sql.{AnalysisException, DataFrame, Dataset, 
ForeachWriter}
+import org.apache.spark.sql.execution.datasources.DataSource
+import org.apache.spark.sql.execution.streaming.{ForeachSink, MemoryPlan, 
MemorySink}
+
+/**
+ * :: Experimental ::
+ * Interface used to write a streaming [[Dataset]] to external storage 
systems (e.g. file systems,
+ * key-value stores, etc). Use [[Dataset.write]] to access this.
+ *
+ * @since 2.0.0
+ */
+@Experimental
+final class DataStreamWriter[T] private[sql](ds: Dataset[T]) {
+
+  private val df = ds.toDF()
+
+  /**
+   * :: Experimental ::
+   * Specifies how data of a streaming DataFrame/Dataset is written to a 
streaming sink.
+   *   - `OutputMode.Append()`: only the new rows in the streaming 
DataFrame/Dataset will be
+   *written to the sink
+   *   - `OutputMode.Complete()`: all the rows in the streaming 
DataFrame/Dataset will be written
+   *  to the sink every time there are some updates
+   *
+   * @since 2.0.0
+   */
+  @Experimental
+  def outputMode(outputMode: OutputMode): DataStreamWriter[T] = {
+this.outputMode = outputMode
+this
+  }
+
+
+  /**
+   * :: Experimental ::
+   * Specifies how data of a streaming DataFrame/Dataset is written to a 
streaming sink.
+   *   - `append`:   only the new rows in the streaming DataFrame/Dataset 
will be written to
+   * the sink
+   *   - `complete`: all the rows in the streaming DataFrame/Dataset will 
be written to the sink
+   * every time there are some updates
+   *
+   * @since 2.0.0
+   */
+  @Experimental
+  def outputMode(outputMode: String): DataStreamWriter[T] = {
+this.outputMode = outputMode.toLowerCase match {
+  case "append" =>
+OutputMode.Append
+  case "complete" =>
+OutputMode.Complete
+  case _ =>
+throw new IllegalArgumentException(s"Unknown output mode 
$outputMode. " +
+  "Accepted output modes are 'append' and 'complete'")
+}
+this
+  }
+
+  /**
+   * :: Experimental ::
+   * Set the trigger for the stream query. The default value is 
`ProcessingTime(0)` and it will run
+   * the query as fast as possible.
+   *
+   * Scala Example:
+   * {{{
+   *   df.write.trigger(ProcessingTime("10 seconds"))
+   *
+   *   import scala.concurrent.duration._
+   *   df.write.trigger(ProcessingTime(10.seconds))
+   * }}}
+   *
+   * Java Example:
+   * {{{
+   *   df.write.trigger(ProcessingTime.create("10 seconds"))
+   *
+   *   import java.util.concurrent.TimeUnit
+   *   df.write.trigger(ProcessingTime.create(10, TimeUnit.SECONDS))
+   * }}}
+   *
+   * @since 2.0.0
+   */
+  @Experimental
+  def trigger(trigger: Trigger): DataStreamWriter[T] = {
+this.trigger = trigger
+this
+  }
+
+
+  /**
+   * :: Experimental ::
+   * Specifies the name of the [[ContinuousQuery]] that can be started 
with `startStream()`.
+   * This name must be unique among all the currently active queries in 
the associated SQLContext.
+   *
+   * @since 2.0.0
+   */
+  @Experimental
+  def queryName(queryName: String): DataStreamWriter[T] = {
+this.extraOptions += ("queryName" -> queryName)
+this
+  }
+
+  /**
+   * :: Experimental ::
+   * Specifies the underlying output 

[GitHub] spark issue #13498: [SPARK-15011][SQL] Re-enable 'analyze MetastoreRelations...

2016-06-13 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/13498
  
**[Test build #3091 has 
finished](https://amplab.cs.berkeley.edu/jenkins/job/NewSparkPullRequestBuilder/3091/consoleFull)**
 for PR 13498 at commit 
[`655b0c7`](https://github.com/apache/spark/commit/655b0c73a54cbad3ac3c611a3c869feffbe9a1b5).
 * This patch passes all tests.
 * This patch merges cleanly.
 * This patch adds no public classes.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #13561: [SPARK-15824][SQL] Run 'with ... insert ... select' fail...

2016-06-13 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/13561
  
Merged build finished. Test PASSed.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #13561: [SPARK-15824][SQL] Run 'with ... insert ... select' fail...

2016-06-13 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/13561
  
Test PASSed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/60445/
Test PASSed.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #13498: [SPARK-15011][SQL] Re-enable 'analyze MetastoreRelations...

2016-06-13 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/13498
  
**[Test build #3094 has 
finished](https://amplab.cs.berkeley.edu/jenkins/job/NewSparkPullRequestBuilder/3094/consoleFull)**
 for PR 13498 at commit 
[`655b0c7`](https://github.com/apache/spark/commit/655b0c73a54cbad3ac3c611a3c869feffbe9a1b5).
 * This patch **fails Spark unit tests**.
 * This patch merges cleanly.
 * This patch adds no public classes.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #13561: [SPARK-15824][SQL] Run 'with ... insert ... select' fail...

2016-06-13 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/13561
  
**[Test build #60445 has 
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/60445/consoleFull)**
 for PR 13561 at commit 
[`31212da`](https://github.com/apache/spark/commit/31212da89a7e7acfe7e7ae7f97860f1b45b481b3).
 * This patch passes all tests.
 * This patch merges cleanly.
 * This patch adds no public classes.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request #13653: [SPARK-15933][SQL][STREAMING] Refactored DF reade...

2016-06-13 Thread tdas
Github user tdas commented on a diff in the pull request:

https://github.com/apache/spark/pull/13653#discussion_r66888051
  
--- Diff: 
sql/core/src/main/scala/org/apache/spark/sql/streaming/DataStreamWriter.scala 
---
@@ -0,0 +1,401 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements.  See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License.  You may obtain a copy of the License at
+ *
+ *http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.spark.sql.streaming
+
+import scala.collection.JavaConverters._
+
+import org.apache.spark.annotation.Experimental
+import org.apache.spark.sql.{AnalysisException, DataFrame, Dataset, 
ForeachWriter}
+import org.apache.spark.sql.execution.datasources.DataSource
+import org.apache.spark.sql.execution.streaming.{ForeachSink, MemoryPlan, 
MemorySink}
+
+/**
+ * :: Experimental ::
+ * Interface used to write a streaming [[Dataset]] to external storage 
systems (e.g. file systems,
+ * key-value stores, etc). Use [[Dataset.write]] to access this.
+ *
+ * @since 2.0.0
+ */
+@Experimental
+final class DataStreamWriter[T] private[sql](ds: Dataset[T]) {
+
+  private val df = ds.toDF()
+
+  /**
+   * :: Experimental ::
+   * Specifies how data of a streaming DataFrame/Dataset is written to a 
streaming sink.
+   *   - `OutputMode.Append()`: only the new rows in the streaming 
DataFrame/Dataset will be
+   *written to the sink
+   *   - `OutputMode.Complete()`: all the rows in the streaming 
DataFrame/Dataset will be written
+   *  to the sink every time there are some updates
+   *
+   * @since 2.0.0
+   */
+  @Experimental
+  def outputMode(outputMode: OutputMode): DataStreamWriter[T] = {
+this.outputMode = outputMode
+this
+  }
+
+
+  /**
+   * :: Experimental ::
+   * Specifies how data of a streaming DataFrame/Dataset is written to a 
streaming sink.
+   *   - `append`:   only the new rows in the streaming DataFrame/Dataset 
will be written to
+   * the sink
+   *   - `complete`: all the rows in the streaming DataFrame/Dataset will 
be written to the sink
+   * every time there are some updates
+   *
+   * @since 2.0.0
+   */
+  @Experimental
+  def outputMode(outputMode: String): DataStreamWriter[T] = {
+this.outputMode = outputMode.toLowerCase match {
+  case "append" =>
+OutputMode.Append
+  case "complete" =>
+OutputMode.Complete
+  case _ =>
+throw new IllegalArgumentException(s"Unknown output mode 
$outputMode. " +
+  "Accepted output modes are 'append' and 'complete'")
+}
+this
+  }
+
+  /**
+   * :: Experimental ::
+   * Set the trigger for the stream query. The default value is 
`ProcessingTime(0)` and it will run
+   * the query as fast as possible.
+   *
+   * Scala Example:
+   * {{{
+   *   df.write.trigger(ProcessingTime("10 seconds"))
+   *
+   *   import scala.concurrent.duration._
+   *   df.write.trigger(ProcessingTime(10.seconds))
+   * }}}
+   *
+   * Java Example:
+   * {{{
+   *   df.write.trigger(ProcessingTime.create("10 seconds"))
+   *
+   *   import java.util.concurrent.TimeUnit
+   *   df.write.trigger(ProcessingTime.create(10, TimeUnit.SECONDS))
+   * }}}
+   *
+   * @since 2.0.0
+   */
+  @Experimental
+  def trigger(trigger: Trigger): DataStreamWriter[T] = {
+this.trigger = trigger
+this
+  }
+
+
+  /**
+   * :: Experimental ::
+   * Specifies the name of the [[ContinuousQuery]] that can be started 
with `startStream()`.
+   * This name must be unique among all the currently active queries in 
the associated SQLContext.
+   *
+   * @since 2.0.0
+   */
+  @Experimental
+  def queryName(queryName: String): DataStreamWriter[T] = {
+this.extraOptions += ("queryName" -> queryName)
+this
+  }
+
+  /**
+   * :: Experimental ::
+   * Specifies the underlying output 

[GitHub] spark pull request #13653: [SPARK-15933][SQL][STREAMING] Refactored DF reade...

2016-06-13 Thread tdas
Github user tdas commented on a diff in the pull request:

https://github.com/apache/spark/pull/13653#discussion_r66888017
  
--- Diff: 
sql/core/src/main/scala/org/apache/spark/sql/streaming/DataStreamWriter.scala 
---
@@ -0,0 +1,401 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements.  See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License.  You may obtain a copy of the License at
+ *
+ *http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.spark.sql.streaming
+
+import scala.collection.JavaConverters._
+
+import org.apache.spark.annotation.Experimental
+import org.apache.spark.sql.{AnalysisException, DataFrame, Dataset, 
ForeachWriter}
+import org.apache.spark.sql.execution.datasources.DataSource
+import org.apache.spark.sql.execution.streaming.{ForeachSink, MemoryPlan, 
MemorySink}
+
+/**
+ * :: Experimental ::
+ * Interface used to write a streaming [[Dataset]] to external storage 
systems (e.g. file systems,
+ * key-value stores, etc). Use [[Dataset.write]] to access this.
+ *
+ * @since 2.0.0
+ */
+@Experimental
+final class DataStreamWriter[T] private[sql](ds: Dataset[T]) {
+
+  private val df = ds.toDF()
+
+  /**
+   * :: Experimental ::
+   * Specifies how data of a streaming DataFrame/Dataset is written to a 
streaming sink.
+   *   - `OutputMode.Append()`: only the new rows in the streaming 
DataFrame/Dataset will be
+   *written to the sink
+   *   - `OutputMode.Complete()`: all the rows in the streaming 
DataFrame/Dataset will be written
+   *  to the sink every time there are some updates
+   *
+   * @since 2.0.0
+   */
+  @Experimental
+  def outputMode(outputMode: OutputMode): DataStreamWriter[T] = {
+this.outputMode = outputMode
+this
+  }
+
+
+  /**
+   * :: Experimental ::
+   * Specifies how data of a streaming DataFrame/Dataset is written to a 
streaming sink.
+   *   - `append`:   only the new rows in the streaming DataFrame/Dataset 
will be written to
+   * the sink
+   *   - `complete`: all the rows in the streaming DataFrame/Dataset will 
be written to the sink
+   * every time there are some updates
+   *
+   * @since 2.0.0
+   */
+  @Experimental
+  def outputMode(outputMode: String): DataStreamWriter[T] = {
+this.outputMode = outputMode.toLowerCase match {
+  case "append" =>
+OutputMode.Append
+  case "complete" =>
+OutputMode.Complete
+  case _ =>
+throw new IllegalArgumentException(s"Unknown output mode 
$outputMode. " +
+  "Accepted output modes are 'append' and 'complete'")
+}
+this
+  }
+
+  /**
+   * :: Experimental ::
+   * Set the trigger for the stream query. The default value is 
`ProcessingTime(0)` and it will run
+   * the query as fast as possible.
+   *
+   * Scala Example:
+   * {{{
+   *   df.write.trigger(ProcessingTime("10 seconds"))
--- End diff --

update these examples.
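
Presumably the updated examples should go through the new `writeStream` entry point introduced by this PR; a rough sketch (not taken from the patch, `df` assumed to be an existing streaming Dataset/DataFrame):

```
// rough sketch only: same trigger settings, routed through writeStream
df.writeStream.trigger(ProcessingTime("10 seconds"))

import scala.concurrent.duration._
df.writeStream.trigger(ProcessingTime(10.seconds))
```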


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request #13638: [SPARK-15915][SQL] CacheManager should use canoni...

2016-06-13 Thread cloud-fan
Github user cloud-fan commented on a diff in the pull request:

https://github.com/apache/spark/pull/13638#discussion_r66887975
  
--- Diff: 
sql/core/src/main/scala/org/apache/spark/sql/execution/CacheManager.scala ---
@@ -87,7 +87,7 @@ private[sql] class CacheManager extends Logging {
   query: Dataset[_],
   tableName: Option[String] = None,
   storageLevel: StorageLevel = MEMORY_AND_DISK): Unit = writeLock {
-val planToCache = query.queryExecution.analyzed
+val planToCache = query.queryExecution.analyzed.canonicalized
--- End diff --

do we still need these changes? `LogicalPlan.canonicalized` is a lazy val, so there is no performance penalty even if we use the un-canonicalized plan as the key.
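
For anyone skimming: a standalone toy sketch (not Spark code) of why a `lazy val` costs nothing until it is actually read:

```
// the body of a lazy val runs at most once, and only on first access,
// so merely using the object as a map key triggers no work
class Plan(val text: String) {
  lazy val canonicalized: String = {
    println(s"canonicalizing: $text")        // runs at most once
    text.toLowerCase.replaceAll("\\s+", " ")
  }
}

object LazyValDemo extends App {
  val plan  = new Plan("SELECT  *  FROM  t")
  val cache = scala.collection.mutable.Map.empty[Plan, String]
  cache(plan) = "cached data"                // keying on the plan computes nothing
  println(plan.canonicalized)                // first read computes the value
  println(plan.canonicalized)                // second read reuses it, body not re-run
}
```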


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request #13653: [SPARK-15933][SQL][STREAMING] Refactored DF reade...

2016-06-13 Thread tdas
Github user tdas commented on a diff in the pull request:

https://github.com/apache/spark/pull/13653#discussion_r66887929
  
--- Diff: 
sql/core/src/main/scala/org/apache/spark/sql/streaming/DataStreamReader.scala 
---
@@ -0,0 +1,288 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements.  See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License.  You may obtain a copy of the License at
+ *
+ *http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.spark.sql.streaming
+
+import scala.collection.JavaConverters._
+
+import org.apache.spark.annotation.Experimental
+import org.apache.spark.internal.Logging
+import org.apache.spark.sql.{DataFrame, Dataset, SparkSession}
+import org.apache.spark.sql.execution.datasources.DataSource
+import org.apache.spark.sql.execution.streaming.StreamingRelation
+import org.apache.spark.sql.types.StructType
+
+@Experimental
+final class DataStreamReader private[sql](sparkSession: SparkSession) 
extends Logging {
+  /**
+   * :: Experimental ::
+   * Specifies the input data source format.
+   *
+   * @since 2.0.0
+   */
+  @Experimental
+  def format(source: String): DataStreamReader = {
+this.source = source
+this
+  }
+
+  /**
+   * :: Experimental ::
+   * Specifies the input schema. Some data sources (e.g. JSON) can infer 
the input schema
+   * automatically from data. By specifying the schema here, the 
underlying data source can
+   * skip the schema inference step, and thus speed up data loading.
+   *
+   * @since 2.0.0
+   */
+  @Experimental
+  def schema(schema: StructType): DataStreamReader = {
+this.userSpecifiedSchema = Option(schema)
+this
+  }
+
+  /**
+   * :: Experimental ::
+   * Adds an input option for the underlying data source.
+   *
+   * @since 2.0.0
+   */
+  @Experimental
+  def option(key: String, value: String): DataStreamReader = {
+this.extraOptions += (key -> value)
+this
+  }
+
+  /**
+   * :: Experimental ::
+   * Adds an input option for the underlying data source.
+   *
+   * @since 2.0.0
+   */
+  @Experimental
+  def option(key: String, value: Boolean): DataStreamReader = option(key, 
value.toString)
+
+  /**
+   * :: Experimental ::
+   * Adds an input option for the underlying data source.
+   *
+   * @since 2.0.0
+   */
+  @Experimental
+  def option(key: String, value: Long): DataStreamReader = option(key, 
value.toString)
+
+  /**
+   * :: Experimental ::
+   * Adds an input option for the underlying data source.
+   *
+   * @since 2.0.0
+   */
+  @Experimental
+  def option(key: String, value: Double): DataStreamReader = option(key, 
value.toString)
+
+  /**
+   * :: Experimental ::
+   * (Scala-specific) Adds input options for the underlying data source.
+   *
+   * @since 2.0.0
+   */
+  @Experimental
+  def options(options: scala.collection.Map[String, String]): 
DataStreamReader = {
+this.extraOptions ++= options
+this
+  }
+
+  /**
+   * :: Experimental ::
+   * Adds input options for the underlying data source.
+   *
+   * @since 2.0.0
+   */
+  @Experimental
+  def options(options: java.util.Map[String, String]): DataStreamReader = {
+this.options(options.asScala)
+this
+  }
+
+
+  /**
+   * :: Experimental ::
+   * Loads input data stream in as a [[DataFrame]], for data streams that 
don't require a path
+   * (e.g. external key-value stores).
+   *
+   * @since 2.0.0
+   */
+  @Experimental
+  def load(): DataFrame = {
--- End diff --

@marmbrus @rxin Should this be `load()`? In the general case, 
```
sdf.readStream
  .format("myFormat")
  .load("stringIdentifier")
```


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact 

[GitHub] spark issue #13653: [SPARK-15933][SQL][STREAMING] Refactored DF reader-write...

2016-06-13 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/13653
  
**[Test build #60452 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/60452/consoleFull)**
 for PR 13653 at commit 
[`a59498b`](https://github.com/apache/spark/commit/a59498bbe86609bc206de9b229052f35071049cb).


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request #13653: [SPARK-15933][SQL][STREAMING] Refactored DF reade...

2016-06-13 Thread tdas
Github user tdas commented on a diff in the pull request:

https://github.com/apache/spark/pull/13653#discussion_r66887779
  
--- Diff: 
sql/core/src/main/scala/org/apache/spark/sql/streaming/DataStreamReader.scala 
---
@@ -0,0 +1,288 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements.  See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License.  You may obtain a copy of the License at
+ *
+ *http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.spark.sql.streaming
+
+import scala.collection.JavaConverters._
+
+import org.apache.spark.annotation.Experimental
+import org.apache.spark.internal.Logging
+import org.apache.spark.sql.{DataFrame, Dataset, SparkSession}
+import org.apache.spark.sql.execution.datasources.DataSource
+import org.apache.spark.sql.execution.streaming.StreamingRelation
+import org.apache.spark.sql.types.StructType
+
+@Experimental
--- End diff --

add docs.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request #13653: [SPARK-15933][SQL][STREAMING] Refactored DF reade...

2016-06-13 Thread tdas
GitHub user tdas opened a pull request:

https://github.com/apache/spark/pull/13653

[SPARK-15933][SQL][STREAMING] Refactored DF reader-writer to use readStream 
and writeStream for streaming DFs

## What changes were proposed in this pull request?
Currently, the DataFrameReader/Writer has methods that are needed for both streaming and 
non-streaming DFs. This is quite awkward because each of those methods throws a runtime 
exception for one case or the other. So rather than having half the methods throw runtime 
exceptions, it's better to have a separate reader/writer API for streams.
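
For illustration, a rough sketch of the intended split (entry-point names from this PR; the `spark` session, formats, and paths are placeholders):

```
// batch-only API: read / write
val batchDF = spark.read.format("parquet").load("/tmp/in")      // placeholder path
batchDF.write.format("parquet").save("/tmp/out")

// streaming-only API: readStream / writeStream
import org.apache.spark.sql.types._
val schema   = new StructType().add("value", StringType)        // schema given up front so the source skips inference
val streamDF = spark.readStream.schema(schema).format("json").load("/tmp/in-stream")
streamDF.writeStream.format("memory").queryName("demo").startStream()
```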

## How was this patch tested?
Existing unit tests + two sets of unit tests for DataFrameReader/Writer and 
DataStreamReader/Writer.

You can merge this pull request into a Git repository by running:

$ git pull https://github.com/tdas/spark SPARK-15933

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/spark/pull/13653.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #13653






---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #11079: [SPARK-13197][SQL] When trying to select from the data f...

2016-06-13 Thread clockfly
Github user clockfly commented on the issue:

https://github.com/apache/spark/pull/11079
  
@thomastechs 

We should use `df.select("`a.c`")` to select a column with name "a.c".
The reason is that `df.select` can be used to select a nested column, for example:

```
scala> case class A(inner: Int)
scala> val df = Seq((A(1), 2)).toDF("a", "b")
scala> df.select("a.inner")
res10: org.apache.spark.sql.DataFrame = [inner: int]

scala> df.select("a.inner").show()
+-----+
|inner|
+-----+
|    1|
+-----+
```

So, I think this is NOT a bug, current behavior of select is expected.
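
For completeness, a small sketch of the backtick-quoted form when a column name really does contain a dot (column names made up here):

```
scala> val df3 = Seq((1, "x")).toDF("a.c", "b")
scala> df3.select("`a.c`").show()
```

The backticks keep the dot as part of the column name instead of treating it as struct-field access.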


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #13647: [SPARK-15784][ML][WIP]:Add Power Iteration Clustering to...

2016-06-13 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/13647
  
Test FAILed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/60450/
Test FAILed.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #13647: [SPARK-15784][ML][WIP]:Add Power Iteration Clustering to...

2016-06-13 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/13647
  
Merged build finished. Test FAILed.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #13647: [SPARK-15784][ML][WIP]:Add Power Iteration Clustering to...

2016-06-13 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/13647
  
**[Test build #60450 has 
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/60450/consoleFull)**
 for PR 13647 at commit 
[`78b70b4`](https://github.com/apache/spark/commit/78b70b41de3de54f9114f495d754fc4852294084).
 * This patch **fails Scala style tests**.
 * This patch merges cleanly.
 * This patch adds no public classes.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request #13115: [SPARK-12492] Using spark-sql commond to run quer...

2016-06-13 Thread zsxwing
Github user zsxwing commented on a diff in the pull request:

https://github.com/apache/spark/pull/13115#discussion_r66887332
  
--- Diff: 
sql/core/src/main/scala/org/apache/spark/sql/execution/QueryExecution.scala ---
@@ -110,24 +110,29 @@ class QueryExecution(val sparkSession: SparkSession, 
val logical: LogicalPlan) {
*/
   def hiveResultString(): Seq[String] = executedPlan match {
 case ExecutedCommandExec(desc: DescribeTableCommand) =>
-  // If it is a describe command for a Hive table, we want to have the 
output format
-  // be similar with Hive.
-  desc.run(sparkSession).map {
-case Row(name: String, dataType: String, comment) =>
-  Seq(name, dataType,
-Option(comment.asInstanceOf[String]).getOrElse(""))
-.map(s => String.format(s"%-20s", s))
-.mkString("\t")
+  SQLExecution.withNewExecutionId(sparkSession, this) {
+// If it is a describe command for a Hive table, we want to have 
the output format
+// be similar with Hive.
+desc.run(sparkSession).map {
+  case Row(name: String, dataType: String, comment) =>
+Seq(name, dataType,
+  Option(comment.asInstanceOf[String]).getOrElse(""))
+  .map(s => String.format(s"%-20s", s))
+  .mkString("\t")
+}
   }
 case command: ExecutedCommandExec =>
-  command.executeCollect().map(_.getString(0))
-
+  SQLExecution.withNewExecutionId(sparkSession, this) {
+command.executeCollect().map(_.getString(0))
--- End diff --

Okay. @KaiXinXiaoLei could you remove the wrapper for this line?


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #13647: [SPARK-15784][ML][WIP]:Add Power Iteration Clustering to...

2016-06-13 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/13647
  
**[Test build #60450 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/60450/consoleFull)**
 for PR 13647 at commit 
[`78b70b4`](https://github.com/apache/spark/commit/78b70b41de3de54f9114f495d754fc4852294084).


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #13638: [SPARK-15915][SQL] CacheManager should use canonicalized...

2016-06-13 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/13638
  
**[Test build #60451 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/60451/consoleFull)**
 for PR 13638 at commit 
[`11dc433`](https://github.com/apache/spark/commit/11dc433975719288a3694746c60a571d3f349b22).


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #13498: [SPARK-15011][SQL] Re-enable 'analyze MetastoreRelations...

2016-06-13 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/13498
  
**[Test build #3095 has 
finished](https://amplab.cs.berkeley.edu/jenkins/job/NewSparkPullRequestBuilder/3095/consoleFull)**
 for PR 13498 at commit 
[`655b0c7`](https://github.com/apache/spark/commit/655b0c73a54cbad3ac3c611a3c869feffbe9a1b5).
 * This patch **fails Spark unit tests**.
 * This patch merges cleanly.
 * This patch adds no public classes.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #13632: [SPARK-15910][SQL] Check schema consistency when using K...

2016-06-13 Thread cloud-fan
Github user cloud-fan commented on the issue:

https://github.com/apache/spark/pull/13632
  
LGTM, pending jenkins


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #13492: [SPARK-15749][SQL]make the error message more meaningful

2016-06-13 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/13492
  
**[Test build #60449 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/60449/consoleFull)**
 for PR 13492 at commit 
[`d0fcd33`](https://github.com/apache/spark/commit/d0fcd330c7a427fc3dccdbdb457a2139f6a238f0).


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request #13115: [SPARK-12492] Using spark-sql commond to run quer...

2016-06-13 Thread andrewor14
Github user andrewor14 commented on a diff in the pull request:

https://github.com/apache/spark/pull/13115#discussion_r66886807
  
--- Diff: 
sql/core/src/main/scala/org/apache/spark/sql/execution/QueryExecution.scala ---
@@ -110,24 +110,29 @@ class QueryExecution(val sparkSession: SparkSession, 
val logical: LogicalPlan) {
*/
   def hiveResultString(): Seq[String] = executedPlan match {
 case ExecutedCommandExec(desc: DescribeTableCommand) =>
-  // If it is a describe command for a Hive table, we want to have the 
output format
-  // be similar with Hive.
-  desc.run(sparkSession).map {
-case Row(name: String, dataType: String, comment) =>
-  Seq(name, dataType,
-Option(comment.asInstanceOf[String]).getOrElse(""))
-.map(s => String.format(s"%-20s", s))
-.mkString("\t")
+  SQLExecution.withNewExecutionId(sparkSession, this) {
+// If it is a describe command for a Hive table, we want to have 
the output format
+// be similar with Hive.
+desc.run(sparkSession).map {
+  case Row(name: String, dataType: String, comment) =>
+Seq(name, dataType,
+  Option(comment.asInstanceOf[String]).getOrElse(""))
+  .map(s => String.format(s"%-20s", s))
+  .mkString("\t")
+}
   }
 case command: ExecutedCommandExec =>
-  command.executeCollect().map(_.getString(0))
-
+  SQLExecution.withNewExecutionId(sparkSession, this) {
+command.executeCollect().map(_.getString(0))
--- End diff --

not sure. It's probably fine


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #13637: [SPARK-15914][SQL] Add deprecated method back to SQLCont...

2016-06-13 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/13637
  
**[Test build #60447 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/60447/consoleFull)**
 for PR 13637 at commit 
[`0cc81b1`](https://github.com/apache/spark/commit/0cc81b1d7dc3c4e05e3adf5f72c25c320a043a0c).


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #13546: [SPARK-15808] [SQL] File Format Checking When Appending ...

2016-06-13 Thread cloud-fan
Github user cloud-fan commented on the issue:

https://github.com/apache/spark/pull/13546
  
LGTM


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #13546: [SPARK-15808] [SQL] File Format Checking When Appending ...

2016-06-13 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/13546
  
**[Test build #60448 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/60448/consoleFull)**
 for PR 13546 at commit 
[`9d9d263`](https://github.com/apache/spark/commit/9d9d2632fde85c62b28454f33b24f7ee8fb6f15e).


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org


