[GitHub] spark issue #13653: [SPARK-15933][SQL][STREAMING] Refactored DF reader-write...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/13653 **[Test build #60452 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/60452/consoleFull)** for PR 13653 at commit [`a59498b`](https://github.com/apache/spark/commit/a59498bbe86609bc206de9b229052f35071049cb). * This patch **fails PySpark unit tests**. * This patch merges cleanly. * This patch adds no public classes. --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark pull request #13655: [SPARK-15935][PySpark]Enable test for sql/streami...
Github user brkyvz commented on a diff in the pull request: https://github.com/apache/spark/pull/13655#discussion_r66895475 --- Diff: dev/sparktestsupport/modules.py --- @@ -337,6 +337,7 @@ def __hash__(self): "pyspark.sql.group", "pyspark.sql.functions", "pyspark.sql.readwriter", +"pyspark.sql.streaming", --- End diff -- this is pretty significant. I had no idea this file existed. I wonder if we could possibly add a check to the linter to catch new Python files appearing but not being added to this file? It's not a blocker message but an advisory one.
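The advisory check suggested above could be sketched roughly as follows. This is a minimal sketch, not Spark's actual linter: the function name and the way module names are collected are assumptions; `modules.py` really stores dotted test-module names (e.g. `"pyspark.sql.streaming"`) under each `Module` definition.

```python
import os

def find_unregistered_modules(python_files, registered_modules, package_root="pyspark"):
    """Advisory lint check (hypothetical): report Python source files that are
    not listed in dev/sparktestsupport/modules.py.

    python_files: paths like "python/pyspark/sql/streaming.py"
    registered_modules: set of dotted names, e.g. {"pyspark.sql.streaming"}
    """
    missing = []
    for path in python_files:
        name = os.path.splitext(os.path.basename(path))[0]
        if name.startswith("_"):  # skip __init__.py and private helpers
            continue
        parts = path.replace(os.sep, "/").split("/")
        if package_root in parts:
            # "python/pyspark/sql/streaming.py" -> "pyspark.sql.streaming"
            dotted = ".".join(parts[parts.index(package_root):])[:-len(".py")]
            if dotted not in registered_modules:
                missing.append(dotted)
    return missing
```

Run as part of the lint step, a non-empty result would print an advisory warning rather than fail the build.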
[GitHub] spark issue #13037: [SPARK-1301] [Web UI] Added anchor links to Accumulators...
Github user kayousterhout commented on the issue: https://github.com/apache/spark/pull/13037 The screenshot looks great to me (haven't looked at the code yet) -- @andrewor14 / @srowen, thoughts?
[GitHub] spark issue #13638: [SPARK-15915][SQL] CacheManager should use canonicalized...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/13638 **[Test build #60458 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/60458/consoleFull)** for PR 13638 at commit [`3d27607`](https://github.com/apache/spark/commit/3d27607982a3794c438221153b8e078389da146b).
[GitHub] spark issue #13620: [SPARK-15590] [WEBUI] Paginate Job Table in Jobs tab
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/13620 Test FAILed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/60444/ Test FAILed.
[GitHub] spark issue #13620: [SPARK-15590] [WEBUI] Paginate Job Table in Jobs tab
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/13620 Merged build finished. Test FAILed.
[GitHub] spark pull request #13561: [SPARK-15824][SQL] Run 'with ... insert ... selec...
Github user hvanhovell commented on a diff in the pull request: https://github.com/apache/spark/pull/13561#discussion_r66895096 --- Diff: sql/hive-thriftserver/src/main/scala/org/apache/spark/sql/hive/thriftserver/SparkExecuteStatementOperation.scala --- @@ -225,7 +225,10 @@ private[hive] class SparkExecuteStatementOperation( if (useIncrementalCollect) { result.toLocalIterator.asScala } else { - result.collect().iterator + sqlContext.sessionState.executePlan(result.logicalPlan).executedPlan match { --- End diff -- use `result.queryExecution.executedPlan`?
[GitHub] spark issue #13620: [SPARK-15590] [WEBUI] Paginate Job Table in Jobs tab
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/13620 **[Test build #60444 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/60444/consoleFull)** for PR 13620 at commit [`4db0e09`](https://github.com/apache/spark/commit/4db0e0978cd2cc57814733e6a6360407f10cb37b). * This patch **fails Spark unit tests**. * This patch merges cleanly. * This patch adds no public classes.
[GitHub] spark pull request #13638: [SPARK-15915][SQL] CacheManager should use canoni...
Github user ueshin commented on a diff in the pull request: https://github.com/apache/spark/pull/13638#discussion_r66895037 --- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/CacheManager.scala --- @@ -87,7 +87,7 @@ private[sql] class CacheManager extends Logging { query: Dataset[_], tableName: Option[String] = None, storageLevel: StorageLevel = MEMORY_AND_DISK): Unit = writeLock { -val planToCache = query.queryExecution.analyzed +val planToCache = query.queryExecution.analyzed.canonicalized --- End diff -- It seems we don't need them for now. I'll revert the changes.
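The effect of keying the cache on the canonicalized plan, as in the diff above, can be illustrated with a toy model. This is a sketch only (all names hypothetical; CacheManager itself is Scala and Catalyst plans are far richer): canonicalization renames cosmetic details such as attribute identifiers positionally, so two semantically identical plans compare and hash equal.

```python
from collections import namedtuple

# Toy "plan": a tree of (operator, attribute names, children). Real Catalyst
# plans carry unique expression IDs; canonicalization normalizes them.
Plan = namedtuple("Plan", ["op", "attrs", "children"])

def canonicalize(plan, counter=None):
    """Rewrite attribute names to positional placeholders so that plans
    differing only in cosmetic naming become structurally equal."""
    if counter is None:
        counter = [0]
    attrs = []
    for _ in plan.attrs:
        attrs.append("attr_%d" % counter[0])
        counter[0] += 1
    children = tuple(canonicalize(c, counter) for c in plan.children)
    return Plan(plan.op, tuple(attrs), children)

class ToyCacheManager:
    """Hypothetical analogue of keying the cache on the canonicalized plan."""
    def __init__(self):
        self._cache = {}  # canonicalized plan -> cached data

    def cache(self, plan, data):
        self._cache[canonicalize(plan)] = data

    def lookup(self, plan):
        return self._cache.get(canonicalize(plan))
```

With this keying, a query that is identical except for attribute IDs (e.g. `a#1` vs `a#42`) hits the same cache entry, which is the point of SPARK-15915.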
[GitHub] spark issue #13655: [SPARK-15935][PySpark]Enable test for sql/streaming.py a...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/13655 **[Test build #60457 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/60457/consoleFull)** for PR 13655 at commit [`0b15d88`](https://github.com/apache/spark/commit/0b15d88b0cde790e9aa58eb34aa22c57642b8886).
[GitHub] spark issue #13632: [SPARK-15910][SQL] Check schema consistency when using K...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/13632 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/60443/ Test PASSed.
[GitHub] spark pull request #13620: [SPARK-15590] [WEBUI] Paginate Job Table in Jobs ...
Github user nblintao commented on a diff in the pull request: https://github.com/apache/spark/pull/13620#discussion_r66894846 --- Diff: core/src/main/scala/org/apache/spark/ui/jobs/AllJobsPage.scala --- @@ -369,3 +361,246 @@ private[ui] class AllJobsPage(parent: JobsTab) extends WebUIPage("") { } } } + +private[ui] class JobTableRowData( +val jobData: JobUIData, +val lastStageName: String, +val lastStageDescription: String, +val duration: Long, +val formattedDuration: String, +val submissionTime: Long, +val formattedSubmissionTime: String, +val jobDescription: NodeSeq, +val detailUrl: String) + +private[ui] class JobDataSource( +jobs: Seq[JobUIData], +stageIdToInfo: HashMap[Int, StageInfo], +stageIdToData: HashMap[(Int, Int), StageUIData], +basePath: String, +currentTime: Long, +pageSize: Int, +sortColumn: String, +desc: Boolean) extends PagedDataSource[JobTableRowData](pageSize) { + + // Convert JobUIData to JobTableRowData which contains the final contents to show in the table + // so that we can avoid creating duplicate contents during sorting the data + private val data = jobs.map(jobRow).sorted(ordering(sortColumn, desc)) + + private var _slicedJobIds: Set[Int] = null + + override def dataSize: Int = data.size + + override def sliceData(from: Int, to: Int): Seq[JobTableRowData] = { +val r = data.slice(from, to) +_slicedJobIds = r.map(_.jobData.jobId).toSet +r + } + + def slicedJobIds: Set[Int] = _slicedJobIds --- End diff -- Sorry, it's deprecated. I will remove it. Thanks.
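The pattern in the `JobDataSource` diff above (precompute and sort the row data once, then serve each page as a slice) can be sketched generically. This is a minimal sketch with hypothetical names, not the Spark UI API:

```python
class PagedDataSource:
    """Minimal analogue of the paged-table pattern: rows are converted and
    sorted once up front, so sorting never re-renders duplicate contents,
    and each page is just a slice of the sorted list."""

    def __init__(self, rows, page_size, sort_key, desc=False):
        if page_size <= 0:
            raise ValueError("page_size must be positive")
        self.page_size = page_size
        # Sort once at construction time, like JobDataSource's `data` val.
        self.data = sorted(rows, key=sort_key, reverse=desc)

    def page(self, page_number):
        """Return the rows for a 1-indexed page."""
        start = (page_number - 1) * self.page_size
        return self.data[start:start + self.page_size]
```

For example, a job table sorted by duration descending with two rows per page serves jobs in duration order, page by page.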
[GitHub] spark issue #13632: [SPARK-15910][SQL] Check schema consistency when using K...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/13632 Merged build finished. Test PASSed.
[GitHub] spark issue #13655: [SPARK-15935][PySpark]Enable test for sql/streaming.py a...
Github user zsxwing commented on the issue: https://github.com/apache/spark/pull/13655 /cc @tdas @brkyvz
[GitHub] spark pull request #13655: [SPARK-15935][PySpark]Enable test for sql/streami...
GitHub user zsxwing opened a pull request: https://github.com/apache/spark/pull/13655 [SPARK-15935][PySpark]Enable test for sql/streaming.py and fix these tests ## What changes were proposed in this pull request? This PR just enables tests for sql/streaming.py and also fixes the failures. ## How was this patch tested? Existing unit tests. You can merge this pull request into a Git repository by running: $ git pull https://github.com/zsxwing/spark python-streaming-test Alternatively you can review and apply these changes as the patch at: https://github.com/apache/spark/pull/13655.patch To close this pull request, make a commit to your master/trunk branch with (at least) the following in the commit message: This closes #13655 commit 0b15d88b0cde790e9aa58eb34aa22c57642b8886 Author: Shixiong Zhu Date: 2016-06-14T01:00:13Z Enable test for sql/streaming.py and fix these tests
[GitHub] spark issue #13632: [SPARK-15910][SQL] Check schema consistency when using K...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/13632 **[Test build #60443 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/60443/consoleFull)** for PR 13632 at commit [`e889f85`](https://github.com/apache/spark/commit/e889f85165acbb1d685e70c959abe04955d24a17). * This patch passes all tests. * This patch merges cleanly. * This patch adds no public classes.
[GitHub] spark pull request #13592: [SPARK-15863][SQL][DOC] Initial SQL programming g...
Github user clockfly commented on a diff in the pull request: https://github.com/apache/spark/pull/13592#discussion_r66894492 --- Diff: docs/sql-programming-guide.md --- @@ -12,130 +12,130 @@ title: Spark SQL and DataFrames Spark SQL is a Spark module for structured data processing. Unlike the basic Spark RDD API, the interfaces provided by Spark SQL provide Spark with more information about the structure of both the data and the computation being performed. Internally, Spark SQL uses this extra information to perform extra optimizations. There are several ways to -interact with Spark SQL including SQL, the DataFrames API and the Datasets API. When computing a result +interact with Spark SQL including SQL and the Datasets API. When computing a result the same execution engine is used, independent of which API/language you are using to express the -computation. This unification means that developers can easily switch back and forth between the -various APIs based on which provides the most natural way to express a given transformation. +computation. This unification means that developers can easily switch back and forth between +different APIs based on which provides the most natural way to express a given transformation. All of the examples on this page use sample data included in the Spark distribution and can be run in the `spark-shell`, `pyspark` shell, or `sparkR` shell. ## SQL -One use of Spark SQL is to execute SQL queries written using either a basic SQL syntax or HiveQL. +One use of Spark SQL is to execute SQL queries. Spark SQL can also be used to read data from an existing Hive installation. For more on how to configure this feature, please refer to the [Hive Tables](#hive-tables) section. When running -SQL from within another programming language the results will be returned as a [DataFrame](#DataFrames). +SQL from within another programming language the results will be returned as a [Dataset\[Row\]](#datasets).
You can also interact with the SQL interface using the [command-line](#running-the-spark-sql-cli) or over [JDBC/ODBC](#running-the-thrift-jdbcodbc-server). -## DataFrames +## Datasets and DataFrames -A DataFrame is a distributed collection of data organized into named columns. It is conceptually -equivalent to a table in a relational database or a data frame in R/Python, but with richer -optimizations under the hood. DataFrames can be constructed from a wide array of [sources](#data-sources) such -as: structured data files, tables in Hive, external databases, or existing RDDs. +A Dataset is a new interface added in Spark 1.6 that tries to provide the benefits of RDDs (strong +typing, ability to use powerful lambda functions) with the benefits of Spark SQL's optimized +execution engine. A Dataset can be [constructed](#creating-datasets) from JVM objects and then +manipulated using functional transformations (map, flatMap, filter, etc.). -The DataFrame API is available in [Scala](api/scala/index.html#org.apache.spark.sql.DataFrame), -[Java](api/java/index.html?org/apache/spark/sql/DataFrame.html), -[Python](api/python/pyspark.sql.html#pyspark.sql.DataFrame), and [R](api/R/index.html). +The Dataset API is the successor of the DataFrame API, which was introduced in Spark 1.3. In Spark +2.0, Datasets and DataFrames are unified, and DataFrames are now equivalent to Datasets of `Row`s. +In fact, `DataFrame` is simply a type alias of `Dataset[Row]` in [the Scala API][scala-datasets]. +However, [Java API][java-datasets] users must use `Dataset` instead. -## Datasets +[scala-datasets]: api/scala/index.html#org.apache.spark.sql.Dataset +[java-datasets]: api/java/index.html?org/apache/spark/sql/Dataset.html -A Dataset is a new experimental interface added in Spark 1.6 that tries to provide the benefits of -RDDs (strong typing, ability to use powerful lambda functions) with the benefits of Spark SQL's -optimized execution engine.
A Dataset can be [constructed](#creating-datasets) from JVM objects and then manipulated -using functional transformations (map, flatMap, filter, etc.). +Python does not have support for the Dataset API, but due to its dynamic nature many of the +benefits are already available (i.e. you can access the field of a row by name naturally +`row.columnName`). The case for R is similar. -The unified Dataset API can be used both in [Scala](api/scala/index.html#org.apache.spark.sql.Dataset) and -[Java](api/java/index.html?org/apache/spark/sql/Dataset.html). Python does not yet have support for -the Dataset API, but due to its dynamic nature many of the benefits are already available (i.e. you can -access the field of a row by name naturally `row.columnName`).
[GitHub] spark issue #12938: [SPARK-15162][SPARK-15164][PySpark][DOCS][ML] update som...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/12938 **[Test build #60456 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/60456/consoleFull)** for PR 12938 at commit [`d925f38`](https://github.com/apache/spark/commit/d925f3861ac96ddb09682b863160b18adcb4473d).
[GitHub] spark pull request #13592: [SPARK-15863][SQL][DOC] Initial SQL programming g...
Github user clockfly commented on a diff in the pull request: https://github.com/apache/spark/pull/13592#discussion_r66894292 --- Diff: docs/sql-programming-guide.md --- @@ -12,130 +12,130 @@ title: Spark SQL and DataFrames
[GitHub] spark issue #13563: [SPARK-15826] [CORE] PipedRDD to allow configurable char...
Github user tejasapatil commented on the issue: https://github.com/apache/spark/pull/13563 Jenkins retest please
[GitHub] spark pull request #13592: [SPARK-15863][SQL][DOC] Initial SQL programming g...
Github user clockfly commented on a diff in the pull request: https://github.com/apache/spark/pull/13592#discussion_r66894132 --- Diff: docs/sql-programming-guide.md --- @@ -12,130 +12,130 @@ title: Spark SQL and DataFrames
[GitHub] spark issue #13654: [SPARK-15868] [Web UI] Executors table in Executors tab ...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/13654 **[Test build #60455 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/60455/consoleFull)** for PR 13654 at commit [`35b280a`](https://github.com/apache/spark/commit/35b280a1882fd3f5ae34c71cb00a12bf12852a90).
[GitHub] spark issue #13648: [SPARK-15932][SQL][DOC] document the contract of encoder...
Github user hvanhovell commented on the issue: https://github.com/apache/spark/pull/13648 LGTM
[GitHub] spark pull request #13592: [SPARK-15863][SQL][DOC] Initial SQL programming g...
Github user clockfly commented on a diff in the pull request: https://github.com/apache/spark/pull/13592#discussion_r66893943

--- Diff: docs/sql-programming-guide.md ---

@@ -12,130 +12,130 @@ title: Spark SQL and DataFrames

 Spark SQL is a Spark module for structured data processing. Unlike the basic Spark RDD API, the
 interfaces provided by Spark SQL provide Spark with more information about the structure of both the
 data and the computation being performed. Internally, Spark SQL uses this extra information to
 perform extra optimizations. There are several ways to
-interact with Spark SQL including SQL, the DataFrames API and the Datasets API. When computing a result
+interact with Spark SQL including SQL and the Datasets API. When computing a result
 the same execution engine is used, independent of which API/language you are using to express the
-computation. This unification means that developers can easily switch back and forth between the
-various APIs based on which provides the most natural way to express a given transformation.
+computation. This unification means that developers can easily switch back and forth between
+different APIs based on which provides the most natural way to express a given transformation.

 All of the examples on this page use sample data included in the Spark distribution and can be run in
 the `spark-shell`, `pyspark` shell, or `sparkR` shell.

 ## SQL

-One use of Spark SQL is to execute SQL queries written using either a basic SQL syntax or HiveQL.
+One use of Spark SQL is to execute SQL queries.
 Spark SQL can also be used to read data from an existing Hive installation. For more on how to
 configure this feature, please refer to the [Hive Tables](#hive-tables) section. When running
-SQL from within another programming language the results will be returned as a [DataFrame](#DataFrames).
+SQL from within another programming language the results will be returned as a [Dataset\[Row\]](#datasets).

 You can also interact with the SQL interface using the [command-line](#running-the-spark-sql-cli)
 or over [JDBC/ODBC](#running-the-thrift-jdbcodbc-server).

-## DataFrames
+## Datasets and DataFrames

-A DataFrame is a distributed collection of data organized into named columns. It is conceptually
-equivalent to a table in a relational database or a data frame in R/Python, but with richer
-optimizations under the hood. DataFrames can be constructed from a wide array of [sources](#data-sources) such
-as: structured data files, tables in Hive, external databases, or existing RDDs.
+A Dataset is a new interface added in Spark 1.6 that tries to provide the benefits of RDDs (strong
+typing, ability to use powerful lambda functions) with the benefits of Spark SQL's optimized
+execution engine. A Dataset can be [constructed](#creating-datasets) from JVM objects and then
+manipulated using functional transformations (map, flatMap, filter, etc.).

-The DataFrame API is available in [Scala](api/scala/index.html#org.apache.spark.sql.DataFrame),
-[Java](api/java/index.html?org/apache/spark/sql/DataFrame.html),
-[Python](api/python/pyspark.sql.html#pyspark.sql.DataFrame), and [R](api/R/index.html).
+The Dataset API is the successor of the DataFrame API, which was introduced in Spark 1.3. In Spark
+2.0, Datasets and DataFrames are unified, and DataFrames are now equivalent to Datasets of `Row`s.
+In fact, `DataFrame` is simply a type alias of `Dataset[Row]` in [the Scala API][scala-datasets].
+However, [Java API][java-datasets] users must use `Dataset<Row>` instead.

-## Datasets
+[scala-datasets]: api/scala/index.html#org.apache.spark.sql.Dataset
+[java-datasets]: api/java/index.html?org/apache/spark/sql/Dataset.html

-A Dataset is a new experimental interface added in Spark 1.6 that tries to provide the benefits of
-RDDs (strong typing, ability to use powerful lambda functions) with the benefits of Spark SQL's
-optimized execution engine. A Dataset can be [constructed](#creating-datasets) from JVM objects and then manipulated
-using functional transformations (map, flatMap, filter, etc.).
+Python does not have support for the Dataset API, but due to its dynamic nature many of the
+benefits are already available (i.e. you can access the field of a row by name naturally
+`row.columnName`). The case for R is similar.

-The unified Dataset API can be used both in [Scala](api/scala/index.html#org.apache.spark.sql.Dataset) and
-[Java](api/java/index.html?org/apache/spark/sql/Dataset.html). Python does not yet have support for
-the Dataset API, but due to its dynamic nature many of the benefits are already available (i.e. you can
-access the field of a row by name naturally `row.columnName`).
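The guide text in the hunk above notes that Python already gets much of the Dataset API's ergonomics for free because row fields are addressable by name. A plain-Python sketch of that idea, using `collections.namedtuple` as a stand-in for a row object (this is an illustration, not PySpark's actual `Row` implementation):

```python
from collections import namedtuple

# Stand-in for a SQL row: a lightweight record whose fields are
# addressable by name at runtime, with no compile-time typed Dataset.
Row = namedtuple("Row", ["name", "age"])

row = Row(name="Alice", age=29)
print(row.name)  # field access by name, as in `row.columnName`
print(row.age)
```

This dynamic attribute access is why the guide says many Dataset benefits are "already available" in Python and R even without a typed Dataset API.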
[GitHub] spark pull request #13592: [SPARK-15863][SQL][DOC] Initial SQL programming g...
Github user clockfly commented on a diff in the pull request: https://github.com/apache/spark/pull/13592#discussion_r66893918 --- Diff: docs/sql-programming-guide.md --- [same hunk as quoted above]
[GitHub] spark pull request #13654: [SPARK-15868] [Web UI] Executors table in Executo...
GitHub user ajbozarth opened a pull request: https://github.com/apache/spark/pull/13654 [SPARK-15868] [Web UI] Executors table in Executors tab should sort Executor IDs in numerical order

## What changes were proposed in this pull request?

Currently the Executors table sorts by id using a string sort (since that's what it is stored as). Since the id is a number (other than the driver) we should be sorting numerically. I have changed both the initial sort on page load as well as the table sort to sort on id numerically, treating non-numeric strings (like the driver) as "-1".

## How was this patch tested?

Manually tested and dev/run-tests

![pageload](https://cloud.githubusercontent.com/assets/13952758/16027882/d32edd0a-318e-11e6-9faf-fc972b7c36ab.png) ![sorted](https://cloud.githubusercontent.com/assets/13952758/16027883/d34541c6-318e-11e6-9ed7-6bfc0cd4152e.png)

You can merge this pull request into a Git repository by running:

    $ git pull https://github.com/ajbozarth/spark spark15868

Alternatively you can review and apply these changes as the patch at: https://github.com/apache/spark/pull/13654.patch

To close this pull request, make a commit to your master/trunk branch with (at least) the following in the commit message: This closes #13654

commit 35b280a1882fd3f5ae34c71cb00a12bf12852a90
Author: Alex Bozarth
Date: 2016-06-14T00:08:52Z

    Fixed sorting on executor id
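The sorting rule described in the PR (numeric order, with non-numeric IDs such as "driver" treated as -1) can be sketched in a few lines. The actual patch lives in the web UI's JavaScript table-sorting code, so the Python below is only an illustration of the key function, not the real implementation:

```python
def executor_sort_key(executor_id: str) -> int:
    # Numeric IDs sort by value; anything non-numeric (e.g. "driver")
    # is treated as -1 so it sorts ahead of all numbered executors.
    try:
        return int(executor_id)
    except ValueError:
        return -1

ids = ["10", "2", "driver", "1"]
print(sorted(ids, key=executor_sort_key))  # ['driver', '1', '2', '10']
```

A plain string sort would instead yield `['1', '10', '2', 'driver']`, which is the bug the PR fixes.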
[GitHub] spark pull request #13632: [SPARK-15910][SQL] Check schema consistency when ...
Github user asfgit closed the pull request at: https://github.com/apache/spark/pull/13632
[GitHub] spark issue #13632: [SPARK-15910][SQL] Check schema consistency when using K...
Github user cloud-fan commented on the issue: https://github.com/apache/spark/pull/13632 thanks, merging to master/2.0
[GitHub] spark issue #13652: [SPARK-] Fix incorrect days to millis conversion
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/13652 Merged build finished. Test PASSed.
[GitHub] spark issue #13652: [SPARK-] Fix incorrect days to millis conversion
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/13652 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/60441/
[GitHub] spark issue #13652: [SPARK-] Fix incorrect days to millis conversion
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/13652 **[Test build #60441 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/60441/consoleFull)** for PR 13652 at commit [`20904c4`](https://github.com/apache/spark/commit/20904c4323c5f908a42ea1b4cea6701626cebe28). * This patch passes all tests. * This patch merges cleanly. * This patch adds no public classes.
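The patch for the days-to-millis fix isn't quoted in this thread, so the snippet below is only a general illustration of the pitfall such titles usually refer to (an assumption, not Spark's actual fix): a "days since epoch" count must be converted using a fixed 86,400,000 ms day relative to the UTC epoch, because DST-observing local calendars contain 23- and 25-hour days that would make the conversion drift:

```python
from datetime import datetime, timedelta, timezone

MILLIS_PER_DAY = 24 * 60 * 60 * 1000  # 86,400,000

def days_to_millis(days: int) -> int:
    # Each calendar day is exactly 86,400,000 ms from the Unix epoch
    # when counted in UTC; never route this conversion through a
    # DST-observing local timezone.
    return days * MILLIS_PER_DAY

epoch = datetime(1970, 1, 1, tzinfo=timezone.utc)
print(epoch + timedelta(milliseconds=days_to_millis(1)))  # 1970-01-02 00:00:00+00:00
```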
[GitHub] spark issue #13632: [SPARK-15910][SQL] Check schema consistency when using K...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/13632 Merged build finished. Test PASSed.
[GitHub] spark issue #13632: [SPARK-15910][SQL] Check schema consistency when using K...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/13632 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/60442/
[GitHub] spark pull request #13592: [SPARK-15863][SQL][DOC] Initial SQL programming g...
Github user clockfly commented on a diff in the pull request: https://github.com/apache/spark/pull/13592#discussion_r66893224 --- Diff: docs/sql-programming-guide.md --- [same hunk as quoted above; comment targets the line:] +The Dataset API is the successor of the DataFrame API, which was introduced in Spark 1.3. In Spark --- End diff -- I will remove this line ~~The Dataset API is the successor of the DataFrame API, which was introduced in Spark 1.3.~~
[GitHub] spark issue #13632: [SPARK-15910][SQL] Check schema consistency when using K...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/13632 **[Test build #60442 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/60442/consoleFull)** for PR 13632 at commit [`35a1ee0`](https://github.com/apache/spark/commit/35a1ee09b82d4d66a95d4e88d8dda4e056cd0e11). * This patch passes all tests. * This patch merges cleanly. * This patch adds no public classes.
[GitHub] spark issue #13037: [SPARK-1301] [Web UI] Added anchor links to Accumulators...
Github user ajbozarth commented on the issue: https://github.com/apache/spark/pull/13037 ![collapseopen](https://cloud.githubusercontent.com/assets/13952758/16027758/6dd2b5fe-318d-11e6-9c56-3862af2fe479.png) ![collapseclosed](https://cloud.githubusercontent.com/assets/13952758/16027759/6de5fc04-318d-11e6-8827-e0e1c1e80e72.png)
[GitHub] spark issue #13037: [SPARK-1301] [Web UI] Added anchor links to Accumulators...
Github user ajbozarth commented on the issue: https://github.com/apache/spark/pull/13037 Ran into some bad merge/rebase issues and had to do a forced push with my new code (sorry, the previous commits are gone, but so are all the changes in them anyway). I've added both of @kayousterhout's ideas: the link to the tasks table via "X Completed Tasks" as well as a collapsible-table feature, which I wrote to be reusable with minimal effort elsewhere in the UI. Since the link is a bit redundant I can remove it if everyone wants; I don't think it hurts to have both, though. I also left the table open by default (since users are currently used to seeing it). I will be adding screenshots in a bit; I forgot to take them before dealing with all the rebasing issues.
[GitHub] spark issue #12876: [SPARK-15095] [SQL] drop binary mode in ThriftServer
Github user epahomov commented on the issue: https://github.com/apache/spark/pull/12876 I've created a ticket to revert these changes - https://issues.apache.org/jira/browse/SPARK-15934
[GitHub] spark pull request #13592: [SPARK-15863][SQL][DOC] Initial SQL programming g...
Github user clockfly commented on a diff in the pull request: https://github.com/apache/spark/pull/13592#discussion_r66892238 --- Diff: docs/sql-programming-guide.md --- [same hunk as quoted above; comment targets the line:] +execution engine. A Dataset can be [constructed](#creating-datasets) from JVM objects and then --- End diff -- Maybe `[constructed](#creating-datasets) from JVM objects` => `[created from JVM objects](#creating-datasets)`
[GitHub] spark issue #13037: [SPARK-1301] [Web UI] Added anchor links to Accumulators...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/13037 **[Test build #60454 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/60454/consoleFull)** for PR 13037 at commit [`9e62eb3`](https://github.com/apache/spark/commit/9e62eb3344e41814b12a783ec51bec871dc8320b).
[GitHub] spark pull request #13592: [SPARK-15863][SQL][DOC] Initial SQL programming g...
Github user clockfly commented on a diff in the pull request: https://github.com/apache/spark/pull/13592#discussion_r66891847 --- Diff: docs/sql-programming-guide.md --- [same hunk as quoted above; comment targets the line:] +typing, ability to use powerful lambda functions) with the benefits of Spark SQL's optimized --- End diff -- "with the benefits of" => "as well as" ?
[GitHub] spark pull request #13592: [SPARK-15863][SQL][DOC] Initial SQL programming g...
Github user clockfly commented on a diff in the pull request: https://github.com/apache/spark/pull/13592#discussion_r66891502 --- Diff: docs/sql-programming-guide.md --- @@ -12,130 +12,130 @@ title: Spark SQL and DataFrames Spark SQL is a Spark module for structured data processing. Unlike the basic Spark RDD API, the interfaces provided by Spark SQL provide Spark with more information about the structure of both the data and the computation being performed. Internally, Spark SQL uses this extra information to perform extra optimizations. There are several ways to -interact with Spark SQL including SQL, the DataFrames API and the Datasets API. When computing a result +interact with Spark SQL including SQL and the Datasets API. When computing a result --- End diff -- "Dataset API" instead of "Datasets API"?
[GitHub] spark issue #13563: [SPARK-15826] [CORE] PipedRDD to allow configurable char...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/13563 Test FAILed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/60440/ Test FAILed.
[GitHub] spark issue #13563: [SPARK-15826] [CORE] PipedRDD to allow configurable char...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/13563 Merged build finished. Test FAILed.
[GitHub] spark issue #13563: [SPARK-15826] [CORE] PipedRDD to allow configurable char...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/13563 **[Test build #60440 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/60440/consoleFull)** for PR 13563 at commit [`fecd730`](https://github.com/apache/spark/commit/fecd730e982063bb19208d7a05bed8d6bbcbe776). * This patch **fails Spark unit tests**. * This patch merges cleanly. * This patch adds no public classes.
[GitHub] spark issue #12938: [SPARK-15162][SPARK-15164][PySpark][DOCS][ML] update som...
Github user holdenk commented on the issue: https://github.com/apache/spark/pull/12938 Which of course seems to be more that things are flaky than that the commit actually introduced it :(
[GitHub] spark issue #13498: [SPARK-15011][SQL] Re-enable 'analyze MetastoreRelations...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/13498 **[Test build #3092 has finished](https://amplab.cs.berkeley.edu/jenkins/job/NewSparkPullRequestBuilder/3092/consoleFull)** for PR 13498 at commit [`655b0c7`](https://github.com/apache/spark/commit/655b0c73a54cbad3ac3c611a3c869feffbe9a1b5). * This patch passes all tests. * This patch merges cleanly. * This patch adds no public classes.
[GitHub] spark pull request #13649: [SPARK-15929] Fix portability of DataFrameSuite p...
Github user asfgit closed the pull request at: https://github.com/apache/spark/pull/13649
[GitHub] spark issue #13037: [SPARK-1301] [Web UI] Added anchor links to Accumulators...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/13037 **[Test build #60453 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/60453/consoleFull)** for PR 13037 at commit [`5b12cff`](https://github.com/apache/spark/commit/5b12cff99a7252193da1e555185da50d4b973750).
[GitHub] spark issue #13649: [SPARK-15929] Fix portability of DataFrameSuite path glo...
Github user liancheng commented on the issue: https://github.com/apache/spark/pull/13649 Merging to master and branch-2.0.
[GitHub] spark issue #13636: [SPARK-15637][SPARKR] Remove R version check since maske...
Github user liancheng commented on the issue: https://github.com/apache/spark/pull/13636 @shivaram True. @felixcheung Could you please also add SPARK-15931 to the PR title if this PR also targets that one? Thanks.
[GitHub] spark issue #13651: [SPARK-15776][SQL] Divide Expression inside Aggregation ...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/13651 Test FAILed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/60439/ Test FAILed.
[GitHub] spark issue #13651: [SPARK-15776][SQL] Divide Expression inside Aggregation ...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/13651 Merged build finished. Test FAILed.
[GitHub] spark issue #13651: [SPARK-15776][SQL] Divide Expression inside Aggregation ...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/13651 **[Test build #60439 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/60439/consoleFull)** for PR 13651 at commit [`df08eea`](https://github.com/apache/spark/commit/df08eeacd85187ca5a71463fc5d25f63426ebe84). * This patch **fails Spark unit tests**. * This patch merges cleanly. * This patch adds no public classes.
[GitHub] spark issue #13492: [SPARK-15749][SQL]make the error message more meaningful
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/13492 Merged build finished. Test FAILed.
[GitHub] spark issue #13492: [SPARK-15749][SQL]make the error message more meaningful
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/13492 **[Test build #60449 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/60449/consoleFull)** for PR 13492 at commit [`d0fcd33`](https://github.com/apache/spark/commit/d0fcd330c7a427fc3dccdbdb457a2139f6a238f0). * This patch **fails Spark unit tests**. * This patch merges cleanly. * This patch adds no public classes.
[GitHub] spark issue #13492: [SPARK-15749][SQL]make the error message more meaningful
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/13492 Test FAILed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/60449/ Test FAILed.
[GitHub] spark issue #13649: [SPARK-15929] Fix portability of DataFrameSuite path glo...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/13649 Merged build finished. Test PASSed.
[GitHub] spark issue #13649: [SPARK-15929] Fix portability of DataFrameSuite path glo...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/13649 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/60438/ Test PASSed.
[GitHub] spark issue #13498: [SPARK-15011][SQL] Re-enable 'analyze MetastoreRelations...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/13498 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/60437/ Test PASSed.
[GitHub] spark issue #13649: [SPARK-15929] Fix portability of DataFrameSuite path glo...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/13649 **[Test build #60438 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/60438/consoleFull)** for PR 13649 at commit [`a466517`](https://github.com/apache/spark/commit/a46651794d701370d673b362019274fe76a2ff29). * This patch passes all tests. * This patch merges cleanly. * This patch adds no public classes.
[GitHub] spark issue #13498: [SPARK-15011][SQL] Re-enable 'analyze MetastoreRelations...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/13498 Merged build finished. Test PASSed.
[GitHub] spark issue #12938: [SPARK-15162][SPARK-15164][PySpark][DOCS][ML] update som...
Github user holdenk commented on the issue: https://github.com/apache/spark/pull/12938 Git bisect ends up blaming e2ab79d5ea00af45c083cc9a6607d2f0905f9908 - I'll poke at it some
[GitHub] spark issue #13498: [SPARK-15011][SQL] Re-enable 'analyze MetastoreRelations...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/13498 **[Test build #3089 has finished](https://amplab.cs.berkeley.edu/jenkins/job/NewSparkPullRequestBuilder/3089/consoleFull)** for PR 13498 at commit [`655b0c7`](https://github.com/apache/spark/commit/655b0c73a54cbad3ac3c611a3c869feffbe9a1b5). * This patch passes all tests. * This patch merges cleanly. * This patch adds no public classes.
[GitHub] spark issue #13498: [SPARK-15011][SQL] Re-enable 'analyze MetastoreRelations...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/13498 **[Test build #60437 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/60437/consoleFull)** for PR 13498 at commit [`655b0c7`](https://github.com/apache/spark/commit/655b0c73a54cbad3ac3c611a3c869feffbe9a1b5). * This patch passes all tests. * This patch merges cleanly. * This patch adds no public classes.
[GitHub] spark issue #13527: [SPARK-15782] [CORE] Set spark.jars system property in c...
Github user nezihyigitbasi commented on the issue: https://github.com/apache/spark/pull/13527 thanks @vanzin, addressed your comments.
[GitHub] spark issue #13498: [SPARK-15011][SQL] Re-enable 'analyze MetastoreRelations...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/13498 **[Test build #3090 has finished](https://amplab.cs.berkeley.edu/jenkins/job/NewSparkPullRequestBuilder/3090/consoleFull)** for PR 13498 at commit [`655b0c7`](https://github.com/apache/spark/commit/655b0c73a54cbad3ac3c611a3c869feffbe9a1b5). * This patch passes all tests. * This patch merges cleanly. * This patch adds no public classes.
[GitHub] spark pull request #13653: [SPARK-15933][SQL][STREAMING] Refactored DF reade...
Github user tdas commented on a diff in the pull request: https://github.com/apache/spark/pull/13653#discussion_r66888549 --- Diff: sql/core/src/test/scala/org/apache/spark/sql/test/DataFrameReaderWriterSuite.scala --- @@ -0,0 +1,231 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one or more + * contributor license agreements. See the NOTICE file distributed with + * this work for additional information regarding copyright ownership. + * The ASF licenses this file to You under the Apache License, Version 2.0 + * (the "License"); you may not use this file except in compliance with + * the License. You may obtain a copy of the License at + * + *http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, software + * distributed under the License is distributed on an "AS IS" BASIS, + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + * See the License for the specific language governing permissions and + * limitations under the License. + */ + +package org.apache.spark.sql.test + +import org.apache.spark.sql._ +import org.apache.spark.sql.sources._ +import org.apache.spark.sql.types.{StringType, StructField, StructType} +import org.apache.spark.util.Utils + + +object LastOptions { + + var parameters: Map[String, String] = null + var schema: Option[StructType] = null + var saveMode: SaveMode = null + + def clear(): Unit = { +parameters = null +schema = null +saveMode = null + } +} + + +/** Dummy provider. 
*/ +class DefaultSource + extends RelationProvider + with SchemaRelationProvider + with CreatableRelationProvider { + + case class FakeRelation(sqlContext: SQLContext) extends BaseRelation { +override def schema: StructType = StructType(Seq(StructField("a", StringType))) + } + + override def createRelation( + sqlContext: SQLContext, + parameters: Map[String, String], + schema: StructType +): BaseRelation = { +LastOptions.parameters = parameters +LastOptions.schema = Some(schema) +FakeRelation(sqlContext) + } + + override def createRelation( + sqlContext: SQLContext, + parameters: Map[String, String] +): BaseRelation = { +LastOptions.parameters = parameters +LastOptions.schema = None +FakeRelation(sqlContext) + } + + override def createRelation( + sqlContext: SQLContext, + mode: SaveMode, + parameters: Map[String, String], + data: DataFrame): BaseRelation = { +LastOptions.parameters = parameters +LastOptions.schema = None +LastOptions.saveMode = mode +FakeRelation(sqlContext) + } +} + + +class DataFrameReaderWriterSuite extends QueryTest with SharedSQLContext { --- End diff -- These tests are a very rudimentary set of tests for the DataFrameReader code path. Note that in Spark 1.6, there were no tests at all. Also, I believe that a lot of the DataFrame functionality is tested through other test suites (e.g. partitioning columns are tested through PartitionedParquetSuite).
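The `LastOptions` singleton in the quoted suite is a simple test-spy: each fake `createRelation` records the arguments it received so the test can assert on them afterwards. A minimal standalone sketch of the same pattern (names here are illustrative, not Spark's actual API):

```scala
// Standalone sketch of the test-spy pattern used by LastOptions above.
// A singleton records the arguments of the most recent call; the test
// clears it, drives the code under test, then asserts on what was captured.
object LastCall {
  var parameters: Map[String, String] = null
  def clear(): Unit = { parameters = null }
}

// Stand-in for a data source's createRelation: it only records its input.
def fakeCreateRelation(parameters: Map[String, String]): Unit = {
  LastCall.parameters = parameters
}
```

A test would then call `LastCall.clear()`, invoke the reader/writer path under test, and assert on what `LastCall.parameters` captured.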
[GitHub] spark pull request #13653: [SPARK-15933][SQL][STREAMING] Refactored DF reade...
Github user tdas commented on a diff in the pull request: https://github.com/apache/spark/pull/13653#discussion_r66888413 --- Diff: sql/core/src/test/scala/org/apache/spark/sql/streaming/test/DataStreamReaderWriterSuite.scala --- @@ -371,76 +384,12 @@ class DataFrameReaderWriterSuite extends StreamTest with BeforeAndAfter { private def newTextInput = Utils.createTempDir(namePrefix = "text").getCanonicalPath - test("check trigger() can only be called on continuous queries") { --- End diff -- Most of these tests are to check whether one method is called on the wrong type of DF or not. So all of these can be removed.
[GitHub] spark pull request #13653: [SPARK-15933][SQL][STREAMING] Refactored DF reade...
Github user tdas commented on a diff in the pull request: https://github.com/apache/spark/pull/13653#discussion_r66888223 --- Diff: sql/core/src/main/scala/org/apache/spark/sql/streaming/DataStreamWriter.scala --- @@ -0,0 +1,401 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one or more + * contributor license agreements. See the NOTICE file distributed with + * this work for additional information regarding copyright ownership. + * The ASF licenses this file to You under the Apache License, Version 2.0 + * (the "License"); you may not use this file except in compliance with + * the License. You may obtain a copy of the License at + * + *http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, software + * distributed under the License is distributed on an "AS IS" BASIS, + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + * See the License for the specific language governing permissions and + * limitations under the License. + */ + +package org.apache.spark.sql.streaming + +import scala.collection.JavaConverters._ + +import org.apache.spark.annotation.Experimental +import org.apache.spark.sql.{AnalysisException, DataFrame, Dataset, ForeachWriter} +import org.apache.spark.sql.execution.datasources.DataSource +import org.apache.spark.sql.execution.streaming.{ForeachSink, MemoryPlan, MemorySink} + +/** + * :: Experimental :: + * Interface used to write a streaming [[Dataset]] to external storage systems (e.g. file systems, + * key-value stores, etc). Use [[Dataset.write]] to access this. + * + * @since 2.0.0 + */ +@Experimental +final class DataStreamWriter[T] private[sql](ds: Dataset[T]) { + + private val df = ds.toDF() + + /** + * :: Experimental :: + * Specifies how data of a streaming DataFrame/Dataset is written to a streaming sink. 
+ * - `OutputMode.Append()`: only the new rows in the streaming DataFrame/Dataset will be + *written to the sink + * - `OutputMode.Complete()`: all the rows in the streaming DataFrame/Dataset will be written + * to the sink every time these is some updates + * + * @since 2.0.0 + */ + @Experimental + def outputMode(outputMode: OutputMode): DataStreamWriter[T] = { +this.outputMode = outputMode +this + } + + + /** + * :: Experimental :: + * Specifies how data of a streaming DataFrame/Dataset is written to a streaming sink. + * - `append`: only the new rows in the streaming DataFrame/Dataset will be written to + * the sink + * - `complete`: all the rows in the streaming DataFrame/Dataset will be written to the sink + * every time these is some updates + * + * @since 2.0.0 + */ + @Experimental + def outputMode(outputMode: String): DataStreamWriter[T] = { --- End diff -- should this be shortened to just `mode`?
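The string overload of `outputMode` discussed above maps a case-insensitive string onto the two accepted modes. The mapping can be sketched standalone (the types below are stand-ins for illustration, not Spark's actual classes):

```scala
// Stand-in types mirroring the append/complete output modes quoted in the diff.
sealed trait Mode
case object Append extends Mode
case object Complete extends Mode

// Mirrors the diff's parsing logic: lower-case the input, accept only
// "append" and "complete", and reject anything else with a clear error.
def parseOutputMode(s: String): Mode = s.toLowerCase match {
  case "append"   => Append
  case "complete" => Complete
  case other => throw new IllegalArgumentException(
    s"Unknown output mode $other. Accepted output modes are 'append' and 'complete'")
}
```

With this shape, `parseOutputMode("Complete")` and `parseOutputMode("complete")` resolve to the same mode, while any other string fails fast.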
[GitHub] spark issue #13558: [SPARK-15820][PySpark][SQL]Add Catalog.refreshTable into...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/13558 Merged build finished. Test FAILed.
[GitHub] spark issue #13558: [SPARK-15820][PySpark][SQL]Add Catalog.refreshTable into...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/13558 Test FAILed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/60446/ Test FAILed.
[GitHub] spark issue #13558: [SPARK-15820][PySpark][SQL]Add Catalog.refreshTable into...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/13558 **[Test build #60446 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/60446/consoleFull)** for PR 13558 at commit [`8d87c0f`](https://github.com/apache/spark/commit/8d87c0f2bd9140928915f835fd7d21b178422c69). * This patch **fails PySpark unit tests**. * This patch merges cleanly. * This patch adds no public classes.
[GitHub] spark pull request #13653: [SPARK-15933][SQL][STREAMING] Refactored DF reade...
Github user tdas commented on a diff in the pull request: https://github.com/apache/spark/pull/13653#discussion_r66888185 --- Diff: sql/core/src/main/scala/org/apache/spark/sql/streaming/DataStreamWriter.scala --- @@ -0,0 +1,401 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one or more + * contributor license agreements. See the NOTICE file distributed with + * this work for additional information regarding copyright ownership. + * The ASF licenses this file to You under the Apache License, Version 2.0 + * (the "License"); you may not use this file except in compliance with + * the License. You may obtain a copy of the License at + * + *http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, software + * distributed under the License is distributed on an "AS IS" BASIS, + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + * See the License for the specific language governing permissions and + * limitations under the License. + */ + +package org.apache.spark.sql.streaming + +import scala.collection.JavaConverters._ + +import org.apache.spark.annotation.Experimental +import org.apache.spark.sql.{AnalysisException, DataFrame, Dataset, ForeachWriter} +import org.apache.spark.sql.execution.datasources.DataSource +import org.apache.spark.sql.execution.streaming.{ForeachSink, MemoryPlan, MemorySink} + +/** + * :: Experimental :: + * Interface used to write a streaming [[Dataset]] to external storage systems (e.g. file systems, + * key-value stores, etc). Use [[Dataset.write]] to access this. + * + * @since 2.0.0 + */ +@Experimental +final class DataStreamWriter[T] private[sql](ds: Dataset[T]) { + + private val df = ds.toDF() + + /** + * :: Experimental :: + * Specifies how data of a streaming DataFrame/Dataset is written to a streaming sink. 
+ * - `OutputMode.Append()`: only the new rows in the streaming DataFrame/Dataset will be + *written to the sink + * - `OutputMode.Complete()`: all the rows in the streaming DataFrame/Dataset will be written + * to the sink every time these is some updates + * + * @since 2.0.0 + */ + @Experimental + def outputMode(outputMode: OutputMode): DataStreamWriter[T] = { +this.outputMode = outputMode +this + } + + + /** + * :: Experimental :: + * Specifies how data of a streaming DataFrame/Dataset is written to a streaming sink. + * - `append`: only the new rows in the streaming DataFrame/Dataset will be written to + * the sink + * - `complete`: all the rows in the streaming DataFrame/Dataset will be written to the sink + * every time these is some updates + * + * @since 2.0.0 + */ + @Experimental + def outputMode(outputMode: String): DataStreamWriter[T] = { +this.outputMode = outputMode.toLowerCase match { + case "append" => +OutputMode.Append + case "complete" => +OutputMode.Complete + case _ => +throw new IllegalArgumentException(s"Unknown output mode $outputMode. " + + "Accepted output modes are 'append' and 'complete'") +} +this + } + + /** + * :: Experimental :: + * Set the trigger for the stream query. The default value is `ProcessingTime(0)` and it will run + * the query as fast as possible. + * + * Scala Example: + * {{{ + * df.write.trigger(ProcessingTime("10 seconds")) + * + * import scala.concurrent.duration._ + * df.write.trigger(ProcessingTime(10.seconds)) + * }}} + * + * Java Example: + * {{{ + * df.write.trigger(ProcessingTime.create("10 seconds")) + * + * import java.util.concurrent.TimeUnit + * df.write.trigger(ProcessingTime.create(10, TimeUnit.SECONDS)) + * }}} + * + * @since 2.0.0 + */ + @Experimental + def trigger(trigger: Trigger): DataStreamWriter[T] = { +this.trigger = trigger +this + } + + + /** + * :: Experimental :: + * Specifies the name of the [[ContinuousQuery]] that can be started with `startStream()`. 
+ * This name must be unique among all the currently active queries in the associated SQLContext. + * + * @since 2.0.0 + */ + @Experimental + def queryName(queryName: String): DataStreamWriter[T] = { +this.extraOptions += ("queryName" -> queryName) +this + } + + /** + * :: Experimental :: + * Specifies the underlying output
[GitHub] spark issue #13498: [SPARK-15011][SQL] Re-enable 'analyze MetastoreRelations...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/13498 **[Test build #3091 has finished](https://amplab.cs.berkeley.edu/jenkins/job/NewSparkPullRequestBuilder/3091/consoleFull)** for PR 13498 at commit [`655b0c7`](https://github.com/apache/spark/commit/655b0c73a54cbad3ac3c611a3c869feffbe9a1b5). * This patch passes all tests. * This patch merges cleanly. * This patch adds no public classes.
[GitHub] spark issue #13561: [SPARK-15824][SQL] Run 'with ... insert ... select' fail...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/13561 Merged build finished. Test PASSed.
[GitHub] spark issue #13561: [SPARK-15824][SQL] Run 'with ... insert ... select' fail...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/13561 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/60445/ Test PASSed.
[GitHub] spark issue #13498: [SPARK-15011][SQL] Re-enable 'analyze MetastoreRelations...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/13498 **[Test build #3094 has finished](https://amplab.cs.berkeley.edu/jenkins/job/NewSparkPullRequestBuilder/3094/consoleFull)** for PR 13498 at commit [`655b0c7`](https://github.com/apache/spark/commit/655b0c73a54cbad3ac3c611a3c869feffbe9a1b5). * This patch **fails Spark unit tests**. * This patch merges cleanly. * This patch adds no public classes.
[GitHub] spark issue #13561: [SPARK-15824][SQL] Run 'with ... insert ... select' fail...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/13561 **[Test build #60445 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/60445/consoleFull)** for PR 13561 at commit [`31212da`](https://github.com/apache/spark/commit/31212da89a7e7acfe7e7ae7f97860f1b45b481b3). * This patch passes all tests. * This patch merges cleanly. * This patch adds no public classes.
[GitHub] spark pull request #13653: [SPARK-15933][SQL][STREAMING] Refactored DF reade...
Github user tdas commented on a diff in the pull request: https://github.com/apache/spark/pull/13653#discussion_r66888051 --- Diff: sql/core/src/main/scala/org/apache/spark/sql/streaming/DataStreamWriter.scala --- @@ -0,0 +1,401 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one or more + * contributor license agreements. See the NOTICE file distributed with + * this work for additional information regarding copyright ownership. + * The ASF licenses this file to You under the Apache License, Version 2.0 + * (the "License"); you may not use this file except in compliance with + * the License. You may obtain a copy of the License at + * + *http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, software + * distributed under the License is distributed on an "AS IS" BASIS, + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + * See the License for the specific language governing permissions and + * limitations under the License. + */ + +package org.apache.spark.sql.streaming + +import scala.collection.JavaConverters._ + +import org.apache.spark.annotation.Experimental +import org.apache.spark.sql.{AnalysisException, DataFrame, Dataset, ForeachWriter} +import org.apache.spark.sql.execution.datasources.DataSource +import org.apache.spark.sql.execution.streaming.{ForeachSink, MemoryPlan, MemorySink} + +/** + * :: Experimental :: + * Interface used to write a streaming [[Dataset]] to external storage systems (e.g. file systems, + * key-value stores, etc). Use [[Dataset.write]] to access this. + * + * @since 2.0.0 + */ +@Experimental +final class DataStreamWriter[T] private[sql](ds: Dataset[T]) { + + private val df = ds.toDF() + + /** + * :: Experimental :: + * Specifies how data of a streaming DataFrame/Dataset is written to a streaming sink. 
+ * - `OutputMode.Append()`: only the new rows in the streaming DataFrame/Dataset will be + *written to the sink + * - `OutputMode.Complete()`: all the rows in the streaming DataFrame/Dataset will be written + * to the sink every time these is some updates + * + * @since 2.0.0 + */ + @Experimental + def outputMode(outputMode: OutputMode): DataStreamWriter[T] = { +this.outputMode = outputMode +this + } + + + /** + * :: Experimental :: + * Specifies how data of a streaming DataFrame/Dataset is written to a streaming sink. + * - `append`: only the new rows in the streaming DataFrame/Dataset will be written to + * the sink + * - `complete`: all the rows in the streaming DataFrame/Dataset will be written to the sink + * every time these is some updates + * + * @since 2.0.0 + */ + @Experimental + def outputMode(outputMode: String): DataStreamWriter[T] = { +this.outputMode = outputMode.toLowerCase match { + case "append" => +OutputMode.Append + case "complete" => +OutputMode.Complete + case _ => +throw new IllegalArgumentException(s"Unknown output mode $outputMode. " + + "Accepted output modes are 'append' and 'complete'") +} +this + } + + /** + * :: Experimental :: + * Set the trigger for the stream query. The default value is `ProcessingTime(0)` and it will run + * the query as fast as possible. + * + * Scala Example: + * {{{ + * df.write.trigger(ProcessingTime("10 seconds")) + * + * import scala.concurrent.duration._ + * df.write.trigger(ProcessingTime(10.seconds)) + * }}} + * + * Java Example: + * {{{ + * df.write.trigger(ProcessingTime.create("10 seconds")) + * + * import java.util.concurrent.TimeUnit + * df.write.trigger(ProcessingTime.create(10, TimeUnit.SECONDS)) + * }}} + * + * @since 2.0.0 + */ + @Experimental + def trigger(trigger: Trigger): DataStreamWriter[T] = { +this.trigger = trigger +this + } + + + /** + * :: Experimental :: + * Specifies the name of the [[ContinuousQuery]] that can be started with `startStream()`. 
+ * This name must be unique among all the currently active queries in the associated SQLContext. + * + * @since 2.0.0 + */ + @Experimental + def queryName(queryName: String): DataStreamWriter[T] = { +this.extraOptions += ("queryName" -> queryName) +this + } + + /** + * :: Experimental :: + * Specifies the underlying output
[GitHub] spark pull request #13653: [SPARK-15933][SQL][STREAMING] Refactored DF reade...
Github user tdas commented on a diff in the pull request: https://github.com/apache/spark/pull/13653#discussion_r66888017 --- Diff: sql/core/src/main/scala/org/apache/spark/sql/streaming/DataStreamWriter.scala --- @@ -0,0 +1,401 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one or more + * contributor license agreements. See the NOTICE file distributed with + * this work for additional information regarding copyright ownership. + * The ASF licenses this file to You under the Apache License, Version 2.0 + * (the "License"); you may not use this file except in compliance with + * the License. You may obtain a copy of the License at + * + *http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, software + * distributed under the License is distributed on an "AS IS" BASIS, + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + * See the License for the specific language governing permissions and + * limitations under the License. + */ + +package org.apache.spark.sql.streaming + +import scala.collection.JavaConverters._ + +import org.apache.spark.annotation.Experimental +import org.apache.spark.sql.{AnalysisException, DataFrame, Dataset, ForeachWriter} +import org.apache.spark.sql.execution.datasources.DataSource +import org.apache.spark.sql.execution.streaming.{ForeachSink, MemoryPlan, MemorySink} + +/** + * :: Experimental :: + * Interface used to write a streaming [[Dataset]] to external storage systems (e.g. file systems, + * key-value stores, etc). Use [[Dataset.write]] to access this. + * + * @since 2.0.0 + */ +@Experimental +final class DataStreamWriter[T] private[sql](ds: Dataset[T]) { + + private val df = ds.toDF() + + /** + * :: Experimental :: + * Specifies how data of a streaming DataFrame/Dataset is written to a streaming sink. 
+ * - `OutputMode.Append()`: only the new rows in the streaming DataFrame/Dataset will be + *written to the sink + * - `OutputMode.Complete()`: all the rows in the streaming DataFrame/Dataset will be written + * to the sink every time these is some updates + * + * @since 2.0.0 + */ + @Experimental + def outputMode(outputMode: OutputMode): DataStreamWriter[T] = { +this.outputMode = outputMode +this + } + + + /** + * :: Experimental :: + * Specifies how data of a streaming DataFrame/Dataset is written to a streaming sink. + * - `append`: only the new rows in the streaming DataFrame/Dataset will be written to + * the sink + * - `complete`: all the rows in the streaming DataFrame/Dataset will be written to the sink + * every time these is some updates + * + * @since 2.0.0 + */ + @Experimental + def outputMode(outputMode: String): DataStreamWriter[T] = { +this.outputMode = outputMode.toLowerCase match { + case "append" => +OutputMode.Append + case "complete" => +OutputMode.Complete + case _ => +throw new IllegalArgumentException(s"Unknown output mode $outputMode. " + + "Accepted output modes are 'append' and 'complete'") +} +this + } + + /** + * :: Experimental :: + * Set the trigger for the stream query. The default value is `ProcessingTime(0)` and it will run + * the query as fast as possible. + * + * Scala Example: + * {{{ + * df.write.trigger(ProcessingTime("10 seconds")) --- End diff -- update these examples. --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
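The two output modes documented in the diff above ("append" delivers only the new rows each trigger; "complete" delivers the full result every trigger) can be illustrated with a minimal in-memory sink. This is a plain-Python sketch, not Spark's implementation, and all names here are hypothetical:

```python
# Illustrative sketch of the two output modes described above: "append"
# delivers only rows added since the last trigger, while "complete"
# replaces the sink contents with the full recomputed result every trigger.
# Not Spark's implementation; all names are hypothetical.

class MemorySink:
    def __init__(self, output_mode):
        assert output_mode in ("append", "complete")
        self.output_mode = output_mode
        self.rows = []

    def add_batch(self, full_result, new_rows):
        if self.output_mode == "append":
            self.rows.extend(new_rows)       # only newly produced rows
        else:                                # "complete"
            self.rows = list(full_result)    # replace with the full result

# A stream of raw events suits append; a running aggregation suits complete.
append_sink = MemorySink("append")
complete_sink = MemorySink("complete")

# Trigger 1: event 10 arrives; the running count for key "a" becomes 1.
append_sink.add_batch(full_result=[10], new_rows=[10])
complete_sink.add_batch(full_result=[("a", 1)], new_rows=[])

# Trigger 2: event 20 arrives; the running count for "a" becomes 2.
append_sink.add_batch(full_result=[10, 20], new_rows=[20])
complete_sink.add_batch(full_result=[("a", 2)], new_rows=[])

print(append_sink.rows)    # [10, 20]
print(complete_sink.rows)  # [('a', 2)]
```

Note how the complete-mode sink never accumulates stale rows: each trigger overwrites the previous result, which is why it is the mode that makes sense for aggregations.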
[GitHub] spark pull request #13638: [SPARK-15915][SQL] CacheManager should use canoni...
Github user cloud-fan commented on a diff in the pull request: https://github.com/apache/spark/pull/13638#discussion_r66887975 --- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/CacheManager.scala --- @@ -87,7 +87,7 @@ private[sql] class CacheManager extends Logging { query: Dataset[_], tableName: Option[String] = None, storageLevel: StorageLevel = MEMORY_AND_DISK): Unit = writeLock { -val planToCache = query.queryExecution.analyzed +val planToCache = query.queryExecution.analyzed.canonicalized --- End diff -- do we still need these changes? `LogicalPlan.canonicalized` is a lazy val, so there is no performance penalty even if we use the un-canonicalized plan as the key.
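cloud-fan's point is that canonicalization can be deferred to lookup time: because the canonical form is computed lazily and cached, keying the cache on it costs nothing extra even if the stored plan itself is un-canonicalized. A minimal sketch of that idea in plain Python (all names hypothetical; this is not Spark's `CacheManager`):

```python
# Sketch of keying a cache on a lazily computed canonical form of a query
# plan -- analogous to LogicalPlan.canonicalized being a lazy val in Spark.
# All names here are hypothetical illustrations.
from functools import cached_property

class Plan:
    def __init__(self, expr, alias=None):
        self.expr = expr       # the "real" plan text
        self.alias = alias     # cosmetic detail, irrelevant to semantics

    @cached_property
    def canonicalized(self):
        # Computed at most once per Plan, and only when first needed:
        # strip cosmetic details so semantically equal plans compare equal.
        return self.expr.strip().lower()

class CacheManager:
    def __init__(self):
        self._cache = {}

    def cache_query(self, plan, data):
        self._cache[plan.canonicalized] = data

    def lookup(self, plan):
        return self._cache.get(plan.canonicalized)

cm = CacheManager()
p1 = Plan("SELECT * FROM t", alias="x")
cm.cache_query(p1, data=[1, 2, 3])

# A cosmetically different but semantically identical plan hits the cache.
p2 = Plan("select * from t  ", alias="y")
print(cm.lookup(p2))  # [1, 2, 3]
```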
[GitHub] spark pull request #13653: [SPARK-15933][SQL][STREAMING] Refactored DF reade...
Github user tdas commented on a diff in the pull request: https://github.com/apache/spark/pull/13653#discussion_r66887929 --- Diff: sql/core/src/main/scala/org/apache/spark/sql/streaming/DataStreamReader.scala --- @@ -0,0 +1,288 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one or more + * contributor license agreements. See the NOTICE file distributed with + * this work for additional information regarding copyright ownership. + * The ASF licenses this file to You under the Apache License, Version 2.0 + * (the "License"); you may not use this file except in compliance with + * the License. You may obtain a copy of the License at + * + *http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, software + * distributed under the License is distributed on an "AS IS" BASIS, + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + * See the License for the specific language governing permissions and + * limitations under the License. + */ + +package org.apache.spark.sql.streaming + +import scala.collection.JavaConverters._ + +import org.apache.spark.annotation.Experimental +import org.apache.spark.internal.Logging +import org.apache.spark.sql.{DataFrame, Dataset, SparkSession} +import org.apache.spark.sql.execution.datasources.DataSource +import org.apache.spark.sql.execution.streaming.StreamingRelation +import org.apache.spark.sql.types.StructType + +@Experimental +final class DataStreamReader private[sql](sparkSession: SparkSession) extends Logging { + /** + * :: Experimental :: + * Specifies the input data source format. + * + * @since 2.0.0 + */ + @Experimental + def format(source: String): DataStreamReader = { +this.source = source +this + } + + /** + * :: Experimental :: + * Specifies the input schema. Some data sources (e.g. JSON) can infer the input schema + * automatically from data. 
By specifying the schema here, the underlying data source can + * skip the schema inference step, and thus speed up data loading. + * + * @since 2.0.0 + */ + @Experimental + def schema(schema: StructType): DataStreamReader = { +this.userSpecifiedSchema = Option(schema) +this + } + + /** + * :: Experimental :: + * Adds an input option for the underlying data source. + * + * @since 2.0.0 + */ + @Experimental + def option(key: String, value: String): DataStreamReader = { +this.extraOptions += (key -> value) +this + } + + /** + * :: Experimental :: + * Adds an input option for the underlying data source. + * + * @since 2.0.0 + */ + @Experimental + def option(key: String, value: Boolean): DataStreamReader = option(key, value.toString) + + /** + * :: Experimental :: + * Adds an input option for the underlying data source. + * + * @since 2.0.0 + */ + @Experimental + def option(key: String, value: Long): DataStreamReader = option(key, value.toString) + + /** + * :: Experimental :: + * Adds an input option for the underlying data source. + * + * @since 2.0.0 + */ + @Experimental + def option(key: String, value: Double): DataStreamReader = option(key, value.toString) + + /** + * :: Experimental :: + * (Scala-specific) Adds input options for the underlying data source. + * + * @since 2.0.0 + */ + @Experimental + def options(options: scala.collection.Map[String, String]): DataStreamReader = { +this.extraOptions ++= options +this + } + + /** + * :: Experimental :: + * Adds input options for the underlying data source. + * + * @since 2.0.0 + */ + @Experimental + def options(options: java.util.Map[String, String]): DataStreamReader = { +this.options(options.asScala) +this + } + + + /** + * :: Experimental :: + * Loads input data stream in as a [[DataFrame]], for data streams that don't require a path + * (e.g. external key-value stores). + * + * @since 2.0.0 + */ + @Experimental + def load(): DataFrame = { --- End diff -- @marmbrus @rxin Should this be `load()`? 
In the general case,

```
sdf.readStream
  .format("myFormat")
  .load("stringIdentifier")
```
[GitHub] spark issue #13653: [SPARK-15933][SQL][STREAMING] Refactored DF reader-write...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/13653 **[Test build #60452 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/60452/consoleFull)** for PR 13653 at commit [`a59498b`](https://github.com/apache/spark/commit/a59498bbe86609bc206de9b229052f35071049cb).
[GitHub] spark pull request #13653: [SPARK-15933][SQL][STREAMING] Refactored DF reade...
Github user tdas commented on a diff in the pull request: https://github.com/apache/spark/pull/13653#discussion_r66887779 --- Diff: sql/core/src/main/scala/org/apache/spark/sql/streaming/DataStreamReader.scala --- @@ -0,0 +1,288 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one or more + * contributor license agreements. See the NOTICE file distributed with + * this work for additional information regarding copyright ownership. + * The ASF licenses this file to You under the Apache License, Version 2.0 + * (the "License"); you may not use this file except in compliance with + * the License. You may obtain a copy of the License at + * + *http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, software + * distributed under the License is distributed on an "AS IS" BASIS, + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + * See the License for the specific language governing permissions and + * limitations under the License. + */ + +package org.apache.spark.sql.streaming + +import scala.collection.JavaConverters._ + +import org.apache.spark.annotation.Experimental +import org.apache.spark.internal.Logging +import org.apache.spark.sql.{DataFrame, Dataset, SparkSession} +import org.apache.spark.sql.execution.datasources.DataSource +import org.apache.spark.sql.execution.streaming.StreamingRelation +import org.apache.spark.sql.types.StructType + +@Experimental --- End diff -- add docs. --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark pull request #13653: [SPARK-15933][SQL][STREAMING] Refactored DF reade...
GitHub user tdas opened a pull request: https://github.com/apache/spark/pull/13653 [SPARK-15933][SQL][STREAMING] Refactored DF reader-writer to use readStream and writeStream for streaming DFs ## What changes were proposed in this pull request? Currently, DataFrameReader/Writer have methods that are needed for both streaming and non-streaming DFs. This is quite awkward because each such method throws a runtime exception for one case or the other. So rather than having half the methods throw runtime exceptions, it's better to have a separate reader/writer API for streams. ## How was this patch tested? Existing unit tests + two sets of unit tests for DataFrameReader/Writer and DataStreamReader/Writer. You can merge this pull request into a Git repository by running: $ git pull https://github.com/tdas/spark SPARK-15933 Alternatively you can review and apply these changes as the patch at: https://github.com/apache/spark/pull/13653.patch To close this pull request, make a commit to your master/trunk branch with (at least) the following in the commit message: This closes #13653
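The rationale in the PR description — splitting one overloaded builder into batch and streaming variants so misuse fails at a single, obvious entry point instead of scattering runtime exceptions across half the methods — can be sketched as a small builder-pattern split. This is a plain-Python illustration with hypothetical names, not Spark's actual classes:

```python
# Sketch of the API split the PR describes: instead of one writer whose
# individual methods throw depending on whether the Dataset is streaming,
# expose two writer types, each with only the methods that make sense for
# it. Hypothetical names -- not Spark's actual classes.

class DataFrameWriter:          # batch-only operations
    def save(self, path):
        return f"saved batch output to {path}"

class DataStreamWriter:         # streaming-only operations
    def trigger(self, interval):
        self.interval = interval
        return self

    def start(self, path):
        return f"started stream to {path} every {self.interval}"

class Dataset:
    def __init__(self, is_streaming):
        self.is_streaming = is_streaming

    @property
    def write(self):
        # One check at the entry point, instead of one in every method.
        if self.is_streaming:
            raise TypeError("use writeStream for streaming Datasets")
        return DataFrameWriter()

    @property
    def writeStream(self):
        if not self.is_streaming:
            raise TypeError("use write for batch Datasets")
        return DataStreamWriter()

batch_df = Dataset(is_streaming=False)
stream_df = Dataset(is_streaming=True)
print(batch_df.write.save("/tmp/out"))
print(stream_df.writeStream.trigger("10 seconds").start("/tmp/stream"))
```

The payoff of the split is discoverability: a batch writer never even exposes `trigger` or `start`, so tooling and code completion steer users to the right API before anything runs.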
[GitHub] spark issue #11079: [SPARK-13197][SQL] When trying to select from the data f...
Github user clockfly commented on the issue: https://github.com/apache/spark/pull/11079 @thomastechs We should use `df.select("`a.c`")` to select a column named "a.c". The reason is that `df.select` can also be used to select a nested column, for example:

```
scala> case class A(inner: Int)
scala> val df = Seq((A(1), 2)).toDF("a", "b")
scala> df.select("a.inner")
res10: org.apache.spark.sql.DataFrame = [inner: int]
scala> df.select("a.inner").show()
+-----+
|inner|
+-----+
|    1|
+-----+
```

So I think this is NOT a bug; the current behavior of `select` is expected.
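The distinction clockfly draws — a backticked reference is a literal column name, while an unquoted dot means nested field access — can be sketched with a tiny resolver. This is a plain-Python illustration of the resolution rule, not Spark's actual parser:

```python
# Sketch of the column-reference rule described above: backticks make the
# whole string a literal column name (dots included), while an unquoted
# dot means nested field access. Not Spark's actual resolution code.

def resolve(ref, row):
    if ref.startswith("`") and ref.endswith("`"):
        return row[ref[1:-1]]          # literal name, dots and all
    value = row
    for part in ref.split("."):        # descend into nested fields
        value = value[part]
    return value

row = {"a.c": 42, "a": {"inner": 1}, "b": 2}
print(resolve("`a.c`", row))    # 42 -> the column literally named "a.c"
print(resolve("a.inner", row))  # 1  -> nested field "inner" of struct "a"
```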
[GitHub] spark issue #13647: [SPARK-15784][ML][WIP]:Add Power Iteration Clustering to...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/13647 Test FAILed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/60450/ Test FAILed.
[GitHub] spark issue #13647: [SPARK-15784][ML][WIP]:Add Power Iteration Clustering to...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/13647 Merged build finished. Test FAILed.
[GitHub] spark issue #13647: [SPARK-15784][ML][WIP]:Add Power Iteration Clustering to...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/13647 **[Test build #60450 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/60450/consoleFull)** for PR 13647 at commit [`78b70b4`](https://github.com/apache/spark/commit/78b70b41de3de54f9114f495d754fc4852294084). * This patch **fails Scala style tests**. * This patch merges cleanly. * This patch adds no public classes.
[GitHub] spark pull request #13115: [SPARK-12492] Using spark-sql commond to run quer...
Github user zsxwing commented on a diff in the pull request: https://github.com/apache/spark/pull/13115#discussion_r66887332 --- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/QueryExecution.scala --- @@ -110,24 +110,29 @@ class QueryExecution(val sparkSession: SparkSession, val logical: LogicalPlan) { */ def hiveResultString(): Seq[String] = executedPlan match { case ExecutedCommandExec(desc: DescribeTableCommand) => - // If it is a describe command for a Hive table, we want to have the output format - // be similar with Hive. - desc.run(sparkSession).map { -case Row(name: String, dataType: String, comment) => - Seq(name, dataType, -Option(comment.asInstanceOf[String]).getOrElse("")) -.map(s => String.format(s"%-20s", s)) -.mkString("\t") + SQLExecution.withNewExecutionId(sparkSession, this) { +// If it is a describe command for a Hive table, we want to have the output format +// be similar with Hive. +desc.run(sparkSession).map { + case Row(name: String, dataType: String, comment) => +Seq(name, dataType, + Option(comment.asInstanceOf[String]).getOrElse("")) + .map(s => String.format(s"%-20s", s)) + .mkString("\t") +} } case command: ExecutedCommandExec => - command.executeCollect().map(_.getString(0)) - + SQLExecution.withNewExecutionId(sparkSession, this) { +command.executeCollect().map(_.getString(0)) --- End diff -- okay. @KaiXinXiaoLei could you remove the wrapper for this line?
[GitHub] spark issue #13647: [SPARK-15784][ML][WIP]:Add Power Iteration Clustering to...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/13647 **[Test build #60450 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/60450/consoleFull)** for PR 13647 at commit [`78b70b4`](https://github.com/apache/spark/commit/78b70b41de3de54f9114f495d754fc4852294084).
[GitHub] spark issue #13638: [SPARK-15915][SQL] CacheManager should use canonicalized...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/13638 **[Test build #60451 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/60451/consoleFull)** for PR 13638 at commit [`11dc433`](https://github.com/apache/spark/commit/11dc433975719288a3694746c60a571d3f349b22).
[GitHub] spark issue #13498: [SPARK-15011][SQL] Re-enable 'analyze MetastoreRelations...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/13498 **[Test build #3095 has finished](https://amplab.cs.berkeley.edu/jenkins/job/NewSparkPullRequestBuilder/3095/consoleFull)** for PR 13498 at commit [`655b0c7`](https://github.com/apache/spark/commit/655b0c73a54cbad3ac3c611a3c869feffbe9a1b5). * This patch **fails Spark unit tests**. * This patch merges cleanly. * This patch adds no public classes.
[GitHub] spark issue #13632: [SPARK-15910][SQL] Check schema consistency when using K...
Github user cloud-fan commented on the issue: https://github.com/apache/spark/pull/13632 LGTM, pending jenkins
[GitHub] spark issue #13492: [SPARK-15749][SQL]make the error message more meaningful
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/13492 **[Test build #60449 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/60449/consoleFull)** for PR 13492 at commit [`d0fcd33`](https://github.com/apache/spark/commit/d0fcd330c7a427fc3dccdbdb457a2139f6a238f0).
[GitHub] spark pull request #13115: [SPARK-12492] Using spark-sql commond to run quer...
Github user andrewor14 commented on a diff in the pull request: https://github.com/apache/spark/pull/13115#discussion_r66886807

--- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/QueryExecution.scala ---
@@ -110,24 +110,29 @@ class QueryExecution(val sparkSession: SparkSession, val logical: LogicalPlan) {
     */
    def hiveResultString(): Seq[String] = executedPlan match {
      case ExecutedCommandExec(desc: DescribeTableCommand) =>
-       // If it is a describe command for a Hive table, we want to have the output format
-       // be similar with Hive.
-       desc.run(sparkSession).map {
-         case Row(name: String, dataType: String, comment) =>
-           Seq(name, dataType,
-             Option(comment.asInstanceOf[String]).getOrElse(""))
-             .map(s => String.format(s"%-20s", s))
-             .mkString("\t")
+       SQLExecution.withNewExecutionId(sparkSession, this) {
+         // If it is a describe command for a Hive table, we want to have the output format
+         // be similar with Hive.
+         desc.run(sparkSession).map {
+           case Row(name: String, dataType: String, comment) =>
+             Seq(name, dataType,
+               Option(comment.asInstanceOf[String]).getOrElse(""))
+               .map(s => String.format(s"%-20s", s))
+               .mkString("\t")
+         }
        }
      case command: ExecutedCommandExec =>
-       command.executeCollect().map(_.getString(0))
-
+       SQLExecution.withNewExecutionId(sparkSession, this) {
+         command.executeCollect().map(_.getString(0))
--- End diff --

not sure. It's probably fine
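The Hive-style row formatting in the diff above (each DESCRIBE column left-justified to a 20-character field with `%-20s`, fields joined by tabs) can be sketched as a standalone snippet. This is a simplified illustration, not the exact Spark code; `DescribeFormat` and `formatDescribeRow` are hypothetical names introduced here:

```scala
// Hypothetical standalone sketch of the formatting logic quoted in the diff:
// each column of a DESCRIBE row is left-justified to a 20-character field
// ("%-20s") and the fields are tab-separated, mimicking Hive CLI output.
object DescribeFormat {
  def formatDescribeRow(name: String, dataType: String, comment: String): String =
    Seq(name, dataType, Option(comment).getOrElse(""))
      .map(s => String.format("%-20s", s)) // pad each field to width 20
      .mkString("\t")

  def main(args: Array[String]): Unit = {
    // "id" gets 18 trailing spaces, "int" gets 17, and the null comment
    // becomes a 20-space blank field.
    println(formatDescribeRow("id", "int", null))
  }
}
```

Note that `Option(comment).getOrElse("")` is what makes the null comment safe: `Option(null)` is `None`, so a missing comment is rendered as an empty (but still padded) field rather than throwing.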
[GitHub] spark issue #13637: [SPARK-15914][SQL] Add deprecated method back to SQLCont...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/13637 **[Test build #60447 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/60447/consoleFull)** for PR 13637 at commit [`0cc81b1`](https://github.com/apache/spark/commit/0cc81b1d7dc3c4e05e3adf5f72c25c320a043a0c).
[GitHub] spark issue #13546: [SPARK-15808] [SQL] File Format Checking When Appending ...
Github user cloud-fan commented on the issue: https://github.com/apache/spark/pull/13546 LGTM
[GitHub] spark issue #13546: [SPARK-15808] [SQL] File Format Checking When Appending ...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/13546 **[Test build #60448 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/60448/consoleFull)** for PR 13546 at commit [`9d9d263`](https://github.com/apache/spark/commit/9d9d2632fde85c62b28454f33b24f7ee8fb6f15e).