[GitHub] [spark] HyukjinKwon commented on pull request #28395: [SPARK-31549][PYSPARK] Add a develop API invoking collect on Python RDD with user-specified job group

2020-04-29 Thread GitBox


HyukjinKwon commented on pull request #28395:
URL: https://github.com/apache/spark/pull/28395#issuecomment-621137033


   retest this please



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] [spark] SparkQA commented on pull request #28395: [SPARK-31549][PYSPARK] Add a develop API invoking collect on Python RDD with user-specified job group

2020-04-29 Thread GitBox


SparkQA commented on pull request #28395:
URL: https://github.com/apache/spark/pull/28395#issuecomment-621138777


   **[Test build #122059 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/122059/testReport)** for PR 28395 at commit [`481bba6`](https://github.com/apache/spark/commit/481bba62ce62a13f23ce153e54e8a5f56f6059c2).



[GitHub] [spark] AmplabJenkins removed a comment on pull request #28395: [SPARK-31549][PYSPARK] Add a develop API invoking collect on Python RDD with user-specified job group

2020-04-29 Thread GitBox


AmplabJenkins removed a comment on pull request #28395:
URL: https://github.com/apache/spark/pull/28395#issuecomment-621139361







[GitHub] [spark] AmplabJenkins commented on pull request #28395: [SPARK-31549][PYSPARK] Add a develop API invoking collect on Python RDD with user-specified job group

2020-04-29 Thread GitBox


AmplabJenkins commented on pull request #28395:
URL: https://github.com/apache/spark/pull/28395#issuecomment-621139361







[GitHub] [spark] yaooqinn commented on pull request #28402: [SPARK-31586][SQL][FOLLOWUP] Restore SQL string for datetime - interval operations

2020-04-29 Thread GitBox


yaooqinn commented on pull request #28402:
URL: https://github.com/apache/spark/pull/28402#issuecomment-621147001


   cc @cloud-fan @maropu @dongjoon-hyun thanks



[GitHub] [spark] beliefer commented on a change in pull request #28194: [SPARK-31372][SQL][TEST] Display expression schema for double check.

2020-04-29 Thread GitBox


beliefer commented on a change in pull request #28194:
URL: https://github.com/apache/spark/pull/28194#discussion_r417254240



##
File path: sql/core/src/test/scala/org/apache/spark/sql/ExpressionsSchemaSuite.scala
##
@@ -0,0 +1,195 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements.  See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License.  You may obtain a copy of the License at
+ *
+ *    http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.spark.sql
+
+import java.io.File
+
+import scala.collection.mutable.ArrayBuffer
+
+import org.apache.spark.sql.catalyst.util.{fileToString, stringToFile}
+import org.apache.spark.sql.test.SharedSparkSession
+import org.apache.spark.tags.ExtendedSQLTest
+
+// scalastyle:off line.size.limit
+/**
+ * End-to-end test cases for SQL schemas of expression examples.
+ * The golden result file is "spark/sql/core/src/test/resources/sql-functions/sql-expression-schema.md".
+ *
+ * To run the entire test suite:
+ * {{{
+ *   build/sbt "sql/test-only *ExpressionsSchemaSuite"
+ * }}}
+ *
+ * To re-generate golden files for entire suite, run:
+ * {{{
+ *   SPARK_GENERATE_GOLDEN_FILES=1 build/sbt "sql/test-only *ExpressionsSchemaSuite"
+ * }}}
+ *
+ * For example:
+ * {{{
+ *   ...
+ *   @ExpressionDescription(
+ *     usage = "_FUNC_(str, n) - Returns the string which repeats the given string value n times.",
+ *     examples = """
+ *       Examples:
+ *         > SELECT _FUNC_('123', 2);
+ *          123123
+ *     """,
+ *     since = "1.5.0")
+ *   case class StringRepeat(str: Expression, times: Expression)
+ *   ...
+ * }}}
+ *
+ * The format for golden result files look roughly like:
+ * {{{
+ *   ...
+ *   | org.apache.spark.sql.catalyst.expressions.StringRepeat | repeat | SELECT repeat('123', 2) | struct |
+ *   ...
+ * }}}
+ */
+// scalastyle:on line.size.limit
+@ExtendedSQLTest
+class ExpressionsSchemaSuite extends QueryTest with SharedSparkSession {
+
+  private val regenerateGoldenFiles: Boolean = System.getenv("SPARK_GENERATE_GOLDEN_FILES") == "1"
+
+  private val baseResourcePath = {
+    // We use a path based on Spark home for 2 reasons:
+    //   1. Maven can't get correct resource directory when resources in other jars.
+    //   2. We test subclasses in the hive-thriftserver module.
+    val sparkHome = {
+      assert(sys.props.contains("spark.test.home") ||
+        sys.env.contains("SPARK_HOME"), "spark.test.home or SPARK_HOME is not set.")
+      sys.props.getOrElse("spark.test.home", sys.env("SPARK_HOME"))
+    }
+
+    java.nio.file.Paths.get(sparkHome,

Review comment:
   It works too.





[GitHub] [spark] beliefer commented on a change in pull request #28194: [SPARK-31372][SQL][TEST] Display expression schema for double check.

2020-04-29 Thread GitBox


beliefer commented on a change in pull request #28194:
URL: https://github.com/apache/spark/pull/28194#discussion_r417254545



##
File path: sql/core/src/test/scala/org/apache/spark/sql/ExpressionsSchemaSuite.scala
##
@@ -0,0 +1,195 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements.  See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License.  You may obtain a copy of the License at
+ *
+ *    http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.spark.sql
+
+import java.io.File
+
+import scala.collection.mutable.ArrayBuffer
+
+import org.apache.spark.sql.catalyst.util.{fileToString, stringToFile}
+import org.apache.spark.sql.test.SharedSparkSession
+import org.apache.spark.tags.ExtendedSQLTest
+
+// scalastyle:off line.size.limit
+/**
+ * End-to-end test cases for SQL schemas of expression examples.
+ * The golden result file is "spark/sql/core/src/test/resources/sql-functions/sql-expression-schema.md".
+ *
+ * To run the entire test suite:
+ * {{{
+ *   build/sbt "sql/test-only *ExpressionsSchemaSuite"
+ * }}}
+ *
+ * To re-generate golden files for entire suite, run:
+ * {{{
+ *   SPARK_GENERATE_GOLDEN_FILES=1 build/sbt "sql/test-only *ExpressionsSchemaSuite"
+ * }}}
+ *
+ * For example:
+ * {{{
+ *   ...
+ *   @ExpressionDescription(
+ *     usage = "_FUNC_(str, n) - Returns the string which repeats the given string value n times.",
+ *     examples = """
+ *       Examples:
+ *         > SELECT _FUNC_('123', 2);
+ *          123123
+ *     """,
+ *     since = "1.5.0")
+ *   case class StringRepeat(str: Expression, times: Expression)
+ *   ...
+ * }}}
+ *
+ * The format for golden result files look roughly like:
+ * {{{
+ *   ...
+ *   | org.apache.spark.sql.catalyst.expressions.StringRepeat | repeat | SELECT repeat('123', 2) | struct |
+ *   ...
+ * }}}
+ */
+// scalastyle:on line.size.limit
+@ExtendedSQLTest
+class ExpressionsSchemaSuite extends QueryTest with SharedSparkSession {
+
+  private val regenerateGoldenFiles: Boolean = System.getenv("SPARK_GENERATE_GOLDEN_FILES") == "1"
+
+  private val baseResourcePath = {
+    // We use a path based on Spark home for 2 reasons:
+    //   1. Maven can't get correct resource directory when resources in other jars.
+    //   2. We test subclasses in the hive-thriftserver module.
+    val sparkHome = {
+      assert(sys.props.contains("spark.test.home") ||
+        sys.env.contains("SPARK_HOME"), "spark.test.home or SPARK_HOME is not set.")
+      sys.props.getOrElse("spark.test.home", sys.env("SPARK_HOME"))
+    }
+
+    java.nio.file.Paths.get(sparkHome,
+      "sql", "core", "src", "test", "resources", "sql-functions").toFile
+  }
+
+  private val resultFile = new File(baseResourcePath, "sql-expression-schema.md")
+
+  /** A single SQL query's SQL and schema. */
+  protected case class QueryOutput(
+      className: String,

Review comment:
   OK.





[GitHub] [spark] HeartSaVioR commented on pull request #28403: [MINOR][INFRA] Add a guide to clarify release/unreleased Spark versions of user-facing change in the Github PR template

2020-04-29 Thread GitBox


HeartSaVioR commented on pull request #28403:
URL: https://github.com/apache/spark/pull/28403#issuecomment-621155094


   It sounds like a non-minor change that is worth starting a discussion about.
   
   On my side, the first two sections have ended up describing similar things, so I have often just noted that the previous/next section describes it.
   
   I'm also wondering about the benefits of being strict about the section `Does this PR introduce any user-facing change?`. This seems to have been adopted from Kubernetes, but after looking into the Kubernetes template, the meaning of the section there looks very different.
   
   
https://raw.githubusercontent.com/kubernetes/kubernetes/master/.github/PULL_REQUEST_TEMPLATE.md
   
   ```
   **Special notes for your reviewer**:
   
   **Does this PR introduce a user-facing change?**:
   
   \```release-note
   
   \```
   
   **Additional documentation e.g., KEPs (Kubernetes Enhancement Proposals), usage docs, etc.**:
   
   
   \```docs
   
   \```
   ```
   
   In the Kubernetes template, this would mostly be NONE, requiring a description only when the change is worth including in the release notes. I personally find the section useful for describing backward-compatibility breaks and for guiding reviewers to focus on the change, but if we need to provide this information for every change visible to users (even docs), it will become a huge pain.
   
   Have we looked at the PR templates or contribution guides of open source projects of similar size?



[GitHub] [spark] HeartSaVioR edited a comment on pull request #28403: [MINOR][INFRA] Add a guide to clarify release/unreleased Spark versions of user-facing change in the Github PR template

2020-04-29 Thread GitBox


HeartSaVioR edited a comment on pull request #28403:
URL: https://github.com/apache/spark/pull/28403#issuecomment-621155094


   It sounds like a non-minor change that is worth starting a discussion about.
   
   From my experience, the first two sections have ended up describing similar things, so I have often just noted that the previous/next section describes it.
   
   I'm also wondering about the benefits of being strict about the section `Does this PR introduce any user-facing change?`. This seems to have been adopted from Kubernetes, but after looking into the Kubernetes template, the meaning of the section there looks very different.
   
   
https://raw.githubusercontent.com/kubernetes/kubernetes/master/.github/PULL_REQUEST_TEMPLATE.md
   
   ```
   **Special notes for your reviewer**:
   
   **Does this PR introduce a user-facing change?**:
   
   \```release-note
   
   \```
   
   **Additional documentation e.g., KEPs (Kubernetes Enhancement Proposals), usage docs, etc.**:
   
   
   \```docs
   
   \```
   ```
   
   In the Kubernetes template, this would mostly be NONE, requiring a description only when the change is worth including in the release notes. I personally find the section useful for describing backward-compatibility breaks and for guiding reviewers to focus on the change, but if we need to provide this information for every change visible to users (even docs), it will become a huge pain.
   
   Have we looked at the PR templates or contribution guides of open source projects of similar size?



[GitHub] [spark] cloud-fan commented on a change in pull request #28402: [SPARK-31586][SQL][FOLLOWUP] Restore SQL string for datetime - interval operations

2020-04-29 Thread GitBox


cloud-fan commented on a change in pull request #28402:
URL: https://github.com/apache/spark/pull/28402#discussion_r417258181



##
File path: sql/core/src/test/resources/sql-tests/inputs/interval.sql
##
@@ -132,6 +132,7 @@ select
   interval '99 11:22:33.123456789' day to second + dateval
 from interval_arithmetic;
 
+-- datetimes(in string representation) + intervals

Review comment:
   shall we move this comment to the next query?





[GitHub] [spark] HeartSaVioR edited a comment on pull request #28403: [MINOR][INFRA] Add a guide to clarify release/unreleased Spark versions of user-facing change in the Github PR template

2020-04-29 Thread GitBox


HeartSaVioR edited a comment on pull request #28403:
URL: https://github.com/apache/spark/pull/28403#issuecomment-621155094


   It sounds like a non-minor change that is worth starting a discussion about.
   
   From my experience, the first two sections have ended up describing similar things, so I have often just noted that the previous/next section covers it.
   
   I'm also wondering about the benefits of being strict about the section `Does this PR introduce any user-facing change?`. This seems to have been adopted from Kubernetes, but after looking into the Kubernetes template, the meaning of the section there looks very different.
   
   
https://raw.githubusercontent.com/kubernetes/kubernetes/master/.github/PULL_REQUEST_TEMPLATE.md
   
   ```
   **Special notes for your reviewer**:
   
   **Does this PR introduce a user-facing change?**:
   
   \```release-note
   
   \```
   
   **Additional documentation e.g., KEPs (Kubernetes Enhancement Proposals), usage docs, etc.**:
   
   
   \```docs
   
   \```
   ```
   
   In the Kubernetes template, this would mostly be NONE, requiring a description only when the change is worth including in the release notes. I personally find the section useful for describing backward-compatibility breaks and for guiding reviewers to focus on the change, but if we need to provide this information for every change visible to users (even docs), it will become a huge pain.
   
   Have we looked at the PR templates or contribution guides of open source projects of similar size?



[GitHub] [spark] SparkQA removed a comment on pull request #27664: [SPARK-30915][SS] CompactibleFileStreamLog: Avoid reading the metadata log file when finding the latest batch ID

2020-04-29 Thread GitBox


SparkQA removed a comment on pull request #27664:
URL: https://github.com/apache/spark/pull/27664#issuecomment-621039628


   **[Test build #122046 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/122046/testReport)** for PR 27664 at commit [`f6078bb`](https://github.com/apache/spark/commit/f6078bb82c449994037071d8a241cca80c187b75).



[GitHub] [spark] SparkQA commented on pull request #27664: [SPARK-30915][SS] CompactibleFileStreamLog: Avoid reading the metadata log file when finding the latest batch ID

2020-04-29 Thread GitBox


SparkQA commented on pull request #27664:
URL: https://github.com/apache/spark/pull/27664#issuecomment-621157091


   **[Test build #122046 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/122046/testReport)** for PR 27664 at commit [`f6078bb`](https://github.com/apache/spark/commit/f6078bb82c449994037071d8a241cca80c187b75).
* This patch **fails PySpark unit tests**.
* This patch merges cleanly.
* This patch adds no public classes.



[GitHub] [spark] beliefer commented on a change in pull request #28194: [SPARK-31372][SQL][TEST] Display expression schema for double check.

2020-04-29 Thread GitBox


beliefer commented on a change in pull request #28194:
URL: https://github.com/apache/spark/pull/28194#discussion_r417260790



##
File path: sql/core/src/test/scala/org/apache/spark/sql/ExpressionsSchemaSuite.scala
##
@@ -0,0 +1,195 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements.  See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License.  You may obtain a copy of the License at
+ *
+ *    http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.spark.sql
+
+import java.io.File
+
+import scala.collection.mutable.ArrayBuffer
+
+import org.apache.spark.sql.catalyst.util.{fileToString, stringToFile}
+import org.apache.spark.sql.test.SharedSparkSession
+import org.apache.spark.tags.ExtendedSQLTest
+
+// scalastyle:off line.size.limit
+/**
+ * End-to-end test cases for SQL schemas of expression examples.
+ * The golden result file is "spark/sql/core/src/test/resources/sql-functions/sql-expression-schema.md".
+ *
+ * To run the entire test suite:
+ * {{{
+ *   build/sbt "sql/test-only *ExpressionsSchemaSuite"
+ * }}}
+ *
+ * To re-generate golden files for entire suite, run:
+ * {{{
+ *   SPARK_GENERATE_GOLDEN_FILES=1 build/sbt "sql/test-only *ExpressionsSchemaSuite"
+ * }}}
+ *
+ * For example:
+ * {{{
+ *   ...
+ *   @ExpressionDescription(
+ *     usage = "_FUNC_(str, n) - Returns the string which repeats the given string value n times.",
+ *     examples = """
+ *       Examples:
+ *         > SELECT _FUNC_('123', 2);
+ *          123123
+ *     """,
+ *     since = "1.5.0")
+ *   case class StringRepeat(str: Expression, times: Expression)
+ *   ...
+ * }}}
+ *
+ * The format for golden result files look roughly like:
+ * {{{
+ *   ...
+ *   | org.apache.spark.sql.catalyst.expressions.StringRepeat | repeat | SELECT repeat('123', 2) | struct |
+ *   ...
+ * }}}
+ */
+// scalastyle:on line.size.limit
+@ExtendedSQLTest
+class ExpressionsSchemaSuite extends QueryTest with SharedSparkSession {
+
+  private val regenerateGoldenFiles: Boolean = System.getenv("SPARK_GENERATE_GOLDEN_FILES") == "1"
+
+  private val baseResourcePath = {
+    // We use a path based on Spark home for 2 reasons:
+    //   1. Maven can't get correct resource directory when resources in other jars.
+    //   2. We test subclasses in the hive-thriftserver module.
+    val sparkHome = {
+      assert(sys.props.contains("spark.test.home") ||
+        sys.env.contains("SPARK_HOME"), "spark.test.home or SPARK_HOME is not set.")
+      sys.props.getOrElse("spark.test.home", sys.env("SPARK_HOME"))
+    }
+
+    java.nio.file.Paths.get(sparkHome,
+      "sql", "core", "src", "test", "resources", "sql-functions").toFile
+  }
+
+  private val resultFile = new File(baseResourcePath, "sql-expression-schema.md")
+
+  /** A single SQL query's SQL and schema. */
+  protected case class QueryOutput(
+      className: String,
+      funcName: String,
+      sql: String = "N/A",
+      schema: String = "N/A") {
+    override def toString: String = {
+      s"| $className | $funcName | $sql | $schema |"
+    }
+  }
+
+  test("Check schemas for expression examples") {
+    val exampleRe = """^(.+);\n(?s)(.+)$""".r
+    val funInfos = spark.sessionState.functionRegistry.listFunction().map { funcId =>
+      spark.sessionState.catalog.lookupFunctionInfo(funcId)
+    }
+
+    val classFunsMap = funInfos.groupBy(_.getClassName).toSeq.sortBy(_._1)
+    val outputBuffer = new ArrayBuffer[String]
+    val outputs = new ArrayBuffer[QueryOutput]
+    val missingExamples = new ArrayBuffer[String]
+
+    classFunsMap.foreach { kv =>
+      val className = kv._1
+      kv._2.foreach { funInfo =>
+        val example = funInfo.getExamples
+        val funcName = funInfo.getName.replaceAll("\\|", "&#124;")
+        if (example == "") {
+          val queryOutput = QueryOutput(className, funcName)
+          outputBuffer += queryOutput.toString
+          outputs += queryOutput
+          missingExamples += funcName
+        }
+
+        // If expression exists 'Examples' segment, the first element is 'Examples'. Because

Review comment:
   The fundamental purpose of this PR is to double-check whether the alias of an expression is displayed correctly in the schema. Although some expressions have multiple examples, only the first one is output here.
   
https://github.com/apache/spark/blob/133456d2dc809ea7cd03139556998955074dd288/sql/co
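The idea in the comment above (keep only the first example query per expression and emit one markdown row of the golden file) can be sketched in Scala as follows. This is a minimal, hypothetical sketch, not the code in the PR: `FirstExampleSketch`, `firstExampleQuery` and `row` are illustrative names, the regex mirrors `exampleRe` from the quoted diff, and splitting on `"  > "` and skipping `SET ...` commands are assumptions about how the `examples` text is laid out.

```scala
// Hypothetical sketch (illustrative names, not Spark's API): pick the first
// example query from an @ExpressionDescription-style `examples` string and
// render it as one markdown table row.
object FirstExampleSketch {
  // Same pattern as `exampleRe` in the quoted diff: "<sql>;\n<expected output>".
  private val ExampleRe = """^(.+);\n(?s)(.+)$""".r

  /** Returns the SQL of the first example, skipping `SET ...` commands (assumed layout). */
  def firstExampleQuery(examples: String): Option[String] =
    examples
      .split("  > ")                        // assumption: each example line is indented and starts with "> "
      .drop(1)                              // the first chunk is the "Examples:" header
      .filterNot(_.trim.startsWith("SET"))  // ignore commands that only set configs
      .headOption                           // keep only the first example
      .collect { case ExampleRe(sql, _) => sql.trim }

  /** Escapes '|' in the function name so it cannot break the markdown table. */
  def row(className: String, funcName: String, sql: String, schema: String): String = {
    val escaped = funcName.replace("|", "&#124;")
    s"| $className | $escaped | $sql | $schema |"
  }
}
```

For an examples string containing `> SELECT repeat('123', 2);` followed by `123123`, `firstExampleQuery` yields `Some("SELECT repeat('123', 2)")`; later examples are ignored by design, since the goal is only to check how the alias appears in the schema.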

[GitHub] [spark] AmplabJenkins commented on pull request #27664: [SPARK-30915][SS] CompactibleFileStreamLog: Avoid reading the metadata log file when finding the latest batch ID

2020-04-29 Thread GitBox


AmplabJenkins commented on pull request #27664:
URL: https://github.com/apache/spark/pull/27664#issuecomment-621158084







[GitHub] [spark] HeartSaVioR commented on pull request #28403: [MINOR][INFRA] Add a guide to clarify release/unreleased Spark versions of user-facing change in the Github PR template

2020-04-29 Thread GitBox


HeartSaVioR commented on pull request #28403:
URL: https://github.com/apache/spark/pull/28403#issuecomment-621158065


   Oh, there's another doc describing when to write release notes.
   
https://github.com/kubernetes/community/blob/master/contributors/guide/release-notes.md
   
   It seems broader than I imagined, but I'm still not sure about the benefits of being strict about it.
   
   Shall we gather some reference PRs (best practices) and see what's missing here?



[GitHub] [spark] AmplabJenkins removed a comment on pull request #27664: [SPARK-30915][SS] CompactibleFileStreamLog: Avoid reading the metadata log file when finding the latest batch ID

2020-04-29 Thread GitBox


AmplabJenkins removed a comment on pull request #27664:
URL: https://github.com/apache/spark/pull/27664#issuecomment-621158084


   Merged build finished. Test FAILed.



[GitHub] [spark] SparkQA commented on pull request #28194: [SPARK-31372][SQL][TEST] Display expression schema for double check.

2020-04-29 Thread GitBox


SparkQA commented on pull request #28194:
URL: https://github.com/apache/spark/pull/28194#issuecomment-621159040


   **[Test build #122044 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/122044/testReport)** for PR 28194 at commit [`460da00`](https://github.com/apache/spark/commit/460da00a0f9045e2a4672459e20324a8e5c3a6fa).
* This patch **fails PySpark unit tests**.
* This patch merges cleanly.
* This patch adds no public classes.



[GitHub] [spark] AmplabJenkins removed a comment on pull request #27664: [SPARK-30915][SS] CompactibleFileStreamLog: Avoid reading the metadata log file when finding the latest batch ID

2020-04-29 Thread GitBox


AmplabJenkins removed a comment on pull request #27664:
URL: https://github.com/apache/spark/pull/27664#issuecomment-621158091


   Test FAILed.
   Refer to this link for build results (access rights to CI server needed): 
   https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/122046/
   Test FAILed.



[GitHub] [spark] SparkQA removed a comment on pull request #28194: [SPARK-31372][SQL][TEST] Display expression schema for double check.

2020-04-29 Thread GitBox


SparkQA removed a comment on pull request #28194:
URL: https://github.com/apache/spark/pull/28194#issuecomment-621029754


   **[Test build #122044 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/122044/testReport)** for PR 28194 at commit [`460da00`](https://github.com/apache/spark/commit/460da00a0f9045e2a4672459e20324a8e5c3a6fa).



[GitHub] [spark] HeartSaVioR commented on pull request #27664: [SPARK-30915][SS] CompactibleFileStreamLog: Avoid reading the metadata log file when finding the latest batch ID

2020-04-29 Thread GitBox


HeartSaVioR commented on pull request #27664:
URL: https://github.com/apache/spark/pull/27664#issuecomment-621159650


   retest this, please



[GitHub] [spark] AmplabJenkins removed a comment on pull request #28194: [SPARK-31372][SQL][TEST] Display expression schema for double check.

2020-04-29 Thread GitBox


AmplabJenkins removed a comment on pull request #28194:
URL: https://github.com/apache/spark/pull/28194#issuecomment-621161358


   Merged build finished. Test FAILed.






[GitHub] [spark] AmplabJenkins commented on pull request #28194: [SPARK-31372][SQL][TEST] Display expression schema for double check.

2020-04-29 Thread GitBox


AmplabJenkins commented on pull request #28194:
URL: https://github.com/apache/spark/pull/28194#issuecomment-621161358










[GitHub] [spark] AmplabJenkins removed a comment on pull request #28194: [SPARK-31372][SQL][TEST] Display expression schema for double check.

2020-04-29 Thread GitBox


AmplabJenkins removed a comment on pull request #28194:
URL: https://github.com/apache/spark/pull/28194#issuecomment-621161369


   Test FAILed.
   Refer to this link for build results (access rights to CI server needed): 
   https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/122044/
   Test FAILed.






[GitHub] [spark] SparkQA commented on pull request #27664: [SPARK-30915][SS] CompactibleFileStreamLog: Avoid reading the metadata log file when finding the latest batch ID

2020-04-29 Thread GitBox


SparkQA commented on pull request #27664:
URL: https://github.com/apache/spark/pull/27664#issuecomment-621162135


   **[Test build #122060 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/122060/testReport)**
 for PR 27664 at commit 
[`f6078bb`](https://github.com/apache/spark/commit/f6078bb82c449994037071d8a241cca80c187b75).






[GitHub] [spark] SparkQA commented on pull request #28404: [SPARK-31603][ML]AFT uses common functions in RDDLossFunction

2020-04-29 Thread GitBox


SparkQA commented on pull request #28404:
URL: https://github.com/apache/spark/pull/28404#issuecomment-621162341


   **[Test build #122056 has 
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/122056/testReport)**
 for PR 28404 at commit 
[`7fe08b8`](https://github.com/apache/spark/commit/7fe08b8d6884329bc250a6bfd543b6213369c4c7).
* This patch passes all tests.
* This patch merges cleanly.
* This patch adds no public classes.






[GitHub] [spark] AmplabJenkins removed a comment on pull request #27664: [SPARK-30915][SS] CompactibleFileStreamLog: Avoid reading the metadata log file when finding the latest batch ID

2020-04-29 Thread GitBox


AmplabJenkins removed a comment on pull request #27664:
URL: https://github.com/apache/spark/pull/27664#issuecomment-621162641










[GitHub] [spark] AmplabJenkins commented on pull request #27664: [SPARK-30915][SS] CompactibleFileStreamLog: Avoid reading the metadata log file when finding the latest batch ID

2020-04-29 Thread GitBox


AmplabJenkins commented on pull request #27664:
URL: https://github.com/apache/spark/pull/27664#issuecomment-621162641










[GitHub] [spark] SparkQA removed a comment on pull request #28404: [SPARK-31603][ML]AFT uses common functions in RDDLossFunction

2020-04-29 Thread GitBox


SparkQA removed a comment on pull request #28404:
URL: https://github.com/apache/spark/pull/28404#issuecomment-621124275


   **[Test build #122056 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/122056/testReport)**
 for PR 28404 at commit 
[`7fe08b8`](https://github.com/apache/spark/commit/7fe08b8d6884329bc250a6bfd543b6213369c4c7).






[GitHub] [spark] AmplabJenkins removed a comment on pull request #28404: [SPARK-31603][ML]AFT uses common functions in RDDLossFunction

2020-04-29 Thread GitBox


AmplabJenkins removed a comment on pull request #28404:
URL: https://github.com/apache/spark/pull/28404#issuecomment-621163067










[GitHub] [spark] AmplabJenkins commented on pull request #28404: [SPARK-31603][ML]AFT uses common functions in RDDLossFunction

2020-04-29 Thread GitBox


AmplabJenkins commented on pull request #28404:
URL: https://github.com/apache/spark/pull/28404#issuecomment-621163067










[GitHub] [spark] HeartSaVioR edited a comment on pull request #28403: [MINOR][INFRA] Add a guide to clarify release/unreleased Spark versions of user-facing change in the Github PR template

2020-04-29 Thread GitBox


HeartSaVioR edited a comment on pull request #28403:
URL: https://github.com/apache/spark/pull/28403#issuecomment-621158065


   Oh, there's another doc describing when to write release notes:
   
   https://github.com/kubernetes/community/blob/master/contributors/guide/release-notes.md
   
   It seems to be broader than I imagined, but I'm still not sure about the benefits of being strict about it.
   
   Could we look at some reference PRs (best practices in the k8s repo) and see what's missing here?






[GitHub] [spark] SparkQA commented on pull request #28392: [SPARK-31594][SQL] Do not display the seed of rand/randn with no argument in output schema

2020-04-29 Thread GitBox


SparkQA commented on pull request #28392:
URL: https://github.com/apache/spark/pull/28392#issuecomment-621164974


   **[Test build #122045 has 
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/122045/testReport)**
 for PR 28392 at commit 
[`d90b2e7`](https://github.com/apache/spark/commit/d90b2e726135e91eadf10c2972199c3f45887b5c).
* This patch passes all tests.
* This patch merges cleanly.
* This patch adds no public classes.






[GitHub] [spark] AmplabJenkins removed a comment on pull request #28392: [SPARK-31594][SQL] Do not display the seed of rand/randn with no argument in output schema

2020-04-29 Thread GitBox


AmplabJenkins removed a comment on pull request #28392:
URL: https://github.com/apache/spark/pull/28392#issuecomment-621030330


   Test FAILed.
   Refer to this link for build results (access rights to CI server needed): 
   https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder-K8s/26719/
   Test FAILed.






[GitHub] [spark] SparkQA removed a comment on pull request #28392: [SPARK-31594][SQL] Do not display the seed of rand/randn with no argument in output schema

2020-04-29 Thread GitBox


SparkQA removed a comment on pull request #28392:
URL: https://github.com/apache/spark/pull/28392#issuecomment-621029749


   **[Test build #122045 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/122045/testReport)**
 for PR 28392 at commit 
[`d90b2e7`](https://github.com/apache/spark/commit/d90b2e726135e91eadf10c2972199c3f45887b5c).






[GitHub] [spark] AmplabJenkins commented on pull request #28392: [SPARK-31594][SQL] Do not display the seed of rand/randn with no argument in output schema

2020-04-29 Thread GitBox


AmplabJenkins commented on pull request #28392:
URL: https://github.com/apache/spark/pull/28392#issuecomment-621166097










[GitHub] [spark] AmplabJenkins removed a comment on pull request #28392: [SPARK-31594][SQL] Do not display the seed of rand/randn with no argument in output schema

2020-04-29 Thread GitBox


AmplabJenkins removed a comment on pull request #28392:
URL: https://github.com/apache/spark/pull/28392#issuecomment-621166097










[GitHub] [spark] MaxGekk opened a new pull request #28405: [SPARK-31553][SQL][TESTS][FOLLOWUP] Tests for collection elem types of `isInCollection`

2020-04-29 Thread GitBox


MaxGekk opened a new pull request #28405:
URL: https://github.com/apache/spark/pull/28405


   ### What changes were proposed in this pull request?
   - Add tests for different element types of collections that could be passed to `isInCollection`. The added tests cover the types that pass the check in `In.checkInputDataTypes()`.
   - Test different switch thresholds in the `isInCollection: Scala Collection` test.
   
   ### Why are the changes needed?
   To prevent regressions like the one introduced by https://github.com/apache/spark/pull/25754 and reverted by https://github.com/apache/spark/pull/28388
   
   ### Does this PR introduce any user-facing change?
   No
   
   ### How was this patch tested?
   By existing and new tests in `ColumnExpressionSuite`
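   Assuming the "switch threshold" here is the point at which Spark's optimizer rewrites an `In` list into a hash-based `InSet` (governed by `spark.sql.optimizer.inSetConversionThreshold`), the idea behind testing several thresholds is that the query result must not depend on which evaluation path is taken. The toy sketch below models that switch in plain Scala without Spark; `InSwitchSketch`, `membership` and `resultsAgree` are hypothetical names, and SQL null semantics are deliberately ignored.

```scala
// Toy model (not Spark code): a membership predicate that switches from a
// linear scan over a Seq (like `In`) to a hash lookup (like `InSet`) once the
// number of literals exceeds a configurable threshold.
object InSwitchSketch {

  // Hypothetical helper: builds a predicate for `value IN (values...)`.
  def membership[T](values: Seq[T], threshold: Int): T => Boolean =
    if (values.size > threshold) {
      val set = values.toSet            // "InSet"-style: hash lookup
      (v: T) => set.contains(v)
    } else {
      (v: T) => values.contains(v)      // "In"-style: linear scan
    }

  // Filters the same rows under every threshold; a regression in one path
  // would show up as results that differ between thresholds.
  def resultsAgree[T](rows: Seq[T], values: Seq[T], thresholds: Seq[Int]): Boolean =
    thresholds.map(t => rows.filter(membership(values, t))).distinct.size == 1
}
```

   For example, `InSwitchSketch.resultsAgree(Seq(1, 2, 3, 4, 5), Seq(2, 4, 10), Seq(0, 1, 10))` exercises the hash path (thresholds 0 and 1) and the scan path (threshold 10), and returns `true` because both keep `2` and `4`.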






[GitHub] [spark] SparkQA commented on pull request #28405: [SPARK-31553][SQL][TESTS][FOLLOWUP] Tests for collection elem types of `isInCollection`

2020-04-29 Thread GitBox


SparkQA commented on pull request #28405:
URL: https://github.com/apache/spark/pull/28405#issuecomment-621169689


   **[Test build #122061 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/122061/testReport)**
 for PR 28405 at commit 
[`5bed4ed`](https://github.com/apache/spark/commit/5bed4ede5d94f214123c1fc7a8bc57924efb1efa).






[GitHub] [spark] AmplabJenkins removed a comment on pull request #28405: [SPARK-31553][SQL][TESTS][FOLLOWUP] Tests for collection elem types of `isInCollection`

2020-04-29 Thread GitBox


AmplabJenkins removed a comment on pull request #28405:
URL: https://github.com/apache/spark/pull/28405#issuecomment-621170484










[GitHub] [spark] AmplabJenkins commented on pull request #28405: [SPARK-31553][SQL][TESTS][FOLLOWUP] Tests for collection elem types of `isInCollection`

2020-04-29 Thread GitBox


AmplabJenkins commented on pull request #28405:
URL: https://github.com/apache/spark/pull/28405#issuecomment-621170484










[GitHub] [spark] SparkQA commented on pull request #28404: [SPARK-31603][ML]AFT uses common functions in RDDLossFunction

2020-04-29 Thread GitBox


SparkQA commented on pull request #28404:
URL: https://github.com/apache/spark/pull/28404#issuecomment-621171235


   **[Test build #122057 has 
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/122057/testReport)**
 for PR 28404 at commit 
[`f5d3bb1`](https://github.com/apache/spark/commit/f5d3bb1cf4b998eec29553540b6ffbe0dc99dc3c).
* This patch passes all tests.
* This patch merges cleanly.
* This patch adds no public classes.






[GitHub] [spark] SparkQA removed a comment on pull request #28404: [SPARK-31603][ML]AFT uses common functions in RDDLossFunction

2020-04-29 Thread GitBox


SparkQA removed a comment on pull request #28404:
URL: https://github.com/apache/spark/pull/28404#issuecomment-621130916


   **[Test build #122057 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/122057/testReport)**
 for PR 28404 at commit 
[`f5d3bb1`](https://github.com/apache/spark/commit/f5d3bb1cf4b998eec29553540b6ffbe0dc99dc3c).






[GitHub] [spark] AmplabJenkins commented on pull request #28404: [SPARK-31603][ML]AFT uses common functions in RDDLossFunction

2020-04-29 Thread GitBox


AmplabJenkins commented on pull request #28404:
URL: https://github.com/apache/spark/pull/28404#issuecomment-621171989










[GitHub] [spark] AmplabJenkins removed a comment on pull request #28404: [SPARK-31603][ML]AFT uses common functions in RDDLossFunction

2020-04-29 Thread GitBox


AmplabJenkins removed a comment on pull request #28404:
URL: https://github.com/apache/spark/pull/28404#issuecomment-621171989










[GitHub] [spark] yaooqinn commented on a change in pull request #28402: [SPARK-31586][SQL][FOLLOWUP] Restore SQL string for datetime - interval operations

2020-04-29 Thread GitBox


yaooqinn commented on a change in pull request #28402:
URL: https://github.com/apache/spark/pull/28402#discussion_r417277741



##
File path: sql/core/src/test/resources/sql-tests/inputs/interval.sql
##
@@ -132,6 +132,7 @@ select
   interval '99 11:22:33.123456789' day to second + dateval
 from interval_arithmetic;
 
+-- datetimes(in string representation) + intervals

Review comment:
   oops... fixed.








[GitHub] [spark] SparkQA commented on pull request #28402: [SPARK-31586][SQL][FOLLOWUP] Restore SQL string for datetime - interval operations

2020-04-29 Thread GitBox


SparkQA commented on pull request #28402:
URL: https://github.com/apache/spark/pull/28402#issuecomment-621174037


   **[Test build #122062 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/122062/testReport)**
 for PR 28402 at commit 
[`6c873ba`](https://github.com/apache/spark/commit/6c873ba9d1b67cae348c60ddf62bd35ea463d7cb).






[GitHub] [spark] SparkQA commented on pull request #28194: [SPARK-31372][SQL][TEST] Display expression schema for double check.

2020-04-29 Thread GitBox


SparkQA commented on pull request #28194:
URL: https://github.com/apache/spark/pull/28194#issuecomment-621174443


   **[Test build #122063 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/122063/testReport)**
 for PR 28194 at commit 
[`e571667`](https://github.com/apache/spark/commit/e57166790e5da48d7d31f609f4658605d43d4482).






[GitHub] [spark] AmplabJenkins removed a comment on pull request #28194: [SPARK-31372][SQL][TEST] Display expression schema for double check.

2020-04-29 Thread GitBox


AmplabJenkins removed a comment on pull request #28194:
URL: https://github.com/apache/spark/pull/28194#issuecomment-621174855










[GitHub] [spark] AmplabJenkins commented on pull request #28194: [SPARK-31372][SQL][TEST] Display expression schema for double check.

2020-04-29 Thread GitBox


AmplabJenkins commented on pull request #28194:
URL: https://github.com/apache/spark/pull/28194#issuecomment-621174855










[GitHub] [spark] AmplabJenkins removed a comment on pull request #28402: [SPARK-31586][SQL][FOLLOWUP] Restore SQL string for datetime - interval operations

2020-04-29 Thread GitBox


AmplabJenkins removed a comment on pull request #28402:
URL: https://github.com/apache/spark/pull/28402#issuecomment-621174832










[GitHub] [spark] AmplabJenkins commented on pull request #28402: [SPARK-31586][SQL][FOLLOWUP] Restore SQL string for datetime - interval operations

2020-04-29 Thread GitBox


AmplabJenkins commented on pull request #28402:
URL: https://github.com/apache/spark/pull/28402#issuecomment-621174832










[GitHub] [spark] AmplabJenkins commented on pull request #28379: [SPARK-28040][SPARK-28070][R] Write type object s3

2020-04-29 Thread GitBox


AmplabJenkins commented on pull request #28379:
URL: https://github.com/apache/spark/pull/28379#issuecomment-621174940










[GitHub] [spark] AmplabJenkins removed a comment on pull request #28379: [SPARK-28040][SPARK-28070][R] Write type object s3

2020-04-29 Thread GitBox


AmplabJenkins removed a comment on pull request #28379:
URL: https://github.com/apache/spark/pull/28379#issuecomment-621174940










[GitHub] [spark] beliefer commented on a change in pull request #28194: [SPARK-31372][SQL][TEST] Display expression schema for double check.

2020-04-29 Thread GitBox


beliefer commented on a change in pull request #28194:
URL: https://github.com/apache/spark/pull/28194#discussion_r417284748



##
File path: sql/core/src/test/scala/org/apache/spark/sql/ExpressionsSchemaSuite.scala
##
@@ -0,0 +1,195 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements.  See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License.  You may obtain a copy of the License at
+ *
+ *    http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.spark.sql
+
+import java.io.File
+
+import scala.collection.mutable.ArrayBuffer
+
+import org.apache.spark.sql.catalyst.util.{fileToString, stringToFile}
+import org.apache.spark.sql.test.SharedSparkSession
+import org.apache.spark.tags.ExtendedSQLTest
+
+// scalastyle:off line.size.limit
+/**
+ * End-to-end test cases for SQL schemas of expression examples.
+ * The golden result file is "spark/sql/core/src/test/resources/sql-functions/sql-expression-schema.md".
+ *
+ * To run the entire test suite:
+ * {{{
+ *   build/sbt "sql/test-only *ExpressionsSchemaSuite"
+ * }}}
+ *
+ * To re-generate golden files for entire suite, run:
+ * {{{
+ *   SPARK_GENERATE_GOLDEN_FILES=1 build/sbt "sql/test-only *ExpressionsSchemaSuite"
+ * }}}
+ *
+ * For example:
+ * {{{
+ *   ...
+ *   @ExpressionDescription(
+ *     usage = "_FUNC_(str, n) - Returns the string which repeats the given string value n times.",
+ *     examples = """
+ *       Examples:
+ *         > SELECT _FUNC_('123', 2);
+ *          123123
+ *     """,
+ *     since = "1.5.0")
+ *   case class StringRepeat(str: Expression, times: Expression)
+ *   ...
+ * }}}
+ *
+ * The format for golden result files look roughly like:
+ * {{{
+ *   ...
+ *   | org.apache.spark.sql.catalyst.expressions.StringRepeat | repeat | SELECT repeat('123', 2) | struct<repeat(123, 2):string> |
+ *   ...
+ * }}}
+ */
+// scalastyle:on line.size.limit
+@ExtendedSQLTest
+class ExpressionsSchemaSuite extends QueryTest with SharedSparkSession {
+
+  private val regenerateGoldenFiles: Boolean = System.getenv("SPARK_GENERATE_GOLDEN_FILES") == "1"
+
+  private val baseResourcePath = {
+    // We use a path based on Spark home for 2 reasons:
+    //   1. Maven can't get correct resource directory when resources in other jars.
+    //   2. We test subclasses in the hive-thriftserver module.
+    val sparkHome = {
+      assert(sys.props.contains("spark.test.home") ||
+        sys.env.contains("SPARK_HOME"), "spark.test.home or SPARK_HOME is not set.")
+      sys.props.getOrElse("spark.test.home", sys.env("SPARK_HOME"))
+    }
+
+    java.nio.file.Paths.get(sparkHome,
+      "sql", "core", "src", "test", "resources", "sql-functions").toFile
+  }
+
+  private val resultFile = new File(baseResourcePath, "sql-expression-schema.md")
+
+  /** A single SQL query's SQL and schema. */
+  protected case class QueryOutput(
+      className: String,
+      funcName: String,
+      sql: String = "N/A",
+      schema: String = "N/A") {
+    override def toString: String = {
+      s"| $className | $funcName | $sql | $schema |"
+    }
+  }
+
+  test("Check schemas for expression examples") {
+    val exampleRe = """^(.+);\n(?s)(.+)$""".r
+    val funInfos = spark.sessionState.functionRegistry.listFunction().map { funcId =>
+      spark.sessionState.catalog.lookupFunctionInfo(funcId)
+    }
+
+    val classFunsMap = funInfos.groupBy(_.getClassName).toSeq.sortBy(_._1)
+    val outputBuffer = new ArrayBuffer[String]
+    val outputs = new ArrayBuffer[QueryOutput]
+    val missingExamples = new ArrayBuffer[String]
+
+    classFunsMap.foreach { kv =>
+      val className = kv._1
+      kv._2.foreach { funInfo =>
+        val example = funInfo.getExamples
+        val funcName = funInfo.getName.replaceAll("\\|", "&#124;")
+        if (example == "") {
+          val queryOutput = QueryOutput(className, funcName)
+          outputBuffer += queryOutput.toString
+          outputs += queryOutput
+          missingExamples += funcName
+        }
+
+        // If expression exists 'Examples' segment, the first element is 'Examples'. Because
+        // this test case is only used to print aliases of expressions for double checking.
+        // Therefore, we only need to output the first SQL and its corresponding schema.
+        // Note: We need to filter out the commands that set the parameters, such as:
+        // SET spark.sql.parser.escapedStringLiterals=tr
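Each golden-file row pairs a function with the SQL of its first example and that query's output schema. The plain-Scala sketch below (no SparkSession involved) shows how a pattern like `exampleRe` splits an example into its SQL and expected output, and how a row could be rendered with `|` in function names escaped as the HTML entity `&#124;` so the Markdown table stays intact. `SchemaRowSketch`, `parseExample`, `escapePipes` and `row` are hypothetical names, and the escaping target is an assumption; the suite itself obtains the schema by running the query, which is not modeled here.

```scala
// Sketch (not the suite's code): parse an expression example into its SQL and
// expected output, then render one golden-file row.
object SchemaRowSketch {

  // Same pattern as `exampleRe` in the diff: everything on the first line before
  // its final ';', then (with DOTALL enabled by (?s)) the rest as the output.
  private val exampleRe = """^(.+);\n(?s)(.+)$""".r

  // Escapes pipes so a function name such as `|` cannot break the Markdown table.
  def escapePipes(s: String): String = s.replaceAll("\\|", "&#124;")

  // Returns (sql, expectedOutput) when the example has the expected shape.
  def parseExample(example: String): Option[(String, String)] = example match {
    case exampleRe(sql, output) => Some((sql, output))
    case _                      => None
  }

  // Renders one row in the `| class | name | sql | schema |` layout.
  def row(className: String, funcName: String, sql: String, schema: String): String =
    s"| $className | ${escapePipes(funcName)} | $sql | $schema |"
}
```

Only the function name is escaped in this sketch; SQL text that itself contains `|` would need the same treatment before it goes into the table.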

[GitHub] [spark] SparkQA commented on pull request #28379: [SPARK-28040][SPARK-28070][R] Write type object s3

2020-04-29 Thread GitBox


SparkQA commented on pull request #28379:
URL: https://github.com/apache/spark/pull/28379#issuecomment-621179244


   **[Test build #122064 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/122064/testReport)**
 for PR 28379 at commit 
[`780cef7`](https://github.com/apache/spark/commit/780cef739ee1fb3f51b83f4e2527b73a7512b33d).






[GitHub] [spark] SparkQA commented on pull request #28194: [SPARK-31372][SQL][TEST] Display expression schema for double check.

2020-04-29 Thread GitBox


SparkQA commented on pull request #28194:
URL: https://github.com/apache/spark/pull/28194#issuecomment-621183678


   **[Test build #122065 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/122065/testReport)**
 for PR 28194 at commit 
[`a4d4de9`](https://github.com/apache/spark/commit/a4d4de9e472dbd55ffbbc13ae1c8ad615a7e3455).






[GitHub] [spark] AmplabJenkins commented on pull request #28194: [SPARK-31372][SQL][TEST] Display expression schema for double check.

2020-04-29 Thread GitBox


AmplabJenkins commented on pull request #28194:
URL: https://github.com/apache/spark/pull/28194#issuecomment-621184526










[GitHub] [spark] AmplabJenkins removed a comment on pull request #28194: [SPARK-31372][SQL][TEST] Display expression schema for double check.

2020-04-29 Thread GitBox


AmplabJenkins removed a comment on pull request #28194:
URL: https://github.com/apache/spark/pull/28194#issuecomment-621184526










[GitHub] [spark] SparkQA commented on pull request #28397: [SPARK-31519][SQL][2.4] Datetime functions in having aggregate expressions returns the wrong result

2020-04-29 Thread GitBox


SparkQA commented on pull request #28397:
URL: https://github.com/apache/spark/pull/28397#issuecomment-621188751


   **[Test build #122066 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/122066/testReport)**
 for PR 28397 at commit 
[`e98f120`](https://github.com/apache/spark/commit/e98f120f6847c9a1c902d4855aa280bd49cb5c48).






[GitHub] [spark] AmplabJenkins commented on pull request #28397: [SPARK-31519][SQL][2.4] Datetime functions in having aggregate expressions returns the wrong result

2020-04-29 Thread GitBox


AmplabJenkins commented on pull request #28397:
URL: https://github.com/apache/spark/pull/28397#issuecomment-621189401










[GitHub] [spark] AmplabJenkins removed a comment on pull request #28397: [SPARK-31519][SQL][2.4] Datetime functions in having aggregate expressions returns the wrong result

2020-04-29 Thread GitBox


AmplabJenkins removed a comment on pull request #28397:
URL: https://github.com/apache/spark/pull/28397#issuecomment-621189401










[GitHub] [spark] SparkQA removed a comment on pull request #28379: [SPARK-28040][SPARK-28070][R] Write type object s3

2020-04-29 Thread GitBox


SparkQA removed a comment on pull request #28379:
URL: https://github.com/apache/spark/pull/28379#issuecomment-621108027


   **[Test build #122054 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/122054/testReport)**
 for PR 28379 at commit 
[`8af440d`](https://github.com/apache/spark/commit/8af440d26ded6878580a61b823fcdb7da4755a37).






[GitHub] [spark] SparkQA commented on pull request #28379: [SPARK-28040][SPARK-28070][R] Write type object s3

2020-04-29 Thread GitBox


SparkQA commented on pull request #28379:
URL: https://github.com/apache/spark/pull/28379#issuecomment-621192121


   **[Test build #122054 has 
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/122054/testReport)**
 for PR 28379 at commit 
[`8af440d`](https://github.com/apache/spark/commit/8af440d26ded6878580a61b823fcdb7da4755a37).
* This patch **fails Spark unit tests**.
* This patch merges cleanly.
* This patch adds no public classes.






[GitHub] [spark] xuanyuanking commented on a change in pull request #28397: [SPARK-31519][SQL][2.4] Datetime functions in having aggregate expressions returns the wrong result

2020-04-29 Thread GitBox


xuanyuanking commented on a change in pull request #28397:
URL: https://github.com/apache/spark/pull/28397#discussion_r417299662



##
File path: sql/core/src/test/resources/sql-tests/inputs/having.sql
##
@@ -16,3 +16,6 @@ SELECT MIN(t.v) FROM (SELECT * FROM hav WHERE v > 0) t HAVING(COUNT(1) > 0);
 
 -- SPARK-20329: make sure we handle timezones correctly
 SELECT a + b FROM VALUES (1L, 2), (3L, 4) AS T(a, b) GROUP BY a + b HAVING a + b > 1;
+
+-- SPARK-31519: Cast in having aggregate expressions returns the wrong result
+SELECT SUM(a) AS b, CAST('2020-01-01' AS DATE) AS fake FROM VALUES (1, 10), (2, 20) AS T(a, b) GROUP BY b HAVING b > 10

Review comment:
   Thanks for the verification, I used another test case and changed the 
description.








[GitHub] [spark] SparkQA commented on pull request #28194: [SPARK-31372][SQL][TEST] Display expression schema for double check.

2020-04-29 Thread GitBox


SparkQA commented on pull request #28194:
URL: https://github.com/apache/spark/pull/28194#issuecomment-621192321


   **[Test build #122049 has 
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/122049/testReport)**
 for PR 28194 at commit 
[`133456d`](https://github.com/apache/spark/commit/133456d2dc809ea7cd03139556998955074dd288).
* This patch passes all tests.
* This patch merges cleanly.
* This patch adds no public classes.






[GitHub] [spark] SparkQA removed a comment on pull request #28194: [SPARK-31372][SQL][TEST] Display expression schema for double check.

2020-04-29 Thread GitBox


SparkQA removed a comment on pull request #28194:
URL: https://github.com/apache/spark/pull/28194#issuecomment-621047343


   **[Test build #122049 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/122049/testReport)**
 for PR 28194 at commit 
[`133456d`](https://github.com/apache/spark/commit/133456d2dc809ea7cd03139556998955074dd288).






[GitHub] [spark] AmplabJenkins commented on pull request #28379: [SPARK-28040][SPARK-28070][R] Write type object s3

2020-04-29 Thread GitBox


AmplabJenkins commented on pull request #28379:
URL: https://github.com/apache/spark/pull/28379#issuecomment-621192916










[GitHub] [spark] AmplabJenkins removed a comment on pull request #28379: [SPARK-28040][SPARK-28070][R] Write type object s3

2020-04-29 Thread GitBox


AmplabJenkins removed a comment on pull request #28379:
URL: https://github.com/apache/spark/pull/28379#issuecomment-621192916


   Merged build finished. Test FAILed.






[GitHub] [spark] AmplabJenkins commented on pull request #28194: [SPARK-31372][SQL][TEST] Display expression schema for double check.

2020-04-29 Thread GitBox


AmplabJenkins commented on pull request #28194:
URL: https://github.com/apache/spark/pull/28194#issuecomment-621194164










[GitHub] [spark] SparkQA commented on pull request #28402: [SPARK-31586][SQL][FOLLOWUP] Restore SQL string for datetime - interval operations

2020-04-29 Thread GitBox


SparkQA commented on pull request #28402:
URL: https://github.com/apache/spark/pull/28402#issuecomment-621194277


   **[Test build #122052 has 
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/122052/testReport)**
 for PR 28402 at commit 
[`feb2f02`](https://github.com/apache/spark/commit/feb2f0293d4cfd2a1a3d9a918b884849f10399d6).
* This patch **fails Spark unit tests**.
* This patch merges cleanly.
* This patch adds the following public classes _(experimental)_:
 * `case class DatetimeSub(`






[GitHub] [spark] AmplabJenkins removed a comment on pull request #28379: [SPARK-28040][SPARK-28070][R] Write type object s3

2020-04-29 Thread GitBox


AmplabJenkins removed a comment on pull request #28379:
URL: https://github.com/apache/spark/pull/28379#issuecomment-621192932


   Test FAILed.
   Refer to this link for build results (access rights to CI server needed): 
   https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/122054/
   Test FAILed.






[GitHub] [spark] AmplabJenkins removed a comment on pull request #28194: [SPARK-31372][SQL][TEST] Display expression schema for double check.

2020-04-29 Thread GitBox


AmplabJenkins removed a comment on pull request #28194:
URL: https://github.com/apache/spark/pull/28194#issuecomment-621194164










[GitHub] [spark] SparkQA removed a comment on pull request #28402: [SPARK-31586][SQL][FOLLOWUP] Restore SQL string for datetime - interval operations

2020-04-29 Thread GitBox


SparkQA removed a comment on pull request #28402:
URL: https://github.com/apache/spark/pull/28402#issuecomment-621069279


   **[Test build #122052 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/122052/testReport)**
 for PR 28402 at commit 
[`feb2f02`](https://github.com/apache/spark/commit/feb2f0293d4cfd2a1a3d9a918b884849f10399d6).






[GitHub] [spark] AmplabJenkins commented on pull request #28402: [SPARK-31586][SQL][FOLLOWUP] Restore SQL string for datetime - interval operations

2020-04-29 Thread GitBox


AmplabJenkins commented on pull request #28402:
URL: https://github.com/apache/spark/pull/28402#issuecomment-621194957










[GitHub] [spark] AmplabJenkins removed a comment on pull request #28402: [SPARK-31586][SQL][FOLLOWUP] Restore SQL string for datetime - interval operations

2020-04-29 Thread GitBox


AmplabJenkins removed a comment on pull request #28402:
URL: https://github.com/apache/spark/pull/28402#issuecomment-621194957


   Merged build finished. Test FAILed.






[GitHub] [spark] AmplabJenkins removed a comment on pull request #28402: [SPARK-31586][SQL][FOLLOWUP] Restore SQL string for datetime - interval operations

2020-04-29 Thread GitBox


AmplabJenkins removed a comment on pull request #28402:
URL: https://github.com/apache/spark/pull/28402#issuecomment-621194968


   Test FAILed.
   Refer to this link for build results (access rights to CI server needed): 
   https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/122052/
   Test FAILed.






[GitHub] [spark] SparkQA commented on pull request #28403: [MINOR][INFRA] Add a guide to clarify release/unreleased Spark versions of user-facing change in the Github PR template

2020-04-29 Thread GitBox


SparkQA commented on pull request #28403:
URL: https://github.com/apache/spark/pull/28403#issuecomment-621201041


   **[Test build #122053 has 
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/122053/testReport)**
 for PR 28403 at commit 
[`540c600`](https://github.com/apache/spark/commit/540c600b8633be372ee70a47626661dea7df1e4c).
* This patch **fails Spark unit tests**.
* This patch merges cleanly.
* This patch adds no public classes.






[GitHub] [spark] SparkQA removed a comment on pull request #28403: [MINOR][INFRA] Add a guide to clarify release/unreleased Spark versions of user-facing change in the Github PR template

2020-04-29 Thread GitBox


SparkQA removed a comment on pull request #28403:
URL: https://github.com/apache/spark/pull/28403#issuecomment-621104804


   **[Test build #122053 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/122053/testReport)**
 for PR 28403 at commit 
[`540c600`](https://github.com/apache/spark/commit/540c600b8633be372ee70a47626661dea7df1e4c).






[GitHub] [spark] AmplabJenkins removed a comment on pull request #28403: [MINOR][INFRA] Add a guide to clarify release/unreleased Spark versions of user-facing change in the Github PR template

2020-04-29 Thread GitBox


AmplabJenkins removed a comment on pull request #28403:
URL: https://github.com/apache/spark/pull/28403#issuecomment-621202261


   Merged build finished. Test FAILed.






[GitHub] [spark] AmplabJenkins commented on pull request #28403: [MINOR][INFRA] Add a guide to clarify release/unreleased Spark versions of user-facing change in the Github PR template

2020-04-29 Thread GitBox


AmplabJenkins commented on pull request #28403:
URL: https://github.com/apache/spark/pull/28403#issuecomment-621202261










[GitHub] [spark] AmplabJenkins removed a comment on pull request #28403: [MINOR][INFRA] Add a guide to clarify release/unreleased Spark versions of user-facing change in the Github PR template

2020-04-29 Thread GitBox


AmplabJenkins removed a comment on pull request #28403:
URL: https://github.com/apache/spark/pull/28403#issuecomment-621202278


   Test FAILed.
   Refer to this link for build results (access rights to CI server needed): 
   https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/122053/
   Test FAILed.






[GitHub] [spark] SparkQA removed a comment on pull request #28366: [WIP][SPARK-31365][SQL] Enable nested predicate pushdown per data sources

2020-04-29 Thread GitBox


SparkQA removed a comment on pull request #28366:
URL: https://github.com/apache/spark/pull/28366#issuecomment-621058280


   **[Test build #122051 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/122051/testReport)**
 for PR 28366 at commit 
[`84bc8dd`](https://github.com/apache/spark/commit/84bc8dd4f7ec1d49ad8631435f8cf212f92efd9b).






[GitHub] [spark] SparkQA commented on pull request #28366: [WIP][SPARK-31365][SQL] Enable nested predicate pushdown per data sources

2020-04-29 Thread GitBox


SparkQA commented on pull request #28366:
URL: https://github.com/apache/spark/pull/28366#issuecomment-621208144


   **[Test build #122051 has 
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/122051/testReport)**
 for PR 28366 at commit 
[`84bc8dd`](https://github.com/apache/spark/commit/84bc8dd4f7ec1d49ad8631435f8cf212f92efd9b).
* This patch passes all tests.
* This patch merges cleanly.
* This patch adds no public classes.






[GitHub] [spark] AmplabJenkins removed a comment on pull request #28366: [WIP][SPARK-31365][SQL] Enable nested predicate pushdown per data sources

2020-04-29 Thread GitBox


AmplabJenkins removed a comment on pull request #28366:
URL: https://github.com/apache/spark/pull/28366#issuecomment-621209917










[GitHub] [spark] akshatb1 commented on a change in pull request #28258: [SPARK-31486] [CORE] spark.submit.waitAppCompletion flag to control spark-submit exit in Standalone Cluster Mode

2020-04-29 Thread GitBox


akshatb1 commented on a change in pull request #28258:
URL: https://github.com/apache/spark/pull/28258#discussion_r417317689



##
File path: docs/spark-standalone.md
##
@@ -240,6 +240,16 @@ SPARK_MASTER_OPTS supports the following system properties:
   
   1.6.3
 
+
+  spark.submit.waitAppCompletion

Review comment:
   That's right, it's an app setting. I couldn't find a section for 
application settings in spark-standalone.md. Could you please suggest where we 
can add it? https://spark.apache.org/docs/latest/spark-standalone.html








[GitHub] [spark] AmplabJenkins commented on pull request #28366: [WIP][SPARK-31365][SQL] Enable nested predicate pushdown per data sources

2020-04-29 Thread GitBox


AmplabJenkins commented on pull request #28366:
URL: https://github.com/apache/spark/pull/28366#issuecomment-621209917










[GitHub] [spark] akshatb1 commented on a change in pull request #28258: [SPARK-31486] [CORE] spark.submit.waitAppCompletion flag to control spark-submit exit in Standalone Cluster Mode

2020-04-29 Thread GitBox


akshatb1 commented on a change in pull request #28258:
URL: https://github.com/apache/spark/pull/28258#discussion_r417318841



##
File path: docs/spark-standalone.md
##
@@ -240,6 +240,16 @@ SPARK_MASTER_OPTS supports the following system properties:
   
   1.6.3
 
+
+  spark.submit.waitAppCompletion

Review comment:
   Or should it be added in the generic configuration 
application-properties section here: 
https://spark.apache.org/docs/latest/configuration.html#application-properties? 








[GitHub] [spark] srowen commented on a change in pull request #28258: [SPARK-31486] [CORE] spark.submit.waitAppCompletion flag to control spark-submit exit in Standalone Cluster Mode

2020-04-29 Thread GitBox


srowen commented on a change in pull request #28258:
URL: https://github.com/apache/spark/pull/28258#discussion_r417318879



##
File path: docs/spark-standalone.md
##
@@ -240,6 +240,16 @@ SPARK_MASTER_OPTS supports the following system properties:
   
   1.6.3
 
+
+  spark.submit.waitAppCompletion

Review comment:
   Hm, maybe configuration.md, but it's specific to standalone.
   I suppose you could mark the existing configs as 'cluster configs' and make 
a new table of 'client configs'? 
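The PR title says the flag controls whether `spark-submit` exits before the application finishes in standalone cluster mode. The sketch below illustrates that behavior in isolation, assuming the client simply polls the driver's status until it reaches a terminal state; `awaitDriver`, the `State` objects, the polling interval and the `maxPolls` cap are hypothetical and are not Spark's client implementation or its configuration defaults.

```scala
object WaitAppCompletionSketch {
  // Local stand-in for a driver's lifecycle; the names loosely mirror the standalone
  // master's driver states, but this is not Spark's DriverState enumeration.
  sealed trait State
  case object Submitted extends State
  case object Running extends State
  case object Finished extends State
  case object Failed extends State
  case object Killed extends State

  private val terminal: Set[State] = Set(Finished, Failed, Killed)

  // Hypothetical client loop: with waitAppCompletion = false it reports the first status
  // and returns; with true it keeps polling until the driver reaches a terminal state
  // (or maxPolls is hit, so the sketch cannot loop forever).
  def awaitDriver(
      waitAppCompletion: Boolean,
      status: () => State,
      pollIntervalMs: Long,
      maxPolls: Int): State = {
    var state = status()
    var polls = 1
    while (waitAppCompletion && !terminal(state) && polls < maxPolls) {
      Thread.sleep(pollIntervalMs)
      state = status()
      polls += 1
    }
    state
  }

  def main(args: Array[String]): Unit = {
    val updates = Iterator(Submitted, Running, Running, Finished)
    println(awaitDriver(waitAppCompletion = true, () => updates.next(), 10L, 100)) // prints Finished
  }
}
```

Since the flag only changes how long the submitting client stays alive, not how the cluster runs the driver, it behaves like a client-side setting, which is the distinction being discussed for where to document it.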








[GitHub] [spark] beliefer commented on a change in pull request #28194: [SPARK-31372][SQL][TEST] Display expression schema for double check.

2020-04-29 Thread GitBox


beliefer commented on a change in pull request #28194:
URL: https://github.com/apache/spark/pull/28194#discussion_r417260790



##
File path: 
sql/core/src/test/scala/org/apache/spark/sql/ExpressionsSchemaSuite.scala
##
@@ -0,0 +1,195 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements.  See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License.  You may obtain a copy of the License at
+ *
+ *      http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.spark.sql
+
+import java.io.File
+
+import scala.collection.mutable.ArrayBuffer
+
+import org.apache.spark.sql.catalyst.util.{fileToString, stringToFile}
+import org.apache.spark.sql.test.SharedSparkSession
+import org.apache.spark.tags.ExtendedSQLTest
+
+// scalastyle:off line.size.limit
+/**
+ * End-to-end test cases for SQL schemas of expression examples.
+ * The golden result file is "spark/sql/core/src/test/resources/sql-functions/sql-expression-schema.md".
+ *
+ * To run the entire test suite:
+ * {{{
+ *   build/sbt "sql/test-only *ExpressionsSchemaSuite"
+ * }}}
+ *
+ * To re-generate golden files for entire suite, run:
+ * {{{
+ *   SPARK_GENERATE_GOLDEN_FILES=1 build/sbt "sql/test-only *ExpressionsSchemaSuite"
+ * }}}
+ *
+ * For example:
+ * {{{
+ *   ...
+ *   @ExpressionDescription(
+ *     usage = "_FUNC_(str, n) - Returns the string which repeats the given string value n times.",
+ *     examples = """
+ *       Examples:
+ *         > SELECT _FUNC_('123', 2);
+ *          123123
+ *     """,
+ *     since = "1.5.0")
+ *   case class StringRepeat(str: Expression, times: Expression)
+ *   ...
+ * }}}
+ *
+ * The format for golden result files look roughly like:
+ * {{{
+ *   ...
+ *   | org.apache.spark.sql.catalyst.expressions.StringRepeat | repeat | SELECT repeat('123', 2) | struct |
+ *   ...
+ * }}}
+ */
+// scalastyle:on line.size.limit
+@ExtendedSQLTest
+class ExpressionsSchemaSuite extends QueryTest with SharedSparkSession {
+
+  private val regenerateGoldenFiles: Boolean = System.getenv("SPARK_GENERATE_GOLDEN_FILES") == "1"
+
+  private val baseResourcePath = {
+    // We use a path based on Spark home for 2 reasons:
+    //   1. Maven can't get the correct resource directory when resources are in other jars.
+    //   2. We test subclasses in the hive-thriftserver module.
+    val sparkHome = {
+      assert(sys.props.contains("spark.test.home") ||
+        sys.env.contains("SPARK_HOME"), "spark.test.home or SPARK_HOME is not set.")
+      sys.props.getOrElse("spark.test.home", sys.env("SPARK_HOME"))
+    }
+
+    java.nio.file.Paths.get(sparkHome,
+      "sql", "core", "src", "test", "resources", "sql-functions").toFile
+  }
+
+  private val resultFile = new File(baseResourcePath, "sql-expression-schema.md")
+
+  /** A single SQL query's SQL and schema. */
+  protected case class QueryOutput(
+      className: String,
+      funcName: String,
+      sql: String = "N/A",
+      schema: String = "N/A") {
+    override def toString: String = {
+      s"| $className | $funcName | $sql | $schema |"
+    }
+  }
+
+  test("Check schemas for expression examples") {
+val exampleRe = """^(.+);\n(?s)(.+)$""".r
+val funInfos = spark.sessionState.functionRegistry.listFunction().map { 
funcId =>
+  spark.sessionState.catalog.lookupFunctionInfo(funcId)
+}
+
+val classFunsMap = funInfos.groupBy(_.getClassName).toSeq.sortBy(_._1)
+val outputBuffer = new ArrayBuffer[String]
+val outputs = new ArrayBuffer[QueryOutput]
+val missingExamples = new ArrayBuffer[String]
+
+classFunsMap.foreach { kv =>
+  val className = kv._1
+  kv._2.foreach { funInfo =>
+val example = funInfo.getExamples
+val funcName = funInfo.getName.replaceAll("\\|", "|")
+if (example == "") {
+  val queryOutput = QueryOutput(className, funcName)
+  outputBuffer += queryOutput.toString
+  outputs += queryOutput
+  missingExamples += funcName
+}
+
+// If expression exists 'Examples' segment, the first element is 
'Examples'. Because

Review comment:
   The fundamental purpose of this PR is to double-check whether the alias of an expression is displayed correctly in the schema. Although some expressions have multiple examples, only the first one is output here.
   https://github.com/apache/spark/pull/28194#discussion_r407235388
   
https://github.c
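The review comment above explains that only the first example of each expression is written to the golden file. Below is a minimal sketch of that selection step and of the Markdown row rendering. It assumes examples are laid out like the `StringRepeat` example in the suite's scaladoc: each query sits on its own line, is prefixed by `>`, and ends in `;`. `GoldenRowSketch`, `firstExampleSql`, `escapePipes`, and `row` are hypothetical names, not code from the PR. The sketch also does not compute a schema, because that step needs a running `SparkSession`.

```scala
// Hypothetical sketch, not code from the PR: picks the first example query and
// renders one golden-file row in the "| class | name | sql | schema |" layout.
object GoldenRowSketch {

  // Replace '|' with the HTML entity "&#124;" so operator names such as `|` or `||`
  // do not split a Markdown table cell.
  def escapePipes(s: String): String = s.replace("|", "&#124;")

  // Return the first "> ..." query in an examples string, without the leading '>'
  // and the trailing ';'. Returns None when there is no example query.
  def firstExampleSql(examples: String): Option[String] =
    examples.split("\n").iterator
      .map(_.trim)
      .find(_.startsWith(">"))
      .map(_.stripPrefix(">").trim.stripSuffix(";"))

  // Render one row; missing SQL or schema falls back to "N/A", like QueryOutput's defaults.
  def row(className: String, funcName: String, sql: Option[String], schema: Option[String]): String =
    Seq(className, funcName, sql.getOrElse("N/A"), schema.getOrElse("N/A"))
      .map(escapePipes)
      .mkString("| ", " | ", " |")

  def main(args: Array[String]): Unit = {
    // Two examples; only the first one is kept.
    val examples =
      "\n    Examples:\n      > SELECT repeat('123', 2);\n       123123\n      > SELECT repeat('ab', 3);\n       ababab\n  "
    val sql = firstExampleSql(examples)
    // prints: | org.apache.spark.sql.catalyst.expressions.StringRepeat | repeat | SELECT repeat('123', 2) | N/A |
    println(row("org.apache.spark.sql.catalyst.expressions.StringRepeat", "repeat", sql, None))
  }
}
```

Escaping `|` as `&#124;` matters for operators such as `|` and `||`. Without it, their names would split a table cell in the golden Markdown file.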

[GitHub] [spark] SparkQA removed a comment on pull request #28395: [SPARK-31549][PYSPARK] Add a develop API invoking collect on Python RDD with user-specified job group

2020-04-29 Thread GitBox


SparkQA removed a comment on pull request #28395:
URL: https://github.com/apache/spark/pull/28395#issuecomment-621138777


   **[Test build #122059 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/122059/testReport)**
 for PR 28395 at commit 
[`481bba6`](https://github.com/apache/spark/commit/481bba62ce62a13f23ce153e54e8a5f56f6059c2).



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] [spark] SparkQA commented on pull request #28395: [SPARK-31549][PYSPARK] Add a develop API invoking collect on Python RDD with user-specified job group

2020-04-29 Thread GitBox


SparkQA commented on pull request #28395:
URL: https://github.com/apache/spark/pull/28395#issuecomment-621220102


   **[Test build #122059 has 
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/122059/testReport)**
 for PR 28395 at commit 
[`481bba6`](https://github.com/apache/spark/commit/481bba62ce62a13f23ce153e54e8a5f56f6059c2).
* This patch **fails PySpark unit tests**.
* This patch merges cleanly.
* This patch adds no public classes.



[GitHub] [spark] AmplabJenkins removed a comment on pull request #28395: [SPARK-31549][PYSPARK] Add a develop API invoking collect on Python RDD with user-specified job group

2020-04-29 Thread GitBox


AmplabJenkins removed a comment on pull request #28395:
URL: https://github.com/apache/spark/pull/28395#issuecomment-621221120


   Merged build finished. Test FAILed.



[GitHub] [spark] AmplabJenkins commented on pull request #28395: [SPARK-31549][PYSPARK] Add a develop API invoking collect on Python RDD with user-specified job group

2020-04-29 Thread GitBox


AmplabJenkins commented on pull request #28395:
URL: https://github.com/apache/spark/pull/28395#issuecomment-621221120




[GitHub] [spark] AmplabJenkins removed a comment on pull request #28395: [SPARK-31549][PYSPARK] Add a develop API invoking collect on Python RDD with user-specified job group

2020-04-29 Thread GitBox


AmplabJenkins removed a comment on pull request #28395:
URL: https://github.com/apache/spark/pull/28395#issuecomment-621221135


   Test FAILed.
   Refer to this link for build results (access rights to CI server needed): 
   https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/122059/
   Test FAILed.



[GitHub] [spark] HyukjinKwon commented on pull request #28395: [SPARK-31549][PYSPARK] Add a develop API invoking collect on Python RDD with user-specified job group

2020-04-29 Thread GitBox


HyukjinKwon commented on pull request #28395:
URL: https://github.com/apache/spark/pull/28395#issuecomment-621230128


   @WeichenXu123 the test failure seems legitimate.



[GitHub] [spark] SparkQA commented on pull request #28390: [SPARK-27340][SS][TESTS][FOLLOW-UP] Rephrase API comments and simplify tests

2020-04-29 Thread GitBox


SparkQA commented on pull request #28390:
URL: https://github.com/apache/spark/pull/28390#issuecomment-621238887


   **[Test build #122067 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/122067/testReport)**
 for PR 28390 at commit 
[`1101d05`](https://github.com/apache/spark/commit/1101d05c63d844ce8e99aa55e7ef5d02e4d5f3e1).



[GitHub] [spark] AmplabJenkins removed a comment on pull request #28390: [SPARK-27340][SS][TESTS][FOLLOW-UP] Rephrase API comments and simplify tests

2020-04-29 Thread GitBox


AmplabJenkins removed a comment on pull request #28390:
URL: https://github.com/apache/spark/pull/28390#issuecomment-621240024




[GitHub] [spark] AmplabJenkins commented on pull request #28390: [SPARK-27340][SS][TESTS][FOLLOW-UP] Rephrase API comments and simplify tests

2020-04-29 Thread GitBox


AmplabJenkins commented on pull request #28390:
URL: https://github.com/apache/spark/pull/28390#issuecomment-621240024




[GitHub] [spark] srowen commented on a change in pull request #28404: [SPARK-31603][ML]AFT uses common functions in RDDLossFunction

2020-04-29 Thread GitBox


srowen commented on a change in pull request #28404:
URL: https://github.com/apache/spark/pull/28404#discussion_r417350012



##
File path: mllib/src/main/scala/org/apache/spark/ml/optim/aggregator/AFTAggregator.scala
##
@@ -0,0 +1,163 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements.  See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License.  You may obtain a copy of the License at
+ *
+ *http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.spark.ml.optim.aggregator
+
+import org.apache.spark.broadcast.Broadcast
+import org.apache.spark.ml.linalg._
+import org.apache.spark.ml.regression.AFTPoint
+
+/**
+ * AFTAggregator computes the gradient and loss for an AFT loss function,
+ * as used in AFT survival regression, for samples in sparse or dense vectors in an online fashion.
+ *
+ * The loss function and likelihood function under the AFT model are based on:
+ * Lawless, J. F., Statistical Models and Methods for Lifetime Data,
+ * New York: John Wiley & Sons, Inc. 2003.
+ *
+ * Two AFTAggregators can be merged together to obtain a summary of the loss and gradient of
+ * the corresponding joint dataset.
+ *
+ * Given the values of the covariates $x^{'}$, for the random lifetime $t_{i}$ of subjects $i = 1, \ldots, n$,
+ * with possible right-censoring, the likelihood function under the AFT model is given as
+ *
+ * 
+ *$$
+ *L(\beta,\sigma)=\prod_{i=1}^n[\frac{1}{\sigma}f_{0}
+ *  (\frac{\log{t_{i}}-x^{'}\beta}{\sigma})]^{\delta_{i}}S_{0}
+ *(\frac{\log{t_{i}}-x^{'}\beta}{\sigma})^{1-\delta_{i}}
+ *$$
+ * 
+ *
+ * where $\delta_{i}$ indicates whether the event has occurred, i.e. whether the observation is uncensored.
+ * Using $\epsilon_{i}=\frac{\log{t_{i}}-x^{'}\beta}{\sigma}$, the log-likelihood function
+ * assumes the form
+ *
+ * 
+ *$$
+ *\iota(\beta,\sigma)=\sum_{i=1}^{n}[-\delta_{i}\log\sigma+
+ *\delta_{i}\log{f_{0}}(\epsilon_{i})+(1-\delta_{i})\log{S_{0}(\epsilon_{i})}]
+ *$$
+ * 
+ * where $S_{0}(\epsilon_{i})$ is the baseline survivor function,
+ * and $f_{0}(\epsilon_{i})$ is the corresponding density function.
+ *
+ * The most commonly used log-linear survival regression method is based on the Weibull
+ * distribution of the survival time. The Weibull distribution for the lifetime corresponds
+ * to the extreme value distribution for the log of the lifetime,
+ * and the $S_{0}(\epsilon)$ function is
+ *
+ * 
+ *$$
+ *S_{0}(\epsilon_{i})=\exp(-e^{\epsilon_{i}})
+ *$$
+ * 
+ *
+ * and the $f_{0}(\epsilon_{i})$ function is
+ *
+ * 
+ *$$
+ *f_{0}(\epsilon_{i})=e^{\epsilon_{i}}\exp(-e^{\epsilon_{i}})
+ *$$
+ * 
+ *
+ * The log-likelihood function for Weibull distribution of lifetime is
+ *
+ * 
+ *$$
+ *\iota(\beta,\sigma)=
+ *-\sum_{i=1}^n[\delta_{i}\log\sigma-\delta_{i}\epsilon_{i}+e^{\epsilon_{i}}]
+ *$$
+ * 
+ *
+ * Since minimizing the negative log-likelihood is equivalent to maximizing the likelihood,
+ * the loss function we use to optimize is $-\iota(\beta,\sigma)$.
+ * The gradient functions for $\beta$ and $\log\sigma$ respectively are
+ *
+ * 
+ *$$
+ *\frac{\partial (-\iota)}{\partial \beta}=
+ *\sum_{i=1}^{n}[\delta_{i}-e^{\epsilon_{i}}]\frac{x_{i}}{\sigma} \\
+ *
+ *\frac{\partial (-\iota)}{\partial (\log\sigma)}=
+ *\sum_{i=1}^{n}[\delta_{i}+(\delta_{i}-e^{\epsilon_{i}})\epsilon_{i}]
+ *$$
+ * 
+ *
+ * @param bcCoefficients The broadcast value includes three parts: the log of the scale parameter,
+ *   the intercept, and the regression coefficients corresponding to the features.
+ * @param fitIntercept Whether to fit an intercept term.
+ * @param bcFeaturesStd The broadcast standard deviation values of the features.
+ */
+
+private[ml] class AFTAggregator(
+    bcFeaturesStd: Broadcast[Array[Double]],
+    fitIntercept: Boolean)(bcCoefficients: Broadcast[Vector])
+  extends DifferentiableLossAggregator[AFTPoint, AFTAggregator] {

Review comment:
   So the win here is reusing common code in this superclass?
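Whatever the answer on reusing the superclass, the per-sample math in the scaladoc stays the same. Below is a minimal, hypothetical Scala sketch of the Weibull negative log-likelihood and its two gradients exactly as the scaladoc writes them. It keeps running sums that `merge` the way two aggregators over disjoint data would combine. `WeibullAftSketch`, `Partial`, `single`, and `total` are illustrative names, not the PR's code. The sketch assumes no feature standardization and no broadcast variables (both of which the real `AFTAggregator` takes), and it models the intercept as a leading `1.0` covariate.

```scala
// Hypothetical sketch of the Weibull AFT loss -iota and its gradients from the scaladoc.
// Not Spark's AFTAggregator: no feature scaling, no broadcasts, intercept as a 1.0 covariate.
object WeibullAftSketch {

  // Running sums of loss and gradient over some samples; merging two partial sums
  // gives the sums for the union of their samples.
  final case class Partial(loss: Double, gradBeta: Vector[Double], gradLogSigma: Double, count: Long) {
    def merge(other: Partial): Partial = Partial(
      loss + other.loss,
      gradBeta.zip(other.gradBeta).map { case (a, b) => a + b },
      gradLogSigma + other.gradLogSigma,
      count + other.count)
  }

  // One observation: lifetime t > 0, covariates x, delta = 1.0 if the event was observed
  // (uncensored) and 0.0 if right-censored, coefficients beta, and log of the scale sigma.
  def single(t: Double, x: Vector[Double], delta: Double, beta: Vector[Double], logSigma: Double): Partial = {
    require(t > 0.0, "lifetime must be positive")
    require(x.length == beta.length, "x and beta must have the same length")
    val sigma = math.exp(logSigma)
    val margin = x.zip(beta).map { case (xj, bj) => xj * bj }.sum // x' beta
    val eps = (math.log(t) - margin) / sigma                      // epsilon_i
    val expEps = math.exp(eps)
    // -iota_i = delta * log(sigma) - delta * eps + e^eps
    val loss = delta * logSigma - delta * eps + expEps
    // d(-iota_i)/d(beta) = (delta - e^eps) * x / sigma
    val gradBeta = x.map(xj => (delta - expEps) * xj / sigma)
    // d(-iota_i)/d(log sigma) = delta + (delta - e^eps) * eps
    val gradLogSigma = delta + (delta - expEps) * eps
    Partial(loss, gradBeta, gradLogSigma, 1L)
  }

  // Sum over a dataset of (lifetime, covariates, censor indicator) triples.
  def total(data: Seq[(Double, Vector[Double], Double)], beta: Vector[Double], logSigma: Double): Partial = {
    val zero = Partial(0.0, Vector.fill(beta.length)(0.0), 0.0, 0L)
    data.foldLeft(zero) { case (acc, (t, x, delta)) => acc.merge(single(t, x, delta, beta, logSigma)) }
  }

  def main(args: Array[String]): Unit = {
    // Toy data: covariates start with 1.0 to act as the intercept.
    val data = Seq(
      (2.0, Vector(1.0, 0.5), 1.0),
      (5.0, Vector(1.0, -1.0), 0.0),
      (1.2, Vector(1.0, 2.0), 1.0))
    val beta = Vector(0.3, -0.2)
    val logSigma = 0.1
    val h = 1e-6
    val p = total(data, beta, logSigma)
    // Central differences of the summed loss, to compare against the analytic gradients.
    val numLogSigma = (total(data, beta, logSigma + h).loss - total(data, beta, logSigma - h).loss) / (2 * h)
    val numBeta = beta.indices.map { j =>
      val up = total(data, beta.updated(j, beta(j) + h), logSigma).loss
      val down = total(data, beta.updated(j, beta(j) - h), logSigma).loss
      (up - down) / (2 * h)
    }
    println(s"loss = ${p.loss}")
    println(s"d/dlogSigma analytic = ${p.gradLogSigma}, numeric = $numLogSigma")
    println(s"d/dbeta analytic = ${p.gradBeta}, numeric = $numBeta")
  }
}
```

Comparing the analytic gradients with the central differences printed by `main` is a cheap way to catch sign slips. Note that $\partial\epsilon_i/\partial(\log\sigma) = -\epsilon_i$, which is where the $(\delta_i - e^{\epsilon_i})\epsilon_i$ term comes from.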


