[GitHub] [spark] imback82 commented on a change in pull request #27095: [SPARK-30214][SQL] V2 commands resolves namespaces with new resolution framework

2020-01-04 Thread GitBox
imback82 commented on a change in pull request #27095: [SPARK-30214][SQL] V2 
commands resolves namespaces with new resolution framework
URL: https://github.com/apache/spark/pull/27095#discussion_r363073621
 
 

 ##
 File path: 
sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/Analyzer.scala
 ##
 @@ -730,6 +730,8 @@ class Analyzer(
   case class ResolveNamespace(catalogManager: CatalogManager)
 extends Rule[LogicalPlan] with LookupCatalog {
 def apply(plan: LogicalPlan): LogicalPlan = plan resolveOperators {
+  case UnresolvedNamespace(Seq()) =>
+ResolvedNamespace(currentCatalog.asNamespaceCatalog, Seq.empty[String])
 
 Review comment:
   The conflict here is that `SHOW NAMESPACES` treats `None` as `Nil`, but 
`SHOW TABLES` treats `None` as the current namespace, which causes the ambiguity. 
To make `SHOW TABLES` work with the `Nil` approach (the current one), I have to do 
the following:
   ```scala
   case ShowTablesStatement(NonSessionCatalogAndNamespace(catalog, ns), pattern) =>
     val namespace = if (ns.isEmpty && currentCatalog.name.equals(catalog.name)) {
       catalogManager.currentNamespace.toSeq
     } else {
       ns
     }
     ShowTables(catalog.asTableCatalog, namespace, pattern)
   ```
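
The two defaulting rules described in this comment can be modelled without any Spark dependency. The following is only a minimal sketch, not Spark code: `NamespaceDefaults`, `Session`, `showNamespacesTarget`, and `showTablesTarget` are hypothetical names introduced for illustration.

```scala
// Hypothetical, Spark-free model of the two defaulting rules; none of these
// names are Spark APIs.
object NamespaceDefaults {
  // Stand-in for the session state held by a catalog manager.
  final case class Session(currentCatalog: String, currentNamespace: Seq[String])

  // SHOW NAMESPACES: an omitted namespace means the catalog root (Nil).
  def showNamespacesTarget(ns: Option[Seq[String]]): Seq[String] =
    ns.getOrElse(Nil)

  // SHOW TABLES: an omitted namespace means the session's current namespace,
  // but only when the statement targets the current catalog.
  def showTablesTarget(session: Session, catalog: String, ns: Seq[String]): Seq[String] =
    if (ns.isEmpty && catalog == session.currentCatalog) session.currentNamespace else ns
}
```

Once both statements receive an empty `Seq`, the "omitted" case can no longer be told apart from "root namespace", which is why the extra check against the current catalog appears in the `SHOW TABLES` branch.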


This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


With regards,
Apache Git Services

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] [spark] viirya commented on issue #26437: [SPARK-29800][SQL] Rewrite non-correlated subquery use ScalaSubquery to optimize perf

2020-01-04 Thread GitBox
viirya commented on issue #26437: [SPARK-29800][SQL] Rewrite non-correlated 
subquery use ScalaSubquery to optimize perf
URL: https://github.com/apache/spark/pull/26437#issuecomment-570876682
 
 
   Is the second SQL query wrong (`COL1 > 1` -> `COL1 > 10`) in the PR 
description?





[GitHub] [spark] SparkQA removed a comment on issue #26437: [SPARK-29800][SQL] Rewrite non-correlated subquery use ScalaSubquery to optimize perf

2020-01-04 Thread GitBox
SparkQA removed a comment on issue #26437: [SPARK-29800][SQL] Rewrite 
non-correlated subquery use ScalaSubquery to optimize perf
URL: https://github.com/apache/spark/pull/26437#issuecomment-570866651
 
 
   **[Test build #4985 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/NewSparkPullRequestBuilder/4985/testReport)**
 for PR 26437 at commit 
[`8c6060a`](https://github.com/apache/spark/commit/8c6060a1a395c81cbd08d0afc25490b533493b69).





[GitHub] [spark] SparkQA commented on issue #26437: [SPARK-29800][SQL] Rewrite non-correlated subquery use ScalaSubquery to optimize perf

2020-01-04 Thread GitBox
SparkQA commented on issue #26437: [SPARK-29800][SQL] Rewrite non-correlated 
subquery use ScalaSubquery to optimize perf
URL: https://github.com/apache/spark/pull/26437#issuecomment-570873680
 
 
   **[Test build #4985 has 
finished](https://amplab.cs.berkeley.edu/jenkins/job/NewSparkPullRequestBuilder/4985/testReport)**
 for PR 26437 at commit 
[`8c6060a`](https://github.com/apache/spark/commit/8c6060a1a395c81cbd08d0afc25490b533493b69).
* This patch **fails Spark unit tests**.
* This patch **does not merge cleanly**.
* This patch adds no public classes.





[GitHub] [spark] cloud-fan commented on a change in pull request #27095: [SPARK-30214][SQL] V2 commands resolves namespaces with new resolution framework

2020-01-04 Thread GitBox
cloud-fan commented on a change in pull request #27095: [SPARK-30214][SQL] V2 
commands resolves namespaces with new resolution framework
URL: https://github.com/apache/spark/pull/27095#discussion_r363071473
 
 

 ##
 File path: 
sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/Analyzer.scala
 ##
 @@ -730,6 +730,8 @@ class Analyzer(
   case class ResolveNamespace(catalogManager: CatalogManager)
 extends Rule[LogicalPlan] with LookupCatalog {
 def apply(plan: LogicalPlan): LogicalPlan = plan resolveOperators {
+  case UnresolvedNamespace(Seq()) =>
+ResolvedNamespace(currentCatalog.asNamespaceCatalog, Seq.empty[String])
 
 Review comment:
   `CatalogAndNamespace` doesn't look up the namespace; it looks up the 
catalog. I think it can handle `Nil` by resolving the catalog to the current 
catalog and returning `Nil` as the namespace identifier.
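
A rough sketch of the behaviour suggested here, written as a standalone extractor. This is an assumption-laden illustration: the real `CatalogAndNamespace` lives in Spark's `LookupCatalog` and may differ, and `CatalogAndNamespaceSketch`, `registeredCatalogs`, and the catalog names below are hypothetical.

```scala
// Hypothetical stand-in for a catalog-resolving extractor; it is not the
// Spark implementation.
object CatalogAndNamespaceSketch {
  // Assumed catalog registry and current catalog, for illustration only.
  val registeredCatalogs: Set[String] = Set("testcat")
  val currentCatalog: String = "spark_catalog"

  // Returns (catalog name, remaining namespace parts). The namespace itself
  // is never looked up here; only the catalog is resolved.
  def unapply(nameParts: Seq[String]): Option[(String, Seq[String])] = nameParts match {
    // Empty name: fall back to the current catalog with an empty namespace.
    case Seq() => Some((currentCatalog, Nil))
    // First part names a registered catalog: the rest is the namespace.
    case head +: tail if registeredCatalogs.contains(head) => Some((head, tail))
    // Otherwise every part belongs to a namespace in the current catalog.
    case parts => Some((currentCatalog, parts))
  }
}
```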





[GitHub] [spark] cloud-fan commented on a change in pull request #26933: [SPARK-30292][SQL]Throw Exception when invalid string is cast to numeric type in ANSI mode

2020-01-04 Thread GitBox
cloud-fan commented on a change in pull request #26933: [SPARK-30292][SQL]Throw 
Exception when invalid string is cast to numeric type in ANSI mode
URL: https://github.com/apache/spark/pull/26933#discussion_r363071095
 
 

 ##
 File path: 
sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/Cast.scala
 ##
 @@ -482,6 +482,15 @@ abstract class CastBase extends UnaryExpression with 
TimeZoneAwareExpression wit
 
   // LongConverter
   private[this] def castToLong(from: DataType): Any => Any = from match {
+case StringType if ansiEnabled =>
 
 Review comment:
   I'd like to see something like
   ```
   case StringType if ansiEnabled =>
     buildCast[UTF8String](_, _.toLongExact())
   case StringType =>
     val result = new LongWrapper()
     buildCast[UTF8String](_, s => if (s.toLong(result)) result.value else null)
   ```
   and in codegen
   ```
   val casting = if (ansi) {
     s"$evPrim = $c.toLongExact();"
   } else {
     s"""
       if ($c.toLong($wrapper)) {
         $evPrim = $wrapper.value;
       } else {
         $evNull = true;
       }
     """
   }
   code"""
     UTF8String.LongWrapper $wrapper = new UTF8String.LongWrapper();
     $casting
     $wrapper = null;
   """
   ```





[GitHub] [spark] cloud-fan commented on a change in pull request #26933: [SPARK-30292][SQL]Throw Exception when invalid string is cast to numeric type in ANSI mode

2020-01-04 Thread GitBox
cloud-fan commented on a change in pull request #26933: [SPARK-30292][SQL]Throw 
Exception when invalid string is cast to numeric type in ANSI mode
URL: https://github.com/apache/spark/pull/26933#discussion_r363071095
 
 

 ##
 File path: 
sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/Cast.scala
 ##
 @@ -482,6 +482,15 @@ abstract class CastBase extends UnaryExpression with 
TimeZoneAwareExpression wit
 
   // LongConverter
   private[this] def castToLong(from: DataType): Any => Any = from match {
+case StringType if ansiEnabled =>
 
 Review comment:
   I'd like to see something like
   ```
   case StringType if ansiEnabled =>
     buildCast[UTF8String](_, _.toLongExact())
   case StringType =>
     val result = new LongWrapper()
     buildCast[UTF8String](_, s => if (s.toLong(result)) result.value else null)
   ```
   and in codegen
   ```
   val casting = if (ansi) {
     s"$evPrim = $c.toIntExact();"
   } else {
     s"""
       if ($c.toInt($wrapper)) {
         $evPrim = $wrapper.value;
       } else {
         $evNull = true;
       }
     """
   }
   code"""
     UTF8String.IntWrapper $wrapper = new UTF8String.IntWrapper();
     $casting
     $wrapper = null;
   """
   ```





[GitHub] [spark] SparkQA commented on issue #27078: [SPARK-30409][SQL][TESTS] Use `NoOp` datasource in SQL benchmarks

2020-01-04 Thread GitBox
SparkQA commented on issue #27078: [SPARK-30409][SQL][TESTS] Use `NoOp` 
datasource in SQL benchmarks
URL: https://github.com/apache/spark/pull/27078#issuecomment-570868214
 
 
   **[Test build #116117 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/116117/testReport)**
 for PR 27078 at commit 
[`d6e519a`](https://github.com/apache/spark/commit/d6e519aa09330cf5688e1013fbbfa93a76c68abe).





[GitHub] [spark] cloud-fan commented on a change in pull request #26933: [SPARK-30292][SQL]Throw Exception when invalid string is cast to numeric type in ANSI mode

2020-01-04 Thread GitBox
cloud-fan commented on a change in pull request #26933: [SPARK-30292][SQL]Throw 
Exception when invalid string is cast to numeric type in ANSI mode
URL: https://github.com/apache/spark/pull/26933#discussion_r363071095
 
 

 ##
 File path: 
sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/Cast.scala
 ##
 @@ -482,6 +482,15 @@ abstract class CastBase extends UnaryExpression with 
TimeZoneAwareExpression wit
 
   // LongConverter
   private[this] def castToLong(from: DataType): Any => Any = from match {
+case StringType if ansiEnabled =>
 
 Review comment:
   I'd like to see something like
   ```
   case StringType if ansiEnabled =>
     val result = new LongWrapper()
     buildCast[UTF8String](_, s => {
       s.toLongExact(result)
       result.value
     })
   case StringType =>
     val result = new LongWrapper()
     buildCast[UTF8String](_, s => if (s.toLong(result)) result.value else null)
   ```
   and in codegen
   ```
   val method = if (ansi) "toIntExact" else "toInt"
   code"""
     UTF8String.IntWrapper $wrapper = new UTF8String.IntWrapper();
     if ($c.$method($wrapper)) {
       $evPrim = $wrapper.value;
     } else {
       $evNull = true;
     }
     $wrapper = null;
   """
   ```
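
The difference between the two branches can be illustrated in plain Scala. This is a sketch under stated assumptions, not the Spark implementation: it uses `String` instead of `UTF8String`, `Option` instead of a nullable result, and the names `StringToLongSketch`, `lenient`, and `exact` are hypothetical.

```scala
import scala.util.Try

// Plain-Scala illustration of the two cast behaviours; it does not use
// Spark's UTF8String, and both method names are hypothetical.
object StringToLongSketch {
  // Non-ANSI behaviour: an invalid string yields no value (null in Spark).
  def lenient(s: String): Option[Long] = Try(s.toLong).toOption

  // ANSI behaviour: an invalid string raises an error instead of yielding null.
  def exact(s: String): Long =
    try s.toLong
    catch {
      case _: NumberFormatException =>
        throw new NumberFormatException(s"invalid input for a long value: $s")
    }
}
```

For a valid input such as `"42"` both variants agree; they differ only in how an unparsable string is reported.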





[GitHub] [spark] AmplabJenkins removed a comment on issue #27078: [SPARK-30409][SQL][TESTS] Use `NoOp` datasource in SQL benchmarks

2020-01-04 Thread GitBox
AmplabJenkins removed a comment on issue #27078: [SPARK-30409][SQL][TESTS] Use 
`NoOp` datasource in SQL benchmarks
URL: https://github.com/apache/spark/pull/27078#issuecomment-570867562
 
 
   Merged build finished. Test PASSed.





[GitHub] [spark] AmplabJenkins removed a comment on issue #27078: [SPARK-30409][SQL][TESTS] Use `NoOp` datasource in SQL benchmarks

2020-01-04 Thread GitBox
AmplabJenkins removed a comment on issue #27078: [SPARK-30409][SQL][TESTS] Use 
`NoOp` datasource in SQL benchmarks
URL: https://github.com/apache/spark/pull/27078#issuecomment-570867568
 
 
   Test PASSed.
   Refer to this link for build results (access rights to CI server needed): 
   
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder-K8s/20909/
   Test PASSed.





[GitHub] [spark] AmplabJenkins commented on issue #27078: [SPARK-30409][SQL][TESTS] Use `NoOp` datasource in SQL benchmarks

2020-01-04 Thread GitBox
AmplabJenkins commented on issue #27078: [SPARK-30409][SQL][TESTS] Use `NoOp` 
datasource in SQL benchmarks
URL: https://github.com/apache/spark/pull/27078#issuecomment-570867568
 
 
   Test PASSed.
   Refer to this link for build results (access rights to CI server needed): 
   
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder-K8s/20909/
   Test PASSed.





[GitHub] [spark] AmplabJenkins commented on issue #27078: [SPARK-30409][SQL][TESTS] Use `NoOp` datasource in SQL benchmarks

2020-01-04 Thread GitBox
AmplabJenkins commented on issue #27078: [SPARK-30409][SQL][TESTS] Use `NoOp` 
datasource in SQL benchmarks
URL: https://github.com/apache/spark/pull/27078#issuecomment-570867562
 
 
   Merged build finished. Test PASSed.





[GitHub] [spark] wangyum closed pull request #22721: [SPARK-19784][SPARK-25403][SQL] Refresh the table even table stats is empty

2020-01-04 Thread GitBox
wangyum closed pull request #22721: [SPARK-19784][SPARK-25403][SQL] Refresh the 
table even table stats is empty
URL: https://github.com/apache/spark/pull/22721
 
 
   





[GitHub] [spark] SparkQA commented on issue #26437: [SPARK-29800][SQL] Rewrite non-correlated subquery use ScalaSubquery to optimize perf

2020-01-04 Thread GitBox
SparkQA commented on issue #26437: [SPARK-29800][SQL] Rewrite non-correlated 
subquery use ScalaSubquery to optimize perf
URL: https://github.com/apache/spark/pull/26437#issuecomment-570866651
 
 
   **[Test build #4985 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/NewSparkPullRequestBuilder/4985/testReport)**
 for PR 26437 at commit 
[`8c6060a`](https://github.com/apache/spark/commit/8c6060a1a395c81cbd08d0afc25490b533493b69).





[GitHub] [spark] AmplabJenkins removed a comment on issue #27093: [SPARK-30418][ML] Make FM call super class method extractLabeledPoints

2020-01-04 Thread GitBox
AmplabJenkins removed a comment on issue #27093: [SPARK-30418][ML] Make FM call 
super class method extractLabeledPoints
URL: https://github.com/apache/spark/pull/27093#issuecomment-570859143
 
 
   Test PASSed.
   Refer to this link for build results (access rights to CI server needed): 
   https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/116116/
   Test PASSed.





[GitHub] [spark] AmplabJenkins removed a comment on issue #27093: [SPARK-30418][ML] Make FM call super class method extractLabeledPoints

2020-01-04 Thread GitBox
AmplabJenkins removed a comment on issue #27093: [SPARK-30418][ML] Make FM call 
super class method extractLabeledPoints
URL: https://github.com/apache/spark/pull/27093#issuecomment-570859142
 
 
   Merged build finished. Test PASSed.





[GitHub] [spark] AmplabJenkins commented on issue #27093: [SPARK-30418][ML] Make FM call super class method extractLabeledPoints

2020-01-04 Thread GitBox
AmplabJenkins commented on issue #27093: [SPARK-30418][ML] Make FM call super 
class method extractLabeledPoints
URL: https://github.com/apache/spark/pull/27093#issuecomment-570859142
 
 
   Merged build finished. Test PASSed.





[GitHub] [spark] AmplabJenkins commented on issue #27093: [SPARK-30418][ML] Make FM call super class method extractLabeledPoints

2020-01-04 Thread GitBox
AmplabJenkins commented on issue #27093: [SPARK-30418][ML] Make FM call super 
class method extractLabeledPoints
URL: https://github.com/apache/spark/pull/27093#issuecomment-570859143
 
 
   Test PASSed.
   Refer to this link for build results (access rights to CI server needed): 
   https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/116116/
   Test PASSed.





[GitHub] [spark] SparkQA removed a comment on issue #27093: [SPARK-30418][ML] Make FM call super class method extractLabeledPoints

2020-01-04 Thread GitBox
SparkQA removed a comment on issue #27093: [SPARK-30418][ML] Make FM call super 
class method extractLabeledPoints
URL: https://github.com/apache/spark/pull/27093#issuecomment-570849671
 
 
   **[Test build #116116 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/116116/testReport)**
 for PR 27093 at commit 
[`b0c01c4`](https://github.com/apache/spark/commit/b0c01c4a219fe27104977501545e1829394a9d7a).





[GitHub] [spark] SparkQA commented on issue #27093: [SPARK-30418][ML] Make FM call super class method extractLabeledPoints

2020-01-04 Thread GitBox
SparkQA commented on issue #27093: [SPARK-30418][ML] Make FM call super class 
method extractLabeledPoints
URL: https://github.com/apache/spark/pull/27093#issuecomment-570859057
 
 
   **[Test build #116116 has 
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/116116/testReport)**
 for PR 27093 at commit 
[`b0c01c4`](https://github.com/apache/spark/commit/b0c01c4a219fe27104977501545e1829394a9d7a).
* This patch passes all tests.
* This patch merges cleanly.
* This patch adds no public classes.





[GitHub] [spark] wzhfy commented on a change in pull request #27079: [SPARK-30410][SQL] Calculating size of table with large number of partitions causes flooding logs

2020-01-04 Thread GitBox
wzhfy commented on a change in pull request #27079: [SPARK-30410][SQL] 
Calculating size of table with large number of partitions causes flooding logs
URL: https://github.com/apache/spark/pull/27079#discussion_r363069147
 
 

 ##
 File path: 
sql/core/src/main/scala/org/apache/spark/sql/execution/command/CommandUtils.scala
 ##
 @@ -75,6 +77,10 @@ object CommandUtils extends Logging {
 }.sum
   }
 }
+val partInfo = if (partitions.nonEmpty) s" with ${partitions.length} 
partitions" else ""
+logInfo(s"It took ${(System.nanoTime() - startTime) / (1000 * 1000)} ms to 
calculate" +
 
 Review comment:
   @maropu @srowen Maybe I could change back to the initial version, which 
prints a log with partition info in the branch for partitioned tables?
   ```
   logInfo(s"Starting to calculate sizes for ${partitions.length} partitions.")
   ```
   This way we keep the "partitioned table" logic only in that branch. Then 
the final log applies to both non-partitioned and partitioned tables.
   ```
   logInfo(s"It took ${(System.nanoTime() - startTime) / (1000 * 1000)} ms to calculate" +
     s" the total size for table ${catalogTable.identifier}.")
   ```
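
Put together, the proposed layout could look roughly like the sketch below. It is Spark-free and hypothetical: `SizeLoggingSketch`, `totalSize`, and `unpartitionedSize` are illustrative names, `log` stands in for `logInfo`, and sizes are passed in directly instead of being computed from storage locations.

```scala
// Hypothetical, Spark-free sketch of the proposed logging layout. `log`
// stands in for `logInfo`; sizes are passed in instead of read from storage.
object SizeLoggingSketch {
  def totalSize(
      table: String,
      unpartitionedSize: Long,
      partitionSizes: Seq[Long],
      log: String => Unit): Long = {
    val startTime = System.nanoTime()
    val total = if (partitionSizes.isEmpty) {
      unpartitionedSize
    } else {
      // The partition-specific message stays inside the partitioned branch.
      log(s"Starting to calculate sizes for ${partitionSizes.length} partitions.")
      partitionSizes.sum
    }
    // One final message shared by both branches.
    log(s"It took ${(System.nanoTime() - startTime) / (1000 * 1000)} ms to calculate" +
      s" the total size for table $table.")
    total
  }
}
```

With this shape a partitioned table produces two messages regardless of the partition count, and a non-partitioned table produces one.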





[GitHub] [spark] wzhfy commented on a change in pull request #27079: [SPARK-30410][SQL] Calculating size of table with large number of partitions causes flooding logs

2020-01-04 Thread GitBox
wzhfy commented on a change in pull request #27079: [SPARK-30410][SQL] 
Calculating size of table with large number of partitions causes flooding logs
URL: https://github.com/apache/spark/pull/27079#discussion_r363068727
 
 

 ##
 File path: 
sql/core/src/main/scala/org/apache/spark/sql/execution/command/CommandUtils.scala
 ##
 @@ -75,6 +77,10 @@ object CommandUtils extends Logging {
 }.sum
   }
 }
+val partInfo = if (partitions.nonEmpty) s" with ${partitions.length} 
partitions" else ""
+logInfo(s"It took ${(System.nanoTime() - startTime) / (1000 * 1000)} ms to 
calculate" +
 
 Review comment:
   If I put two different logs in the branches, I would need to have two 
`totalSize` values in the two branches and return them after the logs. Besides, 
most of the two log messages would still be the same (except for the partition 
info). So the code may look redundant that way. What do you think?





[GitHub] [spark] wzhfy commented on a change in pull request #27079: [SPARK-30410][SQL] Calculating size of table with large number of partitions causes flooding logs

2020-01-04 Thread GitBox
wzhfy commented on a change in pull request #27079: [SPARK-30410][SQL] 
Calculating size of table with large number of partitions causes flooding logs
URL: https://github.com/apache/spark/pull/27079#discussion_r363068298
 
 

 ##
 File path: 
sql/core/src/main/scala/org/apache/spark/sql/execution/command/CommandUtils.scala
 ##
 @@ -124,8 +128,8 @@ object CommandUtils extends Logging {
   0L
   }
 }.getOrElse(0L)
-val durationInMs = (System.nanoTime() - startTime) / (1000 * 1000)
-logInfo(s"It took $durationInMs ms to calculate the total file size under 
path $locationUri.")
+val durationInMs = (System.nanoTime() - startTime) / 1e6
+logDebug(s"It took $durationInMs ms to calculate the total file size under 
path $locationUri.")
 
 Review comment:
   @srowen Yes, it could be called in the "else" branch above, and one log 
per partition would be too much if the number of partitions is very large.
   ```
   partitions.map { p =>
     calculateLocationSize(sessionState, catalogTable.identifier, p.storage.locationUri)
   }.sum
   ```
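
As a side note on the hunk quoted above, replacing the divisor `(1000 * 1000)` with `1e6` also changes the type of `durationInMs` from `Long` to `Double`, so the logged value gains a fractional part. A small illustration with a fixed elapsed time follows; `DurationSketch` and its method names are hypothetical.

```scala
// Illustration with a fixed elapsed time instead of System.nanoTime().
object DurationSketch {
  // Long divided by Int stays a Long and truncates toward zero.
  def millisTruncated(elapsedNanos: Long): Long = elapsedNanos / (1000 * 1000)

  // Dividing by the Double literal 1e6 yields a fractional Double.
  def millisFractional(elapsedNanos: Long): Double = elapsedNanos / 1e6
}
```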





[GitHub] [spark] AmplabJenkins commented on issue #27093: [SPARK-30418][ML] Make FM call super class method extractLabeledPoints

2020-01-04 Thread GitBox
AmplabJenkins commented on issue #27093: [SPARK-30418][ML] Make FM call super 
class method extractLabeledPoints
URL: https://github.com/apache/spark/pull/27093#issuecomment-570849870
 
 
   Test PASSed.
   Refer to this link for build results (access rights to CI server needed): 
   
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder-K8s/20908/
   Test PASSed.





[GitHub] [spark] AmplabJenkins removed a comment on issue #27093: [SPARK-30418][ML] Make FM call super class method extractLabeledPoints

2020-01-04 Thread GitBox
AmplabJenkins removed a comment on issue #27093: [SPARK-30418][ML] Make FM call 
super class method extractLabeledPoints
URL: https://github.com/apache/spark/pull/27093#issuecomment-570849865
 
 
   Merged build finished. Test PASSed.





[GitHub] [spark] AmplabJenkins removed a comment on issue #27093: [SPARK-30418][ML] Make FM call super class method extractLabeledPoints

2020-01-04 Thread GitBox
AmplabJenkins removed a comment on issue #27093: [SPARK-30418][ML] Make FM call 
super class method extractLabeledPoints
URL: https://github.com/apache/spark/pull/27093#issuecomment-570849870
 
 
   Test PASSed.
   Refer to this link for build results (access rights to CI server needed): 
   
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder-K8s/20908/
   Test PASSed.





[GitHub] [spark] AmplabJenkins commented on issue #27093: [SPARK-30418][ML] Make FM call super class method extractLabeledPoints

2020-01-04 Thread GitBox
AmplabJenkins commented on issue #27093: [SPARK-30418][ML] Make FM call super 
class method extractLabeledPoints
URL: https://github.com/apache/spark/pull/27093#issuecomment-570849865
 
 
   Merged build finished. Test PASSed.





[GitHub] [spark] SparkQA commented on issue #27093: [SPARK-30418][ML] Make FM call super class method extractLabeledPoints

2020-01-04 Thread GitBox
SparkQA commented on issue #27093: [SPARK-30418][ML] Make FM call super class 
method extractLabeledPoints
URL: https://github.com/apache/spark/pull/27093#issuecomment-570849671
 
 
   **[Test build #116116 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/116116/testReport)** for PR 27093 at commit [`b0c01c4`](https://github.com/apache/spark/commit/b0c01c4a219fe27104977501545e1829394a9d7a).





[GitHub] [spark] huaxingao commented on a change in pull request #27093: [SPARK-30418][ML] Make FM call super class method extractLabeledPoints

2020-01-04 Thread GitBox
huaxingao commented on a change in pull request #27093: [SPARK-30418][ML] Make 
FM call super class method extractLabeledPoints
URL: https://github.com/apache/spark/pull/27093#discussion_r363067570
 
 

 ##
 File path: 
mllib/src/main/scala/org/apache/spark/ml/classification/FMClassifier.scala
 ##
 @@ -204,14 +204,8 @@ class FMClassifier @Since("3.0.0") (
 instr.logNumFeatures(numFeatures)
 
 val handlePersistence = dataset.storageLevel == StorageLevel.NONE
-val data: RDD[(Double, OldVector)] =
-  dataset.select(col($(labelCol)), col($(featuresCol))).rdd.map {
-case Row(label: Double, features: Vector) =>
-  require(label == 0 || label == 1, s"FMClassifier was given" +
-s" dataset with invalid label $label.  Labels must be in {0,1}; note that" +
-s" FMClassifier currently only supports binary classification.")
-  (label, features)
-  }
+val labeledPoint = extractLabeledPoints (dataset, numClasses)
 
 Review comment:
   Removed. Thanks!





[GitHub] [spark] AmplabJenkins removed a comment on issue #27096: SPARK-28148: repartition after join is not optimized away

2020-01-04 Thread GitBox
AmplabJenkins removed a comment on issue #27096: SPARK-28148: repartition after 
join is not optimized away
URL: https://github.com/apache/spark/pull/27096#issuecomment-570842530
 
 
   Can one of the admins verify this patch?





[GitHub] [spark] AmplabJenkins commented on issue #27096: SPARK-28148: repartition after join is not optimized away

2020-01-04 Thread GitBox
AmplabJenkins commented on issue #27096: SPARK-28148: repartition after join is 
not optimized away
URL: https://github.com/apache/spark/pull/27096#issuecomment-570843855
 
 
   Can one of the admins verify this patch?





[GitHub] [spark] AmplabJenkins commented on issue #27096: SPARK-28148: repartition after join is not optimized away

2020-01-04 Thread GitBox
AmplabJenkins commented on issue #27096: SPARK-28148: repartition after join is 
not optimized away
URL: https://github.com/apache/spark/pull/27096#issuecomment-570842530
 
 
   Can one of the admins verify this patch?





[GitHub] [spark] bmarcott opened a new pull request #27096: SPARK-28148: repartition after join is not optimized away

2020-01-04 Thread GitBox
bmarcott opened a new pull request #27096: SPARK-28148: repartition after join 
is not optimized away
URL: https://github.com/apache/spark/pull/27096
 
 
   ### What changes were proposed in this pull request?
   
   The extra shuffle after an inner join was not eliminated because an inner 
join reports a PartitioningCollection as its output partitioning, while the 
existing logic only matched on HashPartitioning.
   
   In addition, nothing in EnsureRequirements eliminated a parent sort 
(within partitions) that is unnecessary when the SortMergeJoin already 
produces the same sort order.
   
   Copied from jira:
   Partitioning & sorting is usually retained after join.
   ```
   spark.conf.set('spark.sql.shuffle.partitions', '42')
   
   df1 = spark.range(500, numPartitions=5)
   df2 = spark.range(1000, numPartitions=5)
   df3 = spark.range(2000, numPartitions=5)
   
   # Reuse previous partitions & sort.
   df1.join(df2, on='id').join(df3, on='id').explain()
   # == Physical Plan ==
   # *(8) Project [id#367L]
   # +- *(8) SortMergeJoin [id#367L], [id#374L], Inner
   #    :- *(5) Project [id#367L]
   #    :  +- *(5) SortMergeJoin [id#367L], [id#369L], Inner
   #    :     :- *(2) Sort [id#367L ASC NULLS FIRST], false, 0
   #    :     :  +- Exchange hashpartitioning(id#367L, 42)
   #    :     :     +- *(1) Range (0, 500, step=1, splits=5)
   #    :     +- *(4) Sort [id#369L ASC NULLS FIRST], false, 0
   #    :        +- Exchange hashpartitioning(id#369L, 42)
   #    :           +- *(3) Range (0, 1000, step=1, splits=5)
   #    +- *(7) Sort [id#374L ASC NULLS FIRST], false, 0
   #       +- Exchange hashpartitioning(id#374L, 42)
   #          +- *(6) Range (0, 2000, step=1, splits=5)
   ```
   
   However, in this case the partitioning persists through the left join, but the sort does not:
   
   ```
   df1.join(df2, on='id', how='left').repartition('id').sortWithinPartitions('id').explain()
   # == Physical Plan ==
   # *(5) Sort [id#367L ASC NULLS FIRST], false, 0
   # +- *(5) Project [id#367L]
   #    +- SortMergeJoin [id#367L], [id#369L], LeftOuter
   #       :- *(2) Sort [id#367L ASC NULLS FIRST], false, 0
   #       :  +- Exchange hashpartitioning(id#367L, 42)
   #       :     +- *(1) Range (0, 500, step=1, splits=5)
   #       +- *(4) Sort [id#369L ASC NULLS FIRST], false, 0
   #          +- Exchange hashpartitioning(id#369L, 42)
   #             +- *(3) Range (0, 1000, step=1, splits=5)
   ```
   And in this case the partitioning does not persist through the inner join:
   
   ```
   df1.join(df2, on='id').repartition('id').sortWithinPartitions('id').explain()
   # == Physical Plan ==
   # *(6) Sort [id#367L ASC NULLS FIRST], false, 0
   # +- Exchange hashpartitioning(id#367L, 42)
   #    +- *(5) Project [id#367L]
   #       +- *(5) SortMergeJoin [id#367L], [id#369L], Inner
   #          :- *(2) Sort [id#367L ASC NULLS FIRST], false, 0
   #          :  +- Exchange hashpartitioning(id#367L, 42)
   #          :     +- *(1) Range (0, 500, step=1, splits=5)
   #          +- *(4) Sort [id#369L ASC NULLS FIRST], false, 0
   #             +- Exchange hashpartitioning(id#369L, 42)
   #                +- *(3) Range (0, 1000, step=1, splits=5)
   ```
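
The PartitioningCollection point in the description can be illustrated with a small toy model. This is only a sketch in plain Python, not Spark's code: the classes `HashPartitioning` and `PartitioningCollection` below are simplified stand-ins that borrow the names from the description, and the two `shuffle_needed*` helpers are hypothetical names invented for this illustration.

```python
from dataclasses import dataclass
from typing import Tuple, Union


@dataclass(frozen=True)
class HashPartitioning:
    """Toy stand-in: rows are hashed on `keys` into `num_partitions` buckets."""
    keys: Tuple[str, ...]
    num_partitions: int


@dataclass(frozen=True)
class PartitioningCollection:
    """Toy stand-in: the output satisfies every partitioning in `members`."""
    members: Tuple["Partitioning", ...]


Partitioning = Union[HashPartitioning, PartitioningCollection]


def shuffle_needed_hash_only(child: Partitioning, wanted: HashPartitioning) -> bool:
    """Only recognises a bare HashPartitioning, so a collection always reshuffles."""
    return not (isinstance(child, HashPartitioning) and child == wanted)


def shuffle_needed(child: Partitioning, wanted: HashPartitioning) -> bool:
    """Also looks inside a PartitioningCollection for a matching member."""
    if isinstance(child, PartitioningCollection):
        return all(shuffle_needed(member, wanted) for member in child.members)
    return child != wanted


# In this toy model an inner join reports both sides' partitionings at once.
left = HashPartitioning(("id_left",), 42)
right = HashPartitioning(("id_right",), 42)
join_output = PartitioningCollection((left, right))

print(shuffle_needed_hash_only(join_output, left))  # True: redundant shuffle kept
print(shuffle_needed(join_output, left))            # False: shuffle can be dropped
```

The first helper mirrors a check that matches on a single partitioning type only, which is why the repartition after the inner join survives in the last plan above; the second shows the kind of recursive check that would let it be dropped.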
   





[GitHub] [spark] srowen closed pull request #26891: Fix issue where `newFilesOnly` does nothing

2020-01-04 Thread GitBox
srowen closed pull request #26891: Fix issue where `newFilesOnly` does nothing
URL: https://github.com/apache/spark/pull/26891
 
 
   





[GitHub] [spark] srowen commented on a change in pull request #27093: [SPARK-30418][ML] Make FM call super class method extractLabeledPoints

2020-01-04 Thread GitBox
srowen commented on a change in pull request #27093: [SPARK-30418][ML] Make FM 
call super class method extractLabeledPoints
URL: https://github.com/apache/spark/pull/27093#discussion_r363062505
 
 

 ##
 File path: 
mllib/src/main/scala/org/apache/spark/ml/classification/FMClassifier.scala
 ##
 @@ -204,14 +204,8 @@ class FMClassifier @Since("3.0.0") (
 instr.logNumFeatures(numFeatures)
 
 val handlePersistence = dataset.storageLevel == StorageLevel.NONE
-val data: RDD[(Double, OldVector)] =
-  dataset.select(col($(labelCol)), col($(featuresCol))).rdd.map {
-case Row(label: Double, features: Vector) =>
-  require(label == 0 || label == 1, s"FMClassifier was given" +
-s" dataset with invalid label $label.  Labels must be in {0,1}; note that" +
-s" FMClassifier currently only supports binary classification.")
-  (label, features)
-  }
+val labeledPoint = extractLabeledPoints (dataset, numClasses)
 
 Review comment:
   Nit: remove extra space before method call





[GitHub] [spark] srowen closed pull request #22758: [SPARK-25332][SQL] select broadcast join instead of sortMergeJoin for the small size table even query fired via new session/context

2020-01-04 Thread GitBox
srowen closed pull request #22758: [SPARK-25332][SQL] select broadcast join 
instead of sortMergeJoin for the small size table even query fired via new 
session/context 
URL: https://github.com/apache/spark/pull/22758
 
 
   





[GitHub] [spark] github-actions[bot] closed pull request #23008: [SPARK-22674][PYTHON] Removed the namedtuple pickling patch

2020-01-04 Thread GitBox
github-actions[bot] closed pull request #23008: [SPARK-22674][PYTHON] Removed 
the namedtuple pickling patch
URL: https://github.com/apache/spark/pull/23008
 
 
   





[GitHub] [spark] github-actions[bot] closed pull request #23104: [SPARK-26138][SQL] Cross join requires push LocalLimit in LimitPushDown rule

2020-01-04 Thread GitBox
github-actions[bot] closed pull request #23104: [SPARK-26138][SQL] Cross join 
requires push LocalLimit in LimitPushDown rule
URL: https://github.com/apache/spark/pull/23104
 
 
   





[GitHub] [spark] github-actions[bot] commented on issue #22758: [SPARK-25332][SQL] select broadcast join instead of sortMergeJoin for the small size table even query fired via new session/context

2020-01-04 Thread GitBox
github-actions[bot] commented on issue #22758: [SPARK-25332][SQL] select 
broadcast join instead of sortMergeJoin for the small size table even query 
fired via new session/context 
URL: https://github.com/apache/spark/pull/22758#issuecomment-570831370
 
 
   We're closing this PR because it hasn't been updated in a while.
   This isn't a judgement on the merit of the PR in any way. It's just
   a way of keeping the PR queue manageable.
   
   If you'd like to revive this PR, please reopen it!





[GitHub] [spark] github-actions[bot] closed pull request #24983: [SPARK-27714][SQL][CBO] Support Genetic Algorithm based join reorder

2020-01-04 Thread GitBox
github-actions[bot] closed pull request #24983: [SPARK-27714][SQL][CBO] Support 
Genetic Algorithm based join reorder
URL: https://github.com/apache/spark/pull/24983
 
 
   





[GitHub] [spark] github-actions[bot] closed pull request #22964: [SPARK-25963] Optimize generate followed by window

2020-01-04 Thread GitBox
github-actions[bot] closed pull request #22964: [SPARK-25963] Optimize generate 
followed by window
URL: https://github.com/apache/spark/pull/22964
 
 
   





[GitHub] [spark] github-actions[bot] commented on issue #22905: [SPARK-25894][SQL] Add a ColumnarFileFormat type which returns the column count for a given schema

2020-01-04 Thread GitBox
github-actions[bot] commented on issue #22905: [SPARK-25894][SQL] Add a 
ColumnarFileFormat type which returns the column count for a given schema
URL: https://github.com/apache/spark/pull/22905#issuecomment-570831364
 
 
   We're closing this PR because it hasn't been updated in a while.
   This isn't a judgement on the merit of the PR in any way. It's just
   a way of keeping the PR queue manageable.
   
   If you'd like to revive this PR, please reopen it!





[GitHub] [spark] github-actions[bot] closed pull request #23108: [Spark-25993][SQL][TEST]Add test cases for CREATE EXTERNAL TABLE with subdirectories

2020-01-04 Thread GitBox
github-actions[bot] closed pull request #23108: [Spark-25993][SQL][TEST]Add 
test cases for CREATE EXTERNAL TABLE with subdirectories
URL: https://github.com/apache/spark/pull/23108
 
 
   





[GitHub] [spark] github-actions[bot] closed pull request #23042: [SPARK-26070][SQL] add rule for implicit type coercion for decimal(x, 0)

2020-01-04 Thread GitBox
github-actions[bot] closed pull request #23042: [SPARK-26070][SQL] add rule for 
implicit type coercion for decimal(x,0)
URL: https://github.com/apache/spark/pull/23042
 
 
   





[GitHub] [spark] github-actions[bot] closed pull request #23094: [SPARK-26077][SQL] Reserved SQL words are not escaped by JDBC writer for table names

2020-01-04 Thread GitBox
github-actions[bot] closed pull request #23094: [SPARK-26077][SQL] Reserved SQL 
words are not escaped by JDBC writer for table names
URL: https://github.com/apache/spark/pull/23094
 
 
   





[GitHub] [spark] github-actions[bot] commented on issue #22957: [SPARK-25951][SQL] Ignore aliases for distributions and orderings

2020-01-04 Thread GitBox
github-actions[bot] commented on issue #22957: [SPARK-25951][SQL] Ignore 
aliases for distributions and orderings
URL: https://github.com/apache/spark/pull/22957#issuecomment-570831353
 
 
   We're closing this PR because it hasn't been updated in a while.
   This isn't a judgement on the merit of the PR in any way. It's just
   a way of keeping the PR queue manageable.
   
   If you'd like to revive this PR, please reopen it!





[GitHub] [spark] github-actions[bot] closed pull request #23074: [SPARK-19798][SQL] Refresh table does not have effect on other sessions than the issuing one

2020-01-04 Thread GitBox
github-actions[bot] closed pull request #23074: [SPARK-19798][SQL] Refresh 
table does not have effect on other sessions than the issuing one
URL: https://github.com/apache/spark/pull/23074
 
 
   





[GitHub] [spark] github-actions[bot] commented on issue #22878: [SPARK-25789][SQL] Support for Dataset of Avro

2020-01-04 Thread GitBox
github-actions[bot] commented on issue #22878: [SPARK-25789][SQL] Support for 
Dataset of Avro
URL: https://github.com/apache/spark/pull/22878#issuecomment-570831366
 
 
   We're closing this PR because it hasn't been updated in a while.
   This isn't a judgement on the merit of the PR in any way. It's just
   a way of keeping the PR queue manageable.
   
   If you'd like to revive this PR, please reopen it!





[GitHub] [spark] github-actions[bot] closed pull request #23032: [WIP][SPARK-26061][SQL][MINOR] Reduce the number of unused UnsafeRowWriters created in whole-stage codegen

2020-01-04 Thread GitBox
github-actions[bot] closed pull request #23032: [WIP][SPARK-26061][SQL][MINOR] 
Reduce the number of unused UnsafeRowWriters created in whole-stage codegen
URL: https://github.com/apache/spark/pull/23032
 
 
   





[GitHub] [spark] github-actions[bot] commented on issue #25795: [WIP][SPARK-29037][Core] Spark gives duplicate result when an application was killed

2020-01-04 Thread GitBox
github-actions[bot] commented on issue #25795: [WIP][SPARK-29037][Core] Spark 
gives duplicate result when an application was killed
URL: https://github.com/apache/spark/pull/25795#issuecomment-570831321
 
 
   We're closing this PR because it hasn't been updated in a while.
   This isn't a judgement on the merit of the PR in any way. It's just
   a way of keeping the PR queue manageable.
   
   If you'd like to revive this PR, please reopen it!





[GitHub] [spark] github-actions[bot] commented on issue #22947: [SPARK-24913][SQL] Make AssertNotNull and AssertTrue non-deterministic

2020-01-04 Thread GitBox
github-actions[bot] commented on issue #22947: [SPARK-24913][SQL] Make 
AssertNotNull and AssertTrue non-deterministic
URL: https://github.com/apache/spark/pull/22947#issuecomment-570831356
 
 
   We're closing this PR because it hasn't been updated in a while.
   This isn't a judgement on the merit of the PR in any way. It's just
   a way of keeping the PR queue manageable.
   
   If you'd like to revive this PR, please reopen it!





[GitHub] [spark] github-actions[bot] commented on issue #22774: [SPARK-25780][CORE]Scheduling the tasks which have no higher level locality first

2020-01-04 Thread GitBox
github-actions[bot] commented on issue #22774: [SPARK-25780][CORE]Scheduling 
the tasks which have no higher level locality first
URL: https://github.com/apache/spark/pull/22774#issuecomment-570831368
 
 
   We're closing this PR because it hasn't been updated in a while.
   This isn't a judgement on the merit of the PR in any way. It's just
   a way of keeping the PR queue manageable.
   
   If you'd like to revive this PR, please reopen it!





[GitHub] [spark] github-actions[bot] commented on issue #22721: [SPARK-19784][SPARK-25403][SQL] Refresh the table even table stats is empty

2020-01-04 Thread GitBox
github-actions[bot] commented on issue #22721: [SPARK-19784][SPARK-25403][SQL] 
Refresh the table even table stats is empty
URL: https://github.com/apache/spark/pull/22721#issuecomment-570831372
 
 
   We're closing this PR because it hasn't been updated in a while.
   This isn't a judgement on the merit of the PR in any way. It's just
   a way of keeping the PR queue manageable.
   
   If you'd like to revive this PR, please reopen it!





[GitHub] [spark] github-actions[bot] commented on issue #22945: [SPARK-24066][SQL]Add new optimization rule to eliminate unnecessary sort by exchanged adjacent Window expressions

2020-01-04 Thread GitBox
github-actions[bot] commented on issue #22945: [SPARK-24066][SQL]Add new 
optimization rule to eliminate unnecessary sort by exchanged adjacent Window 
expressions
URL: https://github.com/apache/spark/pull/22945#issuecomment-570831361
 
 
   We're closing this PR because it hasn't been updated in a while.
   This isn't a judgement on the merit of the PR in any way. It's just
   a way of keeping the PR queue manageable.
   
   If you'd like to revive this PR, please reopen it!





[GitHub] [spark] fuwhu commented on a change in pull request #26805: [SPARK-15616][SQL] Add optimizer rule PruneHiveTablePartitions

2020-01-04 Thread GitBox
fuwhu commented on a change in pull request #26805: [SPARK-15616][SQL] Add 
optimizer rule PruneHiveTablePartitions
URL: https://github.com/apache/spark/pull/26805#discussion_r363016268
 
 

 ##
 File path: 
sql/catalyst/src/main/scala/org/apache/spark/sql/internal/SQLConf.scala
 ##
 @@ -1375,6 +1375,16 @@ object SQLConf {
 .booleanConf
 .createWithDefault(false)
 
+  val FALL_BACK_TO_HDFS_FOR_STATS_MAX_PART_NUM =
+buildConf("spark.sql.statistics.fallBackToHdfs.maxPartitionNum")
+.doc("If the number of table partitions exceed this value, falling back to hdfs " +
+  "for statistics calculation is not allowed. This is used to avoid calculating " +
+  "the size of a large number of partitions through hdfs, which is very time consuming." +
+  "Setting this value to 0 or negative will disable falling back to hdfs for " +
+  "partition statistic calculation.")
 
 Review comment:
   Yes, PruneFileSourcePartitions may also end up calculating the size of a 
large number of partitions through HDFS.
   I will create a follow-up PR to refine it once this PR is finished.
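
The behaviour described in the quoted doc string can be summarised as a small predicate. This is a minimal sketch in plain Python, not the PR's Scala code: the function name `may_fall_back_to_hdfs` and both parameter names are hypothetical and only restate the doc string's wording.

```python
def may_fall_back_to_hdfs(num_partitions: int, max_partition_num: int) -> bool:
    """Return True when sizing partitions through HDFS is still allowed.

    Restates the quoted doc string: a limit of 0 or a negative limit
    disables the fallback, and a partition count above the limit also
    rules it out.
    """
    if max_partition_num <= 0:
        return False
    return num_partitions <= max_partition_num


print(may_fall_back_to_hdfs(10, 100))   # True: under the limit
print(may_fall_back_to_hdfs(101, 100))  # False: too many partitions
print(may_fall_back_to_hdfs(10, 0))     # False: fallback disabled
```

Whether a partition count exactly equal to the limit is allowed depends on reading "exceed" strictly, which is the assumption made here.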





[GitHub] [spark] srowen closed pull request #27091: [SPARK-30415][SQL]Improve Readability of SQLConf Doc

2020-01-04 Thread GitBox
srowen closed pull request #27091: [SPARK-30415][SQL]Improve Readability of 
SQLConf Doc
URL: https://github.com/apache/spark/pull/27091
 
 
   





[GitHub] [spark] srowen commented on issue #27091: [SPARK-30415][SQL]Improve Readability of SQLConf Doc

2020-01-04 Thread GitBox
srowen commented on issue #27091: [SPARK-30415][SQL]Improve Readability of 
SQLConf Doc
URL: https://github.com/apache/spark/pull/27091#issuecomment-570822844
 
 
   Merged to master





[GitHub] [spark] ayudovin commented on issue #27030: [SPARK-30244][SQL][Catalyst] - Emit pre/post events for "Partition" methods in ExternalCatalogWithListener

2020-01-04 Thread GitBox
ayudovin commented on issue #27030: [SPARK-30244][SQL][Catalyst] - Emit 
pre/post events for "Partition" methods in ExternalCatalogWithListener
URL: https://github.com/apache/spark/pull/27030#issuecomment-570820515
 
 
   @hvanhovell, Could you please review this pull request?





[GitHub] [spark] ayudovin commented on issue #27034: [SPARK-30122][Resource-Manager][Kubernetes] - Allow setting serviceAccountName for executor pods

2020-01-04 Thread GitBox
ayudovin commented on issue #27034: [SPARK-30122][Resource-Manager][Kubernetes] 
- Allow setting serviceAccountName for executor pods
URL: https://github.com/apache/spark/pull/27034#issuecomment-570820424
 
 
   @liyinan926, Could you please review this pull request? 





[GitHub] [spark] huaxingao commented on issue #27094: [SPARK-30419][ML][PySpark] Make IsotonicRegression extend Regressor

2020-01-04 Thread GitBox
huaxingao commented on issue #27094: [SPARK-30419][ML][PySpark] Make 
IsotonicRegression extend Regressor
URL: https://github.com/apache/spark/pull/27094#issuecomment-570819435
 
 
   I thought this over. I didn't implement this correctly: the FeaturesType could be Vector too. Even though Vector features are converted to Double before training and prediction, it is not correct for me to use type Double in
   ```class IsotonicRegression extends Regressor[Double, IsotonicRegression, IsotonicRegressionModel]```
   I tried a type parameter just now but had trouble with it. I looked at the history and found that this is the reason why IsotonicRegression doesn't inherit from Regressor.
   I will take a look at the other regression algorithms to see if there are any reasons they don't inherit from Regressor.
   I will be more cautious before submitting a PR next time. Sorry.
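
The feature-type problem described above can be illustrated with a small sketch. Everything here is hypothetical: `SimpleModel`, `ScalarOnlyModel`, `Features`, and `FlexibleModel` are stand-ins invented for illustration, not Spark classes, and `Vector` is the Scala standard-library collection rather than Spark's ML vector. The sketch only shows why pinning the feature type parameter to `Double` excludes vector-shaped input at the type level, and one possible way (an assumption, not Spark's design) to accept both shapes.

```scala
object FeatureTypeSketch {
  // Hypothetical stand-in for a model parameterized by its feature type,
  // loosely mirroring the FeaturesType parameter discussed above.
  trait SimpleModel[F] {
    def predict(features: F): Double
  }

  // Pinning F to Double: a caller holding a vector cannot call predict
  // without converting first, even though the conversion is trivial.
  final class ScalarOnlyModel(slope: Double) extends SimpleModel[Double] {
    def predict(features: Double): Double = slope * features
  }

  // One possible alternative (illustrative only): model the two accepted
  // input shapes explicitly and do the conversion inside predict.
  sealed trait Features
  final case class Scalar(value: Double) extends Features
  final case class Indexed(values: Vector[Double], featureIndex: Int) extends Features

  final class FlexibleModel(slope: Double) extends SimpleModel[Features] {
    def predict(features: Features): Double = features match {
      case Scalar(v)      => slope * v
      case Indexed(vs, i) => slope * vs(i)
    }
  }
}
```

With this shape, `new FeatureTypeSketch.FlexibleModel(2.0)` accepts both `Scalar(3.0)` and `Indexed(Vector(1.0, 4.0), 1)`, whereas `ScalarOnlyModel` accepts only a bare `Double`.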





[GitHub] [spark] huaxingao closed pull request #27094: [SPARK-30419][ML][PySpark] Make IsotonicRegression extend Regressor

2020-01-04 Thread GitBox
huaxingao closed pull request #27094: [SPARK-30419][ML][PySpark] Make 
IsotonicRegression extend Regressor
URL: https://github.com/apache/spark/pull/27094
 
 
   





[GitHub] [spark] AmplabJenkins removed a comment on issue #27091: [SPARK-30415][SQL]Improve Readability of SQLConf Doc

2020-01-04 Thread GitBox
AmplabJenkins removed a comment on issue #27091: [SPARK-30415][SQL]Improve 
Readability of SQLConf Doc
URL: https://github.com/apache/spark/pull/27091#issuecomment-570816944
 
 
   Merged build finished. Test PASSed.





[GitHub] [spark] AmplabJenkins commented on issue #27091: [SPARK-30415][SQL]Improve Readability of SQLConf Doc

2020-01-04 Thread GitBox
AmplabJenkins commented on issue #27091: [SPARK-30415][SQL]Improve Readability 
of SQLConf Doc
URL: https://github.com/apache/spark/pull/27091#issuecomment-570816944
 
 
   Merged build finished. Test PASSed.





[GitHub] [spark] AmplabJenkins removed a comment on issue #27091: [SPARK-30415][SQL]Improve Readability of SQLConf Doc

2020-01-04 Thread GitBox
AmplabJenkins removed a comment on issue #27091: [SPARK-30415][SQL]Improve 
Readability of SQLConf Doc
URL: https://github.com/apache/spark/pull/27091#issuecomment-570816946
 
 
   Test PASSed.
   Refer to this link for build results (access rights to CI server needed): 
   https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/116114/
   Test PASSed.





[GitHub] [spark] AmplabJenkins commented on issue #27091: [SPARK-30415][SQL]Improve Readability of SQLConf Doc

2020-01-04 Thread GitBox
AmplabJenkins commented on issue #27091: [SPARK-30415][SQL]Improve Readability 
of SQLConf Doc
URL: https://github.com/apache/spark/pull/27091#issuecomment-570816946
 
 
   Test PASSed.
   Refer to this link for build results (access rights to CI server needed): 
   https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/116114/
   Test PASSed.





[GitHub] [spark] SparkQA removed a comment on issue #27091: [SPARK-30415][SQL]Improve Readability of SQLConf Doc

2020-01-04 Thread GitBox
SparkQA removed a comment on issue #27091: [SPARK-30415][SQL]Improve 
Readability of SQLConf Doc
URL: https://github.com/apache/spark/pull/27091#issuecomment-570799046
 
 
   **[Test build #116114 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/116114/testReport)** for PR 27091 at commit [`3a84864`](https://github.com/apache/spark/commit/3a84864b2863aa083455e662fc546d7db3b5681e).





[GitHub] [spark] SparkQA commented on issue #27091: [SPARK-30415][SQL]Improve Readability of SQLConf Doc

2020-01-04 Thread GitBox
SparkQA commented on issue #27091: [SPARK-30415][SQL]Improve Readability of 
SQLConf Doc
URL: https://github.com/apache/spark/pull/27091#issuecomment-570816809
 
 
   **[Test build #116114 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/116114/testReport)** for PR 27091 at commit [`3a84864`](https://github.com/apache/spark/commit/3a84864b2863aa083455e662fc546d7db3b5681e).
* This patch passes all tests.
* This patch merges cleanly.
* This patch adds no public classes.





[GitHub] [spark] superbobry commented on issue #23008: [SPARK-22674][PYTHON] Removed the namedtuple pickling patch

2020-01-04 Thread GitBox
superbobry commented on issue #23008: [SPARK-22674][PYTHON] Removed the 
namedtuple pickling patch
URL: https://github.com/apache/spark/pull/23008#issuecomment-570815761
 
 
   @HyukjinKwon I think you might still want to merge this eventually. Closing 
the PR will only make the issue harder to discover.





[GitHub] [spark] srowen commented on a change in pull request #27094: [SPARK-30419][ML][PySpark] Make IsotonicRegression extend Regressor

2020-01-04 Thread GitBox
srowen commented on a change in pull request #27094: [SPARK-30419][ML][PySpark] 
Make IsotonicRegression extend Regressor
URL: https://github.com/apache/spark/pull/27094#discussion_r363051568
 
 

 ##
 File path: project/MimaExcludes.scala
 ##
 @@ -465,7 +465,15 @@ object MimaExcludes {
     ProblemFilters.exclude[DirectMissingMethodProblem]("org.apache.spark.deploy.SparkHadoopUtil.appendS3AndSparkHadoopConfigurations"),
 
     // [SPARK-29348] Add observable metrics.
-    ProblemFilters.exclude[DirectMissingMethodProblem]("org.apache.spark.sql.streaming.StreamingQueryProgress.this")
+    ProblemFilters.exclude[DirectMissingMethodProblem]("org.apache.spark.sql.streaming.StreamingQueryProgress.this"),
+
+    // [SPARK-30419][ML] Make IsotonicRegression extend Regressor
+    ProblemFilters.exclude[IncompatibleResultTypeProblem]("org.apache.spark.ml.regression.IsotonicRegression.fit"),
+    ProblemFilters.exclude[IncompatibleResultTypeProblem]("org.apache.spark.ml.regression.IsotonicRegression.setFeaturesCol"),
 
 Review comment:
   Hm, weird, because Learner is the type IsotonicRegression here. So it 
shouldn't be a real change. I wonder if it's just a MiMa problem. 
   
   So as far as you know this isn't changing any APIs right?





[GitHub] [spark] AmplabJenkins commented on issue #27078: [SPARK-30409][SQL][TESTS] Use `NoOp` datasource in SQL benchmarks

2020-01-04 Thread GitBox
AmplabJenkins commented on issue #27078: [SPARK-30409][SQL][TESTS] Use `NoOp` 
datasource in SQL benchmarks
URL: https://github.com/apache/spark/pull/27078#issuecomment-570813182
 
 
   Test PASSed.
   Refer to this link for build results (access rights to CI server needed): 
   https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/116115/
   Test PASSed.





[GitHub] [spark] AmplabJenkins removed a comment on issue #27078: [SPARK-30409][SQL][TESTS] Use `NoOp` datasource in SQL benchmarks

2020-01-04 Thread GitBox
AmplabJenkins removed a comment on issue #27078: [SPARK-30409][SQL][TESTS] Use 
`NoOp` datasource in SQL benchmarks
URL: https://github.com/apache/spark/pull/27078#issuecomment-570813182
 
 
   Test PASSed.
   Refer to this link for build results (access rights to CI server needed): 
   https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/116115/
   Test PASSed.





[GitHub] [spark] AmplabJenkins removed a comment on issue #27078: [SPARK-30409][SQL][TESTS] Use `NoOp` datasource in SQL benchmarks

2020-01-04 Thread GitBox
AmplabJenkins removed a comment on issue #27078: [SPARK-30409][SQL][TESTS] Use 
`NoOp` datasource in SQL benchmarks
URL: https://github.com/apache/spark/pull/27078#issuecomment-570813180
 
 
   Merged build finished. Test PASSed.





[GitHub] [spark] AmplabJenkins commented on issue #27078: [SPARK-30409][SQL][TESTS] Use `NoOp` datasource in SQL benchmarks

2020-01-04 Thread GitBox
AmplabJenkins commented on issue #27078: [SPARK-30409][SQL][TESTS] Use `NoOp` 
datasource in SQL benchmarks
URL: https://github.com/apache/spark/pull/27078#issuecomment-570813180
 
 
   Merged build finished. Test PASSed.





[GitHub] [spark] SparkQA commented on issue #27078: [SPARK-30409][SQL][TESTS] Use `NoOp` datasource in SQL benchmarks

2020-01-04 Thread GitBox
SparkQA commented on issue #27078: [SPARK-30409][SQL][TESTS] Use `NoOp` 
datasource in SQL benchmarks
URL: https://github.com/apache/spark/pull/27078#issuecomment-570812994
 
 
   **[Test build #116115 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/116115/testReport)** for PR 27078 at commit [`c26164a`](https://github.com/apache/spark/commit/c26164a6cce5cbd3c21b1668e617518320ad97c4).
* This patch passes all tests.
* This patch merges cleanly.
* This patch adds no public classes.





[GitHub] [spark] SparkQA removed a comment on issue #27078: [SPARK-30409][SQL][TESTS] Use `NoOp` datasource in SQL benchmarks

2020-01-04 Thread GitBox
SparkQA removed a comment on issue #27078: [SPARK-30409][SQL][TESTS] Use `NoOp` 
datasource in SQL benchmarks
URL: https://github.com/apache/spark/pull/27078#issuecomment-570801625
 
 
   **[Test build #116115 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/116115/testReport)** for PR 27078 at commit [`c26164a`](https://github.com/apache/spark/commit/c26164a6cce5cbd3c21b1668e617518320ad97c4).





[GitHub] [spark] erikerlandson commented on a change in pull request #25024: [SPARK-27296][SQL] Allows Aggregator to be registered as a UDF

2020-01-04 Thread GitBox
erikerlandson commented on a change in pull request #25024: [SPARK-27296][SQL] 
Allows Aggregator to be registered as a UDF
URL: https://github.com/apache/spark/pull/25024#discussion_r363046965
 
 

 ##
 File path: sql/core/src/main/scala/org/apache/spark/sql/execution/aggregate/udaf.scala
 ##
 @@ -450,3 +454,88 @@ case class ScalaUDAF(
 
   override def nodeName: String = udaf.getClass.getSimpleName
 }
+
+case class ScalaAggregator[IN, BUF, OUT](
+    children: Seq[Expression],
+    agg: Aggregator[IN, BUF, OUT],
+    inputEncoder: ExpressionEncoder[IN],
+    isNullable: Boolean = true,
+    isDeterministic: Boolean = true,
+    mutableAggBufferOffset: Int = 0,
+    inputAggBufferOffset: Int = 0)
+  extends TypedImperativeAggregate[BUF]
+  with NonSQLExpression
+  with UserDefinedExpression
+  with ImplicitCastInputTypes
+  with Logging {
+
+  private[this] lazy val bufferEncoder = agg.bufferEncoder.asInstanceOf[ExpressionEncoder[BUF]]
+  private[this] lazy val outputEncoder = agg.outputEncoder.asInstanceOf[ExpressionEncoder[OUT]]
+
+  def dataType: DataType = outputEncoder.objSerializer.dataType
+
+  def inputTypes: Seq[DataType] = inputEncoder.schema.map(_.dataType)
+
+  def nullable: Boolean = isNullable
+
+  override lazy val deterministic: Boolean = isDeterministic
+
+  def withNewMutableAggBufferOffset(newMutableAggBufferOffset: Int): ScalaAggregator[IN, BUF, OUT] =
+    copy(mutableAggBufferOffset = newMutableAggBufferOffset)
+
+  def withNewInputAggBufferOffset(newInputAggBufferOffset: Int): ScalaAggregator[IN, BUF, OUT] =
+    copy(inputAggBufferOffset = newInputAggBufferOffset)
+
+  private[this] lazy val childrenSchema: StructType = {
+    val inputFields = children.zipWithIndex.map {
+      case (child, index) =>
+        StructField(s"input$index", child.dataType, child.nullable, Metadata.empty)
+    }
+    StructType(inputFields)
+  }
+
+  private[this] lazy val inputProjection = {
+    val inputAttributes = childrenSchema.toAttributes
+    log.debug(
+      s"Creating MutableProj: $children, inputSchema: $inputAttributes.")
+    UnsafeProjection.create(children, inputAttributes)
+  }
+
+  def createAggregationBuffer(): BUF = agg.zero
+
+  def update(buffer: BUF, input: InternalRow): BUF = {
+    val proj = inputProjection(input)
+    val a = inputEncoder.fromRow(proj)
+    agg.reduce(buffer, a)
+  }
+
+  def merge(buffer: BUF, input: BUF): BUF = agg.merge(buffer, input)
+
+  private[this] lazy val outputToCatalystConverter: Any => Any = {
+    CatalystTypeConverters.createToCatalystConverter(dataType)
+  }
+
+  def eval(buffer: BUF): Any = {
+    val row = outputEncoder.toRow(agg.finish(buffer))
+    if (outputEncoder.isSerializedAsStruct) row else row.get(0, dataType)
+  }
+
+  private[this] lazy val bufferSerializer = bufferEncoder.namedExpressions
+  private[this] lazy val bufferDeserializer = bufferEncoder.resolveAndBind().deserializer
+  private[this] lazy val bufferObjToRow = UnsafeProjection.create(bufferSerializer)
+  private[this] lazy val bufferRow = new UnsafeRow(bufferSerializer.length)
+  private[this] lazy val bufferRowToObject =
+    GenerateSafeProjection.generate(bufferDeserializer :: Nil)
+
+  def serialize(agg: BUF): Array[Byte] = bufferObjToRow(InternalRow(agg)).getBytes
 
 Review comment:
   :+1: 





[GitHub] [spark] amanomer commented on issue #27052: [SPARK-30390][MLLIB] Avoid double caching in mllib.KMeans#runWithWeights.

2020-01-04 Thread GitBox
amanomer commented on issue #27052: [SPARK-30390][MLLIB] Avoid double caching 
in mllib.KMeans#runWithWeights.
URL: https://github.com/apache/spark/pull/27052#issuecomment-570805218
 
 
   Thanks @srowen 





[GitHub] [spark] huaxingao commented on a change in pull request #27094: [SPARK-30419][ML][PySpark] Make IsotonicRegression extend Regressor

2020-01-04 Thread GitBox
huaxingao commented on a change in pull request #27094: 
[SPARK-30419][ML][PySpark] Make IsotonicRegression extend Regressor
URL: https://github.com/apache/spark/pull/27094#discussion_r363046502
 
 

 ##
 File path: project/MimaExcludes.scala
 ##
 @@ -465,7 +465,15 @@ object MimaExcludes {
     ProblemFilters.exclude[DirectMissingMethodProblem]("org.apache.spark.deploy.SparkHadoopUtil.appendS3AndSparkHadoopConfigurations"),
 
     // [SPARK-29348] Add observable metrics.
-    ProblemFilters.exclude[DirectMissingMethodProblem]("org.apache.spark.sql.streaming.StreamingQueryProgress.this")
+    ProblemFilters.exclude[DirectMissingMethodProblem]("org.apache.spark.sql.streaming.StreamingQueryProgress.this"),
+
+    // [SPARK-30419][ML] Make IsotonicRegression extend Regressor
+    ProblemFilters.exclude[IncompatibleResultTypeProblem]("org.apache.spark.ml.regression.IsotonicRegression.fit"),
+    ProblemFilters.exclude[IncompatibleResultTypeProblem]("org.apache.spark.ml.regression.IsotonicRegression.setFeaturesCol"),
 
 Review comment:
   I was confused as well when I first saw the Mima errors lol
   
   Here are the Mima errors:
   ```
   [error]  * method fit(org.apache.spark.sql.Dataset)org.apache.spark.ml.regression.IsotonicRegressionModel in class org.apache.spark.ml.regression.IsotonicRegression has a different result type in current version, where it is org.apache.spark.ml.Model rather than org.apache.spark.ml.regression.IsotonicRegressionModel
   [error]    filter with: ProblemFilters.exclude[IncompatibleResultTypeProblem]("org.apache.spark.ml.regression.IsotonicRegression.fit")
   [error]  * method setFeaturesCol(java.lang.String)org.apache.spark.ml.regression.IsotonicRegression in class org.apache.spark.ml.regression.IsotonicRegression has a different result type in current version, where it is org.apache.spark.ml.Predictor rather than org.apache.spark.ml.regression.IsotonicRegression
   [error]    filter with: ProblemFilters.exclude[IncompatibleResultTypeProblem]("org.apache.spark.ml.regression.IsotonicRegression.setFeaturesCol")
   [error]  * method setLabelCol(java.lang.String)org.apache.spark.ml.regression.IsotonicRegression in class org.apache.spark.ml.regression.IsotonicRegression has a different result type in current version, where it is org.apache.spark.ml.Predictor rather than org.apache.spark.ml.regression.IsotonicRegression
   [error]    filter with: ProblemFilters.exclude[IncompatibleResultTypeProblem]("org.apache.spark.ml.regression.IsotonicRegression.setLabelCol")
   [error]  * method setPredictionCol(java.lang.String)org.apache.spark.ml.regression.IsotonicRegression in class org.apache.spark.ml.regression.IsotonicRegression has a different result type in current version, where it is org.apache.spark.ml.Predictor rather than org.apache.spark.ml.regression.IsotonicRegression
   [error]    filter with: ProblemFilters.exclude[IncompatibleResultTypeProblem]("org.apache.spark.ml.regression.IsotonicRegression.setPredictionCol")
   [error]  * method setFeaturesCol(java.lang.String)org.apache.spark.ml.regression.IsotonicRegressionModel in class org.apache.spark.ml.regression.IsotonicRegressionModel has a different result type in current version, where it is org.apache.spark.ml.PredictionModel rather than org.apache.spark.ml.regression.IsotonicRegressionModel
   [error]    filter with: ProblemFilters.exclude[IncompatibleResultTypeProblem]("org.apache.spark.ml.regression.IsotonicRegressionModel.setFeaturesCol")
   [error]  * method setPredictionCol(java.lang.String)org.apache.spark.ml.regression.IsotonicRegressionModel in class org.apache.spark.ml.regression.IsotonicRegressionModel has a different result type in current version, where it is org.apache.spark.ml.PredictionModel rather than org.apache.spark.ml.regression.IsotonicRegressionModel
   [error]    filter with: ProblemFilters.exclude[IncompatibleResultTypeProblem]("org.apache.spark.ml.regression.IsotonicRegressionModel.setPredictionCol")
   ```
   The APIs are still the same, but the return types are different. Before the change, setXXX is defined in ```IsotonicRegression``` and the return type is ```IsotonicRegression```:
   
   ```
   def setFeaturesCol(value: String): this.type = set(featuresCol, value)
   ```
   After the change, setXXX is defined in the superclass ```Predictor``` and the return type is ```Predictor```:
   ```
   def setFeaturesCol(value: String): Learner = set(featuresCol, value).asInstanceOf[Learner]
   ```
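
The difference in result types reported above can be reproduced in miniature. This is a sketch under stated assumptions: `BasePredictor`, `StandaloneEstimator`, and `InheritingEstimator` are hypothetical stand-ins, not Spark classes. It uses Java reflection to show that a setter returning `this.type` erases to the concrete class, while an inherited setter returning an F-bounded type parameter erases to that parameter's upper bound, which would be consistent with a binary-compatibility checker flagging a changed result type even though source-level usage looks the same.

```scala
object ErasureSketch {
  // Hypothetical base class with an F-bounded type parameter, loosely
  // mirroring the Learner parameter discussed above.
  abstract class BasePredictor[Learner <: BasePredictor[Learner]] {
    protected var featuresCol: String = "features"

    // Declared to return Learner; in bytecode the return type is the
    // erasure of Learner, i.e. its upper bound BasePredictor.
    def setFeaturesCol(value: String): Learner = {
      featuresCol = value
      this.asInstanceOf[Learner]
    }

    def getFeaturesCol: String = featuresCol
  }

  // "Before": the setter lives on the concrete class and returns this.type,
  // which erases to StandaloneEstimator.
  final class StandaloneEstimator {
    private var featuresCol: String = "features"
    def setFeaturesCol(value: String): this.type = { featuresCol = value; this }
    def getFeaturesCol: String = featuresCol
  }

  // "After": the setter is inherited from the generic base class.
  final class InheritingEstimator extends BasePredictor[InheritingEstimator]

  // Looks up the erased (bytecode-level) return type of a String setter.
  def erasedReturnType(cls: Class[_], method: String): Class[_] =
    cls.getMethod(method, classOf[String]).getReturnType
}
```

Source-level callers are unaffected in both cases, since the compiler still sees the precise static type; only the erased method signature differs.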
   



[GitHub] [spark] imback82 commented on a change in pull request #27095: [SPARK-30214][SQL] V2 commands resolves namespaces with new resolution framework

2020-01-04 Thread GitBox
imback82 commented on a change in pull request #27095: [SPARK-30214][SQL] V2 
commands resolves namespaces with new resolution framework
URL: https://github.com/apache/spark/pull/27095#discussion_r363023439
 
 

 ##
 File path: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/Analyzer.scala
 ##
 @@ -730,6 +730,8 @@ class Analyzer(
   case class ResolveNamespace(catalogManager: CatalogManager)
     extends Rule[LogicalPlan] with LookupCatalog {
     def apply(plan: LogicalPlan): LogicalPlan = plan resolveOperators {
+      case UnresolvedNamespace(Seq()) =>
+        ResolvedNamespace(currentCatalog.asNamespaceCatalog, Seq.empty[String])
 
 Review comment:
   This may not be a good idea since an empty `Seq` can mean either `None` or `Nil`.
   
   How about we add `Option` as follows, since the namespace can be optional in a command:
   ```scala
   case class ResolvedNamespace(catalog: SupportsNamespaces, namespace: Option[Seq[String]])
     extends LeafNode {
     override def output: Seq[Attribute] = Nil
   }
   
   case class UnresolvedNamespace(multipartIdentifier: Option[Seq[String]]) extends LeafNode {
     override lazy val resolved: Boolean = false
     override def output: Seq[Attribute] = Nil
   }
   ```
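
To illustrate why `Option[Seq[String]]` removes the ambiguity, here is a minimal, self-contained sketch. The object and method names (`NamespaceDefaults`, `defaultToCurrent`, `defaultToRoot`) are hypothetical and not part of Spark; the sketch assumes, for illustration only, that one command wants an omitted namespace to fall back to the current namespace while another wants it to fall back to the root (empty) namespace.

```scala
object NamespaceDefaults {
  // None means "the user did not specify a namespace";
  // Some(Nil) means "the user explicitly asked for the root namespace".

  // Hypothetical policy: an omitted namespace falls back to the current one.
  def defaultToCurrent(requested: Option[Seq[String]], current: Seq[String]): Seq[String] =
    requested.getOrElse(current)

  // Hypothetical policy: an omitted namespace falls back to the root.
  def defaultToRoot(requested: Option[Seq[String]]): Seq[String] =
    requested.getOrElse(Nil)
}
```

With a plain `Seq[String]`, the inputs `None` and `Some(Nil)` would collapse into the same empty sequence, so the first policy could not tell "omitted" apart from "explicitly root".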





[GitHub] [spark] AmplabJenkins removed a comment on issue #27078: [SPARK-30409][SQL][TESTS] Use `NoOp` datasource in SQL benchmarks

2020-01-04 Thread GitBox
AmplabJenkins removed a comment on issue #27078: [SPARK-30409][SQL][TESTS] Use 
`NoOp` datasource in SQL benchmarks
URL: https://github.com/apache/spark/pull/27078#issuecomment-570801735
 
 
   Merged build finished. Test PASSed.





[GitHub] [spark] AmplabJenkins removed a comment on issue #27078: [SPARK-30409][SQL][TESTS] Use `NoOp` datasource in SQL benchmarks

2020-01-04 Thread GitBox
AmplabJenkins removed a comment on issue #27078: [SPARK-30409][SQL][TESTS] Use 
`NoOp` datasource in SQL benchmarks
URL: https://github.com/apache/spark/pull/27078#issuecomment-570801737
 
 
   Test PASSed.
   Refer to this link for build results (access rights to CI server needed): 
   https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder-K8s/20907/
   Test PASSed.





[GitHub] [spark] AmplabJenkins commented on issue #27078: [SPARK-30409][SQL][TESTS] Use `NoOp` datasource in SQL benchmarks

2020-01-04 Thread GitBox
AmplabJenkins commented on issue #27078: [SPARK-30409][SQL][TESTS] Use `NoOp` 
datasource in SQL benchmarks
URL: https://github.com/apache/spark/pull/27078#issuecomment-570801737
 
 
   Test PASSed.
   Refer to this link for build results (access rights to CI server needed): 
   https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder-K8s/20907/
   Test PASSed.





[GitHub] [spark] AmplabJenkins commented on issue #27078: [SPARK-30409][SQL][TESTS] Use `NoOp` datasource in SQL benchmarks

2020-01-04 Thread GitBox
AmplabJenkins commented on issue #27078: [SPARK-30409][SQL][TESTS] Use `NoOp` 
datasource in SQL benchmarks
URL: https://github.com/apache/spark/pull/27078#issuecomment-570801735
 
 
   Merged build finished. Test PASSed.





[GitHub] [spark] SparkQA commented on issue #27078: [SPARK-30409][SQL][TESTS] Use `NoOp` datasource in SQL benchmarks

2020-01-04 Thread GitBox
SparkQA commented on issue #27078: [SPARK-30409][SQL][TESTS] Use `NoOp` 
datasource in SQL benchmarks
URL: https://github.com/apache/spark/pull/27078#issuecomment-570801625
 
 
   **[Test build #116115 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/116115/testReport)** for PR 27078 at commit [`c26164a`](https://github.com/apache/spark/commit/c26164a6cce5cbd3c21b1668e617518320ad97c4).





[GitHub] [spark] srowen commented on issue #26759: [SPARK-28794][SQL][DOC] Documentation for Create table Command

2020-01-04 Thread GitBox
srowen commented on issue #26759: [SPARK-28794][SQL][DOC] Documentation for 
Create table Command
URL: https://github.com/apache/spark/pull/26759#issuecomment-570799515
 
 
   Ping @PavithraRamachandran 





[GitHub] [spark] AmplabJenkins removed a comment on issue #27091: [SPARK-30415][SQL]Improve Readability of SQLConf Doc

2020-01-04 Thread GitBox
AmplabJenkins removed a comment on issue #27091: [SPARK-30415][SQL]Improve 
Readability of SQLConf Doc
URL: https://github.com/apache/spark/pull/27091#issuecomment-570799158
 
 
   Merged build finished. Test PASSed.





[GitHub] [spark] AmplabJenkins removed a comment on issue #27091: [SPARK-30415][SQL]Improve Readability of SQLConf Doc

2020-01-04 Thread GitBox
AmplabJenkins removed a comment on issue #27091: [SPARK-30415][SQL]Improve 
Readability of SQLConf Doc
URL: https://github.com/apache/spark/pull/27091#issuecomment-570799164
 
 
   Test PASSed.
   Refer to this link for build results (access rights to CI server needed): 
   
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder-K8s/20906/
   Test PASSed.





[GitHub] [spark] AmplabJenkins commented on issue #27091: [SPARK-30415][SQL]Improve Readability of SQLConf Doc

2020-01-04 Thread GitBox
AmplabJenkins commented on issue #27091: [SPARK-30415][SQL]Improve Readability 
of SQLConf Doc
URL: https://github.com/apache/spark/pull/27091#issuecomment-570799158
 
 
   Merged build finished. Test PASSed.





[GitHub] [spark] AmplabJenkins commented on issue #27091: [SPARK-30415][SQL]Improve Readability of SQLConf Doc

2020-01-04 Thread GitBox
AmplabJenkins commented on issue #27091: [SPARK-30415][SQL]Improve Readability 
of SQLConf Doc
URL: https://github.com/apache/spark/pull/27091#issuecomment-570799164
 
 
   Test PASSed.
   Refer to this link for build results (access rights to CI server needed): 
   
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder-K8s/20906/
   Test PASSed.





[GitHub] [spark] SparkQA commented on issue #27091: [SPARK-30415][SQL]Improve Readability of SQLConf Doc

2020-01-04 Thread GitBox
SparkQA commented on issue #27091: [SPARK-30415][SQL]Improve Readability of 
SQLConf Doc
URL: https://github.com/apache/spark/pull/27091#issuecomment-570799046
 
 
   **[Test build #116114 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/116114/testReport)**
 for PR 27091 at commit 
[`3a84864`](https://github.com/apache/spark/commit/3a84864b2863aa083455e662fc546d7db3b5681e).





[GitHub] [spark] srowen commented on issue #27091: [SPARK-30415][SQL]Improve Readability of SQLConf Doc

2020-01-04 Thread GitBox
srowen commented on issue #27091: [SPARK-30415][SQL]Improve Readability of 
SQLConf Doc
URL: https://github.com/apache/spark/pull/27091#issuecomment-570798805
 
 
   Jenkins test this please





[GitHub] [spark] srowen commented on a change in pull request #27079: [SPARK-30410][SQL] Calculating size of table with large number of partitions causes flooding logs

2020-01-04 Thread GitBox
srowen commented on a change in pull request #27079: [SPARK-30410][SQL] 
Calculating size of table with large number of partitions causes flooding logs
URL: https://github.com/apache/spark/pull/27079#discussion_r363043345
 
 

 ##
 File path: 
sql/core/src/main/scala/org/apache/spark/sql/execution/command/CommandUtils.scala
 ##
 @@ -124,8 +128,8 @@ object CommandUtils extends Logging {
   0L
   }
 }.getOrElse(0L)
-val durationInMs = (System.nanoTime() - startTime) / (1000 * 1000)
-logInfo(s"It took $durationInMs ms to calculate the total file size under path $locationUri.")
+val durationInMs = (System.nanoTime() - startTime) / 1e6
+logDebug(s"It took $durationInMs ms to calculate the total file size under path $locationUri.")
 
 Review comment:
   Do you mean to change to debug level here?
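
   For context on the diff above, here is a minimal standalone sketch (not Spark code; `DurationDemo` and its method names are hypothetical) of how the two divisors differ. Dividing a `Long` by `(1000 * 1000)` truncates to whole milliseconds, whereas dividing by `1e6` yields a `Double` with a fractional part, so the diff changes the shape of the interpolated value as well as the log level.

   ```scala
   // Hypothetical illustration only; these helpers are not part of CommandUtils.
   object DurationDemo {
     // Integer division on Long: the sub-millisecond remainder is discarded.
     def wholeMillis(elapsedNanos: Long): Long = elapsedNanos / (1000 * 1000)

     // Dividing by the Double literal 1e6 widens the Long to Double first.
     def fractionalMillis(elapsedNanos: Long): Double = elapsedNanos / 1e6

     def main(args: Array[String]): Unit = {
       val elapsedNanos = 1234567L // assumed sample value
       println(s"It took ${wholeMillis(elapsedNanos)} ms")
       println(s"It took ${fractionalMillis(elapsedNanos)} ms")
     }
   }
   ```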





[GitHub] [spark] srowen commented on a change in pull request #27079: [SPARK-30410][SQL] Calculating size of table with large number of partitions causes flooding logs

2020-01-04 Thread GitBox
srowen commented on a change in pull request #27079: [SPARK-30410][SQL] 
Calculating size of table with large number of partitions causes flooding logs
URL: https://github.com/apache/spark/pull/27079#discussion_r363043362
 
 

 ##
 File path: 
sql/core/src/main/scala/org/apache/spark/sql/execution/command/CommandUtils.scala
 ##
 @@ -75,6 +77,10 @@ object CommandUtils extends Logging {
 }.sum
   }
 }
+val partInfo = if (partitions.nonEmpty) s" with ${partitions.length} partitions" else ""
+logInfo(s"It took ${(System.nanoTime() - startTime) / (1000 * 1000)} ms to calculate" +
 
 Review comment:
   Because the two branches above differ in several ways, it might be cleaner 
to just put two different log statements in the branches above instead of this
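
   The suggestion could be sketched roughly as follows. This is a simplified standalone illustration, not the actual `CommandUtils` code: `SizeLogSketch`, `totalSize`, `elapsedMs`, the `logInfo` stand-in, and the message wording are all assumptions made for the example.

   ```scala
   // Hypothetical sketch: one log statement per branch instead of a shared one.
   object SizeLogSketch {
     // Stand-in for the logInfo method that Spark's Logging trait provides.
     def logInfo(msg: String): Unit = println(msg)

     // Elapsed whole milliseconds since startNanos.
     def elapsedMs(startNanos: Long): Long =
       (System.nanoTime() - startNanos) / (1000 * 1000)

     // partitionSizes: per-partition sizes (empty for an unpartitioned table).
     // tableSize: by-name, evaluated only in the unpartitioned branch.
     def totalSize(partitionSizes: Seq[Long], tableSize: => Long): Long = {
       val start = System.nanoTime()
       if (partitionSizes.nonEmpty) {
         val size = partitionSizes.sum
         logInfo(s"It took ${elapsedMs(start)} ms to calculate the total size " +
           s"of ${partitionSizes.length} partitions.")
         size
       } else {
         val size = tableSize
         logInfo(s"It took ${elapsedMs(start)} ms to calculate the total size " +
           "of the table location.")
         size
       }
     }
   }
   ```

   Each branch then owns its message, so no shared `partInfo`-style string is needed to cover both cases.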





[GitHub] [spark] srowen closed pull request #27059: [SPARK-30398][ML] PCA/RegressionMetrics/RowMatrix avoid unnecessary computation

2020-01-04 Thread GitBox
srowen closed pull request #27059: [SPARK-30398][ML] 
PCA/RegressionMetrics/RowMatrix avoid unnecessary computation
URL: https://github.com/apache/spark/pull/27059
 
 
   





[GitHub] [spark] srowen commented on issue #27059: [SPARK-30398][ML] PCA/RegressionMetrics/RowMatrix avoid unnecessary computation

2020-01-04 Thread GitBox
srowen commented on issue #27059: [SPARK-30398][ML] 
PCA/RegressionMetrics/RowMatrix avoid unnecessary computation
URL: https://github.com/apache/spark/pull/27059#issuecomment-570798581
 
 
   Merged to master





[GitHub] [spark] brkyvz commented on issue #26913: [SPARK-29219][SQL] Introduce SupportsCatalogOptions for TableProvider

2020-01-04 Thread GitBox
brkyvz commented on issue #26913: [SPARK-29219][SQL] Introduce 
SupportsCatalogOptions for TableProvider
URL: https://github.com/apache/spark/pull/26913#issuecomment-570798373
 
 
   @cloud-fan Any more comments on this? Shall we merge this?





[GitHub] [spark] MaxGekk commented on issue #27092: [SPARK-30416][SQL] Log a warning for deprecated SQL config in `set()` and `unset()`

2020-01-04 Thread GitBox
MaxGekk commented on issue #27092: [SPARK-30416][SQL] Log a warning for 
deprecated SQL config in `set()` and `unset()`
URL: https://github.com/apache/spark/pull/27092#issuecomment-570798353
 
 
   @HyukjinKwon Please, have a look at the PR.





[GitHub] [spark] srowen closed pull request #27052: [SPARK-30390][MLLIB] Avoid double caching in mllib.KMeans#runWithWeights.

2020-01-04 Thread GitBox
srowen closed pull request #27052: [SPARK-30390][MLLIB] Avoid double caching in 
mllib.KMeans#runWithWeights.
URL: https://github.com/apache/spark/pull/27052
 
 
   





[GitHub] [spark] srowen commented on issue #27052: [SPARK-30390][MLLIB] Avoid double caching in mllib.KMeans#runWithWeights.

2020-01-04 Thread GitBox
srowen commented on issue #27052: [SPARK-30390][MLLIB] Avoid double caching in 
mllib.KMeans#runWithWeights.
URL: https://github.com/apache/spark/pull/27052#issuecomment-570797957
 
 
   We can change this further, but this is an improvement and less of a change 
than anything else we'd do. I'll merge it.


This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


With regards,
Apache Git Services

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org


