[GitHub] [spark] AmplabJenkins removed a comment on pull request #30341: [SPARK-33427][SQL] Add subexpression elimination for interpreted expression evaluation

2020-11-13 Thread GitBox


AmplabJenkins removed a comment on pull request #30341:
URL: https://github.com/apache/spark/pull/30341#issuecomment-727161445







This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] [spark] c21 commented on pull request #30347: [SPARK-33209][SS] Refactor unit test of stream-stream join in UnsupportedOperationsSuite

2020-11-13 Thread GitBox


c21 commented on pull request #30347:
URL: https://github.com/apache/spark/pull/30347#issuecomment-727162216


   Thanks @HeartSaVioR for heads up, please take your time. Thanks.



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] [spark] SparkQA commented on pull request #30341: [SPARK-33427][SQL] Add subexpression elimination for interpreted expression evaluation

2020-11-13 Thread GitBox


SparkQA commented on pull request #30341:
URL: https://github.com/apache/spark/pull/30341#issuecomment-727161442


   Kubernetes integration test status success
   URL: 
https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/35691/
   



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] [spark] AmplabJenkins commented on pull request #30341: [SPARK-33427][SQL] Add subexpression elimination for interpreted expression evaluation

2020-11-13 Thread GitBox


AmplabJenkins commented on pull request #30341:
URL: https://github.com/apache/spark/pull/30341#issuecomment-727161445







This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] [spark] viirya commented on a change in pull request #30368: [SPARK-33442][SQL] Change Combine Limit to Eliminate limit using max row

2020-11-13 Thread GitBox


viirya commented on a change in pull request #30368:
URL: https://github.com/apache/spark/pull/30368#discussion_r523389535



##
File path: 
sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/optimizer/Optimizer.scala
##
@@ -1452,11 +1452,21 @@ object PushPredicateThroughJoin extends 
Rule[LogicalPlan] with PredicateHelper {
 }
 
 /**
- * Combines two adjacent [[Limit]] operators into one, merging the
- * expressions into one single expression.
+ * 1. Eliminate [[Limit]] operators if it's child max row <= limit.
+ * 2. Combines two adjacent [[Limit]] operators into one, merging the
+ *expressions into one single expression.
  */
-object CombineLimits extends Rule[LogicalPlan] {
-  def apply(plan: LogicalPlan): LogicalPlan = plan transform {
+object EliminateLimits extends Rule[LogicalPlan] {
+  private def canEliminate(limitExpr: Expression, childMaxRow: Option[Long]): 
Boolean = {
+limitExpr.foldable && childMaxRow.exists { _ <= 
limitExpr.eval().toString.toLong }

Review comment:
   nit, but we can put `limitExpr.eval()` out of `exists` loop?





This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] [spark] Ngone51 commented on a change in pull request #30164: [SPARK-32919][SHUFFLE][test-maven][test-hadoop2.7] Driver side changes for coordinating push based shuffle by selecting external

2020-11-13 Thread GitBox


Ngone51 commented on a change in pull request #30164:
URL: https://github.com/apache/spark/pull/30164#discussion_r523388191



##
File path: 
core/src/main/scala/org/apache/spark/storage/BlockManagerMasterEndpoint.scala
##
@@ -657,6 +688,38 @@ class BlockManagerMasterEndpoint(
 }
   }
 
+  private def getShufflePushMergerLocations(
+  numMergersNeeded: Int,
+  hostsToFilter: Set[String]): Seq[BlockManagerId] = {
+val blockManagersWithExecutors = 
blockManagerIdByExecutor.groupBy(_._2.host)
+  .mapValues(_.head).values.map(_._2).toSet
+val filteredBlockManagersWithExecutors = blockManagersWithExecutors
+  .filterNot(x => hostsToFilter.contains(x.host))
+val filteredMergersWithExecutors = filteredBlockManagersWithExecutors.map(
+  x => BlockManagerId(x.executorId, x.host, 
StorageUtils.externalShuffleServicePort(conf)))
+
+// Enough mergers are available as part of active executors list
+if (filteredMergersWithExecutors.size >= numMergersNeeded) {
+  filteredMergersWithExecutors.toSeq
+} else {
+  // Delta mergers added from inactive mergers list to the active mergers 
list
+  val filteredMergersWithExecutorsHosts = 
filteredMergersWithExecutors.map(_.host)
+  // Pick random hosts instead of preferring the top of the list
+  val randomizedShuffleMergerLocations = 
Utils.randomize(shuffleMergerLocations.values.toSeq)

Review comment:
   Yes! Thanks @mridulm 





This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] [spark] maropu commented on pull request #30341: [SPARK-33427][SQL] Add subexpression elimination for interpreted expression evaluation

2020-11-13 Thread GitBox


maropu commented on pull request #30341:
URL: https://github.com/apache/spark/pull/30341#issuecomment-727158698


   >  I think it should be added as a separate PR and merge the test PR first? 
Then we can run against with it to check if the result is the same. @maropu
   
   Yea, it looks fine. I'll review this again tonight and thanks for the update.



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] [spark] SparkQA commented on pull request #30341: [SPARK-33427][SQL] Add subexpression elimination for interpreted expression evaluation

2020-11-13 Thread GitBox


SparkQA commented on pull request #30341:
URL: https://github.com/apache/spark/pull/30341#issuecomment-727158547


   Kubernetes integration test starting
   URL: 
https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/35691/
   



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] [spark] viirya commented on a change in pull request #30341: [SPARK-33427][SQL] Add subexpression elimination for interpreted expression evaluation

2020-11-13 Thread GitBox


viirya commented on a change in pull request #30341:
URL: https://github.com/apache/spark/pull/30341#discussion_r523295714



##
File path: 
sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/EvaluationRuntime.scala
##
@@ -0,0 +1,88 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements.  See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License.  You may obtain a copy of the License at
+ *
+ *http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+package org.apache.spark.sql.catalyst.expressions
+
+import com.google.common.cache.{CacheBuilder, CacheLoader, LoadingCache}
+
+import org.apache.spark.sql.catalyst.InternalRow
+import org.apache.spark.sql.internal.SQLConf
+
+/**
+ * This class helps subexpression elimination for interpreted evaluation
+ * in `InterpretedUnsafeProjection`. It maintains an evaluation cache.
+ * This class wraps `ExpressionProxy` around given expressions. The 
`ExpressionProxy`
+ * intercepts expression evaluation and loads from the cache first.
+ */
+class EvaluationRuntime {
+
+  val cache: LoadingCache[ExpressionProxy, ResultProxy] = 
CacheBuilder.newBuilder()
+.maximumSize(SQLConf.get.subexpressionEliminationCacheMaxEntries)
+.build(
+  new CacheLoader[ExpressionProxy, ResultProxy]() {
+override def load(expr: ExpressionProxy): ResultProxy = {
+  ResultProxy(expr.proxyEval(currentInput))
+}
+  })
+
+  private var currentInput: InternalRow = null
+
+  /**
+   * Sets given input row as current row for evaluating expressions. This 
cleans up the cache
+   * too as new input comes.
+   */
+  def setInput(input: InternalRow = null): Unit = {
+currentInput = input
+cache.invalidateAll()
+  }
+
+  /**
+   * Finds subexpressions and wraps them with `ExpressionProxy`.
+   */
+  def proxyExpressions(expressions: Seq[Expression]): Seq[Expression] = {
+val equivalentExpressions: EquivalentExpressions = new 
EquivalentExpressions
+
+expressions.foreach(equivalentExpressions.addExprTree(_))
+
+var proxyMap = Map.empty[Expression, ExpressionProxy]
+
+val commonExprs = 
equivalentExpressions.getAllEquivalentExprs.filter(_.size > 1)
+commonExprs.foreach { e =>
+  val expr = e.head
+  val proxy = ExpressionProxy(expr, this)
+
+  proxyMap ++= e.map(_ -> proxy).toMap
+}
+
+// Only adding proxy if we find subexpressions.
+if (proxyMap.nonEmpty) {
+  expressions.map { expr =>
+// `transform` will cause stackoverflow because it keeps transforming 
into
+// `ExpressionProxy`. But we cannot use `transformUp` because we want 
to use
+// subexpressions at higher level. So we `transformDown` until finding 
first
+// subexpression.
+var transformed = false
+expr.transform {

Review comment:
   Good point! I think I missed this point. I will add a test too. 
(actually there is a test)





This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] [spark] HeartSaVioR commented on pull request #30347: [SPARK-33209][SS] Refactor unit test of stream-stream join in UnsupportedOperationsSuite

2020-11-13 Thread GitBox


HeartSaVioR commented on pull request #30347:
URL: https://github.com/apache/spark/pull/30347#issuecomment-727156640


   Hey sorry. Please give me to some more time to start looking into, as here's 
Sat. already. I'd probably take this in early next week. Thanks for 
understanding.



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] [spark] viirya commented on pull request #30341: [SPARK-33427][SQL] Add subexpression elimination for interpreted expression evaluation

2020-11-13 Thread GitBox


viirya commented on pull request #30341:
URL: https://github.com/apache/spark/pull/30341#issuecomment-727155028


   Hmm, actually for 
https://github.com/apache/spark/pull/30341#discussion_r522904967, that is to 
add some tests in `SQLQueryTestSuite` with `--CONFIG_DIM 
spark.sql.codegen.wholeStage=false,spark.sql.codegen.factoryMode=NO_CODEGEN`, I 
think it should be added as a separate PR and merge the test PR first? Then we 
can run against with it to check if the result is the same. @maropu 



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] [spark] viirya commented on pull request #30341: [SPARK-33427][SQL] Add subexpression elimination for interpreted expression evaluation

2020-11-13 Thread GitBox


viirya commented on pull request #30341:
URL: https://github.com/apache/spark/pull/30341#issuecomment-727154709


   I addressed most of comments except for 
https://github.com/apache/spark/pull/30341#discussion_r522822589 and 
https://github.com/apache/spark/pull/30341#discussion_r522904967.
   
   For https://github.com/apache/spark/pull/30341#discussion_r522822589, I will 
wait for confirmation.
   
   For https://github.com/apache/spark/pull/30341#discussion_r522904967, I will 
add some more tests later.
   
   



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] [spark] viirya commented on a change in pull request #30341: [SPARK-33427][SQL] Add subexpression elimination for interpreted expression evaluation

2020-11-13 Thread GitBox


viirya commented on a change in pull request #30341:
URL: https://github.com/apache/spark/pull/30341#discussion_r523384442



##
File path: 
sql/catalyst/src/test/scala/org/apache/spark/sql/catalyst/expressions/EvaluationRuntimeSuite.scala
##
@@ -0,0 +1,79 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements.  See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License.  You may obtain a copy of the License at
+ *
+ *http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+package org.apache.spark.sql.catalyst.expressions
+
+import org.apache.spark.SparkFunSuite
+
+class EvaluationRuntimeSuite extends SparkFunSuite {

Review comment:
   Okay, let me see how to add some tests for this.





This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] [spark] viirya commented on a change in pull request #30341: [SPARK-33427][SQL] Add subexpression elimination for interpreted expression evaluation

2020-11-13 Thread GitBox


viirya commented on a change in pull request #30341:
URL: https://github.com/apache/spark/pull/30341#discussion_r523384163



##
File path: 
sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/ExpressionProxy.scala
##
@@ -0,0 +1,57 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements.  See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License.  You may obtain a copy of the License at
+ *
+ *http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+package org.apache.spark.sql.catalyst.expressions
+
+import com.google.common.util.concurrent.{ExecutionError, 
UncheckedExecutionException}
+
+import org.apache.spark.sql.catalyst.InternalRow
+import org.apache.spark.sql.catalyst.expressions.codegen.{CodegenContext, 
ExprCode}
+import org.apache.spark.sql.types.DataType
+
+/**
+ * A proxy for an catalyst `Expression`. Given a runtime object 
`EvaluationRuntime`, when this
+ * is asked to evaluate, it will load the evaluation cache in the runtime 
first.
+ */
+case class ExpressionProxy(child: Expression, runtime: EvaluationRuntime) 
extends Expression {
+
+  final override def dataType: DataType = child.dataType
+  final override def nullable: Boolean = child.nullable
+  final override def children: Seq[Expression] = child :: Nil
+
+  // `ExpressionProxy` is for interpreted expression evaluation only. So 
cannot `doGenCode`.
+  final override protected def doGenCode(ctx: CodegenContext, ev: ExprCode): 
ExprCode =
+throw new UnsupportedOperationException(s"Cannot generate code for 
expression: $this")
+
+  def proxyEval(input: InternalRow = null): Any = {

Review comment:
   Removed.





This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] [spark] viirya commented on a change in pull request #30341: [SPARK-33427][SQL] Add subexpression elimination for interpreted expression evaluation

2020-11-13 Thread GitBox


viirya commented on a change in pull request #30341:
URL: https://github.com/apache/spark/pull/30341#discussion_r523384132



##
File path: 
sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/EvaluationRuntime.scala
##
@@ -0,0 +1,88 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements.  See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License.  You may obtain a copy of the License at
+ *
+ *http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+package org.apache.spark.sql.catalyst.expressions
+
+import com.google.common.cache.{CacheBuilder, CacheLoader, LoadingCache}
+
+import org.apache.spark.sql.catalyst.InternalRow
+import org.apache.spark.sql.internal.SQLConf
+
+/**
+ * This class helps subexpression elimination for interpreted evaluation
+ * in `InterpretedUnsafeProjection`. It maintains an evaluation cache.
+ * This class wraps `ExpressionProxy` around given expressions. The 
`ExpressionProxy`
+ * intercepts expression evaluation and loads from the cache first.
+ */
+class EvaluationRuntime {

Review comment:
   Renamed as `SubExprEvaluationRuntime` now.





This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] [spark] viirya commented on a change in pull request #30341: [SPARK-33427][SQL] Add subexpression elimination for interpreted expression evaluation

2020-11-13 Thread GitBox


viirya commented on a change in pull request #30341:
URL: https://github.com/apache/spark/pull/30341#discussion_r523384138



##
File path: 
sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/ExpressionProxy.scala
##
@@ -0,0 +1,57 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements.  See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License.  You may obtain a copy of the License at
+ *
+ *http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+package org.apache.spark.sql.catalyst.expressions
+
+import com.google.common.util.concurrent.{ExecutionError, 
UncheckedExecutionException}
+
+import org.apache.spark.sql.catalyst.InternalRow
+import org.apache.spark.sql.catalyst.expressions.codegen.{CodegenContext, 
ExprCode}
+import org.apache.spark.sql.types.DataType
+
+/**
+ * A proxy for an catalyst `Expression`. Given a runtime object 
`EvaluationRuntime`, when this
+ * is asked to evaluate, it will load the evaluation cache in the runtime 
first.
+ */
+case class ExpressionProxy(child: Expression, runtime: EvaluationRuntime) 
extends Expression {

Review comment:
   As a single file `SubExprEvaluationRuntime.scala` now.





This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] [spark] viirya commented on a change in pull request #30341: [SPARK-33427][SQL] Add subexpression elimination for interpreted expression evaluation

2020-11-13 Thread GitBox


viirya commented on a change in pull request #30341:
URL: https://github.com/apache/spark/pull/30341#discussion_r523384143



##
File path: 
sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/ExpressionProxy.scala
##
@@ -0,0 +1,57 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements.  See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License.  You may obtain a copy of the License at
+ *
+ *http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+package org.apache.spark.sql.catalyst.expressions
+
+import com.google.common.util.concurrent.{ExecutionError, 
UncheckedExecutionException}
+
+import org.apache.spark.sql.catalyst.InternalRow
+import org.apache.spark.sql.catalyst.expressions.codegen.{CodegenContext, 
ExprCode}
+import org.apache.spark.sql.types.DataType
+
+/**
+ * A proxy for an catalyst `Expression`. Given a runtime object 
`EvaluationRuntime`, when this
+ * is asked to evaluate, it will load the evaluation cache in the runtime 
first.
+ */
+case class ExpressionProxy(child: Expression, runtime: EvaluationRuntime) 
extends Expression {
+
+  final override def dataType: DataType = child.dataType
+  final override def nullable: Boolean = child.nullable
+  final override def children: Seq[Expression] = child :: Nil
+
+  // `ExpressionProxy` is for interpreted expression evaluation only. So 
cannot `doGenCode`.
+  final override protected def doGenCode(ctx: CodegenContext, ev: ExprCode): 
ExprCode =
+throw new UnsupportedOperationException(s"Cannot generate code for 
expression: $this")
+
+  def proxyEval(input: InternalRow = null): Any = {
+child.eval(input)
+  }
+
+  override def eval(input: InternalRow = null): Any = try {
+runtime.cache.get(this).result
+  } catch {
+// Cache.get() may wrap the original exception. See the following URL
+// 
http://google.github.io/guava/releases/14.0/api/docs/com/google/common/cache/
+//   Cache.html#get(K,%20java.util.concurrent.Callable)
+case e @ (_: UncheckedExecutionException | _: ExecutionError) =>

Review comment:
   Moved.





This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] [spark] SparkQA commented on pull request #30341: [SPARK-33427][SQL] Add subexpression elimination for interpreted expression evaluation

2020-11-13 Thread GitBox


SparkQA commented on pull request #30341:
URL: https://github.com/apache/spark/pull/30341#issuecomment-727154090


   **[Test build #131088 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/131088/testReport)**
 for PR 30341 at commit 
[`4780b65`](https://github.com/apache/spark/commit/4780b653f079c2fd4c950d43758587875d89e41a).



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] [spark] viirya commented on a change in pull request #30341: [SPARK-33427][SQL] Add subexpression elimination for interpreted expression evaluation

2020-11-13 Thread GitBox


viirya commented on a change in pull request #30341:
URL: https://github.com/apache/spark/pull/30341#discussion_r523383340



##
File path: 
sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/EvaluationRuntime.scala
##
@@ -0,0 +1,88 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements.  See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License.  You may obtain a copy of the License at
+ *
+ *http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+package org.apache.spark.sql.catalyst.expressions
+
+import com.google.common.cache.{CacheBuilder, CacheLoader, LoadingCache}
+
+import org.apache.spark.sql.catalyst.InternalRow
+import org.apache.spark.sql.internal.SQLConf
+
+/**
+ * This class helps subexpression elimination for interpreted evaluation
+ * in `InterpretedUnsafeProjection`. It maintains an evaluation cache.
+ * This class wraps `ExpressionProxy` around given expressions. The 
`ExpressionProxy`
+ * intercepts expression evaluation and loads from the cache first.
+ */
+class EvaluationRuntime {
+
+  val cache: LoadingCache[ExpressionProxy, ResultProxy] = 
CacheBuilder.newBuilder()
+.maximumSize(SQLConf.get.subexpressionEliminationCacheMaxEntries)
+.build(
+  new CacheLoader[ExpressionProxy, ResultProxy]() {
+override def load(expr: ExpressionProxy): ResultProxy = {
+  ResultProxy(expr.proxyEval(currentInput))
+}
+  })
+
+  private var currentInput: InternalRow = null
+
+  /**
+   * Sets given input row as current row for evaluating expressions. This 
cleans up the cache
+   * too as new input comes.
+   */
+  def setInput(input: InternalRow = null): Unit = {
+currentInput = input
+cache.invalidateAll()
+  }
+
+  /**
+   * Finds subexpressions and wraps them with `ExpressionProxy`.
+   */
+  def proxyExpressions(expressions: Seq[Expression]): Seq[Expression] = {
+val equivalentExpressions: EquivalentExpressions = new 
EquivalentExpressions
+
+expressions.foreach(equivalentExpressions.addExprTree(_))
+
+var proxyMap = Map.empty[Expression, ExpressionProxy]
+
+val commonExprs = 
equivalentExpressions.getAllEquivalentExprs.filter(_.size > 1)
+commonExprs.foreach { e =>
+  val expr = e.head
+  val proxy = ExpressionProxy(expr, this)
+
+  proxyMap ++= e.map(_ -> proxy).toMap
+}
+
+// Only adding proxy if we find subexpressions.
+if (proxyMap.nonEmpty) {
+  expressions.map { expr =>
+// `transform` will cause stackoverflow because it keeps transforming 
into
+// `ExpressionProxy`. But we cannot use `transformUp` because we want 
to use
+// subexpressions at higher level. So we `transformDown` until finding 
first
+// subexpression.
+var transformed = false

Review comment:
   Oh, good idea, I like it.





This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] [spark] dongjoon-hyun commented on a change in pull request #30358: [SPARK-33394][SQL] Throw `NoSuchNamespaceException` for not existing namespace in `InMemoryTableCatalog.listTables()`

2020-11-13 Thread GitBox


dongjoon-hyun commented on a change in pull request #30358:
URL: https://github.com/apache/spark/pull/30358#discussion_r523383165



##
File path: 
sql/core/src/test/scala/org/apache/spark/sql/execution/command/ShowTablesSuiteBase.scala
##
@@ -57,6 +58,13 @@ trait ShowTablesSuiteBase extends QueryTest with 
SQLTestUtils {
 }
   }
 
+  test("show table in a not existing namespace") {
+val msg = intercept[NoSuchNamespaceException] {

Review comment:
   Got it~, @MaxGekk .





This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] [spark] MaxGekk commented on a change in pull request #30358: [SPARK-33394][SQL] Throw `NoSuchNamespaceException` for not existing namespace in `InMemoryTableCatalog.listTables()`

2020-11-13 Thread GitBox


MaxGekk commented on a change in pull request #30358:
URL: https://github.com/apache/spark/pull/30358#discussion_r523382673



##
File path: 
sql/core/src/test/scala/org/apache/spark/sql/execution/command/ShowTablesSuiteBase.scala
##
@@ -57,6 +58,13 @@ trait ShowTablesSuiteBase extends QueryTest with 
SQLTestUtils {
 }
   }
 
+  test("show table in a not existing namespace") {
+val msg = intercept[NoSuchNamespaceException] {

Review comment:
   I didn't touch v1, it behaves as before. `NoSuchDatabaseException` is a 
sub-class of `NoSuchNamespaceException`. or do you mean the change in the test?





This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] [spark] c21 commented on pull request #30347: [SPARK-33209][SS] Refactor unit test of stream-stream join in UnsupportedOperationsSuite

2020-11-13 Thread GitBox


c21 commented on pull request #30347:
URL: https://github.com/apache/spark/pull/30347#issuecomment-727151821


   @HeartSaVioR - could you also help take a look when you have time? Thanks.



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] [spark] dongjoon-hyun commented on a change in pull request #30358: [SPARK-33394][SQL] Throw `NoSuchNamespaceException` for not existing namespace in `InMemoryTableCatalog.listTables()`

2020-11-13 Thread GitBox


dongjoon-hyun commented on a change in pull request #30358:
URL: https://github.com/apache/spark/pull/30358#discussion_r523382095



##
File path: 
sql/core/src/test/scala/org/apache/spark/sql/execution/command/ShowTablesSuiteBase.scala
##
@@ -57,6 +58,13 @@ trait ShowTablesSuiteBase extends QueryTest with 
SQLTestUtils {
 }
   }
 
+  test("show table in a not existing namespace") {
+val msg = intercept[NoSuchNamespaceException] {

Review comment:
   @cloud-fan and @MaxGekk .
   This means the behavior change for both v1 and v2, doesn't it?
   - `v1`: `NoSuchDatabaseException` -> `NoSuchNamespaceException`
   - `v2`: empty list -> `NoSuchNamespaceException`





This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] [spark] viirya commented on a change in pull request #30341: [SPARK-33427][SQL] Add subexpression elimination for interpreted expression evaluation

2020-11-13 Thread GitBox


viirya commented on a change in pull request #30341:
URL: https://github.com/apache/spark/pull/30341#discussion_r523381340



##
File path: 
sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/ExpressionProxy.scala
##
@@ -0,0 +1,57 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements.  See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License.  You may obtain a copy of the License at
+ *
+ *http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+package org.apache.spark.sql.catalyst.expressions
+
+import com.google.common.util.concurrent.{ExecutionError, 
UncheckedExecutionException}
+
+import org.apache.spark.sql.catalyst.InternalRow
+import org.apache.spark.sql.catalyst.expressions.codegen.{CodegenContext, 
ExprCode}
+import org.apache.spark.sql.types.DataType
+
+/**
+ * A proxy for an catalyst `Expression`. Given a runtime object 
`EvaluationRuntime`, when this
+ * is asked to evaluate, it will load the evaluation cache in the runtime 
first.
+ */
+case class ExpressionProxy(child: Expression, runtime: EvaluationRuntime) 
extends Expression {
+
+  final override def dataType: DataType = child.dataType
+  final override def nullable: Boolean = child.nullable
+  final override def children: Seq[Expression] = child :: Nil
+
+  // `ExpressionProxy` is for interpreted expression evaluation only. So 
cannot `doGenCode`.
+  final override protected def doGenCode(ctx: CodegenContext, ev: ExprCode): 
ExprCode =
+throw new UnsupportedOperationException(s"Cannot generate code for 
expression: $this")
+
+  def proxyEval(input: InternalRow = null): Any = {
+child.eval(input)
+  }
+
+  override def eval(input: InternalRow = null): Any = try {
+runtime.cache.get(this).result

Review comment:
   I think it should be fine but want to make sure I get your point 
correctly.





This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] [spark] BryanCutler commented on a change in pull request #30309: [SPARK-33407][PYTHON] Simplify the exception message from Python UDFs (disabled by default)

2020-11-13 Thread GitBox


BryanCutler commented on a change in pull request #30309:
URL: https://github.com/apache/spark/pull/30309#discussion_r523380710



##
File path: python/pyspark/util.py
##
@@ -75,6 +79,144 @@ def wrapper(*args, **kwargs):
 return wrapper
 
 
+def walk_tb(tb):
+while tb is not None:
+yield tb
+tb = tb.tb_next
+
+
+def try_simplify_traceback(tb):
+"""
+Simplify the traceback. It removes the tracebacks in the current package, 
and only
+shows the traceback that is related to the thirdparty and user-specified 
codes.
+
+Returns
+---
+TracebackType or None
+  Simplified traceback instance. It returns None if it fails to simplify.
+
+Notes
+-
+This keeps the tracebacks once it sees they are from a different file even
+though the following tracebacks are from the current package.
+
+Examples
+
+>>> import importlib
+>>> import sys
+>>> import traceback
+>>> import tempfile
+>>> with tempfile.TemporaryDirectory() as tmp_dir:
+... with open("%s/dummy_module.py" % tmp_dir, "w") as f:
+... _ = f.write(
+... 'def raise_stop_iteration():\\n'
+... 'raise StopIteration()\\n\\n'
+... 'def simple_wrapper(f):\\n'
+... 'def wrapper(*a, **k):\\n'
+... 'return f(*a, **k)\\n'
+... 'return wrapper\\n')
+... f.flush()
+... spec = importlib.util.spec_from_file_location(
+... "dummy_module", "%s/dummy_module.py" % tmp_dir)
+... dummy_module = importlib.util.module_from_spec(spec)
+... spec.loader.exec_module(dummy_module)
+>>> def skip_doctest_traceback(tb):
+... import pyspark
+... root = os.path.dirname(pyspark.__file__)
+... pairs = zip(walk_tb(tb), traceback.extract_tb(tb))
+... for cur_tb, cur_frame in pairs:
+... if cur_frame.filename.startswith(root):
+... return cur_tb
+
+Regular exceptions should show the file name of the current package as 
below.
+
+>>> exc_info = None
+>>> try:
+...fail_on_stopiteration(dummy_module.raise_stop_iteration)()

Review comment:
   indentation off here?

##
File path: python/pyspark/util.py
##
@@ -20,6 +20,10 @@
 import re
 import sys
 import traceback
+import types
+import os
+import platform
+import itertools

Review comment:
   nit: imports not in alphabetical order - not necessary though

##
File path: python/pyspark/util.py
##
@@ -75,6 +79,144 @@ def wrapper(*args, **kwargs):
 return wrapper
 
 
+def walk_tb(tb):
+while tb is not None:
+yield tb
+tb = tb.tb_next
+
+
+def try_simplify_traceback(tb):
+"""
+Simplify the traceback. It removes the tracebacks in the current package, 
and only
+shows the traceback that is related to the thirdparty and user-specified 
codes.
+
+Returns
+---
+TracebackType or None
+  Simplified traceback instance. It returns None if it fails to simplify.
+
+Notes
+-
+This keeps the tracebacks once it sees they are from a different file even
+though the following tracebacks are from the current package.
+
+Examples
+
+>>> import importlib
+>>> import sys
+>>> import traceback
+>>> import tempfile
+>>> with tempfile.TemporaryDirectory() as tmp_dir:
+... with open("%s/dummy_module.py" % tmp_dir, "w") as f:
+... _ = f.write(
+... 'def raise_stop_iteration():\\n'
+... 'raise StopIteration()\\n\\n'
+... 'def simple_wrapper(f):\\n'
+... 'def wrapper(*a, **k):\\n'
+... 'return f(*a, **k)\\n'
+... 'return wrapper\\n')
+... f.flush()
+... spec = importlib.util.spec_from_file_location(
+... "dummy_module", "%s/dummy_module.py" % tmp_dir)
+... dummy_module = importlib.util.module_from_spec(spec)
+... spec.loader.exec_module(dummy_module)
+>>> def skip_doctest_traceback(tb):
+... import pyspark
+... root = os.path.dirname(pyspark.__file__)
+... pairs = zip(walk_tb(tb), traceback.extract_tb(tb))
+... for cur_tb, cur_frame in pairs:
+... if cur_frame.filename.startswith(root):
+... return cur_tb
+
+Regular exceptions should show the file name of the current package as 
below.
+
+>>> exc_info = None
+>>> try:
+...fail_on_stopiteration(dummy_module.raise_stop_iteration)()
+... except Exception as e:
+... tb = sys.exc_info()[-1]
+... e.__cause__ = None
+... exc_info = "".join(
+... traceback.format_exception(type(e), e, tb))
+>>> print(exc_info)  # doctest: +NORMALIZE_WHITESPACE, +ELLIPSIS
+Traceback 

[GitHub] [spark] AmplabJenkins removed a comment on pull request #30139: [SPARK-31069][CORE] Avoid repeat compute `chunksBeingTransferred` cause hight cpu cost in external shuffle service when `maxCh

2020-11-13 Thread GitBox


AmplabJenkins removed a comment on pull request #30139:
URL: https://github.com/apache/spark/pull/30139#issuecomment-727147876







This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] [spark] viirya commented on a change in pull request #30309: [SPARK-33407][PYTHON] Simplify the exception message from Python UDFs (disabled by default)

2020-11-13 Thread GitBox


viirya commented on a change in pull request #30309:
URL: https://github.com/apache/spark/pull/30309#discussion_r523380970



##
File path: python/pyspark/util.py
##
@@ -177,6 +319,8 @@ def __del__(self):
 
 if __name__ == "__main__":
 import doctest
-(failure_count, test_count) = doctest.testmod()
-if failure_count:
-sys.exit(-1)
+
+if "pypy" not in platform.python_implementation().lower() and 
sys.version_info[:2] >= (3, 7):
+(failure_count, test_count) = doctest.testmod()

Review comment:
   Does this mean we also skip other doctest in `python/pyspark/util.py`?





This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] [spark] HeartSaVioR commented on pull request #30210: [SPARK-33259][SS] Disable streaming query with possible correctness issue by default

2020-11-13 Thread GitBox


HeartSaVioR commented on pull request #30210:
URL: https://github.com/apache/spark/pull/30210#issuecomment-727149697


   I just initiated the discussion on dev@ mailing list which I should have 
been done instead.
   
https://lists.apache.org/thread.html/r30069e17f59e8d29267ae296d56840970905476019023f20164ee5a3%40%3Cdev.spark.apache.org%3E
   
   My apologize to make noises and feel anyone unhappy.



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] [spark] dongjoon-hyun removed a comment on pull request #30210: [SPARK-33259][SS] Disable streaming query with possible correctness issue by default

2020-11-13 Thread GitBox


dongjoon-hyun removed a comment on pull request #30210:
URL: https://github.com/apache/spark/pull/30210#issuecomment-727125565







This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] [spark] HeartSaVioR removed a comment on pull request #30210: [SPARK-33259][SS] Disable streaming query with possible correctness issue by default

2020-11-13 Thread GitBox


HeartSaVioR removed a comment on pull request #30210:
URL: https://github.com/apache/spark/pull/30210#issuecomment-727138220


   I was feeling odd and became feeling upset because my intention wasn't 
change from the first comment and the intention was disregarded (at least 
that's what I felt like) for a couple of early replies, even I emphasized 
whether you checked my comments were addressed or not is not the point of 
concern here. 
   
   The replies were simply based on whether there was technically any fault or 
not, and in addition, trying to interpret my action on your view and framed me. 
I never said my approval is required because I'm domain expert of SS. I 
mentioned twos including me because twos are only the reviewers who left 
"actual" review comments, not saying we're dealing with SS hence we are 
gatekeeper. This is unchanged from the first comment and I never said about SS 
in my comments. Same happened for the statement I'm framing the Apache Spark 
community be the negative. I'm going to be a bad guy based on what I even 
didn't mention. How I can't be upset?
   
   Can we make sure we carefully read the comment and try not to interpret and 
just ask if we're unsure about the intention?
   
   Other than that I feel this is a waste of time. I have been trying to 
explain the committer's power to say "technically correct" is not always 
representing "we are doing good", but I also admit it's super hard to 
reconsider the practice we have been using for a while, and I feel the air that 
we're reluctant to even think about it. At least I shouldn't deal with 
individual, as it's against my voice as well and that feels someone to be 
educated. My apologize again.



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] [spark] HeartSaVioR removed a comment on pull request #30210: [SPARK-33259][SS] Disable streaming query with possible correctness issue by default

2020-11-13 Thread GitBox


HeartSaVioR removed a comment on pull request #30210:
URL: https://github.com/apache/spark/pull/30210#issuecomment-727120756







This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] [spark] HyukjinKwon removed a comment on pull request #30210: [SPARK-33259][SS] Disable streaming query with possible correctness issue by default

2020-11-13 Thread GitBox


HyukjinKwon removed a comment on pull request #30210:
URL: https://github.com/apache/spark/pull/30210#issuecomment-726476089







This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] [spark] HeartSaVioR removed a comment on pull request #30210: [SPARK-33259][SS] Disable streaming query with possible correctness issue by default

2020-11-13 Thread GitBox


HeartSaVioR removed a comment on pull request #30210:
URL: https://github.com/apache/spark/pull/30210#issuecomment-726418896


   That's not my point. There's no indication I have produced all review 
comments (while actually I produced all review comments), and review comment 
author would be the right one to verify whether the review comments are 
addressed as suggested. I planned to review the new change again today but it's 
already merged.
   
   I see we tend to ask reviewers to see whether they're happy with the last 
change before merging. At least we "were", and I tend to ensure that, unless 
reviewers are not responded for days. I'd hope we ensure everyone is happy with 
the change before merging, otherwise discuss/debate if necessary.



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] [spark] dongjoon-hyun removed a comment on pull request #30210: [SPARK-33259][SS] Disable streaming query with possible correctness issue by default

2020-11-13 Thread GitBox


dongjoon-hyun removed a comment on pull request #30210:
URL: https://github.com/apache/spark/pull/30210#issuecomment-726811877


   @HeartSaVioR . It seems that you tend to frame the Apache Spark community in 
negative ways. For your examples, it clearly looks like a straw-man attack. All 
of us didn't claim such funny things.
   
   Putting them aside, I want to say that I have been appreciate your 
contributions really, especially in this area, `Structural Streaming` because 
this area really needs lots of more care from the community. To be 
constructive, it would be great if 
   - You can let us know what was your un-address comments in this PR  
(https://github.com/apache/spark/pull/30210#issuecomment-726413507)
   - You can do review after commits. Please do it now instead of stopping like 
this. Apache Spark community is widely open for that and I am also happy to 
accept any comments at any timeframe.
   - You can reply on the corresponding consensus email threads about this PR 
(which we already did based on your advice on JIRA and here) if you changed 
your mind. We are open for that, too.
   - You can raise another email thread about your specific Spark practice 
requests like you did before. I was also happy about your issue raising before 
and will be at this time again.
   
   I believe both of us want to collaborate in a productive way in this area, 
@HeartSaVioR .



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] [spark] dongjoon-hyun removed a comment on pull request #30210: [SPARK-33259][SS] Disable streaming query with possible correctness issue by default

2020-11-13 Thread GitBox


dongjoon-hyun removed a comment on pull request #30210:
URL: https://github.com/apache/spark/pull/30210#issuecomment-726413507







This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] [spark] HeartSaVioR removed a comment on pull request #30210: [SPARK-33259][SS] Disable streaming query with possible correctness issue by default

2020-11-13 Thread GitBox


HeartSaVioR removed a comment on pull request #30210:
URL: https://github.com/apache/spark/pull/30210#issuecomment-726410128


   (I hope the PR can wait for reviewers' approvals who left valid review 
comments. I understand the PR can be merged "technically", but doesn't seem to 
be a good practice. If in doubt w.r.t reviewers being inactive, we can at least 
wait for a day based on timezone difference.)



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] [spark] HeartSaVioR removed a comment on pull request #30210: [SPARK-33259][SS] Disable streaming query with possible correctness issue by default

2020-11-13 Thread GitBox


HeartSaVioR removed a comment on pull request #30210:
URL: https://github.com/apache/spark/pull/30210#issuecomment-726428939


   No that's not also my point. I don't claim about the domain owner (but in 
practice I see there's implicit domain owner). You can merge the PR in SS area 
without me if I didn't review.
   
   The concern is that I was the one of active reviewers and you're just 
assuming my review comments were addressed without my confirmation. Both my and 
@xuanyuanking's comments. I don't require others to read my mind, and anyone 
can't read my mind, so there should be the process of "confirmation". That 
would just bring delay of merging a day or so, wouldn't hurt anything.



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] [spark] HeartSaVioR removed a comment on pull request #30210: [SPARK-33259][SS] Disable streaming query with possible correctness issue by default

2020-11-13 Thread GitBox


HeartSaVioR removed a comment on pull request #30210:
URL: https://github.com/apache/spark/pull/30210#issuecomment-726569827


   I'd rather avoid the chance of "post-review" whenever possible, but I'd 
admit everyone has different thoughts. I'm OK with it, and if that's considered 
here (and no one would be disappointed from post-review), I was clearly wrong. 
My bad.
   
   I hope we prefer to say about "good practice" of doing something instead of 
saying "whether someone is the right to do something". Everything is 
technically possible. e.g. I don't think Spark has some sort of BYLAWS which 
prevents committers to self +1 and push, but I don't leverage it even I still 
have PRs sitting around years which got LGTM from contributors, not just from 
committers. It's just because it's a bad practice.
   
   Agree to disagree about the different understanding of "details" about good 
practice of reviewing is OK (because I may be wrong), but I'd be unhappy if we 
keep saying about the "right". This says I should always start with explicitly 
vote -1 whenever I start reviewing, which is the only way about the claim of 
"my right" of reviewer, but no one would do this as we have a belief and know 
about the huge negative effect on putting veto. And sure I don't change my 
reviewing practice to start from -1. (That's gonna be just funny.)



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] [spark] HeartSaVioR removed a comment on pull request #30210: [SPARK-33259][SS] Disable streaming query with possible correctness issue by default

2020-11-13 Thread GitBox


HeartSaVioR removed a comment on pull request #30210:
URL: https://github.com/apache/spark/pull/30210#issuecomment-726433238


   What do you say? I don't say you're missing the comment. I say the 
confirmation is better to be done by the reviewers who review, not from some 
other one. The confirmation would only take a day or so (due to the timezone) 
if the reviewers are quite active. Even you can say "I'll merge in tomorrow if 
there's no further feedback." to force the delay to a day.
   
   You're emphasizing 2 weeks but I can enumerate the PRs which waits for a 
half of year or even years. The code change is quite small (because the 
detection logic was there), but given it's breaking backward compatibility, 
it's not odd request to make a consensus. If the default is false probably we 
rather concern less. 



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] [spark] HeartSaVioR commented on pull request #30210: [SPARK-33259][SS] Disable streaming query with possible correctness issue by default

2020-11-13 Thread GitBox


HeartSaVioR commented on pull request #30210:
URL: https://github.com/apache/spark/pull/30210#issuecomment-727149286


   Sorry to all for all noises. Please disregard all conversation. I'll remove 
them now.



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] [spark] AmplabJenkins commented on pull request #30139: [SPARK-31069][CORE] Avoid repeat compute `chunksBeingTransferred` cause hight cpu cost in external shuffle service when `maxChunksBein

2020-11-13 Thread GitBox


AmplabJenkins commented on pull request #30139:
URL: https://github.com/apache/spark/pull/30139#issuecomment-727147876







This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] [spark] SparkQA removed a comment on pull request #30139: [SPARK-31069][CORE] Avoid repeat compute `chunksBeingTransferred` cause hight cpu cost in external shuffle service when `maxChunksBe

2020-11-13 Thread GitBox


SparkQA removed a comment on pull request #30139:
URL: https://github.com/apache/spark/pull/30139#issuecomment-727132757


   **[Test build #131087 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/131087/testReport)**
 for PR 30139 at commit 
[`fab5557`](https://github.com/apache/spark/commit/fab5557e1e0a0f4b78868ab98e55b05b997864b2).



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] [spark] SparkQA commented on pull request #30139: [SPARK-31069][CORE] Avoid repeat compute `chunksBeingTransferred` cause hight cpu cost in external shuffle service when `maxChunksBeingTrans

2020-11-13 Thread GitBox


SparkQA commented on pull request #30139:
URL: https://github.com/apache/spark/pull/30139#issuecomment-727147710


   **[Test build #131087 has 
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/131087/testReport)**
 for PR 30139 at commit 
[`fab5557`](https://github.com/apache/spark/commit/fab5557e1e0a0f4b78868ab98e55b05b997864b2).
* This patch passes all tests.
* This patch merges cleanly.
* This patch adds no public classes.



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] [spark] AmplabJenkins removed a comment on pull request #30374: [WIP][SPARK-33444][ML] Added support for Initial model in Gaussian Mixture Model in ML

2020-11-13 Thread GitBox


AmplabJenkins removed a comment on pull request #30374:
URL: https://github.com/apache/spark/pull/30374#issuecomment-727147056


   Can one of the admins verify this patch?



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] [spark] AmplabJenkins commented on pull request #30374: [WIP][SPARK-33444][ML] Added support for Initial model in Gaussian Mixture Model in ML

2020-11-13 Thread GitBox


AmplabJenkins commented on pull request #30374:
URL: https://github.com/apache/spark/pull/30374#issuecomment-727147154


   Can one of the admins verify this patch?



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] [spark] shahar603 opened a new pull request #30374: [WIP][SPARK-33444][ML] Added support for Initial model in Gaussian Mixture Model in ML

2020-11-13 Thread GitBox


shahar603 opened a new pull request #30374:
URL: https://github.com/apache/spark/pull/30374


   ### What changes were proposed in this pull request?
   
   Added an optional `initialModel` in the GaussianMixture class.
   
   ### Why are the changes needed?
   
   To allow for non random initialization in ML's Gaussian Mixture. The 
algorithm's results can be affected due to the initialization and the existing 
random initialization may be limiting in some cases. This feature exists in 
MLLIB. 
   
   ### Does this PR introduce _any_ user-facing change?
   
   * A new method: `setInitialModel` in the `GaussianMixture` class
   
   
   ### How was this patch tested?
   
   Yes, tests were added to the `GaussianMixtureSuite.scala` file.
   The tests test:
   * The ability to add the model
   * The fact the GaussianMixture object fit data using the initial parameters
   * That the GaussianMixture class validates that `K` is the same to the 
initial model's `K`
   



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] [spark] AmplabJenkins commented on pull request #30374: [WIP][SPARK-33444][ML] Added support for Initial model in Gaussian Mixture Model in ML

2020-11-13 Thread GitBox


AmplabJenkins commented on pull request #30374:
URL: https://github.com/apache/spark/pull/30374#issuecomment-727147056


   Can one of the admins verify this patch?



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] [spark] dongjoon-hyun commented on pull request #30210: [SPARK-33259][SS] Disable streaming query with possible correctness issue by default

2020-11-13 Thread GitBox


dongjoon-hyun commented on pull request #30210:
URL: https://github.com/apache/spark/pull/30210#issuecomment-727146418


   Can we talk over the phone? It would be better for us.



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] [spark] AmplabJenkins removed a comment on pull request #29414: [SPARK-32106][SQL] Implement script transform in sql/core

2020-11-13 Thread GitBox


AmplabJenkins removed a comment on pull request #29414:
URL: https://github.com/apache/spark/pull/29414#issuecomment-727143194







This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] [spark] AmplabJenkins commented on pull request #29414: [SPARK-32106][SQL] Implement script transform in sql/core

2020-11-13 Thread GitBox


AmplabJenkins commented on pull request #29414:
URL: https://github.com/apache/spark/pull/29414#issuecomment-727143194







This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] [spark] SparkQA removed a comment on pull request #29414: [SPARK-32106][SQL] Implement script transform in sql/core

2020-11-13 Thread GitBox


SparkQA removed a comment on pull request #29414:
URL: https://github.com/apache/spark/pull/29414#issuecomment-727093834


   **[Test build #131086 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/131086/testReport)**
 for PR 29414 at commit 
[`5b41fbc`](https://github.com/apache/spark/commit/5b41fbc028620a289cb8bc853e8de5861428da87).



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] [spark] SparkQA commented on pull request #29414: [SPARK-32106][SQL] Implement script transform in sql/core

2020-11-13 Thread GitBox


SparkQA commented on pull request #29414:
URL: https://github.com/apache/spark/pull/29414#issuecomment-727143021


   **[Test build #131086 has 
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/131086/testReport)**
 for PR 29414 at commit 
[`5b41fbc`](https://github.com/apache/spark/commit/5b41fbc028620a289cb8bc853e8de5861428da87).
* This patch passes all tests.
* This patch merges cleanly.
* This patch adds no public classes.



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] [spark] AmplabJenkins removed a comment on pull request #30373: [SPARK-33183][SQL][FOLLOW-UP] Adjust RemoveRedundantSorts rule order and update config version

2020-11-13 Thread GitBox


AmplabJenkins removed a comment on pull request #30373:
URL: https://github.com/apache/spark/pull/30373#issuecomment-727142229







This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] [spark] AmplabJenkins commented on pull request #30373: [SPARK-33183][SQL][FOLLOW-UP] Adjust RemoveRedundantSorts rule order and update config version

2020-11-13 Thread GitBox


AmplabJenkins commented on pull request #30373:
URL: https://github.com/apache/spark/pull/30373#issuecomment-727142229







This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] [spark] SparkQA removed a comment on pull request #30373: [SPARK-33183][SQL][FOLLOW-UP] Adjust RemoveRedundantSorts rule order and update config version

2020-11-13 Thread GitBox


SparkQA removed a comment on pull request #30373:
URL: https://github.com/apache/spark/pull/30373#issuecomment-727092021


   **[Test build #131085 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/131085/testReport)**
 for PR 30373 at commit 
[`e6b0bb3`](https://github.com/apache/spark/commit/e6b0bb301d928ce8e9970ec8b887ff06e3110b6d).



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] [spark] SparkQA commented on pull request #30373: [SPARK-33183][SQL][FOLLOW-UP] Adjust RemoveRedundantSorts rule order and update config version

2020-11-13 Thread GitBox


SparkQA commented on pull request #30373:
URL: https://github.com/apache/spark/pull/30373#issuecomment-727142042


   **[Test build #131085 has 
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/131085/testReport)**
 for PR 30373 at commit 
[`e6b0bb3`](https://github.com/apache/spark/commit/e6b0bb301d928ce8e9970ec8b887ff06e3110b6d).
* This patch passes all tests.
* This patch merges cleanly.
* This patch adds no public classes.



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] [spark] AmplabJenkins removed a comment on pull request #30139: [SPARK-31069][CORE] Avoid repeat compute `chunksBeingTransferred` cause hight cpu cost in external shuffle service when `maxCh

2020-11-13 Thread GitBox


AmplabJenkins removed a comment on pull request #30139:
URL: https://github.com/apache/spark/pull/30139#issuecomment-727140308


   Test FAILed.
   Refer to this link for build results (access rights to CI server needed): 
   
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder-K8s/35690/
   Test FAILed.



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] [spark] AmplabJenkins removed a comment on pull request #30139: [SPARK-31069][CORE] Avoid repeat compute `chunksBeingTransferred` cause hight cpu cost in external shuffle service when `maxCh

2020-11-13 Thread GitBox


AmplabJenkins removed a comment on pull request #30139:
URL: https://github.com/apache/spark/pull/30139#issuecomment-727140306


   Merged build finished. Test FAILed.



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] [spark] AmplabJenkins commented on pull request #30139: [SPARK-31069][CORE] Avoid repeat compute `chunksBeingTransferred` cause hight cpu cost in external shuffle service when `maxChunksBein

2020-11-13 Thread GitBox


AmplabJenkins commented on pull request #30139:
URL: https://github.com/apache/spark/pull/30139#issuecomment-727140306







This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] [spark] SparkQA commented on pull request #30139: [SPARK-31069][CORE] Avoid repeat compute `chunksBeingTransferred` cause hight cpu cost in external shuffle service when `maxChunksBeingTrans

2020-11-13 Thread GitBox


SparkQA commented on pull request #30139:
URL: https://github.com/apache/spark/pull/30139#issuecomment-727140293


   Kubernetes integration test status failure
   URL: 
https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/35690/
   



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] [spark] HeartSaVioR edited a comment on pull request #30210: [SPARK-33259][SS] Disable streaming query with possible correctness issue by default

2020-11-13 Thread GitBox


HeartSaVioR edited a comment on pull request #30210:
URL: https://github.com/apache/spark/pull/30210#issuecomment-727138220


   I was feeling odd and became feeling upset because my intention wasn't 
change from the first comment and the intention was disregarded (at least 
that's what I felt like) for a couple of early replies, even I emphasized 
whether you checked my comments were addressed or not is not the point of 
concern here. 
   
   The replies were simply based on whether there was technically any fault or 
not, and in addition, trying to interpret my action on your view and framed me. 
I never said my approval is required because I'm domain expert of SS. I 
mentioned twos including me because twos are only the reviewers who left 
"actual" review comments, not saying we're dealing with SS hence we are 
gatekeeper. This is unchanged from the first comment and I never said about SS 
in my comments. Same happened for the statement I'm framing the Apache Spark 
community be the negative. I'm going to be a bad guy based on what I even 
didn't mention. How I can't be upset?
   
   Can we make sure we carefully read the comment and try not to interpret and 
just ask if we're unsure about the intention?
   
   Other than that I feel this is a waste of time. I have been trying to 
explain the committer's power to say "technically correct" is not always 
representing "we are doing good", but I also admit it's super hard to 
reconsider the practice we have been using for a while, and I feel the air that 
we're reluctant to even think about it. At least I shouldn't deal with 
individual, as it's against my voice as well and that feels someone to be 
educated. My apologize again.



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] [spark] HeartSaVioR commented on pull request #30210: [SPARK-33259][SS] Disable streaming query with possible correctness issue by default

2020-11-13 Thread GitBox


HeartSaVioR commented on pull request #30210:
URL: https://github.com/apache/spark/pull/30210#issuecomment-727138220


   I was feeling odd and became feeling upset because my intention wasn't 
change from the first comment and the intention was disregarded (at least 
that's what I felt like) for a couple of early replies, even I emphasized 
whether you checked my comments were addressed or not is not the point of 
concern here. 
   
   The replies were simply based on whether there was any fault or not, and in 
addition, trying to interpret my action on your view and framed me. I never 
said my approval is required because I'm domain expert of SS. I mentioned twos 
including me because twos are only the reviewers who left "actual" review 
comments, not saying we're dealing with SS hence we are gatekeeper. This is 
unchanged from the first comment and I never said about SS in my comments. Same 
happened for the statement I'm framing the Apache Spark community be the 
negative. I'm going to be a bad guy based on what I even didn't mention. How I 
can't be upset?
   
   Can we make sure we carefully read the comment and try not to interpret and 
just ask if we're unsure about the intention?
   
   Other than that I feel this is a waste of time. I have been trying to 
explain the committer's power to say "technically correct" is not always 
representing "we are doing good", but I also admit it's super hard to 
reconsider the practice we have been using for a while, and I feel the air that 
we're reluctant to even think about it. At least I shouldn't deal with 
individual, as it's against my voice as well and that feels someone to be 
educated. My apologize again.



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] [spark] SparkQA commented on pull request #30139: [SPARK-31069][CORE] Avoid repeat compute `chunksBeingTransferred` cause hight cpu cost in external shuffle service when `maxChunksBeingTrans

2020-11-13 Thread GitBox


SparkQA commented on pull request #30139:
URL: https://github.com/apache/spark/pull/30139#issuecomment-727137606


   Kubernetes integration test starting
   URL: 
https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/35690/
   



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] [spark] mridulm commented on pull request #30370: [SPARK-33446][CORE] Add config spark.executor.coresOverhead

2020-11-13 Thread GitBox


mridulm commented on pull request #30370:
URL: https://github.com/apache/spark/pull/30370#issuecomment-727133352


   > I want an executor with 2 cores and 6 gb, but only 1 core used for task 
allocation, which means at most 1 task could be running on this executor. If I 
use existing configs, there would be at most 2 tasks. I want to give each task 
6GB memory, but also use extra cpu time for gc or other things.
   
   GC is handled by the VM - we dont need an additional core for it.
   Without a good usecase, I am not in favor of adding additional 
configurations.
   
   Having said that, if the requirement is strictly what you mentioned : 
allocate two cores and 6gb, run only 1 task and leave 1 core around for "other 
things" (let me assume some background processing ? Not sure what is happening) 
: you can use spark.task.cpus=2 
   



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] [spark] manuzhang commented on pull request #29797: [SPARK-32932][SQL] Do not use local shuffle reader at final stage on write command

2020-11-13 Thread GitBox


manuzhang commented on pull request #29797:
URL: https://github.com/apache/spark/pull/29797#issuecomment-727132781


   @maryannxue thanks for pointing out the fundamental problems while I'm not 
familiar with the design behind. When starting out, I was trying to solve the 
most urgent issue in our use cases, too many output files caused by LSR, which 
broke the downstream jobs on the pipeline. Meanwhile, disabling LSR entirely 
downgrades the performance of BHJ switched from SMJ. Hence, this PR with help 
of @cloud-fan. 
   
   > But again, this is a "repartition" issue rather than the write command 
issue.  Introducing this kind of hack in AQE plan does not make sense.
   1. Is that ok or even good if repartition is optimized away in cases other 
than on write command ?
   2. Moving forward, what's your suggestion ? Do you want me revert this PR ?
   



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] [spark] SparkQA commented on pull request #30139: [SPARK-31069][CORE] Avoid repeat compute `chunksBeingTransferred` cause hight cpu cost in external shuffle service when `maxChunksBeingTrans

2020-11-13 Thread GitBox


SparkQA commented on pull request #30139:
URL: https://github.com/apache/spark/pull/30139#issuecomment-727132757


   **[Test build #131087 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/131087/testReport)**
 for PR 30139 at commit 
[`fab5557`](https://github.com/apache/spark/commit/fab5557e1e0a0f4b78868ab98e55b05b997864b2).



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] [spark] AmplabJenkins removed a comment on pull request #30164: [SPARK-32919][SHUFFLE][test-maven][test-hadoop2.7] Driver side changes for coordinating push based shuffle by selecting externa

2020-11-13 Thread GitBox


AmplabJenkins removed a comment on pull request #30164:
URL: https://github.com/apache/spark/pull/30164#issuecomment-727123181


   Test FAILed.
   Refer to this link for build results (access rights to CI server needed): 
   https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/131082/
   Test FAILed.



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] [spark] SparkQA removed a comment on pull request #30164: [SPARK-32919][SHUFFLE][test-maven][test-hadoop2.7] Driver side changes for coordinating push based shuffle by selecting external shuf

2020-11-13 Thread GitBox


SparkQA removed a comment on pull request #30164:
URL: https://github.com/apache/spark/pull/30164#issuecomment-726987157


   **[Test build #131082 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/131082/testReport)**
 for PR 30164 at commit 
[`9ba4dfb`](https://github.com/apache/spark/commit/9ba4dfbed66fab7badf279ce3e83d4db569d61e2).



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] [spark] AmplabJenkins removed a comment on pull request #30164: [SPARK-32919][SHUFFLE][test-maven][test-hadoop2.7] Driver side changes for coordinating push based shuffle by selecting externa

2020-11-13 Thread GitBox


AmplabJenkins removed a comment on pull request #30164:
URL: https://github.com/apache/spark/pull/30164#issuecomment-727122853







This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] [spark] AngersZhuuuu commented on a change in pull request #30139: [SPARK-31069][CORE] Avoid repeat compute `chunksBeingTransferred` cause hight cpu cost in external shuffle service when `ma

2020-11-13 Thread GitBox


AngersZh commented on a change in pull request #30139:
URL: https://github.com/apache/spark/pull/30139#discussion_r523364609



##
File path: 
common/network-common/src/main/java/org/apache/spark/network/server/ChunkFetchRequestHandler.java
##
@@ -88,12 +88,14 @@ public void processFetchRequest(
   logger.trace("Received req from {} to fetch block {}", 
getRemoteAddress(channel),
 msg.streamChunkId);
 }
-long chunksBeingTransferred = streamManager.chunksBeingTransferred();
-if (chunksBeingTransferred >= maxChunksBeingTransferred) {
-  logger.warn("The number of chunks being transferred {} is above {}, 
close the connection.",
-chunksBeingTransferred, maxChunksBeingTransferred);
-  channel.close();
-  return;
+if (maxChunksBeingTransferred < Long.MAX_VALUE) {
+  long chunksBeingTransferred = streamManager.chunksBeingTransferred();
+  if (chunksBeingTransferred >= maxChunksBeingTransferred) {
+logger.warn("The number of chunks being transferred {} is above {}, 
close the connection.",
+chunksBeingTransferred, maxChunksBeingTransferred);

Review comment:
   > Nit: indentation should be 2
   
   Done

##
File path: 
common/network-common/src/main/java/org/apache/spark/network/server/TransportRequestHandler.java
##
@@ -124,12 +124,14 @@ private void processStreamRequest(final StreamRequest 
req) {
 req.streamId);
 }
 
-long chunksBeingTransferred = streamManager.chunksBeingTransferred();
-if (chunksBeingTransferred >= maxChunksBeingTransferred) {
-  logger.warn("The number of chunks being transferred {} is above {}, 
close the connection.",
-chunksBeingTransferred, maxChunksBeingTransferred);
-  channel.close();
-  return;
+if (maxChunksBeingTransferred < Long.MAX_VALUE) {
+  long chunksBeingTransferred = streamManager.chunksBeingTransferred();
+  if (chunksBeingTransferred >= maxChunksBeingTransferred) {
+logger.warn("The number of chunks being transferred {} is above {}, 
close the connection.",
+chunksBeingTransferred, maxChunksBeingTransferred);

Review comment:
   > Nit: indentation should be 2 spaces
   
   Done





This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] [spark] dongjoon-hyun edited a comment on pull request #30210: [SPARK-33259][SS] Disable streaming query with possible correctness issue by default

2020-11-13 Thread GitBox


dongjoon-hyun edited a comment on pull request #30210:
URL: https://github.com/apache/spark/pull/30210#issuecomment-727130032


   You are repeatedly saying that I had taken your right away here. In this 
PR's context, it doesn't make sense to me either.



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] [spark] dongjoon-hyun commented on pull request #30210: [SPARK-33259][SS] Disable streaming query with possible correctness issue by default

2020-11-13 Thread GitBox


dongjoon-hyun commented on pull request #30210:
URL: https://github.com/apache/spark/pull/30210#issuecomment-727130032


   You are saying that I had taken your right away repeatedly. In this PR's 
context, it doesn't make sense to me either.



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] [spark] dongjoon-hyun edited a comment on pull request #30210: [SPARK-33259][SS] Disable streaming query with possible correctness issue by default

2020-11-13 Thread GitBox


dongjoon-hyun edited a comment on pull request #30210:
URL: https://github.com/apache/spark/pull/30210#issuecomment-727125565


   Please remember that I also replied politely at your first comment like 
this. I apologized first and tried to understand your request and to fix it 
immiediately because I thought I missed your some comments.
   > Sorry, but could you be more specific? It seems that I missed some 
comments.
   
   But, it isn't true, is it? You didn't make any signal here. Suddenly, you 
were upset because your approval is not asked explicitly by me.
   
   After than, you did ignore my approval step and suggested me to have to say 
`I'll merge in tomorrow if there's no further feedback.`. That's the way I feel 
that you disregarded me. The rest of discussion looks like the same pattern to 
me.



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] [spark] dongjoon-hyun commented on pull request #30210: [SPARK-33259][SS] Disable streaming query with possible correctness issue by default

2020-11-13 Thread GitBox


dongjoon-hyun commented on pull request #30210:
URL: https://github.com/apache/spark/pull/30210#issuecomment-727125565


   Please remember that I also replied politely at your first comment like 
this. I apologized first and tried to understand your request and to fix it 
immiediately because I thought I missed your some comments.
   > Sorry, but could you be more specific? It seems that I missed some 
comments.
   
   But, it isn't true, is it? You didn't make any signal here. Suddenly, you 
were upset that your approval is not asked explicitly by me.
   
   After than, you did ignore my approval step and suggested me to have to say 
`I'll merge in tomorrow if there's no further feedback.`. That's the way I feel 
that you disregarded me. The rest of discussion looks like the same pattern to 
me.



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] [spark] ueshin commented on pull request #30242: [SPARK-33277][PYSPARK][SQL] Use ContextAwareIterator to stop consuming after the task ends.

2020-11-13 Thread GitBox


ueshin commented on pull request #30242:
URL: https://github.com/apache/spark/pull/30242#issuecomment-727124941


   gentle ping



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] [spark] HeartSaVioR commented on pull request #30366: [WIP][SPARK-33440][CORE] Separate the calculation for the next renewal date for each delegation token

2020-11-13 Thread GitBox


HeartSaVioR commented on pull request #30366:
URL: https://github.com/apache/spark/pull/30366#issuecomment-727124742


   I just changed the PR to draft, as I would make sure we consider about the 
better change (than the current approach of the PR) accounting my comments 
above.
   
   Reviewers, please consider the code change to only the reference, and go 
through below comments and share your opinion. Thanks in advance!
   
   https://github.com/apache/spark/pull/30366#issuecomment-726531257
   https://github.com/apache/spark/pull/30366#issuecomment-726534500



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] [spark] otterc commented on a change in pull request #30139: [SPARK-31069][CORE] Avoid repeat compute `chunksBeingTransferred` cause hight cpu cost in external shuffle service when `maxChunk

2020-11-13 Thread GitBox


otterc commented on a change in pull request #30139:
URL: https://github.com/apache/spark/pull/30139#discussion_r523353481



##
File path: 
common/network-common/src/main/java/org/apache/spark/network/server/ChunkFetchRequestHandler.java
##
@@ -88,12 +88,14 @@ public void processFetchRequest(
   logger.trace("Received req from {} to fetch block {}", 
getRemoteAddress(channel),
 msg.streamChunkId);
 }
-long chunksBeingTransferred = streamManager.chunksBeingTransferred();
-if (chunksBeingTransferred >= maxChunksBeingTransferred) {
-  logger.warn("The number of chunks being transferred {} is above {}, 
close the connection.",
-chunksBeingTransferred, maxChunksBeingTransferred);
-  channel.close();
-  return;
+if (maxChunksBeingTransferred < Long.MAX_VALUE) {
+  long chunksBeingTransferred = streamManager.chunksBeingTransferred();
+  if (chunksBeingTransferred >= maxChunksBeingTransferred) {
+logger.warn("The number of chunks being transferred {} is above {}, 
close the connection.",
+chunksBeingTransferred, maxChunksBeingTransferred);

Review comment:
   Nit: indentation should be 2

##
File path: 
common/network-common/src/main/java/org/apache/spark/network/server/TransportRequestHandler.java
##
@@ -124,12 +124,14 @@ private void processStreamRequest(final StreamRequest 
req) {
 req.streamId);
 }
 
-long chunksBeingTransferred = streamManager.chunksBeingTransferred();
-if (chunksBeingTransferred >= maxChunksBeingTransferred) {
-  logger.warn("The number of chunks being transferred {} is above {}, 
close the connection.",
-chunksBeingTransferred, maxChunksBeingTransferred);
-  channel.close();
-  return;
+if (maxChunksBeingTransferred < Long.MAX_VALUE) {
+  long chunksBeingTransferred = streamManager.chunksBeingTransferred();
+  if (chunksBeingTransferred >= maxChunksBeingTransferred) {
+logger.warn("The number of chunks being transferred {} is above {}, 
close the connection.",
+chunksBeingTransferred, maxChunksBeingTransferred);

Review comment:
   Nit: indentation should be 2 spaces





This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] [spark] HeartSaVioR edited a comment on pull request #30210: [SPARK-33259][SS] Disable streaming query with possible correctness issue by default

2020-11-13 Thread GitBox


HeartSaVioR edited a comment on pull request #30210:
URL: https://github.com/apache/spark/pull/30210#issuecomment-727121487


   And I also admit I have different voice on post-review. I agree post-review 
would open the possibility for reviewers to review later who weren't active 
during the review period before merging, but IMHO it's over-use of post-review 
if it's being used to provide merger rationalization to cut the review period 
at any time, feel other reviewers to be **disregarded**. For sure, that's just 
a 2 cents.



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] [spark] AmplabJenkins commented on pull request #30164: [SPARK-32919][SHUFFLE][test-maven][test-hadoop2.7] Driver side changes for coordinating push based shuffle by selecting external shuffl

2020-11-13 Thread GitBox


AmplabJenkins commented on pull request #30164:
URL: https://github.com/apache/spark/pull/30164#issuecomment-727123177







This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] [spark] SparkQA commented on pull request #30164: [SPARK-32919][SHUFFLE][test-maven][test-hadoop2.7] Driver side changes for coordinating push based shuffle by selecting external shuffle serv

2020-11-13 Thread GitBox


SparkQA commented on pull request #30164:
URL: https://github.com/apache/spark/pull/30164#issuecomment-727122995


   **[Test build #131082 has 
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/131082/testReport)**
 for PR 30164 at commit 
[`9ba4dfb`](https://github.com/apache/spark/commit/9ba4dfbed66fab7badf279ce3e83d4db569d61e2).
* This patch **fails Spark unit tests**.
* This patch merges cleanly.
* This patch adds no public classes.



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] [spark] AmplabJenkins commented on pull request #30164: [SPARK-32919][SHUFFLE][test-maven][test-hadoop2.7] Driver side changes for coordinating push based shuffle by selecting external shuffl

2020-11-13 Thread GitBox


AmplabJenkins commented on pull request #30164:
URL: https://github.com/apache/spark/pull/30164#issuecomment-727122853







This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] [spark] HeartSaVioR edited a comment on pull request #30366: [SPARK-33440][CORE] Separate the calculation for the next renewal date for each delegation token

2020-11-13 Thread GitBox


HeartSaVioR edited a comment on pull request #30366:
URL: https://github.com/apache/spark/pull/30366#issuecomment-727107472


   @tgravescs 
   
   Thanks for the comment.
   
   > I thought the way we did it was just got the earliest renewal so we didn't 
have to track all them individually because the common case is renewal only 
happens in hours time frame - like once every 24 hours but its obviously 
configurable. 
   
   Yes that's configurable but agree that it only happens in hours time frame. 
The problematic token also produced expired time as 7 days later, so even 
longer than the Hadoop delegation token. The problem is simply just because 
Spark has a belief that token identifier should have valid issue date, whereas 
it's not "guaranteed" for every implementations. Once the precondition is 
broken, the calculation is completely going on unexpected way. There's 
safeguard but the safeguard also makes thing bad.
   
   That's why I had to change the approach. Before reaching this approach, I 
just fixed the issue via simply discarding the invalid next renewal date like 
less than now, but that really feels me just adding a workaround. (This 
workaround also can't handle the new case I left in above comment.) And given 
we're here, I'd like to collect some ideas to make this "concrete" with 
tolerating known case of broken assumption.



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] [spark] HeartSaVioR edited a comment on pull request #30210: [SPARK-33259][SS] Disable streaming query with possible correctness issue by default

2020-11-13 Thread GitBox


HeartSaVioR edited a comment on pull request #30210:
URL: https://github.com/apache/spark/pull/30210#issuecomment-727120756


   I was writing a wall of text and Chrome happily (?) killed itself. Rewriting 
one.
   
   What I really asked you to do is exactly this. The practice is also happened 
for the first time I made a first contribution on Apache Spark. If you want I 
can find that too.
   
   https://github.com/apache/spark/pull/30139#issuecomment-727103124
   
   I'm really unhappy now you're saying I'm framing Apache Spark community to 
be negative ways. I was never saying about the community here. Let me quote my 
first comment again, where's the strong voice like the attacking or making 
things being negative, and mentioning any sort of thing representing community?
   
   > (I hope the PR can wait for reviewers' approvals who left valid review 
comments. I understand the PR can be merged "technically", but doesn't seem to 
be a good practice. If in doubt w.r.t reviewers being inactive, we can at least 
wait for a day based on timezone difference.)
   
   This comment is exactly explaining the same thing what I link for reference. 
Could you find any difference? Even I didn't say that strongly.
   
   A PMC member is just a "one of" PMC members. Before you make consensus along 
with PMC members and construct a BYLAWS, you can't frame me as one guy 
complaining to Spark community. The practice is not a thing "everyone" has a 
belief, so attacking the practice shouldn't be interpreted like so. Makes sense?
   
   For sure, I said about -1 as a sort of joke, but what you requested to me to 
block merger to merge at any time is "technically" the same. That is just a 
"representational" difference, and for me it's also not welcomed one if I have 
to always mention merger to stop doing before I approve. And this suggestion is 
your own, not from the consensus.
   
   But in overall, I have to admit I really did one bad thing, trying to 
educate someone which is not constructed like BYLAW in any way, regardless of 
the intention. I should avoid doing that and raise discussion in dev mailing 
thread like I have been doing before. My apologize.
   



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] [spark] HeartSaVioR commented on pull request #30210: [SPARK-33259][SS] Disable streaming query with possible correctness issue by default

2020-11-13 Thread GitBox


HeartSaVioR commented on pull request #30210:
URL: https://github.com/apache/spark/pull/30210#issuecomment-727121487


   And I also admit I have different voice on post-review. I agree post-review 
would open the possibility for reviewers to review later who weren't active 
during the review period before merging, but IMHO it's over-use of post-review 
if it's being used to provide merger rationalization to cut the review period 
at any time, feel other reviewers to be **disregarded**.



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] [spark] HeartSaVioR commented on pull request #30210: [SPARK-33259][SS] Disable streaming query with possible correctness issue by default

2020-11-13 Thread GitBox


HeartSaVioR commented on pull request #30210:
URL: https://github.com/apache/spark/pull/30210#issuecomment-727120756


   I was writing a wall of text and Chrome happily (?) killed itself. Rewriting 
one.
   
   What I really asked you to do is exactly this. The practice is also happened 
for the first time I made a first contribution on Apache Spark. If you want I 
can find that too.
   
   https://github.com/apache/spark/pull/30139#issuecomment-727103124
   
   I'm really unhappy now you're saying I'm framing Apache Spark community to 
be negative ways. I was never saying about the community here. Let me quote my 
comment again, where's the strong voice like the attacking or making things 
being negative, and mentioning any sort of thing representing community?
   
   > (I hope the PR can wait for reviewers' approvals who left valid review 
comments. I understand the PR can be merged "technically", but doesn't seem to 
be a good practice. If in doubt w.r.t reviewers being inactive, we can at least 
wait for a day based on timezone difference.)
   
   This comment is exactly explaining the same thing what I link for reference. 
Could you find any difference? Even I didn't say that strongly.
   
   A PMC member is just a "one of" PMC members. Before you make consensus along 
with PMC members and construct a BYLAWS, you can't frame me as one guy 
complaining to Spark community. The practice is not a thing "everyone" has a 
belief, so attacking the practice shouldn't be interpreted like so. Makes sense?
   
   For sure, I said about -1 as a sort of joke, but what you requested to me to 
block merger to merge at any time is "technically" the same. That is just a 
"representational" difference, and for me it's also not welcomed one if I have 
to always mention merger to stop doing before I approve. And this suggestion is 
your own, not from the consensus.
   
   But in overall, I have to admit I really did one bad thing, trying to 
educate someone which is not constructed like BYLAW in any way, regardless of 
the intention. I should avoid doing that and raise discussion in dev mailing 
thread like I have been doing before. My apologize.
   



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] [spark] AmplabJenkins removed a comment on pull request #30371: [SPARK-33337][SQL][FOLLOWUP] Prevent possible flakyness in SubexpressionEliminationSuite

2020-11-13 Thread GitBox


AmplabJenkins removed a comment on pull request #30371:
URL: https://github.com/apache/spark/pull/30371#issuecomment-727107898







This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] [spark] HeartSaVioR commented on pull request #30366: [SPARK-33440][CORE] Separate the calculation for the next renewal date for each delegation token

2020-11-13 Thread GitBox


HeartSaVioR commented on pull request #30366:
URL: https://github.com/apache/spark/pull/30366#issuecomment-727108543


   @dongjoon-hyun 
   Thanks for the review. Given @tgravescs is revisiting the logic (and 
considering the above comments as well), I expect the code change would occur 
afterwards. I'd defer addressing review comments until we are in agreement with 
the direction and the PR reflects the consensus.
   
   In the meanwhile I'll try to look into how to add the test.



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] [spark] AmplabJenkins commented on pull request #30371: [SPARK-33337][SQL][FOLLOWUP] Prevent possible flakyness in SubexpressionEliminationSuite

2020-11-13 Thread GitBox


AmplabJenkins commented on pull request #30371:
URL: https://github.com/apache/spark/pull/30371#issuecomment-727107898







This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] [spark] SparkQA removed a comment on pull request #30371: [SPARK-33337][SQL][FOLLOWUP] Prevent possible flakyness in SubexpressionEliminationSuite

2020-11-13 Thread GitBox


SparkQA removed a comment on pull request #30371:
URL: https://github.com/apache/spark/pull/30371#issuecomment-727018690


   **[Test build #131084 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/131084/testReport)**
 for PR 30371 at commit 
[`9816402`](https://github.com/apache/spark/commit/9816402ffddaf9d7c28a5d734ef4411728089dd2).



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] [spark] SparkQA commented on pull request #30371: [SPARK-33337][SQL][FOLLOWUP] Prevent possible flakyness in SubexpressionEliminationSuite

2020-11-13 Thread GitBox


SparkQA commented on pull request #30371:
URL: https://github.com/apache/spark/pull/30371#issuecomment-727107533


   **[Test build #131084 has 
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/131084/testReport)**
 for PR 30371 at commit 
[`9816402`](https://github.com/apache/spark/commit/9816402ffddaf9d7c28a5d734ef4411728089dd2).
* This patch passes all tests.
* This patch merges cleanly.
* This patch adds no public classes.



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] [spark] HeartSaVioR edited a comment on pull request #30366: [SPARK-33440][CORE] Separate the calculation for the next renewal date for each delegation token

2020-11-13 Thread GitBox


HeartSaVioR edited a comment on pull request #30366:
URL: https://github.com/apache/spark/pull/30366#issuecomment-727107472


   @tgravescs 
   
   Thanks for the comment.
   
   > I thought the way we did it was just got the earliest renewal so we didn't 
have to track all them individually because the common case is renewal only 
happens in hours time frame - like once every 24 hours but its obviously 
configurable. 
   
   Yes that's configurable but agree that it only happens in hours time frame. 
The problematic token also produced expired time as 7 days later, so even 
longer than the Hadoop delegation token. The problem is simply just because 
Spark has a belief that token identifier should have valid issue date, whereas 
it's not "guaranteed" for every implementations. Once the precondition is 
broken, the calculation is completely going on unintentional way. There's 
safeguard but the safeguard also makes thing bad.
   
   That's why I had to change the approach. Before reaching this approach, I 
just fixed the issue via simply discarding the invalid next renewal date like 
less than now, but that really feels me just adding a workaround. (This 
workaround also can't handle the new case I left in above comment.) And given 
we're here, I'd like to collect some ideas to make this "concrete" with 
tolerating known case of broken assumption.



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] [spark] AmplabJenkins removed a comment on pull request #29414: [SPARK-32106][SQL] Implement script transform in sql/core

2020-11-13 Thread GitBox


AmplabJenkins removed a comment on pull request #29414:
URL: https://github.com/apache/spark/pull/29414#issuecomment-727107235







This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] [spark] HeartSaVioR commented on pull request #30366: [SPARK-33440][CORE] Separate the calculation for the next renewal date for each delegation token

2020-11-13 Thread GitBox


HeartSaVioR commented on pull request #30366:
URL: https://github.com/apache/spark/pull/30366#issuecomment-727107472


   @tgravescs 
   
   Thanks for the comment.
   
   > I thought the way we did it was just got the earliest renewal so we didn't 
have to track all them individually because the common case is renewal only 
happens in hours time frame - like once every 24 hours but its obviously 
configurable. 
   
   Yes that's configurable but agree that it only happens in hours time frame. 
The problematic token also produced expired time as 7 days later, so even 
longer than the Hadoop delegation token. That is simply just because Spark has 
a belief that token identifier should have valid issue date, whereas it's not 
"guaranteed" for every implementations. Once the precondition is broken, the 
calculation is completely going on unintentional way. There's safeguard but the 
safeguard also makes thing bad.
   
   That's why I had to change the approach. Before reaching this approach, I 
just fixed the issue via simply discarding the invalid next renewal date like 
less than now, but that really feels me just adding a workaround. (This 
workaround also can't handle the new case I left in above comment.) And given 
we're here, I'd like to collect some ideas to make this "concrete" with 
tolerating known case of broken assumption.



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] [spark] SparkQA commented on pull request #29414: [SPARK-32106][SQL] Implement script transform in sql/core

2020-11-13 Thread GitBox


SparkQA commented on pull request #29414:
URL: https://github.com/apache/spark/pull/29414#issuecomment-727107227


   Kubernetes integration test status success
   URL: 
https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/35689/
   



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] [spark] AmplabJenkins commented on pull request #29414: [SPARK-32106][SQL] Implement script transform in sql/core

2020-11-13 Thread GitBox


AmplabJenkins commented on pull request #29414:
URL: https://github.com/apache/spark/pull/29414#issuecomment-727107235







This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] [spark] AmplabJenkins removed a comment on pull request #30373: [SPARK-33183][SQL][FOLLOW-UP] Adjust RemoveRedundantSorts rule order and update config version

2020-11-13 Thread GitBox


AmplabJenkins removed a comment on pull request #30373:
URL: https://github.com/apache/spark/pull/30373#issuecomment-727106355







This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] [spark] AmplabJenkins commented on pull request #30373: [SPARK-33183][SQL][FOLLOW-UP] Adjust RemoveRedundantSorts rule order and update config version

2020-11-13 Thread GitBox


AmplabJenkins commented on pull request #30373:
URL: https://github.com/apache/spark/pull/30373#issuecomment-727106355







This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] [spark] SparkQA commented on pull request #30373: [SPARK-33183][SQL][FOLLOW-UP] Adjust RemoveRedundantSorts rule order and update config version

2020-11-13 Thread GitBox


SparkQA commented on pull request #30373:
URL: https://github.com/apache/spark/pull/30373#issuecomment-727106353


   Kubernetes integration test status success
   URL: 
https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/35688/
   



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] [spark] warrenzhu25 commented on pull request #30370: [SPARK-33446][CORE] Add config spark.executor.coresOverhead

2020-11-13 Thread GitBox


warrenzhu25 commented on pull request #30370:
URL: https://github.com/apache/spark/pull/30370#issuecomment-727105908


   > I am not sure I follow.
   > 
   > * If you want an executor with 2 cores and 6 gb, you can allocate them 
with existing configs.
   > * If you want an executor with 1 core and 6 gb, you can do the same - if 
underlying cluster allocates in 3gb/1core block, 1 core will be wasted - which 
is by requirement here.
   > * If you want 1 core and 6 gb - then setting core overhead will not help - 
since it is an underlying cluster issue that you cant get this : if I 
understood it properly (and extrapolating based on what yarn used to do).
   > 
   > In other words, spark application makes the request for what it needs - 
cluster provides the next higher multiple which satisfies the requirements. If 
this means extra memory or additional cores, spark wont use it.
   > 
   > I am trying to understand what is the actual usecase we have, and how the 
existing configs wont help. Thanks !
   
   I want an executor with 2 cores and 6 gb, but only 1 core used for task 
allocation, which means at most 1 task could be running on this executor. If I 
use existing configs, there would be at most 2 tasks. I want to give each task 
6GB memory, but also use extra cpu time for gc or other things.



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] [spark] mridulm edited a comment on pull request #30370: [SPARK-33446][CORE] Add config spark.executor.coresOverhead

2020-11-13 Thread GitBox


mridulm edited a comment on pull request #30370:
URL: https://github.com/apache/spark/pull/30370#issuecomment-727104096


   I am not sure I follow.
   
   * If you want an executor with 2 cores and 6 gb, you can allocate them with 
existing configs.
   * If you want an executor with 1 core and 6 gb, you can do the same - if 
underlying cluster allocates in 3gb/1core block, 1 core will be wasted - which 
is by requirement here.
   * If you want 1 core and 6 gb - then setting core overhead will not help - 
since it is an underlying cluster issue that you cant get this : if I 
understood it properly (and extrapolating based on what yarn used to do).
   
   In other words, spark application makes the request for what it needs - 
cluster provides the next higher multiple which satisfies the requirements. If 
this means extra memory or additional cores, spark wont use it.
   
   
   I am trying to understand what is the actual usecase we have, and how the 
existing configs wont help. Thanks !
   
   



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



  1   2   3   4   5   6   7   8   >