date:20190918

[GitHub] [spark] AmplabJenkins removed a comment on issue #24885: [SPARK-28040][R] Add serialization for glue type

2019-09-18 Thread GitBox

AmplabJenkins removed a comment on issue #24885: [SPARK-28040][R] Add 
serialization for glue type
URL: https://github.com/apache/spark/pull/24885#issuecomment-532565769
 
 
   Test PASSed.
   Refer to this link for build results (access rights to CI server needed): 
   
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder-K8s/16010/
   Test PASSed.


This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


With regards,
Apache Git Services

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org

[GitHub] [spark] AmplabJenkins removed a comment on issue #24232: [SPARK-27297] [SQL] Add higher order functions to scala API

2019-09-18 Thread GitBox

AmplabJenkins removed a comment on issue #24232: [SPARK-27297] [SQL] Add higher 
order functions to scala API
URL: https://github.com/apache/spark/pull/24232#issuecomment-532565700
 
 
   Merged build finished. Test PASSed.



[GitHub] [spark] zhengruifeng commented on issue #25776: [SPARK-28985][PYTHON][ML] Add common classes (JavaPredictor/JavaClassificationModel/JavaProbabilisticClassifier) in PYTHON

2019-09-18 Thread GitBox

zhengruifeng commented on issue #25776: [SPARK-28985][PYTHON][ML] Add common 
classes (JavaPredictor/JavaClassificationModel/JavaProbabilisticClassifier) in 
PYTHON
URL: https://github.com/apache/spark/pull/25776#issuecomment-532566063
 
 
   @srowen Most models in PySpark do not have any setters/getters (one exception 
is OneVsRest), and no model has a prediction function.
   
   A main complaint about PySpark ML that I have heard from the users of JD's 
big data platform is that they cannot set the input/output column names of 
models. It is inconvenient to rename columns to avoid column conflicts.
   Suppose we work on a classification task in an interactive mode (like 
Jupyter). We have trained some classification models with default column 
names, we evaluate them one by one, and then we want to ensemble some good 
models. Now we must rename the `predictionCol` of some models after 
transformation, since all models have the same column name. Otherwise, we need 
to retrain them with modified column names. Similar cases arise easily when we 
deal with a DataFrame with tens of columns and try several algorithms. So we 
want column setters like those on the Scala side.
   
   The goal is to keep the Python side in sync with the Scala side. This has two 
benefits: 1, the codebase will be easier to maintain, since changes on the Scala 
side are easy to sync to the Python side; 2, function parity, since methods like 
models' getters are still missing on the Python side.
   I tried to divide the goal into several subtasks in 
[SPARK-28958](https://issues.apache.org/jira/browse/SPARK-28958); after this PR 
we need to resolve the others.
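
   The column conflict described above can be illustrated with a minimal pure-Python sketch. `ToyModel`, its `transform` method, and the row dicts are hypothetical stand-ins and not the PySpark API; only the `predictionCol` name and the idea of a `setPredictionCol` setter come from the discussion.

```python
# Toy stand-in for a fitted model; NOT the PySpark API (assumption for illustration).
class ToyModel:
    def __init__(self, threshold, prediction_col="prediction"):
        self.threshold = threshold
        self.prediction_col = prediction_col

    def setPredictionCol(self, name):
        # Mirrors the setter requested in the discussion; returns self for chaining.
        self.prediction_col = name
        return self

    def transform(self, rows):
        # Append one output column and refuse to overwrite an existing column.
        out = []
        for row in rows:
            if self.prediction_col in row:
                raise ValueError("column already exists: " + self.prediction_col)
            new_row = dict(row)
            new_row[self.prediction_col] = 1.0 if row["feature"] > self.threshold else 0.0
            out.append(new_row)
        return out


rows = [{"feature": 0.2}, {"feature": 0.8}]
m1, m2 = ToyModel(0.5), ToyModel(0.1)

# With default names, chaining the second model hits a column conflict.
try:
    m2.transform(m1.transform(rows))
    conflict = False
except ValueError:
    conflict = True

# With a setter, each model writes to its own column and no retraining is needed.
m2.setPredictionCol("prediction_m2")
ensembled = m2.transform(m1.transform(rows))
```

   In this toy, the setter lets the two already-trained models be applied to the same rows without renaming columns after each transformation.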



[GitHub] [spark] AmplabJenkins removed a comment on issue #24885: [SPARK-28040][R] Add serialization for glue type

2019-09-18 Thread GitBox

AmplabJenkins removed a comment on issue #24885: [SPARK-28040][R] Add 
serialization for glue type
URL: https://github.com/apache/spark/pull/24885#issuecomment-532565759
 
 
   Merged build finished. Test PASSed.



[GitHub] [spark] cloud-fan commented on a change in pull request #25295: [SPARK-28560][SQL] Optimize shuffle reader to local shuffle reader when smj converted to bhj in adaptive execution

2019-09-18 Thread GitBox

cloud-fan commented on a change in pull request #25295: [SPARK-28560][SQL] 
Optimize shuffle reader to local shuffle reader when smj converted to bhj in 
adaptive execution
URL: https://github.com/apache/spark/pull/25295#discussion_r325530214
 
 

 ##
 File path: 
sql/core/src/main/scala/org/apache/spark/sql/execution/adaptive/OptimizedLocalShuffleReader.scala
 ##
 @@ -0,0 +1,124 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements.  See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License.  You may obtain a copy of the License at
+ *
+ *    http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.spark.sql.execution.adaptive
+
+import org.apache.spark.rdd.RDD
+import org.apache.spark.sql.catalyst.InternalRow
+import org.apache.spark.sql.catalyst.expressions.Attribute
+import org.apache.spark.sql.catalyst.plans.physical.{Partitioning, UnknownPartitioning}
+import org.apache.spark.sql.catalyst.rules.Rule
+import org.apache.spark.sql.execution.{SparkPlan, UnaryExecNode}
+import org.apache.spark.sql.execution.exchange.{EnsureRequirements, ShuffleExchangeExec}
+import org.apache.spark.sql.execution.joins.{BroadcastHashJoinExec}
+import org.apache.spark.sql.internal.SQLConf
+
+case class OptimizedLocalShuffleReader(conf: SQLConf) extends Rule[SparkPlan] {
+
+  private def setIsLocalToFalse(shuffleStage: QueryStageExec): QueryStageExec = {
+    shuffleStage match {
+      case stage: ShuffleQueryStageExec =>
+        stage.isLocalShuffle = false
+      case ReusedQueryStageExec(_, stage: ShuffleQueryStageExec, _) =>
+        stage.isLocalShuffle = false
+    }
+    shuffleStage
+  }
+
+  private def revertLocalShuffleReader(newPlan: SparkPlan): SparkPlan = {
+    val revertPlan = newPlan.transformUp {
+      case localReader: LocalShuffleReaderExec
+        if (ShuffleQueryStageExec.isShuffleQueryStageExec(localReader.child)) =>
+        setIsLocalToFalse(localReader.child)
+    }
+    revertPlan
+  }
+
+  override def apply(plan: SparkPlan): SparkPlan = {
+    // Collect the `BroadcastHashJoinExec` nodes and if isEmpty directly return.
+    val bhjs = plan.collect {
+      case bhj: BroadcastHashJoinExec => bhj
+    }
+
+    if (!conf.optimizedLocalShuffleReaderEnabled || bhjs.isEmpty) {
+      return plan
+    }
+
+    // If the streamedPlan is `ShuffleQueryStageExec`, set the value of `isLocalShuffle` to true
+    bhjs.map {
+      case bhj: BroadcastHashJoinExec =>
+        bhj.children map {
+          case stage: ShuffleQueryStageExec => stage.isLocalShuffle = true
+          case ReusedQueryStageExec(_, stage: ShuffleQueryStageExec, _) =>
+            stage.isLocalShuffle = true
+          case plan: SparkPlan => plan
+        }
+    }
+
+    // Add the new `LocalShuffleReaderExec` node if the value of `isLocalShuffle` is true
+    val newPlan = plan.transformUp {
+      case stage: ShuffleQueryStageExec if (stage.isLocalShuffle) =>
+        LocalShuffleReaderExec(stage)
+      case ReusedQueryStageExec(_, stage: ShuffleQueryStageExec, _) if (stage.isLocalShuffle) =>
 
 Review comment:
   let's not strip the `ReusedQueryStageExec`



[GitHub] [spark] zhengruifeng commented on a change in pull request #25776: [SPARK-28985][PYTHON][ML] Add common classes (JavaPredictor/JavaClassificationModel/JavaProbabilisticClassifier) in PYTHON

2019-09-18 Thread GitBox

zhengruifeng commented on a change in pull request #25776: 
[SPARK-28985][PYTHON][ML] Add common classes 
(JavaPredictor/JavaClassificationModel/JavaProbabilisticClassifier) in PYTHON
URL: https://github.com/apache/spark/pull/25776#discussion_r325531166
 
 

 ##
 File path: python/pyspark/ml/classification.py
 ##
 @@ -81,6 +160,8 @@ class LinearSVC(JavaEstimator, HasFeaturesCol, HasLabelCol, 
HasPredictionCol, Ha
 ... Row(label=0.0, features=Vectors.dense(1.0, 2.0, 3.0))]).toDF()
 >>> svm = LinearSVC(maxIter=5, regParam=0.01)
 >>> model = svm.fit(df)
+>>> model.setPredictionCol("prediction")
 
 Review comment:
   What about changing the value to a non-default value like "newPrediction", 
and making sure that the output dataframe/row has the changed column name?



[GitHub] [spark] zhengruifeng commented on a change in pull request #25776: [SPARK-28985][PYTHON][ML] Add common classes (JavaPredictor/JavaClassificationModel/JavaProbabilisticClassifier) in PYTHON

2019-09-18 Thread GitBox

zhengruifeng commented on a change in pull request #25776: 
[SPARK-28985][PYTHON][ML] Add common classes 
(JavaPredictor/JavaClassificationModel/JavaProbabilisticClassifier) in PYTHON
URL: https://github.com/apache/spark/pull/25776#discussion_r325531166
 
 

 ##
 File path: python/pyspark/ml/classification.py
 ##
 @@ -81,6 +160,8 @@ class LinearSVC(JavaEstimator, HasFeaturesCol, HasLabelCol, 
HasPredictionCol, Ha
 ... Row(label=0.0, features=Vectors.dense(1.0, 2.0, 3.0))]).toDF()
 >>> svm = LinearSVC(maxIter=5, regParam=0.01)
 >>> model = svm.fit(df)
+>>> model.setPredictionCol("prediction")
 
 Review comment:
   What about changing the value to a non-default value like "newPrediction", 
and making sure that the output dataframe/row has the changed column name?



[GitHub] [spark] cloud-fan commented on a change in pull request #25295: [SPARK-28560][SQL] Optimize shuffle reader to local shuffle reader when smj converted to bhj in adaptive execution

2019-09-18 Thread GitBox

cloud-fan commented on a change in pull request #25295: [SPARK-28560][SQL] 
Optimize shuffle reader to local shuffle reader when smj converted to bhj in 
adaptive execution
URL: https://github.com/apache/spark/pull/25295#discussion_r325533208
 
 

 ##
 File path: 
sql/core/src/main/scala/org/apache/spark/sql/execution/adaptive/AdaptiveSparkPlanExec.scala
 ##
 @@ -91,6 +91,7 @@ case class AdaptiveSparkPlanExec(
   // optimizations should be stage-independent.
   @transient private val queryStageOptimizerRules: Seq[Rule[SparkPlan]] = Seq(
 ReuseAdaptiveSubquery(conf, subqueryCache),
+OptimizedLocalShuffleReader(conf),
 
 Review comment:
   since this may change the number of exchanges, we should put it in 
https://github.com/apache/spark/pull/25295/files#diff-6954dd8020a9ca298f1fb9602c0e831cR77
   
   Then the AQE framework can check the cost and give up the optimization if 
extra changes are introduced.
   
   Note that making it a physical rule and checking the number of exchanges is 
suboptimal. It's possible that the local shuffle reader can avoid exchanges 
downstream, which changes the stage boundaries.



[GitHub] [spark] MaxGekk commented on a change in pull request #25772: [SPARK-29065][SQL][TEST] Extend `EXTRACT` benchmark

2019-09-18 Thread GitBox

MaxGekk commented on a change in pull request #25772: [SPARK-29065][SQL][TEST] 
Extend `EXTRACT` benchmark
URL: https://github.com/apache/spark/pull/25772#discussion_r325534755
 
 

 ##
 File path: 
sql/core/src/test/scala/org/apache/spark/sql/execution/benchmark/ExtractBenchmark.scala
 ##
 @@ -31,52 +36,76 @@ import java.time.Instant
  *  Results will be written to "benchmarks/ExtractBenchmark-results.txt".
  * }}}
  */
-object ExtractBenchmark extends SqlBasedBenchmark {
+object ExtractBenchmark extends BenchmarkBase with SQLHelper {
+  private val spark: SparkSession = SparkSession.builder()
+.master("local[1]")
+.appName(this.getClass.getCanonicalName)
+.getOrCreate()
 
 Review comment:
   Here is the PR https://github.com/apache/spark/pull/25828 , just in case.



[GitHub] [spark] MaxGekk opened a new pull request #25828: [SPARK-29141][SQL][TEST] Use SqlBasedBenchmark in SQL benchmarks

2019-09-18 Thread GitBox

MaxGekk opened a new pull request #25828: [SPARK-29141][SQL][TEST] Use 
SqlBasedBenchmark in SQL benchmarks
URL: https://github.com/apache/spark/pull/25828
 
 
   ### What changes were proposed in this pull request?
   
   Refactored SQL-related benchmarks and made them depend on 
`SqlBasedBenchmark`. In particular, creation of the Spark session is moved into 
`override def getSparkSession: SparkSession`.
   
   ### Why are the changes needed?
   
   This should simplify maintenance of SQL-based benchmarks by reducing the 
number of dependencies. In the future, it should be easier to refactor & extend 
all SQL benchmarks by changing only one trait. Finally, all SQL-based 
benchmarks will look uniform.
   
   ### Does this PR introduce any user-facing change?
   No
   
   ### How was this patch tested?
   
   By running the modified benchmarks.
   



[GitHub] [spark] cloud-fan commented on a change in pull request #25295: [SPARK-28560][SQL] Optimize shuffle reader to local shuffle reader when smj converted to bhj in adaptive execution

2019-09-18 Thread GitBox

cloud-fan commented on a change in pull request #25295: [SPARK-28560][SQL] 
Optimize shuffle reader to local shuffle reader when smj converted to bhj in 
adaptive execution
URL: https://github.com/apache/spark/pull/25295#discussion_r325533208
 
 

 ##
 File path: 
sql/core/src/main/scala/org/apache/spark/sql/execution/adaptive/AdaptiveSparkPlanExec.scala
 ##
 @@ -91,6 +91,7 @@ case class AdaptiveSparkPlanExec(
   // optimizations should be stage-independent.
   @transient private val queryStageOptimizerRules: Seq[Rule[SparkPlan]] = Seq(
 ReuseAdaptiveSubquery(conf, subqueryCache),
+OptimizedLocalShuffleReader(conf),
 
 Review comment:
   since this may change the number of exchanges, we should put it in 
`queryStagePreparationRules`
   Then the AQE framework can check the cost and give up the optimization if 
extra changes are introduced.
   
   Note that the current approach (checking the number of exchanges at the end 
of the rule) is suboptimal. It's possible that the local shuffle reader can 
avoid exchanges downstream, which changes the stage boundaries.



[GitHub] [spark] zhengruifeng commented on issue #25802: [SPARK-29095][ML] add extractInstances

2019-09-18 Thread GitBox

zhengruifeng commented on issue #25802: [SPARK-29095][ML] add extractInstances
URL: https://github.com/apache/spark/pull/25802#issuecomment-532570363
 
 
   Friendly ping @srowen 
   Now that more and more algorithms support sample weighting, 
`extractLabeledPoints` is rarely used. We may need to add this method as an 
alternative to `extractLabeledPoints`.
   When RF & GBT support weighting, it can be reused in them.
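
   As a rough illustration of the difference, here is a pure-Python sketch of extracting weighted instances instead of unweighted labeled points. The `Instance` tuple, the `extract_instances` helper, and the column names are hypothetical and are not the Spark implementation; the assumption is that a missing weight column defaults every weight to 1.0.

```python
from collections import namedtuple

# Hypothetical record type: a labeled point plus a sample weight.
Instance = namedtuple("Instance", ["label", "weight", "features"])


def extract_instances(rows, label_col="label", features_col="features", weight_col=None):
    # Build one Instance per row; without a weight column every weight is 1.0.
    instances = []
    for row in rows:
        weight = float(row[weight_col]) if weight_col else 1.0
        instances.append(Instance(float(row[label_col]), weight, row[features_col]))
    return instances


rows = [
    {"label": 1, "features": [0.0, 1.0], "w": 2.5},
    {"label": 0, "features": [1.0, 0.0], "w": 0.5},
]
unweighted = extract_instances(rows)
weighted = extract_instances(rows, weight_col="w")
```

   A single helper of this shape covers both the weighted and the unweighted case, which is why it could be reused by algorithms that gain weighting support later.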



[GitHub] [spark] cloud-fan commented on a change in pull request #25295: [SPARK-28560][SQL] Optimize shuffle reader to local shuffle reader when smj converted to bhj in adaptive execution

2019-09-18 Thread GitBox

cloud-fan commented on a change in pull request #25295: [SPARK-28560][SQL] 
Optimize shuffle reader to local shuffle reader when smj converted to bhj in 
adaptive execution
URL: https://github.com/apache/spark/pull/25295#discussion_r325533208
 
 

 ##
 File path: 
sql/core/src/main/scala/org/apache/spark/sql/execution/adaptive/AdaptiveSparkPlanExec.scala
 ##
 @@ -91,6 +91,7 @@ case class AdaptiveSparkPlanExec(
   // optimizations should be stage-independent.
   @transient private val queryStageOptimizerRules: Seq[Rule[SparkPlan]] = Seq(
 ReuseAdaptiveSubquery(conf, subqueryCache),
+OptimizedLocalShuffleReader(conf),
 
 Review comment:
   since this may change the number of exchanges, we should put it in 
`queryStagePreparationRules`
   Then the AQE framework can check the cost and give up the optimization if 
extra changes are introduced.
   
   Note that making it a physical rule and checking the number of exchanges is 
suboptimal. It's possible that the local shuffle reader can avoid exchanges 
downstream, which changes the stage boundaries.
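
   The "check the cost and give up the optimization" idea can be sketched on a toy plan tree. This is only an illustration in pure Python: `Node`, `count_exchanges`, and `apply_if_not_costlier` are hypothetical names, and the real AQE cost evaluation in Spark is not shown in this thread.

```python
from dataclasses import dataclass, field
from typing import Callable, List


@dataclass
class Node:
    # Hypothetical toy plan node; name == "Exchange" marks a shuffle exchange.
    name: str
    children: List["Node"] = field(default_factory=list)


def count_exchanges(plan: Node) -> int:
    # Count exchange nodes in the whole tree.
    own = 1 if plan.name == "Exchange" else 0
    return own + sum(count_exchanges(c) for c in plan.children)


def apply_if_not_costlier(plan: Node, rule: Callable[[Node], Node]) -> Node:
    # Keep the rewritten plan only when it does not add exchanges.
    candidate = rule(plan)
    if count_exchanges(candidate) > count_exchanges(plan):
        return plan
    return candidate


plan = Node("Join", [Node("Exchange", [Node("Scan")]), Node("Scan")])


def adds_exchange(p: Node) -> Node:
    # A rewrite that introduces one extra exchange on top of the plan.
    return Node("Exchange", [p])


def renames_root(p: Node) -> Node:
    # A rewrite that leaves the number of exchanges unchanged.
    return Node("LocalJoin", p.children)


kept = apply_if_not_costlier(plan, adds_exchange)
rewritten = apply_if_not_costlier(plan, renames_root)
```

   Here the first rewrite is discarded because it adds an exchange, while the second is accepted.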



[GitHub] [spark] SparkQA commented on issue #25828: [SPARK-29141][SQL][TEST] Use SqlBasedBenchmark in SQL benchmarks

2019-09-18 Thread GitBox

SparkQA commented on issue #25828: [SPARK-29141][SQL][TEST] Use 
SqlBasedBenchmark in SQL benchmarks
URL: https://github.com/apache/spark/pull/25828#issuecomment-532570857
 
 
   **[Test build #110891 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/110891/testReport)** for PR 25828 at commit [`9a279a3`](https://github.com/apache/spark/commit/9a279a3530dfa1b6328ea46e75996a837cbd123a).



[GitHub] [spark] AmplabJenkins commented on issue #24885: [SPARK-28040][R] Add serialization for glue type

2019-09-18 Thread GitBox

AmplabJenkins commented on issue #24885: [SPARK-28040][R] Add serialization for 
glue type
URL: https://github.com/apache/spark/pull/24885#issuecomment-532571063
 
 
   Test FAILed.
   Refer to this link for build results (access rights to CI server needed): 
   https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/110889/
   Test FAILed.



[GitHub] [spark] AmplabJenkins commented on issue #24885: [SPARK-28040][R] Add serialization for glue type

2019-09-18 Thread GitBox

AmplabJenkins commented on issue #24885: [SPARK-28040][R] Add serialization for 
glue type
URL: https://github.com/apache/spark/pull/24885#issuecomment-532571057
 
 
   Merged build finished. Test FAILed.



[GitHub] [spark] SparkQA commented on issue #24885: [SPARK-28040][R] Add serialization for glue type

2019-09-18 Thread GitBox

SparkQA commented on issue #24885: [SPARK-28040][R] Add serialization for glue 
type
URL: https://github.com/apache/spark/pull/24885#issuecomment-532571047
 
 
   **[Test build #110889 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/110889/testReport)** for PR 24885 at commit [`96e9544`](https://github.com/apache/spark/commit/96e9544e84c67abc75bcfb8d9471f1b605f5105f).
* This patch **fails R style tests**.
* This patch merges cleanly.
* This patch adds no public classes.



[GitHub] [spark] advancedxy commented on issue #25795: [WIP][SPARK-29037][Core] Spark gives duplicate result when an application was killed

2019-09-18 Thread GitBox

advancedxy commented on issue #25795: [WIP][SPARK-29037][Core] Spark gives 
duplicate result when an application was killed
URL: https://github.com/apache/spark/pull/25795#issuecomment-532570982
 
 
   > @advancedxy can you give a completed proposal for it?
   
   All right, I think the requirements can be split into two parts:
   
   1. Support concurrent writes to different locations (partitions).
   This is achieved by setting a different output path for each write:
   * For `dynamicPartitionOverwrite`, the output could be the staging 
dir (the current solution of #25739), which is unique to each write.
   * For `dynamicPartitionOverwrite=false` and a partitioned table, the 
output in the `OutputCommitter` could be 
`$table_output/static_part_key1=value1/static_part_key2=value2/...`. Concurrent 
writes to partitions prefixed by different static partitions won't interfere 
with each other. This could be extended in #25379.
   * For a non-partitioned table, there is only one output, so concurrent 
writes are not supported.
   2. Detect concurrent writes to the same location and fail fast.
   This can be achieved during the `setupJob` stage. We can check the existence 
of the output path like `FileOutputFormat` does. If the output path already 
exists, it must have been created by another concurrent writing job or left by 
a previous failed/killed job. We can throw an exception with the possible 
reasons and fail the current job. Of course, we cannot simply check the output 
passed to JobConf, as the $table_output should be present (unless the table is 
being created for the first time). $table_output/_temporary/$app_attempt_num 
could be a good candidate.
   
   One more thing to do in Spark: Spark should infer the YARN app attempt num 
when running in YARN mode. Currently, the app attempt num is always 0 when 
writing.
   
   I believe the proposed approach should cover concurrent writes and the case 
in this PR. WDYT @cloud-fan, @turboFei and @wangyum
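
   The two parts of the proposal can be sketched on a local filesystem in pure Python. This is a hedged illustration only: `static_partition_path` and `setup_job` are hypothetical helpers, not Spark or Hadoop APIs, and directory creation on HDFS or object stores may not have the same guarantees as local `os.makedirs`.

```python
import os
import tempfile


def static_partition_path(table_output, static_parts):
    # Build $table_output/key1=value1/key2=value2/... from ordered (key, value) pairs.
    segments = ["{}={}".format(k, v) for k, v in static_parts]
    return os.path.join(table_output, *segments)


def setup_job(output_path, app_attempt_num=0):
    # Claim $output/_temporary/$app_attempt_num and fail fast if it already exists.
    marker = os.path.join(output_path, "_temporary", str(app_attempt_num))
    try:
        os.makedirs(marker, exist_ok=False)
    except FileExistsError:
        raise RuntimeError(
            "output path already claimed by a concurrent or previously "
            "failed/killed job: " + marker)
    return marker


table_output = tempfile.mkdtemp()
out_a = static_partition_path(table_output, [("dt", "2019-09-18"), ("region", "us")])
out_b = static_partition_path(table_output, [("dt", "2019-09-18"), ("region", "eu")])

setup_job(out_a)
setup_job(out_b)  # a different static partition prefix does not interfere

# A second job targeting the same location is rejected immediately.
try:
    setup_job(out_a)
    second_claim_failed = False
except RuntimeError:
    second_claim_failed = True
```

   Note that with the attempt number fixed at 0, as described above for the current behavior, a retried attempt of the same application would also hit the existing marker, which is why inferring the real attempt number matters.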



[GitHub] [spark] SparkQA removed a comment on issue #24885: [SPARK-28040][R] Add serialization for glue type

2019-09-18 Thread GitBox

SparkQA removed a comment on issue #24885: [SPARK-28040][R] Add serialization 
for glue type
URL: https://github.com/apache/spark/pull/24885#issuecomment-532565133
 
 
   **[Test build #110889 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/110889/testReport)** for PR 24885 at commit [`96e9544`](https://github.com/apache/spark/commit/96e9544e84c67abc75bcfb8d9471f1b605f5105f).



[GitHub] [spark] AmplabJenkins removed a comment on issue #24885: [SPARK-28040][R] Add serialization for glue type

2019-09-18 Thread GitBox

AmplabJenkins removed a comment on issue #24885: [SPARK-28040][R] Add 
serialization for glue type
URL: https://github.com/apache/spark/pull/24885#issuecomment-532571057
 
 
   Merged build finished. Test FAILed.



[GitHub] [spark] AmplabJenkins commented on issue #24232: [SPARK-27297] [SQL] Add higher order functions to scala API

2019-09-18 Thread GitBox

AmplabJenkins commented on issue #24232: [SPARK-27297] [SQL] Add higher order 
functions to scala API
URL: https://github.com/apache/spark/pull/24232#issuecomment-532571421
 
 
   Merged build finished. Test FAILed.



[GitHub] [spark] AmplabJenkins removed a comment on issue #24885: [SPARK-28040][R] Add serialization for glue type

2019-09-18 Thread GitBox

AmplabJenkins removed a comment on issue #24885: [SPARK-28040][R] Add 
serialization for glue type
URL: https://github.com/apache/spark/pull/24885#issuecomment-532571063
 
 
   Test FAILed.
   Refer to this link for build results (access rights to CI server needed): 
   https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/110889/
   Test FAILed.



[GitHub] [spark] AmplabJenkins removed a comment on issue #24232: [SPARK-27297] [SQL] Add higher order functions to scala API

2019-09-18 Thread GitBox

AmplabJenkins removed a comment on issue #24232: [SPARK-27297] [SQL] Add higher 
order functions to scala API
URL: https://github.com/apache/spark/pull/24232#issuecomment-532571421
 
 
   Merged build finished. Test FAILed.



[GitHub] [spark] AmplabJenkins commented on issue #24232: [SPARK-27297] [SQL] Add higher order functions to scala API

2019-09-18 Thread GitBox

AmplabJenkins commented on issue #24232: [SPARK-27297] [SQL] Add higher order 
functions to scala API
URL: https://github.com/apache/spark/pull/24232#issuecomment-532571426
 
 
   Test FAILed.
   Refer to this link for build results (access rights to CI server needed): 
   https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/110890/
   Test FAILed.



[GitHub] [spark] SparkQA removed a comment on issue #24232: [SPARK-27297] [SQL] Add higher order functions to scala API

2019-09-18 Thread GitBox

SparkQA removed a comment on issue #24232: [SPARK-27297] [SQL] Add higher order 
functions to scala API
URL: https://github.com/apache/spark/pull/24232#issuecomment-532565160
 
 
   **[Test build #110890 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/110890/testReport)**
 for PR 24232 at commit 
[`f371413`](https://github.com/apache/spark/commit/f371413de47c33beb78c396f0da36bea5cdc62a0).



[GitHub] [spark] AmplabJenkins commented on issue #25828: [SPARK-29141][SQL][TEST] Use SqlBasedBenchmark in SQL benchmarks

2019-09-18 Thread GitBox

AmplabJenkins commented on issue #25828: [SPARK-29141][SQL][TEST] Use 
SqlBasedBenchmark in SQL benchmarks
URL: https://github.com/apache/spark/pull/25828#issuecomment-532571476
 
 
   Test PASSed.
   Refer to this link for build results (access rights to CI server needed): 
   
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder-K8s/16012/
   Test PASSed.



[GitHub] [spark] AmplabJenkins commented on issue #25828: [SPARK-29141][SQL][TEST] Use SqlBasedBenchmark in SQL benchmarks

2019-09-18 Thread GitBox

AmplabJenkins commented on issue #25828: [SPARK-29141][SQL][TEST] Use 
SqlBasedBenchmark in SQL benchmarks
URL: https://github.com/apache/spark/pull/25828#issuecomment-532571465
 
 
   Merged build finished. Test PASSed.



[GitHub] [spark] cloud-fan commented on a change in pull request #25295: [SPARK-28560][SQL] Optimize shuffle reader to local shuffle reader when smj converted to bhj in adaptive execution

2019-09-18 Thread GitBox

cloud-fan commented on a change in pull request #25295: [SPARK-28560][SQL] 
Optimize shuffle reader to local shuffle reader when smj converted to bhj in 
adaptive execution
URL: https://github.com/apache/spark/pull/25295#discussion_r325536116
 
 

 ##
 File path: 
sql/core/src/main/scala/org/apache/spark/sql/execution/adaptive/OptimizedLocalShuffleReader.scala
 ##
 @@ -0,0 +1,124 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements.  See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License.  You may obtain a copy of the License at
+ *
+ *    http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.spark.sql.execution.adaptive
+
+import org.apache.spark.rdd.RDD
+import org.apache.spark.sql.catalyst.InternalRow
+import org.apache.spark.sql.catalyst.expressions.Attribute
+import org.apache.spark.sql.catalyst.plans.physical.{Partitioning, UnknownPartitioning}
+import org.apache.spark.sql.catalyst.rules.Rule
+import org.apache.spark.sql.execution.{SparkPlan, UnaryExecNode}
+import org.apache.spark.sql.execution.exchange.{EnsureRequirements, ShuffleExchangeExec}
+import org.apache.spark.sql.execution.joins.{BroadcastHashJoinExec}
+import org.apache.spark.sql.internal.SQLConf
+
+case class OptimizedLocalShuffleReader(conf: SQLConf) extends Rule[SparkPlan] {
+
+  private def setIsLocalToFalse(shuffleStage: QueryStageExec): QueryStageExec = {
+    shuffleStage match {
+      case stage: ShuffleQueryStageExec =>
+        stage.isLocalShuffle = false
+      case ReusedQueryStageExec(_, stage: ShuffleQueryStageExec, _) =>
+        stage.isLocalShuffle = false
+    }
+    shuffleStage
+  }
+
+  private def revertLocalShuffleReader(newPlan: SparkPlan): SparkPlan = {
+    val revertPlan = newPlan.transformUp {
+      case localReader: LocalShuffleReaderExec
+        if (ShuffleQueryStageExec.isShuffleQueryStageExec(localReader.child)) =>
+        setIsLocalToFalse(localReader.child)
+    }
+    revertPlan
+  }
+
+  override def apply(plan: SparkPlan): SparkPlan = {
+    // Collect the `BroadcastHashJoinExec` nodes and if isEmpty directly return.
+    val bhjs = plan.collect {
+      case bhj: BroadcastHashJoinExec => bhj
+    }
+
+    if (!conf.optimizedLocalShuffleReaderEnabled || bhjs.isEmpty) {
+      return plan
+    }
+
+    // If the streamedPlan is `ShuffleQueryStageExec`, set the value of `isLocalShuffle` to true
+    bhjs.map {
+      case bhj: BroadcastHashJoinExec =>
+        bhj.children map {
+          case stage: ShuffleQueryStageExec => stage.isLocalShuffle = true
+          case ReusedQueryStageExec(_, stage: ShuffleQueryStageExec, _) =>
+            stage.isLocalShuffle = true
+          case plan: SparkPlan => plan
+        }
+    }
+
+    // Add the new `LocalShuffleReaderExec` node if the value of `isLocalShuffle` is true
+    val newPlan = plan.transformUp {
+      case stage: ShuffleQueryStageExec if (stage.isLocalShuffle) =>
+        LocalShuffleReaderExec(stage)
+      case ReusedQueryStageExec(_, stage: ShuffleQueryStageExec, _) if (stage.isLocalShuffle) =>
+        LocalShuffleReaderExec(stage)
+    }
+
+    val afterEnsureRequirements = EnsureRequirements(conf).apply(newPlan)
+    val numExchanges = afterEnsureRequirements.collect {
+      case e: ShuffleExchangeExec => e
+    }.length
+    if (numExchanges > 0) {
+      logWarning("Local shuffle reader optimization is not applied due" +
+        " to additional shuffles will be introduced.")
+      revertLocalShuffleReader(newPlan)
+    } else {
+      newPlan
+    }
+  }
+}
+
+case class LocalShuffleReaderExec(
+    child: QueryStageExec) extends UnaryExecNode {
+
+  override def output: Seq[Attribute] = child.output
+
+  override def doCanonicalize(): SparkPlan = child.canonicalized
+
+  override def outputPartitioning: Partitioning = {
 
 Review comment:
   Shouldn't this be `child.outputPartitioning`?



[GitHub] [spark] SparkQA commented on issue #24232: [SPARK-27297] [SQL] Add higher order functions to scala API

2019-09-18 Thread GitBox

SparkQA commented on issue #24232: [SPARK-27297] [SQL] Add higher order 
functions to scala API
URL: https://github.com/apache/spark/pull/24232#issuecomment-532571372
 
 
   **[Test build #110890 has 
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/110890/testReport)**
 for PR 24232 at commit 
[`f371413`](https://github.com/apache/spark/commit/f371413de47c33beb78c396f0da36bea5cdc62a0).
* This patch **fails to generate documentation**.
* This patch merges cleanly.
* This patch adds no public classes.



[GitHub] [spark] AmplabJenkins removed a comment on issue #25828: [SPARK-29141][SQL][TEST] Use SqlBasedBenchmark in SQL benchmarks

2019-09-18 Thread GitBox

AmplabJenkins removed a comment on issue #25828: [SPARK-29141][SQL][TEST] Use 
SqlBasedBenchmark in SQL benchmarks
URL: https://github.com/apache/spark/pull/25828#issuecomment-532571476
 
 
   Test PASSed.
   Refer to this link for build results (access rights to CI server needed): 
   
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder-K8s/16012/
   Test PASSed.



[GitHub] [spark] AmplabJenkins removed a comment on issue #24232: [SPARK-27297] [SQL] Add higher order functions to scala API

2019-09-18 Thread GitBox

AmplabJenkins removed a comment on issue #24232: [SPARK-27297] [SQL] Add higher 
order functions to scala API
URL: https://github.com/apache/spark/pull/24232#issuecomment-532571426
 
 
   Test FAILed.
   Refer to this link for build results (access rights to CI server needed): 
   https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/110890/
   Test FAILed.



[GitHub] [spark] AmplabJenkins removed a comment on issue #25828: [SPARK-29141][SQL][TEST] Use SqlBasedBenchmark in SQL benchmarks

2019-09-18 Thread GitBox

AmplabJenkins removed a comment on issue #25828: [SPARK-29141][SQL][TEST] Use 
SqlBasedBenchmark in SQL benchmarks
URL: https://github.com/apache/spark/pull/25828#issuecomment-532571465
 
 
   Merged build finished. Test PASSed.



[GitHub] [spark] cloud-fan commented on a change in pull request #25295: [SPARK-28560][SQL] Optimize shuffle reader to local shuffle reader when smj converted to bhj in adaptive execution

2019-09-18 Thread GitBox

cloud-fan commented on a change in pull request #25295: [SPARK-28560][SQL] 
Optimize shuffle reader to local shuffle reader when smj converted to bhj in 
adaptive execution
URL: https://github.com/apache/spark/pull/25295#discussion_r325536699
 
 

 ##
 File path: 
sql/core/src/main/scala/org/apache/spark/sql/execution/adaptive/OptimizedLocalShuffleReader.scala
 ##
 @@ -0,0 +1,124 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements.  See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License.  You may obtain a copy of the License at
+ *
+ *    http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.spark.sql.execution.adaptive
+
+import org.apache.spark.rdd.RDD
+import org.apache.spark.sql.catalyst.InternalRow
+import org.apache.spark.sql.catalyst.expressions.Attribute
+import org.apache.spark.sql.catalyst.plans.physical.{Partitioning, UnknownPartitioning}
+import org.apache.spark.sql.catalyst.rules.Rule
+import org.apache.spark.sql.execution.{SparkPlan, UnaryExecNode}
+import org.apache.spark.sql.execution.exchange.{EnsureRequirements, ShuffleExchangeExec}
+import org.apache.spark.sql.execution.joins.{BroadcastHashJoinExec}
+import org.apache.spark.sql.internal.SQLConf
+
+case class OptimizedLocalShuffleReader(conf: SQLConf) extends Rule[SparkPlan] {
+
+  private def setIsLocalToFalse(shuffleStage: QueryStageExec): QueryStageExec = {
+    shuffleStage match {
+      case stage: ShuffleQueryStageExec =>
+        stage.isLocalShuffle = false
+      case ReusedQueryStageExec(_, stage: ShuffleQueryStageExec, _) =>
+        stage.isLocalShuffle = false
+    }
+    shuffleStage
+  }
+
+  private def revertLocalShuffleReader(newPlan: SparkPlan): SparkPlan = {
+    val revertPlan = newPlan.transformUp {
+      case localReader: LocalShuffleReaderExec
+        if (ShuffleQueryStageExec.isShuffleQueryStageExec(localReader.child)) =>
+        setIsLocalToFalse(localReader.child)
+    }
+    revertPlan
+  }
+
+  override def apply(plan: SparkPlan): SparkPlan = {
+    // Collect the `BroadcastHashJoinExec` nodes and if isEmpty directly return.
+    val bhjs = plan.collect {
+      case bhj: BroadcastHashJoinExec => bhj
+    }
+
+    if (!conf.optimizedLocalShuffleReaderEnabled || bhjs.isEmpty) {
+      return plan
+    }
+
+    // If the streamedPlan is `ShuffleQueryStageExec`, set the value of `isLocalShuffle` to true
+    bhjs.map {
+      case bhj: BroadcastHashJoinExec =>
+        bhj.children map {
+          case stage: ShuffleQueryStageExec => stage.isLocalShuffle = true
+          case ReusedQueryStageExec(_, stage: ShuffleQueryStageExec, _) =>
+            stage.isLocalShuffle = true
+          case plan: SparkPlan => plan
+        }
+    }
+
+    // Add the new `LocalShuffleReaderExec` node if the value of `isLocalShuffle` is true
+    val newPlan = plan.transformUp {
+      case stage: ShuffleQueryStageExec if (stage.isLocalShuffle) =>
+        LocalShuffleReaderExec(stage)
+      case ReusedQueryStageExec(_, stage: ShuffleQueryStageExec, _) if (stage.isLocalShuffle) =>
+        LocalShuffleReaderExec(stage)
+    }
+
+    val afterEnsureRequirements = EnsureRequirements(conf).apply(newPlan)
+    val numExchanges = afterEnsureRequirements.collect {
+      case e: ShuffleExchangeExec => e
+    }.length
+    if (numExchanges > 0) {
+      logWarning("Local shuffle reader optimization is not applied due" +
+        " to additional shuffles will be introduced.")
+      revertLocalShuffleReader(newPlan)
+    } else {
+      newPlan
+    }
+  }
+}
+
+case class LocalShuffleReaderExec(
+    child: QueryStageExec) extends UnaryExecNode {
 
 Review comment:
   We can make it a leaf node to hide its `QueryStageExec`. We don't expect any 
other rules to change the underlying shuffle stage.
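   
   A minimal sketch of the leaf-node idea, assuming a toy plan tree rather than 
Spark's `SparkPlan`. Every name below (`Node`, `Stage`, `UnaryReader`, 
`LeafReader`, `flipLocal`) is hypothetical and only illustrates why a bottom-up 
rewrite cannot reach a stage that is held as a plain field instead of a child:

```scala
object LeafVsUnarySketch {
  // Toy plan tree, NOT Spark's SparkPlan: just enough to show how tree
  // rewrites discover children.
  sealed trait Node {
    def children: Seq[Node]
    def withNewChildren(newChildren: Seq[Node]): Node

    // Bottom-up rewrite: transform the children first, then this node.
    def transformUp(rule: PartialFunction[Node, Node]): Node = {
      val rewritten = withNewChildren(children.map(_.transformUp(rule)))
      rule.applyOrElse(rewritten, identity[Node])
    }
  }

  final case class Stage(id: Int, isLocal: Boolean) extends Node {
    def children: Seq[Node] = Nil
    def withNewChildren(newChildren: Seq[Node]): Node = this
  }

  // Unary variant: the wrapped stage is a child, so every rule can see it.
  final case class UnaryReader(child: Node) extends Node {
    def children: Seq[Node] = Seq(child)
    def withNewChildren(newChildren: Seq[Node]): Node = copy(child = newChildren.head)
  }

  // Leaf variant: the stage is a plain field, so rules never descend into it.
  final case class LeafReader(stage: Stage) extends Node {
    def children: Seq[Node] = Nil
    def withNewChildren(newChildren: Seq[Node]): Node = this
  }

  // A rule that rewrites any stage it can reach.
  val flipLocal: PartialFunction[Node, Node] = {
    case s: Stage => s.copy(isLocal = false)
  }

  val unaryResult: Node = UnaryReader(Stage(1, isLocal = true)).transformUp(flipLocal)
  val leafResult: Node = LeafReader(Stage(1, isLocal = true)).transformUp(flipLocal)
}
```

   In this sketch the rule changes the stage under `UnaryReader` but leaves the 
one inside `LeafReader` untouched, which is the hiding effect described above.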



[GitHub] [spark] HeartSaVioR commented on a change in pull request #25760: [SPARK-29054][SS] Invalidate Kafka consumer when new delegation token available

2019-09-18 Thread GitBox

HeartSaVioR commented on a change in pull request #25760: [SPARK-29054][SS] 
Invalidate Kafka consumer when new delegation token available
URL: https://github.com/apache/spark/pull/25760#discussion_r325532165
 
 

 ##
 File path: 
external/kafka-0-10-sql/src/main/scala/org/apache/spark/sql/kafka010/KafkaDataConsumer.scala
 ##
 @@ -516,13 +521,41 @@ private[kafka010] class KafkaDataConsumer(
     fetchedData.withNewPoll(records.listIterator, offsetAfterPoll)
   }
 
-  private def getOrRetrieveConsumer(): InternalKafkaConsumer = _consumer match {
-    case None =>
-      _consumer = Option(consumerPool.borrowObject(cacheKey, kafkaParams))
-      require(_consumer.isDefined, "borrowing consumer from pool must always succeed.")
-      _consumer.get
+  private[kafka010] def getOrRetrieveConsumer(): InternalKafkaConsumer = {
+    if (!_consumer.isDefined) {
+      retrieveConsumer()
+    }
+    ensureConsumerHasLatestToken()
+    _consumer.get
+  }
 
-    case Some(consumer) => consumer
+  private def retrieveConsumer(): Unit = {
+    _consumer = Option(consumerPool.borrowObject(cacheKey, kafkaParams))
+    require(_consumer.isDefined, "borrowing consumer from pool must always succeed.")
+  }
+
+  private def ensureConsumerHasLatestToken(): Unit = {
+    require(_consumer.isDefined, "Consumer must be defined")
+    val params = _consumer.get.kafkaParamsWithSecurity
+    if (params.containsKey(SaslConfigs.SASL_JAAS_CONFIG)) {
+      logDebug("Delegation token used by cached consumer, checking if uses the latest token.")
+
+      val jaasParams = params.get(SaslConfigs.SASL_JAAS_CONFIG).asInstanceOf[String]
+      val clusterConfig = KafkaTokenUtil.findMatchingToken(SparkEnv.get.conf,
 
 Review comment:
   I feel `findMatchingToken` does too many things; that's why it needs to 
return a tuple, even though each caller seems to use only one element of it. It 
may be better to split `findMatchingToken` in two: keep the same name for the 
token lookup, and use a new name (like `findMatchingTokenConf`) for the config 
lookup.
   
   And given we have TokenUtil, why not let TokenUtil tell whether the cached 
consumer uses a fresh delegation token? KafkaDataConsumer seems to handle too 
many things on its own.
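   
   A rough sketch of what such a split could look like. The types, the lookup 
data, and `usesLatestToken` are all hypothetical stand-ins made up for 
illustration; they are not the actual `KafkaTokenUtil` API, and the real 
freshness check may compare something other than a secret string:

```scala
object TokenLookupSketch {
  // Hypothetical stand-ins; these are not the real Spark types.
  final case class ClusterConf(identifier: String, bootstrapServers: String)
  final case class Token(service: String, secret: String)

  // Made-up lookup data so the sketch is self-contained.
  private val clusters = Seq(
    ClusterConf("cluster1", "host1:9092"),
    ClusterConf("cluster2", "host2:9092"))
  private val tokens = Map("cluster1" -> Token("cluster1", "s3cret"))

  // One lookup per concern, so no caller has to unpack a tuple.
  def findMatchingTokenConf(bootstrapServers: String): Option[ClusterConf] =
    clusters.find(_.bootstrapServers == bootstrapServers)

  def findMatchingToken(bootstrapServers: String): Option[Token] =
    findMatchingTokenConf(bootstrapServers).flatMap(c => tokens.get(c.identifier))

  // The freshness check lives next to the token lookup, not in the consumer.
  def usesLatestToken(bootstrapServers: String, secretInUse: String): Boolean =
    findMatchingToken(bootstrapServers).exists(_.secret == secretInUse)
}
```

   With this shape the consumer would only need to ask 
`usesLatestToken(...)` and invalidate itself when the answer is false.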



[GitHub] [spark] HeartSaVioR commented on a change in pull request #25760: [SPARK-29054][SS] Invalidate Kafka consumer when new delegation token available

2019-09-18 Thread GitBox

HeartSaVioR commented on a change in pull request #25760: [SPARK-29054][SS] 
Invalidate Kafka consumer when new delegation token available
URL: https://github.com/apache/spark/pull/25760#discussion_r325534687
 
 

 ##
 File path: 
external/kafka-0-10-sql/src/test/scala/org/apache/spark/sql/kafka010/KafkaDataConsumerSuite.scala
 ##
 @@ -96,47 +103,86 @@ class KafkaDataConsumerSuite extends SharedSparkSession with PrivateMethodTester
 
       val context1 = new TaskContextImpl(0, 0, 0, 0, 0, null, null, null)
       TaskContext.setTaskContext(context1)
-      val consumer1 = KafkaDataConsumer.acquire(topicPartition, kafkaParams)
-
-      // any method call which requires consumer is necessary
-      consumer1.getAvailableOffsetRange()
-
-      val consumer1Underlying = consumer1._consumer
-      assert(consumer1Underlying.isDefined)
-
-      consumer1.release()
-
-      assert(consumerPool.size(key) === 1)
-      // check whether acquired object is available in pool
-      val pooledObj = consumerPool.borrowObject(key, kafkaParams)
-      assert(consumer1Underlying.get.eq(pooledObj))
-      consumerPool.returnObject(pooledObj)
+      val consumer1Underlying = initSingleConsumer(kafkaParams, key)
 
       val context2 = new TaskContextImpl(0, 0, 0, 0, 1, null, null, null)
       TaskContext.setTaskContext(context2)
-      val consumer2 = KafkaDataConsumer.acquire(topicPartition, kafkaParams)
-
-      // any method call which requires consumer is necessary
-      consumer2.getAvailableOffsetRange()
+      val consumer2Underlying = initSingleConsumer(kafkaParams, key)
 
-      val consumer2Underlying = consumer2._consumer
-      assert(consumer2Underlying.isDefined)
       // here we expect different consumer as pool will invalidate for task reattempt
       assert(consumer2Underlying.get.ne(consumer1Underlying.get))
+    } finally {
+      TaskContext.unset()
+    }
+  }
 
-      consumer2.release()
+  test("same KafkaDataConsumer instance in case of same token") {
+    try {
+      val kafkaParams = getKafkaParams()
+      val key = new CacheKey(groupId, topicPartition)
 
-      // The first consumer should be removed from cache, but the consumer after invalidate
 
 Review comment:
   These lines of code were removed. Does that mean the patch breaks the 
verification here?



[GitHub] [spark] cloud-fan commented on a change in pull request #25295: [SPARK-28560][SQL] Optimize shuffle reader to local shuffle reader when smj converted to bhj in adaptive execution

2019-09-18 Thread GitBox

cloud-fan commented on a change in pull request #25295: [SPARK-28560][SQL] 
Optimize shuffle reader to local shuffle reader when smj converted to bhj in 
adaptive execution
URL: https://github.com/apache/spark/pull/25295#discussion_r325538188
 
 

 ##
 File path: 
sql/core/src/main/scala/org/apache/spark/sql/execution/adaptive/LocalShuffledRowRDD.scala
 ##
 @@ -0,0 +1,118 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements.  See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License.  You may obtain a copy of the License at
+ *
+ *    http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.spark.sql.execution.adaptive
+
+import org.apache.spark._
+import org.apache.spark.rdd.{RDD, ShuffledRDDPartition}
+import org.apache.spark.sql.catalyst.InternalRow
+import org.apache.spark.sql.execution.metric.{SQLMetric, SQLShuffleReadMetricsReporter}
+
+/**
+ * This is a specialized version of [[org.apache.spark.sql.execution.ShuffledRowRDD]]. This is used
+ * in Spark SQL adaptive execution when a shuffle join is converted to broadcast join at runtime
+ * because the map output of one input table is small enough for broadcast. This RDD represents the
+ * data of another input table of the join that reads from shuffle. Each partition of the RDD reads
+ * the whole data from just one mapper output locally. So actually there is no data transferred
+ * from the network.
+
+ * This RDD takes a [[ShuffleDependency]] (`dependency`).
+ *
+ * The `dependency` has the parent RDD of this RDD, which represents the dataset before shuffle
+ * (i.e. map output). Elements of this RDD are (partitionId, Row) pairs.
+ * Partition ids should be in the range [0, numPartitions - 1].
+ * `dependency.partitioner.numPartitions` is the number of pre-shuffle partitions. (i.e. the number
+ * of partitions of the map output). The post-shuffle partition number is the same to the parent
+ * RDD's partition number.
+ */
+class LocalShuffledRowRDD(
+ var dependency: ShuffleDependency[Int, InternalRow, InternalRow],
+ metrics: Map[String, SQLMetric],
+ specifiedPartitionStartIndices: Option[Array[Int]] = None,
+ specifiedPartitionEndIndices: Option[Array[Int]] = None)
 
 Review comment:
   Then let's add it when you propose this optimization.
   
   From my side, I think it may be beneficial to keep empty tasks, so that the 
local shuffle reader node can retain the output partitioning from the original 
plan and help us eliminate shuffles.



[GitHub] [spark] SparkQA commented on issue #24885: [SPARK-28040][R] Add serialization for glue type

2019-09-18 Thread GitBox

SparkQA commented on issue #24885: [SPARK-28040][R] Add serialization for glue 
type
URL: https://github.com/apache/spark/pull/24885#issuecomment-532573629
 
 
   **[Test build #110893 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/110893/testReport)**
 for PR 24885 at commit 
[`475d877`](https://github.com/apache/spark/commit/475d877e5ea61b97e2ec5681ef54a5fd9f8ed0f4).



[GitHub] [spark] SparkQA commented on issue #25820: [SPARK-29101][SQL] Fix count API for csv file when DROPMALFORMED mode is selected

2019-09-18 Thread GitBox

SparkQA commented on issue #25820: [SPARK-29101][SQL] Fix count API for csv 
file when DROPMALFORMED mode is selected
URL: https://github.com/apache/spark/pull/25820#issuecomment-532573588
 
 
   **[Test build #110892 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/110892/testReport)**
 for PR 25820 at commit 
[`f6a01ff`](https://github.com/apache/spark/commit/f6a01ffb6ae6dc4c99301214cfa697d19caf2559).



[GitHub] [spark] AmplabJenkins commented on issue #24885: [SPARK-28040][R] Add serialization for glue type

2019-09-18 Thread GitBox

AmplabJenkins commented on issue #24885: [SPARK-28040][R] Add serialization for 
glue type
URL: https://github.com/apache/spark/pull/24885#issuecomment-532574071
 
 
   Merged build finished. Test PASSed.



[GitHub] [spark] AmplabJenkins commented on issue #25820: [SPARK-29101][SQL] Fix count API for csv file when DROPMALFORMED mode is selected

2019-09-18 Thread GitBox

AmplabJenkins commented on issue #25820: [SPARK-29101][SQL] Fix count API for 
csv file when DROPMALFORMED mode is selected
URL: https://github.com/apache/spark/pull/25820#issuecomment-532574103
 
 
   Test PASSed.
   Refer to this link for build results (access rights to CI server needed): 
   
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder-K8s/16013/
   Test PASSed.



[GitHub] [spark] AmplabJenkins commented on issue #24885: [SPARK-28040][R] Add serialization for glue type

2019-09-18 Thread GitBox

AmplabJenkins commented on issue #24885: [SPARK-28040][R] Add serialization for 
glue type
URL: https://github.com/apache/spark/pull/24885#issuecomment-532574075
 
 
   Test PASSed.
   Refer to this link for build results (access rights to CI server needed): 
   
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder-K8s/16014/
   Test PASSed.



[GitHub] [spark] AmplabJenkins commented on issue #25820: [SPARK-29101][SQL] Fix count API for csv file when DROPMALFORMED mode is selected

2019-09-18 Thread GitBox

AmplabJenkins commented on issue #25820: [SPARK-29101][SQL] Fix count API for 
csv file when DROPMALFORMED mode is selected
URL: https://github.com/apache/spark/pull/25820#issuecomment-532574097
 
 
   Merged build finished. Test PASSed.



[GitHub] [spark] AmplabJenkins removed a comment on issue #25820: [SPARK-29101][SQL] Fix count API for csv file when DROPMALFORMED mode is selected

2019-09-18 Thread GitBox

AmplabJenkins removed a comment on issue #25820: [SPARK-29101][SQL] Fix count 
API for csv file when DROPMALFORMED mode is selected
URL: https://github.com/apache/spark/pull/25820#issuecomment-532574103
 
 
   Test PASSed.
   Refer to this link for build results (access rights to CI server needed): 
   
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder-K8s/16013/
   Test PASSed.



[GitHub] [spark] AmplabJenkins removed a comment on issue #24885: [SPARK-28040][R] Add serialization for glue type

2019-09-18 Thread GitBox

AmplabJenkins removed a comment on issue #24885: [SPARK-28040][R] Add 
serialization for glue type
URL: https://github.com/apache/spark/pull/24885#issuecomment-532574071
 
 
   Merged build finished. Test PASSed.


[GitHub] [spark] AmplabJenkins removed a comment on issue #25820: [SPARK-29101][SQL] Fix count API for csv file when DROPMALFORMED mode is selected

2019-09-18 Thread GitBox

AmplabJenkins removed a comment on issue #25820: [SPARK-29101][SQL] Fix count 
API for csv file when DROPMALFORMED mode is selected
URL: https://github.com/apache/spark/pull/25820#issuecomment-532574097
 
 
   Merged build finished. Test PASSed.


[GitHub] [spark] AmplabJenkins removed a comment on issue #24885: [SPARK-28040][R] Add serialization for glue type

2019-09-18 Thread GitBox

AmplabJenkins removed a comment on issue #24885: [SPARK-28040][R] Add 
serialization for glue type
URL: https://github.com/apache/spark/pull/24885#issuecomment-532574075
 
 
   Test PASSed.
   Refer to this link for build results (access rights to CI server needed): 
   
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder-K8s/16014/
   Test PASSed.


[GitHub] [spark] cloud-fan commented on a change in pull request #25295: [SPARK-28560][SQL] Optimize shuffle reader to local shuffle reader when smj converted to bhj in adaptive execution

2019-09-18 Thread GitBox

cloud-fan commented on a change in pull request #25295: [SPARK-28560][SQL] 
Optimize shuffle reader to local shuffle reader when smj converted to bhj in 
adaptive execution
URL: https://github.com/apache/spark/pull/25295#discussion_r325539965
 
 

 ##
 File path: 
sql/core/src/main/scala/org/apache/spark/sql/execution/adaptive/LocalShuffledRowRDD.scala
 ##
 @@ -0,0 +1,118 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements.  See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License.  You may obtain a copy of the License at
+ *
+ *http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.spark.sql.execution.adaptive
+
+import org.apache.spark._
+import org.apache.spark.rdd.{RDD, ShuffledRDDPartition}
+import org.apache.spark.sql.catalyst.InternalRow
+import org.apache.spark.sql.execution.metric.{SQLMetric, 
SQLShuffleReadMetricsReporter}
+
+/**
+ * This is a specialized version of 
[[org.apache.spark.sql.execution.ShuffledRowRDD]]. This is used
+ * in Spark SQL adaptive execution when a shuffle join is converted to 
broadcast join at runtime
+ * because the map output of one input table is small enough for broadcast. 
This RDD represents the
+ * data of another input table of the join that reads from shuffle. Each 
partition of the RDD reads
+ * the whole data from just one mapper output locally. So actually there is no 
data transferred
+ * from the network.
+
+ * This RDD takes a [[ShuffleDependency]] (`dependency`).
+ *
+ * The `dependency` has the parent RDD of this RDD, which represents the 
dataset before shuffle
+ * (i.e. map output). Elements of this RDD are (partitionId, Row) pairs.
+ * Partition ids should be in the range [0, numPartitions - 1].
+ * `dependency.partitioner.numPartitions` is the number of pre-shuffle 
partitions. (i.e. the number
+ * of partitions of the map output). The post-shuffle partition number is the 
same to the parent
+ * RDD's partition number.
+ */
+class LocalShuffledRowRDD(
+ var dependency: ShuffleDependency[Int, InternalRow, InternalRow],
+ metrics: Map[String, SQLMetric],
+ specifiedPartitionStartIndices: Option[Array[Int]] = None,
+ specifiedPartitionEndIndices: Option[Array[Int]] = None)
+  extends RDD[InternalRow](dependency.rdd.context, Nil) {
+
+  private[this] val numPreShufflePartitions = 
dependency.partitioner.numPartitions
+  private[this] val numPostShufflePartitions = dependency.rdd.partitions.length
 
 Review comment:
   The name is wrong. This is the # of mappers and thus should be called 
`numPreShufflePartitions`


[GitHub] [spark] cloud-fan commented on a change in pull request #25295: [SPARK-28560][SQL] Optimize shuffle reader to local shuffle reader when smj converted to bhj in adaptive execution

2019-09-18 Thread GitBox

cloud-fan commented on a change in pull request #25295: [SPARK-28560][SQL] 
Optimize shuffle reader to local shuffle reader when smj converted to bhj in 
adaptive execution
URL: https://github.com/apache/spark/pull/25295#discussion_r325540713
 
 

 ##
 File path: core/src/main/scala/org/apache/spark/MapOutputTracker.scala
 ##
 @@ -629,6 +645,35 @@ private[spark] class MapOutputTrackerMaster(
 None
   }
 
+  /**
+   * Return the locations where the Mapper(s) ran. The locations each includes 
both a host and an
+   * executor id on that host.
+   *
+   * @param dep shuffle dependency object
+   * @param startMapId the start map id
+   * @param endMapId the end map id
+   * @return a sequence of locations that each includes both a host and an 
executor id on that
 
 Review comment:
   Why not return `ExecutorCacheTaskLocation`?


[GitHub] [spark] cloud-fan commented on a change in pull request #25295: [SPARK-28560][SQL] Optimize shuffle reader to local shuffle reader when smj converted to bhj in adaptive execution

2019-09-18 Thread GitBox

cloud-fan commented on a change in pull request #25295: [SPARK-28560][SQL] 
Optimize shuffle reader to local shuffle reader when smj converted to bhj in 
adaptive execution
URL: https://github.com/apache/spark/pull/25295#discussion_r325542826
 
 

 ##
 File path: core/src/main/scala/org/apache/spark/MapOutputTracker.scala
 ##
 @@ -629,6 +645,35 @@ private[spark] class MapOutputTrackerMaster(
 None
   }
 
+  /**
+   * Return the locations where the Mapper(s) ran. The locations each includes 
both a host and an
+   * executor id on that host.
+   *
+   * @param dep shuffle dependency object
+   * @param startMapId the start map id
+   * @param endMapId the end map id
+   * @return a sequence of locations that each includes both a host and an 
executor id on that
 
 Review comment:
   `includes both a host and an executor id` is confusing. We can just say 
`task location string (please refer to TaskLocation)`


[GitHub] [spark] zhengruifeng commented on a change in pull request #25812: [SPARK-22796][PYTHON][ML] Add multiple columns support to PySpark QuantileDiscretizer

2019-09-18 Thread GitBox

zhengruifeng commented on a change in pull request #25812: 
[SPARK-22796][PYTHON][ML] Add multiple columns support to PySpark 
QuantileDiscretizer
URL: https://github.com/apache/spark/pull/25812#discussion_r325544161
 
 

 ##
 File path: python/pyspark/ml/feature.py
 ##
 @@ -2082,10 +2144,19 @@ def _create_model(self, java_model):
 """
 Private method to convert the java_model to a Python model.
 """
-return Bucketizer(splits=list(java_model.getSplits()),
-  inputCol=self.getInputCol(),
-  outputCol=self.getOutputCol(),
-  handleInvalid=self.getHandleInvalid())
+if (self.isSet(self.inputCol)):
+return Bucketizer(splits=list(java_model.getSplits()),
+  inputCol=self.getInputCol(),
+  outputCol=self.getOutputCol(),
+  handleInvalid=self.getHandleInvalid())
+else:
+splitsArrayList = []
+for x in list(java_model.getSplitsArray()):
+splitsArrayList.append(list(x))
 
 Review comment:
   What about `splitsArrayList = [list(x) for x in list(java_model.getSplitsArray())]`?
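
To see that the suggested comprehension is a drop-in replacement for the loop in the diff, here is a minimal sketch in plain Python. It assumes only that `getSplitsArray()` yields an iterable of iterables; `splits_array` below is a hypothetical stand-in with made-up values, not output from a real model.

```python
# Hypothetical stand-in for java_model.getSplitsArray(): an iterable of iterables.
splits_array = (
    (float("-inf"), 0.0, float("inf")),
    (float("-inf"), 1.5, float("inf")),
)

# Loop form, as written in the diff under review.
splits_loop = []
for x in list(splits_array):
    splits_loop.append(list(x))

# Comprehension form, as suggested in the review comment.
splits_comp = [list(x) for x in list(splits_array)]

# Both forms build the same list of lists.
assert splits_loop == splits_comp
```

The two forms are equivalent here; the comprehension simply removes the temporary accumulator and the explicit `append` call.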


[GitHub] [spark] SparkQA commented on issue #24901: [SPARK-28091[CORE] Extend Spark metrics system with user-defined metrics using executor plugins

2019-09-18 Thread GitBox

SparkQA commented on issue #24901: [SPARK-28091[CORE] Extend Spark metrics 
system with user-defined metrics using executor plugins
URL: https://github.com/apache/spark/pull/24901#issuecomment-532579027
 
 
   **[Test build #110894 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/110894/testReport)**
 for PR 24901 at commit 
[`886d293`](https://github.com/apache/spark/commit/886d29302f94d797f25bc6b75372a2625ea456ce).


[GitHub] [spark] cloud-fan commented on a change in pull request #25295: [SPARK-28560][SQL] Optimize shuffle reader to local shuffle reader when smj converted to bhj in adaptive execution

2019-09-18 Thread GitBox

cloud-fan commented on a change in pull request #25295: [SPARK-28560][SQL] 
Optimize shuffle reader to local shuffle reader when smj converted to bhj in 
adaptive execution
URL: https://github.com/apache/spark/pull/25295#discussion_r325545171
 
 

 ##
 File path: 
sql/core/src/main/scala/org/apache/spark/sql/execution/adaptive/LocalShuffledRowRDD.scala
 ##
 @@ -0,0 +1,118 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements.  See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License.  You may obtain a copy of the License at
+ *
+ *http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.spark.sql.execution.adaptive
+
+import org.apache.spark._
+import org.apache.spark.rdd.{RDD, ShuffledRDDPartition}
+import org.apache.spark.sql.catalyst.InternalRow
+import org.apache.spark.sql.execution.metric.{SQLMetric, 
SQLShuffleReadMetricsReporter}
+
+/**
+ * This is a specialized version of 
[[org.apache.spark.sql.execution.ShuffledRowRDD]]. This is used
+ * in Spark SQL adaptive execution when a shuffle join is converted to 
broadcast join at runtime
+ * because the map output of one input table is small enough for broadcast. 
This RDD represents the
+ * data of another input table of the join that reads from shuffle. Each 
partition of the RDD reads
+ * the whole data from just one mapper output locally. So actually there is no 
data transferred
+ * from the network.
+
+ * This RDD takes a [[ShuffleDependency]] (`dependency`).
+ *
+ * The `dependency` has the parent RDD of this RDD, which represents the 
dataset before shuffle
+ * (i.e. map output). Elements of this RDD are (partitionId, Row) pairs.
+ * Partition ids should be in the range [0, numPartitions - 1].
+ * `dependency.partitioner.numPartitions` is the number of pre-shuffle 
partitions. (i.e. the number
+ * of partitions of the map output). The post-shuffle partition number is the 
same to the parent
+ * RDD's partition number.
+ */
+class LocalShuffledRowRDD(
+ var dependency: ShuffleDependency[Int, InternalRow, InternalRow],
+ metrics: Map[String, SQLMetric],
+ specifiedPartitionStartIndices: Option[Array[Int]] = None,
+ specifiedPartitionEndIndices: Option[Array[Int]] = None)
+  extends RDD[InternalRow](dependency.rdd.context, Nil) {
+
+  private[this] val numPreShufflePartitions = 
dependency.partitioner.numPartitions
+  private[this] val numPostShufflePartitions = dependency.rdd.partitions.length
+
+  private[this] val partitionStartIndices: Array[Int] = 
specifiedPartitionStartIndices match {
+case Some(indices) => indices
+case None => Array(0)
+  }
+
+  private[this] val partitionEndIndices: Array[Int] = 
specifiedPartitionEndIndices match {
+case Some(indices) => indices
+case None if specifiedPartitionStartIndices.isEmpty => 
Array(numPreShufflePartitions)
+case _ => specifiedPartitionStartIndices.get.drop(1) :+ 
numPreShufflePartitions
+  }
+
+  override def getDependencies: Seq[Dependency[_]] = List(dependency)
+
+  override def getPartitions: Array[Partition] = {
+assert(partitionStartIndices.length == partitionEndIndices.length)
+Array.tabulate[Partition](numPostShufflePartitions) { i =>
+  new ShuffledRDDPartition(i)
+}
+  }
+
+  override def getPreferredLocations(partition: Partition): Seq[String] = {
+val tracker = 
SparkEnv.get.mapOutputTracker.asInstanceOf[MapOutputTrackerMaster]
+val dep = dependencies.head.asInstanceOf[ShuffleDependency[_, _, _]]
+tracker.getMapLocation(dep, partition.index, partition.index + 1)
+  }
+
+  override def compute(split: Partition, context: TaskContext): 
Iterator[InternalRow] = {
+val shuffledRowPartition = split.asInstanceOf[ShuffledRDDPartition]
+val mapId = shuffledRowPartition.index
+val tempMetrics = context.taskMetrics().createTempShuffleReadMetrics()
+// `SQLShuffleReadMetricsReporter` will update its own metrics for SQL 
exchange operator,
+// as well as the `tempMetrics` for basic shuffle metrics.
+val sqlMetricsReporter = new SQLShuffleReadMetricsReporter(tempMetrics, 
metrics)
+// Connect the the InternalRows read by each ShuffleReader
+new Iterator[InternalRow] {
+  val readers = partitionStartIndices.zip(partitionEndIndices).map { case 
(start, end) =>
 
 Review comment:
   I get your point that some shuffle blocks are empty and we should skip them, 
but I think this

[GitHub] [spark] AmplabJenkins commented on issue #24901: [SPARK-28091[CORE] Extend Spark metrics system with user-defined metrics using executor plugins

2019-09-18 Thread GitBox

AmplabJenkins commented on issue #24901: [SPARK-28091[CORE] Extend Spark 
metrics system with user-defined metrics using executor plugins
URL: https://github.com/apache/spark/pull/24901#issuecomment-532579534
 
 
   Merged build finished. Test PASSed.


[GitHub] [spark] AmplabJenkins commented on issue #24901: [SPARK-28091[CORE] Extend Spark metrics system with user-defined metrics using executor plugins

2019-09-18 Thread GitBox

AmplabJenkins commented on issue #24901: [SPARK-28091[CORE] Extend Spark 
metrics system with user-defined metrics using executor plugins
URL: https://github.com/apache/spark/pull/24901#issuecomment-532579544
 
 
   Test PASSed.
   Refer to this link for build results (access rights to CI server needed): 
   
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder-K8s/16015/
   Test PASSed.


[GitHub] [spark] AmplabJenkins commented on issue #24885: [SPARK-28040][R] Add serialization for glue type

2019-09-18 Thread GitBox

AmplabJenkins commented on issue #24885: [SPARK-28040][R] Add serialization for 
glue type
URL: https://github.com/apache/spark/pull/24885#issuecomment-532579706
 
 
   Merged build finished. Test FAILed.


[GitHub] [spark] SparkQA commented on issue #24885: [SPARK-28040][R] Add serialization for glue type

2019-09-18 Thread GitBox

SparkQA commented on issue #24885: [SPARK-28040][R] Add serialization for glue 
type
URL: https://github.com/apache/spark/pull/24885#issuecomment-532579693
 
 
   **[Test build #110893 has 
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/110893/testReport)**
 for PR 24885 at commit 
[`475d877`](https://github.com/apache/spark/commit/475d877e5ea61b97e2ec5681ef54a5fd9f8ed0f4).
* This patch **fails R style tests**.
* This patch merges cleanly.
* This patch adds no public classes.


[GitHub] [spark] AmplabJenkins commented on issue #24885: [SPARK-28040][R] Add serialization for glue type

2019-09-18 Thread GitBox

AmplabJenkins commented on issue #24885: [SPARK-28040][R] Add serialization for 
glue type
URL: https://github.com/apache/spark/pull/24885#issuecomment-532579718
 
 
   Test FAILed.
   Refer to this link for build results (access rights to CI server needed): 
   https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/110893/
   Test FAILed.


[GitHub] [spark] SparkQA removed a comment on issue #24885: [SPARK-28040][R] Add serialization for glue type

2019-09-18 Thread GitBox

SparkQA removed a comment on issue #24885: [SPARK-28040][R] Add serialization 
for glue type
URL: https://github.com/apache/spark/pull/24885#issuecomment-532573629
 
 
   **[Test build #110893 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/110893/testReport)**
 for PR 24885 at commit 
[`475d877`](https://github.com/apache/spark/commit/475d877e5ea61b97e2ec5681ef54a5fd9f8ed0f4).


[GitHub] [spark] AmplabJenkins removed a comment on issue #24885: [SPARK-28040][R] Add serialization for glue type

2019-09-18 Thread GitBox

AmplabJenkins removed a comment on issue #24885: [SPARK-28040][R] Add 
serialization for glue type
URL: https://github.com/apache/spark/pull/24885#issuecomment-532579706
 
 
   Merged build finished. Test FAILed.


[GitHub] [spark] AmplabJenkins removed a comment on issue #24901: [SPARK-28091[CORE] Extend Spark metrics system with user-defined metrics using executor plugins

2019-09-18 Thread GitBox

AmplabJenkins removed a comment on issue #24901: [SPARK-28091[CORE] Extend 
Spark metrics system with user-defined metrics using executor plugins
URL: https://github.com/apache/spark/pull/24901#issuecomment-532579544
 
 
   Test PASSed.
   Refer to this link for build results (access rights to CI server needed): 
   
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder-K8s/16015/
   Test PASSed.


[GitHub] [spark] AmplabJenkins removed a comment on issue #24901: [SPARK-28091[CORE] Extend Spark metrics system with user-defined metrics using executor plugins

2019-09-18 Thread GitBox

AmplabJenkins removed a comment on issue #24901: [SPARK-28091[CORE] Extend 
Spark metrics system with user-defined metrics using executor plugins
URL: https://github.com/apache/spark/pull/24901#issuecomment-532579534
 
 
   Merged build finished. Test PASSed.


[GitHub] [spark] AmplabJenkins removed a comment on issue #24885: [SPARK-28040][R] Add serialization for glue type

2019-09-18 Thread GitBox

AmplabJenkins removed a comment on issue #24885: [SPARK-28040][R] Add 
serialization for glue type
URL: https://github.com/apache/spark/pull/24885#issuecomment-532579718
 
 
   Test FAILed.
   Refer to this link for build results (access rights to CI server needed): 
   https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/110893/
   Test FAILed.


[GitHub] [spark] zhengruifeng closed pull request #20732: [SPARK-23578][ML] Add multicolumn support for Binarizer

2019-09-18 Thread GitBox

zhengruifeng closed pull request #20732: [SPARK-23578][ML] Add multicolumn 
support for Binarizer
URL: https://github.com/apache/spark/pull/20732
 
 
   


[GitHub] [spark] zhengruifeng commented on issue #20732: [SPARK-23578][ML] Add multicolumn support for Binarizer

2019-09-18 Thread GitBox

zhengruifeng commented on issue #20732: [SPARK-23578][ML] Add multicolumn 
support for Binarizer
URL: https://github.com/apache/spark/pull/20732#issuecomment-532581279
 
 
   `Bucketizer` has supported multiple columns since SPARK-20542.


[GitHub] [spark] cloud-fan commented on issue #25795: [WIP][SPARK-29037][Core] Spark gives duplicate result when an application was killed

2019-09-18 Thread GitBox

cloud-fan commented on issue #25795: [WIP][SPARK-29037][Core] Spark gives 
duplicate result when an application was killed
URL: https://github.com/apache/spark/pull/25795#issuecomment-532581987
 
 
   SGTM


[GitHub] [spark] SparkQA commented on issue #24901: [SPARK-28091[CORE] Extend Spark metrics system with user-defined metrics using executor plugins

2019-09-18 Thread GitBox

SparkQA commented on issue #24901: [SPARK-28091[CORE] Extend Spark metrics 
system with user-defined metrics using executor plugins
URL: https://github.com/apache/spark/pull/24901#issuecomment-532581934
 
 
   **[Test build #110895 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/110895/testReport)**
 for PR 24901 at commit 
[`ea7a5df`](https://github.com/apache/spark/commit/ea7a5df22e66ae618f70bd48965a9c179b64f366).


[GitHub] [spark] HyukjinKwon commented on a change in pull request #25820: [SPARK-29101][SQL] Fix count API for csv file when DROPMALFORMED mode is selected

2019-09-18 Thread GitBox

HyukjinKwon commented on a change in pull request #25820: [SPARK-29101][SQL] 
Fix count API for csv file when DROPMALFORMED mode is selected
URL: https://github.com/apache/spark/pull/25820#discussion_r325548784
 
 

 ##
 File path: 
sql/core/src/test/scala/org/apache/spark/sql/execution/datasources/csv/CSVSuite.scala
 ##
 @@ -2109,4 +2110,17 @@ class CSVSuite extends QueryTest with 
SharedSparkSession with TestCsvData {
 "expect the TextParsingException truncate the error content to be 1000 
length.")
 }
   }
+
+  test("SPARK-29101 test count with DROPMALFORMED mode") {
+Seq((true, 4), (false, 3)).foreach { record =>
 
 Review comment:
   Sorry, one more nit:
   
   ```scala
   Seq((true, 4), (false, 3)).foreach { case (csvColumnPruning, count) =>
     withSQLConf(SQLConf.CSV_PARSER_COLUMN_PRUNING.key -> csvColumnPruning.toString) {
       ...
       assert(record._2 == count)
     }
   }
   ```
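
For readers more familiar with Python, the nit above amounts to binding names to the tuple fields where the loop variable is introduced, rather than indexing into the tuple. A rough analogue follows, assuming nothing beyond the two (flag, expected count) pairs from the test; the variable names are illustrative and not taken from the PR.

```python
# (column pruning enabled?, expected row count) pairs taken from the test under review.
cases = [(True, 4), (False, 3)]

# Indexing form: the reader must remember what positions 0 and 1 hold.
indexed = [f"pruning={record[0]} expects {record[1]} rows" for record in cases]

# Unpacking form: each field is named where the loop variable is introduced.
unpacked = [
    f"pruning={csv_column_pruning} expects {count} rows"
    for csv_column_pruning, count in cases
]

# Both forms describe the same cases; only readability differs.
assert indexed == unpacked
```

Naming the fields up front plays the same role as the `case (csvColumnPruning, count) =>` pattern in the Scala suggestion.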


[GitHub] [spark] cloud-fan commented on issue #25795: [WIP][SPARK-29037][Core] Spark gives duplicate result when an application was killed

2019-09-18 Thread GitBox

cloud-fan commented on issue #25795: [WIP][SPARK-29037][Core] Spark gives 
duplicate result when an application was killed
URL: https://github.com/apache/spark/pull/25795#issuecomment-532582535
 
 
   So #25739 is to support concurrent writes to different locations, and this 
PR is to detect concurrent writes to the same location and fail fast?



[GitHub] [spark] cloud-fan commented on issue #25825: [SPARK-26713][CORE][2.4] Interrupt pipe IO threads in PipedRDD when task is finished

2019-09-18 Thread GitBox

cloud-fan commented on issue #25825: [SPARK-26713][CORE][2.4] Interrupt pipe IO 
threads in PipedRDD when task is finished
URL: https://github.com/apache/spark/pull/25825#issuecomment-532583049
 
 
   Let's also mention the original PR in the description.



[GitHub] [spark] AmplabJenkins removed a comment on issue #25825: [SPARK-26713][CORE][2.4] Interrupt pipe IO threads in PipedRDD when task is finished

2019-09-18 Thread GitBox

AmplabJenkins removed a comment on issue #25825: [SPARK-26713][CORE][2.4] 
Interrupt pipe IO threads in PipedRDD when task is finished
URL: https://github.com/apache/spark/pull/25825#issuecomment-532497335
 
 
   Can one of the admins verify this patch?



[GitHub] [spark] cloud-fan commented on issue #25825: [SPARK-26713][CORE][2.4] Interrupt pipe IO threads in PipedRDD when task is finished

2019-09-18 Thread GitBox

cloud-fan commented on issue #25825: [SPARK-26713][CORE][2.4] Interrupt pipe IO 
threads in PipedRDD when task is finished
URL: https://github.com/apache/spark/pull/25825#issuecomment-532582950
 
 
   ok to test



[GitHub] [spark] tengpeng commented on issue #20732: [SPARK-23578][ML] Add multicolumn support for Binarizer

2019-09-18 Thread GitBox

tengpeng commented on issue #20732: [SPARK-23578][ML] Add multicolumn support 
for Binarizer
URL: https://github.com/apache/spark/pull/20732#issuecomment-532583961
 
 
   This PR is for Binarizer, not Bucketizer.
   
   On Wed, Sep 18, 2019 at 4:35 PM Ruifeng Zheng wrote:
   
   > Closed #20732.
   



[GitHub] [spark] SparkQA commented on issue #25825: [SPARK-26713][CORE][2.4] Interrupt pipe IO threads in PipedRDD when task is finished

2019-09-18 Thread GitBox

SparkQA commented on issue #25825: [SPARK-26713][CORE][2.4] Interrupt pipe IO 
threads in PipedRDD when task is finished
URL: https://github.com/apache/spark/pull/25825#issuecomment-532584868
 
 
   **[Test build #110896 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/110896/testReport)**
 for PR 25825 at commit 
[`6ee8d0d`](https://github.com/apache/spark/commit/6ee8d0d6aaddb8185122b9389155b64c102623d0).



[GitHub] [spark] SparkQA commented on issue #24885: [SPARK-28040][R] Add serialization for glue type

2019-09-18 Thread GitBox

SparkQA commented on issue #24885: [SPARK-28040][R] Add serialization for glue 
type
URL: https://github.com/apache/spark/pull/24885#issuecomment-532584906
 
 
   **[Test build #110897 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/110897/testReport)**
 for PR 24885 at commit 
[`ba90817`](https://github.com/apache/spark/commit/ba908171073a165ec54980fae2b38a529149d4f0).



[GitHub] [spark] AmplabJenkins commented on issue #25825: [SPARK-26713][CORE][2.4] Interrupt pipe IO threads in PipedRDD when task is finished

2019-09-18 Thread GitBox

AmplabJenkins commented on issue #25825: [SPARK-26713][CORE][2.4] Interrupt 
pipe IO threads in PipedRDD when task is finished
URL: https://github.com/apache/spark/pull/25825#issuecomment-532585412
 
 
   Merged build finished. Test PASSed.



[GitHub] [spark] AmplabJenkins commented on issue #25825: [SPARK-26713][CORE][2.4] Interrupt pipe IO threads in PipedRDD when task is finished

2019-09-18 Thread GitBox

AmplabJenkins commented on issue #25825: [SPARK-26713][CORE][2.4] Interrupt 
pipe IO threads in PipedRDD when task is finished
URL: https://github.com/apache/spark/pull/25825#issuecomment-532585422
 
 
   Test PASSed.
   Refer to this link for build results (access rights to CI server needed): 
   
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder-K8s/16016/
   Test PASSed.



[GitHub] [spark] AmplabJenkins commented on issue #24885: [SPARK-28040][R] Add serialization for glue type

2019-09-18 Thread GitBox

AmplabJenkins commented on issue #24885: [SPARK-28040][R] Add serialization for 
glue type
URL: https://github.com/apache/spark/pull/24885#issuecomment-532585518
 
 
   Merged build finished. Test PASSed.



[GitHub] [spark] AmplabJenkins commented on issue #24885: [SPARK-28040][R] Add serialization for glue type

2019-09-18 Thread GitBox

AmplabJenkins commented on issue #24885: [SPARK-28040][R] Add serialization for 
glue type
URL: https://github.com/apache/spark/pull/24885#issuecomment-532585525
 
 
   Test PASSed.
   Refer to this link for build results (access rights to CI server needed): 
   
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder-K8s/16017/
   Test PASSed.



[GitHub] [spark] AmplabJenkins removed a comment on issue #24885: [SPARK-28040][R] Add serialization for glue type

2019-09-18 Thread GitBox

AmplabJenkins removed a comment on issue #24885: [SPARK-28040][R] Add 
serialization for glue type
URL: https://github.com/apache/spark/pull/24885#issuecomment-532585518
 
 
   Merged build finished. Test PASSed.



[GitHub] [spark] AmplabJenkins removed a comment on issue #24885: [SPARK-28040][R] Add serialization for glue type

2019-09-18 Thread GitBox

AmplabJenkins removed a comment on issue #24885: [SPARK-28040][R] Add 
serialization for glue type
URL: https://github.com/apache/spark/pull/24885#issuecomment-532585525
 
 
   Test PASSed.
   Refer to this link for build results (access rights to CI server needed): 
   
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder-K8s/16017/
   Test PASSed.



[GitHub] [spark] AmplabJenkins removed a comment on issue #25825: [SPARK-26713][CORE][2.4] Interrupt pipe IO threads in PipedRDD when task is finished

2019-09-18 Thread GitBox

AmplabJenkins removed a comment on issue #25825: [SPARK-26713][CORE][2.4] 
Interrupt pipe IO threads in PipedRDD when task is finished
URL: https://github.com/apache/spark/pull/25825#issuecomment-532585422
 
 
   Test PASSed.
   Refer to this link for build results (access rights to CI server needed): 
   
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder-K8s/16016/
   Test PASSed.



[GitHub] [spark] AmplabJenkins removed a comment on issue #25825: [SPARK-26713][CORE][2.4] Interrupt pipe IO threads in PipedRDD when task is finished

2019-09-18 Thread GitBox

AmplabJenkins removed a comment on issue #25825: [SPARK-26713][CORE][2.4] 
Interrupt pipe IO threads in PipedRDD when task is finished
URL: https://github.com/apache/spark/pull/25825#issuecomment-532585412
 
 
   Merged build finished. Test PASSed.



[GitHub] [spark] zhengruifeng commented on issue #16966: [SPARK-18409][ML]LSH approxNearestNeighbors should use approxQuantile instead of sort

2019-09-18 Thread GitBox

zhengruifeng commented on issue #16966: [SPARK-18409][ML]LSH 
approxNearestNeighbors should use approxQuantile instead of sort
URL: https://github.com/apache/spark/pull/16966#issuecomment-532587336
 
 
   @Yunni  Are you still working on this?



[GitHub] [spark] SparkQA commented on issue #25729: [SPARK-29022][SQL][test-hadoop3.2][test-java11] Fix spark 'add jar', CliSessionState's hiveConf 's classLoader ClassNotFound

2019-09-18 Thread GitBox

SparkQA commented on issue #25729: 
[SPARK-29022][SQL][test-hadoop3.2][test-java11] Fix spark 'add jar', 
CliSessionState's hiveConf 's classLoader ClassNotFound
URL: https://github.com/apache/spark/pull/25729#issuecomment-532587836
 
 
   **[Test build #110899 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/110899/testReport)**
 for PR 25729 at commit 
[`88169fc`](https://github.com/apache/spark/commit/88169fccbc9af28f7400d324b15a4f1b1a244a4d).



[GitHub] [spark] SparkQA commented on issue #25820: [SPARK-29101][SQL] Fix count API for csv file when DROPMALFORMED mode is selected

2019-09-18 Thread GitBox

SparkQA commented on issue #25820: [SPARK-29101][SQL] Fix count API for csv 
file when DROPMALFORMED mode is selected
URL: https://github.com/apache/spark/pull/25820#issuecomment-532587821
 
 
   **[Test build #110898 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/110898/testReport)**
 for PR 25820 at commit 
[`f2c25f0`](https://github.com/apache/spark/commit/f2c25f068b29e2a71a7e4eacaa075e67001a2652).



[GitHub] [spark] AmplabJenkins commented on issue #25820: [SPARK-29101][SQL] Fix count API for csv file when DROPMALFORMED mode is selected

2019-09-18 Thread GitBox

AmplabJenkins commented on issue #25820: [SPARK-29101][SQL] Fix count API for 
csv file when DROPMALFORMED mode is selected
URL: https://github.com/apache/spark/pull/25820#issuecomment-532588451
 
 
   Test PASSed.
   Refer to this link for build results (access rights to CI server needed): 
   
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder-K8s/16018/
   Test PASSed.



[GitHub] [spark] tengpeng opened a new pull request #20732: [SPARK-23578][ML] Add multicolumn support for Binarizer

2019-09-18 Thread GitBox

tengpeng opened a new pull request #20732: [SPARK-23578][ML] Add multicolumn 
support for Binarizer
URL: https://github.com/apache/spark/pull/20732
 
 
   ## What changes were proposed in this pull request?
   
   [Spark-20542] added a Bucketizer API that can bin multiple columns. 
Building on that change, this PR adds multicolumn support to Binarizer.
   
   ## How was this patch tested?
   Added test cases.
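
As a rough illustration of what multicolumn binarization means, here is a plain-Scala sketch. It is not the code in this PR and does not use the Spark ML API; `binarizeColumns`, the column names, and the fallback threshold are hypothetical. Each column gets its own threshold, and values strictly greater than the threshold map to 1.0 while all others map to 0.0:

```scala
object MultiColumnBinarizerSketch {
  // Binarize one column: values strictly above the threshold become 1.0, the rest 0.0.
  def binarize(values: Seq[Double], threshold: Double): Seq[Double] =
    values.map(v => if (v > threshold) 1.0 else 0.0)

  // Apply a per-column threshold to several named columns at once.
  // Columns without a threshold fall back to 0.0 (an assumption of this sketch).
  def binarizeColumns(
      columns: Map[String, Seq[Double]],
      thresholds: Map[String, Double]): Map[String, Seq[Double]] =
    columns.map { case (name, values) =>
      name -> binarize(values, thresholds.getOrElse(name, 0.0))
    }

  def main(args: Array[String]): Unit = {
    val input = Map("age" -> Seq(10.0, 35.0, 60.0), "score" -> Seq(0.2, 0.5, 0.9))
    val output = binarizeColumns(input, Map("age" -> 30.0, "score" -> 0.5))
    println(output)
  }
}
```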
   



[GitHub] [spark] AmplabJenkins commented on issue #25820: [SPARK-29101][SQL] Fix count API for csv file when DROPMALFORMED mode is selected

2019-09-18 Thread GitBox

AmplabJenkins commented on issue #25820: [SPARK-29101][SQL] Fix count API for 
csv file when DROPMALFORMED mode is selected
URL: https://github.com/apache/spark/pull/25820#issuecomment-532588444
 
 
   Merged build finished. Test PASSed.



[GitHub] [spark] zhengruifeng edited a comment on issue #20732: [SPARK-23578][ML] Add multicolumn support for Binarizer

2019-09-18 Thread GitBox

zhengruifeng edited a comment on issue #20732: [SPARK-23578][ML] Add 
multicolumn support for Binarizer
URL: https://github.com/apache/spark/pull/20732#issuecomment-532581279
 
 
   --- `Bucketizer` has already supported multiple columns since SPARK-20542 ---



[GitHub] [spark] zhengruifeng commented on issue #20732: [SPARK-23578][ML] Add multicolumn support for Binarizer

2019-09-18 Thread GitBox

zhengruifeng commented on issue #20732: [SPARK-23578][ML] Add multicolumn 
support for Binarizer
URL: https://github.com/apache/spark/pull/20732#issuecomment-532588830
 
 
   @tengpeng Sorry for this stupid mistake!



[GitHub] [spark] AmplabJenkins removed a comment on issue #25820: [SPARK-29101][SQL] Fix count API for csv file when DROPMALFORMED mode is selected

2019-09-18 Thread GitBox

AmplabJenkins removed a comment on issue #25820: [SPARK-29101][SQL] Fix count 
API for csv file when DROPMALFORMED mode is selected
URL: https://github.com/apache/spark/pull/25820#issuecomment-532588451
 
 
   Test PASSed.
   Refer to this link for build results (access rights to CI server needed): 
   
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder-K8s/16018/
   Test PASSed.



[GitHub] [spark] AmplabJenkins removed a comment on issue #25820: [SPARK-29101][SQL] Fix count API for csv file when DROPMALFORMED mode is selected

2019-09-18 Thread GitBox

AmplabJenkins removed a comment on issue #25820: [SPARK-29101][SQL] Fix count 
API for csv file when DROPMALFORMED mode is selected
URL: https://github.com/apache/spark/pull/25820#issuecomment-532588444
 
 
   Merged build finished. Test PASSed.



[GitHub] [spark] AmplabJenkins commented on issue #20732: [SPARK-23578][ML] Add multicolumn support for Binarizer

2019-09-18 Thread GitBox

AmplabJenkins commented on issue #20732: [SPARK-23578][ML] Add multicolumn 
support for Binarizer
URL: https://github.com/apache/spark/pull/20732#issuecomment-532590735
 
 
   Can one of the admins verify this patch?



[GitHub] [spark] cloud-fan commented on issue #25804: [SPARK-29096][SQL] The exact math method should be called only when there is a corresponding function in Math

2019-09-18 Thread GitBox

cloud-fan commented on issue #25804: [SPARK-29096][SQL] The exact math method 
should be called only when there is a corresponding function in Math
URL: https://github.com/apache/spark/pull/25804#issuecomment-532590873
 
 
   LGTM. I think for pgsql tests, we should always set dialect to pgsql (once 
we have that config). For other golden file tests, we should test with ansi 
mode on and off. We can do it later.



[GitHub] [spark] AmplabJenkins commented on issue #20732: [SPARK-23578][ML] Add multicolumn support for Binarizer

2019-09-18 Thread GitBox

AmplabJenkins commented on issue #20732: [SPARK-23578][ML] Add multicolumn 
support for Binarizer
URL: https://github.com/apache/spark/pull/20732#issuecomment-532591209
 
 
   Can one of the admins verify this patch?



[GitHub] [spark] cloud-fan closed pull request #25804: [SPARK-29096][SQL] The exact math method should be called only when there is a corresponding function in Math

2019-09-18 Thread GitBox

cloud-fan closed pull request #25804: [SPARK-29096][SQL] The exact math method 
should be called only when there is a corresponding function in Math
URL: https://github.com/apache/spark/pull/25804
 
 
   



[GitHub] [spark] AmplabJenkins removed a comment on issue #20732: [SPARK-23578][ML] Add multicolumn support for Binarizer

2019-09-18 Thread GitBox

AmplabJenkins removed a comment on issue #20732: [SPARK-23578][ML] Add 
multicolumn support for Binarizer
URL: https://github.com/apache/spark/pull/20732#issuecomment-531897559
 
 
   Can one of the admins verify this patch?



[GitHub] [spark] cloud-fan commented on issue #25804: [SPARK-29096][SQL] The exact math method should be called only when there is a corresponding function in Math

2019-09-18 Thread GitBox

cloud-fan commented on issue #25804: [SPARK-29096][SQL] The exact math method 
should be called only when there is a corresponding function in Math
URL: https://github.com/apache/spark/pull/25804#issuecomment-532591531
 
 
   thanks, merging to master!



[GitHub] [spark] AmplabJenkins removed a comment on issue #20732: [SPARK-23578][ML] Add multicolumn support for Binarizer

2019-09-18 Thread GitBox

AmplabJenkins removed a comment on issue #20732: [SPARK-23578][ML] Add 
multicolumn support for Binarizer
URL: https://github.com/apache/spark/pull/20732#issuecomment-532590735
 
 
   Can one of the admins verify this patch?


