date:20180927

[GitHub] spark pull request #22566: [SPARK-25458][SQL] Support FOR ALL COLUMNS in ANA...

2018-09-27 Thread dilipbiswal

Github user dilipbiswal commented on a diff in the pull request:

https://github.com/apache/spark/pull/22566#discussion_r220985607
  
--- Diff: 
sql/core/src/main/scala/org/apache/spark/sql/execution/command/AnalyzeColumnCommand.scala
 ---
@@ -50,7 +52,26 @@ case class AnalyzeColumnCommand(
 val sizeInBytes = CommandUtils.calculateTotalSize(sparkSession, 
tableMeta)
 
 // Compute stats for each column
-val (rowCount, newColStats) = computeColumnStats(sparkSession, 
tableIdentWithDB, columnNames)
+val conf = sparkSession.sessionState.conf
+val relation = sparkSession.table(tableIdent).logicalPlan
+val attributesToAnalyze = if (allColumns) {
+  relation.output
+} else {
+  columnNames.get.map { col =>
+val exprOption = relation.output.find(attr => 
conf.resolver(attr.name, col))
+exprOption.getOrElse(throw new AnalysisException(s"Column $col 
does not exist."))
+  }
+}
+// Make sure the column types are supported for stats gathering.
+attributesToAnalyze.foreach { attr =>
+  if (!supportsType(attr.dataType)) {
+throw new AnalysisException(
+  s"Column ${attr.name} in table $tableIdent is of type 
${attr.dataType}, " +
+"and Spark does not support statistics collection on this 
column type.")
+  }
+}
--- End diff --

@gatorsmile OK


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org

[GitHub] spark pull request #22566: [SPARK-25458][SQL] Support FOR ALL COLUMNS in ANA...

2018-09-27 Thread dilipbiswal

Github user dilipbiswal commented on a diff in the pull request:

https://github.com/apache/spark/pull/22566#discussion_r220985491
  
--- Diff: 
sql/core/src/main/scala/org/apache/spark/sql/execution/command/AnalyzeColumnCommand.scala
 ---
@@ -33,11 +33,13 @@ import org.apache.spark.sql.types._
 
 /**
  * Analyzes the given columns of the given table to generate statistics, 
which will be used in
- * query optimizations.
+ * query optimizations. Parameter `allColumns` may be specified to 
generate statistics of all the
+ * columns of a given table.
  */
 case class AnalyzeColumnCommand(
 tableIdent: TableIdentifier,
-columnNames: Seq[String]) extends RunnableCommand {
+columnNames: Option[Seq[String]],
+allColumns: Boolean = false ) extends RunnableCommand {
--- End diff --

@gatorsmile ok.


---


[GitHub] spark issue #22531: [SPARK-25415][SQL][FOLLOW-UP] Add Locale.ROOT when toUpp...

2018-09-27 Thread HyukjinKwon

Github user HyukjinKwon commented on the issue:

https://github.com/apache/spark/pull/22531
  
@wangyum, are you interested in submitting a PR that checks whether we can add a 
style rule for `.toLowerCase(Locale.ROOT)` and `.toUpperCase(Locale.ROOT)`, and adds it if so?
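
As a rough illustration of what such a rule would need to flag, here is a stand-alone Python sketch. It is not scalastyle configuration and not part of Spark; the function name `find_bare_case_calls` and the regular expression are assumptions made for this example. It reports Scala source lines that call `toLowerCase` or `toUpperCase` without an explicit argument.

```python
import re

# Hypothetical pattern: matches `.toLowerCase` / `.toUpperCase` unless the call
# is followed by a non-empty argument list such as `(Locale.ROOT)`.
BARE_CASE_CALL = re.compile(r"\.to(?:Lower|Upper)Case\b(?!\s*\(\s*[^)\s])")


def find_bare_case_calls(lines):
    """Return the 1-based numbers of lines that contain a locale-less call."""
    return [n for n, line in enumerate(lines, start=1) if BARE_CASE_CALL.search(line)]


sample = [
    "val a = s.toLowerCase(Locale.ROOT)",  # explicit locale, not flagged
    "val b = s.toUpperCase",               # no argument list, flagged
    "val c = s.toLowerCase()",             # empty argument list, flagged
]
print(find_bare_case_calls(sample))
```

A real rule would also have to cope with comments and string literals, which this sketch ignores.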


---


[GitHub] spark pull request #22165: [SPARK-25017][Core] Add test suite for BarrierCoo...

2018-09-27 Thread xuanyuanking

Github user xuanyuanking commented on a diff in the pull request:

https://github.com/apache/spark/pull/22165#discussion_r220984492
  
--- Diff: 
core/src/test/scala/org/apache/spark/scheduler/BarrierCoordinatorSuite.scala ---
@@ -0,0 +1,166 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements.  See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License.  You may obtain a copy of the License at
+ *
+ *http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.spark.scheduler
+
+import java.util.concurrent.TimeoutException
+
+import scala.concurrent.duration._
+import scala.language.postfixOps
+
+import org.scalatest.concurrent.Eventually
+
+import org.apache.spark._
+import org.apache.spark.rpc.RpcTimeout
+
+class BarrierCoordinatorSuite extends SparkFunSuite with LocalSparkContext 
with Eventually {
+
+  /**
+   * Get the current ContextBarrierState from barrierCoordinator.states by 
ContextBarrierId.
+   */
+  private def getBarrierState(
+  stageId: Int,
+  stageAttemptId: Int,
+  barrierCoordinator: BarrierCoordinator) = {
+val barrierId = ContextBarrierId(stageId, stageAttemptId)
+barrierCoordinator.states.get(barrierId)
+  }
+
+  test("normal test for single task") {
+sc = new SparkContext("local", "test")
+val barrierCoordinator = new BarrierCoordinator(5, sc.listenerBus, 
sc.env.rpcEnv)
+val rpcEndpointRef = sc.env.rpcEnv.setupEndpoint("barrierCoordinator", 
barrierCoordinator)
+val stageId = 0
+val stageAttemptNumber = 0
+rpcEndpointRef.askSync[Unit](
--- End diff --

Sorry for missing this, done in 8cd78a9.


---


[GitHub] spark pull request #22165: [SPARK-25017][Core] Add test suite for BarrierCoo...

2018-09-27 Thread xuanyuanking

Github user xuanyuanking commented on a diff in the pull request:

https://github.com/apache/spark/pull/22165#discussion_r220984340
  
--- Diff: core/src/main/scala/org/apache/spark/BarrierCoordinator.scala ---
@@ -187,6 +191,12 @@ private[spark] class BarrierCoordinator(
   requesters.clear()
   cancelTimerTask()
 }
+
+// Check for clearing internal data, visible for test only.
+private[spark] def cleanCheck(): Boolean = requesters.isEmpty && 
timerTask == null
+
+// Get currently barrier epoch, visible for test only.
+private[spark] def getBarrierEpoch(): Int = barrierEpoch
--- End diff --

Per the comment at https://github.com/apache/spark/pull/22165#discussion_r218093991, 
does this need to be reverted?


---


[GitHub] spark pull request #22165: [SPARK-25017][Core] Add test suite for BarrierCoo...

2018-09-27 Thread xuanyuanking

Github user xuanyuanking commented on a diff in the pull request:

https://github.com/apache/spark/pull/22165#discussion_r220983212
  
--- Diff: core/src/main/scala/org/apache/spark/BarrierCoordinator.scala ---
@@ -141,7 +145,7 @@ private[spark] class BarrierCoordinator(
   logInfo(s"Current barrier epoch for $barrierId is $barrierEpoch.")
   if (epoch != barrierEpoch) {
 requester.sendFailure(new SparkException(s"The request to sync of 
$barrierId with " +
-  s"barrier epoch $barrierEpoch has already finished. Maybe task 
$taskId is not " +
+  s"barrier epoch $epoch has already finished. Maybe task $taskId 
is not " +
--- End diff --

While writing the UT for ContextBarrierState, I noticed what I think is a small 
bug in the log message. @jiangxb1987, please check whether I'm wrong.


---


[GitHub] spark issue #22165: [SPARK-25017][Core] Add test suite for BarrierCoordinato...

2018-09-27 Thread SparkQA

Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/22165
  
**[Test build #96702 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/96702/testReport)** for PR 22165 at commit [`8cd78a9`](https://github.com/apache/spark/commit/8cd78a95a0e0649fed81fe6217790943855b7417).


---


[GitHub] spark issue #22558: [SPARK-25546][core] Don't cache value of EVENT_LOG_CALLS...

2018-09-27 Thread cloud-fan

Github user cloud-fan commented on the issue:

https://github.com/apache/spark/pull/22558
  
LGTM


---


[GitHub] spark issue #22165: [SPARK-25017][Core] Add test suite for BarrierCoordinato...

2018-09-27 Thread AmplabJenkins

Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/22165
  
Test PASSed.
Refer to this link for build results (access rights to CI server needed): 

https://amplab.cs.berkeley.edu/jenkins//job/testing-k8s-prb-make-spark-distribution-unified/3542/
Test PASSed.


---


[GitHub] spark issue #22165: [SPARK-25017][Core] Add test suite for BarrierCoordinato...

2018-09-27 Thread AmplabJenkins

Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/22165
  
Merged build finished. Test PASSed.


---


[GitHub] spark issue #22237: [SPARK-25243][SQL] Use FailureSafeParser in from_json

2018-09-27 Thread gengliangwang

Github user gengliangwang commented on the issue:

https://github.com/apache/spark/pull/22237
  
Hi @MaxGekk ,
I just reviewed this PR and noticed one behavior change: for a corrupt record, 
the column value of `from_json(corrupt_record...)` becomes `Row(null, null, ...)` 
instead of `null`.

``` 
val df = Seq("""{"a" 1, "b": 2}""").toDS()
val schema = new StructType().add("a", IntegerType).add("b", IntegerType)
```

Before the code change:
```
scala> df.select(from_json($"value", schema).as("col")).where("col is null").show()
+----+
| col|
+----+
|null|
+----+

scala> df.select(from_json($"value", schema).as("col")).where("col.a is null").show()
+----+
| col|
+----+
|null|
+----+
```

After the code change:
```
scala> df.select(from_json($"value", schema).as("col")).where("col is null").show()
+---+
|col|
+---+
+---+


scala> df.select(from_json($"value", schema).as("col")).where("col.a is null").show()
+---+
|col|
+---+
|[,]|
+---+
```

The main difference is that rows with a corrupt record can no longer be filtered 
out with `col is null` on the result column. Is there any reason for changing this?
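
To make the difference concrete without a Spark session, here is a small plain-Python model of the two behaviors. It is only a sketch: `from_json_model` and its `null_struct_on_error` flag are hypothetical names for this example and say nothing about how `from_json` or `FailureSafeParser` is implemented.

```python
import json

FIELDS = ("a", "b")  # mirrors the two-field schema used in the example above


def from_json_model(value, null_struct_on_error):
    """Toy stand-in for from_json; the flag selects the old behavior."""
    try:
        parsed = json.loads(value)
    except json.JSONDecodeError:
        # Old behavior: the whole struct is null. New behavior: a struct of nulls.
        return None if null_struct_on_error else {f: None for f in FIELDS}
    return {f: parsed.get(f) for f in FIELDS}


corrupt = '{"a" 1, "b": 2}'  # missing colon after "a", as in the example above
before = from_json_model(corrupt, null_struct_on_error=True)
after = from_json_model(corrupt, null_struct_on_error=False)

# A `col is null` style check only matches the old result.
print(before is None, after is None)
# A `col.a is null` style check is needed to find the new one.
print(after["a"] is None)
```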


---


[GitHub] spark issue #22566: [SPARK-25458][SQL] Support FOR ALL COLUMNS in ANALYZE TA...

2018-09-27 Thread gatorsmile

Github user gatorsmile commented on the issue:

https://github.com/apache/spark/pull/22566
  
@dilipbiswal Thanks for working on this! Also cc @juliuszsompolski 


---


[GitHub] spark pull request #22566: [SPARK-25458][SQL] Support FOR ALL COLUMNS in ANA...

2018-09-27 Thread gatorsmile

Github user gatorsmile commented on a diff in the pull request:

https://github.com/apache/spark/pull/22566#discussion_r220979068
  
--- Diff: 
sql/core/src/main/scala/org/apache/spark/sql/execution/command/AnalyzeColumnCommand.scala
 ---
@@ -33,11 +33,13 @@ import org.apache.spark.sql.types._
 
 /**
  * Analyzes the given columns of the given table to generate statistics, 
which will be used in
- * query optimizations.
+ * query optimizations. Parameter `allColumns` may be specified to 
generate statistics of all the
+ * columns of a given table.
  */
 case class AnalyzeColumnCommand(
 tableIdent: TableIdentifier,
-columnNames: Seq[String]) extends RunnableCommand {
+columnNames: Option[Seq[String]],
+allColumns: Boolean = false ) extends RunnableCommand {
--- End diff --

Let's not use default values here?


---


[GitHub] spark pull request #22566: [SPARK-25458][SQL] Support FOR ALL COLUMNS in ANA...

2018-09-27 Thread gatorsmile

Github user gatorsmile commented on a diff in the pull request:

https://github.com/apache/spark/pull/22566#discussion_r220978815
  
--- Diff: 
sql/core/src/main/scala/org/apache/spark/sql/execution/command/AnalyzeColumnCommand.scala
 ---
@@ -50,7 +52,26 @@ case class AnalyzeColumnCommand(
 val sizeInBytes = CommandUtils.calculateTotalSize(sparkSession, 
tableMeta)
 
 // Compute stats for each column
-val (rowCount, newColStats) = computeColumnStats(sparkSession, 
tableIdentWithDB, columnNames)
+val conf = sparkSession.sessionState.conf
+val relation = sparkSession.table(tableIdent).logicalPlan
+val attributesToAnalyze = if (allColumns) {
+  relation.output
--- End diff --

Are we still able to create a table with zero columns, for example by using 
DataFrameWriter?


---


[GitHub] spark pull request #22566: [SPARK-25458][SQL] Support FOR ALL COLUMNS in ANA...

2018-09-27 Thread gatorsmile

Github user gatorsmile commented on a diff in the pull request:

https://github.com/apache/spark/pull/22566#discussion_r220978327
  
--- Diff: 
sql/core/src/main/scala/org/apache/spark/sql/execution/command/AnalyzeColumnCommand.scala
 ---
@@ -70,25 +91,9 @@ case class AnalyzeColumnCommand(
*/
   private def computeColumnStats(
   sparkSession: SparkSession,
-  tableIdent: TableIdentifier,
-  columnNames: Seq[String]): (Long, Map[String, CatalogColumnStat]) = {
-
+  relation: LogicalPlan,
+  attributesToAnalyze: Seq[Attribute]): (Long, Map[String, 
CatalogColumnStat]) = {
--- End diff --

`attributesToAnalyze` -> `columns`


---


[GitHub] spark pull request #22566: [SPARK-25458][SQL] Support FOR ALL COLUMNS in ANA...

2018-09-27 Thread gatorsmile

Github user gatorsmile commented on a diff in the pull request:

https://github.com/apache/spark/pull/22566#discussion_r220978087
  
--- Diff: 
sql/core/src/main/scala/org/apache/spark/sql/execution/command/AnalyzeColumnCommand.scala
 ---
@@ -50,7 +52,26 @@ case class AnalyzeColumnCommand(
 val sizeInBytes = CommandUtils.calculateTotalSize(sparkSession, 
tableMeta)
 
 // Compute stats for each column
-val (rowCount, newColStats) = computeColumnStats(sparkSession, 
tableIdentWithDB, columnNames)
+val conf = sparkSession.sessionState.conf
+val relation = sparkSession.table(tableIdent).logicalPlan
+val attributesToAnalyze = if (allColumns) {
+  relation.output
+} else {
+  columnNames.get.map { col =>
+val exprOption = relation.output.find(attr => 
conf.resolver(attr.name, col))
+exprOption.getOrElse(throw new AnalysisException(s"Column $col 
does not exist."))
+  }
+}
+// Make sure the column types are supported for stats gathering.
+attributesToAnalyze.foreach { attr =>
+  if (!supportsType(attr.dataType)) {
+throw new AnalysisException(
+  s"Column ${attr.name} in table $tableIdent is of type 
${attr.dataType}, " +
+"and Spark does not support statistics collection on this 
column type.")
+  }
+}
--- End diff --

Also throw an exception when `allColumns` is set to true but `columnNames` 
is not empty.
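
The requested guard could look roughly like the following, written as a language-neutral Python sketch rather than as the Scala that would go into the command. `check_analyze_arguments` is a hypothetical helper, and `ValueError` merely stands in for `AnalysisException`.

```python
def check_analyze_arguments(column_names, all_columns):
    """Reject a request that sets all_columns and also lists explicit columns."""
    if all_columns and column_names:
        raise ValueError(
            "allColumns is set, so columnNames must be empty, got: %s"
            % list(column_names)
        )


# Accepted: either all columns, or an explicit list.
check_analyze_arguments(None, all_columns=True)
check_analyze_arguments(["a", "b"], all_columns=False)
```

Calling `check_analyze_arguments(["a"], all_columns=True)` raises, which is the case asked about here.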


---


[GitHub] spark pull request #22566: [SPARK-25458][SQL] Support FOR ALL COLUMNS in ANA...

2018-09-27 Thread gatorsmile

Github user gatorsmile commented on a diff in the pull request:

https://github.com/apache/spark/pull/22566#discussion_r220977575
  
--- Diff: 
sql/core/src/main/scala/org/apache/spark/sql/execution/command/AnalyzeColumnCommand.scala
 ---
@@ -50,7 +52,26 @@ case class AnalyzeColumnCommand(
 val sizeInBytes = CommandUtils.calculateTotalSize(sparkSession, 
tableMeta)
 
 // Compute stats for each column
-val (rowCount, newColStats) = computeColumnStats(sparkSession, 
tableIdentWithDB, columnNames)
+val conf = sparkSession.sessionState.conf
+val relation = sparkSession.table(tableIdent).logicalPlan
+val attributesToAnalyze = if (allColumns) {
+  relation.output
+} else {
+  columnNames.get.map { col =>
+val exprOption = relation.output.find(attr => 
conf.resolver(attr.name, col))
+exprOption.getOrElse(throw new AnalysisException(s"Column $col 
does not exist."))
+  }
+}
+// Make sure the column types are supported for stats gathering.
+attributesToAnalyze.foreach { attr =>
+  if (!supportsType(attr.dataType)) {
+throw new AnalysisException(
+  s"Column ${attr.name} in table $tableIdent is of type 
${attr.dataType}, " +
+"and Spark does not support statistics collection on this 
column type.")
+  }
+}
--- End diff --

How about creating a new private function for the code between lines 55 and 72?
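
A sketch of what the extracted helper would be responsible for, modelled in plain Python rather than Scala. All names here (`Attribute`, `SUPPORTED_TYPES`, `get_columns_to_analyze`, the resolver) are assumptions for illustration; the supported-type set is a placeholder and `ValueError` stands in for `AnalysisException`.

```python
from collections import namedtuple

# Minimal stand-in for a resolved column: a name plus a type name.
Attribute = namedtuple("Attribute", ["name", "data_type"])

SUPPORTED_TYPES = {"int", "long", "double", "string"}  # placeholder set


def case_insensitive_resolver(a, b):
    return a.lower() == b.lower()


def get_columns_to_analyze(output, column_names, all_columns,
                           resolver=case_insensitive_resolver):
    """Resolve the requested columns, then check that their types are supported."""
    if all_columns:
        columns = list(output)
    else:
        columns = []
        for col in column_names:
            match = next((attr for attr in output if resolver(attr.name, col)), None)
            if match is None:
                raise ValueError("Column %s does not exist." % col)
            columns.append(match)
    for attr in columns:
        if attr.data_type not in SUPPORTED_TYPES:
            raise ValueError(
                "Column %s is of type %s, which is not supported for statistics."
                % (attr.name, attr.data_type)
            )
    return columns


table = [Attribute("id", "long"), Attribute("Name", "string")]
print(get_columns_to_analyze(table, ["name"], all_columns=False))
```

Keeping resolution and the type check in one function leaves the caller with a single list of columns to pass on to the statistics computation.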


---


[GitHub] spark pull request #22419: [SPARK-23906][SQL] Add built-in UDF TRUNCATE(numb...

2018-09-27 Thread gatorsmile

Github user gatorsmile commented on a diff in the pull request:

https://github.com/apache/spark/pull/22419#discussion_r220973430
  
--- Diff: 
sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/mathExpressions.scala
 ---
@@ -1245,3 +1245,27 @@ case class BRound(child: Expression, scale: 
Expression)
 with Serializable with ImplicitCastInputTypes {
   def this(child: Expression) = this(child, Literal(0))
 }
+
+/**
+ * The number truncated to scale decimal places.
+ */
+// scalastyle:off line.size.limit
+@ExpressionDescription(
+  usage = "_FUNC_(number, scale) - Returns number truncated to scale 
decimal places. " +
+"If scale is omitted, then number is truncated to 0 places. " +
+"scale can be negative to truncate (make zero) scale digits left of 
the decimal point.",
+  examples = """
+Examples:
+  > SELECT _FUNC_(1234567891.1234567891, 4);
+   1234567891.1234
+  > SELECT _FUNC_(1234567891.1234567891, -4);
+   1234560000
+  > SELECT _FUNC_(1234567891.1234567891);
+   1234567891
+  """)
+// scalastyle:on line.size.limit
+case class Truncate(child: Expression, scale: Expression)
--- End diff --

I still prefer extending `trunc`. It is not straightforward to tell the 
difference between `truncate` and `trunc`.
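
For reference, the numeric semantics described in the usage string above can be modelled outside Spark with Python's `decimal` module. This is only a sketch of the described behavior, not Spark's implementation of either function; `truncate_number` is a hypothetical name.

```python
from decimal import Decimal, ROUND_DOWN


def truncate_number(number, scale=0):
    """Truncate `number` (given as a string) to `scale` decimal places.

    A negative scale zeroes out digits to the left of the decimal point.
    """
    quantum = Decimal(1).scaleb(-scale)  # 10 ** -scale, e.g. 0.0001 for scale=4
    return Decimal(number).quantize(quantum, rounding=ROUND_DOWN)


value = "1234567891.1234567891"
print(truncate_number(value, 4))
print(int(truncate_number(value, -4)))
print(truncate_number(value))
```

`ROUND_DOWN` rounds toward zero, so digits are dropped rather than rounded, for negative inputs as well.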


---


[GitHub] spark pull request #22566: [SPARK-25458][SQL] Support FOR ALL COLUMNS in ANA...

2018-09-27 Thread dongjoon-hyun

Github user dongjoon-hyun commented on a diff in the pull request:

https://github.com/apache/spark/pull/22566#discussion_r220962782
  
--- Diff: 
sql/core/src/main/scala/org/apache/spark/sql/execution/command/AnalyzeColumnCommand.scala
 ---
@@ -33,11 +33,13 @@ import org.apache.spark.sql.types._
 
 /**
  * Analyzes the given columns of the given table to generate statistics, 
which will be used in
- * query optimizations.
+ * query optimizations. Parameter `allColumns` may be specified to 
generate statistics of all the
+ * columns of a given table.
  */
 case class AnalyzeColumnCommand(
 tableIdent: TableIdentifier,
-columnNames: Seq[String]) extends RunnableCommand {
+columnNames: Option[Seq[String]],
+allColumns: Boolean = false ) extends RunnableCommand {
--- End diff --

nit. `false )` -> `false`.


---


[GitHub] spark issue #22558: [SPARK-25546][core] Don't cache value of EVENT_LOG_CALLS...

2018-09-27 Thread dongjoon-hyun

Github user dongjoon-hyun commented on the issue:

https://github.com/apache/spark/pull/22558
  
Also cc @michaelmior and @cloud-fan .


---


[GitHub] spark pull request #22484: [SPARK-25476][SPARK-25510][TEST] Refactor Aggrega...

2018-09-27 Thread wangyum

Github user wangyum commented on a diff in the pull request:

https://github.com/apache/spark/pull/22484#discussion_r220959695
  
--- Diff: 
sql/core/src/test/scala/org/apache/spark/sql/execution/benchmark/AggregateBenchmark.scala
 ---
@@ -34,621 +34,539 @@ import org.apache.spark.unsafe.map.BytesToBytesMap
 
 /**
  * Benchmark to measure performance for aggregate primitives.
- * To run this:
- *  build/sbt "sql/test-only *benchmark.AggregateBenchmark"
- *
- * Benchmarks in this file are skipped in normal builds.
+ * To run this benchmark:
+ * {{{
+ *   1. without sbt: bin/spark-submit --class  
+ *   2. build/sbt "sql/test:runMain "
+ *   3. generate result: SPARK_GENERATE_BENCHMARK_FILES=1 build/sbt 
"sql/test:runMain "
+ *  Results will be written to 
"benchmarks/AggregateBenchmark-results.txt".
+ * }}}
  */
-class AggregateBenchmark extends BenchmarkWithCodegen {
+object AggregateBenchmark extends SqlBasedBenchmark {
 
-  ignore("aggregate without grouping") {
-val N = 500L << 22
-val benchmark = new Benchmark("agg without grouping", N)
-runBenchmark("agg w/o group", N) {
-  sparkSession.range(N).selectExpr("sum(id)").collect()
+  override def benchmark(): Unit = {
+runBenchmark("aggregate without grouping") {
+  val N = 500L << 22
+  runBenchmarkWithCodegen("agg w/o group", N) {
+spark.range(N).selectExpr("sum(id)").collect()
+  }
 }
-/*
-agg w/o group:                     Best/Avg Time(ms)    Rate(M/s)   Per Row(ns)   Relative
-------------------------------------------------------------------------------------------
-agg w/o group wholestage off          30136 / 31885         69.6          14.4       1.0X
-agg w/o group wholestage on             1851 / 1860       1132.9           0.9      16.3X
- */
-  }
 
-  ignore("stat functions") {
-val N = 100L << 20
+runBenchmark("stat functions") {
+  val N = 100L << 20
 
-runBenchmark("stddev", N) {
-  sparkSession.range(N).groupBy().agg("id" -> "stddev").collect()
-}
+  runBenchmarkWithCodegen("stddev", N) {
+spark.range(N).groupBy().agg("id" -> "stddev").collect()
+  }
 
-runBenchmark("kurtosis", N) {
-  sparkSession.range(N).groupBy().agg("id" -> "kurtosis").collect()
+  runBenchmarkWithCodegen("kurtosis", N) {
+spark.range(N).groupBy().agg("id" -> "kurtosis").collect()
+  }
 }
 
-/*
-Using ImperativeAggregate (as implemented in Spark 1.6):
-
-  Intel(R) Core(TM) i7-4558U CPU @ 2.80GHz
-  stddev:                     Avg Time(ms)    Avg Rate(M/s)  Relative Rate
-  --------------------------------------------------------------------------
-  stddev w/o codegen               2019.04            10.39         1.00 X
-  stddev w codegen                 2097.29            10.00         0.96 X
-  kurtosis w/o codegen             2108.99             9.94         0.96 X
-  kurtosis w codegen               2090.69            10.03         0.97 X
-
-  Using DeclarativeAggregate:
-
-  Intel(R) Core(TM) i7-4558U CPU @ 2.80GHz
-  stddev:                     Best/Avg Time(ms)    Rate(M/s)   Per Row(ns)   Relative
-  -------------------------------------------------------------------------------------
-  stddev codegen=false              5630 / 5776         18.0          55.6       1.0X
-  stddev codegen=true               1259 / 1314         83.0          12.0       4.5X
-
-  Intel(R) Core(TM) i7-4558U CPU @ 2.80GHz
-  kurtosis:                   Best/Avg Time(ms)    Rate(M/s)   Per Row(ns)   Relative
-  -------------------------------------------------------------------------------------
-  kurtosis codegen=false          14847 / 15084          7.0         142.9       1.0X
-  kurtosis codegen=true             1652 / 2124         63.0          15.9       9.0X
-*/
-  }
-
-  ignore("aggregate with linear keys") {
-val N = 20 << 22
+runBenchmark("aggregate with linear keys") {
+  val N = 20 << 22
 
-val benchmark = new Benchmark("Aggregate w keys", N)
-def f(): Unit = {
-  sparkSession.range(N).selectExpr("(id & 65535) as 
k").groupBy("k").sum().collect()
-}
+  val benchmark = new Benchmark("Aggregate w keys", N, output = output)
 
-benchmark.addCase(s"codegen = F", numIters = 2) { iter =>
-  sparkSession.conf.set("spark.sql.codegen.wholeStage", "false")
-  f()
-}
+

[GitHub] spark issue #22569: [SPARK-25542][SQL][Test] Move flaky test in OpenHashMapS...

2018-09-27 Thread SparkQA

Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/22569
  
**[Test build #96701 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/96701/testReport)** for PR 22569 at commit [`dca4e5c`](https://github.com/apache/spark/commit/dca4e5c991c94b5cfbf64bb7b661518ed069329f).


---


[GitHub] spark issue #22569: [SPARK-25542][SQL][Test] Move flaky test in OpenHashMapS...

2018-09-27 Thread AmplabJenkins

Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/22569
  
Merged build finished. Test PASSed.


---


[GitHub] spark issue #22569: [SPARK-25542][SQL][Test] Move flaky test in OpenHashMapS...

2018-09-27 Thread AmplabJenkins

Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/22569
  
Test PASSed.
Refer to this link for build results (access rights to CI server needed): 

https://amplab.cs.berkeley.edu/jenkins//job/testing-k8s-prb-make-spark-distribution-unified/3541/
Test PASSed.


---


[GitHub] spark issue #21990: [SPARK-25003][PYSPARK] Use SessionExtensions in Pyspark

2018-09-27 Thread holdenk

Github user holdenk commented on the issue:

https://github.com/apache/spark/pull/21990
  
I'm +1 on switching to the builder and not using the private interface.


---


[GitHub] spark pull request #22425: [SPARK-23367][Build] Include python document styl...

2018-09-27 Thread ueshin

Github user ueshin commented on a diff in the pull request:

https://github.com/apache/spark/pull/22425#discussion_r220956727
  
--- Diff: dev/tox.ini ---
@@ -14,6 +14,8 @@
 # limitations under the License.
 
 [pycodestyle]
-ignore=E226,E241,E305,E402,E722,E731,E741,W503,W504
+ignore=E226,E241,E305,E402,E722,E731,E741,W503,W504,W605
 max-line-length=100
 
exclude=cloudpickle.py,heapq3.py,shared.py,python/docs/conf.py,work/*/*.py,python/.eggs/*,dist/*
+[pydocstyle]

+ignore=D100,D101,D102,D103,D104,D105,D106,D107,D200,D201,D202,D203,D204,D205,D206,D207,D208,D209,D210,D211,D212,D213,D214,D215,D300,D301,D302,D400,D401,D402,D403,D404,D405,D406,D407,D408,D409,D410,D411,D412,D413,D414
--- End diff --

I just asked to add a line break at the end of file, and the current style 
looks good to me.


---


[GitHub] spark pull request #22569: [SPARK-25542][SQL][Test] Move flaky test in OpenH...

2018-09-27 Thread viirya

Github user viirya commented on a diff in the pull request:

https://github.com/apache/spark/pull/22569#discussion_r220955679
  
--- Diff: 
core/src/test/scala/org/apache/spark/util/collection/OpenHashSetSuite.scala ---
@@ -255,4 +255,16 @@ class OpenHashSetSuite extends SparkFunSuite with 
Matchers {
 val set = new OpenHashSet[Long](0)
 assert(set.size === 0)
   }
+
+  test("support for more than 12M items") {
+val cnt = 12000000 // 12M
+val set = new OpenHashSet[Int](cnt)
+for (i <- 0 until cnt) {
+  set.add(i)
+
+  val pos1 = set.addWithoutResize(i) & OpenHashSet.POSITION_MASK
+  val pos2 = set.getPos(i)
+  assert(pos1 == pos2)
+}
--- End diff --

If we want to add the check, we can also add it inside the loop. Another 
loop seems unnecessary to me.


---


[GitHub] spark issue #20503: [SPARK-23299][SQL][PYSPARK] Fix repr behaviour for R...

2018-09-27 Thread holdenk

Github user holdenk commented on the issue:

https://github.com/apache/spark/pull/20503
  
I _think_ this could be good to backport into 2.4, assuming the current RC 
fails, if @ashashwat has the chance to update it and no one sees any issues with 
including this in a backport to that branch.


---


[GitHub] spark issue #20503: [SPARK-23299][SQL][PYSPARK] Fix repr behaviour for R...

2018-09-27 Thread holdenk

Github user holdenk commented on the issue:

https://github.com/apache/spark/pull/20503
  
Sure, let's add a test with a unicode string if there's concern about that, 
and make sure the existing repr with named fields is covered in the same test 
case, since I don't see an existing explicit test for that (although it's 
probably covered implicitly elsewhere).


---


[GitHub] spark issue #22569: [SPARK-25542][SQL][Test] Move flaky test in OpenHashMapS...

2018-09-27 Thread kiszk

Github user kiszk commented on the issue:

https://github.com/apache/spark/pull/22569
  
LGTM except one minor comment


---


[GitHub] spark pull request #22569: [SPARK-25542][SQL][Test] Move flaky test in OpenH...

2018-09-27 Thread kiszk

Github user kiszk commented on a diff in the pull request:

https://github.com/apache/spark/pull/22569#discussion_r220954056
  
--- Diff: 
core/src/test/scala/org/apache/spark/util/collection/OpenHashSetSuite.scala ---
@@ -255,4 +255,16 @@ class OpenHashSetSuite extends SparkFunSuite with 
Matchers {
 val set = new OpenHashSet[Long](0)
 assert(set.size === 0)
   }
+
+  test("support for more than 12M items") {
+val cnt = 12000000 // 12M
+val set = new OpenHashSet[Int](cnt)
+for (i <- 0 until cnt) {
+  set.add(i)
+
+  val pos1 = set.addWithoutResize(i) & OpenHashSet.POSITION_MASK
+  val pos2 = set.getPos(i)
+  assert(pos1 == pos2)
+}
--- End diff --

nit: Would it be better to also add the following, to check each value after 
all of them have been added?
```
for (i <- 0 until cnt) {
  assert(set.contains(i))
}
```


---


[GitHub] spark pull request #22425: [SPARK-23367][Build] Include python document styl...

2018-09-27 Thread holdenk

Github user holdenk commented on a diff in the pull request:

https://github.com/apache/spark/pull/22425#discussion_r220950524
  
--- Diff: dev/tox.ini ---
@@ -14,6 +14,8 @@
 # limitations under the License.
 
 [pycodestyle]
-ignore=E226,E241,E305,E402,E722,E731,E741,W503,W504
+ignore=E226,E241,E305,E402,E722,E731,E741,W503,W504,W605
 max-line-length=100
 
exclude=cloudpickle.py,heapq3.py,shared.py,python/docs/conf.py,work/*/*.py,python/.eggs/*,dist/*
+[pydocstyle]

+ignore=D100,D101,D102,D103,D104,D105,D106,D107,D200,D201,D202,D203,D204,D205,D206,D207,D208,D209,D210,D211,D212,D213,D214,D215,D300,D301,D302,D400,D401,D402,D403,D404,D405,D406,D407,D408,D409,D410,D411,D412,D413,D414
--- End diff --

I don't think that's what @ueshin was asking for; I think it was a blank 
line after the `ignore=...`. If @ueshin is around, we can see what @ueshin 
says. It's also relatively minor, provided everything functions.


---


[GitHub] spark pull request #22425: [SPARK-23367][Build] Include python document styl...

2018-09-27 Thread holdenk

Github user holdenk commented on a diff in the pull request:

https://github.com/apache/spark/pull/22425#discussion_r220950740
  
--- Diff: dev/tox.ini ---
@@ -14,6 +14,8 @@
 # limitations under the License.
 
 [pycodestyle]
-ignore=E226,E241,E305,E402,E722,E731,E741,W503,W504
+ignore=E226,E241,E305,E402,E722,E731,E741,W503,W504,W605
--- End diff --

I'm just confused why this would need to be changed in this PR. Hopefully 
it's just a holdover from the previous PR?


---


[GitHub] spark pull request #22568: [SPARK-23401][PYTHON][TESTS] Add more data types ...

2018-09-27 Thread HyukjinKwon

Github user HyukjinKwon commented on a diff in the pull request:

https://github.com/apache/spark/pull/22568#discussion_r220948793
  
--- Diff: python/pyspark/sql/tests.py ---
@@ -5714,24 +5755,31 @@ def test_wrong_args(self):
 pandas_udf(lambda x, y: x, DoubleType(), 
PandasUDFType.SCALAR))
 
 def test_unsupported_types(self):
+from distutils.version import LooseVersion
+import pyarrow as pa
 from pyspark.sql.functions import pandas_udf, PandasUDFType
-schema = StructType(
-[StructField("id", LongType(), True),
- StructField("map", MapType(StringType(), IntegerType()), 
True)])
-with QuietTest(self.sc):
-with self.assertRaisesRegexp(
-NotImplementedError,
-'Invalid returnType.*grouped map Pandas UDF.*MapType'):
-pandas_udf(lambda x: x, schema, PandasUDFType.GROUPED_MAP)
 
-schema = StructType(
-[StructField("id", LongType(), True),
- StructField("arr_ts", ArrayType(TimestampType()), True)])
-with QuietTest(self.sc):
-with self.assertRaisesRegexp(
-NotImplementedError,
-'Invalid returnType.*grouped map Pandas 
UDF.*ArrayType.*TimestampType'):
-pandas_udf(lambda x: x, schema, PandasUDFType.GROUPED_MAP)
+common_err_msg = 'Invalid returnType.*grouped map Pandas UDF.*'
+unsupported_types = [
+StructField('map', MapType(StringType(), IntegerType())),
+StructField('arr_ts', ArrayType(TimestampType())),
+StructField('null', NullType()),
+]
+
+# TODO: Remove this if-statement once minimum pyarrow version is 
0.10.0
+if LooseVersion(pa.__version__) < LooseVersion("0.10.0"):
+unsupported_types.append(
+StructField('bin', BinaryType())
--- End diff --

Likewise, let's just write this on a single line:

```
unsupported_types.append(StructField('bin', BinaryType()))
```

or 

```
unsupported_types += [StructField('bin', BinaryType())]
```
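
The version gate in the quoted test can be sketched without Spark or pyarrow. In the sketch below, `version_tuple` and `unsupported_type_names` are hypothetical helpers, plain strings stand in for the `StructField` objects, and a simple tuple comparison replaces `LooseVersion` (it only handles purely numeric versions such as `0.10.0`).

```python
def version_tuple(version):
    """Turn a dotted numeric version such as '0.10.0' into (0, 10, 0)."""
    return tuple(int(part) for part in version.split("."))


def unsupported_type_names(pyarrow_version):
    """Return the field names the test would treat as unsupported."""
    # Strings stand in for the StructField objects used in the real test.
    names = ["map", "arr_ts", "null"]
    if version_tuple(pyarrow_version) < version_tuple("0.10.0"):
        names.append("bin")  # the single-line form suggested in the review
    return names


old = unsupported_type_names("0.8.0")    # 'bin' is included
new = unsupported_type_names("0.10.0")   # 'bin' is not included
```

Comparing parsed versions rather than raw strings matters here: as plain strings, `"0.8.0"` sorts after `"0.10.0"`, which would invert the gate.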


---


[GitHub] spark pull request #22568: [SPARK-23401][PYTHON][TESTS] Add more data types ...

2018-09-27 Thread HyukjinKwon

Github user HyukjinKwon commented on a diff in the pull request:

https://github.com/apache/spark/pull/22568#discussion_r220948302
  
--- Diff: python/pyspark/sql/tests.py ---
@@ -5525,32 +5525,73 @@ def data(self):
 .withColumn("v", explode(col('vs'))).drop('vs')
 
 def test_supported_types(self):
-from pyspark.sql.functions import pandas_udf, PandasUDFType, 
array, col
-df = self.data.withColumn("arr", array(col("id")))
+from decimal import Decimal
+from distutils.version import LooseVersion
+import pyarrow as pa
+from pyspark.sql.functions import pandas_udf, PandasUDFType
 
-# Different forms of group map pandas UDF, results of these are 
the same
+input_values_with_schema = [
+(1, StructField('id', IntegerType())),
+(2, StructField('byte', ByteType())),
+(3, StructField('short', ShortType())),
+(4, StructField('int', IntegerType())),
+(5, StructField('long', LongType())),
+(1.1, StructField('float', FloatType())),
+(2.2, StructField('double', DoubleType())),
+(Decimal(1.123), StructField('decim', DecimalType(10, 3))),
+([1, 2, 3], StructField('array', ArrayType(IntegerType()))),
+(True, StructField('bool', BooleanType())),
+('hello', StructField('str', StringType())),
+]
--- End diff --

I understand why you did this, but I think we can simply write:

```python
values = [
1, 2, 3,
4, 5, 1.1,
...
]
output_schema = StructType([
StructField('id', IntegerType()), StructField('byte', ByteType()), 
StructField('short', ShortType()),
StructField('int', IntegerType()), StructField('long', LongType()), 
StructField('float', FloatType()),
...
])
```


Let's keep the original approach and keep it simple.
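
The two layouts carry the same information, which a small sketch can make concrete. Plain strings stand in for the Spark type objects here, and all names are illustrative rather than taken from the PR.

```python
# Layout from the PR: each value sits next to its (stand-in) field.
paired = [(1, "id:int"), (2, "byte:byte"), (1.1, "float:float")]
values_from_pairs = [value for value, _ in paired]
schema_from_pairs = [field for _, field in paired]

# Layout suggested in the review: two parallel lists in the same order.
values = [1, 2, 1.1]
output_schema = ["id:int", "byte:byte", "float:float"]

# Both describe the same single row and the same schema.
equivalent = (values == values_from_pairs) and (output_schema == schema_from_pairs)

# The parallel form relies on the two lists staying aligned.
aligned = len(values) == len(output_schema)
```

The paired form keeps each value beside its type, while the parallel form is shorter to read but depends on the two lists being kept in the same order.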


---


[GitHub] spark pull request #22568: [SPARK-23401][PYTHON][TESTS] Add more data types ...

2018-09-27 Thread HyukjinKwon

Github user HyukjinKwon commented on a diff in the pull request:

https://github.com/apache/spark/pull/22568#discussion_r220943714
  
--- Diff: python/pyspark/sql/tests.py ---
@@ -5525,32 +5525,73 @@ def data(self):
 .withColumn("v", explode(col('vs'))).drop('vs')
 
 def test_supported_types(self):
-from pyspark.sql.functions import pandas_udf, PandasUDFType, 
array, col
-df = self.data.withColumn("arr", array(col("id")))
+from decimal import Decimal
+from distutils.version import LooseVersion
+import pyarrow as pa
+from pyspark.sql.functions import pandas_udf, PandasUDFType
 
-# Different forms of group map pandas UDF, results of these are 
the same
+input_values_with_schema = [
+(1, StructField('id', IntegerType())),
+(2, StructField('byte', ByteType())),
+(3, StructField('short', ShortType())),
+(4, StructField('int', IntegerType())),
+(5, StructField('long', LongType())),
+(1.1, StructField('float', FloatType())),
+(2.2, StructField('double', DoubleType())),
+(Decimal(1.123), StructField('decim', DecimalType(10, 3))),
+([1, 2, 3], StructField('array', ArrayType(IntegerType()))),
+(True, StructField('bool', BooleanType())),
+('hello', StructField('str', StringType())),
+]
 
-output_schema = StructType(
-[StructField('id', LongType()),
- StructField('v', IntegerType()),
- StructField('arr', ArrayType(LongType())),
- StructField('v1', DoubleType()),
- StructField('v2', LongType())])
+# TODO: Add BinaryType to 'input_values_with_schema' once minimum 
pyarrow version is 0.10.0
+if LooseVersion(pa.__version__) >= LooseVersion("0.10.0"):
+input_values_with_schema.append(
+(bytearray([0x01, 0x02]), StructField('bin', BinaryType()))
--- End diff --

tiny nit: I would just write:

```python
input_values_with_schema += [
    (bytearray([0x01, 0x02]), StructField('bin', BinaryType()))]
```
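
One detail matters when `+=` is used on a list of `(value, field)` pairs: the new pair has to be wrapped as a tuple inside the list, otherwise its two parts are added as separate elements. A minimal sketch, with strings standing in for the `StructField` objects and illustrative variable names:

```python
# (value, field) pairs; the field strings are stand-ins for StructField objects.
pairs = [(1, "id"), (2, "byte")]

appended = list(pairs)
appended.append((bytearray([0x01, 0x02]), "bin"))   # one new pair

extended = list(pairs)
extended += [(bytearray([0x01, 0x02]), "bin")]      # same result as append

flattened = list(pairs)
flattened += [bytearray([0x01, 0x02]), "bin"]       # two loose elements, not a pair
```

Here `appended` and `extended` both end with a single pair, while `flattened` ends with a bare `bytearray` followed by a bare string.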


---


[GitHub] spark pull request #22568: [SPARK-23401][PYTHON][TESTS] Add more data types ...

2018-09-27 Thread HyukjinKwon

Github user HyukjinKwon commented on a diff in the pull request:

https://github.com/apache/spark/pull/22568#discussion_r220948939
  
--- Diff: python/pyspark/sql/tests.py ---
@@ -5714,24 +5755,31 @@ def test_wrong_args(self):
 pandas_udf(lambda x, y: x, DoubleType(), 
PandasUDFType.SCALAR))
 
 def test_unsupported_types(self):
+from distutils.version import LooseVersion
+import pyarrow as pa
 from pyspark.sql.functions import pandas_udf, PandasUDFType
-schema = StructType(
-[StructField("id", LongType(), True),
- StructField("map", MapType(StringType(), IntegerType()), 
True)])
-with QuietTest(self.sc):
-with self.assertRaisesRegexp(
-NotImplementedError,
-'Invalid returnType.*grouped map Pandas UDF.*MapType'):
-pandas_udf(lambda x: x, schema, PandasUDFType.GROUPED_MAP)
 
-schema = StructType(
-[StructField("id", LongType(), True),
- StructField("arr_ts", ArrayType(TimestampType()), True)])
-with QuietTest(self.sc):
-with self.assertRaisesRegexp(
-NotImplementedError,
-'Invalid returnType.*grouped map Pandas 
UDF.*ArrayType.*TimestampType'):
-pandas_udf(lambda x: x, schema, PandasUDFType.GROUPED_MAP)
+common_err_msg = 'Invalid returnType.*grouped map Pandas UDF.*'
+unsupported_types = [
+StructField('map', MapType(StringType(), IntegerType())),
+StructField('arr_ts', ArrayType(TimestampType())),
+StructField('null', NullType()),
+]
+
+# TODO: Remove this if-statement once minimum pyarrow version is 
0.10.0
+if LooseVersion(pa.__version__) < LooseVersion("0.10.0"):
+unsupported_types.append(
+StructField('bin', BinaryType())
+)
+
+for unsupported_type in unsupported_types:
+schema = StructType([
+StructField('id', LongType(), True),
+unsupported_type
+])
--- End diff --

I think we can inline this as well.


---


[GitHub] spark pull request #22568: [SPARK-23401][PYTHON][TESTS] Add more data types ...

2018-09-27 Thread HyukjinKwon

Github user HyukjinKwon commented on a diff in the pull request:

https://github.com/apache/spark/pull/22568#discussion_r220944429
  
--- Diff: python/pyspark/sql/tests.py ---
@@ -5525,32 +5525,73 @@ def data(self):
 .withColumn("v", explode(col('vs'))).drop('vs')
 
 def test_supported_types(self):
-from pyspark.sql.functions import pandas_udf, PandasUDFType, 
array, col
-df = self.data.withColumn("arr", array(col("id")))
+from decimal import Decimal
+from distutils.version import LooseVersion
+import pyarrow as pa
+from pyspark.sql.functions import pandas_udf, PandasUDFType
 
-# Different forms of group map pandas UDF, results of these are 
the same
+input_values_with_schema = [
+(1, StructField('id', IntegerType())),
+(2, StructField('byte', ByteType())),
+(3, StructField('short', ShortType())),
+(4, StructField('int', IntegerType())),
+(5, StructField('long', LongType())),
+(1.1, StructField('float', FloatType())),
+(2.2, StructField('double', DoubleType())),
+(Decimal(1.123), StructField('decim', DecimalType(10, 3))),
+([1, 2, 3], StructField('array', ArrayType(IntegerType()))),
+(True, StructField('bool', BooleanType())),
+('hello', StructField('str', StringType())),
+]
 
-output_schema = StructType(
-[StructField('id', LongType()),
- StructField('v', IntegerType()),
- StructField('arr', ArrayType(LongType())),
- StructField('v1', DoubleType()),
- StructField('v2', LongType())])
+# TODO: Add BinaryType to 'input_values_with_schema' once minimum 
pyarrow version is 0.10.0
+if LooseVersion(pa.__version__) >= LooseVersion("0.10.0"):
+input_values_with_schema.append(
+(bytearray([0x01, 0x02]), StructField('bin', BinaryType()))
+)
+
+values = [[x[0] for x in input_values_with_schema]]
+output_schema = StructType([x[1] for x in 
input_values_with_schema])
 
+df = self.spark.createDataFrame(values, schema=output_schema)
+
+# Different forms of group map pandas UDF, results of these are 
the same
 udf1 = pandas_udf(
-lambda pdf: pdf.assign(v1=pdf.v * pdf.id * 1.0, v2=pdf.v + 
pdf.id),
+lambda pdf: pdf.assign(
+decim=pdf.decim + pdf.decim,
+double=pdf.double + pdf.float,
+byte=pdf.byte + 1,
+long=pdf.byte + pdf.int + pdf.long + pdf.short,
--- End diff --

I would drop the calculations that mix different types so the test is easier 
to read.


---


[GitHub] spark issue #22295: [SPARK-25255][PYTHON]Add getActiveSession to SparkSessio...

2018-09-27 Thread holdenk

Github user holdenk commented on the issue:

https://github.com/apache/spark/pull/22295
  
Never mind, the merge script only triggers the edits when there are conflicts. 
If you can update 3.0 to 2.5, I'd be happy to merge.


---


[GitHub] spark issue #22295: [SPARK-25255][PYTHON]Add getActiveSession to SparkSessio...

2018-09-27 Thread holdenk

Github user holdenk commented on the issue:

https://github.com/apache/spark/pull/22295
  
LGTM except for the 3.0 to 2.5 change; I'll make that during the merge.


---


[GitHub] spark issue #21522: [SPARK-24467][ML] VectorAssemblerEstimator

2018-09-27 Thread AmplabJenkins

Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/21522
  
Build finished. Test FAILed.


---


[GitHub] spark issue #21522: [SPARK-24467][ML] VectorAssemblerEstimator

2018-09-27 Thread SparkQA

Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/21522
  
**[Test build #96700 has 
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/96700/testReport)**
 for PR 21522 at commit 
[`8e3aa44`](https://github.com/apache/spark/commit/8e3aa44c3937d60d5aa35dd03604e57ef218ebb4).
 * This patch **fails Java style tests**.
 * This patch **does not merge cleanly**.
 * This patch adds the following public classes _(experimental)_:
  * `class VectorAssemblerEstimator(override val uid: String)`
  * `class VectorAssemblerModel(override val uid: String, val 
vectorColsLengths: Map[String, Int])`
  * `  class VectorAssemblerModelWriter(instance: VectorAssemblerModel) 
extends MLWriter `
  * `  class VectorAssemblerModelReader extends 
MLReader[VectorAssemblerModel] `


---


[GitHub] spark issue #21522: [SPARK-24467][ML] VectorAssemblerEstimator

2018-09-27 Thread AmplabJenkins

Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/21522
  
Test FAILed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/96700/
Test FAILed.


---


[GitHub] spark issue #17654: [SPARK-20351] [ML] Add trait hasTrainingSummary to repla...

2018-09-27 Thread holdenk

Github user holdenk commented on the issue:

https://github.com/apache/spark/pull/17654
  
Thanks for working on this; removing duplicated code is great. I'm curious 
why we couldn't remove some of the function calls to super and depend on 
inheritance instead.

If the obstacle is the types on the setters, could we add another type 
parameter for the model?


---


[GitHub] spark issue #22570: [SPARK-25553][BUILD] Add EmptyInterpolatedStringChecker ...

2018-09-27 Thread HyukjinKwon

Github user HyukjinKwon commented on the issue:

https://github.com/apache/spark/pull/22570
  
I have exactly the same opinion as Sean.


---


[GitHub] spark issue #22305: [SPARK-24561][SQL][Python] User-defined window aggregati...

2018-09-27 Thread icexelloss

Github user icexelloss commented on the issue:

https://github.com/apache/spark/pull/22305
  
Gentle ping @cloud-fan @gatorsmile @HyukjinKwon @ueshin 


---


[GitHub] spark issue #21522: [SPARK-24467][ML] VectorAssemblerEstimator

2018-09-27 Thread SparkQA

Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/21522
  
**[Test build #96700 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/96700/testReport)**
 for PR 21522 at commit 
[`8e3aa44`](https://github.com/apache/spark/commit/8e3aa44c3937d60d5aa35dd03604e57ef218ebb4).


---


[GitHub] spark issue #22379: [SPARK-25393][SQL] Adding new function from_csv()

2018-09-27 Thread SparkQA

Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/22379
  
**[Test build #96699 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/96699/testReport)**
 for PR 22379 at commit 
[`4c2fcea`](https://github.com/apache/spark/commit/4c2fcea15549f0338b4c7f58aa2f8968ba660aff).


---


[GitHub] spark issue #21522: [SPARK-24467][ML] VectorAssemblerEstimator

2018-09-27 Thread holdenk

Github user holdenk commented on the issue:

https://github.com/apache/spark/pull/21522
  
cc @jkbradley as the reporter of this issue you might want to take a look.


---


[GitHub] spark issue #21522: [SPARK-24467][ML] VectorAssemblerEstimator

2018-09-27 Thread holdenk

Github user holdenk commented on the issue:

https://github.com/apache/spark/pull/21522
  
Jenkins ok to test.


---


[GitHub] spark issue #22570: [SPARK-25553][BUILD] Add EmptyInterpolatedStringChecker ...

2018-09-27 Thread AmplabJenkins

Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/22570
  
Test PASSed.
Refer to this link for build results (access rights to CI server needed): 

https://amplab.cs.berkeley.edu/jenkins//job/testing-k8s-prb-make-spark-distribution-unified/3540/
Test PASSed.


---


[GitHub] spark issue #22570: [SPARK-25553][BUILD] Add EmptyInterpolatedStringChecker ...

2018-09-27 Thread AmplabJenkins

Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/22570
  
Merged build finished. Test PASSed.


---


[GitHub] spark issue #22570: [SPARK-25553][BUILD] Add EmptyInterpolatedStringChecker ...

2018-09-27 Thread SparkQA

Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/22570
  
**[Test build #96698 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/96698/testReport)**
 for PR 22570 at commit 
[`68f83a3`](https://github.com/apache/spark/commit/68f83a366497c1b263d4f4bf67ad46bcc5c65c6d).


---


[GitHub] spark pull request #22570: [SPARK-25553][BUILD] Add EmptyInterpolatedStringC...

2018-09-27 Thread srowen

Github user srowen commented on a diff in the pull request:

https://github.com/apache/spark/pull/22570#discussion_r220933694
  
--- Diff: 
core/src/main/scala/org/apache/spark/deploy/rest/StandaloneRestServer.scala ---
@@ -91,7 +91,7 @@ private[rest] class 
StandaloneStatusRequestServlet(masterEndpoint: RpcEndpointRe
   protected def handleStatus(submissionId: String): 
SubmissionStatusResponse = {
 val response = 
masterEndpoint.askSync[DeployMessages.DriverStatusResponse](
   DeployMessages.RequestDriverStatus(submissionId))
-val message = response.exception.map { s"Exception from the 
cluster:\n" + formatException(_) }
+val message = response.exception.map { "Exception from the cluster:\n" 
+ formatException(_) }
--- End diff --

In cases like this, we should use interpolation in place of concatenation, 
if anything. The same applies to the previous instance, for example. 


---


[GitHub] spark issue #22569: [SPARK-25542][SQL][Test] Move flaky test in OpenHashMapS...

2018-09-27 Thread AmplabJenkins

Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/22569
  
Test PASSed.
Refer to this link for build results (access rights to CI server needed): 

https://amplab.cs.berkeley.edu/jenkins//job/testing-k8s-prb-make-spark-distribution-unified/3539/
Test PASSed.


---


[GitHub] spark issue #22569: [SPARK-25542][SQL][Test] Move flaky test in OpenHashMapS...

2018-09-27 Thread SparkQA

Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/22569
  
**[Test build #96697 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/96697/testReport)**
 for PR 22569 at commit 
[`39a77d6`](https://github.com/apache/spark/commit/39a77d6596779044617912f2d6a5c2aeeb1a32f8).


---


[GitHub] spark issue #22569: [SPARK-25542][SQL][Test] Move flaky test in OpenHashMapS...

2018-09-27 Thread AmplabJenkins

Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/22569
  
Merged build finished. Test PASSed.


---


[GitHub] spark issue #22569: [SPARK-25542][SQL][Test] Move flaky test in OpenHashMapS...

2018-09-27 Thread viirya

Github user viirya commented on the issue:

https://github.com/apache/spark/pull/22569
  
retest this please.


---


[GitHub] spark issue #22569: [SPARK-25542][SQL][Test] Move flaky test in OpenHashMapS...

2018-09-27 Thread AmplabJenkins

Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/22569
  
Merged build finished. Test FAILed.


---


[GitHub] spark issue #22569: [SPARK-25542][SQL][Test] Move flaky test in OpenHashMapS...

2018-09-27 Thread AmplabJenkins

Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/22569
  
Test FAILed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/96688/
Test FAILed.


---


[GitHub] spark issue #22569: [SPARK-25542][SQL][Test] Move flaky test in OpenHashMapS...

2018-09-27 Thread SparkQA

Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/22569
  
**[Test build #96688 has 
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/96688/testReport)**
 for PR 22569 at commit 
[`39a77d6`](https://github.com/apache/spark/commit/39a77d6596779044617912f2d6a5c2aeeb1a32f8).
 * This patch **fails Spark unit tests**.
 * This patch merges cleanly.
 * This patch adds no public classes.


---


[GitHub] spark issue #22570: [SPARK-25553][BUILD] Add EmptyInterpolatedStringChecker ...

2018-09-27 Thread SparkQA

Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/22570
  
**[Test build #96696 has 
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/96696/testReport)**
 for PR 22570 at commit 
[`4adfa46`](https://github.com/apache/spark/commit/4adfa46dfda68ede43f68485e6df36b50b8dfa96).
 * This patch **fails Scala style tests**.
 * This patch merges cleanly.
 * This patch adds no public classes.


---


[GitHub] spark issue #22570: [SPARK-25553][BUILD] Add EmptyInterpolatedStringChecker ...

2018-09-27 Thread AmplabJenkins

Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/22570
  
Test FAILed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/96696/
Test FAILed.


---


[GitHub] spark issue #22570: [SPARK-25553][BUILD] Add EmptyInterpolatedStringChecker ...

2018-09-27 Thread AmplabJenkins

Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/22570
  
Merged build finished. Test FAILed.


---


[GitHub] spark issue #22570: [SPARK-25553][BUILD] Add EmptyInterpolatedStringChecker ...

2018-09-27 Thread AmplabJenkins

Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/22570
  
Merged build finished. Test PASSed.


---


[GitHub] spark issue #22570: [SPARK-25553][BUILD] Add EmptyInterpolatedStringChecker ...

2018-09-27 Thread AmplabJenkins

Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/22570
  
Test PASSed.
Refer to this link for build results (access rights to CI server needed): 

https://amplab.cs.berkeley.edu/jenkins//job/testing-k8s-prb-make-spark-distribution-unified/3538/
Test PASSed.


---


[GitHub] spark issue #22570: [SPARK-25553][BUILD] Add EmptyInterpolatedStringChecker ...

2018-09-27 Thread SparkQA

Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/22570
  
**[Test build #96696 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/96696/testReport)**
 for PR 22570 at commit 
[`4adfa46`](https://github.com/apache/spark/commit/4adfa46dfda68ede43f68485e6df36b50b8dfa96).


---


[GitHub] spark issue #22138: [SPARK-25151][SS] Apply Apache Commons Pool to KafkaData...

2018-09-27 Thread HeartSaVioR

Github user HeartSaVioR commented on the issue:

https://github.com/apache/spark/pull/22138
  
Just rebased.


---


[GitHub] spark issue #22138: [SPARK-25151][SS] Apply Apache Commons Pool to KafkaData...

2018-09-27 Thread SparkQA

Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/22138
  
**[Test build #96695 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/96695/testReport)**
 for PR 22138 at commit 
[`d3f097b`](https://github.com/apache/spark/commit/d3f097bcd2c808a543326a6990b217774f86).


---


[GitHub] spark issue #22570: [SPARK-25553][BUILD] Add EmptyInterpolatedStringChecker ...

2018-09-27 Thread SparkQA

Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/22570
  
**[Test build #96694 has 
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/96694/testReport)**
 for PR 22570 at commit 
[`0a01aa3`](https://github.com/apache/spark/commit/0a01aa311f038771706cb029cf89607390f0cd57).
 * This patch **fails Scala style tests**.
 * This patch merges cleanly.
 * This patch adds the following public classes _(experimental)_:
  * `println(\"Per-class example fractions, counts:\")`
  * `  instr.logWarning(\"All labels belong to a single class and 
fitIntercept=false. It's a \" +`
  * `  require(className == expectedClassName, \"Error loading 
metadata: Expected class name\" +`


---


[GitHub] spark pull request #22563: [SPARK-24341][SQL][followup] remove duplicated er...

2018-09-27 Thread asfgit

Github user asfgit closed the pull request at:

https://github.com/apache/spark/pull/22563


---


[GitHub] spark issue #22570: [SPARK-25553][BUILD] Add EmptyInterpolatedStringChecker ...

2018-09-27 Thread AmplabJenkins

Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/22570
  
Merged build finished. Test PASSed.


---


[GitHub] spark issue #22570: [SPARK-25553][BUILD] Add EmptyInterpolatedStringChecker ...

2018-09-27 Thread AmplabJenkins

Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/22570
  
Test FAILed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/96694/
Test FAILed.


---


[GitHub] spark issue #22563: [SPARK-24341][SQL][followup] remove duplicated error che...

2018-09-27 Thread cloud-fan

Github user cloud-fan commented on the issue:

https://github.com/apache/spark/pull/22563
  
thanks, merging to master!


---


[GitHub] spark issue #22570: [SPARK-25553][BUILD] Add EmptyInterpolatedStringChecker ...

2018-09-27 Thread AmplabJenkins

Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/22570
  
Merged build finished. Test FAILed.


---


[GitHub] spark issue #22570: [SPARK-25553][BUILD] Add EmptyInterpolatedStringChecker ...

2018-09-27 Thread SparkQA

Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/22570
  
**[Test build #96694 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/96694/testReport)**
 for PR 22570 at commit 
[`0a01aa3`](https://github.com/apache/spark/commit/0a01aa311f038771706cb029cf89607390f0cd57).


---


[GitHub] spark issue #22570: [SPARK-25553][BUILD] Add EmptyInterpolatedStringChecker ...

2018-09-27 Thread AmplabJenkins

Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/22570
  
Test PASSed.
Refer to this link for build results (access rights to CI server needed): 

https://amplab.cs.berkeley.edu/jenkins//job/testing-k8s-prb-make-spark-distribution-unified/3537/
Test PASSed.


---


[GitHub] spark issue #22570: [SPARK-25553][BUILD] Add EmptyInterpolatedStringChecker ...

2018-09-27 Thread wangyum

Github user wangyum commented on the issue:

https://github.com/apache/spark/pull/22570
  
cc @dongjoon-hyun @HyukjinKwon @srowen


---


[GitHub] spark pull request #22570: [SPARK-25553][BUILD] Add EmptyInterpolatedStringC...

2018-09-27 Thread wangyum

GitHub user wangyum opened a pull request:

https://github.com/apache/spark/pull/22570

[SPARK-25553][BUILD] Add EmptyInterpolatedStringChecker to scalastyle config

## What changes were proposed in this pull request?

 
[EmptyInterpolatedStringChecker](http://www.scalastyle.org/rules-dev.html#org_scalastyle_scalariform_EmptyInterpolatedStringChecker)
 checks for empty string interpolations. This check is very useful to 
us. This PR adds it to `scalastyle-config.xml` and fixes all existing empty 
interpolated string issues.
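
For illustration, here is a minimal sketch of what this check targets. It assumes (based on the rule name and the linked rule description) that the checker flags `s"..."` literals that contain no interpolated expressions. The object and value names are hypothetical and are not taken from this PR.

```scala
object EmptyInterpolationExample {
  val table = "t1"

  // Would presumably be flagged: the `s` prefix is present,
  // but the literal contains no `$` expression to interpolate.
  val flagged: String = s"Table not found"

  // The fix is to drop the unnecessary `s` prefix.
  val plain: String = "Table not found"

  // A genuine interpolation, which the check should leave alone.
  val interpolated: String = s"Table $table not found"
}
```

The flagged and plain values are identical at runtime, so removing the prefix is a purely cosmetic change.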

## How was this patch tested?

manual tests


You can merge this pull request into a Git repository by running:

$ git pull https://github.com/wangyum/spark SPARK-25553

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/spark/pull/22570.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #22570


commit 0a01aa311f038771706cb029cf89607390f0cd57
Author: Yuming Wang 
Date:   2018-09-27T12:43:15Z

Add EmptyInterpolatedStringChecker to scalastyle-config.xml




---


[GitHub] spark issue #22568: [SPARK-23401][PYTHON][TESTS] Add more data types for Pan...

2018-09-27 Thread AmplabJenkins

Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/22568
  
Merged build finished. Test PASSed.


---


[GitHub] spark issue #22568: [SPARK-23401][PYTHON][TESTS] Add more data types for Pan...

2018-09-27 Thread AmplabJenkins

Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/22568
  
Test PASSed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/96692/
Test PASSed.


---


[GitHub] spark issue #22568: [SPARK-23401][PYTHON][TESTS] Add more data types for Pan...

2018-09-27 Thread SparkQA

Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/22568
  
**[Test build #96692 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/96692/testReport)** for PR 22568 at commit [`53ff750`](https://github.com/apache/spark/commit/53ff750456057098b029a801266818fc7204cf79).
 * This patch passes all tests.
 * This patch merges cleanly.
 * This patch adds no public classes.


---


[GitHub] spark issue #22466: [SPARK-25464][SQL]On dropping the Database it will drop ...

2018-09-27 Thread sandeep-katta

Github user sandeep-katta commented on the issue:

https://github.com/apache/spark/pull/22466
  
I am running the same test case with Hive version **1.2.1.spark2** and it 
passes. Which Hive version does CI run with, and how are 
org.apache.hive.jdbc.HiveStatement and the external catalog linked? I don't see 
any such code. cc @cloud-fan @gatorsmile


---


[GitHub] spark issue #21588: [SPARK-24590][BUILD] Make Jenkins tests passed with hado...

2018-09-27 Thread AmplabJenkins

Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/21588
  
Test PASSed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/96685/
Test PASSed.


---


[GitHub] spark issue #21588: [SPARK-24590][BUILD] Make Jenkins tests passed with hado...

2018-09-27 Thread AmplabJenkins

Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/21588
  
Merged build finished. Test PASSed.


---


[GitHub] spark issue #21588: [SPARK-24590][BUILD] Make Jenkins tests passed with hado...

2018-09-27 Thread SparkQA

Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/21588
  
**[Test build #96685 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/96685/testReport)** for PR 21588 at commit [`a011e50`](https://github.com/apache/spark/commit/a011e50e57537589b23099b4c7b6e2e893c86f9e).
 * This patch passes all tests.
 * This patch merges cleanly.
 * This patch adds no public classes.


---


[GitHub] spark issue #21257: [SPARK-24194] [SQL]HadoopFsRelation cannot overwrite a p...

2018-09-27 Thread AmplabJenkins

Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/21257
  
Can one of the admins verify this patch?


---


[GitHub] spark issue #22563: [SPARK-24341][SQL][followup] remove duplicated error che...

2018-09-27 Thread AmplabJenkins

Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/22563
  
Merged build finished. Test PASSed.


---


[GitHub] spark issue #22528: [SPARK-25513][SQL] Read zipped CSV and JSON

2018-09-27 Thread MaxGekk

Github user MaxGekk commented on the issue:

https://github.com/apache/spark/pull/22528
  
> Another concern here is, we have another place to control the compression 
codec (where we usually delegate to HDFS libraries).

I was considering using the Compressor API, but its streaming nature 
conflicts with the structure of a zip archive, where the meta-info is located 
at the end of the file, so you cannot read/uncompress it sequentially 
block-by-block.

> It just sounds like a bandaid fix to allow one zipped file case in multi 
line mode.

I believe it is better to return the correct result in a case where a wrong 
result is currently returned (when trying to read zipped CSV) than to force 
users to rely on the workaround of reading zip archives via the RDD API: 
https://docs.databricks.com/spark/latest/data-sources/zip-files.html#zip-files 
. Especially for compressed, non-splittable CSV, it makes little difference 
whether the file is read with multiLine enabled or disabled.
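
As a rough sketch of the per-archive decompression step that such a workaround needs: the code below uses only `java.util.zip` from the JDK, not any Spark API, and builds a tiny archive in memory so that it can run on its own. The names `ZipLinesSketch`, `zipOf` and `linesOf` are hypothetical. In a Spark job one would presumably apply something like `linesOf` to the bytes of each archive (for example as obtained through `SparkContext.binaryFiles`); that wiring is an assumption here and is not taken from the linked page.

```scala
import java.io.{ByteArrayInputStream, ByteArrayOutputStream}
import java.nio.charset.StandardCharsets.UTF_8
import java.util.zip.{ZipEntry, ZipInputStream, ZipOutputStream}
import scala.io.Source

object ZipLinesSketch {
  // Build a small in-memory zip archive from (entry name, text content) pairs.
  def zipOf(entries: Seq[(String, String)]): Array[Byte] = {
    val buffer = new ByteArrayOutputStream()
    val out = new ZipOutputStream(buffer)
    entries.foreach { case (name, content) =>
      out.putNextEntry(new ZipEntry(name))
      out.write(content.getBytes(UTF_8))
      out.closeEntry()
    }
    out.close()
    buffer.toByteArray
  }

  // Return the text lines of every entry, in archive order.
  // The whole archive is handled as one unit; nothing here is splittable.
  def linesOf(archive: Array[Byte]): List[String] = {
    val in = new ZipInputStream(new ByteArrayInputStream(archive))
    try {
      Iterator
        .continually(in.getNextEntry)
        .takeWhile(_ != null)
        // Read the current entry eagerly before moving on to the next one.
        .flatMap(_ => Source.fromInputStream(in, "UTF-8").getLines().toList)
        .toList
    } finally {
      in.close()
    }
  }
}
```

Each archive is processed as a whole, which is consistent with the point above that a zipped CSV is not splittable in either mode.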


---


[GitHub] spark issue #22563: [SPARK-24341][SQL][followup] remove duplicated error che...

2018-09-27 Thread AmplabJenkins

Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/22563
  
Test PASSed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/96687/
Test PASSed.


---


[GitHub] spark issue #22563: [SPARK-24341][SQL][followup] remove duplicated error che...

2018-09-27 Thread SparkQA

Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/22563
  
**[Test build #96687 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/96687/testReport)** for PR 22563 at commit [`d810e0d`](https://github.com/apache/spark/commit/d810e0dfec9bc8f182509ccf75623f2e92e95290).
 * This patch passes all tests.
 * This patch merges cleanly.
 * This patch adds no public classes.


---


[GitHub] spark pull request #22010: [SPARK-21436][CORE] Take advantage of known parti...

2018-09-27 Thread asfgit

Github user asfgit closed the pull request at:

https://github.com/apache/spark/pull/22010


---


[GitHub] spark issue #9168: [SPARK-11182] HDFS Delegation Token will be expired when ...

2018-09-27 Thread Tianny

Github user Tianny commented on the issue:

https://github.com/apache/spark/pull/9168
  
@jackiehff Have you solved the problem? I ran into the same error as you.



---


[GitHub] spark issue #22010: [SPARK-21436][CORE] Take advantage of known partitioner ...

2018-09-27 Thread cloud-fan

Github user cloud-fan commented on the issue:

https://github.com/apache/spark/pull/22010
  
thanks, merging to master!


---


[GitHub] spark pull request #22237: [SPARK-25243][SQL] Use FailureSafeParser in from_...

2018-09-27 Thread HyukjinKwon

Github user HyukjinKwon commented on a diff in the pull request:

https://github.com/apache/spark/pull/22237#discussion_r220908825
  
--- Diff: 
sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/util/FailureSafeParser.scala
 ---
@@ -15,50 +15,51 @@
  * limitations under the License.
  */
 
-package org.apache.spark.sql.execution.datasources
+package org.apache.spark.sql.catalyst.util
 
 import org.apache.spark.SparkException
 import org.apache.spark.sql.catalyst.InternalRow
 import org.apache.spark.sql.catalyst.expressions.GenericInternalRow
-import org.apache.spark.sql.catalyst.util._
-import org.apache.spark.sql.internal.SQLConf
-import org.apache.spark.sql.types.StructType
+import org.apache.spark.sql.types.{DataType, StructType}
 import org.apache.spark.unsafe.types.UTF8String
 
 class FailureSafeParser[IN](
 rawParser: IN => Seq[InternalRow],
 mode: ParseMode,
-schema: StructType,
+dataType: DataType,
 columnNameOfCorruptRecord: String,
 isMultiLine: Boolean) {
-
-  private val corruptFieldIndex = schema.getFieldIndex(columnNameOfCorruptRecord)
-  private val actualSchema = StructType(schema.filterNot(_.name == columnNameOfCorruptRecord))
-  private val resultRow = new GenericInternalRow(schema.length)
-  private val nullResult = new GenericInternalRow(schema.length)
-
   // This function takes 2 parameters: an optional partial result, and the bad record. If the given
   // schema doesn't contain a field for corrupted record, we just return the partial result or a
   // row with all fields null. If the given schema contains a field for corrupted record, we will
   // set the bad record to this field, and set other fields according to the partial result or null.
-  private val toResultRow: (Option[InternalRow], () => UTF8String) => InternalRow = {
--- End diff --

Just to make review and backporting easier, and to preserve the original 
authorship of the code.
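
To make the behaviour described in the code comment in the diff concrete, here is a simplified model of it. It is not the Spark implementation: rows are plain `Vector[Option[String]]` values instead of `InternalRow`, and `ToResultRowSketch`, `Row` and the parameter names are hypothetical.

```scala
object ToResultRowSketch {
  // A row is modelled as a vector of nullable string fields (None = null).
  type Row = Vector[Option[String]]

  // Returns a function of (optional partial result, bad record) => result row.
  // The partial result holds only the data fields, without the corrupt-record field.
  def toResultRow(
      fieldNames: Vector[String],
      corruptColumn: String): (Option[Row], () => String) => Row = {
    val corruptIndex = fieldNames.indexOf(corruptColumn)
    if (corruptIndex < 0) {
      // No corrupt-record field: return the partial result or an all-null row.
      val allNull: Row = Vector.fill(fieldNames.length)(None)
      (partial, _) => partial.getOrElse(allNull)
    } else {
      // Corrupt-record field present: place the bad record at its position and
      // fill the other fields from the partial result, or with nulls.
      val allNull: Row = Vector.fill(fieldNames.length - 1)(None)
      (partial, badRecord) => {
        val (before, after) = partial.getOrElse(allNull).splitAt(corruptIndex)
        (before :+ Some(badRecord())) ++ after
      }
    }
  }
}
```

The bad record is passed as a thunk, mirroring the `() => UTF8String` parameter in the diff, so that it is only materialised when the schema actually has a field for it.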


---


[GitHub] spark issue #22568: [SPARK-23401][PYTHON][TESTS] Add more data types for Pan...

2018-09-27 Thread SparkQA

Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/22568
  
**[Test build #96692 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/96692/testReport)** for PR 22568 at commit [`53ff750`](https://github.com/apache/spark/commit/53ff750456057098b029a801266818fc7204cf79).


---


[GitHub] spark issue #22237: [SPARK-25243][SQL] Use FailureSafeParser in from_json

2018-09-27 Thread AmplabJenkins

Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/22237
  
Test PASSed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/96684/
Test PASSed.


---


[GitHub] spark issue #22237: [SPARK-25243][SQL] Use FailureSafeParser in from_json

2018-09-27 Thread SparkQA

Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/22237
  
**[Test build #96693 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/96693/testReport)** for PR 22237 at commit [`939e220`](https://github.com/apache/spark/commit/939e220475b2c33ab63df117ee6aec1ed4afdd2b).


---


[GitHub] spark issue #22237: [SPARK-25243][SQL] Use FailureSafeParser in from_json

2018-09-27 Thread AmplabJenkins

Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/22237
  
Merged build finished. Test PASSed.


---


[GitHub] spark issue #22237: [SPARK-25243][SQL] Use FailureSafeParser in from_json

2018-09-27 Thread SparkQA

Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/22237
  
**[Test build #96684 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/96684/testReport)** for PR 22237 at commit [`4e196f6`](https://github.com/apache/spark/commit/4e196f654b3b921787ca5a936f0a5635d26d75d4).
 * This patch passes all tests.
 * This patch merges cleanly.
 * This patch adds no public classes.


---


