[GitHub] spark pull request #22197: [SPARK-25207][SQL] Case-insensitive field resoluti...

2018-08-28 Thread yucai
Github user yucai commented on a diff in the pull request:

https://github.com/apache/spark/pull/22197#discussion_r213551202
  
--- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/parquet/ParquetFilters.scala ---
@@ -350,25 +356,38 @@ private[parquet] class ParquetFilters(
   }
 
   /**
-   * Returns a map from name of the column to the data type, if predicate push down applies.
+   * Returns a map, which contains parquet field name and data type, if predicate push down applies.
    */
-  private def getFieldMap(dataType: MessageType): Map[String, ParquetSchemaType] = dataType match {
-    case m: MessageType =>
-      // Here we don't flatten the fields in the nested schema but just look up through
-      // root fields. Currently, accessing to nested fields does not push down filters
-      // and it does not support to create filters for them.
-      m.getFields.asScala.filter(_.isPrimitive).map(_.asPrimitiveType()).map { f =>
-        f.getName -> ParquetSchemaType(
-          f.getOriginalType, f.getPrimitiveTypeName, f.getTypeLength, f.getDecimalMetadata)
-      }.toMap
-    case _ => Map.empty[String, ParquetSchemaType]
+  private def getFieldMap(dataType: MessageType): Map[String, ParquetField] = {
+    // Here we don't flatten the fields in the nested schema but just look up through
+    // root fields. Currently, accessing to nested fields does not push down filters
+    // and it does not support to create filters for them.
+    val primitiveFields =
+      dataType.getFields.asScala.filter(_.isPrimitive).map(_.asPrimitiveType()).map { f =>
+        f.getName -> ParquetField(f.getName,
+          ParquetSchemaType(f.getOriginalType,
+            f.getPrimitiveTypeName, f.getTypeLength, f.getDecimalMetadata))
+      }
+    if (caseSensitive) {
+      primitiveFields.toMap
+    } else {
+      // Don't consider ambiguity here, i.e. more than one field is matched in case insensitive
+      // mode, just skip pushdown for these fields, they will trigger Exception when reading,
+      // See: SPARK-25132.
--- End diff --

@cloud-fan, it is a great idea, thanks!
I think the point is not to "dedup" before pushdown and pruning.
Maybe we should clip the parquet schema before pushdown and pruning:
if duplicated fields are detected, throw an exception;
if not, pass the clipped parquet schema to the parquet lib via the hadoop conf.
```
catalystRequestedSchema = {
  val conf = context.getConfiguration
  val schemaString = conf.get(ParquetReadSupport.SPARK_ROW_REQUESTED_SCHEMA)
  assert(schemaString != null, "Parquet requested schema not set.")
  StructType.fromString(schemaString)
}

val caseSensitive = context.getConfiguration.getBoolean(SQLConf.CASE_SENSITIVE.key,
  SQLConf.CASE_SENSITIVE.defaultValue.get)
val parquetRequestedSchema = ParquetReadSupport.clipParquetSchema(
  context.getFileSchema, catalystRequestedSchema, caseSensitive)
```
I am trying this way, will update soon.
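
For illustration, the duplicate check could look something like this (a hedged sketch, not the actual patch; `ParquetField` here is a stand-in for the case class in the diff above):
```
// Hedged sketch: group parquet fields by lower-cased name; in case-insensitive
// mode a group with more than one member is ambiguous, so fail fast instead of
// silently picking one (cf. SPARK-25132).
case class ParquetField(fieldName: String)

def dedupFieldMap(primitiveFields: Seq[(String, ParquetField)]): Map[String, ParquetField] = {
  val grouped = primitiveFields.groupBy { case (name, _) => name.toLowerCase }
  grouped.map { case (lowerName, fields) =>
    if (fields.size > 1) {
      throw new RuntimeException(
        s"""Found duplicate field(s) "$lowerName": """ + fields.map(_._1).mkString(", "))
    }
    // Otherwise keep one entry per lower-cased name for lookups.
    lowerName -> fields.head._2
  }
}
```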


---




[GitHub] spark issue #21546: [SPARK-23030][SQL][PYTHON] Use Arrow stream format for c...

2018-08-28 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/21546
  
Merged build finished. Test FAILed.


---




[GitHub] spark issue #21546: [SPARK-23030][SQL][PYTHON] Use Arrow stream format for c...

2018-08-28 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/21546
  
Test FAILed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/95390/
Test FAILed.


---




[GitHub] spark issue #21546: [SPARK-23030][SQL][PYTHON] Use Arrow stream format for c...

2018-08-28 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/21546
  
**[Test build #95390 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/95390/testReport)** for PR 21546 at commit [`ffb47cb`](https://github.com/apache/spark/commit/ffb47cb2d411b91e240ab40cd6bd75b025e417c2).
 * This patch **fails from timeout after a configured wait of \`400m\`**.
 * This patch merges cleanly.
 * This patch adds the following public classes _(experimental)_:
   * `case class ArrowEvalPython(udfs: Seq[PythonUDF], output: Seq[Attribute], child: LogicalPlan)`
   * `case class BatchEvalPython(udfs: Seq[PythonUDF], output: Seq[Attribute], child: LogicalPlan)`


---




[GitHub] spark pull request #21860: [SPARK-24901][SQL]Merge the codegen of RegularHas...

2018-08-28 Thread cloud-fan
Github user cloud-fan commented on a diff in the pull request:

https://github.com/apache/spark/pull/21860#discussion_r213547480
  
--- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/aggregate/HashAggregateExec.scala ---
@@ -853,33 +853,48 @@ case class HashAggregateExec(
 
     val updateRowInHashMap: String = {
       if (isFastHashMapEnabled) {
-        ctx.INPUT_ROW = fastRowBuffer
-        val boundUpdateExpr = updateExpr.map(BindReferences.bindReference(_, inputAttr))
-        val subExprs = ctx.subexpressionEliminationForWholeStageCodegen(boundUpdateExpr)
-        val effectiveCodes = subExprs.codes.mkString("\n")
-        val fastRowEvals = ctx.withSubExprEliminationExprs(subExprs.states) {
-          boundUpdateExpr.map(_.genCode(ctx))
-        }
-        val updateFastRow = fastRowEvals.zipWithIndex.map { case (ev, i) =>
-          val dt = updateExpr(i).dataType
-          CodeGenerator.updateColumn(
-            fastRowBuffer, dt, i, ev, updateExpr(i).nullable, isVectorizedHashMapEnabled)
-        }
+        if (isVectorizedHashMapEnabled) {
+          ctx.INPUT_ROW = fastRowBuffer
+          val boundUpdateExpr = updateExpr.map(BindReferences.bindReference(_, inputAttr))
+          val subExprs = ctx.subexpressionEliminationForWholeStageCodegen(boundUpdateExpr)
+          val effectiveCodes = subExprs.codes.mkString("\n")
+          val fastRowEvals = ctx.withSubExprEliminationExprs(subExprs.states) {
+            boundUpdateExpr.map(_.genCode(ctx))
+          }
+          val updateFastRow = fastRowEvals.zipWithIndex.map { case (ev, i) =>
+            val dt = updateExpr(i).dataType
+            CodeGenerator.updateColumn(
+              fastRowBuffer, dt, i, ev, updateExpr(i).nullable, isVectorized = true)
+          }
 
-        // If fast hash map is on, we first generate code to update row in fast hash map, if the
-        // previous loop up hit fast hash map. Otherwise, update row in regular hash map.
-        s"""
-           |if ($fastRowBuffer != null) {
-           |  // common sub-expressions
-           |  $effectiveCodes
-           |  // evaluate aggregate function
-           |  ${evaluateVariables(fastRowEvals)}
-           |  // update fast row
-           |  ${updateFastRow.mkString("\n").trim}
-           |} else {
-           |  $updateRowInRegularHashMap
-           |}
-           """.stripMargin
+          // If vectorized fast hash map is on, we first generate code to update row
+          // in vectorized fast hash map, if the previous loop up hit vectorized fast hash map.
+          // Otherwise, update row in regular hash map.
+          s"""
+             |if ($fastRowBuffer != null) {
+             |  // common sub-expressions
+             |  $effectiveCodes
+             |  // evaluate aggregate function
+             |  ${evaluateVariables(fastRowEvals)}
+             |  // update fast row
+             |  ${updateFastRow.mkString("\n").trim}
+             |} else {
+             |  $updateRowInRegularHashMap
+             |}
+          """.stripMargin
+        } else {
+          // If fast hash map is on and the previous loop up hit fast hash map,
--- End diff --

if row-based hash map is on...


---




[GitHub] spark issue #22222: [SPARK-25083][SQL] Remove the type erasure hack in data ...

2018-08-28 Thread cloud-fan
Github user cloud-fan commented on the issue:

https://github.com/apache/spark/pull/22222
  
+1 on @rdblue 's idea. One point: we should use `ColumnarBatchScan.supportsBatch` to indicate whether the scan is columnar or not, instead of asking the RDD to report it.


---




[GitHub] spark pull request #22252: [SPARK-25261][MINOR][DOC] correct the default uni...

2018-08-28 Thread ivoson
Github user ivoson commented on a diff in the pull request:

https://github.com/apache/spark/pull/22252#discussion_r213546414
  
--- Diff: docs/configuration.md ---
@@ -152,7 +152,7 @@ of the most common options to set are:
  <td><code>spark.driver.memory</code></td>
  <td>1g</td>
  <td>
-    Amount of memory to use for the driver process, i.e. where SparkContext is initialized, in MiB 
+    Amount of memory to use for the driver process, i.e. where SparkContext is initialized, in bytes 
--- End diff --

If we start a job with conf spark.executor.memory=1024 or spark.driver.memory=1024, it currently means 1024 bytes. So I think we should update the doc to avoid confusing Spark users.


---




[GitHub] spark issue #21546: [SPARK-23030][SQL][PYTHON] Use Arrow stream format for c...

2018-08-28 Thread BryanCutler
Github user BryanCutler commented on the issue:

https://github.com/apache/spark/pull/21546
  
@HyukjinKwon I redid the benchmarks for `toPandas` with the current code and updated the description. It's not a huge speedup now, but it definitely does improve some. I'll also follow up with another PR with the out-of-order batches to improve this even further. Let me know if this looks ok to you (pending tests). Thanks!


---




[GitHub] spark issue #21669: [SPARK-23257][K8S][WIP] Kerberos Support for Spark on K8...

2018-08-28 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/21669
  
Test FAILed.
Refer to this link for build results (access rights to CI server needed): 

https://amplab.cs.berkeley.edu/jenkins//job/testing-k8s-prb-make-spark-distribution-unified/2659/
Test FAILed.


---




[GitHub] spark issue #21669: [SPARK-23257][K8S][WIP] Kerberos Support for Spark on K8...

2018-08-28 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/21669
  
Merged build finished. Test FAILed.


---




[GitHub] spark issue #21669: [SPARK-23257][K8S][WIP] Kerberos Support for Spark on K8...

2018-08-28 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/21669
  
Kubernetes integration test status failure
URL: 
https://amplab.cs.berkeley.edu/jenkins/job/testing-k8s-prb-make-spark-distribution-unified/2659/



---




[GitHub] spark pull request #22252: [SPARK-25261][MINOR][DOC] correct the default uni...

2018-08-28 Thread ivoson
Github user ivoson commented on a diff in the pull request:

https://github.com/apache/spark/pull/22252#discussion_r213544524
  
--- Diff: docs/configuration.md ---
@@ -152,7 +152,7 @@ of the most common options to set are:
  <td><code>spark.driver.memory</code></td>
  <td>1g</td>
  <td>
-    Amount of memory to use for the driver process, i.e. where SparkContext is initialized, in MiB 
+    Amount of memory to use for the driver process, i.e. where SparkContext is initialized, in bytes 
--- End diff --

@xuanyuanking @HyukjinKwon @srowen thanks for your reply. I've also noticed the code above, but `DRIVER_MEMORY` and `EXECUTOR_MEMORY` in config/package.scala are never used; maybe they are intended for future use. The code below shows how the conf is used for now, please take a look.

https://github.com/apache/spark/blob/master/core/src/main/scala/org/apache/spark/SparkContext.scala#L465


https://github.com/apache/spark/blob/master/core/src/main/scala/org/apache/spark/util/Utils.scala#L1130


https://github.com/ivoson/spark/blob/master/launcher/src/main/java/org/apache/spark/launcher/SparkSubmitCommandBuilder.java#L265
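
To make the behavior concrete, here is a small illustration (a hedged sketch; this is a simplified re-implementation of what `Utils.memoryStringToMb` does, not Spark's actual parser, which supports more suffixes):
```
// Hedged illustration: a suffix-less value is parsed as bytes and then
// truncated to whole MiB, which is why "1024" does not mean 1024 MiB.
def memoryStringToMb(str: String): Int = {
  val bytes = str.toLowerCase match {
    case s if s.endsWith("k") => s.dropRight(1).toLong * 1024L
    case s if s.endsWith("m") => s.dropRight(1).toLong * 1024L * 1024L
    case s if s.endsWith("g") => s.dropRight(1).toLong * 1024L * 1024L * 1024L
    case s => s.toLong // no suffix: interpreted as bytes
  }
  (bytes / 1024 / 1024).toInt
}

memoryStringToMb("1024")  // 0 MiB -- "1024" means 1024 bytes, hence the doc fix
memoryStringToMb("1024m") // 1024 MiB
```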


---




[GitHub] spark pull request #22197: [SPARK-25207][SQL] Case-insensitive field resoluti...

2018-08-28 Thread cloud-fan
Github user cloud-fan commented on a diff in the pull request:

https://github.com/apache/spark/pull/22197#discussion_r213544078
  
--- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/parquet/ParquetFilters.scala ---
@@ -350,25 +356,38 @@ private[parquet] class ParquetFilters(
   }
 
   /**
-   * Returns a map from name of the column to the data type, if predicate push down applies.
+   * Returns a map, which contains parquet field name and data type, if predicate push down applies.
    */
-  private def getFieldMap(dataType: MessageType): Map[String, ParquetSchemaType] = dataType match {
-    case m: MessageType =>
-      // Here we don't flatten the fields in the nested schema but just look up through
-      // root fields. Currently, accessing to nested fields does not push down filters
-      // and it does not support to create filters for them.
-      m.getFields.asScala.filter(_.isPrimitive).map(_.asPrimitiveType()).map { f =>
-        f.getName -> ParquetSchemaType(
-          f.getOriginalType, f.getPrimitiveTypeName, f.getTypeLength, f.getDecimalMetadata)
-      }.toMap
-    case _ => Map.empty[String, ParquetSchemaType]
+  private def getFieldMap(dataType: MessageType): Map[String, ParquetField] = {
+    // Here we don't flatten the fields in the nested schema but just look up through
+    // root fields. Currently, accessing to nested fields does not push down filters
+    // and it does not support to create filters for them.
+    val primitiveFields =
+      dataType.getFields.asScala.filter(_.isPrimitive).map(_.asPrimitiveType()).map { f =>
+        f.getName -> ParquetField(f.getName,
+          ParquetSchemaType(f.getOriginalType,
+            f.getPrimitiveTypeName, f.getTypeLength, f.getDecimalMetadata))
+      }
+    if (caseSensitive) {
+      primitiveFields.toMap
+    } else {
+      // Don't consider ambiguity here, i.e. more than one field is matched in case insensitive
+      // mode, just skip pushdown for these fields, they will trigger Exception when reading,
+      // See: SPARK-25132.
--- End diff --

ping @yucai


---




[GitHub] spark issue #21669: [SPARK-23257][K8S][WIP] Kerberos Support for Spark on K8...

2018-08-28 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/21669
  
Kubernetes integration test starting
URL: 
https://amplab.cs.berkeley.edu/jenkins/job/testing-k8s-prb-make-spark-distribution-unified/2659/



---




[GitHub] spark issue #22138: [SPARK-25151][SS] Apply Apache Commons Pool to KafkaData...

2018-08-28 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/22138
  
Merged build finished. Test PASSed.


---




[GitHub] spark issue #22138: [SPARK-25151][SS] Apply Apache Commons Pool to KafkaData...

2018-08-28 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/22138
  
Test PASSed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/95394/
Test PASSed.


---




[GitHub] spark pull request #22232: [SPARK-25237][SQL]remove updateBytesReadWithFileS...

2018-08-28 Thread maropu
Github user maropu commented on a diff in the pull request:

https://github.com/apache/spark/pull/22232#discussion_r213543748
  
--- Diff: sql/core/src/test/scala/org/apache/spark/sql/execution/datasources/FileSourceSuite.scala ---
@@ -0,0 +1,55 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements.  See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License.  You may obtain a copy of the License at
+ *
+ *    http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.spark.sql.execution.datasources
+
+import scala.collection.mutable.ArrayBuffer
+
+import org.apache.spark.scheduler.{SparkListener, SparkListenerTaskEnd}
+import org.apache.spark.sql._
+import org.apache.spark.sql.catalyst.expressions.PredicateHelper
+import org.apache.spark.sql.test.SharedSQLContext
+
+
+class FileSourceSuite extends QueryTest with SharedSQLContext with PredicateHelper {
+
+  test("[SPARK-25237] remove updateBytesReadWithFileSize in FileScanRdd") {
+    withTempPath { p =>
+      val path = p.getAbsolutePath
+      spark.range(1000).selectExpr("id AS c0", "rand() AS c1").repartition(10).write.csv(path)
--- End diff --

I think a single partition is ok for this test.
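
A hedged sketch of the simplified setup this suggests (same expressions as the diff, minus the repartition; assumes the surrounding `FileSourceSuite` test harness):
```
test("[SPARK-25237] remove updateBytesReadWithFileSize in FileScanRdd") {
  withTempPath { p =>
    val path = p.getAbsolutePath
    // A single output partition is enough to exercise the bytes-read metric.
    spark.range(1000).selectExpr("id AS c0", "rand() AS c1").write.csv(path)
    // ... register the SparkListener and assert on bytesRead as in the original test ...
  }
}
```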


---




[GitHub] spark issue #22138: [SPARK-25151][SS] Apply Apache Commons Pool to KafkaData...

2018-08-28 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/22138
  
**[Test build #95394 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/95394/testReport)** for PR 22138 at commit [`ba576e8`](https://github.com/apache/spark/commit/ba576e88578b50897dd73b385f3f2308976c088a).
 * This patch passes all tests.
 * This patch merges cleanly.
 * This patch adds no public classes.


---




[GitHub] spark issue #22261: [SPARK-25248.1][PYSPARK] update barrier Python API

2018-08-28 Thread gatorsmile
Github user gatorsmile commented on the issue:

https://github.com/apache/spark/pull/22261
  
What is `.1` in the title `[SPARK-25248.1]`?


---




[GitHub] spark issue #19691: [SPARK-14922][SPARK-17732][SQL]ALTER TABLE DROP PARTITIO...

2018-08-28 Thread maropu
Github user maropu commented on the issue:

https://github.com/apache/spark/pull/19691
  
@HyukjinKwon can you trigger again?


---




[GitHub] spark pull request #21912: [SPARK-24962][SQL] Refactor CodeGenerator.createU...

2018-08-28 Thread cloud-fan
Github user cloud-fan commented on a diff in the pull request:

https://github.com/apache/spark/pull/21912#discussion_r213542353
  
--- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/collectionOperations.scala ---
@@ -385,107 +385,120 @@ case class MapEntries(child: Expression) extends UnaryExpression with ExpectsInp
 
   override protected def doGenCode(ctx: CodegenContext, ev: ExprCode): ExprCode = {
     nullSafeCodeGen(ctx, ev, c => {
+      val arrayData = ctx.freshName("arrayData")
       val numElements = ctx.freshName("numElements")
       val keys = ctx.freshName("keys")
       val values = ctx.freshName("values")
       val isKeyPrimitive = CodeGenerator.isPrimitiveType(childDataType.keyType)
       val isValuePrimitive = CodeGenerator.isPrimitiveType(childDataType.valueType)
+
+      val wordSize = UnsafeRow.WORD_SIZE
+      val structSize = UnsafeRow.calculateBitSetWidthInBytes(2) + wordSize * 2
+      val elementSize = if (isKeyPrimitive && isValuePrimitive) {
+        Some(structSize + wordSize)
+      } else {
+        None
+      }
+
+      val allocation = CodeGenerator.createArrayData(arrayData, childDataType.keyType, numElements,
--- End diff --

this is hacky. Actually we want to create an array of structs, but here we lied and said we want to create an array of the key type.

I think we should call `ArrayData.allocateArrayData` here directly.
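
A hedged sketch of that direct call, reusing `structSize`, `wordSize`, `arrayData`, and `numElements` from the diff above and the `allocateArrayData` signature quoted later in this thread (not the final patch, just the shape of the idea):
```
// Allocate the array of (key, value) structs directly, without pretending the
// element type is the key type.
val elemSize = structSize + wordSize // per-entry size when key and value are primitive
val allocation =
  s"""
     |ArrayData $arrayData = ArrayData.allocateArrayData(
     |  $elemSize, $numElements, false, " $prettyName failed.");
   """.stripMargin
```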


---




[GitHub] spark pull request #22246: [SPARK-25235] [SHELL] Merge the REPL code in Scal...

2018-08-28 Thread asfgit
Github user asfgit closed the pull request at:

https://github.com/apache/spark/pull/22246


---




[GitHub] spark issue #21669: [SPARK-23257][K8S][WIP] Kerberos Support for Spark on K8...

2018-08-28 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/21669
  
**[Test build #95402 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/95402/testReport)** for PR 21669 at commit [`719b059`](https://github.com/apache/spark/commit/719b059910adff2368a0e7e0b55ad26329d5030b).


---




[GitHub] spark issue #21912: [SPARK-24962][SQL] Refactor CodeGenerator.createUnsafeAr...

2018-08-28 Thread cloud-fan
Github user cloud-fan commented on the issue:

https://github.com/apache/spark/pull/21912
  
LGTM except a few comments


---




[GitHub] spark issue #22261: [SPARK-25248.1][PYSPARK] update barrier Python API

2018-08-28 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/22261
  
Merged build finished. Test PASSed.


---




[GitHub] spark issue #22261: [SPARK-25248.1][PYSPARK] update barrier Python API

2018-08-28 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/22261
  
Test PASSed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/95399/
Test PASSed.


---




[GitHub] spark issue #22246: [SPARK-25235] [SHELL] Merge the REPL code in Scala 2.11 ...

2018-08-28 Thread dbtsai
Github user dbtsai commented on the issue:

https://github.com/apache/spark/pull/22246
  
Thanks all for reviewing. Merged into master.


---




[GitHub] spark issue #22261: [SPARK-25248.1][PYSPARK] update barrier Python API

2018-08-28 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/22261
  
**[Test build #95399 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/95399/testReport)** for PR 22261 at commit [`afb50ee`](https://github.com/apache/spark/commit/afb50ee1150279f9cb27f92e220a332e029dbc43).
 * This patch passes all tests.
 * This patch merges cleanly.
 * This patch adds no public classes.


---




[GitHub] spark pull request #21912: [SPARK-24962][SQL] Refactor CodeGenerator.createU...

2018-08-28 Thread cloud-fan
Github user cloud-fan commented on a diff in the pull request:

https://github.com/apache/spark/pull/21912#discussion_r213541643
  
--- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/collectionOperations.scala ---
@@ -385,107 +385,120 @@ case class MapEntries(child: Expression) extends UnaryExpression with ExpectsInp
 
   override protected def doGenCode(ctx: CodegenContext, ev: ExprCode): ExprCode = {
     nullSafeCodeGen(ctx, ev, c => {
+      val arrayData = ctx.freshName("arrayData")
       val numElements = ctx.freshName("numElements")
       val keys = ctx.freshName("keys")
       val values = ctx.freshName("values")
       val isKeyPrimitive = CodeGenerator.isPrimitiveType(childDataType.keyType)
       val isValuePrimitive = CodeGenerator.isPrimitiveType(childDataType.valueType)
+
+      val wordSize = UnsafeRow.WORD_SIZE
+      val structSize = UnsafeRow.calculateBitSetWidthInBytes(2) + wordSize * 2
+      val elementSize = if (isKeyPrimitive && isValuePrimitive) {
+        Some(structSize + wordSize)
+      } else {
+        None
+      }
+
+      val allocation = CodeGenerator.createArrayData(arrayData, childDataType.keyType, numElements,
+        s" $prettyName failed.", elementSize = elementSize)
+
       val code = if (isKeyPrimitive && isValuePrimitive) {
-        genCodeForPrimitiveElements(ctx, keys, values, ev.value, numElements)
+        val genCodeForPrimitive = genCodeForPrimitiveElements(
+          ctx, arrayData, keys, values, ev.value, numElements, structSize)
+        s"""
+           |if ($arrayData instanceof UnsafeArrayData) {
+           |  $genCodeForPrimitive
+           |} else {
+           |  ${genCodeForAnyElements(ctx, arrayData, keys, values, ev.value, numElements)}
+           |}
+         """.stripMargin
       } else {
-        genCodeForAnyElements(ctx, keys, values, ev.value, numElements)
+        s"${genCodeForAnyElements(ctx, arrayData, keys, values, ev.value, numElements)}"
       }
+
       s"""
          |final int $numElements = $c.numElements();
         |final ArrayData $keys = $c.keyArray();
         |final ArrayData $values = $c.valueArray();
+        |$allocation
         |$code
       """.stripMargin
     })
   }
 
-  private def getKey(varName: String) = CodeGenerator.getValue(varName, childDataType.keyType, "z")
+  private def getKey(varName: String, index: String) =
+    CodeGenerator.getValue(varName, childDataType.keyType, index)
 
-  private def getValue(varName: String) = {
-    CodeGenerator.getValue(varName, childDataType.valueType, "z")
-  }
+  private def getValue(varName: String, index: String) =
+    CodeGenerator.getValue(varName, childDataType.valueType, index)
 
   private def genCodeForPrimitiveElements(
       ctx: CodegenContext,
+      arrayData: String,
       keys: String,
       values: String,
-      arrayData: String,
-      numElements: String): String = {
-    val unsafeRow = ctx.freshName("unsafeRow")
+      resultArrayData: String,
+      numElements: String,
+      structSize: Int): String = {
     val unsafeArrayData = ctx.freshName("unsafeArrayData")
+    val baseObject = ctx.freshName("baseObject")
+    val unsafeRow = ctx.freshName("unsafeRow")
     val structsOffset = ctx.freshName("structsOffset")
+    val offset = ctx.freshName("offset")
+    val z = ctx.freshName("z")
    val calculateHeader = "UnsafeArrayData.calculateHeaderPortionInBytes"
 
    val baseOffset = Platform.BYTE_ARRAY_OFFSET
    val wordSize = UnsafeRow.WORD_SIZE
-    val structSize = UnsafeRow.calculateBitSetWidthInBytes(2) + wordSize * 2
-    val structSizeAsLong = structSize + "L"
-    val keyTypeName = CodeGenerator.primitiveTypeName(childDataType.keyType)
-    val valueTypeName = CodeGenerator.primitiveTypeName(childDataType.valueType)
-
-    val valueAssignment = s"$unsafeRow.set$valueTypeName(1, ${getValue(values)});"
-    val valueAssignmentChecked = if (childDataType.valueContainsNull) {
-      s"""
-         |if ($values.isNullAt(z)) {
-         |  $unsafeRow.setNullAt(1);
-         |} else {
-         |  $valueAssignment
-         |}
-       """.stripMargin
-    } else {
-      valueAssignment
-    }
+    val structSizeAsLong = s"${structSize}L"
 
-    val assignmentLoop = (byteArray: String) =>
-      s"""
-         |final int $structsOffset = $calculateHeader($numElements) + $numElements * $wordSize;
-         |UnsafeRow $unsafeRow = new UnsafeRow(2);
-         |for (int z = 0; z < $numElements; z++) {
-         |  long offset = $structsOffset + z * $structSizeAsLong;
 

[GitHub] spark issue #22227: [SPARK-25202] [SQL] Implements split with limit sql func...

2018-08-28 Thread maropu
Github user maropu commented on the issue:

https://github.com/apache/spark/pull/22227
  
Also, you need to update `split` in Python and R.


---




[GitHub] spark pull request #21912: [SPARK-24962][SQL] Refactor CodeGenerator.createU...

2018-08-28 Thread cloud-fan
Github user cloud-fan commented on a diff in the pull request:

https://github.com/apache/spark/pull/21912#discussion_r213540868
  
--- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/codegen/CodeGenerator.scala ---
@@ -1490,6 +1423,63 @@ object CodeGenerator extends Logging {
     }
   }
 
+  /**
+   * Generates code creating a [[UnsafeArrayData]] or [[GenericArrayData]] based on
+   * given parameters.
+   *
+   * @param arrayName name of the array to create
+   * @param elementType data type of the elements in source array
+   * @param numElements code representing the number of elements the array should contain
+   * @param additionalErrorMessage string to include in the error message
+   * @param elementSize optional value which shows the size of an element of the allocated
+   *                    [[UnsafeArrayData]] or [[GenericArrayData]]
+   *
+   * @return code representing the allocation of [[ArrayData]]
+   */
+  def createArrayData(
+      arrayName: String,
+      elementType: DataType,
+      numElements: String,
+      additionalErrorMessage: String,
+      elementSize: Option[Int] = None): String = {
+    val (isPrimitiveType, elemSize) = if (elementSize.isDefined) {
+      (false, elementSize.get)
+    } else {
+      (CodeGenerator.isPrimitiveType(elementType), elementType.defaultSize)
+    }
+
+    s"""
+       |ArrayData $arrayName = ArrayData.allocateArrayData(
+       |  $elemSize, $numElements, $isPrimitiveType, "$additionalErrorMessage");
+     """.stripMargin
+  }
+
+  /**
+   * Generates assignment code for an [[ArrayData]]
--- End diff --

shall we mention that the returned code should be put inside a loop?
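
For example, the doc could show the intended usage (a hedged sketch; `assignment` stands for the string returned by this generator, and the other names come from the surrounding codegen scope):
```
// The generated assignment covers a single element; the caller is expected
// to wrap it in a loop like this.
val i = ctx.freshName("i")
val loop =
  s"""
     |for (int $i = 0; $i < $numElements; $i++) {
     |  $assignment
     |}
   """.stripMargin
```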


---




[GitHub] spark pull request #22227: [SPARK-25202] [SQL] Implements split with limit s...

2018-08-28 Thread maropu
Github user maropu commented on a diff in the pull request:

https://github.com/apache/spark/pull/22227#discussion_r213540571
  
--- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/regexpExpressions.scala ---
@@ -229,33 +229,59 @@ case class RLike(left: Expression, right: Expression) extends StringRegexExpress
 
 
 /**
- * Splits str around pat (pattern is a regular expression).
+ * Splits str around pattern (pattern is a regular expression).
  */
 @ExpressionDescription(
-  usage = "_FUNC_(str, regex) - Splits `str` around occurrences that match `regex`.",
+  usage = "_FUNC_(str, regex, limit) - Splits `str` around occurrences that match `regex`" +
+    " and returns an array of at most `limit`",
+  arguments = """
+    Arguments:
+      * str - a string expression to split.
+      * pattern - a string representing a regular expression. The pattern string should be a
+        Java regular expression.
+      * limit - an integer expression which controls the number of times the pattern is applied.
+
+        limit > 0:
+          The resulting array's length will not be more than `limit`, and the resulting array's
+          last entry will contain all input beyond the last matched pattern.
+
+        limit < 0:
+          `pattern` will be applied as many times as possible, and the resulting
+          array can be of any size.
+
+        limit = 0:
+          `pattern` will be applied as many times as possible, the resulting array can
+          be of any size, and trailing empty strings will be discarded.
+  """,
--- End diff --

How about this formatting?
```
 function_desc | Extended Usage:
    Arguments:
      * str - a string expression to split.
      * pattern - a string representing a regular expression. The pattern string should be a
        Java regular expression.
      * limit - an integer expression which controls the number of times the pattern is applied.

        limit > 0: The resulting array's length will not be more than `limit`, and the resulting
                   array's last entry will contain all input beyond the last matched pattern.
        limit < 0: `pattern` will be applied as many times as possible, and the resulting
                   array can be of any size.
        limit = 0: `pattern` will be applied as many times as possible, the resulting array can
                   be of any size, and trailing empty strings will be discarded.

    Examples:
      > SELECT split('oneAtwoBthreeC', '[ABC]');
       ["one","two","three",""]
      > SELECT split('oneAtwoBthreeC', '[ABC]', 0);
       ["one","two","three"]
      > SELECT split('oneAtwoBthreeC', '[ABC]', 2);
       ["one","twoBthreeC"]
```


---




[GitHub] spark pull request #21912: [SPARK-24962][SQL] Refactor CodeGenerator.createU...

2018-08-28 Thread cloud-fan
Github user cloud-fan commented on a diff in the pull request:

https://github.com/apache/spark/pull/21912#discussion_r213540562
  
--- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/codegen/CodeGenerator.scala ---
@@ -1490,6 +1423,63 @@ object CodeGenerator extends Logging {
     }
   }
 
+  /**
+   * Generates code creating a [[UnsafeArrayData]] or [[GenericArrayData]] based on
+   * given parameters.
+   *
+   * @param arrayName name of the array to create
+   * @param elementType data type of the elements in source array
+   * @param numElements code representing the number of elements the array should contain
+   * @param additionalErrorMessage string to include in the error message
+   * @param elementSize optional value which shows the size of an element of the allocated
+   *                    [[UnsafeArrayData]] or [[GenericArrayData]]
+   *
+   * @return code representing the allocation of [[ArrayData]]
+   */
+  def createArrayData(
+      arrayName: String,
+      elementType: DataType,
+      numElements: String,
+      additionalErrorMessage: String,
+      elementSize: Option[Int] = None): String = {
+    val (isPrimitiveType, elemSize) = if (elementSize.isDefined) {
+      (false, elementSize.get)
+    } else {
+      (CodeGenerator.isPrimitiveType(elementType), elementType.defaultSize)
+    }
+
+    s"""
+       |ArrayData $arrayName = ArrayData.allocateArrayData(
+       |  $elemSize, $numElements, $isPrimitiveType, "$additionalErrorMessage");
+     """.stripMargin
+  }
+
+  /**
+   * Generates assignment code for an [[ArrayData]]
+   *
+   * @param arrayName name of the array to create
+   * @param elementType data type of the elements in destination and source arrays
+   * @param srcArray code representing the number of elements the array should contain
--- End diff --

rename it to `length`.


---




[GitHub] spark pull request #21912: [SPARK-24962][SQL] Refactor CodeGenerator.createU...

2018-08-28 Thread cloud-fan
Github user cloud-fan commented on a diff in the pull request:

https://github.com/apache/spark/pull/21912#discussion_r213540440
  
--- Diff: sql/catalyst/src/test/scala/org/apache/spark/sql/catalyst/expressions/CollectionExpressionsSuite.scala ---
@@ -1314,9 +1314,9 @@ class CollectionExpressionsSuite extends SparkFunSuite with ExpressionEvalHelper
   }
 
   test("Array Distinct") {
-    val a0 = Literal.create(Seq(2, 1, 2, 3, 4, 4, 5), ArrayType(IntegerType))
+    val a0 = Literal.create(Seq(2, 1, 2, 3, 4, 4, 5), ArrayType(IntegerType, false))
--- End diff --

why this change?


---




[GitHub] spark issue #22146: [SPARK-24434][K8S] pod template files

2018-08-28 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/22146
  
Build finished. Test PASSed.


---




[GitHub] spark issue #22146: [SPARK-24434][K8S] pod template files

2018-08-28 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/22146
  
Test PASSed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/95392/
Test PASSed.


---




[GitHub] spark pull request #21912: [SPARK-24962][SQL] Refactor CodeGenerator.createU...

2018-08-28 Thread cloud-fan
Github user cloud-fan commented on a diff in the pull request:

https://github.com/apache/spark/pull/21912#discussion_r213540247
  
--- Diff: sql/catalyst/src/main/java/org/apache/spark/sql/catalyst/expressions/UnsafeArrayData.java ---
@@ -473,13 +473,13 @@ public static UnsafeArrayData fromPrimitiveArray(
     return result;
   }
 
-  public static UnsafeArrayData forPrimitiveArray(int offset, int length, int elementSize) {
-    return fromPrimitiveArray(null, offset, length, elementSize);
+  public static UnsafeArrayData forPrimitiveArray(int length, int elementSize) {
+    return fromPrimitiveArray(null, 0, length, elementSize);
--- End diff --

so this is used to create a new unsafe array with no data. This saves duplicated code, but we will do an unnecessary memory copy.

Shall we just add a new method in `UnsafeArrayData` to create a fresh array?


---




[GitHub] spark issue #22146: [SPARK-24434][K8S] pod template files

2018-08-28 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/22146
  
**[Test build #95392 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/95392/testReport)** for PR 22146 at commit [`f3b6082`](https://github.com/apache/spark/commit/f3b60822e688a6a16404f5e983e953e3da99ffba).
 * This patch passes all tests.
 * This patch **does not merge cleanly**.
 * This patch adds no public classes.


---




[GitHub] spark issue #22198: [SPARK-25121][SQL] Supports multi-part table names for b...

2018-08-28 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/22198
  
Merged build finished. Test FAILed.


---




[GitHub] spark issue #22198: [SPARK-25121][SQL] Supports multi-part table names for b...

2018-08-28 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/22198
  
Test FAILed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/95395/
Test FAILed.


---




[GitHub] spark issue #22198: [SPARK-25121][SQL] Supports multi-part table names for b...

2018-08-28 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/22198
  
**[Test build #95395 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/95395/testReport)** for PR 22198 at commit [`cc0cd4f`](https://github.com/apache/spark/commit/cc0cd4f4bf90624b796f33dbc6c997032b98b0a0).
 * This patch **fails Spark unit tests**.
 * This patch merges cleanly.
 * This patch adds no public classes.


---




[GitHub] spark pull request #22227: [SPARK-25202] [SQL] Implements split with limit s...

2018-08-28 Thread maropu
Github user maropu commented on a diff in the pull request:

https://github.com/apache/spark/pull/22227#discussion_r213539793
  
--- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/regexpExpressions.scala ---
@@ -229,33 +229,59 @@ case class RLike(left: Expression, right: Expression) extends StringRegexExpress
 
 
 /**
- * Splits str around pat (pattern is a regular expression).
+ * Splits str around pattern (pattern is a regular expression).
  */
 @ExpressionDescription(
-  usage = "_FUNC_(str, regex) - Splits `str` around occurrences that match `regex`.",
+  usage = "_FUNC_(str, regex, limit) - Splits `str` around occurrences that match `regex`" +
+    " and returns an array of at most `limit`",
+  arguments = """
+    Arguments:
+      * str - a string expression to split.
+      * pattern - a string representing a regular expression. The pattern string should be a
+        Java regular expression.
+      * limit - an integer expression which controls the number of times the pattern is applied.
+
+        limit > 0:
+          The resulting array's length will not be more than `limit`, and the resulting array's
+          last entry will contain all input beyond the last matched pattern.
+
+        limit < 0:
+          `pattern` will be applied as many times as possible, and the resulting
+          array can be of any size.
+
+        limit = 0:
+          `pattern` will be applied as many times as possible, the resulting array can
+          be of any size, and trailing empty strings will be discarded.
+  """,
   examples = """
     Examples:
       > SELECT _FUNC_('oneAtwoBthreeC', '[ABC]');
       ["one","two","three",""]
+    | > SELECT _FUNC_('oneAtwoBthreeC', '[ABC]', 0);
+      ["one","two","three"]
+    | > SELECT _FUNC_('oneAtwoBthreeC', '[ABC]', 2);
--- End diff --

ditto


---




[GitHub] spark pull request #21912: [SPARK-24962][SQL] Refactor CodeGenerator.createU...

2018-08-28 Thread cloud-fan
Github user cloud-fan commented on a diff in the pull request:

https://github.com/apache/spark/pull/21912#discussion_r213539798
  
--- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/codegen/CodeGenerator.scala ---
@@ -1490,6 +1423,63 @@ object CodeGenerator extends Logging {
     }
   }
 
+  /**
+   * Generates code creating a [[UnsafeArrayData]] or [[GenericArrayData]] based on
+   * given parameters.
+   *
+   * @param arrayName name of the array to create
+   * @param elementType data type of the elements in source array
+   * @param numElements code representing the number of elements the array should contain
+   * @param additionalErrorMessage string to include in the error message
+   * @param elementSize optional value which shows the size of an element of the allocated
--- End diff --

it's better to give an example of when we will set it.
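
For instance, the doc could say something like (a hedged sketch based on the `MapEntries` caller quoted earlier in this thread, not the final wording):
```
   * @param elementSize optional value which shows the size of an element of the allocated
   *                    [[UnsafeArrayData]] or [[GenericArrayData]]. For example, `MapEntries`
   *                    passes `Some(structSize + wordSize)` when both key and value are
   *                    primitive, since each element it writes is a (key, value) struct.
```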


---




[GitHub] spark pull request #22227: [SPARK-25202] [SQL] Implements split with limit s...

2018-08-28 Thread maropu
Github user maropu commented on a diff in the pull request:

https://github.com/apache/spark/pull/22227#discussion_r213539767
  
--- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/regexpExpressions.scala ---
@@ -229,33 +229,59 @@ case class RLike(left: Expression, right: Expression) extends StringRegexExpress
 
 
 /**
- * Splits str around pat (pattern is a regular expression).
+ * Splits str around pattern (pattern is a regular expression).
  */
 @ExpressionDescription(
-  usage = "_FUNC_(str, regex) - Splits `str` around occurrences that match `regex`.",
+  usage = "_FUNC_(str, regex, limit) - Splits `str` around occurrences that match `regex`" +
+    " and returns an array of at most `limit`",
+  arguments = """
+    Arguments:
+      * str - a string expression to split.
+      * pattern - a string representing a regular expression. The pattern string should be a
+        Java regular expression.
+      * limit - an integer expression which controls the number of times the pattern is applied.
+
+        limit > 0:
+          The resulting array's length will not be more than `limit`, and the resulting array's
+          last entry will contain all input beyond the last matched pattern.
+
+        limit < 0:
+          `pattern` will be applied as many times as possible, and the resulting
+          array can be of any size.
+
+        limit = 0:
+          `pattern` will be applied as many times as possible, the resulting array can
+          be of any size, and trailing empty strings will be discarded.
+  """,
   examples = """
     Examples:
      > SELECT _FUNC_('oneAtwoBthreeC', '[ABC]');
      ["one","two","three",""]
+    | > SELECT _FUNC_('oneAtwoBthreeC', '[ABC]', 0);
--- End diff --

drop `|`


---




[GitHub] spark pull request #21912: [SPARK-24962][SQL] Refactor CodeGenerator.createU...

2018-08-28 Thread cloud-fan
Github user cloud-fan commented on a diff in the pull request:

https://github.com/apache/spark/pull/21912#discussion_r213539696
  
--- Diff: sql/catalyst/src/main/java/org/apache/spark/sql/catalyst/expressions/UnsafeArrayData.java ---
@@ -473,13 +473,13 @@ public static UnsafeArrayData fromPrimitiveArray(
     return result;
   }
 
-  public static UnsafeArrayData forPrimitiveArray(int offset, int length, int elementSize) {
-    return fromPrimitiveArray(null, offset, length, elementSize);
+  public static UnsafeArrayData forPrimitiveArray(int length, int elementSize) {
+    return fromPrimitiveArray(null, 0, length, elementSize);
--- End diff --

is it safe? I vaguely remember that the address of an off-heap memory block is usually not 0.


---




[GitHub] spark issue #22260: [MINOR] Fix scala 2.12 build using collect

2018-08-28 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/22260
  
**[Test build #95401 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/95401/testReport)** for PR 22260 at commit [`626f265`](https://github.com/apache/spark/commit/626f26598ccfc692cca59e29a2d7861133654ef0).


---




[GitHub] spark issue #22112: [SPARK-23243][Core] Fix RDD.repartition() data correctne...

2018-08-28 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/22112
  
**[Test build #95400 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/95400/testReport)** for PR 22112 at commit [`a4e6639`](https://github.com/apache/spark/commit/a4e6639ea098eebe4a06dc9ca27c4386f59bf413).


---




[GitHub] spark issue #22260: [MINOR] Fix scala 2.12 build using collect

2018-08-28 Thread HyukjinKwon
Github user HyukjinKwon commented on the issue:

https://github.com/apache/spark/pull/22260
  
ok to test


---




[GitHub] spark issue #22112: [SPARK-23243][Core] Fix RDD.repartition() data correctne...

2018-08-28 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/22112
  
Test PASSed.
Refer to this link for build results (access rights to CI server needed): 

https://amplab.cs.berkeley.edu/jenkins//job/testing-k8s-prb-make-spark-distribution-unified/2658/
Test PASSed.


---




[GitHub] spark issue #22247: [SPARK-25253][PYSPARK] Refactor local connection & auth ...

2018-08-28 Thread mengxr
Github user mengxr commented on the issue:

https://github.com/apache/spark/pull/22247
  
@squito Thanks for the refactor!


---




[GitHub] spark issue #22112: [SPARK-23243][Core] Fix RDD.repartition() data correctne...

2018-08-28 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/22112
  
Merged build finished. Test PASSed.


---




[GitHub] spark issue #22112: [SPARK-23243][Core] Fix RDD.repartition() data correctne...

2018-08-28 Thread cloud-fan
Github user cloud-fan commented on the issue:

https://github.com/apache/spark/pull/22112
  
retest this please


---




[GitHub] spark pull request #22227: [SPARK-25202] [SQL] Implements split with limit s...

2018-08-28 Thread maropu
Github user maropu commented on a diff in the pull request:

https://github.com/apache/spark/pull/22227#discussion_r213538271
  
--- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/regexpExpressions.scala ---
@@ -229,33 +229,59 @@ case class RLike(left: Expression, right: Expression) extends StringRegexExpress
 
 
 /**
- * Splits str around pat (pattern is a regular expression).
+ * Splits str around pattern (pattern is a regular expression).
--- End diff --

pattern? regex? We should use a consistent word.


---




[GitHub] spark issue #22112: [SPARK-23243][Core] Fix RDD.repartition() data correctne...

2018-08-28 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/22112
  
Merged build finished. Test FAILed.


---




[GitHub] spark issue #22112: [SPARK-23243][Core] Fix RDD.repartition() data correctne...

2018-08-28 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/22112
  
**[Test build #95386 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/95386/testReport)** for PR 22112 at commit [`a4e6639`](https://github.com/apache/spark/commit/a4e6639ea098eebe4a06dc9ca27c4386f59bf413).
 * This patch **fails from timeout after a configured wait of \`400m\`**.
 * This patch merges cleanly.
 * This patch adds no public classes.


---




[GitHub] spark issue #22112: [SPARK-23243][Core] Fix RDD.repartition() data correctne...

2018-08-28 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/22112
  
Test FAILed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/95386/
Test FAILed.


---




[GitHub] spark issue #22259: [WIP][SPARK-25044][SQL] (take 2) Address translation of ...

2018-08-28 Thread cloud-fan
Github user cloud-fan commented on the issue:

https://github.com/apache/spark/pull/22259
  
LGTM. How did you work around the type tag not found issue?


---




[GitHub] spark pull request #22048: [SPARK-25108][SQL] Fix the show method to display...

2018-08-28 Thread xuejianbest
Github user xuejianbest commented on a diff in the pull request:

https://github.com/apache/spark/pull/22048#discussion_r213537923
  
--- Diff: sql/core/src/main/scala/org/apache/spark/sql/Dataset.scala ---
@@ -294,23 +294,25 @@ class Dataset[T] private[sql](
     // We set a minimum column width at '3'
     val minimumColWidth = 3
 
+    // Regular expression matching full width characters
+    val fullWidthRegex = """[\u1100-\u115F\u2E80-\uA4CF\uAC00-\uD7A3\uF900-\uFAFF\uFE10-\uFE19\uFE30-\uFE6F\uFF00-\uFF60\uFFE0-\uFFE6]""".r
     if (!vertical) {
       // Initialise the width of each column to a minimum value
       val colWidths = Array.fill(numCols)(minimumColWidth)
 
       // Compute the width of each column
       for (row <- rows) {
         for ((cell, i) <- row.zipWithIndex) {
-          colWidths(i) = math.max(colWidths(i), cell.length)
+          colWidths(i) = math.max(colWidths(i), cell.length + fullWidthRegex.findAllIn(cell).size)
--- End diff --

I committed a new version. Could you take a look and see if this is appropriate, @srowen?
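
The idea, as a self-contained illustration (a hedged sketch; the character ranges are copied from the diff, the helper name is ours):
```
// Full-width characters occupy two display columns, so the effective width of a
// cell is its length plus the number of full-width characters it contains.
val fullWidthRegex =
  ("[\\u1100-\\u115F\\u2E80-\\uA4CF\\uAC00-\\uD7A3\\uF900-\\uFAFF" +
    "\\uFE10-\\uFE19\\uFE30-\\uFE6F\\uFF00-\\uFF60\\uFFE0-\\uFFE6]").r

def displayWidth(cell: String): Int =
  cell.length + fullWidthRegex.findAllIn(cell).size

displayWidth("abc")    // 3
displayWidth("あいう")  // 6 -- each full-width character counts twice
```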


---




[GitHub] spark pull request #22227: [SPARK-25202] [SQL] Implements split with limit s...

2018-08-28 Thread maropu
Github user maropu commented on a diff in the pull request:

https://github.com/apache/spark/pull/22227#discussion_r213538010
  
--- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/regexpExpressions.scala ---
@@ -229,33 +229,59 @@ case class RLike(left: Expression, right: Expression) extends StringRegexExpress
 
 
 /**
- * Splits str around pat (pattern is a regular expression).
+ * Splits str around pattern (pattern is a regular expression).
  */
 @ExpressionDescription(
-  usage = "_FUNC_(str, regex) - Splits `str` around occurrences that match `regex`.",
+  usage = "_FUNC_(str, regex, limit) - Splits `str` around occurrences that match `regex`" +
+    " and returns an array of at most `limit`",
+  arguments = """
+    Arguments:
+      * str - a string expression to split.
+      * pattern - a string representing a regular expression. The pattern string should be a
+        Java regular expression.
+      * limit - an integer expression which controls the number of times the pattern is applied.
+
+        limit > 0:
+          The resulting array's length will not be more than `limit`, and the resulting array's
+          last entry will contain all input beyond the last matched pattern.
+
+        limit < 0:
+          `pattern` will be applied as many times as possible, and the resulting
+          array can be of any size.
+
+        limit = 0:
+          `pattern` will be applied as many times as possible, the resulting array can
+          be of any size, and trailing empty strings will be discarded.
+  """,
   examples = """
     Examples:
      > SELECT _FUNC_('oneAtwoBthreeC', '[ABC]');
      ["one","two","three",""]
+    | > SELECT _FUNC_('oneAtwoBthreeC', '[ABC]', 0);
+      ["one","two","three"]
+    | > SELECT _FUNC_('oneAtwoBthreeC', '[ABC]', 2);
+      ["one","twoBthreeC"]
--- End diff --

Add the negative case?


---




[GitHub] spark pull request #22048: [SPARK-25108][SQL] Fix the show method to display...

2018-08-28 Thread maropu
Github user maropu commented on a diff in the pull request:

https://github.com/apache/spark/pull/22048#discussion_r213537514
  
--- Diff: sql/core/src/main/scala/org/apache/spark/sql/Dataset.scala ---
@@ -294,23 +294,29 @@ class Dataset[T] private[sql](
 // We set a minimum column width at '3'
 val minimumColWidth = 3
 
+// Regular expression matching full width characters
+val fullWidthRegex = 
"""[\u1100-\u115F\u2E80-\uA4CF\uAC00-\uD7A3\uF900-\uFAFF\uFE10-\uFE19\uFE30-\uFE6F\uFF00-\uFF60\uFFE0-\uFFE6]""".r
--- End diff --

This line goes over the 100-character limit.


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #22261: [SPARK-25248.1][PYSPARK] update barrier Python API

2018-08-28 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/22261
  
Merged build finished. Test PASSed.


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #22246: [SPARK-25235] [SHELL] Merge the REPL code in Scala 2.11 ...

2018-08-28 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/22246
  
Test PASSed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/95393/
Test PASSed.


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #22246: [SPARK-25235] [SHELL] Merge the REPL code in Scala 2.11 ...

2018-08-28 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/22246
  
Merged build finished. Test PASSed.


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #22261: [SPARK-25248.1][PYSPARK] update barrier Python API

2018-08-28 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/22261
  
Test PASSed.
Refer to this link for build results (access rights to CI server needed): 

https://amplab.cs.berkeley.edu/jenkins//job/testing-k8s-prb-make-spark-distribution-unified/2657/
Test PASSed.


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request #22240: [SPARK-25248] [CORE] Audit barrier Scala APIs for...

2018-08-28 Thread mengxr
Github user mengxr commented on a diff in the pull request:

https://github.com/apache/spark/pull/22240#discussion_r213537490
  
--- Diff: core/src/main/scala/org/apache/spark/BarrierTaskContext.scala ---
@@ -68,7 +74,7 @@ class BarrierTaskContext(
*
* CAUTION! In a barrier stage, each task must have the same number of 
barrier() calls, in all
* possible code branches. Otherwise, you may get the job hanging or a 
SparkException after
-   * timeout. Some examples of misuses listed below:
+   * timeout. Some examples of '''misuses''' listed below:
--- End diff --

just saw it, will include it if Jenkins fails:)
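
For context, a minimal sketch of the misuse that CAUTION describes, assuming an existing `rdd` (illustrative only):

```scala
import org.apache.spark.BarrierTaskContext

// Hypothetical misuse: barrier() sits in a branch only some tasks take,
// so the stage hangs (or fails with a SparkException after the timeout).
rdd.barrier().mapPartitions { iter =>
  val context = BarrierTaskContext.get()
  if (context.partitionId() == 0) {
    context.barrier()  // other partitions never reach this call
  }
  iter
}
```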


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request #22048: [SPARK-25108][SQL] Fix the show method to display...

2018-08-28 Thread maropu
Github user maropu commented on a diff in the pull request:

https://github.com/apache/spark/pull/22048#discussion_r213537463
  
--- Diff: sql/core/src/main/scala/org/apache/spark/sql/Dataset.scala ---
@@ -294,23 +294,29 @@ class Dataset[T] private[sql](
 // We set a minimum column width at '3'
 val minimumColWidth = 3
 
+// Regular expression matching full width characters
+val fullWidthRegex = 
"""[\u1100-\u115F\u2E80-\uA4CF\uAC00-\uD7A3\uF900-\uFAFF\uFE10-\uFE19\uFE30-\uFE6F\uFF00-\uFF60\uFFE0-\uFFE6]""".r
+// The width of a string in half-width characters
+def stringHalfWidth = (str: String) => {
+  str.length + fullWidthRegex.findAllIn(str).size
+}
--- End diff --

better to add tests for the helper function, too.
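
For example, a hedged sketch of what such tests might assert (expected values follow from counting each full-width character as two half-width cells):

```scala
// Hypothetical test sketch for the helper.
assert(stringHalfWidth("") == 0)
assert(stringHalfWidth("abc") == 3)    // ASCII only: width equals length
assert(stringHalfWidth("你好") == 4)    // two full-width chars, two cells each
assert(stringHalfWidth("ab你好") == 6)  // mixed: 2 + 4
```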


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request #22048: [SPARK-25108][SQL] Fix the show method to display...

2018-08-28 Thread maropu
Github user maropu commented on a diff in the pull request:

https://github.com/apache/spark/pull/22048#discussion_r213537424
  
--- Diff: sql/core/src/main/scala/org/apache/spark/sql/Dataset.scala ---
@@ -294,23 +294,29 @@ class Dataset[T] private[sql](
 // We set a minimum column width at '3'
 val minimumColWidth = 3
 
+// Regular expression matching full width characters
+val fullWidthRegex = 
"""[\u1100-\u115F\u2E80-\uA4CF\uAC00-\uD7A3\uF900-\uFAFF\uFE10-\uFE19\uFE30-\uFE6F\uFF00-\uFF60\uFFE0-\uFFE6]""".r
+// The width of a string in half-width characters
+def stringHalfWidth = (str: String) => {
+  str.length + fullWidthRegex.findAllIn(str).size
+}
--- End diff --

better to move this method into `util.Utils` or something as a helper 
function?


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #22246: [SPARK-25235] [SHELL] Merge the REPL code in Scala 2.11 ...

2018-08-28 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/22246
  
**[Test build #95393 has 
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/95393/testReport)**
 for PR 22246 at commit 
[`bab5947`](https://github.com/apache/spark/commit/bab5947c3a0396a47b2ca399abea70471f4adbaf).
 * This patch passes all tests.
 * This patch merges cleanly.
 * This patch adds no public classes.


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #22261: [SPARK-25248.1][PYSPARK] update barrier Python API

2018-08-28 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/22261
  
**[Test build #95399 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/95399/testReport)**
 for PR 22261 at commit 
[`afb50ee`](https://github.com/apache/spark/commit/afb50ee1150279f9cb27f92e220a332e029dbc43).


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request #22261: [SPARK-25248.1][PYSPARK] update barrier Python AP...

2018-08-28 Thread mengxr
GitHub user mengxr opened a pull request:

https://github.com/apache/spark/pull/22261

[SPARK-25248.1][PYSPARK] update barrier Python API

## What changes were proposed in this pull request?

I made one pass over the Python APIs for barrier mode and updated them to 
match the Scala doc in #22240. Major changes:

* export the public classes
* expand the docs
* add doc for BarrierTaskInfo.address

cc: @jiangxb1987 

You can merge this pull request into a Git repository by running:

$ git pull https://github.com/mengxr/spark SPARK-25248.1

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/spark/pull/22261.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #22261


commit afb50ee1150279f9cb27f92e220a332e029dbc43
Author: Xiangrui Meng 
Date:   2018-08-29T03:44:54Z

update barrier Python API




---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #22209: [SPARK-24415][Core] Fixed the aggregated stage metrics b...

2018-08-28 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/22209
  
Merged build finished. Test FAILed.


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #22260: [MINOR] Fix scala 2.12 build using collect

2018-08-28 Thread maropu
Github user maropu commented on the issue:

https://github.com/apache/spark/pull/22260
  
Can you add `[SQL][MINOR]` to the title? Also, can you narrow down the 
title? It is a little obscure as is.


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #22209: [SPARK-24415][Core] Fixed the aggregated stage metrics b...

2018-08-28 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/22209
  
**[Test build #95382 has 
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/95382/testReport)**
 for PR 22209 at commit 
[`76f1801`](https://github.com/apache/spark/commit/76f180181261a2d7adcce27c40bfb9126c094bc5).
 * This patch **fails from timeout after a configured wait of \`400m\`**.
 * This patch merges cleanly.
 * This patch adds no public classes.


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #22209: [SPARK-24415][Core] Fixed the aggregated stage metrics b...

2018-08-28 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/22209
  
Test FAILed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/95382/
Test FAILed.


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request #22260: [MINOR] Fix scala 2.12 build using collect

2018-08-28 Thread maropu
Github user maropu commented on a diff in the pull request:

https://github.com/apache/spark/pull/22260#discussion_r213536395
  
--- Diff: 
sql/core/src/main/scala/org/apache/spark/sql/execution/ProjectionOverSchema.scala
 ---
@@ -38,7 +38,7 @@ private[execution] case class 
ProjectionOverSchema(schema: StructType) {
   case GetArrayItem(child, arrayItemOrdinal) =>
 getProjection(child).map { projection => GetArrayItem(projection, 
arrayItemOrdinal) }
   case a: GetArrayStructFields =>
-getProjection(a.child).map(p => (p, p.dataType)).map {
+getProjection(a.child).map(p => (p, p.dataType)).collect {
--- End diff --

How about this instead? IMO `.collect` silently skips illegal inputs rather than failing on them:
```
getProjection(a.child).map(p => (p, p.dataType)).map {
  case (projection, ArrayType(projSchema @ StructType(_), _)) =>
GetArrayStructFields(projection,
  projSchema(a.field.name),
  projSchema.fieldIndex(a.field.name),
  projSchema.size,
  a.containsNull)
  case _ =>
sys.error("")
}
```
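
The difference is easy to see in isolation (a minimal sketch with a hypothetical value):

```scala
val opt: Option[Any] = Some("unexpected")

// collect takes a PartialFunction and silently drops the unmatched input:
opt.collect { case i: Int => i + 1 }  // None

// map expects a total function, so the same pattern fails loudly at runtime:
// opt.map { case i: Int => i + 1 }   // throws scala.MatchError
```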


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #22260: [MINOR] Fix scala 2.12 build using collect

2018-08-28 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/22260
  
Can one of the admins verify this patch?


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request #22260: [MINOR] Fix scala 2.12 build using collect

2018-08-28 Thread sadhen
GitHub user sadhen opened a pull request:

https://github.com/apache/spark/pull/22260

[MINOR] Fix scala 2.12 build using collect

## What changes were proposed in this pull request?
Introduced by #21320 
```
[error] [warn] spark/sql/core/src/main/scala/org/apache/spark/sql/execution/ProjectionOverSchema.scala:41: match may not be exhaustive.
[error] It would fail on the following inputs: (_, ArrayType(_, _)), (_, _)
[error] [warn]     getProjection(a.child).map(p => (p, p.dataType)).map {
[error] [warn]
[error] [warn] spark/sql/core/src/main/scala/org/apache/spark/sql/execution/ProjectionOverSchema.scala:52: match may not be exhaustive.
[error] It would fail on the following input: (_, _)
[error] [warn]     getProjection(child).map(p => (p, p.dataType)).map {
[error] [warn]
```

```
$ sbt
> ++2.12.6
> project sql
> compile
```

## How was this patch tested?
Existing tests.

You can merge this pull request into a Git repository by running:

$ git pull https://github.com/sadhen/spark fix_exhaustive_match

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/spark/pull/22260.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #22260


commit 626f26598ccfc692cca59e29a2d7861133654ef0
Author: 忍冬 
Date:   2018-08-29T03:14:11Z

Fix exhaustive match using collect




---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #22240: [SPARK-25248] [CORE] Audit barrier Scala APIs for 2.4

2018-08-28 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/22240
  
**[Test build #95398 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/95398/testReport)**
 for PR 22240 at commit 
[`365e7b8`](https://github.com/apache/spark/commit/365e7b8c0dc161b765476cffb59c0a174d6f85ae).


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #22240: [SPARK-25248] [CORE] Audit barrier Scala APIs for 2.4

2018-08-28 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/22240
  
Test PASSed.
Refer to this link for build results (access rights to CI server needed): 

https://amplab.cs.berkeley.edu/jenkins//job/testing-k8s-prb-make-spark-distribution-unified/2656/
Test PASSed.


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #22240: [SPARK-25248] [CORE] Audit barrier Scala APIs for 2.4

2018-08-28 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/22240
  
Merged build finished. Test PASSed.


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #22258: [SPARK-25266][CORE] Fix memory leak in Barrier Execution...

2018-08-28 Thread mengxr
Github user mengxr commented on the issue:

https://github.com/apache/spark/pull/22258
  
LGTM pending test


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #22240: [SPARK-25248] [CORE] Audit barrier Scala APIs for 2.4

2018-08-28 Thread mengxr
Github user mengxr commented on the issue:

https://github.com/apache/spark/pull/22240
  
test this please


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #22112: [SPARK-23243][Core] Fix RDD.repartition() data correctne...

2018-08-28 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/22112
  
**[Test build #4299 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/NewSparkPullRequestBuilder/4299/testReport)**
 for PR 22112 at commit 
[`a4e6639`](https://github.com/apache/spark/commit/a4e6639ea098eebe4a06dc9ca27c4386f59bf413).


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #22112: [SPARK-23243][Core] Fix RDD.repartition() data correctne...

2018-08-28 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/22112
  
**[Test build #4298 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/NewSparkPullRequestBuilder/4298/testReport)**
 for PR 22112 at commit 
[`a4e6639`](https://github.com/apache/spark/commit/a4e6639ea098eebe4a06dc9ca27c4386f59bf413).


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #21546: [SPARK-23030][SQL][PYTHON] Use Arrow stream format for c...

2018-08-28 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/21546
  
**[Test build #95397 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/95397/testReport)**
 for PR 21546 at commit 
[`ffb47cb`](https://github.com/apache/spark/commit/ffb47cb2d411b91e240ab40cd6bd75b025e417c2).


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #21546: [SPARK-23030][SQL][PYTHON] Use Arrow stream format for c...

2018-08-28 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/21546
  
Test PASSed.
Refer to this link for build results (access rights to CI server needed): 

https://amplab.cs.berkeley.edu/jenkins//job/testing-k8s-prb-make-spark-distribution-unified/2655/
Test PASSed.


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #21546: [SPARK-23030][SQL][PYTHON] Use Arrow stream format for c...

2018-08-28 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/21546
  
Merged build finished. Test PASSed.


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #21546: [SPARK-23030][SQL][PYTHON] Use Arrow stream format for c...

2018-08-28 Thread HyukjinKwon
Github user HyukjinKwon commented on the issue:

https://github.com/apache/spark/pull/21546
  
retest this please


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #21546: [SPARK-23030][SQL][PYTHON] Use Arrow stream format for c...

2018-08-28 Thread BryanCutler
Github user BryanCutler commented on the issue:

https://github.com/apache/spark/pull/21546
  
Yes, that is the worst case. If there is some bug with types/schema then there is an automatic fallback to the non-arrow code path too.



---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #21546: [SPARK-23030][SQL][PYTHON] Use Arrow stream format for c...

2018-08-28 Thread HyukjinKwon
Github user HyukjinKwon commented on the issue:

https://github.com/apache/spark/pull/21546
  
Yup, since `spark.sql.execution.arrow.enabled` is an experimental feature, 
we could just turn this off if there are critical bugs found later after the 
release.
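
For reference, the escape hatch is a one-liner, assuming a live `spark` session:

```scala
// Turn off the experimental Arrow path at runtime if a critical bug surfaces.
spark.conf.set("spark.sql.execution.arrow.enabled", "false")
```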


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #22209: [SPARK-24415][Core] Fixed the aggregated stage metrics b...

2018-08-28 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/22209
  
Merged build finished. Test FAILed.


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #19691: [SPARK-14922][SPARK-17732][SQL]ALTER TABLE DROP PARTITIO...

2018-08-28 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/19691
  
Can one of the admins verify this patch?


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #22209: [SPARK-24415][Core] Fixed the aggregated stage metrics b...

2018-08-28 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/22209
  
Test FAILed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/95389/
Test FAILed.


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #22209: [SPARK-24415][Core] Fixed the aggregated stage metrics b...

2018-08-28 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/22209
  
**[Test build #95389 has 
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/95389/testReport)**
 for PR 22209 at commit 
[`dccbf36`](https://github.com/apache/spark/commit/dccbf36a2041052da7489f301abce3fda3a845ef).
 * This patch **fails Spark unit tests**.
 * This patch merges cleanly.
 * This patch adds no public classes.


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #22254: [CATALYST] Add a call to apply method explicitly in comb...

2018-08-28 Thread SongYadong
Github user SongYadong commented on the issue:

https://github.com/apache/spark/pull/22254
  
OK, I will close it. Thanks for your review.


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request #22254: [CATALYST] Add a call to apply method explicitly ...

2018-08-28 Thread SongYadong
Github user SongYadong closed the pull request at:

https://github.com/apache/spark/pull/22254


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #21546: [SPARK-23030][SQL][PYTHON] Use Arrow stream format for c...

2018-08-28 Thread gatorsmile
Github user gatorsmile commented on the issue:

https://github.com/apache/spark/pull/21546
  
@BryanCutler The worst case is to turn off 
`spark.sql.execution.arrow.enabled`, if the new code path has a bug, right?


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #22258: [SPARK-25266][CORE] Fix memory leak in Barrier Execution...

2018-08-28 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/22258
  
Merged build finished. Test PASSed.


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org


