[GitHub] spark issue #19178: [SPARK-21966][SQL]ResolveMissingReference rule should no...
Github user DonnyZone commented on the issue: https://github.com/apache/spark/pull/19178

Thanks, I will give it a try in our private repository, as there are several such cases and the users want them handled in a seamless way. It is really complicated to support this in general. Should I close this PR now? @gatorsmile @maropu
[GitHub] spark issue #16158: [SPARK-18724][ML] Add TuningSummary for TrainValidationS...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/16158

Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/81622/
[GitHub] spark issue #16158: [SPARK-18724][ML] Add TuningSummary for TrainValidationS...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/16158

Merged build finished. Test PASSed.
[GitHub] spark issue #16158: [SPARK-18724][ML] Add TuningSummary for TrainValidationS...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/16158

**[Test build #81622 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/81622/testReport)** for PR 16158 at commit [`297091f`](https://github.com/apache/spark/commit/297091fcb8562bba856fadb5848c2e66e0f30bfb).

* This patch passes all tests.
* This patch merges cleanly.
* This patch adds no public classes.
[GitHub] spark pull request #19106: [SPARK-21770][ML] ProbabilisticClassificationMode...
Github user smurching commented on a diff in the pull request: https://github.com/apache/spark/pull/19106#discussion_r137988907

--- Diff: mllib/src/main/scala/org/apache/spark/ml/classification/ProbabilisticClassifier.scala ---
@@ -245,6 +245,13 @@ private[ml] object ProbabilisticClassificationModel {
         v.values(i) /= sum
         i += 1
       }
+    } else {
+      var i = 0
+      val size = v.size
+      while (i < size) {
+        v.values(i) = 1.0 / size
--- End diff --

You could use `java.util.Arrays.fill` to update `v.values` in-place, but I'm not sure that it'll make a huge difference.
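For reference, a minimal sketch of the suggested alternative, assuming `v.values` is the mutable `Array[Double]` backing a dense vector (the wrapper object below is only for illustration):

```scala
import java.util.Arrays

object UniformFillExample {
  def main(args: Array[String]): Unit = {
    // Stand-in for v.values: the mutable array backing the vector.
    val values = new Array[Double](4)
    // One call replaces the manual while loop; every entry becomes 1.0 / size.
    Arrays.fill(values, 1.0 / values.length)
    println(values.mkString(", ")) // 0.25, 0.25, 0.25, 0.25
  }
}
```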
[GitHub] spark issue #18704: [SPARK-20783][SQL] Create ColumnVector to abstract exist...
Github user kiszk commented on the issue: https://github.com/apache/spark/pull/18704

ping @cloud-fan
[GitHub] spark pull request #19184: [SPARK-21971][CORE] Too many open files in Spark ...
Github user jiangxb1987 commented on a diff in the pull request: https://github.com/apache/spark/pull/19184#discussion_r137988200

--- Diff: core/src/main/java/org/apache/spark/util/collection/unsafe/sort/UnsafeSorterSpillReader.java ---
@@ -104,6 +124,10 @@ public void loadNext() throws IOException {
     if (taskContext != null) {
       taskContext.killTaskIfInterrupted();
     }
+    if (this.din == null) {
+      // Good time to init (if all files are opened, we can get Too Many files exception)
+      initStreams();
+    }
--- End diff --

A valid fix should introduce a new config to control the number of concurrently opened spill files; that also means you should use some data structure to keep track of requests to open spill files.
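One way to read this suggestion is a shared gate that callers must pass before opening a spill file. A minimal sketch, assuming a fixed limit (the object, limit, and wiring below are illustrative, not Spark code):

```scala
import java.util.concurrent.Semaphore

object SpillFileGate {
  // Hypothetical configurable cap on concurrently open spill files.
  private val permits = new Semaphore(64)

  // Block until an open slot is free; call before opening a spill file.
  def acquireOpenSlot(): Unit = permits.acquire()

  // Release the slot; call right after closing the spill file's stream.
  def releaseOpenSlot(): Unit = permits.release()
}
```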
[GitHub] spark issue #19174: [SPARK-21963][CORE][TEST]Create temp file should be dele...
Github user jerryshao commented on the issue: https://github.com/apache/spark/pull/19174

This does not seem like a big problem; all the temp files are created under `target/tmp`, and they can be cleaned by `mvn clean` or `sbt clean`.
[GitHub] spark pull request #18875: [SPARK-21513][SQL] Allow UDF to_json support conv...
Github user viirya commented on a diff in the pull request: https://github.com/apache/spark/pull/18875#discussion_r137988061

--- Diff: sql/catalyst/src/test/scala/org/apache/spark/sql/catalyst/json/JacksonGeneratorSuite.scala ---
@@ -0,0 +1,92 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements. See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License. You may obtain a copy of the License at
+ *
+ *    http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.spark.sql.catalyst.json
+
+import java.io.CharArrayWriter
+
+import org.apache.spark.SparkFunSuite
+import org.apache.spark.sql.catalyst.InternalRow
+import org.apache.spark.sql.catalyst.util.{ArrayBasedMapData, DateTimeUtils, GenericArrayData}
+import org.apache.spark.sql.types._
+
+class JacksonGeneratorSuite extends SparkFunSuite {
+
+  val gmtId = DateTimeUtils.TimeZoneGMT.getID
+  val option = new JSONOptions(Map.empty, gmtId)
+
+  test("initial with StructType and write out a row") {
+    val dataType = StructType(StructField("a", IntegerType) :: Nil)
+    val input = InternalRow(1)
+    val writer = new CharArrayWriter()
+    val gen = new JacksonGenerator(dataType, writer, option)
+    gen.write(input)
+    gen.flush()
+    assert(writer.toString === """{"a":1}""")
+  }
+
+  test("initial with StructType and write out rows") {
+    val dataType = StructType(StructField("a", IntegerType) :: Nil)
+    val input = new GenericArrayData(InternalRow(1) :: InternalRow(2) :: Nil)
+    val writer = new CharArrayWriter()
+    val gen = new JacksonGenerator(dataType, writer, option)
+    gen.write(input)
+    gen.flush()
+    assert(writer.toString === """[{"a":1},{"a":2}]""")
+  }
+
+  test("initial with StructType and write out an array with single empty row") {
+    val dataType = StructType(StructField("a", IntegerType) :: Nil)
+    val input = new GenericArrayData(InternalRow(null) :: Nil)
+    val writer = new CharArrayWriter()
+    val gen = new JacksonGenerator(dataType, writer, option)
+    gen.write(input)
+    gen.flush()
+    assert(writer.toString === """[{}]""")
+  }
+
+  test("initial with StructType and write out an empty array") {
+    val dataType = StructType(StructField("a", IntegerType) :: Nil)
+    val input = new GenericArrayData(Nil)
+    val writer = new CharArrayWriter()
+    val gen = new JacksonGenerator(dataType, writer, option)
+    gen.write(input)
+    gen.flush()
+    assert(writer.toString === """[]""")
+  }
+
+  test("initial with Map and write out a map data") {
+    val dataType = MapType(StringType, IntegerType)
+    val input = ArrayBasedMapData(Map("a" -> 1))
+    val writer = new CharArrayWriter()
+    val gen = new JacksonGenerator(dataType, writer, option)
+    gen.write(input)
+    gen.flush()
+    assert(writer.toString === """{"a":1}""")
+  }
+
+  test("initial with Map and write out an array of maps") {
+    val dataType = MapType(StringType, IntegerType)
+    val input = new GenericArrayData(
+      ArrayBasedMapData(Map("a" -> 1)) :: ArrayBasedMapData(Map("b" -> 2)) :: Nil)
+    val writer = new CharArrayWriter()
+    val gen = new JacksonGenerator(dataType, writer, option)
+    gen.write(input)
+    gen.flush()
+    assert(writer.toString === """[{"a":1},{"b":2}]""")
+  }
--- End diff --

We need negative test cases, i.e., `JacksonGenerator` initialized with `StructType` but used to write a map or an array of maps, and likewise `JacksonGenerator` initialized with `MapType` but used to write a struct or an array of structs. (A sketch of one such case follows.)
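A minimal sketch of one such negative case, following the suite's conventions; the expected exception type is an assumption here (the discussion below settles on `UnsupportedOperationException` for a mismatched root type):

```scala
test("initialized with StructType but asked to write a map should fail") {
  val dataType = StructType(StructField("a", IntegerType) :: Nil)
  val input = ArrayBasedMapData(Map("a" -> 1))
  val writer = new CharArrayWriter()
  val gen = new JacksonGenerator(dataType, writer, option)
  // A struct-typed generator should reject map input outright.
  intercept[UnsupportedOperationException] {
    gen.write(input)
  }
}
```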
[GitHub] spark pull request #18875: [SPARK-21513][SQL] Allow UDF to_json support conv...
Github user goldmedal commented on a diff in the pull request: https://github.com/apache/spark/pull/18875#discussion_r137987843

--- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/json/JacksonGenerator.scala ---
@@ -193,14 +228,32 @@ private[sql] class JacksonGenerator(
    *
    * @param row The row to convert
    */
-  def write(row: InternalRow): Unit = writeObject(writeFields(row, schema, rootFieldWriters))
+  def write(row: InternalRow): Unit = dataType match {
+    case st: StructType =>
+      writeObject(writeFields(row, st, rootFieldWriters))
+    case _ => throw new UnsupportedOperationException(
--- End diff --

It will throw `ClassCastException`.
[GitHub] spark issue #18875: [SPARK-21513][SQL] Allow UDF to_json support converting ...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/18875

Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/81616/
[GitHub] spark pull request #18875: [SPARK-21513][SQL] Allow UDF to_json support conv...
Github user viirya commented on a diff in the pull request: https://github.com/apache/spark/pull/18875#discussion_r137987691

--- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/json/JacksonGenerator.scala ---
@@ -193,14 +228,32 @@ private[sql] class JacksonGenerator(
    *
    * @param row The row to convert
    */
-  def write(row: InternalRow): Unit = writeObject(writeFields(row, schema, rootFieldWriters))
+  def write(row: InternalRow): Unit = dataType match {
+    case st: StructType =>
+      writeObject(writeFields(row, st, rootFieldWriters))
+    case _ => throw new UnsupportedOperationException(
--- End diff --

Hmm, what type of exception is thrown now if we call `write(row)` when the generator is initialized with `MapType`?
[GitHub] spark issue #18875: [SPARK-21513][SQL] Allow UDF to_json support converting ...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/18875

Merged build finished. Test PASSed.
[GitHub] spark issue #18875: [SPARK-21513][SQL] Allow UDF to_json support converting ...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/18875

**[Test build #81616 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/81616/testReport)** for PR 18875 at commit [`b8bf38c`](https://github.com/apache/spark/commit/b8bf38cda5f3b03473f5bd1f1c4c8599e342b869).

* This patch passes all tests.
* This patch merges cleanly.
* This patch adds no public classes.
[GitHub] spark issue #19118: [SPARK-21882][CORE] OutputMetrics doesn't count written ...
Github user awarrior commented on the issue: https://github.com/apache/spark/pull/19118

I ran into trouble when writing a test case. It seems that this issue won't be triggered with only one node: I found that the driver node does createPathFromString, so there is no problem there.

> java.lang.Thread.State: RUNNABLE
>     at org.apache.hadoop.fs.FileSystem.getStatistics(FileSystem.java:3271)
>     - locked <0x1211> (a java.lang.Class)
>     at org.apache.hadoop.fs.FileSystem.initialize(FileSystem.java:202)
>     at org.apache.hadoop.fs.RawLocalFileSystem.initialize(RawLocalFileSystem.java:92)
>     at org.apache.hadoop.fs.LocalFileSystem.initialize(LocalFileSystem.java:47)
>     at org.apache.hadoop.fs.FileSystem.createFileSystem(FileSystem.java:2598)
>     at org.apache.hadoop.fs.FileSystem.access$200(FileSystem.java:91)
>     at org.apache.hadoop.fs.FileSystem$Cache.getInternal(FileSystem.java:2632)
>     at org.apache.hadoop.fs.FileSystem$Cache.get(FileSystem.java:2614)
>     at org.apache.hadoop.fs.FileSystem.get(FileSystem.java:370)
>     at org.apache.hadoop.fs.FileSystem.get(FileSystem.java:169)
>     at org.apache.hadoop.fs.FileSystem.get(FileSystem.java:354)
>     at org.apache.hadoop.fs.Path.getFileSystem(Path.java:296)
>     at org.apache.spark.internal.io.SparkHadoopWriterUtils$.createPathFromString(SparkHadoopWriterUtils.scala:55)

Does anyone know how to test this case?
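One hedged suggestion: run the test with a `local-cluster` master so tasks execute in separate executor JVMs rather than in the driver, which may exercise the executor-side `FileSystem` path. A sketch only, not a verified reproduction:

```scala
import org.apache.spark.{SparkConf, SparkContext}

// local-cluster[numWorkers, coresPerWorker, memoryPerWorkerMB] starts real
// executor processes, unlike plain "local[*]" where everything runs in-process.
val conf = new SparkConf()
  .setMaster("local-cluster[2, 1, 1024]")
  .setAppName("output-metrics-bytes-written-test")
val sc = new SparkContext(conf)
```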
[GitHub] spark pull request #18875: [SPARK-21513][SQL] Allow UDF to_json support conv...
Github user goldmedal commented on a diff in the pull request: https://github.com/apache/spark/pull/18875#discussion_r137987296

--- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/json/JacksonGenerator.scala ---
@@ -193,14 +228,32 @@ private[sql] class JacksonGenerator(
    *
    * @param row The row to convert
    */
-  def write(row: InternalRow): Unit = writeObject(writeFields(row, schema, rootFieldWriters))
+  def write(row: InternalRow): Unit = dataType match {
+    case st: StructType =>
+      writeObject(writeFields(row, st, rootFieldWriters))
+    case _ => throw new UnsupportedOperationException(
--- End diff --

Oh, I got you wrong. I thought you meant the matching in `rootFieldWriters`. So we need to keep both of them?
[GitHub] spark pull request #18875: [SPARK-21513][SQL] Allow UDF to_json support conv...
Github user viirya commented on a diff in the pull request: https://github.com/apache/spark/pull/18875#discussion_r137986494

--- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/json/JacksonGenerator.scala ---
@@ -193,14 +228,32 @@ private[sql] class JacksonGenerator(
    *
    * @param row The row to convert
    */
-  def write(row: InternalRow): Unit = writeObject(writeFields(row, schema, rootFieldWriters))
+  def write(row: InternalRow): Unit = dataType match {
+    case st: StructType =>
+      writeObject(writeFields(row, st, rootFieldWriters))
+    case _ => throw new UnsupportedOperationException(
--- End diff --

I saw you removed the pattern matching?
[GitHub] spark pull request #18875: [SPARK-21513][SQL] Allow UDF to_json support conv...
Github user viirya commented on a diff in the pull request: https://github.com/apache/spark/pull/18875#discussion_r137986264

--- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/jsonExpressions.scala ---
@@ -677,14 +696,25 @@ case class StructsToJson(
   override def checkInputDataTypes(): TypeCheckResult = child.dataType match {
     case _: StructType | ArrayType(_: StructType, _) =>
       try {
-        JacksonUtils.verifySchema(rowSchema)
+        JacksonUtils.verifySchema(rowSchema.asInstanceOf[StructType])
+        TypeCheckResult.TypeCheckSuccess
+      } catch {
+        case e: UnsupportedOperationException =>
+          TypeCheckResult.TypeCheckFailure(e.getMessage)
+      }
+    case _: MapType | ArrayType(_: MapType, _) =>
+      // TODO: let `JacksonUtils.verifySchema` verify a `MapType`
+      try {
+        val st = StructType(StructField("a", rowSchema.asInstanceOf[MapType]) :: Nil)
+        JacksonUtils.verifySchema(st)
         TypeCheckResult.TypeCheckSuccess
       } catch {
         case e: UnsupportedOperationException =>
           TypeCheckResult.TypeCheckFailure(e.getMessage)
       }
     case _ => TypeCheckResult.TypeCheckFailure(
-      s"Input type ${child.dataType.simpleString} must be a struct or array of structs.")
+      s"Input type ${child.dataType.simpleString} must be a struct, array of structs or " +
+        s"an arbitrary map.")
--- End diff --

`an arbitrary map` -> `a map or array of maps.`
[GitHub] spark pull request #18875: [SPARK-21513][SQL] Allow UDF to_json support conv...
Github user viirya commented on a diff in the pull request: https://github.com/apache/spark/pull/18875#discussion_r137986124

--- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/jsonExpressions.scala ---
@@ -677,14 +696,42 @@ case class StructsToJson(
   override def checkInputDataTypes(): TypeCheckResult = child.dataType match {
     case _: StructType | ArrayType(_: StructType, _) =>
       try {
-        JacksonUtils.verifySchema(rowSchema)
+        JacksonUtils.verifySchema(rowSchema.asInstanceOf[StructType])
+        TypeCheckResult.TypeCheckSuccess
+      } catch {
+        case e: UnsupportedOperationException =>
+          TypeCheckResult.TypeCheckFailure(e.getMessage)
+      }
+    case ArrayType(mt: MapType, _) =>
+      try {
+        val st = StructType(StructField("a", mt) :: Nil)
+        JacksonUtils.verifySchema(st)
+        TypeCheckResult.TypeCheckSuccess
+      } catch {
+        case e: UnsupportedOperationException =>
+          TypeCheckResult.TypeCheckFailure(e.getMessage)
+      }
+    case MapType(_: DataType, st: StructType, _: Boolean) =>
--- End diff --

Yeah, since you support arbitrary maps now. Not all map types, actually; it is left to `JacksonUtils` to verify the schema.
[GitHub] spark issue #18875: [SPARK-21513][SQL] Allow UDF to_json support converting ...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/18875

**[Test build #81625 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/81625/testReport)** for PR 18875 at commit [`b71b56a`](https://github.com/apache/spark/commit/b71b56ab49ba048f397459a18f946422d9440b29).
[GitHub] spark issue #19132: [SPARK-21922] Fix duration always updating when task fai...
Github user jerryshao commented on the issue: https://github.com/apache/spark/pull/19132

I still have a question about the history server: was your event log incomplete or complete when you hit this issue?
[GitHub] spark pull request #18592: [SPARK-21368][SQL] TPCDSQueryBenchmark can't refe...
Github user sarutak commented on a diff in the pull request: https://github.com/apache/spark/pull/18592#discussion_r137985658

--- Diff: sql/core/src/test/scala/org/apache/spark/sql/execution/benchmark/TPCDSQueryBenchmark.scala ---
@@ -99,6 +95,20 @@ object TPCDSQueryBenchmark {
   }

   def main(args: Array[String]): Unit = {
+    if (args.length < 1) {
--- End diff --

We can pass the argument through the run configuration even when we use an IDE like IntelliJ, right? Or how about giving `dataLocation` through a new property?
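A sketch of the "new property" idea: fall back to a JVM system property when no CLI argument is given. The property name `spark.sql.tpcds.dataLocation` is hypothetical, chosen only for illustration:

```scala
// Prefer the positional argument; otherwise read the assumed system property,
// e.g. -Dspark.sql.tpcds.dataLocation=/path/to/tpcds-data.
val dataLocation: String = args.headOption.getOrElse {
  sys.props.getOrElse("spark.sql.tpcds.dataLocation",
    sys.error("Set a data location argument or -Dspark.sql.tpcds.dataLocation=..."))
}
```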
[GitHub] spark pull request #18875: [SPARK-21513][SQL] Allow UDF to_json support conv...
Github user goldmedal commented on a diff in the pull request: https://github.com/apache/spark/pull/18875#discussion_r137984767

--- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/jsonExpressions.scala ---
@@ -677,14 +696,42 @@ case class StructsToJson(
   override def checkInputDataTypes(): TypeCheckResult = child.dataType match {
     case _: StructType | ArrayType(_: StructType, _) =>
       try {
-        JacksonUtils.verifySchema(rowSchema)
+        JacksonUtils.verifySchema(rowSchema.asInstanceOf[StructType])
+        TypeCheckResult.TypeCheckSuccess
+      } catch {
+        case e: UnsupportedOperationException =>
+          TypeCheckResult.TypeCheckFailure(e.getMessage)
+      }
+    case ArrayType(mt: MapType, _) =>
+      try {
+        val st = StructType(StructField("a", mt) :: Nil)
+        JacksonUtils.verifySchema(st)
+        TypeCheckResult.TypeCheckSuccess
+      } catch {
+        case e: UnsupportedOperationException =>
+          TypeCheckResult.TypeCheckFailure(e.getMessage)
+      }
+    case MapType(_: DataType, st: StructType, _: Boolean) =>
--- End diff --

@viirya I think that if we have `case mt: MapType`, we don't need this pattern to verify the schema, right?
[GitHub] spark issue #18875: [SPARK-21513][SQL] Allow UDF to_json support converting ...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/18875

**[Test build #81624 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/81624/testReport)** for PR 18875 at commit [`f567679`](https://github.com/apache/spark/commit/f567679ec724052266587e35cdf25868587de750).
[GitHub] spark issue #19186: [SPARK-21972][ML] Add param handlePersistence
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/19186

**[Test build #81623 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/81623/testReport)** for PR 19186 at commit [`f8fa957`](https://github.com/apache/spark/commit/f8fa9573a1b40ff236e9c52cf429e2742c8f2bd0).
[GitHub] spark pull request #19186: [SPARK-21972][ML] Add param handlePersistence
GitHub user zhengruifeng opened a pull request: https://github.com/apache/spark/pull/19186

[SPARK-21972][ML] Add param handlePersistence

## What changes were proposed in this pull request?

Add param handlePersistence

## How was this patch tested?

existing tests

You can merge this pull request into a Git repository by running:

    $ git pull https://github.com/zhengruifeng/spark fix_double_cache

Alternatively you can review and apply these changes as the patch at: https://github.com/apache/spark/pull/19186.patch

To close this pull request, make a commit to your master/trunk branch with (at least) the following in the commit message: This closes #19186

commit 45adf7e35175228821cc8f84a63855b851e25d55
Author: Zheng RuiFeng
Date: 2017-03-31T03:10:45Z
    recreate pr

commit 208e1009037aed780ca194658a68dec2221db3bc
Author: Zheng RuiFeng
Date: 2017-07-06T05:06:04Z
    del unused equation

commit 936d46634c69a8f0f8a546475edd3bfbd501d1e1
Author: Zheng RuiFeng
Date: 2017-08-29T09:52:28Z
    revert Predictor

commit df4d263d21c3e4413b4d74aaabcd3a10348a0001
Author: Zheng RuiFeng
Date: 2017-09-04T05:27:56Z
    create param HasHandlePersistence

commit 971e52c4a18b4261d82ac14fefa6bb849367562c
Author: Zheng RuiFeng
Date: 2017-09-04T05:57:13Z
    fix

commit f8fa9573a1b40ff236e9c52cf429e2742c8f2bd0
Author: Zheng RuiFeng
Date: 2017-09-04T06:26:18Z
    fix mima
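Based on the commit titled "create param HasHandlePersistence", a rough sketch of what such a shared param could look like; the trait shape and doc string below are assumptions, not the PR's actual code:

```scala
import org.apache.spark.ml.param.{BooleanParam, Params}

trait HasHandlePersistence extends Params {
  // Whether the algorithm should cache the input dataset itself, letting
  // callers opt out when they manage persistence externally.
  final val handlePersistence: BooleanParam = new BooleanParam(this,
    "handlePersistence", "whether to cache the input dataset internally")

  final def getHandlePersistence: Boolean = $(handlePersistence)
}
```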
[GitHub] spark pull request #17014: [SPARK-18608][ML] Fix double-caching in ML algori...
Github user zhengruifeng closed the pull request at: https://github.com/apache/spark/pull/17014
[GitHub] spark issue #17014: [SPARK-18608][ML] Fix double-caching in ML algorithms
Github user zhengruifeng commented on the issue: https://github.com/apache/spark/pull/17014

@smurching OK, I will close this PR and resubmit it to the new ticket.
[GitHub] spark issue #16158: [SPARK-18724][ML] Add TuningSummary for TrainValidationS...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/16158

**[Test build #81622 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/81622/testReport)** for PR 16158 at commit [`297091f`](https://github.com/apache/spark/commit/297091fcb8562bba856fadb5848c2e66e0f30bfb).
[GitHub] spark pull request #19184: [SPARK-21971][CORE] Too many open files in Spark ...
Github user kiszk commented on a diff in the pull request: https://github.com/apache/spark/pull/19184#discussion_r137982497

--- Diff: core/src/main/java/org/apache/spark/util/collection/unsafe/sort/UnsafeSorterSpillReader.java ---
@@ -104,6 +124,10 @@ public void loadNext() throws IOException {
     if (taskContext != null) {
       taskContext.killTaskIfInterrupted();
     }
+    if (this.din == null) {
+      // Good time to init (if all files are opened, we can get Too Many files exception)
+      initStreams();
+    }
--- End diff --

IIUC, this PR does not reduce the total number of open files. Since it opens files only when they are required, it may reduce the possibility of hitting a too-many-open-files error. As @viirya pointed out, it is necessary to provide a feature to control the number of files open at one point (e.g. via the priority queue).
[GitHub] spark pull request #19169: [SPARK-21957][SQL] Add current_user function
Github user wangyum commented on a diff in the pull request: https://github.com/apache/spark/pull/19169#discussion_r137982471

--- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/CurrentUser.scala ---
@@ -0,0 +1,47 @@
+/*
--- End diff --

Add this to https://github.com/apache/spark/blob/master/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/misc.scala?
[GitHub] spark pull request #19184: [SPARK-21971][CORE] Too many open files in Spark ...
Github user jerryshao commented on a diff in the pull request: https://github.com/apache/spark/pull/19184#discussion_r137981999

--- Diff: core/src/main/java/org/apache/spark/util/collection/unsafe/sort/UnsafeSorterSpillReader.java ---
@@ -104,6 +124,10 @@ public void loadNext() throws IOException {
     if (taskContext != null) {
       taskContext.killTaskIfInterrupted();
    }
+    if (this.din == null) {
+      // Good time to init (if all files are opened, we can get Too Many files exception)
+      initStreams();
+    }
--- End diff --

I agree with @viirya: we're using a priority queue to do the merge sort, so all the readers in the priority queue end up opened, and this still cannot solve the issue. I think a valid fix is to control the number of concurrently merged files, like MR's `io.sort.factor`. We also still need to address similar issues in `ExternalSorter` and other places in the shuffle code.
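A sketch of the kind of knob being suggested, analogous to MR's `io.sort.factor`; the config name and default below are assumptions, not an existing Spark setting:

```scala
import org.apache.spark.SparkConf

object SpillMergeFactorSketch {
  // Hypothetical setting capping how many spill files are merged (and thus
  // held open) at once.
  def mergeFactor(conf: SparkConf): Int =
    conf.getInt("spark.shuffle.spill.mergeFactor", 100)
}
```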
[GitHub] spark pull request #19184: [SPARK-21971][CORE] Too many open files in Spark ...
Github user viirya commented on a diff in the pull request: https://github.com/apache/spark/pull/19184#discussion_r137981905

--- Diff: core/src/main/java/org/apache/spark/util/collection/unsafe/sort/UnsafeSorterSpillReader.java ---
@@ -104,6 +124,10 @@ public void loadNext() throws IOException {
     if (taskContext != null) {
       taskContext.killTaskIfInterrupted();
     }
+    if (this.din == null) {
+      // Good time to init (if all files are opened, we can get Too Many files exception)
--- End diff --

This comment looks confusing. Maybe: `This is the time to initialize and hold the input stream of the spill file for loading records. Opening the input stream too early will very likely lead to a too-many-open-files issue.`
[GitHub] spark pull request #19184: [SPARK-21971][CORE] Too many open files in Spark ...
Github user viirya commented on a diff in the pull request: https://github.com/apache/spark/pull/19184#discussion_r137981565

--- Diff: core/src/main/java/org/apache/spark/util/collection/unsafe/sort/UnsafeSorterSpillReader.java ---
@@ -72,12 +77,27 @@ public UnsafeSorterSpillReader(
       bufferSizeBytes = DEFAULT_BUFFER_SIZE_BYTES;
     }

+    try (InputStream bs = new NioBufferedFileInputStream(file, (int) bufferSizeBytes);
+         DataInputStream dataIn = new DataInputStream(serializerManager.wrapStream(blockId, bs))) {
+      this.numRecords = dataIn.readInt();
+      this.numRecordsRemaining = numRecords;
+    }
+
+    this.buffSize = bufferSizeBytes;
+    this.file = file;
+    this.blockId = blockId;
+    this.serializerManager = serializerManager;
+
+    logger.debug("bufSize: {}, file: {}, records: {}", buffSize, file, this.numRecords);
--- End diff --

Is this log useful? If there are many spill readers, I guess we don't want that much log output.
[GitHub] spark pull request #19184: [SPARK-21971][CORE] Too many open files in Spark ...
Github user viirya commented on a diff in the pull request: https://github.com/apache/spark/pull/19184#discussion_r137981394

--- Diff: core/src/main/java/org/apache/spark/util/collection/unsafe/sort/UnsafeSorterSpillReader.java ---
@@ -72,12 +77,27 @@ public UnsafeSorterSpillReader(
       bufferSizeBytes = DEFAULT_BUFFER_SIZE_BYTES;
     }

+    try (InputStream bs = new NioBufferedFileInputStream(file, (int) bufferSizeBytes);
--- End diff --

Please add a comment here to say we don't need to hold the file open until we actually want to load the records.
[GitHub] spark issue #19178: [SPARK-21966][SQL]ResolveMissingReference rule should no...
Github user maropu commented on the issue: https://github.com/apache/spark/pull/19178

yea, thanks!
[GitHub] spark issue #19178: [SPARK-21966][SQL]ResolveMissingReference rule should no...
Github user gatorsmile commented on the issue: https://github.com/apache/spark/pull/19178

Yes. We do not plan to introduce extra complexity for improving this rule.
[GitHub] spark issue #19178: [SPARK-21966][SQL]ResolveMissingReference rule should no...
Github user maropu commented on the issue: https://github.com/apache/spark/pull/19178

Aha, you mean this is an expected design and we won't change this logic in `ResolveMissingReference` now, right?
[GitHub] spark issue #16677: [SPARK-19355][SQL] Use map output statistics to improve ...
Github user viirya commented on the issue: https://github.com/apache/spark/pull/16677

ping @jiangxb1987 @cloud-fan This can be a useful feature for production, as @wzhfy suggested. Can you help review this if possible? Thanks.
[GitHub] spark issue #19184: [SPARK-21971][CORE] Too many open files in Spark due to ...
Github user gatorsmile commented on the issue: https://github.com/apache/spark/pull/19184

cc @cloud-fan @jiangxb1987
[GitHub] spark issue #19184: [SPARK-21971][CORE] Too many open files in Spark due to ...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/19184

Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/81614/
[GitHub] spark issue #19184: [SPARK-21971][CORE] Too many open files in Spark due to ...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/19184

Merged build finished. Test PASSed.
[GitHub] spark issue #19068: [SPARK-21428][SQL][FOLLOWUP] Reused state should respect...
Github user gatorsmile commented on the issue: https://github.com/apache/spark/pull/19068

cc @cloud-fan @jiangxb1987
[GitHub] spark issue #19184: [SPARK-21971][CORE] Too many open files in Spark due to ...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/19184

**[Test build #81614 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/81614/testReport)** for PR 19184 at commit [`dcc2960`](https://github.com/apache/spark/commit/dcc2960d5f60add9bfd9446df59b0d0d06365947).

* This patch passes all tests.
* This patch merges cleanly.
* This patch adds no public classes.
[GitHub] spark issue #19147: [WIP][SPARK-21190][SQL][PYTHON] Vectorized UDFs in Pytho...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/19147

**[Test build #81621 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/81621/testReport)** for PR 19147 at commit [`dbc6dd2`](https://github.com/apache/spark/commit/dbc6dd2138b427d8436eb0d8bdc4ba134f254e35).
[GitHub] spark pull request #19184: [SPARK-21971][CORE] Too many open files in Spark ...
Github user maropu commented on a diff in the pull request: https://github.com/apache/spark/pull/19184#discussion_r137980253

--- Diff: core/src/main/java/org/apache/spark/util/collection/unsafe/sort/UnsafeSorterSpillReader.java ---
@@ -104,6 +124,10 @@ public void loadNext() throws IOException {
     if (taskContext != null) {
       taskContext.killTaskIfInterrupted();
     }
+    if (this.din == null) {
+      // Good time to init (if all files are opened, we can get Too Many files exception)
+      initStreams();
+    }
--- End diff --

I think you first need to describe in the PR description, in more detail, how this fixes the issue.
[GitHub] spark issue #18875: [SPARK-21513][SQL] Allow UDF to_json support converting ...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/18875

**[Test build #81620 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/81620/testReport)** for PR 18875 at commit [`b8ad442`](https://github.com/apache/spark/commit/b8ad442437390611c63d8b0705d02c3ca3ba3c9c).
[GitHub] spark pull request #18875: [SPARK-21513][SQL] Allow UDF to_json support conv...
Github user goldmedal commented on a diff in the pull request: https://github.com/apache/spark/pull/18875#discussion_r137980084

--- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/jsonExpressions.scala ---
@@ -677,14 +696,27 @@ case class StructsToJson(
   override def checkInputDataTypes(): TypeCheckResult = child.dataType match {
     case _: StructType | ArrayType(_: StructType, _) =>
       try {
-        JacksonUtils.verifySchema(rowSchema)
+        JacksonUtils.verifySchema(rowSchema.asInstanceOf[StructType])
+        TypeCheckResult.TypeCheckSuccess
+      } catch {
+        case e: UnsupportedOperationException =>
+          TypeCheckResult.TypeCheckFailure(e.getMessage)
+      }
+    case ArrayType(_: MapType, _) =>
+      TypeCheckResult.TypeCheckSuccess
+    case MapType(_: DataType, st: StructType, _: Boolean) =>
+      try {
+        JacksonUtils.verifySchema(st)
         TypeCheckResult.TypeCheckSuccess
       } catch {
         case e: UnsupportedOperationException =>
           TypeCheckResult.TypeCheckFailure(e.getMessage)
       }
+    case _: MapType =>
--- End diff --

OK. I'll just leave a TODO comment for it.
[GitHub] spark issue #16677: [SPARK-19355][SQL] Use map output statistics to improve ...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/16677

**[Test build #81619 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/81619/testReport)** for PR 16677 at commit [`f2a7aac`](https://github.com/apache/spark/commit/f2a7aacc22caf27c1a8af612c9432586e6a86d17).
[GitHub] spark pull request #19178: [SPARK-21966][SQL]ResolveMissingReference rule sh...
Github user DonnyZone commented on a diff in the pull request: https://github.com/apache/spark/pull/19178#discussion_r137979887

--- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/Analyzer.scala ---
@@ -1115,6 +1115,8 @@ class Analyzer(
           g.copy(join = true, child = addMissingAttr(g.child, missing))
         case d: Distinct =>
           throw new AnalysisException(s"Can't add $missingAttrs to $d")
+        case u: Union =>
+          u.withNewChildren(u.children.map(addMissingAttr(_, missingAttrs)))
--- End diff --

Yeah, I agree with you. The current implementation only checks UnaryNode; it is necessary to take all node types into consideration. Thanks for the suggestion, I will work on a general solution.
[GitHub] spark issue #19086: [SPARK-21874][SQL] Support changing database when rename...
Github user gatorsmile commented on the issue: https://github.com/apache/spark/pull/19086

Thanks for working on it! @jinxing64 Our current behavior is based on what the other traditional DBMS systems do. Moving tables across different databases is rare; for example, DB2 and Oracle do not support it.

- https://docs.oracle.com/cd/B19306_01/server.102/b14200/statements_9019.htm
- https://www.ibm.com/support/knowledgecenter/SSEPGG_10.5.0/com.ibm.db2.luw.sql.ref.doc/doc/r980.html

So far, we do not have a clean way to support both our current behavior and Hive's behavior, so we want to hold this until the community has more requests. The quality of this PR is very close to mergeable; you can apply it as a patch to your local fork.
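For context, the kind of statement under discussion, which Hive permits but Oracle, DB2, and (currently) Spark reject; the database and table names are placeholders, and a `SparkSession` named `spark` is assumed:

```scala
// Renaming a table into a different database moves it across databases.
spark.sql("ALTER TABLE db1.tbl RENAME TO db2.tbl")
```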
[GitHub] spark issue #9183: [SPARK-11215] [ML] Add multiple columns support to String...
Github user minixalpha commented on the issue: https://github.com/apache/spark/pull/9183

Thanks for your work! @WeichenXu123
[GitHub] spark pull request #18875: [SPARK-21513][SQL] Allow UDF to_json support conv...
Github user viirya commented on a diff in the pull request: https://github.com/apache/spark/pull/18875#discussion_r137979202

--- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/jsonExpressions.scala ---
@@ -677,14 +696,27 @@ case class StructsToJson(
   override def checkInputDataTypes(): TypeCheckResult = child.dataType match {
     case _: StructType | ArrayType(_: StructType, _) =>
       try {
-        JacksonUtils.verifySchema(rowSchema)
+        JacksonUtils.verifySchema(rowSchema.asInstanceOf[StructType])
+        TypeCheckResult.TypeCheckSuccess
+      } catch {
+        case e: UnsupportedOperationException =>
+          TypeCheckResult.TypeCheckFailure(e.getMessage)
+      }
+    case ArrayType(_: MapType, _) =>
+      TypeCheckResult.TypeCheckSuccess
+    case MapType(_: DataType, st: StructType, _: Boolean) =>
+      try {
+        JacksonUtils.verifySchema(st)
         TypeCheckResult.TypeCheckSuccess
       } catch {
         case e: UnsupportedOperationException =>
           TypeCheckResult.TypeCheckFailure(e.getMessage)
       }
+    case _: MapType =>
--- End diff --

Basically, I don't want to increase the size of this PR any more.
[GitHub] spark pull request #18875: [SPARK-21513][SQL] Allow UDF to_json support conv...
Github user viirya commented on a diff in the pull request: https://github.com/apache/spark/pull/18875#discussion_r137979158

--- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/jsonExpressions.scala ---
@@ -677,14 +696,27 @@ case class StructsToJson(
   override def checkInputDataTypes(): TypeCheckResult = child.dataType match {
     case _: StructType | ArrayType(_: StructType, _) =>
       try {
-        JacksonUtils.verifySchema(rowSchema)
+        JacksonUtils.verifySchema(rowSchema.asInstanceOf[StructType])
+        TypeCheckResult.TypeCheckSuccess
+      } catch {
+        case e: UnsupportedOperationException =>
+          TypeCheckResult.TypeCheckFailure(e.getMessage)
+      }
+    case ArrayType(_: MapType, _) =>
+      TypeCheckResult.TypeCheckSuccess
+    case MapType(_: DataType, st: StructType, _: Boolean) =>
+      try {
+        JacksonUtils.verifySchema(st)
         TypeCheckResult.TypeCheckSuccess
       } catch {
         case e: UnsupportedOperationException =>
           TypeCheckResult.TypeCheckFailure(e.getMessage)
       }
+    case _: MapType =>
--- End diff --

Yeah, it is. Maybe we should revise `JacksonUtils` to be able to verify a `MapType` directly, but I don't see `JacksonUtils.verifySchema` used anywhere else. We can leave it as a TODO if you want.
[GitHub] spark pull request #18875: [SPARK-21513][SQL] Allow UDF to_json support conv...
Github user goldmedal commented on a diff in the pull request: https://github.com/apache/spark/pull/18875#discussion_r137979014

--- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/jsonExpressions.scala ---
@@ -677,14 +696,27 @@ case class StructsToJson(
   override def checkInputDataTypes(): TypeCheckResult = child.dataType match {
     case _: StructType | ArrayType(_: StructType, _) =>
       try {
-        JacksonUtils.verifySchema(rowSchema)
+        JacksonUtils.verifySchema(rowSchema.asInstanceOf[StructType])
+        TypeCheckResult.TypeCheckSuccess
+      } catch {
+        case e: UnsupportedOperationException =>
+          TypeCheckResult.TypeCheckFailure(e.getMessage)
+      }
+    case ArrayType(_: MapType, _) =>
+      TypeCheckResult.TypeCheckSuccess
+    case MapType(_: DataType, st: StructType, _: Boolean) =>
+      try {
+        JacksonUtils.verifySchema(st)
         TypeCheckResult.TypeCheckSuccess
       } catch {
         case e: UnsupportedOperationException =>
          TypeCheckResult.TypeCheckFailure(e.getMessage)
       }
+    case _: MapType =>
--- End diff --

Oh yes. It's a tricky way, but it makes sense. =D
[GitHub] spark issue #9183: [SPARK-11215] [ML] Add multiple columns support to String...
Github user WeichenXu123 commented on the issue: https://github.com/apache/spark/pull/9183

@minixalpha Sorry for the delay; I've been too busy recently. I will try to finish and submit my new PR once I get time. Thanks!
[GitHub] spark issue #19172: [SPARK-21856] Add probability and rawPrediction to MLPC ...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/19172

Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/81618/
[GitHub] spark issue #19172: [SPARK-21856] Add probability and rawPrediction to MLPC ...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/19172

Merged build finished. Test PASSed.
[GitHub] spark issue #19172: [SPARK-21856] Add probability and rawPrediction to MLPC ...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/19172

**[Test build #81618 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/81618/testReport)** for PR 19172 at commit [`d2ccc1b`](https://github.com/apache/spark/commit/d2ccc1b5b9fc74563635a6fdc6153080661924c1).

* This patch passes all tests.
* This patch merges cleanly.
* This patch adds no public classes.
[GitHub] spark pull request #18875: [SPARK-21513][SQL] Allow UDF to_json support conv...
Github user goldmedal commented on a diff in the pull request: https://github.com/apache/spark/pull/18875#discussion_r137978728

--- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/json/JacksonGenerator.scala ---
@@ -193,14 +228,32 @@ private[sql] class JacksonGenerator(
    *
    * @param row The row to convert
    */
-  def write(row: InternalRow): Unit = writeObject(writeFields(row, schema, rootFieldWriters))
+  def write(row: InternalRow): Unit = dataType match {
+    case st: StructType =>
+      writeObject(writeFields(row, st, rootFieldWriters))
+    case _ => throw new UnsupportedOperationException(
--- End diff --

OK, got it.
[GitHub] spark pull request #18875: [SPARK-21513][SQL] Allow UDF to_json support conv...
Github user viirya commented on a diff in the pull request: https://github.com/apache/spark/pull/18875#discussion_r137978375

--- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/jsonExpressions.scala ---
@@ -677,14 +696,27 @@ case class StructsToJson(
   override def checkInputDataTypes(): TypeCheckResult = child.dataType match {
     case _: StructType | ArrayType(_: StructType, _) =>
       try {
-        JacksonUtils.verifySchema(rowSchema)
+        JacksonUtils.verifySchema(rowSchema.asInstanceOf[StructType])
+        TypeCheckResult.TypeCheckSuccess
+      } catch {
+        case e: UnsupportedOperationException =>
+          TypeCheckResult.TypeCheckFailure(e.getMessage)
+      }
+    case ArrayType(_: MapType, _) =>
+      TypeCheckResult.TypeCheckSuccess
+    case MapType(_: DataType, st: StructType, _: Boolean) =>
+      try {
+        JacksonUtils.verifySchema(st)
         TypeCheckResult.TypeCheckSuccess
       } catch {
         case e: UnsupportedOperationException =>
           TypeCheckResult.TypeCheckFailure(e.getMessage)
       }
+    case _: MapType =>
--- End diff --

Maybe we can just do something like:

    case mt: MapType =>
      val st = StructType(StructField("a", mt) :: Nil)
      JacksonUtils.verifySchema(st)
[GitHub] spark pull request #18875: [SPARK-21513][SQL] Allow UDF to_json support conv...
Github user goldmedal commented on a diff in the pull request: https://github.com/apache/spark/pull/18875#discussion_r137978096

--- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/jsonExpressions.scala ---
@@ -677,14 +696,27 @@ case class StructsToJson(
   override def checkInputDataTypes(): TypeCheckResult = child.dataType match {
     case _: StructType | ArrayType(_: StructType, _) =>
       try {
-        JacksonUtils.verifySchema(rowSchema)
+        JacksonUtils.verifySchema(rowSchema.asInstanceOf[StructType])
+        TypeCheckResult.TypeCheckSuccess
+      } catch {
+        case e: UnsupportedOperationException =>
+          TypeCheckResult.TypeCheckFailure(e.getMessage)
+      }
+    case ArrayType(_: MapType, _) =>
+      TypeCheckResult.TypeCheckSuccess
+    case MapType(_: DataType, st: StructType, _: Boolean) =>
+      try {
+        JacksonUtils.verifySchema(st)
         TypeCheckResult.TypeCheckSuccess
       } catch {
         case e: UnsupportedOperationException =>
           TypeCheckResult.TypeCheckFailure(e.getMessage)
       }
+    case _: MapType =>
--- End diff --

@viirya But `JacksonUtils.verifySchema` only verifies a `StructType`, so we need to add a new one?
[GitHub] spark pull request #18875: [SPARK-21513][SQL] Allow UDF to_json support conv...
Github user viirya commented on a diff in the pull request: https://github.com/apache/spark/pull/18875#discussion_r137977638

--- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/json/JacksonGenerator.scala ---
@@ -193,14 +228,32 @@ private[sql] class JacksonGenerator(
    *
    * @param row The row to convert
    */
-  def write(row: InternalRow): Unit = writeObject(writeFields(row, schema, rootFieldWriters))
+  def write(row: InternalRow): Unit = dataType match {
+    case st: StructType =>
+      writeObject(writeFields(row, st, rootFieldWriters))
+    case _ => throw new UnsupportedOperationException(
--- End diff --

`ClassCastException` may not be as informative as `UnsupportedOperationException` here. I'd prefer to see `UnsupportedOperationException`. Let's keep this pattern matching for now.
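The diff hunk above is truncated at the `throw`; a completed sketch of the shape being kept might look like the following, where the message text is illustrative rather than the PR's exact wording:

```scala
def write(row: InternalRow): Unit = dataType match {
  case st: StructType =>
    writeObject(writeFields(row, st, rootFieldWriters))
  case _ => throw new UnsupportedOperationException(
    // Fail loudly with the root type instead of surfacing a ClassCastException.
    s"`JacksonGenerator` with schema $dataType cannot write an InternalRow")
}
```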
[GitHub] spark pull request #18875: [SPARK-21513][SQL] Allow UDF to_json support conv...
Github user goldmedal commented on a diff in the pull request: https://github.com/apache/spark/pull/18875#discussion_r137977643

--- Diff: sql/catalyst/src/test/scala/org/apache/spark/sql/catalyst/json/JacksonGeneratorSuite.scala ---
@@ -0,0 +1,106 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements. See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License. You may obtain a copy of the License at
+ *
+ *    http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.spark.sql.catalyst.json
+
+import java.io.CharArrayWriter
+
+import org.apache.spark.SparkFunSuite
+import org.apache.spark.sql.catalyst.InternalRow
+import org.apache.spark.sql.catalyst.util.{ArrayBasedMapData, DateTimeUtils, GenericArrayData}
+import org.apache.spark.sql.types._
+import org.apache.spark.unsafe.types.UTF8String
+
+class JacksonGeneratorSuite extends SparkFunSuite {
+
+  val gmtId = DateTimeUtils.TimeZoneGMT.getID
+  val option = new JSONOptions(Map.empty, gmtId)
+  val writer = new CharArrayWriter()
+
+  def getAndReset(gen: JacksonGenerator): UTF8String = {
+    gen.flush()
+    val json = writer.toString
+    writer.reset()
+    UTF8String.fromString(json)
+  }
+
+  test("initial with StructType and write out a row") {
+    val dataType = StructType(StructField("a", IntegerType) :: Nil)
+    val input = InternalRow(1)
+    val gen = new JacksonGenerator(dataType, writer, option)
+    gen.write(input)
+    assert(getAndReset(gen) === UTF8String.fromString("""{"a":1}"""))
--- End diff --

@viirya I think we also need to do `gen.flush()`, right? So maybe we can keep `getAndReset` and modify it as below. Would that be better?

```
def getAndReset(gen: JacksonGenerator, writer: Writer): String = {
  gen.flush()
  writer.toString
}
```
[GitHub] spark issue #19172: [SPARK-21856] Add probability and rawPrediction to MLPC ...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/19172

Merged build finished. Test PASSed.
[GitHub] spark issue #19172: [SPARK-21856] Add probability and rawPrediction to MLPC ...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/19172

**[Test build #81617 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/81617/testReport)** for PR 19172 at commit [`fac9c6e`](https://github.com/apache/spark/commit/fac9c6eeace6a5ea21a017cd40e09a1c8d44e5c7).

* This patch passes all tests.
* This patch merges cleanly.
* This patch adds no public classes.
[GitHub] spark pull request #18875: [SPARK-21513][SQL] Allow UDF to_json support conv...
Github user viirya commented on a diff in the pull request: https://github.com/apache/spark/pull/18875#discussion_r137977826

--- Diff: sql/catalyst/src/test/scala/org/apache/spark/sql/catalyst/json/JacksonGeneratorSuite.scala ---
@@ -0,0 +1,106 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements. See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License. You may obtain a copy of the License at
+ *
+ *    http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.spark.sql.catalyst.json
+
+import java.io.CharArrayWriter
+
+import org.apache.spark.SparkFunSuite
+import org.apache.spark.sql.catalyst.InternalRow
+import org.apache.spark.sql.catalyst.util.{ArrayBasedMapData, DateTimeUtils, GenericArrayData}
+import org.apache.spark.sql.types._
+import org.apache.spark.unsafe.types.UTF8String
+
+class JacksonGeneratorSuite extends SparkFunSuite {
+
+  val gmtId = DateTimeUtils.TimeZoneGMT.getID
+  val option = new JSONOptions(Map.empty, gmtId)
+  val writer = new CharArrayWriter()
+
+  def getAndReset(gen: JacksonGenerator): UTF8String = {
+    gen.flush()
+    val json = writer.toString
+    writer.reset()
+    UTF8String.fromString(json)
+  }
+
+  test("initial with StructType and write out a row") {
+    val dataType = StructType(StructField("a", IntegerType) :: Nil)
+    val input = InternalRow(1)
+    val gen = new JacksonGenerator(dataType, writer, option)
+    gen.write(input)
+    assert(getAndReset(gen) === UTF8String.fromString("""{"a":1}"""))
--- End diff --

Yeah, but it's just two lines; it's up to you whether to keep a function. At least it shouldn't be called `getAndReset` now.
[GitHub] spark issue #19172: [SPARK-21856] Add probability and rawPrediction to MLPC ...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/19172

**[Test build #81618 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/81618/testReport)** for PR 19172 at commit [`d2ccc1b`](https://github.com/apache/spark/commit/d2ccc1b5b9fc74563635a6fdc6153080661924c1).
[GitHub] spark issue #19172: [SPARK-21856] Add probability and rawPrediction to MLPC ...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/19172 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/81617/ Test PASSed. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark pull request #18875: [SPARK-21513][SQL] Allow UDF to_json support conv...
Github user viirya commented on a diff in the pull request: https://github.com/apache/spark/pull/18875#discussion_r137977281

--- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/json/JacksonGenerator.scala ---
@@ -193,14 +228,32 @@ private[sql] class JacksonGenerator(
    *
    * @param row The row to convert
    */
-  def write(row: InternalRow): Unit = writeObject(writeFields(row, schema, rootFieldWriters))
+  def write(row: InternalRow): Unit = dataType match {
+    case st: StructType =>
+      writeObject(writeFields(row, st, rootFieldWriters))
+    case _ => throw new UnsupportedOperationException(
--- End diff --

Do you mean `dataType` is now a map type but `writeFields`'s 2nd parameter is a `StructType`?

--- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark pull request #18875: [SPARK-21513][SQL] Allow UDF to_json support conv...
Github user goldmedal commented on a diff in the pull request: https://github.com/apache/spark/pull/18875#discussion_r137977153

--- Diff: sql/catalyst/src/test/scala/org/apache/spark/sql/catalyst/json/JacksonGeneratorSuite.scala ---
@@ -0,0 +1,106 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements. See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License. You may obtain a copy of the License at
+ *
+ *    http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.spark.sql.catalyst.json
+
+import java.io.CharArrayWriter
+
+import org.apache.spark.SparkFunSuite
+import org.apache.spark.sql.catalyst.InternalRow
+import org.apache.spark.sql.catalyst.util.{ArrayBasedMapData, DateTimeUtils, GenericArrayData}
+import org.apache.spark.sql.types._
+import org.apache.spark.unsafe.types.UTF8String
+
+class JacksonGeneratorSuite extends SparkFunSuite {
+
+  val gmtId = DateTimeUtils.TimeZoneGMT.getID
+  val option = new JSONOptions(Map.empty, gmtId)
+  val writer = new CharArrayWriter()
+  def getAndReset(gen: JacksonGenerator): UTF8String = {
+    gen.flush()
+    val json = writer.toString
+    writer.reset()
+    UTF8String.fromString(json)
+  }
+
+  test("initial with StructType and write out a row") {
+    val dataType = StructType(StructField("a", IntegerType) :: Nil)
+    val input = InternalRow(1)
+    val gen = new JacksonGenerator(dataType, writer, option)
+    gen.write(input)
+    assert(getAndReset(gen) === UTF8String.fromString("""{"a":1}"""))
--- End diff --

Yes, maybe we can just use `writer.toString` and assert against the plain string.

--- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark pull request #18875: [SPARK-21513][SQL] Allow UDF to_json support conv...
Github user goldmedal commented on a diff in the pull request: https://github.com/apache/spark/pull/18875#discussion_r137977056

--- Diff: sql/catalyst/src/test/scala/org/apache/spark/sql/catalyst/json/JacksonGeneratorSuite.scala ---
@@ -0,0 +1,106 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements. See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License. You may obtain a copy of the License at
+ *
+ *    http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.spark.sql.catalyst.json
+
+import java.io.CharArrayWriter
+
+import org.apache.spark.SparkFunSuite
+import org.apache.spark.sql.catalyst.InternalRow
+import org.apache.spark.sql.catalyst.util.{ArrayBasedMapData, DateTimeUtils, GenericArrayData}
+import org.apache.spark.sql.types._
+import org.apache.spark.unsafe.types.UTF8String
+
+class JacksonGeneratorSuite extends SparkFunSuite {
+
+  val gmtId = DateTimeUtils.TimeZoneGMT.getID
+  val option = new JSONOptions(Map.empty, gmtId)
+  val writer = new CharArrayWriter()
+  def getAndReset(gen: JacksonGenerator): UTF8String = {
+    gen.flush()
+    val json = writer.toString
+    writer.reset()
--- End diff --

Ok, I got it.

--- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark pull request #18875: [SPARK-21513][SQL] Allow UDF to_json support conv...
Github user goldmedal commented on a diff in the pull request: https://github.com/apache/spark/pull/18875#discussion_r137976953

--- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/json/JacksonGenerator.scala ---
@@ -193,14 +228,32 @@ private[sql] class JacksonGenerator(
    *
    * @param row The row to convert
    */
-  def write(row: InternalRow): Unit = writeObject(writeFields(row, schema, rootFieldWriters))
+  def write(row: InternalRow): Unit = dataType match {
+    case st: StructType =>
+      writeObject(writeFields(row, st, rootFieldWriters))
+    case _ => throw new UnsupportedOperationException(
--- End diff --

@viirya I found that if we don't check it here, a `ClassCastException` is thrown from `writeObject()` first. So maybe we don't need to check the type in `rootFieldWriters`?

--- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #9183: [SPARK-11215] [ML] Add multiple columns support to String...
Github user minixalpha commented on the issue: https://github.com/apache/spark/pull/9183 @WeichenXu123 Any update on the new PR? --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #19172: [SPARK-21856] Add probability and rawPrediction to MLPC ...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/19172 **[Test build #81617 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/81617/testReport)** for PR 19172 at commit [`fac9c6e`](https://github.com/apache/spark/commit/fac9c6eeace6a5ea21a017cd40e09a1c8d44e5c7). --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #18592: [SPARK-21368][SQL] TPCDSQueryBenchmark can't refer query...
Github user maropu commented on the issue: https://github.com/apache/spark/pull/18592 thanks, I'll make a pr later. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #17014: [SPARK-18608][ML] Fix double-caching in ML algorithms
Github user smurching commented on the issue: https://github.com/apache/spark/pull/17014 Hi @zhengruifeng, thanks for your work on this! Now that we're introducing a new handlePersistence parameter (a new public API), it'd be good to track work in a separate JIRA/PR as @jkbradley suggested so others are aware of the proposed change. I've created a new JIRA ticket for adding the handlePersistence param here: [SPARK-21972](https://issues.apache.org/jira/browse/SPARK-21972). Would you mind resubmitting your work as a new PR that addresses the new JIRA ticket (SPARK-21972)? Thanks & sorry for the inconvenience! --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #19185: [Spark-21854] Added LogisticRegressionTrainingSummary fo...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/19185 Merged build finished. Test PASSed. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #19185: [Spark-21854] Added LogisticRegressionTrainingSummary fo...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/19185 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/81615/ Test PASSed. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #19185: [Spark-21854] Added LogisticRegressionTrainingSummary fo...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/19185 **[Test build #81615 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/81615/testReport)** for PR 19185 at commit [`53ac68e`](https://github.com/apache/spark/commit/53ac68e24731e405d44d50be70b1c25f2ab9df3d). * This patch passes all tests. * This patch merges cleanly. * This patch adds no public classes. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark pull request #18875: [SPARK-21513][SQL] Allow UDF to_json support conv...
Github user viirya commented on a diff in the pull request: https://github.com/apache/spark/pull/18875#discussion_r137976262

--- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/jsonExpressions.scala ---
@@ -677,14 +696,27 @@ case class StructsToJson(
   override def checkInputDataTypes(): TypeCheckResult = child.dataType match {
     case _: StructType | ArrayType(_: StructType, _) =>
       try {
-        JacksonUtils.verifySchema(rowSchema)
+        JacksonUtils.verifySchema(rowSchema.asInstanceOf[StructType])
+        TypeCheckResult.TypeCheckSuccess
+      } catch {
+        case e: UnsupportedOperationException =>
+          TypeCheckResult.TypeCheckFailure(e.getMessage)
+      }
+    case ArrayType(_: MapType, _) =>
+      TypeCheckResult.TypeCheckSuccess
+    case MapType(_: DataType, st: StructType, _: Boolean) =>
+      try {
+        JacksonUtils.verifySchema(st)
         TypeCheckResult.TypeCheckSuccess
       } catch {
         case e: UnsupportedOperationException =>
           TypeCheckResult.TypeCheckFailure(e.getMessage)
       }
+    case _: MapType =>
--- End diff --

We need to revise `JacksonUtils` to check map type too.

--- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark pull request #18875: [SPARK-21513][SQL] Allow UDF to_json support conv...
Github user viirya commented on a diff in the pull request: https://github.com/apache/spark/pull/18875#discussion_r137976065

--- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/jsonExpressions.scala ---
@@ -677,14 +696,27 @@ case class StructsToJson(
   override def checkInputDataTypes(): TypeCheckResult = child.dataType match {
     case _: StructType | ArrayType(_: StructType, _) =>
       try {
-        JacksonUtils.verifySchema(rowSchema)
+        JacksonUtils.verifySchema(rowSchema.asInstanceOf[StructType])
+        TypeCheckResult.TypeCheckSuccess
+      } catch {
+        case e: UnsupportedOperationException =>
+          TypeCheckResult.TypeCheckFailure(e.getMessage)
+      }
+    case ArrayType(_: MapType, _) =>
+      TypeCheckResult.TypeCheckSuccess
+    case MapType(_: DataType, st: StructType, _: Boolean) =>
+      try {
+        JacksonUtils.verifySchema(st)
         TypeCheckResult.TypeCheckSuccess
       } catch {
         case e: UnsupportedOperationException =>
          TypeCheckResult.TypeCheckFailure(e.getMessage)
       }
+    case _: MapType =>
--- End diff --

Not all map types are valid for JSON. We still need to verify the type.

--- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
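To make the reviewers' point concrete: `to_json` support hinges on every reachable value type having a JSON writer, so the schema check has to recurse into map value types as well. Below is a hedged sketch of such a check — `JacksonUtilsSketch` and its `MapType` overload are hypothetical illustrations of the suggestion, not the code that was merged:

```scala
import org.apache.spark.sql.types._

object JacksonUtilsSketch {
  // Stand-in for the existing per-field check: reject any type that the
  // JSON generator has no writer for, recursing through nested types.
  private def verifyType(name: String, dataType: DataType): Unit = dataType match {
    case NullType | BooleanType | ByteType | ShortType | IntegerType | LongType |
         FloatType | DoubleType | StringType | TimestampType | DateType |
         BinaryType | _: DecimalType => // supported leaf types
    case st: StructType => st.foreach(f => verifyType(f.name, f.dataType))
    case at: ArrayType  => verifyType(name, at.elementType)
    case mt: MapType    => verifyType(name, mt.valueType)
    case _ => throw new UnsupportedOperationException(
      s"Unable to convert column $name of type ${dataType.simpleString} to JSON.")
  }

  def verifySchema(schema: StructType): Unit =
    schema.foreach(field => verifyType(field.name, field.dataType))

  // Hypothetical overload: a map root is valid for JSON when its value type is.
  def verifySchema(schema: MapType): Unit = verifyType("root", schema.valueType)
}
```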
[GitHub] spark pull request #19172: [SPARK-21856] Add probability and rawPrediction t...
Github user yanboliang commented on a diff in the pull request: https://github.com/apache/spark/pull/19172#discussion_r137975691

--- Diff: python/pyspark/ml/tests.py ---
@@ -1655,6 +1655,26 @@ def test_multinomial_logistic_regression_with_bound(self):
             np.allclose(model.interceptVector.toArray(), [-0.9057, -1.1392, -0.0033], atol=1E-4))
 
+class MultilayerPerceptronClassifierTest(SparkSessionTestCase):
+
+    def test_multilayer_perceptron_classifier(self):
--- End diff --

Rename to ```test_raw_and_probability_prediction```?

--- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark pull request #18875: [SPARK-21513][SQL] Allow UDF to_json support conv...
Github user goldmedal commented on a diff in the pull request: https://github.com/apache/spark/pull/18875#discussion_r137975716

--- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/json/JacksonGenerator.scala ---
@@ -193,14 +228,32 @@ private[sql] class JacksonGenerator(
    *
    * @param row The row to convert
    */
-  def write(row: InternalRow): Unit = writeObject(writeFields(row, schema, rootFieldWriters))
+  def write(row: InternalRow): Unit = dataType match {
+    case st: StructType =>
+      writeObject(writeFields(row, st, rootFieldWriters))
+    case _ => throw new UnsupportedOperationException(
--- End diff --

Ok, I'll remove it.

--- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark pull request #18875: [SPARK-21513][SQL] Allow UDF to_json support conv...
Github user viirya commented on a diff in the pull request: https://github.com/apache/spark/pull/18875#discussion_r137975624

--- Diff: sql/catalyst/src/test/scala/org/apache/spark/sql/catalyst/json/JacksonGeneratorSuite.scala ---
@@ -0,0 +1,106 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements. See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License. You may obtain a copy of the License at
+ *
+ *    http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.spark.sql.catalyst.json
+
+import java.io.CharArrayWriter
+
+import org.apache.spark.SparkFunSuite
+import org.apache.spark.sql.catalyst.InternalRow
+import org.apache.spark.sql.catalyst.util.{ArrayBasedMapData, DateTimeUtils, GenericArrayData}
+import org.apache.spark.sql.types._
+import org.apache.spark.unsafe.types.UTF8String
+
+class JacksonGeneratorSuite extends SparkFunSuite {
+
+  val gmtId = DateTimeUtils.TimeZoneGMT.getID
+  val option = new JSONOptions(Map.empty, gmtId)
+  val writer = new CharArrayWriter()
+  def getAndReset(gen: JacksonGenerator): UTF8String = {
+    gen.flush()
+    val json = writer.toString
+    writer.reset()
--- End diff --

I'd rather tests not depend on each other. I think we can just have a writer for each test, instead of reusing one.

--- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #18592: [SPARK-21368][SQL] TPCDSQueryBenchmark can't refer query...
Github user gatorsmile commented on the issue: https://github.com/apache/spark/pull/18592 Also sounds good to me. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark pull request #18875: [SPARK-21513][SQL] Allow UDF to_json support conv...
Github user viirya commented on a diff in the pull request: https://github.com/apache/spark/pull/18875#discussion_r137975414

--- Diff: sql/catalyst/src/test/scala/org/apache/spark/sql/catalyst/json/JacksonGeneratorSuite.scala ---
@@ -0,0 +1,106 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements. See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License. You may obtain a copy of the License at
+ *
+ *    http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.spark.sql.catalyst.json
+
+import java.io.CharArrayWriter
+
+import org.apache.spark.SparkFunSuite
+import org.apache.spark.sql.catalyst.InternalRow
+import org.apache.spark.sql.catalyst.util.{ArrayBasedMapData, DateTimeUtils, GenericArrayData}
+import org.apache.spark.sql.types._
+import org.apache.spark.unsafe.types.UTF8String
+
+class JacksonGeneratorSuite extends SparkFunSuite {
+
+  val gmtId = DateTimeUtils.TimeZoneGMT.getID
+  val option = new JSONOptions(Map.empty, gmtId)
+  val writer = new CharArrayWriter()
+  def getAndReset(gen: JacksonGenerator): UTF8String = {
+    gen.flush()
+    val json = writer.toString
+    writer.reset()
+    UTF8String.fromString(json)
+  }
+
+  test("initial with StructType and write out a row") {
+    val dataType = StructType(StructField("a", IntegerType) :: Nil)
+    val input = InternalRow(1)
+    val gen = new JacksonGenerator(dataType, writer, option)
+    gen.write(input)
+    assert(getAndReset(gen) === UTF8String.fromString("""{"a":1}"""))
--- End diff --

Can we simply compare the strings? I.e., we don't need to do `UTF8String.fromString` in `getAndReset` and here.

--- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
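Taking the two suggestions from this sub-thread together — a fresh `CharArrayWriter` per test, and plain-string assertions with no `UTF8String` round-trip — a test could end up looking like the sketch below. It reuses the imports and the `gmtId` value from the quoted suite, so treat it as an illustration of the review feedback rather than the final test code:

```scala
test("initial with StructType and write out a row") {
  // Fresh writer per test: no shared state, no reset bookkeeping.
  val writer = new CharArrayWriter()
  val dataType = StructType(StructField("a", IntegerType) :: Nil)
  val gen = new JacksonGenerator(dataType, writer, new JSONOptions(Map.empty, gmtId))
  gen.write(InternalRow(1))
  gen.flush()
  // Compare plain strings; the helper function is gone entirely.
  assert(writer.toString === """{"a":1}""")
}
```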
[GitHub] spark issue #18875: [SPARK-21513][SQL] Allow UDF to_json support converting ...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/18875 **[Test build #81616 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/81616/testReport)** for PR 18875 at commit [`b8bf38c`](https://github.com/apache/spark/commit/b8bf38cda5f3b03473f5bd1f1c4c8599e342b869). --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #19185: [Spark-21854] Added LogisticRegressionTrainingSummary fo...
Github user yanboliang commented on the issue: https://github.com/apache/spark/pull/19185 @gatorsmile Thanks. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #19185: [Spark-21854] Added LogisticRegressionTrainingSummary fo...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/19185 **[Test build #81615 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/81615/testReport)** for PR 19185 at commit [`53ac68e`](https://github.com/apache/spark/commit/53ac68e24731e405d44d50be70b1c25f2ab9df3d). --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #19185: [Spark-21854] Added LogisticRegressionTrainingSummary fo...
Github user gatorsmile commented on the issue: https://github.com/apache/spark/pull/19185 ok to test --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #19185: [Spark-21854] Added LogisticRegressionTrainingSummary fo...
Github user gatorsmile commented on the issue: https://github.com/apache/spark/pull/19185 @yanboliang --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark pull request #18875: [SPARK-21513][SQL] Allow UDF to_json support conv...
Github user viirya commented on a diff in the pull request: https://github.com/apache/spark/pull/18875#discussion_r137975214

--- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/json/JacksonGenerator.scala ---
@@ -193,14 +228,30 @@ private[sql] class JacksonGenerator(
    *
    * @param row The row to convert
    */
-  def write(row: InternalRow): Unit = writeObject(writeFields(row, schema, rootFieldWriters))
+  def write(row: InternalRow): Unit = dataType match {
+    case st: StructType =>
+      writeObject(writeFields(row, st, rootFieldWriters))
+    case _ => throw new UnsupportedOperationException(
+      s"`JacksonGenerator` can only be used to write out a row when initialized with `StructType`.")
+  }
 
   /**
-   * Transforms multiple `InternalRow`s to JSON array using Jackson
+   * Transforms multiple `InternalRow`s or `MapData`s to JSON array using Jackson
    *
-   * @param array The array of rows to convert
+   * @param array The array of rows or maps to convert
    */
   def write(array: ArrayData): Unit = writeArray(writeArrayData(array, arrElementWriter))
 
+  /**
+   * Transforms a `MapData` to JSON object using Jackson
+   *
+   * @param map a map to convert
+   */
+  def write(map: MapData): Unit = dataType match {
+    case mt: MapType => writeObject(writeMapData(map, mt, mapElementWriter))
+    case _ => throw new UnsupportedOperationException(
--- End diff --

Similar to https://github.com/apache/spark/pull/18875/files#r137975134, we can avoid this. Doing pattern matching on every write seems like too much of a burden.

--- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark pull request #18875: [SPARK-21513][SQL] Allow UDF to_json support conv...
Github user viirya commented on a diff in the pull request: https://github.com/apache/spark/pull/18875#discussion_r137975134

--- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/json/JacksonGenerator.scala ---
@@ -193,14 +228,32 @@ private[sql] class JacksonGenerator(
    *
    * @param row The row to convert
    */
-  def write(row: InternalRow): Unit = writeObject(writeFields(row, schema, rootFieldWriters))
+  def write(row: InternalRow): Unit = dataType match {
+    case st: StructType =>
+      writeObject(writeFields(row, st, rootFieldWriters))
+    case _ => throw new UnsupportedOperationException(
--- End diff --

Rethinking this, I think we can avoid matching on the data type and simply do `writeObject(writeFields(row, schema, rootFieldWriters))` like before. An exception is then thrown when `rootFieldWriters` is accessed.

--- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
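A condensed sketch of the shape being suggested in this sub-thread: the type check moves out of the `write` methods and into lazily initialized writers, so the match runs once on first access instead of on every call, while a wrong initialization type still fails fast with a clear error. The class body below is a simplified stand-in (the real `ValueWriter`s and `makeWriter` are elided), not the merged implementation:

```scala
import org.apache.spark.sql.catalyst.InternalRow
import org.apache.spark.sql.catalyst.util.MapData
import org.apache.spark.sql.types._

class JacksonGeneratorSketch(dataType: DataType) {
  type ValueWriter = Any => Unit

  private def makeWriter(dt: DataType): ValueWriter = _ => () // elided for brevity

  // Matched once, on first access; write(row) itself stays match-free.
  private lazy val rootFieldWriters: Array[ValueWriter] = dataType match {
    case st: StructType => st.map(_.dataType).map(makeWriter).toArray
    case _ => throw new UnsupportedOperationException(
      s"Initial type ${dataType.catalogString} must be a struct")
  }

  private lazy val mapElementWriter: ValueWriter = dataType match {
    case mt: MapType => makeWriter(mt.valueType)
    case _ => throw new UnsupportedOperationException(
      s"Initial type ${dataType.catalogString} must be a map")
  }

  def write(row: InternalRow): Unit = {
    val writers = rootFieldWriters // throws here if initialized with a non-struct
    // ... write each field of `row` with `writers` ...
  }

  def write(map: MapData): Unit = {
    val writer = mapElementWriter // throws here if initialized with a non-map
    // ... write the entries of `map` with `writer` ...
  }
}
```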
[GitHub] spark issue #19185: [Spark-21854] Added LogisticRegressionTrainingSummary fo...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/19185 Can one of the admins verify this patch? --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark pull request #19185: [Spark-21854] Added LogisticRegressionTrainingSum...
GitHub user jmwdpk opened a pull request:

    https://github.com/apache/spark/pull/19185

    [Spark-21854] Added LogisticRegressionTrainingSummary for MultinomialLogisticRegression in Python API

## What changes were proposed in this pull request?

Added LogisticRegressionTrainingSummary for MultinomialLogisticRegression in the Python API.

## How was this patch tested?

Added a unit test.

Please review http://spark.apache.org/contributing.html before opening a pull request.

You can merge this pull request into a Git repository by running:

    $ git pull https://github.com/jmwdpk/spark SPARK-21854

Alternatively you can review and apply these changes as the patch at:

    https://github.com/apache/spark/pull/19185.patch

To close this pull request, make a commit to your master/trunk branch with (at least) the following in the commit message:

    This closes #19185

commit 50cfafe9e0cda8702fa43fd27ea75bfbd042268e
Author: Ming Jiang
Date: 2017-09-04T04:22:33Z

    added probabilityCol to LogisticRegressionSummary

commit 60579d5f36ef26f6e3ec675a795ccc86d6a71c8d
Author: Ming Jiang
Date: 2017-09-05T05:08:15Z

    modified LogisticRegressionSummary and LogisticRegressionModel in classification.py

commit 1a73e6c6d20d6374379ec2fb237e7f596e77bc62
Author: Ming Jiang
Date: 2017-09-07T22:39:08Z

    test on multiclass summary

commit 53ac68e24731e405d44d50be70b1c25f2ab9df3d
Author: Ming Jiang
Date: 2017-09-08T05:53:47Z

    fixed numClasses

--- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark pull request #18875: [SPARK-21513][SQL] Allow UDF to_json support conv...
Github user viirya commented on a diff in the pull request: https://github.com/apache/spark/pull/18875#discussion_r137974036

--- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/json/JacksonGenerator.scala ---
@@ -193,14 +228,32 @@ private[sql] class JacksonGenerator(
    *
    * @param row The row to convert
    */
-  def write(row: InternalRow): Unit = writeObject(writeFields(row, schema, rootFieldWriters))
+  def write(row: InternalRow): Unit = dataType match {
+    case st: StructType =>
+      writeObject(writeFields(row, st, rootFieldWriters))
+    case _ => throw new UnsupportedOperationException(
+      s"`JacksonGenerator` can only be used to write out a row when initialized with `StructType`.")
+  }
 
   /**
-   * Transforms multiple `InternalRow`s to JSON array using Jackson
+   * Transforms multiple `InternalRow`s or `MapData`s to JSON array using Jackson
    *
-   * @param array The array of rows to convert
+   * @param array The array of rows or maps to convert
    */
-  def write(array: ArrayData): Unit = writeArray(writeArrayData(array, arrElementWriter))
+  def write(array: ArrayData): Unit = {
+    writeArray(writeArrayData(array, arrElementWriter))
+  }
--- End diff --

We don't need to change this line.

--- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark pull request #19184: [SPARK-21971][CORE] Too many open files in Spark ...
Github user rajeshbalamohan commented on a diff in the pull request: https://github.com/apache/spark/pull/19184#discussion_r137973976

--- Diff: core/src/main/java/org/apache/spark/util/collection/unsafe/sort/UnsafeSorterSpillReader.java ---
@@ -104,6 +124,10 @@ public void loadNext() throws IOException {
     if (taskContext != null) {
       taskContext.killTaskIfInterrupted();
     }
+    if (this.din == null) {
+      // Good time to init (if all files are opened, we can get Too Many files exception)
+      initStreams();
+    }
--- End diff --

Good point. The PR has been tried with queries involving window functions (e.g. Q67), for which it worked fine. During spill merges (especially `getSortedIterator`), it is still possible to hit the too-many-open-files issue.

--- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
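Restating the hunk above as a self-contained sketch (in Scala for consistency with the other examples here, though the real `UnsafeSorterSpillReader` is Java, and the class and method names below are hypothetical): the file descriptor is opened on the first `loadNext` rather than at construction, so a wide merge does not pin every spill file open up front.

```scala
import java.io.{BufferedInputStream, DataInputStream, File, FileInputStream}

class LazySpillReaderSketch(file: File) {
  private var din: DataInputStream = _

  // Open the descriptor on first read instead of in the constructor, so
  // readers created for a k-way merge don't exhaust the open-file limit
  // before any of them is actually consumed.
  private def initStreams(): Unit = {
    din = new DataInputStream(new BufferedInputStream(new FileInputStream(file)))
  }

  def loadNext(): Int = {
    if (din == null) initStreams()
    din.read()
  }

  def close(): Unit = if (din != null) din.close()
}
```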
[GitHub] spark pull request #18875: [SPARK-21513][SQL] Allow UDF to_json support conv...
Github user viirya commented on a diff in the pull request: https://github.com/apache/spark/pull/18875#discussion_r137973919

--- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/json/JacksonGenerator.scala ---
@@ -26,20 +26,55 @@ import org.apache.spark.sql.catalyst.expressions.SpecializedGetters
 import org.apache.spark.sql.catalyst.util.{ArrayData, DateTimeUtils, MapData}
 import org.apache.spark.sql.types._
 
+/**
+ * `JackGenerator` can only be initialized with a `StructType` or a `MapType`.
+ * Once it is initialized with `StructType`, it can be used to write out a struct or an array of
+ * struct. Once it is initialized with `MapType`, it can be used to write out a map. An exception
--- End diff --

`Once it is initialized with ``MapType``, it can be used to write out a map or an array of map.`

--- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #19174: [SPARK-21963][CORE][TEST]Create temp file should be dele...
Github user heary-cao commented on the issue: https://github.com/apache/spark/pull/19174 @kiszk Thank you for the review. I checked the other test cases that use `File.createTempFile` and `Files.createTempFile`. Except for the ones described in this PR, the other test cases already use `File.deleteOnExit` or `File.delete`. Thanks. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
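For readers skimming the thread, the cleanup pattern being discussed is simply this (a minimal sketch combining both mechanisms; the prefix, suffix, and variable names are arbitrary):

```scala
import java.io.File

val tmp = File.createTempFile("spark-test", ".tmp")
tmp.deleteOnExit() // fallback: the file is removed when the JVM exits
try {
  // ... exercise the code under test with `tmp` ...
} finally {
  tmp.delete() // eager cleanup as soon as the test finishes
}
```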