[GitHub] spark pull request #16451: [SPARK-18922][SQL][CORE][STREAMING][TESTS] Fix al...

2017-01-04 Thread HyukjinKwon
Github user HyukjinKwon commented on a diff in the pull request:

https://github.com/apache/spark/pull/16451#discussion_r94576403
  
--- Diff: 
external/kafka-0-10-sql/src/test/scala/org/apache/spark/sql/kafka010/KafkaTestUtils.scala
 ---
@@ -138,10 +139,15 @@ class KafkaTestUtils extends Logging {
 
     if (server != null) {
       server.shutdown()
+      server.awaitShutdown()
       server = null
     }
 
-    brokerConf.logDirs.foreach { f => Utils.deleteRecursively(new File(f)) }
+    // On Windows, `logDirs` is left open even after the Kafka server above is completely
+    // shut down in some cases. It leads to test failures on Windows if these are not ignored.
+    brokerConf.logDirs.map(new File(_))
+      .filterNot(FileUtils.deleteQuietly)
--- End diff --

Oh, your comment didn't show up when I was writing the comment.



[GitHub] spark pull request #16451: [SPARK-18922][SQL][CORE][STREAMING][TESTS] Fix al...

2017-01-04 Thread HyukjinKwon
Github user HyukjinKwon commented on a diff in the pull request:

https://github.com/apache/spark/pull/16451#discussion_r94576764
  
--- Diff: 
external/kafka-0-10-sql/src/test/scala/org/apache/spark/sql/kafka010/KafkaTestUtils.scala
 ---
@@ -138,10 +139,15 @@ class KafkaTestUtils extends Logging {
 
     if (server != null) {
       server.shutdown()
+      server.awaitShutdown()
       server = null
     }
 
-    brokerConf.logDirs.foreach { f => Utils.deleteRecursively(new File(f)) }
+    // On Windows, `logDirs` is left open even after the Kafka server above is completely
+    // shut down in some cases. It leads to test failures on Windows if these are not ignored.
+    brokerConf.logDirs.map(new File(_))
+      .filterNot(FileUtils.deleteQuietly)
--- End diff --

It seems to be a Windows-specific problem: Windows holds an exclusive lock on open files, so if anything fails to close a file, it cannot be deleted or renamed, unlike on Linux or macOS. I assume it could be deleted if we called `System.gc()`, though.
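
To make this concrete, here is a minimal sketch of the teardown pattern being discussed, assuming Apache Commons IO is on the test classpath; the helper name and the warning callback are made up for illustration:

```scala
import java.io.File

import org.apache.commons.io.FileUtils

// Try to delete each Kafka log directory, but only warn (instead of failing the
// test) when Windows still holds a lock on one of the files.
// `FileUtils.deleteQuietly` returns false when the deletion did not succeed.
def cleanupLogDirs(logDirs: Seq[String], logWarning: String => Unit): Unit = {
  logDirs.map(new File(_))
    .filterNot(FileUtils.deleteQuietly)
    .foreach(dir => logWarning(s"Failed to delete $dir; a process may still hold a lock on it."))
}
```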



[GitHub] spark pull request #16451: [SPARK-18922][SQL][CORE][STREAMING][TESTS] Fix al...

2017-01-04 Thread HyukjinKwon
Github user HyukjinKwon commented on a diff in the pull request:

https://github.com/apache/spark/pull/16451#discussion_r94577026
  
--- Diff: 
external/kafka-0-10-sql/src/test/scala/org/apache/spark/sql/kafka010/KafkaTestUtils.scala
 ---
@@ -138,10 +139,15 @@ class KafkaTestUtils extends Logging {
 
     if (server != null) {
       server.shutdown()
+      server.awaitShutdown()
       server = null
     }
 
-    brokerConf.logDirs.foreach { f => Utils.deleteRecursively(new File(f)) }
+    // On Windows, `logDirs` is left open even after the Kafka server above is completely
+    // shut down in some cases. It leads to test failures on Windows if these are not ignored.
+    brokerConf.logDirs.map(new File(_))
+      .filterNot(FileUtils.deleteQuietly)
--- End diff --

So, I _guess_ it can't be deleted until the object holding the lock is 
actually garbage-collected.



[GitHub] spark pull request #16451: [SPARK-18922][SQL][CORE][STREAMING][TESTS] Fix al...

2017-01-04 Thread HyukjinKwon
Github user HyukjinKwon commented on a diff in the pull request:

https://github.com/apache/spark/pull/16451#discussion_r94577735
  
--- Diff: 
external/kafka-0-10-sql/src/test/scala/org/apache/spark/sql/kafka010/KafkaTestUtils.scala
 ---
@@ -138,10 +139,15 @@ class KafkaTestUtils extends Logging {
 
     if (server != null) {
       server.shutdown()
+      server.awaitShutdown()
       server = null
     }
 
-    brokerConf.logDirs.foreach { f => Utils.deleteRecursively(new File(f)) }
+    // On Windows, `logDirs` is left open even after the Kafka server above is completely
+    // shut down in some cases. It leads to test failures on Windows if these are not ignored.
+    brokerConf.logDirs.map(new File(_))
+      .filterNot(FileUtils.deleteQuietly)
--- End diff --

Yes, it seems the contents should always be deleted first.



[GitHub] spark issue #16451: [SPARK-18922][SQL][CORE][STREAMING][TESTS] Fix all ident...

2017-01-04 Thread HyukjinKwon
Github user HyukjinKwon commented on the issue:

https://github.com/apache/spark/pull/16451
  
retest this please



[GitHub] spark issue #16451: [SPARK-18922][SQL][CORE][STREAMING][TESTS] Fix all ident...

2017-01-04 Thread HyukjinKwon
Github user HyukjinKwon commented on the issue:

https://github.com/apache/spark/pull/16451
  
I really don't think using the full path makes `HiveSparkSubmitSuite.scala` flaky, but if it keeps failing frequently, let me use the original code path on OSes other than Windows.



[GitHub] spark issue #16429: [SPARK-19019][PYTHON] Fix hijacked `collections.namedtup...

2017-01-04 Thread HyukjinKwon
Github user HyukjinKwon commented on the issue:

https://github.com/apache/spark/pull/16429
  
Could you check whether this patch is applied properly? That error is exactly what this PR fixes, and the line numbers in the error do not match the ones in this PR.



[GitHub] spark issue #16451: [SPARK-18922][SQL][CORE][STREAMING][TESTS] Fix all ident...

2017-01-04 Thread HyukjinKwon
Github user HyukjinKwon commented on the issue:

https://github.com/apache/spark/pull/16451
  
retest this please



[GitHub] spark issue #16451: [SPARK-18922][SQL][CORE][STREAMING][TESTS] Fix all ident...

2017-01-04 Thread HyukjinKwon
Github user HyukjinKwon commented on the issue:

https://github.com/apache/spark/pull/16451
  
retest this please



[GitHub] spark issue #16451: [SPARK-18922][SQL][CORE][STREAMING][TESTS] Fix all ident...

2017-01-04 Thread HyukjinKwon
Github user HyukjinKwon commented on the issue:

https://github.com/apache/spark/pull/16451
  
retest this please



[GitHub] spark issue #16451: [SPARK-18922][SQL][CORE][STREAMING][TESTS] Fix all ident...

2017-01-04 Thread HyukjinKwon
Github user HyukjinKwon commented on the issue:

https://github.com/apache/spark/pull/16451
  
retest this please



[GitHub] spark issue #16451: [SPARK-18922][SQL][CORE][STREAMING][TESTS] Fix all ident...

2017-01-04 Thread HyukjinKwon
Github user HyukjinKwon commented on the issue:

https://github.com/apache/spark/pull/16451
  
I just checked that every test except the one below passes on Windows; see the concatenated [full-log](https://gist.github.com/HyukjinKwon/2d199ac9156c380015ad5a71f77866be).

It seems `DirectKafkaStreamSuite` is flaky due to an occasional failure to remove the created temp directory (the same problem as https://github.com/apache/spark/pull/16451#discussion_r94562756, but that one fails consistently).

It sometimes passes [![PR-16451](https://ci.appveyor.com/api/projects/status/github/spark-test/spark?branch=A7615F8B-58B0-4D9B-A914-32E7BF7DCB65&svg=true)](https://ci.appveyor.com/project/spark-test/spark/branch/A7615F8B-58B0-4D9B-A914-32E7BF7DCB65) but sometimes fails as below:

```
DirectKafkaStreamSuite:
 Exception encountered when attempting to run a suite with class name: org.apache.spark.streaming.kafka010.DirectKafkaStreamSuite *** ABORTED *** (59 seconds, 626 milliseconds)
   java.io.IOException: Failed to delete: C:\projects\spark\target\tmp\spark-426107da-68cf-4d94-b0d6-1f428f1c53f6
```
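
For reference, a minimal sketch of how such a teardown could tolerate the deletion failure on Windows; the helper is hypothetical and only illustrates the idea of logging instead of aborting the suite:

```scala
import java.io.{File, IOException}

import org.apache.spark.util.Utils

// Hypothetical best-effort cleanup: on Windows the temp directory may still be
// locked by the just-stopped broker, so a deletion failure is logged rather than
// rethrown (which would abort the whole suite).
def deleteTempDirBestEffort(dir: File): Unit = {
  try {
    Utils.deleteRecursively(dir)
  } catch {
    case e: IOException if Utils.isWindows =>
      println(s"Ignoring failure to delete $dir on Windows: ${e.getMessage}")
  }
}
```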



[GitHub] spark pull request #13252: [SPARK-15473][SQL] CSV data source writes header ...

2017-01-04 Thread HyukjinKwon
Github user HyukjinKwon closed the pull request at:

https://github.com/apache/spark/pull/13252



[GitHub] spark issue #13252: [SPARK-15473][SQL] CSV data source writes header for emp...

2017-01-04 Thread HyukjinKwon
Github user HyukjinKwon commented on the issue:

https://github.com/apache/spark/pull/13252
  
Let me suggest a generalized approach later because this does not look like a clean fix.



[GitHub] spark pull request #13988: [SPARK-16101][SQL] Refactoring CSV data source to...

2017-01-05 Thread HyukjinKwon
Github user HyukjinKwon commented on a diff in the pull request:

https://github.com/apache/spark/pull/13988#discussion_r94754860
  
--- Diff: 
sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/csv/CSVFileFormat.scala
 ---
@@ -228,3 +150,35 @@ class CSVFileFormat extends TextBasedFileFormat with 
DataSourceRegister {
 schema.foreach(field => verifyType(field.dataType))
   }
 }
+
--- End diff --

These just came from `CSVRelation`.



[GitHub] spark pull request #13988: [SPARK-16101][SQL] Refactoring CSV data source to...

2017-01-05 Thread HyukjinKwon
Github user HyukjinKwon commented on a diff in the pull request:

https://github.com/apache/spark/pull/13988#discussion_r94755002
  
--- Diff: 
sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/csv/CSVInferSchema.scala
 ---
@@ -39,22 +38,43 @@ private[csv] object CSVInferSchema {
* 3. Replace any null types with string type
*/
   def infer(
--- End diff --

This is kind of an important change that introduces functionality similar to JSON's (e.g., creating a DataFrame from `RDD[String]` or `Dataset[String]`).
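
As a rough illustration of the JSON-style usage this change works toward; note that the `Dataset[String]`-based reader overloads shown here only exist in later Spark releases, so this is a sketch of the goal rather than of this PR's API:

```scala
import org.apache.spark.sql.{Dataset, SparkSession}

val spark = SparkSession.builder().master("local[*]").appName("csv-from-dataset").getOrCreate()
import spark.implicits._

// JSON already supports building a DataFrame straight from a Dataset[String].
val jsonLines: Dataset[String] = Seq("""{"a": 1, "b": "x"}""").toDS()
val fromJson = spark.read.json(jsonLines)
fromJson.show()

// The analogous CSV path that the refactored inference/parsing code enables:
val csvLines: Dataset[String] = Seq("a,b", "1,x").toDS()
// val fromCsv = spark.read.option("header", "true").csv(csvLines)
```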



[GitHub] spark pull request #13988: [SPARK-16101][SQL] Refactoring CSV data source to...

2017-01-05 Thread HyukjinKwon
Github user HyukjinKwon commented on a diff in the pull request:

https://github.com/apache/spark/pull/13988#discussion_r94755096
  
--- Diff: 
sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/csv/CSVInferSchema.scala
 ---
@@ -214,127 +234,47 @@ private[csv] object CSVInferSchema {
 
 case _ => None
   }
-}
-
-private[csv] object CSVTypeCast {
-  // A `ValueConverter` is responsible for converting the given value to a 
desired type.
-  private type ValueConverter = String => Any
 
   /**
-   * Create converters which cast each given string datum to each 
specified type in given schema.
-   * Currently, we do not support complex types (`ArrayType`, `MapType`, 
`StructType`).
-   *
-   * For string types, this is simply the datum.
-   * For other types, this is converted into the value according to the 
type.
-   * For other nullable types, returns null if it is null or equals to the 
value specified
-   * in `nullValue` option.
-   *
-   * @param schema schema that contains data types to cast the given value 
into.
-   * @param options CSV options.
+   * Generates a header from the given row which is null-safe and 
duplicate-safe.
*/
-  def makeConverters(
-  schema: StructType,
-  options: CSVOptions = CSVOptions()): Array[ValueConverter] = {
-schema.map(f => makeConverter(f.name, f.dataType, f.nullable, 
options)).toArray
-  }
-
-  /**
-   * Create a converter which converts the string value to a value 
according to a desired type.
-   */
-  def makeConverter(
-   name: String,
-   dataType: DataType,
-   nullable: Boolean = true,
-   options: CSVOptions = CSVOptions()): ValueConverter = dataType 
match {
-case _: ByteType => (d: String) =>
-  nullSafeDatum(d, name, nullable, options)(_.toByte)
-
-case _: ShortType => (d: String) =>
-  nullSafeDatum(d, name, nullable, options)(_.toShort)
-
-case _: IntegerType => (d: String) =>
-  nullSafeDatum(d, name, nullable, options)(_.toInt)
-
-case _: LongType => (d: String) =>
-  nullSafeDatum(d, name, nullable, options)(_.toLong)
-
-case _: FloatType => (d: String) =>
-  nullSafeDatum(d, name, nullable, options) {
-case options.nanValue => Float.NaN
-case options.negativeInf => Float.NegativeInfinity
-case options.positiveInf => Float.PositiveInfinity
-case datum =>
-  Try(datum.toFloat)
-
.getOrElse(NumberFormat.getInstance(Locale.US).parse(datum).floatValue())
-  }
-
-case _: DoubleType => (d: String) =>
-  nullSafeDatum(d, name, nullable, options) {
-case options.nanValue => Double.NaN
-case options.negativeInf => Double.NegativeInfinity
-case options.positiveInf => Double.PositiveInfinity
-case datum =>
-  Try(datum.toDouble)
-
.getOrElse(NumberFormat.getInstance(Locale.US).parse(datum).doubleValue())
+  private def makeSafeHeader(
--- End diff --

This just came from `CSVFileFormat`.
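
For readers unfamiliar with that helper, a simplified, hypothetical version of what a null-safe and duplicate-safe header generator can look like (not the actual Spark implementation):

```scala
import scala.collection.mutable

def makeSafeHeaderSketch(row: Array[String], caseSensitive: Boolean): Array[String] = {
  // Replace null/empty names with positional placeholders like `_c0`.
  val named = row.zipWithIndex.map {
    case (name, index) if name == null || name.isEmpty => s"_c$index"
    case (name, _) => name
  }
  val keyOf = (s: String) => if (caseSensitive) s else s.toLowerCase
  // Count how often each (possibly case-insensitive) name occurs.
  val counts = named.groupBy(keyOf).mapValues(_.length)
  val seen = mutable.Map.empty[String, Int]
  // Append an index to duplicated names so every column name becomes unique.
  named.map { name =>
    val key = keyOf(name)
    if (counts(key) > 1) {
      val i = seen.getOrElse(key, 0)
      seen(key) = i + 1
      s"$name$i"
    } else {
      name
    }
  }
}
```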



[GitHub] spark pull request #13988: [SPARK-16101][SQL] Refactoring CSV data source to...

2017-01-05 Thread HyukjinKwon
Github user HyukjinKwon commented on a diff in the pull request:

https://github.com/apache/spark/pull/13988#discussion_r94755470
  
--- Diff: 
sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/csv/CSVOptions.scala
 ---
@@ -126,6 +127,39 @@ private[csv] class CSVOptions(@transient private val parameters: CaseInsensitive
   val inputBufferSize = 128
 
   val isCommentSet = this.comment != '\u0000'
+
+  def asWriterSettings: CsvWriterSettings = {
--- End diff --

These just came from `CSVParser`.
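
As a hedged sketch of what such a mapping typically looks like with univocity (the parameter names are illustrative, not the exact fields of `CSVOptions`):

```scala
import com.univocity.parsers.csv.CsvWriterSettings

// Map a handful of CSV options onto univocity's writer settings.
def toWriterSettings(delimiter: Char, quote: Char, escape: Char, nullValue: String): CsvWriterSettings = {
  val settings = new CsvWriterSettings()
  val format = settings.getFormat
  format.setDelimiter(delimiter)   // field separator
  format.setQuote(quote)           // quoting character
  format.setQuoteEscape(escape)    // escape for quotes inside quoted values
  settings.setNullValue(nullValue) // how null fields are written
  settings
}
```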



[GitHub] spark pull request #13988: [SPARK-16101][SQL] Refactoring CSV data source to...

2017-01-05 Thread HyukjinKwon
Github user HyukjinKwon commented on a diff in the pull request:

https://github.com/apache/spark/pull/13988#discussion_r94755659
  
--- Diff: 
sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/csv/UnivocityParser.scala
 ---
@@ -0,0 +1,272 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements.  See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License.  You may obtain a copy of the License at
+ *
+ *http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.spark.sql.execution.datasources.csv
+
+import java.math.BigDecimal
+import java.text.NumberFormat
+import java.util.Locale
+
+import scala.util.Try
+import scala.util.control.NonFatal
+
+import com.univocity.parsers.csv.CsvParser
+
+import org.apache.spark.internal.Logging
+import org.apache.spark.sql.catalyst.InternalRow
+import org.apache.spark.sql.catalyst.expressions.GenericInternalRow
+import org.apache.spark.sql.catalyst.util.DateTimeUtils
+import org.apache.spark.sql.types._
+import org.apache.spark.unsafe.types.UTF8String
+
+private[csv] class UnivocityParser(
+schema: StructType,
+requiredSchema: StructType,
+options: CSVOptions) extends Logging {
+  def this(schema: StructType, options: CSVOptions) = this(schema, schema, 
options)
+
+  val valueConverters = makeConverters(schema, options)
+  val parser = new CsvParser(options.asParserSettings)
+
+  // A `ValueConverter` is responsible for converting the given value to a 
desired type.
+  private type ValueConverter = String => Any
+
+  var numMalformedRecords = 0
+  val row = new GenericInternalRow(requiredSchema.length)
--- End diff --

Now we reuse a single row instead of allocating a new one per record.
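
A minimal illustration of the row-reuse idea (simplified types; the real parser converts univocity tokens into Catalyst values):

```scala
import org.apache.spark.sql.catalyst.expressions.GenericInternalRow

// One row is allocated per parser and overwritten for every record, instead of
// allocating a fresh row per record. Callers must copy the row if they retain it.
class ReusingRowConverter(numFields: Int, converters: Array[String => Any]) {
  private val row = new GenericInternalRow(numFields)

  def convert(tokens: Array[String]): GenericInternalRow = {
    var i = 0
    while (i < numFields) {
      row.update(i, converters(i)(tokens(i)))
      i += 1
    }
    row
  }
}
```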



[GitHub] spark pull request #13988: [SPARK-16101][SQL] Refactoring CSV data source to...

2017-01-05 Thread HyukjinKwon
Github user HyukjinKwon commented on a diff in the pull request:

https://github.com/apache/spark/pull/13988#discussion_r94755607
  
--- Diff: 
sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/csv/UnivocityGenerator.scala
 ---
@@ -0,0 +1,89 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements.  See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License.  You may obtain a copy of the License at
+ *
+ *http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.spark.sql.execution.datasources.csv
+
+import java.io.Writer
+
+import com.univocity.parsers.csv.CsvWriter
+
+import org.apache.spark.sql.catalyst.InternalRow
+import org.apache.spark.sql.catalyst.util.DateTimeUtils
+import org.apache.spark.sql.types._
+
+private[csv] class UnivocityGenerator(
+schema: StructType,
+writer: Writer,
+options: CSVOptions = new CSVOptions(Map.empty[String, String])) {
+  private val writerSettings = options.asWriterSettings
+  writerSettings.setHeaders(schema.fieldNames: _*)
+  private val gen = new CsvWriter(writer, writerSettings)
+
+  // A `ValueConverter` is responsible for converting a value of an 
`InternalRow` to `String`.
--- End diff --

The converters below mostly just came from `CSVTypeCast`.
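
For context, a hypothetical sketch of the write-side converters being referred to: each field of an `InternalRow` is rendered as a `String` according to its data type (only a few types shown):

```scala
import org.apache.spark.sql.catalyst.InternalRow
import org.apache.spark.sql.types._

// Build a per-field converter from Catalyst values to CSV strings.
def makeWriteConverter(dataType: DataType): (InternalRow, Int) => String = dataType match {
  case IntegerType => (row, ordinal) => row.getInt(ordinal).toString
  case DoubleType  => (row, ordinal) => row.getDouble(ordinal).toString
  case StringType  => (row, ordinal) => row.getUTF8String(ordinal).toString
  case other       => (row, ordinal) => row.get(ordinal, other).toString
}
```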



[GitHub] spark pull request #13988: [SPARK-16101][SQL] Refactoring CSV data source to...

2017-01-05 Thread HyukjinKwon
Github user HyukjinKwon commented on a diff in the pull request:

https://github.com/apache/spark/pull/13988#discussion_r94755764
  
--- Diff: 
sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/csv/UnivocityParser.scala
 ---
@@ -0,0 +1,272 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements.  See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License.  You may obtain a copy of the License at
+ *
+ *http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.spark.sql.execution.datasources.csv
+
+import java.math.BigDecimal
+import java.text.NumberFormat
+import java.util.Locale
+
+import scala.util.Try
+import scala.util.control.NonFatal
+
+import com.univocity.parsers.csv.CsvParser
+
+import org.apache.spark.internal.Logging
+import org.apache.spark.sql.catalyst.InternalRow
+import org.apache.spark.sql.catalyst.expressions.GenericInternalRow
+import org.apache.spark.sql.catalyst.util.DateTimeUtils
+import org.apache.spark.sql.types._
+import org.apache.spark.unsafe.types.UTF8String
+
+private[csv] class UnivocityParser(
+schema: StructType,
+requiredSchema: StructType,
+options: CSVOptions) extends Logging {
+  def this(schema: StructType, options: CSVOptions) = this(schema, schema, 
options)
+
+  val valueConverters = makeConverters(schema, options)
+  val parser = new CsvParser(options.asParserSettings)
+
+  // A `ValueConverter` is responsible for converting the given value to a 
desired type.
+  private type ValueConverter = String => Any
+
+  var numMalformedRecords = 0
+  val row = new GenericInternalRow(requiredSchema.length)
+  val indexArr: Array[Int] = {
+val fields = if (options.dropMalformed) {
+  // If `dropMalformed` is enabled, then it needs to parse all the 
values
+  // so that we can decide which row is malformed.
+  requiredSchema ++ schema.filterNot(requiredSchema.contains(_))
+} else {
+  requiredSchema
+}
+fields.filter(schema.contains).map(schema.indexOf).toArray
+  }
+
+  /**
+   * Create converters which cast each given string datum to each 
specified type in given schema.
+   * Currently, we do not support complex types (`ArrayType`, `MapType`, 
`StructType`).
+   *
+   * For string types, this is simply the datum.
+   * For other types, this is converted into the value according to the 
type.
+   * For other nullable types, returns null if it is null or equals to the 
value specified
+   * in `nullValue` option.
+   *
+   * @param schema schema that contains data types to cast the given value 
into.
+   * @param options CSV options.
+   */
+  private def makeConverters(
+  schema: StructType,
+  options: CSVOptions = CSVOptions()): Array[ValueConverter] = {
+schema.map(f => makeConverter(f.name, f.dataType, f.nullable, 
options)).toArray
+  }
+
+  /**
+   * Create a converter which converts the string value to a value 
according to a desired type.
+   */
+  def makeConverter(
+  name: String,
+  dataType: DataType,
+  nullable: Boolean = true,
+  options: CSVOptions = CSVOptions()): ValueConverter = dataType match 
{
+case _: ByteType => (d: String) =>
+  nullSafeDatum(d, name, nullable, options)(_.toByte)
+
+case _: ShortType => (d: String) =>
+  nullSafeDatum(d, name, nullable, options)(_.toShort)
+
+case _: IntegerType => (d: String) =>
+  nullSafeDatum(d, name, nullable, options)(_.toInt)
+
+case _: LongType => (d: String) =>
+  nullSafeDatum(d, name, nullable, options)(_.toLong)
+
+case _: FloatType => (d: String) =>
+  nullSafeDatum(d, name, nullable, options) {
+case options.nanValue => Float.NaN
+case options.negativeInf => Float.NegativeInfinity
+case options.positiveInf => Float.PositiveInfinity
+case datum =>
+  Try(datum.toFloat)
+
.getOrElse(NumberFormat.getInstance(Locale.US).parse

[GitHub] spark pull request #13988: [SPARK-16101][SQL] Refactoring CSV data source to...

2017-01-05 Thread HyukjinKwon
Github user HyukjinKwon commented on a diff in the pull request:

https://github.com/apache/spark/pull/13988#discussion_r94756023
  
--- Diff: 
sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/csv/UnivocityParser.scala
 ---
@@ -0,0 +1,272 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements.  See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License.  You may obtain a copy of the License at
+ *
+ *http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.spark.sql.execution.datasources.csv
+
+import java.math.BigDecimal
+import java.text.NumberFormat
+import java.util.Locale
+
+import scala.util.Try
+import scala.util.control.NonFatal
+
+import com.univocity.parsers.csv.CsvParser
+
+import org.apache.spark.internal.Logging
+import org.apache.spark.sql.catalyst.InternalRow
+import org.apache.spark.sql.catalyst.expressions.GenericInternalRow
+import org.apache.spark.sql.catalyst.util.DateTimeUtils
+import org.apache.spark.sql.types._
+import org.apache.spark.unsafe.types.UTF8String
+
+private[csv] class UnivocityParser(
+schema: StructType,
+requiredSchema: StructType,
+options: CSVOptions) extends Logging {
+  def this(schema: StructType, options: CSVOptions) = this(schema, schema, 
options)
+
+  val valueConverters = makeConverters(schema, options)
+  val parser = new CsvParser(options.asParserSettings)
+
+  // A `ValueConverter` is responsible for converting the given value to a 
desired type.
+  private type ValueConverter = String => Any
+
+  var numMalformedRecords = 0
+  val row = new GenericInternalRow(requiredSchema.length)
+  val indexArr: Array[Int] = {
+val fields = if (options.dropMalformed) {
+  // If `dropMalformed` is enabled, then it needs to parse all the 
values
+  // so that we can decide which row is malformed.
+  requiredSchema ++ schema.filterNot(requiredSchema.contains(_))
+} else {
+  requiredSchema
+}
+fields.filter(schema.contains).map(schema.indexOf).toArray
+  }
+
+  /**
+   * Create converters which cast each given string datum to each 
specified type in given schema.
+   * Currently, we do not support complex types (`ArrayType`, `MapType`, 
`StructType`).
+   *
+   * For string types, this is simply the datum.
+   * For other types, this is converted into the value according to the 
type.
+   * For other nullable types, returns null if it is null or equals to the 
value specified
+   * in `nullValue` option.
+   *
+   * @param schema schema that contains data types to cast the given value 
into.
+   * @param options CSV options.
+   */
+  private def makeConverters(
+  schema: StructType,
+  options: CSVOptions = CSVOptions()): Array[ValueConverter] = {
+schema.map(f => makeConverter(f.name, f.dataType, f.nullable, 
options)).toArray
+  }
+
+  /**
+   * Create a converter which converts the string value to a value 
according to a desired type.
+   */
+  def makeConverter(
+  name: String,
+  dataType: DataType,
+  nullable: Boolean = true,
+  options: CSVOptions = CSVOptions()): ValueConverter = dataType match 
{
+case _: ByteType => (d: String) =>
+  nullSafeDatum(d, name, nullable, options)(_.toByte)
+
+case _: ShortType => (d: String) =>
+  nullSafeDatum(d, name, nullable, options)(_.toShort)
+
+case _: IntegerType => (d: String) =>
+  nullSafeDatum(d, name, nullable, options)(_.toInt)
+
+case _: LongType => (d: String) =>
+  nullSafeDatum(d, name, nullable, options)(_.toLong)
+
+case _: FloatType => (d: String) =>
+  nullSafeDatum(d, name, nullable, options) {
+case options.nanValue => Float.NaN
+case options.negativeInf => Float.NegativeInfinity
+case options.positiveInf => Float.PositiveInfinity
+case datum =>
+  Try(datum.toFloat)
+
.getOrElse(NumberFormat.getInstance(Locale.US).parse

[GitHub] spark pull request #15329: [SPARK-17763][SQL] JacksonParser silently parses ...

2017-01-05 Thread HyukjinKwon
Github user HyukjinKwon closed the pull request at:

https://github.com/apache/spark/pull/15329



[GitHub] spark issue #15329: [SPARK-17763][SQL] JacksonParser silently parses null as...

2017-01-05 Thread HyukjinKwon
Github user HyukjinKwon commented on the issue:

https://github.com/apache/spark/pull/15329
  
After rethinking this, I think it might be blocked by https://github.com/apache/spark/pull/14124 (or is at least closely related). Anyhow, it looks dependent on it. Let me close this for now and try to reopen it if I find I can proceed further independently.



[GitHub] spark pull request #13988: [SPARK-16101][SQL] Refactoring CSV data source to...

2017-01-05 Thread HyukjinKwon
Github user HyukjinKwon commented on a diff in the pull request:

https://github.com/apache/spark/pull/13988#discussion_r94773568
  
--- Diff: 
sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/csv/UnivocityParser.scala
 ---
@@ -0,0 +1,272 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements.  See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License.  You may obtain a copy of the License at
+ *
+ *http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.spark.sql.execution.datasources.csv
+
+import java.math.BigDecimal
+import java.text.NumberFormat
+import java.util.Locale
+
+import scala.util.Try
+import scala.util.control.NonFatal
+
+import com.univocity.parsers.csv.CsvParser
+
+import org.apache.spark.internal.Logging
+import org.apache.spark.sql.catalyst.InternalRow
+import org.apache.spark.sql.catalyst.expressions.GenericInternalRow
+import org.apache.spark.sql.catalyst.util.DateTimeUtils
+import org.apache.spark.sql.types._
+import org.apache.spark.unsafe.types.UTF8String
+
+private[csv] class UnivocityParser(
+schema: StructType,
+requiredSchema: StructType,
+options: CSVOptions) extends Logging {
+  def this(schema: StructType, options: CSVOptions) = this(schema, schema, 
options)
+
+  val valueConverters = makeConverters(schema, options)
+  val parser = new CsvParser(options.asParserSettings)
+
+  // A `ValueConverter` is responsible for converting the given value to a 
desired type.
+  private type ValueConverter = String => Any
+
+  var numMalformedRecords = 0
+  val row = new GenericInternalRow(requiredSchema.length)
--- End diff --

Also, `numMalformedRecords` is now tracked separately when `parse(...)` is called.
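
A hypothetical sketch of the malformed-record accounting mentioned here; the names and the warning policy are illustrative only:

```scala
// Track dropped records per parser, logging only a bounded number of warnings.
class MalformedRecordTracker(maxWarnings: Int, logWarning: String => Unit) {
  private var numMalformedRecords = 0

  def recordMalformed(line: String): Unit = {
    if (numMalformedRecords < maxWarnings) {
      logWarning(s"Dropping malformed line: $line")
    } else if (numMalformedRecords == maxWarnings) {
      logWarning("More malformed records found; further warnings are suppressed.")
    }
    numMalformedRecords += 1
  }

  def count: Int = numMalformedRecords
}
```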



[GitHub] spark issue #13771: [SPARK-13748][PYSPARK][DOC] Add the description for expl...

2017-01-05 Thread HyukjinKwon
Github user HyukjinKwon commented on the issue:

https://github.com/apache/spark/pull/13771
  
Hi @davies, this PR adds documentation that describes the potentially confusing behaviour as it currently stands. I know this is super trivial, but it is an improvement without any regression. Judging from the JIRA https://issues.apache.org/jira/browse/SPARK-12624, it seems we don't want to allow omitting the argument, and the added description matches the correct behaviour.

Could I ask you to take a look, please?



[GitHub] spark issue #16429: [SPARK-19019][PYTHON] Fix hijacked `collections.namedtup...

2017-01-05 Thread HyukjinKwon
Github user HyukjinKwon commented on the issue:

https://github.com/apache/spark/pull/16429
  
ping @joshrosen and @davies



[GitHub] spark issue #15751: [SPARK-18246][SQL] Throws an exception before execution ...

2017-01-05 Thread HyukjinKwon
Github user HyukjinKwon commented on the issue:

https://github.com/apache/spark/pull/15751
  
Hi @marmbrus, is there anything you are worried about that I should take a careful look at?



[GitHub] spark issue #15049: [SPARK-17310][SQL] Add an option to disable record-level...

2017-01-05 Thread HyukjinKwon
Github user HyukjinKwon commented on the issue:

https://github.com/apache/spark/pull/15049
  
Hi all, could you please guide me a bit further on what I should do with this PR?



[GitHub] spark issue #14788: [SPARK-17174][SQL] Add the support for TimestampType for...

2017-01-05 Thread HyukjinKwon
Github user HyukjinKwon commented on the issue:

https://github.com/apache/spark/pull/14788
  
gentle ping...



[GitHub] spark issue #14451: [SPARK-16848][SQL] Make jdbc() and read.format("jdbc") c...

2017-01-05 Thread HyukjinKwon
Github user HyukjinKwon commented on the issue:

https://github.com/apache/spark/pull/14451
  
Hi @gatorsmile, could I ask what you think about this PR for now?



[GitHub] spark issue #16451: [SPARK-18922][SQL][CORE][STREAMING][TESTS] Fix all ident...

2017-01-05 Thread HyukjinKwon
Github user HyukjinKwon commented on the issue:

https://github.com/apache/spark/pull/16451
  
test this please



[GitHub] spark issue #16451: [SPARK-18922][SQL][CORE][STREAMING][TESTS] Fix all ident...

2017-01-05 Thread HyukjinKwon
Github user HyukjinKwon commented on the issue:

https://github.com/apache/spark/pull/16451
  
test this please



[GitHub] spark issue #16451: [SPARK-18922][SQL][CORE][STREAMING][TESTS] Fix all ident...

2017-01-05 Thread HyukjinKwon
Github user HyukjinKwon commented on the issue:

https://github.com/apache/spark/pull/16451
  
retest this please



[GitHub] spark issue #16451: [SPARK-18922][SQL][CORE][STREAMING][TESTS] Fix all ident...

2017-01-05 Thread HyukjinKwon
Github user HyukjinKwon commented on the issue:

https://github.com/apache/spark/pull/16451
  
retest this please



[GitHub] spark issue #13988: [SPARK-16101][SQL] Refactoring CSV data source to be con...

2017-01-05 Thread HyukjinKwon
Github user HyukjinKwon commented on the issue:

https://github.com/apache/spark/pull/13988
  
Hi @cloud-fan, do you mind taking a look and checking whether this makes sense?



[GitHub] spark pull request #13988: [SPARK-16101][SQL] Refactoring CSV data source to...

2017-01-05 Thread HyukjinKwon
Github user HyukjinKwon commented on a diff in the pull request:

https://github.com/apache/spark/pull/13988#discussion_r94892239
  
--- Diff: 
sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/csv/UnivocityParser.scala
 ---
@@ -0,0 +1,272 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements.  See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License.  You may obtain a copy of the License at
+ *
+ *http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.spark.sql.execution.datasources.csv
+
+import java.math.BigDecimal
+import java.text.NumberFormat
+import java.util.Locale
+
+import scala.util.Try
+import scala.util.control.NonFatal
+
+import com.univocity.parsers.csv.CsvParser
+
+import org.apache.spark.internal.Logging
+import org.apache.spark.sql.catalyst.InternalRow
+import org.apache.spark.sql.catalyst.expressions.GenericInternalRow
+import org.apache.spark.sql.catalyst.util.DateTimeUtils
+import org.apache.spark.sql.types._
+import org.apache.spark.unsafe.types.UTF8String
+
+private[csv] class UnivocityParser(
+schema: StructType,
+requiredSchema: StructType,
+options: CSVOptions) extends Logging {
+  def this(schema: StructType, options: CSVOptions) = this(schema, schema, 
options)
+
+  val valueConverters = makeConverters(schema, options)
+  val parser = new CsvParser(options.asParserSettings)
+
+  // A `ValueConverter` is responsible for converting the given value to a 
desired type.
+  private type ValueConverter = String => Any
+
+  var numMalformedRecords = 0
+  val row = new GenericInternalRow(requiredSchema.length)
+  val indexArr: Array[Int] = {
+val fields = if (options.dropMalformed) {
+  // If `dropMalformed` is enabled, then it needs to parse all the 
values
+  // so that we can decide which row is malformed.
+  requiredSchema ++ schema.filterNot(requiredSchema.contains(_))
+} else {
+  requiredSchema
+}
+fields.filter(schema.contains).map(schema.indexOf).toArray
+  }
+
+  /**
+   * Create converters which cast each given string datum to each 
specified type in given schema.
+   * Currently, we do not support complex types (`ArrayType`, `MapType`, 
`StructType`).
+   *
+   * For string types, this is simply the datum.
+   * For other types, this is converted into the value according to the 
type.
+   * For other nullable types, returns null if it is null or equals to the 
value specified
+   * in `nullValue` option.
+   *
+   * @param schema schema that contains data types to cast the given value 
into.
+   * @param options CSV options.
+   */
+  private def makeConverters(
+  schema: StructType,
+  options: CSVOptions = CSVOptions()): Array[ValueConverter] = {
+schema.map(f => makeConverter(f.name, f.dataType, f.nullable, 
options)).toArray
+  }
+
+  /**
+   * Create a converter which converts the string value to a value 
according to a desired type.
+   */
+  def makeConverter(
+  name: String,
+  dataType: DataType,
+  nullable: Boolean = true,
+  options: CSVOptions = CSVOptions()): ValueConverter = dataType match 
{
+case _: ByteType => (d: String) =>
+  nullSafeDatum(d, name, nullable, options)(_.toByte)
+
+case _: ShortType => (d: String) =>
+  nullSafeDatum(d, name, nullable, options)(_.toShort)
+
+case _: IntegerType => (d: String) =>
+  nullSafeDatum(d, name, nullable, options)(_.toInt)
+
+case _: LongType => (d: String) =>
+  nullSafeDatum(d, name, nullable, options)(_.toLong)
+
+case _: FloatType => (d: String) =>
+  nullSafeDatum(d, name, nullable, options) {
+case options.nanValue => Float.NaN
+case options.negativeInf => Float.NegativeInfinity
+case options.positiveInf => Float.PositiveInfinity
+case datum =>
+  Try(datum.toFloat)
+
.getOrElse(NumberFormat.getInstance(Locale.US).parse

[GitHub] spark pull request #13988: [SPARK-16101][SQL] Refactoring CSV data source to...

2017-01-05 Thread HyukjinKwon
Github user HyukjinKwon commented on a diff in the pull request:

https://github.com/apache/spark/pull/13988#discussion_r94892718
  
--- Diff: 
sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/csv/UnivocityParser.scala
 ---
@@ -0,0 +1,272 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements.  See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License.  You may obtain a copy of the License at
+ *
+ *http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.spark.sql.execution.datasources.csv
+
+import java.math.BigDecimal
+import java.text.NumberFormat
+import java.util.Locale
+
+import scala.util.Try
+import scala.util.control.NonFatal
+
+import com.univocity.parsers.csv.CsvParser
+
+import org.apache.spark.internal.Logging
+import org.apache.spark.sql.catalyst.InternalRow
+import org.apache.spark.sql.catalyst.expressions.GenericInternalRow
+import org.apache.spark.sql.catalyst.util.DateTimeUtils
+import org.apache.spark.sql.types._
+import org.apache.spark.unsafe.types.UTF8String
+
+private[csv] class UnivocityParser(
+schema: StructType,
+requiredSchema: StructType,
+options: CSVOptions) extends Logging {
+  def this(schema: StructType, options: CSVOptions) = this(schema, schema, 
options)
+
+  val valueConverters = makeConverters(schema, options)
--- End diff --

Some of the conversion-related changes here came from `CSVTypeCast`.



[GitHub] spark issue #14451: [SPARK-16848][SQL] Make jdbc() and read.format("jdbc") c...

2017-01-05 Thread HyukjinKwon
Github user HyukjinKwon commented on the issue:

https://github.com/apache/spark/pull/14451
  
Ah, thanks. Let me check it out.



[GitHub] spark issue #14451: [SPARK-16848][SQL] Make jdbc() and read.format("jdbc") c...

2017-01-05 Thread HyukjinKwon
Github user HyukjinKwon commented on the issue:

https://github.com/apache/spark/pull/14451
  
@gatorsmile, I just updated the PR description and added another 
`table(..)` API and the test case.



[GitHub] spark pull request #16451: [SPARK-18922][SQL][CORE][STREAMING][TESTS] Fix al...

2017-01-05 Thread HyukjinKwon
Github user HyukjinKwon commented on a diff in the pull request:

https://github.com/apache/spark/pull/16451#discussion_r94900183
  
--- Diff: 
sql/hive/src/test/scala/org/apache/spark/sql/hive/HiveSparkSubmitSuite.scala ---
@@ -339,7 +339,13 @@ class HiveSparkSubmitSuite
   private def runSparkSubmit(args: Seq[String]): Unit = {
     val sparkHome = sys.props.getOrElse("spark.test.home", fail("spark.test.home is not set!"))
     val history = ArrayBuffer.empty[String]
-    val commands = Seq("./bin/spark-submit") ++ args
+    val sparkSubmit = if (Utils.isWindows) {
+      // On Windows, `ProcessBuilder.directory` does not change the current working directory.
+      new File("..\\..\\bin\\spark-submit.cmd").getAbsolutePath
+    } else {
+      "./bin/spark-submit"
--- End diff --

I really don't get why or how using the full path (possibly) makes `HiveSparkSubmitSuite` flaky, because `SparkSubmitSuite` uses the full path too.

I just used the same approach as the original path simply because 5 out of 18 builds failed in `HiveSparkSubmitSuite`. I can't think of a workaround for Windows in this case.
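As a side note on the diff comment above about `ProcessBuilder.directory`, here is a hedged sketch of the resolve-an-absolute-path idea. The paths and the `sparkSubmitCommand` helper below are assumptions for illustration, not the exact test code.

```scala
// Hedged sketch: on Windows, a relative command such as "./bin/spark-submit" is not
// reliably resolved against the directory passed to ProcessBuilder.directory(...),
// so the test resolves an absolute path to spark-submit.cmd up front instead.
import java.io.File

def sparkSubmitCommand(sparkHome: String, isWindows: Boolean): String =
  if (isWindows) {
    new File(sparkHome, "bin" + File.separator + "spark-submit.cmd").getAbsolutePath
  } else {
    "./bin/spark-submit"
  }

// e.g. new ProcessBuilder((Seq(sparkSubmitCommand(home, isWin)) ++ args): _*)
//        .directory(new File(home))
```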


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #16451: [SPARK-18922][SQL][CORE][STREAMING][TESTS] Fix all ident...

2017-01-05 Thread HyukjinKwon
Github user HyukjinKwon commented on the issue:

https://github.com/apache/spark/pull/16451
  
Build started: [TESTS] `org.apache.spark.sql.hive.HiveSparkSubmitSuite` 
[![PR-16451](https://ci.appveyor.com/api/projects/status/github/spark-test/spark?branch=3E2A36EC-E0E9-4A49-9142-50EBB669F9C0&svg=true)](https://ci.appveyor.com/project/spark-test/spark/branch/3E2A36EC-E0E9-4A49-9142-50EBB669F9C0)


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #16451: [SPARK-18922][SQL][CORE][STREAMING][TESTS] Fix all ident...

2017-01-05 Thread HyukjinKwon
Github user HyukjinKwon commented on the issue:

https://github.com/apache/spark/pull/16451
  
@srowen, regarding 
https://github.com/apache/spark/pull/16451#discussion_r94572449, would it be 
okay to leave as is or should I give a shot for `Utils.deleteRecursively`?


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #14451: [SPARK-16848][SQL] Make jdbc() and read.format("jdbc") c...

2017-01-05 Thread HyukjinKwon
Github user HyukjinKwon commented on the issue:

https://github.com/apache/spark/pull/14451
  
Sure, I will add it. It seems `format("jdbc").load()` already throws an exception, which is tested [here](https://github.com/apache/spark/blob/f923c849e5b8f7e7aeafee59db598a9bf4970f50/sql/core/src/test/scala/org/apache/spark/sql/test/DataFrameReaderWriterSuite.scala#L294-L308).


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #14451: [SPARK-16848][SQL] Make jdbc() and read.format("jdbc") c...

2017-01-05 Thread HyukjinKwon
Github user HyukjinKwon commented on the issue:

https://github.com/apache/spark/pull/14451
  
Yeap. Then let me add a test case and fix it.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request #14451: [SPARK-16848][SQL] Check schema validation for us...

2017-01-05 Thread HyukjinKwon
Github user HyukjinKwon commented on a diff in the pull request:

https://github.com/apache/spark/pull/14451#discussion_r94903231
  
--- Diff: 
sql/core/src/main/scala/org/apache/spark/sql/DataFrameReader.scala ---
@@ -536,10 +539,17 @@ class DataFrameReader private[sql](sparkSession: 
SparkSession) extends Logging {
*/
   @scala.annotation.varargs
   def textFile(paths: String*): Dataset[String] = {
+assertNoSpecifiedSchema("textFile")
+text(paths : 
_*).select("value").as[String](sparkSession.implicits.newStringEncoder)
+  }
+
+  /**
+   * A convenient function for schema validation in APIs.
+   */
+  private def assertNoSpecifiedSchema(operation: String): Unit = {
 if (userSpecifiedSchema.nonEmpty) {
-  throw new AnalysisException("User specified schema not supported 
with `textFile`")
+  throw new AnalysisException(s"User specified schema not supported 
with `$operation`")
 }
-text(paths : 
_*).select("value").as[String](sparkSession.implicits.newStringEncoder)
--- End diff --

This diff renders oddly. I just added

```scala
  /**
   * A convenient function for schema validation in APIs.
   */
  private def assertNoSpecifiedSchema(operation: String): Unit = {
if (userSpecifiedSchema.nonEmpty) {
  throw new AnalysisException(s"User specified schema not supported 
with `$operation`")
}
  }
```

at the end.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request #14451: [SPARK-16848][SQL] Check schema validation for us...

2017-01-05 Thread HyukjinKwon
Github user HyukjinKwon commented on a diff in the pull request:

https://github.com/apache/spark/pull/14451#discussion_r94903385
  
--- Diff: 
sql/core/src/main/scala/org/apache/spark/sql/DataFrameReader.scala ---
@@ -536,10 +539,17 @@ class DataFrameReader private[sql](sparkSession: 
SparkSession) extends Logging {
*/
   @scala.annotation.varargs
   def textFile(paths: String*): Dataset[String] = {
+assertNoSpecifiedSchema("textFile")
+text(paths : 
_*).select("value").as[String](sparkSession.implicits.newStringEncoder)
+  }
+
+  /**
+   * A convenient function for schema validation in APIs.
+   */
+  private def assertNoSpecifiedSchema(operation: String): Unit = {
 if (userSpecifiedSchema.nonEmpty) {
-  throw new AnalysisException("User specified schema not supported 
with `textFile`")
+  throw new AnalysisException(s"User specified schema not supported 
with `$operation`")
 }
-text(paths : 
_*).select("value").as[String](sparkSession.implicits.newStringEncoder)
--- End diff --

This diff renders oddly. I just added

```scala
  /**
   * A convenient function for schema validation in APIs.
   */
  private def assertNoSpecifiedSchema(operation: String): Unit = {
if (userSpecifiedSchema.nonEmpty) {
  throw new AnalysisException(s"User specified schema not supported 
with `$operation`")
}
  }
```

at the end.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request #14451: [SPARK-16848][SQL] Check schema validation for us...

2017-01-05 Thread HyukjinKwon
Github user HyukjinKwon commented on a diff in the pull request:

https://github.com/apache/spark/pull/14451#discussion_r94903467
  
--- Diff: 
sql/core/src/main/scala/org/apache/spark/sql/DataFrameReader.scala ---
@@ -536,10 +539,17 @@ class DataFrameReader private[sql](sparkSession: 
SparkSession) extends Logging {
*/
   @scala.annotation.varargs
   def textFile(paths: String*): Dataset[String] = {
+assertNoSpecifiedSchema("textFile")
+text(paths : 
_*).select("value").as[String](sparkSession.implicits.newStringEncoder)
+  }
+
+  /**
+   * A convenient function for schema validation in APIs.
+   */
+  private def assertNoSpecifiedSchema(operation: String): Unit = {
 if (userSpecifiedSchema.nonEmpty) {
-  throw new AnalysisException("User specified schema not supported 
with `textFile`")
+  throw new AnalysisException(s"User specified schema not supported 
with `$operation`")
 }
-text(paths : 
_*).select("value").as[String](sparkSession.implicits.newStringEncoder)
--- End diff --

This diff renders oddly. I just added

```scala
  /**
   * A convenient function for schema validation in APIs.
   */
  private def assertNoSpecifiedSchema(operation: String): Unit = {
if (userSpecifiedSchema.nonEmpty) {
  throw new AnalysisException(s"User specified schema not supported 
with `$operation`")
}
  }
```

at the end.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #16429: [SPARK-19019][PYTHON] Fix hijacked `collections.namedtup...

2017-01-05 Thread HyukjinKwon
Github user HyukjinKwon commented on the issue:

https://github.com/apache/spark/pull/16429
  
@holdenk, Yup (actually, it is cloudpipe/cloudpickle@cbd3f34 because there were a few more minor fixes). It seems there is some customized code there, such as injecting `numpy` as below:

```python
def inject_numpy(self):
numpy = sys.modules.get('numpy')
if not numpy or not hasattr(numpy, 'ufunc'):
return
self.dispatch[numpy.ufunc] = self.__class__.save_ufunc
```

So, I tried to take out the valid changes from there for now.

I also agree with using it as a third-party library if possible. That would be a bigger job than this one, though, because we now have quite a lot of customized code there.




---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #16429: [SPARK-19019][PYTHON] Fix hijacked `collections.namedtup...

2017-01-05 Thread HyukjinKwon
Github user HyukjinKwon commented on the issue:

https://github.com/apache/spark/pull/16429
  
BTW, this at least gets Spark into a working state on Python 3.6.0 with minimal changes.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #14451: [SPARK-16848][SQL] Check schema validation for user-spec...

2017-01-06 Thread HyukjinKwon
Github user HyukjinKwon commented on the issue:

https://github.com/apache/spark/pull/14451
  
retest this please


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request #13988: [SPARK-16101][SQL] Refactoring CSV data source to...

2017-01-06 Thread HyukjinKwon
Github user HyukjinKwon closed the pull request at:

https://github.com/apache/spark/pull/13988


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #13988: [SPARK-16101][SQL] Refactoring CSV data source to be con...

2017-01-06 Thread HyukjinKwon
Github user HyukjinKwon commented on the issue:

https://github.com/apache/spark/pull/13988
  
Sure! Let me split this into reading and writing ones. Thank you for your comments. Let me close this for now.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request #16451: [SPARK-18922][SQL][CORE][STREAMING][TESTS] Fix al...

2017-01-07 Thread HyukjinKwon
Github user HyukjinKwon commented on a diff in the pull request:

https://github.com/apache/spark/pull/16451#discussion_r95055056
  
--- Diff: 
external/kafka-0-10-sql/src/test/scala/org/apache/spark/sql/kafka010/KafkaTestUtils.scala
 ---
@@ -138,10 +139,15 @@ class KafkaTestUtils extends Logging {
 
 if (server != null) {
   server.shutdown()
+  server.awaitShutdown()
   server = null
 }
 
-brokerConf.logDirs.foreach { f => Utils.deleteRecursively(new File(f)) 
}
+// On Windows, `logDirs` is left open even after Kafka server above is 
completely shut-downed
+// in some cases. It leads to test failures on Windows if these are 
not ignored.
+brokerConf.logDirs.map(new File(_))
+  .filterNot(FileUtils.deleteQuietly)
--- End diff --

> Is the point just that the former method is "quiet" instead of throwing 
an exception

Yes!

I will try to clean this up, including your other comment.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #16451: [SPARK-18922][SQL][CORE][STREAMING][TESTS] Fix all ident...

2017-01-07 Thread HyukjinKwon
Github user HyukjinKwon commented on the issue:

https://github.com/apache/spark/pull/16451
  
> Utils.deleteRecursively does seem to delete the contents first, so I 
don't know why it fails on Windows.

I may have misunderstood this comment, and I might be wrong, but let me try to explain as best I can.
On Windows, if we have a folder below and call 
`Utils.deleteRecursively(".\\tmp")`,

```
tmp
└── resource1.txt
```

It will try to remove `resource1.txt` first. If anything opened `resource1.txt` without closing it, Windows does not seem to allow deleting `resource1.txt` (or the folder `tmp`) until the object referencing `resource1.txt` closes it or is garbage-collected.

For example,

- Windows

```scala
scala> new java.io.FileInputStream("resource1.txt")
res2: java.io.FileInputStream = java.io.FileInputStream@4980d3

scala> new java.io.File("resource1.txt").delete()
res3: Boolean = false

scala> new java.io.File("resource1.txt").exists()
res4: Boolean = true
```

- Mac

```scala
scala> new java.io.FileInputStream("resource1.txt")
res6: java.io.FileInputStream = java.io.FileInputStream@13805618

scala> new java.io.File("resource1.txt").delete()
res7: Boolean = true

scala> new java.io.File("resource1.txt").exists()
res8: Boolean = false
```

So, deleting `resource1.txt` throws an exception in the `finally` block because `!file.delete()` and `file.exists()` both return `true`. Afterwards, another exception is thrown in the `finally` block for `tmp` because the directory is not empty.

This suppresses the original exception from the failure to remove the contents and instead throws a new exception reporting the failure to remove the parent directory.
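To make that masking concrete, here is a simplified sketch of the delete-then-check pattern (not the exact `Utils.deleteRecursively` code):

```scala
// Simplified sketch only. On Windows, the recursive call for resource1.txt throws
// from its `finally` block first; that exception is then replaced by the one thrown
// for `tmp`, whose delete also fails because the directory is still non-empty.
import java.io.{File, IOException}

def deleteRecursively(file: File): Unit = {
  try {
    if (file.isDirectory) {
      file.listFiles().foreach(deleteRecursively)
    }
  } finally {
    if (!file.delete() && file.exists()) {
      throw new IOException(s"Failed to delete: ${file.getAbsolutePath}")
    }
  }
}
```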



---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #16451: [SPARK-18922][SQL][CORE][STREAMING][TESTS] Fix all ident...

2017-01-07 Thread HyukjinKwon
Github user HyukjinKwon commented on the issue:

https://github.com/apache/spark/pull/16451
  
Build started: [TESTS] 
`org.apache.spark.streaming.kafka010.DirectKafkaStreamSuite` 
[![PR-16451](https://ci.appveyor.com/api/projects/status/github/spark-test/spark?branch=DC5D6D27-6DDA-4D4A-8579-24BC1A5B487B&svg=true)](https://ci.appveyor.com/project/spark-test/spark/branch/DC5D6D27-6DDA-4D4A-8579-24BC1A5B487B)
Build started: [TESTS] 
`org.apache.spark.streaming.kafka.ReliableKafkaStreamSuite` 
[![PR-16451](https://ci.appveyor.com/api/projects/status/github/spark-test/spark?branch=872A75B1-A478-4007-B02B-882A85DA94BC&svg=true)](https://ci.appveyor.com/project/spark-test/spark/branch/872A75B1-A478-4007-B02B-882A85DA94BC)


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #16451: [SPARK-18922][SQL][CORE][STREAMING][TESTS] Fix all ident...

2017-01-07 Thread HyukjinKwon
Github user HyukjinKwon commented on the issue:

https://github.com/apache/spark/pull/16451
  
I made a new function simply because it repeats the same logic. Either way is fine with me.

So, do you mean something like the below?

```scala
try {
  Utils.deleteRecursively(snapshotDir)
} catch {
  case e: IOException if Utils.isWindows =>
logWarning(e.getMessage)
}

try {
  Utils.deleteRecursively(logDir)
} catch {
  case e: IOException if Utils.isWindows =>
logWarning(e.getMessage)
}
```




---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #16451: [SPARK-18922][SQL][CORE][STREAMING][TESTS] Fix all ident...

2017-01-07 Thread HyukjinKwon
Github user HyukjinKwon commented on the issue:

https://github.com/apache/spark/pull/16451
  
Ah, thank you.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #16451: [SPARK-18922][SQL][CORE][STREAMING][TESTS] Fix all ident...

2017-01-07 Thread HyukjinKwon
Github user HyukjinKwon commented on the issue:

https://github.com/apache/spark/pull/16451
  
Build started: [TESTS] 
`org.apache.spark.streaming.kafka010.DirectKafkaStreamSuite` 
[![PR-16451](https://ci.appveyor.com/api/projects/status/github/spark-test/spark?branch=B6F011A1-2E0F-4A58-AAA5-A9C04DF8A5F8&svg=true)](https://ci.appveyor.com/project/spark-test/spark/branch/B6F011A1-2E0F-4A58-AAA5-A9C04DF8A5F8)
Build started: [TESTS] 
`org.apache.spark.streaming.kafka.ReliableKafkaStreamSuite` 
[![PR-16451](https://ci.appveyor.com/api/projects/status/github/spark-test/spark?branch=8D9D4C4E-4137-4A39-84E9-68DC944F4B5A&svg=true)](https://ci.appveyor.com/project/spark-test/spark/branch/8D9D4C4E-4137-4A39-84E9-68DC944F4B5A)



---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request #16496: [SPARK-16101][SQL] Refactoring CSV write path to ...

2017-01-07 Thread HyukjinKwon
GitHub user HyukjinKwon opened a pull request:

https://github.com/apache/spark/pull/16496

[SPARK-16101][SQL] Refactoring CSV write path to be consistent with JSON 
data source

## What changes were proposed in this pull request?

This PR refactors CSV write path to be consistent with JSON data source.

This PR makes the methods in these classes take arguments consistent with the JSON ones.
  - `UnivocityGenerator` and `JacksonGenerator` 

``` scala
private[csv] class UnivocityGenerator(
schema: StructType,
writer: Writer,
options: CSVOptions = new CSVOptions(Map.empty[String, String])) {
...

def write ...
def close ...
def flush ...
```

``` scala
private[sql] class JacksonGenerator(
   schema: StructType,
   writer: Writer,
   options: JSONOptions = new JSONOptions(Map.empty[String, String])) {
...

def write ...
def close ...
def flush ...
```

- This PR also puts the classes together in a manner consistent with JSON.
  - `CsvFileFormat`

``` scala
CsvFileFormat
CsvOutputWriter
```

  - `JsonFileFormat`

``` scala
JsonFileFormat
JsonOutputWriter
```

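For illustration only, here is a hedged sketch of how a generator with this `write`/`flush`/`close` shape is typically driven from an output writer. `RowGenerator` and `writeAll` below are stand-ins, not the actual `CsvOutputWriter`/`JsonOutputWriter` code.

```scala
// Stand-in types for illustration; the real generators take an InternalRow and a
// java.io.Writer plus options, as shown above.
trait RowGenerator {
  def write(row: Seq[Any]): Unit
  def flush(): Unit
  def close(): Unit
}

def writeAll(rows: Iterator[Seq[Any]], gen: RowGenerator): Unit = {
  try {
    rows.foreach(gen.write)   // one call per row
    gen.flush()
  } finally {
    gen.close()
  }
}
```
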
## How was this patch tested?

Existing tests should cover this.


You can merge this pull request into a Git repository by running:

$ git pull https://github.com/HyukjinKwon/spark SPARK-16101-write

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/spark/pull/16496.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #16496


commit b8d97f7c391ea1f833d7a4025e3c627fc6385cfb
Author: hyukjinkwon 
Date:   2017-01-07T14:47:25Z

Refactoring CSV write path to be consistent with JSON data source




---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #16496: [SPARK-16101][SQL] Refactoring CSV write path to be cons...

2017-01-07 Thread HyukjinKwon
Github user HyukjinKwon commented on the issue:

https://github.com/apache/spark/pull/16496
  
cc @cloud-fan


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #16496: [SPARK-16101][SQL] Refactoring CSV write path to be cons...

2017-01-07 Thread HyukjinKwon
Github user HyukjinKwon commented on the issue:

https://github.com/apache/spark/pull/16496
  
retest this please


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #16451: [SPARK-18922][SQL][CORE][STREAMING][TESTS] Fix all ident...

2017-01-08 Thread HyukjinKwon
Github user HyukjinKwon commented on the issue:

https://github.com/apache/spark/pull/16451
  
Ah, I just found another one. Let me check all the other errors again. There are pretty few now (30-ish).


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request #16501: [WIP][SPARK-19117][TESTS] Skip the tests using sc...

2017-01-08 Thread HyukjinKwon
GitHub user HyukjinKwon opened a pull request:

https://github.com/apache/spark/pull/16501

[WIP][SPARK-19117][TESTS] Skip the tests using script transformation on 
Windows

## What changes were proposed in this pull request?

This PR proposes to skip the script transformation tests that fail on Windows due to the fixed bash location.

```
SQLQuerySuite:
 - script *** FAILED *** (553 milliseconds)
   org.apache.spark.SparkException: Job aborted due to stage failure: Task 
0 in stage 56.0 failed 1 times, most recent failure: Lost task 0.0 in stage 
56.0 (TID 54, localhost, executor driver): java.io.IOException: Cannot run 
program "/bin/bash": CreateProcess error=2, The system cannot find the file 
specified

 - Star Expansion - script transform *** FAILED *** (2 seconds, 375 
milliseconds)
   org.apache.spark.SparkException: Job aborted due to stage failure: Task 
0 in stage 389.0 failed 1 times, most recent failure: Lost task 0.0 in stage 
389.0 (TID 725, localhost, executor driver): java.io.IOException: Cannot run 
program "/bin/bash": CreateProcess error=2, The system cannot find the file 
specified

 - test script transform for stdout *** FAILED *** (2 seconds, 813 
milliseconds)
   org.apache.spark.SparkException: Job aborted due to stage failure: Task 
0 in stage 391.0 failed 1 times, most recent failure: Lost task 0.0 in stage 
391.0 (TID 726, localhost, executor driver): java.io.IOException: Cannot run 
program "/bin/bash": CreateProcess error=2, The system cannot find the file 
specified

 - test script transform for stderr *** FAILED *** (2 seconds, 407 
milliseconds)
   org.apache.spark.SparkException: Job aborted due to stage failure: Task 
0 in stage 393.0 failed 1 times, most recent failure: Lost task 0.0 in stage 
393.0 (TID 727, localhost, executor driver): java.io.IOException: Cannot run 
program "/bin/bash": CreateProcess error=2, The system cannot find the file 
specified

 - test script transform data type *** FAILED *** (171 milliseconds)
   org.apache.spark.SparkException: Job aborted due to stage failure: Task 
0 in stage 395.0 failed 1 times, most recent failure: Lost task 0.0 in stage 
395.0 (TID 728, localhost, executor driver): java.io.IOException: Cannot run 
program "/bin/bash": CreateProcess error=2, The system cannot find the file 
specified
```

```
HiveQuerySuite:
 - transform *** FAILED *** (359 milliseconds)
   Failed to execute query using catalyst:
   Error: Job aborted due to stage failure: Task 0 in stage 1347.0 failed 1 
times, most recent failure: Lost task 0.0 in stage 1347.0 (TID 2395, localhost, 
executor driver): java.io.IOException: Cannot run program "/bin/bash": 
CreateProcess error=2, The system cannot find the file specified
  
 - schema-less transform *** FAILED *** (344 milliseconds)
   Failed to execute query using catalyst:
   Error: Job aborted due to stage failure: Task 0 in stage 1348.0 failed 1 
times, most recent failure: Lost task 0.0 in stage 1348.0 (TID 2396, localhost, 
executor driver): java.io.IOException: Cannot run program "/bin/bash": 
CreateProcess error=2, The system cannot find the file specified

 - transform with custom field delimiter *** FAILED *** (296 milliseconds)
   Failed to execute query using catalyst:
   Error: Job aborted due to stage failure: Task 0 in stage 1349.0 failed 1 
times, most recent failure: Lost task 0.0 in stage 1349.0 (TID 2397, localhost, 
executor driver): java.io.IOException: Cannot run program "/bin/bash": 
CreateProcess error=2, The system cannot find the file specified

 - transform with custom field delimiter2 *** FAILED *** (297 milliseconds)
   Failed to execute query using catalyst:
   Error: Job aborted due to stage failure: Task 0 in stage 1350.0 failed 1 
times, most recent failure: Lost task 0.0 in stage 1350.0 (TID 2398, localhost, 
executor driver): java.io.IOException: Cannot run program "/bin/bash": 
CreateProcess error=2, The system cannot find the file specified

 - transform with custom field delimiter3 *** FAILED *** (312 milliseconds)
   Failed to execute query using catalyst:
   Error: Job aborted due to stage failure: Task 0 in stage 1351.0 failed 1 
times, most recent failure: Lost task 0.0 in stage 1351.0 (TID 2399, localhost, 
executor driver): java.io.IOException: Cannot run program "/bin/bash": 
CreateProcess error=2, The system cannot find the file specified

 - transform with SerDe2 *** FAILED *** (437 milliseconds)
   org.apache.spark.SparkException: Job aborted due to stage failure: Task 
0 in stage 1355.0 failed 1 times, most recent failure: Lost task 0.0 in stage 
1355.0 (TID 2403, localhost, executor driver): java.io.IOException: Cannot run 
program "/bin/bash": Cr

[GitHub] spark issue #16451: [SPARK-18922][SQL][CORE][STREAMING][TESTS] Fix all ident...

2017-01-08 Thread HyukjinKwon
Github user HyukjinKwon commented on the issue:

https://github.com/apache/spark/pull/16451
  
```
 - recovery with file input stream *** FAILED *** (10 seconds, 205 
milliseconds)
   The code passed to eventually never returned normally. Attempted 660 
times over 10.0142724 seconds. Last failure message: Unexpected 
internal error near index 1
   \
^. (CheckpointSuite.scala:680)

 - SPARK-18220: read Hive orc table with varchar column *** FAILED *** (2 
seconds, 563 milliseconds)
   org.apache.spark.sql.execution.QueryExecutionException: FAILED: 
Execution Error, return code -101 from 
org.apache.hadoop.hive.ql.exec.mr.MapRedTask. 
org.apache.hadoop.io.nativeio.NativeIO$Windows.access0(Ljava/lang/String;I)Z

 - rolling file appender - size-based rolling (compressed) *** FAILED *** 
(15 milliseconds)
   1000 was not less than 1000 (FileAppenderSuite.scala:128)

 - recover from node failures with replication *** FAILED *** (34 seconds, 
613 milliseconds)
   org.apache.spark.SparkException: Job aborted due to stage failure: Task 
1 in stage 6.0 failed 4 times, most recent failure: Lost task 1.3 in stage 6.0 
(TID 33, localhost, executor 28): java.io.IOException: 
org.apache.spark.SparkException: Failed to get broadcast_6_piece0 of broadcast_6
   ...
   Caused by: org.apache.spark.SparkException: Failed to get 
broadcast_6_piece0 of broadcast_6
   ...
```

`recovery with file input stream` - seems to be the same problem as this one.
`SPARK-18220: read Hive orc table with varchar column` - I could not see the cause at first glance, but it seems unrelated to this problem because the test apparently does not use any path.
`rolling file appender - size-based rolling (compressed)` - this test seems possibly flaky. It passed in https://ci.appveyor.com/project/spark-test/spark/build/503-6DEDA384-4A91-45CD-AD26-EE0757D3D2AC/job/etb359vk0fwbrqgo
`recover from node failures with replication` - this test seems possibly flaky too. It passed in https://ci.appveyor.com/project/spark-test/spark/build/503-6DEDA384-4A91-45CD-AD26-EE0757D3D2AC/job/etb359vk0fwbrqgo

Other test failures are dealt with in https://github.com/apache/spark/pull/16501.

Let me add a fix for the first one in `CheckpointSuite` after verifying it.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request #16451: [SPARK-18922][SQL][CORE][STREAMING][TESTS] Fix al...

2017-01-08 Thread HyukjinKwon
Github user HyukjinKwon commented on a diff in the pull request:

https://github.com/apache/spark/pull/16451#discussion_r95078070
  
--- Diff: 
streaming/src/test/scala/org/apache/spark/streaming/CheckpointSuite.scala ---
@@ -629,7 +629,7 @@ class CheckpointSuite extends TestSuiteBase with 
DStreamCheckpointTester
 ssc.graph.getInputStreams().head.asInstanceOf[FileInputDStream[_, 
_, _]]
   val filenames = 
fileInputDStream.batchTimeToSelectedFiles.synchronized
  { fileInputDStream.batchTimeToSelectedFiles.values.flatten }
-  filenames.map(_.split(File.separator).last.toInt).toSeq.sorted
+  filenames.map(_.split("/").last.toInt).toSeq.sorted
--- End diff --

This is always "/" because `FileInputDStream.batchTimeToSelectedFiles` is in the form `/a/b/c` via https://github.com/apache/spark/blob/39e2bad6a866d27c3ca594d15e574a1da3ee84cc/streaming/src/main/scala/org/apache/spark/streaming/dstream/FileInputDStream.scala#L207

```scala
scala> import org.apache.hadoop.fs.Path
import org.apache.hadoop.fs.Path

scala> new Path("C:\\a\\b\\c").toString
res0: String = C:/a/b/c
```


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request #16451: [SPARK-18922][SQL][CORE][STREAMING][TESTS] Fix al...

2017-01-08 Thread HyukjinKwon
Github user HyukjinKwon commented on a diff in the pull request:

https://github.com/apache/spark/pull/16451#discussion_r95078082
  
--- Diff: 
streaming/src/test/scala/org/apache/spark/streaming/CheckpointSuite.scala ---
@@ -755,7 +755,14 @@ class CheckpointSuite extends TestSuiteBase with 
DStreamCheckpointTester
 assert(outputBuffer.asScala.flatten.toSet === expectedOutput.toSet)
   }
 } finally {
-  Utils.deleteRecursively(testDir)
+  try {
+// As the driver shuts down in the middle of processing above, 
`testDir` is not closed
+// correctly which causes the test failure on Windows.
+Utils.deleteRecursively(testDir)
+  } catch {
+case e: IOException if Utils.isWindows =>
+  logWarning(e.getMessage)
+  }
--- End diff --

Here, it seems the driver shuts down in the middle of processing for testing purposes, so it ends up not closing it.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #16451: [SPARK-18922][SQL][CORE][STREAMING][TESTS] Fix all ident...

2017-01-08 Thread HyukjinKwon
Github user HyukjinKwon commented on the issue:

https://github.com/apache/spark/pull/16451
  
Build started: [TESTS] `org.apache.spark.streaming.CheckpointSuite` 
[![PR-16451](https://ci.appveyor.com/api/projects/status/github/spark-test/spark?branch=19AF9EA6-D757-4E20-A27E-A6A86D6401EA&svg=true)](https://ci.appveyor.com/project/spark-test/spark/branch/19AF9EA6-D757-4E20-A27E-A6A86D6401EA)


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #16451: [SPARK-18922][SQL][CORE][STREAMING][TESTS] Fix all ident...

2017-01-08 Thread HyukjinKwon
Github user HyukjinKwon commented on the issue:

https://github.com/apache/spark/pull/16451
  
(FWIW, I am pretty confident about this PR for now.)


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #16501: [WIP][SPARK-19117][TESTS] Skip the tests using script tr...

2017-01-08 Thread HyukjinKwon
Github user HyukjinKwon commented on the issue:

https://github.com/apache/spark/pull/16501
  
Build started: [TESTS] `ALL` 
[![PR-16501](https://ci.appveyor.com/api/projects/status/github/spark-test/spark?branch=16201E76-7067-408C-A359-61B524963D3B&svg=true)](https://ci.appveyor.com/project/spark-test/spark/branch/16201E76-7067-408C-A359-61B524963D3B)


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #16502: Branch 2.1

2017-01-08 Thread HyukjinKwon
Github user HyukjinKwon commented on the issue:

https://github.com/apache/spark/pull/16502
  
Hi @bupt2012, it seems this PR was opened by mistake. Could you please close it?


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #16501: [WIP][SPARK-19117][TESTS] Skip the tests using script tr...

2017-01-08 Thread HyukjinKwon
Github user HyukjinKwon commented on the issue:

https://github.com/apache/spark/pull/16501
  
Build started: [TESTS] `org.apache.spark.sql.hive.execution.SQLQuerySuite` 
[![PR-16501](https://ci.appveyor.com/api/projects/status/github/spark-test/spark?branch=82B29FFC-BA11-4AA9-ACFC-F529FFAADB12&svg=true)](https://ci.appveyor.com/project/spark-test/spark/branch/82B29FFC-BA11-4AA9-ACFC-F529FFAADB12)
Build started: [TESTS] `org.apache.spark.sql.hive.execution.HiveQuerySuite` 
[![PR-16501](https://ci.appveyor.com/api/projects/status/github/spark-test/spark?branch=1D8572C0-31FF-43ED-AC2E-476C76127BF7&svg=true)](https://ci.appveyor.com/project/spark-test/spark/branch/1D8572C0-31FF-43ED-AC2E-476C76127BF7)
Build started: [TESTS] 
`org.apache.spark.sql.catalyst.LogicalPlanToSQLSuite` 
[![PR-16501](https://ci.appveyor.com/api/projects/status/github/spark-test/spark?branch=0E18DDFB-25FC-4868-B3CD-F18E29A0618D&svg=true)](https://ci.appveyor.com/project/spark-test/spark/branch/0E18DDFB-25FC-4868-B3CD-F18E29A0618D)
Build started: [TESTS] 
`org.apache.spark.sql.hive.execution.ScriptTransformationSuite` 
[![PR-16501](https://ci.appveyor.com/api/projects/status/github/spark-test/spark?branch=3935B8AA-FEB7-4232-91D5-B9C8C13E596F&svg=true)](https://ci.appveyor.com/project/spark-test/spark/branch/3935B8AA-FEB7-4232-91D5-B9C8C13E596F)


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #16429: [SPARK-19019][PYTHON] Fix hijacked `collections.namedtup...

2017-01-08 Thread HyukjinKwon
Github user HyukjinKwon commented on the issue:

https://github.com/apache/spark/pull/16429
  
@joshrosen and @davies, could you review this if you have some time please?


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #16501: [WIP][SPARK-19117][TESTS] Skip the tests using script tr...

2017-01-08 Thread HyukjinKwon
Github user HyukjinKwon commented on the issue:

https://github.com/apache/spark/pull/16501
  
Build started: [TESTS] `org.apache.spark.sql.hive.execution.HiveQuerySuite` 
[![PR-16501](https://ci.appveyor.com/api/projects/status/github/spark-test/spark?branch=072A48DF-9BFF-488E-9510-4FE37B211F68&svg=true)](https://ci.appveyor.com/project/spark-test/spark/branch/072A48DF-9BFF-488E-9510-4FE37B211F68)


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request #16515: [MINOR][PYTHON][EXAMPLE] Fix binary classificatio...

2017-01-09 Thread HyukjinKwon
GitHub user HyukjinKwon opened a pull request:

https://github.com/apache/spark/pull/16515

[MINOR][PYTHON][EXAMPLE] Fix binary classification metrics example to work

## What changes were proposed in this pull request?

The LibSVM datasource loads `ml.linalg.SparseVector`, whereas the example requires `mllib.linalg.SparseVector`. The Scala example, `BinaryClassificationMetricsExample.scala`, is fine.

```
  File 
".../spark/examples/src/main/python/mllib/binary_classification_metrics_example.py",
 line 39, in 
.rdd.map(lambda row: LabeledPoint(row[0], row[1]))
  File ".../spark/python/pyspark/mllib/regression.py", line 54, in __init__
self.features = _convert_to_vector(features)
  File ".../spark/python/pyspark/mllib/linalg/__init__.py", line 80, in 
_convert_to_vector
raise TypeError("Cannot convert type %s into Vector" % type(l))
TypeError: Cannot convert type  
into Vector
```

## How was this patch tested?

Manually via

```
./bin/spark-submit 
examples/src/main/python/mllib/binary_classification_metrics_example.py
```

You can merge this pull request into a Git repository by running:

$ git pull https://github.com/HyukjinKwon/spark minor-example-fix

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/spark/pull/16515.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #16515


commit 9a4fd40609d6ef71dc3fd3db0f72502eb3f070f0
Author: hyukjinkwon 
Date:   2017-01-09T10:31:15Z

Fix binary classification metrics example to work




---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request #16515: [MINOR][PYTHON][EXAMPLE] Fix binary classificatio...

2017-01-09 Thread HyukjinKwon
Github user HyukjinKwon commented on a diff in the pull request:

https://github.com/apache/spark/pull/16515#discussion_r95132326
  
--- Diff: 
examples/src/main/python/mllib/binary_classification_metrics_example.py ---
@@ -18,25 +18,20 @@
 Binary Classification Metrics Example.
 """
 from __future__ import print_function
-from pyspark.sql import SparkSession
+from pyspark import SparkContext
 # $example on$
 from pyspark.mllib.classification import LogisticRegressionWithLBFGS
 from pyspark.mllib.evaluation import BinaryClassificationMetrics
-from pyspark.mllib.regression import LabeledPoint
+from pyspark.mllib.util import MLUtils
 # $example off$
 
 if __name__ == "__main__":
-spark = SparkSession\
-.builder\
-.appName("BinaryClassificationMetricsExample")\
-.getOrCreate()
+sc = SparkContext(appName="BinaryClassificationMetricsExample")
--- End diff --

I just used `SparkContext` to be consistent with other examples.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #16515: [MINOR][PYTHON][EXAMPLE] Fix binary classification metri...

2017-01-09 Thread HyukjinKwon
Github user HyukjinKwon commented on the issue:

https://github.com/apache/spark/pull/16515
  
@yanboliang Could I ask you to take a look, please?


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #16515: [MINOR][PYTHON][EXAMPLE] Fix binary classification metri...

2017-01-09 Thread HyukjinKwon
Github user HyukjinKwon commented on the issue:

https://github.com/apache/spark/pull/16515
  
Hm, actually, it seems there are more. Let me open a JIRA and sweep them.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request #16515: [SPARK-19134][PYTHON][EXAMPLE] Fix several Python...

2017-01-09 Thread HyukjinKwon
Github user HyukjinKwon commented on a diff in the pull request:

https://github.com/apache/spark/pull/16515#discussion_r95140703
  
--- Diff: 
examples/src/main/python/mllib/binary_classification_metrics_example.py ---
@@ -18,25 +18,20 @@
 Binary Classification Metrics Example.
 """
 from __future__ import print_function
-from pyspark.sql import SparkSession
+from pyspark import SparkContext
 # $example on$
 from pyspark.mllib.classification import LogisticRegressionWithLBFGS
 from pyspark.mllib.evaluation import BinaryClassificationMetrics
-from pyspark.mllib.regression import LabeledPoint
+from pyspark.mllib.util import MLUtils
 # $example off$
 
 if __name__ == "__main__":
-spark = SparkSession\
-.builder\
-.appName("BinaryClassificationMetricsExample")\
-.getOrCreate()
+sc = SparkContext(appName="BinaryClassificationMetricsExample")
--- End diff --

Yes, to the best of my understanding.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #16515: [SPARK-19134][EXAMPLE] Fix several sql, mllib and status...

2017-01-10 Thread HyukjinKwon
Github user HyukjinKwon commented on the issue:

https://github.com/apache/spark/pull/16515
  
Thank you all!


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request #16553: [SPARK-9435][SQL] Reuse function in Java UDF to c...

2017-01-11 Thread HyukjinKwon
GitHub user HyukjinKwon opened a pull request:

https://github.com/apache/spark/pull/16553

[SPARK-9435][SQL] Reuse function in Java UDF to correctly support 
expressions that require equality comparison

## What changes were proposed in this pull request?

Currently, running the codes in Java

```java
spark.udf().register("inc", new UDF1() {
  @Override
  public Long call(Long i) {
return i + 1;
  }
}, DataTypes.LongType);

spark.range(10).toDF("x").createOrReplaceTempView("tmp");
Row result = spark.sql("SELECT inc(x) FROM tmp GROUP BY inc(x)").head();
Assert.assertEquals(7, result.getLong(0));
```

fails as below:

```
org.apache.spark.sql.AnalysisException: expression 'tmp.`x`' is neither 
present in the group by, nor is it an aggregate function. Add to group by or 
wrap in first() (or first_value) if you don't care which value you get.;;
Aggregate [UDF(x#19L)], [UDF(x#19L) AS UDF(x)#23L]
+- SubqueryAlias tmp, `tmp`
   +- Project [id#16L AS x#19L]
  +- Range (0, 10, step=1, splits=Some(8))

at 
org.apache.spark.sql.catalyst.analysis.CheckAnalysis$class.failAnalysis(CheckAnalysis.scala:40)
at 
org.apache.spark.sql.catalyst.analysis.Analyzer.failAnalysis(Analyzer.scala:57)
```

The root cause is that we were creating the function every time it needed to be built, as below:

```scala
scala> def inc(i: Int) = i + 1
inc: (i: Int)Int

scala> (inc(_: Int)).hashCode
res15: Int = 1231799381

scala> (inc(_: Int)).hashCode
res16: Int = 2109839984

scala> (inc(_: Int)) == (inc(_: Int))
res17: Boolean = false
```

This seems to lead to the comparison failure between `ScalaUDF`s created from the Java UDF API, for example, in `Expression.semanticEquals`.

The Scala one already seems fine.

Both can be tested easily as below if any reviewer is comfortable with 
Scala:

```scala
val df = Seq((1, 10), (2, 11), (3, 12)).toDF("x", "y")
val javaUDF = new UDF1[Int, Int]  {
  override def call(i: Int): Int = i + 1
}
// spark.udf.register("inc", javaUDF, IntegerType) // Uncomment this for 
Java API
// spark.udf.register("inc", (i: Int) => i + 1)// Uncomment this for 
Scala API
df.createOrReplaceTempView("tmp")
spark.sql("SELECT inc(y) FROM tmp GROUP BY inc(y)").show()
```

## How was this patch tested?

Unit test in `JavaUDFSuite.java`.

You can merge this pull request into a Git repository by running:

$ git pull https://github.com/HyukjinKwon/spark SPARK-9435

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/spark/pull/16553.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #16553


commit 30ed14f38b5c38091d07d0e014a49e494aeb73cc
Author: hyukjinkwon 
Date:   2017-01-11T18:02:08Z

Reuse function in Java UDF to support correctly expression equality 
comparison




---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #16553: [SPARK-9435][SQL] Reuse function in Java UDF to correctl...

2017-01-11 Thread HyukjinKwon
Github user HyukjinKwon commented on the issue:

https://github.com/apache/spark/pull/16553
  
cc @marmbrus, I just saw you in the JIRA. Could you please take a look?


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request #16553: [SPARK-9435][SQL] Reuse function in Java UDF to c...

2017-01-11 Thread HyukjinKwon
Github user HyukjinKwon commented on a diff in the pull request:

https://github.com/apache/spark/pull/16553#discussion_r95694410
  
--- Diff: 
sql/core/src/main/scala/org/apache/spark/sql/UDFRegistration.scala ---
@@ -488,219 +488,241 @@ class UDFRegistration private[sql] 
(functionRegistry: FunctionRegistry) extends
* @since 1.3.0
*/
   def register(name: String, f: UDF1[_, _], returnType: DataType): Unit = {
+val func = f.asInstanceOf[UDF1[Any, Any]].call(_: Any)
--- End diff --

Ah, sure. Thanks!


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #16429: [SPARK-19019][PYTHON] Fix hijacked `collections.namedtup...

2017-01-11 Thread HyukjinKwon
Github user HyukjinKwon commented on the issue:

https://github.com/apache/spark/pull/16429
  
gentle ping @joshrosen and @davies 





[GitHub] spark pull request #16553: [SPARK-9435][SQL] Reuse function in Java UDF to c...

2017-01-11 Thread HyukjinKwon
Github user HyukjinKwon commented on a diff in the pull request:

https://github.com/apache/spark/pull/16553#discussion_r95705218
  
--- Diff: sql/core/src/main/scala/org/apache/spark/sql/UDFRegistration.scala ---
@@ -109,9 +109,10 @@ class UDFRegistration private[sql] (functionRegistry: FunctionRegistry) extends
  | * @since 1.3.0
  | */
  |def register(name: String, f: UDF$i[$extTypeArgs, _], returnType: DataType): Unit = {
+ |  val func = f$anyCast.call($anyParams)
  |  functionRegistry.registerFunction(
  |name,
- |(e: Seq[Expression]) => ScalaUDF(f$anyCast.call($anyParams), returnType, e))
+ |(e: Seq[Expression]) => ScalaUDF(func, returnType, e))
--- End diff --

I verified this by copying and pasting the generated code over the current changes and checking that there was no diff.
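
For readers following the template, this is roughly how the generated `UDF1` overload should read after this change (a sketch reconstructed from the diff above; it assumes the surrounding `UDFRegistration` class with its `functionRegistry` field): `func` is built once per `register` call, and the same instance is captured by the expression builder instead of a fresh `f.call(_)` being created for every lookup.

```scala
import org.apache.spark.sql.api.java.UDF1
import org.apache.spark.sql.catalyst.expressions.{Expression, ScalaUDF}
import org.apache.spark.sql.types.DataType

def register(name: String, f: UDF1[_, _], returnType: DataType): Unit = {
  // Wrap the Java UDF exactly once; every ScalaUDF built below shares this instance.
  val func = f.asInstanceOf[UDF1[Any, Any]].call(_: Any)
  functionRegistry.registerFunction(
    name,
    (e: Seq[Expression]) => ScalaUDF(func, returnType, e))
}
```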





[GitHub] spark issue #14451: [SPARK-16848][SQL] Check schema validation for user-spec...

2017-01-11 Thread HyukjinKwon
Github user HyukjinKwon commented on the issue:

https://github.com/apache/spark/pull/14451
  
I just rebased this. Could this be merged, by any chance, after the build passes?





[GitHub] spark issue #16502: Branch 2.1

2017-01-11 Thread HyukjinKwon
Github user HyukjinKwon commented on the issue:

https://github.com/apache/spark/pull/16502
  
Hi @bupt2012, could you just click the "Close pull request" button below?





[GitHub] spark issue #14451: [SPARK-16848][SQL] Check schema validation for user-spec...

2017-01-11 Thread HyukjinKwon
Github user HyukjinKwon commented on the issue:

https://github.com/apache/spark/pull/14451
  
Thank you @gatorsmile!





[GitHub] spark pull request #16584: [SPARK-19221][PROJECT INFRA][R] Add winutils bina...

2017-01-13 Thread HyukjinKwon
GitHub user HyukjinKwon opened a pull request:

https://github.com/apache/spark/pull/16584

[SPARK-19221][PROJECT INFRA][R] Add winutils binaries to the path in 
AppVeyor tests for Hadoop libraries to call native codes properly

## What changes were proposed in this pull request?

It seems the Hadoop libraries need the winutils binaries on the path in order to call their native libraries properly.

It is not a problem in the tests for now because we are only testing SparkR on Windows via AppVeyor, but it can become a problem if we run the Scala tests via AppVeyor, as below:

```
 - SPARK-18220: read Hive orc table with varchar column *** FAILED *** (3 seconds, 937 milliseconds)
   org.apache.spark.sql.execution.QueryExecutionException: FAILED: Execution Error, return code -101 from org.apache.hadoop.hive.ql.exec.mr.MapRedTask. org.apache.hadoop.io.nativeio.NativeIO$Windows.access0(Ljava/lang/String;I)Z
   at org.apache.spark.sql.hive.client.HiveClientImpl$$anonfun$runHive$1.apply(HiveClientImpl.scala:625)
   at org.apache.spark.sql.hive.client.HiveClientImpl$$anonfun$runHive$1.apply(HiveClientImpl.scala:609)
   at org.apache.spark.sql.hive.client.HiveClientImpl$$anonfun$withHiveState$1.apply(HiveClientImpl.scala:283)
   ...
```

This PR proposes to add it to the `Path` for AppVeyor tests.
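
(As a side note for anyone running these suites locally on Windows rather than on AppVeyor, here is a rough sketch of an alternative test-setup workaround; the `C:\hadoop` location and the helper object below are made up for illustration, and Hadoop resolves `winutils.exe` from `%HADOOP_HOME%\bin` or the `hadoop.home.dir` system property.)

```scala
// Hypothetical helper: point Hadoop at a local distribution that contains bin\winutils.exe
// (and hadoop.dll) before any Hadoop class is initialized, when HADOOP_HOME is not set.
object WinutilsSetup {
  def ensureWinutils(hadoopHome: String = "C:\\hadoop"): Unit = {
    val isWindows = System.getProperty("os.name").toLowerCase.contains("win")
    if (isWindows && sys.env.get("HADOOP_HOME").isEmpty &&
        System.getProperty("hadoop.home.dir") == null) {
      System.setProperty("hadoop.home.dir", hadoopHome)
    }
  }
}
```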

## How was this patch tested?

Manually via AppVeyor.

**Before**


https://ci.appveyor.com/project/spark-test/spark/build/572-windows-complete/job/c4vrysr5uvj2hgu7

**After**


https://ci.appveyor.com/project/spark-test/spark/build/549-windows-complete/job/gc8a1pjua2bc4i8m



You can merge this pull request into a Git repository by running:

$ git pull https://github.com/HyukjinKwon/spark set-path-appveyor

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/spark/pull/16584.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #16584


commit 7a8163cd006f669371b5188e8b55fc12fe5375d3
Author: hyukjinkwon 
Date:   2017-01-13T16:10:03Z

Add hadoop.dll in AppVeyor tests for Hadoop libraries to call native libraries properly







[GitHub] spark issue #16584: [SPARK-19221][PROJECT INFRA][R] Add winutils binaries to...

2017-01-13 Thread HyukjinKwon
Github user HyukjinKwon commented on the issue:

https://github.com/apache/spark/pull/16584
  
Here are the SparkR tests, to verify it works fine after being merged into master.

Build started: [SparkR] `ALL` 
[![PR-16584](https://ci.appveyor.com/api/projects/status/github/spark-test/spark?branch=D02D7C5B-ED94-4326-B72C-DDC1229ECC88&svg=true)](https://ci.appveyor.com/project/spark-test/spark/branch/D02D7C5B-ED94-4326-B72C-DDC1229ECC88)

Here is `OrcSourceSuite`, which contains the problematic test that seems to use the winutils binaries.

Build started: [org.apache.spark.sql.hive.orc.OrcSourceSuite] `ALL` 
[![PR-16584](https://ci.appveyor.com/api/projects/status/github/spark-test/spark?branch=ADC9D5F6-726B-4691-B60F-4B8DAB18010B&svg=true)](https://ci.appveyor.com/project/spark-test/spark/branch/ADC9D5F6-726B-4691-B60F-4B8DAB18010B)





[GitHub] spark issue #16584: [SPARK-19221][PROJECT INFRA][R] Add winutils binaries to...

2017-01-13 Thread HyukjinKwon
Github user HyukjinKwon commented on the issue:

https://github.com/apache/spark/pull/16584
  
cc @srowen, this is related to the last test failure I have identified so far.

Let me finish fixing the test failures on Windows (found via [a way I have tested](https://ci.appveyor.com/project/spark-test/spark/build/530-windows-test)) in another PR, including newly introduced ones, (too) flaky ones and potentially missed ones, by showing a green build.





[GitHub] spark issue #16584: [SPARK-19221][PROJECT INFRA][R] Add winutils binaries to...

2017-01-13 Thread HyukjinKwon
Github user HyukjinKwon commented on the issue:

https://github.com/apache/spark/pull/16584
  
cc @shivaram too.





[GitHub] spark issue #16577: [SPARK-19214][SQL] Typed aggregate count output field na...

2017-01-14 Thread HyukjinKwon
Github user HyukjinKwon commented on the issue:

https://github.com/apache/spark/pull/16577
  
(It just rang a bell for me. It seems it is okay to break the default names if required - https://github.com/apache/spark/pull/1#issuecomment-246932928)





[GitHub] spark issue #16553: [SPARK-9435][SQL] Reuse function in Java UDF to correctl...

2017-01-14 Thread HyukjinKwon
Github user HyukjinKwon commented on the issue:

https://github.com/apache/spark/pull/16553
  
@marmbrus, could you take another look when you have some time?





[GitHub] spark issue #16429: [SPARK-19019][PYTHON] Fix hijacked `collections.namedtup...

2017-01-14 Thread HyukjinKwon
Github user HyukjinKwon commented on the issue:

https://github.com/apache/spark/pull/16429
  
gentle ping





[GitHub] spark issue #16578: [SPARK-4502][SQL] Parquet nested column pruning

2017-01-14 Thread HyukjinKwon
Github user HyukjinKwon commented on the issue:

https://github.com/apache/spark/pull/16578
  
Does this take over https://github.com/apache/spark/pull/14957? If so, we might need `Closes #14957` in the PR description so that the merge script closes that one, or we should let the author know that this takes it over.






[GitHub] spark issue #16429: [SPARK-19019][PYTHON] Fix hijacked `collections.namedtup...

2017-01-15 Thread HyukjinKwon
Github user HyukjinKwon commented on the issue:

https://github.com/apache/spark/pull/16429
  
The only added comments are as below:

```python
# __kwdefaults__ contains the default values of keyword-only arguments which are
# introduced from Python 3. The possible cases for __kwdefaults__ in namedtuple
# are as below:
#
# - Does not exist in Python 2.
# - Returns None in <= Python 3.5.x.
# - Returns a dictionary containing the default values to the keys from Python 3.6.x
#   (See https://bugs.python.org/issue25628).
```





[GitHub] spark issue #16429: [SPARK-19019][PYTHON] Fix hijacked `collections.namedtup...

2017-01-15 Thread HyukjinKwon
Github user HyukjinKwon commented on the issue:

https://github.com/apache/spark/pull/16429
  
(I re-ran `./run-tests --python-executables=python3.6` just to be sure.)





[GitHub] spark issue #16553: [SPARK-9435][SQL] Reuse function in Java UDF to correctl...

2017-01-15 Thread HyukjinKwon
Github user HyukjinKwon commented on the issue:

https://github.com/apache/spark/pull/16553
  
@gatorsmile Thanks!





[GitHub] spark pull request #16553: [SPARK-9435][SQL] Reuse function in Java UDF to c...

2017-01-15 Thread HyukjinKwon
Github user HyukjinKwon commented on a diff in the pull request:

https://github.com/apache/spark/pull/16553#discussion_r96137414
  
--- Diff: sql/core/src/test/java/test/org/apache/spark/sql/JavaUDFSuite.java ---
@@ -108,4 +109,24 @@ public void udf3Test() {
 result = spark.sql("SELECT stringLengthTest('test', 'test2')").head();
 Assert.assertEquals(9, result.getInt(0));
   }
+
+  @SuppressWarnings("unchecked")
+  @Test
+  public void udf4Test() {
+spark.udf().register("inc", new UDF1<Long, Long>() {
+  @Override
+  public Long call(Long i) {
+return i + 1;
+  }
+}, DataTypes.LongType);
+
+spark.range(10).toDF("x").createOrReplaceTempView("tmp");
+List<Row> results = spark.sql("SELECT inc(x) FROM tmp GROUP BY inc(x)").collectAsList();
--- End diff --

Sure, makes sense. Thanks!




