[GitHub] spark pull request #16351: [SPARK-18943][SQL] Avoid per-record type dispatch...
Github user asfgit closed the pull request at: https://github.com/apache/spark/pull/16351 --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
Github user HyukjinKwon commented on a diff in the pull request: https://github.com/apache/spark/pull/16351#discussion_r93814412 --- Diff: sql/core/src/test/scala/org/apache/spark/sql/execution/datasources/csv/CSVTypeCastSuite.scala --- @@ -66,144 +66,138 @@ class CSVTypeCastSuite extends SparkFunSuite { } test("Nullable types are handled") { -assertNull( - CSVTypeCast.castTo("-", "_1", ByteType, nullable = true, CSVOptions("nullValue", "-"))) -assertNull( - CSVTypeCast.castTo("-", "_1", ShortType, nullable = true, CSVOptions("nullValue", "-"))) -assertNull( - CSVTypeCast.castTo("-", "_1", IntegerType, nullable = true, CSVOptions("nullValue", "-"))) -assertNull( - CSVTypeCast.castTo("-", "_1", LongType, nullable = true, CSVOptions("nullValue", "-"))) -assertNull( - CSVTypeCast.castTo("-", "_1", FloatType, nullable = true, CSVOptions("nullValue", "-"))) -assertNull( - CSVTypeCast.castTo("-", "_1", DoubleType, nullable = true, CSVOptions("nullValue", "-"))) -assertNull( - CSVTypeCast.castTo("-", "_1", BooleanType, nullable = true, CSVOptions("nullValue", "-"))) -assertNull( - CSVTypeCast.castTo("-", "_1", DecimalType.DoubleDecimal, true, CSVOptions("nullValue", "-"))) -assertNull( - CSVTypeCast.castTo("-", "_1", TimestampType, nullable = true, CSVOptions("nullValue", "-"))) -assertNull( - CSVTypeCast.castTo("-", "_1", DateType, nullable = true, CSVOptions("nullValue", "-"))) -assertNull( - CSVTypeCast.castTo("-", "_1", StringType, nullable = true, CSVOptions("nullValue", "-"))) -assertNull( - CSVTypeCast.castTo(null, "_1", IntegerType, nullable = true, CSVOptions("nullValue", "-"))) - -// casting a null to not nullable field should throw an exception. 
-var message = intercept[RuntimeException] { - CSVTypeCast.castTo(null, "_1", IntegerType, nullable = false, CSVOptions("nullValue", "-")) -}.getMessage -assert(message.contains("null value found but field _1 is not nullable.")) - -message = intercept[RuntimeException] { - CSVTypeCast.castTo("-", "_1", StringType, nullable = false, CSVOptions("nullValue", "-")) -}.getMessage -assert(message.contains("null value found but field _1 is not nullable.")) - } - - test("String type should also respect `nullValue`") { --- End diff -- Some tests in ``String type should also respect `nullValue` `` were duplicated and others were folded into `Nullable types are handled`.
Github user HyukjinKwon commented on a diff in the pull request: https://github.com/apache/spark/pull/16351#discussion_r93814350 --- Diff: sql/core/src/test/scala/org/apache/spark/sql/execution/datasources/csv/CSVTypeCastSuite.scala --- @@ -66,144 +66,141 @@ class CSVTypeCastSuite extends SparkFunSuite { } test("Nullable types are handled") { -assertNull( - CSVTypeCast.castTo("-", "_1", ByteType, nullable = true, CSVOptions("nullValue", "-"))) -assertNull( - CSVTypeCast.castTo("-", "_1", ShortType, nullable = true, CSVOptions("nullValue", "-"))) -assertNull( - CSVTypeCast.castTo("-", "_1", IntegerType, nullable = true, CSVOptions("nullValue", "-"))) -assertNull( - CSVTypeCast.castTo("-", "_1", LongType, nullable = true, CSVOptions("nullValue", "-"))) -assertNull( - CSVTypeCast.castTo("-", "_1", FloatType, nullable = true, CSVOptions("nullValue", "-"))) -assertNull( - CSVTypeCast.castTo("-", "_1", DoubleType, nullable = true, CSVOptions("nullValue", "-"))) -assertNull( - CSVTypeCast.castTo("-", "_1", BooleanType, nullable = true, CSVOptions("nullValue", "-"))) -assertNull( - CSVTypeCast.castTo("-", "_1", DecimalType.DoubleDecimal, true, CSVOptions("nullValue", "-"))) -assertNull( - CSVTypeCast.castTo("-", "_1", TimestampType, nullable = true, CSVOptions("nullValue", "-"))) -assertNull( - CSVTypeCast.castTo("-", "_1", DateType, nullable = true, CSVOptions("nullValue", "-"))) -assertNull( - CSVTypeCast.castTo("-", "_1", StringType, nullable = true, CSVOptions("nullValue", "-"))) -assertNull( - CSVTypeCast.castTo(null, "_1", IntegerType, nullable = true, CSVOptions("nullValue", "-"))) - -// casting a null to not nullable field should throw an exception. 
-var message = intercept[RuntimeException] { - CSVTypeCast.castTo(null, "_1", IntegerType, nullable = false, CSVOptions("nullValue", "-")) -}.getMessage -assert(message.contains("null value found but field _1 is not nullable.")) - -message = intercept[RuntimeException] { - CSVTypeCast.castTo("-", "_1", StringType, nullable = false, CSVOptions("nullValue", "-")) -}.getMessage -assert(message.contains("null value found but field _1 is not nullable.")) +val types = Seq(ByteType, ShortType, IntegerType, LongType, FloatType, DoubleType, + BooleanType, DecimalType.DoubleDecimal, TimestampType, DateType, StringType) + +types.foreach { t => + val converterOne = +CSVTypeCast.makeConverter("_1", t, nullable = true, CSVOptions("nullValue", "-")) + assertNull(converterOne.apply("-")) + assertNull(converterOne.apply(null)) + + // casting a null to not nullable field should throw an exception. + val converterTwo = +CSVTypeCast.makeConverter("_1", t, nullable = false, CSVOptions("nullValue", "-")) + + var message = intercept[RuntimeException] { +converterTwo.apply("-") + }.getMessage + assert(message.contains("null value found but field _1 is not nullable.")) + + message = intercept[RuntimeException] { +converterTwo.apply(null) + }.getMessage + assert(message.contains("null value found but field _1 is not nullable.")) +} } test("String type should also respect `nullValue`") { --- End diff -- I see, some of the tests overlap. Let me clean this up and add some comments.
Github user cloud-fan commented on a diff in the pull request: https://github.com/apache/spark/pull/16351#discussion_r93791725 --- Diff: sql/core/src/test/scala/org/apache/spark/sql/execution/datasources/csv/CSVTypeCastSuite.scala --- @@ -66,144 +66,141 @@ class CSVTypeCastSuite extends SparkFunSuite { } test("Nullable types are handled") { -assertNull( - CSVTypeCast.castTo("-", "_1", ByteType, nullable = true, CSVOptions("nullValue", "-"))) -assertNull( - CSVTypeCast.castTo("-", "_1", ShortType, nullable = true, CSVOptions("nullValue", "-"))) -assertNull( - CSVTypeCast.castTo("-", "_1", IntegerType, nullable = true, CSVOptions("nullValue", "-"))) -assertNull( - CSVTypeCast.castTo("-", "_1", LongType, nullable = true, CSVOptions("nullValue", "-"))) -assertNull( - CSVTypeCast.castTo("-", "_1", FloatType, nullable = true, CSVOptions("nullValue", "-"))) -assertNull( - CSVTypeCast.castTo("-", "_1", DoubleType, nullable = true, CSVOptions("nullValue", "-"))) -assertNull( - CSVTypeCast.castTo("-", "_1", BooleanType, nullable = true, CSVOptions("nullValue", "-"))) -assertNull( - CSVTypeCast.castTo("-", "_1", DecimalType.DoubleDecimal, true, CSVOptions("nullValue", "-"))) -assertNull( - CSVTypeCast.castTo("-", "_1", TimestampType, nullable = true, CSVOptions("nullValue", "-"))) -assertNull( - CSVTypeCast.castTo("-", "_1", DateType, nullable = true, CSVOptions("nullValue", "-"))) -assertNull( - CSVTypeCast.castTo("-", "_1", StringType, nullable = true, CSVOptions("nullValue", "-"))) -assertNull( - CSVTypeCast.castTo(null, "_1", IntegerType, nullable = true, CSVOptions("nullValue", "-"))) - -// casting a null to not nullable field should throw an exception. 
-var message = intercept[RuntimeException] { - CSVTypeCast.castTo(null, "_1", IntegerType, nullable = false, CSVOptions("nullValue", "-")) -}.getMessage -assert(message.contains("null value found but field _1 is not nullable.")) - -message = intercept[RuntimeException] { - CSVTypeCast.castTo("-", "_1", StringType, nullable = false, CSVOptions("nullValue", "-")) -}.getMessage -assert(message.contains("null value found but field _1 is not nullable.")) +val types = Seq(ByteType, ShortType, IntegerType, LongType, FloatType, DoubleType, + BooleanType, DecimalType.DoubleDecimal, TimestampType, DateType, StringType) + +types.foreach { t => + val converterOne = +CSVTypeCast.makeConverter("_1", t, nullable = true, CSVOptions("nullValue", "-")) + assertNull(converterOne.apply("-")) + assertNull(converterOne.apply(null)) + + // casting a null to not nullable field should throw an exception. + val converterTwo = +CSVTypeCast.makeConverter("_1", t, nullable = false, CSVOptions("nullValue", "-")) + + var message = intercept[RuntimeException] { +converterTwo.apply("-") + }.getMessage + assert(message.contains("null value found but field _1 is not nullable.")) + + message = intercept[RuntimeException] { +converterTwo.apply(null) + }.getMessage + assert(message.contains("null value found but field _1 is not nullable.")) +} } test("String type should also respect `nullValue`") { --- End diff -- This test is not needed, right? You already included the string type in https://github.com/apache/spark/pull/16351/files#diff-b6ea04714c153af78c2acbab6265cda8R70
Github user cloud-fan commented on a diff in the pull request: https://github.com/apache/spark/pull/16351#discussion_r93791600 --- Diff: sql/core/src/test/scala/org/apache/spark/sql/execution/datasources/csv/CSVTypeCastSuite.scala --- @@ -66,144 +66,141 @@ class CSVTypeCastSuite extends SparkFunSuite { } test("Nullable types are handled") { -assertNull( - CSVTypeCast.castTo("-", "_1", ByteType, nullable = true, CSVOptions("nullValue", "-"))) -assertNull( - CSVTypeCast.castTo("-", "_1", ShortType, nullable = true, CSVOptions("nullValue", "-"))) -assertNull( - CSVTypeCast.castTo("-", "_1", IntegerType, nullable = true, CSVOptions("nullValue", "-"))) -assertNull( - CSVTypeCast.castTo("-", "_1", LongType, nullable = true, CSVOptions("nullValue", "-"))) -assertNull( - CSVTypeCast.castTo("-", "_1", FloatType, nullable = true, CSVOptions("nullValue", "-"))) -assertNull( - CSVTypeCast.castTo("-", "_1", DoubleType, nullable = true, CSVOptions("nullValue", "-"))) -assertNull( - CSVTypeCast.castTo("-", "_1", BooleanType, nullable = true, CSVOptions("nullValue", "-"))) -assertNull( - CSVTypeCast.castTo("-", "_1", DecimalType.DoubleDecimal, true, CSVOptions("nullValue", "-"))) -assertNull( - CSVTypeCast.castTo("-", "_1", TimestampType, nullable = true, CSVOptions("nullValue", "-"))) -assertNull( - CSVTypeCast.castTo("-", "_1", DateType, nullable = true, CSVOptions("nullValue", "-"))) -assertNull( - CSVTypeCast.castTo("-", "_1", StringType, nullable = true, CSVOptions("nullValue", "-"))) -assertNull( - CSVTypeCast.castTo(null, "_1", IntegerType, nullable = true, CSVOptions("nullValue", "-"))) - -// casting a null to not nullable field should throw an exception. 
-var message = intercept[RuntimeException] { - CSVTypeCast.castTo(null, "_1", IntegerType, nullable = false, CSVOptions("nullValue", "-")) -}.getMessage -assert(message.contains("null value found but field _1 is not nullable.")) - -message = intercept[RuntimeException] { - CSVTypeCast.castTo("-", "_1", StringType, nullable = false, CSVOptions("nullValue", "-")) -}.getMessage -assert(message.contains("null value found but field _1 is not nullable.")) +val types = Seq(ByteType, ShortType, IntegerType, LongType, FloatType, DoubleType, + BooleanType, DecimalType.DoubleDecimal, TimestampType, DateType, StringType) + +types.foreach { t => + val converterOne = --- End diff -- Isn't it more natural to write `converter1` and `converter2`?
Github user cloud-fan commented on a diff in the pull request: https://github.com/apache/spark/pull/16351#discussion_r93381128 --- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/csv/CSVInferSchema.scala --- @@ -215,84 +215,133 @@ private[csv] object CSVInferSchema { } private[csv] object CSVTypeCast { + // A `ValueConverter` is responsible for converting the given value to a desired type. + private type ValueConverter = String => Any /** - * Casts given string datum to specified type. - * Currently we do not support complex types (ArrayType, MapType, StructType). + * Create converters which cast each given string datum to each specified type in given schema. + * Currently, we do not support complex types (`ArrayType`, `MapType`, `StructType`). * - * For string types, this is simply the datum. For other types. + * For string types, this is simply the datum. + * For other types, this is converted into the value according to the type. * For other nullable types, returns null if it is null or equals to the value specified * in `nullValue` option. * - * @param datum string value - * @param name field name in schema. - * @param castType data type to cast `datum` into. - * @param nullable nullability for the field. + * @param schema schema that contains data types to cast the given value into. * @param options CSV options. */ - def castTo( + def makeConverters( + schema: StructType, + options: CSVOptions = CSVOptions()): Array[ValueConverter] = { +schema.map(f => makeConverter(f.name, f.dataType, f.nullable, options)).toArray + } + + /** + * Create a converter which converts the string value to a value according to a desired type. 
+ */ + def makeConverter( + name: String, + dataType: DataType, + nullable: Boolean = true, + options: CSVOptions = CSVOptions()): ValueConverter = dataType match { +case _: ByteType => (d: String) => + nullSafeDatum(d, name, nullable, options) { case datum => --- End diff -- nit: `nullSafeDatum(d, name, nullable, options)(_.toByte)`
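The shorthand in the nit works because of the helper's curried shape: the second parameter list takes the conversion function, so `_.toByte` can replace the full pattern-matching block. A standalone sketch of the idea (the `nullSafeDatum` body below is a guess modeled on the error message in the tests, with the `nullValue` hardcoded to `"-"`; the real helper also takes `CSVOptions`):

```scala
object NullSafeSketch {
  // Hypothetical helper modeled on the diff: the second parameter list takes
  // the actual conversion, which runs only when the datum is non-null.
  // The null marker ("-") is hardcoded here; the real code reads it from CSVOptions.
  def nullSafeDatum(datum: String, name: String, nullable: Boolean)
                   (convert: String => Any): Any = {
    if (datum == null || datum == "-") {
      if (!nullable) {
        throw new RuntimeException(s"null value found but field $name is not nullable.")
      }
      null
    } else {
      convert(datum)
    }
  }

  def main(args: Array[String]): Unit = {
    // Verbose form from the diff:
    val a = nullSafeDatum("1", "_1", nullable = true) { case datum => datum.toByte }
    // The reviewer's suggested shorthand -- same behavior:
    val b = nullSafeDatum("1", "_1", nullable = true)(_.toByte)
    println(a == b)  // true
  }
}
```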
Github user cloud-fan commented on a diff in the pull request: https://github.com/apache/spark/pull/16351#discussion_r93381049 --- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/csv/CSVInferSchema.scala --- @@ -215,84 +215,133 @@ private[csv] object CSVInferSchema { } private[csv] object CSVTypeCast { + // A `ValueConverter` is responsible for converting the given value to a desired type. + private type ValueConverter = String => Any /** - * Casts given string datum to specified type. - * Currently we do not support complex types (ArrayType, MapType, StructType). + * Create converters which cast each given string datum to each specified type in given schema. + * Currently, we do not support complex types (`ArrayType`, `MapType`, `StructType`). * - * For string types, this is simply the datum. For other types. + * For string types, this is simply the datum. + * For other types, this is converted into the value according to the type. * For other nullable types, returns null if it is null or equals to the value specified * in `nullValue` option. * - * @param datum string value - * @param name field name in schema. - * @param castType data type to cast `datum` into. - * @param nullable nullability for the field. + * @param schema schema that contains data types to cast the given value into. * @param options CSV options. */ - def castTo( + def makeConverters( + schema: StructType, + options: CSVOptions = CSVOptions()): Array[ValueConverter] = { +schema.map(f => makeConverter(f.name, f.dataType, f.nullable, options)).toArray + } + + /** + * Create a converter which converts the string value to a value according to a desired type. 
+ */ + def makeConverter( + name: String, + dataType: DataType, + nullable: Boolean = true, + options: CSVOptions = CSVOptions()): ValueConverter = dataType match { +case _: ByteType => (d: String) => + nullSafeDatum(d, name, nullable, options) { case datum => +datum.toByte + } + +case _: ShortType => (d: String) => + nullSafeDatum(d, name, nullable, options) { case datum => +datum.toShort + } + +case _: IntegerType => (d: String) => + nullSafeDatum(d, name, nullable, options) { case datum => +datum.toInt + } + +case _: LongType => (d: String) => + nullSafeDatum(d, name, nullable, options) { case datum => +datum.toLong + } + +case _: FloatType => (d: String) => + nullSafeDatum(d, name, nullable, options) { +case options.nanValue => Float.NaN +case options.negativeInf => Float.NegativeInfinity +case options.positiveInf => Float.PositiveInfinity +case datum => + Try(datum.toFloat) + .getOrElse(NumberFormat.getInstance(Locale.US).parse(datum).floatValue()) + } + +case _: DoubleType => (d: String) => + nullSafeDatum(d, name, nullable, options) { +case options.nanValue => Double.NaN +case options.negativeInf => Double.NegativeInfinity +case options.positiveInf => Double.PositiveInfinity +case datum => + Try(datum.toDouble) + .getOrElse(NumberFormat.getInstance(Locale.US).parse(datum).doubleValue()) + } + +case _: BooleanType => (d: String) => + nullSafeDatum(d, name, nullable, options) { case datum => +datum.toBoolean + } + +case dt: DecimalType => (d: String) => + nullSafeDatum(d, name, nullable, options) { case datum => +val value = new BigDecimal(datum.replaceAll(",", "")) +Decimal(value, dt.precision, dt.scale) + } + +case _: TimestampType => (d: String) => + nullSafeDatum(d, name, nullable, options) { case datum => +// This one will lose microseconds parts. +// See https://issues.apache.org/jira/browse/SPARK-10681. 
+Try(options.timestampFormat.parse(datum).getTime * 1000L) + .getOrElse { + // If it fails to parse, then tries the way used in 2.0 and 1.x for backwards + // compatibility. + DateTimeUtils.stringToTime(datum).getTime * 1000L +} + } + +case _: DateType => (d: String) => + nullSafeDatum(d, name, nullable, options) { case datum => +// This one will lose microseconds parts. +// See https://issues.apache.org/jira/browse/SPARK-10681.x + Try(DateTimeUtils.millisToDays(options.dateFormat.parse(datum).getTime)) +
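The timestamp branch above uses a Try-then-fallback idiom: parse with the configured format first, and only on failure fall back to the legacy parser. The same idiom can be illustrated standalone (`java.text.SimpleDateFormat` stands in for the option-provided format; the real code uses `options.timestampFormat` and `DateTimeUtils.stringToTime`):

```scala
import java.text.SimpleDateFormat
import scala.util.Try

object TimestampFallbackSketch {
  // Stand-ins for the configured format and the legacy fallback parser.
  private val primary  = new SimpleDateFormat("yyyy-MM-dd'T'HH:mm:ss")
  private val fallback = new SimpleDateFormat("yyyy/MM/dd")

  // Microseconds since the epoch, mirroring the `getTime * 1000L` in the diff:
  // try the configured format first; fall back only when parsing fails.
  def toMicros(datum: String): Long =
    Try(primary.parse(datum).getTime * 1000L)
      .getOrElse(fallback.parse(datum).getTime * 1000L)

  def main(args: Array[String]): Unit = {
    println(toMicros("2016-12-20T11:54:05"))  // parsed by the primary format
    println(toMicros("2016/12/20"))           // primary fails, fallback succeeds
  }
}
```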
Github user HyukjinKwon commented on a diff in the pull request: https://github.com/apache/spark/pull/16351#discussion_r93230247 --- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/csv/CSVInferSchema.scala --- @@ -215,84 +215,133 @@ private[csv] object CSVInferSchema { } private[csv] object CSVTypeCast { + // A `ValueConverter` is responsible for converting the given value to a desired type. + private type ValueConverter = String => Any /** - * Casts given string datum to specified type. - * Currently we do not support complex types (ArrayType, MapType, StructType). + * Create converters which cast each given string datum to each specified type in given schema. + * Currently, we do not support complex types (`ArrayType`, `MapType`, `StructType`). * - * For string types, this is simply the datum. For other types. + * For string types, this is simply the datum. + * For other types, this is converted into the value according to the type. * For other nullable types, returns null if it is null or equals to the value specified * in `nullValue` option. * - * @param datum string value - * @param name field name in schema. - * @param castType data type to cast `datum` into. - * @param nullable nullability for the field. + * @param schema schema that contains data types to cast the given value into. * @param options CSV options. */ - def castTo( + private[sql] def makeConverters( --- End diff -- Oops, I can remove this access modifier. I'll remove it and the one below soon.
GitHub user HyukjinKwon opened a pull request: https://github.com/apache/spark/pull/16351 [SPARK-18943][SQL] Avoid per-record type dispatch in CSV when reading ## What changes were proposed in this pull request? `CSVRelation.csvParser` does type dispatch for each value in each row. We can avoid this because the schema is already kept in `CSVRelation`. So, this PR proposes that converters are created first according to the schema and then applied to each record. ## How was this patch tested? Tests in `CSVTypeCastSuite` and `CSVRelation`. You can merge this pull request into a Git repository by running: $ git pull https://github.com/HyukjinKwon/spark type-dispatch Alternatively you can review and apply these changes as the patch at: https://github.com/apache/spark/pull/16351.patch To close this pull request, make a commit to your master/trunk branch with (at least) the following in the commit message: This closes #16351 commit e72d1bc419dfd6da7f6e298d5b5412dba69eb5ad Author: hyukjinkwon Date: 2016-12-20T11:54:05Z Avoid per-record type dispatch in CSV when reading
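The refactoring the PR describes can be sketched in miniature: pattern-match on the data type once per field to build a converter closure, then apply only the closures per record. The `DataType` objects and `makeConverter` below are simplified stand-ins for illustration, not the actual Spark code:

```scala
object ConverterSketch {
  type ValueConverter = String => Any

  // Hypothetical, simplified stand-ins for Spark's DataType hierarchy;
  // the real code matches on org.apache.spark.sql.types.DataType.
  sealed trait DataType
  case object IntegerType extends DataType
  case object DoubleType  extends DataType
  case object StringType  extends DataType

  // Type dispatch happens here, once per field -- not once per value.
  def makeConverter(dataType: DataType): ValueConverter = dataType match {
    case IntegerType => (d: String) => d.toInt
    case DoubleType  => (d: String) => d.toDouble
    case StringType  => (d: String) => d
  }

  // Per record, only the pre-built closures run; no pattern match is evaluated.
  def convertRow(converters: Seq[ValueConverter], row: Array[String]): Seq[Any] =
    row.indices.map(i => converters(i)(row(i)))

  def main(args: Array[String]): Unit = {
    val schema = Seq(IntegerType, DoubleType, StringType)
    val converters = schema.map(makeConverter)  // built once from the schema
    println(convertRow(converters, Array("1", "2.5", "a")))
  }
}
```

The saving is that the `dataType match` runs a fixed number of times per schema rather than once for every value in every row.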