[GitHub] spark pull request #14313: [SPARK-16674][SQL] Avoid per-record type dispatch...

cloud-fan Sun, 24 Jul 2016 20:27:04 -0700

Github user cloud-fan commented on a diff in the pull request:

    https://github.com/apache/spark/pull/14313#discussion_r72005471
  
    --- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/jdbc/JDBCRDD.scala ---
    @@ -407,84 +496,8 @@ private[sql] class JDBCRDD(
             var i = 0
             while (i < conversions.length) {
               val pos = i + 1
    -          conversions(i) match {
    -            case BooleanConversion => mutableRow.setBoolean(i, rs.getBoolean(pos))
    -            case DateConversion =>
    -              // DateTimeUtils.fromJavaDate does not handle null value, so we need to check it.
    -              val dateVal = rs.getDate(pos)
    -              if (dateVal != null) {
    -                mutableRow.setInt(i, DateTimeUtils.fromJavaDate(dateVal))
    -              } else {
    -                mutableRow.update(i, null)
    -              }
    -            // When connecting with Oracle DB through JDBC, the precision and scale of BigDecimal
    -            // object returned by ResultSet.getBigDecimal is not correctly matched to the table
    -            // schema reported by ResultSetMetaData.getPrecision and ResultSetMetaData.getScale.
    -            // If inserting values like 19999 into a column with NUMBER(12, 2) type, you get through
    -            // a BigDecimal object with scale as 0. But the dataframe schema has correct type as
    -            // DecimalType(12, 2). Thus, after saving the dataframe into parquet file and then
    -            // retrieve it, you will get wrong result 199.99.
    -            // So it is needed to set precision and scale for Decimal based on JDBC metadata.
    -            case DecimalConversion(p, s) =>
    -              val decimalVal = rs.getBigDecimal(pos)
    -              if (decimalVal == null) {
    -                mutableRow.update(i, null)
    -              } else {
    -                mutableRow.update(i, Decimal(decimalVal, p, s))
    -              }
    -            case DoubleConversion => mutableRow.setDouble(i, rs.getDouble(pos))
    -            case FloatConversion => mutableRow.setFloat(i, rs.getFloat(pos))
    -            case IntegerConversion => mutableRow.setInt(i, rs.getInt(pos))
    -            case LongConversion => mutableRow.setLong(i, rs.getLong(pos))
    -            // TODO(davies): use getBytes for better performance, if the encoding is UTF-8
    -            case StringConversion => mutableRow.update(i, UTF8String.fromString(rs.getString(pos)))
    -            case TimestampConversion =>
    -              val t = rs.getTimestamp(pos)
    -              if (t != null) {
    -                mutableRow.setLong(i, DateTimeUtils.fromJavaTimestamp(t))
    -              } else {
    -                mutableRow.update(i, null)
    -              }
    -            case BinaryConversion => mutableRow.update(i, rs.getBytes(pos))
    -            case BinaryLongConversion =>
    -              val bytes = rs.getBytes(pos)
    -              var ans = 0L
    -              var j = 0
    -              while (j < bytes.size) {
    -                ans = 256 * ans + (255 & bytes(j))
    -                j = j + 1
    -              }
    -              mutableRow.setLong(i, ans)
    -            case ArrayConversion(elementConversion) =>
    -              val array = rs.getArray(pos).getArray
    -              if (array != null) {
    -                val data = elementConversion match {
    -                  case TimestampConversion =>
    -                    array.asInstanceOf[Array[java.sql.Timestamp]].map { timestamp =>
    -                      nullSafeConvert(timestamp, DateTimeUtils.fromJavaTimestamp)
    -                    }
    -                  case StringConversion =>
    -                    array.asInstanceOf[Array[java.lang.String]]
    -                      .map(UTF8String.fromString)
    -                  case DateConversion =>
    -                    array.asInstanceOf[Array[java.sql.Date]].map { date =>
    -                      nullSafeConvert(date, DateTimeUtils.fromJavaDate)
    -                    }
    -                  case DecimalConversion(p, s) =>
    -                    array.asInstanceOf[Array[java.math.BigDecimal]].map { decimal =>
    -                      nullSafeConvert[java.math.BigDecimal](decimal, d => Decimal(d, p, s))
    -                    }
    -                  case BinaryLongConversion =>
    -                    throw new IllegalArgumentException(s"Unsupported array element conversion $i")
    -                  case _: ArrayConversion =>
    -                    throw new IllegalArgumentException("Nested arrays unsupported")
    -                  case _ => array.asInstanceOf[Array[Any]]
    -                }
    -                mutableRow.update(i, new GenericArrayData(data))
    -              } else {
    -                mutableRow.update(i, null)
    -              }
    -          }
    +          val value = conversions(i).apply(rs, pos)
    +          mutableRow.update(i, value)
    --- End diff --
    
    Previously we used `mutableRow.setXXX` for primitive types, but now we
    always call `update`, which takes a boxed value. Is there a way we can
    still avoid the boxing here?
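
One possible way to keep the per-column dispatch out of the record loop while still avoiding boxing is to have each column's conversion write into the row itself, rather than return a value that is then passed to `update`. The following is only a minimal sketch under stated assumptions: `FakeResultSet`, `FakeMutableRow`, `ColumnSetter`, and `ColumnSetters` are hypothetical stand-ins invented for illustration so the example runs without Spark or a database; they are not the real `java.sql.ResultSet` or Spark row classes, and this is not necessarily what the PR ends up doing.

```scala
// Hypothetical stand-in for java.sql.ResultSet (1-based column positions).
final class FakeResultSet(ints: Map[Int, Int], strings: Map[Int, String]) {
  def getInt(pos: Int): Int = ints(pos)
  def getString(pos: Int): String = strings.getOrElse(pos, null)
}

// Hypothetical stand-in for a mutable row with a primitive and a reference path.
final class FakeMutableRow(numFields: Int) {
  private val ints = new Array[Int](numFields)
  private val refs = new Array[AnyRef](numFields)
  def setInt(i: Int, v: Int): Unit = { ints(i) = v }      // primitive store, no boxing
  def update(i: Int, v: AnyRef): Unit = { refs(i) = v }   // reference store
  def getInt(i: Int): Int = ints(i)
  def get(i: Int): AnyRef = refs(i)
}

// A plain trait with primitive-typed parameters. A setter returns Unit and
// writes into the row directly, so no primitive value crosses an Any boundary.
trait ColumnSetter {
  def set(rs: FakeResultSet, row: FakeMutableRow, i: Int): Unit
}

object ColumnSetters {
  // Chosen once per column (e.g. from the schema), not once per record.
  val intSetter: ColumnSetter = new ColumnSetter {
    def set(rs: FakeResultSet, row: FakeMutableRow, i: Int): Unit =
      row.setInt(i, rs.getInt(i + 1))
  }

  val stringSetter: ColumnSetter = new ColumnSetter {
    def set(rs: FakeResultSet, row: FakeMutableRow, i: Int): Unit =
      row.update(i, rs.getString(i + 1))
  }

  // The per-record loop only invokes the precomputed setters.
  def fillRow(rs: FakeResultSet, row: FakeMutableRow, setters: Array[ColumnSetter]): Unit = {
    var i = 0
    while (i < setters.length) {
      setters(i).set(rs, row, i)
      i += 1
    }
  }
}
```

The sketch uses a dedicated trait rather than a `Function3`, because Scala's `Function3` is not specialized for primitives and would box the `Int` index on each call. A caller would build the `Array[ColumnSetter]` once from the schema and then call `ColumnSetters.fillRow(rs, row, setters)` for every record.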


