GitHub user yhuai commented on a diff in the pull request:

    https://github.com/apache/spark/pull/1439#discussion_r15067812
  
    --- Diff: sql/hive/src/main/scala/org/apache/spark/sql/hive/execution/HiveTableScan.scala ---
    @@ -67,95 +61,12 @@ case class HiveTableScan(
       }
     
       @transient
    -  private[this] val hadoopReader = new HadoopTableReader(relation.tableDesc, context)
    -
    -  /**
    -   * The hive object inspector for this table, which can be used to extract values from the
    -   * serialized row representation.
    -   */
    -  @transient
    -  private[this] lazy val objectInspector =
    -    relation.tableDesc.getDeserializer.getObjectInspector.asInstanceOf[StructObjectInspector]
    -
    -  /**
    -   * Functions that extract the requested attributes from the hive output.  Partitioned values are
    -   * casted from string to its declared data type.
    -   */
    -  @transient
    -  protected lazy val attributeFunctions: Seq[(Any, Array[String]) => Any] = {
    -    attributes.map { a =>
    -      val ordinal = relation.partitionKeys.indexOf(a)
    -      if (ordinal >= 0) {
    -        val dataType = relation.partitionKeys(ordinal).dataType
    -        (_: Any, partitionKeys: Array[String]) => {
    -          castFromString(partitionKeys(ordinal), dataType)
    -        }
    -      } else {
    -        val ref = objectInspector.getAllStructFieldRefs
    -          .find(_.getFieldName == a.name)
    -          .getOrElse(sys.error(s"Can't find attribute $a"))
    -        val fieldObjectInspector = ref.getFieldObjectInspector
    -
    -        val unwrapHiveData = fieldObjectInspector match {
    -          case _: HiveVarcharObjectInspector =>
    -            (value: Any) => value.asInstanceOf[HiveVarchar].getValue
    -          case _: HiveDecimalObjectInspector =>
    -            (value: Any) => BigDecimal(value.asInstanceOf[HiveDecimal].bigDecimalValue())
    -          case _ =>
    -            identity[Any] _
    -        }
    -
    -        (row: Any, _: Array[String]) => {
    -          val data = objectInspector.getStructFieldData(row, ref)
    -          val hiveData = unwrapData(data, fieldObjectInspector)
    -          if (hiveData != null) unwrapHiveData(hiveData) else null
    -        }
    -      }
    -    }
    -  }
    +  private[this] val hadoopReader = new HadoopTableReader(attributes, relation, context)
     
       private[this] def castFromString(value: String, dataType: DataType) = {
         Cast(Literal(value), dataType).eval(null)
       }
     
    -  private def addColumnMetadataToConf(hiveConf: HiveConf) {
    --- End diff --
    
    I would keep it. It is important to set the needed columns in the conf so that RCFile and ORC readers know which columns can be skipped. Also, it seems `hiveConf.set(serdeConstants.LIST_COLUMN_TYPES, columnTypeNames)` and `hiveConf.set(serdeConstants.LIST_COLUMNS, columnInternalNames)` will be used to push down filters.
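    
    For illustration, here is a minimal sketch of what `addColumnMetadataToConf` accomplishes. The `neededColumnIDs` and `columnTypeNames` parameters are hypothetical stand-ins for what the scan would derive from its projected attributes, and the Hive calls assume the `ColumnProjectionUtils`/`serdeConstants` API of the Hive version Spark builds against here:
    
    ```scala
    import scala.collection.JavaConverters._
    
    import org.apache.hadoop.hive.conf.HiveConf
    import org.apache.hadoop.hive.serde.serdeConstants
    import org.apache.hadoop.hive.serde2.ColumnProjectionUtils
    
    // Sketch: record which columns a scan needs so that columnar input
    // formats (RCFile, ORC) can skip the rest at read time.
    def addColumnMetadataToConf(
        hiveConf: HiveConf,
        neededColumnIDs: Seq[Integer],  // hypothetical: ordinals of the projected columns
        columnTypeNames: Seq[String]    // hypothetical: their Hive type names
      ): Unit = {
      // Readers consult the appended column IDs to prune all other columns.
      ColumnProjectionUtils.appendReadColumnIDs(hiveConf, neededColumnIDs.asJava)
    
      // Hive's internal column names follow the "_col<ordinal>" convention.
      val columnInternalNames =
        neededColumnIDs.map(id => HiveConf.getColumnInternalName(id.intValue())).mkString(",")
    
      // Schema of the selected columns; as noted above, these two properties
      // are also consulted when pushing down filters.
      hiveConf.set(serdeConstants.LIST_COLUMN_TYPES, columnTypeNames.mkString(","))
      hiveConf.set(serdeConstants.LIST_COLUMNS, columnInternalNames)
    }
    ```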

