[ 
https://issues.apache.org/jira/browse/HUDI-1667?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Lietong Liu updated HUDI-1667:
------------------------------
    Fix Version/s: 0.6.0
      Description: 
When HoodieMergeOnReadRDD reads a record from a base file, it creates a new 
InternalRow based on requiredStructSchema.
{code:java}
// code placeholder
private def createRowWithRequiredSchema(row: InternalRow): InternalRow = {
  val rowToReturn = new SpecificInternalRow(tableState.requiredStructSchema)
  val posIterator = requiredFieldPosition.iterator
  var curIndex = 0
  tableState.requiredStructSchema.foreach(
    f => {
      val curPos = posIterator.next()
      val curField = row.get(curPos, f.dataType)
      rowToReturn.update(curIndex, curField)
      curIndex = curIndex + 1
    }
  )
  rowToReturn
}

{code}
Hudi does not call isNullAt before reading the value of each field here.

If vectorization is enabled, row is a *ColumnarBatchRow*. 
*ColumnarBatchRow* may return a non-null value even when the field is null. 
As a result, Hudi may write a non-null value into a field that should be null.

  was:
When HoodieMergeOnReadRDD reads a record from a base file, it creates a new 
InternalRow based on requiredStructSchema.
{code:java}
// code placeholder
private def createRowWithRequiredSchema(row: InternalRow): InternalRow = {
  val rowToReturn = new SpecificInternalRow(tableState.requiredStructSchema)
  val posIterator = requiredFieldPosition.iterator
  var curIndex = 0
  tableState.requiredStructSchema.foreach(
    f => {
      val curPos = posIterator.next()
      val curField = row.get(curPos, f.dataType)
      rowToReturn.update(curIndex, curField)
      curIndex = curIndex + 1
    }
  )
  rowToReturn
}

{code}
 


> Fix bug when HoodieMergeOnReadRDD read record from base file, Hoodie may set 
> non-null value in field which is null if vectorization is enabled.
> -----------------------------------------------------------------------------------------------------------------------------------------------
>
>                 Key: HUDI-1667
>                 URL: https://issues.apache.org/jira/browse/HUDI-1667
>             Project: Apache Hudi
>          Issue Type: Bug
>          Components: Common Core
>            Reporter: Lietong Liu
>            Priority: Major
>             Fix For: 0.6.0
>
>
> When HoodieMergeOnReadRDD reads a record from a base file, it creates a new 
> InternalRow based on requiredStructSchema.
> {code:java}
> // code placeholder
> private def createRowWithRequiredSchema(row: InternalRow): InternalRow = {
>   val rowToReturn = new SpecificInternalRow(tableState.requiredStructSchema)
>   val posIterator = requiredFieldPosition.iterator
>   var curIndex = 0
>   tableState.requiredStructSchema.foreach(
>     f => {
>       val curPos = posIterator.next()
>       val curField = row.get(curPos, f.dataType)
>       rowToReturn.update(curIndex, curField)
>       curIndex = curIndex + 1
>     }
>   )
>   rowToReturn
> }
> {code}
> Hudi does not call isNullAt before reading the value of each field here.
> If vectorization is enabled, row is a *ColumnarBatchRow*. 
> *ColumnarBatchRow* may return a non-null value even when the field is 
> null. As a result, Hudi may write a non-null value into a field that should be null.
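
The behaviour described in the quoted issue can be reproduced with a small, 
dependency-free model. This is only a sketch and not Hudi or Spark code: 
SimpleRow, VectorLikeRow, projectUnchecked and projectNullSafe are hypothetical 
names introduced for illustration. VectorLikeRow assumes a reader that keeps 
nullness in a separate mask and returns the raw slot content from get, which 
is the behaviour the issue attributes to ColumnarBatchRow. In the real method 
the equivalent guard would presumably be row.isNullAt(curPos) followed by 
rowToReturn.setNullAt(curIndex); the actual patch may differ.

```scala
// Hypothetical, dependency-free model of the problem; not Hudi or Spark code.
object NullSafeProjection {

  // Minimal stand-in for a row reader: callers are expected to consult
  // isNullAt before trusting the value returned by get.
  trait SimpleRow {
    def isNullAt(pos: Int): Boolean
    def get(pos: Int): Any
  }

  // Mimics a vectorized row: values live in fixed slots, nullness lives in a
  // separate mask, and get returns the slot content without checking the mask.
  final class VectorLikeRow(slots: Array[Any], nullMask: Array[Boolean]) extends SimpleRow {
    def isNullAt(pos: Int): Boolean = nullMask(pos)
    def get(pos: Int): Any = slots(pos)
  }

  // Unchecked copy, analogous to the method quoted above.
  def projectUnchecked(row: SimpleRow, positions: Seq[Int]): Array[Any] =
    positions.map(row.get).toArray

  // Null-aware copy: a null field stays null in the projected row.
  def projectNullSafe(row: SimpleRow, positions: Seq[Int]): Array[Any] =
    positions.map(pos => if (row.isNullAt(pos)) null else row.get(pos)).toArray

  def main(args: Array[String]): Unit = {
    // Slot 1 is flagged as null but still holds a default value of 0.
    val row = new VectorLikeRow(Array[Any](7, 0, "a"), Array(false, true, false))
    val positions = Seq(2, 1)
    println(projectUnchecked(row, positions).mkString(", ")) // prints: a, 0
    println(projectNullSafe(row, positions).mkString(", "))  // prints: a, null
  }
}
```

The unchecked projection leaks the placeholder 0 from the null slot, while the 
null-aware projection keeps the field null.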



--
This message was sent by Atlassian Jira
(v8.3.4#803005)
