[ 
https://issues.apache.org/jira/browse/HUDI-1667?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Lietong Liu updated HUDI-1667:
------------------------------
    Status: In Progress  (was: Open)

> Fix bug: when HoodieMergeOnReadRDD reads a record from the base file, Hoodie may set 
> a non-null value in a field that is null if vectorization is enabled.
> -----------------------------------------------------------------------------------------------------------------------------------------------
>
>                 Key: HUDI-1667
>                 URL: https://issues.apache.org/jira/browse/HUDI-1667
>             Project: Apache Hudi
>          Issue Type: Bug
>          Components: Common Core
>            Reporter: Lietong Liu
>            Priority: Major
>              Labels: pull-request-available
>             Fix For: 0.6.0
>
>
> When HoodieMergeOnReadRDD reads a record from the base file, it creates a new 
> InternalRow based on requiredStructSchema.
> {code:java}
> // Code placeholder
> private def createRowWithRequiredSchema(row: InternalRow): InternalRow = {
>   val rowToReturn = new SpecificInternalRow(tableState.requiredStructSchema)
>   val posIterator = requiredFieldPosition.iterator
>   var curIndex = 0
>   tableState.requiredStructSchema.foreach(
>     f => {
>       val curPos = posIterator.next()
>       val curField = row.get(curPos, f.dataType)
>       rowToReturn.update(curIndex, curField)
>       curIndex = curIndex + 1
>     }
>   )
>   rowToReturn
> }
> {code}
> Hoodie doesn't check whether a field is null before getting its value here.
> If vectorization is enabled, row is a *ColumnarBatchRow*. 
> *ColumnarBatchRow* may return a non-null value even if the value of the field is 
> null, so Hoodie may set a non-null value in a field that is actually null.
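
A minimal sketch of a null-safe projection, assuming the fix is to consult isNullAt before reading each field. This is not necessarily the patch attached to this issue. The object name NullSafeProjection, the method project, and the parameter positions (standing in for requiredFieldPosition) are hypothetical names used only for illustration; InternalRow, SpecificInternalRow and StructType are the Spark classes used in the snippet above.

```scala
import org.apache.spark.sql.catalyst.InternalRow
import org.apache.spark.sql.catalyst.expressions.SpecificInternalRow
import org.apache.spark.sql.types.StructType

// Hypothetical helper, not part of Hudi: copies the required fields of `row`
// into a new row while preserving nulls.
object NullSafeProjection {
  // `positions(i)` is the ordinal in `row` of the i-th field of `requiredSchema`
  // (an assumption mirroring requiredFieldPosition in the original code).
  def project(row: InternalRow, requiredSchema: StructType, positions: Seq[Int]): InternalRow = {
    val rowToReturn = new SpecificInternalRow(requiredSchema)
    requiredSchema.zip(positions).zipWithIndex.foreach { case ((field, pos), idx) =>
      if (row.isNullAt(pos)) {
        // Ask the source row about nullness first; a columnar-backed row may
        // hand back a non-null value from get() for a slot that is null.
        rowToReturn.setNullAt(idx)
      } else {
        rowToReturn.update(idx, row.get(pos, field.dataType))
      }
    }
    rowToReturn
  }
}
```

Zipping the schema with the positions also removes the mutable curIndex counter and the manual iterator, so the source ordinal and the target index cannot drift apart.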



--
This message was sent by Atlassian Jira
(v8.3.4#803005)
