[jira] [Assigned] (HUDI-1667) Fix bug when HoodieMergeOnReadRDD read record from base file, Hoodie may set non-null value in field which is null if vectorization is enabled.
[ https://issues.apache.org/jira/browse/HUDI-1667?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Lietong Liu reassigned HUDI-1667:
-
    Assignee: Lietong Liu

> Fix bug when HoodieMergeOnReadRDD read record from base file, Hoodie may set non-null value in field which is null if vectorization is enabled.
> ---
>
>        Key: HUDI-1667
>        URL: https://issues.apache.org/jira/browse/HUDI-1667
>    Project: Apache Hudi
> Issue Type: Bug
> Components: Common Core
>   Reporter: Lietong Liu
>   Assignee: Lietong Liu
>   Priority: Major
>     Labels: pull-request-available
>    Fix For: 0.6.0
>
>
> When HoodieMergeOnReadRDD reads a record from the base file, it creates a new InternalRow based on requiredStructSchema.
> {code:java}
> // code placeholder
> private def createRowWithRequiredSchema(row: InternalRow): InternalRow = {
>   val rowToReturn = new SpecificInternalRow(tableState.requiredStructSchema)
>   val posIterator = requiredFieldPosition.iterator
>   var curIndex = 0
>   tableState.requiredStructSchema.foreach(
>     f => {
>       val curPos = posIterator.next()
>       val curField = row.get(curPos, f.dataType)
>       rowToReturn.update(curIndex, curField)
>       curIndex = curIndex + 1
>     }
>   )
>   rowToReturn
> }
> {code}
> Hoodie does not check isNull when it gets the value of each field here.
> If vectorization is enabled, the row is a *ColumnarBatchRow*. A *ColumnarBatchRow* may return a non-null value even if the value of the field is null, so Hoodie may set a non-null value in a field that is null.

-- This message was sent by Atlassian Jira (v8.3.4#803005)
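The null handling described in HUDI-1667 can be illustrated with a small, self-contained Java model. This is only a sketch under stated assumptions, not the actual Hudi patch: `ColumnarRow`, `project`, and `NullSafeRowCopy` are hypothetical names invented for this example, and the model assumes a columnar row keeps primitive values and a separate null mask, so that a plain `get` can hand back a default value for a null field. The copy routine consults `isNullAt` before reading each value.

```java
import java.util.Arrays;

public class NullSafeRowCopy {

    /** Hypothetical stand-in for a columnar row: primitive storage plus a separate null mask. */
    static final class ColumnarRow {
        private final int[] values;     // a slot holds a default (0) when the field is null
        private final boolean[] nulls;  // true where the field is null

        ColumnarRow(int[] values, boolean[] nulls) {
            this.values = values;
            this.nulls = nulls;
        }

        boolean isNullAt(int pos) {
            return nulls[pos];
        }

        /** Returns the stored value without consulting the null mask. */
        Object get(int pos) {
            return values[pos];
        }
    }

    /** Copies only the required positions, writing null where the source field is null. */
    static Object[] project(ColumnarRow row, int[] requiredPositions) {
        Object[] out = new Object[requiredPositions.length];
        for (int i = 0; i < requiredPositions.length; i++) {
            int pos = requiredPositions[i];
            // Check the null mask first; get() alone would return the default 0.
            out[i] = row.isNullAt(pos) ? null : row.get(pos);
        }
        return out;
    }

    public static void main(String[] args) {
        ColumnarRow row = new ColumnarRow(new int[] {7, 0, 42}, new boolean[] {false, true, false});
        System.out.println(Arrays.toString(project(row, new int[] {2, 1}))); // prints [42, null]
    }
}
```

Without the `isNullAt` check, the second projected field in this model would come back as 0 instead of null, which is the kind of silent corruption the issue describes.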
[jira] [Resolved] (HUDI-1667) Fix bug when HoodieMergeOnReadRDD read record from base file, Hoodie may set non-null value in field which is null if vectorization is enabled.
[ https://issues.apache.org/jira/browse/HUDI-1667?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Lietong Liu resolved HUDI-1667.
---
    Resolution: Fixed

> Fix bug when HoodieMergeOnReadRDD read record from base file, Hoodie may set non-null value in field which is null if vectorization is enabled.
> ---
>
>        Key: HUDI-1667
>        URL: https://issues.apache.org/jira/browse/HUDI-1667
>    Project: Apache Hudi
> Issue Type: Bug
> Components: Common Core
>   Reporter: Lietong Liu
>   Assignee: Lietong Liu
>   Priority: Major
>     Labels: pull-request-available
>    Fix For: 0.6.0
[jira] [Updated] (HUDI-1667) Fix bug when HoodieMergeOnReadRDD read record from base file, Hoodie may set non-null value in field which is null if vectorization is enabled.
[ https://issues.apache.org/jira/browse/HUDI-1667?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Lietong Liu updated HUDI-1667:
--
    Status: In Progress  (was: Open)

> Fix bug when HoodieMergeOnReadRDD read record from base file, Hoodie may set non-null value in field which is null if vectorization is enabled.
> ---
>
>        Key: HUDI-1667
>        URL: https://issues.apache.org/jira/browse/HUDI-1667
>    Project: Apache Hudi
> Issue Type: Bug
> Components: Common Core
>   Reporter: Lietong Liu
>   Priority: Major
>     Labels: pull-request-available
>    Fix For: 0.6.0
[jira] [Updated] (HUDI-1667) Fix bug when HoodieMergeOnReadRDD read record from base file, Hoodie may set non-null value in field which is null if vectorization is enabled.
[ https://issues.apache.org/jira/browse/HUDI-1667?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Lietong Liu updated HUDI-1667:
--
    Fix Version/s: 0.6.0
      Description: 
When HoodieMergeOnReadRDD reads a record from the base file, it creates a new InternalRow based on requiredStructSchema.
{code:java}
// code placeholder
private def createRowWithRequiredSchema(row: InternalRow): InternalRow = {
  val rowToReturn = new SpecificInternalRow(tableState.requiredStructSchema)
  val posIterator = requiredFieldPosition.iterator
  var curIndex = 0
  tableState.requiredStructSchema.foreach(
    f => {
      val curPos = posIterator.next()
      val curField = row.get(curPos, f.dataType)
      rowToReturn.update(curIndex, curField)
      curIndex = curIndex + 1
    }
  )
  rowToReturn
}
{code}
Hoodie does not check isNull when it gets the value of each field here.
If vectorization is enabled, the row is a *ColumnarBatchRow*. A *ColumnarBatchRow* may return a non-null value even if the value of the field is null, so Hoodie may set a non-null value in a field that is null.

  was:
When HoodieMergeOnReadRDD reads a record from the base file, it creates a new InternalRow based on requiredStructSchema.
{code:java}
// code placeholder
private def createRowWithRequiredSchema(row: InternalRow): InternalRow = {
  val rowToReturn = new SpecificInternalRow(tableState.requiredStructSchema)
  val posIterator = requiredFieldPosition.iterator
  var curIndex = 0
  tableState.requiredStructSchema.foreach(
    f => {
      val curPos = posIterator.next()
      val curField = row.get(curPos, f.dataType)
      rowToReturn.update(curIndex, curField)
      curIndex = curIndex + 1
    }
  )
  rowToReturn
}
{code}

> Fix bug when HoodieMergeOnReadRDD read record from base file, Hoodie may set non-null value in field which is null if vectorization is enabled.
> ---
>
>        Key: HUDI-1667
>        URL: https://issues.apache.org/jira/browse/HUDI-1667
>    Project: Apache Hudi
> Issue Type: Bug
> Components: Common Core
>   Reporter: Lietong Liu
>   Priority: Major
>    Fix For: 0.6.0
[jira] [Created] (HUDI-1667) Fix bug when HoodieMergeOnReadRDD read record from base file, Hoodie may set non-null value in field which is null if vectorization is enabled.
Lietong Liu created HUDI-1667:
-
       Summary: Fix bug when HoodieMergeOnReadRDD read record from base file, Hoodie may set non-null value in field which is null if vectorization is enabled.
           Key: HUDI-1667
           URL: https://issues.apache.org/jira/browse/HUDI-1667
       Project: Apache Hudi
    Issue Type: Bug
    Components: Common Core
      Reporter: Lietong Liu

When HoodieMergeOnReadRDD reads a record from the base file, it creates a new InternalRow based on requiredStructSchema.
{code:java}
// code placeholder
private def createRowWithRequiredSchema(row: InternalRow): InternalRow = {
  val rowToReturn = new SpecificInternalRow(tableState.requiredStructSchema)
  val posIterator = requiredFieldPosition.iterator
  var curIndex = 0
  tableState.requiredStructSchema.foreach(
    f => {
      val curPos = posIterator.next()
      val curField = row.get(curPos, f.dataType)
      rowToReturn.update(curIndex, curField)
      curIndex = curIndex + 1
    }
  )
  rowToReturn
}
{code}
[jira] [Resolved] (HUDI-1583) Hudi will skip remaining log files if there is logFile with zero size in logFileList when merge on read.
[ https://issues.apache.org/jira/browse/HUDI-1583?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Lietong Liu resolved HUDI-1583.
---
    Resolution: Fixed

> Hudi will skip remaining log files if there is logFile with zero size in logFileList when merge on read.
> -
>
>              Key: HUDI-1583
>              URL: https://issues.apache.org/jira/browse/HUDI-1583
>          Project: Apache Hudi
>       Issue Type: Bug
>       Components: Common Core
> Affects Versions: 0.6.0
>         Reporter: Lietong Liu
>         Priority: Major
>          Fix For: 0.6.0
>
>
> When 'spark.speculation' is enabled, a log file with zero size may exist.
> *HoodieLogFormatReader.hasNext()* returns false when it encounters a log file with zero size, which causes the remaining log files to be skipped.
>
> {code:java}
> @Override
> public boolean hasNext() {
>   if (currentReader == null) {
>     return false;
>   } else if (currentReader.hasNext()) {
>     return true;
>   } else if (logFiles.size() > 0) {
>     try {
>       HoodieLogFile nextLogFile = logFiles.remove(0);
>       // First close previous reader only if readBlockLazily is true
>       if (!readBlocksLazily) {
>         this.currentReader.close();
>       } else {
>         this.prevReadersInOpenState.add(currentReader);
>       }
>       this.currentReader =
>           new HoodieLogFileReader(fs, nextLogFile, readerSchema, bufferSize, readBlocksLazily, false);
>     } catch (IOException io) {
>       throw new HoodieIOException("unable to initialize read with log file ", io);
>     }
>     LOG.info("Moving to the next reader for logfile " + currentReader.getLogFile());
>     return this.currentReader.hasNext() || hasNext();
>   }
>   return false;
> }
> {code}
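The behaviour HUDI-1583 asks for, where an empty log file must not end the scan while other files remain, can be sketched independently of Hudi's classes. The example below is an illustrative model only: `ChainedLogReader`, `readAll`, and the use of in-memory string lists to stand in for log files are assumptions made for this sketch, not Hudi APIs. It uses a loop where the quoted `hasNext()` uses the recursive call `this.currentReader.hasNext() || hasNext()`; both keep advancing until a non-empty file is found or the list is exhausted.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.Deque;
import java.util.Iterator;
import java.util.List;
import java.util.NoSuchElementException;

/** Hypothetical model: iterates over the blocks of several log files in order. */
public class ChainedLogReader implements Iterator<String> {
    private final Deque<List<String>> pendingFiles; // each list models one log file's blocks
    private Iterator<String> current;               // reader for the file being consumed

    ChainedLogReader(List<List<String>> files) {
        this.pendingFiles = new ArrayDeque<>(files);
        this.current = pendingFiles.isEmpty() ? null : pendingFiles.poll().iterator();
    }

    @Override
    public boolean hasNext() {
        if (current == null) {
            return false;
        }
        // Keep advancing: a zero-size file must not end the scan while files remain.
        while (!current.hasNext() && !pendingFiles.isEmpty()) {
            current = pendingFiles.poll().iterator();
        }
        return current.hasNext();
    }

    @Override
    public String next() {
        if (!hasNext()) {
            throw new NoSuchElementException();
        }
        return current.next();
    }

    /** Drains every block from the given files, in order. */
    static List<String> readAll(List<List<String>> files) {
        List<String> out = new ArrayList<>();
        ChainedLogReader reader = new ChainedLogReader(files);
        while (reader.hasNext()) {
            out.add(reader.next());
        }
        return out;
    }

    public static void main(String[] args) {
        List<List<String>> files = Arrays.asList(
            Arrays.asList("a1"),
            Collections.<String>emptyList(),   // models a zero-size log file
            Arrays.asList("c1", "c2"));
        System.out.println(readAll(files)); // prints [a1, c1, c2]
    }
}
```

A reader that checked only the newly opened file and returned its `hasNext()` result directly would stop at the empty second file in this model and never reach `c1` and `c2`.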
[jira] [Updated] (HUDI-1583) Hudi will skip remaining log files if there is logFile with zero size in logFileList when merge on read.
[ https://issues.apache.org/jira/browse/HUDI-1583?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Lietong Liu updated HUDI-1583:
--
    Description: 
When 'spark.speculation' is enabled, a log file with zero size may exist.
*HoodieLogFormatReader.hasNext()* returns false when it encounters a log file with zero size, which causes the remaining log files to be skipped.

{code:java}
@Override
public boolean hasNext() {
  if (currentReader == null) {
    return false;
  } else if (currentReader.hasNext()) {
    return true;
  } else if (logFiles.size() > 0) {
    try {
      HoodieLogFile nextLogFile = logFiles.remove(0);
      // First close previous reader only if readBlockLazily is true
      if (!readBlocksLazily) {
        this.currentReader.close();
      } else {
        this.prevReadersInOpenState.add(currentReader);
      }
      this.currentReader =
          new HoodieLogFileReader(fs, nextLogFile, readerSchema, bufferSize, readBlocksLazily, false);
    } catch (IOException io) {
      throw new HoodieIOException("unable to initialize read with log file ", io);
    }
    LOG.info("Moving to the next reader for logfile " + currentReader.getLogFile());
    return this.currentReader.hasNext() || hasNext();
  }
  return false;
}
{code}

  was:
When `spark.speculation` is enabled, a log file with zero size may exist.
`HoodieLogFormatReader.hasNext()` returns false when it encounters a log file with zero size, which causes the remaining log files to be skipped.
```
@Override
public boolean hasNext() {
  if (currentReader == null) {
    return false;
  } else if (currentReader.hasNext()) {
    return true;
  } else if (logFiles.size() > 0) {
    try {
      HoodieLogFile nextLogFile = logFiles.remove(0);
      // First close previous reader only if readBlockLazily is true
      if (!readBlocksLazily) {
        this.currentReader.close();
      } else {
        this.prevReadersInOpenState.add(currentReader);
      }
      this.currentReader =
          new HoodieLogFileReader(fs, nextLogFile, readerSchema, bufferSize, readBlocksLazily, false);
    } catch (IOException io) {
      throw new HoodieIOException("unable to initialize read with log file ", io);
    }
    LOG.info("Moving to the next reader for logfile " + currentReader.getLogFile());
    return this.currentReader.hasNext() || hasNext();
  }
  return false;
}
```

> Hudi will skip remaining log files if there is logFile with zero size in logFileList when merge on read.
> -
>
>              Key: HUDI-1583
>              URL: https://issues.apache.org/jira/browse/HUDI-1583
>          Project: Apache Hudi
>       Issue Type: Bug
>       Components: Common Core
> Affects Versions: 0.6.0
>         Reporter: Lietong Liu
>         Priority: Major
>          Fix For: 0.6.0
[jira] [Updated] (HUDI-1583) Hudi will skip remaining log files if there is logFile with zero size in logFileList when merge on read.
[ https://issues.apache.org/jira/browse/HUDI-1583?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Lietong Liu updated HUDI-1583:
--
    Description: 
When `spark.speculation` is enabled, a log file with zero size may exist.
`HoodieLogFormatReader.hasNext()` returns false when it encounters a log file with zero size, which causes the remaining log files to be skipped.
```
@Override
public boolean hasNext() {
  if (currentReader == null) {
    return false;
  } else if (currentReader.hasNext()) {
    return true;
  } else if (logFiles.size() > 0) {
    try {
      HoodieLogFile nextLogFile = logFiles.remove(0);
      // First close previous reader only if readBlockLazily is true
      if (!readBlocksLazily) {
        this.currentReader.close();
      } else {
        this.prevReadersInOpenState.add(currentReader);
      }
      this.currentReader =
          new HoodieLogFileReader(fs, nextLogFile, readerSchema, bufferSize, readBlocksLazily, false);
    } catch (IOException io) {
      throw new HoodieIOException("unable to initialize read with log file ", io);
    }
    LOG.info("Moving to the next reader for logfile " + currentReader.getLogFile());
    return this.currentReader.hasNext() || hasNext();
  }
  return false;
}
```

  was:
When `spark.speculation` is enabled, a log file with zero size may exist.
`HoodieLogFormatReader.hasNext()` returns false when it encounters a log file with zero size, which causes the remaining log files to be skipped.

> Hudi will skip remaining log files if there is logFile with zero size in logFileList when merge on read.
> -
>
>              Key: HUDI-1583
>              URL: https://issues.apache.org/jira/browse/HUDI-1583
>          Project: Apache Hudi
>       Issue Type: Bug
>       Components: Common Core
> Affects Versions: 0.6.0
>         Reporter: Lietong Liu
>         Priority: Major
>          Fix For: 0.6.0
[jira] [Updated] (HUDI-1583) Hudi will skip remaining log files if there is logFile with zero size in logFileList when merge on read.
[ https://issues.apache.org/jira/browse/HUDI-1583?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Lietong Liu updated HUDI-1583:
--
    Attachment:     (was: image-2021-02-19-19-07-49-264.png)

> Hudi will skip remaining log files if there is logFile with zero size in logFileList when merge on read.
> -
>
>              Key: HUDI-1583
>              URL: https://issues.apache.org/jira/browse/HUDI-1583
>          Project: Apache Hudi
>       Issue Type: Bug
>       Components: Common Core
> Affects Versions: 0.6.0
>         Reporter: Lietong Liu
>         Priority: Major
>          Fix For: 0.6.0
[jira] [Updated] (HUDI-1583) Hudi will skip remaining log files if there is logFile with zero size in logFileList when merge on read.
[ https://issues.apache.org/jira/browse/HUDI-1583?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Lietong Liu updated HUDI-1583:
--
    Description: 
When `spark.speculation` is enabled, a log file with zero size may exist.
`HoodieLogFormatReader.hasNext()` returns false when it encounters a log file with zero size, which causes the remaining log files to be skipped.

  was:
When `spark.speculation` is enabled, a log file with zero size may exist.
`HoodieLogFormatReader.hasNext()` returns false when it encounters a log file with zero size, which causes the remaining log files to be skipped.

!image-2021-02-19-19-07-49-264.png!

> Hudi will skip remaining log files if there is logFile with zero size in logFileList when merge on read.
> -
>
>              Key: HUDI-1583
>              URL: https://issues.apache.org/jira/browse/HUDI-1583
>          Project: Apache Hudi
>       Issue Type: Bug
>       Components: Common Core
> Affects Versions: 0.6.0
>         Reporter: Lietong Liu
>         Priority: Major
>          Fix For: 0.6.0
[jira] [Updated] (HUDI-1583) Hudi will skip remaining log files if there is logFile with zero size in logFileList when merge on read.
[ https://issues.apache.org/jira/browse/HUDI-1583?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Lietong Liu updated HUDI-1583:
--
    Attachment: image-2021-02-19-19-07-49-264.png

> Hudi will skip remaining log files if there is logFile with zero size in logFileList when merge on read.
> -
>
>              Key: HUDI-1583
>              URL: https://issues.apache.org/jira/browse/HUDI-1583
>          Project: Apache Hudi
>       Issue Type: Bug
>       Components: Common Core
> Affects Versions: 0.6.0
>         Reporter: Lietong Liu
>         Priority: Major
>          Fix For: 0.6.0
>
>      Attachments: image-2021-02-19-19-07-49-264.png
[jira] [Updated] (HUDI-1583) Hudi will skip remaining log files if there is logFile with zero size in logFileList when merge on read.
[ https://issues.apache.org/jira/browse/HUDI-1583?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Lietong Liu updated HUDI-1583:
--
    Description: 
When `spark.speculation` is enabled, a log file with zero size may exist.
`HoodieLogFormatReader.hasNext()` returns false when it encounters a log file with zero size, which causes the remaining log files to be skipped.

!image-2021-02-19-19-07-49-264.png!

> Hudi will skip remaining log files if there is logFile with zero size in logFileList when merge on read.
> -
>
>              Key: HUDI-1583
>              URL: https://issues.apache.org/jira/browse/HUDI-1583
>          Project: Apache Hudi
>       Issue Type: Bug
>       Components: Common Core
> Affects Versions: 0.6.0
>         Reporter: Lietong Liu
>         Priority: Major
>          Fix For: 0.6.0
>
>      Attachments: image-2021-02-19-19-07-49-264.png
[jira] [Created] (HUDI-1583) Hudi will skip remaining log files if there is logFile with zero size in logFileList when merge on read.
Lietong Liu created HUDI-1583:
-
             Summary: Hudi will skip remaining log files if there is logFile with zero size in logFileList when merge on read.
                 Key: HUDI-1583
                 URL: https://issues.apache.org/jira/browse/HUDI-1583
             Project: Apache Hudi
          Issue Type: Bug
          Components: Common Core
    Affects Versions: 0.6.0
            Reporter: Lietong Liu
             Fix For: 0.6.0