[ https://issues.apache.org/jira/browse/HUDI-3301?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Raymond Xu updated HUDI-3301:
-----------------------------
    Sprint:   (was: 2022/09/19)

> MergedLogRecordReader inline reading should be stateless and thread safe
> ------------------------------------------------------------------------
>
>                 Key: HUDI-3301
>                 URL: https://issues.apache.org/jira/browse/HUDI-3301
>             Project: Apache Hudi
>          Issue Type: Bug
>          Components: metadata
>            Reporter: Manoj Govindassamy
>            Assignee: Yue Zhang
>            Priority: Major
>             Fix For: 0.12.1
>
>
> Metadata table inline reading (enable.full.scan.log.files = false) today
> alters instance member fields and is not thread safe.
>
> When inline reading is enabled, HoodieMetadataMergedLogRecordReader
> doesn't do a full read of the log and base files and doesn't fill in the
> ExternalSpillableMap records cache. Each getRecordsByKeys() call therefore
> re-reads the log and base files, by design. The issue is that this read
> alters the instance members, even though the filled-in records are relevant
> only to that request. Any concurrent getRecordsByKeys() call modifies the
> same member variables, leading to an NPE.
>
> To avoid this, a temporary fix that makes getRecordsByKeys() a synchronized
> method has been pushed to master. But this fix doesn't cover all use cases.
> We need to make the whole class stateless and thread safe for inline reading.

--
This message was sent by Atlassian Jira
(v8.20.10#820010)
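
The stateless design the issue asks for can be illustrated with a minimal sketch. This is not Hudi code: KeyedLogSource and StatelessInlineReader are hypothetical names, and the getRecordsByKeys() signature here is simplified to plain strings rather than Hudi's record types. The only point shown is that per-request results are kept in local variables instead of instance fields, so concurrent calls cannot overwrite each other's state and no synchronized keyword is needed.

```java
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.Optional;

// Hypothetical stand-in for a keyed scan of log and base files (not a Hudi API).
// Implementations are assumed to be safe to call from multiple threads.
interface KeyedLogSource {
    Map<String, String> scan(List<String> keys);
}

// Illustrative reader: the only instance field is immutable after construction.
final class StatelessInlineReader {
    private final KeyedLogSource source;

    StatelessInlineReader(KeyedLogSource source) {
        this.source = source;
    }

    // Every call builds its own maps. Nothing request-specific is stored on
    // the instance, so concurrent callers never see each other's records.
    Map<String, Optional<String>> getRecordsByKeys(List<String> keys) {
        Map<String, String> scanned = source.scan(keys); // request-local
        Map<String, Optional<String>> result = new HashMap<>();
        for (String key : keys) {
            // A key with no record maps to Optional.empty() rather than null.
            result.put(key, Optional.ofNullable(scanned.get(key)));
        }
        return result;
    }
}
```

In this sketch the cost of re-reading on every call is unchanged, matching the "by design" behaviour described in the issue; only the shared mutable state is removed.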