Hi Yexiang,

Thanks for reporting the issue. It is not a regression in Hudi 1.0.2 RC2, and there is a workaround, so it might not be a blocker for the release. We should still fix it for a better user experience.

Thanks,
- Ethan

On Wed, Apr 23, 2025 at 7:48 AM Yexiang Chang <yc2...@cornell.edu> wrote:

> I just tested 1.0.2-rc2 on EMR-7.8 and can confirm that HUDI-9119
> (https://issues.apache.org/jira/browse/HUDI-9119) still persists.
>
> This is going to critically impact Hudi users on EMR. I was wondering if we
> can fix this for the 1.0.2 release?
>
> On Tue, Apr 22, 2025 at 8:39 PM Danny Chan <danny0...@apache.org> wrote:
>
> > -1 on this.
> >
> > We found that the new fg reader caches the Spark GenericInternalRow in
> > the records cache, which is 5x larger than the original Avro-bytes-based
> > payload records. As a result, the records are more prone to spill. The
> > spill is something of a bottleneck in the compaction/regular reader read
> > path, so it causes a performance regression. I think we should mark this
> > as a blocker for 1.0.2. The cache also keeps a metadata map for each
> > record, which takes a lot of memory (the map object itself takes a lot
> > of memory). To address this issue, I have filed a JIRA task:
> > https://issues.apache.org/jira/browse/HUDI-9318
> >
> > Best,
> > Danny
> >
> > On Tue, Apr 22, 2025 at 9:59 AM Voon <v...@apache.org> wrote:
> > >
> > > Hi everyone,
> > >
> > > Please review and vote on the release candidate #2 for the version
> > > 1.0.2, as follows:
> > >
> > > [ ] +1, Approve the release
> > >
> > > [ ] -1, Do not approve the release (please provide specific comments)
> > >
> > > The complete staging area is available for your review, which includes:
> > >
> > > * JIRA release notes [1],
> > >
> > > * the official Apache source release and binary convenience releases to
> > > be deployed to dist.apache.org [2], which are signed with the key with
> > > fingerprint B8DC892C439CCB5C0CCA3BEA68050B561D9AFB32 [3],
> > >
> > > * all artifacts to be deployed to the Maven Central Repository [4],
> > >
> > > * source code tag "1.0.2-rc2" [5],
> > >
> > > The vote will be open for at least 72 hours. It is adopted by majority
> > > approval, with at least 3 PMC affirmative votes.
> > >
> > > Thanks,
> > >
> > > Release Manager
> > >
> > > [1] https://issues.apache.org/jira/secure/ReleaseNote.jspa?projectId=12322822&version=12355558
> > >
> > > [2] https://dist.apache.org/repos/dist/dev/hudi/hudi-1.0.2-rc2/
> > >
> > > [3] https://dist.apache.org/repos/dist/release/hudi/KEYS
> > >
> > > [4] https://repository.apache.org/content/repositories/orgapachehudi-1149/
> > >
> > > [5] https://github.com/apache/hudi/releases/tag/release-1.0.2-rc2