[ https://issues.apache.org/jira/browse/HUDI-625?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
lamber-ken updated HUDI-625:
----------------------------
    Attachment: image-2020-02-21-15-35-56-637.png

> Address performance concerns on DiskBasedMap.get() during upsert of thin records
> --------------------------------------------------------------------------------
>
>                 Key: HUDI-625
>                 URL: https://issues.apache.org/jira/browse/HUDI-625
>             Project: Apache Hudi (incubating)
>          Issue Type: Improvement
>          Components: Performance, Writer Core
>            Reporter: Vinoth Chandar
>            Assignee: Vinoth Chandar
>            Priority: Major
>             Fix For: 0.6.0
>
>         Attachments: image-2020-02-20-23-34-24-155.png, image-2020-02-20-23-34-27-466.png, image-2020-02-21-15-35-56-637.png
>
>
> [https://github.com/apache/incubator-hudi/issues/1328]
>
> So what's going on here is that each entry (single data field) is estimated to be around 500-750 bytes in memory and things spill a lot...
> {code:java}
> 20/02/20 23:00:39 INFO ExternalSpillableMap: Estimated Payload size => 760 for 3675605,HoodieRecord{key=HoodieKey { recordKey=3675605 partitionPath=default}, currentLocation='HoodieRecordLocation {instantTime=20200220225748, fileId=499f8d2c-df6a-4275-9166-3de4ac91f3bf-0}', newLocation='HoodieRecordLocation {instantTime=20200220225921, fileId=499f8d2c-df6a-4275-9166-3de4ac91f3bf-0}'}
> {code}
>
> This is not too far from reality
> !image-2020-02-20-23-34-27-466.png|width=952,height=58!
> !image-2020-02-20-23-34-24-155.png|width=975,height=19!
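
To make the spill arithmetic in the description concrete, here is a minimal sketch of a size-budgeted map. It is a toy model and not Hudi's actual ExternalSpillableMap or DiskBasedMap code: the class name SpillBudgetSketch, its methods, and the memory budgets used are all hypothetical, and the second HashMap only stands in for a disk-backed store. The only number taken from the issue is the 760-byte estimate in the log line.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical toy model of a size-budgeted map. NOT Hudi's implementation.
class SpillBudgetSketch {
    private final long maxInMemoryBytes;    // assumed memory budget for the map
    private final long estimatedEntryBytes; // per-entry estimate, e.g. 760 as in the log line
    private final Map<String, String> inMemory = new HashMap<>();
    private final Map<String, String> spilled = new HashMap<>(); // stand-in for a disk-backed map
    private long usedBytes = 0;

    SpillBudgetSketch(long maxInMemoryBytes, long estimatedEntryBytes) {
        this.maxInMemoryBytes = maxInMemoryBytes;
        this.estimatedEntryBytes = estimatedEntryBytes;
    }

    // How many entries fit before anything has to spill.
    static long entriesThatFit(long budgetBytes, long perEntryBytes) {
        return budgetBytes / perEntryBytes;
    }

    void put(String key, String value) {
        if (inMemory.containsKey(key)) {
            inMemory.put(key, value); // update in place, no extra budget charged
        } else if (spilled.containsKey(key)) {
            spilled.put(key, value);  // key already lives in the spilled store
        } else if (usedBytes + estimatedEntryBytes <= maxInMemoryBytes) {
            inMemory.put(key, value); // still within budget
            usedBytes += estimatedEntryBytes;
        } else {
            spilled.put(key, value);  // budget exhausted: entry goes to the spilled store
        }
    }

    // Lookups try memory first; a miss falls through to the (slow) spilled store.
    String get(String key) {
        String value = inMemory.get(key);
        return value != null ? value : spilled.get(key);
    }

    int inMemoryCount() { return inMemory.size(); }

    int spilledCount() { return spilled.size(); }
}
```

Under these assumptions, a budget of 1,000,000 bytes at 760 bytes per entry holds only 1,315 entries (entriesThatFit(1_000_000, 760)); every record beyond that lands in the spilled store, and each get() on such a record pays the slower path. That is the shape of the concern raised here for thin records whose estimated in-memory size is much larger than their single data field.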