[ 
https://issues.apache.org/jira/browse/HUDI-917?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

yaojingyi updated HUDI-917:
---------------------------
    Description: 
'Write Amplification Factor' = 'Total Written' / 'Total Upserted'

'Total Written' always increases.

'Total Upserted' does not always increase. When the newest commit only inserts
new data, 'Total Upserted' is 0.

As a result, the output does not match what we expect.

If I insert 3 times, update 1 time, then insert again (10 rows each time), the
'stats wa' result is as follows:

!image-2020-05-21-14-22-39-624.png!

'Total Written' needs to be changed to the difference between adjacent commits.

I found that numsWrites always increases, which is the cause of this.
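
To make the proposal concrete, here is a minimal sketch of the two calculations. It assumes, as reported above, that 'Total Written' behaves like a running counter. The helper names and the sample numbers are hypothetical: they are not Hudi code and are not taken from the attached screenshot.

```python
from typing import List, Optional


def per_commit_written(cumulative_written: List[int]) -> List[int]:
    """Turn a running 'Total Written' counter into per-commit amounts."""
    deltas = []
    previous = 0
    for current in cumulative_written:
        deltas.append(current - previous)
        previous = current
    return deltas


def waf(written: int, upserted: int) -> Optional[float]:
    """Written / upserted, or None when nothing was upserted (ratio undefined)."""
    if upserted == 0:
        return None
    return written / upserted


# Hypothetical numbers, NOT taken from the attached screenshot:
# three inserts, one update, one insert, 10 rows each.
cumulative_written = [10, 20, 30, 40, 50]
upserted = [0, 0, 0, 10, 0]

# Current behaviour as described: ratio against the running counter.
naive = [waf(w, u) for w, u in zip(cumulative_written, upserted)]

# Proposed behaviour: ratio against the difference between adjacent commits.
adjusted = [
    waf(w, u) for w, u in zip(per_commit_written(cumulative_written), upserted)
]

print(naive)     # [None, None, None, 4.0, None]
print(adjusted)  # [None, None, None, 1.0, None]
```

With these made-up inputs, the update commit reports a factor of 4.0 against the running counter but 1.0 against the per-commit difference. Insert-only commits have no defined factor in either case because 'Total Upserted' is 0.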

  was:
'Write Amplification Factor' = 'Total Written' / 'Total Upserted'

'Total Written' always increases.

'Total Upserted' does not always increase. When the newest commit only inserts
new data, 'Total Upserted' is 0.

As a result, the output does not match what we expect.

If I insert 3 times, update 1 time, then insert again (10 rows each time), the
'stats wa' result is as follows:

!image-2020-05-21-14-22-39-624.png!

'Total Written' needs to be changed to the difference between adjacent commits.


> Calculation of 'stats wa' needs to be modified
> ----------------------------------------------
>
>                 Key: HUDI-917
>                 URL: https://issues.apache.org/jira/browse/HUDI-917
>             Project: Apache Hudi (incubating)
>          Issue Type: Bug
>          Components: CLI
>            Reporter: yaojingyi
>            Priority: Major
>         Attachments: image-2020-05-21-14-10-03-871.png, 
> image-2020-05-21-14-21-33-244.png, image-2020-05-21-14-22-39-624.png



--
This message was sent by Atlassian Jira
(v8.3.4#803005)
