danny0405 commented on issue #7897: URL: https://github.com/apache/hudi/issues/7897#issuecomment-1426629317

> I noticed that for each new record I append, I get a new parquet file: the first parquet file has the first record; when I insert a new row, a second parquet file is created with both records; and when I insert a third time, a third parquet file is created with all three rows. When I update any of them, a log file is created containing the update, and after a number of appends the parquet files are compacted into one: the newest parquet file (the one with all three appended records) is kept, while the other two are removed.

This is actually how the `BLOOM_FILTER` index works: all the inserts are written into a new FileSlice, and only delta updates are written into logs (because, for UPDATEs, Hudi needs to know where the old records are located). There is also a small file/FileSlice strategy, which makes things somewhat more complex; this is why, as you observed, new records are written into the same file group. The rt view merges all the base parquet files and delta logs, so the result is always correct.
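For anyone who wants to reproduce the behavior described above, here is a minimal PySpark sketch (not from the original comment) that writes a MERGE_ON_READ table with the bloom index and inline compaction enabled, then reads both the read-optimized and the snapshot ("rt") views. The table path, schema, and compaction threshold are illustrative assumptions.

```python
# Minimal sketch, assuming a Spark installation with the Hudi Spark bundle
# on the classpath. Path, field names, and thresholds are hypothetical.
from pyspark.sql import SparkSession

spark = (SparkSession.builder
         .appName("hudi-mor-demo")
         # Hudi requires the Kryo serializer.
         .config("spark.serializer", "org.apache.spark.serializer.KryoSerializer")
         .getOrCreate())

table_path = "/tmp/hudi_mor_demo"

hudi_options = {
    "hoodie.table.name": "hudi_mor_demo",
    "hoodie.datasource.write.table.type": "MERGE_ON_READ",
    "hoodie.datasource.write.operation": "upsert",
    "hoodie.datasource.write.recordkey.field": "id",
    "hoodie.datasource.write.precombine.field": "ts",
    "hoodie.index.type": "BLOOM",
    # Trigger inline compaction after this many delta commits, merging the
    # base parquet with its delta logs into a new FileSlice.
    "hoodie.compact.inline": "true",
    "hoodie.compact.inline.max.delta.commits": "3",
}

# Appending new record keys produces a new base parquet (a new FileSlice);
# upserting an existing key writes a delta log instead, as described above.
df = spark.createDataFrame([(1, "a", 100)], ["id", "name", "ts"])
df.write.format("hudi").options(**hudi_options).mode("append").save(table_path)

# Read-optimized view: reads base parquet only, so recent log updates may
# not be visible until compaction runs.
ro = (spark.read.format("hudi")
      .option("hoodie.datasource.query.type", "read_optimized")
      .load(table_path))

# Snapshot ("rt") view: merges base parquet with delta logs at read time.
rt = (spark.read.format("hudi")
      .option("hoodie.datasource.query.type", "snapshot")
      .load(table_path))
rt.show()
```

Listing the table directory between writes should show the new parquet files and `.log` files appearing, and the older FileSlices being cleaned up after compaction.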
> I noticed that for each new record I append I had parquet file,so, first parquet has the first record, then when i insert new row a second parquet file created with both records, and when I insert for the third time a third parquet file is created with the 3 rows and when I update any of them I have a log file contains the update, and after number of appends the parqeut files compacted into one parquet file(the newest parquet file is kept (which has the three records appended) however , the other two parquet files are removed. This is actually how the `BLOOM_FILTER` index works, all the inserts are written into a new FileSlice, only delta updates are written into logs.(Because you know, for UPDATEs, Hudi needs to know where its old records are located). And there are also small file/fileSlice strategy here so that things are kind of more complex, like you have perceived that new records are written into the same file group. The rt view would merge all the base parquet and delta logs so that the result is correct. -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: commits-unsubscr...@hudi.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org