On Thu, Dec 11, 2014 at 10:05:20AM +0800, Qu Wenruo wrote:
> 
> -------- Original Message --------
> Subject: Re: Crazy idea of cleanup the inode_record btrfsck things with SQL?
> From: Zygo Blaxell <zblax...@furryterror.org>
> To: Qu Wenruo <quwen...@cn.fujitsu.com>
> Date: Dec 11, 2014 05:57
> >On Thu, Dec 04, 2014 at 02:56:55PM +0800, Qu Wenruo wrote:
> >>The main memory usage in btrfsck is the extent records, which we
> >>can't free until we have read them all in and checked them, so even
> >>if we mmap/unmap, it only helps with the extent_buffer (which is
> >>already freed when unused, according to its refs).
> >I'm thinking aloud here, but is it *really* necessary to read everything
> >into memory?
> Totally agreed that we should only read what we need.
> But some backrefs and ref counts can only be determined after a
> full scan, especially in the leaf/node corruption case.

It might be faster (and smaller) to pipe them out to sort(1) (with
gzip/lzma compression on temporary files) than to try to insert them
into an in-memory tree.
I have used that technique in some of my deduplicating programs.  It can
cut the working set size by several orders of magnitude (trading it for
an O(n log n) sort, which will mostly read and write sequentially).
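
A minimal sketch of the spill side, assuming GNU sort's
--compress-program, -T and -o options; the "refs.sorted" file name and
the one-line-per-ref record format are made up for illustration, not
anything btrfsck does today:

/* Spill extent-ref records to an external sort instead of inserting
 * them into an in-memory tree.  sort compresses its temporary spill
 * files with gzip, and its merge passes read/write sequentially. */
#include <stdio.h>

int main(void)
{
	/* -n sorts numerically on the leading key field. */
	FILE *out = popen("sort -n --compress-program=gzip -T /tmp"
			  " -o refs.sorted", "w");
	if (!out) {
		perror("popen");
		return 1;
	}

	/* Hypothetical records: "<extent bytenr> <backref info>". */
	fprintf(out, "%llu root=%d\n", 12345ULL, 5);
	fprintf(out, "%llu root=%d\n", 99999ULL, 5);
	fprintf(out, "%llu root=%d\n", 12345ULL, 7);	/* duplicate key */

	if (pclose(out) != 0) {
		fprintf(stderr, "sort failed\n");
		return 1;
	}
	return 0;
}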

e.g. duplicate refs will all sort together, so when you are sequentially
reading the sorted data and the current key value changes, you know you've
seen everything that could be a duplicate, and can discard everything
in RAM.
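
The companion scan side, again only a sketch against the same
hypothetical record layout: because sort has grouped identical keys
together, a small buffer of per-key state suffices, and it can be
flushed the moment the key changes.

/* Sequentially read the sorted records; once the key changes, every
 * possible duplicate of the previous key has been seen, so its state
 * can be processed and discarded. */
#include <stdio.h>
#include <string.h>

int main(void)
{
	FILE *in = fopen("refs.sorted", "r");
	char line[256], key[64], prev_key[64] = "";
	int refs_for_key = 0;

	if (!in) {
		perror("fopen");
		return 1;
	}
	while (fgets(line, sizeof(line), in)) {
		if (sscanf(line, "%63s", key) != 1)
			continue;
		if (strcmp(key, prev_key) != 0) {
			/* Key changed: all refs for prev_key are seen. */
			if (prev_key[0])
				printf("%s: %d ref(s)\n", prev_key,
				       refs_for_key);
			strcpy(prev_key, key);
			refs_for_key = 0;
		}
		refs_for_key++;
	}
	if (prev_key[0])
		printf("%s: %d ref(s)\n", prev_key, refs_for_key);
	fclose(in);
	return 0;
}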
