On 14.08.2012 16:00, Florian Philipp wrote:
> On 13.08.2012 20:18, Michael Hampicke wrote:
>> On 13.08.2012 19:14, Florian Philipp wrote:
>>> On 13.08.2012 16:52, Michael Mol wrote:
>>>> On Mon, Aug 13, 2012 at 10:42 AM, Michael Hampicke wrote:
>>>>         Have you indexed your ext4 partition?
>>>>         # tune2fs -O dir_index /dev/your_partition
>>>>         # e2fsck -D /dev/your_partition
>>>>     Hi, the dir_index is active. I guess that's why delete operations
>>>>     take as long as they do (the index has to be updated every time).
>>>> 1) Scan for files to remove
>>>> 2) disable index
>>>> 3) Remove files
>>>> 4) enable index
>>>> ?
>>>> -- 
>>>> :wq
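
The four steps above might look roughly like the following. This is only a dry-run sketch under several assumptions: the device, mount point, list file, and the `-mtime +30` selection rule are placeholders that do not come from this thread, and it assumes the filesystem can be unmounted around the `tune2fs` and `e2fsck` calls. By default it only prints the commands; set RUN=1 to actually execute them.

```shell
#!/bin/bash
# Dry-run helper: print the command unless RUN=1 is set.
run() {
  if [ "${RUN:-0}" = 1 ]; then
    "$@"
  else
    printf '%s\n' "$*"
  fi
}

# purge_without_index DEV MNT LIST
# DEV, MNT, LIST and the -mtime +30 rule are hypothetical placeholders.
purge_without_index() {
  local dev=$1 mnt=$2 list=$3

  # 1) Scan for files to remove (NUL-delimited list, stored outside MNT).
  run find "$mnt" -xdev -type f -mtime +30 -fprint0 "$list"

  # 2) Disable the directory index (filesystem unmounted to be safe).
  run umount "$mnt"
  run tune2fs -O '^dir_index' "$dev"
  run mount "$dev" "$mnt"

  # 3) Remove the files found in step 1.
  run xargs -0 -r -a "$list" rm -f --

  # 4) Re-enable the index and rebuild the directory hashes.
  run umount "$mnt"
  run tune2fs -O dir_index "$dev"
  run e2fsck -fD "$dev"
  run mount "$dev" "$mnt"
}

# Example (prints the plan only):
#   purge_without_index /dev/your_partition /mnt/cache /tmp/purge.list
```

Whether the time saved on deletes outweighs the two unmounts and the `e2fsck -D` rebuild depends on the workload and would have to be measured.
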
>>> Other things to think about:
>>> 1. Play around with data=journal/writeback/ordered. IIRC, data=journal
>>> actually used to improve performance depending on the workload as it
>>> delays random IO in favor of sequential IO (when updating the journal).
>>> 2. Increase the journal size.
>>> 3. Take a look at `man 1 chattr`. Especially the 'T' attribute. Of
>>> course this only helps after re-allocating everything.
>>> 4. Try parallelizing. Ext4 requires relatively few locks nowadays (since
>>> 2.6.39 IIRC). For example:
>>> find "$TOP_DIR" -mindepth 1 -maxdepth 1 -print0 | \
>>> xargs -0 -r -P 4 -I '{}' find '{}' -type f
>>> 5. Use a separate device for the journal.
>>> 6. Temporarily deactivate the journal with tune2fs similar to MM's idea.
>>> Regards,
>>> Florian Philipp
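
Items 2, 3 and 6 above map onto a few `tune2fs` and `chattr` invocations. A dry-run sketch, assuming the filesystem is unmounted for the journal changes; the device, the directory, and the journal size used in the example are placeholders, not values from this thread. It prints the commands unless RUN=1 is set.

```shell
#!/bin/bash
# Dry-run helper: print the command unless RUN=1 is set.
run() {
  if [ "${RUN:-0}" = 1 ]; then
    "$@"
  else
    printf '%s\n' "$*"
  fi
}

# Item 6: temporarily remove the journal from DEV (filesystem unmounted).
journal_off() {
  run tune2fs -O '^has_journal' "$1"
}

# Item 2 (and undoing item 6): create a journal of SIZE_MB megabytes on DEV.
# An existing journal has to be removed first before a new size applies.
journal_on() {
  run tune2fs -J "size=$2" "$1"
}

# Item 3: flag DIR as the top of a directory hierarchy for the block
# allocator. Only subdirectories created afterwards are affected.
mark_top_dir() {
  run chattr +T "$1"
}

# Example (prints the commands only; 400 is an arbitrary size in MB):
#   journal_off /dev/your_partition
#   journal_on  /dev/your_partition 400
#   mark_top_dir /path/to/cache
```
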
>> Trying out different journal options was already on my list, but the
>> manpage on chattr regarding the T attribute is an interesting read.
>> Definitely worth trying.
>> Parallelizing multiple finds was something I already did, but the only
>> thing that increased was the IO wait :) But now having read all the
>> suggestions in this thread, I might try it again.
>> Separate device for the journal is a good idea, but not possible atm
>> (machine is abroad in a data center)
> Something else I just remembered. I guess it doesn't help you with your
> current problem but it might come in handy when working with such large
> cache dirs: I once wrote a script that sorts files by their starting
> physical block. This improved reading them quite a bit (2 minutes
> instead of 11 minutes for copying the portage tree).
> It's a terrible kludge, will probably fail when crossing FS boundaries or
> on a thousand other oddities, and requires root for some very scary
> programs. I never had the time to finish an improved C version. Anyway,
> maybe it helps you:
> #!/bin/bash
> #
> # Example below copies /usr/portage/* to /tmp/portage.
> # Replace /usr/portage with the input directory.
> # Replace `cpio` with whatever does the actual work. Input is a
> # \0-delimited file list.
> #
> FIFO=/tmp/$(uuidgen).fifo
> mkfifo "$FIFO"
> find /usr/portage -type f -fprintf "$FIFO" 'bmap <%i> 0\n' -print0 |
> tr '\n\0' '\0\n' |
> paste <(
>   debugfs -f "$FIFO" /dev/mapper/vg-portage |
>   grep -E '^[[:digit:]]+'
> ) - |
> sort -k 1,1n |
> cut -f 2- |
> tr '\n\0' '\0\n' |
> cpio -p0 --make-directories /tmp/portage/
> unlink "$FIFO"

No, I don't think that's practicable with the number of files in my
setup. To be honest, currently I am quite happy with the performance of
btrfs. Running through the directory tree only takes 1/10th of the time
it took with ext4, and deletes are pretty fast as well. I'm sure there's
still room for more improvement, but right now it's much better than it
was before.
