Re: [gentoo-user] Fast file system for cache directory with lot's of files

2012-08-16 Thread Neil Bothwick
On Tue, 14 Aug 2012 15:54:17 +0200, Daniel Troeder wrote:

  sys-process/incron ?  
 Uh... didn't know that one! ... very interesting :)
 
 Have you used it?

Yes...

 How does it perform if there are lots of modifications going on?
 Does it have a throttle against fork bombing?
 must-read-myself-a-little.

but only for fairly infrequently written locations. I have no idea how
well it scales.


-- 
Neil Bothwick

Hors d'oeuvres: 3 sandwiches cut into 40 pieces.




Re: [gentoo-user] Fast file system for cache directory with lot's of files

2012-08-15 Thread Bill Kenworthy
On Tue, 2012-08-14 at 18:36 +0200, Helmut Jarausch wrote:
 On 08/14/2012 04:07:39 AM, Adam Carter wrote:
  I think btrfs probably is meant to provide a lot of the modern
  features of reiser4 or xfs
 
 Unfortunately btrfs is still generally slower than ext4, for example.
 Check out http://openbenchmarking.org/, e.g.
 http://openbenchmarking.org/s/ext4%20btrfs
 
 The OS will use any spare RAM for disk caching, so if there's not much
 else running on that box, most of your content will be served from
 RAM. It may be that whatever fs you choose won't make that much of a
 difference anyway.
  
 
 If one can run a recent kernel (3.5.x), btrfs seems quite stable (it's  
 used by some distributions and by Oracle for real work).
 Most benchmarks don't use compression since other filesystems can't use it, but  
 that's unfair. With compression, one needs to read
 much less data (my /usr partition is less than 50% the size of an ext4  
 partition; the savings on the root partition are even higher).
 
 I'm using the mount options  
 compress=lzo,noacl,noatime,autodefrag,space_cache which require a  
 recent kernel.
 
 I'd give it a try.
 
 Helmut.
 
 

What's the latest on fsck tools for btrfs? Useful ones are still not
available, right? The reason I am asking is that it's not an easy question to
google, and my last attempt to use btrfs for serious work ended in tears
when I couldn't rescue a corrupted file system.

BillK






Re: [gentoo-user] Fast file system for cache directory with lot's of files

2012-08-15 Thread Bill Kenworthy
On Wed, 2012-08-15 at 15:31 +0800, Bill Kenworthy wrote:
 On Tue, 2012-08-14 at 18:36 +0200, Helmut Jarausch wrote:
  On 08/14/2012 04:07:39 AM, Adam Carter wrote:
   I think btrfs probably is meant to provide a lot of the modern
   features of reiser4 or xfs
  
  Unfortunately btrfs is still generally slower than ext4, for example.
  Check out http://openbenchmarking.org/, e.g.
  http://openbenchmarking.org/s/ext4%20btrfs
  
  The OS will use any spare RAM for disk caching, so if there's not much
  else running on that box, most of your content will be served from
  RAM. It may be that whatever fs you choose won't make that much of a
  difference anyway.
   
  
  If one can run a recent kernel (3.5.x), btrfs seems quite stable (it's  
  used by some distributions and by Oracle for real work).
  Most benchmarks don't use compression since other filesystems can't use it, but  
  that's unfair. With compression, one needs to read
  much less data (my /usr partition is less than 50% the size of an ext4  
  partition; the savings on the root partition are even higher).
  
  I'm using the mount options  
  compress=lzo,noacl,noatime,autodefrag,space_cache which require a  
  recent kernel.
  
  I'd give it a try.
  
  Helmut.
  
  
 
 What's the latest on fsck tools for btrfs? Useful ones are still not
 available, right? The reason I am asking is that it's not an easy question to
 google, and my last attempt to use btrfs for serious work ended in tears
 when I couldn't rescue a corrupted file system.
 
 BillK

Sorry, replying to myself to clarify ... I sent this while reading
the backlog, before reaching the statement that the tools are incomplete. My
question is more along the lines of: do they work? (Which was answered as
"I do not know" in the posted links, which are probably old.)

Another point I just saw is its inability to support swapfiles. Also, in
the past OO would not compile on a btrfs (/tmp/portage) filesystem, as it
did something that basically killed everything. Other packages were fine.
Then there was a certain man page I couldn't back up to a btrfs file
system, and ~/.gvfs files that hung the system when I tried to put them on btrfs.
Hopefully these have been fixed.



BillK







Re: [gentoo-user] Fast file system for cache directory with lot's of files

2012-08-14 Thread Daniel Troeder
On 13.08.2012 16:53, Michael Hampicke wrote:
 2012/8/13 Daniel Troeder dan...@admin-box.com
 3rd thought: purging old files with find? your cache system should
 have some kind of DB that holds that information.
 3: Well, it's a 3rd-party application that - in theory - should take
 care of removing old files. Sadly, it does not work as it's supposed to,
 and as time passes the number of orphans grows :(
There is also the possibility to write a really small daemon (less than
50 lines of C) that registers with inotify for the entire fs and
journals the file activity to a sqlite-db.

A simple sql-query from a cron/bash script will then give you all the
files to delete with paths.

It will probably be less work to write the daemon than to do 40
fs-benchmarks - and the result will be the most efficient.
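
A rough shell stand-in for the idea, before writing any C (an untested
sketch; it assumes sys-fs/inotify-tools and dev-db/sqlite, a made-up
cache root of /var/cache/app, and it ignores quoting/whitespace problems
in paths):

# one-time setup of the journal db
sqlite3 /var/tmp/files.db \
    'CREATE TABLE IF NOT EXISTS files (path TEXT PRIMARY KEY, seen INTEGER);'
# watch the whole tree; inotify needs one watch per directory
inotifywait -mr -e create -e moved_to --format '%w%f' /var/cache/app |
while read -r path; do
    sqlite3 /var/tmp/files.db \
        "INSERT OR REPLACE INTO files VALUES ('$path', strftime('%s','now'));"
done

A real daemon would batch the INSERTs instead of forking one sqlite3 per
event, but this is enough to test the idea.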



Re: [gentoo-user] Fast file system for cache directory with lot's of files

2012-08-14 Thread Neil Bothwick
On Tue, 14 Aug 2012 10:21:54 +0200, Daniel Troeder wrote:

 There is also the possibility to write a really small daemon (less than
 50 lines of C) that registers with inotify for the entire fs and
 journals the file activity to a sqlite-db.

sys-process/incron ?


-- 
Neil Bothwick

A friend of mine sent me a postcard with a satellite photo of the
entire planet on it, and on the back he wrote, "Wish you were here."




Re: [gentoo-user] Fast file system for cache directory with lot's of files

2012-08-14 Thread Florian Philipp
On 14.08.2012 11:46, Neil Bothwick wrote:
 On Tue, 14 Aug 2012 10:21:54 +0200, Daniel Troeder wrote:
 
 There is also the possibility to write a really small daemon (less than
 50 lines of C) that registers with inotify for the entire fs and
 journals the file activity to a sqlite-db.
 
 sys-process/incron ?
 
 

I think in order to make it work, you have to increase the number of
watches available to inotify (one per directory). See
/proc/sys/fs/inotify/max_user_watches
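
For example (as root; the value is only a guess at what a tree with that
many directories needs):

# check the current limit
cat /proc/sys/fs/inotify/max_user_watches
# raise it until the next reboot
echo 524288 > /proc/sys/fs/inotify/max_user_watches
# or make it permanent via /etc/sysctl.conf:
# fs.inotify.max_user_watches = 524288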

Regards,
Florian Philipp





Re: [gentoo-user] Fast file system for cache directory with lot's of files

2012-08-14 Thread Daniel Troeder
On 14.08.2012 11:46, Neil Bothwick wrote:
 On Tue, 14 Aug 2012 10:21:54 +0200, Daniel Troeder wrote:
 
 There is also the possibility to write a really small daemon (less than
 50 lines of C) that registers with inotify for the entire fs and
 journals the file activity to a sqlite-db.
 
 sys-process/incron ?
Uh... didn't know that one! ... very interesting :)

Have you used it?
How does it perform if there are lots of modifications going on?
Does it have a throttle against fork bombing?
must-read-myself-a-little.

An incron line like
# sqlite3 /file.sql 'INSERT INTO files (filename, date) VALUES (...)'
would be inefficient, because it spawns lots of processes, but it would
be very nice for simply testing out the idea. Then a
# sqlite3 /file.sql "SELECT filename FROM files WHERE date < date('now','-30 days')"
or something like that to get the files older than 30 days, and voilà :)
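
For the record, an incrontab entry is a single "path mask command" line,
so a quick test could look roughly like this (untested; $@ expands to the
watched directory, $# to the event's file name, and the db/table names
are made up):

/var/cache/app IN_CREATE,IN_MOVED_TO sqlite3 /var/tmp/files.db "INSERT INTO files VALUES ('$@/$#', strftime('%s','now'))"

As far as I can tell incron does not recurse into subdirectories, so a
nested cache tree would need one entry per directory.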




Re: [gentoo-user] Fast file system for cache directory with lot's of files

2012-08-14 Thread Florian Philipp
On 13.08.2012 20:18, Michael Hampicke wrote:
 On 13.08.2012 19:14, Florian Philipp wrote:
 On 13.08.2012 16:52, Michael Mol wrote:
 On Mon, Aug 13, 2012 at 10:42 AM, Michael Hampicke
 mgehampi...@gmail.com wrote:

 Have you indexed your ext4 partition?

 # tune2fs -O dir_index /dev/your_partition
 # e2fsck -D /dev/your_partition

 Hi, the dir_index is active. I guess that's why delete operations
 take as long as they take (index has to be updated every time) 


 1) Scan for files to remove
 2) disable index
 3) Remove files
 4) enable index

 ?

 -- 
 :wq

 Other things to think about:

 1. Play around with data=journal/writeback/ordered. IIRC, data=journal
 actually used to improve performance depending on the workload as it
 delays random IO in favor of sequential IO (when updating the journal).

 2. Increase the journal size.

 3. Take a look at `man 1 chattr`. Especially the 'T' attribute. Of
 course this only helps after re-allocating everything.

 4. Try parallelizing. Ext4 requires relatively few locks nowadays (since
 2.6.39 IIRC). For example:
 find $TOP_DIR -mindepth 1 -maxdepth 1 -print0 | \
 xargs -0 -n 1 -r -P 4 -I '{}' find '{}' -type f

 5. Use a separate device for the journal.

 6. Temporarily deactivate the journal with tune2fs similar to MM's idea.

 Regards,
 Florian Philipp

 
 Trying out different journal modes/options was already on my list, but the
 manpage on chattr regarding the T attribute is an interesting read.
 Definitely worth trying.
 
 Parallelizing multiple finds was something I already did, but the only
 thing that increased was the IO wait :) But now, having read all the
 suggestions in this thread, I might try it again.
 
 A separate device for the journal is a good idea, but not possible atm
 (the machine is abroad in a data center)
 

Something else I just remembered. I guess it doesn't help you with your
current problem but it might come in handy when working with such large
cache dirs: I once wrote a script that sorts files by their starting
physical block. This improved reading them quite a bit (2 minutes
instead of 11 minutes for copying the portage tree).

It's a terrible kludge, will probably fail when passing FS boundaries or
a thousand other oddities, and requires root for some very scary
programs. I never had the time to finish an improved C version. Anyway,
maybe it helps you:

#!/bin/bash
#
# Example below copies /usr/portage/* to /tmp/portage.
# Replace /usr/portage with the input directory.
# Replace `cpio` with whatever does the actual work. Input is a
# \0-delimited file list.
#
FIFO=/tmp/$(uuidgen).fifo
mkfifo $FIFO
find /usr/portage -type f -fprintf $FIFO 'bmap %i 0\n' -print0 |
tr '\n\0' '\0\n' |
paste <(
  debugfs -f $FIFO /dev/mapper/vg-portage |
  grep -E '^[[:digit:]]+'
) - |
sort -k 1,1n |
cut -f 2- |
tr '\n\0' '\0\n' |
cpio -p0 --make-directories /tmp/portage/
unlink $FIFO





Re: [gentoo-user] Fast file system for cache directory with lot's of files

2012-08-14 Thread Florian Philipp
On 14.08.2012 15:54, Daniel Troeder wrote:
 On 14.08.2012 11:46, Neil Bothwick wrote:
 On Tue, 14 Aug 2012 10:21:54 +0200, Daniel Troeder wrote:

 There is also the possibility to write a really small daemon (less than
 50 lines of C) that registers with inotify for the entire fs and
 journals the file activity to a sqlite-db.

 sys-process/incron ?
 Uh... didn't know that one! ... very interesting :)
 
 Have you used it?
 How does it perform if there are lots of modifications going on?
 Does it have a throttle against fork bombing?
 must-read-myself-a-little.
 
 An incron line like
 # sqlite3 /file.sql 'INSERT INTO files (filename, date) VALUES (...)'
 would be inefficient, because it spawns lots of processes, but it would
 be very nice for simply testing out the idea. Then a
 # sqlite3 /file.sql "SELECT filename FROM files WHERE date < date('now','-30 days')"
 or something like that to get the files older than 30 days, and voilà :)
 
 

Maybe inotifywait is better for this kind of batch job.

Collecting events:
inotifywait -rm -e create,delete --timefmt '%s' --format \
  "$(printf '%%T\t%%e\t%%w%%f')" /tmp > events.tbl
# the printf is there because inotifywait's format does not
# recognize common escapes like \t
# Output format:
# Seconds since epoch \t CREATE/DELETE \t file name \n

Filtering events:
sort --stable -k3 events.tbl |
awk '
  function update() {
    line=$0; exists = ($2=="DELETE") ? 0 : 1; file=$3
  }
  NR==1{ update(); next }
  { if($3!=file && exists==1){ print line } update() }
  END{ if(exists==1) print line }'
# Sorts by file name while preserving temporal order.
# Uses awk to suppress output of files that have been deleted.
# Output: Last CREATE event for each existing file

Retrieving files created 30+ days ago:
awk -v newest=$(date -d -5seconds +%s) '
  $1>newest{ next }
  { print $3 }'

Remarks:

The awk scripts need some improvement if you have to handle whitespace
in filenames, but with this input format, they should be able to work with
everything except newlines.

Inotifywait itself is utterly useless when dealing with newlines in file
names unless you want to put some serious effort into sanitizing the output.

Regards,
Florian Philipp





Re: [gentoo-user] Fast file system for cache directory with lot's of files

2012-08-14 Thread Florian Philipp
On 14.08.2012 17:09, Florian Philipp wrote:
 
 Retrieving files created 30+ days ago:
 awk -v newest=$(date -d -5seconds +%s) '
   $1>newest{ next }
   { print $3 }'
 

s/-5seconds/-30days/





Re: [gentoo-user] Fast file system for cache directory with lot's of files

2012-08-14 Thread Helmut Jarausch

On 08/14/2012 04:07:39 AM, Adam Carter wrote:

 I think btrfs probably is meant to provide a lot of the modern
 features of reiser4 or xfs

Unfortunately btrfs is still generally slower than ext4, for example.
Check out http://openbenchmarking.org/, e.g.
http://openbenchmarking.org/s/ext4%20btrfs

The OS will use any spare RAM for disk caching, so if there's not much
else running on that box, most of your content will be served from
RAM. It may be that whatever fs you choose won't make that much of a
difference anyway.



If one can run a recent kernel (3.5.x), btrfs seems quite stable (it's  
used by some distributions and by Oracle for real work).
Most benchmarks don't use compression since other filesystems can't use it, but  
that's unfair. With compression, one needs to read
much less data (my /usr partition is less than 50% the size of an ext4  
partition; the savings on the root partition are even higher).


I'm using the mount options  
compress=lzo,noacl,noatime,autodefrag,space_cache which require a  
recent kernel.


I'd give it a try.

Helmut.




Re: [gentoo-user] Fast file system for cache directory with lot's of files

2012-08-14 Thread Pandu Poluan
On Aug 14, 2012 11:42 PM, Helmut Jarausch jarau...@igpm.rwth-aachen.de
wrote:

 On 08/14/2012 04:07:39 AM, Adam Carter wrote:

  I think btrfs probably is meant to provide a lot of the modern
  features of reiser4 or xfs

 Unfortunately btrfs is still generally slower than ext4, for example.
 Check out http://openbenchmarking.org/, e.g.
 http://openbenchmarking.org/s/ext4%20btrfs

 The OS will use any spare RAM for disk caching, so if there's not much
 else running on that box, most of your content will be served from
 RAM. It may be that whatever fs you choose won't make that much of a
 difference anyway.


 If one can run a recent kernel (3.5.x), btrfs seems quite stable (it's
used by some distributions and by Oracle for real work).
 Most benchmarks don't use compression since other filesystems can't use it, but
that's unfair. With compression, one needs to read
 much less data (my /usr partition is less than 50% the size of an ext4 partition;
the savings on the root partition are even higher).

 I'm using the mount options
compress=lzo,noacl,noatime,autodefrag,space_cache which require a recent
kernel.

 I'd give it a try.

 Helmut.


Are the support tools for btrfs (fsck, defrag, etc.) already complete?

If so, I certainly would like to take it out for a spin...

Rgds,


Re: [gentoo-user] Fast file system for cache directory with lot's of files

2012-08-14 Thread Jason Weisberger
Sure, but wouldn't compression make write operations slower?  And isn't he
looking for performance?
On Aug 14, 2012 1:14 PM, Pandu Poluan pa...@poluan.info wrote:


 On Aug 14, 2012 11:42 PM, Helmut Jarausch jarau...@igpm.rwth-aachen.de
 wrote:
 
  On 08/14/2012 04:07:39 AM, Adam Carter wrote:
 
   I think btrfs probably is meant to provide a lot of the modern
   features of reiser4 or xfs
 
  Unfortunately btrfs is still generally slower than ext4, for example.
  Check out http://openbenchmarking.org/, e.g.
  http://openbenchmarking.org/s/ext4%20btrfs
 
  The OS will use any spare RAM for disk caching, so if there's not much
  else running on that box, most of your content will be served from
  RAM. It may be that whatever fs you choose won't make that much of a
  difference anyway.
 
 
  If one can run a recent kernel (3.5.x), btrfs seems quite stable (it's
 used by some distributions and by Oracle for real work).
  Most benchmarks don't use compression since other filesystems can't use it, but
 that's unfair. With compression, one needs to read
  much less data (my /usr partition is less than 50% the size of an ext4
 partition; the savings on the root partition are even higher).
 
  I'm using the mount options
 compress=lzo,noacl,noatime,autodefrag,space_cache which require a recent
 kernel.
 
  I'd give it a try.
 
  Helmut.
 

 Are the support tools for btrfs (fsck, defrag, etc.) already complete?

 If so, I certainly would like to take it out for a spin...

 Rgds,




Re: [gentoo-user] Fast file system for cache directory with lot's of files

2012-08-14 Thread Volker Armin Hemmann
On Tuesday, 14 August 2012, 13:21:35, Jason Weisberger wrote:
 Sure, but wouldn't compression make write operations slower?  And isn't he
 looking for performance?

Not really, as long as the CPU can compress faster than the disk can write
stuff.

More interesting: is btrfs trying to be smart - only compressing compressible
stuff?

-- 
#163933



Re: [gentoo-user] Fast file system for cache directory with lot's of files

2012-08-14 Thread Michael Hampicke
On 14.08.2012 16:00, Florian Philipp wrote:
 On 13.08.2012 20:18, Michael Hampicke wrote:
 On 13.08.2012 19:14, Florian Philipp wrote:
 On 13.08.2012 16:52, Michael Mol wrote:
 On Mon, Aug 13, 2012 at 10:42 AM, Michael Hampicke
 mgehampi...@gmail.com wrote:

 Have you indexed your ext4 partition?

 # tune2fs -O dir_index /dev/your_partition
 # e2fsck -D /dev/your_partition

 Hi, the dir_index is active. I guess that's why delete operations
 take as long as they take (index has to be updated every time) 


 1) Scan for files to remove
 2) disable index
 3) Remove files
 4) enable index

 ?

 -- 
 :wq

 Other things to think about:

 1. Play around with data=journal/writeback/ordered. IIRC, data=journal
 actually used to improve performance depending on the workload as it
 delays random IO in favor of sequential IO (when updating the journal).

 2. Increase the journal size.

 3. Take a look at `man 1 chattr`. Especially the 'T' attribute. Of
 course this only helps after re-allocating everything.

 4. Try parallelizing. Ext4 requires relatively few locks nowadays (since
 2.6.39 IIRC). For example:
 find $TOP_DIR -mindepth 1 -maxdepth 1 -print0 | \
 xargs -0 -n 1 -r -P 4 -I '{}' find '{}' -type f

 5. Use a separate device for the journal.

 6. Temporarily deactivate the journal with tune2fs similar to MM's idea.

 Regards,
 Florian Philipp


 Trying out different journal modes/options was already on my list, but the
 manpage on chattr regarding the T attribute is an interesting read.
 Definitely worth trying.

 Parallelizing multiple finds was something I already did, but the only
 thing that increased was the IO wait :) But now, having read all the
 suggestions in this thread, I might try it again.

 A separate device for the journal is a good idea, but not possible atm
 (the machine is abroad in a data center)

 
 Something else I just remembered. I guess it doesn't help you with your
 current problem but it might come in handy when working with such large
 cache dirs: I once wrote a script that sorts files by their starting
 physical block. This improved reading them quite a bit (2 minutes
 instead of 11 minutes for copying the portage tree).
 
 It's a terrible kludge, will probably fail when passing FS boundaries or
 a thousand other oddities, and requires root for some very scary
 programs. I never had the time to finish an improved C version. Anyway,
 maybe it helps you:
 
 #!/bin/bash
 #
 # Example below copies /usr/portage/* to /tmp/portage.
 # Replace /usr/portage with the input directory.
 # Replace `cpio` with whatever does the actual work. Input is a
 # \0-delimited file list.
 #
 FIFO=/tmp/$(uuidgen).fifo
 mkfifo $FIFO
 find /usr/portage -type f -fprintf $FIFO 'bmap %i 0\n' -print0 |
 tr '\n\0' '\0\n' |
 paste <(
   debugfs -f $FIFO /dev/mapper/vg-portage |
   grep -E '^[[:digit:]]+'
 ) - |
 sort -k 1,1n |
 cut -f 2- |
 tr '\n\0' '\0\n' |
 cpio -p0 --make-directories /tmp/portage/
 unlink $FIFO
 

No, I don't think that's practicable with the number of files in my
setup. To be honest, currently I am quite happy with the performance of
btrfs. Running through the directory tree only takes 1/10th of the time
it took with ext4, and deletes are pretty fast as well. I'm sure there's
still room for more improvement, but right now it's much better than it
was before.



Re: [gentoo-user] Fast file system for cache directory with lot's of files

2012-08-14 Thread Volker Armin Hemmann
On Wednesday, 15 August 2012, 00:05:40, Pandu Poluan wrote:

 
 Are the support tools for btrfs (fsck, defrag, etc.) already complete?

no

-- 
#163933



Re: [gentoo-user] Fast file system for cache directory with lot's of files

2012-08-14 Thread Michael Hampicke
On 14.08.2012 10:21, Daniel Troeder wrote:
 On 13.08.2012 16:53, Michael Hampicke wrote:
 2012/8/13 Daniel Troeder dan...@admin-box.com
 3rd thought: purging old files with find? your cache system should
 have some kind of DB that holds that information.
 3: Well, it's a 3rd-party application that - in theory - should take
 care of removing old files. Sadly, it does not work as it's supposed to,
 and as time passes the number of orphans grows :(
 There is also the possibility to write a really small daemon (less than
 50 lines of C) that registers with inotify for the entire fs and
 journals the file activity to a sqlite-db.
 
 A simple sql-query from a cron/bash script will then give you all the
 files to delete with paths.
 
 It will probably be less work to write the daemon than to do 40
 fs-benchmarks - and the result will be the most efficient.
 

That is an interesting idea, but I have never used inotify on such a
huge file base, so I am not sure what impact it has in terms of CPU
cycles. But I am going to try this on some snowy winter
weekend :)



Re: [gentoo-user] Fast file system for cache directory with lot's of files

2012-08-14 Thread Michael Hampicke
On 14.08.2012 19:21, Jason Weisberger wrote:
 Sure, but wouldn't compression make write operations slower?  And isn't he
 looking for performance?
 On Aug 14, 2012 1:14 PM, Pandu Poluan pa...@poluan.info wrote:
 

 On Aug 14, 2012 11:42 PM, Helmut Jarausch jarau...@igpm.rwth-aachen.de
 wrote:

 On 08/14/2012 04:07:39 AM, Adam Carter wrote:

 I think btrfs probably is meant to provide a lot of the modern
 features of reiser4 or xfs

 Unfortunately btrfs is still generally slower than ext4, for example.
 Check out http://openbenchmarking.org/, e.g.
 http://openbenchmarking.org/s/ext4%20btrfs

 The OS will use any spare RAM for disk caching, so if there's not much
 else running on that box, most of your content will be served from
 RAM. It may be that whatever fs you choose won't make that much of a
 difference anyway.


 If one can run a recent kernel (3.5.x), btrfs seems quite stable (it's
 used by some distributions and by Oracle for real work).
 Most benchmarks don't use compression since other filesystems can't use it, but
 that's unfair. With compression, one needs to read
 much less data (my /usr partition is less than 50% the size of an ext4
 partition; the savings on the root partition are even higher).

 I'm using the mount options
 compress=lzo,noacl,noatime,autodefrag,space_cache which require a recent
 kernel.

 I'd give it a try.

 Helmut.


 Are the support tools for btrfs (fsck, defrag, etc.) already complete?

 If so, I certainly would like to take it out for a spin...

 Rgds,


 

I have enough CPU power at hand for compression, I guess that should not
be the issue. But the cache dir mostly consists of prescaled JPEG
images, so compressing them again would not give me any benefits, speed-
or size-wise.



Re: [gentoo-user] Fast file system for cache directory with lot's of files

2012-08-14 Thread Michael Hampicke
On 14.08.2012 19:42, Volker Armin Hemmann wrote:
 On Tuesday, 14 August 2012, 13:21:35, Jason Weisberger wrote:
 Sure, but wouldn't compression make write operations slower?  And isn't he
 looking for performance?
 
 Not really, as long as the CPU can compress faster than the disk can write
 stuff.
 
 More interesting: is btrfs trying to be smart - only compressing
 compressible stuff?
 

It does do that, but letting btrfs check whether the files are already
compressed, when you know that they are compressed, is a waste of CPU
cycles :)



Re: [gentoo-user] Fast file system for cache directory with lot's of files

2012-08-14 Thread Paul Hartman
On Tue, Aug 14, 2012 at 12:05 PM, Pandu Poluan pa...@poluan.info wrote:

 On Aug 14, 2012 11:42 PM, Helmut Jarausch jarau...@igpm.rwth-aachen.de
 wrote:

 On 08/14/2012 04:07:39 AM, Adam Carter wrote:

  I think btrfs probably is meant to provide a lot of the modern
  features of reiser4 or xfs

 Unfortunately btrfs is still generally slower than ext4, for example.
 Check out http://openbenchmarking.org/, e.g.
 http://openbenchmarking.org/s/ext4%20btrfs

 The OS will use any spare RAM for disk caching, so if there's not much
 else running on that box, most of your content will be served from
 RAM. It may be that whatever fs you choose won't make that much of a
 difference anyway.


 If one can run a recent kernel (3.5.x), btrfs seems quite stable (it's used
 by some distributions and by Oracle for real work).
 Most benchmarks don't use compression since other filesystems can't use it, but
 that's unfair. With compression, one needs to read
 much less data (my /usr partition is less than 50% the size of an ext4 partition;
 the savings on the root partition are even higher).

 I'm using the mount options
 compress=lzo,noacl,noatime,autodefrag,space_cache which require a recent
 kernel.

 I'd give it a try.

 Helmut.


 Are the support tools for btrfs (fsck, defrag, etc.) already complete?

Do they exist? Yes (sys-fs/btrfs-progs). Are they complete? Probably not...



Re: [gentoo-user] Fast file system for cache directory with lot's of files

2012-08-14 Thread Alecks Gates
On Tue, Aug 14, 2012 at 12:50 PM, Michael Hampicke gentoo-u...@hadt.biz wrote:
 On 14.08.2012 19:42, Volker Armin Hemmann wrote:
 On Tuesday, 14 August 2012, 13:21:35, Jason Weisberger wrote:
 Sure, but wouldn't compression make write operations slower?  And isn't he
 looking for performance?

 Not really, as long as the CPU can compress faster than the disk can write
 stuff.

 More interesting: is btrfs trying to be smart - only compressing
 compressible stuff?


 It does do that, but letting btrfs check whether the files are already
 compressed, when you know that they are compressed, is a waste of CPU
 cycles :)


Also look into the difference between compress and compress-force[0].
I wonder how much overhead checking whether or not to compress a file
costs.  I use mount options similar to Helmut's and get great results:
defaults,autodefrag,space_cache,compress=lzo,subvol=@,relatime

But most of my data is compressible.  Compression makes such a huge
difference, it surprises me.  Apparently this Ubuntu system
automatically puts everything on / into a subvolume named @.
Interesting.
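
For comparison, a compress-force variant of such an fstab entry might
look like this (sketch only; the device and mount point are made up):

/dev/sdb1  /data  btrfs  defaults,autodefrag,space_cache,compress-force=lzo,relatime  0 0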

Anyway, btrfs-progs does include basic fsck now but I wouldn't use it
for anything serious[1].


[0] https://btrfs.wiki.kernel.org/index.php/Mount_options
[1] https://btrfs.wiki.kernel.org/index.php/Btrfsck



Re: [gentoo-user] Fast file system for cache directory with lot's of files

2012-08-14 Thread Michael Mol
On Tue, Aug 14, 2012 at 3:55 PM, Alecks Gates aleck...@gmail.com wrote:

 On Tue, Aug 14, 2012 at 12:50 PM, Michael Hampicke gentoo-u...@hadt.biz
 wrote:
  On 14.08.2012 19:42, Volker Armin Hemmann wrote:
  On Tuesday, 14 August 2012, 13:21:35, Jason Weisberger wrote:
  Sure, but wouldn't compression make write operations slower?  And
 isn't he
  looking for performance?
 
  Not really, as long as the CPU can compress faster than the disk can
  write stuff.
 
  More interesting: is btrfs trying to be smart - only compressing
  compressible stuff?
 
 
  It does do that, but letting btrfs check whether the files are already
  compressed, when you know that they are compressed, is a waste of CPU
  cycles :)
 

 Also look into the difference between compress and compress-force[0].
 I wonder how much overhead checking whether or not to compress a file
 costs.  I use mount options similar to Helmut's and get great results:
 defaults,autodefrag,space_cache,compress=lzo,subvol=@,relatime

 But most of my data is compressible.  Compression makes such a huge
 difference, it surprises me.  Apparently this Ubuntu system
 automatically puts everything on / into a subvolume named @.
 Interesting.


Huge difference, how?

Could we see some bonnie++ comparisons between the various configurations
we've discussed for ext4 and btrfs? Depending on the results, it might be
getting time for me to take the plunge myself.

-- 
:wq


Re: [gentoo-user] Fast file system for cache directory with lot's of files

2012-08-14 Thread Alecks Gates
On Tue, Aug 14, 2012 at 3:17 PM, Michael Mol mike...@gmail.com wrote:
 On Tue, Aug 14, 2012 at 3:55 PM, Alecks Gates aleck...@gmail.com wrote:

 On Tue, Aug 14, 2012 at 12:50 PM, Michael Hampicke gentoo-u...@hadt.biz
 wrote:
  On 14.08.2012 19:42, Volker Armin Hemmann wrote:
  On Tuesday, 14 August 2012, 13:21:35, Jason Weisberger wrote:
  Sure, but wouldn't compression make write operations slower?  And
  isn't he
  looking for performance?
 
  Not really, as long as the CPU can compress faster than the disk can
  write stuff.
 
  More interesting: is btrfs trying to be smart - only compressing
  compressible stuff?
 
 
  It does do that, but letting btrfs check whether the files are already
  compressed, when you know that they are compressed, is a waste of CPU
  cycles :)
 

 Also look into the difference between compress and compress-force[0].
 I wonder how much overhead checking whether or not to compress a file
 costs.  I use mount options similar to Helmut's and get great results:
 defaults,autodefrag,space_cache,compress=lzo,subvol=@,relatime

 But most of my data is compressible.  Compression makes such a huge
 difference, it surprises me.  Apparently this Ubuntu system
 automatically puts everything on / into a subvolume named @.
 Interesting.


 Huge difference, how?

 Could we see some bonnie++ comparisons between the various configurations
 we've discussed for ext4 and btrfs? Depending on the results, it might be
 getting time for me to take the plunge myself.

 --
 :wq

Check out some of the benchmarks on Phoronix[0].  It's definitely not
a win-win scenario, but it seems to be great at random writes and
compiling.  And a lot of those wins are without compress=lzo enabled,
so it only gets better.  I'm not going to say it's the absolute best
out there (because it isn't, of course), but it's at least worth
checking into.  I'm using a standard 2.5" HDD like in this[1], so
perhaps that's why I see the results.

[0] http://www.phoronix.com/scan.php?page=searchq=Btrfs
[1] http://www.phoronix.com/scan.php?page=articleitem=btrfs_old_linux31



Re: [gentoo-user] Fast file system for cache directory with lot's of files

2012-08-13 Thread Nilesh Govindrajan

On Mon 13 Aug 2012 06:46:53 PM IST, Michael Hampicke wrote:

Howdy gentooers,

I am looking for a filesystem that performs well for a cache
directory. Here's some data on that dir:
- cache for prescaled image files + metadata files
- nested directory structure ( 20/2022/202231/*files* )
- about 20GB
- 100.000 directories
- about 2 million files

The system has 2x Intel Xeon quad-cores (Nehalem), 16GB of RAM and two
10.000rpm hard drives running a RAID1.

Up until now I was using ext4 with noatime, but I am not happy with
its performance. Finding and deleting old files with 'find' is
incredibly slow, so I am looking for a filesystem that performs
better. The first candidate that came to mind was reiserfs, but last
time I tried it, it became slower over time (fragmentation?).
Currently I am running a test with btrfs and so far I am quite happy
with it as it is much faster in my use case.

Do you guys have any other suggestions? How about JFS? I used that on
my old NAS box because of its low CPU usage. Should I give reiser4 a
try, or better leave it be given Hans Reiser's current status?

Thx in advance,
Mike


You should have a look at xfs.

I used to use ext4 earlier; traversing through /usr/portage used to be 
very slow. When I switched to xfs, speed increased drastically.


This might be kind of unrelated, but makes sense.

--
Nilesh Govindrajan
http://nileshgr.com



Re: [gentoo-user] Fast file system for cache directory with lot's of files

2012-08-13 Thread Michael Hampicke

 You should have a look at xfs.

 I used to use ext4 earlier; traversing through /usr/portage used to be
 very slow. When I switched to xfs, speed increased drastically.

 This might be kind of unrelated, but makes sense.


I guess traversing through directories may be faster with XFS, but in my
experience ext4 performs better than XFS in regard to operations (cp, rm) on
small files.
I read that there are some tuning options for XFS and small files, but
never tried them.

But if someone seconds XFS, I will try it too.


Re: [gentoo-user] Fast file system for cache directory with lot's of files

2012-08-13 Thread Pandu Poluan
On Aug 13, 2012 9:01 PM, Michael Hampicke mgehampi...@gmail.com wrote:

 You should have a look at xfs.

 I used to use ext4 earlier; traversing through /usr/portage used to be
very slow. When I switched to xfs, speed increased drastically.

 This might be kind of unrelated, but makes sense.


 I guess traversing through directories may be faster with XFS, but in my
experience ext4 performs better than XFS in regard to operations (cp, rm) on
small files.
 I read that there are some tuning options for XFS and small files, but
never tried them.

 But if someone seconds XFS, I will try it too.

Have you indexed your ext4 partition?

# tune2fs -O dir_index /dev/your_partition
# e2fsck -D /dev/your_partition

Rgds,


Re: [gentoo-user] Fast file system for cache directory with lot's of files

2012-08-13 Thread Daniel Troeder
On 13.08.2012 15:16, Michael Hampicke wrote:
 - about 20GB
 - 100.000 directories
 - about 2 million files
 
 The system has 2x Intel Xeon quad-cores (Nehalem), 16GB of RAM and two
 10.000rpm hard drives running a RAID1.
1st thought: switch to SSDs.
2nd thought: maybe lots of writes? -> get an SSD for the fs metadata.
3rd thought: purging old files with find? Your cache system should
have some kind of DB that holds that information.
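
For ext4, the closest thing to "metadata on an SSD" that I know of is an
external journal, roughly like this (untested sketch; the device names
are placeholders and the fs must be unmounted):

# create a journal device on the SSD
mke2fs -O journal_dev /dev/ssd1
# drop the internal journal, then attach the external one
tune2fs -O ^has_journal /dev/md0
tune2fs -j -J device=/dev/ssd1 /dev/md0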



Re: [gentoo-user] Fast file system for cache directory with lot's of files

2012-08-13 Thread Dale
Michael Hampicke wrote:

 You should have a look at xfs.

 I used to use ext4 earlier; traversing through /usr/portage used
 to be very slow. When I switched to xfs, speed increased drastically.

 This might be kind of unrelated, but makes sense.


 I guess traversing through directories may be faster with XFS, but in
 my experience ext4 performs better than XFS in regard to operations
 (cp, rm) on small files.
 I read that there are some tuning options for XFS and small files, but
 never tried them.

 But if someone seconds XFS, I will try it too.

It's been a while since I messed with this but isn't XFS the one that
hates power failures and such? 

Dale

:-) :-) 

-- 
I am only responsible for what I said ... Not for what you understood or how 
you interpreted my words!



Re: [gentoo-user] Fast file system for cache directory with lot's of files

2012-08-13 Thread Michael Hampicke

 Have you indexed your ext4 partition?

 # tune2fs -O dir_index /dev/your_partition
 # e2fsck -D /dev/your_partition

Hi, the dir_index is active. I guess that's why delete operations take as
long as they take (index has to be updated every time)


Re: [gentoo-user] Fast file system for cache directory with lot's of files

2012-08-13 Thread Michael Mol
On Mon, Aug 13, 2012 at 10:42 AM, Michael Hampicke mgehampi...@gmail.com wrote:

 Have you indexed your ext4 partition?

 # tune2fs -O dir_index /dev/your_partition
 # e2fsck -D /dev/your_partition

 Hi, the dir_index is active. I guess that's why delete operations take as
 long as they take (index has to be updated every time)


1) Scan for files to remove
2) disable index
3) Remove files
4) enable index

?
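
Something like this, I suppose (untested sketch; I am not sure
dir_index can be toggled on a mounted fs, so this assumes the partition
can briefly be taken offline; device and paths are placeholders):

find /cache -type f -mtime +30 -print0 > /tmp/victims  # 1) scan
umount /cache
tune2fs -O ^dir_index /dev/your_partition              # 2) disable index
mount /cache
xargs -0 rm -f < /tmp/victims                          # 3) remove files
umount /cache
tune2fs -O dir_index /dev/your_partition               # 4) enable index
e2fsck -D /dev/your_partition                          #    and rebuild it
mount /cache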

-- 
:wq


Re: [gentoo-user] Fast file system for cache directory with lot's of files

2012-08-13 Thread Michael Hampicke
2012/8/13 Daniel Troeder dan...@admin-box.com

 On 13.08.2012 15:16, Michael Hampicke wrote:
  - about 20GB
  - 100.000 directories
  - about 2 million files
 
  The system has 2x Intel Xeon quad-cores (Nehalem), 16GB of RAM and two
  10.000rpm hard drives running a RAID1.
 1st thought: switch to SSDs
 2nd thought: maybe lots of writes? - get a SSD for the fs metadata
 3rd thought: purging old files with find? your cache system should
 have some kind of DB that holds that information.


1: SSDs are not possible atm. The machine is in a data center abroad.
2: Writes are not that much of a problem at this time.
3: Well, it's a 3rd-party application that - in theory - should take care
of removing old files. Sadly, it does not work as it's supposed to,
and as time passes the number of orphans grows :(


Re: [gentoo-user] Fast file system for cache directory with lot's of files

2012-08-13 Thread Michael Hampicke

 I guess traversing through directories may be faster with XFS, but in my
 experience ext4 performs better than XFS in regard to operations (cp, rm) on
 small files.
 I read that there are some tuning options for XFS and small files, but
 never tried them.

  But if someone seconds XFS, I will try it too.


 It's been a while since I messed with this but isn't XFS the one that
 hates power failures and such?

 Dale

 :-) :-)

 --
 I am only responsible for what I said ... Not for what you understood or how 
 you interpreted my words!

 Well, it's the delayed allocation of XFS (which prevents fragmentation)
that does not like sudden power losses :) But ext4 has that too; you can
disable it, though - and that should be true for XFS too.
But the power situation in the datacenter has never been a problem so far,
and even if the cache partition gets screwed, we can always rebuild it.
Takes a few hours, but it would not be the end of the world :)
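
(For ext4 the knob is the nodelalloc mount option, i.e. something like

mount -o remount,nodelalloc /cache

- a sketch only, I have not tested what it does to performance.)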


Re: [gentoo-user] Fast file system for cache directory with lot's of files

2012-08-13 Thread Nilesh Govindrajan

On Mon 13 Aug 2012 08:28:15 PM IST, Michael Hampicke wrote:

I guess traversing through directories may be faster with XFS,
but in my experience ext4 performs better than XFS in regard to
operations (cp, rm) on small files.
I read that there are some tuning options for XFS and small
files, but never tried them.

But if someone seconds XFS, I will try it too.


It's been a while since I messed with this but isn't XFS the one
that hates power failures and such?

Dale

:-) :-)

--
I am only responsible for what I said ... Not for what you understood or 
how you interpreted my words!

Well, it's the delayed allocation of XFS (which prevents
fragmentation) that does not like sudden power losses :) But ext4 has
that too; you can disable it, though - and that should be true for XFS too.
But the power situation in the datacenter has never been a problem so
far, and even if the cache partition gets screwed, we can always
rebuild it. Takes a few hours, but it would not be the end of the world :)


Yes, XFS hates power failures. I got a giant UPS for my home desktop to 
use XFS because of its excellent performance ;-)


--
Nilesh Govindrajan
http://nileshgr.com



Re: [gentoo-user] Fast file system for cache directory with lot's of files

2012-08-13 Thread Michael Hampicke
On 13.08.2012 16:52, Michael Mol wrote:
 On Mon, Aug 13, 2012 at 10:42 AM, Michael Hampicke 
 mgehampi...@gmail.com wrote:
 
 Have you indexed your ext4 partition?

 # tune2fs -O dir_index /dev/your_partition
 # e2fsck -D /dev/your_partition

 Hi, the dir_index is active. I guess that's why delete operations take as
 long as they take (index has to be updated every time)

 
 1) Scan for files to remove
 2) disable index
 3) Remove files
 4) enable index
 
 ?
 

That's what I love about gentoo-users :) - I would never have thought of
that myself. I will try this and see how much of a performance gain
there is. Disabling the index should only require a 'mount -o remount', I
guess.



Re: [gentoo-user] Fast file system for cache directory with lot's of files

2012-08-13 Thread Michael Mol
On Mon, Aug 13, 2012 at 11:26 AM, Michael Hampicke mgehampi...@gmail.com wrote:

 On 13.08.2012 16:52, Michael Mol wrote:
  On Mon, Aug 13, 2012 at 10:42 AM, Michael Hampicke
 mgehampi...@gmail.com wrote:
 
  Have you indexed your ext4 partition?
 
  # tune2fs -O dir_index /dev/your_partition
  # e2fsck -D /dev/your_partition
 
  Hi, the dir_index is active. I guess that's why delete operations take
 as
  long as they take (index has to be updated every time)
 
 
  1) Scan for files to remove
  2) disable index
  3) Remove files
  4) enable index
 
  ?
 

 That's what I love about gentoo-users :) - I would never have thought of
 that myself. I will try this and see how much of a performance gain
 there is. Disabling the index should only require a 'mount -o remount', I
 guess.


It's the same logic as in database programming: do a bulk modification,
then update the index afterwards. The index update will take longer than
the one for a single file, but it has the potential to be more efficient in
bulk operations. (It'd be nice if ext4 supported transactional behaviors,
where you could defer index updates until after a commit, but I don't think
it (or any filesystem on Linux) does.)

You *should* be able to enable/disable indexes on a per-directory basis, so
if your search pattern is confined, I'd go that route.



-- 
:wq


Re: [gentoo-user] Fast file system for cache directory with lot's of files

2012-08-13 Thread Florian Philipp
On 13.08.2012 16:52, Michael Mol wrote:
 On Mon, Aug 13, 2012 at 10:42 AM, Michael Hampicke
 mgehampi...@gmail.com wrote:
 
 Have you indexed your ext4 partition?
 
 # tune2fs -O dir_index /dev/your_partition
 # e2fsck -D /dev/your_partition
 
 Hi, the dir_index is active. I guess that's why delete operations
 take as long as they take (index has to be updated every time) 
 
 
 1) Scan for files to remove
 2) disable index
 3) Remove files
 4) enable index
 
 ?
 
 -- 
 :wq

Other things to think about:

1. Play around with data=journal/writeback/ordered. IIRC, data=journal
actually used to improve performance depending on the workload as it
delays random IO in favor of sequential IO (when updating the journal).

2. Increase the journal size.

3. Take a look at `man 1 chattr`. Especially the 'T' attribute. Of
course this only helps after re-allocating everything.

4. Try parallelizing. Ext4 requires relatively few locks nowadays (since
2.6.39 IIRC). For example:
find $TOP_DIR -mindepth 1 -maxdepth 1 -print0 | \
xargs -0 -n 1 -r -P 4 -I '{}' find '{}' -type f

5. Use a separate device for the journal.

6. Temporarily deactivate the journal with tune2fs similar to MM's idea.
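
For 2., 3. and 6., roughly (untested; the device is a placeholder and
the tune2fs steps need the fs unmounted):

tune2fs -O ^has_journal /dev/your_partition  # 6. drop the journal...
tune2fs -j -J size=400 /dev/your_partition   # 2. ...or re-add it bigger (in MB)
chattr +T /top/dir                           # 3. treat /top/dir as the top of a
                                             #    directory hierarchy (Orlov allocator)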

Regards,
Florian Philipp





Re: [gentoo-user] Fast file system for cache directory with lot's of files

2012-08-13 Thread Michael Hampicke
On 13.08.2012 19:14, Florian Philipp wrote:
 On 13.08.2012 16:52, Michael Mol wrote:
 On Mon, Aug 13, 2012 at 10:42 AM, Michael Hampicke
 mgehampi...@gmail.com wrote:

 Have you indexed your ext4 partition?

 # tune2fs -O dir_index /dev/your_partition
 # e2fsck -D /dev/your_partition

 Hi, the dir_index is active. I guess that's why delete operations
 take as long as they take (index has to be updated every time) 


 1) Scan for files to remove
 2) disable index
 3) Remove files
 4) enable index

 ?

 -- 
 :wq
 
 Other things to think about:
 
 1. Play around with data=journal/writeback/ordered. IIRC, data=journal
 actually used to improve performance depending on the workload as it
 delays random IO in favor of sequential IO (when updating the journal).
 
 2. Increase the journal size.
 
 3. Take a look at `man 1 chattr`. Especially the 'T' attribute. Of
 course this only helps after re-allocating everything.
 
 4. Try parallelizing. Ext4 requires relatively few locks nowadays (since
 2.6.39 IIRC). For example:
 find $TOP_DIR -mindepth 1 -maxdepth 1 -print0 | \
 xargs -0 -n 1 -r -P 4 -I '{}' find '{}' -type f
 
 5. Use a separate device for the journal.
 
 6. Temporarily deactivate the journal with tune2fs similar to MM's idea.
 
 Regards,
 Florian Philipp
 

Trying out different journal modes/options was already on my list, but the
manpage on chattr regarding the T attribute is an interesting read.
Definitely worth trying.

Parallelizing multiple finds was something I already did, but the only
thing that increased was the IO wait :) But now, having read all the
suggestions in this thread, I might try it again.

A separate device for the journal is a good idea, but not possible atm
(the machine is abroad in a data center)



Re: [gentoo-user] Fast file system for cache directory with lot's of files

2012-08-13 Thread Paul Hartman
On Mon, Aug 13, 2012 at 8:16 AM, Michael Hampicke mgehampi...@gmail.com wrote:
 Howdy gentooers,

 I am looking for a filesystem that performs well for a cache directory.
 Here's some data on that dir:
 - cache for prescaled images files + metadata files
 - nested directory structure ( 20/2022/202231/*files* )
 - about 20GB
 - 100.000 directories
 - about 2 million files

 The system has 2x Intel Xeon quad-cores (Nehalem), 16GB of RAM and two
 10.000rpm hard drives running a RAID1.

 Up until now I was using ext4 with noatime, but I am not happy with its
 performance. Finding and deleting old files with 'find' is incredibly slow,
 so I am looking for a filesystem that performs better. The first candidate
 that came to mind was reiserfs, but last time I tried it, it became slower
 over time (fragmentation?).
 Currently I am running a test with btrfs and so far I am quite happy with it
 as it is much faster in my use case.

 Do you guys have any other suggestions? How about JFS? I used that on my old
 NAS box because of its low CPU usage. Should I give reiser4 a try, or
 better leave it be given Hans Reiser's current status?

I think btrfs is probably meant to provide a lot of the modern
features of reiser4 or xfs (tail-packing, indexing, compression,
snapshots, subvolumes, etc). I don't know if it is considered stable
enough for your usage, but at least it is under active development and
funded by large names. I think if you would consider reiser4 as a
possibility, then you should consider btrfs as well.



Re: [gentoo-user] Fast file system for cache directory with lot's of files

2012-08-13 Thread Volker Armin Hemmann
On Monday, 13 August 2012, 15:13:03, Paul Hartman wrote:
 On Mon, Aug 13, 2012 at 8:16 AM, Michael Hampicke mgehampi...@gmail.com 
wrote:
  Howdy gentooers,
  
  I am looking for a filesystem that performs well for a cache directory.
  Here's some data on that dir:
  - cache for prescaled images files + metadata files
  - nested directory structure ( 20/2022/202231/*files* )
  - about 20GB
  - 100.000 directories
  - about 2 million files
  
  The system has 2x Intel Xeon quad-cores (Nehalem), 16GB of RAM and two
  10.000rpm hard drives running a RAID1.
  
  Up until now I was using ext4 with noatime, but I am not happy with its
  performance. Finding and deleting old files with 'find' is incredibly
  slow,
  so I am looking for a filesystem that performs better. The first candidate
  that came to mind was reiserfs, but last time I tried it, it became slower
  over time (fragmentation?).
  Currently I am running a test with btrfs and so far I am quite happy with
  it as it is much faster in my use case.
  
  Do you guys have any other suggestions? How about JFS? I used that on my
  old NAS box because of its low CPU usage. Should I give reiser4 a try,
  or better leave it be given Hans Reiser's current status?
 
 I think btrfs is probably meant to provide a lot of the modern
 features of reiser4 or xfs (tail-packing, indexing, compression,
 snapshots, subvolumes, etc). I don't know if it is considered stable
 enough for your usage, but at least it is under active development and
 funded by large names. I think if you would consider reiser4 as a
 possibility, then you should consider btrfs as well.

reiser4 has one feature btrfs and every other fs is missing: atomic operations.

Which is a wonderful feature. Too bad 'politics' killed reiser4.

-- 
#163933



Re: [gentoo-user] Fast file system for cache directory with lot's of files

2012-08-13 Thread Adam Carter
 I think btrfs probably is meant to provide a lot of the modern
 features of reiser4 or xfs

Unfortunately btrfs is still generally slower than ext4, for example.
Check out http://openbenchmarking.org/, e.g.
http://openbenchmarking.org/s/ext4%20btrfs

The OS will use any spare RAM for disk caching, so if there's not much
else running on that box, most of your content will be served from
RAM. It may be that whatever fs you choose won't make that much of a
difference anyway.