Re: Optimize shell

2006-02-07 Thread Olivier Nicole
Thanks for the suggestions.
> I am setting up a machine to work as a mail back-up. It receives a copy
> of every email for every user. When the disk is almost full, I want to
> delete older messages up to a total size of 4 GB.

Moving to database storage was a good idea, but it is not an option as the
system is already running. Using the delete functions of other tools
could be a solution, though I doubt they work across all users.

Using bash could be a way to go, as could using locate (possible, but it
would then need a second command to get the file size, so I am not sure
that it would save much).

And my assumption was wrong: most of the time was spent in the sed,
not in the sort. In fact I did not need the sed, as I could split the
fields on the / for sort and pick up the correct field in
awk. Using xargs also sped things up a little.

Here is the final solution:

mailbackroot66: cat func5
#!/bin/sh
/usr/bin/find /home -mindepth 5 -ls | /usr/bin/grep /Maildir/cur/ |
/usr/bin/sort -t/ -n +6 | /usr/bin/awk '{sum+=$7; if (sum < 2) print $11;}' |
xargs cat > /dev/null
mailbackroot67: time ./func5
0.806u 3.086s 0:35.69 10.8% 67+405k 9864+21io 5pf+0w
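
The sed-free pipeline above can also be sketched as a small function that reads `find -ls` output on stdin, which makes it easy to try on a few sample lines before pointing it at /home. This is only a sketch under assumptions: the function name oldest_until and its byte-limit argument are illustrative, paths are assumed to contain no spaces, and the newer `-k7n` key syntax stands in for the obsolete `+6` form.

```sh
#!/bin/sh
# Reads `find -ls`-style lines on stdin and prints the paths of the oldest
# messages for as long as their cumulative size stays below the byte limit
# given as $1.
# Assumes paths of the form /home/<sub_home>/<user>/Maildir/cur/<timestamp>.<rest>
# so that the file name is the 7th "/"-separated field, and the size and path
# are the 7th and 11th whitespace-separated fields.
oldest_until() {
    sort -t/ -k7n |
    awk -v limit="$1" '{ sum += $7; if (sum < limit) print $11 }'
}
```

A call such as `find /home -mindepth 5 -ls | grep /Maildir/cur/ | oldest_until 4000000000` would then list the candidates without touching them.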

And the original one:

mailbackroot68: cat func1
#!/bin/sh
for i in `/usr/bin/find /home -mindepth 5 -ls | /usr/bin/grep /Maildir/cur/ |
/usr/bin/sed -E 's/^ *[0-9]+ +[0-9]+ +[-rwx]+ +[0-9]+ +[^ ]+ +[^ ]+ +([0-9]+) +.*(\/home\/.*\/)([0-9]+)(\..*)$/\3 \1 \2\3\4/' |
/usr/bin/sort -n +0 -1 |
/usr/bin/awk '{sum+=$2; if (sum < 2) print $3;}'`; do
    cat "$i" > /dev/null
done
mailbackroot69: time ./func1
223.665u 12.341s 4:53.42 80.4%  48+315k 9100+13io 0pf+0w

35 seconds is OK.

Best regards,

Olivier

___
freebsd-questions@freebsd.org mailing list
http://lists.freebsd.org/mailman/listinfo/freebsd-questions
To unsubscribe, send any mail to [EMAIL PROTECTED]


Optimize shell

2006-02-06 Thread Olivier Nicole
Hello,

I am setting up a machine to work as a mail back-up. It receives a copy
of every email for every user. When the disk is almost full, I want to
delete older messages up to a total size of 4 GB.

Messages are stored in /home/sub_home/user/Maildir/cur in maildir
format. 

Message name is of the form 1137993135.86962_0.machine.cs.ait.ac.th
where the first number is a Unix time stamp.

I came up with the following shell script to find the messages of all users,
sort them by date and compute the total size up to 4 GB.

for i in `/usr/bin/find /home -mindepth 5 -ls | /usr/bin/grep /Maildir/cur/ |
/usr/bin/sed -E 's/^ *[0-9]+ +[0-9]+ +[-rwx]+ +[0-9]+ +[^ ]+ +[^ ]+ +([0-9]+) +.*(\/home\/.*\/)([0-9]+)(\..*)$/\3 \1 \2\3\4/' |
/usr/bin/sort -n +0 -1 |
/usr/bin/awk '{sum+=$2; if (sum < 4000000000) print $3;}'`; do
    /bin/rm "$i"
done

find /home -mindepth 5 -ls lists all files and directories at
 a depth of 5 or more, because my directory structure is such that
 messages are stored at level 6

grep /Maildir/cur/ because courier-imap tends to put things in other
 directories that it creates when it needs to

These two commands give me a list of the form:

1397490 8 -rw------- 1 on staff 3124 Jan 27 15:23 /home/java/on/Maildir/cur/1138350182.1413_1.mackine.cs.ait.ac.th

where 3124 is the size

The sed command transforms the line into date, size, filename:

1137994623 2466 /home/java/on/Maildir/cur/1137994623.87673_0.mail.cs.ait.ac.th
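
The same date, size, filename triple can also be produced without the long regular expression. A minimal sketch, assuming the usual `find -ls` layout (size in field 7, path in field 11) and paths without spaces; the function name to_triples is made up for illustration and is not part of the original script:

```sh
#!/bin/sh
# Turn `find -ls`-style lines on stdin into "timestamp size path" triples.
to_triples() {
    awk '{
        n = split($11, seg, "/")     # seg[n] is the message file name
        split(seg[n], part, ".")     # part[1] is the leading Unix timestamp
        print part[1], $7, $11
    }'
}
```

Its output can be fed to the same sort and awk stages as the sed version.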

Then it sorts on the date field and awk is used to sum on the size
field and print the filename until the total of 4 GB is reached.

That works OK, but it is damn slow: for 200 users, 7800 messages and
302 MB it takes something like 3+ minutes... For 25 GB of email it
should take more than 4 hours, which is too much.
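
As a rough check of that estimate, a linear extrapolation from the timed run can be computed as below. The figures plugged in are assumptions: 225 seconds stands for the 3:44.75 wall-clock time reported for the run with sort, and 25 GB is taken as 25000 MB. It also assumes run time scales linearly with mailbox size, so it is a ballpark figure rather than a prediction.

```sh
#!/bin/sh
# Linear extrapolation of run time.
# $1 = measured seconds, $2 = measured size in MB, $3 = target size in MB
estimate_hours() {
    awk -v t="$1" -v s="$2" -v target="$3" \
        'BEGIN { printf "%.1f\n", t * target / s / 3600 }'
}

estimate_hours 225 302 25000   # prints 5.2
```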

It seems that the slow part is the sort:

without sort
time /usr/bin/find /home -mindepth 5 -ls | /usr/bin/grep /Maildir/cur/ |
/usr/bin/sed -E 's/^ *[0-9]+ +[0-9]+ +[-rwx]+ +[0-9]+ +[^ ]+ +[^ ]+ +([0-9]+) +.*(\/home\/.*\/)([0-9]+)(\..*)$/\3 \1 \2\3\4/' |
cat > /dev/null
0.026u 0.035s 0:07.67 0.6%  51+979k 0+0io 0pf+0w

with sort
time /usr/bin/find /home -mindepth 5 -ls | /usr/bin/grep /Maildir/cur/ |
/usr/bin/sed -E 's/^ *[0-9]+ +[0-9]+ +[-rwx]+ +[0-9]+ +[^ ]+ +[^ ]+ +([0-9]+) +.*(\/home\/.*\/)([0-9]+)(\..*)$/\3 \1 \2\3\4/' |
/usr/bin/sort -n +0 -1 |
cat > /dev/null
0.281u 0.366s 3:44.75 0.2%  39+1042k 0+0io 0pf+0w

Any idea how to speed things up?

Thanks in advance,

Olivier


Re: Optimize shell

2006-02-06 Thread Norberto Meijome
Olivier Nicole wrote:
> Any idea how to speed things up?

Assuming sort is slow because of the number of items it has to
handle, it may help to reduce the number of items in the list.
For example, can you set a limit such as deleting the oldest x months,
or keeping only 3 months of recent mail in the cur folder? (In that case
you could just search by timestamp and forget about sorting and awking.)
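
A cutoff-based prune along these lines needs no sort at all, because each message file name already starts with its delivery time. A minimal sketch; the function name older_than and the 90-day window in the usage comment are illustrative choices, not something taken from the thread:

```sh
#!/bin/sh
# Print the paths read from stdin whose file name starts with a Unix
# timestamp earlier than the cutoff (epoch seconds) given as $1.
older_than() {
    awk -F/ -v cutoff="$1" '{
        split($NF, part, ".")            # part[1] is the leading timestamp
        if (part[1] + 0 < cutoff + 0) print
    }'
}

# Hypothetical usage: list messages older than roughly three months.
# cutoff=$(( $(date +%s) - 90 * 86400 ))
# find /home -mindepth 5 -type f -path '*/Maildir/cur/*' | older_than "$cutoff"
```

The trade-off is that this bounds the age of the retained mail rather than the amount of disk space freed.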

I have also found that sort is much slower than purpose-built sorting
utilities (sort is much, much slower than zmergelog when sorting
several GB of Apache log files). Maybe you can write or use some
other tool for this?

Beto


Re: Optimize shell

2006-02-06 Thread a non y mouse
Olivier Nicole wrote:
> I am setting up a machine to work as a mail back-up. It receives a copy
> of every email for every user. When the disk is almost full, I want to
> delete older messages up to a total size of 4 GB.
>
> Any idea how to speed things up?

look into the SquirrelMail proon plugin. i imagine that you could,
at minimum, adapt what it does to prune old messages into something
usable for yourself.

http://www.squirrelmail.org/plugin_view.php?id=251

-- 
http://forea.ch/