Re: Optimize shell
Thanks for the suggestions.

I am setting up a machine to work as a mail back-up. It receives a copy of every email for every user. When the disk is almost full, I want to delete the oldest messages, up to a total size of 4 GB.

Moving to database storage was a good idea, but it is not an option as the system is already running. Using the delete functions of other tools could be a solution, though I doubt they work across all the users. Using bash could be a way to go, as could using locate (possible, but it would then need a second command to get the file size, so I am not sure it would save much).

And my assumption was wrong: most of the time was spent in the sed, not in the sort. In fact I did not need the sed at all, as I could split the fields on the / for sort and pick up the correct argument in awk. Using xargs also sped things up a little.

Here is the final solution:

    mailbackroot66: cat func5
    #!/bin/sh
    /usr/bin/find /home -mindepth 5 -ls |
        /usr/bin/grep /Maildir/cur/ |
        /usr/bin/sort -t/ -n +6 |
        /usr/bin/awk '{sum+=$7; if (sum < 2) print $11;}' |
        xargs cat > /dev/null

    mailbackroot67: time ./func5
    0.806u 3.086s 0:35.69 10.8% 67+405k 9864+21io 5pf+0w

And the original one:

    mailbackroot68: cat func1
    #!/bin/sh
    for i in `/usr/bin/find /home -mindepth 5 -ls |
        /usr/bin/grep /Maildir/cur/ |
        /usr/bin/sed -E 's/^ *[0-9]+ +[0-9]+ +[-rwx]+ +[0-9]+ +[^ ]+ +[^ ]+ +([0-9]+) +.*(\/home\/.*\/)([0-9]+)(\..*)$/\3 \1 \2\3\4/' |
        /usr/bin/sort -n +0 -1 |
        /usr/bin/awk '{sum+=$2; if (sum < 2) print $3;}'`; do
        cat $i > /dev/null
    done

    mailbackroot69: time ./func1
    223.665u 12.341s 4:53.42 80.4% 48+315k 9100+13io 0pf+0w

35 seconds is OK.

Best regards,

Olivier

___ freebsd-questions@freebsd.org mailing list http://lists.freebsd.org/mailman/listinfo/freebsd-questions To unsubscribe, send any mail to [EMAIL PROTECTED]
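The idea behind the final solution (let sort split on "/" and let awk pick the fields, with no sed in between) can be sketched as a small shell function. This is only a sketch under assumptions that are not in the thread: the function name pick_oldest is made up, the limit is passed in bytes as an argument, paths contain no whitespace, and messages sit at /home/<group>/<user>/Maildir/cur/ so that the file name is the 7th "/"-separated field. It uses -k7,7n, the POSIX key syntax, in place of the obsolete +6 form, and it stops reading once the limit is passed rather than scanning the whole list.

```sh
#!/bin/sh
# Hypothetical helper (name and argument are assumptions, not from the thread).
# Reads `find -ls` lines on stdin and prints the paths of the oldest messages
# whose cumulative size stays within the byte limit given as $1.
# Assumes 11 whitespace-separated fields per line: size in $7, path in $11.
pick_oldest() {
    # Sort numerically on the 7th "/"-separated field, i.e. the file name,
    # which starts with the Unix timestamp.
    sort -t/ -k7,7n |
    # Accumulate sizes; stop as soon as the running total exceeds the limit.
    awk -v limit="$1" '{ sum += $7; if (sum > limit) exit; print $11 }'
}

# Example (prints candidates only, deletes nothing; 4000000000 is an
# assumed byte value for "4 GB"):
#   find /home -mindepth 5 -ls | grep /Maildir/cur/ | pick_oldest 4000000000
```

Printing the candidate list first and feeding it to xargs rm only after checking it keeps a mistake in the field numbers from deleting the wrong files.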
Optimize shell
Hello,

I am setting up a machine to work as a mail back-up. It receives a copy of every email for every user. When the disk is almost full, I want to delete the oldest messages, up to a total size of 4 GB.

Messages are stored in /home/sub_home/user/Maildir/cur in maildir format. Message names are of the form 1137993135.86962_0.machine.cs.ait.ac.th, where the first number is a Unix time stamp.

I came up with the following shell script to find the messages of all users, sort them by date and compute the total size up to 4 GB:

    for i in `/usr/bin/find /home -mindepth 5 -ls |
        /usr/bin/grep /Maildir/cur/ |
        /usr/bin/sed -E 's/^ *[0-9]+ +[0-9]+ +[-rwx]+ +[0-9]+ +[^ ]+ +[^ ]+ +([0-9]+) +.*(\/home\/.*\/)([0-9]+)(\..*)$/\3 \1 \2\3\4/' |
        /usr/bin/sort -n +0 -1 |
        /usr/bin/awk '{sum+=$2; if (sum < 40) print $3;}'`; do
        /bin/rm $i
    done

find /home -mindepth 5 -ls lists all files and directories at a depth of 5 or more, because my directory structure is such that messages are stored at level 6.

grep /Maildir/cur/ is there because Courier IMAP tends to put things in other directories that it creates when it needs to.

These two commands give me a list of the form:

    13974908 -rw---1 on staff3124 Jan 27 15:23 /home/java/on/Maildir/cur/1138350182.1413_1.mackine.cs.ait.ac.th

where 3124 is the size.

The sed command transforms each line into date, size, filename:

    1137994623 2466 /home/java/on/Maildir/cur/1137994623.87673_0.mail.cs.ait.ac.th

Then sort orders the lines on the date field, and awk sums the size field and prints the filename until the total of 4 GB is reached.

That works OK, but it is damn slow: for 200 users, 7800 messages and 302 MB it takes something like 3+ minutes. For 25 GB of email it should take more than 4 hours, which is too much.

It seems that the long part is the sort.

Without sort:

    time /usr/bin/find /home -mindepth 5 -ls | /usr/bin/grep /Maildir/cur/ | /usr/bin/sed -E 's/^ *[0-9]+ +[0-9]+ +[-rwx]+ +[0-9]+ +[^ ]+ +[^ ]+ +([0-9]+) +.*(\/home\/.*\/)([0-9]+)(\..*)$/\3 \1 \2\3\4/' | cat > /dev/null
    0.026u 0.035s 0:07.67 0.6% 51+979k 0+0io 0pf+0w

With sort:

    time /usr/bin/find /home -mindepth 5 -ls | /usr/bin/grep /Maildir/cur/ | /usr/bin/sed -E 's/^ *[0-9]+ +[0-9]+ +[-rwx]+ +[0-9]+ +[^ ]+ +[^ ]+ +([0-9]+) +.*(\/home\/.*\/)([0-9]+)(\..*)$/\3 \1 \2\3\4/' | /usr/bin/sort -n +0 -1 | cat > /dev/null
    0.281u 0.366s 3:44.75 0.2% 39+1042k 0+0io 0pf+0w

Any idea how to speed up the things?

Thanks in advance,

Olivier
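For illustration, the date, size, filename transformation described above can also be written with awk field splitting instead of one long sed regular expression. The sketch below is not the script from this message: the function name to_date_size_path is made up, and it assumes that every `find -ls` line has 11 whitespace-separated fields (size in the 7th, path in the 11th), that paths contain no whitespace, and that file names start with the Unix timestamp followed by a dot.

```sh
#!/bin/sh
# Hypothetical helper (name is an assumption, not from the thread).
# Turns `find -ls` lines read on stdin into "timestamp size path" lines.
to_date_size_path() {
    awk '{
        n = split($11, part, "/")   # part[n] is the file name
        ts = part[n]
        sub(/\..*/, "", ts)         # keep only the digits before the first dot
        print ts, $7, $11
    }'
}

# Example (read-only, nothing is deleted):
#   find /home -mindepth 5 -ls | grep /Maildir/cur/ | to_date_size_path | sort -k1,1n
```

Here sort -k1,1n is the POSIX spelling of the key that sort -n +0 -1 selects, namely the first field compared numerically.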
Re: Optimize shell
Olivier Nicole wrote:

I am setting up a machine to work as a mail back-up. It receives a copy of every email for every user. When the disk is almost full, I want to delete the oldest messages, up to a total size of 4 GB. [...] It seems that the long part is the sort [...] Any idea how to speed up the things?

Assuming sort is slow because of the number of items it has to handle, it may help to reduce the number of items in the list. That is, can you set a limit, such as deleting the oldest x months or keeping only 3 months of recent mail in the cur folder? In that case you could just search by timestamp and forget about sorting and awking.

I have also found that sort is much slower than purpose-built sorting utilities (sort is much, much slower than zmergelog when sorting several GB of Apache log files). Maybe you can write or use some other tool for this?

Beto
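The "search by timestamp and forget about sorting" suggestion could look roughly like the sketch below. It is an illustration only, under assumptions that are not in the thread: the function name older_than is made up, the cutoff is passed as seconds since the epoch, the 90 days in the example stand in for "3 months", and file names start with the Unix timestamp followed by a dot. Each path is compared against the cutoff on its own, so no sort is needed.

```sh
#!/bin/sh
# Hypothetical helper (name and argument are assumptions, not from the thread).
# Reads message paths on stdin and prints those whose leading file-name
# timestamp is older than the cutoff given as $1 (seconds since the epoch).
older_than() {
    awk -F/ -v cutoff="$1" '{
        ts = $NF                    # last "/"-separated field is the file name
        sub(/\..*/, "", ts)         # keep only the digits before the first dot
        if (ts + 0 < cutoff + 0) print
    }'
}

# Example (prints candidates only, deletes nothing):
#   cutoff=$(( $(date +%s) - 90 * 24 * 60 * 60 ))
#   find /home -mindepth 5 -type f -path '*/Maildir/cur/*' | older_than "$cutoff"
```

Unlike the size-based approach in the original question, this frees an unpredictable amount of space, so it fits only if a fixed retention period is acceptable.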
Re: Optimize shell
Olivier Nicole wrote:

I am setting up a machine to work as a mail back-up. It receives a copy of every email for every user. When the disk is almost full, I want to delete the oldest messages, up to a total size of 4 GB. Any idea how to speed up the things?

Look into the SquirrelMail proon plugin. I imagine that you could, at minimum, adapt what it does to prune old messages into something usable for yourself.

http://www.squirrelmail.org/plugin_view.php?id=251

-- 
http://forea.ch/