On Thu, 2010-10-28 at 10:04 -0400, Tom Buskey wrote:
> > An agent or agents purporting to be Greg Rundlett (freephile) said:
> > > I liked this post which in summary is a reminder of the Unix
> > > Philosophy
> > > http://teddziuba.com/2010/10/taco-bell-programming.html
> >
> > Some of it reminds me of Jon Bentley's(?) "Programming Pearls"
> > series/books.
Which is in the library:
http://www.librarything.com/catalog/dlslug&deepsearch=bentley

Back to the Taco Bell link and the suggestion to minimize the amount of
code you write: shell scripts are still code, and they are often hard
to read and hard to maintain. The suggestion that sed is preferable to
Python seems like poor advice; certainly Perl beats a sed script. I
wrote some horrendously complicated sed scripts ages ago, before I knew
of Perl and Python.

I do agree with his basic point: use the system tools to glue your
processing into a series of simple steps.

If you *are* processing millions of web pages, don't forget to use the
timestamp features in wget and find to skip pages you've already
processed. (This only makes sense if you read the "Taco Bell" link.)

    now=$(date)
    find ... -newer last_processed ...
    touch --date="$now" last_processed

If the crawling overlaps with the processing, you will process some
files in consecutive runs. I assume that's OK; at least you process
every file.

-- 
Lloyd Kvam
Venix Corp

DLSLUG/GNHLUG library
http://dlslug.org/library.html
http://www.librarything.com/catalog/dlslug
http://www.librarything.com/catalog/dlslug&sort=stamp
http://www.librarything.com/rss/recent/dlslug
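P.S. The timestamp trick above can be fleshed out into a minimal,
self-contained sketch. The `pages/` directory name and the `wc -l`
"processing" step are placeholders for illustration, and `touch --date`
and `xargs -r` assume GNU coreutils/findutils:

```shell
#!/bin/sh
# Incremental processing: handle only files modified since the last run,
# using a timestamp marker file and find -newer.
set -eu

dir=pages              # crawl output directory (placeholder name)
stamp=last_processed   # timestamp marker file

mkdir -p "$dir"
# First run: create the marker far enough in the past to match everything.
[ -f "$stamp" ] || touch --date='1970-01-01' "$stamp"

# Record "now" BEFORE scanning, so files that arrive while we work are
# picked up on the next run instead of being skipped forever.
now=$(date)

# Process every file newer than the marker (placeholder step: count lines).
find "$dir" -type f -newer "$stamp" -print0 |
    xargs -0 -r wc -l

# Advance the marker to the time the scan started.
touch --date="$now" "$stamp"
```

As the note above says, a file that lands between recording `$now` and
the end of the scan may be processed twice; the window errs toward
reprocessing rather than missing files.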