On Thu, 2010-10-28 at 10:04 -0400, Tom Buskey wrote:
> > An agent or agents purporting to be Greg Rundlett (freephile) said:
> > > I liked this post which in summary is a reminder of the Unix
> Philosophy
> > > http://teddziuba.com/2010/10/taco-bell-programming.html
> >
> 
> Some of it reminds me of Jon Bentley's(?) "Programming Pearls"
> series/books.

Which is in the library:
http://www.librarything.com/catalog/dlslug&deepsearch=bentley


Back to the Taco Bell link and the suggestion to minimize the amount of
code you write:
Still, shell scripts are code.  They're often hard to read and hard to
maintain.  The suggestion that sed was preferable to Python seems like
poor advice.  Certainly Perl beats a sed script.  I wrote some
horrendously complicated sed scripts ages ago before I knew of Perl and
Python.

I do agree with his basic point.  Use the system tools to glue your
processing into a series of simple steps.
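
For example (the log file names and field positions here are purely
illustrative, not anything from the original post), counting 404s across
a pile of Apache combined-format logs is just a chain of one-job tools,
no real program required:
        # top 20 URLs returning 404, from combined-format access logs
        zcat access_log*.gz \
            | awk '$9 == 404 {print $7}' \
            | sort | uniq -c | sort -rn | head -20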

If you *are* processing millions of web pages, don't forget to use the
timestamp features in wget and find to skip pages you've already
processed.  (This only makes sense if you read the "taco bell" link)
        now=$(date)                          # record when this run started
        find ... -newer last_processed ...   # only pick up files newer than the last run
        touch --date="$now" last_processed   # mark this run done, backdated to its start
If the crawling overlaps with the processing, you will process some
files in consecutive runs.  I assume that's OK.  At least you process
every file.
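
A slightly fuller sketch of the same idea (the mirror directory, the URL,
and the process_page command are placeholders I've made up, not anything
from the post):
        #!/bin/sh
        # Incremental crawl-and-process sketch.  MIRROR, the URL, and
        # process_page are hypothetical placeholders.
        MIRROR=./mirror
        STAMP=./last_processed

        # first run: treat everything as new
        [ -e "$STAMP" ] || touch --date="1970-01-01" "$STAMP"

        now=$(date)

        # wget -N (timestamping) skips pages the server says haven't changed
        wget -N -r -P "$MIRROR" http://www.example.com/

        # process only files changed since the last completed run
        find "$MIRROR" -type f -newer "$STAMP" -print0 |
            xargs -0 -n 1 process_page

        # backdate the stamp to this run's start; files crawled mid-run get
        # reprocessed next time, which is the acceptable overlap noted above
        touch --date="$now" "$STAMP"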

-- 
Lloyd Kvam
Venix Corp
DLSLUG/GNHLUG library
http://dlslug.org/library.html
http://www.librarything.com/catalog/dlslug
http://www.librarything.com/catalog/dlslug&sort=stamp
http://www.librarything.com/rss/recent/dlslug
