James G. Sack (jim) ([EMAIL PROTECTED]) @ Tue, Aug 12, 2008 at 06:45:02PM -0700:
> Andrew Lentvorski wrote:
> > Ralph Shumaker wrote:
> >> Way back when I was running whendoze (around 10 years ago), I
> >> had a shareware program called File Hound. One of the many
> >> features that I liked about that program is that if I had it
> >> running while browsing the internet, and I right click on
> >> something and choose "copy link location", File Hound would
> >> take it from there and fetch the object of that link, in the
> >> background, while I continued surfing. It even cached
> >> subsequent links and got to them as soon as it could.
> >
> > IIRC, FasterFox had an "aggressive" setting that would do this.
> >
> > This kind of aggressive prefetching is frowned upon for a
> > couple of reasons. The primary one is that it consumes a lot
> > of server bandwidth that the user quite often doesn't follow
> > up. Basically, if everybody starts prefetching, your bandwidth
> > goes up 10x with no increase in anything else which would help
> > you pay for it.
> >
> > Some sites started monitoring it and would actually block your
> > IP for a few minutes if they detected you doing it.
>
> The wget man page talks about that too, under the "--random-wait"
> option.
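For reference, here is roughly what the "--random-wait" option mentioned
above looks like in practice. This is only a sketch: the URL is a
placeholder, and the flags shown (--recursive, --level, --wait,
--random-wait) are standard wget options for fetching a page plus whatever
it links to while pausing a randomized interval (based on --wait) between
requests. wget also consults the site's robots.txt during recursive fetches
unless you explicitly turn that behavior off.

    # Fetch one page and the things it links to, politely: wait between
    # requests, and vary the wait so the pattern is less mechanical.
    wget --recursive --level=1 --wait=1 --random-wait http://example.com/some-page.html
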
People who would block your IP would surely have a "robots.txt" file
(correct me if I got the filename wrong). If your script honors that file,
it should earn you fewer enemies.

Using tabs to load pages in the background is good enough for me. I don't
really want my tool to follow subsequent links unless I know something
about what is there. I like letting databases be stored on the server ;-)

As far as getting the URL to your script is concerned, this sounds like a
good job for a named pipe. Check out the manpage for the "mkfifo" command.
You could have your script read from the pipe as long as you want, and
react any time you send it a URL. Let's say you do a "Save page as" in
your browser, and save it to that file (which is really a pipe). Your
script could scrape the URLs out of the webpage, and call wget to do the
rest. In fact, any script or program could send a document to that pipe.
As long as your script can accurately pick out the URLs, it could be quite
flexible. (A rough sketch of this idea follows below.)

Wade Curry
syntaxman
--
[email protected]
http://www.kernel-panic.org/cgi-bin/mailman/listinfo/kplug-list
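A rough sketch of the named-pipe idea described above, for illustration
only: the pipe path (~/fetch-queue), the download directory (~/prefetch),
and the URL-matching pattern are all made up, and a real script would want
smarter URL extraction than a single grep. mkfifo, grep, and wget are used
only in ways their manpages describe.

    # Create the pipe once; anything written to it is treated as a
    # document to scan for links.
    mkfifo ~/fetch-queue
    mkdir -p ~/prefetch

    while true; do
        # Opening the pipe blocks until some program writes to it; grep
        # then pulls out anything that looks like an http(s) URL.
        grep -Eo 'https?://[^"<> ]+' < ~/fetch-queue |
        while read -r url; do
            # Fetch each link quietly in the background so the loop keeps
            # reading new input from the pipe.
            wget -q -P ~/prefetch "$url" &
        done
    done

Any program that can write to a file can then feed it, for example:

    cat saved-page.html > ~/fetch-queue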
