According to Zachary Jenks:
> > Greetings Mr. Adams,

Please see http://www.htdig.org/FAQ.html#q1.16

> > Currently I have auto indexing turned off in my Apache setup because I do
> > not want the public to access any of my php applications or view file lists.
> > Therefore, htdig is not indexing my directories. I've read over the FAQ and
> > entered in the sample script (FAQ 5.25) into my htdig.conf file but I'm not
> > getting any results. I get the following message after ./rundig:
> > ------------------------------------------------------------------
> > htmerge: Unable to open word list file
> > '/www3/umesd/searchengine/htdig/db/db.wordlist'.
> > Did you index anything?
> > Check your config file and try running htdig again.
> > ------------------------------------------------------------------
> > And -vvv shows me that it's setting New server to: , 0.
> >
> > Question1: Can you tell me exactly how and where to place that sample
> > script so that it works? I put it all in htdig.conf after "start_url" as
> > follows:
> > -------------------------------------------------------------------------
> > start_url: '/www3/umesd/searchengine/docs/':
> >
> > find /www3 -type f -name \*.html -print | \
> > sed -e 's|/www3|http://www.umesd.k12.or.us/|' > \
> > /www3/umesd/searchengine/docs/
> > -------------------------------------------------------------------------
> >
> > Is this correct???

Incorrect on 3 counts. First of all, the output from find and sed should
go to a regular file, not a directory, so you either need to remove the
trailing slash after "docs", if docs is a file and not a directory, or
append a file name after docs/ if docs is a directory. This change must
be made in both the start_url entry in htdig.conf and in the script that
runs find and sed.

Secondly, the file name you use in the start_url entry must be enclosed
in left quotes (`), i.e. the character usually on the same key as the
tilde (~), and not in apostrophes (').
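
As a rough sketch of the first two points, the find and sed pipeline could
be wrapped in a small shell function that writes the URL list to a regular
file. The function name make_url_list and the file name urls.txt are made
up for illustration, and the sed pattern here is anchored and drops the
trailing slash from the replacement so the generated URLs don't end up
with a double slash. Treat it as a starting point under those assumptions
rather than the FAQ's exact script.

```shell
#!/bin/sh
# make_url_list DOCROOT BASE_URL OUTFILE
# (hypothetical helper for illustration, not part of ht://Dig)
# Finds every *.html file under DOCROOT, rewrites the leading DOCROOT
# path to BASE_URL, and writes one URL per line to OUTFILE.
# DOCROOT and BASE_URL are given without a trailing slash.
# OUTFILE must be a regular file, not a directory.
make_url_list() {
    find "$1" -type f -name '*.html' -print |
        sed -e "s|^$1|$2|" > "$3"
}

# When run as a script with three arguments, build the list, e.g.:
#   sh make_url_list.sh /www3 http://www.umesd.k12.or.us \
#       /www3/umesd/searchengine/docs/urls.txt
if [ "$#" -eq 3 ]; then
    make_url_list "$@"
fi
```

With the list written to /www3/umesd/searchengine/docs/urls.txt, the
matching htdig.conf entry would be along the lines of
start_url: `/www3/umesd/searchengine/docs/urls.txt`
(backquotes, not apostrophes).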

Finally, the find command above doesn't go in htdig.conf, but rather in
a separate script that should be run before you run htdig and htmerge,
or rundig, or whatever you use to do an indexing run. If you use a shell
script to do the indexing, you can just add the find and sed commands
above to that script.

> > Question2: Will this script allow me to index directories without
> > providing access to the public?

That actually depends a lot on how you set things up. Normally, htdig
will read the URLs it's given via HTTP requests, so technically those
URLs would be publicly accessible. However, they can be protected from
public access if you set up "Basic" authentication in your web server
and use the -u option to htdig to give the username/password, or the
authorization attribute (http://www.htdig.org/attrs.html#authorization)
in htdig.conf. You can also side-step the HTTP server using local_urls.

Note too that the find command above will find documents that aren't
linked from other documents on your site, so it may make things
accessible from the search engine that would otherwise be hidden from
view by simply following links on your site. This, however, is different
from documents that aren't accessible to the public: documents that are
"hidden" by not linking to them are still accessible if they're under
your DocumentRoot and aren't otherwise protected by web server security
controls. Such documents may be found by some editing of URLs in the
browser's "Location" field. (Several recent news stories about
confidential information being unwittingly "leaked" to the public from
a company's web server hinged on such misguided attempts at securing
information.)

There's also the question of whether you make the resulting index
publicly searchable or not. You can protect the original documents all
you want, but if the index you make from them is wide open to the
public, you're opening up a pretty big peephole into that content.
See http://www.htdig.org/FAQ.html#q4.20

--
Gilles R. Detillieux              E-mail: <[EMAIL PROTECTED]>
Spinal Cord Research Centre       WWW:    http://www.scrc.umanitoba.ca/
Dept. Physiology, U. of Manitoba  Winnipeg, MB  R3E 3J7  (Canada)

