Quoting Douglas Kline ([EMAIL PROTECTED]) on Wed, Apr 27, 2005 at 02:11:25PM -0400:
> > If you want all files indexed within that tree, then you could use some
> > sort of script to dump out a recursive directory listing to a file, then
> > use that file as the source for your Start_URL.
> > 
> > If you only want a subset then that technique might not be practical.
> > > Is there a simple incantation of htdig that would allow me to index a
> > > file tree (not via the web server)?
> > > I have a hodgepodge of files, all text, some with and some without
> > > extensions, that I would like to have full-text search capabilities on.
> > > But I cannot figure out how to get them all indexed. Some are, some
> > > aren't. (I am using a file:// URL as the starting point.)
> > > I am using 3.2.0b6.
> > > It looks like htdig is geared mainly towards web site indexing and I am
> > > trying to bend it too much...
> 
> If this is under Unix, you could use the "find" command to write out all the
> files in a tree.  If you want to select only some of them, you could use the
> options to "find" (more plentiful in the GNU version) or pipe the output
> through a command like grep or sed.  The argument to ht://Dig should be a URL
> or a list of URLs, so somewhere a URL has to be used to address the files.
> Once you have a usable URL, the individual files can be addressed by
> pathnames relative to it.  You could convert the list of files into a list of
> URLs with substitutions executed in the same pipe as the find command.
> 
> If you're not running Unix, there might be equivalent operations you could
> use.
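As a rough illustration of the find-and-substitute pipeline described in the quoted reply, here is a minimal Python sketch (the script and function names are hypothetical) that walks a tree and prints one file:// URL per line, optionally keeping only paths that match a regular expression, the way grep would filter find output. The output file could then serve as the start_url list. ht://Dig's configuration syntax is documented as reading an attribute value from a file named in backquotes, e.g. start_url: `/path/to/urls.txt`, but check the attribute documentation for your version before relying on it.

```python
#!/usr/bin/env python3
"""Print one file:// URL per line for every regular file under a tree.

Hypothetical helper: the script name, arguments and output file are
illustrative assumptions, not part of ht://Dig.
"""
import re
import sys
from pathlib import Path


def file_urls(root, pattern=None):
    """Yield file:// URLs for regular files under root, sorted by path.

    If pattern is given, only paths matching that regular expression are
    kept (roughly what piping find output through grep would do).
    """
    regex = re.compile(pattern) if pattern else None
    for path in sorted(Path(root).resolve().rglob("*")):
        if not path.is_file():
            continue  # skip directories and other non-regular entries
        if regex and not regex.search(str(path)):
            continue
        # as_uri() needs an absolute path and percent-encodes spaces etc.
        yield path.as_uri()


def main(argv):
    if len(argv) not in (2, 3):
        sys.exit("usage: make_url_list.py ROOT [REGEX] > urls.txt")
    root = argv[1]
    pattern = argv[2] if len(argv) == 3 else None
    for url in file_urls(root, pattern):
        print(url)


if __name__ == "__main__":
    main(sys.argv)
```

Usage might look like `python3 make_url_list.py /some/tree > urls.txt`; files without extensions are listed just like any others, since the script does not look at names unless a regex is given.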


Even when I fed it a list of all the files (find | awk > HTML page), htdig
still produced an index that missed lots of keywords. I have no idea why.
A string like "msgrcv" would be found in some files but not in others.
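The find | awk > HTML approach mentioned above can be sketched in Python as well. This hypothetical script writes a single HTML page with a link to every regular file under a tree; start_url would then point at that page (for example a hypothetical file:///path/to/index.html) and htdig would follow the links from there. It only reproduces the link-page step and makes no claim about why keywords were missed.

```python
#!/usr/bin/env python3
"""Build one HTML page that links to every regular file under a tree.

A sketch of the "find | awk > html" idea; the function and script names
are hypothetical.
"""
import html
import sys
from pathlib import Path


def link_page(root):
    """Return an HTML document with one <a href> per regular file under root."""
    base = Path(root).resolve()
    items = []
    for path in sorted(base.rglob("*")):
        if path.is_file():
            # Escape the URL for use inside an attribute, and the label as text.
            url = html.escape(path.as_uri(), quote=True)
            label = html.escape(str(path.relative_to(base)))
            items.append(f'<li><a href="{url}">{label}</a></li>')
    return (
        "<html><head><title>File index</title></head><body>\n<ul>\n"
        + "\n".join(items)
        + "\n</ul>\n</body></html>\n"
    )


if __name__ == "__main__":
    if len(sys.argv) != 2:
        sys.exit("usage: make_link_page.py ROOT > index.html")
    sys.stdout.write(link_page(sys.argv[1]))
```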

Meanwhile, while searching for alternatives, I turned to id-utils, which
does not generate a web-searchable index but indexes all files when simply
pointed at the root of the tree, and can be made to include any text file
with a simple config change.


thx
afx

-- 
atsec information security GmbH                Phone: +49-89-44249830
Steinstrasse 70                                  Fax: +49-89-44249831
D-81667 Muenchen, Germany                        Web: atsec.com
                      May the Source be with you!


_______________________________________________
ht://Dig general mailing list: <[email protected]>
ht://Dig FAQ: http://htdig.sourceforge.net/FAQ.html
List information (subscribe/unsubscribe, etc.)
https://lists.sourceforge.net/lists/listinfo/htdig-general
