> Not sure what you're trying to achieve by having a different startURL
> every day?

well, I think the reason the index.html page generated by the web server got
indexed is that when I first set it up I just used /wotd/data for the
start_url (which indexed that page).  I didn't want to reindex that
page/directory every day because I didn't know whether it would re-index all
of the pages already indexed (the directory gets 1 new file a day), so I
set up this scheme.

it would be useful to have a --start_url in the htdig command line so I
didn't have to modify the .conf file every day.
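
Until such an option exists, the daily edit could be scripted. What follows is only a minimal sketch, assuming Python is available on the server; the names render_conf, write_conf, BASE_URL and the @START_URL@ placeholder are hypothetical, and the template shows just the first few attributes of the sample today.conf (the rest would be appended unchanged).

```python
from pathlib import Path

# Assumption: every daily page lives under this URL, as in the sample config.
BASE_URL = "http://www.arix.com/wotd/data/"

# First attributes of the sample today.conf; only start_url varies per day.
# @START_URL@ is a made-up placeholder, not ht://Dig syntax.
CONF_TEMPLATE = """\
common_dir:     /var/www/html/wotd
database_dir:   ${common_dir}/db
start_url:      @START_URL@
limit_urls_to:  ${start_url}
"""


def render_conf(page_name: str) -> str:
    """Return config text whose start_url points at one page under BASE_URL."""
    return CONF_TEMPLATE.replace("@START_URL@", BASE_URL + page_name)


def write_conf(page_name: str, conf_path: Path) -> Path:
    """Write the rendered config to conf_path and return that path."""
    conf_path.write_text(render_conf(page_name))
    return conf_path
```

Calling write_conf("prince.html", Path("today.conf")) would regenerate the file, after which "htdig -c today.conf; htmerge -c today.conf" can be run as before.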

anyhow.  I deleted everything in db/* and wrote a script to index data/* one
file at a time.  that seems to have fixed the problem but I'd be interested
to know whether there is a simpler, more natural solution.
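
For reference, the one-file-at-a-time script can be sketched roughly as below. This is not the actual script, just an illustration under assumptions: the function names are hypothetical, only *.html files are considered, and the two commands are the ones quoted earlier in this thread.

```python
from pathlib import Path
from typing import Iterable, List


def urls_for(names: Iterable[str], base_url: str) -> List[str]:
    """Map file names to URLs under base_url, in name order."""
    if not base_url.endswith("/"):
        base_url += "/"
    return [base_url + name for name in sorted(names)]


def page_urls(data_dir: Path, base_url: str) -> List[str]:
    """Return one URL per .html file found directly in data_dir."""
    return urls_for((p.name for p in data_dir.glob("*.html")), base_url)


def htdig_commands(conf_path: str) -> List[List[str]]:
    """The two commands from the thread, as argument lists for subprocess.run."""
    return [["htdig", "-c", conf_path], ["htmerge", "-c", conf_path]]
```

Each URL from page_urls() would become the start_url of a freshly written config, followed by the two commands from htdig_commands().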

-----Original Message-----
From: [EMAIL PROTECTED]
[mailto:[EMAIL PROTECTED]] On Behalf Of Mike Holderness
Sent: Friday, March 19, 2004 5:01 AM
To: [EMAIL PROTECTED]
Cc: [EMAIL PROTECTED]
Subject: [htdig] Re: how to suppress indexing /


In-Reply-To: <[EMAIL PROTECTED]>
"Erick Calder" <[EMAIL PROTECTED]> wrote Thu, 18 Mar 2004 12:35:07 -0800:

> I publish an index of the word-of-the-day from yourdictionary.com which may
> be found at: http://www.arix.com/wotd/
>
> I create the index by grabbing the daily WOTD and writing a .html file into
> /var/www/html/wotd/data.  I create a config file (today.conf) to index the
> new file and call "htdig -c today.conf; htmerge -c today.conf" - a sample
> config file is included below.

Not sure what you're trying to achieve by having a different startURL
every day?


> my question is: when I search for a word I get a bunch of hits like:
>
>       Index of /wotd/data
>
> try it yourself by searching for "prince".  why is this and how can I
> suppress it?

In your shoes, I'd first test the idea that one of the pages reachable
through links from your startURL contains a link to the file index,
or that one of the CGI links returns this when called by htdig...
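
One way to run that test, sketched here with Python's standard library only: collect the links on a page and report any that resolve to the directory itself. The function and class names are hypothetical, and the page's HTML is assumed to have been fetched separately.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin, urlsplit


class LinkCollector(HTMLParser):
    """Collect absolute URLs from every <a href=...> on one page."""

    def __init__(self, page_url):
        super().__init__()
        self.page_url = page_url
        self.links = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            for name, value in attrs:
                if name == "href" and value:
                    # Resolve relative links against the page's own URL.
                    self.links.append(urljoin(self.page_url, value))


def _without_query(url):
    """Drop query string, fragment and trailing slash for comparison."""
    parts = urlsplit(url)
    return f"{parts.scheme}://{parts.netloc}{parts.path}".rstrip("/")


def links_to_directory(page_url, html, directory_url):
    """Return the links in html that point at directory_url itself."""
    collector = LinkCollector(page_url)
    collector.feed(html)
    target = _without_query(directory_url)
    return [u for u in collector.links if _without_query(u) == target]
```

A non-empty result for any indexed page would suggest how the directory listing is being reached; an empty result for all of them would point elsewhere.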

Mike

> --- today.conf ---
>
> common_dir:     /var/www/html/wotd
> database_dir:   ${common_dir}/db
> start_url:              http://www.arix.com/wotd/data/prince.html
> limit_urls_to:  ${start_url}
> max_head_length:        10000
> max_doc_size:           200000
> maintainer:             [EMAIL PROTECTED]
> no_excerpt_show_top:    true
> excerpt_length: 300
> template_map:   Long long ${common_dir}/long.html
> template_name:  long
> search_algorithm:       exact:1 synonyms:0.5 endings:0.1
> search_results_header: ${common_dir}/header.html
> search_results_footer: ${common_dir}/footer.html
> nothing_found_file: ${common_dir}/nichts.html
>
>
>
> --__--__--
>
> Message: 5
> Date: Thu, 18 Mar 2004 14:58:50 -0600
> From: "Wendt, Trevor" <[EMAIL PROTECTED]>
> To: <[EMAIL PROTECTED]>
> Subject: [htdig] Links to specific URL?
>
> Does anyone know if it is possible to pull a specific URL and all the
> pages linked to that URL out of the DB? (through command line or web)
>
> Similar in functionality to the link tool Google provides
> http://www.google.com/help/operators.html#link (I would use it but we
> use HTDig internally).
>
> I vaguely remember a question similar to this but have not been able to
> find it in the FAQ or mailing list archives.
>
> Let me know, thanks!
>
>
> --__--__--
>
> Message: 6
> To: [EMAIL PROTECTED]
> Date: Thu, 18 Mar 2004 19:17:01 -0500
> From: Douglas Kline <[EMAIL PROTECTED]>
> Subject: [htdig] Link Lines to Find Javascript References
>
>
> We have been using lines with LINK references in order to get pages which are
> accessed via Javascript pull-down menus into the ht-Dig database with version
> 3.1.5.  With the new version 3.2.0b5 that doesn't seem to work any more.  The
> pages aren't being indexed.  What has changed in this regard?  TIA.
>
> Douglas
>
> ========
> Douglas Kline
> [EMAIL PROTECTED]
>
>
>
>
>
> --__--__--
>
> _______________________________________________
> ht://Dig general mailing list: <[EMAIL PROTECTED]>
> ht://Dig FAQ: http://htdig.sourceforge.net/FAQ.html
> List information (subscribe/unsubscribe, etc.)
> https://lists.sourceforge.net/lists/listinfo/htdig-general
>
> End of htdig-general Digest



-------------------------------------------------------
This SF.Net email is sponsored by: IBM Linux Tutorials
Free Linux tutorial presented by Daniel Robbins, President and CEO of
GenToo technologies. Learn everything from fundamentals to system
administration.http://ads.osdn.com/?ad_id=1470&alloc_id=3638&op=click
_______________________________________________
ht://Dig general mailing list: <[EMAIL PROTECTED]>
ht://Dig FAQ: http://htdig.sourceforge.net/FAQ.html
List information (subscribe/unsubscribe, etc.)
https://lists.sourceforge.net/lists/listinfo/htdig-general


