Hi,
    I have just installed htdig and overall it is working beautifully.  My setup is as follows:
 
Most of the documents I want to search are Word docs.  They are organized in directories, and there are essentially no HTML files in those directories for htdig to follow links from.  I have turned on Apache directory indexes, though, so htdig can navigate that way, and it does seem able to recurse through all the directories and find all of the docs.  I have installed doc2html.pl and catdoc to support searching the Word docs, and that also seems to work fine.
 
The problem is that when I do a search for a phrase (like 'oracle failover') I get the following hits:
 
 

Documents 1 - 10 of 40 matches. More *'s indicate a better match.

Index of /R2P3_Release/4_Failure_Failover_and_Recovery_Procedures/4.2_Oracle****
s is a test Name Last modified Size Description [DIR] Parent Directory 11-Sep-2002 13:44 - [ ] Oracle_Service_Migration.doc 04-Oct-2002 14:45 127k [ ] Reintroduction_Oracle_and_Weblogic_Node_after_Failure.doc 27-Sep-2002 13:49 110k Apache/1.3.27 Server at dcadm00 Port 80
http://dcadm00/R2P3_Release/4_Failure_Failover_and_Recovery_Procedures/4.2_Oracle/ 12/10/02, 1084 bytes
Index of /R2P3_Release/4_Failure_Failover_and_Recovery_Procedures/4.2_Oracle****
s is a test Name Last modified Size Description [DIR] Parent Directory 11-Sep-2002 13:44 - [ ] Oracle_Service_Migration.doc 04-Oct-2002 14:45 127k [ ] Reintroduction_Oracle_and_Weblogic_Node_after_Failure.doc 27-Sep-2002 13:49 110k Apache/1.3.27 Server at dcadm00 Port 80
http://dcadm00/R2P3_Release/4_Failure_Failover_and_Recovery_Procedures/4.2_Oracle/?D=A 12/10/02, 1084 bytes
Index of /R2P3_Release/4_Failure_Failover_and_Recovery_Procedures/4.2_Oracle****
s is a test Name Last modified Size Description [DIR] Parent Directory 11-Sep-2002 13:44 - [ ] Reintroduction_Oracle_and_Weblogic_Node_after_Failure.doc 27-Sep-2002 13:49 110k [ ] Oracle_Service_Migration.doc 04-Oct-2002 14:45 127k Apache/1.3.27 Server at dcadm00 Port 80
http://dcadm00/R2P3_Release/4_Failure_Failover_and_Recovery_Procedures/4.2_Oracle/?S=A 12/10/02, 1084 bytes
Index of /R2P3_Release/4_Failure_Failover_and_Recovery_Procedures/4.2_Oracle****
s is a test Name Last modified Size Description [DIR] Parent Directory 11-Sep-2002 13:44 - [ ] Reintroduction_Oracle_and_Weblogic_Node_after_Failure.doc 27-Sep-2002 13:49 110k [ ] Oracle_Service_Migration.doc 04-Oct-2002 14:45 127k Apache/1.3.27 Server at dcadm00 Port 80
http://dcadm00/R2P3_Release/4_Failure_Failover_and_Recovery_Procedures/4.2_Oracle/?M=A 12/10/02, 1084 bytes
 
 
 
 
Notice that it seems to be giving me multiple hits for the same directory where the relevant doc is, each with a query string such as ?D=A, ?S=A or ?M=A appended.  It does return the actual doc further down the list:
 
[Oracle_Service_Migration.doc]***
... Related Documents (daily health check procedure) Description This migration is tested to work in the following circumstances; Both Nodes up and Oracle running healthily on one of the nodes Oracle Node down and surviving node healthy Both Nodes up and Oracle not running on either server Operating System ...
http://dcadm00/R2P3_Release/4_Failure_Failover_and_Recovery_Procedures/4.2_Oracle/Oracle_Service_Migration.doc 10/04/02, 129536 bytes
What is causing these multiple hits of the parent directory that are 'almost' identical to each other?
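
To make the duplication concrete, here is a minimal Python sketch (my own illustration, not part of htdig) that groups the four result URLs above by everything before the query string. The helper names strip_query and group_by_page are hypothetical. My guess, and it is only an assumption, is that the ?D=A, ?S=A and ?M=A suffixes are the column-sort links that Apache's directory index puts on its header row, so every variant is the same listing in a different order.

```python
from collections import defaultdict
from urllib.parse import urlsplit, urlunsplit

# Directory URL copied from the search results above.
BASE = ("http://dcadm00/R2P3_Release/"
        "4_Failure_Failover_and_Recovery_Procedures/4.2_Oracle/")

# The four hits htdig returned for the same directory.
hits = [
    BASE,
    BASE + "?D=A",
    BASE + "?S=A",
    BASE + "?M=A",
]

def strip_query(url):
    """Return the URL with its query string and fragment removed."""
    parts = urlsplit(url)
    return urlunsplit((parts.scheme, parts.netloc, parts.path, "", ""))

def group_by_page(urls):
    """Map each query-less URL to the list of indexed variants of it."""
    groups = defaultdict(list)
    for url in urls:
        groups[strip_query(url)].append(url)
    return dict(groups)

groups = group_by_page(hits)
for page, variants in groups.items():
    print(f"{page}: {len(variants)} indexed variants")
```

All four hits collapse to a single directory URL, which is why the excerpts are nearly identical: only the order of the listed files changes.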
 
 
Thanks,
 

Cliff Meece
BP NextGen
Chicago Cantera 1
Mobile:  +1 (630) 640-4373
Office:  +1 (630) 836-5058
E-Mail:  [EMAIL PROTECTED]

 
