According to StR:
> I have an list.php that generates a list of all sites in my domain
> 
> like this
> 
> <a href="dir1/files.php"> Dir 1</a>
> <a href="dir2/files.php"> Dir 2</a>
> <a href="dir3/files.php"> Dir 3</a>
> 
> and dir1/files.php looks like:
> 
> <a href="dir1/file1.php"> 1 File 1 </a>
> <a href="dir1/file2.php"> 1 File 2 </a>
> <a href="dir1/file3.php"> 1 File 3 </a>
> <a href="dir1/file4.php"> 1 File 4 </a>
> ...

Well, first of all, if you have a relative href like dir1/file1.php inside
the file dir1/files.php, then the web client (htdig, or a web browser)
will resolve it relative to that file's directory, giving the URL
dir1/dir1/file1.php.  Is that what you want?  If not, i.e. if file1.php
is at the same level as files.php in dir1, then the hrefs within
dir1/files.php shouldn't also contain the "dir1/" portion of the path.
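
To see the resolution rule in action, here is a minimal Python sketch
using the standard urllib.parse.urljoin.  The host name
www.example.com is made up for illustration; only the paths come from
your message.

```python
from urllib.parse import urljoin

# Hypothetical host; only the paths are taken from the original message.
BASE = "http://www.example.com/dir1/files.php"


def resolve(href, base=BASE):
    """Resolve a relative href against the URL of the page containing it."""
    return urljoin(base, href)


# href written with the directory prefix, as in the quoted files.php
doubled = resolve("dir1/file1.php")

# href written without the prefix, for a file next to files.php
sibling = resolve("file1.php")

print(doubled)
print(sibling)
```

The first result ends in dir1/dir1/file1.php and the second in
dir1/file1.php, so only the second form points at a file that sits
beside files.php.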

> File 1, 2,3,4... are the files i want htdig to index..
> 
> each dir has like 500 links
> 
> but if i search a word of dir2.. it only finds me matches from file1 to 
> file...250... from 251 to 500 it does not find them... 
> 
> and if i search words of the files in dir3 it does find them.. .why is that?
> 
> Thanks every1...
> 
> PD: max_head_length:        10000
>       max_doc_size:           2000000

Well, this behaviour is certainly consistent with document truncation,
as described in http://www.htdig.org/FAQ.html#q5.1 .  However, at
approximately 40 bytes per link line in dir1/files.php, 500 links
would only come to about 20 KB, plus overhead, for that file.  Even if
htdig weren't picking up your max_doc_size setting above, for whatever
reason, the default value should still be large enough for a file of
that size.
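
The back-of-the-envelope estimate above, written out as a small Python
sketch.  The 40 bytes per link is only a rough guess, and the default
of 100000 bytes for max_doc_size is my recollection rather than
something stated in this thread, so check it against the documentation
for your htdig version.

```python
BYTES_PER_LINK = 40            # rough guess at one <a href=...> line
LINKS_PER_PAGE = 500           # "each dir has like 500 links"
CONFIGURED_MAX = 2_000_000     # max_doc_size quoted in the message
ASSUMED_DEFAULT_MAX = 100_000  # assumption: verify for your htdig version


def fits(size, limit):
    """True if a document of `size` bytes is within `limit` bytes."""
    return size <= limit


estimated = BYTES_PER_LINK * LINKS_PER_PAGE

print("estimated page size:", estimated, "bytes")
print("fits configured limit:", fits(estimated, CONFIGURED_MAX))
print("fits assumed default:", fits(estimated, ASSUMED_DEFAULT_MAX))
```

If the real page is much larger than this estimate, that points to
extra markup in the generated list rather than to the link lines
themselves.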

A couple of other things to look into:

- point your web browser to the dir1/files.php page, and do a View File
Info to see the total size.  It may be that there's a lot of "padding"
in it, making it bigger than it should be.

- have a close look at the max_doc_size setting in your htdig.conf,
perhaps using "cat -v", to make sure there isn't some control character
or something that slipped into the value for it.

If neither of these provides any relief, try running htdig -vvv to see
what htdig sees when it indexes one of these URL list files.
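
If "cat -v" isn't handy, the same kind of check on the max_doc_size
line can be sketched in Python.  This is only an illustration: the
function names are made up, and the sample text (with a stray carriage
return) is invented to show what a hit looks like, not taken from your
configuration.

```python
def find_odd_bytes(data):
    """Return (offset, byte value) pairs for bytes that cat -v would
    show in ^X or M- notation: control characters other than TAB and
    newline, DEL, and anything above 7-bit ASCII."""
    return [
        (i, b)
        for i, b in enumerate(data)
        if (b < 32 and b not in (9, 10)) or b >= 127
    ]


def check_attribute(conf_bytes, name=b"max_doc_size"):
    """Return (line number, raw line, odd bytes) for each line of the
    config text that starts with the attribute `name`."""
    results = []
    for lineno, line in enumerate(conf_bytes.split(b"\n"), start=1):
        if line.lstrip().startswith(name):
            results.append((lineno, line, find_odd_bytes(line)))
    return results


# Invented sample: the second line ends with a carriage return.
sample = b"max_head_length: 10000\nmax_doc_size:\t2000000\r\n"

for lineno, line, odd in check_attribute(sample):
    print(lineno, line, odd)
```

To run it on a real file, read the file in binary mode and pass the
bytes in, e.g. check_attribute(open("htdig.conf", "rb").read()), with
the path adjusted to wherever your configuration lives.  An empty list
of odd bytes means the line is clean as far as this check goes.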

-- 
Gilles R. Detillieux              E-mail: <[EMAIL PROTECTED]>
Spinal Cord Research Centre       WWW:    http://www.scrc.umanitoba.ca/
Dept. Physiology, U. of Manitoba  Winnipeg, MB  R3E 3J7  (Canada)
