[aseek-users] Not able to limit found documents to one directory

KEVIN ZEMBOWER Wed, 15 Jan 2003 14:06:06 -0800

I'm still not able to limit the found documents to ones in a particular directory. 
Could anyone please take a look at this and let me know if I'm making a dumb mistake? 
Or, if anyone's got this working, could you paste in your examples?


URL of search page which returns more than 400 documents containing 'advocacy':
http://www.jhuccp.org/cgi-bin/s.cgi?q=advocacy&cs=&ps=20&o=0 

URL of search page which should limit returns to pages containing 'advocacy' in the 
/popreporter/ directory only. However, this returns the same number of documents as 
the first query, including many that are not in the /popreporter/ directory:
http://www.jhuccp.org/cgi-bin/s.cgi?q=advocacy&cs=&ps=20&o=0&ul=http://www.jhuccp.org/popreporter/%
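
One thing that may be worth ruling out: the URL above ends in a bare '%', which is not a valid percent-escape in a query string, so the 'ul' value might not reach the search daemon intact. Whether this is the actual cause is an assumption, not something confirmed here. A minimal Python sketch of building the same query with the mask percent-encoded (the helper name build_search_url is hypothetical; the parameter names are the ones from the URL above):

```python
from urllib.parse import urlencode, urlsplit, parse_qs

# Search CGI from the post; everything else here is illustrative.
BASE = "http://www.jhuccp.org/cgi-bin/s.cgi"

def build_search_url(query, subset_mask, page_size=20, offset=0):
    """Build the s.cgi URL with every parameter percent-encoded.

    urlencode() escapes ':' and '/' and turns the trailing '%'
    wildcard into '%25', so the CGI decodes it back to a literal '%'.
    """
    params = [
        ("q", query),
        ("cs", ""),
        ("ps", str(page_size)),
        ("o", str(offset)),
        ("ul", subset_mask),
    ]
    return BASE + "?" + urlencode(params)

url = build_search_url("advocacy", "http://www.jhuccp.org/popreporter/%")
print(url)

# Round trip: decoding the query string gives back the original mask.
decoded = parse_qs(urlsplit(url).query, keep_blank_values=True)
print(decoded["ul"][0])
```

If the encoded form behaves differently from the raw one, that would point at URL decoding rather than at the subsets table.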
 

Entry in subsets:
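mysql> select * from subsets;
+-----------+-------------------------------------+
| subset_id | mask                                |
+-----------+-------------------------------------+
|         1 | http://www.jhuccp.org/popreporter/% |
+-----------+-------------------------------------+
1 row in set (0.01 sec)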
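
For reference, this is how I understand the mask to be applied, assuming it follows SQL LIKE semantics ('%' matches any run of characters, '_' matches exactly one). That is an assumption about ASPseek, not something taken from its source. A rough Python sketch, with hypothetical example URLs and a case-sensitive match for simplicity:

```python
import re

MASK = "http://www.jhuccp.org/popreporter/%"  # mask from the subsets table

def like_to_regex(mask):
    """Translate a SQL LIKE pattern into a compiled regular expression."""
    parts = []
    for ch in mask:
        if ch == "%":
            parts.append(".*")   # any run of characters, including none
        elif ch == "_":
            parts.append(".")    # exactly one character
        else:
            parts.append(re.escape(ch))
    return re.compile("".join(parts), re.DOTALL)

def matches_mask(url, mask=MASK):
    """Return True if the whole URL matches the LIKE-style mask."""
    return like_to_regex(mask).fullmatch(url) is not None

# Hypothetical URLs, for illustration only.
for candidate in (
    "http://www.jhuccp.org/popreporter/index.html",
    "http://www.jhuccp.org/pubs/index.html",
):
    print(candidate, matches_mask(candidate))
```

Under that reading, only URLs beginning with http://www.jhuccp.org/popreporter/ should be in the subset, which is consistent with the 97 URLs reported by index -B below.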

Results of generating subsets and spaces:
aspseek@www:~$ sbin/index -B
Loading configuration from /usr/local/aspseek/etc/db.conf
Loading configuration from /usr/local/aspseek/etc/ucharset.conf
Loading configuration from /usr/local/aspseek/etc/stopwords.conf
Loading configuration from /usr/local/aspseek/etc/aspseek.conf
Generating subset http://www.jhuccp.org/popreporter/% ... done (97 URLs)
index process finished.

Entry in var/dlog.log after running query:
Subset http://www.jhuccp.org/ not found

Entry in var/aspseek12/logs.txt, after last reindex:
aspseek@www:~$ tail /usr/local/aspseek/var/aspseek12/logs.txt 
     Sec Count    Ch   Ch1   Ch2   New       Size       HQ  Hr hits HR lost  W hit W miss  W ins
 100.033   607    42   580    77     1   47365422    30162    28155    2007 203609  33367      2
 100.022   925    41   862    90     0   61748911    46290    45258    1032 328313  27801      6
 100.023  1513    27  1461   112     0   34825946    79586    78797     789 617709  30690      2
New indexing session started at: 1042650902
Got next   5018 URLs for:   0.104 seconds. Queued docs:  5018.Time 0-1042650902.
New indexing session started at: 1042667810
aspseek@www:~$ 

File aspseek.conf:
aspseek@www:~$ grep -v '^[[:space:]]*$' etc/aspseek.conf |grep -v "^#"
Include db.conf
Include ucharset.conf
Include stopwords.conf
Converter application/pdf text/html /usr/local/bin/pdftohtml -i -noframes -stdout $in > $out
Converter application/msword text/plain /usr/local/bin/antiword $in > $out
DeleteNoServer no
Server  http://www.jhuccp.org/
DeltaBufferSize 64
Disallow /cgi-bin/ \.cgi /nph
Disallow \.tif$  \.au$   \.mov$  \.jpe$  \.cur$  \.qt$
Disallow \.b$    \.sh$   \.md5$   \.rpm$
Disallow \.arj$  \.tar$  \.zip$  \.tgz$  \.gz$
Disallow \.lha$  \.lzh$  \.tar\.Z$  \.rar$  \.zoo$
Disallow \.gif$  \.jpg$  \.jpeg$ \.bmp$  \.tiff$ \.xpm$ \.xbm$
Disallow \.vdo$  \.mpeg$ \.mpe$  \.mpg$  \.avi$  \.movie$
Disallow \.mid$  \.mp3$  \.rm$   \.ram$  \.wav$  \.aiff$ \.ra$
Disallow \.vrml$ \.wrl$  \.png$
Disallow \.exe$  \.cab$  \.dll$  \.bin$  \.class$
Disallow \.tex$  \.texi$ \.xls$  \.texinfo$
Disallow \.rtf$  \.cdf$  \.ps$
Disallow \.ai$   \.eps$  \.ppt$  \.hqx$
Disallow \.cpt$  \.bms$  \.oda$  \.tcl$
Disallow \.o$ \.a$ \.la$ \.so$ \.so\.[0-9]$
Disallow \.pat$ \.pm$ \.m4$ \.am$
Disallow \?D=A$ \?D=A$ \?D=D$ \?M=A$ \?M=D$ \?N=A$ \?N=D$ \?S=A$ \?S=D$
Disallow [^:]//
Disallow mmc/.*\.php
Disallow PHPTEST
aspseek@www:~$ 

Thanks for taking the time to look at this and for your thoughts and suggestions.

-Kevin Zembower

-----
E. Kevin Zembower
Unix Administrator
Johns Hopkins University/Center for Communications Programs
111 Market Place, Suite 310
Baltimore, MD  21202
410-659-6139
