I'm still not able to limit the documents found to those in a particular directory. Could anyone please take a look at this and let me know if I'm making a dumb mistake? Or, if anyone's got this working, could you paste in your examples?
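
One detail that may be worth ruling out in the restricted query URL below: the `ul` value ends in a literal `%`, which is also the escape character inside URLs. The following is a minimal Python sketch, not part of ASPseek, showing what the same query looks like when the `ul` value is percent-encoded. The helper name `build_search_url` is hypothetical, the other parameters are copied unchanged from the URLs in this message, and whether s.cgi actually requires the encoded form is an assumption that would need to be tested.

```python
from urllib.parse import urlencode, parse_qs, urlsplit

# Search CGI taken from the URLs in this message.
BASE = "http://www.jhuccp.org/cgi-bin/s.cgi"


def build_search_url(query, url_limit=None):
    """Hypothetical helper: compose an s.cgi URL with every value percent-encoded."""
    # cs, ps and o are copied as-is from the original query string.
    params = [("q", query), ("cs", ""), ("ps", "20"), ("o", "0")]
    if url_limit is not None:
        params.append(("ul", url_limit))
    return BASE + "?" + urlencode(params)


unrestricted = build_search_url("advocacy")
restricted = build_search_url("advocacy", "http://www.jhuccp.org/popreporter/%")

# Decode the query string again to confirm the mask survives the round trip.
decoded_mask = parse_qs(urlsplit(restricted).query)["ul"][0]

print(unrestricted)
print(restricted)
print(decoded_mask)
```

In the encoded form the trailing `%` of the mask becomes `%25`, so after decoding the value matches the `mask` column of the subsets table exactly.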

URL of search page which returns more than 400 documents containing 'advocacy':

http://www.jhuccp.org/cgi-bin/s.cgi?q=advocacy&cs=&ps=20&o=0

URL of search page which should limit returns to pages containing 'advocacy' in the /popreporter/ directory only. However, this returns the same number of documents as the first query, most of them not in the /popreporter/ directory:

http://www.jhuccp.org/cgi-bin/s.cgi?q=advocacy&cs=&ps=20&o=0&ul=http://www.jhuccp.org/popreporter/%

Entry in subsets:

mysql> select * from subsets;
+-----------+-------------------------------------+
| subset_id | mask                                |
+-----------+-------------------------------------+
|         1 | http://www.jhuccp.org/popreporter/% |
+-----------+-------------------------------------+
1 row in set (0.01 sec)

Results of generating subsets and spaces:

aspseek@www:~$ sbin/index -B
Loading configuration from /usr/local/aspseek/etc/db.conf
Loading configuration from /usr/local/aspseek/etc/ucharset.conf
Loading configuration from /usr/local/aspseek/etc/stopwords.conf
Loading configuration from /usr/local/aspseek/etc/aspseek.conf
Generating subset http://www.jhuccp.org/popreporter/% ... done (97 URLs)
index process finished.

Entry in var/dlog.log after running query:

Subset http://www.jhuccp.org/ not found

Entry in var/aspseek12/logs.txt, after last reindex:

aspseek@www:~$ tail /usr/local/aspseek/var/aspseek12/logs.txt
Sec      Count  Ch  Ch1   Ch2  New  Size      HQ     Hr hits  HR lost  W hit   W miss  W ins
100.033  607    42  580   77   1    47365422  30162  28155    2007     203609  33367   2
100.022  925    41  862   90   0    61748911  46290  45258    1032     328313  27801   6
100.023  1513   27  1461  112  0    34825946  79586  78797    789      617709  30690   2
New indexing session started at: 1042650902
Got next 5018 URLs for: 0.104 seconds. Queued docs: 5018.Time 0-1042650902.
New indexing session started at: 1042667810
aspseek@www:~$

File aspseek.conf:

aspseek@www:~$ grep -v '^[[:space:]]*$' etc/aspseek.conf |grep -v "^#"
Include db.conf
Include ucharset.conf
Include stopwords.conf
Converter application/pdf text/html /usr/local/bin/pdftohtml -i -noframes -stdout $in > $out
Converter application/msword text/plain /usr/local/bin/antiword $in > $out
DeleteNoServer no
Server http://www.jhuccp.org/
DeltaBufferSize 64
Disallow /cgi-bin/ \.cgi /nph
Disallow \.tif$ \.au$ \.mov$ \.jpe$ \.cur$ \.qt$
Disallow \.b$ \.sh$ \.md5$ \.rpm$
Disallow \.arj$ \.tar$ \.zip$ \.tgz$ \.gz$
Disallow \.lha$ \.lzh$ \.tar\.Z$ \.rar$ \.zoo$
Disallow \.gif$ \.jpg$ \.jpeg$ \.bmp$ \.tiff$ \.xpm$ \.xbm$
Disallow \.vdo$ \.mpeg$ \.mpe$ \.mpg$ \.avi$ \.movie$
Disallow \.mid$ \.mp3$ \.rm$ \.ram$ \.wav$ \.aiff$ \.ra$
Disallow \.vrml$ \.wrl$ \.png$
Disallow \.exe$ \.cab$ \.dll$ \.bin$ \.class$
Disallow \.tex$ \.texi$ \.xls$ \.texinfo$
Disallow \.rtf$ \.cdf$ \.ps$
Disallow \.ai$ \.eps$ \.ppt$ \.hqx$
Disallow \.cpt$ \.bms$ \.oda$ \.tcl$
Disallow \.o$ \.a$ \.la$ \.so$ \.so\.[0-9]$
Disallow \.pat$ \.pm$ \.m4$ \.am$
Disallow \?D=A$ \?D=A$ \?D=D$ \?M=A$ \?M=D$ \?N=A$ \?N=D$ \?S=A$ \?S=D$
Disallow [^:]//
Disallow mmc/.*\.php
Disallow PHPTEST
aspseek@www:~$

Thanks for taking the time to look at this and for your thoughts and suggestions.

-Kevin Zembower

-----
E. Kevin Zembower
Unix Administrator
Johns Hopkins University/Center for Communications Programs
111 Market Place, Suite 310
Baltimore, MD 21202
410-659-6139
