I'm trying to restrict search results to documents in a particular directory. Our 
aspseek search engine is at http://www.jhuccp.org/cgi-bin/s.cgi. 

If you enter a search term like 'advocacy', you should get about 424 documents back. 
For that search, aspseek uses this URL: 
http://www.jhuccp.org/cgi-bin/s.cgi?q=advocacy&cs=&ps=20&o=0

We want to limit the results to documents in the /popreporter/ directory that contain 
'advocacy'. To do this, I created this record in MySQL: 
www:/usr/local/aspseek/etc# mysql -u aspseek12 -p aspseek12 
mysql> select * from subsets; 
+-----------+-------------------------------------+
| subset_id | mask                                |
+-----------+-------------------------------------+
|         2 | http://www.jhuccp.org/popreporter/% |
+-----------+-------------------------------------+

When I run index -B, I get: 
www:/usr/local/aspseek/etc# su - -s /bin/bash aspseek 
aspseek@www:~$ sbin/index -B 
Loading configuration from /usr/local/aspseek/etc/db.conf 
Loading configuration from /usr/local/aspseek/etc/ucharset.conf 
Loading configuration from /usr/local/aspseek/etc/stopwords.conf 
Loading configuration from /usr/local/aspseek/etc/aspseek.conf 
Generating subset http://www.jhuccp.org/popreporter/% ... done (96 URLs) 
index process finished. 
aspseek@www:~$ 

This seems to indicate that I've got the subset set up correctly. 

Then, to test this, I manually edit the URL in the browser's location box to: 
http://www.jhuccp.org/cgi-bin/s.cgi?q=advocacy&cs=&ps=20&o=0&ul=http://www.jhuccp.org/popreporter/%

When I submit it, it returns the same 424 documents as before; the results are not 
restricted to the /popreporter/ directory. 

I've also tried variations on this, such as putting the URL in quotes or just using 
'/popreporter/'. Still no joy.
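
For reference, a literal '%' in a URL normally has to be escaped as %25, because '%' 
introduces an escape sequence. Below is a minimal Python sketch that builds the same 
query with that escaping applied. Whether s.cgi expects a wildcard in the ul parameter 
at all is an assumption I can't confirm, and build_search_url is only a hypothetical 
helper name for illustration.

```python
from urllib.parse import urlencode

# Search CGI from this post; the parameter names (q, cs, ps, o, ul) are the
# ones that appear in the URLs above.
BASE = "http://www.jhuccp.org/cgi-bin/s.cgi"


def build_search_url(query, limit_mask=None):
    """Hypothetical helper: build the search URL with every value percent-encoded."""
    params = [("q", query), ("cs", ""), ("ps", "20"), ("o", "0")]
    if limit_mask is not None:
        # Assumption: ul takes a URL mask; urlencode escapes the trailing '%' as %25.
        params.append(("ul", limit_mask))
    return BASE + "?" + urlencode(params)


url = build_search_url("advocacy", "http://www.jhuccp.org/popreporter/%")
print(url)
```

The printed URL can be pasted into the browser's location box to compare against the 
hand-edited one.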

I've read in some of the posts to this list that the subset should be set up without 
the '%', so I also tried that:
aspseek@www:~$ mysql -u aspseek12 -p aspseek12
Enter password: 
mysql> select * from subsets;
+-----------+------------------------------------+
| subset_id | mask                               |
+-----------+------------------------------------+
|         1 | http://www.jhuccp.org/popreporter/ |
+-----------+------------------------------------+
1 row in set (0.00 sec)

Then I run:
aspseek@www:~$ sbin/index -a -m -u "http://www.jhuccp.org/popreporter/%";
Loading configuration from /usr/local/aspseek/etc/db.conf
Loading configuration from /usr/local/aspseek/etc/ucharset.conf
Loading configuration from /usr/local/aspseek/etc/stopwords.conf
Loading configuration from /usr/local/aspseek/etc/aspseek.conf
Adding URL: http://www.jhuccp.org/popreporter/current.shtml
Adding URL: http://www.jhuccp.org/popreporter/subscribe.shtml
Adding URL: http://www.jhuccp.org/popreporter/index.shtml
Adding URL: http://www.jhuccp.org/popreporter/2002/02-25.shtml
<snip>
Adding URL: http://www.jhuccp.org/popreporter/2001/06-11.shtml
Adding URL: http://www.jhuccp.org/popreporter/2001/06-04.shtml
Saving real-time database ... done.
Saving delta files [..................................................] done.
Deleting 'deleted' records from urlword[s] ... done. (0 records deleted)
Saving real-time ... done
Saving redirects ... done
Splitting href delta file ... done
Saving href delta files ... done
Saving direct href delta files ... done
Calculating ranks  [................................................] done.
Saving lastmods ... done
Generating word site ... done
Generating subset http://www.jhuccp.org/popreporter/ ... done (0 URLs)
index process finished.
aspseek@www:~$ 

The dlog.log says, "Subset http://www.jhuccp.org/ not found". Yet the index output 
above shows plenty of URLs being added under /popreporter/.

Could someone please set me straight on how this should work? Thank you very much for 
your help. 

-Kevin Zembower 
