On Thu, 13 Jun 2002, Alexander F Avdonkin wrote:
> Gregory Kozlovsky wrote:
>
> > All right,
> >
> > I typed in mysql:
> > select url,hops from urlword where
> > url='http://www.washtimes.com/entertainment/';
> > and got
> > http://www.washtimes.com/entertainment/ | 2 |
> > Now, I indexed with the index -o command-line argument, which,
> > according to the description, means:
> > -o Index documents with less hops first. Here "hops" means the "depth"
> > value of the document.
> > If this were true, this document should have level 1, because it is
> > linked to the front page.
> > Here is the reason why it has level 2, found with index -P:
> >
> > 2 200 http://www.washtimes.com/entertainment/
> > 1 301 http://www.washtimes.com/entertainment
> > 0 200 http://www.washtimes.com/
> >
> > Apparently, the redirection from the URL without the trailing slash to
> > the one with the slash is not recognized by ASPSeek as a special case.
> > Is there any way around it?
> >
>
> Unfortunately, there is no way to work around it; you can only set
> MaxHops to 3. Or you can modify the sources so that a redirect does not
> increase the hop value. See file parse.cpp, method
> CUrl::HTTPGetUrlAndStore, and fix the line
> hrID = wordCache.GetHref(str, CurSrv, doc.m_urlID, doc.m_hops + 1, srv);
Actually, I have a patch for this problem, which adds two new config
parameters (config excerpt below):

#######################################################################
#IncrementHopsOnRedirect yes/no
# Allow/disallow index to increment the hops value when redirects are
# encountered. Applies only to redirects generated by Location headers.
# ***** SURGEON GENERAL'S WARNING *****
# This option can be harmful, as it negates the indexer's built-in ability
# to be self-limiting when a redirect loop is encountered.
# Please ensure that RedirectLoopLimit is set to a reasonable value to
# enable recovery from entry into a redirect loop.
# ***** SURGEON GENERAL'S WARNING *****
# This option does, however, allow a greater number of documents to be
# indexed for sites that redirect frequently (e.g. for cookie testing,
# typically on each page). In a test (with MaxHops 4) on such a site, the
# number of documents actually indexed increased from 34 to 756.
# Can be set multiple times before a "Server" command; takes effect until
# the end of the config file or until the next IncrementHopsOnRedirect
# command.
# Default value is "yes".
IncrementHopsOnRedirect no
#######################################################################
#RedirectLoopLimit <number>
# Maximum allowable number of contiguous redirects.
# Default value is 8.
# Can be set multiple times before a "Server" command; takes effect until
# the end of the config file or until the next RedirectLoopLimit command.
RedirectLoopLimit 16

If there is interest, I'll send it through.

Matt.

> >
> > Gregory
> >
> > -----Original Message-----
> > From: Alexander F Avdonkin [mailto:[EMAIL PROTECTED]]
> > Sent: Thursday, 13 June 2002 16:34
> > To: [EMAIL PROTECTED]
> > Subject: Re: [aseek-users] Indexing with MaxHops
> >
> > You said that the page is reachable in 2 clicks.
> > Check whether all intermediate pages are indexed and which hop values
> > are assigned to them.
> >
> > Gregory Kozlovsky wrote:
> >
> > > No, it was not indexed.
> > > I checked by logging into mysql and using a SELECT
> > > statement.
> > >
> > > -----Original Message-----
> > > From: Alexander F Avdonkin [mailto:[EMAIL PROTECTED]]
> > > Sent: Thursday, 13 June 2002 16:17
> > > To: [EMAIL PROTECTED]
> > > Subject: Re: [aseek-users] Indexing with MaxHops
> > >
> > > Hello Gregory,
> > >
> > > As for the first problem, check whether the page referring to the
> > > absent URL is indexed and what hop value is assigned to it.
> > >
> > > Alexander.
>
