On Thu, 13 Jun 2002, Alexander F Avdonkin wrote:

> Gregory Kozlovsky wrote:
> 
> > All right,
> >
> > I typed in mysql
> >     select url,hops from urlword where
> > url='http://www.washtimes.com/entertainment/';
> > and got
> >     http://www.washtimes.com/entertainment/ |    2 |
> > Now, I indexed with the -o command line argument of index which,
> > according to the description:
> >     -o Index documents with less hops first. Here "hops" means the "depth"
> > value of the document.
> > If this were true, this document should have level 1, because it is linked
> > from the front page.
> > Here is the reason why it has level 2, found with index -P:
> >
> >    2    200 http://www.washtimes.com/entertainment/
> >    1    301 http://www.washtimes.com/entertainment
> >    0    200 http://www.washtimes.com/
> >
> > Apparently, the redirection from a URL without the trailing slash to the
> > same URL with the slash is not recognized by ASPSeek as a special case.
> > Is there any way around it?
> >
> 
> Unfortunately, there is no workaround other than setting MaxHops to 3.
> Alternatively, you can modify the sources so that a redirect does not
> increase the hop value.
> See file parse.cpp, method CUrl::HTTPGetUrlAndStore, and change the line
> hrID = wordCache.GetHref(str, CurSrv, doc.m_urlID, doc.m_hops + 1, srv);

Actually, I have a patch for this problem which adds two new config
parameters (config excerpt below):

#######################################################################
#IncrementHopsOnRedirect yes/no
# Allow/disallow index to increment hops value when redirects are
# encountered.  Applies only to redirects generated by Location headers.
#                ***** SURGEON GENERAL'S WARNING *****
# This option can be harmful as it negates the indexer's built-in ability
# to be self-limiting in the case where a redirect loop is encountered.
# Please ensure that RedirectLoopLimit is set to a reasonable value to
# enable recovery from entry into a redirect loop.
#                ***** SURGEON GENERAL'S WARNING *****
# This option does, however, allow a greater number of documents to be
# indexed for sites that redirect frequently (e.g. for cookie testing,
# typically on each page).  In a test (with MaxHops 4) on such a site, the
# number of documents actually indexed increased from 34 to 756.
# Can be set multiple times before "Server" command and takes effect till
# the end of config file or till next IncrementHopsOnRedirect command.
# Default value is "yes".
IncrementHopsOnRedirect no

#######################################################################
#RedirectLoopLimit <number>
# Maximum allowable contiguous redirects.
# Default value is 8.
# Can be set multiple times before "Server" command and takes effect till
# the end of config file or till next RedirectLoopLimit command.
RedirectLoopLimit 16
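
For anyone wondering how the two settings are meant to interact, here is a
minimal sketch in C++. It is not the patch itself and not ASPSeek code:
RedirectPolicy, DocState, NextHop and NextState are names made up for
illustration, and it assumes a document is followed as long as its hops
value does not exceed MaxHops.

```cpp
#include <cassert>

// Hypothetical mirror of the two config parameters described above.
struct RedirectPolicy {
    bool incrementHopsOnRedirect = true;  // documented default: "yes"
    int redirectLoopLimit = 8;            // documented default: 8
};

// Hypothetical per-document crawl state.
struct DocState {
    int hops = 0;           // depth assigned to the document
    int redirectChain = 0;  // contiguous redirects followed to reach it
};

// Decision for a link or redirect target.
struct NextHop {
    bool follow;     // false when the target should be skipped
    DocState state;  // state the target would receive
};

// Computes the state of a target reached from `from`, either through an
// ordinary link or through a Location-header redirect.
// Assumption: a target is followed while hops <= maxHops.
NextHop NextState(const DocState& from, bool viaRedirect,
                  const RedirectPolicy& policy, int maxHops)
{
    assert(policy.redirectLoopLimit >= 0 && maxHops >= 0);

    DocState next;
    if (viaRedirect) {
        // A redirect extends the chain; hops grow only if configured to.
        next.redirectChain = from.redirectChain + 1;
        next.hops = policy.incrementHopsOnRedirect ? from.hops + 1 : from.hops;
        if (next.redirectChain > policy.redirectLoopLimit) {
            // Too many contiguous redirects: treat it as a loop and stop.
            return {false, next};
        }
    } else {
        // An ordinary link always costs one hop and resets the chain.
        next.redirectChain = 0;
        next.hops = from.hops + 1;
    }
    return {next.hops <= maxHops, next};
}
```

Under these assumptions the washtimes case works out as follows: the front
page has hops 0, the link to /entertainment gets hops 1, and the 301 to
/entertainment/ stays at hops 1 with IncrementHopsOnRedirect set to "no",
instead of 2 with the default "yes". Since hops no longer grow inside a
redirect loop, RedirectLoopLimit is the only thing that ends one.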

If there is interest, I'll send the patch through.


Matt.

> >
> >     Gregory
> >
> > -----Original Message-----
> > From: Alexander F Avdonkin [mailto:[EMAIL PROTECTED]]
> > Sent: Thursday, 13 June 2002 16:34
> > To: [EMAIL PROTECTED]
> > Subject: Re: [aseek-users] Indexing with MaxHops
> >
> > You said that the page is reachable in 2 clicks.
> > Check whether all intermediate pages are indexed and which hop values
> > are assigned to them.
> >
> > Gregory Kozlovsky wrote:
> >
> > > No, it was not indexed. I checked by logging into mysql and using a
> > > SELECT statement.
> > >
> > > -----Original Message-----
> > > From: Alexander F Avdonkin [mailto:[EMAIL PROTECTED]]
> > > Sent: Thursday, 13 June 2002 16:17
> > > To: [EMAIL PROTECTED]
> > > Subject: Re: [aseek-users] Indexing with MaxHops
> > >
> > > Hello Gregory,
> > >
> > > As for the first problem, check whether the page referring to the
> > > absent URL is indexed and what hop value is assigned to it.
> > >
> > > Alexander.
> 
