Hi Gilles,

Jake's application looks like it's BroadVision, and that is how the system
maintains sessions.  Most BroadVision systems will dump you to an error page or
give you "session expired" messages if a request lacks the
BV_SessionID/BV_EngineID strings.
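The crux of Jake's problem, which Gilles spells out below, is that a crawler on such a site has to fetch each URL with its session strings intact but recognize a revisit by the URL with them removed. htdig applies url_rewrite_rules before fetching, so a rule alone can't do this. As a rough illustration outside htdig, here is a minimal Python sketch of that split. dedup_key, Frontier and the treatment of BV_EngineID are hypothetical, not part of htdig or BroadVision. The sketch also keeps the pattern from Jake's rule for comparison: in Python's re, the greedy (.*) after "BV_SessionID=" runs to the last "&", so a parameter sitting between the session ID and that last "&" is dropped. htdig uses a different regex library, so htdig -vvv is the way to confirm what it actually does.

```python
import re

# Pattern from Jake's url_rewrite_rules line, with htdig's doubled backslashes
# undone. Kept for comparison only: in Python's re the greedy (.*) after "="
# runs to the LAST '&', swallowing any parameters between the ID and that '&'.
JAKE_RULE = re.compile(r"(.*)\?BV_SessionID=(.*)&(.*)")

# Hypothetical alternative: match only the session parameters themselves.
# (?<=[?&]) requires a parameter boundary just before the name, and [^&]*
# stops at the next '&', so neighbouring parameters are left alone.
SESSION_PARAM = re.compile(r"(?<=[?&])(?:BV_SessionID|BV_EngineID)=[^&]*&?")


def dedup_key(url: str) -> str:
    """Return url without session parameters; used only for duplicate checks."""
    key = SESSION_PARAM.sub("", url)
    # Drop a '?' or '&' left dangling when a session parameter came last.
    return key.rstrip("?&")


class Frontier:
    """Hypothetical crawl queue: dedupe on dedup_key(url), fetch url as found."""

    def __init__(self):
        self.seen = set()   # session-free keys already queued
        self.to_fetch = []  # original URLs, session IDs intact

    def add(self, url: str) -> bool:
        key = dedup_key(url)
        if key in self.seen:
            return False  # same page under a different (or expired) session ID
        self.seen.add(key)
        self.to_fetch.append(url)
        return True


url = "http://www.domain.com/something.jsp?BV_SessionID=24324234234&other=yadda&parameter=stupid"
# Prints http://www.domain.com/something.jsp?other=yadda&parameter=stupid
print(dedup_key(url))
# Prints http://www.domain.com/something.jsp?parameter=stupid ("other" is lost)
print(JAKE_RULE.sub(r"\1?\3", url))
```

Two links to the same page carrying different session IDs end up as one entry in to_fetch. This only addresses duplicate detection; as Gilles points out, session-free URLs in the index still lead nowhere on their own.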

The only way I know to do this is to grab a session/engine ID from the home
page of a BroadVision site (these IDs could be embedded in a form rather than
in the URL), then have this "fresh" session info inserted into the links that
ht://Dig presents on the search results page.
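A minimal Python sketch of that approach, under loudly stated assumptions: the IDs appear either as BV_SessionID=/BV_EngineID= query parameters in a link or as hidden <input> fields with name before value, and the index holds session-free URLs. fresh_ids and with_session are hypothetical names. Fetching the home page, and hooking the rewrite into htsearch's results (a wrapper script or similar), are left out.

```python
import re
from urllib.parse import urlencode, urlsplit, urlunsplit

# Assumed markup (check against the real site): the IDs appear either as query
# parameters in a link, ...?BV_SessionID=...&BV_EngineID=... (the ';' in the
# class also catches HTML-escaped "&amp;"), or as hidden form fields such as
# <input type="hidden" name="BV_SessionID" value="...">.
PARAM_PATTERN = r"[?&;]{name}=([^&\"'\s>]+)"
INPUT_PATTERN = r"<input[^>]*name=[\"']?{name}[\"']?[^>]*value=[\"']?([^\"'\s>]+)"


def fresh_ids(home_html, names=("BV_SessionID", "BV_EngineID")):
    """Pull session/engine IDs out of a freshly fetched home page."""
    ids = {}
    for name in names:
        for pattern in (PARAM_PATTERN, INPUT_PATTERN):
            regex = pattern.format(name=re.escape(name))
            match = re.search(regex, home_html, re.IGNORECASE)
            if match:
                ids[name] = match.group(1)
                break  # first hit wins; link parameters are tried before form fields
    return ids


def with_session(stored_url, ids):
    """Append fresh session parameters to a session-free URL taken from the index."""
    parts = urlsplit(stored_url)
    extra = urlencode(ids)
    if parts.query and extra:
        query = parts.query + "&" + extra
    else:
        query = parts.query or extra
    return urlunsplit(parts._replace(query=query))


home = '<a href="/index.jsp?BV_SessionID=111&BV_EngineID=abc">Home</a>'
ids = fresh_ids(home)
# Prints http://www.domain.com/something.jsp?other=yadda&BV_SessionID=111&BV_EngineID=abc
print(with_session("http://www.domain.com/something.jsp?other=yadda", ids))
```

Scraping markup with regular expressions is fragile, and the fresh IDs expire too, so they would need refreshing well within the site's session timeout.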

I don't know of any other way to do it.  If it could be done, it would be
great to have an alternative search facility that works with BroadVision
sites.  I'm in the thick of debugging a Verity search issue on a BV site right
now.  )-:

=====Keith


--- Gilles Detillieux <[EMAIL PROTECTED]> wrote:
> According to Jake Baillie:
> > I have an evil application that's inserting a session ID (yadda, yadda, 
> > we've heard it all before).
> > 
> > So, I put together a rewrite rule:
> > 
> > url_rewrite_rules: (.*)\\?BV_SessionID=(.*)\\&(.*) \\1?\\3
> > 
> > Now, I'm actually using htdig not as a search engine here, but merely a 
> > spider. I'm using the -t option to output a text list of URLs, and I'm 
> > going to take that list and do something else with it.
> > 
> > What I want to happen is:
> > 
> > http://www.domain.com/something.jsp?BV_SessionID=24324234234&other=yadda&parameter=stupid
> > 
> > to rewrite to:
> > 
> > http://www.domain.com/something.jsp?other=yadda&parameter=stupid
> > 
> > when it enters the database (and writes that db.log text file). This is 
> > happening, as it stands, with my rule above. When I do htdig -vvv, I can 
> > see the normalization being done. Good.
> > 
> > The problem: it seems to be taking the links off of the page it retrieves
> > (reading into the anchor tags) and normalizing them too, instead of just
> > following them verbatim from the page and translating them later. This is a
> > problem, because the site cannot be traversed without the session ID in the
> > URL (I know, I didn't design it), but I need it to go away when the page
> > is included in the database, because I might have to stop and restart htdig
> > before the site is fully traversed, and the session IDs expire after 60
> > minutes. And htdig doesn't know a page is a duplicate if the session ID is
> > different.
> > 
> > See the problem? :) If not, I can clarify. If so, suggestions are 
> > appreciated. :)
> > 
> > Please hit reply all, as I'm not subscribed to the list.
> 
> OK, this application is a bit more evil than the other session-ID-inserting
> applications we've heard all about before.  With most of these, the session
> ID can be safely omitted before the URL is fetched.  Unfortunately, htdig
> processes url_rewrite_rules before fetching the URL - it really almost has
> to, as it needs to know if this is a new URL or not before fetching it.
> 
> What you're asking for is for htdig to process url_rewrite_rules only
> for the purpose of determining if the URL has been visited or not, but
> that it keeps the session ID for when it fetches the URL.  Even that
> won't be good enough, though.
> 
> If I understand correctly, the session ID MUST be there in the URL or you
> can't access the document, plus, if the session ID has expired you also
> can no longer access the document until you get a fresh session ID.  So,
> how can you possibly get htsearch to return URLs with a usable session
> ID so that the search results actually lead to something you can fetch?
> 
> In your position, I think I'd find the programmer of this evil application
> and slap him about the head until he agrees to write something that's more
> search-engine-friendly.
> 
> -- 
> Gilles R. Detillieux              E-mail: <[EMAIL PROTECTED]>
> Spinal Cord Research Centre       WWW:    http://www.scrc.umanitoba.ca/
> Dept. Physiology, U. of Manitoba  Winnipeg, MB  R3E 3J7  (Canada)
> 
> 
> -------------------------------------------------------
> This sf.net email is sponsored by:ThinkGeek
> Welcome to geek heaven.
> http://thinkgeek.com/sf
> _______________________________________________
> htdig-general mailing list <[EMAIL PROTECTED]>
> To unsubscribe, send a message to
> <[EMAIL PROTECTED]> with a subject of unsubscribe
> FAQ: http://htdig.sourceforge.net/FAQ.html

