Disable URL-rewriting?
Is there any way to disable URL rewriting (with jsessionid) in Tomcat, via struts-config.xml, or anywhere else? I'm about at my wits' end with this jsessionid thing. Our search engine, which indexes by crawling the site (and doesn't support cookies), can't index properly because of the jsessionid parameter. Frankly, at this point I don't really care if the 3-5% of visitors to our site can't use sessions (there's practically no functionality that depends on them anyway; where sessions are used, it's mainly as a performance improvement).

I'd like to leave sessions via cookies enabled, but disable URL rewriting for sessions.

--
Brice D. Ruth
Sr. IT Analyst
Fiskars Brands, Inc.

To unsubscribe, e-mail: [EMAIL PROTECTED]
For additional commands, e-mail: [EMAIL PROTECTED]
Re: Disable URL-rewriting?
Not as far as I know, but that's by no means definitive. I *do* know that you can configure a Tomcat context (or default context) to do the opposite: turn off cookies and use only URL rewriting.

Hm, what if you create a filter that wraps the response in an HttpServletResponseWrapper that no-ops encodeUrl and encodeURL?

--
Kris Schneider <mailto:[EMAIL PROTECTED]>
D.O.Tech <http://www.dotech.com/>
Re: Disable URL-rewriting?
Kris,

Thanks for the response. I saw that I could do the opposite :) - no help there. I'd be really interested in pursuing your second suggestion. Ideally, I'd set up a filter that processes every incoming request and determines whether it originates from our search engine. If so, it no-ops those methods; otherwise it leaves everything as is.

How exactly would I go about doing what you suggest? I've only been doing Java/Servlet work for about a year, and mostly via JSP/Struts, so I'm no expert when it comes to this.

--
Brice D. Ruth
Re: Disable URL-rewriting?
First, check out a tutorial on Servlet 2.3 filters. There's one at: http://java.sun.com/j2ee/tutorial/1_3-fcs/doc/Servlets8.html

Essentially, it boils down to creating two classes: a filter (implements javax.servlet.Filter) and a response wrapper (extends javax.servlet.http.HttpServletResponseWrapper). Then you'll need to add <filter> and <filter-mapping> elements to web.xml.

How would you determine whether a request originated from your search engine? The User-Agent header? A specific request parameter?

--
Kris Schneider
Re: Disable URL-rewriting?
On Tuesday, 18 November 2003 18:29, Brice Ruth wrote:
> Is there any way to disable URL rewriting (with jsessionid) in Tomcat
> or via struts-config.xml or anything? [...]

If anyone knows a solution to this, I'm definitely interested in hearing about it, too. From what I can tell, some search engines don't care (AltaVista's Scooter, for example), while others do (Google, in particular) and go away when they detect a session. I experimented for quite some time, but to no real avail.

One approach I tried was a custom action that explicitly kills all sessions when the main page is entered, invoked from my index.jsp. This worked, but I couldn't keep it when porting my private site to Tiles some time ago, and I didn't look further into the possible reasons. Plus, Struts itself stores some information in session scope, the user's locale for example. When I had to decide between sessions and Google (in fact, I only need sessions for pages 'beyond' the authentication stage; my provider doesn't grant me access to resin.conf, so I had to write a filter for that task), I decided to stick with the framework and satisfy Google some other way (and robots.txt doesn't make any difference here, either).

Gone bad already, I chose the cloaking approach (don't try this at home :-), so I changed my index.jsp to something like:

    <%@ page contentType="text/html; charset=ISO-8859-1" session="false" %>
    <%
      String userAgent = request.getHeader("User-Agent");
      // A missing User-Agent header is treated like a regular browser.
      if (userAgent == null || userAgent.indexOf("oogle") == -1) {
    %>
    <% } else { %>
    <% } %>

and just saved a static snapshot of the main page in /static, taken with cookies enabled so that no jsessionid is appended to the links. Google won't be able to tell the difference because of the forward.

As for 'brute' means such as disabling URL rewriting in general, I'd guess this is a server-specific thing. In Java, you have no means of telling the server how to maintain session state. All you can do is call request.getSession(), check for an existing session by passing 'false' to the overloaded version, and kill sessions programmatically via session.invalidate(). That is all Struts can do, too. So I think you have to check your server's manual on this. If it's Tomcat, I can say fairly reliably that you can decide whether to use cookies or not, but the same is not true for URL rewriting as the 'fallback' mechanism; and the server has to adhere to the Servlet specification, after all.

Now, although you can't determine *how* sessions are maintained, you can still determine to some degree *when* they're created. One thing I can tell from my experience with Struts is that using the standard forward Action automatically results in having a session around afterwards, but if you write a custom one that just says 'return mapping.findForward("success");', things are quite different. And so on. I really have to look further into this if my time allows.

Note that all of this comes from my experience developing my private site. On the job, the landing pages are usually static HTML, not so much because of session handling or search engines, but because of the general vulnerability of dynamic pages to DoS attacks and high traffic volumes.

HTH,
-- Chris.
Re: Disable URL-rewriting?
I'd probably do a User-Agent header check. I already have a Filter doing some things for the site, so I know how to configure web.xml and such. What do I need to do to use HttpServletResponseWrapper?

Brice
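A User-Agent check of the kind mentioned here could look roughly like the sketch below. It is only an illustration: the class name and the token list are hypothetical (`examplesitecrawler` stands in for whatever string your own crawler sends), and matching on substrings is an assumption, not something the thread prescribes.

```java
import java.util.Arrays;
import java.util.List;
import java.util.Locale;

/** Decides from a User-Agent header value whether a request looks like a crawler. */
final class CrawlerUserAgents {
    // Hypothetical tokens; replace them with what your own crawler actually sends.
    private static final List<String> TOKENS =
            Arrays.asList("examplesitecrawler", "googlebot");

    private CrawlerUserAgents() {}

    /** Returns true if the header contains one of the known tokens (case-insensitive). */
    static boolean isCrawler(String userAgent) {
        if (userAgent == null) {
            return false; // header missing: treat the client as a regular browser
        }
        String ua = userAgent.toLowerCase(Locale.ROOT);
        for (String token : TOKENS) {
            if (ua.contains(token)) {
                return true;
            }
        }
        return false;
    }
}
```

Inside a filter, the value would come from `((HttpServletRequest) request).getHeader("User-Agent")`, which returns null when the header is absent; that is why the null case is handled explicitly.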
Re: Disable URL-rewriting?
Kris,

Never mind - I see your link has info on this, too. I'll let the list know how this goes.

Brice
Re: Disable URL-rewriting?
Got it. Simple as pie, actually. I created the wrapper class, no-op'd encodeUrl/encodeURL, and put it into the Filter that was already set up to handle all incoming requests ... piece of cake :) Sweet!

Here's the wrapper code:

    import javax.servlet.http.HttpServletResponse;
    import javax.servlet.http.HttpServletResponseWrapper;

    public class StripSessionIdWrapper extends HttpServletResponseWrapper {
        public StripSessionIdWrapper(HttpServletResponse response) {
            super(response);
        }

        // Return the URL unchanged so that no jsessionid is appended.
        public String encodeUrl(String url) {
            return url;
        }

        public String encodeURL(String url) {
            return url;
        }
    }

and here are the couple of lines I have in my Filter:

    StripSessionIdWrapper wrapper = new StripSessionIdWrapper((HttpServletResponse) response);

and then, instead of passing "response" to chain.doFilter, I pass "wrapper".

Nice ... clean, I like it.

--
Brice D. Ruth
Re: Disable URL-rewriting?
On Tuesday, 18 November 2003 20:10, Kris Schneider wrote:
> Hm, what if you create a filter to wrap the response with an
> HttpServletResponseWrapper that no-ops encodeUrl and encodeURL?

Oh, I see that I left that approach out, because I dropped it at an early stage of my design considerations. Of course you can kill off the session information via a filter that way, but what happens if the user has disabled cookies, and it's not Google but some John Doe user?

OK, let's simulate the results. Request #1 is made, resulting in session creation due to something in my code, but the filter intercepts and removes the session info from the response. So far, so good. Now the user clicks on a link. As the request contains no session info, the server treats it as a request without a session and creates a new one. On the way back, the filter strips the session info again, of course. And so on. In the meantime, there may be 30+ stale sessions hanging around, waiting to time out, just because the filter suppressed some crucial information the server expected to get back.

Even granting that the filter could call session.invalidate() at some point before the response is committed, I would still feel I was doing something very wrong, because I would be coding against the machine and the specification, and that's not how it's meant to be. It might even work, but I would still feel uneasy.

So the preferable approach would be to avoid sessions if you don't want them (just as in RL) instead of killing them afterwards. If that's not possible, I tend to handle general things in a general fashion and handle special conditions (possible scenario: the Googlebot shows up again) in special ways, too. As for a customized Google filter: well, the simple cloaking approach I described is easier to implement and just closer to my personal taste. YMMV.

Thanks for an interesting idea I may have dropped too soon back then, but even on reconsidering it, I think I would still stick to a different approach. It's my private site after all, and I just don't want to feel bad about it in some way :-)

-- Chris.

NB. One thing I still wonder about is whether it's really the jsessionid that Google dislikes, or the *.do ending in general. As far as I can tell, Google also indexes plain .jsp pages, but those run in a session by default anyway. Don't know. If it's really just the *.do ending, it's easy to map Struts to *.html (assigning a .htm ending to truly static pages), or to use mod_rewrite or the rewrite filter available in Resin 3.0. But I haven't checked any of that yet.
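The pile-up described in this message can be reproduced with a toy model. The sketch below is not the servlet API: `ToySessionStore` is a hypothetical stand-in for a container's session manager, and it assumes the only rule that matters here, namely that a request arriving without a known session id gets a fresh session.

```java
import java.util.HashSet;
import java.util.Set;

/** Toy stand-in for a container's session manager (not the servlet API). */
final class ToySessionStore {
    private final Set<String> sessions = new HashSet<>();
    private int nextId = 1;

    /** Returns the session id for a request, creating a session if the id is missing or unknown. */
    String handleRequest(String presentedId) {
        if (presentedId != null && sessions.contains(presentedId)) {
            return presentedId; // client sent a valid id: reuse the session
        }
        String id = "S" + nextId++;
        sessions.add(id);       // no usable id: a new session is created
        return id;
    }

    int liveSessions() {
        return sessions.size();
    }

    /**
     * Simulates one client making `clicks` requests.
     * idReachesClient = false models cookies disabled plus stripped URLs.
     */
    static int simulate(int clicks, boolean idReachesClient) {
        ToySessionStore store = new ToySessionStore();
        String known = null;
        for (int i = 0; i < clicks; i++) {
            String id = store.handleRequest(known);
            if (idReachesClient) {
                known = id; // cookie or rewritten URL carries the id back
            }
        }
        return store.liveSessions();
    }
}
```

With this model, `ToySessionStore.simulate(30, false)` leaves 30 sessions behind, while `ToySessionStore.simulate(30, true)` leaves one. A real container would expire the stale ones only after the session timeout.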
Re: Disable URL-rewriting?
I'm going to handle this by looking specifically at the User-Agent header that gets sent. If I can detect that it is a search engine (particularly *our* search engine crawler), then I'll disable the encodeURL/encodeUrl methods; otherwise, I'll leave things as is. This takes care of your considerations, I think.

Brice
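The decision described in this last message, wrapping the response only for the crawler, can be sketched without the servlet API as follows. Everything here is a stand-in: `UrlEncodingResponse` models only the `encodeURL` method of the response, `RewritingResponse` mimics a container that rewrites URLs (a real one places the id before any query string), and the `examplesitecrawler` token is hypothetical.

```java
import java.util.Locale;

/** Stand-in for the one response method that matters here (not the servlet API). */
interface UrlEncodingResponse {
    String encodeURL(String url);
}

/** Mimics a container that appends the session id when no cookie came back (simplified). */
final class RewritingResponse implements UrlEncodingResponse {
    private final String sessionId;

    RewritingResponse(String sessionId) {
        this.sessionId = sessionId;
    }

    public String encodeURL(String url) {
        return url + ";jsessionid=" + sessionId;
    }
}

/** Same idea as the wrapper in the thread: hand the URL back untouched. */
final class PassThroughResponse implements UrlEncodingResponse {
    // A real wrapper would delegate every other method to the wrapped response.
    private final UrlEncodingResponse wrapped;

    PassThroughResponse(UrlEncodingResponse wrapped) {
        this.wrapped = wrapped;
    }

    public String encodeURL(String url) {
        return url;
    }
}

final class ResponseSelector {
    private ResponseSelector() {}

    /** Wraps the response only when the User-Agent looks like the (hypothetical) site crawler. */
    static UrlEncodingResponse select(String userAgent, UrlEncodingResponse response) {
        boolean crawler = userAgent != null
                && userAgent.toLowerCase(Locale.ROOT).contains("examplesitecrawler");
        return crawler ? new PassThroughResponse(response) : response;
    }
}
```

Regular visitors keep the container's normal behaviour, so cookie-less browsers can still carry a session in the URL and the stale-session concern raised earlier applies only to the crawler. Note also that HttpServletResponse has encodeRedirectURL and encodeRedirectUrl; a wrapper that overrides only encodeURL/encodeUrl leaves redirects untouched, so it may be worth checking whether those need the same treatment.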