Re: Check if a URL exists programmatically
2015-07-17 18:48 GMT+03:00 Mitch Claborn mitch...@claborn.net:
> I spent some time yesterday digging through code without much luck. Today I'm going to experiment with this: getting a RequestDispatcher for the URL from the ServletContext, creating dummy ServletRequest and ServletResponse objects and invoking include(request, response) or forward() on that dispatcher. With luck, I'll be able to get what would be the response from a HEAD or a GET request in some sort of output stream in the response object, then examine that output stream for the result.

The way I finally solved this is a bit of a shortcut, but it works. Since our site is completely based on Struts, I'm reading the struts.xml file and matching the action names against the URL returned by google. For those that don't have a specific match, I call a routine in my default action that checks for various dynamically named pages (categories, products, etc.). It runs super fast and doesn't need the dummy request and response objects.

I was hoping for something that would be framework agnostic, but this will do for now.

Thanks all for your help and suggestions.

Mitch

To unsubscribe, e-mail: users-unsubscr...@tomcat.apache.org
For additional commands, e-mail: users-h...@tomcat.apache.org
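The struts.xml lookup described above could be sketched roughly as follows. This is not Mitch's actual code: the class name StrutsActionIndex and its methods are hypothetical, and the sketch assumes that action URLs end in the action name with an optional ".action" suffix. It ignores Struts package namespaces and wildcard action names, both of which a real site would probably need to handle.

```java
import java.io.StringReader;
import java.util.HashSet;
import java.util.Set;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

/** Hypothetical helper: indexes the action names declared in struts.xml content. */
final class StrutsActionIndex {
    private final Set<String> actionNames = new HashSet<>();

    /** Parses struts.xml content and records the name attribute of every action element. */
    static StrutsActionIndex fromXml(String strutsXml) {
        try {
            DocumentBuilder builder = DocumentBuilderFactory.newInstance().newDocumentBuilder();
            // Resolve any external entity (such as the Struts DTD) to empty content,
            // so that parsing never goes out to the network.
            builder.setEntityResolver((publicId, systemId) -> new InputSource(new StringReader("")));
            Document doc = builder.parse(new InputSource(new StringReader(strutsXml)));
            StrutsActionIndex index = new StrutsActionIndex();
            NodeList actions = doc.getElementsByTagName("action");
            for (int i = 0; i < actions.getLength(); i++) {
                index.actionNames.add(((Element) actions.item(i)).getAttribute("name"));
            }
            return index;
        } catch (Exception e) {
            throw new IllegalArgumentException("Could not parse struts.xml content", e);
        }
    }

    /** Returns true if the last segment of the URL path, minus ".action", is a declared action. */
    boolean matches(String path) {
        String name = path.substring(path.lastIndexOf('/') + 1);
        if (name.endsWith(".action")) {
            name = name.substring(0, name.length() - ".action".length());
        }
        return actionNames.contains(name);
    }
}
```

The index would be built once at startup, and each search result's path checked with matches(); anything that does not match would then fall through to the site-specific checks for dynamically named pages.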
Re: Check if a URL exists programmatically
2015-07-17 18:48 GMT+03:00 Mitch Claborn mitch...@claborn.net:
> I spent some time yesterday digging through code without much luck. Today I'm going to experiment with this: getting a RequestDispatcher for the URL from the ServletContext, creating dummy ServletRequest and ServletResponse objects and invoking include(request, response) or forward() on that dispatcher. With luck, I'll be able to get what would be the response from a HEAD or a GET request in some sort of output stream in the response object, then examine that output stream for the result.

Using dummy objects with those APIs is disallowed by the Servlet specification. If you run in strict compliance mode, Tomcat will check this requirement. As far as I remember, the error message mentions the chapter number of the specification.

http://tomcat.apache.org/tomcat-7.0-doc/config/systemprops.html#Specification
See the entry for WRAP_SAME_OBJECT.

Testing for the existence of static pages should be easy, with ServletContext.getResource[AsStream]() or with other APIs (using Tomcat's internal resources APIs, or accessing the files directly).

Testing for the existence of dynamic pages may be hard. You cannot check for existence without making an actual request (better a HEAD request than a GET). If you are unlucky, a GET request may trigger some action. E.g. the Tomcat Manager application suffered from such a feature:
https://bz.apache.org/bugzilla/show_bug.cgi?id=50231

A HEAD request is better, as it produces no output, but e.g. for JSPs a HEAD request is implemented as GET + suppressing output, so it actually performs the same processing.

Best regards,
Konstantin Kolinko
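As an illustration of the "accessing the files directly" option for static pages, here is a minimal sketch using only java.nio. It assumes the web application is deployed as an unpacked directory whose root is known; the class name StaticPageCheck and the webappRoot parameter are hypothetical. Inside a servlet, ServletContext.getResource() would be the more portable choice, since it also works for packed WARs.

```java
import java.nio.file.Files;
import java.nio.file.InvalidPathException;
import java.nio.file.Path;

/** Hypothetical helper: checks whether a URL path maps to a static file on disk. */
final class StaticPageCheck {

    /** Returns true if urlPath names a regular, publicly servable file inside webappRoot. */
    static boolean exists(Path webappRoot, String urlPath) {
        try {
            Path root = webappRoot.toAbsolutePath().normalize();
            String relative = urlPath.startsWith("/") ? urlPath.substring(1) : urlPath;
            Path candidate = root.resolve(relative).normalize();
            // Reject anything that escapes the webapp root, e.g. via "../".
            if (!candidate.startsWith(root)) {
                return false;
            }
            // WEB-INF and META-INF are never served directly by the container.
            String first = root.relativize(candidate).getName(0).toString();
            if (first.equalsIgnoreCase("WEB-INF") || first.equalsIgnoreCase("META-INF")) {
                return false;
            }
            return Files.isRegularFile(candidate);
        } catch (InvalidPathException e) {
            // The URL path contains characters that are not valid in a file path.
            return false;
        }
    }
}
```

This only answers the question for static content; dynamic pages (Struts actions, JSPs mapped through servlets) still need one of the other approaches discussed in this thread.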
Re: Check if a URL exists programmatically
On 07/17/2015 10:48 AM, Mitch Claborn wrote:
> On 07/16/2015 02:19 PM, chris derham wrote:
>>> I already have a custom error page. When I detect that a URL returned by google would return a 404, I exclude it from the search results so that the user never sees it.
>>>
>>> Mitch
>>
>> Mitch,
>>
>> Ok I see now what you mean. Sorry your original email was quite clear. Hmm interesting challenge.
>>
>> Big picture terms, I guess the two obvious choices seem to be to not use google for searching, or parse the google results, and determine the url validity as you are doing. Depending on the urls you use, that could be horrible. Guess that's where you are. Is not using google an option?
>>
>> Please let us know how you resolve it.
>>
>> Chris
>
> Doing without google is not an option. We are quite happy with them except for this one, admittedly minor, glitch.
>
> I spent some time yesterday digging through code without much luck. Today I'm going to experiment with this: getting a RequestDispatcher for the URL from the ServletContext, creating dummy ServletRequest and ServletResponse objects and invoking include(request, response) or forward() on that dispatcher. With luck, I'll be able to get what would be the response from a HEAD or a GET request in some sort of output stream in the response object, then examine that output stream for the result.

I guess I'm giving up on this. I tried the approach described above, but can't seem to make it work. I'm trying a known-good URL as a baseline. When I invoke dispatcher.forward(request, response), my dummy response object gets called with a sendError(404, /url.html), but I can also see evidence that the code that should run for that URL (a Struts action) is running and is returning a good Struts response.

When I enable low-level logging, it appears to me that the JSP that renders the output is being called, but the output is not making it back to my dummy response object. That sendError() is coming from the DefaultServlet, which is odd because I would think that should not be called, as Struts is (should be) intercepting all of the requests.

I must be setting something up wrong somewhere. The only next step I can think of is to compile Tomcat for myself so I can debug the execution path from the forward() to figure out what's going on. I can't justify that much time and effort on this. I'm guessing the RequestDispatcher only works down below the filters, which is where Struts is invoked.

I welcome any further ideas.

--
Mitch
Re: Check if a URL exists programmatically
Mitch,

On 7/20/15 2:09 PM, Mitch Claborn wrote:
> On 07/17/2015 10:48 AM, Mitch Claborn wrote:
>> On 07/16/2015 02:19 PM, chris derham wrote:
>>>> I already have a custom error page. When I detect that a URL returned by google would return a 404, I exclude it from the search results so that the user never sees it.
>>>>
>>>> Mitch
>>>
>>> Mitch,
>>>
>>> Ok I see now what you mean. Sorry your original email was quite clear. Hmm interesting challenge.
>>>
>>> Big picture terms, I guess the two obvious choices seem to be to not use google for searching, or parse the google results, and determine the url validity as you are doing. Depending on the urls you use, that could be horrible. Guess that's where you are. Is not using google an option?
>>>
>>> Please let us know how you resolve it.
>>>
>>> Chris
>>
>> Doing without google is not an option. We are quite happy with them except for this one, admittedly minor, glitch.
>>
>> I spent some time yesterday digging through code without much luck. Today I'm going to experiment with this: getting a RequestDispatcher for the URL from the ServletContext, creating dummy ServletRequest and ServletResponse objects and invoking include(request, response) or forward() on that dispatcher. With luck, I'll be able to get what would be the response from a HEAD or a GET request in some sort of output stream in the response object, then examine that output stream for the result.
>
> I guess I'm giving up on this. I tried the approach described above, but can't seem to make it work. I'm trying a known-good URL as a baseline. When I invoke dispatcher.forward(request, response), my dummy response object gets called with a sendError(404, /url.html), but I can also see evidence that the code that should run for that URL (a Struts action) is running and is returning a good Struts response.
>
> When I enable low-level logging, it appears to me that the JSP that renders the output is being called, but the output is not making it back to my dummy response object. That sendError() is coming from the DefaultServlet, which is odd because I would think that should not be called, as Struts is (should be) intercepting all of the requests.

S2 is implemented as a Filter. If nothing matches in the S2 setup, it will probably just call down the Filter chain, eventually ending up at the DefaultServlet. So, a 404 is pretty much always handled by the DefaultServlet.

> I must be setting something up wrong somewhere. The only next step I can think of is to compile Tomcat for myself so I can debug the execution path from the forward() to figure out what's going on. I can't justify that much time and effort on this. I'm guessing the RequestDispatcher only works down below the filters, which is where Struts is invoked.

RequestDispatcher will act pretty much just like an incoming request.

At this point, you may just want to make the loopback request. You mentioned wasting resources using this approach. Which resources? If you're willing to call the RequestDispatcher, you're pretty much using those resources already. About the only difference is the use of another Thread.

You can limit the number of threads used for these loopback requests by creating a second Connector that is only used for loopback requests, and use an Executor that has a small number of threads.

Of course, if any incoming request can result in a loopback request, then it's possible to DoS your server just by making lots of requests that will trigger these kinds of loopback requests.

-chris
Re: Check if a URL exists programmatically
On 07/16/2015 02:19 PM, chris derham wrote:
>> I already have a custom error page. When I detect that a URL returned by google would return a 404, I exclude it from the search results so that the user never sees it.
>>
>> Mitch
>
> Mitch,
>
> Ok I see now what you mean. Sorry your original email was quite clear. Hmm interesting challenge.
>
> Big picture terms, I guess the two obvious choices seem to be to not use google for searching, or parse the google results, and determine the url validity as you are doing. Depending on the urls you use, that could be horrible. Guess that's where you are. Is not using google an option?
>
> Please let us know how you resolve it.
>
> Chris

Doing without google is not an option. We are quite happy with them except for this one, admittedly minor, glitch.

I spent some time yesterday digging through code without much luck. Today I'm going to experiment with this: getting a RequestDispatcher for the URL from the ServletContext, creating dummy ServletRequest and ServletResponse objects and invoking include(request, response) or forward() on that dispatcher. With luck, I'll be able to get what would be the response from a HEAD or a GET request in some sort of output stream in the response object, then examine that output stream for the result.

--
Mitch
Check if a URL exists programmatically
Short question: How can I, from within code running under Tomcat, determine if a given URL request to that Tomcat instance would result in a 404 or not, without calling back to the Tomcat using an HTTP HEAD or GET?

Background: We use google custom search by calling the google server and then formatting the results on our search page. Our range of products is fairly fluid, and there is occasionally a gap between when a product goes away and the google search index is updated, which would result in a 404 if the user clicked that link in the search results. (I know that I can ask google to re-index, but I still need to solve this problem.) Rather than write a ton of code for the various types of pages that we have (product, category, etc.), I'd like to just be able to call some Tomcat method to determine if the URL that I get back from google would result in a 404 or not. I'm currently calling back to the Tomcat instance using an HTTP HEAD call, but that is a waste of resources and during periods of high volume uses up processing threads that I want to reserve for actual customers.

We are using Tomcat 7 with Struts.

--
Mitch
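For reference, the HTTP HEAD callback mentioned above might look something like the following sketch. It is not the code from the original post: the class name HeadCheck, the timeout parameter, and the choice to treat only a 404 status as "does not exist" are assumptions made for illustration.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.net.HttpURLConnection;
import java.net.URI;

/** Hypothetical helper: probes a URL with an HTTP HEAD request. */
final class HeadCheck {

    /** Returns false only if the server answers 404; I/O failures are rethrown unchecked. */
    static boolean exists(String url, int timeoutMillis) {
        HttpURLConnection conn = null;
        try {
            conn = (HttpURLConnection) URI.create(url).toURL().openConnection();
            conn.setRequestMethod("HEAD");
            // Report the status of this URL itself rather than of a redirect target.
            conn.setInstanceFollowRedirects(false);
            // Short timeouts keep a slow page from tying up the calling thread.
            conn.setConnectTimeout(timeoutMillis);
            conn.setReadTimeout(timeoutMillis);
            return conn.getResponseCode() != HttpURLConnection.HTTP_NOT_FOUND;
        } catch (IOException e) {
            throw new UncheckedIOException("HEAD request failed: " + url, e);
        } finally {
            if (conn != null) {
                conn.disconnect();
            }
        }
    }
}
```

Each call occupies one request-processing thread on the target Tomcat for the duration of the probe, which is the resource cost the post is concerned about.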
Re: Check if a URL exists programmatically
On 07/16/2015 01:04 PM, chris derham wrote:
>> Short question: How can I, from within code running under Tomcat, determine if a given URL request to that Tomcat instance would result in a 404 or not, without calling back to the Tomcat using an HTTP HEAD or GET?
>>
>> Background: We use google custom search by calling the google server and then formatting the results on our search page. Our range of products is fairly fluid, and there is occasionally a gap between when a product goes away and the google search index is updated, which would result in a 404 if the user clicked that link in the search results. (I know that I can ask google to re-index, but I still need to solve this problem.) Rather than write a ton of code for the various types of pages that we have (product, category, etc.), I'd like to just be able to call some Tomcat method to determine if the URL that I get back from google would result in a 404 or not. I'm currently calling back to the Tomcat instance using an HTTP HEAD call, but that is a waste of resources and during periods of high volume uses up processing threads that I want to reserve for actual customers.
>>
>> We are using Tomcat 7 with Struts.
>
> Mitch,
>
> What will you do when you detect a 404? Couldn't you just implement a custom 404 error page that does whatever it is?
>
> Chris

I already have a custom error page. When I detect that a URL returned by google would return a 404, I exclude it from the search results so that the user never sees it.

Mitch
Re: Check if a URL exists programmatically
> I already have a custom error page. When I detect that a URL returned by google would return a 404, I exclude it from the search results so that the user never sees it.
>
> Mitch

Mitch,

Ok I see now what you mean. Sorry your original email was quite clear. Hmm interesting challenge.

Big picture terms, I guess the two obvious choices seem to be to not use google for searching, or parse the google results, and determine the url validity as you are doing. Depending on the urls you use, that could be horrible. Guess that's where you are. Is not using google an option?

Please let us know how you resolve it.

Chris
Re: Check if a URL exists programmatically
> Short question: How can I, from within code running under Tomcat, determine if a given URL request to that Tomcat instance would result in a 404 or not, without calling back to the Tomcat using an HTTP HEAD or GET?
>
> Background: We use google custom search by calling the google server and then formatting the results on our search page. Our range of products is fairly fluid, and there is occasionally a gap between when a product goes away and the google search index is updated, which would result in a 404 if the user clicked that link in the search results. (I know that I can ask google to re-index, but I still need to solve this problem.) Rather than write a ton of code for the various types of pages that we have (product, category, etc.), I'd like to just be able to call some Tomcat method to determine if the URL that I get back from google would result in a 404 or not. I'm currently calling back to the Tomcat instance using an HTTP HEAD call, but that is a waste of resources and during periods of high volume uses up processing threads that I want to reserve for actual customers.
>
> We are using Tomcat 7 with Struts.

Mitch,

What will you do when you detect a 404? Couldn't you just implement a custom 404 error page that does whatever it is?

Chris