Re: Check if a URL exists programatically

2015-07-21 Thread Mitch Claborn

2015-07-17 18:48 GMT+03:00 Mitch Claborn mitch...@claborn.net:

> I spent some time yesterday digging through code without much luck. Today
> I'm going to experiment with this: getting a RequestDispatcher for the URL
> from the ServletContext, creating dummy ServletRequest and ServletResponse
> objects and invoking include(request, response) or forward() on that
> dispatcher.  With luck, I'll be able to get what would be the response from
> a HEAD or a GET request in some sort of output stream in the response
> object, then examine that output stream for the result.


The way I finally solved this is a bit of a shortcut, but it works.
Since our site is completely based on Struts, I'm reading the struts.xml
file and matching the action names against the URL returned by google.
For those that don't have a specific match, I call a routine in my
default action that checks for various dynamically named pages
(categories, products, etc).  It runs super fast and doesn't need the
dummy request and response objects.

I was hoping for something that would be framework agnostic, but this
will do for now.
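
For anyone wanting to try the same shortcut, here is a minimal sketch of
the struts.xml lookup. It is not the code running on our site: the class
name StrutsActionIndex, the parse() and matches() methods and the ".html"
extension are all assumptions made for illustration. It reads the package
and action elements with the JDK's DOM parser and ignores wildcard action
names, included files and the default-namespace fallback, so a miss should
be treated as "unknown" rather than as a definite 404.

```java
import java.io.IOException;
import java.io.InputStream;
import java.io.StringReader;
import java.util.HashSet;
import java.util.Set;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;

/** Hypothetical helper: indexes the action names declared in a struts.xml. */
public class StrutsActionIndex {
    private final Set<String> actionPaths = new HashSet<>();
    private final String extension;

    private StrutsActionIndex(String extension) {
        this.extension = extension;
    }

    /** Builds the index; the extension (e.g. ".html") is an assumed site setting. */
    public static StrutsActionIndex parse(InputStream strutsXml, String extension) {
        StrutsActionIndex index = new StrutsActionIndex(extension);
        try {
            DocumentBuilder builder =
                    DocumentBuilderFactory.newInstance().newDocumentBuilder();
            // struts.xml usually carries a DOCTYPE; answer DTD lookups with an
            // empty document so parsing never goes out to the network.
            builder.setEntityResolver(
                    (publicId, systemId) -> new InputSource(new StringReader("")));
            Document doc = builder.parse(strutsXml);
            NodeList packages = doc.getElementsByTagName("package");
            for (int i = 0; i < packages.getLength(); i++) {
                Element pkg = (Element) packages.item(i);
                String ns = pkg.getAttribute("namespace"); // "" when absent
                if (ns.equals("/")) {
                    ns = "";
                }
                NodeList actions = pkg.getElementsByTagName("action");
                for (int j = 0; j < actions.getLength(); j++) {
                    String name = ((Element) actions.item(j)).getAttribute("name");
                    index.actionPaths.add(ns + "/" + name);
                }
            }
        } catch (ParserConfigurationException | SAXException | IOException e) {
            throw new IllegalArgumentException("Could not read struts.xml", e);
        }
        return index;
    }

    /** True if the context-relative path names a declared action. */
    public boolean matches(String requestPath) {
        String path = requestPath;
        int query = path.indexOf('?');
        if (query >= 0) {
            path = path.substring(0, query);
        }
        if (!extension.isEmpty() && path.endsWith(extension)) {
            path = path.substring(0, path.length() - extension.length());
        }
        return actionPaths.contains(path);
    }
}
```

The index would be built once at startup and then consulted for each
search result; anything it does not recognise falls through to the
checks for dynamically named pages described above.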

Thanks all for your help and suggestions.

Mitch



-
To unsubscribe, e-mail: users-unsubscr...@tomcat.apache.org
For additional commands, e-mail: users-h...@tomcat.apache.org



Re: Check if a URL exists programatically

2015-07-21 Thread Konstantin Kolinko
2015-07-17 18:48 GMT+03:00 Mitch Claborn mitch...@claborn.net:

> I spent some time yesterday digging through code without much luck. Today
> I'm going to experiment with this: getting a RequestDispatcher for the URL
> from the ServletContext, creating dummy ServletRequest and ServletResponse
> objects and invoking include(request, response) or forward() on that
> dispatcher.  With luck, I'll be able to get what would be the response from
> a HEAD or a GET request in some sort of output stream in the response
> object, then examine that output stream for the result.


Using dummy objects with those APIs is disallowed by the Servlet specification.

If you run in strict compliance mode, Tomcat will check this
requirement. As far as I remember, the error message mentions the
chapter number of the specification.

http://tomcat.apache.org/tomcat-7.0-doc/config/systemprops.html#Specification
See the entry for WRAP_SAME_OBJECT there.


Testing for the existence of static pages should be easy, with
ServletContext.getResource[AsStream]() or with other APIs (using
Tomcat's internal resources APIs, or accessing the files directly).
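
As a rough illustration of the "accessing the files directly" option,
here is a sketch that assumes the web application is deployed unpacked,
so that its root is a real directory. The class and method names are
made up for this example, and URL decoding and welcome files are left
out. Inside a servlet you would more likely call
ServletContext.getResource(path), which returns null when no resource
is mapped to that path.

```java
import java.nio.file.Files;
import java.nio.file.InvalidPathException;
import java.nio.file.Path;

/** Hypothetical helper: checks for a static file under an unpacked webapp root. */
public class StaticPageCheck {

    /** True if urlPath (context-relative) maps to a regular, publicly served file. */
    public static boolean exists(Path webappRoot, String urlPath) {
        Path root = webappRoot.toAbsolutePath().normalize();
        String relative = urlPath.startsWith("/") ? urlPath.substring(1) : urlPath;
        Path candidate;
        try {
            candidate = root.resolve(relative).normalize();
        } catch (InvalidPathException e) {
            return false; // not a legal file name on this platform
        }
        // Refuse anything that climbs out of the webapp root via "..".
        if (!candidate.startsWith(root)) {
            return false;
        }
        // WEB-INF and META-INF are never served to clients.
        Path inside = root.relativize(candidate);
        String first = inside.getName(0).toString();
        if (first.equalsIgnoreCase("WEB-INF") || first.equalsIgnoreCase("META-INF")) {
            return false;
        }
        return Files.isRegularFile(candidate);
    }
}
```

This only answers the question for static content; a path that is
handled by a servlet, filter or JSP mapping will not show up this way.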

Testing for the existence of dynamic pages may be hard. You cannot check
for existence without making an actual request (preferably a HEAD
request rather than a GET).

If you are unlucky, a GET request may trigger some action. E.g. the Tomcat
Manager application suffered from this problem:
https://bz.apache.org/bugzilla/show_bug.cgi?id=50231

A HEAD request is better, as it produces no output, but e.g. for JSPs
a HEAD request is implemented as GET + suppressing output, so it
actually performs the same processing.
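
For reference, a loopback HEAD check can be sketched with nothing but the
JDK's HttpURLConnection. The class name, method names and timeout
parameter are assumptions for this example, not anything taken from
Tomcat. Redirects are not followed, so a 3xx status is reported as it is.

```java
import java.io.IOException;
import java.net.HttpURLConnection;
import java.net.URI;

/** Hypothetical helper: asks the server for a URL's status with a HEAD request. */
public class HeadCheck {

    /** Returns the HTTP status code the server sends for a HEAD request. */
    public static int headStatus(String url, int timeoutMillis) throws IOException {
        HttpURLConnection conn =
                (HttpURLConnection) URI.create(url).toURL().openConnection();
        try {
            conn.setRequestMethod("HEAD");
            conn.setInstanceFollowRedirects(false);
            conn.setConnectTimeout(timeoutMillis);
            conn.setReadTimeout(timeoutMillis);
            return conn.getResponseCode();
        } finally {
            conn.disconnect();
        }
    }

    /** True only when the server clearly answers 404; I/O errors count as "unknown". */
    public static boolean isNotFound(String url, int timeoutMillis) {
        try {
            return headStatus(url, timeoutMillis) == HttpURLConnection.HTTP_NOT_FOUND;
        } catch (IOException e) {
            return false;
        }
    }
}
```

Treating timeouts and connection errors as "unknown" rather than "missing"
means a slow server never causes a valid page to be dropped from the
results.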

Best regards,
Konstantin Kolinko




Re: Check if a URL exists programatically

2015-07-20 Thread Mitch Claborn

On 07/17/2015 10:48 AM, Mitch Claborn wrote:

> On 07/16/2015 02:19 PM, chris derham wrote:
>>> I already have a custom error page. When I detect that a URL returned by
>>> google would return a 404, I exclude it from the search results so that
>>> the user never sees it.
>>>
>>> Mitch
>>
>> Mitch,
>>
>> Ok I see now what you mean. Sorry your original email was quite clear.
>>
>> Hmm interesting challenge. Big picture terms, I guess the two obvious
>> choices seem to be to not use google for searching, or parse the
>> google results, and determine the url validity as you are doing.
>> Depending on the urls you use, that could be horrible. Guess that's
>> where you are. Is not using google an option?
>>
>> Please let us know how you resolve it.
>>
>> Chris
>
> Doing without google is not an option.  We are quite happy with them
> except for this one, admittedly minor, glitch.
>
> I spent some time yesterday digging through code without much luck.
> Today I'm going to experiment with this: getting a RequestDispatcher
> for the URL from the ServletContext, creating dummy ServletRequest
> and ServletResponse objects and invoking include(request, response) or
> forward() on that dispatcher.  With luck, I'll be able to get what
> would be the response from a HEAD or a GET request in some sort of
> output stream in the response object, then examine that output stream
> for the result.




I guess I'm giving up on this. I tried the approach described above, but
can't seem to make it work.  I used a known-good URL as a baseline.
When I invoke dispatcher.forward(request, response), my dummy
response object gets called with sendError(404, "/url.html"), but I
can also see evidence that the code that should run for that URL (a
Struts action) is running and is returning a good Struts response.  When
I enable low-level logging, it appears to me that the JSP that renders
the output is being called, but the output is not making it back to my
dummy response object.


That sendError() is coming from the DefaultServlet, which is odd because 
I would think that should not be called as Struts is (should be) 
intercepting all of the requests.


I must be setting something up wrong somewhere.  The only next step I 
can think of is to compile Tomcat for myself so I can debug the 
execution path from the forward() to figure out what's going on.  I 
can't justify that much time and effort on this.


I'm guessing the RequestDispatcher only works down below the filters, 
which is where Struts is invoked.


I welcome any further ideas.


--

Mitch





Re: Check if a URL exists programatically

2015-07-20 Thread Christopher Schultz

Mitch,

On 7/20/15 2:09 PM, Mitch Claborn wrote:
> On 07/17/2015 10:48 AM, Mitch Claborn wrote:
>> On 07/16/2015 02:19 PM, chris derham wrote:
>>>> I already have a custom error page. When I detect that a URL
>>>> returned by google would return a 404, I exclude it from the
>>>> search results so that the user never sees it.
>>>>
>>>> Mitch
>>>
>>> Mitch,
>>>
>>> Ok I see now what you mean. Sorry your original email was quite
>>> clear.
>>>
>>> Hmm interesting challenge. Big picture terms, I guess the two
>>> obvious choices seem to be to not use google for searching, or
>>> parse the google results, and determine the url validity as you
>>> are doing. Depending on the urls you use, that could be
>>> horrible. Guess that's where you are. Is not using google an
>>> option?
>>>
>>> Please let us know how you resolve it.
>>>
>>> Chris
>>
>> Doing without google is not an option.  We are quite happy with
>> them except for this one, admittedly minor, glitch.
>>
>> I spent some time yesterday digging through code without much
>> luck. Today I'm going to experiment with this: getting a
>> RequestDispatcher for the URL from the ServletContext, creating
>> dummy ServletRequest and ServletResponse objects and invoking
>> include(request, response) or forward() on that dispatcher.  With
>> luck, I'll be able to get what would be the response from a HEAD
>> or a GET request in some sort of output stream in the response
>> object, then examine that output stream for the result.
 
 
> I guess I'm giving up on this. I tried the approach described
> above, but can't seem to make it work.  I used a known-good URL
> as a baseline.  When I invoke
> dispatcher.forward(request, response), my dummy response object
> gets called with sendError(404, "/url.html"), but I can also see
> evidence that the code that should run for that URL (a Struts
> action) is running and is returning a good Struts response.  When I
> enable low-level logging, it appears to me that the JSP that
> renders the output is being called, but the output is not making it
> back to my dummy response object.
>
> That sendError() is coming from the DefaultServlet, which is odd
> because I would think that should not be called as Struts is
> (should be) intercepting all of the requests.

Struts 2 (S2) is implemented as a Filter. If nothing matches in the S2
setup, it will probably just call down the filter chain, eventually
ending up at the DefaultServlet. So, a 404 is pretty much always handled
by the DefaultServlet.

> I must be setting something up wrong somewhere.  The only next step
> I can think of is to compile Tomcat for myself so I can debug the
> execution path from the forward() to figure out what's going on.
> I can't justify that much time and effort on this.
>
> I'm guessing the RequestDispatcher only works down below the
> filters, which is where Struts is invoked.

RequestDispatcher will act pretty much just like an incoming request.
At this point, you may just want to make the loopback request. You
mentioned wasting resources using this approach. Which resources? If
you're willing to call the RequestDispatcher, you're pretty much using
those resources already. About the only difference is the use of
another Thread. You can limit the number of threads used for these
loopback requests by creating a second Connector that is only used
for loopback requests, and giving it an Executor that has a small number
of threads. Of course, if any incoming request can result in a
loopback request, then it's possible to DoS your server just by making
lots of requests that will trigger these kinds of loopback requests.

-chris




Re: Check if a URL exists programatically

2015-07-17 Thread Mitch Claborn

On 07/16/2015 02:19 PM, chris derham wrote:

>> I already have a custom error page.  When I detect that a URL returned by
>> google would return a 404, I exclude it from the search results so that the
>> user never sees it.
>>
>> Mitch
>
> Mitch,
>
> Ok I see now what you mean. Sorry your original email was quite clear.
>
> Hmm interesting challenge. Big picture terms, I guess the two obvious
> choices seem to be to not use google for searching, or parse the
> google results, and determine the url validity as you are doing.
> Depending on the urls you use, that could be horrible. Guess that's
> where you are. Is not using google an option?
>
> Please let us know how you resolve it.
>
> Chris


Doing without google is not an option.  We are quite happy with them 
except for this one, admittedly minor, glitch.


I spent some time yesterday digging through code without much luck.
Today I'm going to experiment with this: getting a RequestDispatcher
for the URL from the ServletContext, creating dummy ServletRequest and
ServletResponse objects and invoking include(request, response) or
forward() on that dispatcher.  With luck, I'll be able to get what would
be the response from a HEAD or a GET request in some sort of output
stream in the response object, then examine that output stream for the
result.


--

Mitch





Check if a URL exists programatically

2015-07-16 Thread Mitch Claborn
Short question: How can I, from within code running under Tomcat,
determine if a given URL request to that Tomcat instance would result in
a 404 or not, without calling back to Tomcat using an HTTP HEAD or GET?


Background: We use google custom search by calling the google server and
then formatting the results on our search page.  Our range of products
is fairly fluid, and there is occasionally a gap between when a product
goes away and when the google search index is updated, which would result
in a 404 if a user clicked that link in the search results.  (I know that I
can ask google to re-index, but I still need to solve this problem.)


Rather than write a ton of code for the various types of pages that we
have (product, category, etc.) I'd like to just be able to call some
Tomcat method to determine if the URL that I get back from google would
result in a 404 or not.  I'm currently calling back to the Tomcat
instance using an HTTP HEAD call, but that is a waste of resources and
during periods of high volume it uses up processing threads that I want to
reserve for actual customers.


We are using Tomcat 7 with Struts.


--

Mitch





Re: Check if a URL exists programatically

2015-07-16 Thread Mitch Claborn

On 07/16/2015 01:04 PM, chris derham wrote:

>> Short question: How can I, from within code running under Tomcat, determine
>> if a given URL request to that tomcat instance would result in a 404 or not,
>> without calling back to the Tomcat using an HTTP HEAD or GET?
>>
>> Background: We use google custom search by calling the google server and
>> then formatting the results on our search page.  Our range of products is
>> fairly fluid, and there is occasionally a gap between when a product goes
>> away and the google search index is updated, which would result in a 404 if
>> user clicked that link in the search results.  (I know that I can ask google
>> to re-index, but I still need to solve this problem.)
>>
>> Rather than write a ton of code for the various types of pages that we have
>> (product, category, etc) I'd like to just be able to call some Tomcat method
>> to determine if the URL that I get back from google would result in a 404 or
>> not.  I'm currently calling back to the Tomcat instance using an HTTP HEAD
>> call, but that is a waste of resources and during periods of high volume
>> uses up processing threads that I want to reserve for actual customers.
>>
>> We are using Tomcat 7 with Struts.
>
> Mitch,
>
> What will you do when you detect a 404? Couldn't you just implement a
> custom 404 error page, that does what ever it is?
>
> Chris




I already have a custom error page.  When I detect that a URL returned 
by google would return a 404, I exclude it from the search results so 
that the user never sees it.


Mitch






Re: Check if a URL exists programatically

2015-07-16 Thread chris derham
> I already have a custom error page.  When I detect that a URL returned by
> google would return a 404, I exclude it from the search results so that the
> user never sees it.
>
> Mitch

Mitch,

Ok I see now what you mean. Sorry your original email was quite clear.

Hmm interesting challenge. Big picture terms, I guess the two obvious
choices seem to be to not use google for searching, or parse the
google results, and determine the url validity as you are doing.
Depending on the urls you use, that could be horrible. Guess that's
where you are. Is not using google an option?

Please let us know how you resolve it.

Chris




Re: Check if a URL exists programatically

2015-07-16 Thread chris derham
> Short question: How can I, from within code running under Tomcat, determine
> if a given URL request to that tomcat instance would result in a 404 or not,
> without calling back to the Tomcat using an HTTP HEAD or GET?
>
> Background: We use google custom search by calling the google server and
> then formatting the results on our search page.  Our range of products is
> fairly fluid, and there is occasionally a gap between when a product goes
> away and the google search index is updated, which would result in a 404 if
> user clicked that link in the search results.  (I know that I can ask google
> to re-index, but I still need to solve this problem.)
>
> Rather than write a ton of code for the various types of pages that we have
> (product, category, etc) I'd like to just be able to call some Tomcat method
> to determine if the URL that I get back from google would result in a 404 or
> not.  I'm currently calling back to the Tomcat instance using an HTTP HEAD
> call, but that is a waste of resources and during periods of high volume
> uses up processing threads that I want to reserve for actual customers.
>
> We are using Tomcat 7 with Struts.

Mitch,

What will you do when you detect a 404? Couldn't you just implement a
custom 404 error page that does whatever it is?

Chris
