Re: Removing the jsessionid for SEO
Hello,

I still didn't find the time to write a blog post about this, so I just put the code on pastebin: http://pastebin.org/31242 I'm looking forward to your feedback :)

I tested this filter on Jetty and Tomcat (with Firefox's User Agent Switcher), where it worked fine. However, as stated in the code, some app servers might behave a little differently, so YMMV.

greetings, Rüdiger

On Monday, 14.04.2008, 16:37 +0200, Korbinian Bachl - privat wrote:
Yeah, it's quite a shame that Google doesn't open-source their logic ;) It would be nice if you could give us the code, however, so we could have a look at it :)

- To unsubscribe, e-mail: [EMAIL PROTECTED] For additional commands, e-mail: [EMAIL PROTECTED]
Re: Removing the jsessionid for SEO
Hello everybody,

I just want to add my 2 cents to this discussion. At IndyPhone we too wanted to get rid of jsessionid URLs in Google's index. It would be nice if the Google bot were as clever as the one from Yahoo and simply removed them itself, but it doesn't.

So I implemented a servlet filter which checks the User-Agent header for the Google bot and skips the URL rewriting just for those clients. As this would generate lots of new sessions, the filter invalidates the session right after the request. Also, if a crawler makes a request containing a jsessionid (which it stored before the filter was deployed), the filter redirects the crawler to the same URL, just without the jsessionid parameter. That way the index gets updated for those old URLs. Now we have almost none of those URLs in Google's index.

If anyone is interested in the code, I'd be willing to publish it. As it is not Wicket specific, I could share it with some generic servlet tools open-source project - is there something like that at Apache or elsewhere? But maybe Google is smarter by now, and it is not required anymore?

-- greetings from Berlin, Rüdiger Schulz www.2rue.de www.indyphone.de - Coole Handy Logos einfach selber bauen
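Rüdiger's actual filter is on pastebin, but the two pieces of logic he describes (recognizing a crawler from the User-Agent header, and stripping the ;jsessionid path parameter for the redirect) can be sketched roughly as below. The class and method names are hypothetical and the bot list is deliberately tiny; this is a sketch of the idea, not his code:

```java
// Hypothetical sketch of the filter's core logic, not Ruediger's actual code.
public class SessionIdFilterSketch {

    // Crude crawler check on the User-Agent header; a real filter would
    // match many more crawlers (or use a maintained list).
    public static boolean isSearchEngineAgent(String userAgent) {
        if (userAgent == null) {
            return false;
        }
        String ua = userAgent.toLowerCase();
        return ua.contains("googlebot") || ua.contains("slurp") || ua.contains("msnbot");
    }

    // Remove a ";jsessionid=..." path parameter so a crawler that stored
    // an old URL can be redirected to the clean equivalent.
    public static String stripJsessionid(String url) {
        int start = url.indexOf(";jsessionid=");
        if (start < 0) {
            return url;                      // nothing to strip
        }
        int query = url.indexOf('?', start); // preserve any query string
        return query < 0
                ? url.substring(0, start)
                : url.substring(0, start) + url.substring(query);
    }
}
```

The real filter would additionally skip URL rewriting for such clients, invalidate the session after the request, and issue the redirect, as described above.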
Re: Removing the jsessionid for SEO
Hi Rüdiger,

AFAIK this could lead to some punishment by Google, as its bot browses the site multiple times using different user agents and origin IPs, and if it sees different behaviour it suspects cloaking / prepared content and will act accordingly. This is usually noticed after the regular Google index refreshes that happen a few times a year - you should keep an eye on this.

Best, Korbinian
Re: Removing the jsessionid for SEO
Hm, SEO really is a bit of a black art sometimes *g*

This (German) article states that SID cloaking is OK with Google: http://www.trafficmaxx.de/blog/google/gutes-cloaking-schlechtes-cloaking

Some more googling, and here someone seems to confirm this: http://www.webmasterworld.com/cloaking/3201743.htm ("I was actually at SMX West and Matt Cutts specifically sa*id* that this is OK")

All I can say for our case is that I added this filter several months ago, and I can't see any negative effects so far.

greetings, Rüdiger

-- greetings from Berlin, Rüdiger Schulz www.2rue.de www.indyphone.de - Coole Handy Logos einfach selber bauen
Re: Removing the jsessionid for SEO
Yeah, it's quite a shame that Google doesn't open-source their logic ;) It would be nice if you could give us the code, however, so we could have a look at it :)
Re: Removing the jsessionid for SEO
Hi Rüdiger,

I would be very interested in the code. If you cannot find a suitable repository, could you just do something simple like linking to a zip from a blog post?

Regards, Erik.
Re: Removing the jsessionid for SEO
I'll wrap something up in the course of this week and post it on my blog. (So little time at the moment.)

greetings, Rüdiger

-- greetings from Berlin, Rüdiger Schulz www.2rue.de www.indyphone.de - Coole Handy Logos einfach selber bauen
Re: Removing the jsessionid for SEO
Hi Jeremy,

You're absolutely right; nearly all spiders today can handle the default sessions, be it Java, PHP, .NET etc.; those guys at Google and Microsoft aren't beginners!

It's also important to understand that a URL in Wicket is, in part, nothing more than a plain string that can be manipulated any way you like. Wicket (i.e. your page) needs one or maybe two parameters - and it has to know how to find them; the rest of the URL can be shaped to fit your needs, as long as you can guarantee a unique content-to-URL mapping so you don't get marked as a duplicate-content spammer.

Best, Korbinian
RE: Removing the jsessionid for SEO
Dan Kaplan-3 wrote:
Google "jsessionid SEO" for more. Most of the results tell you to get rid of the jsessionid. Granted, it doesn't seem Google has specifically mentioned this either way, so all these comments are rumors. But the fact of the matter is that Google *DOES* index your URLs with the jsessionid still in them. You'd think they'd be smart enough to remove that, right? If they can't get that much right, I wouldn't want to make any other assumptions about their abilities on similar matters.

Search Matt Cutts' blog for "session id". He specifically suggests not even including query string parameters that look like session IDs. From what I remember, Google can and does index pages with session IDs, BUT to a reduced degree.

--
View this message in context: http://www.nabble.com/Removing-the-jsessionid-for-SEO-tp16464534p16646137.html
Sent from the Wicket - User mailing list archive at Nabble.com.
Re: Removing the jsessionid for SEO
If I understood you correctly, the first page is bookmarkable and the second is a Wicket URL tied to the session. That'd be bad for SEO - search engines couldn't see page 2, or they would, but the URL is tied to their session, so even if a user visited that URL, they wouldn't get that page. This means that any content past page one is unreachable from a search engine.

I had another thread going about a problem I was having with sessions, which turned up some interesting data. I have over 31,000 pages indexed by Google; they are visiting bookmarkable URLs that DO have a jsessionid in them, but only two pages in their index have a jsessionid. They obviously handle jsessionid fine these days, or at least they do for me.

If you need all of your content to be indexed, you really need to concern yourself with making every page bookmarkable. Take a look at Korbinian's comments above - it looks like he is doing it well. Or have a look at my comments or my site http://www.texashuntfish.com. You should specifically look at http://www.texashuntfish.com/thf/app/forum - I am using DataTables there, but every link (including sort, etc.) is bookmarkable. So, you may go into a category and get a URL like http://www.texashuntfish.com/thf/app/forum/cat-53/Let-s-Talk-Texas-Outdoors-Classifieds-Buy-Sell-Trade or http://www.texashuntfish.com/thf/app/forum/18395/Winchester-22-model-61-for-sell. The cat-53 or the /18395/ are the only things that matter. I have a strategy mounted on /forum that takes the first parameter and uses it to decode what kind of page is being requested - a category page, a specific post, etc. Everything after that first parameter is purely for SEO. Putting good keywords in the URL like that, and putting in the subject of every article / calendar event / news item or forum thread, is what shot us up the rankings of multiple search engines. Migrating the app from what it was before (somerandomscript.cfm?foo=123123&bar=12321) to this made a HUGE difference.

It wasn't without work - Wicket is super easy if you don't have to worry about URLs - but it also makes it easy to totally customize all of your URLs. Shoot back any questions you have. Hopefully I can share more information, or even some code, later. Maybe Korbinian and I should put some information on the wiki about pretty URLs and SEO.

Jeremy

On Fri, Apr 4, 2008 at 1:09 PM, Dan Kaplan [EMAIL PROTECTED] wrote:
Thanks, that's kinda the route I've already taken. On my site, www.startfound.com, if you click on any company to see more details, it goes to a bookmarkable page. Same with any tag. Maybe if I've already got that much, I shouldn't concern myself with the fact that page 2 of my list is not bookmarkable but is reachable by the Google bot. Or maybe I should just add a noindex meta tag on every page that's not page 1. It'd be kinda ridiculous to require login to see past page 1. That may be good for SEO, but it'll drive people away.

-Original Message- From: [EMAIL PROTECTED] [mailto:[EMAIL PROTECTED]] On Behalf Of Jeremy Thomerson Sent: Thursday, April 03, 2008 10:00 PM To: users@wicket.apache.org Subject: Re: Removing the jsessionid for SEO

I've been building a community-driven hunting and fishing site in Texas for the past year and a half. Since I converted it to Wicket from ColdFusion, our search engine rankings have gone WAY UP. That's right, we're on the first page for tons of searches. Search for "texas hunting" - we're second, under only the Texas Parks and Wildlife Association.

How? With Wicket? Yes - it requires a little more work. What I do is that for any link that I want Google to be able to follow, I have a subclass of Link specific to that. For instance, ViewThreadLink, which takes the ID for the link and a (detachable) model of the thread. Then I mount an IRequestTargetUrlCodingStrategy for each big category of things in my webapp. I've made several strategies that I use over and over, just giving them a different mount path and a different parameter to tell them what kind of article, etc., they will match to. This is made easier because over 75% of the objects in our site are similar enough that they extend from a base class providing the basic functionality for an article / thread / etc. that has a title, text, pictures, comments, the standard stuff.

So, yes, it takes work. But that's okay - SEO always takes work. I have also given a lot of care to using good page titles and good semantic HTML, and to stuffing things into the URL that don't have anything to do with locating the resource but give the search engines a clue as to what the content is. Yes, some pages end up with a jsessionid - and I don't like it (example: http://www.google.com/search?hl=en&client=firefox-a&rls=com.ubuntu%3Aen-US%3Aofficial&q=%22south+texas+management+buck%22&btnG=Search). But, most don't because almost all
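The decoding Jeremy describes (only the first path segment after the /forum mount carries meaning; everything after it is keyword filler for the search engines) can be sketched as follows. The class name and result strings are purely illustrative, not his actual coding-strategy code:

```java
// Illustrative sketch of decoding the first segment after a mount point
// such as "/forum"; not Jeremy's actual IRequestTargetUrlCodingStrategy.
public class ForumUrlDecoderSketch {

    // "cat-53/Let-s-Talk-..."  -> "category:53"
    // "18395/Winchester-22..." -> "thread:18395"
    // anything else            -> null
    public static String decode(String pathAfterMount) {
        String first = pathAfterMount.split("/")[0];
        if (first.startsWith("cat-")) {
            return "category:" + first.substring("cat-".length());
        }
        if (first.matches("\\d+")) {
            return "thread:" + first;
        }
        return null; // unrecognized; a real strategy would handle this case
    }
}
```

The SEO keywords after the first segment never reach the decoder, which is why the same page stays reachable even if the title part of the URL changes.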
Re: Removing the jsessionid for SEO
Hi Jeremy, hi Dan,

For a project long ago I had the task of making a product browser SEO friendly. I used a plain PagingNavigator at first, and then extended it to use the IndexedUrlPageParameters; this allowed me to put anything into the path to get a nice URL. The key here is to look at the URL and treat it as a unique resource line. So I did it something like this:

mountName{(/anyparams)}*{/pageNumber}

This gave me the possibility to have a browsing URL where I could put anything in while the rest still works. Remember also that the URL for SEO may (!) change in future, so go for maximally flexible designs: up front you have a resource, then any params to feed the spider (there may be 0 to over 10), and a hook at the end that has to be a number (where 0 is assumed in case the end is not a number). So I was finally able to let the spider see things like:

product/brand_New/BestItemOfTheWorld
product/specialCategory/moreSpecial/moreInfo/2
product/specialCategory/moreSpecial/brandName/moreDetails/1

etc. Now, you wonder: if I feed the spider with this, how do I know where to end? The key was that the part in between got merged internally and was specified by the application, so we overcame the problems of:

a) recreating the right view (here we had a tree-like structure for our products, which we could compare to the tree in the database)
b) duplicate content (very bad! - never, ever let a spider find the same content, or very similar content, under more than one URL!)

This strategy did very well. Today with Wicket 1.3 I would go nearly the same way but stick to the HybridURL scheme, and maybe try to be even more flexible with the URL scheme by having the basic schemes and resources specified in persistence (URL hook, initial state). Remember, it is important to serve the same resources under the same URLs, or else the spider will think you might be trying to fake content.

The jsessionid is something I don't care about anymore - it's 2008, spiders know it, and the usual visitor/surfer has no clue how to tell a URL from an email address. However, many people have turned cookies + JS off because of security fears - so in turn the jsessionid will concern only the few people who know about such details, but hamper many people who have no knowledge of the internet and its techniques - IMHO.

@Jeremy: your approach also seems interesting to me; can you give more details about it?

Best, Korbinian
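Korbinian's mountName{(/anyparams)}*{/pageNumber} scheme treats the trailing segment as a page number only when it is numeric, falling back to page 0 otherwise. A minimal sketch of that rule (hypothetical class name, not his original code):

```java
// Sketch of the trailing "page hook": a numeric last segment is the page
// number, anything else means page 0. Hypothetical, not Korbinian's code.
public class IndexedUrlSketch {

    public static int pageNumber(String path) {
        String[] segments = path.split("/");
        String last = segments[segments.length - 1];
        return last.matches("\\d+") ? Integer.parseInt(last) : 0;
    }
}
```

With this rule, any number of SEO-only segments can sit between the mount name and the page hook without breaking navigation.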
RE: Removing the jsessionid for SEO
That is helpful, but: "This is an extension of the standard, so not all bots may follow it." I wonder if the major ones do...

-Original Message- From: [EMAIL PROTECTED] [mailto:[EMAIL PROTECTED]] On Behalf Of Jeremy Levy Sent: Thursday, April 03, 2008 6:16 PM To: users@wicket.apache.org Subject: Re: Removing the jsessionid for SEO

We have a similar issue, and are trying the following out right now: http://www.google.com/support/webmasters/bin/answer.py?hl=en&answer=40367

User-agent: *
Disallow: /*?

On Thu, Apr 3, 2008 at 9:09 PM, Dan Kaplan [EMAIL PROTECTED] wrote:
Ok, at least I'm not missing anything. I understand the benefits Wicket provides with its stateful framework. Developing a site with Wicket is easier than with any other framework I've used. But this statefulness, which makes websites so easy to develop, seems counterproductive for SEO: GoogleBot will follow and index stateful links. Worst case, these actually become visible to Google users, and when they click such a link it takes them to an invalid-session page. They think "this site is broken" and move on to the next link in their search results.

Another approach to solving this is to block all the stateful pages in my robots.txt file. But how can I block these links in robots.txt when they change per session? Is there any way to know what the URL will resolve to when GoogleBot visits my site, so I can tell it to disallow /?wicket:interface=:10:1::: and ?wicket:interface=:0:1::: and ...?

-Original Message- From: Igor Vaynberg [mailto:[EMAIL PROTECTED]] Sent: Thursday, April 03, 2008 5:45 PM To: users@wicket.apache.org Subject: Re: Removing the jsessionid for SEO

On Thu, Apr 3, 2008 at 5:31 PM, Dan Kaplan [EMAIL PROTECTED] wrote:
Ok, I did a little preliminary research on this. Right now PagingNavigator uses PagingNavigationLinks to represent its pages; these extend Link. I'm supposed to override PagingNavigator's newPagingNavigationLink() method to accomplish this (I think), but past that, this isn't very straightforward to me. Do I need to create my own BookmarkablePagingNavigationLink? When I do... what next? I really don't know enough about BookmarkablePageLinks to do this. Right now, all the magic happens inside PagingNavigationLink. Won't I have to move all that logic into the WebPage that I'm passing into BookmarkablePagingNavigationLink? This seems like a lot of work. Am I missing something critical?

No, you are not missing anything. You see, when you go stateless, like you want, you have to recreate all the magic stuff that makes stateful links Just Work. Without state you are back to the servlet/MVC programming model: you have to encode the state that you want into the link, then on the trip back decode it, recreate something from it, and then apply that something onto the components. This is the crapwork that Wicket usually does for you. -igor

-Original Message- From: Igor Vaynberg [mailto:[EMAIL PROTECTED]] Sent: Thursday, April 03, 2008 3:40 PM To: users@wicket.apache.org Subject: Re: Removing the jsessionid for SEO

You subclass the PagingNavigator and make it use bookmarkable links also; it has factory methods for all the links it uses. -igor

On Thu, Apr 3, 2008 at 3:36 PM, Dan Kaplan [EMAIL PROTECTED] wrote:
I wasn't talking about the links that are in the list (I already make those bookmarkable). I'm talking about the links that the navigator generates. How do I make it so page 2 is bookmarkable?

-Original Message- From: Igor Vaynberg [mailto:[EMAIL PROTECTED]] Sent: Thursday, April 03, 2008 3:30 PM To: users@wicket.apache.org Subject: Re: Removing the jsessionid for SEO

Instead of item.add(new Link("foo") { onClick() { ... } }); do item.add(new BookmarkablePageLink("foo", Page.class)); -igor

On Thu, Apr 3, 2008 at 3:28 PM, Dan Kaplan [EMAIL PROTECTED] wrote:
How? I asked how to do it before and nobody suggested this as a possibility.

-Original Message- From: Igor Vaynberg [mailto:[EMAIL PROTECTED]] Sent: Thursday, April 03, 2008 3:26 PM To: users@wicket.apache.org Subject: Re: Removing the jsessionid for SEO

DataView can work in a stateless mode, just use bookmarkable links inside it -igor

On Thu, Apr 3, 2008 at 3:22 PM, Dan Kaplan [EMAIL PROTECTED
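Igor's point about going stateless - encode the state you want into the link, then decode it on the way back - can be illustrated outside of Wicket with plain strings (the names here are illustrative, not Wicket API; Wicket's bookmarkable links and coding strategies normally do this work for you):

```java
// Plain-Java illustration of the stateless round trip: the page index is
// encoded into the link and decoded again on the next request.
// Not Wicket API, just the underlying idea.
public class StatelessPagingSketch {

    // Encode: build the link target for a given page index.
    public static String pageLink(String basePath, int pageIndex) {
        return basePath + "?page=" + pageIndex;
    }

    // Decode: recover the index from the query string, defaulting to 0.
    public static int decodePage(String queryString) {
        for (String pair : queryString.split("&")) {
            if (pair.startsWith("page=")) {
                try {
                    return Integer.parseInt(pair.substring("page=".length()));
                } catch (NumberFormatException e) {
                    return 0; // malformed value, fall back to the first page
                }
            }
        }
        return 0;
    }
}
```

This is exactly the "crapwork" a stateful link hides: every piece of state the page needs must survive the round trip through the URL.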
Re: Removing the jsessionid for SEO
Hi!

igor.vaynberg wrote:
also by doing what you have done, users with cookies disabled won't be able to use your site...

In my opinion the session ID is a problem: Google indexes the same page again and again. For the users without cookies we can do something like this:

static class Unbuffered extends WebResponse {

    private static final String[] botAgents = {
        "onetszukaj", "googlebot", "appie", "architext", "jeeves", "bjaaland",
        "ferret", "gulliver", "harvest", "htdig", "linkwalker", "lycos_",
        "moget", "muscatferret", "myweb", "nomad", "scooter",
        "yahoo!\\sslurp\\schina", "slurp", "weblayers", "antibot", "bruinbot",
        "digout4u", "echo!", "ia_archiver", "jennybot", "mercator", "netcraft",
        "msnbot", "petersnews", "unlost_web_crawler", "voila", "webbase",
        "webcollage", "cfetch", "zyborg", "wisenutbot", "robot", "crawl",
        "spider" }; /* and so on... */

    public Unbuffered(final HttpServletResponse res) {
        super(res);
    }

    @Override
    public CharSequence encodeURL(final CharSequence url) {
        return isAgent() ? url : super.encodeURL(url);
    }

    private static boolean isAgent() {
        String agent = ((WebRequest) RequestCycle.get().getRequest())
                .getHttpServletRequest().getHeader("User-Agent");
        if (agent == null) {
            return false;
        }
        for (String bot : botAgents) {
            if (agent.toLowerCase().indexOf(bot) != -1) {
                return true;
            }
        }
        return false;
    }
}

I didn't test this code, but I do a similar thing in my old application in Spring and it works.

Take care, Artur

--
View this message in context: http://www.nabble.com/Removing-the-jsessionid-for-SEO-tp16464534p16467396.html
Sent from the Wicket - User mailing list archive at Nabble.com.
Re: Removing the jsessionid for SEO
Isn't Google always saying that you shouldn't alter the behavior of your site depending on whether it's their bot or not?

On Thu, Apr 3, 2008 at 1:00 PM, Artur W. [EMAIL PROTECTED] wrote: [...]

- To unsubscribe, e-mail: [EMAIL PROTECTED] For additional commands, e-mail: [EMAIL PROTECTED]
RE: Removing the jsessionid for SEO
When Google asks you not to give their bot special treatment, they are referring to content more than anything. Regarding the session id being encoded in the URL, see the Technical guidelines section of Google's Webmaster Guidelines - http://www.google.com/support/webmasters/bin/answer.py?answer=35769#design It specifically recommends "allow[ing] search bots to crawl your sites without session IDs or arguments that track their path through the site."

-Original Message-
From: Johan Compagner [mailto:[EMAIL PROTECTED]
Sent: Thursday, April 03, 2008 7:35 AM
To: users@wicket.apache.org
Subject: Re: Removing the jsessionid for SEO

isnt google always saying that you shouldn't alter behavior of your site depending of it is there bot or not?

On Thu, Apr 3, 2008 at 1:00 PM, Artur W. [EMAIL PROTECTED] wrote: [...]

__ The information contained in this message is proprietary and/or confidential. If you are not the intended recipient, please: (i) delete the message and all copies; (ii) do not disclose, distribute or use the message in any manner; and (iii) notify the sender immediately. In addition, please be aware that any message addressed to our domain is subject to archiving and review by persons other than the intended recipient. Thank you. _
Re: Removing the jsessionid for SEO
The problem is that then you have to have all stateless pages, else Google can't crawl your website. And if that is the case, you could be completely stateless, so you don't have a session (id) to worry about at all.

johan

On Thu, Apr 3, 2008 at 4:54 PM, Zappaterrini, Larry [EMAIL PROTECTED] wrote: [...]
Re: Removing the jsessionid for SEO
Right. If you strip the session id then all your non-bookmarkable URLs will resolve to a 404. That will probably drop your rank a lot faster.

-igor

On Thu, Apr 3, 2008 at 9:16 AM, Johan Compagner [EMAIL PROTECTED] wrote: [...]
Re: Removing the jsessionid for SEO
On the other hand, crawling non-bookmarkable pages is not very useful anyway, since a ?wicket:interface URL will always get "page expired" when you click on the result. However, preserving the session makes a lot of sense with hybrid URLs: Google remembers the original URL (without the page instance) while indexing the real page (after the redirect). I think, though, that the crawler is quite advanced; I would think it supports cookies (at least JSESSIONID) and evaluates some of the JavaScript on the page.

-Matej

On Thu, Apr 3, 2008 at 6:56 PM, Igor Vaynberg [EMAIL PROTECTED] wrote: [...]
RE: Removing the jsessionid for SEO
Regardless, at the very least this makes your site look weird and unprofessional when Google puts a jsessionid on your URL. There has got to be some negative effect when Google visits it a second time, the jsessionid has changed, but it sees the exact same content. Worst case, it'll think you're trying to trick it.

About those 404s: I'm finding that with the fix I provided I don't get a 404, but the links refresh the page I'm already on. I.e., if I'm on A and a link to B is non-bookmarkable, clicking B refreshes A. This issue is very disconcerting to me. It's one of the reasons I wish that DataView had an option to work in stateless mode. Cause if I ban cookies and Googlebot visits my home page (with a navigator on it), it'll try to follow all those page links, and from its perspective they all lead back to the first page. So it's kind of a catch-22: include the jsessionid in the URLs and get bad SEO, or remove the jsessionid and get bad SEO :(

Perhaps the answer to my prayers is a combination of the noindex/nofollow meta tag with a sitemap.xml. I'm thinking I can put a nofollow on the home page (so Googlebot doesn't try to follow the navigator links) and use the sitemap.xml to point out the individual pages I want it to index.

Matej: can you go into more detail about your hybrid URL statement? Won't Google index, for example, /home and /home.1 if I use it? When it follows the next page, won't the URL become /home.1.2 or something? That .2 is a page version: if Google indexes that and tries to visit it again, won't it report an invalid session?

-Original Message-
From: Matej Knopp [mailto:[EMAIL PROTECTED]
Sent: Thursday, April 03, 2008 11:10 AM
To: users@wicket.apache.org
Subject: Re: Removing the jsessionid for SEO

On the other hand, crawling non-bookmarkable pages is not very useful anyway [...]
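One half of that catch-22 can at least be contained: a filter can 301-redirect any crawler request whose URL still carries a session id to the clean URL, so stale indexed entries eventually get replaced. The URL rewrite itself is simple; a minimal sketch, with the class and method names my own:

```java
public class JsessionidStripper {

    /**
     * Removes a ";jsessionid=..." path parameter from a URL, keeping any
     * query string intact. Returns the URL unchanged if no session id is
     * present. A filter would send the result as a 301 redirect target.
     */
    public static String stripJsessionid(String url) {
        int start = url.indexOf(";jsessionid=");
        if (start == -1) {
            return url;
        }
        // The path parameter runs until the query string (or the end of the URL).
        int end = url.indexOf('?', start);
        return (end == -1) ? url.substring(0, start)
                           : url.substring(0, start) + url.substring(end);
    }
}
```

Using a permanent (301) rather than temporary (302) redirect is what tells the crawler to update its index to the clean URL.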
Re: Removing the jsessionid for SEO
dataview can work in a stateless mode, just use bookmarkable links inside it

-igor

On Thu, Apr 3, 2008 at 3:22 PM, Dan Kaplan [EMAIL PROTECTED] wrote: [...]
RE: Removing the jsessionid for SEO
Clarifications: when I said "About those 404s," I was talking about the case where you use the fix I provided and turn off cookies in your browser. And when I said "if I ban cookies," I meant to say "if I require cookies."

-Original Message-
From: Dan Kaplan [mailto:[EMAIL PROTECTED]
Sent: Thursday, April 03, 2008 3:22 PM
To: users@wicket.apache.org
Subject: RE: Removing the jsessionid for SEO

Regardless, at the very least this makes your site look weird and unprofessional when google puts a jsessionid on your url. [...]
RE: Removing the jsessionid for SEO
How? I asked how to do it before and nobody suggested this as a possibility. -Original Message- From: Igor Vaynberg [mailto:[EMAIL PROTECTED] Sent: Thursday, April 03, 2008 3:26 PM To: users@wicket.apache.org Subject: Re: Removing the jsessionid for SEO dataview can work in a stateless mode, just use bookmarkable links inside it -igor On Thu, Apr 3, 2008 at 3:22 PM, Dan Kaplan [EMAIL PROTECTED] wrote: Regardless, at the very least this makes your site look weird and unprofessional when google puts a jsessionid on your url. There has got to be some negative effect when google visits it the second time and the jsessionid has changed but it sees the same exact content. Worst case, it'll think you're trying to trick it. About those 404s, I'm finding that with the fix I provided I don't get a 404, but the links refresh the page I'm already on. IE: If I'm on A, and a link to B is non-bookmarkable, clicking B refreshes A. This issue is very disconcerting to me. It's one of the reasons I wish that DataView had an option to work in stateless mode. Cause if I ban cookies and Googlebot visits my home page (with a navigator on it), it'll try to follow all these page links and from its perspective, they all lead back to the first page. So it's kinda a catch-22: Include the jsessionid in the urls and get bad SEO or remove the jsessionid and get bad SEO :( Perhaps the answer to my prayers is a combination of the noindex/nofollow meta tag with a sitemap.xml. I'm thinking I can put a nofollow on the home page (so googlebot doesn't try to follow the navigator links) and use the sitemap.xml to point out the individual pages I want it to index. Matej: can you go into more detail about your hybrid URL statement? Won't google index, for example, /home and /home.1 if I use it? When it follows the next page, won't the url become /home.1.2 or something? That .2 is a page version: If google indexes that and tries to visit it again, won't it report about an invalid session? 
-Original Message- From: Matej Knopp [mailto:[EMAIL PROTECTED] Sent: Thursday, April 03, 2008 11:10 AM To: users@wicket.apache.org Subject: Re: Removing the jsessionid for SEO On the other hand, crawling non-bookmarkable pages is not very useful anyway, since ?wicket:interface url will always get page expired when you click on the result. However, preserving session makes lot of sense with hybrid url. Google remembers the original url (without page instance) while indexing the real page (after redirect). I think though that the crawler is quite advanced. I'm would think it supports cookies (at least JSESSIONID) as well as it evaluates some of the javascript on page. -Matej On Thu, Apr 3, 2008 at 6:56 PM, Igor Vaynberg [EMAIL PROTECTED] wrote: right. if you strip sessionid then all your nonbookmarkable urls will resolve to a 404. that will probably drop your rank a lot faster -igor On Thu, Apr 3, 2008 at 9:16 AM, Johan Compagner [EMAIL PROTECTED] wrote: the problem is that then you have to have all stateless pages. Else google can't crawl your website. And if that is the case then you could be completely stateless so you dont have a session (id) to worry about at all. johan On Thu, Apr 3, 2008 at 4:54 PM, Zappaterrini, Larry [EMAIL PROTECTED] wrote: When Google asks to not have special treatment for their bot, they are referring to content more than anything. Regarding the session id being coded in the URL, see the Technical guidelines section of Google's Webmaster Guidelines - http://www.google.com/support/webmasters/bin/answer.py?answer=35769#desi gn It specifically recommends allow(ing) search bots to crawl your sites without session IDs or arguments that track their path through the site. 
Johan Compagner wrote:
> Isn't google always saying that you shouldn't alter the behavior of your site depending on whether it is their bot or not?

Artur W. wrote:
> Hi!
>
> igor.vaynberg wrote: "also by doing what you have done users with cookies disabled won't be able to use your site..."
>
> In my opinion the session id is a problem. Google indexes the same page again and again. For the users without cookies we can do something like this:
>
> static class Unbuffered extends WebResponse { private static final
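A recurring suggestion in the thread is a servlet filter that skips jsessionid URL rewriting for known crawlers and redirects crawlers away from session-bearing URLs. Its two core decisions (is this request from a crawler, and what does a URL look like with the session id removed) can be sketched framework-free. The class, method, and bot-substring names below are illustrative assumptions, not the posted filter code:

```java
// Illustrative helpers for a bot-aware jsessionid filter (names are made up).
public class JsessionidUtil {

    // Crude crawler check against the User-Agent header; the substrings
    // below are an assumption about typical bot identifiers of the era.
    public static boolean isCrawler(String userAgent) {
        if (userAgent == null) {
            return false;
        }
        String ua = userAgent.toLowerCase();
        return ua.contains("googlebot") || ua.contains("slurp") || ua.contains("msnbot");
    }

    // Remove a ";jsessionid=..." path parameter while preserving any query
    // string, e.g. to 301-redirect a crawler to the session-free URL.
    public static String stripJsessionid(String url) {
        int start = url.indexOf(";jsessionid=");
        if (start < 0) {
            return url; // nothing to strip
        }
        int query = url.indexOf('?', start);
        return query < 0
                ? url.substring(0, start)
                : url.substring(0, start) + url.substring(query);
    }
}
```

A real filter would additionally wrap the response so that encodeURL() returns the unrewritten URL for crawler requests, and invalidate the throwaway sessions those requests create, as Rüdiger's filter is described as doing.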
Re: Removing the jsessionid for SEO
instead of item.add(new Link("foo") { onClick() }); do item.add(new BookmarkablePageLink("foo", Page.class)); -igor

Dan Kaplan wrote:
> How? I asked how to do it before and nobody suggested this as a possibility.

[snip]
Re: Removing the jsessionid for SEO
Dan Kaplan wrote:
> Regardless, at the very least this makes your site look weird and unprofessional when google puts a jsessionid on your url.

0.5% of your users care about the URL that is displayed in a google search result. It doesn't look weird or unprofessional. It is not like your URL ends in .php or *gawk* .asp, is it? It brings the sophistication of Java to your users.

> There has got to be some negative effect when google visits it the second time and the jsessionid has changed but it sees the same exact content. Worst case, it'll think you're trying to trick it.

I think you need to give the google engineers *some* credit. I seriously doubt they are *THAT* stupid.

Martijn

--
Buy Wicket in Action: http://manning.com/dashorst
Apache Wicket 1.3.2 is released. Get it now: http://www.apache.org/dyn/closer.cgi/wicket/1.3.2
RE: Removing the jsessionid for SEO
I wasn't talking about the links that are in the list (I already make those bookmarkable). I'm talking about the links that the Navigator generates. How do I make it so "page 2" is bookmarkable?

Igor Vaynberg wrote:
> instead of item.add(new Link("foo") { onClick() }); do item.add(new BookmarkablePageLink("foo", Page.class)); -igor

[snip]
Re: Removing the jsessionid for SEO
you subclass the PagingNavigator and make it use bookmarkable links also. it has factory methods for all the links it uses. -igor

Dan Kaplan wrote:
> I wasn't talking about the links that are in the list (I already make those bookmarkable). I'm talking about the links that the Navigator generates. How do I make it so "page 2" is bookmarkable?

[snip]
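Igor's factory-method suggestion might look like the sketch below. This is not runnable as-is: the class and method names (PagingNavigator, newPagingNavigationLink, BookmarkablePageLink, PageParameters) are recalled from the Wicket 1.3-era API and should be checked against your version's javadoc, and the "page" parameter name is an invented convention:

```
// SKETCH ONLY - verify signatures against your Wicket version's javadoc.
public class BookmarkablePagingNavigator extends PagingNavigator {
    private final Class pageClass; // bookmarkable page that hosts the list

    public BookmarkablePagingNavigator(String id, IPageable pageable, Class pageClass) {
        super(id, pageable);
        this.pageClass = pageClass;
    }

    // Factory method override: emit a bookmarkable link that carries the
    // page number as a URL parameter instead of a stateful callback.
    protected Link newPagingNavigationLink(String id, IPageable pageable, int pageNumber) {
        PageParameters params = new PageParameters();
        params.put("page", String.valueOf(pageNumber)); // hypothetical parameter name
        return new BookmarkablePageLink(id, pageClass, params);
    }
}
```

The target page's constructor would then read the "page" parameter back out of its PageParameters and call setCurrentPage on the pageable.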
RE: Removing the jsessionid for SEO
Awesome, thanks!

Igor Vaynberg wrote:
> you subclass the PagingNavigator and make it use bookmarkable links also. it has factory methods for all the links it uses. -igor

[snip]
RE: Removing the jsessionid for SEO
Martijn Dashorst wrote:
> 0.5% of your users care about the URL that is displayed in a google search result. It doesn't look weird or unprofessional. It is not like your URL ends in .php or *gawk* .asp, is it? It brings the sophistication of Java to your users.

My URL ends with ;jsessionid=an7goabg0az (my actual situation). I personally think that looks weirder than .php or .asp. Where did you get that 0.5% statistic? Regardless, my users won't see ANY url if my site is on the 50th page of the search results. That's the important issue here.

> I think you need to give the google engineers *some* credit. I seriously doubt they are *THAT* stupid.

These links suggest otherwise:

http://www.webmasterworld.com/google/3238326.htm
http://www.webmasterworld.com/forum3/5624.htm
http://www.webmasterworld.com/forum3/5479.htm
http://randomcoder.com/articles/jsessionid-considered-harmful

Google "jsessionid SEO" for more. Most of the results tell you to get rid of the jsessionid. Granted, google doesn't seem to have addressed this specifically either way, so all these comments are rumors. But the fact of the matter is that Google *DOES* index your urls with the jsessionid still in them. You'd think they'd be smart enough to remove that, right? If they can't get that much right, I wouldn't want to make any other assumptions about their abilities on similar matters.
Re: Removing the jsessionid for SEO
Dan Kaplan wrote:
> My URL ends with ;jsessionid=an7goabg0az (my actual situation). I personally think that looks weirder than .php or .asp.

Nah, it shows that you are using Java. Much more sophisticated!

> Where did you get that 0.5% statistic? Regardless, my users won't see ANY url if my site is on the 50th page of the search results. That's the important issue here.

I made the 0.5% statistic up. Developers are notoriously anal about URLs, while John and Jane Doe typically just use the google search box as their URL bar. Did you ever look at the URLs of Amazon? They are not pretty, and you'd need a very weird jsessionid to overthrow Amazon's URL scheme on the ugly scale. Where's the proof that Google punishes you for having a jsessionid in the URL?

> These links suggest otherwise:
> http://www.webmasterworld.com/google/3238326.htm
> http://www.webmasterworld.com/forum3/5624.htm
> http://www.webmasterworld.com/forum3/5479.htm
> http://randomcoder.com/articles/jsessionid-considered-harmful

These links are from 2002 (over 5 years ago). Wicket wasn't even born then. I surely hope that technology has evolved since then. Anyway, I'm glad I don't have to build apps that require SEO or public bots that navigate our sites. In fact, if that ever happened, I think our company would instantly be very famous (we deal with privacy-sensitive information that should stay out of Google/Yahoo/LiveSearch's indexes).

Martijn
Re: Removing the jsessionid for SEO
Dan Kaplan wrote:
> Ok I did a little preliminary research on this. Right now PagingNavigator uses PagingNavigationLinks to represent its pages. This extends Link. I'm supposed to override PagingNavigator's newPagingNavigationLink() method to accomplish this (I think), but past that, this isn't very straightforward to me. Do I need to create my own BookmarkablePagingNavigationLink? When I do... what next? I really don't know enough about BookmarkablePageLinks to do this. Right now, all the magic happens inside PagingNavigationLink. Won't I have to move all that logic into the WebPage that I'm passing into BookmarkablePagingNavigationLink? This seems like a lot of work. Am I missing something critical?

no, you are not missing anything. you see, when you go stateless, like you want, you have to recreate all the magic stuff that makes stateful links Just Work. Without state you are back to the servlet/mvc programming model: you have to encode the state that you want into the link, then on the trip back decode it, recreate something from it, and then apply that something onto the components. This is the crapwork that wicket usually does for you.

-igor

[snip]
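The encode-then-decode round trip Igor describes can be illustrated without any framework: a stateless ("bookmarkable") paging link must carry its state in the URL itself, and the server must parse that state back out on the next request. The class and parameter names here are hypothetical, not Wicket's:

```java
// Framework-free illustration of stateless paging: the link encodes the
// page number into the URL, and the server decodes it on the way back.
public class StatelessPaging {

    // Encode the target page number into the link's URL.
    public static String pageLink(String basePath, int pageNumber) {
        return basePath + "?page=" + pageNumber;
    }

    // Decode the page number back out of the URL's query string; fall back
    // to page 0 when the parameter is absent or malformed.
    public static int decodePage(String url) {
        int q = url.indexOf('?');
        if (q < 0) {
            return 0;
        }
        for (String pair : url.substring(q + 1).split("&")) {
            String[] kv = pair.split("=", 2);
            if (kv.length == 2 && kv[0].equals("page")) {
                try {
                    return Integer.parseInt(kv[1]);
                } catch (NumberFormatException e) {
                    return 0;
                }
            }
        }
        return 0;
    }
}
```

This is exactly the bookkeeping a stateful link spares you: with state on the server, the framework can map an opaque callback URL straight to a component, with no hand-written decode step.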
RE: Removing the jsessionid for SEO
Ok I did a little preliminary research on this. Right now PagingNavigator uses PagingNavigationLinks to represent its pages. This extends Link. I'm supposed to override PagingNavigator's newPagingNavigationLink() method to accomplish this (I think), but past that, this isn't very straightforward to me. Do I need to create my own BookmarkablePagingNavigationLink? When I do... what next? I really don't know enough about BookmarkablePageLinks to do this. Right now, all the magic happens inside PagingNavigationLink. Won't I have to move all that logic into the WebPage that I'm passing into BookmarkablePagingNavigationLink? This seems like a lot of work. Am I missing something critical?

Igor Vaynberg wrote:
> you subclass the PagingNavigator and make it use bookmarkable links also. it has factory methods for all the links it uses. -igor

[snip]
RE: Removing the jsessionid for SEO
Ok, at least I'm not missing anything. I understand the benefits it provides with its stateful framework: developing a site with Wicket is easier than with any other framework I've used. But this statefulness, which makes websites so easy to develop, seems to be counterproductive for SEO: GoogleBot will follow and index stateful links. Worst case scenario, these actually become visible to google users, and when someone clicks one it takes them to an invalid-session page. They think, "This site is broken," and move on to the next link in their search results.

Another approach to solving this is to block all the stateful pages in my robots.txt file. But how can I block these links in robots.txt when they change per session? Is there any way to know what the url will resolve to when googlebot visits my site, so I can tell it to disallow /?wicket:interface=:10:1::: and ?wicket:interface=:0:1::: and so on?

Igor Vaynberg wrote:
> no, you are not missing anything.
> [snip]
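On the robots.txt question above: Googlebot (unlike the minimal robots.txt standard) honors `*` wildcards in Disallow rules, so the per-session numbers never need to be enumerated. A sketch of the idea, assuming all stateful URLs contain `wicket:interface=`:

```
# robots.txt sketch (wildcard support is a Googlebot extension,
# not guaranteed for every crawler)
User-agent: Googlebot
Disallow: /*wicket:interface=
```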
you see, when you go stateless, like what you want, then you have to recreate all the magic stuff that makes stateful links Just Work. Without state you are back to the servlet/mvc programming model: you have to encode the state that you want into the link, then on the trip back decode it, recreate something from it, and then apply that something onto the components. This is the crapwork that wicket does for you usually. -igor

-Original Message- From: Igor Vaynberg [mailto:[EMAIL PROTECTED] Sent: Thursday, April 03, 2008 3:40 PM To: users@wicket.apache.org Subject: Re: Removing the jsessionid for SEO

you subclass the pagenavigator and make it use bookmarkable links also. it has factory methods for all the links it uses. -igor

On Thu, Apr 3, 2008 at 3:36 PM, Dan Kaplan [EMAIL PROTECTED] wrote: I wasn't talking about the links that are on the list (I already make those bookmarkable). I'm talking about the links that the Navigator generates. How do I make it so page 2 is bookmarkable?

-Original Message- From: Igor Vaynberg [mailto:[EMAIL PROTECTED] Sent: Thursday, April 03, 2008 3:30 PM To: users@wicket.apache.org Subject: Re: Removing the jsessionid for SEO

instead of item.add(new link(foo) { onclick() }); do item.add(new bookmarkablepagelink(foo, page.class)); -igor

On Thu, Apr 3, 2008 at 3:28 PM, Dan Kaplan [EMAIL PROTECTED] wrote: How? I asked how to do it before and nobody suggested this as a possibility.

-Original Message- From: Igor Vaynberg [mailto:[EMAIL PROTECTED] Sent: Thursday, April 03, 2008 3:26 PM To: users@wicket.apache.org Subject: Re: Removing the jsessionid for SEO

dataview can work in a stateless mode, just use bookmarkable links inside it -igor

On Thu, Apr 3, 2008 at 3:22 PM, Dan Kaplan [EMAIL PROTECTED] wrote: Regardless, at the very least this makes your site look weird and unprofessional when google puts a jsessionid on your url.
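Igor's point about encoding state into the link and decoding it on the way back can be sketched in plain Java. This is not Wicket API -- the class and method names below are purely illustrative of the round trip a stateless link has to perform by hand:

```java
import java.util.LinkedHashMap;
import java.util.Map;

public class StatelessLinkState {

    // Encode the state a stateful link would keep in the session
    // (here: page number and sort column) into the URL itself.
    public static String encode(String basePath, int page, String sort) {
        return basePath + "?page=" + page + "&sort=" + sort;
    }

    // On the way back in, decode the query string into a parameter map
    // the page can use to recreate its components.
    public static Map<String, String> decode(String query) {
        Map<String, String> params = new LinkedHashMap<String, String>();
        for (String pair : query.split("&")) {
            int eq = pair.indexOf('=');
            params.put(pair.substring(0, eq), pair.substring(eq + 1));
        }
        return params;
    }

    public static void main(String[] args) {
        String url = encode("/search", 2, "title");
        // -> /search?page=2&sort=title
        Map<String, String> params = decode(url.substring(url.indexOf('?') + 1));
        System.out.println(params.get("page")); // -> 2
    }
}
```

In Wicket terms, the encode half is roughly what a BookmarkablePageLink with PageParameters does for you, and the decode half is the page constructor reading those parameters back out.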
There has got to be some negative effect when google visits it the second time and the jsessionid has changed but it sees the same exact content. Worst case, it'll think you're trying to trick it. About those 404s, I'm finding that with the fix I provided I don't get a 404, but the links refresh the page I'm already on. IE: If I'm on A, and a link to B is non-bookmarkable, clicking B refreshes A. This issue is very disconcerting to me. It's one of the reasons I wish that DataView had an option to work in stateless mode. Cause if I
Re: Removing the jsessionid for SEO
We have a similar issue, and are trying the following out right now: http://www.google.com/support/webmasters/bin/answer.py?hl=en&answer=40367

User-agent: *
Disallow: /*?
Re: Removing the jsessionid for SEO
On Thu, Apr 3, 2008 at 6:09 PM, Dan Kaplan [EMAIL PROTECTED] wrote: Ok, at least I'm not missing anything. I understand the benefits it's providing with its stateful framework. Developing a site with Wicket is easier than with any other framework I've used. But this statefulness, which makes websites so easy to develop, seems to be counterproductive for SEO:

well, perhaps the differentiator here is that wicket is made for web applications, not web sites.

GoogleBot will follow and index stateful links. Worst case scenario, these actually become visible to google users and when they click the link it takes them to an invalid session page. They think, "This site is broken," and move on to the next link of their search result.

yep, you need to make sure that all stateful links are behind a login or something similar that the bot can't get past.

Another approach to solving this is to block all the stateful pages in my robots.txt file. But how can I block these links in robots.txt since they change per session? Is there any way to know what the url will resolve to when googlebot tries to visit my site so I can tell it to disallow: /?wicket:interface=:10:1::: and ?wicket:interface=:0:1::: and ...?

no there isn't a way, you have to use wildcard masks. on the other hand it is not that difficult to develop the stateless paging navigator, it will take a bit of work though. -igor
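Igor's wildcard-mask suggestion can be sketched as a robots.txt along these lines. Note this is an assumption-laden sketch: the `*` wildcard in Disallow is a Google extension rather than part of the original robots exclusion standard, and the exact patterns here are illustrative, not tested against a live Wicket app:

```
User-agent: *
# block any URL carrying Wicket's stateful interface parameter
Disallow: /*wicket:interface
# block any URL with a jsessionid path parameter
Disallow: /*;jsessionid
```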
Re: Removing the jsessionid for SEO
To clarify my message below: with a CryptedUrlWebRequestCodingStrategy and a lot of BookmarkablePages.

On Thu, Apr 3, 2008 at 9:16 PM, Jeremy Levy [EMAIL PROTECTED] wrote: We have a similar issue, and are trying the following out right now: http://www.google.com/support/webmasters/bin/answer.py?hl=en&answer=40367 User-agent: * Disallow: /*?
Re: Removing the jsessionid for SEO
I've been building a community-driven hunting and fishing site in Texas for the past year and a half. Since I converted it to Wicket from ColdFusion, our search engine rankings have gone WAY UP. That's right, we're on the first page for tons of searches. Search for "texas hunting" - we're second under only the Texas Parks and Wildlife Association.

How? With Wicket? Yes - it requires a little more work. What I do is that for any link that I want Google to be able to follow, I have a subclass of Link specific to that. For instance, ViewThreadLink, which takes the ID for the link and a model (detachable) of the thread. Then I mount an IRequestTargetUrlCodingStrategy for each big category of things in my webapp. I've made several strategies that I use over and over, just giving them a different mount path and a different parameter to tell it what kind of article, etc, it will match to. This is made easier because over 75% of the objects in our site are similar enough that they extend from a base class that provides the basic functionality for an article / thread / etc that has a title, text, pictures, comments, the standard stuff.

So, yes, it takes work. But that's okay - SEO always takes work. I also have given a lot of care to use good page titles, good semantic HTML, and to stuff things into the URL that don't have anything to do with locating the resource, but give the search engines a clue as to what the content is. Yes, some pages end up with a jsessionid - and I don't like it (example: http://www.google.com/search?hl=en&client=firefox-a&rls=com.ubuntu%3Aen-US%3Aofficial&q=%22south+texas+management+buck%22&btnG=Search). But most don't, because almost all of my links are bookmarkable. When the user clicks something that they can only do as a signed-in user, it redirects them to the sign in page, they sign in, and are taken back to the page they were on.
Then they can pick up where they left off, and I don't worry about bookmarkable URLs for anything that requires user-authentication (wizards to post a new listing, story, admin links, etc).

Jeremy Thomerson
TexasHuntFish.com
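Jeremy's trick of stuffing descriptive words into the URL is usually done with a "slug" derived from the title. A minimal sketch of such a helper (the class and method names are hypothetical, not anything from TexasHuntFish or Wicket):

```java
public class Slug {

    // Turn a title into a URL-friendly slug: lower-case,
    // runs of non-alphanumerics collapsed to single hyphens, edges trimmed.
    public static String of(String title) {
        String s = title.toLowerCase().replaceAll("[^a-z0-9]+", "-");
        return s.replaceAll("^-+|-+$", "");
    }

    public static void main(String[] args) {
        System.out.println(of("South Texas Management Buck!"));
        // -> south-texas-management-buck
    }
}
```

The slug carries no routing information -- the numeric ID in the URL still locates the thread -- it exists purely to give search engines (and users) a hint about the content.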
Removing the jsessionid for SEO
victori_ provided this information on IRC and I just wanted to share it with everyone else. Googlebot and others don't use cookies. This means when they visit your site, it adds ;jsessionid=code to the end of all the urls they visit. When they re-visit, they get a different code, consider that a different url with the same content, and punish you. So, for the web crawling bots, it's very important to get rid of this (perhaps it's worthwhile to check this code in to the code base). Here's what you do in your Application:

    @Override
    protected WebResponse newWebResponse(final HttpServletResponse servletResponse) {
        return CleanWebResponse.getNew(this, servletResponse);
    }

Here's the CleanWebResponse class:

    public class CleanWebResponse {

        public static WebResponse getNew(final Application app, final HttpServletResponse servletResponse) {
            return app.getRequestCycleSettings().getBufferResponse()
                    ? new Buffered(servletResponse)
                    : new Unbuffered(servletResponse);
        }

        static class Buffered extends BufferedWebResponse {
            public Buffered(final HttpServletResponse httpServletResponse) {
                super(httpServletResponse);
            }

            // Return the url untouched so ;jsessionid= is never appended.
            @Override
            public CharSequence encodeURL(final CharSequence url) {
                return url;
            }
        }

        static class Unbuffered extends WebResponse {
            public Unbuffered(final HttpServletResponse httpServletResponse) {
                super(httpServletResponse);
            }

            @Override
            public CharSequence encodeURL(final CharSequence url) {
                return url;
            }
        }
    }

Note, I haven't tested this myself yet but I plan to tonight. Hope this was helpful.

- To unsubscribe, e-mail: [EMAIL PROTECTED] For additional commands, e-mail: [EMAIL PROTECTED]
Re: Removing the jsessionid for SEO
you would think that the crawl bots are smart enough to ignore jsessionid tokens... -igor
Re: Removing the jsessionid for SEO
also by doing what you have done, users with cookies disabled won't be able to use your site... -igor
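Igor's objection -- that disabling URL rewriting for everyone breaks cookie-less human users -- is why the filter approach discussed later in the thread only skips rewriting for known crawlers. A hedged sketch of the two pure pieces such a filter needs (the class name, method names, and user-agent substrings below are all illustrative, not from any real filter):

```java
public class BotUrls {

    // Very rough User-Agent sniffing -- a real filter would use a
    // maintained bot list; these substrings are just illustrative.
    public static boolean isBot(String userAgent) {
        if (userAgent == null) {
            return false;
        }
        String ua = userAgent.toLowerCase();
        return ua.contains("googlebot") || ua.contains("slurp") || ua.contains("msnbot");
    }

    // Strip a ";jsessionid=..." path parameter from a url,
    // preserving any query string that follows it.
    public static String stripJsessionid(String url) {
        int start = url.indexOf(";jsessionid=");
        if (start < 0) {
            return url;
        }
        int end = url.indexOf('?', start);
        return end < 0 ? url.substring(0, start)
                       : url.substring(0, start) + url.substring(end);
    }
}
```

Inside a servlet filter, isBot() would decide whether to wrap the response so encodeURL() becomes a no-op, and stripJsessionid() would build the target of a redirect when a crawler arrives on a stale jsessionid URL.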
Re: Removing the jsessionid for SEO
I have noticed something like this with http_check on nagios. Is there a proper way to get rid of these temporary sessions?

On Wed, Apr 2, 2008 at 10:45 PM, Igor Vaynberg [EMAIL PROTECTED] wrote: also by doing what you have done, users with cookies disabled won't be able to use your site... -igor

--
Ryan Gravener
http://ryangravener.com