Re: Removing the jsessionid for SEO
Hello,

I still didn't find the time to write a blog post about this, so I just put the code on pastebin: http://pastebin.org/31242 I'm looking forward to your feedback :)

I tested this filter on Jetty and Tomcat (with Firefox's User Agent Switcher), where it worked fine. However, as stated in the code, some app servers might behave a little differently, so YMMV.

greetings, Rüdiger

On Monday, 14.04.2008, 16:37 +0200, Korbinian Bachl - privat wrote:
Yeah, it's quite a shame that Google doesn't open-source their logic ;) It would be nice if you could give us the code, however, so we could have a look at it :)

- To unsubscribe, e-mail: [EMAIL PROTECTED] For additional commands, e-mail: [EMAIL PROTECTED]
Re: Removing the jsessionid for SEO
Hello everybody,

I just want to add my 2 cents to this discussion. At IndyPhone we too wanted to get rid of jsessionid URLs in Google's index. It would be nice if the Google bot were as clever as the one from Yahoo and simply removed them itself, but it doesn't.

So I implemented a servlet filter which checks the User-Agent header for the Google bot and skips the URL rewriting just for those clients. As this would generate lots of new sessions, the filter invalidates the session right after the request. Also, if a crawler makes a request containing a jsessionid (which it stored before the filter was deployed), the filter redirects the crawler to the same URL, just without the jsessionid parameter. That way the index gets updated for those old URLs. Now we have almost none of those URLs in Google's index.

If anyone is interested in the code, I'd be willing to publish it. As it is not Wicket specific, I could share it with some generic servlet tools open-source project - is there something like that at Apache or elsewhere? But maybe Google is smarter by now, and it is not required anymore?

-- greetings from Berlin, Rüdiger Schulz www.2rue.de www.indyphone.de - Coole Handy Logos einfach selber bauen
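Rüdiger's actual filter is on pastebin, but the two pieces of logic he describes (recognizing a crawler from the User-Agent header, and stripping the ;jsessionid path parameter for the redirect) can be sketched roughly as below. The class and method names are hypothetical and the bot list is deliberately tiny; this is a sketch of the idea, not his code:

```java
// Hypothetical sketch of the filter's core logic, not Ruediger's actual code.
public class SessionIdFilterSketch {

    // Crude crawler check on the User-Agent header; a real filter would
    // match many more crawlers (or use a maintained list).
    public static boolean isSearchEngineAgent(String userAgent) {
        if (userAgent == null) {
            return false;
        }
        String ua = userAgent.toLowerCase();
        return ua.contains("googlebot") || ua.contains("slurp") || ua.contains("msnbot");
    }

    // Remove a ";jsessionid=..." path parameter so a crawler that stored
    // an old URL can be redirected to the clean equivalent.
    public static String stripJsessionid(String url) {
        int start = url.indexOf(";jsessionid=");
        if (start < 0) {
            return url;                      // nothing to strip
        }
        int query = url.indexOf('?', start); // preserve any query string
        return query < 0
                ? url.substring(0, start)
                : url.substring(0, start) + url.substring(query);
    }
}
```

The real filter would additionally skip URL rewriting for such clients, invalidate the session after the request, and issue the redirect, as described above.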
Re: Removing the jsessionid for SEO
Hi Rüdiger,

AFAIK this could lead to some punishment by Google, as its bot browses the site multiple times using different user agents and origin IPs, and if it sees different behaviour it suspects cloaking / prepared content and will act accordingly. This is usually noticed after the regular Google index refreshes that happen a few times a year - you should keep an eye on this.

Best, Korbinian
Re: Removing the jsessionid for SEO
Hm, SEO really is a bit of a black art sometimes *g*

This (German) article states that SID cloaking is OK with Google: http://www.trafficmaxx.de/blog/google/gutes-cloaking-schlechtes-cloaking

Some more googling, and here someone seems to confirm this: http://www.webmasterworld.com/cloaking/3201743.htm ("I was actually at SMX West and Matt Cutts specifically sa*id* that this is OK")

All I can say for our case is that I added this filter several months ago, and I can't see any negative effects so far.

greetings, Rüdiger

-- greetings from Berlin, Rüdiger Schulz www.2rue.de www.indyphone.de - Coole Handy Logos einfach selber bauen
Re: Removing the jsessionid for SEO
Yeah, it's quite a shame that Google doesn't open-source their logic ;) It would be nice if you could give us the code, however, so we could have a look at it :)
Re: Removing the jsessionid for SEO
Hi Rüdiger,

I would be very interested in the code. If you cannot find a suitable repository, could you just do something simple like linking to a zip from a blog post?

Regards, Erik.
Re: Removing the jsessionid for SEO
I'll wrap something up in the course of this week and post it on my blog. (So little time at the moment.)

greetings, Rüdiger

-- greetings from Berlin, Rüdiger Schulz www.2rue.de www.indyphone.de - Coole Handy Logos einfach selber bauen
Re: Removing the jsessionid for SEO
Hi Jeremy,

You're absolutely right; nearly all spiders today can handle the default sessions, be it Java, PHP, .NET etc.; those guys at Google and Microsoft aren't beginners!

It's also important to understand that a URL in Wicket is, in part, nothing more than a plain string that can be manipulated any way you like. Wicket (i.e. your page) needs one or maybe two parameters - and it has to know how to find them; the rest of the URL can be shaped to fit your needs, as long as you can guarantee a unique content-to-URL mapping so you don't get marked as a duplicate-content spammer.

Best, Korbinian
RE: Removing the jsessionid for SEO
Dan Kaplan-3 wrote:
Google "jsessionid SEO" for more. Most of the results tell you to get rid of the jsessionid. Granted, it doesn't seem Google has specifically mentioned this either way, so all these comments are rumors. But the fact of the matter is that Google *DOES* index your URLs with the jsessionid still in them. You'd think they'd be smart enough to remove that, right? If they can't get that much right, I wouldn't want to make any other assumptions about their abilities on similar matters.

Search Matt Cutts' blog for "session id". He specifically suggests not even including query string parameters that look like session IDs. From what I remember, Google can and does index pages with session IDs, BUT to a reduced degree.

--
View this message in context: http://www.nabble.com/Removing-the-jsessionid-for-SEO-tp16464534p16646137.html
Sent from the Wicket - User mailing list archive at Nabble.com.
Re: Removing the jsessionid for SEO
If I understood you correctly, the first page is bookmarkable and the second is a Wicket URL tied to the session. That'd be bad for SEO - search engines couldn't see page 2, or they would, but the URL is tied to their session, so even if a user visited that URL, they wouldn't get that page. This means that any content past page one is unreachable from a search engine.

I had another thread going about a problem I was having with sessions, which turned up some interesting data. I have over 31,000 pages indexed by Google; they are visiting bookmarkable URLs that DO have a jsessionid in them, but only two pages in their index have a jsessionid. They obviously handle jsessionid fine these days, or at least they do for me.

If you need all of your content to be indexed, you really need to concern yourself with making every page bookmarkable. Take a look at Korbinian's comments above - it looks like he is doing it well. Or have a look at my comments or my site http://www.texashuntfish.com. You should specifically look at http://www.texashuntfish.com/thf/app/forum - I am using DataTables there, but every link (including sort, etc.) is bookmarkable. So, you may go into a category and get a URL like http://www.texashuntfish.com/thf/app/forum/cat-53/Let-s-Talk-Texas-Outdoors-Classifieds-Buy-Sell-Trade or http://www.texashuntfish.com/thf/app/forum/18395/Winchester-22-model-61-for-sell. The cat-53 or the /18395/ are the only things that matter. I have a strategy mounted on /forum that takes the first parameter and uses it to decode what kind of page is being requested - a category page, a specific post, etc. Everything after that first parameter is purely for SEO. Putting good keywords in the URL like that, and putting in the subject of every article / calendar event / news item or forum thread, is what shot us up the rankings of multiple search engines. Migrating the app from what it was before (somerandomscript.cfm?foo=123123&bar=12321) to this made a HUGE difference.

It wasn't without work - Wicket is super easy if you don't have to worry about URLs - but it also makes it easy to totally customize all of your URLs. Shoot back any questions you have. Hopefully I can share more information, or even some code, later. Maybe Korbinian and I should put some information on the wiki about pretty URLs and SEO.

Jeremy

On Fri, Apr 4, 2008 at 1:09 PM, Dan Kaplan [EMAIL PROTECTED] wrote:
Thanks, that's kinda the route I've already taken. On my site, www.startfound.com, if you click on any company to see more details, it goes to a bookmarkable page. Same with any tag. Maybe if I've already got that much, I shouldn't concern myself with the fact that page 2 of my list is not bookmarkable but is reachable by the Google bot. Or maybe I should just add a noindex meta tag on every page that's not page 1. It'd be kinda ridiculous to require login to see past page 1. That may be good for SEO, but it'll drive people away.

-Original Message- From: [EMAIL PROTECTED] [mailto:[EMAIL PROTECTED]] On Behalf Of Jeremy Thomerson Sent: Thursday, April 03, 2008 10:00 PM To: users@wicket.apache.org Subject: Re: Removing the jsessionid for SEO

I've been building a community-driven hunting and fishing site in Texas for the past year and a half. Since I converted it to Wicket from ColdFusion, our search engine rankings have gone WAY UP. That's right, we're on the first page for tons of searches. Search for "texas hunting" - we're second, under only the Texas Parks and Wildlife Association.

How? With Wicket? Yes - it requires a little more work. What I do is that for any link that I want Google to be able to follow, I have a subclass of Link specific to that. For instance, ViewThreadLink, which takes the ID for the link and a (detachable) model of the thread. Then I mount an IRequestTargetUrlCodingStrategy for each big category of things in my webapp. I've made several strategies that I use over and over, just giving them a different mount path and a different parameter to tell them what kind of article, etc., they will match to. This is made easier because over 75% of the objects in our site are similar enough that they extend from a base class providing the basic functionality for an article / thread / etc. that has a title, text, pictures, comments, the standard stuff.

So, yes, it takes work. But that's okay - SEO always takes work. I have also given a lot of care to using good page titles and good semantic HTML, and to stuffing things into the URL that don't have anything to do with locating the resource but give the search engines a clue as to what the content is. Yes, some pages end up with a jsessionid - and I don't like it (example: http://www.google.com/search?hl=en&client=firefox-a&rls=com.ubuntu%3Aen-US%3Aofficial&q=%22south+texas+management+buck%22&btnG=Search). But, most don't because almost all
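The decoding Jeremy describes (only the first path segment after the /forum mount carries meaning; everything after it is keyword filler for the search engines) can be sketched as follows. The class name and result strings are purely illustrative, not his actual coding-strategy code:

```java
// Illustrative sketch of decoding the first segment after a mount point
// such as "/forum"; not Jeremy's actual IRequestTargetUrlCodingStrategy.
public class ForumUrlDecoderSketch {

    // "cat-53/Let-s-Talk-..."  -> "category:53"
    // "18395/Winchester-22..." -> "thread:18395"
    // anything else            -> null
    public static String decode(String pathAfterMount) {
        String first = pathAfterMount.split("/")[0];
        if (first.startsWith("cat-")) {
            return "category:" + first.substring("cat-".length());
        }
        if (first.matches("\\d+")) {
            return "thread:" + first;
        }
        return null; // unrecognized; a real strategy would handle this case
    }
}
```

The SEO keywords after the first segment never reach the decoder, which is why the same page stays reachable even if the title part of the URL changes.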
Re: Removing the jsessionid for SEO
Hi Jeremy, hi Dan,

For a project long ago I had the task of making a product browser SEO friendly. I used a plain PagingNavigator at first, and then extended it to use the IndexedUrlPageParameters; this allowed me to put anything into the path to get a nice URL. The key here is to look at the URL and treat it as a unique resource line. So I did it something like this:

mountName{(/anyparams)}*{/pageNumber}

This gave me the possibility to have a browsing URL where I could put anything in while the rest still works. Remember also that the URL for SEO may (!) change in future, so go for maximally flexible designs: up front you have a resource, then any params to feed the spider (there may be 0 to over 10), and a hook at the end that has to be a number (where 0 is assumed in case the end is not a number). So I was finally able to let the spider see things like:

product/brand_New/BestItemOfTheWorld
product/specialCategory/moreSpecial/moreInfo/2
product/specialCategory/moreSpecial/brandName/moreDetails/1

etc. Now, you wonder: if I feed the spider with this, how do I know where to end? The key was that the part in between got merged internally and was specified by the application, so we overcame the problems of:

a) recreating the right view (here we had a tree-like structure for our products, which we could compare to the tree in the database)
b) duplicate content (very bad! - never, ever let a spider find the same content, or very similar content, under more than one URL!)

This strategy did very well. Today with Wicket 1.3 I would go nearly the same way but stick to the HybridURL scheme, and maybe try to be even more flexible with the URL scheme by having the basic schemes and resources specified in persistence (URL hook, initial state). Remember, it is important to serve the same resources under the same URLs, or else the spider will think you might be trying to fake content.

The jsessionid is something I don't care about anymore - it's 2008, spiders know it, and the usual visitor/surfer has no clue how to tell a URL from an email address. However, many people have turned cookies + JS off because of security fears - so in turn the jsessionid will concern only the few people who know about such details, but hamper many people who have no knowledge of the internet and its techniques - IMHO.

@Jeremy: your approach also seems interesting to me; can you give more details about it?

Best, Korbinian
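Korbinian's mountName{(/anyparams)}*{/pageNumber} scheme treats the trailing segment as a page number only when it is numeric, falling back to page 0 otherwise. A minimal sketch of that rule (hypothetical class name, not his original code):

```java
// Sketch of the trailing "page hook": a numeric last segment is the page
// number, anything else means page 0. Hypothetical, not Korbinian's code.
public class IndexedUrlSketch {

    public static int pageNumber(String path) {
        String[] segments = path.split("/");
        String last = segments[segments.length - 1];
        return last.matches("\\d+") ? Integer.parseInt(last) : 0;
    }
}
```

With this rule, any number of SEO-only segments can sit between the mount name and the page hook without breaking navigation.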
RE: Removing the jsessionid for SEO
That is helpful, but: "This is an extension of the standard, so not all bots may follow it." I wonder if the major ones do...

-Original Message- From: [EMAIL PROTECTED] [mailto:[EMAIL PROTECTED]] On Behalf Of Jeremy Levy Sent: Thursday, April 03, 2008 6:16 PM To: users@wicket.apache.org Subject: Re: Removing the jsessionid for SEO

We have a similar issue, and are trying the following out right now: http://www.google.com/support/webmasters/bin/answer.py?hl=en&answer=40367

User-agent: *
Disallow: /*?

On Thu, Apr 3, 2008 at 9:09 PM, Dan Kaplan [EMAIL PROTECTED] wrote:
Ok, at least I'm not missing anything. I understand the benefits Wicket provides with its stateful framework. Developing a site with Wicket is easier than with any other framework I've used. But this statefulness, which makes websites so easy to develop, seems counterproductive for SEO: GoogleBot will follow and index stateful links. Worst case, these actually become visible to Google users, and when they click such a link it takes them to an invalid-session page. They think "this site is broken" and move on to the next link in their search results.

Another approach to solving this is to block all the stateful pages in my robots.txt file. But how can I block these links in robots.txt when they change per session? Is there any way to know what the URL will resolve to when GoogleBot visits my site, so I can tell it to disallow /?wicket:interface=:10:1::: and ?wicket:interface=:0:1::: and ...?

-Original Message- From: Igor Vaynberg [mailto:[EMAIL PROTECTED]] Sent: Thursday, April 03, 2008 5:45 PM To: users@wicket.apache.org Subject: Re: Removing the jsessionid for SEO

On Thu, Apr 3, 2008 at 5:31 PM, Dan Kaplan [EMAIL PROTECTED] wrote:
Ok, I did a little preliminary research on this. Right now PagingNavigator uses PagingNavigationLinks to represent its pages; these extend Link. I'm supposed to override PagingNavigator's newPagingNavigationLink() method to accomplish this (I think), but past that, this isn't very straightforward to me. Do I need to create my own BookmarkablePagingNavigationLink? When I do... what next? I really don't know enough about BookmarkablePageLinks to do this. Right now, all the magic happens inside PagingNavigationLink. Won't I have to move all that logic into the WebPage that I'm passing into BookmarkablePagingNavigationLink? This seems like a lot of work. Am I missing something critical?

No, you are not missing anything. You see, when you go stateless, like you want, you have to recreate all the magic stuff that makes stateful links Just Work. Without state you are back to the servlet/MVC programming model: you have to encode the state that you want into the link, then on the trip back decode it, recreate something from it, and then apply that something onto the components. This is the crapwork that Wicket usually does for you. -igor

-Original Message- From: Igor Vaynberg [mailto:[EMAIL PROTECTED]] Sent: Thursday, April 03, 2008 3:40 PM To: users@wicket.apache.org Subject: Re: Removing the jsessionid for SEO

You subclass the PagingNavigator and make it use bookmarkable links also; it has factory methods for all the links it uses. -igor

On Thu, Apr 3, 2008 at 3:36 PM, Dan Kaplan [EMAIL PROTECTED] wrote:
I wasn't talking about the links that are in the list (I already make those bookmarkable). I'm talking about the links that the navigator generates. How do I make it so page 2 is bookmarkable?

-Original Message- From: Igor Vaynberg [mailto:[EMAIL PROTECTED]] Sent: Thursday, April 03, 2008 3:30 PM To: users@wicket.apache.org Subject: Re: Removing the jsessionid for SEO

Instead of item.add(new Link("foo") { onClick() { ... } }); do item.add(new BookmarkablePageLink("foo", Page.class)); -igor

On Thu, Apr 3, 2008 at 3:28 PM, Dan Kaplan [EMAIL PROTECTED] wrote:
How? I asked how to do it before and nobody suggested this as a possibility.

-Original Message- From: Igor Vaynberg [mailto:[EMAIL PROTECTED]] Sent: Thursday, April 03, 2008 3:26 PM To: users@wicket.apache.org Subject: Re: Removing the jsessionid for SEO

DataView can work in a stateless mode, just use bookmarkable links inside it -igor

On Thu, Apr 3, 2008 at 3:22 PM, Dan Kaplan [EMAIL PROTECTED
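Igor's point about going stateless - encode the state you want into the link, then decode it on the way back - can be illustrated outside of Wicket with plain strings (the names here are illustrative, not Wicket API; Wicket's bookmarkable links and coding strategies normally do this work for you):

```java
// Plain-Java illustration of the stateless round trip: the page index is
// encoded into the link and decoded again on the next request.
// Not Wicket API, just the underlying idea.
public class StatelessPagingSketch {

    // Encode: build the link target for a given page index.
    public static String pageLink(String basePath, int pageIndex) {
        return basePath + "?page=" + pageIndex;
    }

    // Decode: recover the index from the query string, defaulting to 0.
    public static int decodePage(String queryString) {
        for (String pair : queryString.split("&")) {
            if (pair.startsWith("page=")) {
                try {
                    return Integer.parseInt(pair.substring("page=".length()));
                } catch (NumberFormatException e) {
                    return 0; // malformed value, fall back to the first page
                }
            }
        }
        return 0;
    }
}
```

This is exactly the "crapwork" a stateful link hides: every piece of state the page needs must survive the round trip through the URL.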
Re: Removing the jsessionid for SEO
Hi!

igor.vaynberg wrote:
also by doing what you have done, users with cookies disabled won't be able to use your site...

In my opinion the session ID is a problem: Google indexes the same page again and again. For the users without cookies we can do something like this:

static class Unbuffered extends WebResponse {

    private static final String[] botAgents = {
        "onetszukaj", "googlebot", "appie", "architext", "jeeves", "bjaaland",
        "ferret", "gulliver", "harvest", "htdig", "linkwalker", "lycos_",
        "moget", "muscatferret", "myweb", "nomad", "scooter",
        "yahoo!\\sslurp\\schina", "slurp", "weblayers", "antibot", "bruinbot",
        "digout4u", "echo!", "ia_archiver", "jennybot", "mercator", "netcraft",
        "msnbot", "petersnews", "unlost_web_crawler", "voila", "webbase",
        "webcollage", "cfetch", "zyborg", "wisenutbot", "robot", "crawl",
        "spider" }; /* and so on... */

    public Unbuffered(final HttpServletResponse res) {
        super(res);
    }

    @Override
    public CharSequence encodeURL(final CharSequence url) {
        return isAgent() ? url : super.encodeURL(url);
    }

    private static boolean isAgent() {
        String agent = ((WebRequest) RequestCycle.get().getRequest())
                .getHttpServletRequest().getHeader("User-Agent");
        if (agent == null) {
            return false;
        }
        for (String bot : botAgents) {
            if (agent.toLowerCase().indexOf(bot) != -1) {
                return true;
            }
        }
        return false;
    }
}

I didn't test this code, but I do a similar thing in my old application in Spring and it works.

Take care, Artur

--
View this message in context: http://www.nabble.com/Removing-the-jsessionid-for-SEO-tp16464534p16467396.html
Sent from the Wicket - User mailing list archive at Nabble.com.
Re: Removing the jsessionid for SEO
Isn't Google always saying that you shouldn't alter the behavior of your site depending on whether it's their bot or not?

On Thu, Apr 3, 2008 at 1:00 PM, Artur W. [EMAIL PROTECTED] wrote: [...]

- To unsubscribe, e-mail: [EMAIL PROTECTED] For additional commands, e-mail: [EMAIL PROTECTED]
RE: Removing the jsessionid for SEO
When Google asks you not to give their bot special treatment, they are referring to content more than anything. Regarding the session id being encoded in the URL, see the Technical guidelines section of Google's Webmaster Guidelines - http://www.google.com/support/webmasters/bin/answer.py?answer=35769#design It specifically recommends "allow[ing] search bots to crawl your sites without session IDs or arguments that track their path through the site."

-Original Message-
From: Johan Compagner [mailto:[EMAIL PROTECTED]
Sent: Thursday, April 03, 2008 7:35 AM
To: users@wicket.apache.org
Subject: Re: Removing the jsessionid for SEO

isnt google always saying that you shouldn't alter behavior of your site depending of it is there bot or not?

On Thu, Apr 3, 2008 at 1:00 PM, Artur W. [EMAIL PROTECTED] wrote: [...]

__ The information contained in this message is proprietary and/or confidential. If you are not the intended recipient, please: (i) delete the message and all copies; (ii) do not disclose, distribute or use the message in any manner; and (iii) notify the sender immediately. In addition, please be aware that any message addressed to our domain is subject to archiving and review by persons other than the intended recipient. Thank you. _
Re: Removing the jsessionid for SEO
The problem is that then you have to have all stateless pages, else Google can't crawl your website. And if that is the case, you could be completely stateless, so you don't have a session (id) to worry about at all.

johan

On Thu, Apr 3, 2008 at 4:54 PM, Zappaterrini, Larry [EMAIL PROTECTED] wrote: [...]
Re: Removing the jsessionid for SEO
Right. If you strip the session id then all your non-bookmarkable URLs will resolve to a 404. That will probably drop your rank a lot faster.

-igor

On Thu, Apr 3, 2008 at 9:16 AM, Johan Compagner [EMAIL PROTECTED] wrote: [...]
Re: Removing the jsessionid for SEO
On the other hand, crawling non-bookmarkable pages is not very useful anyway, since a ?wicket:interface URL will always get "page expired" when you click on the result. However, preserving the session makes a lot of sense with hybrid URLs: Google remembers the original URL (without the page instance) while indexing the real page (after the redirect). I think, though, that the crawler is quite advanced; I would think it supports cookies (at least JSESSIONID) and evaluates some of the JavaScript on the page.

-Matej

On Thu, Apr 3, 2008 at 6:56 PM, Igor Vaynberg [EMAIL PROTECTED] wrote: [...]
RE: Removing the jsessionid for SEO
Regardless, at the very least this makes your site look weird and unprofessional when Google puts a jsessionid on your URL. There has got to be some negative effect when Google visits it a second time, the jsessionid has changed, but it sees the exact same content. Worst case, it'll think you're trying to trick it.

About those 404s: I'm finding that with the fix I provided I don't get a 404, but the links refresh the page I'm already on. I.e., if I'm on A and a link to B is non-bookmarkable, clicking B refreshes A. This issue is very disconcerting to me. It's one of the reasons I wish that DataView had an option to work in stateless mode. Cause if I ban cookies and Googlebot visits my home page (with a navigator on it), it'll try to follow all those page links, and from its perspective they all lead back to the first page. So it's kind of a catch-22: include the jsessionid in the URLs and get bad SEO, or remove the jsessionid and get bad SEO :(

Perhaps the answer to my prayers is a combination of the noindex/nofollow meta tag with a sitemap.xml. I'm thinking I can put a nofollow on the home page (so Googlebot doesn't try to follow the navigator links) and use the sitemap.xml to point out the individual pages I want it to index.

Matej: can you go into more detail about your hybrid URL statement? Won't Google index, for example, /home and /home.1 if I use it? When it follows the next page, won't the URL become /home.1.2 or something? That .2 is a page version: if Google indexes that and tries to visit it again, won't it report an invalid session?

-Original Message-
From: Matej Knopp [mailto:[EMAIL PROTECTED]
Sent: Thursday, April 03, 2008 11:10 AM
To: users@wicket.apache.org
Subject: Re: Removing the jsessionid for SEO

On the other hand, crawling non-bookmarkable pages is not very useful anyway [...]
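One half of that catch-22 can at least be contained: a filter can 301-redirect any crawler request whose URL still carries a session id to the clean URL, so stale indexed entries eventually get replaced. The URL rewrite itself is simple; a minimal sketch, with the class and method names my own:

```java
public class JsessionidStripper {

    /**
     * Removes a ";jsessionid=..." path parameter from a URL, keeping any
     * query string intact. Returns the URL unchanged if no session id is
     * present. A filter would send the result as a 301 redirect target.
     */
    public static String stripJsessionid(String url) {
        int start = url.indexOf(";jsessionid=");
        if (start == -1) {
            return url;
        }
        // The path parameter runs until the query string (or the end of the URL).
        int end = url.indexOf('?', start);
        return (end == -1) ? url.substring(0, start)
                           : url.substring(0, start) + url.substring(end);
    }
}
```

Using a permanent (301) rather than temporary (302) redirect is what tells the crawler to update its index to the clean URL.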
Re: Removing the jsessionid for SEO
dataview can work in a stateless mode, just use bookmarkable links inside it

-igor

On Thu, Apr 3, 2008 at 3:22 PM, Dan Kaplan [EMAIL PROTECTED] wrote: [...]
RE: Removing the jsessionid for SEO
Clarifications: when I said "About those 404s," I was talking about the case where you use the fix I provided and turn off cookies in your browser. And when I said "if I ban cookies," I meant to say "if I require cookies."

-Original Message-
From: Dan Kaplan [mailto:[EMAIL PROTECTED]
Sent: Thursday, April 03, 2008 3:22 PM
To: users@wicket.apache.org
Subject: RE: Removing the jsessionid for SEO

Regardless, at the very least this makes your site look weird and unprofessional when google puts a jsessionid on your url. [...]
RE: Removing the jsessionid for SEO
How? I asked how to do it before and nobody suggested this as a possibility. -Original Message- From: Igor Vaynberg [mailto:[EMAIL PROTECTED] Sent: Thursday, April 03, 2008 3:26 PM To: users@wicket.apache.org Subject: Re: Removing the jsessionid for SEO dataview can work in a stateless mode, just use bookmarkable links inside it -igor On Thu, Apr 3, 2008 at 3:22 PM, Dan Kaplan [EMAIL PROTECTED] wrote: Regardless, at the very least this makes your site look weird and unprofessional when google puts a jsessionid on your url. There has got to be some negative effect when google visits it the second time and the jsessionid has changed but it sees the same exact content. Worst case, it'll think you're trying to trick it. About those 404s, I'm finding that with the fix I provided I don't get a 404, but the links refresh the page I'm already on. IE: If I'm on A, and a link to B is non-bookmarkable, clicking B refreshes A. This issue is very disconcerting to me. It's one of the reasons I wish that DataView had an option to work in stateless mode. Cause if I ban cookies and Googlebot visits my home page (with a navigator on it), it'll try to follow all these page links and from its perspective, they all lead back to the first page. So it's kinda a catch-22: Include the jsessionid in the urls and get bad SEO or remove the jsessionid and get bad SEO :( Perhaps the answer to my prayers is a combination of the noindex/nofollow meta tag with a sitemap.xml. I'm thinking I can put a nofollow on the home page (so googlebot doesn't try to follow the navigator links) and use the sitemap.xml to point out the individual pages I want it to index. Matej: can you go into more detail about your hybrid URL statement? Won't google index, for example, /home and /home.1 if I use it? When it follows the next page, won't the url become /home.1.2 or something? That .2 is a page version: If google indexes that and tries to visit it again, won't it report about an invalid session? 
-Original Message- From: Matej Knopp [mailto:[EMAIL PROTECTED] Sent: Thursday, April 03, 2008 11:10 AM To: users@wicket.apache.org Subject: Re: Removing the jsessionid for SEO On the other hand, crawling non-bookmarkable pages is not very useful anyway, since ?wicket:interface url will always get page expired when you click on the result. However, preserving session makes lot of sense with hybrid url. Google remembers the original url (without page instance) while indexing the real page (after redirect). I think though that the crawler is quite advanced. I'm would think it supports cookies (at least JSESSIONID) as well as it evaluates some of the javascript on page. -Matej On Thu, Apr 3, 2008 at 6:56 PM, Igor Vaynberg [EMAIL PROTECTED] wrote: right. if you strip sessionid then all your nonbookmarkable urls will resolve to a 404. that will probably drop your rank a lot faster -igor On Thu, Apr 3, 2008 at 9:16 AM, Johan Compagner [EMAIL PROTECTED] wrote: the problem is that then you have to have all stateless pages. Else google can't crawl your website. And if that is the case then you could be completely stateless so you dont have a session (id) to worry about at all. johan On Thu, Apr 3, 2008 at 4:54 PM, Zappaterrini, Larry [EMAIL PROTECTED] wrote: When Google asks to not have special treatment for their bot, they are referring to content more than anything. Regarding the session id being coded in the URL, see the Technical guidelines section of Google's Webmaster Guidelines - http://www.google.com/support/webmasters/bin/answer.py?answer=35769#desi gn It specifically recommends allow(ing) search bots to crawl your sites without session IDs or arguments that track their path through the site. 
Johan Compagner wrote:
> Isn't google always saying that you shouldn't alter the behavior of your site depending on whether it is their bot or not?

Artur W. wrote:
> Hi!
>
> igor.vaynberg wrote: "also by doing what you have done users with cookies disabled won't be able to use your site..."
>
> In my opinion the session id is a problem. Google indexes the same page again and again. For the users without cookies we can do something like this:
>
> static class Unbuffered extends WebResponse { private static final
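A recurring suggestion in the thread is a servlet filter that skips jsessionid URL rewriting for known crawlers and redirects crawlers away from session-bearing URLs. Its two core decisions (is this request from a crawler, and what does a URL look like with the session id removed) can be sketched framework-free. The class, method, and bot-substring names below are illustrative assumptions, not the posted filter code:

```java
// Illustrative helpers for a bot-aware jsessionid filter (names are made up).
public class JsessionidUtil {

    // Crude crawler check against the User-Agent header; the substrings
    // below are an assumption about typical bot identifiers of the era.
    public static boolean isCrawler(String userAgent) {
        if (userAgent == null) {
            return false;
        }
        String ua = userAgent.toLowerCase();
        return ua.contains("googlebot") || ua.contains("slurp") || ua.contains("msnbot");
    }

    // Remove a ";jsessionid=..." path parameter while preserving any query
    // string, e.g. to 301-redirect a crawler to the session-free URL.
    public static String stripJsessionid(String url) {
        int start = url.indexOf(";jsessionid=");
        if (start < 0) {
            return url; // nothing to strip
        }
        int query = url.indexOf('?', start);
        return query < 0
                ? url.substring(0, start)
                : url.substring(0, start) + url.substring(query);
    }
}
```

A real filter would additionally wrap the response so that encodeURL() returns the unrewritten URL for crawler requests, and invalidate the throwaway sessions those requests create, as Rüdiger's filter is described as doing.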
Re: Removing the jsessionid for SEO
instead of item.add(new Link("foo") { onClick() }); do item.add(new BookmarkablePageLink("foo", Page.class)); -igor

Dan Kaplan wrote:
> How? I asked how to do it before and nobody suggested this as a possibility.

[snip]
Re: Removing the jsessionid for SEO
Dan Kaplan wrote:
> Regardless, at the very least this makes your site look weird and unprofessional when google puts a jsessionid on your url.

0.5% of your users care about the URL that is displayed in a google search result. It doesn't look weird or unprofessional. It is not like your URL ends in .php or *gawk* .asp, is it? It brings the sophistication of Java to your users.

> There has got to be some negative effect when google visits it the second time and the jsessionid has changed but it sees the same exact content. Worst case, it'll think you're trying to trick it.

I think you need to give the google engineers *some* credit. I seriously doubt they are *THAT* stupid.

Martijn

--
Buy Wicket in Action: http://manning.com/dashorst
Apache Wicket 1.3.2 is released. Get it now: http://www.apache.org/dyn/closer.cgi/wicket/1.3.2
RE: Removing the jsessionid for SEO
I wasn't talking about the links that are in the list (I already make those bookmarkable). I'm talking about the links that the Navigator generates. How do I make it so "page 2" is bookmarkable?

Igor Vaynberg wrote:
> instead of item.add(new Link("foo") { onClick() }); do item.add(new BookmarkablePageLink("foo", Page.class)); -igor

[snip]
Re: Removing the jsessionid for SEO
you subclass the PagingNavigator and make it use bookmarkable links also. it has factory methods for all the links it uses. -igor

Dan Kaplan wrote:
> I wasn't talking about the links that are in the list (I already make those bookmarkable). I'm talking about the links that the Navigator generates. How do I make it so "page 2" is bookmarkable?

[snip]
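Igor's factory-method suggestion might look like the sketch below. This is not runnable as-is: the class and method names (PagingNavigator, newPagingNavigationLink, BookmarkablePageLink, PageParameters) are recalled from the Wicket 1.3-era API and should be checked against your version's javadoc, and the "page" parameter name is an invented convention:

```
// SKETCH ONLY - verify signatures against your Wicket version's javadoc.
public class BookmarkablePagingNavigator extends PagingNavigator {
    private final Class pageClass; // bookmarkable page that hosts the list

    public BookmarkablePagingNavigator(String id, IPageable pageable, Class pageClass) {
        super(id, pageable);
        this.pageClass = pageClass;
    }

    // Factory method override: emit a bookmarkable link that carries the
    // page number as a URL parameter instead of a stateful callback.
    protected Link newPagingNavigationLink(String id, IPageable pageable, int pageNumber) {
        PageParameters params = new PageParameters();
        params.put("page", String.valueOf(pageNumber)); // hypothetical parameter name
        return new BookmarkablePageLink(id, pageClass, params);
    }
}
```

The target page's constructor would then read the "page" parameter back out of its PageParameters and call setCurrentPage on the pageable.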
RE: Removing the jsessionid for SEO
Awesome, thanks!

Igor Vaynberg wrote:
> you subclass the PagingNavigator and make it use bookmarkable links also. it has factory methods for all the links it uses. -igor

[snip]
RE: Removing the jsessionid for SEO
Martijn Dashorst wrote:
> 0.5% of your users care about the URL that is displayed in a google search result. It doesn't look weird or unprofessional. It is not like your URL ends in .php or *gawk* .asp, is it? It brings the sophistication of Java to your users.

My URL ends with ;jsessionid=an7goabg0az (my actual situation). I personally think that looks weirder than .php or .asp. Where did you get that 0.5% statistic? Regardless, my users won't see ANY url if my site is on the 50th page of the search results. That's the important issue here.

> I think you need to give the google engineers *some* credit. I seriously doubt they are *THAT* stupid.

These links suggest otherwise:

http://www.webmasterworld.com/google/3238326.htm
http://www.webmasterworld.com/forum3/5624.htm
http://www.webmasterworld.com/forum3/5479.htm
http://randomcoder.com/articles/jsessionid-considered-harmful

Google "jsessionid SEO" for more. Most of the results tell you to get rid of the jsessionid. Granted, google doesn't seem to have addressed this specifically either way, so all these comments are rumors. But the fact of the matter is that Google *DOES* index your urls with the jsessionid still in them. You'd think they'd be smart enough to remove that, right? If they can't get that much right, I wouldn't want to make any other assumptions about their abilities on similar matters.
Re: Removing the jsessionid for SEO
Dan Kaplan wrote:
> My URL ends with ;jsessionid=an7goabg0az (my actual situation). I personally think that looks weirder than .php or .asp.

Nah, it shows that you are using Java. Much more sophisticated!

> Where did you get that 0.5% statistic? Regardless, my users won't see ANY url if my site is on the 50th page of the search results. That's the important issue here.

I made the 0.5% statistic up. Developers are notoriously anal about URLs, while John and Jane Doe typically just use the google search box as their URL bar. Did you ever look at the URLs of Amazon? They are not pretty, and you'd need a very weird jsessionid to overthrow Amazon's URL scheme on the ugly scale. Where's the proof that Google punishes you for having a jsessionid in the URL?

> These links suggest otherwise:
> http://www.webmasterworld.com/google/3238326.htm
> http://www.webmasterworld.com/forum3/5624.htm
> http://www.webmasterworld.com/forum3/5479.htm
> http://randomcoder.com/articles/jsessionid-considered-harmful

These links are from 2002 (over 5 years ago). Wicket wasn't even born then. I surely hope that technology has evolved since then. Anyway, I'm glad I don't have to build apps that require SEO or public bots that navigate our sites. In fact, if that ever happened, I think our company would instantly be very famous (we deal with privacy-sensitive information that should stay out of Google/Yahoo/LiveSearch's indexes).

Martijn
Re: Removing the jsessionid for SEO
Dan Kaplan wrote:
> Ok I did a little preliminary research on this. Right now PagingNavigator uses PagingNavigationLinks to represent its pages. This extends Link. I'm supposed to override PagingNavigator's newPagingNavigationLink() method to accomplish this (I think), but past that, this isn't very straightforward to me. Do I need to create my own BookmarkablePagingNavigationLink? When I do... what next? I really don't know enough about BookmarkablePageLinks to do this. Right now, all the magic happens inside PagingNavigationLink. Won't I have to move all that logic into the WebPage that I'm passing into BookmarkablePagingNavigationLink? This seems like a lot of work. Am I missing something critical?

no, you are not missing anything. you see, when you go stateless, like you want, you have to recreate all the magic stuff that makes stateful links Just Work. Without state you are back to the servlet/mvc programming model: you have to encode the state that you want into the link, then on the trip back decode it, recreate something from it, and then apply that something onto the components. This is the crapwork that wicket usually does for you.

-igor

[snip]
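The encode-then-decode round trip Igor describes can be illustrated without any framework: a stateless ("bookmarkable") paging link must carry its state in the URL itself, and the server must parse that state back out on the next request. The class and parameter names here are hypothetical, not Wicket's:

```java
// Framework-free illustration of stateless paging: the link encodes the
// page number into the URL, and the server decodes it on the way back.
public class StatelessPaging {

    // Encode the target page number into the link's URL.
    public static String pageLink(String basePath, int pageNumber) {
        return basePath + "?page=" + pageNumber;
    }

    // Decode the page number back out of the URL's query string; fall back
    // to page 0 when the parameter is absent or malformed.
    public static int decodePage(String url) {
        int q = url.indexOf('?');
        if (q < 0) {
            return 0;
        }
        for (String pair : url.substring(q + 1).split("&")) {
            String[] kv = pair.split("=", 2);
            if (kv.length == 2 && kv[0].equals("page")) {
                try {
                    return Integer.parseInt(kv[1]);
                } catch (NumberFormatException e) {
                    return 0;
                }
            }
        }
        return 0;
    }
}
```

This is exactly the bookkeeping a stateful link spares you: with state on the server, the framework can map an opaque callback URL straight to a component, with no hand-written decode step.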
RE: Removing the jsessionid for SEO
Ok I did a little preliminary research on this. Right now PagingNavigator uses PagingNavigationLinks to represent its pages. This extends Link. I'm supposed to override PagingNavigator's newPagingNavigationLink() method to accomplish this (I think), but past that, this isn't very straightforward to me. Do I need to create my own BookmarkablePagingNavigationLink? When I do... what next? I really don't know enough about BookmarkablePageLinks to do this. Right now, all the magic happens inside PagingNavigationLink. Won't I have to move all that logic into the WebPage that I'm passing into BookmarkablePagingNavigationLink? This seems like a lot of work. Am I missing something critical?

Igor Vaynberg wrote:
> you subclass the PagingNavigator and make it use bookmarkable links also. it has factory methods for all the links it uses. -igor

[snip]
RE: Removing the jsessionid for SEO
Ok, at least I'm not missing anything. I understand the benefits it provides with its stateful framework: developing a site with Wicket is easier than with any other framework I've used. But this statefulness, which makes websites so easy to develop, seems to be counterproductive for SEO: GoogleBot will follow and index stateful links. Worst case scenario, these actually become visible to google users, and when someone clicks one it takes them to an invalid-session page. They think, "This site is broken," and move on to the next link in their search results.

Another approach to solving this is to block all the stateful pages in my robots.txt file. But how can I block these links in robots.txt when they change per session? Is there any way to know what the url will resolve to when googlebot visits my site, so I can tell it to disallow /?wicket:interface=:10:1::: and ?wicket:interface=:0:1::: and so on?

Igor Vaynberg wrote:
> no, you are not missing anything.
> [snip]
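On the robots.txt question above: Googlebot (unlike the minimal robots.txt standard) honors `*` wildcards in Disallow rules, so the per-session numbers never need to be enumerated. A sketch of the idea, assuming all stateful URLs contain `wicket:interface=`:

```
# robots.txt sketch (wildcard support is a Googlebot extension,
# not guaranteed for every crawler)
User-agent: Googlebot
Disallow: /*wicket:interface=
```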
you see, when you go stateless, like what you want, then you have to recreate all the magic stuff that makes stateful links Just Work. Without state you are back to the servlet/mvc programming model: you have to encode the state that you want into the link, then on the trip back decode it, recreate something from it, and then apply that something onto the components. This is the crapwork that wicket does for you usually. -igor

-Original Message- From: Igor Vaynberg [mailto:[EMAIL PROTECTED] Sent: Thursday, April 03, 2008 3:40 PM To: users@wicket.apache.org Subject: Re: Removing the jsessionid for SEO

you subclass the pagenavigator and make it use bookmarkable links also. it has factory methods for all the links it uses. -igor

On Thu, Apr 3, 2008 at 3:36 PM, Dan Kaplan [EMAIL PROTECTED] wrote: I wasn't talking about the links that are on the list (I already make those bookmarkable). I'm talking about the links that the Navigator generates. How do I make it so page 2 is bookmarkable?

-Original Message- From: Igor Vaynberg [mailto:[EMAIL PROTECTED] Sent: Thursday, April 03, 2008 3:30 PM To: users@wicket.apache.org Subject: Re: Removing the jsessionid for SEO

instead of item.add(new link(foo) { onclick() }); do item.add(new bookmarkablepagelink(foo, page.class)); -igor

On Thu, Apr 3, 2008 at 3:28 PM, Dan Kaplan [EMAIL PROTECTED] wrote: How? I asked how to do it before and nobody suggested this as a possibility.

-Original Message- From: Igor Vaynberg [mailto:[EMAIL PROTECTED] Sent: Thursday, April 03, 2008 3:26 PM To: users@wicket.apache.org Subject: Re: Removing the jsessionid for SEO

dataview can work in a stateless mode, just use bookmarkable links inside it -igor

On Thu, Apr 3, 2008 at 3:22 PM, Dan Kaplan [EMAIL PROTECTED] wrote: Regardless, at the very least this makes your site look weird and unprofessional when google puts a jsessionid on your url.
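Igor's point about encoding state into the link and decoding it on the way back can be sketched in plain Java. This is not Wicket API -- the class and method names below are purely illustrative of the round trip a stateless link has to perform by hand:

```java
import java.util.LinkedHashMap;
import java.util.Map;

public class StatelessLinkState {

    // Encode the state a stateful link would keep in the session
    // (here: page number and sort column) into the URL itself.
    public static String encode(String basePath, int page, String sort) {
        return basePath + "?page=" + page + "&sort=" + sort;
    }

    // On the way back in, decode the query string into a parameter map
    // the page can use to recreate its components.
    public static Map<String, String> decode(String query) {
        Map<String, String> params = new LinkedHashMap<String, String>();
        for (String pair : query.split("&")) {
            int eq = pair.indexOf('=');
            params.put(pair.substring(0, eq), pair.substring(eq + 1));
        }
        return params;
    }

    public static void main(String[] args) {
        String url = encode("/search", 2, "title");
        // -> /search?page=2&sort=title
        Map<String, String> params = decode(url.substring(url.indexOf('?') + 1));
        System.out.println(params.get("page")); // -> 2
    }
}
```

In Wicket terms, the encode half is roughly what a BookmarkablePageLink with PageParameters does for you, and the decode half is the page constructor reading those parameters back out.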
There has got to be some negative effect when google visits it the second time and the jsessionid has changed but it sees the same exact content. Worst case, it'll think you're trying to trick it. About those 404s, I'm finding that with the fix I provided I don't get a 404, but the links refresh the page I'm already on. IE: If I'm on A, and a link to B is non-bookmarkable, clicking B refreshes A. This issue is very disconcerting to me. It's one of the reasons I wish that DataView had an option to work in stateless mode. Cause if I
Re: Removing the jsessionid for SEO
We have a similar issue, and are trying the following out right now: http://www.google.com/support/webmasters/bin/answer.py?hl=en&answer=40367

User-agent: *
Disallow: /*?
Re: Removing the jsessionid for SEO
On Thu, Apr 3, 2008 at 6:09 PM, Dan Kaplan [EMAIL PROTECTED] wrote: Ok, at least I'm not missing anything. I understand the benefits it's providing with its stateful framework. Developing a site with Wicket is easier than with any other framework I've used. But this statefulness, which makes websites so easy to develop, seems to be counterproductive for SEO:

well, perhaps the differentiator here is that wicket is made for web applications, not web sites.

GoogleBot will follow and index stateful links. Worst case scenario, these actually become visible to google users and when they click the link it takes them to an invalid session page. They think, "This site is broken," and move on to the next link of their search result.

yep, you need to make sure that all stateful links are behind a login or something similar that the bot can't get past.

Another approach to solving this is to block all the stateful pages in my robots.txt file. But how can I block these links in robots.txt since they change per session? Is there any way to know what the url will resolve to when googlebot tries to visit my site so I can tell it to disallow: /?wicket:interface=:10:1::: and ?wicket:interface=:0:1::: and ...?

no there isn't a way, you have to use wildcard masks. on the other hand it is not that difficult to develop the stateless paging navigator, it will take a bit of work though. -igor
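Igor's wildcard-mask suggestion can be sketched as a robots.txt along these lines. Note this is an assumption-laden sketch: the `*` wildcard in Disallow is a Google extension rather than part of the original robots exclusion standard, and the exact patterns here are illustrative, not tested against a live Wicket app:

```
User-agent: *
# block any URL carrying Wicket's stateful interface parameter
Disallow: /*wicket:interface
# block any URL with a jsessionid path parameter
Disallow: /*;jsessionid
```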
Re: Removing the jsessionid for SEO
To clarify my message below: with a CryptedUrlWebRequestCodingStrategy and a lot of BookmarkablePages.

On Thu, Apr 3, 2008 at 9:16 PM, Jeremy Levy [EMAIL PROTECTED] wrote: We have a similar issue, and are trying the following out right now: http://www.google.com/support/webmasters/bin/answer.py?hl=en&answer=40367 User-agent: * Disallow: /*?
Re: Removing the jsessionid for SEO
I've been building a community-driven hunting and fishing site in Texas for the past year and a half. Since I converted it to Wicket from ColdFusion, our search engine rankings have gone WAY UP. That's right, we're on the first page for tons of searches. Search for "texas hunting" - we're second under only the Texas Parks and Wildlife Association.

How? With Wicket? Yes - it requires a little more work. What I do is that for any link that I want Google to be able to follow, I have a subclass of Link specific to that. For instance, ViewThreadLink, which takes the ID for the link and a model (detachable) of the thread. Then I mount an IRequestTargetUrlCodingStrategy for each big category of things in my webapp. I've made several strategies that I use over and over, just giving them a different mount path and a different parameter to tell it what kind of article, etc, it will match to. This is made easier because over 75% of the objects in our site are similar enough that they extend from a base class that provides the basic functionality for an article / thread / etc that has a title, text, pictures, comments, the standard stuff.

So, yes, it takes work. But that's okay - SEO always takes work. I also have given a lot of care to use good page titles, good semantic HTML, and to stuff things into the URL that don't have anything to do with locating the resource, but give the search engines a clue as to what the content is. Yes, some pages end up with a jsessionid - and I don't like it (example: http://www.google.com/search?hl=en&client=firefox-a&rls=com.ubuntu%3Aen-US%3Aofficial&q=%22south+texas+management+buck%22&btnG=Search). But most don't, because almost all of my links are bookmarkable. When the user clicks something that they can only do as a signed-in user, it redirects them to the sign in page, they sign in, and are taken back to the page they were on.
Then they can pick up where they left off, and I don't worry about bookmarkable URLs for anything that requires user-authentication (wizards to post a new listing, story, admin links, etc).

Jeremy Thomerson
TexasHuntFish.com
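Jeremy's trick of stuffing descriptive words into the URL is usually done with a "slug" derived from the title. A minimal sketch of such a helper (the class and method names are hypothetical, not anything from TexasHuntFish or Wicket):

```java
public class Slug {

    // Turn a title into a URL-friendly slug: lower-case,
    // runs of non-alphanumerics collapsed to single hyphens, edges trimmed.
    public static String of(String title) {
        String s = title.toLowerCase().replaceAll("[^a-z0-9]+", "-");
        return s.replaceAll("^-+|-+$", "");
    }

    public static void main(String[] args) {
        System.out.println(of("South Texas Management Buck!"));
        // -> south-texas-management-buck
    }
}
```

The slug carries no routing information -- the numeric ID in the URL still locates the thread -- it exists purely to give search engines (and users) a hint about the content.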
Removing the jsessionid for SEO
victori_ provided this information on IRC and I just wanted to share it with everyone else. Googlebot and others don't use cookies. This means when they visit your site, it adds ;jsessionid=code to the end of all the urls they visit. When they re-visit, they get a different code, consider that a different url with the same content, and punish you. So, for the web crawling bots, it's very important to get rid of this (perhaps it's worthwhile to check this code in to the code base). Here's what you do in your Application:

    @Override
    protected WebResponse newWebResponse(final HttpServletResponse servletResponse) {
        return CleanWebResponse.getNew(this, servletResponse);
    }

Here's the CleanWebResponse class:

    public class CleanWebResponse {

        public static WebResponse getNew(final Application app, final HttpServletResponse servletResponse) {
            return app.getRequestCycleSettings().getBufferResponse()
                    ? new Buffered(servletResponse)
                    : new Unbuffered(servletResponse);
        }

        static class Buffered extends BufferedWebResponse {
            public Buffered(final HttpServletResponse httpServletResponse) {
                super(httpServletResponse);
            }

            // Return the url untouched so ;jsessionid= is never appended.
            @Override
            public CharSequence encodeURL(final CharSequence url) {
                return url;
            }
        }

        static class Unbuffered extends WebResponse {
            public Unbuffered(final HttpServletResponse httpServletResponse) {
                super(httpServletResponse);
            }

            @Override
            public CharSequence encodeURL(final CharSequence url) {
                return url;
            }
        }
    }

Note, I haven't tested this myself yet but I plan to tonight. Hope this was helpful.

- To unsubscribe, e-mail: [EMAIL PROTECTED] For additional commands, e-mail: [EMAIL PROTECTED]
Re: Removing the jsessionid for SEO
you would think that the crawl bots are smart enough to ignore jsessionid tokens... -igor
Re: Removing the jsessionid for SEO
also by doing what you have done, users with cookies disabled won't be able to use your site... -igor
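Igor's objection -- that disabling URL rewriting for everyone breaks cookie-less human users -- is why the filter approach discussed later in the thread only skips rewriting for known crawlers. A hedged sketch of the two pure pieces such a filter needs (the class name, method names, and user-agent substrings below are all illustrative, not from any real filter):

```java
public class BotUrls {

    // Very rough User-Agent sniffing -- a real filter would use a
    // maintained bot list; these substrings are just illustrative.
    public static boolean isBot(String userAgent) {
        if (userAgent == null) {
            return false;
        }
        String ua = userAgent.toLowerCase();
        return ua.contains("googlebot") || ua.contains("slurp") || ua.contains("msnbot");
    }

    // Strip a ";jsessionid=..." path parameter from a url,
    // preserving any query string that follows it.
    public static String stripJsessionid(String url) {
        int start = url.indexOf(";jsessionid=");
        if (start < 0) {
            return url;
        }
        int end = url.indexOf('?', start);
        return end < 0 ? url.substring(0, start)
                       : url.substring(0, start) + url.substring(end);
    }
}
```

Inside a servlet filter, isBot() would decide whether to wrap the response so encodeURL() becomes a no-op, and stripJsessionid() would build the target of a redirect when a crawler arrives on a stale jsessionid URL.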
Re: Removing the jsessionid for SEO
I have noticed something like this with http_check on nagios. Is there a proper way to get rid of these temporary sessions?

On Wed, Apr 2, 2008 at 10:45 PM, Igor Vaynberg [EMAIL PROTECTED] wrote: also by doing what you have done, users with cookies disabled won't be able to use your site... -igor

--
Ryan Gravener
http://ryangravener.com