Hi all,

I'm working for a company which builds a music-broadcasting web site based
on GWT: www.awdio.com

On the SEO side, we've found some interesting ways to cope with an Ajax
specificity: search engines have no JavaScript engine, so they cannot
retrieve the full HTML produced by the GWT script (or by any other Ajax
framework's script). As a result, each page, i.e. each "GWT screen", cannot
be indexed by them. Rather than duplicating each page with a hand-made
static HTML page reachable through the noscript tag, we produce the static
pages with a Java program which launches an SWT Browser (Eclipse 3.4) with
the start URL http://www.awdio.com.

The main issue with this approach is that the client program has no way of
knowing when the page has been fully rendered by the JavaScript process.
In the GWT code of awdio we therefore implemented a "semaphore" (a flag)
which notifies the SWT Browser-based client of completion.

This semaphore is a hidden <DIV>drawing</DIV> element which is either
present in the DOM or not, i.e. the HTML content produced either contains
it or doesn't. With org.eclipse.swt.browser.Browser.getText() we can
retrieve the HTML content and test for the presence of this flag.

To do that, the Java program listens for the Browser's status-text events
(org.eclipse.swt.browser.StatusTextListener).
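Put together, the detection step could look roughly like this. This is a minimal sketch in plain Java: RenderFlag and isRenderComplete are hypothetical names, and in the real crawler this check would run inside an org.eclipse.swt.browser.StatusTextListener, feeding it the HTML returned by Browser.getText().

```java
public class RenderFlag {
    // Hidden marker the GWT application appends once rendering is done
    // (assumed marker text, matching the <DIV>drawing</DIV> flag above).
    static final String FLAG = "<div>drawing</div>";

    // True once the rendered HTML contains the completion flag.
    // Lowercased first, since browsers may normalize tag case.
    static boolean isRenderComplete(String html) {
        return html != null && html.toLowerCase().contains(FLAG);
    }
}
```

In the real program, each status-text event would trigger a call to Browser.getText(), and the snapshot would be taken only once isRenderComplete() returns true.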

Also, once the page is loaded, the program fetches the content and looks up
all the internal links (<a href="#...">) built by the GWT Hyperlink widget.
Before storing the HTML content in a cacheable static page, every '#' is
replaced by a '/' so that the bot gets fully qualified URLs (crawlers do
not handle anchors).
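The '#' to '/' rewriting can be sketched with a simple regular expression. LinkRewriter and rewriteAnchors are hypothetical names; the real program applies something like this to the HTML captured from the SWT Browser before caching it.

```java
public class LinkRewriter {
    // Rewrites href="#token" into href="/token" so that crawlers see
    // fully qualified URLs instead of in-page anchors.
    static String rewriteAnchors(String html) {
        return html.replaceAll("href=\"#([^\"]*)\"", "href=\"/$1\"");
    }
}
```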

Finally, the program follows each link with the SWT Browser, so the static
versions of all the pages can be produced automatically.
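Collecting the set of internal links to follow could be sketched like this (LinkExtractor and internalTokens are hypothetical helpers; the actual crawler may extract links differently):

```java
import java.util.LinkedHashSet;
import java.util.Set;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class LinkExtractor {
    // Collects the internal history tokens found in the rendered HTML,
    // i.e. the targets of <a href="#..."> links, deduplicated and in
    // document order, so the crawler knows which screens still need
    // a static snapshot.
    static Set<String> internalTokens(String html) {
        Set<String> tokens = new LinkedHashSet<>();
        Matcher m = Pattern.compile("href=\"#([^\"]+)\"").matcher(html);
        while (m.find()) {
            tokens.add(m.group(1));
        }
        return tokens;
    }
}
```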

Lastly, on the awdio server, a front-end servlet inspects the user-agent of
the request: if it is a search engine, the pre-produced static page is
returned; otherwise the GWT host page is returned.
As far as I understand, this might be considered "cloaking". But the
content seen by the crawler is exactly the same as the one seen by the user
(after JavaScript execution).
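The servlet's decision can be sketched as a plain user-agent check. BotDetector is a hypothetical helper with an illustrative, non-exhaustive token list; in a real servlet's doGet you would test request.getHeader("User-Agent") and forward to either the static snapshot or the GWT host page accordingly.

```java
public class BotDetector {
    // Illustrative, non-exhaustive list of crawler user-agent tokens.
    static final String[] BOT_TOKENS = { "googlebot", "slurp", "msnbot" };

    // True when the User-Agent header looks like a search engine bot.
    static boolean isSearchEngine(String userAgent) {
        if (userAgent == null) {
            return false;
        }
        String ua = userAgent.toLowerCase();
        for (String token : BOT_TOKENS) {
            if (ua.contains(token)) {
                return true;
            }
        }
        return false;
    }
}
```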

In the onModuleLoad of the awdio EntryPoint, the path part of the URL is
parsed to build the corresponding history token. So when
www.awdio.com/events is requested in a browser, the application reacts
exactly as if the user had clicked on the internal link (#events).
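The path-to-token mapping done in onModuleLoad can be sketched as follows (toHistoryToken is a hypothetical helper; the real EntryPoint would read the path via GWT's Window.Location and then fire the resulting token through the History mechanism):

```java
public class TokenFromPath {
    // Maps a request path like "/events" to the history token "events",
    // so a direct request behaves like a click on the #events link.
    // Empty token means the default "home" screen.
    static String toHistoryToken(String path) {
        if (path == null || path.isEmpty() || path.equals("/")) {
            return "";
        }
        return path.startsWith("/") ? path.substring(1) : path;
    }
}
```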

Everything looks fine, but there is still a big issue...

If a user copies and pastes one of our URLs on his own site, it will contain
the hash sign (e.g. http://www.awdio.com/#events). This means that the
search engine will not rank pages independently: all pages will be
considered a single one, http://www.awdio.com.

We can still add "link to this page" buttons wherever necessary, but it's
not satisfactory.

To conclude, this whole solution seems to solve the Ajax indexing issue,
with the very annoying exception of page ranking (due to the #anchor URLs).
Maybe Google should start considering #anchors as having a new meaning for
our Web 2.0 generation? Maybe through a specific value of the "rel"
attribute? (e.g. <A HREF="#mypage" rel="ispagelink">My Page</A>)

--~--~---------~--~----~------------~-------~--~----~
You received this message because you are subscribed to the Google Groups 
"Google Web Toolkit" group.
To post to this group, send email to Google-Web-Toolkit@googlegroups.com
To unsubscribe from this group, send email to 
google-web-toolkit+unsubscr...@googlegroups.com
For more options, visit this group at 
http://groups.google.com/group/Google-Web-Toolkit?hl=en
-~----------~----~----~----~------~----~------~--~---