Hi all, I work for a company that has built a music-broadcasting web site based on GWT: www.awdio.com
On SEO, we've found some interesting ways to cope with the specifics of Ajax: search engine crawlers do not run a JavaScript engine, so they cannot retrieve the full HTML produced by the GWT script (or by any other Ajax framework script). As a result, each page, i.e. each "GWT screen", cannot be indexed by them.

Rather than duplicating each page with a hand-made static HTML page accessible through the noscript tag, we generate the static pages with a Java program which launches an SWT Browser (Eclipse 3.4) with the start URL http://www.awdio.com. The main issue with this approach is that the client program has no means of knowing when the page has been fully rendered by the JavaScript process. In the awdio GWT code we implemented a "semaphore" (flag) which notifies the SWT Browser based client of the completion. This semaphore is a hidden <DIV> drawing</DIV> which is either present in the DOM or not, i.e. the HTML content produced either contains it or does not. With org.eclipse.swt.browser.Browser.getText() we can retrieve the HTML content and test for the presence of this flag. To do that, the Java program listens to the Browser status text event (org.eclipse.swt.browser.StatusTextListener).

Once the page is loaded, the program gets the content and looks up all the internal links (those starting with <a href="#) built by the GWT Hyperlink widget. Before the HTML content is stored in a cacheable static page, every '#' in these links is replaced by a '/' so that the bot gets fully qualified URLs (the crawlers do not handle anchors). Finally, the program follows each link with the SWT Browser, so the static versions of all the pages can be produced automatically.

Lastly, on the awdio server, a front-end servlet inspects the user-agent of each request: if it is a search engine, the generated static page is returned; otherwise the GWT host page is returned. As far as I understand, this might be considered "shadowing" (cloaking). But the content seen by the crawler is exactly the same as the content seen by the user (after JavaScript execution).
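To make the string-handling steps above more concrete, here is a minimal sketch in plain Java. It is not the awdio code: the class name SnapshotHelper, its method names, and the list of crawler user-agent fragments are all assumptions made for illustration. It only covers the parts that work on the HTML string (as returned by Browser.getText()) and on the User-Agent header; the SWT event loop and the servlet plumbing are left out. It also assumes that the internal links appear as bare href="#token" attributes; if the browser serializes them as absolute URLs, the pattern would need adjusting.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

/** Hypothetical helper; none of these names come from the awdio code base. */
final class SnapshotHelper {

    // Matches href="#token" or href='#token'; group 1 is the attribute prefix,
    // group 2 is the history token that follows the '#'.
    private static final Pattern HASH_LINK =
            Pattern.compile("(href\\s*=\\s*[\"'])#([^\"']*)", Pattern.CASE_INSENSITIVE);

    // Assumed, non-exhaustive list of lower-case crawler user-agent fragments.
    private static final String[] CRAWLER_HINTS = {"googlebot", "slurp", "msnbot"};

    private SnapshotHelper() {}

    /** Tests whether the HTML snapshot contains the given flag marker. */
    static boolean containsFlag(String html, String marker) {
        return html != null && html.contains(marker);
    }

    /** Collects the distinct history tokens of internal links, in document order. */
    static List<String> extractTokens(String html) {
        List<String> tokens = new ArrayList<>();
        Matcher m = HASH_LINK.matcher(html);
        while (m.find()) {
            String token = m.group(2);
            if (!token.isEmpty() && !tokens.contains(token)) {
                tokens.add(token);
            }
        }
        return tokens;
    }

    /** Rewrites href="#token" into href="/token" before the page is stored. */
    static String rewriteHashLinks(String html) {
        return HASH_LINK.matcher(html).replaceAll("$1/$2");
    }

    /** Guesses from the User-Agent header whether the request comes from a crawler. */
    static boolean isCrawler(String userAgent) {
        if (userAgent == null) {
            return false;
        }
        String ua = userAgent.toLowerCase(Locale.ROOT);
        for (String hint : CRAWLER_HINTS) {
            if (ua.contains(hint)) {
                return true;
            }
        }
        return false;
    }
}
```

In such a setup, the StatusTextListener callback would call containsFlag on the text returned by Browser.getText(), then store the result of rewriteHashLinks and queue every token from extractTokens as the next page to visit. The front-end servlet would call isCrawler with the value of request.getHeader("User-Agent") to choose between the static page and the GWT host page.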
In the onModuleLoad of the awdio EntryPoint, the right-hand part of the URL is parsed to build the corresponding history token. So when www.awdio.com/events is requested in a browser, it reacts in the same way as if the user had clicked on an internal link (#events).

Everything looks fine, but there is still a big issue. If a user copies and pastes one of our URLs onto his own site, it will contain the hash sign (e.g. http://www.awdio.com/#events). This means that the search engine will not rank the pages independently (all pages will be considered a single one: http://www.awdio.com). We can still add "link to this page" buttons wherever necessary, but that is not satisfactory.

To conclude, this whole solution seems to solve the AJAX indexing issue, with the very annoying exception of page ranking (due to the #anchor URLs). Maybe Google should start to consider #anchors as having a new meaning for our Web 2.0 generation? Maybe by considering a specific value of the "rel" attribute (e.g. <A HREF="#mypage" rel="ispagelink">My Page</A>)?
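For reference, the path-to-token step mentioned for onModuleLoad could be sketched as below. This is only an illustration under stated assumptions: the class HistoryTokens and the method tokenFromPath are hypothetical names, and the sketch assumes that the token is simply the URL path without its leading and trailing slashes. In a GWT EntryPoint, the path would presumably come from Window.Location.getPath() and the resulting token would be handed to History.newItem(token), but that wiring is not shown here.

```java
/** Hypothetical helper; not taken from the awdio EntryPoint. */
final class HistoryTokens {

    private HistoryTokens() {}

    /**
     * Derives a history token from a URL path, e.g. "/events" gives "events".
     * An empty or root path gives the empty token (the start screen).
     */
    static String tokenFromPath(String path) {
        if (path == null) {
            return "";
        }
        String token = path;
        // Drop leading slashes.
        while (token.startsWith("/")) {
            token = token.substring(1);
        }
        // Drop trailing slashes.
        while (token.endsWith("/")) {
            token = token.substring(0, token.length() - 1);
        }
        return token;
    }
}
```

With this mapping, a request for www.awdio.com/events yields the token "events", the same value the application would receive after a click on the #events link.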