Hi,

Is there any standardized way for Nutch to obtain a semantic version of a web page? For example, suppose the HTML page is as follows:

<html>
  <head>
    <link rel="semantic-content" href="index-semantic.xml"/>
  </head>
  <body>
    blablabal ..
  </body>
</html>

and the semantic XML (index-semantic.xml) would be something more useful than the HTML itself:

<?xml version="1.0"?>
<semantic-of href="index.html">
  ...
</semantic-of>

or alternatively some RDF or whatever.

Any pointers are very welcome.

Thanks

Michi

-- 
Michael Wechner
Wyona - Open Source Content Management - Apache Lenya
http://www.wyona.com
http://lenya.apache.org
+41 44 272 91 61

_______________________________________________
Nutch-developers mailing list
https://lists.sourceforge.net/lists/listinfo/nutch-developers
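
To make the question concrete, the lookup a crawler would have to perform could be sketched roughly as below. This is not Nutch code and does not use any Nutch API. It is a standalone Python illustration, and the names `SemanticLinkFinder` and `find_semantic_urls` are made up for this sketch. The `rel="semantic-content"` value is the hypothetical one from the example above, not a registered link type.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin


class SemanticLinkFinder(HTMLParser):
    """Collects href values of <link> elements carrying a given rel token.

    The default rel value "semantic-content" is the hypothetical one from
    the question, not a standardized link relation.
    """

    def __init__(self, rel="semantic-content"):
        super().__init__()
        self.rel = rel
        self.hrefs = []

    def handle_starttag(self, tag, attrs):
        # Self-closing <link .../> tags are also routed here by HTMLParser.
        if tag != "link":
            return
        attr_map = dict(attrs)
        # rel is a space-separated token list; compare case-insensitively.
        rel_tokens = (attr_map.get("rel") or "").lower().split()
        href = attr_map.get("href")
        if self.rel in rel_tokens and href:
            self.hrefs.append(href)


def find_semantic_urls(html, base_url):
    """Return absolute URLs of the semantic alternates declared in html."""
    finder = SemanticLinkFinder()
    finder.feed(html)
    finder.close()
    # Resolve relative hrefs against the URL the page was fetched from.
    return [urljoin(base_url, href) for href in finder.hrefs]
```

For the page above, fetched from an assumed address such as http://example.com/index.html, `find_semantic_urls` would return the single URL http://example.com/index-semantic.xml, which the crawler could then fetch and index in place of, or alongside, the HTML.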
