[Nutch-dev] Getting a semantic version of an "HTML page"

Michael Wechner Tue, 06 Feb 2007 07:46:50 -0800

Hi

Is there a standardized way for Nutch to get a semantic version 
of a web page? For example, take the following HTML page:


<html>
<head>
 <link rel="semantic-content" href="index-semantic.xml"/>
</head>
<body>
blablabal ..
</body>
</html>

and the semantic XML (index-semantic.xml) would be something more useful 
than the HTML itself:

<?xml version="1.0"?>

<semantic-of href="index.html">
...
</semantic-of>

or, alternatively, some RDF or a similar format.
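To make the idea concrete, here is a minimal sketch of what a crawler would have to do with such a page, assuming the page is well-formed XHTML like the example above. It looks for a `<link rel="semantic-content" href="...">` element, resolves the href against the page URL, and checks that the `<semantic-of href="...">` root of the semantic document points back at the page. The class name `SemanticLinkSketch`, its methods and the example.org URL are hypothetical, and `semantic-content` is only the convention proposed in this mail, not a registered link relation. No Nutch API is used; inside Nutch this logic would presumably live in a parse plugin, and real-world HTML would need a tolerant HTML parser rather than the JDK XML parser used here.

```java
import java.io.StringReader;
import java.net.URI;
import java.util.Optional;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

// Hypothetical helper illustrating the proposed convention; not part of Nutch.
public class SemanticLinkSketch {

    // rel value taken from the example page (an assumption, not a standard).
    static final String SEMANTIC_REL = "semantic-content";

    // Parses a well-formed XML/XHTML string with the JDK DOM parser.
    static Document parseXml(String xml) throws Exception {
        DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
        DocumentBuilder builder = factory.newDocumentBuilder();
        return builder.parse(new InputSource(new StringReader(xml)));
    }

    // Returns the absolute URL of the semantic version, if the page advertises one.
    static Optional<URI> findSemanticUrl(URI pageUrl, String xhtml) throws Exception {
        NodeList links = parseXml(xhtml).getElementsByTagName("link");
        for (int i = 0; i < links.getLength(); i++) {
            Element link = (Element) links.item(i);
            String href = link.getAttribute("href"); // "" when the attribute is absent
            if (SEMANTIC_REL.equals(link.getAttribute("rel")) && !href.isEmpty()) {
                // Relative hrefs are resolved against the page URL.
                return Optional.of(pageUrl.resolve(href));
            }
        }
        return Optional.empty();
    }

    // Checks that the <semantic-of href="..."> root element points back at the HTML page.
    static boolean describesPage(URI semanticUrl, URI pageUrl, String semanticXml) throws Exception {
        Element root = parseXml(semanticXml).getDocumentElement();
        if (!"semantic-of".equals(root.getTagName())) {
            return false;
        }
        return semanticUrl.resolve(root.getAttribute("href")).equals(pageUrl);
    }

    public static void main(String[] args) throws Exception {
        URI page = URI.create("http://www.example.org/docs/index.html");
        String html = "<html><head><link rel=\"semantic-content\" href=\"index-semantic.xml\"/></head>"
                + "<body>blablabal ..</body></html>";
        String semantic = "<?xml version=\"1.0\"?><semantic-of href=\"index.html\">...</semantic-of>";

        Optional<URI> semanticUrl = findSemanticUrl(page, html);
        System.out.println(semanticUrl.orElse(null));
        semanticUrl.ifPresent(u -> {
            try {
                System.out.println(describesPage(u, page, semantic));
            } catch (Exception e) {
                throw new RuntimeException(e);
            }
        });
    }
}
```

Running `main` should print `http://www.example.org/docs/index-semantic.xml` followed by `true`. The back-reference check is a design choice of this sketch: it guards against a page claiming a semantic document that actually describes some other resource.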

Any pointers are very welcome.

Thanks

Michi

-- 
Michael Wechner
Wyona      -   Open Source Content Management   -    Apache Lenya
http://www.wyona.com                      http://lenya.apache.org
[EMAIL PROTECTED]                        [EMAIL PROTECTED]
+41 44 272 91 61


_______________________________________________
Nutch-developers mailing list
[email protected]
https://lists.sourceforge.net/lists/listinfo/nutch-developers
