Danny,

Would it be possible to convert the HTML you’re analyzing from a string into an XML structure? With XML, detecting links becomes a trivial XPath expression. If the HTML doesn’t parse as well-formed XML out of the box, take a look at the Tidy integration that’s built into MarkLogic: <http://developer.marklogic.com:8040/5.0doc/docapp.xqy#search.xqy?query=tidy>.
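A minimal sketch of that approach, assuming MarkLogic's 1.0-ml dialect (for try/catch, xdmp:unquote and xdmp:tidy). The sample string is hypothetical and stands in for the htmlBody value from the question. The code first tries to parse the string as XML as-is, falls back to Tidy if that fails, and then reads the links with a plain XPath step.

```xquery
xquery version "1.0-ml";

(: Hypothetical sample input standing in for
   $asset/assetContent/htmlBody/string(). The unclosed <p> means it is
   not well-formed XML, so the Tidy branch below is the one that runs. :)
let $htmlBody := '<p>See <a href="http://example.com/one">one</a> and <a href="/two">two</a>'

(: Try a strict XML parse first; on failure let Tidy repair the markup.
   xdmp:tidy returns a status node followed by the cleaned-up XHTML
   document, so [2] keeps the document. :)
let $doc :=
  try { xdmp:unquote($htmlBody) }
  catch ($e) { xdmp:tidy($htmlBody)[2] }

(: *:a matches anchors whether they are in no namespace (strict parse)
   or in the XHTML namespace (Tidy output). :)
for $a in $doc//*:a[@href]
return fn:string($a/@href)
```

The wildcard namespace test keeps one XPath expression working for both branches, so the caller does not need to know which parser produced the document.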
Justin

Justin Makeig
Director, Product Management
MarkLogic Corporation
[email protected]
Phone: +1 650 655 2387
www.marklogic.com

On Jul 16, 2012, at 11:51 AM, Danny Sinang wrote:

Hi,

I'm trying to use a regex to detect HTML anchor tags. So far, my Googling has yielded this as the best regex to use:

<a[\s]+[^>]*?href[\s]?=[\s\"\']*(.*?)[\"\']*.*?>([^<]+|.*?)?<\/a>

My problem is: how do I assign that to a variable in XQuery so I can call fn:analyze-string()? I was hoping to do it this way:

let $htmlBody := $asset/assetContent/htmlBody/string()
let $pattern := <a[\s]+[^>]*?href[\s]?=[\s\"\']*(.*?)[\"\']*.*?>([^<]+|.*?)?<\/a>
return fn:analyze-string($htmlBody, $pattern)

But I can't enclose the regex in either single or double quotes. Any idea?

Regards,
Danny

_______________________________________________
General mailing list
[email protected]
http://developer.marklogic.com/mailman/listinfo/general
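On the quoting problem in the original question: an XQuery string literal escapes its own delimiter by doubling it ("" inside a double-quoted literal, '' inside a single-quoted one), so the pattern can be stored in an ordinary variable. Separately, \", \' and \/ are not valid escapes in the XML Schema-based regex syntax that fn:analyze-string uses, so they have to be dropped or the pattern is rejected. The sketch below assumes a processor that provides fn:analyze-string (as the question already does) and swaps in a tightened pattern, which is an assumption rather than the regex from the question: with the original, the lazy (.*?) group can come back empty because the following .*?> is free to absorb the URL. The sample input is hypothetical.

```xquery
(: Hypothetical sample input standing in for
   $asset/assetContent/htmlBody/string(). Inside this single-quoted
   literal, '' is an escaped apostrophe. :)
let $htmlBody :=
  'See <a href="http://example.com/one">one</a> and <A class=''x'' HREF=''/two''>two</A>.'

(: Inside this double-quoted literal, "" is an escaped quote character.
   Group 1 captures the href value, group 2 the link text. :)
let $pattern := "<a\s[^>]*?href\s*=\s*[""']([^""']*)[""'][^>]*>(.*?)</a>"

(: Flag s lets . match newlines; flag i makes tag and attribute names
   case-insensitive, so <A HREF=...> is matched too. :)
let $result := fn:analyze-string($htmlBody, $pattern, "si")

(: *:match and *:group avoid depending on the namespace prefix the
   processor uses for the analyze-string result elements. :)
for $m in $result/*:match
return fn:string($m/*:group[@nr = 1])
```

The same pattern in a single-quoted literal would be '<a\s[^>]*?href\s*=\s*["'']([^"'']*)["''][^>]*>(.*?)</a>'. A regex like this still misses unquoted href values and other HTML quirks, which is where parsing to XML is more robust.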
