This is a general question about regular expression support, and I would 
welcome some input from MarkLogic.
 
Ideally, I think full support for regular expression matching and named groups 
in cts:highlight and other searches would be a nice feature to have.
 
I am doing a lot of regex matching to automatically assign markup to XML 
elements and to query for patterns using an index.
What I am looking for are specific patterns in the text for legal citations, 
and I cannot enumerate all the possible combinations of a pattern. My 
workaround is an XQuery regex-matching function that returns the matching and 
non-matching text once I have located the document.
I then collect all the matching phrases, create a cts:word-query for each 
match, and run cts:highlight over the matches, first to create the boundary 
element. After that I iterate over the boundary elements again to add metadata 
to each one.
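
For anyone following along, the workaround described above could be sketched 
roughly as follows. This is only a sketch under assumptions, not my actual 
code: it assumes fn:analyze-string is available on the server, and the 
citation pattern, the <citation> boundary element, and the local: function 
names are all made up for illustration.

```xquery
xquery version "1.0-ml";

(: Namespace of the elements returned by fn:analyze-string. :)
declare namespace s = "http://www.w3.org/2005/xpath-functions";

(: Hypothetical pattern for one citation form, e.g. "410 U.S. 113". :)
declare variable $CITATION-PATTERN as xs:string := "\d+ U\.S\. \d+";

(: Collect the distinct substrings of $text that match $pattern. :)
declare function local:matching-phrases(
  $text as xs:string,
  $pattern as xs:string
) as xs:string*
{
  fn:distinct-values(
    fn:analyze-string($text, $pattern)/s:match/fn:string(.)
  )
};

(: Build one cts:word-query per matched phrase and wrap each hit in a
   boundary element; <citation> is an illustrative name only. :)
declare function local:mark-citations(
  $doc as node(),
  $pattern as xs:string
) as node()
{
  let $phrases := local:matching-phrases(fn:string($doc), $pattern)
  return
    if (fn:empty($phrases)) then $doc
    else
      cts:highlight(
        $doc,
        cts:or-query(
          for $p in $phrases return cts:word-query($p, "exact")
        ),
        <citation>{$cts:text}</citation>
      )
};

local:mark-citations(
  <p>See 410 U.S. 113 and 347 U.S. 483.</p>,
  $CITATION-PATTERN
)
```

The second pass that adds metadata would then walk the <citation> elements in 
the result. Note that the regex runs over the document text after the document 
has been located, so it gets no help from the indexes.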
 
Here are my limitations:

*       I cannot capture named groups. (Ideally I could use non-capturing 
groups and just use replace functions.)
*       fn:replace only returns positions 1-9, as per the XQuery spec. (Again, 
non-capturing groups will muddy the regex, or I would have to regex the 
regex :-) to turn all non-named groups into non-capturing groups.)
*       Speed is a concern, and native functions would ideally perform better.
*       I would like to use regex in cts:query constructors when searching for 
documents.
*       I need to build an expression with access to cts:text and cts:node, as 
in cts:highlight. My own function limits my ability to construct nodes the way 
the expression passed to cts:highlight can.
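
To illustrate the named-group limitation, here is a rough sketch of emulating 
named groups by position. It assumes fn:analyze-string is available and that 
its result carries each captured group as an s:group element with an nr 
attribute. The function name, the <match> wrapper, and the group names are 
hypothetical, and the names must be valid element names.

```xquery
xquery version "1.0-ml";

(: Namespace of the elements returned by fn:analyze-string. :)
declare namespace s = "http://www.w3.org/2005/xpath-functions";

(: For every match of $pattern in $text, emit a <match> element whose
   children are named after the capture groups: the Nth entry of $names
   labels capture group N. Groups with no corresponding name are skipped. :)
declare function local:named-groups(
  $text as xs:string,
  $pattern as xs:string,
  $names as xs:string*
) as element(match)*
{
  for $m in fn:analyze-string($text, $pattern)/s:match
  return
    <match text="{fn:string($m)}">{
      for $g in $m//s:group
      let $name := $names[xs:integer($g/@nr)]
      where fn:exists($name)
      return element { $name } { fn:string($g) }
    }</match>
};

local:named-groups(
  "See 410 U.S. 113.",
  "(\d+) (U\.S\.) (\d+)",
  ("volume", "reporter", "page")
)
```

This keeps the pattern free of extra non-capturing groups, but the name list 
has to be kept in step with the pattern by hand, which is exactly the 
bookkeeping that real named groups would remove.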

 
Ideally, a function or set of functions that does regex matching against the 
indexes would be useful, or failing that, general-purpose utilities that 
perform such functions:
 
Recommendation 1:
A highlight utility, or an enhancement of cts:highlight, that allows for regex:
 
cts:pattern-highlight($node, $query, $expression)
   cts:text  : the captured text
   cts:group as element(cts:group)
               (captures named regex groups: (?<group>:[expr]))
 
Recommendation 2:
A cts:query that allows for regex patterns:
 
cts:(regex|pattern)-query($patterns as xs:string*, $options, $weight)
    $patterns : a regular expression or sequence of expressions
    $options  : case-sensitive | case-insensitive
                    (: i = regex ignore case :)
                whitespace-sensitive | whitespace-insensitive
                    (: x = whitespace mode :)
                single-mode | multiline
                    (: s = mode :)
                element-boundary
                    (: I guess preserve element boundaries :)
                named-capture | indexed-capture
                    (: captures group names and returns them to cts:group :)
                    
Also, could someone weigh in on the ramifications of regex searches with 
MarkLogic indexing, and on whether there is a possibility of native regex 
support for cts:search (beyond fn:matches, fn:replace, and fn:tokenize)?
 
                    
 
_______________________________________________
General mailing list
[email protected]
http://xqzone.com/mailman/listinfo/general
