+1. MarkLogic is an excellent product. This Lux thing was inspired by it.

On Fri, May 6, 2022 at 11:29 AM Walter Underwood <wun...@wunderwood.org> wrote:
>
> If you need to search XML, consider MarkLogic. It is a very full-featured
> database and search engine based on XML.
>
> https://www.marklogic.com
>
> Disclaimer: I worked there for a couple of years ten years ago. But I’ve been
> inside that product and it is non-muggle technology.
>
> wunder
> Walter Underwood
> wun...@wunderwood.org
> http://observer.wunderwood.org/ (my blog)
>
> On May 6, 2022, at 5:35 AM, Michael Sokolov <msoko...@gmail.com> wrote:
>
> Many years ago I started this Lux project, which was designed to
> build an XML-aware index using Solr; see
> https://github.com/msokolov/lux/tree/master/src/main/java/lux/index/analysis
> for the analysis chain I used. Maybe you'll find something useful in
> this project? It has been dormant for years, and pre-dates interval queries,
> but the code is still the code, and XML has not really changed...
>
> On Fri, May 6, 2022 at 5:23 AM Mikhail Khludnev <m...@apache.org> wrote:
>
> Hi Devs!
>
> I found intervals quite nice and natural for retrieving scoped data (thanks,
> Alan!):
>
>   <tag>foo stuff bar</tag>
>
>   I.containing(I.ordered(I.term("<tag>"), I.term("</tag>")),
>                I.unordered(I.term("bar"), I.term("foo")));
>
> It works like a charm until it encounters nested tags:
>
>   <tag>foo <tag>bug</tag> bar</tag>
>
> Due to intrinsic minimization it picks the inner tag. I feel like plain
> intervals backed by positions lack tag scoping information.
> Do you know any approaches for retrieving XML in Lucene?
>
> --
> Sincerely yours
> Mikhail Khludnev
>
> ---------------------------------------------------------------------
> To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
> For additional commands, e-mail: dev-h...@lucene.apache.org
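
For anyone following the thread, the nesting problem in Mikhail's message can be reproduced without Lucene. The sketch below is a simplified model of minimal-interval matching over token positions, not Lucene's actual implementation. It assumes that tags are indexed as plain tokens and that the closing tag is the token `</tag>`. The class `NestedTagIntervals` and all of its methods are hypothetical names made up for this illustration.

```java
import java.util.ArrayList;
import java.util.List;

// Simplified model of minimal-interval semantics; NOT Lucene's implementation.
final class NestedTagIntervals {

    /** Closed interval [start, end] over token positions. */
    static final class Span {
        final int start;
        final int end;

        Span(int start, int end) {
            this.start = start;
            this.end = end;
        }

        boolean contains(Span other) {
            return start <= other.start && other.end <= end;
        }
    }

    /** Positions at which the given token occurs. */
    static List<Integer> positions(String[] tokens, String token) {
        List<Integer> result = new ArrayList<>();
        for (int i = 0; i < tokens.length; i++) {
            if (tokens[i].equals(token)) {
                result.add(i);
            }
        }
        return result;
    }

    /** Drops every candidate that strictly contains another candidate. */
    static List<Span> keepMinimal(List<Span> candidates) {
        List<Span> minimal = new ArrayList<>();
        for (Span c : candidates) {
            boolean containsSmaller = false;
            for (Span o : candidates) {
                boolean different = o.start != c.start || o.end != c.end;
                if (different && c.contains(o)) {
                    containsSmaller = true;
                    break;
                }
            }
            if (!containsSmaller) {
                minimal.add(c);
            }
        }
        return minimal;
    }

    /** Minimal spans where `first` occurs before `second`. */
    static List<Span> minimalOrdered(String[] tokens, String first, String second) {
        List<Span> candidates = new ArrayList<>();
        for (int s : positions(tokens, first)) {
            for (int e : positions(tokens, second)) {
                if (s < e) {
                    candidates.add(new Span(s, e));
                }
            }
        }
        return keepMinimal(candidates);
    }

    /** Minimal spans covering one occurrence of each token, in either order. */
    static List<Span> minimalUnordered(String[] tokens, String a, String b) {
        List<Span> candidates = new ArrayList<>();
        for (int pa : positions(tokens, a)) {
            for (int pb : positions(tokens, b)) {
                candidates.add(new Span(Math.min(pa, pb), Math.max(pa, pb)));
            }
        }
        return keepMinimal(candidates);
    }

    /** True if some tag span contains some (foo, bar) span. */
    static boolean matches(String text) {
        String[] tokens = text.trim().split("\\s+");
        List<Span> tagSpans = minimalOrdered(tokens, "<tag>", "</tag>");
        List<Span> termSpans = minimalUnordered(tokens, "bar", "foo");
        for (Span tag : tagSpans) {
            for (Span terms : termSpans) {
                if (tag.contains(terms)) {
                    return true;
                }
            }
        }
        return false;
    }

    public static void main(String[] args) {
        // Flat case: the single tag span [0,4] contains the term span [1,3].
        System.out.println(matches("<tag> foo stuff bar </tag>")); // prints true
        // Nested case: only the inner tag span [2,4] is minimal, and it does
        // not contain the term span [1,5], so the document is missed.
        System.out.println(matches("<tag> foo <tag> bug </tag> bar </tag>")); // prints false
    }
}
```

In the nested case the outer span [0,6] is discarded because it contains the inner span [2,4]. Positions alone carry no record of which closing tag belongs to which opening tag, which is the missing scoping information the original question points at.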