Hello Community,
I have a question regarding a strange behavior difference between term queries
and value queries.
During some tests for a new service, my team and I were trying to query
documents using a value query restricted to the German language (xml:lang="de").
However, the query result was empty, although we know there are documents that
should match.
So I tested a bit and found that if you send a raw combined query (via
the Java API) to the MarkLogic REST service,
the server simply ignores any language set via the query options.
To make things clear for you, I created a simple test case that shows this
behavior. In a fresh new database with default settings I created 4 test
documents via the following XQuery:
xdmp:document-insert("/demo1.xml",
  <root><element xml:lang="de">Zeitstand-Innendruckfestigkeit</element></root>);
xdmp:document-insert("/demo2.xml",
  <root><element xml:lang="en">long-term hydrostatic strength</element></root>);
xdmp:document-insert("/demo3.xml",
  <root><element>Zeitstand-Innendruckfestigkeit</element></root>);
xdmp:document-insert("/demo4.xml",
  <root><element>long-term hydrostatic strength</element></root>);
As you can see, the documents are tagged differently by language or have no
tagging at all (which should result in English by default).
Now I am searching for the document with the following command:
import module namespace search = "http://marklogic.com/appservices/search"
    at "/MarkLogic/appservices/search/search.xqy";

search:resolve(
  <query xmlns="http://marklogic.com/appservices/search"
         xmlns:search="http://marklogic.com/appservices/search">
    <and-query>
      <value-query type="string">
        <element ns="" name="element"/>
        <text>long-term hydrostatic strength</text>
        <weight>1.0</weight>
      </value-query>
    </and-query>
  </query>,
  <search:options xmlns:search="http://marklogic.com/appservices/search">
    <search:page-length>300</search:page-length>
    <search:term>
      <search:term-option>lang=de</search:term-option>
    </search:term>
  </search:options>)
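The term-query variant used for comparison is not reproduced in this message. A minimal sketch of what such a query could look like is given here, assuming the same query options and a plain `term-query` in place of the `value-query`; the exact structure actually sent may have differed. The `return-query` option is an addition for debugging only. As far as I know it makes the Search API include the generated cts:query in the response, which would show whether the language option ends up in the query at all.

```xquery
xquery version "1.0-ml";

import module namespace search = "http://marklogic.com/appservices/search"
    at "/MarkLogic/appservices/search/search.xqy";

(: Assumed term-query counterpart of the value-query above. :)
search:resolve(
  <query xmlns="http://marklogic.com/appservices/search">
    <term-query>
      <text>long-term hydrostatic strength</text>
    </term-query>
  </query>,
  <search:options xmlns:search="http://marklogic.com/appservices/search">
    <search:page-length>300</search:page-length>
    <search:term>
      <search:term-option>lang=de</search:term-option>
    </search:term>
    <!-- Debugging aid (not part of the original test): echo the generated query. -->
    <search:return-query>true</search:return-query>
  </search:options>)
```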
If you execute the search for the English phrase "long-term hydrostatic
strength" with "<search:term-option>lang=en</search:term-option>" in the
query options,
all is fine: 2 documents are found (for both value-query and
term-query):
<search:response snippet-format="snippet" total="2" start="1" page-length="300"
    xmlns:search="http://marklogic.com/appservices/search">
  <search:result index="1" uri="/demo2.xml"
      path="fn:doc(&quot;/demo2.xml&quot;)" score="14336" confidence="0.5296452"
      fitness="0.7490314">
    <search:snippet>
      <search:match path="fn:doc(&quot;/demo2.xml&quot;)/root/element"><search:highlight>long-term hydrostatic strength</search:highlight></search:match>
    </search:snippet>
  </search:result>
  <search:result index="2" uri="/demo4.xml"
      path="fn:doc(&quot;/demo4.xml&quot;)" score="14336" confidence="0.5296452"
      fitness="0.7490314">
    <search:snippet>
      <search:match path="fn:doc(&quot;/demo4.xml&quot;)/root/element"><search:highlight>long-term hydrostatic strength</search:highlight></search:match>
    </search:snippet>
  </search:result>
  <search:metrics>
    <search:query-resolution-time>PT0.001277S</search:query-resolution-time>
    <search:snippet-resolution-time>PT0.001016S</search:snippet-resolution-time>
    <search:total-time>PT0.003034S</search:total-time>
  </search:metrics>
</search:response>
The same applies if you run this with the German language and the German
equivalent term "Zeitstand-Innendruckfestigkeit". The third scenario
also works: searching for
the German term with no language constraint returns one document
(with both term-query and value-query).
However, if you run the above query for the English term with the language constraint
"<search:term-option>lang=de</search:term-option>" in the query options,
it finds no documents for the term-query but 2 documents
for the value-query!
In other words, it seems that the value-query simply ignores the language
constraint set by the query options.
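One way to narrow this down would be to bypass the Search API and pass the language option directly to the underlying cts constructors. The following is only a sketch under the assumption that the four demo documents above are the only content in the database. It builds an element value query and an element word query with an explicit "lang=de" option and lists the URIs each one matches.

```xquery
xquery version "1.0-ml";

(: Compare value and word queries with the language set explicitly,
   without going through search:resolve and its query options. :)
let $text := "long-term hydrostatic strength"
let $value-de :=
  cts:element-value-query(xs:QName("element"), $text, "lang=de")
let $word-de :=
  cts:element-word-query(xs:QName("element"), $text, "lang=de")
return (
  "value query, lang=de:",
  for $doc in cts:search(fn:doc(), $value-de)
  return xdmp:node-uri($doc),
  "word query, lang=de:",
  for $doc in cts:search(fn:doc(), $word-de)
  return xdmp:node-uri($doc)
)
```

If the value query with an explicit "lang=de" matches nothing here while the search:resolve call still returns 2 documents, that would suggest the option is not being passed on to the value query by the Search API, rather than being ignored by the query itself.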
My question after this long story is: is this the intended behavior, or is this
something I should report to MarkLogic support as a bug?
Kind regards,
Hubertus Willuhn
Computer Scientist
Data Service | XML Technology, DIN Software GmbH