Agreed. However, it's not clear that trailing whitespace needs to be
preserved in order to search for DITA tokens, as in the original
example. It may depend on just what the tokens consist of, but a word
or phrase search might be able to rely on the implicit tokenization
done by the indexer, without needing the trailing whitespace.
EG: cts:attribute-word-search(..."topic/topic") ought to match
"topic/topic" and not match "mytopic/topic-foo", I think.
-Mike
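A minimal sketch of the distinction Mike is drawing, written in Python rather than XQuery so it runs without a server. It assumes tokens are split on whitespace only; it says nothing about how the MarkLogic indexer actually tokenizes, and the helper names are hypothetical.

```python
def has_token(class_value: str, token: str) -> bool:
    """True if `token` is one of the whitespace-delimited tokens."""
    # str.split() with no argument splits on runs of whitespace and
    # ignores leading/trailing whitespace, so padding is irrelevant.
    return token in class_value.split()


def has_substring(class_value: str, token: str) -> bool:
    """Naive substring test, shown only for contrast."""
    return token in class_value


# Token matching does not depend on the trailing space being present.
print(has_token("- topic/topic", "topic/topic"))
print(has_token("- topic/topic ", "topic/topic"))

# Token matching rejects the look-alike; substring matching does not.
print(has_token("- mytopic/topic-foo", "topic/topic"))
print(has_substring("- mytopic/topic-foo", "topic/topic"))
```

If the search layer already works on whole tokens, the padding adds nothing; it only matters to consumers that do raw substring tests.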
David Sewell wrote:
Someone from Mark Logic really needs to weigh in on this. It appears
that ML Server is doing attribute value normalization upon loading:
http://www.w3.org/TR/REC-xml/#AVNormalize
However, the spec says "All attributes for which no declaration has been
read SHOULD be treated by a non-validating processor as if declared
CDATA." Meaning that unless a schema is associated with the file, the
server should only be replacing tabs and line breaks with spaces, not
trimming leading and trailing spaces, unless I'm not understanding
something properly.
I also confirmed this behavior with a simple XML file load.
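For comparison, here is a small sketch of how a non-validating parser outside MarkLogic treats the same input. It uses Python's standard xml.etree.ElementTree (backed by expat), which is an assumption of this example and not something from the thread.

```python
import xml.etree.ElementTree as ET

# No declaration for @foo has been read, so it is treated as CDATA:
# leading and trailing spaces are kept.
undeclared = ET.fromstring('<root foo=" bar "/>')
print(repr(undeclared.get("foo")))

# Normalization still applies to CDATA attributes, but it only turns
# literal tabs and line breaks into spaces; it does not trim.
tabbed = ET.fromstring('<root foo="a\tb"/>')
print(repr(tabbed.get("foo")))

# With a tokenized type declared in the internal subset, the spec's
# second normalization step (trim and collapse spaces) applies to a
# parser that reads the declaration. Printed here without a claimed result.
declared = ET.fromstring(
    '<!DOCTYPE root [<!ATTLIST root foo NMTOKENS #IMPLIED>]>'
    '<root foo=" bar "/>'
)
print(repr(declared.get("foo")))
```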
David S.
On Mon, 10 Mar 2008, Eliot Kimber wrote:
In storing some DITA documents into MarkLogic I discovered that the trailing
spaces in the DITA class= attributes are not preserved. I created a simple
test and got the same behavior, e.g.:
xdmp:document-insert("test.xml", <root foo=" bar "/>)
(: run as a separate query, after the insert has committed :)
<result>{string(doc("test.xml")/root/@foo)}</result>
Returns:
<result>bar</result>
not
<result> bar </result>
The DITA standard requires the trailing spaces in the class= values because
the value is a sequence of blank-delimited tokens where you need to be able
to match on " {token} " so you don't get false positives. For example, the
"topic" type is the token "topic/topic"; without the spaces, a search for
"topic/topic" would also match "mytopic/topic-foo", which would be bad.
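The padded match described above can be sketched as follows, in Python so it runs standalone. The class values are illustrative examples and the function name is hypothetical; the point is only to show why a stripped trailing space breaks a consumer that matches on " {token} ".

```python
def padded_match(class_value: str, token: str) -> bool:
    """Match `token` only when it has a space on both sides."""
    return f" {token} " in class_value


original = "- topic/topic concept/concept "  # as authored, trailing space kept
stripped = original.strip()                   # as returned after whitespace trimming

# With the trailing space, every token is matched, including the last one.
print(padded_match(original, "topic/topic"))
print(padded_match(original, "concept/concept"))

# Once trimmed, the last token no longer has a space after it.
print(padded_match(stripped, "concept/concept"))

# The padding is what prevents the false positive.
print(padded_match("- mytopic/topic-foo ", "topic/topic"))
```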
My question is: is this behavior unalterable or is it configurable?
This behavior does make it impossible to use DITA documents stored in
MarkLogic with any normal DITA-aware processor (because they all expect there
to be a trailing space in the class= value) without some serious workaround
(essentially a post-fetch fixup to add back in the trailing space).
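One possible shape for such a post-fetch fixup, as a rough Python sketch. It assumes the fetched document is available as serialized XML and that only the trailing space was lost; the function name and the sample document are hypothetical.

```python
import xml.etree.ElementTree as ET


def restore_class_padding(root: ET.Element) -> int:
    """Append a trailing space to every class attribute that lacks one.

    Returns the number of attributes that were changed.
    """
    fixed = 0
    for elem in root.iter():  # iter() visits the root element too
        value = elem.get("class")
        if value and not value.endswith(" "):
            elem.set("class", value + " ")
            fixed += 1
    return fixed


# The root's class value has lost its trailing space; the title's has not.
doc = ET.fromstring(
    '<concept class="- topic/topic concept/concept">'
    '<title class="- topic/title "/>'
    '</concept>'
)
count = restore_class_padding(doc)
print(count, repr(doc.get("class")))
```

A fixup like this has to run on every fetch, which is why it counts as a workaround and not a fix.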
Cheers,
Eliot