Agreed; however, it's not clear that trailing whitespace needs to be preserved in order to search for DITA tokens, as in the original example. I guess it might depend on just what the tokens consist of, but a word or phrase search might be able to use the implicit tokenization done by the indexer without needing the trailing whitespace.

E.g., cts:attribute-word-search(..."topic/topic") ought to match "topic/topic" and not "mytopic/topic-foo", I think.
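For comparison, here is a minimal sketch of token-based matching in plain XQuery 1.0. It does not use MarkLogic's cts: word search, whose tokenization rules aren't spelled out here. Splitting the class value on whitespace gives exact token comparison whether or not the trailing space survived loading. The function name local:has-class-token is hypothetical.

```xquery
xquery version "1.0";

(: Hypothetical helper: true if $class contains $token as a whole
   whitespace-delimited token. normalize-space() drops leading and
   trailing spaces first, so a stored value with or without the DITA
   trailing space gives the same answer. :)
declare function local:has-class-token(
  $class as xs:string?,
  $token as xs:string
) as xs:boolean
{
  tokenize(normalize-space($class), ' ') = $token
};

(
  local:has-class-token("- topic/topic ", "topic/topic"),       (: true :)
  local:has-class-token("- topic/topic", "topic/topic"),        (: true, trailing space not needed :)
  local:has-class-token("- mytopic/topic-foo ", "topic/topic")  (: false, no partial-token match :)
)
```

The general comparison (=) is true if any token equals $token, so "mytopic/topic-foo" never matches "topic/topic".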

-Mike

David Sewell wrote:
Someone from Mark Logic really needs to weigh in on this. It appears
that ML Server is doing attribute value normalization upon loading:

 http://www.w3.org/TR/REC-xml/#AVNormalize

However, the spec says "All attributes for which no declaration has been
read SHOULD be treated by a non-validating processor as if declared
CDATA." For CDATA attributes, normalization only turns tabs and line
breaks into spaces; trimming leading and trailing spaces is done only for
tokenized types. So unless a schema is associated with the file, the
server should not be trimming attribute whitespace, unless I'm not
understanding something properly.
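The difference between the two normalization rules can be modeled in a few lines of XQuery. This is a simplified sketch of XML 1.0 section 3.3.3 only: it ignores entity and character references and end-of-line handling, and the function names are hypothetical.

```xquery
xquery version "1.0";

(: Step applied to every attribute, including undeclared (CDATA) ones:
   each tab, LF or CR is replaced by a single space. Nothing is trimmed. :)
declare function local:normalize-cdata($value as xs:string) as xs:string
{
  translate($value, '&#9;&#10;&#13;', '   ')
};

(: Extra step applied only to attributes declared with a tokenized type
   (NMTOKENS, IDREFS, ...): drop leading/trailing spaces and collapse
   runs of spaces into one. :)
declare function local:normalize-tokenized($value as xs:string) as xs:string
{
  normalize-space(local:normalize-cdata($value))
};

(
  local:normalize-cdata(" bar "),     (: " bar ", what an undeclared attribute should keep :)
  local:normalize-tokenized(" bar ")  (: "bar", what the server appears to store :)
)
```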

I also confirmed this behavior with a simple XML file load.

David S.

On Mon, 10 Mar 2008, Eliot Kimber wrote:

When storing some DITA documents in MarkLogic, I discovered that the trailing
spaces in the DITA class= attribute values are not preserved. I created a simple
test and got the same behavior, e.g.:

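(: query 1: store an in-memory node as test.xml :)
xdmp:document-insert("test.xml", <root foo=" bar "/>)

(: query 2, run separately: read the attribute value back :)
<result>{string(doc("test.xml")/root/@foo)}</result>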

Returns:

<result>bar</result>

not

<result> bar </result>

The DITA standard requires the trailing spaces in the class= values because
the value is a sequence of blank-delimited tokens, and you need to be able to
match on " {token} " so you don't get false positives. For example, the
"topic" type is the token "topic/topic"; without the spaces, a search for
"topic/topic" would also match "mytopic/topic-foo", which would be bad.
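A small XQuery sketch of why the stripped value breaks DITA-aware code. DITA processors commonly test class membership with a padded substring check, e.g. contains(@class, ' topic/topic '), which fails once the trailing space is gone. The second, hypothetical variant pads the stored value itself, so it keeps working either way. Both function names are illustrative only.

```xquery
xquery version "1.0";

(: The usual DITA idiom: look for " {token} " with a space on each side. :)
declare function local:padded-match($class as xs:string, $token as xs:string)
  as xs:boolean
{
  contains($class, concat(' ', $token, ' '))
};

(: Hypothetical defensive variant: pad the stored value too, so the test
   still works if the trailing space was stripped on load. :)
declare function local:padded-match-safe($class as xs:string, $token as xs:string)
  as xs:boolean
{
  contains(concat(' ', normalize-space($class), ' '), concat(' ', $token, ' '))
};

(
  local:padded-match("- topic/topic ", "topic/topic"),           (: true :)
  local:padded-match("- topic/topic", "topic/topic"),            (: false, trailing space lost :)
  local:padded-match-safe("- topic/topic", "topic/topic"),       (: true :)
  local:padded-match-safe("- mytopic/topic-foo", "topic/topic")  (: false, still no false positive :)
)
```

This only helps code you control; third-party DITA processors will still use the first form.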

My question is: is this behavior unalterable or is it configurable?

This behavior does make it impossible to use DITA documents stored in
MarkLogic with any normal DITA-aware processor (because they all expect there
to be a trailing space in the class= value) without some serious workaround
(essentially a post-fetch fixup to add back in the trailing space).
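The post-fetch fixup could look roughly like the following recursive copy in standard XQuery 1.0. It is a sketch, not a tested workaround: local:fix-class is a hypothetical name, and it assumes the in-memory copy it builds keeps the attribute value exactly as constructed.

```xquery
xquery version "1.0";

(: Hypothetical post-fetch fixup: copy a node tree, re-adding the trailing
   space to any class attribute that lost it. Everything else is copied
   unchanged. :)
declare function local:fix-class($node as node()) as node()*
{
  typeswitch ($node)
    case document-node() return
      document { for $n in $node/node() return local:fix-class($n) }
    case element() return
      element { node-name($node) } {
        (: attributes first, then children, as element content requires :)
        (for $a in $node/@* return local:fix-class($a)),
        (for $n in $node/node() return local:fix-class($n))
      }
    case attribute(class) return
      attribute class {
        if (ends-with($node, ' ')) then string($node) else concat($node, ' ')
      }
    default return $node
};

local:fix-class(<topic class="- topic/topic"><title class="- topic/title">T</title></topic>)
```

Applied to a fetched document, e.g. local:fix-class(doc("some.xml")), each class value should come back ending in a space before the result is handed to a DITA processor.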

Cheers,

Eliot



_______________________________________________
General mailing list
[email protected]
http://xqzone.com/mailman/listinfo/general
