Tim, could you show a (shortened if you want, but complete) example of your query, how you invoke it, and how you are getting the results?
The behavior you describe is *probably* the serialization of XDM to text, as described here: http://www.w3.org/TR/xquery-30/#id-serialization or http://www.w3.org/TR/xslt-xquery-serialization/ (rather obtusely, until you learn how to decipher W3C specification documents).

A critical issue is that the conversion of bare "text nodes" to "text" is part of the serialization process, not part of node construction. Node construction with multiple children does *not* add newlines (it adds spaces - see below). Serialization *may* add newlines, depending on exactly how you are constructing your document, where you are sending the output, and what software and settings are used to eventually get it to where you see it.

I suspect you are outputting *only* text nodes, which means the result is a "Sequence of Nodes" and falls under (5.2.7 Serialization Feature), which is full of "may"s, "must"s and "implementation-defined". But in general most processors that produce text from XDM (the result of an XQuery or XSLT) are consistent and, if not told otherwise (via various output method declarations, command line overrides, API settings etc.), do this:

For every item in the result:
  - Convert that item to a "string" (using the serialization or atomization rules for that item)
  - Output that string followed by a newline

(It takes about 20 pages to distill to this.) In your case the critical part is whether you are producing a sequence of items or a single item that wraps a sequence. A sequence will be newline separated.

Why? Because text nodes are treated differently during element construction than they are by themselves. During element construction adjacent text nodes are combined (without any separation). From http://www.w3.org/TR/xquery-30/ , 3.9.1.3 Content:

"Adjacent text nodes in the content sequence are merged into a single text node by concatenating their contents, with no intervening blanks. After concatenation, any text node whose content is a zero-length string is deleted from the content sequence."

If you then serialize the element it won't have any extra spaces. BUT if your XQuery produces a sequence of values (strings, dates, nodes, whatever) then each item is individually serialized *during serialization* and, depending on the processor, likely newline separated.

Try this:

    <e>{ text{"a"}, text{"string"}, text{"is"}, text{"here"} }</e>

You should get something like this:

    <e>astringishere</e>

It may vary depending on various settings, but it will NOT put a newline between the text nodes. The point here is that this is a ONE item result (an element).

Now try this:

    text{"a"}, text{"string"}, text{"is"}, text{"here"}

You should get something like this:

    a
    string
    is
    here

Note: this is a sequence of FOUR items, each serialized and then followed by a NL.

While you're at it, you might as well discover that you probably don't need the text{} constructor, which creates *nodes*. If what you want is just strings then converting them to nodes is unnecessary, even if you want them as a child of an element. The rules for combining multiple strings (or other atomic values) are different; in this case they follow the element construction rules at http://www.w3.org/TR/xquery-30/#id-content (sec 3.9.1.3):

"For each adjacent sequence of one or more atomic values returned by an enclosed expression, a new text node is constructed, containing the result of casting each atomic value to a string, with a single space character inserted between adjacent values."

So try this:

    <e>{ "a", "string", "is", "here" }</e>

What do you get?

    <e>a string is here</e>

Different! ... and often baffling to people until they figure out what's going on. This holds true even if you extract the text back out of the node, like:

    <e>{ "a", "string", "is", "here" }</e>/string()

or

    <e>{ "a", "string", "is", "here" }</e>/node()

A way to double check is to count the results. How many values in the above? 4? Nope, 1.
    count(<e>{ "a", "string", "is", "here" }</e>/node())

gives 1, while

    count(("a", "string", "is", "here"))

gives 4. (Note I had to enclose the sequence in parentheses.)

Now if you don't create an element, and do this directly:

    "a", "string", "is", "here"

What do you get? There are no element constructor rules, so we're back to the serialization of multiple items, and you get

    a
    string
    is
    here

This is why concat and string-join (and in V7 and later the || operator) make a difference.

    "a" || "string" || "is" || "here"
    concat("a", "string", "is", "here")
    string-join(("a", "string", "is", "here"), "")

All produce 1 item (a string) with no separators. The same is true if you get fancy, like

    string-join(
      for $i in 1 to 1000
      return concat("a", "big", "runon", "string", "#", $i, string-join(("these", "are", "colon", "separated"), ":")),
      "")

But stick that in an element instead of string joining and it's a tad different:

    <e>{
      for $i in 1 to 1000
      return concat("a", "big", "runon", "string", "#", $i, string-join(("these", "are", "colon", "separated"), ":"))
    }</e>

Or outside an element:

    for $i in 1 to 1000
    return concat("a", "big", "runon", "string", "#", $i, string-join(("these", "are", "colon", "separated"), ":"))

All different, but once you get the rules it's quite predictable, and maybe even sane.

-----------------------------------------------------------------------------
David Lee
Lead Engineer
MarkLogic Corporation
[email protected]
Phone: +1 812-482-5224
Cell: +1 812-630-7622
www.marklogic.com

From: [email protected] [mailto:[email protected]] On Behalf Of Tim
Sent: Monday, September 01, 2014 12:24 PM
To: 'MarkLogic Developer Discussion'
Subject: [MarkLogic Dev General] How to force EOL characters when downloading a text file

Hi Folks,

I am extracting text from an XML file which can be downloaded by a user. The file extension is custom. To create the record I basically walk through the XML elements and generate the corresponding text, e.g.
text{“first line”}, text{“second line”}, …

When the user downloads the file, Windows-style line endings (CR-LF) are required, and I’m trying to determine how to force that: whether it is in the content type, the disposition, or merely in the way in which I add linefeeds to the generated text, e.g.

text{“first line”}, “&#13;”, “&#10;”, text{“second line”}, “&#13;”, “&#10;”, …

It seems that using the text{“”} directive adds the linefeed character to the generated text without explicitly adding CR-LF. Thanks for any help with this!

Tim M.
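
For the CR-LF question itself, the string-join rule from the reply above suggests one possible approach: build the whole record as a single string and write the line ends explicitly, so that no separators are left to the serializer. This is only a sketch under assumptions. The variable $lines and its two values are placeholders for the generated record text, and it assumes the result is delivered as plain text; an XML output method may escape the carriage return as a character reference instead of emitting it raw.

```xquery
(: Placeholder values standing in for the text generated from the XML record. :)
let $lines := ("first line", "second line")

(: "&#13;&#10;" is a carriage return followed by a line feed.
   string-join returns ONE string item, so the only line breaks
   in the result are the ones written here explicitly. :)
return string-join($lines, "&#13;&#10;")
```

Because the result is a single item, count() on it returns 1, which is a quick way to confirm that serialization has no item boundaries at which to add its own newlines.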
