I need to calculate offsets into the document XML of a Word (.docx) archive 
using two methods: counting characters in NSAttributedString’s interpretation, 
and counting them in the text nodes (etc.) of the document XML itself. The 
offsets have to match. They don’t, mostly because of the way the parser treats 
runs of space characters.

I can explain the need, but let’s keep this brief.

NSXMLDocument (Node, Element…) collapses runs of spaces in text nodes to 
single spaces. This is a problem, because the scholars who produced the Word 
source files learned in 1975 to double-space at the end of sentences. 
NSAttributedString keeps the multiple spaces as they are; thus the character 
counts diverge.

"what Socrates wanted.  Plato implies"
(two spaces) comes through as

"what Socrates wanted. Plato implies"
(one space).
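
To make the mismatch concrete, here is a minimal sketch that measures both 
strings in UTF-16 units (the units NSAttributedString's length is counted in) 
and finds where they first differ. The helper name firstDivergence is 
hypothetical, made up for this illustration.

```swift
import Foundation

// Hypothetical helper: index of the first UTF-16 unit at which the two
// strings differ, or nil if they are identical.
func firstDivergence(_ a: String, _ b: String) -> Int? {
    let ua = Array(a.utf16)
    let ub = Array(b.utf16)
    let shared = min(ua.count, ub.count)
    for i in 0..<shared where ua[i] != ub[i] {
        return i
    }
    // No difference in the shared prefix: they diverge only if one is longer.
    return ua.count == ub.count ? nil : shared
}

let rendered = "what Socrates wanted.  Plato implies"  // two spaces
let parsed = "what Socrates wanted. Plato implies"     // one space

// NSAttributedString reports its length in UTF-16 units.
let renderedLength = NSAttributedString(string: rendered).length
let parsedLength = parsed.utf16.count

print(renderedLength, parsedLength)                   // 36 35
print(firstDivergence(rendered, parsed) ?? -1)        // 22
```

Every offset past the first collapsed run is off by one, and the error grows 
with each further double space.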

Already tried:

* Passing the NSXMLDocumentTidyXML option to NSXMLDocument(data:, options:) 
takes care of single-space elements, but not this.

* NSXMLNodePreserveWhitespace sounds useful, but makes no difference.

* The nodes themselves already have the attribute `xml:space="preserve"`.

* Intercepting every `<w:t/>` element and _forcing_ `xml:space="preserve"`, 
need it or not, makes no difference.

* If there’s a way for my code, as an NSXMLDocument (etc.) client, to examine 
the source text before filtering, I haven’t found it.

* I assume it doesn’t matter that I’m working in Swift.
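
On the point about examining the source text before any filtering: one avenue 
I have not ruled out is the event-based parser, which hands character data to 
a delegate instead of building a tree. The following is only a sketch under 
assumptions, not a confirmed fix. It uses the current Swift names (XMLParser, 
XMLParserDelegate; NSXMLParser in older SDKs), and the names TextRunCollector 
and collectRunText are hypothetical. It assumes the parser delivers the 
contents of `<w:t>` unmodified, which should be checked against real files.

```swift
import Foundation
#if canImport(FoundationXML)
import FoundationXML  // XMLParser lives here on non-Apple platforms
#endif

// Hypothetical collector: concatenates the character data found inside
// every <w:t> element, in document order.
final class TextRunCollector: NSObject, XMLParserDelegate {
    private(set) var text = ""
    private var insideTextRun = false

    func parser(_ parser: XMLParser, didStartElement elementName: String,
                namespaceURI: String?, qualifiedName qName: String?,
                attributes attributeDict: [String: String] = [:]) {
        // Namespace processing is off by default, so elementName is the
        // qualified name exactly as written in the source.
        if elementName == "w:t" { insideTextRun = true }
    }

    func parser(_ parser: XMLParser, didEndElement elementName: String,
                namespaceURI: String?, qualifiedName qName: String?) {
        if elementName == "w:t" { insideTextRun = false }
    }

    func parser(_ parser: XMLParser, foundCharacters string: String) {
        // May be called several times for one text node; append each piece.
        if insideTextRun { text += string }
    }
}

// Hypothetical entry point: returns the collected text, or nil if parsing fails.
func collectRunText(from data: Data) -> String? {
    let collector = TextRunCollector()
    let parser = XMLParser(data: data)
    parser.delegate = collector
    return parser.parse() ? collector.text : nil
}

let sample = "<w:document xmlns:w=\"http://schemas.openxmlformats.org/wordprocessingml/2006/main\">"
    + "<w:body><w:p><w:r><w:t xml:space=\"preserve\">"
    + "what Socrates wanted.  Plato implies"
    + "</w:t></w:r></w:p></w:body></w:document>"

let collected = collectRunText(from: Data(sample.utf8))
print(collected?.utf16.count ?? -1)
```

If the count printed for the sample matches NSAttributedString's length for 
the same sentence, the offsets could be computed from the delegate callbacks 
rather than from the NSXMLDocument tree.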

Ideas?

Further details can be found on Stack Overflow at 
http://stackoverflow.com/questions/33770055/nsxmldocument-family-runs-of-whitespace-collapsed-to-one
 


        — F


_______________________________________________

Cocoa-dev mailing list (Cocoa-dev@lists.apple.com)
