Commenting inline, ...
-----Original Message-----
From: Peter Kelly [mailto:[email protected]]
Sent: Monday, July 6, 2015 04:41
To: [email protected]
Subject: Re: Word round trip issue? And round trip in general.
[ ... ]
This is one of the few instances where we actually completely replace something
in the docx file every time it is modified. The OPC specifies a set of XML
files that indicate relationships between different “parts” (i.e. files) in a
package. They’re used as an alternative to path names (I don’t know why, it
seems unnecessary, but that’s how it’s done in OOXML).
<orcmid>
The OPC structure allows the interdependencies among package
parts to be known and managed without understanding the files
that have the dependencies. Also, the OPC model is not
limited to Zip implementations, so there is the prospect that
packages could be mapped and represented on a server (for
example) in quite different ways. Pull/push processing was
thought to be aided by having the dependencies at the package
level and by subdividing parts for more efficient interleaved
access.
Most of this is not used for OOXML, AFAIK, but the OPC design
allowed for it.
PS: The Office 2016 implementations are supporting concurrent
shared editing when the documents are in a Microsoft cloud
service, such as OneDrive, and that makes the server-side
storage and protocols for its access interesting too. I have
no idea what they are, just that MSFT is moving rapidly to
enable this sort of thing.
</orcmid>
I think there are two likely possibilities:
1. OpenOffice is too strict in what it accepts from the OPC relationship files,
and handles only a subset of possible valid relationships (presumably whatever
MS Office writes out).
2. Corinthia is too liberal in writing out the relationships, in that it does
so in a way that, while accepted by MS Office and some other apps, isn’t
strictly in accordance with the spec.
I suspect it’s likely the former, but I’m not infallible and it could be the
latter ;)
If you unzip a .docx file, have a look at the files in _rels and word/_rels -
these are the OPC files that would differ and are likely what OO for whatever
reason is struggling with.
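As a quicker alternative to unzipping by hand, here is a minimal Python
sketch that lists every relationship part in a package. It assumes a
Zip-based package and uses only the standard library. The function name
list_relationships is made up for illustration and is not part of
Corinthia or any OPC library.

```python
import zipfile
import xml.etree.ElementTree as ET

# XML namespace used by OPC relationship parts (the *.rels files).
RELS_NS = "http://schemas.openxmlformats.org/package/2006/relationships"


def list_relationships(docx):
    """Return {rels_part_name: [(Id, Type, Target, TargetMode), ...]}.

    `docx` is a path or a binary file-like object holding a Zip package.
    Hypothetical helper for inspection only; it does no validation.
    """
    result = {}
    with zipfile.ZipFile(docx) as package:
        for name in package.namelist():
            # Relationship parts live in _rels folders and end in .rels,
            # e.g. _rels/.rels and word/_rels/document.xml.rels.
            if not name.endswith(".rels"):
                continue
            root = ET.fromstring(package.read(name))
            rels = []
            for rel in root.findall(f"{{{RELS_NS}}}Relationship"):
                rels.append((
                    rel.get("Id"),
                    rel.get("Type"),
                    rel.get("Target"),
                    # TargetMode is optional; "Internal" is the default.
                    rel.get("TargetMode", "Internal"),
                ))
            result[name] = rels
    return result
```

Calling list_relationships("some.docx") and printing the result for the
file before and after the round trip should show where the two differ.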
<orcmid>
It is good to look at the OPC specification. This is part
of the OOXML specification although it is designed for
independent use, and it is so used. The easy way to get
the spec is to download ECMA-376, latest edition, Part 2.
See <http://www.ecma-international.org/publications/standards/Ecma-376.htm>.
The original format (called Metro) was in fact for very
large final-form documents that were amenable to access
by pull requests from high-end publishing engines (the
format that later became known as XPS).
There are free-standing implementations of OPC processors,
including one in C on SourceForge. A .NET version has
been open-sourced and I assume that there is a Java version
at Apache POI. I can't speak to their quality. I can't
speak to the quality of the OpenOffice processing of
OPC-carried OOXML either.
These might be the basis for tests and they might also be
useful sources of ideas for how to disassemble and reassemble
OOXML documents.
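As one possible shape for such a test, here is a rough Python sketch
that compares the relationships of two packages, for example a
document as written by MS Office and the same document after a round
trip. It assumes Zip-based packages and uses only the standard
library. The names relationship_set and diff_relationships are made
up for illustration. Ids are ignored on the assumption that a writer
may renumber them, and targets are compared as literal strings, so
two spellings that resolve to the same part will show up as a
difference. That may be exactly the kind of thing worth looking at.

```python
import zipfile
import xml.etree.ElementTree as ET

# XML namespace used by OPC relationship parts (the *.rels files).
RELS_NS = "http://schemas.openxmlformats.org/package/2006/relationships"


def relationship_set(docx):
    """Collect (rels_part, Type, Target, TargetMode) tuples from a package.

    Hypothetical helper; relationship Ids are deliberately left out.
    """
    found = set()
    with zipfile.ZipFile(docx) as package:
        for name in package.namelist():
            if not name.endswith(".rels"):
                continue
            root = ET.fromstring(package.read(name))
            for rel in root.iter(f"{{{RELS_NS}}}Relationship"):
                found.add((
                    name,
                    rel.get("Type", ""),
                    rel.get("Target", ""),
                    # TargetMode is optional; "Internal" is the default.
                    rel.get("TargetMode", "Internal"),
                ))
    return found


def diff_relationships(before, after):
    """Return (only_in_before, only_in_after) as sorted lists."""
    a = relationship_set(before)
    b = relationship_set(after)
    return sorted(a - b), sorted(b - a)
```

Two empty lists mean the relationships match under this literal
comparison. It says nothing about whether either file conforms to
the spec.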
PS: There have been some changes in OPC, as there have been
in the OOXML specifications, so you may have to distinguish
documents that honor older specifications and others that
reflect breaking changes.
</orcmid>
[ ... ]