OOXML

2014-08-01 Thread Rory O'Farrell

For information:
http://www.themukt.com/2014/07/31/never-use-microsofts-ooxml-format/

-- 
Rory O'Farrell 

-
To unsubscribe, e-mail: dev-unsubscr...@openoffice.apache.org
For additional commands, e-mail: dev-h...@openoffice.apache.org



OOXML documentation

2014-07-24 Thread Peter Kelly
Hi,

I've begun writing up some documentation on the OOXML file format on the wiki:

https://wiki.openoffice.org/wiki/OOXML

The new content is what is linked from the first section; it is currently 
limited to a description of the packaging format, extensibility features, and 
a brief introduction to WordprocessingML. This is just the beginning, and there's a 
*lot* more to be covered, which will happen over the coming weeks (months?).
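
As a rough illustration of the packaging format mentioned above: an OOXML file is a ZIP package whose parts are XML files. The sketch below is not taken from the wiki pages; it builds a deliberately tiny, hypothetical package in memory (a real .docx contains more parts, such as relationship files) and then lists its parts and reads the paragraph text.

```python
import io
import zipfile
import xml.etree.ElementTree as ET

W_NS = "http://schemas.openxmlformats.org/wordprocessingml/2006/main"
CT_NS = "http://schemas.openxmlformats.org/package/2006/content-types"

# Hypothetical minimal parts, for illustration only.
CONTENT_TYPES = (
    f'<Types xmlns="{CT_NS}">'
    '<Override PartName="/word/document.xml" ContentType="application/'
    'vnd.openxmlformats-officedocument.wordprocessingml.document.main+xml"/>'
    '</Types>'
)
DOCUMENT = (
    f'<w:document xmlns:w="{W_NS}"><w:body>'
    '<w:p><w:r><w:t>Hello</w:t></w:r></w:p>'
    '</w:body></w:document>'
)


def build_package():
    """Write the two sample parts into an in-memory ZIP package."""
    buf = io.BytesIO()
    with zipfile.ZipFile(buf, "w") as z:
        z.writestr("[Content_Types].xml", CONTENT_TYPES)
        z.writestr("word/document.xml", DOCUMENT)
    return buf.getvalue()


def list_parts(data):
    """Return the names of all parts stored in the package."""
    with zipfile.ZipFile(io.BytesIO(data)) as z:
        return sorted(z.namelist())


def paragraph_texts(data):
    """Concatenate the w:t runs of each w:p in the main document part."""
    with zipfile.ZipFile(io.BytesIO(data)) as z:
        root = ET.fromstring(z.read("word/document.xml"))
    return [
        "".join(t.text or "" for t in p.iter(f"{{{W_NS}}}t"))
        for p in root.iter(f"{{{W_NS}}}p")
    ]
```

Calling `list_parts(build_package())` shows the two parts, and `paragraph_texts` recovers the text of the single paragraph.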

I've deliberately avoided discussing details of any particular implementation, 
so that it's a general description that is hopefully of use to anyone writing 
software to deal with the format. I will however be releasing my own 
implementation (which converts to/from HTML) once I've completed the port to 
Linux (which is close to completion).

If anyone else who has experience working with the format would like to 
contribute as well, that would be great. My expertise is limited to the word 
processing aspects only, as I've never worked with the spreadsheet or 
presentation formats.

--
Dr. Peter M. Kelly
Founder, UX Productivity
pe...@uxproductivity.com
http://www.uxproductivity.com/
http://www.kellypmk.net/

PGP key: http://www.kellypmk.net/pgp-key
(fingerprint 5435 6718 59F0 DD1F BFA0 5E46 2523 BAA1 44AE 2966)





Re: OOXML

2014-08-02 Thread Peter Kelly
On 1 Aug 2014, at 2:42 pm, Rory O'Farrell  wrote:

> For information:
> http://www.themukt.com/2014/07/31/never-use-microsofts-ooxml-format/

An interesting article. This brings to mind a few issues I've been thinking 
about for a while:

- I think the rather extreme anti-OOXML stance that some take can be 
counterproductive. I certainly hold the view that ODF is a superior standard in 
many respects (though not all); however, there are circumstances where it makes 
sense for a given piece of software to support both. For example, they cite the 
lack of support for ODF in Google Docs and iWork; anyone who wants to develop 
software that interoperates with these will need OOXML support.

My take on the issue is that it's important to support both, because as much as 
we might dislike the fact, OOXML is out there and used very widely. With the 
work I'm currently doing on UX Write, I'm extending the existing OOXML 
(specifically .docx) support with support for ODF (.odt), and doing this in 
a common framework such that the app itself doesn't care which format the file 
is natively stored in; it will work equally well with both. Additionally, once 
I have the ODF support in, it will be possible to leverage this support for 
conversion between the two formats in both directions. I'll be giving a talk on 
this at ApacheCon EU later this year, and yes this framework will soon be open 
source - if anyone is interested in collaborating on it, please let me know.

- One of the criticisms raised is that there are several different versions of 
OOXML, not all of which are entirely compatible. However, this is also true of 
ODF (or at least of MS's implementation in Office 2007 and 2010; I'm not sure 
where the fault lies). One of the big questions I've been asking myself in my 
current work on ODF is whether my implementation should save 
ODF 1.1 by default, or version 1.2 by default. If I choose the former, it will 
work with Office 2007 and onwards. The latter, only Office 2013 (I think). For 
someone such as myself writing a new implementation of (part of) the ODF spec, 
and desiring compatibility with Office 2007 and 2010, which is the best choice?
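
To make the 1.1 versus 1.2 question concrete: an ODF file declares its version in the `office:version` attribute on the root element of its XML parts. The following is only a sketch under that assumption; `make_odt` is a hypothetical helper that writes a stripped-down package (no manifest, no styles), which is enough to show where the version lives but is not a complete ODF document.

```python
import io
import zipfile
import xml.etree.ElementTree as ET

OFFICE_NS = "urn:oasis:names:tc:opendocument:xmlns:office:1.0"


def make_odt(version):
    """Hypothetical helper: build a stripped-down .odt declaring `version`."""
    content = (
        f'<office:document-content xmlns:office="{OFFICE_NS}" '
        f'office:version="{version}">'
        '<office:body><office:text/></office:body>'
        '</office:document-content>'
    )
    buf = io.BytesIO()
    with zipfile.ZipFile(buf, "w") as z:
        z.writestr("mimetype", "application/vnd.oasis.opendocument.text")
        z.writestr("content.xml", content)
    return buf.getvalue()


def odf_version(data):
    """Return the office:version declared in content.xml, or None if absent."""
    with zipfile.ZipFile(io.BytesIO(data)) as z:
        root = ET.fromstring(z.read("content.xml"))
    return root.get(f"{{{OFFICE_NS}}}version")
```

A save routine could take the target version as a parameter and default it to whichever value gives the compatibility needed; which default is right is exactly the open question above.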

- I consider the use of proprietary fonts to be a separate issue from the 
standard itself. The specification is silent on the matter, so this is really a 
criticism of MS Office rather than OOXML itself. Nonetheless, it's an important 
one, and one I believe we should address by promoting the use of open source 
fonts (e.g. https://www.google.com/fonts) independently of, and in addition to, the 
use of ODF. Perhaps these could be made available as an easily-distributed 
separate package, so that those who want to stick with MS Office for whatever 
reason could be encouraged to install & use them, for improved interoperability 
with other office suites?

In an organisation where there are some users on MS and others on OO/LO, these 
fonts could be deployed by the IT department as part of the standard desktop 
image, and all templates created by the organisation could use these fonts by 
default, which would lead to wider usage.
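
One way an IT department might audit templates for this is to scan the WordprocessingML for font references. This is a sketch only: `OPEN_FONTS` is a hypothetical allow-list, the sample markup is made up, and only the four font-name attributes of `w:rFonts` in the main document part are checked (styles and themes are ignored).

```python
import xml.etree.ElementTree as ET

W_NS = "http://schemas.openxmlformats.org/wordprocessingml/2006/main"

# Hypothetical allow-list of openly licensed fonts.
OPEN_FONTS = {"Liberation Serif", "Open Sans"}

# Attributes of w:rFonts that carry font names.
FONT_ATTRS = [f"{{{W_NS}}}{name}" for name in ("ascii", "hAnsi", "cs", "eastAsia")]

SAMPLE = (
    f'<w:document xmlns:w="{W_NS}"><w:body>'
    '<w:p><w:r><w:rPr><w:rFonts w:ascii="Calibri" w:hAnsi="Calibri"/></w:rPr>'
    '<w:t>A</w:t></w:r></w:p>'
    '<w:p><w:r><w:rPr><w:rFonts w:ascii="Open Sans"/></w:rPr>'
    '<w:t>B</w:t></w:r></w:p>'
    '</w:body></w:document>'
)


def referenced_fonts(xml_text):
    """Collect every font name referenced by a w:rFonts element."""
    fonts = set()
    for rfonts in ET.fromstring(xml_text).iter(f"{{{W_NS}}}rFonts"):
        for attr in FONT_ATTRS:
            value = rfonts.get(attr)
            if value:
                fonts.add(value)
    return fonts


def non_open_fonts(xml_text):
    """Fonts used in the document that are not on the allow-list."""
    return sorted(referenced_fonts(xml_text) - OPEN_FONTS)
```

For the sample above, `non_open_fonts(SAMPLE)` flags only Calibri.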

- Towards the end of the article, there's a discussion about the lack of 
support for ODF by some vendors, particularly Google and Apple. The question 
then is how do we fix that? My view is that there needs to be a migration path 
- and by that I mean not just a tool to convert documents from OOXML to ODF, 
but the ability to go both ways, and work with either format for as long as 
necessary for the migration to complete. Most (all?) successful transitions 
I've seen have used a similar approach - Microsoft going from DOS to Windows, 
Apple going from 68k -> PPC -> Intel, and Mac OS classic -> OS X, and so forth.

In the case of document formats, for a country whose government currently uses 
MS Office and OOXML that wants to make the switch to ODF and 
OpenOffice/LibreOffice/other tools, it's not going to be an overnight change. 
It could very well take several years, and during that period everyone in the 
organisation will need to have the capability to work with both formats. New or 
modified documents would in general be saved in ODF, but older documents as 
well as documents that need to be exchanged with people running MS Office 2007 
or 2010 (which I think don't support ODF 1.2) would need to be in OOXML, until 
such time as everyone has upgraded to a fully-conformant version of MS Office, 
or switched to OpenOffice et al.

--
Dr. Peter M. Kelly
Founder, UX Productivity
pe...@uxproductivity.com
http://www.uxproductivity.com/
http://www.kellypmk.net/

PGP key: http://www.kellypmk.net/pgp-key
(fingerprint 5435 6718 59F0 DD1F BFA0 5E46 2523 BAA1 44AE 2966)





Re: OOXML

2014-08-02 Thread Alexandro Colorado
The support that exists is for receiving OOXML, not producing it; the
real question would be whether to support legacy formats like .doc or .xls.

I still don't see the point of generating OOXML, and most people don't care
as long as they can send files in Office's native formats.

I have never heard anyone say "please send it as .docx, your .doc is a
closed binary format".


Re: OOXML

2014-08-02 Thread Louis Suárez-Potts

> On 2014-08-02, at 10:24, Alexandro Colorado  wrote:
> 
> The Support that is done is to receieve OOXML not to produce them, the
> discussion issue would be to support legacy formats like .doc or .xls.
> 
> I still dont see a point to generate OOXML and most people dont care
> as long as they can send in office native formats.
> 
> I never heard someone saying, please send it on docx, your doc is a
> closed binary format.

Actually, I have. But it also matters on mobile, as well as, I'd guess, for 
some developing processes for batch conversion of documents. Finally, it's not 
evident to me that refusing to develop to what is likely to become the major 
desktop document format globally—alas—is a good strategy that would lead to the 
adoption of OO. Rather, it seems it would only help those applications that do 
(express) both ODF *and* .docx well.

louis



Re: OOXML

2014-08-02 Thread jan i
On 2 August 2014 17:06, Louis Suárez-Potts  wrote:

>
> > On 2014-08-02, at 10:24, Alexandro Colorado  wrote:
> >
> > The Support that is done is to receieve OOXML not to produce them, the
> > discussion issue would be to support legacy formats like .doc or .xls.
> >
> > I still dont see a point to generate OOXML and most people dont care
> > as long as they can send in office native formats.
> >
> > I never heard someone saying, please send it on docx, your doc is a
> > closed binary format.
>
> Actually, I have. But it also matters on mobile, as well as, I'd guess,
> for some developing processes for batch conversion of documents. Finally,
> it's not evident to me that refusing to develop to what is likely to become
> the major desktop document format globally—alas—is a good strategy that
> would lead to the adoption of OO. Rather, it seems it would only help those
> applications that do (express) both ODF *and* .docx well.
>

Please don't forget that the computer business has always had two types of
standard: the official one and the de facto one.

For those too young to remember, TCP/IP is not an official standard (OSI
was) but something a number of companies decided to promote. I see docx in
the same light.

rgds
jan I



Re: OOXML

2014-08-02 Thread Peter Kelly
On 2 Aug 2014, at 9:24 pm, Alexandro Colorado  wrote:

> The Support that is done is to receieve OOXML not to produce them, the
> discussion issue would be to support legacy formats like .doc or .xls.
> 
> I still dont see a point to generate OOXML and most people dont care
> as long as they can send in office native formats.
> 
> I never heard someone saying, please send it on docx, your doc is a
> closed binary format.

I (and I suspect I'm not alone) see the inability to 1) save OOXML 
documents and 2) do so while preserving all elements, including unsupported 
features and Microsoft-only data, as the #1 limitation of OpenOffice 
today. The fact is, OOXML is in practice extremely widely used (vastly more so 
than ODF) and I argue that if OpenOffice is to have any relevance going forward 
it must support it, and support it well.

The migration path in particular, which I mentioned previously, is not just 
about importing files but enabling a period of a number of years during which 
an organisation can effectively work with a mixture of OOXML and ODF documents. 
This allows the transition to be done incrementally - a company with 30,000 
employees will only migrate if there's a way they can do so bit-by-bit, with 
some departments sticking with OOXML for longer than others. Because there will 
be people in different departments that need to work together, those who insist 
on remaining with OOXML for the time being must be able to collaborate in both 
directions with those who have switched for all their other documents.

It's the same situation as the transition Microsoft made from the old binary 
formats to OOXML - Office 2007 (and all later versions) still support the older 
formats, for both read and write, and I expect they will continue for some 
time. If Office 2007 had completely dropped support for saving .doc, .xls, and 
.ppt, it would have been dead-on-arrival, as it took several years before most 
people were saving in the newer format by default.

Now there is still the question of how OpenOffice could go about supporting 
these formats. There is already an import filter which sort-of works (though I 
had to direct a customer to LibreOffice the other day as they were having 
trouble opening a perfectly-valid .docx using OpenOffice). This could be left 
in place, with fixes where necessary, and a new export filter written for 
saving. The problem with this however is that import/export is inherently a 
lossy process; if there is any information within a document that is not 
supported by OO or the filters, then it will be lost after an open/save. This 
information could also include proprietary extension data that is supported by 
Office which there is no way to interpret since its format is not published 
(macros, I believe, are an example of this).
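
The loss described above can be shown with a toy example. This is a sketch with made-up element names, not OpenOffice's filter code: the hypothetical importer keeps only what the application understands (paragraph text), so anything else is gone once the model is exported again.

```python
import xml.etree.ElementTree as ET

# Made-up source document: two paragraphs plus an element the app does not know.
SOURCE = "<doc><p>Hello</p><chart kind='pie'/><p>World</p></doc>"


def import_model(xml_text):
    """Import filter: keep only paragraph text in the internal model."""
    return [p.text or "" for p in ET.fromstring(xml_text).iter("p")]


def export_model(paragraphs):
    """Export filter: write a fresh document from the internal model alone."""
    root = ET.Element("doc")
    for text in paragraphs:
        ET.SubElement(root, "p").text = text
    return ET.tostring(root, encoding="unicode")


round_tripped = export_model(import_model(SOURCE))
```

After the open/save cycle the `chart` element no longer exists, even though the user never touched it.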

The approach I took with UX Write was to use bidirectional transformation [1], 
which ensures updates happen in a non-destructive manner. When you open a .docx 
file in UX Write, it converts it into HTML, and keeps track of information that 
allows it to map each HTML element back to the original XML element in the 
.docx file from which it was generated. When you save the file, instead of 
overwriting it with a new version, it *updates* the existing version by 
figuring out what changes have occurred in the HTML document, and applying 
those changes to the original .docx file. This way, only the parts that the 
user has actually modified are touched; anything UX Write doesn't know about 
(e.g. embedded spreadsheets) is left untouched. I'm planning to use the same 
design for ODF.
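
The idea can be sketched in a few lines. This is not UX Write's code; it is a minimal illustration with made-up element names, and it only handles text edits to existing paragraphs (no insertions or deletions). The forward step builds an editable view plus a map back to the source elements; the backward step writes changes into the original tree instead of regenerating it.

```python
import xml.etree.ElementTree as ET

# Made-up source document with an element the editor does not understand.
SOURCE = "<doc><p>Hello</p><chart kind='pie'/><p>World</p></doc>"


def get_view(source_root):
    """Forward direction: editable view plus a map back to source elements."""
    view, origin = [], {}
    for index, p in enumerate(source_root.iter("p")):
        key = f"p{index}"
        view.append((key, p.text or ""))
        origin[key] = p
    return view, origin


def put_view(origin, edited_view):
    """Backward direction: apply only the changed text to the source tree."""
    for key, text in edited_view:
        element = origin[key]
        if (element.text or "") != text:
            element.text = text


root = ET.fromstring(SOURCE)
view, origin = get_view(root)

# The user edits one paragraph in the view.
edited = [(key, "Goodbye" if text == "Hello" else text) for key, text in view]
put_view(origin, edited)
updated = ET.tostring(root, encoding="unicode")
```

Because the original tree is updated in place, the `chart` element survives the save untouched, in contrast to a plain import/export round trip.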

Crucially, this meant that I was able to implement support for OOXML (well, 
specifically the WordprocessingML part of it) in an incremental fashion. First 
there was only support for editing text; then came basic formatting, then 
lists, tables, styles etc. Even today, my implementation doesn't have support 
for the complete feature set, but it is nonetheless able to "walk lightly" in 
editing the document, by not touching anything that isn't supported. This ties 
back to the migration path I mentioned above, where there is a need to 
interoperate with people using OOXML for some period of time, assuming 
they're eventually led towards using only ODF.

I'd be keen to hear any thoughts others have on this issue, in the sense of how 
best to tackle it within OpenOffice.

I recommend having a look at the slides linked to below, which give a great 
introduction to what bidirectional transformation is and how it works. A ton 
of research has been done on this in the past, and I think it's ideal 
for dealing with different document formats, particularly when a given app 
treats a particular format as "native" (HTML in the case of UX Write, ODF in 
the case of OpenOffice). With this approach, we could bypass an entire class of 

RE: OOXML

2014-08-02 Thread Dennis E. Hamilton
Comments inline below.


-Original Message-
From: jan i [mailto:j...@apache.org] 
Sent: Saturday, August 2, 2014 08:57
To: dev
Subject: Re: OOXML

On 2 August 2014 17:06, Louis Suárez-Potts  wrote:

>
> > On 2014-08-02, at 10:24, Alexandro Colorado  wrote:
> >
> > The Support that is done is to receieve OOXML not to produce them, the
> > discussion issue would be to support legacy formats like .doc or .xls.
> >
> > I still dont see a point to generate OOXML and most people dont care
> > as long as they can send in office native formats.
> >
> > I never heard someone saying, please send it on docx, your doc is a
> > closed binary format.
>
> Actually, I have. But it also matters on mobile, as well as, I'd guess,
> for some developing processes for batch conversion of documents. Finally,
> it's not evident to me that refusing to develop to what is likely to become
> the major desktop document format globally—alas—is a good strategy that
> would lead to the adoption of OO. Rather, it seems it would only help those
> applications that do (express) both ODF *and* .docx well.
>

Please dont forget, the computer business have always had 2 types of
standard the official one and the de facto one.

For those to young to remember, tcp/ip is not an official standard (OSI
was) but something a number of companies decided to promote, I see docx in
the same light.


   I think this has it backwards.  For ages, .doc was the de facto standard,
   and de jure ISO/W3C standards like SGML, ODA, and even XML did not do
   anything to dent that.  That role is now held by .doc and .docx, however
   de facto you consider them to be (although both are now open formats).

   I am squarely in the same camp as Peter Kelly and Louis Suárez-Potts
   with regard to the pragmatic situation that exists.  One-way
   movement to ODF is simply going to be unacceptable, possibly forever,
   if you are determined to have "there must be only one" in a niche of
   like-minded followers.

   This is unfortunate for one particular reason -- ODF is the only
   well-established multi-platform document format, thanks to the wider
   platform support of LibreOffice and Apache OpenOffice.  (Those also
   introduce de facto and monoculture factors that are omitted in the
   marketing speak.)

   But without a dramatic increase in Linux penetration, this may not dent
   the state of affairs much.  The bigger penetration opportunity is iOS
   and Android, not Linux.  And you may have noticed that Microsoft has
   figured that out and is moving dramatically to provide OOXML
   interoperability via the cloud (especially Sky-/One-Drive and Office Web
   Apps) and via phone/phablet/tablet presence on Windows 8, WindowsPhone8,
   Android (including the Amazon flavor), and iOS.  There are even
   provisions for concurrent collaboration, already strong in the
   flag-carrying application, Microsoft OneNote, which uses an
   openly-documented but not-standardized format.

   The last time I checked, the OneDrive free in-browser Office Web Apps
   also support ODF 1.2 documents, although they will convert them to an
   MSO-compatible cloud subset form if you want to edit them there, even
   though they remain retrievable in ODF 1.2.  Viewing works out of the box.
   My impression of the editing pre-conversion is that it is a safety measure
   in case any ODF feature loss is unacceptable, so that you still have an
   intact original there.






Re: OOXML

2014-08-02 Thread jan i

I too am on Peter's fast-rolling wagon :-) but I am also confused.

@peter maybe you could explain a couple of things, for non-document
specialists:

1) Following your thought, with bidirectional editors, why would an editor
have a home format?

Following your thought to the end, the editor would always read and save in the
document's own format, and things not supported in that format would be saved
as private.

2) When editing in format foo, one can expect that not all features are
supported (e.g. Microsoft macros); these are handled as private
containers.

But looking at LO, there seem to be huge challenges, especially when doing
copy/paste operations?

3) If we save private info in .docx, how can we be sure that a Microsoft
editor does not destroy it?

Does the standard contain some rules about keeping private information ?

thanks in advance
jan I.



RE: OOXML

2014-08-02 Thread Dennis E. Hamilton
In line with the sketch that Peter Kelly provides below, I am personally very 
sympathetic to the idea of having an internal model that can tolerate 
difference in format between input and output while preserving in the output 
everything from the input format it can, even by leaving markers that will be 
useful on future input of the produced form.  (There is a well-known case of 
Microsoft Office doing this for HTML it exports, although the added information 
for recovery of the MSO rendition led to many complaints about document bloat.)

There are some conflicts between the desire to do this and the fact that some 
alterations have non-local consequences and may have other effects.  I still 
support the idea, but there are some tricky cases, including

- Changes that overlap/conflict with tracked changes but tracked changes are 
not updated/preserved properly
- Accessibility impacts
- Digital signature applying to content not observable by the signer
- Covert content of various kinds
- Breaking of RDF/RDFa connections into the document (along with failure to 
preserve markers correctly)

The digital signature and covert-content avoidance cases work against 
preserving material that is not evident in a given application.  In the case of 
ODF, the damage to tracked changes is survivable (with some loss), because the 
ODF approach is resilient.  But not knowing about the tracked changes gets into 
the digital signature problem if the material is preserved while not being 
visible to the user.
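
The signature concern can be illustrated with a toy example. This is a sketch only: a plain SHA-256 digest over the document bytes stands in for a real digital signature (which would involve canonicalization and keys), and the element names are made up. Two documents that look identical in an application that shows only paragraphs still differ in content the signer never saw.

```python
import hashlib
import xml.etree.ElementTree as ET

# Made-up documents: same visible paragraph, different preserved-but-hidden content.
DOC_A = "<doc><p>Pay 100</p><unknown>note A</unknown></doc>"
DOC_B = "<doc><p>Pay 100</p><unknown>note B</unknown></doc>"


def visible_text(xml_text):
    """What a hypothetical application displays: paragraph text only."""
    return [p.text or "" for p in ET.fromstring(xml_text).iter("p")]


def digest(xml_text):
    """Stand-in for a signature: a digest over the whole document."""
    return hashlib.sha256(xml_text.encode("utf-8")).hexdigest()


same_view = visible_text(DOC_A) == visible_text(DOC_B)
same_digest = digest(DOC_A) == digest(DOC_B)
```

Here `same_view` is true while `same_digest` is false: whatever is signed covers material the user could not inspect.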

There is also a case around confusion between two consumers having to do with 
how image renditions in ODF are negotiated, with the consumer presenting the 
best that it recognizes that is not necessarily the preferable best that the 
producer listed in the choices it offered in the document.  This raises digital 
signature considerations as well.

I don’t think this should stop the kind of exploration Peter Kelly is embarked 
upon.  At some point, these considerations will surface and it will be 
interesting to see what a creative accommodation might be.

It's not clear to me that the openoffice.org descendants can do much about 
format ecumenicalism very quickly, if at all, so I have probably gotten pretty 
off-topic at this point.


 -- Dennis E. Hamilton
dennis.hamil...@acm.org  +1-206-779-9430
https://keybase.io/orcmid  PGP F96E 89FF D456 628A
X.509 certs used and requested for signed e-mail



From: Peter Kelly [mailto:kelly...@gmail.com] 
Sent: Saturday, August 2, 2014 09:43
To: dev@openoffice.apache.org
Subject: Re: OOXML

On 2 Aug 2014, at 9:24 pm, Alexandro Colorado  wrote:


The Support that is done is to receieve OOXML not to produce them, the
discussion issue would be to support legacy formats like .doc or .xls.

I still dont see a point to generate OOXML and most people dont care
as long as they can send in office native formats.

I never heard someone saying, please send it on docx, your doc is a
closed binary format.

I (and I suspect I'm not alone) see a lack of the ability to 1) Save OOXML 
documents and 2) Do so while preserving all elements, including unsupported 
features and Microsoft-only data as being the #1 limitation to OpenOffice 
today. The fact is, OOXML is in practice extremely widely used (vastly more so 
than ODF) and I argue that if OpenOffice is to have any relevance going forward 
it must support it, and support it well.

The migration path in particular, which I mentioned previously, is not just 
about importing files but enabling a period of a number of years during which 
an organisation can effectively work with a mixture of OOXML and ODF documents. 
This allows the transition to be done incrementally - a company with 30,000 
employees will only migrate if there's a way they can do so bit-by-bit, with 
some departments sticking with OOXML for longer than others. Because there will 
be people in different departments that need to work together, those who insist 
on remaining with OOXML for the time being must be able to collaborate in both 
directions with those who have switched for all their other documents.

It's the same situation as the transition Microsoft made from the old binary 
formats to OOXML - Office 2007 (and all later versions) still support the older 
formats, for both read and write, and I expect they will continue for some 
time. If Office 2007 had completely dropped support for saving .doc, .xls, and 
.ppt, it would have been dead-on-arrival, as it took several years before most 
people were saving in the newer format by default.

Now there is still the question of how OpenOffice could go about supporting 
these formats. There is already an import filter which sort-of works (though I 
had to direct a customer to LibreOffice the other day as they were having 
trouble opening a perfectly-valid .docx using OpenOffice). This could be left 
in place, with fixes where necessary, and a new export filter written for 
saving. The probl

RE: OOXML

2014-08-02 Thread Dennis E. Hamilton
Below, Jan asks

  "Does the standard contain some rules about keeping private information ?"

There are two cases for ODF 1.2.

First there is the case for foreign elements/attributes/attribute values.  This 
would be the case for some sort of extended material incorporated in the ODF 
document.  This makes a Conforming OpenDocument Document into an Extended 
OpenDocument Document.  A Conforming OpenDocument Consumer is permitted to 
ignore all of that, based on some rules about whether or not it occurs in 
(technically-defined) paragraph content or elsewhere in the format.  There can 
also be foreign content in the XML package of the document, where there is no 
recognized relationship of that content to anything in the document as seen by 
an ODF Consumer.  
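
To make the first case a little more concrete, here is a minimal Python sketch of how a consumer might spot candidate foreign material in an ODF XML part. It is only an approximation: ODF 1.2 defines foreign elements relative to its schema, whereas this sketch merely checks namespaces. The KNOWN_EXTRA set and the helper names (namespace_of, is_known, find_foreign) are assumptions made for illustration, not part of any implementation mentioned in this thread.

```python
import xml.etree.ElementTree as ET

# Assumption: anything under this URN prefix is treated as "defined by ODF".
ODF_PREFIX = "urn:oasis:names:tc:opendocument:xmlns:"
# Assumption: a few external namespaces that ODF documents commonly use.
KNOWN_EXTRA = {
    "http://www.w3.org/1999/xlink",
    "http://purl.org/dc/elements/1.1/",
}


def namespace_of(tag):
    """Return the namespace URI of an ElementTree '{uri}local' name, or ''."""
    return tag[1:].split("}", 1)[0] if tag.startswith("{") else ""


def is_known(ns):
    return ns.startswith(ODF_PREFIX) or ns in KNOWN_EXTRA


def find_foreign(xml_text):
    """List element and attribute names whose namespace is not recognised."""
    foreign = []
    for element in ET.fromstring(xml_text).iter():
        if not is_known(namespace_of(element.tag)):
            foreign.append(element.tag)
        for attr_name in element.attrib:
            ns = namespace_of(attr_name)
            # Unqualified attributes belong to their element; skip them.
            if ns and not is_known(ns):
                foreign.append(attr_name)
    return foreign
```

A consumer that ignores foreign material would simply skip whatever this reports; one that tries to preserve it would have to carry those nodes through its internal model.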

There are places where the preservation of such foreign material is recommended 
but not required.  Most implementations lose all content that they are not 
implemented to interpret.  Microsoft Office very definitely does that in its 
acceptance of OpenDocument Document files.  This happens mainly because the 
typical internal model doesn't preserve the original XML parts and it doesn't 
work by manipulation of the XML parts.  I suspect that Microsoft concerns about 
document security are also a factor, in addition to unwillingness to support 
features that are not part of the ODF specification.  (The position, as I 
understand it, is that they will support the standard, not OpenOffice's 
particular implementation around it, and I don't know how much flexibility 
there is in that respect.  That OpenOffice *is* the standard is a popular view 
that happens to be inconsistent with the principles of ISO or any 
standards-development organization that is committed to the ideal of 
independently-implemented interoperable implementations.)

The second case has to do with features of ODF that a particular implementation 
does not support.  In general, these do not survive in current implementations, 
since import into the internal model loses that material and there is 
consequently no provision for exporting it.  Here, there is the fact that there 
is no strict minimum Conforming OpenDocument Consumer.  A consumer must not 
object to anything in the document file that conforms to the ODF specification, 
but it is not required to "interpret" all or even any minimum set of features.  
There is no producer that I am aware of that produces all features provided for 
in the ODF specification, and most implementations only interpret those 
features that they are designed to produce (sometimes incorrectly) themselves.  
This doesn't matter too much if you use implementations with a common 
genealogy, but across independent implementations not having any common code 
base there tend to be unexpected surprises.  There are also many places where a 
provision of ODF is not rigorously defined and implementation-dependent 
variation is the result, whether explicitly called out (e.g., for macros and 
scripts) or not (e.g., for supported image formats).


 -- Dennis E. Hamilton
dennis.hamil...@acm.org  +1-206-779-9430
https://keybase.io/orcmid  PGP F96E 89FF D456 628A
X.509 certs used and requested for signed e-mail



-Original Message-
From: jan i [mailto:j...@apache.org] 
Sent: Saturday, August 2, 2014 11:58
To: dev; Dennis Hamilton
Subject: Re: OOXML

On 2 August 2014 20:27, Dennis E. Hamilton  wrote:

> s below.
>
>
> -Original Message-
> From: jan i [mailto:j...@apache.org]
> Sent: Saturday, August 2, 2014 08:57
> To: dev
> Subject: Re: OOXML
>
> On 2 August 2014 17:06, Louis Suárez-Potts  wrote:
>
> >
> > > On 2014-08-02, at 10:24, Alexandro Colorado  wrote:
> > >
> > > > The support that is done is to receive OOXML, not to produce it; the
> > > > discussion issue would be to support legacy formats like .doc or .xls.
> > > >
> > > > I still don't see a point in generating OOXML, and most people don't care
> > > > as long as they can send in Office native formats.
> > > >
> > > > I never heard someone saying, please send it in docx, your doc is a
> > > > closed binary format.
> >
> > Actually, I have. But it also matters on mobile, as well as, I'd guess,
> > for some developing processes for batch conversion of documents. Finally,
> > > it's not evident to me that refusing to develop to what is likely to become
> > the major desktop document format globally—alas—is a good strategy that
> > > would lead to the adoption of OO. Rather, it seems it would only help those
> > > applications that do (express) both ODF *and* .docx well.
> >
>
> Please don't forget, the computer business has always had 2 types of
> standard: the official one and the de facto one.
>
> For those too young to remember, tcp/ip is not

Re: OOXML

2014-08-02 Thread Andrew Douglas Pitonyak


I am often required to read and write DOCX files, and I know others for 
whom this is a need. If I cannot accurately read / write DOCX files (or 
if I suspect that it may not work correctly) then I use Word; I don't 
like it when I have to use Word.


On 08/02/2014 10:24 AM, Alexandro Colorado wrote:

The support that is done is to receive OOXML, not to produce it; the
discussion issue would be to support legacy formats like .doc or .xls.

I still don't see a point in generating OOXML, and most people don't care
as long as they can send in Office native formats.

I never heard someone saying, please send it in docx, your doc is a
closed binary format.

On 8/2/14, Peter Kelly  wrote:

On 1 Aug 2014, at 2:42 pm, Rory O'Farrell  wrote:


For information:
http://www.themukt.com/2014/07/31/never-use-microsofts-ooxml-format/

An interesting article. This brings to mind a few issues I've been thinking
about for a while:

- I think the rather extreme anti-OOXML stance that some take can be
counterproductive. I certainly hold the view that ODF is a superior standard
in many respects (though not all), however there are circumstances where it
makes sense for a given piece of software to support both. For example they
cite the lack of support for ODF in Google Docs and iWork; developing
software that will interoperate with these would require OOXML
support.

My take on the issue is that it's important to support both, because as much
as we might dislike the fact, OOXML is out there and used very widely. With
the work I'm currently doing on UX Write, I'm adding to the existing OOXML
(specifically .docx) support with support for ODF (.odt) and doing this
in a common framework such that the app itself doesn't care which format the
file is natively stored in, it will work equally well with both.
Additionally, once I have the ODF support in, it will be possible to
leverage this support for conversion between the two formats in both
directions. I'll be giving a talk on this at ApacheCon EU later this year,
and yes this framework will soon be open source - if anyone is interested in
collaborating on it, please let me know.

- One of the criticisms raised is that there are several different versions
of OOXML, not all of which are entirely compatible. However this is also
true of ODF (or at least of MS's implementation in Office 2007 and 2010; I'm
not sure where the fault lies). One of the big questions I've been asking
myself in the work I'm currently doing on ODF is whether I should have my
implementation save ODF 1.1 by default, or version 1.2 by default. If I
choose the former, it will work with Office 2007 and onwards. The latter,
only Office 2013 (I think). For someone such as myself writing a new
implementation of (part of) the ODF spec, and desiring compatibility with
Office 2007 and 2010, which is the best choice?

- I consider the use of proprietary fonts to be a separate issue from the
standard itself. The specification is silent on the matter, so this is
really a criticism of MS Office rather than OOXML itself. Nonetheless, it's
an important one, and one I believe we should address by promoting the use
of open source fonts (e.g. https://www.google.com/fonts) independently and
in addition to the use of ODF. Perhaps these could be made available as an
easily-distributed separate package, so that those who want to stick with MS
Office for whatever reason could be encouraged to install & use them, for
improved interoperability with other office suites?

In an organisation where there are some users on MS and others on OO/LO,
these fonts could be deployed by the IT department as part of the standard
desktop image, and all templates created by the organisations could use
these fonts by default, which would lead to wider usage.

- Towards the end of the article, there's a discussion about the lack of
support for ODF by some vendors, particularly Google and Apple. The question
then is how do we fix that? My view is that there needs to be a migration
path - and by that I mean not just a tool to convert documents from OOXML to
ODF, but the ability to go both ways, and work with either format for as
long as necessary for the migration to complete. Most (all?) successful
transitions I've seen have used a similar approach - Microsoft going from
DOS to Windows, Apple going from 68k -> PPC -> Intel, and Mac OS classic ->
OS X, and so forth.

In the case of document formats, for a country whose government currently
uses MS Office and OOXML that wants to make the switch to ODF and
OpenOffice/LibreOffice/other tools, it's not going to be an overnight
change. It could very well take several years, and during that period
everyone in the organisation will need to have the capability to work with
both formats. New or modified documents would in general be saved in ODF,
but older documents as well as documents that need to be exc

Re: OOXML

2014-08-02 Thread Guy Waterval
+1
-- 
gw


2014-08-03 1:47 GMT+02:00 Andrew Douglas Pitonyak :

>
> I am often required to read and write DOCX files and I know others for
> which this is a need. If I cannot accurately read / write DOCX files (or if
> I suspect that it may not work correctly) then I use Word; I don't like it
> when I have to use Word.

Re: OOXML

2014-08-03 Thread jan i
On 2 August 2014 22:31, Dennis E. Hamilton  wrote:

> Below, Jan asks
>
>   "Does the standard contain some rules about keeping private information
> ?"
>
> There are two cases for ODF 1.2.
>
> First there is the case for foreign elements/attributes/attribute values.
>  This would be the case for some sort of extended material incorporated in
> the ODF document.  This makes a Conforming OpenDocument Document into an
> Extended OpenDocument Document.  A Conforming OpenDocument Consumer is
> permitted to ignore all of that, based on some rules about whether or not
> it occurs in (technically-defined) paragraph content or elsewhere in the
> format.  There can also be foreign content in the XML package of the
> document, where there is no recognized relationship of that content to
> anything in the document as seen by an ODF Consumer.
>
> There are places where the preservation of such foreign material is
> recommended but not required.  Most implementations lose all content that
> they are not implemented to interpret.  Microsoft Office very definitely
> does that in its acceptance of OpenDocument Document files.  This happens
> mainly because the typical internal model doesn't preserve the original XML
> parts and it doesn't work by manipulation of the XML parts.  I suspect that
> Microsoft concerns about document security are also a factor, in addition
> to unwillingness to support features that are not part of the ODF
> specification.  (The position, as I understand it, is that they will
> support the standard, not OpenOffice's particular implementation around it,
> and I don't know how much flexibility there is in that respect.  That
> OpenOffice *is* the standard is a popular view that happens to be
> inconsistent with the principles of ISO or any standards-development
> organization that is committed to the ideal of independently-implemented
> interoperable implementations.)
>
> The second case has to do with features of ODF that a particular
> implementation does not support.  In general, these do not survive in
> current implementations, since import into the internal model loses that
> material and there is consequently no provision for exporting it.  Here,
> there is the fact that there is no strict minimum Conforming OpenDocument
> Consumer.  A consumer must not object to anything in the document file that
> conforms to the ODF specification, but it is not required to "interpret"
> all or even any minimum set of features.  There is no producer that I am
> aware of that produces all features provided for in the ODF specification,
> and most implementations only interpret those features that they are
> designed to produce (sometimes incorrectly) themselves.  This doesn't
> matter too much if you use implementations with a common genealogy, but
> across independent implementations not having any common code base there
> tend to be unexpected surprises.  There are also many places where a
> provision of ODF is not rigorously defined and implementation-dependent
> variation is the result, whether explicitly called out (e.g., for macros
> and scripts) or not (e.g., for supported image formats).
>

Does a consumer normally have some sort of conformance sheet (like we have
for communication protocols) or is it solely the user that painfully finds
the lack of support ?


In the other mail you write a quite interesting note about digital signing
of artifacts the user cannot see. Do you happen to know how Microsoft gets
around that with its web-based offerings?

Thanks for some very interesting input.
rgds
jan I.


Re: OOXML

2014-08-03 Thread Peter Kelly
On 3 Aug 2014, at 1:57 am, jan i  wrote:

> I too am on peter fast rolling waggon :-) but I am also confused.
> 
> @peter maybe you could explain a couple of things, for non-document
> specialists:
> 
> 1) Following your thought, with bidirectional editors. Why would an editor
> have a home format ?

There's two ways to view a format: (1) as a way of encoding information for 
storage or transmission, and (2) as an in-memory data structure used by the 
editor at runtime. In some programs these are two different things, and in 
others they are the same. The latter is true of web browsers - HTML is both the 
file format and the runtime data model; the W3C DOM APIs can be used to 
manipulate the HTML structure directly. I believe this was also true to a large 
extent with the binary formats used by older versions of MS Office, for 
purposes of efficiency [1].

I'm not familiar with the internals of OpenOffice - one thing I'd be very 
interested to know is does it use ODF for its in-memory representation of the 
document? Or are the runtime data structures used different to the XML trees 
that one finds in an ODF package?

> Following your thought to the end, the editor would always save/read in the
> format, and things not supported in the format will be saved as private.

The issue of how to handle features not supported by the format is a tricky 
one. My initial view is that those features are best disabled if the user 
chooses to save in that format (or alternatively a warning message shown on 
save), since even if there were private extensions saved in the foreign format, 
they won't be supported in other apps, and are not guaranteed to be preserved 
(see further below).
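
A rough Python sketch of the "warn on save" idea follows. Everything in it is hypothetical: the format names ("home", "foo"), the feature names, and the capability table are placeholders for illustration only and say nothing authoritative about what any real format supports.

```python
# Hypothetical capability table: which features each format can store.
FORMAT_FEATURES = {
    "home": {"paragraph_styles", "table_styles", "macros"},
    "foo": {"paragraph_styles"},
}


def unsupported_features(used_features, target_format):
    """Return the features in use that the target format cannot store."""
    return sorted(set(used_features) - FORMAT_FEATURES[target_format])


def save_warning(used_features, target_format):
    """Return a warning message for the user, or None if nothing is lost."""
    missing = unsupported_features(used_features, target_format)
    if not missing:
        return None
    return f"Saving as {target_format} will drop: " + ", ".join(missing)
```

The same table could instead be consulted when a document is opened, so that the editor disables the affected features up front rather than warning at save time.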

> 2) When editing in format foo, one can expect that not all features are
> supported (like e.g. microsoft macros), these are handled as private
> containers.
> 
> But looking at LO there seems to be huge challenges when doing especially
> copy/paste operations ?

Yes, this is a very tricky problem. Even with a simple bidirectional 
transformation model, where you have a 1:1 mapping between elements in the 
concrete document and elements in the abstract document (concrete = original 
format, abstract = format used by the editor), it's not possible to know what 
should be done for elements that have been copied & pasted.

One approach would be to make the mapping 1:n, where if an element in the 
abstract (editable) document is copied & pasted one or more times, then its 
corresponding element in the concrete document is also duplicated at save time 
when the file is updated. However, this can potentially violate uniqueness 
constraints, e.g. if the element being copied is supposed to have a unique 
identifier, you can't just go making a direct copy of it, as you'd end up with 
two elements with the same identifier. However, if the implementation was aware 
of such uniqueness constraints for specific elements it could ensure these are 
still respected, even if it doesn't support any other aspects of the element 
(e.g. editing or rendering).
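
As a minimal sketch of the 1:n idea (in Python, under stated assumptions): each element of the edited document is assumed to carry a key naming its counterpart in the concrete document, a key that shows up more than once is taken to mean copy & paste, and the uniqueness constraint is assumed to be a single "id" attribute on the element itself. The function name rebuild_concrete and the "copyN" naming scheme are made up for illustration.

```python
import copy
import xml.etree.ElementTree as ET


def rebuild_concrete(concrete_by_key, abstract_order, id_attr="id"):
    """Emit one concrete element per abstract element, in edited order.

    The first use of a key keeps the original identifier; later uses are
    treated as pasted copies and receive a fresh, unused identifier.
    """
    used_ids = {el.get(id_attr) for el in concrete_by_key.values()}
    seen_keys = set()
    result = []
    next_n = 1
    for key in abstract_order:
        element = copy.deepcopy(concrete_by_key[key])
        if key in seen_keys and element.get(id_attr) is not None:
            while f"copy{next_n}" in used_ids:
                next_n += 1
            new_id = f"copy{next_n}"
            used_ids.add(new_id)
            element.set(id_attr, new_id)
        seen_keys.add(key)
        result.append(element)
    return result


# An element the editor cannot render, but whose id it knows must be unique.
shape = ET.fromstring('<shape id="s1" kind="opaque"/>')
para = ET.fromstring("<para/>")
saved = rebuild_concrete({"a": shape, "b": para}, ["a", "b", "a"])
print([el.get("id") for el in saved])  # ['s1', None, 'copy1']
```

Note that the pasted copy keeps every other attribute untouched, which is the point: the implementation only needs to understand the uniqueness rule, not the element as a whole.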

Cut & paste is much easier to handle though as it's equivalent to a move 
operation, which doesn't have any implications for uniqueness constraints.

> 3) If we save private info in .docx, how can we be sure that a Microsoft
> editor does not destroy it ?
> 
> Does the standard contain some rules about keeping private information ?

Well, we can never be *completely* sure that a microsoft editor won't destroy 
something ;)

Having said that though, there are a couple of provisions for this. One is 
simply the ability to include extra files in the package, labeled with a 
particular namespace. Each OOXML package contains a "relationship graph", which 
is a separate data structure from the zip file's directory hierarchy, and is 
what OOXML uses to identify "parts" (files) within the package. In principle, 
there should be no problem with simply adding an extra part with whatever 
namespace you like, and that being preserved. However, this isn't guaranteed if 
an implementation does an import/export, since usually any extra information 
gets lost on import and is no longer there by the time export occurs.
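
For anyone who wants to try this, here is a minimal Python sketch of adding an extra part and referencing it from the package-level relationships. It makes several assumptions: the relationship is attached to "_rels/.rels" rather than to a part-level .rels file, the relationship type URI is one you invent yourself, and the package's [Content_Types].xml already declares a content type that covers the new part (for example a default for the .xml extension). The function names are hypothetical.

```python
import io
import zipfile
import xml.etree.ElementTree as ET

RELS_NS = "http://schemas.openxmlformats.org/package/2006/relationships"
RELS_PATH = "_rels/.rels"


def add_private_part(package, part_name, part_xml, rel_type):
    """Return a copy of the package (bytes) with one extra related part."""
    # Serialise the relationships with a default namespace, as Office does.
    ET.register_namespace("", RELS_NS)
    out = io.BytesIO()
    with zipfile.ZipFile(io.BytesIO(package)) as src, \
            zipfile.ZipFile(out, "w", zipfile.ZIP_DEFLATED) as dst:
        for item in src.infolist():
            data = src.read(item.filename)
            if item.filename == RELS_PATH:
                root = ET.fromstring(data)
                used = {rel.get("Id") for rel in root}
                n = 1
                while f"rId{n}" in used:  # pick an unused relationship id
                    n += 1
                ET.SubElement(root, f"{{{RELS_NS}}}Relationship", {
                    "Id": f"rId{n}", "Type": rel_type, "Target": part_name})
                data = ET.tostring(root, encoding="utf-8",
                                   xml_declaration=True)
            dst.writestr(item.filename, data)
        dst.writestr(part_name, part_xml)
    return out.getvalue()


def list_relationships(package):
    """Return (type, target) pairs from the package-level relationships."""
    with zipfile.ZipFile(io.BytesIO(package)) as archive:
        root = ET.fromstring(archive.read(RELS_PATH))
    return [(rel.get("Type"), rel.get("Target")) for rel in root]
```

Running list_relationships on the file before and after a round trip through an editor is a quick way to check whether the extra part survived.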

I've just done a test on this in fact, to see how different implementations 
handle it. I added an extra XML file to a package, and referenced it from the 
relationships graph. Under Word 2011 and Word 2013, this file was preserved 
after modification. Under LibreOffice Writer however, the file disappeared from 
the package after a save. I suspect this is due to the file being imported into 
either ODF or LibreOffice's own internal data model, and thus the extra 
information being missing on save (if any of the LO developers are reading 
this... perhaps you can comment here).

Ironically the warning message LO displayed when I tried t

Re: OOXML

2014-08-03 Thread Peter Kelly
On 3 Aug 2014, at 3:05 am, Dennis E. Hamilton  wrote:

> In line with the sketch that Peter Kelly provides below, I am personally 
> very sympathetic to the idea of having an internal model that can tolerate 
> difference in format between input and output while preserving in the output 
> everything from the input format it can, even by leaving markers that will be 
> useful on future input of the produced form.  (There is a well-known case of 
> Microsoft Office doing this for HTML it exports, although the added 
> information for recovery of the MSO rendition led to many complaints about 
> document bloat.)

On a semi-related note, there's one quite fascinating implementation of ODF 
I've seen called WebODF (see http://webodf.org; the code is open source). This 
is an in-browser editor, and actually works by loading the content.xml file 
from the ODF package into the DOM tree of the browser, thus having it contained 
within the HTML content of the page. Through clever use of CSS namespaces, it's 
able to achieve a pretty faithful rendering of the document using the browser's 
built-in layout engine, even though the content itself is not in HTML.

From what I understand about their approach, I believe they did this to 
ensure that the XML structure of the ODF file is preserved exactly, which is 
much more difficult to achieve if the content is converted into HTML first 
(as in my implementation). Web browsers are actually very good 
at handling content in this way, since you can just use the CSS property 
setting "display: none" to hide any elements that shouldn't be rendered on 
screen, and this CSS can be kept entirely separate (or even dynamically 
generated by javascript) and not part of the XML content itself. So WebODF 
takes advantage of the fact that a web browser will just preserve information 
by default, and it works quite well.
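
To illustrate the general technique (this is not WebODF's actual stylesheet, just a sketch of the idea), the following Python snippet generates a small CSS fragment that uses @namespace rules to hide chosen XML elements. The function name hide_rules and the choice of element to hide are assumptions for the example.

```python
def hide_rules(namespaces, hidden):
    """Build CSS that hides namespaced XML elements via display: none.

    namespaces maps a CSS prefix to a namespace URI; hidden is a list of
    (prefix, local-name) pairs naming the elements to hide.
    """
    lines = [f'@namespace {prefix} url("{uri}");'
             for prefix, uri in namespaces.items()]
    lines += [f"{prefix}|{name} {{ display: none; }}"
              for prefix, name in hidden]
    return "\n".join(lines)


css = hide_rules(
    {"text": "urn:oasis:names:tc:opendocument:xmlns:text:1.0"},
    [("text", "tracked-changes")],
)
print(css)
```

Because the stylesheet lives outside the XML content, the hidden elements stay in the DOM and can be written back unchanged.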

--
Dr. Peter M. Kelly
Founder, UX Productivity
pe...@uxproductivity.com
http://www.uxproductivity.com/
http://www.kellypmk.net/

PGP key: http://www.kellypmk.net/pgp-key
(fingerprint 5435 6718 59F0 DD1F BFA0 5E46 2523 BAA1 44AE 2966)





Re: OOXML

2014-08-03 Thread Regina Henschel

Hi Peter,

Peter Kelly schrieb:

On 3 Aug 2014, at 1:57 am, jan i wrote:


I too am on peter fast rolling waggon :-) but I am also confused.

@peter maybe you could explain a couple of things, for non-document
specialists:

1) Following your thought, with bidirectional editors. Why would an editor
have a home format ?


There's two ways to view a format: (1) as a way of encoding information
for storage or transmission, and (2) as an in-memory data structure used
by the editor at runtime. In some programs these are two different
things, and in others they are the same. The latter is true of web
browsers - HTML is both the file format and the runtime data model; the
W3C DOM APIs can be used to manipulate the HTML structure directly. I
believe this was also true to a large extent with the binary formats
used by older versions of MS Office, for purposes of efficiency [1].

I'm not familiar with the internals of OpenOffice - one thing I'd be
very interested to know is does it use ODF for its in-memory
representation of the document? Or are the runtime data structures used
different to the XML trees that one finds in an ODF package?


No, OpenOffice has a very different in-memory representation than the 
ODF format. And the API is a third version of looking at the document.


Kind regards
Regina




Re: OOXML

2014-08-03 Thread Peter Kelly
On 3 Aug 2014, at 6:52 pm, Regina Henschel  wrote:

> Peter Kelly schrieb:
>> There's two ways to view a format: (1) as a way of encoding information
>> for storage or transmission, and (2) as an in-memory data structure used
>> by the editor at runtime. In some programs these are two different
>> things, and in others they are the same. The latter is true of web
>> browsers - HTML is both the file format and the runtime data model; the
>> W3C DOM APIs can be used to manipulate the HTML structure directly. I
>> believe this was also true to a large extent with the binary formats
>> used by older versions of MS Office, for purposes of efficiency [1].
>> 
>> I'm not familiar with the internals of OpenOffice - one thing I'd be
>> very interested to know is does it use ODF for its in-memory
>> representation of the document? Or are the runtime data structures used
>> different to the XML trees that one finds in an ODF package?
> 
> No, OpenOffice has a very different in-memory representation than the ODF 
> format. And the API is a third version of looking at the document.

Interesting.

Given this is the case, what would you suggest would be the best strategy for 
supporting OOXML?

1) Two-way conversion between OOXML and ODF, with OpenOffice then dealing 
solely with the file as ODF (not even being aware it came from OOXML originally)
2) Two-way conversion between OOXML and OpenOffice's internal representation, 
bypassing ODF altogether

The second option has the advantage that it would be easier to cater for 
features that are supported in OOXML but not ODF, e.g. table styles. However 
the first option has the advantage that it would keep the core entirely 
separate from the OOXML filter, and could potentially be constructed in a 
general-purpose manner and made usable as a library by other software.

--
Dr. Peter M. Kelly
Founder, UX Productivity
pe...@uxproductivity.com
http://www.uxproductivity.com/
http://www.kellypmk.net/

PGP key: http://www.kellypmk.net/pgp-key
(fingerprint 5435 6718 59F0 DD1F BFA0 5E46 2523 BAA1 44AE 2966)





Re: OOXML

2014-08-03 Thread jan i
On 3 August 2014 18:50, Peter Kelly  wrote:

> On 3 Aug 2014, at 6:52 pm, Regina Henschel 
> wrote:
>
> Peter Kelly schrieb:
>
> There's two ways to view a format: (1) as a way of encoding information
> for storage or transmission, and (2) as an in-memory data structure used
> by the editor at runtime. In some programs these are two different
> things, and in others they are the same. The latter is true of web
> browsers - HTML is both the file format and the runtime data model; the
> W3C DOM APIs can be used to manipulate the HTML structure directly. I
> believe this was also true to a large extent with the binary formats
> used by older versions of MS Office, for purposes of efficiency [1].
>
> I'm not familiar with the internals of OpenOffice - one thing I'd be
> very interested to know is does it use ODF for its in-memory
> representation of the document? Or are the runtime data structures used
> different to the XML trees that one finds in an ODF package?
>
>
> No, OpenOffice has a very different in-memory representation than the ODF
> format. And the API is a third version of looking at the document.
>
>
> Interesting.
>
> Given this is the case, what would you suggest would be the best strategy
> for supporting OOXML?
>
> 1) Two-way conversion between OOXML and ODF, with OpenOffice then dealing
> solely with the file as ODF (not even being aware it came from OOXML
> originally)
> 2) Two-way conversion between OOXML and OpenOffice's internal
> representation, bypassing ODF altogether
>
> The second option has the advantage that it would be easier to cater for
> features that are supported in OOXML but not ODF, e.g. table styles.
> However the first option has the advantage that it would keep the core
> entirely separate from the OOXML filter, and could potentially be
> constructed in a general-purpose manner and made usable as a library by
> other software.
>

By painful experience, I found out that our internal (memory) structure is
a superset of mixed ODF and pre-ODF items. I don't think you can have a pure
ODF/OOXML memory structure; you need internal pointers as well (like
start/finish of copy buffer)...but of course those 2 parts should have been
well separated.

I wonder, you wrote earlier that UX Write uses HTML internally; that seems
to me like the lowest common denominator...I would have thought a real
superset would have been the better choice?

Some parts of AOO use the structure directly, others go through the API;
that is not very clean, and makes it extremely difficult to test changes in
the internal memory layout. An application like this (and many other
similar types) should see the memory as a capsule, with a fixed API around
it.

rgds
jan I



RE: OOXML

2014-08-03 Thread Dennis E. Hamilton
Below, Jan asks

  “Does a consumer normally have some sort of conformance sheet 
(like we have for communication protocols) or is it solely the user 
that painfully finds the lack of support ?”

I think this is easy to answer.  Where have you found an ODF conformance sheet 
for Apache OpenOffice?  LibreOffice?

Many choices of what to implement and also deviations of the way features are 
implemented are left implementation-dependent.  In ODF 1.2 there are more cases 
where *implementation-defined* is a requirement.  I am not aware how any of 
those have come up for AOO and LibO and how the implementation-based choices 
are defined, if any.

Here is a serious conformance statement I have found: 
<http://technet.microsoft.com/en-us/library/ff852100(v=office.14).aspx>

Here are some about ODF (scroll down to [MS-OODF], [MS-OODF2], and [MS-OODF3]): 
<http://msdn.microsoft.com/en-us/library/gg548604.aspx>.   

Here’s the on-line version of the one for ODF 1.2 support: 
<http://msdn.microsoft.com/en-us/library/hh695327.aspx>.  

It is instructive to expand the sidebar section 2 Standards Support Statements 
and 2.1 Normative Variations.  (I never know what it means to say something is 
not supported.  I believe it is clear that such features are not produced, but 
I have no idea what happens when a not-supported provision is encountered in an 
input document.  All in all, I think this is, compared to other 
implementations, a “glass-half-full” condition.)

In the past there was an on-line database that you could use to review 
compliance with ODF feature by feature, like chapter and verse.  It provided 
for user comments and questions at that level.  It was ill-maintained and I can 
no longer find it.  It looks like the [MS-OODFn] documents have taken on that 
task.  The statements in those documents are very much what was to be found on 
the database.

Cynics will point out that the EUC required Microsoft to describe all 
deviations in its support of ODF.  It is unfortunate that the EUC did not 
consider that such statements would be important from other sources of ODF 
Consumers as well.


 -- Dennis E. Hamilton
dennis.hamil...@acm.org    +1-206-779-9430
https://keybase.io/orcmid  PGP F96E 89FF D456 628A
X.509 certs used and requested for signed e-mail




-Original Message-
From: jan i [mailto:j...@apache.org] 
Sent: Sunday, August 3, 2014 00:57
To: dev; Dennis Hamilton
Subject: Re: OOXML

On 2 August 2014 22:31, Dennis E. Hamilton  wrote:
> [ ... ] There is no strict minimum Conforming OpenDocument
> Consumer.  A consumer must not object to anything in the document file that
> conforms to the ODF specification, but it is not required to "interpret"
> all or even any minimum set of features.  There is no producer that I am
> aware of that produces all features provided for in the ODF specification,
> and most implementations only interpret those features that they are
> designed to produce (sometimes incorrectly) themselves.  This doesn't
> matter too much if you use implementations with a common genealogy, but
> across independent implementations not having any common code base there
> tend to be unexpected surprises.  There are also many places where a
> provision of ODF is not rigorously defined and implementation-dependent
> variation is the result, whether explicitly called out (e.g., for macros
> and scripts) or not (e.g., for supported image formats).
>

Does a consumer normally have some sort of conformance sheet (like we have
for communication protocols) or is it solely the user that painfully finds
the lack of support ?


[ ... ]





Re: OOXML

2014-08-03 Thread jan i
On 3 August 2014 19:56, Dennis E. Hamilton  wrote:

> Below, Jan asks
>
>   “Does a consumer normally have some sort of conformance sheet
> (like we have for communication protocols) or is it solely the user
> that painfully finds the lack of support ?”
>
> I think this is easy to answer.  Where have you found an ODF conformance
> sheet for Apache OpenOffice?  LibreOffice?
>

I have not, of course, and have always wondered. My background is
communication protocols and in broad terms ODF can be seen as such, so to
me a statement of conformance is natural. But given your explanation that
many parts are left implementation-dependent (unlike "real" communication
protocols) I understand why.

In simple words, it's a wonder it works; we don't know why, but it's a lot
better than nothing :-)

thanks
jan.




RE: OOXML

2014-08-03 Thread Dennis E. Hamilton
In a later note, Jan asks about my statement concerning digital signatures, 
private content, and covert content:

  "In the other mail you write a quite interesting note about 
   digital signing of artifacts the user cannot see. Do you 
   happen to know how Microsoft gets around that with the web 
   based offerings?"

Digital signatures officially entered ODF with the ODF 1.2 specification, 
although there was an implementation of that capability in versions of 
OpenOffice.org that extended their ODF 1.0/1.1 support to provide digital 
signatures.  (The ODF 1.2 version is incompatible and that created some 
interesting interoperability issues until the implementations sorted it out.)

With regard to Microsoft Office: Microsoft supports the ODF 1.2 digital 
signature in their support for ODF in Microsoft Office 2013.  Since Microsoft 
is careful about what is signed and whether the user knows what is being signed 
(in terms of what is visible to users), there is no problem.

On receiving digitally signed ODF 1.2 documents, Microsoft verifies those 
signatures as provided.  Any editing will break the signature (as is true for 
all Consumers) and if the result is signed, there will be no unsupported 
features or private/covert content left, so all is well.

I am not certain how this applies to the Office Web Applications.  It appears 
that the Web Applications notice that a document is signed (whether they check 
it or not I have not tested) but provide no way to sign a document that is 
edited in one of the Web Applications.  


PS: Here is what I did.

I downloaded an OpenOffice Calc (.ods) file that I already had in OneDrive, 
saved it under a new name, and signed it using LibreOffice.  I put that back up 
on OneDrive.  Now, when I open the .ods, I am warned that there may be features 
lost because editing is with the on-line Excel application.  The Excel Online 
Help reports that an existing digital signature will be lost if any attempt to 
edit is performed.

When I edited the document anyhow, there was no way to sign it on saving it 
back to OneDrive.  It appears that I have to open it either in AOO or LibO or 
Excel on the desktop and sign it there.  That's easy to do on Windows 8 because 
I have a OneDrive virtual folder on my desktop.  (By the way, the making of a 
copy of the Calc file before editing in the Web Application is no longer 
automatic.  I can edit the Calc document directly, but there is a warning about 
it.  The warning links to details of what can be lost when Excel edits the Calc 
document.  That includes loss of the digital signature.)

I just uploaded a signed Microsoft Word 2013 document.  When I opened it in the 
Web Application to edit it, I was warned that editing would invalidate the 
signature.  After editing, I could find no way using the Web Application to 
sign the document.  I would have to open it in the desktop application in order 
to do that.


-Original Message-
From: Dennis E. Hamilton [mailto:dennis.hamil...@acm.org] 
Sent: Saturday, August 2, 2014 13:05
To: dev@openoffice.apache.org
Subject: RE: OOXML

[ ... ]
There are some tricky cases, including

- Changes that overlap/conflict with tracked changes but tracked changes are 
not updated/preserved properly
- Accessibility impacts
- Digital signature applying to content not observable by the signer
- Covert content of various kinds
- Breaking of RDF/RDFa connections into the document (along with failure to 
preserve markers correctly)

The digital signature and covert-content avoidance cases work against 
preserving material that is not evident in a given application.  In the case of 
ODF, the damage to tracked changes is survivable (with some loss), because the 
ODF approach is resilient.  But not knowing about the tracked changes gets into 
the digital signature problem if the material is preserved while not being 
visible to the user.

[ ... ]





Re: OOXML

2014-08-03 Thread Andrew Douglas Pitonyak


On 08/03/2014 12:50 PM, Peter Kelly wrote:
On 3 Aug 2014, at 6:52 pm, Regina Henschel <rb.hensc...@t-online.de> wrote:


The second option has the advantage that it would be easier to cater 
for features that are supported in OOXML but not ODF, e.g. table 
styles. However the first option has the advantage that it would keep 
the core entirely separate from the OOXML filter, and could 
potentially be constructed in a general-purpose manner and made 
usable as a library by other software.


If AOO does not support Table Styles but a particular file format does, 
the biggest problem is that you lose table styles when you load, 
edit, then save. If AOO does not support Table Styles, then obviously 
that feature will not properly "round trip" from file to memory to file.


--
Andrew Pitonyak
My Macro Document: http://www.pitonyak.org/AndrewMacro.odt
Info:  http://www.pitonyak.org/oo.php



Re: OOXML

2014-08-04 Thread Peter Kelly
On 4 Aug 2014, at 12:16 am, jan i  wrote:

> By painful experience, I found out that our internal (memory) structure is
> a superset of mixed ODF and pre-ODF items. I don't think you can have a pure
> ODF/OOXML memory structure; you need internal pointers as well (like
> start/finish of copy buffer)... but of course those 2 parts should have been
> well separated.

It's possible in theory, though I'm not familiar enough with the OO codebase to 
say whether it would work in practice.

The key idea is to maintain two separate data structures - one which is the ODF 
XML trees, and another which is the internal representation. Any time a change 
gets made to the former, the implementation must update the latter to reflect 
the change. Modification operations on the latter would need to go in the other 
direction.

This is how WebKit works (well, at least how it worked last time I touched the 
code, which was more than 10 years ago...). There is the DOM tree and the 
rendering tree. The DOM tree stores the HTML structure exactly as it was parsed 
from the original file; this is accessible to JavaScript code and can be 
modified in arbitrary ways. Whenever the DOM tree changes, WebKit updates its 
rendering tree, based both on the DOM tree and applicable rules from the CSS 
stylesheet. The rendering tree is the internal model which is used for 
displaying the content on screen.

Importantly, the DOM tree is also allowed to contain arbitrary XML elements in 
any namespace. This is how WebODF works; it includes the content.xml from the 
package directly, and that's the "authoritative" data structure that is 
manipulated during editing. The CSS rules WebODF uses control rendering of the 
content.
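
As a rough illustration of the two-structure idea (a minimal sketch in Python, not WebKit's, WebODF's or OpenOffice's actual code; Node, Document and derive_lines are names made up for this example): one tree is authoritative, and a derived view is refreshed whenever the tree is modified. A real engine would update the derived structure incrementally rather than rebuilding it each time.

```python
from dataclasses import dataclass, field
from typing import Callable, List


@dataclass
class Node:
    """One element of the authoritative tree (stand-in for a DOM node)."""
    tag: str
    text: str = ""
    children: List["Node"] = field(default_factory=list)


def derive_lines(node: Node, depth: int = 0) -> List[str]:
    """Stand-in for a rendering tree: one indented line per node."""
    label = node.tag if not node.text else f"{node.tag}: {node.text}"
    lines = ["  " * depth + label]
    for child in node.children:
        lines.extend(derive_lines(child, depth + 1))
    return lines


class Document:
    """Holds the authoritative tree and keeps a derived view in step with it."""

    def __init__(self, root: Node, derive: Callable[[Node], List[str]]):
        self.root = root
        self._derive = derive
        self.view = derive(root)

    def append(self, parent: Node, child: Node) -> None:
        # Every modification goes through the document, so the derived
        # view can be refreshed right after the authoritative tree changes.
        parent.children.append(child)
        self.view = self._derive(self.root)


root = Node("body")
doc = Document(root, derive_lines)
doc.append(root, Node("p", "Hello"))
print(doc.view)
```

The point of routing every change through one place is that the derived structure can never silently drift from the authoritative one.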

> I wonder, you wrote earlier that UX Write uses HTML internally; that seems
> to me like the lowest common denominator... I would have thought a real
> superset would have been the better choice?

Well a convenient thing about HTML is that you can include your extensions 
without affecting the rendered output, or risking loss of the data. This 
includes custom elements, custom attributes, and CSS style names that you may 
choose to assign special meaning to.

The reasons for this are largely due to the way in which HTML has historically 
evolved... browsers deliberately allow the presence of "invalid" elements they 
don't know about, to cater for future versions of the spec which add new 
elements. The idea is "graceful degradation", such that if you try to view a 
site that uses some new HTML features your browser doesn't support, it should 
at least in theory still let you see most of the content, just that you won't 
be able to use the new features. Depending on the HTML/CSS design, this works 
better in practice on some sites than on others. Then of course there's 
JavaScript APIs which can cause compatibility issues, though that's a separate 
topic, and the browser will usually at least display the content even if it 
can't do dynamic stuff because the JS code threw an exception.
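
To make the "graceful degradation" point concrete, here is a minimal sketch in Python (not how any browser or UX Write is implemented; KNOWN_TAGS, TolerantRenderer and render are hypothetical names for this example). A consumer that only understands a few elements still shows the text inside elements it does not recognise, and simply notes which ones it skipped.

```python
from html.parser import HTMLParser

# Hypothetical set of elements this toy consumer "understands".
KNOWN_TAGS = {"p", "b", "i"}


class TolerantRenderer(HTMLParser):
    """Collects text content and tolerates unknown elements."""

    def __init__(self):
        super().__init__()
        self.text = []
        self.unknown = []

    def handle_starttag(self, tag, attrs):
        # Unknown elements are recorded but never treated as an error.
        if tag not in KNOWN_TAGS:
            self.unknown.append(tag)

    def handle_data(self, data):
        # Text is kept regardless of which element it sits in.
        self.text.append(data)


def render(html):
    """Return the visible text and the list of unrecognised tags."""
    renderer = TolerantRenderer()
    renderer.feed(html)
    renderer.close()
    return "".join(renderer.text), renderer.unknown


print(render('<p>Hello <x-fancy data-k="1">new</x-fancy> world</p>'))
```

The content survives even though the hypothetical x-fancy element means nothing to this consumer, which is the property that makes custom elements, attributes and class names safe to carry through an HTML editing pipeline.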

In the case of UX Write, there's a few instances where I've used custom 
extensions to handle certain things. The main ones are:

1. Table of contents/list of tables/list of figures.

When you insert one of these into your document, it inserts a  element 
with a CSS class name of "tableofcontents", "listoffigures", or "listoftables", 
which were chosen as these are the same keywords that LaTeX uses for these 
features. UX Write treats these as having special meaning, in the sense that 
when opening a document (and when the document is modified), it updates the 
content of these  elements based on the set of all heading, figure, or 
table elements in the document (including numbering/captions).

2. OOXML-specific features.

When converting from .docx to .html during the process of opening a document, 
it assigns certain pre-defined CSS class names to particular types of HTML 
elements to indicate their purpose. For example, a cross-reference whose 
display format is supposed to include both the label and caption of a figure 
will be translated as:

...

where N is the id of the target. The editing code knows about these class names 
and uses them to update the text inside the  element if the figure number or 
caption changes. Similarly, where there is an unsupported object, like an 
embedded spreadsheet, it will translate this as:

[Unsupported object].

During editing, WebKit preserves these, since they're just CSS class names and 
don't in any way cause problems with the HTML or rendering. All of the core 
editing operations are implemented in JavaScript, and these take the class 
names into account where appropriate.

3. Element mappings for bidirectional transformation.

For every HTML element that is generated from an OOXML element, it sets the id 
attribute to a string of the form bdt(N)-(M),

Re: OOXML

2014-08-04 Thread Andrea Pescetti

Andrew Douglas Pitonyak wrote:

On 08/03/2014 12:50 PM, Peter Kelly wrote:

features that are supported in OOXML but not ODF, e.g. table
styles.


If AOO does not support Table Styles but a particular file format does,
the biggest problem is that you lose table styles when you load,
edit, then save. If AOO does not support Table Styles, then obviously
that feature will not properly "round trip" from file to memory to file.


Just a minor detail in this discussion, but last time I checked (a few 
years ago) ODF did have support for Table Styles; OpenOffice didn't 
expose a UI for that, but as Andrew wrote this is an editor problem, not 
a format problem.


Regards,
  Andrea.




RE: OOXML

2014-08-04 Thread Dennis E. Hamilton
It is important to understand that an XML DOM does not capture all of the 
constraints and referential requirements within an ODF document.  In 
particular, content.xml does not have everything and there are references using 
XLink (relative hrefs) and also special identifiers (not IDREFs) to other 
files, whether for binary attachments or into other defined parts (styles.xml 
and meta.xml for two).

There is also considerable internal structuring that is off-hierarchy.  Some of 
the connections are via fragment IDs (xml:id) and IDREFs, others are by 
identifiers (not IDs and IDREFs) that are introduced in the ODF specification 
but which are not modelled in the Relax NG Schema (beyond saying they have 
string values, for example).

This sort of thing also happens rather heavily in OOXML, where communication 
among parts uses a unique cross-part relationship model.  There are also many 
cross references to named components by other than XML IDs and IDREFs, whether 
or not the components and the references occur in the same part of the OPC 
package.
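
As a rough illustration of that cross-part relationship model: in an OPC package a part can have a companion .rels part (for example word/_rels/document.xml.rels) listing Relationship entries with an Id, a Type and a Target, and references from the part's XML name the Id rather than the file. The sketch below parses a made-up sample of such a part using only Python's standard library; the sample content and the helper names load_relationships and resolve_target are assumptions for illustration, and external targets are not handled.

```python
import posixpath
import xml.etree.ElementTree as ET

RELS_NS = "http://schemas.openxmlformats.org/package/2006/relationships"

# Made-up sample of a relationships part for word/document.xml.
SAMPLE_RELS = """<Relationships
    xmlns="http://schemas.openxmlformats.org/package/2006/relationships">
  <Relationship Id="rId1"
    Type="http://schemas.openxmlformats.org/officeDocument/2006/relationships/styles"
    Target="styles.xml"/>
  <Relationship Id="rId2"
    Type="http://schemas.openxmlformats.org/officeDocument/2006/relationships/image"
    Target="media/image1.png"/>
</Relationships>"""


def load_relationships(rels_xml):
    """Map each relationship Id to a (Type, Target) pair."""
    root = ET.fromstring(rels_xml)
    rels = {}
    for rel in root.iter(f"{{{RELS_NS}}}Relationship"):
        rels[rel.get("Id")] = (rel.get("Type"), rel.get("Target"))
    return rels


def resolve_target(source_part, target):
    """Resolve a relative Target against the directory of the source part.

    Assumes an internal, relative target; external targets (TargetMode)
    would need separate handling.
    """
    base = posixpath.dirname(source_part)
    return posixpath.normpath(posixpath.join(base, target))


rels = load_relationships(SAMPLE_RELS)
print(resolve_target("word/document.xml", rels["rId2"][1]))
```

None of this linkage is visible to a plain XML DOM of the main document part; it only exists once the relationship part has been read alongside it.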

One could continue the kind of hack that plants that information as benign 
markers into an internal form of the XML parts (even as a single XML document, 
although that is tricky when ODF documents are nested as subdocuments of 
another), so long as they are replaced when the XML document is committed to a 
saved ODF document file format.

In terms of having a DOM that maps to the external file form and a different 
internal model, the only time that the internal model needs to update the 
externally-oriented DOM is as part of a Save operation.  There might be more 
coupling, but performance and storage issues will doubtless impact the 
engineering outcome, especially for handling large documents with alacrity.  
Copy and paste and undo management will also be factors, along with maintaining 
pagination, word counts, and such.
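
A minimal sketch of that looser coupling, in Python and under stated assumptions (LazyDocument, its fields and the toy "document"/"p" element names are invented for illustration; this is not how AOO is structured): edits touch only the internal model and mark the externally-oriented tree as stale, and the tree is rebuilt only when a save is requested.

```python
import xml.etree.ElementTree as ET


class LazyDocument:
    """Internal model with an external XML tree that is rebuilt only on save."""

    def __init__(self, paragraphs):
        self.paragraphs = list(paragraphs)  # internal model
        self._dom = None                    # externally-oriented tree, built lazily
        self.rebuilds = 0                   # counts how often the tree was rebuilt

    def edit(self, index, text):
        # Editing is cheap: change the internal model, mark the tree stale.
        self.paragraphs[index] = text
        self._dom = None

    def save(self):
        # Only a save pays the cost of regenerating the external tree.
        if self._dom is None:
            root = ET.Element("document")
            for paragraph in self.paragraphs:
                ET.SubElement(root, "p").text = paragraph
            self._dom = root
            self.rebuilds += 1
        return ET.tostring(self._dom, encoding="unicode")


doc = LazyDocument(["a", "b"])
doc.edit(0, "x")
doc.edit(1, "y")
print(doc.save(), doc.rebuilds)
```

The trade-off is that anything not represented in the internal model cannot reappear in the rebuilt tree, which is where round-trip loss of unsupported features comes from.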

On the other hand, it is convenient (practically necessary) to specify the 
semantics of ODF, or some profile of ODF, as if operations are on the format 
itself, since it is only the format that is more-or-less well-specified.  It 
would be interesting to know how much this could be taken literally in an 
application.  I think there might be forensic tools on ODF documents that might 
be able to operate that way.  I'm not at all certain about production WYSIWYG 
consumers and producers, especially ones implemented to harmonize between 
OOXML, ODF and other interesting formats (EPUB coming to mind).

I will watch Peter Kelly's efforts with great interest to see how much the 
boundaries can be moved in this area.


 -- Dennis E. Hamilton
dennis.hamil...@acm.org    +1-206-779-9430
https://keybase.io/orcmid  PGP F96E 89FF D456 628A
X.509 certs used and requested for signed e-mail


 - Original Message ---
From: Peter Kelly [mailto:kelly...@gmail.com] 
Sent: Monday, August 4, 2014 01:27
To: dev@openoffice.apache.org
Subject: Re: OOXML

On 4 Aug 2014, at 12:16 am, jan i  wrote:


[ ... ]

It's possible in theory, though I'm not familiar enough with the OO codebase to 
say whether it would work in practice.

The key idea is to maintain two separate data structures - one which is the ODF 
XML trees, and another which is the internal representation. Any time a change 
gets made to the former, the implementation must update the latter to reflect 
the change. Modification operations on the latter would need to go in the other 
direction.

[ ... ]

In the case of UX Write, there's a few instances where I've used custom 
extensions to handle certain things. The main ones are:

1. Table of contents/list of tables/list of figures.

When you insert one of these into your document, it inserts a  element 
with a CSS class name of "tableofcontents", "listoffigures", or "listoftables", 
which were chosen as these are the same keywords that LaTeX uses for these 
features. UX Write treats these as having special meaning, in the sense that 
when opening a document (and when the document is modified), it updates the 
content of these  elements based on the set of all heading, figure, or 
table elements in the document (including numbering/captions).

2. OOXML-specific features.

When converting from .docx to .html during the process of opening a document, 
it assigns certain pre-defined CSS class names to particular types of HTML 
elements to indicate their purpose. For example, a cross-reference whose 
display format is supposed to include both the label and caption of a figure 
will be translated as:

[ ... ]






Re: OOXML documentation

2014-07-24 Thread Kay Schenk
On Thu, Jul 24, 2014 at 8:50 AM, Peter Kelly  wrote:

> Hi,
>
> I've begun writing up some documentation on the OOXML file format on the
> wiki:
>
> https://wiki.openoffice.org/wiki/OOXML
>
> The new content is that linked to from the first section, currently
> limited to a description of the packaging format, extensibility features,
> and a brief introduction to WordProcessingML. This is just the beginning,
> and there's a *lot* more to be covered, which will happen over the coming
> weeks (months?).
>
> I've deliberately avoided discussing details of any particular
> implementation, so that it's a general description that is hopefully of use
> to anyone writing software to deal with the format. I will however be
> releasing my own implementation (which converts to/from HTML) once I've
> completed the port to Linux (which is close to completion).
>
> If anyone else who has experience working with the format would like to
> contribute as well, that would be great. My expertise is limited to the
> word processing aspects only, as I've never worked with the spreadsheet or
> presentation formats.
>
Again, thank you! We have a lot to learn!

-- 
-
MzK

"To be trusted is a greater compliment than being loved."
   -- George MacDonald


Re: OOXML documentation

2014-07-27 Thread jan i
On 24 July 2014 17:50, Peter Kelly  wrote:

> Hi,
>
> I've begun writing up some documentation on the OOXML file format on the
> wiki:
>
> https://wiki.openoffice.org/wiki/OOXML
>
> The new content is that linked to from the first section, currently
> limited to a description of the packaging format, extensibility features,
> and a brief introduction to WordProcessingML. This is just the beginning,
> and there's a *lot* more to be covered, which will happen over the coming
> weeks (months?).
>
> I've deliberately avoided discussing details of any particular
> implementation, so that it's a general description that is hopefully of use
> to anyone writing software to deal with the format. I will however be
> releasing my own implementation (which converts to/from HTML) once I've
> completed the port to Linux (which is close to completion).
>
> If anyone else who has experience working with the format would like to
> contribute as well, that would be great. My expertise is limited to the
> word processing aspects only, as I've never worked with the spreadsheet or
> presentation formats.
>
Hi,

It would be nice if you could fill the empty pages with at least a link to
where one can find e.g. the spreadsheet format, preferably in a readable
form.

Thanks in advance.
rgds
jan I.




Re: OOXML documentation

2014-08-06 Thread Andrea Pescetti

On 24/07/2014 Peter Kelly wrote:

https://wiki.openoffice.org/wiki/OOXML
The new content is that linked to from the first section, currently
limited to a description of the packaging format, extensibility
features, and a brief introduction to WordProcessingML. This is just the
beginning, and there's a *lot* more to be covered, which will happen
over the coming weeks (months?).


Thank you for this and for the other OOXML discussions. I'd like to make 
sure we can use your contributions productively. Are you up-to-date with 
the current parser development effort, see 
http://markmail.org/message/7hha2hv7qrdaoxes ? It's much more useful if 
we manage to immediately apply your knowledge there.


Regards,
  Andrea.




Improved OOXML support?

2014-10-20 Thread Jörg Schmidt
Hello,

Does anyone know when the integration of the extended OOXML filter 
functionality will be completed in AOO? 

I mean the enhancements that have been created within a project of the OSBA: 
http://www.osb-alliance.de/en/working-groups/wg-office-interoperability/project-1-by-osb-alliance-working-group-office-interoperability/



Greetings,
Jörg





OOXML support in AOO

2014-01-27 Thread Jörg Schmidt
Hello,

How far have the results of the initiative [1], 

"better support for OOXML in LibreOffice / OpenOffice", 

been integrated into AOO in the meantime? 


Greetings,
Jörg


[1] 
Review, see: 
http://www.osb-alliance.de/en/working-groups/projekte/ooxml-filter/projektergebnisse-ooxml-filter/
 

Results see (in English): 
http://www.osb-alliance.de/fileadmin/Working_Groups/OfficeInteroperability/Project1/2013-09-17_OSBA_Press_Release_OOXML_Project_Finished_EN.pdf





[PROPOSAL] Focus on OOXML

2014-05-19 Thread Andre Fischer
As you may know, the OOXML support of OpenOffice is not very good.  It 
is also a feature that many of our users are asking for.  Therefore I 
would like to propose that we focus our work on OOXML import and export.


OOXML support consists of three parts: the import, the export, and the 
implementation of features that are missing in OpenOffice.  It is a 
large task and will easily occupy us for two releases or more.  It is 
likely that much of the initial work will not be very visible and will 
not contribute much to the next release.  But I am convinced that it is 
preferable to start working on this important feature now than on some 
more obvious UI changes (but there are some smaller features that we can 
and should still do, like making the UI more flat or replacing the 
application icons).


In a separate proposal I will talk more about the technical aspects of 
the OOXML support.


Regards,
Andre




Re: Improved OOXML support?

2014-10-21 Thread Jürgen Schmidt
On 21/10/14 08:34, Jörg Schmidt wrote:
> Hello,
> 
> Does anyone know when the integration of the extended OOXML filter 
> functionality will be completed in AOO? 
> 
> I mean the enhancements that have been created within a project of the OSBA: 
> http://www.osb-alliance.de/en/working-groups/wg-office-interoperability/project-1-by-osb-alliance-working-group-office-interoperability/
> 

It is not easy to say when, or whether, this will be integrated at all. We
spent some time integrating 2 use cases from this project, and a great deal
of time making them complete (our work is already merged in LO).
The patches were incomplete and the implementation was far from complete.
We (some developers) decided for ourselves not to spend further time on
this.

But the patches are available to any developer and can be used to work
on the integration or on the feature itself.

Juergen




Re: Improved OOXML support?

2014-10-21 Thread Jörg Schmidt
Hello *, 

> From: Jürgen Schmidt [mailto:jogischm...@gmail.com] 

> It is not easy to say when, or whether, this will be integrated at all. We
> spent some time integrating 2 use cases from this project, and a great deal
> of time making them complete (our work is already merged in LO).
> The patches were incomplete and the implementation was far from complete.
> We (some developers) decided for ourselves not to spend further time on
> this.

This is bad news for AOO, because it will lose more users. 

I myself had tried in recent weeks to find paid support for the OOXML filter, 
but unfortunately could not find any.


Jörg





Re: Improved OOXML support?

2014-10-21 Thread BRM
On Tuesday, October 21, 2014 4:41 AM, Jörg Schmidt wrote:

> Hello *, 
>
> > From: Jürgen Schmidt [mailto:jogischm...@gmail.com] 
> > It is not easy to say when, or whether, this will be integrated at all.
> > We have spent some time integrating 2 use cases of this project, and a
> > great deal more time making them complete (our work is already merged
> > in LO).
> > The patches were incomplete and the implementation was far from complete.
> > We (some of the developers) decided for ourselves not to spend further
> > time on this.
> This is bad news for AOO, because it will lose more users. 
> I myself tried in recent weeks to find paid support for the OOXML filter,
> but unfortunately could not find any.
> 
Unfortunately that will always be the state of OOXML integration for anyone 
other than Microsoft since OOXML is a poorly defined standard that relies on 
many binary extensions that are not published. Kind of like the old 
DOC/XLS/PPT/MDB formats that were (in many ways) memory dumps of their 
respective applications - only for OOXML they're wrapped by XML.

Until Microsoft publishes a real standard no one will ever be able to have true 
inter-operability.
Of course, this kind of hurts Microsoft too since they basically have the same 
problems with OOXML that they had with the old formats between versions of 
their own Office products; a good standard would make that a non-issue.

$0.02

Ben


Re: Improved OOXML support?

2014-10-21 Thread Jürgen Lange

Hi Ben,

you make me laugh. At the moment Apache OpenOffice is far behind 
LibreOffice when it comes to interoperability with Microsoft products. Last 
week I tried to load an Excel sheet (with a pivot table) into Apache OpenOffice 
4.1.1. Loading stopped and AOO hung. LO loaded this table without any 
problem. A similar table (also created with Microsoft Office 2010) loaded 
into AOO 3.4.1 without problems and much faster than into LO (I 
can't remember which version of LO it was).


I will look at it next weekend and report.

I reported an issue with loading this Excel sheet to Bugzilla, but I think 
that no one ever had a look at it.


Kind regards
Jürgen



Re: Improved OOXML support?

2014-10-21 Thread Jörg Schmidt

> From: BRM [mailto:bm_witn...@yahoo.com.INVALID] 

> Unfortunately that will always be the state of OOXML 
> integration for anyone other than Microsoft since OOXML is a 
> poorly defined standard that relies on many binary extensions 
> that are not published. Kind of like the old DOC/XLS/PPT/MDB 
> formats that were (in many ways) memory dumps of their 
> respective applications - only for OOXML they're wrapped by XML.
> 
> Until Microsoft publishes a real standard no one will ever be 
> able to have true inter-operability.
> Of course, this kind of hurts Microsoft too since they 
> basically have the same problems with OOXML that they had 
> with the old formats between versions of their own Office 
> products; a good standard would make that a non-issue.

Sorry, but in this case MS is not to blame. The OOXML format is published as an
ISO standard. 

We could discuss the problems of this ISO standard in detail, but that is not
necessary: the fact that LibreOffice has implemented appropriate filters
shows that the OOXML standard is not the problem.

I find it really strange that it seems impossible to find companies that are
willing to integrate the corresponding filters into AOO as normal commercial 
support.



Greetings,
Jörg





Re: Improved OOXML support?

2014-10-22 Thread BRM
On Wednesday, October 22, 2014 1:03 AM, Jörg Schmidt  
wrote:
  

> From: BRM [mailto:bm_witn...@yahoo.com.INVALID] 
> > Unfortunately that will always be the state of OOXML 
> > integration for anyone other than Microsoft since OOXML is a 
> > poorly defined standard that relies on many binary extensions 
> > that are not published. Kind of like the old DOC/XLS/PPT/MDB 
> > formats that were (in many ways) memory dumps of their 
> > respective applications - only for OOXML they're wrapped by XML.
> > 
> > Until Microsoft publishes a real standard no one will ever be 
> > able to have true inter-operability.
> > Of course, this kind of hurts Microsoft too since they 
> > basically have the same problems with OOXML that they had 
> > with the old formats between versions of their own Office 
> > products; a good standard would make that a non-issue.
> Sorry, but in this case MS is not to blame. The OOXML format is published
> as an ISO standard. 

Yes, it is a published ISO standard, but one that relies on many unpublished 
extensions. Yes, AOO can implement something that implements the one-off ISO 
standard (there have been no updates AFAIK); however, it will always be 
chasing a moving, undocumented target for all those extensions which MS Office 
uses extensively.

> We could discuss problems of this ISO standard in detail, but this is not
> necessary because the fact that LibreOffice has implemented appropriate
> filters proves that it is not a problem of the OOXML standard.

No, just that someone has kept it up to some degree and spent time figuring out 
a set of those extensions that seems common enough. LO doesn't have perfect 
OOXML compatibility with MS Office either; just better than AOO right now.
And, as I noted, even MS Office has problems with OOXML compatibility between 
versions of itself. Not because of the standard but because of all the 
unpublished extensions to the standard; extensions which are likely just binary 
dumps of memory again.

> I find it really strange that it seems impossible to find companies that are
> willing to integrate corresponding filter in AOO, as a normal commercial 
> support.
 
Probably because it is not an easy task, too much of a moving target, and 
more. Yes, you can figure out a series of files, but there will always be 
something that is not completely compatible. While there may be a published 
XML-based base for the OOXML file formats, there are still many parts that are 
not.
And yes, I'll applaud anyone that takes it on. Just saying, don't expect 
perfection, and do expect to have to work on it continuously, 
because it is a continuously moving target. And that is the gist of my point in 
this whole thread.

$0.02
Ben
  

RE: Improved OOXML support?

2014-10-22 Thread Dennis E. Hamilton
Comments inline.

> -Original Message-
> From: BRM [mailto:bm_witn...@yahoo.com.INVALID] 
> Sent: Wednesday, October 22, 2014 08:12
> To: dev@openoffice.apache.org
> Subject: Re: Improved OOXML support?
> 
> On Wednesday, October 22, 2014 1:03 AM, Jörg Schmidt wrote:
> 
> > From: BRM [mailto:bm_witn...@yahoo.com.INVALID] 
> > > Unfortunately that will always be the state of OOXML 
> > > integration for anyone other than Microsoft since OOXML is a 
> > > poorly defined standard that relies on many binary extensions 
> > > that are not published. Kind of like the old DOC/XLS/PPT/MDB 
> > > formats that were (in many ways) memory dumps of their 
> > > respective applications - only for OOXML they're wrapped by XML.
> > > 
> > > Until Microsoft publishes a real standard no one will ever be 
> > > able to have true inter-operability.
> > > Of course, this kind of hurts Microsoft too since they 
> > > basically have the same problems with OOXML that they had 
> > > with the old formats between versions of their own Office 
> > > products; a good standard would make that a non-issue.
> > Sorry, but in this case MS is not to blame. The OOXML format is
> > published as an ISO standard. 
> 
> Yes, it is a published ISO standard, but one that relies on many unpublished 
> extensions. Yes, AOO can implement something that implements the one-off ISO 
> standard (there have been no updates AFAIK); however, it will always be 
> chasing a moving, undocumented target for all those extensions which MS Office 
> uses extensively.


OOXML is in its 4th edition (December 2012) and I believe another is on its 
way. It is under active maintenance at the ISO level, and you can always get 
the specs most easily from ECMA. 
See <http://www.ecma-international.org/publications/standards/Ecma-376.htm>.

I'm not so sure about "unpublished" extensions.  There is a mechanism 
provided in the OOXML Standard for introducing extensions, and my impression 
is that Microsoft is careful to use the mechanism and specify what theirs are, 
just as they also publish their profile for what they support in ODF.
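
As a rough illustration of that extension mechanism, here is a minimal sketch of
how a consumer might handle an mc:AlternateContent block from the Markup
Compatibility and Extensibility part of ECMA-376: take the first Choice whose
required namespaces it understands, otherwise the Fallback. The element names
"fancy" and "plain" and the namespace "urn:example:extension" are made up for
the example, prefixes are collected document-wide rather than per scope, and
this is not how AOO or LO actually implement it.

```python
import io
import xml.etree.ElementTree as ET

# Namespace of the Markup Compatibility and Extensibility vocabulary.
MC = "http://schemas.openxmlformats.org/markup-compatibility/2006"
ALTERNATE = f"{{{MC}}}AlternateContent"
CHOICE = f"{{{MC}}}Choice"
FALLBACK = f"{{{MC}}}Fallback"

# Hypothetical sample: "x" stands for some vendor extension namespace.
SAMPLE = """<w:body
    xmlns:w="http://schemas.openxmlformats.org/wordprocessingml/2006/main"
    xmlns:mc="http://schemas.openxmlformats.org/markup-compatibility/2006"
    xmlns:x="urn:example:extension">
  <mc:AlternateContent>
    <mc:Choice Requires="x"><x:fancy/></mc:Choice>
    <mc:Fallback><w:plain/></mc:Fallback>
  </mc:AlternateContent>
</w:body>"""


def parse_with_prefixes(xml_text):
    """Parse XML and collect prefix -> namespace URI declarations.

    Simplification: prefixes are collected document-wide, ignoring scoping.
    """
    prefixes = {}
    root = None
    source = io.BytesIO(xml_text.encode("utf-8"))
    for event, payload in ET.iterparse(source, events=("start-ns", "start")):
        if event == "start-ns":
            prefix, uri = payload
            prefixes[prefix] = uri
        elif root is None:
            root = payload
    return root, prefixes


def pick_branch(alternate, prefixes, understood):
    """Return the first Choice whose required namespaces are all understood,
    otherwise the Fallback (or None if there is no Fallback)."""
    fallback = None
    for branch in alternate:
        if branch.tag == CHOICE:
            required = branch.get("Requires", "").split()
            if required and all(prefixes.get(p) in understood for p in required):
                return branch
        elif branch.tag == FALLBACK:
            fallback = branch
    return fallback


def resolve(parent, prefixes, understood):
    """Replace each AlternateContent below parent by the chosen branch's children."""
    resolved = []
    for child in list(parent):
        if child.tag == ALTERNATE:
            branch = pick_branch(child, prefixes, understood)
            replacement = list(branch) if branch is not None else []
        else:
            replacement = [child]
        for element in replacement:
            resolve(element, prefixes, understood)
            resolved.append(element)
    for child in list(parent):
        parent.remove(child)
    parent.extend(resolved)


def local_names(xml_text, understood):
    """Resolve the document and list the local names of the root's children."""
    root, prefixes = parse_with_prefixes(xml_text)
    resolve(root, prefixes, understood)
    return [child.tag.split("}")[1] for child in root]
```

A consumer that knows the extension namespace sees the extended content; one
that does not falls back to the standard markup, so the file stays readable
either way.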


> > We could discuss problems of this ISO standard in detail, but this is not
> > necessary because the fact that LibreOffice has implemented appropriate
> > filters proves that it is not a problem of the OOXML standard.
> 
> No, just that someone has kept it up to some degree and spent time figuring out 
> a set of those extensions that seems common enough. LO doesn't have perfect 
> OOXML compatibility with MS Office either; just better than AOO right now.
> And, as I noted, even MS Office has problems with OOXML compatibility between 
> versions of itself. Not because of the standard but because of all the 
> unpublished extensions to the standard; extensions which are likely just binary 
> dumps of memory again.
> 
> > I find it really strange that it seems impossible to find companies that are
> > willing to integrate corresponding filter in AOO, as a normal commercial 
> > support.
>  
> Probably because it is not an easy task, too much of a moving target, and 
> more. Yes, you can figure out a series of files, but there will always be 
> something that is not completely compatible. While there may be a published 
> XML-based base for the OOXML file formats, there are still many parts that are 
> not.
> And yes, I'll applaud anyone that takes it on. Just saying, don't expect 
> perfection, and do expect to have to work on it continuously, 
> because it is a continuously moving target. And that is the gist of my point in 
> this whole thread.


I suspect that a bigger detriment to someone building commercial filters for 
AOO OOXML support is finding a meaningful business model, since these 
presumably have to be made freely available and even open-source.  It might be 
easier for developers who are immersed in the Microsoft (Office) technology to 
build better ODF conversions for Microsoft Office than start on the AOO side. 
That might be a superior point of leverage. I suspect there is still a business 
model problem since any enterprise or institution that finds this very 
important could presumably use their leverage with Microsoft directly to have 
better ODF support. 


> $0.02
> Ben
  





Re: Improved OOXML support?

2014-10-22 Thread Jörg Schmidt
> > I find it really strange that it seems impossible to find 
> > companies that are willing to integrate corresponding filter 
> > in AOO, as a normal commercial support.
>  
> Probably because it is not an easy task, too much of a moving 
> target, and more. Yes, you can figure out a series of files, 
> but there will always be something that is not completely 
> compatible.

This is absolutely not a problem; the compatibility that LibreOffice already 
provides would be enough (for now).

> I suspect that a bigger detriment to someone building 
> commercial filters for AOO OOXML support is finding a 
> meaningful business model,

Commercial filters are not the issue. The issue is that someone is ready to pay 
for the filter development, but no company can be found to implement it, at 
least not among the companies listed here:
http://www.openoffice.org/bizdev/consultants.html


Greetings,
Jörg





RE: Improved OOXML support?

2014-10-22 Thread Dennis E. Hamilton
With regard to the quotation from me, yes, it is possible to find funding for 
improvements.  There have been requests for bids from organizations such as the 
OSB Alliance.  It is difficult to know, though, whether they have found someone 
to bid on the work they want at an affordable price.  

The improved OOXML support was funded by an organization.  You've seen Jürgen 
Schmidt's response on the difficulty there has been integrating that code into 
Apache OpenOffice.  I don't doubt his appraisal.

 - Dennis




Re: Improved OOXML support?

2014-10-22 Thread Andreas Säger
Am 22.10.2014 um 07:03 schrieb Jörg Schmidt:
> 

> I find it really strange that it seems impossible to find companies that are
> willing to integrate corresponding filter in AOO, as a normal commercial 
> support.
> 

If I were in need of an OOXML suite, I would pay for the real one. The
other one is LibreOffice. Nobody needs a third one.





Re: Improved OOXML support?

2014-10-22 Thread Jörg Schmidt

> From: Andreas Säger [mailto:saege...@t-online.de] 

> > I find it really strange that it seems impossible to find companies
> > that are willing to integrate corresponding filter in AOO, as a
> > normal commercial support.
> > 
> 
> If I were in need of an OOXML suite, I would pay for the real one. The
> other one is LibreOffice. Nobody needs a third one.

I think it's good that we live in a free society and customers have the right to
see things differently. 


Jörg





Re: Improved OOXML support?

2014-10-22 Thread Jörg Schmidt

> From: Dennis E. Hamilton [mailto:dennis.hamil...@acm.org] 
> Sent: Wednesday, October 22, 2014 11:13 PM
> To: dev@openoffice.apache.org
> Subject: RE: Improved OOXML support?
> 
> With regard to the quotation from me, yes, it is possible to 
> find funding for improvements.  There have been requests for 
> bids from organization such as the OSB Alliance.  It is 
> difficult to know whether they have found someone to bid on 
> the work they want though, at an affordable price.  
> 
> The improved OOXML support was funded by an organization.  
> You've seen Jürgen Schmidt's response on the difficulty there 
> has been integrating that code into Apache OpenOffice.  I 
> don't doubt his appraisal.

Furthermore, Jürgen also said: 

"We decided for us (some developer) that we don't spend further time on
this."

and this does not refer to technical problems; it says that the developers 
do not have the time. 



Jörg





Re: Improved OOXML support?

2014-10-23 Thread Andreas Säger
Am 23.10.2014 um 06:47 schrieb Jörg Schmidt:
> 
> I think it's good that we live in a free society and customers have the right 
> to
> see things differently. 
> 

Being a customer, I do see things differently. Every OOXML file is a
vote against ODF. One day in the future LO will save ODF as a secondary
option. In the end MS wins, because LO cannot support the same feature set
as MSO does or because MS comes up with the next shit.





Re: Improved OOXML support?

2014-10-23 Thread Jürgen Schmidt
On 23/10/14 07:02, Jörg Schmidt wrote:
> 
>> From: Dennis E. Hamilton [mailto:dennis.hamil...@acm.org] 
>> Sent: Wednesday, October 22, 2014 11:13 PM
>> To: dev@openoffice.apache.org
>> Subject: RE: Improved OOXML support?
>>
>> With regard to the quotation from me, yes, it is possible to 
>> find funding for improvements.  There have been requests for 
>> bids from organization such as the OSB Alliance.  It is 
>> difficult to know whether they have found someone to bid on 
>> the work they want though, at an affordable price.  
>>
>> The improved OOXML support was funded by an organization.  
>> You've seen Jürgen Schmidt's response on the difficulty there 
>> has been integrating that code into Apache OpenOffice.  I 
>> don't doubt his appraisal.
> 
> Furthermore Jürgen spoke of it also: 
> 
> "We decided for us (some developer) that we don't spend further time on
> this."
> 
> and this does not relate to technical problems, but it describes the 
> developers do not have time. 

No, it means the benefit of the patches is so minimal that a rewrite is
probably cheaper and easier. For example, for the one use case we have
integrated, we spent a lot of time understanding the patch, only to realize
that the implementation addresses just one facet and is incomplete from
our pov. We, or rather Oliver, spent even more time on it to make it
complete.

The other use case was addressed wrongly from our pov, so we took the
feature idea and implemented it anew, in a way that makes it more general
and ready for the future.

Both solutions found their way into LO, which is fine, but it shows once more
that time and resources are being wasted. It would be better to collaborate
and work together on such things.

I believe that neither AOO nor LO has so many resources that it is
clever to do the work twice in the long term.

Juergen





Re: Improved OOXML support?

2014-10-23 Thread Jörg Schmidt
> From: Jürgen Schmidt [mailto:jogischm...@gmail.com] 

> No, it means the benefit of the patches is so minimal that a rewrite is
> probably cheaper and easier. For example, for the one use case we have
> integrated, we spent a lot of time understanding the patch, only to realize
> that the implementation addresses just one facet and is incomplete from
> our pov. We, or rather Oliver, spent even more time on it to make it
> complete.
> 
> The other use case was addressed wrongly from our pov, so we took the
> feature idea and implemented it anew, in a way that makes it more general
> and ready for the future.
> 
> Both solutions found their way into LO, which is fine, but it shows once
> more that time and resources are being wasted. It would be better to
> collaborate and work together on such things.
> 
> I believe that neither AOO nor LO has so many resources that it is
> clever to do the work twice in the long term.

Either way, no solution is to be had. My client is willing to pay for it, but 
there is no company listed at 
http://www.openoffice.org/bizdev/consultants.html 
that wants to do the work.

The reality is concrete: unfortunately, my customer is now working with one 
of the companies sponsoring LO. 

I would much have preferred that a job be created for one of the companies 
supporting AOO. 



Jörg






Re: Improved OOXML support?

2014-10-24 Thread FR web forum
>but there is no company within: 
>http://www.openoffice.org/bizdev/consultants.html 
>that wants to do the work.

Maybe check with IBM:
http://www-03.ibm.com/software/products/en/ibm-support-for-apache-openoffice/




Contribute code for OOXML export

2014-01-21 Thread Clarence GUO
Hi~ All,
Since Office 2007 Microsoft has defaulted to saving files in OOXML format.
And soon, in April, Microsoft will stop supporting Office 2003, the last
version of Office to write the binary format by default. So it becomes more
and more important for AOO to be able to support the OOXML file format
in order to help the users who need to work with OOXML files. AOO already
has some capabilities for OOXML file import, but it needs many
improvements. We have some pilot code for enabling OOXML export, developed
by De Bin, Jian Yuan, Sun Ying, Jin Long... Although it still has some way
to go before it is ready for production, we'd like to contribute it to AOO
now for further development, so that more developers can work on the framework
and continuously contribute their work. Since it still has many feature
gaps, we propose to put it on a branch first, continue to enhance it,
and integrate it into a release only when we see that it is ready.

Clarence


Re: OOXML support in AOO

2014-01-28 Thread Oliver-Rainer Wittmann

Hi,

the OSBA OOXML improvement use case 4 - comments/annotations on text 
ranges - has been addressed for AOO 4.1.
For further details have a look at the draft of the release notes for 
the next planned release.


Best regards, Oliver.

On 27.01.2014 18:33, Jörg Schmidt wrote:

Hello,

To what extent have the results of the initiative [1]:

"better support for OOXML in LibreOffice / OpenOffice"

been integrated into AOO in the meantime?


Greetings,
Jörg


[1]
Review, see:
http://www.osb-alliance.de/en/working-groups/projekte/ooxml-filter/projektergebnisse-ooxml-filter/

Results see (in English):
http://www.osb-alliance.de/fileadmin/Working_Groups/OfficeInteroperability/Project1/2013-09-17_OSBA_Press_Release_OOXML_Project_Finished_EN.pdf





Re: OOXML support in AOO

2014-01-28 Thread Keith N. McKenna

Oliver-Rainer Wittmann wrote:
> Hi,
> 
> the OSBA OOXML improvement use case 4 - comments/annotations on text
> ranges - has been addressed for AOO 4.1.
> For further details have a look at the draft of the release notes for
> the next planned release.
> 
> Best regards, Oliver.
> 

Jorg;

A quick reference if needed for the section of the Draft release notes.
https://cwiki.apache.org/confluence/display/OOOUSERS/AOO+4.1+Release+Notes#AOO4.1ReleaseNotes-Comments/Annotationsontextranges

Regards
Keith




wiki of OOXML export status

2014-02-15 Thread Clarence GUO
Hi,
I posted a wiki about OOXML export status to
https://cwiki.apache.org/confluence/display/OOOUSERS/OOXML+Export+Status.
For now all the work is still on my local machine and has not been committed
yet. I'll commit once the current problems are fixed and the framework for
the three applications is ready.

Thanks & BRs,
Clarence


Bug 124268 - OOXML import problems

2014-02-20 Thread Rainer Bielefeld

Hi,

I think we should decide whether we want to use a tracking bug [1] or a 
keyword (interop_OOXML) [2]. Using both in parallel only makes it more 
difficult to create reliable queries.


CU

Rainer


Hyperlinks:

[1] 
[2] 
 








[PROPOSAL] New OOXML import framework

2014-05-19 Thread Andre Fischer
As one of the first tasks in the OOXML area I would like to propose to 
redesign and re-implement the OOXML parser.


At the moment each application has its own OOXML import design. Those of 
Impress and Calc are basically classic hand written push parser designs 
while that of Writer is semi-automatically derived from the 
WordprocessingML specification.  For all three designs there is hardly 
any documentation and their implementation is hard to understand and 
hard to maintain. All that means that you have to work hard to obtain a 
working knowledge about the OOXML parser for one application and then, 
once you have it, can not transfer it to the other applications.


I propose a new and unified approach that will essentially replace the 
current design and implementation.  Using the same framework in all 
applications has several advantages:


- You only have to learn how to use one well documented framework 
instead of three different and badly documented XML import techniques.


- It exploits the information given by the OOXML schema to produce 
automatically some of the code that has to be hand written today.


- It allows automatic analysis of the coverage of the OOXML 
specification so that we can easily see which parts have already been 
implemented and which are still missing.


- It will be much more easily understandable than the current OOXML 
import (especially that of Writer).


The one big downside is that the new design basically requires a 
reimplementation of the OOXML import.  But anyone who has seen the 
current implementation might not see that as a downside at all :-)




Development and migration

I propose to do the implementation in a new module (possibly called 
main/ooxml/) with the goal to eventually (i.e. in a couple of releases) 
replace main/oox/ and other places that contain OOXML import code.  It 
will not be active by default until everyone agrees that it is release 
ready.  Of course, there will be switches to easily (but not 
accidentally) activate it for development builds.


I also propose to focus first on Impress.  Its complexity regarding 
OOXML is less than that of Writer and Calc and the still existing 
expertise in this area of OpenOffice is probably larger than in Writer 
and definitely larger than in Calc.


Development will start with implementation of the new framework that is 
hinted at above and explained in more detail below.  Then the existing 
Impress import is migrated to the new design by copying and adapting the 
code.  The existing import in main/oox/ remains unchanged.




The new framework

The design of the new framework is based on exploiting the OOXML 
specification (plural because there are different versions, migration 
addendums and MS Office specific extensions).  A parser generator reads 
the specs and creates the actual OOXML parser from that.  The generated 
parser will basically be a (nested) stack automaton where each state 
corresponds roughly to a complex type as defined by the spec.  
Transitions from one state to another correspond to start and end tags 
that move from one complex type to another.
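
To make the idea of the stack automaton concrete, here is a minimal sketch in
Python (not the proposed Java/C++ implementation). The transition table stands
in for what a parser generator would derive from the schema; the tag names and
the CT_* state names are invented for the illustration and are not taken from
the OOXML specification.

```python
import io
import xml.etree.ElementTree as ET

# Hypothetical transition table: (current state, start tag) -> next state.
# In the proposal this would be generated from the schema, not hand-written.
TRANSITIONS = {
    ("START", "doc"): "CT_Document",
    ("CT_Document", "para"): "CT_Paragraph",
    ("CT_Paragraph", "run"): "CT_Run",
    ("CT_Run", "text"): "CT_Text",
}


def run_automaton(xml_text):
    """Drive the automaton over the XML and return the visited states.

    A start tag pushes the state of the complex type being entered;
    the matching end tag pops it again.
    """
    stack = ["START"]
    trace = []
    source = io.BytesIO(xml_text.encode("utf-8"))
    for event, element in ET.iterparse(source, events=("start", "end")):
        if event == "start":
            key = (stack[-1], element.tag)
            if key not in TRANSITIONS:
                raise ValueError(
                    f"unexpected <{element.tag}> in state {stack[-1]}")
            stack.append(TRANSITIONS[key])
            trace.append(("enter", stack[-1]))
        else:
            trace.append(("leave", stack.pop()))
    return trace
```

Because every accepted element has to appear in the table, a table generated
from the schema also gives a natural place to measure which parts of the
specification are covered.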


The actions that are executed on transitions and which do the actual 
import work, still have to be provided manually.  With an intermediate 
DSL (domain specific language) that represents the interface between 
OOXML parser and developer, even this step will become easier and 
more robust.


The use of an intermediate DSL also allows tweaking of the rules derived 
from the OOXML specification should the need arise (to e.g. cope with 
OOXML files that are not 100% conformant to the specs).


The compile time part of the framework is to be implemented in Java to 
allow an efficient and fast development process.  The runtime part of 
the framework, including the generated parser will be implemented in C++ 
and be an integral part of OpenOffice.




Details

At the moment we are using a bare bones XML push parser for reading 
OOXML files.  That means that as the XML parser reads the stream of XML 
elements it asks the OOXML import code to handle start tags, end tags, 
and the text in between.  It is the task of these callbacks to provide 
so called contexts for each element. These contexts can then be used to 
make information like attribute values (which the parser only provides 
to start tags) accessible to the callbacks of text and end tags.
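
The context technique described above can be sketched with Python's standard
SAX push parser. This only illustrates the hand-written pattern, not the
actual code in main/oox/; the element name "run" and the attribute "bold" are
made up for the example.

```python
import xml.sax


class Context:
    """Per-element record that keeps start-tag data for later callbacks."""

    def __init__(self, name, attributes):
        self.name = name
        self.attributes = dict(attributes.items())
        self.text = []


class ImportHandler(xml.sax.ContentHandler):
    """Push-parser callbacks that manage a stack of contexts by hand."""

    def __init__(self):
        super().__init__()
        self.contexts = []
        self.runs = []

    def startElement(self, name, attrs):
        # Attributes are only available here, so store them in a context.
        self.contexts.append(Context(name, attrs))

    def characters(self, content):
        if self.contexts:
            self.contexts[-1].text.append(content)

    def endElement(self, name):
        # The end-tag callback reads the attributes back from the context.
        context = self.contexts.pop()
        if name == "run":
            bold = context.attributes.get("bold") == "true"
            self.runs.append(("".join(context.text), bold))


def import_runs(xml_text):
    """Parse the XML and return (text, is_bold) for each hypothetical run."""
    handler = ImportHandler()
    xml.sax.parseString(xml_text.encode("utf-8"), handler)
    return handler.runs
```

Everything in the Context class and the stack handling is the kind of
bookkeeping that is written by hand today.
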
The creation of contexts and persistence of intermediate data is done 
manually in the existing import code.  The new import framework, 
however, will create it automatically, based on the OOXML specifications 
and semi automatically based on DSL requests.  The automatic part is 
extracted from the specs and is responsible for preprocessing attribute 
values (e.g. conversion from string to boolean, integer, float/double or 
enumerations). The semi automatic part is driven by developer supplied 
information in DSL files and defines the subset of attributes that are 
really evaluated by the

DocFormats - Open source OOXML implementation

2014-08-14 Thread Peter Kelly
Those of you interested in OOXML may want to have a look at my own 
implementation of (a subset of) the spec, which is part of a library I've just 
made available as open source (license is ASLv2):

https://github.com/uxproductivity/DocFormats

I started working on this around two years ago as part of UX Write, and it's 
been included in the version shipping on the iOS app store since February 2013. 
I've recently finished removing all dependencies on iOS/OS X APIs, and 
converting all the code from Objective C to plain C99. It now also builds on 
Linux, with Windows not being too far away.

The design is based on bidirectional transformation, as a way of achieving 
non-destructive editing of foreign file formats. This permits incremental 
implementation of a given spec without risking data loss due to incomplete 
features, since unsupported features of a given file format are left untouched 
on save. UX Write uses HTML as both its native file format and in-memory data 
model (via WebKit), but relies on DocFormats to read & write .docx files, as 
well as export to LaTeX. The next major task I plan to work on (hopefully with 
help from others!) is .odt support.
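
As a rough illustration of the bidirectional idea (not the DocFormats API, which is plain C), here is a minimal Python sketch. The element names <doc>, <p> and <chart> are made up: <p> stands for a supported feature and <chart> for an unsupported one. The "get" direction extracts only what the converter understands; the "put" direction writes edits back into the original document, so everything else survives untouched.

```python
import xml.etree.ElementTree as ET

# Hypothetical source document: <p> is supported, <chart> is not.
SOURCE = "<doc><p>Hello</p><chart kind='pie'/><p>World</p></doc>"


def get(concrete_xml):
    """Forward direction: extract only the supported part (paragraph texts)."""
    root = ET.fromstring(concrete_xml)
    return [p.text or "" for p in root.findall("p")]


def put(abstract_paragraphs, concrete_xml):
    """Backward direction: write edits into the original, keep the rest."""
    root = ET.fromstring(concrete_xml)
    paragraphs = root.findall("p")
    if len(paragraphs) != len(abstract_paragraphs):
        raise ValueError("this sketch only handles in-place text edits")
    for element, text in zip(paragraphs, abstract_paragraphs):
        element.text = text
    return ET.tostring(root, encoding="unicode")


edited = [text.upper() for text in get(SOURCE)]
result = put(edited, SOURCE)
print(result)
```

The key design point is that put() takes the original document as a second argument instead of regenerating the file from the abstract model alone.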

Now that this is open source, the eventual goal is for it to be generally 
usable by any app which has a need to support multiple file formats, such as 
OOXML and ODF. Currently it is limited to word processing formats only, but I'm 
interested in expanding it to cover spreadsheets, presentations, and drawings. 
Aside from editors, it also could be used for batch conversion tools, document 
analysis, web publishing, and other purposes.

There are minimal dependencies (basically only libxml and zlib), to make it 
easy to integrate into different apps. I'm not a fan of huge monolithic 
architectures, and have kept it very independent of other aspects of UX 
Write for this very purpose. Note that this means there is no editing or 
rendering code; it deals solely with conversion. UX Write uses WebKit for the 
rendering, but there are many other ways in which one could build on top of 
this.

I'll be presenting on this at ApacheCon EU this November - see the talk 
"Addressing File Format Compatibility in Word Processors" at 
http://apacheconeu2014.sched.org.

Comments/questions are welcome.

--
Dr. Peter M. Kelly
Founder, UX Productivity
pe...@uxproductivity.com
http://www.uxproductivity.com/
http://www.kellypmk.net/

PGP key: http://www.kellypmk.net/pgp-key
(fingerprint 5435 6718 59F0 DD1F BFA0 5E46 2523 BAA1 44AE 2966)




Re: Contribute code for OOXML export

2014-01-22 Thread Steve Yin
Great news!


On Wed, Jan 22, 2014 at 3:25 PM, Clarence GUO wrote:

> Hi~ All,
> Since Office 2007 Microsoft has defaulted to saving files in OOXML format.
> And soon, in April, Microsoft will stop supporting Office 2003, the last
> version of Office to write the binary format by default. So it becomes more
> and more important for AOO to have capability to support OOXML file format
> in order to help these users who need to work with OOXML files. AOO already
> has some capabilities for OOXML file import, but it needs many
> improvements. We have some pilot code for enabling OOXML export, developed
> by De Bin, Jian Yuan, Sun Ying, Jin Long... Although it still has some ways
> to go before ready for production, we'd like to contribute it first to AOO
> for further development so that more developers can work on the framework
> and continuously contribute their works. Since it still has many feature
> gaps, we propose to put it on a branch firstly, and continue to enhance it,
> and integrate it into a release only when we see it ready.
>
> Clarence
>



-- 
Best Regards,

Steve Yin


Re: Contribute code for OOXML export

2014-01-23 Thread Andrea Pescetti

On 22/01/2014 Clarence GUO wrote:

We have some pilot code for enabling OOXML export, developed
by De Bin, Jian Yuan, Sun Ying, Jin Long... Although it still has some ways
to go before ready for production, we'd like to contribute it first to AOO


Very good news! Even if ODF remains the native and preferred format, 
users regularly asked for better OOXML interoperability. I agree this 
should stay in a branch and not integrated/released until it reaches the 
quality users expect from OpenOffice.


Regards,
  Andrea.

-
To unsubscribe, e-mail: dev-unsubscr...@openoffice.apache.org
For additional commands, e-mail: dev-h...@openoffice.apache.org



Re: Contribute code for OOXML export

2014-01-23 Thread Shenfeng Liu
+1
It is a feature on top of the requirement list per AOO's early survey.
Let's start!

- Shenfeng (Simon)



2014/1/23 Andrea Pescetti 

> On 22/01/2014 Clarence GUO wrote:
>
>> We have some pilot code for enabling OOXML export, developed
>> by De Bin, Jian Yuan, Sun Ying, Jin Long... Although it still has some
>> ways
>> to go before ready for production, we'd like to contribute it first to AOO
>>
>
> Very good news! Even if ODF remains the native and preferred format, users
> regularly asked for better OOXML interoperability. I agree this should stay
> in a branch and not integrated/released until it reaches the quality users
> expect from OpenOffice.
>
> Regards,
>   Andrea.
>
>
>


Re: Contribute code for OOXML export

2014-01-23 Thread Clarence GUO
Although I have contributed 10+ changes, I'm not a committer yet. Could anybody
help me create a branch?
Thanks,

Clarence


2014/1/23 Andrea Pescetti 

> On 22/01/2014 Clarence GUO wrote:
>
>> We have some pilot code for enabling OOXML export, developed
>> by De Bin, Jian Yuan, Sun Ying, Jin Long... Although it still has some
>> ways
>> to go before ready for production, we'd like to contribute it first to AOO
>>
>
> Very good news! Even if ODF remains the native and preferred format, users
> regularly asked for better OOXML interoperability. I agree this should stay
> in a branch and not integrated/released until it reaches the quality users
> expect from OpenOffice.
>
> Regards,
>   Andrea.
>
>
>


Re: Contribute code for OOXML export

2014-01-24 Thread jan i
On 24 January 2014 08:14, Clarence GUO  wrote:

> As although I contributed 10+ changes, I'm not committer yet, could anybody
> help me to create a branch?
> Thanks,
>

https://svn.apache.org/repos/asf/openoffice/branches/ooxml created. Happy
programming.

Just write here on the list whenever you want a merge from trunk.

I cannot comment on how many changes are needed to become a committer, but I
must admit I have not seen your patches on this list?

rgds
jan I.


> Clarence
>
>
> 2014/1/23 Andrea Pescetti 
>
> > On 22/01/2014 Clarence GUO wrote:
> >
> >> We have some pilot code for enabling OOXML export, developed
> >> by De Bin, Jian Yuan, Sun Ying, Jin Long... Although it still has some
> >> ways
> >> to go before ready for production, we'd like to contribute it first to
> AOO
> >>
> >
> > Very good news! Even if ODF remains the native and preferred format,
> users
> > regularly asked for better OOXML interoperability. I agree this should
> stay
> > in a branch and not integrated/released until it reaches the quality
> users
> > expect from OpenOffice.
> >
> > Regards,
> >   Andrea.
> >
> >
> >
>


Re: Contribute code for OOXML export

2014-01-26 Thread Clarence GUO
Hi~ Jan,
Thanks Jan for your help. I'll work on this branch.
About my patches, I picked some, see 123909, 123910, 123816, 122927...

Clarence


2014-01-24 jan i 

> On 24 January 2014 08:14, Clarence GUO  wrote:
>
> > As although I contributed 10+ changes, I'm not committer yet, could
> anybody
> > help me to create a branch?
> > Thanks,
> >
>
> https://svn.apache.org/repos/asf/openoffice/branches/ooxml created. Happy
> programming.
>
> just write here on the list, whenever you want a merge from trunk.
>
> I cannot comment on how many changes is needed to become a committer, but I
> must admit I have not seen your patches on this list ?
>
> rgds
> jan I.
>
>
> > Clarence
> >
> >
> > 2014/1/23 Andrea Pescetti 
> >
> > > On 22/01/2014 Clarence GUO wrote:
> > >
> > >> We have some pilot code for enabling OOXML export, developed
> > >> by De Bin, Jian Yuan, Sun Ying, Jin Long... Although it still has some
> > >> ways
> > >> to go before ready for production, we'd like to contribute it first to
> > AOO
> > >>
> > >
> > > Very good news! Even if ODF remains the native and preferred format,
> > users
> > > regularly asked for better OOXML interoperability. I agree this should
> > stay
> > > in a branch and not integrated/released until it reaches the quality
> > users
> > > expect from OpenOffice.
> > >
> > > Regards,
> > >   Andrea.
> > >
> > >
> > >
> >
>


Re: wiki of OOXML export status

2014-02-16 Thread Kay Schenk
On Sat, Feb 15, 2014 at 5:50 PM, Clarence GUO wrote:

> Hi,
> I posted a wiki about OOXML export status to
> https://cwiki.apache.org/confluence/display/OOOUSERS/OOXML+Export+Status.
> Now all the work is still on my local machine and has not been committed yet.
> I'll commit once the current problems are fixed and the framework for the three
> applications is ready.
>
> Thanks & BRs,
> Clarence
>

Thanks for the update! As soon as cwiki is back up, we can all have a look!


-- 
-
MzK

"Cats do not have to be shown how to have a good time,
 for they are unfailing ingenious in that respect."
   -- James Mason


Re: Bug 124268 - OOXML import problems

2014-02-20 Thread Andre Fischer

On 20.02.2014 09:37, Rainer Bielefeld wrote:

Hi,

I think we should decide whether we want to use a tracking bug [1] or 
a keyword (interop_OOXML) [2]. Using both in parallel only makes it 
more difficult to create reliable queries.


I don't think so.   I have created [1] to have a list of bugs with 
interesting documents that highlight certain problems of the OOXML 
import.  It is assigned to me, so you can see this as a personal list.  
Key words, in contrast, are global.  I have no intention to start 
tracking all OOXML bugs.


By the way, the interop_OOXML keyword has been used only five times.  I 
assume that we have more OOXML problems than that.


-Andre



CU

Rainer


Hyperlinks:

[1] <https://issues.apache.org/ooo/show_bug.cgi?id=124268>
[2] 
<https://issues.apache.org/ooo/buglist.cgi?keywords=interop_OOXML%2C%20&keywords_type=allwords&list_id=126117&query_format=advanced> 








Re: Bug 124268 - OOXML import problems

2014-02-20 Thread Rainer Bielefeld

Edwin Sharp wrote:


Keywords or Tags should be preferred.


Hi,

especially because keywords are visible to everybody in the keyword 
selector, while tracking bugs require knowing that they exist. 
Tracking bugs are preferable when the relation is not simple (for 
example, a bug blocks TrackingBug1, which itself depends on 
TrackingBug2 ...). Dependency trees are sometimes useful, but in my 
experience a simple tag like "[sidebar]" in the summary does the job 
for most issues.


For my workflow the tracking bugs have little impact, but of course that 
might be different for other users.


CU


Rainer





Re: Bug 124268 - OOXML import problems

2014-02-20 Thread Edwin Sharp
I agree with Rainer.
Keywords or Tags should be preferred.

On Thu, Feb 20, 2014, at 10:37, Rainer Bielefeld wrote:
> Hi,
> 
> I think we should decide whether we want to use a tracking bug [1] or a 
> keyword (interop_OOXML) [2]. Using both in parallel only makes it more 
> difficult to create reliable queries.
> 
> CU
> 
> Rainer
> 
> 
> Hyperlinks:
> 
> [1] 
> [2] 
> 
>  
> 
> 
> 
> 
> -
> To unsubscribe, e-mail: qa-unsubscr...@openoffice.apache.org
> For additional commands, e-mail: qa-h...@openoffice.apache.org
> 




Re: wiki of OOXML export status

2014-02-21 Thread Clarence GUO
Kay,
The status was updated. I will update this wiki weekly or biweekly.

Thanks & BRs
Clarence


2014-02-17 2:19 GMT+08:00 Kay Schenk :

> On Sat, Feb 15, 2014 at 5:50 PM, Clarence GUO wrote:
>
> > Hi,
> > I posted a wiki about OOXML export status to
> > https://cwiki.apache.org/confluence/display/OOOUSERS/OOXML+Export+Status
> .
> > Now all the work is still on my local machine and has not been committed yet.
> > I'll commit once the current problems are fixed and the framework for the three
> > applications is ready.
> >
> > Thanks & BRs,
> > Clarence
> >
>
> Thanks for the update! As soon as cwiki is back up, we can all have a look!
>
>
> --
>
> -
> MzK
>
> "Cats do not have to be shown how to have a good time,
>  for they are unfailing ingenious in that respect."
>-- James Mason
>


Re: [PROPOSAL] New OOXML import framework

2014-05-19 Thread Kay Schenk
[top posting for a moment]

Thank you for this initial introduction to planning better support for
OOXML. The reality is this is necessary, and I would imagine most
involved in the project realize this.  OK, just a bit more below.

On 05/19/2014 06:39 AM, Andre Fischer wrote:
> As one of the first tasks in the OOXML area I would like to propose to
> redesign and re-implement the OOXML parser.
> 
> At the moment each application has its own OOXML import design. Those of
> Impress and Calc are basically classic hand written push parser designs
> while that of Writer is semi-automatically derived from the
> WordprocessingML specification.  For all three designs there is hardly
> any documentation and their implementation is hard to understand and
> hard to maintain. All that means that you have to work hard to obtain a
> working knowledge about the OOXML parser for one application and then,
> once you have it, can not transfer it to the other applications.
> 
> I propose a new and unified approach that will essentially replace the
> current design and implementation.  Using the same framework in all
> applications has several advantages:
> 
> - You only have to learn how to use one well documented framework
> instead of three different and badly documented XML import techniques.
> 
> - It exploits the information given by the OOXML schema to produce
> automatically some of the code that has to be hand written today.
> 
> - It allows automatic analysis of the coverage of the OOXML
> specification so that we can easily see which parts have already been
> implemented and which are still missing.
> 
> - It will be much more easily understandable than the current OOXML
> import (especially that of Writer).
> 
> The one big downside is that the new design requires basically a
> reimplementation of the OOXML import.  But everyone who has seen the
> current implementation might not see that as a downside at all :-)
> 
> 
> 
> Development and migration
> 
> I propose to do the implementation in a new module (possibly called
> main/ooxml/) with the goal to eventually (i.e. in a couple of releases)
> replace main/oox/ and other places that contain OOXML import code.  It
> will not be active by default until every one agrees that it is release
> ready.  Of course, there will be switches to easily (but not
> accidentally) activate it for development builds.
> 
> I also propose to focus first on Impress.  Its complexity regarding
> OOXML is less than that of Writer and Calc and the still existing
> expertise in this area of OpenOffice is probably larger than in Writer
> and definitely larger than in Calc.
> 
> Development will start with implementation of the new framework that is
> hinted at above and explained in more detail below.  Then the existing
> Impress import is migrated to the new design by copying and adapting the
> code.  The existing import in main/oox/ remains unchanged.
> 
> 
> 
> The new framework
> 
> The design of the new framework is based on exploiting the OOXML
> specification (plural because there are different versions, migration
> addendums and MS Office specific extensions).  A parser generator reads
> the specs and creates the actual OOXML parser from that.  The generated
> parser will basically be a (nested) stack automaton where each state
> corresponds roughly to a complex type as defined by the spec. 
> Transitions from one state to another correspond to start and end tags
> that move from one complex type to another.
> 
> The actions that are executed on transitions and which do the actual
> import work, still have to be provided manually.  With an intermediate
> DSL (domain specific language) that represents the interface between
> OOXML parser and developer, even this step will become easier and
> more robust.
> 
> The use of an intermediate DSL also allows tweaking of the rules derived
> from the OOXML specification should the need arise (to e.g. cope with
> OOXML files that are not 100% conformant to the specs).
> 
> The compile time part of the framework is to be implemented in Java to
> allow an efficient and fast development process. 

Does this basically mean that we will need to use both Java and C++ for
future builds?

> The runtime part of
> the framework, including the generated parser will be implemented in C++
> and be an integral part of OpenOffice.
> 
> 
> 
> Details
> 
> At the moment we are using a bare bones XML push parser for reading
> OOXML files.  That means that as the XML parser reads the stream of XML
> elements it asks the OOXML import code to handle start tags, end tags,
> and the text in between.  It is the task of these callbacks to provide
> so called contexts for each element. These c

Re: [PROPOSAL] New OOXML import framework

2014-05-19 Thread Andre Fischer

On 20.05.2014 00:28, Kay Schenk wrote:

[top posting for a moment]

Thank you for this initial introduction to planning better support for
OOXML. The reality is this is necessary, and I would imagine most
involved in the project realize this.  OK, just a bit more below.

On 05/19/2014 06:39 AM, Andre Fischer wrote:

The compile time part of the framework is to be implemented in Java to
allow an efficient and fast development process.

Does this basically mean that we will need to use both Java and C++ for
future builds?


We already need Java and C++ for builds.  This does not change.

-Andre





Re: [PROPOSAL] New OOXML import framework

2014-05-20 Thread Kay Schenk


On 05/19/2014 11:57 PM, Andre Fischer wrote:
> On 20.05.2014 00:28, Kay Schenk wrote:
>> [top posting for a moment]
>>
>> Thank you for this initial introduction to planning better support for
>> OOXML. The reality is this is necessary, and I would imagine most
>> involved in the project realize this.  OK, just a bit more below.
>>
>> On 05/19/2014 06:39 AM, Andre Fischer wrote:
>>> The compile time part of the framework is to be implemented in Java to
>>> allow an efficient and fast development process.
>> Does this basically mean that we will need to use both Java and C++ for
>> future builds?
> 
> We already need Java and C++ for builds.  This does not change.
> 
> -Andre
> 
> 

OK, right.

> 

-- 
-
MzK

"Life is either a daring adventure, or nothing."
   -- Helen Keller





Re: [PROPOSAL] New OOXML import framework

2014-05-20 Thread Andrea Pescetti

On 19/05/2014 Andre Fischer wrote:

As one of the first tasks in the OOXML area I would like to propose to
redesign and re-implement the OOXML parser.


I can only agree with this one. We've already discussed it many times, 
but even the many users who prefer ODF need a good support for OOXML for 
interoperability, and better support for the Microsoft Office native 
formats is consistently in the top requests.



I propose a new and unified approach that will essentially replace the
current design and implementation.


Sounds good. Especially the idea to be able to automatically know how 
much of the specification is covered will be helpful.



I also propose to focus first on Impress.  Its complexity regarding
OOXML is less than that of Writer and Calc


And this is probably good for users too. In my experience, the import of 
.PPTX files is the most unsatisfactory one at the moment, with many 
obvious deficiencies. Improving this one first would already give good 
results for users.



I have made several experiments regarding the reading of the
specification and generation of parsers and am confident that the
outlined approach will work.


A not-so-original question: we have another Apache project, POI, 
http://poi.apache.org/ that among the other things has an OOXML parser. 
If we are starting from scratch, why not reusing their code? And, if 
there are reasons for not reusing it, could we validate this roadmap 
with the POI developers, who are probably more familiar with OOXML 
parsing than the average reader of this list?


Regards,
  Andrea.




Re: [PROPOSAL] New OOXML import framework

2014-05-21 Thread Andre Fischer

On 20.05.2014 23:38, Andrea Pescetti wrote:

On 19/05/2014 Andre Fischer wrote:

As one of the first tasks in the OOXML area I would like to propose to
redesign and re-implement the OOXML parser.


I can only agree with this one. We've already discussed it many times, 
but even the many users who prefer ODF need a good support for OOXML 
for interoperability, and better support for the Microsoft Office 
native formats is consistently in the top requests.



I propose a new and unified approach that will essentially replace the
current design and implementation.


Sounds good. Especially the idea to be able to automatically know how 
much of the specification is covered will be helpful.



I also propose to focus first on Impress. Its complexity regarding
OOXML is less than that of Writer and Calc


And this is probably good for users too. In my experience, the import 
of .PPTX files is the most unsatisfactory one at the moment, with many 
obvious deficiencies. Improving this one first would already give good 
results for users.



I have made several experiments regarding the reading of the
specification and generation of parsers and am confident that the
outlined approach will work.


A not-so-original question: we have another Apache project, POI, 
http://poi.apache.org/ that among the other things has an OOXML 
parser. If we are starting from scratch, why not reusing their code? 
And, if there are reasons for not reusing it, could we validate this 
roadmap with the POI developers, who are probably more familiar with 
OOXML parsing than the average reader of this list?


First, we are not really starting from scratch.  There are several 
components to importing OOXML files.  Two important ones are the parser 
that reads (OO)XML streams and turns them into events for start tags, 
end tags, text, etc., and the callbacks that are called for each of 
these events.  The callbacks are the larger and more important part.  I 
want to replace the parser but migrate as much of the existing callback 
code as possible.
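
For readers unfamiliar with this split, here is a minimal sketch of the parser/callback division using Python's standard xml.sax module (OpenOffice's own import code is C++ and looks different). The element and attribute names "para", "run" and "bold" are invented for illustration. The handler keeps one context per open element so that attribute values, which the parser only delivers with the start tag, are still available when the end tag arrives.

```python
import xml.sax


class ImportHandler(xml.sax.ContentHandler):
    """Callbacks invoked by the push parser; contexts are kept on a stack."""

    def __init__(self):
        super().__init__()
        self.contexts = []  # one dict per currently open element
        self.imported = []  # stand-in for the document model being built

    def startElement(self, name, attrs):
        # Attributes are only delivered here, so store them in the context.
        self.contexts.append(
            {"name": name, "attrs": dict(attrs.items()), "text": ""}
        )

    def characters(self, content):
        self.contexts[-1]["text"] += content

    def endElement(self, name):
        context = self.contexts.pop()
        if name == "run":  # hypothetical element name
            bold = context["attrs"].get("bold") == "true"
            self.imported.append((context["text"], bold))


handler = ImportHandler()
xml.sax.parseString(
    b"<para><run bold='true'>Hi</run><run>there</run></para>", handler
)
print(handler.imported)
```

In this picture, the parser is the part behind xml.sax.parseString, while the handler methods are the callbacks that do the actual import work.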


Most of the work in the OOXML import/export project, however, will be 
spent in other areas:


- Implementing features that exist in MS Office but not in OpenOffice.  
Examples are SmartArt shapes (for all applications).


- Improving features in OpenOffice that are not working as well as they 
should/could.  Examples are pivot tables in Calc or the slide show in 
Impress.


- Supporting existing features in OpenOffice that are just not handled by 
the OOXML importer.



Regarding POI, there are several reasons not to use it:

- As said above, the existing import code is to be migrated to the new 
framework.   The new framework should offer an interface that supports 
this migration.


- POI is implemented in Java.

- As far as I understand, POI (I don't find its documentation very 
helpful) is more like a DOM tree with better access to its nodes than a 
streaming parser.  That would result in lower execution speed and larger 
memory consumption.


- POI supports OOXML / MS Office only up to 2007.   That seems like an 
undesirable restriction.


- The original naming (see http://en.wikipedia.org/wiki/Apache_POI) does 
not imply professional development of the POI project.



Regards,
Andre




Regards,
  Andrea.




News about the new OOXML framework.

2014-06-03 Thread Andre Fischer

I would like to give a short status update about the new OOXML framework.

- Created the new module main/ooxml
  There are not yet any makefiles that build the contents of the ooxml/ 
module nor link it into the build process of OpenOffice. However, you 
can use e.g. Eclipse to import the Java projects that are described below.


- Moved the old Office Open XML wiki pages out of the way and created two 
new ones:
  = "OfficeOpenXML" contains an introduction into the OOXML file 
format, a status overview of the implementation progress and links to 
more detailed information.
  = "OOXML" and "ooxml" (uppercase/lowercase) redirect to 
"OfficeOpenXML" so that there is no excuse to not find this page.
  = "OOXML Framework" contains more detailed information about the new 
framework.


- Created a new Java project at ooxml/main/source/framework/SchemaParser 
that parses the XML schema files that come with the ECMA-376 
specification files.
  Its purpose is to read the schema files and create a skeleton OOXML 
parser from it.  This skeleton can then be filled in with code for 
importing certain elements of OOXML documents.


- Created a new Java project at 
ooxml/main/source/framework/JavaOOXMLParser.  Its purpose is testing and 
debugging of and experimenting with the schema parser.  It is not 
intended to become a runtime component of OpenOffice.



The SchemaParser is able to parse all files of the ECMA-376 
specification both in the old (1st edition of 2006) and new (4th edition 
of 2012) versions.  It looks like we need both since the new one is the 
current standard (equivalent to the ISO standard) while the old one is 
actually used.

Not all details of the schema files are handled yet.

The JavaOOXMLParser, based on parser tables created by the SchemaParser, 
is already able to parse the large DOCX file of the 1st edition 
specification.  When pretty printed it is about 90 MB large.  It takes 
the parser about 90 s to read it.  Note that the parser is not optimized 
in any way (if it were, then it would be optimized for readability, not 
for speed) and that it writes about 650 MB of log files in the process.


If anyone would like to play with the parsers, I will gladly provide 
more details.


Best regards,
Andre





Change tracking & versioning (was Re: OOXML)

2014-08-03 Thread Peter Kelly
On 3 Aug 2014, at 3:05 am, Dennis E. Hamilton  wrote:

> In line with the sketch that Peter Kelly provides below, I am personally 
> very sympathetic to the idea of having an internal model that can tolerate 
> difference in format between input and output while preserving in the output 
> everything from the input format it can, even by leaving markers that will be 
> useful on future input of the produced form.  (There is a well-known case of 
> Microsoft Office doing this for HTML it exports, although the added 
> information for recovery of the MSO rendition led to many complaints about 
> document bloat.)
> 
> There are some conflicts between the desire to do this and the fact that some 
> alterations have non-local consequences and may have other effects.  I still 
> support the idea, but there are some tricky cases, including
> 
> - Changes that overlap/conflict with tracked changes but tracked changes are 
> not updated/preserved properly

I'm probably getting a bit off-topic here, but this issue is one of the reasons 
I advocate an approach that keeps change tracking information separate from the 
content itself, rather than part of it. In my mind, Git provides the perfect 
model for this, although integrating it (or something else based on a similar 
model) into a word processor or office suite remains, shall we say, a rather 
significant problem to solve, both in the sense of the theoretical model and 
how that would be exposed in a user interface.

By itself, keeping the change information separate wouldn't solve the problem 
of inconsistency when the file is modified by an implementation with no 
knowledge of change tracking information. However, with a data model based on 
that of a version control system, that is able to access the previous version 
of the file as well as the current one, find the differences between the two, 
and allow the user to apply those differences, this could be addressed.

Let's say, just as a mental exercise, that we were to embed a git repository 
directly within an ODF file. That is, the .odt file is a zip archive containing 
the usual content.xml, styles.xml etc and also has a .git directory inside it, 
which contains the complete revision history of all these separate files. When 
you save the document in an implementation that does not support any change 
tracking/versioning, it would just overwrite the XML files in the same way as a 
text editor writes a file to disk. When you save the document in an 
implementation that *does* support this however, it overwrites the files and 
*then* does a git commit.

With this approach, if you were to first create a file in implementation A 
which supports this versioning, you'd have a zip file with a git repository and 
one or more commits, and the "working copy" (that is, all the files within the 
zip archive outside of the .git directory) would be "clean" (up to date). If 
you then open and save it in implementation B which does not support 
versioning, it would not touch the repository and leave the .git directory in 
the zip file untouched, but instead save over the XML files. Then you open it 
in implementation A again, and you can see that the working directory is not 
clean, and there are outstanding changes. These could then be displayed in the 
editor in the same way as is done currently, without the user noticing any 
difference. And you'd have the benefits of knowing the derivation relationships 
between versions, so if you get two different versions of a document back that 
have the same ancestor, you could do a merge.

Now I'm not suggesting that actually storing a git repository inside a .odt 
archive would be a good way to go - partly for efficiency reasons (duplication 
of document's entire history in every copy), and partly because its format is 
pure binary, and is so vastly different from everything else in ODF. 
Nonetheless, at a theoretical level, the core idea - of storing a version 
history separate from the content, from which changes can automatically be 
detected without requiring any extensions to the core part of the standard 
itself - would I think be worth exploring.
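
As a rough illustration of that core idea (and not of git itself), the sketch below keeps a small hash manifest in a separate entry of the zip package. The entry name `history/manifest.json` and all function names are hypothetical and not part of ODF or any existing implementation. A version-aware writer records the manifest; an unaware writer simply carries the old entry along, and the next version-aware reader can detect which files changed.

```python
import hashlib
import io
import json
import zipfile

HISTORY_ENTRY = "history/manifest.json"  # hypothetical name, not part of ODF


def digest(data: bytes) -> str:
    return hashlib.sha256(data).hexdigest()


def save(files: dict, previous: bytes = None, versioned: bool = False) -> bytes:
    """Write a package from {name: bytes}.

    A version-aware writer records a manifest of content hashes. An unaware
    writer only carries the old history entry along, untouched.
    """
    buf = io.BytesIO()
    with zipfile.ZipFile(buf, "w") as zf:
        for name, data in files.items():
            zf.writestr(name, data)
        if versioned:
            manifest = {name: digest(data) for name, data in files.items()}
            zf.writestr(HISTORY_ENTRY, json.dumps(manifest))
        elif previous is not None:
            with zipfile.ZipFile(io.BytesIO(previous)) as old:
                if HISTORY_ENTRY in old.namelist():
                    zf.writestr(HISTORY_ENTRY, old.read(HISTORY_ENTRY))
    return buf.getvalue()


def outstanding_changes(package: bytes) -> list:
    """Names of entries that differ from the last recorded version."""
    with zipfile.ZipFile(io.BytesIO(package)) as zf:
        current = {name: digest(zf.read(name))
                   for name in zf.namelist() if name != HISTORY_ENTRY}
        if HISTORY_ENTRY not in zf.namelist():
            return sorted(current)  # no history at all: everything is new
        recorded = json.loads(zf.read(HISTORY_ENTRY))
    return sorted(name for name in set(recorded) | set(current)
                  if recorded.get(name) != current.get(name))
```

Saving with `versioned=True` yields a "clean" package; re-saving the edited files with `previous=` set and `versioned=False` plays the role of implementation B, after which `outstanding_changes` lists the modified entries.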

I know this is quite a different approach to what you've previously been 
considering; what are your thoughts?

--
Dr. Peter M. Kelly
Founder, UX Productivity
pe...@uxproductivity.com
http://www.uxproductivity.com/
http://www.kellypmk.net/

PGP key: http://www.kellypmk.net/pgp-key
(fingerprint 5435 6718 59F0 DD1F BFA0 5E46 2523 BAA1 44AE 2966)





Re: DocFormats - Open source OOXML implementation

2014-08-15 Thread Andrea Pescetti

On 15/08/2014 Peter Kelly wrote:

Those of you interested in OOXML may want to have a look at my own
implementation of (a subset of) the spec, which is part of a library
I've just made available as open source (license is ASLv2):
https://github.com/uxproductivity/DocFormats


It's very interesting. I hope that in future it may become relevant to 
OpenOffice or to Apache at large.



The design is based on bidirectional transformation, as a way of
achieving non-destructive editing of foreign file formats. This permits
incremental implementation of a given spec without risking data loss due
to incomplete features, since unsupported features of a given file
format are left untouched on save.


Does this mean that
$ dfutil/dfutil filename.docx filename.html
$ dfutil/dfutil filename.html filename2.docx
should produce a "filename2.docx" that is quite similar to 
"filename.docx"? It is failing rather badly (invalid OOXML output in the 
second conversion, ZIP container clearly missing files and possible 
breaking order) in a simple test I did with a 1-page docx file.


What is the best channel to report issues?


I'll be presenting on this at ApacheCon EU this November - see the talk
"Addressing File Format Compatibility in Word Processors" at
http://apacheconeu2014.sched.org


Looking forward to seeing it live!

Regards,
  Andrea.




Re: DocFormats - Open source OOXML implementation

2014-08-15 Thread Peter Kelly
On 16 Aug 2014, at 5:26 am, Andrea Pescetti  wrote:

> On 15/08/2014 Peter Kelly wrote:
>> Those of you interested in OOXML may want to have a look at my own
>> implementation of (a subset of) the spec, which is part of a library
>> I've just made available as open source (license is ASLv2):
>> https://github.com/uxproductivity/DocFormats
> 
> It's very interesting. I hope that in future it may become relevant to 
> OpenOffice or to Apache at large.
> 
>> The design is based on bidirectional transformation, as a way of
>> achieving non-destructive editing of foreign file formats. This permits
>> incremental implementation of a given spec without risking data loss due
>> to incomplete features, since unsupported features of a given file
>> format are left untouched on save.
> 
> Does this mean that
> $ dfutil/dfutil filename.docx filename.html
> $ dfutil/dfutil filename.html filename2.docx
> should produce a "filename2.docx" that is quite similar to "filename.docx"? 
> It is failing rather badly (invalid OOXML output in the second conversion, 
> ZIP container clearly missing files and possible breaking order) in a simple 
> test I did with a 1-page docx file.

I'm not surprised this is the first issue to come up :$ There's a *lot* of 
knowledge I need to document for others; questions from you and others are the 
best way to motivate me to get that written ;)

What's happening here is that when filename.html is produced in the first 
step, each of its elements contains an id attribute containing a numeric 
identifier that refers to a specific element in the source docx file 
(specifically, the word/document.xml file within the package). These numeric 
identifiers are generated during parsing, and correspond to the position of the 
element in document order (so 1, 2, 3, etc.). When you convert from HTML to 
.docx, it uses the id attributes to re-establish these relationships, so that 
it knows which elements in the HTML file correspond to which elements in the 
.docx file.

The problem you encountered stems from the fact that this mapping is only valid 
in specific circumstances - that is, when the .docx file being updated is 
exactly the same as its original. If this is not the case, then the identifier 
assigned to a given node will differ whenever other nodes have been inserted 
before it. So for example if you do the following:

dfutil filename.docx filename.html
# Modify filename.html
dfutil filename.html filename.docx
dfutil filename.html filename.docx

Then the third run will fail, because in the second the docx file will have 
been updated based on the changes in the HTML, changing the sequence numbers 
assigned to each node, and so on the third run the mapping will no longer be 
valid. 
The conversion works on the assumption that the docx file is the same as the 
original. The way that UX Write uses the library, it ensures this is the case, 
but the library does not check for this (and yes, it should; more on this 
below).
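
To make the failure mode concrete, here is a minimal sketch. It assumes the identifiers are a plain count of elements in document order, as described above; the library's real numbering may differ in detail, and `number_elements` is a hypothetical helper, not part of DocFormats.

```python
import xml.etree.ElementTree as ET


def number_elements(xml_text: str) -> dict:
    """Map sequence number -> (tag, text) in document order, starting at 1."""
    root = ET.fromstring(xml_text)
    return {seq: (el.tag, (el.text or "").strip())
            for seq, el in enumerate(root.iter(), start=1)}


original = "<body><p>Intro</p><p>Details</p></body>"
updated = "<body><p>Intro</p><p>New paragraph</p><p>Details</p></body>"

before = number_elements(original)
after = number_elements(updated)

# The HTML still claims "Details" is element 3, but once the docx has been
# updated, element 3 is the newly inserted paragraph.
print(before[3])  # ('p', 'Details')
print(after[3])   # ('p', 'New paragraph')
```

Any id stored in the HTML that points past the insertion is now off by one, which is why re-running the update against an already-updated docx goes wrong.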

Your case is similar, though in this case you're creating a new docx file, not 
updating an existing one. However what it actually does in this case is to 
create an empty .docx file, and then "update" that based on the HTML. In doing 
so, it assumes that the HTML does not contain any mappings (that is, id 
attributes with the prefix "bdt"). Since the filename.html you generated does, 
it tries to map these to elements in the docx file, failing badly.

The only workaround for this at present is to manually edit the HTML file and 
remove all id attributes. The quickest way to do this is with the following 
command:

sed -i '' -E 's/ id="word[0-9]+"//g' filename.html

Then, when you run dfutil, it will see that there is no mapping for any of the 
elements in the HTML file, and thus avoid the problems in the output you 
observed.
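
For anyone without BSD sed (the `-i ''` form is the BSD/macOS spelling), a portable sketch of the same clean-up follows. It assumes the ids look like `id="word123"`, exactly as in the sed pattern; `strip_mapping_ids` is a hypothetical helper name.

```python
import re

# Assumption: mapping ids have the form id="word<digits>", as in the sed pattern.
ID_ATTR = re.compile(r' id="word[0-9]+"')


def strip_mapping_ids(html: str) -> str:
    """Remove every id="wordN" attribute, leaving other attributes alone."""
    return ID_ATTR.sub("", html)
```

Read filename.html, pass its contents through `strip_mapping_ids`, and write the result back before running dfutil.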

Now, onto the fix:

The library needs to have some way of checking that the HTML file being used as 
part of an update operation has a mapping (id attributes) that matches the docx 
file being updated (in the case of creating a new file, this is just an empty 
docx file). In the event that this is not the case, it could still do the 
update, but would act as if the entire document had been replaced with a 
completely new one.

The solution I'll likely implement (and this should really be my first task, 
given the potential for problems like the above) is this:

- Include a hash of the .docx file (or relevant parts of it) in the HTML file, 
e.g. as a meta element or as part of the prefix on all id attributes
- On update, re-compute the hash of the .docx file and compare it against 
the one stored in the HTML file (if any), and if there's no match, treat the 
HTML file as a complete replacement of all content
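
The two steps above could be sketched roughly as follows. This is only an illustration of the proposal, not the library's code: the meta element name `source-docx-sha256`, the choice of SHA-256 over the whole file, and all function names are assumptions.

```python
import hashlib
import re

META_NAME = "source-docx-sha256"  # hypothetical meta element name


def docx_hash(docx_bytes: bytes) -> str:
    return hashlib.sha256(docx_bytes).hexdigest()


def embed_hash(html: str, docx_bytes: bytes) -> str:
    """Insert a meta element carrying the hash right after <head>."""
    meta = '<meta name="%s" content="%s">' % (META_NAME, docx_hash(docx_bytes))
    return html.replace("<head>", "<head>" + meta, 1)


def stored_hash(html: str):
    """Return the hash recorded in the HTML, or None if there is none."""
    pattern = r'<meta name="%s" content="([0-9a-f]{64})">' % META_NAME
    match = re.search(pattern, html)
    return match.group(1) if match else None


def mapping_is_valid(html: str, docx_bytes: bytes) -> bool:
    """True only if the HTML was generated from exactly this docx."""
    return stored_hash(html) == docx_hash(docx_bytes)
```

When `mapping_is_valid` returns False, the update would ignore the id attributes and treat the HTML as a complete replacement of the content.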


> 
> What is the best chann

Re: DocFormats - Open source OOXML implementation

2014-08-15 Thread Peter Kelly
On 16 Aug 2014, at 5:26 am, Andrea Pescetti  wrote:

> Does this mean that
> $ dfutil/dfutil filename.docx filename.html
> $ dfutil/dfutil filename.html filename2.docx
> should produce a "filename2.docx" that is quite similar to "filename.docx"? 
> It is failing rather badly (invalid OOXML output in the second conversion, 
> ZIP container clearly missing files and possible breaking order) in a simple 
> test I did with a 1-page docx file.
> 
> What is the best channel to report issues?

Currently just email to me (or here on the list), but we should ideally get a 
dedicated mailing list/bug tracking system set up for it soon.

--
Dr. Peter M. Kelly
Founder, UX Productivity
pe...@uxproductivity.com
http://www.uxproductivity.com/
http://www.kellypmk.net/

PGP key: http://www.kellypmk.net/pgp-key
(fingerprint 5435 6718 59F0 DD1F BFA0 5E46 2523 BAA1 44AE 2966)





Re: DocFormats - Open source OOXML implementation

2014-08-15 Thread Andrea Pescetti

Peter Kelly wrote:

On 16 Aug 2014, at 5:26 am, Andrea Pescetti wrote:

Does this mean that
$ dfutil/dfutil filename.docx filename.html
$ dfutil/dfutil filename.html filename2.docx
should produce a "filename2.docx" that is quite similar to
"filename.docx"? It is failing rather badly (invalid OOXML output in
the second conversion, ZIP container clearly missing files and
possible breaking order) in a simple test I did with a 1-page docx file.


I'm not surprised this is the first issue to come up :$ There's a *lot*
of knowledge I need to document for others; questions from you and
others are the best way to motivate me to get that written ;)


I've also been fixing (or breaking, who knows!) some documentation on my 
clone (my "fork" as Github likes to call it) but I'll submit a pull 
request only when basic things work.



Since the
filename.html you generated does, it tries to map these to elements in
the docx file, failing badly.


OK, but the following fails equally badly (producing an invalid OOXML 
file, even though this time it looks more consistent in size and 
internal content with filename.docx):

$ dfutil/dfutil filename.docx filename.html
Created filename.html
$ dfutil/dfutil filename.html filename.docx

What is the best channel to report this issue and the 38 tests that are 
failing in my setup (provided they are all expected to pass)?



- Include a hash of the .docx file (or relevant parts of it) in the HTML
file, e.g. as a meta element or as part of the prefix on all id attributes


Seems a good idea. Perhaps having it as a meta element will be enough, 
unless it makes sense for some reason to link each attribute to a 
specific .docx file. Still, this won't solve the problem above.


Regards,
  Andrea.




Re: DocFormats - Open source OOXML implementation

2014-08-16 Thread jan i
On 16 August 2014 03:50, Peter Kelly  wrote:

> On 16 Aug 2014, at 5:26 am, Andrea Pescetti  wrote:
>
> On 15/08/2014 Peter Kelly wrote:
>
> Those of you interested in OOXML may want to have a look at my own
> implementation of (a subset of) the spec, which is part of a library
> I've just made available as open source (license is ASLv2):
> https://github.com/uxproductivity/DocFormats
>
>
> It's very interesting. I hope that in future it may become relevant to
> OpenOffice or to Apache at large.
>
> The design is based on bidirectional transformation, as a way of
> achieving non-destructive editing of foreign file formats. This permits
> incremental implementation of a given spec without risking data loss due
> to incomplete features, since unsupported features of a given file
> format are left untouched on save.
>
>
> Does this mean that
> $ dfutil/dfutil filename.docx filename.html
> $ dfutil/dfutil filename.html filename2.docx
> should produce a "filename2.docx" that is quite similar to
> "filename.docx"? It is failing rather badly (invalid OOXML output in the
> second conversion, ZIP container clearly missing files and possible
> breaking order) in a simple test I did with a 1-page docx file.
>
>
> I'm not surprised this is the first issue to come up :$ There's a *lot* of
> knowledge I need to document for others; questions from you and others are
> the best way to motivate me to get that written ;)
>
> What's happening here is that when filename.html is produced in the first
> step, each of its elements contains an id attribute containing a numeric
> identifier that refers to a specific element in the source docx file
> (specifically, the word/document.xml file within the package). These
> numeric identifiers are generated during parsing, and correspond to the
> position of the element in document order (so 1, 2, 3, etc.). When you
> convert from HTML to .docx, it uses the id attributes to re-establish these
> relationships, so that it knows which elements in the HTML file correspond
> to which elements in the .docx file.
>
> The problem you encountered stems from the fact that this mapping is only
> valid in specific circumstances - that is, when the .docx file being
> updated is exactly the same as its original. If this is not the case, then
> the identifier assigned to a given node will differ whenever other nodes
> have been inserted before it. So for example if you do
> the following:
>
> dfutil filename.docx filename.html
> # Modify filename.html
> dfutil filename.html filename.docx
> dfutil filename.html filename.docx
>
> Then the third run will fail, because in the second the docx file will
> have been updated based on the changes in the HTML, changing the sequence
> numbers assigned to each node, and so on the third run the mapping will
> no longer be valid. The conversion works on the assumption that the docx
> file is the
> same as the original. The way that UX Write uses the library, it ensures
> this is the case, but the library does not check for this (and yes, it
> should; more on this below).
>
> Your case is similar, though in this case you're creating a new docx file,
> not updating an existing one. However what it actually does in this case is
> to create an empty .docx file, and then "update" that based on the HTML. In
> doing so, it assumes that the HTML does not contain any mappings (that is,
> id attributes with the prefix "bdt"). Since the filename.html you generated
> does, it tries to map these to elements in the docx file, failing badly.
>
> The only workaround for this at present is to manually edit the HTML file
> and remove all id attributes. The quickest way to do this is with the
> following command:
>
> sed -i '' -E 's/ id="word[0-9]+"//g' filename.html
>
> Then, when you run dfutil, it will see that there is no mapping for any of
> the elements in the HTML file, and thus avoid the problems in the output
> you observed.
>
> Now, onto the fix:
>
> The library needs to have some way of checking that the HTML file being
> used as part of an update operation has a mapping (id attributes) that
> matches the docx file being updated (in the case of creating a new file, this
> is just an empty docx file). In the event that this is not the case, it
> could still do the update, but would act as if the entire document had been
> replaced with a completely new one.
>
> The solution I'll likely implement (and this should really be my first
> task, given the potential for problems like the above) is this:
>
In my humble opinion you should not use time on this right now.

If you fix a bug we have a

Re: DocFormats - Open source OOXML implementation

2014-08-16 Thread Peter Kelly
On 16 Aug 2014, at 12:55 pm, Andrea Pescetti  wrote:

> I've also been fixing (or breaking, who knows!) some documentation on my 
> clone (my "fork" as Github likes to call it) but I'll submit a pull request 
> only when basic things work.

I've just merged in your changes and also invited you as a committer to the 
repository. Then you'll be able to push directly to it instead of having to 
maintain your own fork.

I vote that we establish a policy of rebasing instead of merging in the general 
case (unless there's a good reason to do otherwise), as this will help maintain 
a mostly-linear history and avoid the annoyances described in [1]. That is, if 
before I push to the repository I see that the remote master has advanced (due 
to you or Jan committing something else), I'll rebase my commits on top of 
yours, so they look like they come "after" them in the history. Likewise, you 
and Jan would do the same if I've made commits. What do you think?

[1] http://blog.spreedly.com/2014/06/24/merge-pull-request-considered-harmful

--
Dr. Peter M. Kelly
Founder, UX Productivity
pe...@uxproductivity.com
http://www.uxproductivity.com/
http://www.kellypmk.net/

PGP key: http://www.kellypmk.net/pgp-key
(fingerprint 5435 6718 59F0 DD1F BFA0 5E46 2523 BAA1 44AE 2966)





RE: DocFormats - Open source OOXML implementation

2014-08-16 Thread Dennis E. Hamilton
I don't have any skin in this game.

Yet I am baffled about where this work is going on and what Apache Project it 
relates to.  Is there an incubator proposal for Apache DocFormats on its way?

In particular, I would expect that some thought would be given to the ODF 
Toolkit and that incubator project, <http://incubator.apache.org/odftoolkit/>.

Also, Apache POI would seem to have some relevance, especially the OpenXML4J 
component, <http://poi.apache.org/>.

These are all Java based, as is Armin's current project in the AOO repository.  
I haven't listed open-source projects outside the embrace of ASF.

A single  remark is in-line below (although this notation may derail 
defective HTML presentation of plaintext containing angle brackets).

Re-subscribing to general-incubator now ... 

Oh, and congratulations on joining the IPMC, Jan.
 
 -- Dennis E. Hamilton
dennis.hamil...@acm.org  +1-206-779-9430
https://keybase.io/orcmid  PGP F96E 89FF D456 628A
X.509 certs used and requested for signed e-mail



-Original Message-
From: jan i [mailto:j...@apache.org] 
Sent: Saturday, August 16, 2014 01:10
To: dev
Subject: Re: DocFormats - Open source OOXML implementation

On 16 August 2014 03:50, Peter Kelly  wrote:

[ ... ]
> Now, onto the fix:
>
> The library needs to have some way of checking that the HTML file being
> used as part of an update operation has a mapping (id attributes) that
> matches the docx file being updated (in the case of creating a new file, this
> is just an empty docx file). In the event that this is not the case, it
> could still do the update, but would act as if the entire document had been
> replaced with a completely new one.
>
> The solution I'll likely implement (and this should really be my first
> task, given the potential for problems like the above) is this:
>
In my humble opinion you should not use time on this right now.

If you fix a bug we have a 1-1 relation (1 man used, 1 bug fixed).
If you start getting the documentation right we have a 1-n relation (1 man
used, n men help fix bugs).

Please keep in mind, we are building a community in order to move away from "I
have to do it, because I am the only one who knows how", and you are the most
important enabler of that. We need your knowledge in a file, so that
others can work.

[ ... ]

When the project (hopefully) enters incubator, we will automatically have
access to a bug tracking system (JIRA), and with that hopefully only being
some months away I would not recommend setting up one now.


   On Github, there is already an issues structure, 
   <https://github.com/uxproductivity/DocFormats/issues>.
   I think this should be continued in use until a different 
   setup arrives "any day soon".  Note that some Github projects 
   create a single subrepository that is just for its issues 
   function.  E.g., https://github.com/keybase/keybase-issues



[ ... ]






Re: DocFormats - Open source OOXML implementation

2014-08-16 Thread jan i
On 16 August 2014 18:38, Dennis E. Hamilton  wrote:

> I don't have any skin in this game.
>
> Yet I am baffled about where this work is going on and what Apache Project
> it relates to.  Is there an incubator proposal for Apache DocFormats on its
> way?
>
Yes, there is a proposal on its way; look at general-incubator over roughly the
last 3 days. Right now it has not been decided who should sponsor this project.


>
> In particular, I would expect that some thought would be given to the ODF
> Toolkit and that incubator project, <
> http://incubator.apache.org/odftoolkit/>.
>
> Also, Apache POI would seem to have some relevance, especially the
> OpenXML4J component, <http://poi.apache.org/>.
>
The intention is clearly to have at least a close cooperation with these
projects. But DocFormats aims at a bit more (e.g. being OpenOffice on
tablets).

I am right now working on a student proposal, to get a compliance sheet made
for products that offer OOXML and/or ODF. Maybe that would be something you
would want to help out with.


> These are all Java based, as is Armin's current project in the AOO
> repository.  I haven't listed open-source projects outside the embrace of
> ASF.
>
> A single  remark is in-line below (although this notation may
> derail defective HTML presentation of plaintext containing angle brackets).
>
> Re-subscribing to general-incubator now ...
>
> Oh, and congratulations on joining the IPMC, Jan.
>
thanks a lot.

rgds
jan i

>
>  -- Dennis E. Hamilton
> dennis.hamil...@acm.org  +1-206-779-9430
> https://keybase.io/orcmid  PGP F96E 89FF D456 628A
> X.509 certs used and requested for signed e-mail
>
>
>
> -Original Message-
> From: jan i [mailto:j...@apache.org]
> Sent: Saturday, August 16, 2014 01:10
> To: dev
> Subject: Re: DocFormats - Open source OOXML implementation
>
> On 16 August 2014 03:50, Peter Kelly  wrote:
>
> [ ... ]
> > Now, onto the fix:
> >
> > The library needs to have some way of checking that the HTML file being
> > used as part of an update operation has a mapping (id attributes) that
> > matches the docx file being updated (in the case of creating a new file,
> > this is just an empty docx file). In the event that this is not the case,
> > it could still do the update, but would act as if the entire document had
> > been replaced with a completely new one.
> >
> > The solution I'll likely implement (and this should really be my first
> > task, given the potential for problems like the above) is this:
> >
> In my humble opinion you should not use time on this right now.
>
> If you fix a bug we have a 1-1 relation (1 man used, 1 bug fixed).
> If you start getting the documentation right we have a 1-n relation (1 man
> used, n men help fix bugs).
>
> Please keep in mind, we are building a community in order to move away from "I
> have to do it, because I am the only one who knows how", and you are the most
> important enabler of that. We need your knowledge in a file, so that
> others can work.
>
> [ ... ]
>
> When the project (hopefully) enters incubator, we will automatically have
> access to a bug tracking system (JIRA), and with that hopefully only being
> some months away I would not recommend setting up one now.
>
> 
>On Github, there is already an issues structure,
><https://github.com/uxproductivity/DocFormats/issues>.
>I think this should be continued in use until a different
>setup arrives "any day soon".  Note that some Github projects
>create a single subrepository that is just for its issues
>function.  E.g., https://github.com/keybase/keybase-issues
> 
>
>
> [ ... ]
>
>
>


Re: DocFormats - Open source OOXML implementation

2014-08-16 Thread Peter Kelly
The ODF Toolkit and Apache POI are both APIs to specific file formats. The key 
differences with DocFormats are:

1. Support for multiple file formats (a limited range supported presently, but 
the intention is to expand to other formats)
2. Ability to "abstract over" a file format, in that the goal is to allow 
people to write apps without caring what format the data is physically stored in
3. Use of HTML as a common intermediate format during translation (though other 
formats can be manipulated if natively supported by an editor, then converted 
back to the source format)
4. Bi-directionality, i.e. the ability to do non-destructive updates when 
converting between formats
5. A building-block for creating HTML-based editors, viewers, and other 
applications (in particular, using WebKit or other browser engines)
6. No reliance on Java
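
To illustrate point 4, here is a toy sketch of a bidirectional (get/put) transformation. It is an assumption-laden illustration rather than DocFormats code: the element names `p` and `chart` are invented, and only in-place text edits are handled. The point is that `put` writes changes back into the original tree, so anything the converter does not understand is left untouched.

```python
import xml.etree.ElementTree as ET


def get(source_xml: str) -> list:
    """Forward direction: expose only the paragraphs we understand."""
    root = ET.fromstring(source_xml)
    return [p.text or "" for p in root.findall("p")]


def put(source_xml: str, texts: list) -> str:
    """Backward direction: write edited texts into the original tree.

    Anything that is not a <p> is never touched, so it survives the save.
    """
    root = ET.fromstring(source_xml)
    paragraphs = root.findall("p")
    if len(paragraphs) != len(texts):
        raise ValueError("this sketch only handles in-place text edits")
    for p, text in zip(paragraphs, texts):
        p.text = text
    return ET.tostring(root, encoding="unicode")


source = '<body><p>Hello</p><chart kind="pie" /><p>World</p></body>'
view = get(source)   # ['Hello', 'World']
view[1] = "there"
result = put(source, view)
print(result)        # the <chart> element is still present
```

A one-way converter would have to regenerate the whole file from `view` and would silently drop the chart; updating the original in place is what makes incremental support for a format safe.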

If you want to do mobile, you can't use anything Java-based - that is, if you 
want to support iOS. No-one can use the two projects you mentioned if they want 
to build an iPhone or iPad app, which is one of several reasons ODF is absent 
from the mobile space.

Although in the past Java has been a great choice for cross-platform 
applications, sadly this is no longer the case - hence C (the code was 
originally in Objective-C but translated to C). Java is also an extra 
dependency which can unnecessarily bloat the requirements for an application, 
whereas DocFormats is very lightweight.

In addition to a library for dealing with file formats, the overall idea is 
much wider than that - to build applications on top of this, such as an editor, 
also within the context of the project. And also, promoting the idea of "file 
format independence", in the same way as most now see platform-independence as 
a good thing. We're looking to make it as flexible as possible, so that it can 
be adapted for mobile, desktop, and web. It's sort of a "clean start" in a 
sense, though not necessarily aiming to entirely replicate existing projects, 
but rather something new.

Both the ODF Toolkit and Apache POI have useful work which will quite possibly 
be of use. In particular I think the latter may be helpful for supporting the 
older binary MS file formats, and we hope to collaborate with other Apache 
projects where relevant.

--
Dr. Peter M. Kelly
Founder, UX Productivity
pe...@uxproductivity.com
http://www.uxproductivity.com/
http://www.kellypmk.net/

PGP key: http://www.kellypmk.net/pgp-key
(fingerprint 5435 6718 59F0 DD1F BFA0 5E46 2523 BAA1 44AE 2966)

On 16 Aug 2014, at 11:38 pm, Dennis E. Hamilton  wrote:

> I don't have any skin in this game.
> 
> Yet I am baffled about where this work is going on and what Apache Project it 
> relates to.  Is there an incubator proposal for Apache DocFormats on its way?
> 
> In particular, I would expect that some thought would be given to the ODF 
> Toolkit and that incubator project, <http://incubator.apache.org/odftoolkit/>.
> 
> Also, Apache POI would seem to have some relevance, especially the OpenXML4J 
> component, <http://poi.apache.org/>.
> 
> These are all Java based, as is Armin's current project in the AOO 
> repository.  I haven't listed open-source projects outside the embrace of ASF.
> 
> A single  remark is in-line below (although this notation may derail 
> defective HTML presentation of plaintext containing angle brackets).
> 
> Re-subscribing to general-incubator now ... 
> 
> Oh, and congratulations on joining the IPMC, Jan.
> 
> -- Dennis E. Hamilton
>dennis.hamil...@acm.org  +1-206-779-9430
>https://keybase.io/orcmid  PGP F96E 89FF D456 628A
>X.509 certs used and requested for signed e-mail
> 
> 
> 
> -Original Message-
> From: jan i [mailto:j...@apache.org] 
> Sent: Saturday, August 16, 2014 01:10
> To: dev
> Subject: Re: DocFormats - Open source OOXML implementation
> 
> On 16 August 2014 03:50, Peter Kelly  wrote:
> 
> [ ... ]
>> Now, onto the fix:
>> 
>> The library needs to have some way of checking that the HTML file being
>> used as part of an update operation has a mapping (id attributes) that
>> matches the docx file being updated (in the case of creating a new file, this
>> is just an empty docx file). In the event that this is not the case, it
>> could still do the update, but would act as if the entire document had been
>> replaced with a completely new one.
>> 
>> The solution I'll likely implement (and this should really be my first
>> task, given the potential for problems like the above) is this:
>> 
> In my humble opinion you should not use time on this right now.
> 
> If you fix a bug we have a 1-1 relation (1 man used, 1 bug fixed)
> If you start getting the documentation right we have a 1-n relation (1 man
> used, n men help fi

RE: DocFormats - Open source OOXML implementation

2014-08-16 Thread Dennis E. Hamilton
OK, I get it.  There is cross-talk between this dev-openoffice list and 
general-incubator involving two messages there,

1. A general-incubator post from you, replying to a message from Peter Kelly 
about his DocFormats document-conversion project and bringing Peter's request 
to the attention of general-incubator, at
<http://mail-archives.apache.org/mod_mbox/incubator-general/201408.mbox/%3CCAK2iWdTS%2BKUWWZ%2BBOAnsNW4PiE37OLJA%3Dx%2B5az%3DAdAviiS_47A%40mail.gmail.com%3E>.

2. An observation from Andrea that is essentially good wishes.

I find it an interesting leap from DocFormats to OpenOffice for tablets and 
look forward to seeing the incubator proposal.

I am definitely interested in the "student proposal, to get a compliance sheet 
made for products that offer OOXML and/or ODF" that you mention.

Interoperability in interchange among document formats is a driving issue for 
me.  I look forward to more about that.  There has been significant effort in 
this area, although it does not seem to have made much impact and is generally 
little-known.  The OASIS effort on ODF Interoperability and Conformance (OIC 
TC) folded its tent in November 2013.  (On that one, I am an unindicted 
co-conspirator.)

I will see what references I can dig up after I submit updated pre-conference 
versions of some papers due this weekend, 
<https://sites.google.com/site/dchanges14/program>.  Information about those 
interop/conversion efforts would also be good backup information for the 
DChanges 2014 workshop next month.

 - Dennis

PS: Roundtripping between OOXML and HTML is something that Microsoft put 
considerable effort into.  Some found the resulting HTML (pre-HTML5) rather 
nauseous, but it is remarkably presentation-preserving as far as it goes. It 
might be informative to look into how well AOO does the same between ODF and 
[X]HTML as a calibration.  One could also look at the Office Web Apps, that 
manifest OOXML documents via editable web-page interfaces as a descendant.  
These seem to be tied to the way that some Phone and Tablet Microsoft Office 
applications are tied to cloud-stored documents.


-Original Message-
From: jan i [mailto:j...@apache.org] 
Sent: Saturday, August 16, 2014 09:45
To: Dennis Hamilton
Cc: dev; jan iversen
Subject: Re: DocFormats - Open source OOXML implementation

On 16 August 2014 18:38, Dennis E. Hamilton  wrote:

> I don't have any skin in this game.
>
> Yet I am baffled about where this work is going on and what Apache Project
> it relates to.  Is there an incubator proposal for Apache DocFormats on its
> way?
>
Yes, there is a proposal on its way; look at general-incubator over roughly the
last 3 days. Right now it has not been decided who should sponsor this project.

[ ... ]

The intention is clearly to have at least a close cooperation with these
projects. But DocFormats aims at a bit more (e.g. being OpenOffice on
tablets).

I am right now working on a student proposal, to get a compliance sheet made
for products that offer OOXML and/or ODF. Maybe that would be something you
would want to help out with.


[ ... ]





Re: DocFormats - Open source OOXML implementation

2014-08-17 Thread Andrea Pescetti

On 16/08/2014 Peter Kelly wrote:

On 16 Aug 2014, at 12:55 pm, Andrea Pescetti wrote:

I've also been fixing (or breaking, who knows!) some documentation on
my clone (my "fork" as Github likes to call it) but I'll submit a pull
request only when basic things work.

I've just merged in your changes and also invited you as a committer


Thanks. Note (this is just for information, I have absolutely nothing 
against it!) that Apache projects using Github as primary source have a 
policy of not integrating code without a pull request. So one needs to 
"fork" (in the Github sense of course, so not a "fork" in its common 
meaning) the project and create a pull request. This is necessary 
because Apache prefers (and at times requires) that all patches being 
integrated are not only under the right license, but also voluntarily 
contributed. "Apache Way" class finished, sorry for being boring and 
let's move on...



Then you'll be able to push directly to it instead of
having to maintain your own fork.


Perfect. Of course, it was a fork in the Github meaning rather than the 
common meaning, so I never meant to maintain a separate version; I just 
wanted to produce pull requests.



I vote that we establish a policy of rebasing instead of merging in the
general case (unless there's a good reason to do otherwise), as this
will help maintain a mostly-linear history


No strong preferences for me. But I won't commit anything to the 
repository until I get my account properly configured, since it is from 
my work account and I can't afford to mix them (so, if a couple of commits 
with the wrong e-mail address have already sneaked in, this is already bad, 
but I'll now set up my accounts and environments properly before doing 
any other activity).


Anyway, I have nothing to commit at the moment in terms of code.

Regards,
  Andrea.




Re: News about the new OOXML framework.

2014-06-03 Thread Jürgen Schmidt
On 03/06/14 09:41, Andre Fischer wrote:
> I would like to give a short status update about the new OOXML framework.
> 
> - Created the new module main/ooxml
>   There are not yet any makefiles that build the contents of the ooxml/
> module nor link it into the build process of OpenOffice. However, you
> can use e.g. Eclipse to import the Java projects that are described below.
> 
> - Moved the old Office Open XML wiki pages out of the way and created two
> new ones:
>   = "OfficeOpenXML" contains an introduction into the OOXML file format,
> a status overview of the implementation progress and links to more
> detailed information.
>   = "OOXML" and "ooxml" (uppercase/lowercase) redirect to
> "OfficeOpenXML" so that there is no excuse to not find this page.
>   = "OOXML Framework" contains more detailed information about the new
> framework.

for convenience, the wiki pages can be found under

https://wiki.openoffice.org/wiki/OOXML

https://wiki.openoffice.org/wiki/OOXML_Framework

> 
> - Created a new Java project at ooxml/main/source/framework/SchemaParser
> that parses the XML schema files that come with the ECMA-376
> specification files.
>   Its purpose is to read the schema files and create a skeleton OOXML
> parser from it.  This skeleton can then be filled in with code for
> importing certain elements of OOXML documents.
> 
> - Created a new Java project at
> ooxml/main/source/framework/JavaOOXMLParser.  Its purpose is testing and
> debugging of and experimenting with the schema parser.  It is not
> intended to become a runtime component of OpenOffice.
> 
> 
> The SchemaParser is able to parse all files of the ECMA-376
> specification both in the old (1st edition of 2006) and new (4th edition
> of 2012) versions.  It looks like we need both since the new one is the
> current standard (equivalent to the ISO standard) while the old one is
> actually used.
> Not all details of the schema files are handled yet.
> 
> The JavaOOXMLParser, based on parser tables created by the SchemaParser,
> is already able to parse the large DOCX file of the 1st edition
> specification.  When pretty printed it is about 90 MB in size.  It takes
> the parser about 90 s to read it.  Note that the parser is not optimized
> in any way (if it were then it would be optimized for readability, not
> for speed) and that it writes about 650 MB of log files in the process.
> 
> If anyone would like to play with the parsers, I will gladly provide
> more details.

Thanks for the update, I think it's good to know that you are already
able to read both versions as described above.

I believe most users of OOXML don't care about the spec and don't even
know that the files produced today are by default OOXML Transitional.
OOXML is far more complex than reading one spec ... but nobody cares
about the complexity; they simply want or expect 1:1 support ;-)

Juergen




Re: News about the new OOXML framework.

2014-06-03 Thread Dave Fisher
Hi,

Are you aware of Apache POI?

Reads and writes most Ooxml in Java.

Regards,
Dave

Apache POI and OpenOffice PMCs

Sent from my iPhone




Re: News about the new OOXML framework.

2014-06-03 Thread Andrea Pescetti

Dave Fisher wrote:

Are you aware of Apache POI?
Reads and writes most Ooxml in Java.


Already discussed here: http://markmail.org/message/jhdsrqxfdczvoyy4

And thanks, Andre, for the nice progress and detailed information!

Regards,
  Andrea.




Re: News about the new OOXML framework.

2014-06-03 Thread Jürgen Schmidt
On 03/06/14 13:20, Dave Fisher wrote:
> Hi,
> 
> Are you aware of Apache POI?
> 
> Reads and writes most Ooxml in Java.

Yes, we know POI, and Andrea already pointed to the thread where this was
discussed.

If you look at the details, Java is used for the SchemaParser and later
on to generate a C++ parser. There are no plans to use Java at runtime in the
office to parse OOXML. But Java is perfect and productive for the
development of the tooling etc.

Juergen




Re: News about the new OOXML framework.

2014-06-03 Thread Dave Fisher
Great, makes sense. I've been way too busy at $job and am barely reading my 
email.

Regards,
Dave

Sent from my iPhone




Re: News about the new OOXML framework.

2014-06-10 Thread Andre Fischer

Another update of my progress.

I can now create a validating parser, i.e. one that checks that a 
document conforms to the specs while it parses its content.
At the moment the validation is restricted to complex types (as opposed 
to simple types and attributes) but I think that is the hardest part.


One NFA (non-deterministic finite automaton) is created for each complex 
type and one for the top level elements.  The NFAs are then converted 
into equivalent DFAs (deterministic FAs) and finally minimized (via the 
Hopcroft algorithm).  The minimization step became necessary when I 
added support for the 'all' schema element, which states that its 
children each occur once in arbitrary order. Recognizing this with an FA 
requires enumerating all permutations of the children.  With n children 
there are n! permutations.  Luckily the 'all' element is used only once 
and then only for 7 children (7! = 5040).
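
The factorial growth described above, and why minimization helps, can be 
illustrated with a small counting sketch. This is not the SchemaParser's 
code. It assumes that the unminimized automaton is a prefix tree spelling 
out every permutation, and that after merging equivalent states it is 
enough to remember which set of children has already been seen (a reject 
state is not counted). The child names are hypothetical.

```python
from itertools import permutations
from math import factorial


def permutation_trie_states(children):
    """Count states of a prefix tree that spells out every permutation."""
    prefixes = {()}                      # the start state (empty prefix)
    for perm in permutations(children):
        for k in range(1, len(perm) + 1):
            prefixes.add(perm[:k])       # one state per distinct prefix
    return len(prefixes)


def merged_states(children):
    """Count states once equivalent prefixes are merged.

    Two prefixes are treated as equivalent when they contain the same set
    of children, because the allowed continuations depend only on which
    children are still missing.
    """
    merged = {frozenset()}
    for perm in permutations(children):
        for k in range(1, len(perm) + 1):
            merged.add(frozenset(perm[:k]))
    return len(merged)


children = ["a", "b", "c", "d", "e", "f", "g"]   # 7 hypothetical child elements
print(factorial(len(children)))            # 5040 permutations
print(permutation_trie_states(children))   # 13700 states before merging
print(merged_states(children))             # 128 states after merging (2**7)
```

Under these assumptions the state count drops from factorial to 
exponential growth, which is still large but far more manageable for 7 
children.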


Here are some numbers:
The 1st and 4th edition of the ECMA-376 specification (1st edition is 
what is used by MS Office, 4th edition is equivalent to the ISO 
standard) have 40 schema files.

These contain 1917 complex types and 781 simple types.
Used are 1851 complex types and 727 simple types (have to check if there 
are really unused complex types or if my optimization is broken).


The non-validating parser has 1853 states and 6987 transitions.

The validating parser has 129530 states and 43512 transitions after 
creating the NFAs.

After conversion to DFAs there remain 20999 states and 73772 transitions.
After minimization there are 6097 states and 34286 transitions.

Please note that the time for parsing OOXML documents does not depend on 
the number of states or transitions.   It only depends on the length of 
the input.  The number of states and transitions only make the parser 
bigger.
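
The claim about parse time follows from how a table-driven DFA runs: each 
input token costs one table lookup, however many entries the table has. A 
minimal sketch with a hypothetical two-state table (the state and element 
names are invented for illustration and are not taken from the generated 
parser):

```python
def run_dfa(transitions, start, accepting, tokens):
    """Run a table-driven DFA; return (accepted, number_of_lookups)."""
    state = start
    lookups = 0
    for token in tokens:
        lookups += 1                               # exactly one lookup per token
        state = transitions.get((state, token))
        if state is None:                          # no transition: reject early
            return False, lookups
    return state in accepting, lookups


# Hypothetical content model: one "head", then any number of "item" elements.
TABLE = {
    ("start", "head"): "body",
    ("body", "item"): "body",
}

print(run_dfa(TABLE, "start", {"body"}, ["head", "item", "item"]))  # (True, 3)
print(run_dfa(TABLE, "start", {"body"}, ["item"]))                  # (False, 1)
```

Adding more entries to `TABLE` makes the table larger but does not change 
the number of lookups for a given input.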


Progress and commits are tracked in issue 125035.

Best regards,
Andre





Re: News about the new OOXML framework.

2014-06-10 Thread Kay Schenk
On Tue, Jun 10, 2014 at 2:58 AM, Andre Fischer  wrote:

> [ ... ]


Thanks for the update -- very impressive!




-- 
-
MzK

"In the midst of winter, I found there was, within me,
 an invincible summer."
  -- Albert Camus


Re: News about the new OOXML framework.

2014-06-10 Thread Marcus (OOo)

Am 06/10/2014 11:58 AM, schrieb Andre Fischer:

[ ... ]


that's exciting. I cannot really follow the technical stuff and what 
this all means, but it sounds like great progress in the OOXML area. 
And for me this is exciting.


Marcus





Re: News about the new OOXML framework.

2014-06-11 Thread Jürgen Schmidt
On 10/06/14 23:54, Marcus (OOo) wrote:
> Am 06/10/2014 11:58 AM, schrieb Andre Fischer:
>> [ ... ]
> 
> that's exciting. I cannot really follow the technical stuff and what
> this all means, but it sounds like great progress in the OOXML area.
> And for me this is exciting.

well, it simply means that we have already spent a lot of time analyzing
the stuff we have, and what and how we can improve it. And it shows that we
follow a very professional approach ;-)

We are serious about improved and better OOXML support, including
export. It is what users need and expect, because interoperability
with MS Office is still very important and OOXML is becoming more and more
popular and more broadly used, whether we like it or not.

This also covers a lot of groundwork, refactoring and feature
development over time to make it possible. The parsing framework is only
one part of it, but a very important one.

And we will propose and discuss all the things we have in mind in detail
here on the list so that everybody who is interested can join our efforts.

Juergen






Re: News about the new OOXML framework.

2014-06-19 Thread Oliver-Rainer Wittmann

Hi,

in the last few days I have been thinking about a further part of the OOXML 
framework - namely some tooling to handle the different XML streams 
(called 'parts' in the OOXML specification) and the corresponding meta 
data about their content types and their relations to each other (called 
'relationships' in the OOXML specification).
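
As a rough illustration of the meta data mentioned above, the following 
sketch reads the content type overrides and the package-level 
relationships from an OOXML package. It is not the Java prototype 
described in this message. The function names and the tiny demo package 
(including its `urn:example:main` relationship type) are hypothetical; 
the part names `[Content_Types].xml` and `_rels/.rels` and the two 
namespaces are those used by the packaging format.

```python
import io
import zipfile
import xml.etree.ElementTree as ET

CT_NS = "{http://schemas.openxmlformats.org/package/2006/content-types}"
REL_NS = "{http://schemas.openxmlformats.org/package/2006/relationships}"


def read_package_metadata(package):
    """Return (content type overrides, root relationships) of a package."""
    with zipfile.ZipFile(package) as zf:
        types = ET.fromstring(zf.read("[Content_Types].xml"))
        rels = ET.fromstring(zf.read("_rels/.rels"))
    overrides = {
        o.get("PartName"): o.get("ContentType")
        for o in types.iter(CT_NS + "Override")
    }
    relationships = {
        r.get("Id"): (r.get("Type"), r.get("Target"))
        for r in rels.iter(REL_NS + "Relationship")
    }
    return overrides, relationships


def build_demo_package():
    """Build a tiny in-memory package (hypothetical content, illustration only)."""
    content_types = (
        '<Types xmlns="http://schemas.openxmlformats.org/package/2006/content-types">'
        '<Override PartName="/word/document.xml" ContentType="application/xml"/>'
        "</Types>"
    )
    rels = (
        '<Relationships xmlns="http://schemas.openxmlformats.org/package/2006/relationships">'
        '<Relationship Id="rId1" Type="urn:example:main" Target="word/document.xml"/>'
        "</Relationships>"
    )
    buffer = io.BytesIO()
    with zipfile.ZipFile(buffer, "w") as zf:
        zf.writestr("[Content_Types].xml", content_types)
        zf.writestr("_rels/.rels", rels)
        zf.writestr("word/document.xml", "<document/>")
    buffer.seek(0)
    return buffer


overrides, relationships = read_package_metadata(build_demo_package())
print(overrides)
print(relationships)
```

The same function should accept a path to a real .docx or .xlsx file in 
place of the in-memory demo package.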


Based on Andre's framework, and in some pair-programming sessions with 
Andre, a prototype in Java has been worked out to verify my ideas. This 
prototype was also the first use case for creating corresponding imports 
of the meta data using Andre's prototyped OOXMLParser.


The ideas could be implemented quite well and worked. Further analysis 
and concept validation will follow.



Best regards, Oliver.




Ongoing OOXML development and "implementation defined" items.

2014-08-04 Thread jan i
Hi.

Based on the information from Dennis/Peter, I think it would be highly
interesting to know:

1) How are the "implementation defined" items implemented? Is the intention
to forward a list to dev@ and ask for opinions (of course in a style so that an
opinion is possible to give)?

2) How will the "implementation defined" items be documented? They are
likely to change over time, so it is probably a point where maintenance is
needed, and thus there is a higher demand on documentation.

3) Will the OOXML implementation allow round-trip of our documents without
loss of information?


Unlike the sidebar development, this development seems closed. I expect
there are good reasons for it, but items where many can have an opinion
(format conversion being a typical example) would be nice to discuss.

thanks in advance.
rgds
jan I


I have posted 3 wiki for OOXML export

2014-02-18 Thread shzh zhao
1. shape export
https://cwiki.apache.org/confluence/display/OOOUSERS/Shape+Export+in+OOXML+Export
2. DrawingML shape export
https://cwiki.apache.org/confluence/display/OOOUSERS/DrawingML+export+in+OOXML+export
3. UT method in OOXML export
https://cwiki.apache.org/confluence/display/OOOUSERS/UT+method+in+OOXML+Export

VML shape export will be added later.
-- 



mailto: aoo.zhaos...@gmail.com


a very special question about the OOXML fileformat

2014-04-09 Thread Jörg Schmidt
Hello,

I hope the following is not OT, although it is not specifically a question 
about OO:

I save the same file in MS Excel 2010 (a) and LO 4.1.x (b) in OOXML format 
(*.xlsx). Now I look at the file headers of (a) and (b) and find the first 10 
bytes:

(a): 50  4B  03  04  14  00  06  00  08  00
(b): 50  4B  03  04  14  00  08  08  08  00 

Because they are zip files, the first 4 bytes (50 4B 03 04) are equal, as are 
the following two bytes. However, bytes 7 and 8 differ. OK, in 
principle I can look up what these bytes encode:
https://users.cs.jmu.edu/buchhofp/forensics/formats/pkzip.html

My question is what values for these 2 bytes of the header are allowed at all 
so that the file still conforms to the ISO standard for OOXML.

In the ISO standard: 
http://www.iso.org/iso/iso_catalogue/catalogue_tc/catalogue_detail.htm?csnumber=51463

I could not find anything so far. 
There is only general talk of the zip format, and I'm not sure whether that 
means that the values after byte 4 can differ depending on the program 
that is used.
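
For reference, the two headers quoted above can be decoded with a short 
sketch. It assumes the usual ZIP local file header layout (4-byte 
signature, then version needed, general purpose bit flag and compression 
method as little-endian 16-bit values), in which bytes 7 and 8 are the 
general purpose bit flag. The function and key names are hypothetical, 
and the sketch says nothing about which values the ISO standard permits.

```python
import struct


def parse_local_header_prefix(data):
    """Decode the first 10 bytes of a ZIP local file header."""
    signature, version, flags, method = struct.unpack("<4sHHH", data[:10])
    if signature != b"PK\x03\x04":
        raise ValueError("not a ZIP local file header")
    return {
        "version_needed": version,                 # 20 means ZIP 2.0
        "flags": flags,
        "data_descriptor": bool(flags & 0x0008),   # bit 3: CRC and sizes follow the data
        "utf8_names": bool(flags & 0x0800),        # bit 11: file names are UTF-8
        "deflate_option": (flags >> 1) & 0b11,     # bits 1-2: deflate option hint
        "method": method,                          # 8 means deflate
    }


excel = bytes.fromhex("504B0304140006000800")   # header (a) from the message
libre = bytes.fromhex("504B0304140008080800")   # header (b) from the message

print(parse_local_header_prefix(excel))
print(parse_local_header_prefix(libre))
```

Read this way, (a) sets only the two deflate option bits (flag value 
0x0006), while (b) sets bit 3 and bit 11 (flag value 0x0808); both use 
version 2.0 and the deflate method.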


Greetings,
Jörg





RE: Ongoing OOXML development and "implementation defined" items.

2014-08-04 Thread Dennis E. Hamilton
Jan asks good questions below.

I have a comment with regard to (3) "Will the OOXML implementation allow 
round-trip of our documents without loss of information?"

It strikes me that there are OOXML import and export capabilities already in 
Apache OpenOffice and a better question may have to do with whether those 
capabilities will satisfy condition (3) and how can they be improved to do so.  

I am assuming that Jan is considering more than AOO round-tripping with itself, 
although it would be strange were that not already true.  The trick is 
round-tripping with other implementations of OOXML, perhaps, including the 
OOXML support of the AOO cousin, LibreOffice?

I note that the current support has not been mentioned in this discussion so 
far.  That may be related to the focus on implementations for 
mobile devices and other non-desktop solutions.  It's not clear, though.


 -- Dennis E. Hamilton
dennis.hamil...@acm.org+1-206-779-9430
https://keybase.io/orcmid  PGP F96E 89FF D456 628A
X.509 certs used and requested for signed e-mail






Re: Ongoing OOXML development and "implementation defined" items.

2014-08-04 Thread jan i
On 4 August 2014 18:23, Dennis E. Hamilton  wrote:

> Jan asks good questions below.
>
> I have a comment with regard to (3) "Will the OOXML implementation allow
> round-trip of our documents without loss of information?"
>
> It strikes me that there are OOXML import and export capabilities already
> in Apache OpenOffice and a better question may have to do with whether
> those capabilities will satisfy condition (3) and how can they be improved
> to do so.
>
> I am assuming that Jan is considering more than AOO round-tripping with
> itself, although it would be strange were that not already true.  The trick
> is round-tripping with other implementations of OOXML, perhaps, including
> the OOXML support of the AOO cousin, LibreOffice?
>
Sadly enough, round-trip within AOO is not at all a given, but you are right:
I was thinking about the AOO specialities in e.g. Microsoft Word (it could
also be that the new development simply ignores these fields).


> I note that the current support is not being mentioned in this discussion
> so far.  I think it may be that is related to the focus on implementations
> for mobile devices and other non-desktop solutions.  It's not clear though.
>
Or because the import function for OOXML does not work for a large number
of documents.

Thanks for clarifying my questions.
rgds
Jan I.

>
>
>  -- Dennis E. Hamilton
> dennis.hamil...@acm.org  +1-206-779-9430
> https://keybase.io/orcmid  PGP F96E 89FF D456 628A
> X.509 certs used and requested for signed e-mail
>
>
>
> -Original Message-
> From: jan i [mailto:j...@apache.org]
> Sent: Monday, August 4, 2014 08:42
> To: dev
> Subject: Ongoing OOXML development and "implementation defined" items.
>
> Hi.
>
> Based on the information from dennis/peter, I think it would be highly
> interesting to know:
>
> 1) How are the "implementation defined" items implemented? Is the intention
> to forward a list to dev@ and ask for opinions (of course in a style that
> makes it possible to give an opinion)?
>
> 2) How will the "implementation defined" items be documented? They are
> likely to change over time, so it's probably a point where maintenance is
> needed, and thus there is a higher demand on documentation.
>
> 3) Will the OOXML implementation allow round-trip of our documents without
> loss of information?
>
>
> Unlike the sidebar development, this development seems closed. I expect
> there are good reasons for it, but items where many can have an opinion
> (format conversion being a typical example) would be nice to discuss.
>
> thanks in advance.
> rgds
> jan I
>
>
>
>


Re: Ongoing OOXML development and "implementation defined" items.

2014-08-11 Thread jan i
Hi.

Thanks to a (once again) good tip from andre, I found out why I could not
see the OOXML sources and check the code.

Be careful when you upgrade the svn client to the newest version; I had to
roll back before "svn up" would work correctly.

rgds
jan I.



On 4 August 2014 21:49, jan i  wrote:

>
>
>
> On 4 August 2014 18:23, Dennis E. Hamilton 
> wrote:
>
>> Jan asks good questions below.
>>
>> I have a comment with regard to (3) "Will the OOXML implementation allow
>> round-trip of our documents without loss of information?"
>>
>> It strikes me that there are OOXML import and export capabilities already
>> in Apache OpenOffice and a better question may have to do with whether
>> those capabilities will satisfy condition (3) and how can they be improved
>> to do so.
>>
>> I am assuming that Jan is considering more than AOO round-tripping with
>> itself, although it would be strange were that not already true.  The trick
>> is round-tripping with other implementations of OOXML, perhaps, including
>> the OOXML support of the AOO cousin, LibreOffice?
>>
> Sadly enough, round-trip within AOO is not at all a given, but you are right:
> I was thinking about the AOO specialities in e.g. Microsoft Word (it could
> also be that the new development simply ignores these fields).
>
>
>> I note that the current support is not being mentioned in this discussion
>> so far.  I think it may be that is related to the focus on implementations
>> for mobile devices and other non-desktop solutions.  It's not clear though.
>>
> Or because the import function for OOXML does not work for a large number
> of documents.
>
> Thanks for clarifying my questions.
> rgds
> Jan I.
>
>>
>>
>>  -- Dennis E. Hamilton
>> dennis.hamil...@acm.org  +1-206-779-9430
>> https://keybase.io/orcmid  PGP F96E 89FF D456 628A
>> X.509 certs used and requested for signed e-mail
>>
>>
>>
>> -Original Message-
>> From: jan i [mailto:j...@apache.org]
>> Sent: Monday, August 4, 2014 08:42
>> To: dev
>> Subject: Ongoing OOXML development and "implementation defined" items.
>>
>> Hi.
>>
>> Based on the information from dennis/peter, I think it would be highly
>> interesting to know:
>>
>> 1) How are the "implementation defined" items implemented? Is the intention
>> to forward a list to dev@ and ask for opinions (of course in a style that
>> makes it possible to give an opinion)?
>>
>> 2) How will the "implementation defined" items be documented? They are
>> likely to change over time, so it's probably a point where maintenance is
>> needed, and thus there is a higher demand on documentation.
>>
>> 3) Will the OOXML implementation allow round-trip of our documents without
>> loss of information?
>>
>>
>> Unlike the sidebar development, this development seems closed. I expect
>> there are good reasons for it, but items where many can have an opinion
>> (format conversion being a typical example) would be nice to discuss.
>>
>> thanks in advance.
>> rgds
>> jan I
>>
>>
>>
>>
>

