Decimal-Fidelity in Calc Interchange (was RE: Difficulties with Flat XML under source control)

2012-06-28 Thread Dennis E. Hamilton
Thanks to the laws of serendipity, I just ran into the paper by Michel Hack on 
"On Intermediate Precision Requirements for Correctly-Rounding 
Decimal-to-Binary Floating-Point Conversion."  The PDF is one of the papers 
available at <http://www.informatik.uni-trier.de/Reports/TR-08-2004/>.  The 
references are all valuable, with citation of the work of David Matula and 
others.  I recall that Guy Steele looked at this issue in the work on Common 
Lisp, but I no longer have source information.  I suspect that Java and 
Fortress work would address this, especially because Java specifies IEEE binary 
as the internal form.
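For reference, "correctly rounding" decimal-to-binary conversion means returning the IEEE 754 double nearest to the exact value of the decimal literal, with ties going to even. The sketch below does this the slow, obviously correct way, using exact rational arithmetic from Python's `fractions` module. It is not Hack's method: his paper is about how little intermediate precision is needed to reach the same answer cheaply. It also assumes positive inputs in the normal (non-subnormal, finite) double range.

```python
import math
from fractions import Fraction


def correctly_rounded(decimal_text: str) -> float:
    """Return the double nearest to the exact value of decimal_text.

    Sketch only: positive inputs in the normal range of IEEE 754
    binary64; ties are broken to even.
    """
    x = Fraction(decimal_text)  # exact rational value of the literal
    if x <= 0:
        raise ValueError("this sketch handles positive values only")

    # Pick a binary exponent e so that 2**52 <= x / 2**e < 2**53, i.e. the
    # scaled value has exactly 53 integer bits (the double's significand
    # width). Start from a bit-length estimate, then adjust.
    e = x.numerator.bit_length() - x.denominator.bit_length() - 52
    while x / Fraction(2) ** e >= 2 ** 53:
        e += 1
    while x / Fraction(2) ** e < 2 ** 52:
        e -= 1

    scaled = x / Fraction(2) ** e
    m, rem = divmod(scaled.numerator, scaled.denominator)  # integer part, remainder
    frac = Fraction(rem, scaled.denominator)
    half = Fraction(1, 2)
    if frac > half or (frac == half and m % 2 == 1):
        m += 1  # round to nearest, ties to even (m may reach 2**53, still exact)

    return math.ldexp(m, e)
```

CPython's own `float()` is correctly rounded, so comparing against it (for example on the hard case `"1e23"`) is a cheap way to exercise a routine like this.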

There may be other papers in this family that are of interest in taking the 
bumps out of the use of decimal floating-point values in the persistent form of 
cell values in spreadsheet documents and in formulas having decimal-notation 
literal values, especially where the internal use is via conversion to and then 
from a decimal-incommensurate arithmetic.

I am changing the subject so this thread is recognized for where it has 
wandered. I've also stitched the part related to number drift and conversion 
fidelity back in.

 - Dennis

-Original Message-
From: Dennis E. Hamilton [mailto:dennis.hamil...@acm.org] 
Sent: Wednesday, June 20, 2012 10:57
To: 'Thorsten Behrens'
Cc: 'libreoffice-dev'
Subject: RE: Difficulties with Flat XML under source control

It occurs to me that PostScript and PDF have dealt with this for imaging models 
that work consistently.  Here, the "in" is to a renderer, but the model for 
representation of decimal expressions of fine-sensitivity values seems to have 
been handled (for years).  Those specifications may be some help too.

 - Dennis

-Original Message-
From: Thorsten [mailto:netsr...@googlemail.com] On Behalf Of Thorsten Behrens
Sent: Wednesday, June 20, 2012 06:32
To: Dennis E. Hamilton
Cc: 'libreoffice-dev'
Subject: Re: Difficulties with Flat XML under source control

Dennis E. Hamilton wrote:
> For out-in (which this is, presumably), you want to record a
> decimal expression of the internal value that will convert back to
> the exact internal value on re-input.  (The in-out case is that
> the input conversion provide whatever internal representation that
> will convert to the read value on re-output.  Without additional
> information, it is generally very difficult to have these be the
> same.)
> 
> It is also desirable, of course, that any other ODF consumer use
> the same technique so that its in-out conversion satisfies the
> out-in condition of the original source of the decimal expression
> of the value.  
>
Hi Dennis,

yes - but in a first approximation, one can probably relax this a
bit (for the use case at hand): this only needs to hold _after_ the
first save operation. Also, most people would probably be
content with this working for *one* ODF editing application.
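Thorsten's relaxation can be phrased as a fixpoint test: the text written on the first save may differ from what was read, but loading and saving that text again must reproduce it exactly. A minimal sketch, assuming values are held as doubles and (hypothetically) written with 15 significant digits. That writer gives up out-in exactness, but for values in the normal range it is a fixpoint from the first save onward, because any decimal of at most 15 significant digits survives decimal to double to decimal.

```python
def save15(x: float) -> str:
    """Hypothetical writer: 15 significant digits (not out-in exact)."""
    return f"{x:.15g}"


def load(text: str) -> float:
    return float(text)


def stable_after_first_save(text: str, load, save) -> bool:
    """True if a second load/save cycle reproduces the first saved text."""
    first = save(load(text))
    return save(load(first)) == first


original = "0.30000000000000004"  # written earlier by some other producer
print(save15(load(original)))  # the first save changes the text to 0.3 ...
print(stable_after_first_save(original, load, save15))  # ... but it no longer changes
```

Writing with `repr` instead (shortest round-trip digits) would satisfy the stricter out-in condition from the very first save.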

> It is also desirable, of course, that any other ODF consumer use
> the same technique so that its in-out conversion satisfies the
> out-in condition of the original source of the decimal expression
> of the value.  
> 
Note that there's a difference between spreadsheet values (for which
I think de facto the above holds true - likely everyone stores those
in IEEE doubles), and other content: consumers might employ rather
complex transformations to arrive at internal values, given e.g. a
gradient center coordinate - asking for common behaviour is very
close to asking for a common ODF application model.

Cheers,

-- Thorsten

-Original Message-
From: libreoffice-bounces+dennis.hamilton=acm@lists.freedesktop.org 
[mailto:libreoffice-bounces+dennis.hamilton=acm@lists.freedesktop.org] On 
Behalf Of Thorsten Behrens
Sent: Wednesday, June 20, 2012 05:49
To: Johannes Sixt
Cc: libreoffice-dev
Subject: Re: Difficulties with Flat XML under source control

Johannes Sixt wrote:
> >> - Measurements change. E.g. (just to pick one case), in
> >>  the draw:visible-area-width changes from
> >> 6.088cm to 6.089cm. Is there a remedy to avoid changes of this kind?
> > 
> > Ah; nasty, some rounding problem / internal representation issue -
> > possibly again looking at the code we could do better here to make it
> > more predictable; possibly using more precision we could do better
> > (doubles instead of floats) ?
> 
> Probably. Looking at this again, these changes seem to happen only for
> draw:visible-area-*. Hence, it may also be a matter of conversion
> between screen dimensions (pixels?) and cm/mm/in/etc.
> 
Hrm, yeah - and we *really* don't want this slow drift - any chance
you can file a bug with a preferably small sample doc?

Thanks,

-- Thorsten

___
LibreOffice mailing list
LibreOffice@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/libreoffice


Re: Difficulties with Flat XML under source control

2012-06-22 Thread Johannes Sixt
Am 20.06.2012 14:48, schrieb Thorsten Behrens:
> Johannes Sixt wrote:
>>>> - Measurements change. E.g. (just to pick one case), in
>>>>  the draw:visible-area-width changes from
>>>> 6.088cm to 6.089cm. Is there a remedy to avoid changes of this kind?
>>>
>>> Ah; nasty, some rounding problem / internal representation issue -
>>> possibly again looking at the code we could do better here to make it
>>> more predictable; possibly using more precision we could do better
>>> (doubles instead of floats) ?
>>
>> Probably. Looking at this again, these changes seem to happen only for
>> draw:visible-area-*. Hence, it may also be a matter of conversion
>> between screen dimensions (pixels?) and cm/mm/in/etc.
>>
> Hrm, yeah - and we *really* don't want this slow drift - any chance
> you can file a bug with a preferably small sample doc?

Here we go:

https://bugs.freedesktop.org/show_bug.cgi?id=51334

draw:visible-area-width and -height are properties that pertain only to
OLE objects, IIUC.

-- Hannes


Re: Difficulties with Flat XML under source control

2012-06-21 Thread Michael Stahl
On 17/06/12 22:10, Johannes Sixt wrote:
> - The  changes. That xml:id does not
> seem to be used anywhere. Can I just remove it? What will I lose?

these are sadly auto-generated, which is a bug in itself; they are used
in ODF itself for continuations, i.e. there can be another list that
continues an existing list by referring to its text:id/xml:id;  then
there is another use in ODF 1.2 where RDF metadata can refer to the
element by its xml:id, but that only works if the xml:id is actually
persistent, i.e. the same value that is imported is then exported again;
making the ids persistent requires extending the Writer core, which is a
bit of work...



Re: Difficulties with Flat XML under source control

2012-06-21 Thread Michael Stahl
On 21/06/12 14:07, Stephan Bergmann wrote:
> On 06/20/2012 03:07 PM, Dennis E. Hamilton wrote:
>> I think it is necessary to look at round-trip out-in conversion preservation.
>>
>> For out-in (which this is, presumably), you want to record a decimal 
>> expression of the internal value that will convert back to the exact 
>> internal value on re-input.  (The in-out case is that the input conversion 
>> provide whatever internal representation that will convert to the read value 
>> on re-output.  Without additional information, it is generally very 
>> difficult to have these be the same.)
>>
>> It is also desirable, of course, that any other ODF consumer use the same 
>> technique so that its in-out conversion satisfies the out-in condition of 
>> the original source of the decimal expression of the value.
>>
>> There are old technical papers on how to have this work.  The name David 
>> Matula comes to mind.
>>
>> There might be solutions in the conversions that exist in the basic Java 
>> classes for float data types.  I think this was addressed in Common Lisp 
>> also.
> 
> Hasn't there been progress in that field recently?  Wait, yes, 
>  "Printing floating-point 
> numbers quickly and accurately with integers" by Florian Loitsch.

i am in awe that it's possible to get a paper on this topic published in
this day and age; one would think this kind of problem would have been
solved 30 years ago, and the developers of popular office suites were
just ignorant of the solutions :)




Re: Difficulties with Flat XML under source control

2012-06-21 Thread Thorsten Behrens
Stephan Bergmann wrote:
> Hasn't there been progress in that field recently?  Wait, yes,
>  "Printing floating-point
> numbers quickly and accurately with integers" by Florian Loitsch.
> 
Nice catch - and some code is here: http://code.google.com/p/double-conversion/
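Loitsch's Grisu algorithms (the double-conversion code linked above) produce the shortest decimal string that still reads back as the same double, and do it quickly using integer arithmetic. The brute-force sketch below produces the same kind of shortest round-trip output simply by trying 1 to 17 significant digits. It is not Grisu; it only shows what "shortest round-trip" means.

```python
def shortest_round_trip(x: float) -> str:
    """Fewest significant digits (1..17) whose correctly rounded decimal
    form reads back as exactly x. Brute force, not Grisu."""
    for digits in range(1, 18):
        text = f"{x:.{digits}g}"
        if float(text) == x:
            return text
    raise AssertionError("unreachable for finite doubles: 17 digits always suffice")


for value in (6.088, 0.1, 0.1 + 0.2, 1 / 3):
    print(shortest_round_trip(value))
```

This works because if any d-digit decimal reads back as x, the correctly rounded d-digit decimal (what `g` formatting emits) does too. Python's own `repr(float)` already returns shortest round-tripping digits, so a producer written in Python gets this for free.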

Cheers,

-- Thorsten




Re: Difficulties with Flat XML under source control

2012-06-21 Thread Stephan Bergmann

On 06/20/2012 03:07 PM, Dennis E. Hamilton wrote:

> I think it is necessary to look at round-trip out-in conversion preservation.
>
> For out-in (which this is, presumably), you want to record a decimal expression 
> of the internal value that will convert back to the exact internal value on 
> re-input.  (The in-out case is that the input conversion provide whatever 
> internal representation that will convert to the read value on re-output.  
> Without additional information, it is generally very difficult to have these be 
> the same.)
>
> It is also desirable, of course, that any other ODF consumer use the same 
> technique so that its in-out conversion satisfies the out-in condition of the 
> original source of the decimal expression of the value.
>
> There are old technical papers on how to have this work.  The name David Matula 
> comes to mind.
>
> There might be solutions in the conversions that exist in the basic Java 
> classes for float data types.  I think this was addressed in Common Lisp also.


Hasn't there been progress in that field recently?  Wait, yes, 
 "Printing floating-point 
numbers quickly and accurately with integers" by Florian Loitsch.


Stephan


RE: Difficulties with Flat XML under source control

2012-06-21 Thread Dennis E. Hamilton
I think it is necessary to look at round-trip out-in conversion preservation.

For out-in (which this is, presumably), you want to record a decimal expression 
of the internal value that will convert back to the exact internal value on 
re-input.  (The in-out case is that the input conversion provide whatever 
internal representation that will convert to the read value on re-output.  
Without additional information, it is generally very difficult to have these be 
the same.)
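The out-in requirement is easy to state in code. The sketch below assumes the internal value is an IEEE 754 double and writes it with Python's `g` formatting at a given number of significant digits. The 17-digit bound is the classic result in this line of work (Matula's among it): 17 significant digits always suffice to get a binary64 value back exactly, while 15 may not.

```python
import math
import random
import struct


def out_in_ok(x: float, digits: int) -> bool:
    """True if writing x with `digits` significant digits and reading it
    back yields exactly the same double (the out-in condition)."""
    return float(f"{x:.{digits}g}") == x


def random_double(rng: random.Random) -> float:
    """Draw an arbitrary finite double from a random 64-bit pattern."""
    while True:
        bits = rng.getrandbits(64)
        x = struct.unpack("<d", struct.pack("<Q", bits))[0]
        if math.isfinite(x):  # skip NaN and infinities
            return x


rng = random.Random(2012)
samples = [random_double(rng) for _ in range(10_000)]

# 17 significant digits round-trip every finite double ...
assert all(out_in_ok(x, 17) for x in samples)
# ... but 15 do not: 0.1 + 0.2 is a double that differs from 0.3.
print(f"{0.1 + 0.2:.15g}", out_in_ok(0.1 + 0.2, 15))  # 0.3 False
```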

It is also desirable, of course, that any other ODF consumer use the same 
technique so that its in-out conversion satisfies the out-in condition of the 
original source of the decimal expression of the value.  

There are old technical papers on how to have this work.  The name David Matula 
comes to mind.

There might be solutions in the conversions that exist in the basic Java 
classes for float data types.  I think this was addressed in Common Lisp also.  



Re: Difficulties with Flat XML under source control

2012-06-20 Thread Miklos Vajna
On Tue, Jun 19, 2012 at 07:56:08PM +0200, Johannes Sixt  wrote:
> > The code to poke at is in:
> > 
> > xmloff/
> > and
> > sw/source/filter/xml/
> 
> Been there, done that. But it's way over my head (and time budget). See
> 
> http://thread.gmane.org/gmane.comp.documentfoundation.libreoffice.devel/23528/focus=23543

Still, once you have such a "clean" script it would be nice to see what
tricks it does, so we could (step by step) fix LO itself; in the long
term then you would not need such a filter. ;-)


Re: Difficulties with Flat XML under source control

2012-06-19 Thread Johannes Sixt
Michael,

thanks for your feedback!

Am 19.06.2012 10:48, schrieb Michael Meeks:
> On Sun, 2012-06-17 at 22:10 +0200, Johannes Sixt wrote:
>> I'm writing a small tool that transforms the XML into a canonical format
>> so that only substantial changes remain. The question is: Which
>> transformations are allowed?
> 
>   Oh - so ... why write an external tool to do this, and not just fix it
> in LibreOffice ! ? :-)

Because I'm using git, and then it's just a matter of a "simple" 'clean
filter'. :-)

>> -  changes. It's not a problem, I don't care about this.
> 
>   Some level of sorting here might help too.

Not only that. Most of the stuff is irrelevant (diverse counts, editing
duration, time of last edit). That should just be removed if the
document is placed under source control. Such stuff leads to merge
conflicts almost by definition.
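For a clean filter, removing such volatile children of `office:meta` is straightforward. A minimal sketch with Python's `xml.etree.ElementTree`; the element list (`meta:editing-duration`, `meta:editing-cycles`, `meta:document-statistic`, `dc:date`) is an assumption about what counts as volatile, not something taken from the thread.

```python
import xml.etree.ElementTree as ET

OFFICE = "urn:oasis:names:tc:opendocument:xmlns:office:1.0"
META = "urn:oasis:names:tc:opendocument:xmlns:meta:1.0"
DC = "http://purl.org/dc/elements/1.1/"

# Assumed list of metadata that changes on every save without a real edit.
VOLATILE = {
    f"{{{META}}}editing-duration",
    f"{{{META}}}editing-cycles",
    f"{{{META}}}document-statistic",
    f"{{{DC}}}date",
}


def strip_volatile_meta(root: ET.Element) -> int:
    """Remove volatile children of every office:meta element; return count."""
    removed = 0
    for meta in list(root.iter(f"{{{OFFICE}}}meta")):  # snapshot before mutating
        for child in list(meta):
            if child.tag in VOLATILE:
                meta.remove(child)
                removed += 1
    return removed
```

When writing the tree back out, call `ET.register_namespace(prefix, uri)` for each ODF namespace first; otherwise ElementTree invents prefixes such as `ns0`, which would itself produce spurious diffs.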

(And, BTW, to be able to keep different modifications of the manual in
different branches and *merge* them again is the whole point of this
exercise.)

>> -  changes. I don't know, yet, whether I mind or not.

I'll try removing this entire section and hope that LO does something
sensible.

>> - The  changes. That xml:id does not
>> seem to be used anywhere. Can I just remove it? What will I lose?
> 
>   No idea; if it's unused just try removing it and see what happens.

The ids are sometimes used in a text:continue-list attribute. Hence,
they can't be stripped out blindly.
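Following that observation, a clean filter could drop an `xml:id` on `text:list` only when no `text:continue-list` attribute refers to it. A minimal sketch, assuming `text:continue-list` is the only reference worth checking; as Michael Stahl points out elsewhere in the thread, ODF 1.2 RDF metadata can also refer to `xml:id` values, and this sketch does not look for those.

```python
import xml.etree.ElementTree as ET

TEXT = "urn:oasis:names:tc:opendocument:xmlns:text:1.0"
XML_ID = "{http://www.w3.org/XML/1998/namespace}id"
CONTINUE_LIST = f"{{{TEXT}}}continue-list"


def strip_unreferenced_list_ids(root: ET.Element) -> int:
    """Drop xml:id from text:list elements that nothing continues; return count.

    Assumption: text:continue-list is the only reference that matters
    (RDF metadata pointing at xml:id values is not checked).
    """
    referenced = {
        el.get(CONTINUE_LIST) for el in root.iter() if el.get(CONTINUE_LIST) is not None
    }
    removed = 0
    for lst in list(root.iter(f"{{{TEXT}}}list")):
        xml_id = lst.get(XML_ID)
        if xml_id is not None and xml_id not in referenced:
            del lst.attrib[XML_ID]
            removed += 1
    return removed
```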

>> - Measurements change. E.g. (just to pick one case), in
>>  the draw:visible-area-width changes from
>> 6.088cm to 6.089cm. Is there a remedy to avoid changes of this kind?
> 
>   Ah; nasty, some rounding problem / internal representation issue -
> possibly again looking at the code we could do better here to make it
> more predictable; possibly using more precision we could do better
> (doubles instead of floats) ?

Probably. Looking at this again, these changes seem to happen only for
draw:visible-area-*. Hence, it may also be a matter of conversion
between screen dimensions (pixels?) and cm/mm/in/etc.
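That guess can be illustrated with a toy model (an assumption, not LibreOffice's actual code path): a length stored as cm with three decimals is converted to whole screen pixels on load and back to cm on save, at a hypothetical 96 DPI. If both directions round to nearest, the text may change on the first save but then stays put. If both directions truncate instead, this model loses ground on every save, which is the kind of slow drift reported here.

```python
import math

CM_PER_INCH = 2.54
DPI = 96  # hypothetical screen resolution, not taken from LibreOffice


def load_round(cm_text: str) -> int:
    """Toy import: cm text -> nearest whole pixel."""
    return round(float(cm_text) / CM_PER_INCH * DPI)


def save_round(px: int) -> str:
    """Toy export: pixels -> cm, rounded to three decimals."""
    return f"{px * CM_PER_INCH / DPI:.3f}"


def load_floor(cm_text: str) -> int:
    """Like load_round, but truncating instead of rounding."""
    return math.floor(float(cm_text) / CM_PER_INCH * DPI)


def save_floor(px: int) -> str:
    """Like save_round, but truncating to three decimals."""
    return f"{math.floor(px * CM_PER_INCH / DPI * 1000) / 1000:.3f}"


def history(text: str, load, save, saves: int = 3) -> list[str]:
    """The attribute's text after each of `saves` open/save cycles."""
    seen = [text]
    for _ in range(saves):
        seen.append(save(load(seen[-1])))
    return seen


print(history("6.088", load_round, save_round))  # changes once, then stable
print(history("6.088", load_floor, save_floor))  # shrinks on every save
```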

>   So - the best place to fix this stuff is inside LibreOffice itself :-)
> then it is permanently fixed for everyone: you are not the only one
> with this pain - soon we'll be using flat odf for our templates and will
> suffer the same way :-) 
> 
>   The code to poke at is in:
> 
>   xmloff/
> and
>   sw/source/filter/xml/

Been there, done that. But it's way over my head (and time budget). See

http://thread.gmane.org/gmane.comp.documentfoundation.libreoffice.devel/23528/focus=23543

-- Hannes


Re: Difficulties with Flat XML under source control

2012-06-19 Thread Michael Meeks
Hi Johannes,

On Sun, 2012-06-17 at 22:10 +0200, Johannes Sixt wrote:
> I want to place a software manual under source control. It seems most
> feasible to use a flat XML format, in particular, .fodt.

Yes - that's a good plan :-)

> But I have some difficulties because when LO 3.5.4 opens a .fodt and
> saves it again without making any changes, the resulting file changes
> nevertheless.

Right - this is a regular annoyance ! :-)

> I'm writing a small tool that transforms the XML into a canonical format
> so that only substantial changes remain. The question is: Which
> transformations are allowed?

Oh - so ... why write an external tool to do this, and not just fix it
in LibreOffice ! ? :-)

We'd be -very- interested in some patches that we can apply that will
sort the automatic styles, and generate them with consistent naming in a
sensible order :-)
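As a rough illustration of the sorting half of that request (renaming is harder, since every `text:style-name` reference would have to be rewritten as well), here is a sketch that reorders the children of `office:automatic-styles` by family and then by name in natural order, so `P2` sorts before `P10`. It assumes the order of automatic styles carries no meaning for the consumer.

```python
import re
import xml.etree.ElementTree as ET

OFFICE = "urn:oasis:names:tc:opendocument:xmlns:office:1.0"
STYLE = "urn:oasis:names:tc:opendocument:xmlns:style:1.0"


def natural_key(name: str) -> list:
    """Split 'P10' into ['P', 10, ''] so digit runs compare numerically."""
    return [int(part) if part.isdigit() else part for part in re.split(r"(\d+)", name)]


def sort_automatic_styles(root: ET.Element) -> None:
    """Reorder every office:automatic-styles container in place."""
    for container in list(root.iter(f"{{{OFFICE}}}automatic-styles")):
        children = sorted(
            container,
            key=lambda el: (
                el.get(f"{{{STYLE}}}family", ""),
                natural_key(el.get(f"{{{STYLE}}}name", "")),
            ),
        )
        container[:] = children
```

Because `re.split` with a capturing group always alternates text and digit runs starting with text, the keys compare element by element without mixing strings and integers.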

> (This seems to work so far.)

The style rendering sounds sensible.

> But there are other changes:
> 
> -  changes. It's not a problem, I don't care about this.

Some level of sorting here might help too.

> -  changes. I don't know, yet, whether I mind or not.
> 
> - The  attribute changes. Can I just
> replace the z-index with 1 or 2? What will happen?

Odd :-) perhaps when we have smaller changes we can chase these
oddnesses down better.

> - The  changes. That xml:id does not
> seem to be used anywhere. Can I just remove it? What will I lose?

No idea; if it's unused just try removing it and see what happens.

> - Measurements change. E.g. (just to pick one case), in
>  the draw:visible-area-width changes from
> 6.088cm to 6.089cm. Is there a remedy to avoid changes of this kind?

Ah; nasty, some rounding problem / internal representation issue -
possibly again looking at the code we could do better here to make it
more predictable; possibly using more precision we could do better
(doubles instead of floats) ?

> Any insights are welcome!

So - the best place to fix this stuff is inside LibreOffice itself :-)
then it is permanently fixed for everyone: you are not the only one
with this pain - soon we'll be using flat odf for our templates and will
suffer the same way :-) 

The code to poke at is in:

xmloff/
and
sw/source/filter/xml/

It's not too hard to build LibreOffice; check out:

http://www.libreoffice.org/developers-2/

Patches are very much more than welcome ! :-)

Thanks !

Michael.

-- 
michael.me...@suse.com  <><, Pseudo Engineer, itinerant idiot
