Re: [Python-Dev] Formatting mini-language suggestion

2009-03-12 Thread Jeroen Ruigrok van der Werven
-On [20090312 06:50], Lie Ryan (lie.1...@gmail.com) wrote:
>How about having a country code field, e.g. en-us would format according 
>to US locale, in to India, ch to China, etc... that way the format 
>string would become very simple (although the lib maintainer would need 
>to know customs from all over the world). Then have a special country 
>code that is a placeholder for whatever the locale the machine is set to.

Then you are effectively duplicating what is already available via CLDR [1]
and Babel [2].

[1] http://www.unicode.org/cldr/
[2] http://babel.edgewall.org/

-- 
Jeroen Ruigrok van der Werven  / asmodai
イェルーン ラウフロック ヴァン デル ウェルヴェン
http://www.in-nomine.org/ | http://www.rangaku.org/ | GPG: 2EAC625B
Any road leads to the end of the world...
___
Python-Dev mailing list
Python-Dev@python.org
http://mail.python.org/mailman/listinfo/python-dev
Unsubscribe: 
http://mail.python.org/mailman/options/python-dev/archive%40mail-archive.com


Re: [Python-Dev] Ext4 data loss

2009-03-12 Thread Gisle Aas

On Mar 11, 2009, at 22:43 , Cameron Simpson wrote:


On 11Mar2009 10:09, Joachim König wrote:

Guido van Rossum wrote:
On Tue, Mar 10, 2009 at 1:11 PM, Christian Heimes wrote:

[...]
https://bugs.edge.launchpad.net/ubuntu/+source/linux/+bug/317781/comments/54 
.

[...]
If I understand the post properly, it's up to the app to call fsync(),
and it's only necessary when you're doing one of the rename dances, or
updating a file in place. Basically, as he explains, fsync() is a very
heavyweight operation; I'm against calling it by default anywhere.


To me, the flaw seems to be in the close() call (of the operating
system). I'd expect the data to be in a persistent state once the
close() returns. So there would be no need to fsync if the file gets
closed anyway.


Not really. On the whole, flush() means "the object has handed all data
to the OS".  close() means "the object has handed all data to the OS
and released the control data structures" (OS file descriptor release;
like the OS, the python interpreter may release python stuff later too).

By contrast, fsync() means "the OS has handed filesystem changes to the
disc itself". Really really slow, by comparison with memory. It is Very
Expensive, and a very different operation to close().


...and at least on OS X there is one level more where you actually tell
the disc to flush its buffers to permanent storage with:

   fcntl(fd, F_FULLFSYNC)

The fsync manpage says:

 Note that while fsync() will flush all data from the host to the drive
 (i.e. the "permanent storage device"), the drive itself may not
 physically write the data to the platters for quite some time and it
 may be written in an out-of-order sequence.

 Specifically, if the drive loses power or the OS crashes, the
 application may find that only some or none of their data was written.
 The disk drive may also re-order the data so that later writes may be
 present, while earlier writes are not.

 This is not a theoretical edge case.  This scenario is easily
 reproduced with real world workloads and drive power failures.

 For applications that require tighter guarantees about the integrity
 of their data, Mac OS X provides the F_FULLFSYNC fcntl.  The
 F_FULLFSYNC fcntl asks the drive to flush all buffered data to
 permanent storage.  Applications, such as databases, that require a
 strict ordering of writes should use F_FULLFSYNC to ensure that their
 data is written in the order they expect.  Please see fcntl(2) for
 more detail.

It's not obvious what level of syncing is appropriate to automatically
happen from Python so I think it's better to let the application deal
with it.

--Gisle
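
The levels described in this message can be sketched in Python. This is only an illustration of the layering, not a suggestion for what Python should do by default. The helper name `sync_to_disk` is made up for the example, and `fcntl.F_FULLFSYNC` is assumed to be defined only on platforms such as Mac OS X, so the sketch checks for it before use.

```python
import os
import tempfile

try:
    import fcntl  # not available on every platform (e.g. Windows)
except ImportError:
    fcntl = None


def sync_to_disk(f, full=False):
    """Push f's data towards permanent storage; return the strongest step taken."""
    f.flush()                  # Python file object -> OS
    os.fsync(f.fileno())       # OS -> drive
    if full and fcntl is not None and hasattr(fcntl, "F_FULLFSYNC"):
        # Ask the drive itself to flush its buffers (Mac OS X only).
        fcntl.fcntl(f.fileno(), fcntl.F_FULLFSYNC)
        return "fullfsync"
    return "fsync"


with tempfile.TemporaryFile() as f:
    f.write(b"important data")
    level = sync_to_disk(f, full=True)
```

On a platform without F_FULLFSYNC the helper quietly stops at fsync(), which is one reason the application, not the library, is in the best position to decide how far to go.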



Re: [Python-Dev] Ext4 data loss

2009-03-12 Thread Antoine Pitrou
Nick Coghlan writes:
> 
> On the performance side... the overhead from fsync() itself is going to
> dwarf the CPU overhead of going through a wrapper class.

The significant overhead is not in calling sync() or flush() or close(), but in
calling methods which are supposed to be fast (read() from internal buffer or
write() to internal buffer, for example).




Re: [Python-Dev] [Python-ideas] Rough draft: Proposed format specifier for a thousands separator (discussion moved from python-dev)

2009-03-12 Thread Raymond Hettinger

Here's an update incorporating all the comments received so far.

* Put into PEP format
* Fixed typos
* The suggestion for modifying the locale module was dropped.
* The "n" specifier in the locale module was referenced
* Fixed minimumwidth --> width
* PERIOD --> DOT
* Added suggestions by Lie Ryan and Eric Smith

---


PEP: XXX
Title: Format Specifier for Thousands Separator
Version: $Revision$
Last-Modified: $Date$
Author: Raymond Hettinger 
Status: Draft
Type: Standards Track
Content-Type: text/x-rst
Created: 12-Mar-2009
Post-History: 12-Mar-2009


Motivation
==========

Provide a simple, non-locale aware way to format a number
with a thousands separator.

Adding thousands separators is one of the simplest ways to
improve the professional appearance and readability of output
exposed to end users.

In the finance world, output with commas is the norm.  Finance
users and non-professional programmers find the locale
approach to be frustrating, arcane and non-obvious.

It is not the goal to replace locale or to accommodate every
possible convention.  The goal is to make a common task easier
for many users.


Current Version of the Mini-Language
====================================

* `Python 2.6 docs`_

 .. _Python 2.6 docs: http://docs.python.org/library/string.html#formatstrings

* PEP 3101 Advanced String Formatting


Research so far
===============

Scanning the web, I've found that thousands separators are
usually one of COMMA, DOT, SPACE, or UNDERSCORE.
When a COMMA is the decimal separator, the thousands separator
is typically a DOT or SPACE (see examples from Denis Spir).

James Knight observed that Indian/Pakistani numbering systems
group by hundreds.   Ben Finney noted that Chinese group by
ten-thousands.  Eric Smith pointed out that these are already
handled by the "n" specifier in the locale module (albeit only
for integers).
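
To make the Indian convention concrete (the last three digits form one group, and the remaining digits are grouped in pairs), here is a small sketch. The function name `group_indian` is hypothetical and is not part of any proposal in this draft; it handles integers only.

```python
def group_indian(n):
    """Group an integer Indian-style: last three digits, then pairs."""
    digits = str(abs(n))
    head, tail = digits[:-3], digits[-3:]
    groups = []
    while head:
        groups.insert(0, head[-2:])   # peel off two digits at a time
        head = head[:-2]
    sign = "-" if n < 0 else ""
    return sign + ",".join(groups + [tail])


one_billion = group_indian(1000000000)   # '1,00,00,00,000'
```

A fixed "every three digits" separator cannot express this, which is why such conventions are left to the locale machinery.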

Visual Basic and its brethren (like MS Excel) use a completely
different style and have ultra-flexible custom format
specifiers like: "_($* #,##0_)".


Proposal I (from Nick Coghlan)
==============================

A comma will be added to the format() specifier mini-language:

[[fill]align][sign][#][0][width][,][.precision][type]

The ',' option indicates that commas should be included in the
output as a thousands separator. As with locales which do not
use a period as the decimal point, locales which use a
different convention for digit separation will need to use the
locale module to obtain appropriate formatting.

The proposal works well with floats, ints, and decimals.
It also allows easy substitution for other separators.
For example::

 format(n, "6,f").replace(",", "_")

This technique is completely general but it is awkward in the
one case where the commas and periods need to be swapped::

 format(n, "6,f").replace(",", "X").replace(".", ",").replace("X", ".")
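
For readers who want to try this: the ',' option described here is accepted by format() in Python 2.7 and in 3.1 onward, so the substitution technique can be run directly. The variable names below are only for illustration.

```python
n = 1234567.891

plain = format(n, ",.2f")                # '1,234,567.89'
underscored = plain.replace(",", "_")    # '1_234_567.89'

# Swapping commas and periods needs a temporary placeholder.
swapped = plain.replace(",", "X").replace(".", ",").replace("X", ".")
# '1.234.567,89'
```

The three-step replace is the awkward case mentioned above: a direct swap would clobber one separator with the other.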


Proposal II (to meet Antoine Pitrou's request)
==============================================

Make both the thousands separator and decimal separator user
specifiable but not locale aware.  For simplicity, limit the
choices to a comma, period, space, or underscore.

[[fill]align][sign][#][0][width][T[tsep]][dsep precision][type]

Examples::

 format(1234, "8.1f")     -->  '  1234.0'
 format(1234, "8,1f")     -->  '  1234,0'
 format(1234, "8T.,1f")   -->  ' 1.234,0'
 format(1234, "8T .f")    -->  ' 1 234.0'
 format(1234, "8d")       -->  '    1234'
 format(1234, "8T,d")     -->  '   1,234'

This proposal meets most needs (except for people wanting
grouping for hundreds or ten-thousands), but it comes at the
expense of being a little more complicated to learn and
remember.  Also, it makes it more challenging to write custom
__format__ methods that follow the format specification
mini-language.
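
The "T" syntax above is only a proposal, so it cannot be run as written. The effect it aims for can be approximated with a helper built on the ',' option (which assumes Python 2.7 / 3.1 or later; the code below uses Python 3's str.maketrans). The name `format_number` and its parameters are invented for this sketch.

```python
def format_number(value, width=0, tsep=",", dsep=".", precision=None):
    """Right-justify value using the given thousands and decimal separators.

    precision=None formats as an integer; tsep="" disables grouping.
    """
    if precision is None:
        body = format(value, ",d")
    else:
        body = format(value, ",.%df" % precision)
    # Translate both separators in one pass so they cannot clobber each other.
    body = body.translate(str.maketrans({",": tsep, ".": dsep}))
    return body.rjust(width)


german_style = format_number(1234, 8, tsep=".", dsep=",", precision=1)  # ' 1.234,0'
spaced = format_number(1234, 8, tsep=" ", dsep=",", precision=1)        # ' 1 234,0'
integer = format_number(1234, 8)                                        # '   1,234'
```

Translating in a single pass avoids the placeholder trick needed with chained replace() calls.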

No change is proposed for the locale module.


Other Ideas
===========

* Lie Ryan suggested a convenience function of the form::

   create_format(self, type='i', base=16, seppos=4, sep=':', \
 charset='0123456789abcdef', maxwidth=32,\
 minwidth=32, pad='0')

* Eric Smith would like the C version of the mini-language
 parser to be exposed.  That would make it easier to write
 custom __format__ methods.


Copyright
=========

This document has been placed in the public domain.



..
  Local Variables:
  mode: indented-text
  indent-tabs-mode: nil
  sentence-end-double-space: t
  fill-column: 70
  coding: utf-8
  End: 




[Python-Dev] Building Py3K branch docs with Sphinx

2009-03-12 Thread Tim Golden
Can I ask which flavour of Sphinx is being used to build the py3k docs?
I've taken the naive approach of simply pulling the sources from
branches/py3k and then calling make checkout to fetch the appropriate
sources, but these are from http://svn.python.org/projects and are
the same for 2.x and 3.x (and don't work under 3.x).

The latest sphinx from its mercurial tip repo has the same issues
so I wondered whether whoever built the released docs used some other
svn source or simply patched. The readme points out that the code won't
work under Python 3.x but someone's managed to build the docs for
the already-released versions.


(using the make.bat under Windows, but AFAICT the unix-style Makefile 
would have the same issues).


TJG


Re: [Python-Dev] Ext4 data loss

2009-03-12 Thread Steven D'Aprano
On Thu, 12 Mar 2009 01:03:13 pm Antoine Pitrou wrote:
> Nick Coghlan writes:
> > The tempfile module would be another example.
>
> Do you really need your temporary files to survive system crashes? ;)

It depends on what you mean by "temporary".

Applications like OpenOffice can sometimes recover from an application 
crash or even a systems crash and give you the opportunity to restore 
the temporary files that were left lying around. Firefox does the same 
thing -- after a crash, it offers you the opportunity to open the 
websites you had open before. Konqueror does much the same, except it 
can only recover from application crashes, not system crashes. I can't 
tell you how many times such features have saved my hide!




-- 
Steven D'Aprano


Re: [Python-Dev] Ext4 data loss

2009-03-12 Thread Antoine Pitrou
Steven D'Aprano writes:
> 
> It depends on what you mean by "temporary".
> 
> Applications like OpenOffice can sometimes recover from an application 
> crash or even a systems crash and give you the opportunity to restore 
> the temporary files that were left lying around.

For such files, you want deterministic naming in order to find them again, so
you won't use the tempfile module...





Re: [Python-Dev] Building Py3K branch docs with Sphinx

2009-03-12 Thread Tim Golden

Tim Golden wrote:
Can I ask which flavour of Sphinx is being used to build the py3k docs? 
I've taken the naive approach of simply pulling the sources from

branches/py3k and then calling make checkout to fetch the appropriate
sources, but these are from http://svn.python.org/projects and are
the same for 2.x and 3.x (and don't work under 3.x).



... or I could just use an existing Python 2.x installation to build
the 3.x docs. Obviously. (slaps forehead)

TJG


Re: [Python-Dev] Ext4 data loss

2009-03-12 Thread Toshio Kuratomi
Antoine Pitrou wrote:
> Steven D'Aprano writes:
>> It depends on what you mean by "temporary".
>>
>> Applications like OpenOffice can sometimes recover from an application 
>> crash or even a systems crash and give you the opportunity to restore 
>> the temporary files that were left lying around.
> 
> For such files, you want deterministic naming in order to find them again, so
> you won't use the tempfile module...
> 
Something that doesn't require deterministically named tempfiles was Ted
Ts'o's explanation linked to earlier.

read data from important file
modify data
create tempfile
write data to tempfile
*sync tempfile to disk*
mv tempfile to filename of important file

The sync is necessary to ensure that the data is written to the disk
before the new file replaces the old one under the important file's name.
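
The sequence above can be sketched in Python roughly as follows. This is a minimal illustration rather than a vetted implementation: the function name `replace_file_contents` is made up, and it assumes Python 3.3 or later for os.replace(). The temporary file is created in the same directory as the target so that the rename stays on one filesystem.

```python
import os
import tempfile


def replace_file_contents(path, data):
    """Write data to a temp file, sync it, then rename it over path."""
    directory = os.path.dirname(os.path.abspath(path))
    fd, tmp_path = tempfile.mkstemp(dir=directory,
                                    prefix=os.path.basename(path) + ".")
    try:
        with os.fdopen(fd, "wb") as tmp:
            tmp.write(data)
            tmp.flush()              # Python buffer -> OS
            os.fsync(tmp.fileno())   # OS -> disk, before the rename
        os.replace(tmp_path, path)   # the "mv" step
    except BaseException:
        # Clean up the temp file if anything went wrong before the rename.
        if os.path.exists(tmp_path):
            os.unlink(tmp_path)
        raise
```

Without the fsync() call, the rename could reach the disk before the new data does, which is the failure mode discussed in this thread.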

-Toshio





Re: [Python-Dev] Building Py3K branch docs with Sphinx

2009-03-12 Thread Tim Golden

andrew cooke wrote:

Tim Golden wrote:

Tim Golden wrote:

Can I ask which flavour of Sphinx is being used to build the py3k docs?
I've taken the naive approach of simply pulling the sources from
branches/py3k and then calling make checkout to fetch the appropriate
sources, but these are from http://svn.python.org/projects and are
the same for 2.x and 3.x (and don't work under 3.x).


... or I could just use an existing Python 2.x installation to build
the 3.x docs. Obviously. (slaps forehead)


I asked about this on the Sphinx list a while back.  I didn't get any
response at the time, but checking now I see that a week later someone
(the author I assume) commented -
http://groups.google.com/group/sphinx-dev/browse_thread/thread/9a0286f5deeb2912/778a02c397295add

So it seems that there is no public solution until release 0.6, and that
you will not be able to run doctests when running with a "different" Python
version (my code should work with 3.0 and 2.6, so tests might work; for
some reason I can no longer remember I disabled that).



Thanks for the update; the thing's a bit complicated because Sphinx
is based on docutils and docutils makes heavy use of except ABC, def
and of unicode strings. I tried hand-changing it briefly but it all
got a bit cumbersome. Maybe 2to3 will work ok. For now, tho', I've
switched to using 2.x to generate and all is well.

TJG


Re: [Python-Dev] [Python-ideas] Rough draft: Proposed format specifier for a thousands separator (discussion moved from python-dev)

2009-03-12 Thread Raymond Hettinger

Fixed typo in the example with spaces and commas.
Discussion draft at: http://www.python.org/dev/peps/pep-0378/


[Python-Dev] py: urls, new bazaar plugin available

2009-03-12 Thread Barry Warsaw


Hello Bazaar users!

There's a new Bazaar plugin you can use to more easily access read-only
or read-write branches on code.python.org.  This plugin provides the
'py:' url prefix.  For example, to get the trunk branch with the plugin
installed, you can now do:


bzr branch py:trunk

or to get the 2.6 branch you can do:

bzr branch py:2.6

You can also use this to get user branches, e.g. my email rewrite branch:


bzr branch py:~barry/30email

If you have write access to branches on code.python.org, you can either
set the environment variable $PYDEV_USER or the Bazaar configuration
option pydev_user (the value doesn't matter) to get bzr+ssh access
instead of the standard http access.  py: works both for branching and
pushing.


Thanks to Karl Fogel for the implementation.  You'll need Karl's pydev
plugin branch, and instructions on installing this are available here:


http://tinyurl.com/aq55oc

I've updated the wiki page with additional details:

http://wiki.python.org/moin/Bazaar

Enjoy!
Barry



Re: [Python-Dev] Building Py3K branch docs with Sphinx

2009-03-12 Thread andrew cooke
Tim Golden wrote:
> Tim Golden wrote:
>> Can I ask which flavour of Sphinx is being used to build the py3k docs?
>> I've taken the naive approach of simply pulling the sources from
>> branches/py3k and then calling make checkout to fetch the appropriate
>> sources, but these are from http://svn.python.org/projects and are
>> the same for 2.x and 3.x (and don't work under 3.x).
>
>
> ... or I could just use an existing Python 2.x installation to build
> the 3.x docs. Obviously. (slaps forehead)

I asked about this on the Sphinx list a while back.  I didn't get any
response at the time, but checking now I see that a week later someone
(the author I assume) commented -
http://groups.google.com/group/sphinx-dev/browse_thread/thread/9a0286f5deeb2912/778a02c397295add

So it seems that there is no public solution until release 0.6, and that
you will not be able to run doctests when running with a "different" Python
version (my code should work with 3.0 and 2.6, so tests might work; for
some reason I can no longer remember I disabled that).

Anyway, I generate docs for 3.x code using 2.x and it does work (without
doctests).

Andrew



Re: [Python-Dev] [Python-ideas] Rough draft: Proposed format specifier for a thousands separator (discussion moved from python-dev)

2009-03-12 Thread Eric Smith

Raymond Hettinger wrote:

Eric Smith pointed out that these are already
handled by the "n" specifier in the locale module (albeit only
for integers).


It's supported by float, but it's just not very useful. For Decimal it's
unsupported. Maybe this isn't a distinction worth pointing out.


Proposal I (from Nick Coghlan)
==============================

...

[[fill]align][sign][#][0][width][,][.precision][type]



Proposal II (to meet Antoine Pitrou's request)
==============================================

...

[[fill]align][sign][#][0][width][T[tsep]][dsep precision][type]


I was going to suggest that since the locale name for this is
"grouping", we use "G". But since we're not doing a general-purpose
grouping implementation, I think "T" better says "we're doing thousands,
not general grouping". Perhaps this should go in a rationale section if
we opt for "T". Now that I think about it, "G" is already a valid type,
so it wouldn't work, anyway.



 format(1234, "8T,d")  -->  '   1,234'


For proposal 2, this case is unfortunate. Because for integers, there is
no decimal allowed in the mini-language (it's currently illegal to use
"8.1d"), you'd only ever add the thousands, but you'd always need the
"T". It would be nice to come up with a specification that would degrade
for integers such that "8,d" would give '   1,234'. Proposal 1 is much
nicer in that regard, although I definitely like the fact that the
actual characters used for DOT and COMMA can be specified with proposal 2.

Maybe you'd never really use "T,", since the comma is redundant, and 
you'd say:

 format(1234, "8Td")  -->  '   1,234'
in normal use. But "d" is also the default, so it just becomes:
 format(1234, "8T")   -->  '   1,234'

I like approach 2 in general. I'll give some thought to other, similar 
schemes which would allow "8," or "8,d" to work. I think people will 
write "8," and expect "   1,234", not an error.


Eric.


Re: [Python-Dev] Ext4 data loss

2009-03-12 Thread Martin v. Löwis
> Something that doesn't require deterministically named tempfiles was Ted
> Ts'o's explanation linked to earlier.
> 
> read data from important file
> modify data
> create tempfile
> write data to tempfile
> *sync tempfile to disk*
> mv tempfile to filename of important file
> 
> The sync is necessary to ensure that the data is written to the disk
> before the new file replaces the old one under the important file's name.

You still wouldn't use the tempfile module in that case. Instead, you
would create a regular file, with the name based on the name of the
important file.

Regards,
Martin


Re: [Python-Dev] Building Py3K branch docs with Sphinx

2009-03-12 Thread Martin v. Löwis
> Can I ask which flavour of Sphinx is being used to build the py3k docs?

The proper procedure to build the documentation is

make update
make htmlhelp #say

Regards,
Martin


[Python-Dev] wait time [was: Ext4 data loss]

2009-03-12 Thread Jim Jewett
It is starting to look as though flush (and close?) should take an
optional wait parameter, to indicate how much re-assurance you're
willing to wait for.

It also looks like we can't know enough to predict all sensible
symbolic constants -- so instead use a floating point numeric value.

f.flush(wait=0)  ==> current behavior
f.flush(wait=1)  ==> Do everything you can.  On a Mac, this would
apparently mean (everything up to and including) fcntl(fd, F_FULLFSYNC)

f.flush(wait=0.5) ==> somewhere in between, depending on the operating
system and file system and disk drive and other stuff the developer
won't know in advance.

The exact interpretation of intermediate values might depend on the
installation or even change over time; the only invariant would be
that higher values are at least as safe, and lower values are at least
as fast.

-jJ


Re: [Python-Dev] wait time [was: Ext4 data loss]

2009-03-12 Thread Martin v. Löwis
> It is starting to look as though flush (and close?) should take an
> optional wait parameter, to indicate how much re-assurance you're
> willing to wait for.

Unfortunately, such a thing would be unimplementable on most of today's
operating systems.

Regards,
Martin


Re: [Python-Dev] Formatting mini-language suggestion

2009-03-12 Thread Greg Ewing

Nick Coghlan wrote:


  [[fill]align][sign][#][0][minimumwidth][,sep][.precision][type]

'sep' is the new field that defines the thousands separator.


Wouldn't it be better to use a locale setting for this,
instead of having to specify it in every format string?
If an app is using a particular thousands separator in
one place, it will probably want to use it everywhere.

--
Greg


Re: [Python-Dev] wait time [was: Ext4 data loss]

2009-03-12 Thread R. David Murray

On Thu, 12 Mar 2009 at 20:56, "Martin v. Löwis" wrote:

It is starting to look as though flush (and close?) should take an
optional wait parameter, to indicate how much re-assurance you're
willing to wait for.


Unfortunately, such a thing would be unimplementable on most of today's
operating systems.


I read Jim's suggestion as a way to indicate the strength of the desire
of the application programmer for certainty, not as a time value.
In other words, 0.0 would map to 'just flush it', 0.5 might map to
'fsync', and 1.0 map to OS-X's 'tell the disk to flush its buffers' call.

Assuming I'm right, I don't like the proposal.  It feels too squishy:
the semantics are not well defined.

By the way, I would not like to see python programmers encouraged to make
the same mistake that sqlite3 made.  The decision about how aggressive
to be on flushing data to disk should be in the hands of the _user_, not
the application.  Of course, the application needs some way to enable
the user to make that decision, which is what I presume we are talking
about supporting here.

--
R. David Murray   http://www.bitdance.com


Re: [Python-Dev] Formatting mini-language suggestion

2009-03-12 Thread Greg Ewing

James Y Knight wrote:

You might be interested to know that in India, the commas don't come
every 3 digits. In India, they come every two digits, after the first
three. Thus one billion = 1,00,00,00,000. How are you gonna represent
*that* in a formatting mini-language? :)


We outsource it. Send the number by email to a service centre
in India, where an employee formats it for us and sends it
back.

--
Greg


Re: [Python-Dev] wait time [was: Ext4 data loss]

2009-03-12 Thread Antoine Pitrou
R. David Murray writes:
> 
> By the way, I would not like to see python programmers encouraged to make
> the same mistake that sqlite3 made.  The decision about how aggressive
> to be on flushing data to disk should be in the hands of the _user_, not
> the application.

I disagree. The user usually does not know which kind of flushing is needed in
order for his data to be safe. Actually, he probably doesn't even know what
flushing means, and that files are ever "closed".

However, I also think that any parameter to flush() or close() is a bad idea,
since it can't be used when flushing and closing is implicit. For example when
the file is used in a "with" statement.
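
One way to read this objection: if the durability choice is made when the file is opened rather than at flush() or close() time, it still applies when closing is implicit. The sketch below illustrates that idea only; `open_synced` and its `sync` parameter are invented for the example and are not an existing or proposed API.

```python
import contextlib
import os
import tempfile


@contextlib.contextmanager
def open_synced(path, mode="w", sync=True):
    """Open path; on a clean exit from the with block, fsync before closing."""
    f = open(path, mode)
    try:
        yield f
        if sync:
            f.flush()              # Python buffer -> OS
            os.fsync(f.fileno())   # OS -> disk
    finally:
        f.close()


demo_path = os.path.join(tempfile.mkdtemp(), "settings.txt")
with open_synced(demo_path) as out:
    out.write("saved\n")
```

The caller never mentions flushing, yet the policy chosen at open time is honoured when the with block ends.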




Re: [Python-Dev] wait time [was: Ext4 data loss]

2009-03-12 Thread R. David Murray

On Thu, 12 Mar 2009 at 20:25, Antoine Pitrou wrote:

R. David Murray writes:


By the way, I would not like to see python programmers encouraged to make
the same mistake that sqlite3 made.  The decision about how aggressive
to be on flushing data to disk should be in the hands of the _user_, not
the application.


I disagree. The user usually does not know which kind of flushing is needed in
order for his data to be safe. Actually, he probably doesn't even know what
flushing means, and that files are ever "closed".


Let me try some examples.

Suppose I'm running my applications on a laptop and I don't want the
disk to be spinning continually while I work.  I'm willing to take the
risk of data loss in order to extend my battery life.

And then there's the high performance server situation, where all the
hardware is at least double redundancy, and we want the fastest disk
performance possible, with data reliability being taken care of by
the redundancy in the system.  (Is this actually possible with today's
hardware and software?  I don't know, but it _should_ be.)

In between there is the medium to low performance, non-redundant server,
where we are willing to trade performance for data integrity.

In all three of these situations I might be running the exact same
application software.

So, the user needs to be in control.  Of course, for users who don't
understand the tradeoffs, there should be a sane default.

Oh, and the user doesn't need to understand flushing, they just
need to be in control of the performance versus data-integrity-
in-the-face-of-crashes tradeoff.

--
R. David Murray   http://www.bitdance.com


Re: [Python-Dev] wait time [was: Ext4 data loss]

2009-03-12 Thread Jim Jewett
On 3/12/09, "Martin v. Löwis"  wrote:
>> It is starting to look as though flush (and close?) should take an
>> optional wait parameter, to indicate how much re-assurance you're
>> willing to wait for.

> Unfortunately, such a thing would be unimplementable on most of today's
> operating systems.

What am I missing?

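import os

_file = file
class file(_file):
    # ... (rest of the class elided)
    def flush(self, wait=0):
        super().flush()
        if wait < 0.25:
            return
        # fdatasync is not available on every platform
        if wait < 0.5 and hasattr(os, "fdatasync"):
            os.fdatasync(self.fileno())
            return
        os.fsync(self.fileno())
        if wait < 0.75:
            return
        # os.ffullsync is hypothetical (a wrapper for F_FULLFSYNC)
        if hasattr(os, "ffullsync"):
            os.ffullsync(self.fileno())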

(To be honest, I'm not even seeing why it couldn't be done in
Objects/fileobject.c, though I realize extension modules would need to
go through the python interface to take advantage of it.)
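
For anyone who wants to experiment, here is a standalone variant of the same idea that runs on Python 3 as a plain function instead of a subclass. It is a sketch under assumptions: the name `graded_flush` and the returned labels are invented, the thresholds are the arbitrary ones from the snippet above, and `fcntl.F_FULLFSYNC` is used only where the platform defines it.

```python
import os
import tempfile

try:
    import fcntl  # not available on every platform
except ImportError:
    fcntl = None


def graded_flush(f, wait=0.0):
    """Flush f with increasing thoroughness; return the strongest step taken."""
    f.flush()
    if wait < 0.25:
        return "flush"
    fd = f.fileno()
    if wait < 0.5 and hasattr(os, "fdatasync"):
        os.fdatasync(fd)
        return "fdatasync"
    os.fsync(fd)
    if wait < 0.75:
        return "fsync"
    if fcntl is not None and hasattr(fcntl, "F_FULLFSYNC"):
        fcntl.fcntl(fd, fcntl.F_FULLFSYNC)
        return "fullfsync"
    return "fsync"   # strongest level this platform offers


with tempfile.TemporaryFile() as f:
    f.write(b"data")
    steps = [graded_flush(f, w) for w in (0.0, 0.3, 0.6, 1.0)]
```

Returning the step actually taken makes it visible that the same `wait` value can mean different things on different systems.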

-jJ


Re: [Python-Dev] wait time [was: Ext4 data loss]

2009-03-12 Thread Martin v. Löwis
> Let me try some examples.
> 
> Suppose I'm running my applications on a laptop and I don't want the
> disk to be spinning continually while I work.  I'm willing to take the
> risk of data loss in order to extend my battery life.

So when you select "Save" in your application, would you like the data
to be saved, or would you accept that they get lost? If the latter,
what kind of interaction would you perform with your application to
indicate that you *do* want the data to appear on disk?

Regards,
Martin


Re: [Python-Dev] wait time [was: Ext4 data loss]

2009-03-12 Thread R. David Murray

On Thu, 12 Mar 2009 at 17:01, Jim Jewett wrote:

On 3/12/09, "Martin v. Löwis"  wrote:

It is starting to look as though flush (and close?) should take an
optional wait parameter, to indicate how much re-assurance you're
willing to wait for.



Unfortunately, such a thing would be unimplementable on most of today's
operating systems.


What am I missing?


A less confusing name for your proposed parameter :)

Maybe 'reliability'?


_file=file
class file(_file): ...
   def flush(self, wait=0):
   super().flush(self)
   if wait < 0.25:
   return
   if wait < 0.5 and os.fdatasync:
   os.fdatasync(self.fileno())
   return
   os.fsync(self.fileno())
   if wait < 0.75:
   return
   if os.ffullsync:
   os.ffullsync(self.fileno())

(To be honest, I'm not even seeing why it couldn't be done in
Objects/fileobject.c, though I realize extension modules would need to
go through the python interface to take advantage of it.)___
Python-Dev mailing list
Python-Dev@python.org
http://mail.python.org/mailman/listinfo/python-dev
Unsubscribe: 
http://mail.python.org/mailman/options/python-dev/archive%40mail-archive.com


Re: [Python-Dev] wait time [was: Ext4 data loss]

2009-03-12 Thread Martin v. Löwis
Jim Jewett wrote:
> On 3/12/09, "Martin v. Löwis"  wrote:
>>> It is starting to look as though flush (and close?) should take an
>>> optional wait parameter, to indicate how much re-assurance you're
>>> willing to wait for.
> 
>> Unfortunately, such a thing would be unimplementable on most of today's
>> operating systems.
> 
> What am I missing?

As somebody else remarked: I mistook your proposal for a "wait"
parameter to denote a time that you want to wait for the data to appear
on disk, specified, e.g., in seconds.

It didn't occur to me that it might be a unit-less unscaled value, which
I find an ugly API.

Regards,
Martin


Re: [Python-Dev] wait time [was: Ext4 data loss]

2009-03-12 Thread Daniel Stutzbach
On Thu, Mar 12, 2009 at 4:09 PM, "Martin v. Löwis" wrote:

> So when you select "Save" in your application, would you like the data
> to be saved, or would you accept that they get lost? If the latter,
> what kind of interaction would you perform with your application to
> indicate that you *do* want the data to appear on disk?
>

I accept that if the computer crashes at just the wrong moment as I click
Save, my changes will not actually be Saved.  No amount of diligence in the
implementation of close() can prevent that since the computer can crash
before the program calls close().

I oppose applications that lose or corrupt both my new save and my
*previous* save if the computer crashes at the wrong moment.  That would
cause me to lose not only my most recent changes (an inconvenience), but
also all the work I have ever done on the file (a major headache for anyone
who doesn't make regular backups).

However, defaulting to calling fsync() when closing a file will:
1) Cripple performance for the many applications that don't need it (e.g.,
temporary files)
2) Fail to prevent data loss for applications that use the
truncate-and-rewrite paradigm for saving

Consider the following example:

with open('mysavefile', 'w') as f:
    f.write(data)
    f.flush()
    os.fsync(f.fileno())
    f.close()

If the system crashes after the call to open(), but before the call to
fsync(), then both the old and the new mysavefile may be gone.

Since needing to safely replace a file with new data is a moderately common
task, perhaps it would be useful to have a convenience class that looks like
a file, but takes care of the ugly details behind-the-scenes?  Something
vaguely like this flawed and untested class:

class open_for_safe_replacement(file): # needs a better name
    def __init__(self, path, flags):
        if 'w' not in flags:
            raise RuntimeError('Writing without writing?')
        self.path = path
        # placeholder for any helper that returns an unused name on the
        # same filesystem as path
        self.tmp_name = some_function_that_generates_a_safe_temporary_filename() # good luck
        file.__init__(self, self.tmp_name, flags)

    def close(self):
        self.flush()
        os.fsync(self.fileno())
        file.close(self) # base-class close; self.close() would recurse forever
        os.rename(self.tmp_name, self.path) # won't work on Windows :-(

then we could simply:

with appropriate_module.open_for_safe_replacement('mysavefile', 'w') as f:
    f.write(data)
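
A minimal sketch of the same idea as a context manager, assuming Python 3.3
or later (for os.replace and the "x" open mode). The temporary-name scheme
below is an assumption made for illustration, not something the standard
library prescribes:

```python
import contextlib
import os


@contextlib.contextmanager
def open_for_safe_replacement(path):
    """Write to a sibling temporary file, then rename it over `path`."""
    # Assumed naming scheme: derive the temporary name from the target so
    # both files live in the same directory, hence the same filesystem.
    tmp_name = "%s.tmp.%d" % (path, os.getpid())
    # "x" refuses to open a name that already exists instead of clobbering it.
    f = open(tmp_name, "x")
    try:
        with f:
            yield f
            f.flush()              # Python's buffers -> OS
            os.fsync(f.fileno())   # OS buffers -> storage device
        # Atomic on POSIX; on Windows it overwrites an existing target.
        os.replace(tmp_name, path)
    except BaseException:
        # Leave the old file untouched and discard the partial new one.
        with contextlib.suppress(FileNotFoundError):
            os.unlink(tmp_name)
        raise
```

If the body of the with statement raises, the old mysavefile is left as it
was and the partial temporary file is removed.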

--
Daniel Stutzbach, Ph.D.
President, Stutzbach Enterprises, LLC 


Re: [Python-Dev] Ext4 data loss

2009-03-12 Thread Toshio Kuratomi
Martin v. Löwis wrote:
>> Something that doesn't require deterministically named tempfiles was Ted
>> Ts'o's explanation linked to earlier.
>>
>> read data from important file
>> modify data
>> create tempfile
>> write data to tempfile
>> *sync tempfile to disk*
>> mv tempfile to filename of important file
>>
>> The sync is necessary to ensure that the data is written to the disk
>> before the new file is renamed over the old one.
> 
> You still wouldn't use the tempfile module in that case. Instead, you
> would create a regular file, with the name based on the name of the
> important file.
> 
Uhm... why?  The requirements are:

1) lifetime of the temporary file is in control of the app
2) filename is available to the app so it can move it after data is written
3) temporary file can be created on the same filesystem as the important
file.

All of those are doable using the tempfile module.
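
As a rough illustration of how the three requirements map onto the tempfile
module, assuming Python 3.3+ and a hypothetical helper name
(replace_with_tempfile):

```python
import os
import tempfile


def replace_with_tempfile(path, data):
    """Replace the contents of `path` with the string `data`."""
    directory = os.path.dirname(os.path.abspath(path))
    # delete=False  -> requirement 1: the app controls the file's lifetime.
    # dir=directory -> requirement 3: same filesystem as the important file.
    tmp = tempfile.NamedTemporaryFile("w", dir=directory, delete=False)
    try:
        with tmp:
            tmp.write(data)
            tmp.flush()
            os.fsync(tmp.fileno())   # the "sync tempfile to disk" step
        # tmp.name -> requirement 2: the filename is available for the move.
        os.replace(tmp.name, path)
    except BaseException:
        if os.path.exists(tmp.name):
            os.unlink(tmp.name)
        raise
```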

-Toshio





Re: [Python-Dev] Python-Dev] wait time [was: Ext4 data loss

2009-03-12 Thread Cameron Simpson
On 12Mar2009 22:09, Martin v. Löwis wrote:
| > Let me try some examples.
| > Suppose I'm running my applications on a laptop and I don't want the
| > disk to be spinning continually while I work.  I'm willing to take the
| > risk of data loss in order to extend my battery life.
| 
| So when you select "Save" in your application, would you like the data
| to be saved, or would you accept that they get lost?

Often, I will accept that they get lost. Why? Because that will only
happen with an OS/hardware failure, and I expect those to be close to
never.

| If the latter,
| what kind of interaction would you perform with your application to
| indicate that you *do* want the data to appear on disk?

I don't. I type "sync" to a convenient shell prompt. On a UNIX OS, that
will not return until all outstanding data at the time of issuing the
command have been committed to disc.

As you can see, that places the timing in the hands of the user.  Where it
belongs, not impacting the performance of the system except at my own
command.

I speak as one who keeps his bogofilter spam database on a RAM disc
because bogofilter, too, is subject to atrocious sync overuse, since it
uses a database library that overuses sync. Testing shows at least one
and possibly more _orders_of_magnitude_ improvement in behaviour. Every
so often I copy the bogofilter db back to real disc.

The whole point of a good OS on decent hardware is that one can commit
data to the _OS_, and trust that it will reach the disc in due course.
Fsync shows an app that doesn't trust the OS.

I hope you don't believe that handing the data to the disc drive
guarantees it has made it to the magnetic medium. It should do, but
the drive will probably acknowledge the data before the medium has
completed updating.

Cheers,
-- 
Cameron Simpson  DoD#743
http://www.cskk.ezoshosting.com/cs/

Isaac Asimov once remarked that friends had chided him for not patenting the
electronic pocket calculator, since he wrote of similar devices back in the
1940's.  His reply, "Have you ever noticed I only described what it looked
like on the *outside*?" - i...@mediaone.net


Re: [Python-Dev] wait time [was: Ext4 data loss]

2009-03-12 Thread Tino Wildenhain

Jim Jewett wrote:
> On 3/12/09, "Martin v. Löwis" wrote:
>>> It is starting to look as though flush (and close?) should take an
>>> optional wait parameter, to indicate how much re-assurance you're
>>> willing to wait for.
>>
>> Unfortunately, such a thing would be unimplementable on most of today's
>> operating systems.
>
> What am I missing?
>
> _file=file
> class file(_file): ...
>     def flush(self, wait=0):
>         super().flush(self)
>         if wait < 0.25:
>             return
>         if wait < 0.5 and os.fdatasync:
>             os.fdatasync(self.fileno())
>             return
>         os.fsync(self.fileno())
>         if wait < 0.75:
>             return
>         if os.ffullsync:
>             os.ffullsync(self.fileno())



What would be wrong with just making the f*sync calls
methods of the file object and that's about it?

alternatively when flush() should get an optional argument,
I'd call it sync and use a set of predefined and meaningful
constants (and no floating point value).
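
A sketch of what named constants could look like, written as a free function
rather than a method on file. The names Sync and flush are made up for
illustration, and enum assumes Python 3.4+:

```python
import enum
import os


class Sync(enum.IntEnum):
    NONE = 0   # only hand the data to the OS
    DATA = 1   # os.fdatasync(): file contents, not necessarily all metadata
    FULL = 2   # os.fsync(): contents and metadata


def flush(f, sync=Sync.NONE):
    """Flush f, then optionally ask the OS to commit it to storage."""
    f.flush()
    if sync == Sync.NONE:
        return
    fdatasync = getattr(os, "fdatasync", None)  # missing on some platforms
    if sync == Sync.DATA and fdatasync is not None:
        fdatasync(f.fileno())
    else:
        # Fall back to the stronger call when fdatasync is unavailable.
        os.fsync(f.fileno())
```

A call then reads flush(f, Sync.DATA), which says what is requested instead
of leaving the reader to guess what 0.4 means.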

Just my 2ct.

Regards
Tino





Re: [Python-Dev] Ext4 data loss

2009-03-12 Thread Martin v. Löwis
>>> The sync is necessary to ensure that the data is written to the disk
>>> before the new file is renamed over the old one.
>> You still wouldn't use the tempfile module in that case. Instead, you
>> would create a regular file, with the name based on the name of the
>> important file.
>>
> Uhm... why?

Because it's much easier not to use the tempfile module, than to use it,
and because the main purpose of the tempfile module is irrelevant to
the specific application; the main purpose being the ability to
auto-delete the file when it gets closed.

Regards,
Martin


Re: [Python-Dev] Python-Dev] wait time [was: Ext4 data loss

2009-03-12 Thread Martin v. Löwis
Cameron Simpson wrote:
> On 12Mar2009 22:09, Martin v. Löwis wrote:
> | > Let me try some examples.
> | > Suppose I'm running my applications on a laptop and I don't want the
> | > disk to be spinning continually while I work.  I'm willing to take the
> | > risk of data loss in order to extend my battery life.
> | 
> | So when you select "Save" in your application, would you like the data
> | to be saved, or would you accept that they get lost?
> 
> Often, I will accept that they get lost. Why? Because that will only
> happen with an OS/hardware failure, and I expect those to be close to
> never.

I think you are an atypical user, then. People can accept that data is
lost if the machine crashes at the moment of saving. They certainly get
puzzled if the data is lost if the machine crashes 30 seconds after they
have saved, and not even a backup copy is available anymore.

Regards,
Martin


Re: [Python-Dev] Ext4 data loss

2009-03-12 Thread Toshio Kuratomi
Martin v. Löwis wrote:
>>>> The sync is necessary to ensure that the data is written to the disk
>>>> before the new file is renamed over the old one.
>>> You still wouldn't use the tempfile module in that case. Instead, you
>>> would create a regular file, with the name based on the name of the
>>> important file.
>>>
>> Uhm... why?
> 
> Because it's much easier not to use the tempfile module, than to use it,
> and because the main purpose of the tempfile module is irrelevant to
> the specific application; the main purpose being the ability to
> auto-delete the file when it gets closed.
> 
auto-delete is one of the nice features of tempfile.  Another feature
which is entirely appropriate to this usage, though, is creation
of a non-conflicting filename.

-Toshio





Re: [Python-Dev] Building Py3K branch docs with Sphinx

2009-03-12 Thread Tim Golden

Martin v. Löwis wrote:
>> Can I ask which flavour of Sphinx is being used to build the py3k docs?
>
> The proper procedure to build the documentation is
>
> make update
> make htmlhelp #say

I think you misunderstood my question. I can build the docs
for 2.x, say -- have done so, in fact many times.
I was simply trying to use Python 3.x itself
to build the docs for Python 3.x, not realising at first
that Sphinx (and docutils, Jinja etc.) won't actually
run under 3.x.

Of course, as I later realised, I can just build them with
an existing 2.x install. I think I was sort of hoping to
have it produce its own dogfood, so to speak :)

TJG


Re: [Python-Dev] Ext4 data loss

2009-03-12 Thread Martin v. Löwis
> auto-delete is one of the nice features of tempfile.  Another feature
> which is entirely appropriate to this usage, though, is creation
> of a non-conflicting filename.

Ok. In that use case, however, it is completely irrelevant whether the
tempfile module calls fsync. After it has generated the non-conflicting
filename, it's done.

Regards,
Martin



Re: [Python-Dev] wait time [was: Ext4 data loss]

2009-03-12 Thread Leif Walsh
On Thu, 2009-03-12 at 20:25 +, Antoine Pitrou wrote:
> I disagree. The user usually does not know which kind of flushing is needed in
> order for his data to be safe. Actually, he probably doesn't even know what
> flushing means, and that files are ever "closed".
> 
> However, I also think that any parameter to flush() or close() is a bad idea,
> since it can't be used when flushing and closing is implicit. For example when
> the file is used in a "with" statement.

Perhaps this is an argument that the "synciness" of a file should be
defined when it is opened?  This doesn't give very much control to the
programmer, but it certainly seems easy to use correctly.
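
One way this could look, as a sketch only: the name open_with_sync and its
sync parameter are hypothetical. The decision is made once at open time, and
the implicit close in a "with" statement then does the syncing:

```python
import contextlib
import os


@contextlib.contextmanager
def open_with_sync(path, mode="w", sync=False):
    """Like open(), but if sync is true, fsync before the implicit close."""
    with open(path, mode) as f:
        yield f
        if sync:
            f.flush()              # Python's buffers -> OS
            os.fsync(f.fileno())   # OS buffers -> storage device
```

Used as "with open_with_sync('mysavefile', 'w', sync=True) as f:", the body
never has to mention flushing at all.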

-- 
Cheers,
Leif




Re: [Python-Dev] Building Py3K branch docs with Sphinx

2009-03-12 Thread Martin v. Löwis
Tim Golden wrote:
> Martin v. Löwis wrote:
>>> Can I ask which flavour of Sphinx is being used to build the py3k docs?
>>
>> The proper procedure to build the documentation is
>>
>> make update
>> make htmlhelp #say
> 
> 
> I think you misunderstood my question. I can build the docs
> for 2.x, say -- have done so, in fact many times. I was simply trying to
> use Python 3.x itself
> to build the docs for Python 3.x, not realising at first
> that Sphinx (and docutils, Jinja etc.) won't actually
> run under 3.x.
> Of course, as I later realised, I can just build them with
> an existing 2.x install. I think I was sort of hoping to
> have it produce its own dogfood, so to speak :)

I still think my answer would have helped. It says that
a) you don't need to pull any other magic sphinx version from somewhere;
   the one that the Makefile fetches works just fine, and
b) it uses "python" from the PATH; *not* the ../python which you just
   built.
c) build.bat cannot be trusted to work (as you have found)

Regards,
Martin


Re: [Python-Dev] Ext4 data loss

2009-03-12 Thread Toshio Kuratomi
Martin v. Löwis wrote:
>> auto-delete is one of the nice features of tempfile.  Another feature
>> which is entirely appropriate to this usage, though, is creation
>> of a non-conflicting filename.
> 
> Ok. In that use case, however, it is completely irrelevant whether the
> tempfile module calls fsync. After it has generated the non-conflicting
> filename, it's done.
>
If you're saying that it shouldn't call fsync automatically I'll agree
to that.  The message thread I was replying to seemed to say that
tempfiles didn't need to support fsync because they will be useless
after a system crash.  I'm just refuting that by showing that it is
useful to call fsync on tempfiles as one of the steps in preserving the
data in another file.

-Toshio





Re: [Python-Dev] wait time [was: Ext4 data loss]

2009-03-12 Thread A.M. Kuchling
On Thu, Mar 12, 2009 at 08:25:59PM +, Antoine Pitrou wrote:
> However, I also think that any parameter to flush() or close() is a bad idea,
> since it can't be used when flushing and closing is implicit. For example when
> the file is used in a "with" statement.

I think the existing os.fsync() and O_SYNC functionality is fine for
new applications and packages that need to choose whether to write data
securely.  We should just check whether any stdlib APIs make it impossible
to write data securely, e.g. dumbdbm's internal file object, and if
so, whether that is worth fixing.
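
For reference, a small sketch of using both existing mechanisms together.
The helper name write_durably is made up; os.O_SYNC is only defined on some
platforms, so it is looked up defensively:

```python
import os


def write_durably(path, data):
    """Write the bytes `data` to `path`, asking the OS to commit them."""
    flags = os.O_WRONLY | os.O_CREAT | os.O_TRUNC
    flags |= getattr(os, "O_SYNC", 0)    # synchronous writes, where defined
    flags |= getattr(os, "O_BINARY", 0)  # Windows only: no newline translation
    fd = os.open(path, flags, 0o644)
    with os.fdopen(fd, "wb") as f:       # the file object now owns fd
        f.write(data)
        f.flush()
        os.fsync(f.fileno())             # explicit sync where O_SYNC is absent
```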

--amk


[Python-Dev] [ANN] EuroPython 2009 – Extra Early Bird registration ends this Saturday!

2009-03-12 Thread Martin P. Hellwig

Newsflash!
A large number of Pythoneers have signed up already; for this reason
alone it is worth booking!

If you already know you are joining the conference, why not save some
money in these financially uncertain times and take advantage of the
extra early bird rate!

The extra early bird rate is just 95 GBP for the conference (70 GBP
for the tutorials) and ends this Saturday 14th of March.

You can book your conference and hotel all at once. Register at
http://www.europython.eu/registration/ .

The talks submitted so far promise to be very interesting and practical.
We have room for more though; go to
http://www.europython.eu/talks/cfp/ for this year's themes and
submission criteria. The deadline is 5th April 2009.

Sponsors
A unique opportunity to affiliate with the prestigious EuroPython 
conference!

http://www.europython.eu/sponsors/

Spread the Word
Improve our publicity by distributing this announcement in your corner
of the community. Coordinating this with the organisers is highly
appreciated: http://www.europython.eu/contact/

General Information
For more information about the conference, please visit
http://www.europython.eu/ .

Looking forward to seeing you!


Re: [Python-Dev] wait time [was: Ext4 data loss]

2009-03-12 Thread R. David Murray

On Thu, 12 Mar 2009 at 22:09, "Martin v. Löwis" wrote:
>> Let me try some examples.
>>
>> Suppose I'm running my applications on a laptop and I don't want the
>> disk to be spinning continually while I work.  I'm willing to take the
>> risk of data loss in order to extend my battery life.
>
> So when you select "Save" in your application, would you like the data
> to be saved, or would you accept that they get lost? If the latter,
> what kind of interaction would you perform with your application to
> indicate that you *do* want the data to appear on disk?


I accept that if I have told my laptop to only sync to disk every five
minutes (as I have at times done), and it crashes (eg: the battery runs
out), then anything I did during those last five minutes will be lost.
If the disk then spins up more often than I told it to, I get very
annoyed.

--
R. David Murray   http://www.bitdance.com


Re: [Python-Dev] wait time [was: Ext4 data loss]

2009-03-12 Thread Steven D'Aprano
On Fri, 13 Mar 2009 08:01:27 am Jim Jewett wrote:
> On 3/12/09, "Martin v. Löwis"  wrote:
> >> It is starting to look as though flush (and close?) should take an
> >> optional wait parameter, to indicate how much re-assurance you're
> >> willing to wait for.
> >
> > Unfortunately, such a thing would be unimplementable on most of
> > today's operating systems.
>
> What am I missing?
>
> _file=file
> class file(_file): ...
>     def flush(self, wait=0):
>         super().flush(self)
>         if wait < 0.25:
>             return
>         if wait < 0.5 and os.fdatasync:
>             os.fdatasync(self.fileno())
>             return

[snip rest of function]

Why are you giving the user the illusion of fine control by making the 
wait parameter a continuous variable and then using it as if it were a 
discrete variable? Your example gives only four distinct behaviours, 
for an (effectively) infinite range of wait. This is bad interface 
design: it misleads people into thinking that wait=0.4 is 33% safer 
than wait=0.3 when in fact they are exactly the same.

So, replace the wait parameter with a discrete variable -- named or 
numeric constants. That's a little better, but I still don't think this 
is the right solution. I believe that we want to leave the foundations 
as they are now, or at least don't rush into making changes to them.

A better approach in my opinion is to leave file as-is (although I 
wouldn't object much to it growing a sync method, for convenience) and 
then provide subclasses with the desired behaviour. That scales much 
better: today we can think of three or four levels of "save 
reliability" (corresponding to your 0.25, 0.5, 0.75 and 1 values for 
wait) but next year we might think of six, or ten. Instead of 
overloading the file type with all these different sorts of behaviour, 
requiring who knows how many arguments and a complicated API, we leave 
file nice and simple and allow the application developer to choose the 
subclass she wants:

from filetools import SyncOnWrite as open
f = open('mydata.txt', 'w')
f.write(data)

The choice of which subclass gets used is up to the application, but 
naturally that might be specified by a user-configurable setting.
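
A minimal sketch of what one such subclass might look like in Python 3,
built on io.FileIO. The filetools module above is hypothetical, and this
version is unbuffered and binary-only, so it takes bytes rather than text:

```python
import io
import os


class SyncOnWrite(io.FileIO):
    """Unbuffered binary file that calls os.fsync() after every write."""

    def write(self, data):
        written = super().write(data)
        os.fsync(self.fileno())   # pay the sync cost on each write
        return written
```

Other policies (sync on close only, never sync) would be sibling classes
with the same interface, so the application can pick one by name.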



-- 
Steven D'Aprano


Re: [Python-Dev] wait time [was: Ext4 data loss]

2009-03-12 Thread Steven D'Aprano
On Fri, 13 Mar 2009 07:25:59 am Antoine Pitrou wrote:
> R. David Murray writes:
> > By the way, I would not like to see python programmers encouraged
> > to make the same mistake that sqlite3 made.  The decision about how
> > aggressive to be on flushing data to disk should be in the hands of
> > the _user_, not the application.
>
> I disagree. The user usually does not know which kind of flushing is
> needed in order for his data to be safe. Actually, he probably
> doesn't even know what flushing means, and that files are ever
> "closed".

Surely it depends on what sort of user you're talking about, and that is 
often application or OS specific. As a sweeping generalization, Mac 
users may be more tolerant of slow saves and less tolerant of data loss 
than Windows users, laptop/notebook users will probably expect the app 
to honour whatever setting they put in regarding HDD behaviour, and 
Linux users may expect more fine control over application behaviour and 
be willing to edit config files to get it.



-- 
Steven D'Aprano


Re: [Python-Dev] Python-Dev] wait time [was: Ext4 data loss

2009-03-12 Thread R. David Murray

On Thu, 12 Mar 2009 at 22:57, "Martin v. Löwis" wrote:
> Cameron Simpson wrote:
>> On 12Mar2009 22:09, Martin v. Löwis wrote:
>> | > Let me try some examples.
>> | > Suppose I'm running my applications on a laptop and I don't want the
>> | > disk to be spinning continually while I work.  I'm willing to take the
>> | > risk of data loss in order to extend my battery life.
>> |
>> | So when you select "Save" in your application, would you like the data
>> | to be saved, or would you accept that they get lost?
>>
>> Often, I will accept that they get lost. Why? Because that will only
>> happen with an OS/hardware failure, and I expect those to be close to
>> never.
>
> I think you are an atypical user, then. People can accept that data is
> lost if the machine crashes at the moment of saving. They certainly get
> puzzled if the data is lost if the machine crashes 30 seconds after they
> have saved, and not even a backup copy is available anymore.


The typical user is probably not all that surprised when Windows loses
their data.  They probably figure Windows took more than 30 seconds
to complete the save :) :)

Seriously, though, the point is that IMO an application should not be
calling fsync unless it provides a way for that behavior to be controlled
by the user.

--
R. David Murray   http://www.bitdance.com


Re: [Python-Dev] Python-Dev] wait time [was: Ext4 data loss

2009-03-12 Thread Antoine Pitrou
R. David Murray writes:
> 
> Seriously, though, the point is that IMO an application should not be
> calling fsync unless it provides a way for that behavior to be controlled
> by the user.

But whether an application does it or not is none of Python's business, is it?
What is the disagreement exactly?




[Python-Dev] Capability to alter issue metadata

2009-03-12 Thread Tennessee Leeuwenburg
Hi all,

I am continuing to look at issues in the issue tracker. It would be handy to
be able to update some of the metadata fields. For contributions, it's fine
to just be able to upload patches / post messages, but I can see any number
of issues which could use a bit of looking after.

e.g. http://bugs.python.org/issue4535 should probably be set to "pending
feedback"

I'd be happy to do this kind of thing if people are happy for me to do so...

-T

-- 
--
Tennessee Leeuwenburg
http://myownhat.blogspot.com/
"Don't believe everything you think"


Re: [Python-Dev] wait time [was: Ext4 data loss]

2009-03-12 Thread Steven D'Aprano
On Fri, 13 Mar 2009 11:27:54 am R. David Murray wrote:
> Seriously, though, the point is that IMO an application should not be
> calling fsync unless it provides a way for that behavior to be
> controlled by the user.

An admirable approach, but also a sweeping generalisation. Who is your 
expected user-base? Power users, who insist on being given the ability 
to configure every last aspect of the application behaviour? Or regular 
users who will be intimidated if you ask them to make the choice? Every 
configuration choice has a cost: not only does it require more effort 
to code and maintain, but it leads to a combinatorial explosion of test 
paths and greater opportunity for bugs. Why pay that cost if your 
application users won't consider the choice a feature? By all means 
give the user the option to make that choice, if they will consider it 
a feature.

The point is that these are *application* decisions, not *language* 
decisions. Python shouldn't be making those decisions, but should be 
enabling application developers to make them.



-- 
Steven D'Aprano


Re: [Python-Dev] Capability to alter issue metadata

2009-03-12 Thread Daniel (ajax) Diniz
Tennessee Leeuwenburg wrote:
> I am continuing to look at issues in the issue tracker. It would be handy to
> be able to update some of the metadata fields. For contributions, it's fine
> to just be able to upload patches / post messages, but I can see any number
> of issues which could use a bit of looking after.

I'm +1 to giving you Developer rights (but have no say in that). I'm
available to change any metadata you want until that happens.

BTW, R. David Murray is also interested in helping with the open
issues[1], so we could coordinate efforts at tracker-discuss.

> e.g. http://bugs.python.org/issue4535 should probably be set to "pending
> feedback"

Set to 'pending', 'pending feedback' is pending approval :)

> I'd be happy to do this kind of thing if people are happy for me to do so...

I am, thank you! :)

Regards,
Daniel

[1] http://mail.python.org/pipermail/tracker-discuss/2009-March/001914.html


Re: [Python-Dev] Capability to alter issue metadata

2009-03-12 Thread Brett Cannon
On Thu, Mar 12, 2009 at 18:22, Tennessee Leeuwenburg  wrote:

> Hi all,
>
> I am continuing to look at issues in the issue tracker. It would be handy
> to be able to update some of the metadata fields. For contributions, it's
> fine to just be able to upload patches / post messages, but I can see any
> number of issues which could use a bit of looking after.
>
> e.g. http://bugs.python.org/issue4535 should probably be set to "pending
> feedback"
>
> I'd be happy to do this kind of thing if people are happy for me to do
> so...


You have the Developer role.

-Brett


Re: [Python-Dev] Python-Dev] wait time [was: Ext4 data loss

2009-03-12 Thread Cameron Simpson
On 13Mar2009 00:35, Antoine Pitrou  wrote:
| R. David Murray writes:
| > Seriously, though, the point is that IMO an application should not be
| > calling fsync unless it provides a way for that behavior to be controlled
| > by the user.
| 
| But whether an application does it or not is none of Python's business, is it?
| What is the disagreement exactly?

When the app is written in python, it bears on python's business. The
dispute seems to me to be largely (a) whether python libraries should call
fsync() and the like on their own, and when, and (b) whether there should be
class methods to control this. For myself, the answer for (a) is broadly
no and for (b) preferably yes, in which case my answer to (a) becomes
"default to no fsyncness unless asked".

Then the behaviour of the app becomes something to criticise or not
and python can go on its way with a clear conscience.

Cheers,
-- 
Cameron Simpson  DoD#743
http://www.cskk.ezoshosting.com/cs/

DRM: the functionality of refusing to function. - Richard Stallman


Re: [Python-Dev] wait time [was: Ext4 data loss]

2009-03-12 Thread R. David Murray

On Fri, 13 Mar 2009 at 00:35, Antoine Pitrou wrote:
> R. David Murray  bitdance.com> writes:
>> Seriously, though, the point is that IMO an application should not be
>> calling fsync unless it provides a way for that behavior to be controlled
>> by the user.
>
> But whether an application does it or not is none of Python's business, is it?
> What is the disagreement exactly?

I'd like to see whatever feature gets added support the application
writer in making this user controllable, or at the very least document
that doing so is best practice if you use the sync feature.

--
R. David Murray   http://www.bitdance.com


Re: [Python-Dev] wait time [was: Ext4 data loss]

2009-03-12 Thread Steven D'Aprano
On Fri, 13 Mar 2009 01:02:26 pm R. David Murray wrote:
> On Fri, 13 Mar 2009 at 00:35, Antoine Pitrou wrote:
> > R. David Murray  bitdance.com> writes:
> >> Seriously, though, the point is that IMO an application should not
> >> be calling fsync unless it provides a way for that behavior to be
> >> controlled by the user.
> >
> > But whether an application does it or not is none of Python's
> > business, is it? What is the disagreement exactly?
>
> I'd like to see whatever feature gets added support the application
> writer in making this user controllable, or at the very least
> document that doing so is best practice if you use the sync
> feature.

It's not best practice. It may be best practice for a certain class of 
users and applications, e.g. those who value the ability to control 
low-level behaviour of the app, but it is poor practice for other 
classes of users and applications. Do you really think that having 
Minefield make the file syncing behaviour of the high scores file 
user-configurable is best practice? People care about their high 
scores, but they don't care that much.

It may even lead to more data loss than leaving it out:

* If the application chooses a specific strategy, this strategy might 
(for the sake of the argument) lead to data loss once in ten million 
writes on average.

* If the application makes this a configuration option, the increased 
complexity of writing the code, and the increased number of paths that 
need to be tested, may lead to bugs which cause data loss. This may be 
more risky than the original strategy above (whatever that happens to 
be.)

Complexity is not cost-free, and insisting that the more complex, 
expensive solution is always "best practice" is wrong.


-- 
Steven D'Aprano


[Python-Dev] sure [was: Ext4 data loss]

2009-03-12 Thread Jim Jewett
[new name instead of "wait" -- but certainty is too long, patience too
hard to spell, etc...]

>> class file(_file): ...
>>     def flush(self, sure=0):
>>         super().flush()
>>         if sure < 0.25:
>>             return
>>         if sure < 0.5 and hasattr(os, "fdatasync"):
>>             os.fdatasync(self.fileno())
>>         ...

Steven D'Aprano asked
> Why are you giving the user the illusion of fine control by making the
> wait parameter a continuous variable and then using it as if it were a
> discrete variable?

We don't know how many possible values there will be, or whether they
will be affected by environmental settings.  Developers will not
always know what sort of systems users will have, but they can
indicate (with a ratio) where in the range (slow+safe):(fast+risky)
they rate this particular flush.

Before this discussion, I knew about sync, but had not paid attention
even to datasync, let alone fullsync.  I have no idea which additional
options may be relevant in the future, or on smaller devices or other
storage media.

I do expect specific intermediate values (such as 0.3) to be
interpreted differently on a laptop than on a desktop.
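
As a standalone sketch of the idea (not a proposed API): the 0.25 and 0.5
thresholds come from the snippet quoted above, but what happens at 0.5 and
beyond was elided there, so the fall-through to os.fsync() is only my
assumption. The returned strategy name is likewise hypothetical and exists
just to make the chosen path visible.

```python
import os


def flush(f, sure=0.0):
    """Flush f; `sure` in [0, 1] rates this flush from fast+risky to slow+safe."""
    f.flush()  # hand the data to the OS
    if sure < 0.25:
        return "flush"
    fd = f.fileno()
    if sure < 0.5 and hasattr(os, "fdatasync"):
        # Sync the file data only; not every platform provides this call.
        os.fdatasync(fd)
        return "fdatasync"
    # Assumed behaviour for higher values, or where fdatasync is missing.
    os.fsync(fd)
    return "fsync"
```

Note that sure=0.3 already means different things on different systems:
where os.fdatasync does not exist, the call ends up in os.fsync() instead.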

-jJ


Re: [Python-Dev] Ext4 data loss

2009-03-12 Thread Adam Olsen
On Tue, Mar 10, 2009 at 2:11 PM, Christian Heimes  wrote:
> Multiple blogs and news sites are swamped with a discussion about ext4
> and KDE 4.0. Theodore Ts'o - the developer of ext4 - explains the issue
> at
> https://bugs.edge.launchpad.net/ubuntu/+source/linux/+bug/317781/comments/54.
>
>
> Python's file type doesn't use fsync() and may be the victim of the very
> same issue, too. Should we do anything about it?

It's a kernel defect and we shouldn't touch it.

Traditionally you were hooped regardless of what you did, just with
smaller windows.  Did you want to lose your file 50% of the time or
only 10% of the time?  Heck, 1% of the time you lose the *entire*
filesystem.

Along came journaling file systems.  They guarantee the filesystem
itself stays intact, but not your file.  Still, if you hedge your bets
it's a fairly small window.  In fact if you kill performance you can
eliminate the window: write to a new file, flush all the buffers, then
use the journaling filesystem to rename; few people do that though,
due to the insane performance loss.
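
For reference, that slow-but-safe dance might be sketched like this. The
function name replace_file is hypothetical; the calls themselves
(tempfile.mkstemp, os.fsync, os.replace) are standard library. It assumes
the temporary file lives in the same directory as the target so the rename
stays within one filesystem, and it does not sync the containing directory.

```python
import os
import tempfile


def replace_file(path, data):
    """Replace the contents of `path` with `data` (bytes) via a new file."""
    directory = os.path.dirname(os.path.abspath(path))
    # Write to a new file next to the target.
    fd, tmp = tempfile.mkstemp(dir=directory)
    try:
        with os.fdopen(fd, "wb") as f:
            f.write(data)
            f.flush()             # Python's buffers -> OS
            os.fsync(f.fileno())  # OS buffers -> storage (the slow part)
        # Rename the new file over the old one.
        os.replace(tmp, path)
    except BaseException:
        # Clean up the temporary file if anything went wrong.
        try:
            os.unlink(tmp)
        except OSError:
            pass
        raise
```

The os.fsync() call is where the performance goes; dropping it gives the
common, faster variant that relies on the filesystem ordering the data
before the rename.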

What we really want is a simple memory barrier.  We don't need the
file to be saved *now*, just so long as it gets saved before the
rename does.  Unfortunately the filesystem APIs don't touch on this,
as they were designed when losing the entire filesystem was
acceptable.  What we need is a heuristic to make them work in this
scenario.  Lo and behold ext3's data=ordered did just that!

Personally, I consider journaling to be a joke without that.  It has
different justifications, but not this critical one.  Yet the ext4
developers didn't see it that way, so it was sacrificed to new
performance improvements (delayed allocation).

2.6.30 has patches lined up that will fix this use case, making sure
the file is written before the rename.  We don't have to touch it.

Of course if you're planning to use the file without renaming then you
probably do need an explicit fsync and an API for that might help
after all.  That's a different problem though, and has always existed.


-- 
Adam Olsen, aka Rhamphoryncus