[issue25442] Shelve consistency issues

2015-10-20 Thread R. David Murray

R. David Murray added the comment:

Yeah, if we had an sqlite backend I think we'd make it the default if sqlite 
was available.  There's a proof of concept implementation in the open issue 
3783.  I'm not sure what remains to be done (other than docs)...I didn't read 
through the issue and there's a fair bit of discussion.

--

___
Python tracker 

___
___
Python-bugs-list mailing list
Unsubscribe: 
https://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com



[issue25442] Shelve consistency issues

2015-10-19 Thread R. David Murray

R. David Murray added the comment:

Shelve does not itself implement any database, but it does *use* a database[*].
Any consistency concerns must therefore be directed toward the underlying
database library used.  In particular, it is not part of the shelve API to know
anything about any possible underlying file or files, nor is it *necessarily*
the case that there is pending data to be flushed on close.

So, if you want to suggest a documentation enhancement, it should make 
reference to the issue and point the user at the documentation for the 
underlying database they choose to use for more information.

[*] There is an open issue proposing an sqlite backend for shelve, but no one 
so far has had the motivation to finish it.
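
As a small illustration of the point above, the sketch below creates a shelf and asks `dbm.whichdb` which underlying database library ended up storing it. The shelf name `example_shelf` and the temporary directory are assumptions made for the example; the result depends on which dbm modules are available on the system.

```python
import dbm
import os
import shelve
import tempfile

# Hypothetical location for a throwaway shelf.
path = os.path.join(tempfile.mkdtemp(), "example_shelf")

# shelve picks whatever dbm backend is available on this system.
with shelve.open(path) as shelf:
    shelf["key"] = "value"

# whichdb returns the module name of the backend that created the files,
# e.g. "dbm.gnu" or "dbm.dumb", depending on the installation.
backend = dbm.whichdb(path)
print(backend)
```

The reported module name tells you whose documentation to consult for durability and consistency guarantees.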

--
nosy: +r.david.murray




[issue25442] Shelve consistency issues

2015-10-19 Thread Yanyan Jiang

New submission from Yanyan Jiang:

I am currently working on file system reliability issues. I have a disk 
driver that can simulate on-disk crash sites after injected power failures. 
This disk is fully compatible with the Linux block driver semantics (refer to 
 https://www.kernel.org/doc/Documentation/block/writeback_cache_control.txt), 
and may create many crash sites in which pending blocks are only partially 
flushed to the disk, which is common behavior for a commodity disk with a 
write buffer.

Our automated tool confirms that corruption can happen at a crash site after an 
unclean shutdown (Linux with default ext4 settings). There are also some 
discussions on 
[Stack Overflow](http://stackoverflow.com/questions/4226580/prevent-python-shelve-corruption)
 concerning this issue. I suggest explicitly reminding developers of this 
behavior.

Suggested documentation enhancement
--
As a minimal database library, `shelve` does not offer ACID (atomicity, 
consistency, isolation and durability) guarantees as strong as those of a 
database (like SQLite). On certain system configurations, a system crash can 
lead to a corrupted shelve file. If you are using shelve to persist precious 
data such as users' documents, we suggest the following steps to ensure data 
is not lost:

1. Create a temporary copy of the file.
2. Operate on the temporary copy. Closing a shelve db implies that its data is 
flushed to the disk.
3. Rename the temporary file to replace the original file. Renaming is 
carefully treated by a journaled filesystem to be atomic.
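
The steps above can be sketched roughly as follows. This is an illustration under stated assumptions, not part of the proposed documentation text: the helper name `update_shelf_via_copy` and its `update` callback are hypothetical, the explicit `os.fsync` calls are an addition to the three steps, and the replacement is only atomic as a whole when the underlying dbm backend stores the shelf in a single file. With a multi-file backend each rename is atomic on its own, but the set of renames is not.

```python
import glob
import os
import shelve
import shutil
import tempfile


def update_shelf_via_copy(path, update):
    """Apply update(shelf) to a temporary copy, then rename it into place."""
    directory = os.path.dirname(os.path.abspath(path))
    # Assumption: every file the backend created starts with the shelf path.
    # This pattern can also match unrelated files that share the prefix.
    sources = glob.glob(glob.escape(path) + "*")

    # Keep the copy on the same filesystem so the final rename can be atomic.
    with tempfile.TemporaryDirectory(dir=directory) as tmpdir:
        # Step 1: copy the existing shelf files (there are none on first use).
        for src in sources:
            shutil.copy2(src, os.path.join(tmpdir, os.path.basename(src)))

        # Step 2: operate on the copy; leaving the block closes the shelf.
        with shelve.open(os.path.join(tmpdir, os.path.basename(path))) as shelf:
            update(shelf)

        # Ask the OS to write the new contents to disk before renaming.
        names = os.listdir(tmpdir)
        for name in names:
            with open(os.path.join(tmpdir, name), "rb+") as fh:
                os.fsync(fh.fileno())

        # Step 3: rename each temporary file over the original.
        for name in names:
            os.replace(os.path.join(tmpdir, name), os.path.join(directory, name))
```

A call might look like `update_shelf_via_copy("data", lambda shelf: shelf.update(count=1))`. The sketch does not sync the containing directory after the rename, which a stricter durability setup would likely also want.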

--
assignee: docs@python
components: Documentation
messages: 253188
nosy: Yanyan Jiang, docs@python
priority: normal
severity: normal
status: open
title: Shelve consistency issues
type: enhancement
versions: Python 2.7, Python 3.2, Python 3.3, Python 3.4, Python 3.5, Python 3.6




[issue25442] Shelve consistency issues

2015-10-19 Thread Yanyan Jiang

Yanyan Jiang added the comment:

Thanks for the reminder. The issue was originally reported with the default 
setting. We conducted further tests with the other anydbm backends (dbhash, 
dbm, gdbm), and none of them survived crash testing. For the detailed 
reasoning, please refer to an OSDI'14 research paper: 
https://www.usenix.org/system/files/conference/osdi14/osdi14-paper-pillai.pdf
 The paper discusses vulnerabilities of the GDBM implementation, and these 
lightweight db implementations have similar problems. We have also tested 
SQLite, and it is much more robust: we have not found any ACID violation yet.

Personally I think it is reasonable to have an SQLite backend, as it is much 
safer (plus providing thread safety). I will see what I can do for that.
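
To make the idea concrete, here is a minimal sketch of what a shelf-like mapping on top of SQLite could look like. It is not the proof of concept from the open issue and not a proposed stdlib API: the class name `SQLiteShelf` and the table layout are assumptions made for illustration, keys are assumed to be strings, and the sketch makes no claim about thread safety.

```python
import pickle
import sqlite3
from collections.abc import MutableMapping


class SQLiteShelf(MutableMapping):
    """Hypothetical shelf-like mapping that stores pickled values in SQLite."""

    def __init__(self, path):
        self._conn = sqlite3.connect(path)
        with self._conn:
            self._conn.execute(
                "CREATE TABLE IF NOT EXISTS shelf "
                "(key TEXT PRIMARY KEY, value BLOB NOT NULL)"
            )

    def __getitem__(self, key):
        row = self._conn.execute(
            "SELECT value FROM shelf WHERE key = ?", (key,)
        ).fetchone()
        if row is None:
            raise KeyError(key)
        return pickle.loads(row[0])

    def __setitem__(self, key, value):
        # The connection context manager commits on success and rolls back
        # on an exception, so each assignment is its own transaction.
        with self._conn:
            self._conn.execute(
                "INSERT OR REPLACE INTO shelf (key, value) VALUES (?, ?)",
                (key, pickle.dumps(value)),
            )

    def __delitem__(self, key):
        with self._conn:
            cur = self._conn.execute("DELETE FROM shelf WHERE key = ?", (key,))
        if cur.rowcount == 0:
            raise KeyError(key)

    def __iter__(self):
        rows = self._conn.execute("SELECT key FROM shelf").fetchall()
        return iter([row[0] for row in rows])

    def __len__(self):
        return self._conn.execute("SELECT COUNT(*) FROM shelf").fetchone()[0]

    def close(self):
        self._conn.close()
```

Usage mirrors a regular shelf: `db = SQLiteShelf("data.sqlite"); db["doc"] = {"title": "x"}; db.close()`. Committing on every assignment keeps the sketch simple at the cost of write throughput.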

--
