On Sep 9, 5:44 pm, Larry Bates <[EMAIL PROTECTED]> wrote: > castironpi wrote: > > I will try my idea again. I want to talk to people about a module I > > want to write and I will take the time to explain it. I think it's a > > "cool idea" that a lot of people, forgiving the slang, could benefit > > from. What are its flaws? > > > A user has a file he is using either 1/ to persist binary data after > > the run of a single program (persistence) or 2/ share binary data > > between concurrently running programs (IPC / shared memory). The data > > are records of variable types and lengths that can change over time. > > He wants to change a record that's already present in the file. Here > > are two examples. > > > Use Case 1: Hierarchical ElementTree-style data > > > A user has an XML file like the one shown here. > > > <a> > > <b> > > <c>Foo</c> > > </b> > > ... > > > He wants to change "Foo" to "Foobar". > > > <a> > > <b> > > <c>Foobar</c> > > </b> > > ... > > > The change he wants to make is at the beginning of a 4GB file, and > > recopying the remainder is an unacceptable resource drain. > > > Use Case 2: Web session logger > > > A tutor application has written a plugin to a webbrowser that records > > the order of a user's mouse and keyboard activity during a browsing > > session, and makes them concurrently available to other applications > > in a suite, which are written in varying lanugages. The user takes > > some action, such as surfing to a site or clicking on a link. The > > browser plugin records that sequence into shared memory, where it is > > marked as acknowledged by the listener programs, and recycled back > > into an unused block. URLs, user inputs, and link text can be of any > > length, so truncating them to fit a fixed length is not an option. > > > Existing Solutions > > > - Shelve - A Python Standard Library shelf object can store a random > > access dictionary mapping strings to pickled objects. It does not > > provide for hierarchical data stores, and objects must be unpickled > > before they can be examined. > > - Relational Database - Separate tables of nodes, attributes, and > > text, and the relations between them are slow and unwieldy to > > reproduce the contents of a dynamic structure. The VARCHAR data type > > still carries a maximum size, no more flexible than fixed-length > > records. > > - POSH - Python Object Sharing - A module currently in its alpha stage > > promises to make it possible to store Python objects directly in > > shared memory. In its current form, its only entry point is 'fork' > > and does not offer persistence, only sharing. See: > > http://poshmodule.sourceforge.net/ > > > Dynamic Allocation > > > The traditional solution, dynamic memory allocation, is to maintain a > > metadata list of "free blocks" that are available to write to. See: > > http://en.wikipedia.org/wiki/Dynamic_memory_allocation > > http://en.wikipedia.org/wiki/Malloc > > http://en.wikipedia.org/wiki/Mmap > > http://en.wikipedia.org/wiki/Memory_leak > > The catch, and the crux of the proposal, is that the metadata must be > > stored in shared memory along with the data themselves. Assuming they > > are, a program can acquire the offset of an unused block of a > > sufficient size for its data, then write it to the file at that > > offset. The metadata can maintain the offset of one root member, to > > serve as a 'table of contents' or header for the remainder of the > > file. It can be grown and reassigned as needed. > > > An acquaintence writes: It could be quite useful for highly concurrent > > systems: the overhead involved with interprocess communication can be > > overwhelming, and something more flexible than normal object > > persistence to disk might be worth having. > > > Python Applicability > > > The usual problems with data persistence and sharing apply. The > > format of the external data is only established conventionally, and > > conversions between Python objects and raw memory bytes take the usual > > overhead. 'struct.Struct', 'ctypes.Structure', and 'pickle.Pickler' > > currently offer this functionality, and the buffer offset obtained > > from 'alloc' can be used with all three. > > > Ex 1. > > s= struct.Struct( 'III' ) > > x= alloc( s.size ) > > s.pack_into( mem, x, 2, 4, 6 ) > > Struct in its current form does not permit random access into > > structure contents; a user must read or write the entire converted > > strucutre in order to update one field. Alternative: > > s= struct.Struct( 'I' ) > > x1, x2, x3= alloc( s.size ), alloc( s.size ), alloc( s.size ) > > s.pack_into( mem, x1, 2 ) > > s.pack_into( mem, x2, 4 ) > > s.pack_into( mem, x3, 6 ) > > > Ex 2. > > class Items( ctypes.Structure ): > > _fields_= [ > > ( 'x1', ctypes.c_float ), > > ( 'y1', ctypes.c_float ) ] > > x= alloc( ctypes.sizeof( Items ) ) > > c= ctypes.cast( mem+ x, ctypes.POINTER( Items ) ).contents > > c.x1, c.y1= 2, 4 > > The 'mem' variable is obtained from a call to PyObject_AsWriteBuffer. > > > Ex 3. > > s= pickle.dumps( ( 2, 4, 6 ) ) > > x= alloc( len( s ) ) > > mem[ x: x+ len( s ) ]= s > > 'dumps' is still slow and nor does permit random access into contents. > > > Use Cases Revisited > > > Use Case 1: Hierarchical ElementTree-style data > > Solution: Dynamically allocate the tree and its elements. > > > Node: tag: a > > Node: tag: b > > Node: tag: c > > Node: text: Foo > > > The user wants to change "Foo" to "Foobar". > > > Node: tag: a > > Node: tag: b > > Node: tag: c > > Node: text: Foobar > > > Deallocate 'Node: text: Foo', allocate 'Node: text: Foobar', and store > > the new offset into 'Node: tag: c'. Total writes 6 bytes 'foobar', a > > one-word offset, and approximatly 5- 10-word metadata update. > > > Use Case 2: Web session logger > > Dynamically allocate a linked list of data points. > > > Data: 'friendster.com' > > Data: 'My Account' > > > Allocate one block for each string, adding it to a linked list. As > > listeners acknowledge each data point, remove it from the linked > > list. Keep the head node in the 'root offset' metadata field. > > > Restrictions > > > It is not possible for persistent memory to refer to live memory. Any > > objects it refers to must also be located in file. Their mapped > > addresses must not be stored, only their offsets into it. However, > > live references to persistent memory are eminently possible. > > > Current Status > > > A pure Python alloc-free implementation based on the GNU PAVL tree > > library is on Google Code. It is only in proof-of-concept form and > > not commented, but does contain a first-pass test suite. See: > > http://code.google.com/p/pymmapstruct/source/browse/#svn/trunk > > The ctypes solution for access is advised. > > You should review Zope's ZODB and/or memcached before putting in too much > effort. > > -Larry
Larry, I'd love to say they were exactly what I was looking for. They're not. I confess, I stopped reading ZODB when I got to the "uses pickles" part, and 'memcached' when I got to the awkward and unwieldy "SELECT FROM" part. I'm aware of both of those and my solution does something neither other does. -- http://mail.python.org/mailman/listinfo/python-list