Good idea CCing the list; I initially thought it was a topic of limited general relevance, but I realized at least Mike might also have something to say here.
On Sat, Sep 29, 2012 at 2:31 AM, Ralph Versteegen <teeem...@gmail.com> wrote:
> On 29 September 2012 00:38, David Gowers <00a...@gmail.com> wrote:
> I found that doing seeks on a file while writing to it is very slow on
> Windows XP; apparently the libc implementation flushes its cache when
> you seek. glibc doesn't do this. So I actually wrote a file write
...

Wow, they like to punish databases? I'd guess that nobody hosts websites
on WinXP machines, but that'd be a wrong guess >_<.

> buffering layer used when writing RELOAD documents to avoid this.
> Loading RELOAD documents was also considerably slower in Win XP than
> in Linux; I don't know why, but it was still perfectly acceptable.

I'm currently researching this. As far as I know, Python does some
buffering on its side of things, so how much of an impact this has needs
to be determined. My net access is kind of FUBAR, but I'll get there
eventually.

...

Looks like Python definitely does do its own buffering:
http://docs.python.org/library/io.html#io.BufferedIOBase

The only reference I've been able to find to slow seek() in Python is this
one guy who was performing the inanity:

for i in range(1000000):
    f.seek(0)

/-_-\

Anyway, I'm curious, so I'll write a PlanarRecordManager subclass that
caches the entire content of the relevant super-record; then the two
implementations can be swapped in and out to compare.

>> (the function that 'with' blocks otherwise perform, but for whenever a
>> 'with' block is inappropriate). Much like you can do with file
>> handles.
>
> OK, but littering code with lots of close()s isn't too pretty.

In practice so far, close() doesn't get much usage versus 'with' blocks.
It won't go away (the context management used by 'with' will use it), but
there's no obligation to use it. I only use explicit close()s where
indentation levels would otherwise get excessive (again, just as would be
the case with file handles).

> Fine, it is clearer, ever so slightly!

If you don't like it, don't use it. It's no problem.
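For what it's worth, the buffering question can be probed directly. Here is a minimal timing sketch (my own throwaway code, not anything from the actual RELOAD sources; `timed_writes` and its parameters are hypothetical names) that interleaves seeks with writes on an unbuffered raw file versus Python's default buffered wrapper, which is how I'd expect the seek-flush cost to show up if it matters:

```python
import os
import tempfile
import time

def timed_writes(buffering, n=2000, chunk=b"x" * 64):
    """Write n chunks, seeking back over each one, and time it.

    buffering=0 gives an unbuffered raw file (every seek hits the OS);
    buffering=-1 (the default) wraps the file in io.BufferedRandom,
    which can absorb small seeks within its buffer.
    """
    fd, path = tempfile.mkstemp()
    os.close(fd)
    try:
        # buffering=0 requires binary mode.
        f = open(path, "r+b", buffering=buffering)
        start = time.perf_counter()
        pos = 0
        for _ in range(n):
            f.write(chunk)
            pos += len(chunk)
            f.seek(pos - len(chunk))  # seek back, as a record rewrite would
            f.write(chunk)            # overwrite the chunk in place
            f.seek(pos)               # and return to the end
        f.close()
        return time.perf_counter() - start
    finally:
        os.remove(path)

if __name__ == "__main__":
    print("unbuffered:", timed_writes(0))
    print("buffered:  ", timed_writes(-1))
```

On glibc the two should be close; on a platform that flushes on seek, the unbuffered case should fall off a cliff as n grows.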
I designed it because I much prefer it.

> Sketching DSLs can be a lot of fun (but turns into frustration for me
> due to perfectionism. They usually can't be the silver bullet you
> want).

Yeah, even more so the more powerful the source language is. You end up
realizing that you're looking at implementing features that are already
done well in the host language, better than any quick hack could be.

>> You're right here. I actually went with the 'sub' idea initially
>> because it distinguished subsections clearly, but another convention
>> can probably handle that.
>
> That could be a useful convention, but there's no gain in restricting
> it to just RELOADWriter.

Hm. Point.

>>> The ZoneMap class should definitely have name and extra/extradata
>>> members rather than obscuring that in an 'info' tuple.
>>
>> So what, you want a name[N_ZONES] dict and an extradata[N_ZONES]
>> dict instead? Eh, okay.
>
> I didn't realise .info was indexed by zone id. That explains why you
> did it that way. In that light, I slightly prefer .info, as long as you
> make .info elements namedtuples instead of plain tuples.

Sure, a lot of that stuff is provisional anyway; I just put things in
containers that seem reasonable, and wait for usage to tell me if
adjustments are needed.

Updated status:

* 10 classes are done (most of them written from scratch this month)
* 3 are WIP / need more testing
* 20 are not started yet

...(tile|foe|wall)maps got finished quickly.

I've been looking at how to do streaming RELOAD. The general pattern I'm
currently looking at:

statsby = parentnode.when('stats_by')
if statsby:
    # do stuff with statsby

and to explicitly get the next node, just iterate:

node = next(parentnode)

Related ideas:

* No need for another class; the difference in behaviour is minor.
* Precache node info on the current level. This allows you to process
  nodes 'out-of-order' -- if you request a node that is after the
  'current' node, you can still get it,
  and process the other one whenever it occurs in your code.
* Precaching based on a size threshold: if the size specified on a parent
  node is smaller than N bytes, precache all its children.
* Saving needs some thought; mainly, we need to be able to create
  'detached' nodes and 'attach' them later when dealing with complex
  structures.

In general this describes a 'mixed' model, where we stream when we must,
and otherwise act like we have random access. I haven't decided on the
quality of this model yet. It would raise an error when sizes are
drastically out of whack (e.g. a container node you expect to be 1000
bytes is 100000 instead).

_______________________________________________
Ohrrpgce mailing list
ohrrpgce@lists.motherhamster.org
http://lists.motherhamster.org/listinfo.cgi/ohrrpgce-motherhamster.org
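To make the 'mixed' model concrete, here is a toy in-memory sketch of the when()/precache behaviour plus the size sanity check. The real nodes would come from a binary file, and every name here (StreamingParent, SizeMismatch, the max_ratio parameter) is hypothetical, not existing RELOAD/OHRRPGCE API:

```python
class SizeMismatch(Exception):
    """Raised when a node's declared size is drastically out of whack."""

class StreamingParent:
    def __init__(self, children, declared_size=None, max_ratio=10):
        # children: (name, payload) pairs in file order; payloads are
        # str/bytes here so their sizes can be summed for the sanity check.
        self._pending = list(children)
        self._cache = []  # nodes skipped over by when(), kept for later
        if declared_size is not None:
            actual = sum(len(p) for _, p in children)
            if actual > declared_size * max_ratio:
                raise SizeMismatch(
                    "declared %d, actual %d" % (declared_size, actual))

    def when(self, name):
        """Return the named child node if present, else None.

        Checks already-skipped (precached) nodes first, then scans
        forward, caching anything it skips so that out-of-order access
        still works afterwards.
        """
        for i, node in enumerate(self._cache):
            if node[0] == name:
                return self._cache.pop(i)
        while self._pending:
            node = self._pending.pop(0)
            if node[0] == name:
                return node
            self._cache.append(node)  # precache; don't lose skipped nodes
        return None

    def __next__(self):
        """Explicit iteration: cached skips first, then the stream."""
        if self._cache:
            return self._cache.pop(0)
        if self._pending:
            return self._pending.pop(0)
        raise StopIteration
```

With this sketch, `parent.when('stats_by')` skips over and precaches an earlier `max_hp` node, and a later `next(parent)` hands the skipped node back from the cache, which is exactly the "process the other one whenever it occurs in your code" behaviour described above.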