On Sun, Dec 20, 2009 at 6:15 AM, Michael McCandless <[email protected]> wrote:
> On Sun, Dec 20, 2009 at 12:14 AM, Marvin Humphrey
> <[email protected]> wrote:
>
>>> I also think that Mike is making too much distinction between
>>> "relying on the file system" and "using shared memory".  I think
>>> one can safely view them as two interfaces to the same underlying
>>> mechanism.
>
> Using the filesystem for sharing vs using shared memory seem quite
> different to me.  EG one could create a rich data structure (say an
> FST) to represent the terms dict in RAM, then share that terms dict
> amongst many processes, right?
>
> Whereas, using the filesystem really requires a file-flat data
> structure?
I guess it depends on your point of view: it would be hard (but not impossible) to do true objects in an mmapped file, but it would be very easy to do has-a relationships using file offsets as pointers. I tend to have a data-centric (rather than object-centric) point of view, and from here I don't see any data structures that would be significantly more difficult.

Do you have a link that explains the FST you refer to? I'm searching, and not finding anything that's a definite match. "Field select table"?

> Ie, "going through the filesystem" and "going through shared memory"
> are two alternatives for enabling efficient process-only concurrency
> models.  They have interesting tradeoffs (I'll answer more in 2026),
> but the fact that one of them is backed by a file by the OS seems like
> a salient difference.

For me, file backing doesn't seem like a big difference. Fast-moving changes will never hit disk, and I presume there is some way you can convince the system never to actually write out the slow changes (maybe mmap on a RamFS?). I think the real difference is between sharing between threads and sharing between processes: basically, whether or not you can assume that the address space is identical in all the "sharees".

I'll mention that, given the New Year, at first I thought 2026 was your realistic time estimate rather than a tracking number.

---

I started thinking about how one could do objects with mmap, and came up with an approach that doesn't quite answer that question but might actually work out well for other problems: you could literally compile your index and link it in as a shared library. Each term would be a symbol, and you'd use dlsym() to find the associated data. You might even be able to use library versioning to handle updates, and something like RTLD_NEXT to handle multiple segments.

Perhaps it's a really bad idea, but I find it an intriguing one. I wonder how fast using libdl would be compared to writing your own lookup tables. I'd have to guess it's fairly efficient.

Nathan Kurz
[email protected]
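The "file offsets as pointers" idea from the message above can be sketched in C with POSIX mmap(). This is a minimal illustration under stated assumptions: the IndexHeader/TermEntry layout, the function names, and the sample terms are hypothetical, not a Lucene or Lucy file format. Every link inside the file is a byte offset from the start of the mapping, so the data stays valid no matter where (or in which process) it gets mapped.

```c
#define _POSIX_C_SOURCE 200809L
#include <stdint.h>
#include <stdio.h>
#include <string.h>
#include <sys/mman.h>
#include <sys/types.h>
#include <unistd.h>

/* Hypothetical layout: all "pointers" are offsets from the mapping base. */
typedef struct {
    uint32_t num_terms;   /* number of TermEntry records */
    uint32_t entries_off; /* offset of the first TermEntry */
} IndexHeader;

typedef struct {
    uint32_t text_off; /* offset of the term bytes (not NUL-terminated) */
    uint32_t text_len; /* term length in bytes */
    uint32_t doc_freq; /* example payload stored with the term */
} TermEntry;

/* Resolve an offset against this process's mapping address. */
static const void *at(const void *base, uint32_t off) {
    return (const char *)base + off;
}

/* Write header, entries, then term bytes. Terms must be sorted.
 * Returns the total file size, or -1 on a write error. */
static long write_index(int fd, const char **terms, const uint32_t *freqs,
                        uint32_t n) {
    IndexHeader hdr = { n, (uint32_t)sizeof(IndexHeader) };
    uint32_t text_off = hdr.entries_off + n * (uint32_t)sizeof(TermEntry);
    if (write(fd, &hdr, sizeof hdr) != (ssize_t)sizeof hdr) return -1;
    for (uint32_t i = 0; i < n; i++) {
        TermEntry e = { text_off, (uint32_t)strlen(terms[i]), freqs[i] };
        if (write(fd, &e, sizeof e) != (ssize_t)sizeof e) return -1;
        text_off += e.text_len;
    }
    for (uint32_t i = 0; i < n; i++) {
        size_t len = strlen(terms[i]);
        if (write(fd, terms[i], len) != (ssize_t)len) return -1;
    }
    return (long)text_off;
}

/* Binary search over the mapped entries, following offsets only. */
static long find_term(const void *base, const char *term) {
    const IndexHeader *hdr = base;
    const TermEntry *entries = at(base, hdr->entries_off);
    size_t tlen = strlen(term);
    uint32_t lo = 0, hi = hdr->num_terms;
    while (lo < hi) {
        uint32_t mid = lo + (hi - lo) / 2;
        const TermEntry *e = &entries[mid];
        size_t min = e->text_len < tlen ? e->text_len : tlen;
        int cmp = memcmp(at(base, e->text_off), term, min);
        if (cmp == 0) cmp = (e->text_len > tlen) - (e->text_len < tlen);
        if (cmp == 0) return (long)e->doc_freq;
        if (cmp < 0) lo = mid + 1; else hi = mid;
    }
    return -1; /* not found */
}

/* Build a tiny index in a temp file, mmap it, and look up one term. */
long lookup_demo(const char *term) {
    static const char *terms[] = { "apple", "banana", "cherry" };
    static const uint32_t freqs[] = { 3, 7, 2 };
    long result = -1;
    FILE *fp = tmpfile();
    if (!fp) return -1;
    int fd = fileno(fp);
    long size = write_index(fd, terms, freqs, 3);
    if (size > 0) {
        void *base = mmap(NULL, (size_t)size, PROT_READ, MAP_SHARED, fd, 0);
        if (base != MAP_FAILED) {
            result = find_term(base, term);
            munmap(base, (size_t)size);
        }
    }
    fclose(fp);
    return result;
}
```

For example, lookup_demo("banana") finds the entry and returns 7, and a missing term returns -1. Because nothing in the file stores a raw address, a second process could map the same file at a different address and call find_term() unchanged. That is the thread-versus-process distinction drawn above: raw pointers only work when every sharee sees the same address space.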
