[moved from private to lucy-dev in case others are interested]

On Thu, Mar 12, 2009 at 5:13 PM, Marvin Humphrey <[email protected]> wrote:
> On Fri, Mar 06, 2009 at 10:17:55AM -0700, Nathan Kurz wrote:
>> There was an article recently that might be relevant to your desire
>> for real time updates of KinoSearch databases:
>> http://news.ycombinator.com/item?id=497039
>
> It looks like that system can handle a much greater change rate than KS.
> KS has a slow best-case write speed, but I'm not worried about that.  The
> problem me and the Lucene folks are trying to address under the topic
> heading of "real-time indexing" is *worst-case* write speed: most of the
> time you're fine, but every once in a while you trigger a large merge and
> you wait a loooooong time.  That problem has a different solution.
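A toy model makes that worst case concrete.  This is a Python sketch, not
KS's or Lucene's actual merge policy: the fanout of 10 and the "cost equals
documents rewritten" accounting are invented for illustration.  Each flush
writes a tiny segment; whenever ten equal-sized segments pile up they merge
into one, so most flushes are cheap but the occasional flush cascades and
rewrites nearly the whole index:

```python
def simulate(flushes, fanout=10):
    """Return the worst single-flush merge cost in a toy tiered scheme."""
    segments = []          # sizes of live segments
    worst = 0
    for _ in range(flushes):
        segments.append(1)  # each flush writes one size-1 segment
        cost = 0
        # Cascade: merge any tier that reaches `fanout` equal-sized segments.
        merged = True
        while merged:
            merged = False
            for size in sorted(set(segments)):
                if segments.count(size) >= fanout:
                    for _ in range(fanout):
                        segments.remove(size)
                    segments.append(size * fanout)
                    cost += size * fanout   # cost ~ documents rewritten
                    merged = True
                    break
        worst = max(worst, cost)
    return worst

# Flushes 1-9 cost nothing; flush 10 merges 10 docs; flush 1000 cascades
# through three tiers and rewrites 10 + 100 + 1000 = 1110 docs in one go.
```

The point of the toy is just that average and worst-case flush cost diverge
as the index grows, which is the gap "real-time indexing" has to close.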
What is the actual case that would trigger such a problem?  My instinct is
that while there is no way to avoid the long merge, there are schemes where
only the update is slow, and the readers can continue at more or less full
speed.

> Except on NFS, KS doesn't have much in the way of lock contention issues
> because index files are never modified.  But regardless of its
> applicability to KS, this "sneaky lock" trick is pretty nice:
> ...
> We couldn't do that because we can't reach out across the system to active
> IndexReaders in other processes, but still: nice.

I realize this, but I'm wondering if a locking approach might be
preferable.  Would the equivalent of row-level locking allow you to modify
the index files in real time, instead of adding addenda and tombstones?
I'm not necessarily suggesting that the default index format should do
this, but it might be worth considering whether a proposed API would
support such a real-time format.

> I'm reminded of this presentation on a lock-free hash table:
>
> http://video.google.com/videoplay?docid=2139967204534450862
>
> http://www.azulsystems.com/events/javaone_2007/2007_LockFreeHash.pdf

Thanks, I took a glance, and it does seem interesting.  In addition to its
relevance to shared memory, I've been looking at CUDA programming on GPUs,
and I'm interested in lock-free data structures such as this one.

> This was well put:
>
>     RAM can be viewed as a 16 Gig L4 cache, and disk as a multi-Terabyte
>     L5.  Just as one currently writes code that doesn't distinguish
>     between a fetch from L1 and a fetch from main memory, mmap() allows
>     extending this syntax all the way to a fetch from disk.

Thanks.  While I certainly think that mmap() can have great performance
advantages, it's the simplification it provides that really appeals to me.
Instead of fighting with the OS, use it!

Nathan Kurz
[email protected]
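P.S.  A minimal illustration of the "disk as L5" point, in Python rather
than the C that KS uses; the file and its contents here are made up.  Once
a file is mapped, reads are ordinary slice syntax, and the OS pages bytes
in on demand whether they sit in the page cache or still on disk:

```python
import mmap
import os
import tempfile

# Write a small scratch file to map.
fd, path = tempfile.mkstemp()
with os.fdopen(fd, "wb") as f:
    f.write(b"hello from the L5 cache")

with open(path, "rb") as f:
    with mmap.mmap(f.fileno(), 0, access=mmap.ACCESS_READ) as m:
        assert m[0:5] == b"hello"       # reads look like slicing bytes
        assert m.rfind(b"cache") == 18  # search with no explicit read() call

os.remove(path)
```

The same slicing code would work on a multi-gigabyte index file without
the program ever issuing an explicit read, which is the simplification
being praised above.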
