Re: Ocean Documentation

Jason Rutherglen Mon, 14 Jul 2008 14:05:39 -0700

I took a look at Jackrabbit, which is a very cool animal, and there are
similar ideas in the Lucene portion.  I will try to take a look at the
source to get a better understanding.


On Fri, Jul 11, 2008 at 9:09 AM, Ard Schrijvers <[EMAIL PROTECTED]>
wrote:

> Hello Jason et al,
>
> Indeed there are plenty of use cases for instantly updated searches,
> for example the JSR-170 (JCR) compliant Jackrabbit implementation: it
> relies heavily on Lucene for searching and hierarchy resolving, and
> according to the JSR-170 spec, changes need to be visible instantly
> after a save().
>
> Also, I think a very similar solution to yours is implemented there: See
> [1] if you like
>
> Regards Ard
>
> [1] http://jackrabbit.apache.org/index-readers.html
>
>
>
> > I started a wiki page at
> > http://wiki.apache.org/lucene-java/OceanRealtimeSearch linked
> > from http://wiki.apache.org/lucene-java/LuceneResources.
> >
> > Perhaps I should add some background on the wiki; I can add
> > a little bit here.  I was an early Solr developer/user at a
> > social networking company when Google's GData came out.  It
> > looked similar to Solr, so I took a look at it.  The one thing
> > it had over Solr was realtime updates, that is, the ability to
> > add, delete, or update a document and see the change in
> > search results immediately.  With Solr, the company had
> > decided to update the index every 10 minutes with delta
> > updates from an Oracle database.  I wanted to see whether it
> > was possible to build an approximation of what GData does
> > with Lucene.  The result is Ocean.
> >
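The realtime-update idea in the paragraph above can be illustrated with a minimal sketch. It is hypothetical: RealtimeIndex and its methods are names invented for this example, it is not Ocean's or Solr's code, and it scans document text instead of using an inverted index to stay short. New documents go into an in-memory tier that is searched together with the already-flushed tier, so a change is visible as soon as add() returns rather than after the next scheduled reindex.

```java
import java.util.*;

/**
 * Hypothetical sketch (not Ocean's or Solr's code): writes land in an
 * in-memory tier that is searched alongside the flushed tier, so updates
 * and deletes are visible immediately.
 */
class RealtimeIndex {
    private final Map<String, String> flushed = new HashMap<>();  // stand-in for on-disk segments
    private final Map<String, String> inMemory = new HashMap<>(); // newest writes, not yet flushed
    private final Set<String> deleted = new HashSet<>();           // deletes not yet applied to flushed data

    void add(String id, String text) {
        deleted.remove(id);     // re-adding a deleted id makes it live again
        inMemory.put(id, text); // visible to the very next search()
    }

    void delete(String id) {
        inMemory.remove(id);
        deleted.add(id);        // hides any flushed copy until the next flush()
    }

    /** Ids of live documents containing the word, checking both tiers. */
    Set<String> search(String word) {
        Set<String> hits = new TreeSet<>();
        flushed.forEach((id, text) -> {
            // Skip flushed copies that were deleted or superseded by a newer in-memory version.
            if (!deleted.contains(id) && !inMemory.containsKey(id) && containsWord(text, word)) {
                hits.add(id);
            }
        });
        inMemory.forEach((id, text) -> {
            if (containsWord(text, word)) hits.add(id);
        });
        return hits;
    }

    /** Periodic background step: fold pending deletes and new writes into the flushed tier. */
    void flush() {
        deleted.forEach(flushed::remove);
        deleted.clear();
        flushed.putAll(inMemory);
        inMemory.clear();
    }

    private static boolean containsWord(String text, String word) {
        return Arrays.asList(text.toLowerCase().split("\\W+")).contains(word.toLowerCase());
    }
}
```

In this sketch a periodic flush() plays the role the 10-minute delta import played in the Solr setup, but it no longer decides when a change becomes searchable.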
> > The use case it was designed for is websites with dynamic
> > data, such as social networking sites, photo sites,
> > discussion boards, blogs, and wikis.  More broadly, Ocean
> > can be used with any application that needs the
> > database-like feature of immediate updates.  The broadest
> > example is Google's web applications: outside of web search,
> > they all use a GData interface, meaning the primary
> > datastore is not MySQL or some equivalent but a proprietary
> > search-based database.  The best-known of these is Gmail.
> > If I receive an email through Gmail, I can search for it
> > immediately; there is no 10-minute delay.  Gmail also lets me
> > change labels, a common example being marking unread emails
> > as read in bulk.  Presumably Gmail is not reindexing the
> > entire email for each label change.
> >
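The label point can be sketched as follows, assuming (hypothetically; this is not a description of how Gmail or Ocean store labels) that labels live in their own postings map, separate from the postings built from the message body. The body is tokenized once in addMessage(), and a bulk "mark as read" touches only the label map. LabeledMailIndex and its methods are names invented for this example.

```java
import java.util.*;

/**
 * Hypothetical sketch: message bodies are indexed once, while mutable labels
 * live in a separate label -> docIds map, so a label change never re-tokenizes
 * or re-indexes the body.
 */
class LabeledMailIndex {
    // Inverted index over bodies: term -> ids of messages containing it.
    private final Map<String, Set<Integer>> bodyPostings = new HashMap<>();
    // Separate, cheaply mutable postings: label -> ids of messages carrying it.
    private final Map<String, Set<Integer>> labelPostings = new HashMap<>();

    /** Indexes a message body once; label changes never call this again. */
    void addMessage(int id, String body) {
        for (String term : body.toLowerCase().split("\\W+")) {
            if (!term.isEmpty()) {
                bodyPostings.computeIfAbsent(term, t -> new TreeSet<>()).add(id);
            }
        }
        addLabel(id, "unread"); // new mail starts out unread
    }

    void addLabel(int id, String label) {
        labelPostings.computeIfAbsent(label, l -> new TreeSet<>()).add(id);
    }

    void removeLabel(int id, String label) {
        Set<Integer> ids = labelPostings.get(label);
        if (ids != null) ids.remove(id);
    }

    /** Bulk "mark as read": only the label postings change. */
    void markRead(Collection<Integer> ids) {
        for (int id : ids) {
            removeLabel(id, "unread");
            addLabel(id, "read");
        }
    }

    /** Messages whose body contains the term AND that carry the label. */
    Set<Integer> search(String term, String label) {
        Set<Integer> result = new TreeSet<>(
                bodyPostings.getOrDefault(term.toLowerCase(), Collections.emptySet()));
        result.retainAll(labelPostings.getOrDefault(label, Collections.emptySet()));
        return result;
    }
}
```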
> > Most highly trafficked web applications do not use
> > relational facilities like joins because they are too
> > expensive.  Lucene does not offer joins, so this is fine.  The
> > only area where Lucene is currently weak is range queries.
> > MySQL uses a B-tree index, whereas Lucene uses the
> > time-consuming TermEnum and TermDocs combination.  Tag Index
> > addresses this area.
> >
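For the range-query comparison, here is a minimal sketch of the sorted-structure side. It rests on stated assumptions: java.util.TreeMap is an in-memory red-black tree standing in for an on-disk B-tree, SortedFieldIndex is a hypothetical name, and this is not how Tag Index is implemented. Because values are kept in numeric order, a range query jumps to the lower bound with subMap() and walks only the values inside [lo, hi].

```java
import java.util.*;

/**
 * Hypothetical sketch of a numeric range query over one field (e.g. "price").
 * TreeMap keeps values in numeric order, standing in here for a B-tree index.
 */
class SortedFieldIndex {
    // value -> ids of documents holding that value
    private final TreeMap<Long, Set<Integer>> byValue = new TreeMap<>();

    void add(int docId, long value) {
        byValue.computeIfAbsent(value, v -> new TreeSet<>()).add(docId);
    }

    /** Ids of documents whose value lies in [lo, hi], both bounds inclusive. */
    Set<Integer> range(long lo, long hi) {
        Set<Integer> hits = new TreeSet<>();
        if (lo > hi) return hits; // subMap throws IllegalArgumentException on an inverted range
        for (Set<Integer> ids : byValue.subMap(lo, true, hi, true).values()) {
            hits.addAll(ids);
        }
        return hits;
    }
}
```

A real B-tree stores many keys per node so that a range lookup needs few disk reads; the in-memory TreeMap only illustrates the ordered-lookup idea.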
> > The way Ocean is designed, there should be no limitations to
> > using it compared to using Lucene's IndexWriter; it offers the
> > same functionality.  If one does not want to use Ocean's
> > transaction log, for example because one simply wants to
> > index 1 million documents at once, Ocean offers what is
> > called a LargeBatch.  It is a way to perform a large number
> > of updates taking advantage of the new IndexWriter speedup,
> > combined with transactional semantics.
> >
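The difference between logged updates and a LargeBatch can be sketched like this. Everything here is hypothetical and is not Ocean's actual API: TxnIndex is an invented name, the inner class only borrows Ocean's term LargeBatch, the log is an in-memory list standing in for an on-disk transaction log, and no Lucene IndexWriter is involved. Small updates write one log record each and are applied immediately; a batch buffers its documents, writes a single log record, and applies them together on commit(), or discards them on rollback().

```java
import java.util.*;

/**
 * Hypothetical sketch, not Ocean's API: per-update logging for small realtime
 * writes versus one buffered, all-or-nothing batch for bulk loads.
 */
class TxnIndex {
    private final Map<String, String> docs = new HashMap<>();
    final List<String> log = new ArrayList<>(); // stand-in for an on-disk transaction log

    /** Realtime path: one log record per update, applied immediately. */
    void update(String id, String text) {
        log.add("UPDATE " + id);
        docs.put(id, text);
    }

    String get(String id) { return docs.get(id); }

    int size() { return docs.size(); }

    LargeBatch largeBatch() { return new LargeBatch(); }

    /** Buffers many updates; nothing is visible until commit(). */
    class LargeBatch {
        private final Map<String, String> pending = new LinkedHashMap<>();

        void add(String id, String text) { pending.put(id, text); }

        /** Writes a single log record for the whole batch, then applies it. */
        void commit() {
            log.add("BATCH " + pending.size());
            docs.putAll(pending);
            pending.clear();
        }

        /** Discards the batch; the index is left exactly as it was. */
        void rollback() { pending.clear(); }
    }
}
```

The sketch is single-threaded; a real implementation would also have to keep concurrent searchers from seeing a half-applied batch.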
> > Karl, does this answer your question or are there areas that
> > could use more explanation?
> >
> >
> > On Fri, Jul 11, 2008 at 6:20 AM, Karl Wettin
> > <[EMAIL PROTECTED]> wrote:
> >
> >       On 10 Jul 2008, at 22:08, Jason Rutherglen wrote:
> >
> >               Is there a good place to put Ocean
> > https://issues.apache.org/jira/browse/LUCENE-1313
> > documentation?  Is there a place on the wiki that is good?
> >
> >       Hi Jason,
> >
> >       the wiki is just fine.
> >
> >       I've been reading the docs and looked at your patch.
> > There is a lot of text about how it does what it does, but it
> > says nothing about the intended use. I honestly
> > don't even know what you mean by "real time search". You will
> > probably get more attention if the documentation starts out
> > with some use cases or thoughts on when and why it might make
> > sense to use your code.
> >
> >
> >             karl
> >
> >
> > ---------------------------------------------------------------------
> >       To unsubscribe, e-mail: [EMAIL PROTECTED]
> >       For additional commands, e-mail: [EMAIL PROTECTED]
> >
> >
>
