Re: Ocean Documentation

Jason Rutherglen Fri, 11 Jul 2008 05:29:47 -0700

I started a wiki page at
http://wiki.apache.org/lucene-java/OceanRealtimeSearch linked from
http://wiki.apache.org/lucene-java/LuceneResources.

Perhaps I should add some background on the wiki.  I can add a little bit
here.  I was an early Solr developer/user at a social networking company
when Google's GData came out.  It looked similar to Solr so I took a look at
it.  The one thing it had over Solr was realtime updates: the ability to
add, delete, or update a document and see the change in search results
immediately.  With Solr, the company had settled on updating the index every
10 minutes with delta updates from an Oracle database.
I wanted to see if it was possible with Lucene to create an approximation of
what GData does.  The result is Ocean.

The use case it was designed for is websites with dynamic data, such as
social networking sites, photo sites, discussion boards, blogs, and wikis.
More broadly, Ocean can be used with any application that requires the
database-like feature of immediate updates.  Probably the best example is
Google's web applications: all of them, outside of web search, use a GData
interface.  That means the primary datastore is not MySQL or some
equivalent; it is a proprietary search-based database.  Gmail illustrates
this well.  If I receive an email through Gmail I can search on it
immediately; there is no 10 minute delay.  Also in Gmail I can change
labels, a common example being marking unread emails as read in bulk.
Presumably Gmail is not reindexing the entire email for each label change.

Most highly trafficked web applications do not use relational facilities
like joins because they are too expensive.  Lucene does not offer joins, so
nothing is lost there.  The only area where Lucene is currently weak is range
queries.  MySQL uses a B-tree index, whereas Lucene uses the time-consuming
combination of TermEnum and TermDocs.  This is an area Tag Index addresses.
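
To illustrate why the TermEnum and TermDocs approach gets expensive, here is
a minimal sketch in plain Java.  It does not use Lucene's classes: the
TreeMap is only a stand-in for the sorted term dictionary that TermEnum
walks, and each list of document ids stands in for what TermDocs returns for
one term.  The class and method names are made up for illustration.  The
point is that the work grows with the number of distinct terms inside the
range, because the postings of every one of those terms are visited.

```java
import java.util.ArrayList;
import java.util.BitSet;
import java.util.List;
import java.util.TreeMap;

// Hypothetical model of a term-enumeration range query; not Lucene code.
public class RangeQuerySketch {
    // Sorted terms mapped to the ids of the documents containing them.
    private final TreeMap<String, List<Integer>> postings = new TreeMap<>();

    // Record that the given document contains the given term.
    public void add(int docId, String term) {
        postings.computeIfAbsent(term, t -> new ArrayList<>()).add(docId);
    }

    // Visit every distinct term in [lower, upper] and union its postings.
    public BitSet range(String lower, String upper) {
        BitSet hits = new BitSet();
        for (List<Integer> docs : postings.subMap(lower, true, upper, true).values()) {
            for (int doc : docs) {
                hits.set(doc);
            }
        }
        return hits;
    }

    public static void main(String[] args) {
        RangeQuerySketch index = new RangeQuerySketch();
        index.add(0, "20080709");
        index.add(1, "20080710");
        index.add(2, "20080711");
        index.add(3, "20080711");
        System.out.println(index.range("20080710", "20080711")); // prints {1, 2, 3}
    }
}
```

With dates stored as yyyyMMdd strings, a one-year range touches up to 365
terms; with millisecond timestamps the same range could touch one term per
document.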

As designed, Ocean should impose no limitations compared to using Lucene's
IndexWriter directly.  It offers the same functionality.  If one does not
want to use Ocean's transaction log because one simply wants to index
1 million documents at once, Ocean offers what is called a LargeBatch.  It
is a way to perform a large number of updates that takes advantage of the
new IndexWriter speedup while keeping transactional semantics.

Karl, does this answer your question or are there areas that could use more
explanation?

On Fri, Jul 11, 2008 at 6:20 AM, Karl Wettin <[EMAIL PROTECTED]> wrote:

>
> 10 jul 2008 kl. 22.08 skrev Jason Rutherglen:
>
>  Is there a good place to put Ocean
>> https://issues.apache.org/jira/browse/LUCENE-1313 documentation?  Is
>> there a place on the wiki that is good?
>>
>
> Hi Jason,
>
> the wiki is just fine.
>
> I've been reading the docs and looked at your patch. There is a lot of text
> about how it does what it does, but it says nothing about the
> intended use. I honestly don't even know what you mean by "real time
> search". You will probably get more attention if the documentation starts
> out with some use cases or thoughts on when and why it might make sense to
> use your code.
>
>
>       karl
>
