I started a wiki page at http://wiki.apache.org/lucene-java/OceanRealtimeSearch, linked from http://wiki.apache.org/lucene-java/LuceneResources.
Perhaps I should add some background to the wiki. I can add a little here.

I was an early Solr developer and user at a social networking company when Google's GData came out. It looked similar to Solr, so I took a look at it. The one thing it had over Solr was realtime updates: the ability to add, delete, or update a document and see the change in search results immediately. With Solr, the company had decided on a 10-minute interval for updating the index with delta updates from an Oracle database. I wanted to see whether it was possible to use Lucene to approximate what GData does. The result is Ocean.

The use case it was designed for is websites with dynamic data, such as social networking sites, photo sites, discussion boards, blogs, and wikis. More broadly, Ocean can be used with any application that requires the database-like feature of immediate updates. Probably the best example of this is Google's web applications: outside of web search, they all use a GData interface, meaning the primary datastore is not MySQL or an equivalent but a proprietary search-based database. Gmail is the clearest case. If I receive an email through Gmail, I can search on it immediately; there is no 10-minute delay. Gmail also lets me change labels, a common example being marking unread emails as read in bulk. Presumably Gmail is not reindexing the entire email for each label change.

Most highly trafficked web applications do not use relational facilities like joins because they are too expensive. Lucene does not offer joins, so this is fine. The only area where Lucene is currently weak is range queries. MySQL uses a B-tree index, whereas Lucene uses the time-consuming TermEnum and TermDocs combination. This is an area Tag Index addresses. The way Ocean is designed, there should be no limitations compared to using the Lucene IndexWriter; it offers the same functionality.
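To make the range-query point concrete, here is a minimal, self-contained sketch of the TermEnum/TermDocs pattern. It is loosely modeled on how a Lucene 2.x RangeFilter walks the sorted term dictionary from the lower term and then iterates each term's postings into a BitSet. The class and method names (TermRangeSketch, addPosting, rangeBits) are hypothetical and are not Lucene, Ocean, or Tag Index APIs. The point it illustrates is that the work grows with the number of distinct terms in the range plus all of their postings, and that this work is repeated on every query.

```java
import java.util.BitSet;
import java.util.Map;
import java.util.NavigableMap;
import java.util.TreeMap;

/**
 * Hypothetical, simplified model of a Lucene-style range filter (not Lucene's API).
 * The sorted term dictionary stands in for TermEnum; each term's posting list
 * stands in for TermDocs.
 */
public class TermRangeSketch {

    // Sorted term dictionary: term -> doc ids containing that term.
    private final NavigableMap<String, int[]> termDictionary = new TreeMap<>();
    private final int maxDoc;

    public TermRangeSketch(int maxDoc) {
        this.maxDoc = maxDoc;
    }

    public void addPosting(String term, int... docIds) {
        termDictionary.put(term, docIds);
    }

    /**
     * Collects every document whose term lies in [lower, upper].
     * Per query: one step per distinct term in the range (the "TermEnum" walk)
     * plus one step per posting of each such term (the "TermDocs" walk).
     * Nothing is cached between queries.
     */
    public BitSet rangeBits(String lower, String upper) {
        BitSet bits = new BitSet(maxDoc);
        for (Map.Entry<String, int[]> entry :
                termDictionary.subMap(lower, true, upper, true).entrySet()) {
            for (int doc : entry.getValue()) {
                bits.set(doc);
            }
        }
        return bits;
    }

    public static void main(String[] args) {
        // Dates indexed as zero-padded strings so lexicographic order matches date order.
        TermRangeSketch index = new TermRangeSketch(10);
        index.addPosting("20080701", 0, 3);
        index.addPosting("20080705", 1);
        index.addPosting("20080710", 2, 3, 7);
        index.addPosting("20080720", 5);
        BitSet hits = index.rangeBits("20080701", "20080710");
        System.out.println(hits); // prints {0, 1, 2, 3, 7}
    }
}
```

With a high-cardinality field such as a timestamp, nearly every document contributes its own term, so a wide range forces a walk over a large slice of the term dictionary for each query. That per-query cost is the weakness described above.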
If one does not want to use the transaction log Ocean offers, for example because one simply wants to index 1 million documents at once, Ocean offers what is called a LargeBatch. It is a way to perform a large number of updates that takes advantage of the new IndexWriter speedup, combined with transactional semantics.

Karl, does this answer your question, or are there areas that could use more explanation?

On Fri, Jul 11, 2008 at 6:20 AM, Karl Wettin <[EMAIL PROTECTED]> wrote:
>
> On 10 Jul 2008, at 22:08, Jason Rutherglen wrote:
>
>> Is there a good place to put Ocean
>> https://issues.apache.org/jira/browse/LUCENE-1313 documentation? Is
>> there a place on the wiki that is good?
>>
>
> Hi Janson,
>
> the wiki is just fine.
>
> I've been reading the docs and looked at your patch. There is a lot of text
> about how it does what it does, but it says nothing anything about the
> intended use. I honestly don't even know what you mean by "real time
> search". You will probably get more attention if the documentation starts
> out with some use cases or thoughts on when and why it might make sense to
> use your code.
>
>
> karl
