Re: Realtime Search for Social Networks Collaboration

Otis Gospodnetic Sat, 06 Sep 2008 01:36:54 -0700

Regarding real-time search and Solr, my feeling is the focus should be on first 
adding real-time search to Lucene, and then we'll figure out how to incorporate 
that into Solr later.

I've read Jason's Wiki as well.  Actually, I had to read it a number of times 
to understand bits and pieces of it.  I have to admit there is still some 
fuzziness about the whole thing in my head - is "Ocean" something that already 
works, a separate project on googlecode.com?  I think so.  If so, and if you 
are working on getting it integrated into Lucene, would it be less confusing 
to just refer to it as "real-time search"?

If this is to be initially integrated into Lucene, why are things like 
replication, crowding/field collapsing, locallucene, name service, tag index, 
etc. all mentioned there on the Wiki and bundled with the description of how 
real-time search works and is to be implemented?  I suppose mentioning 
replication kind-of makes sense because the replication approach is closely 
tied to real-time search - all query nodes need to see index changes fast.  But 
Lucene itself offers no replication mechanism, so maybe the replication is 
something to figure out separately, say on the Solr level, later on "once we 
get there".  I think even just the essential real-time search requires 
substantial changes to Lucene (I remember seeing large patches in JIRA), which 
makes it hard to digest, understand, comment on, and ultimately commit (hence 
the lukewarm response, I think).  Bringing other non-essential elements into 
the discussion at the same time makes it more difficult to process all this 
new stuff, at least for me.  Am I the only one who finds this hard?

That said, it sounds like we have some discussion going (Karl...), so I look 
forward to understanding more! :)

Otis
--
Sematext -- http://sematext.com/ -- Lucene - Solr - Nutch

----- Original Message ----
> From: Yonik Seeley <[EMAIL PROTECTED]>
> To: [email protected]
> Sent: Thursday, September 4, 2008 10:13:32 AM
> Subject: Re: Realtime Search for Social Networks Collaboration
> 
> On Wed, Sep 3, 2008 at 6:50 PM, Jason Rutherglen
> wrote:
> > I also think it's got a
> > lot of things now which makes integration difficult to do properly.
> 
> I agree, and that's why the major bump in version number rather than
> minor - we recognize that some features will need some amount of
> rearchitecture.
> 
> > I think the problem with integration with SOLR is it was designed with
> > a different problem set in mind than Ocean, originally the CNET
> > shopping application.
> 
> That was the first use of Solr, but it actually existed before that
> w/o any defined use other than to be a "plan B" alternative to MySQL
> based search servers (that's actually where some of the parameter
> names come from... the default /select URL instead of /search, the
> "rows" parameter, etc).
> 
> But you're right... some things like the replication strategy were
> designed (well, borrowed from Doug to be exact) with the idea that it
> would be OK to have slightly "stale" views of the data in the range of
> minutes.  It just made things easier/possible at the time.  But tons
> of Solr and Lucene users want almost instantaneous visibility of added
> documents, if they can get it.  It's hardly restricted to social
> network applications.
> 
> Bottom line is that Solr aims to be a general enterprise search
> platform, and getting as real-time as we can get, and as scalable as
> we can get are some of the top priorities going forward.
> 
> -Yonik
> 
> ---------------------------------------------------------------------
> To unsubscribe, e-mail: [EMAIL PROTECTED]
> For additional commands, e-mail: [EMAIL PROTECTED]

