RE: How real-time are Solr/Lucene queries?

2010-05-26 Thread Nagelberg, Kallin
Searching is very fast with Solr, but nowhere near as fast as keying into a map. 
There may be disk I/O if your document isn't cached. Your situation sounds 
unique enough that I think you're going to need to prototype to see if it meets 
your demands. Figure out how 'fast' is 'fast' for your application, and then see 
if you can hit your targets. Once you have some real numbers and queries you'll 
be able to get more meaningful feedback from the community, I imagine.
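
A rough way to get those numbers is to time both paths side by side in the actual 
client. A minimal sketch, assuming an embedded SolrJ setup (the Solr home path, 
core name, and field/key names below are made-up placeholders, and the loop count 
is arbitrary; JIT warm-up and dead-code elimination are ignored for brevity):

import java.util.HashMap;
import java.util.Map;

import org.apache.solr.client.solrj.SolrQuery;
import org.apache.solr.client.solrj.SolrServer;
import org.apache.solr.client.solrj.embedded.EmbeddedSolrServer;
import org.apache.solr.core.CoreContainer;

public class LookupBenchmark {
    public static void main(String[] args) throws Exception {
        // Embedded Solr core (solr.solr.home must point at a configured Solr home).
        System.setProperty("solr.solr.home", "/path/to/solr/home");
        CoreContainer container = new CoreContainer.Initializer().initialize();
        SolrServer solr = new EmbeddedSolrServer(container, "layout");

        // In-memory baseline holding the same key/value pair.
        Map<String, String> map = new HashMap<String, String>();
        map.put("staff.lineSpacing", "8.0");

        SolrQuery q = new SolrQuery("key:staff.lineSpacing"); // invented field/key
        q.setRows(1);

        int iterations = 10000;

        long t0 = System.nanoTime();
        for (int i = 0; i < iterations; i++) {
            map.get("staff.lineSpacing");
        }
        long mapNanos = (System.nanoTime() - t0) / iterations;

        t0 = System.nanoTime();
        for (int i = 0; i < iterations; i++) {
            solr.query(q);
        }
        long solrNanos = (System.nanoTime() - t0) / iterations;

        System.out.println("map lookup: " + mapNanos + " ns avg");
        System.out.println("solr query: " + solrNanos + " ns avg");
        container.shutdown();
    }
}

Whatever the absolute numbers turn out to be, the gap between those two lines is 
the real answer to "how much slower than a map is it for my data".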

-Kallin Nagelberg

-Original Message-
From: Thomas J. Buhr [mailto:t...@superstringmedia.com] 
Sent: Wednesday, May 26, 2010 11:30 AM
To: solr-user@lucene.apache.org
Subject: Re: How real-time are Solr/Lucene queries?

What about my situation? 

My renderers need to query the index for fast access to layout and style info, 
as I already described about three messages ago on this thread. Another scenario 
is having automatic queries triggered as my MIDI player iterates through the 
model. As the player encounters trigger tags it needs to make a query quickly, 
so that the next notes played will have the context they are meant to have.

Basically, I need to know that issuing searches to a local index will not be 
slower than searching a HashMap or array. How different or similar will the 
performance be?

Thom


On 2010-05-26, at 9:41 AM, Walter Underwood wrote:

> On May 25, 2010, at 11:24 PM, Amit Nithian wrote:
> 
>> 2) What are typical/accepted definitions of "Real Time" vs "Near Real Time"?
> 
> Real time means that an update is available in the next query after it 
> commits. Near real time means that the delay is small, but not zero.
> 
> This is within a single server. In a cluster, there will be some 
> communication delay. 
> 
>> 3) I could understand POSTing a document to a server and then turning around
>> and searching for it on the same server but what about a replicated
>> environment and how do you prevent caches from being blown and constantly
>> re-warmed (hence performance degradation)?
> 
> You need a different caching design, with transaction-aware caches that are 
> at a lower level, closer to the indexes.
> 
> wunder
> --
> Walter Underwood
> Lead Engineer
> MarkLogic
> 
> 
> 



Re: How real-time are Solr/Lucene queries?

2010-05-26 Thread Sixten Otto
On Wed, May 26, 2010 at 11:30 AM, Thomas J. Buhr wrote:
> Basically, I need to know that issuing searches to a local index will not be 
> slower than searching a hashmap or array. How different or similar will the 
> performance be?

If you don't mind my asking... I'm still trying to understand why your
application isn't using something like a hashtable, as opposed to
Lucene. You've said that you have many very tiny pieces of data that
you're storing and looking up, and that you're not analyzing them very
much with Lucene.

You've said that you're looking up these values with Lucene queries,
but haven't said much about the kinds of queries you're using. Your
descriptions read to me like you know what specific things you're
finding, which makes me wonder why a Dictionary (in the abstract
sense) wouldn't work for what you're doing. What role is the search
engine playing that a simpler (and almost certainly faster and less
complicated) data store couldn't?
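
For the kind of exact, known-key lookups you've described, the Dictionary route is 
about as simple as it gets. A hypothetical sketch (the class and key names are 
invented, not taken from your application):

import java.util.HashMap;
import java.util.Map;

// Plain in-memory lookup for layout/style values: O(1), no I/O, no query parsing.
public class StyleDictionary {
    private final Map<String, String> values = new HashMap<String, String>();

    public void put(String key, String value) {
        values.put(key, value);
    }

    public String get(String key) {
        return values.get(key);
    }
}

// e.g. styles.put("noteHead.filled.width", "1.18");
//      String width = styles.get("noteHead.filled.width");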

Perhaps elaborating on that might help folks on this list to better
address your questions about whether Solr/Lucene can meet your
requirements?

Sixten


Re: How real-time are Solr/Lucene queries?

2010-05-26 Thread Thomas J. Buhr
What about my situation? 

My renderers need to query the index for fast access to layout and style info, 
as I already described about three messages ago on this thread. Another scenario 
is having automatic queries triggered as my MIDI player iterates through the 
model. As the player encounters trigger tags it needs to make a query quickly, 
so that the next notes played will have the context they are meant to have.

Basically, I need to know that issuing searches to a local index will not be 
slower than searching a HashMap or array. How different or similar will the 
performance be?

Thom


On 2010-05-26, at 9:41 AM, Walter Underwood wrote:

> On May 25, 2010, at 11:24 PM, Amit Nithian wrote:
> 
>> 2) What are typical/accepted definitions of "Real Time" vs "Near Real Time"?
> 
> Real time means that an update is available in the next query after it 
> commits. Near real time means that the delay is small, but not zero.
> 
> This is within a single server. In a cluster, there will be some 
> communication delay. 
> 
>> 3) I could understand POSTing a document to a server and then turning around
>> and searching for it on the same server but what about a replicated
>> environment and how do you prevent caches from being blown and constantly
>> re-warmed (hence performance degradation)?
> 
> You need a different caching design, with transaction-aware caches that are 
> at a lower level, closer to the indexes.
> 
> wunder
> --
> Walter Underwood
> Lead Engineer
> MarkLogic
> 
> 
> 



Re: How real-time are Solr/Lucene queries?

2010-05-26 Thread Walter Underwood
On May 25, 2010, at 11:24 PM, Amit Nithian wrote:

> 2) What are typical/accepted definitions of "Real Time" vs "Near Real Time"?

Real time means that an update is available in the next query after it commits. 
Near real time means that the delay is small, but not zero.

This is within a single server. In a cluster, there will be some communication 
delay. 

> 3) I could understand POSTing a document to a server and then turning around
> and searching for it on the same server but what about a replicated
> environment and how do you prevent caches from being blown and constantly
> re-warmed (hence performance degradation)?

You need a different caching design, with transaction-aware caches that are at 
a lower level, closer to the indexes.
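
To make the single-server case concrete, here is a minimal SolrJ sketch of that 
sequence (the URL and field names are placeholders). With the default commit 
options, the document added below should show up in the very next query after 
commit() returns:

import org.apache.solr.client.solrj.SolrQuery;
import org.apache.solr.client.solrj.SolrServer;
import org.apache.solr.client.solrj.impl.CommonsHttpSolrServer;
import org.apache.solr.common.SolrInputDocument;

public class CommitVisibility {
    public static void main(String[] args) throws Exception {
        SolrServer solr = new CommonsHttpSolrServer("http://localhost:8983/solr");

        SolrInputDocument doc = new SolrInputDocument();
        doc.addField("id", "style-42");
        doc.addField("name", "noteHead.filled");
        solr.add(doc);   // not yet searchable

        solr.commit();   // by default waits for the new searcher to open

        SolrQuery q = new SolrQuery("id:style-42");
        System.out.println("found: " + solr.query(q).getResults().getNumFound());
    }
}

Roughly speaking, the "near real time" work is about shrinking the cost of that 
commit/searcher-reopen step, not about changing this basic sequence.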

wunder
--
Walter Underwood
Lead Engineer
MarkLogic




Re: How real-time are Solr/Lucene queries?

2010-05-25 Thread Amit Nithian
This is an interesting discussion and I have a few questions:
1) My apologies, but I haven't been following the NRT patch beyond what was
presented at a meetup some months back and the wiki; what is the status
of it in Solr?
2) What are typical/accepted definitions of "Real Time" vs "Near Real Time"?
==> Related to Grant's points earlier.
3) I could understand POSTing a document to a server and then turning around
and searching for it on the same server, but what about a replicated
environment, and how do you prevent caches from being blown and constantly
re-warmed (hence performance degradation)? I could set Solr replication to once
per minute or less, but then the caches would be regenerated each
minute, which I suspect is not cheap.

Note: I am curious about this from a typical web-based application
perspective, as opposed to an embedded/desktop application like the context
described earlier in this thread.

Thanks!
Amit

On Tue, May 25, 2010 at 10:28 AM, Thomas J. Buhr wrote:

> My documents are all quite small if not down right tiny, there is not much
> analysis to do. I plan to mainly use Solr for indexing application
> configuration data which there is a lot of and I have all pre-formated.
> Since it is a music application there are many score templates, scale and
> rhythm strings, notation symbol skins, etc. Then there are slightly more
> usual things to index like application help pages and tutorials.
>
> In terms of queries per second there will be a lot being fired by our
> painter. In our application data is flowing into a painter who in turn
> delegates specific painting tasks to renderer objects. These renderer
> objects then make many queries extremely fast to the embedded Solr indexes
> for data they need, such as layout and style values.
>
> Believe me there is a lot of detailed data involved in music notation and
> abstracting it into configurations in the form of index documents is a good
> way to manage such data. Further, the data in the form of documents work as
> a form of plugins so that alternate configurations for different notation
> types can be added to the index. Then via simple search it is possible to
> dialup a certain set of documents which contain all the details of a given
> notation. Mean while the renderer objects remain generic and are just
> reconfigured with the different indexed configuration documents.
>
> Will making many fast queries from renderers to an embedded local Solr
> index slow my painting down?
>
> Thom
>
>
> On 2010-05-25, at 6:09 AM, Grant Ingersoll wrote:
>
> > How many docs are in the batch you are pulling down?  How many
> docs/second do you expect on the index size?  How big are the docs?  What do
> you expect in terms of queries per second?  How fast do new documents need
> to be available on the local server?  How much analysis do you have to do?
>  Also, define Real Time.  You'd be surprised at the number of people I talk
> to who think they need Real Time, but then when you ask them questions like
> I just did, they don't really need it.  I've seen Solr turn around new docs
> in as little as 30 seconds on commodity hardware w/o any special engineering
> effort and I've seen it faster than that with some engineering effort.  That
> isn't necessarily possible for every application, but...
> >
> > Despite the other suggestions, what you describe still looks feasible to
> me in Solr, pending the questions above (and some followups).
> >
> >
> > On May 21, 2010, at 4:05 AM, Thomas J. Buhr wrote:
> >
> >> Thanks for the new information. Its really great to see so many options
> for Lucene.
> >>
> >> In my scenario there are the following pieces:
> >>
> >> 1 - A local Java client with an embedded Solr instance and its own local
> index/s.
> >> 2 - A remote server running Solr with index/s that are more like a
> repository that local clients query for extra goodies.
> >> 3 - The client is also a JXTA node so it can share indexes or documents
> too.
> >> 4 - There is no browser involved what so ever.
> >>
> >> My music composing application is a local client that uses
> configurations which would become many different document types. A subset of
> these configurations will be bundled with the application and then many more
> would be made available via a server/s running Solr.
> >>
> >> I would not expect the queries which would be made from within the local
> client to be returned in real-time. I would only expect such queries to be
> made in reasonable time and returned to the client. The client would have
> its local Lucene index system (embedded Solr using SolrJ) which would be
> updated with the results of the query made to the Solr instance running on
> the remote server.
> >>
> >> Then the user on the client would issue queries to the local Lucene
> index/s to obtain results which are used to setup contexts for different
> aspects of the client. For example: an activated context for musical scales
> and rhythms used for creating musical notes, an 

Re: How real-time are Solr/Lucene queries?

2010-05-25 Thread Thomas J. Buhr
My documents are all quite small, if not downright tiny; there is not much 
analysis to do. I plan to mainly use Solr for indexing application configuration 
data, of which there is a lot, all of it pre-formatted. Since it is a music 
application there are many score templates, scale and rhythm strings, notation 
symbol skins, etc. Then there are slightly more usual things to index, like 
application help pages and tutorials.

In terms of queries per second, a lot will be fired by our painter. In our 
application, data flows into a painter, which in turn delegates specific 
painting tasks to renderer objects. These renderer objects then make many 
queries, extremely fast, to the embedded Solr indexes for data they need, such 
as layout and style values. 

Believe me, there is a lot of detailed data involved in music notation, and 
abstracting it into configurations in the form of index documents is a good way 
to manage such data. Further, the data in the form of documents works as a kind 
of plugin, so that alternate configurations for different notation types can be 
added to the index. Then, via a simple search, it is possible to dial up a 
certain set of documents which contain all the details of a given notation. 
Meanwhile the renderer objects remain generic and are just reconfigured with the 
different indexed configuration documents.

Will making many fast queries from renderers to an embedded local Solr index 
slow my painting down?
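
If it does, one mitigation (in line with the context-object idea discussed 
elsewhere in this thread) is to run the queries ahead of time and let the 
renderers read from a plain in-memory map during painting. A hypothetical sketch, 
with invented class, field, and query names:

import java.util.HashMap;
import java.util.Map;

import org.apache.solr.client.solrj.SolrQuery;
import org.apache.solr.client.solrj.SolrServer;
import org.apache.solr.common.SolrDocument;

// Loads the layout/style documents for one notation type up front,
// then serves renderers from a map inside the paint loop.
public class RenderContext {
    private final Map<String, SolrDocument> byKey = new HashMap<String, SolrDocument>();

    public RenderContext(SolrServer solr, String notationType) throws Exception {
        SolrQuery q = new SolrQuery("notationType:" + notationType); // invented field
        q.setRows(10000); // adjust to the expected number of configuration documents
        for (SolrDocument doc : solr.query(q).getResults()) {
            byKey.put((String) doc.getFieldValue("key"), doc);       // invented field
        }
    }

    // Called by renderers while painting: a map get, no Solr query.
    public Object value(String key, String field) {
        SolrDocument doc = byKey.get(key);
        return doc == null ? null : doc.getFieldValue(field);
    }
}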

Thom


On 2010-05-25, at 6:09 AM, Grant Ingersoll wrote:

> How many docs are in the batch you are pulling down?  How many docs/second do 
> you expect on the index size?  How big are the docs?  What do you expect in 
> terms of queries per second?  How fast do new documents need to be available 
> on the local server?  How much analysis do you have to do?  Also, define Real 
> Time.  You'd be surprised at the number of people I talk to who think they 
> need Real Time, but then when you ask them questions like I just did, they 
> don't really need it.  I've seen Solr turn around new docs in as little as 30 
> seconds on commodity hardware w/o any special engineering effort and I've 
> seen it faster than that with some engineering effort.  That isn't 
> necessarily possible for every application, but...
> 
> Despite the other suggestions, what you describe still looks feasible to me 
> in Solr, pending the questions above (and some followups).
> 
> 
> On May 21, 2010, at 4:05 AM, Thomas J. Buhr wrote:
> 
>> Thanks for the new information. Its really great to see so many options for 
>> Lucene.
>> 
>> In my scenario there are the following pieces:
>> 
>> 1 - A local Java client with an embedded Solr instance and its own local 
>> index/s.
>> 2 - A remote server running Solr with index/s that are more like a 
>> repository that local clients query for extra goodies.
>> 3 - The client is also a JXTA node so it can share indexes or documents too.
>> 4 - There is no browser involved what so ever.
>> 
>> My music composing application is a local client that uses configurations 
>> which would become many different document types. A subset of these 
>> configurations will be bundled with the application and then many more would 
>> be made available via a server/s running Solr.
>> 
>> I would not expect the queries which would be made from within the local 
>> client to be returned in real-time. I would only expect such queries to be 
>> made in reasonable time and returned to the client. The client would have 
>> its local Lucene index system (embedded Solr using SolrJ) which would be 
>> updated with the results of the query made to the Solr instance running on 
>> the remote server.
>> 
>> Then the user on the client would issue queries to the local Lucene index/s 
>> to obtain results which are used to setup contexts for different aspects of 
>> the client. For example: an activated context for musical scales and rhythms 
>> used for creating musical notes, an activated context for rendering with 
>> layout and style information for different music symbol renderer types.
>> 
>> I'm not yet sure but it may be best to make queries against the local Lucene 
>> index/s and then convert the results into some context objects, maybe an 
>> array or map (I'd like to learn more about how query results can be returned 
>> as arrays or maps as well). Then the tools and renderers which require the 
>> information in the contexts would do any real-time lookup directly from the 
>> context objects not the local or remote Lucene or Solr index/s. The local 
>> client is also a JXTA node so it can share its own index/s with fellow peers.
>> 
>> This is how I envision this happening with my limited knowledge of 
>> Lucene/Solr at this time. What are your thoughts on the feasibility of such 
>> a scenario?
>> 
>> I'm just reading through the Solr reference PDF now and looking over the 
>> Solr admin application. Looking at the Schema.xml it seems to be field not 
>> document oriented.

Re: How real-time are Solr/Lucene queries?

2010-05-25 Thread Jason Rutherglen
The main issue is if you're using facets, which are currently inefficient for 
the real-time use case because they're created over the entire set of segment 
readers. Field caches in Lucene are per-segment and so don't have this problem.

On Tue, May 25, 2010 at 4:09 AM, Grant Ingersoll  wrote:
> How many docs are in the batch you are pulling down?  How many docs/second do 
> you expect on the index size?  How big are the docs?  What do you expect in 
> terms of queries per second?  How fast do new documents need to be available 
> on the local server?  How much analysis do you have to do?  Also, define Real 
> Time.  You'd be surprised at the number of people I talk to who think they 
> need Real Time, but then when you ask them questions like I just did, they 
> don't really need it.  I've seen Solr turn around new docs in as little as 30 
> seconds on commodity hardware w/o any special engineering effort and I've 
> seen it faster than that with some engineering effort.  That isn't 
> necessarily possible for every application, but...
>
> Despite the other suggestions, what you describe still looks feasible to me 
> in Solr, pending the questions above (and some followups).
>
>
> On May 21, 2010, at 4:05 AM, Thomas J. Buhr wrote:
>
>> Thanks for the new information. Its really great to see so many options for 
>> Lucene.
>>
>> In my scenario there are the following pieces:
>>
>> 1 - A local Java client with an embedded Solr instance and its own local 
>> index/s.
>> 2 - A remote server running Solr with index/s that are more like a 
>> repository that local clients query for extra goodies.
>> 3 - The client is also a JXTA node so it can share indexes or documents too.
>> 4 - There is no browser involved what so ever.
>>
>> My music composing application is a local client that uses configurations 
>> which would become many different document types. A subset of these 
>> configurations will be bundled with the application and then many more would 
>> be made available via a server/s running Solr.
>>
>> I would not expect the queries which would be made from within the local 
>> client to be returned in real-time. I would only expect such queries to be 
>> made in reasonable time and returned to the client. The client would have 
>> its local Lucene index system (embedded Solr using SolrJ) which would be 
>> updated with the results of the query made to the Solr instance running on 
>> the remote server.
>>
>> Then the user on the client would issue queries to the local Lucene index/s 
>> to obtain results which are used to setup contexts for different aspects of 
>> the client. For example: an activated context for musical scales and rhythms 
>> used for creating musical notes, an activated context for rendering with 
>> layout and style information for different music symbol renderer types.
>>
>> I'm not yet sure but it may be best to make queries against the local Lucene 
>> index/s and then convert the results into some context objects, maybe an 
>> array or map (I'd like to learn more about how query results can be returned 
>> as arrays or maps as well). Then the tools and renderers which require the 
>> information in the contexts would do any real-time lookup directly from the 
>> context objects not the local or remote Lucene or Solr index/s. The local 
>> client is also a JXTA node so it can share its own index/s with fellow peers.
>>
>> This is how I envision this happening with my limited knowledge of 
>> Lucene/Solr at this time. What are your thoughts on the feasibility of such 
>> a scenario?
>>
>> I'm just reading through the Solr reference PDF now and looking over the 
>> Solr admin application. Looking at the Schema.xml it seems to be field not 
>> document oriented. From my point of view I think in terms of configuration 
>> types which would be documents. In the schema it seems like only fields are 
>> defined and it does not matter which configuration/document they belong to? 
>> I guess this is fine as long as the indexing takes into account my unique 
>> document types and I can search for them as a whole as well, not only for 
>> specific values across a set of indexed documents.
>>
>> Also, does the schema allow me to index certain documents into specific 
>> indexes or are they all just bunched together? I'd rather have unique 
>> indexes for specific document types. I've just read about multiple cores 
>> running under one Solr instance, is this the only way to support multiple 
>> indexes?
>>
>> I'm thinking of ordering the Lucene in Action v2 book which is due this 
>> month and also the Solr 1.4 book. Before I do I just need to understand a 
>> few things which is why I'm writing such a long message :-)
>>
>> Thom
>>
>>
>> On 2010-05-21, at 2:12 AM, Ben Eliott wrote:
>>
>>> Further to earlier note re Lucandra.  I note that Cassandra, which Lucandra 
>>> backs onto,  is 'eventually consistent',  so given your real-time 
>>> requirements,  you may want to review this in the first i

Re: How real-time are Solr/Lucene queries?

2010-05-25 Thread Grant Ingersoll
How many docs are in the batch you are pulling down?  How many docs/second do 
you expect on the index size?  How big are the docs?  What do you expect in 
terms of queries per second?  How fast do new documents need to be available on 
the local server?  How much analysis do you have to do?  Also, define Real 
Time.  You'd be surprised at the number of people I talk to who think they need 
Real Time, but then when you ask them questions like I just did, they don't 
really need it.  I've seen Solr turn around new docs in as little as 30 seconds 
on commodity hardware w/o any special engineering effort and I've seen it 
faster than that with some engineering effort.  That isn't necessarily possible 
for every application, but...

Despite the other suggestions, what you describe still looks feasible to me in 
Solr, pending the questions above (and some followups).


On May 21, 2010, at 4:05 AM, Thomas J. Buhr wrote:

> Thanks for the new information. Its really great to see so many options for 
> Lucene.
> 
> In my scenario there are the following pieces:
> 
> 1 - A local Java client with an embedded Solr instance and its own local 
> index/s.
> 2 - A remote server running Solr with index/s that are more like a repository 
> that local clients query for extra goodies.
> 3 - The client is also a JXTA node so it can share indexes or documents too.
> 4 - There is no browser involved what so ever.
> 
> My music composing application is a local client that uses configurations 
> which would become many different document types. A subset of these 
> configurations will be bundled with the application and then many more would 
> be made available via a server/s running Solr.
> 
> I would not expect the queries which would be made from within the local 
> client to be returned in real-time. I would only expect such queries to be 
> made in reasonable time and returned to the client. The client would have its 
> local Lucene index system (embedded Solr using SolrJ) which would be updated 
> with the results of the query made to the Solr instance running on the remote 
> server.
> 
> Then the user on the client would issue queries to the local Lucene index/s 
> to obtain results which are used to setup contexts for different aspects of 
> the client. For example: an activated context for musical scales and rhythms 
> used for creating musical notes, an activated context for rendering with 
> layout and style information for different music symbol renderer types.
> 
> I'm not yet sure but it may be best to make queries against the local Lucene 
> index/s and then convert the results into some context objects, maybe an 
> array or map (I'd like to learn more about how query results can be returned 
> as arrays or maps as well). Then the tools and renderers which require the 
> information in the contexts would do any real-time lookup directly from the 
> context objects not the local or remote Lucene or Solr index/s. The local 
> client is also a JXTA node so it can share its own index/s with fellow peers.
> 
> This is how I envision this happening with my limited knowledge of 
> Lucene/Solr at this time. What are your thoughts on the feasibility of such a 
> scenario?
> 
> I'm just reading through the Solr reference PDF now and looking over the Solr 
> admin application. Looking at the Schema.xml it seems to be field not 
> document oriented. From my point of view I think in terms of configuration 
> types which would be documents. In the schema it seems like only fields are 
> defined and it does not matter which configuration/document they belong to? I 
> guess this is fine as long as the indexing takes into account my unique 
> document types and I can search for them as a whole as well, not only for 
> specific values across a set of indexed documents. 
> 
> Also, does the schema allow me to index certain documents into specific 
> indexes or are they all just bunched together? I'd rather have unique indexes 
> for specific document types. I've just read about multiple cores running 
> under one Solr instance, is this the only way to support multiple indexes?
> 
> I'm thinking of ordering the Lucene in Action v2 book which is due this month 
> and also the Solr 1.4 book. Before I do I just need to understand a few 
> things which is why I'm writing such a long message :-)
> 
> Thom
> 
> 
> On 2010-05-21, at 2:12 AM, Ben Eliott wrote:
> 
>> Further to earlier note re Lucandra.  I note that Cassandra, which Lucandra 
>> backs onto,  is 'eventually consistent',  so given your real-time 
>> requirements,  you may want to review this in the first instance, if 
>> Lucandra is of interest.
>> 
>> On 21 May 2010, at 06:12, Walter Underwood wrote:
>> 
>>> Solr is a very good engine, but it is not real-time. You can turn off the 
>>> caches and reduce the delays, but it is fundamentally not real-time.
>>> 
>>> I work at MarkLogic, and we have a real-time transactional search engine 
>>> (and repository). If you are curious,

Re: How real-time are Solr/Lucene queries?

2010-05-23 Thread Peter Karich
Hi Thomas,

> A question that remains is this, is it better to use the core Lucene API in my
> local client for the work it does locally with indexes or is it okay to use
> embedded Solr with SolrJ?

That's a very good question. Hopefully experts could answer this for us.
I will use SolrJ instead of Lucene because of [1], and because I think the
explanation in [2] is a bit misleading: it only means the embedded server part
of SolrJ is deprecated, not the whole API, e.g. via CommonsHttpSolrServer.

But I do not know this for sure.
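
For what it's worth, both flavors sit behind the same SolrJ SolrServer interface, 
so switching between an embedded core and an HTTP server is mostly a construction 
detail. A rough sketch assuming a Solr 1.4-style setup (home path, core name, and 
URL are placeholders):

import org.apache.solr.client.solrj.SolrQuery;
import org.apache.solr.client.solrj.SolrServer;
import org.apache.solr.client.solrj.embedded.EmbeddedSolrServer;
import org.apache.solr.client.solrj.impl.CommonsHttpSolrServer;
import org.apache.solr.core.CoreContainer;

public class SolrServers {

    // In-process: no HTTP hop, runs inside the client JVM.
    static SolrServer embedded() throws Exception {
        System.setProperty("solr.solr.home", "/path/to/solr/home");
        CoreContainer container = new CoreContainer.Initializer().initialize();
        return new EmbeddedSolrServer(container, ""); // "" = default core
    }

    // Remote: plain HTTP against a standalone Solr instance.
    static SolrServer remote() throws Exception {
        return new CommonsHttpSolrServer("http://localhost:8983/solr");
    }

    public static void main(String[] args) throws Exception {
        SolrServer solr = embedded(); // or remote()
        System.out.println(solr.query(new SolrQuery("*:*")).getResults().getNumFound());
    }
}

Code written against the SolrServer type should work unchanged against either one.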

Regards,
Peter.

[1]
http://stackoverflow.com/questions/2856427/situations-to-prefer-apache-lucene-over-solr

[2]
http://wiki.apache.org/solr/EmbeddedSolr

> While Solr is optimized for the server aspects I'm not sure if it is the best 
> option for the client side of things?
>
> Thom
>
>
> On 2010-05-23, at 7:36 AM, Peter Karich wrote:
>
>   
>> Hi,
>>
>> just as a side note as I did not read the link in your conversation:
>>
>> http://wiki.apache.org/lucene-java/NearRealtimeSearch (I just stumbled
>> over this as I am interested in this feature too)
>>
>> Regards,
>> Peter.
>>
>> 
>>> Thanks for the new information. Its really great to see so many options for 
>>> Lucene.
>>>
>>> In my scenario there are the following pieces:
>>>
>>> 1 - A local Java client with an embedded Solr instance and its own local 
>>> index/s.
>>> 2 - A remote server running Solr with index/s that are more like a 
>>> repository that local clients query for extra goodies.
>>> 3 - The client is also a JXTA node so it can share indexes or documents too.
>>> 4 - There is no browser involved what so ever.
>>>
>>> My music composing application is a local client that uses configurations 
>>> which would become many different document types. A subset of these 
>>> configurations will be bundled with the application and then many more 
>>> would be made available via a server/s running Solr.
>>>
>>> I would not expect the queries which would be made from within the local 
>>> client to be returned in real-time. I would only expect such queries to be 
>>> made in reasonable time and returned to the client. The client would have 
>>> its local Lucene index system (embedded Solr using SolrJ) which would be 
>>> updated with the results of the query made to the Solr instance running on 
>>> the remote server.
>>>
>>> Then the user on the client would issue queries to the local Lucene index/s 
>>> to obtain results which are used to setup contexts for different aspects of 
>>> the client. For example: an activated context for musical scales and 
>>> rhythms used for creating musical notes, an activated context for rendering 
>>> with layout and style information for different music symbol renderer types.
>>>
>>> I'm not yet sure but it may be best to make queries against the local 
>>> Lucene index/s and then convert the results into some context objects, 
>>> maybe an array or map (I'd like to learn more about how query results can 
>>> be returned as arrays or maps as well). Then the tools and renderers which 
>>> require the information in the contexts would do any real-time lookup 
>>> directly from the context objects not the local or remote Lucene or Solr 
>>> index/s. The local client is also a JXTA node so it can share its own 
>>> index/s with fellow peers.
>>>
>>> This is how I envision this happening with my limited knowledge of 
>>> Lucene/Solr at this time. What are your thoughts on the feasibility of such 
>>> a scenario?
>>>
>>> I'm just reading through the Solr reference PDF now and looking over the 
>>> Solr admin application. Looking at the Schema.xml it seems to be field not 
>>> document oriented. From my point of view I think in terms of configuration 
>>> types which would be documents. In the schema it seems like only fields are 
>>> defined and it does not matter which configuration/document they belong to? 
>>> I guess this is fine as long as the indexing takes into account my unique 
>>> document types and I can search for them as a whole as well, not only for 
>>> specific values across a set of indexed documents. 
>>>
>>> Also, does the schema allow me to index certain documents into specific 
>>> indexes or are they all just bunched together? I'd rather have unique 
>>> indexes for specific document types. I've just read about multiple cores 
>>> running under one Solr instance, is this the only way to support multiple 
>>> indexes?
>>>
>>> I'm thinking of ordering the Lucene in Action v2 book which is due this 
>>> month and also the Solr 1.4 book. Before I do I just need to understand a 
>>> few things which is why I'm writing such a long message :-)
>>>
>>> Thom
>>>
>>>
>>> On 2010-05-21, at 2:12 AM, Ben Eliott wrote:
>>>
>>>
>>>   
 Further to earlier note re Lucandra.  I note that Cassandra, which 
 Lucandra backs onto,  is 'eventually consistent',  so given your rea

Re: How real-time are Solr/Lucene queries?

2010-05-23 Thread Thomas J. Buhr
Thanks for the link; as a result I also found the following page, which supplied 
a lot of tips:

http://wiki.apache.org/lucene-java/ImproveSearchingSpeed

I've got the Solr 1.4 book plus the Lucene in Action v2 book on order; however, I 
will only receive the Solr book for now, until the other is published. A question 
that remains is this: is it better to use the core Lucene API in my local client 
for the work it does locally with indexes, or is it okay to use embedded Solr 
with SolrJ?

While Solr is optimized for the server side of things, I'm not sure it is the 
best option for the client side.

Thom


On 2010-05-23, at 7:36 AM, Peter Karich wrote:

> Hi,
> 
> just as a side note as I did not read the link in your conversation:
> 
> http://wiki.apache.org/lucene-java/NearRealtimeSearch (I just stumbled
> over this as I am interested in this feature too)
> 
> Regards,
> Peter.
> 
>> Thanks for the new information. Its really great to see so many options for 
>> Lucene.
>> 
>> In my scenario there are the following pieces:
>> 
>> 1 - A local Java client with an embedded Solr instance and its own local 
>> index/s.
>> 2 - A remote server running Solr with index/s that are more like a 
>> repository that local clients query for extra goodies.
>> 3 - The client is also a JXTA node so it can share indexes or documents too.
>> 4 - There is no browser involved what so ever.
>> 
>> My music composing application is a local client that uses configurations 
>> which would become many different document types. A subset of these 
>> configurations will be bundled with the application and then many more would 
>> be made available via a server/s running Solr.
>> 
>> I would not expect the queries which would be made from within the local 
>> client to be returned in real-time. I would only expect such queries to be 
>> made in reasonable time and returned to the client. The client would have 
>> its local Lucene index system (embedded Solr using SolrJ) which would be 
>> updated with the results of the query made to the Solr instance running on 
>> the remote server.
>> 
>> Then the user on the client would issue queries to the local Lucene index/s 
>> to obtain results which are used to setup contexts for different aspects of 
>> the client. For example: an activated context for musical scales and rhythms 
>> used for creating musical notes, an activated context for rendering with 
>> layout and style information for different music symbol renderer types.
>> 
>> I'm not yet sure but it may be best to make queries against the local Lucene 
>> index/s and then convert the results into some context objects, maybe an 
>> array or map (I'd like to learn more about how query results can be returned 
>> as arrays or maps as well). Then the tools and renderers which require the 
>> information in the contexts would do any real-time lookup directly from the 
>> context objects not the local or remote Lucene or Solr index/s. The local 
>> client is also a JXTA node so it can share its own index/s with fellow peers.
>> 
>> This is how I envision this happening with my limited knowledge of 
>> Lucene/Solr at this time. What are your thoughts on the feasibility of such 
>> a scenario?
>> 
>> I'm just reading through the Solr reference PDF now and looking over the 
>> Solr admin application. Looking at the Schema.xml it seems to be field not 
>> document oriented. From my point of view I think in terms of configuration 
>> types which would be documents. In the schema it seems like only fields are 
>> defined and it does not matter which configuration/document they belong to? 
>> I guess this is fine as long as the indexing takes into account my unique 
>> document types and I can search for them as a whole as well, not only for 
>> specific values across a set of indexed documents. 
>> 
>> Also, does the schema allow me to index certain documents into specific 
>> indexes or are they all just bunched together? I'd rather have unique 
>> indexes for specific document types. I've just read about multiple cores 
>> running under one Solr instance, is this the only way to support multiple 
>> indexes?
>> 
>> I'm thinking of ordering the Lucene in Action v2 book which is due this 
>> month and also the Solr 1.4 book. Before I do I just need to understand a 
>> few things which is why I'm writing such a long message :-)
>> 
>> Thom
>> 
>> 
>> On 2010-05-21, at 2:12 AM, Ben Eliott wrote:
>> 
>> 
>>> Further to earlier note re Lucandra.  I note that Cassandra, which Lucandra 
>>> backs onto,  is 'eventually consistent',  so given your real-time 
>>> requirements,  you may want to review this in the first instance, if 
>>> Lucandra is of interest.
>>> 
>>> On 21 May 2010, at 06:12, Walter Underwood wrote:
>>> 
>>> 
 Solr is a very good engine, but it is not real-time. You can turn off the 
 caches and reduce the delays, but it is fundamentally not real-time.
 
 I work at MarkLogic, and we have a real-time transactional s

Re: How real-time are Solr/Lucene queries?

2010-05-23 Thread Peter Karich
Hi,

just as a side note as I did not read the link in your conversation:

http://wiki.apache.org/lucene-java/NearRealtimeSearch (I just stumbled
over this as I am interested in this feature too)

Regards,
Peter.

> Thanks for the new information. Its really great to see so many options for 
> Lucene.
>
> In my scenario there are the following pieces:
>
> 1 - A local Java client with an embedded Solr instance and its own local 
> index/s.
> 2 - A remote server running Solr with index/s that are more like a repository 
> that local clients query for extra goodies.
> 3 - The client is also a JXTA node so it can share indexes or documents too.
> 4 - There is no browser involved what so ever.
>
> My music composing application is a local client that uses configurations 
> which would become many different document types. A subset of these 
> configurations will be bundled with the application and then many more would 
> be made available via a server/s running Solr.
>
> I would not expect the queries which would be made from within the local 
> client to be returned in real-time. I would only expect such queries to be 
> made in reasonable time and returned to the client. The client would have its 
> local Lucene index system (embedded Solr using SolrJ) which would be updated 
> with the results of the query made to the Solr instance running on the remote 
> server.
>
> Then the user on the client would issue queries to the local Lucene index/s 
> to obtain results which are used to setup contexts for different aspects of 
> the client. For example: an activated context for musical scales and rhythms 
> used for creating musical notes, an activated context for rendering with 
> layout and style information for different music symbol renderer types.
>
> I'm not yet sure but it may be best to make queries against the local Lucene 
> index/s and then convert the results into some context objects, maybe an 
> array or map (I'd like to learn more about how query results can be returned 
> as arrays or maps as well). Then the tools and renderers which require the 
> information in the contexts would do any real-time lookup directly from the 
> context objects not the local or remote Lucene or Solr index/s. The local 
> client is also a JXTA node so it can share its own index/s with fellow peers.
>
> This is how I envision this happening with my limited knowledge of 
> Lucene/Solr at this time. What are your thoughts on the feasibility of such a 
> scenario?
>
> I'm just reading through the Solr reference PDF now and looking over the Solr 
> admin application. Looking at the Schema.xml it seems to be field not 
> document oriented. From my point of view I think in terms of configuration 
> types which would be documents. In the schema it seems like only fields are 
> defined and it does not matter which configuration/document they belong to? I 
> guess this is fine as long as the indexing takes into account my unique 
> document types and I can search for them as a whole as well, not only for 
> specific values across a set of indexed documents. 
>
> Also, does the schema allow me to index certain documents into specific 
> indexes or are they all just bunched together? I'd rather have unique indexes 
> for specific document types. I've just read about multiple cores running 
> under one Solr instance, is this the only way to support multiple indexes?
>
> I'm thinking of ordering the Lucene in Action v2 book which is due this month 
> and also the Solr 1.4 book. Before I do I just need to understand a few 
> things which is why I'm writing such a long message :-)
>
> Thom
>
>
> On 2010-05-21, at 2:12 AM, Ben Eliott wrote:
>
>   
>> Further to earlier note re Lucandra.  I note that Cassandra, which Lucandra 
>> backs onto,  is 'eventually consistent',  so given your real-time 
>> requirements,  you may want to review this in the first instance, if 
>> Lucandra is of interest.
>>
>> On 21 May 2010, at 06:12, Walter Underwood wrote:
>>
>> 
>>> Solr is a very good engine, but it is not real-time. You can turn off the 
>>> caches and reduce the delays, but it is fundamentally not real-time.
>>>
>>> I work at MarkLogic, and we have a real-time transactional search engine 
>>> (and repository). If you are curious, contact me directly.
>>>
>>> I do like Solr for lots of applications -- I chose it when I was at Netflix.
>>>
>>> wunder
>>>
>>> On May 20, 2010, at 7:22 PM, Thomas J. Buhr wrote:
>>>
>>>   
>>>> Hello Solr,
>>>> 
>>>> Solr looks like an excellent API and it's nice to have a tutorial that 
>>>> makes it easy to discover the basics of what Solr does, I'm impressed. I 
>>>> can see plenty of potential uses of Solr/Lucene and I'm interested now in 
>>>> just how real-time the queries made to an index can be?
>>>> 
>>>> For example, in my application I have time ordered data being processed by 
>>>> a paint method in real-time. Each piece of data is identified and its 
>>>> associated renderer is invoked. The Ja

Re: How real-time are Solr/Lucene queries?

2010-05-21 Thread Thomas J. Buhr
Thanks for the new information. It's really great to see so many options for 
Lucene.

In my scenario there are the following pieces:

1 - A local Java client with an embedded Solr instance and its own local 
index(es).
2 - A remote server running Solr with index(es) that are more like a repository 
that local clients query for extra goodies.
3 - The client is also a JXTA node so it can share indexes or documents too.
4 - There is no browser involved whatsoever.

My music composing application is a local client that uses configurations which 
would become many different document types. A subset of these configurations 
will be bundled with the application and then many more would be made available 
via a server/s running Solr.

I would not expect the queries which would be made from within the local client 
to be returned in real-time. I would only expect such queries to be made in 
reasonable time and returned to the client. The client would have its local 
Lucene index system (embedded Solr using SolrJ) which would be updated with the 
results of the query made to the Solr instance running on the remote server.

Then the user on the client would issue queries to the local Lucene index(es) to 
obtain results which are used to set up contexts for different aspects of the 
client. For example: an activated context for musical scales and rhythms used 
for creating musical notes, and an activated context for rendering with layout 
and style information for different music symbol renderer types.

I'm not yet sure, but it may be best to make queries against the local Lucene 
index(es) and then convert the results into some context objects, maybe an array 
or map (I'd like to learn more about how query results can be returned as 
arrays or maps as well). Then the tools and renderers which require the 
information in the contexts would do any real-time lookup directly from the 
context objects, not the local or remote Lucene or Solr index(es). The local 
client is also a JXTA node so it can share its own index(es) with fellow peers.
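
On the "returned as arrays or maps" point: SolrJ hands results back as a 
SolrDocumentList, and flattening that into a map for a context object is 
straightforward. A rough sketch (the "id" unique key and the row count are 
assumptions):

import java.util.HashMap;
import java.util.Map;

import org.apache.solr.client.solrj.SolrQuery;
import org.apache.solr.client.solrj.SolrServer;
import org.apache.solr.common.SolrDocument;
import org.apache.solr.common.SolrDocumentList;

public class ContextBuilder {
    // Maps each matching document's unique key to a field-name -> value map,
    // so tools and renderers can read values without touching the index again.
    public static Map<String, Map<String, Object>> build(SolrServer solr, String query)
            throws Exception {
        SolrQuery q = new SolrQuery(query);
        q.setRows(1000); // adjust to the expected number of configuration documents
        SolrDocumentList docs = solr.query(q).getResults();

        Map<String, Map<String, Object>> context = new HashMap<String, Map<String, Object>>();
        for (SolrDocument doc : docs) {
            Map<String, Object> fields = new HashMap<String, Object>();
            for (String name : doc.getFieldNames()) {
                fields.put(name, doc.getFieldValue(name));
            }
            context.put((String) doc.getFieldValue("id"), fields);
        }
        return context;
    }
}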

This is how I envision this happening with my limited knowledge of Lucene/Solr 
at this time. What are your thoughts on the feasibility of such a scenario?

I'm just reading through the Solr reference PDF now and looking over the Solr 
admin application. Looking at the schema.xml it seems to be field- rather than 
document-oriented. From my point of view I think in terms of configuration 
types, which would be documents. In the schema it seems like only fields are 
defined and it does not matter which configuration/document they belong to? I 
guess this is fine as long as the indexing takes into account my unique document 
types and I can search for them as a whole as well, not only for specific values 
across a set of indexed documents. 

Also, does the schema allow me to index certain documents into specific indexes, 
or are they all just bunched together? I'd rather have unique indexes for 
specific document types. I've just read about multiple cores running under one 
Solr instance; is this the only way to support multiple indexes?
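
For reference, with multiple cores each index is simply addressed by core name 
from SolrJ. A small sketch, assuming the cores are declared in solr.xml (the home 
path and core names are placeholders):

import org.apache.solr.client.solrj.SolrServer;
import org.apache.solr.client.solrj.embedded.EmbeddedSolrServer;
import org.apache.solr.core.CoreContainer;

public class Cores {
    public static void main(String[] args) throws Exception {
        System.setProperty("solr.solr.home", "/path/to/solr/home");
        CoreContainer container = new CoreContainer.Initializer().initialize();

        // One core (and therefore one index) per document type.
        SolrServer layouts = new EmbeddedSolrServer(container, "layouts");
        SolrServer scales  = new EmbeddedSolrServer(container, "scales");
        SolrServer help    = new EmbeddedSolrServer(container, "help");

        // ... index into and query each one independently ...
        container.shutdown();
    }
}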

I'm thinking of ordering the Lucene in Action v2 book which is due this month 
and also the Solr 1.4 book. Before I do I just need to understand a few things 
which is why I'm writing such a long message :-)

Thom


On 2010-05-21, at 2:12 AM, Ben Eliott wrote:

> Further to earlier note re Lucandra.  I note that Cassandra, which Lucandra 
> backs onto,  is 'eventually consistent',  so given your real-time 
> requirements,  you may want to review this in the first instance, if Lucandra 
> is of interest.
> 
> On 21 May 2010, at 06:12, Walter Underwood wrote:
> 
>> Solr is a very good engine, but it is not real-time. You can turn off the 
>> caches and reduce the delays, but it is fundamentally not real-time.
>> 
>> I work at MarkLogic, and we have a real-time transactional search engine 
>> (and repository). If you are curious, contact me directly.
>> 
>> I do like Solr for lots of applications -- I chose it when I was at Netflix.
>> 
>> wunder
>> 
>> On May 20, 2010, at 7:22 PM, Thomas J. Buhr wrote:
>> 
>>> Hello Solr,
>>> 
>>> Solr looks like an excellent API and it's nice to have a tutorial that makes 
>>> it easy to discover the basics of what Solr does, I'm impressed. I can see 
>>> plenty of potential uses of Solr/Lucene and I'm interested now in just how 
>>> real-time the queries made to an index can be?
>>> 
>>> For example, in my application I have time ordered data being processed by 
>>> a paint method in real-time. Each piece of data is identified and its 
>>> associated renderer is invoked. The Java2D renderer would then look up any 
>>> layout and style values it requires to render the current data it has 
>>> received from the layout and style indexes. What I'm wondering is if this 
>>> lookup, which would be a Lucene search, will be fast enough?
>>> 
>>> Would it be best to make Lucene queries for the relevant layout and style 
>>> values required by the renderers a