Re: Using Multiple Cores for Multiple Users

2010-11-10 Thread Shalin Shekhar Mangar
On Tue, Nov 9, 2010 at 6:00 PM, Adam Estrada
wrote:

> Thanks a lot for all the tips, guys! I think that we may explore both
> options just to see what happens. I'm sure that scalability will be a huge
> mess with the core-per-user scenario. I like the idea of creating a user ID
> field and agree that it's probably the best approach. We'll see...I will be
> sure to let the list know what I find! Please don't stop posting your
> comments everyone ;-) My inquiring mind wants to know...
>
>
I think it is customary for me to mention the techniques described on the
LotsOfCores wiki page for this kind of question. The patches are mostly
useless at this point, but if you are looking for a per-user solution, you
will need most of the tricks mentioned on that page.

http://wiki.apache.org/solr/LotsOfCores

-- 
Regards,
Shalin Shekhar Mangar.


Re: Using Multiple Cores for Multiple Users

2010-11-10 Thread Jan Høydahl / Cominvent
Hi,

If your index is supposed to handle only public information, i.e. public RSS 
feeds, then I don't see a need for multiple cores.

I would probably try to handle this on the query side only. Imagine this 
scenario:

User A registers RSS-X and RSS-Y (the application starts pulling and indexing 
these feeds)
User B registers RSS-Z (the application starts pulling feed Z)
User C registers RSS-X and RSS-Z (the application does nothing, as these are 
already being indexed)

When searching, add a filter to each user's queries. Solr will handle MANY 
terms in such a filter, and it is not likely that a human user subscribes to 
more than, say, a few hundred feeds.

So for user C, the query would look like:

.../solr/select?q=foo bar&fq=feedID:(RSS-X OR RSS-Z)
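
A minimal sketch of how the application might build such a filtered query, in Python. The host and port are assumptions, the `feedID` field name is taken from the example above, and the helper name `build_user_query` is hypothetical. The feed IDs are quoted so that characters such as ':' in a real feed identifier are treated literally.

```python
from urllib.parse import urlencode


def build_user_query(base_url, user_query, feed_ids):
    """Return a select URL restricted to the feeds one user subscribes to."""
    # Quote each feed ID and escape backslashes and double quotes inside it.
    quoted = [
        '"%s"' % fid.replace("\\", "\\\\").replace('"', '\\"')
        for fid in feed_ids
    ]
    params = {
        "q": user_query,
        # One filter query listing every subscribed feed.
        "fq": "feedID:(%s)" % " OR ".join(quoted),
    }
    # urlencode handles the URL escaping of spaces, quotes and parentheses.
    return base_url + "/select?" + urlencode(params)


# Hypothetical Solr location; user C subscribes to RSS-X and RSS-Z.
url = build_user_query("http://localhost:8983/solr", "foo bar", ["RSS-X", "RSS-Z"])
print(url)
```

Because the filter goes in `fq` rather than in `q`, it restricts the result set without contributing to the score.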

--
Jan Høydahl, search solution architect
Cominvent AS - www.cominvent.com




Re: Using Multiple Cores for Multiple Users

2010-11-09 Thread Dennis Gearon
So, if my other filter/selection criteria return a subset of the whole index 
whose scores range from, say, 50% to 60% relevance, that subset is still 
ordered by relevance, and each item is still ranked relative to the others in 
the set, right? That would only be a problem if some minimum relevance were 
required, right?

 Dennis Gearon


Signature Warning

It is always a good idea to learn from your own mistakes. It is usually a 
better idea to learn from others’ mistakes, so you do not have to make them 
yourself. 
from 'http://blogs.techrepublic.com.com/security/?p=4501&tag=nl.e036'


EARTH has a Right To Life,
otherwise we all die.




Re: Using Multiple Cores for Multiple Users

2010-11-09 Thread Lance Norskog
Relevance is TF/DF. TF is the term frequency: the number of times the term
appears in the document. DF is the document frequency: the number of
documents in the index that contain the term.

There is no quick calculation for "total frequency for terms only in
these documents". Facets do this, and they're very very slow.


Re: Using Multiple Cores for Multiple Users

2010-11-09 Thread Dennis Gearon
Hm, so relevance is computed before filtering, probably during indexing?
 Dennis Gearon 



Re: Using Multiple Cores for Multiple Users

2010-11-09 Thread Lance Norskog
There is a standard problem with this: relevance is determined from
all of the words in a field of all documents, not just the documents
that match the query. That is, when user A searches for 'monkeys' and
one of his feeds has a document with this word, but another user's feeds
are full of documents about monkeys, 'monkeys' will be a common word in
the index. This will skew the relevance computation for user A.
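
A rough illustration of that skew, in Python. It assumes the classic Lucene-style inverse document frequency, 1 + ln(numDocs / (docFreq + 1)). All document counts are made-up numbers and the function name `idf` is only for this sketch.

```python
import math


def idf(doc_freq, num_docs):
    """Classic Lucene-style inverse document frequency (assumed for this sketch)."""
    return 1.0 + math.log(num_docs / (doc_freq + 1.0))


# Hypothetical: user A alone has 1,000 documents, 5 of which mention 'monkeys'.
idf_own_index = idf(doc_freq=5, num_docs=1_000)

# Hypothetical: in a shared index of 100,000 documents, other users' feeds
# add another 20,000 documents that mention 'monkeys'.
idf_shared_index = idf(doc_freq=20_005, num_docs=100_000)

# The term is rare for user A but common in the shared index, so it
# carries less weight there, even when results are filtered to user A.
print(idf_own_index, idf_shared_index)
```

The filter restricts which documents are returned, but the document frequency still comes from the whole index, so the same term is weighted differently than it would be in a core holding only user A's feeds.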

You could have a separate text field for each user. This might work
better, but you would have to omit field norms, since norms for each
such field take up space for every document in the index.

Lance




-- 
Lance Norskog
goks...@gmail.com


Re: Using Multiple Cores for Multiple Users

2010-11-09 Thread Adam Estrada
Thanks a lot for all the tips, guys! I think that we may explore both
options just to see what happens. I'm sure that scalability will be a huge
mess with the core-per-user scenario. I like the idea of creating a user ID
field and agree that it's probably the best approach. We'll see...I will be
sure to let the list know what I find! Please don't stop posting your
comments everyone ;-) My inquiring mind wants to know...

Adam



RE: Using Multiple Cores for Multiple Users

2010-11-09 Thread Jonathan Rochkind
If storing in a single index (possibly sharded if you need it), you can simply 
include a Solr field that specifies the user ID of the saved thing. On the 
client side, in your application, simply ensure that there is an fq parameter 
limiting to the current user, if you want to limit to the current user's stuff. 
Relevancy ranking should work just as if you had 'separate cores'; there is no 
relevancy issue. 
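
A minimal sketch of that idea in Python: every document carries its owner, and the application always adds the owner filter itself. The field names (`user_id`, `title`, `body`) and both helper names are hypothetical, not part of any Solr schema mentioned in this thread.

```python
def user_doc(doc_id, user_id, title, body):
    """Build one document for the shared index, tagged with its owner."""
    return {"id": doc_id, "user_id": user_id, "title": title, "body": body}


def user_params(user_id, user_query):
    """Build select parameters that always restrict results to one user."""
    # The filter is added by the application, never taken from the browser,
    # so one user cannot ask for another user's documents.
    return {"q": user_query, "fq": 'user_id:"%s"' % user_id}


doc = user_doc("feed-item-1", "42", "Monkeys in the news", "...")
params = user_params("42", "monkeys")
print(doc["user_id"], params["fq"])
```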

It IS true that when your index gets very large, commits will start taking 
longer, which can be a problem. I don't mean commits will take longer just 
because there is more stuff to commit -- the larger the index, the longer an 
update to a single document will take to commit. 

In general, I suspect that having dozens or hundreds (or thousands!) of cores 
is not going to scale well; it is not going to make good use of your 
CPU/RAM/disk resources. That is not really the intended use case of multiple 
cores. 

However, you are probably going to run into some issues with the single index 
approach too. In general, how to deal with "multi-tenancy" in Solr is an 
oft-asked question, and there doesn't seem to be any "just works and does 
everything for you without needing to think about it" solution for it in Solr, 
judging from past threads. I am not a Solr developer or expert. 


From: Markus Jelsma [markus.jel...@openindex.io]
Sent: Tuesday, November 09, 2010 6:57 PM
To: solr-user@lucene.apache.org
Cc: Adam Estrada
Subject: Re: Using Multiple Cores for Multiple Users

Hi,

> All,
>
> I have a web application that requires the user to register and then login
> to gain access to the site. Pretty standard stuff...Now I would like to
> know what the best approach would be to implement a "customized" search
> experience for each user. Would this mean creating a separate core per
> user? I think that this is not possible without restarting Solr after each
> core is added to the multi-core xml file, right?

No, you can dynamically manage cores and parts of their configuration.
Sometimes you must reindex after a change; the same is true for reloading
cores. Check the wiki on this one [1].
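
As a rough sketch of what dynamic core management can look like from application code, the Python below builds CoreAdmin request URLs. The host, port, and core name are assumptions, and the helper `core_admin_url` is hypothetical. The `CREATE` and `RELOAD` actions and their `name`, `instanceDir`, and `core` parameters are the ones described on the CoreAdmin wiki page.

```python
from urllib.parse import urlencode


def core_admin_url(base_url, action, **params):
    """Build a CoreAdmin request URL for the given action."""
    query = {"action": action}
    query.update(params)
    return base_url + "/admin/cores?" + urlencode(query)


base = "http://localhost:8983/solr"  # assumed Solr location

# Register a new core; the instance directory is expected to exist already.
create_url = core_admin_url(base, "CREATE", name="user42", instanceDir="user42")

# Reload an existing core after its configuration has changed.
reload_url = core_admin_url(base, "RELOAD", core="user42")

print(create_url)
print(reload_url)
```

Sending these requests (for example with `urllib.request.urlopen`) is left out so the sketch runs without a Solr server.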

>
> My use case is this...User A would like to index 5 RSS feeds and User B
> would like to index 5 completely different RSS feeds and he is not
> interested at all in what User A is interested in. This means that they
> would have to be separate index cores, right?

If you view the documents within an RSS feed as separate documents, you can
assign a user ID to those documents, creating a multi-user index with RSS
documents per user, or group, or whatever.

Having a core per user isn't a good idea if you have many users. It takes up
additional memory and disk space, doesn't share caches, etc. There is also
more maintenance, and you need some support scripts to dynamically create new
cores - Solr currently doesn't create a new core directory structure.

But reindexing a very large index takes up a lot more time and resources, and
relevancy might be an issue depending on the RSS feeds' contents.

>
> What is the best approach for this kind of thing?

I'd usually store the feeds in a single index and shard if it's too many for a
single server with your specifications, unless the demands are too specific.

>
> Thanks in advance,
> Adam

[1]: http://wiki.apache.org/solr/CoreAdmin

Cheers


Re: Using Multiple Cores for Multiple Users

2010-11-09 Thread Dennis Gearon
I'm willing to bet a lot that the standard approach is to use a server-side 
language to customize the queries for the user . . . on the same core/set of 
cores.

The only reasons that my limited experience suggests for a 'core per user' are 
privacy and performance. Unless you have a very small set of users, I would 
think managing cores for LOTS of users would be a pain: create one (takes 
time), replicate to it (takes MORE time), use it, then destroy it after the 
session expires (requires a garbage collection program running pretty often, 
which takes up LOTS more time/CPU resources).

I am happy to be corrected on any of this.

 Dennis Gearon








Re: Using Multiple Cores for Multiple Users

2010-11-09 Thread Markus Jelsma
Hi,

> All,
> 
> I have a web application that requires the user to register and then login
> to gain access to the site. Pretty standard stuff...Now I would like to
> know what the best approach would be to implement a "customized" search
> experience for each user. Would this mean creating a separate core per
> user? I think that this is not possible without restarting Solr after each
> core is added to the multi-core xml file, right?

No, you can dynamically manage cores and parts of their configuration.
Sometimes you must reindex after a change; the same is true for reloading
cores. Check the wiki on this one [1].
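
For illustration, a minimal sketch of the CoreAdmin requests involved,
assuming a Solr instance on localhost:8983 and a hypothetical core named
user_adam. The helper core_admin_url is made up for this example; CREATE and
RELOAD are the CoreAdmin actions described on the wiki page.

```python
from urllib.parse import urlencode

ADMIN_URL = "http://localhost:8983/solr/admin/cores"  # assumed host and port


def core_admin_url(action, **params):
    """Build a CoreAdmin request URL for the given action."""
    query = [("action", action)] + sorted(params.items())
    return ADMIN_URL + "?" + urlencode(query)


# Register a new core whose instance directory already exists on disk.
create = core_admin_url("CREATE", name="user_adam", instanceDir="user_adam")
# Reload an existing core after a configuration change.
reload_ = core_admin_url("RELOAD", core="user_adam")

print(create)
print(reload_)
```

The snippet only builds the URLs; issuing them (for example with
urllib.request.urlopen) needs a running Solr to talk to.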

> 
> My use case is this...User A would like to index 5 RSS feeds and User B
> would like to index 5 completely different RSS feeds and he is not
> interested at all in what User A is interested in. This means that they
> would have to be separate index cores, right?

If you view the documents within an RSS feed as separate documents, you can
assign a user ID to those documents, creating a multi-user index with RSS
documents per user, or group, or whatever.
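
On the indexing side, that could look roughly like the sketch below, which
builds a Solr XML <add> message in which every feed item carries its owner.
The field names user_id and title, the per-user id scheme, and the helper
name are all assumptions; they would have to match your own schema.xml.

```python
import xml.etree.ElementTree as ET


def feed_items_to_add_xml(user_id, items):
    """Build a Solr <add> message tagging every item with user_id."""
    add = ET.Element("add")
    for item in items:
        doc = ET.SubElement(add, "doc")
        fields = dict(item, user_id=user_id)
        # Assumption: prefix the unique key with the user so that two users
        # indexing the same feed item do not overwrite each other's copy.
        fields["id"] = "%s/%s" % (user_id, item["id"])
        for name, value in sorted(fields.items()):
            field = ET.SubElement(doc, "field", name=name)
            field.text = value
    return ET.tostring(add, encoding="unicode")


print(feed_items_to_add_xml("adam", [{"id": "rss-x-1", "title": "Hello"}]))
```

The resulting string would be POSTed to the core's /update handler, followed
by a commit.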

Having a core per user isn't a good idea if you have many users. It takes up
additional memory and disk space, doesn't share caches, etc. There is also
more maintenance, and you need some support scripts to dynamically create new
cores - Solr currently doesn't create a new core directory structure.
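
Such a support script can be quite small. A minimal sketch, assuming each
core lives in its own directory under the Solr home and that a template conf
directory (schema.xml, solrconfig.xml, ...) is kept somewhere to copy from;
the function name and layout are hypothetical:

```python
import shutil
from pathlib import Path


def make_core_dir(solr_home, core_name, template_conf):
    """Create <solr_home>/<core_name>/conf from a template conf directory."""
    core_dir = Path(solr_home) / core_name
    if core_dir.exists():
        raise FileExistsError("core directory already exists: %s" % core_dir)
    # copytree creates core_dir and core_dir/conf, then copies the templates.
    shutil.copytree(str(template_conf), str(core_dir / "conf"))
    return core_dir
```

After the directory exists, the core still has to be registered with Solr
through a CoreAdmin CREATE request.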

On the other hand, reindexing a very large index takes a lot more time and
resources, and relevancy might be an issue depending on the RSS feeds'
contents.

> 
> What is the best approach for this kind of thing?

I'd usually store the feeds in a single index, and shard it if it grows too
large for a single server with your specifications, unless the demands are
too specific.
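
If it does come to sharding, a distributed query is an ordinary select
request with a shards parameter listing every shard. A small sketch, with
made-up host names solr1 and solr2 and a hypothetical helper:

```python
from urllib.parse import urlencode


def sharded_select_url(front_host, shard_hosts, query):
    """Build a select URL that fans the query out to every listed shard."""
    params = [
        ("q", query),
        # Comma-separated host:port/path entries, without the http:// prefix.
        ("shards", ",".join(shard_hosts)),
    ]
    return "http://%s/select?%s" % (front_host, urlencode(params))


shards = ["solr1:8983/solr", "solr2:8983/solr"]  # assumed hosts
print(sharded_select_url(shards[0], shards, "foo bar"))
```

Any of the shards can receive the request; it queries the others and merges
the results.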

> 
> Thanks in advance,
> Adam

[1]: http://wiki.apache.org/solr/CoreAdmin

Cheers