Re: how well does multicore scale?
This is why using 'groups' as intermediary permission objects came into existence in databases.

Dennis Gearon

--- On Wed, 10/27/10, mike anderson wrote:
> From: mike anderson
> Subject: Re: how well does multicore scale?
> To: solr-user@lucene.apache.org
> Date: Wednesday, October 27, 2010, 5:20 AM
>
> Tagging every document with a few hundred thousand 6 character user-ids
> would increase the document size by two orders of magnitude. [...]
RE: how well does multicore scale?
mike anderson [saidthero...@gmail.com] wrote:
> That's a great point. If SSDs are sufficient, then what does the "Index size
> vs Response time" curve look like? Since that would dictate the number
> of machines needed. I took a look at
> http://wiki.apache.org/solr/SolrPerformanceData but only one use case
> seemed comparable.

I generally find it very hard to compare across setups. Looking at SolrPerformanceData for example, we see that CNET Shopper has a very poor response time/size ratio, while HathiTrust is a lot better. This is not too surprising, as CNET seems to use quite advanced searching where HathiTrust's is simpler, but it does illustrate that comparisons are not easy.

However, as long as I/O has been identified as the main bottleneck for a given setup, relative gains from different storage back ends should be fairly comparable across setups. We did some work on storage testing with Lucene two years ago (see the I-wish-I-had-the-time-to-update-this page at http://wiki.statsbiblioteket.dk/summa/Hardware), but unfortunately we did very little testing on scaling over index size.

... I just dug out some old measurements that say a little bit: We tried changing the size of our index (by deleting every Xth document and optimizing) and performing 350K queries with extraction of 2 or 3 fairly small fields for the first 20 hits from each. The machine was capped at 4GB of RAM. I am fairly certain the searcher was single threaded and there were no web-services involved, so this is very raw Lucene speed:

 4GB index: 626 queries/second
 9GB index: 405 queries/second
17GB index: 205 queries/second
26GB index: 188 queries/second

Not a lot of measurement points and I wish I had data for larger index sizes, as it seems that the curve is flattening quite drastically at the end. Graph at http://www.mathcracker.com/scatterplotimage.php?datax=4,9,17,26&datay=626,405,205,188&namex=Index%20size%20in%20GB&namey=queries/second&titl=SSD%20scaling%20performance%20with%20Lucene

> We currently have about 25M docs, split into 18 shards, with a
> total index size of about 120GB. If index size has truly little
> impact on performance then perhaps tagging articles with user
> IDs is a better way to approach my use case.

I don't know your budget, but do consider buying a single 160GB Intel X25-M or one of the new 256GB SandForce-based SSDs for testing. If it does not deliver what you hoped for, you'll be happy to put it in your workstation.

It would be nice if there were some sort of corpus generator that generated Zipfian-distributed data and sample queries so that we could do large scale testing on different hardware without having to share sample data.

Regards,
Toke Eskildsen
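To make the "flattening" remark concrete, here is a small sketch that only re-uses the four measurements quoted above. The function names are made up for illustration, and nothing here extrapolates beyond the 26GB point, which the message explicitly says was not measured.

```python
# Measurements quoted in the message above: (index size in GB, queries/second).
measurements = [(4, 626), (9, 405), (17, 205), (26, 188)]


def relative_throughput(points):
    """Return each point's throughput as a fraction of the first point's."""
    base = points[0][1]
    return [(size, qps / base) for size, qps in points]


def slopes(points):
    """Change in queries/second per added GB between neighbouring points."""
    return [
        (q2 - q1) / (s2 - s1)
        for (s1, q1), (s2, q2) in zip(points, points[1:])
    ]


if __name__ == "__main__":
    for size, frac in relative_throughput(measurements):
        print(f"{size:>2} GB index: {frac:.0%} of the 4 GB throughput")
    pairs = zip(measurements, measurements[1:], slopes(measurements))
    for (s1, _q1), (s2, _q2), slope in pairs:
        print(f"{s1} -> {s2} GB: {slope:.1f} queries/second per GB")
```

The per-GB loss shrinks from one segment to the next, which is the flattening described above. With only four points, this says nothing reliable about indexes in the 100GB+ range.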
Re: how well does multicore scale?
That's a great point. If SSDs are sufficient, then what does the "Index size vs Response time" curve look like? Since that would dictate the number of machines needed. I took a look at http://wiki.apache.org/solr/SolrPerformanceData but only one use case seemed comparable.

We currently have about 25M docs, split into 18 shards, with a total index size of about 120GB. If index size has truly little impact on performance then perhaps tagging articles with user IDs is a better way to approach my use case.

-Mike

On Wed, Oct 27, 2010 at 9:45 AM, Toke Eskildsen wrote:
> On Wed, 2010-10-27 at 14:20 +0200, mike anderson wrote:
> > [...] By my simple math, this would mean that if we want each shard's
> > index to be able to fit in memory, [...]
>
> Might I ask why you're planning on using memory-based sharding? The
> performance gap between memory and SSDs is not very big so using memory
> to get those last queries/second is quite expensive.
Re: how well does multicore scale?
On Wed, 2010-10-27 at 14:20 +0200, mike anderson wrote:
> [...] By my simple math, this would mean that if we want each shard's
> index to be able to fit in memory, [...]

Might I ask why you're planning on using memory-based sharding? The performance gap between memory and SSDs is not very big so using memory to get those last queries/second is quite expensive.
Re: how well does multicore scale?
Hi mike,

I think I wasn't clear. Each document will only be tagged with one user_id, or to be specific, one tenant_id. Users of the same tenant can't upload the same document to the same path, so I use the tenant_id to make the key unique for each tenant. That way I can index and delete without a problem.

On Wed, Oct 27, 2010 at 5:50 PM, mike anderson wrote:
> Tagging every document with a few hundred thousand 6 character user-ids
> would increase the document size by two orders of magnitude. [...]

--
Regards,
Tharindu
Re: how well does multicore scale?
Tagging every document with a few hundred thousand 6 character user-ids would increase the document size by two orders of magnitude. I can't imagine why this wouldn't mean the index would increase by just as much (though I really don't know much about that file structure). By my simple math, this would mean that if we want each shard's index to be able to fit in memory, then (even with some beefy servers) each query would have to go out to a few thousand shards (as opposed to 21 if we used the MultiCore approach). This means the typical response time would be much slower.

-mike

On Tue, Oct 26, 2010 at 10:15 AM, Jonathan Rochkind wrote:
> [...] You store your userIDs in a multi-valued field
> (or as multiple terms in a single value, ends up being similar). You fq on
> there with the current userID. [...]
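The "two orders of magnitude" estimate above can be reproduced with a little arithmetic. This sketch counts raw ID text only. The figures 300,000 IDs and 20,000 bytes per untagged document are assumptions chosen for illustration, not numbers from the thread, and raw field text is not the same thing as on-disk index size, which nobody in the thread measured.

```python
def raw_id_bytes(num_ids: int, id_length: int = 6) -> int:
    """Bytes of raw ID text attached to one document (ASCII, no separators)."""
    return num_ids * id_length


def growth_factor(base_doc_bytes: int, num_ids: int, id_length: int = 6) -> float:
    """How many times larger the raw document text becomes once IDs are added."""
    return (base_doc_bytes + raw_id_bytes(num_ids, id_length)) / base_doc_bytes


if __name__ == "__main__":
    # Both inputs are illustrative assumptions, not measurements.
    assumed_ids = 300_000        # "a few hundred thousand" user IDs
    assumed_doc_bytes = 20_000   # size of one document before tagging
    print(raw_id_bytes(assumed_ids))                      # bytes of ID text per document
    print(growth_factor(assumed_doc_bytes, assumed_ids))  # raw text growth factor
```

Under these assumptions the raw text grows by roughly two orders of magnitude, matching the estimate. Whether the index grows in the same proportion is exactly the open question in this exchange.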
Re: how well does multicore scale?
Creating a unique id for a schema is one of those design tasks:

http://wiki.apache.org/solr/UniqueKey

A marvelously lucid and well-written page, if I do say so. And I do.

On Tue, Oct 26, 2010 at 10:16 PM, Tharindu Mathew wrote:
> [...]
> I have a question regarding ids i.e. the unique key. Since there is a
> potential use case that two users might add the same document, how
> would we set the id. I was thinking of appending the user id to the an
> id I would use ex: "/system/bar.pdfuserid25". Otherwise, solr would
> replace the document of one user, which is not what we want.
> [...]

--
Lance Norskog
goks...@gmail.com
Re: how well does multicore scale?
Really great to know you were able to fire up about 100 cores. But I wonder how it would perform when it scales up to around 1000 or even more.

I have a question regarding ids, i.e. the unique key. Since there is a potential use case where two users might add the same document, how would we set the id? I was thinking of appending the user id to the id I would use, e.g. "/system/bar.pdfuserid25". Otherwise, Solr would replace the document of one user, which is not what we want.

This is also applicable to deleteById. Is there a better way to do this?

On Tue, Oct 26, 2010 at 7:45 PM, Jonathan Rochkind wrote:
> [...] You store your userIDs in a multi-valued field (or as
> multiple terms in a single value, ends up being similar). You fq on there
> with the current userID. [...]

--
Regards,
Tharindu
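One way to read the composite-key idea above is sketched below. Instead of plain concatenation ("/system/bar.pdfuserid25"), it joins the two parts with a delimiter so the key can be split again without ambiguity. The helper names and the "!" separator are hypothetical choices for this illustration, and the sketch assumes user or tenant ids never contain that character.

```python
from typing import Tuple

SEPARATOR = "!"  # hypothetical delimiter; assumes ids never contain it


def make_doc_id(owner_id: str, path: str) -> str:
    """Build a unique key from an owner (user or tenant) id and a document path."""
    if not owner_id or SEPARATOR in owner_id:
        raise ValueError("owner id must be non-empty and must not contain the separator")
    return f"{owner_id}{SEPARATOR}{path}"


def split_doc_id(doc_id: str) -> Tuple[str, str]:
    """Recover (owner_id, path); splits on the first separator only."""
    owner_id, _, path = doc_id.partition(SEPARATOR)
    return owner_id, path


if __name__ == "__main__":
    key = make_doc_id("userid25", "/system/bar.pdf")
    print(key)
    print(split_doc_id(key))
```

The same key would then be used both when adding the document and when deleting it by id, so two owners uploading the same path never overwrite each other's entry.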
Re: how well does multicore scale?
mike anderson wrote:
> I'm really curious if there is a clever solution to the obvious problem
> with: "So your better off using a single index and with a user id and use
> a query filter with the user id when fetching data.", i.e.. when you have
> hundreds of thousands of user IDs tagged on each article. That just doesn't
> sound like it scales very well..

Actually, I think that design would scale pretty fine, I don't think there's an 'obvious' problem. You store your userIDs in a multi-valued field (or as multiple terms in a single value, ends up being similar). You fq on there with the current userID. There's one way to find out of course, but that doesn't seem a patently ridiculous scenario or anything, that's the kind of thing Solr is generally good at, it's what it's built for. The problem might actually be in the time it takes to add such a document to the index; but not in query time.

Doesn't mean it's the best solution for your problem though, I can't say.

My impression is that Solr in general isn't really designed to support the kind of multi-tenancy use case people are talking about lately. So trying to make it work anyway... if multi-cores work for you, then great, but be aware they weren't really designed for that (having thousands of cores) and may not work. If a single index can work for you instead, great, but as you've discovered it's not necessarily obvious how to set up the schema to do what you need. Really this applies to Solr in general: unlike an rdbms where you just third-form-normalize everything and figure it'll work for almost any use case that comes up, in Solr you generally need to custom fit the schema for your particular use cases, sometimes being kind of clever to figure out the optimal way to do that.

This is, I'd argue/agree, indeed kind of a disadvantage, setting up a Solr index takes more intellectual work than setting up an rdbms. The trade off is you get speed, and flexible ways to set up relevancy (that still perform well). Took a couple decades for rdbms to get as brainless to use as they are, maybe in a couple more we'll have figured out ways to make indexing engines like solr equally brainless, but not yet -- but it's still pretty damn easy for what it is, the lucene/Solr folks have done a remarkable job.
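A minimal sketch of the filter-query approach described above, built as a plain HTTP request URL. The `q` and `fq` parameters are standard Solr request parameters. The field name `user_ids`, the base URL, and the function name are assumptions for illustration, and the sketch assumes user IDs are plain alphanumeric strings so no query-syntax escaping is needed.

```python
from urllib.parse import urlencode


def build_select_url(base_url: str, user_query: str, user_id: str,
                     field: str = "user_ids") -> str:
    """Build a /select URL that restricts results to one user's documents.

    `field` is a hypothetical multi-valued field holding the IDs of every
    user allowed to see the document.
    """
    if not user_id.isalnum():
        raise ValueError("this sketch only handles alphanumeric user ids")
    params = [
        ("q", user_query),            # the user's actual search
        ("fq", f"{field}:{user_id}"),  # filter query on the current user's ID
    ]
    return f"{base_url}/select?{urlencode(params)}"


if __name__ == "__main__":
    print(build_select_url("http://localhost:8983/solr", "solr scaling", "abc123"))
```

Because the restriction goes in `fq` rather than `q`, it does not affect relevancy scoring, and Solr can cache the filter separately from the main query.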
Re: how well does multicore scale?
So I fired up about 100 cores and used JMeter to fire off a few thousand queries. It looks like the memory usage isn't much worse than running a single shard. So that's good.

I'm really curious if there is a clever solution to the obvious problem with: "So your better off using a single index and with a user id and use a query filter with the user id when fetching data.", i.e. when you have hundreds of thousands of user IDs tagged on each article. That just doesn't sound like it scales very well.

Cheers,
Mike

On Fri, Oct 22, 2010 at 10:43 PM, Lance Norskog wrote:
> http://wiki.apache.org/solr/CoreAdmin
>
> Since Solr 1.3
> [...]
Re: how well does multicore scale?
http://wiki.apache.org/solr/CoreAdmin

Since Solr 1.3

On Fri, Oct 22, 2010 at 1:40 PM, mike anderson wrote:
> Thanks for the advice, everyone. I'll take a look at the API mentioned and
> do some benchmarking over the weekend.
>
> -Mike
> [...]

--
Lance Norskog
goks...@gmail.com
Re: how well does multicore scale?
Thanks for the advice, everyone. I'll take a look at the API mentioned and do some benchmarking over the weekend.

-Mike

On Fri, Oct 22, 2010 at 8:50 AM, Mark Miller wrote:
> [...]
> You can dynamically manage cores with solrj. See
> org.apache.solr.client.solrj.request.CoreAdminRequest's static methods
> for a place to start.
>
> You probably want to turn solr.xml's persist option on so that your
> cores survive restarts.
> [...]
Re: how well does multicore scale?
On 10/22/10 1:44 AM, Tharindu Mathew wrote:
> Hi Mike,
>
> I've also considered using a separate cores in a multi tenant
> application, ie a separate core for each tenant/domain. But the cores
> do not suit that purpose.
>
> If you check out documentation no real API support exists for this so
> it can be done dynamically through SolrJ. And all use cases I found,
> only had users configuring it statically and then using it. That was
> maybe 2 or 3 cores. Please correct me if I'm wrong Solr folks.

You can dynamically manage cores with solrj. See org.apache.solr.client.solrj.request.CoreAdminRequest's static methods for a place to start.

You probably want to turn solr.xml's persist option on so that your cores survive restarts.

> So your better off using a single index and with a user id and use a
> query filter with the user id when fetching data.

Many times this is probably the case - pros and cons to each depending on what you are up to.

- Mark
lucidimagination.com
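The message above points at SolrJ's CoreAdminRequest. As a language-neutral illustration of the same idea, this sketch builds the equivalent CoreAdmin HTTP requests rather than using SolrJ. The `action`, `name`, `instanceDir` and `core` parameters come from the CoreAdmin handler. The base URL, the `/admin/cores` path (the usual default, but configurable in solr.xml), the core name, and the helper names are assumptions.

```python
from urllib.parse import urlencode

ADMIN_PATH = "/admin/cores"  # assumed default admin path; configurable in solr.xml


def create_core_url(base_url: str, core_name: str) -> str:
    """URL asking the CoreAdmin handler to create a core.

    Assumes the instance directory (with its conf/ files) already exists
    and has the same name as the core.
    """
    params = [("action", "CREATE"), ("name", core_name), ("instanceDir", core_name)]
    return f"{base_url}{ADMIN_PATH}?{urlencode(params)}"


def unload_core_url(base_url: str, core_name: str) -> str:
    """URL asking the CoreAdmin handler to unload a core."""
    params = [("action", "UNLOAD"), ("core", core_name)]
    return f"{base_url}{ADMIN_PATH}?{urlencode(params)}"


if __name__ == "__main__":
    base = "http://localhost:8983/solr"  # assumed local Solr instance
    print(create_core_url(base, "tenant42"))
    print(unload_core_url(base, "tenant42"))
```

Issuing these URLs with any HTTP client changes the running instance. As noted above, solr.xml's persist option still needs to be on for the new core to survive a restart.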
Re: how well does multicore scale?
On Fri, Oct 22, 2010 at 11:18 AM, Lance Norskog wrote:
> There is an API now for dynamically loading, unloading, creating and
> deleting cores.
> Restarting a Solr with thousands of cores will take, I don't know, hours.

Is this in the trunk? Any docs available?

--
Regards,
Tharindu
Re: how well does multicore scale?
There is an API now for dynamically loading, unloading, creating and
deleting cores.

Restarting a Solr instance with thousands of cores will take, I don't
know, hours.

On Thu, Oct 21, 2010 at 10:44 PM, Tharindu Mathew wrote:
> Hi Mike,
>
> I've also considered using separate cores in a multi-tenant
> application, i.e. a separate core for each tenant/domain. But the cores
> do not suit that purpose.
>
> If you check the documentation, no real API support exists for doing
> this dynamically through SolrJ. And all the use cases I found only had
> users configuring it statically and then using it. That was maybe 2 or
> 3 cores. Please correct me if I'm wrong, Solr folks.
>
> So you're better off using a single index with a user id, and using a
> query filter on the user id when fetching data.
>
> On Fri, Oct 22, 2010 at 1:12 AM, Jonathan Rochkind wrote:
>> No, it does not seem reasonable. Why do you think you need a separate
>> core for every user?
>>
>> mike anderson wrote:
>>> I'm exploring the possibility of using cores as a solution to
>>> "bookmark folders" in my Solr application. This would mean I'll need
>>> tens of thousands of cores... does this seem reasonable? I have
>>> plenty of CPUs available for scaling, but I wonder about the memory
>>> overhead of adding cores (aside from needing to fit the new index in
>>> memory).
>>>
>>> Thoughts?
>>>
>>> -mike
>
> --
> Regards,
>
> Tharindu

--
Lance Norskog
goks...@gmail.com
Re: how well does multicore scale?
Hi Mike,

I've also considered using separate cores in a multi-tenant
application, i.e. a separate core for each tenant/domain. But the cores
do not suit that purpose.

If you check the documentation, no real API support exists for doing
this dynamically through SolrJ. And all the use cases I found only had
users configuring it statically and then using it. That was maybe 2 or
3 cores. Please correct me if I'm wrong, Solr folks.

So you're better off using a single index with a user id, and using a
query filter on the user id when fetching data.

On Fri, Oct 22, 2010 at 1:12 AM, Jonathan Rochkind wrote:
> No, it does not seem reasonable. Why do you think you need a separate
> core for every user?
>
> mike anderson wrote:
>> I'm exploring the possibility of using cores as a solution to
>> "bookmark folders" in my Solr application. This would mean I'll need
>> tens of thousands of cores... does this seem reasonable? I have
>> plenty of CPUs available for scaling, but I wonder about the memory
>> overhead of adding cores (aside from needing to fit the new index in
>> memory).
>>
>> Thoughts?
>>
>> -mike

--
Regards,

Tharindu
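
To illustrate the single-index approach described above, here is a minimal sketch of a search request that restricts results to one user with Solr's fq (filter query) parameter. The field name user_id, the base URL, the /select path and the class and method names are assumptions for illustration; the field would have to exist in your own schema. The sketch only builds the URL and does not send it.

```java
import java.io.UnsupportedEncodingException;
import java.net.URLEncoder;

// Hypothetical helper: builds a /select URL that restricts results to one user.
final class UserFilterQuery {
    private UserFilterQuery() {}

    // Percent-encode a parameter value for use in a query string.
    static String enc(String value) {
        try {
            return URLEncoder.encode(value, "UTF-8");
        } catch (UnsupportedEncodingException e) {
            throw new IllegalStateException("UTF-8 is always available", e);
        }
    }

    // "user_id" is an assumed field name, not something Solr defines.
    static String selectUrl(String solrBase, String userQuery, String userId) {
        // Accept only plain alphanumeric ids so the id cannot alter the filter syntax.
        if (!userId.matches("[A-Za-z0-9]+")) {
            throw new IllegalArgumentException("unexpected user id: " + userId);
        }
        return solrBase + "/select?q=" + enc(userQuery)
                + "&fq=" + enc("user_id:" + userId);
    }

    public static void main(String[] args) {
        String base = "http://localhost:8983/solr"; // assumed base URL
        System.out.println(selectUrl(base, "multicore scaling", "ab12cd"));
    }
}
```

Keeping the user restriction in fq rather than in q means the main query stays the same for every user, and only the filter changes between them.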
Re: how well does multicore scale?
No, it does not seem reasonable. Why do you think you need a separate
core for every user?

mike anderson wrote:
> I'm exploring the possibility of using cores as a solution to
> "bookmark folders" in my Solr application. This would mean I'll need
> tens of thousands of cores... does this seem reasonable? I have plenty
> of CPUs available for scaling, but I wonder about the memory overhead
> of adding cores (aside from needing to fit the new index in memory).
>
> Thoughts?
>
> -mike