Re: Can we point a Solr server to index directory dynamically at runtime..

2009-09-25 Thread Silent Surfer
Hi Michael,

We are storing all our data in addition to indexing it, as we need to display those
values to the user. So unfortunately we cannot go with the stored=false option,
which could otherwise have solved our issue.

Appreciate any other pointers/suggestions.

Thanks,
sS



Re: Can we point a Solr server to index directory dynamically at runtime..

2009-09-25 Thread Michael
Are you storing (in addition to indexing) your data?  Perhaps you could turn
off storage on data older than 7 days (which requires reindexing), thus losing the
ability to return snippets but cutting down on your storage space and server
count.  I've seen a 10x decrease in space requirements and a large
boost in speed after cutting extraneous storage from Solr -- the stored data
is mixed in with the index data, so it slows down searches.
You could also put all 200 GB onto one Solr instance rather than 10 for the
>7-day-old data, and accept that those searches will be slower.

Michael
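For reference, storage is a per-field setting in Solr's schema.xml. A minimal sketch of what "turning off storage" means (the field names and types here are illustrative, not taken from this thread):

```xml
<!-- schema.xml sketch: search the body, but do not store it.
     stored="false" saves index space; the field can then no longer be
     returned in results or used for snippets. -->
<field name="body"      type="text" indexed="true" stored="false"/>
<!-- small fields that must be displayed can remain stored -->
<field name="timestamp" type="date" indexed="true" stored="true"/>
```

Changing indexed/stored attributes takes effect only for newly indexed documents, which is why the suggestion above requires reindexing.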



Re: Can we point a Solr server to index directory dynamically at runtime..

2009-09-24 Thread Silent Surfer
Hi,

Thank you Michael and Chris for the response. 

Today, after the mail from Michael, we tested the dynamic loading of cores
and it worked well. So we will go with the hybrid approach of multicore and
distributed searching.

As per our testing, we found that a Solr instance with 20 GB of index (a single
index or one spread across multiple cores) provides better performance than a
Solr instance with, say, 40 or 50 GB of index (again, a single index or one
spread across cores).

So the 200 GB of index on day 1 will be spread across 200/20 = 10 Solr slave
instances.

For day 2's data, 10 more Solr slave servers are required; cumulative Solr slave
instances = 200*2/20 = 20
...
For day 30's data, 10 more Solr slave servers are required; cumulative Solr slave
instances = 200*30/20 = 300

So with the above approach, we may need ~300 Solr slave instances, which
becomes very unmanageable.

But we know that most of the queries are for the past 1 week, i.e. we definitely
need the 70 Solr slaves containing the last 7 days' worth of data up and running.

Now, for the remaining 230 Solr instances, do we need to keep them running just
for the odd query that spans all 30 days of data (30*200 GB = 6 TB), which may
come up only a couple of times a day?
This linear increase of Solr servers with the retention period doesn't seem to
be a very scalable solution.

So we are looking for a simpler approach to handle this scenario.

Appreciate any further inputs/suggestions.

Regards,
sS
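The server-count arithmetic above can be sketched in a few lines (the 200 GB/day and 20 GB/instance figures are the ones from this thread; the split into "hot" and "cold" instances mirrors the 7-day/30-day discussion):

```python
# Rough capacity math for the sizing described above.
GB_PER_DAY = 200        # new index generated each day
GB_PER_INSTANCE = 20    # index size at which one Solr instance performs well
RETENTION_DAYS = 30
HOT_DAYS = 7            # days of data that must stay on dedicated slaves

def instances_for(days):
    """Cumulative slave instances needed to hold `days` worth of index."""
    return days * GB_PER_DAY // GB_PER_INSTANCE

total = instances_for(RETENTION_DAYS)   # 300 instances for the full month
hot = instances_for(HOT_DAYS)           # 70 instances for the last 7 days
cold = total - hot                      # 230 instances serving rare queries
print(total, hot, cold)                 # 300 70 230
```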








Re: Can we point a Solr server to index directory dynamically at runtime..

2009-09-24 Thread Chris Hostetter
Something that seems implicit in the question is what to do when the
request spans all of the data ... this is where (in theory) distributed
searching could help you out.

Index each day's worth of data into its own core; that makes it really
easy to expire the old data (just UNLOAD and delete an entire core once
it's more than 30 days old). If your user is only searching "current" data,
then your app can directly query the core containing the most current data
-- but if they want to query the last week's or last two weeks' worth of
data, you do a distributed request across all of the shards needed to cover
the appropriate amount of data.

Between the ALIAS and SWAP commands on the CoreAdmin screen, it should
be pretty easy to have cores with names like "today", "1dayold", "2dayold",
so that your app can configure simple shard params for all the permutations
you'll need to query.


-Hoss
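To make the distributed-request idea concrete: a distributed search in Solr is driven by a shards request parameter listing the cores to query. A small sketch of building that parameter for the per-day core naming suggested above (the host name is an assumption for illustration):

```python
# Build the `shards` parameter for a distributed query over the last N days,
# assuming one core per day named "today", "1dayold", "2dayold", ...
# The host/port below is illustrative.

def core_name(days_old):
    return "today" if days_old == 0 else f"{days_old}dayold"

def shards_param(days, host="solr1.example.com:8983"):
    """Comma-separated shard list, suitable for &shards=... on a query."""
    return ",".join(f"{host}/solr/{core_name(d)}" for d in range(days))

# A 3-day query would then hit one core with:
#   /solr/today/select?q=...&shards=<string below>
print(shards_param(3))
```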



Re: Can we point a Solr server to index directory dynamically at runtime..

2009-09-24 Thread Michael
Using a multicore approach, you could send a "create a core named
'core3weeksold' pointing to '/datadirs/3weeksold'" command to a live Solr,
which would spin it up on the fly.  Then you query it, and maybe keep it
spun up until it hasn't been queried for 60 seconds or so, then send a
"remove core 'core3weeksold'" command.
See http://wiki.apache.org/solr/CoreAdmin#CoreAdminHandler .

Michael
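The CoreAdmin handler is driven by plain HTTP requests, so the create/remove commands above are just URLs. A sketch of building them (the host and the instanceDir value are assumptions for illustration; the data directory path is the one from this thread):

```python
# Sketch: CoreAdmin CREATE/UNLOAD request URLs for spinning cores up and down.
# action/name/instanceDir/dataDir/core are standard CoreAdmin parameters;
# the localhost URL and instanceDir value are illustrative.
from urllib.parse import urlencode

ADMIN = "http://localhost:8983/solr/admin/cores"

def create_core_url(name, instance_dir, data_dir):
    """URL that creates (and loads) a core pointed at an existing index dir."""
    return ADMIN + "?" + urlencode({
        "action": "CREATE",
        "name": name,
        "instanceDir": instance_dir,
        "dataDir": data_dir,
    })

def unload_core_url(name):
    """URL that unloads the core again once it has gone idle."""
    return ADMIN + "?" + urlencode({"action": "UNLOAD", "core": name})

# e.g. urllib.request.urlopen(create_core_url(...)) against a live Solr
print(create_core_url("core3weeksold", "core3weeksold", "/datadirs/3weeksold"))
print(unload_core_url("core3weeksold"))
```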



Can we point a Solr server to index directory dynamically at runtime..

2009-09-23 Thread Silent Surfer
Hi,

Is there any way to dynamically point a Solr server to an index/data
directory at run time?

We are generating 200 GB worth of index per day, and we want to retain the index
for approximately 1 month. Our idea is to keep the most recent week of index
available at any time for the users, i.e. have a set of Solr servers up and
running to handle requests for the past week's data.

But when a user queries data older than 7 days, we want to
dynamically point the existing Solr instances to the inactive/dormant indexes
and get the results.

The main intention is to limit the number of Solr slave instances and thereby
limit the number of servers required.

If the index directories and Solr instances are tightly coupled, then most of the
Solr instances will just be up and running while hardly being used, as most
users are mainly interested in the past week's data and not beyond.

Any thoughts or any other approaches to tackle this would be greatly 
appreciated.

Thanks,
sS