Solr Capacity Planning

2017-06-17 Thread Greenhorn Techie
Hi,

We are planning to set up a SolrCloud cluster for a search application over
a very large volume of data (on the order of hundreds of billions of Solr
documents). I would like to know whether there are any recommendations on how
to size the infrastructure and hardware for Solr clusters. Also, what are
the best practices to consider during this setup?

Thanks


Re: Solr Capacity Planning

2017-06-17 Thread Will Martin
MODERATOR REQUESTED: 

> On Jun 17, 2017, at 3:56 AM, Greenhorn Techie wrote:
> 
> Hi,
> 
> We are planning to setup a Solr cloud for building a search application on
> huge volumes of data points (~hundreds of billions of solr documents) I
> would like to understand if there is any recommendation on how to size the
> infrastructure and hardware requirements for Solr clusters. Also, what are
> the best practices to consider during this setup.
> 
> Thanks

Seriously.
Will Martin



Re: Solr Capacity Planning

2017-06-18 Thread Erick Erickson
I have no idea what Will's comment means, but I will say that at that
scale you'd probably be well advised to get some professional
consulting help; it'll save you a world of hassle.

The basic sizing exercise is here:
https://lucidworks.com/2012/07/23/sizing-hardware-in-the-abstract-why-we-dont-have-a-definitive-answer/

If you follow the process there you'll have an idea how many of your
docs with your queries you can fit on a machine of a particular size
and extrapolate from there. Expect there to be some "learning
experiences" along the way.
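
To make the "extrapolate from there" step concrete, here is a minimal
sketch of the arithmetic. It is not part of the linked sizing exercise.
The function name and every number in the example (200 billion documents,
250 million documents per shard, replication factor 2, 4 cores per node)
are hypothetical placeholders; the per-shard figure is exactly what you
would have to measure yourself with your own documents and queries.

```python
def estimate_cluster(total_docs, docs_per_shard, replication_factor=2,
                     cores_per_node=1):
    """Extrapolate shard, core, and node counts from one measured shard.

    docs_per_shard is the number of documents a single shard handled
    acceptably in a prototype; all inputs here are assumptions.
    """
    if min(total_docs, docs_per_shard, replication_factor, cores_per_node) <= 0:
        raise ValueError("all inputs must be positive")

    # Integer ceiling division avoids float rounding on very large counts.
    shards = -(-total_docs // docs_per_shard)
    cores = shards * replication_factor       # every replica is one core
    nodes = -(-cores // cores_per_node)
    return {"shards": shards, "cores": cores, "nodes": nodes}


if __name__ == "__main__":
    # Hypothetical inputs, for illustration only.
    print(estimate_cluster(200_000_000_000, 250_000_000,
                           replication_factor=2, cores_per_node=4))
```

The result is only as good as the measured per-shard figure, so treat it
as a starting point for a larger test rather than a purchase order.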

Best,
Erick





SOLR capacity planning and Disaster relief

2012-10-20 Thread Worthy LaFollette
CAVEAT: I am a newbie with respect to SOLR (some Lucene experience, but not
with SOLR itself) and am trying to come up to speed.


What have you all done with respect to SOLR capacity planning and disaster
recovery?

I am curious about the following:

 - File handles and other ulimit/profile concerns
 - Space calculations (particularly w/r to optimizations, etc.)
 - Taxonomy considerations
 - Single Core vs. Multi-core
 - ?

Also, has anyone planned disaster recovery for SOLR across non-metro data
centers? This is currently not an issue for me, but it will be shortly.


Re: SOLR capacity planning and Disaster relief

2012-10-23 Thread Otis Gospodnetic
Hi Worthy,

On Sun, Oct 21, 2012 at 2:30 AM, Worthy LaFollette wrote:
> CAVEAT: I am a newbie with respect to SOLR (some Lucene experience, but not
> with SOLR itself) and am trying to come up to speed.
>
>
> What have you all done with respect to SOLR capacity planning and disaster
> recovery?

Re capacity planning: performance testing with realistic datasets,
query types, and query rates, combined with monitoring tools that show
you system and Solr metrics, will get you far.  Ongoing monitoring of a
running system will let you understand trends and bottlenecks, and
figure out whether you need to get ready to buy more RAM, add servers,
or ...
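
As a rough illustration of turning performance-test results into
something you can track, the sketch below summarizes query latencies
collected during a test run and compares the 95th percentile against a
latency budget. The function names, the choice of the nearest-rank
percentile, and the sample values are all assumptions made for this
example; real numbers would come from your own load tests.

```python
import math


def percentile(samples, p):
    """Nearest-rank percentile of a non-empty sequence (p in 1..100)."""
    if not samples:
        raise ValueError("samples must not be empty")
    if not 0 < p <= 100:
        raise ValueError("p must be in the range 1..100")
    ordered = sorted(samples)
    rank = math.ceil(p * len(ordered) / 100)   # 1-based rank
    return ordered[rank - 1]


def summarize(latencies_ms, budget_ms):
    """Report median and p95 latency and whether p95 meets the budget."""
    p50 = percentile(latencies_ms, 50)
    p95 = percentile(latencies_ms, 95)
    return {"p50_ms": p50, "p95_ms": p95, "within_budget": p95 <= budget_ms}


if __name__ == "__main__":
    # Hypothetical latencies in milliseconds from one test run.
    run = [10, 20, 30, 40, 50, 60, 70, 80, 90, 100]
    print(summarize(run, budget_ms=120))
```

Repeating the same summary as the index grows gives a simple trend line
for when more RAM or more servers are likely to be needed.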

> I am curious about the following:
>
>  - File handles and other ulimit/profile concerns

Not often a concern any more.  Typical Linux systems come with a default
limit of 1024 open files per process, which is often insufficient, so
people raise it to 20K, 30K, etc.
I *think* we have this system metric in SPM for Solr, but I'm not sure
right now.
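
For a quick check of the open-file limit, a small sketch using Python's
standard `resource` module (Unix only) is shown below. It reads the limit
of the process that runs the script, so it only reflects Solr's limit if
it is run as the same user with the same profile. The 30,000 threshold is
just the "30K" figure mentioned above, not a recommendation, and the
function names are made up for this example.

```python
import resource


def open_file_limits():
    """Return the (soft, hard) open-file limits of the current process."""
    return resource.getrlimit(resource.RLIMIT_NOFILE)


def is_limit_sufficient(soft_limit, wanted=30_000):
    """True if the soft limit is unlimited or at least `wanted`."""
    return soft_limit == resource.RLIM_INFINITY or soft_limit >= wanted


if __name__ == "__main__":
    soft, hard = open_file_limits()
    print(f"soft={soft} hard={hard} sufficient={is_limit_sufficient(soft)}")
```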

>  - Space calculations (particularly w/r to optimizations, etc.)

Again, monitoring is the best way to tell and to keep an eye on this.
Optimization can temporarily take ~3x the index's disk space, if I
remember correctly.  You can also check the mailing list archives for
recent emails about index optimization.
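
A minimal sketch of the space check follows. It takes the ~3x figure
above at face value as a default peak factor; that figure is a
recollection, not a measured value, so adjust it to what you observe. The
function names and the example sizes are hypothetical.

```python
import shutil


def free_bytes_at(path):
    """Free bytes on the filesystem holding `path` (e.g. the index dir)."""
    return shutil.disk_usage(path).free


def optimize_headroom(index_bytes, free_bytes, peak_factor=3.0):
    """Estimate whether an optimize fits, assuming peak usage of
    peak_factor times the current index size (an assumption)."""
    peak = int(index_bytes * peak_factor)
    extra_needed = peak - index_bytes      # space beyond the current index
    return {
        "peak_bytes": peak,
        "extra_needed_bytes": extra_needed,
        "fits": free_bytes >= extra_needed,
    }


if __name__ == "__main__":
    GIB = 2 ** 30
    # Hypothetical: a 100 GiB index on a disk with 150 GiB free.
    print(optimize_headroom(100 * GIB, 150 * GIB))
    # Pass free_bytes_at("<index directory>") to use the real free space.
```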

>  - Taxonomy considerations

I think this is typically DIY.

>  - Single Core vs. Multi-core

Not sure what to say here.  Typically one type of data goes in one
core.  You typically don't put people records, product records, and
order records in the same core because these three things have
different structures/schemas.

>  - ?
>
> Also, has anyone planned disaster recovery for SOLR across non-metro data
> centers? This is currently not an issue for me, but it will be shortly.

Have a look at http://wiki.apache.org/solr/SolrReplication#Setting_up_a_Repeater

Otis
--
Search Analytics - http://sematext.com/search-analytics/index.html
Performance Monitoring - http://sematext.com/spm/index.html