Re: SOLR - Recommendation on architecture

2013-03-12 Thread Shawn Heisey

On 3/12/2013 4:12 AM, kobe.free.wo...@gmail.com wrote:

Following is the production scenario:

1. Web Server 1 (with the above-mentioned configuration) - will be hosting a Solr
instance and the web site.
2. Web Server 2 (with the above-mentioned configuration) - will be hosting a
second Solr instance and the web site.

Does this scenario look fine with respect to indexing/searching performance?
Also, the front end consists of .NET web applications that issue queries via HTTP
requests to our searchers.


It's always recommended that Solr live on separate hardware from
everything else, and I'll add my +1 to wunder's "don't use Windows" note
here too.  You've already gotten some awesome replies about why; here are
my two cents:


Busy web servers, especially those that run full applications, tend to 
be hungry for CPU and RAM resources.  This also describes Solr, which is 
itself a web application (java servlet).  If Solr is not the only thing 
on the box, then nobody can even make a guess about whether the hardware 
you're using will be big enough.  Even when Solr is the only thing on 
the box, advice found here is often only a guess.  Adding additional 
software to the machine guarantees that it's a guess, and a vague one at 
best.


If your servers have plenty of CPU and RAM left over even when the web 
server reaches peak load, then you might be OK.  The 500 users per 
minute figure you've given sounds like a lot.
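One way to turn the 500 users per minute figure into a number you can benchmark against is Little's law: the average number of requests in flight equals the arrival rate times the average response time. The sketch below assumes each user issues exactly one search request and treats the 200 ms and 800 ms targets from the original question as average response times. Both are simplifying assumptions, and real traffic arrives in bursts, so peak concurrency will be higher than this average.

```python
def requests_per_second(requests_per_minute: float) -> float:
    """Convert a per-minute request rate into requests per second."""
    return requests_per_minute / 60.0


def average_concurrency(requests_per_minute: float, avg_response_s: float) -> float:
    """Little's law: average in-flight requests = arrival rate * average time in system."""
    return requests_per_second(requests_per_minute) * avg_response_s


if __name__ == "__main__":
    # Assumption: one search request per user, 500 users per minute.
    for latency_s in (0.2, 0.8):  # the 200 ms and 800 ms targets, treated as averages
        print(f"{latency_s * 1000:.0f} ms -> "
              f"{requests_per_second(500):.2f} req/s, "
              f"{average_concurrency(500, latency_s):.2f} requests in flight")
```

Under those assumptions this works out to about 8.3 requests per second and, on average, roughly 2 to 7 requests in flight at once, before accounting for bursts or for the web application's own load on the same box.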


Note that any enumeration of RAM resources must include the amount of 
required OS disk cache, not just the amount of RAM required by the 
applications themselves.  Here's a blog post about how Lucene (and Solr) 
uses RAM and the OS disk cache.  When it's big enough, the OS disk cache 
is helpful even for applications that don't use MMap:


http://blog.thetaphi.de/2012/07/use-lucenes-mmapdirectory-on-64bit.html
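The point about counting the OS disk cache can be written down as simple bookkeeping: the disk cache only gets whatever RAM is left after the Solr heap, the other applications on the box, and the OS itself have taken their share. This is a minimal sketch; every figure in the example run (heap, web application usage, OS overhead, index size) is hypothetical and would have to be replaced with measured numbers.

```python
def disk_cache_left_gb(total_ram_gb: float, solr_heap_gb: float,
                       other_apps_gb: float, os_overhead_gb: float) -> float:
    """RAM left for the OS disk cache after every process has taken its share."""
    return max(0.0, total_ram_gb - solr_heap_gb - other_apps_gb - os_overhead_gb)


def cached_fraction(index_gb: float, cache_gb: float) -> float:
    """Fraction of the index that fits in the disk cache (1.0 means all of it)."""
    return min(1.0, cache_gb / index_gb) if index_gb > 0 else 1.0


def ram_needed_gb(index_gb: float, solr_heap_gb: float,
                  other_apps_gb: float, os_overhead_gb: float) -> float:
    """Total RAM required so that the whole index can sit in the disk cache."""
    return index_gb + solr_heap_gb + other_apps_gb + os_overhead_gb


if __name__ == "__main__":
    # Hypothetical figures for a 12GB server shared with web applications.
    cache = disk_cache_left_gb(12, solr_heap_gb=4, other_apps_gb=4, os_overhead_gb=2)
    print(f"disk cache: {cache} GB, index cached: {cached_fraction(20, cache):.0%}")
    print(f"RAM to cache a 20 GB index: {ram_needed_gb(20, 4, 4, 2)} GB")
```

The design choice is deliberately additive: anything else running on the server (IIS, .NET applications) comes straight out of the disk cache, which is why sharing the box makes any sizing estimate so uncertain.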

The number of documents (800K) and fields (450 per doc) that you've 
mentioned sounds like it will produce an index size that's way too big 
to fit in the OS disk cache on a 12GB server, unless all of those fields 
contain numeric data encoded in numeric data types rather than fully 
tokenized text.
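For scale, 800K documents with 450 fields each is 360 million field values. Rather than guessing bytes per field, a safer estimate comes from indexing a representative sample of real records and scaling up the measured index size. The sketch below does a plain linear extrapolation; the index path and the 50,000-document sample size are hypothetical. Linear scaling is only a ballpark: stored fields grow roughly in proportion to document count, but the term dictionary usually grows more slowly, so the real index may come out somewhat smaller.

```python
import os


def directory_size_bytes(path: str) -> int:
    """Sum the sizes of all files under an index directory (0 if it does not exist)."""
    total = 0
    for root, _dirs, files in os.walk(path):
        for name in files:
            total += os.path.getsize(os.path.join(root, name))
    return total


def extrapolate_index_gb(sample_docs: int, sample_index_bytes: int,
                         total_docs: int) -> float:
    """Scale a sample index linearly to the full document count, in GiB."""
    return sample_index_bytes / sample_docs * total_docs / 1024 ** 3


if __name__ == "__main__":
    # Hypothetical path to a prototype index built from 50,000 real records.
    sample_bytes = directory_size_bytes("solr/collection1/data/index")
    if sample_bytes:
        print(f"~{extrapolate_index_gb(50_000, sample_bytes, 800_000):.1f} GiB")
```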


If you are searching and/or filtering on very many of those fields, plus 
facets, Solr is going to require a lot of heap memory, further reducing 
the amount of OS disk cache available.  With a web application receiving 
several hundred requests per minute running on the same hardware, 12GB 
probably won't be anywhere near enough ... I'd say the absolute minimum 
you'd want to consider for your combined setup would be 64GB, and more 
might be a good idea.  Depending on the total index size, 32GB might be 
enough for a dedicated Solr server.
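A rough way to see why heap grows with the number of sorted and faceted fields: sorting and single-valued faceting typically keep an array with one entry per document for each field involved, and a filterCache entry can be held as a bitset with one bit per document. This sketch assumes 8 bytes per array entry (the size of a long or double), plus a hypothetical 20 sorted/faceted fields and 512 cached filters. String fields, multi-valued facets, and Solr's own overhead all change the real numbers, so treat it as an order-of-magnitude estimate rather than a sizing formula.

```python
def sort_facet_array_bytes(max_doc: int, num_fields: int,
                           bytes_per_entry: int = 8) -> int:
    """Rough heap for per-document arrays: one fixed-width entry per doc per field."""
    return max_doc * num_fields * bytes_per_entry


def filter_cache_worst_case_bytes(max_doc: int, entries: int) -> int:
    """Worst-case filterCache size: every entry stored as a bitset of max_doc bits."""
    return entries * ((max_doc + 7) // 8)


if __name__ == "__main__":
    docs = 800_000
    # Hypothetical: 20 sorted/faceted fields and 512 cached filters.
    print(f"{sort_facet_array_bytes(docs, 20) / 1024 ** 2:.0f} MiB for sort/facet arrays")
    print(f"{filter_cache_worst_case_bytes(docs, 512) / 1024 ** 2:.0f} MiB for filterCache")
```

With these assumptions the arrays come to roughly 122 MiB and the filterCache to roughly 49 MiB; if all 450 fields were sorted or faceted, the array estimate alone would grow to about 2.7 GiB, all of it taken away from the OS disk cache.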


Thanks,
Shawn



Re: SOLR - Recommendation on architecture

2013-03-08 Thread Gora Mohanty
On 8 March 2013 14:19, Kobe J kobe.free.wo...@gmail.com wrote:
 We are planning to use SOLR 4.1 for full text indexing. Following is the
 hardware configuration of the web server that we plan to install SOLR on:-

 *CPU*: 2 x Dual Core (4 cores)

 *RAM*: 12GB

 *Storage*: 212GB

 *OS Version* – Windows 2008 R2
[...]

As with most things, the devil is in the details: what kind of
queries are you planning to run, and what search features
will you be using (e.g., faceting, sorting, highlighting)?
A desired query response time is meaningless without also
specifying the number of simultaneous users. Your best bet
is to set up a prototype and benchmark your search.
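A minimal benchmarking sketch for such a prototype: it replays a list of queries with a fixed number of concurrent clients and reports median, 95th-percentile, and maximum latency. The endpoint URL (host, port, and the `collection1` core name) is a hypothetical default to adjust; the query function is passed in so the harness can be reused against any client.

```python
import concurrent.futures
import math
import statistics
import time
import urllib.parse
import urllib.request

# Hypothetical prototype endpoint: adjust host, port, and core name.
SOLR_SELECT_URL = "http://localhost:8983/solr/collection1/select"


def solr_query(params: dict) -> None:
    """Send one query to the assumed select handler and read the whole response."""
    url = SOLR_SELECT_URL + "?" + urllib.parse.urlencode(params)
    with urllib.request.urlopen(url, timeout=30) as resp:
        resp.read()


def benchmark(query_fn, queries, concurrency: int) -> dict:
    """Run every query with `concurrency` parallel clients; report latencies in ms."""

    def timed(query) -> float:
        start = time.perf_counter()
        query_fn(query)
        return (time.perf_counter() - start) * 1000.0

    with concurrent.futures.ThreadPoolExecutor(max_workers=concurrency) as pool:
        latencies = sorted(pool.map(timed, queries))
    if not latencies:
        raise ValueError("no queries given")

    def nearest_rank(p: float) -> float:
        # Nearest-rank percentile over the sorted latency list.
        rank = max(1, math.ceil(p / 100.0 * len(latencies)))
        return latencies[rank - 1]

    return {
        "count": len(latencies),
        "median_ms": statistics.median(latencies),
        "p95_ms": nearest_rank(95),
        "max_ms": latencies[-1],
    }
```

For example, `benchmark(solr_query, queries, concurrency=8)`, with `concurrency` set to reflect your expected peak load. Replaying real queries (for instance from web server logs) is more realistic than repeating a single one such as `{"q": "*:*", "rows": 10}`, because Solr's caches make repeated identical queries look unrealistically fast.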

Having said that, your proposed hardware seems more than
adequate for your needs:
1. If possible, use SSDs or fast disks
2. I would not use Windows as a server platform

Regards,
Gora


Re: SOLR - Recommendation on architecture

2013-03-08 Thread Jilal Oussama
I would not recommend Windows either.


2013/3/8 Kobe J kobe.free.wo...@gmail.com

 We are planning to use SOLR 4.1 for full text indexing. Following is the
 hardware configuration of the web server that we plan to install SOLR on:-

 *CPU*: 2 x Dual Core (4 cores)

 *RAM*: 12GB

 *Storage*: 212GB

 *OS Version* – Windows 2008 R2



 The dataset to be imported will have approx. 800k records, with 450 fields
 per record. Query response time should be between 200 ms and 800 ms.



 Please advise whether the current single-server implementation should work
 and whether the specified configuration is sufficient for the requirement.



Re: SOLR - Recommendation on architecture

2013-03-08 Thread Upayavira
Because?

Upayavira

On Fri, Mar 8, 2013, at 09:27 AM, Jilal Oussama wrote:
 I would not recommend Windows either.
 [...]


Re: SOLR - Recommendation on architecture

2013-03-08 Thread Upayavira
If you are attempting to assess performance, you should use as many
records as you can muster. A Lucene index does start to struggle at a
certain size, and you may be getting close to that, depending upon the
size of your fields.

Are you suggesting that you would host other services on the server as
well? I would expect your Solr instance to want sole use of the server,
as an index of your size will demand it. 

Upayavira

On Fri, Mar 8, 2013, at 10:02 AM, kobe.free.wo...@gmail.com wrote:
 Thanks for your suggestion, Gora.
 
 Yes, we are planning to use faceting and sorting. The number of
 simultaneous users would be around 500 per minute. We preferred Windows
 since the server would also be hosting some of our Microsoft-based web
 applications. For prototyping, given the number of records we will be
 working with, how many records do you suggest we include in the
 prototype?
 
 
 
 
 
 --
 View this message in context:
 http://lucene.472066.n3.nabble.com/SOLR-Recommendation-on-architecture-tp4045718p4045734.html
 Sent from the Solr - User mailing list archive at Nabble.com.


Re: SOLR - Recommendation on architecture

2013-03-08 Thread Walter Underwood
Your servers seem to be about the right size, but as everyone else has said,
it depends on the kinds of queries.

Solr should be the only service on the system. Solr can make heavy use of the
disk, which will interfere with other processes. If you are lucky enough to get
the system tuned to run from RAM, it can use 100% of CPU. Tuning Solr will be
very difficult with other services sharing the same system.

If you need to meet an SLA, you will have a hard time doing that on a shared 
server. When you don't meet that SLA, it will be almost impossible to diagnose 
why.

Why not Windows?

* the Windows filesystem is not designed for heavy server use
* Windows does not allow open files to be deleted -- there are workarounds for 
this in Solr, but it is a continuing problem
* the Windows file cache is organized by file, not by block, which is 
inefficient for Solr's access pattern
* Java on Windows works, but has a number of workarounds and quirks
* the Solr community is almost all Unix users, so you will get much better help 
on Unix

wunder

On Mar 8, 2013, at 3:04 AM, Upayavira wrote:

 If you are attempting to assess performance, you should use as many
 records as you can muster. A Lucene index does start to struggle at a
 certain size, and you may be getting close to that, depending upon the
 size of your fields.
 
 Are you suggesting that you would host other services on the server as
 well? I would expect your Solr instance to want sole use of the server,
 as an index of your size will demand it. 
 
 Upayavira
 
 On Fri, Mar 8, 2013, at 10:02 AM, kobe.free.wo...@gmail.com wrote:
 Thanks for your suggestion Gora.
 
 Yes, we are planning to use faceting, sorting features. The number of
 simultaneous users would be around 500 per min. We have preferred windows
 since the server would also be hosting some of our Microsoft based web
 applications. For prototyping, given the number of records we will be
 working with, what number of records do you suggest should we include in
 prototyping?
 
 --
 View this message in context:
 http://lucene.472066.n3.nabble.com/SOLR-Recommendation-on-architecture-tp4045718p4045734.html
 Sent from the Solr - User mailing list archive at Nabble.com.