Re: SOLR - Recommendation on architecture
On 3/12/2013 4:12 AM, kobe.free.wo...@gmail.com wrote:
Following is the prod scenario:
1. Web Server 1 (with above mentioned configuration) - will be hosting Solr instance and web site.
2. Web Server 2 (with above mentioned configuration) - will be hosting second Solr instance and web site.
Does this scenario look fine w.r.t. the indexing/searching performance? Also, on the front end are .NET web applications that issue queries via HTTP requests to our searchers.

It's always recommended that Solr live on separate hardware from everything else, and I'll add my +1 to wunder's "don't use Windows" note here too. You've already gotten some awesome replies about why; here's my two cents:

Busy web servers, especially those that run full applications, tend to be hungry for CPU and RAM resources. This also describes Solr, which is itself a web application (a Java servlet). If Solr is not the only thing on the box, then nobody can even make a guess about whether the hardware you're using will be big enough. Even when Solr is the only thing on the box, advice found here is often only a guess. Adding additional software to the machine guarantees that it's a guess, and a vague one at best. If your servers have plenty of CPU and RAM left over even when the web server reaches peak load, then you might be OK. The 500 users per minute figure you've given sounds like a lot.

Note that any enumeration of RAM resources must include the amount of OS disk cache required, not just the amount of RAM required by the applications themselves. Here's a blog post about how Lucene (and Solr) uses RAM and the OS disk cache.
When it's big enough, the OS disk cache is helpful even for applications that don't use MMap: http://blog.thetaphi.de/2012/07/use-lucenes-mmapdirectory-on-64bit.html

The number of documents (800K) and fields (450 per doc) that you've mentioned sounds like it will produce an index that's way too big to fit in the OS disk cache on a 12GB server, unless all of those fields contain numeric data encoded in numeric field types rather than fully tokenized text. If you are searching and/or filtering on very many of those fields, plus faceting, Solr is going to require a lot of heap memory, further reducing the amount of OS disk cache available.

With a web application receiving several hundred requests per minute running on the same hardware, 12GB probably won't be anywhere near enough. I'd say the absolute minimum you'd want to consider for your combined setup would be 64GB, and more might be a good idea. Depending on the total index size, 32GB might be enough for a dedicated Solr server.

Thanks,
Shawn
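Shawn's sizing argument is mostly arithmetic: whatever RAM the Solr heap, the co-located web applications and the OS itself take is RAM the OS cannot use to cache the index. The Python sketch below makes that budget explicit. Every input is a hypothetical placeholder, including the average on-disk bytes per field value used to guess an index size; a real estimate should come from measuring an actual test index, since tokenization, stored fields and the choice of field types change the size a lot.

```python
"""Rough RAM budget for a Solr host (a sketch; all inputs are hypothetical)."""


def estimate_index_gb(num_docs: int, fields_per_doc: int, avg_bytes_per_value: float) -> float:
    """Crude index-size guess: docs x fields x assumed bytes per field value (decimal GB)."""
    return num_docs * fields_per_doc * avg_bytes_per_value / 1e9


def os_cache_headroom_gb(total_ram_gb: float, solr_heap_gb: float,
                         other_apps_gb: float, os_overhead_gb: float = 1.0) -> float:
    """RAM left for the OS disk cache after the Solr heap, other processes and the OS."""
    return total_ram_gb - solr_heap_gb - other_apps_gb - os_overhead_gb


def cache_coverage(index_size_gb: float, headroom_gb: float) -> float:
    """Fraction of the on-disk index that could stay in the OS disk cache (0.0 to 1.0)."""
    if index_size_gb <= 0:
        return 1.0
    return max(0.0, min(1.0, headroom_gb / index_size_gb))


if __name__ == "__main__":
    # 800K docs x 450 fields, with a purely hypothetical 50 bytes per field value.
    index_gb = estimate_index_gb(800_000, 450, 50)
    # Shared 12GB box: hypothetical 4GB Solr heap and 4GB for the .NET web apps.
    shared = os_cache_headroom_gb(12, 4, 4)
    # Dedicated 32GB box: hypothetical 8GB heap, nothing else running.
    dedicated = os_cache_headroom_gb(32, 8, 0)
    print(f"index ~{index_gb:.1f} GB")
    print(f"shared 12GB:    {cache_coverage(index_gb, shared):.0%} of index cacheable")
    print(f"dedicated 32GB: {cache_coverage(index_gb, dedicated):.0%} of index cacheable")
```

With these placeholder inputs the shared box can cache only about a sixth of the guessed index, while the dedicated box can hold all of it. The point is the shape of the calculation, not the specific numbers.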
Re: SOLR - Recommendation on architecture
On 8 March 2013 14:19, Kobe J kobe.free.wo...@gmail.com wrote:
We are planning to use SOLR 4.1 for full text indexing. Following is the hardware configuration of the web server that we plan to install SOLR on:
*CPU*: 2 x Dual Core (4 cores)
*RAM*: 12GB
*Storage*: 212GB
*OS Version*: Windows 2008 R2
[...]

As with most things, the devil is in the details: What kind of queries are you planning to run, and what search features will you be using, e.g., faceting, sorting, highlighting, etc.? A desired query response time is meaningless without also specifying the number of simultaneous users. Your best bet is to set up a prototype and benchmark your search.

Having said that, your proposed hardware seems more than adequate for your needs:
1. If possible, use SSDs or fast disks.
2. I would not use Windows as a server platform.

Regards,
Gora
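Gora's advice to prototype and benchmark could start from a small harness like this Python sketch. It times a batch of queries, optionally from several threads at once so the measurement reflects simultaneous users, and checks a nearest-rank percentile against a latency target. The Solr URL, the `collection1` core name and the `SOLR_BASE_URL`, `benchmark` and `meets_target` names are assumptions for illustration (the URL matches a stock Solr 4.x example setup, not anyone's real deployment).

```python
"""Minimal latency benchmark sketch for a Solr prototype (assumed URL and queries)."""
import math
import time
import urllib.parse
import urllib.request
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, List, Sequence

# Assumption: a local Solr 4.x example instance with the default core name.
SOLR_BASE_URL = "http://localhost:8983/solr/collection1"


def make_solr_query(base_url: str = SOLR_BASE_URL) -> Callable[[str], bytes]:
    """Return a function that sends one /select request and returns the raw body."""
    def run(q: str) -> bytes:
        params = urllib.parse.urlencode({"q": q, "wt": "json"})
        with urllib.request.urlopen(f"{base_url}/select?{params}", timeout=10) as resp:
            return resp.read()
    return run


def timed_call(run_query: Callable[[str], object], q: str) -> float:
    """Run one query and return its wall-clock latency in milliseconds."""
    start = time.perf_counter()
    run_query(q)
    return (time.perf_counter() - start) * 1000.0


def benchmark(run_query: Callable[[str], object], queries: Sequence[str],
              concurrency: int = 1) -> List[float]:
    """Time every query, running up to `concurrency` of them at the same time."""
    with ThreadPoolExecutor(max_workers=concurrency) as pool:
        return list(pool.map(lambda q: timed_call(run_query, q), queries))


def percentile(values: Sequence[float], pct: float) -> float:
    """Nearest-rank percentile (pct in 0..100) of a non-empty sequence."""
    ordered = sorted(values)
    rank = math.ceil(pct * len(ordered) / 100)
    return ordered[max(rank, 1) - 1]


def meets_target(timings_ms: Sequence[float], target_ms: float = 800.0,
                 pct: float = 95.0) -> bool:
    """True if the pct-th percentile latency is within the target."""
    return percentile(timings_ms, pct) <= target_ms
```

Against a running prototype, something like `meets_target(benchmark(make_solr_query(), queries, concurrency=8))` would check the 95th percentile against the 800ms upper bound. `queries` should be real queries that include the facets and sorts you plan to use; simple term queries will flatter the hardware.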
Re: SOLR - Recommendation on architecture
I would not recommend Windows either.

2013/3/8 Kobe J kobe.free.wo...@gmail.com
We are planning to use SOLR 4.1 for full text indexing. Following is the hardware configuration of the web server that we plan to install SOLR on:
*CPU*: 2 x Dual Core (4 cores)
*RAM*: 12GB
*Storage*: 212GB
*OS Version*: Windows 2008 R2
The dataset to be imported will have approx. 800k records, with 450 fields per record. Query response time should be between 200ms and 800ms. Please suggest if the current single server implementation should work fine and if the specified configuration is enough for the requirement.
Re: SOLR - Recommendation on architecture
Because?

Upayavira

On Fri, Mar 8, 2013, at 09:27 AM, Jilal Oussama wrote:
I would not recommend Windows too
Re: SOLR - Recommendation on architecture
If you are attempting to assess performance, you should use as many records as you can muster. A Lucene index does start to struggle at a certain size, and you may be getting close to that, depending upon the size of your fields.

Are you suggesting that you would host other services on the server as well? I would expect your Solr instance to want sole use of the server, as an index of your size will demand it.

Upayavira

On Fri, Mar 8, 2013, at 10:02 AM, kobe.free.wo...@gmail.com wrote:
Thanks for your suggestion, Gora. Yes, we are planning to use faceting and sorting features. The number of simultaneous users would be around 500 per min. We have preferred Windows since the server would also be hosting some of our Microsoft-based web applications. For prototyping, given the number of records we will be working with, how many records do you suggest we should include?

--
View this message in context: http://lucene.472066.n3.nabble.com/SOLR-Recommendation-on-architecture-tp4045718p4045734.html
Sent from the Solr - User mailing list archive at Nabble.com.
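Gora's point that a response-time target needs a concurrency figure can be connected to the "500 per min" number with Little's law (average requests in flight = arrival rate × average response time). This Python sketch assumes each of the 500 users issues one query per minute, spread evenly over the minute; that is a guess, and bursty traffic or facet-heavy queries will push the real peak higher. The function names are hypothetical.

```python
"""Little's law sketch: how many Solr requests are in flight at once (assumed load)."""


def per_second(requests_per_minute: float) -> float:
    """Convert a per-minute request rate to a per-second arrival rate."""
    return requests_per_minute / 60.0


def in_flight(arrival_rate_per_s: float, avg_response_s: float) -> float:
    """Little's law: L = lambda * W (average number of requests being served)."""
    return arrival_rate_per_s * avg_response_s


if __name__ == "__main__":
    rate = per_second(500)  # assumption: one query per user per minute
    for response_s in (0.2, 0.8):  # the 200ms-800ms target range from the thread
        print(f"{response_s * 1000:.0f} ms -> ~{in_flight(rate, response_s):.1f} requests in flight")
```

Under that even-load assumption, roughly 2 to 7 requests are being served at any moment on average, which is a reasonable starting concurrency for a prototype benchmark; peaks will be higher than this average.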
Re: SOLR - Recommendation on architecture
Your servers seem to be about the right size, but as everyone else has said, it depends on the kinds of queries.

Solr should be the only service on the system. Solr can make heavy use of the disk, which will interfere with other processes. If you are lucky enough to get the system tuned to run from RAM, it can use 100% of CPU. Tuning Solr will be very difficult with other services sharing the same system.

If you need to meet an SLA, you will have a hard time doing that on a shared server. When you don't meet that SLA, it will be almost impossible to diagnose why.

Why not Windows?

* the Windows filesystem is not designed for heavy server use
* Windows does not allow open files to be deleted; there are workarounds for this in Solr, but it is a continuing problem
* the Windows file cache is organized by file, not by block, which is inefficient for Solr's access pattern
* Java on Windows works, but has a number of workarounds and quirks
* the Solr community is almost all Unix users, so you will get much better help on Unix

wunder