Re: shards per disk
On Wed, 2015-01-21 at 09:46 +0100, Toke Eskildsen wrote: "Anyway, RAID 0 does really help for random access, [...]" That should have been "...does not really help". - Toke Eskildsen
Re: shards per disk
On Wed, 2015-01-21 at 07:56 +0100, Nimrod Cohen wrote: "In the RAID [0] configuration, each shard has data on each one of the 8 disks in the RAID. On each query to get 1K docs, each shard requests data from the one RAID disk, so we get 8 requests for data from all of the disks, and we get a queue."

Your RAID setup (whether it is hardware or software) should use a parallel queue, so that requests to different physical drives are issued in parallel under the hood. But RAID is not that well-defined, so maybe your controller or your software uses a single sequential queue. In that case, the pattern will be as you describe.

Anyway, RAID 0 does not really help for random access when your access pattern is homogeneous across shards. Even if you fix the problem with your current RAID 0 setup, it is unlikely that you would get a noticeable performance advantage over separate drives. It would make it easier to add shards, though, as you would not have to purchase a new drive or unbalance your setup by running multiple shards on some drives.

"Regarding the response time, 2-3 seconds is good for our usage, but getting better is always better; if it gets faster, we might run the analysis on more than 1K."

Limit the number of fields you request and try experimenting with SolrJ and the binary protocol: I have found that the time for serializing the result to XML can be quite high for large responses. If the number of fields needed is very low and the content of those fields is not large, you could try using faceting with DocValues to get the content.

- Toke Eskildsen, State and University Library, Denmark
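To make the "limit the fields, use the binary protocol" advice concrete, here is a minimal Python sketch that builds a plain HTTP request to Solr's /select handler asking only for the named fields (`fl`), for 1K rows, and for the `javabin` response format that SolrJ uses instead of XML. The helper name, base URL, collection name and field names are hypothetical placeholders, not taken from the thread; with SolrJ in Java you would set the same parameters on a query object rather than building a URL by hand.

```python
from urllib.parse import urlencode


def build_select_url(base_url, query, fields, rows=1000, binary=True):
    """Build a Solr /select URL that returns only the listed fields.

    build_select_url, base_url and the field names are hypothetical
    placeholders used for illustration only.
    """
    params = [
        ("q", query),
        ("fl", ",".join(fields)),  # only the fields the text analysis needs
        ("rows", str(rows)),       # the thread fetches 1K documents per query
        # javabin is Solr's binary response format (used by SolrJ);
        # json is shown as the text-based alternative.
        ("wt", "javabin" if binary else "json"),
    ]
    return f"{base_url.rstrip('/')}/select?{urlencode(params)}"


if __name__ == "__main__":
    print(build_select_url("http://localhost:8983/solr/collection1/", "*:*", ["id", "text"]))
```

Keeping `fl` short matters most when documents are large: every stored field you skip is data that does not have to be read from disk, serialized, and sent over the wire.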
Re: shards per disk
Hey Nimrod, Nice try. I just want to know: are these 8 shards each on a different system, or did you implement sharding on a single system, with each shard on a different port?

On Tue, Jan 20, 2015 at 7:54 PM, Nimrod Cohen nimrod.co...@nice.com wrote:
Hi, I did some performance tests, and I wanted to know if anyone has seen the same behavior. We need to get 1K documents out of 100M documents each time we query Solr, and send them to text analysis. The first configuration had 8 shards on one RAID (Disk F); we got the 1K documents in around 15 seconds. In the second configuration we removed the RAID and worked with 8 different disks, one shard per disk, and got the 1K documents in 2-3 seconds. Has anyone seen this kind of performance improvement, or can anyone verify that it's reasonable? Thanks, *NIMROD COHEN* *Software Engineer* (T) +972 (9) 775-3668 (M) +972 (0) 52-5522901 nimrod.co...@nice.com www.nice.com
Re: shards per disk
It sounds like your app needs a lot more RAM so that it is not doing so much I/O. -- Jack Krupansky On Tue, Jan 20, 2015 at 9:24 AM, Nimrod Cohen nimrod.co...@nice.com wrote: [...]
RE: shards per disk
Hi,

All shards are on the same system; each one uses a different port. BTW, the data size is about 1 TB and memory is 192 GB.

NIMROD COHEN, Software Engineer, RTI

-----Original Message-----
From: Nitin Solanki [mailto:nitinml...@gmail.com]
Sent: Tuesday, January 20, 2015 16:37
To: solr-user@lucene.apache.org
Subject: Re: shards per disk

[...]
Re: shards per disk
On 1/20/2015 7:45 AM, Nimrod Cohen wrote: "All shards are on the same system; each one uses a different port. BTW, the data size is about 1 TB and memory is 192 GB."

If Solr has to actually go to the disk to satisfy a query, it's going to be slow. This will always be true, no matter how many disks you use. In terms of performance, disks are like molasses or a glacier compared to RAM. Even an SSD is a lot slower.

Solr performance is good when all of the data that a query needs is already sitting in RAM, cached by the operating system using memory that is not allocated to programs. 192GB of RAM is nowhere near enough to ensure good performance if the Solr indexes are 1TB in size. I would bet that this is true even if you put the indexes on SSD instead of spinning magnetic drives ... although performance would be better with SSD.

http://wiki.apache.org/solr/SolrPerformanceProblems

You should *not* run multiple Solr instances per machine. All of your index cores should be handled by one instance. Running multiple instances is a waste of resources, especially memory, which, as already discussed, is extremely precious when dealing with a large index.

Thanks,
Shawn
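Shawn's point is mostly arithmetic. The rough Python sketch below assumes that all memory not handed to Java heaps is available to the OS page cache and that the full "about 1T" of data is index; both are simplifications, since the thread gives data size, not index size. The heap sizes (8 JVMs at 8 GB each versus one JVM at 16 GB) are made-up figures chosen only to show the effect of running several instances.

```python
def page_cache_coverage(index_gb, ram_gb, heap_gb):
    """Fraction of the index that could sit in the OS page cache.

    Rough model: RAM not given to Java heaps is assumed to be available
    for caching; the OS itself and other processes are ignored.
    """
    free_for_cache = max(ram_gb - heap_gb, 0)
    return min(free_for_cache / index_gb, 1.0)


if __name__ == "__main__":
    INDEX_GB, RAM_GB = 1000, 192  # "about 1T" of data and 192 GB RAM, from the thread
    eight_instances = 8 * 8       # hypothetical: 8 Solr JVMs with an 8 GB heap each
    one_instance = 16             # hypothetical: a single JVM with a 16 GB heap
    print(f"{page_cache_coverage(INDEX_GB, RAM_GB, eight_instances):.1%}")  # 12.8%
    print(f"{page_cache_coverage(INDEX_GB, RAM_GB, one_instance):.1%}")     # 17.6%
```

Under these assumptions, fewer than one byte in five of the index can be cached either way, so random fetches of 1K large documents will mostly hit disk; consolidating instances helps a little, but it does not close the gap.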
Re: shards per disk
I think this makes sense too (i.e. the setup): since the search is getting 1K documents each time (for textual analysis, i.e. they are probably large docs), and Solr is used as storage (which is totally fine), the parallel multiple-drive I/O of the shards speeds things up. The index is probably large, so it is unrealistic to have enough RAM to cache the most-used parts (if they are hitting different docs all the time). I'm curious, as Toke points out, what RAID configuration you ran it on initially. Best, roman On Tue, Jan 20, 2015 at 12:43 PM, Toke Eskildsen t...@statsbiblioteket.dk wrote: [...]
RE: shards per disk
Nimrod Cohen [nimrod.co...@nice.com] wrote: "We need to get 1K documents out of 100M documents each time we query Solr, and send them to text analysis. The first configuration had 8 shards on one RAID (Disk F); we got the 1K documents in around 15 seconds. In the second configuration we removed the RAID and worked with 8 different disks, one shard per disk, and got the 1K documents in 2-3 seconds."

Which RAID level? 0, 1, maybe 5 or 6? If you did a RAID 0, it should give about the same performance as shards on individual disks, due to striping. If you did a RAID 1 with, for example, 2*4 disks, your performance would be markedly worse. If you did a RAID 1 of 8*1 disk, it would be better than individual drives, as it would mitigate the "slowest drive dictates overall speed" problem. If your RAID is not really a RAID but instead JBOD or similar (http://en.wikipedia.org/wiki/Non-RAID_drive_architectures#JBOD), then the poor performance is to be expected, as chances are all your data would reside on the same physical disk.

Please describe your RAID setup in detail.

Also, is a 2-3 second response time satisfactory to you? If not, what are you aiming at?

- Toke Eskildsen
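The "due to striping" argument can be illustrated with the usual RAID 0 address mapping: the logical volume is cut into fixed-size stripes that are dealt round-robin across the member disks. The Python sketch below assumes a 64 KiB stripe size purely for illustration; real controllers and software RAID use various sizes, and the stripe size of the array in this thread is not stated.

```python
STRIPE = 64 * 1024  # assumed stripe (chunk) size in bytes; not from the thread
DISKS = 8           # the array in the thread had 8 disks


def raid0_locate(offset, stripe=STRIPE, n_disks=DISKS):
    """Map a logical byte offset to (disk index, byte offset on that disk)."""
    stripe_index = offset // stripe
    disk = stripe_index % n_disks  # stripes are dealt round-robin over the disks
    row = stripe_index // n_disks  # completed passes over all disks so far
    return disk, row * stripe + offset % stripe


def disks_touched(start, length, stripe=STRIPE, n_disks=DISKS):
    """Set of disks that a contiguous read of `length` bytes from `start` visits."""
    if length <= 0:
        return set()
    first = start // stripe
    last = (start + length - 1) // stripe
    return {s % n_disks for s in range(first, last + 1)}


if __name__ == "__main__":
    # A 1 MiB stretch of one shard's index files is spread over every disk...
    print(sorted(disks_touched(0, 1024 * 1024)))  # [0, 1, 2, 3, 4, 5, 6, 7]
    # ...while a small 4 KiB random read lands on exactly one disk.
    print(disks_touched(3 * STRIPE + 100, 4096))  # {3}
```

Because each small random read still lands on a single disk, many concurrent reads from 8 shards scatter across all 8 disks, which is why a RAID 0 that queues per disk should perform about like one shard per disk.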
RE: shards per disk
Hi Toke,

Thanks for your answer. We are using RAID 0 across 8 disks. I don't understand why it should give me the same performance as one shard per disk. Below is an explanation as I see it; please correct me if I'm wrong.

RAID configuration: each shard has data on each one of the 8 disks in the RAID. On each query to get 1K docs, each shard requests data from the one (logical) RAID disk, so we get 8 requests for data from all of the disks, and we get a queue.

Shard-per-disk configuration: each shard has data only on its own disk; each shard requests data from its own disk, and they don't block each other.

If I'm wrong, please correct me; I do want to understand it. Regarding the response time, 2-3 seconds is good for our usage, but getting better is always better; if it gets faster, we might run the analysis on more than 1K.

Thanks for the help.
NIMROD COHEN, Software Engineer, RTI

-----Original Message-----
From: Toke Eskildsen [mailto:t...@statsbiblioteket.dk]
Sent: Tuesday, January 20, 2015 19:43
To: solr-user@lucene.apache.org
Subject: RE: shards per disk

[...]
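Nimrod's description and Toke's reply differ mainly in how the array queues requests. The toy Python model below assumes every document fetch is one random read with a fixed cost of 10 ms (a made-up figure for a spinning disk), spreads 1,000 reads evenly over 8 disks, and ignores caching, request reordering, and everything Solr itself does. It is not fitted to the 15 s versus 2-3 s measurements; it only shows why a single sequential queue makes an 8-disk array behave like one disk, while per-disk queues let the disks work in parallel.

```python
from collections import defaultdict


def completion_ms(requests, parallel_queues):
    """Time until all reads finish in a toy queueing model.

    requests: list of (disk_index, service_ms) pairs.
    parallel_queues=False: one sequential queue for the whole array, so reads
        are served strictly one after another, whichever disk they hit.
    parallel_queues=True: each physical disk has its own queue, the disks
        work concurrently, and the busiest disk determines the total time.
    """
    if not parallel_queues:
        return sum(ms for _, ms in requests)
    busy = defaultdict(float)
    for disk, ms in requests:
        busy[disk] += ms
    return max(busy.values(), default=0.0)


if __name__ == "__main__":
    # 1,000 random document reads spread evenly over 8 disks, 10 ms each (assumed).
    reads = [(i % 8, 10.0) for i in range(1000)]
    print(completion_ms(reads, parallel_queues=False))  # 10000.0
    print(completion_ms(reads, parallel_queues=True))   # 1250.0
```

In this model, one shard per disk and a RAID 0 with per-disk queues both correspond to `parallel_queues=True`; only a controller or driver that serializes requests across the whole array produces the slow case.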