Re: shards per disk

2015-01-21 Thread Toke Eskildsen
On Wed, 2015-01-21 at 09:46 +0100, Toke Eskildsen wrote:
 Anyway, RAID 0 does really help for random access, [...]

Should have been "...does not really help".

- Toke Eskildsen




Re: shards per disk

2015-01-21 Thread Toke Eskildsen
On Wed, 2015-01-21 at 07:56 +0100, Nimrod Cohen wrote:
 RAID [0] configuration
 
 Each shard has data on each of the 8 disks in the RAID. On each
 query for 1K docs, each shard requests data from the one RAID
 disk, so we get 8 requests for data from all of the disks and they
 queue up.

Your RAID-setup (whether it is hardware or software) should use a
parallel queue, so that requests to different physical drives are issued
in parallel under the hood. But RAID is not that well-defined, so maybe
your controller or your software uses a single sequential queue. In that
case, the pattern will be as you describe.
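
The difference between the two queueing behaviours can be sketched with a toy
model. This is only an illustration, not a measurement: the function name, the
10 ms service time and the one-request-per-disk pattern are assumptions made
up for the example.

```python
def completion_time_ms(requests, parallel_queues):
    """Time until all requests are done.

    requests: list of (disk_index, service_ms) tuples.
    parallel_queues: True if each physical disk has its own queue,
    False if the controller serializes everything in one queue.
    """
    if not parallel_queues:
        # One sequential queue: every request waits for all earlier ones.
        return sum(ms for _, ms in requests)
    # One queue per disk: disks work concurrently, so the busiest disk
    # decides when the whole batch is finished.
    per_disk = {}
    for disk, ms in requests:
        per_disk[disk] = per_disk.get(disk, 0) + ms
    return max(per_disk.values(), default=0)


# Hypothetical load: 8 shards, each hitting a different disk for 10 ms.
requests = [(disk, 10) for disk in range(8)]
sequential = completion_time_ms(requests, parallel_queues=False)
parallel = completion_time_ms(requests, parallel_queues=True)
print(sequential, parallel)
```

In this model the single queue takes eight times as long as the per-disk
queues, which matches the direction (though not necessarily the size) of the
slowdown described above.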

Anyway, RAID 0 does not really help for random access, when your access
pattern is homogeneous across shards. Even if you fix the problem with
your current RAID 0 setup, it is unlikely that you would get a
noticeable performance advantage over separate drives. It would make it
easier to add shards though, as you would not have to purchase a new
drive or unbalance your setup by running multiple shards on some drives.

 Regarding the response time, 2-3 seconds is good for our usage, but
 faster is always better; if we get faster responses we might run
 the analysis on more than 1K documents.

Limit the number of fields you request and try experimenting with SolrJ
and the binary protocol: I have found that the time for serializing the
result to XML can be quite high for large responses.
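
As a rough sketch of what such a request looks like on the wire, the snippet
below builds a select URL that asks only for named fields and for the binary
response writer. `fl`, `rows` and `wt` are standard Solr query parameters; the
host, collection name, field names and the helper function are hypothetical.
SolrJ normally handles the binary format itself, so this only illustrates the
parameters, and a non-Java client would need its own javabin decoder.

```python
from urllib.parse import urlencode


def build_select_url(base_url, query, fields, rows=1000, writer="javabin"):
    """Build a Solr select URL that returns only the listed fields."""
    params = {
        "q": query,
        "fl": ",".join(fields),  # restrict the stored fields returned
        "rows": rows,            # the thread fetches 1K documents per query
        "wt": writer,            # response writer; "javabin" is the binary one
    }
    return base_url + "/select?" + urlencode(params)


# Hypothetical core URL and field names, for illustration only.
url = build_select_url(
    "http://localhost:8983/solr/collection1", "*:*", ["id", "text"]
)
print(url)
```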

If the number of fields needed is very low and the content of those
fields is not large, you could try using faceting with DocValues to get
the content.
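
A minimal sketch of the faceting variant, assuming a hypothetical field named
`category` that is indexed with DocValues. The parameters shown (`facet`,
`facet.field`, `facet.mincount`, `facet.limit`, `rows`) are standard Solr
request parameters; the helper name is made up. Note that faceting returns the
distinct values and their counts for the matching documents, not one value per
document, so it only fits if that is enough for the analysis.

```python
from urllib.parse import urlencode


def build_facet_params(query, field):
    """Query string that skips stored-field retrieval and facets on one field."""
    return urlencode([
        ("q", query),
        ("rows", 0),             # no stored documents, only facet counts
        ("facet", "true"),
        ("facet.field", field),  # hypothetical DocValues field
        ("facet.mincount", 1),   # drop values that do not occur in the result
        ("facet.limit", -1),     # no cap on the number of values returned
    ])


params = build_facet_params("*:*", "category")
print(params)
```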


- Toke Eskildsen, State and University Library, Denmark





Re: shards per disk

2015-01-20 Thread Nitin Solanki
Hey Nimrod,
Nice try. I just want to know whether these 8 shards are each on a different
system, or did you implement sharding on a single system with each shard on a
different port?

On Tue, Jan 20, 2015 at 7:54 PM, Nimrod Cohen nimrod.co...@nice.com wrote:

 Hi

 I ran some performance tests, and I wanted to know if anyone has seen the
 same behavior.



 We need to get 1K documents out of 100M documents each time we query Solr
 and send them to text analysis.

 The first configuration had 8 shards on one RAID (disk F); we got the 1K
 documents in around 15 seconds.

 In the second configuration we removed the RAID and worked with 8 separate
 disks, one shard per disk, and got the 1K documents in 2-3 seconds.



 Has anyone seen this type of performance improvement, or can anyone verify
 that it's reasonable?



 Thanks,

 *NIMROD COHEN*





Re: shards per disk

2015-01-20 Thread Jack Krupansky
It sounds like your app needs a lot more RAM so that it is not doing so
much I/O.

-- Jack Krupansky






RE: shards per disk

2015-01-20 Thread Nimrod Cohen
Hi,
All shards are on the same system; each one uses a different port.
BTW, the data size is about 1T and memory is 192G.

NIMROD COHEN 







Re: shards per disk

2015-01-20 Thread Shawn Heisey
On 1/20/2015 7:45 AM, Nimrod Cohen wrote:
 All shards are on the same system; each one uses a different port.
 BTW, the data size is about 1T and memory is 192G.

If Solr has to actually go to the disk to satisfy a query, it's going to
be slow.  This will always be true, no matter how many disks you use. 
In terms of performance, disks are like molasses or a glacier compared
to RAM.  Even an SSD is a lot slower.

Solr performance is good when all of the data that a query needs is
sitting in RAM already, cached by the operating system using memory that
is not allocated to programs.  192GB of RAM is nowhere near enough to
assure good performance if the Solr indexes are 1TB in size.  I would
bet that this is true even if you put the indexes on SSD instead of
spinning magnetic drives ... although performance would be better with SSD.
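
The arithmetic behind that statement can be sketched as follows. The 192 GB
and 1 TB figures come from this thread (1 TB is taken as 1024 GB here); the
helper name and the 4 GB heap per Solr instance are assumptions for
illustration only, since the actual heap sizes were not given.

```python
def cacheable_fraction(ram_gb, index_gb, reserved_gb=0.0):
    """Upper bound on the share of the index the OS disk cache could hold."""
    free_for_cache = max(ram_gb - reserved_gb, 0.0)
    return min(free_for_cache / index_gb, 1.0)


# Best case: all 192 GB available for caching a 1024 GB index.
best_case = cacheable_fraction(192, 1024)

# Hypothetical: 8 Solr JVMs with a 4 GB heap each reduce the cache further.
with_heaps = cacheable_fraction(192, 1024, reserved_gb=8 * 4)

print(f"{best_case:.1%} {with_heaps:.1%}")
```

Even in the best case less than a fifth of the index fits in memory, so
queries that touch documents spread across the whole index will mostly go to
disk.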

http://wiki.apache.org/solr/SolrPerformanceProblems

You should *not* run multiple Solr instances per machine.  All of your
index cores should be handled by one instance.  Running multiple
instances is a waste of resources, especially memory, which as already
discussed is extremely precious when dealing with a large index.

Thanks,
Shawn



Re: shards per disk

2015-01-20 Thread Roman Chyla
I think this makes sense too (i.e. the setup): since the search is getting 1K
documents each time (for textual analysis, i.e. they are probably large
docs) and uses Solr as storage (which is totally fine), the parallel I/O
across multiple drives speeds things up. The index is probably large, so
it is unrealistic to have enough RAM to cache the most used parts (if they
are hitting different docs all the time). I'm curious, as Toke points
out, what RAID configuration you ran it on initially.

Best,

roman




RE: shards per disk

2015-01-20 Thread Toke Eskildsen
Nimrod Cohen [nimrod.co...@nice.com] wrote:
 We need to get 1K documents out of 100M documents each
 time we query Solr and send them to text analysis.
 The first configuration had 8 shards on one RAID (disk F); we
 got the 1K documents in around 15 seconds.
 In the second configuration we removed the RAID and worked with 8
 separate disks, one shard per disk, and got the 1K
 documents in 2-3 seconds.

Which RAID level? 0, 1, maybe 5 or 6? If you did a RAID 0, it should be about 
the same performance as shards on individual disks, due to striping. If you did 
a RAID 1 with, for example, 2*4 disks, your performance would be markedly 
worse. If you did a RAID 1 of 8*1 disk, it would be better than individual 
drives as it would mitigate the "slowest drive dictates overall speed" problem. 
If your RAID is not really a RAID but instead JBOD or similar 
(http://en.wikipedia.org/wiki/Non-RAID_drive_architectures#JBOD), then the poor 
performance is to be expected as chances are all your data would reside on the 
same physical disk.
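
To make the striping argument concrete, here is a minimal sketch of how a
plain RAID 0 layout maps a logical byte range onto physical disks. The 64 KiB
stripe size and the function name are assumptions; real controllers and
software RAID implementations differ in stripe size and layout details.

```python
def disks_touched(offset, length, stripe_bytes=64 * 1024, n_disks=8):
    """Disks that a read of `length` bytes at `offset` hits under RAID 0."""
    if length <= 0:
        return []
    first_stripe = offset // stripe_bytes
    last_stripe = (offset + length - 1) // stripe_bytes
    # Consecutive stripes are placed round-robin across the disks.
    disks = {stripe % n_disks for stripe in range(first_stripe, last_stripe + 1)}
    return sorted(disks)


small_read = disks_touched(0, 4 * 1024)    # fits inside one stripe
large_read = disks_touched(0, 200 * 1024)  # spans several stripes
print(small_read, large_read)
```

A small read lands on a single disk, just as it would with one shard per
disk, while a larger read is spread over several disks. With random access
that is evenly distributed over the shards, both setups keep all 8 disks
about equally busy, which is why similar performance is expected.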

Please describe your RAID setup in detail.

Also, is 2-3 second response time satisfactory to you? If not, what are you 
aiming at?

- Toke Eskildsen


RE: shards per disk

2015-01-20 Thread Nimrod Cohen
Hi Toke,

Thanks for your answer.

We are using RAID 0 across 8 disks. I don't understand why it should give me 
the same performance as one shard per disk.

Below is an explanation as I see it; please correct me if I'm wrong.



RAID configuration

Each shard has data on each of the 8 disks in the RAID. On each query for 
1K docs, each shard requests data from the one RAID disk, so we get 8 
requests for data from all of the disks and they queue up.



Shard per disk configuration

Each shard has data only on its own disk, so each shard requests data from 
its own disk and they don't block each other.



If I'm wrong please correct me; I do want to understand this.



Regarding the response time, 2-3 seconds is good for our usage, but faster 
is always better; if we get faster responses we might run the analysis on 
more than 1K documents.



Thanks for the help.

NIMROD COHEN






