Re: Huge Performance: Solr distributed search

2011-11-25 Thread Mikhail Garber
In general terms, when your Java heap is this large, it is beneficial to
set -Xmx and -Xms to the same size.
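
For the 12 GB heap quoted below, that would mean something like the following
(illustrative values only, not a tuned recommendation):

-Xms12G
-Xmx12G

so the full heap is reserved at startup instead of growing from 3 GB to 12 GB
under load.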

On Wed, Nov 23, 2011 at 5:12 AM, Artem Lokotosh arco...@gmail.com wrote:
 Hi!

 * Data:
 - Solr 3.4;
 - 30 shards ~ 13GB, 27-29M docs each shard.

 * Machine parameters (Ubuntu 10.04 LTS):
 user@Solr:~$ uname -a
 Linux Solr 2.6.32-31-server #61-Ubuntu SMP Fri Apr 8 19:44:42 UTC 2011
 x86_64 GNU/Linux
 user@Solr:~$ cat /proc/cpuinfo
 processor       : 0 - 3
 vendor_id       : GenuineIntel
 cpu family      : 6
 model           : 44
 model name      : Intel(R) Xeon(R) CPU           X5690  @ 3.47GHz
 stepping        : 2
 cpu MHz         : 3458.000
 cache size      : 12288 KB
 fpu             : yes
 fpu_exception   : yes
 cpuid level     : 11
 wp              : yes
 flags           : fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge
 mca cmov pat pse36 clflush dts acpi mmx fxsr sse sse2 ss syscall nx
 rdtscp lm constant_tsc arch_perfmon pebs bts rep_good xtopology
 tsc_reliable nonstop_tsc aperfmperf pni pclmulqdq ssse3 cx16 sse4_1
 sse4_2 popcnt aes hypervisor lahf_lm ida arat
 bogomips        : 6916.00
 clflush size    : 64
 cache_alignment : 64
 address sizes   : 40 bits physical, 48 bits virtual
 power management:
 user@Solr:~$ cat /proc/meminfo
 MemTotal:       16992680 kB
 MemFree:          110424 kB
 Buffers:            9976 kB
 Cached:         11588380 kB
 SwapCached:        41952 kB
 Active:          9860764 kB
 Inactive:        6198668 kB
 Active(anon):    4062144 kB
 Inactive(anon):   398972 kB
 Active(file):    5798620 kB
 Inactive(file):  5799696 kB
 Unevictable:           0 kB
 Mlocked:               0 kB
 SwapTotal:      46873592 kB
 SwapFree:       46810712 kB
 Dirty:                36 kB
 Writeback:             0 kB
 AnonPages:       4424756 kB
 Mapped:           940660 kB
 Shmem:                40 kB
 Slab:             362344 kB
 SReclaimable:     350372 kB
 SUnreclaim:        11972 kB
 KernelStack:        2488 kB
 PageTables:        68568 kB
 NFS_Unstable:          0 kB
 Bounce:                0 kB
 WritebackTmp:          0 kB
 CommitLimit:    55369932 kB
 Committed_AS:    5740556 kB
 VmallocTotal:   34359738367 kB
 VmallocUsed:      350532 kB
 VmallocChunk:   34359384964 kB
 HardwareCorrupted:     0 kB
 HugePages_Total:       0
 HugePages_Free:        0
 HugePages_Rsvd:        0
 HugePages_Surp:        0
 Hugepagesize:       2048 kB
 DirectMap4k:       10240 kB
 DirectMap2M:    17299456 kB

 - Apache Tomcat 6.0.32:
 <!-- java arguments -->
 -XX:+DisableExplicitGC
 -XX:PermSize=512M
 -XX:MaxPermSize=512M
 -Xmx12G
 -Xms3G
 -XX:NewSize=128M
 -XX:MaxNewSize=128M
 -XX:+UseParNewGC
 -XX:+UseConcMarkSweepGC
 -XX:+CMSClassUnloadingEnabled
 -XX:CMSInitiatingOccupancyFraction=50
 -XX:GCTimeRatio=9
 -XX:MinHeapFreeRatio=25
 -XX:MaxHeapFreeRatio=25
 -verbose:gc
 -XX:+PrintGCTimeStamps
 -Xloggc:/opt/search/tomcat/logs/gc.log

 Our search setup is:
 - 5 servers with configuration above;
 - one Tomcat 6 instance on each server, each running 6 Solr applications.

 - Full addresses are:
 1) 
 http://192.168.1.85:8080/solr1,http://192.168.1.85:8080/solr2,...,http://192.168.1.85:8080/solr6
 2) 
 http://192.168.1.86:8080/solr7,http://192.168.1.86:8080/solr8,...,http://192.168.1.86:8080/solr12
 ...
 5) 
 http://192.168.1.89:8080/solr25,http://192.168.1.89:8080/solr26,...,http://192.168.1.89:8080/solr30
 - On another server there is an additional common (aggregator) application
 with the shards parameter:
 <requestHandler name="search" class="solr.SearchHandler" default="true">
   <lst name="defaults">
     <str name="echoParams">explicit</str>
     <str name="shards">192.168.1.85:8080/solr1,192.168.1.85:8080/solr2,...,192.168.1.89:8080/solr30</str>
     <int name="rows">10</int>
   </lst>
 </requestHandler>
 - schema and solrconfig are identical for all shards; for the first shard,
 see the attachment;
 - these servers handle search only; indexing happens on a separate server
 (indexes optimized down to 2 segments are replicated to the search servers
 with ssh/rsync scripts, roughly as sketched below).
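
 A minimal sketch of one such replication step, with index path and host name
 assumed purely for illustration:

 rsync -av --delete indexer:/opt/search/solr1/data/index/ /opt/search/solr1/data/index/

 followed by reloading the core (or restarting Tomcat) on the search server so
 the new segments are opened.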

 So now the major problem is the very poor performance of distributed search.
 Take a look at these logs, for example:
 This is on 30 shards:
 INFO: [] webapp=/solr path=/select/ params={fl=*,score&ident=true&start=0&q=(barium)&rows=2000} status=0 QTime=40712
 INFO: [] webapp=/solr path=/select/ params={fl=*,score&ident=true&start=0&q=(pittances)&rows=2000} status=0 QTime=36097
 INFO: [] webapp=/solr path=/select/ params={fl=*,score&ident=true&start=0&q=(reliability)&rows=2000} status=0 QTime=75756
 INFO: [] webapp=/solr path=/select/ params={fl=*,score&ident=true&start=0&q=(blessing's)&rows=2000} status=0 QTime=30342
 INFO: [] webapp=/solr path=/select/ params={fl=*,score&ident=true&start=0&q=(reiterated)&rows=2000} status=0 QTime=55690

 Sometimes QTime is more than 15. But when we run identical queries
 on one shard separately, QTime is between 200 and 1500.
 Is distributed Solr search really this slow, or is our architecture
 suboptimal? Or do we perhaps need some third-party applications?
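
 For comparison, the two cases look roughly like this (the aggregator host
 name is assumed; the query is taken from the logs above):

 single shard: http://192.168.1.85:8080/solr1/select?q=(barium)&fl=*,score&rows=2000
 distributed:  http://aggregator:8080/solr/select?q=(barium)&fl=*,score&rows=2000

 where the second request fans out to all 30 shards configured in the handler
 defaults and then merges the results.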
 Thanks for any replies.

 --
 Best regards,
 Artem



Re: simple persistence layer on top of Solr

2011-11-01 Thread Mikhail Garber
This is a very good idea, and I have used it several times over the years with
great success, as long as you understand the limitations (no global
transactions, not being able to update records in place, ...).
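
As a minimal sketch of what "Solr as a simple store" looks like over plain
HTTP (the core URL and field names here are made up; any core with an "id"
uniqueKey would do):

curl 'http://localhost:8080/solr/update?commit=true' -H 'Content-Type: text/xml' \
  --data-binary '<add><doc><field name="id">doc1</field><field name="title">hello</field></doc></add>'

curl 'http://localhost:8080/solr/select?q=id:doc1'

Every write is a whole-document replace keyed on the uniqueKey, which is
exactly where the "not being able to update records in place" limitation
comes from.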


On Tue, Nov 1, 2011 at 8:47 AM, Memory Makers memmakers...@gmail.com wrote:
 Greetings guys,

 I have been thinking of using Solr as a simple database due to its
 blinding speed -- actually I've used that approach in some projects with
 decent success.

 Any thoughts on that?

 Thanks,

 MM.



Re: Bet you didn't know Lucene can...

2011-10-25 Thread Mikhail Garber
Solr as an enterprise event warehouse: multiple heterogeneous applications
and log-file sweepers posting events to a centralized Solr index.

On Sat, Oct 22, 2011 at 2:12 AM, Grant Ingersoll gsing...@apache.org wrote:
 Hi All,
 I'm giving a talk at ApacheCon titled "Bet you didn't know Lucene can..."
 (http://na11.apachecon.com/talks/18396). It's based on my observation that,
 over the years, a number of us in the community have done some pretty cool
 things using Lucene/Solr that don't fit under the core premise of full-text
 search. I've got a fair number of ideas for the talk (easily enough for 1
 hour), but I wanted to reach out to hear your stories of ways you've (ab)used
 Lucene and Solr, to see if we couldn't extend the conversation a bit beyond
 the conference and also see if I can't inject more ideas beyond the ones I
 have. I don't need deep technical details, just the high-level use case and
 the basic insight that led you to believe Lucene/Solr could solve the problem.

 Thanks in advance,
 Grant

 
 Grant Ingersoll
 http://www.lucidimagination.com