Hello,

We're seeing the following error in Riak/Yokozuna:

2016-04-11 19:36:18.803 [error] <0.23120.8>@yz_pb_search:maybe_process:84 "Failed to determine Solr port for all nodes in search plan" [{lager_trunc_io,alist,3,[{file,"src/lager_trunc_io.erl"},{line,448}]},{lager_trunc_io,alist,3,[{file,"src/lager_trunc_io.erl"},{line,421}]},{lager_trunc_io,alist,3,[{file,"src/lager_trunc_io.erl"},{line,418}]},{lager_trunc_io,alist,3,[{file,"src/lager_trunc_io.erl"},{line,421}]},{lager_trunc_io,alist,3,[{file,"src/lager_trunc_io.erl"},{line,418}]},{lager_trunc_io,alist,3,[{file,"src/lager_trunc_io.erl"},{line,421}]},{lager_trunc_io,alist,3,[{file,"src/lager_trunc_io.erl"},{line,418}]},{lager_trunc_io,print,3,[{file,"src/lager_trunc_io.erl"},{line,168}]}]

This is a 7-node cluster running the 2.1.3 RPM on CentOS 7 in Google Cloud, on 16-CPU/60GB RAM VMs. The nodes are configured with LevelDB tiered storage: a 500GB SSD for the first four tiers and a 2TB magnetic disk for the remainder. IOPS/throughput are not an issue for our application.
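
For reference, the tiered storage portion of riak.conf looks roughly like this (a sketch from memory; the paths are placeholders for our actual mount points):

storage_backend = leveldb
leveldb.tiered = 4
leveldb.tiered.path.fast = /mnt/ssd
leveldb.tiered.path.slow = /mnt/slow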

A uWSGI-based REST service sits in front of Riak and contains all of the application logic. The load-testing suite (Locust) feeds it binary data files, which the service processes and inserts into Riak; as part of that processing, Yokozuna indexes get searched.
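
For context, the write+search path in the service boils down to roughly the following (a minimal sketch using the official Riak Python client; the bucket, key, and index names here are placeholders, not our real ones):

import riak

# Minimal sketch, assuming the bucket already has a Yokozuna index attached.
client = riak.RiakClient(protocol='pbc', pb_port=8087)

bucket = client.bucket('ingest')            # placeholder bucket
obj = bucket.new('some-key', data={'name_s': 'example'})
obj.store()

# Search goes over protobuf, which I believe is the yz_pb_search path
# the error above is coming from.
results = client.fulltext_search('ingest_index', 'name_s:example')
print(results['num_found'])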

We find that about 40 minutes to an hour into load testing we start seeing the above error logged (which Locust sees as 500s). It correlates with the Search Query Fail Count stat, which we graph in Zabbix: the count keeps growing, and after roughly an hour of load testing it starts to curve upward sharply.

In riak.conf we have:

search = on
search.solr.start_timeout = 120s
search.solr.port = 8093
search.solr.jmx_port = 8985
search.solr.jvm_options = -d64 -Xms2g -Xmx16g -XX:+UseStringCache -XX:+UseCompressedOops

and we are using java-1.7.0-openjdk-1.7.0.99-2.6.5.0.el7_2.x86_64 from the CentOS repos. I've been graphing JMX stats with Zabbix and nothing looks untoward: the heap gradually climbs but never skyrockets, and it certainly doesn't come close to the 16GB cap (it barely gets above 3GB before things really go south). jconsole shows the same numbers, with gradually increasing cumulative garbage collection time (last recorded was "23.751 seconds on PS Scavenge (640 collections)"), although it's hard to tell from that whether there are any long GC pauses.

We graph a number of additional stats in Zabbix, and the boxes in the cluster never come close to maxing out CPU or running out of RAM.

I googled around and couldn't find any reference to the logged error. Does it have to do with Solr having a problem contacting other nodes in the cluster, or is it some kind of node lookup issue?
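
For what it's worth, when the errors start I've been probing each node's Solr directly on search.solr.port to see whether any of them stops responding, with something like the sketch below (hostnames and the index name are placeholders, and I'm assuming the internal_solr path is right for Yokozuna's embedded Solr):

import requests

NODES = ['riak1', 'riak2', 'riak3', 'riak4', 'riak5', 'riak6', 'riak7']  # placeholders
INDEX = 'ingest_index'  # placeholder

for host in NODES:
    # Hit Solr on the search.solr.port from riak.conf (8093) for each node.
    url = 'http://{0}:8093/internal_solr/{1}/select'.format(host, INDEX)
    try:
        r = requests.get(url, params={'q': '*:*', 'rows': 0, 'wt': 'json'}, timeout=5)
        print('{0}: HTTP {1} in {2:.3f}s'.format(host, r.status_code, r.elapsed.total_seconds()))
    except requests.RequestException as exc:
        print('{0}: FAILED ({1})'.format(host, exc))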

--
Jim Raney

