Hi Ilyas,

From those stats, it looks like it's taking around 3 seconds on average 
to write a message into Solr (search_index_latency_mean, reported in 
microseconds), though the max value is almost 27 seconds.  So there is 
definitely something wrong with the way Solr is behaving, or possibly with the 
way you are indexing your data, though from your description everything looks 
fine -- 200 bytes of JSON, which I assume contains fields like username, device 
id, ip address, etc., looking at your schema.  You should definitely be able to 
get better performance out of Solr.  Keep in mind, though, that those stats are 
being taken in Yokozuna, so I wouldn't rule out something going on in Riak, at 
least not without knowing more.
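
As an aside, since you already have the curl + json.tool pipeline from my 
earlier mail, you can keep an eye on just the search index latency stats while 
you experiment, with something like:

    curl -s http://localhost:8098/stats | python -m json.tool | grep search_index_latency

That should also pull in the percentile stats alongside the mean and max, which 
are worth watching (if your build exposes them).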

One thing you might start looking at is the JVM stats, via JConsole for 
example, to see if there is anything suspicious going on with the JVM.  You 
should at least be able to see things like garbage collector activity, heap 
size, etc.
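
If JConsole isn't convenient to run against that box, jstat gives you a rough 
picture of GC behavior from the command line.  For example (the pgrep pattern 
for finding the Solr JVM is just a guess -- check with ps that it matches the 
java process Riak started for Solr):

    # find the pid of the Solr JVM
    SOLR_PID=$(pgrep -f 'java.*solr' | head -1)
    # sample GC utilization and collection counts/times every 5 seconds
    jstat -gcutil $SOLR_PID 5000

Note that jstat ships with the JDK, so the node needs a full JDK rather than 
just a JRE.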

I would also recommend using tools on your Debian machine like top, vmstat, 
and iostat, to get a picture of how much time is being spent in Solr vs Riak.  
It would be interesting to see what the CPU and I/O behavior of Riak and Java 
are in this case.
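
For example, run something like the following while the slow writes are 
happening (iostat is in the sysstat package on Debian, if it isn't installed 
already):

    vmstat 5        # overall CPU, run queue, and swap activity
    iostat -x 5     # per-device utilization and await times
    top             # compare CPU/memory for beam.smp (Riak) vs java (Solr)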

While you don't necessarily need collectd to track stats, I have some 
collectd/python scripts for gathering data about Riak and the JVM.  Please feel 
free to pilfer/use at your discretion (the proc man page is a helpful reference 
here).  All they do is scrape the /proc file system to get stats about CPU 
time, io, etc.  There is also some collectd config for collecting stats via JMX 
from the JVM, in case you are interested in that.

https://github.com/fadushin/riak_puppet_stuff/tree/master/modules/riak_node/files/collectd
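
For reference, the kind of data those scripts gather boils down to reads like 
the following (a simplified sketch, not the actual scripts -- adjust the pgrep 
pattern to your setup, and note that /proc/<pid>/io may require root):

    RIAK_PID=$(pgrep -f beam.smp | head -1)
    cat /proc/$RIAK_PID/stat    # cumulative user/system CPU time (fields 14 and 15)
    cat /proc/$RIAK_PID/io      # read_bytes / write_bytes for the process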

Yokozuna uses ibrowse to connect (via HTTP) to Solr, and there is a way to 
set the ibrowse connection pool to something larger than 10.  You might be able 
to get better throughput that way, but I would first try to sort out why the 
latency is so bad.

To change the size of the ibrowse connection pool, attach to Riak, and set the 
ibrowse default_max_sessions environment variable to something greater than 10, 
e.g., 100.

prompt$ riak attach
Remote Shell: Use "Ctrl-C a" to quit. q() or init:stop() will terminate the riak node.
Erlang R16B02_basho8 (erts-5.10.3) [source] [64-bit] [async-threads:10] [kernel-poll:false] [frame-pointer]

Eshell V5.10.3  (abort with ^G)
(riak@192.168.1.202)1> rpc:multicall(application, get_env, [ibrowse, default_max_sessions]).
{[undefined,undefined,undefined,undefined,undefined],[]}
(riak@192.168.1.202)2> rpc:multicall(application, set_env, [ibrowse, default_max_sessions, 100]).
{[ok,ok,ok,ok,ok],[]}
(riak@192.168.1.202)3> rpc:multicall(application, get_env, [ibrowse, default_max_sessions]).
{[{ok,100},{ok,100},{ok,100},{ok,100},{ok,100}],[]}

One other thought -- does any of your JSON contain internationalized data, and 
if so, how is it encoded, e.g., UTF-8 or UTF-16, ISO-8859, etc?  Your etop 
listing didn't suggest anything out of sorts with extractors, but we might want 
to get a handle on what is going on there, as well.
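
If you want to rule that out, a crude check is to pull one of the stored 
objects back out over HTTP and look at the raw bytes (the type/bucket/key below 
are placeholders for one of your own objects):

    curl -s http://localhost:8098/types/<type>/buckets/<bucket>/keys/<key> | hexdump -C | head

Multi-byte sequences with lead bytes in the 0xC2-0xF4 range would suggest 
UTF-8, while a 0xFF 0xFE or 0xFE 0xFF byte order mark at the start would 
suggest UTF-16.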

Since you quoted issue 320: are the Java errors in your logs associated with 
broken pipes on the Solr server?  That would suggest we might be getting 
timeouts on the client (yokozuna) side, with connections getting closed before 
the server can write a response.  I think the default timeout is 60 seconds, 
though, so you shouldn't be hitting that (judging from your stats), although 
those stats are taken from a relatively small time window.
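
If you want to check for the broken pipes, something like this against the 
Solr log should turn them up (the log path is what I'd expect on a stock 
package install -- adjust if your logs live elsewhere):

    grep -i -B2 -A5 'broken pipe' /var/log/riak/solr.log

The surrounding context lines should also show whether the exceptions line up 
with the slow updates.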

I hope that helps you diagnose where the bottleneck is.  Keep us posted.

-Fred

> On Oct 2, 2015, at 2:23 AM, ilyas <i.serg...@keepsolid.com> wrote:
> 
> 
> It looks like this issue:
> 
> https://github.com/basho/yokozuna/issues/320
> 
> I tried setting
> maxThreads to 150,
> Acceptors to 10, and
> lowResourcesMaxIdleTime to 50000
> in /usr/lib/riak/lib/yokozuna/priv/solr/etc/jetty.xml, as recommended in
> https://github.com/basho/yokozuna/issues/330,
> 
> but it had no effect.
> 
> On 10/01/2015 11:53 PM, Fred Dushin wrote:
>> Is there any more information in these logs that you can share?  For 
>> example, is this the only entry with this exception?  Or are there more?  
>> Are there any associated stack traces?  An EOF exception can come from many 
>> different scenarios.
>> 
>> Is there anything in the Riak console.log that looks suspicious?
>> 
>> Finally, you might want to take a look at what is going on inside of riak 
>> when you get into this state (slow writes to Solr), by looking at Riak stats.
>> 
>> You can get to Riak stats via curl, e.g.,
>> 
>>      curl http://localhost:8098/stats | python -m json.tool
> ok, output is attached
> 
>> Stats you might want to pay special attention to:
>> 
>> riak_kv_vnodeq (min, max, median, etc)  -- the aggregate length of the vnode 
>> queues.  Long vnode queues may mean your vnode is locked waiting on Solr.
>> vnode_put_fsm_time (mean, median, percentile, etc) -- the amount of time spent 
>> on average waiting for a vnode put to complete.  Long times may also be 
>> indicative of waits writing into Solr.
> "riak_kv_vnodeq_max": 0,
> "riak_kv_vnodeq_mean": 0.0,
> "riak_kv_vnodeq_median": 0,
> "riak_kv_vnodeq_min": 0,
> "riak_kv_vnodeq_total": 0,
> 
> 
> <riak_stats.txt>

_______________________________________________
riak-users mailing list
riak-users@lists.basho.com
http://lists.basho.com/mailman/listinfo/riak-users_lists.basho.com
