Re: reads/writes during node replacement

2016-11-14 Thread Johnny Tan
Thank you Magnus.

On Mon, Nov 14, 2016 at 7:06 AM, Magnus Kessler  wrote:

> On 12 November 2016 at 00:08, Johnny Tan  wrote:
>
>> When doing a node replace
>> (http://docs.basho.com/riak/1.4.12/ops/running/nodes/replacing/), after
>> committing the plan, how does the cluster handle reads/writes? Do I include
>> the new node in my app's config as soon as I commit, and let Riak
>> internally handle which node(s) will do the reads/writes? Or do I wait
>> until ringready reports true on the new node before sending reads/writes
>> to it?
>>
>> johnny
>>
>>
> Hi Johnny,
>
> As soon as a node has been joined to the cluster it is capable of taking
> on requests. `riak-admin ringready` returns true after a join or leave
> operation when the new ring state has been communicated successfully to all
> nodes in the cluster.
>
> During a replacement operation, the leaving node will hand off [0] all its
> partitions to the joining node. Both nodes can handle requests during this
> phase and store data in the partitions they own. Once the leaving node has
> handed off all its partitions, it will automatically shut down. Please keep
> this in mind when configuring your clients or load balancers. Clients
> should deal with nodes being temporarily or permanently unavailable.
>
> Kind Regards,
>
> Magnus
>
> [0]: http://docs.basho.com/riak/kv/2.1.4/using/reference/handoff/
>
> --
> Magnus Kessler
> Client Services Engineer
> Basho Technologies Limited
>
> Registered Office - 8 Lincoln’s Inn Fields London WC2A 3BP Reg 07970431
>
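
For reference, the full replace workflow on a 1.4-series cluster is staged
and committed with the riak-admin cluster commands; a rough sketch (node
names here are placeholders):

# on the new node: stage a join to any existing cluster member
riak-admin cluster join riak@existing-node
# on any node: stage the replacement of the old node by the new one
riak-admin cluster replace riak@old-node riak@new-node
riak-admin cluster plan        # review the staged changes
riak-admin cluster commit      # handoff from the leaving node begins
riak-admin transfers           # monitor partition handoff progress
riak-admin ringready           # TRUE once all nodes agree on the ring
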
___
riak-users mailing list
riak-users@lists.basho.com
http://lists.basho.com/mailman/listinfo/riak-users_lists.basho.com


reads/writes during node replacement

2016-11-11 Thread Johnny Tan
When doing a node replace
(http://docs.basho.com/riak/1.4.12/ops/running/nodes/replacing/), after
committing the plan, how does the cluster handle reads/writes? Do I include
the new node in my app's config as soon as I commit, and let Riak
internally handle which node(s) will do the reads/writes? Or do I wait
until ringready reports true on the new node before sending reads/writes to
it?

johnny
___
riak-users mailing list
riak-users@lists.basho.com
http://lists.basho.com/mailman/listinfo/riak-users_lists.basho.com


Re: bitcask merges & deletions

2016-06-16 Thread Johnny Tan
Hm, definitely not a cron job. I'll look at our app and see if anything
there does something like that.

On Wed, Jun 15, 2016 at 9:10 PM, Luke Bakken  wrote:

> Hi Johnny,
>
> Since this seems to happen regularly on one node on your cluster (not
> necessarily the same node), do you have a repetitive process that
> performs a *lot* of updates or deletes on a single key that could be
> correlated to these merges?
> --
> Luke Bakken
> Engineer
> lbak...@basho.com
>
>
> On Wed, Jun 15, 2016 at 10:22 AM, Johnny Tan  wrote:
> > We're running riak-1.4.2
> >
> > Every few weeks, we have a riak node that starts to slowly fill up on
> disk
> > space for several days, and then suddenly gain that space back again.
> >
> > In looking into this more today, I think I see what's going on.
> >
> > Per the console.log on a node that it's happening to right now, there
> are an
> > unusually large amount of merges happening right now. There are 6 total
> > nodes in our cluster, it's only happening to this node today. (In
> previous
> > weeks, it's been other nodes, but it's always been one node at a time.)
> >
> > Normally, we get 50-70 merges per day per node (according to various
> nodes'
> > console.log, including the node in question). Yesterday and today, the
> node
> > in question has several hundred merges happening.
> >
> > When I look inside the bitcask directory, I see a lot of files with this
> set
> > of permissions:
> > -rwSrw-r--
> >
> > My understanding is that those are files marked for deletions after
> bitcask
> > merging.
> >
> > The number of those files is currently growing, and from a spot-check,
> they
> > indeed match up as the files that have been merged.
> >
> > So it seems the two are related: a lot of merges are happening, which
> then
> > causes a large number of files to be marked for deletion, and those
> marked
> > files are piling up and not getting deleted for some reason.
> >
> > If I don't do anything, those files eventually get deleted, and
> everything
> > is good again for another couple weeks until it happens to another node.
> But
> > the disk usage does get high enough to alert us, and obviously we don't
> want
> > it to get anywhere near 100%.
> >
> >
> > I'm trying to figure out why there are times when this happens. One
> thing I
> > noticed is a difference in the merge log entries.
> >
> > Here's one from a "normal" day, nearly all the entries for that day are
> > roughly this same length and same amount of time merging:
> > 2016-06-10 05:27:39.426 UTC [info] <0.15230.160> Merged
> >
> {["/var/lib/riak/bitcask/890602560248518965780370444936484965102833893376/84000.bitcask.data","/var/lib/riak/bitcask/890602560248518965780370444936484965102833893376/83999.bitcask.data"],[]}
> > in 11.902028 seconds.
> >
> > But here's one from today on the problematic node:
> > 2016-06-15 17:13:40.626 UTC [info] <0.17903.500> Merged
> >
> {["/var/lib/riak/bitcask/1233142006497949337234359077604363797834693083136/83633.bitcask.data","/var/lib/riak/bitcask/1233142006497949337234359077604363797834693083136/83632.bitcask.data","/var/lib/riak/bitcask/1233142006497949337234359077604363797834693083136/83631.bitcask.data","/var/lib/riak/bitcask/1233142006497949337234359077604363797834693083136/83630.bitcask.data","/var/lib/riak/bitcask/1233142006497949337234359077604363797834693083136/83629.bitcask.data","/var/lib/riak/bitcask/1233142006497949337234359077604363797834693083136/83628.bitcask.data","/var/lib/riak/bitcask/1233142006497949337234359077604363797834693083136/83627.bitcask.data","/var/lib/riak/bitcask/1233142006497949337234359077604363797834693083136/83626.bitcask.data","/var/lib/riak/bitcask/1233142006497949337234359077604363797834693083136/83625.bitcask.data","/var/lib/riak/bitcask/1233142006497949337234359077604363797834693083136/83624.bitcask.data","/var/lib/riak/bitcask/1233142006497949337234359077604363797834693083136/83623.bitcask.data","/var/lib/riak/bitcask/1233142006497949337234359077604363797834693083136/83622.bitcask.data","/var/lib/riak/bitcask/1233142006497949337234359077604363797834693083136/83621.bitcask.data","/var/lib/riak/bitcask/1233142006497949337234359077604363797834693083136/83620.bitcask.data","/var/lib/riak/bitcask/1233142006497949337234359077604363797834693083136/83619.bitcask.data","/var/lib/riak/bitcask/12331420064979493372343

bitcask merges & deletions

2016-06-15 Thread Johnny Tan
We're running riak-1.4.2

Every few weeks, we have a riak node that slowly fills up on disk space
over several days, and then suddenly gains that space back again.

In looking into this more today, I think I see what's going on.

Per the console.log on a node where it's happening right now, there is an
unusually large number of merges in progress. There are 6 nodes in our
cluster, and it's only happening on this one today. (In previous weeks it
has been other nodes, but always one node at a time.)

Normally, we get 50-70 merges per day per node (according to various nodes'
console.log, including the node in question). Yesterday and today, the node
in question has several hundred merges happening.
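
A rough way to pull a per-day merge count out of console.log (assuming the
default log location; adjust the path to your install) is:

grep 'Merged' /var/log/riak/console.log | awk '{print $1}' | sort | uniq -c

Each output line is a count followed by the date, since every completed
merge is logged with a "Merged {...} in N seconds." entry.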

When I look inside the bitcask directory, I see a lot of files with this
set of permissions:
-rwSrw-r--

My understanding is that those are files marked for deletion after a
bitcask merge.

The number of those files is currently growing, and a spot-check confirms
they are indeed the files that have just been merged.
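
The 'S' in -rwSrw-r-- is the setuid bit, which Bitcask uses as its
marked-for-deletion flag after a merge. A quick way to count the marked
files (a sketch; adjust the path if your data_root differs):

find /var/lib/riak/bitcask -type f -perm -4000 | wc -l

Running that periodically shows whether the backlog of to-be-deleted files
is still growing.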

So it seems the two are related: a lot of merges are happening, which then
causes a large number of files to be marked for deletion, and those marked
files are piling up and not getting deleted for some reason.

If I don't do anything, those files eventually get deleted, and everything
is good again for another couple weeks until it happens to another node.
But the disk usage does get high enough to alert us, and obviously we don't
want it to get anywhere near 100%.


I'm trying to figure out why this happens only at certain times. One thing
I've noticed is a difference in the merge log entries.

Here's one from a "normal" day; nearly all the entries for that day cover
roughly the same number of files and take about the same amount of time to
merge:
2016-06-10 05:27:39.426 UTC [info] <0.15230.160> Merged
{["/var/lib/riak/bitcask/890602560248518965780370444936484965102833893376/84000.bitcask.data","/var/lib/riak/bitcask/890602560248518965780370444936484965102833893376/83999.bitcask.data"],[]}
in 11.902028 seconds.

But here's one from today on the problematic node:
2016-06-15 17:13:40.626 UTC [info] <0.17903.500> Merged
{["/var/lib/riak/bitcask/1233142006497949337234359077604363797834693083136/83633.bitcask.data","/var/lib/riak/bitcask/1233142006497949337234359077604363797834693083136/83632.bitcask.data","/var/lib/riak/bitcask/1233142006497949337234359077604363797834693083136/83631.bitcask.data","/var/lib/riak/bitcask/1233142006497949337234359077604363797834693083136/83630.bitcask.data","/var/lib/riak/bitcask/1233142006497949337234359077604363797834693083136/83629.bitcask.data","/var/lib/riak/bitcask/1233142006497949337234359077604363797834693083136/83628.bitcask.data","/var/lib/riak/bitcask/1233142006497949337234359077604363797834693083136/83627.bitcask.data","/var/lib/riak/bitcask/1233142006497949337234359077604363797834693083136/83626.bitcask.data","/var/lib/riak/bitcask/1233142006497949337234359077604363797834693083136/83625.bitcask.data","/var/lib/riak/bitcask/1233142006497949337234359077604363797834693083136/83624.bitcask.data","/var/lib/riak/bitcask/1233142006497949337234359077604363797834693083136/83623.bitcask.data","/var/lib/riak/bitcask/1233142006497949337234359077604363797834693083136/83622.bitcask.data","/var/lib/riak/bitcask/1233142006497949337234359077604363797834693083136/83621.bitcask.data","/var/lib/riak/bitcask/1233142006497949337234359077604363797834693083136/83620.bitcask.data","/var/lib/riak/bitcask/1233142006497949337234359077604363797834693083136/83619.bitcask.data","/var/lib/riak/bitcask/1233142006497949337234359077604363797834693083136/83618.bitcask.data","/var/lib/riak/bitcask/1233142006497949337234359077604363797834693083136/83617.bitcask.data","/var/lib/riak/bitcask/1233142006497949337234359077604363797834693083136/83616.bitcask.data","/var/lib/riak/bitcask/1233142006497949337234359077604363797834693083136/83615.bitcask.data","/var/lib/riak/bitcask/1233142006497949337234359077604363797834693083136/83614.bitcask.data","/var/lib/riak/bitcask/1233142006497949337234359077604363797834693083136/83613.bitcask.data","/var/lib/riak/bitcask/12331420064979493372343590776043637978346930...",...],...}
in 220.186043 seconds.

It's not just that the merge takes 20x longer; it also seems to be merging
far more files at once.

What is going on?

I'm not sure how much of the app.config is relevant, but I'll at least
paste just the bitcask and merge sections for now:
{bitcask, [
{data_root, "/var/lib/riak/bitcask"},
{dead_bytes_merge_trigger, 268435456},
{dead_bytes_threshold, 67108864},
{frag_merge_trigger, 60},
{frag_threshold, 40},
{io_mode, erlang},
{max_file_size, 1073741824},
{small_file_threshold, 134217728}
]},
{merge_index, [
{buffer_rollover_size, 1048576},
{data_root, "/var/lib/riak/merge_index"},

Re: Changing ring size on 1.4 cluster

2016-06-01 Thread Johnny Tan
Thank you Luke.

On Wed, Jun 1, 2016 at 1:46 PM, Luke Bakken  wrote:

> Hi Johnny,
>
> Yes, the latter two are your main options. For a 1.4 series Riak
> installation, your only option is to bring up a new cluster with the
> desired ring size and replicate data.
> --
> Luke Bakken
> Engineer
> lbak...@basho.com
>
>
> On Fri, May 27, 2016 at 12:11 PM, Johnny Tan  wrote:
> > The docs
> http://docs.basho.com/riak/kv/2.1.4/configuring/basic/#ring-size
> > seem to imply that there's no easy, non-destructive way to change a
> > cluster's ring size live for Riak-1.4x.
> >
> > I thought about replacing one node at a time, but you can't join a new
> node
> > or replace an existing one with a node that has a different ring size.
> >
> > I was also thinking of bring up a completely new cluster with the new
> ring
> > size, and then replicating the data from the original cluster, and take a
> > quick maintenance window to failover to the new cluster.
> >
> > One other alternative seems to be to upgrade to 2.0, and then use 2.x's
> > ability to resize the ring.
> >
> > Are these latter two my main options?
> >
> > johnny
> >
> > ___
> > riak-users mailing list
> > riak-users@lists.basho.com
> > http://lists.basho.com/mailman/listinfo/riak-users_lists.basho.com
> >
>
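
For reference, once every node in a cluster is on a 2.0-or-later release,
the ring resize mentioned above is staged like any other cluster change. A
rough sketch, with 128 as an example target size:

riak-admin cluster resize-ring 128
riak-admin cluster plan
riak-admin cluster commit
riak-admin transfers    # the resize proceeds via handoff in the background

Check the 2.x documentation for feature caveats before relying on it.
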
___
riak-users mailing list
riak-users@lists.basho.com
http://lists.basho.com/mailman/listinfo/riak-users_lists.basho.com


Changing ring size on 1.4 cluster

2016-05-27 Thread Johnny Tan
The docs at http://docs.basho.com/riak/kv/2.1.4/configuring/basic/#ring-size
seem to imply that there's no easy, non-destructive way to change a
cluster's ring size live on Riak 1.4.x.

I thought about replacing one node at a time, but you can't join a new node
or replace an existing one with a node that has a different ring size.

I was also thinking of bringing up a completely new cluster with the new
ring size, replicating the data from the original cluster, and then taking
a quick maintenance window to fail over to the new cluster.

One other alternative seems to be to upgrade to 2.0, and then use 2.x's
ability to resize the ring.

Are these latter two my main options?

johnny
___
riak-users mailing list
riak-users@lists.basho.com
http://lists.basho.com/mailman/listinfo/riak-users_lists.basho.com


Re: uneven disk distribution

2015-06-04 Thread Johnny Tan
To followup:

Since we use Chef for configuration management, the Riak configs are the
same across all our nodes (except for things like hostnames/IPs).

I ran riak_kv:repair and it looks like it fixed the problem on node 004,
but then a _different_ node (002) started throwing a bunch of locking
errors. (I thought I had saved a copy of that log somewhere, but I can't
find it now and the older logs have rotated out.)

Nothing I did would stem those errors on 002, even though 004 seemed
perfectly fine after the repair. In the end, I rm'd 002's bitcask
directory and rejoined it to the cluster, and things now seem to be back in
shape. No errors, and the nodes are all roughly similar in size -- 002 lags
a little behind the others, but not in a worrisome way.

I'm sure this was related to Bitcask merging; I just still haven't
pinpointed exactly what.

johnny
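
For anyone finding this thread later: the partition repair referenced above
is typically invoked from an attached console as riak_kv_vnode:repair/1. A
rough sketch, where the node name and the long partition index are
placeholders for the node and partition being repaired:

$ riak attach
(riak@staging-riak004)1> riak_kv_vnode:repair(251195593916248939066258330623111144003363405824).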

On Fri, May 15, 2015 at 4:48 PM, Charlie Voiselle 
wrote:

> Johnny:
>
> Something else to look for would be any errors in the console.log related
> to Bitcask merging.  It would be interesting to see if the unusual disk
> utilization was related to a specific partition.  If it is, you could
> consider removing that particular partition and running riak_kv:repair to
> restore the replicas from the adjacent partitions.  I can provide more
> information if you find that to be the case.
>
> Regards,
> Charlie Voiselle
> Client Services, Basho
>
>
> On May 14, 2015 10:06 PM, "Engel Sanchez"  wrote:
>
>> Hi Johnny. Make sure that the configuration on that node is not different
>> to the others. For example, it could be configured to never merge Bitcask
>> files, so that space could never be reclaimed.
>>
>>
>> http://docs.basho.com/riak/latest/ops/advanced/backends/bitcask/#Configuring-Bitcask
>>
>> On Thu, May 14, 2015 at 4:31 PM, Johnny Tan  wrote:
>>
>>> We have a 6-node test riak cluster. One of the nodes seems to be using
>>> far more disk:
>>> staging-riak001.pp /dev/sda3  15G  6.3G  7.2G  47% /
>>> staging-riak002.pp /dev/sda3  15G  6.4G  7.1G  48% /
>>> staging-riak003.pp /dev/sda3  15G  6.1G  7.5G  45% /
>>> staging-riak004.pp /dev/sda3  15G   14G  266M  99% /
>>> staging-riak005.pp /dev/sda3  15G  5.8G  7.7G  44% /
>>> staging-riak006.pp /dev/sda3  15G  6.3G  7.3G  47% /
>>>
>>> Specifically, /var/lib/riak/bitcask is using up most of that space. It
>>> seems to have files in there that are much older than any of the other
>>> nodes. We've done maintenance of various sort on this cluster -- as the
>>> name indicates, we use it as a staging ground before we go to production. I
>>> don't recall a specific issue per se, but I wouldn't rule it out.
>>>
>>> Is there a way to figure out if there's an underlying issue here, or
>>> whether some of this disk space is not really current and can somehow be
>>> purged?
>>>
>>> What info would help answer those questions?
>>>
>>> johnny
>>>
>>> ___
>>> riak-users mailing list
>>> riak-users@lists.basho.com
>>> http://lists.basho.com/mailman/listinfo/riak-users_lists.basho.com
>>>
>>>
>>
>> ___
>> riak-users mailing list
>> riak-users@lists.basho.com
>> http://lists.basho.com/mailman/listinfo/riak-users_lists.basho.com
>>
>>
___
riak-users mailing list
riak-users@lists.basho.com
http://lists.basho.com/mailman/listinfo/riak-users_lists.basho.com


uneven disk distribution

2015-05-14 Thread Johnny Tan
We have a 6-node test riak cluster. One of the nodes seems to be using far
more disk:
staging-riak001.pp /dev/sda3  15G  6.3G  7.2G  47% /
staging-riak002.pp /dev/sda3  15G  6.4G  7.1G  48% /
staging-riak003.pp /dev/sda3  15G  6.1G  7.5G  45% /
staging-riak004.pp /dev/sda3  15G   14G  266M  99% /
staging-riak005.pp /dev/sda3  15G  5.8G  7.7G  44% /
staging-riak006.pp /dev/sda3  15G  6.3G  7.3G  47% /

Specifically, /var/lib/riak/bitcask is using up most of that space, and it
contains files that are much older than anything on the other nodes. We've
done maintenance of various sorts on this cluster -- as the name indicates,
we use it as a staging ground before going to production. I don't recall a
specific incident per se, but I wouldn't rule one out.

Is there a way to figure out if there's an underlying issue here, or
whether some of this disk space is not really current and can somehow be
purged?

What info would help answer those questions?

johnny
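
One quick per-partition check (a sketch, assuming the default Bitcask
data_root shown above) is to see whether one or two partition directories
dominate the usage:

du -sk /var/lib/riak/bitcask/* | sort -n | tail

If a single partition is far larger than the rest, the problem is more
likely stuck merges or un-reclaimed space on that partition than general
data growth.
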
___
riak-users mailing list
riak-users@lists.basho.com
http://lists.basho.com/mailman/listinfo/riak-users_lists.basho.com


loadbalancing

2010-05-18 Thread Johnny Tan
I assume the best practice is to use a virtual IP that is load-balanced
across the members of a riak ring for reads/writes?

Since there is no session state, I assume stickiness is not an issue. Are
there any other potential gotchas?

johnny

___
riak-users mailing list
riak-users@lists.basho.com
http://lists.basho.com/mailman/listinfo/riak-users_lists.basho.com