Re: Upgrade from 1.3.1 to 1.4.2 => high IO

2013-12-11 Thread Simon Effenberg
Cool..

gave me an exception about 
** exception error: undefined shell command profit/0

but it worked and now I have new data.. thanks a lot!

Cheers
Simon
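
Note: profit() in the line quoted below is just a joke placeholder, not a real
shell function, so that exception is expected; only the exit/2 call does any
work. A minimal sketch, assuming it is run from an attached console ("riak
attach") and with a made-up node name in the prompt:

  $ riak attach
  %% kill the stats calculation supervisor; its parent supervisor restarts
  %% it automatically, which drops the stuck stats data
  (riak@127.0.0.1)1> exit(whereis(riak_core_stat_calc_sup), kill).
  true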

On Wed, 11 Dec 2013 17:05:29 -0500
Matthew Von-Maszewski  wrote:

> One of the core developers says that the following line should stop the stats 
> process.  It will then be automatically started, without the stuck data.
> 
> exit(whereis(riak_core_stat_calc_sup), kill), profit().
> 
> Matthew
> 
> On Dec 11, 2013, at 4:50 PM, Simon Effenberg  
> wrote:
> 
> > So I think I have no real chance to get good numbers. I can see a
> > little bit through the app monitoring but I'm not sure if I can see
> > real differences about the 100 -> 170 open_files increase.
> > 
> > I will try to change the value on the already migrated nodes as well to
> > see if this improves the stuff I can see..
> > 
> > Any other ideas?
> > 
> > Cheers
> > Simon
> > 
> > On Wed, 11 Dec 2013 15:37:03 -0500
> > Matthew Von-Maszewski  wrote:
> > 
> >> The real Riak developers have suggested this might be your problem with 
> >> stats being stuck:
> >> 
> >> https://github.com/basho/riak_core/pull/467
> >> 
> >> The fix is included in the upcoming 1.4.4 maintenance release (which is 
> >> overdue so I am not going to bother guessing when it will actually arrive).
> >> 
> >> Matthew
> >> 
> >> On Dec 11, 2013, at 2:47 PM, Simon Effenberg  
> >> wrote:
> >> 
> >>> I will do..
> >>> 
> >>> but one other thing:
> >>> 
> >>> watch Every 10.0s: sudo riak-admin status | grep put_fsm
> >>> node_put_fsm_time_mean : 2208050
> >>> node_put_fsm_time_median : 39231
> >>> node_put_fsm_time_95 : 17400382
> >>> node_put_fsm_time_99 : 50965752
> >>> node_put_fsm_time_100 : 59537762
> >>> node_put_fsm_active : 5
> >>> node_put_fsm_active_60s : 364
> >>> node_put_fsm_in_rate : 5
> >>> node_put_fsm_out_rate : 3
> >>> node_put_fsm_rejected : 0
> >>> node_put_fsm_rejected_60s : 0
> >>> node_put_fsm_rejected_total : 0
> >>> 
> >>> this is not changing at all.. so maybe my expectations are _wrong_?! So
> >>> I will start searching around if there is a "status" bug or I'm
> >>> looking in the wrong place... maybe there is no problem while searching
> >>> for one?! But I see that at least the app has some issues on GET and
> >>> PUT (more on PUT).. so I would like to know how fast the things are..
> >>> but "status" isn't working.. argh...
> >>> 
> >>> Cheers
> >>> Simon
> >>> 
> >>> 
> >>> On Wed, 11 Dec 2013 14:32:07 -0500
> >>> Matthew Von-Maszewski  wrote:
> >>> 
>  An additional thought:  if increasing max_open_files does NOT help, try 
>  removing +S 4:4 from the vm.args.  Typically +S setting helps leveldb, 
>  but one other user mentioned that the new sorted 2i queries needed more 
>  CPU in the Erlang layer.
>  
>  Summary:
>  - try increasing max_open_files to 170
>  - helps:  try setting sst_block_size to 32768 in app.config
>  - does not help:  try removing +S from vm.args
>  
>  Matthew
>  
>  On Dec 11, 2013, at 1:58 PM, Simon Effenberg  
>  wrote:
>  
> > Hi Matthew,
> > 
> > On Wed, 11 Dec 2013 18:38:49 +0100
> > Matthew Von-Maszewski  wrote:
> > 
> >> Simon,
> >> 
> >> I have plugged your various values into the attached spreadsheet.  I 
> >> assumed a vnode count to allow for one of your twelve servers to die 
> >> (256 ring size / 11 servers).
> > 
> > Great, thanks!
> > 
> >> 
> >> The spreadsheet suggests you can safely raise your max_open_files from 
> >> 100 to 170.  I would suggest doing this for the next server you 
> >> upgrade.  If part of your problem is file cache thrashing, you should 
> >> see an improvement.
> > 
> > I will try this out.. starting the next server in 3-4 hours.
> > 
> >> 
> >> Only if max_open_files helps, you should then consider adding 
> >> {sst_block_size, 32767} to the eleveldb portion of app.config.  This 
> >> setting, given your value sizes, would likely halve the size of the 
> >> metadata held in the file cache.  It only impacts the files newly 
> >> compacted in the upgrade, and would gradually increase space in the 
> >> file cache while slowing down the file cache thrashing.
> > 
> > So I'll do this at the over-next server if the next server is fine.
> > 
> >> 
> >> What build/packaging of Riak do you use, or do you build from source?
> > 
> > Using the debian packages from the basho site..
> > 
> > I'm really wondering why the "put" performance is that bad.
> > Here are the changes which were introduced/changed only on the new
> > upgraded servers:
> > 
> > 
> > +fsm_limit => 5,
> > --- our '+P' is set to 262144 so more than 3x fsm_limit which was
> > --- stated somewhere
> > +# after finishing the upgrade this should be switched to v1 !!!
> > +object_format => '__atom_v0',

Re: Upgrade from 1.3.1 to 1.4.2 => high IO

2013-12-11 Thread Matthew Von-Maszewski
One of the core developers says that the following line should stop the stats 
process.  It will then be automatically started, without the stuck data.

exit(whereis(riak_core_stat_calc_sup), kill), profit().

Matthew

On Dec 11, 2013, at 4:50 PM, Simon Effenberg  wrote:

> So I think I have no real chance to get good numbers. I can see a
> little bit through the app monitoring but I'm not sure if I can see
> real differences about the 100 -> 170 open_files increase.
> 
> I will try to change the value on the already migrated nodes as well to
> see if this improves the stuff I can see..
> 
> Any other ideas?
> 
> Cheers
> Simon
> 
> On Wed, 11 Dec 2013 15:37:03 -0500
> Matthew Von-Maszewski  wrote:
> 
>> The real Riak developers have suggested this might be your problem with 
>> stats being stuck:
>> 
>> https://github.com/basho/riak_core/pull/467
>> 
>> The fix is included in the upcoming 1.4.4 maintenance release (which is 
>> overdue so I am not going to bother guessing when it will actually arrive).
>> 
>> Matthew
>> 
>> On Dec 11, 2013, at 2:47 PM, Simon Effenberg  
>> wrote:
>> 
>>> I will do..
>>> 
>>> but one other thing:
>>> 
>>> watch Every 10.0s: sudo riak-admin status | grep put_fsm
>>> node_put_fsm_time_mean : 2208050
>>> node_put_fsm_time_median : 39231
>>> node_put_fsm_time_95 : 17400382
>>> node_put_fsm_time_99 : 50965752
>>> node_put_fsm_time_100 : 59537762
>>> node_put_fsm_active : 5
>>> node_put_fsm_active_60s : 364
>>> node_put_fsm_in_rate : 5
>>> node_put_fsm_out_rate : 3
>>> node_put_fsm_rejected : 0
>>> node_put_fsm_rejected_60s : 0
>>> node_put_fsm_rejected_total : 0
>>> 
>>> this is not changing at all.. so maybe my expectations are _wrong_?! So
>>> I will start searching around if there is a "status" bug or I'm
>>> looking in the wrong place... maybe there is no problem while searching
>>> for one?! But I see that at least the app has some issues on GET and
>>> PUT (more on PUT).. so I would like to know how fast the things are..
>>> but "status" isn't working.. argh...
>>> 
>>> Cheers
>>> Simon
>>> 
>>> 
>>> On Wed, 11 Dec 2013 14:32:07 -0500
>>> Matthew Von-Maszewski  wrote:
>>> 
 An additional thought:  if increasing max_open_files does NOT help, try 
 removing +S 4:4 from the vm.args.  Typically +S setting helps leveldb, but 
 one other user mentioned that the new sorted 2i queries needed more CPU in 
 the Erlang layer.
 
 Summary:
 - try increasing max_open_files to 170
 - helps:  try setting sst_block_size to 32768 in app.config
 - does not help:  try removing +S from vm.args
 
 Matthew
 
 On Dec 11, 2013, at 1:58 PM, Simon Effenberg  
 wrote:
 
> Hi Matthew,
> 
> On Wed, 11 Dec 2013 18:38:49 +0100
> Matthew Von-Maszewski  wrote:
> 
>> Simon,
>> 
>> I have plugged your various values into the attached spreadsheet.  I 
>> assumed a vnode count to allow for one of your twelve servers to die 
>> (256 ring size / 11 servers).
> 
> Great, thanks!
> 
>> 
>> The spreadsheet suggests you can safely raise your max_open_files from 
>> 100 to 170.  I would suggest doing this for the next server you upgrade. 
>>  If part of your problem is file cache thrashing, you should see an 
>> improvement.
> 
> I will try this out.. starting the next server in 3-4 hours.
> 
>> 
>> Only if max_open_files helps, you should then consider adding 
>> {sst_block_size, 32767} to the eleveldb portion of app.config.  This 
>> setting, given your value sizes, would likely halve the size of the 
>> metadata held in the file cache.  It only impacts the files newly 
>> compacted in the upgrade, and would gradually increase space in the file 
>> cache while slowing down the file cache thrashing.
> 
> So I'll do this at the over-next server if the next server is fine.
> 
>> 
>> What build/packaging of Riak do you use, or do you build from source?
> 
> Using the debian packages from the basho site..
> 
> I'm really wondering why the "put" performance is that bad.
> Here are the changes which were introduced/changed only on the new
> upgraded servers:
> 
> 
> +fsm_limit => 5,
> --- our '+P' is set to 262144 so more than 3x fsm_limit which was
> --- stated somewhere
> +# after finishing the upgrade this should be switched to v1 !!!
> +object_format => '__atom_v0',
> 
> -  '-env ERL_MAX_ETS_TABLES' => 8192,
> +  '-env ERL_MAX_ETS_TABLES'  => 256000, # old package used 8192
> but 1.4.2 raised it to this high number
> +  '-env ERL_MAX_PORTS'   => 64000,
> +  # Treat error_logger warnings as warnings
> +  '+W'   => 'w',
> +  # Tweak GC to run more often
> +  '-env ERL_FULLSWEEP_AFTER' => 0,
> +  # Force the erlang VM to use SMP

Re: Upgrade from 1.3.1 to 1.4.2 => high IO

2013-12-11 Thread Simon Effenberg
So I think I have no real chance to get good numbers. I can see a
little bit through the app monitoring but I'm not sure if I can see
real differences about the 100 -> 170 open_files increase.

I will try to change the value on the already migrated nodes as well to
see if this improves the stuff I can see..

Any other ideas?

Cheers
Simon
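
Note: as far as I know, eleveldb only reads max_open_files when a node starts,
so trying the new value on the already migrated nodes means editing the
eleveldb section of app.config and restarting each node in turn. A rough
sketch, assuming the Debian package paths:

  # on each already-migrated node, after raising max_open_files
  # in /etc/riak/app.config from 100 to 170:
  sudo riak stop
  sudo riak start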

On Wed, 11 Dec 2013 15:37:03 -0500
Matthew Von-Maszewski  wrote:

> The real Riak developers have suggested this might be your problem with stats 
> being stuck:
> 
> https://github.com/basho/riak_core/pull/467
> 
> The fix is included in the upcoming 1.4.4 maintenance release (which is 
> overdue so I am not going to bother guessing when it will actually arrive).
> 
> Matthew
> 
> On Dec 11, 2013, at 2:47 PM, Simon Effenberg  
> wrote:
> 
> > I will do..
> > 
> > but one other thing:
> > 
> > watch Every 10.0s: sudo riak-admin status | grep put_fsm
> > node_put_fsm_time_mean : 2208050
> > node_put_fsm_time_median : 39231
> > node_put_fsm_time_95 : 17400382
> > node_put_fsm_time_99 : 50965752
> > node_put_fsm_time_100 : 59537762
> > node_put_fsm_active : 5
> > node_put_fsm_active_60s : 364
> > node_put_fsm_in_rate : 5
> > node_put_fsm_out_rate : 3
> > node_put_fsm_rejected : 0
> > node_put_fsm_rejected_60s : 0
> > node_put_fsm_rejected_total : 0
> > 
> > this is not changing at all.. so maybe my expectations are _wrong_?! So
> > I will start searching around if there is a "status" bug or I'm
> > looking in the wrong place... maybe there is no problem while searching
> > for one?! But I see that at least the app has some issues on GET and
> > PUT (more on PUT).. so I would like to know how fast the things are..
> > but "status" isn't working.. argh...
> > 
> > Cheers
> > Simon
> > 
> > 
> > On Wed, 11 Dec 2013 14:32:07 -0500
> > Matthew Von-Maszewski  wrote:
> > 
> >> An additional thought:  if increasing max_open_files does NOT help, try 
> >> removing +S 4:4 from the vm.args.  Typically +S setting helps leveldb, but 
> >> one other user mentioned that the new sorted 2i queries needed more CPU in 
> >> the Erlang layer.
> >> 
> >> Summary:
> >> - try increasing max_open_files to 170
> >>  - helps:  try setting sst_block_size to 32768 in app.config
> >>  - does not help:  try removing +S from vm.args
> >> 
> >> Matthew
> >> 
> >> On Dec 11, 2013, at 1:58 PM, Simon Effenberg  
> >> wrote:
> >> 
> >>> Hi Matthew,
> >>> 
> >>> On Wed, 11 Dec 2013 18:38:49 +0100
> >>> Matthew Von-Maszewski  wrote:
> >>> 
>  Simon,
>  
>  I have plugged your various values into the attached spreadsheet.  I 
>  assumed a vnode count to allow for one of your twelve servers to die 
>  (256 ring size / 11 servers).
> >>> 
> >>> Great, thanks!
> >>> 
>  
>  The spreadsheet suggests you can safely raise your max_open_files from 
>  100 to 170.  I would suggest doing this for the next server you upgrade. 
>   If part of your problem is file cache thrashing, you should see an 
>  improvement.
> >>> 
> >>> I will try this out.. starting the next server in 3-4 hours.
> >>> 
>  
>  Only if max_open_files helps, you should then consider adding 
>  {sst_block_size, 32767} to the eleveldb portion of app.config.  This 
>  setting, given your value sizes, would likely halve the size of the 
>  metadata held in the file cache.  It only impacts the files newly 
>  compacted in the upgrade, and would gradually increase space in the file 
>  cache while slowing down the file cache thrashing.
> >>> 
> >>> So I'll do this at the over-next server if the next server is fine.
> >>> 
>  
>  What build/packaging of Riak do you use, or do you build from source?
> >>> 
> >>> Using the debian packages from the basho site..
> >>> 
> >>> I'm really wondering why the "put" performance is that bad.
> >>> Here are the changes which were introduced/changed only on the new
> >>> upgraded servers:
> >>> 
> >>> 
> >>> +fsm_limit => 5,
> >>> --- our '+P' is set to 262144 so more than 3x fsm_limit which was
> >>> --- stated somewhere
> >>> +# after finishing the upgrade this should be switched to v1 !!!
> >>> +object_format => '__atom_v0',
> >>> 
> >>> -  '-env ERL_MAX_ETS_TABLES' => 8192,
> >>> +  '-env ERL_MAX_ETS_TABLES'  => 256000, # old package used 8192
> >>> but 1.4.2 raised it to this high number
> >>> +  '-env ERL_MAX_PORTS'   => 64000,
> >>> +  # Treat error_logger warnings as warnings
> >>> +  '+W'   => 'w',
> >>> +  # Tweak GC to run more often
> >>> +  '-env ERL_FULLSWEEP_AFTER' => 0,
> >>> +  # Force the erlang VM to use SMP
> >>> +  '-smp' => 'enable',
> >>> +  #
> >>> 
> >>> Cheers
> >>> Simon
> >>> 
> >>> 
>  
>  Matthew
>  
>  
>  
>  On Dec 11, 2013, at 9:48 AM, Simon Effenberg  
>  wrote:
>  
> > Hi Matthew,
> > 
> > thanks 

Re: Upgrade from 1.3.1 to 1.4.2 => high IO

2013-12-11 Thread Matthew Von-Maszewski
The real Riak developers have suggested this might be your problem with stats 
being stuck:

https://github.com/basho/riak_core/pull/467

The fix is included in the upcoming 1.4.4 maintenance release (which is overdue 
so I am not going to bother guessing when it will actually arrive).

Matthew

On Dec 11, 2013, at 2:47 PM, Simon Effenberg  wrote:

> I will do..
> 
> but one other thing:
> 
> watch Every 10.0s: sudo riak-admin status | grep put_fsm
> node_put_fsm_time_mean : 2208050
> node_put_fsm_time_median : 39231
> node_put_fsm_time_95 : 17400382
> node_put_fsm_time_99 : 50965752
> node_put_fsm_time_100 : 59537762
> node_put_fsm_active : 5
> node_put_fsm_active_60s : 364
> node_put_fsm_in_rate : 5
> node_put_fsm_out_rate : 3
> node_put_fsm_rejected : 0
> node_put_fsm_rejected_60s : 0
> node_put_fsm_rejected_total : 0
> 
> this is not changing at all.. so maybe my expectations are _wrong_?! So
> I will start searching around if there is a "status" bug or I'm
> looking in the wrong place... maybe there is no problem while searching
> for one?! But I see that at least the app has some issues on GET and
> PUT (more on PUT).. so I would like to know how fast the things are..
> but "status" isn't working.. argh...
> 
> Cheers
> Simon
> 
> 
> On Wed, 11 Dec 2013 14:32:07 -0500
> Matthew Von-Maszewski  wrote:
> 
>> An additional thought:  if increasing max_open_files does NOT help, try 
>> removing +S 4:4 from the vm.args.  Typically +S setting helps leveldb, but 
>> one other user mentioned that the new sorted 2i queries needed more CPU in 
>> the Erlang layer.
>> 
>> Summary:
>> - try increasing max_open_files to 170
>>  - helps:  try setting sst_block_size to 32768 in app.config
>>  - does not help:  try removing +S from vm.args
>> 
>> Matthew
>> 
>> On Dec 11, 2013, at 1:58 PM, Simon Effenberg  
>> wrote:
>> 
>>> Hi Matthew,
>>> 
>>> On Wed, 11 Dec 2013 18:38:49 +0100
>>> Matthew Von-Maszewski  wrote:
>>> 
 Simon,
 
 I have plugged your various values into the attached spreadsheet.  I 
 assumed a vnode count to allow for one of your twelve servers to die (256 
 ring size / 11 servers).
>>> 
>>> Great, thanks!
>>> 
 
 The spreadsheet suggests you can safely raise your max_open_files from 100 
 to 170.  I would suggest doing this for the next server you upgrade.  If 
 part of your problem is file cache thrashing, you should see an 
 improvement.
>>> 
>>> I will try this out.. starting the next server in 3-4 hours.
>>> 
 
 Only if max_open_files helps, you should then consider adding 
 {sst_block_size, 32767} to the eleveldb portion of app.config.  This 
>  setting, given your value sizes, would likely halve the size of the 
 metadata held in the file cache.  It only impacts the files newly 
 compacted in the upgrade, and would gradually increase space in the file 
 cache while slowing down the file cache thrashing.
>>> 
>>> So I'll do this at the over-next server if the next server is fine.
>>> 
 
 What build/packaging of Riak do you use, or do you build from source?
>>> 
>>> Using the debian packages from the basho site..
>>> 
>>> I'm really wondering why the "put" performance is that bad.
>>> Here are the changes which were introduced/changed only on the new
>>> upgraded servers:
>>> 
>>> 
>>> +fsm_limit => 5,
>>> --- our '+P' is set to 262144 so more than 3x fsm_limit which was
>>> --- stated somewhere
>>> +# after finishing the upgrade this should be switched to v1 !!!
>>> +object_format => '__atom_v0',
>>> 
>>> -  '-env ERL_MAX_ETS_TABLES' => 8192,
>>> +  '-env ERL_MAX_ETS_TABLES'  => 256000, # old package used 8192
>>> but 1.4.2 raised it to this high number
>>> +  '-env ERL_MAX_PORTS'   => 64000,
>>> +  # Treat error_logger warnings as warnings
>>> +  '+W'   => 'w',
>>> +  # Tweak GC to run more often
>>> +  '-env ERL_FULLSWEEP_AFTER' => 0,
>>> +  # Force the erlang VM to use SMP
>>> +  '-smp' => 'enable',
>>> +  #
>>> 
>>> Cheers
>>> Simon
>>> 
>>> 
 
 Matthew
 
 
 
 On Dec 11, 2013, at 9:48 AM, Simon Effenberg  
 wrote:
 
> Hi Matthew,
> 
> thanks for all your time and work.. see inline for answers..
> 
> On Wed, 11 Dec 2013 09:17:32 -0500
> Matthew Von-Maszewski  wrote:
> 
>> The real Riak developers have arrived on-line for the day.  They are 
>> telling me that all of your problems are likely due to the extended 
>> upgrade times, and yes there is a known issue with handoff between 1.3 
>> and 1.4.  They also say everything should calm down after all nodes are 
>> upgraded.
>> 
>> I will review your system settings now and see if there is something 
>> that might make the other machines upgrade quicker.  So three more 
>> questions:
>> 
>> - what is 

Re: Upgrade from 1.3.1 to 1.4.2 => high IO

2013-12-11 Thread Simon Effenberg
I will do..

but one other thing:

watch Every 10.0s: sudo riak-admin status | grep put_fsm
node_put_fsm_time_mean : 2208050
node_put_fsm_time_median : 39231
node_put_fsm_time_95 : 17400382
node_put_fsm_time_99 : 50965752
node_put_fsm_time_100 : 59537762
node_put_fsm_active : 5
node_put_fsm_active_60s : 364
node_put_fsm_in_rate : 5
node_put_fsm_out_rate : 3
node_put_fsm_rejected : 0
node_put_fsm_rejected_60s : 0
node_put_fsm_rejected_total : 0

this is not changing at all.. so maybe my expectations are _wrong_?! So
I will start searching around if there is a "status" bug or I'm
looking in the wrong place... maybe there is no problem while searching
for one?! But I see that at least the app has some issues on GET and
PUT (more on PUT).. so I would like to know how fast the things are..
but "status" isn't working.. argh...

Cheers
Simon
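
Note on units: the node_put_fsm_time_* values are, to my knowledge, reported
in microseconds, so as a quick conversion of the numbers above:

  node_put_fsm_time_mean : 2208050 us  ~  2.2 s
  node_put_fsm_time_95   : 17400382 us ~ 17.4 s
  node_put_fsm_time_100  : 59537762 us ~ 59.5 s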


On Wed, 11 Dec 2013 14:32:07 -0500
Matthew Von-Maszewski  wrote:

> An additional thought:  if increasing max_open_files does NOT help, try 
> removing +S 4:4 from the vm.args.  Typically +S setting helps leveldb, but 
> one other user mentioned that the new sorted 2i queries needed more CPU in 
> the Erlang layer.
> 
> Summary:
> - try increasing max_open_files to 170
>   - helps:  try setting sst_block_size to 32768 in app.config
>   - does not help:  try removing +S from vm.args
> 
> Matthew
> 
> On Dec 11, 2013, at 1:58 PM, Simon Effenberg  
> wrote:
> 
> > Hi Matthew,
> > 
> > On Wed, 11 Dec 2013 18:38:49 +0100
> > Matthew Von-Maszewski  wrote:
> > 
> >> Simon,
> >> 
> >> I have plugged your various values into the attached spreadsheet.  I 
> >> assumed a vnode count to allow for one of your twelve servers to die (256 
> >> ring size / 11 servers).
> > 
> > Great, thanks!
> > 
> >> 
> >> The spreadsheet suggests you can safely raise your max_open_files from 100 
> >> to 170.  I would suggest doing this for the next server you upgrade.  If 
> >> part of your problem is file cache thrashing, you should see an 
> >> improvement.
> > 
> > I will try this out.. starting the next server in 3-4 hours.
> > 
> >> 
> >> Only if max_open_files helps, you should then consider adding 
> >> {sst_block_size, 32767} to the eleveldb portion of app.config.  This 
> >> setting, given your value sizes, would likely halve the size of the 
> >> metadata held in the file cache.  It only impacts the files newly 
> >> compacted in the upgrade, and would gradually increase space in the file 
> >> cache while slowing down the file cache thrashing.
> > 
> > So I'll do this at the over-next server if the next server is fine.
> > 
> >> 
> >> What build/packaging of Riak do you use, or do you build from source?
> > 
> > Using the debian packages from the basho site..
> > 
> > I'm really wondering why the "put" performance is that bad.
> > Here are the changes which were introduced/changed only on the new
> > upgraded servers:
> > 
> > 
> > +fsm_limit => 5,
> > --- our '+P' is set to 262144 so more than 3x fsm_limit which was
> > --- stated somewhere
> > +# after finishing the upgrade this should be switched to v1 !!!
> > +object_format => '__atom_v0',
> > 
> > -  '-env ERL_MAX_ETS_TABLES' => 8192,
> > +  '-env ERL_MAX_ETS_TABLES'  => 256000, # old package used 8192
> > but 1.4.2 raised it to this high number
> > +  '-env ERL_MAX_PORTS'   => 64000,
> > +  # Treat error_logger warnings as warnings
> > +  '+W'   => 'w',
> > +  # Tweak GC to run more often
> > +  '-env ERL_FULLSWEEP_AFTER' => 0,
> > +  # Force the erlang VM to use SMP
> > +  '-smp' => 'enable',
> > +  #
> > 
> > Cheers
> > Simon
> > 
> > 
> >> 
> >> Matthew
> >> 
> >> 
> >> 
> >> On Dec 11, 2013, at 9:48 AM, Simon Effenberg  
> >> wrote:
> >> 
> >>> Hi Matthew,
> >>> 
> >>> thanks for all your time and work.. see inline for answers..
> >>> 
> >>> On Wed, 11 Dec 2013 09:17:32 -0500
> >>> Matthew Von-Maszewski  wrote:
> >>> 
>  The real Riak developers have arrived on-line for the day.  They are 
>  telling me that all of your problems are likely due to the extended 
>  upgrade times, and yes there is a known issue with handoff between 1.3 
>  and 1.4.  They also say everything should calm down after all nodes are 
>  upgraded.
>  
>  I will review your system settings now and see if there is something 
>  that might make the other machines upgrade quicker.  So three more 
>  questions:
>  
>  - what is the average size of your keys
> >>> 
> >>> bucket names are between 5 and 15 characters (only ~ 10 buckets)..
> >>> key names are normally something like 26iesj:hovh7egz
> >>> 
>  
>  - what is the average size of your value (data stored)
> >>> 
> >>> I have to guess.. but mean is (from Riak) 12kb but 95th percentile is
> >>> at 75kb and in theory we have a limit of 1MB (then it will be split up)
> >>> but sometimes 

Re: Upgrade from 1.3.1 to 1.4.2 => high IO

2013-12-11 Thread Matthew Von-Maszewski
An additional thought:  if increasing max_open_files does NOT help, try 
removing +S 4:4 from the vm.args.  Typically +S setting helps leveldb, but one 
other user mentioned that the new sorted 2i queries needed more CPU in the 
Erlang layer.

Summary:
- try increasing max_open_files to 170
  - helps:  try setting sst_block_size to 32768 in app.config
  - does not help:  try removing +S from vm.args

Matthew
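
Note: a rough sketch of where those three knobs live, assuming the stock
config layout of the Debian packages; the values are the ones from the
summary above:

  %% /etc/riak/app.config -- eleveldb section
  {eleveldb, [
      %% ... existing settings ...
      {max_open_files, 170},    %% step 1: raise from 100
      {sst_block_size, 32768}   %% step 2: only if step 1 helps
  ]},

  ## /etc/riak/vm.args -- step 3: only if step 1 does not help,
  ## comment out or remove the scheduler cap
  ## +S 4:4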

On Dec 11, 2013, at 1:58 PM, Simon Effenberg  wrote:

> Hi Matthew,
> 
> On Wed, 11 Dec 2013 18:38:49 +0100
> Matthew Von-Maszewski  wrote:
> 
>> Simon,
>> 
>> I have plugged your various values into the attached spreadsheet.  I assumed 
>> a vnode count to allow for one of your twelve servers to die (256 ring size 
>> / 11 servers).
> 
> Great, thanks!
> 
>> 
>> The spreadsheet suggests you can safely raise your max_open_files from 100 
>> to 170.  I would suggest doing this for the next server you upgrade.  If 
>> part of your problem is file cache thrashing, you should see an improvement.
> 
> I will try this out.. starting the next server in 3-4 hours.
> 
>> 
>> Only if max_open_files helps, you should then consider adding 
>> {sst_block_size, 32767} to the eleveldb portion of app.config.  This 
>> setting, given your value sizes, would likely halve the size of the metadata 
>> held in the file cache.  It only impacts the files newly compacted in the 
>> upgrade, and would gradually increase space in the file cache while slowing 
>> down the file cache thrashing.
> 
> So I'll do this at the over-next server if the next server is fine.
> 
>> 
>> What build/packaging of Riak do you use, or do you build from source?
> 
> Using the debian packages from the basho site..
> 
> I'm really wondering why the "put" performance is that bad.
> Here are the changes which were introduced/changed only on the new
> upgraded servers:
> 
> 
> +fsm_limit => 5,
> --- our '+P' is set to 262144 so more than 3x fsm_limit which was
> --- stated somewhere
> +# after finishing the upgrade this should be switched to v1 !!!
> +object_format => '__atom_v0',
> 
> -  '-env ERL_MAX_ETS_TABLES' => 8192,
> +  '-env ERL_MAX_ETS_TABLES'  => 256000, # old package used 8192
> but 1.4.2 raised it to this high number
> +  '-env ERL_MAX_PORTS'   => 64000,
> +  # Treat error_logger warnings as warnings
> +  '+W'   => 'w',
> +  # Tweak GC to run more often
> +  '-env ERL_FULLSWEEP_AFTER' => 0,
> +  # Force the erlang VM to use SMP
> +  '-smp' => 'enable',
> +  #
> 
> Cheers
> Simon
> 
> 
>> 
>> Matthew
>> 
>> 
>> 
>> On Dec 11, 2013, at 9:48 AM, Simon Effenberg  
>> wrote:
>> 
>>> Hi Matthew,
>>> 
>>> thanks for all your time and work.. see inline for answers..
>>> 
>>> On Wed, 11 Dec 2013 09:17:32 -0500
>>> Matthew Von-Maszewski  wrote:
>>> 
 The real Riak developers have arrived on-line for the day.  They are 
 telling me that all of your problems are likely due to the extended 
 upgrade times, and yes there is a known issue with handoff between 1.3 and 
 1.4.  They also say everything should calm down after all nodes are 
 upgraded.
 
 I will review your system settings now and see if there is something that 
 might make the other machines upgrade quicker.  So three more questions:
 
 - what is the average size of your keys
>>> 
>>> bucket names are between 5 and 15 characters (only ~ 10 buckets)..
>>> key names are normally something like 26iesj:hovh7egz
>>> 
 
 - what is the average size of your value (data stored)
>>> 
>>> I have to guess.. but mean is (from Riak) 12kb but 95th percentile is
>>> at 75kb and in theory we have a limit of 1MB (then it will be split up)
>>> but sometimes, thanks to siblings (we have two buckets with allow_mult),
>>> we also have some 7MB at max, but this will be reduced again (it's a new
>>> feature in our app which does too many parallel writes within 15ms).
>>> 
 
 - in regular use, are your keys accessed randomly across their entire 
 range, or do they contain a date component which clusters older, less used 
 keys
>>> 
>>> normally we don't search but retrieve keys by key name.. and we have
>>> data which is up to 6 months old and normally we access mostly
>>> new/active/hot data and not all the old ones.. besides this we have a
>>> job doing a 2i query every 5mins and another one doing this maybe once
>>> an hour.. both don't work while the upgrade is ongoing (2i isn't
>>> working).
>>> 
>>> Cheers
>>> Simon
>>> 
 
 Matthew
 
 
 On Dec 11, 2013, at 8:43 AM, Simon Effenberg  
 wrote:
 
> Oh and at the moment they are waiting for some handoffs and I see
> errors in logfiles:
> 
> 
> 2013-12-11 13:41:47.948 UTC [error]
> <0.7157.24>@riak_core_handoff_sender:start_fold:269 hinted_handoff
> transfer of riak_kv_vnode from 

Re: Upgrade from 1.3.1 to 1.4.2 => high IO

2013-12-11 Thread Simon Effenberg
Hi Matthew,

On Wed, 11 Dec 2013 18:38:49 +0100
Matthew Von-Maszewski  wrote:

> Simon,
> 
> I have plugged your various values into the attached spreadsheet.  I assumed 
> a vnode count to allow for one of your twelve servers to die (256 ring size / 
> 11 servers).

Great, thanks!

> 
> The spreadsheet suggests you can safely raise your max_open_files from 100 to 
> 170.  I would suggest doing this for the next server you upgrade.  If part of 
> your problem is file cache thrashing, you should see an improvement.

I will try this out.. starting the next server in 3-4 hours.

> 
> Only if max_open_files helps, you should then consider adding 
> {sst_block_size, 32767} to the eleveldb portion of app.config.  This setting, 
> given your value sizes, would likely halve the size of the metadata held in 
> the file cache.  It only impacts the files newly compacted in the upgrade, 
> and would gradually increase space in the file cache while slowing down the 
> file cache thrashing.

So I'll do this at the over-next server if the next server is fine.

> 
> What build/packaging of Riak do you use, or do you build from source?

Using the debian packages from the basho site..

I'm really wondering why the "put" performance is that bad.
Here are the changes which were introduced/changed only on the new
upgraded servers:


+fsm_limit => 5,
--- our '+P' is set to 262144 so more than 3x fsm_limit which was
--- stated somewhere
+# after finishing the upgrade this should be switched to v1 !!!
+object_format => '__atom_v0',

-  '-env ERL_MAX_ETS_TABLES' => 8192,
+  '-env ERL_MAX_ETS_TABLES'  => 256000, # old package used 8192
but 1.4.2 raised it to this high number
+  '-env ERL_MAX_PORTS'   => 64000,
+  # Treat error_logger warnings as warnings
+  '+W'   => 'w',
+  # Tweak GC to run more often
+  '-env ERL_FULLSWEEP_AFTER' => 0,
+  # Force the erlang VM to use SMP
+  '-smp' => 'enable',
+  #

Cheers
Simon
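
Note: the diff above is in config-management syntax; for reference, a sketch
of what those entries presumably render to in vm.args (values exactly as
listed, nothing added):

  -env ERL_MAX_ETS_TABLES 256000
  -env ERL_MAX_PORTS 64000
  +W w
  -env ERL_FULLSWEEP_AFTER 0
  -smp enable
  +P 262144

The fsm_limit and object_format entries would correspondingly end up in the
riak_kv section of app.config, presumably as {fsm_limit, ...} and
{object_format, v0}.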


> 
> Matthew
> 
> 
> 
> On Dec 11, 2013, at 9:48 AM, Simon Effenberg  
> wrote:
> 
> > Hi Matthew,
> >
> > thanks for all your time and work.. see inline for answers..
> >
> > On Wed, 11 Dec 2013 09:17:32 -0500
> > Matthew Von-Maszewski  wrote:
> >
> >> The real Riak developers have arrived on-line for the day.  They are 
> >> telling me that all of your problems are likely due to the extended 
> >> upgrade times, and yes there is a known issue with handoff between 1.3 and 
> >> 1.4.  They also say everything should calm down after all nodes are 
> >> upgraded.
> >>
> >> I will review your system settings now and see if there is something that 
> >> might make the other machines upgrade quicker.  So three more questions:
> >>
> >> - what is the average size of your keys
> >
> > bucket names are between 5 and 15 characters (only ~ 10 buckets)..
> > key names are normally something like 26iesj:hovh7egz
> >
> >>
> >> - what is the average size of your value (data stored)
> >
> > I have to guess.. but mean is (from Riak) 12kb but 95th percentile is
> > at 75kb and in theory we have a limit of 1MB (then it will be split up)
> > but sometimes, thanks to siblings (we have two buckets with allow_mult),
> > we also have some 7MB at max, but this will be reduced again (it's a new
> > feature in our app which does too many parallel writes within 15ms).
> >
> >>
> >> - in regular use, are your keys accessed randomly across their entire 
> >> range, or do they contain a date component which clusters older, less used 
> >> keys
> >
> > normally we don't search but retrieve keys by key name.. and we have
> > data which is up to 6 months old and normally we access mostly
> > new/active/hot data and not all the old ones.. besides this we have a
> > job doing a 2i query every 5mins and another one doing this maybe once
> > an hour.. both don't work while the upgrade is ongoing (2i isn't
> > working).
> >
> > Cheers
> > Simon
> >
> >>
> >> Matthew
> >>
> >>
> >> On Dec 11, 2013, at 8:43 AM, Simon Effenberg  
> >> wrote:
> >>
> >>> Oh and at the moment they are waiting for some handoffs and I see
> >>> errors in logfiles:
> >>>
> >>>
> >>> 2013-12-11 13:41:47.948 UTC [error]
> >>> <0.7157.24>@riak_core_handoff_sender:start_fold:269 hinted_handoff
> >>> transfer of riak_kv_vnode from 'riak@10.46.109.202'
> >>> 468137243207554840987117797979434404733540892672
> >>>
> >>> but I remember that somebody else had this as well and if I recall
> >>> correctly it disappeared after the full upgrade was done.. but at the
> >>> moment it's hard to think about upgrading everything at once..
> >>> (~12hours 100% disk utilization on all 12 nodes will lead to real slow
> >>> puts/gets)
> >>>
> >>> What can I do?
> >>>
> >>> Cheers
> >>> Simon
> >>>
> >>> PS: transfers output:
> >>>
> >>> 'riak@10.46.109.202' waiting to handoff 17 partitions
> > 'riak@10.46.109.201' waiting to handoff 19 partitions

Re: Upgrade from 1.3.1 to 1.4.2 => high IO

2013-12-11 Thread Simon Effenberg
Hi Matthew,

thanks for all your time and work.. see inline for answers..

On Wed, 11 Dec 2013 09:17:32 -0500
Matthew Von-Maszewski  wrote:

> The real Riak developers have arrived on-line for the day.  They are telling 
> me that all of your problems are likely due to the extended upgrade times, 
> and yes there is a known issue with handoff between 1.3 and 1.4.  They also 
> say everything should calm down after all nodes are upgraded.
> 
> I will review your system settings now and see if there is something that 
> might make the other machines upgrade quicker.  So three more questions:
> 
> - what is the average size of your keys

bucket names are between 5 and 15 characters (only ~ 10 buckets)..
key names are normally something like 26iesj:hovh7egz

> 
> - what is the average size of your value (data stored)

I have to guess.. but mean is (from Riak) 12kb but 95th percentile is
at 75kb and in theory we have a limit of 1MB (then it will be split up)
but sometimes, thanks to siblings (we have two buckets with allow_mult),
we also have some 7MB at max, but this will be reduced again (it's a new
feature in our app which does too many parallel writes within 15ms).

> 
> - in regular use, are your keys accessed randomly across their entire range, 
> or do they contain a date component which clusters older, less used keys

normally we don't search but retrieve keys by key name.. and we have
data which is up to 6 months old and normally we access mostly
new/active/hot data and not all the old ones.. besides this we have a
job doing a 2i query every 5mins and another one doing this maybe once
an hour.. both don't work while the upgrade is ongoing (2i isn't
working).

Cheers
Simon

> 
> Matthew
> 
> 
> On Dec 11, 2013, at 8:43 AM, Simon Effenberg  
> wrote:
> 
> > Oh and at the moment they are waiting for some handoffs and I see
> > errors in logfiles:
> > 
> > 
> > 2013-12-11 13:41:47.948 UTC [error]
> > <0.7157.24>@riak_core_handoff_sender:start_fold:269 hinted_handoff
> > transfer of riak_kv_vnode from 'riak@10.46.109.202'
> > 468137243207554840987117797979434404733540892672
> > 
> > but I remember that somebody else had this as well and if I recall
> > correctly it disappeared after the full upgrade was done.. but at the
> > moment it's hard to think about upgrading everything at once..
> > (~12hours 100% disk utilization on all 12 nodes will lead to real slow
> > puts/gets)
> > 
> > What can I do?
> > 
> > Cheers
> > Simon
> > 
> > PS: transfers output:
> > 
> > 'riak@10.46.109.202' waiting to handoff 17 partitions
> > 'riak@10.46.109.201' waiting to handoff 19 partitions
> > 
> > (these are the 1.4.2 nodes)
> > 
> > 
> > On Wed, 11 Dec 2013 14:39:58 +0100
> > Simon Effenberg  wrote:
> > 
> >> Also some side notes:
> >> 
> >> "top" is even better on new 1.4.2 than on 1.3.1 machines.. IO
> >> utilization of disk is mostly the same (round about 33%)..
> >> 
> >> but
> >> 
> >> 95th percentile of response time for get (avg over all nodes):
> >>  before upgrade: 29ms
> >>  after upgrade: almost the same
> >> 
> >> 95th percentile of response time for put (avg over all nodes):
> >>  before upgrade: 60ms
> >>  after upgrade: 1548ms 
> >>but this is only because 2 of the 12 nodes are
> >>on 1.4.2 and are really slow (17000ms)
> >> 
> >> Cheers,
> >> Simon
> >> 
> >> On Wed, 11 Dec 2013 13:45:56 +0100
> >> Simon Effenberg  wrote:
> >> 
> >>> Sorry I forgot the half of it..
> >>> 
> >>> seffenberg@kriak46-1:~$ free -m
> >>>              total       used       free     shared    buffers     cached
> >>> Mem:         23999      23759        239          0        184      16183
> >>> -/+ buffers/cache:       7391      16607
> >>> Swap:            0          0          0
> >>> 
> >>> We have 12 servers..
> >>> datadir on the compacted servers (1.4.2) ~ 765 GB
> >>> 
> >>> AAE is enabled.
> >>> 
> >>> I attached app.config and vm.args.
> >>> 
> >>> Cheers
> >>> Simon
> >>> 
> >>> On Wed, 11 Dec 2013 07:33:31 -0500
> >>> Matthew Von-Maszewski  wrote:
> >>> 
>  Ok, I am now suspecting that your servers are either using swap space 
>  (which is slow) or your leveldb file cache is thrashing (opening and 
>  closing multiple files per request).
>  
>  How many servers do you have and do you use Riak's active anti-entropy 
>  feature?  I am going to plug all of this into a spreadsheet. 
>  
>  Matthew Von-Maszewski
>  
>  
>  On Dec 11, 2013, at 7:09, Simon Effenberg  
>  wrote:
>  
> > Hi Matthew
> > 
> > Memory: 23999 MB
> > 
> > ring_creation_size, 256
> > max_open_files, 100
> > 
> > riak-admin status:
> > 
> > memory_total : 276001360
> > memory_processes : 191506322
> > memory_processes_used : 191439568
> > memory_system : 84495038
> > memory_atom : 686993
> > memory_atom_used : 686560
> > memory_binary : 21965352
> > memory_code : 11332732
> > memory_ets : 10823528
> > 
> > Thanks for looking!

Re: Upgrade from 1.3.1 to 1.4.2 => high IO

2013-12-11 Thread Matthew Von-Maszewski
The real Riak developers have arrived on-line for the day.  They are telling me 
that all of your problems are likely due to the extended upgrade times, and yes 
there is a known issue with handoff between 1.3 and 1.4.  They also say 
everything should calm down after all nodes are upgraded.

I will review your system settings now and see if there is something that might 
make the other machines upgrade quicker.  So three more questions:

- what is the average size of your keys

- what is the average size of your value (data stored)

- in regular use, are your keys accessed randomly across their entire range, or 
do they contain a date component which clusters older, less used keys

Matthew


On Dec 11, 2013, at 8:43 AM, Simon Effenberg  wrote:

> Oh and at the moment they are waiting for some handoffs and I see
> errors in logfiles:
> 
> 
> 2013-12-11 13:41:47.948 UTC [error]
> <0.7157.24>@riak_core_handoff_sender:start_fold:269 hinted_handoff
> transfer of riak_kv_vnode from 'riak@10.46.109.202'
> 468137243207554840987117797979434404733540892672
> 
> but I remember that somebody else had this as well and if I recall
> correctly it disappeared after the full upgrade was done.. but at the
> moment it's hard to think about upgrading everything at once..
> (~12hours 100% disk utilization on all 12 nodes will lead to real slow
> puts/gets)
> 
> What can I do?
> 
> Cheers
> Simon
> 
> PS: transfers output:
> 
> 'riak@10.46.109.202' waiting to handoff 17 partitions
> 'riak@10.46.109.201' waiting to handoff 19 partitions
> 
> (these are the 1.4.2 nodes)
> 
> 
> On Wed, 11 Dec 2013 14:39:58 +0100
> Simon Effenberg  wrote:
> 
>> Also some side notes:
>> 
>> "top" is even better on new 1.4.2 than on 1.3.1 machines.. IO
>> utilization of disk is mostly the same (round about 33%)..
>> 
>> but
>> 
>> 95th percentile of response time for get (avg over all nodes):
>>  before upgrade: 29ms
>>  after upgrade: almost the same
>> 
>> 95th percentile of response time for put (avg over all nodes):
>>  before upgrade: 60ms
>>  after upgrade: 1548ms 
>>but this is only because 2 of the 12 nodes are
>>on 1.4.2 and are really slow (17000ms)
>> 
>> Cheers,
>> Simon
>> 
>> On Wed, 11 Dec 2013 13:45:56 +0100
>> Simon Effenberg  wrote:
>> 
>>> Sorry I forgot the half of it..
>>> 
>>> seffenberg@kriak46-1:~$ free -m
>>>              total       used       free     shared    buffers     cached
>>> Mem:         23999      23759        239          0        184      16183
>>> -/+ buffers/cache:       7391      16607
>>> Swap:            0          0          0
>>> 
>>> We have 12 servers..
>>> datadir on the compacted servers (1.4.2) ~ 765 GB
>>> 
>>> AAE is enabled.
>>> 
>>> I attached app.config and vm.args.
>>> 
>>> Cheers
>>> Simon
>>> 
>>> On Wed, 11 Dec 2013 07:33:31 -0500
>>> Matthew Von-Maszewski  wrote:
>>> 
 Ok, I am now suspecting that your servers are either using swap space 
 (which is slow) or your leveldb file cache is thrashing (opening and 
 closing multiple files per request).
 
 How many servers do you have and do you use Riak's active anti-entropy 
 feature?  I am going to plug all of this into a spreadsheet. 
 
 Matthew Von-Maszewski
 
 
 On Dec 11, 2013, at 7:09, Simon Effenberg  
 wrote:
 
> Hi Matthew
> 
> Memory: 23999 MB
> 
> ring_creation_size, 256
> max_open_files, 100
> 
> riak-admin status:
> 
> memory_total : 276001360
> memory_processes : 191506322
> memory_processes_used : 191439568
> memory_system : 84495038
> memory_atom : 686993
> memory_atom_used : 686560
> memory_binary : 21965352
> memory_code : 11332732
> memory_ets : 10823528
> 
> Thanks for looking!
> 
> Cheers
> Simon
> 
> 
> 
> On Wed, 11 Dec 2013 06:44:42 -0500
> Matthew Von-Maszewski  wrote:
> 
>> I need to ask other developers as they arrive for the new day.  Does not 
>> make sense to me.
>> 
>> How many nodes do you have?  How much RAM do you have in each node?  
>> What are your settings for max_open_files and cache_size in the 
>> app.config file?  Maybe this is as simple as leveldb using too much RAM 
> >> in 1.4.  The memory accounting for max_open_files changed in 1.4.
>> 
>> Matthew Von-Maszewski
>> 
>> 
>> On Dec 11, 2013, at 6:28, Simon Effenberg  
>> wrote:
>> 
>>> Hi Matthew,
>>> 
>>> it took around 11hours for the first node to finish the compaction. The
>>> second node is running already 12 hours and is still doing compaction.
>>> 
> >>> Besides that, I wonder why the fsm_put time on the new 1.4.2 host is
>>> much higher (after the compaction) than on an old 1.3.1 (both are
>>> running in the cluster right now and another one is doing the
>>> compaction/upgrade while it is in the cluster but not directly
>>> accessible because it is out of the Loadbalancer):
>>> 

Re: Upgrade from 1.3.1 to 1.4.2 => high IO

2013-12-11 Thread Simon Effenberg
Oh and at the moment they are waiting for some handoffs and I see
errors in logfiles:


2013-12-11 13:41:47.948 UTC [error]
<0.7157.24>@riak_core_handoff_sender:start_fold:269 hinted_handoff
transfer of riak_kv_vnode from 'riak@10.46.109.202'
468137243207554840987117797979434404733540892672

but I remember that somebody else had this as well and if I recall
correctly it disappeared after the full upgrade was done.. but at the
moment it's hard to think about upgrading everything at once..
(~12hours 100% disk utilization on all 12 nodes will lead to real slow
puts/gets)

What can I do?

Cheers
Simon

PS: transfers output:

'riak@10.46.109.202' waiting to handoff 17 partitions
'riak@10.46.109.201' waiting to handoff 19 partitions

(these are the 1.4.2 nodes)
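
Note: the transfers output above comes from riak-admin transfers; if I
remember right, 1.4 also adds riak-admin transfer-limit, which is at least a
way to inspect the per-node handoff concurrency while handoffs are pending:

  $ riak-admin transfers        # per-node "waiting to handoff N partitions"
  $ riak-admin transfer-limit   # show current handoff concurrency (1.4+)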


On Wed, 11 Dec 2013 14:39:58 +0100
Simon Effenberg  wrote:

> Also some side notes:
> 
> "top" is even better on new 1.4.2 than on 1.3.1 machines.. IO
> utilization of disk is mostly the same (round about 33%)..
> 
> but
> 
> 95th percentile of response time for get (avg over all nodes):
>   before upgrade: 29ms
>   after upgrade: almost the same
> 
> 95th percentile of response time for put (avg over all nodes):
>   before upgrade: 60ms
>   after upgrade: 1548ms 
> but this is only because 2 of the 12 nodes are
> on 1.4.2 and are really slow (17000ms)
> 
> Cheers,
> Simon
> 
> On Wed, 11 Dec 2013 13:45:56 +0100
> Simon Effenberg  wrote:
> 
> > Sorry I forgot the half of it..
> > 
> > seffenberg@kriak46-1:~$ free -m
> >              total       used       free     shared    buffers     cached
> > Mem:         23999      23759        239          0        184      16183
> > -/+ buffers/cache:       7391      16607
> > Swap:            0          0          0
> > 
> > We have 12 servers..
> > datadir on the compacted servers (1.4.2) ~ 765 GB
> > 
> > AAE is enabled.
> > 
> > I attached app.config and vm.args.
> > 
> > Cheers
> > Simon
> > 
> > On Wed, 11 Dec 2013 07:33:31 -0500
> > Matthew Von-Maszewski  wrote:
> > 
> > > Ok, I am now suspecting that your servers are either using swap space 
> > > (which is slow) or your leveldb file cache is thrashing (opening and 
> > > closing multiple files per request).
> > > 
> > > How many servers do you have and do you use Riak's active anti-entropy 
> > > feature?  I am going to plug all of this into a spreadsheet. 
> > > 
> > > Matthew Von-Maszewski
> > > 
> > > 
> > > On Dec 11, 2013, at 7:09, Simon Effenberg  
> > > wrote:
> > > 
> > > > Hi Matthew
> > > > 
> > > > Memory: 23999 MB
> > > > 
> > > > ring_creation_size, 256
> > > > max_open_files, 100
> > > > 
> > > > riak-admin status:
> > > > 
> > > > memory_total : 276001360
> > > > memory_processes : 191506322
> > > > memory_processes_used : 191439568
> > > > memory_system : 84495038
> > > > memory_atom : 686993
> > > > memory_atom_used : 686560
> > > > memory_binary : 21965352
> > > > memory_code : 11332732
> > > > memory_ets : 10823528
> > > > 
> > > > Thanks for looking!
> > > > 
> > > > Cheers
> > > > Simon
> > > > 
> > > > 
> > > > 
> > > > On Wed, 11 Dec 2013 06:44:42 -0500
> > > > Matthew Von-Maszewski  wrote:
> > > > 
> > > >> I need to ask other developers as they arrive for the new day.  Does 
> > > >> not make sense to me.
> > > >> 
> > > >> How many nodes do you have?  How much RAM do you have in each node?  
> > > >> What are your settings for max_open_files and cache_size in the 
> > > >> app.config file?  Maybe this is as simple as leveldb using too much 
> > > >> RAM in 1.4.  The memory accounting for max_open_files changed in 1.4.
> > > >> 
> > > >> Matthew Von-Maszewski
> > > >> 
> > > >> 
> > > >> On Dec 11, 2013, at 6:28, Simon Effenberg  
> > > >> wrote:
> > > >> 
> > > >>> Hi Matthew,
> > > >>> 
> > > >>> it took around 11hours for the first node to finish the compaction. 
> > > >>> The
> > > >>> second node is running already 12 hours and is still doing compaction.
> > > >>> 
> > > >>> Besides that, I wonder why the fsm_put time on the new 1.4.2 host 
> > > >>> is
> > > >>> much higher (after the compaction) than on an old 1.3.1 (both are
> > > >>> running in the cluster right now and another one is doing the
> > > >>> compaction/upgrade while it is in the cluster but not directly
> > > >>> accessible because it is out of the Loadbalancer):
> > > >>> 
> > > >>> 1.4.2:
> > > >>> 
> > > >>> node_put_fsm_time_mean : 2208050
> > > >>> node_put_fsm_time_median : 39231
> > > >>> node_put_fsm_time_95 : 17400382
> > > >>> node_put_fsm_time_99 : 50965752
> > > >>> node_put_fsm_time_100 : 59537762
> > > >>> node_put_fsm_active : 5
> > > >>> node_put_fsm_active_60s : 364
> > > >>> node_put_fsm_in_rate : 5
> > > >>> node_put_fsm_out_rate : 3
> > > >>> node_put_fsm_rejected : 0
> > > >>> node_put_fsm_rejected_60s : 0
> > > >>> node_put_fsm_rejected_total : 0
> > > >>> 
> > > >>> 
> > > >>> 1.3.1:
> > > >>> 
> > > >>> node_put_fsm_time_mean : 5036
> > > >>> node_put_fsm_time_median : 1614
> > > >>> node_put_fsm_time_95 : 8789
> > > >>> node_pu

Re: Upgrade from 1.3.1 to 1.4.2 => high IO

2013-12-11 Thread Simon Effenberg
Also some side notes:

"top" is even better on new 1.4.2 than on 1.3.1 machines.. IO
utilization of disk is mostly the same (round about 33%)..

but

95th percentile of response time for get (avg over all nodes):
  before upgrade: 29ms
  after upgrade: almost the same

95th percentile of response time for put (avg over all nodes):
  before upgrade: 60ms
  after upgrade: 1548ms 
but this is only because 2 of the 12 nodes are
on 1.4.2 and are really slow (17000ms)

Cheers,
Simon

On Wed, 11 Dec 2013 13:45:56 +0100
Simon Effenberg  wrote:

> Sorry I forgot the half of it..
> 
> seffenberg@kriak46-1:~$ free -m
>              total       used       free     shared    buffers     cached
> Mem:         23999      23759        239          0        184      16183
> -/+ buffers/cache:       7391      16607
> Swap:            0          0          0
> 
> We have 12 servers..
> datadir on the compacted servers (1.4.2) ~ 765 GB
> 
> AAE is enabled.
> 
> I attached app.config and vm.args.
> 
> Cheers
> Simon
> 
> On Wed, 11 Dec 2013 07:33:31 -0500
> Matthew Von-Maszewski  wrote:
> 
> > Ok, I am now suspecting that your servers are either using swap space 
> > (which is slow) or your leveldb file cache is thrashing (opening and 
> > closing multiple files per request).
> > 
> > How many servers do you have and do you use Riak's active anti-entropy 
> > feature?  I am going to plug all of this into a spreadsheet. 
> > 
> > Matthew Von-Maszewski
> > 
> > 
> > On Dec 11, 2013, at 7:09, Simon Effenberg  wrote:
> > 
> > > Hi Matthew
> > > 
> > > Memory: 23999 MB
> > > 
> > > ring_creation_size, 256
> > > max_open_files, 100
> > > 
> > > riak-admin status:
> > > 
> > > memory_total : 276001360
> > > memory_processes : 191506322
> > > memory_processes_used : 191439568
> > > memory_system : 84495038
> > > memory_atom : 686993
> > > memory_atom_used : 686560
> > > memory_binary : 21965352
> > > memory_code : 11332732
> > > memory_ets : 10823528
> > > 
> > > Thanks for looking!
> > > 
> > > Cheers
> > > Simon
> > > 
> > > 
> > > 
> > > On Wed, 11 Dec 2013 06:44:42 -0500
> > > Matthew Von-Maszewski  wrote:
> > > 
> > >> I need to ask other developers as they arrive for the new day.  Does not 
> > >> make sense to me.
> > >> 
> > >> How many nodes do you have?  How much RAM do you have in each node?  
> > >> What are your settings for max_open_files and cache_size in the 
> > >> app.config file?  Maybe this is as simple as leveldb using too much RAM 
> > >> in 1.4.  The memory accounting for max_open_files changed in 1.4.
> > >> 
> > >> Matthew Von-Maszewski
> > >> 
> > >> 
> > >> On Dec 11, 2013, at 6:28, Simon Effenberg  
> > >> wrote:
> > >> 
> > >>> Hi Matthew,
> > >>> 
> > >>> it took around 11hours for the first node to finish the compaction. The
> > >>> second node is running already 12 hours and is still doing compaction.
> > >>> 
> > >>> Besides that, I wonder why the fsm_put time on the new 1.4.2 host is
> > >>> much higher (after the compaction) than on an old 1.3.1 (both are
> > >>> running in the cluster right now and another one is doing the
> > >>> compaction/upgrade while it is in the cluster but not directly
> > >>> accessible because it is out of the Loadbalancer):
> > >>> 
> > >>> 1.4.2:
> > >>> 
> > >>> node_put_fsm_time_mean : 2208050
> > >>> node_put_fsm_time_median : 39231
> > >>> node_put_fsm_time_95 : 17400382
> > >>> node_put_fsm_time_99 : 50965752
> > >>> node_put_fsm_time_100 : 59537762
> > >>> node_put_fsm_active : 5
> > >>> node_put_fsm_active_60s : 364
> > >>> node_put_fsm_in_rate : 5
> > >>> node_put_fsm_out_rate : 3
> > >>> node_put_fsm_rejected : 0
> > >>> node_put_fsm_rejected_60s : 0
> > >>> node_put_fsm_rejected_total : 0
> > >>> 
> > >>> 
> > >>> 1.3.1:
> > >>> 
> > >>> node_put_fsm_time_mean : 5036
> > >>> node_put_fsm_time_median : 1614
> > >>> node_put_fsm_time_95 : 8789
> > >>> node_put_fsm_time_99 : 38258
> > >>> node_put_fsm_time_100 : 384372
> > >>> 
> > >>> 
> > >>> any clue why this could/should be?
> > >>> 
> > >>> Cheers
> > >>> Simon
> > >>> 
> > >>> On Tue, 10 Dec 2013 17:21:07 +0100
> > >>> Simon Effenberg  wrote:
> > >>> 
> >  Hi Matthew,
> >  
> >  thanks!.. that answers my questions!
> >  
> >  Cheers
> >  Simon
> >  
> >  On Tue, 10 Dec 2013 11:08:32 -0500
> >  Matthew Von-Maszewski  wrote:
> >  
> > > 2i is not my expertise, so I had to discuss your concerns with another 
> > > Basho developer.  He says:
> > > 
> > > Between 1.3 and 1.4, the 2i query did change but not the 2i on-disk 
> > > format.  You must wait for all nodes to update if you desire to use 
> > > the new 2i query.  The 2i data will properly write/update on both 1.3 
> > > and 1.4 machines during the migration.
> > > 
> > > Does that answer your question?
> > > 
> > > 
> > > And yes, you might see available disk space increase during the 
> > > upgrade compactions if your dataset contains numerous delete 
> > 

Re: Upgrade from 1.3.1 to 1.4.2 => high IO

2013-12-11 Thread Simon Effenberg
Sorry I forgot the half of it..

seffenberg@kriak46-1:~$ free -m
             total       used       free     shared    buffers     cached
Mem:         23999      23759        239          0        184      16183
-/+ buffers/cache:       7391      16607
Swap:            0          0          0

We have 12 servers..
datadir on the compacted servers (1.4.2) ~ 765 GB

AAE is enabled.

I attached app.config and vm.args.

Cheers
Simon
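
Note, reading that free output: of the ~24 GB total, roughly 16 GB is page
cache and only about 7.4 GB is really used by processes, and no swap is
configured at all -- which would rule out the swapping suspicion quoted below
and leave the file-cache-thrashing theory:

  used - buffers - cached : 23759 - 184 - 16183 = 7392 MB (~ the 7391 shown)
  Swap total 0            : the node cannot be swapping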

On Wed, 11 Dec 2013 07:33:31 -0500
Matthew Von-Maszewski  wrote:

> Ok, I am now suspecting that your servers are either using swap space (which 
> is slow) or your leveldb file cache is thrashing (opening and closing 
> multiple files per request).
> 
> How many servers do you have and do you use Riak's active anti-entropy 
> feature?  I am going to plug all of this into a spreadsheet. 
> 
> Matthew Von-Maszewski
> 
> 
> On Dec 11, 2013, at 7:09, Simon Effenberg  wrote:
> 
> > Hi Matthew
> > 
> > Memory: 23999 MB
> > 
> > ring_creation_size, 256
> > max_open_files, 100
> > 
> > riak-admin status:
> > 
> > memory_total : 276001360
> > memory_processes : 191506322
> > memory_processes_used : 191439568
> > memory_system : 84495038
> > memory_atom : 686993
> > memory_atom_used : 686560
> > memory_binary : 21965352
> > memory_code : 11332732
> > memory_ets : 10823528
> > 
> > Thanks for looking!
> > 
> > Cheers
> > Simon
> > 
> > 
> > 
> > On Wed, 11 Dec 2013 06:44:42 -0500
> > Matthew Von-Maszewski  wrote:
> > 
> >> I need to ask other developers as they arrive for the new day.  Does not 
> >> make sense to me.
> >> 
> >> How many nodes do you have?  How much RAM do you have in each node?  What 
> >> are your settings for max_open_files and cache_size in the app.config 
> >> file?  Maybe this is as simple as leveldb using too much RAM in 1.4.  The 
> >> memory accounting for max_open_files changed in 1.4.
> >> 
> >> Matthew Von-Maszewski
> >> 
> >> 
> >> On Dec 11, 2013, at 6:28, Simon Effenberg  
> >> wrote:
> >> 
> >>> Hi Matthew,
> >>> 
> >>> it took around 11hours for the first node to finish the compaction. The
> >>> second node is running already 12 hours and is still doing compaction.
> >>> 
> >>> Besides that, I wonder why the fsm_put time on the new 1.4.2 host is
> >>> much higher (after the compaction) than on an old 1.3.1 (both are
> >>> running in the cluster right now and another one is doing the
> >>> compaction/upgrade while it is in the cluster but not directly
> >>> accessible because it is out of the Loadbalancer):
> >>> 
> >>> 1.4.2:
> >>> 
> >>> node_put_fsm_time_mean : 2208050
> >>> node_put_fsm_time_median : 39231
> >>> node_put_fsm_time_95 : 17400382
> >>> node_put_fsm_time_99 : 50965752
> >>> node_put_fsm_time_100 : 59537762
> >>> node_put_fsm_active : 5
> >>> node_put_fsm_active_60s : 364
> >>> node_put_fsm_in_rate : 5
> >>> node_put_fsm_out_rate : 3
> >>> node_put_fsm_rejected : 0
> >>> node_put_fsm_rejected_60s : 0
> >>> node_put_fsm_rejected_total : 0
> >>> 
> >>> 
> >>> 1.3.1:
> >>> 
> >>> node_put_fsm_time_mean : 5036
> >>> node_put_fsm_time_median : 1614
> >>> node_put_fsm_time_95 : 8789
> >>> node_put_fsm_time_99 : 38258
> >>> node_put_fsm_time_100 : 384372
> >>> 
> >>> 
> >>> any clue why this could/should be?
> >>> 
> >>> Cheers
> >>> Simon
> >>> 
> >>> On Tue, 10 Dec 2013 17:21:07 +0100
> >>> Simon Effenberg  wrote:
> >>> 
>  Hi Matthew,
>  
>  thanks!.. that answers my questions!
>  
>  Cheers
>  Simon
>  
>  On Tue, 10 Dec 2013 11:08:32 -0500
>  Matthew Von-Maszewski  wrote:
>  
> > 2i is not my expertise, so I had to discuss your concerns with another 
> > Basho developer.  He says:
> > 
> > Between 1.3 and 1.4, the 2i query did change but not the 2i on-disk 
> > format.  You must wait for all nodes to update if you desire to use the 
> > new 2i query.  The 2i data will properly write/update on both 1.3 and 
> > 1.4 machines during the migration.
> > 
> > Does that answer your question?
> > 
> > 
> > And yes, you might see available disk space increase during the upgrade 
> > compactions if your dataset contains numerous delete "tombstones".  The 
> > Riak 2.0 code includes a new feature called "aggressive delete" for 
> > leveldb.  This feature is more proactive in pushing delete tombstones 
> > through the levels to free up disk space much more quickly (especially 
> > if you perform block deletes every now and then).
> > 
> > Matthew
> > 
> > 
> > On Dec 10, 2013, at 10:44 AM, Simon Effenberg 
> >  wrote:
> > 
> >> Hi Matthew,
> >> 
> >> see inline..
> >> 
> >> On Tue, 10 Dec 2013 10:38:03 -0500
> >> Matthew Von-Maszewski  wrote:
> >> 
> >>> The sad truth is that you are not the first to see this problem.  And 
> >>> yes, it has to do with your 950GB per node dataset.  And no, nothing 
> >>> to do but sit through it at this time.
> >>> 
> >>> While I did extensive

Re: Upgrade from 1.3.1 to 1.4.2 => high IO

2013-12-11 Thread Matthew Von-Maszewski
Ok, I am now suspecting that your servers are either using swap space (which is 
slow) or your leveldb file cache is thrashing (opening and closing multiple 
files per request).
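
A quick way to check the swap half of that suspicion from the node itself is via riak attach; a minimal sketch, assuming a Linux host with procfs:

  %% Run from riak attach (assumes Linux; /proc/meminfo must exist).
  %% A large gap between SwapTotal and SwapFree means the box is swapping.
  io:format("~s", [os:cmd("grep -i swap /proc/meminfo")]).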

How many servers do you have and do you use Riak's active anti-entropy feature? 
 I am going to plug all of this into a spreadsheet. 

Matthew Von-Maszewski

Re: Upgrade from 1.3.1 to 1.4.2 => high IO

2013-12-11 Thread Simon Effenberg
Hi Matthew

Memory: 23999 MB

ring_creation_size, 256
max_open_files, 100

riak-admin status:

memory_total : 276001360
memory_processes : 191506322
memory_processes_used : 191439568
memory_system : 84495038
memory_atom : 686993
memory_atom_used : 686560
memory_binary : 21965352
memory_code : 11332732
memory_ets : 10823528
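
One caveat when reading these figures: as far as I know, riak-admin status takes them from erlang:memory/0, so they only cover the Erlang VM; leveldb's block and file caches are allocated on the C++ side and do not show up here. For scale:

  %% memory_total above, as a fraction of the ~24 GB of physical RAM:
  276001360 / (1024*1024*1024).   %% ~= 0.26 GiB, i.e. roughly 1% of the box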

Thanks for looking!

Cheers
Simon

Re: Upgrade from 1.3.1 to 1.4.2 => high IO

2013-12-11 Thread Matthew Von-Maszewski
I need to ask the other developers as they arrive for the new day; this does
not make sense to me.

How many nodes do you have? How much RAM do you have in each node? What are
your settings for max_open_files and cache_size in the app.config file? Maybe
this is as simple as leveldb using too much RAM in 1.4. The memory accounting
for max_open_files changed in 1.4.
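
For reference, both settings live in the eleveldb section of app.config. A minimal sketch with illustrative values (not recommendations); the roughly 4 MB of file-cache memory per open file assumed in the comment is my reading of the 1.4 accounting and worth checking against the release notes:

  %% app.config fragment (illustrative values only)
  {eleveldb, [
      {data_root, "/var/lib/riak/leveldb"},
      %% 1.4 accounting (assumed ~4 MB of file cache per open file, per vnode):
      %% 100 files * 4 MB * ~22 vnodes (256 ring / 12 nodes) ~ 8.8 GB of the
      %% 24 GB of RAM, before any block cache or write buffers.
      {max_open_files, 100},
      %% per-vnode block cache in bytes (8 MB shown as an assumed default)
      {cache_size, 8388608}
  ]}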

Matthew Von-Maszewski

Re: Upgrade from 1.3.1 to 1.4.2 => high IO

2013-12-11 Thread Simon Effenberg
Hi Matthew,

it took around 11 hours for the first node to finish the compaction. The
second node has already been running for 12 hours and is still compacting.

Besides that, I wonder why the put FSM times on the new 1.4.2 host are much
higher (after the compaction) than on an old 1.3.1 host. Both are serving the
cluster right now, and another node is doing the compaction/upgrade while in
the cluster but not directly reachable because it has been taken out of the
load balancer:

1.4.2:

node_put_fsm_time_mean : 2208050
node_put_fsm_time_median : 39231
node_put_fsm_time_95 : 17400382
node_put_fsm_time_99 : 50965752
node_put_fsm_time_100 : 59537762
node_put_fsm_active : 5
node_put_fsm_active_60s : 364
node_put_fsm_in_rate : 5
node_put_fsm_out_rate : 3
node_put_fsm_rejected : 0
node_put_fsm_rejected_60s : 0
node_put_fsm_rejected_total : 0


1.3.1:

node_put_fsm_time_mean : 5036
node_put_fsm_time_median : 1614
node_put_fsm_time_95 : 8789
node_put_fsm_time_99 : 38258
node_put_fsm_time_100 : 384372
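
For scale: the *_fsm_time_* values from riak-admin status are reported in microseconds (my assumption is that this holds for both versions shown here), so the two means work out to roughly:

  2208050 / 1000000.   %% 1.4.2 mean put time: ~2.2 seconds
  5036 / 1000.         %% 1.3.1 mean put time: ~5 milliseconds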


Any clue why this could (or should) be the case?

Cheers
Simon

Re: Upgrade from 1.3.1 to 1.4.2 => high IO

2013-12-10 Thread Simon Effenberg
Hi Matthew,

Thanks! That answers my questions!

Cheers
Simon

Re: Upgrade from 1.3.1 to 1.4.2 => high IO

2013-12-10 Thread Matthew Von-Maszewski
2i is not my expertise, so I had to discuss your concerns with another Basho
developer. He says:

Between 1.3 and 1.4, the 2i query did change but not the 2i on-disk format.  
You must wait for all nodes to update if you desire to use the new 2i query.  
The 2i data will properly write/update on both 1.3 and 1.4 machines during the 
migration.

Does that answer your question?


And yes, you might see available disk space increase during the upgrade 
compactions if your dataset contains numerous delete "tombstones".  The Riak 
2.0 code includes a new feature called "aggressive delete" for leveldb.  This 
feature is more proactive in pushing delete tombstones through the levels to 
free up disk space much more quickly (especially if you perform block deletes 
every now and then).
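
For whenever the cluster reaches 2.0: the aggressive-delete behaviour is tunable in the eleveldb section of app.config. A sketch, with the option name and default recalled from memory and worth verifying against the 2.0 documentation before relying on it:

  %% app.config fragment (Riak 2.0+; option name and default are assumptions)
  {eleveldb, [
      %% number of tombstones in a single .sst file before leveldb schedules
      %% that file for compaction ahead of its normal turn
      {delete_threshold, 1000}
  ]}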

Matthew


Re: Upgrade from 1.3.1 to 1.4.2 => high IO

2013-12-10 Thread Simon Effenberg
Hi Matthew,

see inline..

On Tue, 10 Dec 2013 10:38:03 -0500
Matthew Von-Maszewski  wrote:

> The sad truth is that you are not the first to see this problem.  And yes, it 
> has to do with your 950GB per node dataset.  And no, nothing to do but sit 
> through it at this time.
> 
> While I did extensive testing around upgrade times before shipping 1.4, 
> apparently there are data configurations I did not anticipate.  You are 
> likely seeing a cascade where a shift of one file from level-1 to level-2 is 
> causing a shift of another file from level-2 to level-3, which causes a 
> level-3 file to shift to level-4, etc … then the next file shifts from 
> level-1.
> 
> The bright side of this pain is that you will end up with better write 
> throughput once all the compaction ends.

I have to deal with that, but my problem now is: if I'm doing this node by
node, it looks like 2i searches aren't possible while 1.3 and 1.4 nodes
coexist in the cluster. Is there anything that would lead me into a 2i repair
marathon, or can I simply wait a few hours for each node until all merges are
done before I upgrade the next one? (2i searches can fail for some time; the
app isn't having problems with that, but are new inserts with 2i indices
processed successfully, or do I have to run a 2i repair?)

/s

One other good thing: saving disk space is a nice bonus ;)


-- 
Simon Effenberg | Site Ops Engineer | mobile.international GmbH
Fon: + 49-(0)30-8109 - 7173
Fax: + 49-(0)30-8109 - 7131

Mail: seffenb...@team.mobile.de
Web:www.mobile.de

Marktplatz 1 | 14532 Europarc Dreilinden | Germany


Geschäftsführer: Malte Krüger
HRB Nr.: 18517 P, Amtsgericht Potsdam
Sitz der Gesellschaft: Kleinmachnow 



Re: Upgrade from 1.3.1 to 1.4.2 => high IO

2013-12-10 Thread Matthew Von-Maszewski
The sad truth is that you are not the first to see this problem.  And yes, it
has to do with your 950 GB per node dataset.  And no, there is nothing to do
but sit through it at this time.

While I did extensive testing around upgrade times before shipping 1.4, 
apparently there are data configurations I did not anticipate.  You are likely 
seeing a cascade where a shift of one file from level-1 to level-2 is causing a 
shift of another file from level-2 to level-3, which causes a level-3 file to 
shift to level-4, etc … then the next file shifts from level-1.
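
To make the mechanism concrete, here is a toy model of such a cascade (an illustration only, not Riak's leveldb code): each level holds a bounded number of files, and pushing one level's overflow into the next can overflow that level in turn.

  %% cascade_demo.erl -- toy illustration of a compaction cascade.
  -module(cascade_demo).
  -export([push/2]).

  %% Levels is a list of {Level, FileCount, MaxFiles}. NewFiles arrive at the
  %% first level; anything beyond a level's capacity is pushed to the next.
  %% Overflow past the last level is simply dropped in this toy.
  push([], _Overflow) ->
      [];
  push([{Level, Count, Max} | Rest], NewFiles) ->
      Total = Count + NewFiles,
      case Total > Max of
          true  -> [{Level, Max, Max} | push(Rest, Total - Max)];
          false -> [{Level, Total, Max} | Rest]
      end.

  %% Example: cascade_demo:push([{1,7,8}, {2,30,30}, {3,119,120}], 2).
  %% => [{1,8,8},{2,30,30},{3,120,120}]  -- one small write ripples down.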

The bright side of this pain is that you will end up with better write 
throughput once all the compaction ends.

Riak 2.0's leveldb has code to prevent/reduce compaction cascades, but that is 
not going to help you today.

Matthew


Upgrade from 1.3.1 to 1.4.2 => high IO

2013-12-10 Thread Simon Effenberg
Hi @list,

I'm trying to upgrade our Riak cluster from 1.3.1 to 1.4.2. After upgrading
the first node (out of 12), that node seems to be doing many merges: the sst_*
directories change in size rapidly and the node sits at 100% disk utilization
the whole time.
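
One low-tech way to watch that activity while it runs is from riak attach; a sketch that assumes the default leveldb data_root of /var/lib/riak/leveldb (adjust the path for your platform_data_dir):

  %% Largest leveldb partitions by on-disk size; re-run to watch them move
  %% as the compactions proceed (path is an assumption, see above).
  io:format("~s", [os:cmd("du -sh /var/lib/riak/leveldb/* | sort -h | tail -5")]).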

I know that there is a note about this:

"The first execution of 1.4.0 leveldb using a 1.3.x or 1.2.x dataset
will initiate an automatic conversion that could pause the startup of
each node by 3 to 7 minutes. The leveldb data in "level #1" is being
adjusted such that "level #1" can operate as an overlapped data level
instead of as a sorted data level. The conversion is simply the
reduction of the number of files in "level #1" to being less than eight
via normal compaction of data from "level #1" into "level #2". This is
a one time conversion."

but what I'm seeing looks much more invasive than what is described there, or
it may have nothing to do with the merges I am (probably) seeing.

Is this "normal" behavior or could I do anything about it?

At the moment I'm stuck in the middle of the upgrade procedure, because this
high IO load would probably lead to high response times.

Also we have a lot of data (per node ~950 GB).

Cheers
Simon

___
riak-users mailing list
riak-users@lists.basho.com
http://lists.basho.com/mailman/listinfo/riak-users_lists.basho.com