All,

Thanks to Joe Olson for his detective work and write-up.  Further thanks to 
Russell Brown and Dan Brown for debug work provided yesterday and today.

I have updated the Riak ticket with the cause and fix discussion:

   https://github.com/basho/riak_kv/issues/1356

I will discuss this issue internally in relation to the upcoming Riak 2.0.7 
and 2.2.0 releases, and will create a proper branch and pull request tomorrow.

This is a data loss scenario in tiered storage: data is lost if Riak starts 
and then stops before the first recovery log has been translated into an .sst 
table file.  Once the first .log file becomes an .sst file, all subsequent 
recovery .log files go to the proper location and will be found on the next 
stop/start cycle.  Stated another way, the first 30 MB to 60 MB of data 
written to each vnode after a restart is subject to data loss if Riak is 
restarted again quickly.
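
For anyone who wants a rough check of whether a vnode is still inside that 
window, here is a minimal sketch, not an official tool.  It assumes each data 
root holds one subdirectory per vnode, that recovery logs are named like 
000123.log, and that table files end in .sst somewhere below the vnode 
directory.  The function names and the default path are hypothetical; with 
tiered storage you would pass both the fast and the slow location.  A vnode 
that has recovery logs but no .sst file yet is only a candidate: this is a 
heuristic and does not prove that the vnode will lose data.

```python
import os
import re

# Recovery logs are named like 000123.log. The upper-case LOG and LOG.old
# files are informational and are deliberately not matched here.
RECOVERY_LOG = re.compile(r"^\d+\.log$")


def count_files(vnode_dirs):
    """Count recovery .log files and .sst files under the given directories."""
    logs = ssts = 0
    for root in vnode_dirs:
        for _dirpath, _dirnames, filenames in os.walk(root):
            for name in filenames:
                if RECOVERY_LOG.match(name):
                    logs += 1
                elif name.endswith(".sst"):
                    ssts += 1
    return logs, ssts


def at_risk_vnodes(data_roots):
    """Return names of vnodes that have recovery logs but no .sst file yet.

    data_roots: every directory that holds per-vnode subdirectories.
    Assumption: a vnode uses the same directory name under each root.
    """
    vnodes = {}
    for root in data_roots:
        if not os.path.isdir(root):
            continue
        for entry in sorted(os.listdir(root)):
            path = os.path.join(root, entry)
            if os.path.isdir(path):
                vnodes.setdefault(entry, []).append(path)
    risky = []
    for name, dirs in sorted(vnodes.items()):
        logs, ssts = count_files(dirs)
        if logs > 0 and ssts == 0:
            risky.append(name)
    return risky


if __name__ == "__main__":
    # Hypothetical default path; adjust to your own data directories.
    for vnode in at_risk_vnodes(["/var/lib/riak/data/leveldb"]):
        print(vnode)
```

Run it while Riak is stopped, or treat the output as a snapshot, since files 
can change underneath the scan on a live node.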

Matthew

> On Feb 26, 2016, at 11:19 AM, Joe Olson <technol...@nododos.com> wrote:
> 
> 
> 
> Negative.
> 
> I have ring size set to 8, leveldb split across two sets of drives ("fast" 
> and "slow", but meaningless on the test Vagrant box...just two separate 
> directories). I checked all of the ../leveldb/* directories. All LOG files 
> are identical, and no errors in any of them.
> 
> I will try to build another Vagrant machine with the default riak.conf and 
> see if I can get this to repeat. It is almost as if the KV pairs are not 
> persisting to disk at all.
> 
> 
> From: "Matthew Von-Maszewski" <matth...@basho.com>
> To: "Joe Olson" <technol...@nododos.com>
> Cc: "riak-users" <riak-users@lists.basho.com>, "cmancini" <cmanc...@basho.com>
> Sent: Friday, February 26, 2016 10:12:15 AM
> Subject: Re: Ok, I am stumped. Losing data or riak stop
> 
> Joe,
> 
> Are there any error messages in the leveldb LOG and/or LOG.old files?  These 
> files are located within each vnode's directory, likely 
> /var/lib/riak/data/leveldb/*/LOG* on your machine.
> 
> The LOG files are not to be confused with 000xxx.log files.  The lower case 
> *.log files are the recovery files that should contain the keys you are 
> missing.  If they are not loading properly, the LOG files should have clues.
> 
> Matthew
> 
> On Feb 26, 2016, at 11:04 AM, Christopher Mancini <cmanc...@basho.com> wrote:
> 
> Hey Joe,
> 
> I will do my best to help, but I am not the most experienced with Riak 
> operations. Your best bet to get to a solution as fast as possible is to 
> include the full users group, which I have added to the recipients of this 
> message.
> 
> 1. Are the Riak data directories within Vagrant shared directories between 
> the host and guest? I have had issues with OS file system caching before when 
> working with web server files.
> 
> 2. What version of Ubuntu are you using?
> 
> 3. How did you install Riak on Ubuntu?
> 
> 4. Have you tried restoring the original distribution riak.conf file and seen 
> if the issue persists? This would help you determine if the issue is your 
> config or something with your environment.
> 
> Chris
> 
> On Fri, Feb 26, 2016 at 10:55 AM Joe Olson <technol...@nododos.com> wrote:
> 
> Chris - 
> 
> I cannot figure out what is going on. Here is my test case. Configuration 
> file attached. I am running a single node of Riak on a Vagrant box with a 
> LevelDB backend. I don't even have to bring the box down; merely stopping 
> and restarting Riak ('riak stop' and 'riak start', or 'riak restart') causes 
> all the keys to be lost. Again, I do not have to bring the machine up or 
> down to get this error.
> 
> I've also deleted the ring info in /var/lib/riak/ring, and deleted all the 
> leveldb files. In this case, the bucket type is just n_val = 1, and the ring 
> size is the minimum of 8. 
> 
> Is it possible Riak is not flushing RAM to disk after a write? Do the keys 
> reside only in RAM?
> 
> My test procedure:
> 
> ====On a remote machine=====
> 
> riak01@ubuntu:/etc$ curl -i http://<ip>:8098/types/n1/buckets/test/keys?keys=true
> HTTP/1.1 200 OK
> Vary: Accept-Encoding
> Server: MochiWeb/1.1 WebMachine/1.10.8 (that head fake, tho)
> Date: Fri, 26 Feb 2016 13:14:59 GMT
> Content-Type: application/json
> Content-Length: 17
> 
> {"keys":["test"]}
> 
> riak01@ubuntu:/etc$
> 
> 
> 
> ====On the single Riak node itself====
> 
> [vagrant@i-2016022519-9bb5c84f riak]$ sudo riak stop
> ok
> [vagrant@i-2016022519-9bb5c84f riak]$ sudo riak start
> [vagrant@i-2016022519-9bb5c84f riak]$ sudo riak ping
> pong
> 
> 
> 
> ====Back to the remote machine====
> 
> riak01@ubuntu:/etc$ curl -i http://<ip>:8098/types/n1/buckets/test/keys?keys=true
> HTTP/1.1 200 OK
> Vary: Accept-Encoding
> Server: MochiWeb/1.1 WebMachine/1.10.8 (that head fake, tho)
> Date: Fri, 26 Feb 2016 13:16:34 GMT
> Content-Type: application/json
> Content-Length: 11
> 
> {"keys":[]}
> 
> riak01@ubuntu:/etc$
> 
> 
> 
> -- 
> Sincerely,
> 
> Christopher Mancini
> -----------------------------
> 
> employee = {
>     purpose: solve problems with code,
>     phone:    7164625591,
>     email:     cmanc...@basho.com,
>     github:    http://www.github.com/christophermancini
> }
> _______________________________________________
> riak-users mailing list
> riak-users@lists.basho.com
> http://lists.basho.com/mailman/listinfo/riak-users_lists.basho.com
> 
> 

