Sure, will do. I'm still a little confused about how Riak runs on one machine though. Is it running three server nodes or does it run only a single Riak node and store three copies of the data?
Thanks,
Ben

On Wed, Apr 10, 2013 at 10:14 AM, Matthew Von-Maszewski <[email protected]> wrote:

> Ben,
>
> The runtime recovery log ends in "XXXXXX.log" where XXXXXX is a six-digit
> number. Its size will vary between 30Mbytes and 60Mbytes per vnode
> directory.
>
> My recommendation is that you change the app.config file's
> default_bucket_props detailed below. Completely erase the data storage
> area. Then run again. Should make a big total size difference.
>
> See if this results in more reasonable / comparable sizes. This will give
> you a default compression comparison. There is also a way to tune the
> database such that it would compress even more, but at the cost of random
> read performance. We can try that next.
>
> Matthew
>
>
> On Apr 10, 2013, at 12:31 PM, Ben McCann <[email protected]> wrote:
>
> Thanks for the help. If I were saving three copies of the data in Riak
> that would certainly explain it! I installed Riak via the apt repository
> instructions<http://docs.basho.com/riak/1.1.4/tutorials/installation/Installing-on-Debian-and-Ubuntu/>.
> Not sure what that does by default. If it's saving three copies of the data
> then I assume it would also be running three server nodes or does it run
> only a single Riak node and store three copies of the data? I'm accessing
> Riak on port 8098, which seems to be the default if only one node is
> running.
>
> The first level of leveldb storage looks to be quite small to me. Is the
> runtime data recovery log likely to be very large? Can you tell me where
> that would be located or point me to some docs on it?
>
> I'm not super interested in squeezing out an extra percent or two of
> storage here or there, but just want to roughly have some idea if storing
> my data with snappy compression will yield me a 30% savings or 50% savings
> or 80% savings, etc.
> So any really big things like perhaps storing three
> copies of the data are interesting =) In production, my average document
> size is probably about 2k and I have tens of millions and soon to be
> hundreds of millions of them.
>
> Thanks!
> -Ben
>
>
> On Wed, Apr 10, 2013 at 6:22 AM, Matthew Von-Maszewski <[email protected]> wrote:
>
>> Greetings Ben,
>>
>> Also, leveldb stores data in "levels". The very first storage level and
>> the runtime data recovery log are not compressed.
>>
>> That said, I agree with Tom that you are most likely seeing Riak store 3
>> copies of your data versus only one for mongodb. It is possible to dumb
>> down Riak so that it is closer to mongodb:
>>
>> 1. in app.config, look for the riak_core options, add the following line:
>>
>>     {default_bucket_props, [{n_val,1}]},
>>
>> This will default the system to only storing one copy of your data.
>>
>>
>> 2. if you are using Riak 1.3, again in app.config, look for the riak_kv
>> options:
>>
>> change this
>>
>>     {anti_entropy, {on, []}},
>>
>> to
>>
>>     {anti_entropy, {off, []}},
>>
>> This will disable Riak's automatic detection and correction of data loss
>> / corruption. The feature requires an added 1 to 2% data on disk.
>>
>>
>> Matthew
>>
>>
>> On Apr 10, 2013, at 9:01 AM, Tom Santero <[email protected]> wrote:
>>
>> Hi Ben,
>>
>> First, allow me to welcome you to the list! Stick around, I think you'll like
>> it here. :)
>>
>> How many nodes of Riak are you running vs how many nodes of Mongo?
>>
>> How much more disk space did Riak take?
>>
>> Riak is designed to run as a cluster of several nodes, utilizing
>> replication to provide resiliency and high-availability during partial
>> failure. By default Riak stores three replicas of every object you persist.
>> If you are only running a single node of Riak for your testing purposes, I
>> suspect this may explain the significant divergence you're seeing when
>> compared to the disk space used vs a single mongo, as each replica in Riak
>> is being stored to the same disk.
>>
>> Also, Snappy is optimized for speed over disk utility, which will have a
>> negligible impact on total disk usage when compared to other compression
>> libraries such as zlib, etc. That said, for sufficiently large JSON files I
>> know that BSON's prefixes can add significant overhead to object sizes such
>> that BSON is actually heavier than the JSON it represents. What is the
>> average size of the documents you're seeking to store?
>>
>> Could you tell us a bit more about what you're trying to achieve with
>> both Riak and Mongo, respectively?
>>
>> Tom
>>
>> On Wed, Apr 10, 2013 at 12:39 AM, Ben McCann <[email protected]> wrote:
>>
>>> Hi,
>>>
>>> I'm currently storing data in MongoDB and would like to evaluate Riak as
>>> an alternative. Riak is appealing to me because LevelDB uses Snappy, so I
>>> would expect it to take less disk space to store my data set than MongoDB
>>> which does not use compression. However, when I benchmarked it by inserting
>>> a few hundred thousand JSON records into each datastore, Riak in fact took
>>> far more disk space. I'm wondering if there's something I might be missing
>>> here as a newcomer to Riak. E.g. I checked the disk space used by running
>>> "du -ch /var/lib/riak/leveldb". Is this perhaps not a good way to check
>>> disk space usage because perhaps Riak/LevelDB preallocates files? (I know
>>> MongoDB does this and has a built-in db.collection.stats command to provide
>>> true disk usage information). Are there any other reasons why Riak might be
>>> taking more space or anything I could have screwed up?
>>>
>>> Thanks,
>>> Ben
>>>
>>> --
>>> about.me/benmccann
>>> _______________________________________________
>>> riak-users mailing list
>>> [email protected]
>>> http://lists.basho.com/mailman/listinfo/riak-users_lists.basho.com
>>>
>
> --
> about.me/benmccann

--
about.me/benmccann
_______________________________________________
riak-users mailing list
[email protected]
http://lists.basho.com/mailman/listinfo/riak-users_lists.basho.com
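
For a rough sense of how much the replica count discussed in this thread moves the total, the arithmetic can be sketched as below. This is only a back-of-the-envelope estimate, not how Riak or leveldb actually account for space: the function name, the compression_ratio parameter and the example value of 0.5 are hypothetical, "2k" is read as roughly 2000 bytes, and the sketch ignores the uncompressed first storage level and the per-vnode recovery logs. The n_val default of 3 and the 1 to 2% anti-entropy overhead are the figures quoted in the thread.

```python
def estimate_disk_bytes(doc_count, avg_doc_bytes, n_val=3,
                        compression_ratio=1.0, aae_overhead=0.0):
    """Rough on-disk estimate in bytes for a replicated store.

    doc_count         -- number of stored documents
    avg_doc_bytes     -- average document size in bytes
    n_val             -- replicas kept per object (the thread says Riak defaults to 3)
    compression_ratio -- compressed size / raw size; hypothetical, measure your own data
    aae_overhead      -- extra fraction for anti-entropy data (the thread cites 0.01 to 0.02)
    """
    if doc_count < 0 or avg_doc_bytes < 0:
        raise ValueError("doc_count and avg_doc_bytes must be non-negative")
    if n_val < 1:
        raise ValueError("n_val must be at least 1")
    if compression_ratio <= 0:
        raise ValueError("compression_ratio must be positive")
    if aae_overhead < 0:
        raise ValueError("aae_overhead must be non-negative")

    raw_bytes = doc_count * avg_doc_bytes
    # Every replica is compressed and stored separately, so the cost scales with n_val.
    return raw_bytes * n_val * compression_ratio * (1.0 + aae_overhead)


if __name__ == "__main__":
    # Assumed workload: 100 million documents of about 2000 bytes each.
    for n in (3, 1):
        total = estimate_disk_bytes(100_000_000, 2_000, n_val=n,
                                    compression_ratio=0.5, aae_overhead=0.02)
        print(f"n_val={n}: about {total / 1e9:.0f} GB")
```

On a single test node every replica lands on the same disk, so under these assumptions the n_val=3 figure is three times the n_val=1 figure regardless of the compression ratio chosen.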
