Fellow Riak users,

I've noticed that when I upload binary files larger than ~1 MB to Riak
from my Windows 7 (64-bit) machine and then read the same data back, the
returned data often has a few corrupted bytes, even though the total data
length is correct.

Here's the Python script I use to provoke and detect the situation:
https://gist.github.com/anonymous/7376084

I've included the typical output of a run at the bottom of the gist. As
you can see, in that particular run half of the dummy-data files were
corrupted. The data returned from Riak has exactly the same length as the
source, but not the same content. I've only briefly analysed how the
corruptions appear within the affected files, but typically between 1 and
5 bytes are altered, spread evenly throughout the file.
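For anyone who wants to inspect the altered bytes themselves, here is a
minimal sketch of that kind of comparison. It is not the code from the
gist; `diff_offsets` is a hypothetical helper, and it assumes you already
have the original payload and the payload read back from Riak as bytes.

```python
def diff_offsets(expected, actual):
    """Return (offset, expected_byte, actual_byte) for every position where
    two equal-length buffers differ. bytearray is used so iteration yields
    ints on both Python 2.7 and Python 3."""
    if len(expected) != len(actual):
        raise ValueError("lengths differ: %d vs %d" % (len(expected), len(actual)))
    return [
        (i, e, a)
        for i, (e, a) in enumerate(zip(bytearray(expected), bytearray(actual)))
        if e != a
    ]


original = b"\x00" * 16
returned = bytearray(original)
returned[3] = 0xFF   # simulate two flipped bytes
returned[10] = 0x01
print(diff_offsets(original, bytes(returned)))
# -> [(3, 0, 255), (10, 0, 1)]
```

Listing the offsets per corrupted file would show whether the damaged
positions line up with anything regular (e.g. buffer or chunk boundaries).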

I get no exceptions or warnings from the Riak Python client. Everything
appears to be in order.

So far I've tested this on two different Windows machines against two
different Riak clusters (a five-node Amazon cluster with a load balancer in
front, and a local devcluster running inside an Ubuntu 12.04 virtual
machine). The problems appear in all four possible combinations.

However, if I run the script from within an Ubuntu VM on one of those
Windows machines, against either of the two Riak clusters, the problems do
NOT appear.

Another observation: if I generate 50 sample files, upload them, and then
download them repeatedly, the script detects corruption in different files
on each download round. E.g., in round one it might report that files 1, 5,
and 19 were corrupted, but in round two it might report 3, 8, and 19.
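A sketch of that repeated-download check, for anyone who wants to compare
results. Again this is not the gist's code: `make_samples`,
`corrupted_keys` and `run_rounds` are hypothetical helpers, and `fetch` is
any callable that returns the stored bytes for a key. With the Riak Python
client that might be something like `lambda k: bucket.get(k).encoded_data`,
but treat that as an assumption; the sketch itself only uses the standard
library.

```python
import hashlib
import os


def make_samples(count, size):
    """Generate `count` random payloads of `size` bytes, keyed 'sample-<n>'."""
    return dict(("sample-%d" % n, os.urandom(size)) for n in range(count))


def corrupted_keys(samples, fetch):
    """Return the sorted keys whose fetched payload differs from the original.
    SHA-256 digests are compared so only the digests need to be kept around."""
    digests = dict((k, hashlib.sha256(v).digest()) for k, v in samples.items())
    return sorted(
        key for key, digest in digests.items()
        if hashlib.sha256(fetch(key)).digest() != digest
    )


def run_rounds(samples, fetch, rounds):
    """Re-download every sample `rounds` times; one list of bad keys per round."""
    return [corrupted_keys(samples, fetch) for _ in range(rounds)]


# Usage against an in-memory stand-in for the store (hypothetical):
samples = make_samples(5, 1024)
store = dict(samples)
print(run_rounds(samples, store.__getitem__, 3))  # -> [[], [], []]
```

If the per-round lists differ for the same stored objects, as described
above, that points at the read path rather than at what was written.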

Here is the Riak stats view from the Amazon cluster we're running (the one
I tested the script against):
https://gist.github.com/anonymous/7376379

But as I said, the corruptions appear also when working locally between a
Win7 machine and a cluster running on a virtual Ubuntu 12.04 machine.

Here are my local package versions, running on Python 2.7.5 64 bit on
Windows 7 64 bit:
protobuf==2.4.1
riak==2.0.1
riak-pb==1.4.1.1

Any ideas? This seems relatively serious, unless it's some kind of brutal
oversight on my part.

Finkle
_______________________________________________
riak-users mailing list
[email protected]
http://lists.basho.com/mailman/listinfo/riak-users_lists.basho.com
