Fellow Riak users,

I've noticed that when I upload binary files larger than roughly 1 MB to Riak from my Windows 7 (64-bit) machine and then read the same data back, it often contains a few corrupted bytes, while maintaining the correct total data length.
Here's the Python script I use to provoke and detect the situation:

https://gist.github.com/anonymous/7376084

Notice that I included the typical output from running the script at the bottom of the gist. As you can see, for that particular run, half of the dummy-data files were corrupted. The data returned from Riak has the exact same length as the source, but not the exact same content.

I've only done a brief analysis of how the corruptions appear within the files that are detected as corrupted, but it looks like typically between 1 and 5 bytes are altered, evenly distributed within the file.

I get no exceptions or warnings from the Riak Python client. Everything appears to be in order.

So far I've tested this on two different Windows machines against two different Riak clusters (a five-node Amazon cluster with a load balancer in front, and a local devcluster running inside an Ubuntu 12.04 virtual machine). The problems appear in all four possible combinations. However, if I run the script from within an Ubuntu VM, on one of the said Windows machines, against either of the two Riak clusters, the problems do NOT appear.

Another observation: if I generate 50 sample files, upload them, then download them over and over again, the script detects corruptions in different files on each round of downloading. E.g., on round one it might say that files 1, 5 and 19 were corrupted, but on round two it might say 3, 8 and 19.

Here is the riak stats view from the Amazon cluster we're running (the one I tested the script against):

https://gist.github.com/anonymous/7376379

But as I said, the corruptions also appear when working locally between a Win7 machine and a cluster running on a virtual Ubuntu 12.04 machine.

Here are my local package versions, running on Python 2.7.5 64-bit on Windows 7 64-bit:

  protobuf==2.4.1
  riak==2.0.1
  riak-pb==1.4.1.1

Any ideas? This seems relatively serious, unless it's some kind of brutal oversight on my part.

Finkle
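
P.S. For anyone who wants to look at where the altered bytes sit in a returned file, here is a minimal sketch of a byte-by-byte comparison. It is not the script from the gist, and the helper name diff_offsets is made up for illustration. It only assumes you already hold the uploaded data and the downloaded data as byte strings of equal length; it does not talk to Riak at all.

```python
def diff_offsets(original, returned):
    """Return (offset, original_byte, returned_byte) for every mismatch.

    Hypothetical helper for illustration only. Both arguments are byte
    strings; they are wrapped in bytearray so that indexing yields
    integers on Python 2 as well as Python 3.
    """
    if len(original) != len(returned):
        raise ValueError("length mismatch: %d vs %d"
                         % (len(original), len(returned)))
    a = bytearray(original)
    b = bytearray(returned)
    return [(i, x, y) for i, (x, y) in enumerate(zip(a, b)) if x != y]


if __name__ == "__main__":
    # Made-up data standing in for an uploaded file and its download.
    source = b"\x00" * 16
    damaged = bytearray(source)
    damaged[3] = 0x0D
    damaged[10] = 0x0A
    for offset, want, got in diff_offsets(source, bytes(damaged)):
        print("offset %d: expected 0x%02x, got 0x%02x" % (offset, want, got))
```

Listing the offsets and the before/after byte values for a handful of corrupted files should make it easier to see whether the changes follow a pattern or are random.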
