Hello there,

This looks puzzling. Just from looking at the code we haven't found
anything suspicious. Would you mind posting a pair of files that failed to
match (the original and the downloaded copy) somewhere so we can look at the
differences?
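To compare such a pair, a minimal sketch like the one below lists every byte offset where the original and the downloaded copy differ. It assumes both files have the same length, as reported. The names `diff_offsets`, `diff_files` and `bytediff.py` are hypothetical helpers, not part of the Riak Python client. It iterates over `bytearray` so the bytes come out as integers on both Python 2.7 and Python 3.

```python
def diff_offsets(original, returned):
    """Return (offset, original_byte, returned_byte) for each differing position.

    Assumes both buffers have the same length (the reported symptom);
    raises ValueError otherwise.
    """
    if len(original) != len(returned):
        raise ValueError("length mismatch: %d vs %d" % (len(original), len(returned)))
    # bytearray yields ints on both Python 2.7 and 3
    pairs = zip(bytearray(original), bytearray(returned))
    return [(i, a, b) for i, (a, b) in enumerate(pairs) if a != b]


def diff_files(path_a, path_b):
    """Read two files in binary mode and compare them byte by byte."""
    with open(path_a, "rb") as fa, open(path_b, "rb") as fb:
        return diff_offsets(fa.read(), fb.read())


if __name__ == "__main__":
    import sys
    # Hypothetical usage: python bytediff.py original.bin downloaded.bin
    if len(sys.argv) == 3:
        for offset, expected, got in diff_files(sys.argv[1], sys.argv[2]):
            print("offset %d: 0x%02x != 0x%02x" % (offset, expected, got))
```

Running it on a few corrupted pairs would show how many bytes changed and how far apart the offsets are. That might make the "evenly distributed" pattern easier to see.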

Thanks for reporting this.

Engel@Basho


On Fri, Nov 8, 2013 at 2:41 PM, finkle mcgraw <[email protected]> wrote:

> Fellow Riak users,
>
> I've noticed that when I upload binary files with sizes of >~1 MB to Riak
> from my Windows 7 (64 bit) machine, then read the same data back again,
> often it has a few corrupted bytes, while maintaining the correct total data
> length.
>
> Here's the Python script I use to provoke and detect the situation:
> https://gist.github.com/anonymous/7376084
>
> Notice that I included the typical output when running the script at the
> bottom of the gist. As you can see, for that particular run, half of the
> dummy-data files were corrupted. The returned data from Riak has the exact
> same length as the source, but not the exact same content. I've only done
> brief analysis of how the corruptions appear within the files that are
> detected as corrupted, but it looks like it's typically between 1 and 5
> bytes that are altered, evenly distributed within the file.
>
> I get no exceptions or warnings from the Riak Python client. Everything
> appears to be in order.
>
> So far I've tested this on two different Windows machines against two
> different Riak clusters (a five-node Amazon cluster with a load balancer in
> front, and a local dev cluster running inside an Ubuntu 12.04 virtual
> machine). The problems appear in all four possible combinations.
>
> However, if I run the script from within an Ubuntu VM running on one of
> those Windows machines, against either of the two Riak clusters, the
> problems do NOT appear.
>
> Another observation: if I generate 50 sample files, upload them, and then
> download them repeatedly, the script detects corruption in different files
> on each round of downloading. E.g., in round one it might report that files
> 1, 5, and 19 were corrupted, but in round two it might report 3, 8, and 19.
>
> Here is the Riak stats view from the Amazon cluster we're running (the one
> I tested the script against):
> https://gist.github.com/anonymous/7376379
>
> But as I said, the corruptions appear also when working locally between a
> Win7 machine and a cluster running on a virtual Ubuntu 12.04 machine.
>
> Here are my local package versions, running on Python 2.7.5 64 bit on
> Windows 7 64 bit:
> protobuf==2.4.1
> riak==2.0.1
> riak-pb==1.4.1.1
>
> Any ideas? This seems relatively serious, unless it's some kind of brutal
> oversight on my part.
>
> Finkle
>
>
>
> _______________________________________________
> riak-users mailing list
> [email protected]
> http://lists.basho.com/mailman/listinfo/riak-users_lists.basho.com
>
>