Hello there,

This looks puzzling. Just from looking at the code we haven't found anything suspicious. Would you mind posting a pair of those files that failed to match somewhere so we can look at the differences?
Thanks for reporting this.

Engel@Basho

On Fri, Nov 8, 2013 at 2:41 PM, finkle mcgraw <[email protected]> wrote:
> Fellow Riak users,
>
> I've noticed that when I upload binary files with sizes of >~1 MB to Riak
> from my Windows 7 (64 bit) machine, then read the same data back again,
> often it has a few corrupted bytes, while maintaining the correct total data
> length.
>
> Here's the Python script I use to provoke and detect the situation:
> https://gist.github.com/anonymous/7376084
>
> Notice that I included the typical output when running the script at the
> bottom of the gist. As you can see, for that particular run, half of the
> dummy-data files were corrupted. The returned data from Riak has the exact
> same length as the source, but not the exact same content. I've only done
> brief analysis of how the corruptions appear within the files that are
> detected as corrupted, but it looks like it's typically between 1 and 5
> bytes that are altered, evenly distributed within the file.
>
> I get no exceptions or warnings from the Riak Python client. Everything
> appears to be in order.
>
> So far I've tested this on two different Windows machines against two
> different Riak clusters (a five node Amazon cluster with a loadbalancer in
> front, and a local devcluster running inside an Ubuntu 12.04 Virtual
> Machine). The problems appear in all four possible combinations.
>
> However, if I run the script from within an Ubuntu VM, on one of the said
> Windows machines, against either of the two Riak clusters, the problems do
> NOT appear.
>
> Another observation: If I generate 50 sample files, upload them, then
> repeatedly try to download them over and over again, the script will detect
> corruptions in different files on each repetition of downloading. E.g., on
> round one it might say that files 1, 5, and 19 were corrupted, but on round
> two it might say 3, 8 and 19.
>
> Here is the riak stats-view from the Amazon cluster we're running (that I
> tested the script against):
> https://gist.github.com/anonymous/7376379
>
> But as I said, the corruptions appear also when working locally between a
> Win7 machine and a cluster running on a virtual Ubuntu 12.04 machine.
>
> Here are my local package versions, running on Python 2.7.5 64 bit on
> Windows 7 64 bit:
> protobuf==2.4.1
> riak==2.0.1
> riak-pb==1.4.1.1
>
> Any ideas? This seems relatively serious, unless it's some kind of brutal
> oversight on my part.
>
> Finkle
>
> _______________________________________________
> riak-users mailing list
> [email protected]
> http://lists.basho.com/mailman/listinfo/riak-users_lists.basho.com
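
For anyone wanting to look at the differences between a source file and the data read back, here is a minimal sketch of a byte-level comparison. It does not talk to Riak at all and is not taken from the gist; the function names `diff_offsets` and `summarize` are hypothetical, and it assumes both byte strings are already in memory and have the same length, as described in the report.

```python
import hashlib


def diff_offsets(original, returned):
    """Return (offset, original_byte, returned_byte) for every mismatching byte."""
    if len(original) != len(returned):
        raise ValueError("lengths differ: %d vs %d" % (len(original), len(returned)))
    # bytearray yields integers when iterated on both Python 2 and Python 3.
    a = bytearray(original)
    b = bytearray(returned)
    return [(i, x, y) for i, (x, y) in enumerate(zip(a, b)) if x != y]


def summarize(original, returned):
    """Collect checksums plus the number and positions of altered bytes."""
    diffs = diff_offsets(original, returned)
    return {
        "sha1_original": hashlib.sha1(original).hexdigest(),
        "sha1_returned": hashlib.sha1(returned).hexdigest(),
        "mismatches": len(diffs),
        "offsets": [offset for offset, _, _ in diffs],
    }


if __name__ == "__main__":
    # Simulated data only: flip two bytes in a 4096-byte buffer.
    source = bytes(bytearray(range(256)) * 16)
    damaged = bytearray(source)
    damaged[10] ^= 0xFF
    damaged[3000] ^= 0x01
    print(summarize(source, bytes(damaged)))
```

Running something like this over each failing pair would show whether the altered offsets or byte values follow a pattern, which may help narrow down where the corruption is introduced.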
