Hi John and Engel,

Here's a link to a Dropbox folder with a set of file pairs (the source file and the corrupted version that has taken a round trip via Riak):
https://www.dropbox.com/sh/snfbiqm0jys9u2a/AZPF7_RcBT
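In case it helps when looking at the pairs, here is a minimal sketch of a byte-for-byte comparison of one source file against its round-tripped copy. The function names and the file names in the usage note are placeholders I made up for illustration; they are not taken from the gist or from the names in the folder.

```python
def compare_bytes(original, returned):
    """Return (offsets of differing bytes, error rate in ppm).

    Assumes both inputs have the same length, since the corrupted
    files keep the length of the source.
    """
    if len(original) != len(returned):
        raise ValueError("lengths differ: %d vs %d"
                         % (len(original), len(returned)))
    # bytearray yields integers on both Python 2 and Python 3
    a = bytearray(original)
    b = bytearray(returned)
    offsets = [i for i in range(len(a)) if a[i] != b[i]]
    # differing bytes per million bytes compared
    ppm = 1e6 * len(offsets) / len(a) if len(a) else 0.0
    return offsets, ppm


def compare_files(source_path, roundtrip_path):
    """Read both files in binary mode and compare them."""
    # "rb" matters on Windows: text mode would translate line endings
    with open(source_path, "rb") as f:
        original = f.read()
    with open(roundtrip_path, "rb") as f:
        returned = f.read()
    return compare_bytes(original, returned)
```

Usage would be something like `offsets, ppm = compare_files("source.bin", "roundtrip.bin")` (hypothetical file names), after which `offsets` lists the positions of the altered bytes.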
John, to answer your questions:

*Windows-->Riak-->Ubuntu VM*
When uploading files from Windows to Riak and then downloading them to the Ubuntu VM, inconsistencies also appear, but always for the same subset of files (if I repeatedly download the same set of files from Riak and verify against the source files). To me this indicates that these files were corrupted on the upload from Windows to Riak.

*Ubuntu VM-->Riak-->Windows*
When uploading the source files from the Ubuntu VM (after having verified that they can be downloaded into the Ubuntu VM again without any problems) and then downloading them to Windows, inconsistencies appear. However, these inconsistencies vary from file to file between download rounds. I.e., by downloading a file a few times I eventually get a non-corrupted version. To me this indicates that the files were correctly uploaded to Riak from the Ubuntu VM, but are corrupted somewhere in the download flow on the Windows machine.

Ergo: Data appears to be corrupted both when going upstream and when going downstream, somewhere inside the stack used by the Riak Python client on Windows 7 64 bit.

One more observation: I've done some byte-for-byte comparisons when uploading/downloading, and the error rate appears to be on the order of 0.4 ppm.

Finkle

2013/11/9 John Daily <[email protected]>

> (And the inverse would also be interesting to know.)
>
> -John
>
> On Nov 8, 2013, at 6:41 PM, John Daily <[email protected]> wrote:
>
> If you upload the files from Windows, and download them to the Ubuntu VM,
> do inconsistencies ever appear?
>
> -John
>
> On Nov 8, 2013, at 4:58 PM, Engel Sanchez <[email protected]> wrote:
>
> Hello there,
>
> This looks puzzling. Just from looking at the code we haven't found
> anything suspicious. Would you mind posting a pair of those files that
> failed to match somewhere so we can look at the differences?
>
> Thanks for reporting this.
>
> Engel@Basho
>
>
> On Fri, Nov 8, 2013 at 2:41 PM, finkle mcgraw <[email protected]> wrote:
>
>> Fellow Riak users,
>>
>> I've noticed that when I upload binary files with sizes of >~1 MB to Riak
>> from my Windows 7 (64 bit) machine, then read the same data back again,
>> often it has a few corrupted bytes, while maintaining the correct total data
>> length.
>>
>> Here's the Python script I use to provoke and detect the situation:
>> https://gist.github.com/anonymous/7376084
>>
>> Notice that I included the typical output when running the script at the
>> bottom of the gist. As you can see, for that particular run, half of the
>> dummy-data files were corrupted. The returned data from Riak has the exact
>> same length as the source, but not the exact same content. I've only done
>> a brief analysis of how the corruptions appear within the files that are
>> detected as corrupted, but it looks like it's typically between 1 and 5
>> bytes that are altered, evenly distributed within the file.
>>
>> I get no exceptions or warnings from the Riak Python client. Everything
>> appears to be in order.
>>
>> So far I've tested this on two different Windows machines against two
>> different Riak clusters (a five node Amazon cluster with a loadbalancer in
>> front, and a local devcluster running inside an Ubuntu 12.04 Virtual
>> Machine). The problems appear in all four possible combinations.
>>
>> However, if I run the script from within an Ubuntu VM, on one of the said
>> Windows machines, against either of the two Riak clusters, the problems do
>> NOT appear.
>>
>> Another observation: If I generate 50 sample files, upload them, then
>> repeatedly try to download them over and over again, the script will detect
>> corruptions in different files on each repetition of downloading. E.g., on
>> round one it might say that files 1, 5 and 19 were corrupted, but on round
>> two it might say 3, 8 and 19.
>>
>> Here is the riak stats-view from the Amazon cluster we're running (that I
>> tested the script against):
>> https://gist.github.com/anonymous/7376379
>>
>> But as I said, the corruptions also appear when working locally between a
>> Win7 machine and a cluster running on a virtual Ubuntu 12.04 machine.
>>
>> Here are my local package versions, running on Python 2.7.5 64 bit on
>> Windows 7 64 bit:
>> protobuf==2.4.1
>> riak==2.0.1
>> riak-pb==1.4.1.1
>>
>> Any ideas? This seems relatively serious, unless it's some kind of brutal
>> oversight on my part.
>>
>> Finkle
>>
>> _______________________________________________
>> riak-users mailing list
>> [email protected]
>> http://lists.basho.com/mailman/listinfo/riak-users_lists.basho.com
>>
> _______________________________________________
> riak-users mailing list
> [email protected]
> http://lists.basho.com/mailman/listinfo/riak-users_lists.basho.com
