(And the inverse would also be interesting to know.)

-John

On Nov 8, 2013, at 6:41 PM, John Daily <[email protected]> wrote:

> If you upload the files from Windows, and download them to the Ubuntu VM, do 
> inconsistencies ever appear?
> 
> -John
> 
> On Nov 8, 2013, at 4:58 PM, Engel Sanchez <[email protected]> wrote:
> 
>> Hello there,
>> 
>> This looks puzzling. Just from looking at the code we haven't found anything 
>> suspicious. Would you mind posting a pair of those files that failed to 
>> match somewhere so we can look at the differences?
>> 
>> Thanks for reporting this.
>> 
>> Engel@Basho
>> 
>> 
>> On Fri, Nov 8, 2013 at 2:41 PM, finkle mcgraw <[email protected]> wrote:
>> Fellow Riak users,
>> 
>> I've noticed that when I upload binary files larger than about 1 MB to Riak 
>> from my Windows 7 (64 bit) machine, then read the same data back again, 
>> it often has a few corrupted bytes, while maintaining the correct total data 
>> length.
>> 
>> Here's the Python script I use to provoke and detect the situation:
>> https://gist.github.com/anonymous/7376084
>> 
>> Notice that I included the typical output when running the script at the 
>> bottom of the gist. As you can see, for that particular run, half of the 
>> dummy-data files were corrupted. The returned data from Riak has the exact 
>> same length as the source, but not the exact same content. I've only done a 
>> brief analysis of how the corruptions appear within the files that are 
>> detected as corrupted, but it looks like typically between 1 and 5 bytes 
>> are altered, evenly distributed within the file.
>> 
>> I get no exceptions or warnings from the Riak Python client. Everything 
>> appears to be in order.
>> 
>> So far I've tested this on two different Windows machines against two 
>> different Riak clusters (a five node Amazon cluster with a loadbalancer in 
>> front, and a local devcluster running inside an Ubuntu 12.04 Virtual 
>> Machine). The problems appear in all four possible combinations.
>> 
>> However, if I run the script from within an Ubuntu VM, on one of the said 
>> Windows machines, against either of the two Riak clusters, the problems do NOT 
>> appear.
>> 
>> Another observation: If I generate 50 sample files, upload them, then 
>> repeatedly try to download them over and over again, the script will detect 
>> corruptions in different files on each repetition of downloading. E.g., on 
>> round one it might say that files 1, 5, and 19 were corrupted, but on round 
>> two it might say 3, 8, and 19.
>> 
>> Here is the riak stats-view from the Amazon cluster we're running (that I 
>> tested the script against):
>> https://gist.github.com/anonymous/7376379
>> 
>> But as I said, the corruptions appear also when working locally between a 
>> Win7 machine and a cluster running on a virtual Ubuntu 12.04 machine.
>> 
>> Here are my local package versions, running on Python 2.7.5 64 bit on 
>> Windows 7 64 bit:
>> protobuf==2.4.1
>> riak==2.0.1
>> riak-pb==1.4.1.1
>> 
>> Any ideas? This seems relatively serious, unless it's some kind of brutal 
>> oversight on my part.
>> 
>> Finkle
>> 
>> 
>> 
>> _______________________________________________
>> riak-users mailing list
>> [email protected]
>> http://lists.basho.com/mailman/listinfo/riak-users_lists.basho.com
>> 
> 
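
For anyone comparing an uploaded file with the copy read back, here is a minimal sketch of how the differing bytes could be located. It is not taken from the gist and does not touch Riak at all; `diff_offsets` and `describe_diff` are hypothetical helper names, and the sample data at the bottom is made up for illustration. It assumes both buffers have the same length, as described in the report.

```python
def diff_offsets(original, returned):
    """Return the offsets at which two equal-length byte strings differ."""
    a = bytearray(original)
    b = bytearray(returned)
    if len(a) != len(b):
        raise ValueError("lengths differ: %d vs %d" % (len(a), len(b)))
    return [i for i in range(len(a)) if a[i] != b[i]]


def describe_diff(original, returned):
    """Return (offset, original_byte, returned_byte) for each mismatch."""
    a = bytearray(original)
    b = bytearray(returned)
    return [(i, a[i], b[i]) for i in diff_offsets(a, b)]


# Illustrative data only: flip two bytes in a 1024-byte buffer.
sample = bytes(bytearray(range(256)) * 4)
damaged = bytearray(sample)
damaged[10] ^= 0xFF
damaged[700] ^= 0x01

offsets = diff_offsets(sample, bytes(damaged))
details = describe_diff(sample, bytes(damaged))
```

Printing the offsets together with the before and after byte values for a few failing files might show whether the altered bytes follow a pattern, for example a particular value or position.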
