On 02/21/2011 01:39 PM, Kon Wilms wrote:
On Mon, Feb 21, 2011 at 9:45 AM, Steve Wilson<ste...@purdue.edu>  wrote:
We had trouble with reliability for small, actively-accessed files on a
distribute-replicate volume in both GlusterFS 3.1.1 and 3.1.2.  It seems that
the replicated servers would eventually get out of sync with each other on
these kinds of files.  For a while now, we have dropped replication and run
the volume as distributed only.  This has worked reliably for the past week
or so without any of the errors we were seeing before: no such file, invalid
argument, etc.

I'm running thousands of small files over NFSv3 through NGINX with
distribute and have had the opposite experience. Unfortunately when
NGINX can't access a file over NFS it means a customer calling us, so
right now gluster is basically sitting idle (posted my output to the
list a while back with no response).

We've had lots of issues with files disappearing or being inaccessible prior to 3.1.2 with the NFS client and server translator. After 3.1.2, many of these problems *seem* to have been resolved, though all this means in this instance is that the customer hasn't submitted a ticket yet.

I had originally thought it was a timebase issue ... as we had a minute or two of clock drift on some of the nodes (since fixed). But we had a pretty consistent error in this regard.

We did open problem reports. Unfortunately, no action so far (they just closed them this morning, though nothing has been solved per se; the issue simply has not yet resurfaced). I'll leave those reports closed for now.

That said, this error, or one with a very similar signature, has been in the code since the 2.x series. I really ... really want to track it down, but I can't create a simple reproducer for it to present to the team. If you have what you think is a simple reproducer, please email me off-list. We'll try it here, and if we can get it down to a very simple reproduction case and test, we'll re-open the bugs.
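
A test of the kind described above could start from something like the sketch below. It is not taken from the bug reports: the function name, file naming scheme, and default counts are all assumptions, and a single-client loop like this may well not trigger the problem at all. It only shows the shape of a small-file write/read-back check that records "no such file" or "invalid argument" style errors instead of stopping at the first one.

```python
import os


def stress_small_files(root, n_files=50, rounds=5, size=256):
    """Rewrite and re-read many small files under root.

    Returns a list of (round, path, reason) tuples, one per failure.
    All names and defaults here are illustrative, not from GlusterFS.
    """
    failures = []
    for r in range(rounds):
        for i in range(n_files):
            path = os.path.join(root, f"small_{i:05d}.dat")
            # Payload encodes round and index so stale data is detectable.
            payload = f"{r}:{i}:".encode().ljust(size, b"x")
            try:
                with open(path, "wb") as fh:
                    fh.write(payload)
                    fh.flush()
                    os.fsync(fh.fileno())
                with open(path, "rb") as fh:
                    data = fh.read()
            except OSError as exc:
                # Record ENOENT, EINVAL, etc. and keep going.
                failures.append((r, path, f"errno {exc.errno}: {exc.strerror}"))
                continue
            if data != payload:
                failures.append((r, path, "content mismatch"))
    return failures
```

Pointing `root` at a directory on the mounted volume (NFS or native client) and running the same loop from more than one client at once would be closer to the workloads described in this thread. An empty result on a local disk only confirms that the harness itself works.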

I'd hate to think it's a heisenbug, but that is where I am leaning now.


--
Joseph Landman, Ph.D
Founder and CEO
Scalable Informatics Inc.
email: land...@scalableinformatics.com
web  : http://scalableinformatics.com
       http://scalableinformatics.com/sicluster
phone: +1 734 786 8423 x121
fax  : +1 866 888 3112
cell : +1 734 612 4615
_______________________________________________
Gluster-users mailing list
Gluster-users@gluster.org
http://gluster.org/cgi-bin/mailman/listinfo/gluster-users
