Re: [Gluster-users] Best practices after a peer failure?

2011-03-15 Thread Pranith Kumar. Karampuri
hi Mohit,
 Self-heal happens whenever a lookup happens on an inconsistent file. Commands 
such as ls -laR and find perform a lookup on every file recursively under the 
directory we specify.
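
For example (a minimal sketch; /mnt/gluster is just a placeholder for your
client mount point), either of these walks the tree and triggers the lookups
that start self-heal:

    ls -laR /mnt/gluster >/dev/null
    find /mnt/gluster -print0 | xargs --null stat >/dev/null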

Pranith.

- Original Message -
From: "Mohit Anchlia" 
To: "Pranith Kumar. Karampuri" , gluster-users@gluster.org
Sent: Wednesday, March 16, 2011 3:19:13 AM
Subject: Re: [Gluster-users] Best practices after a peer failure?

I thought self-healing was possible only after we run "ls -alR" or "find
..". It sounds like self-healing is supposed to work when a dead node is
brought back up - is that true?

On Tue, Mar 15, 2011 at 6:07 AM, Pranith Kumar. Karampuri
 wrote:
> hi R.C.,
>    Could you please give the exact steps when you log the bug. Please also 
> give the output of gluster peer status on both the machines after restart. 
> zip the files under /usr/local/var/log/glusterfs/ and /etc/glusterd on both 
> the machines when this issue happens. This should help us debug the issue.
>
> Thanks
> Pranith.
>
> - Original Message -
> From: "R.C." 
> To: gluster-users@gluster.org
> Sent: Tuesday, March 15, 2011 4:14:24 PM
> Subject: Re: [Gluster-users] Best practices after a peer failure?
>
> I've figured out the problem.
>
> If you mount the glusterfs volume with the native client on a peer and another peer
> crashes, it doesn't self-heal after the reboot.
>
> Should I put this issue in the bug tracker?
>
> Bye
>
> Raf
>
>
> - Original Message -
> From: "R.C." 
> To: 
> Sent: Monday, March 14, 2011 11:41 PM
> Subject: Best practices after a peer failure?
>
>
>> Hello to the list.
>>
>> I'm practicing GlusterFS in various topologies by means of multiple
>> Virtualbox VMs.
>>
>> As the standard system administrator, I'm mainly interested in disaster
>> recovery scenarios. The first being a replica 2 configuration, with one
>> peer crashing (actually stopping VM abruptly) during data writing to the
>> volume.
>> After rebooting the stopped VM and relaunching the gluster daemon (service
>> glusterd start), the cluster doesn't start healing by itself.
>> I've also tried the suggested commands:
>> find  -print0 | xargs --null stat >/dev/null
>> and
>> find  -type f -exec dd if='{}' of=/dev/null bs=1M \; >
>> /dev/null 2>&1
>> without success.
>> A rebalance command recreates replicas but, when accessing cluster, the
>> always-alive client is the only one committing data to disk.
>>
>> Where am I misoperating?
>>
>> Thank you for your support.
>>
>> Raf
>>
>
> ___
> Gluster-users mailing list
> Gluster-users@gluster.org
> http://gluster.org/cgi-bin/mailman/listinfo/gluster-users
> ___
> Gluster-users mailing list
> Gluster-users@gluster.org
> http://gluster.org/cgi-bin/mailman/listinfo/gluster-users
>
___
Gluster-users mailing list
Gluster-users@gluster.org
http://gluster.org/cgi-bin/mailman/listinfo/gluster-users


Re: [Gluster-users] add-brick command with replica parameter fails

2011-03-15 Thread Amar Tumballi
Hi James,

Answers inline.

> According to the GlusterFS documentation, adding bricks allows a "replica N"
> argument on the command line, as shown by this excerpt from the manual:
>
>Brick Commands
>
>volume add-brick VOLNAME [(replica COUNT)|(stripe COUNT)] NEW-BRICK
> ...
>
>
We have asked for this documentation to be removed; it will be removed from the
man pages as well. Please don't use a replica or stripe count with 'add-brick'
or 'remove-brick'.
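
For the volume in this thread, that means adding the new brick pairs without
the count (the same command given elsewhere in this thread); a sketch:

    gluster volume add-brick test-pfs-ro1 jc1letgfs7:/export/read-only/g01 \
        jc1letgfs8:/export/read-only/g01 jc1letgfs7:/export/read-only/g02 \
        jc1letgfs8:/export/read-only/g02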


> Anybody able to help me out here? Just to forestall the question "have you
> tried adding the storage without the replica 2 parameters?" - yes, I have,
> and that gives me a set of servers in which taking down any one node causes
> all clients to hang until that node comes back up - I posted an earlier
> thread asking for help with that here:
> http://gluster.org/pipermail/gluster-users/2011-March/006886.html
>
>
Regarding the mount point hanging till the node comes back up: we have fixed a
similar bug in the 3.1.3 release. Please give it a try and see if it works fine.

Regards,
Amar
___
Gluster-users mailing list
Gluster-users@gluster.org
http://gluster.org/cgi-bin/mailman/listinfo/gluster-users


Re: [Gluster-users] Best practices after a peer failure?

2011-03-15 Thread Mohit Anchlia
I thought self-healing was possible only after we run "ls -alR" or "find
..". It sounds like self-healing is supposed to work when a dead node is
brought back up - is that true?

On Tue, Mar 15, 2011 at 6:07 AM, Pranith Kumar. Karampuri
 wrote:
> hi R.C.,
>    Could you please give the exact steps when you log the bug. Please also 
> give the output of gluster peer status on both the machines after restart. 
> zip the files under /usr/local/var/log/glusterfs/ and /etc/glusterd on both 
> the machines when this issue happens. This should help us debug the issue.
>
> Thanks
> Pranith.
>
> - Original Message -
> From: "R.C." 
> To: gluster-users@gluster.org
> Sent: Tuesday, March 15, 2011 4:14:24 PM
> Subject: Re: [Gluster-users] Best practices after a peer failure?
>
> I've figured out the problem.
>
> If you mount the glusterfs volume with the native client on a peer and another peer
> crashes, it doesn't self-heal after the reboot.
>
> Should I put this issue in the bug tracker?
>
> Bye
>
> Raf
>
>
> - Original Message -
> From: "R.C." 
> To: 
> Sent: Monday, March 14, 2011 11:41 PM
> Subject: Best practices after a peer failure?
>
>
>> Hello to the list.
>>
>> I'm practicing GlusterFS in various topologies by means of multiple
>> Virtualbox VMs.
>>
>> As the standard system administrator, I'm mainly interested in disaster
>> recovery scenarios. The first being a replica 2 configuration, with one
>> peer crashing (actually stopping VM abruptly) during data writing to the
>> volume.
>> After rebooting the stopped VM and relaunching the gluster daemon (service
>> glusterd start), the cluster doesn't start healing by itself.
>> I've also tried the suggested commands:
>> find  -print0 | xargs --null stat >/dev/null
>> and
>> find  -type f -exec dd if='{}' of=/dev/null bs=1M \; >
>> /dev/null 2>&1
>> without success.
>> A rebalance command recreates replicas but, when accessing cluster, the
>> always-alive client is the only one committing data to disk.
>>
>> Where am I misoperating?
>>
>> Thank you for your support.
>>
>> Raf
>>
>
> ___
> Gluster-users mailing list
> Gluster-users@gluster.org
> http://gluster.org/cgi-bin/mailman/listinfo/gluster-users
> ___
> Gluster-users mailing list
> Gluster-users@gluster.org
> http://gluster.org/cgi-bin/mailman/listinfo/gluster-users
>
___
Gluster-users mailing list
Gluster-users@gluster.org
http://gluster.org/cgi-bin/mailman/listinfo/gluster-users


Re: [Gluster-users] Quick question regarding xfs_repair

2011-03-15 Thread Terry Haley
OK, I need to take a breath. Your question just made me realize that Gluster did
not do striped volumes until 3.1, and this system is 3.0. A distributed volume
should still contain whole files on the various nodes. That's why some files
were missing from the backup while other large ones were still intact.



-Original Message-
From: Joe Landman [mailto:land...@scalableinformatics.com] 
Sent: Tuesday, March 15, 2011 10:47 AM
To: Terry Haley
Cc: gluster-users@gluster.org
Subject: Re: [Gluster-users] Quick question regarding xfs_repair

On 03/15/2011 10:46 AM, Terry Haley wrote:
> I'm guessing since Gluster was striping across this and 3 other nodes, I'm
> pretty much left with whatever data I was able to backup prior to this
> happening and whatever I was lucky enough to get from the gluster system
> that didn't have stripes on this node. I still haven't heard if gluster
will
> fail on a copy if the file could not be fully constructed from the
stripes.
>
> I suppose it's time for some diagnostics to see if the node is worth
> rebuilding or I need to order anything.

What RAID card (or is this SW RAID)?  I'd start with the diagnostics.

Was the volume striping or distributed?  If striping, the data is 
possibly toast, and I'd suggest more effort in recovering the device. 
If distributed, you will be missing some files.



-- 
Joseph Landman, Ph.D
Founder and CEO
Scalable Informatics Inc.
email: land...@scalableinformatics.com
web  : http://scalableinformatics.com
http://scalableinformatics.com/sicluster
phone: +1 734 786 8423 x121
fax  : +1 866 888 3112
cell : +1 734 612 4615




___
Gluster-users mailing list
Gluster-users@gluster.org
http://gluster.org/cgi-bin/mailman/listinfo/gluster-users


Re: [Gluster-users] Quick question regarding xfs_repair

2011-03-15 Thread Joe Landman

On 03/15/2011 10:52 AM, Terry Haley wrote:

Here's the glusterfs.vol entry:

volume distribute
 type cluster/distribute
 subvolumes 192.160.200.11-1 192.160.200.11-2 192.160.200.12-1
192.160.200.12-2
end-volume



Which version of GlusterFS is this?


--
Joseph Landman, Ph.D
Founder and CEO
Scalable Informatics Inc.
email: land...@scalableinformatics.com
web  : http://scalableinformatics.com
   http://scalableinformatics.com/sicluster
phone: +1 734 786 8423 x121
fax  : +1 866 888 3112
cell : +1 734 612 4615
___
Gluster-users mailing list
Gluster-users@gluster.org
http://gluster.org/cgi-bin/mailman/listinfo/gluster-users


Re: [Gluster-users] Quick question regarding xfs_repair

2011-03-15 Thread Terry Haley
Here's the glusterfs.vol entry:

volume distribute
type cluster/distribute
subvolumes 192.160.200.11-1 192.160.200.11-2 192.160.200.12-1
192.160.200.12-2
end-volume



-Original Message-
From: Joe Landman [mailto:land...@scalableinformatics.com] 
Sent: Tuesday, March 15, 2011 10:47 AM
To: Terry Haley
Cc: gluster-users@gluster.org
Subject: Re: [Gluster-users] Quick question regarding xfs_repair

On 03/15/2011 10:46 AM, Terry Haley wrote:
> I'm guessing since Gluster was striping across this and 3 other nodes, I'm
> pretty much left with whatever data I was able to backup prior to this
> happening and whatever I was lucky enough to get from the gluster system
> that didn't have stripes on this node. I still haven't heard if gluster
will
> fail on a copy if the file could not be fully constructed from the
stripes.
>
> I suppose it's time for some diagnostics to see if the node is worth
> rebuilding or I need to order anything.

What RAID card (or is this SW RAID)?  I'd start with the diagnostics.

Was the volume striping or distributed?  If striping, the data is 
possibly toast, and I'd suggest more effort in recovering the device. 
If distributed, you will be missing some files.



-- 
Joseph Landman, Ph.D
Founder and CEO
Scalable Informatics Inc.
email: land...@scalableinformatics.com
web  : http://scalableinformatics.com
http://scalableinformatics.com/sicluster
phone: +1 734 786 8423 x121
fax  : +1 866 888 3112
cell : +1 734 612 4615




___
Gluster-users mailing list
Gluster-users@gluster.org
http://gluster.org/cgi-bin/mailman/listinfo/gluster-users


Re: [Gluster-users] Quick question regarding xfs_repair

2011-03-15 Thread Terry Haley
-Original Message-
From: Joe Landman [mailto:land...@scalableinformatics.com] 
Sent: Tuesday, March 15, 2011 10:47 AM
To: Terry Haley
Cc: gluster-users@gluster.org
Subject: Re: [Gluster-users] Quick question regarding xfs_repair

On 03/15/2011 10:46 AM, Terry Haley wrote:
> I'm guessing since Gluster was striping across this and 3 other nodes, I'm
> pretty much left with whatever data I was able to backup prior to this
> happening and whatever I was lucky enough to get from the gluster system
> that didn't have stripes on this node. I still haven't heard if gluster
will
> fail on a copy if the file could not be fully constructed from the
stripes.
>
> I suppose it's time for some diagnostics to see if the node is worth
> rebuilding or I need to order anything.

What RAID card (or is this SW RAID)?  I'd start with the diagnostics.

- ADAPTEC 52445 SAS 
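
A minimal sketch of where such diagnostics might start, assuming Adaptec's
arcconf CLI is installed and this is controller 1 (both are assumptions):

    arcconf getconfig 1        # controller, logical-device and physical-device state
    arcconf getlogs 1 device   # per-device error counters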

Was the volume striping or distributed?  If striping, the data is 
possibly toast, and I'd suggest more effort in recovering the device. 
If distributed, you will be missing some files.

- The gluster volume was distributed. 








-- 
Joseph Landman, Ph.D
Founder and CEO
Scalable Informatics Inc.
email: land...@scalableinformatics.com
web  : http://scalableinformatics.com
http://scalableinformatics.com/sicluster
phone: +1 734 786 8423 x121
fax  : +1 866 888 3112
cell : +1 734 612 4615




___
Gluster-users mailing list
Gluster-users@gluster.org
http://gluster.org/cgi-bin/mailman/listinfo/gluster-users


Re: [Gluster-users] Quick question regarding xfs_repair

2011-03-15 Thread Joe Landman

On 03/15/2011 10:46 AM, Terry Haley wrote:

I'm guessing since Gluster was striping across this and 3 other nodes, I'm
pretty much left with whatever data I was able to backup prior to this
happening and whatever I was lucky enough to get from the gluster system
that didn't have stripes on this node. I still haven't heard if gluster will
fail on a copy if the file could not be fully constructed from the stripes.

I suppose it's time for some diagnostics to see if the node is worth
rebuilding or I need to order anything.


What RAID card (or is this SW RAID)?  I'd start with the diagnostics.

Was the volume striped or distributed?  If striped, the data is 
possibly toast, and I'd suggest more effort in recovering the device. 
If distributed, you will be missing some files.




--
Joseph Landman, Ph.D
Founder and CEO
Scalable Informatics Inc.
email: land...@scalableinformatics.com
web  : http://scalableinformatics.com
   http://scalableinformatics.com/sicluster
phone: +1 734 786 8423 x121
fax  : +1 866 888 3112
cell : +1 734 612 4615
___
Gluster-users mailing list
Gluster-users@gluster.org
http://gluster.org/cgi-bin/mailman/listinfo/gluster-users


Re: [Gluster-users] Quick question regarding xfs_repair

2011-03-15 Thread Terry Haley
I'm guessing since Gluster was striping across this and 3 other nodes, I'm
pretty much left with whatever data I was able to backup prior to this
happening and whatever I was lucky enough to get from the gluster system
that didn't have stripes on this node. I still haven't heard if gluster will
fail on a copy if the file could not be fully constructed from the stripes.

I suppose it's time for some diagnostics to see if the node is worth
rebuilding or I need to order anything.

Thanks Joe.

Terry

-Original Message-
From: Joe Landman [mailto:land...@scalableinformatics.com] 
Sent: Tuesday, March 15, 2011 10:33 AM
To: Terry Haley
Cc: gluster-users@gluster.org
Subject: Re: [Gluster-users] Quick question regarding xfs_repair

On 03/15/2011 10:32 AM, Terry Haley wrote:
> I did so and received the same error:
>
> [root@temporal002 ~]# xfs_repair /dev/sdd
> Phase 1 - find and verify superblock...
> superblock read failed, offset 0, size 524288, ag 0, rval -1
>
> fatal error -- Input/output error
>
> My messages file looks like this:
>
> Mar 15 10:34:28 temporal002 kernel: sd 7:0:0:0: SCSI error: return code =
> 0x0802
> Mar 15 10:34:28 temporal002 kernel: sdd: Current: sense key: Hardware
Error
> Mar 15 10:34:28 temporal002 kernel: Add. Sense: Internal target
failure
> Mar 15 10:34:28 temporal002 kernel:
> Mar 15 10:34:28 temporal002 kernel: end_request: I/O error, dev sdd,
sector
> 0

OK, that's a hardware error.  The file system may (or may not) be intact. 
The RAID/disk, however, is probably toast.




-- 
Joseph Landman, Ph.D
Founder and CEO
Scalable Informatics Inc.
email: land...@scalableinformatics.com
web  : http://scalableinformatics.com
http://scalableinformatics.com/sicluster
phone: +1 734 786 8423 x121
fax  : +1 866 888 3112
cell : +1 734 612 4615




___
Gluster-users mailing list
Gluster-users@gluster.org
http://gluster.org/cgi-bin/mailman/listinfo/gluster-users


Re: [Gluster-users] GlusterFS performance questions

2011-03-15 Thread Ed W
On 14/03/2011 22:18, Alexander Todorov wrote:
> Hello folks,
> I'm looking for GlusterFS performance metrics. What I'm interested in
> particular is:
> 
> * Do adding more bricks to a volume make reads faster?
> * How do replica count affect that?

Although no one seems to be really talking about performance in these
terms, I think the limiting factor is usually going to be network
latency.  In very approximate terms, each time you touch a file in
GlusterFS you need to ask every other brick for its opinion as to
whether you have the newest copy of the file or not.  Therefore your file
operations per second are bounded by your network latency...
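
As a rough back-of-the-envelope illustration (my numbers, assuming one
synchronous lookup round trip per file access and ignoring everything else):
at ~0.5 ms gigabit round-trip latency that is on the order of
1 / 0.0005 s = 2,000 small-file operations per second per client thread, while
a ~10 us InfiniBand round trip allows on the order of 100,000.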

So I would presume that those who get InfiniBand network hardware, with
its few-microsecond latencies, see far better performance than those of us on
gigabit and the barely sub-millisecond latency that it entails.

So I suspect you can predict rough performance while changing the
hardware by thinking about how the network constrains you, e.g. consider
your access pattern, small files vs. large files, small reads vs. large
reads, number of bricks, etc.

Note that it doesn't seem popular to discuss performance in these terms, but
I think if you read through the old posts on the list you will see that it
is really this network latency versus the required access pattern that
determines whether people feel Gluster is fast or slow.

To jump to a conclusion, it makes sense that large reads on large files
do much better than accessing lots of small files...  If you make the
files large enough then you start to test the disk performance, etc

Good luck

Ed W
___
Gluster-users mailing list
Gluster-users@gluster.org
http://gluster.org/cgi-bin/mailman/listinfo/gluster-users


Re: [Gluster-users] Quick question regarding xfs_repair

2011-03-15 Thread Joe Landman

On 03/15/2011 10:32 AM, Terry Haley wrote:

I did so and received the same error:

[root@temporal002 ~]# xfs_repair /dev/sdd
Phase 1 - find and verify superblock...
superblock read failed, offset 0, size 524288, ag 0, rval -1

fatal error -- Input/output error

My messages file looks like this:

Mar 15 10:34:28 temporal002 kernel: sd 7:0:0:0: SCSI error: return code =
0x0802
Mar 15 10:34:28 temporal002 kernel: sdd: Current: sense key: Hardware Error
Mar 15 10:34:28 temporal002 kernel: Add. Sense: Internal target failure
Mar 15 10:34:28 temporal002 kernel:
Mar 15 10:34:28 temporal002 kernel: end_request: I/O error, dev sdd, sector
0


OK, that's a hardware error.  The file system may (or may not) be intact. 
The RAID/disk, however, is probably toast.





--
Joseph Landman, Ph.D
Founder and CEO
Scalable Informatics Inc.
email: land...@scalableinformatics.com
web  : http://scalableinformatics.com
   http://scalableinformatics.com/sicluster
phone: +1 734 786 8423 x121
fax  : +1 866 888 3112
cell : +1 734 612 4615
___
Gluster-users mailing list
Gluster-users@gluster.org
http://gluster.org/cgi-bin/mailman/listinfo/gluster-users


Re: [Gluster-users] Quick question regarding xfs_repair

2011-03-15 Thread Terry Haley
I did so and received the same error:

[root@temporal002 ~]# xfs_repair /dev/sdd
Phase 1 - find and verify superblock...
superblock read failed, offset 0, size 524288, ag 0, rval -1

fatal error -- Input/output error

My messages file looks like this:

Mar 15 10:34:28 temporal002 kernel: sd 7:0:0:0: SCSI error: return code =
0x0802
Mar 15 10:34:28 temporal002 kernel: sdd: Current: sense key: Hardware Error
Mar 15 10:34:28 temporal002 kernel: Add. Sense: Internal target failure
Mar 15 10:34:28 temporal002 kernel:
Mar 15 10:34:28 temporal002 kernel: end_request: I/O error, dev sdd, sector
0
Mar 15 10:34:28 temporal002 kernel: sd 7:0:0:0: SCSI error: return code =
0x0802
Mar 15 10:34:28 temporal002 kernel: sdd: Current: sense key: Hardware Error
Mar 15 10:34:28 temporal002 kernel: Add. Sense: Internal target failure
Mar 15 10:34:28 temporal002 kernel:
Mar 15 10:34:28 temporal002 kernel: end_request: I/O error, dev sdd, sector
511
Mar 15 10:34:28 temporal002 kernel: sd 7:0:0:0: SCSI error: return code =
0x0802
Mar 15 10:34:28 temporal002 kernel: sdd: Current: sense key: Hardware Error
Mar 15 10:34:28 temporal002 kernel: Add. Sense: Internal target failure
Mar 15 10:34:28 temporal002 kernel:
Mar 15 10:34:28 temporal002 kernel: end_request: I/O error, dev sdd, sector
1023



-Original Message-
From: Joe Landman [mailto:land...@scalableinformatics.com] 
Sent: Tuesday, March 15, 2011 10:27 AM
To: Terry Haley
Cc: gluster-users@gluster.org
Subject: Re: [Gluster-users] Quick question regarding xfs_repair

On 03/15/2011 10:25 AM, Terry Haley wrote:
> Just to give a quick update. I've rebooted the machine and when attempting
> an xfs_repair I get the following:
>
> [root@temporal002 ~]# xfs_repair -n /dev/sdd
> Phase 1 - find and verify superblock...
> superblock read failed, offset 0, size 524288, ag 0, rval -1

No, try removing the "-n" before giving up on it.  -n mode will 
terminate early as it won't try to repair anything.  First hint of 
trouble and it bails.


-- 
Joseph Landman, Ph.D
Founder and CEO
Scalable Informatics Inc.
email: land...@scalableinformatics.com
web  : http://scalableinformatics.com
http://scalableinformatics.com/sicluster
phone: +1 734 786 8423 x121
fax  : +1 866 888 3112
cell : +1 734 612 4615




___
Gluster-users mailing list
Gluster-users@gluster.org
http://gluster.org/cgi-bin/mailman/listinfo/gluster-users


Re: [Gluster-users] Quick question regarding xfs_repair

2011-03-15 Thread Joe Landman

On 03/15/2011 10:25 AM, Terry Haley wrote:

Just to give a quick update. I've rebooted the machine and when attempting
an xfs_repair I get the following:

[root@temporal002 ~]# xfs_repair -n /dev/sdd
Phase 1 - find and verify superblock...
superblock read failed, offset 0, size 524288, ag 0, rval -1


No, try removing the "-n" before giving up on it.  -n mode will 
terminate early as it won't try to repair anything.  First hint of 
trouble and it bails.



--
Joseph Landman, Ph.D
Founder and CEO
Scalable Informatics Inc.
email: land...@scalableinformatics.com
web  : http://scalableinformatics.com
   http://scalableinformatics.com/sicluster
phone: +1 734 786 8423 x121
fax  : +1 866 888 3112
cell : +1 734 612 4615
___
Gluster-users mailing list
Gluster-users@gluster.org
http://gluster.org/cgi-bin/mailman/listinfo/gluster-users


Re: [Gluster-users] Quick question regarding xfs_repair

2011-03-15 Thread Terry Haley
Just to give a quick update. I've rebooted the machine and when attempting
an xfs_repair I get the following:

[root@temporal002 ~]# xfs_repair -n /dev/sdd
Phase 1 - find and verify superblock...
superblock read failed, offset 0, size 524288, ag 0, rval -1

fatal error -- Input/output error

Hardware problem maybe?

Unless anyone has a suggestion, I'm pretty much certain at this point that
there's no way to recover this data. Even if I could do a dd_rescue and then
do a repair on that, gluster would most likely be unhappy with the result.
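
For the record, a sketch of what that last-ditch attempt would look like (the
target device name is a placeholder, and this assumes a spare disk of at least
the same size):

    dd_rescue /dev/sdd /dev/sde      # copy whatever is still readable to a healthy disk
    xfs_repair /dev/sde              # then repair the copy, never the original

and, as noted above, Gluster may still dislike whatever comes back.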

Thanks for everyone's input!

Terry

-Original Message-
From: Joe Landman [mailto:land...@scalableinformatics.com] 
Sent: Monday, March 14, 2011 1:38 PM
To: Terry Haley
Cc: gluster-users@gluster.org
Subject: Re: [Gluster-users] Quick question regarding xfs_repair

On 03/14/2011 01:34 PM, Terry Haley wrote:
> Just to clarify.
>
> I have a shell open that's currently hung doing an umount. So start
another
> shell and do the lazy umount?
>
> If that hangs as well, then reboot?

Yes.  First do an

killall -9 umount

before the other one.  Might not work, but do try it.  See if your dmesg 
output has a call stack at the end indicating a kernel subsystem oops 
(worse than a file system shutdown).

   If you have to reboot, do this:

mount -o remount,sync /

which will put the root into synchronous mode (fewer dirty buffers). 
Then if you have to bounce the unit due to the hang (possible), you can 
do so with somewhat more safety.



>
>
> Thanks
>
> -Original Message-
> From: Joe Landman [mailto:land...@scalableinformatics.com]
> Sent: Monday, March 14, 2011 1:28 PM
> To: Terry Haley
> Cc: gluster-users@gluster.org
> Subject: Re: [Gluster-users] Quick question regarding xfs_repair
>
> On 03/14/2011 01:22 PM, Terry Haley wrote:
>
>> At this point, all I can see in my future is trying to reboot without
>> remounting and do the repair, which seems like a long shot?
>>
>> Suggestions?
>
> Yeah, looks like it couldn't write to the log, so it marked the file
> system as down.  Believe it or not, this may have saved you ...
>
> Do a
>
>   umount -l /xfs/mount/point
>
> and wait a bit.  It will do the umount in the background.  Put a
> "noauto" option on this in the /etc/fstab just in case  you need to
reboot.
>
> BTW:  Which kernel is this?  The stock CentOS kernel's xfs support comes
> from centosplus.  Support for xfs isn't bad but, in general, the
> RHEL/CentOS kernels aren't (in our experience) stable under very heavy
> loads, nor are they terribly good with xfs.  5.5 is better.
>
> Regards
>
> Joe
>
>


-- 
Joseph Landman, Ph.D
Founder and CEO
Scalable Informatics Inc.
email: land...@scalableinformatics.com
web  : http://scalableinformatics.com
http://scalableinformatics.com/sicluster
phone: +1 734 786 8423 x121
fax  : +1 866 888 3112
cell : +1 734 612 4615




___
Gluster-users mailing list
Gluster-users@gluster.org
http://gluster.org/cgi-bin/mailman/listinfo/gluster-users


Re: [Gluster-users] add-brick command with replica parameter fails

2011-03-15 Thread Burnash, James
Thanks for the quick response, but if you look at the end of my message you 
will see that I already tried that and the mirroring did not appear to work 
(see my link to my earlier thread in that same message).

Still looking for a working answer to either this question or the linked one.

James Burnash, Unix Engineering

-Original Message-
From: Anush Shetty [mailto:an...@gluster.com]
Sent: Tuesday, March 15, 2011 9:50 AM
To: Burnash, James
Cc: gluster-users@gluster.org
Subject: Re: [Gluster-users] add-brick command with replica parameter fails

On Tue, Mar 15, 2011 at 6:59 PM, Burnash, James  wrote:
> Hello.
>
> According to the GlusterFS documentation, adding bricks allows a "replica N" 
> argument on the command line, as shown by this excerpt from the manual:
>
>Brick Commands
>
>volume add-brick VOLNAME [(replica COUNT)|(stripe COUNT)] NEW-BRICK ...
>
>Add the specified brick to the specified volume.
>
> I currently have two servers, volume configured like this:
>
> Volume Name: test-pfs-ro1
> Type: Distributed-Replicate
> Status: Started
> Number of Bricks: 2 x 2 = 4
> Transport-type: tcp
> Bricks:
> Brick1: jc1letgfs5:/export/read-only/g01
> Brick2: jc1letgfs6:/export/read-only/g01
> Brick3: jc1letgfs5:/export/read-only/g02
> Brick4: jc1letgfs6:/export/read-only/g02
> Options Reconfigured:
> performance.stat-prefetch: on
> performance.cache-size: 2GB
> network.ping-timeout: 10
>
> I want to add the next two servers, which are identically configured with 
> hardware, storage, filesystems, etc. However, adding the bricks using the 
> "replica" argument to indicate that I want this added as another set of 
> mirrors, fails - as seen below:
>
> gluster volume add-brick test-pfs-ro1 replica 2 
> jc1letgfs7:/export/read-only/g01 jc1letgfs8:/export/read-only/g01 
> jc1letgfs7:/export/read-only/g02 jc1letgfs8:/export/read-only/g02
>
> wrong brick type: replica, use :
> Usage: volume add-brick   ...
> Adding brick to Volume test-pfs-ro1 failed
>


You needn't specify replica there for add-brick. So it should be,

gluster volume add-brick test-pfs-ro1
jc1letgfs7:/export/read-only/g01 jc1letgfs8:/export/read-only/g01
jc1letgfs7:/export/read-only/g02 jc1letgfs8:/export/read-only/g02

-
Anush


___
Gluster-users mailing list
Gluster-users@gluster.org
http://gluster.org/cgi-bin/mailman/listinfo/gluster-users


Re: [Gluster-users] add-brick command with replica parameter fails

2011-03-15 Thread Anush Shetty
On Tue, Mar 15, 2011 at 6:59 PM, Burnash, James  wrote:
> Hello.
>
> According to the GlusterFS documentation, adding bricks allows a "replica N" 
> argument on the command line, as shown by this excerpt from the manual:
>
>    Brick Commands
>
>        volume add-brick VOLNAME [(replica COUNT)|(stripe COUNT)] NEW-BRICK ...
>
>            Add the specified brick to the specified volume.
>
> I currently have two servers, volume configured like this:
>
> Volume Name: test-pfs-ro1
> Type: Distributed-Replicate
> Status: Started
> Number of Bricks: 2 x 2 = 4
> Transport-type: tcp
> Bricks:
> Brick1: jc1letgfs5:/export/read-only/g01
> Brick2: jc1letgfs6:/export/read-only/g01
> Brick3: jc1letgfs5:/export/read-only/g02
> Brick4: jc1letgfs6:/export/read-only/g02
> Options Reconfigured:
> performance.stat-prefetch: on
> performance.cache-size: 2GB
> network.ping-timeout: 10
>
> I want to add the next two servers, which are identically configured with 
> hardware, storage, filesystems, etc. However, adding the bricks using the 
> "replica" argument to indicate that I want this added as another set of 
> mirrors, fails - as seen below:
>
> gluster volume add-brick test-pfs-ro1 replica 2 
> jc1letgfs7:/export/read-only/g01 jc1letgfs8:/export/read-only/g01 
> jc1letgfs7:/export/read-only/g02 jc1letgfs8:/export/read-only/g02
>
> wrong brick type: replica, use :
> Usage: volume add-brick   ...
> Adding brick to Volume test-pfs-ro1 failed
>


You needn't specify replica there for add-brick. So it should be,

gluster volume add-brick test-pfs-ro1
jc1letgfs7:/export/read-only/g01 jc1letgfs8:/export/read-only/g01
jc1letgfs7:/export/read-only/g02 jc1letgfs8:/export/read-only/g02

-
Anush
___
Gluster-users mailing list
Gluster-users@gluster.org
http://gluster.org/cgi-bin/mailman/listinfo/gluster-users


[Gluster-users] add-brick command with replica parameter fails

2011-03-15 Thread Burnash, James
Hello.

According to the GlusterFS documentation, adding bricks allows a "replica N" 
argument on the command line, as shown by this excerpt from the manual:

Brick Commands

volume add-brick VOLNAME [(replica COUNT)|(stripe COUNT)] NEW-BRICK ...

Add the specified brick to the specified volume.

I currently have two servers, volume configured like this:

Volume Name: test-pfs-ro1
Type: Distributed-Replicate
Status: Started
Number of Bricks: 2 x 2 = 4
Transport-type: tcp
Bricks:
Brick1: jc1letgfs5:/export/read-only/g01
Brick2: jc1letgfs6:/export/read-only/g01
Brick3: jc1letgfs5:/export/read-only/g02
Brick4: jc1letgfs6:/export/read-only/g02
Options Reconfigured:
performance.stat-prefetch: on
performance.cache-size: 2GB
network.ping-timeout: 10

I want to add the next two servers, which are identically configured with 
hardware, storage, filesystems, etc. However, adding the bricks using the 
"replica" argument to indicate that I want this added as another set of 
mirrors, fails - as seen below:

gluster volume add-brick test-pfs-ro1 replica 2 
jc1letgfs7:/export/read-only/g01 jc1letgfs8:/export/read-only/g01 
jc1letgfs7:/export/read-only/g02 jc1letgfs8:/export/read-only/g02

wrong brick type: replica, use :
Usage: volume add-brick   ...
Adding brick to Volume test-pfs-ro1 failed

Anybody able to help me out here? Just to forestall the question "have you 
tried adding the storage without the replica 2 parameters?" - yes, I have, and 
that gives me a set of servers in which taking down any one node causes all 
clients to hang until that node comes back up - I posted an earlier thread 
asking for help with that here: 
http://gluster.org/pipermail/gluster-users/2011-March/006886.html


James Burnash, Unix Engineering


___
Gluster-users mailing list
Gluster-users@gluster.org
http://gluster.org/cgi-bin/mailman/listinfo/gluster-users


Re: [Gluster-users] Best practices after a peer failure?

2011-03-15 Thread Pranith Kumar. Karampuri
hi R.C.,
Could you please give the exact steps when you log the bug? Please also 
give the output of "gluster peer status" on both machines after the restart, and zip 
the files under /usr/local/var/log/glusterfs/ and /etc/glusterd on both 
machines when this issue happens. This should help us debug the issue.
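
A minimal sketch of collecting those files into one archive on each machine
(the output filename is just an example):

    tar czf gluster-debug-$(hostname).tar.gz /usr/local/var/log/glusterfs/ /etc/glusterd/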

Thanks
Pranith.

- Original Message -
From: "R.C." 
To: gluster-users@gluster.org
Sent: Tuesday, March 15, 2011 4:14:24 PM
Subject: Re: [Gluster-users] Best practices after a peer failure?

I've figured out the problem.

If you mount the glusterfs volume with the native client on a peer and another peer 
crashes, it doesn't self-heal after the reboot.

Should I put this issue in the bug tracker?

Bye

Raf


- Original Message - 
From: "R.C." 
To: 
Sent: Monday, March 14, 2011 11:41 PM
Subject: Best practices after a peer failure?


> Hello to the list.
>
> I'm practicing GlusterFS in various topologies by means of multiple 
> Virtualbox VMs.
>
> As the standard system administrator, I'm mainly interested in disaster 
> recovery scenarios. The first being a replica 2 configuration, with one 
> peer crashing (actually stopping VM abruptly) during data writing to the 
> volume.
> After rebooting the stopped VM and relaunching the gluster daemon (service 
> glusterd start), the cluster doesn't start healing by itself.
> I've also tried the suggested commands:
> find  -print0 | xargs --null stat >/dev/null
> and
> find  -type f -exec dd if='{}' of=/dev/null bs=1M \; > 
> /dev/null 2>&1
> without success.
> A rebalance command recreates replicas but, when accessing cluster, the 
> always-alive client is the only one committing data to disk.
>
> Where am I misoperating?
>
> Thank you for your support.
>
> Raf
> 

___
Gluster-users mailing list
Gluster-users@gluster.org
http://gluster.org/cgi-bin/mailman/listinfo/gluster-users
___
Gluster-users mailing list
Gluster-users@gluster.org
http://gluster.org/cgi-bin/mailman/listinfo/gluster-users


Re: [Gluster-users] Mac / NFS problems

2011-03-15 Thread paul simpson
Oops - please disregard that post.  We had a workaround - re-mounting over
plain NFS.  That wasn't working initially, but now is thanks to that answer
(nolock).  The original Gluster NFS export still doesn't work correctly: we
can see the file system via NFS, but not write to it!

So, I repeat the original question: has anyone else got Mac NFS working with
Gluster?  Or has anyone got Mac FUSE to compile correctly?

-p


On 15 March 2011 10:46, paul simpson  wrote:

> Well, answering our own question: it seems that NFS on the Mac (10.6.6)
> has become problematic due to the increased amount of NFS locking used.  You
> just mount with nolocks and things start working.  I hope this helps someone
> else out there...
>
> regards,
>
> paul
>
>
> quoting from http://www.facebook.com/note.php?note_id=125738946623
>
> I don't know what it is about Apple and NFS, but they keep moving things
> around. The new UI to NFS mounting is much nicer than it was before, but
> it's now in a totally different place: the Disk Utility. But if you use a
> lot of NFS file systems, it's a pain to have to mount them one by one:
> ignoring the UI and using the /net automount filesystem is far more
> convenient. Just use the file name /net/hostname/path and you don't have to
> mess with any mounting, it just happens by automagic. I wrote a blog entry
> about this a long time ago.
> However, there is a huge problem with this: OS X does a phenomenal amount
> of file locking (some would say, needlessly so) and has always been really
> sensitive to the configuration of locking on the NFS servers. So much so
> that if you randomly pick an NFS server in a large enterprise, true success
> is pretty unlikely. It'll succeed, but you'll keep getting messages
> indicating that the lock server is down, followed quickly by another message
> that the lock server is back up again. Even if you do get the NFS server
> tuned precisely the way that OS X wants it, performance sucks because of all
> the lock/unlock protocol requests that fly across the network. They clearly
> did something in Snow Leopard to aggravate this problem: it's now nasty
> enough to make NFS almost useless for me.
>
> Fortunately, there is a fix: just turn off network locking. You can do it
> by adding the "nolocks,locallocks" options in the advanced options field of
> the Disk Utility NFS mounting UI, but this is painful if you do a lot of
> them, and doesn't help at all with /net. You can edit /etc/auto_master to
> add these options to the /net entry, but it doesn't affect other mounts -
> however I do recommend deleting the hidefromfinder option in auto_master. If
> you want to fix every automount, edit /etc/autofs.conf and search for the
> line that starts with AUTOMOUNTD_MNTOPTS=. These options get applied on
> every mount. Add nolocks,locallocks and your world will be faster and
> happier after you reboot.
>
>
>
> On 11 March 2011 09:52, Shehjar Tikoo  wrote:
>
>> David Lloyd wrote:
>>
>>> Hello,
>>>
>>> Were having issues with macs writing to our gluster system.
>>> Gluster vol info at end.
>>>
>>> On a mac, if I make a file in the shell I get the following message:
>>>
>>> smoke:hunter david$ echo hello > test
>>> -bash: test: Operation not permitted
>>>
>>>
>> I can help if you can send the nfs.log file from the /etc/glusterd
>> directory on the nfs server. Before your mount command, set the log-level to
>> trace for nfs server and then run the echo command above. Unmount as soon as
>> you see the error above and email me the nfs.log.
>>
>> -Shehjar
>>
>>
>>
>>
>>> And the file is made but is zero size.
>>>
>>> smoke:hunter david$ ls -l test
>>> -rw-r--r--  1 david  realise  0 Mar  3 08:44 test
>>>
>>>
>>> glusterfs/nfslog logs thus:
>>>
>>> [2011-03-03 08:44:10.379188] I [io-stats.c:333:io_stats_dump_fd]
>>> glustervol1: --- fd stats ---
>>>
>>> [2011-03-03 08:44:10.379222] I [io-stats.c:338:io_stats_dump_fd]
>>> glustervol1:   Filename : /production/hunter/test
>>>
>>> Then try to open the file:
>>>
>>> smoke:hunter david$ cat test
>>>
>>> and get the following messages in the log:
>>>
>>> [2011-03-03 08:51:13.957319] I [afr-common.c:716:afr_lookup_done]
>>> glustervol1-replicate-0: background  meta-data self-heal triggered. path:
>>> /production/hunter/test
>>> [2011-03-03 08:51:13.959466] I
>>> [afr-self-heal-common.c:1526:afr_self_heal_completion_cbk]
>>> glustervol1-replicate-0: background  meta-data self-heal completed on
>>> /production/hunter/test
>>>
>>> If I do the same test on a linux machine (nfs) it's fine.
>>>
>>> We get the same issue on all the macs. They are 10.6.6.
>>>
>>> Gluster volume is mounted:
>>> /n/auto/gv1 -rw,hard,tcp,rsize=32768,wsize=32768,intr
>>> gus:/glustervol1
>>> Other nfs mounts on mac (from linux servers) are OK
>>>
>>> We're using LDAP to authenticate on the macs, the gluster servers aren't
>>> bound into the LDAP domain.
>>>
>>> Any ideas?
>>>
>>> Thanks
>>> David
>>>
>>>
>>> g3:/var/log/glusterfs # gluster volum

[Gluster-users] Mac OS X mounting hints

2011-03-15 Thread R.C.

As stated here:

http://www.moosefs.org/news-reader/items/MFS_running_on_Mac_OS_X..html

The MooseFS native client can be mounted on OS X.
Does anyone with experience in this area have some hints to contribute for 
this task?


Thank you

Bye

Raf 


___
Gluster-users mailing list
Gluster-users@gluster.org
http://gluster.org/cgi-bin/mailman/listinfo/gluster-users


Re: [Gluster-users] Mac / NFS problems

2011-03-15 Thread paul simpson
Well, answering our own question: it seems that NFS on the Mac (10.6.6) has
become problematic due to the increased amount of NFS locking used.  You
just mount with nolocks and things start working.  I hope this helps someone
else out there...

regards,

paul


quoting from http://www.facebook.com/note.php?note_id=125738946623

I don't know what it is about Apple and NFS, but they keep moving things
around. The new UI to NFS mounting is much nicer than it was before, but
it's now in a totally different place: the Disk Utility. But if you use a
lot of NFS file systems, it's a pain to have to mount them one by one:
ignoring the UI and using the /net automount filesystem is far more
convenient. Just use the file name /net/hostname/path and you don't have to
mess with any mounting, it just happens by automagic. I wrote a blog entry
about this a long time ago.
However, there is a huge problem with this: OS X does a phenomenal amount of
file locking (some would say, needlessly so) and has always been really
sensitive to the configuration of locking on the NFS servers. So much so
that if you randomly pick an NFS server in a large enterprise, true success
is pretty unlikely. It'll succeed, but you'll keep getting messages
indicating that the lock server is down, followed quickly by another message
that the lock server is back up again. Even if you do get the NFS server
tuned precisely the way that OS X wants it, performance sucks because of all
the lock/unlock protocol requests that fly across the network. They clearly
did something in Snow Leopard to aggravate this problem: it's now nasty
enough to make NFS almost useless for me.

Fortunately, there is a fix: just turn off network locking. You can do it by
adding the "nolocks,locallocks" options in the advanced options field of the
Disk Utility NFS mounting UI, but this is painful if you do a lot of them,
and doesn't help at all with /net. You can edit /etc/auto_master to add
these options to the /net entry, but it doesn't affect other mounts -
however I do recommend deleting the hidefromfinder option in auto_master. If
you want to fix every automount, edit /etc/autofs.conf and search for the
line that starts with AUTOMOUNTD_MNTOPTS=. These options get applied on
every mount. Add nolocks,locallocks and your world will be faster and
happier after you reboot.
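
For reference, a minimal sketch of a one-off manual mount with locking disabled
(the server and export names are the ones used earlier in this thread; the
mount point is just a placeholder):

    sudo mkdir -p /Volumes/glustervol1
    sudo mount -t nfs -o nolocks,locallocks gus:/glustervol1 /Volumes/glustervol1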



On 11 March 2011 09:52, Shehjar Tikoo  wrote:

> David Lloyd wrote:
>
>> Hello,
>>
>> Were having issues with macs writing to our gluster system.
>> Gluster vol info at end.
>>
>> On a mac, if I make a file in the shell I get the following message:
>>
>> smoke:hunter david$ echo hello > test
>> -bash: test: Operation not permitted
>>
>>
> I can help if you can send the nfs.log file from the /etc/glusterd
> directory on the nfs server. Before your mount command, set the log-level to
> trace for nfs server and then run the echo command above. Unmount as soon as
> you see the error above and email me the nfs.log.
>
> -Shehjar
>
>
>
>
>> And the file is made but is zero size.
>>
>> smoke:hunter david$ ls -l test
>> -rw-r--r--  1 david  realise  0 Mar  3 08:44 test
>>
>>
>> glusterfs/nfslog logs thus:
>>
>> [2011-03-03 08:44:10.379188] I [io-stats.c:333:io_stats_dump_fd]
>> glustervol1: --- fd stats ---
>>
>> [2011-03-03 08:44:10.379222] I [io-stats.c:338:io_stats_dump_fd]
>> glustervol1:   Filename : /production/hunter/test
>>
>> Then try to open the file:
>>
>> smoke:hunter david$ cat test
>>
>> and get the following messages in the log:
>>
>> [2011-03-03 08:51:13.957319] I [afr-common.c:716:afr_lookup_done]
>> glustervol1-replicate-0: background  meta-data self-heal triggered. path:
>> /production/hunter/test
>> [2011-03-03 08:51:13.959466] I
>> [afr-self-heal-common.c:1526:afr_self_heal_completion_cbk]
>> glustervol1-replicate-0: background  meta-data self-heal completed on
>> /production/hunter/test
>>
>> If I do the same test on a linux machine (nfs) it's fine.
>>
>> We get the same issue on all the macs. They are 10.6.6.
>>
>> Gluster volume is mounted:
>> /n/auto/gv1 -rw,hard,tcp,rsize=32768,wsize=32768,intr
>> gus:/glustervol1
>> Other nfs mounts on mac (from linux servers) are OK
>>
>> We're using LDAP to authenticate on the macs, the gluster servers aren't
>> bound into the LDAP domain.
>>
>> Any ideas?
>>
>> Thanks
>> David
>>
>>
>> g3:/var/log/glusterfs # gluster volume info
>> Volume Name: glustervol1
>> Type: Distributed-Replicate
>> Status: Started
>> Number of Bricks: 4 x 2 = 8
>> Transport-type: tcp
>> Bricks:
>> Brick1: g1:/mnt/glus1
>> Brick2: g2:/mnt/glus1
>> Brick3: g3:/mnt/glus1
>> Brick4: g4:/mnt/glus1
>> Brick5: g1:/mnt/glus2
>> Brick6: g2:/mnt/glus2
>> Brick7: g3:/mnt/glus2
>> Brick8: g4:/mnt/glus2
>> Options Reconfigured:
>> performance.stat-prefetch: 1
>> performance.cache-size: 1gb
>> performance.write-behind-window-size: 1mb
>> network.ping-timeout: 20
>> diagnostics.latency-measurement: off
>> diagnostics.dump-fd-stats: on
>>
>>
>>
>>
>>
>>
>>
>> ---

Re: [Gluster-users] Best practices after a peer failure?

2011-03-15 Thread R.C.

I've figured out the problem.

If you mount the glusterfs with native client on a peer, if another peer 
crashes then doesn't self-heal after reboot.


Should I put this issue in the bug tracker?

Bye

Raf


- Original Message - 
From: "R.C." 

To: 
Sent: Monday, March 14, 2011 11:41 PM
Subject: Best practices after a peer failure?



Hello to the list.

I'm experimenting with GlusterFS in various topologies by means of multiple 
VirtualBox VMs.


Like any system administrator, I'm mainly interested in disaster 
recovery scenarios. The first is a replica 2 configuration, with one 
peer crashing (actually, the VM being stopped abruptly) while data is being 
written to the volume.
After rebooting the stopped VM and relaunching the gluster daemon (service 
glusterd start), the cluster doesn't start healing by itself.

I've also tried the suggested commands:
find  -print0 | xargs --null stat >/dev/null
and
find  -type f -exec dd if='{}' of=/dev/null bs=1M \; > 
/dev/null 2>&1

without success.
A rebalance command recreates the replicas but, when accessing the cluster, the 
always-alive client is the only one committing data to disk.


Where am I going wrong?

Thank you for your support.

Raf



___
Gluster-users mailing list
Gluster-users@gluster.org
http://gluster.org/cgi-bin/mailman/listinfo/gluster-users


[Gluster-users] WG: Strange behaviour glusterd 3.1

2011-03-15 Thread Daniel Müller
I updated my version to 3.1.2:
glusterfs --version
glusterfs 3.1.2 built on Jan 14 2011 19:21:08
Repository revision: v3.1.1-64-gf2a067c
Copyright (c) 2006-2010 Gluster Inc. 
GlusterFS comes with ABSOLUTELY NO WARRANTY.
You may redistribute copies of GlusterFS under the terms of the GNU Affero
General Public License.

And the misbehaviour is gone.

But there is another question. If I have a two-node setup and the same file is
opened on both servers by different users, how does Gluster manage the locking?
I tried it: I opened a file and there was no "this file is already open" warning.
I could change it, and only the last saved change was replicated.




---
EDV Daniel Müller

Leitung EDV
Tropenklinik Paul-Lechler-Krankenhaus
Paul-Lechler-Str. 24
72076 Tübingen

Tel.: 07071/206-463, Fax: 07071/206-499
eMail: muel...@tropenklinik.de
Internet: www.tropenklinik.de
---

-Original Message-
From: Daniel Müller [mailto:muel...@tropenklinik.de] 
Sent: Thursday, 10 March 2011 12:18
To: 'Mohit Anchlia'; 'gluster-users@gluster.org'
Subject: Re: [Gluster-users] Strange behaviour glusterd 3.1

This is the error in  mnt-glusterfs.log
W [fuse-bridge.c:405:fuse_attr_cbk] glusterfs-fuse: 20480: FSTAT()
/windows/test/start.xlsx => -1 (File descriptor in bad state)

---
EDV Daniel Müller

Leitung EDV
Tropenklinik Paul-Lechler-Krankenhaus
Paul-Lechler-Str. 24
72076 Tübingen

Tel.: 07071/206-463, Fax: 07071/206-499
eMail: muel...@tropenklinik.de
Internet: www.tropenklinik.de
---

-Original Message-
From: Mohit Anchlia [mailto:mohitanch...@gmail.com] 
Sent: Wednesday, 9 March 2011 20:10
To: muel...@tropenklinik.de; gluster-users@gluster.org
Subject: Re: [Gluster-users] Strange behaviour glusterd 3.1

If it's a problem, can you please enable debug logging on that volume and
then see what gets logged?

I suggest creating a bug since it sounds critical. What if it was production
:)

On Wed, Mar 9, 2011 at 10:42 AM, Daniel Müller 
wrote:
>
>
> Did set gluster volume set samba-vol performance.quick-read off.
>
> vim a new file on node1. ssh node2. ls the new file -> read error, file not
> found.
>
> Did set gluster volume set samba-vol performance.quick-read on.
>
> I can ls and change the content one time. Then the same again. No
> solution!!!
>
> Should I delete the vol and build a new one?
>
> I am glad that it is not a production environment. It would be a mess.
>
> On Wed, 09 Mar 2011 23:10:21 +0530, Vijay Bellur  wrote: On Wednesday 09
> March 2011 04:00 PM, Daniel Müller wrote:
> /mnt/glusterfs is the mount point of the client where the samba-vol
> (backend:/glusterfs/export) is mounted on. So it should work. And it did
> work until last week.
>
> Can you please check by disabling the quick-read translator in your setup
> via the following command:
>
> #gluster volume set <VOLNAME> performance.quick-read off
>
> You may be hitting bug 2027 with 3.1.0.
>
> Thanks,
> Vijay
>
>
> ___
> Gluster-users mailing list
> Gluster-users@gluster.org
> http://gluster.org/cgi-bin/mailman/listinfo/gluster-users
>
>

___
Gluster-users mailing list
Gluster-users@gluster.org
http://gluster.org/cgi-bin/mailman/listinfo/gluster-users