Re: [Gluster-users] How to expand Gluster's volume after xfs filesystem resize ?

2013-11-12 Thread COCHE Sébastien
Hello,

 

You are right, I found my mistake: I forgot to extend the second server's partition.

I installed the latest version (3.4) on CentOS 6.4.

In order to benchmark GlusterFS without being limited by the disk subsystem, I created 
a 15 GB ramdrive on each of the two servers.

The major constraint is that I need to rebuild the brick on the second node whenever I 
restart that server (it is not a production environment).

So the procedure I used to extend the volume is (a rough command sketch follows the list):

-  Remove the brick

-  Extend the ramdrive (kernel parameter in /etc/grub.conf); this is the file I forgot to update on the second node

-  Recreate the XFS file system

-  Add the newly created file system back as a new brick
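
Roughly, with example volume and brick names (the exact replica handling may differ in your release):

gluster volume remove-brick testvol replica 1 node2:/mnt/ram/brick force
# enlarge the ramdrive via the ramdisk_size= kernel parameter in /etc/grub.conf, then reboot node2
mkfs.xfs -f /dev/ram0                                    # recreate the XFS filesystem on the larger ramdrive
mount /dev/ram0 /mnt/ram
gluster volume add-brick testvol replica 2 node2:/mnt/ram/brick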

 

After correcting my error, the Gluster file system showed me the right size.

 

Thank you for your help.

 

Best regards

 

Sébastien

 

From: Mark Morlino [mailto:m...@gina.alaska.edu]
Sent: Friday, 8 November 2013 18:45
To: COCHE Sébastien
Cc: gluster-users@gluster.org
Subject: Re: [Gluster-users] How to expand Gluster's volume after xfs filesystem resize ?

 

Which version of Gluster are you using? I have been able to do this with 3.3 
and 3.4 on CentOS. With a replica 2 volume, I have just run lvextend with the -r 
option on both bricks to grow the LV and XFS filesystem at the same time. The 
clients see the new size without having to do anything else specifically in 
Gluster to resize the volume.
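
For example, something along these lines on each brick server (the VG/LV names and size are illustrative):

lvextend -r -L +10G /dev/vg_bricks/lv_brick1   # -r grows the LV and the XFS filesystem in one step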

 

On Fri, Nov 8, 2013 at 6:27 AM, COCHE Sébastien sco...@sigma.fr wrote:

Hi all,

 

I am testing Gluster's features and how to perform routine operational tasks.

I created a Gluster cluster composed of 2 nodes.

I created a volume based on an XFS filesystem (on top of LVM) and started a replicated 
Gluster volume.

 

I would like to expand the volume size by (see the command sketch after this list):

-  Expanding LV

-  Expanding xfs filesystem

-  Expanding gluster volume
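
A minimal sketch of the first two steps, with example LV and mount-point names:

lvextend -L +20G /dev/vg0/lv_brick   # grow the LV backing the brick
xfs_growfs /bricks/brick1            # grow XFS to fill the new LV size (argument is the mount point)
df -h /bricks/brick1                 # the brick now reports the new size

(Per the reply above, once the bricks on both replicas have grown, the clients should pick up the new size without a separate Gluster resize command.)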

 

My problem is that I did not find a Gluster command to take the new filesystem size 
into account.

The filesystem shows me the new size, while the Gluster volume still sees the old size.

I tried the command ‘gluster volume rebalance…’, but this command only works for 
striped volumes or for replicated volumes with more than one brick.

 

How can I expand the Gluster volume?

 

Thank you very much

 

Best regards

 

Sébastien Coché

 


___
Gluster-users mailing list
Gluster-users@gluster.org
http://supercolony.gluster.org/mailman/listinfo/gluster-users

 

___
Gluster-users mailing list
Gluster-users@gluster.org
http://supercolony.gluster.org/mailman/listinfo/gluster-users

Re: [Gluster-users] Deleted files reappearing

2013-11-12 Thread Øystein Viggen
Amar Tumballi atumb...@redhat.com writes:

 On 11/12/2013 12:54 PM, Øystein Viggen wrote:
 Should I file a bug about this somewhere?  It seems easy enough to
 replicate.
 https://bugzilla.redhat.com/enter_bug.cgi?product=GlusterFS

Thanks.  I've tried to describe it as best I can, and linked back to
this thread.

https://bugzilla.redhat.com/show_bug.cgi?id=1029337

Øystein
___
Gluster-users mailing list
Gluster-users@gluster.org
http://supercolony.gluster.org/mailman/listinfo/gluster-users

[Gluster-users] New to Gluster. Having trouble with server replacement.

2013-11-12 Thread Krist van Besien
Hello all,

I'm new to gluster. In order to gain some knowledge, and test a few
things I decided to install it on three servers and play around with
it a bit.

My setup:
Three servers dc1-09, dc2-09, dc2-10. All with RHEL 6.4, and Gluster
3.4.0 (from RHS 2.1)
Each server has three disks, mounted in /mnt/raid1, /mnt/raid2 and /mnt/raid3.

I created a distributed/replicated volume, test1, with two replicas.

[root@dc2-10 ~]# gluster volume info test1

Volume Name: test1
Type: Distributed-Replicate
Volume ID: 59049b52-9e25-4cc9-bebd-fb3587948900
Status: Started
Number of Bricks: 3 x 2 = 6
Transport-type: tcp
Bricks:
Brick1: dc1-09:/mnt/raid1/test1
Brick2: dc2-09:/mnt/raid2/test1
Brick3: dc2-09:/mnt/raid1/test1
Brick4: dc2-10:/mnt/raid2/test1
Brick5: dc2-10:/mnt/raid1/test1
Brick6: dc1-09:/mnt/raid2/test1


I mounted this volume on a fourth unix server, and started a small
script that just keeps writing small files to it, in order to have
some activity.
Then I shut down one of the servers, started it again, shut down another, and so
on. Gluster proved to have no problem keeping the files available.

Then I decided to just nuke one server and completely reinitialise it. After
reinstalling the OS + Gluster I had some trouble getting the server back into
the pool.
I followed two hints I found on the internet: I added the old UUID into
glusterd.info, and made sure the correct trusted.glusterfs.volume-id extended
attribute was set on all bricks.
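
For reference, those two hints amounted to roughly the following (the UUID value is a placeholder; paths match my brick layout):

# on the reinstalled server:
service glusterd stop
echo "UUID=<uuid-the-old-server-had>" > /var/lib/glusterd/glusterd.info
# read the volume id from a surviving brick of the same volume, on a healthy peer:
getfattr -n trusted.glusterfs.volume-id -e hex /mnt/raid1/test1
# set the same id on each recreated brick directory, on the reinstalled server:
setfattr -n trusted.glusterfs.volume-id -v 0x<hex-value-from-above> /mnt/raid1/test1
service glusterd start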

Now the new server starts storing stuff again. But it still looks a
bit odd. I don't get consistent output from gluster volume status on
all three servers.

gluster volume info test1 gives me the same output everywhere. However
the output of gluster volume status is different:

[root@dc1-09 glusterd]# gluster volume status test1
Status of volume: test1
Gluster process Port Online Pid
--
Brick dc1-09:/mnt/raid1/test1 49154 Y 10496
Brick dc2-09:/mnt/raid2/test1 49152 Y 7574
Brick dc2-09:/mnt/raid1/test1 49153 Y 7581
Brick dc1-09:/mnt/raid2/test1 49155 Y 10502
NFS Server on localhost 2049 Y 1039
Self-heal Daemon on localhost N/A Y 1046
NFS Server on dc2-09 2049 Y 12397
Self-heal Daemon on dc2-09 N/A Y 12444

There are no active volume tasks


[root@dc2-10 /]# gluster volume status test1
Status of volume: test1
Gluster process Port Online Pid
--
Brick dc2-09:/mnt/raid2/test1 49152 Y 7574
Brick dc2-09:/mnt/raid1/test1 49153 Y 7581
Brick dc2-10:/mnt/raid2/test1 49152 Y 9037
Brick dc2-10:/mnt/raid1/test1 49153 Y 9049
NFS Server on localhost 2049 Y 14266
Self-heal Daemon on localhost N/A Y 14281
NFS Server on 172.16.1.21 2049 Y 12397
Self-heal Daemon on 172.16.1.21 N/A Y 12444

There are no active volume tasks

[root@dc2-09 mnt]# gluster volume status test1
Status of volume: test1
Gluster process Port Online Pid
--
Brick dc1-09:/mnt/raid1/test1 49154 Y 10496
Brick dc2-09:/mnt/raid2/test1 49152 Y 7574
Brick dc2-09:/mnt/raid1/test1 49153 Y 7581
Brick dc2-10:/mnt/raid2/test1 49152 Y 9037
Brick dc2-10:/mnt/raid1/test1 49153 Y 9049
Brick dc1-09:/mnt/raid2/test1 49155 Y 10502
NFS Server on localhost 2049 Y 12397
Self-heal Daemon on localhost N/A Y 12444
NFS Server on dc2-10 2049 Y 14266
Self-heal Daemon on dc2-10 N/A Y 14281
NFS Server on dc1-09 2049 Y 1039
Self-heal Daemon on dc1-09 N/A Y 1046

There are no active volume tasks

Why would the output of status be different on the three hosts? Is
this normal, or is there still something wrong? If so, how do I fix
this?


Krist


krist.vanbes...@gmail.com
kr...@vanbesien.org
Bern, Switzerland
___
Gluster-users mailing list
Gluster-users@gluster.org
http://supercolony.gluster.org/mailman/listinfo/gluster-users


[Gluster-users] Strange errors in client's volume log file

2013-11-12 Thread Alan Orth
Hi,

I've just noticed one of my FUSE clients has an extremely large log file
for one of its volumes.  /var/log/glusterfs/home.log is 470GB right now,
and contains millions (billions?) of the following entries:

[2013-11-12 05:21:57.671118] I [dict.c:370:dict_get]

(--/usr/lib64/glusterfs/3.4.0/xlator/performance/md-cache.so(mdc_lookup+0x2f8)
[0x7f343e784078]

(--/usr/lib64/glusterfs/3.4.0/xlator/debug/io-stats.so(io_stats_lookup_cbk+0x113)
[0x7f343e56c1e3]

(--/usr/lib64/glusterfs/3.4.0/xlator/system/posix-acl.so(posix_acl_lookup_cbk+0x1e1)
[0x7f343e35e141]))) 2-dict: !this || key=system.posix_acl_access
[2013-11-12 05:21:57.671133] I [dict.c:370:dict_get]

(--/usr/lib64/glusterfs/3.4.0/xlator/performance/md-cache.so(mdc_lookup+0x2f8)
[0x7f343e784078]

(--/usr/lib64/glusterfs/3.4.0/xlator/debug/io-stats.so(io_stats_lookup_cbk+0x113)
[0x7f343e56c1e3]

(--/usr/lib64/glusterfs/3.4.0/xlator/system/posix-acl.so(posix_acl_lookup_cbk+0x233)
[0x7f343e35e193]))) 2-dict: !this || key=system.posix_acl_default
[2013-11-12 05:21:57.671

I see the same errors on another host for this volume, but there the file is
only 400 MB. :)

The host is CentOS 6.4, running GlusterFS 3.4.0.  The volume is hosting
user home directories.  The FUSE mount is mounted with the `acl` mount
option.
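
For reference, the mount is just the standard FUSE mount with ACLs enabled, roughly:

mount -t glusterfs -o acl gluster-server:/home /home

(Presumably the flood could be stemmed temporarily by raising the client log level, e.g. `gluster volume set home diagnostics.client-log-level WARNING`, but I have not verified that.)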

Thanks!

-- 
Alan Orth
alan.o...@gmail.com
http://alaninkenya.org
http://mjanja.co.ke
I have always wished for my computer to be as easy to use as my telephone; my 
wish has come true because I can no longer figure out how to use my telephone. 
-Bjarne Stroustrup, inventor of C++



___
Gluster-users mailing list
Gluster-users@gluster.org
http://supercolony.gluster.org/mailman/listinfo/gluster-users

[Gluster-users] Multiple Volumes (bricks), One Disk

2013-11-12 Thread David Gibbons
Hi All,

I am interested in some feedback on putting multiple bricks on one physical
disk. Each brick being assigned to a different volume. Here is the scenario:

4 disks per server, 4 servers, 2x2 distribute/replicate

I would prefer to have just one volume but need to do geo-replication on
some of the data (but not all of it). My thought was to use two volumes,
which would allow me to selectively geo-replicate just the data that I need
to, by replicating only one volume.
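
For example, roughly (volume and host names are invented, and the exact geo-replication syntax depends on the GlusterFS release):

# only the volume that needs an off-site copy is geo-replicated
gluster volume geo-replication vol-georep backuphost:/backup/vol-georep start
gluster volume geo-replication vol-georep backuphost:/backup/vol-georep status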

A couple of questions come to mind:
1) Any implications of doing two bricks for different volumes on one
physical disk?
2) Will the free space across each volume still calculate correctly? IE,
if one volume takes up 2/3 of the total physical disk space, will the
second volume still reflect the correct amount of used space?
3) Am I being stupid/missing something obvious?

Cheers,
Dave
___
Gluster-users mailing list
Gluster-users@gluster.org
http://supercolony.gluster.org/mailman/listinfo/gluster-users

Re: [Gluster-users] Deleted files reappearing

2013-11-12 Thread Lalatendu Mohanty

On 11/12/2013 03:03 PM, Øystein Viggen wrote:

Amar Tumballi atumb...@redhat.com writes:


On 11/12/2013 12:54 PM, Øystein Viggen wrote:

Should I file a bug about this somewhere?  It seems easy enough to
replicate.

https://bugzilla.redhat.com/enter_bug.cgi?product=GlusterFS

Thanks.  I've tried to describe it as best I can, and linked back to
this thread.

https://bugzilla.redhat.com/show_bug.cgi?id=1029337

Øystein
I am just curious: what does `gluster v heal volumeName info split-brain` return 
when you see this issue?


-Lala
___
Gluster-users mailing list
Gluster-users@gluster.org
http://supercolony.gluster.org/mailman/listinfo/gluster-users

Re: [Gluster-users] Re; Strange behaviour with add-brick followed by remove-brick

2013-11-12 Thread Lalatendu Mohanty

On 11/06/2013 10:53 AM, B.K.Raghuram wrote:

Here are the steps that I did to reproduce the problem. Essentially,
if you try to remove a brick that is not on the localhost, it seems to
migrate the files from the localhost brick instead, and hence there is a
lot of data loss. If instead I try to remove the localhost brick, it
works fine. Can we try to get this fixed in 3.4.2, as this seems to be
the only way to replace a brick, given that replace-brick is being
removed!

[root@s5n9 ~]# gluster volume create v1 transport tcp
s5n9.testing.lan:/data/v1 s5n10.testing.lan:/data/v1
volume create: v1: success: please start the volume to access data
[root@s5n9 ~]# gluster volume start v1
volume start: v1: success
[root@s5n9 ~]# gluster volume info v1

Volume Name: v1
Type: Distribute
Volume ID: 6402b139-2957-4d62-810b-b70e6f9ba922
Status: Started
Number of Bricks: 2
Transport-type: tcp
Bricks:
Brick1: s5n9.testing.lan:/data/v1
Brick2: s5n10.testing.lan:/data/v1

*** I then NFS-mounted the volume onto my laptop and, with a script,
created 300 files in the mount. Distribution results below ***
[root@s5n9 ~]# ls -l /data/v1 | wc -l
160
[root@s5n10 ~]# ls -l /data/v1 | wc -l
142

[root@s5n9 ~]# gluster volume add-brick v1 s6n11.testing.lan:/data/v1
volume add-brick: success
[root@s5n9 ~]# gluster volume remove-brick v1 s5n10.testing.lan:/data/v1 start
volume remove-brick start: success
ID: 8f3c37d6-2f24-4418-b75a-751dcb6f2b98
[root@s5n9 ~]# gluster volume remove-brick v1 s5n10.testing.lan:/data/v1 status
Node                 Rebalanced-files   size     scanned   failures   skipped   status        run-time in secs
-----------------    ----------------   ------   -------   --------   -------   -----------   ----------------
localhost                           0   0Bytes         0          0             not started               0.00
s6n12.testing.lan                   0   0Bytes         0          0             not started               0.00
s6n11.testing.lan                   0   0Bytes         0          0             not started               0.00
s5n10.testing.lan                   0   0Bytes       300          0             completed                 1.00


[root@s5n9 ~]# gluster volume remove-brick v1 s5n10.testing.lan:/data/v1 commit
Removing brick(s) can result in data loss. Do you want to Continue? (y/n) y
volume remove-brick commit: success

[root@s5n9 ~]# gluster volume info v1

Volume Name: v1
Type: Distribute
Volume ID: 6402b139-2957-4d62-810b-b70e6f9ba922
Status: Started
Number of Bricks: 2
Transport-type: tcp
Bricks:
Brick1: s5n9.testing.lan:/data/v1
Brick2: s6n11.testing.lan:/data/v1


[root@s5n9 ~]# ls -l /data/v1 | wc -l
160
[root@s5n10 ~]# ls -l /data/v1 | wc -l
142
[root@s6n11 ~]# ls -l /data/v1 | wc -l
160
[root@s5n9 ~]# ls /data/v1
file10   file110  file131  file144  file156  file173  file19   file206
  file224  file238  file250  file264  file279  file291  file31  file44
file62  file86
file100  file114  file132  file146  file159  file174  file192  file209
  file225  file24   file252  file265  file28   file292  file32  file46
file63  file87
file101  file116  file134  file147  file16   file18   file196  file210
  file228  file240  file254  file266  file281  file293  file37  file47
file66  file9
file102  file12   file135  file148  file161  file181  file198  file212
  file229  file241  file255  file267  file284  file294  file38  file48
file69  file91
file103  file121  file136  file149  file165  file183  file200  file215
  file231  file243  file256  file268  file285  file295  file4   file50
file7   file93
file104  file122  file137  file150  file17   file184  file201  file216
  file233  file245  file258  file271  file286  file296  file40  file53
file71  file97
file105  file124  file138  file152  file170  file186  file202  file218
  file234  file246  file261  file273  file287  file297  file41  file54
file73
file107  file125  file140  file153  file171  file188  file203  file220
  file236  file248  file262  file275  file288  file298  file42  file55
file75
file11   file13   file141  file154  file172  file189  file204  file222
  file237  file25   file263  file278  file290  file3file43  file58
file80

[root@s6n11 ~]# ls  /data/v1
file10   file110  file131  file144  file156  file173  file19   file206
  file224  file238  file250  file264  file279  file291  file31  file44
file62  file86
file100  file114  file132  file146  file159  file174  file192  file209
  file225  file24   file252  file265  file28   file292  file32  file46
file63  file87
file101  file116  file134  file147  file16   file18   file196  file210
  file228  file240  file254  file266  file281  file293  file37  file47
file66  file9
file102  file12   file135  file148  file161  file181  file198  file212
  file229  file241  file255  file267  file284  file294  file38  file48
file69  file91
file103  file121  

Re: [Gluster-users] Multiple Volumes (bricks), One Disk

2013-11-12 Thread Eric Johnson
I would suggest using different partitions for each brick.  We use LVM 
and start off with a relatively small amount of allocated space, then grow 
the partitions as needed.  If you were to place 2 bricks on the same 
partition then the free space is not going to show correctly.  Example:


1TB partition, 2 bricks on this partition:

brick: vol-1-a   using 200GB
brick: vol-2-a   using 300GB.

Both volumes would show that they have ~500GB free, but in reality there 
would be ~500GB that either could use.  I don't know if there would be 
any other issues with putting 2 or more bricks on the same partition, 
but it doesn't seem like a good idea.  I had Gluster set up that way when 
I was first testing it, and it seemed to work other than the free space 
issue, but I quickly realized it would be better to separate the 
bricks out onto their own partitions.  Using LVM allows you to easily grow 
partitions as needed.


my 2 cents.


On 11/12/13, 9:31 AM, David Gibbons wrote:

Hi All,

I am interested in some feedback on putting multiple bricks on one 
physical disk. Each brick being assigned to a different volume. Here 
is the scenario:


4 disks per server, 4 servers, 2x2 distribute/replicate

I would prefer to have just one volume but need to do geo-replication 
on some of the data (but not all of it). My thought was to use two 
volumes, which would allow me to selectively geo-replicate just the 
data that I need to, by replicating only one volume.


A couple of questions come to mind:
1) Any implications of doing two bricks for different volumes on one 
physical disk?
2) Will the free space across each volume still calculate correctly? 
IE, if one volume takes up 2/3 of the total physical disk space, will 
the second volume still reflect the correct amount of used space?

3) Am I being stupid/missing something obvious?

Cheers,
Dave


___
Gluster-users mailing list
Gluster-users@gluster.org
http://supercolony.gluster.org/mailman/listinfo/gluster-users



--
Eric Johnson
713-968-2546
VP of MIS
Internet America
www.internetamerica.com

___
Gluster-users mailing list
Gluster-users@gluster.org
http://supercolony.gluster.org/mailman/listinfo/gluster-users

Re: [Gluster-users] Multiple Volumes (bricks), One Disk

2013-11-12 Thread Joe Julian
Like Eric, I too use LVM to partition off bricks for different volumes. 
You can even specify which physical device a brick is on when you're 
creating your brick, e.g. lvcreate -n myvol_brick_a -l50 vg_gluster 
/dev/sda1. This is handy if you have to replace the disk while the old 
one is still alive, as you can just install the replacement and do a 
pvmove.
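
A rough sketch of that disk swap (device and VG names are examples):

pvcreate /dev/sde1                 # prepare the replacement disk
vgextend vg_gluster /dev/sde1      # add it to the volume group
pvmove /dev/sda1 /dev/sde1         # migrate the brick's extents off the old disk, online
vgreduce vg_gluster /dev/sda1      # retire the old disk from the VG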


Each brick uses memory. I have 15 volumes, 4 disks per server, and one 
brick per volume per disk. 60 bricks would use a lot of memory. Rather 
than buy a bunch more memory that would usually sit idle, I set 
performance.cache-size to a number that would use up just enough 
memory. Do experiment with that setting. It appears that size limit is 
used for multiple caches so the actual memory used seems to be some 
multiple of what you set it to.
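
A minimal example of that tuning, with an illustrative volume name and value:

gluster volume set myvol performance.cache-size 64MB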


When a server comes back after maintenance and the self-heals start, 
having multiple bricks healing simultaneously can put quite a load on 
your servers. Test that and see if it meets your satisfaction. I 
actually kill bricks for non-essential volumes while the essential 
volumes are healing, then use volume start ... force to start the 
bricks for the degraded volumes individually to manage that.
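
Roughly, with a hypothetical volume name:

gluster volume status noncritical-vol        # note the PID of the local brick process
kill <pid-of-local-brick>                    # take that brick out of the healing load
# ...once the essential volumes have finished healing:
gluster volume start noncritical-vol force   # respawns only the brick processes that are down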



On 11/12/2013 08:21 AM, Eric Johnson wrote:
I would suggest using different partitions for each brick.  We use LVM 
and start off with a relativity small amount allocated space, then 
grow the partitions as needed.  If you were to place 2 bricks on the 
same partition then the free space is not going to show correctly. 
Example:


1TB partition 2 bricks on this partition

brick: vol-1-a   using 200GB
brick: vol-2-a   using 300GB.

Both volumes would show that they have ~500GB free, but in reality 
there would be ~500GB that either could use.  I don't know if there 
would be any other issues with putting 2 or more bricks on the same 
partition, but it doesn't seem like a good idea.  I had gluster setup 
that way when I was first testing it, and it seemed to work other than 
the free space issue, but I quickly realized it would be better to 
separate out the bricks on to their own partition.  Using LVM allows 
you to easily grow partitions as needed.


my 2 cents.


On 11/12/13, 9:31 AM, David Gibbons wrote:

Hi All,

I am interested in some feedback on putting multiple bricks on one 
physical disk. Each brick being assigned to a different volume. Here 
is the scenario:


4 disks per server, 4 servers, 2x2 distribute/replicate

I would prefer to have just one volume but need to do geo-replication 
on some of the data (but not all of it). My thought was to use two 
volumes, which would allow me to selectively geo-replicate just the 
data that I need to, by replicating only one volume.


A couple of questions come to mind:
1) Any implications of doing two bricks for different volumes on one 
physical disk?
2) Will the free space across each volume still calculate 
correctly? IE, if one volume takes up 2/3 of the total physical disk 
space, will the second volume still reflect the correct amount of 
used space?

3) Am I being stupid/missing something obvious?

Cheers,
Dave


___
Gluster-users mailing list
Gluster-users@gluster.org
http://supercolony.gluster.org/mailman/listinfo/gluster-users



--
Eric Johnson
713-968-2546
VP of MIS
Internet America
www.internetamerica.com


___
Gluster-users mailing list
Gluster-users@gluster.org
http://supercolony.gluster.org/mailman/listinfo/gluster-users


___
Gluster-users mailing list
Gluster-users@gluster.org
http://supercolony.gluster.org/mailman/listinfo/gluster-users

Re: [Gluster-users] Samba vfs_glusterfs Quota Support?

2013-11-12 Thread David Gibbons
Ira,

Thank you for the response. I suspect that your patch will resolve this
issue as well; however, an upgrade to Samba 3.6.20 continues to display the
total-volume-size behavior instead of the expected GlusterFS folder-quota
behavior. I note that your patch was accepted into 3.6.next, but I don't
see whether or not it actually made it into the 3.6.20 release. I'm
probably looking in the wrong place. Any pointers?

Cheers,
Dave


On Wed, Oct 30, 2013 at 11:53 AM, Ira Cooper i...@redhat.com wrote:

 I suspect you are missing the patch needed to make this work.


 http://git.samba.org/?p=samba.git;a=commit;h=872a7d61ca769c47890244a1005c1bd445a3bab6;
  It went in during the 3.6.13 timeframe, if I'm reading the git history
 correctly.

 The bug manifests when the base of the share has a different amount of
 Quota Allowance than elsewhere in the tree.

 \\foo\ - 5GB quota
 \\foo\bar - 2.5GB quota

 When you run dir in \\foo you get the results from the 5GB quota, and
 the same in \\foo\bar, which is incorrect and highly confusing to users.
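
 For illustration, that share layout maps to GlusterFS directory quotas along these lines (volume name and path are examples):

 gluster volume quota gfsv0 enable
 gluster volume quota gfsv0 limit-usage /    5GB
 gluster volume quota gfsv0 limit-usage /bar 2.5GB
 gluster volume quota gfsv0 list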

 https://bugzilla.samba.org/show_bug.cgi?id=9646

 Despite my discussion of the multi-volume case, it should be the same bug.

 Thanks,

 -Ira / i...@samba.org

 - Original Message -
 From: David Gibbons david.c.gibb...@gmail.com
 To: gluster-users@gluster.org
 Sent: Wednesday, October 30, 2013 11:04:49 AM
 Subject: Re: [Gluster-users] Samba vfs_glusterfs Quota Support?

 Thanks all for the pointers.



 What version of Samba are you running?

 Samba version is 3.6.9:
 [root@gfs-a-1 /]# smbd -V
 Version 3.6.9

 Gluster version is 3.4.1 git:
 [root@gfs-a-1 /]# glusterfs --version
 glusterfs 3.4.1 built on Oct 21 2013 09:22:36


 It should be
 # gluster volume set gfsv0 features.quota-deem-statfs on
 [root@gfs-a-1 /]# gluster volume set gfsv0 features.quota-deem-statfs on
 volume set: failed: option : features.quota-deem-statfs does not exist
 Did you mean features.quota-timeout?

 I wonder if quota-deem-statfs is part of a more recent version?
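
 One way to check which quota options the installed build actually understands (assuming `gluster volume set help` is available in this release):

 gluster volume set help | grep -i quota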

 Cheers,
 Dave


 ___
 Gluster-users mailing list
 Gluster-users@gluster.org
 http://supercolony.gluster.org/mailman/listinfo/gluster-users

___
Gluster-users mailing list
Gluster-users@gluster.org
http://supercolony.gluster.org/mailman/listinfo/gluster-users

Re: [Gluster-users] Failed rebalance - lost files, inaccessible files, permission issues

2013-11-12 Thread Shawn Heisey

On 11/9/2013 2:39 AM, Shawn Heisey wrote:

They are from the same log file - the one that I put on my dropbox
account and linked in the original message.  They are consecutive log
entries.


Further info from our developer that is looking deeper into these problems:




Ouch.  I know why the rebalance stopped.  The host simply ran out of 
memory.  From the messages file:


Nov  2 21:55:30 slc01dfs001a kernel: VFS: file-max limit 2438308 reached
Nov  2 21:55:31 slc01dfs001a kernel: automount invoked oom-killer: 
gfp_mask=0xd0, order=1, oom_adj=0, oom_score_adj=0

Nov  2 21:55:31 slc01dfs001a kernel: automount cpuset=/ mems_allowed=0
Nov  2 21:55:31 slc01dfs001a kernel: Pid: 2810, comm: automount Not 
tainted 2.6.32-358.2.1.el6.centos.plus.x86_64 #1


That file max limit line actually goes back to the beginning of Nov. 
2, and happened on all four hosts.  It is because of a file descriptor 
leak and was fixed in 3.3.2: 
https://bugzilla.redhat.com/show_bug.cgi?id=928631
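
A rough way to watch the leak on a running host (the process selection is illustrative):

cat /proc/sys/fs/file-nr                                # allocated vs. maximum file handles, system-wide
ls /proc/$(pgrep -f glusterfs | head -n1)/fd | wc -l    # descriptors held by one gluster process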


This is unconnected to the file corruption/loss which started much 
earlier.  I'm still trying to understand this part.  I noticed that 
three of the hosts reported successful rebalancing on the same day we 
started losing files.  I am not sure how rebalancing was distributed 
among the hosts, and if the load on the other hosts was enough to keep 
things stable until they stopped.





I gather that we should be at least on 3.3.2, but I suspect that a 
number of other bugs might be a problem unless we go to 3.4.1.  The 
rebalance status output is below.  All hosts except localhost in this 
status output were reporting completed a very short time after I started 
the rebalance.  The localhost line continued to increment until the 
rebalance died four days after starting.


[root@slc01dfs001a ~]# gluster volume rebalance mdfs status
Node             Rebalanced-files   size     scanned    failures   status
--------------   ----------------   ------   --------   --------   ---------
localhost                 1121514   1.5TB     9020514    1777661   failed
slc01nas1                       0   0Bytes   13638699          0   completed
slc01dfs002a                    0   0Bytes   13638699          1   completed
slc01dfs001b                    0   0Bytes   13638699          0   completed
slc01dfs002b                    0   0Bytes   13638700          0   completed
slc01nas2                       0   0Bytes   13638699          0   completed


Thanks,
Shawn

___
Gluster-users mailing list
Gluster-users@gluster.org
http://supercolony.gluster.org/mailman/listinfo/gluster-users