Re: [ceph-users] Can a cephfs "volume" get errors and how are they fixed?

2015-07-27 Thread Roland Giesler
On 15 July 2015 at 17:34, John Spray  wrote:

>
>
> On 15/07/15 16:11, Roland Giesler wrote:
>
>
>
>  I mount cephfs in /etc/fstab and all seemed well for quite a few
> months.  Now, however, I have started seeing strange things like directories
> with corrupted file names in the file system.
>
>
> When you encounter a serious issue, please tell us some details about it,
> like what version of ceph you are using, what client you are using (kernel,
> fuse, + version), whether you have any errors in your logs, etc.
>

I use Proxmox "stock standard", v3.3-5:

Linux s1 2.6.32-34-pve #1 SMP Fri Dec 19 07:42:04 CET 2014 x86_64 GNU/Linux
ceph version 0.80.8

I mount the filesystem with fuse.ceph in fstab, so I guess I'm not using the
kernel client?
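
(For reference, the two kinds of fstab entry look roughly like this. This is
only a sketch: the monitor address is the one from my monmap, and the client
name and secret file are assumptions:)

# fuse client; with type fuse.ceph the "device" field carries the client id
id=admin,conf=/etc/ceph/ceph.conf  /mnt/cephfs  fuse.ceph  defaults  0 0

# kernel client, for comparison
192.168.121.30:6789:/  /mnt/cephfs  ceph  name=admin,secretfile=/etc/ceph/admin.secret,noatime  0 2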

I'm in the process of migrating the installations from VZ containers to full
KVM machines so that I'm not stuck on the 2.6 kernel anymore.  Hopefully
that will sort out the troubles I'm having.



>
> I have a vague memory of a symptom like what you're describing happening
> with an older kernel client at some stage, but can't find a ticket about it
> right now.
>
>
>  My question is: How can the filesystem be checked for errors and fixed?
> Or does it heal itself automatically?  The disks are all formatted with
> btrfs.
>
>
> The underlying data storage benefits from the resilience built into RADOS,
> i.e. you don't have to worry about drive failures etc.
>
> CephFS's fsck is in development right now.  We note this in a big red box
> at the top of the documentation[1].
>
> By the way, if data integrity is important to you, you would be better off
> with a more conservative configuration (btrfs is not used by most people in
> production, XFS is the default).
>

I actually checked my system.  The three Debian machines are running XFS,
but the Ubuntu machine (which has a much newer kernel) has btrfs-formatted
drives.

I'll report back once I have moved to a newer kernel.

thanks

Roland




>
> Regards,
> John
>
> 1. http://ceph.com/docs/master/cephfs/
>
>


[ceph-users] Can a cephfs "volume" get errors and how are they fixed?

2015-07-15 Thread Roland Giesler
Hi all,

I have a ceph cluster that has the following:

# ceph osd tree
# id  weight  type name          up/down  reweight
-1    11.13   root default
-2     8.14       host h1
 1     0.9            osd.1      up       1
 3     0.9            osd.3      up       1
 4     0.9            osd.4      up       1
 5     0.68           osd.5      up       1
 6     0.68           osd.6      up       1
 7     0.68           osd.7      up       1
 8     0.68           osd.8      up       1
 9     0.68           osd.9      up       1
10     0.68           osd.10     up       1
11     0.68           osd.11     up       1
12     0.68           osd.12     up       1
-3     0.45       host s3
 2     0.45           osd.2      up       1
-4     0.9        host s2
13     0.9            osd.13     up       1
-5     1.64       host s1
14     0.29           osd.14     up       1
 0     0.27           osd.0      up       1
15     0.27           osd.15     up       1
16     0.27           osd.16     up       1
17     0.27           osd.17     up       1
18     0.27           osd.18     up       1

s2 and s3 will get more drives in future, but this is the setup for now.

I mount cephfs in /etc/fstab and all seemed well for quite a few months.
Now, however, I have started seeing strange things like directories with
corrupted file names in the file system.

My question is: How can the filesystem be checked for errors and fixed?  Or
does it heal itself automatically?  The disks are all formatted with btrfs.

thanks

*Roland*


Re: [ceph-users] cephfs unmounts itself from time to time

2015-06-19 Thread Roland Giesler
On 19 June 2015 at 13:46, Gregory Farnum  wrote:

> On Thu, Jun 18, 2015 at 10:15 PM, Roland Giesler 
> wrote:
> > On 15 June 2015 at 13:09, Gregory Farnum  wrote:
> >>
> >> On Mon, Jun 15, 2015 at 4:03 AM, Roland Giesler 
> >> wrote:
> >> > I have a small cluster of 4 machines and quite a few drives.  After
> >> > about 2-3 weeks cephfs fails.  It's not properly mounted anymore in
> >> > /mnt/cephfs, which of course causes the VM's running to fail too.
> >> >
>


> >
> >
> > I'm under the impression that CephFS is the filesystem implemented by
> > ceph-fuse. Is it not?
>
> Of course it is, but it's a different implementation than the kernel
> client and often has different bugs. ;) Plus you can get a newer
> version of it easily.
>

Let me look into it and see how it might help me.
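
(For the archive: running the fuse client by hand looks roughly like this.
A sketch only; the monitor address is from my monmap, and the client name and
keyring path are assumptions:)

ceph-fuse -m 192.168.121.30:6789 --id admin -k /etc/ceph/ceph.client.admin.keyring /mnt/cephfs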


>  >> Other than that, can you include more
> >> information about exactly what you mean when saying CephFS unmounts
> >> itself?
> >
> >
> > Everything runs fine for weeks.  Then suddenly a user reports that a VM
> is
> > not functioning anymore.  On investigation it transpires that CephFS is
> not
> > mounted anymore and the error I reported is logged.
> >
> > I can't see anything else wrong at this stage.  ceph is running, the osd
> are
> > all up.
>
> Maybe one of our kernel devs has a better idea but I've no clue how to
> debug this if you can't give me any information about how CephFS came
> to be unmounted. It just doesn't make any sense to me. :(
>

I'll go through the logs again and find the point where it happens and
post it.

- Roland


Re: [ceph-users] cephfs unmounts itself from time to time

2015-06-18 Thread Roland Giesler
On 15 June 2015 at 13:09, Gregory Farnum  wrote:

> On Mon, Jun 15, 2015 at 4:03 AM, Roland Giesler 
> wrote:
> > I have a small cluster of 4 machines and quite a few drives.  After
> about 2
> > - 3 weeks cephfs fails.  It's not properly mounted anymore in
> /mnt/cephfs,
> > which of course causes the VM's running to fail too.
> >
> > In /var/log/syslog I have "/mnt/cephfs: File exists at
> > /usr/share/perl5/PVE/Storage/DirPlugin.pm line 52" repeatedly.
> >
> > There doesn't seem to be anything wrong with ceph at the time.
> >
> > # ceph -s
> > cluster 40f26838-4760-4b10-a65c-b9c1cd671f2f
> >  health HEALTH_WARN clock skew detected on mon.s1
> >  monmap e2: 2 mons at
> > {h1=192.168.121.30:6789/0,s1=192.168.121.33:6789/0}, election epoch 312,
> > quorum 0,1 h1,s1
> >  mdsmap e401: 1/1/1 up {0=s3=up:active}, 1 up:standby
> >  osdmap e5577: 19 osds: 19 up, 19 in
> >   pgmap v11191838: 384 pgs, 3 pools, 774 GB data, 455 kobjects
> > 1636 GB used, 9713 GB / 11358 GB avail
> >  384 active+clean
> >   client io 12240 kB/s rd, 1524 B/s wr, 24 op/s
> > # ceph osd tree
> > # id  weight  type name          up/down  reweight
> > -1    11.13   root default
> > -2     8.14       host h1
> >  1     0.9            osd.1      up       1
> >  3     0.9            osd.3      up       1
> >  4     0.9            osd.4      up       1
> >  5     0.68           osd.5      up       1
> >  6     0.68           osd.6      up       1
> >  7     0.68           osd.7      up       1
> >  8     0.68           osd.8      up       1
> >  9     0.68           osd.9      up       1
> > 10     0.68           osd.10     up       1
> > 11     0.68           osd.11     up       1
> > 12     0.68           osd.12     up       1
> > -3     0.45       host s3
> >  2     0.45           osd.2      up       1
> > -4     0.9        host s2
> > 13     0.9            osd.13     up       1
> > -5     1.64       host s1
> > 14     0.29           osd.14     up       1
> >  0     0.27           osd.0      up       1
> > 15     0.27           osd.15     up       1
> > 16     0.27           osd.16     up       1
> > 17     0.27           osd.17     up       1
> > 18     0.27           osd.18     up       1
> >
> > When I "umount -l /mnt/cephfs" and then "mount -a" after that, the the
> ceph
> > volume is loaded again.  I can restart the VM's and all seems well.
> >
> > I can't find errors pertaining to cephfs in the other logs either.
> >
> > System information:
> >
> > Linux s1 2.6.32-34-pve #1 SMP Fri Dec 19 07:42:04 CET 2014 x86_64
> GNU/Linux
>
> I'm not sure what version of Linux this really is (I assume it's a
> vendor kernel of some kind!), but it's definitely an old one! CephFS
> sees pretty continuous improvements to stability and it could be any
> number of resolved bugs.
>

This is the stock standard installation of Proxmox with CephFS.



> If you can't upgrade the kernel, you might try out the ceph-fuse
> client instead as you can run a much newer and more up-to-date version
> of it, even on the old kernel.


I'm under the impression that CephFS is the filesystem implemented by
ceph-fuse. Is it not?



> Other than that, can you include more
> information about exactly what you mean when saying CephFS unmounts
> itself?
>

Everything runs fine for weeks.  Then suddenly a user reports that a VM is
not functioning anymore.  On investigation it transpires that CephFS is not
mounted anymore and the error I reported is logged.

I can't see anything else wrong at this stage.  Ceph is running and the OSDs
are all up.

thanks again

Roland



> -Greg


[ceph-users] cephfs unmounts itself from time to time

2015-06-15 Thread Roland Giesler
I have a small cluster of 4 machines and quite a few drives.  After about 2
- 3 weeks cephfs fails.  It's not properly mounted anymore in /mnt/cephfs,
which of course causes the VM's running to fail too.

In /var/log/syslog I have "/mnt/cephfs: File exists at
/usr/share/perl5/PVE/Storage/DirPlugin.pm line 52" repeatedly.

There doesn't seem to be anything wrong with ceph at the time.

# ceph -s
cluster 40f26838-4760-4b10-a65c-b9c1cd671f2f
 health HEALTH_WARN clock skew detected on mon.s1
 monmap e2: 2 mons at {h1=192.168.121.30:6789/0,s1=192.168.121.33:6789/0},
election epoch 312, quorum 0,1 h1,s1
 mdsmap e401: 1/1/1 up {0=s3=up:active}, 1 up:standby
 osdmap e5577: 19 osds: 19 up, 19 in
  pgmap v11191838: 384 pgs, 3 pools, 774 GB data, 455 kobjects
1636 GB used, 9713 GB / 11358 GB avail
 384 active+clean
  client io 12240 kB/s rd, 1524 B/s wr, 24 op/s

# ceph osd tree
# id  weight  type name          up/down  reweight
-1    11.13   root default
-2     8.14       host h1
 1     0.9            osd.1      up       1
 3     0.9            osd.3      up       1
 4     0.9            osd.4      up       1
 5     0.68           osd.5      up       1
 6     0.68           osd.6      up       1
 7     0.68           osd.7      up       1
 8     0.68           osd.8      up       1
 9     0.68           osd.9      up       1
10     0.68           osd.10     up       1
11     0.68           osd.11     up       1
12     0.68           osd.12     up       1
-3     0.45       host s3
 2     0.45           osd.2      up       1
-4     0.9        host s2
13     0.9            osd.13     up       1
-5     1.64       host s1
14     0.29           osd.14     up       1
 0     0.27           osd.0      up       1
15     0.27           osd.15     up       1
16     0.27           osd.16     up       1
17     0.27           osd.17     up       1
18     0.27           osd.18     up       1

​When I "umount -l /mnt/cephfs" and then "mount -a" after that, the the
ceph volume is loaded again.  I can restart the VM's and all seems well.
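
(In other words, the workaround boils down to:)

umount -l /mnt/cephfs    # lazy-unmount the stale mount point
mount -a                 # re-read /etc/fstab, which remounts the cephfs entry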

I can't find errors pertaining to cephfs in the other logs either.

System information:

Linux s1 2.6.32-34-pve #1 SMP Fri Dec 19 07:42:04 CET 2014 x86_64 GNU/Linux

I can't upgrade to kernel v3.13 since I'm using containers.

Of course, I want to prevent this from happening!  How do I troubleshoot
that?  What is causing this?
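
(The sort of places worth looking, for the archive; a sketch, not a recipe:)

grep -i ceph /var/log/syslog    # the DirPlugin.pm message above, plus any mount errors
dmesg | grep -i ceph            # client-side kernel/fuse errors around the time it drops
ceph -s                         # cluster state at the moment of failure
ceph mds stat                   # is the MDS still active?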

regards


*Roland Giesler*


Re: [ceph-users] How to tell a VM to write more local ceph nodes than to the network.

2015-01-16 Thread Roland Giesler
On 14 January 2015 at 12:08, JM  wrote:

> Hi Roland,
>
> You should tune your Ceph Crushmap with a custom rule in order to do that
> (write first on s3 and then to others). This custom rule will be applied
> then to your proxmox pool.
> (what you want to do is only interesting if you run VM from host s3)
>
> Can you give us your crushmap ?
>


# begin crush map
tunable choose_local_tries 0
tunable choose_local_fallback_tries 0
tunable choose_total_tries 50
tunable chooseleaf_descend_once 1

# devices
device 0 osd.0
device 1 osd.1
device 2 osd.2
device 3 osd.3
device 4 osd.4
device 5 osd.5
device 6 osd.6
device 7 osd.7
device 8 osd.8
device 9 osd.9
device 10 osd.10
device 11 osd.11
device 12 osd.12
device 13 osd.13
device 14 osd.14
device 15 osd.15
device 16 osd.16
device 17 osd.17
device 18 osd.18

# types
type 0 osd
type 1 host
type 2 chassis
type 3 rack
type 4 row
type 5 pdu
type 6 pod
type 7 room
type 8 datacenter
type 9 region
type 10 root

# buckets
host h1 {
        id -2           # do not change unnecessarily
        # weight 8.140
        alg straw
        hash 0          # rjenkins1
        item osd.1 weight 0.900
        item osd.3 weight 0.900
        item osd.4 weight 0.900
        item osd.5 weight 0.680
        item osd.6 weight 0.680
        item osd.7 weight 0.680
        item osd.8 weight 0.680
        item osd.9 weight 0.680
        item osd.10 weight 0.680
        item osd.11 weight 0.680
        item osd.12 weight 0.680
}
host s3 {
        id -3           # do not change unnecessarily
        # weight 0.450
        alg straw
        hash 0          # rjenkins1
        item osd.2 weight 0.450
}
host s2 {
        id -4           # do not change unnecessarily
        # weight 0.900
        alg straw
        hash 0          # rjenkins1
        item osd.13 weight 0.900
}
host s1 {
        id -5           # do not change unnecessarily
        # weight 1.640
        alg straw
        hash 0          # rjenkins1
        item osd.14 weight 0.290
        item osd.0 weight 0.270
        item osd.15 weight 0.270
        item osd.16 weight 0.270
        item osd.17 weight 0.270
        item osd.18 weight 0.270
}
root default {
        id -1           # do not change unnecessarily
        # weight 11.130
        alg straw
        hash 0          # rjenkins1
        item h1 weight 8.140
        item s3 weight 0.450
        item s2 weight 0.900
        item s1 weight 1.640
}

# rules
rule replicated_ruleset {
        ruleset 0
        type replicated
        min_size 1
        max_size 10
        step take default
        step chooseleaf firstn 0 type host
        step emit
}

# end crush map

thanks so far!

regards

Roland



>
>
>
> 2015-01-13 22:03 GMT+01:00 Roland Giesler :
>
>> I have a 4 node ceph cluster, but the disks are not equally distributed
>> across all machines (they are substantially different from each other)
>>
>> One machine has 12 x 1TB SAS drives (h1), another has 8 x 300GB SAS (s3)
>> and two machines have only two 1 TB drives each (s2 & s1).
>>
>> Now machine s3 has by far the most CPU's and RAM, so I'm running my VM's
>> mostly from there, but I want to make sure that the writes that happen to
>> the ceph cluster get written to the "local" osd's on s3 first and then the
>> additional writes/copies get done to the network.
>>
>> Is this possible with ceph.  The VM's are KVM in Proxmox in case it's
>> relevant.
>>
>> regards
>>
>>
>> *Roland *
>>
>>
>


Re: [ceph-users] How to tell a VM to write more local ceph nodes than to the network.

2015-01-16 Thread Roland Giesler
On 16 January 2015 at 17:15, Gregory Farnum  wrote:

> > I have set up 4 machines in a cluster.  When I created the Windows 2008
> > server VM on S1 (I corrected my first email: I have three Sunfire X
> series
> > servers, S1, S2, S3) since S1 has 36GB of RAM and 8 x 300GB SAS drives, it
> > was running normally, pretty close to what I had on the bare metal.
> About a
> > month later (after being on leave for 2 weeks), I found a machine that is
> > crawling at a snail's pace and I cannot figure out why.
>
> You mean one of the VMs has very slow disk access? Or one of the hosts
> is very slow?
>

The Windows 2008 VM is very slow.  Inside Windows all seems normal: the
CPUs are never more than 20% used, yet even navigating the menus takes a
long time.  The host (S1) is not slow.


> In any case, you'd need to look at what about that system is different
> from the others and poke at that difference until it exposes an issue,
> I suppose.
>

I'll move the machine to one of the smaller hosts (S2 or S3).  I'll just
have to lower the spec of the VM, since I've set RAM at 10GB, which is much
more than S2 or S3 have.  Let's see what happens.



> -Greg
>


Re: [ceph-users] How to tell a VM to write more local ceph nodes than to the network.

2015-01-16 Thread Roland Giesler
On 14 January 2015 at 21:46, Gregory Farnum  wrote:

> On Tue, Jan 13, 2015 at 1:03 PM, Roland Giesler 
> wrote:
> > I have a 4 node ceph cluster, but the disks are not equally distributed
> > across all machines (they are substantially different from each other)
> >
> > One machine has 12 x 1TB SAS drives (h1), another has 8 x 300GB SAS (s3)
> and
> > two machines have only two 1 TB drives each (s2 & s1).
> >
> > Now machine s3 has by far the most CPU's and RAM, so I'm running my VM's
> > mostly from there, but I want to make sure that the writes that happen to
> > the ceph cluster get written to the "local" osd's on s3 first and then
> the
> > additional writes/copies get done to the network.
> >
> > Is this possible with ceph.  The VM's are KVM in Proxmox in case it's
> > relevant.
>
> In general you can't set up Ceph to write to the local node first. In
> some specific cases you can if you're willing to do a lot more work
> around placement, and this *might* be one of those cases.
>
> To do this, you'd need to change the CRUSH rules pretty extensively,
> so that instead of selecting OSDs at random, they have two steps:
> 1) starting from bucket s3, select a random OSD and put it at the
> front of the OSD list for the PG.
> 2) Starting from a bucket which contains all the other OSDs, select
> N-1 more at random (where N is the number of desired replicas).
>
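
(For the record, a rule along those lines might look roughly like the sketch
below. It's untested, the rule name is arbitrary, and the bucket names come
from the crushmap posted elsewhere in this thread; as you note, the second
"take" should really point at a bucket holding only the *other* hosts,
otherwise the same host can be chosen twice:)

rule vmhost_first {
        ruleset 1
        type replicated
        min_size 1
        max_size 10
        step take s3                          # the host the VMs run on
        step chooseleaf firstn 1 type osd     # one copy on a local OSD
        step emit
        step take default                     # ideally a bucket that excludes s3
        step chooseleaf firstn -1 type host   # remaining copies on other hosts
        step emit
}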

I understand in principle what you're saying.  Let me go back a step and
ask the question somewhat differently then:

I have set up 4 machines in a cluster.  When I created the Windows 2008
server VM on S1 (I corrected my first email: I have three Sunfire X series
servers, S1, S2, S3) since S1 has 36GB of RAM and 8 x 300GB SAS drives, it
was running normally, pretty close to what I had on the bare metal.  About
a month later (after being on leave for 2 weeks), I found a machine that is
crawling at a snail's pace and I cannot figure out why.

So instead of suggesting something from my side (without in-depth knowledge
yet), what should I do to get this machine to run at speed again?

Further to my hardware and network:

S1: 2 x Quad Core Xeon, 36GB RAM, 8 x 300GB HDD's
S2: 1 x Opteron Dual Core, 8GB RAM, 2 x 750GB HDD's
S3: 1 x Opteron Dual Core, 8GB RAM, 2 x 750GB HDD's
H1: 1 x Xeon Dual Core, 5GB RAM, 12 x 1TB HDD's
(All these machines are at full drive capacity, that is all their slots are
being utilised)

All the servers are linked with dual Gigabit Ethernet connections to a
switch with LACP enabled, and the links are bonded on each server.  While
this doesn't raise the speed of any single connection, it does allow more
aggregate bandwidth between the servers.

The H1 machine is only running ceph and thus acts only as storage.  The
other machines (S1, S2 & S3) are for web servers (development and
production), the Windows 2008 server and a few other functions all managed
from proxmox.

The hardware is what my client has been using, but there were lots of
inefficiencies and little redundancy in the setup before we embarked on
this project.  However, the hardware is sufficient for their needs.

I hope that gives you a reasonable picture of the setup, so that you're able
to give me some advice on how to troubleshoot this.

regards

Roland



>
> You can look at the documentation on CRUSH or search the list archives
> for more on this subject.
>
> Note that doing this has a bunch of down sides: you'll have balance
> issues because every piece of data will be on the s3 node (that's a
> TERRIBLE name for a project which has API support for Amazon S3, btw
> :p), if you add new VMs on a different node they'll all be going to
> the s3 node for all their writes (unless you set them up on a
> different pool with different CRUSH rules), s3 will be satisfying all
> the read requests so the other nodes are just backups in case of disk
> failure, etc.
> -Greg
>


Re: [ceph-users] How to tell a VM to write more local ceph nodes than to the network.

2015-01-16 Thread Roland Giesler
So you can see my server names and their osd's too...

# id  weight  type name          up/down  reweight
-1    11.13   root default
-2     8.14       host h1
 1     0.9            osd.1      up       1
 3     0.9            osd.3      up       1
 4     0.9            osd.4      up       1
 5     0.68           osd.5      up       1
 6     0.68           osd.6      up       1
 7     0.68           osd.7      up       1
 8     0.68           osd.8      up       1
 9     0.68           osd.9      up       1
10     0.68           osd.10     up       1
11     0.68           osd.11     up       1
12     0.68           osd.12     up       1
-3     0.45       host s3
 2     0.45           osd.2      up       1
-4     0.9        host s2
13     0.9            osd.13     up       1
-5     1.64       host s1
14     0.29           osd.14     up       1
 0     0.27           osd.0      down     0
15     0.27           osd.15     up       1
16     0.27           osd.16     up       1
17     0.27           osd.17     up       1
18     0.27           osd.18     up       1

regards

Roland


Re: [ceph-users] How to tell a VM to write more local ceph nodes than to the network.

2015-01-16 Thread Roland Giesler
On 14 January 2015 at 12:08, JM  wrote:

> Hi Roland,
>
> You should tune your Ceph Crushmap with a custom rule in order to do that
> (write first on s3 and then to others). This custom rule will be applied
> then to your proxmox pool.
> (what you want to do is only interesting if you run VM from host s3)
>
> Can you give us your crushmap ?
>

Please note that I made a mistake in my email.  The machine that I want
writes to go to first is S1, not S3.

For the life of me I cannot find how to extract the crush map.  I found:

ceph osd getcrushmap -o crushfilename

Where can I find the crush file?  I've never needed this.
This is my first installation, so please bear with me while I learn!
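
(For the archive, the round trip is roughly this; the output paths are
arbitrary, since -o simply writes the compiled map wherever you point it:)

ceph osd getcrushmap -o /tmp/crushmap.bin              # dump the compiled map
crushtool -d /tmp/crushmap.bin -o /tmp/crushmap.txt    # decompile to editable text
# ... edit /tmp/crushmap.txt ...
crushtool -c /tmp/crushmap.txt -o /tmp/crushmap.new    # recompile
ceph osd setcrushmap -i /tmp/crushmap.new              # inject it back into the cluster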

Lionel: I read what you're saying.  However, the strange thing is that last
year I had this Windows 2008 VM running on the same cluster without changes,
and coming back from leave in the new year, I found it has crawled to a
painfully slow state.  I don't quite know where to start to trace this.  The
Windows machine itself is not the problem, since the boot process of the VM
is very slow even before Windows starts up.

thanks

Roland




>
>
>
> 2015-01-13 22:03 GMT+01:00 Roland Giesler :
>
>> I have a 4 node ceph cluster, but the disks are not equally distributed
>> across all machines (they are substantially different from each other)
>>
>> One machine has 12 x 1TB SAS drives (h1), another has 8 x 300GB SAS (s3)
>> and two machines have only two 1 TB drives each (s2 & s1).
>>
>> Now machine s3 has by far the most CPU's and RAM, so I'm running my VM's
>> mostly from there, but I want to make sure that the writes that happen to
>> the ceph cluster get written to the "local" osd's on s3 first and then the
>> additional writes/copies get done to the network.
>>
>> Is this possible with ceph.  The VM's are KVM in Proxmox in case it's
>> relevant.
>>
>> regards
>>
>>
>> *Roland *
>>
>>
>


[ceph-users] How to tell a VM to write more local ceph nodes than to the network.

2015-01-14 Thread Roland Giesler
I have a 4 node ceph cluster, but the disks are not equally distributed
across all machines (they are substantially different from each other)

One machine has 12 x 1TB SAS drives (h1), another has 8 x 300GB SAS (s3)
and two machines have only two 1 TB drives each (s2 & s1).

Now machine s3 has by far the most CPU's and RAM, so I'm running my VM's
mostly from there, but I want to make sure that the writes that happen to
the ceph cluster get written to the "local" osd's on s3 first and then the
additional writes/copies get done to the network.

Is this possible with ceph?  The VM's are KVM in Proxmox, in case it's
relevant.

regards


*Roland *


Re: [ceph-users] Bonding woes

2014-11-19 Thread Roland Giesler
Maybe I should rephrase my question by asking what the relationship is
between bonding and ethtool?

*Roland*


On 18 November 2014 22:14, Roland Giesler  wrote:

> Hi people, I have two identical servers (both Sun X2100 M2's) that form
> part of a cluster of 3 machines (other machines will be added later).   I
> want to bond two GB ethernet ports on these, which works perfectly on the
> one, but not on the other.
>
> How can this be?
>
> The one machine (named S2) detects no links up (with ethtool), yet the
> links are up.  When I assign an IP to eth2, for instance, it works 100%
> despite ethtool claiming there is no link.
>
> I understand that bonding uses ethtool to determine whether a link is up
> and then activates the bond.  So how can I "fix" this?
>
> both machines have the following:
>
> /etc/network/interfaces
>
> # network interface settings
> auto lo
> iface lo inet loopback
>
> auto eth2
> iface eth2 inet static
>
> auto eth3
> iface eth3 inet static
>
> iface eth0 inet manual
>
> iface eth1 inet manual
>
> auto bond0
> iface bond0 inet manual
> slaves eth2, eth3
> bond_miimon 100
> bond_mode 802.3ad
> bond_xmit_hash_policy layer2
>
> auto vmbr0
> iface vmbr0 inet static
> address  192.168.121.32
> netmask  255.255.255.0
> gateway  192.168.121.1
> bridge_ports bond0
> bridge_stp off
> bridge_fd 0
>
> And furthermore: /etc/udev/rules.d/70-persistent-net.rules
>
> # This file was automatically generated by the /lib/udev/write_net_rules
> # program, run by the persistent-net-generator.rules rules file.
> #
> # You can modify it, as long as you keep each rule on a single
> # line, and change only the value of the NAME= key.
>
> # PCI device
> 0x14e4:/sys/devices/pci:00/:00:0d.0/:05:00.0/:06:04.0 (tg3)
> SUBSYSTEM=="net", ACTION=="add", DRIVERS=="?*",
> ATTR{address}=="00:16:36:76:0f:3d", ATTR{dev_id}=="0x0", ATTR{type}=="1",
> KERNEL=="eth*", NAME="eth0"
>
> # PCI device
> 0x14e4:/sys/devices/pci:00/:00:0d.0/:05:00.0/:06:04.1 (tg3)
> SUBSYSTEM=="net", ACTION=="add", DRIVERS=="?*",
> ATTR{address}=="00:16:36:76:0f:3e", ATTR{dev_id}=="0x0", ATTR{type}=="1",
> KERNEL=="eth*", NAME="eth1"
>
> # PCI device 0x10de:/sys/devices/pci:00/:00:09.0 (forcedeth)
> SUBSYSTEM=="net", ACTION=="add", DRIVERS=="?*",
> ATTR{address}=="00:16:36:76:0f:40", ATTR{dev_id}=="0x0", ATTR{type}=="1",
> KERNEL=="eth*", NAME="eth2"
>
> # PCI device 0x10de:/sys/devices/pci:00/:00:08.0 (forcedeth)
> SUBSYSTEM=="net", ACTION=="add", DRIVERS=="?*",
> ATTR{address}=="00:16:36:76:0f:3f", ATTR{dev_id}=="0x0", ATTR{type}=="1",
> KERNEL=="eth*", NAME="eth3"
>
> The MAC addresses correlate with the hardware.
> The above is from the machine that works.
>
> On the one that doesn't, the following:
>
> /etc/network/interfaces
>
> # network interface settings
> auto lo
> iface lo inet loopback
>
> auto eth2
> iface eth2 inet static
>
> auto eth3
> iface eth3 inet static
>
> iface eth0 inet manual
>
> iface eth1 inet manual
>
> auto bond0
> iface bond0 inet manual
> slaves eth2, eth3
> bond_miimon 100
> bond_mode 802.3ad
> bond_xmit_hash_policy layer2
>
> auto vmbr0
> iface vmbr0 inet static
> address  192.168.121.31
> netmask  255.255.255.0
> gateway  192.168.121.1
> bridge_ports bond0
> bridge_stp off
> bridge_fd 0
>
> The MAC addresses differ in the udev rules, but nothing else.
>
> ethtool says eth2 and eth3 don't have a link.
>
> On S2 (the working machine) it says eth2 is down and eth3 is up, but a
> bond is formed and the machine is connected.
>
> What is happening here and how can it be resolved?
>
> thanks
>
> Roland
>
>


[ceph-users] Bonding woes

2014-11-18 Thread Roland Giesler
Hi people, I have two identical servers (both Sun X2100 M2's) that form
part of a cluster of 3 machines (other machines will be added later).   I
want to bond two GB ethernet ports on these, which works perfectly on the
one, but not on the other.

How can this be?

The one machine (named S2) detects no links up (with ethtool), yet the
links are up.  When I assign an IP to eth2, for instance, it works 100%
despite ethtool claiming there is no link.

I understand that bonding uses ethtool to determine whether a link is up
and then activates the bond.  So how can I "fix" this?
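
(For context, the checks in question; a sketch, with interface and bond names
as in the configs below:)

cat /proc/net/bonding/bond0     # per-slave MII status and 802.3ad partner details
ethtool eth2; ethtool eth3      # "Link detected: yes/no" as seen by the driver
ethtool -i eth2                 # which driver is in use (forcedeth vs tg3 here)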

both machines have the following:

/etc/network/interfaces

# network interface settings
auto lo
iface lo inet loopback

auto eth2
iface eth2 inet static

auto eth3
iface eth3 inet static

iface eth0 inet manual

iface eth1 inet manual

auto bond0
iface bond0 inet manual
slaves eth2, eth3
bond_miimon 100
bond_mode 802.3ad
bond_xmit_hash_policy layer2

auto vmbr0
iface vmbr0 inet static
address  192.168.121.32
netmask  255.255.255.0
gateway  192.168.121.1
bridge_ports bond0
bridge_stp off
bridge_fd 0

And furthermore: /etc/udev/rules.d/70-persistent-net.rules

# This file was automatically generated by the /lib/udev/write_net_rules
# program, run by the persistent-net-generator.rules rules file.
#
# You can modify it, as long as you keep each rule on a single
# line, and change only the value of the NAME= key.

# PCI device
0x14e4:/sys/devices/pci:00/:00:0d.0/:05:00.0/:06:04.0 (tg3)
SUBSYSTEM=="net", ACTION=="add", DRIVERS=="?*",
ATTR{address}=="00:16:36:76:0f:3d", ATTR{dev_id}=="0x0", ATTR{type}=="1",
KERNEL=="eth*", NAME="eth0"

# PCI device
0x14e4:/sys/devices/pci:00/:00:0d.0/:05:00.0/:06:04.1 (tg3)
SUBSYSTEM=="net", ACTION=="add", DRIVERS=="?*",
ATTR{address}=="00:16:36:76:0f:3e", ATTR{dev_id}=="0x0", ATTR{type}=="1",
KERNEL=="eth*", NAME="eth1"

# PCI device 0x10de:/sys/devices/pci:00/:00:09.0 (forcedeth)
SUBSYSTEM=="net", ACTION=="add", DRIVERS=="?*",
ATTR{address}=="00:16:36:76:0f:40", ATTR{dev_id}=="0x0", ATTR{type}=="1",
KERNEL=="eth*", NAME="eth2"

# PCI device 0x10de:/sys/devices/pci:00/:00:08.0 (forcedeth)
SUBSYSTEM=="net", ACTION=="add", DRIVERS=="?*",
ATTR{address}=="00:16:36:76:0f:3f", ATTR{dev_id}=="0x0", ATTR{type}=="1",
KERNEL=="eth*", NAME="eth3"

The MAC addresses correlate with the hardware.
The above is from the machine that works.

On the one that doesn't, the following:

/etc/network/interfaces

# network interface settings
auto lo
iface lo inet loopback

auto eth2
iface eth2 inet static

auto eth3
iface eth3 inet static

iface eth0 inet manual

iface eth1 inet manual

auto bond0
iface bond0 inet manual
slaves eth2, eth3
bond_miimon 100
bond_mode 802.3ad
bond_xmit_hash_policy layer2

auto vmbr0
iface vmbr0 inet static
address  192.168.121.31
netmask  255.255.255.0
gateway  192.168.121.1
bridge_ports bond0
bridge_stp off
bridge_fd 0

The MAC addresses differ in the udev rules, but nothing else.

ethtool says eth2 and eth3 don't have a link.

On S2 (the working machine) it says eth2 is down and eth3 is up, but a bond
is formed and the machine is connected.

What is happening here and how can it be resolved?

thanks

Roland