Re: [ceph-users] Which OS for fresh install?

2014-07-23 Thread Dimitri Maziuk
On 07/23/2014 04:09 PM, Bachelder, Kurt wrote:

 2.) update your grub.conf to boot to the appropriate image (default=0, or 
 whatever kernel in the list you want to boot from).

Actually, edit /etc/sysconfig/kernel, set DEFAULTKERNEL=kernel-lt before
installing it.
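
A minimal sketch of that order of operations on CentOS 6 (assuming the
elrepo-kernel repo is already configured and the stock UPDATEDEFAULT=yes
is in place; paths are the stock ones):

  # make new-kernel-pkg treat kernel-lt as the default flavour
  sed -i 's/^DEFAULTKERNEL=.*/DEFAULTKERNEL=kernel-lt/' /etc/sysconfig/kernel

  # now install it; the new entry becomes the default boot entry
  yum --enablerepo=elrepo-kernel install kernel-lt

  # sanity check: default= should point at the kernel-lt stanza
  grep -E '^default|^title' /boot/grub/grub.conf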

-- 
Dimitri Maziuk
Programmer/sysadmin
BioMagResBank, UW-Madison -- http://www.bmrb.wisc.edu





Re: [ceph-users] Problem installing ceph from package manager / ceph repositories

2014-06-11 Thread Dimitri Maziuk
On 06/09/2014 03:08 PM, Karan Singh wrote:

 1. When installing Ceph using the package manager and the ceph repositories, the
 package manager (i.e. YUM) does not respect the ceph.repo file and takes the ceph
 packages directly from EPEL.

Option 1: install yum-plugin-priorities and add priority = X to ceph.repo.
X should be less than EPEL's priority; the default is, I believe, 99.

Option 2: add exclude = ceph_package(s) to epel.repo.
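
Roughly what the two options look like on disk (a sketch; the exact
package list to exclude depends on what you pull from the ceph repo):

  # Option 1: with yum-plugin-priorities installed, in each section of ceph.repo
  priority=1          # anything lower than EPEL's default of 99 wins

  # Option 2: in epel.repo's [epel] section
  exclude=ceph ceph-* librados2 librbd1 python-ceph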

-- 
Dimitri Maziuk
Programmer/sysadmin
BioMagResBank, UW-Madison -- http://www.bmrb.wisc.edu





Re: [ceph-users] Recommended way to use Ceph as storage for file server

2014-06-02 Thread Dimitri Maziuk
On 06/02/2014 11:24 AM, Mark Nelson wrote:

 A more or less obvious alternative for CephFS would be to simply create
 a huge RBD and have a separate file server (running NFS / Samba /
 whatever) use that block device as backend. Just put a regular FS on top
 of the RBD and use it that way.
 Clients wouldn't really have any of the real performance and resilience
 benefits that Ceph could offer though, because the (single machine?)
 file server is now the bottleneck.

Performance: assuming all your nodes are fast storage on a quad-10Gb
pipe. Resilience: your gateway can be an active-passive HA pair; that
shouldn't be any different from NFS+DRBD setups.

 It's kind of a tough call.  Your observations regarding the downsides of
 using NFS with RBD are apt.  You could try throwing another distributed
 storage system on top of RBD and use Ceph for the replication/etc, but
 that's not really ideal either.  CephFS is relatively stable with
 active/standby MDS configurations, but it may still have bugs and there
 are no guarantees or official support (yet!).

If you believe in the 10-year rule of thumb, cephfs will become
stable enough for production use sometime between 2017 and 2022, depending on
whether you start counting from Sage's thesis defense or from the first
official code release. ;)

-- 
Dimitri Maziuk
Programmer/sysadmin
BioMagResBank, UW-Madison -- http://www.bmrb.wisc.edu





Re: [ceph-users] How to backup mon-data?

2014-05-27 Thread Dimitri Maziuk
On 05/27/2014 10:30 AM, Craig Lewis wrote:

 A ZFS snapshot is atomic, but it doesn't tell the daemons to flush their
 logs to disk.  Reverting to a snapshot looks the same as if you turned
 off the machine by yanking the power cord at the instant the snapshot
 was taken.

That sounds more relevant than OOM due to slab fragmentation -- as I
understand it, that's basically only a concern if you don't have enough ram,
in which case you have a problem, zfs or no zfs.

-- 
Dimitri Maziuk
Programmer/sysadmin
BioMagResBank, UW-Madison -- http://www.bmrb.wisc.edu





Re: [ceph-users] How to backup mon-data?

2014-05-23 Thread Dimitri Maziuk
On 05/23/2014 03:06 PM, Craig Lewis wrote:

 1: ZFS or Btrfs snapshots could do this, but neither one is recommended
 for production.

Out of curiosity, what's the current beef with zfs? I know what problems
are cited for btrfs, but I haven't heard much about zfs lately.

-- 
Dimitri Maziuk
Programmer/sysadmin
BioMagResBank, UW-Madison -- http://www.bmrb.wisc.edu





Re: [ceph-users] PCI-E SSD Journal for SSD-OSD Disks

2014-05-15 Thread Dimitri Maziuk
On 05/15/2014 01:19 PM, Tyler Wilson wrote:

 Would running a different distribution affect this at all? Our target was
 CentOS 6 however if a more
 recent kernel would make a difference we could switch.

FWIW you can run centos 6 with 3.10 kernel from elrepo.
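
In case anyone wants to try it, roughly (the elrepo-release version number
below is a placeholder -- grab the current package name from elrepo.org):

  rpm --import https://www.elrepo.org/RPM-GPG-KEY-elrepo.org
  rpm -Uvh http://www.elrepo.org/elrepo-release-6-X.el6.elrepo.noarch.rpm
  yum --enablerepo=elrepo-kernel install kernel-lt    # 3.10 longterm branch
  # kernel-ml is also there if you want the latest mainline instead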

-- 
Dimitri Maziuk
Programmer/sysadmin
BioMagResBank, UW-Madison -- http://www.bmrb.wisc.edu





Re: [ceph-users] NFS over CEPH - best practice

2014-05-12 Thread Dimitri Maziuk

On 5/12/2014 4:52 AM, Andrei Mikhailovsky wrote:

Leen,

thanks for explaining things. It does make sense now.

Unfortunately, it does look like this technology would not fulfill my
requirements as I do need to have an ability to perform maintenance
without shutting down vms.


I've no idea how much state you need to share for iscsi failover; with 
nfs you put the cluster ip address, the lock directories and the daemons 
on a heartbeat'ed pair of machines. With automount you don't need 
multiple active servers, you can do (much simpler) active-passive.
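
To make that concrete, the moving parts are roughly these -- a sketch with
made-up names and addresses, heartbeat v1 haresources shown because it's the
simplest variant (mapping the rbd image on the active node needs its own
resource script, not shown):

  # /etc/ha.d/haresources, identical on both gateways:
  # node  floating-IP   filesystem                          lock + nfs daemons
  nfs-gw1 192.168.10.50 Filesystem::/dev/rbd0::/export::xfs nfslock nfs

  # clients go through automount so they re-lookup after a failover
  # ('nfs-gw' is a DNS name pointing at the floating IP):
  # /etc/auto.master
  /data   /etc/auto.data  --timeout=60
  # /etc/auto.data
  *       nfs-gw:/export/&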


Dima




Re: [ceph-users] NFS over CEPH - best practice

2014-05-12 Thread Dimitri Maziuk
On 05/12/2014 01:17 PM, McNamara, Bradley wrote:
 The underlying file system on the RBD needs to be a clustered file
system, like OCFS2, GFS2, etc., and a cluster between the two, or more,
iSCSI target servers needs to be created to manage the clustered file
system.


Looks like we aren't sure what the OP wanted multiple servers for:
- serving one image to multiple clients (in which case all of the above
plus more applies), or
- failover setup with one image/one client (in which case you could
usually go active/passive and not care about concurrency and all its
tentacles).

Andrei?

-- 
Dimitri Maziuk
Programmer/sysadmin
BioMagResBank, UW-Madison -- http://www.bmrb.wisc.edu





Re: [ceph-users] 16 osds: 11 up, 16 in

2014-05-07 Thread Dimitri Maziuk
On 05/07/2014 04:11 PM, Craig Lewis wrote:
 On 5/7/14 13:40 , Sergey Malinin wrote:
 Check dmesg and SMART data on both nodes. This behaviour is similar to
 failing hdd.


 
 It does sound like a failing disk... but there's nothing in dmesg, and
 smartmontools hasn't emailed me about a failing disk.  The same thing is
 happening to more than 50% of my OSDs, in both nodes.

check 'iostat -dmx 5 5' (or some other numbers) -- if you see 100%+ disk
utilization, that could be the dying one.
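
For example, something along these lines to watch just the %util column
(field layout is sysstat's iostat -dmx; adjust the device pattern and the
threshold to taste):

  iostat -dmx 5 | awk '/^sd/ && $NF+0 > 99 { print $1, "%util =", $NF }'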

-- 
Dimitri Maziuk
Programmer/sysadmin
BioMagResBank, UW-Madison -- http://www.bmrb.wisc.edu





Re: [ceph-users] advice with hardware configuration

2014-05-06 Thread Dimitri Maziuk
On 05/06/2014 11:34 AM, Xabier Elkano wrote:

 OS: 2xSSD intel SC3500 100G Raid 1

Why would you put the os on ssds? If you buy enough ram so it doesn't swap,
about the only i/o on the system drive will be logging. All that'd do is
wear out your ssds, not that there's much of that going on. (Our servers
average .01% utilization on system drives, most of it log writes.)

I can see placing the os and journals on the same disks; then ssds make
sense because that's where the journals are.
-- 
Dimitri Maziuk
Programmer/sysadmin
BioMagResBank, UW-Madison -- http://www.bmrb.wisc.edu





Re: [ceph-users] The Ceph disk I would like to have

2014-03-25 Thread Dimitri Maziuk
On 03/25/2014 10:49 AM, Loic Dachary wrote:
 Hi,
 
 It's not available yet but ... are we far away ? 

It's a pity the Pi doesn't do SATA. Otherwise all you'd need is a working arm
port and some scripting...


-- 
Dimitri Maziuk
Programmer/sysadmin
BioMagResBank, UW-Madison -- http://www.bmrb.wisc.edu





Re: [ceph-users] The next generation beyond Ceph

2014-03-21 Thread Dimitri Maziuk
On 03/21/2014 04:20 PM, Loic Dachary wrote:
 Hi Ted,
 
 Sorry if I misunderstood your initial message : I did not realize it
was marketing for the competition.


Dear Loic,

I wanted to reach out to you with this exciting money transfer
opportunity that I believe your bank account could really benefit from.
It is currently still in stealth mode, but it's already very big in
Nigeria. Would you send us all your bank account passwords so we can
educate you about our offer?

;)
-- 
Dimitri Maziuk
Programmer/sysadmin
BioMagResBank, UW-Madison -- http://www.bmrb.wisc.edu





Re: [ceph-users] RBD module - RHEL 6.4

2014-01-29 Thread Dimitri Maziuk
On 01/29/2014 12:27 PM, alistair.whit...@barclays.com wrote:

 We will not be able to deploy anything other than a fully supported RedHat 
 kernel

in which case your only option is probably RHEL 7 and hope they didn't
exclude ceph modules from their kernel.

Stock centos 6.5 kernel does not have rbd.ko so I'm sure the upstream
rhel one doesn't either. ELRepo's kernel 3.10 has it, but that's not
going to help you.

-- 
Dimitri Maziuk
Programmer/sysadmin
BioMagResBank, UW-Madison -- http://www.bmrb.wisc.edu





Re: [ceph-users] RBD module - RHEL 6.4

2014-01-29 Thread Dimitri Maziuk
On 01/29/2014 12:47 PM, Schlacta, Christ wrote:

 Dkms is red hat technology. They developed it. Whether or not they support
 it I don't know... What I do know is that dkms by design didn't modify your
 running, installed, fully supported RedHat kernel. This is in fact why
 and how RedHat designed it.

First of all, that's rubbish, you can't install a driver without
modifying your system. That's why even the stuff RedHat provides as
technology preview is not supported by RedHat for production use; I'm
fairly sure stuff you build yourself is out of the question entirely.

Second, it's usually not about technology, it's about auditors with
checklists. The fact that you can do it and it will most likely work
just fine has nothing to do with it.

-- 
Dimitri Maziuk
Programmer/sysadmin
BioMagResBank, UW-Madison -- http://www.bmrb.wisc.edu





Re: [ceph-users] servers advise (dell r515 or supermicro ....)

2014-01-15 Thread Dimitri Maziuk

On 1/15/2014 9:16 AM, Mark Nelson wrote:

On 01/15/2014 09:14 AM, Alexandre DERUMIER wrote:



For the system disk, do you use some kind of internal flash memory disk ?


We probably should have, but ended up with I think just a 500GB 7200rpm
disk, whatever was cheapest. :)


If your system has to swap a lot you need more ram. If it loads stuff 
from disk a lot (other than thrashing), take a closer look at your job 
mix: there are likely things you should run elsewhere. Outside of 
those, the only thing that bangs on the system disk is logging, and you can 
log to a dedicated log server and eliminate that bit of system disk i/o 
altogether.


I.e. speed-wise you should be ok with a usb stick for a system drive.
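
Shipping the logs off-box is a one-liner with rsyslog, for what it's worth
(a sketch; 'loghost' is a made-up name and the receiving side needs a
matching tcp input):

  # /etc/rsyslog.d/remote.conf on the ceph nodes
  *.*   @@loghost:514    # @@ = TCP, single @ = UDP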

Dima



Re: [ceph-users] Ceph / Dell hardware recommendation

2014-01-15 Thread Dimitri Maziuk
On 01/15/2014 10:53 AM, Alexandre DERUMIER wrote:
 From what I understand the flexbay are inside the box, typically 
 useful for OS (SSD) drives; then it lets you use all the front hotplug 
 slots with larger platter drives. 
 
 Yes, it's inside the box.
 
 I ask the question because of the derek message:
 
 
  They currently give me a hard time about trying to mix and 
 match SSDs though on the 12 bay back-plane which is not a technical 
 problem but a Dell problem

At a guess, Dell BIOS complains that the drives/configuration is not
supported, contact your Dell representative for replacement. Press F1 to
boot.

-- 
Dimitri Maziuk
Programmer/sysadmin
BioMagResBank, UW-Madison -- http://www.bmrb.wisc.edu





Re: [ceph-users] Ceph / Dell hardware recommendation

2014-01-15 Thread Dimitri Maziuk
On 01/15/2014 12:42 PM, Derek Yarnell wrote:
...
  I think this is more a configuration Dell has been
 unwilling to sell is all.

Ah.

Every once in a while they make their bios complain when it finds a
non-Dell-approved disk. Once enough customers start screaming they
release a bios update that turns that bit off, and it stays that way
for a while... and then they release the next h/w model and the cycle
repeats. ;)

-- 
Dimitri Maziuk
Programmer/sysadmin
BioMagResBank, UW-Madison -- http://www.bmrb.wisc.edu





Re: [ceph-users] Ceph as offline S3 substitute and peer-to-peer fileshare?

2014-01-02 Thread Dimitri Maziuk
On 01/02/2014 04:20 PM, Alek Storm wrote:
 Anything? Would really appreciate any wisdom at all on this.

I think what you're looking for is called git.

-- 
Dimitri Maziuk
Programmer/sysadmin
BioMagResBank, UW-Madison -- http://www.bmrb.wisc.edu





Re: [ceph-users] Cluster Performance very Poor

2013-12-27 Thread Dimitri Maziuk
On 12/27/2013 05:10 PM, German Anders wrote:

 1048576000 bytes (1.0 GB) copied, 10.2545 s, 102 MB/s

FWIW I've a crappy crucial v4 ssd that clocks about 106MB/s on
sequential i/o... Not sure how much you expect to see, esp. if you have
a giga*bit* link to some of the disks.

-- 
Dimitri Maziuk
Programmer/sysadmin
BioMagResBank, UW-Madison -- http://www.bmrb.wisc.edu





Re: [ceph-users] When will Ceph FS be ready for use with production data

2013-12-21 Thread Dimitri Maziuk

On 12/21/2013 10:04 AM, Wido den Hollander wrote:

On 12/21/2013 02:50 PM, Yan, Zheng wrote:



 I don't know when inktank will claim Cephfs is stable. But as a cephfs
 developer, I already have trouble finding new issues in my test setup.
 If you are willing to help improve cephfs, please try cephfs and
 report any issue you encounter.



Great to hear. Are you also testing Multi-MDS or just one Active/Standby?

And snapshots? Those were giving some problems as well.


What was it I heard about performance tiers? Last I tried, cephfs was 
spreading i/o evenly over osds, fast and slow alike, with no way to tune 
that.


Thanks,
Dima




Re: [ceph-users] centos6.4 + libvirt + qemu + rbd/ceph

2013-12-06 Thread Dimitri Maziuk
On 12/06/2013 04:03 PM, Alek Paunov wrote:

 We use only Fedora servers for everything, so I am curious why you
 excluded this option from your research? (CentOS is always problematic
 with the new bits of technology.)

A 6-month lifecycle and having to os-upgrade your entire data center 3
times a year?

(OK, maybe it's 18 months and once every 9 months)
-- 
Dimitri Maziuk
Programmer/sysadmin
BioMagResBank, UW-Madison -- http://www.bmrb.wisc.edu





Re: [ceph-users] centos6.4 + libvirt + qemu + rbd/ceph

2013-12-06 Thread Dimitri Maziuk
On 12/06/2013 04:28 PM, Alek Paunov wrote:
 On 07.12.2013 00:11, Dimitri Maziuk wrote:

 6 months lifecycle and having to os-upgrade your entire data center 3
 times a year?

 (OK maybe it's 18 months and once every 9 months)
 
 Most servers nowadays are re-provisioned even more often,

Not where I work they aren't.

 Fedora release comes with more and more KVM/Libvirt features and
 resolved issues, so the net effect is positive anyway.

Yes, that is the main argument for tracking ubuntu. ;)

-- 
Dimitri Maziuk
Programmer/sysadmin
BioMagResBank, UW-Madison -- http://www.bmrb.wisc.edu





Re: [ceph-users] Is Ceph a provider of block device too ?

2013-11-21 Thread Dimitri Maziuk
On 11/21/2013 12:52 PM, Gregory Farnum wrote:

 If you want a logically distinct copy (? this seems to be what Dimitri
 is referring to with adding a 3rd DRBD copy on another node)

Disclaimer: I haven't done stacked drbd, this is from my reading of
the fine manual -- I was referring to a stacked setup where you make a
drbd raid-1 w/ 2 hosts and then a drbd raid-1 w/ that drbd device
and another host. I don't believe drbd can keep 3 replicas any other way
-- unlike ceph, obviously.

-- 
Dimitri Maziuk
Programmer/sysadmin
BioMagResBank, UW-Madison -- http://www.bmrb.wisc.edu





Re: [ceph-users] alternative approaches to CEPH-FS

2013-11-20 Thread Dimitri Maziuk
On 11/19/2013 08:02 PM, YIP Wai Peng wrote:

 Hm, so maybe this nfsceph is not _that_ bad after all! :) Your read clearly
 wins, so I'm guessing the drbd write is the slow one. Which drbd mode are
 you using?

Active/passive pair, meta-disk internal, protocol C over a 5-long
crossover cable on eth1: 1000baseT/Full. Protocol B would probably
speed up the writes, but when I run things that write a lot I make them
write to /var/tmp anyway...

cheers,
-- 
Dimitri Maziuk
Programmer/sysadmin
BioMagResBank, UW-Madison -- http://www.bmrb.wisc.edu





Re: [ceph-users] Disk Density Considerations

2013-11-06 Thread Dimitri Maziuk

On 2013-11-06 08:37, Mark Nelson wrote:
...

Taking this even further, options like the hadoop fat twin nodes with 12
drives in 1U potentially could be even denser, while spreading the
drives out over even more nodes.  Now instead of 4-5 large dense nodes
you have maybe 35-40 small dense nodes.  The downside here though is
that the cost may be a bit higher and you have to slide out a whole node
to swap drives, though Ceph is more tolerant of this than many
distributed systems.


Another one is 35-40 switch ports vs 4-5. I hear regular 10G ports eat 
up over 10 watts of juice and cat6e cable offers a unique combination of 
poor design and high cost. It's probably ok to need 35-40 routable ip 
addresses: you can add another interface and subnet to your public-facing 
clients.


Dima



Re: [ceph-users] Red Hat clients

2013-10-30 Thread Dimitri Maziuk
On 10/30/2013 02:35 PM, Gruher, Joseph R wrote:
 I have CentOS 6.4 running with the 3.11.6 kernel from elrepo and it
includes the rbd module. I think you could make the same update on RHEL
6.4 and get rbd.

Mmm... I think RHEL means paid support, which means you can't run an elrepo
kernel. Plus I didn't have much luck with their -ml kernels (also centos
6.current) -- half of them wouldn't boot on our supermicros and the
latest crop won't boot on my dell pc.

So yeah, if by RHEL you mean centos/scilinux and you find an -ml kernel
that actually works on your hardware... then you get rbd. As long as you
don't 'yum update' the kernel.
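
The "as long as you don't 'yum update' the kernel" part is easy enough to
enforce, e.g. (a sketch; either approach works):

  # /etc/yum.conf -- the blunt option: keep yum update away from kernels entirely
  exclude=kernel*

  # or pin the elrepo kernel explicitly and leave the rest alone
  yum install yum-plugin-versionlock
  yum versionlock add 'kernel-ml*'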

-- 
Dimitri Maziuk
Programmer/sysadmin
BioMagResBank, UW-Madison -- http://www.bmrb.wisc.edu





Re: [ceph-users] CephFS Project Manila (OpenStack)

2013-10-23 Thread Dimitri Maziuk

On 2013-10-22 22:41, Gregory Farnum wrote:
...

Right now, unsurprisingly, the focus of the existing Manila developers
is on Option 1: it's less work than the others and supports the most
common storage protocols very well. But as mentioned, it would be a
pretty poor fit for CephFS


I must be missing something: I thought CephFS was supposed to be a 
distributed filesystem, which to me means option 1 was the point.


Dima



Re: [ceph-users] Ceph and RAID

2013-10-03 Thread Dimitri Maziuk
On 10/03/2013 12:40 PM, Andy Paluch wrote:

 Don't you have to take down a ceph node to replace a defective drive? If I
 have a ceph node with 12 disks and one goes bad, would I not have to take the
 entire node down to replace it and then reformat?

 If I have a hotswap chassis but am using just an hba to connect my drives,
 will the os (say latest Ubuntu) support hot-swapping the drive, or do I have
 to shut it down to replace the drive and then bring it up and format, etc.?

Linux supports hotswap. You'll have to restart an osd, but not reboot
the node.
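
The usual dance for a single dead osd disk, without rebooting anything -- a
sketch using the sysvinit-era commands and a made-up osd id N / device name,
double-check against the docs for your release:

  ceph osd out N                    # cluster starts re-replicating
  service ceph stop osd.N           # on the node that owns the disk
  # pull the drive, hot-plug the replacement, then rebuild the osd:
  ceph osd crush remove osd.N
  ceph auth del osd.N
  ceph osd rm N
  ceph-deploy osd create HOST:/dev/sdX   # or ceph-disk prepare/activate by hand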

The issue with cluster rebalancing is bandwidth: basically, the sata/sas
backplane on one node vs (potentially) the slowest network link in your
cluster that also carries data traffic for everybody. There are too many
variables involved; you figure out the balance between ceph replication
and raid replication for your cluster and budget.

-- 
Dimitri Maziuk
Programmer/sysadmin
BioMagResBank, UW-Madison -- http://www.bmrb.wisc.edu





Re: [ceph-users] Ceph and RAID

2013-10-02 Thread Dimitri Maziuk

On 2013-10-02 07:35, Loic Dachary wrote:

Hi,

I would not use RAID5 since it would be redundant with what Ceph provides.


I would not use raid-5 (or 6) because its safety on modern drives is 
questionable and because I haven't seen anyone comment on ceph's 
performance on top of it -- e.g. openstack docs explicitly say don't use 
raid-5 because swift's access patterns are the worst case for raid.

I would consider (mdadm) raid-1, depending on the hardware and budget, because 
this way a single disk failure will not trigger a cluster-wide rebalance.
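
For the record, the raid-1 under an osd is about two commands -- a sketch
with made-up device names:

  mdadm --create /dev/md0 --level=1 --raid-devices=2 /dev/sdb /dev/sdc
  mkfs.xfs /dev/md0      # then mount it as the osd data dir as usual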


Dima




Re: [ceph-users] some newbie questions...

2013-08-31 Thread Dimitri Maziuk

On 2013-08-31 11:36, Dzianis Kahanovich wrote:

Johannes Klarenbeek wrote:



1) i read somewhere that it is recommended to have one OSD per disk in a 
production environment.
is this also the maximum disk per OSD or could i use multiple disks per 
OSD? and why?


 you could use multiple disks for one OSD if you used some striping and abstracted 
 the disk (like LVM, MDRAID, etc), but it wouldn't make sense. One OSD writes 
 into one filesystem, which is usually one disk in a production environment. 
 Using RAID under it wouldn't increase either reliability or performance 
 drastically.


I see some sense in RAID 0: single ceph-osd daemon per node (but still
disk-per-osd self). But if you have relative few [planned] cores per task on
node - you can think about it.


Raid-0: single disk failure kills the entire filesystem, off-lines the 
osd and triggers a cluster-wide resync. Actual raid: single disk failure 
does not affect the cluster in any way.


Dima



Re: [ceph-users] OSD to OSD Communication

2013-08-30 Thread Dimitri Maziuk
On 08/30/2013 01:38 PM, Geraint Jones wrote:
 
 
 On 30/08/13 11:33 AM, Wido den Hollander w...@42on.com wrote:
 
 On 08/30/2013 08:19 PM, Geraint Jones wrote:
 Hi Guys

 We are using Ceph in production backing an LXC cluster. The setup is : 2
 x Servers, 24 x 3TB Disks each in groups of 3 as RAID0. SSD for
 journals. Bonded 1gbit ethernet (2gbit total).


 I think you sized your machines too big. I'd say go for 6 machines with
 8 disks each without RAID-0. Let Ceph do it's job and avoid RAID.
 
 Typical traffic is fine - its just been an issue tonight :)

If you're hosed and have to recover a 9TB filesystem, you'll have problems
no matter what, ceph or no ceph. You *will* have a disk failure every
once in a while, and there's no "r" in raid-0, so don't think what
happened is not typical.

(There's nothing wrong with raid as long as it's > 0.)
-- 
Dimitri Maziuk
Programmer/sysadmin
BioMagResBank, UW-Madison -- http://www.bmrb.wisc.edu





Re: [ceph-users] OSD to OSD Communication

2013-08-30 Thread Dimitri Maziuk
On 08/30/2013 01:51 PM, Mark Nelson wrote:
 On 08/30/2013 01:47 PM, Dimitri Maziuk wrote:

 (There's nothing wrong with raid as long as it's > 0.)
 
 One exception: Some controllers (looking at you LSI!) don't expose disks
 as JBOD or, if they do, don't let you use write-back cache.  In those
 cases we sometimes have people make single-disk RAID0 LUNs. :)

We don't use the ones we have for jbod, but I do recall trying and
failing, yes. They do make what our vendor calls hba controllers,
though, and for noticeably less money.

-- 
Dimitri Maziuk
Programmer/sysadmin
BioMagResBank, UW-Madison -- http://www.bmrb.wisc.edu





Re: [ceph-users] RadosGW High Availability

2013-05-09 Thread Dimitri Maziuk
On 05/09/2013 09:57 AM, Tyler Brekke wrote:
 For High availability RGW you would need a load balancer. HA Proxy is
 an example of a load balancer that has been used successfully with
 rados gateway endpoints.

Strictly speaking, for HA you need an HA solution, e.g. heartbeat. The main
difference between that and load balancing is that one server serves the
clients until it dies, then another takes over. With load balancing, all
servers get a share of the requests. It can be configured to do HA: set
the main server's share to 100%, then the backup will get no requests as
long as the main is up.
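
haproxy can be bent into that shape too, for what it's worth, with the
'backup' keyword -- a sketch, names and addresses made up:

  # /etc/haproxy/haproxy.cfg (fragment)
  frontend rgw_in
      bind *:80
      default_backend radosgw

  backend radosgw
      option httpchk GET /
      server rgw1 10.0.0.11:80 check
      server rgw2 10.0.0.12:80 check backup   # only used while rgw1 is down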

RRDNS is a load balancing solution. Dep. on the implementation it can
simply return a list of IPs instead of a single IP for the host name,
then it's up to the client to pick one. A simple stupid client may
always pick the first one. A simple stupid server may always return the
list in the same order. That could be how all your clients always pick
the same server.

-- 
Dimitri Maziuk
Programmer/sysadmin
BioMagResBank, UW-Madison -- http://www.bmrb.wisc.edu





Re: [ceph-users] interesting crush rules

2013-05-01 Thread Dimitri Maziuk
On 05/01/2013 04:51 PM, Gregory Farnum wrote:
 On Wed, May 1, 2013 at 2:44 PM, Sage Weil s...@inktank.com wrote:
 I added a blueprint for extending the crush rule language.  If there are
 interesting or strange placement policies you'd like to do and aren't able
 to currently express using CRUSH, please help us out by enumerating them
 on that blueprint.
 
 http://wiki.ceph.com/01Planning/02Blueprints/Dumpling/extend_crush_rule_language,
 if you don't have the blueprint site handy already. :)

My issue was placement of the files/directories (w/ cephfs): I wanted a
complete file (or directory and all the files in it) on osd.x.

The rules I'm interested in would be like

- pick one osd from rack 1, pick 2 osds from rack 2, put a complete copy
of everything on each (HA scenario w/ 2 copies in the on-site rack 2
and a copy in the off-site rack 1; a crush sketch for this one is below).

- pick all osds from group compute nodes and place complete copy of
everything on each (data placement on compute grids).

(Obviously, there's also the bit about getting the clients to read from
the right osd.)
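
For what it's worth, the rack-splitting half of the first rule is already
expressible with multiple take/emit steps -- a crushmap sketch, bucket names
made up; it's the "complete file/directory on one osd" part that isn't:

  rule onsite_plus_offsite {
          ruleset 1
          type replicated
          min_size 3
          max_size 3
          step take rack1
          step chooseleaf firstn 1 type host
          step emit
          step take rack2
          step chooseleaf firstn 2 type host
          step emit
  }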

-- 
Dimitri Maziuk
Programmer/sysadmin
BioMagResBank, UW-Madison -- http://www.bmrb.wisc.edu





Re: [ceph-users] Ceph mon quorum

2013-04-05 Thread Dimitri Maziuk

On 4/5/2013 7:57 AM, Wido den Hollander wrote:


You always need a majority of your monitors to be up. In this case you
lose 66% of your monitors, so mon.b can't get a majority.

With 3 monitors you need at least 2 to be up to have your cluster working.


That's kinda useless, isn't it? I'd've thought '2 copies on-site and one 
off-site, and if the main site room's down we can work off the off-site 
server' is a basic enough HA setup -- we've had it here for some time. 
Now you tell me ceph won't even do that?


Dima



Re: [ceph-users] Ceph mon quorum

2013-04-05 Thread Dimitri Maziuk
On 04/05/2013 10:12 AM, Wido den Hollander wrote:

 Think about it this way. You have two racks and the network connection
 between them fails. If both racks keep operating because they can still
 reach that single monitor in their rack you will end up with data
 inconsistency.

Yes. In DRBD land it's called 'split brain' and they have (IIRC) an entire
chapter in the user manual about picking up the pieces. It's not a new
problem.

 You should place mon.c outside rack A or B to keep you up and running in
 this situation.

It's not about racks, it's about rooms, but let's say rack == room ==
colocation facility. And I have two of those.

Are you saying I need a 3rd colo with all associated overhead to have a
usable replica of my data in colo #2?

-- 
Dimitri Maziuk
Programmer/sysadmin
BioMagResBank, UW-Madison -- http://www.bmrb.wisc.edu


