[ceph-users] Ceph over IP over Infiniband

2017-12-18 Thread Phil Schwarz
Hi,
I'm currently trying to set up a brand new home cluster :
- 5 nodes, each with:

- 1 HCA Mellanox ConnectX-2
- 1 GbE NIC (Proxmox 5.1 admin network)
- 1 CX4 to CX4 cable

All of them connected to an SDR Flextronics IB switch.

This setup should back a Ceph Luminous cluster (v12.2.2, as included in
Proxmox 5.1). On all nodes, I did:
- apt-get install infiniband-diags
- modprobe mlx4_ib
- modprobe ib_ipoib
- modprobe ib_umad
- ifconfig ib0 IP/MASK
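
For persistence across reboots, I also plan to declare ib0 in
/etc/network/interfaces; a minimal sketch (the address is only an
example, and the connected-mode/MTU tuning is optional):

auto ib0
iface ib0 inet static
        address 10.10.10.11
        netmask 255.255.255.0
        pre-up modprobe ib_ipoib
        pre-up echo connected > /sys/class/net/ib0/mode
        mtu 65520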

On two nodes (previously tried on a single one, same issue), I installed
opensm (the switch doesn't embed a subnet manager):
apt-get install opensm
/etc/init.d/opensm stop
/etc/init.d/opensm start
(necessary to let the daemon create its logfiles)

I tailed the logfile and got an "Active & Running" setup, with "SUBNET UP".

Every node looks fine as far as the IB setup is concerned:
- all ib0 ports are UP, according to ibstat
- ibhosts and ibswitches output looks OK

On one node:
ibping -S

On every other node:
ibping -G GID_Of_Previous_Server_Port

I got a nice pong reply on every node. I should be happy, but I never
got any further: the nodes cannot ping each other's IPoIB addresses. I
can't get past this (most probably simple) issue...


Any hint on how to get this working?


Thanks for all
Best regards

___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] Ceph - SSD cluster

2017-11-21 Thread Phil Schwarz
Hi,
not a real HCL, but keeping this link [1] in mind is mandatory.

According to me, use roughly any Intel DC-class SSD: an S3700/S3710 in
SATA or, better, a P3700 in NVMe.
Avoid Samsung Pro or EVO drives of nearly any kind (haven't found a
link, sorry).

My 2 cents

[1] :
https://www.sebastien-han.fr/blog/2014/10/10/ceph-how-to-test-if-your-ssd-is-suitable-as-a-journal-device/
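
For reference, the test in that post boils down to a single-job O_DSYNC
4k write benchmark; a minimal sketch with fio (destructive on the
target, so point it at a disposable device or partition -- /dev/sdX is a
placeholder):

fio --filename=/dev/sdX --direct=1 --sync=1 --rw=write --bs=4k \
    --numjobs=1 --iodepth=1 --runtime=60 --time_based \
    --group_reporting --name=journal-test

A drive that cannot sustain a few thousand sync-write IOPS here is
usually a poor journal candidate.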


Le 21/11/2017 à 11:34, Ronny Aasen a écrit :
> On 20. nov. 2017 23:06, Christian Balzer wrote:
>> On Mon, 20 Nov 2017 15:53:31 +0100 Ansgar Jazdzewski wrote:
>>
>>> Hi *,
>>>
>>> just one note because we hit it: take a look at your discard options and
>>> make sure discard doesn't run on all OSDs at the same time.
>>>
>> Any SSD that actually _requires_ the use of TRIM/DISCARD to maintain
>> either speed or endurance I'd consider unfit for Ceph to boot.
>>
> 
> 
> hello
> 
> is there some sort of hardware compatibility list for this part ?
> perhaps community maintained on the wiki or similar.
> 
> there are some older blog posts covering some devices, but it is hard to
> find ceph-related material for current devices.
> 
> kind regards
> Ronny Aasen
> 
> ___
> ceph-users mailing list
> ceph-users@lists.ceph.com
> http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com

___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] HW Raid vs. Multiple OSD

2017-11-15 Thread Phil Schwarz
Hi,
thanks for the explanation, but...
Twisting the Ceph storage model the way you plan is not a good idea:
- You will decrease the support level (I'm not sure many people build
such an architecture).
- You are likely to face strange issues running Ceph OSDs on top of HW
RAID.
- You shouldn't want to go to size=2. I know the trade-offs of size=3
(IOPS, usable space), but downgrading to size=2 doesn't look safe.
- Your servers seem to have enough horsepower regarding CPU, RAM and
disks, but you haven't told us about the Ceph replication network. At
least 10GbE, I hope.
- Your public network should be more than 1GbE too, far more...
- How will you export the VMs? A single KVM Samba server? Ceph cephx
clients?
- Roughly, with size=3 and 4 servers, you get 4 x 8 x 1.9 TB / 3, about
20 TB of usable space. With 100 VDI, that is around 200 GB per VM. Is it
enough to expand those VM sizes?
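
For the record, the back-of-the-envelope arithmetic behind that figure
(assuming 4 nodes with 8 x 1.9 TB each and size=3):

echo "scale=1; 4 * 8 * 1.9 / 3" | bc   # ~20.2 TB usable, i.e. roughly 200 GB per desktop for 100 VDI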


In conclusion, I fully understand the difficulty of building a complete
test lab before buying a complete cluster, but you should run a few
tests before tweaking the solution to your needs.

Good luck
Best regards


Le 14/11/2017 à 11:36, Oscar Segarra a écrit :
> Hi Anthony,
> 
> 
> o I think you might have some misunderstandings about how Ceph works. 
> Ceph is best deployed as a single cluster spanning multiple servers,
> generally at least 3.  Is that your plan?   
> 
> I want to deploy servers for 100 VDI Windows 10 each (at least 3 servers).
> I plan to sell servers depending on the number of VDI required by my
> customer. For 100 VDI --> 3 servers, for 400 VDI --> 4 servers
> 
> This is my proposal of configuration:
> 
> *Server1:*
> CPU: 2x16 Core
> RAM: 512
> Disk: 2x400 for OS and 8x1.9TB for VM (SSD)
> 
> *Server2:*
> CPU: 2x16 Core
> RAM: 512
> Disk: 2x400 for OS and 8x1.9TB for VM (SSD)
> 
> *Server3:*
> CPU: 2x16 Core
> RAM: 512
> Disk: 2x400 for OS and 8x1.9TB for VM (SSD)
> 
> *Server4:*
> CPU: 2x16 Core
> RAM: 512
> Disk: 2x400 for OS and 8x1.9TB for VM (SSD)
> ...
> *ServerN:*
> CPU: 2x16 Core
> RAM: 512
> Disk: 2x400 for OS and 8x1.9TB for VM (SSD)
> 
> If I create an OSD for each disk and I pin a core for each OSD in a
> server, I will need 8 cores just for managing OSDs. If I create 4 RAID0 of
> 2 disks each, I will need just 4 OSDs, and so on:
> 
> 1 osd x 1 disk of 4TB
> 1 osd x 2 disks of 2TB
> 1 osd x 4 disks of 1 TB
> 
> If the CPU cycles used by Ceph are a problem, your architecture has IMHO
> bigger problems.  You need to design for a safety margin of RAM and CPU
> to accommodate spikes in usage, both by Ceph and by your desktops. 
> There is no way each of the systems you describe is going to have enough
> cycles for 100 desktops concurrently active.  You'd be allocating each
> of them only ~3GB of RAM -- I've not had to run MS Windows 10 but even
> with page sharing that seems awfully tight on RAM.
> 
> Sorry, I think my design has not been correctly explained. I hope my
> previous explanation clarifies it. The problem is i'm in the design
> phase and I don't know if ceph CPU cycles can be a problem and that is
> the principal object of this post.
> 
> With the numbers you mention throughout the thread, it would seem as
> though you would end up with potentially as little as 80GB of usable
> space per virtual desktop - will that meet your needs?
> 
> Sorry, I think 80GB is enough, nevertheless, I plan to use RBD clones
> and therefore even with size=2, I think I will have more than 80GB
> available for each vdi.
> 
> In this design phase where I am, every advice is really welcome!
> 
> Thanks a lot
> 
> 2017-11-13 23:40 GMT+01:00 Anthony D'Atri:
> 
> Oscar, a few thoughts:
> 
> o I think you might have some misunderstandings about how Ceph
> works.  Ceph is best deployed as a single cluster spanning multiple
> servers, generally at least 3.  Is that your plan?  It sort of
> sounds as though you're thinking of Ceph managing only the drives
> local to each of your converged VDI hosts, like local RAID would. 
> Ceph doesn't work that way.  Well, technically it could but wouldn't
> be a great architecture.  You would want to have at least 3 servers,
> with all of the Ceph OSDs in a single cluster.
> 
> o Re RAID0:
> 
> > Then, may I understand that your advice is a RAID0 for each 4TB? For a
> > balanced configuration...
> >
> > 1 osd x 1 disk of 4TB
> > 1 osd x 2 disks of 2TB
> > 1 osd x 4 disks of 1 TB
> 
> 
> For performance a greater number of smaller drives is generally
> going to be best.  VDI desktops are going to be fairly
> latency-sensitive and you'd really do best with SSDs.  All those
> desktops thrashing a small number of HDDs is not going to deliver
> tolerable performance.
> 
> Don't use RAID at all for the OSDs.  Even if you get hardware RAID
> HBAs, configure JBOD/passthrough mode so that OSDs are deployed
> directly on the drives.  This will minimize latency as well as
> manifold hassles th

Re: [ceph-users] [PVE-User] OSD won't start, even created ??

2017-09-09 Thread Phil Schwarz

Did a few more tests:

On an older Ceph server, where the OSD was created with a pveceph command:

pveceph createosd /dev/sdb

which is equivalent to:

ceph-disk prepare --zap-disk --fs-type xfs --cluster ceph --cluster-uuid 
a5c0cfed-...4bf939ed70 /dev/sdb


sgdisk --print /dev/sdd

Disk /dev/sdd: 2930277168 sectors, 1.4 TiB
Logical sector size: 512 bytes
Disk identifier (GUID): 638646CF-..-62296C871132
Partition table holds up to 128 entries
First usable sector is 34, last usable sector is 2930277134
Partitions will be aligned on 2048-sector boundaries
Total free space is 2014 sectors (1007.0 KiB)

Number  Start (sector)  End (sector)  Size     Code  Name
   1          10487808    2930277134  1.4 TiB  F800  ceph data
   2              2048      10487807  5.0 GiB  F802  ceph journal


On a newer ceph server ( dpkg -l : 12.2.0-pve1 version)

sgdisk --print /dev/sdb

Disk /dev/sdb: 1465149168 sectors, 698.6 GiB
Logical sector size: 512 bytes
Disk identifier (GUID): D63886B6-0.26-BCBCD6FFCA3C
Partition table holds up to 128 entries
First usable sector is 34, last usable sector is 1465149134
Partitions will be aligned on 2048-sector boundaries
Total free space is 2014 sectors (1007.0 KiB)

Number  Start (sector)  End (sector)  Size       Code  Name
   1            2048        206847    100.0 MiB  F800  ceph data
   2          206848    1465149134    698.5 GiB        ceph block


Looking at the ceph-osd.admin log, I think I used an OSD creation
process that leads to a bluestore OSD (instead of a filestore one), and
it seems that afterwards the Ceph server is unable to use the new
bluestore:


( bluestore(/dev/sdb2) _read_bdev_label unable to decode label at offset 
102: buffer::malformed_input: void 
bluestore_bdev_label_t::decode(ceph::buffer::list::iterator&) decode 
past end of struct encoding

)
just before trying to use it as a filestore one :

( probe_block_device_fsid /dev/sdb2 is filestore )


I tried to pass a --bluestore 0 flag when creating the OSD, but that
flag is unknown.
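
For the next attempt, the direct ceph-disk invocation that should force
a filestore OSD, bypassing the pveceph wrapper (a sketch only; I'm
assuming the Luminous ceph-disk accepts --filestore, and the cluster
uuid is the one shown above):

ceph-disk zap /dev/sdb
ceph-disk prepare --zap-disk --filestore --fs-type xfs \
    --cluster ceph --cluster-uuid a5c0cfed-...4bf939ed70 /dev/sdb
ceph-disk activate /dev/sdb1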


Thanks in advance for any hint.
I'm ready to do a few more tests.
Best regards.

Le 08/09/2017 à 17:25, Phil Schwarz a écrit :

Hi,
any help would be really useful.
Does anyone have a clue about my issue?

Thanks in advance.
Best regards.


Le 05/09/2017 à 20:25, Phil Schwarz a écrit :

Hi,
I come back with the same issue as seen in a previous thread (link given).

I am trying to add a 2TB SATA disk as an OSD:
using the Proxmox GUI or the CLI (commands given below) gives the same
(bad) result.

I didn't want to use a direct 'ceph osd create', which would bypass the
pmxcfs redundant filesystem.

I tried to build an OSD with the same disk on another machine (a
stronger one with a quad-core Opteron); it fails the same way.


Sorry for cross-posting, but I think I'm failing against the pveceph
wrapper.


Any help or clue would be really useful.

Thanks
Best regards.










-- Link to previous thread (but same problem):
https://www.mail-archive.com/ceph-users@lists.ceph.com/msg38897.html


-- commands :
fdisk /dev/sdc ( mklabel msdos, w, q)
ceph-disk zap /dev/sdc
pveceph createosd /dev/sdc

-- dpkg -l

 dpkg -l |grep ceph
ii  ceph            12.1.2-pve1  amd64  distributed storage and file system
ii  ceph-base       12.1.2-pve1  amd64  common ceph daemon libraries and management tools
ii  ceph-common     12.1.2-pve1  amd64  common utilities to mount and interact with a ceph storage cluster
ii  ceph-mgr        12.1.2-pve1  amd64  manager for the ceph distributed storage system
ii  ceph-mon        12.1.2-pve1  amd64  monitor server for the ceph storage system
ii  ceph-osd        12.1.2-pve1  amd64  OSD server for the ceph storage system
ii  libcephfs1      10.2.5-7.2   amd64  Ceph distributed file system client library
ii  libcephfs2      12.1.2-pve1  amd64  Ceph distributed file system client library
ii  python-cephfs   12.1.2-pve1  amd64  Python 2 libraries for the Ceph libcephfs library

-- tail -f /var/log/ceph/ceph-osd.admin.log

2017-09-03 18:28:20.856641 7fad97e45e00  0 ceph version 12.1.2
(cd7bc3b11cdbe6fa94324b7322fb2a4716a052a7) luminous (rc), process
(unknown), pid 5493
2017-09-03 18:28:20.857104 7fad97e45e00 -1 bluestore(/dev/sdc2)
_read_bdev_label unable to decode label at offset 102:
buffer::malformed_input: void
bluestore_bdev_label_t::decode(ceph::buffer::list::iterator&) decode
past end of struct encoding
2017-09-03 18:28:20.857200 7fad97e45e00  1 journal _open /dev/sdc2 fd 4:
2000293007360 bytes, block size 4096 bytes, directio = 0, aio = 0
2017-09-03 18:28:20.857366 7fad97e45e00  1 journal close /dev/sdc2
2017-09-03 18:28:20.857431 7fad97e45e00  0 probe_block_device_fsid
/dev/sdc2 is filestore, ----
2017-09-03 18:28:21.937285 7fa5766a5e00  0 ceph version 12.1.2
(cd7bc3b11cdbe6fa94324b7322fb2a4716a052a7) luminous

Re: [ceph-users] [PVE-User] OSD won't start, even created ??

2017-09-08 Thread Phil Schwarz
Hi,
any help would be really useful.
Does anyone have a clue about my issue?

Thanks in advance.
Best regards.


Le 05/09/2017 à 20:25, Phil Schwarz a écrit :
> Hi,
> I come back with the same issue as seen in a previous thread (link given).
> 
> I am trying to add a 2TB SATA disk as an OSD:
> using the Proxmox GUI or the CLI (commands given below) gives the same
> (bad) result.
> 
> I didn't want to use a direct 'ceph osd create', which would bypass the
> pmxcfs redundant filesystem.
> 
> I tried to build an OSD with the same disk on another machine (a
> stronger one with a quad-core Opteron); it fails the same way.
> 
> 
> Sorry for cross-posting, but I think I'm failing against the pveceph
> wrapper.
> 
> 
> Any help or clue would be really useful.
> 
> Thanks
> Best regards.
> 
> 
> 
> 
> 
> 
> 
> 
> 
> 
> -- Link to previous thread (but same problem):
> https://www.mail-archive.com/ceph-users@lists.ceph.com/msg38897.html
> 
> 
> -- commands :
> fdisk /dev/sdc ( mklabel msdos, w, q)
> ceph-disk zap /dev/sdc
> pveceph createosd /dev/sdc
> 
> -- dpkg -l
> 
>  dpkg -l |grep ceph
> ii  ceph            12.1.2-pve1  amd64  distributed storage and file system
> ii  ceph-base       12.1.2-pve1  amd64  common ceph daemon libraries and management tools
> ii  ceph-common     12.1.2-pve1  amd64  common utilities to mount and interact with a ceph storage cluster
> ii  ceph-mgr        12.1.2-pve1  amd64  manager for the ceph distributed storage system
> ii  ceph-mon        12.1.2-pve1  amd64  monitor server for the ceph storage system
> ii  ceph-osd        12.1.2-pve1  amd64  OSD server for the ceph storage system
> ii  libcephfs1      10.2.5-7.2   amd64  Ceph distributed file system client library
> ii  libcephfs2      12.1.2-pve1  amd64  Ceph distributed file system client library
> ii  python-cephfs   12.1.2-pve1  amd64  Python 2 libraries for the Ceph libcephfs library
> 
> -- tail -f /var/log/ceph/ceph-osd.admin.log
> 
> 2017-09-03 18:28:20.856641 7fad97e45e00  0 ceph version 12.1.2
> (cd7bc3b11cdbe6fa94324b7322fb2a4716a052a7) luminous (rc), process
> (unknown), pid 5493
> 2017-09-03 18:28:20.857104 7fad97e45e00 -1 bluestore(/dev/sdc2)
> _read_bdev_label unable to decode label at offset 102:
> buffer::malformed_input: void
> bluestore_bdev_label_t::decode(ceph::buffer::list::iterator&) decode
> past end of struct encoding
> 2017-09-03 18:28:20.857200 7fad97e45e00  1 journal _open /dev/sdc2 fd 4:
> 2000293007360 bytes, block size 4096 bytes, directio = 0, aio = 0
> 2017-09-03 18:28:20.857366 7fad97e45e00  1 journal close /dev/sdc2
> 2017-09-03 18:28:20.857431 7fad97e45e00  0 probe_block_device_fsid
> /dev/sdc2 is filestore, ----
> 2017-09-03 18:28:21.937285 7fa5766a5e00  0 ceph version 12.1.2
> (cd7bc3b11cdbe6fa94324b7322fb2a4716a052a7) luminous (rc), process
> (unknown), pid 5590
> 2017-09-03 18:28:21.944189 7fa5766a5e00 -1 bluestore(/dev/sdc2)
> _read_bdev_label unable to decode label at offset 102:
> buffer::malformed_input: void
> bluestore_bdev_label_t::decode(ceph::buffer::list::iterator&) decode
> past end of struct encoding
> 2017-09-03 18:28:21.944305 7fa5766a5e00  1 journal _open /dev/sdc2 fd 4:
> 2000293007360 bytes, block size 4096 bytes, directio = 0, aio = 0
> 2017-09-03 18:28:21.944527 7fa5766a5e00  1 journal close /dev/sdc2
> 2017-09-03 18:28:21.944588 7fa5766a5e00  0 probe_block_device_fsid
> /dev/sdc2 is filestore, ----
> ___
> pve-user mailing list
> pve-u...@pve.proxmox.com
> https://pve.proxmox.com/cgi-bin/mailman/listinfo/pve-user

___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


[ceph-users] OSD won't start, even created ??

2017-09-05 Thread Phil Schwarz

Hi,
I come back with the same issue as seen in a previous thread (link given).

I am trying to add a 2TB SATA disk as an OSD:
using the Proxmox GUI or the CLI (commands given below) gives the same
(bad) result.

I didn't want to use a direct 'ceph osd create', which would bypass the
pmxcfs redundant filesystem.


I tried to build an OSD with the same disk on another machine (a
stronger one with a quad-core Opteron); it fails the same way.



Sorry for cross-posting, but I think I'm failing against the pveceph
wrapper.


Any help or clue would be really useful.

Thanks
Best regards.










-- Link to previous thread (but same problem):
https://www.mail-archive.com/ceph-users@lists.ceph.com/msg38897.html


-- commands :
fdisk /dev/sdc ( mklabel msdos, w, q)
ceph-disk zap /dev/sdc
pveceph createosd /dev/sdc

-- dpkg -l

 dpkg -l |grep ceph
ii  ceph            12.1.2-pve1  amd64  distributed storage and file system
ii  ceph-base       12.1.2-pve1  amd64  common ceph daemon libraries and management tools
ii  ceph-common     12.1.2-pve1  amd64  common utilities to mount and interact with a ceph storage cluster
ii  ceph-mgr        12.1.2-pve1  amd64  manager for the ceph distributed storage system
ii  ceph-mon        12.1.2-pve1  amd64  monitor server for the ceph storage system
ii  ceph-osd        12.1.2-pve1  amd64  OSD server for the ceph storage system
ii  libcephfs1      10.2.5-7.2   amd64  Ceph distributed file system client library
ii  libcephfs2      12.1.2-pve1  amd64  Ceph distributed file system client library
ii  python-cephfs   12.1.2-pve1  amd64  Python 2 libraries for the Ceph libcephfs library


-- tail -f /var/log/ceph/ceph-osd.admin.log

2017-09-03 18:28:20.856641 7fad97e45e00  0 ceph version 12.1.2 
(cd7bc3b11cdbe6fa94324b7322fb2a4716a052a7) luminous (rc), process 
(unknown), pid 5493
2017-09-03 18:28:20.857104 7fad97e45e00 -1 bluestore(/dev/sdc2) 
_read_bdev_label unable to decode label at offset 102: 
buffer::malformed_input: void 
bluestore_bdev_label_t::decode(ceph::buffer::list::iterator&) decode 
past end of struct encoding
2017-09-03 18:28:20.857200 7fad97e45e00  1 journal _open /dev/sdc2 fd 4: 
2000293007360 bytes, block size 4096 bytes, directio = 0, aio = 0

2017-09-03 18:28:20.857366 7fad97e45e00  1 journal close /dev/sdc2
2017-09-03 18:28:20.857431 7fad97e45e00  0 probe_block_device_fsid 
/dev/sdc2 is filestore, ----
2017-09-03 18:28:21.937285 7fa5766a5e00  0 ceph version 12.1.2 
(cd7bc3b11cdbe6fa94324b7322fb2a4716a052a7) luminous (rc), process 
(unknown), pid 5590
2017-09-03 18:28:21.944189 7fa5766a5e00 -1 bluestore(/dev/sdc2) 
_read_bdev_label unable to decode label at offset 102: 
buffer::malformed_input: void 
bluestore_bdev_label_t::decode(ceph::buffer::list::iterator&) decode 
past end of struct encoding
2017-09-03 18:28:21.944305 7fa5766a5e00  1 journal _open /dev/sdc2 fd 4: 
2000293007360 bytes, block size 4096 bytes, directio = 0, aio = 0

2017-09-03 18:28:21.944527 7fa5766a5e00  1 journal close /dev/sdc2
2017-09-03 18:28:21.944588 7fa5766a5e00  0 probe_block_device_fsid 
/dev/sdc2 is filestore, ----

___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] Broken Ceph Cluster when adding new one - Proxmox 5.0 & Ceph Luminous

2017-08-29 Thread Phil Schwarz

Hi, back to work, I'm facing my problem again.

@Alexandre: AMD Turion for the N54L HP Microserver.
This server runs OSD and LXC only, no mon on it.
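
For what it's worth, a quick way to see which SIMD extensions this CPU
exposes (relevant if the rocksdb build assumes newer instructions; which
exact flag matters here is an assumption on my side):

grep -o -w -E 'sse4_1|sse4_2|avx|avx2' /proc/cpuinfo | sort -u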

After rebooting the whole cluster and attempting to add the same disk
for a third time:


ceph osd tree
ID WEIGHT  TYPE NAME UP/DOWN REWEIGHT PRIMARY-AFFINITY
-1 7.47226 root default
-2 3.65898 host jon
 1 2.2 osd.1  up  1.0  1.0
 3 1.35899 osd.3  up  1.0  1.0
-3 0.34999 host daenerys
 0 0.34999 osd.0  up  1.0  1.0
-4 1.64969 host tyrion
 2 0.44969 osd.2  up  1.0  1.0
 4 1.2 osd.4  up  1.0  1.0
-5 1.81360 host jaime
 5 1.81360 osd.5  up  1.0  1.0
 6   0 osd.6down0  1.0
 7   0 osd.7down0  1.0
 8   0 osd.8down0  1.0

osd.6, 7 and 8 are three attempts with the same disk (which isn't
faulty), all showing the same issue.


Any clue ?
I'm gonna try soon to create the osd on this disk in another server.

Thanks.

Best regards
Le 26/07/2017 à 15:53, Alexandre DERUMIER a écrit :

Hi Phil,


It's possible that rocksdb currently has a bug with some old CPUs (old Xeons
and some Opterons).
I have the same behaviour with new cluster when creating mons
http://tracker.ceph.com/issues/20529

What is your cpu model ?

in your log:

sh[1869]:  in thread 7f6d85db3c80 thread_name:ceph-osd
sh[1869]:  ceph version 12.1.0 (330b5d17d66c6c05b08ebc129d3e6e8f92f73c60) 
luminous (dev)
sh[1869]:  1: (()+0x9bc562) [0x558561169562]
sh[1869]:  2: (()+0x110c0) [0x7f6d835cb0c0]
sh[1869]:  3: 
(rocksdb::VersionBuilder::SaveTo(rocksdb::VersionStorageInfo*)+0x871) 
[0x5585615788b1]
sh[1869]:  4: (rocksdb::VersionSet::Recover(std::vector > const&, bool)+0x26bc) 
[0x55856145ca4c]
sh[1869]:  5: (rocksdb::DBImpl::Recover(std::vector > const&, bool, bool, bool)+0x11f) 
[0x558561423e6f]
sh[1869]:  6: (rocksdb::DB::Open(rocksdb::DBOptions const&, std::__cxx11::basic_string, std::allocator > const&, std:
sh[1869]:  7: (rocksdb::DB::Open(rocksdb::Options const&, std::__cxx11::basic_string, std::allocator > const&, rocksdb:
sh[1869]:  8: (RocksDBStore::do_open(std::ostream&, bool)+0x68e) 
[0x5585610af76e]
sh[1869]:  9: (RocksDBStore::create_and_open(std::ostream&)+0xd7) 
[0x5585610b0d27]
sh[1869]:  10: (BlueStore::_open_db(bool)+0x326) [0x55856103c6d6]
sh[1869]:  11: (BlueStore::mkfs()+0x856) [0x55856106d406]
sh[1869]:  12: (OSD::mkfs(CephContext*, ObjectStore*, std::__cxx11::basic_string, std::allocator > const&, uuid_d, int)+0x348) 
[0x558560bc98f8]
sh[1869]:  13: (main()+0xe58) [0x558560b1da78]
sh[1869]:  14: (__libc_start_main()+0xf1) [0x7f6d825802b1]
sh[1869]:  15: (_start()+0x2a) [0x558560ba4dfa]
sh[1869]: 2017-07-16 14:46:00.763521 7f6d85db3c80 -1 *** Caught signal (Illegal 
instruction) **
sh[1869]:  in thread 7f6d85db3c80 thread_name:ceph-osd
sh[1869]:  ceph version 12.1.0 (330b5d17d66c6c05b08ebc129d3e6e8f92f73c60) 
luminous (dev)
sh[1869]:  1: (()+0x9bc562) [0x558561169562]

- Mail original -
De: "Phil Schwarz" 
À: "Udo Lembke" , "ceph-users" 
Envoyé: Dimanche 16 Juillet 2017 15:04:16
Objet: Re: [ceph-users] Broken Ceph Cluster when adding new one - Proxmox 5.0 & 
Ceph Luminous

Le 15/07/2017 à 23:09, Udo Lembke a écrit :

Hi,

On 15.07.2017 16:01, Phil Schwarz wrote:

Hi,
...

While investigating, i wondered about my config :
Question relative to /etc/hosts file :
Should i use private_replication_LAN Ip or public ones ?

private_replication_LAN!! And the pve-cluster should use another network
(nics) if possible.

Udo


OK, thanks Udo.

After investigation, I did:
- set noout on the OSDs
- stopped the CPU-pegging LXC containers
- checked the cabling
- restarted the whole cluster

Everything went fine !

But, when i tried to add a new OSD :

fdisk /dev/sdc --> Deleted the partition table
parted /dev/sdc --> mklabel msdos (Disk came from a ZFS FreeBSD system)
dd if=/dev/null of=/dev/sdc
ceph-disk zap /dev/sdc
dd if=/dev/zero of=/dev/sdc bs=10M count=1000

And recreated the OSD via Web GUI.
Same result, the OSD is known by the node, but not by the cluster.

Logs seem to show an issue with this bluestore OSD, have a look at the file.

I'm going to try recreating the OSD using filestore.

Thanks


___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com



___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] Broken Ceph Cluster when adding new one - Proxmox 5.0 & Ceph Luminous

2017-07-16 Thread Phil Schwarz

Le 16/07/2017 à 17:02, Udo Lembke a écrit :

Hi,

On 16.07.2017 15:04, Phil Schwarz wrote:

...
Same result, the OSD is known by the node, but not by the cluster.
...

Firewall? Or missmatch in /etc/hosts or DNS??

Udo


OK,
- no firewall,
- no DNS issue at this point,
- same procedure followed as with the last node, except for a full
cluster update before adding the new node and new OSD.



The only oddity is the strange behaviour of the 'pveceph createosd'
command, which was shown in my previous mail.

...
systemd[1]: ceph-disk@dev-sdc1.service: Main process exited, 
code=exited, status=1/FAILURE

systemd[1]: Failed to start Ceph disk activation: /dev/sdc1.
systemd[1]: ceph-disk@dev-sdc1.service: Unit entered failed state.
systemd[1]: ceph-disk@dev-sdc1.service: Failed with result 'exit-code'

What consequences should I expect when switching /etc/hosts from 
public IPs to private IPs? (Apart from a time-travel paradox or a 
black hole bursting...)
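
For context, the Ceph side of that split is the usual pair of options in
/etc/ceph/ceph.conf; the subnets below are placeholders, not my real ones:

[global]
        public network  = 192.168.1.0/24   # mon/client traffic
        cluster network = 10.10.10.0/24    # OSD replication traffic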


Thanks.
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] Broken Ceph Cluster when adding new one - Proxmox 5.0 & Ceph Luminous

2017-07-16 Thread Phil Schwarz

Le 15/07/2017 à 23:09, Udo Lembke a écrit :

Hi,

On 15.07.2017 16:01, Phil Schwarz wrote:

Hi,
...

While investigating, i wondered about my config :
Question relative to /etc/hosts file :
Should i use private_replication_LAN Ip or public ones ?

private_replication_LAN!! And the pve-cluster should use another network
(nics) if possible.

Udo


OK, thanks Udo.

After investigation, I did:
- set noout on the OSDs
- stopped the CPU-pegging LXC containers
- checked the cabling
- restarted the whole cluster

Everything went fine !

But, when i tried to add a new OSD :

fdisk /dev/sdc --> Deleted the partition table
parted /dev/sdc --> mklabel msdos (Disk came from a ZFS FreeBSD system)
dd if=/dev/null of=/dev/sdc
ceph-disk zap /dev/sdc
dd if=/dev/zero  of=/dev/sdc bs=10M count=1000

And recreated the OSD via Web GUI.
Same result, the OSD is known by the node, but not by the cluster.

Logs seem to show an issue with this bluestore OSD, have a look at the file.

I'm going to try recreating the OSD using filestore.

Thanks

pvedaemon[3077]:  starting task 
UPID:varys:7E7D:0004F489:596B5FCE:cephcreateosd:sdc:root@pam:
kernel: [ 3267.263313]  sdc:
systemd[1]: Created slice system-ceph\x2ddisk.slice.
systemd[1]: Starting Ceph disk activation: /dev/sdc2...
sh[1074]: main_trigger: main_trigger: Namespace(cluster='ceph', 
dev='/dev/sdc2', dmcrypt=None, dmcrypt_key_dir='/etc/ceph/dmcrypt-keys', 
func=, log_stdout=True, 
prepend_to_path='/usr/bin', prog='ceph-disk', setgroup=None, setuser=None, 
statedir='/var/lib/ceph', sync=True,
sh[1074]: command: Running command: /sbin/init --version
sh[1074]: command_check_call: Running command: /bin/chown ceph:ceph /dev/sdc2
sh[1074]: command: Running command: /sbin/blkid -o udev -p /dev/sdc2
sh[1074]: command: Running command: /sbin/blkid -o udev -p /dev/sdc2
sh[1074]: main_trigger: trigger /dev/sdc2 parttype 
cafecafe-9b03-4f30-b4c6-b4b80ceff106 uuid 7a6d7546-b93a-452b-9bbc-f660f9a8416c
sh[1074]: command: Running command: /usr/sbin/ceph-disk --verbose 
activate-block /dev/sdc2
systemd[1]: Stopped Ceph disk activation: /dev/sdc2.
systemd[1]: Starting Ceph disk activation: /dev/sdc2...
sh[1074]: main_trigger:
sh[1074]: main_trigger: get_dm_uuid: get_dm_uuid /dev/sdc2 uuid path is 
/sys/dev/block/8:34/dm/uuid
sh[1074]: command: Running command: /sbin/blkid -o udev -p /dev/sdc2
sh[1074]: command: Running command: /usr/bin/ceph-osd --get-device-fsid 
/dev/sdc2
sh[1074]: get_space_osd_uuid: Block /dev/sdc2 has OSD UUID 
----
sh[1074]: main_activate_space: activate: OSD device not present, not starting, 
yet
systemd[1]: Stopped Ceph disk activation: /dev/sdc2.
systemd[1]: Starting Ceph disk activation: /dev/sdc2...
sh[1475]: main_trigger: main_trigger: Namespace(cluster='ceph', 
dev='/dev/sdc2', dmcrypt=None, dmcrypt_key_dir='/etc/ceph/dmcrypt-keys', 
func=, log_stdout=True, 
prepend_to_path='/usr/bin', prog='ceph-disk', setgroup=None, setuser=None, 
statedir='/var/lib/ceph', sync=True,
sh[1475]: command: Running command: /sbin/init --version
sh[1475]: command_check_call: Running command: /bin/chown ceph:ceph /dev/sdc2
sh[1475]: command: Running command: /sbin/blkid -o udev -p /dev/sdc2
sh[1475]: command: Running command: /sbin/blkid -o udev -p /dev/sdc2
sh[1475]: main_trigger: trigger /dev/sdc2 parttype 
cafecafe-9b03-4f30-b4c6-b4b80ceff664 uuid 7a6d7546-b93a-452b-9bbc-f660f9a84664
sh[1475]: command: Running command: /usr/sbin/ceph-disk --verbose 
activate-block /dev/sdc2
kernel: [ 3291.171474]  sdc: sdc1 sdc2
sh[1475]: main_trigger:
sh[1475]: main_trigger: get_dm_uuid: get_dm_uuid /dev/sdc2 uuid path is 
/sys/dev/block/8:34/dm/uuid
sh[1475]: command: Running command: /sbin/blkid -o udev -p /dev/sdc2
sh[1475]: command: Running command: /usr/bin/ceph-osd --get-device-fsid 
/dev/sdc2
sh[1475]: get_space_osd_uuid: Block /dev/sdc2 has OSD UUID 
----
sh[1475]: main_activate_space: activate: OSD device not present, not starting, 
yet
systemd[1]: Stopped Ceph disk activation: /dev/sdc2.
systemd[1]: Starting Ceph disk activation: /dev/sdc2...
sh[1492]: main_trigger: main_trigger: Namespace(cluster='ceph', 
dev='/dev/sdc2', dmcrypt=None, dmcrypt_key_dir='/etc/ceph/dmcrypt-keys', 
func=, log_stdout=True, 
prepend_to_path='/usr/bin', prog='ceph-disk', setgroup=None, setuser=None, 
statedir='/var/lib/ceph', sync=True,
sh[1492]: command: Running command: /sbin/init --version
sh[1492]: command_check_call: Running command: /bin/chown ceph:ceph /dev/sdc2
sh[1492]: command: Running command: /sbin/blkid -o udev -p /dev/sdc2
sh[1492]: command: Running command: /sbin/blkid -o udev -p /dev/sdc2
sh[1492]: main_trigger: trigger /dev/sdc2 parttype 
cafecafe-9b03-4f30-b4c6-b4b80ceff664 uuid 7a6d7546-b93a-452b-9bbc-f660f9a84664
sh[1492]: command: Running command: /usr/sb

[ceph-users] Broken Ceph Cluster when adding new one - Proxmox 5.0 & Ceph Luminous

2017-07-15 Thread Phil Schwarz

Hi,

short version :
I broke my cluster !

Long version, with context:
I have a 4-node Proxmox cluster.
The nodes all run Proxmox 5.05 + Ceph Luminous with filestore:
- 3 mon+OSD
- 1 LXC+OSD

It was working fine.
I added a fifth node (Proxmox+Ceph) today and broke everything...

Though every node can ping every other, the web GUI is full of
red-crossed nodes. No LXC container is shown, though they are up and
alive.

However, every other Proxmox node is manageable through the web GUI.

In the logs, I have tons of the same message on 2 out of 3 mons:

" failed to decode message of type 80 v6: buffer::malformed_input: void 
pg_history_t::decode(ceph::buffer::list::iterator&) unknown encoding 
version > 7"
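
The first thing I intend to check is whether every daemon really runs
the same build (assuming the new Luminous 'ceph versions' command is
already present in this release):

ceph versions    # per-daemon release summary
ceph -s          # overall cluster health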


Thanks for your answers.
Best regards

While investigating, I wondered about my config.
A question relative to the /etc/hosts file:
should I use the private replication LAN IPs or the public ones?
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] Best setup for SSD

2015-06-02 Thread Phil Schwarz
Le 02/06/2015 15:33, Eneko Lacunza a écrit :
> Hi,
> 
> On 02/06/15 15:26, Phil Schwarz wrote:
>> On 02/06/15 14:51, Phil Schwarz wrote:
>>>> i'm gonna have to setup a 4-nodes Ceph(Proxmox+Ceph in fact) cluster.
>>>>
>>>> -1 node is a little HP Microserver N54L with 1X opteron + 2SSD+ 3X 4TB
>>>> SATA
>>>> It'll be used as OSD+Mon server only.
>>> Are these SSDs Intel S3700 too? What amount of RAM?
>> Yes, All DCS3700, for the four nodes.
>> 16GB of RAM on this node.
> This should be enough for 3 OSDs I think, I used to have a Dell
> T20/Intel G3230 with 2x1TB OSDs with only 4 GB running OK.
> 
> Cheers
> Eneko
> 
Yes, indeed.
My main problem is doing something that is not advised:
running VMs on the Ceph nodes...
No choice, it seems that I'll have to do that.
I hope I won't peg the CPU too quickly...
Best regards

___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] Best setup for SSD

2015-06-02 Thread Phil Schwarz
Thanks for your answers; mine are inline, too.

Le 02/06/2015 15:17, Eneko Lacunza a écrit :
> Hi,
> 
> On 02/06/15 14:51, Phil Schwarz wrote:
>> i'm gonna have to setup a 4-nodes Ceph(Proxmox+Ceph in fact) cluster.
>>
>> -1 node is a little HP Microserver N54L with 1X opteron + 2SSD+ 3X 4TB
>> SATA
>> It'll be used as OSD+Mon server only.
> Are these SSDs Intel S3700 too? What amount of RAM?
Yes, All DCS3700, for the four nodes.
16GB of RAM on this node.
>> - 3 nodes are setup upon Dell 730+ 1xXeon 2603, 48 GB RAM, 1x 1TB SAS
>> for OS , 4x 4TB SATA for OSD and 2x DCS3700 200GB intel SSD
>>
>> I can't change the hardware, especially the poor cpu...
>>
>> Everything will be connected through Intel X520+Netgear XS708E, as 10GBE
>> storage network.
>>
>> This cluster will support VM (mostly KVM) upon the 3 R730 nodes.
>> I'm already aware of the CPU pegging all the time...But can't change it
>> for the moment.
>> The VM will be Filesharing servers, poor usage services (DNS,DHCP,AD or
>> OpenLDAP).
>> One Proxy cache (Squid) will be used upon a 100Mb Optical fiber with
>> 500+ clients.
>>
>>
>> My question is :
>> Is it recommended to setup  the 2 SSDS as :
>> One SSD as journal for 2 (up to 3in the future) OSDs
>> Or
>> One SSD as journal for the 4 (up to 6 in the future) OSDs and the
>> remaining SSD as cache tiering for the previous SSD+4 OSDs pool ?
> I haven't used cache tiering myself, but others have not reported much
> benefit from it (if any) at all, at least this is my understanding.
> 
Yes, confirmed by the thread "SSD Disk Distribution".
> So I think it would be better to use both SSDs for journals. It probably
> won't help performance using 2 instead of only 1, but it will lessen the
> impact from a SSD failure. Also it seems that the consensus is 3-4 OSD
> for each SSD, so it will help when you expand to 6 OSD.
Agreed; let's set tiering aside and use journals only.

>> SSD should be rock solid enough to support both bandwidth and living
>> time before being destroyed by the low amount of data that will be
>> written on it (Few hundreds of GB per day as rule of thumb..)
> If all are Intel S3700 you're on the safe side unless you have lots on
> writes. Anyway I suggest you monitor the SMART values.
OK, I'll keep that in mind too.
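
For my own notes, a minimal wear check on these drives would be
something like the following (attribute names differ per vendor, so
this is only a sketch for the Intel DC series):

smartctl -A /dev/sdX | egrep -i 'Media_Wearout_Indicator|Total_LBAs_Written|Reallocated_Sector_Ct'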

Thanks
> 
> Cheers
> Eneko
> 
> 

___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


[ceph-users] Best setup for SSD

2015-06-02 Thread Phil Schwarz
Hi,
I'm going to have to set up a 4-node Ceph (Proxmox+Ceph, in fact) cluster.

- 1 node is a little HP Microserver N54L with 1x Opteron, 2 SSDs and 3x 4TB SATA.
It'll be used as an OSD+mon server only.

- 3 nodes are built on Dell R730s: 1x Xeon 2603, 48 GB RAM, 1x 1TB SAS
for the OS, 4x 4TB SATA for OSDs and 2x Intel DC S3700 200GB SSDs.

I can't change the hardware, especially the poor CPU...

Everything will be connected through Intel X520 NICs and a Netgear
XS708E, as a 10GbE storage network.

This cluster will host VMs (mostly KVM) on the 3 R730 nodes.
I'm already aware the CPU will be pegged all the time... but I can't
change it for the moment.
The VMs will be file-sharing servers and low-usage services (DNS, DHCP,
AD or OpenLDAP).
One proxy cache (Squid) will sit on a 100Mb optical fiber link with
500+ clients.


My question is:
Is it recommended to set up the 2 SSDs as:
one SSD as a journal for 2 (up to 3 in the future) OSDs each,
Or
one SSD as a journal for all 4 (up to 6 in the future) OSDs and the
remaining SSD as a cache tier for the previous SSD+4-OSD pool?

The SSDs should be rock solid enough in both bandwidth and endurance,
given the low amount of data that will be written to them (a few
hundred GB per day as a rule of thumb).
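
For what it's worth, the per-OSD preparation I have in mind for the
journal option would look roughly like this (a sketch only; device
names are placeholders, and ceph-disk carves a journal partition out of
the given SSD by itself):

ceph-disk prepare --fs-type xfs /dev/sdb /dev/sda   # first OSD, journal on SSD /dev/sda
ceph-disk prepare --fs-type xfs /dev/sdc /dev/sda   # second OSD sharing the same journal SSD
ceph-disk activate /dev/sdb1
ceph-disk activate /dev/sdc1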

Thanks
Best regards.
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] what's the difference between pg and pgp?

2015-05-21 Thread Phil Schwarz
Le 21/05/2015 13:49, Ilya Dryomov a écrit :
> On Thu, May 21, 2015 at 12:12 PM, baijia...@126.com  wrote:
>> Re: what's the difference between pg and pgp?
> 
> pg-num is the number of PGs, pgp-num is the number of PGs that will be
> considered for placement, i.e. it's the pgp-num value that is used by
> CRUSH, not pg-num.  For example, consider pg-num = 1024 and pgp-num
> = 1.  In that case you will see 1024 PGs but all of those PGs will map
> to the same set of OSDs.
> 
> When you increase pg-num you are splitting PGs, when you increase
> pgp-num you are moving them, i.e. changing sets of OSDs they map to.
> 
> Thanks,
> 
> Ilya
> ___
> ceph-users mailing list
> ceph-users@lists.ceph.com
> http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
> 
Seems like I said a hugely stupid thing ;-)
I'm going to learn about a feature of Ceph that is new to me...
I apologize.
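
For anyone finding this later, the two knobs Ilya describes are adjusted
with the usual pool commands, e.g. (using a pool named 'rbd' as an
example):

ceph osd pool set rbd pg_num 1024     # split PGs
ceph osd pool set rbd pgp_num 1024    # actually remap them via CRUSH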
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] what's the difference between pg and pgp?

2015-05-21 Thread Phil Schwarz
Le 21/05/2015 11:12, baijia...@126.com a écrit :
>  
>  
> 
> baijia...@126.com
Hi,
weird question...
There's no relationship at all... Oh, yes, a single letter ;-)

pg stands for placement group in Ceph storage;
PGP stands for Pretty Good Privacy, a crypto-based piece of software.

But Wikipedia is your best friend.

Btw, you could use the subject line for... the subject of your email, to
sum up the main idea, and the body of your email for the details, to be
more specific about your question.
Best regards


___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com