Re: [ceph-users] Brand new cluster -- pg is stuck inactive

2017-10-14 Thread dE

On 10/15/2017 03:13 AM, Denes Dolhay wrote:


Hello,

Could you include the monitors and the osds as well to your clock skew 
test?


How did you create the osds? ceph-deploy osd create osd1:/dev/sdX 
osd2:/dev/sdY osd3: /dev/sdZ ?


Some log from one of the osds would be great!


Kind regards,

Denes.


On 10/14/2017 07:39 PM, dE wrote:

On 10/14/2017 08:18 PM, David Turner wrote:


What are the ownership permissions on your osd folders? Clock skew 
cares about partial seconds.


It isn't the networking issue because your cluster isn't stuck 
peering. I'm not sure if the creating state happens in disk or in 
the cluster.



On Sat, Oct 14, 2017, 10:01 AM dE . > wrote:


I attached 1TB disks to each osd.

cluster 8161c90e-dbd2-4491-acf8-74449bef916a
 health HEALTH_ERR
    clock skew detected on mon.1, mon.2

    64 pgs are stuck inactive for more than 300 seconds
    64 pgs stuck inactive
    too few PGs per OSD (21 < min 30)
    Monitor clock skew detected
 monmap e1: 3 mons at
{0=10.247.103.139:8567/0,1=10.247.103.140:8567/0,2=10.247.103.141:8567/0

}
    election epoch 12, quorum 0,1,2 0,1,2
 osdmap e10: 3 osds: 3 up, 3 in
    flags sortbitwise,require_jewel_osds
  pgmap v38: 64 pgs, 1 pools, 0 bytes data, 0 objects
    33963 MB used, 3037 GB / 3070 GB avail
  64 creating
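
Two things worth checking when every pg sits in "creating" with an empty
acting set (standard commands, shown here only as a pointer):

ceph osd tree              # are the osds under a host/root bucket with non-zero CRUSH weight?
ceph osd crush rule dump   # can the rule used by the pool actually map to those osds?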

I don't seem to have any clock skew --
for i in {139..141}; do ssh $i date +%s; done
1507989554
1507989554
1507989554


On Sat, Oct 14, 2017 at 6:41 PM, David Turner
> wrote:

What is the output of your `ceph status`?


On Fri, Oct 13, 2017, 10:09 PM dE > wrote:

On 10/14/2017 12:53 AM, David Turner wrote:

What does your environment look like?  Someone recently
on the mailing list had PGs stuck creating because of a
networking issue.

On Fri, Oct 13, 2017 at 2:03 PM Ronny Aasen
> wrote:

strange that no osd is acting for your pg's
can you show the output from
ceph osd tree


mvh
Ronny Aasen



On 13.10.2017 18:53, dE wrote:
> Hi,
>
>     I'm running ceph 10.2.5 on Debian (official
package).
>
> It cant seem to create any functional pools --
>
> ceph health detail
> HEALTH_ERR 64 pgs are stuck inactive for more
than 300 seconds; 64 pgs
> stuck inactive; too few PGs per OSD (21 < min 30)
> pg 0.39 is stuck inactive for 652.741684, current
state creating, last
> acting []
> pg 0.38 is stuck inactive for 652.741688, current
state creating, last
> acting []
> pg 0.37 is stuck inactive for 652.741690, current
state creating, last
> acting []
> pg 0.36 is stuck inactive for 652.741692, current
state creating, last
> acting []
> pg 0.35 is stuck inactive for 652.741694, current
state creating, last
> acting []
> pg 0.34 is stuck inactive for 652.741696, current
state creating, last
> acting []
> pg 0.33 is stuck inactive for 652.741698, current
state creating, last
> acting []
> pg 0.32 is stuck inactive for 652.741701, current
state creating, last
> acting []
> pg 0.3 is stuck inactive for 652.741762, current
state creating, last
> acting []
> pg 0.2e is stuck inactive for 652.741715, current
state creating, last
> acting []
> pg 0.2d is stuck inactive for 652.741719, current
state creating, last
> acting []
> pg 0.2c is stuck inactive for 652.741721, current
state creating, last
> acting []
> pg 0.2b is stuck inactive for 652.741723, current
state creating, last
> acting []
> pg 0.2a is stuck inactive for 652.741725, current
state creating, last
> acting []
> pg 0.29 is stuck inactive for 652.741727, current

Re: [ceph-users] osd max scrubs not honored?

2017-10-14 Thread J David
On Sat, Oct 14, 2017 at 9:33 AM, David Turner  wrote:
> First, there is no need to deep scrub your PGs every 2 days.

They aren't being deep scrubbed every two days, nor is there any
attempt (or desire) to do so.  That would require 8+ scrubs running
at once.  Currently, it takes between 2 and 3 *weeks* to deep scrub
every PG one at a time with no breaks.  Perhaps you misread "48 days"
as "48 hours"?

As long as having one deep scrub running renders the cluster unusable,
the frequency of deep scrubs doesn’t really matter; “ever” is too
often.  If that issue can be resolved, the cron script we wrote will
scrub all the PG’s over a period of 28 days.
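
(For anyone curious, a rough sketch of that approach -- not the actual script
from this thread, and the output parsing may need adjusting for your Ceph
release -- looks like this, run from cron at a fixed interval so the cycle
covers all PGs in about 28 days:)

#!/bin/bash
# Deep-scrub one PG per invocation, cycling through the full PG list.
STATE=/var/lib/ceph/deep-scrub.pos
pgs=($(ceph pg ls | awk '$1 ~ /^[0-9]+\.[0-9a-f]+$/ {print $1}'))
idx=$(cat "$STATE" 2>/dev/null || echo 0)
[ "$idx" -ge "${#pgs[@]}" ] && idx=0
ceph pg deep-scrub "${pgs[$idx]}"
echo $((idx + 1)) > "$STATE"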

> I'm thinking your 1GB is either a typo for a 1TB disk or that your DB
> partitions are 1GB each.

That is a typo, yes.  The SSDs are 100GB (really about 132GB, with
overprovisioning), and each one has three 30GB partitions, one for
each OSD on that host.  These SSDs perform excellently in testing and
in other applications.  They are being utilized <1% of their I/O
capacity (by both IOPS and throughput) by this ceph cluster.  So far
we haven't seen anything suggesting there's a problem with these
drives.

> Third, when talking of a distributed storage system you can never assume it
> isn’t the network.

No assumption is necessary; the network has been exhaustively tested,
both with and without ceph running, both with and without LACP.

The network topology is dirt simple.  There’s a dedicated 10Gbps
switch with six two-port LACP bonds connected to five ceph nodes, one
client, and nothing else.  There are no interface errors, overruns, link
failures or LACP errors on any of the cluster nodes or on the switch.
Like the SSDs (and the CPUs, and the RAM), the network passes all
tests thrown at it and is being utilized by ceph to a very small
fraction of its demonstrated capacity.

But, it’s not a sticking point.  The LAN has now been reconfigured to
remove LACP and use each of the ceph nodes’ 10Gbps interfaces
individually, one as public network, one as cluster network, with
separate VLANs on the switch.  That’s all confirmed to have taken
effect after a full shutdown and restart of all five nodes and the
client.

That change had no effect on this issue.

With that change made, the network was re-tested by setting up 20
simultaneous iperf sessions, 10 clients and 10 servers, with each
machine participating in 4 10-minute tests at once: inbound public
network, outbound public network, inbound cluster network, outbound
cluster network.  With all 20 tests running simultaneously, the
average throughput per test was 7.5Gbps. (With 10 unidirectional
tests, the average throughput is over 9Gbps.)
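
(The individual test pairs were plain iperf runs of roughly this form; the
address below is a placeholder:)

iperf -s -p 5001                       # on the receiving node
iperf -c 10.0.0.2 -p 5001 -t 600       # on the sending node, one per interface under test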

The client (participating only on the public network) was separately
tested.  With five sequential runs, each run testing inbound and
outbound simultaneously between the client and one of the five ceph
nodes, in each case, the results were over 7Gbps in each direction.

No loss, errors or drops were observed on any interface, nor on the
switch, during either test.

So it does not appear that there are any network problems contributing
to the issue.

Thanks!
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] Brand new cluster -- pg is stuck inactive

2017-10-14 Thread Denes Dolhay

Hello,

Could you include the monitors and the osds as well to your clock skew test?
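
For example, something along these lines (just a sketch; mon1..osd3 are
placeholders for your hosts, and ceph time-sync-status is available on
recent releases):

for h in mon1 mon2 mon3 osd1 osd2 osd3; do
    printf '%s: ' "$h"; ssh "$h" date +%s.%N
done
ceph time-sync-status    # mon-side view of clock offsets, if your version has it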

How did you create the osds? ceph-deploy osd create osd1:/dev/sdX 
osd2:/dev/sdY osd3: /dev/sdZ ?


Some log from one of the osds would be great!


Kind regards,

Denes.


On 10/14/2017 07:39 PM, dE wrote:

On 10/14/2017 08:18 PM, David Turner wrote:


What are the ownership permissions on your osd folders? Clock skew 
cares about partial seconds.


It isn't the networking issue because your cluster isn't stuck 
peering. I'm not sure if the creating state happens in disk or in the 
cluster.



On Sat, Oct 14, 2017, 10:01 AM dE . > wrote:


I attached 1TB disks to each osd.

cluster 8161c90e-dbd2-4491-acf8-74449bef916a
 health HEALTH_ERR
    clock skew detected on mon.1, mon.2

    64 pgs are stuck inactive for more than 300 seconds
    64 pgs stuck inactive
    too few PGs per OSD (21 < min 30)
    Monitor clock skew detected
 monmap e1: 3 mons at
{0=10.247.103.139:8567/0,1=10.247.103.140:8567/0,2=10.247.103.141:8567/0

}
    election epoch 12, quorum 0,1,2 0,1,2
 osdmap e10: 3 osds: 3 up, 3 in
    flags sortbitwise,require_jewel_osds
  pgmap v38: 64 pgs, 1 pools, 0 bytes data, 0 objects
    33963 MB used, 3037 GB / 3070 GB avail
  64 creating

I don't seem to have any clock skew --
for i in {139..141}; do ssh $i date +%s; done
1507989554
1507989554
1507989554


On Sat, Oct 14, 2017 at 6:41 PM, David Turner
> wrote:

What is the output of your `ceph status`?


On Fri, Oct 13, 2017, 10:09 PM dE > wrote:

On 10/14/2017 12:53 AM, David Turner wrote:

What does your environment look like?  Someone recently
on the mailing list had PGs stuck creating because of a
networking issue.

On Fri, Oct 13, 2017 at 2:03 PM Ronny Aasen
> wrote:

strange that no osd is acting for your pg's
can you show the output from
ceph osd tree


mvh
Ronny Aasen



On 13.10.2017 18:53, dE wrote:
> Hi,
>
>     I'm running ceph 10.2.5 on Debian (official
package).
>
> It cant seem to create any functional pools --
>
> ceph health detail
> HEALTH_ERR 64 pgs are stuck inactive for more than
300 seconds; 64 pgs
> stuck inactive; too few PGs per OSD (21 < min 30)
> pg 0.39 is stuck inactive for 652.741684, current
state creating, last
> acting []
> pg 0.38 is stuck inactive for 652.741688, current
state creating, last
> acting []
> pg 0.37 is stuck inactive for 652.741690, current
state creating, last
> acting []
> pg 0.36 is stuck inactive for 652.741692, current
state creating, last
> acting []
> pg 0.35 is stuck inactive for 652.741694, current
state creating, last
> acting []
> pg 0.34 is stuck inactive for 652.741696, current
state creating, last
> acting []
> pg 0.33 is stuck inactive for 652.741698, current
state creating, last
> acting []
> pg 0.32 is stuck inactive for 652.741701, current
state creating, last
> acting []
> pg 0.3 is stuck inactive for 652.741762, current
state creating, last
> acting []
> pg 0.2e is stuck inactive for 652.741715, current
state creating, last
> acting []
> pg 0.2d is stuck inactive for 652.741719, current
state creating, last
> acting []
> pg 0.2c is stuck inactive for 652.741721, current
state creating, last
> acting []
> pg 0.2b is stuck inactive for 652.741723, current
state creating, last
> acting []
> pg 0.2a is stuck inactive for 652.741725, current
state creating, last
> acting []
> pg 0.29 is stuck inactive for 652.741727, current
state creating, last
> 

Re: [ceph-users] Ceph iSCSI login failed due to authorization failure

2017-10-14 Thread Maged Mokhtar
On 2017-10-14 17:50, Kashif Mumtaz wrote:

> Hello Dear, 
> 
> I am trying to configure the Ceph iscsi gateway on Ceph Luminious . As per 
> below 
> 
> Ceph iSCSI Gateway — Ceph Documentation
> 
> Ceph is iscsi gateway are configured and chap auth is set. 
> 
> /> ls 
> o- / 
> .
>  [...] 
> o- clusters 
> 
>  [Clusters: 1] 
> | o- ceph 
> ..
>  [HEALTH_WARN] 
> |   o- pools 
> ..
>  [Pools: 2] 
> |   | o- kashif 
> . [Commit: 
> 0b, Avail: 116G, Used: 1K, Commit%: 0%] 
> |   | o- rbd 
> ... [Commit: 
> 10G, Avail: 116G, Used: 3K, Commit%: 8%] 
> |   o- topology 
> ...
>  [OSDs: 13,MONs: 3] 
> o- disks 
> .
>  [10G, Disks: 1] 
> | o- rbd.disk_1 
> ...
>  [disk_1 (10G)] 
> o- iscsi-target 
> .
>  [Targets: 1] 
> o- iqn.2003-01.com.redhat.iscsi-gw:tahir 
> . 
> [Gateways: 2] 
> o- gateways 
> 
>  [Up: 2/2, Portals: 2] 
> | o- gateway 
> 
>  [192.168.10.37 (UP)] 
> | o- gateway2 
> ...
>  [192.168.10.38 (UP)] 
> o- hosts 
> ..
>  [Hosts: 1] 
> o- iqn.1994-05.com.redhat:rh7-client 
> ... [Auth: CHAP, 
> Disks: 1(10G)] 
> o- lun 0 
> ..
>  [rbd.disk_1(10G), Owner: gateway2] 
> /> 
> 
> But initiators are unable to mount it. Try both ion Linux and ESXi 6. 
> 
> Below is the  error message on iscsi gateway server log file. 
> 
> Oct 14 19:34:49 gateway kernel: iSCSI Initiator Node: 
> iqn.1998-01.com.vmware:esx0-36c45c69 is not authorized to access iSCSI target 
> portal group: 1. 
> Oct 14 19:34:49 gateway kernel: iSCSI Login negotiation failed. 
> 
> Oct 14 19:35:27 gateway kernel: iSCSI Initiator Node: 
> iqn.1994-05.com.redhat:5ef55740c576 is not authorized to access iSCSI target 
> portal group: 1. 
> Oct 14 19:35:27 gateway kernel: iSCSI Login negotiation failed. 
> 
> I am giving the ceph authentication on initiator side.
> 
> Discovery on initiator is happening  
> 
> root@server1 ~]# iscsiadm -m discovery -t st -p  192.168.10.37 
> 192.168.10.37:3260,1 iqn.2003-01.com.redhat.iscsi-gw:tahir 
> 192.168.10.38:3260,2 iqn.2003-01.com.redhat.iscsi-gw:tahir 
> 
> But when trying to login , it is giving  "iSCSI login failed due to 
> authorization failure" 
> 
> [root@server1 ~]# iscsiadm -m node -T iqn.2003-01.com.redhat.iscsi-gw:tahir  
> -l 
> Logging in to [iface: default, target: iqn.2003-01.com.redhat.iscsi-gw:tahir, 
> portal: 192.168.10.37,3260] (multiple) 
> Logging in to [iface: default, target: iqn.2003-01.com.redhat.iscsi-gw:tahir, 
> portal: 192.168.10.38,3260] (multiple) 
> iscsiadm: Could not login to [iface: default, target: 
> iqn.2003-01.com.redhat.iscsi-gw:tahir, portal: 192.168.10.37,3260]. 
> iscsiadm: initiator reported error (24 - iSCSI login failed due to 
> authorization failure) 
> iscsiadm: Could not login to [iface: default, target: 
> iqn.2003-01.com.redhat.iscsi-gw:tahir, portal: 192.168.10.38,3260]. 
> iscsiadm: initiator reported error (24 - iSCSI login failed due to 
> authorization failure) 
> iscsiadm: Could not log into all portals 
> 
> Can someone give the idea what is missing. 
> 
> ___
> ceph-users mailing list
> ceph-users@lists.ceph.com
> http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com

This is a bit different from the LIO version I know, but it seems the
client initiator configured on the target is
iqn.1994-05.com.redhat:rh7-client
whereas you are trying to log in with:

iqn.1998-01.com.vmware:esx0-36c45c69, which is not authorized.
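
For a Linux initiator you can check (and, if needed, change) the IQN it
presents; the gateway's hosts/ entry must match it, or the initiator must be
added there:

cat /etc/iscsi/initiatorname.iscsi
# e.g.  InitiatorName=iqn.1994-05.com.redhat:rh7-client
# after editing this file, restart iscsid before retrying the login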

Re: [ceph-users] Brand new cluster -- pg is stuck inactive

2017-10-14 Thread dE

On 10/14/2017 08:18 PM, David Turner wrote:


What are the ownership permissions on your osd folders? Clock skew 
cares about partial seconds.


It isn't the networking issue because your cluster isn't stuck 
peering. I'm not sure if the creating state happens in disk or in the 
cluster.



On Sat, Oct 14, 2017, 10:01 AM dE . > wrote:


I attached 1TB disks to each osd.

cluster 8161c90e-dbd2-4491-acf8-74449bef916a
 health HEALTH_ERR
    clock skew detected on mon.1, mon.2

    64 pgs are stuck inactive for more than 300 seconds
    64 pgs stuck inactive
    too few PGs per OSD (21 < min 30)
    Monitor clock skew detected
 monmap e1: 3 mons at
{0=10.247.103.139:8567/0,1=10.247.103.140:8567/0,2=10.247.103.141:8567/0

}
    election epoch 12, quorum 0,1,2 0,1,2
 osdmap e10: 3 osds: 3 up, 3 in
    flags sortbitwise,require_jewel_osds
  pgmap v38: 64 pgs, 1 pools, 0 bytes data, 0 objects
    33963 MB used, 3037 GB / 3070 GB avail
  64 creating

I don't seem to have any clock skew --
for i in {139..141}; do ssh $i date +%s; done
1507989554
1507989554
1507989554


On Sat, Oct 14, 2017 at 6:41 PM, David Turner
> wrote:

What is the output of your `ceph status`?


On Fri, Oct 13, 2017, 10:09 PM dE > wrote:

On 10/14/2017 12:53 AM, David Turner wrote:

What does your environment look like?  Someone recently
on the mailing list had PGs stuck creating because of a
networking issue.

On Fri, Oct 13, 2017 at 2:03 PM Ronny Aasen
> wrote:

strange that no osd is acting for your pg's
can you show the output from
ceph osd tree


mvh
Ronny Aasen



On 13.10.2017 18:53, dE wrote:
> Hi,
>
>     I'm running ceph 10.2.5 on Debian (official
package).
>
> It cant seem to create any functional pools --
>
> ceph health detail
> HEALTH_ERR 64 pgs are stuck inactive for more than
300 seconds; 64 pgs
> stuck inactive; too few PGs per OSD (21 < min 30)
> pg 0.39 is stuck inactive for 652.741684, current
state creating, last
> acting []
> pg 0.38 is stuck inactive for 652.741688, current
state creating, last
> acting []
> pg 0.37 is stuck inactive for 652.741690, current
state creating, last
> acting []
> pg 0.36 is stuck inactive for 652.741692, current
state creating, last
> acting []
> pg 0.35 is stuck inactive for 652.741694, current
state creating, last
> acting []
> pg 0.34 is stuck inactive for 652.741696, current
state creating, last
> acting []
> pg 0.33 is stuck inactive for 652.741698, current
state creating, last
> acting []
> pg 0.32 is stuck inactive for 652.741701, current
state creating, last
> acting []
> pg 0.3 is stuck inactive for 652.741762, current
state creating, last
> acting []
> pg 0.2e is stuck inactive for 652.741715, current
state creating, last
> acting []
> pg 0.2d is stuck inactive for 652.741719, current
state creating, last
> acting []
> pg 0.2c is stuck inactive for 652.741721, current
state creating, last
> acting []
> pg 0.2b is stuck inactive for 652.741723, current
state creating, last
> acting []
> pg 0.2a is stuck inactive for 652.741725, current
state creating, last
> acting []
> pg 0.29 is stuck inactive for 652.741727, current
state creating, last
> acting []
> pg 0.28 is stuck inactive for 652.741730, current
state creating, last
> acting []
> pg 0.27 is stuck inactive for 652.741732, current
state creating, last
> acting []
> 

Re: [ceph-users] Ceph iSCSI login failed due to authorization failure

2017-10-14 Thread Jason Dillaman
Have you set the CHAP username and password on both sides (and ensured that
the initiator IQN matches)? On the initiator side, you would run the
following before attempting to log into the portal:

iscsiadm   --mode node  --targetname  --op=update --name
node.session.auth.authmethod --value=CHAP
iscsiadm   --mode node  --targetname  --op=update --name
node.session.auth.username --value=
iscsiadm   --mode node  --targetname  --op=update --name
node.session.auth.password --value=
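
Purely as an illustration, using the target IQN from your output (the
username and password here are placeholders):

iscsiadm --mode node --targetname iqn.2003-01.com.redhat.iscsi-gw:tahir \
  --op=update --name node.session.auth.authmethod --value=CHAP
iscsiadm --mode node --targetname iqn.2003-01.com.redhat.iscsi-gw:tahir \
  --op=update --name node.session.auth.username --value=chapuser
iscsiadm --mode node --targetname iqn.2003-01.com.redhat.iscsi-gw:tahir \
  --op=update --name node.session.auth.password --value=chapsecret
iscsiadm --mode node --targetname iqn.2003-01.com.redhat.iscsi-gw:tahir --login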



On Sat, Oct 14, 2017 at 11:50 AM, Kashif Mumtaz 
wrote:

> Hello Dear,
>
> I am trying to configure the Ceph iscsi gateway on Ceph Luminious . As per
> below
>
> Ceph iSCSI Gateway — Ceph Documentation
>
> Ceph is iscsi gateway are configured and chap auth is set.
>
>
>
>
> /> ls
> o- / 
> . [...]
>   o- clusters 
>  [Clusters: 1]
>   | o- ceph 
> .. [HEALTH_WARN]
>   |   o- pools ..
> 
> [Pools: 2]
>   |   | o- kashif ..
> ... [Commit: 0b, Avail: 116G, Used: 1K,
> Commit%: 0%]
>   |   | o- rbd ..
> . [Commit: 10G, Avail: 116G, Used:
> 3K, Commit%: 8%]
>   |   o- topology ..
> . [OSDs:
> 13,MONs: 3]
>   o- disks 
> . [10G, Disks: 1]
>   | o- rbd.disk_1 ..
> .
> [disk_1 (10G)]
>   o- iscsi-target ..
> ...
> [Targets: 1]
> o- iqn.2003-01.com.redhat.iscsi-gw:tahir
> .
> [Gateways: 2]
>   o- gateways ..
> .. [Up: 2/2,
> Portals: 2]
>   | o- gateway ..
> ..
> [192.168.10.37 (UP)]
>   | o- gateway2 ..
> .
> [192.168.10.38 (UP)]
>   o- hosts ..
> 
> [Hosts: 1]
> o- iqn.1994-05.com.redhat:rh7-client
> ... [Auth: CHAP,
> Disks: 1(10G)]
>   o- lun 0 ..
>  [rbd.disk_1(10G), Owner:
> gateway2]
> />
>
>
>
> But initiators are unable to mount it. Try both ion Linux and ESXi 6.
>
>
>
> Below is the  error message on iscsi gateway server log file.
>
> Oct 14 19:34:49 gateway kernel: iSCSI Initiator Node:
> iqn.1998-01.com.vmware:esx0-36c45c69 is not authorized to access iSCSI
> target portal group: 1.
> Oct 14 19:34:49 gateway kernel: iSCSI Login negotiation failed.
>
> Oct 14 19:35:27 gateway kernel: iSCSI Initiator Node:
> iqn.1994-05.com.redhat:5ef55740c576 is not authorized to access iSCSI
> target portal group: 1.
> Oct 14 19:35:27 gateway kernel: iSCSI Login negotiation failed.
>
>
> I am giving the ceph authentication on initiator side.
>
> Discovery on initiator is happening
>
> root@server1 ~]# iscsiadm -m discovery -t st -p  192.168.10.37
> 192.168.10.37:3260,1 iqn.2003-01.com.redhat.iscsi-gw:tahir
> 192.168.10.38:3260,2 iqn.2003-01.com.redhat.iscsi-gw:tahir
>
> But when trying to login , it is giving  "iSCSI login failed due to
> authorization failure"
>
>
> [root@server1 ~]# iscsiadm -m node -T iqn.2003-01.com.redhat.iscsi-gw:tahir
> -l
> Logging in to [iface: default, target: iqn.2003-01.com.redhat.iscsi-gw:tahir,
> portal: 192.168.10.37,3260] (multiple)
> Logging in to [iface: default, target: iqn.2003-01.com.redhat.iscsi-gw:tahir,
> portal: 192.168.10.38,3260] (multiple)
> iscsiadm: Could not login to [iface: default, target:
> iqn.2003-01.com.redhat.iscsi-gw:tahir, portal: 192.168.10.37,3260].
> iscsiadm: initiator reported error (24 - iSCSI login failed due to
> authorization failure)
> iscsiadm: Could not login to [iface: default, target:
> iqn.2003-01.com.redhat.iscsi-gw:tahir, portal: 192.168.10.38,3260].
> iscsiadm: 

[ceph-users] Backup VM (Base image + snapshot)

2017-10-14 Thread Oscar Segarra
Hi,

In my VDI environment I have configured the suggested ceph
design/architecture:

http://docs.ceph.com/docs/giant/rbd/rbd-snapshot/

Where I have a Base Image + Protected Snapshot + 100 clones (one for each
persistent VDI).

Now, I'd like to configure a backup script/mechanism to perform backups of
each persistent VDI VM to an external (non ceph) device, like NFS or
something similar...

Then, some questions:

1.- Has anybody been able to do this kind of backup?
2.- Is it possible to export the Base Image in qcow2 format and the snapshots
in qcow2 format as well, as "linked clones"?
3.- Is it possible to export the Base Image in raw format and the snapshots in
raw format as well and, when recovery is required, import both images and
"relink" them?
4.- What is the suggested solution for this scenario?
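
(For context, one possible approach using plain rbd commands and raw format
rather than qcow2 -- only a sketch, with placeholder pool, image, and
snapshot names:)

# one-time full export of a clone to the NFS mount
rbd export vdi-pool/vm01-disk /mnt/nfs/vm01-disk.full
# later, incremental backups between two backup snapshots
rbd snap create vdi-pool/vm01-disk@bk-20171014
rbd export-diff --from-snap bk-20171007 \
    vdi-pool/vm01-disk@bk-20171014 /mnt/nfs/vm01-disk.20171014.diff
# restore: rbd import the full file, then rbd import-diff each diff in order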

Thanks a lot everybody!
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


[ceph-users] Ceph iSCSI login failed due to authorization failure

2017-10-14 Thread Kashif Mumtaz
Hello Dear,
I am trying to configure the Ceph iSCSI gateway on Ceph Luminous, as per:
Ceph iSCSI Gateway — Ceph Documentation

  


Ceph iSCSI gateways are configured and CHAP auth is set.



/> ls
o- / ................................................................ [...]
  o- clusters .............................................. [Clusters: 1]
  | o- ceph ............................................... [HEALTH_WARN]
  |   o- pools ................................................ [Pools: 2]
  |   | o- kashif ........ [Commit: 0b, Avail: 116G, Used: 1K, Commit%: 0%]
  |   | o- rbd .......... [Commit: 10G, Avail: 116G, Used: 3K, Commit%: 8%]
  |   o- topology ..................................... [OSDs: 13,MONs: 3]
  o- disks ............................................... [10G, Disks: 1]
  | o- rbd.disk_1 ......................................... [disk_1 (10G)]
  o- iscsi-target .......................................... [Targets: 1]
    o- iqn.2003-01.com.redhat.iscsi-gw:tahir ............... [Gateways: 2]
      o- gateways ................................... [Up: 2/2, Portals: 2]
      | o- gateway .................................. [192.168.10.37 (UP)]
      | o- gateway2 ................................. [192.168.10.38 (UP)]
      o- hosts ................................................ [Hosts: 1]
        o- iqn.1994-05.com.redhat:rh7-client ... [Auth: CHAP, Disks: 1(10G)]
          o- lun 0 ..................... [rbd.disk_1(10G), Owner: gateway2]
/>


But initiators are unable to mount it. Tried both on Linux and ESXi 6.


Below is the error message in the iSCSI gateway server log file:

Oct 14 19:34:49 gateway kernel: iSCSI Initiator Node:
iqn.1998-01.com.vmware:esx0-36c45c69 is not authorized to access iSCSI target
portal group: 1.
Oct 14 19:34:49 gateway kernel: iSCSI Login negotiation failed.

Oct 14 19:35:27 gateway kernel: iSCSI Initiator Node:
iqn.1994-05.com.redhat:5ef55740c576 is not authorized to access iSCSI target
portal group: 1.
Oct 14 19:35:27 gateway kernel: iSCSI Login negotiation failed.

I am giving the ceph authentication on the initiator side.

Discovery on the initiator is working:

root@server1 ~]# iscsiadm -m discovery -t st -p 192.168.10.37
192.168.10.37:3260,1 iqn.2003-01.com.redhat.iscsi-gw:tahir
192.168.10.38:3260,2 iqn.2003-01.com.redhat.iscsi-gw:tahir
But when trying to log in, it gives "iSCSI login failed due to
authorization failure":

[root@server1 ~]# iscsiadm -m node -T iqn.2003-01.com.redhat.iscsi-gw:tahir -l
Logging in to [iface: default, target: iqn.2003-01.com.redhat.iscsi-gw:tahir,
portal: 192.168.10.37,3260] (multiple)
Logging in to [iface: default, target: iqn.2003-01.com.redhat.iscsi-gw:tahir,
portal: 192.168.10.38,3260] (multiple)
iscsiadm: Could not login to [iface: default, target:
iqn.2003-01.com.redhat.iscsi-gw:tahir, portal: 192.168.10.37,3260].
iscsiadm: initiator reported error (24 - iSCSI login failed due to
authorization failure)
iscsiadm: Could not login to [iface: default, target:
iqn.2003-01.com.redhat.iscsi-gw:tahir, portal: 192.168.10.38,3260].
iscsiadm: initiator reported error (24 - iSCSI login failed due to
authorization failure)
iscsiadm: Could not log into all portals

Can someone give an idea of what is missing?




___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] assert(objiter->second->version > last_divergent_update) when testing pull out disk and insert

2017-10-14 Thread zhaomingyue
1. This assert happened sporadically and is not easy to reproduce. In fact, I
also suspect it is caused by lost device data; but if data were lost, how could
it happen that (last_update + 1 == log.rbegin.version)? With lost data I would
expect something far more confused than that. At the moment I can't reason this
situation through clearly.

2. Based on the read_log code, assume the following situation:
when the OSD starts, if the pg log has lost some entries because of a power-off
or an xfs error, then log.head in memory would be bigger than
log.rbegin.version. During peering, last_update is used as a deciding argument
for find_best, so the "consistent" OSD (one with a shorter pg log but a normal
last_update) may become the authoritative log. If the other OSDs then follow
this authoritative log, wouldn't a scrub of the pg find it inconsistent?



-----Original Message-----
From: Gregory Farnum [mailto:gfar...@redhat.com]
Sent: October 14, 2017 0:34
To: zhaomingyue 09440 (RD)
Cc: ceph-de...@vger.kernel.org; ceph-us...@ceph.com
Subject: Re: [ceph-users] assert(objiter->second->version > last_divergent_update)
when testing pull out disk and insert

On Fri, Oct 13, 2017 at 12:48 AM, zhaomingyue  wrote:
> Hi:
> I had met an assert problem like
> bug16279(http://tracker.ceph.com/issues/16279) when testing pull out
> disk and insert, ceph version 10.2.5,assert(objiter->second->version >
> last_divergent_update)
>
> according to osd log,I think this maybe due to (log.head !=
> *log.log.rbegin.version.version) when some abnormal condition
> happened,such as power off ,pull out disk and insert.

I don't think this is supposed to be possible. We apply all changes like this
atomically; FileStore does all its journaling to prevent partial updates like 
this.

A few other people have reported the same issue on disk pull, so maybe there's 
some *other* issue going on, but the correct fix is by preventing those two 
from differing (unless I misunderstand the context).

Given one of the reporters on that ticket confirms they also had xfs issues, I 
find it vastly more likely that something in your kernel configuration and 
hardware stack is not writing out data the way it claims to. Be very, very sure 
all that is working correctly!


> In below situation, merge_log would push 234’1034 into divergent
> list;and divergent has only one node;then lead to
> assert(objiter->second->version > last_divergent_update).
>
> olog     (0’0, 234’1034)  olog.head = 234’1034
>
> log      (0’0, 234’1034)  log.head = 234’1033
>
>
>
> I see osd load_pgs code,in function PGLog::read_log() , code like this:
>  .
>  for (p->seek_to_first(); p->valid() ; p->next()) {
>
> .
>
> log.log.push_back(e);
>
> log.head = e.version;  // every pg log node
>
>   }
>
> .
>
>  log.head = info.last_update;
>
>
>
> two doubt:
>
> first : why set (log.head = info.last_update) after all pg log node
> processed(every node has updated log.head = e.version)?
>
> second: Whether it can occur that info.last_update is less than
> *log.log.rbegin.version or not and what scene happens?

I'm looking at the luminous code base right now and things have changed a bit 
so I don't have the specifics of your question on hand.

But the general reason we change these versions around is because we need to 
reconcile the logs across all OSDs. If one OSD has an entry for an operation 
that was never returned to the client, we may need to declare it divergent and 
undo it. (In replicated pools, entries are only divergent if the OSD hosting it 
was either netsplit from the primary, or else managed to commit something 
during a failure event that its peers didn't and then was resubmitted under a 
different ID by the client on recovery. In erasure-coded pools things are more 
complicated because we can only roll operations forward if a quorum of the 
shards are present.) -Greg
-
This e-mail and its attachments contain confidential information from New H3C, 
which is
intended only for the person or entity whose address is listed above. Any use 
of the
information contained herein in any way (including, but not limited to, total 
or partial
disclosure, reproduction, or dissemination) by persons other than the intended
recipient(s) is prohibited. If you receive this e-mail in error, please notify 
the sender
by phone or email immediately and delete it!
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] osd max scrubs not honored?

2017-10-14 Thread David Turner
A few things. First, there is no need to deep scrub your PGs every 2 days.
Schedule it out so it's closer to a month or so. If you have a really bad
power hiccup, up the schedule to check for consistency.

Second, you said "Intel SSD DC S3700 1GB divided into three partitions used
for Bluestore blocks.db for each OSD". How large are the partitions for
each osd? What percentage of the available space of the SSD is in use? This
model of SSD over-provisions nicely, but you can always help out by not
provisioning all of it. However DB partitions should be given a good size.
I'm thinking your 1GB is either a typo for a 1TB disk or that your DB
partitions are 1GB each.
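
(One way to see how much of each DB partition an OSD is actually using, run on
the OSD host; counter names can vary slightly between releases:)

ceph daemon osd.0 perf dump bluefs | grep -E 'db_total_bytes|db_used_bytes'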

Third, when talking of a distributed storage system you can never assume it
isn't the network.  You should really consider disabling your bond and
testing with a single nic between all of your hosts. This would not be the
first time I've seen a bonded network cause issues at least this bad on a
cluster. Do you have cluster_network and public_network set? What does your
network topology look like?
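
(If they aren't set, they go in ceph.conf along these lines; the subnets below
are placeholders:)

[global]
public_network  = 192.168.1.0/24
cluster_network = 192.168.2.0/24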

On Fri, Oct 13, 2017, 11:02 PM J David  wrote:

> Thanks all for input on this.
>
> It’s taken a couple of weeks, but based on the feedback from the list,
> we’ve got our version of a scrub-one-at-a-time cron script running and
> confirmed that it’s working properly.
>
> Unfortunately, this hasn’t really solved the real problem.  Even with
> just one scrub and one client running, client I/O requests routinely
> take 30-60 seconds to complete (read or write), which is so poor that
> the cluster is unusable for any sort of interactive activity.  Nobody
> is going to sit around and wait 30-60 seconds for a file to save or
> load, or for a web server to respond, or a SQL query to finish.
>
> Running “ceph -w” blames this on slow requests blocked for > 32 seconds:
>
> 2017-10-13 21:21:34.445798 mon.ceph1 [INF] overall HEALTH_OK
> 2017-10-13 21:21:51.305661 mon.ceph1 [WRN] Health check failed: 42
> slow requests are blocked > 32 sec (REQUEST_SLOW)
> 2017-10-13 21:21:57.311892 mon.ceph1 [WRN] Health check update: 140
> slow requests are blocked > 32 sec (REQUEST_SLOW)
> 2017-10-13 21:22:03.343443 mon.ceph1 [WRN] Health check update: 111
> slow requests are blocked > 32 sec (REQUEST_SLOW)
> 2017-10-13 21:22:01.833605 osd.5 [WRN] 1 slow requests, 1 included
> below; oldest blocked for > 30.526819 secs
> 2017-10-13 21:22:01.833614 osd.5 [WRN] slow request 30.526819 seconds
> old, received at 2017-10-13 21:21:31.306718:
> osd_op(client.6104975.0:7330926 0.a2
> 0:456218c9:::rbd_data.1a24832ae8944a.0009d21d:head
> [set-alloc-hint object_size 4194304 write_size 4194304,write
> 2364416~88064] snapc 0=[] ondisk+write+known_if_redirected e18866)
> currently sub_op_commit_rec from 9
> 2017-10-13 21:22:11.238561 mon.ceph1 [WRN] Health check update: 24
> slow requests are blocked > 32 sec (REQUEST_SLOW)
> 2017-10-13 21:22:04.834075 osd.5 [WRN] 1 slow requests, 1 included
> below; oldest blocked for > 30.291869 secs
> 2017-10-13 21:22:04.834082 osd.5 [WRN] slow request 30.291869 seconds
> old, received at 2017-10-13 21:21:34.542137:
> osd_op(client.6104975.0:7331703 0.a2
> 0:4571f0f6:::rbd_data.1a24832ae8944a.0009c8ef:head
> [set-alloc-hint object_size 4194304 write_size 4194304,write
> 2934272~46592] snapc 0=[] ondisk+write+known_if_redirected e18866)
> currently op_applied
> 2017-10-13 21:22:07.834445 osd.5 [WRN] 1 slow requests, 1 included
> below; oldest blocked for > 30.421122 secs
> 2017-10-13 21:22:07.834452 osd.5 [WRN] slow request 30.421122 seconds
> old, received at 2017-10-13 21:21:37.413260:
> osd_op(client.6104975.0:7332411 0.a2
> 0:456218c9:::rbd_data.1a24832ae8944a.0009d21d:head
> [set-alloc-hint object_size 4194304 write_size 4194304,write
> 4068352~16384] snapc 0=[] ondisk+write+known_if_redirected e18866)
> currently op_applied
> 2017-10-13 21:22:16.238929 mon.ceph1 [WRN] Health check update: 8 slow
> requests are blocked > 32 sec (REQUEST_SLOW)
> 2017-10-13 21:22:21.239234 mon.ceph1 [WRN] Health check update: 4 slow
> requests are blocked > 32 sec (REQUEST_SLOW)
> 2017-10-13 21:22:21.329402 mon.ceph1 [INF] Health check cleared:
> REQUEST_SLOW (was: 4 slow requests are blocked > 32 sec)
> 2017-10-13 21:22:21.329490 mon.ceph1 [INF] Cluster is now healthy
>
> So far, the following steps have been taken to attempt to resolve this:
>
> 1) Updated to Ubuntu 16.04.3 LTS and Ceph 12.2.1.
>
> 2) Changes to ceph.conf:
> osd max scrubs = 1
> osd scrub during recovery = false
> osd deep scrub interval = 2592000
> osd scrub max interval = 2592000
> osd deep scrub randomize ratio = 0.0
> osd disk thread ioprio priority = 7
> osd disk thread ioprio class = idle
> osd scrub sleep = 0.1
>
> 3) Kernel I/O Scheduler set to cfq.
>
> 4) Deep-scrub moved to cron, with a limit of one running at a time.
>
> With these changes, scrubs now take 40-45 minutes to complete, up from
> 20-25, so the amount of time where there are client 

Re: [ceph-users] Brand new cluster -- pg is stuck inactive

2017-10-14 Thread David Turner
What is the output of your `ceph status`?

On Fri, Oct 13, 2017, 10:09 PM dE  wrote:

> On 10/14/2017 12:53 AM, David Turner wrote:
>
> What does your environment look like?  Someone recently on the mailing
> list had PGs stuck creating because of a networking issue.
>
> On Fri, Oct 13, 2017 at 2:03 PM Ronny Aasen 
> wrote:
>
>> strange that no osd is acting for your pg's
>> can you show the output from
>> ceph osd tree
>>
>>
>> mvh
>> Ronny Aasen
>>
>>
>>
>> On 13.10.2017 18:53, dE wrote:
>> > Hi,
>> >
>> > I'm running ceph 10.2.5 on Debian (official package).
>> >
>> > It cant seem to create any functional pools --
>> >
>> > ceph health detail
>> > HEALTH_ERR 64 pgs are stuck inactive for more than 300 seconds; 64 pgs
>> > stuck inactive; too few PGs per OSD (21 < min 30)
>> > pg 0.39 is stuck inactive for 652.741684, current state creating, last
>> > acting []
>> > pg 0.38 is stuck inactive for 652.741688, current state creating, last
>> > acting []
>> > pg 0.37 is stuck inactive for 652.741690, current state creating, last
>> > acting []
>> > pg 0.36 is stuck inactive for 652.741692, current state creating, last
>> > acting []
>> > pg 0.35 is stuck inactive for 652.741694, current state creating, last
>> > acting []
>> > pg 0.34 is stuck inactive for 652.741696, current state creating, last
>> > acting []
>> > pg 0.33 is stuck inactive for 652.741698, current state creating, last
>> > acting []
>> > pg 0.32 is stuck inactive for 652.741701, current state creating, last
>> > acting []
>> > pg 0.3 is stuck inactive for 652.741762, current state creating, last
>> > acting []
>> > pg 0.2e is stuck inactive for 652.741715, current state creating, last
>> > acting []
>> > pg 0.2d is stuck inactive for 652.741719, current state creating, last
>> > acting []
>> > pg 0.2c is stuck inactive for 652.741721, current state creating, last
>> > acting []
>> > pg 0.2b is stuck inactive for 652.741723, current state creating, last
>> > acting []
>> > pg 0.2a is stuck inactive for 652.741725, current state creating, last
>> > acting []
>> > pg 0.29 is stuck inactive for 652.741727, current state creating, last
>> > acting []
>> > pg 0.28 is stuck inactive for 652.741730, current state creating, last
>> > acting []
>> > pg 0.27 is stuck inactive for 652.741732, current state creating, last
>> > acting []
>> > pg 0.26 is stuck inactive for 652.741734, current state creating, last
>> > acting []
>> > pg 0.3e is stuck inactive for 652.741707, current state creating, last
>> > acting []
>> > pg 0.f is stuck inactive for 652.741761, current state creating, last
>> > acting []
>> > pg 0.3f is stuck inactive for 652.741708, current state creating, last
>> > acting []
>> > pg 0.10 is stuck inactive for 652.741763, current state creating, last
>> > acting []
>> > pg 0.4 is stuck inactive for 652.741773, current state creating, last
>> > acting []
>> > pg 0.5 is stuck inactive for 652.741774, current state creating, last
>> > acting []
>> > pg 0.3a is stuck inactive for 652.741717, current state creating, last
>> > acting []
>> > pg 0.b is stuck inactive for 652.741771, current state creating, last
>> > acting []
>> > pg 0.c is stuck inactive for 652.741772, current state creating, last
>> > acting []
>> > pg 0.3b is stuck inactive for 652.741721, current state creating, last
>> > acting []
>> > pg 0.d is stuck inactive for 652.741774, current state creating, last
>> > acting []
>> > pg 0.3c is stuck inactive for 652.741722, current state creating, last
>> > acting []
>> > pg 0.e is stuck inactive for 652.741776, current state creating, last
>> > acting []
>> > pg 0.3d is stuck inactive for 652.741724, current state creating, last
>> > acting []
>> > pg 0.22 is stuck inactive for 652.741756, current state creating, last
>> > acting []
>> > pg 0.21 is stuck inactive for 652.741758, current state creating, last
>> > acting []
>> > pg 0.a is stuck inactive for 652.741783, current state creating, last
>> > acting []
>> > pg 0.20 is stuck inactive for 652.741761, current state creating, last
>> > acting []
>> > pg 0.9 is stuck inactive for 652.741787, current state creating, last
>> > acting []
>> > pg 0.1f is stuck inactive for 652.741764, current state creating, last
>> > acting []
>> > pg 0.8 is stuck inactive for 652.741790, current state creating, last
>> > acting []
>> > pg 0.7 is stuck inactive for 652.741792, current state creating, last
>> > acting []
>> > pg 0.6 is stuck inactive for 652.741794, current state creating, last
>> > acting []
>> > pg 0.1e is stuck inactive for 652.741770, current state creating, last
>> > acting []
>> > pg 0.1d is stuck inactive for 652.741772, current state creating, last
>> > acting []
>> > pg 0.1c is stuck inactive for 652.741774, current state creating, last
>> > acting []
>> > pg 0.1b is stuck inactive for 652.741777, current state creating, last
>> > acting []
>> > pg 0.1a is stuck inactive for 652.741784, current state creating, last
>> 

Re: [ceph-users] using Bcache on blueStore

2017-10-14 Thread Jorge Pinilla López
Okay, I get your point, it's much safer without a cache at all.

I am speaking from total ignorance here, so please correct me if I say
something wrong.

What I don't really understand is how well the DB space is actually used.

1- When an OSD is new, the DB might be almost empty and isn't storing much
actual data, so writes and reads could be sped up by using that free space.

2- When an OSD is full, you may have tons of cold metadata that is never
used taking up all the space on the SSD. So maybe it wouldn't be a bad idea
to push that cold metadata to the HDD and bring the actual hot data onto the
SSD so reads (or maybe writes) could be improved. Having a "hotness" ratio on
the metadata could be interesting; I know metadata is far more important than
the actual data, but if that metadata is cold I don't see the value of it
occupying SSD space.

I know BlueStore also has a system cache, as I mention in this email:

https://www.spinics.net/lists/ceph-users/msg39426.html

but that cache doesn't include any data at all, so it's hard for me to
understand how BlueStore can be so fast if it's limited to HDD speed.

If someone knows how the RocksDB SSD is actually used, how BlueStore keeps
its speed up, or why the whole metadata should be on a separate SSD,
please tell me :) I am really trying to understand this topic.
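
(As an aside, the power-off stress test Kjetil describes below can be
approximated with a loop like this sketch; paths and counts are arbitrary:)

while true; do
  for i in $(seq 1 10000); do echo data > /mnt/test/file.$i; done
  rm -f /mnt/test/file.*
done
# run this on the cached filesystem, then cut power mid-loop and check after boot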

On 14/10/2017 at 2:39, Kjetil Joergensen wrote:
> Generally on bcache & for that matter lvmcache & dmwriteboost.
>
> We did extensive "power off" testing with all of them and reliably
> managed to break it on our hardware setup.
>
> while true; boot box; start writing & stress metadata updates (i.e.
> make piles of files and unlink them, or you could find something else
> that's picky about write ordering); let it run for a bit; yank power;
> power on;
>
> This never survived for more than a night without badly corrupting
> some xfs filesystem. We did the same testing without caching and could
> not reproduce.
>
> This may have been a quirk resulting from our particular setup, I get
> the impression that others use it and sleep well at night, but I'd
> recommend testing it under the most unforgivable circumstances you can
> think of before proceeding.
>
> -KJ
>
> On Thu, Oct 12, 2017 at 4:54 PM, Jorge Pinilla López  
> wrote:
>> Well, I wouldn't use bcache on filestore at all.
>> First there are problems with all that you have said and second but way
>> important you got doble writes (in FS data was written to journal and to
>> storage disk at the same time), if jounal and data disk were the same then
>> speed was divided by two getting really bad output.
>>
>> In BlueStore things change quite a lot, first there are not double writes
>> there is no "journal" (well there is  a something call Wal but  it's not
>> used in the same way), data goes directly into the data disk and you only
>> write a few metadata and make a commit into the DB. Rebalancing and scrub go
>> through a RockDB not a file system making it way more simple and effective,
>> you aren't supposed to have all the problems that you had with FS.
>>
>> In addition, cache tiering has been deprecated on Red Hat Ceph Storage so I
>> personally wouldn't use something deprecated by developers and support.
>>
>>
>>  Original message 
>> From: Marek Grzybowski
>> Date: 13/10/17 12:22 AM (GMT+01:00)
>> To: Jorge Pinilla López , ceph-users@lists.ceph.com
>> Subject: Re: [ceph-users] using Bcache on blueStore
>>
>> On 12.10.2017 20:28, Jorge Pinilla López wrote:
>>> Hey all!
>>> I have a ceph with multiple HDD and 1 really fast SSD with (30GB per OSD)
>>> per host.
>>>
>>> I have been thinking and all docs say that I should give all the SSD space
>>> for RocksDB, so I would have a HDD data and a 30GB partition for RocksDB.
>>>
>>> But it came to my mind that if the OSD isnt full maybe I am not using all
>>> the space in the SSD, or maybe I prefer having a really small amount of hot
>>> k/v and metadata and the data itself in a really fast device than just
>>> storing all could metadata.
>>>
>>> So I though that using Bcache to make SSD to be a cache and as metadata
>>> and k/v are usually hot, they should be place on the cache. But this doesnt
>>> guarantee me that k/v and metadata are actually always in the SSD cause
>>> under heavy cache loads it can be pushed out (like really big data files).
>>>
>>> So I came up with the idea of setting small 5-10GB partitions for the hot
>>> RocksDB and the rest to use it as a cache, so I make sure that really hot
>>> metadata is actually always on the SSD and the coulder one should be also on
>>> the SSD (as a bcache) if its not really freezing, in that case they would be
>>> pushed to the HDD. It also doesnt make anysense to have metadatada that you
>>> never used using space on the SSD, I rather use that space to store hotter
>>> data.
>>>
>>> This is also make writes faster, and in blueStore we dont have the 

Re: [ceph-users] Questions about bluestore

2017-10-14 Thread Jorge Pinilla López
There are 2 configs to set the size of your DB and WAL:
bluestore_block_db_size
bluestore_block_wal_size
If you have an SSD you should give as much space as you can to the DB and not
worry about the WAL (the WAL will always be placed on the fastest device).
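
A sketch of how that can look in ceph.conf before running ceph-disk prepare
(the byte values are only examples, roughly 30 GiB for the DB and 1 GiB for
the WAL):

[global]
bluestore_block_db_size  = 32212254720
bluestore_block_wal_size = 1073741824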

I am not sure about moving the DB on a live OSD, but as long as you have
replicas you can always rebuild the OSD.
If the SSD breaks and its data cannot be recovered, all the OSDs using it will
break, but as long as you have replicas on other nodes you can rebuild those
OSDs and they will rebalance. That's why you shouldn't use consumer SSDs for
RocksDB.

 Original message 
From: Mario Giammarco
Date: 14/10/17 10:54 AM (GMT+01:00)
To: ceph-users
Subject: Re: [ceph-users] Questions about bluestore
Nobody can help me?

On Fri, 6 Oct 2017, 07:31 Mario Giammarco wrote:

Hello,
I am trying Ceph luminous with Bluestore.

I create an osd:

ceph-disk prepare --bluestore /dev/sdg  --block.db /dev/sdf

and I see that on ssd it creates a partition of only 1g for block.db

So:

ceph-disk prepare --bluestore /dev/sdg --block.wal /dev/sdf --block.db /dev/sdf

and again it creates two partitions, 1g and 500mb

It seems to me that they are too small and the ssd is underutilized (docs say
you need an ssd greater than 1g to put a block.db on)

Other two questions:

- if I already have an osd bluestore can I move later the db on ssd?
- docs say that I can add several block.db of different osds on one ssd. But
what happens if ssd breaks?

Thanks,
Mario

___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] Questions about bluestore

2017-10-14 Thread Mario Giammarco
Nobody can help me?

On Fri, 6 Oct 2017, 07:31 Mario Giammarco wrote:

> Hello,
> I am trying Ceph luminous with Bluestore.
>
> I create an osd:
>
> ceph-disk prepare --bluestore /dev/sdg  --block.db /dev/sdf
>
> and I see that on ssd it creates a partition of only 1g for block.db
>
> So:
>
> ceph-disk prepare --bluestore /dev/sdg --block.wal /dev/sdf --block.db
> /dev/sdf
>
> and again it creates two partitions, 1g and 500mb
>
> It seems to me that they are too small the ssd is underutilized (docs says
> you need a ssd greater than 1g to put a block.db on)
>
> Other two questions:
>
> - if I already have an osd bluestore can I move later the db on ssd?
> - docs says that I can add several block.db of different osds on one ssd.
> But what happens if ssd breaks?
>
> Thanks,
> Mario
>
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com