[ceph-users] (no subject)

2018-05-18 Thread Don Doerner
unsubscribe ceph-users


Re: [ceph-users] RHEL7/HAMMER cache tier doesn't flush or evict?

2015-04-30 Thread Don Doerner
Mohamed,

Well, that’s interesting… and in direct conflict with what is written in the 
documentation<http://ceph.com/docs/master/rados/operations/cache-tiering/> 
(wherein it describes relative sizing as proportional to the cache pool’s 
capacity).  I am presently reinstalling, so I’ll give that a try.  Thanks very 
much.

-don-

From: Mohamed Pakkeer [mailto:mdfakk...@gmail.com]
Sent: 30 April, 2015 11:45
To: Don Doerner
Cc: ceph-users@lists.ceph.com
Subject: Re: [ceph-users] RHEL7/HAMMER cache tier doesn't flush or evict?

Hi Don,

You have to provide the target size through target_max_bytes. The target_dirty_ratio and target_full_ratio values are based upon target_max_bytes. If you provide target_max_bytes as 200 GB and target_dirty_ratio as 0.4, Ceph will start flushing when the cache tier has 80 GB of dirty objects.
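For example, that policy would be expressed roughly like this (a sketch, untested, assuming the cache pool is named ctpool as in your test):

  ceph osd pool set ctpool target_max_bytes 214748364800   # 200 GB
  ceph osd pool set ctpool cache_target_dirty_ratio 0.4
  ceph osd pool set ctpool cache_target_full_ratio 0.8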

Mohamed

On Thu, Apr 30, 2015 at 11:56 PM, Don Doerner 
mailto:don.doer...@quantum.com>> wrote:
Hi Mohamed,

I did not.  But:

•I confirmed that (by default) cache_target_dirty_ratio was set to 0.4 
(40%) and cache_target_full_ratio was set to 0.8 (80%).

•I did not set target_max_bytes (I felt that the simple, pure relative 
sizing controls were sufficient for my experiment).

•I confirmed that (by default) cache_min_flush_age and 
cache_min_evict_age were set to 0 (so no required delay for either flushing or 
eviction).

Given these settings, I expected to see:

•Flushing begin to happen at 40% of my cache tier size (~200 GB, as it 
happened), or about 80 GB.  Or earlier.

•Eviction begin to happen at 80% of my cache tier size, or about 160 
GB.  Or earlier.

•Cache tier capacity would exceed 80% only if the flushing process couldn’t keep up with the ingest process for fairly long periods of time (at the observed ingest rate of ~400 MB/sec, a few hundred seconds).
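(For reference, those values can be read back directly; a sketch, assuming the cache pool is named ctpool and that this Hammer build exposes the cache keys through "ceph osd pool get":

  ceph osd pool get ctpool cache_target_dirty_ratio
  ceph osd pool get ctpool cache_target_full_ratio
  ceph osd pool get ctpool cache_min_flush_age
  ceph osd pool get ctpool cache_min_evict_age

Failing that, "ceph osd dump" lists each pool's settings.)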

Am I misunderstanding something?

Thank you very much for your assistance!

-don-

From: Mohamed Pakkeer [mailto:mdfakk...@gmail.com<mailto:mdfakk...@gmail.com>]
Sent: 30 April, 2015 10:52
To: Don Doerner
Cc: ceph-users@lists.ceph.com<mailto:ceph-users@lists.ceph.com>
Subject: Re: [ceph-users] RHEL7/HAMMER cache tier doesn't flush or evict?

Hi Don,

Did you configure target_dirty_ratio, target_full_ratio and target_max_bytes?


K.Mohamed Pakkeer

On Thu, Apr 30, 2015 at 10:26 PM, Don Doerner 
mailto:don.doer...@quantum.com>> wrote:
All,

Synopsis: I can’t get cache tiering to work in HAMMER on RHEL7.

Process:

1.  Fresh install of HAMMER on RHEL7 went well.

2.  Crush map adapted to provide two “root” level resources

a.   “ctstorage”, to use as a cache tier based on very high-performance, 
high IOPS SSD (intrinsic journal).  2 OSDs.

b.  “ecstorage”, to use as an erasure-coded pool based on low-performance, cost-effective storage (extrinsic journal).  12 OSDs.

3.  Established a pool “ctpool”, 32 PGs on ctstorage (pool size = min_size 
= 1).  Ran a quick RADOS bench test, all worked as expected.  Cleaned up.

4.  Established a pool “ecpool”, 256 PGs on ecstorage (5+3 profile).  Ran a 
quick RADOS bench test, all worked as expected.  Cleaned up.

5.  Ensured that both pools were empty (i.e., “rados ls” shows no objects)

6.  Put the cache tier on the erasure coded storage (one Bloom hit set, interval 900 seconds), set up the overlay.  Used defaults for flushing and eviction.  No errors.  (The specific commands are sketched after step 7, below.)

7.  Started a 3600-second write test to ecpool.
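For reference, step 6 above corresponds roughly to the following sequence (a sketch reconstructed from the cache-tiering documentation, not a transcript of the exact commands used):

  ceph osd tier add ecpool ctpool              # attach ctpool as a tier of ecpool
  ceph osd tier cache-mode ctpool writeback    # absorb writes in the cache tier
  ceph osd tier set-overlay ecpool ctpool      # route client I/O through the tier
  ceph osd pool set ctpool hit_set_type bloom  # one Bloom hit set...
  ceph osd pool set ctpool hit_set_count 1
  ceph osd pool set ctpool hit_set_period 900  # ...with a 900-second interval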

Objects piled up in ctpool (as expected) – went past the 40% mark (as 
expected), then past the 80% mark (unexpected), then ran into the wall (95% 
full – very unexpected).  Using “rados df”, I can see that the cache tier is 
full (duh!) but not one single object lives in the ecpool.  Nothing was ever 
flushed, nothing was ever evicted.  Thought I might be misreading that, so I 
went back to SAR data that I captured during the test: the SSDs were the only 
[ceph] devices that sustained I/O.

I based this experiment on another (much more successful) experiment that I 
performed using GIANT (.1) on RHEL7 a couple of weeks ago (wherein I used RAM 
as a cache tier); that all worked.  It seems there are at least three 
possibilities…

•I forgot a critical step this time around.

•The steps needed for a cache tier in HAMMER are different than the 
steps needed in GIANT (and different than the documentation online).

•There is a problem with HAMMER in the area of cache tier.

Has anyone successfully deployed cache-tiering in HAMMER?  Did you have to do 
anything unusual?  Do you see any steps that I missed?

Regards,

-don-



Re: [ceph-users] RHEL7/HAMMER cache tier doesn't flush or evict?

2015-04-30 Thread Don Doerner
OK...  When I hit the wall, Ceph got pretty unusable; I haven't figured out how 
to restore it to health.  So to do as you suggest, I am going to have to scrape 
everything into the trash and try again (3rd time's the charm, right?) - so let 
me get started on that.  I will pause before I run the big test that can 
overflow the cache and consult with you on what specific steps you might 
recommend.

-don-


-Original Message-
From: Nick Fisk [mailto:n...@fisk.me.uk] 
Sent: 30 April, 2015 10:58
To: Don Doerner; ceph-users@lists.ceph.com
Subject: RE: RHEL7/HAMMER cache tier doesn't flush or evict?
Sensitivity: Personal

I'm using Inkscope to monitor my cluster and looking at the pool details I saw 
that mode was set to none. I'm pretty sure there must be a ceph cmd line to get 
the option state but I couldn't find anything obvious when I was looking for it.

> -Original Message-
> From: ceph-users [mailto:ceph-users-boun...@lists.ceph.com] On Behalf 
> Of Don Doerner
> Sent: 30 April 2015 18:47
> To: Nick Fisk; ceph-users@lists.ceph.com
> Subject: Re: [ceph-users] RHEL7/HAMMER cache tier doesn't flush or evict?
> Sensitivity: Personal
> 
> Hi Nick,
> 
> For brevity, I didn't detail all of the commands I issued.  Looking back
> through my command history, I can confirm that I did explicitly set cache-mode
> to writeback (and later reset it to forward to try flush-and-evict).  Question:
> how did you determine that your cache-mode was not writeback?  I'll do
> that, just to confirm that this is the problem, then reestablish the
> cache-mode.
> 
> Thank you very much for your assistance!
> 
> -don-
> 
> -----Original Message-
> From: Nick Fisk [mailto:n...@fisk.me.uk]
> Sent: 30 April, 2015 10:38
> To: Don Doerner; ceph-users@lists.ceph.com
> Subject: RE: RHEL7/HAMMER cache tier doesn't flush or evict?
> Sensitivity: Personal
> 
> Hi Don,
> 
> I experienced the same thing a couple of days ago on Hammer. On
> investigation the cache mode wasn't set to writeback even though I'm
> sure it accepted the command successfully when I set the pool up.
> 
> Could you reapply the cache mode writeback command and see if that 
> makes a difference?
> 
> Nick
> 
> > -Original Message-
> > From: ceph-users [mailto:ceph-users-boun...@lists.ceph.com] On 
> > Behalf Of Don Doerner
> > Sent: 30 April 2015 17:57
> > To: ceph-users@lists.ceph.com
> > Subject: [ceph-users] RHEL7/HAMMER cache tier doesn't flush or evict?
> > Sensitivity: Personal
> >
> > All,
> >
> > Synopsis: I can't get cache tiering to work in HAMMER on RHEL7.
> >
> > Process:
> > 1. Fresh install of HAMMER on RHEL7 went well.
> > 2. Crush map adapted to provide two "root" level resources a.
> > "ctstorage", to use as a cache tier based on very high-performance,
> high
> > IOPS SSD (intrinsic journal).  2 OSDs.
> > b. "ecstorage", to use as an erasure-coded poolbased on 
> > low-performance, cost effective storage (extrinsic journal).  12 OSDs.
> > 3. Established a pool "ctpool", 32 PGs on ctstorage (pool size = 
> > min_size
> = 1).
> > Ran a quick RADOS bench test, all worked as expected.  Cleaned up.
> > 4. Established a pool "ecpool", 256 PGs on ecstorage (5+3 profile).
> > Ran a quick RADOS bench test, all worked as expected.  Cleaned up.
> > 5. Ensured that both pools were empty (i.e., "rados ls" shows no
> > objects) 6. Put the cache tier on the erasure coded storage (one 
> > Bloom hit set, interval 900 seconds), set up the overlay.  Used 
> > defaults for flushing and eviction.  No errors.
> > 7. Started a 3600-second write test to ecpool.
> >
> > Objects piled up in ctpool (as expected) - went past the 40% mark 
> > (as expected), then past the 80% mark (unexpected), then ran into 
> > the wall (95% full - very unexpected).  Using "rados df", I can see 
> > that the cache
> tier is
> > full (duh!) but not one single object lives in the ecpool.  Nothing 
> > was
> ever
> > flushed, nothing was ever evicted.  Thought I might be misreading 
> > that, so
> I
> > went back to SAR data that I captured during the test: the SSDs were 
> > the
> only
> > [ceph] devices that sustained I/O.
> >
> > I based this experiment on another (much more successful) experiment 
> > that I performed using GIANT (.1) on RHEL7 a couple of weeks ago 
> > (wherein I used RAM as a cache tier); that all worked.  It seems 
> > there are at least
> three
> > possibilities.
> > 

Re: [ceph-users] RHEL7/HAMMER cache tier doesn't flush or evict?

2015-04-30 Thread Don Doerner
Hi Mohamed,

I did not.  But:

·I confirmed that (by default) cache_target_dirty_ratio was set to 0.4 
(40%) and cache_target_full_ratio was set to 0.8 (80%).

·I did not set target_max_bytes (I felt that the simple, pure relative 
sizing controls were sufficient for my experiment).

·I confirmed that (by default) cache_min_flush_age and 
cache_min_evict_age were set to 0 (so no required delay for either flushing or 
eviction).

Given these settings, I expected to see:

·Flushing begin to happen at 40% of my cache tier size (~200 GB, as it 
happened), or about 80 GB.  Or earlier.

·Eviction begin to happen at 80% of my cache tier size, or about 160 
GB.  Or earlier.

·Cache tier capacity would exceed 80% only if the flushing process couldn’t keep up with the ingest process for fairly long periods of time (at the observed ingest rate of ~400 MB/sec, a few hundred seconds).

Am I misunderstanding something?

Thank you very much for your assistance!

-don-

From: Mohamed Pakkeer [mailto:mdfakk...@gmail.com]
Sent: 30 April, 2015 10:52
To: Don Doerner
Cc: ceph-users@lists.ceph.com
Subject: Re: [ceph-users] RHEL7/HAMMER cache tier doesn't flush or evict?

Hi Don,

Did you configure target_dirty_ratio, target_full_ratio and target_max_bytes?


K.Mohamed Pakkeer

On Thu, Apr 30, 2015 at 10:26 PM, Don Doerner 
mailto:don.doer...@quantum.com>> wrote:
All,

Synopsis: I can’t get cache tiering to work in HAMMER on RHEL7.

Process:

1.  Fresh install of HAMMER on RHEL7 went well.

2.  Crush map adapted to provide two “root” level resources

a.   “ctstorage”, to use as a cache tier based on very high-performance, 
high IOPS SSD (intrinsic journal).  2 OSDs.

b.  “ecstorage”, to use as an erasure-coded pool based on low-performance, cost-effective storage (extrinsic journal).  12 OSDs.

3.  Established a pool “ctpool”, 32 PGs on ctstorage (pool size = min_size 
= 1).  Ran a quick RADOS bench test, all worked as expected.  Cleaned up.

4.  Established a pool “ecpool”, 256 PGs on ecstorage (5+3 profile).  Ran a 
quick RADOS bench test, all worked as expected.  Cleaned up.

5.  Ensured that both pools were empty (i.e., “rados ls” shows no objects)

6.  Put the cache tier on the erasure coded storage (one Bloom hit set, 
interval 900 seconds), set up the overlay.  Used defaults for flushing and 
eviction.  No errors.

7.  Started a 3600-second write test to ecpool.

Objects piled up in ctpool (as expected) – went past the 40% mark (as 
expected), then past the 80% mark (unexpected), then ran into the wall (95% 
full – very unexpected).  Using “rados df”, I can see that the cache tier is 
full (duh!) but not one single object lives in the ecpool.  Nothing was ever 
flushed, nothing was ever evicted.  Thought I might be misreading that, so I 
went back to SAR data that I captured during the test: the SSDs were the only 
[ceph] devices that sustained I/O.

I based this experiment on another (much more successful) experiment that I 
performed using GIANT (.1) on RHEL7 a couple of weeks ago (wherein I used RAM 
as a cache tier); that all worked.  It seems there are at least three 
possibilities…

•I forgot a critical step this time around.

•The steps needed for a cache tier in HAMMER are different than the 
steps needed in GIANT (and different than the documentation online).

•There is a problem with HAMMER in the area of cache tier.

Has anyone successfully deployed cache-tiering in HAMMER?  Did you have to do 
anything unusual?  Do you see any steps that I missed?

Regards,

-don-





--
Thanks & Regards
K.Mohamed Pakkeer
Mobile- 0091-8754410114

Re: [ceph-users] RHEL7/HAMMER cache tier doesn't flush or evict?

2015-04-30 Thread Don Doerner
Hi Nick,

For brevity, I didn't detail all of the commands I issued.  Looking back 
through my command history, I can confirm that I did explicitly set cache-mode 
to writeback (and later reset it to forward to try flush-and-evict).  Question: 
how did you determine that your cache-mode was not writeback?  I'll do that, 
just to  confirm that this is the problem, then reestablish the cache-mode.
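(For the record, reapplying it is a single command; a sketch using the pool names from my earlier mail:

  ceph osd tier cache-mode ctpool writeback
)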

Thank you very much for your assistance!

-don-

-Original Message-
From: Nick Fisk [mailto:n...@fisk.me.uk] 
Sent: 30 April, 2015 10:38
To: Don Doerner; ceph-users@lists.ceph.com
Subject: RE: RHEL7/HAMMER cache tier doesn't flush or evict?
Sensitivity: Personal

Hi Don,

I experienced the same thing a couple of days ago on Hammer. On investigation 
the cache mode wasn't set to writeback even though I'm sure it accepted the 
command successfully when I set the pool up.

Could you reapply the cache mode writeback command and see if that makes a 
difference?

Nick

> -Original Message-
> From: ceph-users [mailto:ceph-users-boun...@lists.ceph.com] On Behalf 
> Of Don Doerner
> Sent: 30 April 2015 17:57
> To: ceph-users@lists.ceph.com
> Subject: [ceph-users] RHEL7/HAMMER cache tier doesn't flush or evict?
> Sensitivity: Personal
> 
> All,
> 
> Synopsis: I can't get cache tiering to work in HAMMER on RHEL7.
> 
> Process:
> 1. Fresh install of HAMMER on RHEL7 went well.
> 2. Crush map adapted to provide two "root" level resources a.  
> "ctstorage", to use as a cache tier based on very high-performance,
high
> IOPS SSD (intrinsic journal).  2 OSDs.
> b. "ecstorage", to use as an erasure-coded poolbased on 
> low-performance, cost effective storage (extrinsic journal).  12 OSDs.
> 3. Established a pool "ctpool", 32 PGs on ctstorage (pool size = 
> min_size
= 1).
> Ran a quick RADOS bench test, all worked as expected.  Cleaned up.
> 4. Established a pool "ecpool", 256 PGs on ecstorage (5+3 profile).  
> Ran a quick RADOS bench test, all worked as expected.  Cleaned up.
> 5. Ensured that both pools were empty (i.e., "rados ls" shows no 
> objects) 6. Put the cache tier on the erasure coded storage (one Bloom 
> hit set, interval 900 seconds), set up the overlay.  Used defaults for 
> flushing and eviction.  No errors.
> 7. Started a 3600-second write test to ecpool.
> 
> Objects piled up in ctpool (as expected) - went past the 40% mark (as 
> expected), then past the 80% mark (unexpected), then ran into the wall 
> (95% full - very unexpected).  Using "rados df", I can see that the 
> cache
tier is
> full (duh!) but not one single object lives in the ecpool.  Nothing 
> was
ever
> flushed, nothing was ever evicted.  Thought I might be misreading 
> that, so
I
> went back to SAR data that I captured during the test: the SSDs were 
> the
only
> [ceph] devices that sustained I/O.
> 
> I based this experiment on another (much more successful) experiment 
> that I performed using GIANT (.1) on RHEL7 a couple of weeks ago 
> (wherein I used RAM as a cache tier); that all worked.  It seems there 
> are at least
three
> possibilities.
> . I forgot a critical step this time around.
> . The steps needed for a cache tier in HAMMER are different than the 
> steps needed in GIANT (and different than the documentation online).
> . There is a problem with HAMMER in the area of cache tier.
> 
> Has anyone successfully deployed cache-tiering in HAMMER?  Did you 
> have to do anything unusual?  Do you see any steps that I missed?
> 
> Regards,
> 
> -don-
> 


[ceph-users] RHEL7/HAMMER cache tier doesn't flush or evict?

2015-04-30 Thread Don Doerner
All,

Synopsis: I can't get cache tiering to work in HAMMER on RHEL7.

Process:

1.  Fresh install of HAMMER on RHEL7 went well.

2.  Crush map adapted to provide two "root" level resources

a.   "ctstorage", to use as a cache tier based on very high-performance, 
high IOPS SSD (intrinsic journal).  2 OSDs.

b.  "ecstorage", to use as an erasure-coded poolbased on low-performance, 
cost effective storage (extrinsic journal).  12 OSDs.

3.  Established a pool "ctpool", 32 PGs on ctstorage (pool size = min_size 
= 1).  Ran a quick RADOS bench test, all worked as expected.  Cleaned up.

4.  Established a pool "ecpool", 256 PGs on ecstorage (5+3 profile).  Ran a 
quick RADOS bench test, all worked as expected.  Cleaned up.

5.  Ensured that both pools were empty (i.e., "rados ls" shows no objects)

6.  Put the cache tier on the erasure coded storage (one Bloom hit set, 
interval 900 seconds), set up the overlay.  Used defaults for flushing and 
eviction.  No errors.

7.  Started a 3600-second write test to ecpool.

Objects piled up in ctpool (as expected) - went past the 40% mark (as 
expected), then past the 80% mark (unexpected), then ran into the wall (95% 
full - very unexpected).  Using "rados df", I can see that the cache tier is 
full (duh!) but not one single object lives in the ecpool.  Nothing was ever 
flushed, nothing was ever evicted.  Thought I might be misreading that, so I 
went back to SAR data that I captured during the test: the SSDs were the only 
[ceph] devices that sustained I/O.

I based this experiment on another (much more successful) experiment that I 
performed using GIANT (.1) on RHEL7 a couple of weeks ago (wherein I used RAM 
as a cache tier); that all worked.  It seems there are at least three 
possibilities...

*I forgot a critical step this time around.

*The steps needed for a cache tier in HAMMER are different than the 
steps needed in GIANT (and different than the documentation online).

*There is a problem with HAMMER in the area of cache tier.

Has anyone successfully deployed cache-tiering in HAMMER?  Did you have to do 
anything unusual?  Do you see any steps that I missed?

Regards,

-don-



Re: [ceph-users] Understanding High Availability - iSCSI/CIFS/NFS

2015-04-04 Thread Don Doerner
Hi Justin,

Ceph, proper, does not provide those services.  Ceph does provide Linux block 
devices (look for Rados Block Devices, aka, RBD) and a filesystem, CephFS.

I don’t know much about the filesystem, but the block devices are present on an 
RBD client that you set up, following the instructions at ceph.com.  If you 
need those block devices “converted” into iSCSI devices, you’ll need to include 
a target driver, e.g., LIO.  Issues of high-availability will involve 
appropriate deployment and configuration of your RBD clients and target driver 
technology – not such a trivial undertaking.

Good luck with your configuration…

-don-

From: ceph-users [mailto:ceph-users-boun...@lists.ceph.com] On Behalf Of Justin 
Chin-You
Sent: 04 April, 2015 06:31
To: ceph-users@lists.ceph.com
Subject: [ceph-users] Understanding High Availability - iSCSI/CIFS/NFS

Hi All,

Hoping someone can help me understand CEPH HA or point me in the direction of a 
doc I missed.

I understand how CEPH HA itself works in regards to PG, OSD and Monitoring. 
However what isn't clear for me is the failover in regards to things like iSCSI 
and the not yet production ready CIFS/NFS.

Scenario:
I have 2 servers that are peered and running CEPH and I am replicating between 
both. Using CEPH I have iSCSI targets and CIFS/NFS stores.

In the event a server should fail how are iSCSI Initiators and CIFS/NFS clients 
re-directed? I am assuming Multipath and Virtual IPs but I can't figure out if 
this is something I need to configure/run on the OS side or if it is in CEPH 
itself.

Any help appreciated!

Thanks!



Re: [ceph-users] Install problems GIANT on RHEL7

2015-04-04 Thread Don Doerner
Key problem resolved by actually installing (as opposed to simply configuring) 
the EPEL repo.  And with that, the cluster became viable.  Thanks all.

-don-

From: ceph-users [mailto:ceph-users-boun...@lists.ceph.com] On Behalf Of Don 
Doerner
Sent: 04 April, 2015 09:47
To: ceph-us...@ceph.com
Subject: [ceph-users] Install problems GIANT on RHEL7
Sensitivity: Personal

Folks,

I am having a hard time setting up a fresh install of GIANT on a fresh install 
of RHEL7 - which you would think would be about the easiest of all situations...

1.  Using ceph-deploy 1.5.22 - for some reason it never updates the 
/etc/yum.repos.d to include all of the various ceph repos that are needed.  
Manually added the repos from a slightly older GIANT install.

2.  Missing python-jinja2.  The old way of resolving this, using 
eu.ceph.com doesn't work (that repo has apparently gone missing).  So I hunted 
down the current jinja2 RPM and installed it.

3.  Hit issue 10476 (despite the fact that I am using ceph-deploy 1.5.22).  
Manually updated /etc/yum/pluginconf.d/priorities.conf to have 
"check_obsoletes=1".

4.  Hit issue 11104 despite manually addressing 10476.  Took the approach 
of forcing the installation of "yum install -x python-rados -x python-rbd ceph" 
mentioned therein.  I am getting a key error now.  I may be able to debug this 
and get around the key error, but...
Can anyone tell me that they have successfully installed GIANT 0.87.1 on RHEL7? 
 And how did you manage to do that?  Have I gotten so far off the tracks that I 
should just start over from some point (and what point)?  Or is there a better 
"stable release" to be using at this point, e.g., should I go back to FIREFLY?
Somehow the earlier release of GIANT was a piece of cake by comparison...
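(For reference, the priorities-plugin edit in step 3 amounts to a file roughly like this - a sketch of how mine ended up:

  # /etc/yum/pluginconf.d/priorities.conf
  [main]
  enabled = 1
  check_obsoletes = 1
)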

Regards,

-don-




Re: [ceph-users] Install problems GIANT on RHEL7

2015-04-04 Thread Don Doerner
OK, apparently it's also a good idea to install EPEL, not just copy over the repo configuration from another installation.
That resolved the key error, and it appears that I have it all installed.
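(For anyone hitting the same thing: "installing EPEL" here means actually installing the release package, roughly like this - a sketch, the exact URL may differ by point release:

  rpm -Uvh https://dl.fedoraproject.org/pub/epel/epel-release-latest-7.noarch.rpm

rather than hand-copying an epel.repo file from another box.)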

-don-

From: ceph-users [mailto:ceph-users-boun...@lists.ceph.com] On Behalf Of Don 
Doerner
Sent: 04 April, 2015 09:47
To: ceph-us...@ceph.com
Subject: [ceph-users] Install problems GIANT on RHEL7
Sensitivity: Personal

Folks,

I am having a hard time setting up a fresh install of GIANT on a fresh install 
of RHEL7 - which you would think would be about the easiest of all situations...

1.  Using ceph-deploy 1.5.22 - for some reason it never updates the 
/etc/yum.repos.d to include all of the various ceph repos that are needed.  
Manually added the repos from a slightly older GIANT install.

2.  Missing python-jinja2.  The old way of resolving this, using 
eu.ceph.com doesn't work (that repo has apparently gone missing).  So I hunted 
down the current jinja2 RPM and installed it.

3.  Hit issue 10476 (despite the fact that I am using ceph-deploy 1.5.22).  
Manually updated /etc/yum/pluginconf.d/priorities.conf to have 
"check_obsoletes=1".

4.  Hit issue 11104 despite manually addressing 10476.  Took the approach 
of forcing the installation of "yum install -x python-rados -x python-rbd ceph" 
mentioned therein.  I am getting a key error now.  I may be able to debug this 
and get around the key error, but...
Can anyone tell me that they have successfully installed GIANT 0.87.1 on RHEL7? 
 And how did you manage to do that?  Have I gotten so far off the tracks that I 
should just start over from some point (and what point)?  Or is there a better 
"stable release" to be using at this point, e.g., should I go back to FIREFLY?
Somehow the earlier release of GIANT was a piece of cake by comparison...

Regards,

-don-




[ceph-users] Install problems GIANT on RHEL7

2015-04-04 Thread Don Doerner
Folks,

I am having a hard time setting up a fresh install of GIANT on a fresh install 
of RHEL7 - which you would think would be about the easiest of all situations...

1.  Using ceph-deploy 1.5.22 - for some reason it never updates the 
/etc/yum.repos.d to include all of the various ceph repos that are needed.  
Manually added the repos from a slightly older GIANT install.

2.  Missing python-jinja2.  The old way of resolving this, using 
eu.ceph.com doesn't work (that repo has apparently gone missing).  So I hunted 
down the current jinja2 RPM and installed it.

3.  Hit issue 10476 (despite the fact that I am using ceph-deploy 1.5.22).  
Manually updated /etc/yum/pluginconf.d/priorities.conf to have 
"check_obsoletes=1".

4.  Hit issue 11104 despite manually addressing 10476.  Took the approach 
of forcing the installation of "yum install -x python-rados -x python-rbd ceph" 
mentioned therein.  I am getting a key error now.  I may be able to debug this 
and get around the key error, but...
Can anyone tell me that they have successfully installed GIANT 0.87.1 on RHEL7? 
 And how did you manage to do that?  Have I gotten so far off the tracks that I 
should just start over from some point (and what point)?  Or is there a better 
"stable release" to be using at this point, e.g., should I go back to FIREFLY?
Somehow the earlier release of GIANT was a piece of cake by comparison...

Regards,

-don-



Re: [ceph-users] Strange osd in PG with new EC-Pool - pgs: 2 active+undersized+degraded

2015-03-25 Thread Don Doerner
Sorry all: my company's e-mail security got in the way there.  Try these 
references...

* http://tracker.ceph.com/issues/10350
* http://ceph.com/docs/master/rados/troubleshooting/troubleshooting-pg/#crush-gives-up-too-soon

-don-

From: ceph-users [mailto:ceph-users-boun...@lists.ceph.com] On Behalf Of Don 
Doerner
Sent: 25 March, 2015 08:01
To: Udo Lembke; ceph-us...@ceph.com
Subject: Re: [ceph-users] Strange osd in PG with new EC-Pool - pgs: 2 
active+undersized+degraded


Assuming you've calculated the number of PGs reasonably, see here <http://tracker.ceph.com/issues/10350> and here <http://ceph.com/docs/master/rados/troubleshooting/troubleshooting-pg/#crush-gives-up-too-soon>.  I'm guessing these will address your issue.  That weird number means that no OSD was found/assigned to the PG.
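The fix described there amounts to raising CRUSH's retry budget in the rule used by the EC pool, roughly (a sketch, untested on your cluster):

  ceph osd getcrushmap -o crush.bin
  crushtool -d crush.bin -o crush.txt
  #   in the rule used by the ec7archiv pool, add/raise:  step set_choose_tries 100
  crushtool -c crush.txt -o crush.new
  ceph osd setcrushmap -i crush.new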



-don-





-Original Message-
From: ceph-users [mailto:ceph-users-boun...@lists.ceph.com] On Behalf Of Udo 
Lembke
Sent: 25 March, 2015 01:21
To: ceph-us...@ceph.com<mailto:ceph-us...@ceph.com>
Subject: [ceph-users] Strange osd in PG with new EC-Pool - pgs: 2 
active+undersized+degraded



Hi,

due to two more hosts (now 7 storage nodes) I want to create a new ec-pool and get a strange effect:



ceph@admin:~$ ceph health detail

HEALTH_WARN 2 pgs degraded; 2 pgs stuck degraded; 2 pgs stuck unclean; 2 pgs stuck undersized; 2 pgs undersized
pg 22.3e5 is stuck unclean since forever, current state active+undersized+degraded, last acting [76,15,82,11,57,29,2147483647]
pg 22.240 is stuck unclean since forever, current state active+undersized+degraded, last acting [38,85,17,74,2147483647,10,58]
pg 22.3e5 is stuck undersized for 406.614447, current state active+undersized+degraded, last acting [76,15,82,11,57,29,2147483647]
pg 22.240 is stuck undersized for 406.616563, current state active+undersized+degraded, last acting [38,85,17,74,2147483647,10,58]
pg 22.3e5 is stuck degraded for 406.614566, current state active+undersized+degraded, last acting [76,15,82,11,57,29,2147483647]
pg 22.240 is stuck degraded for 406.616679, current state active+undersized+degraded, last acting [38,85,17,74,2147483647,10,58]
pg 22.3e5 is active+undersized+degraded, acting [76,15,82,11,57,29,2147483647]
pg 22.240 is active+undersized+degraded, acting [38,85,17,74,2147483647,10,58]



But I have only 91 OSDs (84 SATA + 7 SSDs), not 2147483647!
Where the heck did the 2147483647 come from?



I ran the following commands:

ceph osd erasure-code-profile set 7hostprofile k=5 m=2 ruleset-failure-domain=host
ceph osd pool create ec7archiv 1024 1024 erasure 7hostprofile



my version:

ceph -v

ceph version 0.87.1 (283c2e7cfa2457799f534744d7d549f83ea1335e)





I found an issue in my crush-map - one SSD was twice in the map:

host ceph-061-ssd {
        id -16          # do not change unnecessarily
        # weight 0.000
        alg straw
        hash 0          # rjenkins1
}
root ssd {
        id -13          # do not change unnecessarily
        # weight 0.780
        alg straw
        hash 0          # rjenkins1
        item ceph-01-ssd weight 0.170
        item ceph-02-ssd weight 0.170
        item ceph-03-ssd weight 0.000
        item ceph-04-ssd weight 0.170
        item ceph-05-ssd weight 0.170
        item ceph-06-ssd weight 0.050
        item ceph-07-ssd weight 0.050
        item ceph-061-ssd weight 0.000
}



Host ceph-061-ssd doesn't exist and osd-61 is the SSD from ceph-03-ssd, but after fixing the crushmap the issue with the osd 2147483647 still exists.



Any idea how to fix that?



regards



Udo






Re: [ceph-users] Strange osd in PG with new EC-Pool - pgs: 2 active+undersized+degraded

2015-03-25 Thread Don Doerner
Assuming you've calculated the number of PGs reasonably, see 
here and 
here.
  I'm guessing these will address your issue.  That weird number means that no 
OSD was found/assigned to the PG.



-don-





-Original Message-
From: ceph-users [mailto:ceph-users-boun...@lists.ceph.com] On Behalf Of Udo 
Lembke
Sent: 25 March, 2015 01:21
To: ceph-us...@ceph.com
Subject: [ceph-users] Strange osd in PG with new EC-Pool - pgs: 2 
active+undersized+degraded



Hi,

due to two more hosts (now 7 storage nodes) I want to create a new ec-pool and get a strange effect:



ceph@admin:~$ ceph health detail

HEALTH_WARN 2 pgs degraded; 2 pgs stuck degraded; 2 pgs stuck unclean; 2 pgs stuck undersized; 2 pgs undersized
pg 22.3e5 is stuck unclean since forever, current state active+undersized+degraded, last acting [76,15,82,11,57,29,2147483647]
pg 22.240 is stuck unclean since forever, current state active+undersized+degraded, last acting [38,85,17,74,2147483647,10,58]
pg 22.3e5 is stuck undersized for 406.614447, current state active+undersized+degraded, last acting [76,15,82,11,57,29,2147483647]
pg 22.240 is stuck undersized for 406.616563, current state active+undersized+degraded, last acting [38,85,17,74,2147483647,10,58]
pg 22.3e5 is stuck degraded for 406.614566, current state active+undersized+degraded, last acting [76,15,82,11,57,29,2147483647]
pg 22.240 is stuck degraded for 406.616679, current state active+undersized+degraded, last acting [38,85,17,74,2147483647,10,58]
pg 22.3e5 is active+undersized+degraded, acting [76,15,82,11,57,29,2147483647]
pg 22.240 is active+undersized+degraded, acting [38,85,17,74,2147483647,10,58]



But I have only 91 OSDs (84 SATA + 7 SSDs), not 2147483647!
Where the heck did the 2147483647 come from?



I ran the following commands:

ceph osd erasure-code-profile set 7hostprofile k=5 m=2 ruleset-failure-domain=host
ceph osd pool create ec7archiv 1024 1024 erasure 7hostprofile



my version:

ceph -v

ceph version 0.87.1 (283c2e7cfa2457799f534744d7d549f83ea1335e)





I found an issue in my crush-map - one SSD was twice in the map:

host ceph-061-ssd {
        id -16          # do not change unnecessarily
        # weight 0.000
        alg straw
        hash 0          # rjenkins1
}
root ssd {
        id -13          # do not change unnecessarily
        # weight 0.780
        alg straw
        hash 0          # rjenkins1
        item ceph-01-ssd weight 0.170
        item ceph-02-ssd weight 0.170
        item ceph-03-ssd weight 0.000
        item ceph-04-ssd weight 0.170
        item ceph-05-ssd weight 0.170
        item ceph-06-ssd weight 0.050
        item ceph-07-ssd weight 0.050
        item ceph-061-ssd weight 0.000
}



Host ceph-061-ssd doesn't exist and osd-61 is the SSD from ceph-03-ssd, but after fixing the crushmap the issue with the osd 2147483647 still exists.



Any idea how to fix that?



regards



Udo





[ceph-users] Reliable OSD

2015-03-17 Thread Don Doerner
Situation: I need to use EC pools (for the economics/power/cooling) for the 
storage of data, but my use case requires a block device.  Ergo, I require a 
cache tier.  I have tried using a 3x replicated pool as a cache tier - the 
throughput was poor, mostly due to latency, mostly due to device saturation 
(i.e., of the tier devices), mostly due to seeking.

Data on the cache tier is tactical: it's going to get pushed off the cache tier 
into the EC pool relatively quickly.  RAID-6 protection (which is roughly the 
same as I get with a 3x replicated pool) is fine.


I happen to have the ability to create a small RAID-6 (on each of several nodes) that could collectively serve as a cache tier.  And the RAID controller has a battery, so it can operate write-back, so latency goes way down.  Can I create a pool of unreplicated OSDs, i.e., can I set the size of the pool to 1?  It seems like this creates a singularity when it comes to CRUSH: do placement groups even make sense?  Or is there any way that I can use my RAID hardware to build a low-latency cache tier?
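(To be concrete about what I'm asking, the pool I have in mind would be created roughly like this - a sketch only, with the CRUSH rule pointed at the RAID-6-backed OSDs:

  ceph osd pool create ctpool 128 128 replicated
  ceph osd pool set ctpool size 1
  ceph osd pool set ctpool min_size 1
)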



Regards,

-don-


Re: [ceph-users] New EC pool undersized

2015-03-04 Thread Don Doerner
I don’t know – I am playing with crush; someday I may fully comprehend it.  Not 
today.

I think you have to look at it like this: if your possible failure domain 
options are OSDs, hosts, racks, …, and you choose racks as your failure domain, 
and you have exactly as many racks as your pool size (and it can’t be any 
smaller, right?), then each PG has to have an OSD from each rack.  If your 144 
OSDs are split evenly across 8 racks, then you have 18 OSDs in each rack 
(presumably distributed over the hosts in that rack, though I don’t think that 
distribution is important for this calculation).  And so your total number of 
choices is 18 to the 8th power, or just over 11 billion (actually, 
11,019,960,576J).  So probably the only thing you have to worry about is “crush 
giving up too soon”, and Yann’s resolution.

-don-

From: Kyle Hutson [mailto:kylehut...@ksu.edu]
Sent: 04 March, 2015 13:15
To: Don Doerner
Cc: ceph-users@lists.ceph.com
Subject: Re: [ceph-users] New EC pool undersized

So it sounds like I should figure out at 'how many nodes' I need to increase pg_num to 4096, and again for 8192, and increase those incrementally as I add more hosts, correct?

On Wed, Mar 4, 2015 at 3:04 PM, Don Doerner 
mailto:don.doer...@quantum.com>> wrote:
Sorry, I missed your other questions, down at the bottom.  See here <http://ceph.com/docs/master/rados/operations/placement-groups/> (look for “number of replicas for replicated pools or the K+M sum for erasure coded pools”) for the formula; 38400/8 probably implies 8192.

The thing is, you’ve got to think about how many ways you can form combinations 
of 8 unique OSDs (with replacement) that match your failure domain rules.  If 
you’ve only got 8 hosts, and your failure domain is hosts, it severely limits 
this number.  And I have read that too many isn’t good either – a serialization 
issue, I believe.

-don-

From: ceph-users 
[mailto:ceph-users-boun...@lists.ceph.com<mailto:ceph-users-boun...@lists.ceph.com>]
 On Behalf Of Don Doerner
Sent: 04 March, 2015 12:49
To: Kyle Hutson
Cc: ceph-users@lists.ceph.com<mailto:ceph-users@lists.ceph.com>

Subject: Re: [ceph-users] New EC pool undersized

Hmmm, I just struggled through this myself.  How many racks do you have?  If 
not more than 8, you might want to make your failure domain smaller?  I.e., 
maybe host?  That, at least, would allow you to debug the situation…

-don-

From: Kyle Hutson [mailto:kylehut...@ksu.edu]
Sent: 04 March, 2015 12:43
To: Don Doerner
Cc: Ceph Users
Subject: Re: [ceph-users] New EC pool undersized

It wouldn't let me simply change the pg_num, giving
Error EEXIST: specified pg_num 2048 <= current 8192

But that's not a big deal, I just deleted the pool and recreated with 'ceph osd 
pool create ec44pool 2048 2048 erasure ec44profile'
...and the result is quite similar: 'ceph status' is now
ceph status
cluster 196e5eb8-d6a7-4435-907e-ea028e946923
 health HEALTH_WARN 4 pgs degraded; 4 pgs stuck unclean; 4 pgs undersized
 monmap e1: 4 mons at {hobbit01=10.5.38.1:6789/0,hobbit02=10.5.38.2:6789/0,hobbit13=10.5.38.13:6789/0,hobbit14=10.5.38.14:6789/0}, election epoch 6, quorum 0,1,2,3 hobbit01,hobbit02,hobbit13,hobbit14
 osdmap e412: 144 osds: 144 up, 144 in
  pgmap v6798: 6144 pgs, 2 pools, 0 bytes data, 0 objects
90590 MB used, 640 TB / 640 TB avail
   4 active+undersized+degraded
6140 active+clean

'ceph pg dump_stuck results' in
ok
pg_stat  objects  mip  degr  misp  unf  bytes  log  disklog  state  state_stamp  v  reported  up  up_primary  acting  acting_primary  last_scrub  scrub_stamp  last_deep_scrub  deep_scrub_stamp
2.296  0  0  0  0  0  0  0  0  active+undersized+degraded  2015-03-04 14:33:26.672224  0'0  412:9  [5,55,91,2147483647,83,135,53,26]  5  [5,55,91,2147483647,83,135,53,26]  5  0'0  2015-03-04 14:33:15.649911  0'0  2015-03-04 14:33:15.649911
2.69c  0  0  0  0  0  0  0  0  active+undersized+degraded  2015-03-04 14:33:24.984802  0'0  412:9  [93,134,1,74,112,28,2147483647,60]  93  [93,134,1,74,112,28,2147483647,60]  93  0'0  2015-03-04 14:33:15.695747

Re: [ceph-users] New EC pool undersized

2015-03-04 Thread Don Doerner
Sorry, I missed your other questions, down at the bottom.  See 
here<http://ceph.com/docs/master/rados/operations/placement-groups/> (look for 
“number of replicas for replicated pools or the K+M sum for erasure coded 
pools”) for the formula; 38400/8 probably implies 8192.

The thing is, you’ve got to think about how many ways you can form combinations 
of 8 unique OSDs (with replacement) that match your failure domain rules.  If 
you’ve only got 8 hosts, and your failure domain is hosts, it severely limits 
this number.  And I have read that too many isn’t good either – a serialization 
issue, I believe.

-don-

From: ceph-users [mailto:ceph-users-boun...@lists.ceph.com] On Behalf Of Don 
Doerner
Sent: 04 March, 2015 12:49
To: Kyle Hutson
Cc: ceph-users@lists.ceph.com
Subject: Re: [ceph-users] New EC pool undersized

Hmmm, I just struggled through this myself.  How many racks do you have?  If 
not more than 8, you might want to make your failure domain smaller?  I.e., 
maybe host?  That, at least, would allow you to debug the situation…

-don-

From: Kyle Hutson [mailto:kylehut...@ksu.edu]
Sent: 04 March, 2015 12:43
To: Don Doerner
Cc: Ceph Users
Subject: Re: [ceph-users] New EC pool undersized

It wouldn't let me simply change the pg_num, giving
Error EEXIST: specified pg_num 2048 <= current 8192

But that's not a big deal, I just deleted the pool and recreated with 'ceph osd 
pool create ec44pool 2048 2048 erasure ec44profile'
...and the result is quite similar: 'ceph status' is now
ceph status
cluster 196e5eb8-d6a7-4435-907e-ea028e946923
 health HEALTH_WARN 4 pgs degraded; 4 pgs stuck unclean; 4 pgs undersized
 monmap e1: 4 mons at {hobbit01=10.5.38.1:6789/0,hobbit02=10.5.38.2:6789/0,hobbit13=10.5.38.13:6789/0,hobbit14=10.5.38.14:6789/0}, election epoch 6, quorum 0,1,2,3 hobbit01,hobbit02,hobbit13,hobbit14
 osdmap e412: 144 osds: 144 up, 144 in
  pgmap v6798: 6144 pgs, 2 pools, 0 bytes data, 0 objects
90590 MB used, 640 TB / 640 TB avail
   4 active+undersized+degraded
6140 active+clean

'ceph pg dump_stuck results' in
ok
pg_stat  objects  mip  degr  misp  unf  bytes  log  disklog  state  state_stamp  v  reported  up  up_primary  acting  acting_primary  last_scrub  scrub_stamp  last_deep_scrub  deep_scrub_stamp
2.296  0  0  0  0  0  0  0  0  active+undersized+degraded  2015-03-04 14:33:26.672224  0'0  412:9  [5,55,91,2147483647,83,135,53,26]  5  [5,55,91,2147483647,83,135,53,26]  5  0'0  2015-03-04 14:33:15.649911  0'0  2015-03-04 14:33:15.649911
2.69c  0  0  0  0  0  0  0  0  active+undersized+degraded  2015-03-04 14:33:24.984802  0'0  412:9  [93,134,1,74,112,28,2147483647,60]  93  [93,134,1,74,112,28,2147483647,60]  93  0'0  2015-03-04 14:33:15.695747  0'0  2015-03-04 14:33:15.695747
2.36d  0  0  0  0  0  0  0  0  active+undersized+degraded  2015-03-04 14:33:21.937620  0'0  412:9  [12,108,136,104,52,18,63,2147483647]  12  [12,108,136,104,52,18,63,2147483647]  12  0'0  2015-03-04 14:33:15.652480  0'0  2015-03-04 14:33:15.652480
2.5f7  0  0  0  0  0  0  0  0  active+undersized+degraded  2015-03-04 14:33:26.169242  0'0  412:9  [94,128,73,22,4,60,2147483647,113]  94  [94,128,73,22,4,60,2147483647,113]  94  0'0  2015-03-04 14:33:15.687695  0'0  2015-03-04 14:33:15.687695

I do have questions for you, even at this point, though.
1) Where did you find the formula (14400/(k+m))?
2) I was really trying to size this for when it goes to production, at which 
point it may have as many as 384 OSDs. Doesn't that imply I should have even 
more pgs?

On Wed, Mar 4, 2015 at 2:15 PM, Don Doerner 
mailto:don.doer...@quantum.com>> wrote:
Oh duh…  OK, then given a 4+4 erasure coding scheme, 14400/8 is 1800, so try 
2048.

-don-

From: ceph-users 
[mailto:ceph-users-boun...@lists.ceph.com<mailto:ceph-users-boun...@lists.ceph.com>]
 On Behalf Of Don Doerner
Sent: 04 March, 2015 12:14
To: Kyle Hutson; Ceph Users
Subject: Re: [ceph-users] New EC pool undersized

In this case, that number means that there is not an OSD that can be assigned.  
What’s your k, m from your erasure coded pool?  You’ll need approximately 
(14400/(k+m)) PGs, rounded up to the next power of 2…

-don-

From: ceph-users [mailto:ceph-users-boun...@lists.ceph.com] On Behalf Of Kyle 
Hutson
Sent: 04 March

Re: [ceph-users] New EC pool undersized

2015-03-04 Thread Don Doerner
Hmmm, I just struggled through this myself.  How many racks do you have?  If 
not more than 8, you might want to make your failure domain smaller?  I.e., 
maybe host?  That, at least, would allow you to debug the situation…
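Something like this would let you test that quickly (a sketch, reusing your names):

  ceph osd erasure-code-profile set ec44host k=4 m=4 ruleset-failure-domain=host
  ceph osd pool create ec44hostpool 2048 2048 erasure ec44host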

-don-

From: Kyle Hutson [mailto:kylehut...@ksu.edu]
Sent: 04 March, 2015 12:43
To: Don Doerner
Cc: Ceph Users
Subject: Re: [ceph-users] New EC pool undersized

It wouldn't let me simply change the pg_num, giving
Error EEXIST: specified pg_num 2048 <= current 8192

But that's not a big deal, I just deleted the pool and recreated with 'ceph osd 
pool create ec44pool 2048 2048 erasure ec44profile'
...and the result is quite similar: 'ceph status' is now
ceph status
cluster 196e5eb8-d6a7-4435-907e-ea028e946923
 health HEALTH_WARN 4 pgs degraded; 4 pgs stuck unclean; 4 pgs undersized
 monmap e1: 4 mons at {hobbit01=10.5.38.1:6789/0,hobbit02=10.5.38.2:6789/0,hobbit13=10.5.38.13:6789/0,hobbit14=10.5.38.14:6789/0}, election epoch 6, quorum 0,1,2,3 hobbit01,hobbit02,hobbit13,hobbit14
 osdmap e412: 144 osds: 144 up, 144 in
  pgmap v6798: 6144 pgs, 2 pools, 0 bytes data, 0 objects
90590 MB used, 640 TB / 640 TB avail
   4 active+undersized+degraded
6140 active+clean

'ceph pg dump_stuck results' in
ok
pg_stat  objects  mip  degr  misp  unf  bytes  log  disklog  state  state_stamp  v  reported  up  up_primary  acting  acting_primary  last_scrub  scrub_stamp  last_deep_scrub  deep_scrub_stamp
2.296  0  0  0  0  0  0  0  0  active+undersized+degraded  2015-03-04 14:33:26.672224  0'0  412:9  [5,55,91,2147483647,83,135,53,26]  5  [5,55,91,2147483647,83,135,53,26]  5  0'0  2015-03-04 14:33:15.649911  0'0  2015-03-04 14:33:15.649911
2.69c  0  0  0  0  0  0  0  0  active+undersized+degraded  2015-03-04 14:33:24.984802  0'0  412:9  [93,134,1,74,112,28,2147483647,60]  93  [93,134,1,74,112,28,2147483647,60]  93  0'0  2015-03-04 14:33:15.695747  0'0  2015-03-04 14:33:15.695747
2.36d  0  0  0  0  0  0  0  0  active+undersized+degraded  2015-03-04 14:33:21.937620  0'0  412:9  [12,108,136,104,52,18,63,2147483647]  12  [12,108,136,104,52,18,63,2147483647]  12  0'0  2015-03-04 14:33:15.652480  0'0  2015-03-04 14:33:15.652480
2.5f7  0  0  0  0  0  0  0  0  active+undersized+degraded  2015-03-04 14:33:26.169242  0'0  412:9  [94,128,73,22,4,60,2147483647,113]  94  [94,128,73,22,4,60,2147483647,113]  94  0'0  2015-03-04 14:33:15.687695  0'0  2015-03-04 14:33:15.687695

I do have questions for you, even at this point, though.
1) Where did you find the formula (14400/(k+m))?
2) I was really trying to size this for when it goes to production, at which 
point it may have as many as 384 OSDs. Doesn't that imply I should have even 
more pgs?

On Wed, Mar 4, 2015 at 2:15 PM, Don Doerner 
mailto:don.doer...@quantum.com>> wrote:
Oh duh…  OK, then given a 4+4 erasure coding scheme, 14400/8 is 1800, so try 
2048.

-don-

From: ceph-users 
[mailto:ceph-users-boun...@lists.ceph.com<mailto:ceph-users-boun...@lists.ceph.com>]
 On Behalf Of Don Doerner
Sent: 04 March, 2015 12:14
To: Kyle Hutson; Ceph Users
Subject: Re: [ceph-users] New EC pool undersized

In this case, that number means that there is not an OSD that can be assigned.  
What’s your k, m from your erasure coded pool?  You’ll need approximately 
(14400/(k+m)) PGs, rounded up to the next power of 2…

-don-

From: ceph-users [mailto:ceph-users-boun...@lists.ceph.com] On Behalf Of Kyle 
Hutson
Sent: 04 March, 2015 12:06
To: Ceph Users
Subject: [ceph-users] New EC pool undersized

Last night I blew away my previous ceph configuration (this environment is 
pre-production) and have 0.87.1 installed. I've manually edited the crushmap so 
it now looks like https://dpaste.de/OLEa

I currently have 144 OSDs on 8 nodes.

After increasing pg_num and pgp_num to a more suitable 1024 (due to the high 
number of OSDs), everything looked happy.
So, now I'm trying to play with an erasure-coded pool.
I did:
ceph osd erasure-code-prof

Re: [ceph-users] New EC pool undersized

2015-03-04 Thread Don Doerner
Oh duh…  OK, then given a 4+4 erasure coding scheme, 14400/8 is 1800, so try 
2048.

-don-

From: ceph-users [mailto:ceph-users-boun...@lists.ceph.com] On Behalf Of Don 
Doerner
Sent: 04 March, 2015 12:14
To: Kyle Hutson; Ceph Users
Subject: Re: [ceph-users] New EC pool undersized

In this case, that number means that there is not an OSD that can be assigned.  
What’s your k, m from your erasure coded pool?  You’ll need approximately 
(14400/(k+m)) PGs, rounded up to the next power of 2…

-don-

From: ceph-users [mailto:ceph-users-boun...@lists.ceph.com] On Behalf Of Kyle 
Hutson
Sent: 04 March, 2015 12:06
To: Ceph Users
Subject: [ceph-users] New EC pool undersized

Last night I blew away my previous ceph configuration (this environment is 
pre-production) and have 0.87.1 installed. I've manually edited the crushmap so 
it now looks like https://dpaste.de/OLEa

I currently have 144 OSDs on 8 nodes.

After increasing pg_num and pgp_num to a more suitable 1024 (due to the high 
number of OSDs), everything looked happy.
So, now I'm trying to play with an erasure-coded pool.
I did:
ceph osd erasure-code-profile set ec44profile k=4 m=4 
ruleset-failure-domain=rack
ceph osd pool create ec44pool 8192 8192 erasure ec44profile

After settling for a bit 'ceph status' gives
cluster 196e5eb8-d6a7-4435-907e-ea028e946923
 health HEALTH_WARN 7 pgs degraded; 7 pgs stuck degraded; 7 pgs stuck 
unclean; 7 pgs stuck undersized; 7 pgs undersized
 monmap e1: 4 mons at {hobbit01=10.5.38.1:6789/0,hobbit02=10.5.38.2:6789/0,hobbit13=10.5.38.13:6789/0,hobbit14=10.5.38.14:6789/0},
 election epoch 6, quorum 0,1,2,3 hobbit01,hobbit02,hobbit13,hobbit14
 osdmap e409: 144 osds: 144 up, 144 in
  pgmap v6763: 12288 pgs, 2 pools, 0 bytes data, 0 objects
        90598 MB used, 640 TB / 640 TB avail
                7 active+undersized+degraded
            12281 active+clean

So to troubleshoot the undersized pgs, I issued 'ceph pg dump_stuck'
ok
pg_stat  objects  mip  degr  misp  unf  bytes  log  disklog  state  state_stamp  v  reported  up  up_primary  acting  acting_primary  last_scrub  scrub_stamp  last_deep_scrub  deep_scrub_stamp
1.d77  0  0  0  0  0  0  0  0  active+undersized+degraded  2015-03-04 11:33:57.502849  0'0  408:12  [15,95,58,73,52,31,116,2147483647]  15  [15,95,58,73,52,31,116,2147483647]  15  0'0  2015-03-04 11:33:42.100752  0'0  2015-03-04 11:33:42.100752
1.10fa  0  0  0  0  0  0  0  0  active+undersized+degraded  2015-03-04 11:34:29.362554  0'0  408:12  [23,12,99,114,132,53,56,2147483647]  23  [23,12,99,114,132,53,56,2147483647]  23  0'0  2015-03-04 11:33:42.168571  0'0  2015-03-04 11:33:42.168571
1.1271  0  0  0  0  0  0  0  0  active+undersized+degraded  2015-03-04 11:33:48.795742  0'0  408:12  [135,112,69,4,22,95,2147483647,83]  135  [135,112,69,4,22,95,2147483647,83]  135  0'0  2015-03-04 11:33:42.139555  0'0  2015-03-04 11:33:42.139555
1.2b5  0  0  0  0  0  0  0  0  active+undersized+degraded  2015-03-04 11:34:32.189738  0'0  408:12  [11,115,139,19,76,52,94,2147483647]  11  [11,115,139,19,76,52,94,2147483647]  11  0'0  2015-03-04 11:33:42.079673  0'0  2015-03-04 11:33:42.079673
1.7ae  0  0  0  0  0  0  0  0  active+undersized+degraded  2015-03-04 11:34:26.848344  0'0  408:12  [27,5,132,119,94,56,52,2147483647]  27  [27,5,132,119,94,56,52,2147483647]  27  0'0  2015-03-04 11:33:42.109832  0'0  2015-03-04 11:33:42.109832
1.1a97  0  0  0  0  0  0  0  0  active+undersized+degraded  2015-03-04 11:34:25.457454  0'0  408:12  [20,53,14,54,102,118,2147483647,72]  20  [20,53,14,54,102,118,2147483647,72]  20  0'0  2015-03-04 11:33:42.833850  0'0  2015-03-04 11:33:42.833850
1.10a6  0  0  0  0  0  0  0  0  active+undersized+degraded  2015-03-04 11:34:30.059936  0'0  408:12  [136,22,4,2147483647,72,52,101,55]  136  [136,22,4,2147483647,72,52,101,55]  136  0'0  2015-03-04 11:33:42.125871  0'0  2015-03-04 11:33:42.125871

Re: [ceph-users] New EC pool undersized

2015-03-04 Thread Don Doerner
In this case, that number means that there is no OSD that can be assigned.
What are your k and m for your erasure-coded pool?  You'll need approximately
(14400/(k+m)) PGs, rounded up to the next power of 2…

-don-

From: ceph-users [mailto:ceph-users-boun...@lists.ceph.com] On Behalf Of Kyle 
Hutson
Sent: 04 March, 2015 12:06
To: Ceph Users
Subject: [ceph-users] New EC pool undersized

Last night I blew away my previous ceph configuration (this environment is 
pre-production) and have 0.87.1 installed. I've manually edited the crushmap so 
it now looks like
https://dpaste.de/OLEa

I currently have 144 OSDs on 8 nodes.

After increasing pg_num and pgp_num to a more suitable 1024 (due to the high 
number of OSDs), everything looked happy.
So, now I'm trying to play with an erasure-coded pool.
I did:
ceph osd erasure-code-profile set ec44profile k=4 m=4 
ruleset-failure-domain=rack
ceph osd pool create ec44pool 8192 8192 erasure ec44profile

After settling for a bit 'ceph status' gives
cluster 196e5eb8-d6a7-4435-907e-ea028e946923
 health HEALTH_WARN 7 pgs degraded; 7 pgs stuck degraded; 7 pgs stuck 
unclean; 7 pgs stuck undersized; 7 pgs undersized
 monmap e1: 4 mons at 
{hobbit01=10.5.38.1:6789/0,hobbit02=10.5.38.2:6789/0,hobbit13=10.5.38.13:6789/0,hobbit14=10.5.38.14:6789/0},
 election epoch 6, quorum 0,1,2,3 hobbit01,hobbit02,hobbit13,hobbit14
 osdmap e409: 144 osds: 144 up, 144 in
  pgmap v6763: 12288 pgs, 2 pools, 0 bytes data, 0 objects
        90598 MB used, 640 TB / 640 TB avail
                7 active+undersized+degraded
            12281 active+clean

So to troubleshoot the undersized pgs, I issued 'ceph pg dump_stuck'
ok
pg_stat  objects  mip  degr  misp  unf  bytes  log  disklog  state  state_stamp  v  reported  up  up_primary  acting  acting_primary  last_scrub  scrub_stamp  last_deep_scrub  deep_scrub_stamp
1.d77  0  0  0  0  0  0  0  0  active+undersized+degraded  2015-03-04 11:33:57.502849  0'0  408:12  [15,95,58,73,52,31,116,2147483647]  15  [15,95,58,73,52,31,116,2147483647]  15  0'0  2015-03-04 11:33:42.100752  0'0  2015-03-04 11:33:42.100752
1.10fa  0  0  0  0  0  0  0  0  active+undersized+degraded  2015-03-04 11:34:29.362554  0'0  408:12  [23,12,99,114,132,53,56,2147483647]  23  [23,12,99,114,132,53,56,2147483647]  23  0'0  2015-03-04 11:33:42.168571  0'0  2015-03-04 11:33:42.168571
1.1271  0  0  0  0  0  0  0  0  active+undersized+degraded  2015-03-04 11:33:48.795742  0'0  408:12  [135,112,69,4,22,95,2147483647,83]  135  [135,112,69,4,22,95,2147483647,83]  135  0'0  2015-03-04 11:33:42.139555  0'0  2015-03-04 11:33:42.139555
1.2b5  0  0  0  0  0  0  0  0  active+undersized+degraded  2015-03-04 11:34:32.189738  0'0  408:12  [11,115,139,19,76,52,94,2147483647]  11  [11,115,139,19,76,52,94,2147483647]  11  0'0  2015-03-04 11:33:42.079673  0'0  2015-03-04 11:33:42.079673
1.7ae  0  0  0  0  0  0  0  0  active+undersized+degraded  2015-03-04 11:34:26.848344  0'0  408:12  [27,5,132,119,94,56,52,2147483647]  27  [27,5,132,119,94,56,52,2147483647]  27  0'0  2015-03-04 11:33:42.109832  0'0  2015-03-04 11:33:42.109832
1.1a97  0  0  0  0  0  0  0  0  active+undersized+degraded  2015-03-04 11:34:25.457454  0'0  408:12  [20,53,14,54,102,118,2147483647,72]  20  [20,53,14,54,102,118,2147483647,72]  20  0'0  2015-03-04 11:33:42.833850  0'0  2015-03-04 11:33:42.833850
1.10a6  0  0  0  0  0  0  0  0  active+undersized+degraded  2015-03-04 11:34:30.059936  0'0  408:12  [136,22,4,2147483647,72,52,101,55]  136  [136,22,4,2147483647,72,52,101,55]  136  0'0  2015-03-04 11:33:42.125871  0'0  2015-03-04 11:33:42.125871

All of these have a number (2147483647) that is way out of line from what I would expect.

Thoughts?
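
(2147483647 is ITEM_NONE, i.e. "CRUSH could not find an OSD for this slot".)  A
hedged sketch of how one might test the rule offline; the ruleset number is an
assumption and has to be read from the pool first:

ceph osd getcrushmap -o crushmap.bin
ceph osd pool get ec44pool crush_ruleset      # note the ruleset number
crushtool -i crushmap.bin --test --rule <ruleset> --num-rep 8 --show-bad-mappings
# any mapping printed with 2147483647 means the rule ran out of places to put a chunk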



Re: [ceph-users] EC configuration questions...

2015-03-03 Thread Don Doerner
Loic,

Thank you, I got it created.  One of these days, I am going to have to try to 
understand some of the crush map details...  Anyway, on to the next step!

-don-

___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] EC configuration questions...

2015-03-02 Thread Don Doerner
Update: the attempt to define a traditional replicated pool was  successful; 
it's online and ready to go.  So the cluster basics appear sound...

-don-

From: ceph-users [mailto:ceph-users-boun...@lists.ceph.com] On Behalf Of Don 
Doerner
Sent: 02 March, 2015 16:18
To: ceph-users@lists.ceph.com
Subject: [ceph-users] EC configuration questions...
Sensitivity: Personal

Hello,

I am trying to set up to measure erasure coding performance and overhead.  My 
Ceph "cluster-of-one" has 27 disks, hence 27 OSDs, all empty.  I have lots of
memory, and I am using "osd crush chooseleaf type = 0" in my config file, so my 
OSDs should be able to peer with others on the same host, right?
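
For reference, a minimal sketch of the ceph.conf fragment being described here
(the [global] placement is an assumption):

[global]
# let CRUSH pick leaves of type 0 (OSDs), so chunks/replicas may land on
# different OSDs of the same host in a single-node cluster
osd crush chooseleaf type = 0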

I look at the EC profiles defined, and see only "default" which has k=2,m=1.  
Wanting to set up a more realistic test, I defined a new profile "k8m3", 
similar to default, but with k=8,m=3.

Checked with "ceph osd erasure-code-profile get k8m3", all looks good.

I then go to define my pool: "ceph osd pool create ecpool 256 256 erasure k8m3" 
apparently succeeds.

*Sidebar: my math on the pgnum stuff was (27 OSDs * 100)/11 = ~246, 
round up to 256.

Now I ask "ceph health", and get:
HEALTH_WARN 256 pgs incomplete; 256 pgs stuck inactive; 256 pgs stuck unclean; 
too few pgs per osd (9 < min 20)

Digging in to this a bit ("ceph health detail"), I see the magic OSD number 
(2147483647) that says that there weren't enough OSDs to assign to a placement 
group, for all placement groups.  And at the same time, it is warning me that I 
have too few PGs per OSD.

At the moment, I am defining a traditional replicated pool (3X) to see if that 
will work...  Anyone have any guess as to what I may be doing incorrectly with 
my erasure coded pool?  Or what I should do next to get a clue?
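
One hedged guess, with a sketch: the erasure-code profile defaults to a host
failure domain, which "osd crush chooseleaf type = 0" does not override for the
rule the EC pool generates, so a single host cannot satisfy k+m=11.  The profile
and pool names below are illustrative only:

ceph osd erasure-code-profile set k8m3osd k=8 m=3 ruleset-failure-domain=osd
ceph osd pool create ecpool2 256 256 erasure k8m3osd
ceph health detail   # the 2147483647 entries should go away if this was the cause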

Regards,

-don-


___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


[ceph-users] EC configuration questions...

2015-03-02 Thread Don Doerner
Hello,

I am trying to set up to measure erasure coding performance and overhead.  My 
Ceph "cluster-of-one" has 27 disks, hence 27 OSDs, all empty.  I have lots of
memory, and I am using "osd crush chooseleaf type = 0" in my config file, so my 
OSDs should be able to peer with others on the same host, right?

I look at the EC profiles defined, and see only "default" which has k=2,m=1.  
Wanting to set up a more realistic test, I defined a new profile "k8m3", 
similar to default, but with k=8,m=3.

Checked with "ceph osd erasure-code-profile get k8m3", all looks good.

I then go to define my pool: "ceph osd pool create ecpool 256 256 erasure k8m3" 
apparently succeeds.

*Sidebar: my math on the pgnum stuff was (27 OSDs * 100)/11 = ~246, 
round up to 256.

Now I ask "ceph health", and get:
HEALTH_WARN 256 pgs incomplete; 256 pgs stuck inactive; 256 pgs stuck unclean; 
too few pgs per osd (9 < min 20)

Digging in to this a bit ("ceph health detail"), I see the magic OSD number 
(2147483647) that says that there weren't enough OSDs to assign to a placement 
group, for all placement groups.  And at the same time, it is warning me that I 
have too few PGs per OSD.

At the moment, I am defining a traditional replicated pool (3X) to see if that 
will work...  Anyone have any guess as to what I may be doing incorrectly with 
my erasure coded pool?  Or what I should do next to get a clue?

Regards,

-don-

___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] Fresh install of GIANT failing?

2015-03-02 Thread Don Doerner
Problem solved, I've been pointed at repository problem and an existing Ceph 
issue (http://tracker.ceph.com/issues/10476) by a couple of helpful folks.
Thanks,

-don-
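
For anyone hitting the same thing, a sketch of the repository-priority
workaround that issue describes; it assumes the conflict comes from EPEL's older
0.80.7 userland packages shadowing the giant repo, and the exact edit is
illustrative:

sudo yum install -y yum-plugin-priorities
# prefer the Ceph repos over EPEL so librados2/librbd1/libcephfs1 and the
# python bindings all resolve from the same (giant) source
sudo sed -i '/^\[Ceph/a priority=1' /etc/yum.repos.d/ceph.repo
sudo yum clean all
ceph-deploy install <node>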

From: Don Doerner
Sent: 02 March, 2015 10:20
To: Don Doerner; ceph-users@lists.ceph.com
Subject: RE: Fresh install of GIANT failing?
Sensitivity: Personal

Oops, typo...  should say "Using ceph-deploy, I see a failure to install ceph 
on a RHEL7 node"...

-don-

From: ceph-users [mailto:ceph-users-boun...@lists.ceph.com] On Behalf Of Don 
Doerner
Sent: 02 March, 2015 10:17
To: ceph-users@lists.ceph.com
Subject: [ceph-users] Fresh install of GIANT failing?
Sensitivity: Personal

All,
Using ceph-deploy, I see a failure to install ceph on a node.  At the beginning 
of the ceph-deploy output, it says it is installing "stable version giant".

The last few lines are...
[192.168.167.192][DEBUG ] --> Finished Dependency Resolution
[192.168.167.192][WARNIN] Error: Package: 1:python-flask-0.10.1-3.el7.noarch 
(Ceph-noarch)
[192.168.167.192][WARNIN]Requires: python-jinja2
[192.168.167.192][WARNIN] Error: Package: 1:python-cephfs-0.80.7-0.4.el7.x86_64 
(epel)
[192.168.167.192][WARNIN]Requires: libcephfs1 = 1:0.80.7
[192.168.167.192][WARNIN]Available: 1:libcephfs1-0.86-0.el7.x86_64 
(Ceph)
[192.168.167.192][WARNIN]libcephfs1 = 1:0.86-0.el7
[192.168.167.192][WARNIN]Available: 1:libcephfs1-0.87-0.el7.x86_64 
(Ceph)
[192.168.167.192][DEBUG ]  You could try using --skip-broken to work around the 
problem
[192.168.167.192][WARNIN]libcephfs1 = 1:0.87-0.el7
[192.168.167.192][WARNIN]Installing: 
1:libcephfs1-0.87.1-0.el7.x86_64 (Ceph)
[192.168.167.192][WARNIN]libcephfs1 = 1:0.87.1-0.el7
[192.168.167.192][WARNIN] Error: Package: 1:python-rados-0.80.7-0.4.el7.x86_64 
(epel)
[192.168.167.192][WARNIN]Requires: librados2 = 1:0.80.7
[192.168.167.192][WARNIN]Available: 1:librados2-0.86-0.el7.x86_64 
(Ceph)
[192.168.167.192][WARNIN]librados2 = 1:0.86-0.el7
[192.168.167.192][WARNIN]Available: 1:librados2-0.87-0.el7.x86_64 
(Ceph)
[192.168.167.192][WARNIN]librados2 = 1:0.87-0.el7
[192.168.167.192][WARNIN]Installing: 
1:librados2-0.87.1-0.el7.x86_64 (Ceph)
[192.168.167.192][WARNIN]librados2 = 1:0.87.1-0.el7
[192.168.167.192][WARNIN] Error: Package: 1:python-rbd-0.80.7-0.4.el7.x86_64 
(epel)
[192.168.167.192][WARNIN]Requires: librbd1 = 1:0.80.7
[192.168.167.192][WARNIN]Available: 1:librbd1-0.86-0.el7.x86_64 
(Ceph)
[192.168.167.192][WARNIN]librbd1 = 1:0.86-0.el7
[192.168.167.192][WARNIN]Available: 1:librbd1-0.87-0.el7.x86_64 
(Ceph)
[192.168.167.192][WARNIN]librbd1 = 1:0.87-0.el7
[192.168.167.192][WARNIN]Installing: 1:librbd1-0.87.1-0.el7.x86_64 
(Ceph)
[192.168.167.192][WARNIN]librbd1 = 1:0.87.1-0.el7
[192.168.167.192][DEBUG ]  You could try running: rpm -Va --nofiles --nodigest
[192.168.167.192][ERROR ] RuntimeError: command returned non-zero exit status: 1
[ceph_deploy][ERROR ] RuntimeError: Failed to execute command: yum -y install 
ceph

Offhand, it looks like there is some FIREFLY stuff in the GIANT repo?  Also, 
what source is everyone using for "python-jinja2"?

Prior to the recent GIANT update, this all worked (except for "python-jinja2" 
which I got from an EU repo somewhere...).

Any idea what I did wrong?

Regards,

-don-


___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] Fresh install of GIANT failing?

2015-03-02 Thread Don Doerner
Oops, typo...  should say "Using ceph-deploy, I see a failure to install ceph 
on a RHEL7 node"...

-don-

From: ceph-users [mailto:ceph-users-boun...@lists.ceph.com] On Behalf Of Don 
Doerner
Sent: 02 March, 2015 10:17
To: ceph-users@lists.ceph.com
Subject: [ceph-users] Fresh install of GIANT failing?
Sensitivity: Personal

All,
Using ceph-deploy, I see a failure to install ceph on a node.  At the beginning 
of the ceph-deploy output, it says it is installing "stable version giant".

The last few lines are...
[192.168.167.192][DEBUG ] --> Finished Dependency Resolution
[192.168.167.192][WARNIN] Error: Package: 1:python-flask-0.10.1-3.el7.noarch 
(Ceph-noarch)
[192.168.167.192][WARNIN]Requires: python-jinja2
[192.168.167.192][WARNIN] Error: Package: 1:python-cephfs-0.80.7-0.4.el7.x86_64 
(epel)
[192.168.167.192][WARNIN]Requires: libcephfs1 = 1:0.80.7
[192.168.167.192][WARNIN]Available: 1:libcephfs1-0.86-0.el7.x86_64 
(Ceph)
[192.168.167.192][WARNIN]libcephfs1 = 1:0.86-0.el7
[192.168.167.192][WARNIN]Available: 1:libcephfs1-0.87-0.el7.x86_64 
(Ceph)
[192.168.167.192][DEBUG ]  You could try using --skip-broken to work around the 
problem
[192.168.167.192][WARNIN]libcephfs1 = 1:0.87-0.el7
[192.168.167.192][WARNIN]Installing: 
1:libcephfs1-0.87.1-0.el7.x86_64 (Ceph)
[192.168.167.192][WARNIN]libcephfs1 = 1:0.87.1-0.el7
[192.168.167.192][WARNIN] Error: Package: 1:python-rados-0.80.7-0.4.el7.x86_64 
(epel)
[192.168.167.192][WARNIN]Requires: librados2 = 1:0.80.7
[192.168.167.192][WARNIN]Available: 1:librados2-0.86-0.el7.x86_64 
(Ceph)
[192.168.167.192][WARNIN]librados2 = 1:0.86-0.el7
[192.168.167.192][WARNIN]Available: 1:librados2-0.87-0.el7.x86_64 
(Ceph)
[192.168.167.192][WARNIN]librados2 = 1:0.87-0.el7
[192.168.167.192][WARNIN]Installing: 
1:librados2-0.87.1-0.el7.x86_64 (Ceph)
[192.168.167.192][WARNIN]librados2 = 1:0.87.1-0.el7
[192.168.167.192][WARNIN] Error: Package: 1:python-rbd-0.80.7-0.4.el7.x86_64 
(epel)
[192.168.167.192][WARNIN]Requires: librbd1 = 1:0.80.7
[192.168.167.192][WARNIN]Available: 1:librbd1-0.86-0.el7.x86_64 
(Ceph)
[192.168.167.192][WARNIN]librbd1 = 1:0.86-0.el7
[192.168.167.192][WARNIN]Available: 1:librbd1-0.87-0.el7.x86_64 
(Ceph)
[192.168.167.192][WARNIN]librbd1 = 1:0.87-0.el7
[192.168.167.192][WARNIN]Installing: 1:librbd1-0.87.1-0.el7.x86_64 
(Ceph)
[192.168.167.192][WARNIN]librbd1 = 1:0.87.1-0.el7
[192.168.167.192][DEBUG ]  You could try running: rpm -Va --nofiles --nodigest
[192.168.167.192][ERROR ] RuntimeError: command returned non-zero exit status: 1
[ceph_deploy][ERROR ] RuntimeError: Failed to execute command: yum -y install 
ceph

Offhand, it looks like there is some FIREFLY stuff in the GIANT repo?  Also, 
what source is everyone using for "python-jinja2"?

Prior to the recent GIANT update, this all worked (except for "python-jinja2" 
which I got from an EU repo somewhere...).

Any idea what I did wrong?

Regards,

-don-


___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


[ceph-users] Fresh install of GIANT failing?

2015-03-02 Thread Don Doerner
All,
Using ceph-deploy, I see a failure to install ceph on a node.  At the beginning 
of the ceph-deploy output, it says it is installing "stable version giant".

The last few lines are...
[192.168.167.192][DEBUG ] --> Finished Dependency Resolution
[192.168.167.192][WARNIN] Error: Package: 1:python-flask-0.10.1-3.el7.noarch 
(Ceph-noarch)
[192.168.167.192][WARNIN]Requires: python-jinja2
[192.168.167.192][WARNIN] Error: Package: 1:python-cephfs-0.80.7-0.4.el7.x86_64 
(epel)
[192.168.167.192][WARNIN]Requires: libcephfs1 = 1:0.80.7
[192.168.167.192][WARNIN]Available: 1:libcephfs1-0.86-0.el7.x86_64 
(Ceph)
[192.168.167.192][WARNIN]libcephfs1 = 1:0.86-0.el7
[192.168.167.192][WARNIN]Available: 1:libcephfs1-0.87-0.el7.x86_64 
(Ceph)
[192.168.167.192][DEBUG ]  You could try using --skip-broken to work around the 
problem
[192.168.167.192][WARNIN]libcephfs1 = 1:0.87-0.el7
[192.168.167.192][WARNIN]Installing: 
1:libcephfs1-0.87.1-0.el7.x86_64 (Ceph)
[192.168.167.192][WARNIN]libcephfs1 = 1:0.87.1-0.el7
[192.168.167.192][WARNIN] Error: Package: 1:python-rados-0.80.7-0.4.el7.x86_64 
(epel)
[192.168.167.192][WARNIN]Requires: librados2 = 1:0.80.7
[192.168.167.192][WARNIN]Available: 1:librados2-0.86-0.el7.x86_64 
(Ceph)
[192.168.167.192][WARNIN]librados2 = 1:0.86-0.el7
[192.168.167.192][WARNIN]Available: 1:librados2-0.87-0.el7.x86_64 
(Ceph)
[192.168.167.192][WARNIN]librados2 = 1:0.87-0.el7
[192.168.167.192][WARNIN]Installing: 
1:librados2-0.87.1-0.el7.x86_64 (Ceph)
[192.168.167.192][WARNIN]librados2 = 1:0.87.1-0.el7
[192.168.167.192][WARNIN] Error: Package: 1:python-rbd-0.80.7-0.4.el7.x86_64 
(epel)
[192.168.167.192][WARNIN]Requires: librbd1 = 1:0.80.7
[192.168.167.192][WARNIN]Available: 1:librbd1-0.86-0.el7.x86_64 
(Ceph)
[192.168.167.192][WARNIN]librbd1 = 1:0.86-0.el7
[192.168.167.192][WARNIN]Available: 1:librbd1-0.87-0.el7.x86_64 
(Ceph)
[192.168.167.192][WARNIN]librbd1 = 1:0.87-0.el7
[192.168.167.192][WARNIN]Installing: 1:librbd1-0.87.1-0.el7.x86_64 
(Ceph)
[192.168.167.192][WARNIN]librbd1 = 1:0.87.1-0.el7
[192.168.167.192][DEBUG ]  You could try running: rpm -Va --nofiles --nodigest
[192.168.167.192][ERROR ] RuntimeError: command returned non-zero exit status: 1
[ceph_deploy][ERROR ] RuntimeError: Failed to execute command: yum -y install 
ceph

Offhand, it looks like there is some FIREFLY stuff in the GIANT repo?  Also, 
what source is everyone using for "python-jinja2"?

Prior to the recent GIANT update, this all worked (except for "python-jinja2" 
which I got from an EU repo somewhere...).

Any idea what I did wrong?

Regards,

-don-

___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] RBD deprecated?

2015-02-05 Thread Don Doerner
Ken,

Thanks for the reply.

It's really good news that RBD is considered strategic.  I'm guessing I can 
use the firefly kernel modules on a giant ceph system, as long as RHEL7 is in 
play?  No serious changes in that code from firefly to giant (I'm hoping)?

 
Regards,

-don-


On Thursday, February 5, 2015 10:05 AM, Ken Dreyer  wrote:
 


On 02/05/2015 08:55 AM, Don Doerner wrote:

> I have been using Ceph to provide block devices for various, nefarious
> purposes (mostly testing ;-).  But as I have worked with various Linux
> distributions (RHEL7, CentOS6, CentOS7) and various Ceph releases
> (firefly, giant), I notice that the onlycombination for which I seem
> able to find the needed kernel modules (rbd, libceph) is RHEL7-firefly.

Hi Don,

The RBD kernel module is not deprecated; quite the opposite in fact.

A year ago things were a bit rough regarding supporting the Ceph kernel
modules on RHEL 6 and 7. All Ceph kernel module development goes
upstream first into Linus' kernel tree, and that tree is very different
than what ships in RHEL 6 (2.6.32 plus a lot of patches) and RHEL 7
(3.10.0 plus a lot of patches). This meant that it was historically much
harder for the Ceph developer community to integrate what was going on
upstream with what was happening in the downstream RHEL kernels.

Currently, Red Hat's plan is to ship rbd.ko and some of the associated
firefly userland bits in RHEL 7.1. You mention that you've been testing
on RHEL 7, so I'm guessing you've got a RHEL subscription. As it turns
out, you can try the new kernel package out today in the RHEL 7.1 Beta
that's available to all RHEL subscribers. It's a beta, so please open
support requests with Red Hat if you happen to hit bugs with those new
packages.

Unfortunately CentOS does not rebuild and publish the public RHEL Betas,
so for CentOS 7, you'll have to wait until RHEL 7.1 reaches GA and
CentOS 7.1 rebuilds it. (I suppose you could jump ahead of the CentOS
developers here and rebuild your own kernel package and ceph userland if
you're really eager... but you're really on your own there :)

- Ken
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


[ceph-users] RBD deprecated?

2015-02-05 Thread Don Doerner
All,

I have been using Ceph to provide block devices for various, nefarious purposes 
(mostly testing ;-).  But as I have worked with various Linux distributions 
(RHEL7, CentOS6, CentOS7) and various Ceph releases (firefly, giant), I notice 
that the only combination for which I seem able to find the needed kernel 
modules (rbd, libceph) is RHEL7-firefly.

I've gone a far ways with that one combination, but I am hitting some limits 
now, in terms of flexibility of configurations.


This suggests to me that either I don't know where/how I am supposed to get the 
needed kernel modules or (more likely, given that I found the RHEL7-firefly 
combination) that the block device presentation is deprecated (or at least 
disfavored).

Can anyone comment on whether it's a good idea to plan around it, going forward?

 
Regards,

-don-
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


[ceph-users] Moving a Ceph cluster (to a new network)

2015-01-30 Thread Don Doerner
All,

I built up a ceph system on my little development network, then tried to move 
it to a different network.  I edited the ceph.conf file, and fired it up and... 
well, I discovered that I was a bit naive.

I looked through the documentation pretty carefully, and I can't see any list 
of places that the original network addresses are stashed.  Can anyone point me 
at a procedure for changing network addresses like that?  Or point me at a list 
of what all has to be updated (e.g., I am guessing that my keys are all broken)?

In my case, I could recreate the entire cluster but later, when the OSDs have 
valuable data, that won't be an option.  So I'd like to learn how to do this 
now, when the jeopardy is low...
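
For what it's worth, the old addresses live mainly in the monitor map (plus
ceph.conf and any mon_host entries on clients); cephx keys are not tied to IP
addresses.  A rough, untested sketch of the usual monmap-editing procedure, with
placeholder names and addresses:

ceph mon getmap -o /tmp/monmap        # while the cluster is still up on the old network
monmaptool --print /tmp/monmap        # shows the old addresses
monmaptool --rm mon-a /tmp/monmap
monmaptool --add mon-a 192.168.1.10:6789 /tmp/monmap
# stop the monitors, update ceph.conf, then on each monitor host:
ceph-mon -i mon-a --inject-monmap /tmp/monmap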

 
Regards,

-don-
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] Different flavors of storage?

2015-01-23 Thread Don Doerner
These were exactly the pointers I needed, thank you both very much.

 
Regards,

-don-


On Friday, January 23, 2015 1:09 AM, Luis Periquito  wrote:
 


you have a nice howto here 
http://www.sebastien-han.fr/blog/2012/12/07/ceph-2-speed-storage-with-crush/ on 
how to do this with crush rules.
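
The gist of that approach, as a rough sketch with made-up bucket, host and rule
names (the linked post has the real details):

ceph osd crush add-bucket fast root
ceph osd crush add-bucket slow root
# place each OSD under the root that matches its drive type, e.g.:
ceph osd crush set osd.0 1.0 root=fast host=node1-fast
ceph osd crush set osd.12 1.0 root=slow host=node1-slow
# one rule per root, then point a pool at the rule
ceph osd crush rule create-simple fast-rule fast host
ceph osd pool create fastpool 256 256 replicated
ceph osd pool set fastpool crush_ruleset 1   # id from 'ceph osd crush rule dump'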


On Fri, Jan 23, 2015 at 6:06 AM, Jason King  wrote:

Hi Don,
>
>
>Take a look at CRUSH settings.
>http://ceph.com/docs/master/rados/operations/crush-map/
>
>
>
>Jason
>
>
>2015-01-22 2:41 GMT+08:00 Don Doerner :
>
>OK, I've set up 'giant' in a single-node cluster, played with a replicated 
>pool and an EC pool.  All goes well so far.  Question: I have two different 
>kinds of HDD in my server - some fast, 15K RPM SAS drives and some big, slow 
>(5400 RPM!) SATA drives.
>>
>>
>>Right now, I have OSDs on all, and when I created my pool, it got spread over 
>>all of these drives like peanut butter.
>>
>>
>>The documentation (e.g., the documentation on cache tiering) hints that it's 
>>possible to differentiate fast from slow devices, but for the life of me, I 
>>can't see how to create a pool on specific OSDs.  So it must be done some 
>>different way...
>>
>>
>>Can someone please provide a pointer?
>>
>> 
>>Regards,
>>
>>
>>-don-
>>___
>>ceph-users mailing list
>>ceph-users@lists.ceph.com
>>http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
>>
>>
>>
>
>
>
>___
>ceph-users mailing list
>ceph-users@lists.ceph.com
>http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
>
>___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] erasure coded pool why ever k>1?

2015-01-21 Thread Don Doerner
Well, look at it this way: with 3X replication, for each TB of data you need 3 
TB disk.  With (for example) 10+3 EC, you get better protection, and for each 
TB of data you need 1.3 TB disk.
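
Put as arithmetic, the raw-capacity overhead is (k+m)/k, for example:

echo 'scale=2; (10+3)/10' | bc   # 1.30 -> 10+3 erasure coding
echo 'scale=2; (8+4)/8'   | bc   # 1.50 -> 8+4 erasure coding
echo 'scale=2; (1+2)/1'   | bc   # 3.00 -> k=1, m=2 costs the same as 3X replication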

-don-


-Original Message-
From: ceph-users [mailto:ceph-users-boun...@lists.ceph.com] On Behalf Of Loic 
Dachary
Sent: 21 January, 2015 15:18
To: Chad William Seys; ceph-users@lists.ceph.com
Subject: Re: [ceph-users] erasure coded pool why ever k>1?



On 21/01/2015 22:42, Chad William Seys wrote:
> Hello all,
>   What reasons would one want k>1?
>   I read that m determines the number of OSDs which can fail before 
> loss.  But I don't see it explained how to choose k.  Any benefits for choosing 
> k>1?

The size of each chunk is object size / K. If you have K=1 and M=2 it will be 
the same as 3 replicas with none of the advantages ;-)

Cheers

--
Loïc Dachary, Artisan Logiciel Libre

___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


[ceph-users] Different flavors of storage?

2015-01-21 Thread Don Doerner
OK, I've set up 'giant' in a single-node cluster, played with a replicated pool 
and an EC pool.  All goes well so far.  Question: I have two different kinds of 
HDD in my server - some fast, 15K RPM SAS drives and some big, slow (5400 RPM!) 
SATA drives.

Right now, I have OSDs on all, and when I created my pool, it got spread over 
all of these drives like peanut butter.

The documentation (e.g., the documentation on cache tiering) hints that it's 
possible to differentiate fast from slow devices, but for the life of me, I 
can't see how to create a pool on specific OSDs.  So it must be done some 
different way...

Can someone please provide a pointer?

 
Regards,

-don-
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


[ceph-users] unsubscribe

2015-01-12 Thread Don Doerner
unsubscribe

Regards,

-don-

___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


[ceph-users] Ceph erasure-coded pool

2015-01-12 Thread Don Doerner
All,
I wish to experiment with erasure-coded pools in Ceph.  I've got some questions:

1.  Is FIREFLY a reasonable release to be using to try EC pools?  When I 
look at various bits of development info, it appears that the work is complete 
in FIREFLY, but I thought I'd ask :-)

2.  It looks, in FIREFLY, as if not all I/O operations can be performed on 
EC pools.  I am trying to work with RBD clients, and I've run into some 
conflicting information... can RBDs run on EC pools directly, or is a caching 
tier required?

a.  Assuming a cache tier is required, where might I read information on 
sizing the cache tier?

b.  Looking through the issues, it appears there are some race conditions 
(e.g., #9285<http://tracker.ceph.com/issues/9285>) for cache tiers in FIREFLY.  
Should I avoid cache tiers at this level?  At what level, if any, are these 
addressed (I don't see commits in #9285<http://tracker.ceph.com/issues/9285>, 
for example)?

3.  When configuring EC pools, to specify the number of PGs, can I 
reasonably assume that I should use (K+M) instead of a replica count?  So, for 
example, if I have 24 OSDs, and my EC profile has K=8 and M=4, then I should 
specify 200 (i.e., (24*100)/12) placement groups?

4.  As I add OSDs, can I adjust the number of PGs?
Thanks in advance...
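
On question 4, a hedged note and sketch: pg_num can be raised but never lowered,
and pgp_num has to follow it before data actually rebalances; whether splitting
is permitted on an erasure-coded pool depends on the release, so treat this as
illustrative (pool name from question 3's example):

ceph osd pool set ecpool pg_num 512
ceph osd pool set ecpool pgp_num 512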
___________

Don Doerner
Quantum Corporation

___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com