Re: [ceph-users] POOL_TARGET_SIZE_BYTES_OVERCOMMITTED

2019-05-01 Thread Joe Ryner
I think I have figured out the issue.

 POOL    SIZE    TARGET SIZE  RATE  RAW CAPACITY  RATIO   TARGET RATIO  PG_NUM  NEW PG_NUM  AUTOSCALE
 images  28523G               3.0   68779G        1.2441                1000                warn

My images pool is 28523G with a replication level of 3, and I have a total of
68779G of raw capacity.

 According to the documentation
http://docs.ceph.com/docs/master/rados/operations/placement-groups/

"*SIZE* is the amount of data stored in the pool. *TARGET SIZE*, if
present, is the amount of data the administrator has specified that
they expect to eventually be stored in this pool. The system uses the
larger of the two values for its calculation.

*RATE* is the multiplier for the pool that determines how much raw storage
capacity is consumed. For example, a 3 replica pool will have a ratio of
3.0, while a k=4,m=2 erasure coded pool will have a ratio of 1.5.

*RAW CAPACITY* is the total amount of raw storage capacity on the OSDs that
are responsible for storing this pool’s (and perhaps other pools’) data.
*RATIO* is the ratio of that total capacity that this pool is consuming
(i.e., ratio = size * rate / raw capacity)."

So ratio = 28523G * 3.0 / 68779G = 1.2441x


So I'm oversubscribing by 1.2441x, thus the warning.


But ... looking at #ceph df

POOL    ID  STORED   OBJECTS  USED    %USED  MAX AVAIL
images  3   9.3 TiB  2.82M    28 TiB  57.94  6.7 TiB


I believe the 9.3 TiB is the amount I actually have stored (thinly
provisioned), versus a fully provisioned 28 TiB?

The raw capacity of the cluster is sitting at about 50% used.


Shouldn't the ratio be the amount STORED (from ceph df) * RATE (from
ceph osd pool autoscale-status) / RAW CAPACITY, since ceph uses thin
provisioning in rbd?

Otherwise, this ratio will only work for people who don't thin
provision, which goes against what ceph is doing with rbd:

http://docs.ceph.com/docs/master/rbd/
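
Working the numbers both ways with the figures above (just my
back-of-the-envelope arithmetic, so treat the rounding loosely):

  ratio as computed today:  28523G  * 3.0 / 68779G ~= 1.244  (overcommitted)
  ratio based on STORED:    9.3 TiB * 3.0 / 67 TiB  ~= 0.42   (much closer to the ~48% raw used)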





On Wed, May 1, 2019 at 11:44 AM Joe Ryner  wrote:

> I have found a little more information.
> When I turn off pg_autoscaler the warning goes away; turn it back on and
> the warning comes back.
>
> I have run the following:
> # ceph osd pool autoscale-status
>  POOL      SIZE    TARGET SIZE  RATE  RAW CAPACITY  RATIO   TARGET RATIO  PG_NUM  NEW PG_NUM  AUTOSCALE
>  images    28523G               3.0   68779G        1.2441                1000                warn
>  locks     676.5M               3.0   68779G        0.0000                8                   warn
>  rbd       0                    3.0   68779G        0.0000                8                   warn
>  data      0                    3.0   68779G        0.0000                8                   warn
>  metadata  3024k                3.0   68779G        0.0000                8                   warn
>
> # ceph df
> RAW STORAGE:
> CLASS SIZE   AVAIL   USEDRAW USED %RAW USED
> hdd   51 TiB  26 TiB  24 TiB   24 TiB 48.15
> ssd   17 TiB 8.5 TiB 8.1 TiB  8.1 TiB 48.69
> TOTAL 67 TiB  35 TiB  32 TiB   32 TiB 48.28
>
> POOLS:
> POOL      ID  STORED   OBJECTS  USED     %USED  MAX AVAIL
> data      0   0 B      0        0 B      0      6.7 TiB
> metadata  1   6.3 KiB  21       3.0 MiB  0      6.7 TiB
> rbd       2   0 B      2        0 B      0      6.7 TiB
> images    3   9.3 TiB  2.82M    28 TiB   57.94  6.7 TiB
> locks     4   215 MiB  517      677 MiB  0      6.7 TiB
>
>
> It looks to me like the numbers for the images pool are not right in the
> autoscale-status.
>
> Below is an osd crush tree:
> # ceph osd crush tree
> ID  CLASS WEIGHT   (compat) TYPE NAME
>  -1   66.73337  root default
>  -3   22.28214 22.28214 rack marack
>  -87.27475  7.27475 host abacus
>  19   hdd  1.81879  1.81879 osd.19
>  20   hdd  1.81879  1.42563 osd.20
>  21   hdd  1.81879  1.81879 osd.21
>  50   hdd  1.81839  1.81839 osd.50
> -107.76500  6.67049 host gold
>   7   hdd  0.86299  0.83659 osd.7
>   9   hdd  0.86299  0.78972 osd.9
>  10   hdd  0.86299  0.72031 osd.10
>  14   hdd  0.86299  0.65315 osd.14
>  15   hdd  0.86299  0.72586 osd.15
>  22   hdd  0.86299  0.80528 osd.22
>  23   hdd  0.86299  0.63741 osd.23
>  24   hdd  0.86299  0.77718 osd.24
>  25   hdd  0.86299  0.72499 osd.25
>  -57.24239  7.24239 host hassium
>   0   hdd  1.80800  1.52536 osd.0
>   1   hdd  1.80800  1.65421 osd.1
>  26   hdd  1.80800  1.65140 osd.26
>  51   hdd  1.

Re: [ceph-users] POOL_TARGET_SIZE_BYTES_OVERCOMMITTED

2019-05-01 Thread Joe Ryner
I have found a little more information.
When I turn off pg_autoscaler the warning goes away; turn it back on and the
warning comes back.
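
For reference, this is roughly how I am toggling it -- via the mgr module;
I believe there is also a per-pool pg_autoscale_mode setting if you only want
to change a single pool:

# ceph mgr module disable pg_autoscaler
# ceph mgr module enable pg_autoscaler
# ceph osd pool set images pg_autoscale_mode warn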

I have run the following:
# ceph osd pool autoscale-status
 POOL      SIZE    TARGET SIZE  RATE  RAW CAPACITY  RATIO   TARGET RATIO  PG_NUM  NEW PG_NUM  AUTOSCALE
 images    28523G               3.0   68779G        1.2441                1000                warn
 locks     676.5M               3.0   68779G        0.0000                8                   warn
 rbd       0                    3.0   68779G        0.0000                8                   warn
 data      0                    3.0   68779G        0.0000                8                   warn
 metadata  3024k                3.0   68779G        0.0000                8                   warn

# ceph df
RAW STORAGE:
CLASS SIZE   AVAIL   USEDRAW USED %RAW USED
hdd   51 TiB  26 TiB  24 TiB   24 TiB 48.15
ssd   17 TiB 8.5 TiB 8.1 TiB  8.1 TiB 48.69
TOTAL 67 TiB  35 TiB  32 TiB   32 TiB 48.28

POOLS:
POOL      ID  STORED   OBJECTS  USED     %USED  MAX AVAIL
data      0   0 B      0        0 B      0      6.7 TiB
metadata  1   6.3 KiB  21       3.0 MiB  0      6.7 TiB
rbd       2   0 B      2        0 B      0      6.7 TiB
images    3   9.3 TiB  2.82M    28 TiB   57.94  6.7 TiB
locks     4   215 MiB  517      677 MiB  0      6.7 TiB


It looks to me like the numbers for the images pool are not right in the
autoscale-status.

Below is an osd crush tree:
# ceph osd crush tree
ID  CLASS WEIGHT   (compat) TYPE NAME
 -1   66.73337  root default
 -3   22.28214 22.28214 rack marack
 -87.27475  7.27475 host abacus
 19   hdd  1.81879  1.81879 osd.19
 20   hdd  1.81879  1.42563 osd.20
 21   hdd  1.81879  1.81879 osd.21
 50   hdd  1.81839  1.81839 osd.50
-107.76500  6.67049 host gold
  7   hdd  0.86299  0.83659 osd.7
  9   hdd  0.86299  0.78972 osd.9
 10   hdd  0.86299  0.72031 osd.10
 14   hdd  0.86299  0.65315 osd.14
 15   hdd  0.86299  0.72586 osd.15
 22   hdd  0.86299  0.80528 osd.22
 23   hdd  0.86299  0.63741 osd.23
 24   hdd  0.86299  0.77718 osd.24
 25   hdd  0.86299  0.72499 osd.25
 -57.24239  7.24239 host hassium
  0   hdd  1.80800  1.52536 osd.0
  1   hdd  1.80800  1.65421 osd.1
 26   hdd  1.80800  1.65140 osd.26
 51   hdd  1.81839  1.81839 osd.51
 -2   21.30070 21.30070 rack marack2
-127.76999  8.14474 host hamms
 27   ssd  0.86299  0.99367 osd.27
 28   ssd  0.86299  0.95961 osd.28
 29   ssd  0.86299  0.80768 osd.29
 30   ssd  0.86299  0.86893 osd.30
 31   ssd  0.86299  0.92583 osd.31
 32   ssd  0.86299  1.00227 osd.32
 33   ssd  0.86299  0.73099 osd.33
 34   ssd  0.86299  0.80766 osd.34
 35   ssd  0.86299  1.04811 osd.35
 -75.45636  5.45636 host parabola
  5   hdd  1.81879  1.81879 osd.5
 12   hdd  1.81879  1.81879 osd.12
 13   hdd  1.81879  1.81879 osd.13
 -62.63997  3.08183 host radium
  2   hdd  0.87999  1.05594 osd.2
  6   hdd  0.87999  1.10501 osd.6
 11   hdd  0.87999  0.92088 osd.11
 -95.43439  5.43439 host splinter
 16   hdd  1.80800  1.80800 osd.16
 17   hdd  1.81839  1.81839 osd.17
 18   hdd  1.80800  1.80800 osd.18
-11   23.15053 23.15053 rack marack3
-138.63300  8.98921 host helm
 36   ssd  0.86299  0.71931 osd.36
 37   ssd  0.86299  0.92601 osd.37
 38   ssd  0.86299  0.79585 osd.38
 39   ssd  0.86299  1.08521 osd.39
 40   ssd  0.86299  0.89500 osd.40
 41   ssd  0.86299  0.92351 osd.41
 42   ssd  0.86299  0.89690 osd.42
 43   ssd  0.86299  0.92480 osd.43
 44   ssd  0.86299  0.84467 osd.44
 45   ssd  0.86299  0.97795 osd.45
-407.27515  7.89609 host samarium
 46   hdd  1.81879  1.90242 osd.46
 47   hdd  1.81879  1.86723 osd.47
 48   hdd  1.81879  1.93404 osd.48
 49   hdd  1.81879  2.19240 osd.49
 -47.24239  7.24239 host scandium
  3   hdd  1.80800  1.76680 osd.3
  4   hdd  1.80800  1.80800 osd.4
  8   hdd  1.80800  1.80800 osd.8
 52   hdd  1.81839  1.81839 osd.52


Any ideas?





On Wed, May 1, 2019 at 9:32 AM Joe Ryner  wrote:

> Hi,
>
> I have an old ceph cluster and have upgraded recently from Luminous to
> Nautilus.  Afte

[ceph-users] POOL_TARGET_SIZE_BYTES_OVERCOMMITTED

2019-05-01 Thread Joe Ryner
Hi,

I have an old ceph cluster and have upgraded recently from Luminous to
Nautilus.  After converting to Nautilus I decided it was time to convert to
bluestore.

Before I converted the cluster was healthy but after I have a HEALTH_WARN

#ceph health detail
HEALTH_WARN 1 subtrees have overcommitted pool target_size_bytes; 1
subtrees have overcommitted pool target_size_ratio
POOL_TARGET_SIZE_BYTES_OVERCOMMITTED 1 subtrees have overcommitted pool
target_size_bytes
Pools ['data', 'metadata', 'rbd', 'images', 'locks'] overcommit
available storage by 1.244x due to target_size_bytes 0 on pools []
POOL_TARGET_SIZE_RATIO_OVERCOMMITTED 1 subtrees have overcommitted pool
target_size_ratio
Pools ['data', 'metadata', 'rbd', 'images', 'locks'] overcommit
available storage by 1.244x due to target_size_ratio 0.000 on pools []

I started with a target_size_ratio of 0.85 on the images pool and reduced it
to 0 (roughly the commands shown below) to hopefully get the warning to go
away.  The cluster seems to be running fine; I just can't figure out what the
problem is and how to make the message go away.  I restarted the monitors this
morning in hopes of fixing it.  Does anyone have any ideas?
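
For reference, the commands I used were roughly the following (pool name and
values are from my cluster, so adjust to taste):

# ceph osd pool set images target_size_ratio 0.85
# ceph osd pool set images target_size_ratio 0
# ceph health detail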

Thanks in advance


-- 
Joe Ryner
Associate Director
Center for the Application of Information Technologies (CAIT) -
http://www.cait.org
Western Illinois University - http://www.wiu.edu


P: (309) 298-1804
F: (309) 298-2806
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


[ceph-users] Upgrade to hammer, crush tuneables issue

2015-11-24 Thread Joe Ryner
quot;: "500",
"filestore_queue_committing_max_bytes": "104857600",
"filestore_op_threads": "2",
"filestore_op_thread_timeout": "60",
"filestore_op_thread_suicide_timeout": "180",
"filestore_commit_timeout": "600",
"filestore_fiemap_threshold": "4096",
"filestore_merge_threshold": "10",
"filestore_split_multiple": "2",
"filestore_update_to": "1000",
"filestore_blackhole": "false",
"filestore_fd_cache_size": "128",
"filestore_fd_cache_shards": "16",
"filestore_dump_file": "",
"filestore_kill_at": "0",
"filestore_inject_stall": "0",
"filestore_fail_eio": "true",
"filestore_debug_verify_split": "false",
"journal_dio": "true",
"journal_aio": "true",
"journal_force_aio": "false",
"keyvaluestore_queue_max_ops": "50",
"keyvaluestore_queue_max_bytes": "104857600",
"keyvaluestore_debug_check_backend": "false",
"keyvaluestore_op_threads": "2",
"keyvaluestore_op_thread_timeout": "60",
"keyvaluestore_op_thread_suicide_timeout": "180",
"keyvaluestore_default_strip_size": "4096",
"keyvaluestore_max_expected_write_size": "16777216",
"keyvaluestore_header_cache_size": "4096",
"keyvaluestore_backend": "leveldb",
"journal_max_corrupt_search": "10485760",
"journal_block_align": "true",
"journal_write_header_frequency": "0",
"journal_ma

[ceph-users] Operating System Upgrade

2015-11-11 Thread Joe Ryner
Hi Everyone,

I am upgrading the nodes of my ceph cluster from centos 6.6 to centos 7.1.  My 
cluster is currently running Firefly 0.80.10.

I would like to not reformat the OSD Drives and just remount them under Centos 
7.1.

Is this supported?  Should it work?

When we upgrade to Centos 7.1 we will reformat the root volume and the /var 
volume and then reinstall ceph.

Thanks in advance.
Joe

-- 
Joe Ryner
Center for the Application of Information Technologies (CAIT)
Production Coordinator
P: (309) 298-1804
F: (309) 298-2806
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] rbd hang

2015-11-05 Thread Joe Ryner
It's weird that it has even been working.

Thanks again for your help!

- Original Message -
From: "Jason Dillaman" <dilla...@redhat.com>
To: "Joe Ryner" <jry...@cait.org>
Cc: ceph-us...@ceph.com
Sent: Thursday, November 5, 2015 4:29:49 PM
Subject: Re: [ceph-users] rbd hang

On the bright side, at least your week of export-related pain should result in 
a decent speed boost when your clients get 64MB of cache instead of 64B.

-- 

Jason Dillaman 


- Original Message -
> From: "Joe Ryner" <jry...@cait.org>
> To: "Jason Dillaman" <dilla...@redhat.com>
> Cc: ceph-us...@ceph.com
> Sent: Thursday, November 5, 2015 5:20:03 PM
> Subject: Re: [ceph-users] rbd hang
> 
> Thanks for the heads up.  I have had this set this way for a long time in all
> of my deployments.  I assumed that the units where in MB.
> 
> Arg..
> 
> I will test new settings.
> 
> Joe
> 
> - Original Message -
> From: "Jason Dillaman" <dilla...@redhat.com>
> To: "Joe Ryner" <jry...@cait.org>
> Cc: ceph-us...@ceph.com
> Sent: Thursday, November 5, 2015 3:52:20 PM
> Subject: Re: [ceph-users] rbd hang
> 
> It appears you have set your cache size to 64 bytes(!):
> 
> 2015-11-05 15:07:49.927510 7f0d9af5a760 20 librbd::ImageCtx: Initial cache
> settings: size=64 num_objects=10 max_dirty=32 target_dirty=16
> max_dirty_age=5
> 
> This exposed a known issue [1] when you attempt to read more data in a single
> read request than your cache can allocate.
> 
> [1] http://tracker.ceph.com/issues/13388
> 
> --
> 
> Jason Dillaman
> 
> 
> - Original Message -
> > From: "Joe Ryner" <jry...@cait.org>
> > To: "Jason Dillaman" <dilla...@redhat.com>
> > Cc: ceph-us...@ceph.com
> > Sent: Thursday, November 5, 2015 4:24:29 PM
> > Subject: Re: [ceph-users] rbd hang
> > 
> > It worked.
> > 
> > So what's broken with caching?
> > 
> > - Original Message -
> > From: "Jason Dillaman" <dilla...@redhat.com>
> > To: "Joe Ryner" <jry...@cait.org>
> > Cc: ceph-us...@ceph.com
> > Sent: Thursday, November 5, 2015 3:18:39 PM
> > Subject: Re: [ceph-users] rbd hang
> > 
> > Can you retry with 'rbd --rbd-cache=false -p images export joe
> > /root/joe.raw'?
> > 
> > --
> > 
> > Jason Dillaman
> > 
> > - Original Message -
> > > From: "Joe Ryner" <jry...@cait.org>
> > > To: "Jason Dillaman" <dilla...@redhat.com>
> > > Cc: ceph-us...@ceph.com
> > > Sent: Thursday, November 5, 2015 4:14:28 PM
> > > Subject: Re: [ceph-users] rbd hang
> > > 
> > > Hi,
> > > 
> > > Do you have any ideas as to what might be wrong?  Since my last email I
> > > decided to recreate the cluster.  I am currently testing upgrading from
> > > 0.72
> > > to 0.80.10 with hopes to end up on hammer.
> > > 
> > > So I completely erased the cluster and reloaded the machines with centos
> > > 6.5(to match my production servers)
> > > I loaded 0.72 and got it working perfectly.
> > > Next I upgraded directly to 0.80.10 and now I'm stuck again.  I thought I
> > > had
> > > tested this procedure with 0.80.6 and it worked fine.  Not sure what
> > > changed
> > > or what starting on 0.72 might have to do with it.
> > > 
> > > I would greatly appreciate any help.
> > > 
> > > Joe
> > > 
> > > rbd -p images export joe /root/joe.raw
> > > 2015-11-05 15:07:49.920941 7f0d9af5a760  1 -- :/0 messenger.start
> > > 2015-11-05 15:07:49.922701 7f0d9af5a760  1 -- :/1007378 -->
> > > 10.134.128.41:6789/0 -- auth(proto 0 30 bytes epoch 0) v1 -- ?+0
> > > 0x1558800
> > > con 0x15583f0
> > > 2015-11-05 15:07:49.923178 7f0d9af52700  1 -- 10.134.128.41:0/1007378
> > > learned
> > > my addr 10.134.128.41:0/1007378
> > > 2015-11-05 15:07:49.923533 7f0d95549700 10 client.?.objecter
> > > ms_handle_connect 0x15583f0
> > > 2015-11-05 15:07:49.923617 7f0d95549700 10 client.?.objecter
> > > resend_mon_ops
> > > 2015-11-05 15:07:49.924049 7f0d94547700 10
> > > throttle(msgr_dispatch_throttler-radosclient 0x1550df8) get 491 (0 ->
> > > 491)
> > > 2015-11-05 15:07:49.924242 7f0d94547700 10
> > > throttle(msgr_dispatch_throttler-radosclient 0x1550df8) get 33 (491 ->
> > > 524)
> > > 2015-11-05 15:07:49.924244 7f0d95549700  1 -- 10

Re: [ceph-users] rbd hang

2015-11-05 Thread Joe Ryner
 7f0d8d7fa700 11 objectcacher flusher 0 / 64:  0 tx, 
0 rx, 0 clean, 0 dirty (16 target, 32 max)
2015-11-05 15:07:54.926869 7f0d8effd700 10 client.277182.objecter tick
2015-11-05 15:07:54.928357 7f0d8d7fa700 11 objectcacher flusher 0 / 64:  0 tx, 
0 rx, 0 clean, 0 dirty (16 target, 32 max)
2015-11-05 15:07:55.928511 7f0d8d7fa700 11 objectcacher flusher 0 / 64:  0 tx, 
0 rx, 0 clean, 0 dirty (16 target, 32 max)
2015-11-05 15:07:56.928605 7f0d8d7fa700 11 objectcacher flusher 0 / 64:  0 tx, 
0 rx, 0 clean, 0 dirty (16 target, 32 max)
2015-11-05 15:07:57.928707 7f0d8d7fa700 11 objectcacher flusher 0 / 64:  0 tx, 
0 rx, 0 clean, 0 dirty (16 target, 32 max)


- Original Message -
From: "Jason Dillaman" <dilla...@redhat.com>
To: "Joe Ryner" <jry...@cait.org>
Cc: ceph-us...@ceph.com
Sent: Thursday, October 29, 2015 12:05:38 PM
Subject: Re: [ceph-users] rbd hang

I don't see the read request hitting the wire, so I am thinking your client 
cannot talk to the primary PG for the 'rb.0.16cf.238e1f29.' object. 
 Try adding "debug objecter = 20" to your configuration to get more details.
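
Something along these lines in the ceph.conf used by the rbd client should do
it -- the [client] section is just a suggestion, [global] works as well:

[client]
    debug objecter = 20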

-- 

Jason Dillaman 

- Original Message -
> From: "Joe Ryner" <jry...@cait.org>
> To: ceph-us...@ceph.com
> Sent: Thursday, October 29, 2015 12:22:01 PM
> Subject: [ceph-users] rbd hang
> 
> Hi,
> 
> I am having a strange problem with our development cluster.  When I run rbd
> export it just hangs.  I have been running ceph for a long time and haven't
> encountered this kind of issue.  Any ideas as to what is going on?
> 
> rbd -p locks export seco101ira -
> 
> 
> I am running
> 
> Centos 6.6 x86 64
> 
> ceph version 0.80.10 (ea6c958c38df1216bf95c927f143d8b13c4a9e70)
> 
> I have enabled debugging and get the following when I run the command
> 
> [root@durbium ~]# rbd -p locks export seco101ira -
> 2015-10-29 11:17:08.183597 7fc3334fa7c0  1 librados: starting msgr at :/0
> 2015-10-29 11:17:08.183613 7fc3334fa7c0  1 librados: starting objecter
> 2015-10-29 11:17:08.183739 7fc3334fa7c0  1 -- :/0 messenger.start
> 2015-10-29 11:17:08.183779 7fc3334fa7c0  1 librados: setting wanted keys
> 2015-10-29 11:17:08.183782 7fc3334fa7c0  1 librados: calling monclient init
> 2015-10-29 11:17:08.184365 7fc3334fa7c0  1 -- :/1024687 -->
> 10.134.128.42:6789/0 -- auth(proto 0 30 bytes epoch 0) v1 -- ?+0 0x15ba900
> con 0x15ba540
> 2015-10-29 11:17:08.185006 7fc3334f2700  1 -- 10.134.128.41:0/1024687 learned
> my addr 10.134.128.41:0/1024687
> 2015-10-29 11:17:08.185995 7fc32da9a700  1 -- 10.134.128.41:0/1024687 <==
> mon.1 10.134.128.42:6789/0 1  mon_map v1  491+0+0 (318324477 0 0)
> 0x7fc318000be0 con 0x15ba540
> 2015-10-29 11:17:08.186213 7fc32da9a700  1 -- 10.134.128.41:0/1024687 <==
> mon.1 10.134.128.42:6789/0 2  auth_reply(proto 2 0 (0) Success) v1 
> 33+0+0 (4093383511 0 0) 0x7fc318001090 con 0x15ba540
> 2015-10-29 11:17:08.186544 7fc32da9a700  1 -- 10.134.128.41:0/1024687 -->
> 10.134.128.42:6789/0 -- auth(proto 2 32 bytes epoch 0) v1 -- ?+0
> 0x7fc31c001700 con 0x15ba540
> 2015-10-29 11:17:08.187160 7fc32da9a700  1 -- 10.134.128.41:0/1024687 <==
> mon.1 10.134.128.42:6789/0 3  auth_reply(proto 2 0 (0) Success) v1 
> 206+0+0 (2382192463 0 0) 0x7fc318001090 con 0x15ba540
> 2015-10-29 11:17:08.187354 7fc32da9a700  1 -- 10.134.128.41:0/1024687 -->
> 10.134.128.42:6789/0 -- auth(proto 2 165 bytes epoch 0) v1 -- ?+0
> 0x7fc31c002220 con 0x15ba540
> 2015-10-29 11:17:08.188001 7fc32da9a700  1 -- 10.134.128.41:0/1024687 <==
> mon.1 10.134.128.42:6789/0 4  auth_reply(proto 2 0 (0) Success) v1 
> 393+0+0 (34117402 0 0) 0x7fc3180008c0 con 0x15ba540
> 2015-10-29 11:17:08.188148 7fc32da9a700  1 -- 10.134.128.41:0/1024687 -->
> 10.134.128.42:6789/0 -- mon_subscribe({monmap=0+}) v2 -- ?+0 0x15b6b80 con
> 0x15ba540
> 2015-10-29 11:17:08.188334 7fc3334fa7c0  1 -- 10.134.128.41:0/1024687 -->
> 10.134.128.42:6789/0 -- mon_subscribe({monmap=6+,osdmap=0}) v2 -- ?+0
> 0x15b7700 con 0x15ba540
> 2015-10-29 11:17:08.188355 7fc3334fa7c0  1 -- 10.134.128.41:0/1024687 -->
> 10.134.128.42:6789/0 -- mon_subscribe({monmap=6+,osdmap=0}) v2 -- ?+0
> 0x15b7ca0 con 0x15ba540
> 2015-10-29 11:17:08.188445 7fc3334fa7c0  1 librados: init done
> 2015-10-29 11:17:08.188463 7fc3334fa7c0 10 librados: wait_for_osdmap waiting
> 2015-10-29 11:17:08.188625 7fc32da9a700  1 -- 10.134.128.41:0/1024687 <==
> mon.1 10.134.128.42:6789/0 5  mon_map v1  491+0+0 (318324477 0 0)
> 0x7fc318001300 con 0x15ba540
> 2015-10-29 11:17:08.188795 7fc32da9a700  1 -- 10.134.128.41:0/1024687 <==
> mon.1 10.134.128.42:6789/0 6  mon_subscribe_ack(300s) v1  20+0+0
> (646930372 0 0) 0x7fc3180015a0 con 0x15ba540
> 2015-10-29 11:17:08.189129 7fc32da9a7

Re: [ceph-users] rbd hang

2015-11-05 Thread Joe Ryner
It worked.

So what's broken with caching?

- Original Message -
From: "Jason Dillaman" <dilla...@redhat.com>
To: "Joe Ryner" <jry...@cait.org>
Cc: ceph-us...@ceph.com
Sent: Thursday, November 5, 2015 3:18:39 PM
Subject: Re: [ceph-users] rbd hang

Can you retry with 'rbd --rbd-cache=false -p images export joe /root/joe.raw'?

-- 

Jason Dillaman 

- Original Message -
> From: "Joe Ryner" <jry...@cait.org>
> To: "Jason Dillaman" <dilla...@redhat.com>
> Cc: ceph-us...@ceph.com
> Sent: Thursday, November 5, 2015 4:14:28 PM
> Subject: Re: [ceph-users] rbd hang
> 
> Hi,
> 
> Do you have any ideas as to what might be wrong?  Since my last email I
> decided to recreate the cluster.  I am currently testing upgrading from 0.72
> to 0.80.10 with hopes to end up on hammer.
> 
> So I completely erased the cluster and reloaded the machines with centos
> 6.5(to match my production servers)
> I loaded 0.72 and got it working perfectly.
> Next I upgraded directly to 0.80.10 and now I'm stuck again.  I thought I had
> tested this procedure with 0.80.6 and it worked fine.  Not sure what changed
> or what starting on 0.72 might have to do with it.
> 
> I would greatly appreciate any help.
> 
> Joe
> 
> rbd -p images export joe /root/joe.raw
> 2015-11-05 15:07:49.920941 7f0d9af5a760  1 -- :/0 messenger.start
> 2015-11-05 15:07:49.922701 7f0d9af5a760  1 -- :/1007378 -->
> 10.134.128.41:6789/0 -- auth(proto 0 30 bytes epoch 0) v1 -- ?+0 0x1558800
> con 0x15583f0
> 2015-11-05 15:07:49.923178 7f0d9af52700  1 -- 10.134.128.41:0/1007378 learned
> my addr 10.134.128.41:0/1007378
> 2015-11-05 15:07:49.923533 7f0d95549700 10 client.?.objecter
> ms_handle_connect 0x15583f0
> 2015-11-05 15:07:49.923617 7f0d95549700 10 client.?.objecter resend_mon_ops
> 2015-11-05 15:07:49.924049 7f0d94547700 10
> throttle(msgr_dispatch_throttler-radosclient 0x1550df8) get 491 (0 -> 491)
> 2015-11-05 15:07:49.924242 7f0d94547700 10
> throttle(msgr_dispatch_throttler-radosclient 0x1550df8) get 33 (491 -> 524)
> 2015-11-05 15:07:49.924244 7f0d95549700  1 -- 10.134.128.41:0/1007378 <==
> mon.0 10.134.128.41:6789/0 1  mon_map v1  491+0+0 (785919251 0 0)
> 0x7f0d84000cb0 con 0x15583f0
> 2015-11-05 15:07:49.924344 7f0d95549700 10
> throttle(msgr_dispatch_throttler-radosclient 0x1550df8) put 491 (524 -> 33)
> 2015-11-05 15:07:49.924357 7f0d95549700  1 -- 10.134.128.41:0/1007378 <==
> mon.0 10.134.128.41:6789/0 2  auth_reply(proto 2 0 (0) Success) v1 
> 33+0+0 (1606025069 0 0) 0x7f0d84001230 con 0x15583f0
> 2015-11-05 15:07:49.924649 7f0d95549700  1 -- 10.134.128.41:0/1007378 -->
> 10.134.128.41:6789/0 -- auth(proto 2 32 bytes epoch 0) v1 -- ?+0
> 0x7f0d80001970 con 0x15583f0
> 2015-11-05 15:07:49.924682 7f0d95549700 10
> throttle(msgr_dispatch_throttler-radosclient 0x1550df8) put 33 (33 -> 0)
> 2015-11-05 15:07:49.925084 7f0d94547700 10
> throttle(msgr_dispatch_throttler-radosclient 0x1550df8) get 206 (0 -> 206)
> 2015-11-05 15:07:49.925133 7f0d95549700  1 -- 10.134.128.41:0/1007378 <==
> mon.0 10.134.128.41:6789/0 3  auth_reply(proto 2 0 (0) Success) v1 
> 206+0+0 (3696183790 0 0) 0x7f0d84001230 con 0x15583f0
> 2015-11-05 15:07:49.925467 7f0d95549700  1 -- 10.134.128.41:0/1007378 -->
> 10.134.128.41:6789/0 -- auth(proto 2 165 bytes epoch 0) v1 -- ?+0
> 0x7f0d80002250 con 0x15583f0
> 2015-11-05 15:07:49.925497 7f0d95549700 10
> throttle(msgr_dispatch_throttler-radosclient 0x1550df8) put 206 (206 -> 0)
> 2015-11-05 15:07:49.926070 7f0d94547700 10
> throttle(msgr_dispatch_throttler-radosclient 0x1550df8) get 393 (0 -> 393)
> 2015-11-05 15:07:49.926161 7f0d95549700  1 -- 10.134.128.41:0/1007378 <==
> mon.0 10.134.128.41:6789/0 4  auth_reply(proto 2 0 (0) Success) v1 
> 393+0+0 (3778161986 0 0) 0x7f0d840014a0 con 0x15583f0
> 2015-11-05 15:07:49.926282 7f0d95549700  1 -- 10.134.128.41:0/1007378 -->
> 10.134.128.41:6789/0 -- mon_subscribe({monmap=0+}) v2 -- ?+0 0x1558a10 con
> 0x15583f0
> 2015-11-05 15:07:49.926332 7f0d95549700 10
> throttle(msgr_dispatch_throttler-radosclient 0x1550df8) put 393 (393 -> 0)
> 2015-11-05 15:07:49.926458 7f0d9af5a760 10 client.277182.objecter
> maybe_request_map subscribing (onetime) to next osd map
> 2015-11-05 15:07:49.926484 7f0d9af5a760  1 -- 10.134.128.41:0/1007378 -->
> 10.134.128.41:6789/0 -- mon_subscribe({monmap=4+,osdmap=0}) v2 -- ?+0
> 0x1558800 con 0x15583f0
> 2015-11-05 15:07:49.926517 7f0d9af5a760  1 -- 10.134.128.41:0/1007378 -->
> 10.134.128.41:6789/0 -- mon_subscribe({monmap=4+,osdmap=0}) v2 -- ?+0
> 0x15568d0 con 0x15583f0
> 2015-11-05 15:07:49.926654 7f0d94547700 10
> throttle(msgr_dispatch_throttler-radosclient 0x1550

Re: [ceph-users] rbd hang

2015-11-05 Thread Joe Ryner
Thanks for the heads up.  I have had this set this way for a long time in all 
of my deployments.  I assumed that the units were in MB.

Arg..

I will test new settings.
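
The values I plan to try look something like this (sizes are in bytes; the
numbers are just what I intend to test, not a recommendation):

[client]
    rbd cache = true
    rbd cache size = 67108864          # 64 MiB
    rbd cache max dirty = 50331648     # 48 MiB
    rbd cache target dirty = 33554432  # 32 MiB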

Joe

- Original Message -
From: "Jason Dillaman" <dilla...@redhat.com>
To: "Joe Ryner" <jry...@cait.org>
Cc: ceph-us...@ceph.com
Sent: Thursday, November 5, 2015 3:52:20 PM
Subject: Re: [ceph-users] rbd hang

It appears you have set your cache size to 64 bytes(!):

2015-11-05 15:07:49.927510 7f0d9af5a760 20 librbd::ImageCtx: Initial cache 
settings: size=64 num_objects=10 max_dirty=32 target_dirty=16 max_dirty_age=5

This exposed a known issue [1] when you attempt to read more data in a single 
read request than your cache can allocate.

[1] http://tracker.ceph.com/issues/13388

-- 

Jason Dillaman 


- Original Message -
> From: "Joe Ryner" <jry...@cait.org>
> To: "Jason Dillaman" <dilla...@redhat.com>
> Cc: ceph-us...@ceph.com
> Sent: Thursday, November 5, 2015 4:24:29 PM
> Subject: Re: [ceph-users] rbd hang
> 
> It worked.
> 
> So what's broken with caching?
> 
> - Original Message -
> From: "Jason Dillaman" <dilla...@redhat.com>
> To: "Joe Ryner" <jry...@cait.org>
> Cc: ceph-us...@ceph.com
> Sent: Thursday, November 5, 2015 3:18:39 PM
> Subject: Re: [ceph-users] rbd hang
> 
> Can you retry with 'rbd --rbd-cache=false -p images export joe
> /root/joe.raw'?
> 
> --
> 
> Jason Dillaman
> 
> - Original Message -
> > From: "Joe Ryner" <jry...@cait.org>
> > To: "Jason Dillaman" <dilla...@redhat.com>
> > Cc: ceph-us...@ceph.com
> > Sent: Thursday, November 5, 2015 4:14:28 PM
> > Subject: Re: [ceph-users] rbd hang
> > 
> > Hi,
> > 
> > Do you have any ideas as to what might be wrong?  Since my last email I
> > decided to recreate the cluster.  I am currently testing upgrading from
> > 0.72
> > to 0.80.10 with hopes to end up on hammer.
> > 
> > So I completely erased the cluster and reloaded the machines with centos
> > 6.5(to match my production servers)
> > I loaded 0.72 and got it working perfectly.
> > Next I upgraded directly to 0.80.10 and now I'm stuck again.  I thought I
> > had
> > tested this procedure with 0.80.6 and it worked fine.  Not sure what
> > changed
> > or what starting on 0.72 might have to do with it.
> > 
> > I would greatly appreciate any help.
> > 
> > Joe
> > 
> > rbd -p images export joe /root/joe.raw
> > 2015-11-05 15:07:49.920941 7f0d9af5a760  1 -- :/0 messenger.start
> > 2015-11-05 15:07:49.922701 7f0d9af5a760  1 -- :/1007378 -->
> > 10.134.128.41:6789/0 -- auth(proto 0 30 bytes epoch 0) v1 -- ?+0 0x1558800
> > con 0x15583f0
> > 2015-11-05 15:07:49.923178 7f0d9af52700  1 -- 10.134.128.41:0/1007378
> > learned
> > my addr 10.134.128.41:0/1007378
> > 2015-11-05 15:07:49.923533 7f0d95549700 10 client.?.objecter
> > ms_handle_connect 0x15583f0
> > 2015-11-05 15:07:49.923617 7f0d95549700 10 client.?.objecter resend_mon_ops
> > 2015-11-05 15:07:49.924049 7f0d94547700 10
> > throttle(msgr_dispatch_throttler-radosclient 0x1550df8) get 491 (0 -> 491)
> > 2015-11-05 15:07:49.924242 7f0d94547700 10
> > throttle(msgr_dispatch_throttler-radosclient 0x1550df8) get 33 (491 -> 524)
> > 2015-11-05 15:07:49.924244 7f0d95549700  1 -- 10.134.128.41:0/1007378 <==
> > mon.0 10.134.128.41:6789/0 1  mon_map v1  491+0+0 (785919251 0 0)
> > 0x7f0d84000cb0 con 0x15583f0
> > 2015-11-05 15:07:49.924344 7f0d95549700 10
> > throttle(msgr_dispatch_throttler-radosclient 0x1550df8) put 491 (524 -> 33)
> > 2015-11-05 15:07:49.924357 7f0d95549700  1 -- 10.134.128.41:0/1007378 <==
> > mon.0 10.134.128.41:6789/0 2  auth_reply(proto 2 0 (0) Success) v1 
> > 33+0+0 (1606025069 0 0) 0x7f0d84001230 con 0x15583f0
> > 2015-11-05 15:07:49.924649 7f0d95549700  1 -- 10.134.128.41:0/1007378 -->
> > 10.134.128.41:6789/0 -- auth(proto 2 32 bytes epoch 0) v1 -- ?+0
> > 0x7f0d80001970 con 0x15583f0
> > 2015-11-05 15:07:49.924682 7f0d95549700 10
> > throttle(msgr_dispatch_throttler-radosclient 0x1550df8) put 33 (33 -> 0)
> > 2015-11-05 15:07:49.925084 7f0d94547700 10
> > throttle(msgr_dispatch_throttler-radosclient 0x1550df8) get 206 (0 -> 206)
> > 2015-11-05 15:07:49.925133 7f0d95549700  1 -- 10.134.128.41:0/1007378 <==
> > mon.0 10.134.128.41:6789/0 3  auth_reply(proto 2 0 (0) Success) v1 
> > 206+0+0 (3696183790 0 0) 0x7f0d84001230 con 0x15583f0
> > 2015-11-05 15:07:49.925467 7f0d95549700  1 -- 10.134.128.41:0/1007378 --

[ceph-users] rbd hang

2015-10-29 Thread Joe Ryner
mount options xfs = rw,noatime,nodiratime

 filestore xattr use omap = true
 # osd mkfs type = btrfs
 osd mkfs type = xfs
 osd mkfs options btrfs = -L $name


# CAIT -- Manual commands to make and mount file system

# -- Make xfs file system
# mkfs -t xfs -f -L ceph-X -d su=64k,sw=1 /dev/sdX1

# -- Rescan Parition Label
# partprobe /dev/sdX1

# -- Mount ceph file system
# mount -o rw,noatime,nodiratime /dev/disk/by-label/ceph-X 
/var/lib/ceph/osd/ceph-X

[mon.durbium]
host = durbium
mon addr = 10.134.128.41:6789

[mon.zirconium]
host = zirconium
mon addr = 10.134.128.43:6789

[mon.stone]
host = stone
mon addr = 10.134.128.42:6789

[osd.0]
 host = durbium
 devs = /dev/vg-osd-0/lv-osd
 osd mkfs type = xfs
 osd journal = /dev/vg-osd-0/lv-journal

[osd.1]
 host = zirconium
 devs = /dev/vg-osd-1/lv-osd
 osd mkfs type = xfs
 osd journal = /dev/vg-osd-1/lv-journal

[osd.2]
 host = zirconium
 devs = /dev/vg-osd-2/lv-osd
 osd mkfs type = xfs
 osd journal = /dev/vg-osd-2/lv-journal


[osd.3]
 host = zirconium
 devs = /dev/vg-osd-3/lv-osd
 osd mkfs type = xfs
 osd journal = /dev/vg-osd-3/lv-journal

[osd.4]
 host = zirconium
 devs = /dev/vg-osd-4/lv-osd
 osd mkfs type = xfs
 osd journal = /dev/vg-osd-4/lv-journal
[osd.5]
 host = stone
 devs = /dev/vg-osd-5/lv-osd
 osd mkfs type = xfs
 osd journal = /dev/vg-osd-5/lv-journal

[osd.6]
 host = stone
 devs = /dev/vg-osd-6/lv-osd
 osd mkfs type = xfs
 osd journal = /dev/vg-osd-6/lv-journal

[osd.7]
 host = stone
 devs = /dev/vg-osd-7/lv-osd
 osd mkfs type = xfs
 osd journal = /dev/vg-osd-7/lv-journal

[osd.8]
 host = stone
 devs = /dev/vg-osd-8/lv-osd
 osd mkfs type = xfs
 osd journal = /dev/vg-osd-8/lv-journal




-- 
Joe Ryner
Center for the Application of Information Technologies (CAIT)
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


[ceph-users] rbd export hangs

2015-10-29 Thread Joe Ryner
tions xfs = rw,noatime,nodiratime

 filestore xattr use omap = true
 # osd mkfs type = btrfs
 osd mkfs type = xfs
 osd mkfs options btrfs = -L $name


# CAIT -- Manual commands to make and mount file system

# -- Make xfs file system
# mkfs -t xfs -f -L ceph-X -d su=64k,sw=1 /dev/sdX1

# -- Rescan Parition Label
# partprobe /dev/sdX1

# -- Mount ceph file system
# mount -o rw,noatime,nodiratime /dev/disk/by-label/ceph-X 
/var/lib/ceph/osd/ceph-X

[mon.durbium]
host = durbium
mon addr = 10.134.128.41:6789

[mon.zirconium]
host = zirconium
mon addr = 10.134.128.43:6789

[mon.stone]
host = stone
mon addr = 10.134.128.42:6789

[osd.0]
 host = durbium
 devs = /dev/vg-osd-0/lv-osd
 osd mkfs type = xfs
 osd journal = /dev/vg-osd-0/lv-journal

[osd.1]
 host = zirconium
 devs = /dev/vg-osd-1/lv-osd
 osd mkfs type = xfs
 osd journal = /dev/vg-osd-1/lv-journal

[osd.2]
 host = zirconium
 devs = /dev/vg-osd-2/lv-osd
 osd mkfs type = xfs
 osd journal = /dev/vg-osd-2/lv-journal


[osd.3]
 host = zirconium
 devs = /dev/vg-osd-3/lv-osd
 osd mkfs type = xfs
 osd journal = /dev/vg-osd-3/lv-journal

[osd.4]
 host = zirconium
 devs = /dev/vg-osd-4/lv-osd
 osd mkfs type = xfs
 osd journal = /dev/vg-osd-4/lv-journal
[osd.5]
 host = stone
 devs = /dev/vg-osd-5/lv-osd
 osd mkfs type = xfs
 osd journal = /dev/vg-osd-5/lv-journal

[osd.6]
 host = stone
 devs = /dev/vg-osd-6/lv-osd
 osd mkfs type = xfs
 osd journal = /dev/vg-osd-6/lv-journal

[osd.7]
 host = stone
 devs = /dev/vg-osd-7/lv-osd
 osd mkfs type = xfs
 osd journal = /dev/vg-osd-7/lv-journal

[osd.8]
 host = stone
 devs = /dev/vg-osd-8/lv-osd
 osd mkfs type = xfs
 osd journal = /dev/vg-osd-8/lv-journal




-- 
Joe Ryner
Center for the Application of Information Technologies (CAIT)
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] rbd hang

2015-10-29 Thread Joe Ryner
x2307e40
2015-10-29 13:13:49.499110 7f5c270db700 10 client.7368.objecter in 
handle_osd_op_reply
2015-10-29 13:13:49.499112 7f5c270db700  7 client.7368.objecter 
handle_osd_op_reply 3 ondisk v 0'0 uv 1 in 4.a982c550 attempt 0
2015-10-29 13:13:49.499115 7f5c270db700 10 client.7368.objecter  op 0 rval 0 
len 12
2015-10-29 13:13:49.499117 7f5c270db700 15 client.7368.objecter 
handle_osd_op_reply ack
2015-10-29 13:13:49.499120 7f5c270db700 15 client.7368.objecter 
handle_osd_op_reply completed tid 3
2015-10-29 13:13:49.499131 7f5c270db700 15 client.7368.objecter finish_op 3
2015-10-29 13:13:49.499171 7f5c270db700  5 client.7368.objecter 0 unacked, 0 
uncommitted
2015-10-29 13:13:49.499186 7f5c2cb3b7c0 10 librados: Objecter returned from 
call r=0
2015-10-29 13:13:49.499257 7f5c2cb3b7c0 10 librados: call oid=seco101ira.rbd 
nspace=
2015-10-29 13:13:49.499271 7f5c2cb3b7c0 10 client.7368.objecter calc_target 
pgid 4.a982c550 acting [2,5]
2015-10-29 13:13:49.499274 7f5c2cb3b7c0 20 client.7368.objecter  note: not 
requesting commit
2015-10-29 13:13:49.499277 7f5c2cb3b7c0 10 client.7368.objecter op_submit oid 
seco101ira.rbd @4 @4 [call lock.get_info] tid 4 osd.2
2015-10-29 13:13:49.499281 7f5c2cb3b7c0 15 client.7368.objecter send_op 4 to 
osd.2
2015-10-29 13:13:49.499284 7f5c2cb3b7c0  1 -- 10.134.128.41:0/1025119 --> 
10.134.128.43:6803/2741 -- osd_op(client.7368.0:4 seco101ira.rbd [call 
lock.get_info] 4.a982c550 ack+read e4350) v4 -- ?+0 0x230b330 con 0x2307e40
2015-10-29 13:13:49.499311 7f5c2cb3b7c0  5 client.7368.objecter 1 unacked, 0 
uncommitted
2015-10-29 13:13:49.500174 7f5c270db700  1 -- 10.134.128.41:0/1025119 <== osd.2 
10.134.128.43:6803/2741 4  osd_op_reply(4 seco101ira.rbd [call] v0'0 uv1 
ondisk = 0) v6  181+0+15 (3120959177 0 2149983739) 0x7f5c04000c10 con 
0x2307e40
2015-10-29 13:13:49.500200 7f5c270db700 10 client.7368.objecter in 
handle_osd_op_reply
2015-10-29 13:13:49.500203 7f5c270db700  7 client.7368.objecter 
handle_osd_op_reply 4 ondisk v 0'0 uv 1 in 4.a982c550 attempt 0
2015-10-29 13:13:49.500206 7f5c270db700 10 client.7368.objecter  op 0 rval 0 
len 15
2015-10-29 13:13:49.500207 7f5c270db700 15 client.7368.objecter 
handle_osd_op_reply ack
2015-10-29 13:13:49.500210 7f5c270db700 15 client.7368.objecter 
handle_osd_op_reply completed tid 4
2015-10-29 13:13:49.500211 7f5c270db700 15 client.7368.objecter finish_op 4
2015-10-29 13:13:49.500221 7f5c270db700  5 client.7368.objecter 0 unacked, 0 
uncommitted
2015-10-29 13:13:49.500236 7f5c2cb3b7c0 10 librados: Objecter returned from 
call r=0
2015-10-29 13:13:49.500266 7f5c2cb3b7c0 10 librbd::ImageCtx:  cache bytes 64 
order 22 -> about 10 objects
2015-10-29 13:13:49.500269 7f5c2cb3b7c0 10 librbd::ImageCtx: init_layout 
stripe_unit 4194304 stripe_count 1 object_size 4194304 prefix 
rb.0.16cf.238e1f29 format rb.0.16cf.238e1f29.%012llx
2015-10-29 13:13:49.500286 7f5c2cb3b7c0 10 librados: set snap write context: 
seq = 0 and snaps = []
2015-10-29 13:13:49.500303 7f5c2cb3b7c0 10 librados: set snap read head -> head
2015-10-29 13:13:49.500313 7f5c2cb3b7c0 20 librbd: info 0x2305530
2015-10-29 13:13:49.500321 7f5c2cb3b7c0 20 librbd: ictx_check 0x2305530
2015-10-29 13:13:49.500333 7f5c2cb3b7c0 20 librbd: read_iterate 0x2305530 off = 
0 len = 1048576
2015-10-29 13:13:49.500335 7f5c2cb3b7c0 20 librbd: ictx_check 0x2305530
2015-10-29 13:13:49.500348 7f5c2cb3b7c0 20 librbd: aio_read 0x2305530 
completion 0x230a0d0 [0,1048576]
2015-10-29 13:13:49.500361 7f5c2cb3b7c0 20 librbd: ictx_check 0x2305530
2015-10-29 13:13:49.500387 7f5c2cb3b7c0 20 librbd:  oid 
rb.0.16cf.238e1f29. 0~1048576 from [0,1048576]
2015-10-29 13:13:49.500450 7f5c2cb3b7c0 20 librbd::AioCompletion: 
AioCompletion::finish_adding_requests 0x230a0d0 pending 1
2015-10-29 13:13:54.492329 7f5c24fd6700 10 client.7368.objecter tick
2015-10-29 13:13:59.492510 7f5c24fd6700 10 client.7368.objecter tick


the ticks continue on

- Original Message -
From: "Jason Dillaman" <dilla...@redhat.com>
To: "Joe Ryner" <jry...@cait.org>
Cc: ceph-us...@ceph.com
Sent: Thursday, October 29, 2015 12:05:38 PM
Subject: Re: [ceph-users] rbd hang

I don't see the read request hitting the wire, so I am thinking your client 
cannot talk to the primary PG for the 'rb.0.16cf.238e1f29.' object. 
 Try adding "debug objecter = 20" to your configuration to get more details.

-- 

Jason Dillaman 

- Original Message -
> From: "Joe Ryner" <jry...@cait.org>
> To: ceph-us...@ceph.com
> Sent: Thursday, October 29, 2015 12:22:01 PM
> Subject: [ceph-users] rbd hang
> 
> Hi,
> 
> I am having a strange problem with our development cluster.  When I run rbd
> export it just hangs.  I have been running ceph for a long time and haven't
> encountered this kind of issue.  Any ideas as to what is going on?
> 
> rbd -p locks export seco101ira -
> 
> 
> I am running
> 
> Centos 6.6 x86 64
>

Re: [ceph-users] rbd hang

2015-10-29 Thread Joe Ryner
Periodically I am also getting these while waiting:

2015-10-29 13:41:09.528674 7f5c24fd6700 10 client.7368.objecter tick
2015-10-29 13:41:14.528779 7f5c24fd6700 10 client.7368.objecter tick
2015-10-29 13:41:19.528907 7f5c24fd6700 10 client.7368.objecter tick
2015-10-29 13:41:22.515725 7f5c260d9700  1 -- 10.134.128.41:0/1025119 --> 
10.134.128.41:6789/0 -- mon_subscribe({monmap=6+,osdmap=4351}) v2 -- ?+0 
0x7f5c080073c0 con 0x2307540
2015-10-29 13:41:22.516453 7f5c270db700  1 -- 10.134.128.41:0/1025119 <== mon.0 
10.134.128.41:6789/0 21  mon_subscribe_ack(300s) v1  20+0+0 (646930372 
0 0) 0x7f5c10003170 con 0x2307540
2015-10-29 13:41:24.529012 7f5c24fd6700 10 client.7368.objecter tick
2015-10-29 13:41:29.529109 7f5c24fd6700 10 client.7368.objecter tick
2015-10-29 13:41:34.529209 7f5c24fd6700 10 client.7368.objecter tick
2015-10-29 13:41:39.529306 7f5c24fd6700 10 client.7368.objecter tick
2015-10-29 13:41:44.529402 7f5c24fd6700 10 client.7368.objecter tick
2015-10-29 13:41:49.529498 7f5c24fd6700 10 client.7368.objecter tick
2015-10-29 13:41:54.529597 7f5c24fd6700 10 client.7368.objecter tick
2015-10-29 13:41:59.529695 7f5c24fd6700 10 client.7368.objecter tick
2015-10-29 13:42:04.529800 7f5c24fd6700 10 client.7368.objecter tick
2015-10-29 13:42:09.529904 7f5c24fd6700 10 client.7368.objecter tick
2015-10-29 13:42:14.530004 7f5c24fd6700 10 client.7368.objecter tick
2015-10-29 13:42:19.530103 7f5c24fd6700 10 client.7368.objecter tick
2015-10-29 13:42:24.530200 7f5c24fd6700 10 client.7368.objecter tick
2015-10-29 13:42:29.530293 7f5c24fd6700 10 client.7368.objecter tick
2015-10-29 13:42:34.530385 7f5c24fd6700 10 client.7368.objecter tick
2015-10-29 13:42:39.530480 7f5c24fd6700 10 client.7368.objecter tick
2015-10-29 13:42:44.530594 7f5c24fd6700 10 client.7368.objecter tick
2015-10-29 13:42:49.530690 7f5c24fd6700 10 client.7368.objecter tick
2015-10-29 13:42:54.530787 7f5c24fd6700 10 client.7368.objecter tick
2015-10-29 13:42:59.530881 7f5c24fd6700 10 client.7368.objecter tick
2015-10-29 13:43:04.530980 7f5c24fd6700 10 client.7368.objecter tick
2015-10-29 13:43:09.531087 7f5c24fd6700 10 client.7368.objecter tick
2015-10-29 13:43:14.531190 7f5c24fd6700 10 client.7368.objecter tick
2015-10-29 13:43:19.531308 7f5c24fd6700 10 client.7368.objecter tick
2015-10-29 13:43:24.531417 7f5c24fd6700 10 client.7368.objecter tick
2015-10-29 13:43:29.531524 7f5c24fd6700 10 client.7368.objecter tick
2015-10-29 13:43:34.531629 7f5c24fd6700 10 client.7368.objecter tick
2015-10-29 13:43:39.531733 7f5c24fd6700 10 client.7368.objecter tick
2015-10-29 13:43:44.531836 7f5c24fd6700 10 client.7368.objecter tick
2015-10-29 13:43:49.531938 7f5c24fd6700 10 client.7368.objecter tick
2015-10-29 13:43:49.692028 7f5c270db700  1 client.7368.objecter ms_handle_reset 
on osd.2
2015-10-29 13:43:49.692051 7f5c270db700 10 client.7368.objecter reopen_session 
osd.2 session, addr now osd.2 10.134.128.43:6803/2741
2015-10-29 13:43:49.692176 7f5c270db700  1 -- 10.134.128.41:0/1025119 mark_down 
0x7f5c1c001c40 -- pipe dne
2015-10-29 13:43:49.692287 7f5c270db700 10 client.7368.objecter kick_requests 
for osd.2
2015-10-29 13:43:49.692300 7f5c270db700 10 client.7368.objecter 
maybe_request_map subscribing (onetime) to next osd map
2015-10-29 13:43:49.693706 7f5c270db700 10 client.7368.objecter 
ms_handle_connect 0x7f5c1c003810
2015-10-29 13:43:52.517670 7f5c260d9700  1 -- 10.134.128.41:0/1025119 --> 
10.134.128.41:6789/0 -- mon_subscribe({monmap=6+,osdmap=4351}) v2 -- ?+0 
0x7f5c080096c0 con 0x2307540
2015-10-29 13:43:52.518032 7f5c270db700  1 -- 10.134.128.41:0/1025119 <== mon.0 
10.134.128.41:6789/0 22  mon_subscribe_ack(300s) v1  20+0+0 (646930372 
0 0) 0x7f5c100056d0 con 0x2307540
2015-10-29 13:43:54.532041 7f5c24fd6700 10 client.7368.objecter tick
2015-10-29 13:43:59.532150 7f5c24fd6700 10 client.7368.objecter tick
2015-10-29 13:44:04.532252 7f5c24fd6700 10 client.7368.objecter tick
2015-10-29 13:44:09.532359 7f5c24fd6700 10 client.7368.objecter tick
2015-10-29 13:44:14.532467 7f5c24fd6700 10 client.7368.objecter tick
2015-10-29 13:44:19.532587 7f5c24fd6700 10 client.7368.objecter tick
2015-10-29 13:44:24.532692 7f5c24fd6700 10 client.7368.objecter tick


- Original Message -
From: "Jason Dillaman" <dilla...@redhat.com>
To: "Joe Ryner" <jry...@cait.org>
Cc: ceph-us...@ceph.com
Sent: Thursday, October 29, 2015 12:05:38 PM
Subject: Re: [ceph-users] rbd hang

I don't see the read request hitting the wire, so I am thinking your client 
cannot talk to the primary PG for the 'rb.0.16cf.238e1f29.' object. 
 Try adding "debug objecter = 20" to your configuration to get more details.

-- 

Jason Dillaman 

- Original Message -
> From: "Joe Ryner" <jry...@cait.org>
> To: ceph-us...@ceph.com
> Sent: Thursday, October 29, 2015 12:22:01 PM
> Subject: [ceph-users] rbd hang
> 
> Hi,
> 
> I am having a strange problem with our developm

Re: [ceph-users] rbd hang

2015-10-29 Thread Joe Ryner
More info

Output of dmesg:

[259956.804942] libceph: osd7 10.134.128.42:6806 socket closed (con state OPEN)
[260752.788609] libceph: osd1 10.134.128.43:6800 socket closed (con state OPEN)
[260757.908206] libceph: osd2 10.134.128.43:6803 socket closed (con state OPEN)
[260763.181751] libceph: osd3 10.134.128.43:6806 socket closed (con state OPEN)
[260852.224607] libceph: osd6 10.134.128.42:6803 socket closed (con state OPEN)
[260852.510451] libceph: osd5 10.134.128.42:6800 socket closed (con state OPEN)
[260856.868099] libceph: osd7 10.134.128.42:6806 socket closed (con state OPEN)
[261652.890656] libceph: osd1 10.134.128.43:6800 socket closed (con state OPEN)
[261657.972579] libceph: osd2 10.134.128.43:6803 socket closed (con state OPEN)
[261663.283701] libceph: osd3 10.134.128.43:6806 socket closed (con state OPEN)
[261752.325749] libceph: osd6 10.134.128.42:6803 socket closed (con state OPEN)
[261752.611505] libceph: osd5 10.134.128.42:6800 socket closed (con state OPEN)
[261756.969340] libceph: osd7 10.134.128.42:6806 socket closed (con state OPEN)
[262552.961741] libceph: osd1 10.134.128.43:6800 socket closed (con state OPEN)
[262558.074441] libceph: osd2 10.134.128.43:6803 socket closed (con state OPEN)
[262563.385635] libceph: osd3 10.134.128.43:6806 socket closed (con state OPEN)
[262652.427089] libceph: osd6 10.134.128.42:6803 socket closed (con state OPEN)
[262652.712681] libceph: osd5 10.134.128.42:6800 socket closed (con state OPEN)
[262657.070456] libceph: osd7 10.134.128.42:6806 socket closed (con state OPEN)

I noticed that the osds are talking on 10.134.128.42, which is part of the
public network, but I have defined the cluster network as 10.134.128.64/26.

The machine has two NICs: 10.134.128.41 and 10.134.128.105.

In the dmesg output I should be seeing the socket closed spam on
10.134.128.10{5,6}, right?

ceph.conf snippett (see full in below)
[global]
 public network = 10.134.128.0/26
 cluster network = 10.134.128.64/26
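
A quick way to check which addresses the OSDs actually registered (public vs
cluster) should be something like:

# ceph osd dump | grep "^osd"

which lists the public and cluster address for each osd, so I can confirm
whether they ever picked up the 10.134.128.64/26 network.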

- Original Message -
From: "Jason Dillaman" <dilla...@redhat.com>
To: "Joe Ryner" <jry...@cait.org>
Cc: ceph-us...@ceph.com
Sent: Thursday, October 29, 2015 12:05:38 PM
Subject: Re: [ceph-users] rbd hang

I don't see the read request hitting the wire, so I am thinking your client 
cannot talk to the primary PG for the 'rb.0.16cf.238e1f29.' object. 
 Try adding "debug objecter = 20" to your configuration to get more details.

-- 

Jason Dillaman 

- Original Message -
> From: "Joe Ryner" <jry...@cait.org>
> To: ceph-us...@ceph.com
> Sent: Thursday, October 29, 2015 12:22:01 PM
> Subject: [ceph-users] rbd hang
> 
> Hi,
> 
> I am having a strange problem with our development cluster.  When I run rbd
> export it just hangs.  I have been running ceph for a long time and haven't
> encountered this kind of issue.  Any ideas as to what is going on?
> 
> rbd -p locks export seco101ira -
> 
> 
> I am running
> 
> Centos 6.6 x86 64
> 
> ceph version 0.80.10 (ea6c958c38df1216bf95c927f143d8b13c4a9e70)
> 
> I have enabled debugging and get the following when I run the command
> 
> [root@durbium ~]# rbd -p locks export seco101ira -
> 2015-10-29 11:17:08.183597 7fc3334fa7c0  1 librados: starting msgr at :/0
> 2015-10-29 11:17:08.183613 7fc3334fa7c0  1 librados: starting objecter
> 2015-10-29 11:17:08.183739 7fc3334fa7c0  1 -- :/0 messenger.start
> 2015-10-29 11:17:08.183779 7fc3334fa7c0  1 librados: setting wanted keys
> 2015-10-29 11:17:08.183782 7fc3334fa7c0  1 librados: calling monclient init
> 2015-10-29 11:17:08.184365 7fc3334fa7c0  1 -- :/1024687 -->
> 10.134.128.42:6789/0 -- auth(proto 0 30 bytes epoch 0) v1 -- ?+0 0x15ba900
> con 0x15ba540
> 2015-10-29 11:17:08.185006 7fc3334f2700  1 -- 10.134.128.41:0/1024687 learned
> my addr 10.134.128.41:0/1024687
> 2015-10-29 11:17:08.185995 7fc32da9a700  1 -- 10.134.128.41:0/1024687 <==
> mon.1 10.134.128.42:6789/0 1  mon_map v1  491+0+0 (318324477 0 0)
> 0x7fc318000be0 con 0x15ba540
> 2015-10-29 11:17:08.186213 7fc32da9a700  1 -- 10.134.128.41:0/1024687 <==
> mon.1 10.134.128.42:6789/0 2  auth_reply(proto 2 0 (0) Success) v1 
> 33+0+0 (4093383511 0 0) 0x7fc318001090 con 0x15ba540
> 2015-10-29 11:17:08.186544 7fc32da9a700  1 -- 10.134.128.41:0/1024687 -->
> 10.134.128.42:6789/0 -- auth(proto 2 32 bytes epoch 0) v1 -- ?+0
> 0x7fc31c001700 con 0x15ba540
> 2015-10-29 11:17:08.187160 7fc32da9a700  1 -- 10.134.128.41:0/1024687 <==
> mon.1 10.134.128.42:6789/0 3  auth_reply(proto 2 0 (0) Success) v1 
> 206+0+0 (2382192463 0 0) 0x7fc318001090 con 0x15ba540
> 2015-10-29 11:17:08.187354 7fc32da9a700  1 -- 10.134.128.41:0/1024687 -->
> 10.134.128.42:6789/0 -- auth(proto 2 165 bytes epoch 0) v1 -- ?+0
> 0x7fc31c002220 con 0x15ba540
> 2015-10-29 11:17:08.188001 

[ceph-users] Fedora 18 Qemu

2013-08-27 Thread Joe Ryner
Does anyone have the best patches for Fedora 18 qemu that fix aio issues?  I
have built my own but am having mixed results.

Its qemu 1.2.2

Or would it be better to jump to Fedora 19?  I am running Fedora 18 in hopes
that RHEL 7 will be based on it.

Thanks,
Joe

-- 
Joe Ryner
Center for the Application of Information Technologies (CAIT)
Production Coordinator
P: (309) 298-1804
F: (309) 298-2806
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


[ceph-users] Repository Mirroring

2013-06-18 Thread Joe Ryner
I would like to make a local mirror of your yum repositories.  Do you support
any of the standard methods of syncing, e.g. rsync?

Thanks,
Joe

-- 
Joe Ryner
Center for the Application of Information Technologies (CAIT)
Production Coordinator
P: (309) 298-1804
F: (309) 298-2806
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] Kernel panic on rbd map when cluster is out of monitor quorum

2013-05-20 Thread Joe Ryner
Hi,

My kernels are running:
3.8.11-200.fc18.x86_64 and 3.8.9-200.fc18.x86_64

My cephx settings are below

auth cluster required = cephx
auth service required = cephx
auth client required = cephx

I will be working on my test cluster later this week; I will try to reproduce
the issue and will file a bug then.

Joe


- Original Message -
From: Sage Weil s...@inktank.com
To: Joe Ryner jry...@cait.org
Cc: ceph-users@lists.ceph.com
Sent: Friday, May 17, 2013 4:01:35 PM
Subject: Re: [ceph-users] Kernel panic on rbd map when cluster is out of 
monitor quorum

On Fri, 17 May 2013, Joe Ryner wrote:
 Hi All,
 
 I have had an issue recently while working on my ceph clusters.  The 
 following issue seems to be true on bobtail and cuttlefish.  I have two 
 production clusters in two different data centers and a test cluster.  We are 
 using ceph to run virtual machines.  I use rbd as block devices for sanlock.
 
 I am running Fedora 18.
 
 I have been moving monitors around and in the process I got the cluster 
 out of quorum, so ceph stopped responding.  During this time I decided 
 to reboot a ceph node that performs an rbd map during startup.  The 
 system boots ok but the service script that is performing the rbd map 
 doesn't finish and eventually the system will OOPS and then finally 
 panic.  I was able to disable the rbd map during boot and finally got 
 the cluster back in quorum and everything settled down nicely.

What kernel version?  Are you using cephx authentication?  If you could 
open a bug at tracker.ceph.com that would be most helpful!

 Question: has anyone seen this crashing/panic behavior?  I have seen this
 happen on both of my production clusters.
 Secondly, the ceph command hangs when the cluster is out of quorum; is there
 a timeout available?

Not currently.  You can do this yourself with 'timeout 120 ...' with any 
recent coreutils.
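
For example, something like

  timeout 120 ceph health

will give up after two minutes instead of hanging forever.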

Thanks-
sage
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] New osd not change status to up

2013-04-16 Thread Joe Ryner
Timofey Koolin timofey@... writes:

 
 I have test cluster3 node:
 1 - osd.0 mon.a mds.a
 2 - osd.1
 3 - empty
 
 I create osd.2:
 node1# ceph osd create
 
 
 node3# mkdir /var/lib/ceph/osd/ceph-2
 node3# mkfs.xfs /dev/sdb
 node3# mount /dev/sdb  /var/lib/ceph/osd/ceph-2
 node3# ceph-osd -i 2 --mkfs --mkkey
 
 
 copy keyring from node 3 to node 1 in root/keyring
 node1# ceph auth add osd.2 osd 'allow *' mon 'allow rwx' -i keyring
 node1# ceph osd crush set 2 1 root=default rack=unknownrack host=s3
 
 
 node3# service ceph start
 
 node1# ceph -s
 
 health HEALTH_OK
    monmap e1: 1 mons at {a=x.x.x.x:6789/0}, election epoch 1, quorum 0 a
 
    osdmap e135: 3 osds: 2 up, 2 in
     pgmap v6454: 576 pgs: 576 active+clean; 179 MB data, 2568 MB used, 137 
GB / 139 GB avail
    mdsmap e4: 1/1/1 up {0=a=up:active}
 
 
 
 
Hi Timofey,

I was having some problems with adding OSDs and found that the documentation
was incorrect.  It has been corrected; please see
http://ceph.com/docs/master/rados/operations/add-or-rm-osds/?highlight=osd#adding-an-osd-manual

Please note that the output of ceph osd create provides what the new osd
number should be.  Also, I found that ceph osd tree is helpful in
determining how things are laid out.
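
For example (the id printed by ceph osd create is the one you have to use for
the new OSD; the "2" below is just illustrative):

# ceph osd create
2
# ceph osd tree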

Joe




___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] Documentation Error in Adding/Removing OSDs

2013-04-11 Thread Joe Ryner
You probably should mention that the ceph osd create command will output what
the new {osd-number} should be.

Thanks for making the change so fast.

Joe

- Original Message -
From: John Wilkins john.wilk...@inktank.com
To: Joe Ryner jry...@cait.org
Cc: ceph-users@lists.ceph.com
Sent: Thursday, April 11, 2013 2:37:33 PM
Subject: Re: [ceph-users] Documentation Error in Adding/Removing OSDs


Thanks Joe! I've made the change. You should see it up on the site shortly. 



On Thu, Apr 11, 2013 at 10:00 AM, Joe Ryner  jry...@cait.org  wrote: 


Hi, 

I have found some issues in: 
http://ceph.com/docs/master/rados/operations/add-or-rm-osds 

In the adding section: 
Step 6 should be run before steps 1-5, as it outputs the OSD number when it
exits. I had a really hard time figuring this out. I am currently running
0.56.4 on RHEL 6. The first 5 steps imply that you can pick an osd-number out
of the ether, but really you have to use the osd number output by step 6.

The following discussion helped me figure this out:
http://comments.gmane.org/gmane.comp.file-systems.ceph.devel/11339


Thanks, 
Joe 
-- 
Joe Ryner 
Center for the Application of Information Technologies (CAIT) 
Production Coordinator 
P: (309) 298-1804 
F: (309) 298-2806 
___ 
ceph-users mailing list 
ceph-users@lists.ceph.com 
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com 




-- 
John Wilkins 
Senior Technical Writer 
Intank 
john.wilk...@inktank.com 
(415) 425-9599 
http://inktank.com 
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com