Re: [ceph-users] OSD's flapping on ordinary scrub with cluster being static (after upgrade to 12.1.1

2017-09-08 Thread Tomasz Kusmierz
Hi there, 

Somebody told me that essentially I was running a pre-release version and I 
should upgrade to 12.2 and come back if the problem persists. 
Today the upgrade to 12.2 became available and I installed it … so, I still have 
problems:


Sep 08 18:48:05 proxmox1 ceph-osd[3954]: *** Caught signal (Segmentation fault) **
Sep 08 18:48:05 proxmox1 ceph-osd[3954]:  in thread 7f5c883f0700 thread_name:tp_osd_tp
Sep 08 18:48:05 proxmox1 ceph-osd[3954]:  ceph version 12.2.0 (36f6c5ea099d43087ff0276121fd34e71668ae0e) luminous (rc)
Sep 08 18:48:05 proxmox1 ceph-osd[3954]:  1: (()+0xa07bb4) [0x55e157f22bb4]
Sep 08 18:48:05 proxmox1 ceph-osd[3954]:  2: (()+0x110c0) [0x7f5ca94030c0]
Sep 08 18:48:05 proxmox1 ceph-osd[3954]:  3: (()+0x1ff2f) [0x7f5caba05f2f]
Sep 08 18:48:05 proxmox1 ceph-osd[3954]:  4: (rocksdb::BlockBasedTable::NewIndexIterator(rocksdb::ReadOptions const&, rocksdb::BlockIter*, rocksdb::BlockBasedTable::CachableEntry*)+0x4e6) [0x55e158306bb6]
Sep 08 18:48:05 proxmox1 ceph-osd[3954]:  5: (rocksdb::BlockBasedTable::Get(rocksdb::ReadOptions const&, rocksdb::Slice const&, rocksdb::GetContext*, bool)+0x283) [0x55e158307963]
Sep 08 18:48:05 proxmox1 ceph-osd[3954]:  6: (rocksdb::TableCache::Get(rocksdb::ReadOptions const&, rocksdb::InternalKeyComparator const&, rocksdb::FileDescriptor const&, rocksdb::Slice const&, rocksdb::GetContext*, rocksdb::HistogramImpl*, bool, int)+0x13a) [0x55e1583e718a]
Sep 08 18:48:05 proxmox1 ceph-osd[3954]:  7: (rocksdb::Version::Get(rocksdb::ReadOptions const&, rocksdb::LookupKey const&, rocksdb::PinnableSlice*, rocksdb::Status*, rocksdb::MergeContext*, rocksdb::RangeDelAggregator*, bool*, bool*, unsigned long*)+0x3f8) [0x55e1582c8c28]
Sep 08 18:48:05 proxmox1 ceph-osd[3954]:  8: (rocksdb::DBImpl::GetImpl(rocksdb::ReadOptions const&, rocksdb::ColumnFamilyHandle*, rocksdb::Slice const&, rocksdb::PinnableSlice*, bool*)+0x552) [0x55e15838d682]
Sep 08 18:48:05 proxmox1 ceph-osd[3954]:  9: (rocksdb::DBImpl::Get(rocksdb::ReadOptions const&, rocksdb::ColumnFamilyHandle*, rocksdb::Slice const&, rocksdb::PinnableSlice*)+0x13) [0x55e15838dab3]
Sep 08 18:48:05 proxmox1 ceph-osd[3954]:  10: (rocksdb::DB::Get(rocksdb::ReadOptions const&, rocksdb::Slice const&, std::__cxx11::basic_string, std::allocator >*)+0xc1) [0x55e157e6bb51]
Sep 08 18:48:05 proxmox1 ceph-osd[3954]:  11: (RocksDBStore::get(std::__cxx11::basic_string, std::allocator > const&, std::__cxx11::basic_string, std::allocator > const&, ceph::buffer::list*)+0x1bb) [0x55e157e6308b]
Sep 08 18:48:05 proxmox1 ceph-osd[3954]:  12: (()+0x885d71) [0x55e157da0d71]
Sep 08 18:48:05 proxmox1 ceph-osd[3954]:  13: (()+0x885675) [0x55e157da0675]
Sep 08 18:48:05 proxmox1 ceph-osd[3954]:  14: (BlueStore::ExtentMap::fault_range(KeyValueDB*, unsigned int, unsigned int)+0x5d7) [0x55e157de48c7]
Sep 08 18:48:05 proxmox1 ceph-osd[3954]:  15: (BlueStore::_do_truncate(BlueStore::TransContext*, boost::intrusive_ptr&, boost::intrusive_ptr, unsigned long, std::set, std::allocator >*)+0x118) [0x55e157e05cd8]
Sep 08 18:48:05 proxmox1 ceph-osd[3954]:  16: (BlueStore::_do_remove(BlueStore::TransContext*, boost::intrusive_ptr&, boost::intrusive_ptr)+0xc5) [0x55e157e06755]
Sep 08 18:48:05 proxmox1 ceph-osd[3954]:  17: (BlueStore::_remove(BlueStore::TransContext*, boost::intrusive_ptr&, boost::intrusive_ptr&)+0x7b) [0x55e157e0807b]
Sep 08 18:48:05 proxmox1 ceph-osd[3954]:  18: (BlueStore::_txc_add_transaction(BlueStore::TransContext*, ObjectStore::Transaction*)+0x1f55) [0x55e157e1ec15]
Sep 08 18:48:05 proxmox1 ceph-osd[3954]:  19: (BlueStore::queue_transactions(ObjectStore::Sequencer*, std::vector >&, boost::intrusive_ptr, ThreadPool::TPHandle*)+0x536) [0x55e157e1f916]
Sep 08 18:48:05 proxmox1 ceph-osd[3954]:  20: (PrimaryLogPG::queue_transactions(std::vector >&, boost::intrusive_ptr)+0x66) [0x55e157b437f6]
Sep 08 18:48:05 proxmox1 ceph-osd[3954]:  21: (ReplicatedBackend::do_repop(boost::intrusive_ptr)+0xbdc) [0x55e157c70d6c]
Sep 08 18:48:05 proxmox1 ceph-osd[3954]:  22: (ReplicatedBackend::_handle_message(boost::intrusive_ptr)+0x2b7) [0x55e157c73b47]
Sep 08 18:48:05 proxmox1 ceph-osd[3954]:  23: (PGBackend::handle_message(boost::intrusive_ptr)+0x50) [0x55e157b810d0]
Sep 08 18:48:05 proxmox1 ceph-osd[3954]:  24: (PrimaryLogPG::do_request(boost::intrusive_ptr&, ThreadPool::TPHandle&)+0x4e3) [0x55e157ae6a83]
Sep 08 18:48:05 proxmox1 ceph-osd[3954]:  25: (OSD::dequeue_op(boost::intrusive_ptr, boost::intrusive_ptr, ThreadPool::TPHandle&)+0x3ab) [0x55e15796b19b]
Sep 08 18:48:05 proxmox1 ceph-osd[3954]:  26: (PGQueueable::RunVis::operator()(boost::intrusive_ptr const&)+0x5a) [0x55e157c0354a]
Sep 08 18:48:05 proxmox1 ceph-osd[3954]:  27: (OSD::ShardedOpWQ::_process(unsigned int, ceph::heartbeat_handle_d*)+0x103d) [0x55e157991d9d]
Sep 08 18:48:05 proxmox1 ceph-osd[3954]:  28: (ShardedThreadPool::shardedthreadpool_worker(unsigned int)+0x8ef) [0x55e157f6f20f]
Sep 08 18:48:05 proxmox1 ceph-osd[3954]:  29: (Sharded
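
(A sketch of follow-up checks for a crash in the RocksDB/BlueStore read path like the one above; the OSD id and data path below are illustrative, not taken from this cluster:

  systemctl stop ceph-osd@0                                  # stop the affected OSD first (id 0 is only an example)
  ceph-bluestore-tool fsck --path /var/lib/ceph/osd/ceph-0   # offline consistency check of the BlueStore data and its RocksDB
  ceph versions                                              # confirm every daemon really is running 12.2.0
)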

Re: [ceph-users] OSD's flapping on ordinary scrub with cluster being static (after upgrade to 12.1.1

2017-08-29 Thread Tomasz Kusmierz
So on IRC I was asked to add this log from the OSD that was marked as missing 
during the scrub:

https://pastebin.com/raw/YQj3Drzi




Re: [ceph-users] OSD's flapping on ordinary scrub with cluster being static (after upgrade to 12.1.1

2017-08-29 Thread David Zafman


Please file a bug in tracker: http://tracker.ceph.com/projects/ceph

When an OSD is marked down, is there a crash (e.g. an assert or heartbeat 
timeout), or is it declared down by another daemon?  Please include relevant log 
snippets.  If there is no obvious information, then bump the OSD debug log levels.
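
For example, debug levels can be raised at runtime with injectargs; the subsystems 
and levels below are typical choices for an OSD crash, adjust as needed:

  # raise OSD-side logging while reproducing the problem
  ceph tell osd.* injectargs '--debug-osd 20 --debug-bluestore 20 --debug-rocksdb 5 --debug-ms 1'
  # drop the levels back down afterwards
  ceph tell osd.* injectargs '--debug-osd 1/5 --debug-bluestore 1/5 --debug-rocksdb 4/5 --debug-ms 0/5'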


Luminous LTS release happened today, so 12.2.0 is the best thing to run 
as of now.


See if any existing bugs like http://tracker.ceph.com/issues/21142 are 
related.


David


On 8/29/17 8:24 AM, Tomasz Kusmierz wrote:

So nobody has any clue on this one???

Should I take this one to the dev mailing list?


On 27 Aug 2017, at 01:49, Tomasz Kusmierz  wrote:

[original message of 27 Aug 2017 quoted in full; trimmed here, see the original post below]

Re: [ceph-users] OSD's flapping on ordinary scrub with cluster being static (after upgrade to 12.1.1

2017-08-29 Thread Tomasz Kusmierz
So nobody has any clue on this one???

Should I take this one to the dev mailing list?

> On 27 Aug 2017, at 01:49, Tomasz Kusmierz  wrote:
> [original message of 27 Aug 2017 quoted in full; trimmed here, see the original post below]

[ceph-users] OSD's flapping on ordinary scrub with cluster being static (after upgrade to 12.1.1

2017-08-26 Thread Tomasz Kusmierz
Hi,
for the purposes of experimenting I’m running a home cluster that consists of a 
single node and 4 OSDs (weights in the crush map are true to the actual HDD sizes). 
I prefer to test all new stuff on home equipment before getting egg on my face at work :)
Anyway, recently I upgraded to Luminous and replaced my ancient 8x 2TB drives 
with 2x 8TB drives (with hopes of getting more in the near future). While doing 
that I converted everything to bluestore, while still on 12.1.1.

Everything was running smoothly and performance was good (for ceph).

I recently decided to upgrade to 12.1.2, and this is where everything started 
acting up. I’m aware that:
- a single-node cluster is not a cluster
- in the end I might need more OSDs (old joke, right?)
- I need to switch from spinning rust to SSDs

Before the upgrade my “cluster” only switched to WRN when I was pumping a lot of 
data into it, and it would just come up with “slow requests”. Now, while completely 
static and not doing anything (no reads, no writes), OSDs are committing suicide 
due to timeouts. Also, before they commit suicide I can’t actually access data from 
the cluster, which makes me think the data is inaccessible while a scrub is running. 
Below I’ll attach a log excerpt; please note that it happens on deep scrub and 
normal scrub alike.
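
As a sketch of what can be tuned while chasing this (standard option names; the 
values are only illustrative, not a recommendation), scrubbing can be throttled 
and the op-thread timeouts stretched at runtime:

  # slow the scrubber down a little
  ceph tell osd.* injectargs '--osd_scrub_sleep 0.1'
  # give the op threads more headroom before the suicide timeout fires
  ceph tell osd.* injectargs '--osd_op_thread_timeout 60 --osd_op_thread_suicide_timeout 600'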

After I discovered that, I tried to play around with sysctl.conf and with 
ceph.conf (up to this point sysctl.conf was stock, and ceph.conf was only 
adjusted to allow a greater OSD full ratio and to disable cephx to speed things up).
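
(For completeness, those ceph.conf adjustments amount to roughly the following; 
this is a sketch using the stock option names, and the exact values may differ:

  [global]
  auth_cluster_required = none
  auth_service_required = none
  auth_client_required = none
  mon_osd_full_ratio = 0.95
  mon_osd_nearfull_ratio = 0.90
)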

Also, I’m running 3 pools on top of this cluster (all three have size = 2, min_size = 2):
- cephfs_data: pg_num = 256 (99.99% of the data in the cluster)
- cephfs_metadata: pg_num = 4 (0.01% of the data in the cluster)
- rbd: pg_num = 8, but this pool contains no data and I’m considering removing it 
since in my use case I’ve got nothing for it.
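
(Worth noting: with size = 2 and min_size = 2, a single flapping OSD is enough to 
leave PGs inactive, which would match the ops stuck “currently waiting for active” 
in the logs below. The standard CLI to confirm the pool settings and any stuck PGs, 
shown here only as a sketch:

  ceph osd pool ls detail        # shows size/min_size/pg_num for each pool
  ceph pg dump_stuck inactive    # lists PGs that are currently not active
)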

Please note that while these logs were produced, CephFS was not even mounted :/



FYI the hardware is an old and trusted HP ProLiant DL180 G6 with 2 Xeons @ 2.2GHz 
giving 16 cores, 32GB of ECC RAM, and an LSI controller in HBA mode (2x 6Gb SAS).



( 
As a side issue, could somebody explain to me why, with bluestore that was 
supposed to cure cancer, write performance still sucks? I know that filestore 
suffered from writing everything multiple times to the same drive, and I 
experienced this first hand when, after exhausting the journals, it was just dead 
slow. But now, within the same host in my current configuration, it keeps choking 
[flapping 70MB/s -> 10MB/s -> 70MB/s] and I’ve never seen it even approach the 
speed of the single slowest drive. This server is not a speed demon, I know, but 
when performing simultaneous reads/writes on those drives I was getting around 
760MB/s sequential R/W.
Right now I’m struggling to comprehend where the bottleneck is while performing 
operations within the same host. The network should not be an issue (correct me if 
I’m wrong here), and dumping a single blob into a pool should produce a nice long 
sequence of objects placed onto the drives …
I’m just puzzled why ceph will not exceed a combined 40MB/s while still switching 
the “cluster” into a warning state due to “slow requests” like these (a quick way 
to isolate the bottleneck is sketched right after this aside):
2017-08-24 20:49:34.457191 osd.8 osd.8 192.168.1.240:6814/3393 503 : cluster [WRN] slow request 63.878717 seconds old, received at 2017-08-24 20:48:30.578398: osd_op(client.994130.1:13659 1.9700016d 1:b68000e9:::10ffeef.0068:head [write 0~4194304 [1@-1]] snapc 1=[] ondisk+write+known_if_redirected e4306) currently waiting for active
2017-08-24 20:49:34.457195 osd.8 osd.8 192.168.1.240:6814/3393 504 : cluster [WRN] slow request 64.177858 seconds old, received at 2017-08-24 20:48:30.279257: osd_op(client.994130.1:13568 1.b95e13a4 1:25c87a9d:::10ffeef.000d:head [write 0~4194304 [1@-1]] snapc 1=[] ondisk+write+known_if_redirected e4306) currently waiting for active
2017-08-24 20:49:34.457198 osd.8 osd.8 192.168.1.240:6814/3393 505 : cluster [WRN] slow request 64.002653 seconds old, received at 2017-08-24 20:48:30.454463: osd_op(client.994130.1:13626 1.b426420e 1:7042642d:::10ffeef.0047:head [write 0~4194304 [1@-1]] snapc 1=[] ondisk+write+known_if_redirected e4306) currently waiting for active
2017-08-24 20:49:34.457200 osd.8 osd.8 192.168.1.240:6814/3393 506 : cluster [WRN] slow request 63.873519 seconds old, received at 2017-08-24 20:48:30.583596: osd_op(client.994130.1:13661 1.31551a8 1:158aa8c0:::10ffeef.006a:head [write 0~4194304 [1@-1]] snapc 1=[] ondisk+write+known_if_redirected e4306) currently waiting for active
2017-08-24 20:49:34.457206 osd.8 osd.8 192.168.1.240:6814/3393 507 : cluster [WRN] slow request 64.171785 seconds old, received at 2017-08-24 20:48:30.285330: osd_op(client.994130.1:13570 1.49610cc6 1:63308692:::10ffeef.000f:head [write 0~4194304 [1@-1]] snapc 1=[] ondisk+write+known_if_redirected e4306) currently waiting for active
)
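
(Following up on the aside above: one way to isolate the bottleneck is to benchmark 
the object layer directly, which takes the CephFS client out of the picture. A 
sketch using the standard rados bench tool; the pool name and durations are illustrative:

  rados bench -p rbd 60 write --no-cleanup   # 4MB sequential writes straight into the (empty) rbd pool
  rados bench -p rbd 60 seq                  # sequential reads of the objects just written
  rados -p rbd cleanup                       # remove the benchmark objects afterwards
)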


So, some data on the main issue:


FAIL ON DEEP SCRUB
2017-08-26 01:50:04.421944 osd.0 osd.0 192