Our initial journal sizes were enough, but the flush time was 5 seconds, so we increased the journal size to fit a flush timeframe with min/max of 29/30 seconds.
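For reference, the sizing rule in the Ceph docs is osd journal size = 2 * (expected throughput * filestore max sync interval). So at, say, ~100 MB/s per OSD and a 30 s max sync interval that comes to 2 * 100 * 30 = 6000 MB. A minimal ceph.conf sketch under those assumptions (the throughput figure is only an example, not a measured value):

[osd]
# journal must be able to absorb ~2x the data written between filestore syncs
osd journal size = 6000
filestore min sync interval = 29
filestore max sync interval = 30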
I mean:
filestore max sync interval = 30
filestore min sync interval = 29
when I said "flush time".

2015-08-21 2:16 GMT+03:00 Samuel Just <sj...@redhat.com>:
> Also, what do you mean by "change journal size"?
> -Sam
>
> On Thu, Aug 20, 2015 at 4:15 PM, Samuel Just <sj...@redhat.com> wrote:
> > Not sure what you mean by:
> >
> > but it stops working the moment the cache layer fills with
> > data and evict/flush starts...
> > -Sam
> >
> > On Thu, Aug 20, 2015 at 4:11 PM, Voloshanenko Igor
> > <igor.voloshane...@gmail.com> wrote:
> >> No, when we started draining the cache, the bad pgs were already in place...
> >> We did a big rebalance (disk by disk, to change the journal size on both
> >> the hot and cold layers)... All was OK, but after 2 days scrub errors
> >> arrived and 2 pgs went inconsistent...
> >>
> >> In writeback mode, yes, snapshots look like they work, but it stops
> >> working the moment the cache layer fills with data and evict/flush
> >> starts...
> >>
> >> 2015-08-21 2:09 GMT+03:00 Samuel Just <sj...@redhat.com>:
> >>>
> >>> So you started draining the cache pool before you saw either the
> >>> inconsistent pgs or the anomalous snap behavior? (That is, writeback
> >>> mode was working correctly?)
> >>> -Sam
> >>>
> >>> On Thu, Aug 20, 2015 at 4:07 PM, Voloshanenko Igor
> >>> <igor.voloshane...@gmail.com> wrote:
> >>> > Good joke )))))))))
> >>> >
> >>> > 2015-08-21 2:06 GMT+03:00 Samuel Just <sj...@redhat.com>:
> >>> >>
> >>> >> Certainly, don't reproduce this with a cluster you care about :).
> >>> >> -Sam
> >>> >>
> >>> >> On Thu, Aug 20, 2015 at 4:02 PM, Samuel Just <sj...@redhat.com> wrote:
> >>> >> > What's supposed to happen is that the client transparently directs
> >>> >> > all requests to the cache pool rather than the cold pool when there
> >>> >> > is a cache pool. If the kernel is sending requests to the cold pool,
> >>> >> > that's probably where the bug is. Odd. It could also be a bug
> >>> >> > specific to 'forward' mode, either in the client or on the osd. Why
> >>> >> > did you have it in that mode?
> >>> >> > -Sam
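A side note on 'forward' mode: the tier's current cache mode shows up in the pool line of ceph osd dump output, and it can be switched with the standard tier commands. A quick sketch, where hot-storage is a placeholder pool name (recent releases also require --yes-i-really-mean-it to enter forward mode, since it is considered unsafe):

root@test:~# ceph osd dump | grep hot-storage
root@test:~# ceph osd tier cache-mode hot-storage writeback
root@test:~# ceph osd tier cache-mode hot-storage forward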
> >>> >> >
> >>> >> > On Thu, Aug 20, 2015 at 3:58 PM, Voloshanenko Igor
> >>> >> > <igor.voloshane...@gmail.com> wrote:
> >>> >> >> We use the 4.x branch, as we have "very good" Samsung 850 Pros in
> >>> >> >> production, and they don't support NCQ TRIM...
> >>> >> >>
> >>> >> >> And 4.x is the first branch which includes an exception for this in
> >>> >> >> libata-core.c.
> >>> >> >>
> >>> >> >> Sure, we could backport this one line to the 3.x branch, but we
> >>> >> >> prefer not to go deeper if a package for a new kernel exists.
> >>> >> >>
> >>> >> >> 2015-08-21 1:56 GMT+03:00 Voloshanenko Igor
> >>> >> >> <igor.voloshane...@gmail.com>:
> >>> >> >>>
> >>> >> >>> root@test:~# uname -a
> >>> >> >>> Linux ix-s5 4.0.4-040004-generic #201505171336 SMP Sun May 17
> >>> >> >>> 17:37:22 UTC 2015 x86_64 x86_64 x86_64 GNU/Linux
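For anyone checking their own drives: a quick way to see what TRIM/discard support the kernel actually ended up enabling, assuming hdparm is installed (/dev/sda is a placeholder device):

root@test:~# lsblk --discard /dev/sda
(non-zero DISC-GRAN/DISC-MAX means discard is active for the device)
root@test:~# hdparm -I /dev/sda | grep -i trim
(lists the TRIM capabilities the drive advertises)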
> >>> >> >>>
> >>> >> >>> 2015-08-21 1:54 GMT+03:00 Samuel Just <sj...@redhat.com>:
> >>> >> >>>>
> >>> >> >>>> Also, can you include the kernel version?
> >>> >> >>>> -Sam
> >>> >> >>>>
> >>> >> >>>> On Thu, Aug 20, 2015 at 3:51 PM, Samuel Just <sj...@redhat.com> wrote:
> >>> >> >>>> > Snapshotting with cache/tiering *is* supposed to work. Can you
> >>> >> >>>> > open a bug?
> >>> >> >>>> > -Sam
> >>> >> >>>> >
> >>> >> >>>> > On Thu, Aug 20, 2015 at 3:36 PM, Andrija Panic
> >>> >> >>>> > <andrija.pa...@gmail.com> wrote:
> >>> >> >>>> >> This was related to the caching layer, which doesn't support
> >>> >> >>>> >> snapshotting per the docs... for the sake of closing the thread.
> >>> >> >>>> >>
> >>> >> >>>> >> On 17 August 2015 at 21:15, Voloshanenko Igor
> >>> >> >>>> >> <igor.voloshane...@gmail.com> wrote:
> >>> >> >>>> >>>
> >>> >> >>>> >>> Hi all, can you please help me with an unexplained situation...
> >>> >> >>>> >>>
> >>> >> >>>> >>> All snapshots inside ceph are broken...
> >>> >> >>>> >>>
> >>> >> >>>> >>> So, as an example, we have a VM template as an rbd inside ceph.
> >>> >> >>>> >>> We can map it and mount it to check that all is OK with it:
> >>> >> >>>> >>>
> >>> >> >>>> >>> root@test:~# rbd map cold-storage/0e23c701-401d-4465-b9b4-c02939d57bb5
> >>> >> >>>> >>> /dev/rbd0
> >>> >> >>>> >>> root@test:~# parted /dev/rbd0 print
> >>> >> >>>> >>> Model: Unknown (unknown)
> >>> >> >>>> >>> Disk /dev/rbd0: 10.7GB
> >>> >> >>>> >>> Sector size (logical/physical): 512B/512B
> >>> >> >>>> >>> Partition Table: msdos
> >>> >> >>>> >>>
> >>> >> >>>> >>> Number  Start   End     Size    Type     File system  Flags
> >>> >> >>>> >>>  1      1049kB  525MB   524MB   primary  ext4         boot
> >>> >> >>>> >>>  2      525MB   10.7GB  10.2GB  primary               lvm
> >>> >> >>>> >>>
> >>> >> >>>> >>> Then I want to create a snap, so I do:
> >>> >> >>>> >>> root@test:~# rbd snap create
> >>> >> >>>> >>> cold-storage/0e23c701-401d-4465-b9b4-c02939d57bb5@new_snap
> >>> >> >>>> >>>
> >>> >> >>>> >>> And now I want to map it:
> >>> >> >>>> >>>
> >>> >> >>>> >>> root@test:~# rbd map
> >>> >> >>>> >>> cold-storage/0e23c701-401d-4465-b9b4-c02939d57bb5@new_snap
> >>> >> >>>> >>> /dev/rbd1
> >>> >> >>>> >>> root@test:~# parted /dev/rbd1 print
> >>> >> >>>> >>> Warning: Unable to open /dev/rbd1 read-write (Read-only file system).
> >>> >> >>>> >>> /dev/rbd1 has been opened read-only.
> >>> >> >>>> >>> Warning: Unable to open /dev/rbd1 read-write (Read-only file system).
> >>> >> >>>> >>> /dev/rbd1 has been opened read-only.
> >>> >> >>>> >>> Error: /dev/rbd1: unrecognised disk label
> >>> >> >>>> >>>
> >>> >> >>>> >>> Even the md5s are different...
> >>> >> >>>> >>> root@ix-s2:~# md5sum /dev/rbd0
> >>> >> >>>> >>> 9a47797a07fee3a3d71316e22891d752  /dev/rbd0
> >>> >> >>>> >>> root@ix-s2:~# md5sum /dev/rbd1
> >>> >> >>>> >>> e450f50b9ffa0073fae940ee858a43ce  /dev/rbd1
> >>> >> >>>> >>>
> >>> >> >>>> >>> OK, now I protect the snap and create a clone... but same thing...
> >>> >> >>>> >>> the md5 for the clone is the same as for the snap...
> >>> >> >>>> >>>
> >>> >> >>>> >>> root@test:~# rbd unmap /dev/rbd1
> >>> >> >>>> >>> root@test:~# rbd snap protect
> >>> >> >>>> >>> cold-storage/0e23c701-401d-4465-b9b4-c02939d57bb5@new_snap
> >>> >> >>>> >>> root@test:~# rbd clone
> >>> >> >>>> >>> cold-storage/0e23c701-401d-4465-b9b4-c02939d57bb5@new_snap
> >>> >> >>>> >>> cold-storage/test-image
> >>> >> >>>> >>> root@test:~# rbd map cold-storage/test-image
> >>> >> >>>> >>> /dev/rbd1
> >>> >> >>>> >>> root@test:~# md5sum /dev/rbd1
> >>> >> >>>> >>> e450f50b9ffa0073fae940ee858a43ce  /dev/rbd1
> >>> >> >>>> >>>
> >>> >> >>>> >>> ... but it's broken...
> >>> >> >>>> >>> root@test:~# parted /dev/rbd1 print
> >>> >> >>>> >>> Error: /dev/rbd1: unrecognised disk label
> >>> >> >>>> >>>
> >>> >> >>>> >>> =========
> >>> >> >>>> >>>
> >>> >> >>>> >>> Tech details:
> >>> >> >>>> >>>
> >>> >> >>>> >>> root@test:~# ceph -v
> >>> >> >>>> >>> ceph version 0.94.2 (5fb85614ca8f354284c713a2f9c610860720bbf3)
> >>> >> >>>> >>>
> >>> >> >>>> >>> We have 2 inconsistent pgs, but none of the images are placed
> >>> >> >>>> >>> on these pgs...
> >>> >> >>>> >>>
> >>> >> >>>> >>> root@test:~# ceph health detail
> >>> >> >>>> >>> HEALTH_ERR 2 pgs inconsistent; 18 scrub errors
> >>> >> >>>> >>> pg 2.490 is active+clean+inconsistent, acting [56,15,29]
> >>> >> >>>> >>> pg 2.c4 is active+clean+inconsistent, acting [56,10,42]
> >>> >> >>>> >>> 18 scrub errors
> >>> >> >>>> >>>
> >>> >> >>>> >>> ============
> >>> >> >>>> >>>
> >>> >> >>>> >>> root@test:~# ceph osd map cold-storage
> >>> >> >>>> >>> 0e23c701-401d-4465-b9b4-c02939d57bb5
> >>> >> >>>> >>> osdmap e16770 pool 'cold-storage' (2) object
> >>> >> >>>> >>> '0e23c701-401d-4465-b9b4-c02939d57bb5' -> pg 2.74458f70 (2.770)
> >>> >> >>>> >>> -> up ([37,15,14], p37) acting ([37,15,14], p37)
> >>> >> >>>> >>> root@test:~# ceph osd map cold-storage
> >>> >> >>>> >>> 0e23c701-401d-4465-b9b4-c02939d57bb5@snap
> >>> >> >>>> >>> osdmap e16770 pool 'cold-storage' (2) object
> >>> >> >>>> >>> '0e23c701-401d-4465-b9b4-c02939d57bb5@snap' -> pg 2.793cd4a3 (2.4a3)
> >>> >> >>>> >>> -> up ([12,23,17], p12) acting ([12,23,17], p12)
> >>> >> >>>> >>> root@test:~# ceph osd map cold-storage
> >>> >> >>>> >>> 0e23c701-401d-4465-b9b4-c02939d57bb5@test-image
> >>> >> >>>> >>> osdmap e16770 pool 'cold-storage' (2) object
> >>> >> >>>> >>> '0e23c701-401d-4465-b9b4-c02939d57bb5@test-image' -> pg 2.9519c2a9 (2.2a9)
> >>> >> >>>> >>> -> up ([12,44,23], p12) acting ([12,44,23], p12)
> >>> >> >>>> >>>
> >>> >> >>>> >>> Also, we use a cache layer, which at the current moment is in
> >>> >> >>>> >>> forward mode...
> >>> >> >>>> >>>
> >>> >> >>>> >>> Can you please help me with this? My brain has stopped
> >>> >> >>>> >>> understanding what is going on...
> >>> >> >>>> >>>
> >>> >> >>>> >>> Thanks in advance!
> >>> >> >>>> >>
> >>> >> >>>> >> --
> >>> >> >>>> >> Andrija Panić
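One caveat on the ceph osd map checks above: they hash the literal image name as an object name, but an RBD image's data actually lives in objects named after its block_name_prefix, so to tell whether an image really overlaps the inconsistent pgs you would map those data objects instead. A sketch, assuming a format-2 image (the prefix value below is illustrative, not real output):

root@test:~# rbd info cold-storage/0e23c701-401d-4465-b9b4-c02939d57bb5 | grep block_name_prefix
        block_name_prefix: rbd_data.123456789abcdef
root@test:~# rados -p cold-storage ls | grep '^rbd_data.123456789abcdef' | \
    while read obj; do ceph osd map cold-storage "$obj"; done | grep -E '\(2\.(490|c4)\)'

Any output from the last command means some of the image's objects do land on pg 2.490 or 2.c4.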
_______________________________________________
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com