Our initial journal sizes were enough, but the flush time was 5 seconds, so we increased the journal size to fit a flush timeframe with min/max of 29/30 seconds.
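For reference, the sizing rule in the Ceph docs is osd journal size = 2 * (expected throughput * filestore max sync interval). So at, say, ~100 MB/s per OSD and a 30 s max sync interval that comes to 2 * 100 * 30 = 6000 MB. A minimal ceph.conf sketch under those assumptions (the throughput figure is only an example, not a measured value):

[osd]
# journal must be able to absorb ~2x the data written between filestore syncs
osd journal size = 6000
filestore min sync interval = 29
filestore max sync interval = 30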
I mean:
filestore max sync interval = 30
filestore min sync interval = 29
when I said "flush time".

2015-08-21 2:16 GMT+03:00 Samuel Just <sj...@redhat.com>:
> Also, what do you mean by "change journal size"?
> -Sam
>
> On Thu, Aug 20, 2015 at 4:15 PM, Samuel Just <sj...@redhat.com> wrote:
> > Not sure what you mean by:
> >
> > but it stops working the moment the cache layer fills with
> > data and evict/flush starts...
> > -Sam
> >
> > On Thu, Aug 20, 2015 at 4:11 PM, Voloshanenko Igor
> > <igor.voloshane...@gmail.com> wrote:
> >> No, when we started draining the cache, the bad pgs were already in place...
> >> We did a big rebalance (disk by disk, to change the journal size on both
> >> the hot and cold layers)... All was OK, but after 2 days scrub errors
> >> arrived and 2 pgs went inconsistent...
> >>
> >> In writeback mode, yes, snapshots look like they work, but it stops
> >> working the moment the cache layer fills with data and evict/flush
> >> starts...
> >>
> >> 2015-08-21 2:09 GMT+03:00 Samuel Just <sj...@redhat.com>:
> >>>
> >>> So you started draining the cache pool before you saw either the
> >>> inconsistent pgs or the anomalous snap behavior? (That is, writeback
> >>> mode was working correctly?)
> >>> -Sam
> >>>
> >>> On Thu, Aug 20, 2015 at 4:07 PM, Voloshanenko Igor
> >>> <igor.voloshane...@gmail.com> wrote:
> >>> > Good joke )))))))))
> >>> >
> >>> > 2015-08-21 2:06 GMT+03:00 Samuel Just <sj...@redhat.com>:
> >>> >>
> >>> >> Certainly, don't reproduce this with a cluster you care about :).
> >>> >> -Sam
> >>> >>
> >>> >> On Thu, Aug 20, 2015 at 4:02 PM, Samuel Just <sj...@redhat.com> wrote:
> >>> >> > What's supposed to happen is that the client transparently directs
> >>> >> > all requests to the cache pool rather than the cold pool when there
> >>> >> > is a cache pool. If the kernel is sending requests to the cold pool,
> >>> >> > that's probably where the bug is. Odd. It could also be a bug
> >>> >> > specific to 'forward' mode, either in the client or on the osd. Why
> >>> >> > did you have it in that mode?
> >>> >> > -Sam
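A side note on 'forward' mode: the tier's current cache mode shows up in the pool line of ceph osd dump output, and it can be switched with the standard tier commands. A quick sketch, where hot-storage is a placeholder pool name (recent releases also require --yes-i-really-mean-it to enter forward mode, since it is considered unsafe):

root@test:~# ceph osd dump | grep hot-storage
root@test:~# ceph osd tier cache-mode hot-storage writeback
root@test:~# ceph osd tier cache-mode hot-storage forward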
> >>> >> >
> >>> >> > On Thu, Aug 20, 2015 at 3:58 PM, Voloshanenko Igor
> >>> >> > <igor.voloshane...@gmail.com> wrote:
> >>> >> >> We use the 4.x branch, as we have "very good" Samsung 850 Pros in
> >>> >> >> production, and they don't support NCQ TRIM...
> >>> >> >>
> >>> >> >> And 4.x is the first branch which includes an exception for this in
> >>> >> >> libata-core.c.
> >>> >> >>
> >>> >> >> Sure, we could backport this one line to the 3.x branch, but we
> >>> >> >> prefer not to go deeper if a package for a new kernel exists.
> >>> >> >>
> >>> >> >> 2015-08-21 1:56 GMT+03:00 Voloshanenko Igor
> >>> >> >> <igor.voloshane...@gmail.com>:
> >>> >> >>>
> >>> >> >>> root@test:~# uname -a
> >>> >> >>> Linux ix-s5 4.0.4-040004-generic #201505171336 SMP Sun May 17
> >>> >> >>> 17:37:22 UTC 2015 x86_64 x86_64 x86_64 GNU/Linux
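For anyone checking their own drives: a quick way to see what TRIM/discard support the kernel actually ended up enabling, assuming hdparm is installed (/dev/sda is a placeholder device):

root@test:~# lsblk --discard /dev/sda
(non-zero DISC-GRAN/DISC-MAX means discard is active for the device)
root@test:~# hdparm -I /dev/sda | grep -i trim
(lists the TRIM capabilities the drive advertises)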
> >>> >> >>>
> >>> >> >>> 2015-08-21 1:54 GMT+03:00 Samuel Just <sj...@redhat.com>:
> >>> >> >>>>
> >>> >> >>>> Also, can you include the kernel version?
> >>> >> >>>> -Sam
> >>> >> >>>>
> >>> >> >>>> On Thu, Aug 20, 2015 at 3:51 PM, Samuel Just <sj...@redhat.com> wrote:
> >>> >> >>>> > Snapshotting with cache/tiering *is* supposed to work. Can you
> >>> >> >>>> > open a bug?
> >>> >> >>>> > -Sam
> >>> >> >>>> >
> >>> >> >>>> > On Thu, Aug 20, 2015 at 3:36 PM, Andrija Panic
> >>> >> >>>> > <andrija.pa...@gmail.com> wrote:
> >>> >> >>>> >> This was related to the caching layer, which doesn't support
> >>> >> >>>> >> snapshotting per the docs... for the sake of closing the thread.
> >>> >> >>>> >>
> >>> >> >>>> >> On 17 August 2015 at 21:15, Voloshanenko Igor
> >>> >> >>>> >> <igor.voloshane...@gmail.com> wrote:
> >>> >> >>>> >>>
> >>> >> >>>> >>> Hi all, can you please help me with an unexplained situation...
> >>> >> >>>> >>>
> >>> >> >>>> >>> All snapshots inside ceph are broken...
> >>> >> >>>> >>>
> >>> >> >>>> >>> So, as an example, we have a VM template as an rbd inside ceph.
> >>> >> >>>> >>> We can map it and mount it to check that all is OK with it:
> >>> >> >>>> >>>
> >>> >> >>>> >>> root@test:~# rbd map cold-storage/0e23c701-401d-4465-b9b4-c02939d57bb5
> >>> >> >>>> >>> /dev/rbd0
> >>> >> >>>> >>> root@test:~# parted /dev/rbd0 print
> >>> >> >>>> >>> Model: Unknown (unknown)
> >>> >> >>>> >>> Disk /dev/rbd0: 10.7GB
> >>> >> >>>> >>> Sector size (logical/physical): 512B/512B
> >>> >> >>>> >>> Partition Table: msdos
> >>> >> >>>> >>>
> >>> >> >>>> >>> Number  Start   End     Size    Type     File system  Flags
> >>> >> >>>> >>>  1      1049kB  525MB   524MB   primary  ext4         boot
> >>> >> >>>> >>>  2      525MB   10.7GB  10.2GB  primary               lvm
> >>> >> >>>> >>>
> >>> >> >>>> >>> Then I want to create a snap, so I do:
> >>> >> >>>> >>> root@test:~# rbd snap create
> >>> >> >>>> >>> cold-storage/0e23c701-401d-4465-b9b4-c02939d57bb5@new_snap
> >>> >> >>>> >>>
> >>> >> >>>> >>> And now I want to map it:
> >>> >> >>>> >>>
> >>> >> >>>> >>> root@test:~# rbd map
> >>> >> >>>> >>> cold-storage/0e23c701-401d-4465-b9b4-c02939d57bb5@new_snap
> >>> >> >>>> >>> /dev/rbd1
> >>> >> >>>> >>> root@test:~# parted /dev/rbd1 print
> >>> >> >>>> >>> Warning: Unable to open /dev/rbd1 read-write (Read-only file system).
> >>> >> >>>> >>> /dev/rbd1 has been opened read-only.
> >>> >> >>>> >>> Warning: Unable to open /dev/rbd1 read-write (Read-only file system).
> >>> >> >>>> >>> /dev/rbd1 has been opened read-only.
> >>> >> >>>> >>> Error: /dev/rbd1: unrecognised disk label
> >>> >> >>>> >>>
> >>> >> >>>> >>> Even the md5s are different...
> >>> >> >>>> >>> root@ix-s2:~# md5sum /dev/rbd0
> >>> >> >>>> >>> 9a47797a07fee3a3d71316e22891d752  /dev/rbd0
> >>> >> >>>> >>> root@ix-s2:~# md5sum /dev/rbd1
> >>> >> >>>> >>> e450f50b9ffa0073fae940ee858a43ce  /dev/rbd1
> >>> >> >>>> >>>
> >>> >> >>>> >>> OK, now I protect the snap and create a clone... but same thing...
> >>> >> >>>> >>> the md5 for the clone is the same as for the snap...
> >>> >> >>>> >>>
> >>> >> >>>> >>> root@test:~# rbd unmap /dev/rbd1
> >>> >> >>>> >>> root@test:~# rbd snap protect
> >>> >> >>>> >>> cold-storage/0e23c701-401d-4465-b9b4-c02939d57bb5@new_snap
> >>> >> >>>> >>> root@test:~# rbd clone
> >>> >> >>>> >>> cold-storage/0e23c701-401d-4465-b9b4-c02939d57bb5@new_snap
> >>> >> >>>> >>> cold-storage/test-image
> >>> >> >>>> >>> root@test:~# rbd map cold-storage/test-image
> >>> >> >>>> >>> /dev/rbd1
> >>> >> >>>> >>> root@test:~# md5sum /dev/rbd1
> >>> >> >>>> >>> e450f50b9ffa0073fae940ee858a43ce  /dev/rbd1
> >>> >> >>>> >>>
> >>> >> >>>> >>> ... but it's broken...
> >>> >> >>>> >>> root@test:~# parted /dev/rbd1 print
> >>> >> >>>> >>> Error: /dev/rbd1: unrecognised disk label
> >>> >> >>>> >>>
> >>> >> >>>> >>> =========
> >>> >> >>>> >>>
> >>> >> >>>> >>> Tech details:
> >>> >> >>>> >>>
> >>> >> >>>> >>> root@test:~# ceph -v
> >>> >> >>>> >>> ceph version 0.94.2 (5fb85614ca8f354284c713a2f9c610860720bbf3)
> >>> >> >>>> >>>
> >>> >> >>>> >>> We have 2 inconsistent pgs, but none of the images are placed
> >>> >> >>>> >>> on these pgs...
> >>> >> >>>> >>>
> >>> >> >>>> >>> root@test:~# ceph health detail
> >>> >> >>>> >>> HEALTH_ERR 2 pgs inconsistent; 18 scrub errors
> >>> >> >>>> >>> pg 2.490 is active+clean+inconsistent, acting [56,15,29]
> >>> >> >>>> >>> pg 2.c4 is active+clean+inconsistent, acting [56,10,42]
> >>> >> >>>> >>> 18 scrub errors
> >>> >> >>>> >>>
> >>> >> >>>> >>> ============
> >>> >> >>>> >>>
> >>> >> >>>> >>> root@test:~# ceph osd map cold-storage
> >>> >> >>>> >>> 0e23c701-401d-4465-b9b4-c02939d57bb5
> >>> >> >>>> >>> osdmap e16770 pool 'cold-storage' (2) object
> >>> >> >>>> >>> '0e23c701-401d-4465-b9b4-c02939d57bb5' -> pg 2.74458f70 (2.770)
> >>> >> >>>> >>> -> up ([37,15,14], p37) acting ([37,15,14], p37)
> >>> >> >>>> >>> root@test:~# ceph osd map cold-storage
> >>> >> >>>> >>> 0e23c701-401d-4465-b9b4-c02939d57bb5@snap
> >>> >> >>>> >>> osdmap e16770 pool 'cold-storage' (2) object
> >>> >> >>>> >>> '0e23c701-401d-4465-b9b4-c02939d57bb5@snap' -> pg 2.793cd4a3 (2.4a3)
> >>> >> >>>> >>> -> up ([12,23,17], p12) acting ([12,23,17], p12)
> >>> >> >>>> >>> root@test:~# ceph osd map cold-storage
> >>> >> >>>> >>> 0e23c701-401d-4465-b9b4-c02939d57bb5@test-image
> >>> >> >>>> >>> osdmap e16770 pool 'cold-storage' (2) object
> >>> >> >>>> >>> '0e23c701-401d-4465-b9b4-c02939d57bb5@test-image' -> pg 2.9519c2a9 (2.2a9)
> >>> >> >>>> >>> -> up ([12,44,23], p12) acting ([12,44,23], p12)
> >>> >> >>>> >>>
> >>> >> >>>> >>> Also, we use a cache layer, which at the current moment is in
> >>> >> >>>> >>> forward mode...
> >>> >> >>>> >>>
> >>> >> >>>> >>> Can you please help me with this? My brain has stopped
> >>> >> >>>> >>> understanding what is going on...
> >>> >> >>>> >>>
> >>> >> >>>> >>> Thanks in advance!
> >>> >> >>>> >>
> >>> >> >>>> >> --
> >>> >> >>>> >> Andrija Panić
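One caveat on the ceph osd map checks above: they hash the literal image name as an object name, but an RBD image's data actually lives in objects named after its block_name_prefix, so to tell whether an image really overlaps the inconsistent pgs you would map those data objects instead. A sketch, assuming a format-2 image (the prefix value below is illustrative, not real output):

root@test:~# rbd info cold-storage/0e23c701-401d-4465-b9b4-c02939d57bb5 | grep block_name_prefix
        block_name_prefix: rbd_data.123456789abcdef
root@test:~# rados -p cold-storage ls | grep '^rbd_data.123456789abcdef' | \
    while read obj; do ceph osd map cold-storage "$obj"; done | grep -E '\(2\.(490|c4)\)'

Any output from the last command means some of the image's objects do land on pg 2.490 or 2.c4.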
_______________________________________________
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com