[ceph-users] Client io blocked when removing snapshot

2015-12-09 Thread Wukongming
Hi, All

I used an rbd command to create a 6TB image, and then created a snapshot of
this image. After that, I kept writing to it (e.g. modifying files), so the
snapshot's objects were cloned (copy-on-write) one by one.
At this time, I did the following 2 ops simultaneously.

1. Keep client I/O going to this image.
2. Execute an rbd snap rm command to delete the snapshot.

Finally, I found client I/O blocked for quite a long time. I tested on SATA
disks, and it felt as if Ceph prioritizes snapshot removal.
We also used iostat to watch the disks, and they were running at full
utilization.

So, should client I/O be given priority over snapshot removal?
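
One knob often used to throttle snapshot trimming is osd_snap_trim_sleep, which
inserts a pause between trim operations on each OSD. A rough sketch only,
assuming the option is available in your release; the value is illustrative,
not a recommendation:

    # persistent setting in ceph.conf on the OSD hosts
    [osd]
        osd snap trim sleep = 0.1

    # or injected at runtime (may not take effect until the OSDs restart)
    ceph tell osd.* injectargs '--osd-snap-trim-sleep 0.1'
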
-
wukongming ID: 12019
Tel:0571-86760239
Dept:2014 UIS2 ONEStor



Re: [ceph-users] rbd merge-diff error

2015-12-09 Thread Josh Durgin

Hmm, perhaps there's a secondary bug.

Can you send the output from strace, i.e. strace.log after running:

cat snap1.diff | strace -f -o strace.log rbd merge-diff - snap2.diff combined.diff


for a case where it fails?
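
If gdb ends up being needed for the stdin case as well, note that gdb accepts
shell-style redirection on its run command; a rough sketch, reusing the failing
command from below and the same rbd.cc breakpoints from the earlier mail:

    gdb --args rbd merge-diff - scrun1-120720151504.bck scrun1-part04.bck
    (gdb) break rbd.cc:1931
    ... (set the remaining rbd.cc breakpoints from the earlier mail the same way)
    (gdb) run < scrun1-120720151502.bck
    (gdb) info locals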

Josh

On 12/09/2015 08:38 PM, Alex Gorbachev wrote:

More oddity: retrying several times, the merge-diff sometimes works and
sometimes does not, using the same source files.

On Wed, Dec 9, 2015 at 10:15 PM, Alex Gorbachev <a...@iss-integration.com> wrote:

Hi Josh, looks like I celebrated too soon:

On Wed, Dec 9, 2015 at 2:25 PM, Josh Durgin <jdur...@redhat.com> wrote:

This is the problem:

http://tracker.ceph.com/issues/14030

As a workaround, you can pass the first diff in via stdin, e.g.:

cat snap1.diff | rbd merge-diff - snap2.diff combined.diff


one test worked - merging the initial full export (export-diff with
just one snapshot)

but the second one failed (merging two incremental diffs):

root@lab2-b1:/data/volume1# cat scrun1-120720151502.bck | rbd
merge-diff - scrun1-120720151504.bck scrun1-part04.bck
Merging image diff: 13% complete...failed.
rbd: merge-diff error

I am not sure how to run gdb in such scenario with stdin/stdout

Thanks,
Alex



Josh


On 12/08/2015 11:11 PM, Josh Durgin wrote:

On 12/08/2015 10:44 PM, Alex Gorbachev wrote:

Hi Josh,

On Mon, Dec 7, 2015 at 6:50 PM, Josh Durgin <jdur...@redhat.com> wrote:

 On 12/07/2015 03:29 PM, Alex Gorbachev wrote:

 When trying to merge two results of rbd
export-diff, the
 following error
 occurs:

 iss@lab2-b1:~$ rbd export-diff --from-snap
autosnap120720151500
 spin1/scrun1@autosnap120720151502
 /data/volume1/scrun1-120720151502.bck

 iss@lab2-b1:~$ rbd export-diff --from-snap
autosnap120720151504
 spin1/scrun1@autosnap120720151504
 /data/volume1/scrun1-120720151504.bck

 iss@lab2-b1:~$ rbd merge-diff
/data/volume1/scrun1-120720151502.bck
 /data/volume1/scrun1-120720151504.bck
 /data/volume1/mrg-scrun1-0204.bck
   Merging image diff: 11% complete...failed.
 rbd: merge-diff error

 That's all the output and I have found this link
http://tracker.ceph.com/issues/12911 but not sure if the
patch
 should
 have already been in hammer or how to get it?


 That patch fixed a bug that was only present after
hammer, due to
 parallelizing export-diff. You're likely seeing a
different (possibly
 new) issue.

 Unfortunately there's not much output we can enable for
export-diff in
 hammer. Could you try running the command via gdb
to figure out where
 and why it's failing? Make sure you have librbd-dbg
installed, then
 send the output from gdb doing:

 gdb --args rbd merge-diff
/data/volume1/scrun1-120720151502.bck \
 /data/volume1/scrun1-120720151504.bck
/data/volume1/mrg-scrun1-0204.bck
 break rbd.cc:1931
 break rbd.cc:1935
 break rbd.cc:1967
 break rbd.cc:1985
 break rbd.cc:1999
 break rbd.cc:2008
 break rbd.cc:2021
 break rbd.cc:2053
 break rbd.cc:2098
 run
 # (it will run now, stopping when it hits the error)
 info locals


Will do - how does one load librbd-dbg?  I have the
following on the
system:

librbd-dev - RADOS block device client library
(development files)
librbd1-dbg - debugging symbols for librbd1

is librbd1-dbg sufficient?


Yes, I just forgot the 1 in the package name.

Also a question - the merge-diff really stitches the two
diff files
together, not really merges, correct? For example, in
the following
workflow:

export-diff from full image - 10GB
export-diff from snap1 - 2 GB

Re: [ceph-users] building ceph rpms, "ceph --version" returns no version

2015-12-09 Thread Robert LeBlanc

You actually have to walk through part of the make process before you
can build the tarball so that the version is added to the source
files.

I believe the steps are:
./autogen.sh
./configure
make dist-[gzip|bzip2|lzip|xz]

Then you can copy the SPEC file (should already have the version in it
as well as the ceph_ver.h file, I think this is the money maker you
are looking for) and go about your normal compilation. This creates a
tarball that is used to put on the website I think.
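
A rough end-to-end sketch of that flow, assuming a standard ~/rpmbuild tree
(version and paths are illustrative):

    ./autogen.sh
    ./configure
    make dist-bzip2        # tarball with the version baked in (per the note above)
    cp ceph-0.94.5.tar.bz2 ~/rpmbuild/SOURCES/
    cp ceph.spec ~/rpmbuild/SPECS/
    rpmbuild -ba ~/rpmbuild/SPECS/ceph.spec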

Robert LeBlanc
PGP Fingerprint 79A2 9CA4 6CC4 45DD A904  C70E E654 3BB2 FA62 B9F1


On Wed, Dec 9, 2015 at 10:01 AM,   wrote:
> Hi All,
>
>
>
> Long story short:
>
>
>
> I have built ceph hammer RPMs, everything seems to work OK but running “ceph
> --version” does not report the version number. I don’t get a version number
> returned from “service ceph status”, either. I’m concerned that other
> components in our system may rely on ceph --version returning a value, so I
> would like to fix this. Can anyone help, please?
>
>
>
> The back story:
>
>
>
> We need ceph hammer binaries that contain the most recent contributions from
> sponce (https://github.com/ceph/ceph/pull/6322) as we are using
> libradosstriper1 with ceph. As these changes were merged into the hammer
> branch after the latest hammer release, we have had to build our own RPMs.
>
>
>
> To do this, I did a git clone of the hammer branch shortly after this pull
> request was merged. I then tarballed the directory with the “--exclude-vcs”
> option, modified the supplied .spec file by changing the value of
> “Version:” from “@VERSION@” to “0.94.5” and the value of “Release” from
> “@RPM_RELEASE@%{?dist}” to “1%{?dist}” and built the RPMs. I have since
> noticed the problem with ceph --version.
>
>
>
> The answer looks to be found somewhere in one of these files, but I’m coming
> at this as a system administrator, not a programmer, so it is all rather
> confusing:
>
> ceph-0.94.5/src/common/version.h
>
> ceph-0.94.5/src/common/version.cc
>
> ceph-0.94.5/CMakeLists.txt
>
> ceph-0.94.5/src/ceph_ver.c
>
> ceph-0.94.5/src/ceph_ver.h.in.cmake
>
>
>
> Many thanks in advance,
>
> Bruno
>
>
>
> Bruno Canning
>
> Scientific Computing Department
>
> STFC Rutherford Appleton Laboratory
>
> Harwell Oxford
>
> Didcot
>
> OX11 0QX
>
>
> ___
> ceph-users mailing list
> ceph-users@lists.ceph.com
> http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
>


Re: [ceph-users] problem after reinstalling system

2015-12-09 Thread Christian Balzer

Hello,

I seem to vaguely remember a Ceph leveldb package, which might help in
this case, or something from the CentOS equivalent to backports maybe.

Christian

On Wed, 9 Dec 2015 22:18:56 -0700 Robert LeBlanc wrote:

> 
> I had this problem because CentOS and Debian have different versions
> of leveldb (Debian's was newer) and the old version would not read the
> new version. I just had to blow away the OSDs and let them backfill.
> Going from CentOS to Debian didn't require it, but going back required
> the backfill.
> - 
> Robert LeBlanc
> PGP Fingerprint 79A2 9CA4 6CC4 45DD A904  C70E E654 3BB2 FA62 B9F1
> 
> 
> On Wed, Dec 9, 2015 at 5:25 AM, Jacek Jarosiewicz  wrote:
> > Hi,
> >
> > I have a working ceph cluster with storage nodes running Ubuntu 14.04
> > and ceph hammer 0.94.5.
> >
> > Now I want to switch to CentOS 7.1 (forget about the reasons for now,
> > I can explain, but it would be a long story and irrelevant to my
> > question).
> >
> > I've set the osd noout flag and norebalance,norecover for the time of
> > reinstall. The new system is installed with the same version of ceph.
> >
> > I've made a backup of /var/lib/ceph directory (after stopping ceph
> > services obviously) and kept the osd's intact.
> >
> > But after reinstall, when I try to start the daemons (the machine runs
> > one monitor and three osd's) I get these messages in the logs:
> >
> > monitor:
> > 2015-12-09 13:15:28.223872 7f4ccd41b880  0 ceph version 0.94.5
> > (9764da52395923e0b32908d83a9f7304401fee43), process ceph-mon, pid 5800
> > 2015-12-09 13:15:29.411448 7f4ccd41b880 -1 error opening mon data
> > directory at '/var/lib/ceph/mon/ceph-cf03': (22) Invalid argument
> >
> >
> > osds:
> >
> > 2015-12-09 13:11:50.480625 7fac03c7f880  0 ceph version 0.94.5
> > (9764da52395923e0b32908d83a9f7304401fee43), process ceph-osd, pid 3092
> > 2015-12-09 13:11:50.508803 7fac03c7f880  0
> > filestore(/var/lib/ceph/osd/ceph-5) backend xfs (magic 0x58465342)
> > 2015-12-09 13:11:50.640410 7fac03c7f880  0
> > genericfilestorebackend(/var/lib/ceph/osd/ceph-5) detect_features:
> > FIEMAP ioctl is supported and appears to work
> > 2015-12-09 13:11:50.640429 7fac03c7f880  0
> > genericfilestorebackend(/var/lib/ceph/osd/ceph-5) detect_features:
> > FIEMAP ioctl is disabled via 'filestore fiemap' config option
> > 2015-12-09 13:11:50.640890 7fac03c7f880  0
> > genericfilestorebackend(/var/lib/ceph/osd/ceph-5) detect_features:
> > syncfs(2) syscall fully supported (by glibc and kernel)
> > 2015-12-09 13:11:50.646915 7fac03c7f880  0
> > xfsfilestorebackend(/var/lib/ceph/osd/ceph-5) detect_feature: extsize
> > is supported and kernel 3.10.0-229.20.1.el7.x86_64 >= 3.5
> > 2015-12-09 13:11:51.171377 7fac03c7f880 -1
> > filestore(/var/lib/ceph/osd/ceph-5) Error initializing leveldb :
> > Corruption: 29 missing files;
> > e.g.: /var/lib/ceph/osd/ceph-5/current/omap/046388.sst
> >
> > 2015-12-09 13:11:51.171399 7fac03c7f880 -1 osd.5 0 OSD:init: unable to
> > mount object store
> > 2015-12-09 13:11:51.171404 7fac03c7f880 -1  ** ERROR: osd init failed:
> > (1) Operation not permitted
> >
> >
> > can anyone help? I don't see any sst files on any of my other
> > (working) ceph nodes, the directories are fine with correct
> > permissions..
> >
> > I can readd this machine from scratch without data loss, but the
> > rebalancing/recovering will last a week (been there, done that), so I
> > was hoping I could start the osds with only some data out of date.
> >
> > Is it possible? What can I do?
> >
> > Cheers,
> > J
> >
> > --
> > Jacek Jarosiewicz
> > Administrator Systemów Informatycznych
> >
> > 
> > SUPERMEDIA Sp. z o.o. z siedzibą w Warszawie
> > ul. Senatorska 13/15, 00-075 Warszawa
> > Sąd Rejonowy dla m.st.Warszawy, XII Wydział Gospodarczy Krajowego
> > Rejestru Sądowego,
> > nr KRS 029537; kapitał zakładowy 42.756.000 zł
> > NIP: 957-05-49-503
> > Adres korespondencyjny: ul. Jubilerska 10, 04-190 Warszawa
> >
> > 
> > SUPERMEDIA ->   http://www.supermedia.pl
> > dostep do internetu - hosting - kolokacja - lacza - telefonia
> > ___
> > ceph-users mailing list
> > ceph-users@lists.ceph.com
> > http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
> 

Re: [ceph-users] Blocked requests after "osd in"

2015-12-09 Thread Robert LeBlanc

I noticed this a while back and did some tracing. As soon as the PGs
are read in by the OSD (very limited amount of housekeeping done), the
OSD is set to the "in" state so that peering with other OSDs can
happen and the recovery process can begin. The problem is that when
the OSD is "in", the clients also see that and start sending requests
to the OSDs before it has had a chance to actually get its bearings
and is able to even service the requests. After discussion with some
of the developers, there is no easy way around this other than let the
PGs recover to other OSDs and then bring in the OSDs after recovery (a
ton of data movement).

I've suggested some options on how to work around this issue, but they
all require a large amount of rework. Since I'm very interested in
reducing this problem, I'm willing to try and submit a fix after I'm
done with the new OP queue I'm working on. I don't know the best
course of action at the moment, but I hope I can get some input for
when I do try and tackle the problem next year.

1. Add a new state that allows OSDs to peer without client requests
coming in. (up -> in -> active) I'm not sure if other OSDs are seen as
clients, I don't think so. I'm not sure if there would have to be some
trickery to make the booting OSDs not be primary until all the PGs are
read and ready for I/O (not necessarily recovered yet).
2. When a request comes in for a PG that is not ready, send the client
a redirect message to use the primary in a previous map. I have a
feeling this could be very messy and not very safe.
3. Proxy the OP on behalf of the client until the PGs are ready. The
"other" OSD would have to understand that it is OK to do that
write/read OP even though it is not the primary, this can be difficult
to do safely.

Right now I'm leaning to option #1. When the new OSD boots, keep the
previous primary running and the PG is in degraded mode until the new
OSD has done all of its housekeeping and can service the I/O
effectively, then make a change to the CRUSH map to swap the primaries
where needed. Any input and ideas from the devs would be helpful.

Robert LeBlanc
PGP Fingerprint 79A2 9CA4 6CC4 45DD A904  C70E E654 3BB2 FA62 B9F1


On Wed, Dec 9, 2015 at 7:33 AM, Christian Kauhaus  wrote:
> Am 09.12.2015 um 11:21 schrieb Jan Schermer:
>> Are you seeing "peering" PGs when the blocked requests are happening? That's 
>> what we see regularly when starting OSDs.
>
> Mostly "peering" and "activating".
>
>> I'm not sure this can be solved completely (and whether there are major 
>> improvements in newer Ceph versions), but it can be sped up by
>> 1) making sure you have free (and not dirtied or fragmented) memory on the 
>> node where you are starting the OSD
>>   - that means dropping caches before starting the OSD if you have lots 
>> of "free" RAM that is used for VFS cache
>> 2) starting the OSDs one by one instead of booting several of them
>> 3) if you pin the OSDs to CPUs/cores, do that after the OSD is in - I found 
>> it to be best to pin the OSD to a cgroup limited to one NUMA node and then 
>> limit it to a subset of cores after it has run a bit. OSD tends to use 
>> hundreds of % of CPU when booting
>> 4) you could possibly prewarm cache for the OSD in /var/lib/ceph/osd...
>
> Thank you for your advice. The use case is not so much after rebooting a
> server, but more when we take OSDs in/out for maintenance. During boot, we
> already start them one after another with 10s pause between each pair.
>
> I've done a bit of tracing. I've kept a small cluster running with 2 "in" OSDs
> out of 3 and put the third one "in" at 15:06:22. From ceph.log:
>
> | 2015-12-09 15:06:22.827030 mon.0 172.20.4.6:6789/0 54964 : cluster [INF]
> osdmap e264345: 3 osds: 3 up, 3 in
> | 2015-12-09 15:06:22.828693 mon.0 172.20.4.6:6789/0 54965 : cluster [INF]
> pgmap v39871295: 1800 pgs: 1800 active+clean; 439 GB data, 906 GB used, 4515
> GB / 5421 GB avail; 6406 B/s rd, 889 kB/s wr, 67 op/s
> | [...]
> | 2015-12-09 15:06:29.163793 mon.0 172.20.4.6:6789/0 54972 : cluster [INF]
> pgmap 

Re: [ceph-users] problem after reinstalling system

2015-12-09 Thread Robert LeBlanc

I had this problem because CentOS and Debian have different versions
of leveldb (Debian's was newer) and the old version would not read the
new version. I just had to blow away the OSDs and let them backfill.
Going from CentOS to Debian didn't require it, but going back required
the backfill.
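
For reference, "blowing away" an OSD here means the usual remove / re-create
cycle; a rough sketch only (OSD id, host and device are illustrative):

    ceph osd out 5
    ceph osd crush remove osd.5
    ceph auth del osd.5
    ceph osd rm 5
    # wipe and re-create it, then let it backfill
    ceph-disk zap /dev/sdX
    ceph-deploy osd create <host>:/dev/sdX
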
- 
Robert LeBlanc
PGP Fingerprint 79A2 9CA4 6CC4 45DD A904  C70E E654 3BB2 FA62 B9F1


On Wed, Dec 9, 2015 at 5:25 AM, Jacek Jarosiewicz  wrote:
> Hi,
>
> I have a working ceph cluster with storage nodes running Ubuntu 14.04 and
> ceph hammer 0.94.5.
>
> Now I want to switch to CentOS 7.1 (forget about the reasons for now, I can
> explain, but it would be a long story and irrelevant to my question).
>
> I've set the osd noout flag and norebalance,norecover for the time of
> reinstall. The new system is installed with the same version of ceph.
>
> I've made a backup of /var/lib/ceph directory (after stopping ceph services
> obviously) and kept the osd's intact.
>
> But after reinstall, when I try to start the daemons (the machine runs one
> monitor and three osd's) I get these messages in the logs:
>
> monitor:
> 2015-12-09 13:15:28.223872 7f4ccd41b880  0 ceph version 0.94.5
> (9764da52395923e0b32908d83a9f7304401fee43), process ceph-mon, pid 5800
> 2015-12-09 13:15:29.411448 7f4ccd41b880 -1 error opening mon data directory
> at '/var/lib/ceph/mon/ceph-cf03': (22) Invalid argument
>
>
> osds:
>
> 2015-12-09 13:11:50.480625 7fac03c7f880  0 ceph version 0.94.5
> (9764da52395923e0b32908d83a9f7304401fee43), process ceph-osd, pid 3092
> 2015-12-09 13:11:50.508803 7fac03c7f880  0
> filestore(/var/lib/ceph/osd/ceph-5) backend xfs (magic 0x58465342)
> 2015-12-09 13:11:50.640410 7fac03c7f880  0
> genericfilestorebackend(/var/lib/ceph/osd/ceph-5) detect_features: FIEMAP
> ioctl is supported and appears to work
> 2015-12-09 13:11:50.640429 7fac03c7f880  0
> genericfilestorebackend(/var/lib/ceph/osd/ceph-5) detect_features: FIEMAP
> ioctl is disabled via 'filestore fiemap' config option
> 2015-12-09 13:11:50.640890 7fac03c7f880  0
> genericfilestorebackend(/var/lib/ceph/osd/ceph-5) detect_features: syncfs(2)
> syscall fully supported (by glibc and kernel)
> 2015-12-09 13:11:50.646915 7fac03c7f880  0
> xfsfilestorebackend(/var/lib/ceph/osd/ceph-5) detect_feature: extsize is
> supported and kernel 3.10.0-229.20.1.el7.x86_64 >= 3.5
> 2015-12-09 13:11:51.171377 7fac03c7f880 -1
> filestore(/var/lib/ceph/osd/ceph-5) Error initializing leveldb : Corruption:
> 29 missing files; e.g.: /var/lib/ceph/osd/ceph-5/current/omap/046388.sst
>
> 2015-12-09 13:11:51.171399 7fac03c7f880 -1 osd.5 0 OSD:init: unable to mount
> object store
> 2015-12-09 13:11:51.171404 7fac03c7f880 -1  ** ERROR: osd init failed: (1)
> Operation not permitted
>
>
> can anyone help? I don't see any sst files on any of my other (working) ceph
> nodes, the directories are fine with correct permissions..
>
> I can readd this machine from scratch without data loss, but the
> rebalancing/recovering will last a week (been there, done that), so I was
> hoping I could start the osds with only some data out of date.
>
> Is it possible? What can I do?
>
> Cheers,
> J
>
> --
> Jacek Jarosiewicz
> Administrator Systemów Informatycznych
>
> 
> SUPERMEDIA Sp. z o.o. z siedzibą w Warszawie
> ul. Senatorska 13/15, 00-075 Warszawa
> Sąd Rejonowy dla m.st.Warszawy, XII Wydział Gospodarczy Krajowego Rejestru
> Sądowego,
> nr KRS 029537; kapitał zakładowy 42.756.000 zł
> NIP: 957-05-49-503
> Adres korespondencyjny: ul. Jubilerska 10, 04-190 Warszawa
>
> 
> SUPERMEDIA ->   http://www.supermedia.pl
> dostep do internetu - hosting - kolokacja - lacza - telefonia
> ___
> ceph-users mailing list
> ceph-users@lists.ceph.com
> http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] rbd merge-diff error

2015-12-09 Thread Alex Gorbachev
More oddity: retrying several times, the merge-diff sometimes works and
sometimes does not, using the same source files.

On Wed, Dec 9, 2015 at 10:15 PM, Alex Gorbachev 
wrote:

> Hi Josh, looks like I celebrated too soon:
>
> On Wed, Dec 9, 2015 at 2:25 PM, Josh Durgin  wrote:
>
>> This is the problem:
>>
>> http://tracker.ceph.com/issues/14030
>>
>> As a workaround, you can pass the first diff in via stdin, e.g.:
>>
>> cat snap1.diff | rbd merge-diff - snap2.diff combined.diff
>
>
> one test worked - merging the initial full export (export-diff with just
> one snapshot)
>
> but the second one failed (merging two incremental diffs):
>
> root@lab2-b1:/data/volume1# cat scrun1-120720151502.bck | rbd merge-diff
> - scrun1-120720151504.bck scrun1-part04.bck
> Merging image diff: 13% complete...failed.
> rbd: merge-diff error
>
> I am not sure how to run gdb in such scenario with stdin/stdout
>
> Thanks,
> Alex
>
>
>
>>
>>
>> Josh
>>
>>
>> On 12/08/2015 11:11 PM, Josh Durgin wrote:
>>
>>> On 12/08/2015 10:44 PM, Alex Gorbachev wrote:
>>>
 Hi Josh,

 On Mon, Dec 7, 2015 at 6:50 PM, Josh Durgin wrote:

 On 12/07/2015 03:29 PM, Alex Gorbachev wrote:

 When trying to merge two results of rbd export-diff, the
 following error
 occurs:

 iss@lab2-b1:~$ rbd export-diff --from-snap autosnap120720151500
 spin1/scrun1@autosnap120720151502
 /data/volume1/scrun1-120720151502.bck

 iss@lab2-b1:~$ rbd export-diff --from-snap autosnap120720151504
 spin1/scrun1@autosnap120720151504
 /data/volume1/scrun1-120720151504.bck

 iss@lab2-b1:~$ rbd merge-diff
 /data/volume1/scrun1-120720151502.bck
 /data/volume1/scrun1-120720151504.bck
 /data/volume1/mrg-scrun1-0204.bck
   Merging image diff: 11% complete...failed.
 rbd: merge-diff error

 That's all the output and I have found this link
 http://tracker.ceph.com/issues/12911 but not sure if the patch
 should
 have already been in hammer or how to get it?


 That patch fixed a bug that was only present after hammer, due to
 parallelizing export-diff. You're likely seeing a different
 (possibly
 new) issue.

 Unfortunately there's not much output we can enable for
 export-diff in
 hammer. Could you try running the command via gdb to figure out
 where
 and why it's failing? Make sure you have librbd-dbg installed, then
 send the output from gdb doing:

 gdb --args rbd merge-diff /data/volume1/scrun1-120720151502.bck \
 /data/volume1/scrun1-120720151504.bck
 /data/volume1/mrg-scrun1-0204.bck
 break rbd.cc:1931
 break rbd.cc:1935
 break rbd.cc:1967
 break rbd.cc:1985
 break rbd.cc:1999
 break rbd.cc:2008
 break rbd.cc:2021
 break rbd.cc:2053
 break rbd.cc:2098
 run
 # (it will run now, stopping when it hits the error)
 info locals


 Will do - how does one load librbd-dbg?  I have the following on the
 system:

 librbd-dev - RADOS block device client library (development files)
 librbd1-dbg - debugging symbols for librbd1

 is librbd1-dbg sufficient?

>>>
>>> Yes, I just forgot the 1 in the package name.
>>>
>>> Also a question - the merge-diff really stitches the two diff files
 together, not really merges, correct? For example, in the following
 workflow:

 export-diff from full image - 10GB
 export-diff from snap1 - 2 GB
 export-diff from snap2 - 1 GB

 My resulting merge export file would be 13GB, correct?

>>>
>>> It does merge overlapping sections, i.e. part of snap1 that was
>>> overwritten in snap2, so the merged diff may be smaller than the
>>> original two.
>>>
>>> Josh
>>>
>>
>>
>


Re: [ceph-users] http://gitbuilder.ceph.com/

2015-12-09 Thread Xav Paice
To get us around the immediate problem, I copied the deb I needed from a
cache to a private repo - I'm sorry that's not going to help you at all,
but if you need a copy, let me know.

The upstream documentation shows that mod_fastcgi is only needed for older Apache,
and 2.4 onwards can use mod_proxy_fcgi, which doesn't rely on that
repo at all.  Seems like a smaller change than switching to civetweb, but I
totally agree that civetweb is simpler.

Just a note - http://docs.ceph.com/docs/master/install/install-ceph-gateway/
doesn't even mention civetweb.  If that's the preferred approach, it would
be nice to document it as such.
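
For what it's worth, switching radosgw to civetweb is mostly a one-line
ceph.conf change; a minimal sketch (the section name and port are illustrative
and should match your gateway instance):

    [client.radosgw.gateway]
        rgw frontends = "civetweb port=7480"

then restart the radosgw service and drop the Apache/mod_fastcgi pieces.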

On 10 December 2015 at 12:58, Andrew Woodward  wrote:

> This has also exploded puppet-ceph CI. Do we have a workaround? Moving
> to Civetweb is in progress but I would prefer to not disable all of
> the RGW integration until it can be merged.
>
> [1]
> http://logs.openstack.org/21/255421/1/check/gate-puppet-ceph-puppet-beaker-rspec-dsvm-trusty/e75bc1b/console.html#_2015-12-09_19_28_43_778
>
>
> On Tue, Dec 8, 2015 at 7:46 AM Ken Dreyer  wrote:
> >
> > Yes, we've had to move all of our hardware out of the datacenter in
> > Irvine, California to a new home in Raleigh, North Carolina. The
> > backend server for gitbuilder.ceph.com had a *lot* of data and we were
> > not able to sync all of it to an interim server in Raleigh before we
> > had to unplug the old one.
> >
> > Since you brought up fastcgi, it's a good idea to transition your
> > cluster from Apache+mod_fastcgi and start using RGW's Civetweb server
> > instead. Civetweb is much simpler, and future RGW optimizations are
> > all going into the Civetweb stack.
> >
> > - Ken
> >
> > On Tue, Dec 8, 2015 at 2:54 AM, Xav Paice  wrote:
> > > Hi,
> > >
> > > Just wondering if there's a known issue with
> http://gitbuilder.ceph.com/ -
> > > if I go to several urls, e.g.
> > >
> http://gitbuilder.ceph.com/libapache-mod-fastcgi-deb-trusty-x86_64-basic,
> I
> > > get a 403.  That's still the right place to get deb's, right?
> > >
> > > ___
> > > ceph-users mailing list
> > > ceph-users@lists.ceph.com
> > > http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
> > >
> > ___
> > ceph-users mailing list
> > ceph-users@lists.ceph.com
> > http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
>


Re: [ceph-users] High disk utilisation

2015-12-09 Thread Christian Balzer

Hello,

On Wed, 9 Dec 2015 15:57:36 + MATHIAS, Bryn (Bryn) wrote:

> to update this, the error looks like it comes from updatedb scanning the
> ceph disks.
> 
> When we make sure it doesn’t, by putting the ceph mount points in the
> exclusion file, the problem goes away.
> 
Ah, I didn't even think about this, as I have been disabling updatedb or
excluding data trees for years now. 
It's probably something that would be a good addition to the documentation.
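
Concretely, the exclusion goes in /etc/updatedb.conf; a sketch, assuming
mlocate's standard config and the default OSD path (adjust PRUNEPATHS to your
own layout rather than copying it verbatim):

    # /etc/updatedb.conf
    PRUNEPATHS="/tmp /var/spool /media /var/lib/ceph"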

Also with atop you would have immediately seen who the culprit was.

Regards,

Christian
> Thanks for the help and time.
> On 30 Nov 2015, at 09:53, MATHIAS, Bryn (Bryn) <bryn.math...@alcatel-lucent.com> wrote:
> 
> 
> On 30 Nov 2015, at 14:37, MATHIAS, Bryn (Bryn) <bryn.math...@alcatel-lucent.com> wrote:
> 
> Hi,
> On 30 Nov 2015, at 13:44, Christian Balzer <ch...@gol.com> wrote:
> 
> 
> Hello,
> 
> On Mon, 30 Nov 2015 07:55:24 + MATHIAS, Bryn (Bryn) wrote:
> 
> Hi Christian,
> 
> I’ll give you a much better dump of detail :)
> 
> Running RHEL 7.1,
> ceph version 0.94.5
> 
> all ceph disks are xfs, with journals on a partition on the disk
> Disks: 6Tb spinners.
> 
> OK, I was guessing that journal on disk, but good to know.
> Which exact model?
> Some of them are rather unsuited for Ceph usage (SMR).
> I don’t know the exact model of the disks but they are not SMR disks.
> 
> Erasure coded pool with 4+1 EC ISA-L also.
> 
> OK, this is where I plead ignorance, no EC experience at all.
> But it would be strange for this to be hitting a single disk at a time.
> It is hitting a single disk in each node, however I’d have thought that
> I’d see repetition over the disks if it were doing this on a per
> placement group basis.
> 
> No scrubbing reported in the ceph log, the cluster isn’t old enough yet
> to be doing any deep scrubbing. Also the cpu usage of the osd daemon
> that controls the disk isn’t spiking which I have seen previously when
> scrubbing or deep scrubbing is taking place.
> 
> Alright, can you confirm (with atop or the likes) that the busy disk is
> actually being written/read to by the OSD process in question and if
> there is a corresponding network traffic for the amount of I/O?
> I checked for network traffic, there didn’t look to be any.
> Looks like the problem is transient and has disappeared for the moment.
> I will post more when I see the problem again.
> 
> Bryn
> 
> Christian
> 
> 
> All disks are at 2% utilisation as given by df.
> 
> For explicitness:
> [root@au-sydney ~]# ceph -s
>   cluster ff900f17-7eec-4fe1-8f31-657d44b86a22
>health HEALTH_OK
>monmap e5: 5 mons at
> {au-adelaide=10.50.21.24:6789/0,au-brisbane=10.50.21.22:6789/0,au-canberra=10.50.21.23:6789/0,au-melbourne=10.50.21.21:6789/0,au-sydney=10.50.21.20:6789/0}
> election epoch 274, quorum 0,1,2,3,4
> au-sydney,au-melbourne,au-brisbane,au-canberra,au-adelaide osdmap e8549:
> 120 osds: 120 up, 120 in pgmap v408422: 8192 pgs, 2 pools, 7794 GB data,
> 5647 kobjects 9891 GB used, 644 TB / 654 TB avail 8192 active+clean
> client io 68363 kB/s wr, 1249 op/s
> 
> 
> Cheers,
> Bryn
> 
> 
> On 30 Nov 2015, at 12:57, Christian Balzer <ch...@gol.com> wrote:
> 
> 
> Hello,
> 
> On Mon, 30 Nov 2015 07:15:35 + MATHIAS, Bryn (Bryn) wrote:
> 
> Hi All,
> 
> I am seeing an issue with ceph performance.
> Starting from an empty cluster of 5 nodes, ~600Tb of storage.
> 
> It would be helpful to have more details (all details in fact) than this.
> Complete HW, OS, FS used, Ceph versions and configuration details
> (journals on HDD, replication levels etc).
> 
> While this might not seem significant to your current question, it might
> prove valuable as to why you're seeing performance problems and how to
> address them.
> 
> monitoring disk usage in nmon I see rolling 100% usage of a disk.
> Ceph -w doesn’t report any spikes in throughput and the application
> putting data is not spiking in the load generated.
> 
> 
> The ceph.log should give a more detailed account, but assuming your
> client side is indeed steady state, this could be very well explained by
> scrubbing, especially deep-scrubbing.
> That should also be visible in the ceph.log.
> 
> Christian
> 
> [nmon disk-busy snippet trimmed: sdg2 at ~0%, sdh/sdh1 at ~2% busy (RW), sdj/sdj1 at ~3% busy (RW)]

[ceph-users] Monitor rename / recreate issue -- probing state

2015-12-09 Thread deeepdish
Hello,

I encountered a strange issue when rebuilding monitors that reuse the same
hostnames but have different IPs.

Steps to reproduce:

- Build monitor using ceph-deploy mon create 
- Remove monitor via 
http://docs.ceph.com/docs/master/rados/operations/add-or-rm-mons/ (remove 
monitor) — I didn’t realize there was a ceph-deploy mon destroy command at this 
point.
- Build a new monitor on the same hardware using ceph-deploy mon create 
  # reason = to rename / change IP of monitor as per above link
- Monitor ends up in probing mode. When connecting via the admin socket, I
see that there are no peers available.

The above behavior occurs only when reinstalling monitors. I even tried
reinstalling the OS, but there's a monmap embedded somewhere that causes the
previous monitor hostnames / IPs to conflict with the new monitor's ability
to peer.

On a working monitor:

# sudo ceph --cluster=ceph --admin-daemon /var/run/ceph/ceph-mon.b02s08.asok 
mon_status
{
"name": "b02s08",
"rank": 0,
"state": "leader",
"election_epoch": 2618,
"quorum": [
0,
1,
2
],
"outside_quorum": [],
"extra_probe_peers": [
"10.20.10.14:6789\/0",
"10.20.10.16:6789\/0"
],
"sync_provider": [],
"monmap": {
"epoch": 12,
"fsid": "693834c1-1f95-4237-ab97-a767b0c0e6e7",
"modified": "2015-12-09 06:23:43.665100",
"created": "0.00",
"mons": [
{
"rank": 0,
"name": "b02s08",
"addr": "10.20.1.8:6789\/0"
},
{
"rank": 1,
"name": "smon01",
"addr": "10.20.10.251:6789\/0"
},
{
"rank": 2,
"name": "smon02",
"addr": "10.20.10.252:6789\/0"
}
]
}
}

[root@b02s08 ~]# 

On a reinstalled (not working) monitor:

 sudo ceph --cluster=ceph --admin-daemon /var/run/ceph/ceph-mon.smg01.asok 
mon_status
{
"name": "smg01",
"rank": 0,
"state": "probing",
"election_epoch": 0,
"quorum": [],
"outside_quorum": [
"smg01"
],
"extra_probe_peers": [
"10.20.1.8:6789\/0",
"10.20.10.14:6789\/0",
"10.20.10.16:6789\/0",
"10.20.10.18:6789\/0",
"10.20.10.251:6789\/0",
"10.20.10.252:6789\/0"
],
"sync_provider": [],
"monmap": {
"epoch": 0,
"fsid": "693834c1-1f95-4237-ab97-a767b0c0e6e7",
"modified": "0.00",
"created": "0.00",
"mons": [
{
"rank": 0,
"name": "smg01",
"addr": "10.20.10.250:6789\/0"
},
{
"rank": 1,
"name": "b02vm14s",
"addr": "0.0.0.0:0\/1"
},
{
"rank": 2,
"name": "b02vm16s",
"addr": "0.0.0.0:0\/2"
},
{
"rank": 3,
"name": "b02s18s",
"addr": "0.0.0.0:0\/3"
},
{
"rank": 4,
"name": "smon01s",
"addr": "0.0.0.0:0\/4"
},
{
"rank": 5,
"name": "smon02s",
"addr": "0.0.0.0:0\/5"
},
{
"rank": 6,
"name": "b02s08",
"addr": "0.0.0.0:0\/6"
}
]
}
}


How can I correct this?

Thanks.
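
One approach that has worked for this kind of stale embedded monmap is to pull
the current map from the healthy quorum, add the rebuilt monitor to it, and
inject it before starting the daemon. A rough sketch only (names and IPs taken
from the output above; see the add-or-rm-mons doc for the keyring steps):

    # on a monitor that is in quorum
    ceph mon getmap -o /tmp/monmap

    # add the rebuilt monitor to that map
    monmaptool --add smg01 10.20.10.250:6789 /tmp/monmap

    # on the rebuilt monitor, with its ceph-mon stopped
    ceph-mon -i smg01 --inject-monmap /tmp/monmap

then start ceph-mon on smg01 again.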



Re: [ceph-users] rbd merge-diff error

2015-12-09 Thread Alex Gorbachev
Hi Josh, looks like I celebrated too soon:

On Wed, Dec 9, 2015 at 2:25 PM, Josh Durgin  wrote:

> This is the problem:
>
> http://tracker.ceph.com/issues/14030
>
> As a workaround, you can pass the first diff in via stdin, e.g.:
>
> cat snap1.diff | rbd merge-diff - snap2.diff combined.diff


one test worked - merging the initial full export (export-diff with just
one snapshot)

but the second one failed (merging two incremental diffs):

root@lab2-b1:/data/volume1# cat scrun1-120720151502.bck | rbd merge-diff -
scrun1-120720151504.bck scrun1-part04.bck
Merging image diff: 13% complete...failed.
rbd: merge-diff error

I am not sure how to run gdb in such scenario with stdin/stdout

Thanks,
Alex



>
>
> Josh
>
>
> On 12/08/2015 11:11 PM, Josh Durgin wrote:
>
>> On 12/08/2015 10:44 PM, Alex Gorbachev wrote:
>>
>>> Hi Josh,
>>>
>>> On Mon, Dec 7, 2015 at 6:50 PM, Josh Durgin wrote:
>>>
>>> On 12/07/2015 03:29 PM, Alex Gorbachev wrote:
>>>
>>> When trying to merge two results of rbd export-diff, the
>>> following error
>>> occurs:
>>>
>>> iss@lab2-b1:~$ rbd export-diff --from-snap autosnap120720151500
>>> spin1/scrun1@autosnap120720151502
>>> /data/volume1/scrun1-120720151502.bck
>>>
>>> iss@lab2-b1:~$ rbd export-diff --from-snap autosnap120720151504
>>> spin1/scrun1@autosnap120720151504
>>> /data/volume1/scrun1-120720151504.bck
>>>
>>> iss@lab2-b1:~$ rbd merge-diff
>>> /data/volume1/scrun1-120720151502.bck
>>> /data/volume1/scrun1-120720151504.bck
>>> /data/volume1/mrg-scrun1-0204.bck
>>>   Merging image diff: 11% complete...failed.
>>> rbd: merge-diff error
>>>
>>> That's all the output and I have found this link
>>> http://tracker.ceph.com/issues/12911 but not sure if the patch
>>> should
>>> have already been in hammer or how to get it?
>>>
>>>
>>> That patch fixed a bug that was only present after hammer, due to
>>> parallelizing export-diff. You're likely seeing a different (possibly
>>> new) issue.
>>>
>>> Unfortunately there's not much output we can enable for
>>> export-diff in
>>> hammer. Could you try running the command via gdb to figure out where
>>> and why it's failing? Make sure you have librbd-dbg installed, then
>>> send the output from gdb doing:
>>>
>>> gdb --args rbd merge-diff /data/volume1/scrun1-120720151502.bck \
>>> /data/volume1/scrun1-120720151504.bck
>>> /data/volume1/mrg-scrun1-0204.bck
>>> break rbd.cc:1931
>>> break rbd.cc:1935
>>> break rbd.cc:1967
>>> break rbd.cc:1985
>>> break rbd.cc:1999
>>> break rbd.cc:2008
>>> break rbd.cc:2021
>>> break rbd.cc:2053
>>> break rbd.cc:2098
>>> run
>>> # (it will run now, stopping when it hits the error)
>>> info locals
>>>
>>>
>>> Will do - how does one load librbd-dbg?  I have the following on the
>>> system:
>>>
>>> librbd-dev - RADOS block device client library (development files)
>>> librbd1-dbg - debugging symbols for librbd1
>>>
>>> is librbd1-dbg sufficient?
>>>
>>
>> Yes, I just forgot the 1 in the package name.
>>
>> Also a question - the merge-diff really stitches the two diff files
>>> together, not really merges, correct? For example, in the following
>>> workflow:
>>>
>>> export-diff from full image - 10GB
>>> export-diff from snap1 - 2 GB
>>> export-diff from snap2 - 1 GB
>>>
>>> My resulting merge export file would be 13GB, correct?
>>>
>>
>> It does merge overlapping sections, i.e. part of snap1 that was
>> overwritten in snap2, so the merged diff may be smaller than the
>> original two.
>>
>> Josh
>>
>
>


Re: [ceph-users] rbd merge-diff error

2015-12-09 Thread Alex Gorbachev
Great, thanks Josh!  Using stdin/stdout merge-diff is working.  Thank you
for looking into this.

--
Alex Gorbachev
Storcium

On Wed, Dec 9, 2015 at 2:25 PM, Josh Durgin  wrote:

> This is the problem:
>
> http://tracker.ceph.com/issues/14030
>
> As a workaround, you can pass the first diff in via stdin, e.g.:
>
> cat snap1.diff | rbd merge-diff - snap2.diff combined.diff
>
> Josh
>
>
> On 12/08/2015 11:11 PM, Josh Durgin wrote:
>
>> On 12/08/2015 10:44 PM, Alex Gorbachev wrote:
>>
>>> Hi Josh,
>>>
>>> On Mon, Dec 7, 2015 at 6:50 PM, Josh Durgin wrote:
>>>
>>> On 12/07/2015 03:29 PM, Alex Gorbachev wrote:
>>>
>>> When trying to merge two results of rbd export-diff, the
>>> following error
>>> occurs:
>>>
>>> iss@lab2-b1:~$ rbd export-diff --from-snap autosnap120720151500
>>> spin1/scrun1@autosnap120720151502
>>> /data/volume1/scrun1-120720151502.bck
>>>
>>> iss@lab2-b1:~$ rbd export-diff --from-snap autosnap120720151504
>>> spin1/scrun1@autosnap120720151504
>>> /data/volume1/scrun1-120720151504.bck
>>>
>>> iss@lab2-b1:~$ rbd merge-diff
>>> /data/volume1/scrun1-120720151502.bck
>>> /data/volume1/scrun1-120720151504.bck
>>> /data/volume1/mrg-scrun1-0204.bck
>>>   Merging image diff: 11% complete...failed.
>>> rbd: merge-diff error
>>>
>>> That's all the output and I have found this link
>>> http://tracker.ceph.com/issues/12911 but not sure if the patch
>>> should
>>> have already been in hammer or how to get it?
>>>
>>>
>>> That patch fixed a bug that was only present after hammer, due to
>>> parallelizing export-diff. You're likely seeing a different (possibly
>>> new) issue.
>>>
>>> Unfortunately there's not much output we can enable for
>>> export-diff in
>>> hammer. Could you try running the command via gdb to figure out where
>>> and why it's failing? Make sure you have librbd-dbg installed, then
>>> send the output from gdb doing:
>>>
>>> gdb --args rbd merge-diff /data/volume1/scrun1-120720151502.bck \
>>> /data/volume1/scrun1-120720151504.bck
>>> /data/volume1/mrg-scrun1-0204.bck
>>> break rbd.cc:1931
>>> break rbd.cc:1935
>>> break rbd.cc:1967
>>> break rbd.cc:1985
>>> break rbd.cc:1999
>>> break rbd.cc:2008
>>> break rbd.cc:2021
>>> break rbd.cc:2053
>>> break rbd.cc:2098
>>> run
>>> # (it will run now, stopping when it hits the error)
>>> info locals
>>>
>>>
>>> Will do - how does one load librbd-dbg?  I have the following on the
>>> system:
>>>
>>> librbd-dev - RADOS block device client library (development files)
>>> librbd1-dbg - debugging symbols for librbd1
>>>
>>> is librbd1-dbg sufficient?
>>>
>>
>> Yes, I just forgot the 1 in the package name.
>>
>> Also a question - the merge-diff really stitches the two diff files
>>> together, not really merges, correct? For example, in the following
>>> workflow:
>>>
>>> export-diff from full image - 10GB
>>> export-diff from snap1 - 2 GB
>>> export-diff from snap2 - 1 GB
>>>
>>> My resulting merge export file would be 13GB, correct?
>>>
>>
>> It does merge overlapping sections, i.e. part of snap1 that was
>> overwritten in snap2, so the merged diff may be smaller than the
>> original two.
>>
>> Josh
>>
>
>


Re: [ceph-users] http://gitbuilder.ceph.com/

2015-12-09 Thread Andrew Woodward
This has also exploded puppet-ceph CI. Do we have a workaround? Moving
to Civetweb is in progress but I would prefer to not disable all of
the RGW integration until it can be merged.

[1] 
http://logs.openstack.org/21/255421/1/check/gate-puppet-ceph-puppet-beaker-rspec-dsvm-trusty/e75bc1b/console.html#_2015-12-09_19_28_43_778


On Tue, Dec 8, 2015 at 7:46 AM Ken Dreyer  wrote:
>
> Yes, we've had to move all of our hardware out of the datacenter in
> Irvine, California to a new home in Raleigh, North Carolina. The
> backend server for gitbuilder.ceph.com had a *lot* of data and we were
> not able to sync all of it to an interim server in Raleigh before we
> had to unplug the old one.
>
> Since you brought up fastcgi, it's a good idea to transition your
> cluster from Apache+mod_fastcgi and start using RGW's Civetweb server
> instead. Civetweb is much simpler, and future RGW optimizations are
> all going into the Civetweb stack.
>
> - Ken
>
> On Tue, Dec 8, 2015 at 2:54 AM, Xav Paice  wrote:
> > Hi,
> >
> > Just wondering if there's a known issue with http://gitbuilder.ceph.com/ -
> > if I go to several urls, e.g.
> > http://gitbuilder.ceph.com/libapache-mod-fastcgi-deb-trusty-x86_64-basic, I
> > get a 403.  That's still the right place to get deb's, right?
> >
> > ___
> > ceph-users mailing list
> > ceph-users@lists.ceph.com
> > http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
> >
> ___
> ceph-users mailing list
> ceph-users@lists.ceph.com
> http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] ceph-disk list crashes in infernalis

2015-12-09 Thread Loic Dachary
Hi Felix,

It would be great if you could try the fix from 
https://github.com/dachary/ceph/commit/7395a6a0c5776d4a92728f1abf0e8a87e5d5e4bb 
. It's only changing the ceph-disk file so you could just get it from 
https://github.com/dachary/ceph/raw/7395a6a0c5776d4a92728f1abf0e8a87e5d5e4bb/src/ceph-disk
 and replace the existing (after a backup) ceph-disk on one of your machines. 
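
Something like this should do for the swap, assuming the packaged ceph-disk
lives in /usr/sbin (the path may differ on your distribution):

    cp /usr/sbin/ceph-disk /usr/sbin/ceph-disk.orig
    wget -O /usr/sbin/ceph-disk \
      https://github.com/dachary/ceph/raw/7395a6a0c5776d4a92728f1abf0e8a87e5d5e4bb/src/ceph-disk
    chmod +x /usr/sbin/ceph-disk
    ceph-disk list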

It passes integration tests 
http://167.114.248.156:8081/ubuntu-2015-12-09_19:37:44-ceph-disk-wip-13970-ceph-disk-cciss-infernalis---basic-openstack/
 but these do not have the driver you're using. They only show nothing has been 
broken by the patch ;-)

Cheers

On 08/12/2015 15:27, Stolte, Felix wrote:
> Yes, they do contain a "!"
> 
> Forschungszentrum Juelich GmbH
> 52425 Juelich
> Sitz der Gesellschaft: Juelich
> Eingetragen im Handelsregister des Amtsgerichts Dueren Nr. HR B 3498
> Vorsitzender des Aufsichtsrats: MinDir Dr. Karl Eugen Huthmacher
> Geschaeftsfuehrung: Prof. Dr.-Ing. Wolfgang Marquardt (Vorsitzender),
> Karsten Beneke (stellv. Vorsitzender), Prof. Dr.-Ing. Harald Bolt,
> Prof. Dr. Sebastian M. Schmidt
> 
> 
> -Ursprüngliche Nachricht-
> Von: Loic Dachary [mailto:l...@dachary.org] 
> Gesendet: Dienstag, 8. Dezember 2015 15:17
> An: Stolte, Felix; ceph-us...@ceph.com
> Betreff: Re: [ceph-users] ceph-disk list crashes in infernalis
> 
> I also need to confirm that the names that show in /sys/block/*/holders are
> with a ! (it would not make sense to me if they were not but ...)
> 
> On 08/12/2015 15:05, Loic Dachary wrote:
>> Hi Felix,
>>
>> Could you please ls -l /dev/cciss /sys/block/cciss*/ ?
>>
>> Thanks for being the cciss proxy in fixing this problem :-)
>>
>> Cheers
>>
>> On 07/12/2015 11:43, Loic Dachary wrote:
>>> Thanks !
>>>
>>> On 06/12/2015 17:50, Stolte, Felix wrote:
 Hi Loic,

 output is:

 /dev:
 insgesamt 0
 crw--- 1 root root 10, 235 Dez  2 17:02 autofs
 drwxr-xr-x 2 root root1000 Dez  2 17:02 block
 drwxr-xr-x 2 root root  60 Dez  2 17:02 bsg
 crw--- 1 root root 10, 234 Dez  5 06:29 btrfs-control
 drwxr-xr-x 3 root root  60 Dez  2 17:02 bus
 crw-r--r-- 1 root root255, 171 Dez  2 17:02 casr
 drwxr-xr-x 2 root root 500 Dez  2 17:02 cciss
 crw-r--r-- 1 root root255, 173 Dez  2 17:02 ccsm
 lrwxrwxrwx 1 root root   3 Dez  2 17:02 cdrom -> sr0
 crw-r--r-- 1 root root255, 178 Dez  2 17:02 cdt
 crw-r--r-- 1 root root255, 172 Dez  2 17:02 cecc
 crw-r--r-- 1 root root255, 176 Dez  2 17:02 cevt
 drwxr-xr-x 2 root root3820 Dez  5 06:29 char
 crw--- 1 root root  5,   1 Dez  2 17:04 console
 lrwxrwxrwx 1 root root  11 Dez  2 17:02 core -> /proc/kcore
 drw-r--r-- 2 root root 200 Dez  2 17:02 cpqhealth
 drwxr-xr-x 2 root root  60 Dez  2 17:02 cpu
 crw--- 1 root root 10,  60 Dez  2 17:02 cpu_dma_latency
 crw-r--r-- 1 root root255, 180 Dez  2 17:02 crom
 crw--- 1 root root 10, 203 Dez  2 17:02 cuse
 drwxr-xr-x 8 root root 160 Dez  2 17:02 disk
 drwxr-xr-x 2 root root 100 Dez  2 17:02 dri
 crw--- 1 root root 10,  61 Dez  2 17:02 ecryptfs
 crw-rw 1 root video29,   0 Dez  2 17:02 fb0
 lrwxrwxrwx 1 root root  13 Dez  2 17:02 fd -> /proc/self/fd
 crw-rw-rw- 1 root root  1,   7 Dez  2 17:02 full
 crw-rw-rw- 1 root root 10, 229 Dez  2 17:02 fuse
 crw--- 1 root root251,   0 Dez  2 17:02 hidraw0
 crw--- 1 root root251,   1 Dez  2 17:02 hidraw1
 crw--- 1 root root 10, 228 Dez  2 17:02 hpet
 drwxr-xr-x 2 root root 360 Dez  2 17:02 hpilo
 crw--- 1 root root 89,   0 Dez  2 17:02 i2c-0
 crw--- 1 root root 89,   1 Dez  2 17:02 i2c-1
 crw--- 1 root root 89,   2 Dez  2 17:02 i2c-2
 crw--- 1 root root 89,   3 Dez  2 17:02 i2c-3
 crw-r--r-- 1 root root255, 184 Dez  2 17:02 indc
 drwxr-xr-x 4 root root 200 Dez  2 17:02 input
 crw--- 1 root root248,   0 Dez  2 17:02 ipmi0
 crw--- 1 root root249,   0 Dez  2 17:02 kfd
 crw-r--r-- 1 root root  1,  11 Dez  2 17:02 kmsg
 srw-rw-rw- 1 root root   0 Dez  2 17:02 log
 brw-rw 1 root disk  7,   0 Dez  2 17:02 loop0
 brw-rw 1 root disk  7,   1 Dez  2 17:02 loop1
 brw-rw 1 root disk  7,   2 Dez  2 17:02 loop2
 brw-rw 1 root disk  7,   3 Dez  2 17:02 loop3
 brw-rw 1 root disk  7,   4 Dez  2 17:02 loop4
 brw-rw 1 root disk  7,   5 Dez  2 17:02 loop5
 brw-rw 1 root disk  7,   6 Dez  2 17:02 loop6
 brw-rw 1 root disk  7,   7 Dez  2 17:02 loop7
 crw--- 1 root root 10, 237 Dez  2 17:02 loop-control
 drwxr-xr-x 2 root root  60 Dez  2 17:02 mapper
 crw--- 1 root root 10, 227 Dez  2 17:02 mcelog
>>>

[ceph-users] OS Liberty + Ceph Hammer: Block Device Mapping is Invalid.

2015-12-09 Thread c...@dolphin-it.de


Can someone help me?
Help would be highly appreciated ;-)


Last message on OpenStack mailing list:

Dear OpenStack-users,

I just installed my first multi-node OS-setup with Ceph as my storage backend.
After configuring cinder, nova and glance as described in the Ceph-HowTo 
(http://docs.ceph.com/docs/master/rbd/rbd-openstack/), there remains one 
blocker for me:

When creating a new instance based on a bootable glance image (same ceph 
cluster), it fails with:

Dashboard:
> Block Device Mapping is Invalid.

nova-compute.log (http://pastebin.com/bKfEijDu):
> 2015-12-06 16:44:15.991 2333 ERROR nova.compute.manager [instance: 
> 83677788-eafc-4d9c-9f38-3cad8030ecd3] Traceback (most recent call last):
> 2015-12-06 16:44:15.991 2333 ERROR nova.compute.manager [instance: 
> 83677788-eafc-4d9c-9f38-3cad8030ecd3]   File 
> "/usr/lib/python2.7/site-packages/nova/compute/manager.py", line 1738, in 
> _prep_block_device
> 2015-12-06 16:44:15.991 2333 ERROR nova.compute.manager [instance: 
> 83677788-eafc-4d9c-9f38-3cad8030ecd3] 
> wait_func=self._await_block_device_map_created)
> 2015-12-06 16:44:15.991 2333 ERROR nova.compute.manager [instance: 
> 83677788-eafc-4d9c-9f38-3cad8030ecd3]   File 
> "/usr/lib/python2.7/site-packages/nova/virt/block_device.py", line 476, in 
> attach_block_devices
> 2015-12-06 16:44:15.991 2333 ERROR nova.compute.manager [instance: 
> 83677788-eafc-4d9c-9f38-3cad8030ecd3] map(_log_and_attach, 
> block_device_mapping)
> 2015-12-06 16:44:15.991 2333 ERROR nova.compute.manager [instance: 
> 83677788-eafc-4d9c-9f38-3cad8030ecd3]   File 
> "/usr/lib/python2.7/site-packages/nova/virt/block_device.py", line 474, in 
> _log_and_attach
> 2015-12-06 16:44:15.991 2333 ERROR nova.compute.manager [instance: 
> 83677788-eafc-4d9c-9f38-3cad8030ecd3] bdm.attach(*attach_args, 
> **attach_kwargs)
> 2015-12-06 16:44:15.991 2333 ERROR nova.compute.manager [instance: 
> 83677788-eafc-4d9c-9f38-3cad8030ecd3]   File 
> "/usr/lib/python2.7/site-packages/nova/virt/block_device.py", line 385, in 
> attach
> 2015-12-06 16:44:15.991 2333 ERROR nova.compute.manager [instance: 
> 83677788-eafc-4d9c-9f38-3cad8030ecd3] self._call_wait_func(context, 
> wait_func, volume_api, vol['id'])
> 2015-12-06 16:44:15.991 2333 ERROR nova.compute.manager [instance: 
> 83677788-eafc-4d9c-9f38-3cad8030ecd3]   File 
> "/usr/lib/python2.7/site-packages/nova/virt/block_device.py", line 344, in 
> _call_wait_func
> 2015-12-06 16:44:15.991 2333 ERROR nova.compute.manager [instance: 
> 83677788-eafc-4d9c-9f38-3cad8030ecd3] {'volume_id': volume_id, 'exc': 
> exc})
> 2015-12-06 16:44:15.991 2333 ERROR nova.compute.manager [instance: 
> 83677788-eafc-4d9c-9f38-3cad8030ecd3]   File 
> "/usr/lib/python2.7/site-packages/oslo_utils/excutils.py", line 195, in 
> __exit__
> 2015-12-06 16:44:15.991 2333 ERROR nova.compute.manager [instance: 
> 83677788-eafc-4d9c-9f38-3cad8030ecd3] six.reraise(self.type_, self.value, 
> self.tb)
> 2015-12-06 16:44:15.991 2333 ERROR nova.compute.manager [instance: 
> 83677788-eafc-4d9c-9f38-3cad8030ecd3]   File 
> "/usr/lib/python2.7/site-packages/nova/virt/block_device.py", line 335, in 
> _call_wait_func
> 2015-12-06 16:44:15.991 2333 ERROR nova.compute.manager [instance: 
> 83677788-eafc-4d9c-9f38-3cad8030ecd3] wait_func(context, volume_id)
> 2015-12-06 16:44:15.991 2333 ERROR nova.compute.manager [instance: 
> 83677788-eafc-4d9c-9f38-3cad8030ecd3]   File 
> "/usr/lib/python2.7/site-packages/nova/compute/manager.py", line 1426, in 
> _await_block_device_map_created
> 2015-12-06 16:44:15.991 2333 ERROR nova.compute.manager [instance: 
> 83677788-eafc-4d9c-9f38-3cad8030ecd3] volume_status=volume_status)
> 2015-12-06 16:44:15.991 2333 ERROR nova.compute.manager [instance: 
> 83677788-eafc-4d9c-9f38-3cad8030ecd3] VolumeNotCreated: Volume 
> eba9ed20-09b1-44fe-920e-de8b6044500d did not finish being created even after 
> we waited 0 seconds or 1 attempts. And its status is error.
> 2015-12-06 16:44:15.991 2333 ERROR nova.compute.manager [instance: 
> 83677788-eafc-4d9c-9f38-3cad8030ecd3]
> 2015-12-06 16:44:16.034 2333 ERROR nova.compute.manager 
> [req-8a7e1c2c-09ea-4c10-acb3-2716e04fe214 051f7eb0c4df40dda84a69d40ee86a48 
> 3c297aff8cb44e618fb88356a2dd836b - - -] [instance: 
> 83677788-eafc-4d9c-9f38-3cad8030ecd3] Build of instance 
> 83677788-eafc-4d9c-9f38-3cad8030ecd3 aborted: Block Device Mapping is Invalid.
> 2015-12-06 16:44:16.034 2333 ERROR nova.compute.manager [instance: 
> 83677788-eafc-4d9c-9f38-3cad8030ecd3] Traceback (most recent call last):
> 2015-12-06 16:44:16.034 2333 ERROR nova.compute.manager [instance: 
> 83677788-eafc-4d9c-9f38-3cad8030ecd3]   File 
> "/usr/lib/python2.7/site-packages/nova/compute/manager.py", line 1905, in 
> _do_build_and_run_instance
> 2015-12-06 16:44:16.034 2333 ERROR nova.compute.manager [instance: 
> 83677788-eafc-4d9c-9f38-3cad8030ecd3] filter_properties)
> 2015-12-06 16:44:16.034 2333 ERROR nova.compute.manager [instance: 
> 83677788-ea
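
Since the underlying volume went to an error state ("its status is error"), the
cinder-volume log is the first place to look. For comparison, a minimal RBD
backend section in cinder.conf looks roughly like this (pool, user and secret
UUID are illustrative and must match your own setup):

    [DEFAULT]
    enabled_backends = ceph

    [ceph]
    volume_driver = cinder.volume.drivers.rbd.RBDDriver
    rbd_pool = volumes
    rbd_ceph_conf = /etc/ceph/ceph.conf
    rbd_user = cinder
    rbd_secret_uuid = 457eb676-33da-42ec-9a8c-9293d545c337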

Re: [ceph-users] rbd merge-diff error

2015-12-09 Thread Josh Durgin

This is the problem:

http://tracker.ceph.com/issues/14030

As a workaround, you can pass the first diff in via stdin, e.g.:

cat snap1.diff | rbd merge-diff - snap2.diff combined.diff

Josh

On 12/08/2015 11:11 PM, Josh Durgin wrote:

On 12/08/2015 10:44 PM, Alex Gorbachev wrote:

Hi Josh,

On Mon, Dec 7, 2015 at 6:50 PM, Josh Durgin <jdur...@redhat.com> wrote:

On 12/07/2015 03:29 PM, Alex Gorbachev wrote:

When trying to merge two results of rbd export-diff, the
following error
occurs:

iss@lab2-b1:~$ rbd export-diff --from-snap autosnap120720151500
spin1/scrun1@autosnap120720151502
/data/volume1/scrun1-120720151502.bck

iss@lab2-b1:~$ rbd export-diff --from-snap autosnap120720151504
spin1/scrun1@autosnap120720151504
/data/volume1/scrun1-120720151504.bck

iss@lab2-b1:~$ rbd merge-diff
/data/volume1/scrun1-120720151502.bck
/data/volume1/scrun1-120720151504.bck
/data/volume1/mrg-scrun1-0204.bck
  Merging image diff: 11% complete...failed.
rbd: merge-diff error

That's all the output and I have found this link
http://tracker.ceph.com/issues/12911 but not sure if the patch
should
have already been in hammer or how to get it?


That patch fixed a bug that was only present after hammer, due to
parallelizing export-diff. You're likely seeing a different (possibly
new) issue.

Unfortunately there's not much output we can enable for
export-diff in
hammer. Could you try running the command via gdb to figure out where
and why it's failing? Make sure you have librbd-dbg installed, then
send the output from gdb doing:

gdb --args rbd merge-diff /data/volume1/scrun1-120720151502.bck \
/data/volume1/scrun1-120720151504.bck
/data/volume1/mrg-scrun1-0204.bck
break rbd.cc:1931
break rbd.cc:1935
break rbd.cc:1967
break rbd.cc:1985
break rbd.cc:1999
break rbd.cc:2008
break rbd.cc:2021
break rbd.cc:2053
break rbd.cc:2098
run
# (it will run now, stopping when it hits the error)
info locals


Will do - how does one load librbd-dbg?  I have the following on the
system:

librbd-dev - RADOS block device client library (development files)
librbd1-dbg - debugging symbols for librbd1

is librbd1-dbg sufficient?


Yes, I just forgot the 1 in the package name.


Also a question - the merge-diff really stitches the two diff files
together, not really merges, correct? For example, in the following
workflow:

export-diff from full image - 10GB
export-diff from snap1 - 2 GB
export-diff from snap2 - 1 GB

My resulting merge export file would be 13GB, correct?


It does merge overlapping sections, i.e. part of snap1 that was
overwritten in snap2, so the merged diff may be smaller than the
original two.
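
For example, a minimal sketch of the whole chain (pool, image and snapshot
names here are just placeholders):

rbd export-diff spin1/scrun1@snap1 full.diff                     # no --from-snap: everything up to snap1
rbd export-diff --from-snap snap1 spin1/scrun1@snap2 incr.diff   # only the changes between snap1 and snap2
rbd merge-diff full.diff incr.diff merged.diff                   # overlapping extents are collapsed, not just appended
rbd import-diff merged.diff backup/scrun1-copy                   # replay onto an existing target image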

Josh


___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


[ceph-users] building ceph rpms, "ceph --version" returns no version

2015-12-09 Thread bruno.canning
Hi All,

Long story short:

I have built ceph hammer RPMs, everything seems to work OK but running "ceph 
--version" does not report the version number. I don't get a version number 
returned from "service ceph status", either. I'm concerned that other 
components in our system may rely on ceph --version returning a value, so I 
would like to fix this. Can anyone help, please?

The back story:

We need ceph hammer binaries that contain the most recent contributions from 
sponce (https://github.com/ceph/ceph/pull/6322) as we are using 
libradosstriper1 with ceph. As these changes were merged into the hammer branch 
after the latest hammer release, we have had to build our own RPMs.

To do this, I did a git clone of the hammer branch shortly after this pull 
request was merged. I then tarballed the directory with the "--exclude-vcs" 
option, modified the supplied .spec file by changing the value of "Version:" 
from "@VERSION@" to "0.94.5" and the value of "Release" from 
"@RPM_RELEASE@%{?dist}" to "1%{?dist}" and built the RPMs. I have since noticed 
the problem with ceph --version.

The answer looks to be found somewhere in one of these files, but I'm coming at 
this as a system administrator, not a programmer, so it is all rather confusing:
ceph-0.94.5/src/common/version.h
ceph-0.94.5/src/common/version.cc
ceph-0.94.5/CMakeLists.txt
ceph-0.94.5/src/ceph_ver.c
ceph-0.94.5/src/ceph_ver.h.in.cmake
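
My working guess (untested -- I may well be wrong) is that the official release
tarballs carry the git version information in a file such as src/.git_version,
which the build uses to populate "ceph --version", and that a tarball made with
--exclude-vcs simply lacks it. If that is right, something along these lines
before packaging might be enough (file name and format assumed, check how your
tree generates src/ceph_ver.h):

cd ceph                                                  # the git clone, hammer branch checked out
( git rev-parse HEAD; git describe ) > src/.git_version  # assumed format: commit sha, then the describe string
cd .. && tar czf ceph-0.94.5.tar.gz --exclude-vcs ceph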

Many thanks in advance,
Bruno

Bruno Canning
Scientific Computing Department
STFC Rutherford Appleton Laboratory
Harwell Oxford
Didcot
OX11 0QX
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] New cluster performance analysis

2015-12-09 Thread Kris Gillespie
One thing I noticed in all my testing: because the speed difference between the 
SSDs and the spinning rust can be quite high, and because your journal needs to flush 
every X bytes (configurable), the impact of this flush can be hard, as IO to 
the journal will stop until it’s finished (I believe). Something to try: run a 
fio test, but also log the latency stats and then graph them. That should make the 
issue pretty clear. I’ll predict you’re gonna see some spikes.
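
Something along these lines (device path, run time and log interval are just
examples, and note it writes to the device):

fio --name=lat-check --filename=/dev/rbd0 --rw=randwrite --bs=4k \
    --iodepth=1 --numjobs=1 --direct=1 --time_based --runtime=300 \
    --write_lat_log=rbd-lat --log_avg_msec=100

That dumps rbd-lat*_lat.log style files with one averaged latency sample per
interval, which are easy to graph.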

If so, you may need to

a) decide if its a problem with the future defined workload - maybe it’s not so 
bursty….
b) have a look at http://docs.ceph.com/docs/hammer/rados/configuration/journal-ref/
and maybe tweak “journal max write bytes” or the other journal/filestore settings
(rough sketch below)
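
For example, something like this in ceph.conf (the numbers are purely
illustrative, not recommendations -- test against your own workload):

[osd]
# allow larger journal write batches
journal max write bytes = 1073741824
journal max write entries = 10000
# bounds on how often the filestore syncs (and the journal can be trimmed)
filestore min sync interval = 1
filestore max sync interval = 10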

There won’t be a golden rule here however and it’s one of the reasons some 
benchmarks can lead to unfounded worrying.

Cheers

Kris


> On 04 Dec 2015, at 15:10, Jan Schermer  wrote:
> 
>> 
>> On 04 Dec 2015, at 14:31, Adrien Gillard wrote:
>> 
>> After some more tests :
>> 
>>  - The pool being used as cache pool has no impact on performance, I get the 
>> same results with a "dedicated" replicated pool.
>>  - You are right Jan, on raw devices I get better performance on a volume if 
>> I fill it first, or at least if I write a zone that already has been 
>> allocated
>>  - The same seem to apply when the test is run on the mounted filesystem.
>> 
> 
> Yeah. The first (raw device) is because the objects on OSDs get "thick" 
> in the process.
> The second (filesystem) is because of both the OSD objects getting thick and 
> the guest filesystem getting thick.
> Preallocating the space can speed up things considerably (like 100x).
> Unfortunately I haven't found a way to convince fallocate() &co. to thick 
> provision files.
> 
> Jan
> 
>> 
>> 
>> 
>> 
>> On Thu, Dec 3, 2015 at 2:49 PM, Adrien Gillard wrote:
>> I did some more tests :
>> 
>> fio on a raw RBD volume (4K, numjob=32, QD=1) gives me around 3000 IOPS
>> 
>> I also tuned xfs mount options on client (I realized I didn't do that 
>> already) and with 
>> "largeio,inode64,swalloc,logbufs=8,logbsize=256k,attr2,auto,nodev,noatime,nodiratime"
>>  I get better performance :
>> 
>> 4k-32-1-randwrite-libaio: (groupid=0, jobs=32): err= 0: pid=26793: Thu Dec  
>> 3 10:45:55 2015
>>   write: io=1685.3MB, bw=5720.1KB/s, iops=1430, runt=301652msec
>> slat (usec): min=5, max=1620, avg=41.61, stdev=25.82
>> clat (msec): min=1, max=4141, avg=14.61, stdev=112.55
>>  lat (msec): min=1, max=4141, avg=14.65, stdev=112.55
>> clat percentiles (msec):
>>  |  1.00th=[3],  5.00th=[4], 10.00th=[4], 20.00th=[4],
>>  | 30.00th=[4], 40.00th=[5], 50.00th=[5], 60.00th=[5],
>>  | 70.00th=[5], 80.00th=[6], 90.00th=[7], 95.00th=[7],
>>  | 99.00th=[  227], 99.50th=[  717], 99.90th=[ 1844], 99.95th=[ 2245],
>>  | 99.99th=[ 3097]
>> 
>> So, more than 50% improvement but it actually varies quite a lot between 
>> tests (sometimes I get a bit more than 1000). If I run the test for 30 
>> minutes it drops to 900 IOPS.
>> 
>> As you suggested I also filled a volume with zeros (dd if=/dev/zero 
>> of=/dev/rbd1 bs=1M) and then ran fio on the raw device, I didn't see a lot 
>> of improvement.
>> 
>> If I run fio test directly on block devices I seem to saturate the spinners, 
>> [1] is a graph of IO load on one of the OSD host.
>> [2] is the same OSD graph but when the test is done on a device mounted and 
>> formatted with XFS on the client.
>> If I get half of the IOPS on the XFS volume because of the journal, 
>> shouldn't I get the same amount of IOPS on the backend ?
>> [3] shows what happen if I run the test for 30 minutes.
>> 
>> During the fio tests on the raw device, load average on the OSD servers 
>> increases up to 13/14 and I get a bit of iowait (I guess because the OSDs are 
>> busy).
>> During the fio tests on the raw device, load average on the OSD servers 
>> peaks at the beginning and decreases to 5/6, but goes through the roof on the 
>> client.
>> Scheduler is deadline for all the drives, I didn't try to change it yet.
>> 
>> What I don't understand, even with your explanations, are the rados results. 
>> From what I understand it performs at the RADOS level and thus should not be 
>> impacted by client filesystem.
>> Given the results above I guess you are right and this has to do with the 
>> client filesystem.
>> 
>> The cluster will be used for backups, write IO size during backups is around 
>> 150/200K (I guess mostly sequential) and I am looking for the highest 
>> bandwidth and parallelization.
>> 
>> @Nick, I will try to create a new stand alone replicated pool.
>> 
>> 
>> [1] http://postimg.org/image/qvtvdq1n1/ 
>> [2] http://postimg.org/image/nhf6lzwgl/ 
>>

Re: [ceph-users] High disk utilisation

2015-12-09 Thread MATHIAS, Bryn (Bryn)
To update this: the error looks like it comes from updatedb scanning the ceph 
disks.

When we make sure it doesn’t, by putting the ceph mount points in the exclusion 
file, the problem goes away.
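
For anyone else hitting this, the change was roughly the following (assuming
the OSDs are mounted under /var/lib/ceph and the RHEL-style PRUNEPATHS = "..."
line format):

grep ^PRUNEPATHS /etc/updatedb.conf
# append the OSD mount root to whatever is already listed, e.g.:
sed -i 's|^PRUNEPATHS = "|PRUNEPATHS = "/var/lib/ceph |' /etc/updatedb.conf
updatedb    # rebuild the locate database once; it should now skip the ceph mounts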

Thanks for the help and time.
On 30 Nov 2015, at 09:53, MATHIAS, Bryn (Bryn) <bryn.math...@alcatel-lucent.com> wrote:


On 30 Nov 2015, at 14:37, MATHIAS, Bryn (Bryn) <bryn.math...@alcatel-lucent.com> wrote:

Hi,
On 30 Nov 2015, at 13:44, Christian Balzer <ch...@gol.com> wrote:


Hello,

On Mon, 30 Nov 2015 07:55:24 + MATHIAS, Bryn (Bryn) wrote:

Hi Christian,

I’ll give you a much better dump of detail :)

Running RHEL 7.1,
ceph version 0.94.5

all ceph disks are xfs, with journals on a partition on the disk
Disks: 6Tb spinners.

OK, I was guessing that journal on disk, but good to know.
Which exact model?
Some of them are rather unsuited for Ceph usage (SMR).
I don’t know the exact model of the disks but they are not SMR disks.

Erasure coded pool with 4+1 EC ISA-L also.

OK, this is where I plead ignorance, no EC experience at all.
But it would be strange for this to be hitting a single disk at a time.
It is hitting a single disk in each node; however, I'd have thought that I'd see 
repetition across the disks if it were doing this on a per-placement-group basis.

No scrubbing reported in the ceph log, the cluster isn’t old enough yet
to be doing any deep scrubbing. Also the cpu usage of the osd daemon
that controls the disk isn't spiking, which I have seen previously when
scrubbing or deep scrubbing is taking place.

Alright, can you confirm (with atop or the likes) that the busy disk is
actually being written/read to by the OSD process in question and if there
is a corresponding network traffic for the amount of I/O?
I checked for network traffic, there didn’t look to be any.
Looks like the problem is transient and has disappeared for the moment.
I will post more when I see the problem again.

Bryn

Christian


All disks are at 2% utilisation as given by df.

For explicitness:
[root@au-sydney ~]# ceph -s
    cluster ff900f17-7eec-4fe1-8f31-657d44b86a22
     health HEALTH_OK
     monmap e5: 5 mons at {au-adelaide=10.50.21.24:6789/0,au-brisbane=10.50.21.22:6789/0,au-canberra=10.50.21.23:6789/0,au-melbourne=10.50.21.21:6789/0,au-sydney=10.50.21.20:6789/0}
            election epoch 274, quorum 0,1,2,3,4 au-sydney,au-melbourne,au-brisbane,au-canberra,au-adelaide
     osdmap e8549: 120 osds: 120 up, 120 in
      pgmap v408422: 8192 pgs, 2 pools, 7794 GB data, 5647 kobjects
            9891 GB used, 644 TB / 654 TB avail
                8192 active+clean
  client io 68363 kB/s wr, 1249 op/s


Cheers,
Bryn


On 30 Nov 2015, at 12:57, Christian Balzer <ch...@gol.com> wrote:


Hello,

On Mon, 30 Nov 2015 07:15:35 + MATHIAS, Bryn (Bryn) wrote:

Hi All,

I am seeing an issue with ceph performance.
Starting from an empty cluster of 5 nodes, ~600 TB of storage.

It would be helpful to have more details (all details in fact) than this.
Complete HW, OS, FS used, Ceph versions and configuration details
(journals on HDD, replication levels etc).

While this might not seem significant to your current question, it might
prove valuable as to why you're seeing performance problems and how to
address them.

Monitoring disk usage in nmon, I see rolling 100% usage of a disk.
ceph -w doesn't report any spikes in throughput, and the application
writing the data is not spiking in the load it generates.


The ceph.log should give a more detailed account, but assuming your
client side is indeed steady state, this could be very well explained by
scrubbing, especially deep-scrubbing.
That should also be visible in the ceph.log.

Christian

(nmon disk view, cleaned up -- device, busy %, transfer rates in KB/s, activity marker)
│sdg2    0%     0.0    537.5 |
│sdh     2%     4.0   4439.8 |RW
│sdh1    2%     4.0   3972.3 |RW
│sdh2    0%     0.0    467.6 |
│sdj     3%     2.0   3524.7 |RW
│sdj1    3%     2.0   3488.7 |RW
│sdj2    0%     0.0     36.0 |
│sdk    99%  1144.9   3564.6 |R>
│sdk1   99%  1144.9   3254.9 |R>
│sdk2    0%     0.0    309.7 |W
│sdl     1%     4.0    955.1 |R
│sdl1    1%     4.0    791.3 |R
│sdl2    0%     0.0    163.8 |


Is this anything to do with the way object

Re: [ceph-users] Blocked requests after "osd in"

2015-12-09 Thread Christian Kauhaus
Am 09.12.2015 um 11:21 schrieb Jan Schermer:
> Are you seeing "peering" PGs when the blocked requests are happening? That's 
> what we see regularly when starting OSDs.

Mostly "peering" and "activating".

> I'm not sure this can be solved completely (and whether there are major 
> improvements in newer Ceph versions), but it can be sped up by
> 1) making sure you have free (and not dirtied or fragmented) memory on the 
> node where you are starting the OSD
>   - that means dropping caches before starting the OSD if you have lots 
> of "free" RAM that is used for VFS cache
> 2) starting the OSDs one by one instead of booting several of them
> 3) if you pin the OSDs to CPUs/cores, do that after the OSD is in - I found 
> it to be best to pin the OSD to a cgroup limited to one NUMA node and then 
> limit it to a subset of cores after it has run a bit. OSD tends to use 
> hundreds of % of CPU when booting
> 4) you could possibly prewarm cache for the OSD in /var/lib/ceph/osd...

Thank you for your advice. The use case is not so much after rebooting a
server, but more when we take OSDs in/out for maintenance. During boot, we
already start them one after another with 10s pause between each pair.

I've done a bit of tracing. I've kept a small cluster running with 2 "in" OSDs
out of 3 and put the third one "in" at 15:06:22. From ceph.log:

| 2015-12-09 15:06:22.827030 mon.0 172.20.4.6:6789/0 54964 : cluster [INF]
osdmap e264345: 3 osds: 3 up, 3 in
| 2015-12-09 15:06:22.828693 mon.0 172.20.4.6:6789/0 54965 : cluster [INF]
pgmap v39871295: 1800 pgs: 1800 active+clean; 439 GB data, 906 GB used, 4515
GB / 5421 GB avail; 6406 B/s rd, 889 kB/s wr, 67 op/s
| [...]
| 2015-12-09 15:06:29.163793 mon.0 172.20.4.6:6789/0 54972 : cluster [INF]
pgmap v39871299: 1800 pgs: 1800 active+clean; 439 GB data, 906 GB used, 7700
GB / 8607 GB avail

After a few seconds, backfills start as expected:

| 2015-12-09 15:06:24.853507 osd.3 172.20.4.40:6800/5072 778 : cluster [INF]
410.c9 restarting backfill on osd.2 from (0'0,0'0] MAX to 264336'502426
| [...]
| 2015-12-09 15:06:29.874092 osd.3 172.20.4.40:6800/5072 1308 : cluster [INF]
410.d1 restarting backfill on osd.2 from (0'0,0'0] MAX to 264344'1202983
| 2015-12-09 15:06:32.584907 mon.0 172.20.4.6:6789/0 54973 : cluster [INF]
pgmap v39871300: 1800 pgs: 3 active+remapped+wait_backfill, 191
active+remapped, 1169 active+clean, 437 activating+remapped; 439 GB data, 906
GB used, 7700 GB / 8607 GB avail; 1725 kB/s rd, 2486 kB/s wr, 605 op/s;
23058/278796 objects misplaced (8.271%); 56612 kB/s, 14 objects/s recovering
| 2015-12-09 15:06:24.851307 osd.0 172.20.4.51:6800/4919 2662 : cluster [INF]
410.c8 restarting backfill on osd.2 from (0'0,0'0] MAX to 264344'1017219
| 2015-12-09 15:06:38.555243 mon.0 172.20.4.6:6789/0 54976 : cluster [INF]
pgmap v39871303: 1800 pgs: 22 active+remapped+wait_backfill, 520
active+remapped, 638 active+clean, 620 activating+remapped; 439 GB data, 906
GB used, 7700 GB / 8607
| GB avail; 45289 B/s wr, 4 op/s; 64014/313904 objects misplaced (20.393%)
| 2015-12-09 15:06:38.133376 osd.3 172.20.4.40:6800/5072 1309 : cluster [WRN]
9 slow requests, 9 included below; oldest blocked for > 15.306541 secs
| 2015-12-09 15:06:38.133385 osd.3 172.20.4.40:6800/5072 1310 : cluster [WRN]
slow request 15.305213 seconds old, received at 2015-12-09 15:06:22.828061:
osd_op(client.15205073.0:35726 rbd_header.13998a74b0dc51 [watch reconnect
cookie 139897352489152 gen 37] 410.937870ca ondisk+write+known_if_redirected
e264345) currently reached_pg

It seems that PGs in "activating" state are causing blocked requests.
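
For reference, the state can be watched while the OSD joins with something like:

ceph pg dump_stuck inactive 5                       # PGs that have not been active for more than 5 seconds
watch -n1 'ceph -s | grep -E "peering|activating"'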

After a half minute or so, slow requests disappear and backfill proceeds 
normally:

| 2015-12-09 15:06:54.139948 osd.3 172.20.4.40:6800/5072 1396 : cluster [WRN]
42 slow requests, 9 included below; oldest blocked for > 31.188267 secs
| 2015-12-09 15:06:54.139957 osd.3 172.20.4.40:6800/5072 1397 : cluster [WRN]
slow request 15.566440 seconds old, received at 2015-12-09 15:06:38.573403:
osd_op(client.15165527.0:5878994 rbd_data.129a42ae8944a.0f2b
[set-alloc-hint object_size 4194304 write_size 4194304,write 1728512~4096]
410.de3ce70d snapc 3fd2=[3fd2] ack+ondisk+write+known_if_redirected e264348)
currently waiting for subops from 0,2
| 2015-12-09 15:06:54.139977 osd.3 172.20.4.40:6800/5072 1401 : cluster [WRN]
slow request 15.356852 seconds old, received at 2015-12-09 15:06:38.782990:
osd_op(client.15165527.0:5878997 rbd_data.129a42ae8944a.0f2b
[set-alloc-hint object_size 4194304 write_size 4194304,write 1880064~4096]
410.de3ce70d snapc 3fd2=[3fd2] ack+ondisk+write+known_if_redirected e264348)
currently waiting for subops from 0,2
| [...]
| 2015-12-09 15:07:00.072403 mon.0 172.20.4.6:6789/0 54989 : cluster [INF]
osdmap e264351: 3 osds: 3 up, 3 in
| 2015-12-09 15:07:00.074536 mon.0 172.20.4.6:6789/0 54990 : cluster [INF]
pgmap v39871313: 1800 pgs: 277 active+remapped+wait_backfill, 881
active+remapped, 4 active+remapped+backfilling, 638 active+clean; 43

Re: [ceph-users] CephFS: number of PGs for metadata pool

2015-12-09 Thread Mykola Dvornik

Good point. Thanks!

Triple failure is essentially what I faced about a month ago. So 
now I want to make sure that the new cephfs setup I am deploying at the 
moment will handle this kind of thing better.


On Wed, Dec 9, 2015 at 2:41 PM, John Spray  wrote:
On Wed, Dec 9, 2015 at 1:25 PM, Mykola Dvornik 
 wrote:

 Hi Jan,

 Thanks for the reply. I see your point about replicas. However my 
motivation

 was a bit different.

 Consider some given amount of objects that are stored in the 
metadata pool.
 If I understood correctly ceph data placement approach, the number 
of

 objects per PG should decrease with the amount of PGs per pool.

 So my concern is that in catastrophic event of some PG(s) being 
lost I will
 loose more objects if the amount of PGs per pool is small. At the 
same time
 I don't want to have too few objects per PG to keep things disk IO, 
but not

 CPU bounded.


If you are especially concerned about triple-failures (i.e. permanent
PG loss), I would suggest you look at doing things like a size=4 pool
for your metadata (maybe on SSDs).

You could also look at simply segregating your size=3 metadata on to
separate spinning drives, so that these comparatively less loaded OSDs
will be able to undergo recovery faster in the event of a failure than
an ordinary data drive that's full of terabytes of data, and have a
lower probability of a triple failure.

John
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] CephFS: number of PGs for metadata pool

2015-12-09 Thread John Spray
On Wed, Dec 9, 2015 at 1:25 PM, Mykola Dvornik  wrote:
> Hi Jan,
>
> Thanks for the reply. I see your point about replicas. However my motivation
> was a bit different.
>
> Consider some given amount of objects that are stored in the metadata pool.
> If I understood correctly ceph data placement approach, the number of
> objects per PG should decrease with the amount of PGs per pool.
>
> So my concern is that in catastrophic event of some PG(s) being lost I will
> loose more objects if the amount of PGs per pool is small. At the same time
> I don't want to have too few objects per PG to keep things disk IO, but not
> CPU bounded.

If you are especially concerned about triple-failures (i.e. permanent
PG loss), I would suggest you look at doing things like a size=4 pool
for your metadata (maybe on SSDs).

You could also look at simply segregating your size=3 metadata on to
separate spinning drives, so that these comparatively less loaded OSDs
will be able to undergo recovery faster in the event of a failure than
an ordinary data drive that's full of terabytes of data, and have a
lower probability of a triple failure.
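
Roughly, assuming the metadata pool is called cephfs_metadata and (for the SSD
variant) an SSD-only CRUSH ruleset already exists -- names and ids below are
only examples:

ceph osd pool set cephfs_metadata size 4
ceph osd pool set cephfs_metadata min_size 2
ceph osd pool set cephfs_metadata crush_ruleset 1   # only if you have a separate SSD rule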

John
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] CephFS: number of PGs for metadata pool

2015-12-09 Thread Mykola Dvornik

Hi Jan,

Thanks for the reply. I see your point about replicas. However my 
motivation was a bit different.


Consider some given amount of objects that are stored in the metadata 
pool.
If I understood the ceph data placement approach correctly, the number of 
objects per PG should decrease as the number of PGs per pool grows.


So my concern is that, in the catastrophic event of some PG(s) being lost, I 
will lose more objects if the number of PGs per pool is small. At the 
same time I don't want to have too few objects per PG, to keep things 
disk-IO bound rather than CPU bound.


So I thought maybe somebody did some research in this direction?













On Wed, Dec 9, 2015 at 1:13 PM, Jan Schermer  wrote:
Number of PGs doesn't affect the number of replicas, so don't worry 
about it.


Jan

 On 09 Dec 2015, at 13:03, Mykola Dvornik  
wrote:


 Hi guys,

 I am creating a 4-node/16OSD/32TB CephFS from scratch.

 According to the ceph documentation the metadata pool should have 
small amount of PGs since it contains some negligible amount of data 
compared to data pool. This makes me feel it might not be safe.


 So I was wondering how to chose the number of PGs per metadata pool 
to maintain its performance and reliability?


 Regards,

 Mykola
 ___
 ceph-users mailing list
 ceph-users@lists.ceph.com
 http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


[ceph-users] problem after reinstalling system

2015-12-09 Thread Jacek Jarosiewicz

Hi,

I have a working ceph cluster with storage nodes running Ubuntu 14.04 
and ceph hammer 0.94.5.


Now I want to switch to CentOS 7.1 (forget about the reasons for now, I 
can explain, but it would be a long story and irrelevant to my question).


I've set the osd noout flag and norebalance,norecover for the time of 
reinstall. The new system is installed with the same version of ceph.


I've made a backup of /var/lib/ceph directory (after stopping ceph 
services obviously) and kept the osd's intact.


But after reinstall, when I try to start the daemons (the machine run 
one monitor and three osd's) I get these messages in the logs:


monitor:
2015-12-09 13:15:28.223872 7f4ccd41b880  0 ceph version 0.94.5 
(9764da52395923e0b32908d83a9f7304401fee43), process ceph-mon, pid 5800
2015-12-09 13:15:29.411448 7f4ccd41b880 -1 error opening mon data 
directory at '/var/lib/ceph/mon/ceph-cf03': (22) Invalid argument



osds:

2015-12-09 13:11:50.480625 7fac03c7f880  0 ceph version 0.94.5 
(9764da52395923e0b32908d83a9f7304401fee43), process ceph-osd, pid 3092
2015-12-09 13:11:50.508803 7fac03c7f880  0 
filestore(/var/lib/ceph/osd/ceph-5) backend xfs (magic 0x58465342)
2015-12-09 13:11:50.640410 7fac03c7f880  0 
genericfilestorebackend(/var/lib/ceph/osd/ceph-5) detect_features: 
FIEMAP ioctl is supported and appears to work
2015-12-09 13:11:50.640429 7fac03c7f880  0 
genericfilestorebackend(/var/lib/ceph/osd/ceph-5) detect_features: 
FIEMAP ioctl is disabled via 'filestore fiemap' config option
2015-12-09 13:11:50.640890 7fac03c7f880  0 
genericfilestorebackend(/var/lib/ceph/osd/ceph-5) detect_features: 
syncfs(2) syscall fully supported (by glibc and kernel)
2015-12-09 13:11:50.646915 7fac03c7f880  0 
xfsfilestorebackend(/var/lib/ceph/osd/ceph-5) detect_feature: extsize is 
supported and kernel 3.10.0-229.20.1.el7.x86_64 >= 3.5
2015-12-09 13:11:51.171377 7fac03c7f880 -1 
filestore(/var/lib/ceph/osd/ceph-5) Error initializing leveldb : 
Corruption: 29 missing files; e.g.: 
/var/lib/ceph/osd/ceph-5/current/omap/046388.sst


2015-12-09 13:11:51.171399 7fac03c7f880 -1 osd.5 0 OSD:init: unable to 
mount object store
2015-12-09 13:11:51.171404 7fac03c7f880 -1  ** ERROR: osd init failed: 
(1) Operation not permitted



Can anyone help? I don't see any .sst files on any of my other (working) 
ceph nodes, and the directories are fine with correct permissions.


I can re-add this machine from scratch without data loss, but the 
rebalancing/recovery will last a week (been there, done that), so I 
was hoping I could start the osds with only slightly out-of-date data.


Is it possible? What can I do?

Cheers,
J

--
Jacek Jarosiewicz
Administrator Systemów Informatycznych


SUPERMEDIA Sp. z o.o. z siedzibą w Warszawie
ul. Senatorska 13/15, 00-075 Warszawa
Sąd Rejonowy dla m.st.Warszawy, XII Wydział Gospodarczy Krajowego 
Rejestru Sądowego,

nr KRS 029537; kapitał zakładowy 42.756.000 zł
NIP: 957-05-49-503
Adres korespondencyjny: ul. Jubilerska 10, 04-190 Warszawa


SUPERMEDIA ->   http://www.supermedia.pl
dostep do internetu - hosting - kolokacja - lacza - telefonia
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] CephFS: number of PGs for metadata pool

2015-12-09 Thread Jan Schermer
Number of PGs doesn't affect the number of replicas, so don't worry about it.

Jan

> On 09 Dec 2015, at 13:03, Mykola Dvornik  wrote:
> 
> Hi guys,
> 
> I am creating a 4-node/16OSD/32TB CephFS from scratch. 
> 
> According to the ceph documentation the metadata pool should have small 
> amount of PGs since it contains some negligible amount of data compared to 
> data pool. This makes me feel it might not be safe.
> 
> So I was wondering how to chose the number of PGs per metadata pool to 
> maintain its performance and reliability?
> 
> Regards,
> 
> Mykola
> ___
> ceph-users mailing list
> ceph-users@lists.ceph.com
> http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com

___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


[ceph-users] CephFS: number of PGs for metadata pool

2015-12-09 Thread Mykola Dvornik

Hi guys,

I am creating a 4-node/16OSD/32TB CephFS from scratch.

According to the ceph documentation the metadata pool should have a small 
number of PGs, since it contains a negligible amount of data compared 
to the data pool. This makes me feel it might not be safe.


So I was wondering how to choose the number of PGs for the metadata pool to 
maintain its performance and reliability?


Regards,

Mykola
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] Ceph 9.2 fails to install in COS 7.1.1503: Report and Fix

2015-12-09 Thread Loic Dachary
Hi,

It also had to be fixed for the development environment (see 
http://tracker.ceph.com/issues/14019).

Cheers

On 09/12/2015 09:37, Ben Hines wrote:
> FYI - same issue when installing Hammer, 94.5. I also fixed it by enabling 
> the cr repo.
> 
> -Ben
> 
> On Tue, Dec 8, 2015 at 5:13 PM, Goncalo Borges wrote:
> 
> Hi Cephers
> 
> This is just to report an issue (and a workaround) regarding dependencies 
> in Centos 7.1.1503
> 
> Last week, I installed a couple of nodes and there were no issues with 
> dependencies. This week, the installation of ceph rpm fails because it 
> depends on gperftools-libs which, on its own, depends on libunwind.
> 
> Searching a bit, I've checked that my last week installs downloaded 
> libunwind from epel (libunwind-1.1-10.el7.x86_64). Today it is no longer 
> there.
> 
> Googling about it, it seems libunwind will be available in CentOS 
> 7.2.1511, but for the time being it should be available in the CentOS CR repos. 
> For CentOS 7.1.1503, it provides libunwind-1.1-5.el7.x86_64:
> 
> http://mirror.centos.org/centos/7.1.1503/cr
> 
> Cheers
> Goncalo
> 
> -- 
> Goncalo Borges
> Research Computing
> ARC Centre of Excellence for Particle Physics at the Terascale
> School of Physics A28 | University of Sydney, NSW  2006
> T: +61 2 93511937 
> 
> ___
> ceph-users mailing list
> ceph-users@lists.ceph.com 
> http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
> 
> 
> 
> 
> ___
> ceph-users mailing list
> ceph-users@lists.ceph.com
> http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
> 

-- 
Loïc Dachary, Artisan Logiciel Libre



signature.asc
Description: OpenPGP digital signature
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] Blocked requests after "osd in"

2015-12-09 Thread Jan Schermer
Are you seeing "peering" PGs when the blocked requests are happening? That's 
what we see regularly when starting OSDs.

I'm not sure this can be solved completely (and whether there are major 
improvements in newer Ceph versions), but it can be sped up by
1) making sure you have free (and not dirtied or fragmented) memory on the node 
where you are starting the OSD
- that means dropping caches before starting the OSD if you have lots 
of "free" RAM that is used for VFS cache
2) starting the OSDs one by one instead of booting several of them (rough sketch after this list)
3) if you pin the OSDs to CPUs/cores, do that after the OSD is in - I found it 
to be best to pin the OSD to a cgroup limited to one NUMA node and then limit 
it to a subset of cores after it has run a bit. OSD tends to use hundreds of % 
of CPU when booting
4) you could possibly prewarm cache for the OSD in /var/lib/ceph/osd...
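
A rough sketch of 1) and 2) -- OSD ids and the sysvinit-style invocation are
just examples, adjust to your init system:

sync && echo 3 > /proc/sys/vm/drop_caches    # drop VFS caches so the starting OSD gets clean memory
for id in 10 11 12; do
    /etc/init.d/ceph start osd.$id
    sleep 30                                 # give peering a chance to settle before the next one
done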

It's unclear to me whether MONs influence this somehow (the peering stage), but 
I have observed that their CPU usage and IO also spike when OSDs are started, so 
make sure they are not under load.

Jan


> On 09 Dec 2015, at 11:03, Christian Kauhaus  wrote:
> 
> Hi,
> 
> I'm getting blocked requests (>30s) every time when an OSD is set to "in" in
> our clusters. Once this has happened, backfills run smoothly.
> 
> I have currently no idea where to start debugging. Has anyone a hint what to
> examine first in order to narrow this issue?
> 
> TIA
> 
> Christian
> 
> -- 
> Dipl-Inf. Christian Kauhaus <>< · k...@flyingcircus.io · +49 345 219401-0
> Flying Circus Internet Operations GmbH · http://flyingcircus.io
> Forsterstraße 29 · 06112 Halle (Saale) · Deutschland
> HR Stendal 21169 · Geschäftsführer: Christian Theune, Christian Zagrodnick
> ___
> ceph-users mailing list
> ceph-users@lists.ceph.com
> http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com

___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


[ceph-users] Blocked requests after "osd in"

2015-12-09 Thread Christian Kauhaus
Hi,

I'm getting blocked requests (>30s) every time an OSD is set to "in" in
our clusters. Once this has happened, backfills run smoothly.

I currently have no idea where to start debugging. Does anyone have a hint on what to
examine first in order to narrow down this issue?
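
Is the right starting point something like the following (the osd id is just an
example, taken from whichever OSD the slow request warnings name)?

ceph health detail
ceph daemon osd.3 dump_historic_ops    # run on the node hosting that OSD; shows where recent slow ops spent their time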

TIA

Christian

-- 
Dipl-Inf. Christian Kauhaus <>< · k...@flyingcircus.io · +49 345 219401-0
Flying Circus Internet Operations GmbH · http://flyingcircus.io
Forsterstraße 29 · 06112 Halle (Saale) · Deutschland
HR Stendal 21169 · Geschäftsführer: Christian Theune, Christian Zagrodnick
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] Ceph 9.2 fails to install in COS 7.1.1503: Report and Fix

2015-12-09 Thread Ben Hines
FYI - same issue when installing Hammer, 94.5. I also fixed it by enabling
the cr repo.
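
For reference, roughly what that amounts to (assuming the stock CentOS-CR.repo
file with repo id "cr" is present but disabled):

yum install -y yum-utils         # provides yum-config-manager
yum-config-manager --enable cr
yum install -y ceph              # gperftools-libs can now pull libunwind from CR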

-Ben

On Tue, Dec 8, 2015 at 5:13 PM, Goncalo Borges  wrote:

> Hi Cephers
>
> This is just to report an issue (and a workaround) regarding dependencies
> in Centos 7.1.1503
>
> Last week, I installed a couple of nodes and there were no issues with
> dependencies. This week, the installation of ceph rpm fails because it
> depends on gperftools-libs which, on its own, depends on libunwind.
>
> Searching a bit, I've checked that my last week installs downloaded
> libunwind from epel (libunwind-1.1-10.el7.x86_64). Today it is no longer
> there.
>
> Googling about it, it seems libunwind will be available in CentOS 7.2.1511,
> but for the time being it should be available in the CentOS CR repos. For
> CentOS 7.1.1503, it provides libunwind-1.1-5.el7.x86_64.
>
> http://mirror.centos.org/centos/7.1.1503/cr
>
> Cheers
> Goncalo
>
> --
> Goncalo Borges
> Research Computing
> ARC Centre of Excellence for Particle Physics at the Terascale
> School of Physics A28 | University of Sydney, NSW  2006
> T: +61 2 93511937
>
> ___
> ceph-users mailing list
> ceph-users@lists.ceph.com
> http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
>
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com