Re: [ceph-users] What is recommended ceph docker image for use

2019-05-08 Thread Stefan Kooman
Quoting Ignat Zapolsky (ignat.zapol...@ammeon.com):
> Hi,
> 
> Just a question: what is the recommended docker container image to use
> for Ceph?
> 
> The Ceph website is saying that 12.2.x is LTR but there are at least 2
> more releases on Docker Hub – 13 and 14.
> 
> Would there be any advice on selecting between the 3 releases?

There isn't an "LTR" concept in Ceph anymore. There are three releases
that are supported at any given time. As soon as Octopus (15) is
released, Luminous (12) will no longer be (officially) supported.

I would go for the latest release, Nautilus (14), when setting up a new
cluster. But if you want a release that has been more battle tested in
production, go for Mimic (13). It might also depend on your use case:
cephfs is probably best served by Nautilus. The Nautilus release has
improvements in all major interfaces (RGW, RBD, cephfs), but it might
still have some (undiscovered) issues.
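Whatever you pick, I'd pin an explicit release tag rather than "latest".
A minimal sketch (the exact image/tag names on Docker Hub are an
assumption on my part -- double-check them before relying on this):

  docker pull ceph/ceph:v14.2.1   # Nautilus
  docker pull ceph/ceph:v13.2.5   # Mimic

That way an unattended pull can't silently move you to a newer major
release.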

Gr. Stefan



-- 
| BIT BV  http://www.bit.nl/  Kamer van Koophandel 09090351
| GPG: 0xD14839C6   +31 318 648 688 / i...@bit.nl
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] Nautilus: significant increase in cephfs metadata pool usage

2019-05-08 Thread Dietmar Rieder
On 5/8/19 10:52 PM, Gregory Farnum wrote:
> On Wed, May 8, 2019 at 5:33 AM Dietmar Rieder
>  wrote:
>>
>> On 5/8/19 1:55 PM, Paul Emmerich wrote:
>>> Nautilus properly accounts metadata usage, so nothing changed it just
>>> shows up correctly now ;)
>>
>> OK, but then I'm not sure I understand why the increase was not sudden
>> (with the update) but it kept growing steadily over days.
> 
> Tracking the amount of data used by omap (ie, the internal RocksDB)
> isn't really possible to do live, and in the past we haven't done it
> at all. In Nautilus, it gets stats whenever a deep scrub happens so
> the omap data is always stale, but at least lets us approximate what's
> in use for a given PG.
> 
> So when you upgraded to Nautilus, the metadata pool scrubbed PGs over
> a period of days and each time a PG scrub finished the amount of data
> accounted to the pool as a whole increased. :)
> -Greg

Thanks for this clear explanation.

BTW what is the difference between the two following metrics:
ceph_pool_stored_raw
ceph_pool_stored

I expected ceph_pool_stored_raw to show larger values than
ceph_pool_stored, depending on the redundancy/replication level; however,
at least in our case the values are the same.
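In case it helps, this is a quick way to cross-check the two metrics
against what the cluster itself reports (a rough sketch; the mgr host
name is a placeholder and the prometheus module's default port 9283 is
an assumption about the setup):

  # scrape the mgr prometheus endpoint and pull out the two pool metrics
  curl -s http://ceph-mgr-host:9283/metrics | grep ceph_pool_stored
  # compare against the per-pool STORED/USED columns
  ceph df detail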

Best
  Dietmar

-- 
_
D i e t m a r  R i e d e r, Mag.Dr.
Innsbruck Medical University
Biocenter - Division for Bioinformatics
Innrain 80, 6020 Innsbruck
Phone: +43 512 9003 71402
Fax: +43 512 9003 73100
Email: dietmar.rie...@i-med.ac.at
Web:   http://www.icbi.at




___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


[ceph-users] OSDs failing to boot

2019-05-08 Thread Rawson, Paul L.
Hi Folks,

I'm having trouble getting some of my OSDs to boot. At some point, these 
disks got very full. I fixed the rule that was causing that, and they 
are on average ~30% full now.

I'm getting the following in my logs:

     -1> 2019-05-08 16:05:18.956 7fdc7adbbf00 -1 
/home/jenkins-build/build/workspace/ceph-build/ARCH/x86_64/AVAILABLE_ARCH/x86_64/AVAILABLE_DIST/centos7/DIST/centos7/MACHINE_SIZE/huge/release/14.2.1/rpm/el7/BUILD/ceph-14.2.1/src/include/interval_set.h:
 
In function 'void interval_set::insert(T, T, T*, T*) [with T = 
long unsigned int; Map = std::map, std::allocator > >]' thread 7fdc7adbbf00 time 
2019-05-08 16:05:18.953372
/home/jenkins-build/build/workspace/ceph-build/ARCH/x86_64/AVAILABLE_ARCH/x86_64/AVAILABLE_DIST/centos7/DIST/centos7/MACHINE_SIZE/huge/release/14.2.1/rpm/el7/BUILD/ceph-14.2.1/src/include/interval_set.h:
 
490: FAILED ceph_assert(p->first > start+len)

  ceph version 14.2.1 (d555a9489eb35f84f2e1ef49b77e19da9d113972) 
nautilus (stable)
  1: (ceph::__ceph_assert_fail(char const*, char const*, int, char 
const*)+0x14a) [0x7fdc70daa676]
  2: (ceph::__ceph_assertf_fail(char const*, char const*, int, char 
const*, char const*, ...)+0) [0x7fdc70daa844]
  3: (interval_set, std::allocator > > >::insert(unsigned long, unsigned long, unsigned 
long*, unsigned long*)+0x45f) [0x55b8960e03df]
  4: (BlueStore::allocate_bluefs_freespace(unsigned long, unsigned long, 
std::vector 
 >*)+0x74e) [0x55b89611d13e]
  5: (BlueFS::_expand_slow_device(unsigned long, 
std::vector 
 >&)+0x111) [0x55b8960c8211]
  6: (BlueFS::_allocate(unsigned char, unsigned long, 
bluefs_fnode_t*)+0x68b) [0x55b8960c8f7b]
  7: (BlueFS::_allocate(unsigned char, unsigned long, 
bluefs_fnode_t*)+0x362) [0x55b8960c8c52]
  8: (BlueFS::_flush_range(BlueFS::FileWriter*, unsigned long, unsigned 
long)+0xe5) [0x55b8960c95d5]
  9: (BlueFS::_flush(BlueFS::FileWriter*, bool)+0x10b) [0x55b8960cb43b]
  10: (BlueRocksWritableFile::Flush()+0x3d) [0x55b8962bdfcd]
  11: (rocksdb::WritableFileWriter::Flush()+0x19e) [0x55b896531a4e]
  12: (rocksdb::WritableFileWriter::Sync(bool)+0x2e) [0x55b896531d2e]
  13: (rocksdb::BuildTable(std::string const&, rocksdb::Env*, 
rocksdb::ImmutableCFOptions const&, rocksdb::MutableCFOptions const&, 
rocksdb::EnvOptions const&, rocksdb::TableCache*, 
rocksdb::InternalIteratorBase*, 
std::unique_ptr, 
std::default_delete > >, 
rocksdb::FileMetaData*, rocksdb::InternalKeyComparator const&, 
std::vector >, 
std::allocator > > > const*, 
unsigned int, std::string const&, std::vector >, unsigned long, 
rocksdb::SnapshotChecker*, rocksdb::CompressionType, 
rocksdb::CompressionOptions const&, bool, rocksdb::InternalStats*, 
rocksdb::TableFileCreationReason, rocksdb::EventLogger*, int, 
rocksdb::Env::IOPriority, rocksdb::TableProperties*, int, unsigned long, 
unsigned long, rocksdb::Env::WriteLifeTimeHint)+0x2368) [0x55b89655fb68]
  14: (rocksdb::DBImpl::WriteLevel0TableForRecovery(int, 
rocksdb::ColumnFamilyData*, rocksdb::MemTable*, 
rocksdb::VersionEdit*)+0xc66) [0x55b8963d48c6]
  15: (rocksdb::DBImpl::RecoverLogFiles(std::vector > const&, unsigned long*, bool)+0x1dce) 
[0x55b8963d6f1e]
  16: 
(rocksdb::DBImpl::Recover(std::vector > const&, bool, bool, 
bool)+0x809) [0x55b8963d7db9]
  17: (rocksdb::DBImpl::Open(rocksdb::DBOptions const&, std::string 
const&, std::vector > const&, 
std::vector >*, rocksdb::DB**, bool, 
bool)+0x658) [0x55b8963d8bc8]
  18: (rocksdb::DB::Open(rocksdb::DBOptions const&, std::string const&, 
std::vector > const&, 
std::vector >*, rocksdb::DB**)+0x24) 
[0x55b8963da3a4]
  19: (RocksDBStore::do_open(std::ostream&, bool, bool, 
std::vector > const*)+0x1660) [0x55b8961c2a80]
  20: (BlueStore::_open_db(bool, bool, bool)+0xf8e) [0x55b89611b37e]
  21: (BlueStore::_open_db_and_around(bool)+0x165) [0x55b8961388b5]
  22: (BlueStore::_fsck(bool, bool)+0xe5c) [0x55b8961692dc]
  23: (main()+0x107e) [0x55b895fc682e]
  24: (__libc_start_main()+0xf5) [0x7fdc6da4e3d5]
  25: (()+0x2718cf) [0x55b8960ac8cf]

  0> 2019-05-08 16:05:18.960 7fdc7adbbf00 -1 *** Caught signal 
(Aborted) **
  in thread 7fdc7adbbf00 thread_name:ceph-bluestore-

  ceph version 14.2.1 (d555a9489eb35f84f2e1ef49b77e19da9d113972) 
nautilus (stable)
  1: (()+0xf5d0) [0x7fdc6f2905d0]
  2: (gsignal()+0x37) [0x7fdc6da62207]
  3: (abort()+0x148) [0x7fdc6da638f8]
  4: (ceph::__ceph_assert_fail(char const*, char const*, int, char 
const*)+0x199) [0x7fdc70daa6c5]
  5: (ceph::__ceph_assertf_fail(char const*, char const*, int, char 
const*, char const*, ...)+0) [0x7fdc70daa844]
  6: (interval_set, std::allocator > > >::insert(unsigned long, unsigned long, unsigned 
long*, unsigned long*)+0x45f) [0x55b8960e03df]
  7: (BlueStore::allocate_bluefs_freespace(unsigned long, unsigned long, 
std::vector 
 >*)+0x74e) [0x55b89611d13e]
  8: (BlueFS::_expand_slow_device(unsigned long, 
std::vector 
 >&)+0x111) [0x55b8960c8211]
  9: (BlueFS::_allocate(unsigned char, unsigned long, 
bluefs_fnode_t*)+0x68b) [0x55b8960c8f7b]
  10: (BlueFS::_allo

Re: [ceph-users] Prioritized pool recovery

2019-05-08 Thread Gregory Farnum
On Mon, May 6, 2019 at 6:41 PM Kyle Brantley  wrote:
>
> On 5/6/2019 6:37 PM, Gregory Farnum wrote:
> > Hmm, I didn't know we had this functionality before. It looks to be
> > changing quite a lot at the moment, so be aware this will likely
> > require reconfiguring later.
>
> Good to know, and not a problem. In any case, I'd assume it won't change 
> substantially for luminous, correct?
>
>
> > I'm not seeing this in the luminous docs, are you sure? The source
>
> You're probably right, but there are options for this in luminous:
>
> # ceph osd pool get vm
> Invalid command: missing required parameter var([...] 
> recovery_priority|recovery_op_priority [...])
>
>
> > code indicates in Luminous it's 0-254. (As I said, things have
> > changed, so in the current master build it seems to be -10 to 10 and
> > configured a bit differently.)
>
> > The 1-63 values generally apply to op priorities within the OSD, and
> > are used as part of a weighted priority queue when selecting the next
> > op to work on out of those available; you may have been looking at
> > osd_recovery_op_priority which is on that scale and should apply to
> > individual recovery messages/ops but will not work to schedule PGs
> > differently.
>
> So I was probably looking at the OSD level then.

Ah sorry, I looked at the recovery_priority option and skipped
recovery_op_priority entirely.

So recovery_op_priority sets the priority on the message dispatch
itself and is on the 0-63 scale. I wouldn't mess around with that; the
higher you put it the more of them will be dispatched compared to
client operations.

>
> >
> >> Questions:
> >> 1) If I have pools 1-4, what would I set these values to in order to 
> >> backfill pools 1, 2, 3, and then 4 in order?
> >
> > So if I'm reading the code right, they just need to be different
> > weights, and the higher value will win when trying to get a
> > reservation if there's a queue of them. (However, it's possible that
> > lower-priority pools will send off requests first and get to do one or
> > two PGs first, then the higher-priority pool will get to do all its
> > work before that pool continues.)
>
> Where higher is 0, or higher is 254? And what's the difference between 
> recovery_priority and recovery_op_priority?

For recovery_priority larger numbers are higher. When picking a PG off
the list of pending reservations, it will take the highest priority PG
it sees, and the first request to come in within that priority.

>
> In reading the docs for the OSD, _op_ is "priority set for recovery 
> operations," and non-op is "priority set for recovery work queue." For 
> someone new to ceph such as myself, this reads like the same thing at a 
> glance. Would the recovery operations not be a part of the work queue?
>
> And would this apply the same for the pools?

When a PG needs to recover, it has to acquire a reservation slot on
the local and remote nodes (to limit the total amount of work being
done). It sends off a request, and once the limit on concurrent
reservations is hit, further requests go into a pending queue. The
recovery_priority orders that queue.
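To make that concrete, a minimal sketch of how the pool-level priorities
could be set and inspected (pool names and values are placeholders;
check the accepted range on your release first):

  # give pool1 the highest recovery/backfill priority, pool4 the lowest
  ceph osd pool set pool1 recovery_priority 4
  ceph osd pool set pool2 recovery_priority 3
  ceph osd pool set pool3 recovery_priority 2
  ceph osd pool set pool4 recovery_priority 1
  # verify -- the per-pool values show up in the detailed pool listing
  ceph osd pool ls detail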

>
> >
> >> 2) Assuming this is possible, how do I ensure that backfill isn't 
> >> prioritized over client I/O?
> >
> > This is an ongoing issue but I don't think the pool prioritization
> > will change the existing mechanisms.
>
> Okay, understood. Not a huge problem, I'm primarily looking for understanding.
>
>
> >> 3) Is there a command that enumerates the weights of the current 
> >> operations (so that I can observe what's going on)?
> >
> > "ceph osd pool ls detail" will include them.
> >
>
> Perfect!
>
> Thank you very much for the information. Once I have a little more, I'm 
> probably going to work towards sending a pull request in for the docs...
>
>
> --Kyle
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] ceph mimic and samba vfs_ceph

2019-05-08 Thread Gregory Farnum
On Wed, May 8, 2019 at 10:05 AM Ansgar Jazdzewski
 wrote:
>
> hi folks,
>
> we are trying to build a new NAS using the vfs_ceph module from samba 4.9.
>
> if I try to open the share I receive this error:
>
> May  8 06:58:44 nas01 smbd[375700]: 2019-05-08 06:58:44.732830
> 7ff3d5f6e700  0 -- 10.100.219.51:0/3414601814 >> 10.100.219.11:6789/0
> pipe(0x7ff3cc00c350 sd=6 :45626 s=1 pgs=0 cs=0 l=0
> c=0x7ff3cc008980).connect protocol feature mismatch, my
> 27ffefdfbfff < peer 27fddff8e
> fa4bffb missing 20
>
> so my guess is that I need to compile samba with the libcephfs from
> mimic, but I'm not able to because of this compile error:
>
> ../../source3/modules/vfs_ceph.c: In function ‘cephwrap_stat’:
> ../../source3/modules/vfs_ceph.c:835:11: warning: implicit declaration
> of function ‘ceph_stat’; did you mean ‘ceph_statx’?
> [-Wimplicit-function-declaration]
>   result = ceph_stat(handle->data, smb_fname->base_name, (struct stat
> *) &stbuf);
>^
>ceph_statx
> ../../source3/modules/vfs_ceph.c: In function ‘cephwrap_fstat’:
> ../../source3/modules/vfs_ceph.c:861:11: warning: implicit declaration
> of function ‘ceph_fstat’; did you mean ‘ceph_fstatx’?
> [-Wimplicit-function-declaration]
>   result = ceph_fstat(handle->data, fsp->fh->fd, (struct stat *) &stbuf);
>^~
>ceph_fstatx
> ../../source3/modules/vfs_ceph.c: In function ‘cephwrap_lstat’:
> ../../source3/modules/vfs_ceph.c:894:11: warning: implicit declaration
> of function ‘ceph_lstat’; did you mean ‘ceph_statx’?
> [-Wimplicit-function-declaration]
>   result = ceph_lstat(handle->data, smb_fname->base_name, &stbuf);
>^~
>ceph_statx
>
> maybe I can disable a feature in cephfs to avoid the error in the first place?

Hmmm unfortunately it looks like the public functions got changed and
so there isn't the standard ceph_stat any more.

Fixing the wiring wouldn't be that complicated if you can hack on the
code at all, but there are some other issues with the Samba VFS
implementation that have prevented anyone from prioritizing it so far.
(Namely, smb forks for every incoming client connection, which means
every smb client gets a completely independent cephfs client, which is
very inefficient.)
-Greg

>
> thanks for your help,
> Ansgar
> ___
> ceph-users mailing list
> ceph-users@lists.ceph.com
> http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] Nautilus: significant increase in cephfs metadata pool usage

2019-05-08 Thread Gregory Farnum
On Wed, May 8, 2019 at 5:33 AM Dietmar Rieder
 wrote:
>
> On 5/8/19 1:55 PM, Paul Emmerich wrote:
> > Nautilus properly accounts metadata usage, so nothing changed it just
> > shows up correctly now ;)
>
> OK, but then I'm not sure I understand why the increase was not sudden
> (with the update) but it kept growing steadily over days.

Tracking the amount of data used by omap (ie, the internal RocksDB)
isn't really possible to do live, and in the past we haven't done it
at all. In Nautilus, it gets stats whenever a deep scrub happens so
the omap data is always stale, but at least lets us approximate what's
in use for a given PG.

So when you upgraded to Nautilus, the metadata pool scrubbed PGs over
a period of days and each time a PG scrub finished the amount of data
accounted to the pool as a whole increased. :)
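(If you don't want to wait for the normal scrub cycle, kicking off deep
scrubs on the metadata pool's PGs should make the accounting catch up
sooner. A rough sketch -- the pool name is a placeholder and the awk
filter just grabs the PG ids from the first column:

  ceph pg ls-by-pool cephfs_metadata | \
      awk '$1 ~ /^[0-9]+\./ {print $1}' | \
      while read pg; do ceph pg deep-scrub "$pg"; done

It adds scrub load, of course, so letting the regular schedule get there
on its own is also fine.)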
-Greg

>
> ~Dietmar
>
> --
> _
> D i e t m a r  R i e d e r, Mag.Dr.
> Innsbruck Medical University
> Biocenter - Division for Bioinformatics
> Innrain 80, 6020 Innsbruck
> Phone: +43 512 9003 71402
> Fax: +43 512 9003 73100
> Email: dietmar.rie...@i-med.ac.at
> Web:   http://www.icbi.at
>
>
> ___
> ceph-users mailing list
> ceph-users@lists.ceph.com
> http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] Data moved pools but didn't move osds & backfilling+remapped loop

2019-05-08 Thread Gregory Farnum
On Wed, May 8, 2019 at 2:37 AM Marco Stuurman
 wrote:
>
> Hi,
>
> I've got an issue with the data in our pool. A RBD image containing 4TB+ data 
> has moved over to a different pool after a crush rule set change, which 
> should not be possible. Besides that it loops over and over to start 
> remapping and backfilling (goes up to 377 pg active+clean then suddenly drops 
> to 361, without crashes according to ceph -w & ceph crash ls)
>
> First about the pools:
>
> [root@CEPH-MGMT-1 ~t]# ceph df
> RAW STORAGE:
> CLASSSIZE   AVAIL  USEDRAW USED %RAW USED
> cheaphdd 16 TiB 10 TiB 5.9 TiB  5.9 TiB 36.08
> fasthdd  33 TiB 18 TiB  16 TiB   16 TiB 47.07
> TOTAL50 TiB 28 TiB  22 TiB   22 TiB 43.44
>
> POOLS:
> POOL ID STORED  OBJECTS USED %USED 
> MAX AVAIL
> pool1  37   780 B1.33M  780 B 
>   0   3.4 TiB
> pool2  48 2.0 TiB   510.57k5.9 TiB
>   42.64   2.6 TiB
>
> All data is now in pool2 while the RBD image is created in pool1 (since pool2 
> is new).
>
> The steps taken to make ceph do this are:
>
> - Add osds with a different device class (class cheaphdd)
> - Create crushruleset for cheaphdd only called cheapdisks
> - Create pool2 with new crush rule set
> - Remove device class from the previously existing devices (remove class hdd)
> - Add class fasthdd to those devices
> - Create new crushruleset fastdisks
> - Change crushruleset for pool1 to fastdisks
>
> After this, the data starts moving from pool1 to pool2; however, 
> the RBD image still works and the disks of pool1 are still filled with data.
>
> I've tried to reproduce this issue using virtual machines but I couldn't make 
> it happen again.
>
> Some extra information:
> ceph osd crush tree --show-shadow ==> https://fe.ax/639aa.H34539.txt
> ceph pg ls-by-pool pool1 ==> https://fe.ax/dcacd.H44900.txt (I know the PG 
> count is too low)
> ceph pg ls-by-pool pool2 ==> https://fe.ax/95a2c.H51533.txt
> ceph -s ==> https://fe.ax/aab41.H69711.txt
>
>
> Can someone shine a light on why the data looks like it's moved to another 
> pool and/or explain why the data in pool2 keeps remapping/backfilling in a 
> loop?

What version of Ceph are you running? Are the PGs active+clean
changing in any other way?

My guess is this is just the reporting getting messed up because none
of the cheaphdd disks are supposed to be reachable by pool1 now, and
so their disk usage is being assigned to pool2. In which case it will
clear up once all the data movement is done.

Can you confirm if it's getting better as PGs actually migrate?
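One way to sanity-check where the image's objects actually live (a rough
sketch; the image name is a placeholder):

  # note the block_name_prefix of the image
  rbd info pool1/myimage
  # objects carrying that prefix should still list in pool1 ...
  rados -p pool1 ls | grep rbd_data | head
  # ... and not in pool2
  rados -p pool2 ls | grep rbd_data | head

If the data objects are still enumerable in pool1, it really is just the
usage accounting that is confused.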

>
> Thanks!
>
>
> Kind regards,
>
> Marco Stuurman
> ___
> ceph-users mailing list
> ceph-users@lists.ceph.com
> http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] Ceph Bucket strange issues rgw.none + id and marker diferent.

2019-05-08 Thread EDH - Manuel Rios Fernandez
Eric,

Yes, we do:

time s3cmd ls s3://[BUCKET]/ --no-ssl takes close to 2 min 30 s to list
the bucket.

If we immediately run the query again, it usually times out.


Could you explain this a little more: "

With respect to your earlier message in which you included the output of `ceph 
df`, I believe the reason that default.rgw.buckets.index shows as
0 bytes used is that the index uses the metadata branch of the object to store 
its data.
"
I read on IRC today that in the Nautilus release this is now calculated
correctly and no longer shows 0 B. Is that correct?

Thanks for your response.


-Original Message-
From: J. Eric Ivancich  
Sent: Wednesday, May 8, 2019 21:00
To: EDH - Manuel Rios Fernandez ; 'Casey Bodley' 
; ceph-users@lists.ceph.com
Subject: Re: [ceph-users] Ceph Bucket strange issues rgw.none + id and marker 
diferent.

Hi Manuel,

My response is interleaved.

On 5/7/19 7:32 PM, EDH - Manuel Rios Fernandez wrote:
> Hi Eric,
> 
> This looks like something the software developer must do, not something that 
> the storage provider must allow, no?

True -- so you're using `radosgw-admin bucket list --bucket=XYZ` to list the 
bucket? Currently we do not allow for a "--allow-unordered" flag, but there's 
no reason we could not. I'm working on the PR now, although it might take some 
time before it gets to v13.

> The strange behavior is that sometimes the bucket lists quickly, in less than 
> 30 secs, and other times it times out after 600 secs; the bucket contains 875 
> folders with a total of 6 million objects.
> 
> I don't know how a simple list of 875 folders can time out after 600 
> secs

Burkhard Linke's comment is on target. The "folders" are a trick using 
delimiters. A bucket is really entirely flat without a hierarchy.

> We bought several NVMe Optane cards to create 4 partitions on each PCIe card 
> and get up to 1,000,000 IOPS for the index. Quite expensive, because we 
> calculate that our index is just 4 GB (100-200M objects); we are waiting for 
> those cards. Any more ideas?

With respect to your earlier message in which you included the output of `ceph 
df`, I believe the reason that default.rgw.buckets.index shows as
0 bytes used is that the index uses the metadata branch of the object to store 
its data.

> Regards

Eric

___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] Ceph Bucket strange issues rgw.none + id and marker diferent.

2019-05-08 Thread J. Eric Ivancich
Hi Manuel,

My response is interleaved.

On 5/7/19 7:32 PM, EDH - Manuel Rios Fernandez wrote:
> Hi Eric,
> 
> This looks like something the software developer must do, not something that 
> the storage provider must allow, no?

True -- so you're using `radosgw-admin bucket list --bucket=XYZ` to list
the bucket? Currently we do not allow for a "--allow-unordered" flag,
but there's no reason we could not. I'm working on the PR now, although
it might take some time before it gets to v13.

> The strange behavior is that sometimes the bucket lists quickly, in less than 
> 30 secs, and other times it times out after 600 secs; the bucket contains 875 
> folders with a total of 6 million objects.
> 
> I don't know how a simple list of 875 folders can time out after 600 secs

Burkhard Linke's comment is on target. The "folders" are a trick using
delimiters. A bucket is really entirely flat without a hierarchy.

> We bought several NVMe Optane cards to create 4 partitions on each PCIe card 
> and get up to 1,000,000 IOPS for the index. Quite expensive, because we 
> calculate that our index is just 4 GB (100-200M objects); we are waiting for 
> those cards. Any more ideas?

With respect to your earlier message in which you included the output of
`ceph df`, I believe the reason that default.rgw.buckets.index shows as
0 bytes used is that the index uses the metadata branch of the object to
store its data.

> Regards

Eric
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


[ceph-users] ceph mimic and samba vfs_ceph

2019-05-08 Thread Ansgar Jazdzewski
hi folks,

we try to build a new NAS using the vfs_ceph modul from samba 4.9.

if i try to open the share i recive the error:

May  8 06:58:44 nas01 smbd[375700]: 2019-05-08 06:58:44.732830
7ff3d5f6e700  0 -- 10.100.219.51:0/3414601814 >> 10.100.219.11:6789/0
pipe(0x7ff3cc00c350 sd=6 :45626 s=1 pgs=0 cs=0 l=0
c=0x7ff3cc008980).connect protocol feature mismatch, my
27ffefdfbfff < peer 27fddff8e
fa4bffb missing 20

so my guess is that i need to compile samba with the libcephfs from
mimic but i'am not able to because of this compile-error:

../../source3/modules/vfs_ceph.c: In function ‘cephwrap_stat’:
../../source3/modules/vfs_ceph.c:835:11: warning: implicit declaration
of function ‘ceph_stat’; did you mean ‘ceph_statx’?
[-Wimplicit-function-declaration]
  result = ceph_stat(handle->data, smb_fname->base_name, (struct stat
*) &stbuf);
   ^
   ceph_statx
../../source3/modules/vfs_ceph.c: In function ‘cephwrap_fstat’:
../../source3/modules/vfs_ceph.c:861:11: warning: implicit declaration
of function ‘ceph_fstat’; did you mean ‘ceph_fstatx’?
[-Wimplicit-function-declaration]
  result = ceph_fstat(handle->data, fsp->fh->fd, (struct stat *) &stbuf);
   ^~
   ceph_fstatx
../../source3/modules/vfs_ceph.c: In function ‘cephwrap_lstat’:
../../source3/modules/vfs_ceph.c:894:11: warning: implicit declaration
of function ‘ceph_lstat’; did you mean ‘ceph_statx’?
[-Wimplicit-function-declaration]
  result = ceph_lstat(handle->data, smb_fname->base_name, &stbuf);
   ^~
   ceph_statx

maybe i can disable a feature in cephfs to avoid the error in the first place?

thanks for your help,
Ansgar
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


[ceph-users] Delta Lake Support

2019-05-08 Thread Scottix
Hey Cephers,
There is a new OSS project called Delta Lake: https://delta.io/

It is compatible with HDFS but seems ripe for adding Ceph support as a
storage backend. Just want to put this on the radar for any feelers.

Best
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] Clients failing to respond to cache pressure

2019-05-08 Thread Patrick Donnelly
On Wed, May 8, 2019 at 4:10 AM Stolte, Felix  wrote:
>
> Hi folks,
>
> we are running a luminous cluster and using the cephfs for fileservices. We 
> use Tivoli Storage Manager to backup all data in the ceph filesystem to tape 
> for disaster recovery. Backup runs on two dedicated servers, which mounted 
> the cephfs via kernel mount. In order to complete the Backup in time we are 
> using 60 Backup Threads per Server. While backup is running, ceph health 
> often changes from “OK” to “2 clients failing to respond to cache pressure”. 
> After investigating and doing research in the mailing list I set the 
> following parameters:
>
> mds_cache_memory_limit = 34359738368 (32 GB) on MDS Server
>
> client_oc_size = 104857600 (100 MB, default is 200 MB) on Backup Servers
>
> All Servers running Ubuntu 18.04 with Kernel 4.15.0-47 and ceph 12.2.11. We 
> have 3 MDS Servers, 1 Active, 2 Standby. Changing to multiple active MDS 
> Servers is not an option, since we are planning to use snapshots. Cephfs 
> holds 78,815,975 files.
>
> Any advice on getting rid of the Warning would be very much appreciated. On a 
> sidenote: Although MDS Cache Memory is set to 32GB htop shows 60GB Memory 
> Usage for the ceph-mds process

With clients doing backup it's likely that they hold millions of caps.
This is not a good situation to be in. I recommend upgrading to
12.2.12 as we recently backported a fix for the MDS to limit the
number of caps held by clients to 1M. Additionally, trimming the cache
and recalling caps is now throttled. This may help a lot for your
workload.

Note that these fixes haven't been backported to Mimic yet.
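In the meantime, it can be useful to see how many caps each client
session is actually holding, and, on releases that already carry the
backport, to cap it. A sketch only -- the MDS name is a placeholder and
the option name/value are assumptions to verify against your release:

  # on the active MDS host: per-session client info, including num_caps
  ceph daemon mds.<mds-name> session ls
  # releases with the backport (e.g. 12.2.12) should accept something like:
  ceph tell mds.<mds-name> injectargs '--mds_max_caps_per_client 1048576'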

-- 
Patrick Donnelly
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] Clients failing to respond to cache pressure

2019-05-08 Thread Stolte, Felix
Hi Paul,

we are using Kernel 4.15.0-47.

Regards 
Felix

IT-Services
Telefon 02461 61-9243
E-Mail: f.sto...@fz-juelich.de
-
-
Forschungszentrum Juelich GmbH
52425 Juelich
Sitz der Gesellschaft: Juelich
Eingetragen im Handelsregister des Amtsgerichts Dueren Nr. HR B 3498
Vorsitzender des Aufsichtsrats: MinDir Dr. Karl Eugen Huthmacher
Geschaeftsfuehrung: Prof. Dr.-Ing. Wolfgang Marquardt (Vorsitzender),
Karsten Beneke (stellv. Vorsitzender), Prof. Dr.-Ing. Harald Bolt,
Prof. Dr. Sebastian M. Schmidt
-
-
 

On 08.05.19, 13:58, "Paul Emmerich"  wrote:

Which kernel are you using on the clients?

Paul
-- 
Paul Emmerich

Looking for help with your Ceph cluster? Contact us at https://croit.io

croit GmbH
Freseniusstr. 31h
81247 München
www.croit.io
Tel: +49 89 1896585 90

On Wed, May 8, 2019 at 1:10 PM Stolte, Felix  wrote:
>
> Hi folks,
>
> we are running a luminous cluster and using the cephfs for fileservices. 
We use Tivoli Storage Manager to backup all data in the ceph filesystem to tape 
for disaster recovery. Backup runs on two dedicated servers, which mounted the 
cephfs via kernel mount. In order to complete the Backup in time we are using 
60 Backup Threads per Server. While backup is running, ceph health often 
changes from “OK” to “2 clients failing to respond to cache pressure”. After 
investigating and doing research in the mailing list I set the following 
parameters:
>
> mds_cache_memory_limit = 34359738368 (32 GB) on MDS Server
>
> client_oc_size = 104857600 (100 MB, default is 200 MB) on Backup Servers
>
> All Servers running Ubuntu 18.04 with Kernel 4.15.0-47 and ceph 12.2.11. 
We have 3 MDS Servers, 1 Active, 2 Standby. Changing to multiple active MDS 
Servers is not an option, since we are planning to use snapshots. Cephfs holds 
78,815,975 files.
>
> Any advice on getting rid of the Warning would be very much appreciated. 
On a sidenote: Although MDS Cache Memory is set to 32GB htop shows 60GB Memory 
Usage for the ceph-mds process
>
> Best regards
> Felix
>
> 
-
> 
-
> Forschungszentrum Juelich GmbH
> 52425 Juelich
> Sitz der Gesellschaft: Juelich
> Eingetragen im Handelsregister des Amtsgerichts Dueren Nr. HR B 3498
> Vorsitzender des Aufsichtsrats: MinDir Dr. Karl Eugen Huthmacher
> Geschaeftsfuehrung: Prof. Dr.-Ing. Wolfgang Marquardt (Vorsitzender),
> Karsten Beneke (stellv. Vorsitzender), Prof. Dr.-Ing. Harald Bolt,
> Prof. Dr. Sebastian M. Schmidt
> 
-
> 
-
>
>
> ___
> ceph-users mailing list
> ceph-users@lists.ceph.com
> http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] Stalls on new RBD images.

2019-05-08 Thread Jason Dillaman
On Wed, May 8, 2019 at 7:26 AM  wrote:
>
> Hi.
>
> I'm fishing a bit here.
>
> What we see is that with new VM/RBD/SSD-backed images, performance can
> be lousy until they have been "fully written" for the first time. It is
> sort of like they are thin-provisioned and the subsequent growing of
> the images in Ceph delivers a performance hit.

Do you have object-map enabled? On a very fast flash-based Ceph
cluster, the object-map becomes a bottleneck on empty RBD images since
the OSDs are only capable of performing ~2-3K object map updates /
second. Since the object-map is only updated when a backing object is
first written, that could account for initial performance hit.
However, once the object-map is updated, it is no longer in the IO
path so you can achieve 10s of thousands of writes per second.
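A quick way to check, and, if you decide the trade-off is worth it, to
drop the feature -- a sketch only, the pool/image name is a placeholder,
and note that dropping object-map also loses fast-diff and fast "rbd du":

  # see which features the image has enabled
  rbd info rbd/vm-disk-1 | grep features
  # optionally drop fast-diff and object-map (fast-diff depends on object-map)
  rbd feature disable rbd/vm-disk-1 fast-diff object-map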

> Does anyone else have something similar in their setup - how do you deal
> with it?
>
> KVM based virtualization, Ceph Luminous.
>
> Any suggestions/hints welcome
>
> Jesper
>
> ___
> ceph-users mailing list
> ceph-users@lists.ceph.com
> http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com



-- 
Jason
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] Nautilus: significant increase in cephfs metadata pool usage

2019-05-08 Thread Dietmar Rieder
On 5/8/19 1:55 PM, Paul Emmerich wrote:
> Nautilus properly accounts metadata usage, so nothing changed it just
> shows up correctly now ;)

OK, but then I'm not sure I understand why the increase was not sudden
(with the update) but it kept growing steadily over days.

~Dietmar

-- 
_
D i e t m a r  R i e d e r, Mag.Dr.
Innsbruck Medical University
Biocenter - Division for Bioinformatics
Innrain 80, 6020 Innsbruck
Phone: +43 512 9003 71402
Fax: +43 512 9003 73100
Email: dietmar.rie...@i-med.ac.at
Web:   http://www.icbi.at




___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


[ceph-users] What is recommended ceph docker image for use

2019-05-08 Thread Ignat Zapolsky
Hi,

Just a question: what is the recommended docker container image to use for Ceph?

The Ceph website is saying that 12.2.x is LTR, but there are at least 2 more 
releases on Docker Hub – 13 and 14.

Would there be any advice on selecting between the 3 releases?

Sent from Mail for Windows 10


-- 
This email and any files transmitted with it are confidential and intended 
solely for the use of the individual or entity to whom they are addressed. 
If you have received this email in error please notify the system manager. 
This message contains confidential information and is intended only for the 
individual named. If you are not the named addressee you should not 
disseminate, distribute or copy this e-mail.


___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] Clients failing to respond to cache pressure

2019-05-08 Thread Stolte, Felix


smime.p7m
Description: S/MIME encrypted message
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] Clients failing to respond to cache pressure

2019-05-08 Thread Paul Emmerich
Which kernel are you using on the clients?

Paul
-- 
Paul Emmerich

Looking for help with your Ceph cluster? Contact us at https://croit.io

croit GmbH
Freseniusstr. 31h
81247 München
www.croit.io
Tel: +49 89 1896585 90

On Wed, May 8, 2019 at 1:10 PM Stolte, Felix  wrote:
>
> Hi folks,
>
> we are running a luminous cluster and using the cephfs for fileservices. We 
> use Tivoli Storage Manager to backup all data in the ceph filesystem to tape 
> for disaster recovery. Backup runs on two dedicated servers, which mounted 
> the cephfs via kernel mount. In order to complete the Backup in time we are 
> using 60 Backup Threads per Server. While backup is running, ceph health 
> often changes from “OK” to “2 clients failing to respond to cache pressure”. 
> After investigating and doing research in the mailing list I set the 
> following parameters:
>
> mds_cache_memory_limit = 34359738368 (32 GB) on MDS Server
>
> client_oc_size = 104857600 (100 MB, default is 200 MB) on Backup Servers
>
> All Servers running Ubuntu 18.04 with Kernel 4.15.0-47 and ceph 12.2.11. We 
> have 3 MDS Servers, 1 Active, 2 Standby. Changing to multiple active MDS 
> Servers is not an option, since we are planning to use snapshots. Cephfs 
> holds 78,815,975 files.
>
> Any advice on getting rid of the Warning would be very much appreciated. On a 
> sidenote: Although MDS Cache Memory is set to 32GB htop shows 60GB Memory 
> Usage for the ceph-mds process
>
> Best regards
> Felix
>
> -
> -
> Forschungszentrum Juelich GmbH
> 52425 Juelich
> Sitz der Gesellschaft: Juelich
> Eingetragen im Handelsregister des Amtsgerichts Dueren Nr. HR B 3498
> Vorsitzender des Aufsichtsrats: MinDir Dr. Karl Eugen Huthmacher
> Geschaeftsfuehrung: Prof. Dr.-Ing. Wolfgang Marquardt (Vorsitzender),
> Karsten Beneke (stellv. Vorsitzender), Prof. Dr.-Ing. Harald Bolt,
> Prof. Dr. Sebastian M. Schmidt
> -
> -
>
>
> ___
> ceph-users mailing list
> ceph-users@lists.ceph.com
> http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] Nautilus: significant increase in cephfs metadata pool usage

2019-05-08 Thread Paul Emmerich
Nautilus properly accounts metadata usage, so nothing changed it just
shows up correctly now ;)


Paul

-- 
Paul Emmerich

Looking for help with your Ceph cluster? Contact us at https://croit.io

croit GmbH
Freseniusstr. 31h
81247 München
www.croit.io
Tel: +49 89 1896585 90

On Wed, May 8, 2019 at 12:32 PM Dietmar Rieder
 wrote:
>
> Hi,
>
> we just recently upgraded our cluster from luminous 12.2.10 to nautilus
> 14.2.1 and I noticed a massive increase of the space used on the cephfs
> metadata pool although the used space in the 2 data pools  basically did
> not change. See the attached graph (NOTE: log10 scale on y-axis)
>
> Is there any reason that explains this?
>
> Thanks
>   Dietmar
>
>
> --
> _
> D i e t m a r  R i e d e r, Mag.Dr.
> Innsbruck Medical University
> Biocenter - Division for Bioinformatics
> Innrain 80, 6020 Innsbruck
> Phone: +43 512 9003 71402
> Fax: +43 512 9003 73100
> Email: dietmar.rie...@i-med.ac.at
> Web:   http://www.icbi.at
>
> ___
> ceph-users mailing list
> ceph-users@lists.ceph.com
> http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


[ceph-users] Stalls on new RBD images.

2019-05-08 Thread jesper
Hi.

I'm fishing a bit here.

What we see is that with new VM/RBD/SSD-backed images, performance can
be lousy until they have been "fully written" for the first time. It is
sort of like they are thin-provisioned and the subsequent growing of
the images in Ceph delivers a performance hit.

Does anyone else have something similar in their setup - how do you deal
with it?

KVM based virtualization, Ceph Luminous.

Any suggestions/hints welcome

Jesper

___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


[ceph-users] Clients failing to respond to cache pressure

2019-05-08 Thread Stolte, Felix
Hi folks,
 
we are running a luminous cluster and using the cephfs for fileservices. We use 
Tivoli Storage Manager to backup all data in the ceph filesystem to tape for 
disaster recovery. Backup runs on two dedicated servers, which mounted the 
cephfs via kernel mount. In order to complete the Backup in time we are using 
60 Backup Threads per Server. While backup is running, ceph health often 
changes from “OK” to “2 clients failing to respond to cache pressure”. After 
investigating and doing research in the mailing list I set the following 
parameters:
 
mds_cache_memory_limit = 34359738368 (32 GB) on MDS Server
 
client_oc_size = 104857600 (100 MB, default is 200 MB) on Backup Servers
 
All Servers running Ubuntu 18.04 with Kernel 4.15.0-47 and ceph 12.2.11. We 
have 3 MDS Servers, 1 Active, 2 Standby. Changing to multiple active MDS 
Servers is not an option, since we are planning to use snapshots. Cephfs holds 
78,815,975 files.
 
Any advice on getting rid of the Warning would be very much appreciated. On a 
sidenote: Although MDS Cache Memory is set to 32GB htop shows 60GB Memory Usage 
for the ceph-mds process
 
Best regards
Felix

-
-
Forschungszentrum Juelich GmbH
52425 Juelich
Sitz der Gesellschaft: Juelich
Eingetragen im Handelsregister des Amtsgerichts Dueren Nr. HR B 3498
Vorsitzender des Aufsichtsrats: MinDir Dr. Karl Eugen Huthmacher
Geschaeftsfuehrung: Prof. Dr.-Ing. Wolfgang Marquardt (Vorsitzender),
Karsten Beneke (stellv. Vorsitzender), Prof. Dr.-Ing. Harald Bolt,
Prof. Dr. Sebastian M. Schmidt
-
-
 

___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


[ceph-users] Nautilus: significant increase in cephfs metadata pool usage

2019-05-08 Thread Dietmar Rieder
Hi,

we just recently upgraded our cluster from luminous 12.2.10 to nautilus
14.2.1 and I noticed a massive increase of the space used on the cephfs
metadata pool although the used space in the 2 data pools  basically did
not change. See the attached graph (NOTE: log10 scale on y-axis)

Is there any reason that explains this?

Thanks
  Dietmar


-- 
_
D i e t m a r  R i e d e r, Mag.Dr.
Innsbruck Medical University
Biocenter - Division for Bioinformatics
Innrain 80, 6020 Innsbruck
Phone: +43 512 9003 71402
Fax: +43 512 9003 73100
Email: dietmar.rie...@i-med.ac.at
Web:   http://www.icbi.at



___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


[ceph-users] Data moved pools but didn't move osds & backfilling+remapped loop

2019-05-08 Thread Marco Stuurman
Hi,

I've got an issue with the data in our pool. A RBD image containing 4TB+
data has moved over to a different pool after a crush rule set change,
which should not be possible. Besides that it loops over and over to start
remapping and backfilling (goes up to 377 pg active+clean then suddenly
drops to 361, without crashes according to ceph -w & ceph crash ls)

First about the pools:

[root@CEPH-MGMT-1 ~t]# ceph df
RAW STORAGE:
CLASSSIZE   AVAIL  USEDRAW USED %RAW USED
cheaphdd 16 TiB 10 TiB 5.9 TiB  5.9 TiB 36.08
fasthdd  33 TiB 18 TiB  16 TiB   16 TiB 47.07
TOTAL50 TiB 28 TiB  22 TiB   22 TiB 43.44

POOLS:
POOL ID STORED  OBJECTS USED %USED
 MAX AVAIL
pool1  37   780 B1.33M  780 B
 0   3.4 TiB
pool2  48 2.0 TiB   510.57k5.9 TiB
42.64   2.6 TiB

All data is now in pool2 while the RBD image is created in pool1 (since
pool2 is new).

The steps taken to make ceph do this are:

- Add osds with a different device class (class cheaphdd)
- Create crushruleset for cheaphdd only called cheapdisks
- Create pool2 with new crush rule set
- Remove device class from the previously existing devices (remove class
hdd)
- Add class fasthdd to those devices
- Create new crushruleset fastdisks
- Change crushruleset for pool1 to fastdisks

After this, the data starts moving from pool1 to pool2; however,
the RBD image still works and the disks of pool1 are still filled with data.

I've tried to reproduce this issue using virtual machines but I couldn't
make it happen again.

Some extra information:
ceph osd crush tree --show-shadow ==> https://fe.ax/639aa.H34539.txt
ceph pg ls-by-pool pool1 ==> https://fe.ax/dcacd.H44900.txt (I know the PG
count is too low)
ceph pg ls-by-pool pool2 ==> https://fe.ax/95a2c.H51533.txt
ceph -s ==> https://fe.ax/aab41.H69711.txt


Can someone shine a light on why the data looks like it's moved to another
pool and/or explain why the data in pool2 keeps remapping/backfilling in a
loop?

Thanks!


Kind regards,

Marco Stuurman
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


[ceph-users] clients failing to respond to cache pressure

2019-05-08 Thread Stolte, Felix


smime.p7m
Description: S/MIME encrypted message
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] Ceph Bucket strange issues rgw.none + id and marker diferent.

2019-05-08 Thread Burkhard Linke

Hi,


just a comment (and please correct me if I'm wrong)


There are no "folders" in S3. A bucket is a plain list of objects. What 
you recognize as a folder is an artificial construct, e.g. the usual path 
delimiter used by S3 access tools to create "folders".



As a result, listing a bucket with 6 million objects in 875 "folders" 
does require listing all 6 million objects. You can validate this by 
looking at the requests sent to the RGW (for example using 's3cmd -d la'):


...

DEBUG: Sending request method_string='GET', uri='/?delimiter=/', 
headers={'x-amz-content-sha256': 'XXX', 'Authorization': 
'AWS4-HMAC-SHA256 
Credential=XXX/US/s3/aws4_request,SignedHeaders=host;x-amz-content-sha256;x-amz-date,Signature=XX', 
'x-amz-date': '20190508T073339Z'}, body=(0 bytes)





And compare the request URL to the S3 API spec:

https://docs.aws.amazon.com/AmazonS3/latest/API/v2-RESTBucketGET.html


'delimiter=/' is just a convenience parameter for grouping the results. 
The implementation still has to enumerate all objects.
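To illustrate the difference in practice (a sketch only; the bucket and
"folder" names are placeholders): a delimiter-only listing walks the
whole bucket index, while a prefix-scoped listing should only need to
enumerate the keys under that prefix.

  # delimiter only -> RGW still enumerates all ~6M keys
  time s3cmd ls s3://BUCKET/
  # prefix + delimiter -> only keys under that "folder" are enumerated
  time s3cmd ls s3://BUCKET/somefolder/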



Regards,

Burkhard


--
Dr. rer. nat. Burkhard Linke
Bioinformatics and Systems Biology
Justus-Liebig-University Giessen
35392 Giessen, Germany
Phone: (+49) (0)641 9935810

___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com