Re: [ceph-users] Radosgw Timeout

2014-05-23 Thread Georg Höllrigl

On 22.05.2014 15:36, Yehuda Sadeh wrote:

On Thu, May 22, 2014 at 6:16 AM, Georg Höllrigl
georg.hoellr...@xidras.com wrote:

Hello List,

Using the radosgw works fine, as long as the amount of data doesn't get too
big.

I have created one bucket that holds many small files, separated into
different directories. But whenever I try to access the bucket, I only run
into some timeout. The timeout is at around 30 - 100 seconds. This is
smaller than the Apache timeout of 300 seconds.

I've tried to access the bucket with different clients - one thing is s3cmd
- which is still able to upload things, but takes a rather long time when
listing the contents.
Then I've tried with s3fs-fuse - which throws
ls: reading directory .: Input/output error

Also Cyberduck and S3Browser show similar behavior.

Is there an option to only send back maybe 1000 list entries, like Amazon
does? So that the client can decide whether it wants to list all the contents?



That's how it works; it doesn't return more than 1000 entries at once.


OK. I found that in the requests. So it's the client that states how 
many objects should be in the listing, by sending the max-keys=1000 
parameter:


- - - [23/May/2014:08:49:33 +] "GET /test/?delimiter=%2F&max-keys=1000&prefix HTTP/1.1" 200 715 "-" "Cyberduck/4.4.4 (14505) (Windows NT (unknown)/6.2) (x86)" xidrasservice.com:443
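For what it's worth, here is a minimal sketch of how a client can page 
through such a listing itself, using the boto S3 library (the endpoint, 
credentials and bucket name are placeholders, not my real ones):

import boto
import boto.s3.connection

# Placeholders - replace with your own gateway host and keys.
conn = boto.connect_s3(
    aws_access_key_id='ACCESS_KEY',
    aws_secret_access_key='SECRET_KEY',
    host='xidrasservice.com',
    is_secure=True,
    calling_format=boto.s3.connection.OrdinaryCallingFormat(),
)

bucket = conn.get_bucket('test')
marker = ''
while True:
    # Ask for at most 1000 keys per request, continuing after 'marker'.
    batch = bucket.get_all_keys(max_keys=1000, marker=marker)
    for key in batch:
        print(key.name)
    if not batch.is_truncated:
        break
    marker = batch[-1].name

Clients like Cyberduck do the same thing under the hood; the gateway 
itself never returns more than max-keys entries per request.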



Are there any timeout values in radosgw?


Are you sure the timeout is in the gateway itself? Could be apache
that is timing out. Will need to see the apache access logs for these
operations, radosgw debug and messenger logs (debug rgw = 20, debug ms
= 1), to give a better answer.


No, I'm not sure where the timeout comes from. As far as I can tell, 
apache times out after 300 seconds - so that should not be the problem.


I think I found something in the apache logs:
[Fri May 23 08:59:39.385548 2014] [fastcgi:error] [pid 3035:tid 
140723006891776] [client 10.0.1.66:46049] FastCGI: comm with server 
/var/www/s3gw.fcgi aborted: idle timeout (30 sec)
[Fri May 23 08:59:39.385604 2014] [fastcgi:error] [pid 3035:tid 
140723006891776] [client 10.0.1.66:46049] FastCGI: incomplete headers (0 
bytes) received from server /var/www/s3gw.fcgi


I've increased the timeout to 900 in the apache vhosts config:
FastCgiExternalServer /var/www/s3gw.fcgi -socket 
/var/run/ceph/radosgw.vvx-ceph-m-02 -idle-timeout 900

Now it's not working, and I don't get a log entry any more.

Most interesting when watching the debug output: I can see that radosgw 
successfully finishes the request - but at the same time, the client 
tells me it failed.


I've shortened the log file; as far as I can see, the info repeats 
itself...


2014-05-23 09:38:43.051395 7f1b427fc700  1 ====== starting new request req=0x7f1b3400f1c0 =====
2014-05-23 09:38:43.051597 7f1b427fc700  1 -- 10.0.1.107:0/1005898 --> 10.0.1.199:6800/14453 -- osd_op(client.72942.0:120 UHXW458EH1RVULE1BCEH [getxattrs,stat] 11.10193f7e ack+read e279) v4 -- ?+0 0x7f1b4640 con 0x2455930
2014-05-23 09:38:43.053180 7f1b96d80700  1 -- 10.0.1.107:0/1005898 <== osd.0 10.0.1.199:6800/14453 23 ==== osd_op_reply(120 UHXW458EH1RVULE1BCEH [getxattrs,stat] v0'0 uv1 ondisk = 0) v6 ==== 229+0+20 (1060030390 0 1010060712) 0x7f1b58002540 con 0x2455930
2014-05-23 09:38:43.053380 7f1b427fc700  1 -- 10.0.1.107:0/1005898 --> 10.0.1.199:6800/14453 -- osd_op(client.72942.0:121 UHXW458EH1RVULE1BCEH [read 0~524288] 11.10193f7e ack+read e279) v4 -- ?+0 0x7f1b45d0 con 0x2455930
2014-05-23 09:38:43.054359 7f1b96d80700  1 -- 10.0.1.107:0/1005898 <== osd.0 10.0.1.199:6800/14453 24 ==== osd_op_reply(121 UHXW458EH1RVULE1BCEH [read 0~8] v0'0 uv1 ondisk = 0) v6 ==== 187+0+8 (3510944971 0 3829959217) 0x7f1b580057b0 con 0x2455930
2014-05-23 09:38:43.054490 7f1b427fc700  1 -- 10.0.1.107:0/1005898 --> 10.0.1.199:6806/15018 -- osd_op(client.72942.0:122 macm [getxattrs,stat] 7.1069f101 ack+read e279) v4 -- ?+0 0x7f1b6010 con 0x2457de0
2014-05-23 09:38:43.055871 7f1b96d80700  1 -- 10.0.1.107:0/1005898 <== osd.2 10.0.1.199:6806/15018 3 ==== osd_op_reply(122 macm [getxattrs,stat] v0'0 uv46 ondisk = 0) v6 ==== 213+0+91 (22324782 0 2022698800) 0x7f1b500025a0 con 0x2457de0
2014-05-23 09:38:43.055963 7f1b427fc700  1 -- 10.0.1.107:0/1005898 --> 10.0.1.199:6806/15018 -- osd_op(client.72942.0:123 macm [read 0~524288] 7.1069f101 ack+read e279) v4 -- ?+0 0x7f1b3950 con 0x2457de0
2014-05-23 09:38:43.057087 7f1b96d80700  1 -- 10.0.1.107:0/1005898 <== osd.2 10.0.1.199:6806/15018 4 ==== osd_op_reply(123 macm [read 0~310] v0'0 uv46 ondisk = 0) v6 ==== 171+0+310 (3762965810 0 1648184722) 0x7f1b500026e0 con 0x2457de0
2014-05-23 09:38:43.057364 7f1b427fc700  1 -- 10.0.1.107:0/1005898 --> 10.0.0.26:6809/4834 -- osd_op(client.72942.0:124 store [call version.read,getxattrs,stat] 5.c5755cee ack+read e279) v4 -- ?+0 0x7f1b66b0 con 0x7f1b440022e0
2014-05-23 09:38:43.059223 7f1b96d80700  1 -- 10.0.1.107:0/1005898 <== osd.7 10.0.0.26:6809/4834 37 ==== osd_op_reply

Re: [ceph-users] Radosgw Timeout

2014-05-23 Thread Georg Höllrigl
Thank you very much - I think I've solved the whole thing. It wasn't in 
radosgw.


The solution was:
- increase the timeout in the Apache conf
- when using haproxy, also increase the timeouts there (see the sketch below)!
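For anyone hitting the same thing, roughly what that looks like (the 
values and socket path are just examples from my setup, adjust to yours):

# Apache vhost (mod_fastcgi) - raise the idle timeout for the radosgw socket
FastCgiExternalServer /var/www/s3gw.fcgi -socket /var/run/ceph/radosgw.vvx-ceph-m-02 -idle-timeout 900

# haproxy.cfg - raise the client/server timeouts as well
defaults
    timeout connect 10s
    timeout client  900s
    timeout server  900s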


Georg

On 22.05.2014 15:36, Yehuda Sadeh wrote:

On Thu, May 22, 2014 at 6:16 AM, Georg Höllrigl
georg.hoellr...@xidras.com wrote:

Hello List,

Using the radosgw works fine, as long as the amount of data doesn't get too
big.

I have created one bucket that holds many small files, separated into
different directories. But whenever I try to access the bucket, I only run
into some timeout. The timeout is at around 30 - 100 seconds. This is
smaller than the Apache timeout of 300 seconds.

I've tried to access the bucket with different clients - one thing is s3cmd
- which is still able to upload things, but takes a rather long time when
listing the contents.
Then I've tried with s3fs-fuse - which throws
ls: reading directory .: Input/output error

Also Cyberduck and S3Browser show similar behavior.

Is there an option to only send back maybe 1000 list entries, like Amazon
does? So that the client can decide whether it wants to list all the contents?



That's how it works; it doesn't return more than 1000 entries at once.



Are there any timeout values in radosgw?


Are you sure the timeout is in the gateway itself? Could be apache
that is timing out. Will need to see the apache access logs for these
operations, radosgw debug and messenger logs (debug rgw = 20, debug ms
= 1), to give a better answer.

Yehuda


___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] Radosgw Timeout

2014-05-23 Thread Georg Höllrigl



On 22.05.2014 17:30, Craig Lewis wrote:

On 5/22/14 06:16 , Georg Höllrigl wrote:


I have created one bucket that holds many small files, separated into
different directories. But whenever I try to access the bucket, I
only run into some timeout. The timeout is at around 30 - 100 seconds.
This is smaller than the Apache timeout of 300 seconds.


Just so we're all talking about the same things, what does "many small
files" mean to you?  Also, how are you separating them into
"directories"?  Are you just giving files in the same directory the
same leading string, like dir1_subdir1_filename?


I can only estimate how many files. ATM I have 25M files on the origin, but 
only 1/10th has been synced to radosgw. These are distributed through 20 
folders, each containing about 2k directories with ~ 100 - 500 files each.

Do you think that's too much for that use case?


I'm putting about 1M objects, random sizes, in each bucket.  I'm not
having problems getting individual files, or uploading new ones.  It
does take a long time for s3cmd to list the contents of the bucket. The
only time I get timeouts is when my cluster is very unhealthy.

If you're doing a lot more than that, say 10M or 100M objects, then that
could cause a hot spot on disk.  You might be better off taking your
directories, and putting them in their own bucket.
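A cheap way to keep individual listings small, without moving anything, 
is to list only one prefix ("directory") at a time instead of the whole 
bucket; a sketch with s3cmd (bucket and prefix names are made up):

s3cmd ls s3://mybucket/                # top-level "directories" only (delimiter /)
s3cmd ls s3://mybucket/folder01/       # only the keys under that prefix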


--

*Craig Lewis*
Senior Systems Engineer
Office +1.714.602.1309
Email cle...@centraldesktop.com

*Central Desktop. Work together in ways you never thought possible.*
Connect with us Website http://www.centraldesktop.com/  | Twitter
http://www.twitter.com/centraldesktop  | Facebook
http://www.facebook.com/CentralDesktop  | LinkedIn
http://www.linkedin.com/groups?gid=147417  | Blog
http://cdblog.centraldesktop.com/



___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com



[ceph-users] Radosgw Timeout

2014-05-22 Thread Georg Höllrigl

Hello List,

Using the radosgw works fine, as long as the amount of data doesn't get 
too big.


I have created one bucket that holds many small files, separated into 
different directories. But whenever I try to access the bucket, I only 
run into some timeout. The timeout is at around 30 - 100 seconds. This 
is smaller than the Apache timeout of 300 seconds.


I've tried to access the bucket with different clients - one thing is 
s3cmd - which is still able to upload things, but takes a rather long 
time when listing the contents.

Then I've tried with s3fs-fuse - which throws
ls: reading directory .: Input/output error

Also Cyberduck and S3Browser show similar behavior.

Is there an option to only send back maybe 1000 list entries, like 
Amazon does? So that the client can decide whether it wants to list all 
the contents?


Are there any timeout values in radosgw?

Any further thoughts on how I could increase performance on these listings?


Kind Regards,
Georg
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] Pool without Name

2014-05-15 Thread Georg Höllrigl

On 14.05.2014 17:26, Wido den Hollander wrote:

On 05/14/2014 05:24 PM, Georg Höllrigl wrote:

Hello List,

I see a pool without a name:

ceph osd lspools
0 data,1 metadata,2 rbd,3 .rgw.root,4 .rgw.control,5 .rgw,6 .rgw.gc,7
.users.uid,8 openstack-images,9 openstack-volumes,10
openstack-backups,11 .users,12 .users.swift,13 .users.email,14 .log,15
.rgw.buckets,16 .rgw.buckets.index,17 .usage,18 .intent-log,20 ,

I've already deleted one of those (with ID 19) with

rados rmpool   --yes-i-really-really-mean-it

But now it's back with ID 20.

Where do they come from? What kind of data is in there?



You are running Dumpling 0.67.X with the RGW? It's something which is
caused by the RGW.


No, the cluster is in the latest firefly release 0.80.1

I only found your entries from November 2013 - that's how I know how to 
delete the entry.

http://lists.ceph.com/pipermail/ceph-users-ceph.com/2013-November/005737.html

I found the discussion 
https://www.mail-archive.com/ceph-users@lists.ceph.com/msg08975.html - 
but the only info I see there is that the empty-named pool comes from rados.


So maybe it's a bug somewhere in radosgw? I think pools should have names :)



There is a thread on this list from two weeks ago about this.



Kind Regards,
Georg
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com



Re: [ceph-users] Rados GW Method not allowed

2014-05-14 Thread Georg Höllrigl

Hello Everyone,

The important thing here is to include the rgw_dns_name in ceph.conf 
and to restart radosgw. Also, you need DNS configured to point to 
your radosgw, plus a wildcard subdomain.
s3cmd, for example, handles access this way, and you'll see the 
"Method Not Allowed" message if you miss anything!
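As a sketch of what that means concretely (hostnames are examples, and 
the section name must match your own radosgw instance):

# ceph.conf, in the radosgw client section
[client.radosgw.ceph-m-01]
    rgw dns name = xidrasservice.com

# DNS zone: a wildcard record so bucketname.xidrasservice.com
# resolves to the same gateway
*.xidrasservice.com.  IN  CNAME  xidrasservice.com.

After changing ceph.conf, restart radosgw so it picks up the new name.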



Kind Regards,
Georg

On 13.05.2014 14:30, Georg Höllrigl wrote:

Hello,

System Ubuntu 14.04
Ceph 0.80

I'm getting either a 405 Method Not Allowed or a 403 Permission Denied
from Radosgw.


Here is what I get from radosgw:

HTTP/1.1 405 Method Not Allowed
Date: Tue, 13 May 2014 12:21:43 GMT
Server: Apache
Accept-Ranges: bytes
Content-Length: 82
Content-Type: application/xml

<?xml version="1.0" encoding="UTF-8"?><Error><Code>MethodNotAllowed</Code></Error>

I can see that the user exists using:
radosgw-admin --name client.radosgw.ceph-m-01 metadata list user

I can get the credentials via:

#radosgw-admin user info --uid=test
{ "user_id": "test",
  "display_name": "test",
  "email": "",
  "suspended": 0,
  "max_buckets": 1000,
  "auid": 0,
  "subusers": [],
  "keys": [
        { "user": "test",
          "access_key": "95L2C7BFQ8492LVZ271N",
          "secret_key": "f2tqIet+LrD0kAXYAUrZXydL+1nsO6Gs+we+94U5"}],
  "swift_keys": [],
  "caps": [],
  "op_mask": "read, write, delete",
  "default_placement": "",
  "placement_tags": [],
  "bucket_quota": { "enabled": false,
      "max_size_kb": -1,
      "max_objects": -1},
  "user_quota": { "enabled": false,
      "max_size_kb": -1,
      "max_objects": -1},
  "temp_url_keys": []}

I've also found some hints about a broken redirect in apache - but not
really a working version.

Any hints? Any thoughts about how to solve that? Where can I get more
detailed logs on why it won't let me create a bucket?


Kind Regards,
Georg
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com



[ceph-users] Pool without Name

2014-05-14 Thread Georg Höllrigl

Hello List,

I see a pool without a name:

ceph osd lspools
0 data,1 metadata,2 rbd,3 .rgw.root,4 .rgw.control,5 .rgw,6 .rgw.gc,7 
.users.uid,8 openstack-images,9 openstack-volumes,10 
openstack-backups,11 .users,12 .users.swift,13 .users.email,14 .log,15 
.rgw.buckets,16 .rgw.buckets.index,17 .usage,18 .intent-log,20 ,


I've already deleted one of those (with ID 19) with

rados rmpool   --yes-i-really-really-mean-it

But now it's back with ID 20.

Where do they come from? What kind of data is in there?
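A quick way to check whether anything actually lives in it is the 
per-pool usage output; whether rados accepts the empty string as a pool 
name may depend on the version (a sketch):

ceph df detail        # per-pool object counts and usage
rados df              # same information from the rados tool
rados -p "" ls        # try listing objects in the pool with the empty name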


Kind Regards,
Georg
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


[ceph-users] Rados GW Method not allowed

2014-05-13 Thread Georg Höllrigl

Hello,

System Ubuntu 14.04
Ceph 0.80

I'm getting either a 405 Method Not Allowed or a 403 Permission Denied 
from Radosgw.



Here is what I get from radosgw:

HTTP/1.1 405 Method Not Allowed
Date: Tue, 13 May 2014 12:21:43 GMT
Server: Apache
Accept-Ranges: bytes
Content-Length: 82
Content-Type: application/xml

<?xml version="1.0" encoding="UTF-8"?><Error><Code>MethodNotAllowed</Code></Error>


I can see that the user exists using:
radosgw-admin --name client.radosgw.ceph-m-01 metadata list user

I can get the credentials via:

#radosgw-admin user info --uid=test
{ "user_id": "test",
  "display_name": "test",
  "email": "",
  "suspended": 0,
  "max_buckets": 1000,
  "auid": 0,
  "subusers": [],
  "keys": [
        { "user": "test",
          "access_key": "95L2C7BFQ8492LVZ271N",
          "secret_key": "f2tqIet+LrD0kAXYAUrZXydL+1nsO6Gs+we+94U5"}],
  "swift_keys": [],
  "caps": [],
  "op_mask": "read, write, delete",
  "default_placement": "",
  "placement_tags": [],
  "bucket_quota": { "enabled": false,
      "max_size_kb": -1,
      "max_objects": -1},
  "user_quota": { "enabled": false,
      "max_size_kb": -1,
      "max_objects": -1},
  "temp_url_keys": []}

I've also found some hints about a broken redirect in apache - but not 
really a working version.


Any hints? Any thoughts about how to solve that? Where can I get more 
detailed logs on why it won't let me create a bucket?
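One way to get more detail out of the gateway is to raise the rgw and 
messenger debug levels in its ceph.conf section and restart radosgw (a 
sketch; the section name must match your instance):

[client.radosgw.ceph-m-01]
    debug rgw = 20
    debug ms = 1
    log file = /var/log/ceph/radosgw.log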



Kind Regards,
Georg
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] Ceph Not getting into a clean state

2014-05-12 Thread Georg Höllrigl

Thank you so much! That seems to work immediately.

ATM I still see 3 pgs in active+clean+scrubbing state - but that will 
hopefully resolve itself over time.


So the way to go with firefly is to either use at least 3 hosts for 
OSDs - or reduce the number of replicas?


Kind Regards,
Georg


On 09.05.2014 10:59, Martin B Nielsen wrote:

Hi,

I experienced exactly the same with 14.04 and the 0.79 release.

It was a fresh clean install with the default crushmap and ceph-deploy
install as per the quick-start guide.

Oddly enough, changing the replica size (incl. min_size) from 3 -> 2 (and 2 -> 1)
and back again made it work.

I didn't have time to look into replicating the issue.

Cheers,
Martin


On Thu, May 8, 2014 at 4:30 PM, Georg Höllrigl
georg.hoellr...@xidras.com wrote:

Hello,

We've a fresh cluster setup - with Ubuntu 14.04 and ceph firefly. By
now I've tried this multiple times - but the result stays the same
and shows me lots of trouble (the cluster is empty, no client has
accessed it)

#ceph -s
 cluster b04fc583-9e71-48b7-a741-92f4dff4cfef
  health HEALTH_WARN 470 pgs stale; 470 pgs stuck stale; 18 pgs
stuck unclean; 26 requests are blocked > 32 sec
  monmap e2: 3 mons at
{ceph-m-01=10.0.0.100:6789/0,ceph-m-02=10.0.1.101:6789/0,ceph-m-03=10.0.1.102:6789/0},
election epoch 8, quorum 0,1,2 ceph-m-01,ceph-m-02,ceph-m-03
  osdmap e409: 9 osds: 9 up, 9 in
   pgmap v1231: 480 pgs, 9 pools, 822 bytes data, 43 objects
 9373 MB used, 78317 GB / 78326 GB avail
  451 stale+active+clean
1 stale+active+clean+scrubbing
   10 active+clean
   18 stale+active+remapped

Does anyone have an idea what's happening here? Shouldn't an empty
cluster show only active+clean pgs?


Regards,
Georg

___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] Ceph Not getting into a clean state

2014-05-09 Thread Georg Höllrigl

Hello,

I've already thought about that - but even after changing the 
replication level (size) I'm not getting a clean cluster (there are only 
the default pools ATM):


root@ceph-m-02:~#ceph -s
cluster b04fc583-9e71-48b7-a741-92f4dff4cfef
 health HEALTH_WARN 232 pgs stuck unclean; recovery 26/126 objects 
degraded (20.635%)
 monmap e2: 3 mons at 
{ceph-m-01=10.0.0.100:6789/0,ceph-m-02=10.0.1.101:6789/0,ceph-m-03=10.0.1.102:6789/0}, 
election epoch 8, quorum 0,1,2 ceph-m-01,ceph-m-02,ceph-m-03

 osdmap e56: 9 osds: 9 up, 9 in
  pgmap v287: 232 pgs, 8 pools, 822 bytes data, 43 objects
9342 MB used, 78317 GB / 78326 GB avail
26/126 objects degraded (20.635%)
 119 active
 113 active+remapped
root@ceph-m-02:~#ceph osd dump | grep size
pool 0 'data' replicated size 2 min_size 2 crush_ruleset 0 object_hash 
rjenkins pg_num 64 pgp_num 64 last_change 48 owner 0 flags hashpspool 
crash_replay_interval 45 stripe_width 0
pool 1 'metadata' replicated size 2 min_size 2 crush_ruleset 0 
object_hash rjenkins pg_num 64 pgp_num 64 last_change 49 owner 0 flags 
hashpspool stripe_width 0
pool 2 'rbd' replicated size 2 min_size 2 crush_ruleset 0 object_hash 
rjenkins pg_num 64 pgp_num 64 last_change 50 owner 0 flags hashpspool 
stripe_width 0
pool 3 '.rgw.root' replicated size 2 min_size 2 crush_ruleset 0 
object_hash rjenkins pg_num 8 pgp_num 8 last_change 52 owner 0 flags 
hashpspool stripe_width 0
pool 4 '.rgw.control' replicated size 2 min_size 2 crush_ruleset 0 
object_hash rjenkins pg_num 8 pgp_num 8 last_change 53 owner 0 flags 
hashpspool stripe_width 0
pool 5 '.rgw' replicated size 2 min_size 2 crush_ruleset 0 object_hash 
rjenkins pg_num 8 pgp_num 8 last_change 54 owner 18446744073709551615 
flags hashpspool stripe_width 0
pool 6 '.rgw.gc' replicated size 2 min_size 2 crush_ruleset 0 
object_hash rjenkins pg_num 8 pgp_num 8 last_change 55 owner 0 flags 
hashpspool stripe_width 0
pool 7 '.users.uid' replicated size 2 min_size 2 crush_ruleset 0 
object_hash rjenkins pg_num 8 pgp_num 8 last_change 56 owner 
18446744073709551615 flags hashpspool stripe_width 0



Kind Regards,
Georg


On 09.05.2014 08:29, Mark Kirkwood wrote:

So that's two hosts - if this is a new cluster chances are the pools
have replication size=3, and won't place replica pgs on the same host...
'ceph osd dump' will let you know if this is the case. If it is, either
reduce size to 2, add another host, or edit your crush rules to allow
replica pgs on the same host.
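A sketch of both options (pool and file names are examples):

# reduce the replica count on an existing pool
ceph osd pool set rbd size 2
ceph osd pool set rbd min_size 1

# or allow replicas on the same host by editing the crush rule
ceph osd getcrushmap -o crushmap.bin
crushtool -d crushmap.bin -o crushmap.txt
#   in crushmap.txt change "step chooseleaf firstn 0 type host"
#   to "step chooseleaf firstn 0 type osd"
crushtool -c crushmap.txt -o crushmap.new
ceph osd setcrushmap -i crushmap.new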

Cheers

Mark

On 09/05/14 18:20, Georg Höllrigl wrote:

#ceph osd tree
# idweight  type name   up/down reweight
-1  76.47   root default
-2  32.72   host ceph-s-01
0   7.27osd.0   up  1
1   7.27osd.1   up  1
2   9.09osd.2   up  1
3   9.09osd.3   up  1
-3  43.75   host ceph-s-02
4   10.91   osd.4   up  1
5   0.11osd.5   up  1
6   10.91   osd.6   up  1
7   10.91   osd.7   up  1
8   10.91   osd.8   up  1


On 08.05.2014 19:11, Craig Lewis wrote:

What does `ceph osd tree` output?

On 5/8/14 07:30 , Georg Höllrigl wrote:

Hello,

We've a fresh cluster setup - with Ubuntu 14.04 and ceph firefly. By
now I've tried this multiple times - but the result stays the same and
shows me lots of trouble (the cluster is empty, no client has
accessed it)

#ceph -s
cluster b04fc583-9e71-48b7-a741-92f4dff4cfef
 health HEALTH_WARN 470 pgs stale; 470 pgs stuck stale; 18 pgs
stuck unclean; 26 requests are blocked > 32 sec
 monmap e2: 3 mons at
{ceph-m-01=10.0.0.100:6789/0,ceph-m-02=10.0.1.101:6789/0,ceph-m-03=10.0.1.102:6789/0},

election epoch 8, quorum 0,1,2 ceph-m-01,ceph-m-02,ceph-m-03
 osdmap e409: 9 osds: 9 up, 9 in
  pgmap v1231: 480 pgs, 9 pools, 822 bytes data, 43 objects
9373 MB used, 78317 GB / 78326 GB avail
 451 stale+active+clean
   1 stale+active+clean+scrubbing
  10 active+clean
  18 stale+active+remapped


___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com



Re: [ceph-users] Troubles MDS

2014-04-24 Thread Georg Höllrigl


Looks like you enabled directory fragments, which is buggy in ceph version 0.72.

Regards
Yan, Zheng




If it's enabled, it wasn't intentional. So how would I disable it?
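If it really is the fragmentation that got enabled, it is normally 
controlled by the mds_bal_frag option (off by default in that era); a 
sketch of turning it off, assuming that option is what enabled it in 
your setup:

# ceph.conf
[mds]
    mds bal frag = false

# then restart the ceph-mds daemon(s) so the setting takes effect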

Regards,
Georg
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] Troubles MDS

2014-04-24 Thread Georg Höllrigl



And that's exactly what it sounds like — the MDS isn't finding objects
that are supposed to be in the RADOS cluster.


I'm not sure what I should think about that. Shouldn't the MDS be 
accessing its data from RADOS, and vice versa?



Anyway, glad it fixed itself, but it sounds like you've got some
infrastructure issues or something you need to sort out first.
-Greg
Software Engineer #42 @ http://inktank.com | http://ceph.com



I think we found a reason - somehow all the memory gets used up - maybe 
some leak? So ATM it's not really fixed.


Is there anything I could do, so that we could track down and fix this 
in future versions?



Regards,
Georg
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] Troubles MDS

2014-04-17 Thread Georg Höllrigl
 RESETSESSION but no longer connecting
2014-04-16 12:29:50.769693 7f1ddf5c3700  0 -- 10.0.1.107:6803/21953 >> 
10.0.1.107:6789/0 pipe(0x6dbc6280 sd=243 :43169 s=4 pgs=0 cs=0 l=1 
c=0x1358ed580).connect got RESETSESSION but no longer connecting
2014-04-16 12:29:51.621605 7f1de05d3700  0 -- 10.0.1.107:6803/21953 >> 
10.0.1.107:6789/0 pipe(0x21154780 sd=243 :43929 s=4 pgs=0 cs=0 l=1 
c=0xecb1a580).connect got RESETSESSION but no longer connecting
2014-04-16 12:29:51.886405 7f1dea867700 -1 mds.0.13 *** got signal 
Terminated ***
2014-04-16 12:29:51.886894 7f1dea867700  1 mds.0.13 suicide.  wanted 
down:dne, now up:rejoin



Also see lots of these:
ceph-mds.ceph-m-02.log.1.gz:2014-04-16 13:22:44.900657 7f937a5e1700  0 
log [ERR] : dir 1c4639d.1c4639d object missing on disk; some 
files may be lost
ceph-mds.ceph-m-02.log.1.gz:2014-04-16 13:22:44.913245 7f937a5e1700  0 
log [ERR] : dir 1c4617e.1c4617e object missing on disk; some 
files may be lost
ceph-mds.ceph-m-02.log.1.gz:2014-04-16 13:22:44.925811 7f937a5e1700  0 
log [ERR] : dir 1c45d08.1c45d08 object missing on disk; some 
files may be lost
ceph-mds.ceph-m-02.log.1.gz:2014-04-16 13:22:44.941476 7f937a5e1700  0 
log [ERR] : dir 1c45d9e.1c45d9e object missing on disk; some 
files may be lost
ceph-mds.ceph-m-02.log.1.gz:2014-04-16 13:22:44.956158 7f937a5e1700  0 
log [ERR] : dir 1c461e5.1c461e5 object missing on disk; some 
files may be lost
ceph-mds.ceph-m-02.log.1.gz:2014-04-16 13:22:44.968524 7f937a5e1700  0 
log [ERR] : dir 1c46608.1c46608 object missing on disk; some 
files may be lost
ceph-mds.ceph-m-02.log.1.gz:2014-04-16 13:22:44.979229 7f937a5e1700  0 
log [ERR] : dir 1c468b6.1c468b6 object missing on disk; some 
files may be lost


At the moment, I've only one mds running - but clients (mainly using 
fuse) can't connect.




Regards,
Georg



On 16.04.2014 16:27, Gregory Farnum wrote:

What's the backtrace from the MDS crash?
-Greg
Software Engineer #42 @ http://inktank.com | http://ceph.com


On Wed, Apr 16, 2014 at 7:11 AM, Georg Höllrigl
georg.hoellr...@xidras.com wrote:

Hello,

Using Ceph MDS with one active and one standby server - a day ago one of the
mds crashed and I restarted it.
Tonight it crashed again; a few hours later, the second mds crashed as well.

#ceph -v
ceph version 0.72.2 (a913ded2ff138aefb8cb84d347d72164099cfd60)

At the moment cephfs is dead - with following health status:

#ceph -s
 cluster b04fc583-9e71-48b7-a741-92f4dff4cfef
  health HEALTH_WARN mds cluster is degraded; mds c is laggy
  monmap e3: 3 mons at
{ceph-m-01=10.0.0.176:6789/0,ceph-m-02=10.0.1.107:6789/0,ceph-m-03=10.0.1.108:6789/0},
election epoch 6274, quorum 0,1,2 ceph-m-01,ceph-m-02,ceph-m-03
  mdsmap e2055: 1/1/1 up {0=ceph-m-03=up:rejoin(laggy or crashed)}
  osdmap e3752: 39 osds: 39 up, 39 in
   pgmap v3277576: 8328 pgs, 17 pools, 6461 GB data, 17066 kobjects
 13066 GB used, 78176 GB / 91243 GB avail
 8328 active+clean
   client io 1193 B/s rd, 0 op/s

I couldn't really find any useful info in the log files nor by searching the
documentation. Any ideas how to get cephfs up and running?

Here is part of mds log:
2014-04-16 14:07:05.603501 7ff184c64700  1 mds.0.server reconnect gave up on
client.7846580 10.0.1.152:0/14639
2014-04-16 14:07:05.603525 7ff184c64700  1 mds.0.46 reconnect_done
2014-04-16 14:07:05.674990 7ff186d69700  1 mds.0.46 handle_mds_map i am now
mds.0.46
2014-04-16 14:07:05.674996 7ff186d69700  1 mds.0.46 handle_mds_map state
change up:reconnect --> up:rejoin
2014-04-16 14:07:05.674998 7ff186d69700  1 mds.0.46 rejoin_start
2014-04-16 14:07:22.347521 7ff17f825700  0 -- 10.0.1.107:6815/17325 >>
10.0.1.68:0/4128280551 pipe(0x5e2ac80 sd=930 :6815 s=2 pgs=153 cs=1 l=0
c=0x5e2e160).fault with nothing to send, going to standby

Any ideas how to solve "laggy or crashed"?


Georg
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com



Re: [ceph-users] Troubles MDS

2014-04-17 Thread Georg Höllrigl

Whatever happened - it fixed itself!?

When restarting, I got ~ 165k log messages like:
2014-04-17 07:30:14.856421 7fc50b991700  0 log [WRN] :  ino 1f24fe0
2014-04-17 07:30:14.856422 7fc50b991700  0 log [WRN] :  ino 1f24fe1
2014-04-17 07:30:14.856423 7fc50b991700  0 log [WRN] :  ino 1f24fe2
2014-04-17 07:30:14.856424 7fc50b991700  0 log [WRN] :  ino 1f24fe3
2014-04-17 07:30:14.856427 7fc50b991700  0 log [WRN] :  ino 1f24fe4
2014-04-17 07:30:14.856428 7fc50b991700  0 log [WRN] :  ino 1f24fe5

And the clients recovered!?

I would be really interested to know what happened!

Georg

On 17.04.2014 09:45, Georg Höllrigl wrote:

Hello Greg,

I've searched - but don't see any backtraces... I've tried to get some
more info out of the logs. I really hope there is something interesting
in it:

It all started two days ago with an authentication error:

2014-04-14 21:08:55.929396 7fd93d53f700  1 mds.0.0
standby_replay_restart (as standby)
2014-04-14 21:09:07.989547 7fd93b62e700  1 mds.0.0 replay_done (as standby)
2014-04-14 21:09:08.989647 7fd93d53f700  1 mds.0.0
standby_replay_restart (as standby)
2014-04-14 21:09:10.633786 7fd93b62e700  1 mds.0.0 replay_done (as standby)
2014-04-14 21:09:11.633886 7fd93d53f700  1 mds.0.0
standby_replay_restart (as standby)
2014-04-14 21:09:17.995105 7fd93f644700  0 mds.0.0 handle_mds_beacon no
longer laggy
2014-04-14 21:09:39.798798 7fd93f644700  0 monclient: hunting for new mon
2014-04-14 21:09:39.955078 7fd93f644700  1 mds.-1.-1 handle_mds_map i
(10.0.1.107:6800/16503) dne in the mdsmap, respawning myself
2014-04-14 21:09:39.955094 7fd93f644700  1 mds.-1.-1 respawn
2014-04-14 21:09:39.955106 7fd93f644700  1 mds.-1.-1  e:
'/usr/bin/ceph-mds'
2014-04-14 21:09:39.955109 7fd93f644700  1 mds.-1.-1  0:
'/usr/bin/ceph-mds'
2014-04-14 21:09:39.955110 7fd93f644700  1 mds.-1.-1  1: '-i'
2014-04-14 21:09:39.955112 7fd93f644700  1 mds.-1.-1  2: 'ceph-m-02'
2014-04-14 21:09:39.955113 7fd93f644700  1 mds.-1.-1  3: '--pid-file'
2014-04-14 21:09:39.955114 7fd93f644700  1 mds.-1.-1  4:
'/var/run/ceph/mds.ceph-m-02.pid'
2014-04-14 21:09:39.955116 7fd93f644700  1 mds.-1.-1  5: '-c'
2014-04-14 21:09:39.955117 7fd93f644700  1 mds.-1.-1  6:
'/etc/ceph/ceph.conf'
2014-04-14 21:09:39.979138 7fd93f644700  1 mds.-1.-1  cwd /
2014-04-14 19:09:40.922683 7f8ba9973780  0 ceph version 0.72.2
(a913ded2ff138aefb8cb84d347d72164099cfd60), process ceph-mds, pid 16505
2014-04-14 19:09:40.975024 7f8ba9973780 -1 mds.-1.0 ERROR: failed to
authenticate: (1) Operation not permitted
2014-04-14 19:09:40.975070 7f8ba9973780  1 mds.-1.0 suicide.  wanted
down:dne, now up:boot

That was fixed by restarting the mds (+ the whole server).

2014-04-15 07:07:15.948650 7f9fdec0d780  0 ceph version 0.72.2
(a913ded2ff138aefb8cb84d347d72164099cfd60), process ceph-mds, pid 506
2014-04-15 07:07:15.954386 7f9fdec0d780 -1 mds.-1.0 ERROR: failed to
authenticate: (1) Operation not permitted
2014-04-15 07:07:15.954422 7f9fdec0d780  1 mds.-1.0 suicide.  wanted
down:dne, now up:boot
2014-04-15 07:15:49.177861 7fe8a1d60780  0 ceph version 0.72.2
(a913ded2ff138aefb8cb84d347d72164099cfd60), process ceph-mds, pid 26401
2014-04-15 07:15:49.184027 7fe8a1d60780 -1 mds.-1.0 ERROR: failed to
authenticate: (1) Operation not permitted
2014-04-15 07:15:49.184046 7fe8a1d60780  1 mds.-1.0 suicide.  wanted
down:dne, now up:boot
2014-04-15 07:17:32.598031 7fab123e6780  0 ceph version 0.72.2
(a913ded2ff138aefb8cb84d347d72164099cfd60), process ceph-mds, pid 30531
2014-04-15 07:17:32.604560 7fab123e6780 -1 mds.-1.0 ERROR: failed to
authenticate: (1) Operation not permitted
2014-04-15 07:17:32.604592 7fab123e6780  1 mds.-1.0 suicide.  wanted
down:dne, now up:boot
2014-04-15 07:21:56.099203 7fd37b951780  0 ceph version 0.72.2
(a913ded2ff138aefb8cb84d347d72164099cfd60), process ceph-mds, pid 11335
2014-04-15 07:21:56.105229 7fd37b951780 -1 mds.-1.0 ERROR: failed to
authenticate: (1) Operation not permitted
2014-04-15 07:21:56.105254 7fd37b951780  1 mds.-1.0 suicide.  wanted
down:dne, now up:boot
2014-04-15 07:22:09.345800 7f23392ef780  0 ceph version 0.72.2
(a913ded2ff138aefb8cb84d347d72164099cfd60), process ceph-mds, pid 11461
2014-04-15 07:22:09.390001 7f23392ef780 -1 mds.-1.0 ERROR: failed to
authenticate: (1) Operation not permitted
2014-04-15 07:22:09.391087 7f23392ef780  1 mds.-1.0 suicide.  wanted
down:dne, now up:boot
2014-04-15 07:28:01.762191 7fab6d14b780  0 ceph version 0.72.2
(a913ded2ff138aefb8cb84d347d72164099cfd60), process ceph-mds, pid 28263
2014-04-15 07:28:01.779485 7fab6d14b780 -1 mds.-1.0 ERROR: failed to
authenticate: (1) Operation not permitted
2014-04-15 07:28:01.779507 7fab6d14b780  1 mds.-1.0 suicide.  wanted
down:dne, now up:boot
2014-04-15 07:35:49.065110 7fe4f6b0d780  0 ceph version 0.72.2
(a913ded2ff138aefb8cb84d347d72164099cfd60), process ceph-mds, pid 1233
2014-04-15 07:35:52.191856 7fe4f6b05700  0 -- 10.0.1.107:6800/1233 >>
10.0.1.108:6789/0 pipe(0x2f9f500 sd=8 :0 s=1 pgs=0 cs=0 l=1
c=0x2f81580).fault
2014-04-15 07:35

[ceph-users] Troubles MDS

2014-04-16 Thread Georg Höllrigl

Hello,

Using Ceph MDS with one active and one standby server - a day ago one of 
the mds crashed and I restarted it.

Tonight it crashed again; a few hours later, the second mds crashed as well.

#ceph -v
ceph version 0.72.2 (a913ded2ff138aefb8cb84d347d72164099cfd60)

At the moment cephfs is dead - with following health status:

#ceph -s
cluster b04fc583-9e71-48b7-a741-92f4dff4cfef
 health HEALTH_WARN mds cluster is degraded; mds c is laggy
 monmap e3: 3 mons at 
{ceph-m-01=10.0.0.176:6789/0,ceph-m-02=10.0.1.107:6789/0,ceph-m-03=10.0.1.108:6789/0}, 
election epoch 6274, quorum 0,1,2 ceph-m-01,ceph-m-02,ceph-m-03

 mdsmap e2055: 1/1/1 up {0=ceph-m-03=up:rejoin(laggy or crashed)}
 osdmap e3752: 39 osds: 39 up, 39 in
  pgmap v3277576: 8328 pgs, 17 pools, 6461 GB data, 17066 kobjects
13066 GB used, 78176 GB / 91243 GB avail
8328 active+clean
  client io 1193 B/s rd, 0 op/s

I couldn't really find any useful info in the log files nor by searching the 
documentation. Any ideas how to get cephfs up and running?


Here is part of mds log:
2014-04-16 14:07:05.603501 7ff184c64700  1 mds.0.server reconnect gave 
up on client.7846580 10.0.1.152:0/14639

2014-04-16 14:07:05.603525 7ff184c64700  1 mds.0.46 reconnect_done
2014-04-16 14:07:05.674990 7ff186d69700  1 mds.0.46 handle_mds_map i am 
now mds.0.46
2014-04-16 14:07:05.674996 7ff186d69700  1 mds.0.46 handle_mds_map state 
change up:reconnect --> up:rejoin

2014-04-16 14:07:05.674998 7ff186d69700  1 mds.0.46 rejoin_start
2014-04-16 14:07:22.347521 7ff17f825700  0 -- 10.0.1.107:6815/17325 >> 
10.0.1.68:0/4128280551 pipe(0x5e2ac80 sd=930 :6815 s=2 pgs=153 cs=1 l=0 
c=0x5e2e160).fault with nothing to send, going to standby


Any ideas how to solve "laggy or crashed"?


Georg
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] issues with 'https://ceph.com/git/?p=ceph.git;a=blob_plain;f=keys/release.asc'

2013-10-01 Thread Georg Höllrigl

___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com



--
Dipl.-Ing. (FH) Georg Höllrigl
Technik



Xidras GmbH
Stockern 47
3744 Stockern
Austria

Tel: +43 (0) 2983 201 - 30505
Fax: +43 (0) 2983 201 - 930505
Email:   georg.hoellr...@xidras.com
Web: http://www.xidras.com

FN 317036 f | Landesgericht Krems | ATU64485024





___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] issues with 'https://ceph.com/git/?p=ceph.git;a=blob_plain;f=keys/release.asc'

2013-09-30 Thread Georg Höllrigl

The whole Git repository seems unreachable. Does anybody know what's going on?

On 30.09.2013 17:33, Mike O'Toole wrote:

I have had the same issues.


From: qgra...@onq.com.au
To: ceph-users@lists.ceph.com
Date: Mon, 30 Sep 2013 00:01:11 +
Subject: [ceph-users] issues with 'https://ceph.com/git/?p=ceph.git;
a=blob_plain; f=keys/release.asc'

Hey Guys,

Looks like
'https://ceph.com/git/?p=ceph.git;a=blob_plain;f=keys/release.asc' is down.

Regards,

Quenten Grasso


___ ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] Using radosgw with s3cmd: Bucket failure

2013-09-06 Thread Georg Höllrigl

On 23.08.2013 16:24, Yehuda Sadeh wrote:

On Fri, Aug 23, 2013 at 1:47 AM, Tobias Brunner tob...@tobru.ch wrote:

Hi,

I'm trying to use radosgw with s3cmd:

# s3cmd ls

# s3cmd mb s3://bucket-1
ERROR: S3 error: 405 (MethodNotAllowed):

So there seems to be something missing according to buckets. How can I
create buckets? What do I have to configure on the radosgw side to have
buckets working?



The problem that you have here is that s3cmd uses the virtual host
bucket name mechanism, e.g. it tries to access http://bucket.host/
instead of the usual http://host/bucket. You can configure the
gateway to support that (set 'rgw dns name = host' in your
ceph.conf), however, you'll also need to be able to route all these
requests to your host, using some catch-all dns. The easiest way to go
would be to configure your client to not use that virtual host bucket
name, but I'm not completely sure s3cmd can do that.

Yehuda
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com



I'm facing exactly the same problem - but this didn't help. I've 
set up the DNS, and I can reach the subdomains and also the rgw dns name.

But still the same troubles here :(

Georg
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] Using radosgw with s3cmd: Bucket failure

2013-09-06 Thread Georg Höllrigl

Just in case someone stumbles across the same problem:

The option name in ceph.conf is rgw_dns_name - not "rgw dns name" as 
described at http://ceph.com/docs/next/radosgw/config-ref/ !?


And the hostname needs to be set to your DNS name, without any wildcard.
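For completeness, the matching client side: s3cmd builds the 
virtual-host style hostnames from host_base and host_bucket in ~/.s3cfg, 
so those have to point at the same DNS name (hostnames below are 
examples):

host_base = xidrasservice.com
host_bucket = %(bucket)s.xidrasservice.com
use_https = True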

Georg


On 06.09.2013 08:51, Georg Höllrigl wrote:

On 23.08.2013 16:24, Yehuda Sadeh wrote:

On Fri, Aug 23, 2013 at 1:47 AM, Tobias Brunner tob...@tobru.ch wrote:

Hi,

I'm trying to use radosgw with s3cmd:

# s3cmd ls

# s3cmd mb s3://bucket-1
ERROR: S3 error: 405 (MethodNotAllowed):

So there seems to be something missing according to buckets. How can I
create buckets? What do I have to configure on the radosgw side to have
buckets working?



The problem that you have here is that s3cmd uses the virtual host
bucket name mechanism, e.g. it tries to access http://bucket.host/
instead of the usual http://host/bucket. You can configure the
gateway to support that (set 'rgw dns name = host' in your
ceph.conf), however, you'll also need to be able to route all these
requests to your host, using some catch-all dns. The easiest way to go
would be to configure your client to not use that virtual host bucket
name, but I'm not completely sure s3cmd can do that.

Yehuda
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com



I'm facing exactly the same problem - but this didn't help. I've
set up the DNS, and I can reach the subdomains and also the rgw dns name.
But still the same troubles here :(

Georg
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


--
Dipl.-Ing. (FH) Georg Höllrigl
Technik



Xidras GmbH
Stockern 47
3744 Stockern
Austria

Tel: +43 (0) 2983 201 - 30505
Fax: +43 (0) 2983 201 - 930505
Email:   georg.hoellr...@xidras.com
Web: http://www.xidras.com

FN 317036 f | Landesgericht Krems | ATU64485024





___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] Destroyed Ceph Cluster

2013-08-19 Thread Georg Höllrigl

Hello List,

The troubles fixing such a cluster continue... I get output like this now:

# ceph health
HEALTH_WARN 192 pgs degraded; 192 pgs stuck unclean; mds cluster is 
degraded; mds vvx-ceph-m-03 is laggy



When checking for the ceph-mds processes, there are now none left... no 
matter which server I check. And they won't start up again!?


The log starts up with:
2013-08-19 11:23:30.503214 7f7e9dfbd780  0 ceph version 0.67 
(e3b7bc5bce8ab330ec1661381072368af3c218a0), process ceph-mds, pid 27636

2013-08-19 11:23:30.523314 7f7e9904b700  1 mds.-1.0 handle_mds_map standby
2013-08-19 11:23:30.529418 7f7e9904b700  1 mds.0.26 handle_mds_map i am 
now mds.0.26
2013-08-19 11:23:30.529423 7f7e9904b700  1 mds.0.26 handle_mds_map state 
change up:standby --> up:replay

2013-08-19 11:23:30.529426 7f7e9904b700  1 mds.0.26 replay_start
2013-08-19 11:23:30.529434 7f7e9904b700  1 mds.0.26  recovery set is
2013-08-19 11:23:30.529436 7f7e9904b700  1 mds.0.26  need osdmap epoch 
277, have 276
2013-08-19 11:23:30.529438 7f7e9904b700  1 mds.0.26  waiting for osdmap 
277 (which blacklists prior instance)
2013-08-19 11:23:30.534090 7f7e9904b700 -1 mds.0.sessionmap _load_finish 
got (2) No such file or directory
2013-08-19 11:23:30.535483 7f7e9904b700 -1 mds/SessionMap.cc: In 
function 'void SessionMap::_load_finish(int, ceph::bufferlist)' thread 
7f7e9904b700 time 2013-08-19 11:23:30.534107

mds/SessionMap.cc: 83: FAILED assert(0 == "failed to load sessionmap")


Does anyone have an idea how to get the cluster running again?





Georg




On 16.08.2013 16:23, Mark Nelson wrote:

Hi Georg,

I'm not an expert on the monitors, but that's probably where I would
start.  Take a look at your monitor logs and see if you can get a sense
for why one of your monitors is down.  Some of the other devs will
probably be around later that might know if there are any known issues
with recreating the OSDs and missing PGs.

Mark

On 08/16/2013 08:21 AM, Georg Höllrigl wrote:

Hello,

I'm still evaluating ceph - now a test cluster with the 0.67 dumpling.
I've created the setup with ceph-deploy from GIT.
I've recreated a bunch of OSDs, to give them another journal.
There already was some test data on these OSDs.
I've already recreated the missing PGs with ceph pg force_create_pg


HEALTH_WARN 192 pgs stuck inactive; 192 pgs stuck unclean; 5 requests
are blocked > 32 sec; mds cluster is degraded; 1 mons down, quorum
0,1,2 vvx-ceph-m-01,vvx-ceph-m-02,vvx-ceph-m-03

Any idea how to fix the cluster, besides completely rebuilding the
cluster from scratch? What if such a thing happens in a production
environment...

The pgs from ceph pg dump all look like "creating" for some time now:

2.3d    0   0   0   0   0   0   0   creating    2013-08-16 13:43:08.186537   0'0   0:0   []   []   0'0   0.00   0'0   0.00

Is there a way to just dump the data that was on the discarded OSDs?




Kind Regards,
Georg
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com




[ceph-users] Destroyed Ceph Cluster

2013-08-16 Thread Georg Höllrigl

Hello,

I'm still evaluating ceph - now a test cluster with the 0.67 dumpling.
I've created the setup with ceph-deploy from GIT.
I've recreated a bunch of OSDs, to give them another journal.
There already was some test data on these OSDs.
I've already recreated the missing PGs with ceph pg force_create_pg


HEALTH_WARN 192 pgs stuck inactive; 192 pgs stuck unclean; 5 requests 
are blocked > 32 sec; mds cluster is degraded; 1 mons down, quorum 0,1,2 
vvx-ceph-m-01,vvx-ceph-m-02,vvx-ceph-m-03


Any idea how to fix the cluster, besides completely rebuilding the 
cluster from scratch? What if such a thing happens in a production 
environment...


The pgs from ceph pg dump all look like "creating" for some time now:

2.3d    0   0   0   0   0   0   0   creating    2013-08-16 13:43:08.186537   0'0   0:0   []   []   0'0   0.00   0'0   0.00


Is there a way to just dump the data that was on the discarded OSDs?




Kind Regards,
Georg
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] mounting a pool via fuse

2013-08-13 Thread Georg Höllrigl

Thank you for the explanation.

By mounting as a filesystem, I'm talking about something similar to this:
http://www.sebastien-han.fr/blog/2013/02/11/mount-a-specific-pool-with-cephfs/

Using the kernel module, I can mount a subdirectory into my directory 
tree - a directory to which I have assigned a pool.

Using fuse, I can't mount a subdirectory?

By the way setting the layout seems to have a bug:

# cephfs /mnt/macm01 set_layout -p 4
Error setting layout: Invalid argument

I have to add the -u option, then it works:

# cephfs /mnt/mailstore set_layout -p 5 -u 4194304
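For the fuse side: if your ceph-fuse build supports the -r option, it 
can mount a subtree directly, which gives roughly the same effect as the 
kernel mount of a subdirectory (paths and monitor host are examples; 
auth options are omitted here):

# kernel client: mount only the /mailstore subtree
mount -t ceph mon-host:6789:/mailstore /mnt/mailstore

# fuse client: mount the same subtree with -r
ceph-fuse -r /mailstore /mnt/mailstore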

Kind Regards,
Georg





On 13.08.2013 12:09, Dzianis Kahanovich wrote:

Georg Höllrigl writes:


I'm using ceph 0.61.7.

When using ceph-fuse, I couldn't find a way, to only mount one pool.

Is there a way to mount a pool - or is it simply not supported?


Does this mean mounting as a filesystem?
It's the same as the kernel-level cephfs (fuse & cephfs = same instance). You cannot mount
a pool, but you can mount the filesystem and map a pool to any point of the filesystem
(file or directory), including the root.

First, mount ceph via the kernel client - mount -t ceph (just for cephfs tool syntax
compatibility), for example to /mnt/ceph. Then run ceph df and look up the pool
number (not the name!) - for example, the pool number is 10. And last:
mkdir -p /mnt/ceph/pools/pool1
cephfs /mnt/ceph/pools/pool1 set_layout -p 10

or just (for ceph's root):

cephfs /mnt/ceph set_layout -p 10

Next you can unmount the kernel-level mount and mount this point via fuse.

PS for the ceph developers: trying this for quota (with ceph osd pool set-quota) is
semi-working: on quota overflow nothing is limited, but ceph health shows a
warning. If there is no other way to do quotas, this may qualify as a bug - it is
only a minor issue while the large-number-of-pools performance limitation remains. So, FYI.



--
Dipl.-Ing. (FH) Georg Höllrigl
Technik



Xidras GmbH
Stockern 47
3744 Stockern
Austria

Tel: +43 (0) 2983 201 - 30505
Fax: +43 (0) 2983 201 - 930505
Email:   georg.hoellr...@xidras.com
Web: http://www.xidras.com

FN 317036 f | Landesgericht Krems | ATU64485024





___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


[ceph-users] mounting a pool via fuse

2013-08-09 Thread Georg Höllrigl

Hi,

I'm using ceph 0.61.7.

When using ceph-fuse, I couldn't find a way, to only mount one pool.

Is there a way to mount a pool - or is it simply not supported?



Kind Regards,
Georg
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com