On Mon, Mar 4, 2013 at 9:23 AM, Sławomir Skowron <szi...@gmail.com> wrote:
> On Mon, Mar 4, 2013 at 6:02 PM, Sage Weil <s...@inktank.com> wrote:
>> On Mon, 4 Mar 2013, Sławomir Skowron wrote:
>>> Ok, thanks for the response. But if I have a crush map like the one in
>>> the attachment, all data should be balanced equally, not counting the
>>> hosts with 0.5 weight.
>>>
>>> How can I make the data auto-balance when I know that some PGs have too
>>> much data? I have 4800 PGs for RGW alone on 78 OSDs, which should be
>>> quite enough.
>>>
>>> pool 3 '.rgw.buckets' rep size 3 crush_ruleset 0 object_hash rjenkins
>>> pg_num 4800 pgp_num 4800 last_change 908 owner 0
>>>
>>> When will it be possible to expand the number of PGs?
>>
>> Soon. :)
>>
>> The bigger question for me is why there is one PG that is getting pounded
>> while the others are not. Is there a large skew in the workload toward a
>> small number of very hot objects?
>
> Yes, there are constantly about 100-200 operations per second, all
> going to the RGW backend. But when the problem hits, there are more
> requests, more GETs and PUTs, because the applications reconnect with
> short timeouts. Statistically, though, new PUTs are normally spread
> across many PGs, so this should not overload a single primary OSD.
> Maybe balancing reads across all replicas could help a little?
>
>> I expect it should be obvious if you go
>> to the loaded osd and do
>>
>> ceph --admin-daemon /var/run/ceph/ceph-osd.NN.asok dump_ops_in_flight
>>
>
> Yes, I did that, but those long operations only show up when the cluster
> becomes unstable. Normally there are no ops in the queue; they only appear
> when the cluster is rebalancing, remapping, or doing something similar.
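When it does happen, it may be worth capturing which PGs the busy OSD is serving at that moment. A rough sketch (osd.NN is a placeholder for the hot OSD, as in Sage's example, and the pg dump column layout can differ between releases):

    # list PG stats with their acting sets; pool 3 is .rgw.buckets in your osdmap,
    # so filter on its PGs and look for ones where the hot OSD is primary
    ceph pg dump | grep '^3\.'
    # while the spike is happening, see what is actually queued on that OSD
    ceph --admin-daemon /var/run/ceph/ceph-osd.NN.asok dump_ops_in_flight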
Have you checked the baseline disk performance of the OSDs? Perhaps it's not that the PG is bad but that the OSDs are slow.
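Something along these lines, run on the node hosting the suspect OSD (just a sketch; osd.NN is a placeholder, and the admin-socket counters depend on your build):

    # watch raw utilization/latency of the data and journal disks during a spike
    iostat -x 1
    # internal per-OSD latency counters via the admin socket
    ceph --admin-daemon /var/run/ceph/ceph-osd.NN.asok perf dump

If one disk (or its journal) is consistently slower than its peers, whatever PGs land on it will look hot even with an evenly spread workload.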