Re: [ceph-users] ceph new osd addition and client disconnected

2015-11-03 Thread Chris Taylor

On 2015-11-03 12:01 am, gjprabu wrote:


Hi Taylor,

Details are below.

ceph -s
cluster 944fa0af-b7be-45a9-93ff-b9907cfaee3f
health HEALTH_OK
monmap e2: 3 mons at 
{integ-hm5=192.168.112.192:6789/0,integ-hm6=192.168.112.193:6789/0,integ-hm7=192.168.112.194:6789/0}

election epoch 526, quorum 0,1,2 integ-hm5,integ-hm6,integ-hm7
osdmap e50127: 3 osds: 3 up, 3 in
pgmap v2923439: 190 pgs, 2 pools, 3401 GB data, 920 kobjects
6711 GB used, 31424 GB / 40160 GB avail
190 active+clean
client io 35153 kB/s rd, 1912 kB/s wr, 672 op/s

The client is automatically unmounted in our case.

Is it possible to change the pg_num in a production setup?


Yes, but increase it in small steps. I would start with increments of 5 and 
work up to 32 at a time to find a step size your cluster is comfortable 
with. Increasing too many at once will cause a big performance problem 
until the cluster recovers.


Increase pg_num first: # ceph osd pool set {pool-name} pg_num {pg_num}

Wait for the PGs to finish peering.

Then increase pgp_num: # ceph osd pool set {pool-name} pgp_num {pg_num}

Objects will then migrate until the cluster is re-balanced.

Take a look at: 
http://docs.ceph.com/docs/master/rados/operations/placement-groups/
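
For example, stepping up the 'repo' pool gradually might look like this 
(the pool name and target values are only illustrative):

# check the current values first
ceph osd pool get repo pg_num
ceph osd pool get repo pgp_num

# raise pg_num in a small step
ceph osd pool set repo pg_num 160

# wait until 'ceph -s' shows the new PGs active+clean again
ceph -s

# then raise pgp_num to match so the data actually re-balances
ceph osd pool set repo pgp_num 160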




The journal is stored on a SATA 7.2k RPM 6Gbps disk, and we have a 1Gb 
network interface.



For every write to the cluster the data is written to the journal and 
then to the backend filesystem, and the process is repeated for each 
replica. This creates a lot of IO on the disk. Without journals on an 
SSD I think you are better off with each disk as its own OSD instead of 
a large RAID array as a single OSD. Just my opinion. In the case of a 
disk/array failure, recovery time would also be shorter because there is 
less data to recover.
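
To see where an existing OSD's journal lives, check for a symlink in the 
OSD data directory (osd.0 and the default path are just examples):

# on the OSD host
ls -l /var/lib/ceph/osd/ceph-0/journal
# a symlink to an SSD partition means the journal is on the SSD;
# a plain file on the data disk means journal and data share the spindle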


I think a single 1Gb network interface is not enough for a production 
network. A single SATA disk could saturate a 1Gb network link. But it 
will depend on your workload.



We have not configured separate public and cluster networks; everything 
goes over the same LAN. Do we need this setup for better performance?


It will provide extra bandwidth for the cluster to replicate data 
between OSDs instead of using the public/client network.
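
If you do split them, a minimal sketch of the ceph.conf [global] entries 
would be something like this (the 192.168.113.0/24 cluster subnet is only 
an example):

[global]
# clients and monitors keep using the existing LAN
public network = 192.168.112.0/24
# OSD replication, backfill and recovery move to a dedicated subnet/NIC
cluster network = 192.168.113.0/24

Every OSD host needs an interface on the cluster network before you add 
this, and the OSDs have to be restarted to pick it up.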




Also, what is the better I/O operation setting for the CRUSH map?


AFAIK, the CRUSH map controls data placement; it is not really a 
performance tuning knob.




We are still getting errors in the ceph osd logs. What needs to be done 
about this error?


Have you tried disabling offloading on the NIC? # ethtool -K eth0 tx off
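
To check the full list of offload settings and turn the common ones off 
for a test (eth0 is just an example interface name):

# show current offload settings
ethtool -k eth0

# disable checksum and segmentation offloads
ethtool -K eth0 tx off rx off tso off gso off gro off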



2015-11-03 13:04:18.809488 7f387019c700 0 bad crc in data 3742210963 != 
exp 924878202
2015-11-03 13:04:18.812911 7f387019c700 0 -- 192.168.112.231:6800/49908 
>> 192.168.112.192:0/1457324982 pipe(0x170d2000 sd=44 :6800 s=0 pgs=0 cs=0 l=0 c=0x1b18bf40).accept peer addr is really 192.168.112.192:0/1457324982 (socket is 192.168.112.192:47128/0)


Regards
Prabu


Re: [ceph-users] ceph new osd addition and client disconnected

2015-11-03 Thread gjprabu
Hi Taylor,



   Details are below.



ceph -s

cluster 944fa0af-b7be-45a9-93ff-b9907cfaee3f

 health HEALTH_OK

 monmap e2: 3 mons at 
{integ-hm5=192.168.112.192:6789/0,integ-hm6=192.168.112.193:6789/0,integ-hm7=192.168.112.194:6789/0}

election epoch 526, quorum 0,1,2 integ-hm5,integ-hm6,integ-hm7

 osdmap e50127: 3 osds: 3 up, 3 in

  pgmap v2923439: 190 pgs, 2 pools, 3401 GB data, 920 kobjects

6711 GB used, 31424 GB / 40160 GB avail

 190 active+clean

 client io 35153 kB/s rd, 1912 kB/s wr, 672 op/s



The client is automatically unmounted in our case.



Is it possible to change the pg_num in a production setup?



The journal is stored on a SATA 7.2k RPM 6Gbps disk, and we have a 1Gb network interface.



We have not configured separate public and cluster networks; everything goes over the 
same LAN. Do we need this setup for better performance?



Also, what is the better I/O operation setting for the CRUSH map?





We are still getting errors in the ceph osd logs. What needs to be done about this error?



2015-11-03 13:04:18.809488 7f387019c700  0 bad crc in data 3742210963 != exp 
924878202

2015-11-03 13:04:18.812911 7f387019c700  0 -- 192.168.112.231:6800/49908 
>> 192.168.112.192:0/1457324982 pipe(0x170d2000 sd=44 :6800 s=0 pgs=0 
cs=0 l=0 c=0x1b18bf40).accept peer addr is really 192.168.112.192:0/1457324982 
(socket is 192.168.112.192:47128/0)





Regards

Prabu










Re: [ceph-users] ceph new osd addition and client disconnected

2015-11-02 Thread Chris Taylor

On 2015-11-02 10:19 pm, gjprabu wrote:


Hi Taylor,

I have checked the DNS names and all hosts resolve to the correct IPs. The 
MTU size is 1500 and the switch-level configuration is done. No firewall 
or SELinux is currently running.


We would also like answers to the queries below, which are already in the thread.

Regards
Prabu

 On Tue, 03 Nov 2015 11:20:07 +0530 CHRIS TAYLOR 
 wrote 


I would double check the network configuration on the new node. 
Including hosts files and DNS names. Do all the host names resolve to 
the correct IP addresses from all hosts?


"... 192.168.112.231:6800/49908 >> 192.168.113.42:0/599324131 ..."

Looks like the communication between subnets is a problem. Is 
xxx.xxx.113.xxx a typo? If that's correct, check MTU sizes. Are they 
configured correctly on the switch and all NICs?


Are there any iptables/firewall rules that could be blocking traffic 
between hosts?


Hope that helps,

Chris

On 2015-11-02 9:18 pm, gjprabu wrote:

Hi,

Anybody please help me on this issue.

Regards
Prabu

 On Mon, 02 Nov 2015 17:54:27 +0530 GJPRABU  
wrote 


Hi Team,

We have a ceph setup with 2 OSDs and replica 2, mounted by ocfs2 clients, 
and it was working. When we added a new osd, all the clients' rbd-mapped 
devices disconnected, and running rbd ls or rbd map just hung. We waited 
many hours for the new osd to fill, but peering did not complete even 
after the data sync finished and the client-side issue persisted, so we 
tried a service stop/start on the old osds; after some time rbd mapped 
automatically using our existing map script.


After the service stop/start on the old osds, the 3rd OSD rebuilt and 
backfilling started again, and after some time the clients' rbd-mapped 
devices disconnected and rbd ls / rbd map hung again. We decided to wait 
until the data sync to the 3rd OSD finished, and it completed, but the 
client-side rbd still would not map. After we restarted all mon and osd 
services the client-side issue was fixed and rbd mounted again. We 
suspect some issue in our setup; logs are attached for your reference.




What does 'ceph -s' look like? Is the cluster HEALTH_OK?



We don't know what we are missing in our setup; any help in solving this 
issue would be highly appreciated.


Before new osd.2 addition :

osd.0 - size : 13T and used 2.7 T
osd.1 - size : 13T and used 2.7 T

After new osd addition :
osd.0 size : 13T and used 1.8T
osd.1 size : 13T and used 2.1T
osd.2 size : 15T and used 2.5T

rbd ls
repo / integrepository (pg_num: 126)
rbd / integdownloads (pg_num: 64)

We would also like a few clarifications.

If a new osd is added, will all clients be unmounted automatically?




Clients do not need to unmount images when OSDs are added.


While adding a new osd, can we still access (read/write) from the client machines?



Clients still have read/write access to RBD images in the cluster while 
adding OSDs and during recovery.


How much data will be moved to the new osd, without changing any replica / 
pg_num settings?




The data will re-balance between OSDs automatically. I found having more 
PGs helps distribute the load more evenly.
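
To see how evenly data and PGs are currently spread across the OSDs 
(assuming a reasonably recent release), something like:

# per-OSD utilization and PG counts
ceph osd df

# overall cluster/pool usage
ceph df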



How long does this process take to finish?


It depends greatly on the hardware and configuration: whether journals 
are on SSDs or spinning disks, network connectivity, max_backfills, etc.
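
If the recovery traffic is hurting client IO, the backfill/recovery 
throttles can be turned down at runtime; a sketch (the values are only 
examples):

# lower backfill/recovery concurrency on all OSDs (runtime change only)
ceph tell osd.* injectargs '--osd-max-backfills 1 --osd-recovery-max-active 1'

# to make it persistent, set the same options under [osd] in ceph.conf:
#   osd_max_backfills = 1
#   osd_recovery_max_active = 1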




If we missed any common configuration, please share it.


I don't see any configuration for public and cluster networks. If you 
are sharing the same network for client traffic and object 
replication/recovery, the cluster re-balancing data between OSDs can 
interfere with the client traffic.


Take a look at: 
http://docs.ceph.com/docs/master/rados/configuration/network-config-ref/




ceph.conf
[global]
fsid = 944fa0af-b7be-45a9-93ff-b9907cfaee3f
mon_initial_members = integ-hm5, integ-hm6, integ-hm7
mon_host = 192.168.112.192,192.168.112.193,192.168.112.194
auth_cluster_required = cephx
auth_service_required = cephx
auth_client_required = cephx
filestore_xattr_use_omap = true
osd_pool_default_size = 2

[mon]
mon_clock_drift_allowed = .500

[client]
rbd_cache = false

Current logs from the new osd; the old logs are also attached.

2015-11-02 12:47:48.481641 7f386f691700 0 bad crc in data 3889133030 != 
exp 2857248268
2015-11-02 12:47:48.482230 7f386f691700 0 -- 192.168.112.231:6800/49908 
>> 192.168.113.42:0/599324131 pipe(0x170d2000 sd=28 :6800 s=0 pgs=0 cs=0 l=0 c=0xc510580).accept peer addr is really 192.168.113.42:0/599324131 (socket is 192.168.113.42:42530/0)
2015-11-02 12:47:48.483951 7f386f691700 0 bad crc in data 3192803598 != 
exp 1083014631
2015-11-02 12:47:48.484512 7f386f691700 0 -- 192.168.112.231:6800/49908 
>> 192.168.113.42:0/599324131 pipe(0x170ea000 sd=28 :6800 s=0 pgs=0 cs=0 l=0 c=0xc516f60).accept peer addr is really 192.168.113.42:0/599324131 (socket is 192.168.113.42:42531/0)
2015-11-02 12:47:48.486284 7f386f691700 0 bad crc in data 133120597 != 
exp 393328400
2015-11-02 12:47:48.486777 7f386f691700 0 -- 192.168.112.231:6800/49908 
>> 192.168.113.42:0/5

Re: [ceph-users] ceph new osd addition and client disconnected

2015-11-02 Thread gjprabu
Hi Taylor,



I have checked the DNS names and all hosts resolve to the correct IPs. The 
MTU size is 1500 and the switch-level configuration is done. No firewall 
or SELinux is currently running.



We would also like answers to the queries below, which are already in the thread.



Regards

Prabu




Re: [ceph-users] ceph new osd addition and client disconnected

2015-11-02 Thread Chris Taylor
 

I would double check the network configuration on the new node.
Including hosts files and DNS names. Do all the host names resolve to
the correct IP addresses from all hosts? 

"... 192.168.112.231:6800/49908 >> 192.168.113.42:0/599324131 ..." 

Looks like the communication between subnets is a problem. Is
xxx.xxx.113.xxx a typo? If that's correct, check MTU sizes. Are they
configured correctly on the switch and all NICs? 
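
One way to verify that a 1500-byte MTU really works end to end between 
two hosts (the interface name and IP are just examples from this thread):

# MTU configured on the interface
ip link show eth0

# 1472 bytes of payload + 28 bytes of headers = 1500, sent with DF set
ping -M do -s 1472 -c 3 192.168.112.192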

Are there any iptables/firewall rules that could be blocking traffic
between hosts?

Hope that helps, 

Chris 


Re: [ceph-users] ceph new osd addition and client disconnected

2015-11-02 Thread gjprabu
Hi,



Anybody please help me on this issue.



Regards

Prabu





[ceph-users] ceph new osd addition and client disconnected

2015-11-02 Thread gjprabu


Hi Team,



We have a ceph setup with 2 OSDs and replica 2, mounted by ocfs2 clients, 
and it was working. When we added a new osd, all the clients' rbd-mapped 
devices disconnected, and running rbd ls or rbd map just hung. We waited 
many hours for the new osd to fill, but peering did not complete even 
after the data sync finished and the client-side issue persisted, so we 
tried a service stop/start on the old osds; after some time rbd mapped 
automatically using our existing map script.



After the service stop/start on the old osds, the 3rd OSD rebuilt and 
backfilling started again, and after some time the clients' rbd-mapped 
devices disconnected and rbd ls / rbd map hung again. We decided to wait 
until the data sync to the 3rd OSD finished, and it completed, but the 
client-side rbd still would not map. After we restarted all mon and osd 
services the client-side issue was fixed and rbd mounted again. We 
suspect some issue in our setup; logs are attached for your reference.



We don't know what we are missing in our setup; any help in solving this 
issue would be highly appreciated.





Before new osd.2 addition :



osd.0 - size : 13T  and used 2.7 T

osd.1 - size : 13T  and used 2.7 T



After new osd addition :

osd.0  size : 13T  and used  1.8T

osd.1  size : 13T  and used  2.1T

osd.2  size : 15T  and used  2.5T



rbd ls

repo / integrepository  (pg_num: 126)

rbd / integdownloads (pg_num: 64)







We would also like a few clarifications.



If a new osd is added, will all clients be unmounted automatically?



While adding a new osd, can we still access (read/write) from the client machines?



How much data will be moved to the new osd, without changing any replica / pg_num settings?



How long does this process take to finish?



If we missed any common configuration, please share it.





ceph.conf

[global]

fsid = 944fa0af-b7be-45a9-93ff-b9907cfaee3f

mon_initial_members = integ-hm5, integ-hm6, integ-hm7

mon_host = 192.168.112.192,192.168.112.193,192.168.112.194

auth_cluster_required = cephx

auth_service_required = cephx

auth_client_required = cephx

filestore_xattr_use_omap = true

osd_pool_default_size = 2



[mon]

mon_clock_drift_allowed = .500



[client]

rbd_cache = false



Current logs from the new osd; the old logs are also attached.



2015-11-02 12:47:48.481641 7f386f691700  0 bad crc in data 3889133030 != exp 
2857248268

2015-11-02 12:47:48.482230 7f386f691700  0 -- 192.168.112.231:6800/49908 
>> 192.168.113.42:0/599324131 pipe(0x170d2000 sd=28 :6800 s=0 pgs=0 cs=0 
l=0 c=0xc510580).accept peer addr is really 192.168.113.42:0/599324131 (socket 
is 192.168.113.42:42530/0)

2015-11-02 12:47:48.483951 7f386f691700  0 bad crc in data 3192803598 != exp 
1083014631

2015-11-02 12:47:48.484512 7f386f691700  0 -- 192.168.112.231:6800/49908 
>> 192.168.113.42:0/599324131 pipe(0x170ea000 sd=28 :6800 s=0 pgs=0 cs=0 
l=0 c=0xc516f60).accept peer addr is really 192.168.113.42:0/599324131 (socket 
is 192.168.113.42:42531/0)

2015-11-02 12:47:48.486284 7f386f691700  0 bad crc in data 133120597 != exp 
393328400

2015-11-02 12:47:48.486777 7f386f691700  0 -- 192.168.112.231:6800/49908 
>> 192.168.113.42:0/599324131 pipe(0x16a18000 sd=28 :6800 s=0 pgs=0 cs=0 
l=0 c=0xc514620).accept peer addr is really 192.168.113.42:0/599324131 (socket 
is 192.168.113.42:42532/0)

2015-11-02 12:47:48.488624 7f386f691700  0 bad crc in data 3299720069 != exp 
211350069

2015-11-02 12:47:48.489100 7f386f691700  0 -- 192.168.112.231:6800/49908 
>> 192.168.113.42:0/599324131 pipe(0x170d2000 sd=28 :6800 s=0 pgs=0 cs=0 
l=0 c=0xc513860).accept peer addr is really 192.168.113.42:0/599324131 (socket 
is 192.168.113.42:42533/0)

2015-11-02 12:47:48.490911 7f386f691700  0 bad crc in data 2381447347 != exp 
1177846878

2015-11-02 12:47:48.491390 7f386f691700  0 -- 192.168.112.231:6800/49908 
>> 192.168.113.42:0/599324131 pipe(0x170ea000 sd=28 :6800 s=0 pgs=0 cs=0 
l=0 c=0xc513700).accept peer addr is really 192.168.113.42:0/599324131 (socket 
is 192.168.113.42:42534/0)

2015-11-02 12:47:48.493167 7f386f691700  0 bad crc in data 2093712440 != exp 
2175112954

2015-11-02 12:47:48.493682 7f386f691700  0 -- 192.168.112.231:6800/49908 
>> 192.168.113.42:0/599324131 pipe(0x16a18000 sd=28 :6800 s=0 pgs=0 cs=0 
l=0 c=0xc514200).accept peer addr is really 192.168.113.42:0/599324131 (socket 
is 192.168.113.42:42535/0)

2015-11-02 12:47:48.495150 7f386f691700  0 bad crc in data 3047197039 != exp 
38098198

2015-11-02 12:47:48.495679 7f386f691700  0 -- 192.168.112.231:6800/49908 
>> 192.168.113.42:0/599324131 pipe(0x170d2000 sd=28 :6800 s=0 pgs=0 cs=0 
l=0 c=0xc510b00).accept peer addr is really 192.168.113.42:0/599324131 (socket 
is 192.168.113.42:42536/0)

2015-11-02 12:47:48.497259 7f386f691700  0 bad crc in data 1400444622 != exp 
2648291990

2015-11-02 12:47:48.497756 7f386f691700  0 -- 192.168.112.231:6800/49908 
>> 192.168.113.42:0/599324131 pipe(0x170ea000 sd=28 :6800 s=0 pgs=0 cs=0 
l=0 c=0x17f7b700).accept peer