Hi Ram,

On 12/01/17 11:49, Ankireddypalle Reddy wrote:
Xavi,
          As I mentioned before, the error could happen for any FOP. I will try to 
run with the TRACE debug level. Is there a possibility that we are checking for 
this attribute on a directory? A directory does not seem to have 
this attribute set.

No, directories do not have this attribute and no one should be reading it from a directory.

Also, is the function that checks size and version called after it is decided that 
a heal should be run, or is this check the one that decides whether a heal 
should be run?

Almost all checks that trigger a heal are done in the lookup fop when some discrepancy is detected.

The function that checks size and version is called later, once a lock on the inode is acquired (even if no heal is needed). However, further failures in the processing of any fop can also trigger a self-heal.

Xavi


Thanks and Regards,
Ram

Sent from my iPhone

On Jan 12, 2017, at 2:25 AM, Xavier Hernandez <xhernan...@datalab.es> wrote:

Hi Ram,

On 12/01/17 02:36, Ankireddypalle Reddy wrote:
Xavi,
         I added some more logging information. The trusted.ec.size field 
values are in fact different.
          trusted.ec.size    l1 = 62719407423488    l2 = 0

That's very weird. Directories do not have this attribute. It's only present on 
regular files. But you said that the error happens while creating the file, so 
it doesn't make much sense because file creation always sets trusted.ec.size to 
0.

Could you reproduce the problem with diagnostics.client-log-level set to TRACE 
and send the log to me? It will create a big log, but I'll have much more 
information about what's going on.

Do you have a mixed setup with nodes of different types? For example, mixed 
32/64-bit architectures or different operating systems? I ask this because 
62719407423488 in hex is 0x390B00000000, which has the lower 32 bits set to 0, 
but has garbage above that.
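
The hex observation above can be checked with a few lines of Python. This is only an illustrative sketch; the variable names are made up for the example, and the value is the l1 figure from the log.

```python
# Split the logged trusted.ec.size value into its 32-bit halves.
value = 62719407423488  # l1 value reported in the log

low32 = value & 0xFFFFFFFF   # lower 32 bits
high32 = value >> 32         # everything above the lower 32 bits

print(f"value  = {value:#018x}")
print(f"high32 = {high32:#x}")
print(f"low32  = {low32:#x}")
```

The lower half comes out as zero and the upper half as 0x390b, which is what makes the value look like a 64-bit field whose upper part was filled with something unrelated.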


          This is a fairly static setup with no brick/node failures. Please 
explain why a heal is being triggered and what could have actually 
caused these size xattrs to differ. This is causing random I/O failures and is 
impacting the backup schedules.

The launch of self-heal is normal because it has detected an inconsistency. The 
real problem is what causes that inconsistency.

Xavi


[2017-01-12 01:19:18.256970] W [MSGID: 122056] 
[ec-combine.c:873:ec_combine_check] 0-glusterfsProd-disperse-8: Mismatching 
xdata in answers of 'LOOKUP'
[2017-01-12 01:19:18.257015] W [MSGID: 122053] 
[ec-common.c:116:ec_check_status] 0-glusterfsProd-disperse-8: Operation failed 
on some subvolumes (up=7, mask=7, remaining=0, good=3, bad=4)
[2017-01-12 01:19:18.257018] W [MSGID: 122002] [ec-common.c:71:ec_heal_report] 
0-glusterfsProd-disperse-8: Heal failed [Invalid argument]
[2017-01-12 01:19:21.002028] E [dict.c:197:key_value_cmp] 
0-glusterfsProd-disperse-4: 'trusted.ec.size' is different in two dicts (8, 8)
[2017-01-12 01:19:21.002056] E [dict.c:166:log_value] 
0-glusterfsProd-disperse-4: trusted.ec.size [ l1 = 62719407423488 l2 = 0 i1 = 0 
i2 = 0 ]
[2017-01-12 01:19:21.002064] W [MSGID: 122056] 
[ec-combine.c:873:ec_combine_check] 0-glusterfsProd-disperse-4: Mismatching 
xdata in answers of 'LOOKUP'
[2017-01-12 01:19:21.209640] E [dict.c:197:key_value_cmp] 
0-glusterfsProd-disperse-4: 'trusted.ec.size' is different in two dicts (8, 8)
[2017-01-12 01:19:21.209673] E [dict.c:166:log_value] 
0-glusterfsProd-disperse-4: trusted.ec.size [ l1 = 62719407423488 l2 = 0 i1 = 0 
i2 = 0 ]
[2017-01-12 01:19:21.209686] W [MSGID: 122056] 
[ec-combine.c:873:ec_combine_check] 0-glusterfsProd-disperse-4: Mismatching 
xdata in answers of 'LOOKUP'
[2017-01-12 01:19:21.209719] W [MSGID: 122053] 
[ec-common.c:116:ec_check_status] 0-glusterfsProd-disperse-4: Operation failed 
on some subvolumes (up=7, mask=7, remaining=0, good=6, bad=1)
[2017-01-12 01:19:21.209753] W [MSGID: 122002] [ec-common.c:71:ec_heal_report] 
0-glusterfsProd-disperse-4: Heal failed [Invalid argument]
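
A small sketch for reading the ec_check_status lines above. It assumes that the up/mask/remaining/good/bad fields are hexadecimal bitmasks with one bit per brick of the disperse set (bit 0 being the first brick); that interpretation is an assumption, not something stated in this thread. The helper names are hypothetical.

```python
import re

def bricks_in_mask(mask_hex):
    """Return the bit positions set in a hexadecimal bitmask string."""
    mask = int(mask_hex, 16)
    return [bit for bit in range(mask.bit_length()) if (mask >> bit) & 1]

def parse_status(line):
    """Extract the bitmask fields from an ec_check_status log message."""
    fields = re.findall(r"(up|mask|remaining|good|bad)=([0-9A-Fa-f]+)", line)
    return {name: bricks_in_mask(value) for name, value in fields}

line = ("Operation failed on some subvolumes "
        "(up=7, mask=7, remaining=0, good=6, bad=1)")
status = parse_status(line)
print(status)
```

Under that assumption, good=6, bad=1 would mean the second and third bricks agreed and the first brick gave the differing answer, while good=3, bad=4 would point at the third brick.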

Thanks and Regards,
Ram

-----Original Message-----
From: Ankireddypalle Reddy
Sent: Wednesday, January 11, 2017 9:29 AM
To: Ankireddypalle Reddy; Xavier Hernandez; Gluster Devel 
(gluster-de...@gluster.org); gluster-users@gluster.org
Subject: RE: [Gluster-users] [Gluster-devel] Lot of EIO errors in disperse 
volume

Xavi,
           I built a debug binary to log more information. This is what is 
getting logged. Looks like it is the attribute trusted.ec.size which is 
different among the bricks in a sub volume.

In glustershd.log :

[2017-01-11 14:19:45.023845] N [MSGID: 122029] 
[ec-generic.c:683:ec_combine_lookup] 0-glusterfsProd-disperse-8: Mismatching 
iatt in answers of 'GF_FOP_LOOKUP'
[2017-01-11 14:19:45.027718] E [dict.c:166:key_value_cmp] 
0-glusterfsProd-disperse-6: 'trusted.ec.size' is different in two dicts (8, 8)
[2017-01-11 14:19:45.027736] W [MSGID: 122056] 
[ec-combine.c:873:ec_combine_check] 0-glusterfsProd-disperse-6: Mismatching 
xdata in answers of 'LOOKUP'
[2017-01-11 14:19:45.027763] E [dict.c:166:key_value_cmp] 
0-glusterfsProd-disperse-6: 'trusted.ec.size' is different in two dicts (8, 8)
[2017-01-11 14:19:45.027781] W [MSGID: 122056] 
[ec-combine.c:873:ec_combine_check] 0-glusterfsProd-disperse-6: Mismatching 
xdata in answers of 'LOOKUP'
[2017-01-11 14:19:45.027793] W [MSGID: 122053] 
[ec-common.c:116:ec_check_status] 0-glusterfsProd-disperse-6: Operation failed 
on some subvolumes (up=7, mask=7, remaining=0, good=6, bad=1)
[2017-01-11 14:19:45.027815] W [MSGID: 122002] [ec-common.c:71:ec_heal_report] 
0-glusterfsProd-disperse-6: Heal failed [Invalid argument]
[2017-01-11 14:19:45.029035] E [dict.c:166:key_value_cmp] 
0-glusterfsProd-disperse-8: 'trusted.ec.size' is different in two dicts (8, 8)
[2017-01-11 14:19:45.029057] W [MSGID: 122056] 
[ec-combine.c:873:ec_combine_check] 0-glusterfsProd-disperse-8: Mismatching 
xdata in answers of 'LOOKUP'
[2017-01-11 14:19:45.029089] E [dict.c:166:key_value_cmp] 
0-glusterfsProd-disperse-8: 'trusted.ec.size' is different in two dicts (8, 8)
[2017-01-11 14:19:45.029105] W [MSGID: 122056] 
[ec-combine.c:873:ec_combine_check] 0-glusterfsProd-disperse-8: Mismatching 
xdata in answers of 'LOOKUP'
[2017-01-11 14:19:45.029121] W [MSGID: 122053] 
[ec-common.c:116:ec_check_status] 0-glusterfsProd-disperse-8: Operation failed 
on some subvolumes (up=7, mask=7, remaining=0, good=6, bad=1)
[2017-01-11 14:19:45.032566] E [dict.c:166:key_value_cmp] 
0-glusterfsProd-disperse-6: 'trusted.ec.size' is different in two dicts (8, 8)
[2017-01-11 14:19:45.029138] W [MSGID: 122002] [ec-common.c:71:ec_heal_report] 
0-glusterfsProd-disperse-8: Heal failed [Invalid argument]
[2017-01-11 14:19:45.032585] W [MSGID: 122056] 
[ec-combine.c:873:ec_combine_check] 0-glusterfsProd-disperse-6: Mismatching 
xdata in answers of 'LOOKUP'
[2017-01-11 14:19:45.032614] E [dict.c:166:key_value_cmp] 
0-glusterfsProd-disperse-6: 'trusted.ec.size' is different in two dicts (8, 8)
[2017-01-11 14:19:45.032631] W [MSGID: 122056] 
[ec-combine.c:873:ec_combine_check] 0-glusterfsProd-disperse-6: Mismatching 
xdata in answers of 'LOOKUP'
[2017-01-11 14:19:45.032638] W [MSGID: 122053] 
[ec-common.c:116:ec_check_status] 0-glusterfsProd-disperse-6: Operation failed 
on some subvolumes (up=7, mask=7, remaining=0, good=6, bad=1)
[2017-01-11 14:19:45.032654] W [MSGID: 122002] [ec-common.c:71:ec_heal_report] 
0-glusterfsProd-disperse-6: Heal failed [Invalid argument]
[2017-01-11 14:19:45.037514] E [dict.c:166:key_value_cmp] 
0-glusterfsProd-disperse-6: 'trusted.ec.size' is different in two dicts (8, 8)
[2017-01-11 14:19:45.037536] W [MSGID: 122056] 
[ec-combine.c:873:ec_combine_check] 0-glusterfsProd-disperse-6: Mismatching 
xdata in answers of 'LOOKUP'
[2017-01-11 14:19:45.037553] E [dict.c:166:key_value_cmp] 
0-glusterfsProd-disperse-6: 'trusted.ec.size' is different in two dicts (8, 8)
[2017-01-11 14:19:45.037573] W [MSGID: 122056] 
[ec-combine.c:873:ec_combine_check] 0-glusterfsProd-disperse-6: Mismatching 
xdata in answers of 'LOOKUP'
[2017-01-11 14:19:45.037582] W [MSGID: 122053] 
[ec-common.c:116:ec_check_status] 0-glusterfsProd-disperse-6: Operation failed 
on some subvolumes (up=7, mask=7, remaining=0, good=6, bad=1)
[2017-01-11 14:19:45.037599] W [MSGID: 122002] [ec-common.c:71:ec_heal_report] 
0-glusterfsProd-disperse-6: Heal failed [Invalid argument]
[2017-01-11 14:20:40.001401] E [dict.c:166:key_value_cmp] 
0-glusterfsProd-disperse-3: 'trusted.ec.size' is different in two dicts (8, 8)
[2017-01-11 14:20:40.001387] E [dict.c:166:key_value_cmp] 
0-glusterfsProd-disperse-5: 'trusted.ec.size' is different in two dicts (8, 8)

In the mount daemon log:

[2017-01-11 14:20:17.806826] E [MSGID: 122001] 
[ec-common.c:872:ec_config_check] 2-glusterfsProd-disperse-0: Invalid or 
corrupted config [Invalid argument]
[2017-01-11 14:20:17.806847] E [MSGID: 122066] 
[ec-common.c:969:ec_prepare_update_cbk] 2-glusterfsProd-disperse-0: Invalid 
config xattr [Invalid argument]
[2017-01-11 14:20:17.807076] E [MSGID: 122001] 
[ec-common.c:872:ec_config_check] 2-glusterfsProd-disperse-1: Invalid or 
corrupted config [Invalid argument]
[2017-01-11 14:20:17.807099] E [MSGID: 122066] 
[ec-common.c:969:ec_prepare_update_cbk] 2-glusterfsProd-disperse-1: Invalid 
config xattr [Invalid argument]
[2017-01-11 14:20:17.807286] E [MSGID: 122001] 
[ec-common.c:872:ec_config_check] 2-glusterfsProd-disperse-10: Invalid or 
corrupted config [Invalid argument]
[2017-01-11 14:20:17.807298] E [MSGID: 122066] 
[ec-common.c:969:ec_prepare_update_cbk] 2-glusterfsProd-disperse-10: Invalid 
config xattr [Invalid argument]
[2017-01-11 14:20:17.807409] E [MSGID: 122001] 
[ec-common.c:872:ec_config_check] 2-glusterfsProd-disperse-11: Invalid or 
corrupted config [Invalid argument]
[2017-01-11 14:20:17.807420] E [MSGID: 122066] 
[ec-common.c:969:ec_prepare_update_cbk] 2-glusterfsProd-disperse-11: Invalid 
config xattr [Invalid argument]
[2017-01-11 14:20:17.807448] E [MSGID: 122001] 
[ec-common.c:872:ec_config_check] 2-glusterfsProd-disperse-4: Invalid or 
corrupted config [Invalid argument]
[2017-01-11 14:20:17.807462] E [MSGID: 122066] 
[ec-common.c:969:ec_prepare_update_cbk] 2-glusterfsProd-disperse-4: Invalid 
config xattr [Invalid argument]
[2017-01-11 14:20:17.807539] E [MSGID: 122001] 
[ec-common.c:872:ec_config_check] 2-glusterfsProd-disperse-2: Invalid or 
corrupted config [Invalid argument]
[2017-01-11 14:20:17.807550] E [MSGID: 122066] 
[ec-common.c:969:ec_prepare_update_cbk] 2-glusterfsProd-disperse-2: Invalid 
config xattr [Invalid argument]
[2017-01-11 14:20:17.807723] E [MSGID: 122001] 
[ec-common.c:872:ec_config_check] 2-glusterfsProd-disperse-3: Invalid or 
corrupted config [Invalid argument]
[2017-01-11 14:20:17.807739] E [MSGID: 122066] 
[ec-common.c:969:ec_prepare_update_cbk] 2-glusterfsProd-disperse-3: Invalid 
config xattr [Invalid argument]
[2017-01-11 14:20:17.807785] E [MSGID: 122001] 
[ec-common.c:872:ec_config_check] 2-glusterfsProd-disperse-5: Invalid or 
corrupted config [Invalid argument]
[2017-01-11 14:20:17.807796] E [MSGID: 122066] 
[ec-common.c:969:ec_prepare_update_cbk] 2-glusterfsProd-disperse-5: Invalid 
config xattr [Invalid argument]
[2017-01-11 14:20:17.808020] E [MSGID: 122001] 
[ec-common.c:872:ec_config_check] 2-glusterfsProd-disperse-9: Invalid or 
corrupted config [Invalid argument]
[2017-01-11 14:20:17.808034] E [MSGID: 122066] 
[ec-common.c:969:ec_prepare_update_cbk] 2-glusterfsProd-disperse-9: Invalid 
config xattr [Invalid argument]
[2017-01-11 14:20:17.808054] E [MSGID: 122001] 
[ec-common.c:872:ec_config_check] 2-glusterfsProd-disperse-6: Invalid or 
corrupted config [Invalid argument]
[2017-01-11 14:20:17.808066] E [MSGID: 122066] 
[ec-common.c:969:ec_prepare_update_cbk] 2-glusterfsProd-disperse-6: Invalid 
config xattr [Invalid argument]
[2017-01-11 14:20:17.808282] E [MSGID: 122001] 
[ec-common.c:872:ec_config_check] 2-glusterfsProd-disperse-8: Invalid or 
corrupted config [Invalid argument]
[2017-01-11 14:20:17.808292] E [MSGID: 122066] 
[ec-common.c:969:ec_prepare_update_cbk] 2-glusterfsProd-disperse-8: Invalid 
config xattr [Invalid argument]
[2017-01-11 14:20:17.809212] E [MSGID: 122001] 
[ec-common.c:872:ec_config_check] 2-glusterfsProd-disperse-7: Invalid or 
corrupted config [Invalid argument]
[2017-01-11 14:20:17.809228] E [MSGID: 122066] 
[ec-common.c:969:ec_prepare_update_cbk] 2-glusterfsProd-disperse-7: Invalid 
config xattr [Invalid argument]

[2017-01-11 14:20:17.812660] I [MSGID: 109036] [dht-common.c:8043:dht_log_new_layout_for_dir_selfheal] 2-glusterfsProd-dht: Setting layout of /Folder_01.05.2017_21.15/CV_MAGNETIC/V_31500/CHUNK_402578 with [Subvol_name: glusterfsProd-disperse-0, Err: -1 , Start: 1789569705 , Stop: 2147483645 , Hash: 1 ], [Subvol_name: glusterfsProd-disperse-1, Err: -1 , Start: 2147483646 , Stop: 2505397586 , Hash: 1 ], [Subvol_name: glusterfsProd-disperse-10, Err: -1 , Start: 2505397587 , Stop: 2863311527 , Hash: 1 ], [Subvol_name: glusterfsProd-disperse-11, Err: -1 , Start: 2863311528 , Stop: 3221225468 , Hash: 1 ], [Subvol_name: glusterfsProd-disperse-2, Err: -1 , Start: 3221225469 , Stop: 3579139409 , Hash: 1 ], [Subvol_name: glusterfsProd-disperse-3, Err: -1 , Start: 3579139410 , Stop: 3937053350 , Hash: 1 ], [Subvol_name: glusterfsProd-disperse-4, Err: -1 , Start: 3937053351 , Stop: 4294967295 , Hash: 1 ], [Subvol_name: glusterfsProd-disperse-5, Err: -1 , Start: 0 , Stop: 357913940 ,
Hash: 1 ], [Subvol_name: glusterfsProd-disperse-6, Err: -1 , Start: 357913941 , 
Stop: 715827881 , Hash: 1 ], [Subvol_name: glusterfsProd-disperse-7, Err: -1 , 
Start: 715827882 , Stop: 1073741822 , Hash: 1 ], [Subvol_name: 
glusterfsProd-disperse-8, Err: -1 , Start: 1073741823 , Stop: 1431655763 , 
Hash: 1 ], [Subvol_name: glusterfsProd-disperse-9, Err: -1 , Start: 1431655764 
, Stop: 1789569704 , Hash: 1 ],


-----Original Message-----
From: gluster-users-boun...@gluster.org 
[mailto:gluster-users-boun...@gluster.org] On Behalf Of Ankireddypalle Reddy
Sent: Tuesday, January 10, 2017 10:09 AM
To: Xavier Hernandez; Gluster Devel (gluster-de...@gluster.org); 
gluster-users@gluster.org
Subject: Re: [Gluster-users] [Gluster-devel] Lot of EIO errors in disperse 
volume

Xavi,
          In this case it's the file creation which failed. So I provided the 
xattrs of the parent.

Thanks and Regards,
Ram

-----Original Message-----
From: Xavier Hernandez [mailto:xhernan...@datalab.es]
Sent: Tuesday, January 10, 2017 9:10 AM
To: Ankireddypalle Reddy; Gluster Devel (gluster-de...@gluster.org); 
gluster-users@gluster.org
Subject: Re: [Gluster-devel] Lot of EIO errors in disperse volume

Hi Ram,

On 10/01/17 14:42, Ankireddypalle Reddy wrote:
Attachments (2):

1. ec.txt (11.50 KB)
<https://imap.commvault.com/webconsole/api/contentstore/publicshare/346714/file/ee2d1536c2dc4dff94afb12132b4f8f6/action/download>

2. ws-glus.log (3.48 MB)
<https://imap.commvault.com/webconsole/api/contentstore/publicshare/346714/file/cff3e0506e754b9a939db02da1cbbd58/action/download>

Xavi,
         We are encountering errors for different kinds of FOPS.
         The open failed for the following file:

         cvd_2017_01_10_02_28_26.log:98182 1f9fe 01/10 00:57:10 8414465
[MEDIAFS    ] 20117519-52075477 SingleInstancer_FS::StartDataFile2:
Failed to create the data file
[/ws/glus/Folder_07.11.2016_23.02/CV_MAGNETIC/V_8854974/CHUNK_51342720
/SFILE_CONTAINER_062], error=0xECCC0005:{CQiFile::Open(92)} +
{CQiUTFOSAPI::open(96)/ErrNo.5.(Input/output error)-Open failed,
File=/ws/glus/Folder_07.11.2016_23.02/CV_MAGNETIC/V_8854974/CHUNK_5134
2720/SFILE_CONTAINER_062, OperationFlag=0xC1, PermissionMode=0x1FF}

         I've attached the extended attributes for the directories
         /ws/glus/Folder_07.11.2016_23.02/CV_MAGNETIC/V_8854974/ and

/ws/glus/Folder_07.11.2016_23.02/CV_MAGNETIC/V_8854974/CHUNK_51342720
from all the bricks.

        The attributes look fine to me. I've also attached some log
cuts to illustrate the problem.

I need the extended attributes of the file itself, not the parent directories.

Xavi


Thanks and Regards,
Ram

-----Original Message-----
From: Xavier Hernandez [mailto:xhernan...@datalab.es]
Sent: Tuesday, January 10, 2017 7:53 AM
To: Ankireddypalle Reddy; Gluster Devel (gluster-de...@gluster.org);
gluster-users@gluster.org
Subject: Re: [Gluster-devel] Lot of EIO errors in disperse volume

Hi Ram,

The error is caused by an extended attribute that does not match on
all 3 bricks of the disperse set. The most probable one is
trusted.ec.version, but it could be others.

At first sight, I don't see any change from 3.7.8 that could have
caused this. I'll check again.

What kind of operations are you doing? This can help me narrow the search.

Xavi

On 10/01/17 13:43, Ankireddypalle Reddy wrote:
Xavi,
         Thanks. If you could please explain what to look for in the
extended attributes, then I will check and let you know if I find
anything suspicious. Also, we noticed that some of these operations
would succeed if retried. Do you know of any communication-related
errors that are being reported/triaged?

Thanks and Regards,
Ram

-----Original Message-----
From: Xavier Hernandez [mailto:xhernan...@datalab.es]
Sent: Tuesday, January 10, 2017 7:23 AM
To: Ankireddypalle Reddy; Gluster Devel (gluster-de...@gluster.org);
gluster-users@gluster.org
Subject: Re: [Gluster-devel] Lot of EIO errors in disperse volume

Hi Ram,

On 10/01/17 13:14, Ankireddypalle Reddy wrote:
Attachment (1):

1. ecxattrs.txt (5.92 KB)
<https://imap.commvault.com/webconsole/api/contentstore/publicshare/346714/file/1272e68278744f15bf1a54f2b31b559d/action/download>

Xavi,
            Please find attached the extended attributes for a
directory from all the bricks. Free space check failed for this with
error number EIO.

What do you mean ? what operation have you made to check the free
space on that directory ?

If it's a recursive check, I need the extended attributes from the
exact file that triggers the EIO. The attached attributes seem
consistent and that directory shouldn't cause any problem. Does an 'ls'
on that directory fail or does it show the contents ?

Xavi


Thanks and Regards,
Ram

-----Original Message-----
From: Xavier Hernandez [mailto:xhernan...@datalab.es]
Sent: Tuesday, January 10, 2017 6:45 AM
To: Ankireddypalle Reddy; Gluster Devel (gluster-de...@gluster.org);
gluster-users@gluster.org
Subject: Re: [Gluster-devel] Lot of EIO errors in disperse volume

Hi Ram,

can you execute the following command on all bricks on a file that
is giving EIO ?

getfattr -m. -e hex -d <path to file in brick>
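
Once the getfattr output from each brick has been collected, a short Python sketch like the following could highlight which trusted.ec.* attributes differ. The brick names and attribute values below are hypothetical sample data, not taken from the attached files, and decoding trusted.ec.size as a big-endian 64-bit integer is an assumption.

```python
def parse_getfattr(text):
    """Parse 'getfattr -d -e hex' output into a {name: hex_value} dict."""
    attrs = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and the '# file: ...' header
        name, _, value = line.partition("=")
        attrs[name] = value
    return attrs

def mismatched(per_brick, prefix="trusted.ec."):
    """Return {attr: {brick: value}} for attributes that differ between bricks."""
    names = set()
    for attrs in per_brick.values():
        names.update(n for n in attrs if n.startswith(prefix))
    result = {}
    for name in sorted(names):
        values = {brick: attrs.get(name) for brick, attrs in per_brick.items()}
        if len(set(values.values())) > 1:
            result[name] = values
    return result

# Hypothetical output from three bricks of one disperse set.
outputs = {
    "brick1": "# file: example\ntrusted.ec.size=0x0000390b00000000\n"
              "trusted.ec.version=0x00000000000000010000000000000001\n",
    "brick2": "# file: example\ntrusted.ec.size=0x0000000000000000\n"
              "trusted.ec.version=0x00000000000000010000000000000001\n",
    "brick3": "# file: example\ntrusted.ec.size=0x0000000000000000\n"
              "trusted.ec.version=0x00000000000000010000000000000001\n",
}
per_brick = {brick: parse_getfattr(text) for brick, text in outputs.items()}
diff = mismatched(per_brick)
for name, values in diff.items():
    for brick, value in values.items():
        # Assumption: the value is a big-endian unsigned integer.
        print(name, brick, value, int(value, 16))
```

Only attributes whose values are not identical on every brick are printed, so a consistent file produces no output.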

Xavi

On 10/01/17 12:41, Ankireddypalle Reddy wrote:
Xavi,
           We have been running 3.7.8 on these servers. We upgraded
to 3.7.18 yesterday. We upgraded all the servers at the same time. The
volume was brought down during the upgrade.

Thanks and Regards,
Ram

-----Original Message-----
From: Xavier Hernandez [mailto:xhernan...@datalab.es]
Sent: Tuesday, January 10, 2017 6:35 AM
To: Ankireddypalle Reddy; Gluster Devel
(gluster-de...@gluster.org); gluster-users@gluster.org
Subject: Re: [Gluster-devel] Lot of EIO errors in disperse volume

Hi Ram,

How did you upgrade gluster? From which version?

Did you upgrade one server at a time and wait until self-heal
finished before upgrading the next server?

Xavi

On 10/01/17 11:39, Ankireddypalle Reddy wrote:
Hi,

     We upgraded to GlusterFS 3.7.18 yesterday. We see a lot of
failures in our applications. Most of the errors are EIO. The
following log lines are commonly seen in the logs:



The message "W [MSGID: 122056] [ec-combine.c:873:ec_combine_check]
0-StoragePool-disperse-4: Mismatching xdata in answers of 'LOOKUP'"
repeated 2 times between [2017-01-10 02:46:25.069809] and
[2017-01-10 02:46:25.069835]

[2017-01-10 02:46:25.069852] W [MSGID: 122056]
[ec-combine.c:873:ec_combine_check] 0-StoragePool-disperse-5:
Mismatching xdata in answers of 'LOOKUP'

The message "W [MSGID: 122056] [ec-combine.c:873:ec_combine_check]
0-StoragePool-disperse-5: Mismatching xdata in answers of 'LOOKUP'"
repeated 2 times between [2017-01-10 02:46:25.069852] and
[2017-01-10 02:46:25.069873]

[2017-01-10 02:46:25.069910] W [MSGID: 122056]
[ec-combine.c:873:ec_combine_check] 0-StoragePool-disperse-6:
Mismatching xdata in answers of 'LOOKUP'

...

[2017-01-10 02:46:26.520774] I [MSGID: 109036]
[dht-common.c:9076:dht_log_new_layout_for_dir_selfheal]
0-StoragePool-dht: Setting layout of
/Folder_07.11.2016_23.02/CV_MAGNETIC/V_8854213/CHUNK_51334585 with
[Subvol_name: StoragePool-disperse-0, Err: -1 , Start: 3221225466 , Stop: 3758096376 , Hash: 1 ],
[Subvol_name: StoragePool-disperse-1, Err: -1 , Start: 3758096377 , Stop: 4294967295 , Hash: 1 ],
[Subvol_name: StoragePool-disperse-2, Err: -1 , Start: 0 , Stop: 536870910 , Hash: 1 ],
[Subvol_name: StoragePool-disperse-3, Err: -1 , Start: 536870911 , Stop: 1073741821 , Hash: 1 ],
[Subvol_name: StoragePool-disperse-4, Err: -1 , Start: 1073741822 , Stop: 1610612732 , Hash: 1 ],
[Subvol_name: StoragePool-disperse-5, Err: -1 , Start: 1610612733 , Stop: 2147483643 , Hash: 1 ],
[Subvol_name: StoragePool-disperse-6, Err: -1 , Start: 2147483644 , Stop: 2684354554 , Hash: 1 ],
[Subvol_name: StoragePool-disperse-7, Err: -1 , Start: 2684354555 , Stop: 3221225465 , Hash: 1 ],

[2017-01-10 02:46:26.522841] N [MSGID: 122031]
[ec-generic.c:1130:ec_combine_xattrop] 0-StoragePool-disperse-3:
Mismatching dictionary in answers of 'GF_FOP_XATTROP'

The message "N [MSGID: 122031]
[ec-generic.c:1130:ec_combine_xattrop]
0-StoragePool-disperse-3: Mismatching dictionary in answers of
'GF_FOP_XATTROP'" repeated 2 times between [2017-01-10
02:46:26.522841] and [2017-01-10 02:46:26.522894]

[2017-01-10 02:46:26.522898] W [MSGID: 122040]
[ec-common.c:919:ec_prepare_update_cbk] 0-StoragePool-disperse-3:
Failed to get size and version [Input/output error]

[2017-01-10 02:46:26.523115] N [MSGID: 122031]
[ec-generic.c:1130:ec_combine_xattrop] 0-StoragePool-disperse-6:
Mismatching dictionary in answers of 'GF_FOP_XATTROP'

The message "N [MSGID: 122031]
[ec-generic.c:1130:ec_combine_xattrop]
0-StoragePool-disperse-6: Mismatching dictionary in answers of
'GF_FOP_XATTROP'" repeated 2 times between [2017-01-10
02:46:26.523115] and [2017-01-10 02:46:26.523143]

[2017-01-10 02:46:26.523147] W [MSGID: 122040]
[ec-common.c:919:ec_prepare_update_cbk] 0-StoragePool-disperse-6:
Failed to get size and version [Input/output error]

[2017-01-10 02:46:26.523302] N [MSGID: 122031]
[ec-generic.c:1130:ec_combine_xattrop] 0-StoragePool-disperse-2:
Mismatching dictionary in answers of 'GF_FOP_XATTROP'

The message "N [MSGID: 122031]
[ec-generic.c:1130:ec_combine_xattrop]
0-StoragePool-disperse-2: Mismatching dictionary in answers of
'GF_FOP_XATTROP'" repeated 2 times between [2017-01-10
02:46:26.523302] and [2017-01-10 02:46:26.523324]

[2017-01-10 02:46:26.523328] W [MSGID: 122040]
[ec-common.c:919:ec_prepare_update_cbk] 0-StoragePool-disperse-2:
Failed to get size and version [Input/output error]



[root@glusterfs3 Log_Files]# gluster --version
glusterfs 3.7.18 built on Dec  8 2016 06:34:26

[root@glusterfs3 Log_Files]# gluster volume info

Volume Name: StoragePool
Type: Distributed-Disperse
Volume ID: 149e976f-4e21-451c-bf0f-f5691208531f
Status: Started
Number of Bricks: 8 x (2 + 1) = 24
Transport-type: tcp
Bricks:
Brick1: glusterfs1sds:/ws/disk1/ws_brick
Brick2: glusterfs2sds:/ws/disk1/ws_brick
Brick3: glusterfs3sds:/ws/disk1/ws_brick
Brick4: glusterfs1sds:/ws/disk2/ws_brick
Brick5: glusterfs2sds:/ws/disk2/ws_brick
Brick6: glusterfs3sds:/ws/disk2/ws_brick
Brick7: glusterfs1sds:/ws/disk3/ws_brick
Brick8: glusterfs2sds:/ws/disk3/ws_brick
Brick9: glusterfs3sds:/ws/disk3/ws_brick
Brick10: glusterfs1sds:/ws/disk4/ws_brick
Brick11: glusterfs2sds:/ws/disk4/ws_brick
Brick12: glusterfs3sds:/ws/disk4/ws_brick
Brick13: glusterfs1sds:/ws/disk5/ws_brick
Brick14: glusterfs2sds:/ws/disk5/ws_brick
Brick15: glusterfs3sds:/ws/disk5/ws_brick
Brick16: glusterfs1sds:/ws/disk6/ws_brick
Brick17: glusterfs2sds:/ws/disk6/ws_brick
Brick18: glusterfs3sds:/ws/disk6/ws_brick
Brick19: glusterfs1sds:/ws/disk7/ws_brick
Brick20: glusterfs2sds:/ws/disk7/ws_brick
Brick21: glusterfs3sds:/ws/disk7/ws_brick
Brick22: glusterfs1sds:/ws/disk8/ws_brick
Brick23: glusterfs2sds:/ws/disk8/ws_brick
Brick24: glusterfs3sds:/ws/disk8/ws_brick
Options Reconfigured:
performance.readdir-ahead: on
diagnostics.client-log-level: INFO



Thanks and Regards,

Ram

***************************Legal Disclaimer***************************
"This communication may contain confidential and privileged
material for the sole use of the intended recipient. Any
unauthorized review, use or distribution by others is strictly
prohibited. If you have received the message by mistake, please
advise the sender by reply email and delete the message. Thank you."
**********************************************************************


_______________________________________________
Gluster-devel mailing list
gluster-de...@gluster.org
http://www.gluster.org/mailman/listinfo/gluster-devel


_______________________________________________
Gluster-users mailing list
Gluster-users@gluster.org
http://www.gluster.org/mailman/listinfo/gluster-users