Re: [Gluster-devel] Pull Request review workflow

2020-10-15 Thread Ashish Pandey

I think it's a very good suggestion; I have faced this issue too. 
I think we should do it now, before we get used to the current process :) 

--- 
Ashish 


- Original Message -

From: "Xavi Hernandez"  
To: "gluster-devel"  
Sent: Thursday, October 15, 2020 6:16:06 PM 
Subject: Re: [Gluster-devel] Pull Request review workflow 

If everyone agrees, I'll prepare a PR with the changes in rfc.sh and 
documentation to implement this change. 

Xavi 

On Thu, Oct 15, 2020 at 1:27 PM Ravishankar N < ravishan...@redhat.com > wrote: 






On 15/10/20 4:36 pm, Sheetal Pamecha wrote: 




+1 
Just a note to the maintainers who are merging PRs: please have patience and check 
the commit message when there is more than one commit in a PR. 




Makes sense. 









Another thing to consider is that the rfc.sh script always does a rebase before 
pushing changes. This rewrites history and changes all commits of a PR. I think 
we shouldn't do a rebase in rfc.sh; only if there are conflicts would I do a 
manual rebase and push the changes. 
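For reference, the manual flow being described would be roughly the following 
(remote and branch names are only examples): 

# only needed when the PR conflicts with the target branch
git fetch upstream
git rebase upstream/master        # resolve any conflicts here
git push --force-with-lease origin my-pr-branch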













I think we would also need to rebase if, say, some .t failure was fixed and we 
need to submit the PR on top of that, unless "run regression" always applies 
your PR on the latest HEAD of the concerned branch and triggers the regression. 





Actually true. Since the migration to GitHub I have not been using ./rfc.sh, 
and for me it's easier and cleaner. 







Me as well :) -Ravi 
___

Community Meeting Calendar:

Schedule -
Every 2nd and 4th Tuesday at 14:30 IST / 09:00 UTC
Bridge: https://bluejeans.com/441850968




Gluster-devel mailing list
Gluster-devel@gluster.org
https://lists.gluster.org/mailman/listinfo/gluster-devel



Re: [Gluster-devel] Removing problematic language in geo-replication

2020-07-22 Thread Ashish Pandey

1. Can I replace master:slave with primary:secondary everywhere in the 
code and the CLI? Are there any suggestions for more appropriate 
terminology? 

>> Other options could be - Leader : follower 

2. Is it okay to target the changes to a major release (release-9) and 
*not* provide backward compatibility for the CLI? 

>> I hope so as it is not impacting functionality. 
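For illustration, the user-visible change would be along these lines (the first 
form is the current geo-rep CLI syntax; the second is just the proposed rename): 

# current CLI
gluster volume geo-replication <master-vol> <slave-host>::<slave-vol> status
# after the proposed rename
gluster volume geo-replication <primary-vol> <secondary-host>::<secondary-vol> status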

 
Ashish 

- Original Message -

From: "Ravishankar N"  
To: "Gluster Devel"  
Sent: Wednesday, July 22, 2020 2:34:01 PM 
Subject: [Gluster-devel] Removing problematic language in geo-replication 

Hi, 

The gluster code base has some words and terminology (blacklist, 
whitelist, master, slave etc.) that can be considered hurtful/offensive 
to people in a global open source setting. Some of the words can be fixed 
trivially, but the geo-replication code seems to be something that needs 
extensive rework. More so because we have these words being used in the 
CLI itself. Two questions that I had were: 

1. Can I replace master:slave with primary:secondary everywhere in the 
code and the CLI? Are there any suggestions for more appropriate 
terminology? 


2. Is it okay to target the changes to a major release (release-9) and 
*not* provide backward compatibility for the CLI? 

Thanks, 

Ravi 



___

Community Meeting Calendar:

Schedule -
Every 2nd and 4th Tuesday at 14:30 IST / 09:00 UTC
Bridge: https://bluejeans.com/441850968




Gluster-devel mailing list
Gluster-devel@gluster.org
https://lists.gluster.org/mailman/listinfo/gluster-devel



Re: [Gluster-devel] [Gluster-users] "Transport endpoint is not connected" error + long list of files to be healed

2019-11-13 Thread Ashish Pandey
Hi Mauro, 

Yes, it will take time to heal these files, and the time depends on the number of 
files/directories you created and the amount of data you wrote while the 
bricks were down. 

You can just run the following command and keep observing whether the count is 
changing or not: 
gluster volume heal tier2 info | grep entries 
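If you don't want to re-run it by hand, something like this also works (the 
one-minute interval is just an example): 

watch -n 60 "gluster volume heal tier2 info | grep entries"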

--- 
Ashish 

- Original Message -

From: "Mauro Tridici"  
To: "Gluster Devel"  
Cc: "Gluster-users"  
Sent: Wednesday, November 13, 2019 7:00:37 PM 
Subject: [Gluster-users] "Transport endpoint is not connected" error + long 
list of files to be healed 

Dear All, 

our GlusterFS filesystem was showing some problem during some simple users 
actions (for example, during directory or file creation). 




mkdir -p test 
mkdir: cannot create directory `test': Transport endpoint is not 
connected 




After receiving some user notifications, I investigated the issue and 
detected that 3 bricks (each one on a separate gluster server) were down. 
So, I forced the bricks to be up using “gluster vol start tier force” and the 
bricks came back successfully. All the bricks are up. 

Anyway, I saw from the “gluster vol status” command output that 2 self-heal 
daemons were also down, and I had to restart the daemons to fix the problem. 
Now everything seems to be OK watching the output of “gluster vol status”, and 
I can create a test directory on the file system. 

But during the last check, made using “gluster volume heal tier2 info”, I saw a 
long list of files and directories that need to be healed. 
The list is very long and the command output is still scrolling in my 
terminal. 

What can I do to fix this issue? Does the self-heal feature automatically fix 
each file that needs to be healed? 
Could you please help me understand what I need to do in this case? 

You can find below some information about our GlusterFS configuration: 

Volume Name: tier2 
Type: Distributed-Disperse 
Volume ID: a28d88c5-3295-4e35-98d4-210b3af9358c 
Status: Started 
Snapshot Count: 0 
Number of Bricks: 12 x (4 + 2) = 72 
Transport-type: tcp 

Thank you in advance. 
Regards, 
Mauro 

 

Community Meeting Calendar: 

APAC Schedule - 
Every 2nd and 4th Tuesday at 11:30 AM IST 
Bridge: https://bluejeans.com/118564314 

NA/EMEA Schedule - 
Every 1st and 3rd Tuesday at 01:00 PM EDT 
Bridge: https://bluejeans.com/118564314 

Gluster-users mailing list 
gluster-us...@gluster.org 
https://lists.gluster.org/mailman/listinfo/gluster-users 

___

Community Meeting Calendar:

APAC Schedule -
Every 2nd and 4th Tuesday at 11:30 AM IST
Bridge: https://bluejeans.com/441850968


NA/EMEA Schedule -
Every 1st and 3rd Tuesday at 01:00 PM EDT
Bridge: https://bluejeans.com/441850968

Gluster-devel mailing list
Gluster-devel@gluster.org
https://lists.gluster.org/mailman/listinfo/gluster-devel



[Gluster-devel] Gluster Community Meeting : 2019-07-09

2019-07-09 Thread Ashish Pandey
Hi All, 

Today we had the Gluster Community Meeting, and the minutes of the meeting can be 
found at the following link: 

https://github.com/gluster/community/blob/master/meetings/2019-07-09-Community_meeting.md
 

--- 
Ashish 
___

Community Meeting Calendar:

APAC Schedule -
Every 2nd and 4th Tuesday at 11:30 AM IST
Bridge: https://bluejeans.com/836554017

NA/EMEA Schedule -
Every 1st and 3rd Tuesday at 01:00 PM EDT
Bridge: https://bluejeans.com/486278655

Gluster-devel mailing list
Gluster-devel@gluster.org
https://lists.gluster.org/mailman/listinfo/gluster-devel



[Gluster-devel] Gluster Community Meeting (APAC friendly hours)

2019-07-08 Thread Ashish Pandey
The following is a new meeting request:

Subject: Gluster Community Meeting (APAC friendly hours)
Organizer: "Ashish Pandey" (aspan...@redhat.com)
Invitees: gluster-us...@gluster.org; gluster-devel@gluster.org; aspan...@redhat.com
Location: https://bluejeans.com/836554017
Time: Tuesday, July 9, 2019, 11:30:00 AM - 12:30:00 PM GMT +05:30 Chennai, Kolkata, Mumbai, New Delhi

Bridge: https://bluejeans.com/836554017

Minutes meeting: https://hackmd.io/Keo9lk_yRMK24QTEo7qr7g

Previous meeting notes: https://github.com/gluster/community/meetings

Flash talk: Amar would like to talk about glusterfs 8.0 and its roadmap.
___

Community Meeting Calendar:

APAC Schedule -
Every 2nd and 4th Tuesday at 11:30 AM IST
Bridge: https://bluejeans.com/836554017

NA/EMEA Schedule -
Every 1st and 3rd Tuesday at 01:00 PM EDT
Bridge: https://bluejeans.com/486278655

Gluster-devel mailing list
Gluster-devel@gluster.org
https://lists.gluster.org/mailman/listinfo/gluster-devel



Re: [Gluster-devel] Should we enable features.locks-notify.contention by default ?

2019-05-30 Thread Ashish Pandey


- Original Message -

From: "Xavi Hernandez"  
To: "Ashish Pandey"  
Cc: "Amar Tumballi Suryanarayan" , "gluster-devel" 
 
Sent: Thursday, May 30, 2019 2:03:54 PM 
Subject: Re: [Gluster-devel] Should we enable features.locks-notify.contention 
by default ? 

On Thu, May 30, 2019 at 9:03 AM Ashish Pandey < aspan...@redhat.com > wrote: 





I am only concerned about in-service upgrade. 
If a feature/option is not present in V1, then I would prefer not to enable it 
by default on V2. 




The problem is that without enabling it, (other-)eager-lock will cause 
performance issues in some cases. It doesn't seem good to keep an option 
disabled if enabling it solves these problems. 




We have seen some problems with other-eager-lock when we changed it to be enabled by 
default. 




Which problems ? I think the only issue with other-eager-lock has been 
precisely that locks-notify-contention was disabled and a bug that needed to be 
solved anyway. 
I was talking about the issue we saw when other-eager-lock was disabled and we then 
tried an in-service upgrade to a version where this option is ON by default. 
Although we don't have the root cause of that, I was wondering if a similar issue 
could happen in this case also. 

The difference will be that upgraded bricks will start sending upcall 
notifications. If clients are too old, these will simply be ignored. So I don't 
see any problem right now. 

Am I missing something ? 
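For reference, whether a given volume effectively has the option enabled after such 
an upgrade can be checked with the volume get command, using the option name from 
the related discussion below: 

gluster volume get <volname> features.lock-notify-contention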





--- 
Ashish 


From: "Amar Tumballi Suryanarayan" < atumb...@redhat.com > 
To: "Xavi Hernandez" < xhernan...@redhat.com > 
Cc: "gluster-devel" < gluster-devel@gluster.org > 
Sent: Thursday, May 30, 2019 12:04:43 PM 
Subject: Re: [Gluster-devel] Should we enable features.locks-notify.contention 
by default ? 



On Thu, May 30, 2019 at 11:34 AM Xavi Hernandez < xhernan...@redhat.com > 
wrote: 



Hi all, 

a patch [1] was added some time ago to send upcall notifications from the locks 
xlator to the current owner of a granted lock when another client tries to 
acquire the same lock (inodelk or entrylk). This makes it possible to use 
eager-locking on the client side, which improves performance significantly, 
while also keeping good performance when multiple clients are accessing the 
same files (the current owner of the lock receives the notification and 
releases it as soon as possible, allowing the other client to acquire it and 
proceed very soon). 

Currently both AFR and EC are ready to handle these contention notifications 
and both use eager-locking. However, the upcall contention notification is 
disabled by default. 

I think we should enable it by default. Does anyone see any possible issue if 
we do that? 





If it helps performance, we should ideally do it. 

But, considering we are days away from glusterfs-7.0 branching, should we do it 
now, or wait for the branch-out and make it the default for the next version (so 
that it gets time for testing)? Considering it is about consistency, I would like 
to hear everyone's opinion here. 

Regards, 
Amar 






Regards, 

Xavi 

[1] https://review.gluster.org/c/glusterfs/+/14736 
___ 





-- 
Amar Tumballi (amarts) 

___

Community Meeting Calendar:

APAC Schedule -
Every 2nd and 4th Tuesday at 11:30 AM IST
Bridge: https://bluejeans.com/836554017

NA/EMEA Schedule -
Every 1st and 3rd Tuesday at 01:00 PM EDT
Bridge: https://bluejeans.com/486278655

Gluster-devel mailing list
Gluster-devel@gluster.org
https://lists.gluster.org/mailman/listinfo/gluster-devel



[Gluster-devel] Meeting Details on footer of the gluster-devel and gluster-user mailing list

2019-05-07 Thread Ashish Pandey
Hi, 

When we send a mail to the gluster-devel or gluster-users mailing list, the following 
content gets auto-generated and placed at the end of the mail. 

Gluster-users mailing list 
gluster-us...@gluster.org 
https://lists.gluster.org/mailman/listinfo/gluster-users 
Gluster-devel mailing list 
Gluster-devel@gluster.org 
https://lists.gluster.org/mailman/listinfo/gluster-devel 

In a similar way, is it possible to attach the meeting schedule and link at the 
end of every such mail? Like this: 

Meeting schedule - 

* APAC friendly hours 
  * Tuesday, 14th May 2019, 11:30 AM IST 
  * Bridge: https://bluejeans.com/836554017 
* NA/EMEA 
  * Tuesday, 7th May 2019, 01:00 PM EDT 
  * Bridge: https://bluejeans.com/486278655 

Or just a link to the meeting minutes details? 
https://github.com/gluster/community/tree/master/meetings 

This will help developers and users of the community know when and where the 
meetings happen and how to attend them. 

--- 
Ashish 






___
Gluster-devel mailing list
Gluster-devel@gluster.org
https://lists.gluster.org/mailman/listinfo/gluster-devel

Re: [Gluster-devel] Should we enable contention notification by default ?

2019-05-02 Thread Ashish Pandey
Xavi, 

I would like to keep this option (features.lock-notify-contention) enabled by 
default. 
However, I can see that there is one more option which will impact the working 
of this one, "notify-contention-delay", described as follows: 
.description = "This value determines the minimum amount of time " 
"(in seconds) between upcall contention notifications " 
"on the same inode. If multiple lock requests are " 
"received during this period, only one upcall will " 
"be sent."}, 

I am not sure what the best value for this option would be if we want to keep 
features.lock-notify-contention ON by default. 
It looks like if we keep the value of notify-contention-delay higher, say 5 seconds, 
it will wait that long before sending the upcall 
notification, which does not look good. 
Is my understanding correct? 
What will be the impact of this value, and what should its default be? 
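For reference, a quick way to look at these options and to try the change on a test 
volume (the volume name is just an example): 

# show the contention-notification options and their descriptions
gluster volume set help | grep -A 4 notify-contention
# enable the notification on a test volume
gluster volume set testvol features.lock-notify-contention on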

--- 
Ashish 






- Original Message -

From: "Xavi Hernandez"  
To: "gluster-devel"  
Cc: "Pranith Kumar Karampuri" , "Ashish Pandey" 
, "Amar Tumballi"  
Sent: Thursday, May 2, 2019 4:15:38 PM 
Subject: Should we enable contention notification by default ? 

Hi all, 

there's a feature in the locks xlator that sends a notification to current 
owner of a lock when another client tries to acquire the same lock. This way 
the current owner is made aware of the contention and can release the lock as 
soon as possible to allow the other client to proceed. 

This is specially useful when eager-locking is used and multiple clients access 
the same files and directories. Currently both replicated and dispersed volumes 
use eager-locking and can use contention notification to force an early release 
of the lock. 

Eager-locking reduces the number of network requests required for each 
operation, improving performance, but could add delays to other clients while 
it keeps the inode or entry locked. With the contention notification feature we 
avoid this delay, so we get the best performance with minimal issues in 
multiclient environments. 

Currently the contention notification feature is controlled by the 
'features.lock-notify-contention' option and it's disabled by default. Should 
we enable it by default ? 

I don't see any reason to keep it disabled by default. Does anyone foresee any 
problem ? 

Regards, 

Xavi 

___
Gluster-devel mailing list
Gluster-devel@gluster.org
https://lists.gluster.org/mailman/listinfo/gluster-devel

Re: [Gluster-devel] [Gluster-users] Gluster : Improvements on "heal info" command

2019-03-06 Thread Ashish Pandey
No, it is not necessary that the first brick would be the local one. 

I really don't think starting from the local node will make a difference. 
The major time is not spent in getting the list of entries from the 
.glusterfs/indices/xattrop folder. 
The LOCK->XATTR_CHECK->UNLOCK cycle is what takes most of the time, and that is 
not going to change even if it starts from the local brick. 

--- 
Ashish 


- Original Message -

From: "Strahil"  
To: "Ashish" , "Gluster" , 
"Gluster"  
Sent: Wednesday, March 6, 2019 10:21:26 PM 
Subject: Re: [Gluster-users] Gluster : Improvements on "heal info" command 



Hi , 

This sounds nice. I would like to ask if the order is starting from the local 
node's bricks first ? (I am talking about --brick=one) 

Best Regards, 
Strahil Nikolov 
On Mar 5, 2019 10:51, Ashish Pandey  wrote: 



Hi All, 

We have observed and heard from gluster users about the long time "heal info" 
command takes. 
Even when we all want to know if a gluster volume is healthy or not, it takes 
time to list down all the files from all the bricks after which we can be 
sure if the volume is healthy or not. 
Here, we have come up with some options for "heal info" command which provide 
report quickly and reliably. 

gluster v heal vol info --subvol=[number of the subvol] --brick=[one,all] 
 

Problem: "gluster v heal  info" command picks each subvolume and 
checks the .glusterfs/indices/xattrop folder of every brick of that subvolume 
to find out if there is any entry 
which needs to be healed. It picks the entry and takes a lock on that entry to 
check xattrs to find out if that entry actually needs heal or not. 
This LOCK->CHECK-XATTR->UNLOCK cycle takes lot of time for each file. 

Let's consider two most often seen cases for which we use "heal info" and try 
to understand the improvements. 

Case -1 : Consider a 4+2 EC volume with all the bricks on 6 different nodes. 
A brick of the volume is down and a client has written 10K files on one of the 
mount points of this volume. Entries for these 10K files will be created in 
".glusterfs/indices/xattrop" on all of the remaining 5 bricks. Now the brick is UP, 
and when we use the "heal info" command for this volume, it goes to all the bricks, 
picks these 10K file entries, and 
goes through the LOCK->CHECK-XATTR->UNLOCK cycle for all the files. This happens 
for all the bricks; that means we check 50K files and perform the 
LOCK->CHECK-XATTR->UNLOCK cycle 50K times, 
while only 10K entries were sufficient to check. It is a very time-consuming 
operation. If IOs are happening on some of the new files, we check these 
files too, which adds to the time. 
Here, all we wanted to know was whether our volume has been healed and is healthy. 

Solution : Whenever a brick goes down and comes up and when we use "heal info" 
command, our *main intention* is to find out if the volume is *healthy* or 
*unhealthy*. A volume is unhealthy even if one 
file is not healthy. So, we should scan bricks one by one and as soon as we 
find that one brick is having some entries which require to be healed, we can 
come out and list the files and say the volume is not 
healthy. No need to scan rest of the bricks. That's where "--brick=[one,all]" 
option has been introduced. 

"gluster v heal vol info --brick=[one,all]" 
"one" - It will scan the brick sequentially and as soon as it will find any 
unhealthy entries, it will list it out and stop scanning other bricks. 
"all" - It will act just like current behavior and provide all the files from 
all the bricks. If we do not provide this option, default (current) behavior 
will be applicable. 

Case -2 : Consider 24 X (4+2) EC volume. Let's say one brick from *only one* of 
the sub volume has been replaced and a heal has been triggered. 
To know if the volume is in healthy state, we go to each brick of *each and 
every sub volume* and check if there are any entries in 
".glusterfs/indices/xattrop" folder which need heal or not. 
If we know which sub volume participated in brick replacement, we just need to 
check health of that sub volume and not query/check other sub volumes. 

If several clients are writing number of files on this volume, an entry for 
each of these files will be created in .glusterfs/indices/xattrop and "heal 
info' 
command will go through LOCK->CHECK-XATTR->UNLOCK cycle to find out if these 
entries need heal or not which takes lot of time. 
In addition to this a client will also see performance drop as it will have to 
release and take lock again. 


Solution: Provide an option to mention number of sub volume for which we want 
to check heal info. 

"gluster v heal vol info --subvol= " 
Here, --subvol will be given number of the subvolume we want to check. 
Example: 
"gluster v heal vol info --subvol=1 " 



[Gluster-devel] Gluster : Improvements on "heal info" command

2019-03-05 Thread Ashish Pandey
Hi All, 

We have observed and heard from gluster users about the long time the "heal info" 
command takes. 
Even when all we want to know is whether a gluster volume is healthy or not, it 
takes time to list all the files from all the bricks, after which we can be 
sure whether the volume is healthy or not. 
Here, we have come up with some options for the "heal info" command which provide a 
report quickly and reliably. 

gluster v heal vol info --subvol=[number of the subvol] --brick=[one,all] 
 

Problem: "gluster v heal  info" command picks each subvolume and 
checks the .glusterfs/indices/xattrop folder of every brick of that subvolume 
to find out if there is any entry 
which needs to be healed. It picks the entry and takes a lock on that entry to 
check xattrs to find out if that entry actually needs heal or not. 
This LOCK->CHECK-XATTR->UNLOCK cycle takes lot of time for each file. 

Let's consider the two most commonly seen cases for which we use "heal info" and try 
to understand the improvements. 

Case -1 : Consider a 4+2 EC volume with all the bricks on 6 different nodes. 
A brick of the volume is down and a client has written 10K files on one of the 
mount points of this volume. Entries for these 10K files will be created in 
".glusterfs/indices/xattrop" on all of the remaining 5 bricks. Now the brick is UP, 
and when we use the "heal info" command for this volume, it goes to all the bricks, 
picks these 10K file entries, and 
goes through the LOCK->CHECK-XATTR->UNLOCK cycle for all the files. This happens 
for all the bricks; that means we check 50K files and perform the 
LOCK->CHECK-XATTR->UNLOCK cycle 50K times, 
while only 10K entries were sufficient to check. It is a very time-consuming 
operation. If IOs are happening on some of the new files, we check these 
files too, which adds to the time. 
Here, all we wanted to know was whether our volume has been healed and is healthy. 

Solution : Whenever a brick goes down and comes up and we use the "heal info" 
command, our *main intention* is to find out if the volume is *healthy* or 
*unhealthy*. A volume is unhealthy even if one 
file is not healthy. So, we should scan the bricks one by one, and as soon as we 
find that one brick has some entries which require healing, we can stop, list the 
files, and say the volume is not 
healthy. There is no need to scan the rest of the bricks. That's where the 
"--brick=[one,all]" option has been introduced. 

"gluster v heal vol info --brick=[one,all]" 
"one" - It will scan the brick sequentially and as soon as it will find any 
unhealthy entries, it will list it out and stop scanning other bricks. 
"all" - It will act just like current behavior and provide all the files from 
all the bricks. If we do not provide this option, default (current) behavior 
will be applicable. 

Case -2 : Consider a 24 x (4+2) EC volume. Let's say one brick from *only one* of 
the subvolumes has been replaced and a heal has been triggered. 
To know if the volume is in a healthy state, we go to each brick of *each and 
every subvolume* and check whether there are any entries in the 
".glusterfs/indices/xattrop" folder which need heal. 
If we know which subvolume participated in the brick replacement, we just need to 
check the health of that subvolume and not query/check the other subvolumes. 

If several clients are writing a number of files on this volume, an entry for 
each of these files will be created in .glusterfs/indices/xattrop, and the "heal 
info" 
command will go through the LOCK->CHECK-XATTR->UNLOCK cycle to find out whether 
these entries need heal or not, which takes a lot of time. 
In addition to this, a client will also see a performance drop, as it will have to 
release and take the lock again. 

Solution: Provide an option to specify the number of the subvolume for which we want 
to check heal info. 

"gluster v heal vol info --subvol=<subvol number>" 
Here, --subvol will be given the number of the subvolume we want to check. 
Example: 
"gluster v heal vol info --subvol=1" 


=== 
Performance Data - 
A quick performance test done on standalone system. 

Type: Distributed-Disperse 
Volume ID: ea40eb13-d42c-431c-9c89-0153e834e67e 
Status: Started 
Snapshot Count: 0 
Number of Bricks: 2 x (4 + 2) = 12 
Transport-type: tcp 
Bricks: 
Brick1: apandey:/home/apandey/bricks/gluster/vol-1 
Brick2: apandey:/home/apandey/bricks/gluster/vol-2 
Brick3: apandey:/home/apandey/bricks/gluster/vol-3 
Brick4: apandey:/home/apandey/bricks/gluster/vol-4 
Brick5: apandey:/home/apandey/bricks/gluster/vol-5 
Brick6: apandey:/home/apandey/bricks/gluster/vol-6 
Brick7: apandey:/home/apandey/bricks/gluster/new-1 
Brick8: apandey:/home/apandey/bricks/gluster/new-2 
Brick9: apandey:/home/apandey/bricks/gluster/new-3 
Brick10: apandey:/home/apandey/bricks/gluster/new-4 
Brick11: apandey:/home/apandey/bricks/gluster/new-5 
Brick12: apandey:/home/apandey/bricks/gluster/new-6 

Just disabled the shd to get the data - 

Killed one brick each from two subvolumes and wrote 2000 files on mount point. 
[root@apandey 

Re: [Gluster-devel] Release 6: Kick off!

2019-01-23 Thread Ashish Pandey

Following is the patch I am working on and targeting: 
https://review.gluster.org/#/c/glusterfs/+/21933/ 

It is in the review phase and yet to be merged. 

-- 
Ashish 

- Original Message -

From: "RAFI KC"  
To: "Shyam Ranganathan" , "GlusterFS Maintainers" 
, "Gluster Devel"  
Sent: Wednesday, January 23, 2019 4:22:42 PM 
Subject: Re: [Gluster-devel] Release 6: Kick off! 

There are three patches that I'm working for Gluster-6. 

[1] : https://review.gluster.org/#/c/glusterfs/+/22075/ 

[2] : https://review.gluster.org/#/c/glusterfs/+/21333/ 

[3] : https://review.gluster.org/#/c/glusterfs/+/21720/ 


Regards 

Rafi KC 

On 1/19/19 1:51 AM, Shyam Ranganathan wrote: 
> On 12/6/18 9:34 AM, Shyam Ranganathan wrote: 
>> On 11/6/18 11:34 AM, Shyam Ranganathan wrote: 
>>> ## Schedule 
>> We have decided to postpone release-6 by a month, to accommodate for 
>> late enhancements and the drive towards getting what is required for the 
>> GCS project [1] done in core glusterfs. 
>> 
>> This puts the (modified) schedule for Release-6 as below, 
>> 
>> Working backwards on the schedule, here's what we have: 
>> - Announcement: Week of Mar 4th, 2019 
>> - GA tagging: Mar-01-2019 
>> - RC1: On demand before GA 
>> - RC0: Feb-04-2019 
>> - Late features cut-off: Week of Jan-21st, 2019 
>> - Branching (feature cutoff date): Jan-14-2019 
>> (~45 days prior to branching) 
> We are slightly past the branching date, I would like to branch early 
> next week, so please respond with a list of patches that need to be part 
> of the release and are still pending a merge, will help address review 
> focus on the same and also help track it down and branch the release. 
> 
> Thanks, Shyam 
> ___ 
> Gluster-devel mailing list 
> Gluster-devel@gluster.org 
> https://lists.gluster.org/mailman/listinfo/gluster-devel 
___
Gluster-devel mailing list
Gluster-devel@gluster.org
https://lists.gluster.org/mailman/listinfo/gluster-devel

Re: [Gluster-devel] Regression health for release-5.next and release-6

2019-01-14 Thread Ashish Pandey

I downloaded the logs of regression runs 1077 and 1073 and tried to investigate them. 
In both regressions, ec/bug-1236065.t is hanging on TEST 70, which is trying to 
get the online brick count. 

I can see in the mount/brick and glusterd logs that it has not moved forward after 
this test. 
glusterd.log - 

[2019-01-06 16:27:51.346408]:++ G_LOG:./tests/bugs/ec/bug-1236065.t: 
TEST: 70 5 online_brick_count ++ 
[2019-01-06 16:27:51.645014] I [MSGID: 106499] 
[glusterd-handler.c:4404:__glusterd_handle_status_volume] 0-management: 
Received status volume req for volume patchy 
[2019-01-06 16:27:51.646664] I [dict.c:2745:dict_get_str_boolean] 
(-->/build/install/lib/glusterfs/6dev/xlator/mgmt/glusterd.so(+0x4a6c3) 
[0x7f4c37fe06c3] 
-->/build/install/lib/glusterfs/6dev/xlator/mgmt/glusterd.so(+0x43b3a) 
[0x7f4c37fd9b3a] 
-->/build/install/lib/libglusterfs.so.0(dict_get_str_boolean+0x170) 
[0x7f4c433d83fb] ) 0-dict: key nfs.disable, integer type asked, has string type 
[Invalid argument] 
[2019-01-06 16:27:51.647177] I [dict.c:2361:dict_get_strn] 
(-->/build/install/lib/glusterfs/6dev/xlator/mgmt/glusterd.so(+0xffa32) 
[0x7f4c38095a32] 
-->/build/install/lib/glusterfs/6dev/xlator/mgmt/glusterd.so(+0x474ac) 
[0x7f4c37fdd4ac] -->/build/install/lib/libglusterfs.so.0(dict_get_strn+0x179) 
[0x7f4c433d7673] ) 0-dict: key brick0.rdma_port, string type asked, has integer 
type [Invalid argument] 
[2019-01-06 16:27:51.647227] I [dict.c:2361:dict_get_strn] 
(-->/build/install/lib/glusterfs/6dev/xlator/mgmt/glusterd.so(+0xffa32) 
[0x7f4c38095a32] 
-->/build/install/lib/glusterfs/6dev/xlator/mgmt/glusterd.so(+0x474ac) 
[0x7f4c37fdd4ac] -->/build/install/lib/libglusterfs.so.0(dict_get_strn+0x179) 
[0x7f4c433d7673] ) 0-dict: key brick1.rdma_port, string type asked, has integer 
type [Invalid argument] 
[2019-01-06 16:27:51.647292] I [dict.c:2361:dict_get_strn] 
(-->/build/install/lib/glusterfs/6dev/xlator/mgmt/glusterd.so(+0xffa32) 
[0x7f4c38095a32] 
-->/build/install/lib/glusterfs/6dev/xlator/mgmt/glusterd.so(+0x474ac) 
[0x7f4c37fdd4ac] -->/build/install/lib/libglusterfs.so.0(dict_get_strn+0x179) 
[0x7f4c433d7673] ) 0-dict: key brick2.rdma_port, string type asked, has integer 
type [Invalid argument] 
[2019-01-06 16:27:51.647333] I [dict.c:2361:dict_get_strn] 
(-->/build/install/lib/glusterfs/6dev/xlator/mgmt/glusterd.so(+0xffa32) 
[0x7f4c38095a32] 
-->/build/install/lib/glusterfs/6dev/xlator/mgmt/glusterd.so(+0x474ac) 
[0x7f4c37fdd4ac] -->/build/install/lib/libglusterfs.so.0(dict_get_strn+0x179) 
[0x7f4c433d7673] ) 0-dict: key brick3.rdma_port, string type asked, has integer 
type [Invalid argument] 
[2019-01-06 16:27:51.647371] I [dict.c:2361:dict_get_strn] 
(-->/build/install/lib/glusterfs/6dev/xlator/mgmt/glusterd.so(+0xffa32) 
[0x7f4c38095a32] 
-->/build/install/lib/glusterfs/6dev/xlator/mgmt/glusterd.so(+0x474ac) 
[0x7f4c37fdd4ac] -->/build/install/lib/libglusterfs.so.0(dict_get_strn+0x179) 
[0x7f4c433d7673] ) 0-dict: key brick4.rdma_port, string type asked, has integer 
type [Invalid argument] 
[2019-01-06 16:27:51.647409] I [dict.c:2361:dict_get_strn] 
(-->/build/install/lib/glusterfs/6dev/xlator/mgmt/glusterd.so(+0xffa32) 
[0x7f4c38095a32] 
-->/build/install/lib/glusterfs/6dev/xlator/mgmt/glusterd.so(+0x474ac) 
[0x7f4c37fdd4ac] -->/build/install/lib/libglusterfs.so.0(dict_get_strn+0x179) 
[0x7f4c433d7673] ) 0-dict: key brick5.rdma_port, string type asked, has integer 
type [Invalid argument] 
[2019-01-06 16:27:51.647447] I [dict.c:2361:dict_get_strn] 
(-->/build/install/lib/glusterfs/6dev/xlator/mgmt/glusterd.so(+0xffa32) 
[0x7f4c38095a32] 
-->/build/install/lib/glusterfs/6dev/xlator/mgmt/glusterd.so(+0x474ac) 
[0x7f4c37fdd4ac] -->/build/install/lib/libglusterfs.so.0(dict_get_strn+0x179) 
[0x7f4c433d7673] ) 0-dict: key brick6.rdma_port, string type asked, has integer 
type [Invalid argument] 
[2019-01-06 16:27:51.649335] E [MSGID: 101191] 
[event-epoll.c:759:event_dispatch_epoll_worker] 0-epoll: Failed to dispatch 
handler 
[2019-01-06 16:27:51.932871] I [MSGID: 106499] 
[glusterd-handler.c:4404:__glusterd_handle_status_volume] 0-management: 
Received status volume req for volume patchy 

It is just taking a lot of time to get the status at this point. 
It looks like there could be some issue with the connection or the handling of 
volume status when some bricks are down. 
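On the test setup, the slowness of the status call itself can be measured directly, 
for example: 

time gluster volume status patchy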

--- 
Ashish 



- Original Message -

From: "Mohit Agrawal"  
To: "Shyam Ranganathan"  
Cc: "Gluster Devel"  
Sent: Saturday, January 12, 2019 6:46:20 PM 
Subject: Re: [Gluster-devel] Regression health for release-5.next and release-6 

Previous logs related to client not bricks, below are the brick logs 

[2019-01-12 12:25:25.893485]:++ G_LOG:./tests/bugs/ec/bug-1236065.t: 
TEST: 68 rm -f 0.o 10.o 11.o 12.o 13.o 14.o 15.o 16.o 17.o 18.o 19.o 1.o 2.o 
3.o 4.o 5.o 6.o 7.o 8.o 9.o ++ 
The message "I [MSGID: 101016] [glusterfs3.h:746:dict_to_xdr] 0-dict: key 
'trusted.ec.size' would not be sent on wire in the future [Invalid 

Re: [Gluster-devel] Master branch lock down: RCA for tests (ec-1468261.t)

2018-08-12 Thread Ashish Pandey
Correction. 

RCA - https://lists.gluster.org/pipermail/gluster-devel/2018-August/055167.html 
Patch - Mohit is working on this patch (server side), which is yet to be merged. 

We can put an extra test in place to make sure the bricks are connected to shd 
before the heal begins. I will send a patch for that. 
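A rough sketch of the kind of check meant here, in the test framework's syntax (the 
shd-specific helper name is only illustrative; the final patch may use something 
different): 

# wait until the self-heal daemon sees all 6 bricks as up before the heal is triggered
EXPECT_WITHIN $CHILD_UP_TIMEOUT "6" ec_child_up_count_shd $V0 0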

--- 
Ashish 

- Original Message -

From: "Ashish Pandey"  
To: "Shyam Ranganathan"  
Cc: "GlusterFS Maintainers" , "Gluster Devel" 
 
Sent: Monday, August 13, 2018 10:54:16 AM 
Subject: Re: [Gluster-devel] Master branch lock down: RCA for tests 
(ec-1468261.t) 


RCA - https://lists.gluster.org/pipermail/gluster-devel/2018-August/055167.html 
Patch - https://review.gluster.org/#/c/glusterfs/+/20657/ should also fix this 
issue. 

Checking if we can put extra test to make sure bricks are connected to shd 
before heal begin. Will send a patch for that. 

--- 
Ashish 

- Original Message -

From: "Shyam Ranganathan"  
To: "Gluster Devel" , "GlusterFS Maintainers" 
 
Sent: Monday, August 13, 2018 6:12:59 AM 
Subject: Re: [Gluster-devel] Master branch lock down: RCA for tests 
(testname.t) 

As a means of keeping the focus going and squashing the remaining tests 
that were failing sporadically, request each test/component owner to, 

- respond to this mail changing the subject (testname.t) to the test 
name that they are responding to (adding more than one in case they have 
the same RCA) 
- with the current RCA and status of the same 

List of tests and current owners as per the spreadsheet that we were 
tracking are: 

./tests/basic/distribute/rebal-all-nodes-migrate.t TBD 
./tests/basic/tier/tier-heald.t TBD 
./tests/basic/afr/sparse-file-self-heal.t TBD 
./tests/bugs/shard/bug-1251824.t TBD 
./tests/bugs/shard/configure-lru-limit.t TBD 
./tests/bugs/replicate/bug-1408712.t Ravi 
./tests/basic/afr/replace-brick-self-heal.t TBD 
./tests/00-geo-rep/00-georep-verify-setup.t Kotresh 
./tests/basic/afr/gfid-mismatch-resolution-with-fav-child-policy.t Karthik 
./tests/basic/stats-dump.t TBD 
./tests/bugs/bug-1110262.t TBD 
./tests/basic/ec/ec-data-heal.t Mohit 
./tests/bugs/replicate/bug-1448804-check-quorum-type-values.t Pranith 
./tests/bugs/snapshot/bug-1482023-snpashot-issue-with-other-processes-accessing-mounted-path.t
 
TBD 
./tests/basic/ec/ec-5-2.t Sunil 
./tests/bugs/shard/bug-shard-discard.t TBD 
./tests/bugs/glusterd/remove-brick-testcases.t TBD 
./tests/bugs/protocol/bug-808400-repl.t TBD 
./tests/bugs/quick-read/bug-846240.t Du 
./tests/bugs/replicate/bug-1290965-detect-bitrotten-objects.t Mohit 
./tests/00-geo-rep/georep-basic-dr-tarssh.t Kotresh 
./tests/bugs/ec/bug-1236065.t Pranith 
./tests/00-geo-rep/georep-basic-dr-rsync.t Kotresh 
./tests/basic/ec/ec-1468261.t Ashish 
./tests/basic/afr/add-brick-self-heal.t Ravi 
./tests/basic/afr/granular-esh/replace-brick.t Pranith 
./tests/bugs/core/multiplex-limit-issue-151.t Sanju 
./tests/bugs/glusterd/validating-server-quorum.t Atin 
./tests/bugs/replicate/bug-1363721.t Ravi 
./tests/bugs/index/bug-1559004-EMLINK-handling.t Pranith 
./tests/bugs/replicate/bug-1433571-undo-pending-only-on-up-bricks.t Karthik 
./tests/bugs/glusterd/add-brick-and-validate-replicated-volume-options.t 
Atin 
./tests/bugs/glusterd/rebalance-operations-in-single-node.t TBD 
./tests/bugs/replicate/bug-1386188-sbrain-fav-child.t TBD 
./tests/bitrot/bug-1373520.t Kotresh 
./tests/bugs/distribute/bug-1117851.t Shyam/Nigel 
./tests/bugs/glusterd/quorum-validation.t Atin 
./tests/bugs/distribute/bug-1042725.t Shyam 
./tests/bugs/replicate/bug-1586020-mark-dirty-for-entry-txn-on-quorum-failure.t 
Karthik 
./tests/bugs/quota/bug-1293601.t TBD 
./tests/bugs/bug-1368312.t Du 
./tests/bugs/distribute/bug-1122443.t Du 
./tests/bugs/core/bug-1432542-mpx-restart-crash.t 1608568 Nithya/Shyam 

Thanks, 
Shyam 
___
Gluster-devel mailing list
Gluster-devel@gluster.org
https://lists.gluster.org/mailman/listinfo/gluster-devel

Re: [Gluster-devel] Master branch lock down: RCA for tests (ec-1468261.t)

2018-08-12 Thread Ashish Pandey

RCA - https://lists.gluster.org/pipermail/gluster-devel/2018-August/055167.html 
Patch - https://review.gluster.org/#/c/glusterfs/+/20657/ should also fix this 
issue. 

Checking if we can put an extra test in place to make sure the bricks are connected 
to shd before the heal begins. I will send a patch for that. 

--- 
Ashish 

- Original Message -

From: "Shyam Ranganathan"  
To: "Gluster Devel" , "GlusterFS Maintainers" 
 
Sent: Monday, August 13, 2018 6:12:59 AM 
Subject: Re: [Gluster-devel] Master branch lock down: RCA for tests 
(testname.t) 

As a means of keeping the focus going and squashing the remaining tests 
that were failing sporadically, request each test/component owner to, 

- respond to this mail changing the subject (testname.t) to the test 
name that they are responding to (adding more than one in case they have 
the same RCA) 
- with the current RCA and status of the same 

List of tests and current owners as per the spreadsheet that we were 
tracking are: 

./tests/basic/distribute/rebal-all-nodes-migrate.t TBD 
./tests/basic/tier/tier-heald.t TBD 
./tests/basic/afr/sparse-file-self-heal.t TBD 
./tests/bugs/shard/bug-1251824.t TBD 
./tests/bugs/shard/configure-lru-limit.t TBD 
./tests/bugs/replicate/bug-1408712.t Ravi 
./tests/basic/afr/replace-brick-self-heal.t TBD 
./tests/00-geo-rep/00-georep-verify-setup.t Kotresh 
./tests/basic/afr/gfid-mismatch-resolution-with-fav-child-policy.t Karthik 
./tests/basic/stats-dump.t TBD 
./tests/bugs/bug-1110262.t TBD 
./tests/basic/ec/ec-data-heal.t Mohit 
./tests/bugs/replicate/bug-1448804-check-quorum-type-values.t Pranith 
./tests/bugs/snapshot/bug-1482023-snpashot-issue-with-other-processes-accessing-mounted-path.t
 
TBD 
./tests/basic/ec/ec-5-2.t Sunil 
./tests/bugs/shard/bug-shard-discard.t TBD 
./tests/bugs/glusterd/remove-brick-testcases.t TBD 
./tests/bugs/protocol/bug-808400-repl.t TBD 
./tests/bugs/quick-read/bug-846240.t Du 
./tests/bugs/replicate/bug-1290965-detect-bitrotten-objects.t Mohit 
./tests/00-geo-rep/georep-basic-dr-tarssh.t Kotresh 
./tests/bugs/ec/bug-1236065.t Pranith 
./tests/00-geo-rep/georep-basic-dr-rsync.t Kotresh 
./tests/basic/ec/ec-1468261.t Ashish 
./tests/basic/afr/add-brick-self-heal.t Ravi 
./tests/basic/afr/granular-esh/replace-brick.t Pranith 
./tests/bugs/core/multiplex-limit-issue-151.t Sanju 
./tests/bugs/glusterd/validating-server-quorum.t Atin 
./tests/bugs/replicate/bug-1363721.t Ravi 
./tests/bugs/index/bug-1559004-EMLINK-handling.t Pranith 
./tests/bugs/replicate/bug-1433571-undo-pending-only-on-up-bricks.t Karthik 
./tests/bugs/glusterd/add-brick-and-validate-replicated-volume-options.t 
Atin 
./tests/bugs/glusterd/rebalance-operations-in-single-node.t TBD 
./tests/bugs/replicate/bug-1386188-sbrain-fav-child.t TBD 
./tests/bitrot/bug-1373520.t Kotresh 
./tests/bugs/distribute/bug-1117851.t Shyam/Nigel 
./tests/bugs/glusterd/quorum-validation.t Atin 
./tests/bugs/distribute/bug-1042725.t Shyam 
./tests/bugs/replicate/bug-1586020-mark-dirty-for-entry-txn-on-quorum-failure.t 
Karthik 
./tests/bugs/quota/bug-1293601.t TBD 
./tests/bugs/bug-1368312.t Du 
./tests/bugs/distribute/bug-1122443.t Du 
./tests/bugs/core/bug-1432542-mpx-restart-crash.t 1608568 Nithya/Shyam 

Thanks, 
Shyam 
___
Gluster-devel mailing list
Gluster-devel@gluster.org
https://lists.gluster.org/mailman/listinfo/gluster-devel

Re: [Gluster-devel] Master branch lock down status

2018-08-08 Thread Ashish Pandey
I think the problem with this failure is the same one Shyam suspected for the 
other EC failure: 
connections to the bricks are not being set up after killing the bricks and starting 
the volume using force. 

./tests/basic/ec/ec-1468261.t 
- 
Failure reported  - 

23:03:05 ok 34, LINENUM:79 
23:03:05 not ok 35 Got "5" instead of "6", LINENUM:80 
23:03:05 FAILED COMMAND: 6 ec_child_up_count patchy 0 
23:03:05 not ok 36 Got "1298" instead of "^0$", LINENUM:83 
23:03:05 FAILED COMMAND: ^0$ get_pending_heal_count patchy 
23:03:05 ok 37, LINENUM:86 
23:03:05 ok 38, LINENUM:87 
23:03:05 not ok 39 Got "3" instead of "4", LINENUM:88 
 
Looking at the glustershd log, I can see that there is an issue while starting 
the volume by force to start the killed bricks. 
The bricks are not getting connected. 
I am seeing the following logs in glustershd: 
== 
[2018-08-06 23:05:45.077699] I [MSGID: 101016] [glusterfs3.h:739:dict_to_xdr] 
0-dict: key 'trusted.ec.size' is would not be sent on wire in future [Invalid 
argument] 
[2018-08-06 23:05:45.077724] I [MSGID: 101016] [glusterfs3.h:739:dict_to_xdr] 
0-dict: key 'trusted.ec.dirty' is would not be sent on wire in future [Invalid 
argument] 
[2018-08-06 23:05:45.077744] I [MSGID: 101016] [glusterfs3.h:739:dict_to_xdr] 
0-dict: key 'trusted.ec.version' is would not be sent on wire in future 
[Invalid argument] 
[2018-08-06 23:05:46.695719] I [rpc-clnt.c:2087:rpc_clnt_reconfig] 
0-patchy-client-1: changing port to 49152 (from 0) 
[2018-08-06 23:05:46.699766] W [MSGID: 114043] 
[client-handshake.c:1061:client_setvolume_cbk] 0-patchy-client-1: failed to set 
the volume [Resource temporarily unavailable] 
[2018-08-06 23:05:46.699809] W [MSGID: 114007] 
[client-handshake.c:1090:client_setvolume_cbk] 0-patchy-client-1: failed to get 
'process-uuid' from reply dict [Invalid argument] 
[2018-08-06 23:05:46.699833] E [MSGID: 114044] 
[client-handshake.c:1096:client_setvolume_cbk] 0-patchy-client-1: SETVOLUME on 
remote-host failed: cleanup flag is set for xlator.  Try again later [Resource 
temporarily unavailable] 
[2018-08-06 23:05:46.699855] I [MSGID: 114051] 
[client-handshake.c:1201:client_setvolume_cbk] 0-patchy-client-1: sending 
CHILD_CONNECTING event 
[2018-08-06 23:05:46.699920] I [MSGID: 114018] 
[client.c:2255:client_rpc_notify] 0-patchy-client-1: disconnected from 
patchy-client-1. Client process will keep trying to connect to glusterd until 
brick's port is available 
[2018-08-06 23:05:50.702806] I [rpc-clnt.c:2087:rpc_clnt_reconfig] 
0-patchy-client-1: changing port to 49152 (from 0) 
[2018-08-06 23:05:50.706726] W [MSGID: 114043] 
[client-handshake.c:1061:client_setvolume_cbk] 0-patchy-client-1: failed to set 
the volume [Resource temporarily unavailable] 
[2018-08-06 23:05:50.706783] W [MSGID: 114007] 
[client-handshake.c:1090:client_setvolume_cbk] 0-patchy-client-1: failed to get 
'process-uuid' from reply dict [Invalid argument] 
[2018-08-06 23:05:50.706808] E [MSGID: 114044] 
[client-handshake.c:1096:client_setvolume_cbk] 0-patchy-client-1: SETVOLUME on 
remote-host failed: cleanup flag is set for xlator.  Try again later [Resource 
temporarily unavailable] 
[2018-08-06 23:05:50.706831] I [MSGID: 114051] 
[client-handshake.c:1201:client_setvolume_cbk] 0-patchy-client-1: sending 
CHILD_CONNECTING event 
[2018-08-06 23:05:50.706904] I [MSGID: 114018] 
[client.c:2255:client_rpc_notify] 0-patchy-client-1: disconnected from 
patchy-client-1. Client process will keep trying to connect to glusterd until 
brick's port is available 
[2018-08-06 23:05:54.713490] I [rpc-clnt.c:2087:rpc_clnt_reconfig] 
0-patchy-client-1: changing port to 49152 (from 0) 
[2018-08-06 23:05:54.717417] W [MSGID: 114043] 
[client-handshake.c:1061:client_setvolume_cbk] 0-patchy-client-1: failed to set 
the volume [Resource temporarily unavailable] 
[2018-08-06 23:05:54.717483] W [MSGID: 114007] 
[client-handshake.c:1090:client_setvolume_cbk] 0-patchy-client-1: failed to get 
'process-uuid' from reply dict [Invalid argument] 
[2018-08-06 23:05:54.717508] E [MSGID: 114044] 
[client-handshake.c:1096:client_setvolume_cbk] 0-patchy-client-1: SETVOLUME on 
remote-host failed: cleanup flag is set for xlator.  Try again later [Resource 
temporarily unavailable] 
[2018-08-06 23:05:54.717530] I [MSGID: 114051] 
[client-handshake.c:1201:client_setvolume_cbk] 0-patchy-client-1: sending 
CHILD_CONNECTING event 
[2018-08-06 23:05:54.717605] I [MSGID: 114018] 
[client.c:2255:client_rpc_notify] 0-patchy-client-1: disconnected from 
patchy-client-1. Client process will keep trying to connect to glusterd until 
brick's port is available 
[2018-08-06 23:05:58.204494]:++ G_LOG:./tests/basic/ec/ec-1468261.t: 
TEST: 83 ^0$ get_pending_heal_count patchy ++ 
There are many more such logs in this duration 
 
Time at which test at line no 80 started - 
[2018-08-06 23:05:38.652297]:++ 

Re: [Gluster-devel] [Gluster-users] Integration of GPU with glusterfs

2018-01-15 Thread Ashish Pandey

It is disappointing to see the limitation put by Nvidia on low-cost GPU 
usage in data centers. 
https://www.theregister.co.uk/2018/01/03/nvidia_server_gpus/ 

We thought of providing an option in glusterfs by which we can control whether we 
want to use the GPU or not. 
So, the concern about gluster eating up GPUs which could be used by others can 
be addressed. 
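Something along these lines, purely as an illustration of the idea (the option name 
below is hypothetical; no such option exists yet): 

# hypothetical toggle: keep GPU offload disabled unless the admin opts in
gluster volume set testvol disperse.gpu-offload off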

--- 
Ashish 



- Original Message -

From: "Jim Kinney"  
To: gluster-us...@gluster.org, "Lindsay Mathieson" 
, "Darrell Budic" , 
"Gluster Users"  
Cc: "Gluster Devel"  
Sent: Friday, January 12, 2018 6:00:25 PM 
Subject: Re: [Gluster-devel] [Gluster-users] Integration of GPU with glusterfs 



On January 11, 2018 10:58:28 PM EST, Lindsay Mathieson 
 wrote: 
>On 12/01/2018 3:14 AM, Darrell Budic wrote: 
>> It would also add physical resource requirements to future client 
>> deploys, requiring more than 1U for the server (most likely), and I’m 
> 
>> not likely to want to do this if I’m trying to optimize for client 
>> density, especially with the cost of GPUs today. 
> 
>Nvidia has banned their GPU's being used in Data Centers now to, I 
>imagine they are planning to add a licensing fee. 

Nvidia banned only the lower cost, home user versions of their GPU line from 
datacenters. 
> 
>-- 
>Lindsay Mathieson 
> 
>___ 
>Gluster-users mailing list 
>gluster-us...@gluster.org 
>http://lists.gluster.org/mailman/listinfo/gluster-users 

-- 
Sent from my Android device with K-9 Mail. All tyopes are thumb related and 
reflect authenticity. 
___
Gluster-devel mailing list
Gluster-devel@gluster.org
http://lists.gluster.org/mailman/listinfo/gluster-devel

Re: [Gluster-devel] [Gluster-users] Integration of GPU with glusterfs

2018-01-11 Thread Ashish Pandey

I have updated the comment. 
Thanks!!! 

--- 
Ashish 
- Original Message -

From: "Shyam Ranganathan" <srang...@redhat.com> 
To: "Ashish Pandey" <aspan...@redhat.com> 
Cc: "Gluster Devel" <gluster-devel@gluster.org> 
Sent: Thursday, January 11, 2018 10:12:54 PM 
Subject: Re: [Gluster-users] Integration of GPU with glusterfs 

On 01/11/2018 01:12 AM, Ashish Pandey wrote: 
> There is a gihub issue opened for this. Please provide your comment or 
> reply to this mail. 
> 
> A - https://github.com/gluster/glusterfs/issues/388 

Ashish, the github issue first comment is carrying the default message 
that we populate. 

It would make it more readable if you could copy the text in your mail 
to that instead (it would also look a lot cleaner). 

Thanks, 
Shyam 

___
Gluster-devel mailing list
Gluster-devel@gluster.org
http://lists.gluster.org/mailman/listinfo/gluster-devel

[Gluster-devel] Integration of GPU with glusterfs

2018-01-10 Thread Ashish Pandey
Hi, 

We have been thinking of exploiting GPU capabilities to enhance the performance of 
glusterfs. We would like to know others' thoughts on this. 
In EC, we do CPU-intensive computations to encode and decode data 
before writing and reading. This requires a lot of CPU cycles, and we have 
been observing 100% CPU usage on the client side. Data healing will have the 
same impact, as it also needs to do the read-decode-encode-write cycle. 
As most modern servers come with GPUs, making glusterfs GPU-ready 
might give us performance improvements. 
This is not specific to EC volumes only; there are other features which will 
require a lot of computation and could use this capability. For example: 
1 - Encryption/Decryption 
2 - Compression and de-duplication 
3 - Hashing 
4 - Any other? [Please add if you have something in mind] 

Before proceeding further we would like to have your inputs on this. 
Do you have any other use case (existing or future) which could perform better 
on a GPU? 
Do you think it is worth integrating GPUs with glusterfs, or could this 
performance gain be achieved in some other, better way? 
Any input on how we should implement it is welcome. 

There is a GitHub issue opened for this. Please provide your comments there or reply 
to this mail. 

A - https://github.com/gluster/glusterfs/issues/388 

--- 
Ashish 
___
Gluster-devel mailing list
Gluster-devel@gluster.org
http://lists.gluster.org/mailman/listinfo/gluster-devel

Re: [Gluster-devel] Regression failure : /tests/basic/ec/ec-1468261.t

2017-11-06 Thread Ashish Pandey

I don't think it is an issue with the test you mentioned. 
You may have to re-trigger the test. 
This is what I did for one of my patches. 

-- 
Ashish 
- Original Message -

From: "Nithya Balachandran" <nbala...@redhat.com> 
To: "Gluster Devel" <gluster-devel@gluster.org>, "Xavi Hernandez" 
<jaher...@redhat.com>, "Ashish Pandey" <aspan...@redhat.com> 
Sent: Monday, November 6, 2017 6:35:24 PM 
Subject: Regression failure : /tests/basic/ec/ec-1468261.t 

Can someone take a look at this? 
The run was aborted ( 
https://build.gluster.org/job/centos6-regression/7232/console ) 

Thanks, 
Nithya 

___
Gluster-devel mailing list
Gluster-devel@gluster.org
http://lists.gluster.org/mailman/listinfo/gluster-devel

Re: [Gluster-devel] Need inputs on patch #17985

2017-08-22 Thread Ashish Pandey
Raghavendra, 

I have provided my comment on this patch. 
I think EC will not have any issue with this approach. 
However, I would welcome comments from Xavi and Pranith too for any side 
effects which I may not be able to foresee. 

Ashish 

- Original Message -

From: "Raghavendra Gowdappa" <rgowd...@redhat.com> 
To: "Ashish Pandey" <aspan...@redhat.com> 
Cc: "Pranith Kumar Karampuri" <pkara...@redhat.com>, "Xavier Hernandez" 
<xhernan...@datalab.es>, "Gluster Devel" <gluster-devel@gluster.org> 
Sent: Wednesday, August 23, 2017 8:29:48 AM 
Subject: Need inputs on patch #17985 

Hi Ashish, 

Following are the blockers for making a decision on whether patch [1] can be 
merged or not: 
* Evaluation of dentry operations (like rename etc) in dht 
* Whether EC works fine if a non-lookup fop (like open(dir), stat, chmod etc) 
hits EC without a single lookup performed on file/inode 

Can you please comment on the patch? I'll take care of dht part. 

[1] https://review.gluster.org/#/c/17985/ 

regards, 
Raghavendra 

___
Gluster-devel mailing list
Gluster-devel@gluster.org
http://lists.gluster.org/mailman/listinfo/gluster-devel

Re: [Gluster-devel] High load on CPU due to glusterfsd process

2017-08-02 Thread Ashish Pandey
Hi, 

The issue you are seeing is a little complex, but the information you have 
provided is very limited. Please share: 
- Volume info 
- Volume status 
- What kind of IO is going on? 
- Is any brick down? 
- A snapshot of the top command output (see the example below) 
- Anything you are seeing in the glustershd, mount, or brick logs 
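For the top snapshot, a per-thread batch capture like this is usually enough (the 
interval and count are just examples): 

# three 5-second samples of all glusterfsd threads, saved to a file for sharing
top -bH -d 5 -n 3 -p "$(pidof glusterfsd | tr ' ' ',')" > top_glusterfsd.txt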

--- 
Ashish 

- Original Message -

From: "ABHISHEK PALIWAL"  
To: "gluster-users" , "Gluster Devel" 
 
Sent: Wednesday, August 2, 2017 1:49:30 PM 
Subject: Re: [Gluster-devel] High load on CPU due to glusterfsd process 

Could you please response? 

On Fri, Jul 28, 2017 at 5:55 PM, ABHISHEK PALIWAL < abhishpali...@gmail.com > 
wrote: 



Hi Team, 

Whenever I perform IO operations on a gluster volume, the CPU load increases, 
sometimes reaching up to 70-80. 

When we started debugging, we found that an io_worker thread is created to serve 
the IO request and consumes high CPU until that request is completed. 

Could you please let me know why the io_worker thread takes this much CPU? 

Is there any way to resolve this? 

-- 

Regards 
Abhishek Paliwal 






-- 




Regards 
Abhishek Paliwal 

___
Gluster-devel mailing list
Gluster-devel@gluster.org
http://lists.gluster.org/mailman/listinfo/gluster-devel

Re: [Gluster-devel] Glusto failures with dispersed volumes + Samba

2017-07-05 Thread Ashish Pandey
Hi Nigel, 

As Pranith has already mentioned, we are getting different gfids in loc and 
loc->inode. 
It looks like an issue with DHT. If a revalidate fails for a gfid, a fresh lookup 
should be done. 

I don't know if it is related or not, but a similar bug was fixed by Pranith: 
https://review.gluster.org/#/c/16986/ 

Ashish 



- Original Message -

From: "Pranith Kumar Karampuri"  
To: "Anoop C S"  
Cc: "gluster-devel"  
Sent: Thursday, June 29, 2017 7:36:45 PM 
Subject: Re: [Gluster-devel] Glusto failures with dispersed volumes + Samba 



On Thu, Jun 29, 2017 at 6:49 PM, Anoop C S < anoo...@autistici.org > wrote: 


On Thu, 2017-06-29 at 16:35 +0530, Nigel Babu wrote: 
> Hi Pranith and Xavi, 
> 
> We seem to be running into a problem with glusto tests when we try to run 
> them against dispersed 
> volumes over a CIFS mount[1]. 

Is this a new test case? If not was it running successfully before? 

> You can find the logs attached to the job [2]. 

VFS stat call failures are seen in Samba logs: 

[2017/06/29 11:01:55.959374, 0] 
../source3/modules/vfs_glusterfs.c:870(vfs_gluster_stat) 
glfs_stat(.) failed: Invalid argument 

I could also see the following errors(repeatedly..) in glusterfs client logs: 

[2017-06-29 10:33:43.031198] W [MSGID: 122019] 
[ec-helpers.c:412:ec_loc_gfid_check] 0- 
testvol_distributed-dispersed-disperse-0: Mismatching GFID's in loc 
[2017-06-29 10:33:43.031303] I [MSGID: 109094] 
[dht-common.c:1016:dht_revalidate_cbk] 0- 
testvol_distributed-dispersed-dht: Revalidate: subvolume 
testvol_distributed-dispersed-disperse-0 
for /user11 (gfid = 665c515b-3940-480f-af7c-6aaf37731eaa) returned -1 [Invalid 
argument] 




This log basically says that EC received loc which has different gfids in 
loc->inode->gfid and loc->gfid. 



> I've triggered a fresh job[3] to confirm that it only fails in these 
> particular conditions and 
> certainly seems to be the case. The job is currently ongoing, so you may want 
> to take a look when 
> you get some time how this job went. 
> 
> Let me know if you have any questions or need more debugging information. 
> 
> [1]: https://ci.centos.org/job/gluster_glusto/325/testReport/ 
> [2]: https://ci.centos.org/job/gluster_glusto/325/artifact/ 
> [3]: https://ci.centos.org/job/gluster_glusto/326/console 
> 
> 
> ___ 
> Gluster-devel mailing list 
> Gluster-devel@gluster.org 
> http://lists.gluster.org/mailman/listinfo/gluster-devel 






-- 
Pranith 

___ 
Gluster-devel mailing list 
Gluster-devel@gluster.org 
http://lists.gluster.org/mailman/listinfo/gluster-devel 

___
Gluster-devel mailing list
Gluster-devel@gluster.org
http://lists.gluster.org/mailman/listinfo/gluster-devel

Re: [Gluster-devel] Disperse volume : Sequential Writes

2017-07-04 Thread Ashish Pandey

I think it is a good idea. 
Maybe we can add more enhancements in this xlator to improve things in the future. 

- Original Message -

From: "Pranith Kumar Karampuri" <pkara...@redhat.com> 
To: "Ashish Pandey" <aspan...@redhat.com> 
Cc: "Xavier Hernandez" <xhernan...@datalab.es>, "Gluster Devel" 
<gluster-devel@gluster.org> 
Sent: Monday, July 3, 2017 9:05:54 AM 
Subject: Re: [Gluster-devel] Disperse volume : Sequential Writes 

Ashish, Xavi, 
I think it is better to implement this change as a separate read-after-write 
caching xlator which we can load between EC and client xlator. That way EC will 
not get a lot more functionality than necessary and may be this xlator can be 
used somewhere else in the stack if possible. 

On Fri, Jun 16, 2017 at 4:19 PM, Ashish Pandey < aspan...@redhat.com > wrote: 




I think it should be done as we have agreement on basic design. 


From: "Pranith Kumar Karampuri" < pkara...@redhat.com > 
To: "Xavier Hernandez" < xhernan...@datalab.es > 
Cc: "Ashish Pandey" < aspan...@redhat.com >, "Gluster Devel" < 
gluster-devel@gluster.org > 
Sent: Friday, June 16, 2017 3:50:09 PM 
Subject: Re: [Gluster-devel] Disperse volume : Sequential Writes 




On Fri, Jun 16, 2017 at 3:12 PM, Xavier Hernandez < xhernan...@datalab.es > 
wrote: 


On 16/06/17 10:51, Pranith Kumar Karampuri wrote: 




On Fri, Jun 16, 2017 at 12:02 PM, Xavier Hernandez 
< xhernan...@datalab.es > wrote: 

On 15/06/17 11:50, Pranith Kumar Karampuri wrote: 



On Thu, Jun 15, 2017 at 11:51 AM, Ashish Pandey 
< aspan...@redhat.com  
>> wrote: 

Hi All, 

We have been facing some issues in disperse (EC) volumes. 
We know that currently EC is not good for random IO, as it requires a 
READ-MODIFY-WRITE fop cycle if the offset and offset+length fall in the 
middle of a stripe. 

Unfortunately, this can also happen with sequential writes. 
Consider an EC volume with configuration 4+2. The stripe size for this 
would be 512 * 4 = 2048. That is, 2048 bytes of user data are stored in 
one stripe. 
Let's say 2048 + 512 = 2560 bytes are already written on this volume; 
512 bytes would be in the second stripe. 
Now, if a sequential write comes with offset 2560 and size 1 byte, we 
have to read the whole stripe, encode it with the 1 new byte, and then 
write it back. 
The next write, with offset 2561 and size 1 byte, will again 
READ-MODIFY-WRITE the whole stripe. This causes bad performance. 
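
A quick back-of-the-envelope sketch of this read-modify-write amplification 
(a sketch only, assuming the 4+2 / 512-byte fragment example above; the 
numbers are purely illustrative): 

  STRIPE=2048                               # 4 data fragments x 512 bytes each 
  for off in 2560 2561 2562; do 
      start=$(( off / STRIPE * STRIPE ))    # stripe-aligned start of the touched stripe 
      echo "1-byte write at offset $off falls in stripe [$start, $(( start + STRIPE - 1 ))]" 
      echo "  -> read $STRIPE bytes, re-encode, write $STRIPE bytes back" 
  done 

Every such small tail write pays the cost of a full stripe read plus a full 
stripe write. 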

There are some tools and scenarios where this kind of load occurs and 
users are not aware of it. 
Examples: fio and zip 

Solution: 
One possible solution to deal with this issue is to keep the last 
stripe in memory. 
This way, we do not need to read it again and we save a READ fop going 
over the network. 
Considering the above example, we would have to keep the last 2048 
bytes (at most) in memory per file. This should not be a big deal, as 
we already keep some data like xattrs and size info in memory and take 
decisions based on it. 

Please provide your thoughts on this, and also share any other 
solution you may have. 


Just adding more details. 
The stripe will be in memory only when lock on the inode is active. 


I think that's ok. 

One thing we are yet to decide on is: do we want to read the stripe 
every time we get the lock, or only just after an extending write is 
performed? 
I am thinking that keeping the stripe in memory just after an extending 
write is better, as it doesn't involve an extra network operation. 


I wouldn't read the last stripe unconditionally every time we lock 
the inode. There's no benefit at all on random writes (in fact it's 
worse) and a sequential write will issue the read anyway when 
needed. The only difference is a small delay for the first operation 
after a lock. 


Yes, perfect. 



What I would do is to keep the last stripe of every write (we can 
consider to do it per fd), even if it's not the last stripe of the 
file (to also optimize sequential rewrites). 


Ah! good point. But if we remember it per fd, one fd's cached data can 
be over-written by another fd on the disk so we need to also do cache 
invalidation. 



We only cache data if we have the inodelk, so all related fd's must be from the 
same client, and we'll control all its writes so cache invalidation in this 
case is pretty easy. 

There exists the possibility to have two fd's from the same client writing to 
the same region. To control this we would need some range checking in the 
writes, but all this is local, so it's easy to control it. 

Anyway, this is probably not a common case, so we could start by caching only 
the last stripe of the last write, ignoring the fd. 



Maybe the implementation should consider this possibility. 
I have yet to think about how to do this, but it is a good point and we 
should consider it. 


Maybe we could keep a list of cached stripes sorted by offs

[Gluster-devel] BUG: Code changes in EC as part of Brick Multiplexing

2017-06-22 Thread Ashish Pandey

Hi, 

There are some code changes in EC which are impacting the response time of "gluster v 
heal info". 
I have sent the following patch to initiate a discussion on this and to 
understand why this code change was done. 
https://review.gluster.org/#/c/17606/1 

 
ec: Increase notification in all the cases 

Problem: 
"gluster v heal <volname> info" is taking a 
long time to respond when a brick is down. 

RCA: 
The heal info command does a virtual mount. 
EC waits for 10 seconds before sending the UP call to the upper xlator, 
to get a notification (DOWN or UP) from all the bricks. 

Currently, we increase ec->xl_notify_count based on 
the current status of the brick. So, if a DOWN event notification 
comes and the brick is already down, we do not increase 
ec->xl_notify_count in ec_handle_down. 

Solution: 
Handle the DOWN event as a notification irrespective of 
the current status of the brick. 

 
Code change was done by https://review.gluster.org/#/c/14763/ 
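
For anyone who wants to reproduce the symptom, a rough sequence would be 
something like the following (the volume name and brick PID are placeholders, 
not taken from the report): 

  gluster volume status testvol            # note the PID of one brick process 
  kill -KILL <brick-pid>                   # take that brick down 
  time gluster volume heal testvol info    # the heal info virtual mount now waits for child notifications 

With the problem described above, the wait can run into the full 10-second 
timeout for every such mount. 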

Ashish 
___
Gluster-devel mailing list
Gluster-devel@gluster.org
http://lists.gluster.org/mailman/listinfo/gluster-devel

Re: [Gluster-devel] Build failed in Jenkins: regression-test-with-multiplex #60

2017-06-12 Thread Ashish Pandey
Ok, 

I will check whether this test still catches the data corruption after modifying the 
code in EC. 
Initially it was not doing so. 


- Original Message -

From: "Atin Mukherjee" <amukh...@redhat.com> 
To: "Ashish Pandey" <aspan...@redhat.com> 
Cc: "Gluster Devel" <gluster-devel@gluster.org> 
Sent: Monday, June 12, 2017 3:21:29 PM 
Subject: Re: Build failed in Jenkins: regression-test-with-multiplex #60 



On Mon, Jun 12, 2017 at 11:37 AM, Atin Mukherjee < amukh...@redhat.com > wrote: 





On Mon, Jun 12, 2017 at 11:15 AM, Ashish Pandey < aspan...@redhat.com > wrote: 




Test is failing because of ENOTCONN. 

-+ 
[2017-06-11 21:26:04.650497] I [fuse-bridge.c:4210:fuse_init] 0-glusterfs-fuse: 
FUSE inited with protocol versions: glusterfs 7.24 kernel 7.14 
[2017-06-11 21:26:04.650546] I [fuse-bridge.c:4840:fuse_graph_sync] 0-fuse: 
switched to graph 0 
[2017-06-11 21:26:04.650890] E [fuse-bridge.c:4276:fuse_first_lookup] 0-fuse: 
first lookup on root failed (Transport endpoint is not connected) 
[2017-06-11 21:26:04.651204] W [fuse-resolve.c:132:fuse_resolve_gfid_cbk] 
0-fuse: ----0001: failed to resolve (Transport 
endpoint is not connected) 
[2017-06-11 21:26:04.651231] E [fuse-bridge.c:881:fuse_getattr_resume] 
0-glusterfs-fuse: 2: GETATTR 1 (----0001) 
resolution failed 
[2017-06-11 21:26:04.651379] W [fuse-resolve.c:132:fuse_resolve_gfid_cbk] 
0-fuse: ----0001: failed to resolve (Transport 
endpoint is not connected) 
[2017-06-11 21:26:04.651396] E [fuse-bridge.c:881:fuse_getattr_resume] 
0-glusterfs-fuse: 3: GETATTR 1 (----0001) 
resolution failed 
[2017-06-11 21:26:05.654880] W [fuse-resolve.c:132:fuse_resolve_gfid_cbk] 
0-fuse: ----0001: failed to resolve (Transport 
endpoint is not connected) 
[2017-06-11 21:26:05.654921] E [fuse-bridge.c:881:fuse_getattr_resume] 
0-glusterfs-fuse: 4: GETATTR 1 (----0001) 
resolution failed 
[2017-06-11 21:26:05.655105] W [fuse-resolve.c:132:fuse_resolve_gfid_cbk] 
0-fuse: ----0001: failed to resolve (Transport 
endpoint is not connected) 
[2017-06-11 21:26:05.655132] E [fuse-bridge.c:881:fuse_getattr_resume] 
0-glusterfs-fuse: 5: GETATTR 1 (----0001) 
resolution failed 
[2017-06-11 21:26:06.658233] W [fuse-resolve.c:132:fuse_resolve_gfid_cbk] 
0-fuse: ----0001: failed to resolve (Transport 
endpoint is not connected) 
[2017-06-11 21:26:06.658294] E [fuse-bridge.c:881:fuse_getattr_resume] 
0-glusterfs-fuse: 6: GETATTR 1 (----0001) 
resolution failed 
[2017-06-11 21:26:06.658445] W [fuse-resolve.c:132:fuse_resolve_gfid_cbk] 
0-fuse: ----0001: failed to resolve (Transport 
endpoint is not connected) 
[2017-06-11 21:26:06.658471] E [fuse-bridge.c:881:fuse_getattr_resume] 
0-glusterfs-fuse: 7: GETATTR 1 (----0001) 
resolution failed 
[2017-06-11 21:26:07.661446] W [fuse-resolve.c:132:fuse_resolve_gfid_cbk] 
0-fuse: ----0001: failed to resolve (Transport 
endpoint is not connected) 
[2017-06-11 21:26:07.661487] E [fuse-bridge.c:881:fuse_getattr_resume] 
0-glusterfs-fuse: 8: GETATTR 1 (----0001) 
resolution failed 
[2017-06-11 21:26:07.661625] W [fuse-resolve.c:132:fuse_resolve_gfid_cbk] 
0-fuse: ----0001: failed to resolve (Transport 
endpoint is not connected) 
[2017-06-11 21:26:07.661642] E [fuse-bridge.c:881:fuse_getattr_resume] 
0-glusterfs-fuse: 9: GETATTR 1 (----0001) 
resolution failed 
[2017-06-11 21:26:08.664545] W [fuse-resolve.c:132:fuse_resolve_gfid_cbk] 
0-fuse: ----0001: failed to resolve (Transport 
endpoint is not connected) 
[2017-06-11 21:26:08.664598] E [fuse-bridge.c:881:fuse_getattr_resume] 
0-glusterfs-fuse: 10: GETATTR 1 (----0001) 
resolution failed 



In this test, we are trying to kill a brick and then start it using the command line. 
I think that is what is actually failing. 
With multiplexing, can we do that? Or is there some other way of doing the same 
thing? 




What's the reason for starting the brick from the cmdline? Why can't we start the 
volume with force? 




Posted a patch : https://review.gluster.org/#/c/17508 










Ashish 



From: "Atin Mukherjee" < amukh...@redhat.com > 
To: "Ashish Pandey" < aspan...@redhat.com > 
Cc: "Gluster Devel" < gluster-devel@gluster.org > 
Sent: Monday, June 12, 2017 10:10:05 AM 
Subject: Fwd: Build failed in Jenkins: regression-test-with-multiplex #60 


https://review.gluster.org/#/c/16985/ has intro

Re: [Gluster-devel] Build failed in Jenkins: regression-test-with-multiplex #60

2017-06-11 Thread Ashish Pandey

Test is failing because of ENOTCONN. 

-+ 
[2017-06-11 21:26:04.650497] I [fuse-bridge.c:4210:fuse_init] 0-glusterfs-fuse: 
FUSE inited with protocol versions: glusterfs 7.24 kernel 7.14 
[2017-06-11 21:26:04.650546] I [fuse-bridge.c:4840:fuse_graph_sync] 0-fuse: 
switched to graph 0 
[2017-06-11 21:26:04.650890] E [fuse-bridge.c:4276:fuse_first_lookup] 0-fuse: 
first lookup on root failed (Transport endpoint is not connected) 
[2017-06-11 21:26:04.651204] W [fuse-resolve.c:132:fuse_resolve_gfid_cbk] 
0-fuse: ----0001: failed to resolve (Transport 
endpoint is not connected) 
[2017-06-11 21:26:04.651231] E [fuse-bridge.c:881:fuse_getattr_resume] 
0-glusterfs-fuse: 2: GETATTR 1 (----0001) 
resolution failed 
[2017-06-11 21:26:04.651379] W [fuse-resolve.c:132:fuse_resolve_gfid_cbk] 
0-fuse: ----0001: failed to resolve (Transport 
endpoint is not connected) 
[2017-06-11 21:26:04.651396] E [fuse-bridge.c:881:fuse_getattr_resume] 
0-glusterfs-fuse: 3: GETATTR 1 (----0001) 
resolution failed 
[2017-06-11 21:26:05.654880] W [fuse-resolve.c:132:fuse_resolve_gfid_cbk] 
0-fuse: ----0001: failed to resolve (Transport 
endpoint is not connected) 
[2017-06-11 21:26:05.654921] E [fuse-bridge.c:881:fuse_getattr_resume] 
0-glusterfs-fuse: 4: GETATTR 1 (----0001) 
resolution failed 
[2017-06-11 21:26:05.655105] W [fuse-resolve.c:132:fuse_resolve_gfid_cbk] 
0-fuse: ----0001: failed to resolve (Transport 
endpoint is not connected) 
[2017-06-11 21:26:05.655132] E [fuse-bridge.c:881:fuse_getattr_resume] 
0-glusterfs-fuse: 5: GETATTR 1 (----0001) 
resolution failed 
[2017-06-11 21:26:06.658233] W [fuse-resolve.c:132:fuse_resolve_gfid_cbk] 
0-fuse: ----0001: failed to resolve (Transport 
endpoint is not connected) 
[2017-06-11 21:26:06.658294] E [fuse-bridge.c:881:fuse_getattr_resume] 
0-glusterfs-fuse: 6: GETATTR 1 (----0001) 
resolution failed 
[2017-06-11 21:26:06.658445] W [fuse-resolve.c:132:fuse_resolve_gfid_cbk] 
0-fuse: ----0001: failed to resolve (Transport 
endpoint is not connected) 
[2017-06-11 21:26:06.658471] E [fuse-bridge.c:881:fuse_getattr_resume] 
0-glusterfs-fuse: 7: GETATTR 1 (----0001) 
resolution failed 
[2017-06-11 21:26:07.661446] W [fuse-resolve.c:132:fuse_resolve_gfid_cbk] 
0-fuse: ----0001: failed to resolve (Transport 
endpoint is not connected) 
[2017-06-11 21:26:07.661487] E [fuse-bridge.c:881:fuse_getattr_resume] 
0-glusterfs-fuse: 8: GETATTR 1 (----0001) 
resolution failed 
[2017-06-11 21:26:07.661625] W [fuse-resolve.c:132:fuse_resolve_gfid_cbk] 
0-fuse: ----0001: failed to resolve (Transport 
endpoint is not connected) 
[2017-06-11 21:26:07.661642] E [fuse-bridge.c:881:fuse_getattr_resume] 
0-glusterfs-fuse: 9: GETATTR 1 (----0001) 
resolution failed 
[2017-06-11 21:26:08.664545] W [fuse-resolve.c:132:fuse_resolve_gfid_cbk] 
0-fuse: ----0001: failed to resolve (Transport 
endpoint is not connected) 
[2017-06-11 21:26:08.664598] E [fuse-bridge.c:881:fuse_getattr_resume] 
0-glusterfs-fuse: 10: GETATTR 1 (----0001) 
resolution failed 



In this test, we are trying to kill a brick and then start it using the command line. 
I think that is what is actually failing. 
With multiplexing, can we do that? Or is there some other way of doing the same 
thing? 

Ashish 


- Original Message -

From: "Atin Mukherjee" <amukh...@redhat.com> 
To: "Ashish Pandey" <aspan...@redhat.com> 
Cc: "Gluster Devel" <gluster-devel@gluster.org> 
Sent: Monday, June 12, 2017 10:10:05 AM 
Subject: Fwd: Build failed in Jenkins: regression-test-with-multiplex #60 

https://review.gluster.org/#/c/16985/ has introduced a new test ec-data-heal.t 
which is now constantly failing with brick multiplexing. Can this be looked at? 

-- Forwarded message -- 
From: < jenk...@build.gluster.org > 
Date: Mon, Jun 12, 2017 at 6:33 AM 
Subject: Build failed in Jenkins: regression-test-with-multiplex #60 
To: maintain...@gluster.org , j...@pl.atyp.us , avish...@redhat.com , 
pkara...@redhat.com , amukh...@redhat.com , xhernan...@datalab.es , 
rgowd...@redhat.com , nde...@redhat.com 


See < 
https://build.gluster.org/job/regression-test-with-multiplex/60/display/redirect
 > 

-- 
[...truncated 747.65 KB...] 
./tests/basic/glusterd/arbiter-volume-probe.t - 14 second 
./tests/basic/gfid-access.t - 14 second 
./tests/basic/ec/ec-root-heal.t - 14 second 
./tests/basic

Re: [Gluster-devel] Performance experiments with io-stats translator

2017-06-08 Thread Ashish Pandey

Please note the fio bug https://github.com/axboe/fio/issues/376, which actually 
impacts performance in the case of EC volumes. 
I am not sure whether it is relevant in your case, but I thought I would mention it. 

Ashish 
- Original Message -

From: "Manoj Pillai"  
To: "Krutika Dhananjay"  
Cc: "Gluster Devel"  
Sent: Thursday, June 8, 2017 12:22:19 PM 
Subject: Re: [Gluster-devel] Performance experiments with io-stats translator 

Thanks. So I was suggesting a repeat of the test but this time with iodepth=1 
in the fio job. If reducing the no. of concurrent requests reduces drastically 
the high latency you're seeing from the client-side, that would strengthen the 
hypothesis than serialization/contention among concurrent requests at the n/w 
layers is the root cause here. 

-- Manoj 

On Thu, Jun 8, 2017 at 11:46 AM, Krutika Dhananjay < kdhan...@redhat.com > 
wrote: 



Hi, 

This is what my job file contains: 

[global] 
ioengine=libaio 
#unified_rw_reporting=1 
randrepeat=1 
norandommap=1 
group_reporting 
direct=1 
runtime=60 
thread 
size=16g 


[workload] 
bs=4k 
rw=randread 
iodepth=8 
numjobs=1 
file_service_type=random 
filename=/perf5/iotest/fio_5 
filename=/perf6/iotest/fio_6 
filename=/perf7/iotest/fio_7 
filename=/perf8/iotest/fio_8 

I have 3 vms reading from one mount, and each of these vms is running the above 
job in parallel. 

-Krutika 

On Tue, Jun 6, 2017 at 9:14 PM, Manoj Pillai < mpil...@redhat.com > wrote: 





On Tue, Jun 6, 2017 at 5:05 PM, Krutika Dhananjay < kdhan...@redhat.com > 
wrote: 



Hi, 

As part of identifying performance bottlenecks within gluster stack for VM 
image store use-case, I loaded io-stats at multiple points on the client and 
brick stack and ran randrd test using fio from within the hosted vms in 
parallel. 

Before I get to the results, a little bit about the configuration ... 

3 node cluster; 1x3 plain replicate volume with group virt settings, direct-io. 
3 FUSE clients, one per node in the cluster (which implies reads are served 
from the replica that is local to the client). 

io-stats was loaded at the following places: 
On the client stack: Above client-io-threads and above protocol/client-0 (the 
first child of AFR). 
On the brick stack: Below protocol/server, above and below io-threads and just 
above storage/posix. 

Based on a 60-second run of randrd test and subsequent analysis of the stats 
dumped by the individual io-stats instances, the following is what I found: 

Translator position                  Avg latency of READ fop as seen by this translator 

1. parent of client-io-threads       1666us 
       ∆ (1,2) = 50us 
2. parent of protocol/client-0       1616us 
       ∆ (2,3) = 1453us 
---- end of client stack ---- 
---- beginning of brick stack ---- 
3. child of protocol/server           163us 
       ∆ (3,4) = 7us 
4. parent of io-threads               156us 
       ∆ (4,5) = 20us 
5. child of io-threads                136us 
       ∆ (5,6) = 11us 
6. parent of storage/posix            125us 
... 
---- end of brick stack ---- 

So it seems like the biggest bottleneck here is a combination of the network + 
epoll, rpc layer? 
I must admit I am no expert with networks, but I'm assuming if the client is 
reading from the local brick, then 
even latency contribution from the actual network won't be much, in which case 
bulk of the latency is coming from epoll, rpc layer, etc at both client and 
brick end? Please correct me if I'm wrong. 

I will, of course, do some more runs and confirm if the pattern is consistent. 

-Krutika 





Really interesting numbers! How many concurrent requests are in flight in this 
test? Could you post the fio job? I'm wondering if/how these latency numbers 
change if you reduce the number of concurrent requests. 

-- Manoj 










___ 
Gluster-devel mailing list 
Gluster-devel@gluster.org 
http://lists.gluster.org/mailman/listinfo/gluster-devel 

___
Gluster-devel mailing list
Gluster-devel@gluster.org
http://lists.gluster.org/mailman/listinfo/gluster-devel

Re: [Gluster-devel] EC Healing Algorithm

2017-04-06 Thread Ashish Pandey

If the data was written on the minimum number of bricks, heal will take place on 
the failed brick only. 
Data will be read from the good bricks, encoding will happen, and only the 
fragment on the failed brick will be written. 
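
A rough way to observe this on a small test setup (the volume name and brick 
paths below are just placeholders) is to checksum the fragment file on each 
brick before and after the heal: 

  md5sum /bricks/b{0..5}/dir/file          # fragment on the previously failed brick is stale 
  gluster volume heal testvol              # trigger heal (or wait for the self-heal daemon) 
  md5sum /bricks/b{0..5}/dir/file          # only the fragment on the failed brick should have changed 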

- Original Message -

From: "jayakrishnan mm"  
To: "Gluster Devel"  
Sent: Thursday, April 6, 2017 2:21:26 PM 
Subject: [Gluster-devel] EC Healing Algorithm 

Hi 

I am using GlusterFS 3.7.15. 
What type of algorithm is used in EC healing? I mean, if a brick fails during 
writing and it comes back online later, will all the bricks be 
re-written, or is only the failed brick written with the new data? 

Best regards 
JK 


___ 
Gluster-devel mailing list 
Gluster-devel@gluster.org 
http://lists.gluster.org/mailman/listinfo/gluster-devel 

___
Gluster-devel mailing list
Gluster-devel@gluster.org
http://lists.gluster.org/mailman/listinfo/gluster-devel

Re: [Gluster-devel] [Gluster-users] Proposal to deprecate replace-brick for "distribute only" volumes

2017-03-16 Thread Ashish Pandey


- Original Message -

From: "Atin Mukherjee"  
To: "Raghavendra Talur" , gluster-devel@gluster.org, 
gluster-us...@gluster.org 
Sent: Thursday, March 16, 2017 4:22:41 PM 
Subject: Re: [Gluster-devel] [Gluster-users] Proposal to deprecate 
replace-brick for "distribute only" volumes 

Makes sense. 

On Thu, 16 Mar 2017 at 06:51, Raghavendra Talur < rta...@redhat.com > wrote: 


Hi, 

In the last few releases, we have changed replace-brick command such 
that it can be called only with "commit force" option. When invoked, 
this is what happens to the volume: 

a. distribute only volume: the given brick is replaced with an empty 
brick, with a 100% probability of data loss. 
b. distribute-replicate: the given brick is replaced with an empty 
brick and self-heal is triggered. If the admin is wise enough to monitor 
self-heal status before issuing another replace-brick command, data is safe. 
c. distribute-disperse: same as above in distribute-replicate 

My proposal is to fully deprecate replace-brick command for 
"distribute only" volumes. It should print out a error "The right way 
to replace brick for distribute only volume is to add brick, wait for 
rebalance to complete and remove brick" and return a "-1". 




It makes sense. 
I just don't see any use of add-brick before remove-brick, except that it 
helps keep the overall storage capacity of the volume intact. 
What is the guarantee that the files on the brick which we want to replace 
would migrate to the added brick? 

If the brick we want to replace is healthy and we just want to replace it, then 
perhaps we should provide a command to copy its files to the new brick and then 
remove the old brick. 
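
For reference, the "add brick, wait for rebalance, remove brick" sequence 
suggested above would look roughly like this (volume and brick names are only 
placeholders): 

  gluster volume add-brick testvol server5:/bricks/new1 
  gluster volume rebalance testvol start 
  gluster volume rebalance testvol status                         # wait until it reports completed 
  gluster volume remove-brick testvol server1:/bricks/old1 start 
  gluster volume remove-brick testvol server1:/bricks/old1 status # wait until migration completes 
  gluster volume remove-brick testvol server1:/bricks/old1 commit 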




Thoughts? 

Thanks, 
Raghavendra Talur 
___ 
Gluster-users mailing list 
gluster-us...@gluster.org 
http://lists.gluster.org/mailman/listinfo/gluster-users 



-- 
--Atin 

___ 
Gluster-devel mailing list 
Gluster-devel@gluster.org 
http://lists.gluster.org/mailman/listinfo/gluster-devel 

___
Gluster-devel mailing list
Gluster-devel@gluster.org
http://lists.gluster.org/mailman/listinfo/gluster-devel

Re: [Gluster-devel] Spurious regression failure? tests/basic/ec/ec-background-heals.t

2017-01-26 Thread Ashish Pandey

Xavi, 

shd has been disabled in this test on line number 12, and we have also disabled 
client-side heal. 
So, nobody is going to try to heal it. 

Ashish 

- Original Message -

From: "Atin Mukherjee" <amukh...@redhat.com> 
To: "Ashish Pandey" <aspan...@redhat.com>, "Raghavendra Gowdappa" 
<rgowd...@redhat.com>, "Xavier Hernandez" <xhernan...@datalab.es> 
Cc: "Gluster Devel" <gluster-devel@gluster.org> 
Sent: Thursday, January 26, 2017 5:50:00 PM 
Subject: Re: [Gluster-devel] Spurious regression failure? 
tests/basic/ec/ec-background-heals.t 

I've +1ed it now. 

On Thu, 26 Jan 2017 at 15:05, Xavier Hernandez < xhernan...@datalab.es > wrote: 


Hi Atin, 

I don't clearly see what's the problem. Even if the truncate causes a 
dirty flag to be set, eventually it should be removed before the 
$HEAL_TIMEOUT value. 

For now I've marked the test as bad. 

Patch is: https://review.gluster.org/16470 

Xavi 

On 25/01/17 17:24, Atin Mukherjee wrote: 
> Can we please address this as early as possible, my patch has hit this 
> failure 3 out of 4 recheck attempts now. I'm guessing some recent 
> changes has caused it. 
> 
> On Wed, 25 Jan 2017 at 12:10, Ashish Pandey < aspan...@redhat.com 
> > wrote: 
> 
> 
> Pranith, 
> 
> In this test tests/basic/ec/ec-background-heals.t, I think the line 
> number 86 actually creating a heal entry instead of 
> helping data heal quickly. What if all the data was already healed 
> at that moment, truncate came and in preop set the dirty flag and at the 
> end, as part of the heal, dirty flag was unset on previous good 
> bricks only and the brick which acted as heal-sink still has dirty 
> marked by truncate. 
> That is why we are only seeing "1" as get_pending_heal_count. If a 
> file was actually not healed it should be "2". 
> If heal on this file completes and unset of dirty flag happens 
> before truncate everything will be fine. 
> 
> I think we can wait for file to be heal without truncate? 
> 
> 71 #Test that disabling background-heals still drains the queue 
> 72 TEST $CLI volume set $V0 disperse.background-heals 1 
> 73 TEST touch $M0/{a,b,c,d} 
> 74 TEST kill_brick $V0 $H0 $B0/${V0}2 
> 75 EXPECT_WITHIN $CONFIG_UPDATE_TIMEOUT "1" mount_get_option_value 
> $M0 $V0-disperse-0 background-heals 
> 76 EXPECT_WITHIN $CONFIG_UPDATE_TIMEOUT "200" 
> mount_get_option_value $M0 $V0-disperse-0 heal-wait-qlength 
> 77 TEST truncate -s 1GB $M0/a 
> 78 echo abc > $M0/b 
> 79 echo abc > $M0/c 
> 80 echo abc > $M0/d 
> 81 TEST $CLI volume start $V0 force 
> 82 EXPECT_WITHIN $CHILD_UP_TIMEOUT "3" ec_child_up_count $V0 0 
> 83 TEST chown root:root $M0/{a,b,c,d} 
> 84 TEST $CLI volume set $V0 disperse.background-heals 0 
> 85 EXPECT_NOT "0" mount_get_option_value $M0 $V0-disperse-0 
> heal-waiters 
> 
> 86 TEST truncate -s 0 $M0/a # This completes the heal fast ;-) <<<<<<< 
> 
> 87 EXPECT_WITHIN $HEAL_TIMEOUT "^0$" get_pending_heal_count $V0 
> 
> ---- 
> Ashish 
> 
> 
> 
> 
> 
>  
> *From: *"Raghavendra Gowdappa" < rgowd...@redhat.com 
> > 
> *To: *"Nithya Balachandran" < nbala...@redhat.com 
> > 
> *Cc: *"Gluster Devel" < gluster-devel@gluster.org 
> >, "Pranith Kumar Karampuri" 
> < pkara...@redhat.com >, "Ashish Pandey" 
> < aspan...@redhat.com > 
> *Sent: *Wednesday, January 25, 2017 9:41:38 AM 
> *Subject: *Re: [Gluster-devel] Spurious regression 
> failure? tests/basic/ec/ec-background-heals.t 
> 
> 
> Found another failure on same test: 
> https://build.gluster.org/job/centos6-regression/2874/consoleFull 
> 
> - Original Message - 
> > From: "Nithya Balachandran" < nbala...@redhat.com 
> > 
> > To: "Gluster Devel" < gluster-devel@gluster.org 
> >, "Pranith Kumar Karampuri" 
> < pkara...@redhat.com >, "Ashish Pandey" 
> > < aspan...@redhat.com > 
> > Sent: Tuesday, January 24, 2017 9:16:31 AM 
> > Subject: [Gluster-devel] Spurious regression 
> failure? tests/basic/ec/ec-background-heals.t 
> > 
> > Hi, 
> > 
> > 
> > Can you please take a look at 
> > https://build.gluster.org/job/centos6-regression/2859/console ? 
> > 
> > tests/basic/ec/ec-background-heals.t has failed. 
> > 
> > Thanks, 
> > Nithya 
> > 
> > ___ 
> > Gluster-devel mailing list 
> > Gluster-devel@gluster.org  
> > http://lists.gluster.org/mailman/listinfo/gluster-devel 
> ___ 
> 
> Gluster-devel mailing list 
> 
> Gluster-devel@gluster.org  
> 
> http://lists.gluster.org/mailman/listinfo/gluster-devel 
> 
> -- 
> - Atin (atinm) 
> 
> 
> ___ 
> Gluster-devel mailing list 
> Gluster-devel@gluster.org 
> http://lists.gluster.org/mailman/listinfo/gluster-devel 
> 




-- 
- Atin (atinm) 

___
Gluster-devel mailing list
Gluster-devel@gluster.org
http://lists.gluster.org/mailman/listinfo/gluster-devel

Re: [Gluster-devel] Spurious regression failure? tests/basic/ec/ec-background-heals.t

2017-01-24 Thread Ashish Pandey

Pranith, 

In this test, tests/basic/ec/ec-background-heals.t, I think line number 86 is 
actually creating a heal entry instead of helping the data heal quickly. 
What if all the data was already healed at that moment? The truncate came and, in 
the pre-op, set the dirty flag; at the end, as part of the heal, the dirty flag was 
unset on the previously good bricks only, and the brick which acted as heal-sink 
still has the dirty flag set by the truncate. 
That is why we are only seeing "1" as get_pending_heal_count. If a file was 
actually not healed, it should be "2". 
If the heal on this file completes and the unset of the dirty flag happens before 
the truncate, everything will be fine. 

I think we can wait for the file to heal without the truncate? 

71 #Test that disabling background-heals still drains the queue 
72 TEST $CLI volume set $V0 disperse.background-heals 1 
73 TEST touch $M0/{a,b,c,d} 
74 TEST kill_brick $V0 $H0 $B0/${V0}2 
75 EXPECT_WITHIN $CONFIG_UPDATE_TIMEOUT "1" mount_get_option_value $M0 
$V0-disperse-0 background-heals 
76 EXPECT_WITHIN $CONFIG_UPDATE_TIMEOUT "200" mount_get_option_value $M0 
$V0-disperse-0 heal-wait-qlength 
77 TEST truncate -s 1GB $M0/a 
78 echo abc > $M0/b 
79 echo abc > $M0/c 
80 echo abc > $M0/d 
81 TEST $CLI volume start $V0 force 
82 EXPECT_WITHIN $CHILD_UP_TIMEOUT "3" ec_child_up_count $V0 0 
83 TEST chown root:root $M0/{a,b,c,d} 
84 TEST $CLI volume set $V0 disperse.background-heals 0 
85 EXPECT_NOT "0" mount_get_option_value $M0 $V0-disperse-0 heal-waiters 

86 TEST truncate -s 0 $M0/a # This completes the heal fast ;-) <<<<<<< 

87 EXPECT_WITHIN $HEAL_TIMEOUT "^0$" get_pending_heal_count $V0 

 
Ashish 





- Original Message -

From: "Raghavendra Gowdappa" <rgowd...@redhat.com> 
To: "Nithya Balachandran" <nbala...@redhat.com> 
Cc: "Gluster Devel" <gluster-devel@gluster.org>, "Pranith Kumar Karampuri" 
<pkara...@redhat.com>, "Ashish Pandey" <aspan...@redhat.com> 
Sent: Wednesday, January 25, 2017 9:41:38 AM 
Subject: Re: [Gluster-devel] Spurious regression failure? 
tests/basic/ec/ec-background-heals.t 

Found another failure on same test: 
https://build.gluster.org/job/centos6-regression/2874/consoleFull 

- Original Message - 
> From: "Nithya Balachandran" <nbala...@redhat.com> 
> To: "Gluster Devel" <gluster-devel@gluster.org>, "Pranith Kumar Karampuri" 
> <pkara...@redhat.com>, "Ashish Pandey" 
> <aspan...@redhat.com> 
> Sent: Tuesday, January 24, 2017 9:16:31 AM 
> Subject: [Gluster-devel] Spurious regression failure? 
> tests/basic/ec/ec-background-heals.t 
> 
> Hi, 
> 
> 
> Can you please take a look at 
> https://build.gluster.org/job/centos6-regression/2859/console ? 
> 
> tests/basic/ec/ec-background-heals.t has failed. 
> 
> Thanks, 
> Nithya 
> 
> ___ 
> Gluster-devel mailing list 
> Gluster-devel@gluster.org 
> http://lists.gluster.org/mailman/listinfo/gluster-devel 

___
Gluster-devel mailing list
Gluster-devel@gluster.org
http://lists.gluster.org/mailman/listinfo/gluster-devel

Re: [Gluster-devel] [Gluster-users] Error being logged in disperse volumes

2016-12-20 Thread Ashish Pandey

That means EC is not getting the correct trusted.ec.config xattr from the minimum 
number of bricks. 

1 - Did you see any error on the client side while accessing any file? 
2 - If yes, check the xattrs of such files on all the bricks. 

This is too little information to find the cause. If [1] is true, then please share 
all the client logs and the getxattr output from all the bricks for every file 
giving the error, 
plus gluster v status, gluster v info and also gluster volume heal info. 

Is there anything you changed recently on the volume? 
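
If you do see errors on the client side, a quick way to inspect the EC xattrs of 
an affected file directly on the bricks is (the brick path is just an example): 

  getfattr -d -m. -e hex /bricks/b0/path/to/affected-file 
  # compare trusted.ec.config, trusted.ec.version and trusted.ec.size across all the bricks 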

Ashish 



- Original Message -

From: "Ankireddypalle Reddy"  
To: gluster-us...@gluster.org, "Gluster Devel (gluster-devel@gluster.org)" 
 
Sent: Tuesday, December 20, 2016 7:42:29 PM 
Subject: [Gluster-users] Error being logged in disperse volumes 



Hi, 

I am seeing many instances of the following error in the log files. What does 
this signify. 



[2016-12-19 08:14:04.988004] E [MSGID: 122001] 
[ec-common.c:872:ec_config_check] 0-StoragePool-disperse-1: Invalid or 
corrupted config [Invalid argument] 

[2016-12-19 08:14:04.988027] E [MSGID: 122066] 
[ec-common.c:969:ec_prepare_update_cbk] 0-StoragePool-disperse-1: Invalid 
config xattr [Invalid argument] 

[2016-12-19 08:14:04.988038] E [MSGID: 122001] 
[ec-common.c:872:ec_config_check] 0-StoragePool-disperse-0: Invalid or 
corrupted config [Invalid argument] 

[2016-12-19 08:14:04.988055] E [MSGID: 122066] 
[ec-common.c:969:ec_prepare_update_cbk] 0-StoragePool-disperse-0: Invalid 
config xattr [Invalid argument] 

[2016-12-19 08:14:04.988179] E [MSGID: 122001] 
[ec-common.c:872:ec_config_check] 0-StoragePool-disperse-3: Invalid or 
corrupted config [Invalid argument] 

[2016-12-19 08:14:04.988193] E [MSGID: 122066] 
[ec-common.c:969:ec_prepare_update_cbk] 0-StoragePool-disperse-3: Invalid 
config xattr [Invalid argument] 

[2016-12-19 08:14:04.988228] E [MSGID: 122001] 
[ec-common.c:872:ec_config_check] 0-StoragePool-disperse-2: Invalid or 
corrupted config [Invalid argument] 

[2016-12-19 08:14:04.988248] E [MSGID: 122066] 
[ec-common.c:969:ec_prepare_update_cbk] 0-StoragePool-disperse-2: Invalid 
config xattr [Invalid argument] 

[2016-12-19 08:14:04.988338] E [MSGID: 122001] 
[ec-common.c:872:ec_config_check] 0-StoragePool-disperse-4: Invalid or 
corrupted config [Invalid argument] 

[2016-12-19 08:14:04.988350] E [MSGID: 122066] 
[ec-common.c:969:ec_prepare_update_cbk] 0-StoragePool-disperse-4: Invalid 
config xattr [Invalid argument] 

[2016-12-19 08:14:04.988374] E [MSGID: 122001] 
[ec-common.c:872:ec_config_check] 0-StoragePool-disperse-5: Invalid or 
corrupted config [Invalid argument] 

[2016-12-19 08:14:04.988388] E [MSGID: 122066] 
[ec-common.c:969:ec_prepare_update_cbk] 0-StoragePool-disperse-5: Invalid 
config xattr [Invalid argument] 

[2016-12-19 08:14:04.988460] E [MSGID: 122001] 
[ec-common.c:872:ec_config_check] 0-StoragePool-disperse-7: Invalid or 
corrupted config [Invalid argument] 

[2016-12-19 08:14:04.988478] E [MSGID: 122066] 
[ec-common.c:969:ec_prepare_update_cbk] 0-StoragePool-disperse-7: Invalid 
config xattr [Invalid argument] 

[2016-12-19 08:14:05.508034] E [MSGID: 122001] 
[ec-common.c:872:ec_config_check] 0-StoragePool-disperse-6: Invalid or 
corrupted config [Invalid argument] 

[2016-12-19 08:14:05.508072] E [MSGID: 122066] 
[ec-common.c:969:ec_prepare_update_cbk] 0-StoragePool-disperse-6: Invalid 
config xattr [Invalid argument] 



Thanks and Regards, 

Ram 
***Legal Disclaimer*** 
"This communication may contain confidential and privileged material for the 
sole use of the intended recipient. Any unauthorized review, use or 
distribution 
by others is strictly prohibited. If you have received the message by mistake, 
please advise the sender by reply email and delete the message. Thank you." 
** 

___ 
Gluster-users mailing list 
gluster-us...@gluster.org 
http://www.gluster.org/mailman/listinfo/gluster-users 

___
Gluster-devel mailing list
Gluster-devel@gluster.org
http://www.gluster.org/mailman/listinfo/gluster-devel

[Gluster-devel] 1402538 : Assertion failure during rebalance of symbolic links

2016-12-13 Thread Ashish Pandey
Hi All, 

We have been seeing an issue where rebalancing symbolic links leads to an 
assertion failure in an EC volume. 

The root cause is that while migrating symbolic links to another 
subvolume, rebalance creates a link file (with the "T" attribute). 
This link file is a regular file. 
Now, during migration a setattr comes for this link and, because of a possible 
race, posix_stat returns the stats of this "T" file. 
In ec_manager_setattr, we receive the callbacks and check the type of the entry. If 
it is a regular file we try to get its size, and if it is not there, we raise an 
assert. 
So, basically we are checking the size of the link (which will not have a size) 
that has been returned as a regular file, and we end up asserting when this 
condition becomes TRUE. 

Now, this looks like a problem with rebalance and is difficult to fix at this 
point (as per the discussion). 
We have an alternative to fix it in EC, but that would be more of a hack than 
an actual fix. We should not modify EC 
to deal with an individual issue which is in another translator. 

Now the question is how to proceed with this. Any suggestions? 

Details on this bug can be found here - 
https://bugzilla.redhat.com/show_bug.cgi?id=1402538 

 
Ashish 



___
Gluster-devel mailing list
Gluster-devel@gluster.org
http://www.gluster.org/mailman/listinfo/gluster-devel

[Gluster-devel] EC volume: Bug caused by race condition during rmdir and inodelk

2016-11-24 Thread Ashish Pandey
Hi All, 

On EC volumes, we have been seeing an interesting bug caused by a fine race 
between rmdir and inodelk which leads to an EIO error. 
Pranith, Xavi and I had a discussion on this and have some possible solutions. 
Your inputs are required on this bug and its possible solution. 

1 - Consider an rmdir on /a/b and a chown on /a/b from 2 different clients/processes. 
rmdir /a/b takes a lock on "a" and deletes "b". 
However, chown /a/b will take a lock on "b" to do the setattr fop. Now, in the case of 
a (4+2) EC volume, inodelk might get ENOENT from 3 bricks (if rmdir /a/b succeeded 
on these 3 bricks) and 
might get locks from the other 3 bricks. 

As an operation should be successful on at least 4 bricks, it will throw EIO 
for the chown. 

This can be solved on the EC side while processing callbacks: based on the errors 
we can decide which error should be passed on. In the above case, sending 
ENOENT would be safer. 
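
A small sketch of the quorum arithmetic behind this (numbers are purely 
illustrative for a 4+2 configuration): 

  MIN=4                                    # an answer must come from at least 4 bricks 
  LOCKED=3                                 # bricks where the inodelk on "b" succeeded 
  ENOENT=3                                 # bricks where rmdir already removed "b" 
  if [ "$LOCKED" -lt "$MIN" ] && [ "$ENOENT" -lt "$MIN" ]; then 
      echo "no single answer reaches the quorum of $MIN bricks, so EC returns EIO today" 
  fi 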

2 - rmdir /a/b and rmdir /a/b/c come from 2 different clients/processes. 
Now, suppose "c" has already been deleted by some other process, so rmdir /a/b would 
succeed. 
At this point, it is possible that /a/b has been deleted and the inode for "b" 
has been purged on 3 bricks by the time the inodelk on "b" arrives for rmdir /a/b/c. 
It will fail on 3 bricks and get locks on the other 3. In this case again, we 
get EIO. 

To solve this, it was suggested to take locks on the parent as well as on the entry 
which is to be deleted. So in the above case, when we do rmdir /a/b/c we will take 
locks 
on both "b" and "c". For rmdir /a/b we will take locks on "a" and "b". This will 
certainly impact performance, but at the moment this looks like a feasible solution. 

 
Ashish 




___
Gluster-devel mailing list
Gluster-devel@gluster.org
http://www.gluster.org/mailman/listinfo/gluster-devel

[Gluster-devel] Review request for EC - set/unset dirty flag for data/metadata update

2016-09-07 Thread Ashish Pandey
Hi, 

Please review the following patch for EC- 
http://review.gluster.org/#/c/13733/ 

Ashish 
___
Gluster-devel mailing list
Gluster-devel@gluster.org
http://www.gluster.org/mailman/listinfo/gluster-devel

Re: [Gluster-devel] Compilation failed on latest gluster

2016-08-25 Thread Ashish Pandey

As Susant and Atin suggested, I cleaned everything and did the installation from 
scratch, and it is working now. 
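
For anyone hitting the same thing, a full rebuild from a clean tree looks roughly 
like this (illustrative; stale generated headers are a likely culprit here, so 
removing generated files is the important part): 

  git clean -xfd        # drop all generated and stale files 
  ./autogen.sh 
  ./configure 
  make && make install 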



- Original Message -

From: "Nigel Babu" <nig...@redhat.com> 
To: "Ashish Pandey" <aspan...@redhat.com> 
Cc: "Manikandan Selvaganesh" <mselv...@redhat.com>, "Gluster Devel" 
<gluster-devel@gluster.org> 
Sent: Thursday, August 25, 2016 11:38:46 AM 
Subject: Re: [Gluster-devel] Compilation failed on latest gluster 

Are you using something that's not Centos, NetBSD, or FreeBSD? 

I'm curious how we managed to slip a build failure despite our smoke tests. 

On Thu, Aug 25, 2016 at 11:19 AM, Ashish Pandey < aspan...@redhat.com > wrote: 



Hi, 

I am trying to build the latest code on my laptop and it is giving a compilation 
error - 

CC cli-rl.o 
CC cli-cmd-global.o 
CC cli-cmd-volume.o 
cli-cmd-volume.c: In function ‘cli_cmd_quota_cbk’: 
cli-cmd-volume.c:1712:35: error: ‘EVENT_QUOTA_ENABLE’ undeclared (first use in 
this function) 
gf_event (EVENT_QUOTA_ENABLE, "volume=%s", volname); 
^ 
cli-cmd-volume.c:1712:35: note: each undeclared identifier is reported only 
once for each function it appears in 
cli-cmd-volume.c:1715:35: error: ‘EVENT_QUOTA_DISABLE’ undeclared (first use in 
this function) 
gf_event (EVENT_QUOTA_DISABLE, "volume=%s", volname); 
^ 
cli-cmd-volume.c:1718:35: error: ‘EVENT_QUOTA_SET_USAGE_LIMIT’ undeclared 
(first use in this function) 
gf_event (EVENT_QUOTA_SET_USAGE_LIMIT, "volume=%s;" 
^ 
cli-cmd-volume.c:1723:35: error: ‘EVENT_QUOTA_SET_OBJECTS_LIMIT’ undeclared 
(first use in this function) 
gf_event (EVENT_QUOTA_SET_OBJECTS_LIMIT, "volume=%s;" 
^ 
cli-cmd-volume.c:1728:35: error: ‘EVENT_QUOTA_REMOVE_USAGE_LIMIT’ undeclared 
(first use in this function) 
gf_event (EVENT_QUOTA_REMOVE_USAGE_LIMIT, "volume=%s;" 
^ 
cli-cmd-volume.c:1732:35: error: ‘EVENT_QUOTA_REMOVE_OBJECTS_LIMIT’ undeclared 
(first use in this function) 
gf_event (EVENT_QUOTA_REMOVE_OBJECTS_LIMIT, 
^ 
cli-cmd-volume.c:1736:35: error: ‘EVENT_QUOTA_ALERT_TIME’ undeclared (first use 
in this function) 
gf_event (EVENT_QUOTA_ALERT_TIME, "volume=%s;time=%s", 
^ 
cli-cmd-volume.c:1740:35: error: ‘EVENT_QUOTA_SOFT_TIMEOUT’ undeclared (first 
use in this function) 
gf_event (EVENT_QUOTA_SOFT_TIMEOUT, "volume=%s;" 
^ 
cli-cmd-volume.c:1744:35: error: ‘EVENT_QUOTA_HARD_TIMEOUT’ undeclared (first 
use in this function) 
gf_event (EVENT_QUOTA_HARD_TIMEOUT, "volume=%s;" 
^ 
cli-cmd-volume.c:1748:35: error: ‘EVENT_QUOTA_DEFAULT_SOFT_LIMIT’ undeclared 
(first use in this function) 
gf_event (EVENT_QUOTA_DEFAULT_SOFT_LIMIT, "volume=%s;" 
^ 
Makefile:539: recipe for target 'cli-cmd-volume.o' failed 

If I roll back 4 patches and then compile, it works. 
I suspect that http://review.gluster.org/15230 is causing this. 
Could you please look into this? 
Do I need to do something to make it work? 

Ashish 






___ 
Gluster-devel mailing list 
Gluster-devel@gluster.org 
http://www.gluster.org/mailman/listinfo/gluster-devel 






-- 
nigelb 

___
Gluster-devel mailing list
Gluster-devel@gluster.org
http://www.gluster.org/mailman/listinfo/gluster-devel

[Gluster-devel] Compilation failed on latest gluster

2016-08-24 Thread Ashish Pandey
Hi, 

I am trying to build the latest code on my laptop and it is giving a compilation 
error - 

CC cli-rl.o 
CC cli-cmd-global.o 
CC cli-cmd-volume.o 
cli-cmd-volume.c: In function ‘cli_cmd_quota_cbk’: 
cli-cmd-volume.c:1712:35: error: ‘EVENT_QUOTA_ENABLE’ undeclared (first use in 
this function) 
gf_event (EVENT_QUOTA_ENABLE, "volume=%s", volname); 
^ 
cli-cmd-volume.c:1712:35: note: each undeclared identifier is reported only 
once for each function it appears in 
cli-cmd-volume.c:1715:35: error: ‘EVENT_QUOTA_DISABLE’ undeclared (first use in 
this function) 
gf_event (EVENT_QUOTA_DISABLE, "volume=%s", volname); 
^ 
cli-cmd-volume.c:1718:35: error: ‘EVENT_QUOTA_SET_USAGE_LIMIT’ undeclared 
(first use in this function) 
gf_event (EVENT_QUOTA_SET_USAGE_LIMIT, "volume=%s;" 
^ 
cli-cmd-volume.c:1723:35: error: ‘EVENT_QUOTA_SET_OBJECTS_LIMIT’ undeclared 
(first use in this function) 
gf_event (EVENT_QUOTA_SET_OBJECTS_LIMIT, "volume=%s;" 
^ 
cli-cmd-volume.c:1728:35: error: ‘EVENT_QUOTA_REMOVE_USAGE_LIMIT’ undeclared 
(first use in this function) 
gf_event (EVENT_QUOTA_REMOVE_USAGE_LIMIT, "volume=%s;" 
^ 
cli-cmd-volume.c:1732:35: error: ‘EVENT_QUOTA_REMOVE_OBJECTS_LIMIT’ undeclared 
(first use in this function) 
gf_event (EVENT_QUOTA_REMOVE_OBJECTS_LIMIT, 
^ 
cli-cmd-volume.c:1736:35: error: ‘EVENT_QUOTA_ALERT_TIME’ undeclared (first use 
in this function) 
gf_event (EVENT_QUOTA_ALERT_TIME, "volume=%s;time=%s", 
^ 
cli-cmd-volume.c:1740:35: error: ‘EVENT_QUOTA_SOFT_TIMEOUT’ undeclared (first 
use in this function) 
gf_event (EVENT_QUOTA_SOFT_TIMEOUT, "volume=%s;" 
^ 
cli-cmd-volume.c:1744:35: error: ‘EVENT_QUOTA_HARD_TIMEOUT’ undeclared (first 
use in this function) 
gf_event (EVENT_QUOTA_HARD_TIMEOUT, "volume=%s;" 
^ 
cli-cmd-volume.c:1748:35: error: ‘EVENT_QUOTA_DEFAULT_SOFT_LIMIT’ undeclared 
(first use in this function) 
gf_event (EVENT_QUOTA_DEFAULT_SOFT_LIMIT, "volume=%s;" 
^ 
Makefile:539: recipe for target 'cli-cmd-volume.o' failed 

If I roll back 4 patches and then compile, it works. 
I suspect that http://review.gluster.org/15230 is causing this. 
Could you please look into this? 
Do I need to do something to make it work? 

Ashish 





___
Gluster-devel mailing list
Gluster-devel@gluster.org
http://www.gluster.org/mailman/listinfo/gluster-devel

[Gluster-devel] Patch Review

2016-06-06 Thread Ashish Pandey
Hi All, 

I have modified the volume file generation code to support the decompounder 
translator. 
Please review this patch and provide your comments/suggestions. 
http://review.gluster.org/#/c/13968/ 

Ashish 
___
Gluster-devel mailing list
Gluster-devel@gluster.org
http://www.gluster.org/mailman/listinfo/gluster-devel

Re: [Gluster-devel] Regression-test-burn-in crash in EC test

2016-04-29 Thread Ashish Pandey

Hi Jeff, 

Where can we find the core dump? 

--- 
Ashish 

- Original Message -

From: "Pranith Kumar Karampuri" <pkara...@redhat.com> 
To: "Jeff Darcy" <jda...@redhat.com> 
Cc: "Gluster Devel" <gluster-devel@gluster.org>, "Ashish Pandey" 
<aspan...@redhat.com> 
Sent: Thursday, April 28, 2016 11:58:54 AM 
Subject: Re: [Gluster-devel] Regression-test-burn-in crash in EC test 

Ashish, 
Could you take a look at this? 

Pranith 

- Original Message - 
> From: "Jeff Darcy" <jda...@redhat.com> 
> To: "Gluster Devel" <gluster-devel@gluster.org> 
> Sent: Wednesday, April 27, 2016 11:31:25 PM 
> Subject: [Gluster-devel] Regression-test-burn-in crash in EC test 
> 
> One of the "rewards" of reviewing and merging people's patches is getting 
> email if the next regression-test-burn-in should fail - even if it fails for 
> a completely unrelated reason. Today I got one that's not among the usual 
> suspects. The failure was a core dump in tests/bugs/disperse/bug-1304988.t, 
> weighing in at a respectable 42 frames. 
> 
> #0 0x7fef25976cb9 in dht_rename_lock_cbk 
> #1 0x7fef25955f62 in dht_inodelk_done 
> #2 0x7fef25957352 in dht_blocking_inodelk_cbk 
> #3 0x7fef32e02f8f in default_inodelk_cbk 
> #4 0x7fef25c029a3 in ec_manager_inodelk 
> #5 0x7fef25bf9802 in __ec_manager 
> #6 0x7fef25bf990c in ec_manager 
> #7 0x7fef25c03038 in ec_inodelk 
> #8 0x7fef25bee7ad in ec_gf_inodelk 
> #9 0x7fef25957758 in dht_blocking_inodelk_rec 
> #10 0x7fef25957b2d in dht_blocking_inodelk 
> #11 0x7fef2597713f in dht_rename_lock 
> #12 0x7fef25977835 in dht_rename 
> #13 0x7fef32e0f032 in default_rename 
> #14 0x7fef32e0f032 in default_rename 
> #15 0x7fef32e0f032 in default_rename 
> #16 0x7fef32e0f032 in default_rename 
> #17 0x7fef32e0f032 in default_rename 
> #18 0x7fef32e07c29 in default_rename_resume 
> #19 0x7fef32d8ed40 in call_resume_wind 
> #20 0x7fef32d98b2f in call_resume 
> #21 0x7fef24cfc568 in open_and_resume 
> #22 0x7fef24cffb99 in ob_rename 
> #23 0x7fef24aee482 in mdc_rename 
> #24 0x7fef248d68e5 in io_stats_rename 
> #25 0x7fef32e0f032 in default_rename 
> #26 0x7fef2ab1b2b9 in fuse_rename_resume 
> #27 0x7fef2ab12c47 in fuse_fop_resume 
> #28 0x7fef2ab107cc in fuse_resolve_done 
> #29 0x7fef2ab108a2 in fuse_resolve_all 
> #30 0x7fef2ab10900 in fuse_resolve_continue 
> #31 0x7fef2ab0fb7c in fuse_resolve_parent 
> #32 0x7fef2ab1077d in fuse_resolve 
> #33 0x7fef2ab10879 in fuse_resolve_all 
> #34 0x7fef2ab10900 in fuse_resolve_continue 
> #35 0x7fef2ab0fb7c in fuse_resolve_parent 
> #36 0x7fef2ab1077d in fuse_resolve 
> #37 0x7fef2ab10824 in fuse_resolve_all 
> #38 0x7fef2ab1093e in fuse_resolve_and_resume 
> #39 0x7fef2ab1b40e in fuse_rename 
> #40 0x7fef2ab2a96a in fuse_thread_proc 
> #41 0x7fef3204daa1 in start_thread 
> 
> In other words we started at FUSE, went through a bunch of performance 
> translators, through DHT to EC, and then crashed on the way back. It seems 
> a little odd that we turn the fop around immediately in EC, and that we have 
> default_inodelk_cbk at frame 3. Could one of the DHT or EC people please 
> take a look at it? Thanks! 
> 
> 
> https://build.gluster.org/job/regression-test-burn-in/868/console 
> ___ 
> Gluster-devel mailing list 
> Gluster-devel@gluster.org 
> http://www.gluster.org/mailman/listinfo/gluster-devel 
> 

___
Gluster-devel mailing list
Gluster-devel@gluster.org
http://www.gluster.org/mailman/listinfo/gluster-devel

Re: [Gluster-devel] [Gluster-users] Assertion failed: ec_get_inode_size

2016-04-19 Thread Ashish Pandey
Hi Serkan, 

I have gone through the logs and can see there are some blocked inode lock 
requests. 
We have observed that some other users have also faced this issue with similar 
logs. 
I think you have tried a rolling update on your setup, or some nodes on 
which you collected these statedumps must have gone down for one reason or 
another. 

We will dig into this further and try to find the root cause. Till then, you 
can resolve this issue by restarting the volume, which will restart nfs and 
shd and release any locks taken by these processes. 

"gluster volume start <volname> force" will do the same. 

Regards, 
Ashish 


- Original Message -

From: "Serkan Çoban" <cobanser...@gmail.com> 
To: "Ashish Pandey" <aspan...@redhat.com> 
Cc: "Gluster Users" <gluster-us...@gluster.org>, "Gluster Devel" 
<gluster-devel@gluster.org> 
Sent: Monday, April 18, 2016 11:51:37 AM 
Subject: Re: [Gluster-users] Assertion failed: ec_get_inode_size 

You can find the statedumps of server and client in below link. 
Gluster version is 3.7.10, 78x(16+4) disperse setup. 60 nodes named 
node185..node244 
https://www.dropbox.com/s/cc2dgsxwuk48mba/gluster_statedumps.zip?dl=0 


On Fri, Apr 15, 2016 at 9:52 PM, Ashish Pandey <aspan...@redhat.com> wrote: 
> 
> Actually it was my mistake I overlooked the configuration you provided..It 
> will be huge. 
> I would suggest to take statedump on all the nodes and try to grep for 
> "BLOCKED" in statedump files on all the nodes. 
> See if you can see any such line in any file and send those files. No need 
> to send statedump of all the bricks.. 
> 
> 
> 
> 
>  
> From: "Serkan Çoban" <cobanser...@gmail.com> 
> To: "Ashish Pandey" <aspan...@redhat.com> 
> Cc: "Gluster Users" <gluster-us...@gluster.org>, "Gluster Devel" 
> <gluster-devel@gluster.org> 
> Sent: Friday, April 15, 2016 6:07:00 PM 
> 
> Subject: Re: [Gluster-users] Assertion failed: ec_get_inode_size 
> 
> Hi Asish, 
> 
> Sorry for the question but do you want all brick statedumps from all 
> servers or all brick dumps from one server? 
> All server brick dumps is nearly 700MB zipped.. 
> 
> On Fri, Apr 15, 2016 at 2:16 PM, Ashish Pandey <aspan...@redhat.com> wrote: 
>> 
>> To get the state dump of fuse client- 
>> 1 - get the PID of fuse mount process 
>> 2 - kill -USR1  
>> 
>> statedump can be found in the same directory where u get for brick 
>> process. 
>> 
>> Following link could be helpful for future reference - 
>> 
>> https://github.com/gluster/glusterfs/blob/master/doc/debugging/statedump.md 
>> 
>> Ashish 
>> 
>>  
>> From: "Serkan Çoban" <cobanser...@gmail.com> 
>> To: "Ashish Pandey" <aspan...@redhat.com> 
>> Cc: "Gluster Users" <gluster-us...@gluster.org>, "Gluster Devel" 
>> <gluster-devel@gluster.org> 
>> Sent: Friday, April 15, 2016 4:02:20 PM 
>> Subject: Re: [Gluster-users] Assertion failed: ec_get_inode_size 
>> 
>> Yes it is only one brick which error appears. I can send all other 
>> brick dumps too.. 
>> How can I get state dump in fuse client? There is no gluster command 
>> there.. 
>> ___ 
>> Gluster-users mailing list 
>> gluster-us...@gluster.org 
>> http://www.gluster.org/mailman/listinfo/gluster-users 
>> 
> ___ 
> Gluster-users mailing list 
> gluster-us...@gluster.org 
> http://www.gluster.org/mailman/listinfo/gluster-users 
> 
___ 
Gluster-users mailing list 
gluster-us...@gluster.org 
http://www.gluster.org/mailman/listinfo/gluster-users 

___
Gluster-devel mailing list
Gluster-devel@gluster.org
http://www.gluster.org/mailman/listinfo/gluster-devel

Re: [Gluster-devel] [Gluster-users] Assertion failed: ec_get_inode_size

2016-04-15 Thread Ashish Pandey

Actually, it was my mistake; I overlooked the configuration you provided. It will 
be huge. 
I would suggest taking a statedump on all the nodes and grepping for 
"BLOCKED" in the statedump files on all the nodes. 
See if you can find any such line in any file and send those files. No need to 
send the statedumps of all the bricks. 
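
Concretely, on each node that could be something like (the volume name is a 
placeholder): 

  gluster volume statedump testvol 
  cd "$(gluster --print-statedumpdir)" 
  grep -l BLOCKED *.dump.*                 # which dumps contain blocked lock requests 
  grep -B5 BLOCKED *.dump.*                # a little context around each blocked entry 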




- Original Message -

From: "Serkan Çoban" <cobanser...@gmail.com> 
To: "Ashish Pandey" <aspan...@redhat.com> 
Cc: "Gluster Users" <gluster-us...@gluster.org>, "Gluster Devel" 
<gluster-devel@gluster.org> 
Sent: Friday, April 15, 2016 6:07:00 PM 
Subject: Re: [Gluster-users] Assertion failed: ec_get_inode_size 

Hi Asish, 

Sorry for the question but do you want all brick statedumps from all 
servers or all brick dumps from one server? 
All server brick dumps is nearly 700MB zipped.. 

On Fri, Apr 15, 2016 at 2:16 PM, Ashish Pandey <aspan...@redhat.com> wrote: 
> 
> To get the state dump of fuse client- 
> 1 - get the PID of fuse mount process 
> 2 - kill -USR1  
> 
> statedump can be found in the same directory where u get for brick process. 
> 
> Following link could be helpful for future reference - 
> https://github.com/gluster/glusterfs/blob/master/doc/debugging/statedump.md 
> 
> Ashish 
> 
>  
> From: "Serkan Çoban" <cobanser...@gmail.com> 
> To: "Ashish Pandey" <aspan...@redhat.com> 
> Cc: "Gluster Users" <gluster-us...@gluster.org>, "Gluster Devel" 
> <gluster-devel@gluster.org> 
> Sent: Friday, April 15, 2016 4:02:20 PM 
> Subject: Re: [Gluster-users] Assertion failed: ec_get_inode_size 
> 
> Yes it is only one brick which error appears. I can send all other 
> brick dumps too.. 
> How can I get state dump in fuse client? There is no gluster command there.. 
> ___ 
> Gluster-users mailing list 
> gluster-us...@gluster.org 
> http://www.gluster.org/mailman/listinfo/gluster-users 
> 
___ 
Gluster-users mailing list 
gluster-us...@gluster.org 
http://www.gluster.org/mailman/listinfo/gluster-users 

___
Gluster-devel mailing list
Gluster-devel@gluster.org
http://www.gluster.org/mailman/listinfo/gluster-devel

Re: [Gluster-devel] [Gluster-users] Assertion failed: ec_get_inode_size

2016-04-15 Thread Ashish Pandey
Hi Serkan, 

Could you also provide us with the statedumps of all the brick processes and clients? 

Commands to generate statedumps for brick processes/nfs server/quotad: 

For bricks: gluster volume statedump <volname> 

For nfs server: gluster volume statedump <volname> nfs 
We can find the directory where statedump files are created using 'gluster 
--print-statedumpdir'. 
Also, the mount logs would help us to debug the issue. 

Ashish 

- Original Message -

From: "Serkan Çoban"  
To: "Gluster Users" , "Gluster Devel" 
 
Sent: Thursday, April 14, 2016 6:27:10 PM 
Subject: Re: [Gluster-users] Assertion failed: ec_get_inode_size 

Here is the related brick log: 

/var/log/glusterfs/bricks/bricks-02.log:[2016-04-14 11:31:25.700556] E 
[inodelk.c:309:__inode_unlock_lock] 0-v0-locks: Matching lock not 
found for unlock 0-9223372036854775807, by 94d29e885e7f on 
0x7f037413b990 
/var/log/glusterfs/bricks/bricks-02.log:[2016-04-14 11:31:25.700639] E 
[MSGID: 115053] [server-rpc-fops.c:276:server_inodelk_cbk] 
0-v0-server: 712984: INODELK 
/workdir/raw_output/xxx/yyy/zzz.dat.gz.snappy1460474606605 
(1191e32e-44ba-4e20-87ca-35ace8519c19) ==> (Invalid argument) [Invalid 
argument] 

On Thu, Apr 14, 2016 at 3:25 PM, Serkan Çoban  wrote: 
> Hi, 
> 
> During read/write tests to a 78x(16+4) distributed disperse volume 
> from 50 clients, One clients hangs on read/write with the following 
> logs: 
> 
> [2016-04-14 11:11:04.728580] W [MSGID: 122056] 
> [ec-combine.c:866:ec_combine_check] 0-v0-disperse-6: Mismatching xdata 
> in answers of 'LOOKUP' 
> [2016-04-14 11:11:04.728624] W [MSGID: 122053] 
> [ec-common.c:116:ec_check_status] 0-v0-disperse-6: Operation failed on 
> some subvolumes (up=F, mask=F, remaining=0, good=D, 
> bad=2) 
> [2016-04-14 11:11:04.736689] I [MSGID: 122058] 
> [ec-heal.c:2340:ec_heal_do] 0-v0-disperse-6: /workdir/raw_output2: 
> name heal successful on F 
> [2016-04-14 11:29:26.718036] W [MSGID: 122056] 
> [ec-combine.c:866:ec_combine_check] 0-v0-disperse-1: Mismatching xdata 
> in answers of 'LOOKUP' 
> [2016-04-14 11:29:26.718121] W [MSGID: 122053] 
> [ec-common.c:116:ec_check_status] 0-v0-disperse-1: Operation failed on 
> some subvolumes (up=F, mask=F, remaining=0, good=E, 
> bad=1) 
> [2016-04-14 11:29:42.501760] I [MSGID: 122058] 
> [ec-heal.c:2340:ec_heal_do] 0-v0-disperse-1: /workdir/raw_output2: 
> name heal successful on F 
> [2016-04-14 11:31:25.714812] E [ec-inode-read.c:1612:ec_manager_stat] 
> (-->/usr/lib64/glusterfs/3.7.10/xlator/cluster/disperse.so(ec_resume+0x91) 
> [0x7f5ec9f942b1] 
> -->/usr/lib64/glusterfs/3.7.10/xlator/cluster/disperse.so(__ec_manager+0x57) 
> [0x7f5ec9f94497] 
> -->/usr/lib64/glusterfs/3.7.10/xlator/cluster/disperse.so(ec_manager_stat+0x2c4)
>  
> [0x7f5ec9faaed4] ) 0-: Assertion failed: ec_get_inode_size(fop, 
> fop->locks[0].lock->loc.inode, >iatt[0].ia_size) 
> [2016-04-14 11:31:25.722372] E [MSGID: 114031] 
> [client-rpc-fops.c:1624:client3_3_inodelk_cbk] 0-v0-client-40: remote 
> operation failed [Invalid argument] 
> [2016-04-14 11:31:25.722411] E [MSGID: 114031] 
> [client-rpc-fops.c:1624:client3_3_inodelk_cbk] 0-v0-client-41: remote 
> operation failed [Invalid argument] 
> [2016-04-14 11:31:25.722450] E [MSGID: 114031] 
> [client-rpc-fops.c:1624:client3_3_inodelk_cbk] 0-v0-client-44: remote 
> operation failed [Invalid argument] 
> [2016-04-14 11:31:25.722477] E [MSGID: 114031] 
> [client-rpc-fops.c:1624:client3_3_inodelk_cbk] 0-v0-client-42: remote 
> operation failed [Invalid argument] 
> [2016-04-14 11:31:25.722503] E [MSGID: 114031] 
> [client-rpc-fops.c:1624:client3_3_inodelk_cbk] 0-v0-client-43: remote 
> operation failed [Invalid argument] 
> [2016-04-14 11:31:25.722577] E [MSGID: 114031] 
> [client-rpc-fops.c:1624:client3_3_inodelk_cbk] 0-v0-client-45: remote 
> operation failed [Invalid argument] 
> [2016-04-14 11:31:25.722605] E [MSGID: 114031] 
> [client-rpc-fops.c:1624:client3_3_inodelk_cbk] 0-v0-client-46: remote 
> operation failed [Invalid argument] 
> [2016-04-14 11:31:25.722742] E [MSGID: 114031] 
> [client-rpc-fops.c:1624:client3_3_inodelk_cbk] 0-v0-client-49: remote 
> operation failed [Invalid argument] 
> [2016-04-14 11:31:25.722794] E [MSGID: 114031] 
> [client-rpc-fops.c:1624:client3_3_inodelk_cbk] 0-v0-client-47: remote 
> operation failed [Invalid argument] 
> [2016-04-14 11:31:25.722818] E [MSGID: 114031] 
> [client-rpc-fops.c:1624:client3_3_inodelk_cbk] 0-v0-client-48: remote 
> operation failed [Invalid argument] 
> [2016-04-14 11:31:25.722840] E [MSGID: 114031] 
> [client-rpc-fops.c:1624:client3_3_inodelk_cbk] 0-v0-client-50: remote 
> operation failed [Invalid argument] 
> [2016-04-14 11:31:25.722883] E [MSGID: 114031] 
> [client-rpc-fops.c:1624:client3_3_inodelk_cbk] 0-v0-client-52: remote 
> operation failed [Invalid argument] 
> [2016-04-14 11:31:25.722906] E [MSGID: 114031] 
> 

Re: [Gluster-devel] [Gluster-users] Assertion failed: ec_get_inode_size

2016-04-15 Thread Ashish Pandey

To get the statedump of the fuse client: 
1 - get the PID of the fuse mount process 
2 - kill -USR1 <pid> 

The statedump can be found in the same directory where you get it for the brick process. 
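
A concrete run might look like this (only a sketch; the mount point /mnt/v0 is just 
an example, and the dump directory is assumed to be the default /var/run/gluster - 
client dumps are typically written there as glusterdump.<pid>.dump.<timestamp>): 

ps -ef | grep glusterfs | grep /mnt/v0        (find the PID of the fuse mount process) 
kill -USR1 <pid>                              (ask that process to write a statedump) 
ls /var/run/gluster/glusterdump.*.dump.*      (the client dump should appear here) 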

Following link could be helpful for future reference - 
https://github.com/gluster/glusterfs/blob/master/doc/debugging/statedump.md 

Ashish 

- Original Message -

From: "Serkan Çoban" <cobanser...@gmail.com> 
To: "Ashish Pandey" <aspan...@redhat.com> 
Cc: "Gluster Users" <gluster-us...@gluster.org>, "Gluster Devel" 
<gluster-devel@gluster.org> 
Sent: Friday, April 15, 2016 4:02:20 PM 
Subject: Re: [Gluster-users] Assertion failed: ec_get_inode_size 

Yes, the error appears on only one brick. I can send all the other 
brick dumps too. 
How can I get a statedump on the fuse client? There is no gluster command there. 
___ 
Gluster-users mailing list 
gluster-us...@gluster.org 
http://www.gluster.org/mailman/listinfo/gluster-users 

___
Gluster-devel mailing list
Gluster-devel@gluster.org
http://www.gluster.org/mailman/listinfo/gluster-devel

Re: [Gluster-devel] [Gluster-users] Assertion failed: ec_get_inode_size

2016-04-15 Thread Ashish Pandey


I think this is the statedump of only one brick. 
We would require statedumps from all the bricks and from the client process (the fuse 
process, or the nfs process if the volume is mounted through NFS). 
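
Once we have them, a quick way to see whether a brick still holds locks on the file 
is to grep the lock sections of its dump (a sketch only; it assumes the dump contains 
the usual locks-xlator entries such as inodelk-count and inodelk.inodelk[N](ACTIVE) 
lines, and the file name is taken from the dump you shared): 

grep -E 'inodelk-count|inodelk\[.*ACTIVE' bricks-02.5677.dump.1460705370 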

 
Ashish 

- Original Message -

From: "Serkan Çoban" <cobanser...@gmail.com> 
To: "Ashish Pandey" <aspan...@redhat.com> 
Cc: "Gluster Users" <gluster-us...@gluster.org>, "Gluster Devel" 
<gluster-devel@gluster.org> 
Sent: Friday, April 15, 2016 2:11:57 PM 
Subject: Re: [Gluster-users] Assertion failed: ec_get_inode_size 

Sorry for the typo, I meant the brick statedump file. 

On Fri, Apr 15, 2016 at 11:41 AM, Serkan Çoban <cobanser...@gmail.com> wrote: 
> Hi, I reproduced the problem; the brick log file is at the link below: 
> https://www.dropbox.com/s/iy09j7mm2hrsf03/bricks-02.5677.dump.1460705370.gz?dl=0
>  
> 
> 
> On Thu, Apr 14, 2016 at 8:07 PM, Ashish Pandey <aspan...@redhat.com> wrote: 
>> Hi Serkan, 
>> 
>> Could you also provide us the statedump of all the brick processes and 
>> clients? 
>> 
>> Commands to generate statedumps for brick processes/nfs server/quotad 
>> 
>> For bricks: gluster volume statedump <volname> 
>> 
>> For nfs server: gluster volume statedump <volname> nfs 
>> 
>> 
>> We can find the directory where statedump files are created using 'gluster 
>> --print-statedumpdir' 
>> Also, the mount logs would help us to debug the issue. 
>> 
>> Ashish 
>> 
>>  
>> From: "Serkan Çoban" <cobanser...@gmail.com> 
>> To: "Gluster Users" <gluster-us...@gluster.org>, "Gluster Devel" 
>> <gluster-devel@gluster.org> 
>> Sent: Thursday, April 14, 2016 6:27:10 PM 
>> Subject: Re: [Gluster-users] Assertion failed: ec_get_inode_size 
>> 
>> 
>> Here is the related brick log: 
>> 
>> /var/log/glusterfs/bricks/bricks-02.log:[2016-04-14 11:31:25.700556] E 
>> [inodelk.c:309:__inode_unlock_lock] 0-v0-locks: Matching lock not 
>> found for unlock 0-9223372036854775807, by 94d29e885e7f on 
>> 0x7f037413b990 
>> /var/log/glusterfs/bricks/bricks-02.log:[2016-04-14 11:31:25.700639] E 
>> [MSGID: 115053] [server-rpc-fops.c:276:server_inodelk_cbk] 
>> 0-v0-server: 712984: INODELK 
>> /workdir/raw_output/xxx/yyy/zzz.dat.gz.snappy1460474606605 
>> (1191e32e-44ba-4e20-87ca-35ace8519c19) ==> (Invalid argument) [Invalid 
>> argument] 
>> 
>> On Thu, Apr 14, 2016 at 3:25 PM, Serkan Çoban <cobanser...@gmail.com> wrote: 
>>> Hi, 
>>> 
>>> During read/write tests to a 78x(16+4) distributed disperse volume 
>>> from 50 clients, one client hangs on read/write with the following 
>>> logs: 
>>> 
>>> [2016-04-14 11:11:04.728580] W [MSGID: 122056] 
>>> [ec-combine.c:866:ec_combine_check] 0-v0-disperse-6: Mismatching xdata 
>>> in answers of 'LOOKUP' 
>>> [2016-04-14 11:11:04.728624] W [MSGID: 122053] 
>>> [ec-common.c:116:ec_check_status] 0-v0-disperse-6: Operation failed on 
>>> some subvolumes (up=F, mask=F, remaining=0, good=D, 
>>> bad=2) 
>>> [2016-04-14 11:11:04.736689] I [MSGID: 122058] 
>>> [ec-heal.c:2340:ec_heal_do] 0-v0-disperse-6: /workdir/raw_output2: 
>>> name heal successful on F 
>>> [2016-04-14 11:29:26.718036] W [MSGID: 122056] 
>>> [ec-combine.c:866:ec_combine_check] 0-v0-disperse-1: Mismatching xdata 
>>> in answers of 'LOOKUP' 
>>> [2016-04-14 11:29:26.718121] W [MSGID: 122053] 
>>> [ec-common.c:116:ec_check_status] 0-v0-disperse-1: Operation failed on 
>>> some subvolumes (up=F, mask=F, remaining=0, good=E, 
>>> bad=1) 
>>> [2016-04-14 11:29:42.501760] I [MSGID: 122058] 
>>> [ec-heal.c:2340:ec_heal_do] 0-v0-disperse-1: /workdir/raw_output2: 
>>> name heal successful on F 
>>> [2016-04-14 11:31:25.714812] E [ec-inode-read.c:1612:ec_manager_stat] 
>>> (-->/usr/lib64/glusterfs/3.7.10/xlator/cluster/disperse.so(ec_resume+0x91) 
>>> [0x7f5ec9f942b1] 
>>> 
>>> -->/usr/lib64/glusterfs/3.7.10/xlator/cluster/disperse.so(__ec_manager+0x57)
>>>  
>>> [0x7f5ec9f94497] 
>>> 
>>> -->/usr/lib64/glusterfs/3.7.10/xlator/cluster/disperse.so(ec_manager_stat+0x2c4)
>>>  
>>> [0x7f5ec9faaed4] ) 0-: Assertion failed: ec_get_inode_size(fop, 
>>> fop->locks[0].lock->loc.inode, >iatt[0].ia_size) 
>>> [2016-04-14 11:31:25.722372] E [MSGID: 114031] 
>>> [client-rpc-fops.c:1624:client3_3_inodelk_cbk] 0-v0-client-40: remote 
>>> opera

[Gluster-devel] Fragment size in Systematic erasure code

2016-03-14 Thread Ashish Pandey
Hi Xavi,

I think for the systematic erasure coded volume you are going to use a fragment size 
of 512 bytes.
Will there be any CLI option to configure this block size?
We were having a discussion and Manoj suggested having this option, which 
might improve performance for some workloads.
For example, if we can configure it to 8K, all reads can be served from a single 
brick when the file is smaller than 8K. 
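
As a rough illustration (assuming a 16+4 layout and systematic placement, so data 
fragments are stored unencoded): with 512 B fragments a full stripe is 16 x 512 B = 8 KB, 
so even a 4 KB read touches 4096 / 512 = 8 data bricks, while with an 8 KB fragment the 
same read fits inside one fragment and can be served by a single brick. 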

Ashish
___
Gluster-devel mailing list
Gluster-devel@gluster.org
http://www.gluster.org/mailman/listinfo/gluster-devel