[Gluster-devel] Unplanned Jenkins restart

2019-04-11 Thread Deepshikha Khandelwal
Hello,

I had to do an unplanned Jenkins restart. Jenkins was not responding to any
requests and was not reporting the regression votes back.
I have updated the Verified vote values of the regression jobs, which suddenly
seemed to have been reset to 0 and were no longer being reported. I'm still
investigating the root cause and will post my findings on the bug [1].

CentOS regression jobs may have ended up cancelled; please retry them.

[1] https://bugzilla.redhat.com/show_bug.cgi?id=1698716
___
Gluster-devel mailing list
Gluster-devel@gluster.org
https://lists.gluster.org/mailman/listinfo/gluster-devel

Re: [Gluster-devel] [Gluster-users] Proposal: Changes in Gluster Community meetings

2019-04-11 Thread Amar Tumballi Suryanarayan
Hi All,

Below are the final details of our community meeting, and I will be sending
invites to the mailing lists following this email. You can add the Gluster
Community Calendar so that you get notifications for the meetings.

We are starting the meetings next week. For the first meeting, we need one
volunteer from the users to discuss their use case (what went well, what went
badly, etc.), preferably from the APAC region. For the NA/EMEA region, the
same is needed the following week.

Draft Content: https://hackmd.io/OqZbh7gfQe6uvVUXUVKJ5g

Gluster Community Meeting
Previous meeting minutes:

   - http://github.com/gluster/community

Date/Time:
Check the community calendar

Bridge

   - APAC friendly hours
  - Bridge: https://bluejeans.com/836554017
   - NA/EMEA
  - Bridge: https://bluejeans.com/486278655

--
Attendance

   - Name, Company

Host

   - Who will host next meeting?
  - The host needs to send out the agenda to the mailing list 12-24 hours in
  advance, and also make sure the meeting minutes are sent out afterwards.
  - The host needs to reach out to at least one user who can talk about their
  use case, their experience, and their needs.
  - The host needs to send the meeting minutes as a PR to
  http://github.com/gluster/community

User stories

   - Discuss one use case from a user.
  - How was the architecture derived? What volume type, options, etc. are used?
  - What were the major issues faced? How can they be improved?
  - What worked well?
  - How can we all collaborate well, so that it is a win-win for the community
  and the user?

Community

   - Any release updates?
   - Blocker issues across the project?
   - Metrics
  - Number of new bugs since the previous meeting. How many are not triaged?
  - Number of emails; anything unanswered?

Conferences / Meetups

   - Any conference in the next month that gluster developers or gluster users
   are going to, so we can meet and discuss?

Developer focus

   - Any design specs to discuss?
   - Metrics of the week?
  - Coverity
  - Clang-Scan
  - Number of patches from new developers.
  - Did we increase test coverage?
  - [Atin] Also talk about the most frequent test failures in the CI and
  carve out an AI (action item) to get them fixed.

RoundTable

   - 



Regards,
Amar

On Mon, Mar 25, 2019 at 8:53 PM Amar Tumballi Suryanarayan <
atumb...@redhat.com> wrote:

> Thanks for the feedback Darrell,
>
> The new proposal is to have one meeting at a North America 'morning' time
> (10AM PST), and another at an Asia-Pacific daytime slot, which is 7pm/6pm in
> the evening in Australia, 9pm in New Zealand, 5pm in Tokyo and 4pm in Beijing.
>
> For example, if we choose every other Tuesday for the meeting, and the 1st of
> the month is a Tuesday, we would use the North America time on the 1st, and on
> the 15th it would be the Asia/Pacific time.
>
> Hopefully this way we can cover all the timezones, and the meeting minutes
> will be committed to the github repo, so it will be easier for everyone to be
> aware of what is happening.
>
> Regards,
> Amar
>
> On Mon, Mar 25, 2019 at 8:40 PM Darrell Budic 
> wrote:
>
>> As a user, I’d like to attend more of these, but the time slot is 3AM for me.
>> Any possibility of a rolling schedule (move the meeting +6 hours each week,
>> with rolling attendance from maintainers?) or an occasional regional
>> meeting offset 12 hours from the one you’re proposing?
>>
>>   -Darrell
>>
>> On Mar 25, 2019, at 4:25 AM, Amar Tumballi Suryanarayan <
>> atumb...@redhat.com> wrote:
>>
>> All,
>>
>> We currently have 3 meetings which are public:
>>
>> 1. Maintainer's Meeting
>>
>> - Runs once every 2 weeks (on Mondays); current attendance is around 3-5 on
>> average, and not much is discussed.
>> - Without majority attendance, we can't make any decisions either.
>>
>> 2. Community meeting
>>
>> - Supposed to happen on #gluster-meeting every 2 weeks, and is the only
>> meeting which is for 'Community/Users'; the others are for developers as of
>> now.
>> Sadly, attendance has been getting close to 0 recently.
>>
>> 3. GCS meeting
>>
>> - We started it as an effort inside the Red Hat gluster team, and opened it
>> up to the community from Jan 2019, but the attendance has always been from
>> RHT members, and we haven't seen any traction from the wider group.
>>

[Gluster-devel] Invitation: Gluster Community Meeting (APAC friendly hours) @ Tue Apr 16, 2019 11:30am - 12:30pm (IST) (gluster-devel@gluster.org)

2019-04-11 Thread amarts
BEGIN:VCALENDAR
PRODID:-//Google Inc//Google Calendar 70.9054//EN
VERSION:2.0
CALSCALE:GREGORIAN
METHOD:REQUEST
BEGIN:VEVENT
DTSTART:20190416T060000Z
DTEND:20190416T070000Z
DTSTAMP:20190411T085648Z
ORGANIZER;CN=Gluster Community Calendar:mailto:vebj5bl0knsb9d0cm9eh9pbli4@g
 roup.calendar.google.com
UID:256uie4423kjhk4f8btivbg...@google.com
ATTENDEE;CUTYPE=INDIVIDUAL;ROLE=REQ-PARTICIPANT;PARTSTAT=NEEDS-ACTION;RSVP=
 TRUE;CN=gluster-us...@gluster.org;X-NUM-GUESTS=0:mailto:gluster-users@glust
 er.org
ATTENDEE;CUTYPE=INDIVIDUAL;ROLE=REQ-PARTICIPANT;PARTSTAT=NEEDS-ACTION;RSVP=
 TRUE;CN=maintain...@gluster.org;X-NUM-GUESTS=0:mailto:maintainers@gluster.o
 rg
ATTENDEE;CUTYPE=INDIVIDUAL;ROLE=REQ-PARTICIPANT;PARTSTAT=NEEDS-ACTION;RSVP=
 TRUE;CN=gluster-devel@gluster.org;X-NUM-GUESTS=0:mailto:gluster-devel@glust
 er.org
X-MICROSOFT-CDO-OWNERAPPTID:385341162
CREATED:20190410T163315Z
DESCRIPTION:Bridge: https://bluejeans.com/836554017\n\nMeeting minutes: htt
 ps://hackmd.io/OqZbh7gfQe6uvVUXUVKJ5g?both\n\nPrevious Meeting notes: http:
 //github.com/gluster/community\n\n-::~:~::~:~:~:~:~:~:~:~:~:~:~:~:~:~:~:~:~
 :~:~:~:~:~:~:~:~:~:~:~:~:~:~:~:~:~:~:~::~:~::-\nPlease do not edit this sec
 tion of the description.\n\nView your event at https://www.google.com/calen
 dar/event?action=VIEW&eid=MjU2dWllNDQyM2tqaGs0ZjhidGl2YmdtM2YgZ2x1c3Rlci1kZ
 XZlbEBnbHVzdGVyLm9yZw&tok=NTIjdmViajVibDBrbnNiOWQwY205ZWg5cGJsaTRAZ3JvdXAuY
 2FsZW5kYXIuZ29vZ2xlLmNvbTE4ODM2ZDY3Mzk4MjRjNDc2OWE3NmEyMTY0ODEwMDg0ODI5ODNl
 ZmY&ctz=Asia%2FKolkata&hl=en&es=1.\n-::~:~::~:~:~:~:~:~:~:~:~:~:~:~:~:~:~:~
 :~:~:~:~:~:~:~:~:~:~:~:~:~:~:~:~:~:~:~:~::~:~::-
LAST-MODIFIED:20190411T085646Z
LOCATION:https://bluejeans.com/836554017
SEQUENCE:0
STATUS:CONFIRMED
SUMMARY:Gluster Community Meeting (APAC friendly hours)
TRANSP:OPAQUE
END:VEVENT
END:VCALENDAR


invite.ics
Description: application/ics
___
Gluster-devel mailing list
Gluster-devel@gluster.org
https://lists.gluster.org/mailman/listinfo/gluster-devel

[Gluster-devel] Invitation: Gluster Community Meeting (NA/EMEA friendly hours) @ Tue Apr 23, 2019 10:30pm - 11:30pm (IST) (gluster-devel@gluster.org)

2019-04-11 Thread amarts
BEGIN:VCALENDAR
PRODID:-//Google Inc//Google Calendar 70.9054//EN
VERSION:2.0
CALSCALE:GREGORIAN
METHOD:REQUEST
BEGIN:VEVENT
DTSTART:20190423T170000Z
DTEND:20190423T180000Z
DTSTAMP:20190411T085751Z
ORGANIZER;CN=Gluster Community Calendar:mailto:vebj5bl0knsb9d0cm9eh9pbli4@g
 roup.calendar.google.com
UID:7v55fde915d3st1ptv8rg6n...@google.com
ATTENDEE;CUTYPE=INDIVIDUAL;ROLE=REQ-PARTICIPANT;PARTSTAT=NEEDS-ACTION;RSVP=
 TRUE;CN=gluster-us...@gluster.org;X-NUM-GUESTS=0:mailto:gluster-users@glust
 er.org
ATTENDEE;CUTYPE=INDIVIDUAL;ROLE=REQ-PARTICIPANT;PARTSTAT=NEEDS-ACTION;RSVP=
 TRUE;CN=maintain...@gluster.org;X-NUM-GUESTS=0:mailto:maintainers@gluster.o
 rg
ATTENDEE;CUTYPE=INDIVIDUAL;ROLE=REQ-PARTICIPANT;PARTSTAT=NEEDS-ACTION;RSVP=
 TRUE;CN=gluster-devel@gluster.org;X-NUM-GUESTS=0:mailto:gluster-devel@glust
 er.org
X-MICROSOFT-CDO-OWNERAPPTID:-1002675767
CREATED:20190410T163536Z
DESCRIPTION:Bridge: https://bluejeans.com/486278655\n\n\nMeeting minutes: h
 ttps://hackmd.io/OqZbh7gfQe6uvVUXUVKJ5g?both\n\nPrevious Meeting notes: htt
 p://github.com/gluster/community\n\n-::~:~::~:~:~:~:~:~:~:~:~:~:~:~:~:~:~:~
 :~:~:~:~:~:~:~:~:~:~:~:~:~:~:~:~:~:~:~:~::~:~::-\nPlease do not edit this s
 ection of the description.\n\nView your event at https://www.google.com/cal
 endar/event?action=VIEW&eid=N3Y1NWZkZTkxNWQzc3QxcHR2OHJnNm4zNzYgZ2x1c3Rlci1
 kZXZlbEBnbHVzdGVyLm9yZw&tok=NTIjdmViajVibDBrbnNiOWQwY205ZWg5cGJsaTRAZ3JvdXA
 uY2FsZW5kYXIuZ29vZ2xlLmNvbWYwYzdiMTk0ODRhYWY1MTBmNjU4NmQ0MGM2M2M1MWU3ZDg0ZD
 QzYzI&ctz=Asia%2FKolkata&hl=en&es=1.\n-::~:~::~:~:~:~:~:~:~:~:~:~:~:~:~:~:~
 :~:~:~:~:~:~:~:~:~:~:~:~:~:~:~:~:~:~:~:~:~::~:~::-
LAST-MODIFIED:20190411T085749Z
LOCATION:https://bluejeans.com/486278655
SEQUENCE:0
STATUS:CONFIRMED
SUMMARY:Gluster Community Meeting (NA/EMEA friendly hours)
TRANSP:OPAQUE
END:VEVENT
END:VCALENDAR


invite.ics
Description: application/ics
___
Gluster-devel mailing list
Gluster-devel@gluster.org
https://lists.gluster.org/mailman/listinfo/gluster-devel

Re: [Gluster-devel] test failure reports for last 15 days

2019-04-11 Thread Xavi Hernandez
On Wed, Apr 10, 2019 at 7:25 PM Xavi Hernandez  wrote:

> On Wed, Apr 10, 2019 at 4:01 PM Atin Mukherjee 
> wrote:
>
>> And now for last 15 days:
>>
>>
>> https://fstat.gluster.org/summary?start_date=2019-03-25&end_date=2019-04-10
>>
>> ./tests/bitrot/bug-1373520.t 18  ==> Fixed through
>> https://review.gluster.org/#/c/glusterfs/+/22481/, I don't see this
>> failing in brick mux post 5th April
>> ./tests/bugs/ec/bug-1236065.t 17  ==> happens only in brick mux,
>> needs analysis.
>>
>
> I've identified the problem here, but not the cause yet. There's a stale
> inodelk acquired by a process that is already dead, which causes inodelk
> requests from self-heal and other processes to block.
>
> The reason why it seemed to block in random places is that all commands
> are executed with the working directory pointing to a gluster directory
> which needs healing after the initial tests. Because of the stale inodelk,
> when any application tries to open a file in the working directory, it's
> blocked.
>
> I'll investigate what causes this.
>

I think I've found the problem. This is a fragment of the brick log that
includes script steps, connections and disconnections of brick 0, and lock
requests to the problematic lock:

[2019-04-11 08:22:20.381398]:++ G_LOG:tests/bugs/ec/bug-1236065.t:
TEST: 66 kill_brick patchy jahernan /d/backends/patchy2 ++
[2019-04-11 08:22:22.532646]:++ G_LOG:tests/bugs/ec/bug-1236065.t:
TEST: 67 kill_brick patchy jahernan /d/backends/patchy3 ++
[2019-04-11 08:22:23.709655] I [MSGID: 115029]
[server-handshake.c:550:server_setvolume] 0-patchy-server: accepted client
from
CTX_ID:1c2952c2-e90f-4631-8712-170b8c05aa6e-GRAPH_ID:0-PID:28900-HOST:jahernan-PC_NAME:patchy-client-1-RECON_NO:-2
(version: 7dev) with subvol /d/backends/patchy1
[2019-04-11 08:22:23.792204] I [common.c:234:pl_trace_in] 8-patchy-locks:
[REQUEST] Locker = {Pid=29710, lk-owner=68580998b47f,
Client=CTX_ID:1c2952c2-e90f-4631-8712-170b8c05aa6e-GRAPH_ID:0-PID:28900-HOST:jahernan-PC_NAME:patchy-client-1-RECON_NO:-2,
Frame=18676} Lockee = {gfid=35743386-b7c2-41c9-aafd-6b13de216704, fd=(nil),
path=/test} Lock = {lock=INODELK, cmd=SETLK, type=WRITE, domain:
patchy-disperse-0, start=0, len=0, pid=0}
[2019-04-11 08:22:23.792299] I [common.c:285:pl_trace_out] 8-patchy-locks:
[GRANTED] Locker = {Pid=29710, lk-owner=68580998b47f,
Client=CTX_ID:1c2952c2-e90f-4631-8712-170b8c05aa6e-GRAPH_ID:0-PID:28900-HOST:jahernan-PC_NAME:patchy-client-1-RECON_NO:-2,
Frame=18676} Lockee = {gfid=35743386-b7c2-41c9-aafd-6b13de216704, fd=(nil),
path=/test} Lock = {lock=INODELK, cmd=SETLK, type=WRITE, domain:
patchy-disperse-0, start=0, len=0, pid=0}
[2019-04-11 08:22:24.628478]:++ G_LOG:tests/bugs/ec/bug-1236065.t:
TEST: 68 5 online_brick_count ++
[2019-04-11 08:22:26.097092]:++ G_LOG:tests/bugs/ec/bug-1236065.t:
TEST: 70 rm -f 0.o 10.o 11.o 12.o 13.o 14.o 15.o 16.o 17.o 18.o 19.o 1.o
2.o 3.o 4.o 5.o 6.o 7.o 8.o 9.o ++
[2019-04-11 08:22:26.333740]:++ G_LOG:tests/bugs/ec/bug-1236065.t:
TEST: 71 ec_test_make ++
[2019-04-11 08:22:27.718963] I [MSGID: 115029]
[server-handshake.c:550:server_setvolume] 0-patchy-server: accepted client
from
CTX_ID:1c2952c2-e90f-4631-8712-170b8c05aa6e-GRAPH_ID:0-PID:28900-HOST:jahernan-PC_NAME:patchy-client-1-RECON_NO:-3
(version: 7dev) with subvol /d/backends/patchy1
[2019-04-11 08:22:27.801416] I [common.c:234:pl_trace_in] 8-patchy-locks:
[REQUEST] Locker = {Pid=29885, lk-owner=68580998b47f,
Client=CTX_ID:1c2952c2-e90f-4631-8712-170b8c05aa6e-GRAPH_ID:0-PID:28900-HOST:jahernan-PC_NAME:patchy-client-1-RECON_NO:-3,
Frame=19233} Lockee = {gfid=35743386-b7c2-41c9-aafd-6b13de216704, fd=(nil),
path=/test} Lock = {lock=INODELK, cmd=SETLK, type=UNLOCK, domain:
patchy-disperse-0, start=0, len=0, pid=0}
[2019-04-11 08:22:27.801434] E [inodelk.c:513:__inode_unlock_lock]
8-patchy-locks:  Matching lock not found for unlock 0-9223372036854775807,
by 68580998b47f on 0x7f0ed0029190
[2019-04-11 08:22:27.801446] I [common.c:285:pl_trace_out] 8-patchy-locks:
[Invalid argument] Locker = {Pid=29885, lk-owner=68580998b47f,
Client=CTX_ID:1c2952c2-e90f-4631-8712-170b8c05aa6e-GRAPH_ID:0-PID:28900-HOST:jahernan-PC_NAME:patchy-client-1-RECON_NO:-3,
Frame=19233} Lockee = {gfid=35743386-b7c2-41c9-aafd-6b13de216704, fd=(nil),
path=/test} Lock = {lock=INODELK, cmd=SETLK, type=UNLOCK, domain:
patchy-disperse-0, start=0, len=0, pid=0}

This is a fragment of the client log:

[2019-04-11 08:22:20.381398]:++ G_LOG:tests/bugs/ec/bug-1236065.t:
TEST: 66 kill_brick patchy jahernan /d/backends/patchy2 ++
[2019-04-11 08:22:20.675938] I [MSGID: 114018]
[client.c:2333:client_rpc_notify] 0-patchy-client-1: disconnected from
patchy-client-1. Client process will keep trying to connect to glusterd
until brick's port is available
[2019-04-11 08:22:21.674772] W [MSGID: 122035]
[ec-common.c:654:ec_child_select] 0-patchy-disperse-0: Executing operation
with so

Re: [Gluster-devel] test failure reports for last 15 days

2019-04-11 Thread Xavi Hernandez
On Thu, Apr 11, 2019 at 11:28 AM Xavi Hernandez  wrote:

> On Wed, Apr 10, 2019 at 7:25 PM Xavi Hernandez 
> wrote:
>
>> On Wed, Apr 10, 2019 at 4:01 PM Atin Mukherjee 
>> wrote:
>>
>>> And now for last 15 days:
>>>
>>>
>>> https://fstat.gluster.org/summary?start_date=2019-03-25&end_date=2019-04-10
>>>
>>> ./tests/bitrot/bug-1373520.t 18  ==> Fixed through
>>> https://review.gluster.org/#/c/glusterfs/+/22481/, I don't see this
>>> failing in brick mux post 5th April
>>> ./tests/bugs/ec/bug-1236065.t 17  ==> happens only in brick mux,
>>> needs analysis.
>>>
>>
>> I've identified the problem here, but not the cause yet. There's a stale
>> inodelk acquired by a process that is already dead, which causes inodelk
>> requests from self-heal and other processes to block.
>>
>> The reason why it seemed to block in random places is that all commands
>> are executed with the working directory pointing to a gluster directory
>> which needs healing after the initial tests. Because of the stale inodelk,
>> when any application tries to open a file in the working directory, it's
>> blocked.
>>
>> I'll investigate what causes this.
>>
>
> I think I've found the problem. This is a fragment of the brick log that
> includes script steps, connections and disconnections of brick 0, and lock
> requests to the problematic lock:
>
> [2019-04-11 08:22:20.381398]:++ G_LOG:tests/bugs/ec/bug-1236065.t:
> TEST: 66 kill_brick patchy jahernan /d/backends/patchy2 ++
> [2019-04-11 08:22:22.532646]:++ G_LOG:tests/bugs/ec/bug-1236065.t:
> TEST: 67 kill_brick patchy jahernan /d/backends/patchy3 ++
> [2019-04-11 08:22:23.709655] I [MSGID: 115029]
> [server-handshake.c:550:server_setvolume] 0-patchy-server: accepted client
> from
> CTX_ID:1c2952c2-e90f-4631-8712-170b8c05aa6e-GRAPH_ID:0-PID:28900-HOST:jahernan-PC_NAME:patchy-client-1-RECON_NO:-2
> (version: 7dev) with subvol /d/backends/patchy1
> [2019-04-11 08:22:23.792204] I [common.c:234:pl_trace_in] 8-patchy-locks:
> [REQUEST] Locker = {Pid=29710, lk-owner=68580998b47f,
> Client=CTX_ID:1c2952c2-e90f-4631-8712-170b8c05aa6e-GRAPH_ID:0-PID:28900-HOST:jahernan-PC_NAME:patchy-client-1-RECON_NO:-2,
> Frame=18676} Lockee = {gfid=35743386-b7c2-41c9-aafd-6b13de216704, fd=(nil),
> path=/test} Lock = {lock=INODELK, cmd=SETLK, type=WRITE, domain:
> patchy-disperse-0, start=0, len=0, pid=0}
> [2019-04-11 08:22:23.792299] I [common.c:285:pl_trace_out] 8-patchy-locks:
> [GRANTED] Locker = {Pid=29710, lk-owner=68580998b47f,
> Client=CTX_ID:1c2952c2-e90f-4631-8712-170b8c05aa6e-GRAPH_ID:0-PID:28900-HOST:jahernan-PC_NAME:patchy-client-1-RECON_NO:-2,
> Frame=18676} Lockee = {gfid=35743386-b7c2-41c9-aafd-6b13de216704, fd=(nil),
> path=/test} Lock = {lock=INODELK, cmd=SETLK, type=WRITE, domain:
> patchy-disperse-0, start=0, len=0, pid=0}
> [2019-04-11 08:22:24.628478]:++ G_LOG:tests/bugs/ec/bug-1236065.t:
> TEST: 68 5 online_brick_count ++
> [2019-04-11 08:22:26.097092]:++ G_LOG:tests/bugs/ec/bug-1236065.t:
> TEST: 70 rm -f 0.o 10.o 11.o 12.o 13.o 14.o 15.o 16.o 17.o 18.o 19.o 1.o
> 2.o 3.o 4.o 5.o 6.o 7.o 8.o 9.o ++
> [2019-04-11 08:22:26.333740]:++ G_LOG:tests/bugs/ec/bug-1236065.t:
> TEST: 71 ec_test_make ++
> [2019-04-11 08:22:27.718963] I [MSGID: 115029]
> [server-handshake.c:550:server_setvolume] 0-patchy-server: accepted client
> from
> CTX_ID:1c2952c2-e90f-4631-8712-170b8c05aa6e-GRAPH_ID:0-PID:28900-HOST:jahernan-PC_NAME:patchy-client-1-RECON_NO:-3
> (version: 7dev) with subvol /d/backends/patchy1
> [2019-04-11 08:22:27.801416] I [common.c:234:pl_trace_in] 8-patchy-locks:
> [REQUEST] Locker = {Pid=29885, lk-owner=68580998b47f,
> Client=CTX_ID:1c2952c2-e90f-4631-8712-170b8c05aa6e-GRAPH_ID:0-PID:28900-HOST:jahernan-PC_NAME:patchy-client-1-RECON_NO:-3,
> Frame=19233} Lockee = {gfid=35743386-b7c2-41c9-aafd-6b13de216704, fd=(nil),
> path=/test} Lock = {lock=INODELK, cmd=SETLK, type=UNLOCK, domain:
> patchy-disperse-0, start=0, len=0, pid=0}
> [2019-04-11 08:22:27.801434] E [inodelk.c:513:__inode_unlock_lock]
> 8-patchy-locks:  Matching lock not found for unlock 0-9223372036854775807,
> by 68580998b47f on 0x7f0ed0029190
> [2019-04-11 08:22:27.801446] I [common.c:285:pl_trace_out] 8-patchy-locks:
> [Invalid argument] Locker = {Pid=29885, lk-owner=68580998b47f,
> Client=CTX_ID:1c2952c2-e90f-4631-8712-170b8c05aa6e-GRAPH_ID:0-PID:28900-HOST:jahernan-PC_NAME:patchy-client-1-RECON_NO:-3,
> Frame=19233} Lockee = {gfid=35743386-b7c2-41c9-aafd-6b13de216704, fd=(nil),
> path=/test} Lock = {lock=INODELK, cmd=SETLK, type=UNLOCK, domain:
> patchy-disperse-0, start=0, len=0, pid=0}
>
> This is a fragment of the client log:
>
> [2019-04-11 08:22:20.381398]:++ G_LOG:tests/bugs/ec/bug-1236065.t:
> TEST: 66 kill_brick patchy jahernan /d/backends/patchy2 ++
> [2019-04-11 08:22:20.675938] I [MSGID: 114018]
> [client.c:2333:client_rpc_notify] 0-patchy-client-1: disconnected from
> patchy-client-1. Client pr

Re: [Gluster-devel] [Gluster-users] Replica 3 - how to replace failed node (peer)

2019-04-11 Thread Karthik Subrahmanya
On Thu, Apr 11, 2019 at 12:43 PM Martin Toth  wrote:

> Hi Karthik,
>
> more over, I would like to ask if there are some recommended
> settings/parameters for SHD in order to achieve good or fair I/O while
> volume will be healed when I will replace Brick (this should trigger
> healing process).
>
If I understand your concern correctly, you need fair I/O performance for
clients while healing takes place as part of the replace-brick operation. For
this you can turn off the "data-self-heal" and "metadata-self-heal" options
until the heal completes on the new brick.
Turning off client-side healing doesn't compromise data integrity or
consistency. On a read request from a client, the pending xattrs are evaluated
for the replica copies and the read is served only from a correct copy. During
writes, I/O will continue on both replicas, and the SHD will take care of
healing the files.
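
For reference, the toggles would look something like this (the volume name is
a placeholder; heal progress can be watched with "heal info"):

# gluster volume set <volname> cluster.data-self-heal off
# gluster volume set <volname> cluster.metadata-self-heal off
# ... replace the brick, then watch the heal progress:
# gluster volume heal <volname> info
# ... once the heal has completed, re-enable client-side healing:
# gluster volume set <volname> cluster.data-self-heal on
# gluster volume set <volname> cluster.metadata-self-heal on
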
After replacing the brick, we strongly recommend that you consider upgrading
your gluster to one of the maintained versions. They carry many
stability-related fixes which handle critical issues and corner cases that you
could hit during these kinds of scenarios.

Regards,
Karthik

> I had some problems in the past when healing was triggered: VM disks became
> unresponsive because healing took most of the I/O. My volume contains only
> big files with VM disks.
>
> Thanks for suggestions.
> BR,
> Martin
>
> On 10 Apr 2019, at 12:38, Martin Toth  wrote:
>
> Thanks, this looks OK to me. I will reset the brick because I don't have any
> data anymore on the failed node, so I can use the same path / brick name.
>
> Is reset-brick a dangerous command? Should I be worried about some possible
> failure that will impact the remaining two nodes? I am running the really old
> but stable version 3.7.6.
>
> Thanks,
> BR!
>
> Martin
>
>
> On 10 Apr 2019, at 12:20, Karthik Subrahmanya  wrote:
>
> Hi Martin,
>
> After you add the new disks and create the raid array, you can run the
> following command to replace the old brick with the new one:
>
> - If you are going to use a different name for the new brick you can run
> gluster volume replace-brick <volname> <old-brick> <new-brick> commit force
>
> - If you are planning to use the same name for the new brick as well, then
> you can use
> gluster volume reset-brick <volname> <old-brick> <new-brick> commit force
> Here the old-brick's and new-brick's hostname & path should be the same.
>
> After replacing the brick, make sure the brick comes online using volume
> status.
> Heal should start automatically; you can check the heal status to see that all
> the files get replicated to the newly added brick. If it does not start
> automatically, you can start it manually by running gluster volume heal
> <volname>.
>
> HTH,
> Karthik
>
> On Wed, Apr 10, 2019 at 3:13 PM Martin Toth  wrote:
>
>> Hi all,
>>
>> I am running replica 3 gluster with 3 bricks. One of my servers failed -
>> all disks are showing errors and raid is in fault state.
>>
>> Type: Replicate
>> Volume ID: 41d5c283-3a74-4af8-a55d-924447bfa59a
>> Status: Started
>> Number of Bricks: 1 x 3 = 3
>> Transport-type: tcp
>> Bricks:
>> Brick1: node1.san:/tank/gluster/gv0imagestore/brick1
>> Brick2: node2.san:/tank/gluster/gv0imagestore/brick1 <— this brick is down
>> Brick3: node3.san:/tank/gluster/gv0imagestore/brick1
>>
>> So one of my bricks has totally failed (node2). It went down and all data
>> is lost (failed raid on node2). Now I am running only two bricks on 2
>> servers out of 3.
>> This is a really critical problem for us; we could lose all data. I want to
>> add new disks to node2, create a new raid array on them and try to replace
>> the failed brick on this node.
>>
>> What is the procedure for replacing Brick2 on node2, can someone advise? I
>> can’t find anything relevant in the documentation.
>>
>> Thanks in advance,
>> Martin
>> ___
>> Gluster-users mailing list
>> gluster-us...@gluster.org
>> https://lists.gluster.org/mailman/listinfo/gluster-users
>
>
>
>
___
Gluster-devel mailing list
Gluster-devel@gluster.org
https://lists.gluster.org/mailman/listinfo/gluster-devel

Re: [Gluster-devel] [Gluster-users] Replica 3 - how to replace failed node (peer)

2019-04-11 Thread Karthik Subrahmanya
On Thu, Apr 11, 2019 at 1:40 PM Strahil Nikolov 
wrote:

> Hi Karthik,
>
> - the volume configuration you were using?
> I used oVirt 4.2.6 Gluster Wizard, so I guess - we need to involve the
> oVirt devs here.
> - why you wanted to replace your brick?
> I had deployed the arbiter at another location as I thought I could deploy
> the Thin Arbiter (still waiting for the docs to be updated), but once I
> realized that GlusterD doesn't support Thin Arbiter, I had to build another
> machine for a local arbiter, thus a replacement was needed.
>
We are working on supporting Thin-arbiter with GlusterD. Once done, we will
post an update on the users list so that you can play with it and let us know
your experience.

> - which brick(s) you tried replacing?
> I was replacing the old arbiter with a new one
> - what problem(s) did you face?
> All oVirt VMs got paused due to I/O errors.
>
There could be many reasons for this. Without knowing the exact state of
the system at that time, I am afraid to make any comment on this.

>
> In the end, I rebuilt the whole setup and never tried to replace the brick
> this way again (I used only reset-brick, which didn't cause any issues).
>
> As I mentioned, that was on v3.12, which is not the default for oVirt
> 4.3.x, so my guess is that it is OK now (current is v5.5).
>
I don't remember anyone complaining about this recently. This should work
in the latest releases.

>
> Just sharing my experience.
>
Highly appreciated.

Regards,
Karthik

>
> Best Regards,
> Strahil Nikolov
>
> В четвъртък, 11 април 2019 г., 0:53:52 ч. Гринуич-4, Karthik Subrahmanya <
> ksubr...@redhat.com> написа:
>
>
> Hi Strahil,
>
> Can you give us some more insights on
> - the volume configuration you were using?
> - why you wanted to replace your brick?
> - which brick(s) you tried replacing?
> - what problem(s) did you face?
>
> Regards,
> Karthik
>
> On Thu, Apr 11, 2019 at 10:14 AM Strahil  wrote:
>
> Hi Karthik,
> I used the brick replace function only once, when I wanted to change my
> Arbiter (v3.12.15 in oVirt 4.2.7), and it was a complete disaster.
> Most probably I should have stopped the source arbiter before doing that,
> but the docs didn't mention it.
>
> Thus I always use reset-brick, as it has never let me down.
>
> Best Regards,
> Strahil Nikolov
> On Apr 11, 2019 07:34, Karthik Subrahmanya  wrote:
>
> Hi Strahil,
>
> Thank you for sharing your experience with the reset-brick option.
> Since he is using gluster version 3.7.6, the reset-brick [1] option is not
> implemented there; it was introduced in 3.9.0. He has to go with
> replace-brick with the force option if he wants to use the same path & name
> for the new brick.
> Yes, it is recommended that the new brick be of the same size as the other
> bricks.
>
> [1]
> https://docs.gluster.org/en/latest/release-notes/3.9.0/#introducing-reset-brick-command
>
> Regards,
> Karthik
>
> On Wed, Apr 10, 2019 at 10:31 PM Strahil  wrote:
>
> I have used reset-brick, but I had just changed the brick layout.
> You may give it a try, but I guess you need your new brick to have the same
> amount of space (or more).
>
> Maybe someone more experienced should share a more sound solution.
>
> Best Regards,
> Strahil Nikolov
>
> On Apr 10, 2019 12:42, Martin Toth  wrote:
> >
> > Hi all,
> >
> > I am running replica 3 gluster with 3 bricks. One of my servers failed -
> all disks are showing errors and raid is in fault state.
> >
> > Type: Replicate
> > Volume ID: 41d5c283-3a74-4af8-a55d-924447bfa59a
> > Status: Started
> > Number of Bricks: 1 x 3 = 3
> > Transport-type: tcp
> > Bricks:
> > Brick1: node1.san:/tank/gluster/gv0imagestore/brick1
> > Brick2: node2.san:/tank/gluster/gv0imagestore/brick1 <— this brick is
> down
> > Brick3: node3.san:/tank/gluster/gv0imagestore/brick1
> >
> > So one of my bricks is totally failed (node2). It went down and all data
> are lost (failed raid on node2). Now I am running only two bricks on 2
> servers out from 3.
> > This is really critical problem for us, we can lost all data. I want to
> add new disks to node2, create new raid array on them and try to replace
> failed brick on this node.
> >
> > What is the procedure of replacing Brick2 on node2, can someone advice?
> I can’t find anything relevant in documentation.
> >
> > Thanks in advance,
> > Martin
> > ___
> > Gluster-users mailing list
> > gluster-us...@gluster.org
> > https://lists.gluster.org/mailman/listinfo/gluster-users
> ___
> Gluster-users mailing list
> gluster-us...@gluster.org
> https://lists.gluster.org/mailman/listinfo/gluster-users
>
>
___
Gluster-devel mailing list
Gluster-devel@gluster.org
https://lists.gluster.org/mailman/listinfo/gluster-devel

Re: [Gluster-devel] [Gluster-users] Replica 3 - how to replace failed node (peer)

2019-04-11 Thread Karthik Subrahmanya
On Thu, Apr 11, 2019 at 6:38 PM Martin Toth  wrote:

> Hi Karthik,
>
> On Thu, Apr 11, 2019 at 12:43 PM Martin Toth  wrote:
>
>> Hi Karthik,
>>
>> more over, I would like to ask if there are some recommended
>> settings/parameters for SHD in order to achieve good or fair I/O while
>> volume will be healed when I will replace Brick (this should trigger
>> healing process).
>>
> If I understand you concern correctly, you need to get fair I/O
> performance for clients while healing takes place as part of  the replace
> brick operation. For this you can turn off the "data-self-heal" and
> "metadata-self-heal" options until the heal completes on the new brick.
>
>
> This is exactly what I mean. I am running VM disks on the remaining 2 (out of
> 3, one failed as mentioned) nodes and I need to ensure there will be fair
> I/O performance available on these two nodes while the replace-brick operation
> heals the volume.
> I will not run any VMs on the node where the replace-brick operation will be
> running. So if I understand correctly, when I set:
>
> # gluster volume set <volname> cluster.data-self-heal off
> # gluster volume set <volname> cluster.metadata-self-heal off
>
> this will tell Gluster clients (libgfapi and FUSE mount) not to read from the
> node where the replace-brick operation is in place, but from the remaining two
> healthy nodes. Is this correct? Thanks for clarification.
>
The reads will be served from one of the good bricks, since at the time of the
read the file will either not be present on the replaced brick yet, or it will
be present but marked for heal if it has not already been healed. If it has
already been healed by the SHD, the read could be served from the new brick as
well, and there is no problem reading from there in that scenario.
By setting these two options, a read coming from a client will not try to heal
the file's data/metadata. Otherwise the client would try to heal the file (if
not already healed by the SHD) when the read lands on it, slowing the client
down.

>
> Turning off client side healing doesn't compromise data integrity and
> consistency. During the read request from client, pending xattr is
> evaluated for replica copies and read is only served from correct copy.
> During writes, IO will continue on both the replicas, SHD will take care of
> healing files.
> After replacing the brick, we strongly recommend you to consider upgrading
> your gluster to one of the maintained versions. We have many stability
> related fixes there, which can handle some critical issues and corner cases
> which you could hit during these kind of scenarios.
>
>
> This will be the first priority in our infrastructure after getting this
> cluster back to a fully functional replica 3. I will upgrade to 3.12.x and
> then to version 5 or 6.
>
Sounds good.

If you are planning to use the same name for the new brick and you get an
error like "Brick may be containing or be contained by an existing brick" even
after using the force option, try using a different name. That should work.
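
For illustration only (the volume name is a placeholder and the new brick path
is hypothetical; the old path is taken from the volume info quoted earlier in
this thread), that would be something along the lines of:

# gluster volume replace-brick <volname> \
    node2.san:/tank/gluster/gv0imagestore/brick1 \
    node2.san:/tank/gluster/gv0imagestore/brick1_new commit force
# gluster volume status <volname>
# gluster volume heal <volname> info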

Regards,
Karthik

>
> BR,
> Martin
>
> Regards,
> Karthik
>
>> I had some problems in past when healing was triggered, VM disks became
>> unresponsive because healing took most of I/O. My volume containing only
>> big files with VM disks.
>>
>> Thanks for suggestions.
>> BR,
>> Martin
>>
>> On 10 Apr 2019, at 12:38, Martin Toth  wrote:
>>
>> Thanks, this looks ok to me, I will reset brick because I don't have any
>> data anymore on failed node so I can use same path / brick name.
>>
>> Is reseting brick dangerous command? Should I be worried about some
>> possible failure that will impact remaining two nodes? I am running really
>> old 3.7.6 but stable version.
>>
>> Thanks,
>> BR!
>>
>> Martin
>>
>>
>> On 10 Apr 2019, at 12:20, Karthik Subrahmanya 
>> wrote:
>>
>> Hi Martin,
>>
>> After you add the new disks and creating raid array, you can run the
>> following command to replace the old brick with new one:
>>
>> - If you are going to use a different name for the new brick you can run
>> gluster volume replace-brick <volname> <old-brick> <new-brick> commit force
>>
>> - If you are planning to use the same name for the new brick as well, then
>> you can use
>> gluster volume reset-brick <volname> <old-brick> <new-brick> commit force
>> Here the old-brick's and new-brick's hostname & path should be the same.
>>
>> After replacing the brick, make sure the brick comes online using volume
>> status.
>> Heal should start automatically; you can check the heal status to see that
>> all the files get replicated to the newly added brick. If it does not start
>> automatically, you can start it manually by running gluster volume heal
>> <volname>.
>>
>> HTH,
>> Karthik
>>
>> On Wed, Apr 10, 2019 at 3:13 PM Martin Toth  wrote:
>>
>>> Hi all,
>>>
>>> I am running replica 3 gluster with 3 bricks. One of my servers failed -
>>> all disks are showing errors and raid is in fault state.
>>>
>>> Type: Replicate
>>> Volume ID: 41d5c283-3a74-4af8-a55d-924447bfa59a
>>> Status: Started
>>> Number of Bricks: 1 x 3 = 3
>>> Transport-type: tcp
>>> Bricks:
>>> Brick1: node1.san:/tank/gluster/

[Gluster-devel] [Gluster-infra] Upgrading build.gluster.org

2019-04-11 Thread Deepshikha Khandelwal
Hello,

I’ve planned an upgrade of build.gluster.org tomorrow morning in order to
install the latest security updates for the Jenkins plugins.
I’ll stop all the running jobs and re-trigger them once I'm done with the
upgrade.

The downtime window will be:
UTC: 0330 to 0400
IST: 0900 to 0930

The outage will last 30 minutes. Please bear with us as we continue to keep
build.gluster.org on the latest plugins and fixes.

Thanks,
Deepshikha
___
Gluster-devel mailing list
Gluster-devel@gluster.org
https://lists.gluster.org/mailman/listinfo/gluster-devel

Re: [Gluster-devel] test failure reports for last 15 days

2019-04-11 Thread FNU Raghavendra Manjunath
While analysing the logs of the runs where uss.t failed, I made the following
observations.

1) In the first iteration of uss.t, the time difference between the first
test of the .t file and the last test of the .t file is within just 1
minute.

But I think it is the cleanup sequence which is taking most of the time. One
reason I suspect this is that we don't see the brick process shutting-down
message in the logs (a rough way to check for leftover brick processes is
sketched below).


2) In the 2nd iteration of uss.t (run because the 1st iteration failed with a
timeout), it fails because something was not completed in the cleanup
sequence of the previous iteration.

The volume start command itself fails in the 2nd iteration, and because of
that the remaining tests also fail.
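
To confirm the first observation, a rough check (just a sketch, not part of the
test framework) is to look for leftover brick processes of the patchy volume
right after the cleanup of the first iteration, e.g.:

# wait up to 30s for the patchy brick processes to exit, then report leftovers
for i in $(seq 1 30); do
    pgrep -f 'glusterfsd.*patchy' >/dev/null || break
    sleep 1
done
pgrep -fl 'glusterfsd.*patchy' && echo "brick processes still alive after cleanup"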

This is from cmd_history.log

uster.org:/d/backends/2/patchy_snap_mnt
builder202.int.aws.gluster.org:/d/backends/3/patchy_snap_mnt
++
[2019-04-10 19:54:09.145086]  : volume create patchy
builder202.int.aws.gluster.org:/d/backends/1/patchy_snap_mnt
builder202.int.aws.gluster.org:/d/backends/2/patchy_snap_mnt
builder202.int.aws.gluster.org:/d/backends/3/patchy_snap_mnt : SUCCESS
[2019-04-10 19:54:09.156221]:++ G_LOG:./tests/basic/uss.t: TEST: 39
gluster --mode=script --wignore volume set patchy nfs.disable false
++
[2019-04-10 19:54:09.265138]  : volume set patchy nfs.disable false :
SUCCESS
[2019-04-10 19:54:09.274386]:++ G_LOG:./tests/basic/uss.t: TEST: 42
gluster --mode=script --wignore volume start patchy ++
[2019-04-10 19:54:09.565086]  : volume start patchy : FAILED : Commit
failed on localhost. Please check log file for details.
[2019-04-10 19:54:09.572753]:++ G_LOG:./tests/basic/uss.t: TEST: 44
_GFS --attribute-timeout=0 --entry-timeout=0 --volfile-server=
builder202.int.aws.gluster.org --volfile-id=patchy /mnt/glusterfs/0
++


And this is from the brick log, showing an issue with the export directory
not being present properly:

[2019-04-10 19:54:09.544476] I [MSGID: 100030] [glusterfsd.c:2857:main]
0-/build/install/sbin/glusterfsd: Started running
/build/install/sbin/glusterfsd version 7dev (args:
/build/install/sbin/glusterfsd -s buil
der202.int.aws.gluster.org --volfile-id
patchy.builder202.int.aws.gluster.org.d-backends-1-patchy_snap_mnt -p
/var/run/gluster/vols/patchy/builder202.int.aws.gluster.org-d-backends-1-patchy_snap_mnt.pid
-S /var/
run/gluster/7ac65190b72da80a.socket --brick-name
/d/backends/1/patchy_snap_mnt -l
/var/log/glusterfs/bricks/d-backends-1-patchy_snap_mnt.log --xlator-option
*-posix.glusterd-uuid=695c060d-74d3-440e-8cdb-327ec297
f2d2 --process-name brick --brick-port 49152 --xlator-option
patchy-server.listen-port=49152)
[2019-04-10 19:54:09.549394] I [socket.c:962:__socket_server_bind]
0-socket.glusterfsd: closing (AF_UNIX) reuse check socket 9
[2019-04-10 19:54:09.553190] I [MSGID: 101190]
[event-epoll.c:680:event_dispatch_epoll_worker] 0-epoll: Started thread
with index 1
[2019-04-10 19:54:09.553209] I [MSGID: 101190]
[event-epoll.c:680:event_dispatch_epoll_worker] 0-epoll: Started thread
with index 0
[2019-04-10 19:54:09.556932] I
[rpcsvc.c:2694:rpcsvc_set_outstanding_rpc_limit] 0-rpc-service: Configured
rpc.outstanding-rpc-limit with value 64
[2019-04-10 19:54:09.557859] E [MSGID: 138001] [index.c:2392:init]
0-patchy-index: Failed to find parent dir
(/d/backends/1/patchy_snap_mnt/.glusterfs) of index basepath
/d/backends/1/patchy_snap_mnt/.glusterfs/
indices. [No such file or directory]>
(.glusterfs is absent)
[2019-04-10 19:54:09.557884] E [MSGID: 101019] [xlator.c:629:xlator_init]
0-patchy-index: Initialization of volume 'patchy-index' failed, review your
volfile again
[2019-04-10 19:54:09.557892] E [MSGID: 101066]
[graph.c:409:glusterfs_graph_init] 0-patchy-index: initializing translator
failed
[2019-04-10 19:54:09.557900] E [MSGID: 101176]
[graph.c:772:glusterfs_graph_activate] 0-graph: init failed
[2019-04-10 19:54:09.564154] I [io-stats.c:4033:fini] 0-patchy-io-stats:
io-stats translator unloaded
[2019-04-10 19:54:09.564748] W [glusterfsd.c:1592:cleanup_and_exit]
(-->/build/install/sbin/glusterfsd(mgmt_getspec_cbk+0x806) [0x411f32]
-->/build/install/sbin/glusterfsd(glusterfs_process_volfp+0x272) [0x40b9b
9] -->/build/install/sbin/glusterfsd(cleanup_and_exit+0x88) [0x4093a5] )
0-: received signum (-1), shutting down


And this is from the cmd_history.log file of the 2nd iteration of uss.t, from
another Jenkins run:

[2019-04-10 15:35:51.927343]:++ G_LOG:./tests/basic/uss.t: TEST: 39
gluster --mode=script --wignore volume set patchy nfs.disable false
++
[2019-04-10 15:35:52.038072]  : volume set patchy nfs.disable false :
SUCCESS
[2019-04-10 15:35:52.057582]:++ G_LOG:./tests/basic/uss.t: TEST: 42
gluster --mode=script --wignore volume start patchy ++
[2019-04-10 15:35:52.104288]  : volume start patchy : FAILED : Failed to
find brick directory /d/backends/1/patchy_snap_mnt for volume patchy.
Reason : No such file or directory => (export 

[Gluster-devel] Be careful before closing fd in a default case

2019-04-11 Thread Mohit Agrawal
Hi,

  I want to highlight a recent bug
(https://bugzilla.redhat.com/show_bug.cgi?id=1699025) that was introduced
by the fix for a Coverity issue:
https://review.gluster.org/#/c/glusterfs/+/20720/
  As we know, all gluster processes keep the standard fds (0, 1, 2) open at
the time of daemonizing, so that the kernel does not assign these fd numbers
to any fd later opened by the gluster process. In the Coverity fix, we closed
the fd in changelog fini if its value was not equal to -1.
  Since GF_CALLOC initializes all structure members to 0, the initial fd
value was 0, and changelog_init had not opened htime_fd because changelog
was not active. So when changelog fini was called, it closed fd(0). After
changelog fini has closed fd(0), if any client (shd) tries to establish a
connection with the server (in the brick_mux environment), the server gets
fd(0) as a socket fd.
  I have observed that the socket event framework (socket_event_handler) does
not work correctly for fd(0) while volumes are stopped in a loop in a
brick_mux environment, and the bricks are not detached successfully.

  So we should always be careful when closing an fd: before closing an fd in
a default/cleanup path, we should check that it is not zero (i.e. not one of
the standard fds).
  I have fixed this in https://review.gluster.org/#/c/glusterfs/+/22549/ and
uploaded a .t test case as well.
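
  As a stand-alone illustration of the safe pattern (a minimal sketch, not the
actual changelog code): initialize the descriptor to -1 at allocation time and
refuse to close the standard fds in the cleanup path:

#include <stdlib.h>
#include <unistd.h>

/* hypothetical structure, standing in for the changelog private data */
struct changelog_like {
    int htime_fd; /* never opened when the feature is inactive */
};

static struct changelog_like *
init_inactive(void)
{
    /* calloc zero-fills, so without the explicit -1 below htime_fd
     * would silently be 0, i.e. stdin */
    struct changelog_like *priv = calloc(1, sizeof(*priv));
    if (!priv)
        return NULL;
    priv->htime_fd = -1; /* mark "not open" explicitly */
    return priv;
}

static void
fini(struct changelog_like *priv)
{
    /* guard against both the -1 sentinel and the standard fds (0, 1, 2),
     * so a stale zero value can never close stdin by mistake */
    if (priv->htime_fd > 2)
        close(priv->htime_fd);
    priv->htime_fd = -1;
    free(priv);
}

int
main(void)
{
    struct changelog_like *priv = init_inactive();
    if (priv)
        fini(priv); /* stdin (fd 0) stays open */
    return 0;
}

  The same idea applies to any fini/cleanup path that frees zero-initialized
state: treat 0 as "possibly a standard fd", not as "unset".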

Regards,
Mohit Agrawal
___
Gluster-devel mailing list
Gluster-devel@gluster.org
https://lists.gluster.org/mailman/listinfo/gluster-devel