Re: [Gluster-users] GlusterFS and Kafka

2017-05-24 Thread Christopher Schmidt
So this change to the Gluster volume plugin will make it into K8s 1.7 or
1.8. Unfortunately, that is too late for me.

Does anyone know how to disable performance translators by default?


Raghavendra Talur  schrieb am Mi., 24. Mai 2017, 19:30:

> On Wed, May 24, 2017 at 4:08 PM, Christopher Schmidt 
> wrote:
> >
> >
> > Vijay Bellur  schrieb am Mi., 24. Mai 2017 um 05:53
> Uhr:
> >>
> >> On Tue, May 23, 2017 at 1:39 AM, Christopher Schmidt <
> fakod...@gmail.com>
> >> wrote:
> >>>
> >>> OK, seems that this works now.
> >>>
> >>> A couple of questions:
> >>> - What do you think, are all these options necessary for Kafka?
> >>
> >>
> >> I am not entirely certain what subset of options will make it work, as I
> >> do not understand the nature of the failure with Kafka and the default
> >> gluster configuration. It certainly needs further analysis to identify
> >> the list of options necessary. Would it be possible for you to enable one
> >> option after the other and determine the configuration that works?
> >>
> >>
> >>>
> >>> - You wrote that there have to be some kind of application profiles. So
> >>> finding out which set of options works is currently a matter of testing
> >>> (and hope)? Or is there any experience with MongoDB / PostgreSQL /
> >>> Zookeeper etc.?
> >>
> >>
> >> Application profiles are work in progress. We have a few that are
> focused
> >> on use cases like VM storage, block storage etc. at the moment.
> >>
> >>>
> >>> - I am using Heketi and dynamic storage provisioning together with
> >>> Kubernetes. Can I set these volume options somehow by default or via
> >>> the volume plugin?
> >>
> >>
> >>
> >> Adding Raghavendra and Michael to help address this query.
> >
> >
> > For me it would be sufficient to disable some (or all) translators, for
> all
> > volumes that'll be created, somewhere here:
> > https://github.com/gluster/gluster-containers/tree/master/CentOS
> > This is the container used by the GlusterFS DaemonSet for Kubernetes.
>
> Work is in progress to give such option at volume plugin level. We
> currently have a patch[1] in review for Heketi that allows users to
> set Gluster options using heketi-cli instead of going into a Gluster
> pod. Once this is in, we can add options in storage-class of
> Kubernetes that pass down Gluster options for every volume created in
> that storage-class.
>
> [1] https://github.com/heketi/heketi/pull/751
>
> Thanks,
> Raghavendra Talur
>
> >
> >>
> >>
> >> -Vijay
> >>
> >>
> >>
> >>>
> >>>
> >>> Thanks for your help... really appreciated. Christopher
> >>>
> >>> Vijay Bellur  schrieb am Mo., 22. Mai 2017 um
> 16:41
> >>> Uhr:
> 
>  Looks like a problem with caching. Can you please try by disabling all
>  performance translators? The following configuration commands would
> disable
>  performance translators in the gluster client stack:
> 
>  gluster volume set  performance.quick-read off
>  gluster volume set  performance.io-cache off
>  gluster volume set  performance.write-behind off
>  gluster volume set  performance.stat-prefetch off
>  gluster volume set  performance.read-ahead off
>  gluster volume set  performance.readdir-ahead off
>  gluster volume set  performance.open-behind off
>  gluster volume set  performance.client-io-threads off
> 
>  Thanks,
>  Vijay
> 
> 
> 
>  On Mon, May 22, 2017 at 9:46 AM, Christopher Schmidt
>   wrote:
> >
> > Hi all,
> >
> > has anyone ever successfully deployed a Kafka (Cluster) on GlusterFS
> > volumes?
> >
> > In my case it's a Kafka Kubernetes StatefulSet and a Heketi GlusterFS.
> > Needless to say that I am getting a lot of filesystem related
> > exceptions like this one:
> >
> > Failed to read `log header` from file channel
> > `sun.nio.ch.FileChannelImpl@67afa54a`. Expected to read 12 bytes,
> but
> > reached end of file after reading 0 bytes. Started read from position
> > 123065680.
> >
> > I limited the amount of exceptions with the
> > log.flush.interval.messages=1 option, but not all...
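
For reference, the setting mentioned above is a broker-side option; in a plain
Kafka deployment it would sit in server.properties (for a StatefulSet, whatever
mechanism overrides broker config applies). A sketch, with the time-based flush
added as an optional companion:

  # server.properties -- flush every message to disk (safe but slow)
  log.flush.interval.messages=1
  # optionally also flush at least once per second
  log.flush.interval.ms=1000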
> >
> > best Christopher
> >
> >
> > ___
> > Gluster-users mailing list
> > Gluster-users@gluster.org
> > http://lists.gluster.org/mailman/listinfo/gluster-users
> 
> 
> >
>
___
Gluster-users mailing list
Gluster-users@gluster.org
http://lists.gluster.org/mailman/listinfo/gluster-users

Re: [Gluster-users] Rebalance + VM corruption - current status and request for feedback

2017-05-24 Thread Mahdi Adnan
Hi,


Still no RPMs in SIG repository.


--

Respectfully
Mahdi A. Mahdi


From: Niels de Vos 
Sent: Monday, May 22, 2017 3:26:02 PM
To: Atin Mukherjee
Cc: Mahdi Adnan; Vijay Bellur; gluster-user
Subject: Re: [Gluster-users] Rebalance + VM corruption - current status and 
request for feedback

On Sun, May 21, 2017 at 02:37:34AM +, Atin Mukherjee wrote:
> On Sun, 21 May 2017 at 02:17, Mahdi Adnan  wrote:
>
> > Thank you so much for your reply.
> >
> > Yes, I checked the 310 repository before and it wasn't there, see below;
> >
>
> I think the 3.10.2 bits are still in the 3.10 test repository in the Storage SIG
> and haven't been pushed to the release mirror yet. Niels can update further on this.

Nobody reported test results, so there was no urgency to push the
release to the CentOS mirrors. I've marked the packages for release now,
and the CentOS team will probably sign+mirror them during the day
tomorrow.

Niels


>
>
> > Installed Packages
> > Name: glusterfs-server
> > Arch: x86_64
> > Version : 3.10.1
> > Release : 1.el7
> > Size: 4.3 M
> > Repo: installed
> > From repo   : centos-gluster310
> > Summary : Distributed file-system server
> > URL : http://gluster.readthedocs.io/en/latest/
> > License : GPLv2 or LGPLv3+
> > Description :
> >
> > and no other packages available in the repo
> >
> > --
> >
> > Respectfully
> > *Mahdi A. Mahdi*
> >
> > --
> > *From:* Vijay Bellur 
> > *Sent:* Saturday, May 20, 2017 6:46:51 PM
> > *To:* Krutika Dhananjay
> > *Cc:* Mahdi Adnan; raghavendra talur; gluster-user
> > *Subject:* Re: [Gluster-users] Rebalance + VM corruption - current status
> > and request for feedback
> >
> >
> >
> > On Sat, May 20, 2017 at 6:38 AM, Krutika Dhananjay 
> > wrote:
> >
> >> Raghavendra Talur might know. Adding him to the thread.
> >>
> >> -Krutika
> >>
> >> On Sat, May 20, 2017 at 2:47 PM, Mahdi Adnan 
> >> wrote:
> >>
> >>> Good morning,
> >>>
> >>>
> >>> SIG repository does not have the latest glusterfs 3.10.2.
> >>>
> >>> Do you have any idea when it's going to be updated ?
> >>>
> >>> Is there any other recommended place to get the latest rpms ?
> >>>
> >>>
> >
> > RPMs are available in the centos-gluster310-test repository [1].
> >
> > -Vijay
> >
> > [1] http://lists.gluster.org/pipermail/maintainers/2017-May/002575.html
> >
> >
> > ___
> > Gluster-users mailing list
> > Gluster-users@gluster.org
> > http://lists.gluster.org/mailman/listinfo/gluster-users
>
> --
> - Atin (atinm)
___
Gluster-users mailing list
Gluster-users@gluster.org
http://lists.gluster.org/mailman/listinfo/gluster-users

Re: [Gluster-users] Distributed re-balance issue

2017-05-24 Thread Mahdi Adnan
Well, yes and no. When I start the rebalance and check its status, it just
tells me it completed the rebalance, but it really did not move any data and
the volume is not evenly distributed.

Right now brick6 is full, and brick5 is going to be full in a few hours or so.


--

Respectfully
Mahdi A. Mahdi


From: Nithya Balachandran 
Sent: Wednesday, May 24, 2017 8:16:53 PM
To: Mahdi Adnan
Cc: Mohammed Rafi K C; gluster-users@gluster.org
Subject: Re: [Gluster-users] Distributed re-balance issue



On 24 May 2017 at 22:45, Nithya Balachandran wrote:


On 24 May 2017 at 21:55, Mahdi Adnan wrote:

Hi,


Thank you for your response.

I have around 15 files, each a 2TB qcow2.

One brick reached 96%, so I removed it with "remove-brick" and waited until it
went down to around 40%, then stopped the removal process with "remove-brick
stop".

The issue is that brick1 drained its data to brick6 only, and when brick6
reached around 90% I did the same thing as before and it drained the data to
brick1 only.

Now brick6 has reached 99% and I have only a few gigabytes left, which will
fill up in the next half hour or so.

Attached are the logs for all 6 bricks.

Hi,

Just to clarify, did you run a rebalance (gluster volume rebalance  start) 
or did you only run remove-brick  ?

On re-reading your original email, I see you did run a rebalance. Did it 
complete? Also which bricks are full at the moment?


--

Respectfully
Mahdi A. Mahdi


From: Nithya Balachandran
Sent: Wednesday, May 24, 2017 6:45:10 PM
To: Mohammed Rafi K C
Cc: Mahdi Adnan; gluster-users@gluster.org
Subject: Re: [Gluster-users] Distributed re-balance issue



On 24 May 2017 at 20:02, Mohammed Rafi K C wrote:


On 05/23/2017 08:53 PM, Mahdi Adnan wrote:

Hi,


I have a distributed volume with 6 bricks, each have 5TB and it's hosting large 
qcow2 VM disks (I know it's reliable but it's not important data)

I started with 5 bricks and then added another one, started the re balance 
process, everything went well, but now im looking at the bricks free space and 
i found one brick is around 82% while others ranging from 20% to 60%.

The brick with highest utilization is hosting more qcow2 disk than other 
bricks, and whenever i start re balance it just complete in 0 seconds and 
without moving any data.

How much is your average file size in the cluster? And number of files 
(roughly) .



What will happen with the brick became full ?

Once brick contents goes beyond 90%, new files won't be created in the brick. 
But existing files can grow.



Can i move data manually from one brick to the other ?

Nop.It is not recommended, even though gluster will try to find the file, it 
may break.



Why re balance not distributing data evenly on all bricks ?

Rebalance works based on layout, so we need to see how layouts are distributed. 
If one of your bricks has higher capacity, it will have larger layout.




That is correct. As Rafi said, the layout matters here. Can you please send 
across all the rebalance logs from all the 6 nodes?


Nodes runing CentOS 7.3

Gluster 3.8.11


Volume info;

Volume Name: ctvvols
Type: Distribute
Volume ID: 1ecea912-510f-4079-b437-7398e9caa0eb
Status: Started
Snapshot Count: 0
Number of Bricks: 6
Transport-type: tcp
Bricks:
Brick1: ctv01:/vols/ctvvols
Brick2: ctv02:/vols/ctvvols
Brick3: ctv03:/vols/ctvvols
Brick4: ctv04:/vols/ctvvols
Brick5: ctv05:/vols/ctvvols
Brick6: ctv06:/vols/ctvvols
Options Reconfigured:
nfs.disable: on
performance.readdir-ahead: on
transport.address-family: inet
performance.quick-read: off
performance.read-ahead: off
performance.io-cache: off
performance.stat-prefetch: off
performance.low-prio-threads: 32
network.remote-dio: enable
cluster.eager-lock: enable
cluster.quorum-type: none
cluster.server-quorum-type: server
cluster.data-self-heal-algorithm: full
cluster.locking-scheme: granular
cluster.shd-max-threads: 8
cluster.shd-wait-qlength: 1
features.shard: off
user.cifs: off
network.ping-timeout: 10
storage.owner-uid: 36
storage.owner-gid: 36



re balance log:


[2017-05-23 14:45:12.637671] I [dht-rebalance.c:2866:gf_defrag_process_dir] 
0-ctvvols-dht: Migration operation on dir 
/31e0b341-4eeb-4b71-b280-840eba7d6940/images/690c728d-a83e-4c79-ac7d-1f3f17edf7f0
 took 0.00 secs
[2017-05-23 14:45:12.640043] I [MSGID: 109081] [dht-common.c:4202:dht_setxattr] 
0-ctvvols-dht: fixing the layout of 
/31e0b341-4eeb-4b71-b280-840eba7d6940/images/091402ba-dc90-4206-848a-d73e85a1cc35
[2017-05-23 14:45:12.641516] I [dht-rebalance.c:2652:gf_defrag_process_dir] 
0-ctvvols-dht: migrate data called on 
/31e0b341-4eeb-4b71-b280-840eba7d6940/images/091402ba-dc90-4206-848a-d73e85a1cc35
[2017-05-23 14:45:12.642421] I 

[Gluster-users] Fwd: Re: GlusterFS removal from Openstack Cinder

2017-05-24 Thread Joe Julian

Forwarded for posterity and follow-up.


 Forwarded Message 
Subject:Re: GlusterFS removal from Openstack Cinder
Date:   Fri, 05 May 2017 21:07:27 +
From:   Amye Scavarda 
To: 	Eric Harney , Joe Julian , 
Vijay Bellur 

CC: Amye Scavarda 



Eric,
I'm sorry to hear this.
I'm reaching out internally (within the Gluster CI team and the CentOS CI,
which supports Gluster) to get an idea of the level of effort we'll need to
provide to resolve this.
It'll take me a few days to get this, but it is on my radar. In the
meantime, is there somewhere I should be looking for the requirements to
meet this gateway?


Thanks!
-- amye

On Fri, May 5, 2017 at 16:09, Joe Julian wrote:


   On 05/05/2017 12:54 PM, Eric Harney wrote:
>> On 04/28/2017 12:41 PM, Joe Julian wrote:
>>> I learned, today, that GlusterFS was deprecated and removed from
>>> Cinder as one of our #gluster (freenode) users was attempting to
>>> upgrade openstack. I could find no rational nor discussion of that
>>> removal. Could you please educate me about that decision?
>>>
>
> Hi Joe,
>
> I can fill in on the rationale here.
>
> Keeping a driver in the Cinder tree requires running a CI platform to
> test that driver and report results against all patchsets
   submitted to
> Cinder.  This is a fairly large burden, which we could not meet
   once the
> Gluster Cinder driver was no longer an active development target
   at Red Hat.
>
> This was communicated via a warning issued by the driver for anyone
> running the OpenStack Newton code, and via the Cinder release
   notes for
> the Ocata release.  (I can see in retrospect that this was
   probably not
> communicated widely enough.)
>
> I apologize for not reaching out to the Gluster community about this.
>
> If someone from the Gluster world is interested in bringing this
   driver
> back, I can help coordinate there.  But it will require someone
   stepping
> in in a big way to maintain it.
>
> Thanks,
> Eric

   Ah, Red Hat's statement that the acquisition of InkTank was not an
   abandonment of Gluster seems rather disingenuous now. I'm disappointed.

   Would you please start a thread on the gluster-users and gluster-devel
   mailing lists and see if there's anyone willing to take ownership of
   this? I'm certainly willing to participate as well, but my $dayjob has
   gone more kubernetes than openstack, so I have only my limited free time
   that I can donate.

--
Amye Scavarda | a...@redhat.com  | Gluster 
Community Lead
___
Gluster-users mailing list
Gluster-users@gluster.org
http://lists.gluster.org/mailman/listinfo/gluster-users

Re: [Gluster-users] Deleting large files on sharded volume hangs and doesn't delete shards

2017-05-24 Thread Walter Deignan
Sorry for the slow response. Your hunch was right. It seems to be a
problem between tiering and sharding.

I untiered the volume and the symptom vanished. I then deleted and
recreated the volume entirely (without tiering) in order to clean up the
orphaned shards.
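
For reference, detaching a hot tier is done along these lines (shown for the
gv0 volume from this thread; the exact sub-commands vary by release, and older
builds used "detach-tier" instead):

gluster volume tier gv0 detach start
gluster volume tier gv0 detach status
gluster volume tier gv0 detach commit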

-Walter Deignan
-Uline IT, Systems Architect



From:   Nithya Balachandran 
To: Walter Deignan 
Cc: gluster-users 
Date:   05/17/2017 10:17 PM
Subject:Re: [Gluster-users] Deleting large files on sharded volume 
hangs and doesn't delete shards



I don't think we have tested shards with a tiered volume.  Do you see such 
issues on non-tiered sharded volumes?

Regards,
Nithya

On 18 May 2017 at 00:51, Walter Deignan  wrote:
I have a reproducible issue where attempting to delete a file large enough 
to have been sharded hangs. I can't kill the 'rm' command and eventually 
am forced to reboot the client (which in this case is also part of the 
gluster cluster). After the node finishes rebooting I can see that while 
the file front-end is gone, the back-end shards are still present. 

Is this a known issue? Any way to get around it? 

-- 

[root@dc-vihi19 ~]# gluster volume info gv0 

Volume Name: gv0 
Type: Tier 
Volume ID: d42e366f-381d-4787-bcc5-cb6770cb7d58 
Status: Started 
Snapshot Count: 0 
Number of Bricks: 24 
Transport-type: tcp 
Hot Tier : 
Hot Tier Type : Distributed-Replicate 
Number of Bricks: 4 x 2 = 8 
Brick1: dc-vihi71:/gluster/bricks/brick4/data 
Brick2: dc-vihi19:/gluster/bricks/brick4/data 
Brick3: dc-vihi70:/gluster/bricks/brick4/data 
Brick4: dc-vihi19:/gluster/bricks/brick3/data 
Brick5: dc-vihi71:/gluster/bricks/brick3/data 
Brick6: dc-vihi19:/gluster/bricks/brick2/data 
Brick7: dc-vihi70:/gluster/bricks/brick3/data 
Brick8: dc-vihi19:/gluster/bricks/brick1/data 
Cold Tier: 
Cold Tier Type : Distributed-Replicate 
Number of Bricks: 8 x 2 = 16 
Brick9: dc-vihi19:/gluster/bricks/brick5/data 
Brick10: dc-vihi70:/gluster/bricks/brick1/data 
Brick11: dc-vihi19:/gluster/bricks/brick6/data 
Brick12: dc-vihi71:/gluster/bricks/brick1/data 
Brick13: dc-vihi19:/gluster/bricks/brick7/data 
Brick14: dc-vihi70:/gluster/bricks/brick2/data 
Brick15: dc-vihi19:/gluster/bricks/brick8/data 
Brick16: dc-vihi71:/gluster/bricks/brick2/data 
Brick17: dc-vihi19:/gluster/bricks/brick9/data 
Brick18: dc-vihi70:/gluster/bricks/brick5/data 
Brick19: dc-vihi19:/gluster/bricks/brick10/data 
Brick20: dc-vihi71:/gluster/bricks/brick5/data 
Brick21: dc-vihi19:/gluster/bricks/brick11/data 
Brick22: dc-vihi70:/gluster/bricks/brick6/data 
Brick23: dc-vihi19:/gluster/bricks/brick12/data 
Brick24: dc-vihi71:/gluster/bricks/brick6/data 
Options Reconfigured: 
nfs.disable: on 
transport.address-family: inet 
features.ctr-enabled: on 
cluster.tier-mode: cache 
features.shard: on 
features.shard-block-size: 512MB 
network.ping-timeout: 5 
cluster.server-quorum-ratio: 51% 

[root@dc-vihi19 temp]# ls -lh 
total 26G 
-rw-rw-rw-. 1 root root 31G May 17 10:38 win7.qcow2 
[root@dc-vihi19 temp]# getfattr -n glusterfs.gfid.string win7.qcow2 
# file: win7.qcow2 
glusterfs.gfid.string="7f4a0fea-72c0-41e4-97a5-6297be0a9142" 

[root@dc-vihi19 temp]# rm win7.qcow2 
rm: remove regular file ‘win7.qcow2’? y

*Process hangs and can't be killed. A reboot later...* 

login as: root 
Authenticating with public key "rsa-key-20170510" 
Last login: Wed May 17 14:04:29 2017 from ** 
[root@dc-vihi19 ~]# find /gluster/bricks -name 
"7f4a0fea-72c0-41e4-97a5-6297be0a9142*" 
/gluster/bricks/brick1/data/.shard/7f4a0fea-72c0-41e4-97a5-6297be0a9142.23 

/gluster/bricks/brick1/data/.shard/7f4a0fea-72c0-41e4-97a5-6297be0a9142.35 

/gluster/bricks/brick2/data/.shard/7f4a0fea-72c0-41e4-97a5-6297be0a9142.52 

/gluster/bricks/brick2/data/.shard/7f4a0fea-72c0-41e4-97a5-6297be0a9142.29 

/gluster/bricks/brick2/data/.shard/7f4a0fea-72c0-41e4-97a5-6297be0a9142.22 

/gluster/bricks/brick2/data/.shard/7f4a0fea-72c0-41e4-97a5-6297be0a9142.24 


and so on... 


-Walter Deignan
-Uline IT, Systems Architect
___
Gluster-users mailing list
Gluster-users@gluster.org
http://lists.gluster.org/mailman/listinfo/gluster-users


___
Gluster-users mailing list
Gluster-users@gluster.org
http://lists.gluster.org/mailman/listinfo/gluster-users

Re: [Gluster-users] GlusterFS and Kafka

2017-05-24 Thread Raghavendra Talur
On Wed, May 24, 2017 at 4:08 PM, Christopher Schmidt  wrote:
>
>
> Vijay Bellur  schrieb am Mi., 24. Mai 2017 um 05:53 Uhr:
>>
>> On Tue, May 23, 2017 at 1:39 AM, Christopher Schmidt 
>> wrote:
>>>
>>> OK, seems that this works now.
>>>
>>> A couple of questions:
>>> - What do you think, are all these options necessary for Kafka?
>>
>>
>> I am not entirely certain what subset of options will make it work, as I do
>> not understand the nature of the failure with Kafka and the default gluster
>> configuration. It certainly needs further analysis to identify the list of
>> options necessary. Would it be possible for you to enable one option after
>> the other and determine the configuration that works?
>>
>>
>>>
>>> - You wrote that there have to be some kind of application profiles. So
>>> finding out which set of options works is currently a matter of testing
>>> (and hope)? Or is there any experience with MongoDB / PostgreSQL /
>>> Zookeeper etc.?
>>
>>
>> Application profiles are work in progress. We have a few that are focused
>> on use cases like VM storage, block storage etc. at the moment.
>>
>>>
>>> - I am using Heketi and dynamic storage provisioning together with
>>> Kubernetes. Can I set these volume options somehow by default or via the
>>> volume plugin?
>>
>>
>>
>> Adding Raghavendra and Michael to help address this query.
>
>
> For me it would be sufficient to disable some (or all) translators, for all
> volumes that'll be created, somewhere here:
> https://github.com/gluster/gluster-containers/tree/master/CentOS
> This is the container used by the GlusterFS DaemonSet for Kubernetes.

Work is in progress to give such option at volume plugin level. We
currently have a patch[1] in review for Heketi that allows users to
set Gluster options using heketi-cli instead of going into a Gluster
pod. Once this is in, we can add options in storage-class of
Kubernetes that pass down Gluster options for every volume created in
that storage-class.

[1] https://github.com/heketi/heketi/pull/751
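
As an illustration only (the Heketi change above was still under review at the
time), a storage-class that passes Gluster options down per volume might look
roughly like the sketch below. The "volumeoptions" parameter name and the
Heketi endpoint are assumptions, not a confirmed API:

# hypothetical sketch -- parameter names and the Heketi URL are assumptions
cat <<'EOF' | kubectl create -f -
apiVersion: storage.k8s.io/v1
kind: StorageClass
metadata:
  name: glusterfs-noperf
provisioner: kubernetes.io/glusterfs
parameters:
  resturl: "http://heketi.example.com:8080"
  volumeoptions: "performance.quick-read off, performance.io-cache off"
EOF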

Thanks,
Raghavendra Talur

>
>>
>>
>> -Vijay
>>
>>
>>
>>>
>>>
>>> Thanks for your help... really appreciated. Christopher
>>>
>>> Vijay Bellur  schrieb am Mo., 22. Mai 2017 um 16:41
>>> Uhr:

 Looks like a problem with caching. Can you please try by disabling all
 performance translators? The following configuration commands would disable
 performance translators in the gluster client stack:

 gluster volume set  performance.quick-read off
 gluster volume set  performance.io-cache off
 gluster volume set  performance.write-behind off
 gluster volume set  performance.stat-prefetch off
 gluster volume set  performance.read-ahead off
 gluster volume set  performance.readdir-ahead off
 gluster volume set  performance.open-behind off
 gluster volume set  performance.client-io-threads off
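
A compact way to apply that whole set in one pass (the volume name is a
placeholder):

VOLNAME=myvol   # placeholder volume name
for xl in quick-read io-cache write-behind stat-prefetch read-ahead \
          readdir-ahead open-behind client-io-threads; do
    gluster volume set "$VOLNAME" performance.$xl off
done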

 Thanks,
 Vijay



 On Mon, May 22, 2017 at 9:46 AM, Christopher Schmidt
  wrote:
>
> Hi all,
>
> has anyone ever successfully deployed a Kafka (Cluster) on GlusterFS
> volumes?
>
> In my case it's a Kafka Kubernetes StatefulSet and a Heketi GlusterFS.
> Needless to say that I am getting a lot of filesystem related
> exceptions like this one:
>
> Failed to read `log header` from file channel
> `sun.nio.ch.FileChannelImpl@67afa54a`. Expected to read 12 bytes, but
> reached end of file after reading 0 bytes. Started read from position
> 123065680.
>
> I limited the amount of exceptions with the
> log.flush.interval.messages=1 option, but not all...
>
> best Christopher
>
>
> ___
> Gluster-users mailing list
> Gluster-users@gluster.org
> http://lists.gluster.org/mailman/listinfo/gluster-users


>
___
Gluster-users mailing list
Gluster-users@gluster.org
http://lists.gluster.org/mailman/listinfo/gluster-users


Re: [Gluster-users] Distributed re-balance issue

2017-05-24 Thread Nithya Balachandran
On 24 May 2017 at 22:45, Nithya Balachandran  wrote:

>
>
> On 24 May 2017 at 21:55, Mahdi Adnan  wrote:
>
>> Hi,
>>
>>
>> Thank you for your response.
>>
>> I have around 15 files, each is 2TB qcow.
>>
>> One brick reached 96% so i removed it with "brick remove" and waited
>> until it goes for around 40% and stopped the removal process with brick
>> remove stop.
>>
>> The issue is brick1 drain it's data to brick6 only, and when brick6
>> reached around 90% i did the same thing as before and it drained the data
>> to brick1 only.
>>
>> now brick6 reached 99% and i have only a few gigabytes left which will
>> fill in the next half hour or so.
>>
>> attached are the logs for all 6 bricks.
>>
>> Hi,
>
> Just to clarify, did you run a rebalance (gluster volume rebalance 
> start) or did you only run remove-brick  ?
>
> On re-reading your original email, I see you did run a rebalance. Did it
complete? Also which bricks are full at the moment?


>
> --
>>
>> Respectfully
>> *Mahdi A. Mahdi*
>>
>> --
>> *From:* Nithya Balachandran 
>> *Sent:* Wednesday, May 24, 2017 6:45:10 PM
>> *To:* Mohammed Rafi K C
>> *Cc:* Mahdi Adnan; gluster-users@gluster.org
>> *Subject:* Re: [Gluster-users] Distributed re-balance issue
>>
>>
>>
>> On 24 May 2017 at 20:02, Mohammed Rafi K C  wrote:
>>
>>>
>>>
>>> On 05/23/2017 08:53 PM, Mahdi Adnan wrote:
>>>
>>> Hi,
>>>
>>>
>>> I have a distributed volume with 6 bricks, each have 5TB and it's
>>> hosting large qcow2 VM disks (I know it's reliable but it's not important
>>> data)
>>>
>>> I started with 5 bricks and then added another one, started the re
>>> balance process, everything went well, but now im looking at the bricks
>>> free space and i found one brick is around 82% while others ranging from
>>> 20% to 60%.
>>>
>>> The brick with highest utilization is hosting more qcow2 disk than other
>>> bricks, and whenever i start re balance it just complete in 0 seconds and
>>> without moving any data.
>>>
>>>
>>> How much is your average file size in the cluster? And number of files
>>> (roughly) .
>>>
>>>
>>> What will happen with the brick became full ?
>>>
>>> Once brick contents goes beyond 90%, new files won't be created in the
>>> brick. But existing files can grow.
>>>
>>>
>>> Can i move data manually from one brick to the other ?
>>>
>>>
>>> Nop.It is not recommended, even though gluster will try to find the
>>> file, it may break.
>>>
>>>
>>> Why re balance not distributing data evenly on all bricks ?
>>>
>>>
>>> Rebalance works based on layout, so we need to see how layouts are
>>> distributed. If one of your bricks has higher capacity, it will have larger
>>> layout.
>>>
>>>
>>
>>
>>> That is correct. As Rafi said, the layout matters here. Can you please
>>> send across all the rebalance logs from all the 6 nodes?
>>>
>>>
>> Nodes runing CentOS 7.3
>>>
>>> Gluster 3.8.11
>>>
>>>
>>> Volume info;
>>> Volume Name: ctvvols
>>> Type: Distribute
>>> Volume ID: 1ecea912-510f-4079-b437-7398e9caa0eb
>>> Status: Started
>>> Snapshot Count: 0
>>> Number of Bricks: 6
>>> Transport-type: tcp
>>> Bricks:
>>> Brick1: ctv01:/vols/ctvvols
>>> Brick2: ctv02:/vols/ctvvols
>>> Brick3: ctv03:/vols/ctvvols
>>> Brick4: ctv04:/vols/ctvvols
>>> Brick5: ctv05:/vols/ctvvols
>>> Brick6: ctv06:/vols/ctvvols
>>> Options Reconfigured:
>>> nfs.disable: on
>>> performance.readdir-ahead: on
>>> transport.address-family: inet
>>> performance.quick-read: off
>>> performance.read-ahead: off
>>> performance.io-cache: off
>>> performance.stat-prefetch: off
>>> performance.low-prio-threads: 32
>>> network.remote-dio: enable
>>> cluster.eager-lock: enable
>>> cluster.quorum-type: none
>>> cluster.server-quorum-type: server
>>> cluster.data-self-heal-algorithm: full
>>> cluster.locking-scheme: granular
>>> cluster.shd-max-threads: 8
>>> cluster.shd-wait-qlength: 1
>>> features.shard: off
>>> user.cifs: off
>>> network.ping-timeout: 10
>>> storage.owner-uid: 36
>>> storage.owner-gid: 36
>>>
>>>
>>> re balance log:
>>>
>>>
>>> [2017-05-23 14:45:12.637671] I [dht-rebalance.c:2866:gf_defrag_process_dir]
>>> 0-ctvvols-dht: Migration operation on dir /31e0b341-4eeb-4b71-b280-840eb
>>> a7d6940/images/690c728d-a83e-4c79-ac7d-1f3f17edf7f0 took 0.00 secs
>>> [2017-05-23 14:45:12.640043] I [MSGID: 109081]
>>> [dht-common.c:4202:dht_setxattr] 0-ctvvols-dht: fixing the layout of
>>> /31e0b341-4eeb-4b71-b280-840eba7d6940/images/091402ba-dc90-4
>>> 206-848a-d73e85a1cc35
>>> [2017-05-23 14:45:12.641516] I [dht-rebalance.c:2652:gf_defrag_process_dir]
>>> 0-ctvvols-dht: migrate data called on /31e0b341-4eeb-4b71-b280-840eb
>>> a7d6940/images/091402ba-dc90-4206-848a-d73e85a1cc35
>>> [2017-05-23 14:45:12.642421] I [dht-rebalance.c:2866:gf_defrag_process_dir]
>>> 0-ctvvols-dht: Migration operation on dir /31e0b341-4eeb-4b71-b280-840eb
>>> a7d6940/images/091402ba-dc90-4206-848a-d73e85a1cc35 took 0.00 secs
>>> 

Re: [Gluster-users] Distributed re-balance issue

2017-05-24 Thread Nithya Balachandran
On 24 May 2017 at 21:55, Mahdi Adnan  wrote:

> Hi,
>
>
> Thank you for your response.
>
> I have around 15 files, each is 2TB qcow.
>
> One brick reached 96% so i removed it with "brick remove" and waited until
> it goes for around 40% and stopped the removal process with brick remove
> stop.
>
> The issue is brick1 drain it's data to brick6 only, and when brick6
> reached around 90% i did the same thing as before and it drained the data
> to brick1 only.
>
> now brick6 reached 99% and i have only a few gigabytes left which will
> fill in the next half hour or so.
>
> attached are the logs for all 6 bricks.
>
> Hi,

Just to clarify, did you run a rebalance (gluster volume rebalance 
start) or did you only run remove-brick  ?
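
For clarity, the two operations being distinguished here look like this
(volume and brick names are placeholders):

# full rebalance of the volume
gluster volume rebalance VOLNAME start
gluster volume rebalance VOLNAME status

# draining a single brick instead
gluster volume remove-brick VOLNAME server1:/bricks/brick1 start
gluster volume remove-brick VOLNAME server1:/bricks/brick1 status
gluster volume remove-brick VOLNAME server1:/bricks/brick1 stop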


-- 
>
> Respectfully
> *Mahdi A. Mahdi*
>
> --
> *From:* Nithya Balachandran 
> *Sent:* Wednesday, May 24, 2017 6:45:10 PM
> *To:* Mohammed Rafi K C
> *Cc:* Mahdi Adnan; gluster-users@gluster.org
> *Subject:* Re: [Gluster-users] Distributed re-balance issue
>
>
>
> On 24 May 2017 at 20:02, Mohammed Rafi K C  wrote:
>
>>
>>
>> On 05/23/2017 08:53 PM, Mahdi Adnan wrote:
>>
>> Hi,
>>
>>
>> I have a distributed volume with 6 bricks, each have 5TB and it's hosting
>> large qcow2 VM disks (I know it's reliable but it's not important data)
>>
>> I started with 5 bricks and then added another one, started the re
>> balance process, everything went well, but now im looking at the bricks
>> free space and i found one brick is around 82% while others ranging from
>> 20% to 60%.
>>
>> The brick with highest utilization is hosting more qcow2 disk than other
>> bricks, and whenever i start re balance it just complete in 0 seconds and
>> without moving any data.
>>
>>
>> How much is your average file size in the cluster? And number of files
>> (roughly) .
>>
>>
>> What will happen with the brick became full ?
>>
>> Once brick contents goes beyond 90%, new files won't be created in the
>> brick. But existing files can grow.
>>
>>
>> Can i move data manually from one brick to the other ?
>>
>>
>> Nop.It is not recommended, even though gluster will try to find the file,
>> it may break.
>>
>>
>> Why re balance not distributing data evenly on all bricks ?
>>
>>
>> Rebalance works based on layout, so we need to see how layouts are
>> distributed. If one of your bricks has higher capacity, it will have larger
>> layout.
>>
>>
>
>
>> That is correct. As Rafi said, the layout matters here. Can you please
>> send across all the rebalance logs from all the 6 nodes?
>>
>>
> Nodes runing CentOS 7.3
>>
>> Gluster 3.8.11
>>
>>
>> Volume info;
>> Volume Name: ctvvols
>> Type: Distribute
>> Volume ID: 1ecea912-510f-4079-b437-7398e9caa0eb
>> Status: Started
>> Snapshot Count: 0
>> Number of Bricks: 6
>> Transport-type: tcp
>> Bricks:
>> Brick1: ctv01:/vols/ctvvols
>> Brick2: ctv02:/vols/ctvvols
>> Brick3: ctv03:/vols/ctvvols
>> Brick4: ctv04:/vols/ctvvols
>> Brick5: ctv05:/vols/ctvvols
>> Brick6: ctv06:/vols/ctvvols
>> Options Reconfigured:
>> nfs.disable: on
>> performance.readdir-ahead: on
>> transport.address-family: inet
>> performance.quick-read: off
>> performance.read-ahead: off
>> performance.io-cache: off
>> performance.stat-prefetch: off
>> performance.low-prio-threads: 32
>> network.remote-dio: enable
>> cluster.eager-lock: enable
>> cluster.quorum-type: none
>> cluster.server-quorum-type: server
>> cluster.data-self-heal-algorithm: full
>> cluster.locking-scheme: granular
>> cluster.shd-max-threads: 8
>> cluster.shd-wait-qlength: 1
>> features.shard: off
>> user.cifs: off
>> network.ping-timeout: 10
>> storage.owner-uid: 36
>> storage.owner-gid: 36
>>
>>
>> re balance log:
>>
>>
>> [2017-05-23 14:45:12.637671] I [dht-rebalance.c:2866:gf_defrag_process_dir]
>> 0-ctvvols-dht: Migration operation on dir /31e0b341-4eeb-4b71-b280-840eb
>> a7d6940/images/690c728d-a83e-4c79-ac7d-1f3f17edf7f0 took 0.00 secs
>> [2017-05-23 14:45:12.640043] I [MSGID: 109081]
>> [dht-common.c:4202:dht_setxattr] 0-ctvvols-dht: fixing the layout of
>> /31e0b341-4eeb-4b71-b280-840eba7d6940/images/091402ba-dc90-
>> 4206-848a-d73e85a1cc35
>> [2017-05-23 14:45:12.641516] I [dht-rebalance.c:2652:gf_defrag_process_dir]
>> 0-ctvvols-dht: migrate data called on /31e0b341-4eeb-4b71-b280-840eb
>> a7d6940/images/091402ba-dc90-4206-848a-d73e85a1cc35
>> [2017-05-23 14:45:12.642421] I [dht-rebalance.c:2866:gf_defrag_process_dir]
>> 0-ctvvols-dht: Migration operation on dir /31e0b341-4eeb-4b71-b280-840eb
>> a7d6940/images/091402ba-dc90-4206-848a-d73e85a1cc35 took 0.00 secs
>> [2017-05-23 14:45:12.645610] I [MSGID: 109081]
>> [dht-common.c:4202:dht_setxattr] 0-ctvvols-dht: fixing the layout of
>> /31e0b341-4eeb-4b71-b280-840eba7d6940/images/be1e2276-d38f-
>> 4d90-abf5-de757dd04078
>> [2017-05-23 14:45:12.647034] I [dht-rebalance.c:2652:gf_defrag_process_dir]
>> 0-ctvvols-dht: migrate data called on /31e0b341-4eeb-4b71-b280-840eb
>> 

Re: [Gluster-users] [Gluster-devel] Community Meeting 2017-05-24

2017-05-24 Thread Amar Tumballi
Sorry about missing it. I didn't add this event to my calendar, and hence
missed it. I will be there for the next IRC meeting.

-Amar

On Wed, May 24, 2017 at 9:28 PM, Kaushal M  wrote:

> Very poor turnout today, just 3 attendees including me.
>
> But, we actually did have a discussion and came out with a couple of AIs.
>
> The logs and minutes are available at the links below.
>
> Archive: https://github.com/gluster/glusterfs/wiki/Community-
> Meeting-2017-05-24
> Minutes: https://meetbot.fedoraproject.org/gluster-meeting/2017-05-
> 24/gluster_community_meeting_2017-05-24.2017-05-24-15.03.html
> Minutes (text):
> https://meetbot.fedoraproject.org/gluster-meeting/2017-05-
> 24/gluster_community_meeting_2017-05-24.2017-05-24-15.03.txt
> Log: https://meetbot.fedoraproject.org/gluster-meeting/2017-05-
> 24/gluster_community_meeting_2017-05-24.2017-05-24-15.03.log.html
>
>
> The next meeting will be held on 7-June. The meeting pad is at
> https://bit.ly/gluster-community-meetings as always for your updates
> and topics.
>
> ~kaushal
>
> Meeting summary
> ---
> * Roll Call  (kshlm, 15:08:19)
>
> * Openstack Cinder glusterfs support has been removed  (kshlm, 15:17:35)
>   * LINK:
> https://wiki.openstack.org/wiki/ThirdPartySystems/RedHat_GlusterFS_CI
> shows BharatK and deepakcs  (JoeJulian, 15:30:46)
>   * LINK:
> https://github.com/openstack/cinder/commit/
> 16e93ccd4f3a6d62ed9d277f03b64bccc63ae060
> (kshlm, 15:38:52)
>   * ACTION: ndevos will check with Eric Harney about the Openstack
> Gluster efforts  (kshlm, 15:39:49)
>   * ACTION: JoeJulian will share his conversations with Eric Harney
> (kshlm, 15:40:24)
>
> Meeting ended at 15:42:15 UTC.
>
>
>
>
> Action Items
> 
> * ndevos will check with Eric Harney about the Openstack Gluster efforts
> * JoeJulian will share his conversations with Eric Harney
>
>
>
>
> Action Items, by person
> ---
> * JoeJulian
>   * JoeJulian will share his conversations with Eric Harney
> * ndevos
>   * ndevos will check with Eric Harney about the Openstack Gluster
> efforts
> * **UNASSIGNED**
>   * (none)
>
>
>
>
> People Present (lines said)
> ---
> * kshlm (37)
> * ndevos (31)
> * JoeJulian (30)
> * zodbot (6)
> ___
> Gluster-devel mailing list
> gluster-de...@gluster.org
> http://lists.gluster.org/mailman/listinfo/gluster-devel
>



-- 
Amar Tumballi (amarts)
___
Gluster-users mailing list
Gluster-users@gluster.org
http://lists.gluster.org/mailman/listinfo/gluster-users

[Gluster-users] Community Meeting 2017-05-24

2017-05-24 Thread Kaushal M
Very poor turnout today, just 3 attendees including me.

But, we actually did have a discussion and came out with a couple of AIs.

The logs and minutes are available at the links below.

Archive: https://github.com/gluster/glusterfs/wiki/Community-Meeting-2017-05-24
Minutes: 
https://meetbot.fedoraproject.org/gluster-meeting/2017-05-24/gluster_community_meeting_2017-05-24.2017-05-24-15.03.html
Minutes (text):
https://meetbot.fedoraproject.org/gluster-meeting/2017-05-24/gluster_community_meeting_2017-05-24.2017-05-24-15.03.txt
Log: 
https://meetbot.fedoraproject.org/gluster-meeting/2017-05-24/gluster_community_meeting_2017-05-24.2017-05-24-15.03.log.html


The next meeting will be held on 7-June. The meeting pad is at
https://bit.ly/gluster-community-meetings as always for your updates
and topics.

~kaushal

Meeting summary
---
* Roll Call  (kshlm, 15:08:19)

* Openstack Cinder glusterfs support has been removed  (kshlm, 15:17:35)
  * LINK:
https://wiki.openstack.org/wiki/ThirdPartySystems/RedHat_GlusterFS_CI
shows BharatK and deepakcs  (JoeJulian, 15:30:46)
  * LINK:

https://github.com/openstack/cinder/commit/16e93ccd4f3a6d62ed9d277f03b64bccc63ae060
(kshlm, 15:38:52)
  * ACTION: ndevos will check with Eric Harney about the Openstack
Gluster efforts  (kshlm, 15:39:49)
  * ACTION: JoeJulian will share his conversations with Eric Harney
(kshlm, 15:40:24)

Meeting ended at 15:42:15 UTC.




Action Items

* ndevos will check with Eric Harney about the Openstack Gluster efforts
* JoeJulian will share his conversations with Eric Harney




Action Items, by person
---
* JoeJulian
  * JoeJulian will share his conversations with Eric Harney
* ndevos
  * ndevos will check with Eric Harney about the Openstack Gluster
efforts
* **UNASSIGNED**
  * (none)




People Present (lines said)
---
* kshlm (37)
* ndevos (31)
* JoeJulian (30)
* zodbot (6)
___
Gluster-users mailing list
Gluster-users@gluster.org
http://lists.gluster.org/mailman/listinfo/gluster-users


Re: [Gluster-users] Distributed re-balance issue

2017-05-24 Thread Nithya Balachandran
On 24 May 2017 at 20:02, Mohammed Rafi K C  wrote:

>
>
> On 05/23/2017 08:53 PM, Mahdi Adnan wrote:
>
> Hi,
>
>
> I have a distributed volume with 6 bricks, each have 5TB and it's hosting
> large qcow2 VM disks (I know it's reliable but it's not important data)
>
> I started with 5 bricks and then added another one, started the re balance
> process, everything went well, but now im looking at the bricks free space
> and i found one brick is around 82% while others ranging from 20% to 60%.
>
> The brick with highest utilization is hosting more qcow2 disk than other
> bricks, and whenever i start re balance it just complete in 0 seconds and
> without moving any data.
>
>
> How much is your average file size in the cluster? And number of files
> (roughly) .
>
>
> What will happen with the brick became full ?
>
> Once brick contents goes beyond 90%, new files won't be created in the
> brick. But existing files can grow.
>
>
> Can i move data manually from one brick to the other ?
>
>
> Nop.It is not recommended, even though gluster will try to find the file,
> it may break.
>
>
> Why re balance not distributing data evenly on all bricks ?
>
>
> Rebalance works based on layout, so we need to see how layouts are
> distributed. If one of your bricks has higher capacity, it will have larger
> layout.
>
>


> That is correct. As Rafi said, the layout matters here. Can you please
> send across all the rebalance logs from all the 6 nodes?
>
>
Nodes runing CentOS 7.3
>
> Gluster 3.8.11
>
>
> Volume info;
> Volume Name: ctvvols
> Type: Distribute
> Volume ID: 1ecea912-510f-4079-b437-7398e9caa0eb
> Status: Started
> Snapshot Count: 0
> Number of Bricks: 6
> Transport-type: tcp
> Bricks:
> Brick1: ctv01:/vols/ctvvols
> Brick2: ctv02:/vols/ctvvols
> Brick3: ctv03:/vols/ctvvols
> Brick4: ctv04:/vols/ctvvols
> Brick5: ctv05:/vols/ctvvols
> Brick6: ctv06:/vols/ctvvols
> Options Reconfigured:
> nfs.disable: on
> performance.readdir-ahead: on
> transport.address-family: inet
> performance.quick-read: off
> performance.read-ahead: off
> performance.io-cache: off
> performance.stat-prefetch: off
> performance.low-prio-threads: 32
> network.remote-dio: enable
> cluster.eager-lock: enable
> cluster.quorum-type: none
> cluster.server-quorum-type: server
> cluster.data-self-heal-algorithm: full
> cluster.locking-scheme: granular
> cluster.shd-max-threads: 8
> cluster.shd-wait-qlength: 1
> features.shard: off
> user.cifs: off
> network.ping-timeout: 10
> storage.owner-uid: 36
> storage.owner-gid: 36
>
>
> re balance log:
>
>
> [2017-05-23 14:45:12.637671] I [dht-rebalance.c:2866:gf_defrag_process_dir]
> 0-ctvvols-dht: Migration operation on dir /31e0b341-4eeb-4b71-b280-
> 840eba7d6940/images/690c728d-a83e-4c79-ac7d-1f3f17edf7f0 took 0.00 secs
> [2017-05-23 14:45:12.640043] I [MSGID: 109081] 
> [dht-common.c:4202:dht_setxattr]
> 0-ctvvols-dht: fixing the layout of /31e0b341-4eeb-4b71-b280-
> 840eba7d6940/images/091402ba-dc90-4206-848a-d73e85a1cc35
> [2017-05-23 14:45:12.641516] I [dht-rebalance.c:2652:gf_defrag_process_dir]
> 0-ctvvols-dht: migrate data called on /31e0b341-4eeb-4b71-b280-
> 840eba7d6940/images/091402ba-dc90-4206-848a-d73e85a1cc35
> [2017-05-23 14:45:12.642421] I [dht-rebalance.c:2866:gf_defrag_process_dir]
> 0-ctvvols-dht: Migration operation on dir /31e0b341-4eeb-4b71-b280-
> 840eba7d6940/images/091402ba-dc90-4206-848a-d73e85a1cc35 took 0.00 secs
> [2017-05-23 14:45:12.645610] I [MSGID: 109081] 
> [dht-common.c:4202:dht_setxattr]
> 0-ctvvols-dht: fixing the layout of /31e0b341-4eeb-4b71-b280-
> 840eba7d6940/images/be1e2276-d38f-4d90-abf5-de757dd04078
> [2017-05-23 14:45:12.647034] I [dht-rebalance.c:2652:gf_defrag_process_dir]
> 0-ctvvols-dht: migrate data called on /31e0b341-4eeb-4b71-b280-
> 840eba7d6940/images/be1e2276-d38f-4d90-abf5-de757dd04078
> [2017-05-23 14:45:12.647589] I [dht-rebalance.c:2866:gf_defrag_process_dir]
> 0-ctvvols-dht: Migration operation on dir /31e0b341-4eeb-4b71-b280-
> 840eba7d6940/images/be1e2276-d38f-4d90-abf5-de757dd04078 took 0.00 secs
> [2017-05-23 14:45:12.653291] I [dht-rebalance.c:3838:gf_defrag_start_crawl]
> 0-DHT: crawling file-system completed
> [2017-05-23 14:45:12.653323] I [dht-rebalance.c:2246:gf_defrag_task]
> 0-DHT: Thread wokeup. defrag->current_thread_count: 23
> [2017-05-23 14:45:12.653508] I [dht-rebalance.c:2246:gf_defrag_task]
> 0-DHT: Thread wokeup. defrag->current_thread_count: 24
> [2017-05-23 14:45:12.653536] I [dht-rebalance.c:2246:gf_defrag_task]
> 0-DHT: Thread wokeup. defrag->current_thread_count: 25
> [2017-05-23 14:45:12.653556] I [dht-rebalance.c:2246:gf_defrag_task]
> 0-DHT: Thread wokeup. defrag->current_thread_count: 26
> [2017-05-23 14:45:12.653580] I [dht-rebalance.c:2246:gf_defrag_task]
> 0-DHT: Thread wokeup. defrag->current_thread_count: 27
> [2017-05-23 14:45:12.653603] I [dht-rebalance.c:2246:gf_defrag_task]
> 0-DHT: Thread wokeup. defrag->current_thread_count: 28
> [2017-05-23 14:45:12.653623] I 

Re: [Gluster-users] Failure while upgrading gluster to 3.10.1

2017-05-24 Thread Atin Mukherjee
Are the other glusterd instances up? Output of gluster peer status and
gluster volume status, please?

On Wed, May 24, 2017 at 4:20 PM, Pawan Alwandi  wrote:

> Thanks Atin,
>
> So I got gluster downgraded to 3.7.9 on host 1, and now the glusterfs
> and glusterfsd processes come up. But I see the volume is mounted
> read-only.
>
> I see these being logged every 3s:
>
> [2017-05-24 10:45:44.440435] W [socket.c:852:__socket_keepalive]
> 0-socket: failed to set keep idle -1 on socket 17, Invalid argument
> [2017-05-24 10:45:44.440475] E [socket.c:2966:socket_connect]
> 0-management: Failed to set keep-alive: Invalid argument
> [2017-05-24 10:45:44.440734] W [socket.c:852:__socket_keepalive]
> 0-socket: failed to set keep idle -1 on socket 20, Invalid argument
> [2017-05-24 10:45:44.440754] E [socket.c:2966:socket_connect]
> 0-management: Failed to set keep-alive: Invalid argument
> [2017-05-24 10:45:44.441354] E [rpc-clnt.c:362:saved_frames_unwind] (-->
> /usr/lib/x86_64-linux-gnu/libglusterfs.so.0(_gf_log_callingfn+0x1a3)[0x7f767c46d483]
> (--> 
> /usr/lib/x86_64-linux-gnu/libgfrpc.so.0(saved_frames_unwind+0x1cf)[0x7f767c2383af]
> (--> 
> /usr/lib/x86_64-linux-gnu/libgfrpc.so.0(saved_frames_destroy+0xe)[0x7f767c2384ce]
> (--> 
> /usr/lib/x86_64-linux-gnu/libgfrpc.so.0(rpc_clnt_connection_cleanup+0x7e)[0x7f767c239c8e]
> (--> 
> /usr/lib/x86_64-linux-gnu/libgfrpc.so.0(rpc_clnt_notify+0x88)[0x7f767c23a4a8]
> ) 0-management: forced unwinding frame type(GLUSTERD-DUMP) op(DUMP(1))
> called at 2017-05-24 10:45:44.440945 (xid=0xbf)
> [2017-05-24 10:45:44.441505] W [glusterd-locks.c:681:glusterd_mgmt_v3_unlock]
> (-->/usr/lib/x86_64-linux-gnu/glusterfs/3.7.9/xlator/mgmt/gl
> usterd.so(glusterd_big_locked_notify+0x4b) [0x7f767734dffb]
> -->/usr/lib/x86_64-linux-gnu/glusterfs/3.7.9/xlator/mgmt/glu
> sterd.so(__glusterd_peer_rpc_notify+0x14a) [0x7f7677357c6a]
> -->/usr/lib/x86_64-linux-gnu/glusterfs/3.7.9/xlator/mgmt/glu
> sterd.so(glusterd_mgmt_v3_unlock+0x4c3) [0x7f76773f0ef3] ) 0-management:
> Lock for vol shared not held
> [2017-05-24 10:45:44.441660] E [rpc-clnt.c:362:saved_frames_unwind] (-->
> /usr/lib/x86_64-linux-gnu/libglusterfs.so.0(_gf_log_callingfn+0x1a3)[0x7f767c46d483]
> (--> 
> /usr/lib/x86_64-linux-gnu/libgfrpc.so.0(saved_frames_unwind+0x1cf)[0x7f767c2383af]
> (--> 
> /usr/lib/x86_64-linux-gnu/libgfrpc.so.0(saved_frames_destroy+0xe)[0x7f767c2384ce]
> (--> 
> /usr/lib/x86_64-linux-gnu/libgfrpc.so.0(rpc_clnt_connection_cleanup+0x7e)[0x7f767c239c8e]
> (--> 
> /usr/lib/x86_64-linux-gnu/libgfrpc.so.0(rpc_clnt_notify+0x88)[0x7f767c23a4a8]
> ) 0-management: forced unwinding frame type(GLUSTERD-DUMP) op(DUMP(1))
> called at 2017-05-24 10:45:44.441086 (xid=0xbf)
> [2017-05-24 10:45:44.441790] W [glusterd-locks.c:681:glusterd_mgmt_v3_unlock]
> (-->/usr/lib/x86_64-linux-gnu/glusterfs/3.7.9/xlator/mgmt/gl
> usterd.so(glusterd_big_locked_notify+0x4b) [0x7f767734dffb]
> -->/usr/lib/x86_64-linux-gnu/glusterfs/3.7.9/xlator/mgmt/glu
> sterd.so(__glusterd_peer_rpc_notify+0x14a) [0x7f7677357c6a]
> -->/usr/lib/x86_64-linux-gnu/glusterfs/3.7.9/xlator/mgmt/glu
> sterd.so(glusterd_mgmt_v3_unlock+0x4c3) [0x7f76773f0ef3] ) 0-management:
> Lock for vol shared not held
>
> The heal info says this:
>
> # gluster volume heal shared info
> Brick 192.168.0.5:/data/exports/shared
> Number of entries: 0
>
> Brick 192.168.0.6:/data/exports/shared
> Status: Transport endpoint is not connected
>
> Brick 192.168.0.7:/data/exports/shared
> Status: Transport endpoint is not connected
>
> Any idea whats up here?
>
> Pawan
>
> On Mon, May 22, 2017 at 9:42 PM, Atin Mukherjee 
> wrote:
>
>>
>>
>> On Mon, May 22, 2017 at 9:05 PM, Pawan Alwandi  wrote:
>>
>>>
>>> On Mon, May 22, 2017 at 8:36 PM, Atin Mukherjee 
>>> wrote:
>>>


 On Mon, May 22, 2017 at 7:51 PM, Atin Mukherjee 
 wrote:

> Sorry Pawan, I did miss the other part of the attachments. So looking
> from the glusterd.info file from all the hosts, it looks like host2
> and host3 do not have the correct op-version. Can you please set the
> op-version as "operating-version=30702" in host2 and host3 and restart
> glusterd instance one by one on all the nodes?
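
A minimal sketch of that change, assuming the default glusterd.info location
on both hosts:

# on host2 and host3 (path assumed to be the default)
sed -i 's/^operating-version=.*/operating-version=30702/' /var/lib/glusterd/glusterd.info
systemctl restart glusterd    # or: service glusterd restart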
>

 Please ensure that all the hosts are upgraded to the same bits before
 doing this change.

>>>
>>> Having to upgrade all 3 hosts to newer version before gluster could work
>>> successfully on any of them means application downtime.  The applications
>>> running on these hosts are expected to be highly available.  So with the
>>> way the things are right now, is an online upgrade possible?  My upgrade
>>> steps are: (1) stop the applications (2) umount the gluster volume, and
>>> then (3) upgrade gluster one host at a time.
>>>
>>
>> One of the ways to mitigate this is to first do an online upgrade to
>> glusterfs-3.7.9 (op-version:30707), given this bug was 

Re: [Gluster-users] Distributed re-balance issue

2017-05-24 Thread Mohammed Rafi K C


On 05/23/2017 08:53 PM, Mahdi Adnan wrote:
>
> Hi,
>
>
> I have a distributed volume with 6 bricks, each have 5TB and it's
> hosting large qcow2 VM disks (I know it's reliable but it's
> not important data)
>
> I started with 5 bricks and then added another one, started the re
> balance process, everything went well, but now im looking at the
> bricks free space and i found one brick is around 82% while others
> ranging from 20% to 60%.
>
> The brick with highest utilization is hosting more qcow2 disk than
> other bricks, and whenever i start re balance it just complete in 0
> seconds and without moving any data.
>

What is your average file size in the cluster, and roughly how many files
are there?


> What will happen with the brick became full ?
>
Once a brick's usage goes beyond 90%, new files won't be created on that
brick, but existing files can still grow.
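
That threshold maps to the cluster.min-free-disk volume option, which by
default keeps roughly 10% of each brick free; it can be raised if writes
should be redirected earlier, for example:

gluster volume set ctvvols cluster.min-free-disk 15%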


> Can i move data manually from one brick to the other ?
>

No, it is not recommended; even though gluster will try to find the file,
it may break.


> Why re balance not distributing data evenly on all bricks ?
>

Rebalance works based on the layout, so we need to see how the layouts are
distributed. If one of your bricks has a higher capacity, it will have a
larger layout range.
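
One way to inspect the layout ranges assigned to each brick is to read the
DHT xattr directly on the brick directories (on the servers, not on the
client mount); a sketch, using the brick path from this thread:

# <some-directory> is a placeholder for a directory that exists on the volume
getfattr -n trusted.glusterfs.dht -e hex /vols/ctvvols/<some-directory>
# recompute layouts without moving data
gluster volume rebalance ctvvols fix-layout start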

>
> Nodes runing CentOS 7.3
>
> Gluster 3.8.11
>
>
> Volume info;
>
> Volume Name: ctvvols
> Type: Distribute
> Volume ID: 1ecea912-510f-4079-b437-7398e9caa0eb
> Status: Started
> Snapshot Count: 0
> Number of Bricks: 6
> Transport-type: tcp
> Bricks:
> Brick1: ctv01:/vols/ctvvols
> Brick2: ctv02:/vols/ctvvols
> Brick3: ctv03:/vols/ctvvols
> Brick4: ctv04:/vols/ctvvols
> Brick5: ctv05:/vols/ctvvols
> Brick6: ctv06:/vols/ctvvols
> Options Reconfigured:
> nfs.disable: on
> performance.readdir-ahead: on
> transport.address-family: inet
> performance.quick-read: off
> performance.read-ahead: off
> performance.io-cache: off
> performance.stat-prefetch: off
> performance.low-prio-threads: 32
> network.remote-dio: enable
> cluster.eager-lock: enable
> cluster.quorum-type: none
> cluster.server-quorum-type: server
> cluster.data-self-heal-algorithm: full
> cluster.locking-scheme: granular
> cluster.shd-max-threads: 8
> cluster.shd-wait-qlength: 1
> features.shard: off
> user.cifs: off
> network.ping-timeout: 10
> storage.owner-uid: 36
> storage.owner-gid: 36
>
>
> re balance log:
>
>
> [2017-05-23 14:45:12.637671] I
> [dht-rebalance.c:2866:gf_defrag_process_dir] 0-ctvvols-dht: Migration
> operation on dir
> /31e0b341-4eeb-4b71-b280-840eba7d6940/images/690c728d-a83e-4c79-ac7d-1f3f17edf7f0
> took 0.00 secs
> [2017-05-23 14:45:12.640043] I [MSGID: 109081]
> [dht-common.c:4202:dht_setxattr] 0-ctvvols-dht: fixing the layout of
> /31e0b341-4eeb-4b71-b280-840eba7d6940/images/091402ba-dc90-4206-848a-d73e85a1cc35
> [2017-05-23 14:45:12.641516] I
> [dht-rebalance.c:2652:gf_defrag_process_dir] 0-ctvvols-dht: migrate
> data called on
> /31e0b341-4eeb-4b71-b280-840eba7d6940/images/091402ba-dc90-4206-848a-d73e85a1cc35
> [2017-05-23 14:45:12.642421] I
> [dht-rebalance.c:2866:gf_defrag_process_dir] 0-ctvvols-dht: Migration
> operation on dir
> /31e0b341-4eeb-4b71-b280-840eba7d6940/images/091402ba-dc90-4206-848a-d73e85a1cc35
> took 0.00 secs
> [2017-05-23 14:45:12.645610] I [MSGID: 109081]
> [dht-common.c:4202:dht_setxattr] 0-ctvvols-dht: fixing the layout of
> /31e0b341-4eeb-4b71-b280-840eba7d6940/images/be1e2276-d38f-4d90-abf5-de757dd04078
> [2017-05-23 14:45:12.647034] I
> [dht-rebalance.c:2652:gf_defrag_process_dir] 0-ctvvols-dht: migrate
> data called on
> /31e0b341-4eeb-4b71-b280-840eba7d6940/images/be1e2276-d38f-4d90-abf5-de757dd04078
> [2017-05-23 14:45:12.647589] I
> [dht-rebalance.c:2866:gf_defrag_process_dir] 0-ctvvols-dht: Migration
> operation on dir
> /31e0b341-4eeb-4b71-b280-840eba7d6940/images/be1e2276-d38f-4d90-abf5-de757dd04078
> took 0.00 secs
> [2017-05-23 14:45:12.653291] I
> [dht-rebalance.c:3838:gf_defrag_start_crawl] 0-DHT: crawling
> file-system completed
> [2017-05-23 14:45:12.653323] I [dht-rebalance.c:2246:gf_defrag_task]
> 0-DHT: Thread wokeup. defrag->current_thread_count: 23
> [2017-05-23 14:45:12.653508] I [dht-rebalance.c:2246:gf_defrag_task]
> 0-DHT: Thread wokeup. defrag->current_thread_count: 24
> [2017-05-23 14:45:12.653536] I [dht-rebalance.c:2246:gf_defrag_task]
> 0-DHT: Thread wokeup. defrag->current_thread_count: 25
> [2017-05-23 14:45:12.653556] I [dht-rebalance.c:2246:gf_defrag_task]
> 0-DHT: Thread wokeup. defrag->current_thread_count: 26
> [2017-05-23 14:45:12.653580] I [dht-rebalance.c:2246:gf_defrag_task]
> 0-DHT: Thread wokeup. defrag->current_thread_count: 27
> [2017-05-23 14:45:12.653603] I [dht-rebalance.c:2246:gf_defrag_task]
> 0-DHT: Thread wokeup. defrag->current_thread_count: 28
> [2017-05-23 14:45:12.653623] I [dht-rebalance.c:2246:gf_defrag_task]
> 0-DHT: Thread wokeup. defrag->current_thread_count: 29
> [2017-05-23 14:45:12.653638] I [dht-rebalance.c:2246:gf_defrag_task]
> 0-DHT: Thread wokeup. defrag->current_thread_count: 30
> [2017-05-23 

[Gluster-users] 3.7.13 - Safe to replace brick ?

2017-05-24 Thread lemonnierk
Hi,

Does anyone know if the corruption bugs we've had for a while in add-brick
only happen when adding new bricks, or does replace-brick corrupt shards
too?

I have a 3.7.13 volume with a brick I'd like to move to another
server, and I'll do a backup and the move at night just in case,
but I'd rather know whether it risks corrupting the disk or not.

I believe the rebalance bug is a separate issue; it won't be a problem
here as it's only a replica 3 with no distribute at all.
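
For reference, the operation in question would be roughly the following
(server and brick paths are placeholders); on a replica volume the self-heal
daemon then copies the data onto the new brick:

gluster volume replace-brick VOLNAME oldserver:/bricks/b1 newserver:/bricks/b1 commit force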

Thanks !


___
Gluster-users mailing list
Gluster-users@gluster.org
http://lists.gluster.org/mailman/listinfo/gluster-users

[Gluster-users] Distributed re-balance issue

2017-05-24 Thread Mahdi Adnan
Hi,


I have a distributed volume with 6 bricks, each with 5TB, and it's hosting
large qcow2 VM disks (I know it's not reliable, but it's not important data).

I started with 5 bricks and then added another one and started the rebalance
process. Everything went well, but now I'm looking at the bricks' free space
and I found one brick is around 82% full while the others range from 20% to
60%.

The brick with the highest utilization is hosting more qcow2 disks than the
other bricks, and whenever I start a rebalance it just completes in 0 seconds
without moving any data.

What will happen when the brick becomes full?

Can I move data manually from one brick to another?

Why is rebalance not distributing data evenly across all bricks?


Nodes running CentOS 7.3

Gluster 3.8.11


Volume info;

Volume Name: ctvvols
Type: Distribute
Volume ID: 1ecea912-510f-4079-b437-7398e9caa0eb
Status: Started
Snapshot Count: 0
Number of Bricks: 6
Transport-type: tcp
Bricks:
Brick1: ctv01:/vols/ctvvols
Brick2: ctv02:/vols/ctvvols
Brick3: ctv03:/vols/ctvvols
Brick4: ctv04:/vols/ctvvols
Brick5: ctv05:/vols/ctvvols
Brick6: ctv06:/vols/ctvvols
Options Reconfigured:
nfs.disable: on
performance.readdir-ahead: on
transport.address-family: inet
performance.quick-read: off
performance.read-ahead: off
performance.io-cache: off
performance.stat-prefetch: off
performance.low-prio-threads: 32
network.remote-dio: enable
cluster.eager-lock: enable
cluster.quorum-type: none
cluster.server-quorum-type: server
cluster.data-self-heal-algorithm: full
cluster.locking-scheme: granular
cluster.shd-max-threads: 8
cluster.shd-wait-qlength: 1
features.shard: off
user.cifs: off
network.ping-timeout: 10
storage.owner-uid: 36
storage.owner-gid: 36



re balance log:


[2017-05-23 14:45:12.637671] I [dht-rebalance.c:2866:gf_defrag_process_dir] 
0-ctvvols-dht: Migration operation on dir 
/31e0b341-4eeb-4b71-b280-840eba7d6940/images/690c728d-a83e-4c79-ac7d-1f3f17edf7f0
 took 0.00 secs
[2017-05-23 14:45:12.640043] I [MSGID: 109081] [dht-common.c:4202:dht_setxattr] 
0-ctvvols-dht: fixing the layout of 
/31e0b341-4eeb-4b71-b280-840eba7d6940/images/091402ba-dc90-4206-848a-d73e85a1cc35
[2017-05-23 14:45:12.641516] I [dht-rebalance.c:2652:gf_defrag_process_dir] 
0-ctvvols-dht: migrate data called on 
/31e0b341-4eeb-4b71-b280-840eba7d6940/images/091402ba-dc90-4206-848a-d73e85a1cc35
[2017-05-23 14:45:12.642421] I [dht-rebalance.c:2866:gf_defrag_process_dir] 
0-ctvvols-dht: Migration operation on dir 
/31e0b341-4eeb-4b71-b280-840eba7d6940/images/091402ba-dc90-4206-848a-d73e85a1cc35
 took 0.00 secs
[2017-05-23 14:45:12.645610] I [MSGID: 109081] [dht-common.c:4202:dht_setxattr] 
0-ctvvols-dht: fixing the layout of 
/31e0b341-4eeb-4b71-b280-840eba7d6940/images/be1e2276-d38f-4d90-abf5-de757dd04078
[2017-05-23 14:45:12.647034] I [dht-rebalance.c:2652:gf_defrag_process_dir] 
0-ctvvols-dht: migrate data called on 
/31e0b341-4eeb-4b71-b280-840eba7d6940/images/be1e2276-d38f-4d90-abf5-de757dd04078
[2017-05-23 14:45:12.647589] I [dht-rebalance.c:2866:gf_defrag_process_dir] 
0-ctvvols-dht: Migration operation on dir 
/31e0b341-4eeb-4b71-b280-840eba7d6940/images/be1e2276-d38f-4d90-abf5-de757dd04078
 took 0.00 secs
[2017-05-23 14:45:12.653291] I [dht-rebalance.c:3838:gf_defrag_start_crawl] 
0-DHT: crawling file-system completed
[2017-05-23 14:45:12.653323] I [dht-rebalance.c:2246:gf_defrag_task] 0-DHT: 
Thread wokeup. defrag->current_thread_count: 23
[2017-05-23 14:45:12.653508] I [dht-rebalance.c:2246:gf_defrag_task] 0-DHT: 
Thread wokeup. defrag->current_thread_count: 24
[2017-05-23 14:45:12.653536] I [dht-rebalance.c:2246:gf_defrag_task] 0-DHT: 
Thread wokeup. defrag->current_thread_count: 25
[2017-05-23 14:45:12.653556] I [dht-rebalance.c:2246:gf_defrag_task] 0-DHT: 
Thread wokeup. defrag->current_thread_count: 26
[2017-05-23 14:45:12.653580] I [dht-rebalance.c:2246:gf_defrag_task] 0-DHT: 
Thread wokeup. defrag->current_thread_count: 27
[2017-05-23 14:45:12.653603] I [dht-rebalance.c:2246:gf_defrag_task] 0-DHT: 
Thread wokeup. defrag->current_thread_count: 28
[2017-05-23 14:45:12.653623] I [dht-rebalance.c:2246:gf_defrag_task] 0-DHT: 
Thread wokeup. defrag->current_thread_count: 29
[2017-05-23 14:45:12.653638] I [dht-rebalance.c:2246:gf_defrag_task] 0-DHT: 
Thread wokeup. defrag->current_thread_count: 30
[2017-05-23 14:45:12.653659] I [dht-rebalance.c:2246:gf_defrag_task] 0-DHT: 
Thread wokeup. defrag->current_thread_count: 31
[2017-05-23 14:45:12.653677] I [dht-rebalance.c:2246:gf_defrag_task] 0-DHT: 
Thread wokeup. defrag->current_thread_count: 32
[2017-05-23 14:45:12.653692] I [dht-rebalance.c:2246:gf_defrag_task] 0-DHT: 
Thread wokeup. defrag->current_thread_count: 33
[2017-05-23 14:45:12.653711] I [dht-rebalance.c:2246:gf_defrag_task] 0-DHT: 
Thread wokeup. defrag->current_thread_count: 34
[2017-05-23 14:45:12.653723] I [dht-rebalance.c:2246:gf_defrag_task] 0-DHT: 
Thread wokeup. defrag->current_thread_count: 35
[2017-05-23 14:45:12.653739] I [dht-rebalance.c:2246:gf_defrag_task] 0-DHT: 

Re: [Gluster-users] mount glusterfs with another ip

2017-05-24 Thread Niels de Vos
On Wed, May 24, 2017 at 05:04:34PM +0430, atris adam wrote:
> Hello everybody,
> 
> I have created a glusterfs volume with local ip, I want to mount the
> glusterfs volume with valid Ip somewhere else. The client machine can not
> ping the local ip, I mean it is in another network.
> 
> Is it possible to create a glusterfs volume in one network lan and mount it
> in another network lan, which the lans are not pingable to each other?

Not at the moment. There is an idea that should make this possible:
GF-Proxy, which is expected in a future release (Gluster 3.12, maybe?) and
would enable this setup.

Until then, you should look into setting up NFS-Ganesha or Samba to
function as a proxy server. We strongly recommend using only a single
access protocol (FUSE mounts, NFS or Samba) to prevent inconsistencies
in caching/permissions/locking/..., so keep that in mind.
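
As a concrete example of the proxy approach (a sketch with placeholder names,
assuming the export has already been configured on a host that is reachable
from the client's network), the client would then use a plain NFS mount
instead of a GlusterFS mount:

  # mount -t nfs -o vers=3 <reachable-ip>:/<volname> /mnt/gluster

The exact export path depends on how NFS-Ganesha (or Samba, for a CIFS mount)
is configured.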

Niels



[Gluster-users] mount glusterfs with another ip

2017-05-24 Thread atris adam
Hello everybody,

I have created a GlusterFS volume using a local IP, and I want to mount the
volume with a routable IP somewhere else. The client machine cannot ping the
local IP; it is in another network.

Is it possible to create a GlusterFS volume in one LAN and mount it in another
LAN, when the two LANs cannot reach each other?

thx

Re: [Gluster-users] Rebalance + VM corruption - current status and request for feedback

2017-05-24 Thread Niels de Vos
On Wed, May 24, 2017 at 11:18:07AM +, Mahdi Adnan wrote:
> Hi,
> 
> 
> Still no RPMs in SIG repository.

This depends on the CentOS team signing and mirroring the updates. The
packages are correctly tagged, but they have not been signed and pushed yet.

You can get the updates by running this command:

  # yum --enablerepo=centos-gluster310-test update glusterfs

Those packages are not signed though. If that is important to you,
you'll have to wait until the CentOS team has caught up on their work.
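
For completeness, a quick way to confirm which build ended up installed
afterwards:

  # rpm -q glusterfs-server
  # gluster --version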

HTH,
Niels


> 
> 
> --
> 
> Respectfully
> Mahdi A. Mahdi
> 
> 
> From: Niels de Vos 
> Sent: Monday, May 22, 2017 3:26:02 PM
> To: Atin Mukherjee
> Cc: Mahdi Adnan; Vijay Bellur; gluster-user
> Subject: Re: [Gluster-users] Rebalance + VM corruption - current status and 
> request for feedback
> 
> On Sun, May 21, 2017 at 02:37:34AM +, Atin Mukherjee wrote:
> > On Sun, 21 May 2017 at 02:17, Mahdi Adnan  wrote:
> >
> > > Thank you so much for your replay.
> > >
> > > Yes, i checked the 310 repository before and it was't there, see below;
> > >
> >
> > I think 3.10.2 bits are still in 3.10 test repository in storage sig and
> > hasn't been pushed to release mirror yet. Niels can update further on this.
> 
> Nobody reported test results, so there was no urgency to push the
> release to the CentOS mirrors. I've marked the packages for release now,
> and the CentOS team will probably sign+mirror them during the day
> tomorrow.
> 
> Niels
> 
> 
> >
> >
> > > Installed Packages
> > > Name: glusterfs-server
> > > Arch: x86_64
> > > Version : 3.10.1
> > > Release : 1.el7
> > > Size: 4.3 M
> > > Repo: installed
> > > From repo   : centos-gluster310
> > > Summary : Distributed file-system server
> > > URL : http://gluster.readthedocs.io/en/latest/
> > > License : GPLv2 or LGPLv3+
> > > Description :
> > >
> > > and no other packages available in the repo
> > >
> > > --
> > >
> > > Respectfully
> > > *Mahdi A. Mahdi*
> > >
> > > --
> > > *From:* Vijay Bellur 
> > > *Sent:* Saturday, May 20, 2017 6:46:51 PM
> > > *To:* Krutika Dhananjay
> > > *Cc:* Mahdi Adnan; raghavendra talur; gluster-user
> > > *Subject:* Re: [Gluster-users] Rebalance + VM corruption - current status
> > > and request for feedback
> > >
> > >
> > >
> > > On Sat, May 20, 2017 at 6:38 AM, Krutika Dhananjay 
> > > wrote:
> > >
> > >> Raghavendra Talur might know. Adding him to the thread.
> > >>
> > >> -Krutika
> > >>
> > >> On Sat, May 20, 2017 at 2:47 PM, Mahdi Adnan 
> > >> wrote:
> > >>
> > >>> Good morning,
> > >>>
> > >>>
> > >>> SIG repository does not have the latest glusterfs 3.10.2.
> > >>>
> > >>> Do you have any idea when it's going to be updated ?
> > >>>
> > >>> Is there any other recommended place to get the latest rpms ?
> > >>>
> > >>>
> > >
> > > RPMs are available in the centos-gluster310-test repository [1].
> > >
> > > -Vijay
> > >
> > > [1] http://lists.gluster.org/pipermail/maintainers/2017-May/002575.html
> > >
> > >
> > > ___
> > > Gluster-users mailing list
> > > Gluster-users@gluster.org
> > > http://lists.gluster.org/mailman/listinfo/gluster-users
> >
> > --
> > - Atin (atinm)



Re: [Gluster-users] GlusterFS and Kafka

2017-05-24 Thread Christopher Schmidt
Vijay Bellur  schrieb am Mi., 24. Mai 2017 um 05:53 Uhr:

> On Tue, May 23, 2017 at 1:39 AM, Christopher Schmidt 
> wrote:
>
>> OK, seems that this works now.
>>
>> A couple of questions:
>> - What do you think, are all these options necessary for Kafka?
>>
>
> I am not entirely certain what subset of options will make it work as I do
> not understand the nature of failure with  Kafka and the default gluster
> configuration. It certainly needs further analysis to identify the list of
> options necessary. Would it be possible for you to enable one option after
> the other and determine the configuration that ?
>

I've done a short test. Disabling write-behind
(http://staged-gluster-docs.readthedocs.io/en/release3.7.0beta1/Developer-guide/write-behind/)
seems to be enough, at least for Kafka.
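
Concretely, that corresponds to the single write-behind option from the list
quoted below:

  gluster volume set <volname> performance.write-behind off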


>
>
>
>> - You wrote that there have to be kind of application profiles. So to
>> find out, which set of options work is currently a matter of testing (and
>> hope)? Or are there any experiences for MongoDB / ProstgreSQL / Zookeeper
>> etc.?
>>
>
> Application profiles are work in progress. We have a few that are focused
> on use cases like VM storage, block storage etc. at the moment.
>
>
>> - I am using Heketi and Dynamik Storage Provisioning together with
>> Kubernetes. Can I set this volume options somehow by default or by volume
>> plugin?
>>
>
>
> Adding Raghavendra and Michael to help address this query.
>
> -Vijay
>
>
>
>
>>
>> Thanks for you help... really appreciated.. Christopher
>>
>> Vijay Bellur  schrieb am Mo., 22. Mai 2017 um
>> 16:41 Uhr:
>>
>>> Looks like a problem with caching. Can you please try by disabling all
>>> performance translators? The following configuration commands would disable
>>> performance translators in the gluster client stack:
>>>
>>> gluster volume set  performance.quick-read off
>>> gluster volume set  performance.io-cache off
>>> gluster volume set  performance.write-behind off
>>> gluster volume set  performance.stat-prefetch off
>>> gluster volume set  performance.read-ahead off
>>> gluster volume set  performance.readdir-ahead off
>>> gluster volume set  performance.open-behind off
>>> gluster volume set  performance.client-io-threads off
>>>
>>> Thanks,
>>> Vijay
>>>
>>>
>>>
>>> On Mon, May 22, 2017 at 9:46 AM, Christopher Schmidt >> > wrote:
>>>
 Hi all,

 has anyone ever successfully deployed a Kafka (Cluster) on GlusterFS
 volumes?

 I my case it's a Kafka Kubernetes-StatefulSet and a Heketi GlusterFS.
 Needless to say that I am getting a lot of filesystem related
 exceptions like this one:

 Failed to read `log header` from file channel
 `sun.nio.ch.FileChannelImpl@67afa54a`. Expected to read 12 bytes, but
 reached end of file after reading 0 bytes. Started read from position
 123065680.

 I limited the amount of exceptions with
 the log.flush.interval.messages=1 option, but not all...

 best Christopher


 ___
 Gluster-users mailing list
 Gluster-users@gluster.org
 http://lists.gluster.org/mailman/listinfo/gluster-users

>>>
>>>

Re: [Gluster-users] Failure while upgrading gluster to 3.10.1

2017-05-24 Thread Pawan Alwandi
Thanks Atin,

So I downgraded gluster to 3.7.9 on host 1, and the glusterfs and glusterfsd
processes now come up.  But I see the volume is mounted read-only.

I see these being logged every 3s:

[2017-05-24 10:45:44.440435] W [socket.c:852:__socket_keepalive] 0-socket:
failed to set keep idle -1 on socket 17, Invalid argument
[2017-05-24 10:45:44.440475] E [socket.c:2966:socket_connect] 0-management:
Failed to set keep-alive: Invalid argument
[2017-05-24 10:45:44.440734] W [socket.c:852:__socket_keepalive] 0-socket:
failed to set keep idle -1 on socket 20, Invalid argument
[2017-05-24 10:45:44.440754] E [socket.c:2966:socket_connect] 0-management:
Failed to set keep-alive: Invalid argument
[2017-05-24 10:45:44.441354] E [rpc-clnt.c:362:saved_frames_unwind] (-->
/usr/lib/x86_64-linux-gnu/libglusterfs.so.0(_gf_log_callingfn+0x1a3)[0x7f767c46d483]
(--> 
/usr/lib/x86_64-linux-gnu/libgfrpc.so.0(saved_frames_unwind+0x1cf)[0x7f767c2383af]
(--> 
/usr/lib/x86_64-linux-gnu/libgfrpc.so.0(saved_frames_destroy+0xe)[0x7f767c2384ce]
(--> /usr/lib/x86_64-linux-gnu/libgfrpc.so.0(rpc_clnt_
connection_cleanup+0x7e)[0x7f767c239c8e] (--> /usr/lib/x86_64-linux-gnu/
libgfrpc.so.0(rpc_clnt_notify+0x88)[0x7f767c23a4a8] ) 0-management:
forced unwinding frame type(GLUSTERD-DUMP) op(DUMP(1)) called at 2017-05-24
10:45:44.440945 (xid=0xbf)
[2017-05-24 10:45:44.441505] W [glusterd-locks.c:681:glusterd_mgmt_v3_unlock]
(-->/usr/lib/x86_64-linux-gnu/glusterfs/3.7.9/xlator/mgmt/
glusterd.so(glusterd_big_locked_notify+0x4b) [0x7f767734dffb]
-->/usr/lib/x86_64-linux-gnu/glusterfs/3.7.9/xlator/mgmt/
glusterd.so(__glusterd_peer_rpc_notify+0x14a) [0x7f7677357c6a]
-->/usr/lib/x86_64-linux-gnu/glusterfs/3.7.9/xlator/mgmt/
glusterd.so(glusterd_mgmt_v3_unlock+0x4c3) [0x7f76773f0ef3] ) 0-management:
Lock for vol shared not held
[2017-05-24 10:45:44.441660] E [rpc-clnt.c:362:saved_frames_unwind] (-->
/usr/lib/x86_64-linux-gnu/libglusterfs.so.0(_gf_log_callingfn+0x1a3)[0x7f767c46d483]
(--> 
/usr/lib/x86_64-linux-gnu/libgfrpc.so.0(saved_frames_unwind+0x1cf)[0x7f767c2383af]
(--> 
/usr/lib/x86_64-linux-gnu/libgfrpc.so.0(saved_frames_destroy+0xe)[0x7f767c2384ce]
(--> /usr/lib/x86_64-linux-gnu/libgfrpc.so.0(rpc_clnt_
connection_cleanup+0x7e)[0x7f767c239c8e] (--> /usr/lib/x86_64-linux-gnu/
libgfrpc.so.0(rpc_clnt_notify+0x88)[0x7f767c23a4a8] ) 0-management:
forced unwinding frame type(GLUSTERD-DUMP) op(DUMP(1)) called at 2017-05-24
10:45:44.441086 (xid=0xbf)
[2017-05-24 10:45:44.441790] W [glusterd-locks.c:681:glusterd_mgmt_v3_unlock]
(-->/usr/lib/x86_64-linux-gnu/glusterfs/3.7.9/xlator/mgmt/
glusterd.so(glusterd_big_locked_notify+0x4b) [0x7f767734dffb]
-->/usr/lib/x86_64-linux-gnu/glusterfs/3.7.9/xlator/mgmt/
glusterd.so(__glusterd_peer_rpc_notify+0x14a) [0x7f7677357c6a]
-->/usr/lib/x86_64-linux-gnu/glusterfs/3.7.9/xlator/mgmt/
glusterd.so(glusterd_mgmt_v3_unlock+0x4c3) [0x7f76773f0ef3] ) 0-management:
Lock for vol shared not held

The heal info says this:

# gluster volume heal shared info
Brick 192.168.0.5:/data/exports/shared
Number of entries: 0

Brick 192.168.0.6:/data/exports/shared
Status: Transport endpoint is not connected

Brick 192.168.0.7:/data/exports/shared
Status: Transport endpoint is not connected

Any idea what's up here?
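
For anyone hitting the same symptom, two read-only checks that usually help
narrow down a "Transport endpoint is not connected" brick (a sketch, using
the volume name from the output above):

  # gluster peer status
  # gluster volume status shared

Both commands only report state, so they are safe to run on the live cluster.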

Pawan

On Mon, May 22, 2017 at 9:42 PM, Atin Mukherjee  wrote:

>
>
> On Mon, May 22, 2017 at 9:05 PM, Pawan Alwandi  wrote:
>
>>
>> On Mon, May 22, 2017 at 8:36 PM, Atin Mukherjee 
>> wrote:
>>
>>>
>>>
>>> On Mon, May 22, 2017 at 7:51 PM, Atin Mukherjee 
>>> wrote:
>>>
 Sorry Pawan, I did miss the other part of the attachments. So looking
 from the glusterd.info file from all the hosts, it looks like host2
 and host3 do not have the correct op-version. Can you please set the
 op-version as "operating-version=30702" in host2 and host3 and restart
 glusterd instance one by one on all the nodes?

>>>
>>> Please ensure that all the hosts are upgraded to the same bits before
>>> doing this change.
>>>
>>
>> Having to upgrade all 3 hosts to newer version before gluster could work
>> successfully on any of them means application downtime.  The applications
>> running on these hosts are expected to be highly available.  So with the
>> way the things are right now, is an online upgrade possible?  My upgrade
>> steps are: (1) stop the applications (2) umount the gluster volume, and
>> then (3) upgrade gluster one host at a time.
>>
>
> One of the way to mitigate this is to first do an online upgrade to
> glusterfs-3.7.9 (op-version:30707) given this bug was introduced in 3.7.10
> and then come to 3.11.
>
>
>> Our goal is to get gluster upgraded to 3.11 from 3.6.9, and to make this
>> an online upgrade we are okay to take two steps 3.6.9 -> 3.7 and then 3.7
>> to 3.11.
>>
>>
>>>
>>>

 Apparently it looks like there is a bug which you have uncovered,
 during peer handshaking if one of the glusterd 

Re: [Gluster-users] GlusterFS and Kafka

2017-05-24 Thread Christopher Schmidt
Vijay Bellur  schrieb am Mi., 24. Mai 2017 um 05:53 Uhr:

> On Tue, May 23, 2017 at 1:39 AM, Christopher Schmidt 
> wrote:
>
>> OK, seems that this works now.
>>
>> A couple of questions:
>> - What do you think, are all these options necessary for Kafka?
>>
>
> I am not entirely certain what subset of options will make it work as I do
> not understand the nature of failure with  Kafka and the default gluster
> configuration. It certainly needs further analysis to identify the list of
> options necessary. Would it be possible for you to enable one option after
> the other and determine the configuration that ?
>
>
>
>> - You wrote that there have to be kind of application profiles. So to
>> find out, which set of options work is currently a matter of testing (and
>> hope)? Or are there any experiences for MongoDB / ProstgreSQL / Zookeeper
>> etc.?
>>
>
> Application profiles are work in progress. We have a few that are focused
> on use cases like VM storage, block storage etc. at the moment.
>
>
>> - I am using Heketi and Dynamik Storage Provisioning together with
>> Kubernetes. Can I set this volume options somehow by default or by volume
>> plugin?
>>
>
>
> Adding Raghavendra and Michael to help address this query.
>

For me it would be sufficient to disable some (or all) translators, for all
volumes that'll be created, somewhere here:
https://github.com/gluster/gluster-containers/tree/master/CentOS
This is the container used by the GlusterFS DaemonSet for Kubernetes.
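
Until that lands, one stopgap (a sketch with placeholder names, not an
official mechanism) is to set the options per volume from outside the pod
with kubectl, so no interactive shell inside the Gluster pod is needed:

  kubectl exec -n <namespace> <glusterfs-pod> -- \
    gluster volume set <volname> performance.write-behind off

The Gluster volume name that Heketi created for a claim can be read from the
bound PersistentVolume (kubectl describe pv <pv-name>).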


>
> -Vijay
>
>
>
>
>>
>> Thanks for you help... really appreciated.. Christopher
>>
>> Vijay Bellur  schrieb am Mo., 22. Mai 2017 um
>> 16:41 Uhr:
>>
>>> Looks like a problem with caching. Can you please try by disabling all
>>> performance translators? The following configuration commands would disable
>>> performance translators in the gluster client stack:
>>>
>>> gluster volume set  performance.quick-read off
>>> gluster volume set  performance.io-cache off
>>> gluster volume set  performance.write-behind off
>>> gluster volume set  performance.stat-prefetch off
>>> gluster volume set  performance.read-ahead off
>>> gluster volume set  performance.readdir-ahead off
>>> gluster volume set  performance.open-behind off
>>> gluster volume set  performance.client-io-threads off
>>>
>>> Thanks,
>>> Vijay
>>>
>>>
>>>
>>> On Mon, May 22, 2017 at 9:46 AM, Christopher Schmidt >> > wrote:
>>>
 Hi all,

 has anyone ever successfully deployed a Kafka (Cluster) on GlusterFS
 volumes?

 I my case it's a Kafka Kubernetes-StatefulSet and a Heketi GlusterFS.
 Needless to say that I am getting a lot of filesystem related
 exceptions like this one:

 Failed to read `log header` from file channel
 `sun.nio.ch.FileChannelImpl@67afa54a`. Expected to read 12 bytes, but
 reached end of file after reading 0 bytes. Started read from position
 123065680.

 I limited the amount of exceptions with
 the log.flush.interval.messages=1 option, but not all...

 best Christopher


 ___
 Gluster-users mailing list
 Gluster-users@gluster.org
 http://lists.gluster.org/mailman/listinfo/gluster-users

>>>
>>>