Re: [Gluster-users] disperse volume brick counts limits in RHES

2017-05-08 Thread Xavier Hernandez

Hi Alastair,

the numbers I'm giving correspond to an Intel Xeon E5-2630L 2 GHz CPU.

On 08/05/17 22:44, Alastair Neil wrote:

so the bottleneck is that computations with a 16x20 matrix require ~4
times the cycles?


This is only part of the problem. A 16x16 matrix can be processed at a 
rate of 400 MB/s, so a single fragment on a brick is processed at 
400/16 = 25 MB/s, which is not what we actually observe.


Note that the fragment on a brick is only part of a whole file, so 25 
MB/s on a brick means that the real file is being processed at 400 MB/s.
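
Making the arithmetic explicit (same numbers as above, just spelled out):

  $ echo $(( 400 / 16 ))   # MB/s of fragment data processed on each brick
  25
  $ echo $(( 16 * 25 ))    # MB/s at which the whole file is reconstructed
  400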



It seems then that there is ample room for
improvement, as there are many linear algebra packages out there that
scale better than O(nxm).


That's true for much bigger matrices where synchronization time between 
threads is negligible compared to the computation time. In this case the 
algorithm is highly optimized and any attempt to distribute the 
computation would be worse.


Note that the current algorithm can rebuild the original data at a rate 
of ~5 CPU cycles per byte with a 16x16 configuration without any SIMD 
extension. With SSE or AVX this goes down to near 1 cycle per byte.
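
At the 2 GHz clock mentioned at the top of this mail, those figures
translate roughly into:

  $ echo $(( 2000000000 / 5 ))   # ~5 cycles/byte, no SIMD
  400000000                      # i.e. ~400 MB/s
  $ echo $(( 2000000000 / 1 ))   # ~1 cycle/byte with SSE/AVX
  2000000000                     # i.e. ~2 GB/s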


In this case the best we can do is to run more than one heal in parallel. 
This will use more than one core to compute the matrices, giving better 
overall performance.
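
For example, if your release exposes the option (recent versions do),
something like this raises the number of concurrent heals the self-heal
daemon will run on a disperse volume (volume name is a placeholder):

  gluster volume set <volname> disperse.shd-max-threads 4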



Is the healing time dominated by the EC
compute time?  If Serkan saw a hard 2x scaling then it seems likely.


Partially. The computation speed is doubled on an 8+2 configuration, but 
the number of IOPS is also halved, and each operation is twice the size of 
a 16+4 operation. This means that we only have half of the latencies 
when using 8+2 and the bandwidth is better utilized.


The theoretical speed of matrix processing is 25 MB/s per brick, but the 
real speed seen is considerably smaller, so network latencies and other 
factors also contribute to the heal time.


Xavi



-Alastair




On 8 May 2017 at 03:02, Xavier Hernandez <xhernan...@datalab.es> wrote:

On 05/05/17 13:49, Pranith Kumar Karampuri wrote:



On Fri, May 5, 2017 at 2:38 PM, Serkan Çoban
<cobanser...@gmail.com> wrote:

It is the over all time, 8TB data disk healed 2x faster in 8+2
configuration.


Wow, that is counter intuitive for me. I will need to explore
about this
to find out why that could be. Thanks a lot for this feedback!


Matrix multiplication for encoding/decoding of 8+2 is 4 times faster
than 16+4 (one matrix of 16x16 is composed by 4 submatrices of 8x8),
however each matrix operation on a 16+4 configuration takes twice
the amount of data of a 8+2, so net effect is that 8+2 is twice as
fast as 16+4.

An 8+2 also uses bigger blocks on each brick, processing the same
amount of data in less I/O operations and bigger network packets.

Probably these are the reasons why 16+4 is slower than 8+2.

See my other email for more detailed description.

Xavi




On Fri, May 5, 2017 at 10:00 AM, Pranith Kumar Karampuri
<pkara...@redhat.com> wrote:
>
>
> On Fri, May 5, 2017 at 11:42 AM, Serkan Çoban
> <cobanser...@gmail.com> wrote:
>>
>> Healing gets slower as you increase m in m+n configuration.
>> We are using 16+4 configuration without any problems other than heal
>> speed.
>> I tested heal speed with 8+2 and 16+4 on 3.9.0 and see
that heals on
>> 8+2 is faster by 2x.
>
>
> As you increase number of nodes that are participating in
an EC
set number
> of parallel heals increase. Is the heal speed you saw
improved per
file or
> the over all time it took to heal the data?
>
>>
>>
>>
>> On Fri, May 5, 2017 at 9:04 AM, Ashish Pandey
>> <aspan...@redhat.com> wrote:
>> >
>> > 8+2 and 8+3 configurations are not the limitation but just
suggestions.
>> > You can create 16+3 volume without any issue.
>> >
>> > Ashish
>> >
>> > 
>> > From: "Alastair Neil" mailto:ajneil.t...@gmail.com>
>>
>> > To: "gluster-users" mailto:gluster-users@gluster.org>
>>
>> > Sent: Friday, May 5, 2017 2:23:32 AM
   

[Gluster-users] Empty info file preventing glusterd from starting

2017-05-08 Thread ABHISHEK PALIWAL
Hi Atin/Team,

We are using gluster-3.7.6 with a two-brick setup, and on a system restart
I have seen that the glusterd daemon fails to start.


While analyzing the logs from the etc-glusterfs...log file I found the
entries below:


[2017-05-06 03:33:39.798087] I [MSGID: 100030] [glusterfsd.c:2348:main]
0-/usr/sbin/glusterd: Started running /usr/sbin/glusterd version 3.7.6
(args: /usr/sbin/glusterd -p /var/run/glusterd.pid --log-level INFO)
[2017-05-06 03:33:39.807859] I [MSGID: 106478] [glusterd.c:1350:init]
0-management: Maximum allowed open file descriptors set to 65536
[2017-05-06 03:33:39.807974] I [MSGID: 106479] [glusterd.c:1399:init]
0-management: Using /system/glusterd as working directory
[2017-05-06 03:33:39.826833] I [MSGID: 106513]
[glusterd-store.c:2047:glusterd_restore_op_version] 0-glusterd: retrieved
op-version: 30706
[2017-05-06 03:33:39.827515] E [MSGID: 106206]
[glusterd-store.c:2562:glusterd_store_update_volinfo] 0-management: Failed
to get next store iter
[2017-05-06 03:33:39.827563] E [MSGID: 106207]
[glusterd-store.c:2844:glusterd_store_retrieve_volume] 0-management: Failed
to update volinfo for c_glusterfs volume
[2017-05-06 03:33:39.827625] E [MSGID: 106201]
[glusterd-store.c:3042:glusterd_store_retrieve_volumes] 0-management:
Unable to restore volume: c_glusterfs
[2017-05-06 03:33:39.827722] E [MSGID: 101019] [xlator.c:428:xlator_init]
0-management: Initialization of volume 'management' failed, review your
volfile again
[2017-05-06 03:33:39.827762] E [graph.c:322:glusterfs_graph_init]
0-management: initializing translator failed
[2017-05-06 03:33:39.827784] E [graph.c:661:glusterfs_graph_activate]
0-graph: init failed
[2017-05-06 03:33:39.828396] W [glusterfsd.c:1238:cleanup_and_exit]
(-->/usr/sbin/glusterd(glusterfs_volumes_init-0x1b0b8) [0x1000a648]
-->/usr/sbin/glusterd(glusterfs_process_volfp-0x1b210) [0x1000a4d8]
-->/usr/sbin/glusterd(cleanup_and_exit-0x1beac) [0x100097ac] ) 0-: received
signum (0), shutting down


I have found an existing case for this, and a solution patch is available,
but the status of that patch is "cannot merge". Also, the "info" file is
empty and an "info.tmp" file is present in the "lib/glusterd/vol" directory.

Below is the link of the existing case.

https://review.gluster.org/#/c/16279/5

Please let me know the community's plan for providing a solution to this
problem, and in which version it will be available.
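
In the meantime, would it be safe to recover manually along these lines?
This is only a sketch of what I am considering (assuming the other node
still has a healthy copy of the volume definition; the path follows the
working directory shown in the log above):

# on the failed node, keep a backup of the broken volume directory
cp -a /system/glusterd/vols/c_glusterfs /system/glusterd/vols/c_glusterfs.bak
# copy the volume definition from the node where glusterd still starts
scp -r <other-node>:/system/glusterd/vols/c_glusterfs/ /system/glusterd/vols/
# then start glusterd again and check the volume status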

Regards
Abhishek Paliwal
___
Gluster-users mailing list
Gluster-users@gluster.org
http://lists.gluster.org/mailman/listinfo/gluster-users

Re: [Gluster-users] Quota limits gone after upgrading to 3.8

2017-05-08 Thread Sanoj Unnikrishnan
Hi mabi,

This bug was fixed recently: https://bugzilla.redhat.com/show_bug.cgi?id=1414346.
The fix will be available in the 3.11 release, and I plan to backport it to
earlier releases.

Your quota limits are still set and honored; it is only the listing that
has gone wrong. Running the list command with a single path should display
the limit on that path. The printing of the list gets messed up when the
last gfid in the quota.conf file is not present in the filesystem (due to
an rmdir without a corresponding remove-limit).

You could use the following workaround to get rid of the issue:
 => Remove exactly the last 17 bytes of
    "/var/lib/glusterd/vols/<volname>/quota.conf"
  Note: keep a backup of quota.conf for safety
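
A sketch of those steps (<volname> is a placeholder for your volume name):

cd /var/lib/glusterd/vols/<volname>
cp quota.conf quota.conf.backup     # safety copy first
truncate -s -17 quota.conf          # drop exactly the last 17 bytes

and then re-check with the list command.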

If this does not solve the issue, please reply with:
1) the quota.conf file
2) the output of the list command (when executed with a path)
3) getfattr -d -m . -e hex  | grep limit

It would be great to have your feedback for quota on this thread (
http://lists.gluster.org/pipermail/gluster-users/2017-April/030676.html)

Thanks & Regards,
Sanoj


On Mon, May 8, 2017 at 7:58 PM, mabi  wrote:

> Hello,
>
> I upgraded last week my 2 nodes replica GlusterFS cluster from 3.7.20 to
> 3.8.11 and on one of the volumes I use the quota feature of GlusterFS.
> Unfortunately, I just noticed by using the usual command "gluster volume
> quota myvolume list" that all my quotas on that volume are gone. I had
> around 10 different quotas set on different directories.
>
> Does anyone have an idea where the quotas have vanished? are they gone for
> always and do I need to re-set them all?
>
> Regards,
> M.
>
>
>
> ___
> Gluster-users mailing list
> Gluster-users@gluster.org
> http://lists.gluster.org/mailman/listinfo/gluster-users
>
___
Gluster-users mailing list
Gluster-users@gluster.org
http://lists.gluster.org/mailman/listinfo/gluster-users

Re: [Gluster-users] disperse volume brick counts limits in RHES

2017-05-08 Thread Alastair Neil
so the bottleneck is that computations with a 16x20 matrix require ~4 times
the cycles? It seems then that there is ample room for improvement, as
there are many linear algebra packages out there that scale better than
O(nxm).  Is the healing time dominated by the EC compute time?  If Serkan
saw a hard 2x scaling then it seems likely.

-Alastair




On 8 May 2017 at 03:02, Xavier Hernandez  wrote:

> On 05/05/17 13:49, Pranith Kumar Karampuri wrote:
>
>>
>>
>> On Fri, May 5, 2017 at 2:38 PM, Serkan Çoban > > wrote:
>>
>> It is the over all time, 8TB data disk healed 2x faster in 8+2
>> configuration.
>>
>>
>> Wow, that is counter intuitive for me. I will need to explore about this
>> to find out why that could be. Thanks a lot for this feedback!
>>
>
> Matrix multiplication for encoding/decoding of 8+2 is 4 times faster than
> 16+4 (one matrix of 16x16 is composed by 4 submatrices of 8x8), however
> each matrix operation on a 16+4 configuration takes twice the amount of
> data of a 8+2, so net effect is that 8+2 is twice as fast as 16+4.
>
> An 8+2 also uses bigger blocks on each brick, processing the same amount
> of data in less I/O operations and bigger network packets.
>
> Probably these are the reasons why 16+4 is slower than 8+2.
>
> See my other email for more detailed description.
>
> Xavi
>
>
>>
>>
>> On Fri, May 5, 2017 at 10:00 AM, Pranith Kumar Karampuri
>> mailto:pkara...@redhat.com>> wrote:
>> >
>> >
>> > On Fri, May 5, 2017 at 11:42 AM, Serkan Çoban
>> mailto:cobanser...@gmail.com>> wrote:
>> >>
>> >> Healing gets slower as you increase m in m+n configuration.
>> >> We are using 16+4 configuration without any problems other then
>> heal
>> >> speed.
>> >> I tested heal speed with 8+2 and 16+4 on 3.9.0 and see that heals
>> on
>> >> 8+2 is faster by 2x.
>> >
>> >
>> > As you increase number of nodes that are participating in an EC
>> set number
>> > of parallel heals increase. Is the heal speed you saw improved per
>> file or
>> > the over all time it took to heal the data?
>> >
>> >>
>> >>
>> >>
>> >> On Fri, May 5, 2017 at 9:04 AM, Ashish Pandey
>> mailto:aspan...@redhat.com>> wrote:
>> >> >
>> >> > 8+2 and 8+3 configurations are not the limitation but just
>> suggestions.
>> >> > You can create 16+3 volume without any issue.
>> >> >
>> >> > Ashish
>> >> >
>> >> > 
>> >> > From: "Alastair Neil" > >
>> >> > To: "gluster-users" > >
>> >> > Sent: Friday, May 5, 2017 2:23:32 AM
>> >> > Subject: [Gluster-users] disperse volume brick counts limits in
>> RHES
>> >> >
>> >> >
>> >> > Hi
>> >> >
>> >> > we are deploying a large (24node/45brick) cluster and noted
>> that the
>> >> > RHES
>> >> > guidelines limit the number of data bricks in a disperse set to
>> 8.  Is
>> >> > there
>> >> > any reason for this.  I am aware that you want this to be a
>> power of 2,
>> >> > but
>> >> > as we have a large number of nodes we were planning on going
>> with 16+3.
>> >> > Dropping to 8+2 or 8+3 will be a real waste for us.
>> >> >
>> >> > Thanks,
>> >> >
>> >> >
>> >> > Alastair
>> >> >
>> >> >
>> >> > ___
>> >> > Gluster-users mailing list
>> >> > Gluster-users@gluster.org 
>> >> > http://lists.gluster.org/mailman/listinfo/gluster-users
>> 
>> >> >
>> >> >
>> >> > ___
>> >> > Gluster-users mailing list
>> >> > Gluster-users@gluster.org 
>> >> > http://lists.gluster.org/mailman/listinfo/gluster-users
>> 
>> >> ___
>> >> Gluster-users mailing list
>> >> Gluster-users@gluster.org 
>> >> http://lists.gluster.org/mailman/listinfo/gluster-users
>> 
>> >
>> >
>> >
>> >
>> > --
>> > Pranith
>>
>>
>>
>>
>> --
>> Pranith
>>
>>
>> ___
>> Gluster-users mailing list
>> Gluster-users@gluster.org
>> http://lists.gluster.org/mailman/listinfo/gluster-users
>>
>>
> ___
> Gluster-users mailing list
> Gluster-users@gluster.org
> http://lists.gluster.org/mailman/listinfo/gluster-users
>
___
Gluster-users mailing list
Gluster-users@gluster.org
http://lists.gluster.org/mailman/listinfo/gluster-users

Re: [Gluster-users] Elasticsearch facing CorruptIndexException exception with GlusterFs 3.10.1

2017-05-08 Thread Vijay Bellur
On Mon, May 8, 2017 at 1:46 PM, Abhijit Paul 
wrote:

> *@Prasanna & @Pranith & @Keithley *Thank you very much for this update,
> let me try out this with build frm src and then use with Elasticsearch in
> kubernetes environment, i will let you know the update.
>
> My hole intention to use this gluster-block solution is to *avoid
> Elasticsearch index health turn RED issue due to CorruptIndex* issue by
> using GlusterFS & FUSE, *on this regard if any further pointer or
> forwards are there will relay appreciate.*
>
>>
>>
>>
>
We expect gluster's block interface to not have the same problem as
encountered by the file system interface with Elasticsearch.  Our limited
testing validates that expectation as outlined in Prasanna's blog post.

Feedback from your testing would be very welcome! Please feel free to let
us know if you require any help in getting the deployment working.

Thanks!
Vijay
___
Gluster-users mailing list
Gluster-users@gluster.org
http://lists.gluster.org/mailman/listinfo/gluster-users

[Gluster-users] Release 3.11: RC0 has been tagged

2017-05-08 Thread Shyam

Hi,

Pending features for 3.11 have been merged (and those that did not make 
it have been moved out of the 3.11 release window). This has led to the 
creation of the 3.11 RC0 tag in the gluster repositories.


Packagers have been notified via mail, and packages for the different 
distributions will be made available soon.


We would like, at this point of the release, to encourage users and the 
development community to *test 3.11* and provide feedback on the lists, 
or raise bugs [1].


If any bug you raise is a blocker for the release, please add it to the 
release tracker as well [2].


The scratch version of the release notes can be found here [3], and we 
request all developers who added features to 3.11 to send in their 
respective commits for updating the release notes with the required 
information (please use the same github issue# as the feature when 
posting commits against the release notes; that way the issue also gets 
updated with a reference to the commit).


This is also a good time for developers to edit gluster documentation, 
to add details regarding the features added to 3.11 [4].


Thanks,
Shyam and Kaushal

[1] File a bug: https://bugzilla.redhat.com/enter_bug.cgi?product=GlusterFS

[2] Tracker BZ for 3.11.0 blockers: 
https://bugzilla.redhat.com/show_bug.cgi?id=glusterfs-3.11.0


[3] Release notes: 
https://github.com/gluster/glusterfs/blob/release-3.11/doc/release-notes/3.11.0.md


[4] Gluster documentation repository: https://github.com/gluster/glusterdocs
___
Gluster-users mailing list
Gluster-users@gluster.org
http://lists.gluster.org/mailman/listinfo/gluster-users


Re: [Gluster-users] Elasticsearch facing CorruptIndexException exception with GlusterFs 3.10.1

2017-05-08 Thread Kaleb S. KEITHLEY

On 05/08/2017 05:32 AM, Pranith Kumar Karampuri wrote:
  We released gluster-block v0.2 just this Friday for which RHEL 
packages are yet to be built.


+Kaleb,
  Could you help with this please?


Of course this doesn't build in EPEL, where glusterfs-api-devel is not 
available. EPEL does not have packages that exist in RHEL; RHEL has 
glusterfs, but not glusterfs-api-devel, which is in a channel other than 
the base for reasons I don't understand.


There's no way to do scratch builds in CentOS's Storage SIG. We can add 
it to the Storage SIG after it has been accepted into Fedora. I asked 
previously, but didn't get an answer, about the status of getting this 
into Fedora.


The best option ATM is to do a mockchain build — which anyone can do — 
using the glusterfs src.rpm, e.g. from the Storage SIG, and the 
gluster-block src.rpm.




Prasanna,
  Could you let Abhijit know the rpm versions for tcmu-runner and 
other packages so that this feature can be used?



On Mon, May 8, 2017 at 2:49 PM, Pranith Kumar Karampuri 
mailto:pkara...@redhat.com>> wrote:


Wait, wait we are discussing this issue only. Expect a reply in some
time :-)

On Mon, May 8, 2017 at 2:19 PM, Abhijit Paul
mailto:er.abhijitp...@gmail.com>> wrote:

poking for previous mail reply

On Sun, May 7, 2017 at 1:06 AM, Abhijit Paul
mailto:er.abhijitp...@gmail.com>> wrote:


https://pkalever.wordpress.com/2017/03/14/elasticsearch-with-gluster-block/


The tested environment used there is Fedora,
but I am using RHEL-based Oracle Linux, so is gluster-block
compatible with RHEL as well? What do I need to change to make
it work?

On Fri, May 5, 2017 at 5:42 PM, Pranith Kumar Karampuri
mailto:pkara...@redhat.com>> wrote:



On Fri, May 5, 2017 at 5:40 PM, Pranith Kumar Karampuri
mailto:pkara...@redhat.com>> wrote:



On Fri, May 5, 2017 at 5:36 PM, Abhijit Paul
mailto:er.abhijitp...@gmail.com>> wrote:

So should i start using gluster-block with
elasticsearch in kubernetes environment?

My expectation from gluster-block is, it should
not CorruptIndex of elasticsearch...and issue
facing in previous mails.

Please let me know whether should i
processed with above mentioned combination.


We are still in the process of fixing the failure
scenarios of tcmu-runner dying and failingover in
the multipath scenarios.


Prasanna did test that elasticsearch itself worked fine
in gluster-block environment when all the machines are
up etc i.e. success path. We are doing failure path
testing and fixing things at the moment.


On Fri, May 5, 2017 at 5:06 PM, Pranith Kumar
Karampuri mailto:pkara...@redhat.com>> wrote:

Abhijit we just started making the efforts
to get all of this stable.

On Fri, May 5, 2017 at 4:45 PM, Abhijit Paul
mailto:er.abhijitp...@gmail.com>> wrote:

I yet to try gluster-block with
elasticsearch...but carious to know does
this combination plays well in
kubernetes environment?

On Fri, May 5, 2017 at 12:14 PM, Abhijit
Paul mailto:er.abhijitp...@gmail.com>> wrote:

thanks Krutika for the alternative.

@*Prasanna @**Pranith*
I was going thorough the mentioned
blog post and saw that used tested
environment was Fedora ,
but i am using RHEL based Oracle
linux so does gluster-block
compatible with RHEL as well?

On Fri, May 5, 2017 at 12:03 PM,
Krutika Dhananjay
mailto:kdhan...@redhat.com>> wrote:

Yeah, there are a couple of
cache consistency issues with
performance translators that are
causing these exceptions.
Some of them were fixed by 3.10.1. Some still remain.

Re: [Gluster-users] Elasticsearch facing CorruptIndexException exception with GlusterFs 3.10.1

2017-05-08 Thread Abhijit Paul
*@Prasanna & @Pranith & @Keithley *Thank you very much for this update, let
me try out this with build frm src and then use with Elasticsearch in
kubernetes environment, i will let you know the update.

My hole intention to use this gluster-block solution is to *avoid
Elasticsearch index health turn RED issue due to CorruptIndex* issue by
using GlusterFS & FUSE, *on this regard if any further pointer or forwards
are there will relay appreciate.*

Regards,
Abhijit

On Mon, May 8, 2017 at 5:26 PM, Prasanna Kalever 
wrote:

> On Mon, May 8, 2017 at 3:02 PM, Pranith Kumar Karampuri
>  wrote:
> > Abhijit,
> >  We released gluster-block v0.2 just this Friday for which RHEL
> > packages are yet to be built.
> >
> > +Kaleb,
> >  Could you help with this please?
> >
> > Prasanna,
> >  Could you let Abhijit know the rpm versions for tcmu-runner and
> other
> > packages so that this feature can be used?
>
> Hi Abhijit,
>
> We will soon try to help you with the centos packages, but for time
> being you can try compile from sources, dependencies include
>
> gluster-block [1]: use v0.2, INSTALL guide [2]
> tcmu-runner [3] : >= 1.0.4 ( you can use current head for time being )
> targetcli [4]: >= 2.1.fb43 (try yum install, you should get this in
> RHEL, this pulls the required deps)
>
> [1] https://github.com/gluster/gluster-block
> [2] https://github.com/gluster/gluster-block/blob/master/INSTALL
> [3] https://github.com/open-iscsi/tcmu-runner
> [4] https://github.com/open-iscsi/targetcli-fb
>
>
> Note: gluster-block(server side) will be supported soon on RHEL 7.3 and +
> only.
>
> Cheers!
> --
> prasanna
>
>
>
>
> >
> >
> > On Mon, May 8, 2017 at 2:49 PM, Pranith Kumar Karampuri
> >  wrote:
> >>
> >> Wait, wait we are discussing this issue only. Expect a reply in some
> time
> >> :-)
> >>
> >> On Mon, May 8, 2017 at 2:19 PM, Abhijit Paul 
> >> wrote:
> >>>
> >>> poking for previous mail reply
> >>>
> >>> On Sun, May 7, 2017 at 1:06 AM, Abhijit Paul  >
> >>> wrote:
> 
> 
>  https://pkalever.wordpress.com/2017/03/14/elasticsearch-
> with-gluster-block/
>  here used tested environment is Fedora ,
>  but i am using RHEL based Oracle linux so does gluster-block
> compatible
>  with RHEL as well? What i needs to change & make it work?
> 
>  On Fri, May 5, 2017 at 5:42 PM, Pranith Kumar Karampuri
>   wrote:
> >
> >
> >
> > On Fri, May 5, 2017 at 5:40 PM, Pranith Kumar Karampuri
> >  wrote:
> >>
> >>
> >>
> >> On Fri, May 5, 2017 at 5:36 PM, Abhijit Paul
> >>  wrote:
> >>>
> >>> So should i start using gluster-block with elasticsearch in
> >>> kubernetes environment?
> >>>
> >>> My expectation from gluster-block is, it should not CorruptIndex of
> >>> elasticsearch...and issue facing in previous mails.
> >>>
> >>> Please let me know whether should i processed with above mentioned
> >>> combination.
> >>
> >>
> >> We are still in the process of fixing the failure scenarios of
> >> tcmu-runner dying and failingover in the multipath scenarios.
> >
> >
> > Prasanna did test that elasticsearch itself worked fine in
> > gluster-block environment when all the machines are up etc i.e.
> success
> > path. We are doing failure path testing and fixing things at the
> moment.
> >
> >>
> >>
> >>>
> >>>
> >>> On Fri, May 5, 2017 at 5:06 PM, Pranith Kumar Karampuri
> >>>  wrote:
> 
>  Abhijit we just started making the efforts to get all of this
>  stable.
> 
>  On Fri, May 5, 2017 at 4:45 PM, Abhijit Paul
>   wrote:
> >
> > I yet to try gluster-block with elasticsearch...but carious to
> know
> > does this combination plays well in kubernetes environment?
> >
> > On Fri, May 5, 2017 at 12:14 PM, Abhijit Paul
> >  wrote:
> >>
> >> thanks Krutika for the alternative.
> >>
> >> @Prasanna @Pranith
> >> I was going thorough the mentioned blog post and saw that used
> >> tested environment was Fedora ,
> >> but i am using RHEL based Oracle linux so does gluster-block
> >> compatible with RHEL as well?
> >>
> >> On Fri, May 5, 2017 at 12:03 PM, Krutika Dhananjay
> >>  wrote:
> >>>
> >>> Yeah, there are a couple of cache consistency issues with
> >>> performance translators that are causing these exceptions.
> >>> Some of them were fixed by 3.10.1. Some still remain.
> >>>
> >>> Alternatively you can give gluster-block + elasticsearch a try,
> >>> which doesn't require solving all these caching issues.
> >>> Here's a blog post on the same -
> >>> https://pkalever.wordpress.com/2017/03/14/elasticsearch-
> with-gluster-block/
> >>>
> >>> Adding Prasanna and Pranith who worked on this, in case you

[Gluster-users] Quota limits gone after upgrading to 3.8

2017-05-08 Thread mabi
Hello,

I upgraded last week my 2 nodes replica GlusterFS cluster from 3.7.20 to 3.8.11 
and on one of the volumes I use the quota feature of GlusterFS. Unfortunately, 
I just noticed by using the usual command "gluster volume quota myvolume list" 
that all my quotas on that volume are gone. I had around 10 different quotas 
set on different directories.

Does anyone have an idea where the quotas have vanished? are they gone for 
always and do I need to re-set them all?

Regards,
M.___
Gluster-users mailing list
Gluster-users@gluster.org
http://lists.gluster.org/mailman/listinfo/gluster-users

Re: [Gluster-users] VM going down

2017-05-08 Thread Alessandro Briosi
On 08/05/2017 12:38, Krutika Dhananjay wrote:
> The newly introduced "SEEK" fop seems to be failing at the bricks.
>
> Adding Niels for his inputs/help.
>

I don't know if this is related, though: the SEEK is done only when the VM
is started, not when it suddenly shuts down.
And though it's an odd message (the file really is there), the VM starts
correctly.

Alessandro
___
Gluster-users mailing list
Gluster-users@gluster.org
http://lists.gluster.org/mailman/listinfo/gluster-users


Re: [Gluster-users] VM going down

2017-05-08 Thread Alessandro Briosi
On 08/05/2017 12:57, Jesper Led Lauridsen TS Infra server wrote:
>
> I dont know if this has any relation to you issue. But I have seen
> several times during gluster healing that my wm’s fail or are marked
> unresponsive in rhev. My conclusion is that the load gluster puts on
> the wm-images during checksum while healing, result in to much latency
> and wm’s fail.
>
>  
>
> My plans is to try using sharding, so the wm-images/files are split
> into smaller files, changing the number of allowed concurrent heals
> ‘cluster.background-self-heal-count’ and disabling
> ‘cluster.self-heal-daemon’.
>

The thing is that there are no heal processes running, and no log entries
either.
A few days ago I had a failure and the heal process started and finished
without any problems.

I do not use sharding yet.

Alessandro
___
Gluster-users mailing list
Gluster-users@gluster.org
http://lists.gluster.org/mailman/listinfo/gluster-users

Re: [Gluster-users] Bad perf for small files on large EC volume

2017-05-08 Thread Serkan Çoban
There are 300M files, right? Am I counting wrong?
With that file profile I would never use EC in the first place.
Maybe you can pack the files into tar archives or similar before
migrating to gluster?
It will take ages to heal a drive with that file count...
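
Something along these lines, done on the source side before copying to the
gluster mount, would keep the file count on the volume manageable (the
paths here are only an example):

# pack each top-level source directory into a single tar file on the volume
for d in /source/data/*/ ; do
    tar -cf /mnt/dfs-archive-001/"$(basename "$d")".tar -C "$d" .
done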

On Mon, May 8, 2017 at 3:59 PM, Ingard Mevåg  wrote:
> With attachments :)
>
> 2017-05-08 14:57 GMT+02:00 Ingard Mevåg :
>>
>> Hi
>>
>> We've got 3 servers with 60 drives each setup with an EC volume running on
>> gluster 3.10.0
>> The servers are connected via 10gigE.
>>
>> We've done the changes recommended here :
>> https://bugzilla.redhat.com/show_bug.cgi?id=1349953#c17 and we're able to
>> max out the network with the iozone tests referenced in the same ticket.
>>
>> However for small files we are getting 3-5 MB/s with the smallfile_cli.py
>> tool. For instance:
>> python smallfile_cli.py --operation create --threads 32 --file-size 100
>> --files 1000 --top /tmp/dfs-archive-001/
>> .
>> .
>> total threads = 32
>> total files = 31294
>> total data = 2.984 GB
>>  97.79% of requested files processed, minimum is  90.00
>> 785.542908 sec elapsed time
>> 39.837416 files/sec
>> 39.837416 IOPS
>> 3.890373 MB/sec
>> .
>>
>> We're going to use these servers for archive purposes, so the files will
>> be moved there and accessed very little. After noticing our migration tool
>> performing very badly we did some analyses on the data actually being moved
>> :
>>
>> Bucket 31808791 (16.27 GB) :: 0 bytes - 1.00 KB
>> Bucket 49448258 (122.89 GB) :: 1.00 KB - 5.00 KB
>> Bucket 13382242 (96.92 GB) :: 5.00 KB - 10.00 KB
>> Bucket 13557684 (195.15 GB) :: 10.00 KB - 20.00 KB
>> Bucket 22735245 (764.96 GB) :: 20.00 KB - 50.00 KB
>> Bucket 15101878 (1041.56 GB) :: 50.00 KB - 100.00 KB
>> Bucket 10734103 (1558.35 GB) :: 100.00 KB - 200.00 KB
>> Bucket 17695285 (5773.74 GB) :: 200.00 KB - 500.00 KB
>> Bucket 13632394 (10039.92 GB) :: 500.00 KB - 1.00 MB
>> Bucket 21815815 (32641.81 GB) :: 1.00 MB - 2.00 MB
>> Bucket 36940815 (117683.33 GB) :: 2.00 MB - 5.00 MB
>> Bucket 13580667 (91899.10 GB) :: 5.00 MB - 10.00 MB
>> Bucket 10945768 (232316.33 GB) :: 10.00 MB - 50.00 MB
>> Bucket 1723848 (542581.89 GB) :: 50.00 MB - 9223372036.85 GB
>>
>> So it turns out we've got a very large number of very small files being
>> written to this volume.
>> I've attached the volume config and 2 profiling runs so if someone wants
>> to take a look and maybe give us some hints in terms of what volume settings
>> will be best for writing a lot of small files that would be much
>> appreciated.
>>
>> kind regards
>> ingard
>
>
>
>
> ___
> Gluster-users mailing list
> Gluster-users@gluster.org
> http://lists.gluster.org/mailman/listinfo/gluster-users
___
Gluster-users mailing list
Gluster-users@gluster.org
http://lists.gluster.org/mailman/listinfo/gluster-users

[Gluster-users] Bad perf for small files on large EC volume

2017-05-08 Thread Ingard Mevåg
Hi

We've got 3 servers with 60 drives each setup with an EC volume running on
gluster 3.10.0
The servers are connected via 10gigE.

We've done the changes recommended here :
https://bugzilla.redhat.com/show_bug.cgi?id=1349953#c17 and we're able to
max out the network with the iozone tests referenced in the same ticket.

However for small files we are getting 3-5 MB/s with the smallfile_cli.py
tool. For instance:
python smallfile_cli.py --operation create --threads 32 --file-size 100
--files 1000 --top /tmp/dfs-archive-001/
.
.
total threads = 32
total files = 31294
total data = 2.984 GB
 97.79% of requested files processed, minimum is  90.00
785.542908 sec elapsed time
39.837416 files/sec
39.837416 IOPS
3.890373 MB/sec
.

We're going to use these servers for archive purposes, so the files will be
moved there and accessed very little. After noticing our migration tool
performing very badly we did some analyses on the data actually being moved
:

Bucket 31808791 (16.27 GB) :: 0 bytes - 1.00 KB
Bucket 49448258 (122.89 GB) :: 1.00 KB - 5.00 KB
Bucket 13382242 (96.92 GB) :: 5.00 KB - 10.00 KB
Bucket 13557684 (195.15 GB) :: 10.00 KB - 20.00 KB
Bucket 22735245 (764.96 GB) :: 20.00 KB - 50.00 KB
Bucket 15101878 (1041.56 GB) :: 50.00 KB - 100.00 KB
Bucket 10734103 (1558.35 GB) :: 100.00 KB - 200.00 KB
Bucket 17695285 (5773.74 GB) :: 200.00 KB - 500.00 KB
Bucket 13632394 (10039.92 GB) :: 500.00 KB - 1.00 MB
Bucket 21815815 (32641.81 GB) :: 1.00 MB - 2.00 MB
Bucket 36940815 (117683.33 GB) :: 2.00 MB - 5.00 MB
Bucket 13580667 (91899.10 GB) :: 5.00 MB - 10.00 MB
Bucket 10945768 (232316.33 GB) :: 10.00 MB - 50.00 MB
Bucket 1723848 (542581.89 GB) :: 50.00 MB - 9223372036.85 GB

So it turns out we've got a very large number of very small files being
written to this volume.
I've attached the volume config and 2 profiling runs so if someone wants to
take a look and maybe give us some hints in terms of what volume settings
will be best for writing a lot of small files that would be much
appreciated.

kind regards
ingard
___
Gluster-users mailing list
Gluster-users@gluster.org
http://lists.gluster.org/mailman/listinfo/gluster-users

Re: [Gluster-users] Elasticsearch facing CorruptIndexException exception with GlusterFs 3.10.1

2017-05-08 Thread Kaleb S. KEITHLEY

On 05/08/2017 05:32 AM, Pranith Kumar Karampuri wrote:

Abhijit,
  We released gluster-block v0.2 just this Friday for which RHEL 
packages are yet to be built.


+Kaleb,
  Could you help with this please?

Prasanna,
  Could you let Abhijit know the rpm versions for tcmu-runner and 
other packages so that this feature can be used?




For RHEL 7?

Do you want CentOS Storage SIG packages?

And what is the status of getting this into Fedora, which could include 
EPEL (instead of CentOS Storage SIG)?


In the mean time I can try scratch builds in Fedora/EPEL using the 
src.rpm from the COPR build. If there are no dependency issues that 
should work. For now.


--

Kaleb


___
Gluster-users mailing list
Gluster-users@gluster.org
http://lists.gluster.org/mailman/listinfo/gluster-users


Re: [Gluster-users] Elasticsearch facing CorruptIndexException exception with GlusterFs 3.10.1

2017-05-08 Thread Prasanna Kalever
On Mon, May 8, 2017 at 3:02 PM, Pranith Kumar Karampuri
 wrote:
> Abhijit,
>  We released gluster-block v0.2 just this Friday for which RHEL
> packages are yet to be built.
>
> +Kaleb,
>  Could you help with this please?
>
> Prasanna,
>  Could you let Abhijit know the rpm versions for tcmu-runner and other
> packages so that this feature can be used?

Hi Abhijit,

We will soon try to help you with the CentOS packages, but for the time
being you can try compiling from sources; the dependencies include:

gluster-block [1]: use v0.2, INSTALL guide [2]
tcmu-runner [3] : >= 1.0.4 ( you can use current head for time being )
targetcli [4]: >= 2.1.fb43 (try yum install, you should get this in
RHEL, this pulls the required deps)

[1] https://github.com/gluster/gluster-block
[2] https://github.com/gluster/gluster-block/blob/master/INSTALL
[3] https://github.com/open-iscsi/tcmu-runner
[4] https://github.com/open-iscsi/targetcli-fb


Note: gluster-block (server side) will soon be supported only on RHEL 7.3 and above.
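
To keep the source steps concrete (the commands below only restate the
pointers above; the INSTALL guide [2] is the authoritative reference for
the actual build and install steps):

# targetcli from the distro repositories (pulls the required deps)
yum install targetcli
# gluster-block v0.2 from source
git clone https://github.com/gluster/gluster-block.git
cd gluster-block
git checkout v0.2
# then build and install as described in INSTALL [2]
# (tcmu-runner >= 1.0.4 is built separately from [3])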

Cheers!
--
prasanna




>
>
> On Mon, May 8, 2017 at 2:49 PM, Pranith Kumar Karampuri
>  wrote:
>>
>> Wait, wait we are discussing this issue only. Expect a reply in some time
>> :-)
>>
>> On Mon, May 8, 2017 at 2:19 PM, Abhijit Paul 
>> wrote:
>>>
>>> poking for previous mail reply
>>>
>>> On Sun, May 7, 2017 at 1:06 AM, Abhijit Paul 
>>> wrote:


 https://pkalever.wordpress.com/2017/03/14/elasticsearch-with-gluster-block/
 here used tested environment is Fedora ,
 but i am using RHEL based Oracle linux so does gluster-block compatible
 with RHEL as well? What i needs to change & make it work?

 On Fri, May 5, 2017 at 5:42 PM, Pranith Kumar Karampuri
  wrote:
>
>
>
> On Fri, May 5, 2017 at 5:40 PM, Pranith Kumar Karampuri
>  wrote:
>>
>>
>>
>> On Fri, May 5, 2017 at 5:36 PM, Abhijit Paul
>>  wrote:
>>>
>>> So should i start using gluster-block with elasticsearch in
>>> kubernetes environment?
>>>
>>> My expectation from gluster-block is, it should not CorruptIndex of
>>> elasticsearch...and issue facing in previous mails.
>>>
>>> Please let me know whether should i processed with above mentioned
>>> combination.
>>
>>
>> We are still in the process of fixing the failure scenarios of
>> tcmu-runner dying and failingover in the multipath scenarios.
>
>
> Prasanna did test that elasticsearch itself worked fine in
> gluster-block environment when all the machines are up etc i.e. success
> path. We are doing failure path testing and fixing things at the moment.
>
>>
>>
>>>
>>>
>>> On Fri, May 5, 2017 at 5:06 PM, Pranith Kumar Karampuri
>>>  wrote:

 Abhijit we just started making the efforts to get all of this
 stable.

 On Fri, May 5, 2017 at 4:45 PM, Abhijit Paul
  wrote:
>
> I yet to try gluster-block with elasticsearch...but carious to know
> does this combination plays well in kubernetes environment?
>
> On Fri, May 5, 2017 at 12:14 PM, Abhijit Paul
>  wrote:
>>
>> thanks Krutika for the alternative.
>>
>> @Prasanna @Pranith
>> I was going thorough the mentioned blog post and saw that used
>> tested environment was Fedora ,
>> but i am using RHEL based Oracle linux so does gluster-block
>> compatible with RHEL as well?
>>
>> On Fri, May 5, 2017 at 12:03 PM, Krutika Dhananjay
>>  wrote:
>>>
>>> Yeah, there are a couple of cache consistency issues with
>>> performance translators that are causing these exceptions.
>>> Some of them were fixed by 3.10.1. Some still remain.
>>>
>>> Alternatively you can give gluster-block + elasticsearch a try,
>>> which doesn't require solving all these caching issues.
>>> Here's a blog post on the same -
>>> https://pkalever.wordpress.com/2017/03/14/elasticsearch-with-gluster-block/
>>>
>>> Adding Prasanna and Pranith who worked on this, in case you need
>>> more info on this.
>>>
>>> -Krutika
>>>
>>> On Fri, May 5, 2017 at 12:15 AM, Abhijit Paul
>>>  wrote:

 Thanks for the reply, i will try it out but i am also facing one
 more issue "i.e. replicated volumes returning different timestamps"
 so is this because of Bug 1426548 - Openshift Logging
 ElasticSearch FSLocks when using GlusterFS storage backend ?

 FYI i am using glusterfs 3.10.1 tar.gz

 Regards,
 Abhijit



 On Thu, May 4, 2017 at 10:58 PM, Amar Tumballi
  wrote:
>
>
>
> On Thu, May 4, 2017 at 10:41 P

Re: [Gluster-users] VM going down

2017-05-08 Thread Jesper Led Lauridsen TS Infra server
I don't know if this has any relation to your issue, but I have seen several 
times during gluster healing that my VMs fail or are marked unresponsive in 
RHEV. My conclusion is that the load gluster puts on the VM images during 
checksumming while healing results in too much latency, and the VMs fail.

My plan is to try using sharding, so the VM images/files are split into 
smaller files, to change the number of allowed concurrent heals 
('cluster.background-self-heal-count'), and to disable 'cluster.self-heal-daemon'.
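
Concretely, I am thinking of something like this (volume name is a
placeholder and the exact values are still to be decided):

gluster volume set <volname> features.shard on
gluster volume set <volname> cluster.background-self-heal-count 4
gluster volume set <volname> cluster.self-heal-daemon off

(As far as I know, sharding only applies to files created after it is
enabled, so existing images would need to be copied again.)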

/Jesper

From: gluster-users-boun...@gluster.org
[mailto:gluster-users-boun...@gluster.org] On behalf of Krutika Dhananjay
Sent: 8 May 2017 12:38
To: Alessandro Briosi ; de Vos, Niels
Cc: gluster-users
Subject: Re: [Gluster-users] VM going down

The newly introduced "SEEK" fop seems to be failing at the bricks.
Adding Niels for his inputs/help.

-Krutika

On Mon, May 8, 2017 at 3:43 PM, Alessandro Briosi
<a...@metalit.com> wrote:
Hi all,
I have sporadic VM going down which files are on gluster FS.

If I look at the gluster logs the only events I find are:
/var/log/glusterfs/bricks/data-brick2-brick.log

[2017-05-08 09:51:17.661697] I [MSGID: 115036]
[server.c:548:server_rpc_notify] 0-datastore2-server: disconnecting
connection from
srvpve2-9074-2017/05/04-14:12:53:301448-datastore2-client-0-0-0
[2017-05-08 09:51:17.661697] I [MSGID: 115036]
[server.c:548:server_rpc_notify] 0-datastore2-server: disconnecting
connection from
srvpve2-9074-2017/05/04-14:12:53:367950-datastore2-client-0-0-0
[2017-05-08 09:51:17.661810] W [inodelk.c:399:pl_inodelk_log_cleanup]
0-datastore2-server: releasing lock on
66d9eefb-ee55-40ad-9f44-c55d1e809006 held by {client=0x7f4c7c004880,
pid=0 lk-owner=5c7099efc97f}
[2017-05-08 09:51:17.661810] W [inodelk.c:399:pl_inodelk_log_cleanup]
0-datastore2-server: releasing lock on
a8d82b3d-1cf9-45cf-9858-d8546710b49c held by {client=0x7f4c840f31d0,
pid=0 lk-owner=5c7019fac97f}
[2017-05-08 09:51:17.661835] I [MSGID: 115013]
[server-helpers.c:293:do_fd_cleanup] 0-datastore2-server: fd cleanup on
/images/201/vm-201-disk-2.qcow2
[2017-05-08 09:51:17.661838] I [MSGID: 115013]
[server-helpers.c:293:do_fd_cleanup] 0-datastore2-server: fd cleanup on
/images/201/vm-201-disk-1.qcow2
[2017-05-08 09:51:17.661953] I [MSGID: 101055]
[client_t.c:415:gf_client_unref] 0-datastore2-server: Shutting down
connection srvpve2-9074-2017/05/04-14:12:53:301448-datastore2-client-0-0-0
[2017-05-08 09:51:17.661953] I [MSGID: 101055]
[client_t.c:415:gf_client_unref] 0-datastore2-server: Shutting down
connection srvpve2-9074-2017/05/04-14:12:53:367950-datastore2-client-0-0-0
[2017-05-08 10:01:06.210392] I [MSGID: 115029]
[server-handshake.c:692:server_setvolume] 0-datastore2-server: accepted
client from
srvpve2-162483-2017/05/08-10:01:06:189720-datastore2-client-0-0-0
(version: 3.8.11)
[2017-05-08 10:01:06.237433] E [MSGID: 113107] [posix.c:1079:posix_seek]
0-datastore2-posix: seek failed on fd 18 length 42957209600 [No such
device or address]
[2017-05-08 10:01:06.237463] E [MSGID: 115089]
[server-rpc-fops.c:2007:server_seek_cbk] 0-datastore2-server: 18: SEEK-2
(a8d82b3d-1cf9-45cf-9858-d8546710b49c) ==> (No such device or address)
[No such device or address]
[2017-05-08 10:01:07.019974] I [MSGID: 115029]
[server-handshake.c:692:server_setvolume] 0-datastore2-server: accepted
client from
srvpve2-162483-2017/05/08-10:01:07:3687-datastore2-client-0-0-0
(version: 3.8.11)
[2017-05-08 10:01:07.041967] E [MSGID: 113107] [posix.c:1079:posix_seek]
0-datastore2-posix: seek failed on fd 19 length 859136720896 [No such
device or address]
[2017-05-08 10:01:07.041992] E [MSGID: 115089]
[server-rpc-fops.c:2007:server_seek_cbk] 0-datastore2-server: 18: SEEK-2
(66d9eefb-ee55-40ad-9f44-c55d1e809006) ==> (No such device or address)
[No such device or address]

The strange part is that I cannot seem to find any other error.
If I restart the VM everything works as expected (it stopped at ~9.51
UTC and was started at ~10.01 UTC) .

This is not the first time that this happened, and I do not see any
problems with networking or the hosts.

Gluster version is 3.8.11
this is the incriminated volume (though it happened on a different one too)

Volume Name: datastore2
Type: Replicate
Volume ID: c95ebb5f-6e04-4f09-91b9-bbbe63d83aea
Status: Started
Snapshot Count: 0
Number of Bricks: 1 x (2 + 1) = 3
Transport-type: tcp
Bricks:
Brick1: srvpve2g:/data/brick2/brick
Brick2: srvpve3g:/data/brick2/brick
Brick3: srvpve1g:/data/brick2/brick (arbiter)
Options Reconfigured:
nfs.disable: on
performance.readdir-ahead: on
transport.address-family: inet

Any hint on how to dig more deeply into the reason would be greatly
appreciated.

Alessandro
___
Gluster-users mailing list
Gluster-users@gluster.org
http://lists.gluster.org/mailman/listinfo/gluster-users

___
Gluster-users mailing list
Gluster-users@gluster.org
http://lists.gluster.org/mailman/listinfo/gluster-users

Re: [Gluster-users] VM going down

2017-05-08 Thread Krutika Dhananjay
The newly introduced "SEEK" fop seems to be failing at the bricks.

Adding Niels for his inputs/help.
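
In the meantime, raising the log verbosity may capture more context around
the next failure; for example (volume name is a placeholder):

gluster volume set <volname> diagnostics.brick-log-level DEBUG
gluster volume set <volname> diagnostics.client-log-level DEBUG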

-Krutika

On Mon, May 8, 2017 at 3:43 PM, Alessandro Briosi  wrote:

> Hi all,
> I have sporadic VM going down which files are on gluster FS.
>
> If I look at the gluster logs the only events I find are:
> /var/log/glusterfs/bricks/data-brick2-brick.log
>
> [2017-05-08 09:51:17.661697] I [MSGID: 115036]
> [server.c:548:server_rpc_notify] 0-datastore2-server: disconnecting
> connection from
> srvpve2-9074-2017/05/04-14:12:53:301448-datastore2-client-0-0-0
> [2017-05-08 09:51:17.661697] I [MSGID: 115036]
> [server.c:548:server_rpc_notify] 0-datastore2-server: disconnecting
> connection from
> srvpve2-9074-2017/05/04-14:12:53:367950-datastore2-client-0-0-0
> [2017-05-08 09:51:17.661810] W [inodelk.c:399:pl_inodelk_log_cleanup]
> 0-datastore2-server: releasing lock on
> 66d9eefb-ee55-40ad-9f44-c55d1e809006 held by {client=0x7f4c7c004880,
> pid=0 lk-owner=5c7099efc97f}
> [2017-05-08 09:51:17.661810] W [inodelk.c:399:pl_inodelk_log_cleanup]
> 0-datastore2-server: releasing lock on
> a8d82b3d-1cf9-45cf-9858-d8546710b49c held by {client=0x7f4c840f31d0,
> pid=0 lk-owner=5c7019fac97f}
> [2017-05-08 09:51:17.661835] I [MSGID: 115013]
> [server-helpers.c:293:do_fd_cleanup] 0-datastore2-server: fd cleanup on
> /images/201/vm-201-disk-2.qcow2
> [2017-05-08 09:51:17.661838] I [MSGID: 115013]
> [server-helpers.c:293:do_fd_cleanup] 0-datastore2-server: fd cleanup on
> /images/201/vm-201-disk-1.qcow2
> [2017-05-08 09:51:17.661953] I [MSGID: 101055]
> [client_t.c:415:gf_client_unref] 0-datastore2-server: Shutting down
> connection srvpve2-9074-2017/05/04-14:12:53:301448-datastore2-client-0-0-0
> [2017-05-08 09:51:17.661953] I [MSGID: 101055]
> [client_t.c:415:gf_client_unref] 0-datastore2-server: Shutting down
> connection srvpve2-9074-2017/05/04-14:12:53:367950-datastore2-client-0-0-0
> [2017-05-08 10:01:06.210392] I [MSGID: 115029]
> [server-handshake.c:692:server_setvolume] 0-datastore2-server: accepted
> client from
> srvpve2-162483-2017/05/08-10:01:06:189720-datastore2-client-0-0-0
> (version: 3.8.11)
> [2017-05-08 10:01:06.237433] E [MSGID: 113107] [posix.c:1079:posix_seek]
> 0-datastore2-posix: seek failed on fd 18 length 42957209600 [No such
> device or address]
> [2017-05-08 10:01:06.237463] E [MSGID: 115089]
> [server-rpc-fops.c:2007:server_seek_cbk] 0-datastore2-server: 18: SEEK-2
> (a8d82b3d-1cf9-45cf-9858-d8546710b49c) ==> (No such device or address)
> [No such device or address]
> [2017-05-08 10:01:07.019974] I [MSGID: 115029]
> [server-handshake.c:692:server_setvolume] 0-datastore2-server: accepted
> client from
> srvpve2-162483-2017/05/08-10:01:07:3687-datastore2-client-0-0-0
> (version: 3.8.11)
> [2017-05-08 10:01:07.041967] E [MSGID: 113107] [posix.c:1079:posix_seek]
> 0-datastore2-posix: seek failed on fd 19 length 859136720896 [No such
> device or address]
> [2017-05-08 10:01:07.041992] E [MSGID: 115089]
> [server-rpc-fops.c:2007:server_seek_cbk] 0-datastore2-server: 18: SEEK-2
> (66d9eefb-ee55-40ad-9f44-c55d1e809006) ==> (No such device or address)
> [No such device or address]
>
> The strange part is that I cannot seem to find any other error.
> If I restart the VM everything works as expected (it stopped at ~9.51
> UTC and was started at ~10.01 UTC) .
>
> This is not the first time that this happened, and I do not see any
> problems with networking or the hosts.
>
> Gluster version is 3.8.11
> this is the incriminated volume (though it happened on a different one too)
>
> Volume Name: datastore2
> Type: Replicate
> Volume ID: c95ebb5f-6e04-4f09-91b9-bbbe63d83aea
> Status: Started
> Snapshot Count: 0
> Number of Bricks: 1 x (2 + 1) = 3
> Transport-type: tcp
> Bricks:
> Brick1: srvpve2g:/data/brick2/brick
> Brick2: srvpve3g:/data/brick2/brick
> Brick3: srvpve1g:/data/brick2/brick (arbiter)
> Options Reconfigured:
> nfs.disable: on
> performance.readdir-ahead: on
> transport.address-family: inet
>
> Any hint on how to dig more deeply into the reason would be greatly
> appreciated.
>
> Alessandro
> ___
> Gluster-users mailing list
> Gluster-users@gluster.org
> http://lists.gluster.org/mailman/listinfo/gluster-users
>
___
Gluster-users mailing list
Gluster-users@gluster.org
http://lists.gluster.org/mailman/listinfo/gluster-users

[Gluster-users] VM going down

2017-05-08 Thread Alessandro Briosi
Hi all,
I have sporadic VM going down which files are on gluster FS.

If I look at the gluster logs the only events I find are:
/var/log/glusterfs/bricks/data-brick2-brick.log

[2017-05-08 09:51:17.661697] I [MSGID: 115036]
[server.c:548:server_rpc_notify] 0-datastore2-server: disconnecting
connection from
srvpve2-9074-2017/05/04-14:12:53:301448-datastore2-client-0-0-0
[2017-05-08 09:51:17.661697] I [MSGID: 115036]
[server.c:548:server_rpc_notify] 0-datastore2-server: disconnecting
connection from
srvpve2-9074-2017/05/04-14:12:53:367950-datastore2-client-0-0-0
[2017-05-08 09:51:17.661810] W [inodelk.c:399:pl_inodelk_log_cleanup]
0-datastore2-server: releasing lock on
66d9eefb-ee55-40ad-9f44-c55d1e809006 held by {client=0x7f4c7c004880,
pid=0 lk-owner=5c7099efc97f}
[2017-05-08 09:51:17.661810] W [inodelk.c:399:pl_inodelk_log_cleanup]
0-datastore2-server: releasing lock on
a8d82b3d-1cf9-45cf-9858-d8546710b49c held by {client=0x7f4c840f31d0,
pid=0 lk-owner=5c7019fac97f}
[2017-05-08 09:51:17.661835] I [MSGID: 115013]
[server-helpers.c:293:do_fd_cleanup] 0-datastore2-server: fd cleanup on
/images/201/vm-201-disk-2.qcow2
[2017-05-08 09:51:17.661838] I [MSGID: 115013]
[server-helpers.c:293:do_fd_cleanup] 0-datastore2-server: fd cleanup on
/images/201/vm-201-disk-1.qcow2
[2017-05-08 09:51:17.661953] I [MSGID: 101055]
[client_t.c:415:gf_client_unref] 0-datastore2-server: Shutting down
connection srvpve2-9074-2017/05/04-14:12:53:301448-datastore2-client-0-0-0
[2017-05-08 09:51:17.661953] I [MSGID: 101055]
[client_t.c:415:gf_client_unref] 0-datastore2-server: Shutting down
connection srvpve2-9074-2017/05/04-14:12:53:367950-datastore2-client-0-0-0
[2017-05-08 10:01:06.210392] I [MSGID: 115029]
[server-handshake.c:692:server_setvolume] 0-datastore2-server: accepted
client from
srvpve2-162483-2017/05/08-10:01:06:189720-datastore2-client-0-0-0
(version: 3.8.11)
[2017-05-08 10:01:06.237433] E [MSGID: 113107] [posix.c:1079:posix_seek]
0-datastore2-posix: seek failed on fd 18 length 42957209600 [No such
device or address]
[2017-05-08 10:01:06.237463] E [MSGID: 115089]
[server-rpc-fops.c:2007:server_seek_cbk] 0-datastore2-server: 18: SEEK-2
(a8d82b3d-1cf9-45cf-9858-d8546710b49c) ==> (No such device or address)
[No such device or address]
[2017-05-08 10:01:07.019974] I [MSGID: 115029]
[server-handshake.c:692:server_setvolume] 0-datastore2-server: accepted
client from
srvpve2-162483-2017/05/08-10:01:07:3687-datastore2-client-0-0-0
(version: 3.8.11)
[2017-05-08 10:01:07.041967] E [MSGID: 113107] [posix.c:1079:posix_seek]
0-datastore2-posix: seek failed on fd 19 length 859136720896 [No such
device or address]
[2017-05-08 10:01:07.041992] E [MSGID: 115089]
[server-rpc-fops.c:2007:server_seek_cbk] 0-datastore2-server: 18: SEEK-2
(66d9eefb-ee55-40ad-9f44-c55d1e809006) ==> (No such device or address)
[No such device or address]

The strange part is that I cannot seem to find any other error.
If I restart the VM everything works as expected (it stopped at ~9.51
UTC and was started at ~10.01 UTC) .

This is not the first time that this happened, and I do not see any
problems with networking or the hosts.

Gluster version is 3.8.11
this is the incriminated volume (though it happened on a different one too)

Volume Name: datastore2
Type: Replicate
Volume ID: c95ebb5f-6e04-4f09-91b9-bbbe63d83aea
Status: Started
Snapshot Count: 0
Number of Bricks: 1 x (2 + 1) = 3
Transport-type: tcp
Bricks:
Brick1: srvpve2g:/data/brick2/brick
Brick2: srvpve3g:/data/brick2/brick
Brick3: srvpve1g:/data/brick2/brick (arbiter)
Options Reconfigured:
nfs.disable: on
performance.readdir-ahead: on
transport.address-family: inet

Any hint on how to dig more deeply into the reason would be greatly
appreciated.

Alessandro
___
Gluster-users mailing list
Gluster-users@gluster.org
http://lists.gluster.org/mailman/listinfo/gluster-users


Re: [Gluster-users] Elasticsearch facing CorruptIndexException exception with GlusterFs 3.10.1

2017-05-08 Thread Pranith Kumar Karampuri
Abhijit,
 We released gluster-block v0.2 just this Friday for which RHEL
packages are yet to be built.

+Kaleb,
 Could you help with this please?

Prasanna,
 Could you let Abhijit know the rpm versions for tcmu-runner and other
packages so that this feature can be used?


On Mon, May 8, 2017 at 2:49 PM, Pranith Kumar Karampuri  wrote:

> Wait, wait we are discussing this issue only. Expect a reply in some time
> :-)
>
> On Mon, May 8, 2017 at 2:19 PM, Abhijit Paul 
> wrote:
>
>> poking for previous mail reply
>>
>> On Sun, May 7, 2017 at 1:06 AM, Abhijit Paul 
>> wrote:
>>
>>>  https://pkalever.wordpress.com/2017/03/14/elasticsearch-wit
>>> h-gluster-block/
>>> here used tested environment is Fedora ,
>>> but i am using RHEL based Oracle linux so does gluster-block compatible
>>> with RHEL as well? What i needs to change & make it work?
>>>
>>> On Fri, May 5, 2017 at 5:42 PM, Pranith Kumar Karampuri <
>>> pkara...@redhat.com> wrote:
>>>


 On Fri, May 5, 2017 at 5:40 PM, Pranith Kumar Karampuri <
 pkara...@redhat.com> wrote:

>
>
> On Fri, May 5, 2017 at 5:36 PM, Abhijit Paul  > wrote:
>
>> So should i start using gluster-block with elasticsearch in kubernetes
>> environment?
>>
>> My expectation from gluster-block is, it should not CorruptIndex
>> of elasticsearch...and issue facing in previous mails.
>>
>> Please let me know whether should i processed with above mentioned
>> combination.
>>
>
> We are still in the process of fixing the failure scenarios of
> tcmu-runner dying and failingover in the multipath scenarios.
>

 Prasanna did test that elasticsearch itself worked fine in
 gluster-block environment when all the machines are up etc i.e. success
 path. We are doing failure path testing and fixing things at the moment.


>
>
>>
>> On Fri, May 5, 2017 at 5:06 PM, Pranith Kumar Karampuri <
>> pkara...@redhat.com> wrote:
>>
>>> Abhijit we just started making the efforts to get all of this stable.
>>>
>>> On Fri, May 5, 2017 at 4:45 PM, Abhijit Paul <
>>> er.abhijitp...@gmail.com> wrote:
>>>
 I yet to try gluster-block with elasticsearch...but carious to know
 does this combination plays well in kubernetes environment?

 On Fri, May 5, 2017 at 12:14 PM, Abhijit Paul <
 er.abhijitp...@gmail.com> wrote:

> thanks Krutika for the alternative.
>
> @*Prasanna @**Pranith*
> I was going thorough the mentioned blog post and saw that used
> tested environment was Fedora ,
> but i am using RHEL based Oracle linux so does gluster-block
> compatible with RHEL as well?
>
> On Fri, May 5, 2017 at 12:03 PM, Krutika Dhananjay <
> kdhan...@redhat.com> wrote:
>
>> Yeah, there are a couple of cache consistency issues with
>> performance translators that are causing these exceptions.
>> Some of them were fixed by 3.10.1. Some still remain.
>>
>> Alternatively you can give gluster-block + elasticsearch a try,
>> which doesn't require solving all these caching issues.
>> Here's a blog post on the same - https://pkalever.wordpress.com
>> /2017/03/14/elasticsearch-with-gluster-block/
>>
>> Adding Prasanna and Pranith who worked on this, in case you need
>> more info on this.
>>
>> -Krutika
>>
>> On Fri, May 5, 2017 at 12:15 AM, Abhijit Paul <
>> er.abhijitp...@gmail.com> wrote:
>>
>>> Thanks for the reply, i will try it out but i am also facing one
>>> more issue "i.e. replicated volumes returning different
>>> timestamps"
>>> so is this because of Bug 1426548 - Openshift Logging
>>> ElasticSearch FSLocks when using GlusterFS storage backend
>>>  ?
>>>
>>> *FYI i am using glusterfs 3.10.1 tar.gz*
>>>
>>> Regards,
>>> Abhijit
>>>
>>>
>>>
>>> On Thu, May 4, 2017 at 10:58 PM, Amar Tumballi <
>>> atumb...@redhat.com> wrote:
>>>


 On Thu, May 4, 2017 at 10:41 PM, Abhijit Paul <
 er.abhijitp...@gmail.com> wrote:

> Since i am new to gluster, can please provide how to turn 
> off/disable
> "perf xlator options"?
>
>
 $ gluster volume set  performance.stat-prefetch off
 $ gluster volume set  performance.read-ahead off
 $ gluster volume set  performance.write-behind off
 $ gluster volume set  performance.io-cache off
 $ gluster volume set  performance.quick-read off


 Regards,
 Amar
>

Re: [Gluster-users] Elasticsearch facing CorruptIndexException exception with GlusterFs 3.10.1

2017-05-08 Thread Pranith Kumar Karampuri
Wait, wait we are discussing this issue only. Expect a reply in some time
:-)

On Mon, May 8, 2017 at 2:19 PM, Abhijit Paul 
wrote:

> poking for previous mail reply
>
> On Sun, May 7, 2017 at 1:06 AM, Abhijit Paul 
> wrote:
>
>>  https://pkalever.wordpress.com/2017/03/14/elasticsearch-wit
>> h-gluster-block/
>> here used tested environment is Fedora ,
>> but i am using RHEL based Oracle linux so does gluster-block compatible
>> with RHEL as well? What i needs to change & make it work?
>>
>> On Fri, May 5, 2017 at 5:42 PM, Pranith Kumar Karampuri <
>> pkara...@redhat.com> wrote:
>>
>>>
>>>
>>> On Fri, May 5, 2017 at 5:40 PM, Pranith Kumar Karampuri <
>>> pkara...@redhat.com> wrote:
>>>


 On Fri, May 5, 2017 at 5:36 PM, Abhijit Paul 
 wrote:

> So should i start using gluster-block with elasticsearch in kubernetes
> environment?
>
> My expectation from gluster-block is, it should not CorruptIndex
> of elasticsearch...and issue facing in previous mails.
>
> Please let me know whether should i processed with above mentioned
> combination.
>

 We are still in the process of fixing the failure scenarios of
 tcmu-runner dying and failingover in the multipath scenarios.

>>>
>>> Prasanna did test that elasticsearch itself worked fine in gluster-block
>>> environment when all the machines are up etc i.e. success path. We are
>>> doing failure path testing and fixing things at the moment.
>>>
>>>


>
> On Fri, May 5, 2017 at 5:06 PM, Pranith Kumar Karampuri <
> pkara...@redhat.com> wrote:
>
>> Abhijit we just started making the efforts to get all of this stable.
>>
>> On Fri, May 5, 2017 at 4:45 PM, Abhijit Paul <
>> er.abhijitp...@gmail.com> wrote:
>>
>>> I yet to try gluster-block with elasticsearch...but carious to know
>>> does this combination plays well in kubernetes environment?
>>>
>>> On Fri, May 5, 2017 at 12:14 PM, Abhijit Paul <
>>> er.abhijitp...@gmail.com> wrote:
>>>
 thanks Krutika for the alternative.

 @*Prasanna @**Pranith*
 I was going thorough the mentioned blog post and saw that used
 tested environment was Fedora ,
 but i am using RHEL based Oracle linux so does gluster-block
 compatible with RHEL as well?

 On Fri, May 5, 2017 at 12:03 PM, Krutika Dhananjay <
 kdhan...@redhat.com> wrote:

> Yeah, there are a couple of cache consistency issues with
> performance translators that are causing these exceptions.
> Some of them were fixed by 3.10.1. Some still remain.
>
> Alternatively you can give gluster-block + elasticsearch a try,
> which doesn't require solving all these caching issues.
> Here's a blog post on the same - https://pkalever.wordpress.com
> /2017/03/14/elasticsearch-with-gluster-block/
>
> Adding Prasanna and Pranith who worked on this, in case you need
> more info on this.
>
> -Krutika
>
> On Fri, May 5, 2017 at 12:15 AM, Abhijit Paul <
> er.abhijitp...@gmail.com> wrote:
>
>> Thanks for the reply, I will try it out, but I am also facing one
>> more issue, i.e. replicated volumes returning different
>> timestamps.
>> Is this because of Bug 1426548 - Openshift Logging
>> ElasticSearch FSLocks when using GlusterFS storage backend
>>  ?
>>
>> *FYI I am using the glusterfs 3.10.1 tar.gz*
>>
>> Regards,
>> Abhijit
>>
>>
>>
>> On Thu, May 4, 2017 at 10:58 PM, Amar Tumballi <
>> atumb...@redhat.com> wrote:
>>
>>>
>>>
>>> On Thu, May 4, 2017 at 10:41 PM, Abhijit Paul <
>>> er.abhijitp...@gmail.com> wrote:
>>>
 Since I am new to Gluster, can you please tell me how to turn off/disable
 the "perf xlator options"?


>>> $ gluster volume set  performance.stat-prefetch off
>>> $ gluster volume set  performance.read-ahead off
>>> $ gluster volume set  performance.write-behind off
>>> $ gluster volume set  performance.io-cache off
>>> $ gluster volume set  performance.quick-read off
>>>
>>>
>>> Regards,
>>> Amar
>>>

> On Wed, May 3, 2017 at 8:51 PM, Atin Mukherjee <
> amukh...@redhat.com> wrote:
>
>> I think there is still some pending work in some of the
>> gluster perf xlators to make that support complete. CCed the
>> relevant folks for more information. Can you please turn off
>> all the perf xlator options as a workaround to move forward?
>>

Re: [Gluster-users] Elasticsearch facing CorruptIndexException exception with GlusterFs 3.10.1

2017-05-08 Thread Abhijit Paul
Poking for a reply to my previous mail.

On Sun, May 7, 2017 at 1:06 AM, Abhijit Paul 
wrote:

> https://pkalever.wordpress.com/2017/03/14/elasticsearch-with-gluster-block/
> The environment tested there is Fedora,
> but I am using RHEL-based Oracle Linux, so is gluster-block compatible
> with RHEL as well? What do I need to change to make it work?
>
> On Fri, May 5, 2017 at 5:42 PM, Pranith Kumar Karampuri <
> pkara...@redhat.com> wrote:
>
>>
>>
>> On Fri, May 5, 2017 at 5:40 PM, Pranith Kumar Karampuri <
>> pkara...@redhat.com> wrote:
>>
>>>
>>>
>>> On Fri, May 5, 2017 at 5:36 PM, Abhijit Paul 
>>> wrote:
>>>
 So should I start using gluster-block with Elasticsearch in a Kubernetes
 environment?

 My expectation from gluster-block is that it should not corrupt the
 Elasticsearch index or hit the issues described in my previous mails.

 Please let me know whether I should proceed with the above-mentioned
 combination.

>>>
>>> We are still in the process of fixing the failure scenarios where
>>> tcmu-runner dies and fails over in multipath setups.
>>>
>>
>> Prasanna did test that Elasticsearch itself works fine in a gluster-block
>> environment when all the machines are up, i.e. the success path. We are
>> doing failure-path testing and fixing things at the moment.
>>
>>
>>>
>>>

 On Fri, May 5, 2017 at 5:06 PM, Pranith Kumar Karampuri <
 pkara...@redhat.com> wrote:

> Abhijit, we have just started making the effort to get all of this stable.
>
> On Fri, May 5, 2017 at 4:45 PM, Abhijit Paul  > wrote:
>
>> I have yet to try gluster-block with Elasticsearch, but I am curious to know
>> whether this combination plays well in a Kubernetes environment?
>>
>> On Fri, May 5, 2017 at 12:14 PM, Abhijit Paul <
>> er.abhijitp...@gmail.com> wrote:
>>
>>> Thanks, Krutika, for the alternative.
>>>
>>> @Prasanna @Pranith
>>> I was going through the mentioned blog post and saw that the tested
>>> environment was Fedora,
>>> but I am using RHEL-based Oracle Linux, so is gluster-block
>>> compatible with RHEL as well?
>>>
>>> On Fri, May 5, 2017 at 12:03 PM, Krutika Dhananjay <
>>> kdhan...@redhat.com> wrote:
>>>
 Yeah, there are a couple of cache consistency issues with
 performance translators that are causing these exceptions.
 Some of them were fixed by 3.10.1. Some still remain.

 Alternatively you can give gluster-block + elasticsearch a try,
 which doesn't require solving all these caching issues.
 Here's a blog post about it:
 https://pkalever.wordpress.com/2017/03/14/elasticsearch-with-gluster-block/

 Adding Prasanna and Pranith who worked on this, in case you need
 more info on this.

 -Krutika

 On Fri, May 5, 2017 at 12:15 AM, Abhijit Paul <
 er.abhijitp...@gmail.com> wrote:

> Thanks for the reply, I will try it out, but I am also facing one
> more issue, i.e. replicated volumes returning different
> timestamps.
> Is this because of Bug 1426548 - Openshift Logging
> ElasticSearch FSLocks when using GlusterFS storage backend
>  ?
>
> *FYI I am using the glusterfs 3.10.1 tar.gz*
>
> Regards,
> Abhijit
>
>
>
> On Thu, May 4, 2017 at 10:58 PM, Amar Tumballi <
> atumb...@redhat.com> wrote:
>
>>
>>
>> On Thu, May 4, 2017 at 10:41 PM, Abhijit Paul <
>> er.abhijitp...@gmail.com> wrote:
>>
>>> Since I am new to Gluster, can you please tell me how to turn off/disable
>>> the "perf xlator options"?
>>>
>>>
>> $ gluster volume set  performance.stat-prefetch off
>> $ gluster volume set  performance.read-ahead off
>> $ gluster volume set  performance.write-behind off
>> $ gluster volume set  performance.io-cache off
>> $ gluster volume set  performance.quick-read off
>>
>>
>> Regards,
>> Amar
>>
>>>
 On Wed, May 3, 2017 at 8:51 PM, Atin Mukherjee <
 amukh...@redhat.com> wrote:

> I think there is still some pending work in some of the
> gluster perf xlators to make that support complete. CCed the
> relevant folks for more information. Can you please turn off
> all the perf xlator options as a workaround to move forward?
>
> On Wed, May 3, 2017 at 8:04 PM, Abhijit Paul <
> er.abhijitp...@gmail.com> wrote:
>
>> Dear folks,
>>
>> I set up GlusterFS (3.10.1), NFS type, as a persistent volume for
>> Elasticsearch (5.1.2) but am currently facing i

Re: [Gluster-users] gdeploy, Centos7 & Ansible 2.3

2017-05-08 Thread hvjunk

> On 08 May 2017, at 09:34 , knarra  wrote:
> Hi,
> 
> There is a new version of gdeploy built where the issues seen above are
> fixed. Can you please update gdeploy to the version below [1] and run the
> test again?
> [1] 
> https://copr-be.cloud.fedoraproject.org/results/sac/gdeploy/epel-7-x86_64/00547404-gdeploy/gdeploy-2.0.2-6.noarch.rpm
>  
> 
> Thanks
> 
> kasturi
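
(Editor's aside, not part of the original mail: on the CentOS 7 host the
suggested update would be roughly

  # yum upgrade https://copr-be.cloud.fedoraproject.org/results/sac/gdeploy/epel-7-x86_64/00547404-gdeploy/gdeploy-2.0.2-6.noarch.rpm

i.e. installing the newer rpm straight from the copr URL quoted above; "yum
install" with the same URL also works if gdeploy is not installed yet.)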
Thank you Kasturi,

Please see the run with errors below, using this conf file:
==
[hosts]
10.10.10.11
10.10.10.12
10.10.10.13

[backend-setup]
devices=/dev/sdb
mountpoints=/gluster/brick1
brick_dirs=/gluster/brick1/one
pools=pool1

#Installing nfs-ganesha
[yum]
action=install
repolist=
gpgcheck=no
update=no
packages=glusterfs-ganesha

#This will create a volume. Skip this section if your volume already exists
[volume]
action=create
volname=ganesha
transport=tcp
replica_count=3
arbiter=1
force=yes

#Creating a high availability cluster and exporting the volume
[nfs-ganesha]
action=create-cluster
ha-name=ganesha-ha-360
cluster-nodes=10.10.10.11,10.10.10.12
vip=10.10.10.31,10.10.10.41
volname=ganesha

==
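
(Editor's aside, not part of the original mail: assuming a gdeploy run with the
fixed package completes, the result of this conf can be sanity-checked with the
usual tools, e.g.

  # gluster volume info ganesha
  # pcs status

"volume info" should report the arbiter volume as 1 x (2 + 1) = 3 bricks, and
"pcs status" should show the ganesha-ha-360 cluster with its virtual IPs once
the NFS-Ganesha HA setup is up. These commands are an assumption about the
resulting environment, not something taken from the thread.)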



[root@linked-clone-of-centos-linux ~]# gdeploy -c t.conf
ERROR! no action detected in task. This often indicates a misspelled module 
name, or incorrect module path.

The error appears to have been in '/tmp/tmpvhTM5i/pvcreate.yml': line 16, 
column 5, but may
be elsewhere in the file depending on the exact syntax problem.

The offending line appears to be:

  # Create pv on all the disks
  - name: Create Physical Volume
^ here


The error appears to have been in '/tmp/tmpvhTM5i/pvcreate.yml': line 16, 
column 5, but may
be elsewhere in the file depending on the exact syntax problem.

The offending line appears to be:

  # Create pv on all the disks
  - name: Create Physical Volume
^ here

Ignoring errors...
ERROR! no action detected in task. This often indicates a misspelled module 
name, or incorrect module path.

The error appears to have been in '/tmp/tmpvhTM5i/vgcreate.yml': line 8, column 
5, but may
be elsewhere in the file depending on the exact syntax problem.

The offending line appears to be:

  tasks:
  - name: Create volume group on the disks
^ here


The error appears to have been in '/tmp/tmpvhTM5i/vgcreate.yml': line 8, column 
5, but may
be elsewhere in the file depending on the exact syntax problem.

The offending line appears to be:

  tasks:
  - name: Create volume group on the disks
^ here

Ignoring errors...
ERROR! no action detected in task. This often indicates a misspelled module 
name, or incorrect module path.

The error appears to have been in 
'/tmp/tmpvhTM5i/auto_lvcreate_for_gluster.yml': line 7, column 5, but may
be elsewhere in the file depending on the exact syntax problem.

The offending line appears to be:

  tasks:
  - name: Create logical volume named metadata
^ here


The error appears to have been in 
'/tmp/tmpvhTM5i/auto_lvcreate_for_gluster.yml': line 7, column 5, but may
be elsewhere in the file depending on the exact syntax problem.

The offending line appears to be:

  tasks:
  - name: Create logical volume named metadata
^ here

Ignoring errors...

PLAY [gluster_servers] 
**

TASK [Create an xfs filesystem] 
*
failed: [10.10.10.11] (item=/dev/GLUSTER_vg1/GLUSTER_lv1) => {"failed": true, 
"item": "/dev/GLUSTER_vg1/GLUSTER_lv1", "msg": "Device 
/dev/GLUSTER_vg1/GLUSTER_lv1 not found."}
failed: [10.10.10.13] (item=/dev/GLUSTER_vg1/GLUSTER_lv1) => {"failed": true, 
"item": "/dev/GLUSTER_vg1/GLUSTER_lv1", "msg": "Device 
/dev/GLUSTER_vg1/GLUSTER_lv1 not found."}
failed: [10.10.10.12] (item=/dev/GLUSTER_vg1/GLUSTER_lv1) => {"failed": true, 
"item": "/dev/GLUSTER_vg1/GLUSTER_lv1", "msg": "Device 
/dev/GLUSTER_vg1/GLUSTER_lv1 not found."}
to retry, use: --limit @/tmp/tmpvhTM5i/fscreate.retry

PLAY RECAP 
**
10.10.10.11: ok=0  changed=0  unreachable=0  failed=1
10.10.10.12: ok=0  changed=0  unreachable=0  failed=1
10.10.10.13: ok=0  changed=0  unreachable=0  failed=1

Ignoring errors...

PLAY [gluster_servers] 
**

TASK [Create the backend disks, skips if present] 
*

Re: [Gluster-users] disperse volume brick counts limits in RHES

2017-05-08 Thread Serkan Çoban
> What network do you have?
We have 2x10G bonded interfaces on each server.

Thanks to Xavier for the detailed explanation of the EC details.

On Sat, May 6, 2017 at 2:20 AM, Alastair Neil  wrote:
> What network do you have?
>
>
> On 5 May 2017 at 09:51, Serkan Çoban  wrote:
>>
>> In our use case every node has 26 bricks. I am using 60 nodes, one 9PB
>> volume with a 16+4 EC configuration, and each brick in a sub-volume is on
>> a different host.
>> We put 15-20k 2GB files every day into 10-15 folders. So it is 1500K
>> files/folder. Our gluster version is 3.7.11.
>> Heal speed in this environment is 8-10MB/sec/brick.
>>
>> I did some tests for parallel self heal feature with version 3.9, two
>> servers 26 bricks each, 8+2 and 16+4 EC configuration.
>> This was a small test environment and the result is, as I said, that 8+2 is
>> 2x faster than 16+4 with parallel self-heal threads set to 2/4.
>> In 1-2 months our new servers will arrive; I will do detailed tests of
>> heal performance for 8+2 and 16+4 and inform you of the results.
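
(Editor's aside, not part of the original mail: to the best of my knowledge the
"parallel self-heal threads" mentioned above map to the per-volume disperse SHD
option introduced in 3.9, set roughly as

  $ gluster volume set VOLNAME disperse.shd-max-threads 4

with VOLNAME a placeholder for the actual volume name; please check
"gluster volume set help" on your version for the exact option name.)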
>>
>>
>> On Fri, May 5, 2017 at 2:54 PM, Pranith Kumar Karampuri
>>  wrote:
>> >
>> >
>> > On Fri, May 5, 2017 at 5:19 PM, Pranith Kumar Karampuri
>> >  wrote:
>> >>
>> >>
>> >>
>> >> On Fri, May 5, 2017 at 2:38 PM, Serkan Çoban 
>> >> wrote:
>> >>>
>> >>> It is the overall time; the 8TB data disk healed 2x faster in the 8+2
>> >>> configuration.
>> >>
>> >>
>> >> Wow, that is counter-intuitive for me. I will need to explore this
>> >> to find out why that could be. Thanks a lot for this feedback!
>> >
>> >
>> > From memory I remember you said you have a lot of small files hosted on
>> > the volume, right? It could be because of the bug that
>> > https://review.gluster.org/17151 is fixing. That is the only reason I
>> > could guess right now. We will try to test this kind of case if you could
>> > give us a bit more details about average file size / depth of directories
>> > etc. so we can simulate a similar-looking directory structure.
>> >
>> >>
>> >>
>> >>>
>> >>>
>> >>> On Fri, May 5, 2017 at 10:00 AM, Pranith Kumar Karampuri
>> >>>  wrote:
>> >>> >
>> >>> >
>> >>> > On Fri, May 5, 2017 at 11:42 AM, Serkan Çoban
>> >>> > 
>> >>> > wrote:
>> >>> >>
>> >>> >> Healing gets slower as you increase m in m+n configuration.
>> >>> >> We are using a 16+4 configuration without any problems other than heal
>> >>> >> heal
>> >>> >> speed.
>> >>> >> I tested heal speed with 8+2 and 16+4 on 3.9.0 and see that heals
>> >>> >> on
>> >>> >> 8+2 is faster by 2x.
>> >>> >
>> >>> >
>> >>> > As you increase the number of nodes that are participating in an EC
>> >>> > set, the number of parallel heals increases. Is the heal speed you saw
>> >>> > improved per file, or the overall time it took to heal the data?
>> >>> >
>> >>> >>
>> >>> >>
>> >>> >>
>> >>> >> On Fri, May 5, 2017 at 9:04 AM, Ashish Pandey 
>> >>> >> wrote:
>> >>> >> >
>> >>> >> > 8+2 and 8+3 configurations are not the limitation but just
>> >>> >> > suggestions.
>> >>> >> > You can create 16+3 volume without any issue.
>> >>> >> >
>> >>> >> > Ashish
>> >>> >> >
>> >>> >> > 
>> >>> >> > From: "Alastair Neil" 
>> >>> >> > To: "gluster-users" 
>> >>> >> > Sent: Friday, May 5, 2017 2:23:32 AM
>> >>> >> > Subject: [Gluster-users] disperse volume brick counts limits in
>> >>> >> > RHES
>> >>> >> >
>> >>> >> >
>> >>> >> > Hi
>> >>> >> >
>> >>> >> > we are deploying a large (24node/45brick) cluster and noted that
>> >>> >> > the RHES guidelines limit the number of data bricks in a disperse
>> >>> >> > set to 8. Is there any reason for this? I am aware that you want
>> >>> >> > this to be a power of 2, but as we have a large number of nodes we
>> >>> >> > were planning on going with 16+3. Dropping to 8+2 or 8+3 will be a
>> >>> >> > real waste for us.
>> >>> >> >
>> >>> >> > Thanks,
>> >>> >> >
>> >>> >> >
>> >>> >> > Alastair
>> >>> >> >
>> >>> >> >
>> >>> >> > ___
>> >>> >> > Gluster-users mailing list
>> >>> >> > Gluster-users@gluster.org
>> >>> >> > http://lists.gluster.org/mailman/listinfo/gluster-users
>> >>> >> >
>> >>> >> >
>> >>> >> > ___
>> >>> >> > Gluster-users mailing list
>> >>> >> > Gluster-users@gluster.org
>> >>> >> > http://lists.gluster.org/mailman/listinfo/gluster-users
>> >>> >> ___
>> >>> >> Gluster-users mailing list
>> >>> >> Gluster-users@gluster.org
>> >>> >> http://lists.gluster.org/mailman/listinfo/gluster-users
>> >>> >
>> >>> >
>> >>> >
>> >>> >
>> >>> > --
>> >>> > Pranith
>> >>
>> >>
>> >>
>> >>
>> >> --
>> >> Pranith
>> >
>> >
>> >
>> >
>> > --
>> > Pranith
>> ___
>> Gluster-users mailing list
>> Gluster-users@gluster.org
>> http://lists.gluster.org/mailman/listinfo/gluster-users
>
>
___

Re: [Gluster-users] gdeploy, Centos7 & Ansible 2.3

2017-05-08 Thread knarra

On 05/06/2017 03:14 PM, hvjunk wrote:

Hi there,

 So, I have been busy testing/installing/etc. and was pointed last night in the
direction of gdeploy. I did a quick try on Ubuntu 16.04, found some
module-related troubles, so I retried on CentOS 7 this morning.


Seems that the playbooks aren’t 2.3 “compatible”…

The brick VMs are set up using the set-vms-centos.sh & 
sshkeys-centos.yml playbook from 
https://bitbucket.org/dismyne/gluster-ansibles/src/24b62dcc858364ee3744d351993de0e8e35c2680/?at=Centos-gdeploy-tests


The “installation”/gdeploy VM run:

The relevant history output:

   18  yum install epel-release
   19  yum install ansible
   20  yum search gdeploy
   21  yum install 
https://download.gluster.org/pub/gluster/gdeploy/LATEST/CentOS7/gdeploy-2.0.1-9.noarch.rpm

   22  vi t.conf
   23  gdeploy -c t.conf
   24  history
   25  mkdir .ssh
   26  cd .ssh
   27  ls
   28  vi id_rsa
   29  chmod 0600 id_rsa
   30  cd
   31  gdeploy -c t.conf
   32  ssh -v 10.10.10.11
   33  ssh -v 10.10.10.12
   34  ssh -v 10.10.10.13
   35  gdeploy -c t.conf

The t.conf:
==
 [hosts]
10.10.10.11
10.10.10.12
10.10.10.13

[backend-setup]
devices=/dev/sdb
mountpoints=/gluster/brick1
brick_dirs=/gluster/brick1/one

==

The gdeploy run:

=
[root@linked-clone-of-centos-linux ~]# gdeploy -c t.conf
ERROR! no action detected in task. This often indicates a misspelled 
module name, or incorrect module path.


The error appears to have been in '/tmp/tmpezTsyO/pvcreate.yml': line 
16, column 5, but may

be elsewhere in the file depending on the exact syntax problem.

The offending line appears to be:

  # Create pv on all the disks
  - name: Create Physical Volume
^ here


The error appears to have been in '/tmp/tmpezTsyO/pvcreate.yml': line 
16, column 5, but may

be elsewhere in the file depending on the exact syntax problem.

The offending line appears to be:

  # Create pv on all the disks
  - name: Create Physical Volume
^ here

Ignoring errors...
ERROR! no action detected in task. This often indicates a misspelled 
module name, or incorrect module path.


The error appears to have been in '/tmp/tmpezTsyO/vgcreate.yml': line 
8, column 5, but may

be elsewhere in the file depending on the exact syntax problem.

The offending line appears to be:

  tasks:
  - name: Create volume group on the disks
^ here


The error appears to have been in '/tmp/tmpezTsyO/vgcreate.yml': line 
8, column 5, but may

be elsewhere in the file depending on the exact syntax problem.

The offending line appears to be:

  tasks:
  - name: Create volume group on the disks
^ here

Ignoring errors...
ERROR! no action detected in task. This often indicates a misspelled 
module name, or incorrect module path.


The error appears to have been in 
'/tmp/tmpezTsyO/auto_lvcreate_for_gluster.yml': line 7, column 5, but may

be elsewhere in the file depending on the exact syntax problem.

The offending line appears to be:

  tasks:
  - name: Create logical volume named metadata
^ here


The error appears to have been in 
'/tmp/tmpezTsyO/auto_lvcreate_for_gluster.yml': line 7, column 5, but may

be elsewhere in the file depending on the exact syntax problem.

The offending line appears to be:

  tasks:
  - name: Create logical volume named metadata
^ here

Ignoring errors...

PLAY [gluster_servers] 
**


TASK [Create a xfs filesystem] 
**
failed: [10.10.10.13] (item=/dev/GLUSTER_vg1/GLUSTER_lv1) => 
{"failed": true, "item": "/dev/GLUSTER_vg1/GLUSTER_lv1", "msg": 
"Device /dev/GLUSTER_vg1/GLUSTER_lv1 not found."}
failed: [10.10.10.12] (item=/dev/GLUSTER_vg1/GLUSTER_lv1) => 
{"failed": true, "item": "/dev/GLUSTER_vg1/GLUSTER_lv1", "msg": 
"Device /dev/GLUSTER_vg1/GLUSTER_lv1 not found."}
failed: [10.10.10.11] (item=/dev/GLUSTER_vg1/GLUSTER_lv1) => 
{"failed": true, "item": "/dev/GLUSTER_vg1/GLUSTER_lv1", "msg": 
"Device /dev/GLUSTER_vg1/GLUSTER_lv1 not found."}

to retry, use: --limit @/tmp/tmpezTsyO/fscreate.retry

PLAY RECAP 
**

10.10.10.11: ok=0  changed=0  unreachable=0  failed=1
10.10.10.12: ok=0  changed=0  unreachable=0  failed=1
10.10.10.13: ok=0  changed=0  unreachable=0  failed=1

Ignoring errors...

PLAY [gluster_servers] 
**


TASK [Create the backend disks, skips if present] 

Re: [Gluster-users] disperse volume brick counts limits in RHES

2017-05-08 Thread Xavier Hernandez

On 05/05/17 13:49, Pranith Kumar Karampuri wrote:



On Fri, May 5, 2017 at 2:38 PM, Serkan Çoban <cobanser...@gmail.com> wrote:

It is the overall time; the 8TB data disk healed 2x faster in the 8+2
configuration.


Wow, that is counter-intuitive for me. I will need to explore this
to find out why that could be. Thanks a lot for this feedback!


Matrix multiplication for encoding/decoding of 8+2 is 4 times faster
than 16+4 (one 16x16 matrix is composed of 4 submatrices of 8x8);
however, each matrix operation on a 16+4 configuration takes twice the
amount of data of an 8+2, so the net effect is that 8+2 is twice as fast as 16+4.
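
(Editor's aside, a worked version of the arithmetic above, not part of the
original mail: decoding one 16+4 stripe costs roughly one 16x16 matrix
operation, i.e. about 4x the work of an 8x8 operation, but it recovers twice
as much data as an 8+2 stripe. Per byte of recovered data that is 4/2 = 2x the
CPU cost, matching the observed 2x difference in heal speed.)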


An 8+2 also uses bigger blocks on each brick, processing the same amount
of data in fewer I/O operations and bigger network packets.


Probably these are the reasons why 16+4 is slower than 8+2.

See my other email for more detailed description.

Xavi





On Fri, May 5, 2017 at 10:00 AM, Pranith Kumar Karampuri
<pkara...@redhat.com> wrote:
>
>
> On Fri, May 5, 2017 at 11:42 AM, Serkan Çoban
> <cobanser...@gmail.com> wrote:
>>
>> Healing gets slower as you increase m in m+n configuration.
>> We are using a 16+4 configuration without any problems other than heal
>> speed.
>> I tested heal speed with 8+2 and 16+4 on 3.9.0 and see that heals on
>> 8+2 is faster by 2x.
>
>
> As you increase the number of nodes that are participating in an EC set,
> the number of parallel heals increases. Is the heal speed you saw improved
> per file, or the overall time it took to heal the data?
>
>>
>>
>>
>> On Fri, May 5, 2017 at 9:04 AM, Ashish Pandey
>> <aspan...@redhat.com> wrote:
>> >
>> > 8+2 and 8+3 configurations are not a limitation, just suggestions.
>> > You can create 16+3 volume without any issue.
>> >
>> > Ashish
>> >
>> > 
>> > From: "Alastair Neil" <ajneil.t...@gmail.com>
>> > To: "gluster-users" <gluster-users@gluster.org>
>> > Sent: Friday, May 5, 2017 2:23:32 AM
>> > Subject: [Gluster-users] disperse volume brick counts limits in RHES
>> >
>> >
>> > Hi
>> >
>> > we are deploying a large (24node/45brick) cluster and noted that the
>> > RHES guidelines limit the number of data bricks in a disperse set to 8.
>> > Is there any reason for this? I am aware that you want this to be a
>> > power of 2, but as we have a large number of nodes we were planning on
>> > going with 16+3. Dropping to 8+2 or 8+3 will be a real waste for us.
>> >
>> > Thanks,
>> >
>> >
>> > Alastair
>> >
>> >
>> > ___
>> > Gluster-users mailing list
>> > Gluster-users@gluster.org 
>> > http://lists.gluster.org/mailman/listinfo/gluster-users

>> >
>> >
>> > ___
>> > Gluster-users mailing list
>> > Gluster-users@gluster.org 
>> > http://lists.gluster.org/mailman/listinfo/gluster-users

>> ___
>> Gluster-users mailing list
>> Gluster-users@gluster.org 
>> http://lists.gluster.org/mailman/listinfo/gluster-users

>
>
>
>
> --
> Pranith




--
Pranith


___
Gluster-users mailing list
Gluster-users@gluster.org
http://lists.gluster.org/mailman/listinfo/gluster-users



___
Gluster-users mailing list
Gluster-users@gluster.org
http://lists.gluster.org/mailman/listinfo/gluster-users