Re: [Gluster-users] Some more questions

2018-05-09 Thread Gandalf Corvotempesta
On Wed, 9 May 2018 at 21:57, Jim Kinney wrote:
> It all depends on how you are set up on the distribute. Think RAID 10
with 4 drives - each pair strips (distribute) and the pair of pairs
replicates.

Exactly, thus I have to add bricks in multiples of the replica count.
In a RAID10 you have to add 2 disks at once; in a 3-way RAID10 you have
to add 3 disks at once, and so on.
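Just to be sure we're talking about the same thing, this is the kind of
expansion I mean (hostnames, paths and the volume name are hypothetical,
assuming an existing replica-3 volume):

  # add one full replica set, i.e. 3 bricks, one per new server
  gluster volume add-brick vol0 server4:/bricks/b1 server5:/bricks/b1 server6:/bricks/b1
  # spread existing data onto the new bricks
  gluster volume rebalance vol0 start
  gluster volume rebalance vol0 status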
___
Gluster-users mailing list
Gluster-users@gluster.org
http://lists.gluster.org/mailman/listinfo/gluster-users


Re: [Gluster-users] Some more questions

2018-05-09 Thread Gandalf Corvotempesta
On Wed, 9 May 2018 at 21:31, Jim Kinney wrote:
> correct. a new server will NOT add space in this manner. But the original
Q was about rebalancing after adding a 4th server. If you are using
distributed/replication, then yes, a new server with be adding a portion of
it's space to add more space to the cluster.

Wait: in a distribute-replicate volume with replica count 3, adding a fourth
server doesn't add space. I can't add a single fourth server; I have to add 3
more servers (due to the replica count).
___
Gluster-users mailing list
Gluster-users@gluster.org
http://lists.gluster.org/mailman/listinfo/gluster-users


Re: [Gluster-users] Some more questions

2018-05-09 Thread Gandalf Corvotempesta
On Wed, 9 May 2018 at 21:22, Jim Kinney wrote:
> You can change the replica count. Add a fourth server, add it's brick to
existing volume with gluster volume add-brick vol0 replica 4
newhost:/path/to/brick

This doesn't add space; it only adds a new replica, increasing the number of
copies.
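To make sure I understand the difference (volume and host names are
hypothetical):

  # grows the number of copies, not the capacity
  gluster volume add-brick vol0 replica 4 newhost:/path/to/brick
  # grows the capacity instead: a whole new replica-3 set, then rebalance
  gluster volume add-brick vol0 host4:/brick host5:/brick host6:/brick
  gluster volume rebalance vol0 start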
___
Gluster-users mailing list
Gluster-users@gluster.org
http://lists.gluster.org/mailman/listinfo/gluster-users


[Gluster-users] Some more questions

2018-05-09 Thread Gandalf Corvotempesta
Ok, some more questions, as I'm still planning our SDS (but I'm leaning
towards LizardFS, gluster is too inflexible).

Let's assume a replica 3:

1) Currently it is not possible to add a single server and rebalance like any
other SDS (Ceph, Lizard, Moose, DRBD, ...), right? With replica 3, I have
to add 3 new servers.

2) The same should apply when adding disks to spare slots on existing servers:
always a multiple of the replica count, thus 1 disk per server.

3) Can I grow the cluster by replacing 3 disks with bigger ones? For
example, with 12 2TB disks on each server, can I replace 3 of them (1 per
server) with 4TB disks to get more space, or do I have to replace *all*
disks?
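If it helps, this is the kind of single-disk swap I have in mind (names are
hypothetical; I'd wait for the heal to finish before touching the next
server):

  gluster volume replace-brick vol0 server1:/bricks/2tb-07 server1:/bricks/4tb-01 commit force
  gluster volume heal vol0 info   # wait until no entries are pending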
___
Gluster-users mailing list
Gluster-users@gluster.org
http://lists.gluster.org/mailman/listinfo/gluster-users


Re: [Gluster-users] arbiter node on client?

2018-05-07 Thread Gandalf Corvotempesta
On Mon, 7 May 2018 at 13:22, Dave Sherohman wrote:
> I'm pretty sure that you can only have one arbiter per subvolume, and
> I'm not even sure what the point of multiple arbiters over the same data
> would be.

Multiple arbiters add availability: I can safely shut down one hypervisor
node (where an arbiter is located) and still have a 100% working cluster
with quorum.

Is it possible to add an arbiter on the fly, or must it be configured during
volume creation?
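From what I've read, recent releases should allow converting an existing
replica-2 volume on the fly with something like the following (host and path
are hypothetical, please correct me if the syntax is wrong):

  gluster volume add-brick vol0 replica 3 arbiter 1 clienthost:/bricks/arbiter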
___
Gluster-users mailing list
Gluster-users@gluster.org
http://lists.gluster.org/mailman/listinfo/gluster-users


[Gluster-users] arbiter node on client?

2018-05-06 Thread Gandalf Corvotempesta
Is it possible to add an arbiter node on a client?

Let's assume a gluster storage made of 2 storage servers: this is prone to
split-brain.
An arbiter node can be added, but can I put the arbiter on one of the
clients?

Can I use multiple arbiters for the same volume? For example, one arbiter on
each client.
___
Gluster-users mailing list
Gluster-users@gluster.org
http://lists.gluster.org/mailman/listinfo/gluster-users


Re: [Gluster-users] shard corruption bug

2018-05-04 Thread Gandalf Corvotempesta
On Fri, 4 May 2018 at 14:06, Jim Kinney wrote:
> It stopped being an outstanding issue at 3.12.7. I think it's now fixed.

So, is it not possible to extend and rebalance a working cluster with sharded
data?
Can someone confirm this? Maybe those who hit the bug in the past.
___
Gluster-users mailing list
Gluster-users@gluster.org
http://lists.gluster.org/mailman/listinfo/gluster-users


[Gluster-users] shard corruption bug

2018-05-04 Thread Gandalf Corvotempesta
Hi all,
is the "famous" corruption bug with sharding enabled fixed, or is it still a
work in progress?
___
Gluster-users mailing list
Gluster-users@gluster.org
http://lists.gluster.org/mailman/listinfo/gluster-users


[Gluster-users] Single brick expansion

2018-04-26 Thread Gandalf Corvotempesta
Any updates about this feature?
It was planned for v4 but seems to be postponed...
___
Gluster-users mailing list
Gluster-users@gluster.org
http://lists.gluster.org/mailman/listinfo/gluster-users

Re: [Gluster-users] Reconstructing files from shards

2018-04-23 Thread Gandalf Corvotempesta
2018-04-22 15:10 GMT+02:00 Jim Kinney :
> So a stock ovirt with gluster install that uses sharding
> A. Can't safely have sharding turned off once files are in use
> B. Can't be expanded with additional bricks

If the expansion bug is still unresolved, yes :-)
___
Gluster-users mailing list
Gluster-users@gluster.org
http://lists.gluster.org/mailman/listinfo/gluster-users


Re: [Gluster-users] Reconstructing files from shards

2018-04-23 Thread Gandalf Corvotempesta
2018-04-23 9:34 GMT+02:00 Alessandro Briosi :
> Is it that really so?

Yes, I've opened a bug asking the developers to block disabling sharding
when the volume has data on it, or at least to print a huge warning message
saying that data loss will happen.

> I thought that sharding was a extended attribute on the files created when
> sharding is enabled.
>
> Turning off sharding on the volume would not turn off sharding on the files,
> but on newly created files ...

No, because sharded files are reconstructed on the fly based on the
volume's sharding property.
If you disable sharding, gluster knows nothing about the previous
shard configuration, and thus won't be able to read
all the shards for each file. It will only return the first shard,
resulting in data loss or corruption.
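(For context, this is roughly how it looks on a brick, assuming the usual
.shard layout; paths are hypothetical:

  ls /bricks/b1/.shard | head        # tail shards, named <GFID>.<index>
  getfattr -n trusted.glusterfs.shard.block-size -e hex /bricks/b1/images/vm1.img

Once the shard translator is disabled, nothing stitches these pieces back
together.)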
___
Gluster-users mailing list
Gluster-users@gluster.org
http://lists.gluster.org/mailman/listinfo/gluster-users


Re: [Gluster-users] Reconstructing files from shards

2018-04-22 Thread Gandalf Corvotempesta
On Sun, 22 Apr 2018 at 10:46, Alessandro Briosi wrote:

> Imho the easiest path would be to turn off sharding on the volume and
> simply do a copy of the files (to a different directory, or rename and
> then copy i.e.)
>
> This should simply store the files without sharding.
>

If you turn off sharding on a sharded volume with data in it, all sharded
files would become unreadable.

>
___
Gluster-users mailing list
Gluster-users@gluster.org
http://lists.gluster.org/mailman/listinfo/gluster-users

Re: [Gluster-users] small files performance

2017-10-13 Thread Gandalf Corvotempesta
Where did you read 2K IOPS?

Each disk is able to do about 75 IOPS, as I'm using SATA disks; getting even
close to 2000 is impossible.

On 13 Oct 2017 at 9:42 AM, "Szymon Miotk" wrote:

> Depends what you need.
> 2K iops for small file writes is not a bad result.
> In my case I had a system that was just poorly written and it was
> using 300-1000 iops for constant operations and was choking on
> cleanup.
>
>
> On Thu, Oct 12, 2017 at 6:23 PM, Gandalf Corvotempesta
>  wrote:
> > So, even with latest version, gluster is still unusable with small files
> ?
> >
> > 2017-10-12 10:51 GMT+02:00 Szymon Miotk :
> >> I've analyzed small files performance few months ago, because I had
> >> huge performance problems with small files writes on Gluster.
> >> The read performance has been improved in many ways in recent releases
> >> (md-cache, parallel-readdir, hot-tier).
> >> But write performance is more or less the same and you cannot go above
> >> 10K smallfiles create - even with SSD or Optane drives.
> >> Even ramdisk is not helping much here, because the bottleneck is not
> >> in the storage performance.
> >> Key problems I've noticed:
> >> - LOOKUPs are expensive, because there is separate query for every
> >> depth level of destination directory (md-cache helps here a bit,
> >> unless you are creating lot of directories). So the deeper the
> >> directory structure, the worse.
> >> - for every file created, Gluster creates another file in .glusterfs
> >> directory, doubling the required IO and network latency. What's worse,
> >> XFS, the recommended filesystem, doesn't like flat directory sturcture
> >> with thousands files in each directory. But that's exactly how Gluster
> >> stores its metadata in .glusterfs, so the performance decreases by
> >> 40-50% after 10M files.
> >> - complete directory structure is created on each of the bricks. So
> >> every mkdir results in io on every brick you have in the volume.
> >> - hot-tier may be great for improving reads, but for small files
> >> writes it actually kills performance even more.
> >> - FUSE driver requires context switch between userspace and kernel
> >> each time you create a file, so with small files the context switches
> >> are also taking their toll
> >>
> >> The best results I got were:
> >> - create big file on Gluster, mount it as XFS over loopback interface
> >> - 13.5K smallfile writes. Drawback - you can use it only on one
> >> server, as XFS will crash when two servers will write to it.
> >> - use libgfapi - 20K smallfile writes performance. Drawback - no nice
> >> POSIX filesystem, huge CPU usage on Gluster server.
> >>
> >> I was testing with 1KB files, so really small.
> >>
> >> Best regards,
> >> Szymon Miotk
> >>
> >> On Fri, Oct 6, 2017 at 4:43 PM, Gandalf Corvotempesta
> >>  wrote:
> >>> Any update about this?
> >>> I've seen some works about optimizing performance for small files, is
> >>> now gluster "usable" for storing, in example, Maildirs or git sources
> >>> ?
> >>>
> >>> at least in 3.7 (or 3.8, I don't remember exactly), extracting kernel
> >>> sources took about 4-5 minutes.
> >>> ___
> >>> Gluster-users mailing list
> >>> Gluster-users@gluster.org
> >>> http://lists.gluster.org/mailman/listinfo/gluster-users
>
___
Gluster-users mailing list
Gluster-users@gluster.org
http://lists.gluster.org/mailman/listinfo/gluster-users

[Gluster-users] gluster status

2017-10-12 Thread Gandalf Corvotempesta
How can I show the current state of a gluster cluster: status, replicas
down, what is going on, and so on?

Something like /proc/mdstat for RAID, where I can see which disks are
down, whether the RAID is rebuilding, checking, ...

Is there anything similar in gluster?
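For context, the commands I already know about (volume name is
hypothetical), none of which feels like /proc/mdstat:

  gluster volume status            # which bricks and daemons are up
  gluster volume heal vol0 info    # pending heals per brick (0 entries = healthy)
  gluster peer status              # peer membership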
___
Gluster-users mailing list
Gluster-users@gluster.org
http://lists.gluster.org/mailman/listinfo/gluster-users


[Gluster-users] dbench

2017-10-12 Thread Gandalf Corvotempesta
I'm trying to check gluster performance with dbench.

I'm using a replica 3 with bonded dual gigabit (balance-alb) on all
servers and shard (64M) enabled.

I'm unable to get over 3 (three) MB/s from *inside* the VM, thus I think
there isn't any small-file issue, as from inside the VM there isn't any
metadata operation for gluster.

Any advice?
___
Gluster-users mailing list
Gluster-users@gluster.org
http://lists.gluster.org/mailman/listinfo/gluster-users


Re: [Gluster-users] data corruption - any update?

2017-10-11 Thread Gandalf Corvotempesta
Just to clarify, as I'm planning to put gluster in production (after
fixing some issues, but for this I need the community's help):

corruption happens only in this case:

- volume with shard enabled
AND
- rebalance operation

In any other case, corruption should not happen (or at least is not
known to happen).

So, what if I have to replace a failed brick/disk? Will this trigger
a rebalance and then corruption?

Is a rebalance only needed when you have to expand a volume, i.e. by
adding more bricks?
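For clarity, this is the kind of brick replacement I mean (names are
hypothetical); as far as I understand it only triggers a self-heal, not a
rebalance, but please confirm:

  gluster volume replace-brick vol0 server2:/bricks/dead server2:/bricks/new commit force
  gluster volume heal vol0 info   # wait for 0 pending entries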

2017-10-05 13:55 GMT+02:00 Nithya Balachandran :
>
>
> On 4 October 2017 at 23:34, WK  wrote:
>>
>> Just so I know.
>>
>> Is it correct to assume that this corruption issue is ONLY involved if you
>> are doing rebalancing with sharding enabled.
>>
>> So if I am not doing rebalancing I should be fine?
>
>
> That is correct.
>
>>
>> -bill
>>
>>
>>
>> On 10/3/2017 10:30 PM, Krutika Dhananjay wrote:
>>
>>
>>
>> On Wed, Oct 4, 2017 at 10:51 AM, Nithya Balachandran 
>> wrote:
>>>
>>>
>>>
>>> On 3 October 2017 at 13:27, Gandalf Corvotempesta
>>>  wrote:
>>>>
>>>> Any update about multiple bugs regarding data corruptions with
>>>> sharding enabled ?
>>>>
>>>> Is 3.12.1 ready to be used in production?
>>>
>>>
>>> Most issues have been fixed but there appears to be one more race for
>>> which the patch is being worked on.
>>>
>>> @Krutika, is that correct?
>>>
>>>
>>
>> That is my understanding too, yes, in light of the discussion that
>> happened at https://bugzilla.redhat.com/show_bug.cgi?id=1465123
>>
>> -Krutika
>>
>>>
>>> Thanks,
>>> Nithya
>>>
>>>
>>>
>>>>
>>>> ___
>>>> Gluster-users mailing list
>>>> Gluster-users@gluster.org
>>>> http://lists.gluster.org/mailman/listinfo/gluster-users
>>>
>>>
>>
>>
>>
>> ___
>> Gluster-users mailing list
>> Gluster-users@gluster.org
>> http://lists.gluster.org/mailman/listinfo/gluster-users
>>
>>
>>
>> ___
>> Gluster-users mailing list
>> Gluster-users@gluster.org
>> http://lists.gluster.org/mailman/listinfo/gluster-users
>
>
>
> ___
> Gluster-users mailing list
> Gluster-users@gluster.org
> http://lists.gluster.org/mailman/listinfo/gluster-users
___
Gluster-users mailing list
Gluster-users@gluster.org
http://lists.gluster.org/mailman/listinfo/gluster-users


[Gluster-users] iozone results

2017-10-11 Thread Gandalf Corvotempesta
I'm testing iozone inside a VM booted from a gluster volume.
By looking at network traffic on the host (the one connected to the
gluster storage) I can
see that a simple

iozone -w -c -e -i 0 -+n -C -r 64k -s 1g -t 1 -F /tmp/gluster.ioz


will generate about 1200 Mbit/s on a bonded dual-gigabit NIC (probably
with a bad bonding mode configured).

fio reports about 5 kB/s, which is 40 kbps.

As I'm using replica 3, the host has to write to 3 storage servers,
thus: 40*3 = 120 kbps.

If I understood properly, I'm able to reach about 120 kbps on the
network side with sequential writes, right?

Why does a simple "dd" return only 30 MB/s?
___
Gluster-users mailing list
Gluster-users@gluster.org
http://lists.gluster.org/mailman/listinfo/gluster-users


Re: [Gluster-users] ZFS with SSD ZIL vs XFS

2017-10-10 Thread Gandalf Corvotempesta
The last time I read about tiering in gluster, there wasn't any performance
gain with VM workloads, and moreover it doesn't speed up writes...

On 10 Oct 2017 at 9:27 PM, "Bartosz Zięba" wrote:

> Hi,
>
> Have you thought about using an SSD as a GlusterFS hot tiers?
>
> Regards,
> Bartosz
>
>
> On 10.10.2017 19:59, Gandalf Corvotempesta wrote:
>
>> 2017-10-10 18:27 GMT+02:00 Jeff Darcy :
>>
>>> Probably not.  If there is, it would probably favor XFS.  The developers
>>> at Red Hat use XFS almost exclusively.  We at Facebook have a mix, but
>>> XFS is (I think) the most common.  Whatever the developers use tends to
>>> become "the way local filesystems work" and code is written based on
>>> that profile, so even without intention that tends to get a bit of a
>>> boost.  To the extent that ZFS makes different tradeoffs - e.g. using
>>> lots more memory, very different disk access patterns - it's probably
>>> going to have a bit more of an "impedance mismatch" with the choices
>>> Gluster itself has made.
>>>
>> Ok, so XFS is the way to go :)
>> ___
>> Gluster-users mailing list
>> Gluster-users@gluster.org
>> http://lists.gluster.org/mailman/listinfo/gluster-users
>>
>
>
>
___
Gluster-users mailing list
Gluster-users@gluster.org
http://lists.gluster.org/mailman/listinfo/gluster-users

Re: [Gluster-users] ZFS with SSD ZIL vs XFS

2017-10-10 Thread Gandalf Corvotempesta
Any performance report to share?

On 10 Oct 2017 at 8:25 PM, "Dmitri Chebotarov" <4dim...@gmail.com> wrote:

>
> I've had good results with using SSD as LVM cache for gluster bricks (
> http://man7.org/linux/man-pages/man7/lvmcache.7.html). I still use XFS on
> bricks.
>
>
>
> On Tue, Oct 10, 2017 at 12:27 PM, Jeff Darcy  wrote:
>
>> On Tue, Oct 10, 2017, at 11:19 AM, Gandalf Corvotempesta wrote:
>> > Anyone made some performance comparison between XFS and ZFS with ZIL
>> > on SSD, in gluster environment ?
>> >
>> > I've tried to compare both on another SDS (LizardFS) and I haven't
>> > seen any tangible performance improvement.
>> >
>> > Is gluster different ?
>>
>> Probably not.  If there is, it would probably favor XFS.  The developers
>> at Red Hat use XFS almost exclusively.  We at Facebook have a mix, but
>> XFS is (I think) the most common.  Whatever the developers use tends to
>> become "the way local filesystems work" and code is written based on
>> that profile, so even without intention that tends to get a bit of a
>> boost.  To the extent that ZFS makes different tradeoffs - e.g. using
>> lots more memory, very different disk access patterns - it's probably
>> going to have a bit more of an "impedance mismatch" with the choices
>> Gluster itself has made.
>>
>> If you're interested in ways to benefit from a disk+SSD combo under XFS,
>> it is possible to configure XFS with a separate journal device but I
>> believe there were some bugs encountered when doing that.  Richard
>> Wareing's upcoming Dev Summit talk on Hybrid XFS might cover those, in
>> addition to his own work on using an SSD in even more interesting ways.
>> ___
>> Gluster-users mailing list
>> Gluster-users@gluster.org
>> http://lists.gluster.org/mailman/listinfo/gluster-users
>>
>
>
> ___
> Gluster-users mailing list
> Gluster-users@gluster.org
> http://lists.gluster.org/mailman/listinfo/gluster-users
>
___
Gluster-users mailing list
Gluster-users@gluster.org
http://lists.gluster.org/mailman/listinfo/gluster-users

Re: [Gluster-users] ZFS with SSD ZIL vs XFS

2017-10-10 Thread Gandalf Corvotempesta
2017-10-10 18:27 GMT+02:00 Jeff Darcy :
> Probably not.  If there is, it would probably favor XFS.  The developers
> at Red Hat use XFS almost exclusively.  We at Facebook have a mix, but
> XFS is (I think) the most common.  Whatever the developers use tends to
> become "the way local filesystems work" and code is written based on
> that profile, so even without intention that tends to get a bit of a
> boost.  To the extent that ZFS makes different tradeoffs - e.g. using
> lots more memory, very different disk access patterns - it's probably
> going to have a bit more of an "impedance mismatch" with the choices
> Gluster itself has made.

Ok, so XFS is the way to go :)
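For my own notes, the separate XFS journal device Jeff mentions elsewhere in
this thread should be something like this (device names are hypothetical, and
I haven't checked it against the bugs he refers to):

  mkfs.xfs -l logdev=/dev/nvme0n1p1,size=128m /dev/sdb1
  mount -o logdev=/dev/nvme0n1p1 /dev/sdb1 /bricks/b1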
___
Gluster-users mailing list
Gluster-users@gluster.org
http://lists.gluster.org/mailman/listinfo/gluster-users


[Gluster-users] ZFS with SSD ZIL vs XFS

2017-10-10 Thread Gandalf Corvotempesta
Has anyone made a performance comparison between XFS and ZFS with ZIL
on SSD in a gluster environment?

I've tried to compare both on another SDS (LizardFS) and I haven't
seen any tangible performance improvement.

Is gluster different?
___
Gluster-users mailing list
Gluster-users@gluster.org
http://lists.gluster.org/mailman/listinfo/gluster-users


Re: [Gluster-users] small files performance

2017-10-10 Thread Gandalf Corvotempesta
2017-10-10 8:25 GMT+02:00 Karan Sandha :

> Hi Gandalf,
>
> We have multiple tuning to do for small-files which decrease the time for
> negative lookups , meta-data caching, parallel readdir. Bumping the server
> and client event threads will help you out in increasing the small file
> performance.
>
> gluster v set   group metadata-cache
> gluster v set  group nl-cache
> gluster v set  performance.parallel-readdir on (Note : readdir
> should be on)
>

This is what I'm getting with the suggested parameters.
I'm running "fio" from a mounted gluster client:
172.16.0.12:/gv0 on /mnt2 type fuse.glusterfs
(rw,relatime,user_id=0,group_id=0,default_permissions,allow_other,max_read=131072)



# fio --ioengine=libaio --filename=fio.test --size=256M
--direct=1 --rw=randrw --refill_buffers --norandommap
--bs=8k --rwmixread=70 --iodepth=16 --numjobs=16
--runtime=60 --group_reporting --name=fio-test
fio-test: (g=0): rw=randrw, bs=8K-8K/8K-8K/8K-8K, ioengine=libaio,
iodepth=16
...
fio-2.16
Starting 16 processes
fio-test: Laying out IO file(s) (1 file(s) / 256MB)
Jobs: 14 (f=13): [m(5),_(1),m(8),f(1),_(1)] [33.9% done] [1000KB/440KB/0KB
/s] [125/55/0 iops] [eta 01m:59s]
fio-test: (groupid=0, jobs=16): err= 0: pid=2051: Tue Oct 10 16:51:46 2017
  read : io=43392KB, bw=733103B/s, iops=89, runt= 60610msec
slat (usec): min=14, max=1992.5K, avg=177873.67, stdev=382294.06
clat (usec): min=768, max=6016.8K, avg=1871390.57, stdev=1082220.06
 lat (usec): min=872, max=6630.6K, avg=2049264.23, stdev=1158405.41
clat percentiles (msec):
 |  1.00th=[   20],  5.00th=[  208], 10.00th=[  457], 20.00th=[  873],
 | 30.00th=[ 1237], 40.00th=[ 1516], 50.00th=[ 1795], 60.00th=[ 2073],
 | 70.00th=[ 2442], 80.00th=[ 2835], 90.00th=[ 3326], 95.00th=[ 3785],
 | 99.00th=[ 4555], 99.50th=[ 4948], 99.90th=[ 5211], 99.95th=[ 5800],
 | 99.99th=[ 5997]
  write: io=18856KB, bw=318570B/s, iops=38, runt= 60610msec
slat (usec): min=17, max=3428, avg=212.62, stdev=287.88
clat (usec): min=59, max=6015.6K, avg=1693729.12, stdev=1003122.83
 lat (usec): min=79, max=6015.9K, avg=1693941.74, stdev=1003126.51
clat percentiles (usec):
 |  1.00th=[  724],  5.00th=[144384], 10.00th=[403456],
20.00th=[765952],
 | 30.00th=[1105920], 40.00th=[1368064], 50.00th=[1630208],
60.00th=[1875968],
 | 70.00th=[2179072], 80.00th=[2572288], 90.00th=[3031040],
95.00th=[3489792],
 | 99.00th=[4227072], 99.50th=[4423680], 99.90th=[4751360],
99.95th=[5210112],
 | 99.99th=[5996544]
lat (usec) : 100=0.15%, 250=0.05%, 500=0.06%, 750=0.09%, 1000=0.05%
lat (msec) : 2=0.28%, 4=0.09%, 10=0.15%, 20=0.39%, 50=1.81%
lat (msec) : 100=1.02%, 250=1.63%, 500=5.59%, 750=6.03%, 1000=7.31%
lat (msec) : 2000=35.61%, >=2000=39.67%
  cpu  : usr=0.01%, sys=0.01%, ctx=8218, majf=11, minf=295
  IO depths: 1=0.2%, 2=0.4%, 4=0.8%, 8=1.6%, 16=96.9%, 32=0.0%,
>=64=0.0%
 submit: 0=0.0%, 4=100.0%, 8=0.0%, 16=0.0%, 32=0.0%, 64=0.0%,
>=64=0.0%
 complete  : 0=0.0%, 4=99.8%, 8=0.0%, 16=0.2%, 32=0.0%, 64=0.0%,
>=64=0.0%
 issued: total=r=5424/w=2357/d=0, short=r=0/w=0/d=0,
drop=r=0/w=0/d=0
 latency   : target=0, window=0, percentile=100.00%, depth=16

Run status group 0 (all jobs):
   READ: io=43392KB, aggrb=715KB/s, minb=715KB/s, maxb=715KB/s,
mint=60610msec, maxt=60610msec
  WRITE: io=18856KB, aggrb=311KB/s, minb=311KB/s, maxb=311KB/s,
mint=60610msec, maxt=60610msec
___
Gluster-users mailing list
Gluster-users@gluster.org
http://lists.gluster.org/mailman/listinfo/gluster-users

[Gluster-users] small files performance

2017-10-06 Thread Gandalf Corvotempesta
Any update about this?
I've seen some work on optimizing performance for small files; is
gluster now "usable" for storing, for example, Maildirs or git sources?

At least in 3.7 (or 3.8, I don't remember exactly), extracting the kernel
sources took about 4-5 minutes.
___
Gluster-users mailing list
Gluster-users@gluster.org
http://lists.gluster.org/mailman/listinfo/gluster-users


[Gluster-users] data corruption - any update?

2017-10-03 Thread Gandalf Corvotempesta
Any update on the multiple bugs regarding data corruption with
sharding enabled?

Is 3.12.1 ready to be used in production?
___
Gluster-users mailing list
Gluster-users@gluster.org
http://lists.gluster.org/mailman/listinfo/gluster-users


[Gluster-users] bonding mode

2017-10-02 Thread Gandalf Corvotempesta
I'm testing GlusterFS and LizardFS.
I've set up both SDSs with replica 3.

All servers are configured with bonding mode "balance-rr" with 2x1Gbps NICs.
With iperf I'm able to saturate both links with a single connection.
With Lizard I'm able to saturate both links with a single "dd" write.

With gluster I'm able to saturate only one link, reaching about 35 MB/s
(35*3*8 = 840 Mbit/s = 1 link only).

Any clue why gluster isn't able to use both links?
___
Gluster-users mailing list
Gluster-users@gluster.org
http://lists.gluster.org/mailman/listinfo/gluster-users


Re: [Gluster-users] EC 1+2

2017-09-23 Thread Gandalf Corvotempesta
I already read that.
It seems I have to use a multiple of 512, so 512*(3-2) = 512.

Seems fine.

On 23 Sep 2017 at 5:00 PM, "Dmitri Chebotarov" <4dim...@gmail.com> wrote:

> Hi
>
> Take a look at this link (under “Optimal volumes”), for Erasure Coded
> volume optimal configuration
>
> http://docs.gluster.org/Administrator%20Guide/Setting%20Up%20Volumes/
>
> On Sat, Sep 23, 2017 at 10:01 Gandalf Corvotempesta <
> gandalf.corvotempe...@gmail.com> wrote:
>
>> Is possible to create a dispersed volume 1+2 ? (Almost the same as
>> replica 3, the same as RAID-6)
>>
>> If yes, how many server I have to add in the future to expand the
>> storage? 1 or 3?
>> ___
>> Gluster-users mailing list
>> Gluster-users@gluster.org
>> http://lists.gluster.org/mailman/listinfo/gluster-users
>
>
___
Gluster-users mailing list
Gluster-users@gluster.org
http://lists.gluster.org/mailman/listinfo/gluster-users

[Gluster-users] EC 1+2

2017-09-23 Thread Gandalf Corvotempesta
Is it possible to create a dispersed volume 1+2? (Almost the same as replica
3, the same as RAID-6.)

If yes, how many servers do I have to add in the future to expand the storage:
1 or 3?
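For comparison, this is the smallest dispersed layout I think is accepted
(names are hypothetical); if I read the admin guide correctly, the brick
count must be greater than twice the redundancy, and expansion then happens
one whole disperse set at a time:

  gluster volume create dvol disperse 3 redundancy 1 \
      server1:/bricks/b1 server2:/bricks/b1 server3:/bricks/b1
  gluster volume add-brick dvol server4:/bricks/b1 server5:/bricks/b1 server6:/bricks/b1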
___
Gluster-users mailing list
Gluster-users@gluster.org
http://lists.gluster.org/mailman/listinfo/gluster-users

Re: [Gluster-users] GlusterFS as virtual machine storage

2017-09-08 Thread Gandalf Corvotempesta
2017-09-08 14:11 GMT+02:00 Pavel Szalbot :
> Gandalf, SIGKILL (killall -9 glusterfsd) did not stop I/O after few
> minutes. SIGTERM on the other hand causes crash, but this time it is
> not read-only remount, but around 10 IOPS tops and 2 IOPS on average.
> -ps

So, it seems to be resilient to server crashes but not to server shutdowns :)
___
Gluster-users mailing list
Gluster-users@gluster.org
http://lists.gluster.org/mailman/listinfo/gluster-users


Re: [Gluster-users] GlusterFS as virtual machine storage

2017-09-08 Thread Gandalf Corvotempesta
2017-09-08 13:44 GMT+02:00 Pavel Szalbot :
> I did not test SIGKILL because I suppose if graceful exit is bad, SIGKILL
> will be as well. This assumption might be wrong. So I will test it. It would
> be interesting to see client to work in case of crash (SIGKILL) and not in
> case of graceful exit of glusterfsd.

Exactly. If this happens, there is probably a bug in gluster's signal management.
___
Gluster-users mailing list
Gluster-users@gluster.org
http://lists.gluster.org/mailman/listinfo/gluster-users


Re: [Gluster-users] GlusterFS as virtual machine storage

2017-09-08 Thread Gandalf Corvotempesta
2017-09-08 13:21 GMT+02:00 Pavel Szalbot :
> Gandalf, isn't possible server hard-crash too much? I mean if reboot
> reliably kills the VM, there is no doubt network crash or poweroff
> will as well.

If I understood properly, the only way to keep I/O running is to gracefully
exit glusterfsd.
killall should send signal 15 (SIGTERM) to the process, so maybe there is a
bug in signal management on the gluster side? The kernel is already telling
glusterfsd to exit through signal 15, but glusterfsd seems to handle this
badly.

A server hard-crash doesn't send any signal. I think this is similar to
SIGKILL (9), which can't be caught/ignored on the software side.

In other words: is this a bug in gluster's signal management (if SIGKILL
works and SIGTERM doesn't, I'm almost sure it's a signal-management bug), an
engineering bug (relying only on a graceful exit to preserve I/O on clients,
even though SIGTERM should be treated as a graceful exit), or something else?
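To recap the test matrix from this thread, as reported by Pavel, run on one
storage node while watching I/O inside a VM:

  killall glusterfsd      # SIGTERM, graceful exit: VM I/O hangs/errors
  killall -9 glusterfsd   # SIGKILL, no cleanup: clients keep working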
___
Gluster-users mailing list
Gluster-users@gluster.org
http://lists.gluster.org/mailman/listinfo/gluster-users


Re: [Gluster-users] GlusterFS as virtual machine storage

2017-09-08 Thread Gandalf Corvotempesta
2017-09-08 13:07 GMT+02:00 Pavel Szalbot :

> OK, so killall seems to be ok after several attempts i.e. iops do not stop
> on VM. Reboot caused I/O errors after maybe 20 seconds since issuing the
> command. I will check the servers console during reboot to see if the VM
> errors appear just after the power cycle and will try to crash the VM after
> killall again...
>
>
Also try to kill the Gluster VM without killing glusterfsd, simulating a
server hard-crash, or try to remove the network interface.
___
Gluster-users mailing list
Gluster-users@gluster.org
http://lists.gluster.org/mailman/listinfo/gluster-users

Re: [Gluster-users] [New Release] GlusterD2 v4.0dev-7

2017-07-05 Thread Gandalf Corvotempesta
On 5 Jul 2017 at 11:31 AM, "Kaushal M" wrote:

- Preliminary support for volume expansion has been added. (Note that
rebalancing is not available yet)


What do you mean by this?
Are there any differences in volume expansion compared to the current architecture?
___
Gluster-users mailing list
Gluster-users@gluster.org
http://lists.gluster.org/mailman/listinfo/gluster-users

Re: [Gluster-users] Very slow performance on Sharded GlusterFS

2017-06-30 Thread Gandalf Corvotempesta
On 30 Jun 2017 at 3:51 PM,  wrote:

Note: I also noticed that you said “order”. Do you mean when we create via
volume set we have to make an order for bricks? I thought gluster handles
(and  do the math) itself.

Yes, you have to specify the exact order.
Gluster is not flexible in this regard and doesn't help you at all.
___
Gluster-users mailing list
Gluster-users@gluster.org
http://lists.gluster.org/mailman/listinfo/gluster-users

Re: [Gluster-users] How to shutdown a node properly ?

2017-06-30 Thread Gandalf Corvotempesta
Yes, but why does killing gluster notify all clients while a graceful
shutdown doesn't?
I think this is a bug: if I'm shutting down a server, it's obvious that all
clients should stop connecting to it.
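For reference, the workaround suggested below boils down to something like
this before a planned reboot (volume name is just an example):

  gluster volume heal volname info        # must show 0 pending entries
  killall glusterfs glusterfsd glusterd   # or extras/stop-all-gluster-processes.sh
  reboot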

On 30 Jun 2017 at 3:24 AM, "Ravishankar N" wrote:

> On 06/30/2017 12:40 AM, Renaud Fortier wrote:
>
> On my nodes, when i use the system.d script to kill gluster (service
> glusterfs-server stop) only glusterd is killed. Then I guess the shutdown
> doesn’t kill everything !
>
>
> Killing glusterd does not kill other gluster processes.
>
> When you shutdown a node, everything obviously gets killed but the client
> does not get notified immediately that the brick went down, leading for it
> to wait for the 42 second ping-timeout after which it assumes the brick is
> down. When you kill the brick manually before shutdown, the client
> immediate  receives the notification and you don't see the hang. See Xavi's
> description in Bug 1054694.
>
> So if it is a planned shutdown or reboot, it is better to kill the gluster
> processes before shutting the node down. BTW, you can use
> https://github.com/gluster/glusterfs/blob/master/extras/
> stop-all-gluster-processes.sh which automatically checks for pending
> heals etc before killing the gluster processes.
>
> -Ravi
>
>
>
>
> *De :* Gandalf Corvotempesta [mailto:gandalf.corvotempe...@gmail.com
> ]
> *Envoyé :* 29 juin 2017 13:41
> *À :* Ravishankar N  
> *Cc :* gluster-users@gluster.org; Renaud Fortier
>  
> *Objet :* Re: [Gluster-users] How to shutdown a node properly ?
>
>
>
> Init.d/system.d script doesn't kill gluster automatically on
> reboot/shutdown?
>
>
>
> Il 29 giu 2017 5:16 PM, "Ravishankar N"  ha
> scritto:
>
> On 06/29/2017 08:31 PM, Renaud Fortier wrote:
>
> Hi,
>
> Everytime I shutdown a node, I lost access (from clients) to the volumes
> for 42 seconds (network.ping-timeout). Is there a special way to shutdown a
> node to keep the access to the volumes without interruption ? Currently, I
> use the ‘shutdown’ or ‘reboot’ command.
>
> `killall glusterfs glusterfsd glusterd` before issuing shutdown or
> reboot. If it is a replica or EC volume, ensure that there are no pending
> heals before bringing down a node. i.e. `gluster volume heal volname info`
> should show 0 entries.
>
>
>
>
> My setup is :
>
> -4 gluster 3.10.3 nodes on debian 8 (jessie)
>
> -3 volumes Distributed-Replicate 2 X 2 = 4
>
>
>
> Thank you
>
> Renaud
>
>
>
> ___
>
> Gluster-users mailing list
>
> Gluster-users@gluster.org
>
> http://lists.gluster.org/mailman/listinfo/gluster-users
>
>
>
>
> ___
> Gluster-users mailing list
> Gluster-users@gluster.org
> http://lists.gluster.org/mailman/listinfo/gluster-users
>
>
>
___
Gluster-users mailing list
Gluster-users@gluster.org
http://lists.gluster.org/mailman/listinfo/gluster-users

Re: [Gluster-users] How to shutdown a node properly ?

2017-06-29 Thread Gandalf Corvotempesta
Doesn't the init.d/systemd script kill gluster automatically on
reboot/shutdown?

On 29 Jun 2017 at 5:16 PM, "Ravishankar N" wrote:

> On 06/29/2017 08:31 PM, Renaud Fortier wrote:
>
> Hi,
>
> Everytime I shutdown a node, I lost access (from clients) to the volumes
> for 42 seconds (network.ping-timeout). Is there a special way to shutdown a
> node to keep the access to the volumes without interruption ? Currently, I
> use the ‘shutdown’ or ‘reboot’ command.
>
> `killall glusterfs glusterfsd glusterd` before issuing shutdown or
> reboot. If it is a replica or EC volume, ensure that there are no pending
> heals before bringing down a node. i.e. `gluster volume heal volname info`
> should show 0 entries.
>
>
>
> My setup is :
>
> -4 gluster 3.10.3 nodes on debian 8 (jessie)
>
> -3 volumes Distributed-Replicate 2 X 2 = 4
>
>
>
> Thank you
>
> Renaud
>
>
> ___
> Gluster-users mailing 
> listGluster-users@gluster.orghttp://lists.gluster.org/mailman/listinfo/gluster-users
>
>
>
> ___
> Gluster-users mailing list
> Gluster-users@gluster.org
> http://lists.gluster.org/mailman/listinfo/gluster-users
>
___
Gluster-users mailing list
Gluster-users@gluster.org
http://lists.gluster.org/mailman/listinfo/gluster-users

Re: [Gluster-users] How to remove dead peer, osrry urgent again :(

2017-06-11 Thread Gandalf Corvotempesta
On 11 Jun 2017 at 1:00 PM, "Atin Mukherjee" wrote:

Yes. And please ensure you do this after bringing down all the glusterd
instances and then once the peer file is removed from all the nodes restart
glusterd on all the nodes one after another.


If you have to bring down all gluster instances before the file removal, you
also bring down the whole gluster storage.
___
Gluster-users mailing list
Gluster-users@gluster.org
http://lists.gluster.org/mailman/listinfo/gluster-users

Re: [Gluster-users] Rebalance + VM corruption - current status and request for feedback

2017-06-06 Thread Gandalf Corvotempesta
Any additional tests would be great, as a similar bug was detected and
fixed some months ago, and after that this bug arose.

It is still unclear to me why two very similar bugs were discovered at two
different times for the same operation.
How is this possible?

If you fixed the first bug, why wasn't the second one triggered in your
test environment?


On 6 Jun 2017 at 10:35 AM, "Mahdi Adnan" wrote:

> Hi,
>
>
> Sorry i did't confirm the results sooner.
>
> Yes, it's working fine without issues for me.
>
> If anyone else can confirm so we can be sure it's 100% resolved.
>
>
>
> --
>
> Respectfully
> *Mahdi A. Mahdi*
>
> --
> *From:* Krutika Dhananjay 
> *Sent:* Tuesday, June 6, 2017 9:17:40 AM
> *To:* Mahdi Adnan
> *Cc:* gluster-user; Gandalf Corvotempesta; Lindsay Mathieson; Kevin
> Lemonnier
> *Subject:* Re: Rebalance + VM corruption - current status and request for
> feedback
>
> Hi Mahdi,
>
> Did you get a chance to verify this fix again?
> If this fix works for you, is it OK if we move this bug to CLOSED state
> and revert the rebalance-cli warning patch?
>
> -Krutika
>
> On Mon, May 29, 2017 at 6:51 PM, Mahdi Adnan 
> wrote:
>
>> Hello,
>>
>>
>> Yes, i forgot to upgrade the client as well.
>>
>> I did the upgrade and created a new volume, same options as before, with
>> one VM running and doing lots of IOs. i started the rebalance with force
>> and after it completed the process i rebooted the VM, and it did start
>> normally without issues.
>>
>> I repeated the process and did another rebalance while the VM running and
>> everything went fine.
>>
>> But the logs in the client throwing lots of warning messages:
>>
>>
>> [2017-05-29 13:14:59.416382] W [MSGID: 114031]
>> [client-rpc-fops.c:2928:client3_3_lookup_cbk] 2-gfs_vol2-client-2:
>> remote operation failed. Path: /50294ed6-db7a-418d-965f-9b44c
>> 69a83fd/images/d59487fe-f3a9-4bad-a607-3a181c871711/aa01c3a0-5aa0-432d-82ad-d1f515f1d87f
>> (93c403f5-c769-44b9-a087-dc51fc21412e) [No such file or directory]
>> [2017-05-29 13:14:59.416427] W [MSGID: 114031]
>> [client-rpc-fops.c:2928:client3_3_lookup_cbk] 2-gfs_vol2-client-3:
>> remote operation failed. Path: /50294ed6-db7a-418d-965f-9b44c
>> 69a83fd/images/d59487fe-f3a9-4bad-a607-3a181c871711/aa01c3a0-5aa0-432d-82ad-d1f515f1d87f
>> (93c403f5-c769-44b9-a087-dc51fc21412e) [No such file or directory]
>> [2017-05-29 13:14:59.808251] W [MSGID: 114031]
>> [client-rpc-fops.c:2928:client3_3_lookup_cbk] 2-gfs_vol2-client-2:
>> remote operation failed. Path: /50294ed6-db7a-418d-965f-9b44c
>> 69a83fd/images/d59487fe-f3a9-4bad-a607-3a181c871711/aa01c3a0-5aa0-432d-82ad-d1f515f1d87f
>> (93c403f5-c769-44b9-a087-dc51fc21412e) [No such file or directory]
>> [2017-05-29 13:14:59.808287] W [MSGID: 114031]
>> [client-rpc-fops.c:2928:client3_3_lookup_cbk] 2-gfs_vol2-client-3:
>> remote operation failed. Path: /50294ed6-db7a-418d-965f-9b44c
>> 69a83fd/images/d59487fe-f3a9-4bad-a607-3a181c871711/aa01c3a0-5aa0-432d-82ad-d1f515f1d87f
>> (93c403f5-c769-44b9-a087-dc51fc21412e) [No such file or directory]
>>
>>
>>
>> Although the process went smooth, i will run another extensive test
>> tomorrow just to be sure.
>>
>> --
>>
>> Respectfully
>> *Mahdi A. Mahdi*
>>
>> --
>> *From:* Krutika Dhananjay 
>> *Sent:* Monday, May 29, 2017 9:20:29 AM
>>
>> *To:* Mahdi Adnan
>> *Cc:* gluster-user; Gandalf Corvotempesta; Lindsay Mathieson; Kevin
>> Lemonnier
>> *Subject:* Re: Rebalance + VM corruption - current status and request
>> for feedback
>>
>> Hi,
>>
>> I took a look at your logs.
>> It very much seems like an issue that is caused by a mismatch in
>> glusterfs client and server packages.
>> So your client (mount) seems to be still running 3.7.20, as confirmed by
>> the occurrence of the following log message:
>>
>> [2017-05-26 08:58:23.647458] I [MSGID: 100030] [glusterfsd.c:2338:main]
>> 0-/usr/sbin/glusterfs: Started running /usr/sbin/glusterfs version 3.7.20
>> (args: /usr/sbin/glusterfs --volfile-server=s1 --volfile-server=s2
>> --volfile-server=s3 --volfile-server=s4 --volfile-id=/testvol
>> /rhev/data-center/mnt/glusterSD/s1:_testvol)
>> [2017-05-26 08:58:40.901204] I [MSGID: 100030] [glusterfsd.c:2338:main]
>> 0-/usr/sbin/glusterfs: Started running /usr/sbin/glusterfs version 3.7.20
>> (args: /usr/sbin/glusterfs --volfile-server=s1 --volfile-server=s2
>> --volfile-server=s3 --volfile-server=s4 --v

Re: [Gluster-users] Rebalance + VM corruption - current status and request for feedback

2017-06-05 Thread Gandalf Corvotempesta
Great, thanks!

On 5 Jun 2017 at 6:49 AM, "Krutika Dhananjay" wrote:

> The fixes are already available in 3.10.2, 3.8.12 and 3.11.0
>
> -Krutika
>
> On Sun, Jun 4, 2017 at 5:30 PM, Gandalf Corvotempesta <
> gandalf.corvotempe...@gmail.com> wrote:
>
>> Great news.
>> Is this planned to be published in next release?
>>
>> Il 29 mag 2017 3:27 PM, "Krutika Dhananjay"  ha
>> scritto:
>>
>>> Thanks for that update. Very happy to hear it ran fine without any
>>> issues. :)
>>>
>>> Yeah so you can ignore those 'No such file or directory' errors. They
>>> represent a transient state where DHT in the client process is yet to
>>> figure out the new location of the file.
>>>
>>> -Krutika
>>>
>>>
>>> On Mon, May 29, 2017 at 6:51 PM, Mahdi Adnan 
>>> wrote:
>>>
>>>> Hello,
>>>>
>>>>
>>>> Yes, i forgot to upgrade the client as well.
>>>>
>>>> I did the upgrade and created a new volume, same options as before,
>>>> with one VM running and doing lots of IOs. i started the rebalance with
>>>> force and after it completed the process i rebooted the VM, and it did
>>>> start normally without issues.
>>>>
>>>> I repeated the process and did another rebalance while the VM running
>>>> and everything went fine.
>>>>
>>>> But the logs in the client throwing lots of warning messages:
>>>>
>>>>
>>>> [2017-05-29 13:14:59.416382] W [MSGID: 114031]
>>>> [client-rpc-fops.c:2928:client3_3_lookup_cbk] 2-gfs_vol2-client-2:
>>>> remote operation failed. Path: /50294ed6-db7a-418d-965f-9b44c
>>>> 69a83fd/images/d59487fe-f3a9-4bad-a607-3a181c871711/aa01c3a0-5aa0-432d-82ad-d1f515f1d87f
>>>> (93c403f5-c769-44b9-a087-dc51fc21412e) [No such file or directory]
>>>> [2017-05-29 13:14:59.416427] W [MSGID: 114031]
>>>> [client-rpc-fops.c:2928:client3_3_lookup_cbk] 2-gfs_vol2-client-3:
>>>> remote operation failed. Path: /50294ed6-db7a-418d-965f-9b44c
>>>> 69a83fd/images/d59487fe-f3a9-4bad-a607-3a181c871711/aa01c3a0-5aa0-432d-82ad-d1f515f1d87f
>>>> (93c403f5-c769-44b9-a087-dc51fc21412e) [No such file or directory]
>>>> [2017-05-29 13:14:59.808251] W [MSGID: 114031]
>>>> [client-rpc-fops.c:2928:client3_3_lookup_cbk] 2-gfs_vol2-client-2:
>>>> remote operation failed. Path: /50294ed6-db7a-418d-965f-9b44c
>>>> 69a83fd/images/d59487fe-f3a9-4bad-a607-3a181c871711/aa01c3a0-5aa0-432d-82ad-d1f515f1d87f
>>>> (93c403f5-c769-44b9-a087-dc51fc21412e) [No such file or directory]
>>>> [2017-05-29 13:14:59.808287] W [MSGID: 114031]
>>>> [client-rpc-fops.c:2928:client3_3_lookup_cbk] 2-gfs_vol2-client-3:
>>>> remote operation failed. Path: /50294ed6-db7a-418d-965f-9b44c
>>>> 69a83fd/images/d59487fe-f3a9-4bad-a607-3a181c871711/aa01c3a0-5aa0-432d-82ad-d1f515f1d87f
>>>> (93c403f5-c769-44b9-a087-dc51fc21412e) [No such file or directory]
>>>>
>>>>
>>>>
>>>> Although the process went smooth, i will run another extensive test
>>>> tomorrow just to be sure.
>>>>
>>>> --
>>>>
>>>> Respectfully
>>>> *Mahdi A. Mahdi*
>>>>
>>>> --
>>>> *From:* Krutika Dhananjay 
>>>> *Sent:* Monday, May 29, 2017 9:20:29 AM
>>>>
>>>> *To:* Mahdi Adnan
>>>> *Cc:* gluster-user; Gandalf Corvotempesta; Lindsay Mathieson; Kevin
>>>> Lemonnier
>>>> *Subject:* Re: Rebalance + VM corruption - current status and request
>>>> for feedback
>>>>
>>>> Hi,
>>>>
>>>> I took a look at your logs.
>>>> It very much seems like an issue that is caused by a mismatch in
>>>> glusterfs client and server packages.
>>>> So your client (mount) seems to be still running 3.7.20, as confirmed
>>>> by the occurrence of the following log message:
>>>>
>>>> [2017-05-26 08:58:23.647458] I [MSGID: 100030] [glusterfsd.c:2338:main]
>>>> 0-/usr/sbin/glusterfs: Started running /usr/sbin/glusterfs version 3.7.20
>>>> (args: /usr/sbin/glusterfs --volfile-server=s1 --volfile-server=s2
>>>> --volfile-server=s3 --volfile-server=s4 --volfile-id=/testvol
>>>> /rhev/data-center/mnt/glusterSD/s1:_testvol)
>>>> [2017-05-26 08:58:40.901204] I [MSGID: 100030] [glusterf

Re: [Gluster-users] Rebalance + VM corruption - current status and request for feedback

2017-06-04 Thread Gandalf Corvotempesta
Great news.
Is this planned to be published in the next release?

On 29 May 2017 at 3:27 PM, "Krutika Dhananjay" wrote:

> Thanks for that update. Very happy to hear it ran fine without any issues.
> :)
>
> Yeah so you can ignore those 'No such file or directory' errors. They
> represent a transient state where DHT in the client process is yet to
> figure out the new location of the file.
>
> -Krutika
>
>
> On Mon, May 29, 2017 at 6:51 PM, Mahdi Adnan 
> wrote:
>
>> Hello,
>>
>>
>> Yes, i forgot to upgrade the client as well.
>>
>> I did the upgrade and created a new volume, same options as before, with
>> one VM running and doing lots of IOs. i started the rebalance with force
>> and after it completed the process i rebooted the VM, and it did start
>> normally without issues.
>>
>> I repeated the process and did another rebalance while the VM running and
>> everything went fine.
>>
>> But the logs in the client throwing lots of warning messages:
>>
>>
>> [2017-05-29 13:14:59.416382] W [MSGID: 114031]
>> [client-rpc-fops.c:2928:client3_3_lookup_cbk] 2-gfs_vol2-client-2:
>> remote operation failed. Path: /50294ed6-db7a-418d-965f-9b44c
>> 69a83fd/images/d59487fe-f3a9-4bad-a607-3a181c871711/aa01c3a0-5aa0-432d-82ad-d1f515f1d87f
>> (93c403f5-c769-44b9-a087-dc51fc21412e) [No such file or directory]
>> [2017-05-29 13:14:59.416427] W [MSGID: 114031]
>> [client-rpc-fops.c:2928:client3_3_lookup_cbk] 2-gfs_vol2-client-3:
>> remote operation failed. Path: /50294ed6-db7a-418d-965f-9b44c
>> 69a83fd/images/d59487fe-f3a9-4bad-a607-3a181c871711/aa01c3a0-5aa0-432d-82ad-d1f515f1d87f
>> (93c403f5-c769-44b9-a087-dc51fc21412e) [No such file or directory]
>> [2017-05-29 13:14:59.808251] W [MSGID: 114031]
>> [client-rpc-fops.c:2928:client3_3_lookup_cbk] 2-gfs_vol2-client-2:
>> remote operation failed. Path: /50294ed6-db7a-418d-965f-9b44c
>> 69a83fd/images/d59487fe-f3a9-4bad-a607-3a181c871711/aa01c3a0-5aa0-432d-82ad-d1f515f1d87f
>> (93c403f5-c769-44b9-a087-dc51fc21412e) [No such file or directory]
>> [2017-05-29 13:14:59.808287] W [MSGID: 114031]
>> [client-rpc-fops.c:2928:client3_3_lookup_cbk] 2-gfs_vol2-client-3:
>> remote operation failed. Path: /50294ed6-db7a-418d-965f-9b44c
>> 69a83fd/images/d59487fe-f3a9-4bad-a607-3a181c871711/aa01c3a0-5aa0-432d-82ad-d1f515f1d87f
>> (93c403f5-c769-44b9-a087-dc51fc21412e) [No such file or directory]
>>
>>
>>
>> Although the process went smooth, i will run another extensive test
>> tomorrow just to be sure.
>>
>> --
>>
>> Respectfully
>> *Mahdi A. Mahdi*
>>
>> --
>> *From:* Krutika Dhananjay 
>> *Sent:* Monday, May 29, 2017 9:20:29 AM
>>
>> *To:* Mahdi Adnan
>> *Cc:* gluster-user; Gandalf Corvotempesta; Lindsay Mathieson; Kevin
>> Lemonnier
>> *Subject:* Re: Rebalance + VM corruption - current status and request
>> for feedback
>>
>> Hi,
>>
>> I took a look at your logs.
>> It very much seems like an issue that is caused by a mismatch in
>> glusterfs client and server packages.
>> So your client (mount) seems to be still running 3.7.20, as confirmed by
>> the occurrence of the following log message:
>>
>> [2017-05-26 08:58:23.647458] I [MSGID: 100030] [glusterfsd.c:2338:main]
>> 0-/usr/sbin/glusterfs: Started running /usr/sbin/glusterfs version 3.7.20
>> (args: /usr/sbin/glusterfs --volfile-server=s1 --volfile-server=s2
>> --volfile-server=s3 --volfile-server=s4 --volfile-id=/testvol
>> /rhev/data-center/mnt/glusterSD/s1:_testvol)
>> [2017-05-26 08:58:40.901204] I [MSGID: 100030] [glusterfsd.c:2338:main]
>> 0-/usr/sbin/glusterfs: Started running /usr/sbin/glusterfs version 3.7.20
>> (args: /usr/sbin/glusterfs --volfile-server=s1 --volfile-server=s2
>> --volfile-server=s3 --volfile-server=s4 --volfile-id=/testvol
>> /rhev/data-center/mnt/glusterSD/s1:_testvol)
>> [2017-05-26 08:58:48.923452] I [MSGID: 100030] [glusterfsd.c:2338:main]
>> 0-/usr/sbin/glusterfs: Started running /usr/sbin/glusterfs version 3.7.20
>> (args: /usr/sbin/glusterfs --volfile-server=s1 --volfile-server=s2
>> --volfile-server=s3 --volfile-server=s4 --volfile-id=/testvol
>> /rhev/data-center/mnt/glusterSD/s1:_testvol)
>>
>> whereas the servers have rightly been upgraded to 3.10.2, as seen in
>> rebalance log:
>>
>> [2017-05-26 09:36:36.075940] I [MSGID: 100030] [glusterfsd.c:2475:main]
>> 0-/usr/sbin/glusterfs: Started running /usr/sbin/glusterfs version 3.10.2
>> (args: /usr/sbin/glusterfs -s localhost --volfile-id rebala

Re: [Gluster-users] small files optimizations

2017-05-10 Thread Gandalf Corvotempesta
Yes, much clearer, but I think this causes some trouble, e.g. with the
available space shown by gluster. Or not?
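If I understood Kevin's suggestion below correctly, it would look something
like this (hosts and paths are hypothetical), with both volumes then
reporting the free space of the same underlying filesystem, which is exactly
my "space available" concern:

  mkdir -p /mnt/storage/brick_vm /mnt/storage/brick_mail   # on every node
  gluster volume create vms  replica 3 s1:/mnt/storage/brick_vm s2:/mnt/storage/brick_vm s3:/mnt/storage/brick_vm
  gluster volume create mail replica 3 s1:/mnt/storage/brick_mail s2:/mnt/storage/brick_mail s3:/mnt/storage/brick_mail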

On 10 May 2017 at 9:10 AM,  wrote:

> > As gluster doesn't support to "share" bricks with multiple volumes, I
> would
> > like to create a single volume with VMs and maildirs
>
> Just create two bricks for two volumes ?
> For example if your disk is /mnt/storage, have a
> /mnt/storage/brick_VM and a /mnt/storage/brick_mail or something like that.
> It'll be a lot cleaner I think
>
> ___
> Gluster-users mailing list
> Gluster-users@gluster.org
> http://lists.gluster.org/mailman/listinfo/gluster-users
>
___
Gluster-users mailing list
Gluster-users@gluster.org
http://lists.gluster.org/mailman/listinfo/gluster-users

[Gluster-users] small files optimizations

2017-05-10 Thread Gandalf Corvotempesta
Currently, which are the best small-file optimizations that we can enable
on a gluster storage?

I'm planning to move a couple of dovecot servers, with thousands of mail files
(from a couple of KB to less than 10-20 MB).

Are these optimizations compatible with VM workloads, like sharding?
As gluster doesn't support "sharing" bricks between multiple volumes, I would
like to create a single volume for both VMs and maildirs.

Does it make sense?

Would it be interesting to implement a sort of logical volume (with its own
feature settings) on top of the same bricks?
Like LVM on a disk.
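For the maildir volume I would probably start from the small-file tunables
suggested elsewhere on this list (volume name is hypothetical; whether the
option groups are available depends on the gluster version):

  gluster volume set mailvol group metadata-cache
  gluster volume set mailvol group nl-cache
  gluster volume set mailvol performance.parallel-readdir on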
___
Gluster-users mailing list
Gluster-users@gluster.org
http://lists.gluster.org/mailman/listinfo/gluster-users

Re: [Gluster-users] [Gluster-devel] Don't allow data loss via add-brick (was Re: Add single server)

2017-05-03 Thread Gandalf Corvotempesta
2017-05-03 14:22 GMT+02:00 Atin Mukherjee :
> Fix is up @ https://review.gluster.org/#/c/17160/ . The only thing which
> we'd need to decide (and are debating on) is that should we bypass this
> validation with rebalance start force or not. What do others think?

This is a good way to manage bugs. The release notes were updated accordingly,
and the gluster CLI doesn't allow "stupid" operations, like a rebalance on a
sharded volume, that lead to data loss.

Well done.
___
Gluster-users mailing list
Gluster-users@gluster.org
http://lists.gluster.org/mailman/listinfo/gluster-users


Re: [Gluster-users] Add single server

2017-05-01 Thread Gandalf Corvotempesta
2017-05-01 21:08 GMT+02:00 Vijay Bellur :
> We might also want to start thinking about spare bricks that can be brought
> into a volume based on some policy.  For example, if the posix health
> checker determines that underlying storage stack has problems, we can bring
> a spare brick into the volume to replace the failing brick. More policies
> can be evolved for triggering the action of bringing in a spare brick to a
> volume.

Something similar to a global hot spare:
if Gluster detects some SMART issues (lots of reallocations, predictive
failures and so on) it can
bring the hot spare into action, starting to replace the almost-failed disk.

If the disk fails completely during the replacement, healing should resume
from the point already reached and not from the start (as some data was
already synced automatically), using the "spare" disk as the automatic
replacement.
___
Gluster-users mailing list
Gluster-users@gluster.org
http://lists.gluster.org/mailman/listinfo/gluster-users


Re: [Gluster-users] Add single server

2017-05-01 Thread Gandalf Corvotempesta
2017-05-01 21:00 GMT+02:00 Shyam :
> So, Gandalf, it will be part of the roadmap, just when we maybe able to pick
> and deliver this is not clear yet (as Pranith puts it as well).

It doesn't matter when. Knowing that adding a single brick will be made
possible is enough (at least for me).
___
Gluster-users mailing list
Gluster-users@gluster.org
http://lists.gluster.org/mailman/listinfo/gluster-users


Re: [Gluster-users] Add single server

2017-05-01 Thread Gandalf Corvotempesta
2017-05-01 20:55 GMT+02:00 Pranith Kumar Karampuri :
> Replace-brick as a command is implemented with the goal of replacing a disk
> that went bad. So the availability was already less. In 2013-2014 I proposed
> that we do it by adding brick to just the replica set and increase its
> replica-count just for that set once heal is complete we could remove this
> brick. But at the point I didn't see any benefit to that approach, because
> availability was already down by 1. But with all of this discussion it seems
> like a good time to revive this idea. I saw that Shyam suggested the same in
> the PR he mentioned before.

Why is availability already less?
replace-brick is useful for adding a new disk (as we are discussing here) or
if you have to preventively replace/decommission a disk.

If you have disks that are getting older and older, you can safely replace
them one by one with replace-brick. Doing it this way keeps you at the
desired redundancy for the whole phase.
If you just remove the older disk and let gluster heal, you lose one replica;
during the heal process another disk could fail, and so on.

The same goes for any RAID: if possible, adding the new disk and then
removing the older one is better than brutally replacing disks. mdadm with
its replace operation (and I think ZFS too) adds the new disk while keeping
full redundancy, and after the replacement is done the older disk is
decommissioned.

I don't see any drawback in doing this with gluster either, only advantages.
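The mdadm behaviour I'm referring to (device names are hypothetical):

  mdadm /dev/md0 --add /dev/sdd1
  mdadm /dev/md0 --replace /dev/sdb1 --with /dev/sdd1   # keeps full redundancy while copying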
___
Gluster-users mailing list
Gluster-users@gluster.org
http://lists.gluster.org/mailman/listinfo/gluster-users


Re: [Gluster-users] Add single server

2017-05-01 Thread Gandalf Corvotempesta
2017-05-01 20:46 GMT+02:00 Shyam :
> Fair point. If Gandalf concurs, we will add this to our "+1 scaling" feature
> effort (not yet on github as an issue).

Everything is ok for me as long as:

a) the operation is automated (this is what I asked for initially [1]),
maybe with a single command;

b) during the replace phase, the replica count is not decreased but increased.

[1]
Joe's solution requires a distributed-replicated volume. This shouldn't
be an issue: modern servers have multiple disk bays, starting with 1
disk per server (replica 3) is fine, and if I need to expand I'll add 3
more disks (1 more per server). After this, it will be possible to do a
replacement like in Joe's solution. The biggest drawback is that this
solution isn't viable for servers with just 1 brick and no more disk
bays available (i.e. a single RAID with all disks used as a single brick).
___
Gluster-users mailing list
Gluster-users@gluster.org
http://lists.gluster.org/mailman/listinfo/gluster-users


Re: [Gluster-users] Add single server

2017-05-01 Thread Gandalf Corvotempesta
2017-05-01 20:43 GMT+02:00 Shyam :
> I do agree that for the duration a brick is replaced its replication count
> is down by 1, is that your concern? In which case I do note that without (a)
> above, availability is at risk during the operation. Which needs other
> strategies/changes to ensure tolerance to errors/faults.

Oh yes, I had forgotten this too.

I don't know Ceph, but Lizard, when moving chunks across the cluster,
does a copy, not a move.
During the whole operation you'll end up with some files/chunks
replicated more than required.

If you have replica 3, during the move some files get replica 4;
in Gluster the same operation brings you down to replica 2.

IMHO, this isn't a viable/reliable solution.

Any chance of changing "replace-brick" to increase the replica count
during the operation?
___
Gluster-users mailing list
Gluster-users@gluster.org
http://lists.gluster.org/mailman/listinfo/gluster-users


Re: [Gluster-users] Add single server

2017-05-01 Thread Gandalf Corvotempesta
2017-05-01 20:42 GMT+02:00 Joe Julian :
> Because it's done by humans.

Exactly. I forgot to mention this.
___
Gluster-users mailing list
Gluster-users@gluster.org
http://lists.gluster.org/mailman/listinfo/gluster-users


Re: [Gluster-users] Add single server

2017-05-01 Thread Gandalf Corvotempesta
2017-05-01 20:36 GMT+02:00 Pranith Kumar Karampuri :
> Why?

Because you have to manually replace bricks with the newer one, format
the older one and add it back.
What happens if, by mistake, we replace the older brick with another
brick on the same disk?

Currently you only have to check proper placement per server;
with this workaround you also have
to check brick placement on each disk. You add a level, and thus
you increase the moving parts and
the operations that may go wrong.
___
Gluster-users mailing list
Gluster-users@gluster.org
http://lists.gluster.org/mailman/listinfo/gluster-users


Re: [Gluster-users] Add single server

2017-05-01 Thread Gandalf Corvotempesta
2017-05-01 20:30 GMT+02:00 Shyam :
> Yes, as a matter of fact, you can do this today using the CLI and creating
> nx2 instead of 1x2. 'n' is best decided by you, depending on the growth
> potential of your cluster, as at some point 'n' wont be enough if you grow
> by some nodes.
>
> But, when a brick is replaced we will fail to address "(a) ability to retain
> replication/availability levels" as we support only homogeneous replication
> counts across all DHT subvols. (I could be corrected on this when using
> replace-brick though)


Yes, but this is error prone.

I'm still thinking that saving (I don't know where, I don't know how)
a mapping between
files and bricks would solve many issues and add much more flexibility.
___
Gluster-users mailing list
Gluster-users@gluster.org
http://lists.gluster.org/mailman/listinfo/gluster-users


Re: [Gluster-users] Add single server

2017-05-01 Thread Gandalf Corvotempesta
2017-05-01 20:22 GMT+02:00 Shyam :
> Brick splitting (I think was first proposed by Jeff Darcy) is to create more
> bricks out of given storage backends. IOW, instead of using a given brick as
> is, create sub-dirs and use them as bricks.
>
> Hence, given 2 local FS end points by the user (say), instead of creating a
> 1x2 volume, create a nx2 volume, with n sub-dirs within the given local FS
> end points as the bricks themselves.
>
> Hence, this gives us n units to work with than just one, helping with issues
> like +1 scaling, among others.

So, with just one disk, you'd be able to do a replacement like
Joe's solution
for adding a single brick, regardless of the replica count.
___
Gluster-users mailing list
Gluster-users@gluster.org
http://lists.gluster.org/mailman/listinfo/gluster-users


Re: [Gluster-users] Add single server

2017-05-01 Thread Gandalf Corvotempesta
2017-05-01 20:08 GMT+02:00 Pranith Kumar Karampuri :
> Filename can be renamed and then we lost the link because hash will be
> different. Anyways all these kinds of problems are already solved in
> distribute layer.

A filename can be renamed even with the current architecture.
How do you change the GFID after a file rename? In the same way, you can
re-hash the file.

> I am sorry at the moment with the given information I am not able to wrap my
> head around the solution you are trying to suggest :-(.

Mine was just a POC.

tl;dr: if you can save a mapping between files and bricks inside the
gluster cluster,
you'll get much more flexibility, no SPOF, and no need for dedicated
metadata servers.

> At the moment, brick-splitting, inversion of afr/dht has some merit in my
> mind, with tilt towards any solution that avoids this inversion and still
> get the desired benefits.

What is brick-splitting ? Any docs about this ?
___
Gluster-users mailing list
Gluster-users@gluster.org
http://lists.gluster.org/mailman/listinfo/gluster-users


Re: [Gluster-users] Add single server

2017-05-01 Thread Gandalf Corvotempesta
2017-05-01 20:00 GMT+02:00 Pranith Kumar Karampuri :
> Let's say we have 1 disk, we format it with say XFS and that becomes a brick
> at the moment. Just curious, what will be the relationship between brick to
> disk in this case(If we leave out LVM for this example)?

No relation. You have to add that brick to the volume.
___
Gluster-users mailing list
Gluster-users@gluster.org
http://lists.gluster.org/mailman/listinfo/gluster-users


Re: [Gluster-users] Add single server

2017-05-01 Thread Gandalf Corvotempesta
2017-05-01 19:50 GMT+02:00 Shyam :
> Splitting the bricks need not be a post factum decision, we can start with
> larger brick counts, on a given node/disk count, and hence spread these
> bricks to newer nodes/bricks as they are added.
>
> If I understand the ceph PG count, it works on a similar notion, till the
> cluster grows beyond the initial PG count (set for the pool) at which point
> there is a lot more data movement (as the pg count has to be increased, and
> hence existing PGs need to be further partitioned)

Exactly.
The last time I used Ceph, the PGs worked in a similar way.
___
Gluster-users mailing list
Gluster-users@gluster.org
http://lists.gluster.org/mailman/listinfo/gluster-users


Re: [Gluster-users] Add single server

2017-05-01 Thread Gandalf Corvotempesta
2017-05-01 19:36 GMT+02:00 Pranith Kumar Karampuri :
> To know GFID of file1 you must know where the file resides so that you can
> do getxattr trusted.gfid on the file. So storing server/brick location on
> gfid is not getting us much more information that what we already have.

It was an example. You can use the same xattr solution based on a hash.
A full path within a volume is unique (obviously, you can't have two
"/tmp/my/file" on the same volume), thus
hashing it with something like SHA1("/tmp/my/file") will give you a
unique name (50b73d9c5dfda264d3878860ed7b1295e104e8ae).
You can use that unique file name (stored somewhere like
".metadata/50b73d9c5dfda264d3878860ed7b1295e104e8ae") to store the
xattr with the proper file locations across the cluster.

As long as you sync the ".metadata" directory across the trusted pool
(or across all members hosting the affected volume),
you should be able to get the proper file location by looking up the xattr.

This is just a very basic and stupid POC; I'm just trying to explain
my reasoning.
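
A minimal sketch of that POC in shell (the ".metadata" layout and the
"user.file_location" xattr name are hypothetical, not existing Gluster
features):

# derive a fixed, unique name from the full path
HASH=$(printf '%s' '/tmp/my/file' | sha1sum | awk '{print $1}')

# record the file's real location as an xattr on the metadata entry
mkdir -p .metadata && touch ".metadata/$HASH"
setfattr -n user.file_location -v 'server1:/brick1' ".metadata/$HASH"

# any client can later resolve the location without the hash algorithm
getfattr -n user.file_location --only-values ".metadata/$HASH"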
___
Gluster-users mailing list
Gluster-users@gluster.org
http://lists.gluster.org/mailman/listinfo/gluster-users


Re: [Gluster-users] Add single server

2017-05-01 Thread Gandalf Corvotempesta
2017-05-01 19:12 GMT+02:00 Pranith Kumar Karampuri :
> I agree it should. Question is how? What will be the resulting brick-map?

This is why I'm suggesting adding a file mapping somewhere.
You could also use xattrs for this:

"file1" is mapped to a GFID; then, as an xattr for that GFID, you could
save the server/brick location. This
way you always know where a file is.

To keep it simple for non-developers like me (this is wrong, it's a
simplification):
"/tmp/file1" hashes to 306040e474f199e7969ec266afd10d93

The hash starts with "3", thus the file is located on brick3.

You don't need any metadata for this; the hash algorithm is the only
thing you need.

But if you store the file-location mapping somewhere (for example as an
xattr for the GFID file) you can look up the file without using the
hash-algorithm location.

ORIG_FILE="/tmp/file1"
GFID="306040e474f199e7969ec266afd10d93"
FILE_LOCATION=$(getfattr -n "file_location" $GFID)

if $FILE_LOCATION
   read from $FILE_LOCATION
else
   read from original algoritm
___
Gluster-users mailing list
Gluster-users@gluster.org
http://lists.gluster.org/mailman/listinfo/gluster-users


Re: [Gluster-users] Add single server

2017-05-01 Thread Gandalf Corvotempesta
2017-05-01 18:57 GMT+02:00 Pranith Kumar Karampuri :
> Yes this is precisely what all the other SDS with metadata servers kind of
> do. They kind of keep a map of on what all servers a particular file/blob is
> stored in a metadata server.

Not exactly. Other SDSes have some servers dedicated to metadata and,
personally, I don't like that approach.

> GlusterFS doesn't do that. In GlusterFS what
> bricks need to be replicated is always given and distribute layer on top of
> these replication layer will do the job of distributing and fetching the
> data. Because replication happens at a brick level and not at a file level
> and distribute happens on top of replication and not at file level. There
> isn't too much metadata that needs to be stored per file. Hence no need for
> separate metadata servers.

And this is great; that's why I'm talking about embedding a sort of database
to be stored on all nodes: no metadata servers, only a mapping between files
and servers.

> If you know path of the file, you can always know where the file is stored
> using pathinfo:
> Method-2 in the following link:
> https://gluster.readthedocs.io/en/latest/Troubleshooting/gfid-to-path/
>
> You don't need any db.

For the current gluster, yes.
I'm talking about a different thing.

In a RAID, you have data stored somewhere on the array, with metadata
defining how this data should
be written or read. Obviously, RAID metadata must be stored in a fixed
position, or you won't be able to read
it.

Something similar could be added in gluster (I don't know if it would
be hard): you store a file mapping in a fixed
position in gluster, then all gluster clients will be able to know
where a file is by looking at this "metadata" stored in
the fixed position.

Something like the ".gluster" directory. Gluster already uses some "internal"
directories for internal operations (".shards", ".gluster", ".trash").
Would a ".metadata" directory with the file mapping be hard to add?

> Basically what you want, if I understood correctly is:
> If we add a 3rd node with just one disk, the data should automatically
> arrange itself splitting itself to 3 categories(Assuming replica-2)
> 1) Files that are present in Node1, Node2
> 2) Files that are present in Node2, Node3
> 3) Files that are present in Node1, Node3
>
> As you can see we arrived at a contradiction where all the nodes should have
> at least 2 bricks but there is only 1 disk. Hence the contradiction. We
> can't do what you are asking without brick splitting. i.e. we need to split
> the disk into 2 bricks.

I don't think so.
Let's assume a replica 2.

S1B1 + S2B1

1TB each, thus 1TB available (2TB/2)

Adding a third 1TB disk should increase available space to 1.5TB (3TB/2)
___
Gluster-users mailing list
Gluster-users@gluster.org
http://lists.gluster.org/mailman/listinfo/gluster-users


Re: [Gluster-users] Add single server

2017-05-01 Thread Gandalf Corvotempesta
2017-05-01 18:30 GMT+02:00 Gandalf Corvotempesta
:
> Maybe a simple DB (just as an idea: sqlite, berkeleydb, ...) stored in
> a fixed location on gluster itself, being replicated across nodes.

Even better, embedding RocksDB with its data directory stored in Gluster
___
Gluster-users mailing list
Gluster-users@gluster.org
http://lists.gluster.org/mailman/listinfo/gluster-users


Re: [Gluster-users] Add single server

2017-05-01 Thread Gandalf Corvotempesta
2017-05-01 18:23 GMT+02:00 Pranith Kumar Karampuri :
> IMHO It is difficult to implement what you are asking for without metadata
> server which stores where each replica is stored.

Can't you distribute a sort of file mapping to each node ?
AFAIK, gluster already has some metadata stored in the cluster; what
is missing is a mapping between each file/shard and its brick.

Maybe a simple DB (just as an idea: sqlite, berkeleydb, ...) stored in
a fixed location on gluster itself, being replicated across nodes.
___
Gluster-users mailing list
Gluster-users@gluster.org
http://lists.gluster.org/mailman/listinfo/gluster-users


Re: [Gluster-users] Add single server

2017-05-01 Thread Gandalf Corvotempesta
Il 29 apr 2017 4:12 PM, "Gandalf Corvotempesta" <
gandalf.corvotempe...@gmail.com> ha scritto:

Anyway, the proposed workaround:
https://joejulian.name/blog/how-to-expand-glusterfs-
replicated-clusters-by-one-server/
won't work with just a single volume made up of 2 replicated bricks.
If I have a replica 2 volume with server1:brick1 and server2:brick1,
how can I add server3:brick1 ?
I don't have any bricks to "replace"


Can someone confirm this?
Is it possible to use the method described by Joe even with only 3 bricks?

What if I would like to add the fourth?

I'm really asking, not criticizing.
___
Gluster-users mailing list
Gluster-users@gluster.org
http://lists.gluster.org/mailman/listinfo/gluster-users

Re: [Gluster-users] Add single server

2017-04-30 Thread Gandalf Corvotempesta
2017-04-30 10:13 GMT+02:00  :
> I was (I believe) the first one to run into the bug, it happens and I knew it
> was a risk when installing gluster.

I know.

> But since then I didn't see any warnings anywhere except here, I agree
> with you that it should be mentionned in big bold letters on the site.
>
> Might even be worth adding a warning directly on the cli when trying to
> add bricks if sharding is enabled, to make sure no-one will destroy a
> whole cluster for a known bug.

Exactly. This is making me angry.

Even $BigVendor usually release a security bulletin, in example:
https://support.citrix.com/article/CTX214305
https://support.citrix.com/article/CTX214768

Immediately after discovering that bug, a report was made available (on
the official website, not on a mailing list)
telling users which operations should be avoided until a fix is made.

Gluster doesn't. There is a huge bug that isn't referenced in the official docs.

This is not acting like a customer; I'm just asking for some transparency.

Even if this is an open source project, nobody should play with user data.
This bug (or, better, these bugs) has been known for some time, and there is
NOT A WORD about it in any official docs or on the web site.

It is not a rare bug: it *always* loses data when used with VMs and
sharding during a rebalance.
This feature should be disabled, or users should be warned somewhere on
the web site, instead of forcing
all of them to look through the ML archives.

Anyway, I've just asked for a feature: simplifying the add-brick
process. Gluster devs are free to ignore it,
but if they are interested in something similar, I'm willing to provide
more info (if I can; I'm not a developer).

I really love gluster: the lack of a metadata server is awesome, and files
stored "verbatim" with no alteration are amazing (almost all SDS alter
files when storing them on disk),
but being forced to add bricks in multiples of the replica count is
making gluster very expensive (yes, there is a workaround with multiple
steps, but it is prone to
error, thus I'm asking to simplify this phase by allowing users to add a
single brick to a replica X volume with automatic member replacement
and rebalance).
___
Gluster-users mailing list
Gluster-users@gluster.org
http://lists.gluster.org/mailman/listinfo/gluster-users


Re: [Gluster-users] Add single server

2017-04-30 Thread Gandalf Corvotempesta
I'm not acting like a customer, but having lost a whole cluster twice for
the same reason (luckily they were test clusters with no valuable data, only
because I'm still waiting for a hardware part that has yet to arrive),
I'm a little bit angry.

The commercial solution isn't a solution, because this bug is present even
there.

I haven't put my company's data on this cluster only because the HBA I would
like to use hasn't been delivered to my address yet; I'm waiting for it and
for shorter rack rails.

So I was a little bit lucky. If I had all the hardware parts, I would probably
have been fired after causing data loss by using software marked as stable.

It is known that this feature causes data loss, and there is no mention of it
or warning in the official docs.

Il 30 apr 2017 12:14 AM,  ha scritto:

> I have to agree though, you keep acting like a customer.
> If you don't like what the developers focus on, you are free to
> try and offer a bounty to motivate someone to look at what you want,
> or even better : go and buy a license for one of gluster's commercial
> alternatives.
>
>
> On Sat, Apr 29, 2017 at 11:43:54PM +0200, Gandalf Corvotempesta wrote:
> > I'm pretty sure that I'll be able to sleep well even after your block.
> >
> > Il 29 apr 2017 11:28 PM, "Joe Julian"  ha scritto:
> >
> > > No, you proposed a wish. A feature needs described behavior, certainly
> a
> > > lot more than "it should just know what I want it to do".
> > >
> > > I'm done. You can continue to feel entitled here on the mailing list.
> I'll
> > > just set my filters to bitbucket anything from you.
> > >
> > > On 04/29/2017 01:00 PM, Gandalf Corvotempesta wrote:
> > >
> > > I repeat: I've just proposed a feature
> > > I'm not a C developer and I don't know gluster internals, so I can't
> > > provide details
> > >
> > > I've just asked if simplifying the add brick process is something that
> > > developers are interested to add
> > >
> > > Il 29 apr 2017 9:34 PM, "Joe Julian"  ha
> scritto:
> > >
> > >> What I said publicly in another email ... but not to call out my
> > >> perception of your behavior publicly if also like to say:
> > >>
> > >> Acting adversarial doesn't make anybody want to help, especially not
> me
> > >> and I'm the user community's biggest proponent.
> > >>
> > >> On April 29, 2017 11:08:45 AM PDT, Gandalf Corvotempesta <
> > >> gandalf.corvotempe...@gmail.com> wrote:
> > >>>
> > >>> Mine was a suggestion
> > >>> Fell free to ignore was gluster users has to say and still keep going
> > >>> though your way
> > >>>
> > >>> Usually, open source project tends to follow users suggestions
> > >>>
> > >>> Il 29 apr 2017 5:32 PM, "Joe Julian"  ha
> scritto:
> > >>>
> > >>>> Since this is an open source community project, not a company
> product,
> > >>>> feature requests like these are welcome, but would be more welcome
> with
> > >>>> either code or at least a well described method. Broad asks like
> these are
> > >>>> of little value, imho.
> > >>>>
> > >>>>
> > >>>> On 04/29/2017 07:12 AM, Gandalf Corvotempesta wrote:
> > >>>>
> > >>>>> Anyway, the proposed workaround:
> > >>>>> https://joejulian.name/blog/how-to-expand-glusterfs-replicat
> > >>>>> ed-clusters-by-one-server/
> > >>>>> won't work with just a single volume made up of 2 replicated
> bricks.
> > >>>>> If I have a replica 2 volume with server1:brick1 and
> server2:brick1,
> > >>>>> how can I add server3:brick1 ?
> > >>>>> I don't have any bricks to "replace"
> > >>>>>
> > >>>>> This is something i would like to see implemented in gluster.
> > >>>>>
> > >>>>> 2017-04-29 16:08 GMT+02:00 Gandalf Corvotempesta
> > >>>>> :
> > >>>>>
> > >>>>>> 2017-04-24 10:21 GMT+02:00 Pranith Kumar Karampuri <
> > >>>>>> pkara...@redhat.com>:
> > >>>>>>
> > >>>>>>> Are you suggesting this process to be easier through commands,
> > >>>>&

Re: [Gluster-users] Add single server

2017-04-29 Thread Gandalf Corvotempesta
I'm pretty sure that I'll be able to sleep well even after your block.

Il 29 apr 2017 11:28 PM, "Joe Julian"  ha scritto:

> No, you proposed a wish. A feature needs described behavior, certainly a
> lot more than "it should just know what I want it to do".
>
> I'm done. You can continue to feel entitled here on the mailing list. I'll
> just set my filters to bitbucket anything from you.
>
> On 04/29/2017 01:00 PM, Gandalf Corvotempesta wrote:
>
> I repeat: I've just proposed a feature
> I'm not a C developer and I don't know gluster internals, so I can't
> provide details
>
> I've just asked if simplifying the add brick process is something that
> developers are interested to add
>
> Il 29 apr 2017 9:34 PM, "Joe Julian"  ha scritto:
>
>> What I said publicly in another email ... but not to call out my
>> perception of your behavior publicly if also like to say:
>>
>> Acting adversarial doesn't make anybody want to help, especially not me
>> and I'm the user community's biggest proponent.
>>
>> On April 29, 2017 11:08:45 AM PDT, Gandalf Corvotempesta <
>> gandalf.corvotempe...@gmail.com> wrote:
>>>
>>> Mine was a suggestion
>>> Fell free to ignore was gluster users has to say and still keep going
>>> though your way
>>>
>>> Usually, open source project tends to follow users suggestions
>>>
>>> Il 29 apr 2017 5:32 PM, "Joe Julian"  ha scritto:
>>>
>>>> Since this is an open source community project, not a company product,
>>>> feature requests like these are welcome, but would be more welcome with
>>>> either code or at least a well described method. Broad asks like these are
>>>> of little value, imho.
>>>>
>>>>
>>>> On 04/29/2017 07:12 AM, Gandalf Corvotempesta wrote:
>>>>
>>>>> Anyway, the proposed workaround:
>>>>> https://joejulian.name/blog/how-to-expand-glusterfs-replicat
>>>>> ed-clusters-by-one-server/
>>>>> won't work with just a single volume made up of 2 replicated bricks.
>>>>> If I have a replica 2 volume with server1:brick1 and server2:brick1,
>>>>> how can I add server3:brick1 ?
>>>>> I don't have any bricks to "replace"
>>>>>
>>>>> This is something i would like to see implemented in gluster.
>>>>>
>>>>> 2017-04-29 16:08 GMT+02:00 Gandalf Corvotempesta
>>>>> :
>>>>>
>>>>>> 2017-04-24 10:21 GMT+02:00 Pranith Kumar Karampuri <
>>>>>> pkara...@redhat.com>:
>>>>>>
>>>>>>> Are you suggesting this process to be easier through commands,
>>>>>>> rather than
>>>>>>> for administrators to figure out how to place the data?
>>>>>>>
>>>>>>> [1] http://lists.gluster.org/pipermail/gluster-users/2016-July/0
>>>>>>> 27431.html
>>>>>>>
>>>>>> Admin should always have the ability to choose where to place data,
>>>>>> but something
>>>>>> easier should be added, like in any other SDS.
>>>>>>
>>>>>> Something like:
>>>>>>
>>>>>> gluster volume add-brick gv0 new_brick
>>>>>>
>>>>>> if gv0 is a replicated volume, the add-brick should automatically add
>>>>>> the new brick and rebalance data automatically, still keeping the
>>>>>> required redundancy level
>>>>>>
>>>>>> In case admin would like to set a custom placement for data, it should
>>>>>> specify a "force" argument or something similiar.
>>>>>>
>>>>>> tl;dr: as default, gluster should preserve data redundancy allowing
>>>>>> users to add single bricks without having to think how to place data.
>>>>>> This will make gluster way easier to manage and much less error prone,
>>>>>> thus increasing the resiliency of the whole gluster.
>>>>>> after all , if you have a replicated volume, is obvious that you want
>>>>>> your data to be replicated and gluster should manage this on it's own.
>>>>>>
>>>>>> Is this something are you planning or considering for further
>>>>>> implementation?
>>>>>> I know that lack of metadata server (this is a HUGE advantage for
>>>>>> gluster) means less flexibility, but as there is a manual workaround
>>>>>> for adding
>>>>>> single bricks, gluster should be able to handle this automatically.
>>>>>>
>>>>> ___
>>>>> Gluster-users mailing list
>>>>> Gluster-users@gluster.org
>>>>> http://lists.gluster.org/mailman/listinfo/gluster-users
>>>>>
>>>>
>>>> ___
>>>> Gluster-users mailing list
>>>> Gluster-users@gluster.org
>>>> http://lists.gluster.org/mailman/listinfo/gluster-users
>>>>
>>>
>> --
>> Sent from my Android device with K-9 Mail. Please excuse my brevity.
>>
>
>
___
Gluster-users mailing list
Gluster-users@gluster.org
http://lists.gluster.org/mailman/listinfo/gluster-users

Re: [Gluster-users] Add single server

2017-04-29 Thread Gandalf Corvotempesta
I repeat: I've just proposed a feature.
I'm not a C developer and I don't know gluster internals, so I can't
provide details.

I've just asked whether simplifying the add-brick process is something that
developers are interested in adding.

Il 29 apr 2017 9:34 PM, "Joe Julian"  ha scritto:

> What I said publicly in another email ... but not to call out my
> perception of your behavior publicly if also like to say:
>
> Acting adversarial doesn't make anybody want to help, especially not me
> and I'm the user community's biggest proponent.
>
> On April 29, 2017 11:08:45 AM PDT, Gandalf Corvotempesta <
> gandalf.corvotempe...@gmail.com> wrote:
>>
>> Mine was a suggestion
>> Fell free to ignore was gluster users has to say and still keep going
>> though your way
>>
>> Usually, open source project tends to follow users suggestions
>>
>> Il 29 apr 2017 5:32 PM, "Joe Julian"  ha scritto:
>>
>>> Since this is an open source community project, not a company product,
>>> feature requests like these are welcome, but would be more welcome with
>>> either code or at least a well described method. Broad asks like these are
>>> of little value, imho.
>>>
>>>
>>> On 04/29/2017 07:12 AM, Gandalf Corvotempesta wrote:
>>>
>>>> Anyway, the proposed workaround:
>>>> https://joejulian.name/blog/how-to-expand-glusterfs-replicat
>>>> ed-clusters-by-one-server/
>>>> won't work with just a single volume made up of 2 replicated bricks.
>>>> If I have a replica 2 volume with server1:brick1 and server2:brick1,
>>>> how can I add server3:brick1 ?
>>>> I don't have any bricks to "replace"
>>>>
>>>> This is something i would like to see implemented in gluster.
>>>>
>>>> 2017-04-29 16:08 GMT+02:00 Gandalf Corvotempesta
>>>> :
>>>>
>>>>> 2017-04-24 10:21 GMT+02:00 Pranith Kumar Karampuri <
>>>>> pkara...@redhat.com>:
>>>>>
>>>>>> Are you suggesting this process to be easier through commands, rather
>>>>>> than
>>>>>> for administrators to figure out how to place the data?
>>>>>>
>>>>>> [1] http://lists.gluster.org/pipermail/gluster-users/2016-July/0
>>>>>> 27431.html
>>>>>>
>>>>> Admin should always have the ability to choose where to place data,
>>>>> but something
>>>>> easier should be added, like in any other SDS.
>>>>>
>>>>> Something like:
>>>>>
>>>>> gluster volume add-brick gv0 new_brick
>>>>>
>>>>> if gv0 is a replicated volume, the add-brick should automatically add
>>>>> the new brick and rebalance data automatically, still keeping the
>>>>> required redundancy level
>>>>>
>>>>> In case admin would like to set a custom placement for data, it should
>>>>> specify a "force" argument or something similiar.
>>>>>
>>>>> tl;dr: as default, gluster should preserve data redundancy allowing
>>>>> users to add single bricks without having to think how to place data.
>>>>> This will make gluster way easier to manage and much less error prone,
>>>>> thus increasing the resiliency of the whole gluster.
>>>>> after all , if you have a replicated volume, is obvious that you want
>>>>> your data to be replicated and gluster should manage this on it's own.
>>>>>
>>>>> Is this something are you planning or considering for further
>>>>> implementation?
>>>>> I know that lack of metadata server (this is a HUGE advantage for
>>>>> gluster) means less flexibility, but as there is a manual workaround
>>>>> for adding
>>>>> single bricks, gluster should be able to handle this automatically.
>>>>>
>>>> ___
>>>> Gluster-users mailing list
>>>> Gluster-users@gluster.org
>>>> http://lists.gluster.org/mailman/listinfo/gluster-users
>>>>
>>>
>>> ___
>>> Gluster-users mailing list
>>> Gluster-users@gluster.org
>>> http://lists.gluster.org/mailman/listinfo/gluster-users
>>>
>>
> --
> Sent from my Android device with K-9 Mail. Please excuse my brevity.
>
___
Gluster-users mailing list
Gluster-users@gluster.org
http://lists.gluster.org/mailman/listinfo/gluster-users

Re: [Gluster-users] Add single server

2017-04-29 Thread Gandalf Corvotempesta
Mine was a suggestion.
Feel free to ignore what gluster users have to say and keep going
your own way.

Usually, open source projects tend to follow user suggestions.

Il 29 apr 2017 5:32 PM, "Joe Julian"  ha scritto:

> Since this is an open source community project, not a company product,
> feature requests like these are welcome, but would be more welcome with
> either code or at least a well described method. Broad asks like these are
> of little value, imho.
>
>
> On 04/29/2017 07:12 AM, Gandalf Corvotempesta wrote:
>
>> Anyway, the proposed workaround:
>> https://joejulian.name/blog/how-to-expand-glusterfs-replicat
>> ed-clusters-by-one-server/
>> won't work with just a single volume made up of 2 replicated bricks.
>> If I have a replica 2 volume with server1:brick1 and server2:brick1,
>> how can I add server3:brick1 ?
>> I don't have any bricks to "replace"
>>
>> This is something i would like to see implemented in gluster.
>>
>> 2017-04-29 16:08 GMT+02:00 Gandalf Corvotempesta
>> :
>>
>>> 2017-04-24 10:21 GMT+02:00 Pranith Kumar Karampuri >> >:
>>>
>>>> Are you suggesting this process to be easier through commands, rather
>>>> than
>>>> for administrators to figure out how to place the data?
>>>>
>>>> [1] http://lists.gluster.org/pipermail/gluster-users/2016-July/
>>>> 027431.html
>>>>
>>> Admin should always have the ability to choose where to place data,
>>> but something
>>> easier should be added, like in any other SDS.
>>>
>>> Something like:
>>>
>>> gluster volume add-brick gv0 new_brick
>>>
>>> if gv0 is a replicated volume, the add-brick should automatically add
>>> the new brick and rebalance data automatically, still keeping the
>>> required redundancy level
>>>
>>> In case admin would like to set a custom placement for data, it should
>>> specify a "force" argument or something similiar.
>>>
>>> tl;dr: as default, gluster should preserve data redundancy allowing
>>> users to add single bricks without having to think how to place data.
>>> This will make gluster way easier to manage and much less error prone,
>>> thus increasing the resiliency of the whole gluster.
>>> after all , if you have a replicated volume, is obvious that you want
>>> your data to be replicated and gluster should manage this on it's own.
>>>
>>> Is this something are you planning or considering for further
>>> implementation?
>>> I know that lack of metadata server (this is a HUGE advantage for
>>> gluster) means less flexibility, but as there is a manual workaround
>>> for adding
>>> single bricks, gluster should be able to handle this automatically.
>>>
>> ___
>> Gluster-users mailing list
>> Gluster-users@gluster.org
>> http://lists.gluster.org/mailman/listinfo/gluster-users
>>
>
> ___
> Gluster-users mailing list
> Gluster-users@gluster.org
> http://lists.gluster.org/mailman/listinfo/gluster-users
>
___
Gluster-users mailing list
Gluster-users@gluster.org
http://lists.gluster.org/mailman/listinfo/gluster-users

[Gluster-users] Testing gluster

2017-04-29 Thread Gandalf Corvotempesta
I would like to heavily test a small gluster installation.
Has anyone done this before?

I think that running bonnie++ for 2 or more days while trying to remove
nodes/bricks
would be enough to test everything, but how can I ensure that, after
some days, all
files stored are exactly as bonnie++ created them?

Probably rsync would be better? I can try to sync a directory with
millions of files
and, while the sync is running, try to cause some damage (power
off, unplug, etc.).
After all, re-running rsync should not transfer any files; they should
already be present.

Right? If rsync re-syncs files, it means that gluster has caused some data
loss or data corruption.
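
A minimal sketch of that verification (paths are only placeholders): "-c"
forces a checksum comparison and "-n" makes it a report-only dry run, so any
file listed by the second pass was lost or corrupted.

rsync -a /source/data/ /mnt/gluster/data/      # initial sync
# ...power off nodes, pull bricks, etc., then let the volume heal...
rsync -avcn /source/data/ /mnt/gluster/data/   # should list no files at all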
___
Gluster-users mailing list
Gluster-users@gluster.org
http://lists.gluster.org/mailman/listinfo/gluster-users


Re: [Gluster-users] Add single server

2017-04-29 Thread Gandalf Corvotempesta
Anyway, the proposed workaround:
https://joejulian.name/blog/how-to-expand-glusterfs-replicated-clusters-by-one-server/
won't work with just a single volume made up of 2 replicated bricks.
If I have a replica 2 volume with server1:brick1 and server2:brick1,
how can I add server3:brick1 ?
I don't have any bricks to "replace"

This is something i would like to see implemented in gluster.

2017-04-29 16:08 GMT+02:00 Gandalf Corvotempesta
:
> 2017-04-24 10:21 GMT+02:00 Pranith Kumar Karampuri :
>> Are you suggesting this process to be easier through commands, rather than
>> for administrators to figure out how to place the data?
>>
>> [1] http://lists.gluster.org/pipermail/gluster-users/2016-July/027431.html
>
> Admin should always have the ability to choose where to place data,
> but something
> easier should be added, like in any other SDS.
>
> Something like:
>
> gluster volume add-brick gv0 new_brick
>
> if gv0 is a replicated volume, the add-brick should automatically add
> the new brick and rebalance data automatically, still keeping the
> required redundancy level
>
> In case admin would like to set a custom placement for data, it should
> specify a "force" argument or something similiar.
>
> tl;dr: as default, gluster should preserve data redundancy allowing
> users to add single bricks without having to think how to place data.
> This will make gluster way easier to manage and much less error prone,
> thus increasing the resiliency of the whole gluster.
> after all , if you have a replicated volume, is obvious that you want
> your data to be replicated and gluster should manage this on it's own.
>
> Is this something are you planning or considering for further implementation?
> I know that lack of metadata server (this is a HUGE advantage for
> gluster) means less flexibility, but as there is a manual workaround
> for adding
> single bricks, gluster should be able to handle this automatically.
___
Gluster-users mailing list
Gluster-users@gluster.org
http://lists.gluster.org/mailman/listinfo/gluster-users


Re: [Gluster-users] Add single server

2017-04-29 Thread Gandalf Corvotempesta
2017-04-24 10:21 GMT+02:00 Pranith Kumar Karampuri :
> Are you suggesting this process to be easier through commands, rather than
> for administrators to figure out how to place the data?
>
> [1] http://lists.gluster.org/pipermail/gluster-users/2016-July/027431.html

Admins should always have the ability to choose where to place data,
but something
easier should be added, like in any other SDS.

Something like:

gluster volume add-brick gv0 new_brick

If gv0 is a replicated volume, the add-brick should automatically add
the new brick and rebalance data automatically, still keeping the
required redundancy level.

In case the admin would like to set a custom placement for the data, they
should specify a "force" argument or something similar.

tl;dr: by default, gluster should preserve data redundancy, allowing
users to add single bricks without having to think about how to place data.
This would make gluster way easier to manage and much less error-prone,
thus increasing the resiliency of the whole cluster.
After all, if you have a replicated volume, it is obvious that you want
your data to be replicated, and gluster should manage this on its own.

Is this something you are planning or considering for further implementation?
I know that the lack of a metadata server (this is a HUGE advantage for
gluster) means less flexibility, but as there is a manual workaround
for adding
single bricks, gluster should be able to handle this automatically.
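
For contrast, a sketch of what is required today (volume and brick names are
placeholders): on a replica 3 volume, add-brick only accepts bricks in
multiples of the replica count, so growing capacity looks like

gluster volume add-brick gv0 server4:/bricks/b1 server5:/bricks/b1 server6:/bricks/b1
gluster volume rebalance gv0 start

while the proposal above is that a plain "gluster volume add-brick gv0
new_brick" would handle placement and rebalancing on its own.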
___
Gluster-users mailing list
Gluster-users@gluster.org
http://lists.gluster.org/mailman/listinfo/gluster-users


Re: [Gluster-users] Gluster 3.8.10 rebalance VMs corruption

2017-04-27 Thread Gandalf Corvotempesta
2017-04-27 14:03 GMT+02:00 Pranith Kumar Karampuri :
> The bugs are not in sharding. Sharding + VM workload is exposing bugs are in
> DHT/rebalance. These bugs existed for years. They are coming to the fore
> only now. It proves to be very difficult to recreate these bugs in other
> environments. We are very transparent about this fact. Until these issues
> are fixed please don't do rebalance in shard+VM environments.

I appreciate this, but it's so critical that you should warn users in the
official docs, not only on the mailing list.
Not all users read the gluster ML; if someone tries to use sharding
(as described in the docs) and then rebalances,
they will lose everything.

Anyway, is there any plan to fix all of these in the upcoming release?
___
Gluster-users mailing list
Gluster-users@gluster.org
http://lists.gluster.org/mailman/listinfo/gluster-users


Re: [Gluster-users] Gluster 3.8.10 rebalance VMs corruption

2017-04-27 Thread Gandalf Corvotempesta
2017-04-27 13:31 GMT+02:00 Pranith Kumar Karampuri :
> But even after that fix, it is still leading to pause. And these are the two
> updates on what the developers are doing as per my understanding. So that
> workflow is not stable yet IMO.

So, even after that fix, two more critical bugs leading to
data loss/corruption were found?

I'm sorry, but this is pure madness; please put a "beta" label on
sharding, or on gluster itself. Four bugs in a row causing data loss in
software engineered to keep data safe are unacceptable.
Yes, all software has bugs, but not with this frequency, for the same
reason, and with the same result: data loss.

I'm really, really, really worried about putting my company's valuable data in
a gluster storage. If I lose hundreds of VMs at once, I'm really
fucked.
Yes, backups exist, but try telling your customers that you have to
restore 20TB from backups (it takes days) and that they lost many
e-commerce orders/transactions.
___
Gluster-users mailing list
Gluster-users@gluster.org
http://lists.gluster.org/mailman/listinfo/gluster-users


Re: [Gluster-users] Gluster 3.8.10 rebalance VMs corruption

2017-04-27 Thread Gandalf Corvotempesta
2017-04-27 13:21 GMT+02:00 Serkan Çoban :
> I think this is he fix Gandalf asking for:
> https://github.com/gluster/glusterfs/commit/6e3054b42f9aef1e35b493fbb002ec47e1ba27ce

Yes, i'm talking about this.
___
Gluster-users mailing list
Gluster-users@gluster.org
http://lists.gluster.org/mailman/listinfo/gluster-users

Re: [Gluster-users] Gluster 3.8.10 rebalance VMs corruption

2017-04-27 Thread Gandalf Corvotempesta
I think we are talking about a different bug.

Il 27 apr 2017 12:58 PM, "Pranith Kumar Karampuri"  ha
scritto:

> I am not a DHT developer, so some of what I say could be a little wrong.
> But this is what I gather.
> I think they found 2 classes of bugs in dht
> 1) Graceful fop failover when rebalance is in progress is missing for some
> fops, that lead to VM pause.
>
> I see that https://review.gluster.org/17085 got merged on 24th on master
> for this. I see patches are posted for 3.8.x for this one.
>
> 2) I think there is some work needs to be done for dht_[f]xattrop. I
> believe this is the next step that is underway.
>
>
> On Thu, Apr 27, 2017 at 12:13 PM, Gandalf Corvotempesta <
> gandalf.corvotempe...@gmail.com> wrote:
>
>> Updates on this critical bug ?
>>
>> Il 18 apr 2017 8:24 PM, "Gandalf Corvotempesta" <
>> gandalf.corvotempe...@gmail.com> ha scritto:
>>
>>> Any update ?
>>> In addition, if this is a different bug but the "workflow" is the same
>>> as the previous one, how is possible that fixing the previous bug
>>> triggered this new one ?
>>>
>>> Is possible to have some details ?
>>>
>>> 2017-04-04 16:11 GMT+02:00 Krutika Dhananjay :
>>> > Nope. This is a different bug.
>>> >
>>> > -Krutika
>>> >
>>> > On Mon, Apr 3, 2017 at 5:03 PM, Gandalf Corvotempesta
>>> >  wrote:
>>> >>
>>> >> This is a good news
>>> >> Is this related to the previously fixed bug?
>>> >>
>>> >> Il 3 apr 2017 10:22 AM, "Krutika Dhananjay"  ha
>>> >> scritto:
>>> >>>
>>> >>> So Raghavendra has an RCA for this issue.
>>> >>>
>>> >>> Copy-pasting his comment here:
>>> >>>
>>> >>> 
>>> >>>
>>> >>> Following is a rough algorithm of shard_writev:
>>> >>>
>>> >>> 1. Based on the offset, calculate the shards touched by current
>>> write.
>>> >>> 2. Look for inodes corresponding to these shard files in itable.
>>> >>> 3. If one or more inodes are missing from itable, issue mknod for
>>> >>> corresponding shard files and ignore EEXIST in cbk.
>>> >>> 4. resume writes on respective shards.
>>> >>>
>>> >>> Now, imagine a write which falls to an existing "shard_file". For the
>>> >>> sake of discussion lets consider a distribute of three subvols - s1,
>>> s2, s3
>>> >>>
>>> >>> 1. "shard_file" hashes to subvolume s2 and is present on s2
>>> >>> 2. add a subvolume s4 and initiate a fix layout. The layout of
>>> ".shard"
>>> >>> is fixed to include s4 and hash ranges are changed.
>>> >>> 3. write that touches "shard_file" is issued.
>>> >>> 4. The inode for "shard_file" is not present in itable after a graph
>>> >>> switch and features/shard issues an mknod.
>>> >>> 5. With new layout of .shard, lets say "shard_file" hashes to s3 and
>>> >>> mknod (shard_file) on s3 succeeds. But, the shard_file is already
>>> present on
>>> >>> s2.
>>> >>>
>>> >>> So, we have two files on two different subvols of dht representing
>>> same
>>> >>> shard and this will lead to corruption.
>>> >>>
>>> >>> 
>>> >>>
>>> >>> Raghavendra will be sending out a patch in DHT to fix this issue.
>>> >>>
>>> >>> -Krutika
>>> >>>
>>> >>>
>>> >>> On Tue, Mar 28, 2017 at 11:49 PM, Pranith Kumar Karampuri
>>> >>>  wrote:
>>> >>>>
>>> >>>>
>>> >>>>
>>> >>>> On Mon, Mar 27, 2017 at 11:29 PM, Mahdi Adnan <
>>> mahdi.ad...@outlook.com>
>>> >>>> wrote:
>>> >>>>>
>>> >>>>> Hi,
>>> >>>>>
>>> >>>>>
>>> >>>>> Do you guys have any update regarding this issue ?
>>> >>>>
>>> >>>> I do not actively work on this issue so I do not have an accurate
>>> >>>> update, but from what I heard from Krutika an

Re: [Gluster-users] Gluster 3.8.10 rebalance VMs corruption

2017-04-26 Thread Gandalf Corvotempesta
Updates on this critical bug ?

Il 18 apr 2017 8:24 PM, "Gandalf Corvotempesta" <
gandalf.corvotempe...@gmail.com> ha scritto:

> Any update ?
> In addition, if this is a different bug but the "workflow" is the same
> as the previous one, how is possible that fixing the previous bug
> triggered this new one ?
>
> Is possible to have some details ?
>
> 2017-04-04 16:11 GMT+02:00 Krutika Dhananjay :
> > Nope. This is a different bug.
> >
> > -Krutika
> >
> > On Mon, Apr 3, 2017 at 5:03 PM, Gandalf Corvotempesta
> >  wrote:
> >>
> >> This is a good news
> >> Is this related to the previously fixed bug?
> >>
> >> Il 3 apr 2017 10:22 AM, "Krutika Dhananjay"  ha
> >> scritto:
> >>>
> >>> So Raghavendra has an RCA for this issue.
> >>>
> >>> Copy-pasting his comment here:
> >>>
> >>> 
> >>>
> >>> Following is a rough algorithm of shard_writev:
> >>>
> >>> 1. Based on the offset, calculate the shards touched by current write.
> >>> 2. Look for inodes corresponding to these shard files in itable.
> >>> 3. If one or more inodes are missing from itable, issue mknod for
> >>> corresponding shard files and ignore EEXIST in cbk.
> >>> 4. resume writes on respective shards.
> >>>
> >>> Now, imagine a write which falls to an existing "shard_file". For the
> >>> sake of discussion lets consider a distribute of three subvols - s1,
> s2, s3
> >>>
> >>> 1. "shard_file" hashes to subvolume s2 and is present on s2
> >>> 2. add a subvolume s4 and initiate a fix layout. The layout of ".shard"
> >>> is fixed to include s4 and hash ranges are changed.
> >>> 3. write that touches "shard_file" is issued.
> >>> 4. The inode for "shard_file" is not present in itable after a graph
> >>> switch and features/shard issues an mknod.
> >>> 5. With new layout of .shard, lets say "shard_file" hashes to s3 and
> >>> mknod (shard_file) on s3 succeeds. But, the shard_file is already
> present on
> >>> s2.
> >>>
> >>> So, we have two files on two different subvols of dht representing same
> >>> shard and this will lead to corruption.
> >>>
> >>> 
> >>>
> >>> Raghavendra will be sending out a patch in DHT to fix this issue.
> >>>
> >>> -Krutika
> >>>
> >>>
> >>> On Tue, Mar 28, 2017 at 11:49 PM, Pranith Kumar Karampuri
> >>>  wrote:
> >>>>
> >>>>
> >>>>
> >>>> On Mon, Mar 27, 2017 at 11:29 PM, Mahdi Adnan <
> mahdi.ad...@outlook.com>
> >>>> wrote:
> >>>>>
> >>>>> Hi,
> >>>>>
> >>>>>
> >>>>> Do you guys have any update regarding this issue ?
> >>>>
> >>>> I do not actively work on this issue so I do not have an accurate
> >>>> update, but from what I heard from Krutika and Raghavendra(works on
> DHT) is:
> >>>> Krutika debugged initially and found that the issue seems more likely
> to be
> >>>> in DHT, Satheesaran who helped us recreate this issue in lab found
> that just
> >>>> fix-layout without rebalance also caused the corruption 1 out of 3
> times.
> >>>> Raghavendra came up with a possible RCA for why this can happen.
> >>>> Raghavendra(CCed) would be the right person to provide accurate
> update.
> >>>>>
> >>>>>
> >>>>>
> >>>>> --
> >>>>>
> >>>>> Respectfully
> >>>>> Mahdi A. Mahdi
> >>>>>
> >>>>> 
> >>>>> From: Krutika Dhananjay 
> >>>>> Sent: Tuesday, March 21, 2017 3:02:55 PM
> >>>>> To: Mahdi Adnan
> >>>>> Cc: Nithya Balachandran; Gowdappa, Raghavendra; Susant Palai;
> >>>>> gluster-users@gluster.org List
> >>>>>
> >>>>> Subject: Re: [Gluster-users] Gluster 3.8.10 rebalance VMs corruption
> >>>>>
> >>>>> Hi,
> >>>>>
> >>>>> So it looks like Satheesaran managed to recreate this issue. We will
> be
> >>>>> seeking his help in debugging this. It will

[Gluster-users] Cluster management

2017-04-25 Thread Gandalf Corvotempesta
Sorry for the stupid subject and for questions that should probably be
placed in a FAQ page, but
let's assume a replica 3 cluster made of 3 servers (1 brick per server):

1) can I add a fourth server, with one brick, increasing the total
available space? If yes, how?

2) can I increase replica count from 3 to 4 ?

3) can I decrease replica count from 3 to 2 ?

4) can I move from a replicated volume to a distributed replicated volume ?
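
For reference, a hedged sketch of the commands typically involved in the
questions above, each as an independent example (volume and brick names are
placeholders; please verify against the documentation for your version
before running anything like this):

# 2) increase the replica count from 3 to 4 (adds a copy, not capacity)
gluster volume add-brick vol0 replica 4 server4:/data/brick1

# 3) decrease the replica count from 3 to 2
gluster volume remove-brick vol0 replica 2 server3:/data/brick1 force

# 1)/4) add a whole new replica set (3 bricks) to gain capacity and turn
#       the volume into a distributed-replicated one
gluster volume add-brick vol0 server4:/data/brick2 server5:/data/brick2 server6:/data/brick2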
___
Gluster-users mailing list
Gluster-users@gluster.org
http://lists.gluster.org/mailman/listinfo/gluster-users


Re: [Gluster-users] Add single server

2017-04-24 Thread Gandalf Corvotempesta
2017-04-24 10:21 GMT+02:00 Pranith Kumar Karampuri :
> At least in case of EC it is with good reason. If you want to change
> volume's configuration from 6+2->7+2 you have to compute the encoding again
> and place different data on the resulting 9 bricks. Which has to be done for
> all files. It is better to just create a new volume with 7+2 and just copy
> the files on to this volume and remove the original files on volume with
> 6+2.

Ok, for EC this makes sense.

> Didn't understand this math. If you want to add 2TB capacity to a volume
> that is 3-way replicated, you essentially need to add 6TB in whatever
> solution you have. At least 6TB with a single server. Which you can do even
> with Gluster.

Obviously, if you add a single 2TB disk in a replica 3, you won't get 2TB usable
space but only 1/3, about 600GB

> I think we had this discussion last July[1] with you that we can simulate
> the same things other storage solutions with metadata do by doing
> replace-bricks and rebalance. If you have a new server with 8 bricks then we
> can add a single server and make sure things are rebalanced with 6+2. Please
> note it is better to use data-bricks that is power of 2 like 4+2/8+2/16+4
> etc than 6+2.

This is an ugly workaround and very error-prone.
Usually I prefer not to mess with my data through multiple manual steps where
any other SDS handles this natively.

Please take a look at LizardFS or MooseFS (or even Ceph). You can add
a single disk and
it will be automatically added and rebalanced without losing
redundancy at any single phase.
If you add a 2TB disk to a replica 3, you'll automatically end up with +600GB.
You can also choose which file must be replicated where and how: if
you need one replica on SSD
and another replica (of the same file) on HDD, this is possible, or
even one replica on a local SSD, one replica
on an HDD in the same datacenter and the third replica on an HDD on the dark
side of the moon.

I don't think this would be possible with gluster's fixed "file
map" (I don't know the exact term), because
the lack of a metadata server doesn't allow you to know where a file is
without assuming that it is located in a
fixed position across the whole cluster.

In gluster, to achieve the same, you have to run multiple commands,
respect the proper order of command-line
arguments and so on. This is very, very, very risky and prone to errors.

This is not a battle between two SDS and I don't want to be pedantic
(I'm just giving some suggestions),
but it's a fact that these SDS are way more flexible than gluster (and
in daily usage, far cheaper).
I'm hoping that newer versions of gluster bring some more flexibility
in brick placement/management.

> Are you suggesting this process to be easier through commands, rather than
> for administrators to figure out how to place the data?

Yes, this for sure. An SDS must always ensure data resiliency, so,
whatever operation you do, data must
always be replicated properly. If you need to perform some dangerous
operation, a "--force" must be used.
___
Gluster-users mailing list
Gluster-users@gluster.org
http://lists.gluster.org/mailman/listinfo/gluster-users


Re: [Gluster-users] Add single server

2017-04-24 Thread Gandalf Corvotempesta
Il 24 apr 2017 9:40 AM, "Ashish Pandey"  ha scritto:


There is  difference between server and bricks which we should understand.
When we say  m+n = 6+2, then we are talking about the bricks.
Total number of bricks are m+n = 8.

Now, these bricks could be anywhere on any server. The only thing is that
the server should be a part of cluster.
You can have all the 8 bricks on one server or on 8 different servers.
So, there is no *restriction* on number of servers when you add bricks.
However, the number of bricks which you want to add should be in multiple
of the
configuration you have.


This is clear, but it doesn't change the result.
As no one is using gluster to replicate data while losing redundancy (it's
nonsense), adding bricks means adding servers.
If our servers are already full, with no more available slots for adding
disks, the only solution is to add 8 more servers (at least 1 brick per
server).



In you case it should be 8, 16, 24

"can I add a single node moving from 6:2 to 7:2 and so on ?"
You can not make 6+2 config volume  to 7+2 volume. You can not change the
*configuration* of an existing volume.
You can just add bricks in multiple to increase the storage capacity.


Yes, and this is the worst thing in gluster: the almost zero flexibility.

The bigger the cluster, the higher the cost to maintain or expand it.

If you start with a 6:2 using commodity hardware, you are screwed: your
next upgrade will be 8 servers with 1 disk/brick each.

Yes, gluster doesn't make use of any metadata server, but I would really
prefer to add 2 metadata servers and 1 storage server at once when needed
than avoid metadata servers but be forced to add a bunch of servers every
time.

More servers mean more power costs, more hardware that could fail, and so
on.

Let's assume a replica 3 cluster.
If I need to add 2TB more, I have to add 3 servers with 2TB on each server.
Ceph, Lizard, Moose and others allow adding a single server/disk and then
they rebalance data around, freeing up used space onto the new disk.

I thought that this lack of flexibility had been addressed in some way in the
latest version...
___
Gluster-users mailing list
Gluster-users@gluster.org
http://lists.gluster.org/mailman/listinfo/gluster-users

[Gluster-users] Add single server

2017-04-22 Thread Gandalf Corvotempesta
I'm still trying to figure out whether adding a single server to an
existing gluster cluster is possible or not, based on EC or standard
replica.

I don't think so, because with replica 3, when each server is already
full (no more slots for disks), I need to add 3 servers at once.

Is this the same even with EC? For example, is a 6:2 configuration
"fixed", or can I add a single node, moving from 6:2 to 7:2 and so on?
___
Gluster-users mailing list
Gluster-users@gluster.org
http://lists.gluster.org/mailman/listinfo/gluster-users


Re: [Gluster-users] Gluster 3.8.10 rebalance VMs corruption

2017-04-18 Thread Gandalf Corvotempesta
Any update?
In addition, if this is a different bug but the "workflow" is the same
as the previous one, how is it possible that fixing the previous bug
triggered this new one?

Is it possible to have some details?

2017-04-04 16:11 GMT+02:00 Krutika Dhananjay :
> Nope. This is a different bug.
>
> -Krutika
>
> On Mon, Apr 3, 2017 at 5:03 PM, Gandalf Corvotempesta
>  wrote:
>>
>> This is a good news
>> Is this related to the previously fixed bug?
>>
>> Il 3 apr 2017 10:22 AM, "Krutika Dhananjay"  ha
>> scritto:
>>>
>>> So Raghavendra has an RCA for this issue.
>>>
>>> Copy-pasting his comment here:
>>>
>>> 
>>>
>>> Following is a rough algorithm of shard_writev:
>>>
>>> 1. Based on the offset, calculate the shards touched by current write.
>>> 2. Look for inodes corresponding to these shard files in itable.
>>> 3. If one or more inodes are missing from itable, issue mknod for
>>> corresponding shard files and ignore EEXIST in cbk.
>>> 4. resume writes on respective shards.
>>>
>>> Now, imagine a write which falls to an existing "shard_file". For the
>>> sake of discussion lets consider a distribute of three subvols - s1, s2, s3
>>>
>>> 1. "shard_file" hashes to subvolume s2 and is present on s2
>>> 2. add a subvolume s4 and initiate a fix layout. The layout of ".shard"
>>> is fixed to include s4 and hash ranges are changed.
>>> 3. write that touches "shard_file" is issued.
>>> 4. The inode for "shard_file" is not present in itable after a graph
>>> switch and features/shard issues an mknod.
>>> 5. With new layout of .shard, lets say "shard_file" hashes to s3 and
>>> mknod (shard_file) on s3 succeeds. But, the shard_file is already present on
>>> s2.
>>>
>>> So, we have two files on two different subvols of dht representing same
>>> shard and this will lead to corruption.
>>>
>>> 
>>>
>>> Raghavendra will be sending out a patch in DHT to fix this issue.
>>>
>>> -Krutika
>>>
>>>
>>> On Tue, Mar 28, 2017 at 11:49 PM, Pranith Kumar Karampuri
>>>  wrote:
>>>>
>>>>
>>>>
>>>> On Mon, Mar 27, 2017 at 11:29 PM, Mahdi Adnan 
>>>> wrote:
>>>>>
>>>>> Hi,
>>>>>
>>>>>
>>>>> Do you guys have any update regarding this issue ?
>>>>
>>>> I do not actively work on this issue so I do not have an accurate
>>>> update, but from what I heard from Krutika and Raghavendra(works on DHT) 
>>>> is:
>>>> Krutika debugged initially and found that the issue seems more likely to be
>>>> in DHT, Satheesaran who helped us recreate this issue in lab found that 
>>>> just
>>>> fix-layout without rebalance also caused the corruption 1 out of 3 times.
>>>> Raghavendra came up with a possible RCA for why this can happen.
>>>> Raghavendra(CCed) would be the right person to provide accurate update.
>>>>>
>>>>>
>>>>>
>>>>> --
>>>>>
>>>>> Respectfully
>>>>> Mahdi A. Mahdi
>>>>>
>>>>> 
>>>>> From: Krutika Dhananjay 
>>>>> Sent: Tuesday, March 21, 2017 3:02:55 PM
>>>>> To: Mahdi Adnan
>>>>> Cc: Nithya Balachandran; Gowdappa, Raghavendra; Susant Palai;
>>>>> gluster-users@gluster.org List
>>>>>
>>>>> Subject: Re: [Gluster-users] Gluster 3.8.10 rebalance VMs corruption
>>>>>
>>>>> Hi,
>>>>>
>>>>> So it looks like Satheesaran managed to recreate this issue. We will be
>>>>> seeking his help in debugging this. It will be easier that way.
>>>>>
>>>>> -Krutika
>>>>>
>>>>> On Tue, Mar 21, 2017 at 1:35 PM, Mahdi Adnan 
>>>>> wrote:
>>>>>>
>>>>>> Hello and thank you for your email.
>>>>>> Actually no, i didn't check the gfid of the vms.
>>>>>> If this will help, i can setup a new test cluster and get all the data
>>>>>> you need.
>>>>>>
>>>>>> Get Outlook for Android
>>>>>>
>>>>>>
>>>>>> From: Nithya Balachandran
>>>>>

Re: [Gluster-users] How to Speed UP heal process in Glusterfs 3.10.1

2017-04-18 Thread Gandalf Corvotempesta
2017-04-18 9:36 GMT+02:00 Serkan Çoban :
> Nope, healing speed is 10MB/sec/brick, each brick heals with this
> speed, so one brick or one server each will heal in one week...

Is this by design? Is it tunable? 10MB/s per brick is too low for us.
We will use 10Gb Ethernet; healing at 10MB/s per brick would be a bottleneck.
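
In case it helps, a hedged sketch of self-heal tuning options that exist in
recent 3.x releases (check "gluster volume set help" on your version first;
the volume name and values below are only examples, not recommendations):

gluster volume set myvol cluster.shd-max-threads 4
gluster volume set myvol cluster.shd-wait-qlength 2048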
___
Gluster-users mailing list
Gluster-users@gluster.org
http://lists.gluster.org/mailman/listinfo/gluster-users

Re: [Gluster-users] How to Speed UP heal process in Glusterfs 3.10.1

2017-04-18 Thread Gandalf Corvotempesta
2017-04-18 9:17 GMT+02:00 Serkan Çoban :
> In my case I see 6TB data was healed within 7-8 days with above command 
> running.

But is this normal? Gluster needs about 7-8 days to heal 6TB?
In case of a server failure, you need some weeks to heal?
___
Gluster-users mailing list
Gluster-users@gluster.org
http://lists.gluster.org/mailman/listinfo/gluster-users

[Gluster-users] Rebalance info

2017-04-17 Thread Gandalf Corvotempesta
Let's assume a replica 3 cluster with 3 bricks used at 95%.

If I add 3 more bricks, will a rebalance (in addition to the corruption :-) )
move some shards to the newly added bricks so that the old bricks' usage
goes down from 95% to (maybe) 50%?
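
Leaving aside the corruption issue mentioned above, a minimal way to verify
the effect would be something like this (volume name and brick path are
placeholders):

gluster volume rebalance myvol start
gluster volume rebalance myvol status   # shows files and size moved per node
df -h /bricks/brick1                    # usage on the old bricks should drop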
___
Gluster-users mailing list
Gluster-users@gluster.org
http://lists.gluster.org/mailman/listinfo/gluster-users

[Gluster-users] Rebalance corruption

2017-04-11 Thread Gandalf Corvotempesta
Just a question: is the rebalance bug that corrupts data also present in
RHGS?

If yes, why is there nothing written on the Red Hat site to warn users not to
rebalance a sharded volume?
___
Gluster-users mailing list
Gluster-users@gluster.org
http://lists.gluster.org/mailman/listinfo/gluster-users

Re: [Gluster-users] Multiple bricks on same node

2017-04-04 Thread Gandalf Corvotempesta
You have to specify the bricks in the correct order, so that bricks forming
the same replica set end up on different hosts (see the sketch below for the
setup you describe).

For example:

host1:brick1 host2:brick2 host3:brick3 host1:brick4 host2:brick5
host3:brick6

What you did forms a replica set with all bricks on the same host,
so a host failure will bring your cluster down.
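
A sketch of a correct ordering for the setup described in the quoted message
below (replica 2, 4 hosts, 3 bricks each; host names and paths follow that
message), so that every consecutive pair of bricks spans two different hosts:

gluster volume create Data replica 2 transport tcp \
  host1:/brick1/data host2:/brick1/data \
  host3:/brick1/data host4:/brick1/data \
  host1:/brick2/data host2:/brick2/data \
  host3:/brick2/data host4:/brick2/data \
  host1:/brick3/data host2:/brick3/data \
  host3:/brick3/data host4:/brick3/data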

Il 4 apr 2017 8:27 PM, "Valerio Luccio"  ha scritto:

> I apologize if this has been answered before, I haven't found a satisfying
> explanation.
> I've just set up a new gluster. We have 4 servers and each has 3 RAIDS
> with separate raid controller. I wanted to create one giant data space and
> tried:
>
> $ gluster volume create Data replica 2 transport tcp host1:/brick1/data
> host1:/brick2/data host1:/brick3/data [...] host4:/brick3/data
>
> This gave me the error message:
>
> volume create: MRIData: failed: Multiple bricks of a replicate volume are
> present on the same server. This setup is not optimal. Use 'force' at the
> end of the command if you want to override this behavior.
>
> The 'force' option allowed me to create a volume and everything seems to
> work. The question is, whys is this not optimal ? What are the potential
> pitfalls ? I don't want to find myself with an unusable data space.
> Also, does it make a difference the order in which I specify the bricks ?
>
> Thanks,
>
> --
> Valerio Luccio (212) 998-8736
> Center for Brain Imaging   4 Washington Place, Room 158
> New York UniversityNew York, NY 10003
>
> "In an open world, who needs windows or gates ?"
>
>
> ___
> Gluster-users mailing list
> Gluster-users@gluster.org
> http://lists.gluster.org/mailman/listinfo/gluster-users
>
___
Gluster-users mailing list
Gluster-users@gluster.org
http://lists.gluster.org/mailman/listinfo/gluster-users

Re: [Gluster-users] Gluster 3.8.10 rebalance VMs corruption

2017-04-03 Thread Gandalf Corvotempesta
This is good news.
Is this related to the previously fixed bug?

Il 3 apr 2017 10:22 AM, "Krutika Dhananjay"  ha
scritto:

> So Raghavendra has an RCA for this issue.
>
> Copy-pasting his comment here:
>
> 
>
> Following is a rough algorithm of shard_writev:
>
> 1. Based on the offset, calculate the shards touched by current write.
> 2. Look for inodes corresponding to these shard files in itable.
> 3. If one or more inodes are missing from itable, issue mknod for 
> corresponding shard files and ignore EEXIST in cbk.
> 4. resume writes on respective shards.
>
> Now, imagine a write which falls to an existing "shard_file". For the sake of 
> discussion lets consider a distribute of three subvols - s1, s2, s3
>
> 1. "shard_file" hashes to subvolume s2 and is present on s2
> 2. add a subvolume s4 and initiate a fix layout. The layout of ".shard" is 
> fixed to include s4 and hash ranges are changed.
> 3. write that touches "shard_file" is issued.
> 4. The inode for "shard_file" is not present in itable after a graph switch 
> and features/shard issues an mknod.
> 5. With new layout of .shard, lets say "shard_file" hashes to s3 and mknod 
> (shard_file) on s3 succeeds. But, the shard_file is already present on s2.
>
> So, we have two files on two different subvols of dht representing same shard 
> and this will lead to corruption.
>
> 
>
> Raghavendra will be sending out a patch in DHT to fix this issue.
>
> -Krutika
>
>
> On Tue, Mar 28, 2017 at 11:49 PM, Pranith Kumar Karampuri <
> pkara...@redhat.com> wrote:
>
>>
>>
>> On Mon, Mar 27, 2017 at 11:29 PM, Mahdi Adnan 
>> wrote:
>>
>>> Hi,
>>>
>>>
>>> Do you guys have any update regarding this issue ?
>>>
>> I do not actively work on this issue so I do not have an accurate update,
>> but from what I heard from Krutika and Raghavendra(works on DHT) is:
>> Krutika debugged initially and found that the issue seems more likely to be
>> in DHT, Satheesaran who helped us recreate this issue in lab found that
>> just fix-layout without rebalance also caused the corruption 1 out of 3
>> times. Raghavendra came up with a possible RCA for why this can happen.
>> Raghavendra(CCed) would be the right person to provide accurate update.
>>
>>>
>>>
>>> --
>>>
>>> Respectfully
>>> *Mahdi A. Mahdi*
>>>
>>> --
>>> *From:* Krutika Dhananjay 
>>> *Sent:* Tuesday, March 21, 2017 3:02:55 PM
>>> *To:* Mahdi Adnan
>>> *Cc:* Nithya Balachandran; Gowdappa, Raghavendra; Susant Palai;
>>> gluster-users@gluster.org List
>>>
>>> *Subject:* Re: [Gluster-users] Gluster 3.8.10 rebalance VMs corruption
>>>
>>> Hi,
>>>
>>> So it looks like Satheesaran managed to recreate this issue. We will be
>>> seeking his help in debugging this. It will be easier that way.
>>>
>>> -Krutika
>>>
>>> On Tue, Mar 21, 2017 at 1:35 PM, Mahdi Adnan 
>>> wrote:
>>>
 Hello and thank you for your email.
 Actually no, i didn't check the gfid of the vms.
 If this will help, i can setup a new test cluster and get all the data
 you need.

 Get Outlook for Android 

 From: Nithya Balachandran
 Sent: Monday, March 20, 20:57
 Subject: Re: [Gluster-users] Gluster 3.8.10 rebalance VMs corruption
 To: Krutika Dhananjay
 Cc: Mahdi Adnan, Gowdappa, Raghavendra, Susant Palai,
 gluster-users@gluster.org List

 Hi,

 Do you know the GFIDs of the VM images which were corrupted?

 Regards,

 Nithya

 On 20 March 2017 at 20:37, Krutika Dhananjay 
 wrote:

 I looked at the logs.

 From the time the new graph (since the add-brick command you shared
 where bricks 41 through 44 are added) is switched to (line 3011 onwards in
 nfs-gfapi.log), I see the following kinds of errors:

 1. Lookups to a bunch of files failed with ENOENT on both replicas
 which protocol/client converts to ESTALE. I am guessing these entries got
 migrated to

 other subvolumes leading to 'No such file or directory' errors.

 DHT and thereafter shard get the same error code and log the following:

  0 [2017-03-17 14:04:26.353444] E [MSGID: 109040]
 [dht-helper.c:1198:dht_migration_complete_check_task] 17-vmware2-dht:
 : failed to lookup the
 file on vmware2-dht [Stale file handle]


   1 [2017-03-17 14:04:26.353528] E [MSGID: 133014]
 [shard.c:1253:shard_common_stat_cbk] 17-vmware2-shard: stat failed:
 a68ce411-e381-46a3-93cd-d2af6a7c3532 [Stale file handle]

 which is fine.

 2. The other kind are from AFR logging of possible split-brain which I
 suppose are harmless too.
 [2017-03-17 14:23:36.968883] W [MSGID: 108008]
 [afr-read-txn.c:228:afr_read_txn] 17-vmware2-replicate-13: Unreadable
 subvolume -1 found with event generation 2 for gfid
 74d49288-8452-40d4-893e-ff4672557ff9. (Possible split-brain)

 Since you are saying the bug is hit only on VMs that are undergoing IO
 

Re: [Gluster-users] Node count constraints with EC?

2017-03-30 Thread Gandalf Corvotempesta
How can I ensure that each parity brick is stored on a different server?
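Is it just a matter of listing one brick per node in the create command,
something like this (volume and host names are placeholders)?

$ gluster volume create ecvol disperse 6 redundancy 2 \
    n1:/bricks/ec n2:/bricks/ec n3:/bricks/ec \
    n4:/bricks/ec n5:/bricks/ec n6:/bricks/ec

With only one brick per node, every fragment of a 4+2 set would end up on a
different server, if I understand correctly.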

Il 30 mar 2017 6:50 AM, "Ashish Pandey"  ha scritto:

> Hi Terry,
>
> There is no constraint on the number of nodes for erasure coded volumes.
> However, there are some suggestions to keep in mind.
>
> If you have a 4+2 configuration, that means you can lose at most 2 bricks
> at a time without losing your volume for IO.
> These bricks may fail because of node crash or node disconnection. That is
> why it is always good to have all the 6 bricks on 6 different nodes. If you
> have 3 bricks on one node and this node goes down then you
> will lose the volume and it will be inaccessible.
> So just keep in mind that you should not lose more bricks than the redundancy
> count even if any one node goes down.
>
> 
> Ashish
>
>
> --
> *From: *"Terry McGuire" 
> *To: *gluster-users@gluster.org
> *Sent: *Wednesday, March 29, 2017 11:59:32 PM
> *Subject: *[Gluster-users] Node count constraints with EC?
>
> Hello list.  Newbie question:  I’m building a low-performance/low-cost
> storage service with a starting size of about 500TB, and want to use
> Gluster with erasure coding.  I’m considering subvolumes of maybe 4+2, or
> 8+3 or 4.  I was thinking I’d spread these over 4 nodes, and add single
> nodes over time, with subvolumes rearranged over new nodes to maintain
> protection from whole node failures.
>
> However, reading through some RedHat-provided documentation, they seem to
> suggest that node counts should be a multiple of 3, 6 or 12, depending on
> subvolume config.  Is this actually a requirement, or is it only a
> suggestion for best performance or something?
>
> Can anyone comment on node count constraints with erasure coded subvolumes?
>
> Thanks in advance for anyone’s reply,
> Terry
>
> _
> Terry McGuire
> Information Services and Technology (IST)
> University of Alberta
> Edmonton, Alberta, Canada  T6G 2H1
> Phone:  780-492-9422
>
>
> ___
> Gluster-users mailing list
> Gluster-users@gluster.org
> http://lists.gluster.org/mailman/listinfo/gluster-users
>
>
> ___
> Gluster-users mailing list
> Gluster-users@gluster.org
> http://lists.gluster.org/mailman/listinfo/gluster-users
>
___
Gluster-users mailing list
Gluster-users@gluster.org
http://lists.gluster.org/mailman/listinfo/gluster-users

Re: [Gluster-users] Gluster 3.8.10 rebalance VMs corruption

2017-03-29 Thread Gandalf Corvotempesta
Are rebalance and fix-layout needed when adding new bricks?
Is there any workaround for extending a cluster without losing data?

Il 28 mar 2017 8:19 PM, "Pranith Kumar Karampuri"  ha
scritto:

>
>
> On Mon, Mar 27, 2017 at 11:29 PM, Mahdi Adnan 
> wrote:
>
>> Hi,
>>
>>
>> Do you guys have any update regarding this issue ?
>>
> I do not actively work on this issue so I do not have an accurate update,
> but from what I heard from Krutika and Raghavendra(works on DHT) is:
> Krutika debugged initially and found that the issue seems more likely to be
> in DHT, Satheesaran who helped us recreate this issue in lab found that
> just fix-layout without rebalance also caused the corruption 1 out of 3
> times. Raghavendra came up with a possible RCA for why this can happen.
> Raghavendra(CCed) would be the right person to provide accurate update.
>
>>
>>
>> --
>>
>> Respectfully
>> *Mahdi A. Mahdi*
>>
>> --
>> *From:* Krutika Dhananjay 
>> *Sent:* Tuesday, March 21, 2017 3:02:55 PM
>> *To:* Mahdi Adnan
>> *Cc:* Nithya Balachandran; Gowdappa, Raghavendra; Susant Palai;
>> gluster-users@gluster.org List
>>
>> *Subject:* Re: [Gluster-users] Gluster 3.8.10 rebalance VMs corruption
>>
>> Hi,
>>
>> So it looks like Satheesaran managed to recreate this issue. We will be
>> seeking his help in debugging this. It will be easier that way.
>>
>> -Krutika
>>
>> On Tue, Mar 21, 2017 at 1:35 PM, Mahdi Adnan 
>> wrote:
>>
>>> Hello and thank you for your email.
>>> Actually no, i didn't check the gfid of the vms.
>>> If this will help, i can setup a new test cluster and get all the data
>>> you need.
>>>
>>> Get Outlook for Android 
>>>
>>> From: Nithya Balachandran
>>> Sent: Monday, March 20, 20:57
>>> Subject: Re: [Gluster-users] Gluster 3.8.10 rebalance VMs corruption
>>> To: Krutika Dhananjay
>>> Cc: Mahdi Adnan, Gowdappa, Raghavendra, Susant Palai,
>>> gluster-users@gluster.org List
>>>
>>> Hi,
>>>
>>> Do you know the GFIDs of the VM images which were corrupted?
>>>
>>> Regards,
>>>
>>> Nithya
>>>
>>> On 20 March 2017 at 20:37, Krutika Dhananjay 
>>> wrote:
>>>
>>> I looked at the logs.
>>>
>>> From the time the new graph (since the add-brick command you shared
>>> where bricks 41 through 44 are added) is switched to (line 3011 onwards in
>>> nfs-gfapi.log), I see the following kinds of errors:
>>>
>>> 1. Lookups to a bunch of files failed with ENOENT on both replicas which
>>> protocol/client converts to ESTALE. I am guessing these entries got
>>> migrated to
>>>
>>> other subvolumes leading to 'No such file or directory' errors.
>>>
>>> DHT and thereafter shard get the same error code and log the following:
>>>
>>>  0 [2017-03-17 14:04:26.353444] E [MSGID: 109040]
>>> [dht-helper.c:1198:dht_migration_complete_check_task] 17-vmware2-dht:
>>> : failed to lookup the
>>> file on vmware2-dht [Stale file handle]
>>>
>>>
>>>   1 [2017-03-17 14:04:26.353528] E [MSGID: 133014]
>>> [shard.c:1253:shard_common_stat_cbk] 17-vmware2-shard: stat failed:
>>> a68ce411-e381-46a3-93cd-d2af6a7c3532 [Stale file handle]
>>>
>>> which is fine.
>>>
>>> 2. The other kind are from AFR logging of possible split-brain which I
>>> suppose are harmless too.
>>> [2017-03-17 14:23:36.968883] W [MSGID: 108008]
>>> [afr-read-txn.c:228:afr_read_txn] 17-vmware2-replicate-13: Unreadable
>>> subvolume -1 found with event generation 2 for gfid
>>> 74d49288-8452-40d4-893e-ff4672557ff9. (Possible split-brain)
>>>
>>> Since you are saying the bug is hit only on VMs that are undergoing IO
>>> while rebalance is running (as opposed to those that remained powered off),
>>>
>>> rebalance + IO could be causing some issues.
>>>
>>> CC'ing DHT devs
>>>
>>> Raghavendra/Nithya/Susant,
>>>
>>> Could you take a look?
>>>
>>> -Krutika
>>>
>>>
>>> On Sun, Mar 19, 2017 at 4:55 PM, Mahdi Adnan 
>>> wrote:
>>>
>>> Thank you for your email mate.
>>>
>>> Yes, im aware of this but, to save costs i chose replica 2, this cluster
>>> is all flash.
>>>
>>> In version 3.7.x i had issues with ping timeout, if one hosts went down
>>> for few seconds the whole cluster hangs and become unavailable, to avoid
>>> this i adjusted the ping timeout to 5 seconds.
>>>
>>> As for choosing Ganesha over gfapi, VMWare does not support Gluster
>>> (FUSE or gfapi) im stuck with NFS for this volume.
>>>
>>> The other volume is mounted using gfapi in oVirt cluster.
>>>
>>>
>>>
>>> --
>>>
>>> Respectfully
>>> *Mahdi A. Mahdi*
>>>
>>> *From:* Krutika Dhananjay 
>>> *Sent:* Sunday, March 19, 2017 2:01:49 PM
>>>
>>> *To:* Mahdi Adnan
>>> *Cc:* gluster-users@gluster.org
>>> *Subject:* Re: [Gluster-users] Gluster 3.8.10 rebalance VMs corruption
>>>
>>>
>>>
>>> While I'm still going through the logs, just wanted to point out a
>>> couple of things:
>>>
>>> 1. It is recommended that you use 3-way replication (replica count 3)
>>> for VM store use case
>>>
>>> 2. network.ping-timeout at 5 seconds is way too low. Please change it to
>>> 30.
>>>
>>> Is there any specific rea

Re: [Gluster-users] [Gluster-devel] Release 3.10.1: Scheduled for the 30th of March

2017-03-27 Thread Gandalf Corvotempesta
2017-03-27 18:59 GMT+02:00 Shyam :
> 1) Are there any pending *blocker* bugs that need to be tracked for 3.10.1?
> If so mark them against the provided tracker [2] as blockers for the
> release, or at the very least post them as a response to this mail

I think that file corruption on rebalance when sharding is enabled *must*
be considered a blocker for any release.

If you keep releasing new versions with the same bugs around,
you only create confusion for users, who are led to think that bugs
(especially the critical ones)
are fixed.

I've never seen a software project release new versions while ignoring known
critical bugs.
___
Gluster-users mailing list
Gluster-users@gluster.org
http://lists.gluster.org/mailman/listinfo/gluster-users


Re: [Gluster-users] Backups

2017-03-23 Thread Gandalf Corvotempesta
The problem is not how to back up, but how to restore.
How do you restore a whole cluster made of thousands of VMs?

If you move all VMs to a shared storage like Gluster, you have to
consider how to recover everything from a Gluster failure.
If you had a bunch of VMs on each server with local disks, you would only
have to recover the VMs affected by a single server failure,
but moving everything to a shared storage means being prepared for a
disaster where you *must* restore everything, possibly hundreds of TB.

2017-03-23 23:07 GMT+01:00 Gambit15 :
> Don't snapshot the entire gluster volume, keep a rolling routine for
> snapshotting the individual VMs & rsync those.
> As already mentioned, you need to "itemize" the backups - trying to manage
> backups for the whole volume as a single unit is just crazy!
>
> Also, for long term backups, maintaining just the core data of each VM is
> far more manageable.
>
> I settled on oVirt for our platform, and do the following...
>
> A cronjob regularly snapshots & clones each VM, whose image is then rsynced
> to our backup storage;
> The backup server snapshots the VM's image backup volume to maintain
> history/versioning;
> These full images are only maintained for 30 days, for DR purposes;
> A separate routine rsyncs the VM's core data to its own data backup volume,
> which is snapshotted & maintained for 10 years;
>
> This could be made more efficient by using guestfish to extract the core
> data from backup image, instead of basically rsyncing the data across the
> network twice.
>
> That active storage layer uses Gluster on top of XFS & LVM. The backup
> storage layer uses a mirrored storage unit running ZFS on FreeNAS.
> This of course doesn't allow for HA in the case of the entire cloud failing.
> For that we'd use geo-rep & a big fat pipe.
>
> D
>
> On 23 March 2017 at 16:29, Gandalf Corvotempesta
>  wrote:
>>
>> Yes but the biggest issue is how to recover
>> You'll need to recover the whole storage not a single snapshot and this
>> can last for days
>>
>> Il 23 mar 2017 9:24 PM, "Alvin Starr"  ha scritto:
>>>
>>> For volume backups you need something like snapshots.
>>>
>>> If you take a snapshot A of a live volume L that snapshot stays at that
>>> moment in time and you can rsync that to another system or use something
>>> like deltacp.pl to copy it.
>>>
>>> The usual process is to delete the snapshot once it's copied and then
>>> repeat the process again when the next backup is required.
>>>
>>> That process does require rsync/deltacp to read the complete volume on
>>> both systems which can take a long time.
>>>
>>> I was kicking around the idea to try and handle snapshot deltas better.
>>>
>>> The idea is that you could take your initial snapshot A then sync that
>>> snapshot to your backup system.
>>>
>>> At a later point you could take another snapshot B.
>>>
>>> Because snapshots contain the copies of the original data at the time of
>>> the snapshot and unmodified data points to the Live volume it is possible to
>>> tell what blocks of data have changed since the snapshot was taken.
>>>
>>> Now that you have a second snapshot you can in essence perform a diff on
>>> the A and B snapshots to get only the blocks that changed up to the time
>>> that B was taken.
>>>
>>> These blocks could be copied to the backup image and you should have a
>>> clone of the B snapshot.
>>>
>>> You would not have to read the whole volume image but just the changed
>>> blocks dramatically improving the speed of the backup.
>>>
>>> At this point you can delete the A snapshot and promote the B snapshot to
>>> be the A snapshot for the next backup round.
>>>
>>>
>>> On 03/23/2017 03:53 PM, Gandalf Corvotempesta wrote:
>>>
>>> Are backup consistent?
>>> What happens if the header on shard0 is synced referring to some data on
>>> shard450 and when rsync parse shard450 this data is changed by subsequent
>>> writes?
>>>
>>> Header would be backupped  of sync respect the rest of the image
>>>
>>> Il 23 mar 2017 8:48 PM, "Joe Julian"  ha scritto:
>>>>
>>>> The rsync protocol only passes blocks that have actually changed. Raw
>>>> changes fewer bits. You're right, though, that it still has to check the
>>>> entire file for those changes.
>>>>
>>>>
>>>> On 03/23/17 12:47, Gandalf C

Re: [Gluster-users] Backups

2017-03-23 Thread Gandalf Corvotempesta
Yes, but the biggest issue is how to recover.
You'll need to recover the whole storage, not a single snapshot, and this can
take days.

Il 23 mar 2017 9:24 PM, "Alvin Starr"  ha scritto:

> For volume backups you need something like snapshots.
>
> If you take a snapshot A of a live volume L that snapshot stays at that
> moment in time and you can rsync that to another system or use something
> like deltacp.pl to copy it.
>
> The usual process is to delete the snapshot once it's copied and then
> repeat the process again when the next backup is required.
>
> That process does require rsync/deltacp to read the complete volume on
> both systems which can take a long time.
>
> I was kicking around the idea to try and handle snapshot deltas better.
>
> The idea is that you could take your initial snapshot A then sync that
> snapshot to your backup system.
>
> At a later point you could take another snapshot B.
>
> Because snapshots contain the copies of the original data at the time of
> the snapshot and unmodified data points to the Live volume it is possible
> to tell what blocks of data have changed since the snapshot was taken.
>
> Now that you have a second snapshot you can in essence perform a diff on
> the A and B snapshots to get only the blocks that changed up to the time
> that B was taken.
>
> These blocks could be copied to the backup image and you should have a
> clone of the B snapshot.
>
> You would not have to read the whole volume image but just the changed
> blocks dramatically improving the speed of the backup.
>
> At this point you can delete the A snapshot and promote the B snapshot to
> be the A snapshot for the next backup round.
>
> On 03/23/2017 03:53 PM, Gandalf Corvotempesta wrote:
>
> Are backup consistent?
> What happens if the header on shard0 is synced referring to some data on
> shard450 and when rsync parse shard450 this data is changed by subsequent
> writes?
>
> Header would be backupped  of sync respect the rest of the image
>
> Il 23 mar 2017 8:48 PM, "Joe Julian"  ha scritto:
>
>> The rsync protocol only passes blocks that have actually changed. Raw
>> changes fewer bits. You're right, though, that it still has to check the
>> entire file for those changes.
>>
>> On 03/23/17 12:47, Gandalf Corvotempesta wrote:
>>
>> Raw or qcow doesn't change anything about the backup.
>> Georep always have to sync the whole file
>>
>> Additionally, raw images has much less features than qcow
>>
>> Il 23 mar 2017 8:40 PM, "Joe Julian"  ha scritto:
>>
>>> I always use raw images. And yes, sharding would also be good.
>>>
>>> On 03/23/17 12:36, Gandalf Corvotempesta wrote:
>>>
>>> Georep expose to another problem:
>>> When using gluster as storage for VM, the VM file is saved as qcow.
>>> Changes are inside the qcow, thus rsync has to sync the whole file every
>>> time
>>>
>>> A little workaround would be sharding, as rsync has to sync only the
>>> changed shards, but I don't think this is a good solution
>>>
>>> Il 23 mar 2017 8:33 PM, "Joe Julian"  ha scritto:
>>>
>>>> In many cases, a full backup set is just not feasible. Georep to the
>>>> same or different DC may be an option if the bandwidth can keep up with the
>>>> change set. If not, maybe breaking the data up into smaller more manageable
>>>> volumes where you only keep a smaller set of critical data and just back
>>>> that up. Perhaps an object store (swift?) might handle fault tolerance
>>>> distribution better for some workloads.
>>>>
>>>> There's no one right answer.
>>>>
>>>> On 03/23/17 12:23, Gandalf Corvotempesta wrote:
>>>>
>>>> Backing up from inside each VM doesn't solve the problem
>>>> If you have to backup 500VMs you just need more than 1 day and what if
>>>> you have to restore the whole gluster storage?
>>>>
>>>> How many days do you need to restore 1PB?
>>>>
>>>> Probably the only solution should be a georep in the same
>>>> datacenter/rack with a similiar cluster,
>>>> ready to became the master storage.
>>>> In this case you don't need to restore anything as data are already
>>>> there,
>>>> only a little bit back in time but this double the TCO
>>>>
>>>> Il 23 mar 2017 6:39 PM, "Serkan Çoban"  ha
>>>> scritto:
>>>>
>>>>> Assuming a backup window of 12 h

Re: [Gluster-users] Backups

2017-03-23 Thread Gandalf Corvotempesta
Are backups consistent?
What happens if the header on shard0 is synced while referring to some data on
shard450, and by the time rsync parses shard450 that data has been changed by
subsequent writes?

The header would be backed up out of sync with the rest of the image.

Il 23 mar 2017 8:48 PM, "Joe Julian"  ha scritto:

> The rsync protocol only passes blocks that have actually changed. Raw
> changes fewer bits. You're right, though, that it still has to check the
> entire file for those changes.
>
> On 03/23/17 12:47, Gandalf Corvotempesta wrote:
>
> Raw or qcow doesn't change anything about the backup.
> Georep always have to sync the whole file
>
> Additionally, raw images has much less features than qcow
>
> Il 23 mar 2017 8:40 PM, "Joe Julian"  ha scritto:
>
>> I always use raw images. And yes, sharding would also be good.
>>
>> On 03/23/17 12:36, Gandalf Corvotempesta wrote:
>>
>> Georep expose to another problem:
>> When using gluster as storage for VM, the VM file is saved as qcow.
>> Changes are inside the qcow, thus rsync has to sync the whole file every
>> time
>>
>> A little workaround would be sharding, as rsync has to sync only the
>> changed shards, but I don't think this is a good solution
>>
>> Il 23 mar 2017 8:33 PM, "Joe Julian"  ha scritto:
>>
>>> In many cases, a full backup set is just not feasible. Georep to the
>>> same or different DC may be an option if the bandwidth can keep up with the
>>> change set. If not, maybe breaking the data up into smaller more manageable
>>> volumes where you only keep a smaller set of critical data and just back
>>> that up. Perhaps an object store (swift?) might handle fault tolerance
>>> distribution better for some workloads.
>>>
>>> There's no one right answer.
>>>
>>> On 03/23/17 12:23, Gandalf Corvotempesta wrote:
>>>
>>> Backing up from inside each VM doesn't solve the problem
>>> If you have to backup 500VMs you just need more than 1 day and what if
>>> you have to restore the whole gluster storage?
>>>
>>> How many days do you need to restore 1PB?
>>>
>>> Probably the only solution should be a georep in the same
>>> datacenter/rack with a similiar cluster,
>>> ready to became the master storage.
>>> In this case you don't need to restore anything as data are already
>>> there,
>>> only a little bit back in time but this double the TCO
>>>
>>> Il 23 mar 2017 6:39 PM, "Serkan Çoban"  ha
>>> scritto:
>>>
>>>> Assuming a backup window of 12 hours, you need to send data at 25GB/s
>>>> to backup solution.
>>>> Using 10G Ethernet on hosts you need at least 25 host to handle 25GB/s.
>>>> You can create an EC gluster cluster that can handle this rates, or
>>>> you just backup valuable data from inside VMs using open source backup
>>>> tools like borg,attic,restic , etc...
>>>>
>>>> On Thu, Mar 23, 2017 at 7:48 PM, Gandalf Corvotempesta
>>>>  wrote:
>>>> > Let's assume a 1PB storage full of VMs images with each brick over
>>>> ZFS,
>>>> > replica 3, sharding enabled
>>>> >
>>>> > How do you backup/restore that amount of data?
>>>> >
>>>> > Backing up daily is impossible, you'll never finish the backup that
>>>> the
>>>> > following one is starting (in other words, you need more than 24
>>>> hours)
>>>> >
>>>> > Restoring is even worse. You need more than 24 hours with the whole
>>>> cluster
>>>> > down
>>>> >
>>>> > You can't rely on ZFS snapshot due to sharding (the snapshot took
>>>> from one
>>>> > node is useless without all other node related at the same shard) and
>>>> you
>>>> > still have the same restore speed
>>>> >
>>>> > How do you backup this?
>>>> >
>>>> > Even georep isn't enough, if you have to restore the whole storage in
>>>> case
>>>> > of disaster
>>>> >
>>>> > ___
>>>> > Gluster-users mailing list
>>>> > Gluster-users@gluster.org
>>>> > http://lists.gluster.org/mailman/listinfo/gluster-users
>>>>
>>>
>>>
>>> ___
>>> Gluster-users mailing 
>>> listGluster-users@gluster.orghttp://lists.gluster.org/mailman/listinfo/gluster-users
>>>
>>> ___ Gluster-users mailing
>>> list Gluster-users@gluster.org http://lists.gluster.org/mailm
>>> an/listinfo/gluster-users
>>
>>
___
Gluster-users mailing list
Gluster-users@gluster.org
http://lists.gluster.org/mailman/listinfo/gluster-users

Re: [Gluster-users] Backups

2017-03-23 Thread Gandalf Corvotempesta
Raw or qcow doesn't change anything about the backup:
geo-rep always has to sync the whole file.

Additionally, raw images have far fewer features than qcow.

Il 23 mar 2017 8:40 PM, "Joe Julian"  ha scritto:

> I always use raw images. And yes, sharding would also be good.
>
> On 03/23/17 12:36, Gandalf Corvotempesta wrote:
>
> Georep expose to another problem:
> When using gluster as storage for VM, the VM file is saved as qcow.
> Changes are inside the qcow, thus rsync has to sync the whole file every
> time
>
> A little workaround would be sharding, as rsync has to sync only the
> changed shards, but I don't think this is a good solution
>
> Il 23 mar 2017 8:33 PM, "Joe Julian"  ha scritto:
>
>> In many cases, a full backup set is just not feasible. Georep to the same
>> or different DC may be an option if the bandwidth can keep up with the
>> change set. If not, maybe breaking the data up into smaller more manageable
>> volumes where you only keep a smaller set of critical data and just back
>> that up. Perhaps an object store (swift?) might handle fault tolerance
>> distribution better for some workloads.
>>
>> There's no one right answer.
>>
>> On 03/23/17 12:23, Gandalf Corvotempesta wrote:
>>
>> Backing up from inside each VM doesn't solve the problem
>> If you have to backup 500VMs you just need more than 1 day and what if
>> you have to restore the whole gluster storage?
>>
>> How many days do you need to restore 1PB?
>>
>> Probably the only solution should be a georep in the same datacenter/rack
>> with a similiar cluster,
>> ready to became the master storage.
>> In this case you don't need to restore anything as data are already
>> there,
>> only a little bit back in time but this double the TCO
>>
>> Il 23 mar 2017 6:39 PM, "Serkan Çoban"  ha
>> scritto:
>>
>>> Assuming a backup window of 12 hours, you need to send data at 25GB/s
>>> to backup solution.
>>> Using 10G Ethernet on hosts you need at least 25 host to handle 25GB/s.
>>> You can create an EC gluster cluster that can handle this rates, or
>>> you just backup valuable data from inside VMs using open source backup
>>> tools like borg,attic,restic , etc...
>>>
>>> On Thu, Mar 23, 2017 at 7:48 PM, Gandalf Corvotempesta
>>>  wrote:
>>> > Let's assume a 1PB storage full of VMs images with each brick over ZFS,
>>> > replica 3, sharding enabled
>>> >
>>> > How do you backup/restore that amount of data?
>>> >
>>> > Backing up daily is impossible, you'll never finish the backup that the
>>> > following one is starting (in other words, you need more than 24 hours)
>>> >
>>> > Restoring is even worse. You need more than 24 hours with the whole
>>> cluster
>>> > down
>>> >
>>> > You can't rely on ZFS snapshot due to sharding (the snapshot took from
>>> one
>>> > node is useless without all other node related at the same shard) and
>>> you
>>> > still have the same restore speed
>>> >
>>> > How do you backup this?
>>> >
>>> > Even georep isn't enough, if you have to restore the whole storage in
>>> case
>>> > of disaster
>>> >
>>> > ___
>>> > Gluster-users mailing list
>>> > Gluster-users@gluster.org
>>> > http://lists.gluster.org/mailman/listinfo/gluster-users
>>>
>>
>>
>> ___
>> Gluster-users mailing 
>> listGluster-users@gluster.orghttp://lists.gluster.org/mailman/listinfo/gluster-users
>>
>> ___ Gluster-users mailing
>> list Gluster-users@gluster.org http://lists.gluster.org/mailm
>> an/listinfo/gluster-users
>
>
___
Gluster-users mailing list
Gluster-users@gluster.org
http://lists.gluster.org/mailman/listinfo/gluster-users

Re: [Gluster-users] Backups

2017-03-23 Thread Gandalf Corvotempesta
Maybe exposing the volume as iSCSI and then using ZFS over iSCSI on each
hypervisor?
In that case I'd be able to take ZFS snapshots and send them to the backup
server.
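
Per dataset it would be something like this (pool, dataset and host names are
placeholders); the first send is a full copy, the later ones only ship the
blocks changed between the two snapshots:

$ zfs snapshot tank/vmstore@backup-a
$ zfs send tank/vmstore@backup-a | ssh backuphost zfs receive backup/vmstore

# next run: incremental against the previous snapshot
$ zfs snapshot tank/vmstore@backup-b
$ zfs send -i tank/vmstore@backup-a tank/vmstore@backup-b | ssh backuphost zfs receive backup/vmstore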

Il 23 mar 2017 8:36 PM, "Gandalf Corvotempesta" <
gandalf.corvotempe...@gmail.com> ha scritto:

> Georep expose to another problem:
> When using gluster as storage for VM, the VM file is saved as qcow.
> Changes are inside the qcow, thus rsync has to sync the whole file every
> time
>
> A little workaround would be sharding, as rsync has to sync only the
> changed shards, but I don't think this is a good solution
>
> Il 23 mar 2017 8:33 PM, "Joe Julian"  ha scritto:
>
>> In many cases, a full backup set is just not feasible. Georep to the same
>> or different DC may be an option if the bandwidth can keep up with the
>> change set. If not, maybe breaking the data up into smaller more manageable
>> volumes where you only keep a smaller set of critical data and just back
>> that up. Perhaps an object store (swift?) might handle fault tolerance
>> distribution better for some workloads.
>>
>> There's no one right answer.
>>
>> On 03/23/17 12:23, Gandalf Corvotempesta wrote:
>>
>> Backing up from inside each VM doesn't solve the problem
>> If you have to backup 500VMs you just need more than 1 day and what if
>> you have to restore the whole gluster storage?
>>
>> How many days do you need to restore 1PB?
>>
>> Probably the only solution should be a georep in the same datacenter/rack
>> with a similiar cluster,
>> ready to became the master storage.
>> In this case you don't need to restore anything as data are already
>> there,
>> only a little bit back in time but this double the TCO
>>
>> Il 23 mar 2017 6:39 PM, "Serkan Çoban"  ha
>> scritto:
>>
>>> Assuming a backup window of 12 hours, you need to send data at 25GB/s
>>> to backup solution.
>>> Using 10G Ethernet on hosts you need at least 25 host to handle 25GB/s.
>>> You can create an EC gluster cluster that can handle this rates, or
>>> you just backup valuable data from inside VMs using open source backup
>>> tools like borg,attic,restic , etc...
>>>
>>> On Thu, Mar 23, 2017 at 7:48 PM, Gandalf Corvotempesta
>>>  wrote:
>>> > Let's assume a 1PB storage full of VMs images with each brick over ZFS,
>>> > replica 3, sharding enabled
>>> >
>>> > How do you backup/restore that amount of data?
>>> >
>>> > Backing up daily is impossible, you'll never finish the backup that the
>>> > following one is starting (in other words, you need more than 24 hours)
>>> >
>>> > Restoring is even worse. You need more than 24 hours with the whole
>>> cluster
>>> > down
>>> >
>>> > You can't rely on ZFS snapshot due to sharding (the snapshot took from
>>> one
>>> > node is useless without all other node related at the same shard) and
>>> you
>>> > still have the same restore speed
>>> >
>>> > How do you backup this?
>>> >
>>> > Even georep isn't enough, if you have to restore the whole storage in
>>> case
>>> > of disaster
>>> >
>>> > ___
>>> > Gluster-users mailing list
>>> > Gluster-users@gluster.org
>>> > http://lists.gluster.org/mailman/listinfo/gluster-users
>>>
>>
>>
>> ___
>> Gluster-users mailing 
>> listGluster-users@gluster.orghttp://lists.gluster.org/mailman/listinfo/gluster-users
>>
>>
>>
>> ___
>> Gluster-users mailing list
>> Gluster-users@gluster.org
>> http://lists.gluster.org/mailman/listinfo/gluster-users
>>
>
___
Gluster-users mailing list
Gluster-users@gluster.org
http://lists.gluster.org/mailman/listinfo/gluster-users

Re: [Gluster-users] Backups

2017-03-23 Thread Gandalf Corvotempesta
Geo-rep exposes another problem:
when using Gluster as storage for VMs, the VM disk is saved as a qcow file.
Changes happen inside the qcow, thus rsync has to sync the whole file every
time.

A little workaround would be sharding, as rsync then only has to sync the
changed shards, but I don't think this is a good solution
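
For reference, enabling sharding is just two volume options, ideally set
before any VM image is written, since existing files are not re-sharded
(the volume name and block size here are only examples):

$ gluster volume set myvol features.shard on
$ gluster volume set myvol features.shard-block-size 64MB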

Il 23 mar 2017 8:33 PM, "Joe Julian"  ha scritto:

> In many cases, a full backup set is just not feasible. Georep to the same
> or different DC may be an option if the bandwidth can keep up with the
> change set. If not, maybe breaking the data up into smaller more manageable
> volumes where you only keep a smaller set of critical data and just back
> that up. Perhaps an object store (swift?) might handle fault tolerance
> distribution better for some workloads.
>
> There's no one right answer.
>
> On 03/23/17 12:23, Gandalf Corvotempesta wrote:
>
> Backing up from inside each VM doesn't solve the problem
> If you have to backup 500VMs you just need more than 1 day and what if you
> have to restore the whole gluster storage?
>
> How many days do you need to restore 1PB?
>
> Probably the only solution should be a georep in the same datacenter/rack
> with a similiar cluster,
> ready to became the master storage.
> In this case you don't need to restore anything as data are already there,
> only a little bit back in time but this double the TCO
>
> Il 23 mar 2017 6:39 PM, "Serkan Çoban"  ha scritto:
>
>> Assuming a backup window of 12 hours, you need to send data at 25GB/s
>> to backup solution.
>> Using 10G Ethernet on hosts you need at least 25 host to handle 25GB/s.
>> You can create an EC gluster cluster that can handle this rates, or
>> you just backup valuable data from inside VMs using open source backup
>> tools like borg,attic,restic , etc...
>>
>> On Thu, Mar 23, 2017 at 7:48 PM, Gandalf Corvotempesta
>>  wrote:
>> > Let's assume a 1PB storage full of VMs images with each brick over ZFS,
>> > replica 3, sharding enabled
>> >
>> > How do you backup/restore that amount of data?
>> >
>> > Backing up daily is impossible, you'll never finish the backup that the
>> > following one is starting (in other words, you need more than 24 hours)
>> >
>> > Restoring is even worse. You need more than 24 hours with the whole
>> cluster
>> > down
>> >
>> > You can't rely on ZFS snapshot due to sharding (the snapshot took from
>> one
>> > node is useless without all other node related at the same shard) and
>> you
>> > still have the same restore speed
>> >
>> > How do you backup this?
>> >
>> > Even georep isn't enough, if you have to restore the whole storage in
>> case
>> > of disaster
>> >
>> > ___
>> > Gluster-users mailing list
>> > Gluster-users@gluster.org
>> > http://lists.gluster.org/mailman/listinfo/gluster-users
>>
>
>
> ___
> Gluster-users mailing 
> listGluster-users@gluster.orghttp://lists.gluster.org/mailman/listinfo/gluster-users
>
>
>
> ___
> Gluster-users mailing list
> Gluster-users@gluster.org
> http://lists.gluster.org/mailman/listinfo/gluster-users
>
___
Gluster-users mailing list
Gluster-users@gluster.org
http://lists.gluster.org/mailman/listinfo/gluster-users

Re: [Gluster-users] Backups

2017-03-23 Thread Gandalf Corvotempesta
Backing up from inside each VM doesn't solve the problem.
If you have to back up 500 VMs you need more than 1 day, and what if you
have to restore the whole Gluster storage?

How many days do you need to restore 1PB?

Probably the only solution would be a geo-rep in the same datacenter/rack
with a similar cluster,
ready to become the master storage.
In this case you don't need to restore anything, as the data is already there,
only a little bit back in time, but this doubles the TCO
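
The geo-rep setup itself would be simple enough (volume and host names below
are placeholders); the real question is whether the slave cluster can keep up
with the change rate:

$ gluster system:: execute gsec_create
$ gluster volume geo-replication bigvol backupnode::bigvol create push-pem   # slave volume must already exist
$ gluster volume geo-replication bigvol backupnode::bigvol start
$ gluster volume geo-replication bigvol backupnode::bigvol status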

Il 23 mar 2017 6:39 PM, "Serkan Çoban"  ha scritto:

> Assuming a backup window of 12 hours, you need to send data at 25GB/s
> to backup solution.
> Using 10G Ethernet on hosts you need at least 25 host to handle 25GB/s.
> You can create an EC gluster cluster that can handle this rates, or
> you just backup valuable data from inside VMs using open source backup
> tools like borg,attic,restic , etc...
>
> On Thu, Mar 23, 2017 at 7:48 PM, Gandalf Corvotempesta
>  wrote:
> > Let's assume a 1PB storage full of VMs images with each brick over ZFS,
> > replica 3, sharding enabled
> >
> > How do you backup/restore that amount of data?
> >
> > Backing up daily is impossible, you'll never finish the backup that the
> > following one is starting (in other words, you need more than 24 hours)
> >
> > Restoring is even worse. You need more than 24 hours with the whole
> cluster
> > down
> >
> > You can't rely on ZFS snapshot due to sharding (the snapshot took from
> one
> > node is useless without all other node related at the same shard) and you
> > still have the same restore speed
> >
> > How do you backup this?
> >
> > Even georep isn't enough, if you have to restore the whole storage in
> case
> > of disaster
> >
> > ___
> > Gluster-users mailing list
> > Gluster-users@gluster.org
> > http://lists.gluster.org/mailman/listinfo/gluster-users
>
___
Gluster-users mailing list
Gluster-users@gluster.org
http://lists.gluster.org/mailman/listinfo/gluster-users

[Gluster-users] Backups

2017-03-23 Thread Gandalf Corvotempesta
Let's assume a 1PB storage full of VM images, with each brick on ZFS,
replica 3, sharding enabled.

How do you backup/restore that amount of data?

Backing up daily is impossible: you'll never finish one backup before the
following one has to start (in other words, you need more than 24 hours).

Restoring is even worse. You need more than 24 hours with the whole cluster
down.

You can't rely on ZFS snapshots due to sharding (a snapshot taken from one
node is useless without the related shards from all the other nodes) and you
still have the same restore speed.

How do you back this up?

Even geo-rep isn't enough, if you have to restore the whole storage in case
of disaster
___
Gluster-users mailing list
Gluster-users@gluster.org
http://lists.gluster.org/mailman/listinfo/gluster-users

Re: [Gluster-users] Gluster 3.8.10 rebalance VMs corruption

2017-03-18 Thread Gandalf Corvotempesta
Krutika, it wasn't an attack directed at you.
It wasn't an attack at all.

Gluster is a "SCALE-OUT" software-defined storage; the following is
written in the middle of the homepage:
"GlusterFS is a scalable network filesystem"

So, scaling a cluster is one of the primary goals of Gluster.

A critical bug that prevents Gluster from being scaled without losing
data was discovered 1 year ago, and took 1 year to be fixed.

If Gluster isn't able to ensure data consistency when doing its
primary job, scaling up a storage, I'm sorry but it can't be
considered "enterprise" ready or production ready.
Maybe SOHO for small offices or home users, but in enterprises, data
consistency and reliability are the most important things, and Gluster
isn't able to guarantee them even
for a very basic routine procedure that should be considered the
basis of the whole Gluster project (as written on Gluster's homepage).


2017-03-18 14:21 GMT+01:00 Krutika Dhananjay :
>
>
> On Sat, Mar 18, 2017 at 3:18 PM, Gandalf Corvotempesta
>  wrote:
>>
>> 2017-03-18 2:09 GMT+01:00 Lindsay Mathieson :
>> > Concerning, this was supposed to be fixed in 3.8.10
>>
>> Exactly. https://bugzilla.redhat.com/show_bug.cgi?id=1387878
>> Now let's see how much time they require to fix another CRITICAL bug.
>>
>> I'm really curious.
>
>
> Hey Gandalf!
>
> Let's see. There have been plenty of occasions where I've sat and worked on
> users' issues on weekends.
> And then again, I've got a life too outside of work (or at least I'm
> supposed to), you know.
> (And hey you know what! Today is Saturday and I'm sitting here and
> responding to your mail and collecting information
> on Mahdi's issue. Nobody asked me to look into it. I checked the mail and I
> had a choice to ignore it and not look into it until Monday.)
>
> Is there a genuine problem Mahdi is facing? Without a doubt!
>
> Got a constructive feedback to give? Please do.
> Do you want to give back to the community and help improve GlusterFS? There
> are plenty of ways to do that.
> One of them is testing out the releases and providing feedback. Sharding
> wouldn't have worked today, if not for Lindsay's timely
> and regular feedback in several 3.7.x releases.
>
> But this kind of criticism doesn't help.
>
> Also, spending time on users' issues is only one of the many
> responsibilities we have as developers.
> So what you see on mailing lists is just the tip of the iceberg.
>
> I have personally tried several times to recreate the add-brick bug on 3
> machines I borrowed from Kaleb. I haven't had success in recreating it.
> Reproducing VM-related bugs, in my experience, wasn't easy. I don't use
> Proxmox. Lindsay and Kevin did. There are a myriad qemu options used when
> launching vms. Different VM management projects (ovirt/Proxmox) use
> different defaults for these options. There are too many variables to be
> considered
> when debugging or trying to simulate the users' test.
>
> It's why I asked for Mahdi's help before 3.8.10 was out for feedback on the
> fix:
> http://lists.gluster.org/pipermail/gluster-users/2017-February/030112.html
>
> Alright. That's all I had to say.
>
> Happy weekend to you!
>
> -Krutika
>
>> ___
>> Gluster-users mailing list
>> Gluster-users@gluster.org
>> http://lists.gluster.org/mailman/listinfo/gluster-users
>
>
___
Gluster-users mailing list
Gluster-users@gluster.org
http://lists.gluster.org/mailman/listinfo/gluster-users


Re: [Gluster-users] Gluster 3.8.10 rebalance VMs corruption

2017-03-18 Thread Gandalf Corvotempesta
2017-03-18 2:09 GMT+01:00 Lindsay Mathieson :
> Concerning, this was supposed to be fixed in 3.8.10

Exactly. https://bugzilla.redhat.com/show_bug.cgi?id=1387878
Now let's see how much time they require to fix another CRITICAL bug.

I'm really curious.
___
Gluster-users mailing list
Gluster-users@gluster.org
http://lists.gluster.org/mailman/listinfo/gluster-users


[Gluster-users] Performance optimization

2017-03-17 Thread Gandalf Corvotempesta
Workload: VM hosting with sharding enabled, replica 3 (with or without
distribution, see below).

Which configuration will perform better?

a) 1 ZFS disk per brick, 1 brick per server. 1 disk on each server.
b) 1 ZFS mirror per brick, 1 brick per server. 2 disks on each server.
c) 1 ZFS disk per brick, 2 bricks per server (with distribution). 2
disks on each server.
d) 1 ZFS mirror per brick, 2 bricks per server (with distribution). 4
disks on each server.
e) any other combination of "c" or "d" with more disks per server.

I think that "distribution" ('c' or 'd') would give me better
performance, as multiple disks are involved in both reads and writes,
as long as Gluster always tries to spread I/O across the maximum number
of bricks when distributing.
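
For example, for 'c' or 'd' I would create the volume like this (volume,
server and pool names are placeholders), so that each replica set spans the
three servers and DHT distributes the shards over the two sets:

$ gluster volume create vmvol replica 3 \
    srv1:/tank1/brick srv2:/tank1/brick srv3:/tank1/brick \
    srv1:/tank2/brick srv2:/tank2/brick srv3:/tank2/brick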
___
Gluster-users mailing list
Gluster-users@gluster.org
http://lists.gluster.org/mailman/listinfo/gluster-users


Re: [Gluster-users] output of gluster peer status

2017-03-14 Thread Gandalf Corvotempesta
I can confirm this.
Is there any solution?
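If I remember correctly, probing the peer again by its hostname (from a node
that currently knows it only by IP) should add the hostname to the existing
entry without touching the peer files by hand; the hostname below is a
placeholder. Can anyone confirm?

$ gluster peer probe storage04.example.com
$ gluster peer status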

Il 14 mar 2017 8:11 PM, "Sergei Gerasenko"  ha scritto:

> Hi everybody,
>
> Easy question: the output of *gluster peer status* on some of the hosts
> in the cluster has the hostname for all but one member of the cluster,
> which is listed by its ip. Is there an easy way to fix it? I know the
> information is coming from the peer files and I could edit the peer files
> directly, but can this be done through cli tools?
>
> Thanks,
>   Sergei
>
> ___
> Gluster-users mailing list
> Gluster-users@gluster.org
> http://lists.gluster.org/mailman/listinfo/gluster-users
>
___
Gluster-users mailing list
Gluster-users@gluster.org
http://lists.gluster.org/mailman/listinfo/gluster-users

Re: [Gluster-users] Sharding?

2017-03-10 Thread Gandalf Corvotempesta
2017-03-10 11:39 GMT+01:00 Cedric Lemarchand :
> I am still asking myself how such bug could happen on a clustered storage 
> software, where adding bricks is a base feature for scalable solution, like 
> Gluster. Or maybe is it that STM releases are really under tested compared to 
> LTM ones ? Could we states that STM release are really not made for 
> production, or at least really risky ?

This is the same thing I reported some months ago.
I think it's probably the worst thing in Gluster: tons of critical
bugs in critical features (which are also the basic features of any
storage software) that lead to data loss, with fixes still waiting to be merged.

This kind of bug *MUST* be addressed, fixed and released *ASAP*, not
left waiting for review after months and months.
___
Gluster-users mailing list
Gluster-users@gluster.org
http://lists.gluster.org/mailman/listinfo/gluster-users

