[Gluster-devel] The ctime of fstat is not correct which leads to "tar" utility error

2018-07-19 Thread Lian, George (NSB - CN/Hangzhou)
Hi, Gluster Experts,

In glusterfs version 3.12.3, there seems to be an “fstat” issue with ctime
after we use fsync.
We have a demo binary, named “tt”, which writes some data and then does an
fsync on that file.
Running the tar command right after “tt” always fails with “tar:
/mnt/test/file1.txt: file changed as we read it”.

The command output is listed below; the source code and volume info
configuration are attached FYI.
This issue is 100% reproducible! (/mnt/test is the mountpoint of the
glusterfs volume “test”; the volume info is attached in this mail.)
--
./tt;tar -czvf /tmp/abc.gz /mnt/test/file1.txt
mtime:1531247107.27200
ctime:1531247107.27200
tar: Removing leading `/' from member names
/mnt/test/file1.txt
tar: /mnt/test/file1.txt: file changed as we read it
--
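
For context on why tar complains: GNU tar remembers the file’s status before
reading it and checks it again once the file has been read; if the ctime
moved in between, it prints the error above. A minimal sketch of that check
(illustrative only, not tar’s actual source):

/* Illustrative sketch of tar's consistency check, not GNU tar's code:
 * fstat before and after reading, and complain if the ctime moved. */
#include <stdio.h>
#include <fcntl.h>
#include <unistd.h>
#include <sys/stat.h>

int main(void) {
    const char *path = "/mnt/test/file1.txt";
    struct stat before, after;
    char buf[4096];
    ssize_t n;

    int fd = open(path, O_RDONLY);
    if (fd < 0 || fstat(fd, &before) != 0)
        return 1;

    while ((n = read(fd, buf, sizeof(buf))) > 0)
        ;  /* read the whole file, as tar does while archiving */

    if (fstat(fd, &after) != 0)
        return 1;

    if (before.st_ctim.tv_sec != after.st_ctim.tv_sec ||
        before.st_ctim.tv_nsec != after.st_ctim.tv_nsec)
        printf("%s: file changed as we read it\n", path);

    close(fd);
    return 0;
}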

From my investigation, the xattrop for the changelog happens later than the
fsync response. That means:
the function “afr_fsync_cbk” calls afr_delayed_changelog_wake_resume (this,
local->fd, stub);

In our case there is always a pending changelog, so glusterfs saves the
metadata information in a stub and handles the pending changelog first.
But the changelog xattrop also changes the ctime. From packets captured with
tcpdump, the response packet of the xattrop does not include the metadata
information, and the wake_resume path does not handle this metadata-changed
case either.
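
To illustrate the ordering, here is a toy model of the sequence described
above. It is purely illustrative: the names only mirror the roles of
afr_fsync_cbk, the changelog xattrop, and md-cache; this is not GlusterFS
code.

/* Toy model of the race described above -- illustrative only, not
 * GlusterFS code. The "cache" keeps the iatt delivered with the fsync
 * reply; the later changelog xattrop bumps ctime on the "brick" but
 * carries no iatt, so the cache silently goes stale. */
#include <stdio.h>

struct iatt { long ia_ctime; };

static struct iatt brick = { 100 };   /* truth on the brick   */
static struct iatt cache;             /* what md-cache holds  */
static int cache_valid = 0;

static void fsync_cbk(void) {         /* afr_fsync_cbk role   */
    cache = brick;                    /* postbuf gets cached  */
    cache_valid = 1;
}

static void changelog_xattrop(void) { /* delayed changelog    */
    brick.ia_ctime++;                 /* xattrop updates ctime, but its
                                       * reply carries no iatt, so the
                                       * cache is never refreshed */
}

int main(void) {
    fsync_cbk();
    changelog_xattrop();              /* lands after the fsync reply  */
    if (cache_valid)                  /* i.e. within md-cache-timeout */
        printf("app sees ctime=%ld, brick has ctime=%ld\n",
               cache.ia_ctime, brick.ia_ctime);  /* prints 100 vs 101 */
    return 0;
}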

So in this case the metadata in the md-cache is stale, and for as long as
the cache entry is valid, the application will get WRONG metadata!

To verify my guess, I changed the configuration of this volume with either
“gluster v set test md-cache-timeout 0” or
“gluster v set export stat-prefetch off”,
and the issue is GONE! (Both settings effectively take md-cache out of the
picture: the first makes cached entries expire immediately, and turning
stat-prefetch off disables the md-cache translator.)


I then restored the configuration to the defaults (stat-prefetch on and
md-cache-timeout at 1 second) and tried invalidating the md-cache in the
source code, as shown below in the function mdc_fsync_cbk in md-cache.c.
The issue is also GONE!

So, GlusterFS experts,
could you please verify this issue and share your comments on my
investigation? Your final solution would be highly appreciated!

changes in function “mdc_fsync_cbk”:

int
mdc_fsync_cbk (call_frame_t *frame, void *cookie, xlator_t *this,
               int32_t op_ret, int32_t op_errno,
               struct iatt *prebuf, struct iatt *postbuf, dict_t *xdata)
{
        mdc_local_t  *local = NULL;

        local = frame->local;

        if (op_ret != 0)
                goto out;

        if (!local)
                goto out;

        mdc_inode_iatt_set_validate (this, local->fd->inode, prebuf, postbuf,
                                     _gf_true);
        /* new: added for the ctime issue */
        mdc_inode_iatt_invalidate (this, local->fd->inode);
        /* end of addition */
out:
        MDC_STACK_UNWIND (fsync, frame, op_ret, op_errno, prebuf, postbuf,
                          xdata);

        return 0;
}
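
If the analysis above is right, the invalidation presumably works because
mdc_inode_iatt_invalidate drops the cached iatt entirely, so the next
stat/fstat is forced down the stack and fetches fresh attributes from the
bricks, which by then reflect the ctime update from the changelog xattrop.
The cost is an extra cache miss after every fsync, which is heavier than
strictly necessary but safe.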
-
Best Regards,
George

#include <stdio.h>
#include <stdlib.h>
#include <unistd.h>
#include <sys/types.h>
#include <sys/stat.h>
#include <sys/time.h>

int main(void) {
    char* fileName = "/mnt/test/file1.txt";
    struct stat st;

    // create and write a file, then fflush and fsync
    FILE* stream = fopen(fileName, "w");
    if (!stream)
        return 1;
    fwrite("0123456789", sizeof(char), 10, stream);
    fflush(stream);
    fsync(fileno(stream));
    fclose(stream);

    // print the last-modification and last-status-change timestamps
    if (stat(fileName, &st) != 0)
        return 1;
    printf("mtime:%ld.%09ld\n", (long)st.st_mtim.tv_sec, st.st_mtim.tv_nsec);
    printf("ctime:%ld.%09ld\n", (long)st.st_ctim.tv_sec, st.st_ctim.tv_nsec);

    return 0;
}
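
For reference, the reproducer above can be built on the client in the
obvious way, assuming gcc is available (gcc -o tt tt.c), and then run via
the one-liner shown at the top of this mail.
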
bash-4.4# gluster v info test

Volume Name: test
Type: Replicate
Volume ID: 9373eba9-eb84-4618-a54c-f2837345daec
Status: Started
Snapshot Count: 0
Number of Bricks: 1 x (2 + 1) = 3
Transport-type: tcp
Bricks:
Brick1: rcp:/trunk/brick/test1/sn0
Brick2: rcp:/trunk/brick/test1/sn1
Brick3: rcp:/trunk/brick/test1/sn2 (arbiter)
Options Reconfigured:
performance.client-io-threads: off
nfs.disable: on
transport.address-family: inet
cluster.quorum-type: none
cluster.quorum-reads: no
cluster.favorite-child-policy: mtime
diagnostics.client-log-level: INFO

[Gluster-devel] Coverity covscan for 2018-07-19-1ee1666d (master branch)

2018-07-19 Thread staticanalysis


GlusterFS Coverity covscan results for the master branch are available from
http://download.gluster.org/pub/gluster/glusterfs/static-analysis/master/glusterfs-coverity/2018-07-19-1ee1666d/

Coverity covscan results for other active branches are also available at
http://download.gluster.org/pub/gluster/glusterfs/static-analysis/



Re: [Gluster-devel] [Gluster-users] Proposal to mark few features as Deprecated / SunSet from Version 5.0

2018-07-19 Thread Amar Tumballi
On Thu, Jul 19, 2018 at 6:06 PM, Jim Kinney  wrote:

> Too bad the RDMA will be abandoned. It's the perfect transport for
> intranode processing and data sync.
>
>


> I currently use RDMA on a computational cluster between nodes and gluster
> storage. The older IB cards will support 10G IP and 40G IB. I've had some
> success with connectivity but am still faltering with fuse performance. As
> soon as some retired gear is reconnected I'll have a test bed for HA NFS
> over RDMA to computational cluster and 10G IP to non-cluster systems.
>
> But it looks like Gluster 6 is a ways away so maybe I'll get more hardware
> or time to pitch in some code after grokking enough IB.
>
>
We are happy to continue to make releases with RDMA for some more time if
there are users. The "proposal" is to make sure we give enough of a heads-up
that the experts in that area do not have cycles to make any more
enhancements to the feature.



> Thanks for the heads up and all the work to date.
>

Glad to hear back from you! It makes us realize there are things which we
haven't touched in some time, but which people are still using.

Thanks,
Amar


>
> On July 19, 2018 2:56:35 AM EDT, Amar Tumballi 
> wrote:
>>
>> [original proposal trimmed; the full text appears later in this digest]

Re: [Gluster-devel] Gluster Documentation Hackathon - 7/19 through 7/23

2018-07-19 Thread Vijay Bellur
Once you are done with your bit for Gluster documentation, please update
your contributions here [2] for better co-ordination.

Thanks,
Vijay

[2] http://bit.ly/gluster-doc-hack-report

On Wed, Jul 18, 2018 at 9:57 AM Vijay Bellur  wrote:

> Hey All,
>
> We are organizing a hackathon to improve our upstream documentation. More
> details about the hackathon can be found at [1].
>
> Please feel free to let us know if you have any questions.
>
> Thanks,
> Amar & Vijay
>
> [1]
> https://docs.google.com/document/d/11LLGA-bwuamPOrKunxojzAEpHEGQxv8VJ68L3aKdPns/edit?usp=sharing
>

Re: [Gluster-devel] The ctime of fstat is not correct which leads to "tar" utility error

2018-07-19 Thread Raghavendra Gowdappa
On Thu, Jul 19, 2018 at 2:29 PM, Lian, George (NSB - CN/Hangzhou) <
george.l...@nokia-sbell.com> wrote:

> Hi, Gluster Experts,
>
> [...]
>
> To verify my guess, I changed the configuration of this volume with either
> “gluster v set test md-cache-timeout 0” or
> “gluster v set export stat-prefetch off”,
> and the issue is GONE!
>

We recently identified an issue with stat-prefetch. Fix can be found at:
https://review.gluster.org/#/c/20410/11

Can you let us know whether this helps?


> [...]
>
> I then restored the configuration to the defaults and tried invalidating
> the md-cache in mdc_fsync_cbk in md-cache.c; the issue is also GONE!
>
Does the fix you've posted solve the problem?


> changes in function “mdc_fsync_cbk”
> [patch trimmed; it appears in full in the original mail above]
>
> Best Regards,
> George

Re: [Gluster-devel] The ctime of fstat is not correct which leads to "tar" utility error

2018-07-19 Thread Pranith Kumar Karampuri
+Ravi

On Thu, Jul 19, 2018 at 2:29 PM, Lian, George (NSB - CN/Hangzhou) <
george.l...@nokia-sbell.com> wrote:

> Hi, Gluster Experts,
>
> [... original mail quoted in full; trimmed ...]
>
> Best Regards,
> George



-- 
Pranith

[Gluster-devel] Proposal to mark few features as Deprecated / SunSet from Version 5.0

2018-07-19 Thread Amar Tumballi
Hi all,

Over the last 12 years of Gluster we have developed many features, and we
continue to support most of them. But along the way we have figured out
better methods of doing things, and we are no longer actively maintaining
some of these features.

We are now thinking of cleaning up some of these ‘unsupported’ features and
marking them as ‘SunSet’ (i.e., they would be totally taken out of the
codebase in following releases) in the next upcoming release, v5.0. The
release notes will provide options for smoothly migrating to the supported
configurations. If you are using any of these features, do let us know, so
that we can help you with the ‘migration’. Also, we are happy to guide new
developers to work on those components which are not actively being
maintained by the current set of developers.

List of features hitting sunset:

‘cluster/stripe’ translator:
This translator was developed very early in the evolution of GlusterFS, and
addressed one of the very common questions about a distributed FS: “What
happens if one of my files is bigger than the available brick? Say I have a
2 TB hard drive exported in glusterfs, and my file is 3 TB.” While it served
the purpose, it was very hard to handle failure scenarios and give a really
good experience to our users with this feature. Over time, Gluster solved
the problem with its ‘Shard’ feature, which solves it in a much better way
on the existing, well-supported stack. Hence the proposal for deprecation.
If you are using this feature, do write to us, as it needs a proper
migration from the existing volume to a new, fully supported volume type
before you upgrade.

‘storage/bd’ translator:
This feature got into the code base 5 years back with this patch [1]. The
plan was to use a block device directly as a brick, which would make
handling disk-image storage much easier in glusterfs. As the feature is not
getting more contributions, and we are not seeing any user traction on it,
we would like to propose it for deprecation. If you are using the feature,
plan to move to a supported gluster volume configuration, and have your
setup ‘supported’ before upgrading to your new gluster version.

‘RDMA’ transport support:
Gluster started supporting RDMA while ib-verbs was still new, and the very
high-end infrastructure of that time used Infiniband. Engineers worked with
Mellanox and got the technology into GlusterFS for better data migration
and data copy. Current-day kernels support very good speed with the IPoIB
module itself, and the experts in this area have no more bandwidth to
maintain the feature, so we recommend migrating your volume over to a TCP
(IP-based) network. If you are successfully using the RDMA transport, do
get in touch with us to prioritize the migration plan for your volume. The
plan is to work on this after the release, so that by version 6.0 we will
have cleaner transport code which needs to support just one type.

‘Tiering’ feature:
Gluster’s tiering feature was planned to provide an option to keep your
‘hot’ data in a different location than your cold data, so one can get
better performance. While we saw some users for the feature, it needs much
more attention to be completely bug free. At this time we do not have any
active maintainers for the feature, and hence we suggest taking it out of
the ‘supported’ tag. If you are willing to take it up and maintain it, do
let us know, and we are happy to assist you. If you are already using the
tiering feature, make sure to detach the tier from all the bricks (gluster
volume tier detach) before upgrading to the next release. Also, we
recommend features like dmcache on your LVM setup to get the best
performance from the bricks.

‘Quota’:
This is a call-out for the ‘Quota’ feature, to let you all know that it
will be in a ‘no new development’ state. While this feature is actively in
use by many people, the challenges we have in the accounting mechanisms
involved have made it hard to achieve good performance with the feature.
Also, the number of extended-attribute get/set operations while using the
feature is not ideal. Hence we recommend our users to move towards setting
quota on the backend bricks directly (i.e., XFS project quota), or to use
different volumes for different directories, etc. As the feature won’t be
deprecated immediately, it doesn’t need a migration plan when you upgrade
to a newer version, but if you are a new user, we wouldn’t recommend
enabling the quota feature. By the release dates, we will be publishing our
best-alternatives guide for gluster’s current quota feature. Note that if
you want to contribute to the feature, we have a project-quota-based issue
open [2]. Happy to get contributions, and to help in getting a newer
approach to Quota.

--
These are our initial set of features which we propose to take out of
‘fully supported’ status. While we are in the process of making the
user/developer experience of the