from:"Milos Vyletel"

[tip:perf/core] perf tools: Avoid possible race condition in copyfile()

2015-06-12 Thread tip-bot for Milos Vyletel

Commit-ID:  d7c72606d97e6f462a99b79e55b39808147d4c8b
Gitweb: http://git.kernel.org/tip/d7c72606d97e6f462a99b79e55b39808147d4c8b
Author: Milos Vyletel 
AuthorDate: Mon, 8 Jun 2015 16:50:16 +0200
Committer:  Arnaldo Carvalho de Melo 
CommitDate: Wed, 10 Jun 2015 11:51:24 -0300

perf tools: Avoid possible race condition in copyfile()

Use unique temporary files when copying to buildid dir to prevent races
in case multiple instances are trying to copy same file. This is done by

- creating template in form /..XX where the suffix is
  used by mkstemp() to create unique file
- change file mode
- copy content
- if successful link temp file to target file
- unlink temp file

At this point the only file left at target path should be the desired
one either created by us or other instance if we raced. This should also
prevent not yet fully copied files to be visible to to other perf
instances that could try to parse them.

On top of that slow_copyfile no longer needs to deal with file mode when
creating file since temporary file is already created and mode is set.

Succesfully tested by myself by running perf record, archive and reading
the data on other system and by running perf buildid-cache on perf
binary itself. I also did revert fix from 0635b0f that to exposes
previously fixed race with EEXIST and recreator test passed sucessfully.

Signed-off-by: Milos Vyletel 
Acked-by: Ingo Molnar 
Cc: Andy Shevchenko 
Cc: Don Zickus 
Cc: Jiri Olsa 
Cc: Namhyung Kim 
Cc: Peter Zijlstra 
Cc: Steven Rostedt 
Link: 
http://lkml.kernel.org/r/1433775018-19868-1-git-send-email-mi...@redhat.com
Signed-off-by: Arnaldo Carvalho de Melo 
---
 tools/perf/util/util.c | 46 +++---
 1 file changed, 31 insertions(+), 15 deletions(-)

diff --git a/tools/perf/util/util.c b/tools/perf/util/util.c
index 0c264bc..edc2d63 100644
--- a/tools/perf/util/util.c
+++ b/tools/perf/util/util.c
@@ -115,20 +115,17 @@ int rm_rf(char *path)
return rmdir(path);
 }
 
-static int slow_copyfile(const char *from, const char *to, mode_t mode)
+static int slow_copyfile(const char *from, const char *to)
 {
int err = -1;
char *line = NULL;
size_t n;
FILE *from_fp = fopen(from, "r"), *to_fp;
-   mode_t old_umask;
 
if (from_fp == NULL)
goto out;
 
-   old_umask = umask(mode ^ 0777);
to_fp = fopen(to, "w");
-   umask(old_umask);
if (to_fp == NULL)
goto out_fclose_from;
 
@@ -178,29 +175,48 @@ int copyfile_mode(const char *from, const char *to, 
mode_t mode)
int fromfd, tofd;
struct stat st;
int err = -1;
+   char *tmp = NULL, *ptr = NULL;
 
if (stat(from, &st))
goto out;
 
-   if (st.st_size == 0) /* /proc? do it slowly... */
-   return slow_copyfile(from, to, mode);
-
-   fromfd = open(from, O_RDONLY);
-   if (fromfd < 0)
+   /* extra 'x' at the end is to reserve space for '.' */
+   if (asprintf(&tmp, "%s.XXx", to) < 0) {
+   tmp = NULL;
goto out;
+   }
+   ptr = strrchr(tmp, '/');
+   if (!ptr)
+   goto out;
+   ptr = memmove(ptr + 1, ptr, strlen(ptr) - 1);
+   *ptr = '.';
 
-   tofd = creat(to, mode);
+   tofd = mkstemp(tmp);
if (tofd < 0)
-   goto out_close_from;
+   goto out;
+
+   if (fchmod(tofd, mode))
+   goto out_close_to;
+
+   if (st.st_size == 0) { /* /proc? do it slowly... */
+   err = slow_copyfile(from, tmp);
+   goto out_close_to;
+   }
+
+   fromfd = open(from, O_RDONLY);
+   if (fromfd < 0)
+   goto out_close_to;
 
err = copyfile_offset(fromfd, 0, tofd, 0, st.st_size);
 
-   close(tofd);
-   if (err)
-   unlink(to);
-out_close_from:
close(fromfd);
+out_close_to:
+   close(tofd);
+   if (!err)
+   err = link(tmp, to);
+   unlink(tmp);
 out:
+   free(tmp);
return err;
 }
 
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/

[PATCH v2] perf tools: avoid possible race condition in copyfile

2015-06-08 Thread Milos Vyletel

Use unique temporary files when copying to buildid dir to prevent races
in case multiple instances are trying to copy same file. This is done by

- creating template in form /..XX where the suffix is
  used by mkstemp() to create unique file
- change file mode
- copy content
- if successful link temp file to target file
- unlink temp file

At this point the only file left at target path should be the desired
one either created by us or other instance if we raced. This should also
prevent not yet fully copied files to be visible to to other perf
instances that could try to parse them.

On top of that slow_copyfile no longer needs to deal with file mode when
creating file since temporary file is already created and mode is set.

Succesfully tested by myself by running perf record, archive and reading
the data on other system and by running perf buildid-cache on perf binary
itself. I also did revert fix from 0635b0f that to exposes previously
fixed race with EEXIST and recreator test passed sucessfully.

Signed-off-by: Milos Vyletel 
Acked-by: Ingo Molnar 
---
 tools/perf/util/util.c | 46 +++---
 1 file changed, 31 insertions(+), 15 deletions(-)

v2 - rebased on top of tip/perf/core

diff --git a/tools/perf/util/util.c b/tools/perf/util/util.c
index 0c264bc..edc2d63 100644
--- a/tools/perf/util/util.c
+++ b/tools/perf/util/util.c
@@ -115,20 +115,17 @@ int rm_rf(char *path)
return rmdir(path);
 }
 
-static int slow_copyfile(const char *from, const char *to, mode_t mode)
+static int slow_copyfile(const char *from, const char *to)
 {
int err = -1;
char *line = NULL;
size_t n;
FILE *from_fp = fopen(from, "r"), *to_fp;
-   mode_t old_umask;
 
if (from_fp == NULL)
goto out;
 
-   old_umask = umask(mode ^ 0777);
to_fp = fopen(to, "w");
-   umask(old_umask);
if (to_fp == NULL)
goto out_fclose_from;
 
@@ -178,29 +175,48 @@ int copyfile_mode(const char *from, const char *to, 
mode_t mode)
int fromfd, tofd;
struct stat st;
int err = -1;
+   char *tmp = NULL, *ptr = NULL;
 
if (stat(from, &st))
goto out;
 
-   if (st.st_size == 0) /* /proc? do it slowly... */
-   return slow_copyfile(from, to, mode);
-
-   fromfd = open(from, O_RDONLY);
-   if (fromfd < 0)
+   /* extra 'x' at the end is to reserve space for '.' */
+   if (asprintf(&tmp, "%s.XXx", to) < 0) {
+   tmp = NULL;
goto out;
+   }
+   ptr = strrchr(tmp, '/');
+   if (!ptr)
+   goto out;
+   ptr = memmove(ptr + 1, ptr, strlen(ptr) - 1);
+   *ptr = '.';
 
-   tofd = creat(to, mode);
+   tofd = mkstemp(tmp);
if (tofd < 0)
-   goto out_close_from;
+   goto out;
+
+   if (fchmod(tofd, mode))
+   goto out_close_to;
+
+   if (st.st_size == 0) { /* /proc? do it slowly... */
+   err = slow_copyfile(from, tmp);
+   goto out_close_to;
+   }
+
+   fromfd = open(from, O_RDONLY);
+   if (fromfd < 0)
+   goto out_close_to;
 
err = copyfile_offset(fromfd, 0, tofd, 0, st.st_size);
 
-   close(tofd);
-   if (err)
-   unlink(to);
-out_close_from:
close(fromfd);
+out_close_to:
+   close(tofd);
+   if (!err)
+   err = link(tmp, to);
+   unlink(tmp);
 out:
+   free(tmp);
return err;
 }
 
-- 
2.4.2

--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/

Re: [PATCH] perf tools: avoid possible race condition in copyfile

2015-06-08 Thread Milos Vyletel

On Mon, Jun 08, 2015 at 10:39:57AM -0300, Arnaldo Carvalho de Melo wrote:

> Em Thu, Jun 04, 2015 at 11:11:00AM +0200, Ingo Molnar escreveu:
> > * Milos Vyletel  wrote:
> > > On top of that slow_copyfile no longer needs to deal with file mode when
> > > creating file since temporary file is already created and mode is set.
> > > 
> > > Signed-off-by: Milos Vyletel 
> > 
> > Ok, that looks nice!
> > 
> > Assuming it passes testing you can add my ack to it:
> > 
> >   Acked-by: Ingo Molnar 
> 
> Can you please try applying it on tip/perf/core instead? I think it is
> clashing with:
> 
> commit 9c9f5a2f1944e8b6bf2b618d04b31e1c1760637e
> Author: Namhyung Kim 
> Date:   Mon May 18 09:30:18 2015 +0900
> 
> perf tools: Introduce copyfile_offset() function
> 

And also with 

commit 0b1de0be1eac7b23e89cb43c17b02d38ead6b6c8
Author: Namhyung Kim 
Date:   Mon May 18 09:30:17 2015 +0900

perf tools: Add rm_rf() utility function

Will resend version based on tip/perf/core after I rerun my testing.

Milos
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/

Re: [PATCH] perf tools: avoid possible race condition in copyfile

2015-06-04 Thread Milos Vyletel

On Thu, Jun 04, 2015 at 11:11:00AM +0200, Ingo Molnar wrote:

> 
> * Milos Vyletel  wrote:
> 
> > Use unique temporary files when copying to buildid dir to prevent races
> > in case multiple instances are trying to copy same file. This is done by
> > 
> > - creating template in form /..XX where the suffix is
> >   used by mkstemp() to create unique file
> > - change file mode
> > - copy content
> > - if successful link temp file to target file
> > - unlink temp file
> > 
> > At this point the only file left at target path should be the desired
> > one either created by us or other instance if we raced. This should also
> > prevent not yet fully copied files to be visible to to other perf
> > instances that could try to parse them.
> > 
> > On top of that slow_copyfile no longer needs to deal with file mode when
> > creating file since temporary file is already created and mode is set.
> > 
> > Signed-off-by: Milos Vyletel 
> 
> Ok, that looks nice!

Thanks.

> 
> Assuming it passes testing you can add my ack to it:
> 

I did some testing myself by running perf record, archive and reading
the data on other system as well as running perf buildid-cache on perf
binary itself. I did revert fix from 0635b0f that to expose the race
with EEXIST and my recreator test passed sucessfully. I've also added
debug printfs in the code to really make sure the temporary files are
created, exist and then are moved back.
Everything worked as expected but this can surely use some more testing.

>   Acked-by: Ingo Molnar 
> 
> Is there any other place in tools/perf where we are using file locking or 
> racy 
> shared access to the same file(s)?
> 

There are two places in code I'm aware of which copy files and the both
end up calling copyfile_mode that this patch modifies. Those places are

build_id_cache__add_kcore
  kcore_copy
kcore_copy__copy_file
  copyfile_mode

build_id_cache__add_s
  copyfile
copyfile_mode

unless I'm missing something we should cover all cases.

Milos
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/

[PATCH] perf tools: avoid possible race condition in copyfile

2015-06-04 Thread Milos Vyletel

Use unique temporary files when copying to buildid dir to prevent races
in case multiple instances are trying to copy same file. This is done by

- creating template in form /..XX where the suffix is
  used by mkstemp() to create unique file
- change file mode
- copy content
- if successful link temp file to target file
- unlink temp file

At this point the only file left at target path should be the desired
one either created by us or other instance if we raced. This should also
prevent not yet fully copied files to be visible to to other perf
instances that could try to parse them.

On top of that slow_copyfile no longer needs to deal with file mode when
creating file since temporary file is already created and mode is set.

Signed-off-by: Milos Vyletel 
---
 tools/perf/util/util.c | 48 
 1 file changed, 32 insertions(+), 16 deletions(-)

diff --git a/tools/perf/util/util.c b/tools/perf/util/util.c
index 4ee6d0d..fec1e13 100644
--- a/tools/perf/util/util.c
+++ b/tools/perf/util/util.c
@@ -72,20 +72,17 @@ int mkdir_p(char *path, mode_t mode)
return (stat(path, &st) && mkdir(path, mode)) ? -1 : 0;
 }
 
-static int slow_copyfile(const char *from, const char *to, mode_t mode)
+static int slow_copyfile(const char *from, const char *to)
 {
int err = -1;
char *line = NULL;
size_t n;
FILE *from_fp = fopen(from, "r"), *to_fp;
-   mode_t old_umask;
 
if (from_fp == NULL)
goto out;
 
-   old_umask = umask(mode ^ 0777);
to_fp = fopen(to, "w");
-   umask(old_umask);
if (to_fp == NULL)
goto out_fclose_from;
 
@@ -108,36 +105,55 @@ int copyfile_mode(const char *from, const char *to, 
mode_t mode)
struct stat st;
void *addr;
int err = -1;
+   char *tmp = NULL, *ptr = NULL;
 
if (stat(from, &st))
goto out;
 
-   if (st.st_size == 0) /* /proc? do it slowly... */
-   return slow_copyfile(from, to, mode);
-
-   fromfd = open(from, O_RDONLY);
-   if (fromfd < 0)
+   /* extra 'x' at the end is to reserve space for '.' */
+   if (asprintf(&tmp, "%s.XXx", to) < 0) {
+   tmp = NULL;
goto out;
+   }
+   ptr = strrchr(tmp, '/');
+   if (!ptr)
+   goto out;
+   ptr = memmove(ptr + 1, ptr, strlen(ptr) - 1);
+   *ptr = '.';
 
-   tofd = creat(to, mode);
+   tofd = mkstemp(tmp);
if (tofd < 0)
-   goto out_close_from;
+   goto out;
+
+   if (fchmod(tofd, mode))
+   goto out_close_to;
+
+   if (st.st_size == 0) { /* /proc? do it slowly... */
+   err = slow_copyfile(from, tmp);
+   goto out_close_to;
+   }
+
+   fromfd = open(from, O_RDONLY);
+   if (fromfd < 0)
+   goto out_close_to;
 
addr = mmap(NULL, st.st_size, PROT_READ, MAP_PRIVATE, fromfd, 0);
if (addr == MAP_FAILED)
-   goto out_close_to;
+   goto out_close_from;
 
if (write(tofd, addr, st.st_size) == st.st_size)
err = 0;
 
munmap(addr, st.st_size);
-out_close_to:
-   close(tofd);
-   if (err)
-   unlink(to);
 out_close_from:
close(fromfd);
+out_close_to:
+   close(tofd);
+   if (!err)
+   err = link(tmp, to);
+   unlink(tmp);
 out:
+   free(tmp);
return err;
 }
 
-- 
2.4.2

--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/

Re: [PATCH 2/2] perf/tools: put new buildid locks to use

2015-06-03 Thread Milos Vyletel

On Wed, Jun 03, 2015 at 01:21:41PM +0200, Ingo Molnar wrote:

> 
> * Milos Vyletel  wrote:
> 
> > On Thu, May 14, 2015 at 07:38:08PM +0200, Ingo Molnar wrote:
> > 
> > > 
> > > * Milos Vyletel  wrote:
> > > 
> > > > On Thu, May 14, 2015 at 01:38:21PM +0200, Jiri Olsa wrote:
> > > > 
> > > > > On Thu, May 14, 2015 at 12:40:59PM +0200, Ingo Molnar wrote:
> > > > > > 
> > > > > > * Milos Vyletel  wrote:
> > > > > > 
> > > > > > > Use new read/write locks when accesing buildid directory on 
> > > > > > > places where
> > > > > > > we may race if multiple instances are run simultaneously.
> > > > > > 
> > > > > > Dunno, this will create locking interaction between multiple 
> > > > > > instances 
> > > > > > of perf - hanging each other, etc.
> > > > > > 
> > > > > > And it seems unnecessary: the buildid hierarchy is already spread 
> > > > > > out. 
> > > > > > What kind of races might there be?
> > > > > 
> > > > > there was just recently one fixed by commit:
> > > > >   0635b0f71424 perf tools: Fix race in build_id_cache__add_s()
> > > > > 
> > > > > havent checked the final patch yet, but the idea is to
> > > > > protect us from similar bugs
> > > > 
> > > > right. on top of race with EEXIST couple more are possible (EMLINK, 
> > > > ENOSPC, EDQUOT, ENOMEM... the only way to prevent them all is to 
> > > > lock this kind of operations and make sure we run one at a time.
> > > 
> > > Yeah, so the race pointed out in 0635b0f71424 can be (and should be) 
> > > fixed without locking:
> > > 
> > >  - first create the file under a process-private name under 
> > >~/.debug/tmp/ if the target does not exist yet
> > > 
> > >  - then fully fill it in with content
> > > 
> > >  - then link(2) it to the public target name, which VFS operation is
> > >atomic and may fail safely: at which point it got already created
> > >by someone else.
> > > 
> > >  - finally unlink() the private instance name and the target will now
> > >be the only instance left: either created by us, or by some other 
> > >perf instance in the rare racy case.
> > > 
> > > Since all of ~/.debug is on the same filesystem this should work fine.
> > > 
> > > Beyond avoiding locking this approach has another advantage: it's 
> > > transaction safe, so a crashed/interrupted perf instance won't corrupt 
> > > the debug database, it will only put fully constructed files into the 
> > > public build-id namespace. It at most leaves a stale private file 
> > > around in ~/.debug/tmp/.
> > > 
> > 
> > Ingo,
> > 
> > I finally found some time to make this change. While going over the code 
> > I've 
> > noticed one thing that would make concurrent creation even easier to solve. 
> > Instead of copying the file to temp file what about simply opening file 
> > with 
> > O_CREAT|O_EXCL? creat itself
> > 
> > "creat() is equivalent to open() with flags equal to 
> > O_CREAT|O_WRONLY|O_TRUNC."
> > 
> > addition of O_EXCL would
> > 
> > "Ensure that this call creates the file: if this flag is specified in 
> > conjunction with O_CREAT, and pathname already exists, then open() will 
> > fail."
> > 
> > This we would prevent truncation of already linked file in case link() 
> > races as 
> > in 0635b0f71424. What do you think?
> 
> But it would not prevent the problem of creating a not yet fully constructed 
> file 
> - which some other tool invocation could attempt to parse in an incomplete 
> fashion.
> 
> Using create+link+unlink avoids that race, the files in the publicly visible 
> namespace will always be fully constructed by the time they are made visible 
> (atomically).
> 

Got it. Will use the approach proposed by you.

Milos
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/

Re: [PATCH 2/2] perf/tools: put new buildid locks to use

2015-06-03 Thread Milos Vyletel

On Thu, May 14, 2015 at 07:38:08PM +0200, Ingo Molnar wrote:

> 
> * Milos Vyletel  wrote:
> 
> > On Thu, May 14, 2015 at 01:38:21PM +0200, Jiri Olsa wrote:
> > 
> > > On Thu, May 14, 2015 at 12:40:59PM +0200, Ingo Molnar wrote:
> > > > 
> > > > * Milos Vyletel  wrote:
> > > > 
> > > > > Use new read/write locks when accesing buildid directory on places 
> > > > > where
> > > > > we may race if multiple instances are run simultaneously.
> > > > 
> > > > Dunno, this will create locking interaction between multiple instances 
> > > > of perf - hanging each other, etc.
> > > > 
> > > > And it seems unnecessary: the buildid hierarchy is already spread out. 
> > > > What kind of races might there be?
> > > 
> > > there was just recently one fixed by commit:
> > >   0635b0f71424 perf tools: Fix race in build_id_cache__add_s()
> > > 
> > > havent checked the final patch yet, but the idea is to
> > > protect us from similar bugs
> > 
> > right. on top of race with EEXIST couple more are possible (EMLINK, 
> > ENOSPC, EDQUOT, ENOMEM... the only way to prevent them all is to 
> > lock this kind of operations and make sure we run one at a time.
> 
> Yeah, so the race pointed out in 0635b0f71424 can be (and should be) 
> fixed without locking:
> 
>  - first create the file under a process-private name under 
>~/.debug/tmp/ if the target does not exist yet
> 
>  - then fully fill it in with content
> 
>  - then link(2) it to the public target name, which VFS operation is
>atomic and may fail safely: at which point it got already created
>by someone else.
> 
>  - finally unlink() the private instance name and the target will now
>be the only instance left: either created by us, or by some other 
>perf instance in the rare racy case.
> 
> Since all of ~/.debug is on the same filesystem this should work fine.
> 
> Beyond avoiding locking this approach has another advantage: it's 
> transaction safe, so a crashed/interrupted perf instance won't corrupt 
> the debug database, it will only put fully constructed files into the 
> public build-id namespace. It at most leaves a stale private file 
> around in ~/.debug/tmp/.
> 

Ingo,

I finally found some time to make this change. While going over the code
I've noticed one thing that would make concurrent creation even easier
to solve. Instead of copying the file to temp file what about simply
opening file with O_CREAT|O_EXCL? creat itself

"creat() is equivalent to open() with flags equal to O_CREAT|O_WRONLY|O_TRUNC."

addition of O_EXCL would

"Ensure that this call creates the file: if this flag is specified in
conjunction with O_CREAT, and pathname already exists, then open() will fail."

This we would prevent truncation of already linked file in case link()
races as in 0635b0f71424. What do you think?

Milos
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/

Re: [PATCH 2/2] perf/tools: put new buildid locks to use

2015-05-20 Thread Milos Vyletel

On Thu, May 14, 2015 at 07:38:08PM +0200, Ingo Molnar wrote:

> 
> * Milos Vyletel  wrote:
> 
> > On Thu, May 14, 2015 at 01:38:21PM +0200, Jiri Olsa wrote:
> > 
> > > On Thu, May 14, 2015 at 12:40:59PM +0200, Ingo Molnar wrote:
> > > > 
> > > > * Milos Vyletel  wrote:
> > > > 
> > > > > Use new read/write locks when accesing buildid directory on places 
> > > > > where
> > > > > we may race if multiple instances are run simultaneously.
> > > > 
> > > > Dunno, this will create locking interaction between multiple instances 
> > > > of perf - hanging each other, etc.
> > > > 
> > > > And it seems unnecessary: the buildid hierarchy is already spread out. 
> > > > What kind of races might there be?
> > > 
> > > there was just recently one fixed by commit:
> > >   0635b0f71424 perf tools: Fix race in build_id_cache__add_s()
> > > 
> > > havent checked the final patch yet, but the idea is to
> > > protect us from similar bugs
> > 
> > right. on top of race with EEXIST couple more are possible (EMLINK, 
> > ENOSPC, EDQUOT, ENOMEM... the only way to prevent them all is to 
> > lock this kind of operations and make sure we run one at a time.
> 
> Yeah, so the race pointed out in 0635b0f71424 can be (and should be) 
> fixed without locking:
> 
>  - first create the file under a process-private name under 
>~/.debug/tmp/ if the target does not exist yet
> 
>  - then fully fill it in with content
> 
>  - then link(2) it to the public target name, which VFS operation is
>atomic and may fail safely: at which point it got already created
>by someone else.
> 
>  - finally unlink() the private instance name and the target will now
>be the only instance left: either created by us, or by some other 
>perf instance in the rare racy case.
> 
> Since all of ~/.debug is on the same filesystem this should work fine.
> 
> Beyond avoiding locking this approach has another advantage: it's 
> transaction safe, so a crashed/interrupted perf instance won't corrupt 
> the debug database, it will only put fully constructed files into the 
> public build-id namespace. It at most leaves a stale private file 
> around in ~/.debug/tmp/.
> 
> Really, we should be following the example of Git, which is using a 
> similar append-mostly flow to handle data, and generally avoids file 
> locking as much as possible - which is a whole new can of worms.
> 

Thanks Ingo. I'll take a look at this later this week.

Milos
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/

Re: [PATCH 2/2] perf/tools: put new buildid locks to use

2015-05-14 Thread Milos Vyletel

On Thu, May 14, 2015 at 01:38:21PM +0200, Jiri Olsa wrote:

> On Thu, May 14, 2015 at 12:40:59PM +0200, Ingo Molnar wrote:
> > 
> > * Milos Vyletel  wrote:
> > 
> > > Use new read/write locks when accesing buildid directory on places where
> > > we may race if multiple instances are run simultaneously.
> > 
> > Dunno, this will create locking interaction between multiple instances 
> > of perf - hanging each other, etc.
> > 
> > And it seems unnecessary: the buildid hierarchy is already spread out. 
> > What kind of races might there be?
> 
> there was just recently one fixed by commit:
>   0635b0f71424 perf tools: Fix race in build_id_cache__add_s()
> 
> havent checked the final patch yet, but the idea is to
> protect us from similar bugs

right. on top of race with EEXIST couple more are possible (EMLINK,
ENOSPC, EDQUOT, ENOMEM... the only way to prevent them all is to lock
this kind of operations and make sure we run one at a time.

Milos
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/

[PATCH 2/2] perf/tools: put new buildid locks to use

2015-05-14 Thread Milos Vyletel

Use new read/write locks when accesing buildid directory on places where
we may race if multiple instances are run simultaneously.

Signed-off-by: Milos Vyletel 
---
 tools/perf/builtin-buildid-cache.c | 12 +++
 tools/perf/util/build-id.c | 41 ++
 2 files changed, 45 insertions(+), 8 deletions(-)

diff --git a/tools/perf/builtin-buildid-cache.c 
b/tools/perf/builtin-buildid-cache.c
index d47a0cd..4cf0a1d 100644
--- a/tools/perf/builtin-buildid-cache.c
+++ b/tools/perf/builtin-buildid-cache.c
@@ -94,8 +94,13 @@ static int build_id_cache__kcore_existing(const char 
*from_dir, char *to_dir,
char to_subdir[PATH_MAX];
struct dirent *dent;
int ret = -1;
+   int lockfd;
DIR *d;
 
+   buildid_dir_read_lock(&lockfd);
+   if (lockfd == -1)
+   return -1;
+
d = opendir(to_dir);
if (!d)
return -1;
@@ -121,6 +126,7 @@ static int build_id_cache__kcore_existing(const char 
*from_dir, char *to_dir,
}
 
closedir(d);
+   buildid_dir_read_unlock(lockfd);
 
return ret;
 }
@@ -130,6 +136,7 @@ static int build_id_cache__add_kcore(const char *filename, 
bool force)
char dir[32], sbuildid[BUILD_ID_SIZE * 2 + 1];
char from_dir[PATH_MAX], to_dir[PATH_MAX];
char *p;
+   int lockfd;
 
strlcpy(from_dir, filename, sizeof(from_dir));
 
@@ -156,6 +163,10 @@ static int build_id_cache__add_kcore(const char *filename, 
bool force)
scnprintf(to_dir, sizeof(to_dir), "%s/[kernel.kcore]/%s/%s",
  buildid_dir, sbuildid, dir);
 
+   buildid_dir_write_lock(&lockfd);
+   if (lockfd == -1)
+   return -1;
+
if (mkdir_p(to_dir, 0755))
return -1;
 
@@ -176,6 +187,7 @@ static int build_id_cache__add_kcore(const char *filename, 
bool force)
}
return -1;
}
+   buildid_dir_write_unlock(lockfd);
 
pr_debug("kcore added to build-id cache directory %s\n", to_dir);
 
diff --git a/tools/perf/util/build-id.c b/tools/perf/util/build-id.c
index 46c41e1..13d23aa 100644
--- a/tools/perf/util/build-id.c
+++ b/tools/perf/util/build-id.c
@@ -17,6 +17,7 @@
 #include "tool.h"
 #include "header.h"
 #include "vdso.h"
+#include 
 
 
 static bool no_buildid_cache;
@@ -308,6 +309,7 @@ int build_id_cache__list_build_ids(const char *pathname,
DIR *dir;
struct dirent *d;
int ret = 0;
+   int lockfd = -1;
 
list = strlist__new(true, NULL);
dir_name = build_id_cache__dirname_from_path(pathname, false, false);
@@ -316,11 +318,17 @@ int build_id_cache__list_build_ids(const char *pathname,
goto out;
}
 
+   buildid_dir_read_lock(&lockfd);
+   if (lockfd == -1) {
+   ret = -errno;
+   goto out;
+   }
+
/* List up all dirents */
dir = opendir(dir_name);
if (!dir) {
ret = -errno;
-   goto out;
+   goto out_unlock;
}
 
while ((d = readdir(dir)) != NULL) {
@@ -330,6 +338,8 @@ int build_id_cache__list_build_ids(const char *pathname,
}
closedir(dir);
 
+out_unlock:
+   buildid_dir_read_unlock(lockfd);
 out:
free(dir_name);
if (ret)
@@ -347,6 +357,7 @@ int build_id_cache__add_s(const char *sbuild_id, const char 
*name,
char *realname = NULL, *filename = NULL, *dir_name = NULL,
 *linkname = zalloc(size), *targetname, *tmp;
int err = -1;
+   int lockfd;
 
if (!is_kallsyms) {
realname = realpath(name, NULL);
@@ -358,30 +369,34 @@ int build_id_cache__add_s(const char *sbuild_id, const 
char *name,
if (!dir_name)
goto out_free;
 
-   if (mkdir_p(dir_name, 0755))
+   buildid_dir_write_lock(&lockfd);
+   if (lockfd == -1)
goto out_free;
 
+   if (mkdir_p(dir_name, 0755))
+   goto out_unlock;
+
if (asprintf(&filename, "%s/%s", dir_name, sbuild_id) < 0) {
filename = NULL;
-   goto out_free;
+   goto out_unlock;
}
 
if (access(filename, F_OK)) {
if (is_kallsyms) {
 if (copyfile("/proc/kallsyms", filename))
-   goto out_free;
+   goto out_unlock;
} else if (link(realname, filename) && errno != EEXIST &&
copyfile(name, filename))
-   goto out_free;
+   goto out_unlock;
}
 
if (!build_id__filename(sbuild_id, linkname, size))
-   goto out_free;
+   goto out_unlock;
tmp = strrchr(linkname, '/');
*tmp = '\0';
 
if (

[PATCH 1/2] perf/tools: add read/write buildid dir locks

2015-05-14 Thread Milos Vyletel

Protect access to buildid dir by flock to prevent race conditions. This
patch adds buildid_dir_read/write_lock/unlock functions.

Signed-off-by: Milos Vyletel 
---
 tools/perf/util/build-id.c | 56 ++
 tools/perf/util/build-id.h |  5 +
 2 files changed, 61 insertions(+)

diff --git a/tools/perf/util/build-id.c b/tools/perf/util/build-id.c
index 61867df..46c41e1 100644
--- a/tools/perf/util/build-id.c
+++ b/tools/perf/util/build-id.c
@@ -536,3 +536,59 @@ bool perf_session__read_build_ids(struct perf_session 
*session, bool with_hits)
 
return ret;
 }
+
+static int buildid_dir_lock(int op)
+{
+   int lockfd, wait = 30;
+   const size_t size = PATH_MAX;
+   char *lockfile = zalloc(size);
+
+   scnprintf(lockfile, size, "%s/.lock", buildid_dir);
+
+   lockfd = open(lockfile, O_RDONLY|O_CREAT, S_IRUSR);
+   if (lockfd  == -1)
+   goto out;
+
+   while (1) {
+   if (flock(lockfd, op | LOCK_NB) == 0)
+   goto out;
+   else if (wait == 0) {
+   close(lockfd);
+   lockfd = -1;
+   goto out;
+   }
+   if (--wait % 10 == 0)
+   pr_info("Waiting for %s lock to be available\n", 
buildid_dir);
+   usleep(100);
+   }
+
+out:
+   free(lockfile);
+   return lockfd;
+}
+
+static void buildid_dir_unlock(int fd)
+{
+   flock(fd, LOCK_UN);
+   close(fd);
+}
+
+void buildid_dir_write_lock(int *fd)
+{
+   *fd = buildid_dir_lock(LOCK_EX);
+}
+
+void buildid_dir_read_lock(int *fd)
+{
+   *fd = buildid_dir_lock(LOCK_SH);
+}
+
+void buildid_dir_write_unlock(int fd)
+{
+   buildid_dir_unlock(fd);
+}
+
+void buildid_dir_read_unlock(int fd)
+{
+   buildid_dir_unlock(fd);
+}
diff --git a/tools/perf/util/build-id.h b/tools/perf/util/build-id.h
index 8501122..326cd2e 100644
--- a/tools/perf/util/build-id.h
+++ b/tools/perf/util/build-id.h
@@ -31,4 +31,9 @@ int build_id_cache__add_s(const char *sbuild_id,
 int build_id_cache__remove_s(const char *sbuild_id);
 void disable_buildid_cache(void);
 
+void buildid_dir_write_lock(int *);
+void buildid_dir_read_lock(int *);
+void buildid_dir_write_unlock(int);
+void buildid_dir_read_unlock(int);
+
 #endif
-- 
2.4.0

--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/

[PATCH 0/2] Add perf debug dir locking

2015-05-14 Thread Milos Vyletel

Hi,

this series is follow up on my last submission where Jirka mentioned it would
be nice to have some locking around .debug dir operations.

https://lkml.org/lkml/2015/3/20/126

Following two patches add such functionality by utilizing flock. Since flock
supports both shared(read) and exclusive(write) locks both variants are being
added so that we can use multiple readers while only one writer.

Having said that any comments/suggestions on the code are welcome.

I did some basic testing. Mostly manual tool testing as well as the recreator
from above mentioned URL. I do not expect any functional change however more
testing is welcome.

Milos

Milos Vyletel (2):
  perf/tools: add read/write buildid dir locks
  perf/tools: put new buildid locks to use

 tools/perf/builtin-buildid-cache.c | 12 +
 tools/perf/util/build-id.c | 97 ++
 tools/perf/util/build-id.h |  5 ++
 3 files changed, 106 insertions(+), 8 deletions(-)

-- 
2.4.0

--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/

[PATCH v2] rcu: small rcu_dereference doc update

2015-04-17 Thread Milos Vyletel

Make a note stating that repeated calls of rcu_dereference() may not
return the same pointer if update happens while in critical section.

Reported-by: Jeff Haran 
Signed-off-by: Milos Vyletel 
---
 Documentation/RCU/whatisRCU.txt | 4 +++-
 1 file changed, 3 insertions(+), 1 deletion(-)

diff --git a/Documentation/RCU/whatisRCU.txt b/Documentation/RCU/whatisRCU.txt
index 88dfce1..16622c9 100644
--- a/Documentation/RCU/whatisRCU.txt
+++ b/Documentation/RCU/whatisRCU.txt
@@ -256,7 +256,9 @@ rcu_dereference()
If you are going to be fetching multiple fields from the
RCU-protected structure, using the local variable is of
course preferred.  Repeated rcu_dereference() calls look
-   ugly and incur unnecessary overhead on Alpha CPUs.
+   ugly, do not guarantee that the same pointer will be returned
+   if an update happened while in the critical section, and incur
+   unnecessary overhead on Alpha CPUs.
 
Note that the value returned by rcu_dereference() is valid
only within the enclosing RCU read-side critical section.
-- 
2.1.0

--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/

Re: [PATCH] rcu: small rcu_dereference doc update

2015-04-17 Thread Milos Vyletel

On Fri, Apr 17, 2015 at 10:13:50AM -0400, Steven Rostedt wrote:

> On Fri, 17 Apr 2015 07:06:38 -0700
> "Paul E. McKenney"  wrote:
> 
> > On Fri, Apr 17, 2015 at 12:33:36PM +0200, Milos Vyletel wrote:
> > > Make a note stating that repeated calls of rcu_dereference() may not
> > > return the same pointer if update happens while in critical section.
> > > 
> > > Reported-by: Jeff Haran 
> > > Signed-off-by: Milos Vyletel 
> > 
> > Hmmm...  Seems like that should be obvious, but on the other hand,
> > I have been using RCU for more than twenty years, so my obviousness
> > sensors might need recalibration.
> > 
> > Queued for 4.2.
> 
> Before you queue it, there's a few articles that are screaming to be
> present...
> 
> > 
> > Thanx, Paul
> > 
> > > ---
> > >  Documentation/RCU/whatisRCU.txt | 4 +++-
> > >  1 file changed, 3 insertions(+), 1 deletion(-)
> > > 
> > > diff --git a/Documentation/RCU/whatisRCU.txt 
> > > b/Documentation/RCU/whatisRCU.txt
> > > index 88dfce1..82b1b2c 100644
> > > --- a/Documentation/RCU/whatisRCU.txt
> > > +++ b/Documentation/RCU/whatisRCU.txt
> > > @@ -256,7 +256,9 @@ rcu_dereference()
> > >   If you are going to be fetching multiple fields from the
> > >   RCU-protected structure, using the local variable is of
> > >   course preferred.  Repeated rcu_dereference() calls look
> > > - ugly and incur unnecessary overhead on Alpha CPUs.
> > > + ugly, do not guarantee that same pointer will be returned
> > > + if update happened while in critical section and incur
> > > + unnecessary overhead on Alpha CPUs.
> 
> ugly, do not guarantee that the same pointer will be returned
> if an update happened while in the critical section, and incur
> unnecessary overhead on Alpha CPUs.
> 
Thanks. Corrected. Sending v2 in a bit.

Milos

> -- Steve
> 
> > > 
> > >   Note that the value returned by rcu_dereference() is valid
> > >   only within the enclosing RCU read-side critical section.
> > > -- 
> > > 2.1.0
> > > 
> 
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/

[PATCH] rcu: small rcu_dereference doc update

2015-04-17 Thread Milos Vyletel

Make a note stating that repeated calls of rcu_dereference() may not
return the same pointer if update happens while in critical section.

Reported-by: Jeff Haran 
Signed-off-by: Milos Vyletel 
---
 Documentation/RCU/whatisRCU.txt | 4 +++-
 1 file changed, 3 insertions(+), 1 deletion(-)

diff --git a/Documentation/RCU/whatisRCU.txt b/Documentation/RCU/whatisRCU.txt
index 88dfce1..82b1b2c 100644
--- a/Documentation/RCU/whatisRCU.txt
+++ b/Documentation/RCU/whatisRCU.txt
@@ -256,7 +256,9 @@ rcu_dereference()
If you are going to be fetching multiple fields from the
RCU-protected structure, using the local variable is of
course preferred.  Repeated rcu_dereference() calls look
-   ugly and incur unnecessary overhead on Alpha CPUs.
+   ugly, do not guarantee that same pointer will be returned
+   if update happened while in critical section and incur
+   unnecessary overhead on Alpha CPUs.
 
Note that the value returned by rcu_dereference() is valid
only within the enclosing RCU read-side critical section.
-- 
2.1.0

--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/

[tip:perf/core] perf tools: Fix race in build_id_cache__add_s()

2015-03-22 Thread tip-bot for Milos Vyletel

Commit-ID:  0635b0f71424be7706793ac260d063491a2889a0
Gitweb: http://git.kernel.org/tip/0635b0f71424be7706793ac260d063491a2889a0
Author: Milos Vyletel 
AuthorDate: Fri, 20 Mar 2015 11:37:25 +0100
Committer:  Arnaldo Carvalho de Melo 
CommitDate: Fri, 20 Mar 2015 17:49:50 -0300

perf tools: Fix race in build_id_cache__add_s()

int build_id_cache__add_s(const char *sbuild_id, const char *debugdir,
  const char *name, bool is_kallsyms, bool is_vdso)
{
...
if (access(filename, F_OK)) {
   ^- [1]
if (is_kallsyms) {
 if (copyfile("/proc/kallsyms", filename))
goto out_free;
} else if (link(realname, filename) && copyfile(name, filename))
 ^-^- [2]
\ [3]
goto out_free;
}
...

When multiple instances of perf record get to [1] at more or less same time and
run access() one or more may get failure because the file does not exist yet
(since the first instance did not have chance to link it yet).

At this point the race moves to link() at [2] where first thread to get
there links file and goes on but second one gets -EEXIST so it runs
copyfile [3] which truncates the file.

reproducer:

rm -rf /root/.debug
for cpu in $(awk '/processor/ {print $3}' /proc/cpuinfo); do
perf record -a -v -T -F 1000 -C $cpu \
-o perf-${cpu}.data sleep 5 2> /dev/null &
done
wait

and simply search for empty files by:

find /lib/modules/`uname -r`/kernel/* -size 0

Signed-off-by: Milos Vyletel 
Acked-by: Jiri Olsa 
Cc: Namhyung Kim 
Cc: Paul Mackerras 
Cc: Peter Zijlstra 
Cc: Stephane Eranian 
Link: 
http://lkml.kernel.org/r/1426847846-2-1-git-send-email-mi...@redhat.com
Signed-off-by: Arnaldo Carvalho de Melo 
---
 tools/perf/util/build-id.c | 3 ++-
 1 file changed, 2 insertions(+), 1 deletion(-)

diff --git a/tools/perf/util/build-id.c b/tools/perf/util/build-id.c
index a196746..f7fb258 100644
--- a/tools/perf/util/build-id.c
+++ b/tools/perf/util/build-id.c
@@ -374,7 +374,8 @@ int build_id_cache__add_s(const char *sbuild_id, const char 
*name,
if (is_kallsyms) {
 if (copyfile("/proc/kallsyms", filename))
goto out_free;
-   } else if (link(realname, filename) && copyfile(name, filename))
+   } else if (link(realname, filename) && errno != EEXIST &&
+   copyfile(name, filename))
goto out_free;
}
 
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/

[PATCH] perf: fix race in build_id_cache__add_s()

2015-03-20 Thread Milos Vyletel

int build_id_cache__add_s(const char *sbuild_id, const char *debugdir,
  const char *name, bool is_kallsyms, bool is_vdso)
{
...
if (access(filename, F_OK)) {
   ^- [1]
if (is_kallsyms) {
 if (copyfile("/proc/kallsyms", filename))
goto out_free;
} else if (link(realname, filename) && copyfile(name, filename))
 ^-^- [2]
\ [3]
goto out_free;
}
...

when multiple instances of perf record get to [1] at more or less same time and
run access() one or more may get failure because the file does not exist yet
(since the first instance did not have chance to link it yet). at this point the
race moves to link() at [2] where first thread to get there links file and goes
on but second one gets -EEXIST so it runs copyfile [3] which truncates the file.

recreator:
rm -rf /root/.debug
for cpu in $(awk '/processor/ {print $3}' /proc/cpuinfo); do
perf record -a -v -T -F 1000 -C $cpu \
-o perf-${cpu}.data sleep 5 2> /dev/null &
done
wait

and simply search for empty files by
find /lib/modules/`uname -r`/kernel/* -size 0

Signed-off-by: Milos Vyletel 
---
 tools/perf/util/build-id.c | 3 ++-
 1 file changed, 2 insertions(+), 1 deletion(-)

diff --git a/tools/perf/util/build-id.c b/tools/perf/util/build-id.c
index 0c72680..a9db5a1 100644
--- a/tools/perf/util/build-id.c
+++ b/tools/perf/util/build-id.c
@@ -293,7 +293,8 @@ int build_id_cache__add_s(const char *sbuild_id, const char 
*debugdir,
if (is_kallsyms) {
 if (copyfile("/proc/kallsyms", filename))
goto out_free;
-   } else if (link(realname, filename) && copyfile(name, filename))
+   } else if (link(realname, filename) && errno != EEXIST &&
+   copyfile(name, filename))
goto out_free;
}
 
-- 
2.1.0

--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/

Re: [patch net-next 3/5] ipv6: consider net.ipv6.conf.all sysctls when making decisions

2014-06-12 Thread Milos Vyletel

David,

my words may have been poorly chosen. Last thing I want is to break things...

What I meant to say is that this changes the behavior of conf.all.*
sysctls from no-op to be part of decision along with interface
specific ones. Default settings still work the same way unless
conf.all.* was manually modified by user (for example setting
conf.all.accept_ra_defrtr=0 did not stop kernel from setting default
router before but it would with this change).

If this is not desired functionality I'm fine with it not being
applied. However if that's the case I suggest removing conf.all.*
altogether since it's no-op and pretty confusing for users (myself
included).

Milos

On Wed, Jun 11, 2014 at 6:32 PM, David Miller  wrote:
> From: Milos Vyletel 
> Date: Tue, 10 Jun 2014 13:49:35 -0400
>
>> On Tue, Jun 10, 2014 at 1:13 PM, Stephen Hemminger
>>  wrote:
>>> On Tue, 10 Jun 2014 12:19:11 -0400
>>> Milos Vyletel  wrote:
>>>
>>>> As it is right now net.ipv6.conf.all.* are mostly ignored and instead
>>>> we're only making decisions based on interface specific settings. These
>>>> settings are coppied from net.ipv6.conf.default and changing either all
>>>> or default settings have no effect.
>>>
>>> Although this is the right idea conceptually, it risks creating unhappy
>>> users because it changes the semantics of the API. This will probably break
>>> somebody's configuration.
>>
>> You're right but in this case I feel we are moving in the right
>> direction. While the
>> behavior will change in some cases this change is only adding well known ipv4
>> behavior for ipv6 sysctls. In fact documentation briefly mentioned that
>> net.ipv6.conf.all.* should change all the interface-specific settings
>> but that was
>> not the case until now.
>
> You can't just break things on people, no matter how much you think the
> current behavior is "poorly designed", "inconsistent with other areas of
> the networking", etc.  None of those things matter if you have to break
> things on people to "fix" it.
>
> I'm not applying this series, sorry.
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/

Re: [patch net-next 3/5] ipv6: consider net.ipv6.conf.all sysctls when making decisions

2014-06-10 Thread Milos Vyletel

On Tue, Jun 10, 2014 at 1:13 PM, Stephen Hemminger
 wrote:
> On Tue, 10 Jun 2014 12:19:11 -0400
> Milos Vyletel  wrote:
>
>> As it is right now net.ipv6.conf.all.* are mostly ignored and instead
>> we're only making decisions based on interface specific settings. These
>> settings are coppied from net.ipv6.conf.default and changing either all
>> or default settings have no effect.
>
> Although this is the right idea conceptually, it risks creating unhappy
> users because it changes the semantics of the API. This will probably break
> somebody's configuration.

You're right but in this case I feel we are moving in the right
direction. While the
behavior will change in some cases this change is only adding well known ipv4
behavior for ipv6 sysctls. In fact documentation briefly mentioned that
net.ipv6.conf.all.* should change all the interface-specific settings
but that was
not the case until now.

Milos
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/

Re: [patch net-next 3/5] ipv6: consider net.ipv6.conf.all sysctls when making decisions

2014-06-10 Thread Milos Vyletel

> Hello,
>
>>
>> IN6_DEV_MAXCONF
>>   IN6_DEV_HOPLIMIT
>>   IN6_DEV_MTU
>
> I'm a little bit surprise to set the MTU as the maximum of all/dev
> value. Since the default for "all" is IPV6_MIN_MTU, it is probably
> harmless, but it could be a little bit surprising for administrators.
>
>>  IN6_DEV_USE_TEMPADDR
>
> For this one, the default is zero, except for point-to-point and
> loopback, with -1. I didn't see any difference between 0 and -1 in the
> current kernel code, but if I missed something, it could be a little
> problem.

Hi,

these are both valid points. It does not make too much sense to get
max value for MTU and I'd say we should either get min or simply get
interface setting and don't change the current behavior for this particular
one.

use_tempaddr will have special treatment too since it can range from:
use_tempaddr - INTEGER
Preference for Privacy Extensions (RFC3041).
  <= 0 : disable Privacy Extensions
  == 1 : enable Privacy Extensions, but prefer public
 addresses over temporary addresses.
  >  1 : enable Privacy Extensions and prefer temporary
 addresses over public addresses.
Default:  0 (for most devices)
 -1 (for point-to-point devices and loopback devices)

Milos
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/

[patch net-next 4/5] Documentation: update ipv6 part of networking/ip-sysctl.txt

2014-06-10 Thread Milos Vyletel

Documentation was updated with details how net.ipv6.conf.all settings are
used and how they affect current behaviour and interface-specific settings.

Signed-off-by: Milos Vyletel 
---
 Documentation/networking/ip-sysctl.txt | 48 ++
 1 file changed, 43 insertions(+), 5 deletions(-)

diff --git a/Documentation/networking/ip-sysctl.txt 
b/Documentation/networking/ip-sysctl.txt
index ab42c95..f0406ff 100644
--- a/Documentation/networking/ip-sysctl.txt
+++ b/Documentation/networking/ip-sysctl.txt
@@ -1159,13 +1159,51 @@ ip6frag_secret_interval - INTEGER
Default: 600
 
 conf/default/*:
-   Change the interface-specific default settings.
+   Change the interface-specific default settings. These settings are
+   applied to all interfaces that have not been up and that did not have
+   this settings set previously. This is usually only during boot before
+   network is set up when sysctl settings are applied.
 
 
 conf/all/*:
-   Change all the interface-specific settings.
-
-   [XXX:  Other special features than forwarding?]
+   Change all the interface-specific settings. Different settings have
+   different effect. There are three different groups based on how they
+   treat conf/all vs interface-specific settings. These groups are
+
+   Both conf/all and interface must enable this feature (andconf)
+   - accept_ra_defrtr
+   - accept_ra_pinfo
+   - accept_ra_rtr_pref
+   - accept_redirects
+   - autoconf
+   - force_tllao
+   - mforward
+   - ndisc_notify
+   - suppress_frag_ndisc
+
+   Either conf/all or interface must enable this feature (orconf)
+   - disable_ipv6
+   - proxy_ndp
+
+   Maximum value of conf/all or interface will be used (maxconf)
+   - accept_dad
+   - accept_ra
+   - accept_ra_rt_info_max_plen
+   - accept_source_route
+   - dad_transmits
+   - hop_limit
+   - max_addresses
+   - max_desync_factor
+   - mtu
+   - optimistic_dad
+   - regen_max_retry
+   - router_probe_interval
+   - router_solicitation_delay
+   - router_solicitation_interval
+   - router_solicitations
+   - temp_prefered_lft
+   - temp_valid_lft
+   - use_tempaddr
 
 conf/all/forwarding - BOOLEAN
Enable global IPv6 forwarding between all interfaces.
@@ -1414,7 +1452,7 @@ force_mld_version - INTEGER
1 - Enforce to use MLD version 1
2 - Enforce to use MLD version 2
 
-suppress_frag_ndisc - INTEGER
+suppress_frag_ndisc - BOOLEAN
Control RFC 6980 (Security Implications of IPv6 Fragmentation
with IPv6 Neighbor Discovery) behavior:
1 - (default) discard fragmented neighbor discovery packets
-- 
1.9.0

--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/

[patch net-next 1/5] ipv6: align sysctl table creation code with ipv4

2014-06-10 Thread Milos Vyletel

Use ADDRCONF_SYSCTL* macros to define sysctl table. This is first step
to align ipv6 and ipv4 sysctl code.

Signed-off-by: Milos Vyletel 
---
 net/ipv6/addrconf.c | 300 
 1 file changed, 66 insertions(+), 234 deletions(-)

diff --git a/net/ipv6/addrconf.c b/net/ipv6/addrconf.c
index 5667b30..dbacca4 100644
--- a/net/ipv6/addrconf.c
+++ b/net/ipv6/addrconf.c
@@ -4917,6 +4917,27 @@ int addrconf_sysctl_proxy_ndp(struct ctl_table *ctl, int 
write,
return ret;
 }
 
+#define ADDRCONF_SYSCTL_ENTRY(name, var, mval, proc) \
+   { \
+   .procname   = #name, \
+   .data   = (void *)&ipv6_devconf + \
+   offsetof(struct ipv6_devconf, var), \
+   .maxlen = sizeof(int), \
+   .mode   = mval, \
+   .proc_handler   = proc, \
+   }
+
+#define ADDRCONF_SYSCTL_RW_ENTRY(name) \
+   ADDRCONF_SYSCTL_ENTRY(name, name, 0644, proc_dointvec)
+
+#define ADDRCONF_SYSCTL_RO_ENTRY(name) \
+   ADDRCONF_SYSCTL_ENTRY(name, name, 0444, proc_dointvec)
+
+#define ADDRCONF_SYSCTL_COMPLEX_ENTRY(name, proc) \
+   ADDRCONF_SYSCTL_ENTRY(name, name, 0644, proc)
+
+#define ADDRCONF_SYSCTL_CUSTOM_ENTRY(name, var, proc) \
+   ADDRCONF_SYSCTL_ENTRY(name, var, 0644, proc)
 
 static struct addrconf_sysctl_table
 {
@@ -4925,248 +4946,59 @@ static struct addrconf_sysctl_table
 } addrconf_sysctl __read_mostly = {
.sysctl_header = NULL,
.addrconf_vars = {
-   {
-   .procname   = "forwarding",
-   .data   = &ipv6_devconf.forwarding,
-   .maxlen = sizeof(int),
-   .mode   = 0644,
-   .proc_handler   = addrconf_sysctl_forward,
-   },
-   {
-   .procname   = "hop_limit",
-   .data   = &ipv6_devconf.hop_limit,
-   .maxlen = sizeof(int),
-   .mode   = 0644,
-   .proc_handler   = proc_dointvec,
-   },
-   {
-   .procname   = "mtu",
-   .data   = &ipv6_devconf.mtu6,
-   .maxlen = sizeof(int),
-   .mode   = 0644,
-   .proc_handler   = proc_dointvec,
-   },
-   {
-   .procname   = "accept_ra",
-   .data   = &ipv6_devconf.accept_ra,
-   .maxlen = sizeof(int),
-   .mode   = 0644,
-   .proc_handler   = proc_dointvec,
-   },
-   {
-   .procname   = "accept_redirects",
-   .data   = &ipv6_devconf.accept_redirects,
-   .maxlen = sizeof(int),
-   .mode   = 0644,
-   .proc_handler   = proc_dointvec,
-   },
-   {
-   .procname   = "autoconf",
-   .data   = &ipv6_devconf.autoconf,
-   .maxlen = sizeof(int),
-   .mode   = 0644,
-   .proc_handler   = proc_dointvec,
-   },
-   {
-   .procname   = "dad_transmits",
-   .data   = &ipv6_devconf.dad_transmits,
-   .maxlen = sizeof(int),
-   .mode   = 0644,
-   .proc_handler   = proc_dointvec,
-   },
-   {
-   .procname   = "router_solicitations",
-   .data   = &ipv6_devconf.rtr_solicits,
-   .maxlen = sizeof(int),
-   .mode   = 0644,
-   .proc_handler   = proc_dointvec,
-   },
-   {
-   .procname   = "router_solicitation_interval",
-   .data   = &ipv6_devconf.rtr_solicit_interval,
-   .maxlen = sizeof(int),
-   .mode   = 0644,
-   .proc_handler   = proc_dointvec_jiffies,
-   },
-   {
-   .procname   = "router_solicitation_delay",
-   .data   = &ipv6_devconf.rtr_solicit_delay,
-   .maxlen = sizeof(int),
-   .mode   = 0644,
-   .proc_handler   = proc_dointvec_jiffies,
-   },
-

[patch net-next 5/5] ipv6: copy default config values to interfaces

2014-06-10 Thread Milos Vyletel

Propagate changes to default sysctl values to all interfaces if they
were not previously configured and interface was not up before. This is
usually only true during boot when we apply /etc/sysctl.conf values
before network is brought up.

Signed-off-by: Milos Vyletel 
---
 include/linux/ipv6.h |  2 ++
 include/net/ipv6.h   |  6 ++
 net/ipv6/addrconf.c  | 39 +--
 3 files changed, 45 insertions(+), 2 deletions(-)

diff --git a/include/linux/ipv6.h b/include/linux/ipv6.h
index fe8d38d..e356905 100644
--- a/include/linux/ipv6.h
+++ b/include/linux/ipv6.h
@@ -1,6 +1,7 @@
 #ifndef _IPV6_H
 #define _IPV6_H
 
+#include 
 #include 
 
 #define ipv6_optlen(p)  (((p)->hdrlen+1) << 3)
@@ -11,6 +12,7 @@
 struct ipv6_devconf {
void*sysctl;
__s32   data[IPV6_DEVCONF_MAX];
+   DECLARE_BITMAP(state, IPV6_DEVCONF_MAX);
 };
 
 struct ipv6_params {
diff --git a/include/net/ipv6.h b/include/net/ipv6.h
index 1ee39a5..eb4e911 100644
--- a/include/net/ipv6.h
+++ b/include/net/ipv6.h
@@ -302,9 +302,15 @@ static inline int ipv6_devconf_get(struct inet6_dev *idev, 
int index)
 static inline void ipv6_devconf_set(struct inet6_dev *idev, int index,
int val)
 {
+   set_bit(index, idev->cnf.state);
idev->cnf.data[index] = val;
 }
 
+static inline void ipv6_devconf_setall(struct inet6_dev *idev)
+{
+   bitmap_fill(idev->cnf.state, IPV6_DEVCONF_MAX);
+}
+
 #define IN6_DEV_CONF_GET(idev, attr) \
ipv6_devconf_get((idev), IPV6_DEVCONF_ ## attr)
 #define IN6_DEV_CONF_SET(idev, attr, val) \
diff --git a/net/ipv6/addrconf.c b/net/ipv6/addrconf.c
index 5b1b578..5218978 100644
--- a/net/ipv6/addrconf.c
+++ b/net/ipv6/addrconf.c
@@ -40,6 +40,7 @@
 
 #define pr_fmt(fmt) "IPv6: " fmt
 
+#include 
 #include 
 #include 
 #include 
@@ -852,6 +853,7 @@ ipv6_add_addr(struct inet6_dev *idev, const struct in6_addr 
*addr,
goto out;
}
 
+   ipv6_devconf_setall(idev);
neigh_parms_data_state_setall(idev->nd_parms);
 
ifa->addr = *addr;
@@ -4897,6 +4899,39 @@ int addrconf_sysctl_proxy_ndp(struct ctl_table *ctl, int 
write,
return ret;
 }
 
+static void addrconf_copy_dflt_conf(struct net *net, int i)
+{
+   struct net_device *dev;
+
+   rcu_read_lock();
+   for_each_netdev_rcu(net, dev) {
+   struct inet6_dev *idev = __in6_dev_get(dev);
+
+   if (idev && !test_bit(i, idev->cnf.state))
+   idev->cnf.data[i] = net->ipv6.devconf_dflt->data[i];
+   }
+   rcu_read_unlock();
+}
+
+static int addrconf_sysctl_proc(struct ctl_table *ctl, int write,
+   void __user *buffer, size_t *lenp, loff_t *ppos)
+{
+   int ret = proc_dointvec(ctl, write, buffer, lenp, ppos);
+
+   if (write) {
+   struct ipv6_devconf *cnf = ctl->extra1;
+   struct net *net = ctl->extra2;
+   int i = (int *)ctl->data - cnf->data;
+
+   set_bit(i, cnf->state);
+
+   if (cnf == net->ipv6.devconf_dflt)
+   addrconf_copy_dflt_conf(net, i);
+   }
+
+   return ret;
+}
+
 #define ADDRCONF_SYSCTL_ENTRY(attr, name, mval, proc) \
{ \
.procname   = #name, \
@@ -4909,10 +4944,10 @@ int addrconf_sysctl_proxy_ndp(struct ctl_table *ctl, 
int write,
}
 
 #define ADDRCONF_SYSCTL_RW_ENTRY(attr, name) \
-   ADDRCONF_SYSCTL_ENTRY(attr, name, 0644, proc_dointvec)
+   ADDRCONF_SYSCTL_ENTRY(attr, name, 0644, addrconf_sysctl_proc)
 
 #define ADDRCONF_SYSCTL_RO_ENTRY(attr, name) \
-   ADDRCONF_SYSCTL_ENTRY(attr, name, 0444, proc_dointvec)
+   ADDRCONF_SYSCTL_ENTRY(attr, name, 0444, addrconf_sysctl_proc)
 
 #define ADDRCONF_SYSCTL_COMPLEX_ENTRY(attr, name, proc) \
ADDRCONF_SYSCTL_ENTRY(attr, name, 0644, proc)
-- 
1.9.0

--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/

[patch net-next 2/5] ipv6: rename DEVCONF* to IPV6_DEVCONF* to align with ipv4

2014-06-10 Thread Milos Vyletel

Use IPV6_DEVCONF* name similar to the IPV4_DEVCONF* used in ipv4.
Another step in aligning code.

Signed-off-by: Milos Vyletel 
---
 include/uapi/linux/ipv6.h | 68 +--
 net/ipv6/addrconf.c   | 74 +++
 2 files changed, 71 insertions(+), 71 deletions(-)

diff --git a/include/uapi/linux/ipv6.h b/include/uapi/linux/ipv6.h
index 593b0e3..715559a 100644
--- a/include/uapi/linux/ipv6.h
+++ b/include/uapi/linux/ipv6.h
@@ -130,40 +130,40 @@ struct ipv6hdr {
 
 /* index values for the variables in ipv6_devconf */
 enum {
-   DEVCONF_FORWARDING = 0,
-   DEVCONF_HOPLIMIT,
-   DEVCONF_MTU6,
-   DEVCONF_ACCEPT_RA,
-   DEVCONF_ACCEPT_REDIRECTS,
-   DEVCONF_AUTOCONF,
-   DEVCONF_DAD_TRANSMITS,
-   DEVCONF_RTR_SOLICITS,
-   DEVCONF_RTR_SOLICIT_INTERVAL,
-   DEVCONF_RTR_SOLICIT_DELAY,
-   DEVCONF_USE_TEMPADDR,
-   DEVCONF_TEMP_VALID_LFT,
-   DEVCONF_TEMP_PREFERED_LFT,
-   DEVCONF_REGEN_MAX_RETRY,
-   DEVCONF_MAX_DESYNC_FACTOR,
-   DEVCONF_MAX_ADDRESSES,
-   DEVCONF_FORCE_MLD_VERSION,
-   DEVCONF_ACCEPT_RA_DEFRTR,
-   DEVCONF_ACCEPT_RA_PINFO,
-   DEVCONF_ACCEPT_RA_RTR_PREF,
-   DEVCONF_RTR_PROBE_INTERVAL,
-   DEVCONF_ACCEPT_RA_RT_INFO_MAX_PLEN,
-   DEVCONF_PROXY_NDP,
-   DEVCONF_OPTIMISTIC_DAD,
-   DEVCONF_ACCEPT_SOURCE_ROUTE,
-   DEVCONF_MC_FORWARDING,
-   DEVCONF_DISABLE_IPV6,
-   DEVCONF_ACCEPT_DAD,
-   DEVCONF_FORCE_TLLAO,
-   DEVCONF_NDISC_NOTIFY,
-   DEVCONF_MLDV1_UNSOLICITED_REPORT_INTERVAL,
-   DEVCONF_MLDV2_UNSOLICITED_REPORT_INTERVAL,
-   DEVCONF_SUPPRESS_FRAG_NDISC,
-   DEVCONF_MAX
+   IPV6_DEVCONF_FORWARDING = 0,
+   IPV6_DEVCONF_HOPLIMIT,
+   IPV6_DEVCONF_MTU6,
+   IPV6_DEVCONF_ACCEPT_RA,
+   IPV6_DEVCONF_ACCEPT_REDIRECTS,
+   IPV6_DEVCONF_AUTOCONF,
+   IPV6_DEVCONF_DAD_TRANSMITS,
+   IPV6_DEVCONF_RTR_SOLICITS,
+   IPV6_DEVCONF_RTR_SOLICIT_INTERVAL,
+   IPV6_DEVCONF_RTR_SOLICIT_DELAY,
+   IPV6_DEVCONF_USE_TEMPADDR,
+   IPV6_DEVCONF_TEMP_VALID_LFT,
+   IPV6_DEVCONF_TEMP_PREFERED_LFT,
+   IPV6_DEVCONF_REGEN_MAX_RETRY,
+   IPV6_DEVCONF_MAX_DESYNC_FACTOR,
+   IPV6_DEVCONF_MAX_ADDRESSES,
+   IPV6_DEVCONF_FORCE_MLD_VERSION,
+   IPV6_DEVCONF_ACCEPT_RA_DEFRTR,
+   IPV6_DEVCONF_ACCEPT_RA_PINFO,
+   IPV6_DEVCONF_ACCEPT_RA_RTR_PREF,
+   IPV6_DEVCONF_RTR_PROBE_INTERVAL,
+   IPV6_DEVCONF_ACCEPT_RA_RT_INFO_MAX_PLEN,
+   IPV6_DEVCONF_PROXY_NDP,
+   IPV6_DEVCONF_OPTIMISTIC_DAD,
+   IPV6_DEVCONF_ACCEPT_SOURCE_ROUTE,
+   IPV6_DEVCONF_MC_FORWARDING,
+   IPV6_DEVCONF_DISABLE_IPV6,
+   IPV6_DEVCONF_ACCEPT_DAD,
+   IPV6_DEVCONF_FORCE_TLLAO,
+   IPV6_DEVCONF_NDISC_NOTIFY,
+   IPV6_DEVCONF_MLDV1_UNSOLICITED_REPORT_INTERVAL,
+   IPV6_DEVCONF_MLDV2_UNSOLICITED_REPORT_INTERVAL,
+   IPV6_DEVCONF_SUPPRESS_FRAG_NDISC,
+   IPV6_DEVCONF_MAX
 };
 
 
diff --git a/net/ipv6/addrconf.c b/net/ipv6/addrconf.c
index dbacca4..24fbb38 100644
--- a/net/ipv6/addrconf.c
+++ b/net/ipv6/addrconf.c
@@ -4272,62 +4272,62 @@ errout:
 static inline void ipv6_store_devconf(struct ipv6_devconf *cnf,
__s32 *array, int bytes)
 {
-   BUG_ON(bytes < (DEVCONF_MAX * 4));
+   BUG_ON(bytes < (IPV6_DEVCONF_MAX * 4));
 
memset(array, 0, bytes);
-   array[DEVCONF_FORWARDING] = cnf->forwarding;
-   array[DEVCONF_HOPLIMIT] = cnf->hop_limit;
-   array[DEVCONF_MTU6] = cnf->mtu6;
-   array[DEVCONF_ACCEPT_RA] = cnf->accept_ra;
-   array[DEVCONF_ACCEPT_REDIRECTS] = cnf->accept_redirects;
-   array[DEVCONF_AUTOCONF] = cnf->autoconf;
-   array[DEVCONF_DAD_TRANSMITS] = cnf->dad_transmits;
-   array[DEVCONF_RTR_SOLICITS] = cnf->rtr_solicits;
-   array[DEVCONF_RTR_SOLICIT_INTERVAL] =
+   array[IPV6_DEVCONF_FORWARDING] = cnf->forwarding;
+   array[IPV6_DEVCONF_HOPLIMIT] = cnf->hop_limit;
+   array[IPV6_DEVCONF_MTU6] = cnf->mtu6;
+   array[IPV6_DEVCONF_ACCEPT_RA] = cnf->accept_ra;
+   array[IPV6_DEVCONF_ACCEPT_REDIRECTS] = cnf->accept_redirects;
+   array[IPV6_DEVCONF_AUTOCONF] = cnf->autoconf;
+   array[IPV6_DEVCONF_DAD_TRANSMITS] = cnf->dad_transmits;
+   array[IPV6_DEVCONF_RTR_SOLICITS] = cnf->rtr_solicits;
+   array[IPV6_DEVCONF_RTR_SOLICIT_INTERVAL] =
jiffies_to_msecs(cnf->rtr_solicit_interval);
-   array[DEVCONF_RTR_SOLICIT_DELAY] =
+   array[IPV6_DEVCONF_RTR_SOLICIT_DELAY] =
jiffies_to_msecs(cnf->rtr_solicit_delay);
-   array[DEVCONF_FORCE_MLD_VERSION] = cnf->force_mld_version;
-   array[DEVCONF_MLDV1_UNSOLICITED_REPORT_INTERVAL] =
+   array[IPV6_DEVCONF_FORCE_MLD_VERSION] = cnf->force_mld_version;
+   array[IPV6_DEVCONF_MLDV1_UNSOLICITED_REPORT_INTERVAL] =
jiffi

Re: [patch net-next] net: ipv6: trigger action when setting conf.all sysctls

2014-05-28 Thread Milos Vyletel

On Wed, May 21, 2014 at 4:31 PM, David Miller  wrote:
> From: Cong Wang 
> Date: Wed, 21 May 2014 10:35:17 -0700
>
>> On Mon, May 19, 2014 at 10:09 AM, Milos Vyletel  
>> wrote:
>>> As it is right now most of these tunables don't do anything - they don't
>>> set any interface specific settings to the desired value and they don't
>>> take precedence before device specific settings in most cases.
>>>
>>> Documentation/networking/ip-sysctl.txt defines them to be:
>>> conf/all/*:
>>> Change all the interface-specific settings.
>>>
>>> but that is not really the case.
>>>
>>
>> I don't think it's a good idea, it enforces interface specific changes.
>>
>> You need to mimic what IPv4 conf does (if it hasn't done yet), something
>> like below:
>>
>> #define IN_DEV_MAXCONF(in_dev, attr) \
>> (max(IPV4_DEVCONF_ALL(dev_net(in_dev->dev), attr), \
>>  IN_DEV_CONF_GET((in_dev), attr)))
>
> Agreed.

Thanks for the comments guys. I've looked at ipv4 sysctl code and I'm going to
align ipv6 code and add similar macros (IN6_DEV_{OR,MAX,AND}CONF).

Milos
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/

[patch net-next] net: ipv6: trigger action when setting conf.all sysctls

2014-05-19 Thread Milos Vyletel

As it is right now most of these tunables don't do anything - they don't
set any interface specific settings to the desired value and they don't
take precedence before device specific settings in most cases.

Documentation/networking/ip-sysctl.txt defines them to be:
conf/all/*:
Change all the interface-specific settings.

but that is not really the case.

Following patch will, when any net.ipv6.conf.all settings are set,
trigger action that will change default as well as all interface
specific settings of same procname. Only sysctls handled by default
proc_dointvec* are being changed. Per device changes are still allowed
and work as before.

Setting accept_dad (just as an example) with this patch looks like this
[root@linux ~]# uname -r
3.15.0-rc5+
[root@linux ~]# sysctl net.ipv6.conf | grep accept_dad
net.ipv6.conf.all.accept_dad = 0
net.ipv6.conf.default.accept_dad = 0
net.ipv6.conf.eth0.accept_dad = 0
net.ipv6.conf.lo.accept_dad = 0
[root@linux ~]# sysctl net.ipv6.conf.all.accept_dad=1
net.ipv6.conf.all.accept_dad = 1
[root@linux ~]# sysctl net.ipv6.conf | grep accept_dad
net.ipv6.conf.all.accept_dad = 1
net.ipv6.conf.default.accept_dad = 1
net.ipv6.conf.eth0.accept_dad = 1
net.ipv6.conf.lo.accept_dad = 1
[root@linux ~]# sysctl net.ipv6.conf.eth0.accept_dad=0
net.ipv6.conf.eth0.accept_dad = 0
[root@linux ~]# sysctl net.ipv6.conf | grep accept_dad
net.ipv6.conf.all.accept_dad = 1
net.ipv6.conf.default.accept_dad = 1
net.ipv6.conf.eth0.accept_dad = 0
net.ipv6.conf.lo.accept_dad = 1

Signed-off-by: Milos Vyletel 
---
 net/ipv6/addrconf.c | 157 ++--
 1 file changed, 127 insertions(+), 30 deletions(-)

diff --git a/net/ipv6/addrconf.c b/net/ipv6/addrconf.c
index 6c7fa08..f4a1eaa 100644
--- a/net/ipv6/addrconf.c
+++ b/net/ipv6/addrconf.c
@@ -110,6 +110,7 @@ static inline u32 cstamp_delta(unsigned long cstamp)
 #ifdef CONFIG_SYSCTL
 static void addrconf_sysctl_register(struct inet6_dev *idev);
 static void addrconf_sysctl_unregister(struct inet6_dev *idev);
+static int __addrconf_sysctl_all_trigger(struct net *, void *);
 #else
 static inline void addrconf_sysctl_register(struct inet6_dev *idev)
 {
@@ -4922,6 +4923,56 @@ int addrconf_sysctl_proxy_ndp(struct ctl_table *ctl, int 
write,
return ret;
 }
 
+static
+int addrconf_proc_dointvec(struct ctl_table *ctl, int write,
+  void __user *buffer, size_t *lenp, loff_t *ppos)
+{
+   int *valp = ctl->data;
+   struct inet6_dev *idev = ctl->extra1;
+   struct net *net = ctl->extra2;
+   int ret;
+
+   ret = proc_dointvec(ctl, write, buffer, lenp, ppos);
+
+   if (write && idev == NULL)
+   ret = __addrconf_sysctl_all_trigger(net, valp);
+
+   return ret;
+}
+
+static
+int addrconf_proc_dointvec_jiffies(struct ctl_table *ctl, int write,
+  void __user *buffer, size_t *lenp, loff_t *ppos)
+{
+   int *valp = ctl->data;
+   struct inet6_dev *idev = ctl->extra1;
+   struct net *net = ctl->extra2;
+   int ret;
+
+   ret = proc_dointvec_jiffies(ctl, write, buffer, lenp, ppos);
+
+   if (write && idev == NULL)
+   ret = __addrconf_sysctl_all_trigger(net, valp);
+
+   return ret;
+}
+
+static
+int addrconf_proc_dointvec_ms_jiffies(struct ctl_table *ctl, int write,
+  void __user *buffer, size_t *lenp, loff_t *ppos)
+{
+   int *valp = ctl->data;
+   struct inet6_dev *idev = ctl->extra1;
+   struct net *net = ctl->extra2;
+   int ret;
+
+   ret = proc_dointvec_ms_jiffies(ctl, write, buffer, lenp, ppos);
+
+   if (write && idev == NULL)
+   ret = __addrconf_sysctl_all_trigger(net, valp);
+
+   return ret;
+}
 
 static struct addrconf_sysctl_table
 {
@@ -4942,70 +4993,70 @@ static struct addrconf_sysctl_table
.data   = &ipv6_devconf.hop_limit,
.maxlen = sizeof(int),
.mode   = 0644,
-   .proc_handler   = proc_dointvec,
+   .proc_handler   = addrconf_proc_dointvec,
},
{
.procname   = "mtu",
.data   = &ipv6_devconf.mtu6,
.maxlen = sizeof(int),
.mode   = 0644,
-   .proc_handler   = proc_dointvec,
+   .proc_handler   = addrconf_proc_dointvec,
},
{
.procname   = "accept_ra",
.data   = &ipv6_devconf.accept_ra,
.maxlen = sizeof(int),
.mode   = 0644,
-   .proc_handler   = proc_dointvec,
+   .proc_handler

Re: [PATCH] crypto: x86/sha1: fix coverity CID 1195603

2014-05-06 Thread Milos Vyletel

And this time in plain text...

Most likely yes but I wanted to keep sha1_ssse3_mod_init consistent
with sha256_ssse3_mod_init/sha512_ssse3_mod_init functions.

Milos

On Tue, May 6, 2014 at 4:28 AM, Pavel Machek  wrote:
> On Wed 2014-04-30 15:17:54, Milos Vyletel wrote:
>> Coverity detected possible use of uninitialized pointer when printing info
>> message during module load. While this is higly unlikely to cause any 
>> troubles
>> simple change in sha1_ssse3_mod_init to make it look like sha256/512 init
>> function will fix this.
>>
>> 260
>>   6. Condition sha1_transform_asm, taking true branch
>> 261if (sha1_transform_asm) {
>>
>> CID 1195603 (#1 of 1): Uninitialized pointer read (UNINIT)
>> 7. uninit_use_in_call: Using uninitialized value algo_name when calling 
>> printk.
>> 262pr_info("Using %s optimized SHA-1 implementation\n", 
>> algo_name);
>> 263        return crypto_register_shash(&alg);
>> 264}
>>
>> Reported-by: 
>> Signed-off-by: Milos Vyletel 
>> ---
>>  arch/x86/crypto/sha1_ssse3_glue.c | 22 --
>>  1 file changed, 12 insertions(+), 10 deletions(-)
>>
>> diff --git a/arch/x86/crypto/sha1_ssse3_glue.c 
>> b/arch/x86/crypto/sha1_ssse3_glue.c
>> index 74d16ef..5352196 100644
>> --- a/arch/x86/crypto/sha1_ssse3_glue.c
>> +++ b/arch/x86/crypto/sha1_ssse3_glue.c
>> @@ -235,31 +235,33 @@ static bool __init avx2_usable(void)
>>
>>  static int __init sha1_ssse3_mod_init(void)
>>  {
>> - char *algo_name;
>> -
>
> Would simple "algo_name = NULL" be enough to fix this?
>
> --
> (english) http://www.livejournal.com/~pavelmachek
> (cesky, pictures) 
> http://atrey.karlin.mff.cuni.cz/~pavel/picture/horses/blog.html
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/

[PATCH] crypto: x86/sha1: fix coverity CID 1195603

2014-04-30 Thread Milos Vyletel

Coverity detected possible use of uninitialized pointer when printing info
message during module load. While this is higly unlikely to cause any troubles
simple change in sha1_ssse3_mod_init to make it look like sha256/512 init
function will fix this.

260
6. Condition sha1_transform_asm, taking true branch
261if (sha1_transform_asm) {

CID 1195603 (#1 of 1): Uninitialized pointer read (UNINIT)
7. uninit_use_in_call: Using uninitialized value algo_name when calling printk.
262pr_info("Using %s optimized SHA-1 implementation\n", 
algo_name);
263return crypto_register_shash(&alg);
264}

Reported-by: 
Signed-off-by: Milos Vyletel 
---
 arch/x86/crypto/sha1_ssse3_glue.c | 22 --
 1 file changed, 12 insertions(+), 10 deletions(-)

diff --git a/arch/x86/crypto/sha1_ssse3_glue.c 
b/arch/x86/crypto/sha1_ssse3_glue.c
index 74d16ef..5352196 100644
--- a/arch/x86/crypto/sha1_ssse3_glue.c
+++ b/arch/x86/crypto/sha1_ssse3_glue.c
@@ -235,31 +235,33 @@ static bool __init avx2_usable(void)
 
 static int __init sha1_ssse3_mod_init(void)
 {
-   char *algo_name;
-
/* test for SSSE3 first */
-   if (cpu_has_ssse3) {
+   if (cpu_has_ssse3)
sha1_transform_asm = sha1_transform_ssse3;
-   algo_name = "SSSE3";
-   }
 
 #ifdef CONFIG_AS_AVX
/* allow AVX to override SSSE3, it's a little faster */
if (avx_usable()) {
sha1_transform_asm = sha1_transform_avx;
-   algo_name = "AVX";
 #ifdef CONFIG_AS_AVX2
/* allow AVX2 to override AVX, it's a little faster */
-   if (avx2_usable()) {
+   if (avx2_usable())
sha1_transform_asm = sha1_apply_transform_avx2;
-   algo_name = "AVX2";
-   }
 #endif
}
 #endif
 
if (sha1_transform_asm) {
-   pr_info("Using %s optimized SHA-1 implementation\n", algo_name);
+#ifdef CONFIG_AS_AVX
+   if (sha1_transform_asm == sha1_transform_avx)
+   pr_info("Using AVX optimized SHA-1 implementation\n");
+#ifdef CONFIG_AS_AVX2
+   else if (sha1_transform_asm == sha1_transform_avx2)
+   pr_info("Using AVX2 optimized SHA-1 implementation\n");
+#endif
+   else
+#endif
+   pr_info("Using SSSE3 optimized SHA-1 implementation\n");
return crypto_register_shash(&alg);
}
pr_info("Neither AVX nor AVX2 nor SSSE3 is available/usable.\n");
-- 
1.9.0

--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/

Re: [PATCH] virtio-blk: emit udev event when device is resized

2013-02-27 Thread Milos Vyletel

- Original Message -
> Milos Vyletel  writes:
> > On Feb 25, 2013, at 5:12 PM, Greg KH  wrote:
> >
> >> On Fri, Feb 22, 2013 at 10:14:49AM +1030, Rusty Russell wrote:
> >>> Milos Vyletel  writes:
> >>> 
> >>>> When virtio-blk device is resized from host (using block_resize from
> >>>> QEMU) emit
> >>>> KOBJ_CHANGE uevent to notify guest about such change. This allows user
> >>>> to have
> >>>> custom udev rules which would take whatever action if such event occurs.
> >>>> As a
> >>>> proof of concept I've created simple udev rule that automatically resize
> >>>> filesystem on virtio-blk device.
> >>>> 
> >>>> ACTION=="change", KERNEL=="vd*", \
> >>>>ENV{RESIZE}=="1", \
> >>>>ENV{ID_FS_TYPE}=="ext[3-4]", \
> >>>>RUN+="/sbin/resize2fs /dev/%k"
> >>>> ACTION=="change", KERNEL=="vd*", \
> >>>>ENV{RESIZE}=="1", \
> >>>>ENV{ID_FS_TYPE}=="LVM2_member", \
> >>>>RUN+="/sbin/pvresize /dev/%k"
> >>> 
> >>> This looks fine to me, but I like to check with Greg before adding udev
> >>> callouts Greg?
> >> 
> >> Hm, I thought we were frowning apon running binaries from udev rules
> >> these days, especially ones that might have big consequences (like
> >> resizing a disk image) like this.
> >> 
> >> Kay, am I right?
> >> 
> >> We already emit KOBJECT_CHANGE events when block devices change, from
> >> within the block core code.  Why is the patch below needed instead of
> >> using these events that are already generated?  How are virtio block
> >> devices special?
> >> 
> >>> BTW, if this is good, it's good for stable IMHO.
> >> 
> >> What bug does it fix?
> >> 
> >
> > It is not really a bug but it definitely is useful enhancement to have in
> > stable too. I
> > can imagine lots of people can benefit from this.
> 
> But that applies to almost any enhancement :)
> 

Good point :) 

> It will go in *next* merge window, not this one.
> 

Cool, thanks.

Milos
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/

Re: [PATCH] virtio-blk: emit udev event when device is resized

2013-02-26 Thread Milos Vyletel

- Original Message -
> On 02/25/2013 10:54 PM, Milos Vyletel wrote:
> > - Original Message -
> >> On 02/22/2013 03:02 AM, Milos Vyletel wrote:
> >>> When virtio-blk device is resized from host (using block_resize from
> >>> QEMU)
> >>> emit
> >>> KOBJ_CHANGE uevent to notify guest about such change. This allows user to
> >>> have
> >>> custom udev rules which would take whatever action if such event occurs.
> >>> As
> >>> a
> >>> proof of concept I've created simple udev rule that automatically resize
> >>> filesystem on virtio-blk device.
> >>>
> >>> ACTION=="change", KERNEL=="vd*", \
> >>> ENV{RESIZE}=="1", \
> >>> ENV{ID_FS_TYPE}=="ext[3-4]", \
> >>> RUN+="/sbin/resize2fs /dev/%k"
> >>> ACTION=="change", KERNEL=="vd*", \
> >>> ENV{RESIZE}=="1", \
> >>> ENV{ID_FS_TYPE}=="LVM2_member", \
> >>> RUN+="/sbin/pvresize /dev/%k"
> >>>
> >>> Signed-off-by: Milos Vyletel 
> >>> ---
> >>>  drivers/block/virtio_blk.c |3 +++
> >>>  1 files changed, 3 insertions(+), 0 deletions(-)
> >>>
> >>> diff --git a/drivers/block/virtio_blk.c b/drivers/block/virtio_blk.c
> >>> index 8ad21a2..5990382 100644
> >>> --- a/drivers/block/virtio_blk.c
> >>> +++ b/drivers/block/virtio_blk.c
> >>> @@ -539,6 +539,8 @@ static void virtblk_config_changed_work(struct
> >>> work_struct *work)
> >>>   struct virtio_device *vdev = vblk->vdev;
> >>>   struct request_queue *q = vblk->disk->queue;
> >>>   char cap_str_2[10], cap_str_10[10];
> >>> + char event[] = "RESIZE=1";
> >>> + char *envp[] = { event, NULL };
> >>
> >> event is not used again. Why not just
> >>
> >> char *envp[] = { "RESIZE=1", NULL };
> >>
> > 
> > You're right. This works too. I was looking at other modules in kernel
> > how they do it and this was commonly used. Take a look at block/genhd.c
> > media_change_notify_thread() function which does similar thing.
> 
> Well, which kernel version are you looking at.
> media_change_notify_thread() is gone back in 2010. See commit: 9dc340.
>

Oh, you're right. I was looking at the rhel 6.3 kernel (2.6.32-279.19.1.el6)
before I switched to upstream branch and did the changes. Sorry about the
confusion.

Milos
 
> >>
> >>>   u64 capacity, size;
> >>>  
> >>>   mutex_lock(&vblk->config_lock);
> >>> @@ -568,6 +570,7 @@ static void virtblk_config_changed_work(struct
> >>> work_struct *work)
> >>>  
> >>>   set_capacity(vblk->disk, capacity);
> >>>   revalidate_disk(vblk->disk);
> >>> + kobject_uevent_env(&disk_to_dev(vblk->disk)->kobj, KOBJ_CHANGE, envp);
> >>>  done:
> >>>   mutex_unlock(&vblk->config_lock);
> >>>  }
> >>>
> >>
> >> I tried the following with your automatically resize udev rules.
> >>
> >> (qemu) block_resize vd0 500
> >> Check the fs size in guest, it is 500Mb
> >>
> >> (qemu) block_resize vd0 1200
> >> Check the fs size in guest, it is still 500Mb
> >>
> >> Can you try resizing multiple times? Does it work?
> > 
> > Yep, works for me. After each block_resize call I get udev event. I
> > was running udevadm monitor at that time. I've also tried running
> > udevd in debug mode and even if I try to bump size twice in a row
> > I get two separate udev events and resize2fs is triggered.
> 
> Turned on the debug info and looked into it again. It works for me now
> and the udev event is seen.
> 
> Tested-by: Asias He 
> 
> > [root@host ~]# lvresize -f -L +100M /dev/vgguests/evd2 && \
> > virsh blockresize resize vdb 1 && \
> > lvresize -f -L +100M /dev/vgguests/evd2 && \
> > virsh blockresize resize vdb 1
> >   Rounding size to boundary between physical extents: 128.00 MiB
> >   Extending logical volume evd2 to 31.38 GiB
> >   Logical volume evd2 successfully resized
> > Block device 'vdb' is resized
> >   Rounding size to boundary between physical extents: 128.00 MiB
> >   Extending logical volume evd2 to 31.50 GiB
> >   Logical volume evd2 successfully r

Re: [PATCH] virtio-blk: emit udev event when device is resized

2013-02-25 Thread Milos Vyletel


On Feb 25, 2013, at 5:12 PM, Greg KH  wrote:

> On Fri, Feb 22, 2013 at 10:14:49AM +1030, Rusty Russell wrote:
>> Milos Vyletel  writes:
>> 
>>> When virtio-blk device is resized from host (using block_resize from QEMU) 
>>> emit
>>> KOBJ_CHANGE uevent to notify guest about such change. This allows user to 
>>> have
>>> custom udev rules which would take whatever action if such event occurs. As 
>>> a
>>> proof of concept I've created simple udev rule that automatically resize
>>> filesystem on virtio-blk device.
>>> 
>>> ACTION=="change", KERNEL=="vd*", \
>>>ENV{RESIZE}=="1", \
>>>ENV{ID_FS_TYPE}=="ext[3-4]", \
>>>RUN+="/sbin/resize2fs /dev/%k"
>>> ACTION=="change", KERNEL=="vd*", \
>>>ENV{RESIZE}=="1", \
>>>ENV{ID_FS_TYPE}=="LVM2_member", \
>>>RUN+="/sbin/pvresize /dev/%k"
>> 
>> This looks fine to me, but I like to check with Greg before adding udev
>> callouts Greg?
> 
> Hm, I thought we were frowning apon running binaries from udev rules
> these days, especially ones that might have big consequences (like
> resizing a disk image) like this.
> 
> Kay, am I right?
> 
> We already emit KOBJECT_CHANGE events when block devices change, from
> within the block core code.  Why is the patch below needed instead of
> using these events that are already generated?  How are virtio block
> devices special?
> 
>> BTW, if this is good, it's good for stable IMHO.
> 
> What bug does it fix?
> 

It is not really a bug but it definitely is useful enhancement to have in 
stable too. I
can imagine lots of people can benefit from this.

Milos

> thanks,
> 
> greg k-h
> 
>> Cheers,
>> Rusty.
>> 
>>> Signed-off-by: Milos Vyletel 
>>> ---
>>> drivers/block/virtio_blk.c |3 +++
>>> 1 files changed, 3 insertions(+), 0 deletions(-)
>>> 
>>> diff --git a/drivers/block/virtio_blk.c b/drivers/block/virtio_blk.c
>>> index 8ad21a2..5990382 100644
>>> --- a/drivers/block/virtio_blk.c
>>> +++ b/drivers/block/virtio_blk.c
>>> @@ -539,6 +539,8 @@ static void virtblk_config_changed_work(struct 
>>> work_struct *work)
>>> struct virtio_device *vdev = vblk->vdev;
>>> struct request_queue *q = vblk->disk->queue;
>>> char cap_str_2[10], cap_str_10[10];
>>> +   char event[] = "RESIZE=1";
>>> +   char *envp[] = { event, NULL };
>>> u64 capacity, size;
>>> 
>>> mutex_lock(&vblk->config_lock);
>>> @@ -568,6 +570,7 @@ static void virtblk_config_changed_work(struct 
>>> work_struct *work)
>>> 
>>> set_capacity(vblk->disk, capacity);
>>> revalidate_disk(vblk->disk);
>>> +   kobject_uevent_env(&disk_to_dev(vblk->disk)->kobj, KOBJ_CHANGE, envp);
>>> done:
>>> mutex_unlock(&vblk->config_lock);
>>> }
>>> -- 
>>> 1.7.1

--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/

Re: [PATCH] virtio-blk: emit udev event when device is resized

2013-02-25 Thread Milos Vyletel


On Feb 25, 2013, at 5:39 PM, Kay Sievers  wrote:

> On Mon, Feb 25, 2013 at 11:12 PM, Greg KH  wrote:
> 
>> Hm, I thought we were frowning apon running binaries from udev rules
>> these days, especially ones that might have big consequences (like
>> resizing a disk image) like this.
>> 
>> Kay, am I right?
> 
> We removed most of them from the default setups, yes. But there is
> nothing wrong if people want to ship that in some package or as custom
> rules.
> 
> It looks fine to me, we would just not add such things to the default
> set of of rules these days.

That was not my intention. I just wanted to demonstrate that event like this
could be useful in cases like filesystem resize. We have a need for automatic
resize by our customers so we will ship our custom udev rules. It's perfectly
understandable that default udev rules will not have this.

Milos

> 
>> We already emit KOBJECT_CHANGE events when block devices change, from
>> within the block core code.  Why is the patch below needed instead of
>> using these events that are already generated?  How are virtio block
>> devices special?
> 
> I think we only do that for dm and md and a couple of special cases
> like loop and read-only settings.
> 
> Kay

--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/

Re: [PATCH] virtio-blk: emit udev event when device is resized

2013-02-25 Thread Milos Vyletel

- Original Message -
> On 02/22/2013 03:02 AM, Milos Vyletel wrote:
> > When virtio-blk device is resized from host (using block_resize from QEMU)
> > emit
> > KOBJ_CHANGE uevent to notify guest about such change. This allows user to
> > have
> > custom udev rules which would take whatever action if such event occurs. As
> > a
> > proof of concept I've created simple udev rule that automatically resize
> > filesystem on virtio-blk device.
> > 
> > ACTION=="change", KERNEL=="vd*", \
> > ENV{RESIZE}=="1", \
> > ENV{ID_FS_TYPE}=="ext[3-4]", \
> > RUN+="/sbin/resize2fs /dev/%k"
> > ACTION=="change", KERNEL=="vd*", \
> > ENV{RESIZE}=="1", \
> > ENV{ID_FS_TYPE}=="LVM2_member", \
> > RUN+="/sbin/pvresize /dev/%k"
> > 
> > Signed-off-by: Milos Vyletel 
> > ---
> >  drivers/block/virtio_blk.c |3 +++
> >  1 files changed, 3 insertions(+), 0 deletions(-)
> > 
> > diff --git a/drivers/block/virtio_blk.c b/drivers/block/virtio_blk.c
> > index 8ad21a2..5990382 100644
> > --- a/drivers/block/virtio_blk.c
> > +++ b/drivers/block/virtio_blk.c
> > @@ -539,6 +539,8 @@ static void virtblk_config_changed_work(struct
> > work_struct *work)
> > struct virtio_device *vdev = vblk->vdev;
> > struct request_queue *q = vblk->disk->queue;
> > char cap_str_2[10], cap_str_10[10];
> > +   char event[] = "RESIZE=1";
> > +   char *envp[] = { event, NULL };
> 
> event is not used again. Why not just
> 
> char *envp[] = { "RESIZE=1", NULL };
> 

You're right. This works too. I was looking at other modules in kernel
how they do it and this was commonly used. Take a look at block/genhd.c
media_change_notify_thread() function which does similar thing.

> 
> > u64 capacity, size;
> >  
> > mutex_lock(&vblk->config_lock);
> > @@ -568,6 +570,7 @@ static void virtblk_config_changed_work(struct
> > work_struct *work)
> >  
> > set_capacity(vblk->disk, capacity);
> > revalidate_disk(vblk->disk);
> > +   kobject_uevent_env(&disk_to_dev(vblk->disk)->kobj, KOBJ_CHANGE, envp);
> >  done:
> > mutex_unlock(&vblk->config_lock);
> >  }
> > 
> 
> I tried the following with your automatically resize udev rules.
> 
> (qemu) block_resize vd0 500
> Check the fs size in guest, it is 500Mb
> 
> (qemu) block_resize vd0 1200
> Check the fs size in guest, it is still 500Mb
> 
> Can you try resizing multiple times? Does it work?

Yep, works for me. After each block_resize call I get udev event. I
was running udevadm monitor at that time. I've also tried running
udevd in debug mode and even if I try to bump size twice in a row
I get two separate udev events and resize2fs is triggered. 

[root@host ~]# lvresize -f -L +100M /dev/vgguests/evd2 && \
virsh blockresize resize vdb 1 && \
lvresize -f -L +100M /dev/vgguests/evd2 && \
virsh blockresize resize vdb 1
  Rounding size to boundary between physical extents: 128.00 MiB
  Extending logical volume evd2 to 31.38 GiB
  Logical volume evd2 successfully resized
Block device 'vdb' is resized
  Rounding size to boundary between physical extents: 128.00 MiB
  Extending logical volume evd2 to 31.50 GiB
  Logical volume evd2 successfully resized
Block device 'vdb' is resized

On guest I get this (from udevd --debug)
...
1361804003.724025 [4438] util_run_program: '/sbin/resize2fs /dev/vdb' started
1361804003.725215 [4438] util_run_program: '/sbin/resize2fs' (stderr) 
'resize2fs 1.41.12 (17-May-2010)'
1361804004.129329 [4437] event_queue_insert: seq 2073 queued, 'change' 'block'
1361804006.577235 [4438] util_run_program: '/sbin/resize2fs' (stdout) 
'Filesystem at /dev/vdb is mounted on /mnt/backup; on-line resizing required'
1361804006.577251 [4438] util_run_program: '/sbin/resize2fs' (stdout) 'old 
desc_blocks = 2, new_desc_blocks = 2'
1361804006.577254 [4438] util_run_program: '/sbin/resize2fs' (stdout) 
'Performing an on-line resize of /dev/vdb to 8224768 (4k) blocks.'
1361804006.577257 [4438] util_run_program: '/sbin/resize2fs' (stdout) 'The 
filesystem on /dev/vdb is now 8224768 blocks long.'
1361804006.577260 [4438] util_run_program: '/sbin/resize2fs' (stdout) ''
1361804006.577544 [4438] util_run_program: '/sbin/resize2fs /dev/vdb' returned 
with exitcode 0
...
1361804006.702659 [4438] util_run_program: '/sbin/resize2fs /dev

[PATCH] virtio-blk: emit udev event when device is resized

2013-02-21 Thread Milos Vyletel

When virtio-blk device is resized from host (using block_resize from QEMU) emit
KOBJ_CHANGE uevent to notify guest about such change. This allows user to have
custom udev rules which would take whatever action if such event occurs. As a
proof of concept I've created simple udev rule that automatically resize
filesystem on virtio-blk device.

ACTION=="change", KERNEL=="vd*", \
ENV{RESIZE}=="1", \
ENV{ID_FS_TYPE}=="ext[3-4]", \
RUN+="/sbin/resize2fs /dev/%k"
ACTION=="change", KERNEL=="vd*", \
ENV{RESIZE}=="1", \
ENV{ID_FS_TYPE}=="LVM2_member", \
RUN+="/sbin/pvresize /dev/%k"

Signed-off-by: Milos Vyletel 
---
 drivers/block/virtio_blk.c |3 +++
 1 files changed, 3 insertions(+), 0 deletions(-)

diff --git a/drivers/block/virtio_blk.c b/drivers/block/virtio_blk.c
index 8ad21a2..5990382 100644
--- a/drivers/block/virtio_blk.c
+++ b/drivers/block/virtio_blk.c
@@ -539,6 +539,8 @@ static void virtblk_config_changed_work(struct work_struct 
*work)
struct virtio_device *vdev = vblk->vdev;
struct request_queue *q = vblk->disk->queue;
char cap_str_2[10], cap_str_10[10];
+   char event[] = "RESIZE=1";
+   char *envp[] = { event, NULL };
u64 capacity, size;
 
mutex_lock(&vblk->config_lock);
@@ -568,6 +570,7 @@ static void virtblk_config_changed_work(struct work_struct 
*work)
 
set_capacity(vblk->disk, capacity);
revalidate_disk(vblk->disk);
+   kobject_uevent_env(&disk_to_dev(vblk->disk)->kobj, KOBJ_CHANGE, envp);
 done:
mutex_unlock(&vblk->config_lock);
 }
-- 
1.7.1

--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/

[tip:perf/core] perf tools: Avoid possible race condition in copyfile()

[PATCH v2] perf tools: avoid possible race condition in copyfile

Re: [PATCH] perf tools: avoid possible race condition in copyfile

Re: [PATCH] perf tools: avoid possible race condition in copyfile

[PATCH] perf tools: avoid possible race condition in copyfile

Re: [PATCH 2/2] perf/tools: put new buildid locks to use

Re: [PATCH 2/2] perf/tools: put new buildid locks to use

Re: [PATCH 2/2] perf/tools: put new buildid locks to use

Re: [PATCH 2/2] perf/tools: put new buildid locks to use

[PATCH 2/2] perf/tools: put new buildid locks to use

[PATCH 1/2] perf/tools: add read/write buildid dir locks

[PATCH 0/2] Add perf debug dir locking

[PATCH v2] rcu: small rcu_dereference doc update

Re: [PATCH] rcu: small rcu_dereference doc update

[PATCH] rcu: small rcu_dereference doc update

[tip:perf/core] perf tools: Fix race in build_id_cache__add_s()

[PATCH] perf: fix race in build_id_cache__add_s()

Re: [patch net-next 3/5] ipv6: consider net.ipv6.conf.all sysctls when making decisions

Re: [patch net-next 3/5] ipv6: consider net.ipv6.conf.all sysctls when making decisions

Re: [patch net-next 3/5] ipv6: consider net.ipv6.conf.all sysctls when making decisions

[patch net-next 4/5] Documentation: update ipv6 part of networking/ip-sysctl.txt

[patch net-next 1/5] ipv6: align sysctl table creation code with ipv4

[patch net-next 5/5] ipv6: copy default config values to interfaces

[patch net-next 2/5] ipv6: rename DEVCONF* to IPV6_DEVCONF* to align with ipv4

Re: [patch net-next] net: ipv6: trigger action when setting conf.all sysctls

[patch net-next] net: ipv6: trigger action when setting conf.all sysctls

Re: [PATCH] crypto: x86/sha1: fix coverity CID 1195603

[PATCH] crypto: x86/sha1: fix coverity CID 1195603

Re: [PATCH] virtio-blk: emit udev event when device is resized

Re: [PATCH] virtio-blk: emit udev event when device is resized

Re: [PATCH] virtio-blk: emit udev event when device is resized

Re: [PATCH] virtio-blk: emit udev event when device is resized

Re: [PATCH] virtio-blk: emit udev event when device is resized

[PATCH] virtio-blk: emit udev event when device is resized

34 matches

Site Navigation

Mail list logo

Footer information