Re: rpki-client introduce validated cache

2022-01-14 Thread Claudio Jeker
On Fri, Jan 14, 2022 at 01:45:19PM +, Job Snijders wrote:
> Thanks Claudio,
> 
> A question about stats below
> 
> On Fri, Jan 14, 2022 at 10:29:20AM +0100, Claudio Jeker wrote:
> > @@ -1246,8 +1249,8 @@ main(int argc, char *argv[])
> > logx("Certificate revocation lists: %zu", stats.crls);
> > logx("Ghostbuster records: %zu", stats.gbrs);
> > logx("Repositories: %zu", stats.repos);
> > -   logx("Cleanup: removed %zu files, %zu directories",
> > -   stats.del_files, stats.del_dirs);
> > +   logx("Cleanup: removed %zu files, %zu directories, %zu superfluous",
> > +   stats.del_files, stats.del_dirs, stats.extra_files);
> 
> The above stats message is not entirely clear to me, is something along
> these lines more precise?
> 
> logx("Cleanup: removed %zu previously valid objects, %zu directories, %zu"
>   " superfluous files", stats.del_files, stats.del_dirs, 
> stats.extra_files);

I still need to think about this more. The way the numbers are calculated
is not what you suggested. Right now the delete numbers are including
files that where never valid, etc. It may make sense to spend some time
and fix the reporting so that the stats returned make sense.

I think we now have files in ta and valid that should be accounted
differently than file in rrdp and rsync. The latter are now mostly
temporary files (e.g. rsync will always be empty after a run).

I'll get the validated cache in and then we can bikeshed over this :)
-- 
:wq Claudio



Re: rpki-client introduce validated cache

2022-01-14 Thread Job Snijders
Thanks Claudio,

A question about stats below

On Fri, Jan 14, 2022 at 10:29:20AM +0100, Claudio Jeker wrote:
> @@ -1246,8 +1249,8 @@ main(int argc, char *argv[])
>   logx("Certificate revocation lists: %zu", stats.crls);
>   logx("Ghostbuster records: %zu", stats.gbrs);
>   logx("Repositories: %zu", stats.repos);
> - logx("Cleanup: removed %zu files, %zu directories",
> - stats.del_files, stats.del_dirs);
> + logx("Cleanup: removed %zu files, %zu directories, %zu superfluous",
> + stats.del_files, stats.del_dirs, stats.extra_files);

The above stats message is not entirely clear to me, is something along
these lines more precise?

logx("Cleanup: removed %zu previously valid objects, %zu directories, %zu"
" superfluous files", stats.del_files, stats.del_dirs, 
stats.extra_files);

Kind regards,

Job



Re: rpki-client introduce validated cache

2022-01-14 Thread Theo Buehler
On Fri, Jan 14, 2022 at 10:29:20AM +0100, Claudio Jeker wrote:
> On Thu, Jan 13, 2022 at 10:51:33PM +0100, Theo Buehler wrote:
> > On Thu, Jan 13, 2022 at 05:05:57PM +0100, Claudio Jeker wrote:
> > > This diff adds a new cache subdir called "valid". This is the place where
> > > all verified and good files are stored after a run. It makes -n work a lot
> > > better since -n will now only look at what's inside "valid" and ignore
> > > "rsync" and "rrdp".
> > > 
> > > The trust anchors are still stored in "ta" even if valid.
> > > The rsync repo will only hold temporary files and be empty otherwise.
> > > The rrdp repo still may contain some superfluous files that people
> > > included in their snapshots. Not keeping them could result in extra
> > > snapshot downloads since delta updates could refer to these files.
> > > Since there is a valid repo, rrdp no longer needs an additional temp
> > > directory for its sync.
> > > 
> > > With this rsync will use the --compare-dest feature and use the valid
> > > cache as a base.
> > > 
> > > The file cleanup at the end got more complex. There is now
> > > repo_move_valid() and repo_cleanup_rrdp() to do two initial steps of
> > > cleanup before the tree traversal starts. The fts function now uses
> > > some fts features to find a) empty directories and b) superfluous files.
> > > 
> > > I think the code changes outside of repo.c should be straight forward.
> > > 
> > > To the rpki connoisseur, this currently only implements a simple newest
> > > file wins startegy. The handling of .mft and their serial number will
> > > follow once this is in.
> > 
> > I have made a first pass over this. I will need to sleep on this and
> > think through the repo.c changes in more detail with a fresh head.
> > 
> > Couple leaks and a few stylistic remarks noted inline.
> 
> Adjusted diff below. Thanks for the review of this and also for all the
> previous diffs.

This looks good to me now. Couldn't find anything of substance to
complain about. Let's get it in so we can make further progress in tree. 

ok tb



Re: rpki-client introduce validated cache

2022-01-14 Thread Claudio Jeker
On Thu, Jan 13, 2022 at 10:51:33PM +0100, Theo Buehler wrote:
> On Thu, Jan 13, 2022 at 05:05:57PM +0100, Claudio Jeker wrote:
> > This diff adds a new cache subdir called "valid". This is the place where
> > all verified and good files are stored after a run. It makes -n work a lot
> > better since -n will now only look at what's inside "valid" and ignore
> > "rsync" and "rrdp".
> > 
> > The trust anchors are still stored in "ta" even if valid.
> > The rsync repo will only hold temporary files and be empty otherwise.
> > The rrdp repo still may contain some superfluous files that people
> > included in their snapshots. Not keeping them could result in extra
> > snapshot downloads since delta updates could refer to these files.
> > Since there is a valid repo, rrdp no longer needs an additional temp
> > directory for its sync.
> > 
> > With this rsync will use the --compare-dest feature and use the valid
> > cache as a base.
> > 
> > The file cleanup at the end got more complex. There is now
> > repo_move_valid() and repo_cleanup_rrdp() to do two initial steps of
> > cleanup before the tree traversal starts. The fts function now uses
> > some fts features to find a) empty directories and b) superfluous files.
> > 
> > I think the code changes outside of repo.c should be straight forward.
> > 
> > To the rpki connoisseur, this currently only implements a simple newest
> > file wins startegy. The handling of .mft and their serial number will
> > follow once this is in.
> 
> I have made a first pass over this. I will need to sleep on this and
> think through the repo.c changes in more detail with a fresh head.
> 
> Couple leaks and a few stylistic remarks noted inline.

Adjusted diff below. Thanks for the review of this and also for all the
previous diffs.

-- 
:wq Claudio

Index: extern.h
===
RCS file: /cvs/src/usr.sbin/rpki-client/extern.h,v
retrieving revision 1.103
diff -u -p -r1.103 extern.h
--- extern.h13 Jan 2022 13:46:03 -  1.103
+++ extern.h13 Jan 2022 13:52:42 -
@@ -341,7 +341,7 @@ enum publish_type {
 struct entity {
TAILQ_ENTRY(entity) entries;
char*path;  /* path relative to repository */
-   char*file;  /* filename */
+   char*file;  /* filename or valid repo path */
unsigned char   *data;  /* optional data blob */
size_t   datasz;/* length of optional data blob */
unsigned int repoid;/* repository identifier */
@@ -380,6 +380,7 @@ struct stats {
size_t   vrps; /* total number of vrps */
size_t   uniqs; /* number of unique vrps */
size_t   del_files; /* number of files removed in cleanup */
+   size_t   extra_files; /* number of superfluous files */
size_t   del_dirs; /* number of directories removed in cleanup */
size_t   brks; /* number of BGPsec Router Key (BRK) certificates */
struct timeval  elapsed_time;
@@ -506,7 +507,7 @@ void rrdp_clear(unsigned int);
 voidrrdp_save_state(unsigned int, struct rrdp_session *);
 int rrdp_handle_file(unsigned int, enum publish_type, char *,
char *, size_t, char *, size_t);
-char   *repo_basedir(const struct repo *);
+char   *repo_basedir(const struct repo *, int);
 unsigned intrepo_id(const struct repo *);
 const char *repo_uri(const struct repo *);
 struct repo*ta_lookup(int, struct tal *);
@@ -520,7 +521,8 @@ void rsync_finish(unsigned int, int);
 voidhttp_finish(unsigned int, enum http_result, const char *);
 voidrrdp_finish(unsigned int, int);
 
-voidrsync_fetch(unsigned int, const char *, const char *);
+voidrsync_fetch(unsigned int, const char *, const char *,
+   const char *);
 voidhttp_fetch(unsigned int, const char *, const char *, int);
 voidrrdp_fetch(unsigned int, const char *, const char *,
struct rrdp_session *);
Index: main.c
===
RCS file: /cvs/src/usr.sbin/rpki-client/main.c,v
retrieving revision 1.175
diff -u -p -r1.175 main.c
--- main.c  13 Jan 2022 13:18:41 -  1.175
+++ main.c  14 Jan 2022 08:57:30 -
@@ -151,20 +151,22 @@ entity_write_repo(struct repo *rp)
struct ibuf *b;
enum rtype type = RTYPE_REPO;
unsigned int repoid;
-   char *path;
+   char *path, *altpath;
int talid = 0;
 
repoid = repo_id(rp);
-   path = repo_basedir(rp);
+   path = repo_basedir(rp, 0);
+   altpath = repo_basedir(rp, 1);
b = io_new_buffer();
io_simple_buffer(b, &type, sizeof(type));
io_simple_buffer(b, &repoid, sizeof(repoid));
io_simple_buffer(b, &talid, sizeof(talid));
io_str_buffer(b, path);
-   

Re: rpki-client introduce validated cache

2022-01-13 Thread Theo Buehler
On Thu, Jan 13, 2022 at 05:05:57PM +0100, Claudio Jeker wrote:
> This diff adds a new cache subdir called "valid". This is the place where
> all verified and good files are stored after a run. It makes -n work a lot
> better since -n will now only look at what's inside "valid" and ignore
> "rsync" and "rrdp".
> 
> The trust anchors are still stored in "ta" even if valid.
> The rsync repo will only hold temporary files and be empty otherwise.
> The rrdp repo still may contain some superfluous files that people
> included in their snapshots. Not keeping them could result in extra
> snapshot downloads since delta updates could refer to these files.
> Since there is a valid repo, rrdp no longer needs an additional temp
> directory for its sync.
> 
> With this rsync will use the --compare-dest feature and use the valid
> cache as a base.
> 
> The file cleanup at the end got more complex. There is now
> repo_move_valid() and repo_cleanup_rrdp() to do two initial steps of
> cleanup before the tree traversal starts. The fts function now uses
> some fts features to find a) empty directories and b) superfluous files.
> 
> I think the code changes outside of repo.c should be straight forward.
> 
> To the rpki connoisseur, this currently only implements a simple newest
> file wins startegy. The handling of .mft and their serial number will
> follow once this is in.

I have made a first pass over this. I will need to sleep on this and
think through the repo.c changes in more detail with a fresh head.

Couple leaks and a few stylistic remarks noted inline.

> -- 
> :wq Claudio
> 
> ? obj
> Index: extern.h
> ===
> RCS file: /cvs/src/usr.sbin/rpki-client/extern.h,v
> retrieving revision 1.103
> diff -u -p -r1.103 extern.h
> --- extern.h  13 Jan 2022 13:46:03 -  1.103
> +++ extern.h  13 Jan 2022 15:50:40 -
> @@ -341,7 +341,7 @@ enum publish_type {
>  struct entity {
>   TAILQ_ENTRY(entity) entries;
>   char*path;  /* path relative to repository */
> - char*file;  /* filename */
> + char*file;  /* filename or valid repo path */
>   unsigned char   *data;  /* optional data blob */
>   size_t   datasz;/* length of optional data blob */
>   unsigned int repoid;/* repository identifier */
> @@ -380,6 +380,7 @@ struct stats {
>   size_t   vrps; /* total number of vrps */
>   size_t   uniqs; /* number of unique vrps */
>   size_t   del_files; /* number of files removed in cleanup */
> + size_t   extra_files; /* number of superfluous files */
>   size_t   del_dirs; /* number of directories removed in cleanup */
>   size_t   brks; /* number of BGPsec Router Key (BRK) certificates */
>   struct timeval  elapsed_time;
> @@ -506,7 +507,7 @@ void   rrdp_clear(unsigned int);
>  void  rrdp_save_state(unsigned int, struct rrdp_session *);
>  int   rrdp_handle_file(unsigned int, enum publish_type, char *,
>   char *, size_t, char *, size_t);
> -char *repo_basedir(const struct repo *);
> +char *repo_basedir(const struct repo *, int);
>  unsigned int  repo_id(const struct repo *);
>  const char   *repo_uri(const struct repo *);
>  struct repo  *ta_lookup(int, struct tal *);
> @@ -520,7 +521,8 @@ void   rsync_finish(unsigned int, int);
>  void  http_finish(unsigned int, enum http_result, const char *);
>  void  rrdp_finish(unsigned int, int);
>  
> -void  rsync_fetch(unsigned int, const char *, const char *);
> +void  rsync_fetch(unsigned int, const char *, const char *,
> + const char *);
>  void  http_fetch(unsigned int, const char *, const char *, int);
>  void  rrdp_fetch(unsigned int, const char *, const char *,
>   struct rrdp_session *);
> Index: main.c
> ===
> RCS file: /cvs/src/usr.sbin/rpki-client/main.c,v
> retrieving revision 1.175
> diff -u -p -r1.175 main.c
> --- main.c13 Jan 2022 13:18:41 -  1.175
> +++ main.c13 Jan 2022 15:50:40 -
> @@ -151,17 +151,18 @@ entity_write_repo(struct repo *rp)
>   struct ibuf *b;
>   enum rtype type = RTYPE_REPO;
>   unsigned int repoid;
> - char *path;
> + char *path, *altpath;
>   int talid = 0;
>  
>   repoid = repo_id(rp);
> - path = repo_basedir(rp);
> + path = repo_basedir(rp, 0);
> + altpath = repo_basedir(rp, 1);
>   b = io_new_buffer();
>   io_simple_buffer(b, &type, sizeof(type));
>   io_simple_buffer(b, &repoid, sizeof(repoid));
>   io_simple_buffer(b, &talid, sizeof(talid));
>   io_str_buffer(b, path);
> - io_str_buffer(b, NULL);
> + io_str_buffer(b, altpath);
>   io_buf_buffer(b, NULL, 0);
>   io_close_buffer(&procq, b);
>   free(path);

Don't leak alt

rpki-client introduce validated cache

2022-01-13 Thread Claudio Jeker
This diff adds a new cache subdir called "valid". This is the place where
all verified and good files are stored after a run. It makes -n work a lot
better since -n will now only look at what's inside "valid" and ignore
"rsync" and "rrdp".

The trust anchors are still stored in "ta" even if valid.
The rsync repo will only hold temporary files and be empty otherwise.
The rrdp repo still may contain some superfluous files that people
included in their snapshots. Not keeping them could result in extra
snapshot downloads since delta updates could refer to these files.
Since there is a valid repo, rrdp no longer needs an additional temp
directory for its sync.

With this rsync will use the --compare-dest feature and use the valid
cache as a base.

The file cleanup at the end got more complex. There is now
repo_move_valid() and repo_cleanup_rrdp() to do two initial steps of
cleanup before the tree traversal starts. The fts function now uses
some fts features to find a) empty directories and b) superfluous files.

I think the code changes outside of repo.c should be straight forward.

To the rpki connoisseur, this currently only implements a simple newest
file wins startegy. The handling of .mft and their serial number will
follow once this is in.
-- 
:wq Claudio

? obj
Index: extern.h
===
RCS file: /cvs/src/usr.sbin/rpki-client/extern.h,v
retrieving revision 1.103
diff -u -p -r1.103 extern.h
--- extern.h13 Jan 2022 13:46:03 -  1.103
+++ extern.h13 Jan 2022 15:50:40 -
@@ -341,7 +341,7 @@ enum publish_type {
 struct entity {
TAILQ_ENTRY(entity) entries;
char*path;  /* path relative to repository */
-   char*file;  /* filename */
+   char*file;  /* filename or valid repo path */
unsigned char   *data;  /* optional data blob */
size_t   datasz;/* length of optional data blob */
unsigned int repoid;/* repository identifier */
@@ -380,6 +380,7 @@ struct stats {
size_t   vrps; /* total number of vrps */
size_t   uniqs; /* number of unique vrps */
size_t   del_files; /* number of files removed in cleanup */
+   size_t   extra_files; /* number of superfluous files */
size_t   del_dirs; /* number of directories removed in cleanup */
size_t   brks; /* number of BGPsec Router Key (BRK) certificates */
struct timeval  elapsed_time;
@@ -506,7 +507,7 @@ void rrdp_clear(unsigned int);
 voidrrdp_save_state(unsigned int, struct rrdp_session *);
 int rrdp_handle_file(unsigned int, enum publish_type, char *,
char *, size_t, char *, size_t);
-char   *repo_basedir(const struct repo *);
+char   *repo_basedir(const struct repo *, int);
 unsigned intrepo_id(const struct repo *);
 const char *repo_uri(const struct repo *);
 struct repo*ta_lookup(int, struct tal *);
@@ -520,7 +521,8 @@ void rsync_finish(unsigned int, int);
 voidhttp_finish(unsigned int, enum http_result, const char *);
 voidrrdp_finish(unsigned int, int);
 
-voidrsync_fetch(unsigned int, const char *, const char *);
+voidrsync_fetch(unsigned int, const char *, const char *,
+   const char *);
 voidhttp_fetch(unsigned int, const char *, const char *, int);
 voidrrdp_fetch(unsigned int, const char *, const char *,
struct rrdp_session *);
Index: main.c
===
RCS file: /cvs/src/usr.sbin/rpki-client/main.c,v
retrieving revision 1.175
diff -u -p -r1.175 main.c
--- main.c  13 Jan 2022 13:18:41 -  1.175
+++ main.c  13 Jan 2022 15:50:40 -
@@ -151,17 +151,18 @@ entity_write_repo(struct repo *rp)
struct ibuf *b;
enum rtype type = RTYPE_REPO;
unsigned int repoid;
-   char *path;
+   char *path, *altpath;
int talid = 0;
 
repoid = repo_id(rp);
-   path = repo_basedir(rp);
+   path = repo_basedir(rp, 0);
+   altpath = repo_basedir(rp, 1);
b = io_new_buffer();
io_simple_buffer(b, &type, sizeof(type));
io_simple_buffer(b, &repoid, sizeof(repoid));
io_simple_buffer(b, &talid, sizeof(talid));
io_str_buffer(b, path);
-   io_str_buffer(b, NULL);
+   io_str_buffer(b, altpath);
io_buf_buffer(b, NULL, 0);
io_close_buffer(&procq, b);
free(path);
@@ -254,14 +255,15 @@ rrdp_fetch(unsigned int id, const char *
  * Request a repository sync via rsync URI to directory local.
  */
 void
-rsync_fetch(unsigned int id, const char *uri, const char *local)
+rsync_fetch(unsigned int id, const char *uri, const char *local,
+const char *base)
 {
struct ibuf *b;
 
b = io_new_buffer();
i