Re: dump LOB status

2020-09-29 Thread Juha Erkkilä



> On 26. Sep 2020, at 9.31, Otto Moerbeek  wrote:
> Indeed, that commit was reverted in FreeBSD. This should do better. I
> do not like the assert FreeBSD has, so I turned it into a quit().

Works for me.  Thanks!

> Index: tape.c
> ===
> RCS file: /cvs/src/sbin/dump/tape.c,v
> retrieving revision 1.45
> diff -u -p -r1.45 tape.c
> --- tape.c  28 Jun 2019 13:32:43 -  1.45
> +++ tape.c  26 Sep 2020 06:30:37 -
> @@ -330,7 +330,10 @@ flushtape(void)
>     }
>  
>     blks = 0;
> -   if (spcl.c_type != TS_END) {
> +   if (spcl.c_type != TS_END && spcl.c_type != TS_CLRI &&
> +       spcl.c_type != TS_BITS) {
> +       if (spcl.c_count > TP_NINDIR)
> +           quit("c_count too large\n");
>         for (i = 0; i < spcl.c_count; i++)
>             if (spcl.c_addr[i] != 0)
>                 blks++;



Re: dump LOB status

2020-09-26 Thread Otto Moerbeek
On Fri, Sep 25, 2020 at 07:49:20AM +0200, Otto Moerbeek wrote:

> On Fri, Sep 25, 2020 at 08:42:38AM +0300, Juha Erkkilä wrote:
> 
> > 
> > > On 24. Sep 2020, at 15.36, Otto Moerbeek  wrote:
> > > 
> > > On Tue, Sep 22, 2020 at 08:37:22PM +0300, Juha Erkkilä wrote:
> > >> Actually, I tested this again and now it appears
> > >> dump and restore both work correctly. Previously,
> > >> I first tested dump/restore with an empty filesystem,
> > >> then with some files, and it may be that the second
> > >> time I was accidentally testing restore with the first
> > >> dump file.
> > >> 
> > >> My tests were only with a small amount of files,
> > >> I will do a better test with proper data (about
> > >> 0.5 terabytes and over 10 files) and will
> > >> report again here in a next few days.
> > > 
> > > Looking through FreeBSD commits I think you want the main.c one as
> > > well, otherwise silent corruption of the dump is still possible.
> > > 
> > >   -Otto
> > 
> > With that patch I get a message:
> > 
> > fatal: morestack on g0
> >   DUMP: fs is too large for dump!
> >   DUMP: The ENTIRE dump is aborted.
> > 
> > This is on a 2 terabyte filesystem with 0.5 terabytes
> > of data “successfully” backed up (or at least I considered
> > the backup and restore as successful).
> 
> Hmm, I need to dig into the dump format and see if the math is right.

Indeed, that commit was reverted in FreeBSD. This should do better. I
do not like the assert FreeBSD has, so I turned it into a quit().

-Otto

Index: tape.c
===
RCS file: /cvs/src/sbin/dump/tape.c,v
retrieving revision 1.45
diff -u -p -r1.45 tape.c
--- tape.c  28 Jun 2019 13:32:43 -  1.45
+++ tape.c  26 Sep 2020 06:30:37 -
@@ -330,7 +330,10 @@ flushtape(void)
    }
 
    blks = 0;
-   if (spcl.c_type != TS_END) {
+   if (spcl.c_type != TS_END && spcl.c_type != TS_CLRI &&
+       spcl.c_type != TS_BITS) {
+       if (spcl.c_count > TP_NINDIR)
+           quit("c_count too large\n");
        for (i = 0; i < spcl.c_count; i++)
            if (spcl.c_addr[i] != 0)
                blks++;
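
For readers who want to try the logic outside of dump, here is a minimal
standalone sketch of the guarded loop from the diff above. It is not the
committed code: the record type names and the TP_NINDIR_SKETCH constant are
hypothetical stand-ins for the definitions in protocols/dumprestore.h, and a
-1 return takes the place of the quit() call.

```c
/* Hypothetical stand-ins for the definitions in <protocols/dumprestore.h>. */
enum { TP_NINDIR_SKETCH = 512 };
enum rec_type { REC_INODE, REC_BITS, REC_CLRI, REC_END };

/*
 * Count the nonzero c_addr entries of a record that carries block data.
 * Map records (REC_BITS, REC_CLRI) and the end record are skipped, as in
 * the patch, because c_addr is not scanned for them.
 * Returns -1 where the patch would call quit().
 */
static int
count_blocks(enum rec_type type, int c_count, const char *c_addr)
{
	int i, blks = 0;

	if (type == REC_END || type == REC_CLRI || type == REC_BITS)
		return 0;
	if (c_count < 0 || c_count > TP_NINDIR_SKETCH)
		return -1;	/* c_count too large for the c_addr array */
	for (i = 0; i < c_count; i++)
		if (c_addr[i] != 0)
			blks++;
	return blks;
}
```

The point of the extra test is that the loop index can no longer run past
the end of the array, whatever value c_count holds.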



Re: dump LOB status

2020-09-25 Thread Craig Skinner
On Thu, 24 Sep 2020 18:04:15 +0300 Juha Erkkilä wrote:
> 
> I tested this with 0.5 terabytes and approximately 70 thousand files,
> with level 0 and 1 dumps, doing some additions/deletions/moves between
> dumps (no inplace modifications to files, though).
>
> It appears both dump and restore worked correctly. I did not check
> all file contents though, but compared path listings and did contents
> check to some randomly sampled files.
> 

FYI: http://www.CoreDumps.De/doc/dump/zwicky/testdump.doc.html



Re: dump LOB status

2020-09-25 Thread Juha Erkkilä


> On 24. Sep 2020, at 15.36, Otto Moerbeek  wrote:
> 
> On Tue, Sep 22, 2020 at 08:37:22PM +0300, Juha Erkkilä wrote:
>> Actually, I tested this again and now it appears
>> dump and restore both work correctly. Previously,
>> I first tested dump/restore with an empty filesystem,
>> then with some files, and it may be that the second
>> time I was accidentally testing restore with the first
>> dump file.
>> 
>> My tests were only with a small amount of files,
>> I will do a better test with proper data (about
>> 0.5 terabytes and over 10 files) and will
>> report again here in a next few days.
> 
> Looking through FreeBSD commits I think you want the main.c one as
> well, otherwise silent corruption of the dump is still possible.
> 
>   -Otto

With that patch I get a message:

fatal: morestack on g0
  DUMP: fs is too large for dump!
  DUMP: The ENTIRE dump is aborted.

This is on a 2 terabyte filesystem with 0.5 terabytes
of data “successfully” backed up (or at least I considered
the backup and restore as successful).



Re: dump LOB status

2020-09-24 Thread Otto Moerbeek
On Fri, Sep 25, 2020 at 08:42:38AM +0300, Juha Erkkilä wrote:

> 
> > On 24. Sep 2020, at 15.36, Otto Moerbeek  wrote:
> > 
> > On Tue, Sep 22, 2020 at 08:37:22PM +0300, Juha Erkkilä wrote:
> >> Actually, I tested this again and now it appears
> >> dump and restore both work correctly. Previously,
> >> I first tested dump/restore with an empty filesystem,
> >> then with some files, and it may be that the second
> >> time I was accidentally testing restore with the first
> >> dump file.
> >> 
> >> My tests were only with a small amount of files,
> >> I will do a better test with proper data (about
> >> 0.5 terabytes and over 10 files) and will
> >> report again here in a next few days.
> > 
> > Looking through FreeBSD commits I think you want the main.c one as
> > well, otherwise silent corruption of the dump is still possible.
> > 
> > -Otto
> 
> With that patch I get a message:
> 
> fatal: morestack on g0
>   DUMP: fs is too large for dump!
>   DUMP: The ENTIRE dump is aborted.
> 
> This is on a 2 terabyte filesystem with 0.5 terabytes
> of data “successfully” backed up (or at least I considered
> the backup and restore as successful).

Hmm, I need to dig into the dump format and see if the math is right.

-Otto



Re: dump LOB status

2020-09-24 Thread Juha Erkkilä



> On 22. Sep 2020, at 9.00, Otto Moerbeek  wrote:
> 
> On Mon, Sep 21, 2020 at 10:23:55PM +0300, Juha Erkkilä wrote:
>> 
>> It looks like the same issue has been fixed in
>> FreeBSD: https://svnweb.freebsd.org/base?view=revision=334979 
>> 
>> 
>> The diff applies cleanly to the current OpenBSD source tree.
> 
> Maybe by hand, but not by using patch(1), the context differs a bit.
> 
> Next obvious question: did you test if it fixes your problem? That
> means, do you get a dump that can be restored again?
> 
>   -Otto

Two of my previous mails did not make it to the list
for some reason, but that does not matter.

I tested this with 0.5 terabytes and approximately
70 thousand files, with level 0 and 1 dumps,
doing some additions/deletions/moves between
dumps (no inplace modifications to files, though).
It appears both dump and restore worked
correctly. I did not check all file contents though,
but compared path listings and did contents check
to some randomly sampled files.

I will also test the patch Otto sent to this
list a while ago. This may take some time.



Re: dump LOB status

2020-09-24 Thread Otto Moerbeek
On Tue, Sep 22, 2020 at 08:37:22PM +0300, Juha Erkkilä wrote:

> 
> > On 22. Sep 2020, at 15.04, Juha Erkkilä  wrote:
> > 
> >> On 22. Sep 2020, at 9.00, Otto Moerbeek  wrote:
> >> Maybe by hand, but not by using patch(1), the context differs a bit.
> >> 
> >> Next obvious question: did you test if it fixes your problem? That
> >> means, do you get a dump that can be restored again?
> >> 
> >>-Otto
> > 
> > Thanks Otto for a very good question!  So no,
> > do not use that patch as is, it breaks restore
> > as it can not be used to restore any files.
> 
> Actually, I tested this again and now it appears
> dump and restore both work correctly. Previously,
> I first tested dump/restore with an empty filesystem,
> then with some files, and it may be that the second
> time I was accidentally testing restore with the first
> dump file.
> 
> My tests were only with a small amount of files,
> I will do a better test with proper data (about
> 0.5 terabytes and over 10 files) and will
> report again here in a next few days.

Looking through FreeBSD commits I think you want the main.c one as
well, otherwise silent corruption of the dump is still possible.

-Otto

Index: main.c
===
RCS file: /cvs/src/sbin/dump/main.c,v
retrieving revision 1.61
diff -u -p -r1.61 main.c
--- main.c  28 Jun 2019 13:32:43 -  1.61
+++ main.c  24 Sep 2020 10:24:45 -
@@ -92,7 +92,7 @@ main(int argc, char *argv[])
    int ch, mode;
    struct tm then;
    struct statfs fsbuf;
-   int i, anydirskipped, bflag = 0, Tflag = 0, honorlevel = 1;
+   int i, anydirskipped, c_count, bflag = 0, Tflag = 0, honorlevel = 1;
    ino_t maxino;
    time_t t;
    int dirlist;
@@ -442,6 +442,9 @@ main(int argc, char *argv[])
 #endif
    maxino = (ino_t)sblock->fs_ipg * sblock->fs_ncg;
    mapsize = roundup(howmany(maxino, NBBY), TP_BSIZE);
+   c_count = howmany(mapsize * sizeof(char), TP_BSIZE);
+   if (c_count > TP_NINDIR)
+       quit("fs is too large for dump!");
    usedinomap = calloc((unsigned) mapsize, sizeof(char));
    dumpdirmap = calloc((unsigned) mapsize, sizeof(char));
    dumpinomap = calloc((unsigned) mapsize, sizeof(char));
Index: tape.c
===
RCS file: /cvs/src/sbin/dump/tape.c,v
retrieving revision 1.45
diff -u -p -r1.45 tape.c
--- tape.c  28 Jun 2019 13:32:43 -  1.45
+++ tape.c  24 Sep 2020 10:24:45 -
@@ -330,7 +330,8 @@ flushtape(void)
    }
 
    blks = 0;
-   if (spcl.c_type != TS_END) {
+   if (spcl.c_type != TS_END && spcl.c_type != TS_CLRI &&
+       spcl.c_type != TS_BITS) {
        for (i = 0; i < spcl.c_count; i++)
            if (spcl.c_addr[i] != 0)
                blks++;



Re: dump LOB status

2020-09-22 Thread Otto Moerbeek
On Mon, Sep 21, 2020 at 10:23:55PM +0300, Juha Erkkilä wrote:

> 
> 
> > On 16. Sep 2020, at 20.27, Juha Erkkilä  wrote:
> > 
> > 
> >> On 16. Sep 2020, at 0.18, Kenneth Gober  wrote:
> >> I took a very quick look at the source and it appears that 213 is shown in
> >> octal.  I believe that the 200 bit indicates that a core file was produced,
> >> and 13 is probably a signal number (13 octal equals 11 decimal which would
> >> be SIGSEGV).  I am not sure whether the size of the file system is itself
> >> the cause, I have been using dump(8) to back up a large (currently 6.7TB)
> >> volume to tape for years (several tapes, actually) and it works fine,
> >> although that system is still on 6.1/amd64.  I looked in CVS and didn't see
> >> any obvious diffs between 6.1 and 6.6 that jumped out at me as potential
> >> causes, so perhaps the issue has been latent for a long time and I haven't
> >> seen it because it's triggered by the particulars of one or more files
> >> rather than the overall file system size.  Maybe if an individual file gets
> >> too big, or is too 'sparse' or something?
> > 
> > I can reproduce this on -current from Fri Sep 11 11:30:09
> > with a freshly created and an empty filesystem of 2 terabytes.
> 
> It looks like the same issue has been fixed in
> FreeBSD: https://svnweb.freebsd.org/base?view=revision=334979 
> 
> 
> The diff applies cleanly to the current OpenBSD source tree.

Maybe by hand, but not by using patch(1), the context differs a bit.

Next obvious question: did you test if it fixes your problem? That
means, do you get a dump that can be restored again?

-Otto



Re: dump LOB status

2020-09-21 Thread Juha Erkkilä



> On 16. Sep 2020, at 20.27, Juha Erkkilä  wrote:
> 
> 
>> On 16. Sep 2020, at 0.18, Kenneth Gober  wrote:
>> I took a very quick look at the source and it appears that 213 is shown in
>> octal.  I believe that the 200 bit indicates that a core file was produced,
>> and 13 is probably a signal number (13 octal equals 11 decimal which would
>> be SIGSEGV).  I am not sure whether the size of the file system is itself
>> the cause, I have been using dump(8) to back up a large (currently 6.7TB)
>> volume to tape for years (several tapes, actually) and it works fine,
>> although that system is still on 6.1/amd64.  I looked in CVS and didn't see
>> any obvious diffs between 6.1 and 6.6 that jumped out at me as potential
>> causes, so perhaps the issue has been latent for a long time and I haven't
>> seen it because it's triggered by the particulars of one or more files
>> rather than the overall file system size.  Maybe if an individual file gets
>> too big, or is too 'sparse' or something?
> 
> I can reproduce this on -current from Fri Sep 11 11:30:09
> with a freshly created and an empty filesystem of 2 terabytes.

It looks like the same issue has been fixed in
FreeBSD: https://svnweb.freebsd.org/base?view=revision=334979 


The diff applies cleanly to the current OpenBSD source tree.


Re: dump LOB status

2020-09-17 Thread Sebastien Marie
On Tue, Sep 15, 2020 at 03:19:25PM -, Stuart Henderson wrote:
> On 2020-09-15, Jose Soares  wrote:
> > Hi!
> >
> > I am getting the following output from dump:
> >
> >  # dump -0au -f /dev/nrst0 /dev/rsd0d
> >   DUMP: Date of this level 0 dump: Tue Sep 15 16:23:09 2020
> >   DUMP: Date of last level 0 dump: the epoch
> >   DUMP: Dumping /dev/rsd0d to /dev/nrst0
> >   DUMP: mapping (Pass I) [regular files]
> >   DUMP: mapping (Pass II) [directories]
> >   DUMP: estimated 2843256661 tape blocks.
> >   DUMP: Volume 1 started at: Tue Sep 15 16:24:11 2020
> >   DUMP: Child 97414 returns LOB status 213
> >
> > Could you please explain the meaning of "LOB status 213"?
> 
> LOB=low-order byte
> 
> What 213 represents, I'm not sure...
 
The message comes from sbin/dump/tape.c:

    #ifdef TDEBUG
        msg("Tape: %d; parent process: %d child process %d\n",
            tapeno+1, parentpid, childpid);
    #endif /* TDEBUG */
        while ((waitingpid = wait(&status)) != childpid)
            msg("Parent %d waiting for child %d has another child %d return\n",
                parentpid, childpid, waitingpid);
        if (status & 0xFF) {
            msg("Child %d returns LOB status %o\n",
                childpid, status&0xFF);
        }

213 is the octal value (139 decimal, 0x8b) of the low-order byte of the
child's wait status.

As the status is &0xFF, I am not 100% sure, but usually an exit code
of 139 means that the process terminated due to receipt of signal 11,
and generated a coredump.
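
As a rough illustration of that reading, the byte can be split by hand.
This is only a sketch that assumes the traditional BSD layout of the wait
status (signal number in the low seven bits, core-dump flag in the 0200
bit); the helper names are made up for the example, and real code should
use WTERMSIG() and WCOREDUMP() from <sys/wait.h> instead.

```c
#include <stdio.h>

/* Signal number kept in the low seven bits of the status byte. */
static int
lob_signal(int lob)
{
	return lob & 0177;
}

/* Nonzero if the 0200 (core dumped) bit is set. */
static int
lob_coredump(int lob)
{
	return (lob & 0200) != 0;
}

/* Print the byte in octal, the way dump does with %o, plus its parts. */
static void
describe_lob(int lob)
{
	printf("LOB status %o: signal %d%s\n", lob, lob_signal(lob),
	    lob_coredump(lob) ? " (core dumped)" : "");
}
```

Calling describe_lob(0213) splits the value into signal 11 with the core
flag set, which matches the SIGSEGV interpretation.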

Do you have a dump.core file? Can you extract the backtrace?

Thanks.
-- 
Sebastien Marie



Re: dump LOB status

2020-09-16 Thread Juha Erkkilä


> On 16. Sep 2020, at 0.18, Kenneth Gober  wrote:
> I took a very quick look at the source and it appears that 213 is shown in
> octal.  I believe that the 200 bit indicates that a core file was produced,
> and 13 is probably a signal number (13 octal equals 11 decimal which would
> be SIGSEGV).  I am not sure whether the size of the file system is itself
> the cause, I have been using dump(8) to back up a large (currently 6.7TB)
> volume to tape for years (several tapes, actually) and it works fine,
> although that system is still on 6.1/amd64.  I looked in CVS and didn't see
> any obvious diffs between 6.1 and 6.6 that jumped out at me as potential
> causes, so perhaps the issue has been latent for a long time and I haven't
> seen it because it's triggered by the particulars of one or more files
> rather than the overall file system size.  Maybe if an individual file gets
> too big, or is too 'sparse' or something?

I can reproduce this on -current from Fri Sep 11 11:30:09
with a freshly created and an empty filesystem of 2 terabytes.



Re: dump LOB status

2020-09-15 Thread Juha Erkkilä


> On 15. Sep 2020, at 18.54, Jose Soares  wrote:
> 
> Thank you, Stuart.
> 
> I am facing this when issuing the dump command of a "large" file system
> (2.7TB).
> dump command has finished successfully for the other smaller file systems.
> 
> # df -h
> Filesystem     Size    Used   Avail Capacity  Mounted on
> /dev/wd0a      2.0G    237M    1.6G    12%    /
> /dev/sd0d     10.8T    2.7T    7.6T    26%    /home
> /dev/wd0d      3.9G    146K    3.7G     0%    /tmp
> /dev/wd0f      3.9G    956M    2.8G    25%    /usr
> /dev/wd0g      2.0G    253M    1.6G    13%    /usr/X11R6
> /dev/wd0h      5.9G   15.6M    5.6G     0%    /usr/local
> /dev/wd0j      3.1G    2.0K    3.0G     0%    /usr/obj
> /dev/wd0i      2.0G    2.0K    1.9G     0%    /usr/src
> /dev/wd0e      5.9G    106M    5.5G     2%    /var
> 
> The only contribution I was able to find via Google was
> https://bugs.freebsd.org/bugzilla/show_bug.cgi?id=244470 where a similar
> problem was being reported also regarding a dump of a large file system,
> but for FreeBSD.
> 
> Any suggestion to get the dump working or to better understand what is
> happening?

dump segfaults at tape.c line 335: the index into spcl.c_addr[i] overflows
the array.

A workaround is to raise TP_BSIZE from 1024 to
something bigger (maybe 8192?) in /usr/include/protocols/dumprestore.h
and recompile dump. Not a proper fix!
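
To show why a bigger TP_BSIZE moves the limit, here is a back-of-the-envelope
sketch. It assumes the overflow comes from the inode map records, where
c_count holds the number of map blocks, and that TP_NINDIR is TP_BSIZE/2 as
in dumprestore.h. The macro and function names are invented for the example.

```c
#define BITS_PER_BYTE	8
#define HOWMANY(x, y)	(((x) + ((y) - 1)) / (y))
#define ROUNDUP(x, y)	(HOWMANY((x), (y)) * (y))

/* Number of tp_bsize-sized blocks needed for a bitmap of maxino inodes. */
static long long
map_blocks(long long maxino, long long tp_bsize)
{
	long long mapsize = ROUNDUP(HOWMANY(maxino, BITS_PER_BYTE), tp_bsize);

	return HOWMANY(mapsize, tp_bsize);
}

/* Largest inode count whose bitmap still fits in TP_NINDIR blocks. */
static long long
max_inodes(long long tp_bsize)
{
	long long tp_nindir = tp_bsize / 2;	/* assumed TP_NINDIR definition */

	return tp_nindir * tp_bsize * BITS_PER_BYTE;
}
```

Under these assumptions a TP_BSIZE of 1024 allows 512 map blocks, enough for
4194304 inodes, while 8192 allows 4096 map blocks, enough for 268435456
inodes. One inode over the limit needs one block more than c_addr has
entries.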

(Also happened to me maybe a week ago, recent -current,
indeed the filesystem was big (2 terabytes)).



Re: dump LOB status

2020-09-15 Thread Kenneth Gober
On Tue, Sep 15, 2020 at 12:04 PM Jose Soares 
wrote:

> I am facing this when issuing the dump command of a "large" file system
> (2.7TB).
> dump command has finished successfully for the other smaller file systems.
>
> On Tue, Sep 15, 2020 at 4:47 PM Stuart Henderson 
> wrote:
> > On 2020-09-15, Jose Soares  wrote:
> > >   DUMP: Child 97414 returns LOB status 213
> > >
> > > Could you please explain the meaning of "LOB status 213"?
> >
> > LOB=low-order byte
> >
> > What 213 represents, I'm not sure...
>

I took a very quick look at the source and it appears that 213 is shown in
octal.  I believe that the 200 bit indicates that a core file was produced,
and 13 is probably a signal number (13 octal equals 11 decimal which would
be SIGSEGV).  I am not sure whether the size of the file system is itself
the cause, I have been using dump(8) to back up a large (currently 6.7TB)
volume to tape for years (several tapes, actually) and it works fine,
although that system is still on 6.1/amd64.  I looked in CVS and didn't see
any obvious diffs between 6.1 and 6.6 that jumped out at me as potential
causes, so perhaps the issue has been latent for a long time and I haven't
seen it because it's triggered by the particulars of one or more files
rather than the overall file system size.  Maybe if an individual file gets
too big, or is too 'sparse' or something?

-ken


Re: dump LOB status

2020-09-15 Thread Jose Soares
Thank you, Stuart.

I am facing this when issuing the dump command of a "large" file system
(2.7TB).
dump command has finished successfully for the other smaller file systems.

# df -h
Filesystem     Size    Used   Avail Capacity  Mounted on
/dev/wd0a      2.0G    237M    1.6G    12%    /
/dev/sd0d     10.8T    2.7T    7.6T    26%    /home
/dev/wd0d      3.9G    146K    3.7G     0%    /tmp
/dev/wd0f      3.9G    956M    2.8G    25%    /usr
/dev/wd0g      2.0G    253M    1.6G    13%    /usr/X11R6
/dev/wd0h      5.9G   15.6M    5.6G     0%    /usr/local
/dev/wd0j      3.1G    2.0K    3.0G     0%    /usr/obj
/dev/wd0i      2.0G    2.0K    1.9G     0%    /usr/src
/dev/wd0e      5.9G    106M    5.5G     2%    /var

The only contribution I was able to find via Google was
https://bugs.freebsd.org/bugzilla/show_bug.cgi?id=244470 where a similar
problem was being reported also regarding a dump of a large file system,
but for FreeBSD.

Any suggestion to get the dump working or to better understand what is
happening?

Jose Soares



On Tue, Sep 15, 2020 at 4:47 PM Stuart Henderson 
wrote:

> On 2020-09-15, Jose Soares  wrote:
> > Hi!
> >
> > I am getting the following output from dump:
> >
> >  # dump -0au -f /dev/nrst0 /dev/rsd0d
> >   DUMP: Date of this level 0 dump: Tue Sep 15 16:23:09 2020
> >   DUMP: Date of last level 0 dump: the epoch
> >   DUMP: Dumping /dev/rsd0d to /dev/nrst0
> >   DUMP: mapping (Pass I) [regular files]
> >   DUMP: mapping (Pass II) [directories]
> >   DUMP: estimated 2843256661 tape blocks.
> >   DUMP: Volume 1 started at: Tue Sep 15 16:24:11 2020
> >   DUMP: Child 97414 returns LOB status 213
> >
> > Could you please explain the meaning of "LOB status 213"?
>
> LOB=low-order byte
>
> What 213 represents, I'm not sure...
>
>
>


Re: dump LOB status

2020-09-15 Thread Stuart Henderson
On 2020-09-15, Jose Soares  wrote:
> Hi!
>
> I am getting the following output from dump:
>
>  # dump -0au -f /dev/nrst0 /dev/rsd0d
>   DUMP: Date of this level 0 dump: Tue Sep 15 16:23:09 2020
>   DUMP: Date of last level 0 dump: the epoch
>   DUMP: Dumping /dev/rsd0d to /dev/nrst0
>   DUMP: mapping (Pass I) [regular files]
>   DUMP: mapping (Pass II) [directories]
>   DUMP: estimated 2843256661 tape blocks.
>   DUMP: Volume 1 started at: Tue Sep 15 16:24:11 2020
>   DUMP: Child 97414 returns LOB status 213
>
> Could you please explain the meaning of "LOB status 213"?

LOB=low-order byte

What 213 represents, I'm not sure...