Ok, so I've checked nearly everything hardware related:

- replaced the Raspberry Pi to rule out a defective NIC
- replaced the patch cable
- moved the RPI to the same switch as the other RPI that is backing up fine

I was just about to build the FD from source when I took a closer look at the point in time when the error occurred:



2018-03-30 20:51:11 heBacula-dir JobId 1550: Sending Accurate information to the FD.
2018-03-30 20:51:24 heBacula-dir JobId 1550: Error: bsock.c:721 Write error sending 115 bytes to Client: heRPI02-fd:heRPI02:9102: ERR=Connection reset by peer


So I set accurate = no in the Job configuration.

From the bacula docs:

*Accurate = <yes|no>*
    In accurate mode, the File daemon knows exactly which files were
    present after the last backup. So it is able to handle deleted or
    renamed files.

    When restoring a FileSet for a specified date (including "most
    recent"), Bacula is able to restore exactly the files and
    directories that existed at the time of the last backup prior to
    that date including ensuring that deleted files are actually
    deleted, and renamed directories are restored properly.

    In this mode, the File daemon must keep data concerning all files
    in memory. So If you do not have sufficient memory, the backup may
    either be terribly slow or fail.

    _*For 500.000 files (a typical desktop linux system), it will
    require approximately 64 Megabytes of RAM*_ on your File daemon to
    hold the required information.



IMO the RPI should easily be able to spare 64 MB of RAM, but the last 20ish incremental backups ran fine, so setting Accurate = no seems to be a viable workaround.
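
As a rough cross-check of that memory figure, here is a minimal sketch that scales the documented number down to this job. It assumes memory use grows linearly with the number of files, which the docs do not state explicitly. The file count is the files=50654 value from the FD debug output quoted below, and the function name is made up for illustration.

```python
# Figures from the Bacula docs quoted above: roughly 64 MB for 500,000 files.
DOC_FILES = 500_000
DOC_MEMORY_MB = 64

# File count reported in the FD debug output ("accurate files=50654").
JOB_FILES = 50_654


def estimate_accurate_memory_mb(files: int) -> float:
    """Scale the documented figure linearly (an assumption, not a Bacula formula)."""
    return DOC_MEMORY_MB * files / DOC_FILES


estimate_mb = estimate_accurate_memory_mb(JOB_FILES)
print(f"Estimated accurate-mode memory for {JOB_FILES} files: {estimate_mb:.1f} MB")
```

Under that assumption the job would need only a few megabytes, which fits the impression that plain memory exhaustion is unlikely to explain the crash.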

Cheers, Thorsten



On 30.03.2018 21:22, Johannsen, Thorsten wrote:
Hello list,

I have two Raspberry Pis running Raspbian:


One of the Pis (heRPI01) is being backed up without any problems. The other one (heRPI02), however, has refused to make backups after a first, successful backup:


$ uname -a
Linux heRPI02 4.9.80-v7+ #1098 SMP Fri Mar 9 19:11:42 GMT 2018 armv7l GNU/Linux

$ cat /etc/debian_version
9.4


From the job log:

2018-03-30 20:51:11 heBacula-dir JobId 1550: Start Backup JobId 1550, Job=Job_heRPI02.2018-03-30_20.51.09_16
2018-03-30 20:51:11 heBacula-dir JobId 1550: Using Device "AC-IncBackup-Dev03" to write.
2018-03-30 20:51:11 heBacula-dir JobId 1550: Sending Accurate information to the FD.
2018-03-30 20:51:24 heBacula-dir JobId 1550: Error: bsock.c:721 Write error sending 115 bytes to Client: heRPI02-fd:heRPI02:9102: ERR=Connection reset by peer
2018-03-30 20:51:24 heBacula-dir JobId 1550: Error: bsock.c:609 Socket has errors=1 on call to Client: heRPI02-fd:heRPI02:9102
2018-03-30 20:51:24 heBacula-dir JobId 1550: Fatal error: Network error with FD during Backup: ERR=Connection reset by peer
2018-03-30 20:52:24 heBacula-dir JobId 1550: Fatal error: No Job status returned from FD.

[...]

  Elapsed time:           1 min 12 secs
  Priority:               10
  FD Files Written:       0
  SD Files Written:       0
  FD Bytes Written:       0 (0 B)
  SD Bytes Written:       0 (0 B)
  Rate:                   0.0 KB/s
  Software Compression:   None
  Comm Line Compression:  None
  Snapshot/VSS:           no
  Encryption:             no
  Accurate:               yes
  Volume name(s):
  Volume Session Id:      1
  Volume Session Time:    1522436494
  Last Volume Bytes:      0 (0 B)
  Non-fatal FD errors:    2
  SD Errors:              0
  FD termination status:  Error
  SD termination status:  OK
  Termination:            *** Backup Error ***




I started the FD and the SD with "-d 999" and uploaded the debug output to pastebin:



SD: https://pastebin.com/Q2tSTWx7

FD: https://pastebin.com/Rq5TSiuN





As an end user (in contrast to a developer ;-) ) I can see that the FD segfaults: (<-- <-- <--)

heRPI02-fd: job.c:311-1552 Executing Dir accurate files=50654
 command.
heRPI02-fd: htable.c:67-1552 malloc buf=752be020 size=1000000 rem=19673568
heRPI02-fd: htable.c:130-1552 Leave hash_index hash=0x752be068 index=59779085
heRPI02-fd: htable.c:283-1552 Insert: hash=0 index=59779085
heRPI02-fd: htable.c:286-1552 Insert hp=752be030 index=1507 item=752be030 offset=0
heRPI02-fd: htable.c:293-1552 Insert hp->next=0 hp->hash=0x390280d hp->key=<NULL>
heRPI02-fd: htable.c:299-1552 Leave insert index=1507 num_items=1 key=/mnt/
heRPI02-fd: accurate.c:230-1552 add fname=</mnt/> lstat=LMC HtB EHt C A A A BAA BAA I BaqFXr BaqEVw BaqFXr A A d  delta_seq=0 chksum=
heRPI02-fd: htable.c:130-1552 Leave hash_index hash=0x12c4528 index=-1036830508
heRPI02-fd: htable.c:283-1552 Insert: hash=3b2ac index=-1036830508
heRPI02-fd: htable.c:286-1552 Insert hp=752be0b0 index=7140 item=752be0b0 offset=0
heRPI02-fd: signal.c:135-1552 sig=11 Segmentation violation <-- <-- <-- <--
heRPI02-fd: signal.c:205-1552 Working=/var/lib/bacula
heRPI02-fd: signal.c:206-1552 btpath=/usr/sbin/btraceback
heRPI02-fd: signal.c:207-1552 exepath=/usr/sbin/bacula-fd
heRPI02-fd: signal.c:236-1552 Doing waitpid
bsmtp: bsmtp.c:488-0 Failed to connect to mailhost localhost
heRPI02-fd: signal.c:238-1552 Done waitpid
heRPI02-fd: lockmgr.c:1157-1552 lockmgr disabled



Any ideas what could be wrong? Anything obvious I'm missing?



Thanks a lot and have a nice (Easter) weekend!


Cheers,

Thorsten





------------------------------------------------------------------------------
Check out the vibrant tech community on one of the world's most
engaging tech sites, Slashdot.org! http://sdm.link/slashdot
_______________________________________________
Bacula-users mailing list
Bacula-users@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/bacula-users

