[jira] [Updated] (GEODE-7989) Improve logging of exceptions that happen during execution of backup

Jakov Varenina (Jira) Thu, 16 Apr 2020 03:01:06 -0700


     [ https://issues.apache.org/jira/browse/GEODE-7989?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]


Jakov Varenina updated GEODE-7989:
----------------------------------
    Description: 
When a backup running on the servers fails with an exception, e.g. 
"IOException: Not enough space left on device", that exception (feedback) 
is not propagated to the caller of the DistributedSystemMXBean.backupAllMembers 
API. The caller only gets the list of members and disk stores for which the 
backup succeeded. There is no indication of what caused the backup to fail on 
the other members, because the exception is not logged on the server when the 
log level is less verbose than debug (config, warn, ...). It would be good to 
have at least better logging for the following cases: 

1. The disk where oplogs are saved is too small for the new oplog created by 
the Geode backup procedure. This step is executed in the Geode backup phase 
startDiskStoreBackup. If there is not enough space left on the device, Geode 
logs the exception only at DEBUG level (see below). It would be good to have 
it logged at info or warning level.

2. There is not enough space on the disk to which oplogs are copied for the 
backup (this need not be the same disk as in case 1, and in our case it is 
not). This step in Geode is called completeBackup. It logs nothing, not even 
at debug level, if a problem occurs, but the affected disk stores are reported 
as offline (DiskBackupStatus.getOfflineDiskStores()). It would be good to have 
this exception logged at info or warning level.

Exception logged only at debug level:

java.io.IOException: Not enough space left on device
        at org.apache.geode.internal.shared.NativeCallsJNAImpl$POSIXNativeCalls.preBlow(NativeCallsJNAImpl.java:296)
        at org.apache.geode.internal.cache.Oplog.preblow(Oplog.java:1007)
        at org.apache.geode.internal.cache.Oplog.createCrf(Oplog.java:1073)
        at org.apache.geode.internal.cache.Oplog.<init>(Oplog.java:646)
        at org.apache.geode.internal.cache.Oplog.switchOpLog(Oplog.java:3723)
        at org.apache.geode.internal.cache.Oplog.forceRolling(Oplog.java:3643)
        at org.apache.geode.internal.cache.PersistentOplogSet.forceRoll(PersistentOplogSet.java:199)
        at org.apache.geode.internal.cache.backup.BackupTask.startDiskStoreBackup(BackupTask.java:274)
        at org.apache.geode.internal.cache.backup.BackupTask.startDiskStoreBackups(BackupTask.java:149)
        at org.apache.geode.internal.cache.backup.BackupTask.doBackup(BackupTask.java:111)
        at org.apache.geode.internal.cache.backup.BackupTask.backup(BackupTask.java:82)
        at org.apache.geode.internal.cache.backup.BackupService.lambda$prepareBackup$0(BackupService.java:62)
        at java.base/java.util.concurrent.FutureTask.run(FutureTask.java:264)
        at java.base/java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1128)
        at java.base/java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:628)
        at java.base/java.lang.Thread.run(Thread.java:834)
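
As a rough illustration of the logging change requested in cases 1 and 2 
above, the following is a minimal sketch and not Geode code. It assumes 
java.util.logging in place of Geode's own logging, and the names 
BackupLoggingSketch, BackupStep and runStep are hypothetical. The point is 
only that the exception is passed to the logger at warning level, so it 
remains visible when debug output is disabled.

```java
import java.io.IOException;
import java.util.logging.Level;
import java.util.logging.Logger;

/** Hypothetical sketch; none of these names are Geode APIs. */
public class BackupLoggingSketch {

  /** Stand-in for a backup phase such as startDiskStoreBackup or completeBackup. */
  @FunctionalInterface
  public interface BackupStep {
    void run() throws IOException;
  }

  /**
   * Runs one backup step and returns true on success. A failure is logged
   * at WARNING together with the exception, so it stays visible when the
   * configured level is less verbose than debug (FINE in java.util.logging).
   */
  public static boolean runStep(Logger logger, String diskStoreName, BackupStep step) {
    try {
      step.run();
      return true;
    } catch (IOException e) {
      // Passing the exception as the last argument keeps the stack trace in the log.
      logger.log(Level.WARNING, "Backup of disk store " + diskStoreName + " failed", e);
      return false;
    }
  }

  public static void main(String[] args) {
    Logger logger = Logger.getLogger("backup-sketch");
    logger.setLevel(Level.INFO); // debug (FINE) records are dropped at this level

    // Simulate the failure from the stack trace above.
    boolean ok = runStep(logger, "diskStore1", () -> {
      throw new IOException("Not enough space left on device");
    });
    System.out.println("backup succeeded: " + ok);
  }
}
```

With the logger set to INFO, the simulated failure is still reported because 
it is logged at WARNING. Whether info or warning is the better level for the 
real fix is left open, as in the description.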



  was:
When a backup running on the servers fails with an exception, e.g. 
"IOException: Not enough space left on device", that exception (feedback) 
is not propagated to the caller of the DistributedSystemMXBean.backupAllMembers 
API. The caller only gets the list of members and disk stores for which the 
backup succeeded. There is no indication of what caused the backup to fail on 
the other members, because the exception is not logged on the server when the 
log level is less verbose than debug (config, warn, ...). It would be good to 
have at least better logging for the following cases: 

1. The disk where oplogs are saved is too small for the new oplog created by 
the Geode backup procedure. This step is executed in the Geode backup phase 
startDiskStoreBackup. If there is not enough space left on the device, Geode 
logs the exception only at DEBUG level (see below). It would be good to have 
it logged at info or warning level.

2. There is not enough space on the disk to which oplogs are copied for the 
backup (this need not be the same disk as in case 1, and in our case it is 
not). This step in Geode is called completeBackup. It logs nothing, not even 
at debug level, if a problem occurs, but the affected disk stores are reported 
as offline (DiskBackupStatus.getOfflineDiskStores()).

Exception logged only at debug level:

java.io.IOException: Not enough space left on device
        at org.apache.geode.internal.shared.NativeCallsJNAImpl$POSIXNativeCalls.preBlow(NativeCallsJNAImpl.java:296)
        at org.apache.geode.internal.cache.Oplog.preblow(Oplog.java:1007)
        at org.apache.geode.internal.cache.Oplog.createCrf(Oplog.java:1073)
        at org.apache.geode.internal.cache.Oplog.<init>(Oplog.java:646)
        at org.apache.geode.internal.cache.Oplog.switchOpLog(Oplog.java:3723)
        at org.apache.geode.internal.cache.Oplog.forceRolling(Oplog.java:3643)
        at org.apache.geode.internal.cache.PersistentOplogSet.forceRoll(PersistentOplogSet.java:199)
        at org.apache.geode.internal.cache.backup.BackupTask.startDiskStoreBackup(BackupTask.java:274)
        at org.apache.geode.internal.cache.backup.BackupTask.startDiskStoreBackups(BackupTask.java:149)
        at org.apache.geode.internal.cache.backup.BackupTask.doBackup(BackupTask.java:111)
        at org.apache.geode.internal.cache.backup.BackupTask.backup(BackupTask.java:82)
        at org.apache.geode.internal.cache.backup.BackupService.lambda$prepareBackup$0(BackupService.java:62)
        at java.base/java.util.concurrent.FutureTask.run(FutureTask.java:264)
        at java.base/java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1128)
        at java.base/java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:628)
        at java.base/java.lang.Thread.run(Thread.java:834)




> Improve logging of exceptions that happen during execution of backup
> --------------------------------------------------------------------
>
>                 Key: GEODE-7989
>                 URL: https://issues.apache.org/jira/browse/GEODE-7989
>             Project: Geode
>          Issue Type: Improvement
>            Reporter: Jakov Varenina
>            Assignee: Jakov Varenina
>            Priority: Major
>          Time Spent: 10m
>  Remaining Estimate: 0h
>
> When a backup running on the servers fails with an exception, e.g. 
> "IOException: Not enough space left on device", that exception (feedback) 
> is not propagated to the caller of the 
> DistributedSystemMXBean.backupAllMembers API. The caller only gets the list 
> of members and disk stores for which the backup succeeded. There is no 
> indication of what caused the backup to fail on the other members, because 
> the exception is not logged on the server when the log level is less verbose 
> than debug (config, warn, ...). It would be good to have at least better 
> logging for the following cases: 
> 1. The disk where oplogs are saved is too small for the new oplog created by 
> the Geode backup procedure. This step is executed in the Geode backup phase 
> startDiskStoreBackup. If there is not enough space left on the device, Geode 
> logs the exception only at DEBUG level (see below). It would be good to have 
> it logged at info or warning level.
> 2. There is not enough space on the disk to which oplogs are copied for the 
> backup (this need not be the same disk as in case 1, and in our case it is 
> not). This step in Geode is called completeBackup. It logs nothing, not even 
> at debug level, if a problem occurs, but the affected disk stores are 
> reported as offline (DiskBackupStatus.getOfflineDiskStores()). It would be 
> good to have this exception logged at info or warning level.
> Exception logged only at debug level:
> java.io.IOException: Not enough space left on device
>         at org.apache.geode.internal.shared.NativeCallsJNAImpl$POSIXNativeCalls.preBlow(NativeCallsJNAImpl.java:296)
>         at org.apache.geode.internal.cache.Oplog.preblow(Oplog.java:1007)
>         at org.apache.geode.internal.cache.Oplog.createCrf(Oplog.java:1073)
>         at org.apache.geode.internal.cache.Oplog.<init>(Oplog.java:646)
>         at org.apache.geode.internal.cache.Oplog.switchOpLog(Oplog.java:3723)
>         at org.apache.geode.internal.cache.Oplog.forceRolling(Oplog.java:3643)
>         at org.apache.geode.internal.cache.PersistentOplogSet.forceRoll(PersistentOplogSet.java:199)
>         at org.apache.geode.internal.cache.backup.BackupTask.startDiskStoreBackup(BackupTask.java:274)
>         at org.apache.geode.internal.cache.backup.BackupTask.startDiskStoreBackups(BackupTask.java:149)
>         at org.apache.geode.internal.cache.backup.BackupTask.doBackup(BackupTask.java:111)
>         at org.apache.geode.internal.cache.backup.BackupTask.backup(BackupTask.java:82)
>         at org.apache.geode.internal.cache.backup.BackupService.lambda$prepareBackup$0(BackupService.java:62)
>         at java.base/java.util.concurrent.FutureTask.run(FutureTask.java:264)
>         at java.base/java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1128)
>         at java.base/java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:628)
>         at java.base/java.lang.Thread.run(Thread.java:834)



--
This message was sent by Atlassian Jira
(v8.3.4#803005)
