I’ve just discovered a new failure on a different container too;

# lxc move host2:nexus host1:nexus
error: Error transferring container data: checkpoint failed:
(00.355457) Error (files-reg.c:422): Can't dump ghost file /usr/local/sonatype-work/nexus/tmp/jar_cache5838699621686145685.tmp of 1177738 size, increase limit
(00.355477) Error (cr-dump.c:1255): Dump files (pid: 22072) failed with -1
(00.357100) Error (cr-dump.c:1617): Dumping FAILED.
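The "increase limit" hint in the first error appears to refer to CRIU's size limit for ghost files (files that are deleted but still open). A rough sketch of the arithmetic follows, assuming the default limit is 1 MiB; that figure is an assumption about CRIU's default and is not stated in the log. The regular expression and the helper name `ghost_overshoot` are made up for illustration.

```python
import re

# Assumption: CRIU's default ghost-file limit is 1 MiB. The log line only says
# the file exceeded the limit; it does not state what the limit is.
ASSUMED_GHOST_LIMIT = 1 << 20  # 1048576 bytes

# Hypothetical pattern for the "Can't dump ghost file" message shown above.
GHOST_RE = re.compile(r"Can't dump ghost file (?P<path>\S+) of (?P<size>\d+) size")


def ghost_overshoot(log_line, limit=ASSUMED_GHOST_LIMIT):
    """Return (path, size, bytes over limit), or None if the line doesn't match."""
    match = GHOST_RE.search(log_line)
    if match is None:
        return None
    size = int(match.group("size"))
    return match.group("path"), size, size - limit


line = (
    "(00.355457) Error (files-reg.c:422): Can't dump ghost file "
    "/usr/local/sonatype-work/nexus/tmp/jar_cache5838699621686145685.tmp "
    "of 1177738 size, increase limit"
)
result = ghost_overshoot(line)
print(result)
```

Under that assumption the temp file is 129162 bytes (roughly 126 KiB) over the limit. CRIU itself, as far as I know, accepts a `--ghost-limit` option; whether this LXD version offers a way to pass it through is not something the output above shows.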

On 06/11/2015, 08:40, "lxc-users on behalf of Jamie Brown" <lxc-users-boun...@lists.linuxcontainers.org on behalf of jamie.br...@mpec.co.uk> wrote:

>Tycho,
>
>Thanks for your help.
>
>The kernels were in fact different versions, though I’m not sure how I got
>into that state! So they’re now both running 3.19.0.
>
>Now, I at least receive the same error when migrating in both directions;
># lxc move host2:test host1:test2
>error: Error transferring container data: restore failed:
>(00.008103) 1: Error (mount.c:2030): Can't mount at ./dev/.lxd-mounts: No such file or directory
>
># lxc move host1:test1 host2:test1
>error: Error transferring container data: restore failed:
>(00.008103) 1: Error (mount.c:2030): Can't mount at ./dev/.lxd-mounts: No such file or directory
>
>The backing store is the default (directory based). However, on host2 the
>/var/lib/lxd/containers directory is a symlink to an ext3 mount. On host1
>they’re on ext4, is that likely to cause any issues?
>
>The strange thing is, [randomly] the live move DOES succeed. I’ve definitely
>migrated a clean [running] container about 3 times from host2 to host1, but
>then when I try again with a new container it fails. This even worked before I
>updated the kernel. However, I can’t seem to find specific steps to replicate
>the successful move. I’ve never succeeded in migrating the same container back
>from host1 to host2 without stopping it. This is what is concerning me the
>most, I would expect either permanent failure or permanent success. I keep
>gaining false hope because the first time I migrated a container after
>updating the kernel it worked, so I thought, problem solved! But then I
>couldn’t migrate another :(
>
>-- Jamie
>
>05/11/2015, 16:58, "lxc-users on behalf of Tycho Andersen"
><lxc-users-boun...@lists.linuxcontainers.org on behalf of
>tycho.ander...@canonical.com> wrote:
>
>>Hi Jamie,
>>
>>Thanks for trying it out.
>>
>>On Thu, Nov 05, 2015 at 11:39:43AM +0000, Jamie Brown wrote:
>>> Hello again,
>>>
>>> Oddly, I've now re-installed the old server and configured it identically
>>> to before (except now using RAID) and tried migrating a container back and
>>> I am getting a different failure;
>>>
>>> # lxc move host2:test host1:test
>>>
>>> error: Error transferring container data: restore failed:
>>> (00.007414) 1: Error (mount.c:2030): Can't mount at ./dev/.lxd-mounts: No such file or directory
>>> (00.026443) Error (cr-restore.c:1939): Restoring FAILED.
>>>
>>> The container appears in the remote container list whilst moving, but then
>>> after failure it is deleted and it is in the STOPPED state on the source
>>> host.
>>
>>Right, the restore failed, so the container had already been stopped
>>from the dump, so it was stopped on the target. What we should really
>>do is leave it in a frozen state after the dump, and once the restore
>>succeeds then we can kill it. Hopefully that's something I can
>>implement this cycle.
>>
>>As for the actual error, sounds like the target LXD didn't have
>>shmounts but the source one did. Are they using different backing
>>stores? What version of LXD are they?
>>
>>>
>>> Here's the output from the log, not sure how much is relevant to the
>>> migration attempt.
>>>
>>> # lxc info --show-log test
>>> ...
>>> lxc 1446723150.396 DEBUG lxc_start - start.c:__lxc_start:1210 - unknown exit status for init: 9
>>> lxc 1446723150.396 DEBUG lxc_start - start.c:__lxc_start:1215 - Pushing physical nics back to host namespace
>>> lxc 1446723150.396 DEBUG lxc_start - start.c:__lxc_start:1218 - Tearing down virtual network devices used by container
>>> lxc 1446723150.396 WARN lxc_conf - conf.c:lxc_delete_network:2939 - failed to remove interface '(null)'
>>> lxc 1446723150.396 INFO lxc_error - error.c:lxc_error_set_and_log:55 - child <10499> ended on signal (9)
>>> lxc 1446723150.396 WARN lxc_conf - conf.c:lxc_delete_network:2939 - failed to remove interface '(null)'
>>> lxc 1446723295.520 WARN lxc_cgmanager - cgmanager.c:cgm_get:993 - do_cgm_get exited with error
>>> lxc 1446723295.522 WARN lxc_cgmanager - cgmanager.c:cgm_get:993 - do_cgm_get exited with error
>>>
>>> If I try to migrate a container in the reverse direction, I get a similar
>>> error;
>>>
>>> # lxc move host1:test1 host2:test1
>>> error: Error transferring container data: restore failed:
>>> (00.001093) Error (cgroup.c:1204): cg: Can't mount controller dir .criu.cgyard.aOuQtF/net_cls: No such file or directory
>>
>>This is probably because the kernel on host1 is newer than the
>>kernel on host2 and has net_cls cgroup support where as host2's
>>doesn't.
>>
>>Tycho
>>
>>>
>>> Any ideas?
>>>
>>> -- Jamie
>>>
>>> On 05/11/2015, 08:05, "lxc-users on behalf of Jamie Brown"
>>> <lxc-users-boun...@lists.linuxcontainers.org on behalf of
>>> jamie.br...@mpec.co.uk> wrote:
>>>
>>> >Thanks Tycho, installing CRIU solved the problem;
>>> >
>>> ># apt-get install criu
>>> >
>>> >Should this package not be included as a dependency for LXD, or at least
>>> >provide a meaningful warning if the package isn’t available? It seems odd
>>> >to advertise out-the-box live migration in LXD, but then have to install
>>> >another package to provide it.
>>> >
>>> >Is this in the documentation anywhere?
>>> >
>>> >Thanks again.
>>> >
>>> >-- Jamie
>>> >
>>> >On 04/11/2015, 16:47, "lxc-users on behalf of Tycho Andersen"
>>> ><lxc-users-boun...@lists.linuxcontainers.org on behalf of
>>> >tycho.ander...@canonical.com> wrote:
>>> >
>>> >>On Wed, Nov 04, 2015 at 01:48:44PM +0000, Jamie Brown wrote:
>>> >>> Greetings all.
>>> >>>
>>> >>> I’ve been using LXD in a development environment for a few weeks and so
>>> >>> far very impressed,
>>> >>> I can see a really bright future for this technology!
>>> >>>
>>> >>> However, today I thought I’d try out the live migration, based on the
>>> >>> following guide;
>>> >>> https://insights.ubuntu.com/2015/05/06/live-migration-in-lxd/
>>> >>>
>>> >>> I believe I have followed the steps correctly, however when I run the
>>> >>> move command, I
>>> >>> receive the following output;
>>> >>>
>>> >>> # lxc move host1:test host2:test
>>> >>> error: Error transferring container data: checkpoint failed:
>>> >>> Problem accessing CRIU log: open /tmp/lxd_migration_899480871/dump.log: no such file or directory
>>> >>>
>>> >>> The file it is referring to above doesn't exist. However, there are
>>> >>> other lxd_migration_*
>>> >>> directories with different numbers appended. Each time I attempt the
>>> >>> migration a new directory
>>> >>> is created (e.g. lxd_migration_192965652), but there is no dump.log in
>>> >>> there.
>>> >>>
>>> >>> The migration doesn't create a log file as per the guide above in;
>>> >>> /var/log/lxd/test/migration_{dump|restore}_.log
>>> >>>
>>> >>> Steps I've taken;
>>> >>>
>>> >>> - Copied all profiles from host1 to host2
>>> >>> - Added the migratable profile to the container
>>> >>> - Removed lxcfs package (on both hosts)
>>> >>> - Added the remote HTTPS hosts for both the local and remote hosts
>>> >>>
>>> >>> Both hosts are running Ubuntu 14.04.3 LTS (x64) with LXD version 0.21.
>>> >>>
>>> >>> The only difference I can tell between my hosts and the guide is that
>>> >>> the 'migratable'
>>> >>> profile (which came out-the-box with my LXD installation) doesn't
>>> >>> contain the autostart
>>> >>> entries as in the guide above;
>>> >>>
>>> >>> # lxc profile show migratable
>>> >>> name: migratable
>>> >>> config:
>>> >>>   raw.lxc: |-
>>> >>>     lxc.console = none
>>> >>>     lxc.cgroup.devices.deny = c 5:1 rwm
>>> >>>     lxc.seccomp =
>>> >>>   security.privileged: "true"
>>> >>> devices: {}
>>> >>>
>>> >>> Any help would be much appreciated!
>>> >>
>>> >>Have you installed CRIU? lxc info --show-log test probably has more
>>> >>info about what failed, but my guess is that it can't find CRIU if you
>>> >>haven't installed it.
>>> >>
>>> >>Tycho
>>> >>
>>> >>> Thank you,
>>> >>>
>>> >>> Jamie
>>> >>>
>>> >>> _______________________________________________
>>> >>> lxc-users mailing list
>>> >>> lxc-users@lists.linuxcontainers.org
>>> >>> http://lists.linuxcontainers.org/listinfo/lxc-users
_______________________________________________
lxc-users mailing list
lxc-users@lists.linuxcontainers.org
http://lists.linuxcontainers.org/listinfo/lxc-users