Bug#987368: Installer fails at first menu "Choose language"

2021-06-11 Thread Frédéric Bonnard
Hi Steve,

awesome, couldn't wait to test it, so I built udpkg 1.20 from salsa and
integrated it into the latest debian-installer build to get a mini.iso to
play with.

I tried that iso 5-6 times on each of the 2 physical Power machines on
which I encountered the bug and it worked every single time.

From my perspective, it's a huge improvement for bullseye.

Thanks Steve and everyone!

F.


On Fri, 11 Jun 2021 02:57:21 +0100, Steve McIntyre  wrote:
> On Thu, Jun 10, 2021 at 12:11:05AM +0100, Steve McIntyre wrote:
> >
> >Looking at the history in this bug, things are not working as we hoped
> >when we added the multi-console support. When I initially worked with
> >Wookey on this, we didn't see errors like this at all in
> >testing. That's not to say that there's *not* a problem here, but
> >maybe other changes made since then have caused it to be uncovered.
> >
> >Multi-console support is a significant improvement for a number of
> >non-x86 users. This is particularly the case for those with arm64
> >systems where the firmware might default to the primary console being
> >a serial port but the user doesn't even know that. We wanted to be
> >able to start d-i on all the likely-looking consoles (serial *and* tty
> >*and* graphical), allowing the user to interact with the one they
> >preferred.
> >
> >In our testing, I don't remember ever seeing udpkg invocations racing
> >against each other like this. But in my own testing for d-i Bullseye
> >RC2 in an arm64 VM here I've just seen this exact problem myself so
> >it's clearly a thing!
> >
> >I'm looking at udpkg now to see what I can do there. I'm hoping that
> >it might be a reasonably quick fix to use filesystem-based locking
> >around status file updates.
> 
> Having experimented with exactly that, after a little bit of tweaking
> I think I've fixed the bug. Previously I could reproduce this bug
> readily, ~75% of the time on my local arm64 VM. With my new udpkg
> build included, I've just run things through a dozen times in
> succession with no problem encountered. I think that's good enough, so
> I've pushed and uploaded a new udpkg into unstable.
> 
> Please check back in a couple of days with a daily build and validate
> this fixes things for you too.
> 
> -- 
> Steve McIntyre, Cambridge, UK.    st...@einval.com
> < Aardvark> I dislike C++ to start with. C++11 just seems to be
> handing rope-creating factories for users to hang multiple
> instances of themselves.
> 




Bug#987368: Installer fails at first menu "Choose language"

2021-06-10 Thread Steve McIntyre
On Thu, Jun 10, 2021 at 12:11:05AM +0100, Steve McIntyre wrote:
>
>Looking at the history in this bug, things are not working as we hoped
>when we added the multi-console support. When I initially worked with
>Wookey on this, we didn't see errors like this at all in
>testing. That's not to say that there's *not* a problem here, but
>maybe other changes made since then have caused it to be uncovered.
>
>Multi-console support is a significant improvement for a number of
>non-x86 users. This is particularly the case for those with arm64
>systems where the firmware might default to the primary console being
>a serial port but the user doesn't even know that. We wanted to be
>able to start d-i on all the likely-looking consoles (serial *and* tty
>*and* graphical), allowing the user to interact with the one they
>preferred.
>
>In our testing, I don't remember ever seeing udpkg invocations racing
>against each other like this. But in my own testing for d-i Bullseye
>RC2 in an arm64 VM here I've just seen this exact problem myself so
>it's clearly a thing!
>
>I'm looking at udpkg now to see what I can do there. I'm hoping that
>it might be a reasonably quick fix to use filesystem-based locking
>around status file updates.

Having experimented with exactly that, after a little bit of tweaking
I think I've fixed the bug. Previously I could reproduce this bug
readily, ~75% of the time on my local arm64 VM. With my new udpkg
build included, I've just run things through a dozen times in
succession with no problem encountered. I think that's good enough, so
I've pushed and uploaded a new udpkg into unstable.
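
As a rough sanity check on "good enough", here is a small sketch of the
arithmetic. It assumes each boot hits the bug independently with the ~75%
rate quoted above, which is only an approximation; the function name is
made up for illustration.

```python
def chance_of_clean_streak(p_fail, runs):
    """Probability of `runs` consecutive successes, assuming each run
    fails independently with probability p_fail (an assumption)."""
    return (1.0 - p_fail) ** runs


# ~75% failure rate before the fix, a dozen clean runs after it
print(f"{chance_of_clean_streak(0.75, 12):.1e}")  # 6.0e-08
```

Under those assumptions, a dozen clean runs in a row would be extremely
unlikely if the bug were still present at its old rate.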

Please check back in a couple of days with a daily build and validate
this fixes things for you too.

-- 
Steve McIntyre, Cambridge, UK.    st...@einval.com
< Aardvark> I dislike C++ to start with. C++11 just seems to be
handing rope-creating factories for users to hang multiple
instances of themselves.



Bug#987368: Installer fails at first menu "Choose language"

2021-06-09 Thread Steve McIntyre
Argh...

Minor thing: James, please don't top-post on Debian lists. It breaks
conversation threads, particularly when others are doing inline
quoting.

On Sun, Jun 06, 2021 at 10:08:09PM +0100, James Addison wrote:
>Thanks Cyril, Frédéric - it feels like we're reaching a consensus that
>udpkg may not be multi-process safe (although, strictly speaking, I
>would say we haven't proven that yet).
>
>The authors of multi-console support could be the best people to
>recommend a path forward, as they may have close knowledge of the
>level of testing and completion of the change.  I've attempted to add
>them as subscribers to the bug although I expect that is opt-in and
>I'm not sure whether I added them correctly.

I saw nothing here, I'm afraid. But I do read debian-boot when I have
the time.

>Pending feedback from them, I'll mention a few possible routes that
>have occurred to me:
>
>- Backtracking: if we feel this is a problem that would likely affect
>and/or annoy a significant number of users, we could disable
>multi-console support for bullseye
>- Known-issue: if we feel the issue is serious but rare, we could
>indicate that it is a known problem (and perhaps prepare and document
>workarounds)
>- Scripting fix: we could perhaps adjust the installation scripts so
>that d-i runs in a single-process condition until after udpkg has
>completed, and only begin multiple debian-installer processes after
>that
>- Process-safety fix: in some sense an 'ideal' fix, we could add
>multi-process safety to udpkg, either by using careful rewriting or
>perhaps by using a lockfile or other safety mechanism(s)

Looking at the history in this bug, things are not working as we hoped
when we added the multi-console support. When I initially worked with
Wookey on this, we didn't see errors like this at all in
testing. That's not to say that there's *not* a problem here, but
maybe other changes made since then have caused it to be uncovered.

Multi-console support is a significant improvement for a number of
non-x86 users. This is particularly the case for those with arm64
systems where the firmware might default to the primary console being
a serial port but the user doesn't even know that. We wanted to be
able to start d-i on all the likely-looking consoles (serial *and* tty
*and* graphical), allowing the user to interact with the one they
preferred.

In our testing, I don't remember ever seeing udpkg invocations racing
against each other like this. But in my own testing for d-i Bullseye
RC2 in an arm64 VM here I've just seen this exact problem myself so
it's clearly a thing!

I'm looking at udpkg now to see what I can do there. I'm hoping that
it might be a reasonably quick fix to use filesystem-based locking
around status file updates.
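
A minimal sketch of what filesystem-based locking around status file
updates could look like, written in Python for brevity rather than in
udpkg's C. This is not the actual udpkg change: the lock file path, the
helper names and the write-then-replace step are all assumptions made
for illustration.

```python
import fcntl
import os
from contextlib import contextmanager


@contextmanager
def status_lock(lock_path):
    """Hold an exclusive advisory lock on lock_path (hypothetical path)."""
    fd = os.open(lock_path, os.O_CREAT | os.O_RDWR, 0o644)
    try:
        fcntl.flock(fd, fcntl.LOCK_EX)  # blocks while another holder exists
        yield
    finally:
        fcntl.flock(fd, fcntl.LOCK_UN)
        os.close(fd)


def update_status(status_path, lock_path, transform):
    """Read-modify-write the status file while holding the lock."""
    with status_lock(lock_path):
        with open(status_path) as f:
            old = f.read()
        tmp = status_path + ".new"
        with open(tmp, "w") as f:
            f.write(transform(old))
        # os.replace is atomic on POSIX, so status_path never goes missing
        os.replace(tmp, status_path)
```

With every writer going through update_status(), two installer instances
would take turns instead of interleaving their updates.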

-- 
Steve McIntyre, Cambridge, UK.    st...@einval.com
  Getting a SCSI chain working is perfectly simple if you remember that there
  must be exactly three terminations: one on one end of the cable, one on the
  far end, and the goat, terminated over the SCSI chain with a silver-handled
  knife whilst burning *black* candles. --- Anthony DeBoer



Bug#987368: Installer fails at first menu "Choose language"

2021-06-06 Thread James Addison
Thanks Cyril, Frédéric - it feels like we're reaching a consensus that
udpkg may not be multi-process safe (although, strictly speaking, I
would say we haven't proven that yet).

The authors of multi-console support could be the best people to
recommend a path forward, as they may have close knowledge of the
level of testing and completion of the change.  I've attempted to add
them as subscribers to the bug although I expect that is opt-in and
I'm not sure whether I added them correctly.

Pending feedback from them, I'll mention a few possible routes that
have occurred to me:

- Backtracking: if we feel this is a problem that would likely affect
and/or annoy a significant number of users, we could disable
multi-console support for bullseye
- Known-issue: if we feel the issue is serious but rare, we could
indicate that it is a known problem (and perhaps prepare and document
workarounds)
- Scripting fix: we could perhaps adjust the installation scripts so
that d-i runs in a single-process condition until after udpkg has
completed, and only begin multiple debian-installer processes after
that
- Process-safety fix: in some sense an 'ideal' fix, we could add
multi-process safety to udpkg, either by using careful rewriting or
perhaps by using a lockfile or other safety mechanism(s)

Some related factors to consider:

- Do we advertise and support running multiple debian-installer
processes in our documentation?
- Do we have availability of developer expertise for udpkg, including
review and QA time?
- Could/should the distance to a release date of Debian bullseye be a factor?

Cheers,
James

On Mon, 31 May 2021 at 10:31, Frédéric Bonnard  wrote:
>
> Hi Cyril/all,
> sorry that the process takes so long, but that was the only way to
> reproduce that bug (which I think may not be specific to ppc64el)
> without having Power hardware (and a LPAR/HMC setup).
>
> > Looking at that log, one sees two PIDs for main-menu (272 and 278),
> > which could explain a very nice race condition: udpkg racing, one of
> > them making the status file disappear from under the feet of the other
> > one?
>
> My feeling is that this is exactly what's happening.
> I tried touching the missing file and the installer was happy because
> the called process (udpkg from what I remember) does not crash anymore
> (one can try udpkg without status file and it will crash).
>
> > See also two /sbin/debian-installer processes getting started beforehand
> > (one on /dev/hvc0, one on /dev/tty0).
>
> Exactly.
>
> > It looks to me this is a clear problem on the debian-installer side (how
> > it deals with multiple consoles, similar to #940028 as you mentioned
> > initially), rather than some possible issues with emulation?
>
> To me, it's clearly not a qemu issue, since I have that issue on
> physical machines too. I just went the emulation way to enable people
> without hardware to reproduce the bug. It's more a race condition of
> running two debian-installers (not sure now who is removing the status
> file, probably udpkg?).
>
> The point is that some work has already been done by several people on
> the multiple-console setup according to the git commits, and I guess
> they will clearly get a grasp of what's going on.
>
>
> F.



Bug#987368: Installer fails at first menu "Choose language"

2021-05-31 Thread Frédéric Bonnard
Hi Cyril/all,
sorry that the process takes so long, but that was the only way to
reproduce that bug (which I think may not be specific to ppc64el)
without having Power hardware (and a LPAR/HMC setup).

> Looking at that log, one sees two PIDs for main-menu (272 and 278),
> which could explain a very nice race condition: udpkg racing, one of
> them making the status file disappear from under the feet of the other
> one?

My feeling is that this is exactly what's happening.
I tried touching the missing file and the installer was happy because
the called process (udpkg from what I remember) does not crash anymore
(one can try udpkg without status file and it will crash).

> See also two /sbin/debian-installer processes getting started beforehand
> (one on /dev/hvc0, one on /dev/tty0).

Exactly.

> It looks to me this is a clear problem on the debian-installer side (how
> it deals with multiple consoles, similar to #940028 as you mentioned
> initially), rather than some possible issues with emulation?

To me, it's clearly not a qemu issue, since I have that issue on
physical machines too. I just went the emulation way to enable people
without hardware to reproduce the bug. It's more a race condition of
running two debian-installers (not sure now who is removing the status
file, probably udpkg?).

The point is that some work has already been done by several people on
the multiple-console setup according to the git commits, and I guess
they will clearly get a grasp of what's going on.


F.




Bug#987368: Installer fails at first menu "Choose language"

2021-05-30 Thread Cyril Brulebois
Hi James,

[cc-ing submitter back, and youpi for the final comment]

James Addison  (2021-05-28):
> Does d-i tend to use udpkg for bootstrapping?
> 
> If so, I think 
> https://salsa.debian.org/installer-team/udpkg/-/blob/master/status.c#L390
> could be a potential section of code to investigate further.
> 
> It doesn't look like full-fat dpkg performs these kinds of renames on
> the status file.

Since that happens quite early, my first instinct was to look under the
main-menu source package, which indeed contains some references to udpkg:

kibi@tokyo:~/debian-installer/packages/main-menu$ git grep -i udpkg
debian/changelog:- add "exec" to the udpkg calls, ash isn't able to optimize the fork
debian/changelog:- Don't run udpkg --configure on virtual packages
main-menu.c:/* Tell udpkg to shut up. */
main-menu.c:setenv("UDPKG_QUIET", "y", 1);
main-menu.c:if (asprintf(, "exec udpkg --configure --force-configure %s", p->p.package) == -1)

If you want to read it online:
  
https://salsa.debian.org/installer-team/main-menu/-/blob/master/main-menu.c#L835

main-menu is also the one getting the error according to Frédéric's
syslog excerpt, so that would be somewhat consistent. I'm quoting it
again before some comments:

Apr 22 12:11:57 reopen-console: Looking at console hvc0 from /proc/consoles
Apr 22 12:11:57 reopen-console: Adding hvc0 to possible consoles list
Apr 22 12:11:57 reopen-console: hvc0 is preferred
Apr 22 12:11:57 reopen-console: Adding tty0 to possible consoles list
Apr 22 12:11:57 reopen-console: Adding inittab entry for hvc0
Apr 22 12:11:57 reopen-console: Adding inittab entry for tty0
Apr 22 12:11:57 reopen-console: Restarting init to start d-i on the console(s) we found
Apr 22 12:11:57 init: reloading /etc/inittab
Apr 22 12:11:57 init: starting pid 239, tty '/dev/tty4': '/usr/bin/tail -f /var/log/syslog'
Apr 22 12:11:57 init: starting pid 240, tty '/dev/hvc0': '/sbin/debian-installer'
Apr 22 12:11:57 init: starting pid 241, tty '/dev/tty0': '/sbin/debian-installer
Apr 22 12:11:57 debconf: Setting debconf/language to en
Apr 22 12:11:57 main-menu[272]: INFO: Falling back to the package description for brltty-udeb
Apr 22 12:11:57 debconf: Setting debconf/language to en
Apr 22 12:11:57 main-menu[272]: INFO: Menu item 'localechooser' selected
Apr 22 12:11:57 main-menu[278]: INFO: Falling back to the package description for brltty-udeb
Apr 22 12:11:57 main-menu[278]: INFO: Menu item 'localechooser' selected
Apr 22 12:11:57 main-menu[278]: /var/lib/dpkg/status: No such file or directory
Apr 22 12:11:57 main-menu[272]: /var/lib/dpkg/status: No such file or directory
Apr 22 12:11:57 main-menu[272]: /var/lib/dpkg/status: No such file or directory

Looking at that log, one sees two PIDs for main-menu (272 and 278),
which could explain a very nice race condition: udpkg racing, one of
them making the status file disappear from under the feet of the other
one?

See also two /sbin/debian-installer processes getting started beforehand
(one on /dev/hvc0, one on /dev/tty0).

It looks to me this is a clear problem on the debian-installer side (how
it deals with multiple consoles, similar to #940028 as you mentioned
initially), rather than some possible issues with emulation?
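
The "two PIDs for main-menu" observation can be checked mechanically.
The following is only a sketch: the regular expression assumes the
syslog line layout seen in the excerpt above, the function names are
made up, and the sample lines are copied from that excerpt with their
wrapped halves rejoined.

```python
import re
from collections import defaultdict

# Matches lines such as:
#   Apr 22 12:11:57 main-menu[272]: INFO: Menu item 'localechooser' selected
LINE_RE = re.compile(
    r"^\w{3}\s+\d+ [\d:]+ (?P<prog>[\w.-]+)\[(?P<pid>\d+)\]: (?P<msg>.*)$"
)


def pids_by_program(lines):
    """Map each program name to the set of PIDs that logged under it."""
    pids = defaultdict(set)
    for line in lines:
        m = LINE_RE.match(line)
        if m:
            pids[m.group("prog")].add(int(m.group("pid")))
    return pids


def concurrent_programs(lines):
    """Programs that logged under more than one PID."""
    return {
        prog: sorted(p) for prog, p in pids_by_program(lines).items() if len(p) > 1
    }


SAMPLE = [
    "Apr 22 12:11:57 debconf: Setting debconf/language to en",
    "Apr 22 12:11:57 main-menu[272]: INFO: Menu item 'localechooser' selected",
    "Apr 22 12:11:57 main-menu[278]: INFO: Menu item 'localechooser' selected",
    "Apr 22 12:11:57 main-menu[278]: /var/lib/dpkg/status: No such file or directory",
    "Apr 22 12:11:57 main-menu[272]: /var/lib/dpkg/status: No such file or directory",
]

print(concurrent_programs(SAMPLE))  # {'main-menu': [272, 278]}
```

Lines without a PID in brackets (such as the debconf one) are ignored.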


Cheers,
-- 
Cyril Brulebois (k...@debian.org)
D-I release manager -- Release team member -- Freelance Consultant




Bug#987368: Installer fails at first menu "Choose language"

2021-05-30 Thread Cyril Brulebois
Hi Frédéric,

Frédéric Bonnard  (2021-05-28):
> Actually you need to "wget" the vmlinux and initrd.gz files inside the
> emulated system, in the same place where you'll kexec. So just re-run
> your wget commands from the petitboot shell and you
> should be good :)

I think that was just an out-of-order execution bug in the kibi
emulator, sorry about that.

Wow, every iteration/test eats up some time, and I've got to work on
other issues… I think I'll keep an open mind about that one, but I'm not
sure I will block the installer for 11.0 on it. I'm happy to take an
explanation, patch, workaround, etc. if someone has a stroke of genius,
though!

(Maybe some free time will magically pop up, but I wouldn't hold my
breath…)



Cheers,
-- 
Cyril Brulebois (k...@debian.org)
D-I release manager -- Release team member -- Freelance Consultant




Bug#987368: Installer fails at first menu "Choose language"

2021-05-28 Thread James Addison
Does d-i tend to use udpkg for bootstrapping?

If so, I think 
https://salsa.debian.org/installer-team/udpkg/-/blob/master/status.c#L390
could be a potential section of code to investigate further.

It doesn't look like full-fat dpkg performs these kinds of renames on
the status file.



Bug#987368: Installer fails at first menu "Choose language"

2021-05-28 Thread James Addison
This might also be the same issue as reported in
https://bugs.debian.org/cgi-bin/bugreport.cgi?bug=944125

(some kind of race condition where multiple consoles are available and
entered into the inittab, and a /var/lib/dpkg/status.bak is found
instead of the expected status file)
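
To illustrate how a status.bak could be visible while status itself is
missing, here is a sketch of a hypothetical rename-based update. I have
not checked it against udpkg's status.c; the sequence of steps, the
".new" suffix and the function names are assumptions for illustration
only.

```python
import os
import tempfile


def update_with_backup(status, new_contents, observe=None):
    """Hypothetical non-atomic update: move status aside, then install new."""
    tmp = status + ".new"
    with open(tmp, "w") as f:
        f.write(new_contents)
    os.rename(status, status + ".bak")  # from here on, `status` is missing...
    if observe:
        observe()  # ...which is what a second process could see
    os.rename(tmp, status)  # ...until this rename completes


def demo():
    """Return the directory listing as seen in the middle of an update."""
    with tempfile.TemporaryDirectory() as d:
        status = os.path.join(d, "status")
        with open(status, "w") as f:
            f.write("old\n")
        seen = []
        update_with_backup(
            status, "new\n", observe=lambda: seen.append(sorted(os.listdir(d)))
        )
        return seen[0]


print(demo())  # ['status.bak', 'status.new']
```

A process that opens the status file inside that window would get "No
such file or directory", matching the symptom described above.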



Bug#987368: Installer fails at first menu "Choose language"

2021-05-28 Thread Frédéric Bonnard
On Fri, 28 May 2021 14:39:56 +0200, Cyril Brulebois  wrote:
> Hi Frédéric,
> 
> Frédéric Bonnard  (2021-05-25):
> > Get that one too :
> > https://openpower.xyz/job/openpower/job/openpower-op-build/label=slave,target=witherspoon/lastSuccessfulBuild/artifact/images/skiboot.lid
> > 
> > and give it to qemu-system-ppc64 by adding "-bios skiboot.lid" to the qemu
> > line (I got the same issue. The latest component builds may require a
> > newer skiboot.lid than the one shipped in the qemu package).
> 
> OK, thanks! It went further, but I'm now facing this:
> 
> Exiting petitboot. Type 'exit' to return.
> You may run 'pb-sos' to gather diagnostic data
> No password set, running as root. You may set a password in the System 
> Configuration screen.
> # kexec -l vmlinux -i initrd.gz -e 
> load_kernel: Open of vmlinux failed: No such file or directory
> # kexec -s vmlinux -i initrd.gz -e
> file_load failed: Bad file descriptor
> 
> but those files are in the directory where I started qemu-system-ppc64
> from. Did I miss a step?

Actually you need to "wget" the vmlinux and initrd.gz files inside the
emulated system, in the same place where you'll run kexec.
So just re-run your wget commands from the petitboot shell and you
should be good :)

> (Third time's the charm? :))

:D 

> 
> 
> Cheers,
> -- 
> Cyril Brulebois (k...@debian.org)
> D-I release manager -- Release team member -- Freelance Consultant




Bug#987368: Installer fails at first menu "Choose language"

2021-05-28 Thread Cyril Brulebois
Hi Frédéric,

Frédéric Bonnard  (2021-05-25):
> Get that one too :
> https://openpower.xyz/job/openpower/job/openpower-op-build/label=slave,target=witherspoon/lastSuccessfulBuild/artifact/images/skiboot.lid
> 
> and give it to qemu-system-ppc64 by adding "-bios skiboot.lid" to the qemu
> line (I got the same issue. The latest component builds may require a
> newer skiboot.lid than the one shipped in the qemu package).

OK, thanks! It went further, but I'm now facing this:

Exiting petitboot. Type 'exit' to return.
You may run 'pb-sos' to gather diagnostic data
No password set, running as root. You may set a password in the System 
Configuration screen.
# kexec -l vmlinux -i initrd.gz -e 
load_kernel: Open of vmlinux failed: No such file or directory
# kexec -s vmlinux -i initrd.gz -e
file_load failed: Bad file descriptor

but those files are in the directory where I started qemu-system-ppc64
from. Did I miss a step?


(Third time's the charm? :))


Cheers,
-- 
Cyril Brulebois (k...@debian.org)
D-I release manager -- Release team member -- Freelance Consultant




Bug#987368: Installer fails at first menu "Choose language"

2021-05-25 Thread Frédéric Bonnard
Hey Cyril,
thanks for trying!
Get that one too :
https://openpower.xyz/job/openpower/job/openpower-op-build/label=slave,target=witherspoon/lastSuccessfulBuild/artifact/images/skiboot.lid

and give it to qemu-system-ppc64 by adding "-bios skiboot.lid" to the qemu
line (I got the same issue. The latest component builds may require a
newer skiboot.lid than the one shipped in the qemu package).


F.


On Mon, 24 May 2021 04:48:00 +0200, Cyril Brulebois  wrote:
> Hi Frédéric,
> 
> Frédéric Bonnard  (2021-04-26):
> > Thanks for being willing to investigate!
> 
> Thanks for the detailed steps!
> 
> > To my knowledge, an LPAR setup in PowerVM cannot be reproduced with
> > qemu; it is a partitioning configuration using IBM's proprietary PHYP
> > firmware.  Using a ppc64el VM, I never had the issue since, I think,
> > hvc0 does not exist there and thus cannot race with tty0. The last
> > possible configuration providing hvc0 is the bare-metal mode
> > (PowerNV), i.e. installing Linux directly on a physical Power machine.
> > Fortunately, qemu is able to emulate a bare-metal machine (PowerNV) as
> > the skiboot firmware is open source (unlike PHYP).
> > I tried and could reproduce the bug after 3 tries.
> > 
> > For this, on your amd64 machine :
> > - install qemu-system-ppc 5.2 (in my case, using stable, I used 
> > 1:5.2+dfsg-9~bpo10+1 )
> > - get those :
> >   * 
> > https://openpower.xyz/job/openpower/job/openpower-op-build/label=slave,target=witherspoon/lastSuccessfulBuild/artifact/images/rootfs.cpio.xz
> >   * 
> > https://openpower.xyz/job/openpower/job/openpower-op-build/label=slave,target=witherspoon/lastSuccessfulBuild/artifact/images/zImage.epapr
> > - use the following to emulate the P9 PowerNV Witherspoon machine :
> >   qemu-system-ppc64 -m 2G -machine powernv9 -smp 8,cores=8,threads=1 \
> > -accel tcg,thread=single \
> > -device e1000e,netdev=net0,mac=C0:FF:EE:00:00:02,bus=pcie.0,addr=0x0  \
> > -netdev user,id=net0,hostfwd=::20022-:22,hostname=pnv \
> > -kernel ./zImage.epapr  \
> > -initrd ./rootfs.cpio.xz \
> > -nographic
> > - Once you get into the petitboot menu, "Exit to shell"
> 
> Using a rather similar qemu package, but on bullseye (1:5.2+dfsg-10),
> I'm not able to get to the petitboot menu:
> 
> kibi@tokyo:~/downloads$ qemu-system-ppc64 -m 2G -machine powernv9 -smp 
> 8,cores=8,threads=1 \
> -accel tcg,thread=single \
> -device e1000e,netdev=net0,mac=C0:FF:EE:00:00:02,bus=pcie.0,addr=0x0  
> \
> -netdev user,id=net0,hostfwd=::20022-:22,hostname=pnv \
> -kernel ./zImage.epapr  \
> -initrd ./rootfs.cpio.xz \
> -nographic
> [0.015788836,5] OPAL v6.4 starting...
> [0.016283111,7] initial console log level: memory 7, driver 5
> [0.016319531,6] CPU: P9 generation processor (max 4 threads/core)
> [0.016335507,7] CPU: Boot CPU PIR is 0x PVR is 0x004e1200
> [0.016452461,7] OPAL table: 0x30110530 .. 0x30110aa0, branch table: 
> 0x30002000
> [0.016622304,7] Assigning physical memory map table for nimbus
> [0.016979537,7] FDT: Parsing fdt @0x100
> [0.019592646,5] CHIP: Detected Qemu simulator
> [0.019757972,6] CHIP: Initialised chip 0 from xscom@603fc
> [0.020117574,6] P9 DD2.00 detected
> [0.020142490,5] CHIP: Chip ID  type: P9N DD2.00
> [0.020155985,7] XSCOM: Base address: 0x603fc
> [0.020192159,7] XSTOP: ibm,sw-checkstop-fir prop not found
> [0.020301881,6] MFSI 0:0: Initialized
> [0.020313870,6] MFSI 0:2: Initialized
> [0.020323201,6] MFSI 0:1: Initialized
> [0.020848585,6] LPC: LPC[000]: Initialized
> [0.020858725,7] LPC: access via MMIO @0x60300
> [0.020894150,7] LPC: Default bus on chip 0x0
> [0.020998003,7] CPU: New max PIR set to 0x1f
> [0.021512680,7] MEM: parsing reserved memory from 
> reserved-names/-ranges properties
> [0.021609903,7] CPU: decrementer bits 56
> [0.021668231,6] CPU: CPU from DT PIR=0x Server#=0x0 State=3
> [0.021750145,6] CPU:  1 secondary threads
> [0.021766340,6] CPU: CPU from DT PIR=0x0004 Server#=0x4 State=3
> [0.021788849,6] CPU:  1 secondary threads
> [0.021797000,6] CPU: CPU from DT PIR=0x0008 Server#=0x8 State=3
> [0.021807120,6] CPU:  1 secondary threads
> [0.021814202,6] CPU: CPU from DT PIR=0x000c Server#=0xc State=3
> [0.021823785,6] CPU:  1 secondary threads
> [0.021834496,6] CPU: CPU from DT PIR=0x0010 Server#=0x10 State=3
> [0.021846219,6] CPU:  1 secondary threads
> [0.021858600,6] CPU: CPU from DT PIR=0x0014 Server#=0x14 State=3
> [0.021870856,6] CPU:  1 secondary threads
> [0.021878195,6] CPU: CPU from DT PIR=0x0018 Server#=0x18 State=3
> [0.021887826,6] CPU:  1 secondary threads
> [0.021894705,6] CPU: CPU from DT PIR=0x001c Server#=0x1c State=3
> [

Bug#987368: Installer fails at first menu "Choose language"

2021-05-23 Thread Cyril Brulebois
Hi Frédéric,

Frédéric Bonnard  (2021-04-26):
> Thanks for being willing to investigate!

Thanks for the detailed steps!

> To my knowledge, an LPAR setup in PowerVM cannot be reproduced with
> qemu; it is a partitioning configuration using IBM's proprietary PHYP
> firmware.  Using a ppc64el VM, I never had the issue since, I think,
> hvc0 does not exist there and thus cannot race with tty0. The last
> possible configuration providing hvc0 is the bare-metal mode
> (PowerNV), i.e. installing Linux directly on a physical Power machine.
> Fortunately, qemu is able to emulate a bare-metal machine (PowerNV) as
> the skiboot firmware is open source (unlike PHYP).
> I tried and could reproduce the bug after 3 tries.
> 
> For this, on your amd64 machine :
> - install qemu-system-ppc 5.2 (in my case, using stable, I used 
> 1:5.2+dfsg-9~bpo10+1 )
> - get those :
>   * 
> https://openpower.xyz/job/openpower/job/openpower-op-build/label=slave,target=witherspoon/lastSuccessfulBuild/artifact/images/rootfs.cpio.xz
>   * 
> https://openpower.xyz/job/openpower/job/openpower-op-build/label=slave,target=witherspoon/lastSuccessfulBuild/artifact/images/zImage.epapr
> - use the following to emulate the P9 PowerNV Witherspoon machine :
>   qemu-system-ppc64 -m 2G -machine powernv9 -smp 8,cores=8,threads=1 \
> -accel tcg,thread=single \
> -device e1000e,netdev=net0,mac=C0:FF:EE:00:00:02,bus=pcie.0,addr=0x0  \
> -netdev user,id=net0,hostfwd=::20022-:22,hostname=pnv \
> -kernel ./zImage.epapr  \
> -initrd ./rootfs.cpio.xz \
> -nographic
> - Once you get into the petitboot menu, "Exit to shell"

Using a rather similar qemu package, but on bullseye (1:5.2+dfsg-10),
I'm not able to get to the petitboot menu:

kibi@tokyo:~/downloads$ qemu-system-ppc64 -m 2G -machine powernv9 -smp 
8,cores=8,threads=1 \
-accel tcg,thread=single \
-device e1000e,netdev=net0,mac=C0:FF:EE:00:00:02,bus=pcie.0,addr=0x0  \
-netdev user,id=net0,hostfwd=::20022-:22,hostname=pnv \
-kernel ./zImage.epapr  \
-initrd ./rootfs.cpio.xz \
-nographic
[0.015788836,5] OPAL v6.4 starting...
[0.016283111,7] initial console log level: memory 7, driver 5
[0.016319531,6] CPU: P9 generation processor (max 4 threads/core)
[0.016335507,7] CPU: Boot CPU PIR is 0x PVR is 0x004e1200
[0.016452461,7] OPAL table: 0x30110530 .. 0x30110aa0, branch table: 
0x30002000
[0.016622304,7] Assigning physical memory map table for nimbus
[0.016979537,7] FDT: Parsing fdt @0x100
[0.019592646,5] CHIP: Detected Qemu simulator
[0.019757972,6] CHIP: Initialised chip 0 from xscom@603fc
[0.020117574,6] P9 DD2.00 detected
[0.020142490,5] CHIP: Chip ID  type: P9N DD2.00
[0.020155985,7] XSCOM: Base address: 0x603fc
[0.020192159,7] XSTOP: ibm,sw-checkstop-fir prop not found
[0.020301881,6] MFSI 0:0: Initialized
[0.020313870,6] MFSI 0:2: Initialized
[0.020323201,6] MFSI 0:1: Initialized
[0.020848585,6] LPC: LPC[000]: Initialized
[0.020858725,7] LPC: access via MMIO @0x60300
[0.020894150,7] LPC: Default bus on chip 0x0
[0.020998003,7] CPU: New max PIR set to 0x1f
[0.021512680,7] MEM: parsing reserved memory from 
reserved-names/-ranges properties
[0.021609903,7] CPU: decrementer bits 56
[0.021668231,6] CPU: CPU from DT PIR=0x Server#=0x0 State=3
[0.021750145,6] CPU:  1 secondary threads
[0.021766340,6] CPU: CPU from DT PIR=0x0004 Server#=0x4 State=3
[0.021788849,6] CPU:  1 secondary threads
[0.021797000,6] CPU: CPU from DT PIR=0x0008 Server#=0x8 State=3
[0.021807120,6] CPU:  1 secondary threads
[0.021814202,6] CPU: CPU from DT PIR=0x000c Server#=0xc State=3
[0.021823785,6] CPU:  1 secondary threads
[0.021834496,6] CPU: CPU from DT PIR=0x0010 Server#=0x10 State=3
[0.021846219,6] CPU:  1 secondary threads
[0.021858600,6] CPU: CPU from DT PIR=0x0014 Server#=0x14 State=3
[0.021870856,6] CPU:  1 secondary threads
[0.021878195,6] CPU: CPU from DT PIR=0x0018 Server#=0x18 State=3
[0.021887826,6] CPU:  1 secondary threads
[0.021894705,6] CPU: CPU from DT PIR=0x001c Server#=0x1c State=3
[0.021904113,6] CPU:  1 secondary threads
[0.023185494,5] PLAT: Using SuperIO UART
[0.023483354,7] UART: Using LPC IRQ 4
[0.028248102,5] PLAT: Detected QEMU POWER9 platform
[0.028385817,5] PLAT: Detected BMC platform ast2500:openbmc
[0.059238262,5] CPU: All 8 processors called in...
[0.059731948,3] SBE: Master chip ID not found.
[0.060360004,7] LPC: Routing irq 10, policy: 0 (r=1)
[0.060430008,7] LPC: SerIRQ 10 using route 0 targetted at OPAL
[0.082669345,5] HIOMAP: Negotiated hiomap protocol v2
[0.082815227,5] HIOMAP: Block size is 4KiB
   

Bug#987368: Installer fails at first menu "Choose language"

2021-05-02 Thread Samuel Thibault
Vagrant Cascadian, on Sun 02 May 2021 14:36:46 -0700, wrote:
> On 2021-04-22, Frédéric Bonnard wrote:
> > Machine: Power10 machine but got it on Power8 as well
> 
> > This happens randomly when the installer menu starts, I get to the first
> > menu "Choose language", but it is red saying "An installation failed...
> > The failing step is: Choose language".
> 
> FWIW, I've seen this a few times on arm64 too.

I got it as well sometimes while testing, on arm and ppc with qemu. I
assumed it could be a qemu emulation bug. But it does happen only on
that screen, indeed.

Samuel



Bug#987368: Installer fails at first menu "Choose language"

2021-05-02 Thread Vagrant Cascadian
On 2021-04-22, Frédéric Bonnard wrote:
> Machine: Power10 machine but got it on Power8 as well

> This happens randomly when the installer menu starts, I get to the first
> menu "Choose language", but it is red saying "An installation failed...
> The failing step is: Choose language".

FWIW, I've seen this a few times on arm64 too.

I hate to admit I rebooted and tried again the few times it happened,
rather than dive into the details of the problem, but it *usually*
worked on the second try... so this may not be a deterministic bug.

I'll try again soon and try to get more details...

live well,
  vagrant




Bug#987368: Installer fails at first menu "Choose language"

2021-04-26 Thread Frédéric Bonnard
Hi Cyril,

Thanks for being willing to investigate!

To my knowledge, an LPAR setup in PowerVM cannot be reproduced with qemu;
it is a partitioning configuration using IBM's proprietary PHYP firmware.
Using a ppc64el VM, I never had the issue since, I think, hvc0 does not
exist there and thus cannot race with tty0.
The last possible configuration providing hvc0 is the bare-metal mode
(PowerNV), i.e. installing Linux directly on a physical Power machine.
Fortunately, qemu is able to emulate a bare-metal machine (PowerNV) as the
skiboot firmware is open source (unlike PHYP).
I tried and could reproduce the bug after 3 tries.

For this, on your amd64 machine:
- install qemu-system-ppc 5.2 (in my case, using stable, I used 1:5.2+dfsg-9~bpo10+1)
- get these:
  * https://openpower.xyz/job/openpower/job/openpower-op-build/label=slave,target=witherspoon/lastSuccessfulBuild/artifact/images/rootfs.cpio.xz
  * https://openpower.xyz/job/openpower/job/openpower-op-build/label=slave,target=witherspoon/lastSuccessfulBuild/artifact/images/zImage.epapr
- use the following to emulate the P9 PowerNV Witherspoon machine:
  qemu-system-ppc64 -m 2G -machine powernv9 -smp 8,cores=8,threads=1 \
-accel tcg,thread=single \
-device e1000e,netdev=net0,mac=C0:FF:EE:00:00:02,bus=pcie.0,addr=0x0  \
-netdev user,id=net0,hostfwd=::20022-:22,hostname=pnv \
-kernel ./zImage.epapr  \
-initrd ./rootfs.cpio.xz \
-nographic
- Once you get into the petitboot menu, choose "Exit to shell"
- wget these (I couldn't boot the mini.iso):
  * https://d-i.debian.org/daily-images/ppc64el/daily/netboot/debian-installer/ppc64el/vmlinux
  * https://d-i.debian.org/daily-images/ppc64el/daily/netboot/debian-installer/ppc64el/initrd.gz
- kexec them:
  1. kexec -l vmlinux -i initrd.gz -e (you'll get an error, but this
  step seems necessary)
  2. kexec -s vmlinux -i initrd.gz -e
- cross your fingers; if the installer doesn't fail, halt and rerun qemu...
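
The host-side part of the steps above could be wrapped roughly as follows.
This is only a sketch: BASE, fetch_images, run_pnv and DRY_RUN are names made
up for illustration, while the URLs and qemu options are the ones listed above.

```shell
#!/bin/sh
# Sketch of the host-side steps above. BASE, fetch_images, run_pnv and
# DRY_RUN are illustrative names introduced here.

BASE='https://openpower.xyz/job/openpower/job/openpower-op-build/label=slave,target=witherspoon/lastSuccessfulBuild/artifact/images'

# Download the two op-build images unless they are already present.
fetch_images() {
    for f in rootfs.cpio.xz zImage.epapr; do
        [ -f "./$f" ] || wget "$BASE/$f" || return 1
    done
}

# Start the emulated P9 PowerNV machine with the options given above.
# With DRY_RUN set, only print the command line instead of running it.
run_pnv() {
    ${DRY_RUN:+echo} qemu-system-ppc64 -m 2G -machine powernv9 \
        -smp 8,cores=8,threads=1 \
        -accel tcg,thread=single \
        -device e1000e,netdev=net0,mac=C0:FF:EE:00:00:02,bus=pcie.0,addr=0x0 \
        -netdev user,id=net0,hostfwd=::20022-:22,hostname=pnv \
        -kernel ./zImage.epapr \
        -initrd ./rootfs.cpio.xz \
        -nographic
}
```

After sourcing it, "fetch_images && run_pnv" starts the machine. The petitboot
steps (Exit to shell, wget, kexec) still have to be done by hand in the guest.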

I hope you get it as well!

F.

On Fri, 23 Apr 2021 22:48:33 +0200, Cyril Brulebois  wrote:
> Hello Frédéric,
> 
> Frédéric Bonnard  (2021-04-22):
> > Boot method: CD
> > Image version: 
> > http://d-i.debian.org/daily-images/ppc64el/daily/netboot/mini.iso
> > Date: April 21st 2021
> > 
> > Machine: Power10 machine but got it on Power8 as well
> > 
> > This happens randomly when the installer menu starts, I get to the first
> > menu "Choose language", but it is red saying "An installation failed...
> > The failing step is: Choose language".
> > 
> > It seems the missing file /var/lib/dpkg/status is causing this.
> > Instead I have /var/lib/dpkg/status.bak
> 
> I don't know much about ppc (I don't think iBook G4 experience counts
> much at this stage), but I see that qemu-system-ppc exists, and that it
> provides qemu-system-ppc64. Would you have some guide that could help us
> reproduce the issue from say an amd64 host?
> 
> 
> Cheers,
> -- 
> Cyril Brulebois (k...@debian.org)
> D-I release manager -- Release team member -- Freelance Consultant




Bug#987368: Installer fails at first menu "Choose language"

2021-04-23 Thread Cyril Brulebois
Hello Frédéric,

Frédéric Bonnard  (2021-04-22):
> Boot method: CD
> Image version: 
> http://d-i.debian.org/daily-images/ppc64el/daily/netboot/mini.iso
> Date: April 21st 2021
> 
> Machine: Power10 machine but got it on Power8 as well
> 
> This happens randomly when the installer menu starts, I get to the first
> menu "Choose language", but it is red saying "An installation failed...
> The failing step is: Choose language".
> 
> It seems the missing file /var/lib/dpkg/status is causing this.
> Instead I have /var/lib/dpkg/status.bak

I don't know much about ppc (I don't think iBook G4 experience counts
much at this stage), but I see that qemu-system-ppc exists, and that it
provides qemu-system-ppc64. Would you have some guide that could help us
reproduce the issue from say an amd64 host?


Cheers,
-- 
Cyril Brulebois (k...@debian.org)
D-I release manager -- Release team member -- Freelance Consultant




Bug#987368: Installer fails at first menu "Choose language"

2021-04-22 Thread Frédéric Bonnard
Package: installation-reports

Boot method: CD
Image version: http://d-i.debian.org/daily-images/ppc64el/daily/netboot/mini.iso
Date: April 21st 2021

Machine: Power10 machine but got it on Power8 as well

This happens randomly when the installer menu starts, I get to the first
menu "Choose language", but it is red saying "An installation failed...
The failing step is: Choose language".

It seems the missing file /var/lib/dpkg/status is causing this.
Instead I have /var/lib/dpkg/status.bak

Digging a bit, I found this bug:
https://bugs.debian.org/cgi-bin/bugreport.cgi?bug=940028

which seems close to what I have:

---
Apr 22 12:11:57 reopen-console: Looking at console hvc0 from /proc/consoles
Apr 22 12:11:57 reopen-console:Adding hvc0 to possible consoles list
Apr 22 12:11:57 reopen-console:hvc0 is preferred
Apr 22 12:11:57 reopen-console:Adding tty0 to possible consoles list
Apr 22 12:11:57 reopen-console: Adding inittab entry for hvc0
Apr 22 12:11:57 reopen-console: Adding inittab entry for tty0
Apr 22 12:11:57 reopen-console: Restarting init to start d-i on the console(s) we found
Apr 22 12:11:57 init: reloading /etc/inittab
Apr 22 12:11:57 init: starting pid 239, tty '/dev/tty4': '/usr/bin/tail -f /var/log/syslog'
Apr 22 12:11:57 init: starting pid 240, tty '/dev/hvc0': '/sbin/debian-installer'
Apr 22 12:11:57 init: starting pid 241, tty '/dev/tty0': '/sbin/debian-installer'
Apr 22 12:11:57 debconf: Setting debconf/language to en
Apr 22 12:11:57 main-menu[272]: INFO: Falling back to the package description for brltty-udeb
Apr 22 12:11:57 debconf: Setting debconf/language to en
Apr 22 12:11:57 main-menu[272]: INFO: Menu item 'localechooser' selected
Apr 22 12:11:57 main-menu[278]: INFO: Falling back to the package description for brltty-udeb
Apr 22 12:11:57 main-menu[278]: INFO: Menu item 'localechooser' selected
Apr 22 12:11:57 main-menu[278]: /var/lib/dpkg/status: No such file or directory
Apr 22 12:11:57 main-menu[272]: /var/lib/dpkg/status: No such file or directory
Apr 22 12:11:57 main-menu[272]: /var/lib/dpkg/status: No such file or directory
---
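
The log above shows two installer instances (main-menu[272] and
main-menu[278]) touching /var/lib/dpkg/status at the same moment. Below is a
minimal sketch of how such a race could be avoided. It assumes the status file
is updated by moving status to status.bak and then moving a new file into
place. That would match the status.bak I see, but I have not confirmed it in
the udpkg source. STATUS_DIR, lock_status, unlock_status and update_status are
hypothetical names, not part of udpkg or debian-installer.

```shell
#!/bin/sh
# Hypothetical sketch: serialise updates to a dpkg-style status file.
# All function and variable names here are illustrative only.

STATUS_DIR=${STATUS_DIR:-/var/lib/dpkg}

# mkdir is atomic: only one caller can create the lock directory,
# every other caller retries until the holder removes it.
lock_status() {
    until mkdir "$STATUS_DIR/status.lock" 2>/dev/null; do
        sleep 1
    done
}

unlock_status() {
    rmdir "$STATUS_DIR/status.lock"
}

# Append one line to the status file using the assumed rename sequence.
# Without the lock, a second process could run in the window where
# "status" has been moved to "status.bak" and find no status file.
update_status() {
    lock_status
    cp "$STATUS_DIR/status" "$STATUS_DIR/status.new" &&
    printf '%s\n' "$1" >> "$STATUS_DIR/status.new" &&
    mv "$STATUS_DIR/status" "$STATUS_DIR/status.bak" &&
    mv "$STATUS_DIR/status.new" "$STATUS_DIR/status"
    rc=$?
    unlock_status
    return $rc
}
```

A directory lock is used here only because mkdir is available everywhere. The
point is just that every update holds the lock for the whole rename sequence.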

Regards,

F.

