Bug#954270: [Debian-med-packaging] Bug#954270: [RFS] kmc: arm64 autopkgtest time out

2020-12-10 Thread Étienne Mollier
Hi Andreas, Hi Paul,

Paul Gevers, on 2020-12-10 13:50:59 +0100:
> On 10-12-2020 13:10, Andreas Tille wrote:
> > My guess is that kmc was developed under and designed for amd64 and it
> > runs there.  That's explicitly the case for the latest upstream version
> > which will not support any arm* out of the box (if I understand Étienne
> > correctly).  I do not see any need to stress test our hardware with
> > some software that will be outdated soonish.

I'm merging my changes for the kmc 3.1.1 amd64-only update; will
push very soon.

> I couldn't resist. I started it up and it hangs.
> 
> Paul
[...]
> # In the end, default options should, ideally, work on any configuration.
> echo 'Running kmc (default thread count)'
> Running kmc (default thread count)
> rm -f 1.kmc_suf 1.kmc_pre
> kmc -ci1 -m1 -k28 $ORIGDIR/debian/tests/sample_6.fastq.gz 1 .
> *

Thank you very much for the full log!  :)

I was hoping to see it in the CI logs during my lunch hour, but
as I understand now, it was not accessible from the web page (or
rather well hidden).  Either I screwed up my patch or the general
issue is far more intricate than I first thought.  Looks like
I'll have to look up that RoM removal procedure again.

Have a nice evening,
-- 
Étienne Mollier 
Fingerprint:  8f91 b227 c7d6 f2b1 948c  8236 793c f67e 8f0d 11da
Sent from /dev/pts/1, please excuse my verbosity.




Bug#954270: [RFS] kmc: arm64 autopkgtest time out

2020-12-10 Thread Paul Gevers
Hi

On 10-12-2020 13:10, Andreas Tille wrote:
> On Thu, Dec 10, 2020 at 11:58:23AM +0100, Paul Gevers wrote:
>>> In the kmc case I'm seriously wondering whether we should restrict
>>> the architectures to those that are relevant in practice.  It seems
>>> to be in line with upstream and I'm tempted to follow Étienne's
>>> suggestion to upgrade to kmc 3.x which does not even build on armhf
>>> any more and remove the package for the non-building architectures.
>>>
>>> Étienne, would you mind pushing your patches to Git to proceed with
>>> this plan?
>>
>> Shall I run the current package on our big amd64 host to check if it
>> fails there and that part of the problem is solved? I mean, I understand
>> that there was a real problem with hosts that have lots of CPUs and I
>> assume you support amd64.
> 
> My guess is that kmc was developed under and designed for amd64 and it
> runs there.  That's explicitly the case for the latest upstream version
> which will not support any arm* out of the box (if I understand Étienne
> correctly).  I do not see any need to stress test our hardware with
> some software that will be outdated soonish.
> 
> Kind regards
> 
>Andreas.
> 
> 

I couldn't resist. I started it up and it hangs.

Paul

root@ci-worker13:~# lscpu
Architecture:        x86_64
CPU op-mode(s):      32-bit, 64-bit
Byte Order:          Little Endian
Address sizes:       46 bits physical, 48 bits virtual
CPU(s):              48
On-line CPU(s) list: 0-47
Thread(s) per core:  2
Core(s) per socket:  12
Socket(s):           2
NUMA node(s):        2
Vendor ID:           GenuineIntel
CPU family:          6
Model:               79
Model name:          Intel(R) Xeon(R) CPU E5-2650 v4 @ 2.20GHz
Stepping:            1
CPU MHz:             2200.074
CPU max MHz:         2900.
CPU min MHz:         1200.
BogoMIPS:            4400.13
Virtualization:      VT-x
L1d cache:           32K
L1i cache:           32K
L2 cache:            256K
L3 cache:            30720K
NUMA node0 CPU(s):   0-11,24-35
NUMA node1 CPU(s):   12-23,36-47
Flags:               fpu vme de pse tsc msr pae mce cx8 apic sep mtrr
pge mca cmov pat pse36 clflush dts acpi mmx fxsr sse sse2 ss ht tm pbe
syscall nx pdpe1gb rdtscp lm constant_tsc arch_perfmon pebs bts rep_good
nopl xtopology nonstop_tsc cpuid aperfmperf pni pclmulqdq dtes64 monitor
ds_cpl vmx smx est tm2 ssse3 sdbg fma cx16 xtpr pdcm pcid dca sse4_1
sse4_2 x2apic movbe popcnt tsc_deadline_timer aes xsave avx f16c rdrand
lahf_lm abm 3dnowprefetch cpuid_fault epb cat_l3 cdp_l3 invpcid_single
pti intel_ppin ssbd ibrs ibpb stibp tpr_shadow vnmi flexpriority ept
vpid ept_ad fsgsbase tsc_adjust bmi1 hle avx2 smep bmi2 erms invpcid rtm
cqm rdt_a rdseed adx smap intel_pt xsaveopt cqm_llc cqm_occup_llc
cqm_mbm_total cqm_mbm_local dtherm ida arat pln pts flush_l1d



root@ci-worker13:~# /usr/bin/autopkgtest --no-built-binaries
'--setup-commands=echo '"'"'kmc testing/amd64'"'"' > /var/tmp/debci.pkg
2>&1 || true' --user debci --apt-upgrade '--add-apt-source=deb
http://incoming.debian.org/debian-buildd buildd-unstable main contrib
non-free' --add-apt-release=unstable --pin-packages=unstable=src:kmc kmc
-- lxc --sudo --name elbrus autopkgtest-testing-amd64

[...]

autopkgtest [12:45:12]: test build-lib: [---
g++ kmc_dump/*cpp -std=c++11 -lkmc -o $WORKDIR/my_kmcdump
[ -x $WORKDIR/my_kmcdump ]
echo "build: OK"

cd $WORKDIR
echo 'Running kmc (single threaded)'
kmc -ci1 -m1 -k28 -t1 $ORIGDIR/debian/tests/sample_6.fastq.gz 1 .
build: OK
Running kmc (single threaded)
*
*

1st stage: 0.170529s
2nd stage: 0.041426s
Total: 0.211955s
Tmp size : 0MB

Stats:
   No. of k-mers below min. threshold :        0
   No. of k-mers above max. threshold :        0
   No. of unique k-mers               :     6435
   No. of unique counted k-mers       :     6435
   Total no. of k-mers                :     6714
   Total no. of reads                 :      750
   Total no. of super-k-mers          :     1234
ls -Al
total 1200
-rw-r--r-- 1 debci debci 1114204 Dec 10 12:45 1.kmc_pre
-rw-r--r-- 1 debci debci   45053 Dec 10 12:45 1.kmc_suf
-rwxr-xr-x 1 debci debci   63024 Dec 10 12:45 my_kmcdump
[ -s 1.kmc_suf ]
[ -s 1.kmc_pre ]
echo "kmc (single threaded): OK"
kmc (single threaded): OK

./my_kmcdump 1 out
[ -s out ]
echo "run: OK"
run: OK

# FIXME: uncomment the below test once kmc is updated to version 3.1.1 or more.
## Multi-threaded runs have been known to be faulty 

Bug#954270: [RFS] kmc: arm64 autopkgtest time out

2020-12-10 Thread Andreas Tille
On Thu, Dec 10, 2020 at 11:58:23AM +0100, Paul Gevers wrote:
> > In the kmc case I'm seriously wondering whether we should restrict
> > the architectures to those that are relevant in practice.  It seems
> > to be in line with upstream and I'm tempted to follow Étienne's
> > suggestion to upgrade to kmc 3.x which does not even build on armhf
> > any more and remove the package for the non-building architectures.
> > 
> > Étienne, would you mind pushing your patches to Git to proceed with
> > this plan?
> 
> Shall I run the current package on our big amd64 host to check if it
> fails there and that part of the problem is solved? I mean, I understand
> that there was a real problem with hosts that have lots of CPUs and I
> assume you support amd64.

My guess is that kmc was developed under and designed for amd64 and it
runs there.  That's explicitly the case for the latest upstream version
which will not support any arm* out of the box (if I understand Étienne
correctly).  I do not see any need to stress test our hardware with
some software that will be outdated soonish.

Kind regards

   Andreas.


-- 
http://fam-tille.de



Bug#954270: [RFS] kmc: arm64 autopkgtest time out

2020-12-10 Thread Paul Gevers
Hi

Oops, missed a part of the message.

On 10-12-2020 11:00, Andreas Tille wrote:
>> It seems that this didn't work on our armhf worker:
>> https://ci.debian.net/packages/k/kmc/testing/armhf/
>>
>> The amd64 run didn't happen on the big box.
> 
> In the kmc case I'm seriously wondering whether we should restrict
> the architectures to those that are relevant in practice.  It seems
> to be in line with upstream and I'm tempted to follow Étienne's
> suggestion to upgrade to kmc 3.x which does not even build on armhf
> any more and remove the package for the non-building architectures.
> 
> Étienne, would you mind pushing your patches to Git to proceed with
> this plan?

Shall I run the current package on our big amd64 host to check if it
fails there and that part of the problem is solved? I mean, I understand
that there was a real problem with hosts that have lots of CPUs and I
assume you support amd64.

Paul





Bug#954270: [RFS] kmc: arm64 autopkgtest time out

2020-12-10 Thread Paul Gevers
Hi Andreas,

On 10-12-2020 11:00, Andreas Tille wrote:
> just a general question:  The general overview page like
> 
>https://ci.debian.net/packages/k/kmc/
> 
> does not seem to be updated while the specific architecture
> pages like

You can see at the bottom of the page when it was last updated. However,
that's not the problem here. That overview page only shows "pure" runs,
i.e. only the results of tests in the different suites without packages
from another suite. We recognize that that's sometimes confusing, but
the alternative (which we had originally) was more confusing, as it
regularly flipped back and forth depending on which package(s) from
another suite were added into the mix. We'll probably need to add two
tables in the future, so that at least it's clear to the user what each
table is meant to convey. It's low on our list though, as I'm not sure
how useful that is considering the flipping behavior. Ideas welcome.

>https://ci.debian.net/packages/k/kmc/testing/armhf/
> 
> show the issue.

That's because there it distinguishes by color which runs were "pure"
and which runs had packages from another suite.

> I remember I was asking in connection with augur
> (where the overview page is featuring  8.0.0-2 pass  for amd64/unstable
> which is very outdated).  Is there any chance to fix this soon?

ENOTUITS as there is nothing to fix, only to improve the UI (which is
always difficult). Patches (to debci in this case) are always welcome.

Paul





Bug#954270: [RFS] kmc: arm64 autopkgtest time out

2020-12-10 Thread Andreas Tille
Hi Paul,

just a general question:  The general overview page like

   https://ci.debian.net/packages/k/kmc/

does not seem to be updated while the specific architecture
pages like

   https://ci.debian.net/packages/k/kmc/testing/armhf/

show the issue.  I remember I was asking in connection with augur
(where the overview page is featuring  8.0.0-2 pass  for amd64/unstable
which is very outdated).  Is there any chance to fix this soon?

On Thu, Dec 10, 2020 at 10:37:12AM +0100, Paul Gevers wrote:
> Control: reopen -1
> Control: retitle -1 kmc: autopkgtest times out on multimulticore hosts
> 
> >> multiple debci-workers. That's common on our arm64 workers too.
> > 
> >> [1] https://ci.debian.net/packages/k/kmc/testing/amd64/
> > 
> > I suppose an easy solution would be to cap the core count of the
> > program while working with upstream to find a proper fix.  Will
> > see what can be done about it.
> 
> It seems that this didn't work on our armhf worker:
> https://ci.debian.net/packages/k/kmc/testing/armhf/
> 
> The amd64 run didn't happen on the big box.

In the kmc case I'm seriously wondering whether we should restrict
the architectures to those that are relevant in practice.  It seems
to be in line with upstream and I'm tempted to follow Étienne's
suggestion to upgrade to kmc 3.x which does not even build on armhf
any more and remove the package for the non-building architectures.

Étienne, would you mind pushing your patches to Git to proceed with
this plan?
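
For illustration only, the allow-list idea behind restricting the
architectures can be sketched in shell.  The actual restriction would
normally be expressed in the Architecture field of debian/control; the
function name arch_supported and the list containing only amd64 are
assumptions made for this sketch, not something taken from the packaging.

```shell
#!/bin/sh
# Hypothetical allow-list of architectures; "amd64" only is an assumption
# based on the discussion in this thread.
SUPPORTED_ARCHES="amd64"

# Return 0 if the architecture given as $1 is in the allow-list, 1 otherwise.
arch_supported() {
    for a in $SUPPORTED_ARCHES; do
        [ "$a" = "$1" ] && return 0
    done
    return 1
}

# dpkg --print-architecture prints the Debian architecture of the host;
# fall back to "unknown" on systems without dpkg.
if command -v dpkg >/dev/null 2>&1; then
    host_arch=$(dpkg --print-architecture)
else
    host_arch=unknown
fi

if arch_supported "$host_arch"; then
    echo "$host_arch: in the supported list"
else
    echo "$host_arch: not in the supported list"
fi
```

Adding a further architecture later would only mean extending the
space-separated list.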

Kind regards

 Andreas.

-- 
http://fam-tille.de



Bug#954270: [RFS] kmc: arm64 autopkgtest time out

2020-12-10 Thread Paul Gevers
Control: reopen -1
Control: retitle -1 kmc: autopkgtest times out on multimulticore hosts

Hi

On 06-12-2020 22:06, Étienne Mollier wrote:
> Paul Gevers, on 2020-12-06 20:54:43 +0100:
>> It recently started to time out on amd64 too, but not always [1]. And
>> when we added armhf, that timed out too. The failures on amd64 that I
>> checked were all on ci-worker13, which is one of our hosts that runs
>> multiple debci-workers. That's common on our arm64 workers too.
> 
>> [1] https://ci.debian.net/packages/k/kmc/testing/amd64/
> 
> Thanks for the pointers, I believe I found a reproducer!  :)
> 
> The failing CI runners all had a high NR_CPUS count in common,
> at least 32 cores.  I don't have 32 cores at hand, but kmc
> provides an option -t to increase the parallelization.  The
> following command, in the conditions of the autopkgtest, will
> hang on any machine:
> 
>   $ kmc -ci1 -m1 -k28 -t32 $ORIGDIR/debian/tests/sample_6.fastq.gz 1 .
> 
> strace output looks like the program just deadlocks, I see no
> CPU consumption while the command is supposed to be running:
> 
>   strace: Process 3001579 attached with 23 threads
>   [pid 3001633] futex(0x55b69bf9cb70, FUTEX_WAIT_PRIVATE, 0, NULL <unfinished ...>
>   [pid 3001632] futex(0x55b69bf9cb70, FUTEX_WAIT_PRIVATE, 0, NULL <unfinished ...>
>   [...]
>   [pid 3001612] futex(0x55b69bf9cb70, FUTEX_WAIT_PRIVATE, 0, NULL <unfinished ...>
>   [pid 3001579] futex(0x55b69bf9ce18, FUTEX_WAIT_PRIVATE, 0, NULL
> 
> I suppose an easy solution would be to cap the core count of the
> program while working with upstream to find a proper fix.  Will
> see what can be done about it.

It seems that this didn't work on our armhf worker:
https://ci.debian.net/packages/k/kmc/testing/armhf/

The amd64 run didn't happen on the big box.

Paul





Bug#954270: [RFS] kmc: arm64 autopkgtest time out

2020-12-06 Thread Étienne Mollier
Control: tag -1 + confirmed
Control: tag -1 - unreproducible

Hi Paul,

Paul Gevers, on 2020-12-06 20:54:43 +0100:
> It recently started to time out on amd64 too, but not always [1]. And
> when we added armhf, that timed out too. The failures on amd64 that I
> checked were all on ci-worker13, which is one of our hosts that runs
> multiple debci-workers. That's common on our arm64 workers too.

> [1] https://ci.debian.net/packages/k/kmc/testing/amd64/

Thanks for the pointers, I believe I found a reproducer!  :)

The failing CI runners all had a high NR_CPUS count in common,
at least 32 cores.  I don't have 32 cores at hand, but kmc
provides an option -t to increase the parallelization.  The
following command, in the conditions of the autopkgtest, will
hang on any machine:

$ kmc -ci1 -m1 -k28 -t32 $ORIGDIR/debian/tests/sample_6.fastq.gz 1 .

strace output looks like the program just deadlocks, I see no
CPU consumption while the command is supposed to be running:

strace: Process 3001579 attached with 23 threads
[pid 3001633] futex(0x55b69bf9cb70, FUTEX_WAIT_PRIVATE, 0, NULL <unfinished ...>
[pid 3001632] futex(0x55b69bf9cb70, FUTEX_WAIT_PRIVATE, 0, NULL <unfinished ...>
[...]
[pid 3001612] futex(0x55b69bf9cb70, FUTEX_WAIT_PRIVATE, 0, NULL <unfinished ...>
[pid 3001579] futex(0x55b69bf9ce18, FUTEX_WAIT_PRIVATE, 0, NULL
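
Since a deadlocked run like this never returns on its own, one way to
keep a reproducer from blocking a terminal or a test is to run it under
timeout(1) from coreutils.  A minimal sketch; the wrapper name
run_with_deadline and the 60 second limit in the usage comment are
assumptions for illustration, not part of the kmc test suite:

```shell
#!/bin/sh
# Hypothetical wrapper: run a command under timeout(1) and say so on
# stderr when the limit was hit.  timeout exits with status 124 when it
# had to stop the command.
run_with_deadline() {
    limit=$1
    shift
    timeout "$limit" "$@"
    status=$?
    if [ "$status" -eq 124 ]; then
        echo "timed out (limit: $limit): $*" >&2
    fi
    return "$status"
}

# Possible use with the reproducer above (not executed here):
#   run_with_deadline 60 kmc -ci1 -m1 -k28 -t32 \
#       "$ORIGDIR/debian/tests/sample_6.fastq.gz" 1 .
```

An exit status of 124 then distinguishes a hang from an ordinary
failure of the command itself.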

I suppose an easy solution would be to cap the core count of the
program while working with upstream to find a proper fix.  Will
see what can be done about it.
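
A minimal sketch of what such a cap could look like in a test script.
The helper name cap_threads and the limit of 8 are assumptions chosen
for illustration; this thread does not establish which thread count is
safe, only that -t1 completes and -t32 hangs.

```shell
#!/bin/sh
# Hypothetical helper: print the smaller of a CPU count ($1) and a cap ($2).
cap_threads() {
    ncpu=$1
    cap=$2
    if [ "$ncpu" -gt "$cap" ]; then
        echo "$cap"
    else
        echo "$ncpu"
    fi
}

# nproc (coreutils) reports the number of CPUs available to this process.
# The cap of 8 is an assumed value, not one taken from the package.
THREADS=$(cap_threads "$(nproc)" 8)

# Only print the command the test would run; drop the echo to execute it.
echo "kmc -ci1 -m1 -k28 -t${THREADS} \$ORIGDIR/debian/tests/sample_6.fastq.gz 1 ."
```

On a 48 CPU host this would pass -t8 to kmc, and on a 4 CPU host -t4.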

Have a good evening,  :)
-- 
Étienne Mollier 
Fingerprint:  8f91 b227 c7d6 f2b1 948c  8236 793c f67e 8f0d 11da
Sent from /dev/pts/2, please excuse my verbosity.

