On Thu, Aug 23, 2012 at 12:45 AM, Jakub Jermar <[email protected]> wrote:
> Tonight, I did some testing and unfortunately it appears the branch is
> not that scrubbed or even tested as one would have hoped. These are the
> issues I noticed:
>
> - test cht panics the kernel when run on real amd64 or ia32 hardware
> (4-core Intel i5, 1.4GHz, 4G memory)
> - test cht panics the kernel when run on the above machine in QEMU (-smp
> 4 --enable-kvm)
> - unfortunately the panic message is interleaved with some other
> concurrent output so it is completely unreadable (some synchronization
> should be probably added back to printf)

That is very unfortunate. While I expected there to be some latent
bugs in the implementation (as in any new software) I would not have
expected "test cht" would fail (eg leak memory) and definitely not to
panic.

It is however quite difficult to respond to this point since I have no
idea what went wrong.

Let me first respond to:
> - finally, what worries me is that the cht obviously did not see any
> systematic testing (as evidenced by the panics)

I am most certain the implementation needs more elaborate
tests as well as running them for hours on ends on various
real hardware. However, the implementation is far from being
untested. I ran the tests and the benchmarks (which also
check for invariants in debug builds) for literally hours during
the week 30.7. - 3.8. on these configurations:
- Intel Core i7 920 (4x 2.66 GHz + HyperThreading) = ~12 hours/4 days
(the computer lab openned arround 10:00, and they kicked me
out at around 01:00 = 15 hours, but of course it takes some
time to set up eg the benchmark so 12 hours should be a reasonable
estimate)
- Intel Core2 Quad Q9550 (4x 2.83 GHz) = ~12 hours = 1 day
... both mostly with "qemu-system-x86_64 -smp 4" or "-smp 3"
- Intel Pentium M 2 GHz - my development machine, so tested
most often on it. Does not have hw virtualization support.

It is however true, this testing was done before r1586 of [1].
I did not test it thoroughly after switching to using a sentinel
node to terminate bucket chains (that's r1586). I did not think
the change was serious enough to warrant extensive testing.
In my case it means going to the school's computer lab
and spending a day or two there.

>   - just noticed that when run in QEMU (without kvm, just 1 CPU), it
>     crashes too due to a failed assertion at adt/cht.c:872:
>     rcu_read_locked()

Ok, this is embarresing. While writing the documentation I
noticed adding this requirement (ie surrounding cht_remove_item()
in rcu read lock) would eliminate a possible misuse of the api
(see comment above the asser). I did not run the test because,
well, it was "just" documentation/comments.

> - make check reveals that only amd64 and ia32 targets build
> - other targets do not build for various reasons:
>   - spinlock_t * needs to be defined for UP builds too because it is
>     now needed by condvars regardless of whether the build is UP or SMP
>   - sparc64 is missing some proc/thread.h includes
>   - non x86 platforms are missing atomic_cas_* implementations (the
>     first step would be to make all platforms at least build using just
>     dummy implementations, the next step would be to provide
>     functional, but perhaps still limited implementations for these);
>     for example, armv4 does not have atomic instructions, so all one
>     can do is disable interrupts and then do a non-atomic cas

I will respond to this in the evening today.

> - I also find it a little suboptimal that you didn't provide any of the
> Big Theory Statements at the beginning of RCU a CHT files as I
> previously suggested

Not having a high-level algorithmic overview is definitely a minus
point. And something that must be added if the code is to be
merged.

While I think it is a must and a good idea to have such a description,
it is not going to be a single paragraph of text. Eg Podzimek's
RCU algorithm is described in a not-so-short chapter of his thesis...
CHT draws from 4 papers. Maybe you can now see why my lazy
self has not yet completed such a description.

On the other hand, the ratio of code/comments is well above average.
There are nice descriptions of bucket join and splitting operations.

> - finally, what worries me is that the cht obviously did not see any
> systematic testing (as evidenced by the panics) and also it was not
> tried for the global page hash table (sparc64 and ia64 do not build and
> page_ht.c still uses the old hash table)
>

Again, I will respond in the evening today.

Adam

PS: I am on vacation at this moment without my notebook, so I
cannot build/run/commit anything.

[1] http://bazaar.launchpad.net/~adam-hraska+lp/helenos/rcu/revision/1586

_______________________________________________
HelenOS-devel mailing list
[email protected]
http://lists.modry.cz/cgi-bin/listinfo/helenos-devel

Reply via email to