On Thu, Aug 23, 2012 at 12:45 AM, Jakub Jermar <[email protected]> wrote: > Tonight, I did some testing and unfortunately it appears the branch is > not that scrubbed or even tested as one would have hoped. These are the > issues I noticed: > > - test cht panics the kernel when run on real amd64 or ia32 hardware > (4-core Intel i5, 1.4GHz, 4G memory) > - test cht panics the kernel when run on the above machine in QEMU (-smp > 4 --enable-kvm) > - unfortunately the panic message is interleaved with some other > concurrent output so it is completely unreadable (some synchronization > should be probably added back to printf)
That is very unfortunate. While I expected there to be some latent bugs in the implementation (as in any new software) I would not have expected "test cht" would fail (eg leak memory) and definitely not to panic. It is however quite difficult to respond to this point since I have no idea what went wrong. Let me first respond to: > - finally, what worries me is that the cht obviously did not see any > systematic testing (as evidenced by the panics) I am most certain the implementation needs more elaborate tests as well as running them for hours on ends on various real hardware. However, the implementation is far from being untested. I ran the tests and the benchmarks (which also check for invariants in debug builds) for literally hours during the week 30.7. - 3.8. on these configurations: - Intel Core i7 920 (4x 2.66 GHz + HyperThreading) = ~12 hours/4 days (the computer lab openned arround 10:00, and they kicked me out at around 01:00 = 15 hours, but of course it takes some time to set up eg the benchmark so 12 hours should be a reasonable estimate) - Intel Core2 Quad Q9550 (4x 2.83 GHz) = ~12 hours = 1 day ... both mostly with "qemu-system-x86_64 -smp 4" or "-smp 3" - Intel Pentium M 2 GHz - my development machine, so tested most often on it. Does not have hw virtualization support. It is however true, this testing was done before r1586 of [1]. I did not test it thoroughly after switching to using a sentinel node to terminate bucket chains (that's r1586). I did not think the change was serious enough to warrant extensive testing. In my case it means going to the school's computer lab and spending a day or two there. > - just noticed that when run in QEMU (without kvm, just 1 CPU), it > crashes too due to a failed assertion at adt/cht.c:872: > rcu_read_locked() Ok, this is embarresing. While writing the documentation I noticed adding this requirement (ie surrounding cht_remove_item() in rcu read lock) would eliminate a possible misuse of the api (see comment above the asser). I did not run the test because, well, it was "just" documentation/comments. > - make check reveals that only amd64 and ia32 targets build > - other targets do not build for various reasons: > - spinlock_t * needs to be defined for UP builds too because it is > now needed by condvars regardless of whether the build is UP or SMP > - sparc64 is missing some proc/thread.h includes > - non x86 platforms are missing atomic_cas_* implementations (the > first step would be to make all platforms at least build using just > dummy implementations, the next step would be to provide > functional, but perhaps still limited implementations for these); > for example, armv4 does not have atomic instructions, so all one > can do is disable interrupts and then do a non-atomic cas I will respond to this in the evening today. > - I also find it a little suboptimal that you didn't provide any of the > Big Theory Statements at the beginning of RCU a CHT files as I > previously suggested Not having a high-level algorithmic overview is definitely a minus point. And something that must be added if the code is to be merged. While I think it is a must and a good idea to have such a description, it is not going to be a single paragraph of text. Eg Podzimek's RCU algorithm is described in a not-so-short chapter of his thesis... CHT draws from 4 papers. Maybe you can now see why my lazy self has not yet completed such a description. On the other hand, the ratio of code/comments is well above average. There are nice descriptions of bucket join and splitting operations. > - finally, what worries me is that the cht obviously did not see any > systematic testing (as evidenced by the panics) and also it was not > tried for the global page hash table (sparc64 and ia64 do not build and > page_ht.c still uses the old hash table) > Again, I will respond in the evening today. Adam PS: I am on vacation at this moment without my notebook, so I cannot build/run/commit anything. [1] http://bazaar.launchpad.net/~adam-hraska+lp/helenos/rcu/revision/1586 _______________________________________________ HelenOS-devel mailing list [email protected] http://lists.modry.cz/cgi-bin/listinfo/helenos-devel
