On Thu, Aug 23, 2012 at 1:21 PM, Jakub Jermar <[email protected]> wrote: > On 23.8.2012 9:56, Adam Hraska wrote: >> On Thu, Aug 23, 2012 at 12:45 AM, Jakub Jermar <[email protected]> wrote: >>> Tonight, I did some testing and unfortunately it appears the branch is >>> not that scrubbed or even tested as one would have hoped. These are the >>> issues I noticed: >>> >>> - test cht panics the kernel when run on real amd64 or ia32 hardware >>> (4-core Intel i5, 1.4GHz, 4G memory) >>> - test cht panics the kernel when run on the above machine in QEMU (-smp >>> 4 --enable-kvm) >>> - unfortunately the panic message is interleaved with some other >>> concurrent output so it is completely unreadable (some synchronization >>> should be probably added back to printf) >> >> That is very unfortunate. While I expected there to be some latent >> bugs in the implementation (as in any new software) I would not have >> expected "test cht" would fail (eg leak memory) and definitely not to >> panic. >> >> It is however quite difficult to respond to this point since I have no >> idea what went wrong. > > I believe this is the same thing as the QEMU panic on UP. The same > assertion is just hit by each CPU at about the same time, which would > explain the blended panic message.
Thanks. I will have a look at it on Sunday, when I get back to normal computer & internet connection. >> I am most certain the implementation needs more elaborate >> tests as well as running them for hours on ends on various >> real hardware. However, the implementation is far from being >> untested. I ran the tests and the benchmarks (which also >> check for invariants in debug builds) for literally hours during >> the week 30.7. - 3.8. on these configurations: >> - Intel Core i7 920 (4x 2.66 GHz + HyperThreading) = ~12 hours/4 days >> (the computer lab openned arround 10:00, and they kicked me >> out at around 01:00 = 15 hours, but of course it takes some >> time to set up eg the benchmark so 12 hours should be a reasonable >> estimate) >> - Intel Core2 Quad Q9550 (4x 2.83 GHz) = ~12 hours = 1 day >> ... both mostly with "qemu-system-x86_64 -smp 4" or "-smp 3" >> - Intel Pentium M 2 GHz - my development machine, so tested >> most often on it. Does not have hw virtualization support. > > If I am not mistaken, all of those were in fact QEMU runs, right? We > need to test this thoroughly on bare metal as well. Yep, they were run in qemu. In qemu with hardware virtualization. I would hope it does a reasonable job of mimicking running the code on bare metal -- CHT is not kernel-specific code and it just makes use of ordering memory accesses and having atomic instructions. >> Ok, this is embarresing. While writing the documentation I >> noticed adding this requirement (ie surrounding cht_remove_item() >> in rcu read lock) would eliminate a possible misuse of the api >> (see comment above the asser). I did not run the test because, >> well, it was "just" documentation/comments. > > Also if you run this as non-debug build, the assertion will not be hit. Just to clarify: I added the assert because I thought it may be a safer use of the api and requiring an rcu read lock enclosed cht_remove_item() did not seem like a big deal -- although not really needed most of time it might be difficult for a casual user to decide if it is needed and entering rcu read side is inexpensive. But then I forgot the commit had this single code change and since I thought I had only added comments I did not test anything and just checked that it builds and boots. Oops. >> Not having a high-level algorithmic overview is definitely a minus >> point. And something that must be added if the code is to be >> merged. >> >> While I think it is a must and a good idea to have such a description, >> it is not going to be a single paragraph of text. Eg Podzimek's >> RCU algorithm is described in a not-so-short chapter of his thesis... >> CHT draws from 4 papers. Maybe you can now see why my lazy >> self has not yet completed such a description. > > I understand your reasons, but still, creating the intro is part of the > job. The Big Theory Statement needs not be exhaustively exhaustive. It > should briefly explain the idea behind the algorithm and perhaps some > important details of the implementation and provide pointers to papers > where one can read more about it. Agreed. I will eventually add them, but that is not to be considered during the final evaluation. >> There are nice descriptions of bucket join and splitting operations. > > Indeed there are. But especially without some introduction, I am afraid > they are not that useful to someone who is trying to understand what is > going on. I believe these comments are meant for someone who is already > familiar with the hypothetical Big Theory Statement. Understood. I will make writing the big picture description a priority so that others can then check the code or the algorithm itself. Since I will return to Prague during the weekend I can write and commit such a description during the next week at the earliest. Adam _______________________________________________ HelenOS-devel mailing list [email protected] http://lists.modry.cz/cgi-bin/listinfo/helenos-devel
