Re: [HelenOS-devel] scalable hash table - week 11

Adam Hraska Thu, 23 Aug 2012 16:49:57 -0700

On Thu, Aug 23, 2012 at 1:21 PM, Jakub Jermar <[email protected]> wrote:
> On 23.8.2012 9:56, Adam Hraska wrote:
>> On Thu, Aug 23, 2012 at 12:45 AM, Jakub Jermar <[email protected]> wrote:
>>> Tonight, I did some testing and unfortunately it appears the branch is
>>> not that scrubbed or even tested as one would have hoped. These are the
>>> issues I noticed:
>>>
>>> - test cht panics the kernel when run on real amd64 or ia32 hardware
>>> (4-core Intel i5, 1.4GHz, 4G memory)
>>> - test cht panics the kernel when run on the above machine in QEMU (-smp
>>> 4 --enable-kvm)
>>> - unfortunately the panic message is interleaved with some other
>>> concurrent output so it is completely unreadable (some synchronization
>>> should be probably added back to printf)
>>
>> That is very unfortunate. While I expected there to be some latent
>> bugs in the implementation (as in any new software) I would not have
>> expected "test cht" would fail (eg leak memory) and definitely not to
>> panic.
>>
>> It is however quite difficult to respond to this point since I have no
>> idea what went wrong.
>
> I believe this is the same thing as the QEMU panic on UP. The same
> assertion is just hit by each CPU at about the same time, which would
> explain the blended panic message.


Thanks. I will have a look at it on Sunday, when I get back to normal
computer & internet connection.

>> I am most certain the implementation needs more elaborate
>> tests as well as running them for hours on ends on various
>> real hardware. However, the implementation is far from being
>> untested. I ran the tests and the benchmarks (which also
>> check for invariants in debug builds) for literally hours during
>> the week 30.7. - 3.8. on these configurations:
>> - Intel Core i7 920 (4x 2.66 GHz + HyperThreading) = ~12 hours/4 days
>> (the computer lab openned arround 10:00, and they kicked me
>> out at around 01:00 = 15 hours, but of course it takes some
>> time to set up eg the benchmark so 12 hours should be a reasonable
>> estimate)
>> - Intel Core2 Quad Q9550 (4x 2.83 GHz) = ~12 hours = 1 day
>> ... both mostly with "qemu-system-x86_64 -smp 4" or "-smp 3"
>> - Intel Pentium M 2 GHz - my development machine, so tested
>> most often on it. Does not have hw virtualization support.
>
> If I am not mistaken, all of those were in fact QEMU runs, right? We
> need to test this thoroughly on bare metal as well.

Yep, they were run in qemu. In qemu with hardware virtualization.
I would hope it does a reasonable job of mimicking running the
code on bare metal -- CHT is not kernel-specific code and it
just makes use of ordering memory accesses and having atomic
instructions.

>> Ok, this is embarresing. While writing the documentation I
>> noticed adding this requirement (ie surrounding cht_remove_item()
>> in rcu read lock) would eliminate a possible misuse of the api
>> (see comment above the asser). I did not run the test because,
>> well, it was "just" documentation/comments.
>
> Also if you run this as non-debug build, the assertion will not be hit.

Just to clarify: I added the assert because I thought it may be
a safer use of the api and requiring an rcu read lock enclosed
cht_remove_item() did not seem like a big deal -- although not
really needed most of time it might be difficult for a casual user
to decide if it is needed and entering rcu read side is inexpensive.
But then I forgot the commit had this single code change and
since I thought I had only added comments I did not test anything
and just checked that it builds and boots. Oops.

>> Not having a high-level algorithmic overview is definitely a minus
>> point. And something that must be added if the code is to be
>> merged.
>>
>> While I think it is a must and a good idea to have such a description,
>> it is not going to be a single paragraph of text. Eg Podzimek's
>> RCU algorithm is described in a not-so-short chapter of his thesis...
>> CHT draws from 4 papers. Maybe you can now see why my lazy
>> self has not yet completed such a description.
>
> I understand your reasons, but still, creating the intro is part of the
> job. The Big Theory Statement needs not be exhaustively exhaustive. It
> should briefly explain the idea behind the algorithm and perhaps some
> important details of the implementation and provide pointers to papers
> where one can read more about it.

Agreed. I will eventually add them, but that is not to be considered
during the final evaluation.

>> There are nice descriptions of bucket join and splitting operations.
>
> Indeed there are. But especially without some introduction, I am afraid
> they are not that useful to someone who is trying to understand what is
> going on. I believe these comments are meant for someone who is already
> familiar with the hypothetical Big Theory Statement.

Understood. I will make writing the big picture description a priority
so that others can then check the code or the algorithm itself. Since
I will return to Prague during the weekend I can write and commit
such a description during the next week at the earliest.

Adam

_______________________________________________
HelenOS-devel mailing list
[email protected]
http://lists.modry.cz/cgi-bin/listinfo/helenos-devel

Re: [HelenOS-devel] scalable hash table - week 11

Reply via email to