Re: [Chicken-hackers] Regarding #1564: srfi-18: (mutex-unlock) Internal scheduler error
On Dec 3 2018, Peter Bex wrote:

> On Mon, Dec 03, 2018 at 10:46:38AM +0100, Jörg F. Wittenberger wrote:
>> So for me the question remains: wouldn't it be much, much more efficient to work sort-of hand-in-hand with one of the core developers, or maybe on the list to get the remaining things (bugs and improvements) fixed and reviewed?
>
> I think this would be quite helpful. Perhaps at another hackathon we can sit down together (ideally with more than one core developer to ensure we all are on the same page and understand it).

Agreed. Or maybe on the list? It could take a while to find a chance to meet.

> This is one of the ugly truths of open source collaborative development; you really have to have a good plan on how to communicate the changes you're making back to "upstream", [...] Dropping a complex patch is generally not the way to go about adding code to an existing system. [...] eyeball it for obvious mistakes and other quality issues.

Too true. Plus: it depends on the culture of the project. Try to put yourself in my shoes for a moment. Before porting to chicken, I mostly contributed to rscheme, which was a one-man show. When I hit an issue, I'd send a vague patch, and a completely rewritten one came back after a day or two. Granted, those were generally small issues or rare failures of complex optimizations going wrong.

> This also means that the submitted code has to be so simple that others who aren't familiar with it can study it and debug it if issues crop up (and they will, with any sizable change). The scheduler is a major pain point regarding this, since concurrency is difficult enough (or impossible?) to understand at all, regardless of the quality of the code in the scheduler (which isn't stellar to begin with).

So when I evaluated chicken, I found it to be a compiler producing slightly faster code than rscheme. The first tests went well. Then I invested a lot of time until I could run a more sizable piece of code, just to run into all sorts of issues. Take #1564 as an example.
It can be much worse than just killing the program: when I ran into it, I was not always lucky enough to find the thread piled up in the waiting queue in a state the consistency check would complain about. When the thread was blocked on a different mutex (hence sitting in two waiting queues at the same time), mutex-unlock! would happily unblock it, thus stealing the other mutex from the third thread holding it. That makes a mockery of the idea of using mutexes for synchronization.

At that point it did _not_ occur to me that my code was an especially complex thing. Nor did I assume nobody else had ever run more than toy examples on load-free systems. I assumed it was obvious how badly broken it was. (And I did not foresee this not coming up elsewhere for a decade.)

Sure, I felt bad about having to bring up such a huge patch. But it fixed several interrelated bugs and reduced Big-O complexity in two places. I might have expected some questions, comments etc. Certainly not to be completely ignored. So I tried to push it for a while.

The same goes for issues that literally went against the textbook examples used to teach how not to do things, like not using dummy head lists in C - something I did not believe anybody would do. At least I expected the respective patch to be welcome, especially as it was quite a job to actually change a large file and then test the results before submitting.

Eventually I took being ignored as unwarranted... ...and lost faith in the project.

> So at some point, merging a large change is a bit of an act of faith. It also requires trust, which needs to be built up over time by showing consistent quality patches and commitment to the project. This is the really hard bit, especially if you just want one specific feature to be added and don't have that many other things to contribute to the system simply because it works for you.

Yeah, I just needed a compiler, as rscheme was dead. I did not want to turn into a teacher.
I had little hope that I could ever get something fed back upstream. Hence I stopped trying.

> I don't have a good solution for this, but your suggestion to walk through the code together seems like a good one to me.

Agreed. Chicken is not that bad. It just has a couple of rough edges.

Best

/Jörg

___
Chicken-hackers mailing list
Chicken-hackers@nongnu.org
https://lists.nongnu.org/mailman/listinfo/chicken-hackers
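The failure mode described in this message (a thread sitting in two waiting queues, then being woken by the wrong unlock) can be illustrated with a toy model. This is a Python sketch, not CHICKEN code: every class and function name is hypothetical, and it assumes one plausible cause of the double queueing, namely a timed-out lock attempt that leaves a stale entry behind. The thread does not state how the stale entry actually arose.

```python
from collections import deque


class Thread:
    def __init__(self, name):
        self.name = name
        self.state = "ready"      # "ready" or "blocked"
        self.blocked_on = None    # the mutex this thread is really waiting for


class Mutex:
    def __init__(self, name):
        self.name = name
        self.owner = None
        self.waiters = deque()


def lock(mutex, thread):
    """Take the mutex if it is free; otherwise block the thread on it."""
    if mutex.owner is None:
        mutex.owner = thread
        return True
    thread.state = "blocked"
    thread.blocked_on = mutex
    mutex.waiters.append(thread)
    return False


def timeout_without_cleanup(thread):
    """Assumed bug: wake the thread but leave it in the old waiting queue."""
    thread.state = "ready"
    thread.blocked_on = None


def naive_unlock(mutex):
    """Hand the mutex to the first waiter, checking only the waiter's state."""
    mutex.owner = None
    if mutex.waiters:
        waiter = mutex.waiters.popleft()
        if waiter.state != "blocked":
            raise RuntimeError("unknown thread state: " + waiter.state)
        mutex.owner = waiter
        waiter.state = "ready"    # woken, although it may wait for another mutex
        return waiter
    return None


a, b = Mutex("a"), Mutex("b")
t1, t2, t3 = Thread("t1"), Thread("t2"), Thread("t3")

lock(a, t1)                  # t1 owns a
lock(b, t3)                  # t3 owns b
lock(a, t2)                  # t2 blocks on a
timeout_without_cleanup(t2)  # t2 gives up on a; stale entry stays in a.waiters
lock(b, t2)                  # t2 now blocks on b: it sits in two queues

woken = naive_unlock(a)      # t1 releases a and wakes t2 through the stale entry
```

After the last call, `t2` is runnable while `b` is still owned by `t3`, so `t2` would resume as if its wait on `b` had succeeded. If `naive_unlock(a)` runs before the second `lock` call instead, the same stale entry trips the state check, which resembles the "unknown thread state" error reported in #1564.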
Re: [Chicken-hackers] Regarding #1564: srfi-18: (mutex-unlock) Internal scheduler error
On Mon, Dec 03, 2018 at 10:46:38AM +0100, Jörg F. Wittenberger wrote:
> What's going on here IMHO is that a lot of lifetime, yours and mine, is wasted. At the same time the code quality of the result is likely worse than what I'm using as the source to cut out those patches.
> [...]
>
> So for me the question remains: wouldn't it be much, much more efficient to work sort-of hand-in-hand with one of the core developers, or maybe on the list to get the remaining things (bugs and improvements) fixed and reviewed?

I think this would be quite helpful. Perhaps at another hackathon we can sit down together (ideally with more than one core developer to ensure we all are on the same page and understand it).

This is one of the ugly truths of open source collaborative development; you really have to have a good plan on how to communicate the changes you're making back to "upstream", or face porting nightmares every time you upgrade. I've made this mistake a few times too back in the day, some with CHICKEN (the Makefile refactor is one of those) and some with other projects.

Dropping a complex patch is generally not the way to go about adding code to an existing system. On the other hand, sometimes one person or group creates changes so large that they can't be split up (for instance the numbers stuff, or the chicken-install rewrite). At such points there is no realistic way to review everything, so the best the "upstream" can do is test the code extensively and eyeball it for obvious mistakes and other quality issues.

This also means that the submitted code has to be so simple that others who aren't familiar with it can study it and debug it if issues crop up (and they will, with any sizable change). The scheduler is a major pain point regarding this, since concurrency is difficult enough (or impossible?) to understand at all, regardless of the quality of the code in the scheduler (which isn't stellar to begin with).
So at some point, merging a large change is a bit of an act of faith. It also requires trust, which needs to be built up over time by showing consistent quality patches and commitment to the project. This is the really hard bit, especially if you just want one specific feature to be added and don't have that many other things to contribute to the system simply because it works for you.

I don't have a good solution for this, but your suggestion to walk through the code together seems like a good one to me.

Cheers,
Peter
Re: [Chicken-hackers] Regarding #1564: srfi-18: (mutex-unlock) Internal scheduler error
Thank you so much, Kon. Reviewing these logs helped to confirm my feelings. Feelings, not findings. Yet.

Tinkering with these scheduler/srfi-18 issues again really made me feel bad and sorry. In fact the anger cost me the better half of the night's sleep. It still enrages me.

What's going on here IMHO is that a lot of lifetime, yours and mine, is wasted. At the same time the code quality of the result is likely worse than what I'm using as the source to cut out those patches.

As I can't outright prove this statement to you, let me recap the background for a moment:

Around a decade ago I ported a rather thread-heavy thing from rscheme to chicken: Askemos, which technically is partially inspired by Erlang and bears similarities to Termite, except that its processes are all made persistent and their state is replicated and synchronized by byzantine agreement over a part of the network. You might be able to imagine that this really stresses the threading capabilities of the language in use. The code had at that time been growing for ~7 years; that's almost 100 modules, which took some months to port. ...

...Only to learn that the threading in chicken was not at all up to the job. Hence I spent a few more weeks fixing that, including adding a priority queue for the timeout list and the fd list.

What I could NOT produce were test cases for each of the bugs (1231, 1232, 1255, 1564, and these are not all) I fixed in the process. Nor was it feasible to fix them one by one. (Yesterday evening I failed to properly backport the fix for 1564 into the ugly code implementing the timeout queue, while asking myself why the hell that would be useful; this queue should be replaced with a better version anyway.)

I posted the result on chicken-users at that time. It was a complex fix. Sure. But those were interrelated bugs.
Then for about seven years I sadly maintained a chicken fork (which I'm still using in production) carrying these differences, in order to be able to use chicken at all. Since 4.12 it is at least _possible_ to run this code on stock chicken, partly because I changed my code to avoid triggering the remaining bugs.

So for me the question remains: wouldn't it be much, much more efficient to work sort-of hand-in-hand with one of the core developers, or maybe on the list to get the remaining things (bugs and improvements) fixed and reviewed?

It would be so much more satisfying to me to actually produce code I could approve of myself than to backport yet another hotfix, creating a result in the process that I take issue with.

(((Going into details, I'd probably do the prio-queue differently today, as I have learned about chicken's performance details. And I'm ready to do so. But at least I'd like to be allowed to use a prio queue with a proper interface rather than kludging inline handling of a linear list into well-tested code, likely creating fresh bugs in the process.)))

Best

/Jörg

On Dec 2 2018, Kon Lovett wrote:

> see attached git (C4) & svn (C5) logs
>
> #(in C4 core local repo)
> git log --follow -p -- srfi-18.scm >srfi-18.log
>
> #(in C5 svn local repo)
> svn log --diff trunk >srfi-18_trunk-diff.log
>
> hth
>
> On Dec 2, 2018, at 1:19 PM, Jörg F. Wittenberger wrote:
>
>> Thanks for the replies,
>>
>> chicken-install -r srfi-18 ; did the trick already
>>
>> I should have stated that that's what I have; what I've been looking for was the git history.
>>
>> I wonder for some statements why the hell they are there at all. Two possible reasons: a) I cleaned them up for being obsolete (due to former changes I made) b) removed since I touched the file, which begs the question "why were those added".
>>
>> Never mind. I can proceed at least.
>>
>> On Dec 2 2018, Kon Lovett wrote:
>>
>>> well, that shows me. ;-)
>>>
>>> trying to track down why
>>>
>>> #497 $ chicken-install -r srfi-18
>>> mapped (srfi-18) to ()
>>> retrieving ...
On Dec 2, 2018, at 10:42 AM, Kon Lovett wrote: C5 evicted srfi-18, along w/ srfi-1, 13, 14, & 69, to the egg store. chicken-install -retrieve. On Dec 2, 2018, at 10:39 AM, Jörg F. Wittenberger wrote: Hi all, when I tried to reply in a timely manner I apparently sent out a link to a broken file. Sorry for that. Just wanted to see if I could create a patch for the current master. For this I need srfi-18 egg source too. Just I can't find it. Jöry On Nov 30 2018, Jörg F. Wittenberger wrote: Hello Megane, On Nov 30 2018, megane wrote: Hi, Here's another version that crashes quickly with "very high probability". ... 24 Error: (mutex-unlock) Internal scheduler error: unknown thread state 25 # 26 ready This bears an uncanny resemblance to scheduler issues I've been fighting a long ago. Too long to ago. --- A fix Just allow the 'ready state for threads in mutex-unlock! ... Is this a correct fix? Too long ago. But it feels wrong. We'd rather make sure there is no ready thread in the queue waiting for a mutex in the first place. Diffing the changes I maintained quite a while back http://ball.askemos.
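The message above argues for handling timeouts through a priority queue with a proper interface instead of inline handling of a linear list. A minimal Python sketch of that idea follows. It is not the scheduler's code and not the patch discussed here: `TimeoutQueue` and its methods are hypothetical names, and the choice of a binary heap with lazy cancellation is an assumption, since the thread does not say which structure the original fix used. With a heap, adding a deadline and extracting the earliest one cost O(log n), where inserting into a sorted list costs O(n).

```python
import heapq
import itertools


class TimeoutQueue:
    """Min-heap of [deadline, sequence, thread] entries with lazy removal."""

    def __init__(self):
        self._heap = []
        self._counter = itertools.count()   # tie-breaker for equal deadlines
        self._live = {}                     # thread -> its current heap entry

    def add(self, thread, deadline):
        """Schedule (or reschedule) a wake-up for thread at deadline."""
        self.remove(thread)
        entry = [deadline, next(self._counter), thread]
        self._live[thread] = entry
        heapq.heappush(self._heap, entry)

    def remove(self, thread):
        """Cancel a pending timeout; the heap entry is discarded lazily."""
        entry = self._live.pop(thread, None)
        if entry is not None:
            entry[2] = None                 # mark the entry as cancelled

    def pop_expired(self, now):
        """Return all threads whose deadline is <= now, earliest first."""
        expired = []
        while self._heap and self._heap[0][0] <= now:
            _deadline, _seq, thread = heapq.heappop(self._heap)
            if thread is not None:          # skip cancelled entries
                del self._live[thread]
                expired.append(thread)
        return expired


q = TimeoutQueue()
q.add("t1", 5.0)
q.add("t2", 1.0)
q.add("t3", 3.0)
q.remove("t3")               # e.g. t3 was woken before its timeout fired
ready = q.pop_expired(4.0)   # only t2 is due at time 4.0
later = q.pop_expired(10.0)  # t1 becomes due afterwards
```

Keeping the three operations behind one small interface means the scheduler's call sites never touch the underlying storage, which is the separation the message asks for.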
Re: [Chicken-hackers] Regarding #1564: srfi-18: (mutex-unlock) Internal scheduler error
Thanks for the replies,

chicken-install -r srfi-18 ; did the trick already

I should have stated that that's what I have; what I've been looking for was the git history.

I wonder for some statements why the hell they are there at all. Two possible reasons: a) I cleaned them up for being obsolete (due to former changes I made) b) removed since I touched the file, which begs the question "why were those added".

Never mind. I can proceed at least.

On Dec 2 2018, Kon Lovett wrote:

> well, that shows me. ;-)
>
> trying to track down why
>
> #497 $ chicken-install -r srfi-18
> mapped (srfi-18) to ()
> retrieving ...
>
> On Dec 2, 2018, at 10:42 AM, Kon Lovett wrote:
>
>> C5 evicted srfi-18, along w/ srfi-1, 13, 14, & 69, to the egg store.
>>
>> chicken-install -retrieve.
>>
>> On Dec 2, 2018, at 10:39 AM, Jörg F. Wittenberger wrote:
>>
>>> Hi all,
>>>
>>> when I tried to reply in a timely manner I apparently sent out a link to a broken file. Sorry for that.
>>>
>>> Just wanted to see if I could create a patch for the current master.
>>>
>>> For this I need the srfi-18 egg source too. I just can't find it.
>>>
>>> Jörg
>>>
>>> On Nov 30 2018, Jörg F. Wittenberger wrote:
>>>
>>>> Hello Megane,
>>>>
>>>> On Nov 30 2018, megane wrote:
>>>>
>>>>> Hi,
>>>>>
>>>>> Here's another version that crashes quickly with "very high probability".
>>>>
>>>> ...
>>>>
>>>>> 24 Error: (mutex-unlock) Internal scheduler error: unknown thread state
>>>>> 25 #
>>>>> 26 ready
>>>>
>>>> This bears an uncanny resemblance to scheduler issues I was fighting a long time ago.
>>>>
>>>> Too long ago.
>>>>
>>>>> --- A fix
>>>>>
>>>>> Just allow the 'ready state for threads in mutex-unlock!
>>>>>
>>>>> ...
>>>>> Is this a correct fix?
>>>>
>>>> Too long ago.
>>>>
>>>> But it feels wrong. We'd rather make sure there is no ready thread in the queue waiting for a mutex in the first place.
>>>>
>>>> Diffing the changes I maintained quite a while back
>>>> http://ball.askemos.org/Ad60e3fb123a79b2e5128915116b288f7/chicken-4.9.1-ball.tar.gz
>>>> you will find that I added a
>>>>
>>>> ##sys#thread-clear-blocking-state!
>>>>
>>>> towards the end of scheduler.scm and used it for consistency wherever I ran into not-so-clean unlocks.
>>>>
>>>> Now this is still an invasive change. But looking at the source of scheduler and srfi-18 in chicken 5 right now, I can't fight the feeling that it is working around the missing changes in several places.
>>>>
>>>> Best
>>>>
>>>> /Jörg
Re: [Chicken-hackers] Regarding #1564: srfi-18: (mutex-unlock) Internal scheduler error
well, that shows me. ;-) trying to track down why #497 $ chicken-install -r srfi-18 mapped (srfi-18) to () retrieving ... > On Dec 2, 2018, at 10:42 AM, Kon Lovett wrote: > > C5 evicted srfi-18, along w/ srfi-1, 13, 14, & 69, to the egg store. > > chicken-install -retrieve. > >> On Dec 2, 2018, at 10:39 AM, Jörg F. Wittenberger >> wrote: >> >> Hi all, >> >> when I tried to reply in a timely manner I apparently sent out a link to a >> broken file. Sorry for that. >> >> Just wanted to see if I could create a patch for the current master. >> >> For this I need srfi-18 egg source too. Just I can't find it. >> >> Jöry >> >> On Nov 30 2018, Jörg F. Wittenberger wrote: >> >>> Hello Megane, >>> >>> On Nov 30 2018, megane wrote: >>> Hi, Here's another version that crashes quickly with "very high probability". >>> ... 24 Error: (mutex-unlock) Internal scheduler error: unknown thread state 25# 26ready >>> >>> This bears an uncanny resemblance to scheduler issues I've been fighting a >>> long ago. >>> >>> Too long to ago. >>> --- A fix Just allow the 'ready state for threads in mutex-unlock! ... Is this a correct fix? >>> >>> >>> Too long ago. >>> >>> But it feels wrong. We'd rather make sure there is no ready thread in the >>> queue waiting for a mutex in the first place. >>> >>> Diffing the changes I maintained quite a while back >>> http://ball.askemos.org/Ad60e3fb123a79b2e5128915116b288f7/chicken-4.9.1-ball.tar.gz >>> you will find that I added a >>> >>> ##sys#thread-clear-blocking-state! >>> >>> Towards the end of scheduler.scm and used it for consistency whereever I >>> ran into not-so-clean unlocks. >>> >>> Now this is still an invasive change. But looking at the source of >>> scheduler and srfi-18 in chicken 5 right now, I can't fight the feeling >>> that it is working around the missing changes at several places. 
>>> >>> Best >>> >>> /Jörg
Re: [Chicken-hackers] Regarding #1564: srfi-18: (mutex-unlock) Internal scheduler error
C5 evicted srfi-18, along w/ srfi-1, 13, 14, & 69, to the egg store. chicken-install -retrieve. > On Dec 2, 2018, at 10:39 AM, Jörg F. Wittenberger > wrote: > > Hi all, > > when I tried to reply in a timely manner I apparently sent out a link to a > broken file. Sorry for that. > > Just wanted to see if I could create a patch for the current master. > > For this I need srfi-18 egg source too. Just I can't find it. > > Jöry > > On Nov 30 2018, Jörg F. Wittenberger wrote: > >> Hello Megane, >> >> On Nov 30 2018, megane wrote: >> >>> Hi, >>> >>> Here's another version that crashes quickly with "very high >>> probability". >> ... >>> 24 Error: (mutex-unlock) Internal scheduler error: unknown thread state >>> 25# >>> 26ready >> >> This bears an uncanny resemblance to scheduler issues I've been fighting a >> long ago. >> >> Too long to ago. >> >>> --- A fix >>> >>> Just allow the 'ready state for threads in mutex-unlock! >>> >>> ... >>> Is this a correct fix? >> >> >> Too long ago. >> >> But it feels wrong. We'd rather make sure there is no ready thread in the >> queue waiting for a mutex in the first place. >> >> Diffing the changes I maintained quite a while back >> http://ball.askemos.org/Ad60e3fb123a79b2e5128915116b288f7/chicken-4.9.1-ball.tar.gz >> you will find that I added a >> >> ##sys#thread-clear-blocking-state! >> >> Towards the end of scheduler.scm and used it for consistency whereever I ran >> into not-so-clean unlocks. >> >> Now this is still an invasive change. But looking at the source of scheduler >> and srfi-18 in chicken 5 right now, I can't fight the feeling that it is >> working around the missing changes at several places. 
>> >> Best >> >> /Jörg
Re: [Chicken-hackers] Regarding #1564: srfi-18: (mutex-unlock) Internal scheduler error
Hi all,

when I tried to reply in a timely manner I apparently sent out a link to a broken file. Sorry for that.

Just wanted to see if I could create a patch for the current master.

For this I need the srfi-18 egg source too. I just can't find it.

Jörg

On Nov 30 2018, Jörg F. Wittenberger wrote:

> Hello Megane,
>
> On Nov 30 2018, megane wrote:
>
>> Hi,
>>
>> Here's another version that crashes quickly with "very high probability".
>
> ...
>
>> 24 Error: (mutex-unlock) Internal scheduler error: unknown thread state
>> 25 #
>> 26 ready
>
> This bears an uncanny resemblance to scheduler issues I was fighting a long time ago.
>
> Too long ago.
>
>> --- A fix
>>
>> Just allow the 'ready state for threads in mutex-unlock!
>>
>> ...
>> Is this a correct fix?
>
> Too long ago.
>
> But it feels wrong. We'd rather make sure there is no ready thread in the queue waiting for a mutex in the first place.
>
> Diffing the changes I maintained quite a while back
> http://ball.askemos.org/Ad60e3fb123a79b2e5128915116b288f7/chicken-4.9.1-ball.tar.gz
> you will find that I added a
>
> ##sys#thread-clear-blocking-state!
>
> towards the end of scheduler.scm and used it for consistency wherever I ran into not-so-clean unlocks.
>
> Now this is still an invasive change. But looking at the source of scheduler and srfi-18 in chicken 5 right now, I can't fight the feeling that it is working around the missing changes in several places.
>
> Best
>
> /Jörg
Re: [Chicken-hackers] Regarding #1564: srfi-18: (mutex-unlock) Internal scheduler error
Hello Megane,

On Nov 30 2018, megane wrote:

> Hi,
>
> Here's another version that crashes quickly with "very high probability".

...

> 24 Error: (mutex-unlock) Internal scheduler error: unknown thread state
> 25 #
> 26 ready

This bears an uncanny resemblance to scheduler issues I was fighting a long time ago.

Too long ago.

> --- A fix
>
> Just allow the 'ready state for threads in mutex-unlock!
>
> ...
> Is this a correct fix?

Too long ago.

But it feels wrong. We'd rather make sure there is no ready thread in the queue waiting for a mutex in the first place.

Diffing the changes I maintained quite a while back
http://ball.askemos.org/Ad60e3fb123a79b2e5128915116b288f7/chicken-4.9.1-ball.tar.gz
you will find that I added a

##sys#thread-clear-blocking-state!

towards the end of scheduler.scm and used it for consistency wherever I ran into not-so-clean unlocks.

Now this is still an invasive change. But looking at the source of scheduler and srfi-18 in chicken 5 right now, I can't fight the feeling that it is working around the missing changes in several places.

Best

/Jörg
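The approach preferred in this message is to guarantee that a thread leaving the blocked state is also removed from the queue it was waiting in, so that an unlock never meets a ready thread among its waiters. The Python sketch below models that invariant. It is not a port of `##sys#thread-clear-blocking-state!`, whose body is not shown in this thread; all names are hypothetical and the model ignores timeouts, condition variables and I/O.

```python
from collections import deque


class Mutex:
    def __init__(self):
        self.owner = None
        self.waiters = deque()


class Thread:
    def __init__(self, name):
        self.name = name
        self.state = "ready"
        self.blocked_on = None


def block_on(thread, mutex):
    """Record that thread waits for mutex; a thread waits in one queue at most."""
    assert thread.blocked_on is None, "thread is already in a waiting queue"
    thread.state = "blocked"
    thread.blocked_on = mutex
    mutex.waiters.append(thread)


def clear_blocking_state(thread):
    """Single place that undoes block_on, whatever the reason for waking."""
    if thread.blocked_on is not None:
        thread.blocked_on.waiters.remove(thread)
        thread.blocked_on = None
    thread.state = "ready"


def unlock(mutex):
    """Pass ownership to the first waiter, which is known to be blocked here."""
    mutex.owner = None
    if mutex.waiters:
        waiter = mutex.waiters[0]
        clear_blocking_state(waiter)
        mutex.owner = waiter
    return mutex.owner


m = Mutex()
owner, w1, w2 = Thread("owner"), Thread("w1"), Thread("w2")
m.owner = owner
block_on(w1, m)
block_on(w2, m)
clear_blocking_state(w1)     # e.g. w1's timeout fired: it leaves the queue too
new_owner = unlock(m)        # w2 is the only waiter left and gets the mutex
```

Because every wake-up path goes through `clear_blocking_state`, the unlock path needs no special case for ready threads, which is the contrast with simply allowing the 'ready state in mutex-unlock!.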