Hi, we just discussed the very same idea. It seems that during cloning, an actor is present in the variable subscription list, but not in the Space::a_actors list, so we thought that maybe an actor disappeared without unsubscribing. We're using ES_SUBSUMED(this, home) everywhere in floatvar propagators to indicate entailment, I'll check whether that is done everywhere.
Thanks, Filip > Hi Mikael and I just had one more thing you could check: the behaviour could > also be due to a propagator that says subsumption but does not delete all of > its dependencies via cancel. > > Christian > > -- > Christian Schulte, www.ict.kth.se/~cschulte/ > > > -----Original Message----- > From: Christian Schulte [mailto:[email protected]] > Sent: Friday, February 13, 2009 11:12 AM > To: 'Filip Konvička'; '[email protected]' > Cc: 'Luboš Moric' > Subject: RE: [gecode-users] Cloning problems > > Hi, > > that sounds ugly ;-) Before I start with rambling about cloning, a tip: > Gecode supports unary branchings (with a single alternatiuve only). So you > could have a unary branching and don't have to have a mock-up empty branch. > The search engines even optimize that case. If you are looking for an > example, check the assign function in gecode/int.hh for assigning variables. > > Okay, cloning it is. So, there are two stages: copying and updating. During > copying you first create a new space and copy all actors. While copying all > actors you copy each variable and its domain used by an actor and collect a > list of all these variables (actually a list for each variable type). The > only thing that you do not do is copying the dependency array of a variable > (the array of pointers that point to propagators that depend on a variable). > Then in updating, you update the dependency arrays (and also deal with > advisors). > > Updating a dependency array just copies for each old entry the new entry > (stored via a forwarding pointer in each actor) and restore some information > for variables that has been stored in the copy of the dependency array. > After that you cleanup the space, the actors, and you are done. > > In VarImp<VIC>::enter, the --idx[0] makes room for an additional entry. Then > you keep moving actors to a free entry and by this create a new free entry > for the propagation condition i. When the loop is done, you know that there > is a free entry for propagation condition pc and that's where the propagator > is entered. > > Your guess about x->idx[0] in VarImp<VIC>::update is correct: this actually > stores the forwarding pointer for a variable implementation. Also your > suspicion about ActorLink::prev is correct: there the forwarding pointer for > actors is stored. > > Let me come up with some idea where the trouble could be: > - you forget to update a variable or view at some place. Either in your > branching, propagator, or model. > Then you can see an invalid pointer in the dependency array. > You might want to change the default constructor of VarBase in > gecode/kernel/var.icc and the default constructor > of VarViewBase in gecode/kernel/view.icc so that they set varimp to NULL > (actually, I just did that yesterday in the trunk as well). Then you will be > sure that when you forget to update, you'll see a NULL pointer. > - you commit a heinous crime in that you change the dependencies of a > variable with cancel/subscribe during > cloning. That is illegal and will for sure break everything. Note that > creating a propagator during > cloning will commit that crime, too. > > It is very very unlikely (but not impossible, of course) that the cloning in > Gecode has a snatch but, as said, very unlikely given that its used by > anything else. > > Hope that helps. Please do not hesitate to ask some more > Christian > > -- > Christian Schulte, www.ict.kth.se/~cschulte/ > > > -----Original Message----- > From: [email protected] [mailto:[email protected]] On Behalf > Of Filip Konvicka > Sent: Thursday, February 12, 2009 2:21 PM > To: [email protected] > Cc: Luboš Moric > Subject: [gecode-users] Cloning problems > > Hi, > > [Sorry, this is a looong message...] > > we're hunting a serious bug that occurs during space cloning in 2.2.0. > The bug occurs very rarely, but we have a testcase that triggers this > behavior. > > We have many constraints in the problem instance and the solver should > post as many propagators as possible. We have a custom branching for > this, which posts one propagator at a time in commit(), while the > alternative is not to post the propagator (i.e. a no-op). Because we're > only looking for the first solution, in the case of a failure we no > longer need the path back to the root in the recomputation tree, so we > decided to use our own simple search engine for this. The standard DFS > search engine exhibits exactly the same behavior (both with > recomputation on and off), and we don't see any problems with our search > engine. > > Everything seems to work for the vast majority of the test cases, but > there are a few instances that cause problems (probably) during cloning > (can be probably also be caused by some earlier bad subscibe or > unsubscribe). From our point of view, there is nothing wrong or special > about the instances. The crashes occur at the same location both on > Linux and Windows, in both release and debug builds. Changing memory > management (e.g. never deleting Spaces in the search engine) can cause > the crash to occur at slightly different places (e.g. some propagation > during status() after clone() finishes). > > One particular case we're looking at now crashes at core.icc:2270, where > f[0] is a bad pointer (0xfeeefeee at Windows). We're not sure how this > can happen - we know that in this case n==2 at core.icc:2255, so idx[0] > is bad pointer at core.icc:2252. This is also what Valgrind says on > Linux (bad read of size 4). > > When we were trying to debug the other cases, we found out that the > subscription list in a variable in the cloned space contained an actor > link that was probably copied incorrectly as it seemed as a pure > ActorLink like Space::a_actors, having a totally different address than > the rest of the actors (probably belonging to the original space > object). When we tried to find out when this actor link entered the > list, we ended up in VarImp<VIC>::update again. > > We're (of course:-)) using FloatVars in the model, and we eliminated all > other kinds of variables and propagators. In our case, pc_max==1 and > free_bits==0. > > We find it difficult to understand what is happening during cloning. We > would appreciate if someone explaned the basic idea. We only have > floatvars, propagators and one branching (no advisors or other types of > actors/branchings/advisors). > > We know how VarImp<VIC>::resize works, that's easy. In > VarImp<VIC>::enter, we can't see why you do "--idx[0];" as the first > iteration of the for cycle overwrites it (as long as pc>0, of course). > May be just optimization of course. As for VarImp<VIC>::update, we only > guess...we suspect that a) the original x->idx[0] is destroyed somewhere > so it needs to get restored from a memcpy backup at idx[0], b) > ActorLink::_prev is probably used to map old actors to new ones (thus > the "->prev()". We did not dig deep enough to be sure though, so we'd > welcome some guidance here. > > Cheers, > Filip > > > _______________________________________________ > Gecode users mailing list > [email protected] > https://www.gecode.org/mailman/listinfo/gecode-users > > -- Filip Konvička LOGIS, s.r.o. tel. +420 556 841 100 mob. +420 736 758 714 _______________________________________________ Gecode users mailing list [email protected] https://www.gecode.org/mailman/listinfo/gecode-users
