On 6/13/21 12:15 PM, Elena via Python-list wrote: > Hi, I have, say 10 variables (x1 ... x10) which can assume discrete finite > values, for instance [0,1 or 2]. > I need to build a set of rules, such as: > > 1) if x1==0 and x2==1 and x10==2 then y = 1 > 2) if x2==1 and x3==1 and x4==2 and x6==0 then y = 0 > 3) if x2==0 and x3==1 then y = 2 > 4) if x6==0 and x7==2 then y = 0 > ... > ... > (actually it can be seen as a decision tree classifier). > > y can assume the same discrete value [0,1 or 2] > I don't know a-priori anything about the number of rules and the > combinations of the tested inputs. > > Given a dataset of X={(x1... x10)} I can calculate Y=f(X) where f is this > rule-based function. > > I know an operator g that can calculate a real value from Y: e = g(Y) > g is too complex to be written analytically. > > I would like to find a set of rules f able to minimize e on X. > > I know the problem can become NP-hard, but I would be fine also with a > suboptimal solution. > > What's the best way to approach the problem? > In case, does something already exist in python? > > > thank you
My first feeling is that this doesn't have a lot of structure to do a lot of optimizations, and just brute forcing might be the answer. Of course, one option is just sort the rules by Y, then iterate through the rules and see if any data matches, for that putting the dataset into something like an SQLite database with indexes on the various X columns might work (with 10 columns, the 45 indexes on each column pair might make things reasonably efficient.) The biggest gain would happen if you could look at the rule set and find patterns in it. The big question is making sure that the rule set covers every value in the data array, and never gives one input value two different y values. -- Richard Damon -- https://mail.python.org/mailman/listinfo/python-list