starsavari wrote:
> We are considering the Drools rule engine for data profiling (running the data
> through various regex patterns for data analysis) on a large amount of
> data. We have rules to detect SSNs, phone numbers, driver license
> numbers, tracking numbers, etc. Most of our rules are regex pattern matches.
> I've developed a sample program with 5 rules, all of them pattern-matching
> rules, and ran the rules against a million facts/objects (each object has
> 2 string fields). It takes around 58 seconds, versus 2 seconds when I run
> the same thing in plain Java code.
I ran into a similar issue but ran out of time before I could put together a clean fix. Attached is a partial fix; TEST it at your own risk. It caches the compiled regular expression instead of recompiling it on every match. The cache has no size limit, so some might call it a memory leak.

Attachment (MatchesEvaluatorsDefinition.java): http://www.nabble.com/file/p23842739/MatchesEvaluatorsDefinition.java

Also check out this thread: http://www.nabble.com/Help-with-MatchesEvaluatorsDefinition-that-caches-compiled-regular-expression-patterns-td23377792.html
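
For anyone who wants the caching idea without the unbounded growth, here is a minimal sketch of a size-limited cache of compiled patterns. It is not the code from the attachment and not part of Drools: the class name PatternCache, its methods, and the LRU eviction policy are all assumptions made for illustration. It uses full-match semantics like String.matches; whether that is exactly what the Drools matches evaluator does should be checked against your version.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.regex.Pattern;

/** Hypothetical helper (not a Drools class): size-bounded LRU cache of compiled patterns. */
final class PatternCache {
    private final Map<String, Pattern> cache;

    PatternCache(final int maxEntries) {
        // accessOrder = true keeps entries ordered from least to most recently used,
        // so removeEldestEntry evicts the least recently used pattern.
        this.cache = new LinkedHashMap<String, Pattern>(16, 0.75f, true) {
            @Override
            protected boolean removeEldestEntry(Map.Entry<String, Pattern> eldest) {
                return size() > maxEntries;
            }
        };
    }

    /** Returns the cached Pattern for regex, compiling it only on a cache miss. */
    synchronized Pattern get(String regex) {
        Pattern p = cache.get(regex);
        if (p == null) {
            p = Pattern.compile(regex);
            cache.put(regex, p);
        }
        return p;
    }

    /** True if the whole input matches regex (same semantics as String.matches). */
    boolean matches(String input, String regex) {
        return get(regex).matcher(input).matches();
    }

    synchronized int size() {
        return cache.size();
    }
}
```

With only a handful of distinct regexes in the rules, a small limit (a few hundred entries, say) should be plenty; the limit only matters if patterns are built dynamically from fact data. The synchronized methods are the simplest way to make the sketch safe to share between threads, at the cost of some contention.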