Re: [rules-users] Parallelization
Well, I just wanted to report some results... I added the -server VM switch, just to be sure, but anyway, I was having endless trouble with my ThreadPoolExecutor, and switched to:

    ExecutorService threadPool = Executors.newFixedThreadPool(5);

And my claims now process at an unfathomable 5ms each! A week ago, it was 330ms per claim, so I've managed to get a 66-fold speed increase. Thanks for all your help. I reckon the boss will be happy with this.

-- 
View this message in context: http://drools-java-rules-engine.46999.n3.nabble.com/Parallelization-tp809341p812388.html
Sent from the Drools - User mailing list archive at Nabble.com.
___
rules-users mailing list
rules-users@lists.jboss.org
https://lists.jboss.org/mailman/listinfo/rules-users
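A minimal sketch of the shape Daniel ended up with, assuming Java 8 or later: a fixed pool of worker threads, one task per claim, and a blocking wait until every task is done before anything shared is released. `FixedPoolClaims` and `processClaim` are hypothetical names; `processClaim` stands in for the real per-claim work (create a session, insert facts, fire rules, dispose).

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.concurrent.TimeUnit;

class FixedPoolClaims {

    // Hypothetical stand-in for the real per-claim rule evaluation.
    static String processClaim(int claimId) {
        return "claim-" + claimId + " done";
    }

    static List<String> processAll(List<Integer> claimIds, int threads)
            throws InterruptedException, ExecutionException {
        ExecutorService threadPool = Executors.newFixedThreadPool(threads);
        try {
            List<Callable<String>> tasks = new ArrayList<>();
            for (int id : claimIds) {
                tasks.add(() -> processClaim(id)); // one task per claim
            }
            // invokeAll blocks until every task has finished; futures keep task order.
            List<String> results = new ArrayList<>();
            for (Future<String> f : threadPool.invokeAll(tasks)) {
                results.add(f.get()); // a task's exception resurfaces as ExecutionException
            }
            // All work is finished here, so the caller may now release shared
            // resources such as a database connection.
            return results;
        } finally {
            threadPool.shutdown();
            threadPool.awaitTermination(1, TimeUnit.MINUTES);
        }
    }

    public static void main(String[] args) throws Exception {
        // prints [claim-1 done, claim-2 done, claim-3 done]
        System.out.println(processAll(Arrays.asList(1, 2, 3), 5));
    }
}
```

Using `invokeAll` rather than `execute` gives the "wait for everything" behaviour for free, which is the same problem the original post tried to solve with a counting semaphore.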
[rules-users] Parallelization
Hi Drools squad,

This is a follow-up to my previous speed-related post. My boss is still pushing to get 35ms down a bit, and I'm looking at parallelization options. I've looked through the forums, but without success. The options I see are:

1. KnowledgeBase partitioning (setting the KnowledgeBaseConfiguration to use multiple threads). I tried this and got the error pasted at the bottom. My suspicion is that it starts a thread, and meanwhile the Java thread continues and disposes of the session before evaluation is complete.

2. Creating multiple Java threads, each of which starts its own KnowledgeSession. I started this, but need to confirm that it is possible. What's happening currently is that the Java thread continues and closes my database connection prematurely, so I am working on adding some sort of counting semaphore to wait for all the threads to complete before continuing the Java thread.

Should I pursue either of these ideas? I will probably work on the second today. The other idea I had was to try Sequential Mode, but I don't think my data is applicable to a StatelessKnowledgeSession.

Thanks,
Daniel

*** Partition task manager caught an unexpected exception: null
Drools is capturing the exception to avoid thread death. Please report stack trace to development team.
java.util.concurrent.RejectedExecutionException
    at java.util.concurrent.ThreadPoolExecutor$AbortPolicy.rejectedExecution(ThreadPoolExecutor.java:1760)
    at java.util.concurrent.ThreadPoolExecutor.reject(ThreadPoolExecutor.java:767)
    at java.util.concurrent.ThreadPoolExecutor.ensureQueuedTaskHandled(ThreadPoolExecutor.java:758)
    at java.util.concurrent.ThreadPoolExecutor.execute(ThreadPoolExecutor.java:655)
    at org.drools.reteoo.PartitionTaskManager.enqueue(PartitionTaskManager.java:75)
    at org.drools.reteoo.AsyncCompositeObjectSinkAdapter.doPropagateAssertObject(AsyncCompositeObjectSinkAdapter.java:49)
    at org.drools.reteoo.CompositeObjectSinkAdapter.propagateAssertObject(CompositeObjectSinkAdapter.java:344)
    at org.drools.reteoo.AlphaNode.assertObject(AlphaNode.java:147)
    at org.drools.reteoo.PartitionTaskManager$FactAssertAction.execute(PartitionTaskManager.java:188)
    at org.drools.reteoo.PartitionTaskManager$PartitionTask.run(PartitionTaskManager.java:112)
    at java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:886)
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:908)
    at java.lang.Thread.run(Thread.java:619)

-- 
View this message in context: http://drools-java-rules-engine.46999.n3.nabble.com/Parallelization-tp809341p809341.html
Sent from the Drools - User mailing list archive at Nabble.com.
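A sketch of the "counting semaphore" idea from option 2, assuming Java 8 or later: `java.util.concurrent.CountDownLatch` lets the main thread block until every worker has counted down, and only then release the shared resource. The second method illustrates, in plain Java rather than Drools internals, how a `RejectedExecutionException` like the one in the trace arises: submitting work to a `ThreadPoolExecutor` that has already been shut down is rejected by its default `AbortPolicy` (a full bounded queue is the other common cause). `LatchBeforeClose` and the per-worker work are hypothetical.

```java
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.RejectedExecutionException;
import java.util.concurrent.atomic.AtomicInteger;

class LatchBeforeClose {

    // Returns how many workers had finished when the main thread stopped waiting.
    static int runWorkers(int workers) throws InterruptedException {
        ExecutorService pool = Executors.newFixedThreadPool(workers);
        CountDownLatch done = new CountDownLatch(workers);
        AtomicInteger finished = new AtomicInteger();
        for (int i = 0; i < workers; i++) {
            pool.execute(() -> {
                try {
                    finished.incrementAndGet(); // hypothetical per-session rule work
                } finally {
                    done.countDown(); // count down even if the work throws
                }
            });
        }
        done.await(); // block until every worker has counted down
        int count = finished.get();
        // A shared database connection could safely be closed at this point.
        pool.shutdown();
        return count;
    }

    // Work submitted after shutdown() is rejected via the default AbortPolicy.
    static boolean rejectedAfterShutdown() {
        ExecutorService pool = Executors.newSingleThreadExecutor();
        pool.shutdown();
        try {
            pool.execute(() -> { });
            return false;
        } catch (RejectedExecutionException e) {
            return true;
        }
    }

    public static void main(String[] args) throws InterruptedException {
        System.out.println(runWorkers(4));           // 4
        System.out.println(rejectedAfterShutdown()); // true
    }
}
```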
Re: [rules-users] Parallelization
Does your system support parallel execution of Java threads on multiple processors? Otherwise I don't see how parallelization will gain much, since the Rete evaluation itself is clearly compute-bound. How are you timing these 35ms? Does this include input time for your facts? Frequently, much of a program's elapsed time can be saved by adopting a better I/O strategy. Also, IIRC, there was the Utilities method comparing times. Have you looked into possible speed gains there, i.e., what's the type of the date attribute and how does that method work?

-W
Re: [rules-users] Parallelization
Hi Wolfgang,

OK, well, I implemented my option #2, which has cut it down to 23ms, which is a good start. My timing is done by taking the time before and after, and dividing by the number of claims processed (and averaging over a few runs). I use one thread per StatefulKnowledgeSession. My machine has 2 cores, but it will eventually be running on an 8-core beast, so I reckon this was a good improvement. I was just worried that I wouldn't be able to process multiple KnowledgeSessions simultaneously, but apparently Drools doesn't mind. I'm pretty sure any machine with multiple cores supports parallel Java threads, no?

Regarding my Utilities method, e.g. isWithinTimePeriod(20100308, 20090405, 1, Y): I can get about 5ms off by commenting out the eval, so it's not going to be a big jump even if I fix it. I am using yyyyMMdd Strings, which, in the method, I substringed, converted to ints, turned into DateMidnight objects, and compared using the Joda-Time daysBetween/monthsBetween/yearsBetween methods. My thought was that pre-converting to ints would help, so that each ClaimLine has year/month/day int variables that are passed in instead (i.e., saving 3 String.substring() calls and 3 Integer.parseInt() calls), but that actually slowed it down a few milliseconds. (Maybe because of passing 6 params instead of 2?!)

I'm comparing two dates by an arbitrary period, like 2 days or 1 month, and need the framework of the Gregorian calendar, so I don't think I can do anything about this. 2 months is never guaranteed to be a set number of milliseconds; it all depends on the claim date, which is fact data and therefore variable.

Regards,
Daniel

-- 
View this message in context: http://drools-java-rules-engine.46999.n3.nabble.com/Parallelization-tp809341p809753.html
Sent from the Drools - User mailing list archive at Nabble.com.
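A rough sketch of a calendar-aware period check like the one Daniel describes. Assumptions, since the real method isn't shown: dates are yyyyMMdd strings, the unit is a `char` of `'D'`, `'M'` or `'Y'`, and the method answers "is the later date at most `amount` units after the earlier one?". It uses `java.time` (Java 8+) in place of Joda-Time, so it illustrates the idea rather than reproducing his code.

```java
import java.time.LocalDate;
import java.time.format.DateTimeFormatter;

class PeriodCheck {

    // BASIC_ISO_DATE is the yyyyMMdd format, e.g. "20100308".
    static LocalDate parse(String yyyyMMdd) {
        return LocalDate.parse(yyyyMMdd, DateTimeFormatter.BASIC_ISO_DATE);
    }

    // Calendar arithmetic: a month or a year is not a fixed number of days.
    static LocalDate plus(LocalDate d, int amount, char unit) {
        switch (unit) {
            case 'D': return d.plusDays(amount);
            case 'M': return d.plusMonths(amount); // clamps: Jan 31 + 1 month = Feb 28 (2010)
            case 'Y': return d.plusYears(amount);
            default: throw new IllegalArgumentException("unknown unit: " + unit);
        }
    }

    // True if the later of the two dates is no more than amount units after the earlier.
    static boolean isWithinTimePeriod(String date1, String date2, int amount, char unit) {
        LocalDate a = parse(date1);
        LocalDate b = parse(date2);
        LocalDate earlier = a.isBefore(b) ? a : b;
        LocalDate later = a.isBefore(b) ? b : a;
        return !later.isAfter(plus(earlier, amount, unit));
    }

    public static void main(String[] args) {
        System.out.println(isWithinTimePeriod("20100308", "20090405", 1, 'Y')); // true
        System.out.println(isWithinTimePeriod("20100131", "20100301", 1, 'M')); // false
    }
}
```

Every call parses both strings and does calendar arithmetic, which is the per-evaluation cost the thread is trying to trim.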
Re: [rules-users] Parallelization
On Tue, May 11, 2010 at 2:55 PM, djb dbrownel...@hotmail.com wrote:
> I'm pretty sure any machine with multiple cores supports parallel java threads, no?

This is a question for a Java guru (which I'm not), and the answer may depend on the JVM and whatnot. Care to provide details? There are probably people on this list who know. (Otherwise, I have a good contact.)

> My thought was that pre-converting to ints would help, so that each ClaimLine has year/month/day int variables, and pass them in instead.

You can treat 20100510 as an integer defining one date; all relations between such numbers hold just as they do for the true dates. You just can't compute durations (days between dates) by simple subtraction.

> I'm comparing two dates by an arbitrary period, like 2 days or 1 month, and need the framework of the Gregorian Calendar.
I'm not sure how the other arguments in the call (Y, 99) relate to rules or facts, but a repeated test of whether one date dx lies between d1 and d2, where d2 depends on d1 and a duration, would certainly gain from computing d2 once.

-W
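A small sketch of both of Wolfgang's suggestions, assuming Java 8+ and a window of "d1 up to d1 plus N months" (the real period logic may differ). yyyyMMdd ints compare in calendar order, so once the window end d2 has been computed a single time, each test of a date dx is plain int comparison. `DateWindow` is a hypothetical name; in a rules setting the precomputed bounds would presumably live on a fact rather than being recomputed in every `eval`.

```java
import java.time.LocalDate;

class DateWindow {
    final int start; // d1 as yyyyMMdd
    final int end;   // d2 = d1 plus the given months, as yyyyMMdd, computed once

    DateWindow(LocalDate d1, int months) {
        start = toInt(d1);
        end = toInt(d1.plusMonths(months)); // the only calendar arithmetic
    }

    static int toInt(LocalDate d) {
        return d.getYear() * 10000 + d.getMonthValue() * 100 + d.getDayOfMonth();
    }

    // Ordering of yyyyMMdd ints matches calendar ordering, so plain comparison works.
    boolean contains(int dx) {
        return dx >= start && dx <= end;
    }

    public static void main(String[] args) {
        DateWindow w = new DateWindow(LocalDate.of(2010, 1, 31), 1); // 20100131..20100228
        System.out.println(w.contains(20100215)); // true
        System.out.println(w.contains(20100301)); // false
        // Subtraction is not a duration: Feb 28 and Mar 1 are one day apart.
        System.out.println(20100301 - 20100228);  // 73
    }
}
```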
Re: [rules-users] Parallelization
KnowledgeBase partitions are for simple CEP-type applications; you won't see a benefit elsewhere, and possibly a slowdown if you are doing more business-type rules.

Mark
Re: [rules-users] Parallelization
On 11/05/2010 13:55, djb wrote:
> I use one thread per StatefulKnowledgeSession... I was just worried that I wouldn't be able to simultaneously process multiple K-Sessions, but apparently, Drools doesn't mind.

Multiple ksessions running in different threads, all sharing the same kbase, is perfectly acceptable; it was designed that way.

Mark
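A plain-Java analogue of "many sessions, one kbase", assuming Java 8+. It does not use the Drools API: `RuleBase` and `Session` are hypothetical stand-ins. The shared part is built once and only read afterwards, so every thread may use it. Each task creates its own mutable session, so no session is ever touched by two threads.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.function.Predicate;

class SharedRulesDemo {

    // Stand-in for the compiled knowledge base: immutable, safe to share.
    static final class RuleBase {
        final List<Predicate<Integer>> rules;
        RuleBase(List<Predicate<Integer>> rules) {
            this.rules = Collections.unmodifiableList(new ArrayList<>(rules));
        }
        Session newSession() { return new Session(this); }
    }

    // Stand-in for a stateful session: mutable, confined to a single task.
    static final class Session {
        private final RuleBase base;
        private final List<Integer> facts = new ArrayList<>();
        Session(RuleBase base) { this.base = base; }
        void insert(int fact) { facts.add(fact); }
        int fireAllRules() { // counts (rule, fact) matches
            int fired = 0;
            for (Predicate<Integer> rule : base.rules)
                for (int f : facts)
                    if (rule.test(f)) fired++;
            return fired;
        }
    }

    static List<Integer> run(RuleBase base, List<List<Integer>> claims, int threads)
            throws InterruptedException, ExecutionException {
        ExecutorService pool = Executors.newFixedThreadPool(threads);
        try {
            List<Callable<Integer>> tasks = new ArrayList<>();
            for (List<Integer> claim : claims) {
                tasks.add(() -> {
                    Session s = base.newSession(); // one session per task, shared base
                    for (int fact : claim) s.insert(fact);
                    return s.fireAllRules();
                });
            }
            List<Integer> fired = new ArrayList<>();
            for (Future<Integer> f : pool.invokeAll(tasks)) fired.add(f.get());
            return fired;
        } finally {
            pool.shutdown();
        }
    }
}
```

The design point is the split itself: anything shared across threads is read-only after construction, and all mutable state lives in objects owned by one task.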
Re: [rules-users] Parallelization
Hi Daniel,

I was reading the other day that a JVM implementation does not necessarily have to run Java threads on different processors (taking advantage of multiple cores). If you saw a significant speedup, then I would assume your JVM does this. It is worth investigating for your production deployment. I would think that recent JVMs on modern operating systems support this, but I also wouldn't leave it up to chance. This post seems to imply that the only JVM/OS combinations that don't support native threads are Java 1.2 or Solaris: http://forums.sun.com/thread.jspa?threadID=5330507

About StatefulKnowledgeSessions: you should be able to run these in parallel, no problem.

-Steve
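Instead of leaving it to chance, the JVM can be asked directly how many processors it has available. A minimal sketch (`CoreCheck` is a hypothetical name): `Runtime.availableProcessors()` reports the processors available to this JVM. It does not by itself prove that threads run in parallel, but a value of 1 on a multi-core production box would be an early warning.

```java
class CoreCheck {

    // Number of processors the JVM reports as available to it.
    static int processors() {
        return Runtime.getRuntime().availableProcessors();
    }

    public static void main(String[] args) {
        System.out.println("available processors: " + processors()
                + ", VM: " + System.getProperty("java.vm.name")
                + ", Java version: " + System.getProperty("java.version"));
    }
}
```

Running it on the 8-core machine at deployment time and comparing the number against the hardware spec is a cheap check.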
Re: [rules-users] Parallelization
I'm not a guru, but I'm pretty certain all modern JVMs support multiple cores well. You will probably want to make sure you are using the server VM (-server) and not the client VM (assuming you are using the standard VM). Depending on what your current machine is, and whether you have 2 CPUs or just two cores (and whether Java can tell the difference), you may or may not currently be running the server VM (see http://java.sun.com/javase/6/docs/technotes/guides/vm/server-class.html). If you aren't, you may be very lucky and find that switching to the server-class VM gives you those few extra ms to keep your boss happy.

Presumably you have also done the other standard optimization tricks, such as ensuring you have sufficient heap memory?

Thomas
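A quick way to see which VM is actually running and how much heap it may use, assuming a HotSpot-based JVM: on Sun/Oracle and OpenJDK HotSpot builds the `java.vm.name` property contains "Server VM" or "Client VM" (for example "Java HotSpot(TM) 64-Bit Server VM"). Other vendors may name their VMs differently, so `looksLikeServerVm` is a heuristic, not an official API.

```java
class VmCheck {

    // Heuristic for HotSpot VM names; other vendors may use different strings.
    static boolean looksLikeServerVm(String vmName) {
        return vmName != null && vmName.contains("Server VM");
    }

    public static void main(String[] args) {
        String vmName = System.getProperty("java.vm.name");
        long maxHeapMb = Runtime.getRuntime().maxMemory() / (1024 * 1024); // -Xmx, roughly
        System.out.println(vmName + " (server VM: " + looksLikeServerVm(vmName) + ")");
        System.out.println("max heap: " + maxHeapMb + " MB");
    }
}
```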
Re: [rules-users] Parallelization
If you're using the Sun VM, multiple threads will use multiple cores just fine. What version of Java are you on? Are you intending to max out the CPU on the machine? If so, I suggest this configuration for an 8-core box:

7 threads, each running a StatefulKnowledgeSession.

Concurrent garbage collection, with VM startup parameters like these:

    -XX:+UseParNewGC
    -XX:+UseConcMarkSweepGC
    -XX:MaxGCPauseMillis=150
    -XX:+CMSIncrementalMode
    -XX:+CMSIncrementalPacing
    -XX:+UseCMSCompactAtFullCollection
    -XX:CMSFullGCsBeforeCompaction=2

Here's a good reference for GC parameters: http://java.sun.com/javase/technologies/hotspot/gc/gc_tuning_6.html

And if you're feeling frisky, try these to see if they improve performance:

    -XX:+UseTLAB -XX:+UseSpinning -XX:+UseFastAccessorMethods

The default recommendation for maxing out the CPU is to use num_cores + 1 threads, but in this case, since garbage collection may be an issue, I'd reduce that by two: you need one core dedicated to other stuff (I/O and whatnot) and some resources for GC.
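The sizing rule above as a sketch, assuming Java 8+: start from num_cores + 1, subtract two for I/O and GC headroom, and never go below one thread. For an 8-core box this gives the 7 session threads suggested. `PoolSizing` and `sessionThreads` are hypothetical names, and the rule is a starting point to measure against, not a guarantee.

```java
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;

class PoolSizing {

    // num_cores + 1, minus two: one core's worth for I/O and other work,
    // plus headroom for the concurrent garbage collector. At least 1.
    static int sessionThreads(int cores) {
        return Math.max(1, cores + 1 - 2);
    }

    public static void main(String[] args) {
        int cores = Runtime.getRuntime().availableProcessors();
        int threads = sessionThreads(cores);
        ExecutorService pool = Executors.newFixedThreadPool(threads); // one session per thread
        System.out.println(cores + " cores -> " + threads + " session threads");
        pool.shutdown();
    }
}
```

On Daniel's current 2-core development machine this rule yields a single session thread, so timings there will not reflect the 8-core production setup.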