[jira] Commented: (SM-625) Failed unit test (servicemix-core) : org.apache.servicemix.jbi.nmr.flow.MultipleFlowsTest
[ https://issues.apache.org/activemq/browse/SM-625?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#action_40378 ]

Oleg Zhurakousky commented on SM-625:
-

You're right; there is no way of knowing whether an Endpoint exists anywhere in the cluster. But the question I am raising is: would it be a good idea to give the requester the benefit of the doubt? In other words, if you ask for something and I don't see it right away, I'll take some time to look for it before I give you a definitive NO. In this situation we are at the mercy of the network, its speed and its configuration (something we can't control from SM), and I think the least we could do is give the networking components time to do what they are supposed to do, and then either come up with an endpoint or return a message stating a potential reason why the endpoint was not found.

Here is some sample code representing what I am talking about.

    int retries = 0;
    while (endpoints.length < 1 && retries < 5) {
        endpoints = resolveAvailableEndpoints(context, exchange);
        Thread.sleep(500);
        retries++;
    }

Most of the time, if the Endpoint is there, the loop will exit right away. It will only execute the retry logic if the array is empty.

P.S. I am purposely trying to be a devil's advocate here. I am not sure I completely agree with this solution; I am mainly proposing it to facilitate the discussion, since there is one thing I definitely agree on: it is part of a bigger issue. Meanwhile I'll keep digging.

Also, as far as fixing the test goes, I can definitely wrap the sendMessages(..) call in the retry logic instead of relying on Thread.sleep(..), but would it really solve the bigger problem?

Failed unit test (servicemix-core) : org.apache.servicemix.jbi.nmr.flow.MultipleFlowsTest
-

Key: SM-625
URL: https://issues.apache.org/activemq/browse/SM-625
Project: ServiceMix
Issue Type: Sub-task
Components: servicemix-core
Affects Versions: 3.0
Reporter: Fritz Oconer
Fix For: 3.2

--
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.
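The retry idea in the SM-625 comment above can be written as a small self-contained Java sketch. This is only an illustration under stated assumptions, not ServiceMix code: the class name `EndpointRetrySketch`, the method `resolveWithRetry` and the `Supplier<String[]>` standing in for `resolveAvailableEndpoints(context, exchange)` are all hypothetical, and the lambda syntax needs Java 8 or later. It looks up once, then retries only while the result is empty and the retry budget is not used up.

```java
import java.util.function.Supplier;

/** Hypothetical sketch of a bounded retry; not part of ServiceMix. */
final class EndpointRetrySketch {

    /**
     * Calls the resolver once, then up to maxRetries more times while the
     * result is empty, sleeping sleepMillis before each extra attempt.
     */
    static String[] resolveWithRetry(Supplier<String[]> resolver,
                                     int maxRetries,
                                     long sleepMillis) {
        String[] endpoints = resolver.get();
        int retries = 0;
        // Keep going only while nothing was found AND retries remain.
        while (endpoints.length < 1 && retries < maxRetries) {
            try {
                Thread.sleep(sleepMillis);
            } catch (InterruptedException e) {
                // Preserve the interrupt flag and stop retrying.
                Thread.currentThread().interrupt();
                break;
            }
            endpoints = resolver.get();
            retries++;
        }
        return endpoints;
    }

    public static void main(String[] args) {
        int[] lookups = {0};
        // Simulated resolver: the endpoint becomes visible on the third lookup.
        Supplier<String[]> resolver = () ->
                ++lookups[0] < 3 ? new String[0] : new String[] {"receiver"};
        String[] found = resolveWithRetry(resolver, 5, 10);
        System.out.println(found.length + " endpoint(s) after " + lookups[0] + " lookup(s)");
    }
}
```

When the endpoint is already visible, the first lookup succeeds and the method returns without sleeping at all; an empty result after the last retry is the caller's cue to report why nothing was found.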
[jira] Issue Comment Edited: (SM-625) Failed unit test (servicemix-core) : org.apache.servicemix.jbi.nmr.flow.MultipleFlowsTest
[ https://issues.apache.org/activemq/browse/SM-625?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#action_40378 ]

ozhurakousky edited comment on SM-625 at 10/15/07 5:38 AM:
---

You're right; there is no way of knowing whether an Endpoint exists anywhere in the cluster. But the question I am raising is: would it be a good idea to give the requester the benefit of the doubt? In other words, if you ask for something and I don't see it right away, I'll take some time to look for it before I give you a definitive NO. In this situation we are at the mercy of the network, its speed and its configuration (something we can't control from SM), and I think the least we could do is give the networking components time to do what they are supposed to do, and then either come up with an endpoint or return a message stating a potential reason why the endpoint was not found.

Here is some sample code representing what I am talking about.

{code}
int retries = 0;
while (endpoints.length < 1 && retries < 5) {
    endpoints = resolveAvailableEndpoints(context, exchange);
    Thread.sleep(500);
    retries++;
}
{/code}

Most of the time, if the Endpoint is there, the loop will exit right away. It will only execute the retry logic if the array is empty.

P.S. I am purposely trying to be a devil's advocate here. I am not sure I completely agree with this solution; I am mainly proposing it to facilitate the discussion, since there is one thing I definitely agree on: it is part of a bigger issue. Meanwhile I'll keep digging.

Also, as far as fixing the test goes, I can definitely wrap the sendMessages(..) call in the retry logic instead of relying on Thread.sleep(..), but would it really solve the bigger problem?

Failed unit test (servicemix-core) : org.apache.servicemix.jbi.nmr.flow.MultipleFlowsTest
-

Key: SM-625
URL: https://issues.apache.org/activemq/browse/SM-625
Project: ServiceMix
Issue Type: Sub-task
Components: servicemix-core
Affects Versions: 3.0
Reporter: Fritz Oconer
Fix For: 3.2
[jira] Commented: (SM-630) org.apache.servicemix.jbi.messaging.JcaFlowTransactionTest
[ https://issues.apache.org/activemq/browse/SM-630?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#action_40383 ]

Oleg Zhurakousky commented on SM-630:
-

Quick comment.

The scenarios within this test seem to follow what's described here: http://incubator.apache.org/servicemix/transactions.html

Each test is encapsulated in a try/catch block with a fail(. . .) call which never gets executed (and it shouldn't be). If there are any other problems, I can't seem to replicate them, other than a potential issue with clustered sends/receives, but those are all part of the bigger issue where there is not enough time (especially on good hardware) for the Demand Forwarding Bridge to set up.

org.apache.servicemix.jbi.messaging.JcaFlowTransactionTest
--

Key: SM-630
URL: https://issues.apache.org/activemq/browse/SM-630
Project: ServiceMix
Issue Type: Sub-task
Components: servicemix-core
Affects Versions: incubation
Reporter: Fritz Oconer
[jira] Resolved: (SM-628) org.apache.servicemix.jbi.nmr.flow.jms.JMSFlowTest
[ https://issues.apache.org/activemq/browse/SM-628?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Oleg Zhurakousky resolved SM-628.
-

Resolution: Fixed

I was able to replicate the inconsistencies of this and other similar tests and figure out why they were so unstable. Even though in the async messaging world we have to do something to give messages enough time to arrive, it is hard to rely on Thread.sleep(...). Some of us have better hardware than others, which makes it difficult to know the optimum time needed for messages to arrive.

Here is what I did to make sure that I am right.

First, I ran this test several times with different Thread.sleep(..) values and started noticing that 9 times out of 10 it would run just fine, but then it would fail (mostly in the cluster portion of the test) on this line:

    assertFalse(receiver.getMessageList().hasReceivedMessage());

That was interesting, because we flush all the messages before sending each message batch. I was even able to verify with a few debug statements that the receivers were empty. I started suspecting that messages were still arriving after the flush was executed. To verify it, I had to modify the message content to include some type of unique identifier. When I did (a simple static counter such as 1, 2, 3, 4, etc.), everything became clear. The first test in the cluster test sends a batch of 10 messages with identifier 1, then flushes, then sends a second batch with identifier 2, and so on. Very quickly I started seeing that occasionally, after sending the second batch with identifier 2, I would still get a message on the receiver with identifier 1.

So, that is pretty much the story; if you guys need more details, just let me know.

The good news is that there are several very similar tests (JCAFlowTest, MultipleJMSFlowTest, etc.) which have the same problem and obviously the same fix.

I am recommending the attached two patches. One is a new class (ClusterFlowTestHelper) that should be the base class for all tests similar to JMSFlowTest; the other is the JMSFlowTest patch, which removes all the Thread.sleep() calls, extends ClusterFlowTestHelper and calls its helper methods to verify that messages were received. Currently I am setting the timeout to 4000 ms, even though on my machine all 10 messages arrive in under 200 ms. I've also added another assert, so if the messages still do not arrive in the allotted time, the test will fail (but at least we'll know exactly why).

If you agree with the proposed solution, I'll take care of all the other tests in a similar way.

org.apache.servicemix.jbi.nmr.flow.jms.JMSFlowTest
--

Key: SM-628
URL: https://issues.apache.org/activemq/browse/SM-628
Project: ServiceMix
Issue Type: Sub-task
Components: servicemix-core
Affects Versions: 3.0
Reporter: Fritz Oconer
Fix For: 3.2
Attachments: SMTestCasesPatches.zip
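The SM-628 resolution above replaces fixed Thread.sleep() calls with a wait that has a timeout and a final assert. The patch itself is not shown in this thread, so the following is only a minimal sketch of that general technique and not the attached ClusterFlowTestHelper: the class `ClusterWaitSketch`, the method `waitForTotal` and the use of one `IntSupplier` per receiver (standing in for a per-receiver message count) are assumptions made for illustration.

```java
import java.util.Arrays;
import java.util.List;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicInteger;
import java.util.function.IntSupplier;

/** Hypothetical sketch of polling with a timeout; not the patch attached to SM-628. */
final class ClusterWaitSketch {

    /**
     * Polls until the counts of all receivers add up to expectedTotal,
     * or until timeoutMillis has elapsed. Returns true only on arrival.
     */
    static boolean waitForTotal(List<IntSupplier> receiverCounts,
                                int expectedTotal,
                                long timeoutMillis) {
        long deadline = System.nanoTime() + TimeUnit.MILLISECONDS.toNanos(timeoutMillis);
        while (true) {
            int total = 0;
            for (IntSupplier count : receiverCounts) {
                total += count.getAsInt();
            }
            if (total >= expectedTotal) {
                return true;            // fast machines return almost immediately
            }
            if (System.nanoTime() - deadline >= 0) {
                return false;           // slow or lost messages: let the test fail
            }
            try {
                Thread.sleep(10);       // short poll interval, not a fixed wait
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
                return false;
            }
        }
    }

    public static void main(String[] args) {
        AtomicInteger first = new AtomicInteger();
        AtomicInteger second = new AtomicInteger();
        // Simulated cluster: 9 messages spread over two receivers by another thread.
        Thread sender = new Thread(() -> {
            for (int i = 0; i < 9; i++) {
                (i % 2 == 0 ? first : second).incrementAndGet();
            }
        });
        sender.start();
        List<IntSupplier> counts = Arrays.asList(first::get, second::get);
        System.out.println("all arrived: " + waitForTotal(counts, 9, 4000));
    }
}
```

In a test, the boolean result would typically feed an assertTrue(..) with a message naming the timeout, so a failure says that messages were late rather than failing on some later, unrelated line.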
[jira] Commented: (SM-624) Failed unit test (servicemix-core) : org.apache.servicemix.jbi.nmr.flow.jms.MultipleJMSFlowTest
[ https://issues.apache.org/activemq/browse/SM-624?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#action_40368 ]

Oleg Zhurakousky commented on SM-624:
-

Can someone please give me a quick pointer on this one?

This test has only one test() method, which basically follows this flow:
1. Create 4 containers
2. Start 4 containers
3. Activate receivers in these containers
and so on.

It then stops and starts them again, producing log messages that look like this:

13:50:33,671 | INFO | main | MultipleJMSFlowTest | nmr.flow.jms.MultipleJMSFlowTest 103 | Nodes: 0, 0, 0, 0
13:50:33,703 | INFO | main | JBIContainer | cemix.jbi.container.JBIContainer 642 | ServiceMix JBI Container (container0) started
13:50:37,703 | INFO | main | MultipleJMSFlowTest | nmr.flow.jms.MultipleJMSFlowTest 103 | Nodes: 1, 0, 0, 0
13:50:37,703 | WARN | main | ClientFactory| emix.jbi.framework.ClientFactory 89 | Cound not start ClientFactory: javax.naming.NamingException: Something already bound at ClientFactory
13:50:37,718 | INFO | main | JBIContainer | cemix.jbi.container.JBIContainer 642 | ServiceMix JBI Container (container1) started
13:50:41,718 | INFO | main | MultipleJMSFlowTest | nmr.flow.jms.MultipleJMSFlowTest 103 | Nodes: 2, 2, 0, 0
13:50:41,718 | WARN | main | ClientFactory| emix.jbi.framework.ClientFactory 89 | Cound not start ClientFactory: javax.naming.NamingException: Something already bound at ClientFactory
13:50:41,734 | INFO | main | JBIContainer | cemix.jbi.container.JBIContainer 642 | ServiceMix JBI Container (container2) started
13:50:45,734 | INFO | main | MultipleJMSFlowTest | nmr.flow.jms.MultipleJMSFlowTest 103 | Nodes: 3, 3, 3, 0
13:50:45,734 | WARN | main | ClientFactory| emix.jbi.framework.ClientFactory 89 | Cound not start ClientFactory: javax.naming.NamingException: Something already bound at ClientFactory
13:50:45,750 | INFO | main | JBIContainer | cemix.jbi.container.JBIContainer 642 | ServiceMix JBI Container (container3) started
13:50:49,750 | INFO | main | MultipleJMSFlowTest | nmr.flow.jms.MultipleJMSFlowTest 103 | Nodes: 4, 4, 4, 4

There are no assertions, and so far it does not produce any errors. What was the thought behind this test? Could it be incomplete?

Failed unit test (servicemix-core) : org.apache.servicemix.jbi.nmr.flow.jms.MultipleJMSFlowTest
---

Key: SM-624
URL: https://issues.apache.org/activemq/browse/SM-624
Project: ServiceMix
Issue Type: Sub-task
Components: servicemix-core
Affects Versions: 3.0
Environment: Windows and linux
Reporter: Fritz Oconer
Fix For: 3.2
[jira] Issue Comment Edited: (SM-628) org.apache.servicemix.jbi.nmr.flow.jms.JMSFlowTest
[ https://issues.apache.org/activemq/browse/SM-628?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#action_40363 ]

ozhurakousky edited comment on SM-628 at 10/12/07 6:23 AM:
---

I agree, and my initial thought was to include the timeout logic as part of MessageList or to reuse the existing method. The issue I see is that MessageList represents one receiver. Most of these instabilities occurred during a cluster test, which means we were dealing with more than one receiver, and when more than one receiver is active I observed that messages are spread equally between them (e.g., 10 messages, 2 receivers, each receives 5). That means I can't use the MessageList logic as it is, since in the cluster world the number of messages received by one receiver is not always equal to the number of messages sent.

Even if I assume that such load balancing (2 receivers, 5 each) is true round robin, and that I could reuse the MessageList logic (receiver.getMessageList().assertMessagesReceived(NUM_MESSAGES);) by dividing the number of messages sent by the number of receivers and placing the result in NUM_MESSAGES when I do this check, I would have to make sure that in my tests the number of messages sent is always divisible by the number of receivers without a remainder. Otherwise, if I send 9 messages, one receiver gets 5 while the other gets 4. Which one? I would not know.

So the method I proposed in my patch is to have an independent process that sums the number of messages from all receivers by actually using MessageList.getMessageCount().

NOTE: Just thought about something. I can reuse the waitForMessagesToArrive(..) method in my ClusterFlowTestHelper to eliminate my timeout logic, but I still have to sum all of the messages before I return true or false.

As to your other question about Resolve: I do not have a huge ego, nor do I have any problems with it. Plus, I am just starting my contribution to SM, so I would not mind someone watching over what I do for a while (the first time I crashed and burned, remember). So I would still like to use Resolve as the way of suggesting a FIX, and acceptance of such a FIX by peers would grant the Close of the issue. We actually use this process internally in my company.

org.apache.servicemix.jbi.nmr.flow.jms.JMSFlowTest
--

Key: SM-628
URL: https://issues.apache.org/activemq/browse/SM-628
Project: ServiceMix
Issue Type: Sub-task
Components: servicemix-core
Affects Versions: 3.0
Reporter: Fritz Oconer
Fix For: 3.2
Attachments: SMTestCasesPatches.zip
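The arithmetic behind the comment above is easy to check. The sketch below assumes strict round-robin delivery, which the comment itself only assumes; `RoundRobinSplitSketch`, `split`, `total` and the `firstReceiver` parameter are hypothetical names used for illustration. It shows that with 9 messages and 2 receivers the per-receiver counts depend on which receiver is served first, while the total across receivers does not.

```java
import java.util.Arrays;

/** Hypothetical illustration; not part of ServiceMix or its tests. */
final class RoundRobinSplitSketch {

    /** Per-receiver counts when messages are dealt out round robin, starting at firstReceiver. */
    static int[] split(int messages, int receivers, int firstReceiver) {
        int[] counts = new int[receivers];
        for (int i = 0; i < messages; i++) {
            counts[(firstReceiver + i) % receivers]++;
        }
        return counts;
    }

    /** Sum over all receivers; this is the only value a test can predict. */
    static int total(int[] counts) {
        return Arrays.stream(counts).sum();
    }

    public static void main(String[] args) {
        System.out.println(Arrays.toString(split(10, 2, 0))); // [5, 5]
        System.out.println(Arrays.toString(split(9, 2, 0)));  // [5, 4]
        System.out.println(Arrays.toString(split(9, 2, 1)));  // [4, 5]
        System.out.println(total(split(9, 2, 1)));            // 9
    }
}
```

Since a test cannot know which receiver is served first, asserting on the sum avoids having to keep the message count divisible by the number of receivers.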
[jira] Updated: (SM-627) Failed unit test (servicemix-core) : org.apache.servicemix.jbi.nmr.flow.jca.JCAFlowTest
[ https://issues.apache.org/activemq/browse/SM-627?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Oleg Zhurakousky updated SM-627:

Attachment: JCAFlowTest_patch.txt

The same solution as for JMSFlowTest (see the discussion there). Make sure that the JMSFlowTest patches are applied first, since they add a new helper class that this patch depends on at compile time.

Failed unit test (servicemix-core) : org.apache.servicemix.jbi.nmr.flow.jca.JCAFlowTest
---

Key: SM-627
URL: https://issues.apache.org/activemq/browse/SM-627
Project: ServiceMix
Issue Type: Sub-task
Components: servicemix-core
Affects Versions: 3.0
Reporter: Fritz Oconer
Fix For: 3.2
Attachments: JCAFlowTest_patch.txt
[jira] Updated: (SM-628) org.apache.servicemix.jbi.nmr.flow.jms.JMSFlowTest
[ https://issues.apache.org/activemq/browse/SM-628?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Oleg Zhurakousky updated SM-628:

Attachment: SMTestCasesPatches.zip

Patches for JMSFlowTest

org.apache.servicemix.jbi.nmr.flow.jms.JMSFlowTest
--

Key: SM-628
URL: https://issues.apache.org/activemq/browse/SM-628
Project: ServiceMix
Issue Type: Sub-task
Components: servicemix-core
Affects Versions: 3.0
Reporter: Fritz Oconer
Fix For: 3.2
Attachments: SMTestCasesPatches.zip
[jira] Commented: (SM-628) org.apache.servicemix.jbi.nmr.flow.jms.JMSFlowTest
[ https://issues.apache.org/activemq/browse/SM-628?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#action_40365 ]

Oleg Zhurakousky commented on SM-628:
-

Cool, but don't commit yet. I just edited my previous comment with a little NOTE (there is something I can reuse from MessageList; you made me think), so I want to change that. Also, let me look at the other similar tests to make sure that the approach is the same. I will provide another comment here when I am done.

As for Jira/Resolve: maybe you're right, an issue that is Resolved or Closed can always be reopened if the fix is not accepted. But it would eliminate double work when the fix is accepted, since the only time you would have to go back is to reopen it (you would not have to go back and close something you already agree with).

org.apache.servicemix.jbi.nmr.flow.jms.JMSFlowTest
--

Key: SM-628
URL: https://issues.apache.org/activemq/browse/SM-628
Project: ServiceMix
Issue Type: Sub-task
Components: servicemix-core
Affects Versions: 3.0
Reporter: Fritz Oconer
Fix For: 3.2
Attachments: SMTestCasesPatches.zip
[jira] Commented: (SM-628) org.apache.servicemix.jbi.nmr.flow.jms.JMSFlowTest
[ https://issues.apache.org/activemq/browse/SM-628?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#action_40366 ]

Oleg Zhurakousky commented on SM-628:
-

Guillaume,

Go ahead and commit if you like. I just tested it with JCAFlowTest and it works well, so I'll be submitting a patch for that shortly, but it has to come after you apply this patch; otherwise the code will not compile, because it needs ClusterFlowTestHelper.

As to the note I made in the previous comment: it would not work unless we start changing more things around, since the wait method takes the number of messages as an input parameter and, as I said before, I would not know that number per receiver in the clustered flows. So I would keep it the way it is.

Let me know if you plan to commit, so I know when to submit the JCA and other patches.

org.apache.servicemix.jbi.nmr.flow.jms.JMSFlowTest
--

Key: SM-628
URL: https://issues.apache.org/activemq/browse/SM-628
Project: ServiceMix
Issue Type: Sub-task
Components: servicemix-core
Affects Versions: 3.0
Reporter: Fritz Oconer
Fix For: 3.2
Attachments: SMTestCasesPatches.zip
[jira] Resolved: (SM-624) Failed unit test (servicemix-core) : org.apache.servicemix.jbi.nmr.flow.jms.MultipleJMSFlowTest
[ https://issues.apache.org/activemq/browse/SM-624?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Oleg Zhurakousky resolved SM-624.
-

Resolution: Cannot Reproduce

I have run the 3.2-SNAPSHOT version of this test outside of Maven as well as with Maven and see no issues:

---
T E S T S
---
Running org.apache.servicemix.jbi.nmr.flow.jms.JMSFlowTest
Tests run: 2, Failures: 0, Errors: 0, Skipped: 0, Time elapsed: 31.812 sec

Results :

Tests run: 2, Failures: 0, Errors: 0, Skipped: 0

I see this is a rather old task, but I've got to start somewhere.

Failed unit test (servicemix-core) : org.apache.servicemix.jbi.nmr.flow.jms.MultipleJMSFlowTest
---

Key: SM-624
URL: https://issues.apache.org/activemq/browse/SM-624
Project: ServiceMix
Issue Type: Sub-task
Components: servicemix-core
Affects Versions: 3.0
Environment: Windows and linux
Reporter: Fritz Oconer
Fix For: 3.2