[jira] [Created] (ARTEMIS-3809) LargeMessageConsumerImpl hangs the message consume

2022-04-29 Thread David Bennion (Jira)
David Bennion created ARTEMIS-3809:
--

 Summary: LargeMessageConsumerImpl hangs the message consume
 Key: ARTEMIS-3809
 URL: https://issues.apache.org/jira/browse/ARTEMIS-3809
 Project: ActiveMQ Artemis
  Issue Type: Bug
  Components: Broker
Affects Versions: 2.21.0
 Environment: OS: Windows Server 2019
JVM: OpenJDK 64-Bit Server VM Temurin-17.0.1+12
Max Memory (-Xmx): 6GB
Allocated to JVM: 4.168GB
Currently in use: 3.398GB  (heap 3.391GB, non-heap 0.123GB)
Reporter: David Bennion


I wondered if this might be a recurrence of ARTEMIS-2293, but this happens on
2.21.0 and I can see the code change in LargeMessageControllerImpl.

We are using the default min-large-message-size of 100K (we have not changed the defaults).

Many messages are passing through the broker when this happens. I would
anticipate that most of the messages are smaller than 100K, but clearly some of
them must exceed it. After some number of messages, a particular consumer ceases
to consume messages.

After the system became "hung" I captured a stack trace and identified that the
system is stuck in an Object.wait() for a notify that appears to never come.

Here is the trace I was able to capture:
{code:java}
Thread-2 (ActiveMQ-client-global-threads) id=78 state=TIMED_WAITING
    - waiting on <0x43523a75> (a org.apache.activemq.artemis.core.client.impl.LargeMessageControllerImpl)
    - locked <0x43523a75> (a org.apache.activemq.artemis.core.client.impl.LargeMessageControllerImpl)
    at java.base@17.0.1/java.lang.Object.wait(Native Method)
    at org.apache.activemq.artemis.core.client.impl.LargeMessageControllerImpl.waitCompletion(LargeMessageControllerImpl.java:294)
    at org.apache.activemq.artemis.core.client.impl.LargeMessageControllerImpl.saveBuffer(LargeMessageControllerImpl.java:268)
    at org.apache.activemq.artemis.core.client.impl.ClientLargeMessageImpl.checkBuffer(ClientLargeMessageImpl.java:157)
    at org.apache.activemq.artemis.core.client.impl.ClientLargeMessageImpl.getBodyBuffer(ClientLargeMessageImpl.java:89)
    at mypackage.MessageListener.handleMessage(MessageListener.java:46)

{code}
 

The app can run either as a single node using the InVM transport or as a
cluster using TCP. To my knowledge, I have only seen this issue occur with
InVM.

I am not an expert in this code, but I can tell from the call stack that 0 must
be the value of timeWait passed into waitCompletion(). But from what I can
discern of the code changes in 2.21.0, it should be adjusting the readTimeout
to the timeout of the message (I think?) so that the read eventually gives up
rather than remaining blocked forever.

We have persistenceEnabled = false, which leads me to believe that the only
disk activity for messages should be related to large messages(?).

On a machine and context where this was consistently happening, I adjusted the
min-large-message-size upwards and the problem went away. This makes sense for
my application, but ultimately if a message crosses the threshold to become
large, it appears to hang the consumer indefinitely.
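
For reference, the client-side adjustment that makes the problem go away for us
looks roughly like this (a sketch against the core client API; the URL and the
value are illustrative, not our exact settings):

{code:java}
import org.apache.activemq.artemis.api.core.client.ActiveMQClient;
import org.apache.activemq.artemis.api.core.client.ServerLocator;

public class LargeMessageThresholdWorkaround {
   public static ServerLocator createLocator() throws Exception {
      // Raise min-large-message-size well above our biggest payloads so these
      // messages never take the "large message" path that appears to hang.
      ServerLocator locator = ActiveMQClient.createServerLocator("vm://0");
      locator.setMinLargeMessageSize(1024 * 1024); // 1 MiB instead of the 100 KiB default
      return locator;
   }
}
{code}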



--
This message was sent by Atlassian Jira
(v8.20.7#820007)


[jira] [Updated] (ARTEMIS-3809) LargeMessageControllerImpl hangs the message consume

2022-04-29 Thread David Bennion (Jira)


 [ 
https://issues.apache.org/jira/browse/ARTEMIS-3809?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

David Bennion updated ARTEMIS-3809:
---
Summary: LargeMessageControllerImpl hangs the message consume  (was: 
LargeMessageConsumerImpl hangs the message consume)






[jira] [Commented] (ARTEMIS-3809) LargeMessageControllerImpl hangs the message consume

2022-05-02 Thread David Bennion (Jira)


[ 
https://issues.apache.org/jira/browse/ARTEMIS-3809?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17530948#comment-17530948
 ] 

David Bennion commented on ARTEMIS-3809:


Thanks for following up, Justin. 

Answers to questions:

1) We have one particular machine where this seems to happen. I have reports of
it happening very sporadically in other places, but it seems to be reliable on
this particular machine. I am able to reproduce it by running a process that
causes a lot of messages to flow through the system; at a certain point it will
hang.

2) I will see what I can do on this one for you. This is actually embedded and
configured programmatically, so I don't have a broker.xml – just settings in
the code (see the sketch at the end of this comment).

3) I believe I can confirm that it is stuck indefinitely in this manner, and
you can verify my thinking. Before writing this response I went and reproduced
it again. I have the ability to get a thread dump from the system after the
consumer has hung. At 2:22 PM, as the issue started happening, I took a thread
dump. Then again at 3:10 PM, around 50 minutes later, the thread dump showed
locks on the same object for a wait/notify. So the state of this thread is the
same 50 minutes apart.

{noformat}
Thread-2 (ActiveMQ-client-global-threads) id=154 state=TIMED_WAITING
    - waiting on <0x3a62856c> (a org.apache.activemq.artemis.core.client.impl.LargeMessageControllerImpl)
    - locked <0x3a62856c> (a org.apache.activemq.artemis.core.client.impl.LargeMessageControllerImpl)
    at java.base@17.0.1/java.lang.Object.wait(Native Method)
    at org.apache.activemq.artemis.core.client.impl.LargeMessageControllerImpl.waitCompletion(LargeMessageControllerImpl.java:294)
    at org.apache.activemq.artemis.core.client.impl.LargeMessageControllerImpl.saveBuffer(LargeMessageControllerImpl.java:268)
{noformat}

And from reading the code, it looks like a NEW instance of
LargeMessageControllerImpl is instantiated for each large message as it is
being received. I therefore conclude that this thread remained blocked on the
same message well beyond any kind of reasonable timeout.

The "blocking call timeout", settable via the serverLocator.setCallTimeout()
method, is at its default of 0 (at least I didn't set any value prior to this
problem).

4) I adjusted the org.apache.activemq.artemis logger to TRACE and did this run.
It produced a LOT of logging. I am going to parse through it and see if I can
find the relevant entries for you around the time when "the train left the
track". :) Will post back with more details as I find them.
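
To give a concrete sense of 2) and 3), our broker and locator are wired up
programmatically along roughly these lines (a simplified sketch rather than our
real code; the class name and values here are illustrative):

{code:java}
import org.apache.activemq.artemis.api.core.client.ActiveMQClient;
import org.apache.activemq.artemis.api.core.client.ServerLocator;
import org.apache.activemq.artemis.core.config.Configuration;
import org.apache.activemq.artemis.core.config.impl.ConfigurationImpl;
import org.apache.activemq.artemis.core.server.embedded.EmbeddedActiveMQ;

public class EmbeddedBrokerSketch {
   public static void main(String[] args) throws Exception {
      // Embedded broker, no broker.xml: persistence disabled, InVM acceptor only.
      Configuration config = new ConfigurationImpl()
         .setPersistenceEnabled(false)
         .setSecurityEnabled(false);
      config.addAcceptorConfiguration("in-vm", "vm://0");

      EmbeddedActiveMQ broker = new EmbeddedActiveMQ();
      broker.setConfiguration(config);
      broker.start();

      // Client side: we do not call setCallTimeout(), so the default should apply.
      // Printing it is an easy way to confirm what value is actually in effect.
      ServerLocator locator = ActiveMQClient.createServerLocator("vm://0");
      System.out.println("callTimeout=" + locator.getCallTimeout());

      broker.stop();
   }
}
{code}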


[jira] [Commented] (ARTEMIS-3809) LargeMessageControllerImpl hangs the message consume

2022-05-02 Thread David Bennion (Jira)


[ 
https://issues.apache.org/jira/browse/ARTEMIS-3809?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17530988#comment-17530988
 ] 

David Bennion commented on ARTEMIS-3809:


I have logs and I was able to identify the last time the message consumer gets 
called:


{code:java}
05/02/2022 15:19:10.440 TRACE (global-threads)) [ClientConsumerImpl  ]  
org.apache.activemq.artemis.core.client.impl.ClientConsumerImpl@4e3abde6{consumerContext=ActiveMQConsumerContext{id=0},
 queueName=analyze}::Calling handler.onMessage
{code}

The blocked consumer is on a queue called 'analyze'.  We have another queue 
called 'Stats' that I believe is still pushing messages.

 

 
{noformat}
05/02/2022 15:19:10.440 TRACE (0.0-2081182161)) [InVMConnection      ]  
InVMConnection [serverID=0, id=1e5a49e7-ca54-11ec-99a3-0050568ed7da]::Sending 
inVM packet
05/02/2022 15:19:10.440 TRACE (0.0-2081182161)) [motingConnectionImpl]  
RemotingConnectionID=1e5a49e7-ca54-11ec-99a3-0050568ed7da handling packet 
SessionIndividualAcknowledgeMessage[type=81, channelID=23, responseAsync=false, 
requiresResponse=false, correlationID=-1, consumerID=0, messageID=34369871410, 
requiresResponse=false]
05/02/2022 15:19:10.440 TRACE (0.0-2081182161)) [InVMConnection      ]  
InVMConnection [serverID=0, id=1e5a49e7-ca54-11ec-99a3-0050568ed7da]::packet 
sent done
05/02/2022 15:19:10.440 TRACE (0.0-2081182161)) [InVMConnection      ]  
InVMConnection [serverID=0, id=1e5a49e7-ca54-11ec-99a3-0050568ed7da]::Sending 
inVM packet
05/02/2022 15:19:10.440 TRACE (0.0-2081182161)) [motingConnectionImpl]  
RemotingConnectionID=1e5a49e7-ca54-11ec-99a3-0050568ed7da handling packet 
SessionSendMessage_V2[type=71, channelID=14, responseAsync=true, 
requiresResponse=false, correlationID=0, 
message=CoreMessage[messageID=0,durable=false,userID=null,priority=4, 
timestamp=Mon May 02 15:19:10 CDT 2022,expiration=Mon May 02 15:19:12 CDT 2022, 
durable=false, 
address=Stats,size=1552,properties=TypedProperties[...SNIPPED...]]@1327958296, 
requiresResponse=false, correlationID=0, requiresResponse=false]
05/02/2022 15:19:10.440 TRACE (0.0-2081182161)) [InVMConnection      ]  
InVMConnection [serverID=0, id=1e5a49e7-ca54-11ec-99a3-0050568ed7da]::packet 
sent done
05/02/2022 15:19:10.440 TRACE (Impl$6@f15a0d8)) [SessionPacketHandler]  
ServerSessionPacketHandler::handlePacket,SessionIndividualAcknowledgeMessage[type=81,
 channelID=23, responseAsync=false, requiresResponse=false, correlationID=-1, 
consumerID=0, messageID=34369871410, requiresResponse=false]
05/02/2022 15:19:10.440 TRACE (Impl$6@f15a0d8)) [ServerConsumerImpl  ]  
individualACK messageID=34369871410
05/02/2022 15:19:10.440 TRACE (Impl$6@f15a0d8)) [ServerConsumerImpl  ]  
individualACK starting new TX
05/02/2022 15:19:10.440 TRACE (Impl$6@f15a0d8)) [ServerConsumerImpl  ]  ACKing 
ref 
Reference[34369871410]:NON-RELIABLE:CoreMessage[messageID=34369871410,durable=false,userID=null,priority=5,
 timestamp=Mon May 02 15:19:09 CDT 2022,expiration=Mon May 02 15:20:09 CDT 
2022, durable=false, 
address=analyze,size=9014,properties=TypedProperties[...SNIPPED...]]@1054124345 
on tx= TransactionImpl [xid=null, txID=34369871912, xid=null, state=ACTIVE, 
createTime=1651522750440(Mon May 02 15:19:10 CDT 2022), timeoutSeconds=-1, nr 
operations = 0]@1df2cdae, consumer=ServerConsumerImpl [id=0, filter=null, 
binding=LocalQueueBinding [address=analyze, queue=QueueImpl[name=analyze, 
postOffice=PostOfficeImpl [server=ActiveMQServerImpl::name=0.0.0.0], 
temp=false]@1199309c, filter=null, name=analyze, 
clusterName=analyze5aa9778c-98de-11ec-b671-0050568ed7da]]
05/02/2022 15:19:10.440 TRACE (Impl$6@f15a0d8)) [QueueImpl           ]  
QueueImpl[name=analyze, postOffice=PostOfficeImpl 
[server=ActiveMQServerImpl::name=0.0.0.0], temp=false]@1199309c acknowledge 
tx=true 
ref=Reference[34369871410]:NON-RELIABLE:CoreMessage[messageID=34369871410,durable=false,userID=null,priority=5,
 timestamp=Mon May 02 15:19:09 CDT 2022,expiration=Mon May 02 15:20:09 CDT 
2022, durable=false, 
address=analyze,size=9014,properties=TypedProperties[...SNIPPED...]]@1054124345,
 reason=NORMAL, consumer=ServerConsumerImpl [id=0, filter=null, 
binding=LocalQueueBinding [address=analyze, queue=QueueImpl[name=analyze, 
postOffice=PostOfficeImpl [server=ActiveMQServerImpl::name=0.0.0.0], 
temp=false]@1199309c, filter=null, name=analyze, 
clusterName=analyze5aa9778c-98de-11ec-b671-0050568ed7da]]
05/02/2022 15:19:10.440 INFO  (Impl$6@f15a0d8)) [message             ]  
AMQ601502: User guest(guest)@invm:0 is acknowledging a message from analyze: 
CoreMessage[messageID=34369871410,durable=false,userID=null,priority=5, 
timestamp=Mon May 02 15:19:09 CDT 2022,expiration=Mon May 02 15:20:09 CDT 2022, 
durable=false, 
address=analyze,size=9014,properties=TypedProperties[...SNIPPED...]]@1054124345
05/02/2022 15:19:10.440 DEBUG (Impl$6@f15a0d8)) [impl                ]  
AMQ843014: acknowledged m

[jira] [Commented] (ARTEMIS-3809) LargeMessageControllerImpl hangs the message consume

2022-05-03 Thread David Bennion (Jira)


[ 
https://issues.apache.org/jira/browse/ARTEMIS-3809?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17531237#comment-17531237
 ] 

David Bennion commented on ARTEMIS-3809:


Thanks Justin.  
{quote}Since the {{wait}} happens in a {{while}} loop it could be looping on the same {{LargeMessageControllerImpl}} instance. So the lock ID in itself is not sufficient to say that execution of that thread has not proceeded in any way between thread dumps.
{quote}
Ok. Yes, I get this in principle. The behavior we observed from this is that
all of the other messages timed out in the queues. We have 60-second timeouts
on all of these messages. I pulled a thread dump this morning and this machine
is still hung in the same place and with the same object hash. We ran a large
batch of messages yesterday, and after this issue a large number came back as
timed out. So I am virtually certain that we are hung on a single message in a
single place; otherwise the message would have eventually timed out, the
consumer would have become unblocked, and it wouldn't be hanging in this same
location. I have not yet been able to get this machine into a debugger to
attempt to inspect it. (There is some work involved in that due to network
segmentation and security stuff, but I'll work my way around that.)
{quote}The default call timeout on the {{ServerLocator}} should be {{30000}} milliseconds. Therefore, if you are not setting it to {{0}} I don't see how a value of {{0}} could be used here.
{quote}
OK, that surprises me. I will search to double-check that what I said is true.
I cannot see any references in our code other than the one we just put in. But
is it possible that there is some other configuration call that does this as a
side effect? I can clearly see that the current number must be either 0 or
astronomically large.
{quote}The main thing I'm interested in at this point is {{TRACE}} logging on {{org.apache.activemq.artemis.core.protocol.core.impl.RemotingConnectionImpl}}. This will show what packets are arriving to the client. Specifically I want to see packets of the type {{SESS_RECEIVE_LARGE_MSG}} and {{SESS_RECEIVE_CONTINUATION}}.
{quote}
Got it. Unless I am mistaken, I won't see any of those logs from
RemotingConnectionImpl in my logs because I have only been able to make this
happen on InVM. For instance, I see these logs:

[insert log here]

 

But these logs just appear over and over like this and don't really seem to 
have anything interesting.
{quote}Ultimately the best way to proceed is for you to provide a way to 
reproduce the problem. That way I can put a debugger on it and really see 
what's happening.
{quote}
I get it.  I will see what I can do.  It is hard enough to get this one to 
happen even with all our code connected.

 


[jira] [Comment Edited] (ARTEMIS-3809) LargeMessageControllerImpl hangs the message consume

2022-05-03 Thread David Bennion (Jira)


[ 
https://issues.apache.org/jira/browse/ARTEMIS-3809?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17531237#comment-17531237
 ] 

David Bennion edited comment on ARTEMIS-3809 at 5/3/22 2:47 PM:


Thanks Justin.  
{quote}Since the {{wait}} happens in a {{while}} loop it could be looping on the same {{LargeMessageControllerImpl}} instance. So the lock ID in itself is not sufficient to say that execution of that thread has not proceeded in any way between thread dumps.
{quote}
Ok. Yes, I get this in principle. The behavior we observed from this is that
all of the other messages timed out in the queues. We have 60-second timeouts
on all of these messages. I pulled a thread dump this morning and this machine
is still hung in the same place and with the same object hash. We ran a large
batch of messages yesterday, and after this issue a large number came back as
timed out. So I am virtually certain that we are hung on a single message in a
single place; otherwise the message would have eventually timed out, the
consumer would have become unblocked, and it wouldn't be hanging in this same
location. I have not yet been able to get this machine into a debugger to
attempt to inspect it. (There is some work involved in that due to network
segmentation and security stuff, but I'll work my way around that.)
{quote}The default call timeout on the {{ServerLocator}} should be {{30000}} milliseconds. Therefore, if you are not setting it to {{0}} I don't see how a value of {{0}} could be used here.
{quote}
OK, that surprises me. I will search to double-check that what I said is true.
I cannot see any references in our code other than the one we just put in. But
is it possible that there is some other configuration call that does this as a
side effect? I can clearly see that the current number must be either 0 or
astronomically large.
{quote}The main thing I'm interested in at this point is {{TRACE}} logging on {{org.apache.activemq.artemis.core.protocol.core.impl.RemotingConnectionImpl}}. This will show what packets are arriving to the client. Specifically I want to see packets of the type {{SESS_RECEIVE_LARGE_MSG}} and {{SESS_RECEIVE_CONTINUATION}}.
{quote}
Got it. Unless I am mistaken, I won't see any of those logs from
RemotingConnectionImpl in my logs because I have only been able to make this
happen on InVM. For instance, I see these logs:
{noformat}
05/02/2022 15:19:09.440 TRACE (.InVMConnector)) [InVMConnection      ]  
InVMConnection [serverID=0, id=1e5a49e7-ca54-11ec-99a3-0050568ed7da]::packet 
sent done
05/02/2022 15:19:09.440 TRACE (0.0-2081182161)) [InVMConnection      ]  
InVMConnection [serverID=0, id=1e5a49e7-ca54-11ec-99a3-0050568ed7da]::Sending 
inVM packet
05/02/2022 15:19:09.440 TRACE (0.0-2081182161)) [InVMConnection      ]  
InVMConnection [serverID=0, id=1e5a49e7-ca54-11ec-99a3-0050568ed7da]::packet 
sent done
05/02/2022 15:19:09.440 TRACE (.InVMConnector)) [InVMConnection      ]  
InVMConnection [serverID=0, id=1e5a49e7-ca54-11ec-99a3-0050568ed7da]::Sending 
inVM packet
05/02/2022 15:19:09.440 TRACE (.InVMConnector)) [InVMConnection      ]  
InVMConnection [serverID=0, id=1e5a49e7-ca54-11ec-99a3-0050568ed7da]::packet 
sent done
05/02/2022 15:19:09.441 TRACE (0.0-2081182161)) [InVMConnection      ]  
InVMConnection [serverID=0, id=1e5a49e7-ca54-11ec-99a3-0050568ed7da]::Sending 
inVM packet
05/02/2022 15:19:09.441 TRACE (0.0-2081182161)) [InVMConnection      ]  
InVMConnection [serverID=0, id=1e5a49e7-ca54-11ec-99a3-0050568ed7da]::packet 
sent done
05/02/2022 15:19:09.441 TRACE (.InVMConnector)) [InVMConnection      ]  
InVMConnection [serverID=0, id=1e5a49e7-ca54-11ec-99a3-0050568ed7da]::Sending 
inVM packet{noformat}
 

But these logs just appear over and over like this and don't really seem to 
have anything interesting.
{quote}Ultimately the best way to proceed is for you to provide a way to 
reproduce the problem. That way I can put a debugger on it and really see 
what's happening.
{quote}
I get it.  I will see what I can do.  It is hard enough to get this one to 
happen even with all our code connected.

 



[jira] [Commented] (ARTEMIS-3809) LargeMessageControllerImpl hangs the message consume

2022-05-03 Thread David Bennion (Jira)


[ 
https://issues.apache.org/jira/browse/ARTEMIS-3809?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17531244#comment-17531244
 ] 

David Bennion commented on ARTEMIS-3809:


Just ran and put a breakpoint on LargeMessageControllerImpl:294 (attempting to
get it to occur again). Verified that readTimeout *IS* set to 30000 (30
seconds), as you suggested.

I now suspect that we are stuck in that while loop and will never get a
streamEnded.

Will try to substantiate this with the debugger in the context when the issue
actually happens (slow going).






[jira] [Comment Edited] (ARTEMIS-3809) LargeMessageControllerImpl hangs the message consume

2022-05-03 Thread David Bennion (Jira)


[ 
https://issues.apache.org/jira/browse/ARTEMIS-3809?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17531244#comment-17531244
 ] 

David Bennion edited comment on ARTEMIS-3809 at 5/3/22 3:43 PM:


Just ran and put a breakpoint on LargeMessageControllerImpl:294 (attempting to
get it to occur again). Verified that readTimeout *IS* set to 30000 (30
seconds), as you suggested.

I now suspect that we are stuck in that while loop and will never get a
streamEnded (given that every time I get a stack trace while we are stuck it
still points to that line; it is a matter of nanoseconds to run the loop and
get back to the 30s wait() call).

Will try to substantiate this with the debugger in context when the issue
actually happens (slow going).








[jira] [Updated] (ARTEMIS-3809) LargeMessageControllerImpl hangs the message consume

2022-05-03 Thread David Bennion (Jira)


 [ 
https://issues.apache.org/jira/browse/ARTEMIS-3809?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

David Bennion updated ARTEMIS-3809:
---
Attachment: image-2022-05-03-10-51-46-872.png






[jira] [Commented] (ARTEMIS-3809) LargeMessageControllerImpl hangs the message consume

2022-05-03 Thread David Bennion (Jira)


[ 
https://issues.apache.org/jira/browse/ARTEMIS-3809?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17531297#comment-17531297
 ] 

David Bennion commented on ARTEMIS-3809:


OK, I was able to validate the above in my debugger. When we hang, we are
continually looping and waiting for 30 seconds. streamEnded is false, but
packetAdded = true. So, an interesting state. I'll keep looking around here and
maybe get lucky before investing the work to make this reproducible.

 

!image-2022-05-03-10-51-46-872.png!






[jira] [Commented] (ARTEMIS-3809) LargeMessageControllerImpl hangs the message consume

2022-05-03 Thread David Bennion (Jira)


[ 
https://issues.apache.org/jira/browse/ARTEMIS-3809?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17531305#comment-17531305
 ] 

David Bennion commented on ARTEMIS-3809:


Because packetAdded = true, this timeout never triggers here:
{code:java}
            } else if (System.currentTimeMillis() > timeOut && !packetAdded) {
               throw ActiveMQClientMessageBundle.BUNDLE.timeoutOnLargeMessage();
            }
{code}
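
To make the failure mode concrete, here is a toy model of the loop as I read it
in the debugger (this is NOT the Artemis source, just a simplified illustration;
field names follow the snippet above):

{code:java}
/**
 * Toy model of the waitCompletion() loop, not the actual
 * LargeMessageControllerImpl code. It only illustrates why the timeout branch
 * can never fire once packetAdded has been set to true.
 */
public class WaitLoopModel {
   private boolean streamEnded = false;  // never becomes true in the hung case
   private boolean packetAdded = true;   // set when the first packet arrives, never cleared

   public synchronized void waitCompletion(long readTimeout) throws InterruptedException {
      long timeOut = System.currentTimeMillis() + readTimeout; // e.g. 30,000 ms
      while (!streamEnded) {
         wait(readTimeout); // the 30s TIMED_WAITING seen in the thread dumps
         if (streamEnded) {
            break;
         } else if (System.currentTimeMillis() > timeOut && !packetAdded) {
            // unreachable while packetAdded stays true, so the loop never times out
            throw new IllegalStateException("timeout on large message");
         }
      }
   }
}
{code}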






[jira] [Commented] (ARTEMIS-3809) LargeMessageControllerImpl hangs the message consume

2022-05-03 Thread David Bennion (Jira)


[ 
https://issues.apache.org/jira/browse/ARTEMIS-3809?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17531314#comment-17531314
 ] 

David Bennion commented on ARTEMIS-3809:


(timeOut - System.currentTimeMillis())/1000 is currently -1973, and the gap
just keeps growing.

I can't see that packetAdded ever gets reset to false, so that appears to be
the one-way door here.






[jira] [Commented] (ARTEMIS-3809) LargeMessageControllerImpl hangs the message consume

2022-05-03 Thread David Bennion (Jira)


[ 
https://issues.apache.org/jira/browse/ARTEMIS-3809?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17531328#comment-17531328
 ] 

David Bennion commented on ARTEMIS-3809:


Compressed messages are off for us currently.






[jira] [Commented] (ARTEMIS-3809) LargeMessageControllerImpl hangs the message consume

2022-05-03 Thread David Bennion (Jira)


[ 
https://issues.apache.org/jira/browse/ARTEMIS-3809?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17531331#comment-17531331
 ] 

David Bennion commented on ARTEMIS-3809:


The stuck message from inspecting the outStream:

ClientLargeMessageImpl[messageID=38664715608, durable=false, address=analyze,userID=null,properties=TypedProperties[...SNIPPED...,_AMQ_LARGE_SIZE=108314]]

So barely over the 100K threshold.




--
This message was sent by Atlassian Jira
(v8.20.7#820007)


[jira] [Commented] (ARTEMIS-3809) LargeMessageControllerImpl hangs the message consume

2022-05-03 Thread David Bennion (Jira)


[ 
https://issues.apache.org/jira/browse/ARTEMIS-3809?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17531405#comment-17531405
 ] 

David Bennion commented on ARTEMIS-3809:


Thank you sir!  I'll be happy to verify the change on my side as well.




--
This message was sent by Atlassian Jira
(v8.20.7#820007)


[jira] [Commented] (ARTEMIS-3809) LargeMessageControllerImpl hangs the message consume

2022-05-04 Thread David Bennion (Jira)


[ 
https://issues.apache.org/jira/browse/ARTEMIS-3809?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17531770#comment-17531770
 ] 

David Bennion commented on ARTEMIS-3809:


I have still been thinking about this scenario, and your explanation of a resolution to make this system more robust makes total sense.  Having the delivery path be robust and time out is really important, and a sender getting a single packet out and then vanishing seems like a plausible edge case.  So that is all good.

The piece of my situation that still doesn't make complete sense to me, though, is that all of these messages flow within a single JVM using the InVM transporter.  I don't (that I know of) have any error in the log indicating that something went wrong.  So how did I arrive at a point where a single packet of a large message made it through and the send of the rest of it was abandoned without a trace?

With the fix that you are proposing (which I believe is a correct and valuable fix), would it not be true that my situation would simply get a failed delivery for that message and continue on around it?
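
For reference, a minimal sketch of the workaround described in the issue (raising min-large-message-size on the core client), assuming an embedded broker reached over the InVM transporter; the "vm://0" URL and the 512 KB value are placeholders, not recommendations.

{code:java}
import org.apache.activemq.artemis.api.core.client.ActiveMQClient;
import org.apache.activemq.artemis.api.core.client.ClientSessionFactory;
import org.apache.activemq.artemis.api.core.client.ServerLocator;

public class ConsumerFactorySketch {

   public static ClientSessionFactory create() throws Exception {
      // "vm://0" matches an embedded broker exposing an InVM acceptor.
      ServerLocator locator = ActiveMQClient.createServerLocator("vm://0")
         // Workaround: lift the threshold above the ~108 KB payloads seen above
         // so they are no longer treated as large messages.
         .setMinLargeMessageSize(512 * 1024);
      return locator.createSessionFactory();
   }
}
{code}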




--
This message was sent by Atlassian Jira
(v8.20.7#820007)


[jira] [Created] (ARTEMIS-3867) ha-policy for replication won't work on Windows

2022-06-20 Thread David Bennion (Jira)
David Bennion created ARTEMIS-3867:
--

 Summary: ha-policy for replication won't work on Windows
 Key: ARTEMIS-3867
 URL: https://issues.apache.org/jira/browse/ARTEMIS-3867
 Project: ActiveMQ Artemis
  Issue Type: Bug
  Components: Broker
Affects Versions: 2.23.0
Reporter: David Bennion


FileBasedPrimitiveManager.java line 134 has  
{code:java}
getDistributedLock("ML:" + mutableLongId); {code}
But the ':' is invalid on a Windows filesystem.  This should maybe be replaced 
with an '_' or a '-'?
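
A small, self-contained illustration of the failure mode and of the substitution suggested above. The LockNameCheck class and its sanitize helper are hypothetical, purely for illustration and not the project's actual fix, and Path.of is only expected to reject the ':' on Windows.

{code:java}
import java.nio.file.InvalidPathException;
import java.nio.file.Path;

public class LockNameCheck {

   // The suggested substitution: swap ':' for a character that is legal in file names everywhere.
   static String sanitize(String lockId) {
      return lockId.replace(':', '_');
   }

   public static void main(String[] args) {
      String raw = "ML:" + 42L;           // same shape as "ML:" + mutableLongId
      try {
         Path ignored = Path.of(raw);     // on Windows this is expected to throw ("Illegal char <:>")
         System.out.println("'" + raw + "' is a valid path element on this OS");
      } catch (InvalidPathException e) {
         System.out.println("'" + raw + "' rejected: " + e.getMessage());
      }
      System.out.println("sanitized: " + sanitize(raw));   // ML_42
   }
}
{code}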



--
This message was sent by Atlassian Jira
(v8.20.7#820007)


[jira] [Commented] (ARTEMIS-3867) ha-policy for replication won't work on Windows

2022-06-21 Thread David Bennion (Jira)


[ 
https://issues.apache.org/jira/browse/ARTEMIS-3867?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17556933#comment-17556933
 ] 

David Bennion commented on ARTEMIS-3867:


I am happy to change it to whatever.  We have a system that uses Artemis as an embedded broker, and we are now trying to add a primary/standby (master/slave) HA configuration for the broker to this application.  We have run into this as an issue, and it stops everything from working.

We have run into a few other things that have been problematic for us and are working on a minimal set of code.  Regarding the rationale that this should only be an issue with the test suite: is this file only meant to be used by the test suite?  Have we taken a wrong turn?

> ha-policy for replication won't work on Windows
> ---
>
> Key: ARTEMIS-3867
> URL: https://issues.apache.org/jira/browse/ARTEMIS-3867
> Project: ActiveMQ Artemis
>  Issue Type: Bug
>  Components: Broker
>Affects Versions: 2.23.0
>Reporter: David Bennion
>Priority: Blocker
>  Labels: high-availability
>
> FileBasedPrimitiveManager.java line 134 has  
> {code:java}
> getDistributedLock("ML:" + mutableLongId); {code}
> But the ':' is invalid on a Windows filesystem.  This should maybe be 
> replaced with an '_' or a '-'?



--
This message was sent by Atlassian Jira
(v8.20.7#820007)


[jira] [Updated] (ARTEMIS-3867) ha-policy for replication won't work on Windows

2022-06-21 Thread David Bennion (Jira)


 [ 
https://issues.apache.org/jira/browse/ARTEMIS-3867?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

David Bennion updated ARTEMIS-3867:
---
Priority: Major  (was: Blocker)




--
This message was sent by Atlassian Jira
(v8.20.7#820007)


[jira] [Updated] (ARTEMIS-3867) ha-policy for replication won't work on Windows

2022-06-21 Thread David Bennion (Jira)


 [ 
https://issues.apache.org/jira/browse/ARTEMIS-3867?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

David Bennion updated ARTEMIS-3867:
---
Priority: Minor  (was: Major)




--
This message was sent by Atlassian Jira
(v8.20.7#820007)


[jira] [Commented] (ARTEMIS-3867) ha-policy for replication won't work on Windows

2022-06-21 Thread David Bennion (Jira)


[ 
https://issues.apache.org/jira/browse/ARTEMIS-3867?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17557137#comment-17557137
 ] 

David Bennion commented on ARTEMIS-3867:


Well well.   I think we must have taken a wrong turn on this.  We have an older 
version of this app that uses an older version of Artemis (1.5 era) that 
supports fail-over for the broker with the master/slave configuration.

Let me re-analyze our approach.




--
This message was sent by Atlassian Jira
(v8.20.7#820007)


[jira] [Updated] (ARTEMIS-3867) ha-policy for replication won't work on Windows

2022-06-21 Thread David Bennion (Jira)


 [ 
https://issues.apache.org/jira/browse/ARTEMIS-3867?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

David Bennion updated ARTEMIS-3867:
---
Priority: Trivial  (was: Minor)




--
This message was sent by Atlassian Jira
(v8.20.7#820007)


[jira] [Commented] (ARTEMIS-3867) ha-policy for replication won't work on Windows

2022-06-21 Thread David Bennion (Jira)


[ 
https://issues.apache.org/jira/browse/ARTEMIS-3867?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17557150#comment-17557150
 ] 

David Bennion commented on ARTEMIS-3867:


I think we identified what is going on.  We were attempting to use 
ReplicationPrimaryPolicyConfiguration instead of ReplicatedPolicyConfiguration 
/ ReplicaPolicyConfiguration.  That took us inadvertently down the road of the 
pluggable quorum feature.  I am happy to close the issue or mark it trivial.
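
For anyone following along, a rough sketch of the classic replicated/replica ha-policy wired into an embedded broker, i.e. the configuration route mentioned above rather than the pluggable-quorum one. The ports are placeholders, and the connector and cluster-connection wiring that replication needs to pair the brokers is omitted, so this shows only the shape of the setup, not a working HA pair.

{code:java}
import org.apache.activemq.artemis.core.config.Configuration;
import org.apache.activemq.artemis.core.config.ha.ReplicaPolicyConfiguration;
import org.apache.activemq.artemis.core.config.ha.ReplicatedPolicyConfiguration;
import org.apache.activemq.artemis.core.config.impl.ConfigurationImpl;
import org.apache.activemq.artemis.core.server.embedded.EmbeddedActiveMQ;

public class EmbeddedReplicationSketch {

   // Primary (master) side of the classic replication ha-policy.
   static EmbeddedActiveMQ primary() throws Exception {
      Configuration config = new ConfigurationImpl()
         .setSecurityEnabled(false)
         .setPersistenceEnabled(true)           // replication needs a journal to replicate
         .setHAPolicyConfiguration(new ReplicatedPolicyConfiguration());
      config.addAcceptorConfiguration("netty", "tcp://localhost:61616");
      // Connector and cluster-connection wiring (required for the pair to find each other) omitted.
      return new EmbeddedActiveMQ().setConfiguration(config);
   }

   // Backup (slave) side.
   static EmbeddedActiveMQ backup() throws Exception {
      Configuration config = new ConfigurationImpl()
         .setSecurityEnabled(false)
         .setPersistenceEnabled(true)
         .setHAPolicyConfiguration(new ReplicaPolicyConfiguration());
      config.addAcceptorConfiguration("netty", "tcp://localhost:61617");
      return new EmbeddedActiveMQ().setConfiguration(config);
   }

   public static void main(String[] args) throws Exception {
      primary().start();
      backup().start();
   }
}
{code}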




--
This message was sent by Atlassian Jira
(v8.20.7#820007)


[jira] [Created] (ARTEMIS-4205) All a database to be used to pluggable quorum vote locking requirements

2023-03-14 Thread David Bennion (Jira)
David Bennion created ARTEMIS-4205:
--

 Summary: All a database to be used to pluggable quorum vote 
locking requirements
 Key: ARTEMIS-4205
 URL: https://issues.apache.org/jira/browse/ARTEMIS-4205
 Project: ActiveMQ Artemis
  Issue Type: New Feature
  Components: Broker
Reporter: David Bennion


Utilizing the new pluggable quorum vote locking API, implement the locking manager using a JDBC database connection.  Support for Oracle, MSSQL and Postgres is included initially, with H2 used for unit testing.
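
Not the actual implementation, just a rough sketch of the usual shape a JDBC-backed lock of this kind takes: one row per lock id, claimed atomically when it is free or when the previous holder's lease has expired. The BROKER_LOCK table, its columns, and the lease scheme are assumptions for illustration, and the main method assumes the H2 driver is on the classpath.

{code:java}
import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.PreparedStatement;
import java.sql.SQLException;
import java.sql.Statement;

public class JdbcLeaseLockSketch {

   private final Connection connection;
   private final String holderId;

   JdbcLeaseLockSketch(Connection connection, String holderId) {
      this.connection = connection;
      this.holderId = holderId;
   }

   // Claim the row if nobody holds it or the previous holder's lease has expired.
   boolean tryAcquire(String lockId, long leaseMillis) throws SQLException {
      long now = System.currentTimeMillis();
      String sql = "UPDATE BROKER_LOCK SET HOLDER = ?, EXPIRES_AT = ? " +
                   "WHERE LOCK_ID = ? AND (HOLDER IS NULL OR EXPIRES_AT < ?)";
      try (PreparedStatement ps = connection.prepareStatement(sql)) {
         ps.setString(1, holderId);
         ps.setLong(2, now + leaseMillis);
         ps.setString(3, lockId);
         ps.setLong(4, now);
         return ps.executeUpdate() == 1;   // exactly one row updated => lock acquired
      }
   }

   public static void main(String[] args) throws SQLException {
      // In-memory H2, mirroring the note above that H2 is used for unit testing.
      try (Connection c = DriverManager.getConnection("jdbc:h2:mem:locks");
           Statement st = c.createStatement()) {
         st.execute("CREATE TABLE BROKER_LOCK (LOCK_ID VARCHAR(64) PRIMARY KEY, HOLDER VARCHAR(64), EXPIRES_AT BIGINT)");
         st.execute("INSERT INTO BROKER_LOCK (LOCK_ID) VALUES ('live-lock')");
         JdbcLeaseLockSketch lock = new JdbcLeaseLockSketch(c, "broker-1");
         System.out.println("acquired: " + lock.tryAcquire("live-lock", 10_000));
      }
   }
}
{code}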



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Updated] (ARTEMIS-4205) Allow a database to be used to pluggable quorum vote locking requirements

2023-03-14 Thread David Bennion (Jira)


 [ 
https://issues.apache.org/jira/browse/ARTEMIS-4205?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

David Bennion updated ARTEMIS-4205:
---
Summary: Allow a database to be used to pluggable quorum vote locking 
requirements  (was: All a database to be used to pluggable quorum vote locking 
requirements)

> Allow a database to be used to pluggable quorum vote locking requirements
> -
>
> Key: ARTEMIS-4205
> URL: https://issues.apache.org/jira/browse/ARTEMIS-4205
> Project: ActiveMQ Artemis
>  Issue Type: New Feature
>  Components: Broker
>Reporter: David Bennion
>Priority: Minor
>  Labels: pull-request-available
>
> Utilizing the new pluggable quorum vote locking api, implement the locking 
> manager using a JDBC database connection.   Support for Oracle, MSSQL and 
> Postgres is included initially, with H2 being utilized for unit testing.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Updated] (ARTEMIS-4205) Allow a database to be used to meet pluggable quorum vote locking requirements

2023-03-14 Thread David Bennion (Jira)


 [ 
https://issues.apache.org/jira/browse/ARTEMIS-4205?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

David Bennion updated ARTEMIS-4205:
---
Summary: Allow a database to be used to meet pluggable quorum vote locking 
requirements  (was: Allow a database to be used to pluggable quorum vote 
locking requirements)




--
This message was sent by Atlassian Jira
(v8.20.10#820010)