Bugs item #701731, was opened at 2003-03-11 18:44
Message generated for change (Comment added) made by ejort
You can respond by visiting: 
https://sourceforge.net/tracker/?func=detail&atid=376685&aid=701731&group_id=22866

Category: JBossMQ
Group: v3.0 Rabbit Hole
Status: Open
Resolution: None
Priority: 5
Submitted By: Eduard Letifov (techtime)
Assigned to: Adrian Brock (ejort)
Summary: Problem: MDB stops reacting 

Initial Comment:
I would like to report a potential bug.

I've made a post in Jboss forums and there was some 

responses from other people supposedly experiencing 

simular problems.



Below, I repeat the problem description and our current 

state of researching the roots of the problem:

=====

Setup:

Our system is centered around Jboss 3.0.4+Jetty 

running on AS400. In short, it handles HTTP requests :) 

Current (test environment) average load on the system is 

somewhere around 30 requests per minute, peak values 

so far were around 200. Production systems certainly as 

usual has to handle more.



The basic workflow/design of the system can be 

described as follows: 

- requests are picked up by the servlets

- each type of request is handled by separate servlet 

(actually the same class, but different config parameters 

through web.xml)

- servlet parses the request, composes an object 

message and submits it to the core via request JMS 

queue with one of the properties set to specific constant 

reflecting the request type

- servlet goes to timed-synchronous waiting cycle on 

response queue, if no response arrives from core in 10 

sec, a specific (timeout) response is sent to the client

- core "interface" is a set of MDB that call the rest 

(Session and Entity beans) or respond back directly 

through response queue. Instances of each different 

MDB class handle request messages only of a certain 

type specified via message selector in descriptor.



Problem:

After a certain time (~couple of days) one (or several, no 

system visible) MDBs stop responding to messages. In 

the logs it is visible that message was submitted to the 

queue, after that nothing happens, clients eventually 

receive timeout response. If we restart Jboss or 

stop/destroy/create/start the jar with MDB through JMX 

console everything starts working again.

===



I've noticed that in our case problems seem to start after 

the following message appears in the logs: "WARNING: 

NACK issued. The message consumer was not waiting 

for a message."



Also we've made some more detailed research of logs 

and Jboss source. After we enable tracing here is what 

we've got:



2003-03-06 16:34:21,068 TRACE 

[1046964853752:TracingInterceptor] CALLED : receive

2003-03-06 16:34:21,082 TRACE 

[1046964853752:TracingInterceptor] ARG : -2147473121

2003-03-06 16:34:21,090 TRACE 

[1046964853752:TracingInterceptor] ARG : -1

2003-03-06 16:34:21,090 TRACE 

[1046964853752:ServerSecurityInterceptor] Checking 

receive authorize on 

SpyDistributedConnection:ID:78/ca35b8c9ad1c166f2156

0d812148ea9e subId=-2147473121

2003-03-06 16:34:21,092 TRACE 

[1046964853752:TracingInterceptor] EXCEPTION : 

receive: 

javax.jms.JMSException: The provided subscription 

does not exist

java/lang/Throwable.<init>(Ljava/lang/String;)V+4 

(Throwable.java:90) 

javax/jms/JMSException.<init>

(Ljava/lang/String;Ljava/lang/String;)V+0 

(JMSException.java:0) 

javax/jms/JMSException.<init>(Ljava/lang/String;)V+0 

(JMSException.java:0) 

org/jboss/mq/server/ClientConsumer.getSubscription(I)

Lorg/jboss/mq/Subscription;+0 (ClientConsumer.java:0) 

org/jboss/mq/server/JMSDestinationManager.getSubscri

ption(Lorg/jboss/mq/ConnectionToken;I)

Lorg/jboss/mq/Subscription;+0 

(JMSDestinationManager.java:0) 

org/jboss/mq/server/JMSServerInterceptorSupport.getSub

scription(Lorg/jboss/mq/ConnectionToken;I)

Lorg/jboss/mq/Subscription;+0 

(JMSServerInterceptorSupport.java:0) 

org/jboss/mq/security/ServerSecurityInterceptor.receive

(Lorg/jboss/mq/ConnectionToken;IJ)

Lorg/jboss/mq/SpyMessage;+0 

(ServerSecurityInterceptor.java:0) 

org/jboss/mq/server/TracingInterceptor.receive

(Lorg/jboss/mq/ConnectionToken;IJ)

Lorg/jboss/mq/SpyMessage;+0 

(TracingInterceptor.java:0) 

org/jboss/mq/server/JMSServerInvoker.receive

(Lorg/jboss/mq/ConnectionToken;IJ)

Lorg/jboss/mq/SpyMessage;+0 

(JMSServerInvoker.java:0) 

org/jboss/mq/il/jvm/JVMServerIL.receive

(Lorg/jboss/mq/ConnectionToken;IJ)

Lorg/jboss/mq/SpyMessage;+0 (JVMServerIL.java:0) 

org/jboss/mq/Connection.receive

(Lorg/jboss/mq/Subscription;J)

Lorg/jboss/mq/SpyMessage;+0 (Connection.java:0) 

org/jboss/mq/SpyMessageConsumer.receiveNoWait()

Ljavax/jms/Message;+0 (SpyMessageConsumer.java:0) 

com/vocognition/talkface/jms/JMSModel.checkNoWaitFo

rReplyTo

(Lcom/vocognition/talkface/talkman/TalkmanMessage;)

Lcom/vocognition/talkface/talkman/ModelMessage;+0 

(JMSModel.java:73) 

com/vocognition/talkface/talkman/LUTHandler.service

(Lcom/vocognition/talkface/talkman/TalkmanRequest;Lco

m/vocognition/talkface/talkman/TalkmanResponse;)V+0 

(LUTHandler.java:27) 

com/vocognition/talkface/protocol/tfshttp/JMSServlet.doG

et

(Ljavax/servlet/http/HttpServletRequest;Ljavax/servlet/http

/HttpServletResponse;)V+0 (JMSServlet.java:112) 

javax/servlet/http/HttpServlet.service

(Ljavax/servlet/http/HttpServletRequest;Ljavax/servlet/http

/HttpServletResponse;)V+0 (HttpServlet.java:0) 

javax/servlet/http/HttpServlet.service

(Ljavax/servlet/ServletRequest;Ljavax/servlet/ServletResp

onse;)V+0 (HttpServlet.java:0) 

org/mortbay/jetty/servlet/ServletHolder.handle

(Ljavax/servlet/ServletRequest;Ljavax/servlet/ServletResp

onse;)V+0 (ServletHolder.java:334) 

org/mortbay/jetty/servlet/WebApplicationHandler.dispatc

h

(Ljava/lang/String;Ljavax/servlet/http/HttpServletRequest;

Ljavax/servlet/http/HttpServletResponse;Lorg/mortbay/jett

y/servlet/ServletHolder;)V+0 

(WebApplicationHandler.java:201) 

org/mortbay/jetty/servlet/ServletHandler.handle

(Ljava/lang/String;Ljava/lang/String;Lorg/mortbay/http/Htt

pRequest;Lorg/mortbay/http/HttpResponse;)V+0 

(ServletHandler.java:518) 

org/mortbay/http/HttpContext.handle

(Ljava/lang/String;Ljava/lang/String;Lorg/mortbay/http/Htt

pRequest;Lorg/mortbay/http/HttpResponse;)Z+0 

(HttpContext.java:1665) 

org/mortbay/jetty/servlet/WebApplicationContext.handle

(Ljava/lang/String;Ljava/lang/String;Lorg/mortbay/http/Htt

pRequest;Lorg/mortbay/http/HttpResponse;)Z+0 

(WebApplicationContext.java:540) 

org/mortbay/http/HttpContext.handle

(Lorg/mortbay/http/HttpRequest;Lorg/mortbay/http/HttpR

esponse;)Z+0 (HttpContext.java:1576) 

org/mortbay/http/HttpServer.service

(Lorg/mortbay/http/HttpRequest;Lorg/mortbay/http/HttpR

esponse;)Lorg/mortbay/http/HttpContext;+0 

(HttpServer.java:840) 

org/jboss/jetty/Jetty.service

(Lorg/mortbay/http/HttpRequest;Lorg/mortbay/http/HttpR

esponse;)Lorg/mortbay/http/HttpContext;+0 

(Jetty.java:543) 

org/mortbay/http/HttpConnection.service

(Lorg/mortbay/http/HttpRequest;Lorg/mortbay/http/HttpR

esponse;)Lorg/mortbay/http/HttpContext;+0 

(HttpConnection.java:804) 

org/mortbay/http/HttpConnection.handleNext()Z+0 

(HttpConnection.java:871) 

org/mortbay/http/HttpConnection.handle()V+0 

(HttpConnection.java:822) 

org/mortbay/http/SocketListener.handleConnection

(Ljava/net/Socket;)V+0 (SocketListener.java:176) 

org/mortbay/util/ThreadedServer.handle

(Ljava/lang/Object;)V+0 (ThreadedServer.java:287) 

org/mortbay/util/ThreadPool$JobRunner.run()V+0 

(ThreadPool.java:658) 

java/lang/Thread.run()V+11 (Thread.java:484) 

2003-03-06 16:34:21,097 TRACE 

[1046964853752:TracingInterceptor] RETURN : receive





We've added some more trace statements. The 

subscription was definitely added, but there is no 

evidence that it is removed. So it looks like it magically 

dissappears :(



I am worried about one thing - in JbossMQ 

ClientConsumer source all operations on subsriptions 

HashMap are done in blocks synchronized on 

subscriptions object itself by making clones, operating 

on them and then setting subscriptions reference to the 

new object (as in the following code snippets). I am not 

sure why is it done like this, and I suspect that it 

actually breaks the idea of synchronization because as 

it looks right after the reference is changed another 

thread may acquire a lock on a new object, while the 

first thread is still holding a lock on the old one. Please 

take a look:

===

public void addSubscription(Subscription req) throws 

JMSException

{

if( log.isTraceEnabled() )

log.trace("Adding subscription for: " + req);

req.connectionToken = connectionToken;

req.clientConsumer = this;



JMSDestination jmsdest = (JMSDestination) 

server.getJMSDestination(req.destination);

if (jmsdest == null)

{

throw new JMSException("The destination " + 

req.destination + " does not exist !");

}



jmsdest.addSubscriber(req);



synchronized (subscriptions)

{

HashMap subscriptionsClone = (HashMap) 

subscriptions.clone();

subscriptionsClone.put(new Integer(req.subscriptionId), 

req);

subscriptions = subscriptionsClone;

}

}

===



public void close()

{

boolean trace = log.isTraceEnabled();

if( trace )

log.trace("" + this +"->close()");



synchronized (messages)

{

closed = true;

if (enqueued)

{

if( trace )

log.trace("" + this +"->close(): Cancelling work in 

progress.");

threadPool.cancelWork(this);

enqueued = false;

}

}



synchronized (subscriptions)

{

Iterator i = subscriptions.keySet().iterator();

while (i.hasNext())

{

Integer subscriptionId = (Integer) i.next();

try

{

removeSubscription(subscriptionId.intValue());

}

catch(JMSException ignore)

{

}

}

}



// Nack the removed subscriptions, the connection is 

gone

HashMap removedSubsClone = (HashMap) ((HashMap) 

removedSubscriptions).clone();

Iterator i = removedSubsClone.values().iterator();

while (i.hasNext())

{

Subscription removed = (Subscription) i.next();

JMSDestination queue = (JMSDestination) 

server.getJMSDestination(removed.destination);

if (queue == null)

log.warn("The subscription was registered with a 

destination that does not exist: " + removed);

try

{

queue.nackMessages(removed);

}

catch (JMSException e)

{

log.warn("Unable to nack removed subscription: " + 

removed, e);

}

}

}





----------------------------------------------------------------------

>Comment By: Adrian Brock (ejort)
Date: 2003-09-26 16:30

Message:
Logged In: YES 
user_id=9459

Found the problem.

Once I saw the log, it was obvious where to look.



The "enabled" flag in ClientConsumer wasn't synchronized

which meant block subscriptions could be missed during

connection.start();



Attached the fixed version of the source.



Regards,

Adrian

----------------------------------------------------------------------

Comment By: Eduard Letifov (techtime)
Date: 2003-09-26 14:36

Message:
Logged In: YES 
user_id=438410

In fact I did ! The missing on is called GetSignOnMDB. The ejb-

jar and jboss.xml we use are here:



 ftp://ftp.vocognition.com/share/jboss/xml.zip



Ed.

----------------------------------------------------------------------

Comment By: Adrian Brock (ejort)
Date: 2003-09-26 14:21

Message:
Logged In: YES 
user_id=9459

I don't suppose you did listReceivers() on the queue so we can

see which one is missing?



The only modification you are missing after 3.0.8 in the 3.0

branch is

this one (this fixes a copy on write bug that might show up

on an AS400):

http://cvs.sourceforge.net/viewcvs.py/jboss/jbossmq/src/main/org/jboss/mq/server/JMSDestinationManager.java?only_with_tag=Branch_3_0



Once 3.2.2 goes final I will be backporting all the latest

changes/fixes

into 3.0



Regards,

Adrian

----------------------------------------------------------------------

Comment By: Eduard Letifov (techtime)
Date: 2003-09-26 13:33

Message:
Logged In: YES 
user_id=438410

Adrian,



 thanks for quick response. At 9pm customer stopped 

working. I don't think there was any problem with that. Then 

at 1 am the kill Jboss process (we've asked them several 

times not to do this but so far they don't listen) Then they 

start it again automatically. The problem is that by morning 

the number of receivers on the "from-talkman" queue is 27, 

while actually it is 28. If I redeploy the application it goes 

back to 28. If I shutdown Jboss and start it again I will get 27 

again.

----------------------------------------------------------------------

Comment By: Adrian Brock (ejort)
Date: 2003-09-26 13:23

Message:
Logged In: YES 
user_id=9459

I'm not sure what I'm looking at:



Your logs show everything stops at 9pm (was that a deadlock,

did you take a threaddump?)



You reboot at 1am, there are only three messages in the system,

which are persistent messages recovered in to-talkman during

recovery.



Everything else is listening on empty destinations.



Regards,

Adrian

----------------------------------------------------------------------

Comment By: Eduard Letifov (techtime)
Date: 2003-09-26 11:34

Message:
Logged In: YES 
user_id=438410

Adrian.



We still encounter the problem every time Jboss is restarted. 

The logs of such restart, with all the traces you've mentioned 

before enabled are here:



ftp://ftp.vocognition.com/share/jboss/20030926.zip 



Ed.

----------------------------------------------------------------------

Comment By: Eduard Letifov (techtime)
Date: 2003-09-04 16:26

Message:
Logged In: YES 
user_id=438410

No, we haven't yet, cause we managed to convince ourselves 

that it has been solved with 3.0.8



Now, to define it more clearly today I witness the following:

There is queue. It has 28 MDB's listening with selectors.

If I start Jboss with my ear already being in the deploy dir the 

queue reports (in JMX view) 27 receivers. Then I "touch" my 

ear, it gets redeployed and receivers are back to 28. I cannot 

confirm that the problem exists in exactly the same form as it 

used to be (MDB going offline after some time) but at least it 

is very related one. Should I open a different bug & enable 

logging in something different?

----------------------------------------------------------------------

Comment By: Eduard Letifov (techtime)
Date: 2003-09-04 12:50

Message:
Logged In: YES 
user_id=438410

No, we haven't yet, cause we managed to convince ourselves 

that it has been solved with 3.0.8



Now, to define it more clearly today I witness the following:

There is queue. It has 28 MDB's listening with selectors.

If I start Jboss with my ear already being in the deploy dir the 

queue reports (in JMX view) 27 receivers. Then I "touch" my 

ear, it gets redeployed and receivers are back to 28. I cannot 

confirm that the problem exists in exactly the same form as it 

used to be (MDB going offline after some time) but at least it 

is very related one. Should I open a different bug & enable 

logging in something different?

----------------------------------------------------------------------

Comment By: Adrian Brock (ejort)
Date: 2003-09-01 13:56

Message:
Logged In: YES 
user_id=9459

Did you go through the logging changes I mentioned in my

previous post?



Regards,

Adrian

----------------------------------------------------------------------

Comment By: Eduard Letifov (techtime)
Date: 2003-09-01 13:45

Message:
Logged In: YES 
user_id=438410

Adrian,



 We have just experienced the same problem at the 

customer's with 3.0.8. Right after the Jboss is started the 

number of listeners on the queue is less by one that it should 

be. If I redeploy the ear it goes back again. Still after some 

activity it drops again. 



 Looks exactly like it used to be... Sorry. Is there anything we 

can do to identify what the problem is?



 Ed

----------------------------------------------------------------------

Comment By: Eduard Letifov (techtime)
Date: 2003-09-01 13:45

Message:
Logged In: YES 
user_id=438410

Adrian,



 We have just experienced the same problem at the 

customer's with 3.0.8. Right after the Jboss is started the 

number of listeners on the queue is less by one that it should 

be. If I redeploy the ear it goes back again. Still after some 

activity it drops again. 



 Looks exactly like it used to be... Sorry. Is there anything we 

can do to identify what the problem is?



 Ed

----------------------------------------------------------------------

Comment By: Adrian Brock (ejort)
Date: 2003-06-25 12:19

Message:
Logged In: YES 
user_id=9459

Ok, reopen if you still have problems.



I do have one outstanding fix to backport from 3.2 to 3.0,

I'll try to find time to do in the next couple of days.



Regards,

Adrian

----------------------------------------------------------------------

Comment By: Eduard Letifov (techtime)
Date: 2003-06-25 11:56

Message:
Logged In: YES 
user_id=438410

Sorry, Adrian.



 Yes, we took 3.0.8 and it seems to be working, at least here 

in the office we were not able to break it. We have delivered 

the version with 3.0.8 to the customer and now we just have 

to wait when their IT department installs it. 



 Personally, I think you can close this bug.



Ed.

----------------------------------------------------------------------

Comment By: Adrian Brock (ejort)
Date: 2003-06-25 11:47

Message:
Logged In: YES 
user_id=9459

ping

----------------------------------------------------------------------

Comment By: Adrian Brock (ejort)
Date: 2003-05-23 18:08

Message:
Logged In: YES 
user_id=9459

Hi can you try this again with latest CVS either 3.0 or 3.2

I just fixed a problem in the ServerSession pool that

was causing a similar problem with MQSeries

as the jms provider.



Regards,

Adrian

----------------------------------------------------------------------

Comment By: Adrian Brock (ejort)
Date: 2003-05-14 13:41

Message:
Logged In: YES 
user_id=9459

Hi Eduard,



I had a quick look at your logs over the weekend but

I didn't come to any conclusion.



I have a suspicion that are there are two problems

that maybe related?



1) The Nack when not waiting for a message in the

temporary reply queue.

2) The MDB not receiving any messages.



If it is a deadlock you should be able to get a

thread dump on windows using a ctrl-break.

If you are using java 1.3 start jboss with -Xdebug

so you can see what is locked.



The logs you included did not include the full trace.

information. Do you have the trace nterceptor

configured in jbossmq-service.xml?

Can you run it with it with TRACE enabled for

org.jboss.mq

but use INFO for the very noisy

org.jboss.mq.server.MessageCache

org.jboss.mq.server.MessageReference



For the mdb you will also want to include

org.jboss.jms

org.jboss.ejb.plugins.jms



Also the class org.jboss.mq.MessageConsumer

which is where the NACK occurs doesn't have much

TRACE logging.

You might want to add some, in particular the

changes for field waitingForMessage



Regards,

Adrian

----------------------------------------------------------------------

Comment By: Eduard Letifov (techtime)
Date: 2003-05-14 13:27

Message:
Logged In: YES 
user_id=438410

Adrian,



 we have the same problem with Jboss 3.2.1 on Windows, 

looks like a thread deadlock or something. Silly question, but 

would thread dumps, trace logs help?

----------------------------------------------------------------------

Comment By: Eduard Letifov (techtime)
Date: 2003-05-09 08:26

Message:
Logged In: YES 
user_id=438410

Adrian,



 We just tried it with 3.2.1 - same problem

----------------------------------------------------------------------

Comment By: Eduard Letifov (techtime)
Date: 2003-05-09 07:32

Message:
Logged In: YES 
user_id=438410

Adrian,



 We experience the same problem with Jboss 3.0.7 and 

Jboss 3.2.0. 

 Please tell me what can I do to help you solve it?

----------------------------------------------------------------------

Comment By: Eduard Letifov (techtime)
Date: 2003-05-09 07:31

Message:
Logged In: YES 
user_id=438410

Adrian,



 We experience the same problem with Jboss 3.0.7 and 

Jboss 3.2.0. 

 Please tell me what can I do to help you solve it?

----------------------------------------------------------------------

Comment By: Eduard Letifov (techtime)
Date: 2003-04-17 08:11

Message:
Logged In: YES 
user_id=438410

Ok, upload failed. I am putting the logs and deployment 

descriptors on our ftp server.



ftp://ftp.vocognition.com/share/jboss/20030416.zip

----------------------------------------------------------------------

Comment By: Eduard Letifov (techtime)
Date: 2003-04-17 08:00

Message:
Logged In: YES 
user_id=438410

Thank you, Adrian. I was thinking about this as well after I've 

seen your answer to some other bug. I will try it today. 



Attached are the server and trace logs. Jboss was just 

started. It has began answering requests with several MDBs 

(you can search for thread signatures 1050489832192, 

1050489845715, 1050489880169, 1050489913857, 

1050489922977, 1050489948331, 1050489976720, 

1050490009021 to see it). 



Then a request came for GetAssignmentPuts (see 

1050490010994, 1050490048469, 1050490082657, 

1050480116032) that should have invoked 

GetAssignmentPutsMDB but never did.

----------------------------------------------------------------------

Comment By: Adrian Brock (ejort)
Date: 2003-04-16 16:41

Message:
Logged In: YES 
user_id=9459

Yes, attach the trace logs. Preferably the relevent part

where it goes offline and a bit before.



Is it possible for you test this with jboss3.0.7?

There have been a number of fixes since 3.0.4

that could be relevent.

One was a fix where the subscription ids were

not being generated uniquely.

Another was a fix to the message consumer where

it could stall if it nacked a message that had passed

its time to live.

Another was to update to the latest concurrent.jar



Regards,

Adrian

----------------------------------------------------------------------

Comment By: Eduard Letifov (techtime)
Date: 2003-04-16 15:56

Message:
Logged In: YES 
user_id=438410

Adrian,



I have applied the patch, and deployed it as the customer's 

but unfortunately it haven't solved the main problem. MDB 

goes "offline" almoust everyday, and sometimes right from 

the start of the server. 



Please, can you at least direct me where to look next?



BTW. I do have the trace logs, should I attach them here?

----------------------------------------------------------------------

Comment By: Adrian Brock (ejort)
Date: 2003-03-14 17:55

Message:
Logged In: YES 
user_id=9459

I've changed the ClientConsumer to not clone

the subscriptions, except when iterating at close to avoid

concurrent modification exceptions.

The semantics of this synchronization are wrong.



There are two other locations in the code doing a

similar thing. The changes for these are still outstanding

so I haven't closed this bug report yet.



SpySession and consumers

JMSDestinationManager and destinations



Regards,

Adrian

----------------------------------------------------------------------

You can respond by visiting: 
https://sourceforge.net/tracker/?func=detail&atid=376685&aid=701731&group_id=22866


-------------------------------------------------------
This sf.net email is sponsored by:ThinkGeek
Welcome to geek heaven.
http://thinkgeek.com/sf
_______________________________________________
JBoss-Development mailing list
[EMAIL PROTECTED]
https://lists.sourceforge.net/lists/listinfo/jboss-development

Reply via email to