Bugs item #673249, was opened at 2003-01-23 18:20
You can respond by visiting:
https://sourceforge.net/tracker/?func=detail&atid=376685&aid=673249&group_id=22866
Category: JBossServer
Group: v3.2
Status: Open
Resolution: Accepted
Priority: 5
Submitted By: Matt Cleveland (groovesoftware)
Assigned to: David Jencks (d_jencks)
Summary: 3.2RC1 Oracle XA Problem
Initial Comment:
I'm running into a problem with Oracle XA in 3.2RC1.
I'm running Oracle 9.2.0.1. I know there have been a
bunch of problems with the Oracle XA driver and I know
some of them are supposed to be fixed in 3.2RC1 but I
think this is yet another Oracle problem.
I have a really simple test. I have a client that starts up
N threads. Each thread calls an EJB. The EJB gets an
Oracle connection (from an XA pool) and inserts a
record into the database and then closes the
connection and returns. This all works fine under lower
load, but the log file shows the stack trace below
occasionally under heavy load. In some cases I then
start getting "ORA-01591: lock held by in-doubt
distributed transaction" on Oracle calls after the error.
The client is not receiving this error. In fact it is only
reported as a warning. Still it's pretty scary to see
these flying by in the log file. It leaves you wondering if
the transaction committed or rolled back. From the
stack trace I believe that the transaction rolled back and
this is still an Oracle concurrency bug, but
if that's not the case I wish the log message told me
that.
I've tried with and without TrackConnectionByTx. My
oracle-xa-ds.xml is pasted below the stack trace.
2003-01-21 21:42:09,141 WARN [org.jboss.tm.TransactionImpl] XAException: tx=TransactionImpl:XidImpl [FormatId=257, GlobalId=malt//1809, BranchQual=] errorCode=XAER_RMERR
oracle.jdbc.xa.OracleXAException
    at oracle.jdbc.xa.OracleXAResource.checkError(OracleXAResource.java:1157)
    at oracle.jdbc.xa.client.OracleXAResource.commit(OracleXAResource.java:590)
    at org.jboss.resource.adapter.jdbc.xa.XAManagedConnection.commit(XAManagedConnection.java:140)
    at org.jboss.tm.TransactionImpl.commitResources(TransactionImpl.java:1420)
    at org.jboss.tm.TransactionImpl.commit(TransactionImpl.java:349)
    at org.jboss.ejb.plugins.TxInterceptorCMT.endTransaction(TxInterceptorCMT.java:361)
    at org.jboss.ejb.plugins.TxInterceptorCMT.runWithTransactions(TxInterceptorCMT.java:247)
    at org.jboss.ejb.plugins.TxInterceptorCMT.invoke(TxInterceptorCMT.java:101)
    at org.jboss.ejb.plugins.SecurityInterceptor.invoke(SecurityInterceptor.java:130)
    at org.jboss.ejb.plugins.LogInterceptor.invoke(LogInterceptor.java:204)
    at org.jboss.ejb.plugins.CleanShutdownInterceptor.invoke(CleanShutdownInterceptor.java:265)
    at org.jboss.ejb.plugins.ProxyFactoryFinderInterceptor.invoke(ProxyFactoryFinderInterceptor.java:154)
    at org.jboss.ejb.StatelessSessionContainer.invoke(StatelessSessionContainer.java:303)
    at org.jboss.ejb.Container.invoke(Container.java:680)
    at org.jboss.mx.server.MBeanServerImpl.invoke(MBeanServerImpl.java:549)
    at org.jboss.invocation.jrmp.server.JRMPInvokerHA.invoke(JRMPInvokerHA.java:163)
    at java.lang.reflect.Method.invoke(Native Method)
    at sun.rmi.server.UnicastServerRef.dispatch(UnicastServerRef.java:236)
    at sun.rmi.transport.Transport$1.run(Transport.java:147)
    at java.security.AccessController.doPrivileged(Native Method)
    at sun.rmi.transport.Transport.serviceCall(Transport.java:143)
    at sun.rmi.transport.tcp.TCPTransport.handleMessages(TCPTransport.java:460)
    at sun.rmi.transport.tcp.TCPTransport$ConnectionHandler.run(TCPTransport.java:701)
    at java.lang.Thread.run(Thread.java:479)
oracle-xa-ds
------------------
<?xml version="1.0" encoding="UTF-8"?>
<datasources>
  <xa-datasource>
    <jndi-name>XaOracleDS</jndi-name>
    <track-connection-by-tx>true</track-connection-by-tx>
    <managedconnectionfactory-class>org.jboss.resource.adapter.jdbc.xa.oracle.XAOracleManagedConnectionFactory</managedconnectionfactory-class>
    <!--xa-datasource-class>oracle.jdbc.xa.client.OracleXADataSource</xa-datasource-class-->
    <xa-datasource-property name="URL">jdbc:oracle:thin:@server:port:sid</xa-datasource-property>
    <xa-datasource-property name="User">scott</xa-datasource-property>
    <xa-datasource-property name="Password">tiger</xa-datasource-property>
    <min-pool-size>0</min-pool-size>
    <max-pool-size>50</max-pool-size>
    <blocking-timeout-millis>20000</blocking-timeout-millis>
    <idle-timeout-minutes>15</idle-timeout-minutes>
  </xa-datasource>
</datasources>
Thanks,
Matt Cleveland
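
For anyone trying to reproduce this, the load test described in this comment can be sketched roughly as follows. This is not the actual test client: `LoadDriver`, `run`, and the `Callable` standing in for the remote EJB call are hypothetical names, and the sketch uses present-day Java concurrency utilities rather than the JDK of this report. Its only purpose is to count the failures the client itself observes, so that number can be compared with the warnings in the server log.

```java
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicInteger;

// Hypothetical load driver: N threads, each invoking the same call repeatedly.
class LoadDriver {

    // Runs `call` callsPerThread times on each of `threads` threads and
    // returns how many invocations threw an exception on the client side.
    static int run(int threads, int callsPerThread, Callable<?> call) {
        AtomicInteger failures = new AtomicInteger();
        ExecutorService pool = Executors.newFixedThreadPool(threads);
        for (int t = 0; t < threads; t++) {
            pool.execute(() -> {
                for (int i = 0; i < callsPerThread; i++) {
                    try {
                        call.call(); // stand-in for the remote EJB invocation
                    } catch (Exception e) {
                        failures.incrementAndGet();
                    }
                }
            });
        }
        pool.shutdown();
        try {
            if (!pool.awaitTermination(1, TimeUnit.HOURS)) {
                throw new IllegalStateException("load test did not finish in time");
            }
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException("interrupted while waiting for workers", e);
        }
        return failures.get();
    }
}
```

In a real run the `Callable` would look up the session bean and call its insert method. A result of zero while the server log shows XAER_RMERR warnings would match the symptom reported here.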
----------------------------------------------------------------------
>Comment By: David Jencks (d_jencks)
Date: 2003-01-28 20:08
Message:
Logged In: YES
user_id=60525
You should have been seeing a TransactionRolledbackException or (if you were in-VM
using a local interface) a TransactionRolledbackLocalException. The EJB container is
supposed to do a thorough job of insulating you from dealing with low-level
XAExceptions.
----------------------------------------------------------------------
Comment By: Matt Cleveland (groovesoftware)
Date: 2003-01-28 19:47
Message:
Logged In: YES
user_id=85088
David, as far as I can see your test is the same, but I do not
have a very good understanding of when/how the Oracle
exception was failing compared to how your test fails. What
exception will the client see? I was seeing an exception in
the client, but it wasn't the RMERR exception so I didn't
consider it to be the same problem, but hadn't had time to
investigate further. I don't recall now what exception I was
seeing in the client. I didn't correlate it with the RMERR
because it seemed to occur too long after the RMERR to be
the same exception. I was seeing the exception in the client
after several RMERR exceptions scrolled by in the server
log. But thinking about it now with the log files scrolling by
and with network and IO lag etc. the timing could have been
off and they could have been the same, but I never looked
very closely because they were different exceptions and
different messages. I have been expecting an RMERR to
appear on the client.
----------------------------------------------------------------------
Comment By: David Jencks (d_jencks)
Date: 2003-01-28 19:33
Message:
Logged In: YES
user_id=60525
I implemented pluggable XAException handling. It doesn't break anything, but I don't
know that it really works either. If you (Matt or Igor) have any ideas on how to test
it (for Oracle), please do so.
I have no idea what to do about the lack of exception propagation since the testcase I
wrote IS propagating the exception correctly. Matt, can you see any real difference
between the testcase that works and your code that doesn't? Without more ideas I
will have to wait until I can get Oracle installed and try to reproduce the problem by
limiting the number of sessions.
----------------------------------------------------------------------
Comment By: Matt Cleveland (groovesoftware)
Date: 2003-01-28 19:25
Message:
Logged In: YES
user_id=85088
Increasing max sessions and max processes seems to have
fixed the problem. I am running with 100 threads for quite
some time now and I have not encountered the error.
Igor, my apologies for making you work so hard on
something that turns out to be Oracle configuration. Is there
any way to get this error reported so that users will be able to
self-diagnose this? I like your idea of pluggable exception
formatters. Perhaps XAOracleManagedConnectionFactory
could intercept the exception, do some better logging and
rethrow it, or maybe that wouldn't work. I'm not too familiar
with the code.
David, this leaves the problem with the exception being
propagated to the client in an unknown state. Now that the
problem is fixed I doubt I can get the database parameters
changed back which means I can't reproduce it anymore.
Any ideas on how to proceed?
----------------------------------------------------------------------
Comment By: Matt Cleveland (groovesoftware)
Date: 2003-01-28 17:33
Message:
Logged In: YES
user_id=85088
Using "select count(*) from v$session" I have seen the test
fail with session counts in the high 120 range, but I have
confirmed that the number of sessions can get at least as
high as 145. Of course I can't confirm that the number of
sessions didn't spike in between my queries to check it
(there are other users on the system), but I was frantically
requerying every couple of seconds and the number seemed
to move in a predictable, incremental manner.
One thing that's odd: when I was running the test, someone
else received a "max processes exceeded" error when running
SQL*Plus. Perhaps the error being reported is not exactly
correct and the problem has something to do with this. I will
investigate this angle to the best of my ability.
I'm still working on that init.ora file.
----------------------------------------------------------------------
Comment By: Igor Fedorenko (igorfie)
Date: 2003-01-28 16:33
Message:
Logged In: YES
user_id=232950
That's weird. You can monitor the total number of Oracle sessions
using "select count(*) from v$session". The total matters because Oracle starts
a number of internal sessions (I do not know if this number
can change or not).
Oracle used to limit the number of distributed transactions, but
this limitation was removed in 9.2 as far as I know. Check
the documentation at http://technet.oracle.com.
Also, can you send me the contents of your init.ora file? I want to
compare it with mine.
----------------------------------------------------------------------
Comment By: Matt Cleveland (groovesoftware)
Date: 2003-01-28 16:10
Message:
Logged In: YES
user_id=85088
I can rule out the simple matter of exceeding max sessions
and I can also rule out JBoss leaking sessions. Here's how I
know.
I run with a pool size of 64 connections. If I run with a client
program executing 55 threads and monitor the number of
sessions in SQL*Plus then as the test runs the number of
sessions eventually reaches 56, that's 55 for JBoss and 1 for
SQL*Plus. I eventually get the exception.
Again running with a pool size of 64 connections. If I run with
a client program executing 50 threads and monitor the
number of sessions in SQL*Plus then as the test runs the
number of sessions eventually reaches 51, that's 50 for JBoss
and 1 for SQL*Plus. I also eventually get the exception in
this case, but the number of sessions never exceeds 51.
So, I know my max sessions is at least 56 but I get the error
with only 51 open. So, JBoss is not leaking sessions, but I
am not hitting max sessions either. I can't rule out an Oracle
bug where it thinks it hit max sessions but didn't, but I can't
prove it either.
Any ideas?
Is there any other variable besides SESSIONS that may be
involved? Is there perhaps a limit on XA transactions or
something like that?
I know you're not Oracle support, and I appreciate your help.
I just want to completely rule out a problem on the JBoss end
before going to Oracle and I also want to have the correct
information to report to Oracle if I do need to go to them.
Thanks,
Matt Cleveland
----------------------------------------------------------------------
Comment By: Matt Cleveland (groovesoftware)
Date: 2003-01-28 15:29
Message:
Logged In: YES
user_id=85088
I will confirm, but I *THINK* I have enough sessions. I have
seen a different error in the past when running out of
sessions. I will do some testing today and see what I can
find.
----------------------------------------------------------------------
Comment By: Igor Fedorenko (igorfie)
Date: 2003-01-28 15:17
Message:
Logged In: YES
user_id=232950
Correction -- number of SCOTT's sessions before running the
test is supposed to be zero. Sorry for the confusion ;-)
----------------------------------------------------------------------
Comment By: Igor Fedorenko (igorfie)
Date: 2003-01-28 14:45
Message:
Logged In: YES
user_id=232950
Matt,
You are running out of oracle database connections. Here is
what "oracle error: 18" means:
oahu:$ oerr ora 18
00018, 00000, "maximum number of sessions exceeded"
// *Cause: All session state objects are in use.
// *Action: Increase the value of the SESSIONS initialization parameter.
If you believe that JBoss is leaking sessions, please provide a
test case that shows this problem. A possible test procedure
would look like this:
1. Make sure nobody else is using the same database
schema SCOTT
2. Using SQL*Plus connect to Oracle as SYSTEM and
execute "select count(*) from v$session where
username='SCOTT'" to get initial number of sessions (it's
about 9 sessions in idle database)
3. Start JBoss, run your test
4. Using SQL*Plus connect to Oracle as SYSTEM and
execute "select count(*) from v$session where
username='SCOTT'" to get number of sessions after running
the test
A bug exists if the difference between the number of sessions
before and after the test is greater than the maximum number of
connections in the pool.
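
The before/after comparison in this procedure can be sketched in Java as below. The class and method names (`SessionLeakCheck`, `countSessions`, `looksLikeLeak`) are made up for illustration, and `countSessions` assumes a JDBC connection opened as a user allowed to read v$session (SYSTEM in the steps above). Treat it as a rough sketch, not a tested diagnostic.

```java
import java.sql.Connection;
import java.sql.PreparedStatement;
import java.sql.ResultSet;
import java.sql.SQLException;

// Hypothetical helper for the session-leak check described above.
class SessionLeakCheck {

    // Counts the sessions owned by `username`, using the same query as the
    // manual SQL*Plus steps but with a bind variable.
    static int countSessions(Connection systemConnection, String username)
            throws SQLException {
        String sql = "select count(*) from v$session where username = ?";
        try (PreparedStatement ps = systemConnection.prepareStatement(sql)) {
            ps.setString(1, username);
            try (ResultSet rs = ps.executeQuery()) {
                rs.next(); // count(*) always returns exactly one row
                return rs.getInt(1);
            }
        }
    }

    // A leak is suspected only if the session count grew by more than the
    // pool is allowed to hold.
    static boolean looksLikeLeak(int sessionsBefore, int sessionsAfter, int maxPoolSize) {
        return sessionsAfter - sessionsBefore > maxPoolSize;
    }
}
```

With the max-pool-size of 50 from the posted oracle-xa-ds.xml, growth of up to 50 sessions is expected under load and only growth beyond that points at a leak.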
Hope this helps.
On a related topic. It'd be nice to allow pluggable
XAException formatters (for vendor specific error messages).
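
One way such a pluggable formatter could look is sketched below. None of these types exist in JBoss as far as this thread shows: `XAExceptionFormatter`, `GenericXAExceptionFormatter`, and `VendorXAExceptionFormatter` are hypothetical names. The vendor code is supplied through an extractor function so the sketch compiles without the Oracle driver on the classpath; an Oracle-specific plug-in would presumably read the code from the driver's own exception class instead.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.function.ToIntFunction;
import javax.transaction.xa.XAException;

// Hypothetical plug-in point; not an actual JBoss interface.
interface XAExceptionFormatter {
    String format(XAException e);
}

// Generic formatter: prints the numeric XA error code with its symbolic name.
class GenericXAExceptionFormatter implements XAExceptionFormatter {
    public String format(XAException e) {
        return "xa error: " + e.errorCode + " (" + name(e.errorCode) + ")";
    }

    // Maps a few of the standard XAER_* codes to their names.
    static String name(int code) {
        switch (code) {
            case XAException.XAER_RMERR:  return "XAER_RMERR";
            case XAException.XAER_NOTA:   return "XAER_NOTA";
            case XAException.XAER_INVAL:  return "XAER_INVAL";
            case XAException.XAER_PROTO:  return "XAER_PROTO";
            case XAException.XAER_RMFAIL: return "XAER_RMFAIL";
            default:                      return "unknown";
        }
    }
}

// Vendor-aware formatter: appends a vendor error code obtained through a
// caller-supplied extractor, plus a description when one is known.
class VendorXAExceptionFormatter extends GenericXAExceptionFormatter {
    private final ToIntFunction<XAException> vendorCode;
    private final Map<Integer, String> vendorMessages = new HashMap<>();

    VendorXAExceptionFormatter(ToIntFunction<XAException> vendorCode) {
        this.vendorCode = vendorCode;
        // Message text as printed by "oerr ora 18" in this comment.
        vendorMessages.put(18, "maximum number of sessions exceeded");
    }

    @Override
    public String format(XAException e) {
        int code = vendorCode.applyAsInt(e);
        String text = vendorMessages.getOrDefault(code, "no description available");
        return super.format(e) + "; vendor error: " + code + " (" + text + ")";
    }
}
```

The transaction manager would call `format` where it currently logs only the XA error code, so that a message such as ORA-00018 is visible without a patched jar.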
----------------------------------------------------------------------
Comment By: Matt Cleveland (groovesoftware)
Date: 2003-01-28 00:12
Message:
Logged In: YES
user_id=85088
Per Igor Fedorenko, I tested with a special jboss-transaction.jar
that would output Oracle-specific debug
messages. When re-running my test I received the following
error report in the server log file.
2003-01-27 23:44:59,030 WARN [org.jboss.tm.TransactionImpl] xa error: -3 (A resource manager error has occured in the transaction branch.); oracle error: 18; oracle sql error: 0;
oracle.jdbc.xa.OracleXAException
    at oracle.jdbc.xa.OracleXAResource.checkError(OracleXAResource.java:1157)
    at oracle.jdbc.xa.client.OracleXAResource.commit(OracleXAResource.java:590)
    at org.jboss.resource.adapter.jdbc.xa.XAManagedConnection.commit(XAManagedConnection.java:140)
    at org.jboss.tm.TransactionImpl.commitResources(TransactionImpl.java:1473)
    at org.jboss.tm.TransactionImpl.commit(TransactionImpl.java:352)
    at org.jboss.ejb.plugins.TxInterceptorCMT.endTransaction(TxInterceptorCMT.java:361)
    at org.jboss.ejb.plugins.TxInterceptorCMT.runWithTransactions(TxInterceptorCMT.java:247)
    at org.jboss.ejb.plugins.TxInterceptorCMT.invoke(TxInterceptorCMT.java:101)
    at org.jboss.ejb.plugins.SecurityInterceptor.invoke(SecurityInterceptor.java:130)
    at org.jboss.ejb.plugins.LogInterceptor.invoke(LogInterceptor.java:204)
    at org.jboss.ejb.plugins.CleanShutdownInterceptor.invoke(CleanShutdownInterceptor.java:265)
    at org.jboss.ejb.plugins.ProxyFactoryFinderInterceptor.invoke(ProxyFactoryFinderInterceptor.java:154)
    at org.jboss.ejb.StatelessSessionContainer.invoke(StatelessSessionContainer.java:303)
    at org.jboss.ejb.Container.invoke(Container.java:680)
    at org.jboss.mx.server.MBeanServerImpl.invoke(MBeanServerImpl.java:549)
    at org.jboss.invocation.jrmp.server.JRMPInvokerHA.invoke(JRMPInvokerHA.java:163)
    at java.lang.reflect.Method.invoke(Native Method)
    at sun.rmi.server.UnicastServerRef.dispatch(UnicastServerRef.java:236)
    at sun.rmi.transport.Transport$1.run(Transport.java:147)
    at java.security.AccessController.doPrivileged(Native Method)
    at sun.rmi.transport.Transport.serviceCall(Transport.java:143)
    at sun.rmi.transport.tcp.TCPTransport.handleMessages(TCPTransport.java:460)
    at sun.rmi.transport.tcp.TCPTransport$ConnectionHandler.run(TCPTransport.java:701)
    at java.lang.Thread.run(Thread.java:479)
2003-01-27 23:44:59,035 WARN [org.jboss.tm.TransactionImpl] XAException: tx=TransactionImpl:XidImpl [FormatId=257, GlobalId=redhook.synxis.com//511, BranchQual=] errorCode=XAER_RMERR
oracle.jdbc.xa.OracleXAException
    at oracle.jdbc.xa.OracleXAResource.checkError(OracleXAResource.java:1157)
    at oracle.jdbc.xa.client.OracleXAResource.commit(OracleXAResource.java:590)
    at org.jboss.resource.adapter.jdbc.xa.XAManagedConnection.commit(XAManagedConnection.java:140)
    at org.jboss.tm.TransactionImpl.commitResources(TransactionImpl.java:1473)
    at org.jboss.tm.TransactionImpl.commit(TransactionImpl.java:352)
    at org.jboss.ejb.plugins.TxInterceptorCMT.endTransaction(TxInterceptorCMT.java:361)
    at org.jboss.ejb.plugins.TxInterceptorCMT.runWithTransactions(TxInterceptorCMT.java:247)
    at org.jboss.ejb.plugins.TxInterceptorCMT.invoke(TxInterceptorCMT.java:101)
    at org.jboss.ejb.plugins.SecurityInterceptor.invoke(SecurityInterceptor.java:130)
    at org.jboss.ejb.plugins.LogInterceptor.invoke(LogInterceptor.java:204)
    at org.jboss.ejb.plugins.CleanShutdownInterceptor.invoke(CleanShutdownInterceptor.java:265)
    at org.jboss.ejb.plugins.ProxyFactoryFinderInterceptor.invoke(ProxyFactoryFinderInterceptor.java:154)
    at org.jboss.ejb.StatelessSessionContainer.invoke(StatelessSessionContainer.java:303)
    at org.jboss.ejb.Container.invoke(Container.java:680)
    at org.jboss.mx.server.MBeanServerImpl.invoke(MBeanServerImpl.java:549)
    at org.jboss.invocation.jrmp.server.JRMPInvokerHA.invoke(JRMPInvokerHA.java:163)
    at java.lang.reflect.Method.invoke(Native Method)
    at sun.rmi.server.UnicastServerRef.dispatch(UnicastServerRef.java:236)
    at sun.rmi.transport.Transport$1.run(Transport.java:147)
    at java.security.AccessController.doPrivileged(Native Method)
    at sun.rmi.transport.Transport.serviceCall(Transport.java:143)
    at sun.rmi.transport.tcp.TCPTransport.handleMessages(TCPTransport.java:460)
    at sun.rmi.transport.tcp.TCPTransport$ConnectionHandler.run(TCPTransport.java:701)
    at java.lang.Thread.run(Thread.java:479)
----------------------------------------------------------------------
Comment By: Matt Cleveland (groovesoftware)
Date: 2003-01-27 16:35
Message:
Logged In: YES
user_id=85088
Oops, I spoke too soon. The error is NOT being propagated
to the client using the latest CVS for 3.2. Looks like we
need to look further. In my test I received multiple RMERR
exceptions in the server log file, but none were reported to
the test client.
----------------------------------------------------------------------
Comment By: Matt Cleveland (groovesoftware)
Date: 2003-01-27 16:25
Message:
Logged In: YES
user_id=85088
The problem with the error being propagated to the client is
fixed.
I'm not convinced of your answer regarding the RMERR.
First of all, the in-doubt tx error comes after the RMERR,
which makes sense. If JBoss failed to commit or rollback
any transaction for any reason then it would become in-doubt
because Oracle would not know whether it should be
committed or rolled back, right? Second, this RMERR
exception looks very much like the type of exceptions you
will get using JBoss with Oracle if you turn off
TrackConnectionByTX or do not use the
XAOracleManagedConnectionFactory. Now, I'm not saying
it's not an Oracle oddity or a behavior that differs from other
XA drivers, but those are the kinds of things that
TrackConnectionByTX and
XAOracleManagedConnectionFactory are designed to fix. I'm
hoping someone can do the same with this one or at least
rule out the possibility of doing the same.
Just to keep this bug report up to date with some activity in
the dev list, here are the details of how to reproduce the bug.
> Ok, it took a while, but I can confirm that your test produces the error on
> JBoss 3.2 from CVS with clustering turned off. Two things you might be
> missing are 1) increasing the thread count in the client to 100 makes it
> more likely to happen more quickly and 2) the test client does not receive
> the error. The error ONLY shows up in the server log file (and stdout).
>
> We are using Oracle 9.2.0.1.0. The JDBC driver version is 9.2.0.0.0 as
> reported in the manifest.
>
> Just to make sure I'm not missing something here are all the boring details
> of what I did.
>
> 1. Got the latest from CVS
> 2. ./build.sh clobber
> 3. built JBoss with integrated Tomcat 4.1.18
> 4. Tweaked TestBean as follows to make it work in my build environment.
> None of these changes should matter to the test.
> - changed bean name from test/Test to Test
> - changed the view-type to remote because our build doesn't do
> <localinterface> for xdoclet
> - changed the data source name. Yours was XAOracleDS and mine is
> XaOracleDS
> - changed the name of your remote interface to TestRemoteIF to match
> our naming conventions
> 5. made corresponding changes to TestMtClient and increased the number of
> threads to 100
> 6. built into an EAR
> 7. added my oracle-xa-ds.xml to the default configuration
> 8. turned on Pad in the XidFactory for the transaction manager in the
> default configuration
> 9. deployed my EAR to the default configuration
> 10. started the default configuration
> 11. ran TestMtClient long enough to get the error. The error shows up in
> the server log file and stdout.
----------------------------------------------------------------------
Comment By: David Jencks (d_jencks)
Date: 2003-01-27 05:53
Message:
Logged In: YES
user_id=60525
I've fixed the problem with no error showing up to the client in Branch_3_2 cvs.
Please check that the error is being propagated as you expect to the client.
I think the original RMERR may well be an Oracle problem since the stack trace
indicates that one-phase commit is being called. In this case any in-doubt transaction
can be in doubt only because Oracle has lost track of its own internal state. (At
least, since JBoss is not calling prepare, I can't see how JBoss has anything to do
with an in-doubt tx.)
You can check the error propagation by running this test:
cd testsuite
./build.sh one-test -Dtest=org.jboss.test.jca.test.XAExceptionUnitTestCase
Please report back your results, if satisfactory I will port to 3.0 and 4 if necessary.
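
The one-phase commit argument above can be illustrated with a small sketch against the standard javax.transaction.xa interfaces. This is not the code in org.jboss.tm.TransactionImpl. `CommitSketch`, `recording`, and `trace` are hypothetical names, and the two-phase branch ignores prepare votes and rollback handling. It only shows that with a single enlisted resource the transaction manager goes straight to `commit(xid, true)` and never calls `prepare`.

```java
import java.lang.reflect.Proxy;
import java.util.ArrayList;
import java.util.List;
import javax.transaction.xa.XAException;
import javax.transaction.xa.XAResource;
import javax.transaction.xa.Xid;

// Simplified illustration of the one-phase commit optimization.
class CommitSketch {

    // One enlisted resource: commit in a single phase, no prepare.
    // Several resources: prepare all, then commit all (votes ignored here).
    static void commit(Xid xid, List<XAResource> resources) throws XAException {
        if (resources.size() == 1) {
            resources.get(0).commit(xid, true);
            return;
        }
        for (XAResource r : resources) {
            r.prepare(xid);
        }
        for (XAResource r : resources) {
            r.commit(xid, false);
        }
    }

    // Fake XAResource that records the name of every method called on it.
    static XAResource recording(List<String> calls) {
        return (XAResource) Proxy.newProxyInstance(
                CommitSketch.class.getClassLoader(),
                new Class<?>[] { XAResource.class },
                (proxy, method, args) -> {
                    calls.add(method.getName());
                    Class<?> type = method.getReturnType();
                    if (type == int.class) return XAResource.XA_OK;
                    if (type == boolean.class) return false;
                    return null;
                });
    }

    // Commits against `resourceCount` fake resources and returns the calls made.
    static List<String> trace(int resourceCount) {
        List<String> calls = new ArrayList<>();
        List<XAResource> resources = new ArrayList<>();
        for (int i = 0; i < resourceCount; i++) {
            resources.add(recording(calls));
        }
        try {
            commit(null, resources); // the fake resources ignore the Xid
        } catch (XAException e) {
            throw new IllegalStateException(e);
        }
        return calls;
    }
}
```

With one resource the recorded calls contain only `commit`, which is the situation in the stack trace from this report: the failure comes from OracleXAResource.commit without any preceding prepare from the transaction manager.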
----------------------------------------------------------------------