[
https://issues.apache.org/jira/browse/ODE-647?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12787424#action_12787424
]
Tammo van Lessen commented on ODE-647:
--------------------------------------
I was able to reproduce this bug on ODE trunk, even with Axis 1.4.1. The
symptoms are slightly different, but the issue still occurs. During a long
debug session last night I managed to isolate it. See my findings below:
The issue was introduced by r790694, which basically fixes a memory leak
with Axis2. Before this change set, Axis2 did not complain about an existing
service with the same name; instead it removed the old ServiceClient and
registered the new one with a new ServiceGroupContext, which was probably
leaking some memory.
The bug itself is a synchronization issue in SoapExternalService caused by a
static thread-local object that serves as a cache for ServiceClient objects.
SoapExternalServices represent external partner services and are called by the
engine to invoke those partners. The invocation itself is deferred and may be
executed concurrently by multiple threads (pooled by an ExecutorService). See
also ODE-382. So theoretically it is possible to have two concurrent workers
calling the same partner with different (or even the same) operations. The
workers should therefore not share the same Axis2 ServiceClient, as this would
cause strange side effects. From what I could gather from the class's history,
this scenario is the reason why a thread-local object was introduced.
Back to the bug: having a thread-local cache basically means that each worker
has its own cache. Unfortunately, we don't route external invocations for a
given service to the same thread; instead we get an arbitrary thread from the
pool.
Assume we have 2 of x threads {t1, t2} and 2 services {s1, s2}.
1. ODE calls s1, the pool chooses t1: (t1, s1).invoke
2. ODE calls s2, the pool chooses t2: (t2, s2).invoke
Now t1 has stored the ServiceClient for s1 in its thread-local cache, and t2
the ServiceClient for s2.
3. ODE calls s1 again, but this time the pool returns t2: (t2, s1).invoke
Now the thread-local cache for t2 returns s2, we run into the if-branch in
SoapExternalService.getServiceClient(), clean up and discard the
s2-ServiceClient (why?) and re-create s1, which is still registered with Axis2,
hence it bitterly complains.
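The three steps above can be replayed with a small toy model. This is only a sketch under stated assumptions: the class SingleSlotCacheDemo, its registry set and its invoke method are hypothetical stand-ins, not the actual ODE or Axis2 code. A service name plays the role of a ServiceClient, and two single-threaded executors pin the calls to t1 and t2 so the order is deterministic.

```java
import java.util.Set;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;

/** Toy model of the failure; none of these names are ODE or Axis2 APIs. */
class SingleSlotCacheDemo {
    // Stand-in for the Axis2 configuration, which rejects duplicate service names.
    static final Set<String> registry = ConcurrentHashMap.newKeySet();
    // Exactly one cached "client" (here just its service name) per thread.
    static final ThreadLocal<String> cached = new ThreadLocal<>();

    static void invoke(String service) {
        String current = cached.get();
        if (service.equals(current)) {
            return; // cache hit: reuse this thread's client
        }
        if (current != null) {
            registry.remove(current); // discard the client cached for another service
        }
        // Re-create the client; fails if another thread still has it registered.
        if (!registry.add(service)) {
            throw new IllegalStateException("Two services cannot have same name: " + service);
        }
        cached.set(service);
    }

    /** Replays steps 1-3 and returns the resulting error message. */
    static String replay() {
        ExecutorService t1 = Executors.newSingleThreadExecutor();
        ExecutorService t2 = Executors.newSingleThreadExecutor();
        try {
            t1.submit(() -> invoke("s1")).get(); // step 1: (t1, s1)
            t2.submit(() -> invoke("s2")).get(); // step 2: (t2, s2)
            t2.submit(() -> invoke("s1")).get(); // step 3: (t2, s1)
            return "no error";
        } catch (ExecutionException e) {
            return e.getCause().getMessage();
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            return "interrupted";
        } finally {
            t1.shutdown();
            t2.shutdown();
        }
    }
}
```

Calling SingleSlotCacheDemo.replay() fails in step 3 because t2 evicts its s2 entry and then tries to register s1, which t1 still holds.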
Now the question is, what would be the best fix?
I see two possibilities:
a) Fix thread-local caching. The current caching strategy is IMO not really
helpful, as it caches exactly one ServiceClient instance per thread, and the
chance that the same thread interacts with this particular service again is
not really high (depending on the pool size, /dev/urandom and the current
weather conditions). An option would be to give each thread a set of cached
clients, so that each thread can hold multiple ServiceClients (in the worst
case one for each SoapExternalService). The services must have different names
so that concurrent workers interacting with the same external service use
different ServiceClient instances. However, we would have to find a way to
release them again at some point in time.
b) Drop the thread-local caching and synchronize access to the ServiceClient
associated with SoapExternalService.
c) you tell me.
As I currently don't know how expensive creation and storage of ServiceClient
instances is, I'd like to know what you think. a), b), or even c)?
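To make options a) and b) easier to compare, here is a minimal sketch of both. Everything in it is an assumption for illustration: FakeClient is a hypothetical stand-in for an Axis2 ServiceClient, and PerThreadClientCache and SharedClient are invented names, not existing ODE classes. The unique name suffix comes from a simple counter, which is just one possible way to keep names distinct across workers.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.concurrent.atomic.AtomicInteger;

/** Hypothetical stand-in for an Axis2 ServiceClient; not a real API. */
class FakeClient {
    final String name;
    FakeClient(String name) { this.name = name; }
    String send(String msg) { return name + " <- " + msg; }
}

/** Option a): each thread keeps one client per external service. */
class PerThreadClientCache {
    private static final AtomicInteger SEQ = new AtomicInteger();
    private static final ThreadLocal<Map<String, FakeClient>> CACHE =
            ThreadLocal.withInitial(HashMap::new);

    static FakeClient clientFor(String service) {
        // The counter suffix gives every client a distinct registered name,
        // so two workers talking to the same service never collide.
        return CACHE.get().computeIfAbsent(service,
                s -> new FakeClient(s + "#" + SEQ.incrementAndGet()));
    }

    /** Drops the calling thread's clients; when to call this is the open question. */
    static void release() {
        CACHE.remove();
    }
}

/** Option b): a single shared client whose use is serialized by a lock. */
class SharedClient {
    private final FakeClient client;

    SharedClient(String service) {
        this.client = new FakeClient(service);
    }

    // Only one worker at a time may use the client.
    synchronized String invoke(String msg) {
        return client.send(msg);
    }
}
```

In this sketch, a) trades memory (up to threads x services clients) for concurrency, while b) keeps one client per service but serializes all invocations of that service.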
Thanks,
Tammo
> Multiple consecutive invocations to a service might incur an axis2.AxisFault
> of "two services cannot have same name".
> -----------------------------------------------------------------------------------------------------------------------
>
> Key: ODE-647
> URL: https://issues.apache.org/jira/browse/ODE-647
> Project: ODE
> Issue Type: Bug
> Affects Versions: 1.3.3
> Environment: ODE1.3.3, Tomcat 5.5.9, Sun JVM 1.5, Windows XP SP3
> Reporter: Wenfeng Zhao
> Assignee: Alexis Midon
> Priority: Critical
> Fix For: 1.3.4
>
> Attachments: ODE-647-outputs.txt, ODE647.zip
>
> Original Estimate: 24h
> Remaining Estimate: 24h
>
> Although versions 1.3.2 and 2.0 are OK, with ODE 1.3.3 it seems that
> multiple consecutive invocations of the same component service might incur the
> following exception:
> ERROR - GeronimoLog.error(108) | Error sending message to Axis2 for ODE mex
> {PartnerRoleMex#hqejbhcnphr4i3dscxlf10 [PID
> {http://scqr.bupt.edu.cn/solution}process_SyntheticBookService_sol2-12]
> calling null.operation1(...)}
> org.apache.axis2.AxisFault: Two services cannot have same name. A service
> with the
> axis_service_for_{http://example.org/writerInfo}writerInfoService#writerInfoPort_hqejbhcnphr4i3dscxlf0r
> name already exists in the system.
> at
> org.apache.axis2.client.ServiceClient.configureServiceClient(ServiceClient.java:172)
> at org.apache.axis2.client.ServiceClient.<init>(ServiceClient.java:139)
> at
> org.apache.ode.axis2.SoapExternalService.getServiceClient(SoapExternalService.java:281)
> at
> org.apache.ode.axis2.SoapExternalService.invoke(SoapExternalService.java:140)
> at
> org.apache.ode.axis2.MessageExchangeContextImpl.invokePartner(MessageExchangeContextImpl.java:52)
> at
> org.apache.ode.bpel.engine.BpelRuntimeContextImpl.invoke(BpelRuntimeContextImpl.java:781)
> at org.apache.ode.bpel.runtime.INVOKE.run(INVOKE.java:100)
> at sun.reflect.GeneratedMethodAccessor58.invoke(Unknown Source)
> at
> sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
> at java.lang.reflect.Method.invoke(Method.java:585)
> at
> org.apache.ode.jacob.vpu.JacobVPU$JacobThreadImpl.run(JacobVPU.java:451)
> at org.apache.ode.jacob.vpu.JacobVPU.execute(JacobVPU.java:139)
> at
> org.apache.ode.bpel.engine.BpelRuntimeContextImpl.execute(BpelRuntimeContextImpl.java:875)
> at
> org.apache.ode.bpel.engine.BpelProcess.handleWorkEvent(BpelProcess.java:438)
> at
> org.apache.ode.bpel.engine.BpelEngineImpl.onScheduledJob(BpelEngineImpl.java:439)
> at
> org.apache.ode.bpel.engine.BpelServerImpl.onScheduledJob(BpelServerImpl.java:441)
> at
> org.apache.ode.scheduler.simple.SimpleScheduler$4$1.call(SimpleScheduler.java:411)
> at
> org.apache.ode.scheduler.simple.SimpleScheduler$4$1.call(SimpleScheduler.java:405)
> at
> org.apache.ode.scheduler.simple.SimpleScheduler.execTransaction(SimpleScheduler.java:218)
> at
> org.apache.ode.scheduler.simple.SimpleScheduler$4.call(SimpleScheduler.java:404)
> at
> org.apache.ode.scheduler.simple.SimpleScheduler$4.call(SimpleScheduler.java:401)
> at java.util.concurrent.FutureTask$Sync.innerRun(FutureTask.java:269)
> at java.util.concurrent.FutureTask.run(FutureTask.java:123)
> at
> java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:650)
> at
> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:675)
> at java.lang.Thread.run(Thread.java:595)
> I also noted that a similar problem was discussed in 2007 (
> https://issues.apache.org/jira/browse/AXIS2-1182?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#action_12476874
> ), but it is not clear to me whether the two problems are related.
--
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.