Of course the memory allocated to a message will be freed once the message has been processed. That is why it's important to set the OUT_ONLY property: if it is not set correctly, Synapse will keep the message context (with the payload) in a callback table to correlate it with a future response (which in your case never comes). There is probably room for improvement here in Synapse:

- The VFS transport should trigger an error if there is a mismatch between the message exchange pattern and the transport configuration of the service (the transport.vfs.* parameters).
- Synapse should start issuing warnings when the number of entries in the callback table reaches a certain threshold.
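Putting the pieces from later in this thread together, a one-way streaming VFS proxy would look roughly like this. This is a sketch only: the proxy name, file URI, content type and endpoint address are placeholders, while the transport.vfs.Streaming parameter and the two property mediators are the ones quoted further down in the thread.

```xml
<proxy name="FileForwarder" transports="vfs">
  <!-- placeholder VFS polling location and content type -->
  <parameter name="transport.vfs.FileURI">file:///var/spool/in</parameter>
  <parameter name="transport.vfs.ContentType">text/plain</parameter>
  <!-- stream content instead of building it in memory -->
  <parameter name="transport.vfs.Streaming">true</parameter>
  <target>
    <inSequence>
      <!-- run the outgoing transport in the calling thread -->
      <property action="remove" name="transportNonBlocking" scope="axis2"/>
      <!-- one-way exchange: no entry is kept in the callback table -->
      <property action="set" name="OUT_ONLY" value="true"/>
      <send>
        <endpoint>
          <address uri="http://example.org/services/Target"/>
        </endpoint>
      </send>
    </inSequence>
  </target>
</proxy>
```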
Andreas

On Fri, Mar 20, 2009 at 01:41, Kim Horn <[email protected]> wrote:
> Not really; I cannot see why memory should permanently grow when I pass the
> same file repeatedly through VFS. In theory this means VFS will always
> consume all the available memory given enough time and file iterations.
> Therefore VFS cannot be used in a production system. This is the definition
> of a memory leak. I would expect SOME overhead on top of the file size, but
> I would assume the memory no longer required would be reclaimed. I would
> also assume the overhead was not 10 times the file size; that seems
> excessive.
>
> Yes, I understand the streaming approach should in theory use a fixed and
> much smaller amount of memory, but I haven't tested that yet either. Given
> the memory leak above, there is no reason it should not grow permanently
> as well, just at a smaller rate.
>
> Thanks
> Kim
>
> -----Original Message-----
> From: Andreas Veithen [mailto:[email protected]]
> Sent: Friday, 20 March 2009 10:52 AM
> To: [email protected]
> Subject: Re: VFS - Synapse Memory Leak
>
> If N is the size of the file, the memory consumption caused by the
> transport is O(N) with transport.vfs.Streaming=false and O(1) with
> transport.vfs.Streaming=true. The getTextAsStream and writeTextTo
> methods in org.apache.axis2.format.ElementHelper are there to allow
> you to implement your mediator with O(1) memory usage, so that the
> overall memory consumption remains O(1). Does that answer your
> question?
>
> Andreas
>
> On Thu, Mar 19, 2009 at 23:33, Kim Horn <[email protected]> wrote:
>> It's the same Synapse.xml as specified originally and the same trace. If
>> you are using Nabble you can see this; in case you lost the prior emails
>> I can post them again.
>>
>> I must admit I did not set those extra parameters you mentioned, but I
>> don't see why you should have to set a parameter to stop a memory leak. I
>> guessed these parameters would just reduce the large amount of memory it
>> appears to be using, e.g.
10 times the file size, via streaming? Why are there 10
>> copies of the data floating around? Lots of buffering. This issue suggests
>> to me that any use of VFS will eventually kill the server. Even with
>> smaller files it will eventually use all available memory. I guess I did
>> not understand the actual reason for this issue from the prior discussion.
>>
>> I will try your extra parameters today though.
>>
>> Thanks
>> Kim
>>
>> -----Original Message-----
>> From: Andreas Veithen [mailto:[email protected]]
>> Sent: Thursday, 19 March 2009 5:48 PM
>> To: [email protected]
>> Subject: Re: VFS - Synapse Memory Leak
>>
>> Kim,
>>
>> Can you post your current synapse.xml as well as the stack trace you get now?
>>
>> Andreas
>>
>> On Thu, Mar 19, 2009 at 07:20, kimhorn <[email protected]> wrote:
>>>
>>> Using the last stable build from 15 March 2009 I still get exactly the
>>> same behaviour as originally described with the above script. VFS still
>>> just dies. Would your fixes be in this?
>>>
>>> Andreas Veithen-2 wrote:
>>>>
>>>> I committed the code and it will be available in the next WS-Commons
>>>> transport build. The methods are located in
>>>> org.apache.axis2.format.ElementHelper in the axis2-transport-base
>>>> module.
>>>>
>>>> Andreas
>>>>
>>>> On Thu, Mar 12, 2009 at 00:06, Kim Horn <[email protected]> wrote:
>>>>> Hello Andreas,
>>>>> This is great and really helps; I have not had time to try it out but
>>>>> will soon.
>>>>>
>>>>> Contributing the java.io.Reader code would be a great help, but it will
>>>>> take me a while to get up to speed to do the Synapse iterator.
>>>>>
>>>>> In the short term I am going to use a brute-force approach that is now
>>>>> feasible given the memory issue is resolved. Just thought of this one
>>>>> today: use a VFS proxy to FTP the file locally (so streaming helps
>>>>> here), then a POJOCommand on <out> to split the file into another
>>>>> directory, streaming in and out.
Another independent VFS proxy watches that directory and submits
>>>>> each file to the Web service. Hopefully memory will be fine. Overloading
>>>>> the destination may still be an issue?
>>>>>
>>>>> Kim
>>>>>
>>>>> -----Original Message-----
>>>>> From: Andreas Veithen [mailto:[email protected]]
>>>>> Sent: Monday, 9 March 2009 10:55 PM
>>>>> To: [email protected]
>>>>> Subject: Re: VFS - Synapse Memory Leak
>>>>>
>>>>> The changes I made in the VFS transport and the message builders for
>>>>> text/plain and application/octet-stream certainly don't provide an
>>>>> out-of-the-box solution for your use case, but they are the
>>>>> prerequisite.
>>>>>
>>>>> Concerning your first proposed solution (let the VFS transport write the
>>>>> content to a temporary file): I don't like this because it would create
>>>>> a tight coupling between the VFS transport and the mediator. A design
>>>>> goal should be that the solution still works if the file comes
>>>>> from another source, e.g. an attachment in an MTOM or SwA message.
>>>>>
>>>>> I think that an all-Synapse solution (2 or 3) should be possible, but
>>>>> this will require development of a custom mediator. This mediator
>>>>> would read the content, split it up (storing the chunks in memory or
>>>>> on disk) and execute a sub-sequence for each chunk. The execution of
>>>>> the sub-sequence would happen synchronously, both to limit the
>>>>> memory/disk space consumption (to the maximum chunk size) and to avoid
>>>>> flooding the destination service.
>>>>>
>>>>> Note that it is probably not possible to implement the mediator
>>>>> using a script because of the problematic String handling. Also,
>>>>> Spring, POJO and class mediators don't support sub-sequences (I
>>>>> think). Therefore it should be implemented as a full-featured Java
>>>>> mediator, probably taking the existing iterate mediator as a template.
>>>>> I can contribute the required code to get the text content in the form
>>>>> of a java.io.Reader.
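The mediator sketched above — read the content as a java.io.Reader, split it into bounded chunks, process each chunk synchronously — boils down to logic like the following. This is an illustration outside any Synapse or Axis2 API: the ChunkSplitter class and its method are invented for the example, and records are assumed to be separated by line breaks, as in Kim's EDI files.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.Reader;
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;

// Hypothetical sketch of the splitting logic a custom chunking mediator
// would need: read line-oriented records from a Reader and group them into
// chunks of at most maxChunkChars, never splitting a record in half.
public class ChunkSplitter {

    public static List<String> split(Reader source, int maxChunkChars)
            throws IOException {
        List<String> chunks = new ArrayList<>();
        BufferedReader in = new BufferedReader(source);
        StringBuilder current = new StringBuilder();
        String line;
        while ((line = in.readLine()) != null) {
            // Flush the current chunk if adding this record would exceed the limit.
            if (current.length() > 0
                    && current.length() + line.length() + 1 > maxChunkChars) {
                chunks.add(current.toString());
                current.setLength(0);
            }
            current.append(line).append('\n');
        }
        if (current.length() > 0) {
            chunks.add(current.toString());
        }
        return chunks;
    }

    public static void main(String[] args) throws IOException {
        // Four 4-character records with a 10-character chunk limit
        // yield two chunks of two records each.
        List<String> chunks =
                split(new StringReader("rec1\nrec2\nrec3\nrec4\n"), 10);
        System.out.println(chunks.size()); // prints 2
    }
}
```

In a real mediator, each returned chunk would then be handed to the sub-sequence synchronously, so at most one chunk is held in memory at a time.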
>>>>>
>>>>> Regards,
>>>>>
>>>>> Andreas
>>>>>
>>>>> On Mon, Mar 9, 2009 at 03:05, kimhorn <[email protected]> wrote:
>>>>>>
>>>>>> Although this is a good feature, it may not solve the actual problem.
>>>>>> The first issue on my list was the memory leak.
>>>>>> However, the real problem is that once I get these massive files I have
>>>>>> to send them to a Web service that can only take them in small chunks
>>>>>> (about 14MB). Streaming a file straight out would just kill the
>>>>>> destination Web service: it would get the memory error. The text
>>>>>> document can be split apart easily, as it has independent records on
>>>>>> each line, separated by <CR><LF>.
>>>>>>
>>>>>> In an earlier post, which was not responded to, I mentioned:
>>>>>>
>>>>>> "Otherwise, for large EDI files, a VFS iterator mediator that streams
>>>>>> through the input file and outputs smaller chunks for processing in
>>>>>> Synapse may be a solution?"
>>>>>>
>>>>>> So, having mentioned a few solutions in prior posts, the options now are:
>>>>>>
>>>>>> 1) VFS writes straight to a temporary file, then a Java mediator
>>>>>> processes the file by splitting it into many smaller files. These files
>>>>>> then trigger another VFS proxy that submits them to the final Web
>>>>>> service. The drawback is that it uses the file system (not so bad).
>>>>>> 2) A Java mediator takes the <text> payload and splits it up by wrapping
>>>>>> it into many XML <data> elements that can then be acted on by a Synapse
>>>>>> iterator. So replace the text message with many smaller XML elements.
>>>>>> The problem is that this loads the whole message into memory.
>>>>>> 3) Create another iterator in Synapse that works on a regular expression
>>>>>> (to split the text data) or actually uses a for-loop approach to chop
>>>>>> the file into chunks based on the loop index value. E.g. index = 23
>>>>>> means a 14K chunk, 23 chunks into the data.
>>>>>> 4) Using the approach proposed now: just submit the file straight
>>>>>> (stream it) to another Web service that chops it up. It may return an
>>>>>> XML document with many sub-elements that allows the standard iterator to
>>>>>> work. Similar to (2), but using another service rather than Java to
>>>>>> split the document.
>>>>>> 5) Using the approach proposed now: just submit the file straight
>>>>>> (stream it) to another Web service that chops it up but calls a Synapse
>>>>>> proxy with each small packet of data, which then forwards it to the
>>>>>> final Web service. So the Web service iterates across the data, and not
>>>>>> Synapse.
>>>>>>
>>>>>> Other solutions replace Synapse with a stand-alone Java program at the
>>>>>> front end.
>>>>>>
>>>>>> Another issue here is throttling: splitting the file is one issue, but
>>>>>> submitting hundreds of calls in parallel to the destination service
>>>>>> would result in timeouts... so throttling needs to be worked in as well.
>>>>>>
>>>>>> Ruwan Linton wrote:
>>>>>>>
>>>>>>> I agree and can understand the time factor, and also +1 for reusing
>>>>>>> stuff rather than trying to reinvent the wheel :-)
>>>>>>>
>>>>>>> Thanks,
>>>>>>> Ruwan
>>>>>>>
>>>>>>> On Sun, Mar 8, 2009 at 4:08 PM, Andreas Veithen <[email protected]> wrote:
>>>>>>>
>>>>>>>> Ruwan,
>>>>>>>>
>>>>>>>> It's not a question of possibility, it is a question of available
>>>>>>>> time :-)
>>>>>>>>
>>>>>>>> Also note that some of the features that we might want to implement
>>>>>>>> have some similarities with what is done for attachments in Axiom
>>>>>>>> (except that an attachment is only available once, while a file over
>>>>>>>> VFS can be read several times). I think there is also some existing
>>>>>>>> code in Axis2 that might be useful. We should not reimplement these
>>>>>>>> things but try to make the existing code reusable.
This however is only realistic for the next release after 1.3.
>>>>>>>>
>>>>>>>> Andreas
>>>>>>>>
>>>>>>>> On Sun, Mar 8, 2009 at 03:47, Ruwan Linton <[email protected]> wrote:
>>>>>>>> > Andreas,
>>>>>>>> >
>>>>>>>> > Can we have the caching at the file system level as a property, to
>>>>>>>> > support multiple layers touching the full message? And is it
>>>>>>>> > possible to make it specify a threshold for streaming? For example,
>>>>>>>> > if the message is touched several times we might still need
>>>>>>>> > streaming, but not for 100KB or smaller files.
>>>>>>>> >
>>>>>>>> > Thanks,
>>>>>>>> > Ruwan
>>>>>>>> >
>>>>>>>> > On Sun, Mar 8, 2009 at 1:12 AM, Andreas Veithen <[email protected]> wrote:
>>>>>>>> >>
>>>>>>>> >> I've done an initial implementation of this feature. It is available
>>>>>>>> >> in trunk and should be included in the next nightly build. In order
>>>>>>>> >> to enable this in your configuration, you need to add the following
>>>>>>>> >> parameter to the proxy:
>>>>>>>> >>
>>>>>>>> >> <parameter name="transport.vfs.Streaming">true</parameter>
>>>>>>>> >>
>>>>>>>> >> You also need to add the following mediators just before the <send>
>>>>>>>> >> mediator:
>>>>>>>> >>
>>>>>>>> >> <property action="remove" name="transportNonBlocking" scope="axis2"/>
>>>>>>>> >> <property action="set" name="OUT_ONLY" value="true"/>
>>>>>>>> >>
>>>>>>>> >> With this configuration Synapse will stream the data directly from
>>>>>>>> >> the incoming to the outgoing transport without storing it in memory
>>>>>>>> >> or in a temporary file. Note that this has two other side effects:
>>>>>>>> >> * The incoming file (or connection in the case of a remote file)
>>>>>>>> >> will only be opened on demand. In this case this happens during
>>>>>>>> >> execution of the <send> mediator.
>>>>>>>> >> * If during the mediation the content of the file is needed several
>>>>>>>> >> times (which is not the case in your example), it will be read
>>>>>>>> >> several times. The reason is of course that the content is not
>>>>>>>> >> cached.
>>>>>>>> >>
>>>>>>>> >> I tested the solution with a 2GB file and it worked fine. The
>>>>>>>> >> performance of the implementation is not yet optimal, but at least
>>>>>>>> >> the memory consumption is constant.
>>>>>>>> >>
>>>>>>>> >> Some additional comments:
>>>>>>>> >> * The transport.vfs.Streaming parameter has no impact on XML and
>>>>>>>> >> SOAP processing: this type of content is processed exactly as
>>>>>>>> >> before.
>>>>>>>> >> * With the changes described here, we now have two different
>>>>>>>> >> policies for plain text and binary content processing: in-memory
>>>>>>>> >> caching + no streaming (transport.vfs.Streaming=false) and no
>>>>>>>> >> caching + deferred connection + streaming
>>>>>>>> >> (transport.vfs.Streaming=true). We should probably define a wider
>>>>>>>> >> range of policies in the future, including file system caching +
>>>>>>>> >> streaming.
>>>>>>>> >> * It is necessary to remove the transportNonBlocking property
>>>>>>>> >> (MessageContext.TRANSPORT_NON_BLOCKING) to prevent the <send>
>>>>>>>> >> mediator (more precisely the OperationClient) from executing the
>>>>>>>> >> outgoing transport in a separate thread. This property is set by
>>>>>>>> >> the incoming transport. I think this is a bug, since I don't see
>>>>>>>> >> any valid reason why the transport that handles the incoming
>>>>>>>> >> request should determine the threading behavior of the transport
>>>>>>>> >> that sends the outgoing request to the target service. Maybe
>>>>>>>> >> Asankha can comment on this?
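The constant-memory behaviour described above ultimately rests on copying with a fixed-size buffer instead of materialising the whole payload as a String (which is what the PlainTextBuilder stack trace further down shows going wrong). A minimal, Synapse-independent illustration of the idea; the class name is invented for the example:

```java
import java.io.IOException;
import java.io.Reader;
import java.io.Writer;

// Copy character data with a fixed-size buffer: working memory is
// O(buffer size), not O(file size), no matter how large the input is.
public class StreamCopy {

    public static long copy(Reader in, Writer out) throws IOException {
        char[] buffer = new char[8192]; // fixed working memory
        long total = 0;
        int n;
        while ((n = in.read(buffer)) != -1) {
            out.write(buffer, 0, n);
            total += n;
        }
        out.flush();
        return total;
    }
}
```

By contrast, IOUtils.toString accumulates the entire content in a StringBuffer first, which is exactly what blows the heap on a 2GB file.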
>>>>>>>> >>
>>>>>>>> >> Andreas
>>>>>>>> >>
>>>>>>>> >> On Thu, Mar 5, 2009 at 07:21, kimhorn <[email protected]> wrote:
>>>>>>>> >> >
>>>>>>>> >> > That's good, as this stops us using Synapse.
>>>>>>>> >> >
>>>>>>>> >> > Asankha C. Perera wrote:
>>>>>>>> >> >>
>>>>>>>> >> >>> Exception in thread "vfs-Worker-4" java.lang.OutOfMemoryError: Java heap space
>>>>>>>> >> >>>     at java.lang.AbstractStringBuilder.expandCapacity(AbstractStringBuilder.java:99)
>>>>>>>> >> >>>     at java.lang.AbstractStringBuilder.append(AbstractStringBuilder.java:518)
>>>>>>>> >> >>>     at java.lang.StringBuffer.append(StringBuffer.java:307)
>>>>>>>> >> >>>     at java.io.StringWriter.write(StringWriter.java:72)
>>>>>>>> >> >>>     at org.apache.commons.io.IOUtils.copyLarge(IOUtils.java:1129)
>>>>>>>> >> >>>     at org.apache.commons.io.IOUtils.copy(IOUtils.java:1104)
>>>>>>>> >> >>>     at org.apache.commons.io.IOUtils.copy(IOUtils.java:1078)
>>>>>>>> >> >>>     at org.apache.commons.io.IOUtils.toString(IOUtils.java:382)
>>>>>>>> >> >>>     at org.apache.synapse.format.PlainTextBuilder.processDocument(PlainTextBuilder.java:68)
>>>>>>>> >> >>>
>>>>>>>> >> >> Since the content type is text, the plain text builder is trying
>>>>>>>> >> >> to read the whole content into a String, as I see, which is a
>>>>>>>> >> >> problem for large content.
>>>>>>>> >> >>
>>>>>>>> >> >> A definite bug we need to fix.
>>>>>>>> >> >>
>>>>>>>> >> >> cheers
>>>>>>>> >> >> asankha
>>>>>>>> >> >>
>>>>>>>> >> >> --
>>>>>>>> >> >> Asankha C.
Perera
>>>>>>>> >> >> AdroitLogic, http://adroitlogic.org
>>>>>>>> >> >> http://esbmagic.blogspot.com
>>>>>>>> >> >>
>>>>>>>> >> >> ---------------------------------------------------------------------
>>>>>>>> >> >> To unsubscribe, e-mail: [email protected]
>>>>>>>> >> >> For additional commands, e-mail: [email protected]
>>>>>>>> >> >>
>>>>>>>> >> >
>>>>>>>> >> > --
>>>>>>>> >> > View this message in context:
>>>>>>>> >> > http://www.nabble.com/VFS---Synapse-Memory-Leak-tp22344176p22345904.html
>>>>>>>> >> > Sent from the Synapse - Dev mailing list archive at Nabble.com.
>>>>>>>> >
>>>>>>>> > --
>>>>>>>> > Ruwan Linton
>>>>>>>> > http://wso2.org - "Oxygenating the Web Services Platform"
>>>>>>>> > http://ruwansblog.blogspot.com/
>>>>>>>
>>>>>>> --
>>>>>>> Ruwan Linton
>>>>>>> http://wso2.org - "Oxygenating the Web Services Platform"
>>>>>>> http://ruwansblog.blogspot.com/
>>>>>>
>>>>>> --
>>>>>> View this message in context:
>>>>>> http://www.nabble.com/VFS---Synapse-Memory-Leak-tp22344176p22405973.html
>>>>>> Sent from the Synapse - Dev mailing list archive at Nabble.com.
>>>
>>> --
>>> View this message in context:
>>> http://www.nabble.com/VFS---Synapse-Memory-Leak-tp22344176p22594321.html
>>> Sent from the Synapse - Dev mailing list archive at Nabble.com.
