No, basically the code would take an OMElement and return a Reader
that represents the text content of that element. It would take care
of doing this in an optimal way (constant memory usage and minimal
usage of intermediate buffers), i.e. it would provide the same
functionality as new StringReader(omElement.getText()), but without
loading the entire data into memory. We already do something like this
in the PlainTextFormatter, but here the idea is to encapsulate that
nicely behind a Reader implementation. Note that this is not at all
transport specific and would work with any OMElement.
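
To make the idea concrete, here is a minimal sketch of such a Reader. So that the sketch is self-contained it uses a plain StAX XMLStreamReader as a stand-in for the cursor that OMElement#getXMLStreamReader() would supply in Axiom; the class name ElementTextReader is made up, and a real implementation would also track element depth so it stops at the end of the given element rather than draining the whole document:

```java
import java.io.IOException;
import java.io.Reader;
import java.io.StringReader;
import javax.xml.stream.XMLInputFactory;
import javax.xml.stream.XMLStreamConstants;
import javax.xml.stream.XMLStreamException;
import javax.xml.stream.XMLStreamReader;

/**
 * Sketch of a Reader exposing the character data of an element without
 * ever materializing it as a single String: text is copied straight from
 * the parser's buffer into the caller's buffer, one event at a time.
 */
public class ElementTextReader extends Reader {

    private final XMLStreamReader parser;
    private boolean inText;  // cursor currently on a CHARACTERS/CDATA event
    private int textPos;     // chars already consumed from that event

    public ElementTextReader(XMLStreamReader parser) {
        this.parser = parser;
    }

    @Override
    public int read(char[] cbuf, int off, int len) throws IOException {
        if (len == 0) {
            return 0;
        }
        try {
            // Advance to the next text event once the current one is used up
            while (!inText || textPos >= parser.getTextLength()) {
                if (!parser.hasNext()) {
                    return -1;
                }
                int event = parser.next();
                if (event == XMLStreamConstants.END_DOCUMENT) {
                    return -1;
                }
                inText = event == XMLStreamConstants.CHARACTERS
                        || event == XMLStreamConstants.CDATA;
                textPos = 0;
            }
            // Copy directly out of the parser buffer; no intermediate String
            int count = parser.getTextCharacters(textPos, cbuf, off,
                    Math.min(len, parser.getTextLength() - textPos));
            textPos += count;
            return count;
        } catch (XMLStreamException e) {
            throw new IOException(e);
        }
    }

    @Override
    public void close() throws IOException {
        try {
            parser.close();
        } catch (XMLStreamException e) {
            throw new IOException(e);
        }
    }

    /** Convenience for demonstration: drain the text of a small snippet. */
    public static String readAll(String xml) throws Exception {
        XMLStreamReader p = XMLInputFactory.newInstance()
                .createXMLStreamReader(new StringReader(xml));
        Reader reader = new ElementTextReader(p);
        StringBuilder sb = new StringBuilder();
        char[] buf = new char[8]; // deliberately tiny, to exercise refills
        int n;
        while ((n = reader.read(buf, 0, buf.length)) != -1) {
            sb.append(buf, 0, n);
        }
        reader.close();
        return sb.toString();
    }

    public static void main(String[] args) throws Exception {
        // prints "hello, streaming world"
        System.out.println(readAll("<text>hello, streaming world</text>"));
    }
}
```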

Having this piece of code means that the streaming aspect is done and
that the problem is reduced to the implementation of a
split-iterate-callout mediation/mediator. (I'm trying to decompose the
original problem into smaller pieces that can be reused elsewhere)
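
For the split-iterate-callout part, the core splitting logic could then be a simple loop over that Reader. The sketch below is entirely hypothetical (class and interface names included); in a real mediator the handler would execute the sub-sequence/callout for each chunk. Because chunks are handed over synchronously, memory use stays bounded by one chunk and the destination service is not flooded:

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.Reader;
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;

/**
 * Sketch of the splitting logic for a split-iterate-callout mediator:
 * read line-oriented records from a Reader and hand them to a callout
 * in chunks of bounded size, one chunk at a time.
 */
public class ChunkedCallout {

    /** Stand-in for the per-chunk sub-sequence / callout execution. */
    interface ChunkHandler {
        void handle(String chunk) throws IOException;
    }

    static void splitAndCall(Reader in, int maxChunkChars, ChunkHandler handler)
            throws IOException {
        BufferedReader reader = new BufferedReader(in);
        StringBuilder chunk = new StringBuilder();
        String line;
        // readLine() strips the CR/LF record separator; this sketch
        // normalizes it back to a single '\n' when re-assembling chunks.
        while ((line = reader.readLine()) != null) {
            // Flush the current chunk before it would exceed the limit,
            // so records are never split across two chunks.
            if (chunk.length() > 0
                    && chunk.length() + line.length() + 1 > maxChunkChars) {
                handler.handle(chunk.toString());
                chunk.setLength(0);
            }
            chunk.append(line).append('\n');
        }
        if (chunk.length() > 0) {
            handler.handle(chunk.toString()); // last, possibly short, chunk
        }
    }

    public static void main(String[] args) throws IOException {
        String records = "r1\nr2\nr3\nr4\nr5\n";
        final List<String> chunks = new ArrayList<String>();
        splitAndCall(new StringReader(records), 6, new ChunkHandler() {
            public void handle(String chunk) {
                chunks.add(chunk);
            }
        });
        System.out.println(chunks); // each chunk holds whole records only
    }
}
```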

Andreas

On Mon, Mar 9, 2009 at 16:53, Ruwan Linton <[email protected]> wrote:
> Andreas,
>
> On Mon, Mar 9, 2009 at 5:24 PM, Andreas Veithen <[email protected]>
> wrote:
>>
>> The changes I did in the VFS transport and the message builders for
>> text/plain and application/octet-stream certainly don't provide an
>> out-of-the-box solution for your use case, but they are the
>> prerequisite.
>>
>> Concerning your first proposed solution (let the VFS write the content
>> to a temporary file), I don't like this because it would create a
>> tight coupling between the VFS transport and the mediator. A design
>> goal should be that the solution will still work if the file comes
>> from another source, e.g. an attachment in an MTOM or SwA message.
>>
>> I think that an all-Synapse solution (2 or 3) should be possible, but
>> this will require development of a custom mediator. This mediator
>> would read the content, split it up (storing the chunks in memory or
>> on disk) and execute a sub-sequence for each chunk. The execution of
>> the sub-sequence would happen synchronously to limit the memory/disk
>> space consumption (to the maximum chunk size) and to avoid flooding
>> the destination service.
>>
>> Note that it is probably not possible to implement the mediator
>> using a script because of the problematic String handling. Also,
>> Spring, POJO and class mediators don't support sub-sequences (I
>> think). Therefore it should be implemented as a full-featured Java
>> mediator, probably taking the existing iterate mediator as a template.
>> I can contribute the required code to get the text content in the form
>> of a java.io.Reader.
>
> Could you please explain this a bit? Do you mean to implement the transport
> to give out text content as a java.io.Reader? If so, what is the general
> usage of this except for this particular scenario?
>
> Thanks,
> Ruwan
>
>>
>> Regards,
>>
>> Andreas
>>
>> On Mon, Mar 9, 2009 at 03:05, kimhorn <[email protected]> wrote:
>> >
>> > Although this is a good feature it may not solve the actual problem.
>> > The first issue on my list was the memory leak. However, the real
>> > problem is that once I get these massive files I have to send them to
>> > a web service that can only take them in small chunks (about 14MB).
>> > Streaming a file straight out would just kill the destination web
>> > service: it would get the memory error. The text document can be split
>> > apart easily, as it has independent records on each line separated by
>> > <CR><LF>.
>> >
>> > In an earlier post, which was not responded to, I mentioned:
>> >
>> > "Otherwise; for large EDI files a VFS iterator Mediator that streams
>> > through
>> > input file and outputs smaller
>> > chunks for processing, in Synapse, may be a solution ? "
>> >
>> > So I had mentioned a few solutions in prior posts; the candidate solutions now are:
>> >
>> > 1) VFS writes straight to a temporary file, then a Java mediator
>> > processes the file by splitting it into many smaller files. These
>> > files then trigger another VFS proxy that submits them to the final
>> > web service. The problem is that it uses the file system (not so bad).
>> > 2) A Java mediator takes the <text> payload and splits it up by
>> > wrapping it into many XML <data> elements that can then be acted on
>> > by a Synapse iterator. So replace the text message with many smaller
>> > XML elements. The problem is that this loads the whole message into
>> > memory.
>> > 3) Create another iterator in Synapse that works on a regular
>> > expression (to split the text data) or actually uses a for-loop
>> > approach to chop the file into chunks based on the loop index value.
>> > E.g. index = 23 means a 14K chunk 23 chunks into the data.
>> > 4) Using the approach proposed now - just submit the file straight
>> > (stream it) to another web service that chops it up. It may return an
>> > XML document with many sub-elements that allows the standard iterator
>> > to work. Similar to (2) but using another service rather than Java to
>> > split the document.
>> > 5) Using the approach proposed now - just submit the file straight
>> > (stream it) to another web service that chops it up but calls a
>> > Synapse proxy with each small packet of data, which then forwards it
>> > to the final web service. So the web service iterates across the
>> > data, and not Synapse.
>> >
>> > Then other solutions replace Synapse with a stand-alone Java program
>> > at the front end.
>> >
>> > Another issue here is throttling: splitting the file is one issue,
>> > but submitting hundreds of calls in parallel to the destination
>> > service would result in timeouts, so throttling needs to be worked in
>> > as well.
>> >
>> >
>> > Ruwan Linton wrote:
>> >>
>> >> I agree and understand the time factor, and also +1 for reusing
>> >> stuff rather than reinventing the wheel :-)
>> >>
>> >> Thanks,
>> >> Ruwan
>> >>
>> >> On Sun, Mar 8, 2009 at 4:08 PM, Andreas Veithen
>> >> <[email protected]>wrote:
>> >>
>> >>> Ruwan,
>> >>>
>> >>> It's not a question of possibility, it is a question of available time
>> >>> :-)
>> >>>
>> >>> Also note that some of the features that we might want to implement
>> >>> have some similarities with what is done for attachments in Axiom
>> >>> (except that an attachment is only available once, while a file over
>> >>> VFS can be read several times). I think there is also some existing
>> >>> code in Axis2 that might be useful. We should not reimplement these
>> >>> things but try to make the existing code reusable. This however is
>> >>> only realistic for the next release after 1.3.
>> >>>
>> >>> Andreas
>> >>>
>> >>> On Sun, Mar 8, 2009 at 03:47, Ruwan Linton <[email protected]>
>> >>> wrote:
>> >>> > Andreas,
>> >>> >
>> >>> > Can we have caching at the file system level as a property, to
>> >>> > support multiple layers touching the full message? And is it
>> >>> > possible to specify a threshold for streaming? For example, if
>> >>> > the message is touched several times we might still need
>> >>> > streaming, but not for files of 100KB or less.
>> >>> >
>> >>> > Thanks,
>> >>> > Ruwan
>> >>> >
>> >>> > On Sun, Mar 8, 2009 at 1:12 AM, Andreas Veithen <[email protected]> wrote:
>> >>> >>
>> >>> >> I've done an initial implementation of this feature. It is
>> >>> >> available
>> >>> >> in trunk and should be included in the next nightly build. In order
>> >>> >> to
>> >>> >> enable this in your configuration, you need to add the following
>> >>> >> property to the proxy:
>> >>> >>
>> >>> >> <parameter name="transport.vfs.Streaming">true</parameter>
>> >>> >>
>> >>> >> You also need to add the following mediators just before the <send>
>> >>> >> mediator:
>> >>> >>
>> >>> >> <property action="remove" name="transportNonBlocking" scope="axis2"/>
>> >>> >> <property action="set" name="OUT_ONLY" value="true"/>
>> >>> >>
>> >>> >> With this configuration Synapse will stream the data directly from
>> >>> >> the
>> >>> >> incoming to the outgoing transport without storing it in memory or
>> >>> >> in
>> >>> >> a temporary file. Note that this has two other side effects:
>> >>> >> * The incoming file (or connection in case of a remote file) will
>> >>> >> only
>> >>> >> be opened on demand. In this case this happens during execution of
>> >>> >> the
>> >>> >> <send> mediator.
>> >>> >> * If during the mediation the content of the file is needed
>> >>> >> several times (which is not the case in your example), it will
>> >>> >> be read several times. The reason is of course that the content
>> >>> >> is not cached.
>> >>> >>
>> >>> >> I tested the solution with a 2GB file and it worked fine. The
>> >>> >> performance of the implementation is not yet optimal, but at least
>> >>> >> the
>> >>> >> memory consumption is constant.
>> >>> >>
>> >>> >> Some additional comments:
>> >>> >> * The transport.vfs.Streaming property has no impact on XML and
>> >>> >> SOAP
>> >>> >> processing: this type of content is processed exactly as before.
>> >>> >> * With the changes described here, we now have two different
>> >>> >> policies
>> >>> >> for plain text and binary content processing: in-memory caching +
>> >>> >> no
>> >>> >> streaming (transport.vfs.Streaming=false) and no caching + deferred
>> >>> >> connection + streaming (transport.vfs.Streaming=true). Probably we
>> >>> >> should define a wider range of policies in the future, including
>> >>> >> file
>> >>> >> system caching + streaming.
>> >>> >> * It is necessary to remove the transportNonBlocking property
>> >>> >> (MessageContext.TRANSPORT_NON_BLOCKING) to prevent the <send>
>> >>> >> mediator
>> >>> >> (more precisely the OperationClient) from executing the outgoing
>> >>> >> transport in a separate thread. This property is set by the
>> >>> >> incoming
>> >>> >> transport. I think this is a bug since I don't see any valid reason
>> >>> >> why the transport that handles the incoming request should
>> >>> >> determine
>> >>> >> the threading behavior of the transport that sends the outgoing
>> >>> >> request to the target service. Maybe Asankha can comment on this?
>> >>> >>
>> >>> >> Andreas
>> >>> >>
>> >>> >> On Thu, Mar 5, 2009 at 07:21, kimhorn <[email protected]>
>> >>> >> wrote:
>> >>> >> >
>> >>> >> > That's good, as this stops us using Synapse.
>> >>> >> >
>> >>> >> >
>> >>> >> >
>> >>> >> > Asankha C. Perera wrote:
>> >>> >> >>
>> >>> >> >>
>> >>> >> >>> Exception in thread "vfs-Worker-4" java.lang.OutOfMemoryError: Java heap space
>> >>> >> >>>         at java.lang.AbstractStringBuilder.expandCapacity(AbstractStringBuilder.java:99)
>> >>> >> >>>         at java.lang.AbstractStringBuilder.append(AbstractStringBuilder.java:518)
>> >>> >> >>>         at java.lang.StringBuffer.append(StringBuffer.java:307)
>> >>> >> >>>         at java.io.StringWriter.write(StringWriter.java:72)
>> >>> >> >>>         at org.apache.commons.io.IOUtils.copyLarge(IOUtils.java:1129)
>> >>> >> >>>         at org.apache.commons.io.IOUtils.copy(IOUtils.java:1104)
>> >>> >> >>>         at org.apache.commons.io.IOUtils.copy(IOUtils.java:1078)
>> >>> >> >>>         at org.apache.commons.io.IOUtils.toString(IOUtils.java:382)
>> >>> >> >>>         at org.apache.synapse.format.PlainTextBuilder.processDocument(PlainTextBuilder.java:68)
>> >>> >> >>>
>> >>> >> >> Since the content type is text, the plain text builder is
>> >>> >> >> trying to read the whole content into a String, as I see..
>> >>> >> >> which is a problem for large content..
>> >>> >> >>
>> >>> >> >> A definite bug we need to fix ..
>> >>> >> >>
>> >>> >> >> cheers
>> >>> >> >> asankha
>> >>> >> >>
>> >>> >> >> --
>> >>> >> >> Asankha C. Perera
>> >>> >> >> AdroitLogic, http://adroitlogic.org
>> >>> >> >>
>> >>> >> >> http://esbmagic.blogspot.com
>> >>> >> >>
>> >>> >> >>
>> >>> >> >> ---------------------------------------------------------------------
>> >>> >> >> To unsubscribe, e-mail: [email protected]
>> >>> >> >> For additional commands, e-mail: [email protected]
>> >>> >> >>
>> >>> >> >>
>> >>> >> >>
>> >>> >> >
>> >>> >> > --
>> >>> >> > View this message in context:
>> >>> >> > http://www.nabble.com/VFS---Synapse-Memory-Leak-tp22344176p22345904.html
>> >>> >> > Sent from the Synapse - Dev mailing list archive at Nabble.com.
>> >>> >> >
>> >>> >> >
>> >>> >> >
>> >>> >> >
>> >>> >> >
>> >>> >>
>> >>> >>
>> >>> >>
>> >>> >
>> >>> >
>> >>> >
>> >>> > --
>> >>> > Ruwan Linton
>> >>> > http://wso2.org - "Oxygenating the Web Services Platform"
>> >>> > http://ruwansblog.blogspot.com/
>> >>> >
>> >>>
>> >>>
>> >>>
>> >>
>> >>
>> >> --
>> >> Ruwan Linton
>> >> http://wso2.org - "Oxygenating the Web Services Platform"
>> >> http://ruwansblog.blogspot.com/
>> >>
>> >>
>> >
>> > --
>> > View this message in context:
>> > http://www.nabble.com/VFS---Synapse-Memory-Leak-tp22344176p22405973.html
>> > Sent from the Synapse - Dev mailing list archive at Nabble.com.
>> >
>> >
>> >
>> >
>>
>>
>
>
>
> --
> Ruwan Linton
> http://wso2.org - "Oxygenating the Web Services Platform"
> http://ruwansblog.blogspot.com/
>
