Andreas,

On Mon, Mar 9, 2009 at 5:24 PM, Andreas Veithen <[email protected]> wrote:
> The changes I did in the VFS transport and the message builders for
> text/plain and application/octet-stream certainly don't provide an
> out-of-the-box solution for your use case, but they are the
> prerequisite.
>
> Concerning your first proposed solution (let the VFS write the content
> to a temporary file), I don't like this because it would create a
> tight coupling between the VFS transport and the mediator. A design
> goal should be that the solution will still work if the file comes
> from another source, e.g. an attachment in an MTOM or SwA message.
>
> I think that an all-Synapse solution (2 or 3) should be possible, but
> this will require development of a custom mediator. This mediator
> would read the content, split it up (and store the chunks in memory or
> on disk) and execute a sub-sequence for each chunk. The execution of
> the sub-sequence would happen synchronously to limit the memory/disk
> space consumption (to the maximum chunk size) and to avoid flooding
> the destination service.
>
> Note that it is probably not possible to implement the mediator
> using a script because of the problematic String handling. Also,
> Spring, POJO and class mediators don't support sub-sequences (I
> think). Therefore it should be implemented as a full-featured Java
> mediator, probably taking the existing iterate mediator as a template.
> I can contribute the required code to get the text content in the form
> of a java.io.Reader.

Could you please explain this bit? Do you mean implementing the transport
so that it gives out the text content as a java.io.Reader? If so, what is
the general use of this apart from this particular scenario?

Thanks,
Ruwan

>
> Regards,
>
> Andreas
>
> On Mon, Mar 9, 2009 at 03:05, kimhorn <[email protected]> wrote:
> >
> > Although this is a good feature, it may not solve the actual problem?
> > The first main issue on my list was the memory leak.
> > However, the real problem is that once I get these massive files I
> > have to send them to a web service that can only take them in small
> > chunks (about 14MB). Streaming the file straight out would just kill
> > the destination web service; it would get the memory error. The text
> > document can be split apart easily, as it has independent records on
> > each line, separated by <CR><LF>.
> >
> > In an earlier post, which was not responded to, I mentioned:
> >
> > "Otherwise, for large EDI files, a VFS iterator mediator that streams
> > through the input file and outputs smaller chunks for processing in
> > Synapse may be a solution?"
> >
> > So I had mentioned a few solutions in prior posts; the solutions now are:
> >
> > 1) VFS writes straight to a temporary file, then a Java mediator
> >    processes the file by splitting it into many smaller files. These
> >    files then trigger another VFS proxy that submits them to the final
> >    web service. The problem is that this uses the file system (not so
> >    bad).
> > 2) A Java mediator takes the <text> element and splits it up by
> >    wrapping it into many XML <data> elements that can then be acted on
> >    by a Synapse Iterator, i.e. the text message is replaced with many
> >    smaller XML elements. The problem is that this loads the whole
> >    message into memory.
> > 3) Create another iterator in Synapse that works on a regular
> >    expression (to split the text data) or uses a for-loop approach to
> >    chop the file into chunks based on the loop index value, e.g.
> >    index = 23 means the 14K chunk that is 23 chunks into the data.
> > 4) Using the approach proposed now, just submit (stream) the file
> >    straight to another web service that chops it up. It may return an
> >    XML document with many sub-elements that allows the standard
> >    Iterator to work. Similar to (2), but using another service rather
> >    than Java to split the document.
> > 5) Using the approach proposed now, just submit (stream) the file
> >    straight to another web service that chops it up but calls a
> >    Synapse proxy with each small packet of data, which then forwards
> >    it to the final web service. So the web service iterates across the
> >    data, not Synapse.
> >
> > Then there are other solutions that replace Synapse with a stand-alone
> > Java program at the front end.
> >
> > Another issue here is throttling: splitting the file is one issue, but
> > submitting hundreds of calls in parallel to the destination service
> > would result in timeouts. So throttling needs to be worked in.
> >
> >
> > Ruwan Linton wrote:
> >>
> >> I agree and can understand the time factor, and also +1 for reusing
> >> stuff rather than trying to reinvent the wheel :-)
> >>
> >> Thanks,
> >> Ruwan
> >>
> >> On Sun, Mar 8, 2009 at 4:08 PM, Andreas Veithen
> >> <[email protected]> wrote:
> >>
> >>> Ruwan,
> >>>
> >>> It's not a question of possibility, it is a question of available time
> >>> :-)
> >>>
> >>> Also note that some of the features that we might want to implement
> >>> have some similarities with what is done for attachments in Axiom
> >>> (except that an attachment is only available once, while a file over
> >>> VFS can be read several times). I think there is also some existing
> >>> code in Axis2 that might be useful. We should not reimplement these
> >>> things but try to make the existing code reusable. However, this is
> >>> only realistic for the next release after 1.3.
> >>>
> >>> Andreas
> >>>
> >>> On Sun, Mar 8, 2009 at 03:47, Ruwan Linton <[email protected]>
> >>> wrote:
> >>> > Andreas,
> >>> >
> >>> > Can we have caching at the file system level as a property, to
> >>> > support multiple layers touching the full message? And is it
> >>> > possible to specify a threshold for streaming?
> >>> > For example, if the message is touched several times we might still
> >>> > need streaming, but not for files of 100KB or less.
> >>> >
> >>> > Thanks,
> >>> > Ruwan
> >>> >
> >>> > On Sun, Mar 8, 2009 at 1:12 AM, Andreas Veithen
> >>> > <[email protected]> wrote:
> >>> >>
> >>> >> I've done an initial implementation of this feature. It is available
> >>> >> in trunk and should be included in the next nightly build. In order
> >>> >> to enable this in your configuration, you need to add the following
> >>> >> parameter to the proxy:
> >>> >>
> >>> >> <parameter name="transport.vfs.Streaming">true</parameter>
> >>> >>
> >>> >> You also need to add the following mediators just before the <send>
> >>> >> mediator:
> >>> >>
> >>> >> <property action="remove" name="transportNonBlocking" scope="axis2"/>
> >>> >> <property action="set" name="OUT_ONLY" value="true"/>
> >>> >>
> >>> >> With this configuration Synapse will stream the data directly from
> >>> >> the incoming to the outgoing transport without storing it in memory
> >>> >> or in a temporary file. Note that this has two other side effects:
> >>> >> * The incoming file (or connection, in the case of a remote file)
> >>> >>   will only be opened on demand. In this case this happens during
> >>> >>   execution of the <send> mediator.
> >>> >> * If the content of the file is needed several times during
> >>> >>   mediation (which is not the case in your example), it will be read
> >>> >>   several times. The reason, of course, is that the content is not
> >>> >>   cached.
> >>> >>
> >>> >> I tested the solution with a 2GB file and it worked fine. The
> >>> >> performance of the implementation is not yet optimal, but at least
> >>> >> the memory consumption is constant.
> >>> >>
> >>> >> Some additional comments:
> >>> >> * The transport.vfs.Streaming parameter has no impact on XML and
> >>> >>   SOAP processing: this type of content is processed exactly as
> >>> >>   before.
> >>> >> * With the changes described here, we now have two different
> >>> >>   policies for plain text and binary content processing: in-memory
> >>> >>   caching + no streaming (transport.vfs.Streaming=false) and no
> >>> >>   caching + deferred connection + streaming
> >>> >>   (transport.vfs.Streaming=true). We should probably define a wider
> >>> >>   range of policies in the future, including file system caching +
> >>> >>   streaming.
> >>> >> * It is necessary to remove the transportNonBlocking property
> >>> >>   (MessageContext.TRANSPORT_NON_BLOCKING) to prevent the <send>
> >>> >>   mediator (more precisely the OperationClient) from executing the
> >>> >>   outgoing transport in a separate thread. This property is set by
> >>> >>   the incoming transport. I think this is a bug, since I don't see
> >>> >>   any valid reason why the transport that handles the incoming
> >>> >>   request should determine the threading behavior of the transport
> >>> >>   that sends the outgoing request to the target service. Maybe
> >>> >>   Asankha can comment on this?
> >>> >>
> >>> >> Andreas
> >>> >>
> >>> >> On Thu, Mar 5, 2009 at 07:21, kimhorn <[email protected]> wrote:
> >>> >> >
> >>> >> > That's good, as this is stopping us from using Synapse.
> >>> >> >
> >>> >> >
> >>> >> > Asankha C. Perera wrote:
> >>> >> >>
> >>> >> >>> Exception in thread "vfs-Worker-4" java.lang.OutOfMemoryError: Java heap space
> >>> >> >>>     at java.lang.AbstractStringBuilder.expandCapacity(AbstractStringBuilder.java:99)
> >>> >> >>>     at java.lang.AbstractStringBuilder.append(AbstractStringBuilder.java:518)
> >>> >> >>>     at java.lang.StringBuffer.append(StringBuffer.java:307)
> >>> >> >>>     at java.io.StringWriter.write(StringWriter.java:72)
> >>> >> >>>     at org.apache.commons.io.IOUtils.copyLarge(IOUtils.java:1129)
> >>> >> >>>     at org.apache.commons.io.IOUtils.copy(IOUtils.java:1104)
> >>> >> >>>     at org.apache.commons.io.IOUtils.copy(IOUtils.java:1078)
> >>> >> >>>     at org.apache.commons.io.IOUtils.toString(IOUtils.java:382)
> >>> >> >>>     at org.apache.synapse.format.PlainTextBuilder.processDocument(PlainTextBuilder.java:68)
> >>> >> >>>
> >>> >> >> As far as I can see, since the content type is text, the plain
> >>> >> >> text builder is trying to read the whole content into a String,
> >>> >> >> which is a problem for large content.
> >>> >> >>
> >>> >> >> A definite bug we need to fix.
> >>> >> >>
> >>> >> >> cheers
> >>> >> >> asankha
> >>> >> >>
> >>> >> >> --
> >>> >> >> Asankha C. Perera
> >>> >> >> AdroitLogic, http://adroitlogic.org
> >>> >> >>
> >>> >> >> http://esbmagic.blogspot.com
> >>> >> >>
> >>> >> >> ---------------------------------------------------------------------
> >>> >> >> To unsubscribe, e-mail: [email protected]
> >>> >> >> For additional commands, e-mail: [email protected]
> >>> >> >
> >>> >> > --
> >>> >> > View this message in context:
> >>> >> > http://www.nabble.com/VFS---Synapse-Memory-Leak-tp22344176p22345904.html
> >>> >> > Sent from the Synapse - Dev mailing list archive at Nabble.com.
> >>> >
> >>> > --
> >>> > Ruwan Linton
> >>> > http://wso2.org - "Oxygenating the Web Services Platform"
> >>> > http://ruwansblog.blogspot.com/
> >>
> >> --
> >> Ruwan Linton
> >> http://wso2.org - "Oxygenating the Web Services Platform"
> >> http://ruwansblog.blogspot.com/
> >
> > --
> > View this message in context:
> > http://www.nabble.com/VFS---Synapse-Memory-Leak-tp22344176p22405973.html
> > Sent from the Synapse - Dev mailing list archive at Nabble.com.

--
Ruwan Linton
http://wso2.org - "Oxygenating the Web Services Platform"
http://ruwansblog.blogspot.com/
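A minimal sketch of the chunking step of the custom mediator Andreas proposes, assuming the text content is already available as a java.io.Reader and that records are separated by line breaks (<CR><LF>), as kimhorn describes. ChunkSplitter and forEachChunk are hypothetical names, not Synapse API; in a real mediator the handler would run the sub-sequence (e.g. a <send> to the target service) for each chunk. Chunk size is measured in characters, not bytes.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.Reader;
import java.util.function.Consumer;

/**
 * Hypothetical helper (not part of Synapse): splits line-oriented text into
 * chunks of at most maxChars characters without breaking a record, and passes
 * each chunk to the handler before reading any further input.
 */
final class ChunkSplitter {

    private ChunkSplitter() {}

    /** Returns the number of chunks handed to the handler. The caller closes the source. */
    static int forEachChunk(Reader source, int maxChars, Consumer<String> handler)
            throws IOException {
        BufferedReader in = new BufferedReader(source);
        StringBuilder chunk = new StringBuilder();
        int chunks = 0;
        String line;
        // readLine() strips the line terminator (CRLF, LF or CR); re-append CRLF
        // so that every chunk keeps the record format of the original file.
        while ((line = in.readLine()) != null) {
            String record = line + "\r\n";
            if (chunk.length() > 0 && chunk.length() + record.length() > maxChars) {
                // Synchronous hand-off: the next record is not read until the
                // handler returns, so memory use stays bounded by roughly the
                // chunk size and only one call to the target is outstanding.
                handler.accept(chunk.toString());
                chunks++;
                chunk.setLength(0);
            }
            // A single record longer than maxChars becomes its own oversized chunk.
            chunk.append(record);
        }
        if (chunk.length() > 0) {
            handler.accept(chunk.toString());
            chunks++;
        }
        return chunks;
    }
}
```

For the 14MB limit kimhorn mentions, a call such as forEachChunk(reader, 14 * 1024 * 1024, handler) would be a rough equivalent (characters rather than bytes, so the encoded size may differ). Because the handler runs on the calling thread, this is also the simplest form of the throttling asked for; a real mediator might add a delay or a bounded worker pool instead.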

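Ruwan's question about a streaming threshold, together with the "file system caching + streaming" policy Andreas mentions, could look roughly like this sketch, assuming a byte threshold (e.g. 100KB) under which content stays in memory and above which it is spilled to a temporary file so that it can still be read several times. ThresholdCache is a hypothetical class, not part of Synapse, Axis2 or Axiom; Commons IO's DeferredFileOutputStream implements a similar memory-then-disk idea.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.OutputStream;
import java.nio.file.Files;
import java.nio.file.Path;

/**
 * Hypothetical sketch (not Synapse API): content up to thresholdBytes is
 * cached in memory; larger content is spilled to a temporary file. Either
 * way, open() can be called several times to re-read the content.
 */
final class ThresholdCache implements AutoCloseable {
    private final byte[] memory; // set when the content fit under the threshold
    private final Path file;     // set when the content was spilled to disk

    private ThresholdCache(byte[] memory, Path file) {
        this.memory = memory;
        this.file = file;
    }

    static ThresholdCache of(InputStream in, int thresholdBytes) throws IOException {
        ByteArrayOutputStream buffer = new ByteArrayOutputStream();
        byte[] block = new byte[8192];
        int n;
        while ((n = in.read(block)) != -1) {
            buffer.write(block, 0, n);
            if (buffer.size() > thresholdBytes) {
                // Threshold exceeded: flush what is buffered to a temp file,
                // then stream the remainder straight to disk.
                Path tmp = Files.createTempFile("content-cache", ".tmp");
                try (OutputStream out = Files.newOutputStream(tmp)) {
                    buffer.writeTo(out);
                    while ((n = in.read(block)) != -1) {
                        out.write(block, 0, n);
                    }
                } catch (IOException e) {
                    Files.deleteIfExists(tmp); // do not leak a partial temp file
                    throw e;
                }
                return new ThresholdCache(null, tmp);
            }
        }
        return new ThresholdCache(buffer.toByteArray(), null);
    }

    boolean isOnDisk() {
        return file != null;
    }

    /** Opens a fresh stream over the cached content; the caller must close it. */
    InputStream open() throws IOException {
        return memory != null ? new ByteArrayInputStream(memory) : Files.newInputStream(file);
    }

    /** Deletes the temporary file, if one was created. */
    @Override
    public void close() throws IOException {
        if (file != null) {
            Files.deleteIfExists(file);
        }
    }
}
```

With ThresholdCache.of(in, 100 * 1024), a 50KB file would stay in memory while a 2GB file would go to disk, and both could be handed to several mediators in turn; this trades disk space for the re-reads that the pure streaming mode has to do.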