On Sun, 2008-03-16 at 17:53 +0530, Senaka Fernando wrote:
> Hi Samisa,
>
> IIRC, this discussion is on handling attachments and thus does not
> relate to caching. Though $subject says "Caching", what was actually
> discussed was a mechanism to buffer the attachment in a file,
I used the word "Caching" because the jira
https://issues.apache.org/jira/browse/AXIS2C-672 used that word, and
Axis2/Java uses the same word for buffering an attachment to a file. So I
don't think any of the developers misunderstood that.

> rather than in memory, and that buffer has nothing to do with caching,
> which is a totally different concept, as in [1].
>
> The previous mail I sent was a reply to Manjula's concern about handling
> a scenario where the MIME boundary appears in two parts distributed
> across two reads. Unlike in the previous scenarios, the block, once
> read, will be flushed to a file instead of being kept in memory; thus,
> parsing may have to be thought through. Sorry if it confused you.
>
> IMHO, writing a partially parsed buffer to a file is not that efficient,
> as we will have to parse it again later to discover MIME boundaries and
> extract the attachments. Thus, I still believe that real-time buffering
> to a file while parsing is the better choice. To implement that, we will
> have to modify our mime_parser.c, and probably the data_handler
> implementation.
>
> Or, if not, am I misunderstanding $subject?
>
> [1] http://www.w3.org/Protocols/rfc2616/rfc2616-sec13.html
>
> Regards,
> Senaka
>
> > Senaka Fernando wrote:
> >>> Hi Manjula, Thilina and others,
> >>>
> >>> Yep, I think I'm in exactly the same viewpoint as Thilina when it
> >>> comes to handling attachment data, well, for the chunking part. I
> >>> think I didn't get Thilina right in his first e-mail.
> >>>
> >>> However, a file per MIME part may not always be optimal. I'd say
> >>> rather that each file should have a fixed maximum size, and if that
> >>> is exceeded, perhaps you can divide it in two. Also, a user should
> >>> always be given the option to choose between Thilina's method and
> >>> this method through axis2.xml (or services.xml). Thus, a user can
> >>> fine-tune memory use.
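The memory-versus-file trade-off discussed above can be sketched roughly as follows. This is a minimal illustration in C, assuming a hypothetical `attachment_sink_t` type and a size threshold that would, say, come from axis2.xml; it is not the actual Axis2/C data_handler code.

```c
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical sketch: keep attachment bytes in memory until a
 * configurable threshold is crossed, then spill everything to a
 * temporary file.  The names here are illustrative only; this is
 * not the Axis2/C data_handler API. */
typedef struct {
    size_t threshold;   /* max bytes to keep in memory */
    char  *mem;         /* in-memory buffer (while below threshold) */
    size_t len;         /* bytes currently buffered in memory */
    FILE  *file;        /* non-NULL once we have spilled to disk */
} attachment_sink_t;

static int sink_write(attachment_sink_t *s, const char *data, size_t n)
{
    if (!s->file && s->len + n > s->threshold) {
        /* Threshold exceeded: switch from memory to file buffering. */
        s->file = tmpfile();
        if (!s->file)
            return -1;
        if (s->len && fwrite(s->mem, 1, s->len, s->file) != s->len)
            return -1;
        free(s->mem);
        s->mem = NULL;
        s->len = 0;
    }
    if (s->file)
        return fwrite(data, 1, n, s->file) == n ? 0 : -1;

    char *p = realloc(s->mem, s->len + n);
    if (!p)
        return -1;
    memcpy(p + s->len, data, n);
    s->mem = p;
    s->len += n;
    return 0;
}
```

Small writes stay in memory; only once the threshold is crossed does everything move to disk, which matches the idea of letting the user fine-tune memory use through configuration.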
> >>> When it comes to base64-encoded binary data, you can use a
> >>> mechanism where the buffer always has a size that is a multiple of
> >>> 4; then, when you flush, you decode it and copy it to the file, so
> >>> that should be essentially the same to a user when it comes to
> >>> caching.
> >>>
> >>> OK, so Manjula, you mean when the MIME boundary appears partially in
> >>> the first read and partially in the second?
> >>>
> >>> Well, this is probably the best solution:
> >>>
> >>> You will allocate a buffer twice the size of the MIME boundary, and
> >>> in your very first read, you will read two times the boundary size;
> >>> then you will search for the MIME boundary. Next, you will do a
> >>> memmove() and move the contents of the buffer from the midpoint to
> >>> the end down to the beginning of the buffer. After doing this, you
> >>> will read a size equivalent to half the buffer (which, again, is the
> >>> size of the MIME boundary marker) and store it from the midpoint of
> >>> the buffer to the end. Then you will search again. You will iterate
> >>> this procedure until you read less than half the size of the buffer.
> >>>
> >>
> >> If you are interested further in this mechanism, I used this approach
> >> for resending binary data using TCPMon. You may check that as well.
> >>
> >> Also, strstr() has issues when you have a '\0' in the middle. Thus,
> >> you will have to use a temporary search marker and use that in the
> >> process. Before calling strstr(), check whether strlen(temp) is
> >> greater than or equal to the length of the MIME boundary marker. If
> >> it is greater, you only need to search once. If it is equal, you will
> >> need to search exactly twice. If it is less, you increment temp by
> >> strlen(temp) and repeat until you cross the midpoint. This makes the
> >> search more efficient.
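The double-buffer-and-memmove() scan described above can be sketched like this. Since the thread notes that strstr() stops at an embedded '\0', the sketch uses a binary-safe byte search instead. Names (`find_bytes`, `find_boundary`) are hypothetical; this is an illustration of the sliding-window idea, not the mime_parser.c code.

```c
#include <stdio.h>
#include <string.h>

/* Binary-safe substring search: strstr() stops at the first '\0',
 * which MIME part bodies can legally contain, so we compare raw
 * bytes instead.  (memmem() exists on some platforms but is not
 * portable C, hence this small helper.) */
static long find_bytes(const char *hay, size_t hlen,
                       const char *needle, size_t nlen)
{
    if (nlen == 0 || hlen < nlen)
        return -1;
    for (size_t i = 0; i + nlen <= hlen; i++)
        if (memcmp(hay + i, needle, nlen) == 0)
            return (long)i;
    return -1;
}

/* Sliding-window scan: a buffer of twice the boundary length; after
 * each failed search, slide the upper half down with memmove() and
 * refill the upper half, so a boundary split across two reads is
 * still seen whole.  Returns the stream offset of the boundary, or
 * -1 if it is not present. */
static long find_boundary(FILE *in, const char *boundary)
{
    size_t blen = strlen(boundary);
    char buf[512];              /* assumes 2*blen <= sizeof buf; a real
                                   implementation would allocate */
    long base = 0;              /* stream offset of buf[0] */

    size_t have = fread(buf, 1, 2 * blen, in);
    for (;;) {
        long hit = find_bytes(buf, have, boundary, blen);
        if (hit >= 0)
            return base + hit;
        if (have < 2 * blen)    /* short read: stream exhausted */
            return -1;
        memmove(buf, buf + blen, blen);   /* keep the upper half */
        base += (long)blen;
        have = blen + fread(buf + blen, 1, blen, in);
    }
}
```

Any boundary lying wholly within a window is found by the search; any boundary straddling two reads survives the memmove() into the next window, which is exactly the guarantee the scheme above is after.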
> >> If you want to make the search even more efficient, you can make the
> >> buffer size one less than the size of the MIME boundary marker, so
> >> that when you get the "equal" scenario, you will have to search only
> >> once.
> >>
> >> The assumption I've relied on here is that strstr() and strlen()
> >> behave the same way in a given implementation. On Windows, if
> >> strlen() is multibyte-aware, so is strstr(). So, no worries.
> >>
> >
> > We have an efficient parsing mechanism already, tested and proven to
> > work, with 1.3. Why on earth are we discussing this over and over
> > again?
> >
> > Does caching get affected by the MIME parser logic? IMHO, no. They
> > are two separate concerns, so why are we wasting time discussing
> > parsing when the problem at hand is not parsing but caching?
> >
> > Writing the partially parsed buffer was a solution to caching. Do we
> > have any other alternatives? If so, in short, what are they?
> >
> > Samisa...

---------------------------------------------------------------------
To unsubscribe, e-mail: [EMAIL PROTECTED]
For additional commands, e-mail: [EMAIL PROTECTED]
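Returning to the base64 point earlier in the thread (keeping the staging buffer a multiple of 4 so that each flush decodes only whole quads): a minimal sketch of that idea, with hypothetical names, decoding complete 4-character groups and reporting the leftover tail to be carried into the next read. This is an illustration, not the Axis2/C implementation.

```c
#include <stddef.h>

/* Map one base64 character to its 6-bit value; -1 for '=' padding
 * or anything invalid. */
static int b64_val(char c)
{
    if (c >= 'A' && c <= 'Z') return c - 'A';
    if (c >= 'a' && c <= 'z') return c - 'a' + 26;
    if (c >= '0' && c <= '9') return c - '0' + 52;
    if (c == '+') return 62;
    if (c == '/') return 63;
    return -1;
}

/* Decode only the complete quads in in[0..inlen).  Returns the number
 * of bytes written to out; *rest receives the count of unconsumed tail
 * characters (an incomplete quad) the caller must prepend to the next
 * buffer.  Keeping the staging buffer a multiple of 4 makes *rest zero
 * on every flush except possibly the last. */
static size_t b64_decode_quads(const char *in, size_t inlen,
                               unsigned char *out, size_t *rest)
{
    size_t quads = inlen / 4, n = 0;
    for (size_t q = 0; q < quads; q++) {
        const char *p = in + 4 * q;
        int v0 = b64_val(p[0]), v1 = b64_val(p[1]);
        int v2 = b64_val(p[2]), v3 = b64_val(p[3]);
        out[n++] = (unsigned char)((v0 << 2) | (v1 >> 4));
        if (v2 >= 0)
            out[n++] = (unsigned char)((v1 << 4) | (v2 >> 2));
        if (v2 >= 0 && v3 >= 0)
            out[n++] = (unsigned char)((v2 << 6) | v3);
    }
    *rest = inlen - 4 * quads;
    return n;
}
```

Because four base64 characters always decode to exactly three bytes, flushing on quad boundaries means the file receives correctly decoded binary data with no partial groups, which is what makes the flush transparent to the user.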
