On Sun, 2008-03-16 at 17:53 +0530, Senaka Fernando wrote:
> Hi Samisa,
>
> IIRC, this discussion is on handling attachments and thus does not
> relate to caching. Though $subject says "Caching", what was actually
> discussed was a mechanism to buffer the attachment in a file,
I used the word "Caching" because the jira
https://issues.apache.org/jira/browse/AXIS2C-672 used that word, and
Axis2/Java uses the same word for buffering an attachment to a file. So I
don't think any of the developers misunderstood that.

> rather than in memory, and that buffer has nothing to do with caching,
> which is a totally different concept, as in [1].
>
> The previous mail I sent was a reply to Manjula's concern about handling
> a scenario where the MIME boundary appears in two parts distributed
> across two reads. Unlike in the previous scenarios, the block, once
> read, will be flushed to a file instead of being kept in memory; thus,
> parsing may have to be thought through. Sorry if it confused you.
>
> IMHO, writing a partially parsed buffer to a file is not that efficient,
> as we will have to parse it again later to discover MIME boundaries and
> extract the attachments. Thus, I still believe that real-time buffering
> to a file while parsing is the better choice. To implement that, we will
> have to modify our mime_parser.c, and probably the data_handler
> implementation.
>
> Or, if not, am I misunderstanding $subject?
>
> [1] http://www.w3.org/Protocols/rfc2616/rfc2616-sec13.html
>
> Regards,
> Senaka
>
> > Senaka Fernando wrote:
> >>> Hi Manjula, Thilina and others,
> >>>
> >>> Yep, I think I'm in exactly the same viewpoint as Thilina when it
> >>> comes to handling attachment data, well, for the chunking part. I
> >>> think I didn't get Thilina right in his first e-mail.
> >>>
> >>> However, a file per MIME part may not always be optimal. I'd say
> >>> rather that each file should have a fixed maximum size, and if that
> >>> is exceeded, perhaps you can divide it in two. Also, a user should
> >>> always be given the option to choose between Thilina's method and
> >>> this method through axis2.xml (or services.xml). Thus, a user can
> >>> fine-tune memory use.
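The memory-versus-file trade-off discussed above can be sketched roughly as follows. This is a minimal illustration in C, assuming a hypothetical `attachment_sink_t` type and a size threshold that would, say, come from axis2.xml; it is not the actual Axis2/C data_handler code.

```c
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical sketch: keep attachment bytes in memory until a
 * configurable threshold is crossed, then spill everything to a
 * temporary file.  The names here are illustrative only; this is
 * not the Axis2/C data_handler API. */
typedef struct {
    size_t threshold;   /* max bytes to keep in memory */
    char  *mem;         /* in-memory buffer (while below threshold) */
    size_t len;         /* bytes currently buffered in memory */
    FILE  *file;        /* non-NULL once we have spilled to disk */
} attachment_sink_t;

static int sink_write(attachment_sink_t *s, const char *data, size_t n)
{
    if (!s->file && s->len + n > s->threshold) {
        /* Threshold exceeded: switch from memory to file buffering. */
        s->file = tmpfile();
        if (!s->file)
            return -1;
        if (s->len && fwrite(s->mem, 1, s->len, s->file) != s->len)
            return -1;
        free(s->mem);
        s->mem = NULL;
        s->len = 0;
    }
    if (s->file)
        return fwrite(data, 1, n, s->file) == n ? 0 : -1;

    char *p = realloc(s->mem, s->len + n);
    if (!p)
        return -1;
    memcpy(p + s->len, data, n);
    s->mem = p;
    s->len += n;
    return 0;
}
```

Small writes stay in memory; only once the threshold is crossed does everything move to disk, which matches the idea of letting the user fine-tune memory use through configuration.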
> >>> When it comes to base64-encoded binary data, you can use a
> >>> mechanism where the buffer always has a size that is a multiple of
> >>> 4; then, when you flush, you decode it and copy it to the file, so
> >>> that should be essentially the same to a user when it comes to
> >>> caching.
> >>>
> >>> OK, so Manjula, you mean when the MIME boundary appears partially in
> >>> the first read and partially in the second?
> >>>
> >>> Well, this is probably the best solution:
> >>>
> >>> You will allocate a buffer twice the size of the MIME boundary, and
> >>> in your very first read, you will read two times the boundary size;
> >>> then you will search for the MIME boundary. Next, you will do a
> >>> memmove() and move the contents of the buffer from the midpoint to
> >>> the end down to the beginning of the buffer. After doing this, you
> >>> will read a size equivalent to half the buffer (which, again, is the
> >>> size of the MIME boundary marker) and store it from the midpoint of
> >>> the buffer to the end. Then you will search again. You will iterate
> >>> this procedure until you read less than half the size of the buffer.
> >>>
> >>
> >> If you are interested further in this mechanism, I used this approach
> >> for resending binary data using TCPMon. You may check that as well.
> >>
> >> Also, strstr() has issues when you have a '\0' in the middle. Thus,
> >> you will have to use a temporary search marker and use that in the
> >> process. Before calling strstr(), check whether strlen(temp) is
> >> greater than or equal to the length of the MIME boundary marker. If
> >> it is greater, you only need to search once. If it is equal, you will
> >> need to search exactly twice. If it is less, you increment temp by
> >> strlen(temp) and repeat until you cross the midpoint. This makes the
> >> search more efficient.
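The double-buffer-and-memmove() scan described above can be sketched like this. Since the thread notes that strstr() stops at an embedded '\0', the sketch uses a binary-safe byte search instead. Names (`find_bytes`, `find_boundary`) are hypothetical; this is an illustration of the sliding-window idea, not the mime_parser.c code.

```c
#include <stdio.h>
#include <string.h>

/* Binary-safe substring search: strstr() stops at the first '\0',
 * which MIME part bodies can legally contain, so we compare raw
 * bytes instead.  (memmem() exists on some platforms but is not
 * portable C, hence this small helper.) */
static long find_bytes(const char *hay, size_t hlen,
                       const char *needle, size_t nlen)
{
    if (nlen == 0 || hlen < nlen)
        return -1;
    for (size_t i = 0; i + nlen <= hlen; i++)
        if (memcmp(hay + i, needle, nlen) == 0)
            return (long)i;
    return -1;
}

/* Sliding-window scan: a buffer of twice the boundary length; after
 * each failed search, slide the upper half down with memmove() and
 * refill the upper half, so a boundary split across two reads is
 * still seen whole.  Returns the stream offset of the boundary, or
 * -1 if it is not present. */
static long find_boundary(FILE *in, const char *boundary)
{
    size_t blen = strlen(boundary);
    char buf[512];              /* assumes 2*blen <= sizeof buf; a real
                                   implementation would allocate */
    long base = 0;              /* stream offset of buf[0] */

    size_t have = fread(buf, 1, 2 * blen, in);
    for (;;) {
        long hit = find_bytes(buf, have, boundary, blen);
        if (hit >= 0)
            return base + hit;
        if (have < 2 * blen)    /* short read: stream exhausted */
            return -1;
        memmove(buf, buf + blen, blen);   /* keep the upper half */
        base += (long)blen;
        have = blen + fread(buf + blen, 1, blen, in);
    }
}
```

Any boundary lying wholly within a window is found by the search; any boundary straddling two reads survives the memmove() into the next window, which is exactly the guarantee the scheme above is after.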
> >> If you want to make the search even more efficient, you can make the
> >> buffer size one less than the size of the MIME boundary marker, so
> >> that when you get the "equal" scenario, you will have to search only
> >> once.
> >>
> >> The assumption I've relied on here is that strstr() and strlen()
> >> behave the same way in a given implementation. On Windows, if
> >> strlen() is multibyte-aware, so is strstr(). So, no worries.
> >>
> >
> > We have an efficient parsing mechanism already, tested and proven to
> > work, with 1.3. Why on earth are we discussing this over and over
> > again?
> >
> > Does caching get affected by the MIME parser logic? IMHO, no. They
> > are two separate concerns, so why are we wasting time discussing
> > parsing when the problem at hand is not parsing but caching?
> >
> > Writing the partially parsed buffer was a solution to caching. Do we
> > have any other alternatives? If so, in short, what are they?
> >
> > Samisa...

---------------------------------------------------------------------
To unsubscribe, e-mail: [EMAIL PROTECTED]
For additional commands, e-mail: [EMAIL PROTECTED]
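Returning to the base64 point earlier in the thread (keeping the staging buffer a multiple of 4 so that each flush decodes only whole quads): a minimal sketch of that idea, with hypothetical names, decoding complete 4-character groups and reporting the leftover tail to be carried into the next read. This is an illustration, not the Axis2/C implementation.

```c
#include <stddef.h>

/* Map one base64 character to its 6-bit value; -1 for '=' padding
 * or anything invalid. */
static int b64_val(char c)
{
    if (c >= 'A' && c <= 'Z') return c - 'A';
    if (c >= 'a' && c <= 'z') return c - 'a' + 26;
    if (c >= '0' && c <= '9') return c - '0' + 52;
    if (c == '+') return 62;
    if (c == '/') return 63;
    return -1;
}

/* Decode only the complete quads in in[0..inlen).  Returns the number
 * of bytes written to out; *rest receives the count of unconsumed tail
 * characters (an incomplete quad) the caller must prepend to the next
 * buffer.  Keeping the staging buffer a multiple of 4 makes *rest zero
 * on every flush except possibly the last. */
static size_t b64_decode_quads(const char *in, size_t inlen,
                               unsigned char *out, size_t *rest)
{
    size_t quads = inlen / 4, n = 0;
    for (size_t q = 0; q < quads; q++) {
        const char *p = in + 4 * q;
        int v0 = b64_val(p[0]), v1 = b64_val(p[1]);
        int v2 = b64_val(p[2]), v3 = b64_val(p[3]);
        out[n++] = (unsigned char)((v0 << 2) | (v1 >> 4));
        if (v2 >= 0)
            out[n++] = (unsigned char)((v1 << 4) | (v2 >> 2));
        if (v2 >= 0 && v3 >= 0)
            out[n++] = (unsigned char)((v2 << 6) | v3);
    }
    *rest = inlen - 4 * quads;
    return n;
}
```

Because four base64 characters always decode to exactly three bytes, flushing on quad boundaries means the file receives correctly decoded binary data with no partial groups, which is what makes the flush transparent to the user.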
