My changes are in a the feature-attachment-outputstream branch
https://github.com/xwiki/xwiki-platform/commits/feature-attachment-outputstream

IMO the Hibernate blob handling is never really going to work in a 
cross-platform way.
Even if that is not true, it absolutely won't work with Cassandra.

For loading an attachment piecewise, OutputStreams are not the only possible 
solution
but they are (in my analysis) the best.

If we hold off on adding the API because there is no consumer in platform 
proper, I'll
have to code around it and find another (inferior) solution and if piecewise 
attachment
loading ever does reach platform proper, everything will be developed and 
debugged around
inferior solution.

That's my case for it, let me know if you have any specific concerns or if your 
veto is on the general principle.
I want to get an idea of whether this should be cleaned up or killed quickly.

Thanks,

Caleb



On 08/07/2012 11:31 PM, Caleb James DeLisle wrote:
> 
> 
> On 08/07/2012 10:52 PM, Sergiu Dumitriu wrote:
>> On 08/07/2012 03:44 PM, Sergiu Dumitriu wrote:
>>> On 08/07/2012 03:43 PM, Denis Gervalle wrote:
>>>>  From the current interface, I would use getContentOutputStream(), since
>>>> this would be the opposite of getContentInputStream(). This seems to me
>>>> very descriptive compare to a setContent() returning an OutputStream,
>>>> since
>>>> a set is not supposed to return anything. I would use
>>>> setContent(OutputStream) if it was your goal, but this probably not what
>>>> you expect.
>>>>
>>>> So, +1 for  getContentOutputStream()
>>>
>>> +1 as well.
>>
>> Actually, one thing I don't like is that the API is getting even bigger, 
>> with lots of different ways of doing the same thing. Ideally, there should 
>> be only one way of providing the content, and only one way of retrieving it. 
>> And IMO working with byte streams is the right way, and I think that the 
>> current setContent(InputStream) method is the one that's best.
>>
>> Returning an OutputStream where the caller can write the content is opening 
>> the door to lots of potential problems. Providing such a method becomes API, 
>> and APIs should be well designed. And an OutputStream in my head means some 
>> implicit constraints that must be well documented:
>>
> 
> I can answer for the implementation which I wrote as a result of this 
> discussion.
> 
>> - should the stream be closed at the end?
> 
> Yes, if it is not closed then the content does not become "live".
> 
>> - must the response be written before a transaction ends?
> 
> It doesn't matter when it is written since it's just dealing with FileItems.
> 
>> - what if nothing is written in the stream, does that mean that the old 
>> content is preserved, or that the attachment is going to be truncated?
> 
> The attachment becomes is made empty. It is legal to have a 0 byte file.
> 
>>
>> Basically, this is a _push_ API, and I for one prefer _pull_ APIs, since the 
>> backend will get the data when it needs it, it doesn't have to document how 
>> long it's going to wait for something to be pushed.
>>
>>
>> I agree that our current way of dealing with Hibernate is not right. We 
>> should use proper blob streams for big data, like attachment content. And 
>> Hibernate's LobCreator expects a Blob object that offers access to its 
>> content via an InputStream that can be read. Again, Hibernate expects an 
>> input stream from which it will read when it needs to, it doesn't allow you 
>> to write the content into an OutputStream when you want.
> 
> There are a few issues here. When I investigated this, Hibernate's blob 
> handling was (still is?) just a wrapper around blob management in the 
> database which basically doesn't work.
> Oracle has a proprietary blob storage method which works with streams, I 
> think Hibernate does not support this though.
> As I remember, mysql and postgres have blob APIs but they buffer everything 
> in ram making it nonsensical.
> This was investigated to solve attachment memory consumption issues and it 
> was concluded that it does not work, leading to FS attachment store.
> 
> A solution which will work is to piece out the attachment and store each 
> piece separately, flushing the PersistenceManager or Session between 
> save/load operations.
> they can be run in discrete transactions as long as the metadata contains a 
> pointer to the pieces and the old pieces are not removed until after all of 
> the new pieces are saved.
> On the load, you are loading a list of attachment pieces one by one and their 
> content must be written out before it is removed from memory.
> 
> The options for making this work are:
> 
> * copy the content into a temp file then copy it again the InputStream from 
> that file. This double buffers the content.
> * use PipedOutputStream to bounce it between two threads. (which I am doing 
> now but need to debug a problem with it.)
> * extend XWikiAttachmentContent for this use case.
> * add a method to XWikiAttachmentContent which allows for writing to it using 
> an OutputStream.
> 
> Since this pattern would be applicable to Hibernate and the technology which 
> came from Cassandra attachments
> could be ported into the main trunk, I wanted to do the latter.
> If you are not swayed then I will return to evaluating the former methods.
> 
> Caleb
> 
> 
>>
>>
>> So, consider this a -1 until you can convince me that it's indeed something 
>> we want to include in our APIs.
>>
>>>> +0 for getOutputStream()
>>>> -0 for other previous proposals.
>>>>
>>>> On Tue, Aug 7, 2012 at 9:01 PM, Caleb James DeLisle <
>>>> [email protected]> wrote:
>>>>
>>>>> getOutputStream() is not very descriptive although I suppose a good
>>>>> javadoc comment would alleviate
>>>>> the issue, I wrestled with the name myself and settled on setContent()
>>>>> because it overloads the existing setContent() so it should be a bit
>>>>> easier to remember.
>>>>>
>>>>> If you guys like getOutputStream(), I'm happy enough with it.
>>>>>
>>>>> Caleb
>>>>>
>>>>>
>>>>> On 08/07/2012 03:24 AM, Thomas Mortagne wrote:
>>>>>> +1 for the idea in general but same comment than Marius
>>>>>>
>>>>>> On Tue, Aug 7, 2012 at 7:35 AM, Marius Dumitru Florea
>>>>>> <[email protected]> wrote:
>>>>>>> I understand the need and I'm +1 but I don't like the method name
>>>>>>> (neither setContent() nor addContent()). WDYT about getOutputStream()
>>>>>>> ?
>>>>>>>
>>>>>>> Thanks,
>>>>>>> Marius
>>>>>>>
>>>>>>> On Tue, Aug 7, 2012 at 12:06 AM, Caleb James DeLisle
>>>>>>> <[email protected]> wrote:
>>>>>>>> Hi,
>>>>>>>> In the development of Cassandra attachments, I found I want to
>>>>>>>> load an
>>>>> attachment one chunk at a time and
>>>>>>>> write that chunk to a provided OutputStream, this is how I envision
>>>>> next generation Hibernate attachments working too.
>>>>>>>>
>>>>>>>> I would like to add to XWikiAttachmentContent:
>>>>>>>>
>>>>>>>> public OutputStream addContent();
>>>>>>>>
>>>>>>>> which returns an OutputStream that allows writing the attachment
>>>>> content and upon close,
>>>>>>>> sets the attachment content as dirty and resets the size field.
>>>>>>>>
>>>>>>>> WDYT?
>>>>>>>>
>>>>>>>> Caleb
>>
> 
> 
> _______________________________________________
> devs mailing list
> [email protected]
> http://lists.xwiki.org/mailman/listinfo/devs
> 


_______________________________________________
devs mailing list
[email protected]
http://lists.xwiki.org/mailman/listinfo/devs

Reply via email to