Hi Jukka,

So far I have thought of 'values' as quite a simple concept, and with
some exceptions (for example the problems with streams) I thought it
should be possible to implement it in a simple way. To tell you the
truth, I don't understand the class diagram. For me, it is very
complex, and I'm confused. For example, I don't understand why you
made the distinction between initial and committed values. I also
don't understand what GenericValue is for (especially the hashCode
method of this class seems strange: it looks very inefficient because
it always creates a String and doesn't cache the hash value). Another
inefficiency is CommittedStringValue.getXYZ(), because those methods
always create Parser objects. But I just don't understand the concept;
I'm not saying it is 'wrong' (maybe we can guarantee that these
functions are not used a lot).

By the way, one concept I don't currently see in the value factory
implementation is caching of commonly used values (except for
CommittedTrueValue / CommittedFalseValue). In my experience, this
improves performance and saves a lot of memory, because it is very
common for a small set of values to be used in many places. I could
implement such a feature later on if it is not implemented yet.
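A sketch of what I mean by caching (all names are made up, and String
stands in for the real Value type to keep it minimal; a real cache
would of course need a size limit):

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

// Hypothetical sketch (class and method names are mine): interning
// commonly used values in a factory, so that many properties holding
// the same value share a single object instance.
public class InterningValueFactory {

    // String stands in for the actual Value type here.
    private final Map<String, String> cache =
            new ConcurrentHashMap<String, String>();

    public String createValue(String s) {
        // Return the already-cached instance if one exists.
        String cached = cache.putIfAbsent(s, s);
        return cached != null ? cached : s;
    }
}
```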

SerializableInputStream: if I understand it correctly, the old
implementation allowed very large binary values (ones that don't fit
in memory), and the new implementation does not? Please correct me if
I'm wrong. If this is correct, then for me this alone is reason enough
not to apply the patch. I think the ability to store / retrieve very
large binary values is very important. I understand it is 'better' not
to create temp files on the client side (for speed, for example, and
to avoid security problems and having to deal with creating / deleting
temp files), but if you want to consume the value multiple times (for
example, because it is used in multiple places), it is probably the
easiest solution. Another case is high availability clustering, where
the value is sent to multiple servers (I understand there is no plan
to support that in the near future).
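For the 'consume multiple times' case, I mean something like spooling
the stream to a temp file once, and then opening a fresh stream over
it for each consumer. This is only a sketch with invented names, not
the actual implementation:

```java
import java.io.*;

// Hypothetical sketch (class name is mine): copy an incoming stream
// once to a temp file, so the binary value can be consumed multiple
// times without keeping it all in memory.
public class SpooledBinary {

    private final File file;

    public SpooledBinary(InputStream in) throws IOException {
        file = File.createTempFile("binary", ".tmp");
        file.deleteOnExit(); // best-effort cleanup
        OutputStream out = new FileOutputStream(file);
        try {
            byte[] buffer = new byte[8192];
            int n;
            while ((n = in.read(buffer)) != -1) {
                out.write(buffer, 0, n);
            }
        } finally {
            out.close();
            in.close();
        }
    }

    // Each call returns a fresh stream over the spooled data.
    public InputStream getStream() throws IOException {
        return new FileInputStream(file);
    }

    public void dispose() {
        file.delete();
    }
}
```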

Please don't view my comments as criticism. I just don't understand
the concept behind it, and the reasons for doing things the way you
did.

I'm for adding the test cases of course! Test cases are always good.

Thomas


On 12/22/06, Jukka Zitting <[EMAIL PROTECTED]> wrote:
Hi,

On 12/22/06, Thomas Mueller <[EMAIL PROTECTED]> wrote:
> > The rationale for proposing a revolutionary rewrite rather than
> > incrementally improving the existing Value implementation is that
> > the basic design of the existing implementation doesn't allow easy
> > extension or customization.
>
> What kind of extension / customization do you have in mind? I'm just
> curious... SPI?

SPI might be a good candidate, though I was especially thinking of
implementations where you'd rather use a custom adapter to an internal
value representation instead of one of the existing committed value
classes. The State pattern in the proposed implementation nicely
separates the internal value representation from the value state
behaviour, making it easy to implement custom value backends.
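Roughly, the separation looks like this (all names below are
illustrative, not the actual classes from the patch):

```java
// Illustrative sketch of the State pattern for values: the public
// value object delegates to an internal state, so a custom backend
// only needs to provide its own state implementation.
interface ValueState {
    String getString();
}

// One possible backend: the value lives in memory.
class InMemoryState implements ValueState {
    private final String data;
    InMemoryState(String data) { this.data = data; }
    public String getString() { return data; }
}

// The public value delegates behaviour to its state; swapping the
// state object swaps the backend without touching this class.
class StatefulValue {
    private final ValueState state;
    StatefulValue(ValueState state) { this.state = state; }
    public String getString() { return state.getString(); }
}
```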

For example, I've been thinking about a simple SystemViewRepository
implementation that would expose a system view XML document through
the JCR API. It would make sense for such an implementation to
implement Values as adapters of the sv:value nodes in the DOM tree.

> > >   the stream data is materialized in memory during de-/serialization;
> > >   this renders it imo unusable for large streams.
> > Value serialization should never be used by Jackrabbit core, it's included 
for
> > other applications like JCR-RMI.
>
> So you mean the JCR-RMI would serialize streams? Does it do that now?

JCR-RMI currently uses temporary files for deserialized binary
values. I'm not too happy about that; a better approach would be to
use a RemoteInputStream to stream the data over the network on
demand.
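Sketched roughly (the interface and adapter names here are
illustrative, not actual JCR-RMI classes): the server would expose
chunked reads over RMI, and the client would wrap them in an ordinary
InputStream.

```java
import java.io.IOException;
import java.io.InputStream;
import java.rmi.Remote;
import java.rmi.RemoteException;

// Illustrative remote interface: the server hands out chunks on
// demand instead of materializing the whole stream.
interface RemoteStream extends Remote {
    // Returns the next chunk (at most max bytes), or null at end.
    byte[] read(int max) throws RemoteException;
}

// Client-side adapter that presents the remote chunks as a normal
// InputStream, fetching the next chunk only when needed.
class RemoteStreamAdapter extends InputStream {

    private final RemoteStream remote;
    private byte[] chunk = new byte[0];
    private int pos = 0;

    RemoteStreamAdapter(RemoteStream remote) {
        this.remote = remote;
    }

    public int read() throws IOException {
        while (chunk != null && pos >= chunk.length) {
            chunk = remote.read(8192); // fetch the next chunk
            pos = 0;
        }
        if (chunk == null) {
            return -1; // end of stream
        }
        return chunk[pos++] & 0xff;
    }
}
```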

> More generally, is there another way to (more or less) efficiently access a
> JCR repository remotely and store / read streams, and what is the plan
> for the future? Or is this the goal of the SPI project (from what I
> read, I'm not sure any more)?

I think the WebDAV server is already pretty good in that respect.
Implementing the JCR API on top of the WebDAV interface is one of the
goals of the SPI effort.

> > The goal of the default implementation is full semantic accuracy
> > without extra external dependencies (like to the file system)
>
> Currently I think that buffering really large (bigger than memory)
> streams to disk (temp file) on the client side is the easiest way to
> support them in a client / server environment. If you don't do that,
> then either you can't support large streams at all, or you need to
> implement special handling in the remote protocol. So you have the
> dependency there, and things get even more complicated and less
> modular compared to disk buffering. Just my opinion.

There are legitimate reasons to avoid temporary disk storage in a
general purpose component like the proposed Value implementation. For
example, you need to make sure that any temporary files are removed
when the referencing Value is no longer used and that file contents
can not be read by anyone else (in some cases not even by code in the
same JVM instance).

The limitation on memory use of serializing a binary value is
essentially the same as using Value.getString() on a binary value.
It's a useful feature in some cases and allows you to quickly
prototype things, but you need to add special handling to binary
values to make your application scale.

BR,

Jukka Zitting
