Hi Jukka,

So far I thought 'values' were quite a simple concept, and with some exceptions (for example the problems with streams) I thought it should be possible to implement them in a simple way. To tell you the truth, I don't understand the class diagram. For me it is very complex, and I'm confused. For example, I don't understand why you made the distinction between initial and committed values. Also, I don't understand what GenericValue is for (especially the hashCode method of this class seems strange; it looks very inefficient because it always creates a String and doesn't cache the hash value). Other things that are inefficient are the CommittedStringValue.getXYZ() methods, because they always create Parser objects. But I just don't understand the concept; I'm not saying it is 'wrong' (maybe we can guarantee that these methods are not used a lot).
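Just to illustrate the hashCode concern: what I'd expect is a lazily cached hash, the same trick java.lang.String uses, instead of building a String on every call. A rough sketch (CachedHashValue is a made-up name for illustration, not a class from the patch):

```java
// Hypothetical sketch of caching the hash code instead of recomputing it on
// every call. The class and method names are illustrative assumptions only.
class CachedHashValue {

    private final String value;

    // 0 means "not yet computed", the same convention java.lang.String uses.
    private int hash;

    CachedHashValue(String value) {
        this.value = value;
    }

    public String getString() {
        return value;
    }

    @Override
    public int hashCode() {
        int h = hash;
        if (h == 0) {
            // Computed at most once per instance (for non-empty values),
            // instead of creating intermediate objects on every call.
            h = value.hashCode();
            hash = h;
        }
        return h;
    }

    @Override
    public boolean equals(Object obj) {
        return obj instanceof CachedHashValue
                && value.equals(((CachedHashValue) obj).value);
    }
}
```

That way repeated hashCode() calls are cheap, which matters if values end up as keys in hash-based collections.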
By the way, one concept I don't currently see in the value factory implementation is caching of commonly used values (except for CommittedTrueValue / CommittedFalseValue). In my experience this improves performance and saves a lot of memory, because it is very common for a small set of values to be used in many places. I could implement such a feature later on if it is not implemented yet.

SerializableInputStream: if I understand it correctly, the old implementation allowed very large binary values (that don't fit in memory), and the new implementation does not? Please correct me if I'm wrong. If this is correct, then for me this alone is reason enough not to apply the patch. I think the ability to store / retrieve very large binary values is very important. I understand it is 'better' not to create temp files on the client side (for speed, to avoid security problems, and to avoid having to deal with creating / deleting temp files), but if you want to consume the value multiple times (for example, because it is used in multiple places), it is probably the easiest solution. Another case is high availability clustering, where the value is sent to multiple servers (I understand there is no plan to support that in the near future).

Please don't view my comments as criticism. I just don't understand the concept behind it, and the reasons for doing things the way you did. I'm for adding the test cases, of course! Test cases are always good.

Thomas

On 12/22/06, Jukka Zitting <[EMAIL PROTECTED]> wrote:
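As an aside, the value-caching idea mentioned above could be sketched roughly like this: a small interning cache so that equal values share one instance. ValueCache, its size limit, and the use of plain Strings here are all assumptions for illustration, not part of the actual value factory:

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical sketch of interning commonly used values so duplicates share
// memory. A bounded LRU map keeps the cache from growing without limit.
class ValueCache {

    private static final int MAX_ENTRIES = 1024;

    // Access-ordered LinkedHashMap gives simple LRU eviction.
    private final Map<String, String> cache =
            new LinkedHashMap<String, String>(16, 0.75f, true) {
                @Override
                protected boolean removeEldestEntry(Map.Entry<String, String> eldest) {
                    return size() > MAX_ENTRIES;
                }
            };

    /** Returns a shared instance for equal values. */
    public synchronized String intern(String value) {
        String cached = cache.get(value);
        if (cached == null) {
            cache.put(value, value);
            cached = value;
        }
        return cached;
    }
}
```

With something like this, a repository that stores the same string in a million properties keeps one instance in memory instead of a million.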
Hi,

On 12/22/06, Thomas Mueller <[EMAIL PROTECTED]> wrote:
> > The rationale for proposing a revolutionary rewrite rather than incrementally
> > improving the existing Value implementation is that the basic design of the
> > existing implementation doesn't allow easy extension or customization.
>
> What kind of extension / customization do you have in mind? I'm just
> curious... SPI?

SPI might be a good candidate, though I was especially thinking of implementations where you'd rather use a custom adapter to an internal value representation instead of one of the existing committed value classes. The State pattern in the proposed implementation nicely separates the internal value representation from the value state behaviour, making it easy to implement custom value backends.

For example, I've been thinking about a simple SystemViewRepository implementation that would expose a system view XML document through the JCR API. It would make sense for such an implementation to implement Values as adapters of the sv:value nodes in the DOM tree.

> > > the stream data is materialized in memory during de-/serialization;
> > > this renders it imo unusable for large streams.
> >
> > Value serialization should never be used by Jackrabbit core; it's included for
> > other applications like JCR-RMI.
>
> So you mean JCR-RMI would serialize streams? Does it do that now?

JCR-RMI currently uses temporary files for deserialized binary values. I'm not too happy about that; a better approach would be to use a RemoteInputStream to stream the data over the network on demand.

> More generally, is there another way to (more or less) efficiently access a
> JCR repository remotely and store / read streams, and what is the plan
> for the future? Or is this the goal of the SPI project (from what I
> read, I'm not sure any more)?

I think the WebDAV server is already pretty good in that respect. Implementing the JCR API on top of the WebDAV interface is one of the goals of the SPI effort.
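To make the State pattern idea above concrete, here's a minimal sketch of the separation I mean. None of these names (ValueState, InMemoryState, StatefulValue) come from the actual patch; they are illustrative assumptions, and a custom backend would only have to supply its own state class:

```java
// Minimal sketch of the State pattern: the value object delegates all
// behaviour to a state object, so the internal representation is pluggable.
interface ValueState {
    String getString();
}

// One possible backend: the value lives in memory.
class InMemoryState implements ValueState {
    private final String data;

    InMemoryState(String data) {
        this.data = data;
    }

    public String getString() {
        return data;
    }
}

// The value object itself knows nothing about where the data comes from.
// A DOM-backed state (e.g. wrapping an sv:value node) could be dropped in
// without touching this class.
class StatefulValue {
    private final ValueState state;

    StatefulValue(ValueState state) {
        this.state = state;
    }

    public String getString() {
        return state.getString();
    }
}
```

The SystemViewRepository case would then just be another ValueState implementation reading from the DOM tree.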
> > The goal of the default implementation is full semantic accuracy without extra
> > external dependencies (like to the file system)
>
> Currently I think that buffering really large (bigger than memory)
> streams to disk (temp file) on the client side is the easiest way to
> support them in a client / server environment. If you don't do that,
> then either you can't support large streams at all, or you need to
> implement special handling in the remote protocol. So you have the
> dependency there, and things get even more complicated and less
> modular compared to disk buffering. Just my opinion.

There are legitimate reasons to avoid temporary disk storage in a general purpose component like the proposed Value implementation. For example, you need to make sure that any temporary files are removed when the referencing Value is no longer used, and that the file contents cannot be read by anyone else (in some cases not even by code in the same JVM instance).

The memory limitation of serializing a binary value is essentially the same as using Value.getString() on a binary value. It's a useful feature in some cases and allows you to quickly prototype things, but you need to add special handling for binary values to make your application scale.

BR,

Jukka Zitting
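For reference, the disk-buffering approach being debated above looks roughly like this: spill the stream to a temp file so it can be consumed multiple times. SpooledBinary is a hypothetical name, and the sketch deliberately shows the cleanup problem raised in the reply (deleteOnExit is only best-effort; the caller must remember to dispose, and file permissions are not restricted here):

```java
import java.io.ByteArrayInputStream;
import java.io.File;
import java.io.FileInputStream;
import java.io.FileOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.OutputStream;

// Hedged sketch of client-side disk buffering for large binary values.
// Not the actual Jackrabbit implementation; names are assumptions.
class SpooledBinary {

    private final File spool;

    SpooledBinary(InputStream in) throws IOException {
        spool = File.createTempFile("value", ".bin");
        // Best-effort cleanup only; an explicit dispose() is still needed,
        // which is exactly the lifecycle concern mentioned above.
        spool.deleteOnExit();
        OutputStream out = new FileOutputStream(spool);
        try {
            byte[] buffer = new byte[8192];
            int n;
            while ((n = in.read(buffer)) != -1) {
                out.write(buffer, 0, n);
            }
        } finally {
            out.close();
        }
    }

    /** Each call returns a fresh stream, so the value can be read many times. */
    public InputStream getStream() throws IOException {
        return new FileInputStream(spool);
    }

    /** Removes the temp file once the value is no longer referenced. */
    public void dispose() {
        spool.delete();
    }
}
```

The trade-off in the thread is visible here: repeatable reads of arbitrarily large values, at the cost of temp-file lifecycle and access-control handling that an in-memory implementation simply doesn't have.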