[jira] Commented: (JCR-680) Improve the Value implementation
[ https://issues.apache.org/jira/browse/JCR-680?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#action_12462186 ] Jukka Zitting commented on JCR-680: --- > I agreed with stefan that it's a bit hard to understand how it works. > especially because the > inner classes modify the field of the GenericValue. Another alternative for the internal InitialValue classes would be to add a package-protected setValue() state setter in GenericValue and have the InitialValues as normal top-level classes. The current approach achieves maximum encapsulation at the expense of the extra complexity of having the InitialValues as inner classes. I'm not opposed to changing that. > to only thing i'm not happy with is the 'equals()' and 'hashCode()' > implementation. hashcode > should cache the code, and i suggest that equals delegates to the values. for > example it might > result in very poor performance if 2 date values always need to convert to > string for comparison: I'm not sure how performance-critical the Value equality and hash code operations are, thus I'd rather optimize for clarity by default. Also, see JCR-598 where the previous date optimization actually caused incorrect behaviour. > Improve the Value implementation > > > Key: JCR-680 > URL: https://issues.apache.org/jira/browse/JCR-680 > Project: Jackrabbit > Issue Type: Improvement > Components: core >Reporter: Jukka Zitting > Assigned To: Jukka Zitting >Priority: Minor > Attachments: class.jpg, GenericValue.patch, GenericValue.tar.gz, > JCR-680.patch > > > The current Value implementation found in jackrabbit-jcr-commons has some > deficiencies like Value.equals() being incorrect in some cases (see for > example JCR-320), and Name and Path values not following namespace remappings. -- This message is automatically generated by JIRA. - If you think it was sent incorrectly contact one of the administrators: https://issues.apache.org/jira/secure/Administrators.jspa - For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] Commented: (JCR-680) Improve the Value implementation
[ https://issues.apache.org/jira/browse/JCR-680?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#action_12462176 ] Tobias Bocanegra commented on JCR-680: -- I agreed with stefan that it's a bit hard to understand how it works. especially because the inner classes modify the field of the GenericValue. however, it seems to make the other values classes much leaner :-) and the stream/value ambiguity is nicely solved this this pattern. to only thing i'm not happy with is the 'equals()' and 'hashCode()' implementation. hashcode should cache the code, and i suggest that equals delegates to the values. for example it might result in very poor performance if 2 date values always need to convert to string for comparison: public boolean equals(Object other) { if (this == other) { return true; } if (other instanceof GenericValue) { GenericValue o = (GenericValue) other; return o.value == value || value.equals(o.value); } else if (other instanceof Value) { return value.equals((Value) other); } else { return false; } } and have all the optimized equals in the value classes. > Improve the Value implementation > > > Key: JCR-680 > URL: https://issues.apache.org/jira/browse/JCR-680 > Project: Jackrabbit > Issue Type: Improvement > Components: core >Reporter: Jukka Zitting > Assigned To: Jukka Zitting >Priority: Minor > Attachments: class.jpg, GenericValue.patch, GenericValue.tar.gz, > JCR-680.patch > > > The current Value implementation found in jackrabbit-jcr-commons has some > deficiencies like Value.equals() being incorrect in some cases (see for > example JCR-320), and Name and Path values not following namespace remappings. -- This message is automatically generated by JIRA. - If you think it was sent incorrectly contact one of the administrators: https://issues.apache.org/jira/secure/Administrators.jspa - For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] Commented: (JCR-680) Improve the Value implementation
[ https://issues.apache.org/jira/browse/JCR-680?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#action_12462051 ] Jukka Zitting commented on JCR-680: --- > it might be just my lack of understanding but the proposed redesign > is IMO more complex and harder to grasp than gthe current implementation. I would argue for my proposal (GenericValue.patch) with the following: a) The proposed Value implementation (counting the essential *Value classes + ValueFactoryImpl) totals 379 NCSS (non-commenting source statements), a notable reduction (-39%) from the current 619 NCSS. Also the number of methods per class is down as well as virtually all method-level code complexity metrics. b) All the member variables of instantiated objects are always used in the proposed implementation, whereas almost all the member variables of the current implementation are unused in some object states. c) Related to b), the proposed implementation contains only a single mutable member variable (GenericValue.value) and only the GenericValue instances are mutable. The current implementation has four mutable member variables and all the *Value instances are mutable. d) The proposed implementation follows a well known and documented design pattern. e) The proposed implementation fixes JCR-320 trivially. > i am not against redesigning the Value implementations in general but > i frankly fail to see the net benefit of the proposed redesign. it's harder > to understand (at least for me ;) I'm not too committed to the proposal, so it's OK to me not to apply it if the above rationale is not convincing enough to justify the drawbacks of changing existing code without any immediate need other than JCR-320 (which is quite minor). > btw: the patch is still quite big and quite difficult to read. Check the attached GenericValue.tar.gz file, it contains the proposed versions of the modified classes. I find the resulting source files much easier to read in this case since a large part of the patch is simply removing code that's no longer needed. > Improve the Value implementation > > > Key: JCR-680 > URL: https://issues.apache.org/jira/browse/JCR-680 > Project: Jackrabbit > Issue Type: Improvement > Components: core >Reporter: Jukka Zitting > Assigned To: Jukka Zitting >Priority: Minor > Attachments: class.jpg, GenericValue.patch, GenericValue.tar.gz, > JCR-680.patch > > > The current Value implementation found in jackrabbit-jcr-commons has some > deficiencies like Value.equals() being incorrect in some cases (see for > example JCR-320), and Name and Path values not following namespace remappings. -- This message is automatically generated by JIRA. - If you think it was sent incorrectly contact one of the administrators: https://issues.apache.org/jira/secure/Administrators.jspa - For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] Commented: (JCR-680) Improve the Value implementation
[ https://issues.apache.org/jira/browse/JCR-680?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#action_12461995 ] Stefan Guggisberg commented on JCR-680: --- > This patch fixes JCR-320 and notably simplifies the Value classes. it might be just my lack of understanding but the proposed redesign is IMO more complex and harder to grasp than gthe current implementation. i am not against redesigning the Value implementations in general but i frankly fail to see the net benefit of the proposed redesign. it's harder to understand (at least for me ;) btw: the patch is still quite big and quite difficult to read. > Improve the Value implementation > > > Key: JCR-680 > URL: https://issues.apache.org/jira/browse/JCR-680 > Project: Jackrabbit > Issue Type: Improvement > Components: core >Reporter: Jukka Zitting > Assigned To: Jukka Zitting >Priority: Minor > Attachments: class.jpg, GenericValue.patch, JCR-680.patch > > > The current Value implementation found in jackrabbit-jcr-commons has some > deficiencies like Value.equals() being incorrect in some cases (see for > example JCR-320), and Name and Path values not following namespace remappings. -- This message is automatically generated by JIRA. - If you think it was sent incorrectly contact one of the administrators: https://issues.apache.org/jira/secure/Administrators.jspa - For more information on JIRA, see: http://www.atlassian.com/software/jira
Re: [jira] Commented: (JCR-680) Improve the Value implementation
Hi, On 12/27/06, Thomas Mueller <[EMAIL PROTECTED]> wrote: So far I thought 'values' is quite a simple concept, and with some exceptions (for example the problems with streams) I thought it should be possible to implement it in a simple way. To tell you the truth, I don't understand the class diagram. For me, it is very complex, I'm confused. For example, I don't understand why you made the distinction between initial and committed values? I'm using the State pattern to localize the Value state transitions to the Generic and Initial value classes. The current approach keeps the state in an explicit BaseValue.state member value and requires explicit setValueConsumed() calls in subclasses to keep the state updated. The proposed implementation simplifies this by removing all state handling form the committed value classes, leaving only the responsibilities of value representation and conversion to the committed values. Also I don't understand what GenericValue is for (specially the hashCode method of this class seems strange, it looks like it is very inefficient because it is always creates a String and doesn't cache the hash value). GenericValue implements the context part of the State pattern. It is essentially a wrapper around the GenericValue.value member variable that points to the current Value state. The state can either be an "initial" value (in case a value getter has not yet been called) or a "committed" value (in case a getter has already been called). The difference between those states is that an "initial" value can still be converted from stream to non-stream or vice versa, but a "committed" value can not. Caching the hash value would of course be possible if better performance is required, but I don't think Values are used that much as hash keys, so I think that simplicity is more important in this case. The getString() call is required in hashCode() to achieve proper state behaviour and to comply with the equals() contract. Other things that are inefficient are CommittedStringValue.getXYZ() because those methods always create Parser objects. But I just don't understand the concept, I'm not saying it is 'wrong' (maybe we can guarantee that these functions are not used a lot). I wanted to localize the value conversion mechanisms as much as possible to avoid duplicating code. Since an instance of the ValueParser class is essentially just a wrapper for the String to be parsed, I would expect a decent JVM to optimize away the allocation of the object (or to just allocate it as a temporary object on stack). By the way, one concept I don't currently see in the value factory implementation is caching of commonly used values (except for CommittedTrueValue / CommittedFalseValue). In my experience, this improves the performance and saves a lot of memory because it is very common that a small set of values is used in many places. I could implement such a feature later on if it is not implemented yet. Good point, and actually something that couldn't be possible without separating the "committed" value state. It would be nice to see some performance numbers before doing that though, since a generational garbage collector allows amazingly good performance with large numbers of small objects. SerializableInputStream: if I understand it correctly, then the old implementation allowed very large (that don't fit in memory) binary values, and the new implementation does not? Please correct me if I'm wrong. You're wrong. The whole stream is consumed in memory only when a Value gets serialized, something that the current implementation doesn't even support. In any case both implementations are even required to consume the entire stream if getString() is called. In normal circumstances (i.e. only getStream() called on binary values), the proposed implementation simply passes along the InputStream received in ValueFactory.createValue(InputStream) just like the current implementation does. Note however that simply doing that in ValueFactory.createValue(InputStream) is most likely incorrect as the client that makes the call has currently no way of knowing when or who should close the stream to reclaim any resources (file handles, etc.) associated with the stream. The proposed Value implementation does not attempt to solve this isssue. PS. I should probably present the proposed changes as a sequence of incremental changes than as a single revolutionary patch. BR, Jukka Zitting
Re: [jira] Commented: (JCR-680) Improve the Value implementation
Hi Jukka, So far I thought 'values' is quite a simple concept, and with some exceptions (for example the problems with streams) I thought it should be possible to implement it in a simple way. To tell you the truth, I don't understand the class diagram. For me, it is very complex, I'm confused. For example, I don't understand why you made the distinction between initial and committed values? Also I don't understand what GenericValue is for (specially the hashCode method of this class seems strange, it looks like it is very inefficient because it is always creates a String and doesn't cache the hash value). Other things that are inefficient are CommittedStringValue.getXYZ() because those methods always create Parser objects. But I just don't understand the concept, I'm not saying it is 'wrong' (maybe we can guarantee that these functions are not used a lot). By the way, one concept I don't currently see in the value factory implementation is caching of commonly used values (except for CommittedTrueValue / CommittedFalseValue). In my experience, this improves the performance and saves a lot of memory because it is very common that a small set of values is used in many places. I could implement such a feature later on if it is not implemented yet. SerializableInputStream: if I understand it correctly, then the old implementation allowed very large (that don't fit in memory) binary values, and the new implementation does not? Please correct me if I'm wrong. If this is would be correct, then for me this alone is reason enough not to apply the patch. I think the ability to store / retrieve very large binary values is very important. I understand it is 'better' not to create temp files on the client side (for speed for example, and to avoid security problems and avoid having to deal with creating / deleting temp files), but if you want to consume the value multiple times (for example, because it is used in multiple places), it is probably the easiest solution. Another case is for high availability clustering, if the value is sent to multiple servers (I understand there is no plan to support that in the near future). Please don't view my comments as critics. I just don't understand the concept behind it, and the reasons for doing the things like you did. I'm for adding the test cases of course! Test cases are always good. Thomas On 12/22/06, Jukka Zitting <[EMAIL PROTECTED]> wrote: Hi, On 12/22/06, Thomas Mueller <[EMAIL PROTECTED]> wrote: > > The rationale for proposing a revolutionary rewrite rather than incrementally > > improving the existing Value implementation is that the basic design of the > > existing implementation doesn't allow easy extension or customization. > > What kind of extension / customization do you have in mind? I'm just > curious... SPI? SPI might be a good candidate, though I was especially thinking of implementations where you'd rather use a custom adapter to an internal value representation instead of one of the existing the committed value classes. The State pattern in the proposed implementation nicely separates the internal value representation from the value state behaviour, making it easy to implement custom value backends. For example, I've been thinking about a simple SystemViewRepository implementation that would expose a system view XML document through the JCR API. It would make sense for such an implementation to implement Values as adapters of the sv:value nodes in the DOM tree. > > > the stream data is materialized in memory during de-/serialization; > > > this renders it imo unusable for large streams. > > Value serialization should never be used by Jackrabbit core, it's included for > > other applications like JCR-RMI. > > So you mean the JCR-RMI would serialize streams? Does it do that now? JCR-RMI currently uses temporary files for deserialized binary values. I'm not too happy about that, a better approach would be to use a RemoteInputStream to stream the data over the network on-demand. > More generally, is there another way to (more or less) efficiently access a > JCR repository remotely and store / read streams, and what is the plan > for the future? Or is this the goal of the SPI project (from what I > read, I'm not sure any more)? I think the WebDAV server is already pretty good in that respect. Implementing the JCR API on top of the WebDAV interface is one of the goals of the SPI effort. > > The goal of the default implementation is full semantic accuracy without extra > > external dependencies (like to the file system) > > Currently I think that buffering really large (bigger than memory) > streams to disk (temp file) on the client side is the easiest way to > support them in a client / server environment. If you don't do that, > then either you can't support large streams at all, or you need to > implement special handling in the remote protocol. So you have the > dependency there, and things get even more complicated and less > modular
Re: [jira] Commented: (JCR-680) Improve the Value implementation
Hi, On 12/22/06, Thomas Mueller <[EMAIL PROTECTED]> wrote: > The rationale for proposing a revolutionary rewrite rather than incrementally > improving the existing Value implementation is that the basic design of the > existing implementation doesn't allow easy extension or customization. What kind of extension / customization do you have in mind? I'm just curious... SPI? SPI might be a good candidate, though I was especially thinking of implementations where you'd rather use a custom adapter to an internal value representation instead of one of the existing the committed value classes. The State pattern in the proposed implementation nicely separates the internal value representation from the value state behaviour, making it easy to implement custom value backends. For example, I've been thinking about a simple SystemViewRepository implementation that would expose a system view XML document through the JCR API. It would make sense for such an implementation to implement Values as adapters of the sv:value nodes in the DOM tree. > > the stream data is materialized in memory during de-/serialization; > > this renders it imo unusable for large streams. > Value serialization should never be used by Jackrabbit core, it's included for > other applications like JCR-RMI. So you mean the JCR-RMI would serialize streams? Does it do that now? JCR-RMI currently uses temporary files for deserialized binary values. I'm not too happy about that, a better approach would be to use a RemoteInputStream to stream the data over the network on-demand. More generally, is there another way to (more or less) efficiently access a JCR repository remotely and store / read streams, and what is the plan for the future? Or is this the goal of the SPI project (from what I read, I'm not sure any more)? I think the WebDAV server is already pretty good in that respect. Implementing the JCR API on top of the WebDAV interface is one of the goals of the SPI effort. > The goal of the default implementation is full semantic accuracy without extra > external dependencies (like to the file system) Currently I think that buffering really large (bigger than memory) streams to disk (temp file) on the client side is the easiest way to support them in a client / server environment. If you don't do that, then either you can't support large streams at all, or you need to implement special handling in the remote protocol. So you have the dependency there, and things get even more complicated and less modular compared to disk buffering. Just my opinion. There are legitimate reasons to avoid temporary disk storage in a general purpose component like the proposed Value implementation. For example, you need to make sure that any temporary files are removed when the referencing Value is no longer used and that file contents can not be read by anyone else (in some cases not even by code in the same JVM instance). The limitation on memory use of serializing a binary value is essentially the same as using Value.getString() on a binary value. It's a useful feature in some cases and allows you to quickly prototype things, but you need to add special handling to binary values to make your application scale. BR, Jukka Zitting
Re: [jira] Commented: (JCR-680) Improve the Value implementation
Hi, The rationale for proposing a revolutionary rewrite rather than incrementally improving the existing Value implementation is that the basic design of the existing implementation doesn't allow easy extension or customization. What kind of extension / customization do you have in mind? I'm just curious... SPI? > the stream data is materialized in memory during de-/serialization; > this renders it imo unusable for large streams. Value serialization should never be used by Jackrabbit core, it's included for other applications like JCR-RMI. So you mean the JCR-RMI would serialize streams? Does it do that now? I just think this is / would be quite a limitation for JCR-RMI. I'm not sure about the use case for JCR-RMI and how long it is here to stay, maybe it is OK if the functionality is / will be limited. More generally, is there another way to (more or less) efficiently access a JCR repository remotely and store / read streams, and what is the plan for the future? Or is this the goal of the SPI project (from what I read, I'm not sure any more)? The goal of the default implementation is full semantic accuracy without extra external dependencies (like to the file system) Currently I think that buffering really large (bigger than memory) streams to disk (temp file) on the client side is the easiest way to support them in a client / server environment. If you don't do that, then either you can't support large streams at all, or you need to implement special handling in the remote protocol. So you have the dependency there, and things get even more complicated and less modular compared to disk buffering. Just my opinion. Thomas
[jira] Commented: (JCR-680) Improve the Value implementation
[ http://issues.apache.org/jira/browse/JCR-680?page=comments#action_12460456 ] Jukka Zitting commented on JCR-680: --- > > * Support for namespace remappings in NAME and PATH values > > is this a requirement? IMO the spec doesn't mandate it. The spec says (6.2.5.2): "A NAME is a pairing of a namespace and a local name. It must be handled internally in such a way that when read through the API the namespace is mapped to the current prefix." It can of course be argued that "read through the API" means Property.getValue() instead of Value.getString(), but then having Value.getType() return PropertyType.NAME wouldn't really make sense. > (btw huge!) Agreed, it's rather large. The amount of code is however about the same as in the current value implementation. The rationale for proposing a revolutionary rewrite rather than incrementally improving the existing Value implementation is that the basic design of the existing implementation doesn't allow easy extension or customization. > - SerializableInputStream: > the stream data is materialized in memory during de-/serialization; > this renders it imo unusable for large streams. in the entire core > streams are never unconditionally materialized. Value serialization should never be used by Jackrabbit core, it's included for other applications like JCR-RMI. Such applications should consider case-by-case whether custom serialization of binary values is needed. The goal of the default implementation is full semantic accuracy without extra external dependencies (like to the file system). It is easy for an application to override the default behaviour by subclassing CommittedValueFactory. > - ValueParser: > the getDate() method does not preserve the time zone in the > returned Calendar object. the current implementation internally > uses to ISO8601 utilty class which preserves the time zone > information. what was the rationale of not using the ISO8601 class? I think it's generally more useful for a client to receive all date values in the default timezone of the system. That's just my opinion though, preserving the time zone information would be no problem. I used the ISO8601 class as the basis of the Date parsing in ValueParser and formatting in CommittedDateValue, the methods there are streamlined versions of the parse and format methods in ISO8601. Since the ISO8601 date formats are not used elsewhere and people can just as well use ValueFactory for all format conversions, I don't think having the formatting in a separate utility class is needed. > Improve the Value implementation > > > Key: JCR-680 > URL: http://issues.apache.org/jira/browse/JCR-680 > Project: Jackrabbit > Issue Type: Improvement > Components: core >Reporter: Jukka Zitting > Assigned To: Jukka Zitting >Priority: Minor > Attachments: class.jpg, JCR-680.patch > > > The current Value implementation found in jackrabbit-jcr-commons has some > deficiencies like Value.equals() being incorrect in some cases (see for > example JCR-320), and Name and Path values not following namespace remappings. -- This message is automatically generated by JIRA. - If you think it was sent incorrectly contact one of the administrators: http://issues.apache.org/jira/secure/Administrators.jspa - For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] Commented: (JCR-680) Improve the Value implementation
[ http://issues.apache.org/jira/browse/JCR-680?page=comments#action_12460441 ] Stefan Guggisberg commented on JCR-680: --- > * Support for namespace remappings in NAME and PATH values is this a requirement? IMO the spec doesn't mandate it. a first rough run through the (btw huge!) patch revealed 2 issues: - SerializableInputStream: the stream data is materialized in memory during de-/serialization; this renders it imo unusable for large streams. in the entire core streams are never unconditionally materialized. - ValueParser: the getDate() method does not preserve the time zone in the returned Calendar object. the current implementation internally uses to ISO8601 utilty class which preserves the time zone information. what was the rationale of not using the ISO8601 class? > Improve the Value implementation > > > Key: JCR-680 > URL: http://issues.apache.org/jira/browse/JCR-680 > Project: Jackrabbit > Issue Type: Improvement > Components: core >Reporter: Jukka Zitting > Assigned To: Jukka Zitting >Priority: Minor > Attachments: class.jpg, JCR-680.patch > > > The current Value implementation found in jackrabbit-jcr-commons has some > deficiencies like Value.equals() being incorrect in some cases (see for > example JCR-320), and Name and Path values not following namespace remappings. -- This message is automatically generated by JIRA. - If you think it was sent incorrectly contact one of the administrators: http://issues.apache.org/jira/secure/Administrators.jspa - For more information on JIRA, see: http://www.atlassian.com/software/jira