[jira] Commented: (JCR-680) Improve the Value implementation

2007-01-04 Thread Jukka Zitting (JIRA)

[ 
https://issues.apache.org/jira/browse/JCR-680?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#action_12462186
 ] 

Jukka Zitting commented on JCR-680:
---

> I agreed with stefan that it's a bit hard to understand how it works. 
> especially because the
> inner classes modify the field of the GenericValue.

Another alternative for the internal InitialValue classes would be to add a 
package-protected setValue() state setter in GenericValue and have the 
InitialValues as normal top-level classes. The current approach achieves 
maximum encapsulation at the expense of the extra complexity of having the 
InitialValues as inner classes. I'm not opposed to changing that.

> to only thing i'm not happy with is the 'equals()' and 'hashCode()' 
> implementation. hashcode
> should cache the code, and i suggest that equals delegates to the values. for 
> example it might
> result in very poor performance if 2 date values always need to convert to 
> string for comparison:

I'm not sure how performance-critical the Value equality and hash code 
operations are, thus I'd rather optimize for clarity by default. Also, see 
JCR-598 where the previous date optimization actually caused
incorrect behaviour.


> Improve the Value implementation
> 
>
> Key: JCR-680
> URL: https://issues.apache.org/jira/browse/JCR-680
> Project: Jackrabbit
>  Issue Type: Improvement
>  Components: core
>Reporter: Jukka Zitting
> Assigned To: Jukka Zitting
>Priority: Minor
> Attachments: class.jpg, GenericValue.patch, GenericValue.tar.gz, 
> JCR-680.patch
>
>
> The current Value implementation found in jackrabbit-jcr-commons has some 
> deficiencies like Value.equals() being incorrect in some cases (see for 
> example JCR-320), and Name and Path values not following namespace remappings.

-- 
This message is automatically generated by JIRA.
-
If you think it was sent incorrectly contact one of the administrators: 
https://issues.apache.org/jira/secure/Administrators.jspa
-
For more information on JIRA, see: http://www.atlassian.com/software/jira




[jira] Commented: (JCR-680) Improve the Value implementation

2007-01-04 Thread Tobias Bocanegra (JIRA)

[ 
https://issues.apache.org/jira/browse/JCR-680?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#action_12462176
 ] 

Tobias Bocanegra commented on JCR-680:
--

I agreed with stefan that it's a bit hard to understand how it works. 
especially because the inner classes modify the field of the GenericValue. 
however, it seems to make the other values classes much leaner :-) and the 
stream/value ambiguity is nicely solved this this pattern.

to only thing i'm not happy with is the 'equals()' and 'hashCode()' 
implementation. hashcode should cache the code, and i suggest that equals 
delegates to the values. for example it might result in very poor performance 
if 2 date values always need to convert to string for comparison:

public boolean equals(Object other) {
if (this == other) {
return true;
}
if (other instanceof GenericValue) {
GenericValue o = (GenericValue) other;
return o.value == value || value.equals(o.value);
} else if (other instanceof Value) {
return value.equals((Value) other);
} else {
return false;
}
}

and have all the optimized equals in the value classes.

> Improve the Value implementation
> 
>
> Key: JCR-680
> URL: https://issues.apache.org/jira/browse/JCR-680
> Project: Jackrabbit
>  Issue Type: Improvement
>  Components: core
>Reporter: Jukka Zitting
> Assigned To: Jukka Zitting
>Priority: Minor
> Attachments: class.jpg, GenericValue.patch, GenericValue.tar.gz, 
> JCR-680.patch
>
>
> The current Value implementation found in jackrabbit-jcr-commons has some 
> deficiencies like Value.equals() being incorrect in some cases (see for 
> example JCR-320), and Name and Path values not following namespace remappings.

-- 
This message is automatically generated by JIRA.
-
If you think it was sent incorrectly contact one of the administrators: 
https://issues.apache.org/jira/secure/Administrators.jspa
-
For more information on JIRA, see: http://www.atlassian.com/software/jira




[jira] Commented: (JCR-680) Improve the Value implementation

2007-01-03 Thread Jukka Zitting (JIRA)

[ 
https://issues.apache.org/jira/browse/JCR-680?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#action_12462051
 ] 

Jukka Zitting commented on JCR-680:
---

> it might be just my lack of understanding but the proposed redesign
> is IMO more complex and harder to grasp than gthe current implementation. 

I would argue for my proposal (GenericValue.patch) with the following:

a) The proposed Value implementation (counting the essential *Value classes + 
ValueFactoryImpl) totals 379 NCSS (non-commenting source statements), a notable 
reduction (-39%) from the current 619 NCSS. Also the number of methods per 
class is down as well as virtually all method-level code complexity metrics.

b) All the member variables of instantiated objects are always used in the 
proposed implementation, whereas almost all the member variables of the current 
implementation are unused in some object states.

c) Related to b), the proposed implementation contains only a single mutable 
member variable (GenericValue.value) and only the GenericValue instances are 
mutable. The current implementation has four mutable member variables and all 
the *Value instances are mutable.

d) The proposed implementation follows a well known and documented design 
pattern.

e) The proposed implementation fixes JCR-320 trivially.

> i am not against redesigning the Value implementations in general but
> i frankly fail to see the net benefit of the proposed redesign. it's harder
> to understand (at least for me ;)

I'm not too committed to the proposal, so it's OK to me not to apply it if the 
above rationale is not convincing enough to justify the drawbacks of changing 
existing code without any immediate need other than JCR-320 (which is quite 
minor).

> btw: the patch is still quite big and quite difficult to read.

Check the attached GenericValue.tar.gz file, it contains the proposed versions 
of the modified classes. I find the resulting source files much easier to read 
in this case since a large part of the patch is simply removing code that's no 
longer needed.


> Improve the Value implementation
> 
>
> Key: JCR-680
> URL: https://issues.apache.org/jira/browse/JCR-680
> Project: Jackrabbit
>  Issue Type: Improvement
>  Components: core
>Reporter: Jukka Zitting
> Assigned To: Jukka Zitting
>Priority: Minor
> Attachments: class.jpg, GenericValue.patch, GenericValue.tar.gz, 
> JCR-680.patch
>
>
> The current Value implementation found in jackrabbit-jcr-commons has some 
> deficiencies like Value.equals() being incorrect in some cases (see for 
> example JCR-320), and Name and Path values not following namespace remappings.

-- 
This message is automatically generated by JIRA.
-
If you think it was sent incorrectly contact one of the administrators: 
https://issues.apache.org/jira/secure/Administrators.jspa
-
For more information on JIRA, see: http://www.atlassian.com/software/jira




[jira] Commented: (JCR-680) Improve the Value implementation

2007-01-03 Thread Stefan Guggisberg (JIRA)

[ 
https://issues.apache.org/jira/browse/JCR-680?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#action_12461995
 ] 

Stefan Guggisberg commented on JCR-680:
---

> This patch fixes JCR-320 and notably simplifies the Value classes.

it might be just my lack of understanding but the proposed redesign 
is IMO more complex and harder to grasp than gthe current implementation.

i am not against redesigning the Value implementations in general but
i frankly fail to see the net benefit of the proposed redesign. it's harder
to understand (at least for me ;) 

btw: the patch is still quite big and quite difficult to read.

> Improve the Value implementation
> 
>
> Key: JCR-680
> URL: https://issues.apache.org/jira/browse/JCR-680
> Project: Jackrabbit
>  Issue Type: Improvement
>  Components: core
>Reporter: Jukka Zitting
> Assigned To: Jukka Zitting
>Priority: Minor
> Attachments: class.jpg, GenericValue.patch, JCR-680.patch
>
>
> The current Value implementation found in jackrabbit-jcr-commons has some 
> deficiencies like Value.equals() being incorrect in some cases (see for 
> example JCR-320), and Name and Path values not following namespace remappings.

-- 
This message is automatically generated by JIRA.
-
If you think it was sent incorrectly contact one of the administrators: 
https://issues.apache.org/jira/secure/Administrators.jspa
-
For more information on JIRA, see: http://www.atlassian.com/software/jira




Re: [jira] Commented: (JCR-680) Improve the Value implementation

2006-12-27 Thread Jukka Zitting

Hi,

On 12/27/06, Thomas Mueller <[EMAIL PROTECTED]> wrote:

So far I thought 'values' is quite a simple concept, and with some
exceptions (for example the problems with streams) I thought it should
be possible to implement it in a simple way. To tell you the truth, I
don't understand the class diagram. For me, it is very complex, I'm
confused. For example, I don't understand why you made the distinction
between initial and committed values?


I'm using the State pattern to localize the Value state transitions to
the Generic and Initial value classes. The current approach keeps the
state in an explicit BaseValue.state member value and requires
explicit setValueConsumed() calls in subclasses to keep the state
updated. The proposed implementation simplifies this by removing all
state handling form the committed value classes, leaving only the
responsibilities of value representation and conversion to the
committed values.


Also I don't understand what GenericValue is for (specially the hashCode
method of this class seems strange, it looks like it is very inefficient because
it is always creates a String and doesn't cache the hash value).


GenericValue implements the context part of the State pattern. It is
essentially a wrapper around the GenericValue.value member variable
that points to the current Value state. The state can either be an
"initial" value (in case a value getter has not yet been called) or a
"committed" value (in case a getter has already been called). The
difference between those states is that an "initial" value can still
be converted from stream to non-stream or vice versa, but a
"committed" value can not.

Caching the hash value would of course be possible if better
performance is required, but I don't think Values are used that much
as hash keys, so I think that simplicity is more important in this
case. The getString() call is required in hashCode() to achieve proper
state behaviour and to comply with the equals() contract.


Other things that are inefficient are CommittedStringValue.getXYZ() because
those methods always create Parser objects. But I just don't understand
the concept, I'm not saying it is 'wrong' (maybe we can guarantee that
these functions are not used a lot).


I wanted to localize the value conversion mechanisms as much as
possible to avoid duplicating code. Since an instance of the
ValueParser class is essentially just a wrapper for the String to be
parsed, I would expect a decent JVM to optimize away the allocation of
the object (or to just allocate it as a temporary object on stack).


By the way, one concept I don't currently see in the value factory
implementation is caching of commonly used values (except for
CommittedTrueValue / CommittedFalseValue). In my experience, this
improves the performance and saves a lot of memory because it is very
common that a small set of values is used in many places. I could
implement such a feature later on if it is not implemented yet.


Good point, and actually something that couldn't be possible without
separating the "committed" value state. It would be nice to see some
performance numbers before doing that though, since a generational
garbage collector allows amazingly good performance with large numbers
of small objects.


SerializableInputStream: if I understand it correctly, then the old
implementation allowed very large (that don't fit in memory) binary
values, and the new implementation does not? Please correct me if I'm
wrong.


You're wrong. The whole stream is consumed in memory only when a Value
gets serialized, something that the current implementation doesn't
even support. In any case both implementations are even required to
consume the entire stream if getString() is called. In normal
circumstances (i.e. only getStream() called on binary values), the
proposed implementation simply passes along the InputStream received
in ValueFactory.createValue(InputStream) just like the current
implementation does.

Note however that simply doing that in
ValueFactory.createValue(InputStream) is most likely incorrect as the
client that makes the call has currently no way of knowing when or who
should close the stream to reclaim any resources (file handles, etc.)
associated with the stream. The proposed Value implementation does not
attempt to solve this isssue.

PS. I should probably present the proposed changes as a sequence of
incremental changes than as a single revolutionary patch.

BR,

Jukka Zitting


Re: [jira] Commented: (JCR-680) Improve the Value implementation

2006-12-27 Thread Thomas Mueller

Hi Jukka,

So far I thought 'values' is quite a simple concept, and with some
exceptions (for example the problems with streams) I thought it should
be possible to implement it in a simple way. To tell you the truth, I
don't understand the class diagram. For me, it is very complex, I'm
confused. For example, I don't understand why you made the distinction
between initial and committed values? Also I don't understand what
GenericValue is for (specially the hashCode method of this class seems
strange, it looks like it is very inefficient because it is always
creates a String and doesn't cache the hash value). Other things that
are inefficient are CommittedStringValue.getXYZ() because those
methods always create Parser objects. But I just don't understand the
concept, I'm not saying it is 'wrong' (maybe we can guarantee that
these functions are not used a lot).

By the way, one concept I don't currently see in the value factory
implementation is caching of commonly used values (except for
CommittedTrueValue / CommittedFalseValue). In my experience, this
improves the performance and saves a lot of memory because it is very
common that a small set of values is used in many places. I could
implement such a feature later on if it is not implemented yet.

SerializableInputStream: if I understand it correctly, then the old
implementation allowed very large (that don't fit in memory) binary
values, and the new implementation does not? Please correct me if I'm
wrong. If this is would be correct, then for me this alone is reason
enough not to apply the patch. I think the ability to store / retrieve
very large binary values is very important. I understand it is
'better' not to create temp files on the client side (for speed for
example, and to avoid security problems and avoid having to deal with
creating / deleting temp files), but if you want to consume the value
multiple times (for example, because it is used in multiple places),
it is probably the easiest solution. Another case is for high
availability clustering, if the value is sent to multiple servers (I
understand there is no plan to support that in the near future).

Please don't view my comments as critics. I just don't understand the
concept behind it, and the reasons for doing the things like you did.

I'm for adding the test cases of course! Test cases are always good.

Thomas


On 12/22/06, Jukka Zitting <[EMAIL PROTECTED]> wrote:

Hi,

On 12/22/06, Thomas Mueller <[EMAIL PROTECTED]> wrote:
> > The rationale for proposing a revolutionary rewrite rather than 
incrementally
> > improving the existing Value implementation is that the basic design of the
> > existing implementation doesn't allow easy extension or customization.
>
> What kind of extension / customization do you have in mind? I'm just
> curious... SPI?

SPI might be a good candidate, though I was especially thinking of
implementations where you'd rather use a custom adapter to an internal
value representation instead of one of the  existing the committed
value classes. The State pattern in the proposed implementation nicely
separates the internal value representation from the value state
behaviour, making it easy to implement custom value backends.

For example, I've been thinking about a simple SystemViewRepository
implementation that would expose a system view XML document through
the JCR API. It would make sense for such an implementation to
implement Values as adapters of the sv:value nodes in the DOM tree.

> > >   the stream data is materialized in memory during de-/serialization;
> > >   this renders it imo unusable for large streams.
> > Value serialization should never be used by Jackrabbit core, it's included 
for
> > other applications like JCR-RMI.
>
> So you mean the JCR-RMI would serialize streams? Does it do that now?

JCR-RMI currently uses temporary files for deserialized binary values.
I'm not too happy about that, a better approach would be to use a
RemoteInputStream to stream the data over the network on-demand.

> More generally, is there another way to (more or less) efficiently access a
> JCR repository remotely and store / read streams, and what is the plan
> for the future? Or is this the goal of the SPI project (from what I
> read, I'm not sure any more)?

I think the WebDAV server is already pretty good in that respect.
Implementing the JCR API on top of the WebDAV interface is one of the
goals of the SPI effort.

> > The goal of the default implementation is full semantic accuracy without 
extra
> > external dependencies (like to the file system)
>
> Currently I think that buffering really large (bigger than memory)
> streams to disk (temp file) on the client side is the easiest way to
> support them in a client / server environment. If you don't do that,
> then either you can't support large streams at all, or you need to
> implement special handling in the remote protocol. So you have the
> dependency there, and things get even more complicated and less
> modular 

Re: [jira] Commented: (JCR-680) Improve the Value implementation

2006-12-22 Thread Jukka Zitting

Hi,

On 12/22/06, Thomas Mueller <[EMAIL PROTECTED]> wrote:

> The rationale for proposing a revolutionary rewrite rather than incrementally
> improving the existing Value implementation is that the basic design of the
> existing implementation doesn't allow easy extension or customization.

What kind of extension / customization do you have in mind? I'm just
curious... SPI?


SPI might be a good candidate, though I was especially thinking of
implementations where you'd rather use a custom adapter to an internal
value representation instead of one of the  existing the committed
value classes. The State pattern in the proposed implementation nicely
separates the internal value representation from the value state
behaviour, making it easy to implement custom value backends.

For example, I've been thinking about a simple SystemViewRepository
implementation that would expose a system view XML document through
the JCR API. It would make sense for such an implementation to
implement Values as adapters of the sv:value nodes in the DOM tree.


> >   the stream data is materialized in memory during de-/serialization;
> >   this renders it imo unusable for large streams.
> Value serialization should never be used by Jackrabbit core, it's included for
> other applications like JCR-RMI.

So you mean the JCR-RMI would serialize streams? Does it do that now?


JCR-RMI currently uses temporary files for deserialized binary values.
I'm not too happy about that, a better approach would be to use a
RemoteInputStream to stream the data over the network on-demand.


More generally, is there another way to (more or less) efficiently access a
JCR repository remotely and store / read streams, and what is the plan
for the future? Or is this the goal of the SPI project (from what I
read, I'm not sure any more)?


I think the WebDAV server is already pretty good in that respect.
Implementing the JCR API on top of the WebDAV interface is one of the
goals of the SPI effort.


> The goal of the default implementation is full semantic accuracy without extra
> external dependencies (like to the file system)

Currently I think that buffering really large (bigger than memory)
streams to disk (temp file) on the client side is the easiest way to
support them in a client / server environment. If you don't do that,
then either you can't support large streams at all, or you need to
implement special handling in the remote protocol. So you have the
dependency there, and things get even more complicated and less
modular compared to disk buffering. Just my opinion.


There are legitimate reasons to avoid temporary disk storage in a
general purpose component like the proposed Value implementation. For
example, you need to make sure that any temporary files are removed
when the referencing Value is no longer used and that file contents
can not be read by anyone else (in some cases not even by code in the
same JVM instance).

The limitation on memory use of serializing a binary value is
essentially the same as using Value.getString() on a binary value.
It's a useful feature in some cases and allows you to quickly
prototype things, but you need to add special handling to binary
values to make your application scale.

BR,

Jukka Zitting


Re: [jira] Commented: (JCR-680) Improve the Value implementation

2006-12-22 Thread Thomas Mueller

Hi,


The rationale for proposing a revolutionary rewrite rather than incrementally 
improving the existing Value implementation is that the basic design of the 
existing implementation doesn't allow easy extension or customization.


What kind of extension / customization do you have in mind? I'm just
curious... SPI?


>   the stream data is materialized in memory during de-/serialization;
>   this renders it imo unusable for large streams.
Value serialization should never be used by Jackrabbit core, it's included for 
other applications like JCR-RMI.


So you mean the JCR-RMI would serialize streams? Does it do that now?
I just think this is / would be quite a limitation for JCR-RMI. I'm
not sure about the use case for JCR-RMI and how long it is here to
stay, maybe it is OK if the functionality is / will be limited. More
generally, is there another way to (more or less) efficiently access a
JCR repository remotely and store / read streams, and what is the plan
for the future? Or is this the goal of the SPI project (from what I
read, I'm not sure any more)?


The goal of the default implementation is full semantic accuracy without extra 
external dependencies (like to the file system)


Currently I think that buffering really large (bigger than memory)
streams to disk (temp file) on the client side is the easiest way to
support them in a client / server environment. If you don't do that,
then either you can't support large streams at all, or you need to
implement special handling in the remote protocol. So you have the
dependency there, and things get even more complicated and less
modular compared to disk buffering. Just my opinion.

Thomas


[jira] Commented: (JCR-680) Improve the Value implementation

2006-12-22 Thread Jukka Zitting (JIRA)
[ 
http://issues.apache.org/jira/browse/JCR-680?page=comments#action_12460456 ] 

Jukka Zitting commented on JCR-680:
---

> > * Support for namespace remappings in NAME and PATH values
>
> is this a requirement? IMO the spec doesn't mandate it.

The spec says (6.2.5.2):

"A NAME is a pairing of a namespace and a local name. It must be handled 
internally in such
a way that when read through the API the namespace is mapped to the current 
prefix."

It can of course be argued that "read through the API" means 
Property.getValue() instead of Value.getString(), but then having 
Value.getType() return PropertyType.NAME wouldn't really make sense.

> (btw huge!)

Agreed, it's rather large. The amount of code is however about the same as in 
the current value implementation.

The rationale for proposing a revolutionary rewrite rather than incrementally 
improving the existing Value implementation is that the basic design of the 
existing implementation doesn't allow easy extension or customization.

> - SerializableInputStream:
>   the stream data is materialized in memory during de-/serialization;
>   this renders it imo unusable for large streams. in the entire core
>   streams are never unconditionally materialized. 

Value serialization should never be used by Jackrabbit core, it's included for 
other applications like JCR-RMI. Such applications should consider case-by-case 
whether custom serialization of binary values is needed. The goal of the 
default implementation is full semantic accuracy without extra external 
dependencies (like to the file system). It is easy for an application to 
override the default behaviour by subclassing CommittedValueFactory.

> - ValueParser:
>   the getDate() method does not preserve the time zone in the
>   returned Calendar object. the current implementation internally
>   uses to ISO8601 utilty class which preserves the time zone
>   information. what was the rationale of not using the ISO8601 class?

I think it's generally more useful for a client to receive all date values in 
the default timezone of the system. That's just my opinion though, preserving 
the time zone information would be no problem.

I used the ISO8601 class as the basis of the Date parsing in ValueParser and 
formatting in CommittedDateValue, the methods there are streamlined versions of 
the parse and format methods in ISO8601. Since the ISO8601 date formats are not 
used elsewhere and people can just as well use ValueFactory for all format 
conversions, I don't think having the formatting in a separate utility class is 
needed.


> Improve the Value implementation
> 
>
> Key: JCR-680
> URL: http://issues.apache.org/jira/browse/JCR-680
> Project: Jackrabbit
>  Issue Type: Improvement
>  Components: core
>Reporter: Jukka Zitting
> Assigned To: Jukka Zitting
>Priority: Minor
> Attachments: class.jpg, JCR-680.patch
>
>
> The current Value implementation found in jackrabbit-jcr-commons has some 
> deficiencies like Value.equals() being incorrect in some cases (see for 
> example JCR-320), and Name and Path values not following namespace remappings.

-- 
This message is automatically generated by JIRA.
-
If you think it was sent incorrectly contact one of the administrators: 
http://issues.apache.org/jira/secure/Administrators.jspa
-
For more information on JIRA, see: http://www.atlassian.com/software/jira




[jira] Commented: (JCR-680) Improve the Value implementation

2006-12-22 Thread Stefan Guggisberg (JIRA)
[ 
http://issues.apache.org/jira/browse/JCR-680?page=comments#action_12460441 ] 

Stefan Guggisberg commented on JCR-680:
---

> * Support for namespace remappings in NAME and PATH values 

is this a requirement? IMO the spec doesn't mandate it. 

a first rough run through the (btw huge!) patch revealed 2 issues:

- SerializableInputStream: 
  the stream data is materialized in memory during de-/serialization;
  this renders it imo unusable for large streams. in the entire core
  streams are never unconditionally materialized. 

- ValueParser:
   the getDate() method does not preserve the time zone in the 
   returned Calendar object. the current implementation internally
   uses to ISO8601 utilty class which preserves the time zone  
   information. what was the rationale of not using the ISO8601 class?



> Improve the Value implementation
> 
>
> Key: JCR-680
> URL: http://issues.apache.org/jira/browse/JCR-680
> Project: Jackrabbit
>  Issue Type: Improvement
>  Components: core
>Reporter: Jukka Zitting
> Assigned To: Jukka Zitting
>Priority: Minor
> Attachments: class.jpg, JCR-680.patch
>
>
> The current Value implementation found in jackrabbit-jcr-commons has some 
> deficiencies like Value.equals() being incorrect in some cases (see for 
> example JCR-320), and Name and Path values not following namespace remappings.

-- 
This message is automatically generated by JIRA.
-
If you think it was sent incorrectly contact one of the administrators: 
http://issues.apache.org/jira/secure/Administrators.jspa
-
For more information on JIRA, see: http://www.atlassian.com/software/jira