Hi,

Sounds good. Just note that the "user provided" value is in fact the one decoded from the protocol whenever a value is received in an LDAP message. In that case the "user provided" value is always binary, even for string values. If you remember our conversation some time ago, the re-coding of string values by the current implementation causes quite a nasty problem when the receiver cannot reliably distinguish string and binary values. Therefore I really like the idea of storing the original value and recoding on demand. If that is done well, the originally received binary values can be retrieved even from the StringValue class, and that will improve the usability of the API.
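To illustrate the re-coding problem, here is a minimal sketch. ReceivedValue and its methods are hypothetical names, not classes of the existing API. It only assumes that a received value arrives as raw bytes and that a String view is obtained by UTF-8 decoding:

```java
import java.nio.charset.StandardCharsets;

// Hypothetical holder (not an existing API class): keeps the bytes exactly as received.
final class ReceivedValue {
    private final byte[] original;

    ReceivedValue(byte[] received) {
        this.original = received.clone();
    }

    // The bytes as they arrived on the wire, untouched.
    byte[] getBytes() {
        return original.clone();
    }

    // Decoded on demand. Malformed UTF-8 is replaced by U+FFFD,
    // so this view can be lossy for values that are really binary.
    String getString() {
        return new String(original, StandardCharsets.UTF_8);
    }

    // What a decode-then-re-encode implementation would hand back.
    byte[] recoded() {
        return getString().getBytes(StandardCharsets.UTF_8);
    }
}
```

For a value that is valid UTF-8, recoded() and getBytes() return the same bytes. For a binary value such as { 0xC3, 0x28 } they differ, which is why keeping the original bytes matters when the receiver cannot tell string from binary.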

--
Radovan Semancik
Software Architect
evolveum.com




On 10/12/2015 08:50 PM, Emmanuel Lécharny wrote:
Thoughts about value handling in the API and Server
---------------------------------------------------

We currently manage a quite complex hierarchy of classes to handle
attribute values:

(Value<T>)
   o
   |
   +--[[AbstractValue<T>]]
         ^
         |
         +-- [StringValue | T : String]
         |
         +-- [BinaryValue | T : byte[]]

Every Value holds a wrappedValue (aka User Provided value) and a
normalizedValue. This second aspect is absolutely mandatory, because we
always return the UPValue back to the user, and we always compare values
using the normalized value (well, we can discuss that too).
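
As a rough illustration of the UP value / normalized value split, here is a minimal sketch. SketchValue is a hypothetical class, not part of the hierarchy above, and its normalize() is an assumed stand-in for a real matching-rule normalizer:

```java
import java.util.Locale;

// Hypothetical sketch, not the existing Value<T> hierarchy.
final class SketchValue {
    private final String upValue;          // exactly what the user provided
    private final String normalizedValue;  // used only for comparison

    SketchValue(String upValue) {
        this.upValue = upValue;
        this.normalizedValue = normalize(upValue);
    }

    // Assumed stand-in for a real normalizer:
    // trim, collapse inner whitespace, lower-case.
    private static String normalize(String s) {
        return s.trim().replaceAll("\\s+", " ").toLowerCase(Locale.ROOT);
    }

    // Always returned to the user unchanged.
    String getUpValue() {
        return upValue;
    }

    // Equality is decided on the normalized form only.
    @Override
    public boolean equals(Object o) {
        return o instanceof SketchValue
            && normalizedValue.equals(((SketchValue) o).normalizedValue);
    }

    @Override
    public int hashCode() {
        return normalizedValue.hashCode();
    }
}
```

With this sketch, "  John   DOE " and "john doe" compare equal, while getUpValue() still returns each string exactly as it was provided.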

DNs and Filters use a String representation of values that is a bit
specific. Typically, some chars get escaped in both cases (but not
in the same way).

That is quite complex...

We can probably handle those values in a different way. First of all,
binary values aren't modified by the normalization process, so we could
most certainly save some space by not keeping both a UpValue and a
NormValue for such values. Second, everything in LDAP uses UTF-8, and
we can easily convert UTF-8 to Java's internal Unicode representation
(UTF-16, which is what char and String use). So we have a trivial
UTF-8 <--> Unicode conversion that can be used if needed.
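
For reference, the conversion in both directions needs nothing beyond the standard library. A minimal sketch follows; Utf8Demo and its method names are made up for illustration:

```java
import java.nio.charset.StandardCharsets;

// Illustrative helper only; the class and method names are hypothetical.
final class Utf8Demo {
    // UTF-8 bytes (as found in an LDAP message) to a Java String.
    static String decode(byte[] utf8) {
        return new String(utf8, StandardCharsets.UTF_8);
    }

    // Java String back to UTF-8 bytes.
    static byte[] encode(String s) {
        return s.getBytes(StandardCharsets.UTF_8);
    }
}
```

Note that the two forms do not have the same length in general: "L\u00e9charny" is 8 chars in Java but 9 bytes in UTF-8.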

Last but not least, every value is written either as a byte[] (binary
values) or as a UTF-8 String, which is also a byte[]. Knowing that we
will send the values back to the client by converting them from String
to UTF-8, we can assume that in most cases we are doing two conversions
(from UTF-8 byte[] to String, and then from String back to UTF-8
byte[]), mostly wasting a lot of CPU...

Another idea would be to simply keep the byte[] and only convert it to
a String when needed. We need to convert a value when we normalize it,
which happens when we want to compare it to another value. We also need
to run every value through the PrepareString methods (and PrepSASL for
the userPassword) before saving it to disk.
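
A minimal sketch of that idea, under the assumption that a value is created from the bytes read off the wire. LazyStringValue is a hypothetical class, and the decode counter exists only to make the laziness visible:

```java
import java.nio.charset.StandardCharsets;

// Hypothetical sketch: keeps the wire bytes and decodes only when a String is requested.
// Not thread-safe; a real implementation would have to decide how to handle that.
final class LazyStringValue {
    private final byte[] bytes;
    private String decoded;    // null until a String is first needed
    private int decodeCount;   // only here to make the laziness observable

    LazyStringValue(byte[] bytes) {
        this.bytes = bytes.clone();
    }

    // No conversion at all: this is what gets sent back to the client.
    byte[] getBytes() {
        return bytes.clone();
    }

    // Converts on first use (e.g. for normalization) and caches the result.
    String getString() {
        if (decoded == null) {
            decoded = new String(bytes, StandardCharsets.UTF_8);
            decodeCount++;
        }
        return decoded;
    }

    int getDecodeCount() {
        return decodeCount;
    }
}
```

A value that is only read and sent back never pays for a conversion, and a value that is compared several times is decoded once.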


At this point, I can foresee some huge simplification in both the API
and the server by switching to a simpler data structure, and a potential
speedup (avoiding useless conversions).

I'd like you to review what I just wrote and tell me if I'm off base, or
if you feel, like me, that we can get a better server by changing those
data structures.

Thanks !

