Re: The future of Serialization

2014-08-27 Thread Paul Sandoz

On Aug 9, 2014, at 7:20 PM, Brian Goetz  wrote:

>> I've noticed there's not much interest in improving Serialization on
>> these lists.  This makes me wonder if Java Serialization has lost
>> relevance in recent years with the rise of Protocol Buffers, Apache
>> Thrift, and other means of data transfer over byte streams.
> 
> I sense your frustration, but I think you may be reaching the wrong 
> conclusion.  The lack of response is probably not evidence that there's no 
> interest in fixing serialization; it's that fixing serialization, with all the 
> constraints that "fix" entails, is just really, really hard, and it's much 
> easier to complain about it (and even say "let's just get rid of it") than to 
> fix it.
> 
>> Should Serializable eventually be deprecated? Should Serialization be
>> disabled by default? Should a new mechanism be developed? If a new
>> mechanism is developed, what about circular object relationships?
> 
> As I delved into my own explorations of serialization, I started to realize 
> why such a horrible approach was the one that was ultimately chosen; while 
> serialization is horrible and awful and leaky and insecure and complex and 
> brittle, it does address problems like cyclic data structures and independent 
> evolution of subclass and superclass better than the "clean" models.
> 
> My conclusion is, at best, a new mechanism would have to live side-by-side 
> with the old one, since it could only handle 95% of the cases.  It might 
> handle those 95% much better -- more cleanly, securely, and allowing easier 
> schema evolution -- but the hard cases are still there.  Still, reducing the 
> use of the horrible old mechanism may still be a worthy goal, even if it 
> can't be killed outright.
> 


Also many serialization-based libraries use sun.misc.Unsafe or 
sun.reflect.ReflectionFactory for various reasons (with backup plans if such 
classes are not available or accessible).

As part of the future of serialization, I think we need to evaluate libraries 
such as XStream and Objenesis to see what unsafe/internal mechanisms can be 
replaced by functionally equivalent safe public mechanisms.
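
To make the "internal mechanism with a backup plan" pattern concrete, here is a minimal sketch, not taken from XStream or Objenesis. It looks up sun.reflect.ReflectionFactory reflectively so that the code still compiles and runs where that class is missing or inaccessible, and falls back to an ordinary no-arg constructor in that case. The Account class and the allocate helper are hypothetical names used only for illustration.

```java
import java.lang.reflect.Constructor;
import java.lang.reflect.Method;

public class ConstructorBypassSketch {

    /** Hypothetical example type; its constructor sets balance to 100. */
    public static class Account {
        public final int balance;

        public Account() {
            balance = 100;
        }
    }

    /**
     * Tries the internal ReflectionFactory route first (no Account constructor
     * runs), then falls back to the declared no-arg constructor if that route
     * is unavailable or fails.
     */
    public static <T> T allocate(Class<T> type) {
        try {
            Class<?> factoryClass = Class.forName("sun.reflect.ReflectionFactory");
            Object factory = factoryClass.getMethod("getReflectionFactory").invoke(null);
            Method newCtor = factoryClass.getMethod(
                    "newConstructorForSerialization", Class.class, Constructor.class);
            // Only Object's constructor is run, as default serialization does
            // for the first non-serializable superclass.
            Constructor<?> objectCtor = Object.class.getDeclaredConstructor();
            Constructor<?> munged = (Constructor<?>) newCtor.invoke(factory, type, objectCtor);
            munged.setAccessible(true);
            return type.cast(munged.newInstance());
        } catch (ReflectiveOperationException | RuntimeException internalUnavailable) {
            // Backup plan: the public, supported route.
            try {
                Constructor<T> ctor = type.getDeclaredConstructor();
                ctor.setAccessible(true);
                return ctor.newInstance();
            } catch (ReflectiveOperationException e) {
                throw new IllegalStateException("cannot instantiate " + type, e);
            }
        }
    }

    public static void main(String[] args) {
        Account a = allocate(Account.class);
        // 0 means the constructor was bypassed; 100 means the fallback ran.
        System.out.println("balance = " + a.balance);
    }
}
```

The two possible results show why a safe public replacement is hard to specify: the internal route yields an object whose constructor invariants were never established, while the fallback only works for classes that happen to have a usable no-arg constructor.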

I have more questions than answers at the moment with regards to that :-(

Paul.




Re: The future of Serialization

2014-08-13 Thread Alan Bateman

On 12/08/2014 10:03, Peter Firmstone wrote:


Interesting: language features for modules won't necessarily involve 
ClassLoaders (my assumptions were based on existing systems), although 
you'd expect modules to have their own ProtectionDomain.


I think it would be reasonable to expect that you should be able to 
grant permissions to specific modules. This is something for a 
different discussion thread, of course.




When is a better timeframe, roughly, to discuss Serializable?

I'm sure there isn't a best time that works for everyone that is 
interested in this topic. However, in terms of JDK 9 then it seems early 
enough to do the exploration and prototypes, and get moving on proposals.


-Alan.


Re: The future of Serialization

2014-08-12 Thread Peter Firmstone
Interesting: language features for modules won't necessarily involve 
ClassLoaders (my assumptions were based on existing systems), although you'd 
expect modules to have their own ProtectionDomain.

An alternative to isolates is separate processes with JVM class sharing 
enabled.

I'll keep an eye out for the JSR.

When is a better timeframe, roughly, to discuss Serializable?

Regards,

Peter.

- Original message -
> On 11/08/2014 13:06, Peter Firmstone wrote:
> > Thanks Alan, I can relate to time poverty :)
> >
> > I might be assuming too much, but if there's interest in doing
> > something with Serialization, I'd be interested in learning about
> > plans or difficulties involved in deserialization and modules. It can
> > be a little more difficult to find the correct ClassLoader to resolve
> > classes during deserialization when ClassLoader relationships aren't
> > hierarchical.   Object streams can be annotated with module information
> > to assist resolution.
> The issues around visibility when deserializing are somewhat independent
> of modules. The usual way to deal with such matters is to override the
> resolveClass method. Another long standing suggestion is for
> ObjectInputStream to define a new constructor that takes a class loader
> to avoid the stack walk to get the user-defined loader. It remains to be
> seen, but if we can avoid changing visibility then we don't change the
> status quo.
> > :
> >
> > Got any links to info on extending access control rules?
> Not yet, a future JSR will define the standard module system and there
> will be a corresponding JEP for the implementation.
>
> -Alan.



Re: The future of Serialization

2014-08-11 Thread Alan Bateman

On 11/08/2014 13:06, Peter Firmstone wrote:

Thanks Alan, I can relate to time poverty :)

I might be assuming too much, but if there's interest in doing 
something with Serialization, I'd be interested in learning about 
plans or difficulties involved in deserialization and modules. It can 
be a little more difficult to find the correct ClassLoader to resolve 
classes during deserialization when ClassLoader relationships aren't 
hierarchical.  Object streams can be annotated with module information 
to assist resolution.
The issues around visibility when deserializing are somewhat independent 
of modules. The usual way to deal with such matters is to override the 
resolveClass method. Another long standing suggestion is for 
ObjectInputStream to define a new constructor that takes a class loader 
to avoid the stack walk to get the user-defined loader. It remains to be 
seen, but if we can avoid changing visibility then we don't change the 
status quo.
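
For readers who have not overridden resolveClass before, a minimal sketch of that approach follows. The class name LoaderAwareObjectInputStream and the roundTrip helper are illustrative assumptions, not an existing or proposed JDK API; the constructor that takes a class loader simply mimics the long-standing suggestion mentioned above.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.ObjectInputStream;
import java.io.ObjectOutputStream;
import java.io.ObjectStreamClass;
import java.io.Serializable;

/** Hypothetical subclass: resolves classes against an explicitly supplied loader. */
public class LoaderAwareObjectInputStream extends ObjectInputStream {

    private final ClassLoader loader;

    public LoaderAwareObjectInputStream(InputStream in, ClassLoader loader) throws IOException {
        super(in);
        this.loader = loader;
    }

    @Override
    protected Class<?> resolveClass(ObjectStreamClass desc)
            throws IOException, ClassNotFoundException {
        try {
            // Use the supplied loader instead of the loader found by the stack walk.
            return Class.forName(desc.getName(), false, loader);
        } catch (ClassNotFoundException e) {
            // The default implementation also knows how to resolve primitive types.
            return super.resolveClass(desc);
        }
    }

    /** Serializes a value to memory and reads it back through the given loader. */
    public static Object roundTrip(Serializable value, ClassLoader loader) {
        try {
            ByteArrayOutputStream bytes = new ByteArrayOutputStream();
            try (ObjectOutputStream out = new ObjectOutputStream(bytes)) {
                out.writeObject(value);
            }
            try (ObjectInputStream in = new LoaderAwareObjectInputStream(
                    new ByteArrayInputStream(bytes.toByteArray()), loader)) {
                return in.readObject();
            }
        } catch (IOException | ClassNotFoundException e) {
            throw new IllegalStateException(e);
        }
    }
}
```

A caller in a non-hierarchical loader arrangement would pass whichever loader it knows can see the classes in the stream, for example `roundTrip(list, someModuleLoader)`.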

:

Got any links to info on extending access control rules?
Not yet, a future JSR will define the standard module system and there 
will be a corresponding JEP for the implementation.


-Alan.


Re: The future of Serialization

2014-08-11 Thread Peter Firmstone

Thanks Alan, I can relate to time poverty :)

I might be assuming too much, but if there's interest in doing something 
with Serialization, I'd be interested in learning about plans or 
difficulties involved in deserialization and modules.   It can be a 
little more difficult to find the correct ClassLoader to resolve classes 
during deserialization when ClassLoader relationships aren't 
hierarchical.  Object streams can be annotated with module information to 
assist resolution.


On the subject of isolates, I found Ribbons interesting:
https://www.cs.purdue.edu/homes/peugster/Ribbons/RJ.pdf
https://www.cs.purdue.edu/homes/peugster/Ribbons/

Got any links to info on extending access control rules?

Regards,

Peter.

On 11/08/2014 9:21 PM, Alan Bateman wrote:

On 09/08/2014 06:56, Peter Firmstone wrote:


I've noticed there's not much interest in improving Serialization on 
these lists. This makes me wonder if Java Serialization has lost 
relevance in recent years with the rise of Protocol Buffers, Apache 
Thrift, and other means of data transfer over byte streams.


Just to add to Brian's comments, I think part of it is that many 
people are busy with other things, preparing for JDK 9 for example. So 
I think there is a lot of support for investigation and proposals that 
would improve things, it's just that some people are too busy to respond.





I don't know if isolates will be included with JDK 9 for Jigsaw, or 
whether ClassLoaders alone will provide isolation for modules.


The ability to limit visibility and provide isolation of 
implementation classes as well as providing limits on memory and 
threads for isolated modules would also improve platform security.


If by "isolates" you mean JSR 121 then I think that would be well 
beyond the scope, as would resource management. This isn't really the 
thread to discuss how module boundaries will work but just to say that 
class loaders and visibility can be weak when it comes to module 
boundaries. There are other options available, particularly when the 
ability to extend the access control rules is on the table. So I 
would suggest not making any assumptions here for now.


-Alan.




Re: The future of Serialization

2014-08-11 Thread Alan Bateman

On 09/08/2014 06:56, Peter Firmstone wrote:


I've noticed there's not much interest in improving Serialization on 
these lists. This makes me wonder if Java Serialization has lost 
relevance in recent years with the rise of Protocol Buffers, Apache 
Thrift, and other means of data transfer over byte streams.


Just to add to Brian's comments, I think part of it is that many people 
are busy with other things, preparing for JDK 9 for example. So I think 
there is a lot of support for investigation and proposals that would 
improve things, it's just that some people are too busy to respond.





I don't know if isolates will be included with JDK 9 for Jigsaw, or 
whether ClassLoaders alone will provide isolation for modules.


The ability to limit visibility and provide isolation of 
implementation classes as well as providing limits on memory and 
threads for isolated modules would also improve platform security.


If by "isolates" you mean JSR 121 then I think that would be well beyond 
the scope, as would resource management. This isn't really the thread to 
discuss how module boundaries will work but just to say that class 
loaders and visibility can be weak when it comes to module boundaries. 
There are other options available, particularly when the ability to 
extend the access control rules is on the table. So I would suggest not 
making any assumptions here for now.


-Alan.


Re: The future of Serialization

2014-08-11 Thread Peter Firmstone

On 11/08/2014 8:12 PM, Peter Firmstone wrote:

Brian,

Thanks for picking up on my frustration ;)

I have something in mind for Serializable2 to address cyclic data 
structures and the possibility of independent evolution of super and 
child classes, while retaining a relatively clean public api, with one 
optional private method.  The methods and interfaces proposed are 
suitable for any alternative ObjectInput and ObjectOutput implementation.


An interface exists in Apache River, it's called Startable, it has one 
method:


public void start() throws Exception;

It's called by a framework to allow an Object to start threads, 
publish "this" or throw an exception after construction.  The intent 
is to allow an object to be immutable with final fields and be 
provided with a thread of execution after construction and before 
publication.


Something similar can be used to wire up circular relations; let me 
explain:


Every class that implements Serializable has one thing in common, the 
Serialization protocol and every Object instance of a Serializable 
class has an arbitrary serial form.


I propose a final class representing SerialForm for an object, that 
cannot be extended, requires privilege to instantiate and also 
performs method guard security checks, for all callers with the 
exception of a calling class reading or writing its own serial form.  
SerialForm needs a parameter field key identity represented by the 
calling Class, 


Sorry, that should read "field name", not "method name".

the method name and the field's Class type, this key would be used for 
both writing and retrieving a field entry in SerialForm. SerialForm 
will also provide a method to advise if a field key contains a 
circular relation, any field entry in SerialForm that would contain a 
circular relation is not populated until after construction of the 
current object is complete.


An arbitrary Serializable2 Object instance may be composed of a 
hierarchy of classes, each belonging to a separate ProtectionDomain.


For the following interface:

public interface Serializable2 {

void writeObject(SerialForm serial) throws IOException;

}

Implementers of Serializable2 must:

  1. Implement writeObject
  2. Implement a constructor with the signature:  (SerialForm serial).

Implementers that need to check invariants, delay throwing an 
Exception, publish "this" or set a circular reference after 
construction should:


  4. Implement: private void readObjectNoData() throws
 InvalidObjectException;

Child class implementations should:

  5. Call their super class writeObject method and superclass
 constructor, but may call any super class constructor or methods.

Compatibility and Evolution:

  1. Fields can be included or omitted from SerialForm, by an
 implementation, without breaking compatibility, provided a null
 reference is accepted during deserialization.
  2. Child classes in a hierarchy;  all Serializable2 implementing
 superclass constructors have the same signature; the superclass
 implementation can be substituted, without breaking child class
 deserialization (provided this is the constructor used by the
 child class).
  3. There is no serialVersionUID.
  4. Child class Serializable2 implementations can extend a superclass
 without a zero arg constructor that doesn't itself implement
 Serializable2.
  5. Child classes that do not override writeObject will not be
 serialized, so can effectively opt out.
  6. Because implementations are required to implement public methods,
 there is no "Magic".
  7. Serializable2 shouldn't extend Serializable, allowing classes to
 implement both interfaces for a period of time (for that reason
 the signature for readObjectNoData may need to be changed for
 Serializable2).
  8. ObjectInputStream and ObjectOutputStream can be extended to
 support both implementations for compatibility, however
 alternative stream implementations would be preferable for
 Serializable2 to avoid Serializable security issues.  The new
 implementations should be possible to substitute because both
 types would use the same Stream Protocol, provided the classes
 being deserialized implement Serializable2.


My reasoning for retaining readObjectNoData() and for updating field 
entries in SerialForm that contain circular relations after 
construction, is:


  1. An object reference for the object currently being deserialized
 can be passed to another object's constructor (via a SerialForm
 instance) after the current Object's constructor completes,
 allowing safe publication of final field freezes that occur at the
 end of construction.
  2. When the Serialization2 Framework becomes aware of an object that
 contains a circular relationship while that object is in the
 process of being deserialized, the second object will not be
 instantiated until after the constructor of the first object in
  

Re: The future of Serialization

2014-08-11 Thread Peter Firmstone

Brian,

Thanks for picking up on my frustration ;)

I have something in mind for Serializable2 to address cyclic data 
structures and the possibility of independent evolution of super and 
child classes, while retaining a relatively clean public api, with one 
optional private method.  The methods and interfaces proposed are 
suitable for any alternative ObjectInput and ObjectOutput implementation.


An interface exists in Apache River, it's called Startable, it has one 
method:


public void start() throws Exception;

It's called by a framework to allow an Object to start threads, publish 
"this" or throw an exception after construction.  The intent is to allow 
an object to be immutable with final fields and be provided with a 
thread of execution after construction and before publication.
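
A minimal sketch of that construct-then-start pattern, under stated assumptions: the Startable interface below is a local stand-in that only mirrors the single method quoted above, and Heartbeat and launch are hypothetical names standing in for a service and the framework that starts it.

```java
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.TimeUnit;

public class StartableSketch {

    /** Local stand-in mirroring the one method quoted in the message. */
    public interface Startable {
        void start() throws Exception;
    }

    /** Hypothetical service: all fields are final and no thread runs during construction. */
    public static final class Heartbeat implements Startable {
        private final String name;
        private final CountDownLatch ran = new CountDownLatch(1);

        public Heartbeat(String name) {
            this.name = name; // "this" does not escape from the constructor
        }

        @Override
        public void start() {
            // Safe to hand "this" to another thread: construction has completed.
            Thread worker = new Thread(ran::countDown, name);
            worker.start();
        }

        /** Returns true once the worker thread has run, waiting up to five seconds. */
        public boolean awaitRan() {
            try {
                return ran.await(5, TimeUnit.SECONDS);
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
                return false;
            }
        }
    }

    /** Plays the role of the framework: start first, publish afterwards. */
    public static <T extends Startable> T launch(T fullyConstructed) {
        try {
            fullyConstructed.start();
        } catch (Exception e) {
            throw new IllegalStateException("start failed; object is not published", e);
        }
        return fullyConstructed;
    }
}
```

Usage would look like `Heartbeat hb = StartableSketch.launch(new StartableSketch.Heartbeat("hb"));`, with the reference handed to other code only after launch returns.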


Something similar can be used to wire up circular relations; let me 
explain:


Every class that implements Serializable has one thing in common, the 
Serialization protocol and every Object instance of a Serializable class 
has an arbitrary serial form.


I propose a final class representing SerialForm for an object, that 
cannot be extended, requires privilege to instantiate and also performs 
method guard security checks, for all callers with the exception of a 
calling class reading or writing its own serial form.  SerialForm needs 
a parameter field key identity represented by the calling Class, the 
method name and the field's Class type, this key would be used for both 
writing and retrieving a field entry in SerialForm. SerialForm will also 
provide a method to advise if a field key contains a circular relation, 
any field entry in SerialForm that would contain a circular relation is 
not populated until after construction of the current object is complete.


An arbitrary Serializable2 Object instance may be composed of a 
hierarchy of classes, each belonging to a separate ProtectionDomain.


For the following interface:

public interface Serializable2 {

void writeObject(SerialForm serial) throws IOException;

}

Implementers of Serializable2 must:

  1. Implement writeObject
  2. Implement a constructor with the signature:  (SerialForm serial).

Implementers that need to check invariants, delay throwing an Exception, 
publish "this" or set a circular reference after construction should:


  4. Implement: private void readObjectNoData() throws
 InvalidObjectException;

Child class implementations should:

  5. Call their super class writeObject method and superclass
 constructor, but may call any super class constructor or methods.

Compatibility and Evolution:

  1. Fields can be included or omitted from SerialForm, by an
 implementation, without breaking compatibility, provided a null
 reference is accepted during deserialization.
  2. Child classes in a hierarchy;  all Serializable2 implementing
 superclass constructors have the same signature; the superclass
 implementation can be substituted, without breaking child class
 deserialization (provided this is the constructor used by the
 child class).
  3. There is no serialVersionUID.
  4. Child class Serializable2 implementations can extend a superclass
 without a zero arg constructor that doesn't itself implement
 Serializable2.
  5. Child classes that do not override writeObject will not be
 serialized, so can effectively opt out.
  6. Because implementations are required to implement public methods,
 there is no "Magic".
  7. Serializable2 shouldn't extend Serializable, allowing classes to
 implement both interfaces for a period of time (for that reason
 the signature for readObjectNoData may need to be changed for
 Serializable2).
  8. ObjectInputStream and ObjectOutputStream can be extended to
 support both implementations for compatibility, however
 alternative stream implementations would be preferable for
 Serializable2 to avoid Serializable security issues.  The new
 implementations should be possible to substitute because both
 types would use the same Stream Protocol, provided the classes
 being deserialized implement Serializable2.
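
To illustrate the rules above, here is a rough sketch of how a two-level class hierarchy might look. Everything except the Serializable2 interface signature is an assumption made for illustration: SerialForm is reduced to a plain map keyed by owner class, field name and field type, with none of the privilege or guard checks the proposal calls for, and the put/get methods, the Named and Versioned classes, and the copy helper are hypothetical.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.util.HashMap;
import java.util.Map;

public class Serializable2Sketch {

    public interface Serializable2 {
        void writeObject(SerialForm serial) throws IOException;
    }

    /** Hypothetical stand-in: a final map-backed form with no security checks. */
    public static final class SerialForm {
        private final Map<String, Object> entries = new HashMap<>();

        private static String key(Class<?> owner, String field, Class<?> type) {
            return owner.getName() + '#' + field + ':' + type.getName();
        }

        public <T> void put(Class<?> owner, String field, Class<T> type, T value) {
            entries.put(key(owner, field, type), value);
        }

        /** Returns null when the writer omitted the field. */
        public <T> T get(Class<?> owner, String field, Class<T> type) {
            return type.cast(entries.get(key(owner, field, type)));
        }
    }

    public static class Named implements Serializable2 {
        private final String name;

        public Named(String name) {
            this.name = name;
        }

        /** The (SerialForm) constructor required of every implementer. */
        public Named(SerialForm serial) {
            String n = serial.get(Named.class, "name", String.class);
            this.name = (n != null) ? n : "unnamed"; // tolerate an omitted field
        }

        @Override
        public void writeObject(SerialForm serial) throws IOException {
            serial.put(Named.class, "name", String.class, name);
        }

        public String name() {
            return name;
        }
    }

    public static class Versioned extends Named {
        private final int version;

        public Versioned(String name, int version) {
            super(name);
            this.version = version;
        }

        public Versioned(SerialForm serial) {
            super(serial); // superclass reads only its own entries
            Integer v = serial.get(Versioned.class, "version", Integer.class);
            this.version = (v != null) ? v : 0;
        }

        @Override
        public void writeObject(SerialForm serial) throws IOException {
            super.writeObject(serial);
            serial.put(Versioned.class, "version", Integer.class, version);
        }

        public int version() {
            return version;
        }
    }

    /** Writes an object into a fresh form and rebuilds it through the constructor. */
    public static Versioned copy(Versioned original) {
        SerialForm form = new SerialForm();
        try {
            original.writeObject(form);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return new Versioned(form);
    }
}
```

Because each class keys its entries by its own Class, Named could add or drop fields without Versioned noticing, which is the independent-evolution property the list is aiming for. Circular references are deliberately left out of this sketch.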


My reasoning for retaining readObjectNoData() and for updating field 
entries in SerialForm that contain circular relations after 
construction, is:


  1. An object reference for the object currently being deserialized
 can be passed to another object's constructor (via a SerialForm
 instance) after the current Object's constructor completes,
 allowing safe publication of final field freezes that occur at the
 end of construction.
  2. When the Serialization2 Framework becomes aware of an object that
 contains a circular relationship while that object is in the
 process of being deserialized, the second object will not be
 instantiated until after the constructor of the first object in
 the relationship completes.  Data read in from the stream can be
 stored in a SerialForm without requiri

Re: The future of Serialization

2014-08-09 Thread Brian Goetz

I've noticed there's not much interest in improving Serialization on
these lists.  This makes me wonder if Java Serialization has lost
relevance in recent years with the rise of Protocol Buffers, Apache
Thrift, and other means of data transfer over byte streams.


I sense your frustration, but I think you may be reaching the wrong 
conclusion.  The lack of response is probably not evidence that there's 
no interest in fixing serialization; it's that fixing serialization, with 
all the constraints that "fix" entails, is just really, really hard, and 
it's much easier to complain about it (and even say "let's just get rid 
of it") than to fix it.



Should Serializable eventually be deprecated? Should Serialization be
disabled by default? Should a new mechanism be developed? If a new
mechanism is developed, what about circular object relationships?


As I delved into my own explorations of serialization, I started to 
realize why such a horrible approach was the one that was ultimately 
chosen; while serialization is horrible and awful and leaky and insecure 
and complex and brittle, it does address problems like cyclic data 
structures and independent evolution of subclass and superclass better 
than the "clean" models.
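
As a small illustration of the cyclic case, the sketch below round-trips two objects that point at each other through the existing ObjectOutputStream and ObjectInputStream. The Node class and the roundTrip helper are hypothetical names used only for this example.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.ObjectInputStream;
import java.io.ObjectOutputStream;
import java.io.Serializable;

public class CycleRoundTrip {

    /** Hypothetical node type whose peer field can point back at its referrer. */
    public static class Node implements Serializable {
        private static final long serialVersionUID = 1L;
        public final String label;
        public Node peer;

        public Node(String label) {
            this.label = label;
        }
    }

    /** Serializes the graph reachable from root to memory and reads it back. */
    public static Node roundTrip(Node root) {
        try {
            ByteArrayOutputStream bytes = new ByteArrayOutputStream();
            try (ObjectOutputStream out = new ObjectOutputStream(bytes)) {
                out.writeObject(root);
            }
            try (ObjectInputStream in = new ObjectInputStream(
                    new ByteArrayInputStream(bytes.toByteArray()))) {
                return (Node) in.readObject();
            }
        } catch (IOException | ClassNotFoundException e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        Node a = new Node("a");
        Node b = new Node("b");
        a.peer = b;
        b.peer = a;

        Node copy = roundTrip(a);
        // The stream records back-references, so the cycle survives the trip.
        System.out.println(copy.peer.peer == copy); // prints true
    }
}
```

A constructor-based scheme cannot reproduce this directly, because neither object can be passed to the other's constructor before it exists; the default mechanism sidesteps the problem by allocating objects first and filling fields afterwards.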


My conclusion is, at best, a new mechanism would have to live 
side-by-side with the old one, since it could only handle 95% of the 
cases.  It might handle those 95% much better -- more cleanly, securely, 
and allowing easier schema evolution -- but the hard cases are still 
there.  Still, reducing the use of the horrible old mechanism may still 
be a worthy goal, even if it can't be killed outright.




The future of Serialization

2014-08-08 Thread Peter Firmstone
I've noticed there's not much interest in improving Serialization on these 
lists.  This makes me wonder if Java Serialization has lost relevance in recent 
years with the rise of Protocol Buffers, Apache Thrift, and other means of data 
transfer over byte streams.

The burden of implementing Serializable can significantly hamper developers' 
efforts when refactoring; it's quite common for projects to make no 
guarantee regarding Serialization compatibility between releases.  Also, 
implementing Serializable can double project development hours, hamper future 
development and increase software maintenance costs.

Serialization also presents opportunities for attackers and has been 
responsible for a number of zero day exploits.

I don't know if isolates will be included with JDK 9 for Jigsaw, or whether 
ClassLoaders alone will provide isolation for modules.

The ability to limit visibility and provide isolation of implementation classes 
as well as providing limits on memory and threads for isolated modules would 
also improve platform security.

Serialization may provide a means to hot upgrade modules, but more flexible 
options that don't cause serial data lock-in need to be developed.

Should Serializable eventually be deprecated?
Should Serialization be disabled by default?  
Should a new mechanism be developed?
If a new mechanism is developed, what about circular object relationships?

Regards,

Peter.