Re: std.serialization: pre-voting review / discussion
Review summary: http://wiki.dlang.org/Review/std.serialization ( please review :) )
Re: std.serialization: pre-voting review / discussion
On 2013-09-21 15:13, mrd wrote: Is this the right way? There are special binary formats (Protocol Buffers, for example) that can be changed over time without breaking old code. But isn't this redundant for normal serialization? Besides, searching by name is slower than other methods (field numbers, for example). Not necessarily. I could implement it so that by default it uses the field number, and if the names don't match it falls back to a lookup by name. I would like to avoid having a dependency on the order of the fields. -- /Jacob Carlborg
Re: std.serialization: pre-voting review / discussion
On 2013-09-21 14:48, mrd wrote: What is the purpose of the keys? Fields are looked up by name. This is to avoid a dependency on the order of the fields. I guess I can look up by field order instead and fall back to a name lookup if a name doesn't match. If they are really needed, can they be made optional so they can be disabled? (I also see that they force a lot of duplicate functions to be added.) Yeah, I guess so. I guess that if this succeeds, the binary format can be made very compact (and possibly faster), like Protocol Buffers. I'm working on a binary archive as well. It ignores the names and looks up by field order instead. It assumes that the archived data and the class/struct have the same field order. -- /Jacob Carlborg
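The lookup strategy discussed above (try the field's position first, fall back to a search by name) can be sketched as follows. This is only an illustration under stated assumptions: `Entry` and `load` are hypothetical names, the archive is modelled as a plain array of name/value pairs, and only integral fields are handled. It is not the std.serialization or Orange implementation.

```d
/// Hypothetical archived entry: a field name plus its stored value.
struct Entry
{
    string name;
    long value;
}

/// Fill the integral fields of `T` from `entries`. The entry at the field's
/// own position is tried first; a search by name is done only if that
/// entry's name does not match.
T load(T)(const Entry[] entries)
{
    T result;
    foreach (i, ref field; result.tupleof)
    {
        enum name = __traits(identifier, T.tupleof[i]);

        // Fast path: the archive has the same field order as the type.
        if (i < entries.length && entries[i].name == name)
        {
            field = cast(typeof(field)) entries[i].value;
            continue;
        }

        // Fallback: the order differs, so look the field up by name.
        foreach (e; entries)
        {
            if (e.name == name)
            {
                field = cast(typeof(field)) e.value;
                break;
            }
        }
    }
    return result;
}

struct Foo { int a; int b; }

void main()
{
    // Archive written when the fields were declared in the order b, a.
    auto old = [Entry("b", 4), Entry("a", 3)];
    auto f = load!Foo(old);
    assert(f.a == 3 && f.b == 4);
}
```

When the order matches, each field costs one string comparison; the name search is only paid for fields that moved.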
Re: std.serialization: pre-voting review / discussion
On Thursday, 15 August 2013 at 07:07:13 UTC, Jacob Carlborg wrote: On 2013-08-14 21:55, ilya-stromberg wrote: Can you use another serialization format and support file output for it? For example, can you use JSON, BSON or a binary format? The idea of the library is that it can support multiple archive types. Currently only XML is implemented. I have been working on a binary archive for a while but I haven't finished it yet. I am also working on my own binary archive implementation (I just want to quickly get a serialization of integral types, arrays and structs into binary) and I have a question: What is the purpose of the keys? If they are really needed, can they be made optional so they can be disabled? (I also see that they force a lot of duplicate functions to be added.) I guess that if this succeeds, the binary format can be made very compact (and possibly faster), like Protocol Buffers.
Re: std.serialization: pre-voting review / discussion
On Thursday, 22 August 2013 at 13:13:48 UTC, Jacob Carlborg wrote: On 2013-08-22 13:57, ilya-stromberg wrote: Can std.serialization load data from old file to the new class? Yes. In this case it will use the name of the instance fields when searching for values in the archive. Is this the right way? There are special binary formats (Protocol Buffers, for example) that can be changed over time without breaking old code. But isn't this redundant for normal serialization? Besides, searching by name is slower than other methods (field numbers, for example).
Re: std.serialization: pre-voting review / discussion
On Wednesday, 14 August 2013 at 09:26:55 UTC, Jacob Carlborg wrote: On 2013-08-14 11:17, Tove wrote: I find the new style both more intuitive and more DRY, since it doesn't duplicate the identifier: int b; mixin NonSerialized!(b) @nonSerialized struct Foo { int a; int b; int c; } struct Bar { int a; int b; @nonSerialized int c; } Absolutely.
Jacob, can you add a @serializationName(string name) UDA? I saw the custom serialization example in the documentation: https://dl.dropboxusercontent.com/u/18386187/docs/std.serialization/std_serialization_serializable.html#.Serializable

    class Foo : Serializable
    {
        int a;

        void toData (Serializer serializer, Serializer.Data key)
        {
            serializer.serialize(a, "b");
        }

        void fromData (Serializer serializer, Serializer.Data key)
        {
            a = serializer.deserialize!(int)("b");
        }
    }

With a @serializationName(string name) attribute the example would look like this:

    class Foo
    {
        @serializationName("b") int a;
    }

Or for a class/struct name:

    @serializationName("Bar")
    class Foo
    {
        int a;
    }

I think it's easier to use than custom serialization. The @nonSerialized UDA serves the same purpose: simplifying serialization customization. Is it possible to implement?
Re: std.serialization: pre-voting review / discussion
On 2013-09-04 14:37, ilya-stromberg wrote: Jacob, can you add a @serializationName(string name) UDA? I saw the custom serialization example in the documentation: https://dl.dropboxusercontent.com/u/18386187/docs/std.serialization/std_serialization_serializable.html#.Serializable

    class Foo : Serializable
    {
        int a;

        void toData (Serializer serializer, Serializer.Data key)
        {
            serializer.serialize(a, "b");
        }

        void fromData (Serializer serializer, Serializer.Data key)
        {
            a = serializer.deserialize!(int)("b");
        }
    }

With a @serializationName(string name) attribute the example would look like this:

    class Foo
    {
        @serializationName("b") int a;
    }

Or for a class/struct name:

    @serializationName("Bar")
    class Foo
    {
        int a;
    }

I think it's easier to use than custom serialization. The @nonSerialized UDA serves the same purpose: simplifying serialization customization. @nonSerialized is already available. It's at the bottom of the link you posted. Is it possible to implement? Yes, the question is how many of these customizations should be supported. It's easy to add at a later time if I don't add it from the beginning. -- /Jacob Carlborg
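For reference, a rough sketch of how a name-override attribute like the one requested above could be read at compile time. The `serializationName` struct and the `archivedName` template are hypothetical names used only for this illustration; only `getUDAs` from std.traits and `__traits(identifier, ...)` are existing facilities.

```d
import std.traits : getUDAs;

/// Hypothetical attribute carrying the name to use in the archive.
struct serializationName
{
    string name;
}

/// Name under which field number `i` of `T` would be archived: the value
/// of the attribute if one is attached, otherwise the field's identifier.
template archivedName(T, size_t i)
{
    alias udas = getUDAs!(T.tupleof[i], serializationName);
    static if (udas.length > 0)
        enum archivedName = udas[0].name;
    else
        enum archivedName = __traits(identifier, T.tupleof[i]);
}

class Foo
{
    @serializationName("b") int a; // archived as "b"
    int c;                         // archived under its own name
}

static assert(archivedName!(Foo, 0) == "b");
static assert(archivedName!(Foo, 1) == "c");

void main() {}
```

Because the name is resolved at compile time, an attribute like this would add no runtime cost compared with using the field identifier directly.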
Re: std.serialization: pre-voting review / discussion
On Thursday, 22 August 2013 at 19:53:53 UTC, Jacob Carlborg wrote: On 2013-08-22 21:30, ilya-stromberg wrote: Great! What about more difficult cases? For example, we have: class Foo { int a; int b; } After changes we have new class: class Foo { long b; } Can std.serialization load data to new class from old file? It should ignore a and convert b from int to long. No it can't. It will throw an exception because it cannot find a long element: Could not find an element "long" with the attribute "key" with the value "b" Jacob, can you use clearer error messages and provide more information in them? You can print the full class/struct name (via std.traits.fullyQualifiedName) and the field name and type: information can not be found in the archive Please pay attention to this.
Re: std.serialization: pre-voting review / discussion
On Sunday, 1 September 2013 at 08:33:51 UTC, ilya-stromberg wrote: On Thursday, 22 August 2013 at 19:53:53 UTC, Jacob Carlborg wrote: On 2013-08-22 21:30, ilya-stromberg wrote: Great! What about more difficult cases? For example, we have: class Foo { int a; int b; } After changes we have new class: class Foo { long b; } Can std.serialization load data to new class from old file? It should ignore a and convert b from int to long. No it can't. It will throw an exception because it cannot find a long element: Could not find an element "long" with the attribute "key" with the value "b" Jacob, can you use clearer error messages and provide more information in them? You can print the full class/struct name (via std.traits.fullyQualifiedName) and the field name and type: information can not be found in the archive Please pay attention to this. Sorry, I meant to write: Could not deserialize the field b with type long of class Foo: information can not be found in the archive.
Re: std.serialization: pre-voting review / discussion
On 2013-09-01 10:35, ilya-stromberg wrote: Sorry, I meant to write: Could not deserialize the field b with type long of class Foo: information can not be found in the archive. Yes, I could enhance the error message. -- /Jacob Carlborg
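A small sketch of how the message suggested above could be assembled from compile-time information. `missingFieldMessage` is a hypothetical helper written for this illustration, not part of std.serialization; `fullyQualifiedName` and `format` are the existing Phobos functions.

```d
import std.algorithm.searching : canFind;
import std.format : format;
import std.traits : fullyQualifiedName;

class Foo { long b; }

/// Hypothetical helper: builds the suggested message for field number `i`
/// of `T` when no matching value is found in the archive.
string missingFieldMessage(T, size_t i)()
{
    return format(
        `Could not deserialize the field "%s" with type "%s" of class "%s": ` ~
        `information can not be found in the archive.`,
        __traits(identifier, T.tupleof[i]),   // field name, e.g. b
        typeof(T.tupleof[i]).stringof,        // field type, e.g. long
        fullyQualifiedName!T);                // module-qualified class name
}

void main()
{
    auto msg = missingFieldMessage!(Foo, 0);
    assert(msg.canFind(`field "b" with type "long"`));
}
```

The class name in the output is module-qualified, so its exact text depends on the module the type is declared in.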
Re: std.serialization: pre-voting review / discussion
Jacob, what are your current plans on this (considering recent range API discussion thread)?
Re: std.serialization: pre-voting review / discussion
On 2013-08-31 14:11, Dicebot wrote: Jacob, what are your current plans on this (considering recent range API discussion thread)? My todo list looks like this:
- write an overview documentation
- improve the documentation for std.serialization.serializable to indicate it's not required
- implement a convenience function for serializing
- implement a convenience function for serializing to a file
- remove Serializable
- check only for toData when serializing
- check only for fromData when deserializing
- split Serializer into two parts
- make the parts structs
- possibly provide class wrappers
- split Archive in two parts
- add range interface to Serializer and Archive
- rename all archives to archivers
- replace ddoc comments with regular comments for all package protected symbols
Although I'm guessing I won't be able to finish it in time for voting. How much time is left anyway? -- /Jacob Carlborg
Re: std.serialization: pre-voting review / discussion
On Saturday, 31 August 2013 at 17:58:57 UTC, Jacob Carlborg wrote: My todo list looks like this:
- write an overview documentation
- improve the documentation for std.serialization.serializable to indicate it's not required
- implement a convenience function for serializing
- implement a convenience function for serializing to a file
- remove Serializable
- check only for toData when serializing
- check only for fromData when deserializing
- split Serializer into two parts
- make the parts structs
- possibly provide class wrappers
- split Archive in two parts
- add range interface to Serializer and Archive
- rename all archives to archivers
- replace ddoc comments with regular comments for all package protected symbols
Although I'm guessing I won't be able to finish it in time for voting. How much time is left anyway? Great. No hurry here, there is no hard deadline for voting - I'll put it on pause until you are ready. Just doing some personal bookkeeping. No pressure, just write me an e-mail when ready for next stage.
Re: std.serialization: pre-voting review / discussion
On 2013-08-31 20:51, Dicebot wrote: Great. No hurry here, there is no hard deadline for voting - I'll put it on pause until you are ready. Just doing some personal bookkeeping. No pressure, just write me an e-mail when ready for next stage. What I mean is that we usually have a couple of weeks for reviewing and then about one week for voting. I don't want to put the whole review queue on hold. -- /Jacob Carlborg
Re: std.serialization: pre-voting review / discussion
On Saturday, 31 August 2013 at 19:37:34 UTC, Jacob Carlborg wrote: On 2013-08-31 20:51, Dicebot wrote: Great. No hurry here, there is no hard deadline for voting - I'll put it on pause until you are ready. Just doing some personal bookkeeping. No pressure, just write me an e-mail when ready for next stage. What I mean is that we usually have a couple of weeks for reviewing and then about one week for voting. I don't want to put the whole review queue on hold. You won't. Reviewing is not a blocking operation; if anyone wants to act as a review manager for some other contribution, nothing prevents them from doing it right now. I have simply marked `std.serialization` as "Incorporating review comments" in the wiki, and given no new comments this round of review can be considered finished.
Re: std.serialization: pre-voting review / discussion
On Friday, 23 August 2013 at 13:39:47 UTC, Dicebot wrote: On Friday, 23 August 2013 at 13:34:04 UTC, ilya-stromberg wrote: It's a serious issue. Maybe it's more important than range support. For example, I have to change a class (bug fixing, new features, etc.), but it is compatible with the previous version (example: it's always possible to convert int to long). In that case I can't use std.serialization and have to write my own solution (for example, save data in a csv file). I don't see it as an issue at all. The behavior you want can't be defined in a generic way, at least not without a lot of UDA help or a similar declarative approach. In other words, the fact that those two classes are interchangeable in the context of serialization exists only in the mind of the programmer, not in the D type system. More than that, such behavior goes seriously out of line with D being a strongly typed language. I think the functionality you want belongs in a more specialized module, not the generic std.serialization - maybe even a format-specific one. Maybe you are right. But I think it's not so difficult to implement, at least for simple cases. We can follow simple rules, for example like this: Does element b exist in the archive? - Yes. Does element b have type long? - No, the type is int. Can we convert type int to long? - Yes, load element b into a temporary variable and convert it to long: int _b = 4; long b = to!long(_b); Is it difficult to implement? Also, we can provide a few deserialization modes: strict (like the current behavior) and smart (like the example above). Maybe even 3 levels: strict, implicit conversions (like int to long) and explicit conversions (like long to int).
Re: std.serialization: pre-voting review / discussion
On Wednesday, 28 August 2013 at 16:02:09 UTC, ilya-stromberg wrote: ... There was a good proposal by Dmitry to separate sequential strict serialization from the random-access one as two distinct entities. I like it and I think it can also solve your problem.
Re: std.serialization: pre-voting review / discussion
On Wednesday, 28 August 2013 at 16:10:03 UTC, Dicebot wrote: There was a good proposal by Dmitry to separate sequential strict serialization from the random-access one as two distinct entities. I like it and I think it can also solve your problem. The problem is not only mine. Actually, I didn't use C# serialization due to this problem - any minimal code change breaks all previously serialized data. But I used .Net 1, maybe the current version solves this. Can you post the link, please?
Re: std.serialization: pre-voting review / discussion
On Wednesday, 28 August 2013 at 16:19:20 UTC, ilya-stromberg wrote: Can you print the link, please? http://forum.dlang.org/post/kvj17t$1ash$1...@digitalmars.com (Rigid vs Flexible part)
Re: std.serialization: pre-voting review / discussion
On 2013-08-28 18:02, ilya-stromberg wrote: Maybe you are right. But I think it's not so difficult to implement, at least for simple cases. We can follow simple rules, for example like this: Does element b exist in the archive? - Yes. Does element b have type long? - No, the type is int. Can we convert type int to long? - Yes, load element b into a temporary variable and convert it to long: int _b = 4; long b = to!long(_b); Is it difficult to implement? Also, we can provide a few deserialization modes: strict (like the current behavior) and smart (like the example above). Maybe even 3 levels: strict, implicit conversions (like int to long) and explicit conversions (like long to int). I don't think we should add too much of this kind of functionality. There's a reason why it supports custom serialization. This is a perfect example. -- /Jacob Carlborg
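To make the three proposed levels concrete, here is a minimal sketch of the conversion step alone. The `Mode` enum and the `convert` function are hypothetical names for this illustration and do not exist in std.serialization. Implicit conversions are detected with `is(From : To)`, and explicit ones go through `std.conv.to`, which throws if the value does not fit.

```d
import std.conv : to;
import std.exception : assertThrown;

/// Hypothetical deserialization modes, from most to least restrictive.
enum Mode { strict, implicitConversions, explicitConversions }

/// Convert an archived value of type `From` to the field type `To`
/// according to `mode`; throws if the mode does not allow the conversion.
To convert(To, From)(From value, Mode mode)
{
    static if (is(From == To))
        return value;                       // same type: always allowed
    else
    {
        static if (is(From : To))           // implicit, e.g. int -> long
        {
            if (mode >= Mode.implicitConversions)
                return value;
        }
        else static if (is(typeof(value.to!To)))  // explicit, e.g. long -> int
        {
            if (mode == Mode.explicitConversions)
                return value.to!To;         // range-checked by std.conv
        }
        throw new Exception(
            "cannot convert " ~ From.stringof ~ " to " ~ To.stringof);
    }
}

void main()
{
    int archived = 4;
    assert(convert!long(archived, Mode.implicitConversions) == 4);
    assertThrown(convert!long(archived, Mode.strict));
    assert(convert!int(4L, Mode.explicitConversions) == 4);
    assertThrown(convert!int(4L, Mode.implicitConversions));
}
```

Whether such modes belong in the generic module or in user-written custom serialization is exactly the point debated in the posts above; the sketch takes no position on that.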
Re: std.serialization: pre-voting review / discussion
On Saturday, 24 August 2013 at 17:47:35 UTC, Jacob Carlborg wrote: First, the interface Serializable is actually not necessary because this is actually checked with a template at compile time, it's possible to use these methods for structs as well. Second, instead of checking for both toData and fromData when serializing and deserializing it should only check for toData when serializing and only for fromData when deserializing. The name isSerializable is TERRIBLE: https://dl.dropboxusercontent.com/u/18386187/docs/std.serialization/std_serialization_serializable.html#.isSerializable It only checks if the functions toData and fromData exist in a class or struct. But std.serialization can serialize almost any data, so please rename the template to something like hasCustomSerialization. The real isSerializable must check if it's possible to serialize and should look like this: enum isSerializable(T) = serializer.serialize(T);
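A possible shape for the renamed check, written as a sketch only. The trait names `hasToData`, `hasFromData` and `hasCustomSerialization` follow the suggestion above but are assumptions, and the `Serializer` struct here is just a stand-in so the example is self-contained.

```d
/// Stand-in for the real serializer type, for illustration only.
struct Serializer
{
    alias Data = string;
}

/// True if `T` declares a `toData` member (custom serialization).
enum hasToData(T) = __traits(hasMember, T, "toData");

/// True if `T` declares a `fromData` member (custom deserialization).
enum hasFromData(T) = __traits(hasMember, T, "fromData");

/// True if `T` customizes either direction.
enum hasCustomSerialization(T) = hasToData!T || hasFromData!T;

struct Plain
{
    int a;
}

struct ReadOnly
{
    long b;

    // Only deserialization is customized; serialization uses the default.
    void fromData (Serializer serializer, Serializer.Data key) { }
}

static assert(!hasCustomSerialization!Plain);
static assert(hasFromData!ReadOnly && !hasToData!ReadOnly);
static assert(hasCustomSerialization!ReadOnly);

void main() {}
```

Checking the two directions separately matches the plan quoted above: only `toData` matters when serializing and only `fromData` when deserializing, and no interface is needed, so structs work as well.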
Re: std.serialization: pre-voting review / discussion
On Friday, 23 August 2013 at 20:28:10 UTC, Jacob Carlborg wrote: On 2013-08-22 21:30, ilya-stromberg wrote: What about more difficult cases? Actually, my previous answer was not entirely correct. By default it will throw an exception. But you can implement the above using custom serialization (here using Orange) : Great job! A little question. For example, I would like to load data from the previous format and store the current version in the default std.serialization format. So I don't want to implement toData at all. Is it possible? Or can I call the default serialization method? Something like this:

    class Foo : Serializable
    {
        long b;

        //I don't want to implement this
        void toData (Serializer serializer, Serializer.Data key)
        {
            serializer.serialize(this);
        }

        void fromData (Serializer serializer, Serializer.Data key)
        {
            b = serializer.deserialize!(int)("b");
        }
    }

Also, please add this example to the documentation; it could be useful for many users. Note that we can split the Serializable interface into 2 interfaces:

    interface ToSerializable
    {
        void toData(Serializer serializer, Serializer.Data key);
    }

    interface FromSerializable
    {
        void fromData(Serializer serializer, Serializer.Data key);
    }

    interface Serializable : ToSerializable, FromSerializable
    {
    }

    class Foo : FromSerializable
    {
        long b;

        void fromData (Serializer serializer, Serializer.Data key)
        {
            b = serializer.deserialize!(int)("b");
        }

        //I must NOT implement toData
    }
Re: std.serialization: pre-voting review / discussion
On 2013-08-24 14:45, ilya-stromberg wrote: Great job! A little question. For example, I would like to load data from the previous format and store the current version in the default std.serialization format. So I don't want to implement toData at all. Is it possible? Or can I call the default serialization method? Something like this:

    class Foo : Serializable
    {
        long b;

        //I don't want to implement this
        void toData (Serializer serializer, Serializer.Data key)
        {
            serializer.serialize(this);
        }

        void fromData (Serializer serializer, Serializer.Data key)
        {
            b = serializer.deserialize!(int)("b");
        }
    }

I actually noticed this problem when I wrote the example. First, the interface Serializable is actually not necessary because this is actually checked with a template at compile time, it's possible to use these methods for structs as well. Second, instead of checking for both toData and fromData when serializing and deserializing it should only check for toData when serializing and only for fromData when deserializing. I'll add this to my todo list. -- /Jacob Carlborg
Re: std.serialization: pre-voting review / discussion
On Saturday, 24 August 2013 at 17:47:35 UTC, Jacob Carlborg wrote: I actually noticed this problem when I wrote the example. First, the interface Serializable is actually not necessary because this is actually checked with a template at compile time, it's possible to use these methods for structs as well. Second, instead of checking for both toData and fromData when serializing and deserializing it should only check for toData when serializing and only for fromData when deserializing. In that case maybe we should remove the Serializable interface? And just specify that the user must implement toData or fromData for custom serializing or deserializing. Is it possible?
Re: std.serialization: pre-voting review / discussion
On 2013-08-24 21:26, ilya-stromberg wrote: In that case maybe we should remove the Serializable interface? And just specify that the user must implement toData or fromData for custom serializing or deserializing. Is it possible? Yes, that's what I'm planning to do. -- /Jacob Carlborg
Re: std.serialization: pre-voting review / discussion
On Saturday, 24 August 2013 at 19:32:13 UTC, Jacob Carlborg wrote: On 2013-08-24 21:26, ilya-stromberg wrote: In that case maybe we should remove the Serializable interface? And just specify that the user must implement toData or fromData for custom serializing or deserializing. Is it possible? Yes, that's what I'm planning to do. Maybe we should rename the methods toData and fromData to avoid name collisions? For example, we can use serializeToData and deserializeFromData, it will be clearer.
Re: std.serialization: pre-voting review / discussion
On Thursday, 22 August 2013 at 19:53:53 UTC, Jacob Carlborg wrote: On 2013-08-22 21:30, ilya-stromberg wrote: What about more difficult cases? No it can't. It will throw an exception because it cannot find a long element: Could not find an element "long" with the attribute "key" with the value "b" It's a serious issue. Maybe it's more important than range support. For example, I have to change a class (bug fixing, new features, etc.), but it is compatible with the previous version (example: it's always possible to convert int to long). In that case I can't use std.serialization and have to write my own solution (for example, save data in a csv file). The easiest way to fix it is to store an Interface Definition of the serialized data (it should be generated automatically). For example, we can use XML Schema for the XML archive. With an Interface Definition we can find changes and try to convert data to the new format. Note that glycerine also drew your attention to this point: http://forum.dlang.org/post/kftlfwcyughhghewq...@forum.dlang.org 1. Interface Definition Language (IDL): required or not? If not, how do you know the details of what to serialize? If not, how do you handle/support data versioning? If not, how do you interoperate without another language? If yes, which types are supported and what is the syntax and grammar of the IDL? Ideas?
Re: std.serialization: pre-voting review / discussion
On Friday, 23 August 2013 at 13:34:04 UTC, ilya-stromberg wrote: It's a serious issue. Maybe it's more important than range support. For example, I have to change a class (bug fixing, new features, etc.), but it is compatible with the previous version (example: it's always possible to convert int to long). In that case I can't use std.serialization and have to write my own solution (for example, save data in a csv file). I don't see it as an issue at all. The behavior you want can't be defined in a generic way, at least not without a lot of UDA help or a similar declarative approach. In other words, the fact that those two classes are interchangeable in the context of serialization exists only in the mind of the programmer, not in the D type system. More than that, such behavior goes seriously out of line with D being a strongly typed language. I think the functionality you want belongs in a more specialized module, not the generic std.serialization - maybe even a format-specific one.
Re: std.serialization: pre-voting review / discussion
On Friday, 23 August 2013 at 13:39:47 UTC, Dicebot wrote: On Friday, 23 August 2013 at 13:34:04 UTC, ilya-stromberg wrote: It's a serious issue. Maybe it's more important than range support. For example, I have to change a class (bug fixing, new features, etc.), but it is compatible with the previous version (example: it's always possible to convert int to long). In that case I can't use std.serialization and have to write my own solution (for example, save data in a csv file). I don't see it as an issue at all. The behavior you want can't be defined in a generic way, at least not without a lot of UDA help or a similar declarative approach. In other words, the fact that those two classes are interchangeable in the context of serialization exists only in the mind of the programmer, not in the D type system. More than that, such behavior goes seriously out of line with D being a strongly typed language. I think the functionality you want belongs in a more specialized module, not the generic std.serialization - maybe even a format-specific one. What about adding delegate hooks somewhere? These delegates would be called on errors like an invalid type or a missing field. I'm not saying this needs to be there in order to release, but would this be a direction we'd like to go eventually? I've seen similar approaches elsewhere (e.g. Node.js's HTTP parser).
Re: std.serialization: pre-voting review / discussion
On 2013-08-23 16:39, Tyler Jameson Little wrote: What about adding delegate hooks somewhere? These delegates would be called on errors like an invalid type or a missing field. I'm not saying this needs to be there in order to release, but would this be a direction we'd like to go eventually? I've seen similar approaches elsewhere (e.g. Node.js's HTTP parser). std.serialization already supports delegate hooks for missing values: https://dl.dropboxusercontent.com/u/18386187/docs/std.serialization/std_serialization_serializer.html#.Serializer.errorCallback -- /Jacob Carlborg
Re: std.serialization: pre-voting review / discussion
On 2013-08-22 21:30, ilya-stromberg wrote: Great! What about more difficult cases? For example, we have: class Foo { int a; int b; } After changes we have new class: class Foo { long b; } Can std.serialization load data to new class from old file? It should ignore a and convert b from int to long. Actually, my previous answer was not entirely correct. By default it will throw an exception. But you can implement the above using custom serialization (here using Orange):

    module main;

    import orange.serialization._;
    import orange.serialization.archives._;
    import std.stdio;

    class Foo : Serializable
    {
        long b;

        void toData (Serializer serializer, Serializer.Data key)
        {
        }

        void fromData (Serializer serializer, Serializer.Data key)
        {
            // read the old int value stored under the key "b"
            b = serializer.deserialize!(int)("b");
        }
    }

    void main ()
    {
        auto archive = new XmlArchive!(char);
        auto serializer = new Serializer(archive);

        // archive produced by the old class Foo { int a; int b; }
        auto data = `<?xml version="1.0" encoding="UTF-8"?>
    <archive version="1.0.0" type="org.dsource.orange.xml">
        <data>
            <object runtimeType="main.Foo" type="main.Foo" key="0" id="0">
                <int key="a" id="1">3</int>
                <int key="b" id="2">4</int>
            </object>
        </data>
    </archive>`;

        auto f = serializer.deserialize!(Foo)(cast(immutable(void)[]) data);
        assert(f.b == 4);
    }

-- /Jacob Carlborg
Re: std.serialization: pre-voting review / discussion
On Friday, 23 August 2013 at 20:29:40 UTC, Jacob Carlborg wrote: On 2013-08-23 16:39, Tyler Jameson Little wrote: What about adding delegate hooks somewhere? These delegates would be called on errors like an invalid type or a missing field. I'm not saying this needs to be there in order to release, but would this be a direction we'd like to go eventually? I've seen similar approaches elsewhere (e.g. Node.js's HTTP parser). std.serialization already supports delegate hooks for missing values: https://dl.dropboxusercontent.com/u/18386187/docs/std.serialization/std_serialization_serializer.html#.Serializer.errorCallback Awesome!
Re: std.serialization: pre-voting review / discussion
On Sunday, 18 August 2013 at 19:46:00 UTC, Jacob Carlborg wrote: If versioning is crucial it can be added. Can std.serialization load data if class definition was changed? For example, we have class Foo: class Foo { int a; int b; } and we serialize it in some file. After that class Foo was changed: class Foo { int b; int a; } Can std.serialization load data from old file to the new class?
Re: std.serialization: pre-voting review / discussion
On 2013-08-22 13:57, ilya-stromberg wrote: Can std.serialization load data if class definition was changed? For example, we have class Foo: class Foo { int a; int b; } and we serialize it in some file. After that class Foo was changed: class Foo { int b; int a; } Can std.serialization load data from old file to the new class? Yes. In this case it will use the name of the instance fields when searching for values in the archive. -- /Jacob Carlborg
Re: std.serialization: pre-voting review / discussion
On 2013-08-20 20:04, Walter Bright wrote: Hmm. That looks then like a ddoc bug. Added as: http://d.puremagic.com/issues/show_bug.cgi?id=10870 Found this as well: http://d.puremagic.com/issues/show_bug.cgi?id=10869 -- /Jacob Carlborg
Re: std.serialization: pre-voting review / discussion
On Thursday, 22 August 2013 at 13:13:48 UTC, Jacob Carlborg wrote: On 2013-08-22 13:57, ilya-stromberg wrote: Can std.serialization load data if class definition was changed? Yes. In this case it will use the name of the instance fields when searching for values in the archive. Great! What about more difficult cases? For example, we have: class Foo { int a; int b; } After changes we have new class: class Foo { long b; } Can std.serialization load data to new class from old file? It should ignore a and convert b from int to long.
Re: std.serialization: pre-voting review / discussion
On 8/22/2013 9:31 AM, Jacob Carlborg wrote: Added as: http://d.puremagic.com/issues/show_bug.cgi?id=10870 Found this as well: http://d.puremagic.com/issues/show_bug.cgi?id=10869 Thanks
Re: std.serialization: pre-voting review / discussion
On 2013-08-22 21:30, ilya-stromberg wrote: Great! What about more difficult cases? For example, we have: class Foo { int a; int b; } After changes we have new class: class Foo { long b; } Can std.serialization load data to new class from old file? It should ignore a and convert b from int to long. No it can't. It will throw an exception because it cannot find a long element: Could not find an element "long" with the attribute "key" with the value "b" -- /Jacob Carlborg
Re: std.serialization: pre-voting review / discussion
On Tuesday, 20 August 2013 at 10:51:25 UTC, Johannes Pfau wrote: On Tue, 20 Aug 2013 10:40:57 +0200, ilya-stromberg ilya-stromberg-2...@yandex.ru wrote: We can use InputRange like this:

    import std.file;

    auto archive = new Archive();
    Serializer(archive).serialize(object);
    //Archive implements InputRange for ubyte[]
    write(file, archive);

Yes, InputRange is more flexible, but it's also more difficult to implement and less efficient: What happens between the 'serialize' and the 'write' call? Archive has to cache the data, either the original object or the final produced data in an ubyte[] buffer. No, Archive has to do NOTHING. The 'serialize' call must only store a pointer to the object - without this requirement we can't have a lazy range. Serialization starts after the 'write' call, and ArchiveInputRange has to store the current serialization state (like Serializer in the current implementation).
Re: std.serialization: pre-voting review / discussion
On Tuesday, 20 August 2013 at 15:07:39 UTC, Tyler Jameson Little wrote: On Tuesday, 20 August 2013 at 13:44:01 UTC, Daniel Murphy wrote: Dicebot pub...@dicebot.lv wrote in message news:luhuyerzmkebcltxh...@forum.dlang.org... What I really don't like is the excessive amount of objects in the API. For example, I have found no reason why I need to create a serializer object to simply dump a struct's state. It is both boilerplate and runtime overhead I can't justify. The only state the serializer has is the archiver - and it is simply a collection of methods on its own. I prefer to be able to do something like `auto data = serialize!XmlArchiver(value);` I think this is very important. Simple uses should be as simple as possible. +1 This would enhance the 1-liner: write(file, serialize!XmlArchiver(InputRange)); We could even make nearly everything private except an isArchiver() template and serialize!(). That would be great! Also, with Uniform Function Call Syntax (UFCS) it can be better: InputRange.serialize!XmlArchiver.zip.save(file); Also, we can provide a default Archiver type, for example XmlArchiver or BinaryArchiver:

    auto serialize(Archiver = BinaryArchiver, R)(R InputRange);

    //Use default Archiver type
    InputRange.serialize.zip.save(file);
Re: std.serialization: pre-voting review / discussion
On 2013-08-21 08:45, ilya-stromberg wrote: Also, we can provide a default Archiver type, for example XmlArchiver or BinaryArchiver:

    auto serialize(Archiver = BinaryArchiver, R)(R InputRange);

    //Use default Archiver type
    InputRange.serialize.zip.save(file);

That's the plan. -- /Jacob Carlborg
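The one-call convenience function with a default archiver, as discussed above, could look roughly like this. The two archiver structs are stand-ins invented for the illustration (they just build strings) and `PlainArchiver` is a made-up name; only the shape of `serialize`, with the archiver as a defaulted template argument, follows the posts.

```d
import std.conv : to;

/// Stand-in archivers for illustration only; neither is the proposed API.
struct XmlArchiver
{
    static string archive(T)(T value)
    {
        return "<value>" ~ value.to!string ~ "</value>";
    }
}

struct PlainArchiver
{
    static string archive(T)(T value)
    {
        return value.to!string;
    }
}

/// One-call convenience wrapper: the archiver is a template argument with
/// a default, so simple uses need no serializer or archive object.
auto serialize(Archiver = PlainArchiver, T)(T value)
{
    return Archiver.archive(value);
}

void main()
{
    int x = 42;
    assert(serialize!XmlArchiver(x) == "<value>42</value>");
    assert(x.serialize() == "42");   // UFCS with the default archiver
}
```

The defaulted parameter can come before the deduced one because the value type is inferred from the function argument, the same arrangement `std.algorithm.sort` uses for its predicate.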
Re: std.serialization: pre-voting review / discussion
On 2013-08-21 02:09, Dicebot wrote: P.S. Right now the most important (and probably the only really important) thing is the range API. I think it is worth focusing on it and getting through the voting stage - the actual merge can happen at any time you / Phobos devs are satisfied with the implementation state, it does not require major community attention. Yes, but now there have been so many suggestions for how the range API should look that I'm even more confused. I think I'll start a new thread for this. -- /Jacob Carlborg
Re: std.serialization: pre-voting review / discussion
On Wednesday, 21 August 2013 at 06:55:56 UTC, Jacob Carlborg wrote: On 2013-08-21 02:09, Dicebot wrote: P.S. Right now most important (and probably only really important) thing is range API. I think it is worth focusing on it and getting through the voting stage - actual merge can happen at any time you / Phobos devs are satisfied with implementation state, it does not require major community attention. Yes, but by now there have been so many suggestions for how the range API should look that I'm even more confused. I think I'll start a new thread for this. Try reading this article: http://wiki.dlang.org/Component_programming_with_ranges It has a lot of range examples, covering std.range, std.algorithm and the creation of new ranges.
Re: std.serialization: pre-voting review / discussion
On Wednesday, 21 August 2013 at 06:55:56 UTC, Jacob Carlborg wrote: Yes, but by now there have been so many suggestions for how the range API should look that I'm even more confused. I think I'll start a new thread for this. Sure. I have already written my opinion on this, but getting the attention and opinion of some Phobos developers on this topic would be valuable.
Re: std.serialization: pre-voting review / discussion
On 8/12/2013 6:27 AM, Dicebot wrote: Documentation: https://dl.dropboxusercontent.com/u/18386187/docs/std.serialization/index.html Thank you, Jacob. It looks like you've put a lot of nice work into this. I've perused the documentation, and all I can think of is What's a cubit? http://www.youtube.com/watch?v=so9o3_daDZw I.e. there are 9 documentation pages of details. There's no obvious place to start, no overview, no explanation of what serialization is for and why I might want to use it and what's great about this implementation. At least none that I could find. Also needs some non-trivial canonical example code. Something that answers who what where when why and how would be immensely useful. Some nits: https://dl.dropboxusercontent.com/u/18386187/docs/std.serialization/std_serialization_serializationexception.html Something went horribly wrong here: Parameters: Exception exception the exception exception to wrap https://dl.dropboxusercontent.com/u/18386187/docs/std.serialization/std_serialization_registerwrapper.html Lacks an illuminating example. https://dl.dropboxusercontent.com/u/18386187/docs/std.serialization/std_serialization_serializer.html When would I use a struct Array or a struct Slice? https://dl.dropboxusercontent.com/u/18386187/docs/std.serialization/std_serialization_attribute.html struct attribute should be capitalized. When would I use an attribute? Does this have anything to do with User Defined Attributes? Need a canonical example. https://dl.dropboxusercontent.com/u/18386187/docs/std.serialization/std_serialization_archives_archive.html Aren't interfaces already abstract? I.e. abstract is redundant. The documentation defines an archive more or less as an archive. I still don't know what an archive is. (E.g. a zip file is an archive - can this create zip files?)
Re: std.serialization: pre-voting review / discussion
On 8/18/2013 9:33 AM, David Nadlinger wrote: Having a system that regularly, automatically runs the test suites of several larger, well-known D projects with the results being readily available to the DMD/druntime/Phobos teams would certainly help. But it's also not ideal, since if a project starts to fail, the exact nature of the issue (regression in DMD or bug in the project, and if the former, a minimal test case) can often be hard to track down for somebody not already familiar with the code base. That's exactly the problem. If these large projects are incorporated into the autotester, who is going to isolate/fix problems arising with them? The test suite is designed to be a collection of already-isolated issues, so understanding what went wrong shouldn't be too difficult. Note that already it is noticeably much harder to debug a phobos unit test gone awry than the other tests. A full blown project that nobody understands would fare far worse. (And the other problem, of course, is the test suite is designed to be runnable fairly quickly. Compiling some other large project and running its test suite can make the autotester much less useful when the turnaround time increases.) Putting large projects into the autotester has the implication that development and support of those projects has been ceded to the core dev team, i.e. who is responsible for it has been badly blurred.
Re: std.serialization: pre-voting review / discussion
On Tuesday, 20 August 2013 at 03:42:48 UTC, Tyler Jameson Little wrote: On Monday, 19 August 2013 at 18:06:00 UTC, Johannes Pfau wrote: An important question regarding ranges for std.serialization is whether we want it to work as an InputRange or if it should _take_ an OutputRange. So the question is - auto archive = new Archive(); Serializer(archive).serialize(object); //Archive takes OutputRange, writes to it archive.writeTo(OutputRange); vs auto archive = new Archive() Serializer(archive).serialize(object); //Archive implements InputRange for ubyte[] foreach(ubyte[] data; archive) {} - I'd use the first approach as it should be simpler to implement. The second approach would be useful if the ubyte[] elements were processed via other ranges (map, take, ...). But as binary data is usually not processed in this way but just stored to disk or sent over network (basically streaming operations) the first approach should be fine. +1 for the first way. No, you are WRONG. InputRange is MORE flexible: it can be lazy or eager. OutputRange is only eager. As we know, lazy ranges is required if it's possible: On Sunday, 18 August 2013 at 18:26:55 UTC, Dicebot wrote: So as a review manager, I think voting should be delayed until API is ready to address lazy range-based work model. No actual implementation is required but 1) it should be possible to do it later without breaking user code 2) library should not make an assumption about implementation being lazy or eager We can use InputRange like this: import std.file; auto archive = new Archive() Serializer(archive).serialize(object); //Archive implements InputRange for ubyte[] write(file, archive); Another benefit: we can process InputRange. For example, if we have ZipRange zip(InputRange) function, it's easy to compress data: write(file, zip(archive)); Another example: we would like to change output xml file and filter some data (because we already have it). Or we would like to transform output xml to the html web page. 
No problems: XmlRange transformXml(InputRange); write(file, transformXml(archive)); Ideas?
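A minimal sketch of the lazy InputRange composition argued for in this post, assuming a hypothetical `LazyArchive` type (it is not part of the proposed std.serialization API) that produces one serialized chunk per `popFront`. The `<li>` wrapping stands in for the `transformXml` stage mentioned above and is equally made up.

```d
import std.algorithm.iteration : map;
import std.conv : to;
import std.range.primitives : isInputRange;

// Hypothetical lazy archive: nothing is serialized until a consumer asks for `front`.
struct LazyArchive
{
    private const(int)[] values;

    @property bool empty() const { return values.length == 0; }
    @property string front() const { return "<int>" ~ values[0].to!string ~ "</int>"; }
    void popFront() { values = values[1 .. $]; }
}

static assert(isInputRange!LazyArchive);

void main()
{
    import std.stdio : writeln;

    auto archive = LazyArchive([1, 2, 3]);

    // Stand-in for transformXml(archive): each chunk is transformed on demand.
    foreach (chunk; archive.map!(c => "<li>" ~ c ~ "</li>"))
        writeln(chunk);
}
```

Because both stages are lazy, only one chunk is alive at a time; an eager implementation could still expose the same range interface by filling an internal buffer first.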
Re: std.serialization: pre-voting review / discussion
Am Tue, 20 Aug 2013 10:40:57 +0200 schrieb ilya-stromberg ilya-stromberg-2...@yandex.ru: On Tuesday, 20 August 2013 at 03:42:48 UTC, Tyler Jameson Little wrote: On Monday, 19 August 2013 at 18:06:00 UTC, Johannes Pfau wrote: An important question regarding ranges for std.serialization is whether we want it to work as an InputRange or if it should _take_ an OutputRange. So the question is - auto archive = new Archive(); Serializer(archive).serialize(object); //Archive takes OutputRange, writes to it archive.writeTo(OutputRange); vs auto archive = new Archive() Serializer(archive).serialize(object); //Archive implements InputRange for ubyte[] foreach(ubyte[] data; archive) {} - I'd use the first approach as it should be simpler to implement. The second approach would be useful if the ubyte[] elements were processed via other ranges (map, take, ...). But as binary data is usually not processed in this way but just stored to disk or sent over network (basically streaming operations) the first approach should be fine. +1 for the first way. No, you are WRONG. InputRange is MORE flexible: it can be lazy or eager. OutputRange is only eager. As we know, lazy ranges is required if it's possible: On Sunday, 18 August 2013 at 18:26:55 UTC, Dicebot wrote: So as a review manager, I think voting should be delayed until API is ready to address lazy range-based work model. No actual implementation is required but 1) it should be possible to do it later without breaking user code 2) library should not make an assumption about implementation being lazy or eager We can use InputRange like this: import std.file; auto archive = new Archive() Serializer(archive).serialize(object); //Archive implements InputRange for ubyte[] write(file, archive); Yes, InputRange is more flexible, but it's also more difficult to implement and less efficient: What happens between the 'serialize' and the 'write' call? 
Archive has to cache the data, either the original object or the final produced data in a ubyte[] buffer. Another benefit: we can process InputRange. For example, if we have ZipRange zip(InputRange) function, it's easy to compress data: write(file, zip(archive)); Another example: we would like to change output xml file and filter some data (because we already have it). Or we would like to transform output xml to the html web page. No problems: Filtering is easier with an InputRange. Zip-Streams on the other hand should be OutputRanges and therefore work fine with both approaches. XmlRange transformXml(InputRange); write(file, transformXml(archive)); Ideas? The question is whether there are real-world examples where this is useful. You have to weigh the utility of this approach against its more complicated and less efficient implementation.
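For comparison, a rough sketch of the first approach, where the archiver takes an OutputRange and writes to it as it goes, so nothing has to be cached between `serialize` and `write`. `SinkArchiver` and its `archive` method are invented for illustration and are not the API under review; only `isOutputRange`, `put` and `appender` are real Phobos names.

```d
import std.array : appender;
import std.conv : to;
import std.range.primitives : isOutputRange, put;

// Hypothetical archiver that streams straight into any output range of chars.
struct SinkArchiver(Sink)
    if (isOutputRange!(Sink, char))
{
    Sink sink;

    // Emits "key=value;" immediately instead of buffering it inside the archiver.
    void archive(T)(string key, T value)
    {
        put(sink, key);
        put(sink, '=');
        put(sink, value.to!string);
        put(sink, ';');
    }
}

void main()
{
    import std.stdio : writeln;

    auto app = appender!string();
    auto archiver = SinkArchiver!(typeof(app))(app);
    archiver.archive("x", 1);
    archiver.archive("y", 2);

    // Read the result through the archiver's own copy of the sink.
    writeln(archiver.sink.data);
}
```

The same struct would accept any other output range of characters, which is the streaming use case (disk, network) described above.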
Re: std.serialization: pre-voting review / discussion
Ok, I was trying to avoid expressing personal opinion until now and mostly keep track of comments of others but now that I have started reading docs/sources in details, will step down from review manager role for a moment and do some very subjective reviewing :) --- Hot topic first. Ranges. As far as I can see it it is not about lets stick range API whenever possible because it is the way Phobos does things. Key moment here to recognized use cases that are likely to require range-based interface and focus on them. As far as I can see it there two important places where possibility for range-based API can be helpful - providing values for serialize and providing raw data to deserialize, as well as matching Archiver changes. Former is relatively trivial - serialize should have an overload that accepts InputRange of monotyped values to take care of and provides ForwardRange as a result, which serializes values one-by-one lazily. Same goes to archiver. Latter is a bit more interesting. It would have been cool if instead of accepting raw data chunk that matches deserialized object size serializer.deserialize could have accepted InputRange that provides sequence of any random chunks of raw data and use it to construct values on per-request basis, lazily. This will require maintaining a buffer that will keep unconsumed remainder of the last chunk and make some decisions about behavior in case of hitting empty() before getting enough data to deserialize object. But it is not be something you should care about right now because only actual function/method signatures are needed with static asserts insides, actual implementation can be added later by anyone willing to spend time. --- Now about my personal feeling about std.serialization as a potential user. Core functionality I'd like to see in such module is the ability to dump D data type state into arbitrary formats in a robust way that requires minimal interference from the user code. 
Something like what is done with toJSON/fromJSON in vibe.d API stuff but more generic when in comes to output formats and more robust when it comes to data hierarchies to load/store. Judging by examples and documentation this is exactly what std.serialization does and I like it. It lacks some better output (Archiver) choices but it is more like Phobos fault. What I really don't like is excessive amount of object in the API. For example, I have found no reason why I need to create serializer object to simply dump a struct state. It is both boilerplate and runtime overhead I can't justify. Only state serializer has is archiver - and it is simply collection of methods on its own. I prefer to be able to do something like `auto data = serialize!XmlArchiver(value);` That is not something that would have made me vote against the inclusion (I think it is much needed anyway) but that may have discouraged me from using this part of Phobos and fall to some NIH syndrome. I have found documentation complete enough to get a basic understanding personally but one thing that has caused some frustration is that docs don't make clear distinction between minimal stuff and extra features. For example, there is https://dl.dropboxusercontent.com/u/18386187/docs/std.serialization/std_serialization_serializable.html - my guess that it is only used if user wants to override default serialization method for an aggregate type. But documentation for it is written in such manner that it gives an impression that it is absolutely required. --- Last thing is not really relevant but is more about general documentation problem. This may be the first package that makes use of new package.d system and it shows that we need some way to provide package-wide documentation to keep things clear. I guess for DDOC itself generating output from package.d is nothing special - but what about dlang.org? How hard will it be to update a documentation page to support own block for package roots?
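The chunk-buffering behaviour described earlier in this post for `deserialize` (keep the unconsumed remainder of the last chunk, decide what to do when `empty()` is hit too early) might look roughly like the sketch below. `ChunkReader` and `tryRead` are hypothetical names, and little-endian fixed-size values are an assumption made only to keep the example short.

```d
import std.bitmanip : littleEndianToNative;
import std.range.primitives : isInputRange, ElementType, empty, front, popFront;

// Hypothetical reader: pulls chunks on demand and buffers the unconsumed remainder.
struct ChunkReader(R)
    if (isInputRange!R && is(ElementType!R : const(ubyte)[]))
{
    private R chunks;
    private const(ubyte)[] buffer;

    this(R chunks) { this.chunks = chunks; }

    // Returns false if the input runs out before a whole T is available.
    bool tryRead(T)(out T value)
    {
        while (buffer.length < T.sizeof)
        {
            if (chunks.empty)
                return false;
            buffer ~= chunks.front;
            chunks.popFront();
        }
        ubyte[T.sizeof] raw = buffer[0 .. T.sizeof];
        value = littleEndianToNative!T(raw);
        buffer = buffer[T.sizeof .. $];
        return true;
    }
}

void main()
{
    import std.stdio : writeln;

    // An int split awkwardly across three chunks, followed by a ushort.
    ubyte[] c1 = [0x01];
    ubyte[] c2 = [0x02, 0x03];
    ubyte[] c3 = [0x04, 0x05, 0x00];
    auto reader = ChunkReader!(ubyte[][])([c1, c2, c3]);

    int a;
    ushort b;
    int c;
    writeln(reader.tryRead(a), " ", a);
    writeln(reader.tryRead(b), " ", b);
    writeln(reader.tryRead(c)); // input exhausted before a full int arrived
}
```

Returning `false` on a short read is only one possible policy; throwing or blocking for more data would fit the same signature idea.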
Re: std.serialization: pre-voting review / discussion
On 2013-08-20 10:01, Walter Bright wrote: Thank you, Jacob. It looks like you've put a lot of nice work into this. I've perused the documentation, and all I can think of is What's a cubit? http://www.youtube.com/watch?v=so9o3_daDZw I.e. there are 9 documentation pages of details. There's no obvious place to start, no overview, no explanation of what serialization is for and why I might want to use it and what's great about this implementation. At least none that I could find. Also needs some non-trivial canonical example code. Something that answers who what where when why and how would be immensely useful. Yes, I need to add some overview documentation. There's still the problem of finding the overview. Some nits: https://dl.dropboxusercontent.com/u/18386187/docs/std.serialization/std_serialization_serializationexception.html Something went horribly wrong here: Parameters: Exception exception the exception exception to wrap Hehe, yeah :) https://dl.dropboxusercontent.com/u/18386187/docs/std.serialization/std_serialization_registerwrapper.html Lacks an illuminating example. That doesn't need to be ddoc comments at all. The whole module is declared package. I would be really nice if ddoc could automatically hide anything that wasn't public or protected but still generate the documentation for package and private. https://dl.dropboxusercontent.com/u/18386187/docs/std.serialization/std_serialization_serializer.html When would I use a struct Array or a struct Slice? Same as above. I'll see if they really have to be public. https://dl.dropboxusercontent.com/u/18386187/docs/std.serialization/std_serialization_attribute.html struct attribute should be capitalized. When would I use an attribute? Does this have anything to do with User Defined Attributes? Need a canonical example. Same as above. I have used lower case because I don't consider this a struct, yes technically it is. This is an attribute (UDA) and I think attributes should be lower case. 
Or rather it's supposed to be used on types to indicate they are UDA's: @attribute struct foo {} The reason for this is that I'm a bit disappointed in the implementation of UDA's in D. I would have liked to have some kind of entity that I can point to and say this is an attribute. Currently all random values and types can be used as an UDA, I don't like that. Same idea why to have interface and abstract keywords. It's possible to avoid these, i.e. C++, but I think it's a lot better to have them. https://dl.dropboxusercontent.com/u/18386187/docs/std.serialization/std_serialization_archives_archive.html Aren't interfaces already abstract? I.e. abstract is redundant. I have no idea why abstract is added there. The definition looks like this: https://github.com/jacob-carlborg/phobos/blob/serialization/std/serialization/archives/archive.d#L88 The documentation defines an archive more or less as an archive. I still don't know what an archive is. The archive is the backend in the serialization process. And The archive is responsible for archiving primitive types in the format chosen by the archive implementation. The archive ensures that all types are properly archived in a format that can be later unarchived. (E.g. a zip file is an archive - can this create zip files?) Theoretically one can create an archive that serializes to a zip file, yes. Or rather the format used by zip. An archive shouldn't write to disk. -- /Jacob Carlborg
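A small sketch of the marker idea described above, where `@attribute` tags the types that are meant to be used as UDAs. `nonSerialized`, `Session` and `serializedFieldNames` are hypothetical names chosen for the example; `hasUDA` is from `std.traits` in current Phobos, which postdates this thread.

```d
import std.traits : hasUDA;

// Marker: "the type this is attached to is intended to be used as a UDA".
struct attribute {}

// Hypothetical UDA, itself tagged with the marker.
@attribute struct nonSerialized {}

struct Session
{
    int id;
    @nonSerialized int cachedHandle;
}

// The marker and the UDA can both be queried at compile time.
static assert(hasUDA!(nonSerialized, attribute));
static assert(hasUDA!(Session.cachedHandle, nonSerialized));
static assert(!hasUDA!(Session.id, nonSerialized));

// Names of the fields a serializer would visit, skipping @nonSerialized ones.
string[] serializedFieldNames(T)()
{
    string[] names;
    static foreach (i; 0 .. T.tupleof.length)
        static if (!hasUDA!(T.tupleof[i], nonSerialized))
            names ~= __traits(identifier, T.tupleof[i]);
    return names;
}

void main()
{
    import std.stdio : writeln;
    writeln(serializedFieldNames!Session());
}
```

Nothing in the language enforces the marker, which is the complaint made above: any value or type can still be attached as a UDA.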
Re: std.serialization: pre-voting review / discussion
Dicebot pub...@dicebot.lv wrote in message news:luhuyerzmkebcltxh...@forum.dlang.org... What I really don't like is excessive amount of object in the API. For example, I have found no reason why I need to create serializer object to simply dump a struct state. It is both boilerplate and runtime overhead I can't justify. Only state serializer has is archiver - and it is simply collection of methods on its own. I prefer to be able to do something like `auto data = serialize!XmlArchiver(value);` I think this is very important. Simple uses should be as simple as possible.
Re: std.serialization: pre-voting review / discussion
On Tuesday, 20 August 2013 at 13:44:01 UTC, Daniel Murphy wrote: Dicebot pub...@dicebot.lv wrote in message news:luhuyerzmkebcltxh...@forum.dlang.org... What I really don't like is excessive amount of object in the API. For example, I have found no reason why I need to create serializer object to simply dump a struct state. It is both boilerplate and runtime overhead I can't justify. Only state serializer has is archiver - and it is simply collection of methods on its own. I prefer to be able to do something like `auto data = serialize!XmlArchiver(value);` I think this is very important. Simple uses should be as simple as possible. +1 This would enhance the 1-liner: write(file, serialize!XmlArchiver(InputRange)); We could even make nearly everything private except an isArchiver() template and serialize!().
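As a rough sketch of how small that public surface could be, the following pairs an `isArchiver` check with a free `serialize` function. `TextArchiver`, its `archive`/`data` members and the `key=value;` format are invented for the example; they are not the `XmlArchiver` interface from the proposal, and the sketch ignores the reference tracking a real serializer needs.

```d
import std.conv : to;

// Hypothetical toy archiver, just enough to have something to plug in.
struct TextArchiver
{
    private string buffer;

    void archive(T)(string key, T value)
    {
        buffer ~= key ~ "=" ~ value.to!string ~ ";";
    }

    string data() const { return buffer; }
}

// Assumed archiver concept: archive(key, value) plus a data property.
enum isArchiver(A) = is(typeof((A a) {
    a.archive("key", 1);
    auto d = a.data;
}));

// One-call convenience wrapper; the archiver defaults to TextArchiver.
auto serialize(Archiver = TextArchiver, T)(T value)
    if (isArchiver!Archiver && is(T == struct))
{
    Archiver archiver;
    foreach (i, field; value.tupleof)
        archiver.archive(__traits(identifier, T.tupleof[i]), field);
    return archiver.data;
}

struct Point { int x; int y; }

void main()
{
    import std.stdio : writeln;

    writeln(Point(1, 2).serialize);               // default archiver, UFCS call
    writeln(serialize!TextArchiver(Point(3, 4))); // archiver named explicitly
}
```

Placing the defaulted `Archiver` parameter before the deduced `T` follows the same pattern Phobos uses for `std.algorithm.sort`, so the archiver can be named explicitly or left out.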
Re: std.serialization: pre-voting review / discussion
On Monday, 19 August 2013 at 16:29:54 UTC, Jacob Carlborg wrote: On 2013-08-19 17:41, Jesse Phillips wrote: Code has moved to https://github.com/opticron/ProtocolBuffer Does it have any utility functions that are fairly standalone to handle the basic types, i.e. int, string, float and so on? The data conversions are handled by https://github.com/opticron/ProtocolBuffer/blob/master/conversion/pbbinary.d
Re: std.serialization: pre-voting review / discussion
On 8/20/2013 6:28 AM, Jacob Carlborg wrote: That doesn't need to be ddoc comments at all. The whole module is declared package. I would be really nice if ddoc could automatically hide anything that wasn't public or protected but still generate the documentation for package and private. You can hide comments from ddoc by not starting them with /** but with /* I have no idea why abstract is added there. The definition looks like this: https://github.com/jacob-carlborg/phobos/blob/serialization/std/serialization/archives/archive.d#L88 Hmm. That looks then like a ddoc bug. The documentation defines an archive more or less as an archive. I still don't know what an archive is. The archive is the backend in the serialization process. Doesn't make sense to me. I would think the archive would be what is created, not the creator. And The archive is responsible for archiving primitive types in the format chosen by the archive implementation. The archive ensures that all types are properly archived in a format that can be later unarchived. What confuses me here is the conflation between the archiveR and the resulting archive, i.e. an archiver creates an archive. Saying archive creates the archive is a bit of a disastrous conflation of the terms, as it makes the documentation a constant source of confusion. (E.g. a zip file is an archive - can this create zip files?) Theoretically one can create an archive that serializes to a zip file, yes. Or rather the format used by zip. An archive shouldn't write to disk. Some exposition of this is necessary, along with some comments along the line that the package provides a generic archiving interface, and a couple implementations X and Y of that interface, and that other implementations such as Z, the zip archiver, are possible.
Re: std.serialization: pre-voting review / discussion
On 2013-08-20 15:12, Dicebot wrote: What I really don't like is excessive amount of object in the API. For example, I have found no reason why I need to create serializer object to simply dump a struct state. It is both boilerplate and runtime overhead I can't justify. Only state serializer has is archiver - and it is simply collection of methods on its own. I prefer to be able to do something like `auto data = serialize!XmlArchiver(value);` I have been planning to add a function like that but just haven't got around doing it. This is just a convenience function that is easy to add. Some reasons for having an object oriented API are: * The serializer does have state. It stores information about what's serialized and keep track that an object is not stored more than once in the archive and similar things. * When doing custom serialization the serializer is passed to the methods: https://github.com/jacob-carlborg/orange/wiki/Custom-Serialization I have found documentation complete enough to get a basic understanding personally but one thing that has caused some frustration is that docs don't make clear distinction between minimal stuff and extra features. For example, there is https://dl.dropboxusercontent.com/u/18386187/docs/std.serialization/std_serialization_serializable.html - my guess that it is only used if user wants to override default serialization method for an aggregate type. But documentation for it is written in such manner that it gives an impression that it is absolutely required. Ok, I can try and clarify that. -- /Jacob Carlborg
Re: std.serialization: pre-voting review / discussion
On 2013-08-20 17:07, Tyler Jameson Little wrote: +1 This would enhance the 1-liner: write(file, serialize!XmlArchiver(InputRange)); We could even make nearly everything private except an isArchiver() template and serialize!(). The rest of the API is needed for more advanced use cases. -- /Jacob Carlborg
Re: std.serialization: pre-voting review / discussion
On 2013-08-20 20:04, Walter Bright wrote: You can hide comments from ddoc by not starting them with /** but with /* Yeah, I know that. Doesn't make sense to me. I would think the archive would be what is created, not the creator. I guess it could be called archiver, or do you have a better suggestion? What confuses me here is the conflation between the archiveR and the resulting archive, i.e. an archiver creates an archive. Saying archive creates the archive is a bit of a disastrous conflation of the terms, as it makes the documentation a constant source of confusion. Would calling it archiver or some other name be better? Some exposition of this is necessary, along with some comments along the line that the package provides a generic archiving interface, and a couple implementations X and Y of that interface, and that other implementations such as Z, the zip archiver, are possible. I don't understand what's so confusing. This is the interface all archive implementations need to implement to be able to be used as an archive with the serializer. -- /Jacob Carlborg
Re: std.serialization: pre-voting review / discussion
On Tuesday, 20 August 2013 at 20:06:12 UTC, Jacob Carlborg wrote: Would calling it archiver or some other name be better? Almost certainly, yes. An archive is something you put data into, not something that puts data somewhere else. ;) David
Re: std.serialization: pre-voting review / discussion
On 8/20/2013 1:06 PM, Jacob Carlborg wrote: I guess it could be called archiver, or do you have a better suggestion? That sounds perfect. Some exposition of this is necessary, along with some comments along the line that the package provides a generic archiving interface, and a couple implementations X and Y of that interface, and that other implementations such as Z, the zip archiver, are possible. I don't understand what's so confusing. This is the interface all archive implementations need to implement to be able to be used as an archive with the serializer. I tend to think in terms of concrete examples, rather than abstract concepts. Hence my suggestion.
Re: std.serialization: pre-voting review / discussion
On Tuesday, 20 August 2013 at 19:34:54 UTC, Jacob Carlborg wrote: I have been planning to add a function like that but just haven't got around doing it. This is just a convenience function that is easy to add. Cool, as I have said it is not something critical that would impact my voting, just personal preferences. * The serializer does have state. It stores information about what's serialized and keep track that an object is not stored more than once in the archive and similar things. Ah, makes sense. Well I guess then this is the power for robustness in data structure support and nothing can be done but hide it behind convenience wrappers. Sad but true :)
Re: std.serialization: pre-voting review / discussion
P.S. Right now most important (and probably only really important) thing is range API. I think it is worth focusing on it and getting through the voting stage - actual merge can happen at any time you / Phobos devs are satisfied with implementation state, it does not require major community attention.
Re: std.serialization: pre-voting review / discussion
On Sunday, 18 August 2013 at 18:26:55 UTC, Dicebot wrote: On Monday, 12 August 2013 at 13:27:45 UTC, Dicebot wrote: Stepping up to act as a Review Manager for Jacob Carlborg std.serialization So as a review manager, I think voting should be delayed until API is ready to address lazy range-based work model. No actual implementation is required but 1) it should be possible to do it later without breaking user code 2) library should not make an assumption about implementation being lazy or eager Can we patch the current std.xml to add file input/output, not only memory input/output? It could help to serialize big data arrays directly to a file.
Re: std.serialization: pre-voting review / discussion
On 2013-08-18 01:31, Jesse Phillips wrote: I'd like to start off by saying I don't really know what I want from a std.serialize module. I've done some work with a C# JSON serializer and dealt with Protocol Buffers. This library looks to be providing a means to serialize any D data structure. It deals with pointers/class/struct/arrays... It is export format agnostic, while currently only XML is available, allowing for export to JSON or some binary form. Afterwards the data can return to the program through deserialization. This is a use-case I don't think I've needed. Though I do see the value in it and would expect Phobos to provide such functionality. What I'm not finding in this library is a way to support a 3rd party protocol. Such as those used in Thrift or Protocol Buffers. These specify some aspects of data layout, for example in Protocol Buffers arrays of primitives can be laid out in two forms [ID][value][ID][value] or [ID][Length][value][value]. I have had a brief look at Protocol Buffers and I don't see why it wouldn't work as an archive. I would probably need to implement a Protocol Buffers archive type to see what the limitations of std.serialization and Protocol Buffers are. Thrift and Protocol Buffers use code generation to create the language data type, and at least for Protocol Buffers a method contains all the logic for deserializing a collection of bytes, and one for serializing. I'm not seeing how std.serialize would make this easier or more usable. If a Thrift or Protocol Buffers archive would be used with std.serialization I'm thinking that one would skip that step and have the data types defined directly in D. When looking at the Archive module, I see that all the specific types get their own void function. I'm unclear on where these functions are supposed to archive to The archive holds the data. When the serialization is complete the data can be accessed using archive.data. 
, and container types take a delegate which I suppose is a means for the archiver to place output around the field data. Yes, exactly. It lets the archive know where a structured type begins and ends. In conclusion, I don't feel like I've said very much. I don't think std.serialize is applicable to Protocol Buffers or Thrift, and I don't know what benefit there would be if it was. -- /Jacob Carlborg
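For readers unfamiliar with the two Protocol Buffers layouts mentioned in this exchange, the sketch below encodes the same repeated integer field both ways: once as [ID][value] pairs and once as [ID][Length][value][value]. It follows the published Protocol Buffers wire format (varints, wire types 0 and 2) but is independent of std.serialization; all function names are made up for the example.

```d
// Base-128 varint: 7 payload bits per byte, high bit set while more bytes follow.
ubyte[] varint(ulong v)
{
    ubyte[] bytes;
    do
    {
        ubyte b = cast(ubyte)(v & 0x7F);
        v >>= 7;
        if (v != 0)
            b |= 0x80;
        bytes ~= b;
    } while (v != 0);
    return bytes;
}

enum WireType : ubyte { varint = 0, lengthDelimited = 2 }

// The "ID" in front of every field: (field number << 3) | wire type.
ubyte[] tag(uint fieldNumber, WireType type)
{
    return varint((fieldNumber << 3) | type);
}

// Layout 1: [ID][value][ID][value]...
ubyte[] encodeUnpacked(uint fieldNumber, const(ulong)[] values)
{
    ubyte[] result;
    foreach (v; values)
        result ~= tag(fieldNumber, WireType.varint) ~ varint(v);
    return result;
}

// Layout 2: [ID][Length][value][value]...
ubyte[] encodePacked(uint fieldNumber, const(ulong)[] values)
{
    ubyte[] payload;
    foreach (v; values)
        payload ~= varint(v);
    return tag(fieldNumber, WireType.lengthDelimited) ~ varint(payload.length) ~ payload;
}

void main()
{
    import std.stdio : writefln;

    writefln("%(%02x %)", encodeUnpacked(1, [1, 2, 150])); // 08 01 08 02 08 96 01
    writefln("%(%02x %)", encodePacked(1, [1, 2, 150]));   // 0a 04 01 02 96 01
}
```

An archiver targeting this format would need to know the field number and the chosen layout for each array, which is the kind of per-field information the question above is about.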
Re: std.serialization: pre-voting review / discussion
On 2013-08-18 20:26, Dicebot wrote: OK, time to make a short summary. There have been mentioned several issues / improvement possibilities. I don't think they prevent voting and it is up to Jacob to decide what he want to incorporate from it. I've been quite busy lately but I've tried to address the minor issues with regards of documentation. I've hit a new problem in the process: http://forum.dlang.org/thread/kujcns$1quo$1...@digitalmars.com However, there are two things that do matter in my opinion - pre-UDA part of API and uncertainty about range-based lazy approach. Important thing here is that while library can be included with plenty of features lacking we can't really afford to break its API only few releases later just to add/remove these features. What do you mean with pre-UDA part of API? I think it will be fairly easy to add support for ranges, at least for the output. I'll see what I can do. So as a review manager, I think voting should be delayed until API is ready to address lazy range-based work model. No actual implementation is required but 1) it should be possible to do it later without breaking user code 2) library should not make an assumption about implementation being lazy or eager That is my understanding based on current knowledge of Phobos modules, please correct me if I am wrong. Jacob, please tell if you have any objections or, if this decision sounds reasonable - just contact me via e-mail when you will find std.serialization suitable for final voting. I think it is pretty clear that package itself is considered useful and welcome to Phobos. -- /Jacob Carlborg
Re: std.serialization: pre-voting review / discussion
On Monday, 19 August 2013 at 12:57:56 UTC, Jacob Carlborg wrote: I've been quite busy lately but I've tried to address the minor issues with regards of documentation. I've hit a new problem in the process: http://forum.dlang.org/thread/kujcns$1quo$1...@digitalmars.com I also expect that enhancement to dlang.org to support package.d documentation will also probably be needed at some point to get proper examples. Such issues can be worked on during actual merge process and are not worth blocking voting. What do you mean with pre-UDA part of API? This thread: http://forum.dlang.org/post/xqklcesoguxujifij...@forum.dlang.org I think it will be fairly easy to add support for ranges, at least for the output. I'll see what I can do. Great! Are there any difficulties with the input?
Re: std.serialization: pre-voting review / discussion
On Monday, 19 August 2013 at 12:49:48 UTC, Jacob Carlborg wrote: I have had a brief look at Protocol Buffers and I don't see why it wouldn't work as an archive. I would probably need to implement a Protocol Buffers archive type to see what the limitations of std.serialization and Protocol Buffers are. You can find the Protocol Buffers library here, maybe it helps: https://256.makerslocal.org/wiki/index.php/ProtocolBuffer
Re: std.serialization: pre-voting review / discussion
On 2013-08-19 15:03, Dicebot wrote: Great! Are there any difficulties with the input? It's just that I don't clearly know what the code will need to look like, and I'm not particularly familiar with implementing range-based code. -- /Jacob Carlborg
Re: std.serialization: pre-voting review / discussion
On Monday, 19 August 2013 at 13:31:27 UTC, Jacob Carlborg wrote: On 2013-08-19 15:03, Dicebot wrote: Great! Are there any difficulties with the input? It just that I don't clearly know how the code will need to look like, and I'm not particular familiar with implementing range based code. Ok, I'll investigate related part of package a bit more in details during this week and see if I can suggest something.
Re: std.serialization: pre-voting review / discussion
On Monday, 19 August 2013 at 13:31:27 UTC, Jacob Carlborg wrote: On 2013-08-19 15:03, Dicebot wrote: Great! Are there any difficulties with the input? It just that I don't clearly know how the code will need to look like, and I'm not particular familiar with implementing range based code.

Maybe we need some kind of doc explaining the idiomatic usage of ranges? Personally, I'd like to do something like this:

    auto archive = new XmlArchive!(char); // create an XML archive
    auto serializer = new Serializer(archive); // create the serializer
    serializer.serialize(foo);
    pipe(archive.out, someFile);

Where pipe would read from the left and write to the right. My idea for an implementation is through using take():

    void pipe(R) (R input, File output) // isInputRange(R)...
    {
        while (!input.empty) {
            // if Serializer has no data cached, goes through one step
            // and returns what it has
            auto arr = input.take(BUF_SIZE);
            input.popFrontN(arr.length);
            output.write(arr);
        }
    }

For now, I'd be happy for serializer to process all data in serialize(), but change the behavior later to do step through computation when calling take(). I don't know if this helps, and others are very likely to have better ideas.
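As a rough illustration of the pipe idea (not code from the thread): the sketch below drains an input range into a sink element by element and compares the result with what the existing Phobos algorithm std.algorithm.copy produces. The helper name `pipe`, the byte values, and the use of an appender in place of a file are assumptions made only for this example.

```d
import std.algorithm : copy;
import std.array : appender;
import std.range.primitives : empty, front, popFront, isInputRange, put;

/// Hypothetical helper mirroring the `pipe` idea: drain an input range into a sink.
void pipe(R, Sink)(R input, ref Sink sink)
if (isInputRange!R)
{
    for (; !input.empty; input.popFront())
        put(sink, input.front);
}

void main()
{
    // Stand-in for the bytes an archive would produce.
    ubyte[] serialized = [0x3c, 0x61, 0x2f, 0x3e];

    auto viaPipe = appender!(ubyte[])();
    pipe(serialized, viaPipe);

    // The Phobos algorithm does the same job.
    auto viaCopy = appender!(ubyte[])();
    serialized.copy(viaCopy);

    assert(viaPipe.data == serialized);
    assert(viaCopy.data == viaPipe.data);
}
```

If the archive exposed its output as an input range, a hand-written pipe would probably not be needed at all; copy into any output range would cover it.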
Re: std.serialization: pre-voting review / discussion
On Sunday, 18 August 2013 at 20:33:01 UTC, Jonathan M Davis wrote: On Sunday, August 18, 2013 21:45:59 Jacob Carlborg wrote: If versioning is crucial it can be added. I don't know if it's crucial or not, but I know that the Java guys didn't have it initially but ended up adding it later, which would imply that they ran into problems that made them decide that it should be there. I'd certainly be inclined to think that it's better to have it, and it's probably easier to add it before it's merged than later. But I don't know how crucial it is. - Jonathan M Davis

I think this versioning idea is more important for Protocol Buffers, msgpack, Thrift-like libraries that use a separate IDL schema and IDL-compiled code. std.serialization uses the D code itself to serialize, so the version is practically dictated by the user. It may as well be manually handled, as long as it throws/returns an error and doesn't crash if one tries to deserialize an archive type into a different/modified D type.

From memory, the Protocol Buffers versioning is to ensure schema-generated code and library are compatible. You get compile errors if you try to compile IDL-generated code with a newer version of the library. Similarly, you get runtime errors if you deserialize data that was serialized with an older version of the library. This is all from memory so I could be wrong...

Orange seems/feels more like Boost.Serialization to me, but much better. It's D for a start and allows custom archive types. In Boost, the library stores a version number in the archive for each class serialized. This number defaults to 0 but can be set by the user via a #define. http://www.boost.org/doc/libs/1_54_0/libs/serialization/doc/tutorial.html#versioning

I think adding it later can be done without breaking the existing API, if it is deemed necessary. It just needs to default to 0 or something similar when missing from an archive, and ensure it won't clash with any fields in existing archives.
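To make the "default to 0 when missing" idea concrete, here is a minimal sketch. It models an archive as a plain key/value table, which is an assumption for illustration only; the key name `__version` and the function `archivedVersion` are hypothetical and not part of std.serialization.

```d
import std.conv : to;

/// Hypothetical: reads a version entry from an archive modelled as
/// key/value text pairs, falling back to 0 for archives written
/// before versioning existed.
int archivedVersion(string[string] archive)
{
    // `get` returns the second argument when the key is absent.
    return archive.get("__version", "0").to!int;
}

void main()
{
    string[string] oldArchive = ["a": "3"];
    string[string] newArchive = ["a": "3", "__version": "2"];

    assert(archivedVersion(oldArchive) == 0);
    assert(archivedVersion(newArchive) == 2);
}
```

A reserved key such as the one above would also address the "won't clash with any fields" concern, provided user field names cannot take that form.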
Re: std.serialization: pre-voting review / discussion
On 2013-08-19 15:47, Dicebot wrote: Ok, I'll investigate related part of package a bit more in details during this week and see if I can suggest something.

What I have now is something like this:

    auto foo = new Foo;
    foo.a = 3;
    auto archive = new XmlArchive!(string); // string is the range type
    auto serializer = new Serializer(archive);
    serializer.serialize(foo);
    auto data = archive.data; // returns a range, typed as XmlArchiveData

The problem now is that the range type is string, so I can't set the data using any other range type:

    archive.data = data;

Results in:

    Error: cannot implicitly convert expression (range) of type XmlArchiveData to string

How can I handle that?

-- /Jacob Carlborg
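One possible way around the conversion error, sketched here under assumptions: instead of fixing the archive to a single range type, the setter itself can be a template that accepts any input range with compatible elements. `DemoArchive` and `setData` are hypothetical stand-ins, not the real XmlArchive API, and bytes are used instead of characters to keep the example short.

```d
import std.range : retro;
import std.range.primitives : ElementType, isInputRange;

/// Hypothetical stand-in for an archive; not the real XmlArchive.
final class DemoArchive
{
    private ubyte[] buffer;

    /// Setter templated on the source range rather than tied to one type.
    void setData(R)(R source)
    if (isInputRange!R && is(ElementType!R : ubyte))
    {
        buffer = null;
        foreach (e; source)
            buffer ~= e;
    }

    /// Returns the stored bytes as a range (here simply a slice).
    const(ubyte)[] data() const
    {
        return buffer;
    }
}

void main()
{
    auto archive = new DemoArchive;
    ubyte[] raw = [1, 2, 3];
    ubyte[] reversed = [3, 2, 1];

    archive.setData(raw);        // a plain array
    assert(archive.data == raw);

    archive.setData(raw.retro);  // a lazy range of a different type
    assert(archive.data == reversed);
}
```

The trade-off is that a templated member cannot be virtual, so this would not fit directly into a class-based Archive interface without some wrapping.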
Re: std.serialization: pre-voting review / discussion
On Monday, 19 August 2013 at 13:17:48 UTC, ilya-stromberg wrote: On Monday, 19 August 2013 at 12:49:48 UTC, Jacob Carlborg wrote: I have had a brief look at Protocol Buffers and I don't see why it wouldn't work as an archive. I would probably need to implement a Protocol Buffers archive type to see what the limitations of std.serialization and Protocol Buffers are. You can find the Protocol Buffers library here, may be it helps: https://256.makerslocal.org/wiki/index.php/ProtocolBuffer Code has moved to https://github.com/opticron/ProtocolBuffer
Re: std.serialization: pre-voting review / discussion
On Monday, 19 August 2013 at 12:49:48 UTC, Jacob Carlborg wrote: I have had a brief look at Protocol Buffers and I don't see why it wouldn't work as an archive. I would probably need to implement a Protocol Buffers archive type to see what the limitations of std.serialization and Protocol Buffers are.

I'm not familiar with the interaction of Archive and Serializer. I was overwhelmed by the number of functions I'd have to implement (or in my case ignore) and ultimately I didn't know what my serialized data would look like. I think it is possible to output a binary format which uses the same translation as Protocol Buffers, but I wouldn't expect it to resemble a message.

Thrift and Protocol Buffers use code generation to create the language data type, and at least for Protocol Buffers a method contains all the logic for deserializing a collection of bytes, and one for serializing. I'm not seeing how std.serialization would make this easier or more usable. If a Thrift or Protocol Buffers archive would be used with std.serialization I'm thinking that one would skip that step and have the data types defined directly in D.

I'll see if I can push my way through creating an Archive type.
Re: std.serialization: pre-voting review / discussion
On 2013-08-19 17:40, Jesse Phillips wrote: I'm not familiar with the interaction of Archive and Serializer. I was overwhelmed by the number of functions I'd have to implement (or in my case ignore) and ultimately I didn't know what my serialized data would look like.

std.serialization basically supports any type in D (except for delegates and function pointers). If a particular method doesn't make sense to implement for a given archive, just implement a dummy function to satisfy the interface. The documentation for Archive says so: "When implementing a new archive type, if any of these methods do not make sense for that particular implementation just implement an empty method and return T.init, if the method returns a value." If something breaks due to this please let me know.

I think it is possible to output a binary format which uses the same translation as Protocol Buffers, but I wouldn't expect it to resemble a message.

In the binary archive I'm working on I have chosen to ignore some parts of the implicit contract between the serializer and the archive. For example, I'm not planning to support slices, pointers to fields and similar complex features.

-- /Jacob Carlborg
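A small sketch of the "empty method, return T.init" advice follows. The interface and its three methods are invented for illustration and are much smaller than the real Archive interface; only std.bitmanip.nativeToLittleEndian is an actual Phobos function.

```d
import std.bitmanip : nativeToLittleEndian;

/// Hypothetical, heavily reduced archive interface.
interface DemoArchive
{
    void archiveInt(int value, string key);
    void archiveSlice(size_t offset, size_t length, string key);
    size_t unarchiveSliceOffset(string key);
}

/// Hypothetical binary archive that supports integers but not slices.
final class DemoBinaryArchive : DemoArchive
{
    ubyte[] bytes;

    void archiveInt(int value, string key)
    {
        // The key is ignored: this format relies on field order.
        ubyte[4] raw = nativeToLittleEndian(value);
        bytes ~= raw[];
    }

    // Not meaningful for this format: empty body.
    void archiveSlice(size_t offset, size_t length, string key) {}

    // Not meaningful for this format: return T.init.
    size_t unarchiveSliceOffset(string key)
    {
        return size_t.init;
    }
}

void main()
{
    auto archive = new DemoBinaryArchive;
    archive.archiveInt(1, "a");
    archive.archiveSlice(0, 2, "s"); // silently ignored

    ubyte[] expected = [1, 0, 0, 0];
    assert(archive.bytes == expected);
    assert(archive.unarchiveSliceOffset("s") == 0);
}
```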
Re: std.serialization: pre-voting review / discussion
On 2013-08-19 15:03, Dicebot wrote: This thread: http://forum.dlang.org/post/xqklcesoguxujifij...@forum.dlang.org I have removed all uses of mixin annotations. -- /Jacob Carlborg
Re: std.serialization: pre-voting review / discussion
On 2013-08-19 17:41, Jesse Phillips wrote: Code has moved to https://github.com/opticron/ProtocolBuffer Does it have any utility functions that are fairly standalone to handle the basic types, i.e. int, string, float and so on? -- /Jacob Carlborg
Re: std.serialization: pre-voting review / discussion
On Mon, 19 Aug 2013 16:21:44 +0200, Tyler Jameson Little beatgam...@gmail.com wrote: On Monday, 19 August 2013 at 13:31:27 UTC, Jacob Carlborg wrote: On 2013-08-19 15:03, Dicebot wrote: Great! Are there any difficulties with the input? It just that I don't clearly know how the code will need to look like, and I'm not particular familiar with implementing range based code. Maybe we need some kind of doc explaining the idiomatic usage of ranges? Personally, I'd like to do something like this:

    auto archive = new XmlArchive!(char); // create an XML archive
    auto serializer = new Serializer(archive); // create the serializer
    serializer.serialize(foo);
    pipe(archive.out, someFile);

Your pipe function is the same as std.algorithm.copy(InputRange, OutputRange) or std.range.put(OutputRange, InputRange);

An important question regarding ranges for std.serialization is whether we want it to work as an InputRange or if it should _take_ an OutputRange. So the question is:

    auto archive = new Archive();
    Serializer(archive).serialize(object);
    // Archive takes OutputRange, writes to it
    archive.writeTo(OutputRange);

vs

    auto archive = new Archive();
    Serializer(archive).serialize(object);
    // Archive implements InputRange for ubyte[]
    foreach(ubyte[] data; archive) {}

I'd use the first approach as it should be simpler to implement. The second approach would be useful if the ubyte[] elements were processed via other ranges (map, take, ...). But as binary data is usually not processed in this way but just stored to disk or sent over network (basically streaming operations) the first approach should be fine.

The first approach has the additional benefit that we can easily do streaming like this:

    auto archive = new Archive(OutputRange);
    // Immediately write the data to the output range
    Serializer(archive).serialize([1,2,3]);

This is difficult to implement with the second approach as you somehow have to interleave calls to serialize and reads to the InputRange interface:

    Serializer(archive).serialize(1);
    foreach(data; archive) {stdout.write(data);}
    Serializer(archive).serialize(2);
    foreach(data; archive) {stdout.write(data);}

And it's still less efficient than approach 1 as it has to keep an internal buffer.

Another point is that serialize in the above example could be renamed to put. This way Serializer would itself be an OutputRange which allows stuff like

    [1,2,3,4,5].stride(2).take(2).copy(archive);

Then serialize could also accept InputRanges to allow this:

    archive.serialize([1,2,3,4,5].stride(2).take(2));

However, this use case is already covered by using copy so it would just be for convenience.
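The "serializer as a sink" variant can be sketched as follows. This is only an illustration under assumptions: `DemoSerializer` and its text format (each integer followed by a semicolon) are made up, and the real Serializer works through an archive rather than writing directly. What the sketch does show is that a type with a suitable `put` member satisfies the output range interface, so it can be the target of std.algorithm.copy.

```d
import std.algorithm : copy;
import std.array : appender;
import std.conv : to;
import std.range : stride, take;
import std.range.primitives : sinkPut = put;

/// Hypothetical serializer that forwards everything to an output range
/// immediately, so no internal buffer is kept.
struct DemoSerializer(Sink)
{
    private Sink* sink;

    this(ref Sink target)
    {
        sink = &target;
    }

    /// Having `put` makes the serializer itself an output range of int.
    void put(int value)
    {
        sinkPut(*sink, value.to!string);
        sinkPut(*sink, ';');
    }
}

/// Convenience constructor that infers the sink type.
auto demoSerializer(Sink)(ref Sink sink)
{
    return DemoSerializer!Sink(sink);
}

void main()
{
    auto output = appender!string();
    auto serializer = demoSerializer(output);

    serializer.put(42);                                  // direct call
    [1, 2, 3, 4, 5].stride(2).take(2).copy(serializer);  // via range algorithms

    assert(output.data == "42;1;3;");
}
```

The serializer holds a pointer to the sink, so the sink has to outlive it; that is fine in this example but would need more thought in a real design.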
Re: std.serialization: pre-voting review / discussion
On 19-Aug-2013 22:05, Johannes Pfau wrote: On Mon, 19 Aug 2013 16:21:44 +0200, Tyler Jameson Little beatgam...@gmail.com wrote: Another point is that serialize in the above example could be renamed to put. This way Serializer would itself be an OutputRange which allows stuff like [1,2,3,4,5].stride(2).take(2).copy(archive);

+1 I totally expect serializer to be a sink.

Then serialize could also accept InputRanges to allow this: archive.serialize([1,2,3,4,5].stride(2).take(2)); However, this use case is already covered by using copy so it would just be for convenience.

-- Dmitry Olshansky
Re: std.serialization: pre-voting review / discussion
On Monday, 19 August 2013 at 14:47:15 UTC, bsd wrote: I think this versioning idea is more important for Protocol Buffers, msgpack, Thrift-like libraries that use a separate IDL schema and IDL-compiled code. std.serialization uses the D code itself to serialize, so the version is practically dictated by the user. It may as well be manually handled, as long as it throws/returns an error and doesn't crash if one tries to deserialize an archive type into a different/modified D type. From memory, the Protocol Buffers versioning is to ensure schema-generated code and library are compatible. You get compile errors if you try to compile IDL-generated code with a newer version of the library. Similarly, you get runtime errors if you deserialize data that was serialized with an older version of the library. This is all from memory so I could be wrong...

Seems like your memory has indeed faded a bit. ;) Versioning is an integral idea of formats like Protobuf and Thrift. For example, see the "A bit of history" section right on the doc overview page. [1] You might also want to read through the (rather dated) Thrift whitepaper to get an idea about what the design constraints for it were. [2]

The main point is that when you have deployed services at the scale Google or Facebook work with, you can't just upgrade all involved parties simultaneously on a schema change. So, having to support multiple versions running alongside each other is pretty much a given, and the best way to deal with that is to build it right into your protocols.

David

[1] https://developers.google.com/protocol-buffers/docs/overview
[2] http://thrift.apache.org/static/files/thrift-20070401.pdf
Re: std.serialization: pre-voting review / discussion
On Monday, 19 August 2013 at 19:47:32 UTC, David Nadlinger wrote: Versioning is an integral idea of formats like Protobuf and Thrift. For example, see the A bit of history section right on the doc overview page. [1] You might also want to read through the (rather dated) Thrift whitepaper to get an idea about what the design constraints for it were. [2] By the way, to be honest, this is also the main point that makes me feel uneasy about including Orbit in Phobos at this point: Sure, it has been around for some time, but as far as I can tell, not a lot of people are using it right now, and what seems to be entirely missing from the docs is a clear design rationale, outlining its goals and explaining how Orbit compares to well-known existing solutions. It seems to me that a large part of the discussion in this thread can be attributed to that fact, i.e. a lack of understanding/agreement what the module is supposed to be in the first place. David
Re: std.serialization: pre-voting review / discussion
Seems like your memory has indeed faded a bit. ;) Versioning is an integral idea of formats like Protobuf and Thrift. For example, see the A bit of history section right on the doc overview page. [1] You might also want to read through the (rather dated) Thrift whitepaper to get an idea about what the design constraints for it were. [2] The main point is that when you have deployed services at the scale Google or Facebook work with, you can't just upgrade all involved parties simultaneously on a schema change. So, having to support multiple versions running along each other is pretty much a given, and the best way to deal with that is to build it right into your protocols. David [1] https://developers.google.com/protocol-buffers/docs/overview [2] http://thrift.apache.org/static/files/thrift-20070401.pdf Getting old! :-) Thanks for the heads up.
Re: std.serialization: pre-voting review / discussion
On Monday, 19 August 2013 at 18:06:00 UTC, Johannes Pfau wrote: On Mon, 19 Aug 2013 16:21:44 +0200, Tyler Jameson Little beatgam...@gmail.com wrote: On Monday, 19 August 2013 at 13:31:27 UTC, Jacob Carlborg wrote: On 2013-08-19 15:03, Dicebot wrote: Great! Are there any difficulties with the input? It just that I don't clearly know how the code will need to look like, and I'm not particular familiar with implementing range based code. Maybe we need some kind of doc explaining the idiomatic usage of ranges? Personally, I'd like to do something like this:

    auto archive = new XmlArchive!(char); // create an XML archive
    auto serializer = new Serializer(archive); // create the serializer
    serializer.serialize(foo);
    pipe(archive.out, someFile);

Your pipe function is the same as std.algorithm.copy(InputRange, OutputRange) or std.range.put(OutputRange, InputRange);

Right, for some reason I couldn't find it... Moot point though.

An important question regarding ranges for std.serialization is whether we want it to work as an InputRange or if it should _take_ an OutputRange. So the question is:

    auto archive = new Archive();
    Serializer(archive).serialize(object);
    // Archive takes OutputRange, writes to it
    archive.writeTo(OutputRange);

vs

    auto archive = new Archive();
    Serializer(archive).serialize(object);
    // Archive implements InputRange for ubyte[]
    foreach(ubyte[] data; archive) {}

I'd use the first approach as it should be simpler to implement. The second approach would be useful if the ubyte[] elements were processed via other ranges (map, take, ...). But as binary data is usually not processed in this way but just stored to disk or sent over network (basically streaming operations) the first approach should be fine.

+1 for the first way.

The first approach has the additional benefit that we can easily do streaming like this:

    auto archive = new Archive(OutputRange);
    // Immediately write the data to the output range
    Serializer(archive).serialize([1,2,3]);

This can make a nice one-liner for the general case:

    Serializer(new Archive(OutputRange)).serialize(...);

Another point is that serialize in the above example could be renamed to put. This way Serializer would itself be an OutputRange which allows stuff like

    [1,2,3,4,5].stride(2).take(2).copy(archive);

Then serialize could also accept InputRanges to allow this:

    archive.serialize([1,2,3,4,5].stride(2).take(2));

However, this use case is already covered by using copy so it would just be for convenience.

This is nice, but I think I like serialize() better. I also don't think serializing a range is its primary purpose, so it doesn't make a lot of sense to optimize for the uncommon case.
Re: std.serialization: pre-voting review / discussion
On Saturday, 17 August 2013 at 08:29:37 UTC, glycerine wrote: On Wednesday, 14 August 2013 at 13:43:50 UTC, Dicebot wrote: On Wednesday, 14 August 2013 at 13:28:42 UTC, glycerine wrote: Wishful thinking aside, they are competitors. They are not. `std.serialization` does not and should not compete in Thrift domain. Huh? Do you know what thrift does? Summary: Everything that Orange/std.serialization does and more. That's actually not true. Thrift does not serialize arbitrary object graphs, or any types with indirections, for that matter. This is by design, it would be hard to do this efficiently in all target languages, and contrary to Orange, performance is the main focus of Thrift. If you are going to standardize something, standardize the Thrift bindings so that the compiler doesn't introduce regressions that break them, like happened from dmd 2.062 - present. On a related note, we desperately need to do something about this, especially since there seems to be an increased amount of interest in Thrift lately. For 2.061 and the previous releases, I always tested every beta against Thrift, and almost invariably found at least one bug/regression per release. However, for 2.062 and 2.063, I was busy with LDC (and other things) at the time and it seems like I forgot to run the tests. The DMD 2.062+ error message (see https://issues.apache.org/jira/browse/THRIFT-2130) doesn't make much sense; I guess the best way of going about this would be to try to DustMite-reduce the problem first or to fire up DMD in gdb to see what exactly is tripping the recursive alias error. David
Re: std.serialization: pre-voting review / discussion
On Saturday, 17 August 2013 at 11:20:17 UTC, Dicebot wrote: 1) Having bindings in standard library is discouraged, we have Deimos for that. There is only curl stuff and it is considered a bad solution as far as I am aware of. The D implementation of Thrift is actually not a binding and does not necessarily rely on the Thrift code generator either – all the latter does is to generate a D struct definition for the types/method parameters in your .thrift file that is then handled at D compile-time via reflection. In fact, this even works the other way, allowing you to generate .thrift IDL files for existing D types. (And yes, in theory the code generator could be replaced by ImportExpressions and a CTFE parser.) David
Re: std.serialization: pre-voting review / discussion
On Saturday, 17 August 2013 at 10:15:34 UTC, BS wrote: I'd rather that was left for a separate module (or two or three) built on top of std.serialization. In an ideal world, Thrift could maybe be built on std.serialization, but in the current form that's not true (regardless of e.g. versioning, Orange is likely not fast enough), and I am not sure whether this is a desirable goal in the first place anyway. David
Re: std.serialization: pre-voting review / discussion
On Wednesday, 14 August 2013 at 16:25:21 UTC, Andrei Alexandrescu wrote: On 8/14/13 1:48 AM, Jacob Carlborg wrote: On 2013-08-14 10:19, Tyler Jameson Little wrote: - I would to serialize to a range (file?) and deserialize from a range (file?) The serialized data is returned as an array, so that is compatible with the range interface, it just won't be lazy. This seems like a major limitation. (Disclaimer: I haven't read the documentation yet.) Andrei Shall we fix it before accept the std.serialization? For example, if I have 10GB of data and 16GB operating memory, I can't use std.serialization. It saves all my data into string into operating memory, so I haven't got enough memory to save data in file. It's currently limited by std.xml. In other hand, std.serialization can help in many other cases if I have enough memory to store copy of my data. As I can see, we have a few options: - accept std.serialization as is. If users can't use std.serialization due memory limitation, they should find another way. - hold std.serialization until we will have new std.xml module with support of range/file input/output. Users should use Orange if they need std.serialization right now. - hold std.serialization until we will have binary archive for serialization with support of range/file input/output. Users should use Orange if they need std.serialization right now. - use another xml library, for example from Tango. Ideas?
Re: std.serialization: pre-voting review / discussion
ilya-stromberg wrote: The serialized data is returned as an array, so that is compatible with the range interface, it just won't be lazy. This seems like a major limitation. (Disclaimer: I haven't read the documentation yet.) Andrei Shall we fix it before accept the std.serialization? For example, if I have 10GB of data and 16GB operating memory, I can't use std.serialization. It saves all my data into string into operating memory, so I haven't got enough memory to save data in file. It's currently limited by std.xml. In other hand, std.serialization can help in many other cases if I have enough memory to store copy of my data. As I can see, we have a few options: - accept std.serialization as is. If users can't use std.serialization due memory limitation, they should find another way. - hold std.serialization until we will have new std.xml module with support of range/file input/output. Users should use Orange if they need std.serialization right now. - hold std.serialization until we will have binary archive for serialization with support of range/file input/output. Users should use Orange if they need std.serialization right now. - use another xml library, for example from Tango. My opinion is - accept it as it is (if it's not completely broken). I recently needed some way to serialize a data structure (in order by save the state of the app and restore it later) and was quite disappointed there is nothing like that in Phobos. Although XML is not necessarily well suited to my particular use case, it's still better than nothing. Binary archive would be a great plus, but allow me to point out that current state of affairs (std.serialization being in a pre-accepted state for a long time AFAIK) is probably the worst state we might have - on the one hand I would not use third party libs, because std.serialization is just around the corner, on the other I don't have std.serialization distributed with the compiler yet. 
Also binary archive is an extension, not a change, so I don't see any reason why it could not be added later (because it would be backward compatible). -- Marek Janukowicz
Re: std.serialization: pre-voting review / discussion
On 8/18/13, Marek Janukowicz ma...@janukowicz.net wrote: I recently needed some way to serialize a data structure (in order by save the state of the app and restore it later) and was quite disappointed there is nothing like that in Phobos. FWIW you could try out msgpack-d: https://github.com/msgpack/msgpack-d#usage It's a very tiny and a fast library.
Re: std.serialization: pre-voting review / discussion
On Sunday, 18 August 2013 at 08:38:53 UTC, ilya-stromberg wrote: As I can see, we have a few options: - accept std.serialization as is. If users can't use std.serialization due memory limitation, they should find another way. - hold std.serialization until we will have new std.xml module with support of range/file input/output. Users should use Orange if they need std.serialization right now. - hold std.serialization until we will have binary archive for serialization with support of range/file input/output. Users should use Orange if they need std.serialization right now. - use another xml library, for example from Tango. Ideas? We should add a suitable range interface, even if it makes no sense with current std.xml and include std.serialization now. For many use cases it will be sufficient and the improvements can come when std.xml2 comes. Holding back std.serialization will only mean that we won't see any new backend from users and would be quite unfair to Jacob and may keep off other contributors.
Re: std.serialization: pre-voting review / discussion
On 8/18/13, David Nadlinger c...@klickverbot.at wrote: On Saturday, 17 August 2013 at 08:29:37 UTC, glycerine wrote: If you are going to standardize something, standardize the Thrift bindings so that the compiler doesn't introduce regressions that break them, like happened from dmd 2.062 - present. On a related note, we desperately need to do something about this, especially since there seems to be an increased amount of interest in Thrift lately. For 2.061 and the previous releases, I always tested every beta against Thrift, and almost invariably found at least one bug/regression per release. However, for 2.062 and 2.063, I was busy with LDC (and other things) at the time and it seems like I forgot to run the tests. I think it would be good if we added Thrift and other test-cases, for example from the D Templates Book, to the test machines. But since there's a lot of code maybe the test machines should run the tests sporadically (e.g. after every #N new commits), otherwise pull requests would take forever to test. Alternatively we could at least try to test these major projects with release candidates Normally the project maintainers would do this themselves, but it's easy to run out of time or just to forget to test things, and then it's too late (well we have fixup DMD releases now so it's not too bad).
Re: std.serialization: pre-voting review / discussion
On Sunday, 18 August 2013 at 14:24:38 UTC, Tobias Pankrath wrote: On Sunday, 18 August 2013 at 08:38:53 UTC, ilya-stromberg wrote: As I can see, we have a few options: - accept std.serialization as is. If users can't use std.serialization due memory limitation, they should find another way. - hold std.serialization until we will have new std.xml module with support of range/file input/output. Users should use Orange if they need std.serialization right now. - hold std.serialization until we will have binary archive for serialization with support of range/file input/output. Users should use Orange if they need std.serialization right now. - use another xml library, for example from Tango. Ideas? We should add a suitable range interface, even if it makes no sense with current std.xml and include std.serialization now. For many use cases it will be sufficient and the improvements can come when std.xml2 comes. Holding back std.serialization will only mean that we won't see any new backend from users and would be quite unfair to Jacob and may keep off other contributors. I completely agree. I'm the one that brought it up, and I mostly brought it up so the API doesn't have to change once std.xml is fixed. I don't think changing the return type to a range will be too difficult or memory expensive. Also, since slices *are* ranges, shouldn't this just work?
Re: std.serialization: pre-voting review / discussion
On Sunday, 18 August 2013 at 14:52:04 UTC, Andrej Mitrovic wrote: Normally the project maintainers would do this themselves, but it's easy to run out of time or just to forget to test things, and then it's too late (well we have fixup DMD releases now so it's not too bad). The big problem with this right now is that quite frequently, you run the tests and discover one regression in the beta, file it, fix it (or wait for it to get fixed), then run the tests again, discover that they still don't pass, etc. This is not only an annoying and time-intensive job for the maintainer of the project (as during beta you have to pretty much always be on your toes for a new version to test lest Walter decide to make the final release), but this also increases beta duration. One obvious reaction to this (as a project maintainer) would be to continuously track Git master and report regressions as they arise. However, this is also not always practical, as quite often, there is a regression/backwards-incompatible change early on in the development process that is not fixed until much later, so that multiple issues can still pile up unnoticed. Having a system that regularly, automatically runs the test suites of several larger, well-known D projects with the results being readily available to the DMD/druntime/Phobos teams would certainly help. But it's also not ideal, since if a project starts to fail, the exact nature of the issue (regression in DMD or bug in the project, and if the former, a minimal test case) can often be hard to track down for somebody not already familiar with the code base. David
Re: std.serialization: pre-voting review / discussion
On Sunday, 18 August 2013 at 16:33:51 UTC, David Nadlinger wrote: ... Please, don't move too far from review topic ;) It is a separate issue to discuss.
Re: std.serialization: pre-voting review / discussion
On Monday, 12 August 2013 at 13:27:45 UTC, Dicebot wrote: Stepping up to act as a Review Manager for Jacob Carlborg std.serialization Input Code: https://github.com/jacob-carlborg/phobos/tree/serialization Documentation: https://dl.dropboxusercontent.com/u/18386187/docs/std.serialization/index.html Previous review thread: http://forum.dlang.org/thread/adyanbsdsxsfdpvoo...@forum.dlang.org Changes since last review - Sources has been integrated into Phobos source tree - DDOC documentation has been provided in a form it should look like on dlang.org - Most utility functions/template code depends on have been inlined. Remaining `package` utility modules: * std.serialization.archives.xmldocument * std.serialization.attribute * std.serialization.registerwrapper Information for reviewers Goal of this thread is to detect if there are any outstanding issues that need to fixed before formal yes/no voting happens. If no critical objections will arise, voting will begin starting with a next week. Please take this seriously: If you identify problems along the way, please note if they are minor, serious, or showstoppers. (http://wiki.dlang.org/Review/Process). This information later will be used to determine if library is ready for voting. If there are any frequent Phobos contributors / core developers please pay extra attention to submission code style and fitting into overall Phobos guidelines and structure. - Let the thread begin. Jacob, it is probably worth creating a pull request with latest rebased version of your proposal to simplify getting a quick overview of changes. Also please tell if there is anything you want/need to implement before merging. OK, time to make a short summary. There have been mentioned several issues / improvement possibilities. I don't think they prevent voting and it is up to Jacob to decide what he want to incorporate from it. 
However, there are two things that do matter in my opinion: the pre-UDA part of the API and the uncertainty about a range-based lazy approach. The important thing here is that while the library can be included with plenty of features lacking, we can't really afford to break its API only a few releases later just to add/remove these features. So as a review manager, I think voting should be delayed until the API is ready to address the lazy range-based work model. No actual implementation is required, but 1) it should be possible to do it later without breaking user code 2) the library should not make an assumption about the implementation being lazy or eager. That is my understanding based on current knowledge of Phobos modules; please correct me if I am wrong. Jacob, please tell me if you have any objections or, if this decision sounds reasonable, just contact me via e-mail when you find std.serialization suitable for final voting. I think it is pretty clear that the package itself is considered useful and welcome in Phobos.
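To illustrate the second requirement, here is a minimal sketch of an interface that leaves the lazy/eager choice to the caller. It is not the std.serialization API: `serializeLazily` and the comma-separated "archive" format are hypothetical names invented for this example, which uses only standard Phobos range functions.

```d
import std.algorithm : equal, joiner, map;
import std.array : array;
import std.conv : to;
import std.range : take;

// Hypothetical function, not part of std.serialization: returns a lazy
// range over the characters of a comma-separated "archive".
auto serializeLazily(const(int)[] values)
{
    return values.map!(v => v.to!string).joiner(",");
}

void main()
{
    // Lazy use: consume just the first three characters.
    assert(serializeLazily([10, 20, 30]).take(3).equal("10,"));

    // Eager use: the same call, materialized into an array in one go.
    dchar[] all = serializeLazily([10, 20, 30]).array;
    assert(all == "10,20,30"d);
}
```

Because the function only returns a range, an eager implementation could later be swapped in behind the same signature, as long as user code relies on nothing beyond the range interface.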
Re: std.serialization: pre-voting review / discussion
On 2013-08-18 10:38, ilya-stromberg wrote: - use another xml library, for example from Tango. The XML module from Tango expects the content to be in memory as well, at least the Document module. -- /Jacob Carlborg
Re: std.serialization: pre-voting review / discussion
On 2013-08-17 10:29, glycerine wrote: Huh? Do you know what thrift does? Summary: Everything that Orange/std.serialization does and more. To the point: Thrift provides data versioning, std.serialization does not. In my book: end of story, game over. Thrift is the preferred choice. If you are going to standardize something, standardize the Thrift bindings so that the compiler doesn't introduce regressions that break them, like happened from dmd 2.062 - present. Orange/std.serialization is capable of serializing more types than Thrift is. For example, it will correctly serialize and deserialize slices, pointers and so on. It's easy to implement versioning yourself, something like: class Foo { int version_; int a; int b; void toData (Serializer serializer, Serializer.Data key) { serializer.serialize(a, "a"); serializer.serialize(version_, "version_"); if (version_ == 2) serializer.serialize(b, "b"); } // Do the corresponding in fromData. } If versioning is crucial it can be added. -- /Jacob Carlborg
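The hand-rolled versioning idea above can be sketched in a self-contained form. This is only an illustration under stated assumptions: the `Archive` alias (a plain string-to-string map) is a hypothetical stand-in for the library's `Serializer`, and the `toData`/`fromData` signatures here are simplified, not the real Orange/std.serialization ones.

```d
import std.conv : to;

// Hypothetical stand-in for an archive: a flat key/value store.
alias Archive = string[string];

struct Foo
{
    int version_ = 2;
    int a;
    int b; // field introduced in version 2

    // Write the version first-class, then only the fields that version has.
    void toData(ref Archive archive) const
    {
        archive["version_"] = version_.to!string;
        archive["a"] = a.to!string;
        if (version_ >= 2)
            archive["b"] = b.to!string;
    }

    // Read the version, then decide which fields to expect.
    void fromData(Archive archive)
    {
        version_ = archive["version_"].to!int;
        a = archive["a"].to!int;
        // Version 1 archives carry no "b"; fall back to a default.
        b = version_ >= 2 ? archive["b"].to!int : 0;
    }
}

void main()
{
    // Data written by an older version of the program.
    Archive oldData = ["version_": "1", "a": "7"];
    Foo foo;
    foo.fromData(oldData);
    assert(foo.version_ == 1 && foo.a == 7 && foo.b == 0);

    // Round trip with the current version.
    Archive fresh;
    Foo(2, 3, 4).toData(fresh);
    Foo back;
    back.fromData(fresh);
    assert(back == Foo(2, 3, 4));
}
```

The key point is that the version number is stored with the data, so the reading side can branch on it instead of assuming the current field layout.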
Re: std.serialization: pre-voting review / discussion
Andrej Mitrovic wrote: I recently needed some way to serialize a data structure (in order to save the state of the app and restore it later) and was quite disappointed there is nothing like that in Phobos. FWIW you could try out msgpack-d: https://github.com/msgpack/msgpack-d#usage It's a very tiny and fast library. That's what I ended up using, but I would be much happier to have something like this in Phobos. -- Marek Janukowicz
Re: std.serialization: pre-voting review / discussion
On Sunday, August 18, 2013 21:45:59 Jacob Carlborg wrote: If versioning is crucial it can be added. I don't know if it's crucial or not, but I know that the Java guys didn't have it initially but ended up adding it later, which would imply that they ran into problems that made them decide that it should be there. I'd certainly be inclined to think that it's better to have it, and it's probably easier to add it before it's merged than later. But I don't know how crucial it is. - Jonathan M Davis
Re: std.serialization: pre-voting review / discussion
On 8/18/2013 11:26 AM, Dicebot wrote: So as a review manager, I think voting should be delayed until the API is ready to address the lazy range-based work model. I agree. Ranges are a very big deal for D, and libraries that can conceivably support them must do so.
Re: std.serialization: pre-voting review / discussion
On Wednesday, 14 August 2013 at 13:43:50 UTC, Dicebot wrote: On Wednesday, 14 August 2013 at 13:28:42 UTC, glycerine wrote: Wishful thinking aside, they are competitors. They are not. `std.serialization` does not and should not compete in Thrift domain. Huh? Do you know what thrift does? Summary: Everything that Orange/std.serialization does and more. To the point: Thrift provides data versioning, std.serialization does not. In my book: end of story, game over. Thrift is the preferred choice. If you are going to standardize something, standardize the Thrift bindings so that the compiler doesn't introduce regressions that break them, like happened from dmd 2.062 - present. You don't provide any rationale for your assertion, so I can't really respond more constructively until you do. Please familiarize yourself with D's Thrift bindings, which work well with dmd 2.061. Then provide a rationale for your conjecture.