Re: RFC: std.json successor

2014-08-21 Thread Brian Schott via Digitalmars-d

On Thursday, 21 August 2014 at 22:35:18 UTC, Sönke Ludwig wrote:

Destroy away! ;)


source/stdx/data/json/lexer.d(263:8)[warn]: 'JSONToken' has 
method 'opEquals', but not 'toHash'.
source/stdx/data/json/lexer.d(499:65)[warn]: Use parenthesis to 
clarify this expression.
source/stdx/data/json/parser.d(516:8)[warn]: 'JSONParserNode' has 
method 'opEquals', but not 'toHash'.
source/stdx/data/json/value.d(95:10)[warn]: Variable c is never 
used.
source/stdx/data/json/value.d(99:10)[warn]: Variable d is never 
used.
source/stdx/data/json/package.d(942:14)[warn]: Variable val is 
never used.


It's likely that you can ignore these, but I thought I'd post 
them anyways. (The last three are in unittest blocks, for 
example.)
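
The opEquals/toHash warnings follow from D's rule that equal values must hash equally (associative arrays hash first, then compare). A minimal self-contained sketch of the fix; the `text` field is hypothetical and not the module's actual layout:

```d
struct JSONToken
{
    string text;

    bool opEquals(in JSONToken other) const @safe pure nothrow
    {
        return text == other.text;
    }

    // Equal tokens must hash equally, hence the paired toHash
    // that D-Scanner asks for:
    size_t toHash() const @safe pure nothrow
    {
        size_t h = 0;
        foreach (c; text)
            h = h * 31 + c;
        return h;
    }
}

void main()
{
    assert(JSONToken("42") == JSONToken("42"));
    assert(JSONToken("42").toHash() == JSONToken("42").toHash());
}
```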


Re: RFC: std.json successor

2014-08-21 Thread Justin Whear via Digitalmars-d
Someone needs to make a "showbrianmycode" bot: mention a D github repo 
and it runs static analysis for you.


Re: RFC: std.json successor

2014-08-21 Thread Idan Arye via Digitalmars-d

On Thursday, 21 August 2014 at 23:27:28 UTC, Justin Whear wrote:
Someone needs to make a "showbrianmycode" bot: mention a D 
github repo

and it runs static analysis for you.


Why bother with mentioning a GitHub repo? Just make the bot 
periodically scan the DUB registry.


Re: RFC: std.json successor

2014-08-21 Thread Brian Schott via Digitalmars-d

On Thursday, 21 August 2014 at 23:33:35 UTC, Idan Arye wrote:
Why bother with mentioning a GitHub repo? Just make the bot 
periodically scan the DUB registry.


It's kind of picky. http://i.imgur.com/SHNAWnH.png


Re: RFC: std.json successor

2014-08-21 Thread Ary Borenszweig via Digitalmars-d

On 8/21/14, 7:35 PM, Sönke Ludwig wrote:

Following up on the recent "std.jgrandson" thread [1], I've picked up
the work (a lot earlier than anticipated) and finished a first version
of a loose blend of said std.jgrandson, vibe.data.json and some changes
that I had planned for vibe.data.json for a while. I'm quite pleased by
the results so far, although without a serialization framework it still
misses a very important building block.

Code: https://github.com/s-ludwig/std_data_json
Docs: http://s-ludwig.github.io/std_data_json/
DUB: http://code.dlang.org/packages/std_data_json


Say I have a class Person with name (string) and age (int) with a 
constructor that receives both. How would I create an instance of a 
Person from JSON using the JSON stream?


Suppose the json is this:

{"age": 10, "name": "John"}

And the class is this:

class Person {
  this(string name, int age) {
// ...
  }
}



Re: RFC: std.json successor

2014-08-21 Thread Colden Cullen via Digitalmars-d
I notice in the docs there are several references to a 
`parseJSON` and `parseJson`, but I can't seem to find where 
either of these are defined. Is this just a typo?


Hope this helps: 
https://github.com/s-ludwig/std_data_json/search?q=parseJson&type=Code


Re: RFC: std.json successor

2014-08-21 Thread Sönke Ludwig via Digitalmars-d

On 22.08.2014 02:42, Ary Borenszweig wrote:

Say I have a class Person with name (string) and age (int) with a
constructor that receives both. How would I create an instance of a
Person from a json with the json stream?

Suppose the json is this:

{"age": 10, "name": "John"}

And the class is this:

class Person {
   this(string name, int age) {
 // ...
   }
}



Without a serialization framework it would in theory work like this:

JSONValue v = parseJSON(`{"age": 10, "name": "John"}`);
auto p = new Person(v["name"].get!string, v["age"].get!int);

unfortunately the operator overloading doesn't work like this currently, 
so this is needed:


JSONValue v = parseJSON(`{"age": 10, "name": "John"}`);
auto p = new Person(
v.get!(Json[string])["name"].get!string,
v.get!(Json[string])["age"].get!int);

That should be solved together with the new module (it could of course 
also easily be added to JSONValue itself instead of Algebraic, but the 
value of having it in Algebraic would be much higher).


Re: RFC: std.json successor

2014-08-21 Thread Sönke Ludwig via Digitalmars-d

On 22.08.2014 04:35, Colden Cullen wrote:

I notice in the docs there are several references to a `parseJSON` and
`parseJson`, but I can't seem to find where either of these are defined.
Is this just a typo?

Hope this helps:
https://github.com/s-ludwig/std_data_json/search?q=parseJson&type=Code


Seems like I forgot to replace a few mentions. They are called 
parseJSONValue and toJSONValue now for clarity.


Re: RFC: std.json successor

2014-08-22 Thread Sönke Ludwig via Digitalmars-d

On 22.08.2014 00:48, Brian Schott wrote:

On Thursday, 21 August 2014 at 22:35:18 UTC, Sönke Ludwig wrote:

Destroy away! ;)


source/stdx/data/json/lexer.d(263:8)[warn]: 'JSONToken' has method
'opEquals', but not 'toHash'.
source/stdx/data/json/lexer.d(499:65)[warn]: Use parenthesis to clarify
this expression.
source/stdx/data/json/parser.d(516:8)[warn]: 'JSONParserNode' has method
'opEquals', but not 'toHash'.
source/stdx/data/json/value.d(95:10)[warn]: Variable c is never used.
source/stdx/data/json/value.d(99:10)[warn]: Variable d is never used.
source/stdx/data/json/package.d(942:14)[warn]: Variable val is never used.

It's likely that you can ignore these, but I thought I'd post them
anyways. (The last three are in unittest blocks, for example.)


Fixed all of them (none of them was causing harm, but it's still nicer 
that way). Also added @safe and nothrow where possible.


BTW, does anyone know what's holding back formattedWrite() from being 
@safe for simple types?


Re: RFC: std.json successor

2014-08-22 Thread Sönke Ludwig via Digitalmars-d

On 22.08.2014 00:35, Sönke Ludwig wrote:

The DOM style JSONValue type is based on std.variant.Algebraic. This
currently has a few usability issues that can be solved by
upgrading/fixing Algebraic:

  - Operator overloading only works sporadically
  - (...)
  - Operations and conversions between different Algebraic types is not
conveniently supported, which gets important when other similar
formats get supported (e.g. BSON)


https://github.com/D-Programming-Language/phobos/pull/2452
https://github.com/D-Programming-Language/phobos/pull/2453

Those fix the most important operators, index access and binary arithmetic.


Re: RFC: std.json successor

2014-08-22 Thread matovitch via Digitalmars-d
Very nice! I had started (and dropped) a JSON module based on 
Algebraic too. So without opDispatch you plan to use a syntax 
like jPerson["age"] = 10? You didn't use stdx.d.lexer. Any 
reason why? (I'm asking even though I've never used this module, 
and have never coded much in D, in fact.)


Re: RFC: std.json successor

2014-08-22 Thread Sönke Ludwig via Digitalmars-d

On 22.08.2014 14:17, matovitch wrote:

Very nice ! I had started (and dropped) a json module based on Algebraic
too. So without opDispatch you plan to use a syntax like jPerson["age"]
= 10 ? You didn't use stdx.d.lexer. Any reason why ? (I am asking even
if I never used this module.(never coded much in D in fact))


Exactly, that's the syntax you'd use for JSONValue. But my favorite way 
to work with most JSON data is actually to read the JSON string directly 
into a D struct using a serialization framework and then access the 
struct in a strongly typed way. This has both less syntactic and less 
runtime overhead, and it also greatly reduces the chance of field 
name/type related bugs.
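
As an illustration of that strongly typed style, vibe.data.json already ships a deserializeJson helper along these lines (sketch only; the std.data.json counterpart does not exist yet):

```d
import vibe.data.json : deserializeJson;

struct Person
{
    string name;
    int age;
}

void main()
{
    // Field order in the JSON doesn't matter; matching is by name.
    auto p = deserializeJson!Person(`{"age": 10, "name": "John"}`);
    assert(p.name == "John" && p.age == 10);
}
```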


The module is written against current Phobos, which is why stdx.d.lexer 
wasn't really an option. I'm also unsure if std.lexer would be able to 
handle the parsing required for JSON numbers and strings. But it would 
certainly be nice already if at least the token structure could be 
reused. However, it should also be possible to find a painless migration 
path later, when std.lexer is actually part of Phobos.


Re: RFC: std.json successor

2014-08-22 Thread matovitch via Digitalmars-d

On Friday, 22 August 2014 at 12:39:08 UTC, Sönke Ludwig wrote:

On 22.08.2014 14:17, matovitch wrote:
Very nice ! I had started (and dropped) a json module based on 
Algebraic
too. So without opDispatch you plan to use a syntax like 
jPerson["age"]
= 10 ? You didn't use stdx.d.lexer. Any reason why ? (I am 
asking even

if I never used this module.(never coded much in D in fact))


Exactly, that's the syntax you'd use for JSONValue. But my 
favorite way to work with most JSON data is actually to 
directly read the JSON string into a D struct using a 
serialization framework and then access the struct in a 
strongly typed way. This has both, less syntactic and less 
runtime overhead, and also greatly reduces the chance for field 
name/type related bugs.




Completely agree, I am waiting for a serializer too. I would love 
to see something like Cap'n Proto in D.


The module is written against current Phobos, which is why 
stdx.d.lexer wasn't really an option. I'm also unsure if 
std.lexer would be able to handle the parsing required for JSON 
numbers and strings. But it would certainly be nice already if 
at least the token structure could be reused. However, it 
should also be possible to find a painless migration path 
later, when std.lexer is actually part of Phobos.


Ok. I think I remember there was a JSON parser provided as a 
sample for stdx.d.lexer.




Re: RFC: std.json successor

2014-08-22 Thread Sönke Ludwig via Digitalmars-d

On 22.08.2014 14:47, matovitch wrote:

Ok. I think I remember there was a stdx.d.lexer's Json parser provided
as sample.



I see, so you just have to write your own number/string parsing routines:
https://github.com/Hackerpilot/lexer-demo/blob/master/jsonlexer.d



Re: RFC: std.json successor

2014-08-22 Thread matovitch via Digitalmars-d

On Friday, 22 August 2014 at 13:00:19 UTC, Sönke Ludwig wrote:

On 22.08.2014 14:47, matovitch wrote:
Ok. I think I remember there was a stdx.d.lexer's Json parser 
provided

as sample.



I see, so you just have to write your own number/string parsing 
routines:

https://github.com/Hackerpilot/lexer-demo/blob/master/jsonlexer.d


It's kind of "low level" indeed... I don't know what kind of 
black magic all these template mixins are doing, but the code 
looks quite clean.


Confusing:

// Therefore, this always returns false.
bool isSeparating(size_t offset) pure nothrow @safe
{
return true;
}





Re: RFC: std.json successor

2014-08-22 Thread Ary Borenszweig via Digitalmars-d

On 8/22/14, 3:33 AM, Sönke Ludwig wrote:

Am 22.08.2014 02:42, schrieb Ary Borenszweig:

Say I have a class Person with name (string) and age (int) with a
constructor that receives both. How would I create an instance of a
Person from a json with the json stream?

Suppose the json is this:

{"age": 10, "name": "John"}

And the class is this:

class Person {
   this(string name, int age) {
 // ...
   }
}



Without a serialization framework it would in theory work like this:

 JSONValue v = parseJSON(`{"age": 10, "name": "John"}`);
 auto p = new Person(v["name"].get!string, v["age"].get!int);

unfortunately the operator overloading doesn't work like this currently,
so this is needed:

 JSONValue v = parseJSON(`{"age": 10, "name": "John"}`);
 auto p = new Person(
 v.get!(Json[string])["name"].get!string,
 v.get!(Json[string])["age"].get!int);


But does this parse the whole json into JSONValue? I want to create a 
Person without creating an intermediate JSONValue for the whole json. 
Can this be done?


Re: RFC: std.json successor

2014-08-22 Thread Jacob Carlborg via Digitalmars-d

On 2014-08-22 00:35, Sönke Ludwig wrote:

Following up on the recent "std.jgrandson" thread [1], I've picked up
the work (a lot earlier than anticipated) and finished a first version
of a loose blend of said std.jgrandson, vibe.data.json and some changes
that I had planned for vibe.data.json for a while. I'm quite pleased by
the results so far, although without a serialization framework it still
misses a very important building block.

Code: https://github.com/s-ludwig/std_data_json
Docs: http://s-ludwig.github.io/std_data_json/
DUB: http://code.dlang.org/packages/std_data_json


* Opening braces should be put on their own line to follow Phobos style 
guides


* I'm wondering about the assert in lexer.d, line 160. What happens if 
two invalid tokens occur after each other?


* I think we have talked about this before, when reviewing D lexers. I'm 
thinking of how to handle invalid data. Is it the best solution to throw 
an exception? Would it be possible to return an error token and have the 
client decide what to do about it? Shouldn't it be possible to build a 
JSON validator on this?


* The lexer seems to always convert JSON types to their native D types, 
is that wise to do? That's unnecessary if you're implementing syntax 
highlighting


--
/Jacob Carlborg


Re: RFC: std.json successor

2014-08-22 Thread via Digitalmars-d

On Friday, 22 August 2014 at 15:47:51 UTC, Jacob Carlborg wrote:
* I think we have talked about this before, when reviewing D 
lexers. I'm thinking of how to handle invalid data. Is it the 
best solution to throw an exception? Would it be possible to 
return an error token and have the client decide what to do 
about?


Hmm... my initial reaction was "not by default - it should throw 
on error, otherwise no one will check for errors". But if it's 
returning an error token, maybe it would be sufficient if that 
token throws when its value is accessed?
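
A deferred-throwing error token might look roughly like this (purely illustrative; none of these names are proposed API):

```d
struct Token
{
    enum Kind { number, error }
    Kind kind;
    double numberValue;
    string errorMessage;

    // The error surfaces only when the value is actually used, so a
    // validator can inspect `kind` without ever triggering a throw.
    @property double number() const
    {
        if (kind == Kind.error)
            throw new Exception("lex error: " ~ errorMessage);
        return numberValue;
    }
}

void main()
{
    auto t = Token(Token.Kind.number, 3.14);
    assert(t.number == 3.14);

    auto e = Token(Token.Kind.error, double.nan, "bad input");
    assert(e.kind == Token.Kind.error); // no throw until e.number is read
}
```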


Re: RFC: std.json successor

2014-08-22 Thread Sönke Ludwig via Digitalmars-d

On 22.08.2014 17:47, Jacob Carlborg wrote:

On 2014-08-22 00:35, Sönke Ludwig wrote:

Following up on the recent "std.jgrandson" thread [1], I've picked up
the work (a lot earlier than anticipated) and finished a first version
of a loose blend of said std.jgrandson, vibe.data.json and some changes
that I had planned for vibe.data.json for a while. I'm quite pleased by
the results so far, although without a serialization framework it still
misses a very important building block.

Code: https://github.com/s-ludwig/std_data_json
Docs: http://s-ludwig.github.io/std_data_json/
DUB: http://code.dlang.org/packages/std_data_json


* Opening braces should be put on their own line to follow Phobos style
guides


Will do.


* I'm wondering about the assert in lexer.d, line 160. What happens if
two invalid tokens after each other occur?


There are actually no invalid tokens at all; the "invalid" enum value is 
only used to denote that no token is currently stored in _front. If 
readToken() doesn't throw, there will always be a valid token.



* I think we have talked about this before, when reviewing D lexers. I'm
thinking of how to handle invalid data. Is it the best solution to throw
an exception? Would it be possible to return an error token and have the
client decide what to do about? Shouldn't it be possible to build a JSON
validator on this?


That would indeed be a possibility; it's how I used to handle it in my 
private version of std.lexer, too. It could also be made a compile-time 
option.



* The lexer seems to always convert JSON types to their native D types,
is that wise to do? That's unnecessary if you're implementing syntax
highlighting


It's basically the same trade-off as for unescaping string literals. For 
"string" inputs, it would be more efficient to just store a slice, but 
for generic input ranges it avoids the otherwise needed allocation. The 
proposed flag could make an improvement here, too.




Re: RFC: std.json successor

2014-08-22 Thread Sönke Ludwig via Digitalmars-d

On 22.08.2014 16:53, Ary Borenszweig wrote:

On 8/22/14, 3:33 AM, Sönke Ludwig wrote:

Without a serialization framework it would in theory work like this:

 JSONValue v = parseJSON(`{"age": 10, "name": "John"}`);
 auto p = new Person(v["name"].get!string, v["age"].get!int);

unfortunately the operator overloading doesn't work like this currently,
so this is needed:

 JSONValue v = parseJSON(`{"age": 10, "name": "John"}`);
 auto p = new Person(
 v.get!(Json[string])["name"].get!string,
 v.get!(Json[string])["age"].get!int);


But does this parse the whole json into JSONValue? I want to create a
Person without creating an intermediate JSONValue for the whole json.
Can this be done?


That would be done by the serialization framework. Instead of using 
parseJSON(), it could use parseJSONStream() to populate the Person 
instance on the fly, without putting the whole JSON into memory. But I'd 
like to leave that for a later addition, because we'd otherwise end up 
with duplicate functionality once std.serialization gets finalized.


Manually it would work similar to this:

auto nodes = parseJSONStream(`{"age": 10, "name": "John"}`);
with (JSONParserNode.Kind) {
    enforce(nodes.front == objectStart);
    nodes.popFront();
    while (nodes.front != objectEnd) {
        auto key = nodes.front.key;
        nodes.popFront();
        if (key == "name")
            person.name = nodes.front.literal.string;
        else if (key == "age")
            person.age = nodes.front.literal.number;
        nodes.popFront(); // also consume the value node
    }
}


Re: RFC: std.json successor

2014-08-22 Thread via Digitalmars-d

Some thoughts about the API:

1) Instead of `parseJSONValue` and `lexJSON`, how about static 
methods `JSON.parse` and `JSON.lex`, or even module-level 
functions `std.data.json.parse` etc.? The "JSON" part of the name 
is redundant.


2) Also, `parseJSONValue` and `parseJSONStream` probably don't 
need to have different names. They can be distinguished by their 
parameter types.


3) `toJSONString` shouldn't just take a boolean as flag for 
pretty-printing. It should either use something like 
`Pretty.YES`, or the function should be called 
`toPrettyJSONString` (I believe I have seen this latter 
convention elsewhere).
We should also think about whether we can just call the functions 
`toString` and `toPrettyString`. Alternatively, `toJSON` and 
`toPrettyJSON` should be considered.
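
Sketched, the suggestion amounts to replacing the bare bool with a named enum. Everything below is a hypothetical stand-in (writeAsStringImpl is just a dummy stub so the sketch is self-contained), not the module's actual API:

```d
enum Pretty { no, yes }

// Dummy stand-in for a module-internal writer, so this compiles.
string writeAsStringImpl(string json, bool pretty)
{
    return pretty ? json ~ "\n" : json;
}

string toJSONString(string json, Pretty pretty = Pretty.no)
{
    return writeAsStringImpl(json, pretty == Pretty.yes);
}

void main()
{
    // The call site now documents itself, unlike a bare `true`:
    auto s = toJSONString(`{"a":1}`, Pretty.yes);
    assert(s == "{\"a\":1}\n");
}
```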


Re: RFC: std.json successor

2014-08-22 Thread Christian Manning via Digitalmars-d
It would be nice to have integers treated separately to doubles. 
I know it makes the number parsing simpler to just treat 
everything as double, but still, it could be annoying when you 
expect an integer type.


I'd also like to see some benchmarks, particularly against some 
of the high performance C++ parsers, i.e. rapidjson, gason, 
sajson. Or even some of the "not bad" performance parsers with 
better APIs, i.e. QJsonDocument, jsoncpp and jsoncons (slow but 
perhaps comparable interface to this proposal?).


Re: RFC: std.json successor

2014-08-22 Thread Sönke Ludwig via Digitalmars-d

On 22.08.2014 18:15, "Marc Schütz" wrote:

Some thoughts about the API:

1) Instead of `parseJSONValue` and `lexJSON`, how about static methods
`JSON.parse` and `JSON.lex`, or even a module level functions
`std.data.json.parse` etc.? The "JSON" part of the name is redundant.


For those functions it may be acceptable, although I really dislike that 
style, because it makes the code harder to read (what exactly does this 
parse?) and the functions are rarely used, so typing that 
additional "JSON" should be no issue at all. On the other hand, if you 
always type "JSON.lex", it's more to type than just "lexJSON".


But for "[JSON]Value" it gets ugly really quick, because "Value"s are 
such a common thing and quickly occur in multiple kinds in the same 
source file.




2) Also, `parseJSONValue` and `parseJSONStream` probably don't need to
have different names. They can be distinguished by their parameter types.


Actually they take exactly the same parameters and just differ in their 
return value. It would be more descriptive to name them parseAsJSONValue 
and parseAsJSONStream - or maybe parseJSONAsValue or parseJSONToValue? 
The current naming is somewhat modeled after std.conv's "to!T" and 
"parse!T".




3) `toJSONString` shouldn't just take a boolean as flag for
pretty-printing. It should either use something like `Pretty.YES`, or
the function should be called `toPrettyJSONString` (I believe I have
seen this latter convention elsewhere).
We should also think about whether we can just call the functions
`toString` and `toPrettyString`. Alternatively, `toJSON` and
`toPrettyJSON` should be considered.


Agreed, a boolean isn't good for a public interface; renaming the 
current writeAsString to a private writeAsStringImpl and then adding 
"(writeAs/to)[Pretty]String" sounds reasonable. That's actually how I've 
done it for vibe.data.json.


Re: RFC: std.json successor

2014-08-22 Thread Sönke Ludwig via Digitalmars-d

On 22.08.2014 18:31, Christian Manning wrote:

It would be nice to have integers treated separately to doubles. I know
it makes the number parsing simpler to just treat everything as double,
but still, it could be annoying when you expect an integer type.


That's how I've done it for vibe.data.json, too. For the new 
implementation, I've just used the number parsing routine from Andrei's 
std.jgrandson module. Does anybody have reservations about representing 
integers as "long" instead?




I'd also like to see some benchmarks, particularly against some of the
high performance C++ parsers, i.e. rapidjson, gason, sajson. Or even
some of the "not bad" performance parsers with better APIs, i.e.
QJsonDocument, jsoncpp and jsoncons (slow but perhaps comparable
interface to this proposal?).


That would indeed be nice to have, but I'm not sure if I can manage to 
squeeze that in besides finishing the module itself. My time frame for 
working on this is quite limited.


Re: RFC: std.json successor

2014-08-22 Thread Ary Borenszweig via Digitalmars-d

On 8/22/14, 1:24 PM, Sönke Ludwig wrote:

On 22.08.2014 16:53, Ary Borenszweig wrote:

On 8/22/14, 3:33 AM, Sönke Ludwig wrote:

Without a serialization framework it would in theory work like this:

 JSONValue v = parseJSON(`{"age": 10, "name": "John"}`);
 auto p = new Person(v["name"].get!string, v["age"].get!int);

unfortunately the operator overloading doesn't work like this currently,
so this is needed:

 JSONValue v = parseJSON(`{"age": 10, "name": "John"}`);
 auto p = new Person(
 v.get!(Json[string])["name"].get!string,
 v.get!(Json[string])["age"].get!int);


But does this parse the whole json into JSONValue? I want to create a
Person without creating an intermediate JSONValue for the whole json.
Can this be done?


That would be done by the serialization framework. Instead of using
parseJSON(), it could use parseJSONStream() to populate the Person
instance on the fly, without putting the whole JSON into memory. But I'd
like to leave that for a later addition, because we'd otherwise end up
with duplicate functionality once std.serialization gets finalized.

Manually it would work similar to this:

auto nodes = parseJSONStream(`{"age": 10, "name": "John"}`);
with (JSONParserNode.Kind) {
    enforce(nodes.front == objectStart);
    nodes.popFront();
    while (nodes.front != objectEnd) {
        auto key = nodes.front.key;
        nodes.popFront();
        if (key == "name")
            person.name = nodes.front.literal.string;
        else if (key == "age")
            person.age = nodes.front.literal.number;
        nodes.popFront(); // also consume the value node
    }
}


Cool, that looks good :-)


Re: RFC: std.json successor

2014-08-22 Thread via Digitalmars-d

On Friday, 22 August 2014 at 16:48:44 UTC, Sönke Ludwig wrote:

On 22.08.2014 18:15, "Marc Schütz" wrote:

Some thoughts about the API:

1) Instead of `parseJSONValue` and `lexJSON`, how about static 
methods

`JSON.parse` and `JSON.lex`, or even a module level functions
`std.data.json.parse` etc.? The "JSON" part of the name is 
redundant.


For those functions it may be acceptable, although I really 
dislike that style, because it makes the code harder to read 
(what exactly does this parse?) and the functions are rarely 
used, so that that typing that additional "JSON" should be no 
issue at all. On the other hand, if you always type "JSON.lex" 
it's more to type than just "lexJSON".


I'm not really concerned about the amount of typing, it just 
seemed a bit odd to have the redundant JSON in there, as we have 
module names for namespacing. Your argument about readability is 
true nevertheless. But...




But for "[JSON]Value" it gets ugly really quick, because 
"Value"s are such a common thing and quickly occur in multiple 
kinds in the same source file.




2) Also, `parseJSONValue` and `parseJSONStream` probably don't 
need to
have different names. They can be distinguished by their 
parameter types.


Actually they take exactly the same parameters and just differ 
in their return value. It would be more descriptive to name 
them parseAsJSONValue and parseAsJSONStream - or maybe 
parseJSONAsValue or parseJSONToValue? The current naming is 
somewhat modeled after std.conv's "to!T" and "parse!T".


... why not use exactly the same convention then? => 
`parse!JSONValue`


Would be nice to have a "pluggable" API where you just need to 
specify the type in a factory method to choose the input format. 
Then there could be `parse!BSON`, `parse!YAML`, with the same 
style as `parse!(int[])`.


I know this sounds a bit like bike-shedding, but the API shouldn't 
stand by itself; it should fit into the "big picture", especially as 
there will probably be other parsers (you already named the 
module std._data_.json).


Re: RFC: std.json successor

2014-08-22 Thread via Digitalmars-d

On Friday, 22 August 2014 at 16:56:26 UTC, Sönke Ludwig wrote:

On 22.08.2014 18:31, Christian Manning wrote:
It would be nice to have integers treated separately to 
doubles. I know
it makes the number parsing simpler to just treat everything 
as double,
but still, it could be annoying when you expect an integer 
type.


That's how I've done it for vibe.data.json, too. For the new 
implementation, I've just used the number parsing routine from 
Andrei's std.jgrandson module. Does anybody have reservations 
about representing integers as "long" instead?


It should automatically fall back to double on overflow. Maybe 
even use BigInt if applicable?


Re: RFC: std.json successor

2014-08-22 Thread Sönke Ludwig via Digitalmars-d

On 22.08.2014 19:24, "Marc Schütz" wrote:

On Friday, 22 August 2014 at 16:48:44 UTC, Sönke Ludwig wrote:


Actually they take exactly the same parameters and just differ in
their return value. It would be more descriptive to name them
parseAsJSONValue and parseAsJSONStream - or maybe parseJSONAsValue or
parseJSONToValue? The current naming is somewhat modeled after
std.conv's "to!T" and "parse!T".


... why not use exactly the same convention then? => `parse!JSONValue`

Would be nice to have a "pluggable" API where you just need to specify
the type in a factory method to choose the input format. Then there
could be `parse!BSON`, `parse!YAML`, with the same style as
`parse!(int[])`.

I know this sound a bit like bike-shedding, but the API shouldn't stand
by itself, but fit into the "big picture", especially as there will
probably be other parsers (you already named the module std._data_.json).


That would be nice, but then it should also work together with std.conv, 
which basically is exactly this pluggable API. As it is, it would 
result in an ambiguity error if both std.data.json and std.conv are 
imported at the same time.


Is there a way to make std.conv work properly with JSONValue? I guess 
the only theoretical way would be to put something in JSONValue, but 
that would result in a slightly ugly cyclic dependency between parser.d 
and value.d.


Re: RFC: std.json successor

2014-08-22 Thread Sönke Ludwig via Digitalmars-d

On 22.08.2014 19:27, "Marc Schütz" wrote:

On Friday, 22 August 2014 at 16:56:26 UTC, Sönke Ludwig wrote:

Am 22.08.2014 18:31, schrieb Christian Manning:

It would be nice to have integers treated separately to doubles. I know
it makes the number parsing simpler to just treat everything as double,
but still, it could be annoying when you expect an integer type.


That's how I've done it for vibe.data.json, too. For the new
implementation, I've just used the number parsing routine from
Andrei's std.jgrandson module. Does anybody have reservations about
representing integers as "long" instead?


It should automatically fall back to double on overflow. Maybe even use
BigInt if applicable?


I guess BigInt + exponent would be the only lossless way to represent 
any JSON number. That could then be converted to any desired smaller 
type as required.


But checking for overflow during number parsing would definitely have an 
impact on parsing speed, as would using a BigInt of course, so the 
question is how we want to set up the trade-off here (or if there is 
another way that is overhead-free).
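
One overhead-conscious middle ground would be to accumulate into a long with core.checkedint's wrap-and-flag primitives, falling back to double (or BigInt) only when the flag fires. A sketch, assuming the lexer has already validated that the slice contains only digits:

```d
import core.checkedint : adds, muls;

// Returns false on overflow; the caller then re-parses as double/BigInt.
bool tryParseIntegral(const(char)[] digits, out long result)
{
    bool overflow = false;
    long value = 0;
    foreach (c; digits)
    {
        value = muls(value, 10, overflow);
        value = adds(value, c - '0', overflow);
    }
    result = value;
    return !overflow;
}

void main()
{
    long v;
    assert(tryParseIntegral("9223372036854775807", v) && v == long.max);
    assert(!tryParseIntegral("9223372036854775808", v)); // one past long.max
}
```

The checked ops are branch-light intrinsics, so the common non-overflowing path stays cheap.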


Re: RFC: std.json successor

2014-08-22 Thread via Digitalmars-d

On Friday, 22 August 2014 at 17:35:20 UTC, Sönke Ludwig wrote:
... why not use exactly the same convention then? => 
`parse!JSONValue`


Would be nice to have a "pluggable" API where you just need to 
specify
the type in a factory method to choose the input format. Then 
there

could be `parse!BSON`, `parse!YAML`, with the same style as
`parse!(int[])`.

I know this sound a bit like bike-shedding, but the API 
shouldn't stand
by itself, but fit into the "big picture", especially as there 
will
probably be other parsers (you already named the module 
std._data_.json).


That would be nice, but then it should also work together with 
std.conv, which basically is exactly this pluggable API. Just 
like this it would result in an ambiguity error if both 
std.data.json and std.conv are imported at the same time.


Is there a way to make std.conv work properly with JSONValue? I 
guess the only theoretical way would be to put something in 
JSONValue, but that would result in a slightly ugly cyclic 
dependency between parser.d and value.d.


The easiest and cleanest way would be to add a function in 
std.data.json:


auto parse(Target, Source)(Source input)
    if (is(Target == JSONValue))
{
    return ...;
}

The various overloads of `std.conv.parse` already have mutually 
exclusive template constraints, they will not collide with our 
function.


Re: RFC: std.json successor

2014-08-22 Thread via Digitalmars-d

On Friday, 22 August 2014 at 17:45:03 UTC, Sönke Ludwig wrote:

On 22.08.2014 19:27, "Marc Schütz" wrote:

On Friday, 22 August 2014 at 16:56:26 UTC, Sönke Ludwig wrote:

On 22.08.2014 18:31, Christian Manning wrote:
It would be nice to have integers treated separately to 
doubles. I know
it makes the number parsing simpler to just treat everything 
as double,
but still, it could be annoying when you expect an integer 
type.


That's how I've done it for vibe.data.json, too. For the new
implementation, I've just used the number parsing routine from
Andrei's std.jgrandson module. Does anybody have reservations 
about

representing integers as "long" instead?


It should automatically fall back to double on overflow. Maybe 
even use

BigInt if applicable?


I guess BigInt + exponent would be the only lossless way to 
represent any JSON number. That could then be converted to any 
desired smaller type as required.


But checking for overflow during number parsing would 
definitely have an impact on parsing speed, as well as using a 
BigInt of course, so the question is how we want set up the 
trade off here (or if there is another way that is 
overhead-free).


As the functions will be templatized anyway, it should include a 
flags parameter. These and possible future extensions can then be 
selected by the user.


Re: RFC: std.json successor

2014-08-22 Thread Walter Bright via Digitalmars-d

On 8/21/2014 3:35 PM, Sönke Ludwig wrote:

Destroy away! ;)


Thanks for taking this on! This is valuable work. On to destruction!

I'm looking at:

http://s-ludwig.github.io/std_data_json/stdx/data/json/lexer/lexJSON.html

I anticipate this will be used a LOT, and in applications demanding very 
high speed. With that in mind,



1. There's no mention of what will happen if it is passed malformed JSON 
strings. I presume an exception is thrown. Exceptions are slow and consume 
GC memory. I suggest an alternative would be to emit an "Error" token instead; 
this would be much like how the UTF decoding algorithms emit a "replacement 
char" for invalid UTF sequences.


2. The escape sequenced strings presumably consume GC memory. This will be a 
problem for high performance code. I suggest either leaving them undecoded in 
the token stream, and letting higher level code decide what to do about them, or 
provide a hook that the user can override with his own allocation scheme.



If we don't make it possible to use std.json without invoking the GC, I believe 
the module will fail in the long term.


Re: RFC: std.json successor

2014-08-22 Thread Sönke Ludwig via Digitalmars-d

Am 22.08.2014 19:57, schrieb "Marc Schütz":

On Friday, 22 August 2014 at 17:35:20 UTC, Sönke Ludwig wrote:

... why not use exactly the same convention then? => `parse!JSONValue`

Would be nice to have a "pluggable" API where you just need to specify
the type in a factory method to choose the input format. Then there
could be `parse!BSON`, `parse!YAML`, with the same style as
`parse!(int[])`.

I know this sound a bit like bike-shedding, but the API shouldn't stand
by itself, but fit into the "big picture", especially as there will
probably be other parsers (you already named the module
std._data_.json).


That would be nice, but then it should also work together with
std.conv, which basically is exactly this pluggable API. Just like
this it would result in an ambiguity error if both std.data.json and
std.conv are imported at the same time.

Is there a way to make std.conv work properly with JSONValue? I guess
the only theoretical way would be to put something in JSONValue, but
that would result in a slightly ugly cyclic dependency between
parser.d and value.d.


The easiest and cleanest way would be to add a function in std.data.json:

 auto parse(Target, Source)(Source input)
 if(is(Target == JSONValue))
 {
 return ...;
 }

The various overloads of `std.conv.parse` already have mutually
exclusive template constraints, they will not collide with our function.
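For comparison, this is how the existing `std.conv.parse` convention looks for built-in types; the commented line sketches how the proposed JSONValue overload would slot into the same style (the overload itself is hypothetical here):

```d
import std.conv : parse;

void main()
{
    // std.conv.parse consumes from the front of the input and leaves the rest
    string s = "42 and more";
    int i = s.parse!int;
    assert(i == 42);
    assert(s == " and more");

    // With the proposed overload in std.data.json, JSON would follow suit:
    // auto v = jsonText.parse!JSONValue;
}
```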


Okay, for parse that may work, but what about to!()?


Re: RFC: std.json successor

2014-08-22 Thread Sönke Ludwig via Digitalmars-d

Am 22.08.2014 20:01, schrieb "Marc Schütz":

On Friday, 22 August 2014 at 17:45:03 UTC, Sönke Ludwig wrote:

Am 22.08.2014 19:27, schrieb "Marc Schütz":

On Friday, 22 August 2014 at 16:56:26 UTC, Sönke Ludwig wrote:

Am 22.08.2014 18:31, schrieb Christian Manning:

It would be nice to have integers treated separately to doubles. I
know
it makes the number parsing simpler to just treat everything as
double,
but still, it could be annoying when you expect an integer type.


That's how I've done it for vibe.data.json, too. For the new
implementation, I've just used the number parsing routine from
Andrei's std.jgrandson module. Does anybody have reservations about
representing integers as "long" instead?


It should automatically fall back to double on overflow. Maybe even use
BigInt if applicable?


I guess BigInt + exponent would be the only lossless way to represent
any JSON number. That could then be converted to any desired smaller
type as required.

But checking for overflow during number parsing would definitely have
an impact on parsing speed, as well as using a BigInt of course, so
the question is how we want to set up the trade-off here (or if there is
another way that is overhead-free).


As the functions will be templatized anyway, it should include a flags
parameter. These and possible future extensions can then be selected by
the user.


I'm actually in the process of converting the "track_location" parameter 
to a flags enum and to add support for an error token, so this would fit 
right in.


Re: RFC: std.json successor

2014-08-22 Thread via Digitalmars-d

On Friday, 22 August 2014 at 18:08:34 UTC, Sönke Ludwig wrote:

Am 22.08.2014 19:57, schrieb "Marc Schütz":
The easiest and cleanest way would be to add a function in 
std.data.json:

auto parse(Target, Source)(Source input)
    if (is(Target == JSONValue))
{
    return ...;
}

The various overloads of `std.conv.parse` already have mutually 
exclusive template constraints, they will not collide with our 
function.


Okay, for parse that may work, but what about to!()?


What's the problem with to!()?


Re: RFC: std.json successor

2014-08-22 Thread Andrej Mitrovic via Digitalmars-d
On 8/22/14, Sönke Ludwig wrote:
> Docs: http://s-ludwig.github.io/std_data_json/

This confused me for a solid minute:

// Lex a JSON string into a lazy range of tokens
auto tokens = lexJSON(`{"name": "Peter", "age": 42}`);

with (JSONToken.Kind) {
assert(tokens.map!(t => t.kind).equal(
[objectStart, string, colon, string, comma,
string, colon, number, objectEnd]));
}

Generally I'd avoid using de-facto reserved names as enum member names
(e.g. string).



Re: RFC: std.json successor

2014-08-22 Thread Sönke Ludwig via Digitalmars-d

Am 22.08.2014 21:15, schrieb Andrej Mitrovic via Digitalmars-d:

On 8/22/14, Sönke Ludwig wrote:

Docs: http://s-ludwig.github.io/std_data_json/


This confused me for a solid minute:

// Lex a JSON string into a lazy range of tokens
auto tokens = lexJSON(`{"name": "Peter", "age": 42}`);

with (JSONToken.Kind) {
 assert(tokens.map!(t => t.kind).equal(
 [objectStart, string, colon, string, comma,
 string, colon, number, objectEnd]));
}

Generally I'd avoid using de-facto reserved names as enum member names
(e.g. string).



Hmmm, but it *is* a string. Isn't the problem more the use of with in 
this case? Maybe the example should just use with(JSONToken) and then 
Kind.string?
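A sketch of that less ambiguous spelling on the same example (assumes the stdx.data.json module from the proposal; `equal`/`map` from Phobos):

```d
import std.algorithm : equal, map;
import stdx.data.json;

void main()
{
    auto tokens = lexJSON(`{"name": "Peter", "age": 42}`);

    // Narrower `with` scope: only JSONToken's members (including the Kind
    // enum type) are pulled in, so the enum members stay qualified through
    // Kind and `string` keeps its usual meaning.
    with (JSONToken) {
        assert(tokens.map!(t => t.kind).equal(
            [Kind.objectStart, Kind.string, Kind.colon, Kind.string,
             Kind.comma, Kind.string, Kind.colon, Kind.number,
             Kind.objectEnd]));
    }
}
```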


Re: RFC: std.json successor

2014-08-22 Thread Christian Manning via Digitalmars-d

On Friday, 22 August 2014 at 17:45:03 UTC, Sönke Ludwig wrote:

Am 22.08.2014 19:27, schrieb "Marc Schütz":

On Friday, 22 August 2014 at 16:56:26 UTC, Sönke Ludwig wrote:

Am 22.08.2014 18:31, schrieb Christian Manning:

It would be nice to have integers treated separately to doubles. I know 
it makes the number parsing simpler to just treat everything as double, 
but still, it could be annoying when you expect an integer type.


That's how I've done it for vibe.data.json, too. For the new 
implementation, I've just used the number parsing routine from Andrei's 
std.jgrandson module. Does anybody have reservations about representing 
integers as "long" instead?


It should automatically fall back to double on overflow. Maybe even use 
BigInt if applicable?


I guess BigInt + exponent would be the only lossless way to represent 
any JSON number. That could then be converted to any desired smaller 
type as required.

But checking for overflow during number parsing would definitely have an 
impact on parsing speed, as well as using a BigInt of course, so the 
question is how we want to set up the trade-off here (or if there is 
another way that is overhead-free).


You could check for a decimal point and a 0 at the front (excluding a 
possible - sign); either would indicate a double, making the reasonable 
assumption that anything else will fit in a long.
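A minimal sketch of that detection rule (my own illustration, not code from the proposal): scan the literal for a fraction or exponent marker, and fall back to long otherwise.

```d
import std.algorithm.searching : canFind;

/// True if a (syntactically valid) JSON number literal needs a floating
/// point representation, i.e. it has a fraction or exponent part.
bool needsDouble(string literal)
{
    return literal.canFind('.') || literal.canFind('e') || literal.canFind('E');
}

unittest
{
    assert(!needsDouble("42"));
    assert(!needsDouble("-17"));
    assert(needsDouble("3.14"));
    assert(needsDouble("1e10"));
    assert(needsDouble("-0.5"));
}
```

The check adds one branch per character already being scanned, so it fits the "overhead-free" requirement discussed above.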


Re: RFC: std.json successor

2014-08-22 Thread Sönke Ludwig via Digitalmars-d

Am 22.08.2014 21:48, schrieb Christian Manning:

On Friday, 22 August 2014 at 17:45:03 UTC, Sönke Ludwig wrote:

Am 22.08.2014 19:27, schrieb "Marc Schütz":

On Friday, 22 August 2014 at 16:56:26 UTC, Sönke Ludwig wrote:

Am 22.08.2014 18:31, schrieb Christian Manning:

It would be nice to have integers treated separately to doubles. I
know
it makes the number parsing simpler to just treat everything as
double,
but still, it could be annoying when you expect an integer type.


That's how I've done it for vibe.data.json, too. For the new
implementation, I've just used the number parsing routine from
Andrei's std.jgrandson module. Does anybody have reservations about
representing integers as "long" instead?


It should automatically fall back to double on overflow. Maybe even use
BigInt if applicable?


I guess BigInt + exponent would be the only lossless way to represent
any JSON number. That could then be converted to any desired smaller
type as required.

But checking for overflow during number parsing would definitely have
an impact on parsing speed, as well as using a BigInt of course, so
the question is how we want to set up the trade-off here (or if there is
another way that is overhead-free).


You could check for a decimal point and a 0 at the front (excluding
possible - sign), either would indicate a double, making the reasonable
assumption that anything else will fit in a long.


Yes, no decimal point + no exponent would work without overhead to 
detect integers, but that wouldn't solve the proposed automatic 
long->double overflow, which is what I meant. My current idea is to 
default to double and optionally support any of long, BigInt and 
"Decimal" (BigInt+exponent), where integer overflow only works for 
long->BigInt.


Re: RFC: std.json successor

2014-08-22 Thread John Colvin via Digitalmars-d

On Friday, 22 August 2014 at 20:02:41 UTC, Sönke Ludwig wrote:

Am 22.08.2014 21:48, schrieb Christian Manning:

On Friday, 22 August 2014 at 17:45:03 UTC, Sönke Ludwig wrote:

Am 22.08.2014 19:27, schrieb "Marc Schütz":

On Friday, 22 August 2014 at 16:56:26 UTC, Sönke Ludwig wrote:

Am 22.08.2014 18:31, schrieb Christian Manning:

It would be nice to have integers treated separately to doubles. I know 
it makes the number parsing simpler to just treat everything as double, 
but still, it could be annoying when you expect an integer type.


That's how I've done it for vibe.data.json, too. For the new 
implementation, I've just used the number parsing routine from Andrei's 
std.jgrandson module. Does anybody have reservations about representing 
integers as "long" instead?


It should automatically fall back to double on overflow. Maybe even use 
BigInt if applicable?


I guess BigInt + exponent would be the only lossless way to represent 
any JSON number. That could then be converted to any desired smaller 
type as required.

But checking for overflow during number parsing would definitely have an 
impact on parsing speed, as well as using a BigInt of course, so the 
question is how we want to set up the trade-off here (or if there is 
another way that is overhead-free).


You could check for a decimal point and a 0 at the front (excluding a 
possible - sign); either would indicate a double, making the reasonable 
assumption that anything else will fit in a long.


Yes, no decimal point + no exponent would work without overhead to 
detect integers, but that wouldn't solve the proposed automatic 
long->double overflow, which is what I meant. My current idea is to 
default to double and optionally support any of long, BigInt and 
"Decimal" (BigInt+exponent), where integer overflow only works for 
long->BigInt.


It might be the right choice anyway (seeing as json/js do overflow to 
double), but fwiw it's still atrocious.

double a = long.max;
assert(iota(1, 100).map!(d => (a+d)-a).until!"a != 0".walkLength == 1024);

Yuk.

Floating point numbers and integers are so completely different in 
behaviour that it's just dishonest to transparently switch between the 
two. This is especially the case for overflow from long -> double, where 
by definition you're 10 bits past being able to reliably and accurately 
represent the integer in question.
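The same cliff in concrete numbers (my own illustration): near long.max the spacing between adjacent doubles is 2048, so small integer increments simply vanish.

```d
void main()
{
    double a = long.max;    // long.max rounds up to 2.0 ^^ 63
    assert(a + 1000 == a);  // increments below half an ulp are lost entirely
    assert(a + 2048 != a);  // 2048 is the ulp of 2.0 ^^ 63, the next double
}
```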


Re: RFC: std.json successor

2014-08-22 Thread Christian Manning via Digitalmars-d
Yes, no decimal point + no exponent would work without overhead to 
detect integers, but that wouldn't solve the proposed automatic 
long->double overflow, which is what I meant. My current idea is to 
default to double and optionally support any of long, BigInt and 
"Decimal" (BigInt+exponent), where integer overflow only works for 
long->BigInt.


Ah I see.

I have to say, if you are going to treat integers and floating point 
numbers differently, then you should store them differently. long 
should be used to store integers, double for floating point numbers. A 
64-bit signed integer (long) is a totally reasonable limitation for 
integers, but even that would lose precision stored as a double as you 
are proposing (if I'm understanding right). I don't think BigInt needs 
to be brought into this at all really.

In the case of integers met in the parser which are too large/small to 
fit in long, give an error IMO. Such integers should be (and are by 
other libs IIRC) serialised in the form "1.234e-123" to force double 
parsing, perhaps losing precision at that stage rather than invisibly 
inside the library. The size of JSON numbers is implementation-defined, 
and the whole thing shouldn't be degraded in both performance and 
usability to cover JSON serialisers that go beyond common native number 
types.

Of course, you are free to do whatever you like :)


Re: RFC: std.json successor

2014-08-22 Thread Sönke Ludwig via Digitalmars-d

Am 22.08.2014 20:08, schrieb Walter Bright:

On 8/21/2014 3:35 PM, Sönke Ludwig wrote:

Destroy away! ;)


Thanks for taking this on! This is valuable work. On to destruction!

I'm looking at:

http://s-ludwig.github.io/std_data_json/stdx/data/json/lexer/lexJSON.html

I anticipate this will be used a LOT and in very high speed demanding
applications. With that in mind,


1. There's no mention of what will happen if it is passed malformed JSON
strings. I presume an exception is thrown. Exceptions are both slow and
consume GC memory. I suggest an alternative would be to emit an "Error"
token instead; this would be much like how the UTF decoding algorithms
emit a "replacement char" for invalid UTF sequences.


The latest version now features a LexOptions.noThrow option which causes 
an error token to be emitted instead. After popping the error token, the 
range is always empty.
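In that vein, a sketch of how a caller might consume the error token (the option and token names follow the posts here; the exact lexJSON signature with options is an assumption):

```d
import stdx.data.json;

void main()
{
    // With LexOptions.noThrow the lexer stays exception-free and reports
    // malformed input as a final `error` token instead.
    auto tokens = lexJSON!(LexOptions.noThrow)(`{"broken": `);
    foreach (t; tokens)
    {
        if (t.kind == JSONToken.Kind.error)
        {
            // handle the malformed document; the range is empty after this
            break;
        }
        // ... process the valid token ...
    }
}
```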




2. The escape sequenced strings presumably consume GC memory. This will
be a problem for high performance code. I suggest either leaving them
undecoded in the token stream, and letting higher level code decide what
to do about them, or provide a hook that the user can override with his
own allocation scheme.


The problem is that it really depends on the use case and on the type of 
input stream which approach is more efficient (storing the escaped 
version of a string might require *two* allocations if the input range 
cannot be sliced and if the decoded string is then requested by the 
parser). My current idea therefore is to simply make this configurable, too.


Enabling the use of custom allocators should be easily possible as an 
add-on functionality later on. At least my suggestion would be to wait 
with this until we have a finished std.allocator module.


Re: RFC: std.json successor

2014-08-22 Thread Sönke Ludwig via Digitalmars-d

Am 22.08.2014 18:13, schrieb Sönke Ludwig:

Am 22.08.2014 17:47, schrieb Jacob Carlborg:


* Opening braces should be put on their own line to follow Phobos style
guides


Will do.


* I'm wondering about the assert in lexer.d, line 160. What happens if
two invalid tokens after each other occur?


There are actually no invalid tokens at all, the "invalid" enum value is
only used to denote that no token is currently stored in _front. If
readToken() doesn't throw, there will always be a valid token.


Renamed from "invalid" to "none" now to avoid confusion ->




* I think we have talked about this before, when reviewing D lexers. I'm
thinking of how to handle invalid data. Is it the best solution to throw
an exception? Would it be possible to return an error token and have the
client decide what to do about? Shouldn't it be possible to build a JSON
validator on this?


That would indeed be a possibility, it's how I used to handle it in my
private version of std.lexer, too. It could also be made a compile time
option.


and an additional "error" kind has been added, which implements the 
above. Enabled using LexOptions.noThrow.



* The lexer seems to always convert JSON types to their native D types,
is that wise to do? That's unnecessary if you're implementing syntax
highlighting


It's basically the same trade-off as for unescaping string literals. For
"string" inputs, it would be more efficient to just store a slice, but
for generic input ranges it avoids the otherwise needed allocation. The
proposed flag could make an improvement here, too.





Re: RFC: std.json successor

2014-08-22 Thread Walter Bright via Digitalmars-d

On 8/22/2014 2:27 PM, Sönke Ludwig wrote:

Am 22.08.2014 20:08, schrieb Walter Bright:

1. There's no mention of what will happen if it is passed malformed JSON
strings. I presume an exception is thrown. Exceptions are both slow and
consume GC memory. I suggest an alternative would be to emit an "Error"
token instead; this would be much like how the UTF decoding algorithms
emit a "replacement char" for invalid UTF sequences.

The latest version now features a LexOptions.noThrow option which causes an
error token to be emitted instead. After popping the error token, the range is
always empty.


Having a nothrow option may prevent the functions from being attributed as 
"nothrow".


But in any case, to worship at the Altar Of Composability, the error token could 
always be emitted, and then provide another algorithm which passes through all 
non-error tokens, and throws if it sees an error token.
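That filter composes naturally as a range adapter; a minimal sketch (my own, generic over any token type with a `kind` property and an `error` kind):

```d
import std.range.primitives : isInputRange;

/// Passes non-error tokens through unchanged and throws lazily when an
/// error token is reached, so the lexer itself can stay nothrow.
auto throwOnError(R)(R tokens)
    if (isInputRange!R)
{
    static struct Result
    {
        R src;
        @property bool empty() { return src.empty; }
        @property auto front()
        {
            auto t = src.front;
            if (t.kind == typeof(t).Kind.error)
                throw new Exception("malformed JSON input");
            return t;
        }
        void popFront() { src.popFront(); }
    }
    return Result(tokens);
}

unittest
{
    // Mock token type standing in for the lexer's token
    static struct Tok { enum Kind { number, error } Kind kind; }

    auto ok = throwOnError([Tok(Tok.Kind.number)]);
    assert(ok.front.kind == Tok.Kind.number);

    auto bad = throwOnError([Tok(Tok.Kind.error)]);
    bool threw;
    try { auto f = bad.front; } catch (Exception) { threw = true; }
    assert(threw);
}
```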




2. The escape sequenced strings presumably consume GC memory. This will
be a problem for high performance code. I suggest either leaving them
undecoded in the token stream, and letting higher level code decide what
to do about them, or provide a hook that the user can override with his
own allocation scheme.


The problem is that it really depends on the use case and on the type of input
stream which approach is more efficient (storing the escaped version of a string
might require *two* allocations if the input range cannot be sliced and if the
decoded string is then requested by the parser). My current idea therefore is to
simply make this configurable, too.

Enabling the use of custom allocators should be easily possible as an add-on
functionality later on. At least my suggestion would be to wait with this until
we have a finished std.allocator module.


I'm worried that std.allocator is stalled and we'll be digging ourselves deeper 
into needing to revise things later to remove GC usage. I'd really like to find 
a way to abstract the allocation away from the algorithm.


Re: RFC: std.json successor

2014-08-22 Thread deadalnix via Digitalmars-d
First, thank you for your work. std.json is horrible to use right now, 
so a replacement is more than welcome.


I haven't played with your code yet, so I may be asking for something 
that already exists, but did you have a look at jsvar by Adam?


You can find it here: 
https://github.com/adamdruppe/arsd/blob/master/jsvar.d


One of the big pains when one works with a format like JSON is that you 
go from the untyped world to the typed world (the same problem occurs 
with XML and various config formats as well).


I think Adam got the right balance in jsvar. It behaves closely enough 
to JavaScript that it is convenient to manipulate, while removing the 
most dangerous behavior (concatenation is still done using ~ and not + 
as in JS).


If that is not already the case, I'd love for the elements I get out of 
my JSON to behave that way. If you can do that, you have a user.


Re: RFC: std.json successor

2014-08-22 Thread Walter Bright via Digitalmars-d

On 8/22/2014 6:05 PM, Walter Bright wrote:

The problem is that it really depends on the use case and on the type of input
stream which approach is more efficient (storing the escaped version of a string
might require *two* allocations if the input range cannot be sliced and if the
decoded string is then requested by the parser). My current idea therefore is to
simply make this configurable, too.

Enabling the use of custom allocators should be easily possible as an add-on
functionality later on. At least my suggestion would be to wait with this until
we have a finished std.allocator module.


Another possibility is to have the user pass in a resizeable buffer which then 
will be used to store the strings in as necessary.


One example is std.internal.scopebuffer. The nice thing about that is the user 
can use the stack for the storage, which works out to be very, very fast.
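For reference, the usage pattern of std.internal.scopebuffer looks roughly like this (a sketch; scopebuffer is an internal Phobos module, so details may shift):

```d
import std.internal.scopebuffer;

void decodeSomething()
{
    // Caller-provided stack storage; ScopeBuffer only falls back to
    // malloc/realloc (not the GC) if the contents outgrow it.
    char[128] tmp = void;
    auto buf = ScopeBuffer!char(tmp);
    scope (exit) buf.free();

    buf.put("unescaped text goes here");
    // use buf[] before free() runs; copy it out if it must outlive the scope
}
```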


Re: RFC: std.json successor

2014-08-22 Thread ketmar via Digitalmars-d
On Sat, 23 Aug 2014 02:23:25 +
deadalnix via Digitalmars-d wrote:

> I haven't played with your code yet, so I may be asking for
> something that already exists, but did you have a look at jsvar by
> Adam?

jsvar uses opDispatch, and Sönke wrote:
>  - No opDispatch() for JSONValue - this has shown to do more harm than
>good in vibe.data.json




Re: RFC: std.json successor

2014-08-22 Thread Ola Fosheim Gr via Digitalmars-d

On Saturday, 23 August 2014 at 02:30:23 UTC, Walter Bright wrote:
Another possibility is to have the user pass in a resizeable 
buffer which then will be used to store the strings in as 
necessary.


One example is std.internal.scopebuffer. The nice thing about 
that is the user can use the stack for the storage, which works 
out to be very, very fast.


Does this mean that D is getting resizable stack allocations in 
lower stack frames? That has a lot of implications for code gen.


Re: RFC: std.json successor

2014-08-22 Thread Walter Bright via Digitalmars-d

On 8/22/2014 9:01 PM, Ola Fosheim Gr wrote:

On Saturday, 23 August 2014 at 02:30:23 UTC, Walter Bright wrote:

One example is std.internal.scopebuffer. The nice thing about that is the user
can use the stack for the storage, which works out to be very, very fast.


Does this mean that D is getting resizable stack allocations in lower stack
frames? That has a lot of implications for code gen.


scopebuffer does not require resizeable stack allocations.


Re: RFC: std.json successor

2014-08-22 Thread Ola Fosheim Gr via Digitalmars-d

On Saturday, 23 August 2014 at 04:36:34 UTC, Walter Bright wrote:

On 8/22/2014 9:01 PM, Ola Fosheim Gr wrote:
Does this mean that D is getting resizable stack allocations 
in lower stack

frames? That has a lot of implications for code gen.


scopebuffer does not require resizeable stack allocations.


So you cannot use the stack for resizable allocations.

That would however be a nice optimization. Iff an algorithm has only 
one alloca, can be inlined in a way that does not extend the stack, and 
uses a resizable buffer that grows downwards in memory, then you can 
have a resizable buffer on the stack:


HIMEM
...
Algorithm stack frame vars
Inlined vars
Buffer head/book keeping vars
Buffer end
Buffer front
...add to front here...
End of stack
LOMEM


Re: RFC: std.json successor

2014-08-22 Thread Walter Bright via Digitalmars-d

On 8/22/2014 9:48 PM, Ola Fosheim Gr wrote:

On Saturday, 23 August 2014 at 04:36:34 UTC, Walter Bright wrote:

On 8/22/2014 9:01 PM, Ola Fosheim Gr wrote:

Does this mean that D is getting resizable stack allocations in lower stack
frames? That has a lot of implications for code gen.


scopebuffer does not require resizeable stack allocations.


So you cannot use the stack for resizable allocations.


Please, take a look at how scopebuffer works.



Re: RFC: std.json successor

2014-08-22 Thread Ola Fosheim Gr via Digitalmars-d

On Saturday, 23 August 2014 at 05:28:55 UTC, Walter Bright wrote:

On 8/22/2014 9:48 PM, Ola Fosheim Gr wrote:
On Saturday, 23 August 2014 at 04:36:34 UTC, Walter Bright 
wrote:

On 8/22/2014 9:01 PM, Ola Fosheim Gr wrote:
Does this mean that D is getting resizable stack allocations 
in lower stack

frames? That has a lot of implications for code gen.


scopebuffer does not require resizeable stack allocations.


So you cannot use the stack for resizable allocations.


Please, take a look at how scopebuffer works.


I have? It requires an upper bound to stay on the stack, which creates 
a big hole in the stack. I don't think wasting the stack or moving to 
the heap is a nice predictable solution. It would be better to just 
have a couple of regions that do "reverse" stack allocations, but the 
most efficient solution is the one I outlined.


With JSON you might be able to create an upper bound of say 4-8 times 
the size of the source iff you know the file size. You don't if you are 
streaming.


(scopebuffer is too unpredictable for real time; a pure stack solution 
is predictable)


Re: RFC: std.json successor

2014-08-22 Thread Walter Bright via Digitalmars-d

On 8/22/2014 11:25 PM, Ola Fosheim Gr wrote:

On Saturday, 23 August 2014 at 05:28:55 UTC, Walter Bright wrote:

On 8/22/2014 9:48 PM, Ola Fosheim Gr wrote:

On Saturday, 23 August 2014 at 04:36:34 UTC, Walter Bright wrote:

On 8/22/2014 9:01 PM, Ola Fosheim Gr wrote:

Does this mean that D is getting resizable stack allocations in lower stack
frames? That has a lot of implications for code gen.


scopebuffer does not require resizeable stack allocations.


So you cannot use the stack for resizable allocations.


Please, take a look at how scopebuffer works.


I have? It requires an upperbound to stay on the stack, that creates a big hole
in the stack. I don't think wasting the stack or moving to the heap is a nice
predictable solution. It would be better to just have a couple of regions that
do "reverse" stack allocations, but the most efficient solution is the one I
outlined.


Scopebuffer is extensively used in Warp, and works very well. The "hole" in the 
stack is not a significant problem.




With json you might be able to create an upperbound of say 4-8 times the size of
the source iff you know the file size. You don't if you are streaming.

(scopebuffer is too unpredictable for real time, a pure stack solution is
predictable)


You can always implement your own buffering system and pass it in - that's the 
point, it's under user control.




Re: RFC: std.json successor

2014-08-22 Thread Ola Fosheim Gr via Digitalmars-d

On Saturday, 23 August 2014 at 06:41:11 UTC, Walter Bright wrote:
Scopebuffer is extensively used in Warp, and works very well. 
The "hole" in the stack is not a significant problem.


Well, on a webserver you don't want to push out the caches for no 
good reason.


You can always implement your own buffering system and pass it 
in - that's the point, it's under user control.


My point is that you need compiler support to get good buffering 
options on the stack. Something like an @alloca_inline:


auto buffer = @alloca_inline getstuff();
process(buffer);

I think all memory allocation should be under compiler control, 
the library solutions are bound to be suboptimal, i.e. slower.


Re: RFC: std.json successor

2014-08-23 Thread Andrej Mitrovic via Digitalmars-d
On 8/22/14, Sönke Ludwig wrote:
> Hmmm, but it *is* a string. Isn't the problem more the use of with in
> this case?

Yeah, maybe so. I thought for a second it was a tuple, but then I saw
the square brackets and was left scratching my head. :)



Re: RFC: std.json successor

2014-08-23 Thread Sönke Ludwig via Digitalmars-d

Am 23.08.2014 03:05, schrieb Walter Bright:

On 8/22/2014 2:27 PM, Sönke Ludwig wrote:

Am 22.08.2014 20:08, schrieb Walter Bright:

1. There's no mention of what will happen if it is passed malformed JSON
strings. I presume an exception is thrown. Exceptions are both slow and
consume GC memory. I suggest an alternative would be to emit an "Error"
token instead; this would be much like how the UTF decoding algorithms
emit a "replacement char" for invalid UTF sequences.

The latest version now features a LexOptions.noThrow option which
causes an
error token to be emitted instead. After popping the error token, the
range is
always empty.


Having a nothrow option may prevent the functions from being attributed
as "nothrow".


It's a compile time option, so that shouldn't be an issue. There is also 
just a single "throw" statement in the source, so it's easy to isolate.




Re: RFC: std.json successor

2014-08-23 Thread Sönke Ludwig via Digitalmars-d

Am 23.08.2014 04:23, schrieb deadalnix:

First, thank you for your work. std.json is horrible to use right now, so
a replacement is more than welcome.

I haven't played with your code yet, so I may be asking for something
that already exists, but did you have a look at jsvar by Adam?

You can find it here:
https://github.com/adamdruppe/arsd/blob/master/jsvar.d

One of the big pains when one works with a format like JSON is that you
go from the untyped world to the typed world (the same problem occurs
with XML and various config formats as well).

I think Adam got the right balance in jsvar. It behaves closely enough to
JavaScript that it is convenient to manipulate, while removing the most
dangerous behavior (concatenation is still done using ~ and not + as in JS).

If that is not already the case, I'd love for the elements I get out of
my JSON to behave that way. If you can do that, you have a user.


Setting the issue of opDispatch aside, one of the goals was to use 
Algebraic to store values. It is probably not completely as flexible as 
jsvar, but still transparently enables a lot of operations (with those 
pull requests merged at least). But it has another big advantage, which 
is that we can later define other types based on Algebraic, such as 
BSONValue, and those can be transparently runtime converted between each 
other in a generic way. A special case type on the other hand produces 
nasty dependencies between the formats.


Main issues of using opDispatch:

 - Prone to bugs where a normal field/method of the JSONValue struct is 
accessed instead of a JSON field
 - On top of that the var.field syntax gives the wrong impression that 
you are working with static typing, while var["field"] makes it clear 
that runtime indexing is going on
 - Every interface change of JSONValue would be a silent breaking 
change, because the whole string domain is used up for opDispatch
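The first pitfall can be shown with a toy type (my own illustration, not jsvar's actual code):

```d
struct Var
{
    string[string] fields;

    // Forwards unknown member accesses to the JSON-style fields...
    string opDispatch(string name)() { return fields[name]; }

    // ...but real members always win the lookup.
    size_t length() { return fields.length; }
}

unittest
{
    Var v;
    v.fields["name"] = "Peter";
    v.fields["length"] = "a json value";

    assert(v.name == "Peter");  // opDispatch kicks in: `name` is no real member
    assert(v.length == 2);      // the struct method wins, NOT the JSON field
    assert(v.fields["length"] == "a json value"); // needs explicit indexing
}
```

With `v["field"]`-style indexing, the "length" entry above would have been reachable uniformly, which is the second point in the list.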




Re: RFC: std.json successor

2014-08-23 Thread w0rp via Digitalmars-d

On Saturday, 23 August 2014 at 09:22:01 UTC, Sönke Ludwig wrote:

Main issues of using opDispatch:

 - Prone to bugs where a normal field/method of the JSONValue struct is 
accessed instead of a JSON field
 - On top of that the var.field syntax gives the wrong impression that 
you are working with static typing, while var["field"] makes it clear 
that runtime indexing is going on
 - Every interface change of JSONValue would be a silent breaking 
change, because the whole string domain is used up for opDispatch


I have seen similar issues to these with SimpleXML in PHP. Using 
opDispatch to match all possible names except a few doesn't work so 
well.

I'm not sure if you've changed it already, but I agree with the earlier 
comment about changing the flag for pretty printing from a boolean to 
an enum value. Booleans in interfaces are one of my pet peeves.


Re: RFC: std.json successor

2014-08-23 Thread Sönke Ludwig via Digitalmars-d

Am 23.08.2014 14:19, schrieb w0rp:

I'm not sure if you've changed it already, but I agree with the earlier
comment about changing the flag for pretty printing from a boolean to an
enum value. Booleans in interfaces is one of my pet peeves.


It's split into two separate functions now. Having to type out a full 
enum value I guess would be too distracting in this case, since they 
will be pretty frequently used.


Re: RFC: std.json successor

2014-08-23 Thread Sönke Ludwig via Digitalmars-d

Am 22.08.2014 20:08, schrieb Walter Bright:

(...)
2. The escape sequenced strings presumably consume GC memory. This will
be a problem for high performance code. I suggest either leaving them
undecoded in the token stream, and letting higher level code decide what
to do about them, or provide a hook that the user can override with his
own allocation scheme.

If we don't make it possible to use std.json without invoking the GC, I
believe the module will fail in the long term.


I've added two new types now to abstract away how strings and numbers 
are represented in memory. For string literals this means that for input 
types "string" and "immutable(ubyte)[]" they will always be stored as 
slices to the input buffer. JSONValue has a .rawValue property to access 
them, as well as an "alias this"ed .value property that transparently 
unescapes.


At that place it would also be easy to provide a method that takes an 
arbitrary output range to unescape without allocations.


Documentation and code are both updated (also added a note about 
exception behavior).


Re: RFC: std.json sucessor

2014-08-23 Thread Sönke Ludwig via Digitalmars-d

On 22.08.2014 21:00, "Marc Schütz" wrote:

On Friday, 22 August 2014 at 18:08:34 UTC, Sönke Ludwig wrote:

On 22.08.2014 19:57, "Marc Schütz" wrote:

The easiest and cleanest way would be to add a function in
std.data.json:

auto parse(Target, Source)(Source input)
if(is(Target == JSONValue))
{
return ...;
}

The various overloads of `std.conv.parse` already have mutually
exclusive template constraints, they will not collide with our function.


Okay, for parse that may work, but what about to!()?


What's the problem with to!()?


to!() definitely doesn't have a template constraint that excludes 
JSONValue. Instead, it will convert any struct type that doesn't define 
toString() to a D-like representation.


Re: RFC: std.json sucessor

2014-08-23 Thread via Digitalmars-d

On Saturday, 23 August 2014 at 16:49:23 UTC, Sönke Ludwig wrote:

On 22.08.2014 21:00, "Marc Schütz" wrote:

On Friday, 22 August 2014 at 18:08:34 UTC, Sönke Ludwig wrote:
On 22.08.2014 19:57, "Marc Schütz" wrote:

The easiest and cleanest way would be to add a function in
std.data.json:

   auto parse(Target, Source)(Source input)
   if(is(Target == JSONValue))
   {
   return ...;
   }

The various overloads of `std.conv.parse` already have 
mutually
exclusive template constraints, they will not collide with 
our function.


Okay, for parse that may work, but what about to!()?


What's the problem with to!()?


to!() definitely doesn't have a template constraint that 
excludes JSONValue. Instead, it will convert any struct type 
that doesn't define toString() to a D-like representation.


For converting a JSONValue to a different type, JSONValue can 
implement `opCast`, which is the regular interface that 
std.conv.to uses if it's available.


For converting something _to_ a JSONValue, std.conv.to will 
simply create an instance of it by calling the constructor.
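A small sketch of both std.conv.to hooks mentioned above, using invented temperature types rather than JSONValue (std.conv is documented to try opCast and matching constructors for user-defined types):

```d
import std.conv : to;

struct Fahrenheit { double degrees; }

struct Celsius {
    double degrees;
    this(double d) { degrees = d; }

    // std.conv.to!Fahrenheit goes through this opCast when it is present
    T opCast(T : Fahrenheit)() const {
        return Fahrenheit(degrees * 9.0 / 5.0 + 32.0);
    }
}

void main() {
    // converting _to_ the type uses the constructor:
    assert(21.0.to!Celsius.degrees == 21.0);
    // converting _from_ the type uses opCast:
    assert(Celsius(100.0).to!Fahrenheit.degrees == 212.0);
}
```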


Re: RFC: std.json sucessor

2014-08-23 Thread Sönke Ludwig via Digitalmars-d

On 23.08.2014 19:25, "Marc Schütz" wrote:

On Saturday, 23 August 2014 at 16:49:23 UTC, Sönke Ludwig wrote:

On 22.08.2014 21:00, "Marc Schütz" wrote:

On Friday, 22 August 2014 at 18:08:34 UTC, Sönke Ludwig wrote:

On 22.08.2014 19:57, "Marc Schütz" wrote:

The easiest and cleanest way would be to add a function in
std.data.json:

   auto parse(Target, Source)(Source input)
   if(is(Target == JSONValue))
   {
   return ...;
   }

The various overloads of `std.conv.parse` already have mutually
exclusive template constraints, they will not collide with our
function.


Okay, for parse that may work, but what about to!()?


What's the problem with to!()?


to!() definitely doesn't have a template constraint that excludes
JSONValue. Instead, it will convert any struct type that doesn't
define toString() to a D-like representation.


For converting a JSONValue to a different type, JSONValue can implement
`opCast`, which is the regular interface that std.conv.to uses if it's
available.

For converting something _to_ a JSONValue, std.conv.to will simply
create an instance of it by calling the constructor.


That would just introduce the aforementioned dependency cycle between 
JSONValue, the parser and the lexer. Possible, but not particularly pretty. Also, 
using the JSONValue constructor to parse an input string would 
contradict the intuitive behavior to just store the string value.


Re: RFC: std.json sucessor

2014-08-23 Thread Walter Bright via Digitalmars-d

On 8/23/2014 9:36 AM, Sönke Ludwig wrote:

input types "string" and "immutable(ubyte)[]"


Why the immutable(ubyte)[] ?


Re: RFC: std.json sucessor

2014-08-23 Thread Sönke Ludwig via Digitalmars-d

On 23.08.2014 19:38, Walter Bright wrote:

On 8/23/2014 9:36 AM, Sönke Ludwig wrote:

input types "string" and "immutable(ubyte)[]"


Why the immutable(ubyte)[] ?


I've adopted that basically from Andrei's module. The idea is to allow 
processing data with arbitrary character encoding. However, the output 
will always be Unicode and JSON is defined to be encoded as Unicode, 
too, so that could probably be dropped...


Re: RFC: std.json sucessor

2014-08-23 Thread Walter Bright via Digitalmars-d

On 8/23/2014 10:42 AM, Sönke Ludwig wrote:

On 23.08.2014 19:38, Walter Bright wrote:

On 8/23/2014 9:36 AM, Sönke Ludwig wrote:

input types "string" and "immutable(ubyte)[]"


Why the immutable(ubyte)[] ?


I've adopted that basically from Andrei's module. The idea is to allow
processing data with arbitrary character encoding. However, the output will
always be Unicode and JSON is defined to be encoded as Unicode, too, so that
could probably be dropped...


I feel that non-UTF encodings should be handled by adapter algorithms, not 
embedded into the JSON lexer, so yes, I'd drop that.
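For instance, a Latin-1 adapter is essentially a one-liner over ranges (a sketch, not part of the proposed module):

```d
import std.algorithm.iteration : map;
import std.array : array;
import std.conv : to;

// Latin-1 maps 1:1 onto the first 256 Unicode code points, so widening
// each byte to dchar is already a complete transcoding step
auto latin1Decoded(const(ubyte)[] bytes) {
    return bytes.map!(b => cast(dchar) b);
}

void main() {
    const(ubyte)[] raw = [0x22, 0xE4, 0x22];  // "ä" in Latin-1
    assert(latin1Decoded(raw).array.to!string == "\"ä\"");
}
```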


Re: RFC: std.json sucessor

2014-08-23 Thread via Digitalmars-d

On Saturday, 23 August 2014 at 17:32:01 UTC, Sönke Ludwig wrote:

On 23.08.2014 19:25, "Marc Schütz" wrote:
On Saturday, 23 August 2014 at 16:49:23 UTC, Sönke Ludwig 
wrote:
On 22.08.2014 21:00, "Marc Schütz" wrote:
On Friday, 22 August 2014 at 18:08:34 UTC, Sönke Ludwig 
wrote:
On 22.08.2014 19:57, "Marc Schütz" wrote:

The easiest and cleanest way would be to add a function in
std.data.json:

  auto parse(Target, Source)(Source input)
  if(is(Target == JSONValue))
  {
  return ...;
  }

The various overloads of `std.conv.parse` already have 
mutually
exclusive template constraints, they will not collide with 
our

function.


Okay, for parse that may work, but what about to!()?


What's the problem with to!()?


to!() definitely doesn't have a template constraint that 
excludes
JSONValue. Instead, it will convert any struct type that 
doesn't

define toString() to a D-like representation.


For converting a JSONValue to a different type, JSONValue can 
implement
`opCast`, which is the regular interface that std.conv.to uses 
if it's

available.

For converting something _to_ a JSONValue, std.conv.to will 
simply

create an instance of it by calling the constructor.


That would just introduce the said dependency cycle between 
JSONValue, the parser and the lexer. Possible, but not 
particularly pretty. Also, using the JSONValue constructor to 
parse an input string would contradict the intuitive behavior 
to just store the string value.


That's what I expect it to do anyway. For parsing, there are 
already other functions. "mystring".to!JSONValue should just wrap 
"mystring".


Re: RFC: std.json sucessor

2014-08-23 Thread Sönke Ludwig via Digitalmars-d

On 23.08.2014 20:31, "Marc Schütz" wrote:

On Saturday, 23 August 2014 at 17:32:01 UTC, Sönke Ludwig wrote:

On 23.08.2014 19:25, "Marc Schütz" wrote:

On Saturday, 23 August 2014 at 16:49:23 UTC, Sönke Ludwig wrote:

On 22.08.2014 21:00, "Marc Schütz" wrote:

On Friday, 22 August 2014 at 18:08:34 UTC, Sönke Ludwig wrote:

On 22.08.2014 19:57, "Marc Schütz" wrote:

The easiest and cleanest way would be to add a function in
std.data.json:

  auto parse(Target, Source)(Source input)
  if(is(Target == JSONValue))
  {
  return ...;
  }

The various overloads of `std.conv.parse` already have mutually
exclusive template constraints, they will not collide with our
function.


Okay, for parse that may work, but what about to!()?


What's the problem with to!()?


to!() definitely doesn't have a template constraint that excludes
JSONValue. Instead, it will convert any struct type that doesn't
define toString() to a D-like representation.


For converting a JSONValue to a different type, JSONValue can implement
`opCast`, which is the regular interface that std.conv.to uses if it's
available.

For converting something _to_ a JSONValue, std.conv.to will simply
create an instance of it by calling the constructor.


That would just introduce the said dependency cycle between JSONValue,
the parser and the lexer. Possible, but not particularly pretty. Also,
using the JSONValue constructor to parse an input string would
contradict the intuitive behavior to just store the string value.


That's what I expect it to do anyway. For parsing, there are already
other functions. "mystring".to!JSONValue should just wrap "mystring".


Probably, but then to!() is inconsistent with parse!(). Usually they are 
both the same apart from how the tail of the input string is handled.
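The difference in tail handling is easy to show with the existing std.conv functions:

```d
import std.conv : parse, to, ConvException;
import std.exception : assertThrown;

void main() {
    string s = "123abc";
    // parse consumes what it can and leaves the tail in place:
    assert(parse!int(s) == 123);
    assert(s == "abc");
    // to requires the whole input to convert, otherwise it throws:
    assertThrown!ConvException("123abc".to!int);
}
```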


Re: RFC: std.json sucessor

2014-08-23 Thread Brad Roberts via Digitalmars-d

On 8/23/2014 10:46 AM, Walter Bright via Digitalmars-d wrote:

On 8/23/2014 10:42 AM, Sönke Ludwig wrote:

On 23.08.2014 19:38, Walter Bright wrote:

On 8/23/2014 9:36 AM, Sönke Ludwig wrote:

input types "string" and "immutable(ubyte)[]"


Why the immutable(ubyte)[] ?


I've adopted that basically from Andrei's module. The idea is to allow
processing data with arbitrary character encoding. However, the output
will
always be Unicode and JSON is defined to be encoded as Unicode, too,
so that
could probably be dropped...


I feel that non-UTF encodings should be handled by adapter algorithms,
not embedded into the JSON lexer, so yes, I'd drop that.


For performance purposes, determining encoding during lexing is useful. 
 You can avoid any conversion costs when you know that the original 
string is ascii or utf-8 or other.  The cost during lexing is 
essentially zero.  The cost of storing that state might be a concern, or 
it might be free in otherwise unused padding space.  The cost of 
re-scanning strings that can be avoided is non-trivial.


My past experience with this was in an http parser, where there's even 
more complex logic than json parsing, but the concepts still apply.


Re: RFC: std.json sucessor

2014-08-23 Thread via Digitalmars-d
On Saturday, 23 August 2014 at 19:01:13 UTC, Brad Roberts via 
Digitalmars-d wrote:
original string is ascii or utf-8 or other.  The cost during 
lexing is essentially zero.


I am not so sure when it comes to SIMD lexing. I think the 
specified behaviour should be done in a way which encourage later 
optimizations.


Re: RFC: std.json sucessor

2014-08-23 Thread via Digitalmars-d

Some baselines for performance:

https://github.com/mloskot/json_benchmark

http://chadaustin.me/2013/01/json-parser-benchmarking/


Re: RFC: std.json sucessor

2014-08-23 Thread deadalnix via Digitalmars-d

On Saturday, 23 August 2014 at 09:22:01 UTC, Sönke Ludwig wrote:

Main issues of using opDispatch:

 - Prone to bugs where a normal field/method of the JSONValue 
struct is accessed instead of a JSON field
 - On top of that the var.field syntax gives the wrong 
impression that you are working with static typing, while 
var["field"] makes it clear that runtime indexing is going on
 - Every interface change of JSONValue would be a silent 
breaking change, because the whole string domain is used up for 
opDispatch


Yes, I don't mind missing that one. It looks like a false good 
idea.


Re: RFC: std.json sucessor

2014-08-23 Thread Andrei Alexandrescu via Digitalmars-d

On 8/23/14, 10:46 AM, Walter Bright wrote:

On 8/23/2014 10:42 AM, Sönke Ludwig wrote:

On 23.08.2014 19:38, Walter Bright wrote:

On 8/23/2014 9:36 AM, Sönke Ludwig wrote:

input types "string" and "immutable(ubyte)[]"


Why the immutable(ubyte)[] ?


I've adopted that basically from Andrei's module. The idea is to allow
processing data with arbitrary character encoding. However, the output
will
always be Unicode and JSON is defined to be encoded as Unicode, too,
so that
could probably be dropped...


I feel that non-UTF encodings should be handled by adapter algorithms,
not embedded into the JSON lexer, so yes, I'd drop that.


I think accepting ubyte is a good idea. It means "got this stream of 
bytes off of the wire and it hasn't been validated as a UTF string". It 
also means (which is true) that the lexer does enough validation to 
constrain arbitrary bytes into text, and saves caller from either a 
check (expensive) or a cast (unpleasant).


Reality is the JSON lexer takes ubytes and produces tokens.


Andrei



Re: RFC: std.json sucessor

2014-08-23 Thread Walter Bright via Digitalmars-d

On 8/23/2014 2:36 PM, Andrei Alexandrescu wrote:

I think accepting ubyte is a good idea. It means "got this stream of bytes off
of the wire and it hasn't been validated as a UTF string". It also means (which
is true) that the lexer does enough validation to constrain arbitrary bytes into
text, and saves caller from either a check (expensive) or a cast (unpleasant).

Reality is the JSON lexer takes ubytes and produces tokens.


Using an adapter still makes sense, because:

1. The adapter should be just as fast as wiring it in internally

2. The adapter then becomes a general purpose tool that can be used elsewhere 
where the encoding is unknown or suspect


3. The scope of the adapter is small, so it is easier to get it right, and being 
reusable means every user benefits from it


4. If we can't make adapters efficient, we've failed at the ranges+algorithms 
model, and I'm very unwilling to fail at that





Re: RFC: std.json sucessor

2014-08-23 Thread Walter Bright via Digitalmars-d

On 8/23/2014 12:00 PM, Brad Roberts via Digitalmars-d wrote:

On 8/23/2014 10:46 AM, Walter Bright via Digitalmars-d wrote:

I feel that non-UTF encodings should be handled by adapter algorithms,
not embedded into the JSON lexer, so yes, I'd drop that.


For performance purposes, determining encoding during lexing is useful.


I'm not convinced that using an adapter algorithm won't be just as fast.



Re: RFC: std.json sucessor

2014-08-23 Thread Andrei Alexandrescu via Digitalmars-d

On 8/23/14, 3:24 PM, Walter Bright wrote:

On 8/23/2014 2:36 PM, Andrei Alexandrescu wrote:

I think accepting ubyte is a good idea. It means "got this stream of
bytes off
of the wire and it hasn't been validated as a UTF string". It also
means (which
is true) that the lexer does enough validation to constrain arbitrary
bytes into
text, and saves caller from either a check (expensive) or a cast
(unpleasant).

Reality is the JSON lexer takes ubytes and produces tokens.


Using an adapter still makes sense, because:

1. The adapter should be just as fast as wiring it in internally

2. The adapter then becomes a general purpose tool that can be used
elsewhere where the encoding is unknown or suspect

3. The scope of the adapter is small, so it is easier to get it right,
and being reusable means every user benefits from it

4. If we can't make adapters efficient, we've failed at the
ranges+algorithms model, and I'm very unwilling to fail at that


An adapter would solve the wrong problem here. There's nothing to adapt 
from and to.


An adapter would be good if e.g. the stream uses UTF-16 or some Windows 
encoding. Bytes are the natural input for a JSON parser.



Andrei




Re: RFC: std.json sucessor

2014-08-23 Thread Brad Roberts via Digitalmars-d

On 8/23/2014 3:20 PM, Walter Bright via Digitalmars-d wrote:

On 8/23/2014 12:00 PM, Brad Roberts via Digitalmars-d wrote:

On 8/23/2014 10:46 AM, Walter Bright via Digitalmars-d wrote:

I feel that non-UTF encodings should be handled by adapter algorithms,
not embedded into the JSON lexer, so yes, I'd drop that.


For performance purposes, determining encoding during lexing is useful.


I'm not convinced that using an adapter algorithm won't be just as fast.


Consider your own talks on optimizing the existing dmd lexer.  In those 
talks you've talked about the evils of additional processing on every 
byte.  That's what you're talking about here.  While it's possible that 
the inliner and other optimizer steps might be able to integrate the two 
phases and remove some overhead, I'll believe it when I see the 
resulting assembly code.


Re: RFC: std.json sucessor

2014-08-25 Thread Sönke Ludwig via Digitalmars-d
I've added support (compile time option [1]) for long and BigInt in the 
lexer (and parser), see [2]. JSONValue currently still only stores 
double for numbers. There are two options for extending JSONValue:


1. Add long and BigInt to the set of supported types for JSONValue. This 
preserves all features of Algebraic and would later still allow 
transparent conversion to other similar value types (e.g. BSONValue). On 
the other hand it would be necessary to always check the actual type 
before accessing a number, or the Algebraic would throw.


2. Instead of double, store a JSONNumber in the Algebraic. This enables 
all the transparent conversions of JSONNumber and would thus be more 
convenient, but blocks the way for possible automatic conversions in the 
future.


I'm leaning towards 1, because allowing generic conversion between 
different JSONValue-like types was one of my prime goals for the new module.
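A sketch of option 1 with plain std.variant.Algebraic (the alias name is invented for illustration):

```d
import std.bigint : BigInt;
import std.variant : Algebraic;

// option 1: widen the set of number types stored in the value
alias Number = Algebraic!(double, long, BigInt);

void main() {
    Number n = 42L;
    // the stored type must be checked before access, or get!double throws
    if (n.type == typeid(long))
        assert(n.get!long == 42);
    n = BigInt("123456789123456789123456789");
    assert(n.type == typeid(BigInt));
}
```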


[1]: 
http://s-ludwig.github.io/std_data_json/stdx/data/json/lexer/LexOptions.html
[2]: 
http://s-ludwig.github.io/std_data_json/stdx/data/json/lexer/JSONNumber.html


Re: RFC: std.json sucessor

2014-08-25 Thread via Digitalmars-d

On Monday, 25 August 2014 at 11:30:15 UTC, Sönke Ludwig wrote:
I've added support (compile time option [1]) for long and 
BigInt in the lexer (and parser), see [2]. JSONValue currently 
still only stores double for numbers.


It can be very useful to have a base 10 exponent representation 
in certain situations where you need to have the exact same 
results in two systems (like a third party ERP server versus a 
client side application). Base 2 exponents are tricky (incorrect) 
when you read ASCII.


E.g. I have resorted to using Decimal in Python just to avoid the 
weird round off issues when calculating prices where the price is 
given in fractions of the order unit.


Perhaps a marginal problem, but could be important for some 
serious application areas where you need to integrate D with 
existing systems (for which you don't have the source code).


Re: RFC: std.json sucessor

2014-08-25 Thread Don via Digitalmars-d

On Thursday, 21 August 2014 at 22:35:18 UTC, Sönke Ludwig wrote:
Following up on the recent "std.jgrandson" thread [1], I've 
picked up the work (a lot earlier than anticipated) and 
finished a first version of a loose blend of said 
std.jgrandson, vibe.data.json and some changes that I had 
planned for vibe.data.json for a while. I'm quite pleased by 
the results so far, although without a serialization framework 
it still misses a very important building block.


Code: https://github.com/s-ludwig/std_data_json
Docs: http://s-ludwig.github.io/std_data_json/
DUB: http://code.dlang.org/packages/std_data_json

The new code contains:
 - Lazy lexer in the form of a token input range (using slices 
of the

   input if possible)
 - Lazy streaming parser (StAX style) in the form of a node 
input range

 - Eager DOM style parser returning a JSONValue
 - Range based JSON string generator taking either a token 
range, a

   node range, or a JSONValue
 - Opt-out location tracking (line/column) for tokens, nodes 
and values
 - No opDispatch() for JSONValue - this has shown to do more 
harm than

   good in vibe.data.json

The DOM style JSONValue type is based on std.variant.Algebraic. 
This currently has a few usability issues that can be solved by 
upgrading/fixing Algebraic:


 - Operator overloading only works sporadically
 - No "tag" enum is supported, so that switch()ing on the type 
of a

   value doesn't work and an if-else cascade is required
 - Operations and conversions between different Algebraic types 
is not
   conveniently supported, which gets important when other 
similar

   formats get supported (e.g. BSON)

Assuming that those points are solved, I'd like to get some 
early feedback before going for an official review. One open 
issue is how to handle unescaping of string literals. Currently 
it always unescapes immediately, which is more efficient for 
general input ranges when the unescaped result is needed, but 
less efficient for string inputs when the unescaped result is 
not needed. Maybe a flag could be used to conditionally switch 
behavior depending on the input range type.


Destroy away! ;)

[1]: http://forum.dlang.org/thread/lrknjl$co7$1...@digitalmars.com



One missing feature (which is also missing from the existing 
std.json) is support for NaN and Infinity as JSON values. 
Although they are not part of the formal JSON spec (which is a 
ridiculous omission, the argument given for excluding them is 
fallacious), they do get generated if you use Javascript's 
toString to create the JSON. Many JSON libraries (eg Google's) 
also generate them, so they are frequently encountered in 
practice. So a JSON parser should at least be able to lex them.


ie this should be parsable:

{"foo": NaN, "bar": Infinity, "baz": -Infinity}

You should also put tests in for what happens when you pass NaN 
or infinity to toJSON. It shouldn't silently generate invalid 
JSON.
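A hedged sketch of what lexing these literals could look like (hypothetical helper; a real lexer would presumably hide this behind an opt-in flag):

```d
import std.algorithm.searching : startsWith;

// recognize the non-standard literals at the current input position
bool tryLexSpecialFloat(ref string input, out double value) {
    if (input.startsWith("NaN"))       { value = double.nan;       input = input[3 .. $];  return true; }
    if (input.startsWith("Infinity"))  { value = double.infinity;  input = input[8 .. $];  return true; }
    if (input.startsWith("-Infinity")) { value = -double.infinity; input = input[9 .. $];  return true; }
    return false;
}

void main() {
    string s = "-Infinity,";
    double v;
    assert(tryLexSpecialFloat(s, v));
    assert(v == -double.infinity);
    assert(s == ",");  // the lexer continues with the remaining input
}
```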








Re: RFC: std.json sucessor

2014-08-25 Thread via Digitalmars-d

On Monday, 25 August 2014 at 13:07:08 UTC, Don wrote:

practice. So a JSON parser should at least be able to lex them.

ie this should be parsable:

{"foo": NaN, "bar": Infinity, "baz": -Infinity}

You should also put tests in for what happens when you pass NaN 
or infinity to toJSON. It shouldn't silently generate invalid 
JSON.


I believe you are allowed to use very high exponents, though. 
Like: 1E999 . So you need to decide if those should be mapped to 
+Infinity or to the max value…


NaN also comes in two forms with differing semantics: signalling 
(NaNs) and quiet (NaN). Quiet NaN is used for 0/0 and 
sqrt(-1), but signalling NaNs is used for illegal values and failure.


For some reason D does not seem to support this aspect of 
IEEE754? I cannot find ".nans" listed on the page 
http://dlang.org/property.html


The distinction is important when you do conditional branching. 
With signalling NaNs you might not be able to figure out which branch 
to take, since you might have missed out on a real value; with quiet 
NaN you got the value (which is known not to be real) and you might 
be able to branch.


Re: RFC: std.json sucessor

2014-08-25 Thread Sönke Ludwig via Digitalmars-d
On 25.08.2014 14:12, "Ola Fosheim Grøstad" wrote:

On Monday, 25 August 2014 at 11:30:15 UTC, Sönke Ludwig wrote:

I've added support (compile time option [1]) for long and BigInt in
the lexer (and parser), see [2]. JSONValue currently still only stores
double for numbers.


It can be very useful to have a base 10 exponent representation in
certain situations where you need to have the exact same results in two
systems (like a third party ERP server versus a client side
application). Base 2 exponents are tricky (incorrect) when you read ascii.

E.g. I have resorted to using Decimal in Python just to avoid the weird
round off issues when calculating prices where the price is given in
fractions of the order unit.

Perhaps a marginal problem, but could be important for some serious
application areas where you need to integrate D with existing systems
(for which you don't have the source code).


In fact, I've already prepared the code for that, but commented it out 
for now, because I wanted to have an efficient algorithm for converting 
double to Decimal and because we should probably first add a Decimal 
type to Phobos instead of adding it to the JSON module.


Re: RFC: std.json sucessor

2014-08-25 Thread Sönke Ludwig via Digitalmars-d

On 25.08.2014 15:07, Don wrote:

On Thursday, 21 August 2014 at 22:35:18 UTC, Sönke Ludwig wrote:

Following up on the recent "std.jgrandson" thread [1], I've picked up
the work (a lot earlier than anticipated) and finished a first version
of a loose blend of said std.jgrandson, vibe.data.json and some
changes that I had planned for vibe.data.json for a while. I'm quite
pleased by the results so far, although without a serialization
framework it still misses a very important building block.

Code: https://github.com/s-ludwig/std_data_json
Docs: http://s-ludwig.github.io/std_data_json/
DUB: http://code.dlang.org/packages/std_data_json

The new code contains:
 - Lazy lexer in the form of a token input range (using slices of the
   input if possible)
 - Lazy streaming parser (StAX style) in the form of a node input range
 - Eager DOM style parser returning a JSONValue
 - Range based JSON string generator taking either a token range, a
   node range, or a JSONValue
 - Opt-out location tracking (line/column) for tokens, nodes and values
 - No opDispatch() for JSONValue - this has shown to do more harm than
   good in vibe.data.json

The DOM style JSONValue type is based on std.variant.Algebraic. This
currently has a few usability issues that can be solved by
upgrading/fixing Algebraic:

 - Operator overloading only works sporadically
 - No "tag" enum is supported, so that switch()ing on the type of a
   value doesn't work and an if-else cascade is required
 - Operations and conversions between different Algebraic types is not
   conveniently supported, which gets important when other similar
   formats get supported (e.g. BSON)

Assuming that those points are solved, I'd like to get some early
feedback before going for an official review. One open issue is how to
handle unescaping of string literals. Currently it always unescapes
immediately, which is more efficient for general input ranges when the
unescaped result is needed, but less efficient for string inputs when
the unescaped result is not needed. Maybe a flag could be used to
conditionally switch behavior depending on the input range type.

Destroy away! ;)

[1]: http://forum.dlang.org/thread/lrknjl$co7$1...@digitalmars.com



One missing feature (which is also missing from the existing std.json)
is support for NaN and Infinity as JSON values. Although they are not
part of the formal JSON spec (which is a ridiculous omission, the
argument given for excluding them is fallacious), they do get generated
if you use Javascript's toString to create the JSON. Many JSON libraries
(eg Google's) also generate them, so they are frequently encountered in
practice. So a JSON parser should at least be able to lex them.

ie this should be parsable:

{"foo": NaN, "bar": Infinity, "baz": -Infinity}


This would probably be best added as another (CT) optional feature. I think 
the default should strictly adhere to the JSON specification, though.




You should also put tests in for what happens when you pass NaN or
infinity to toJSON. It shouldn't silently generate invalid JSON.


Good point. The current solution to just use formattedWrite("%.16g") is 
also not ideal.


Re: RFC: std.json sucessor

2014-08-25 Thread Sönke Ludwig via Digitalmars-d

On 25.08.2014 16:04, Sönke Ludwig wrote:

On 25.08.2014 15:07, Don wrote:

On Thursday, 21 August 2014 at 22:35:18 UTC, Sönke Ludwig wrote:

Following up on the recent "std.jgrandson" thread [1], I've picked up
the work (a lot earlier than anticipated) and finished a first version
of a loose blend of said std.jgrandson, vibe.data.json and some
changes that I had planned for vibe.data.json for a while. I'm quite
pleased by the results so far, although without a serialization
framework it still misses a very important building block.

Code: https://github.com/s-ludwig/std_data_json
Docs: http://s-ludwig.github.io/std_data_json/
DUB: http://code.dlang.org/packages/std_data_json

The new code contains:
 - Lazy lexer in the form of a token input range (using slices of the
   input if possible)
 - Lazy streaming parser (StAX style) in the form of a node input range
 - Eager DOM style parser returning a JSONValue
 - Range based JSON string generator taking either a token range, a
   node range, or a JSONValue
 - Opt-out location tracking (line/column) for tokens, nodes and values
 - No opDispatch() for JSONValue - this has shown to do more harm than
   good in vibe.data.json

The DOM style JSONValue type is based on std.variant.Algebraic. This
currently has a few usability issues that can be solved by
upgrading/fixing Algebraic:

 - Operator overloading only works sporadically
 - No "tag" enum is supported, so that switch()ing on the type of a
   value doesn't work and an if-else cascade is required
 - Operations and conversions between different Algebraic types is not
   conveniently supported, which gets important when other similar
   formats get supported (e.g. BSON)

Assuming that those points are solved, I'd like to get some early
feedback before going for an official review. One open issue is how to
handle unescaping of string literals. Currently it always unescapes
immediately, which is more efficient for general input ranges when the
unescaped result is needed, but less efficient for string inputs when
the unescaped result is not needed. Maybe a flag could be used to
conditionally switch behavior depending on the input range type.

Destroy away! ;)

[1]: http://forum.dlang.org/thread/lrknjl$co7$1...@digitalmars.com



One missing feature (which is also missing from the existing std.json)
is support for NaN and Infinity as JSON values. Although they are not
part of the formal JSON spec (which is a ridiculous omission, the
argument given for excluding them is fallacious), they do get generated
if you use Javascript's toString to create the JSON. Many JSON libraries
(eg Google's) also generate them, so they are frequently encountered in
practice. So a JSON parser should at least be able to lex them.

ie this should be parsable:

{"foo": NaN, "bar": Infinity, "baz": -Infinity}


This would probably be best added as another (CT) optional feature. I think
the default should strictly adhere to the JSON specification, though.


http://s-ludwig.github.io/std_data_json/stdx/data/json/lexer/LexOptions.specialFloatLiterals.html





You should also put tests in for what happens when you pass NaN or
infinity to toJSON. It shouldn't silently generate invalid JSON.


Good point. The current solution to just use formattedWrite("%.16g") is
also not ideal.


By default, floating-point special values are now output as 'null', 
according to the ECMAScript standard. Optionally, they will be emitted 
as 'NaN' and 'Infinity':


http://s-ludwig.github.io/std_data_json/stdx/data/json/generator/GeneratorOptions.specialFloatLiterals.html


Re: RFC: std.json sucessor

2014-08-25 Thread via Digitalmars-d

On Monday, 25 August 2014 at 15:34:29 UTC, Sönke Ludwig wrote:
By default, floating-point special values are now output as 
'null', according to the ECMAScript standard. Optionally, they 
will be emitted as 'NaN' and 'Infinity':


ECMAScript presumes double. I think one should base Phobos on 
language-independent standards. I suggest:


http://tools.ietf.org/html/rfc7159

For a web server it would be most useful to get an exception 
since you risk ending up with web-clients not working with no 
logging. It is better to have an exception and log an error so 
the problem can be fixed.


Re: RFC: std.json sucessor

2014-08-25 Thread via Digitalmars-d
On Monday, 25 August 2014 at 15:46:12 UTC, Ola Fosheim Grøstad 
wrote:
For a web server it would be most useful to get an exception 
since you risk ending up with web-clients not working with no 
logging. It is better to have an exception and log an error so 
the problem can be fixed.


Let me expand a bit on the difference between web clients and 
servers, assuming D is used on the server:


* Web servers have to check all input and log illegal activity. 
It is either a bug or an attack.


* Web clients don't have to check input from the server (at most 
a crypto check) and should not do double work if servers validate 
anyway.


* Web servers detect errors and send the error as a response to 
the client that displays it as a warning to the user. This is the 
uncommon case so you don't want to burden the client with it.


From this we can infer:

- It makes more sense for ECMAScript to turn illegal values into 
null since it runs on the client.


- The server needs efficient validation of input so that it can 
have faster response.


- The more integration of validation of typedness you can have in 
the parser, the better.



Thus it would be an advantage to be able to configure the 
validation done in the parser (through template mechanisms):



1. On write: throw exception on all illegal values or values that 
cannot be represented in the format. If the values are illegal 
then the client should not receive it. It could cause legal 
problems (like wrong prices).



2. On read: add the ability to configure the validation of 
typedness on many parameters:


- no nulls, no dicts, only nesting arrays etc

- predetermined key-values and automatic mapping to structs on 
exact match.


- require all leaf arrays to be uniform (array of strings, array 
of numbers)


- match a predefined grammar

etc
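One of the read-side constraints above ("root is array, values are strings") can be sketched as a post-parse check. This uses today's std.json with current Phobos enum names; `isStringArray` is a hypothetical helper, not a proposed API, and a template-integrated version would reject such input already in the parser:

```d
import std.json;

// Hedged sketch: validate that the root is an array containing only strings.
bool isStringArray(const JSONValue root)
{
    if (root.type != JSONType.array)
        return false;
    foreach (ref v; root.array)
        if (v.type != JSONType.string)
            return false;
    return true;
}

unittest
{
    assert(isStringArray(parseJSON(`["a", "b"]`)));
    assert(!isStringArray(parseJSON(`["a", 1]`)));
    assert(!isStringArray(parseJSON(`{"a": "b"}`)));
}
```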




Re: RFC: std.json sucessor

2014-08-25 Thread Walter Bright via Digitalmars-d

On 8/23/2014 6:32 PM, Brad Roberts via Digitalmars-d wrote:

I'm not convinced that using an adapter algorithm won't be just as fast.

Consider your own talks on optimizing the existing dmd lexer.  In those talks
you've talked about the evils of additional processing on every byte.  That's
what you're talking about here.  While it's possible that the inliner and other
optimizer steps might be able to integrate the two phases and remove some
overhead, I'll believe it when I see the resulting assembly code.


On the other hand, deadalnix demonstrated that the ldc optimizer was able to 
remove the extra code.


I have a reasonable faith that optimization can be improved where necessary to 
cover this.


Re: RFC: std.json sucessor

2014-08-25 Thread Walter Bright via Digitalmars-d

On 8/23/2014 3:51 PM, Andrei Alexandrescu wrote:

An adapter would solve the wrong problem here. There's nothing to adapt from and
to.

An adapter would be good if e.g. the stream uses UTF-16 or some Windows
encoding. Bytes are the natural input for a json parser.


The adaptation is to take arbitrary byte input in an unknown encoding and 
produce valid UTF.


Note that many html readers scan the bytes to see if it is ASCII, UTF, some code 
page encoding, Shift-JIS, etc., and translate accordingly. I do not see why that 
is less costly to put inside the JSON lexer than as an adapter.




Re: RFC: std.json sucessor

2014-08-25 Thread Walter Bright via Digitalmars-d
On 8/25/2014 6:23 AM, "Ola Fosheim Grøstad" wrote:

On Monday, 25 August 2014 at 13:07:08 UTC, Don wrote:

practice. So a JSON parser should at least be able to lex them.

ie this should be parsable:

{"foo": NaN, "bar": Infinity, "baz": -Infinity}

You should also put tests in for what happens when you pass NaN or infinity to
toJSON. It shouldn't silently generate invalid JSON.


I believe you are allowed to use very high exponents, though. Like: 1E999 . So
you need to decide if those should be mapped to +Infinity or to the max value…


Infinity. Mapping to max value would be a horrible bug.
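In D this falls out of IEEE 754 arithmetic: under round-to-nearest, any value beyond double.max rounds to infinity rather than saturating. A quick sketch, with `1e308 * 10.0` standing in for a literal such as 1E999:

```d
import std.math : isInfinity;

void main()
{
    // IEEE 754 double tops out near 1.8e308; overflow rounds to infinity,
    // it never saturates to double.max.
    double overflowed = 1e308 * 10.0;
    assert(isInfinity(overflowed));
    assert(overflowed == double.infinity);
    assert(overflowed != double.max);
}
```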



NaN also come in two forms with differing semantics: signalling(NaNs) and quiet
(NaN).  NaN is used for 0/0 and sqrt(-1), but NaNs is used for illegal values
and failure.

For some reason D does not seem to support this aspect of IEEE754? I cannot find
".nans" listed on the page http://dlang.org/property.html


Because I tried supporting them in C++. It doesn't work for various reasons. 
Nobody else supports them, either.




Re: RFC: std.json sucessor

2014-08-25 Thread simendsjo via Digitalmars-d
On 08/25/2014 09:35 PM, Walter Bright wrote:
> On 8/23/2014 6:32 PM, Brad Roberts via Digitalmars-d wrote:
>>> I'm not convinced that using an adapter algorithm won't be just as fast.
>> Consider your own talks on optimizing the existing dmd lexer.  In
>> those talks
>> you've talked about the evils of additional processing on every byte. 
>> That's
>> what you're talking about here.  While it's possible that the inliner
>> and other
>> optimizer steps might be able to integrate the two phases and remove some
>> overhead, I'll believe it when I see the resulting assembly code.
> 
> On the other hand, deadalnix demonstrated that the ldc optimizer was
> able to remove the extra code.
> 
> I have a reasonable faith that optimization can be improved where
> necessary to cover this.

I just happened to write a very small script yesterday and tested with
the compilers (with dub --build=release).

dmd: 2.8 MB
gdc: 3.3 MB
ldc: 0.5 MB

So ldc can remove quite a substantial amount of code in some cases.


Re: RFC: std.json sucessor

2014-08-25 Thread via Digitalmars-d

On Monday, 25 August 2014 at 19:38:05 UTC, Walter Bright wrote:
The adaptation is to take arbitrary byte input in an unknown 
encoding and produce valid UTF.


I agree.

For a restful http service the encoding should be specified in 
the http header and the input rejected if it isn't UTF 
compatible. For that use scenario you only want validation, not 
conversion. However some validation is free, like if you only 
accept numbers you could just turn off parsing of strings in the 
template…


If files are read from storage then you can reread the file if it 
fails validation on the first pass.


I wonder, in which use scenario it is that both of these 
conditions fail?


1. unspecified character-set and cannot assume UTF for JSON
2. unable to re-parse


Re: RFC: std.json sucessor

2014-08-25 Thread via Digitalmars-d

On Monday, 25 August 2014 at 19:42:03 UTC, Walter Bright wrote:

Infinity. Mapping to max value would be a horrible bug.


Yes… but then you are reading an illegal value that JSON does not 
support…


For some reason D does not seem to support this aspect of 
IEEE754? I cannot find

".nans" listed on the page http://dlang.org/property.html


Because I tried supporting them in C++. It doesn't work for 
various reasons. Nobody else supports them, either.


I haven't tested, but Python is supposed to throw on NaNs.

gcc has support for nans in their documentation:
https://gcc.gnu.org/onlinedocs/gcc/Other-Builtins.html

IBM Fortran supports it…

I think supporting signaling NaN is important for correctness.


Re: RFC: std.json sucessor

2014-08-25 Thread Sönke Ludwig via Digitalmars-d
On 25.08.2014 17:46, "Ola Fosheim Grøstad" wrote:

On Monday, 25 August 2014 at 15:34:29 UTC, Sönke Ludwig wrote:

By default, floating-point special values are now output as 'null',
according to the ECMA-script standard. Optionally, they will be
emitted as 'NaN' and 'Infinity':


ECMAScript presumes double. I think one should base Phobos on
language-independent standards. I suggest:

http://tools.ietf.org/html/rfc7159


Well, of course it's based on that RFC, did you seriously think 
something else? However, that standard has no mention of infinity or 
NaN, and since JSON is designed to be a subset of ECMA script, it's 
basically the only thing that comes close.




For a web server it would be most useful to get an exception since you
risk ending up with web-clients not working with no logging. It is
better to have an exception and log an error so the problem can be fixed.


Although you have a point there of course, it's also highly unlikely 
that those clients would work correctly if we presume that JSON 
supported infinity/NaN. So it would really be just coincidence to detect 
a bug like that.


But I generally agree, it's just that the anti-exception voices are 
pretty loud these days (including Walter's), so that I opted for a 
non-throwing solution instead. I guess it wouldn't hurt though to 
default to throwing an exception, while still providing the 
GeneratorOptions.specialFloatLiterals option to handle those values 
without exception overhead, but in a non standard-conforming way.


Re: RFC: std.json sucessor

2014-08-25 Thread via Digitalmars-d
On Monday, 25 August 2014 at 20:04:10 UTC, Ola Fosheim Grøstad 
wrote:

I think supporting signaling NaN is important for correctness.


It is defined in C++11:

http://en.cppreference.com/w/cpp/types/numeric_limits/signaling_NaN




Re: RFC: std.json sucessor

2014-08-25 Thread Sönke Ludwig via Digitalmars-d

- It makes more sense for ECMAScript to turn illegal values into null
since it runs on the client.


Like... node.js?

Sorry, just kidding.

I don't think it makes sense for clients to be less strict about such 
things, but I do agree with your assessment about being as strict as 
possible on the server. I also do think that exceptions are a perfect 
tool especially for server applications and that instead of avoiding 
them because they are slow, they should better be made fast enough to 
not be an issue.


Re: RFC: std.json sucessor

2014-08-25 Thread Sönke Ludwig via Digitalmars-d
On 25.08.2014 21:50, "Ola Fosheim Grøstad" wrote:

On Monday, 25 August 2014 at 19:38:05 UTC, Walter Bright wrote:

The adaptation is to take arbitrary byte input in an unknown encoding
and produce valid UTF.


I agree.

For a restful http service the encoding should be specified in the http
header and the input rejected if it isn't UTF compatible. For that use
scenario you only want validation, not conversion. However some
validation is free, like if you only accept numbers you could just turn
off parsing of strings in the template…

If files are read from storage then you can reread the file if it fails
validation on the first pass.

I wonder, in which use scenario it is that both of these conditions fail?

1. unspecified character-set and cannot assume UTF for JSON
2. unable to re-parse


BTW, JSON is *required* to be UTF encoded anyway as per RFC-7159, which 
is another argument for just letting the lexer assume valid UTF.
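That split of responsibilities can be sketched with the standard library: validate the whole input once up front, then let the lexer assume well-formed UTF-8 (`isValidUtf8` is a hypothetical helper, not part of any proposed API):

```d
import std.utf : validate, UTFException;

// Sketch: one up-front validation pass, so the lexer itself can assume
// valid UTF-8, as RFC 7159 requires of JSON texts.
bool isValidUtf8(const(char)[] input)
{
    try
    {
        validate(input); // throws UTFException on malformed sequences
        return true;
    }
    catch (UTFException)
    {
        return false;
    }
}

unittest
{
    assert(isValidUtf8(`{"key": "héllo"}`));
    assert(!isValidUtf8("\xff\xfe")); // not a valid UTF-8 sequence
}
```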


Re: RFC: std.json sucessor

2014-08-25 Thread via Digitalmars-d

On Monday, 25 August 2014 at 20:21:01 UTC, Sönke Ludwig wrote:
Well, of course it's based on that RFC, did you seriously think 
something else?


I made no assumptions, just responded to what you wrote :-). It 
would be reasonable in the context of vibe.d to assume the 
ECMAScript spec.


But I generally agree, it's just that the anti-exception voices 
are pretty loud these days (including Walter's), so that I 
opted for a non-throwing solution instead.


Yes, the minimum requirement is to just get "did not validate" 
directly as a single value. One can create a wrapper to get 
exceptions.


I guess it wouldn't hurt though to default to throwing an 
exception, while still providing the 
GeneratorOptions.specialFloatLiterals option to handle those 
values without exception overhead, but in a non 
standard-conforming way.


What I care most about is getting all the free validation that 
can be added with no extra cost.


That will make writing web services easier. Like if you can 
define constraints like:


- root is array, values are strings.
- root is array, second level only arrays, third level is numbers
- root is dict, all arrays contain only numbers

What is a bit annoying about generic libs is that you have no 
idea what you are getting so you have to spend time creating dull 
validation code.


But maybe StructuredJSON should be a separate library. It would 
be useful for REST services to specify the grammar and 
auto-generate both javascript and D structures to hold it along 
with validation code.


However, just turning off parsing of "true", "false", "null", 
"[", "{" etc seems like a cheap addition that also can improve 
parsing speed if the compiler can make do with two if statements 
instead of a switch.


Ola.


Re: RFC: std.json sucessor

2014-08-25 Thread Sönke Ludwig via Digitalmars-d

On 25.08.2014 22:21, Sönke Ludwig wrote:

that standard has no mention of infinity or
NaN


Sorry, to be precise, it has no suggestion of how to *handle* infinity 
or NaN.




Re: RFC: std.json sucessor

2014-08-25 Thread via Digitalmars-d

On Monday, 25 August 2014 at 20:35:32 UTC, Sönke Ludwig wrote:
BTW, JSON is *required* to be UTF encoded anyway as per 
RFC-7159, which is another argument for just letting the lexer 
assume valid UTF.


The lexer cannot assume valid UTF since the client might be a 
rogue, but it can just bail out if the lookahead isn't JSON? So 
UTF-validation is limited to strings.


You have to parse the strings because of the \u escapes of 
course, so some basic validation is unavoidable? But I guess full 
validation of string content could be another useful option along 
with "ignore escapes" for the case where you want to avoid 
decode-encode scenarios. (like for a proxy, or if you store 
pre-escaped unicode in a database)
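The unavoidable part can be shown in a few lines: decoding a single non-surrogate JSON "\uXXXX" escape into UTF-8 (`decodeUnicodeEscape` is a hypothetical helper; surrogate-pair handling is deliberately omitted):

```d
import std.conv : to;
import std.utf : encode;

// Hedged sketch: decode the four hex digits after "\u" into a code point
// and re-encode it as UTF-8. This minimal parsing is needed even when
// full input validation is switched off.
string decodeUnicodeEscape(string hex4)
{
    auto code = to!ushort(hex4, 16);
    char[4] buf;
    auto len = encode(buf, cast(dchar) code);
    return buf[0 .. len].idup;
}

unittest
{
    assert(decodeUnicodeEscape("0041") == "A");
    assert(decodeUnicodeEscape("00e9") == "é");
}
```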


Re: RFC: std.json sucessor

2014-08-25 Thread Walter Bright via Digitalmars-d
On 8/25/2014 1:21 PM, "Ola Fosheim Grøstad" wrote:

On Monday, 25 August 2014 at 20:04:10 UTC, Ola Fosheim Grøstad wrote:

I think supporting signaling NaN is important for correctness.


It is defined in C++11:

http://en.cppreference.com/w/cpp/types/numeric_limits/signaling_NaN



I didn't know that. But recall I did implement it in DMC++, and it turned out to 
simply not be useful. I'd be surprised if the new C++ support for it does 
anything worthwhile.

