[jira] [Commented] (COUCHDB-1670) Replicator crashes if numbers in checkpoint docs are expressed in scientific notation

2013-02-07 Thread Robert Newson (JIRA)

[ 
https://issues.apache.org/jira/browse/COUCHDB-1670?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13573324#comment-13573324
 ] 

Robert Newson commented on COUCHDB-1670:


list_to_integer is the thing that fails on this input fwiw.

 Replicator crashes if numbers in checkpoint docs are expressed in scientific 
 notation
 -

 Key: COUCHDB-1670
 URL: https://issues.apache.org/jira/browse/COUCHDB-1670
 Project: CouchDB
  Issue Type: Bug
  Components: Replication
Reporter: Jens Alfke

 The CouchDB 1.2 replicator process crashes with an Erlang exception when 
 parsing a checkpoint document read back from a remote database, if numbers in 
 the document were JSON-encoded in scientific notation instead of as integers. 
 This includes the properties source_last_seq, end_last_seq, start_last_seq.
 That is, the following encoding works fine:
 ..., source_last_seq: 1234567, ...
 whereas this completely-equivalent encoding causes an exception:
 ..., source_last_seq: 1.234567e+06, ...
 This issue raised its head as a result of a CouchDB-compatible engine I'm 
 writing (the Couchbase Sync Gateway) which can serve as a passive replication 
 endpoint. It's implemented in Go, and the Go JSON package has the side effect 
 of (a) parsing all JSON numbers into type 'double', and (b) encoding all 
 doubles into JSON using scientific notation if they're more than six digits 
 long. The net effect is that when CouchDB stores a checkpoint into the Sync 
 Adapter's database and then later reads it back, it barfs due to the 
 scientific notation.
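
A minimal Go sketch of the behaviour described above, and of one way an endpoint could avoid it. The helper names formatG and roundTrip are hypothetical, and the thread does not say which formatting rule a given release of Go's encoding/json applies, so the first helper calls strconv directly with the 'g' format to show the notation change. The second helper assumes the endpoint can decode with json.Decoder.UseNumber, which keeps each number's original text instead of converting it to float64.

```go
package main

import (
	"encoding/json"
	"fmt"
	"strconv"
	"strings"
)

// formatG (hypothetical helper) renders f in the shortest %g-style form.
// With this format, integers of seven or more digits switch to an exponent.
func formatG(f float64) string {
	return strconv.FormatFloat(f, 'g', -1, 64)
}

// roundTrip (hypothetical helper) decodes a JSON object and encodes it again.
// UseNumber stores numbers as json.Number, i.e. their original literal text,
// so the re-encoded document carries the same number spelling it was given.
func roundTrip(doc string) (string, error) {
	dec := json.NewDecoder(strings.NewReader(doc))
	dec.UseNumber()
	var v map[string]interface{}
	if err := dec.Decode(&v); err != nil {
		return "", err
	}
	out, err := json.Marshal(v)
	if err != nil {
		return "", err
	}
	return string(out), nil
}

func main() {
	fmt.Println(formatG(123456))  // six digits: stays in plain notation
	fmt.Println(formatG(1234567)) // seven digits: exponent notation
	out, err := roundTrip(`{"source_last_seq":1234567}`)
	fmt.Println(out, err)
}
```

This only addresses the Go side; it does not change the point that the receiving parser should accept either spelling.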

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Commented] (COUCHDB-1670) Replicator crashes if numbers in checkpoint docs are expressed in scientific notation

2013-02-07 Thread Paul Joseph Davis (JIRA)

[ 
https://issues.apache.org/jira/browse/COUCHDB-1670?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13573648#comment-13573648
 ] 

Paul Joseph Davis commented on COUCHDB-1670:


Ah, try replacing that with a couch_util:json_decode/1 or w/e.
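
The actual Erlang change is not shown in this thread. As an illustration of the same idea in Go (the language of the endpoint in the report), here is a sketch of a lenient sequence parser: it tries a strict integer parse first, then falls back to a general number parse and accepts the result only if it is a whole number. The name parseSeq and the 2^53 bound are assumptions made for this example.

```go
package main

import (
	"fmt"
	"math"
	"strconv"
)

// parseSeq (hypothetical helper) accepts "1234567" as well as "1.234567e+06".
// It rejects values with a fractional part and values too large for a
// float64 to hold exactly as an integer.
func parseSeq(s string) (int64, error) {
	// Strict path: plain base-10 integer.
	if n, err := strconv.ParseInt(s, 10, 64); err == nil {
		return n, nil
	}
	// Lenient path: any number syntax strconv understands.
	f, err := strconv.ParseFloat(s, 64)
	if err != nil {
		return 0, fmt.Errorf("not a number: %q", s)
	}
	// NaN fails the first test, infinities fail the second.
	if f != math.Trunc(f) || math.Abs(f) > 1<<53 {
		return 0, fmt.Errorf("not an exactly representable integer: %q", s)
	}
	return int64(f), nil
}

func main() {
	for _, s := range []string{"1234567", "1.234567e+06", "1.5", "abc"} {
		n, err := parseSeq(s)
		fmt.Println(s, n, err)
	}
}
```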


[jira] [Commented] (COUCHDB-1670) Replicator crashes if numbers in checkpoint docs are expressed in scientific notation

2013-02-06 Thread Robert Newson (JIRA)

[ 
https://issues.apache.org/jira/browse/COUCHDB-1670?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13572643#comment-13572643
 ] 

Robert Newson commented on COUCHDB-1670:


That's an interesting one, however I think we should hold fast to the rule that 
since is opaque. You have to pass back exactly what you got from couchdb. That 
said, we should make this a tidier 400 Bad Request.



[jira] [Commented] (COUCHDB-1670) Replicator crashes if numbers in checkpoint docs are expressed in scientific notation

2013-02-06 Thread Jens Alfke (JIRA)

[ 
https://issues.apache.org/jira/browse/COUCHDB-1670?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13572655#comment-13572655
 ] 

Jens Alfke commented on COUCHDB-1670:
-

> since is opaque. You have to pass back exactly what you got from couchdb.

I don't think that's a reasonable expectation. The JSON is going to be 
transformed anyway (to insert the _rev), so at some point it's going to be 
translated into an internal format and then regenerated. The output has to be 
an equivalent JSON document, but that doesn't mean byte-for-byte equivalence. 
For instance, object keys could be in a different order, Unicode escapes could 
be turned into literals or vice versa, and numbers might be represented 
differently, like changing to/from scientific notation or suppressing trailing 
zeros after the decimal point.
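
A small Go sketch of what "equivalent but not byte-for-byte identical" can mean in practice. The helper jsonEqual is hypothetical: it decodes both documents into generic values and compares those, so key order, Unicode escapes and number spelling stop mattering. It assumes every number fits a float64 exactly, which is a simplification.

```go
package main

import (
	"encoding/json"
	"fmt"
	"reflect"
)

// jsonEqual (hypothetical helper) reports whether two JSON texts decode to
// the same value. Numbers become float64, strings are unescaped, and objects
// become maps, so purely textual differences disappear before comparison.
func jsonEqual(a, b string) (bool, error) {
	var va, vb interface{}
	if err := json.Unmarshal([]byte(a), &va); err != nil {
		return false, err
	}
	if err := json.Unmarshal([]byte(b), &vb); err != nil {
		return false, err
	}
	return reflect.DeepEqual(va, vb), nil
}

func main() {
	// Same content: different key order, escape style and number notation.
	x := `{"source_last_seq":1234567,"name":"caf\u00e9"}`
	y := `{"name":"café","source_last_seq":1.234567e+06}`
	same, err := jsonEqual(x, y)
	fmt.Println(same, err)
}
```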


[jira] [Commented] (COUCHDB-1670) Replicator crashes if numbers in checkpoint docs are expressed in scientific notation

2013-02-06 Thread Jason Smith (JIRA)

[ 
https://issues.apache.org/jira/browse/COUCHDB-1670?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13573163#comment-13573163
 ] 

Jason Smith commented on COUCHDB-1670:
--

Other issues have brought up the meaning of numbers in JSON. (For example, 
the spec just says that numbers are a series of numerals, implying arbitrary 
precision, not IEEE 754.)

I agree with Jens. In the important sense, he **is** returning the same opaque 
value to CouchDB. He encoded it differently, but IMO those differences are more 
like insignificant white space in the JSON.

(CouchDB does not follow this principle perfectly. IIRC, if you place 
insignificant whitespace in your `_view?key=xxx` query then you will not get 
a result.)

If the Go library does that, this is a good opportunity for CouchDB to become 
more liberal in what it accepts.


[jira] [Commented] (COUCHDB-1670) Replicator crashes if numbers in checkpoint docs are expressed in scientific notation

2013-02-06 Thread Joan Touzet (JIRA)

[ 
https://issues.apache.org/jira/browse/COUCHDB-1670?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13573175#comment-13573175
 ] 

Joan Touzet commented on COUCHDB-1670:
--

I think it's an incorrect assumption that since must always be a base 10 
number, scientific notation or not.


[jira] [Commented] (COUCHDB-1670) Replicator crashes if numbers in checkpoint docs are expressed in scientific notation

2013-02-06 Thread Paul Joseph Davis (JIRA)

[ 
https://issues.apache.org/jira/browse/COUCHDB-1670?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13573190#comment-13573190
 ] 

Paul Joseph Davis commented on COUCHDB-1670:


Jason and Jens are right here, although I do find it a bit surprising that we 
actually have an issue here given how erlang treats numbers. My only guess is 
that we have a guard for is_integer/1 instead of is_number/1 which would badarg 
on the parsed value (mochijson2 at least would parse that as a float).

Couple minor comments on discussion:

[~snej] has it right in that we can't expect that JSON will roundtrip 
byte-for-byte when we have an intermediary translation into an Erlang 
representation. We already rely on them facts so that we can tell people to 
sshh when we mutate number representations.

[~jhs] is kinda right, but it's not just a series of numerals, though it's not 
much more than "looks like a valid number". While the encoding differences 
aren't quite at whitespace-difference levels, they are definitely below the 
threshold of what we should tolerate, especially considering what we're using 
them for.

I also have no idea what [~jhs] is talking about with whitespace in the key. If 
there's truth to that then it sounds like a bug and not just merely a json 
encoding difference.

[~jhs] is also quoting Postel's law which is a crock and I have spent much time 
trying to quash the influence of that terrible idea in the project. The number 
of times I've gotten pissed trying to remember if it's descending=true or 
reverse=true and checking if I have typos is annoyingly non-zero.

[~wohali] is also right in the generic sense that since (hehehe) should not be 
restricted to a numerical value and if we didn't have what appear to be latent 
bugs based on that assumption this probably wouldn't even be an issue.

And if y'all want to spend more time on this, start investigating round 
tripping the value 1.1 through a JSON decoder/encoder pair. I'll be here with 
the tissues when you get to asserting 56bit rounding precisions with the GNU 
libc strtod assumptions.
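
For anyone who does want to poke at this, a hedged Go sketch of the experiment. It says nothing about mochijson2 or glibc strtod; it only uses Go's strconv to show that whether a float survives a text round trip depends on how many digits the encoder emits. The helper name roundTripsAt is made up for this example.

```go
package main

import (
	"fmt"
	"strconv"
)

// roundTripsAt (hypothetical helper) formats f with prec significant digits,
// parses the text back, and reports whether the result equals f. A prec of -1
// asks strconv for the shortest text that parses back to the same float64.
func roundTripsAt(f float64, prec int) bool {
	s := strconv.FormatFloat(f, 'g', prec, 64)
	g, err := strconv.ParseFloat(s, 64)
	return err == nil && g == f
}

func main() {
	// Use variables so the sum is computed in float64 at run time;
	// a constant expression 0.1+0.2 would be evaluated exactly.
	a, b := 0.1, 0.2
	sum := a + b

	fmt.Println(roundTripsAt(1.1, -1)) // shortest form round-trips
	fmt.Println(roundTripsAt(sum, 15)) // 15 digits is too few for this value
	fmt.Println(roundTripsAt(sum, 17)) // 17 digits is enough for any float64
}
```

An encoder that picks a fixed, smaller digit count can therefore hand back a slightly different number than it was given.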
