[jira] [Commented] (COUCHDB-1670) Replicator crashes if numbers in checkpoint docs are expressed in scientific notation
[ https://issues.apache.org/jira/browse/COUCHDB-1670?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13573324#comment-13573324 ]

Robert Newson commented on COUCHDB-1670:

list_to_integer is the thing that fails on this input fwiw.

Replicator crashes if numbers in checkpoint docs are expressed in scientific notation

Key: COUCHDB-1670
URL: https://issues.apache.org/jira/browse/COUCHDB-1670
Project: CouchDB
Issue Type: Bug
Components: Replication
Reporter: Jens Alfke

The CouchDB 1.2 replicator process crashes with an Erlang exception when parsing a checkpoint document read back from a remote database, if numbers in the document were JSON-encoded in scientific notation instead of as integers. This includes the properties source_last_seq, end_last_seq, start_last_seq.

That is, the following encoding works fine:
    ..., source_last_seq: 1234567, ...
whereas this completely-equivalent encoding causes an exception:
    ..., source_last_seq: 1.234567e+06, ...

This issue raised its head as a result of a CouchDB-compatible engine I'm writing (the Couchbase Sync Gateway) which can serve as a passive replication endpoint. It's implemented in Go, and the Go JSON package has the side effect of (a) parsing all JSON numbers into type 'double', and (b) encoding all doubles into JSON using scientific notation if they're more than six digits long. The net effect is that when CouchDB stores a checkpoint into the Sync Adapter's database and then later reads it back, it barfs due to the scientific notation.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira
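The failure mode described here can be sketched in a few lines. This is a Python analogy only, not CouchDB's Erlang code: `int()` stands in for an integer-only parser such as `list_to_integer`, and the function names `parse_seq_strict` and `parse_seq_tolerant` are hypothetical.

```python
import json


def parse_seq_strict(text):
    """Integer-only parse: accepts '1234567' but rejects '1.234567e+06'."""
    return int(text)


def parse_seq_tolerant(text):
    """Parse any JSON number, then accept it only if its value is integral."""
    value = json.loads(text)
    # bool is a subclass of int in Python, so exclude it explicitly.
    if isinstance(value, bool) or not isinstance(value, (int, float)):
        raise ValueError("not a JSON number: %r" % text)
    if isinstance(value, float):
        if not value.is_integer():
            raise ValueError("not an integral value: %r" % text)
        value = int(value)
    return value


if __name__ == "__main__":
    for spelling in ("1234567", "1.234567e+06"):
        try:
            strict = parse_seq_strict(spelling)
        except ValueError as err:
            strict = "error: %s" % err
        print(spelling, "| strict:", strict, "| tolerant:", parse_seq_tolerant(spelling))
```

The tolerant version decides on the numeric value rather than on its spelling, which is the behaviour the reporter is asking for.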
[jira] [Commented] (COUCHDB-1670) Replicator crashes if numbers in checkpoint docs are expressed in scientific notation
[ https://issues.apache.org/jira/browse/COUCHDB-1670?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13573648#comment-13573648 ]

Paul Joseph Davis commented on COUCHDB-1670:

Ah, try replacing that with a couch_util:json_decode/1 or w/e.
[jira] [Commented] (COUCHDB-1670) Replicator crashes if numbers in checkpoint docs are expressed in scientific notation
[ https://issues.apache.org/jira/browse/COUCHDB-1670?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13572643#comment-13572643 ]

Robert Newson commented on COUCHDB-1670:

That's an interesting one, however I think we should hold fast to the rule that since is opaque. You have to pass back exactly what you got from couchdb.

That said, we should make this a tidier 400 Bad Request.
[jira] [Commented] (COUCHDB-1670) Replicator crashes if numbers in checkpoint docs are expressed in scientific notation
[ https://issues.apache.org/jira/browse/COUCHDB-1670?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13572655#comment-13572655 ]

Jens Alfke commented on COUCHDB-1670:

> since is opaque. You have to pass back exactly what you got from couchdb.

I don't think that's a reasonable expectation. The JSON is going to be transformed anyway (to insert the _rev), so at some point it's going to be translated into an internal format and then regenerated. The output has to be an equivalent JSON document, but that doesn't mean byte-for-byte equivalence. For instance, object keys could be in a different order, Unicode escapes could be turned into literals or vice versa, and numbers might be represented differently, like changing to/from scientific notation or suppressing trailing zeros after the decimal point.
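A small sketch of the distinction between byte-for-byte and value equivalence, using Python's standard `json` module purely for illustration. The two sample documents and the helper name `json_equivalent` are made up for this example; they are not taken from a real checkpoint.

```python
import json

# Same document, spelled two ways: different key order, an escaped versus a
# literal non-ASCII character, a trailing zero, and scientific notation.
doc_a = r'{"source_last_seq": 1234567, "name": "caf\u00e9", "ratio": 1.50}'
doc_b = '{"ratio": 1.5, "name": "café", "source_last_seq": 1.234567e+06}'


def json_equivalent(x, y):
    """Compare two JSON texts by parsed value instead of by bytes."""
    return json.loads(x) == json.loads(y)


if __name__ == "__main__":
    print("byte-for-byte equal:", doc_a == doc_b)
    print("equal as JSON values:", json_equivalent(doc_a, doc_b))
```

Note that this relies on Python treating the integer 1234567 and the float 1234567.0 as equal; a consumer that distinguishes the two types would need its own normalization step.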
[jira] [Commented] (COUCHDB-1670) Replicator crashes if numbers in checkpoint docs are expressed in scientific notation
[ https://issues.apache.org/jira/browse/COUCHDB-1670?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13573163#comment-13573163 ]

Jason Smith commented on COUCHDB-1670:

Other issues have brought up the meaning of numbers in JSON. (For example, the spec just says that numbers are a series of numerals, implying arbitrary precision, not IEEE 754.)

I agree with Jens. In the important sense, he **is** returning the same opaque value to CouchDB. He encoded it differently, but IMO those differences are more like insignificant white space in the JSON. (CouchDB does not follow this principle perfectly. IIRC, if you place insignificant whitespace in your `_view?key=xxx` query then you will not get a result.)

If the Go library does that, this is a good opportunity for CouchDB to become more liberal in what it accepts.
[jira] [Commented] (COUCHDB-1670) Replicator crashes if numbers in checkpoint docs are expressed in scientific notation
[ https://issues.apache.org/jira/browse/COUCHDB-1670?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13573175#comment-13573175 ]

Joan Touzet commented on COUCHDB-1670:

I think it's an incorrect assumption that since must always be a base 10 number, scientific notation or not.
[jira] [Commented] (COUCHDB-1670) Replicator crashes if numbers in checkpoint docs are expressed in scientific notation
[ https://issues.apache.org/jira/browse/COUCHDB-1670?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13573190#comment-13573190 ]

Paul Joseph Davis commented on COUCHDB-1670:

Jason and Jens are right here, although I do find it a bit surprising that we actually have an issue here given how Erlang treats numbers. My only guess is that we have a guard for is_integer/1 instead of is_number/1 which would badarg on the parsed value (mochijson2 at least would parse that as a float).

Couple of minor comments on the discussion:

[~snej] has it right in that we can't expect that JSON will roundtrip byte-for-byte when we have an intermediary translation into an Erlang representation. We already rely on that fact so that we can tell people to sshh when we mutate number representations.

[~jhs] is kinda right, but it's not just a series of numerals, though it's not much more than "looks like a valid number". While the encoding differences aren't quite at the level of whitespace differences, they are definitely below the threshold of what we should tolerate, especially considering what we're using them for.

I also have no idea what [~jhs] is talking about with whitespace in the key. If there's truth to that then it sounds like a bug and not merely a JSON encoding difference.

[~jhs] is also quoting Postel's law, which is a crock, and I have spent much time trying to quash the influence of that terrible idea in the project. The number of times I've gotten pissed trying to remember if it's descending=true or reverse=true and checking if I have typos is annoyingly non-zero.

[~wohali] is also right in the generic sense that since (hehehe) should not be restricted to a numerical value, and if we didn't have what appear to be latent bugs based on that assumption this probably wouldn't even be an issue.

And if y'all want to spend more time on this, start investigating round tripping the value 1.1 through a JSON decoder/encoder pair. I'll be here with the tissues when you get to asserting 56bit rounding precisions with the GNU libc strtod assumptions.
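For anyone who wants to try the round-trip experiment suggested above, here is a minimal sketch using Python's standard `json` module. It only shows how one particular decoder/encoder pair behaves; other libraries (mochijson2, Go's JSON package as described in the issue) may pick different spellings. The helper name `roundtrip` is hypothetical.

```python
import json
from decimal import Decimal


def roundtrip(text):
    """Decode a JSON number and re-encode it with Python's json module."""
    return json.dumps(json.loads(text))


if __name__ == "__main__":
    # The value survives, but the spelling is whatever the encoder prefers.
    for spelling in ("1.1", "1.10", "1234567", "1.234567e+06"):
        print(spelling, "->", roundtrip(spelling))

    # The double nearest to 1.1 is not exactly 1.1, which is why encoders
    # must choose how many digits to emit when writing the number back out.
    print(Decimal(1.1) == Decimal("1.1"))
    print(Decimal(1.1))
```

With this encoder the integer spelling comes back unchanged while the scientific-notation spelling comes back as a float literal, so a consumer that insists on integers has to normalize the value itself.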