This is an automated email from the ASF dual-hosted git repository. nickva pushed a commit to branch main in repository https://gitbox.apache.org/repos/asf/couchdb-jiffy.git
commit f6f49d1c0dce7d42d89b7b25aaab3a1ceb17c86e Author: Nick Vatamaniuc <[email protected]> AuthorDate: Mon Apr 13 22:44:00 2026 -0400 Use ffc.h library to parse numbers This is currently the fastest number parsing library available in C. It's port of Dr. Lemire's https://github.com/fastfloat/fast_float to C by Koleman Nix and available at https://github.com/kolemannix/ffc.h The nice thing about this library is it can parse the numbers directly from the stream without needing a separe copy into buffer just to have a \0 terminator for strtod and strtol. --- LICENSE | 221 ++++ README.ffc.md | 11 + c_src/decoder.c | 47 +- c_src/ffc.h | 3242 +++++++++++++++++++++++++++++++++++++++++++++++++++++++ c_src/jiffy.c | 4 + 5 files changed, 3501 insertions(+), 24 deletions(-) diff --git a/LICENSE b/LICENSE index 088a534..ba6f644 100644 --- a/LICENSE +++ b/LICENSE @@ -342,3 +342,224 @@ AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE SOFTWARE. + +ffc.h Apache 2.0 License +======================== + + Apache License + Version 2.0, January 2004 + http://www.apache.org/licenses/ + + TERMS AND CONDITIONS FOR USE, REPRODUCTION, AND DISTRIBUTION + + 1. Definitions. + + "License" shall mean the terms and conditions for use, reproduction, + and distribution as defined by Sections 1 through 9 of this document. + + "Licensor" shall mean the copyright owner or entity authorized by + the copyright owner that is granting the License. + + "Legal Entity" shall mean the union of the acting entity and all + other entities that control, are controlled by, or are under common + control with that entity. For the purposes of this definition, + "control" means (i) the power, direct or indirect, to cause the + direction or management of such entity, whether by contract or + otherwise, or (ii) ownership of fifty percent (50%) or more of the + outstanding shares, or (iii) beneficial ownership of such entity. + + "You" (or "Your") shall mean an individual or Legal Entity + exercising permissions granted by this License. + + "Source" form shall mean the preferred form for making modifications, + including but not limited to software source code, documentation + source, and configuration files. + + "Object" form shall mean any form resulting from mechanical + transformation or translation of a Source form, including but + not limited to compiled object code, generated documentation, + and conversions to other media types. + + "Work" shall mean the work of authorship, whether in Source or + Object form, made available under the License, as indicated by a + copyright notice that is included in or attached to the work + (an example is provided in the Appendix below). + + "Derivative Works" shall mean any work, whether in Source or Object + form, that is based on (or derived from) the Work and for which the + editorial revisions, annotations, elaborations, or other modifications + represent, as a whole, an original work of authorship. For the purposes + of this License, Derivative Works shall not include works that remain + separable from, or merely link (or bind by name) to the interfaces of, + the Work and Derivative Works thereof. + + "Contribution" shall mean any work of authorship, including + the original version of the Work and any modifications or additions + to that Work or Derivative Works thereof, that is intentionally + submitted to Licensor for inclusion in the Work by the copyright owner + or by an individual or Legal Entity authorized to submit on behalf of + the copyright owner. For the purposes of this definition, "submitted" + means any form of electronic, verbal, or written communication sent + to the Licensor or its representatives, including but not limited to + communication on electronic mailing lists, source code control systems, + and issue tracking systems that are managed by, or on behalf of, the + Licensor for the purpose of discussing and improving the Work, but + excluding communication that is conspicuously marked or otherwise + designated in writing by the copyright owner as "Not a Contribution." + + "Contributor" shall mean Licensor and any individual or Legal Entity + on behalf of whom a Contribution has been received by Licensor and + subsequently incorporated within the Work. + + 2. Grant of Copyright License. Subject to the terms and conditions of + this License, each Contributor hereby grants to You a perpetual, + worldwide, non-exclusive, no-charge, royalty-free, irrevocable + copyright license to reproduce, prepare Derivative Works of, + publicly display, publicly perform, sublicense, and distribute the + Work and such Derivative Works in Source or Object form. + + 3. Grant of Patent License. Subject to the terms and conditions of + this License, each Contributor hereby grants to You a perpetual, + worldwide, non-exclusive, no-charge, royalty-free, irrevocable + (except as stated in this section) patent license to make, have made, + use, offer to sell, sell, import, and otherwise transfer the Work, + where such license applies only to those patent claims licensable + by such Contributor that are necessarily infringed by their + Contribution(s) alone or by combination of their Contribution(s) + with the Work to which such Contribution(s) was submitted. If You + institute patent litigation against any entity (including a + cross-claim or counterclaim in a lawsuit) alleging that the Work + or a Contribution incorporated within the Work constitutes direct + or contributory patent infringement, then any patent licenses + granted to You under this License for that Work shall terminate + as of the date such litigation is filed. + + 4. Redistribution. You may reproduce and distribute copies of the + Work or Derivative Works thereof in any medium, with or without + modifications, and in Source or Object form, provided that You + meet the following conditions: + + (a) You must give any other recipients of the Work or + Derivative Works a copy of this License; and + + (b) You must cause any modified files to carry prominent notices + stating that You changed the files; and + + (c) You must retain, in the Source form of any Derivative Works + that You distribute, all copyright, patent, trademark, and + attribution notices from the Source form of the Work, + excluding those notices that do not pertain to any part of + the Derivative Works; and + + (d) If the Work includes a "NOTICE" text file as part of its + distribution, then any Derivative Works that You distribute must + include a readable copy of the attribution notices contained + within such NOTICE file, excluding those notices that do not + pertain to any part of the Derivative Works, in at least one + of the following places: within a NOTICE text file distributed + as part of the Derivative Works; within the Source form or + documentation, if provided along with the Derivative Works; or, + within a display generated by the Derivative Works, if and + wherever such third-party notices normally appear. The contents + of the NOTICE file are for informational purposes only and + do not modify the License. You may add Your own attribution + notices within Derivative Works that You distribute, alongside + or as an addendum to the NOTICE text from the Work, provided + that such additional attribution notices cannot be construed + as modifying the License. + + You may add Your own copyright statement to Your modifications and + may provide additional or different license terms and conditions + for use, reproduction, or distribution of Your modifications, or + for any such Derivative Works as a whole, provided Your use, + reproduction, and distribution of the Work otherwise complies with + the conditions stated in this License. + + 5. Submission of Contributions. Unless You explicitly state otherwise, + any Contribution intentionally submitted for inclusion in the Work + by You to the Licensor shall be under the terms and conditions of + this License, without any additional terms or conditions. + Notwithstanding the above, nothing herein shall supersede or modify + the terms of any separate license agreement you may have executed + with Licensor regarding such Contributions. + + 6. Trademarks. This License does not grant permission to use the trade + names, trademarks, service marks, or product names of the Licensor, + except as required for reasonable and customary use in describing the + origin of the Work and reproducing the content of the NOTICE file. + + 7. Disclaimer of Warranty. Unless required by applicable law or + agreed to in writing, Licensor provides the Work (and each + Contributor provides its Contributions) on an "AS IS" BASIS, + WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or + implied, including, without limitation, any warranties or conditions + of TITLE, NON-INFRINGEMENT, MERCHANTABILITY, or FITNESS FOR A + PARTICULAR PURPOSE. You are solely responsible for determining the + appropriateness of using or redistributing the Work and assume any + risks associated with Your exercise of permissions under this License. + + 8. Limitation of Liability. In no event and under no legal theory, + whether in tort (including negligence), contract, or otherwise, + unless required by applicable law (such as deliberate and grossly + negligent acts) or agreed to in writing, shall any Contributor be + liable to You for damages, including any direct, indirect, special, + incidental, or consequential damages of any character arising as a + result of this License or out of the use or inability to use the + Work (including but not limited to damages for loss of goodwill, + work stoppage, computer failure or malfunction, or any and all + other commercial damages or losses), even if such Contributor + has been advised of the possibility of such damages. + + 9. Accepting Warranty or Additional Liability. While redistributing + the Work or Derivative Works thereof, You may choose to offer, + and charge a fee for, acceptance of support, warranty, indemnity, + or other liability obligations and/or rights consistent with this + License. However, in accepting such obligations, You may act only + on Your own behalf and on Your sole responsibility, not on behalf + of any other Contributor, and only if You agree to indemnify, + defend, and hold each Contributor harmless for any liability + incurred by, or claims asserted against, such Contributor by reason + of your accepting any such warranty or additional liability. + + END OF TERMS AND CONDITIONS + + Copyright 2021 The fast_float authors + + Licensed under the Apache License, Version 2.0 (the "License"); + you may not use this file except in compliance with the License. + You may obtain a copy of the License at + + http://www.apache.org/licenses/LICENSE-2.0 + + Unless required by applicable law or agreed to in writing, software + distributed under the License is distributed on an "AS IS" BASIS, + WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + See the License for the specific language governing permissions and + limitations under the License. + +ffc.h Boost License +=================== + +Boost Software License - Version 1.0 - August 17th, 2003 + +Permission is hereby granted, free of charge, to any person or organization +obtaining a copy of the software and accompanying documentation covered by +this license (the "Software") to use, reproduce, display, distribute, +execute, and transmit the Software, and to prepare derivative works of the +Software, and to permit third-parties to whom the Software is furnished to +do so, all subject to the following: + +The copyright notices in the Software and this entire statement, including +the above license grant, this restriction and the following disclaimer, +must be included in all copies of the Software, in whole or in part, and +all derivative works of the Software, unless such copies or derivative +works are solely in the form of machine-executable object code generated by +a source language processor. + +THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR +IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY, +FITNESS FOR A PARTICULAR PURPOSE, TITLE AND NON-INFRINGEMENT. IN NO EVENT +SHALL THE COPYRIGHT HOLDERS OR ANYONE DISTRIBUTING THE SOFTWARE BE LIABLE +FOR ANY DAMAGES OR OTHER LIABILITY, WHETHER IN CONTRACT, TORT OR OTHERWISE, +ARISING FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER +DEALINGS IN THE SOFTWARE. diff --git a/README.ffc.md b/README.ffc.md new file mode 100644 index 0000000..fc1f7e1 --- /dev/null +++ b/README.ffc.md @@ -0,0 +1,11 @@ +Fast Float C Library (ffc.h) +---------------------------- + +This is from https://github.com/kolemannix/ffc.h + +A copy and paste into c_src. In the unity build `jiffy.c` it's included as + +``` +#define FFC_IMPL +#include "ffc.h" +``` diff --git a/c_src/decoder.c b/c_src/decoder.c index 779f947..ae2d266 100644 --- a/c_src/decoder.c +++ b/c_src/decoder.c @@ -2,16 +2,15 @@ // See the LICENSE file for more information. #include <assert.h> -#include <errno.h> #include <stdlib.h> #include <string.h> #include "erl_nif.h" +#include "ffc.h" #include "jiffy.h" #include "jiffy_utf8.h" #define STACK_SIZE_INC 64 -#define NUM_BUF_LEN 32 #define JIFFY_SMALL_ARRAY_SIZE 64 enum { @@ -382,11 +381,10 @@ dec_number(Decoder* d, ERL_NIF_TERM* value) { ERL_NIF_TERM num_type = d->atoms->atom_error; char state = nst_init; - char nbuf[NUM_BUF_LEN]; int has_frac = 0; int has_exp = 0; double dval; - long lval; + int64_t lval; // Use the same trick as did for dec_string. The restrict qualifier hints // to the compiler p won't alias any other pointers so it can optimize @@ -633,26 +631,27 @@ parse: break; } - errno = 0; - - size_t num_len = d->i - start; - - if(num_len < NUM_BUF_LEN) { - memcpy(nbuf, &p[start], num_len); - nbuf[num_len] = '\0'; - - if(has_frac || has_exp) { - dval = strtod(nbuf, NULL); - if(errno != ERANGE) { - *value = enif_make_double(d->env, dval); - return 1; - } - } else { - lval = strtol(nbuf, NULL, 10); - if(errno != ERANGE) { - *value = enif_make_int64(d->env, lval); - return 1; - } + // Use ffc.h to parse numbers. It parses direclty from the stream no need + // to allocate a separate buffer as with strtod and strtol. The state + // machine already validated the syntax, so parse-level errors shouldn't + // occur here, only FFC_OUTCOME_OUT_OF_RANGE. If the range erro happens we + // fall back to Erlang with handle big numbers. + const char* nstart = (const char*)&p[start]; + const char* nend = (const char*)&p[d->i]; + const size_t num_len = d->i - start; + + if(has_frac || has_exp) { + ffc_parse_options opts = {FFC_PRESET_JSON, '.'}; + ffc_result res = ffc_from_chars_double_options(nstart, nend, &dval, opts); + if(res.outcome == FFC_OUTCOME_OK) { + *value = enif_make_double(d->env, dval); + return 1; + } + } else { + ffc_result res = ffc_parse_i64(num_len, nstart, 10, &lval); + if(res.outcome == FFC_OUTCOME_OK) { + *value = enif_make_int64(d->env, lval); + return 1; } } diff --git a/c_src/ffc.h b/c_src/ffc.h new file mode 100644 index 0000000..8d40697 --- /dev/null +++ b/c_src/ffc.h @@ -0,0 +1,3242 @@ +// fast_float by Daniel Lemire (Original) +// fast_float by João Paulo Magalhaes (Original) +// fast_float by Koleman Nix (C Port) +// +// +// with contributions from Eugene Golushkov +// with contributions from Maksim Kita +// with contributions from Marcin Wojdyr +// with contributions from Neal Richardson +// with contributions from Tim Paine +// with contributions from Fabio Pellacini +// with contributions from Lénárd Szolnoki +// with contributions from Jan Pharago +// with contributions from Maya Warrier +// with contributions from Taha Khokhar +// with contributions from Anders Dalvander +// with contributions from Koleman Nix +// with contributions from Michael Grunder +// +// +// Licensed under the Apache License, Version 2.0, or the +// MIT License or the Boost License. This file may not be copied, +// modified, or distributed except according to those terms. +// +// MIT License Notice +// +// MIT License +// +// Copyright (c) 2021 The fast_float authors +// +// Permission is hereby granted, free of charge, to any +// person obtaining a copy of this software and associated +// documentation files (the "Software"), to deal in the +// Software without restriction, including without +// limitation the rights to use, copy, modify, merge, +// publish, distribute, sublicense, and/or sell copies of +// the Software, and to permit persons to whom the Software +// is furnished to do so, subject to the following +// conditions: +// +// The above copyright notice and this permission notice +// shall be included in all copies or substantial portions +// of the Software. +// +// THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF +// ANY KIND, EXPRESS OR IMPLIED, INCLUDING BUT NOT LIMITED +// TO THE WARRANTIES OF MERCHANTABILITY, FITNESS FOR A +// PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT +// SHALL THE AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY +// CLAIM, DAMAGES OR OTHER LIABILITY, WHETHER IN AN ACTION +// OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, OUT OF OR +// IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER +// DEALINGS IN THE SOFTWARE. +// +// Apache License (Version 2.0) Notice +// +// Copyright 2021 The fast_float authors +// Licensed under the Apache License, Version 2.0 (the "License"); +// you may not use this file except in compliance with the License. +// You may obtain a copy of the License at +// +// http://www.apache.org/licenses/LICENSE-2.0 +// +// Unless required by applicable law or agreed to in writing, software +// distributed under the License is distributed on an "AS IS" BASIS, +// WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. +// See the License for the specific language governing permissions and +// +// BOOST License Notice +// +// Boost Software License - Version 1.0 - August 17th, 2003 +// +// Permission is hereby granted, free of charge, to any person or organization +// obtaining a copy of the software and accompanying documentation covered by +// this license (the "Software") to use, reproduce, display, distribute, +// execute, and transmit the Software, and to prepare derivative works of the +// Software, and to permit third-parties to whom the Software is furnished to +// do so, all subject to the following: +// +// The copyright notices in the Software and this entire statement, including +// the above license grant, this restriction and the following disclaimer, +// must be included in all copies of the Software, in whole or in part, and +// all derivative works of the Software, unless such copies or derivative +// works are solely in the form of machine-executable object code generated by +// a source language processor. +// +// THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR +// IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY, +// FITNESS FOR A PARTICULAR PURPOSE, TITLE AND NON-INFRINGEMENT. IN NO EVENT +// SHALL THE COPYRIGHT HOLDERS OR ANYONE DISTRIBUTING THE SOFTWARE BE LIABLE +// FOR ANY DAMAGES OR OTHER LIABILITY, WHETHER IN CONTRACT, TORT OR OTHERWISE, +// ARISING FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER +// DEALINGS IN THE SOFTWARE. +// + +// This file is auto-generated by amalgamate.py from the sources in src/. +// Do not edit it directly — edit the source files and regenerate. +// + +/* ffc.h + single-header decimal float parser using eisel-lemire + This is a direct port by Koleman Nix of the fast_float library + authored by Daniel Lemire +*/ + +#ifndef FFC_H +#define FFC_H + +#ifdef __cplusplus +extern "C" { +#endif + +#ifndef FFC_API +#define FFC_API + +#define FFC_VERSION_YEAR 26 +#define FFC_VERSION_MONTH 03 +#define FFC_VERSION_BUILD 03 +#define FFC_VERSION ((FFC_VERSION_YEAR << 16) | (FFC_VERSION_MONTH << 8) | (FFC_VERSION_BUILD)) +#define FFC_VERSION_STRINGIFY_(x) #x +#define FFC_VERSION_STRINGIFY(x) FFC_VERSION_STRINGIFY_(x) +#define FFC_VERSION_STRING \ + FFC_VERSION_STRINGIFY(FFC_VERSION_YEAR) "." \ + FFC_VERSION_STRINGIFY(FFC_VERSION_MONTH) "." \ + FFC_VERSION_STRINGIFY(FFC_VERSION_BUILD) + +#include <stddef.h> +#include <stdint.h> + +typedef uint32_t ffc_outcome; +enum ffc_outcome_bits { + FFC_OUTCOME_OK = 0, + FFC_OUTCOME_INVALID_INPUT = 1, + FFC_OUTCOME_OUT_OF_RANGE = 2, +}; + +typedef struct ffc_result { + // Where parsing stopped + char const *ptr; + // The outcome of the call + ffc_outcome outcome; +} ffc_result; + +typedef uint64_t ffc_format; +enum ffc_format_bits { + FFC_FORMAT_FLAG_SCIENTIFIC = 1ULL << 0, + FFC_FORMAT_FLAG_FIXED = 1ULL << 2, // Gap is present in fast_float original + FFC_FORMAT_FLAG_HEX = 1ULL << 3, + FFC_FORMAT_FLAG_NO_INFNAN = 1ULL << 4, + FFC_FORMAT_FLAG_BASIC_JSON = 1ULL << 5, + FFC_FORMAT_FLAG_BASIC_FORTRAN = 1ULL << 6, + FFC_FORMAT_FLAG_ALLOW_LEADING_PLUS = 1ULL << 7, + FFC_FORMAT_FLAG_SKIP_WHITE_SPACE = 1ULL << 8, + + /* Presets */ + FFC_PRESET_GENERAL = FFC_FORMAT_FLAG_FIXED | FFC_FORMAT_FLAG_SCIENTIFIC, + + FFC_PRESET_JSON = FFC_FORMAT_FLAG_BASIC_JSON | + FFC_PRESET_GENERAL | + FFC_FORMAT_FLAG_NO_INFNAN, + + FFC_PRESET_JSON_OR_INFNAN = FFC_FORMAT_FLAG_BASIC_JSON | + FFC_PRESET_GENERAL, + + FFC_PRESET_FORTRAN = FFC_FORMAT_FLAG_BASIC_FORTRAN | + FFC_PRESET_GENERAL +}; + +typedef struct ffc_parse_options { + /** Which number formats are accepted */ + ffc_format format; + /** The character used as decimal point; period will be used if decimal_point == '\0' */ + char decimal_point; +} ffc_parse_options; + +ffc_parse_options ffc_parse_options_default(void); + +typedef enum ffc_parse_outcome { + FFC_PARSE_OUTCOME_NO_ERROR = 0, + // [JSON-only] The minus sign must be followed by an integer. + FFC_PARSE_OUTCOME_JSON_MISSING_INTEGER_AFTER_SIGN = 1, + // A sign must be followed by an integer or dot. + FFC_PARSE_OUTCOME_MISSING_INTEGER_OR_DOT_AFTER_SIGN = 2, + // [JSON-only] The integer part must not have leading zeros. + FFC_PARSE_OUTCOME_JSON_LEADING_ZEROS_IN_INTEGER_PART = 3, + // [JSON-only] The integer part must have at least one digit. + FFC_PARSE_OUTCOME_JSON_NO_DIGITS_IN_INTEGER_PART = 4, + // [JSON-only] If there is a decimal point, there must be digits in the + // fractional part. + FFC_PARSE_OUTCOME_JSON_NO_DIGITS_IN_FRACTIONAL_PART = 5, + // The mantissa must have at least one digit. + FFC_PARSE_OUTCOME_NO_DIGITS_IN_MANTISSA = 6, + // Scientific notation requires an exponential part. + FFC_PARSE_OUTCOME_MISSING_EXPONENTIAL_PART = 7, +} ffc_parse_outcome; + +/* + * A simplified API; the result will be 0.0 on error, not uninitialized. + * If outcome is null, it will not be written to + */ +double ffc_parse_double_simple(size_t len, const char *input, ffc_outcome *outcome); +ffc_result ffc_parse_double(size_t len, const char *input, double *out); +/** + * Implements the fast_float algorithm from https://github.com/fastfloat/fast_float + * See original for more details + * + * This function parses the character sequence [first,last) for a number. It + * parses floating-point numbers expecting a locale-independent format equivalent + * to what is used by std::strtod in the default ("C") locale. The resulting + * floating-point value is the closest floating-point value (using either float + * or double), using the "round to even" convention for values that would + * otherwise fall right in-between two values. That is, we provide exact parsing + * according to the IEEE standard. + * + * Given a successful parse, the pointer (`ptr`) in the returned value is set to + * point right after the parsed number, and the `value` referenced is set to the + * parsed value. In case of error, the returned `ec` contains a representative + * error, otherwise the default (`FFC_OUTCOME_OK`) value is stored. + * + * The implementation does not allocate heap memory. + * + * Like the C++17 standard, the `fast_float::from_chars` functions take an + * optional last argument of the type `fast_float::chars_format`. It is a bitset + * value: we check whether `fmt & fast_float::chars_format::fixed` and `fmt & + * fast_float::chars_format::scientific` are set to determine whether we allow + * the fixed point and scientific notation respectively. The default is + * `fast_float::chars_format::general` which allows both `fixed` and + * `scientific`. + */ +ffc_result ffc_from_chars_double(const char *start, const char *end, double* out); +ffc_result ffc_from_chars_double_options(const char *start, const char *end, double* out, ffc_parse_options options); + +/* + * A simplified API; the result will be 0.0 on error, not uninitialized. + * If outcome is null, it will not be written to + */ +float ffc_parse_float_simple(size_t len, const char *s, ffc_outcome *outcome); +ffc_result ffc_parse_float(size_t len, const char *s, float *out); +ffc_result ffc_from_chars_float(const char *start, const char *end, float* out); +ffc_result ffc_from_chars_float_options(const char *start, const char *end, float* out, ffc_parse_options options); + +ffc_result ffc_parse_i64(size_t len, const char *input, int base, int64_t *out); +ffc_result ffc_parse_u64(size_t len, const char *input, int base, uint64_t *out); +ffc_result ffc_parse_i32(size_t len, const char *input, int base, int32_t *out); +ffc_result ffc_parse_u32(size_t len, const char *input, int base, uint32_t *out); + +/* + * A simplified API; the result will be 0 on error, not uninitialized. + * If outcome is null, it will not be written to + */ +int64_t ffc_parse_i64_simple(size_t len, const char *input, int base, ffc_outcome *outcome); +uint64_t ffc_parse_u64_simple(size_t len, const char *input, int base, ffc_outcome *outcome); +int32_t ffc_parse_i32_simple(size_t len, const char *input, int base, ffc_outcome *outcome); +uint32_t ffc_parse_u32_simple(size_t len, const char *input, int base, ffc_outcome *outcome); + +#endif // FFC_API + +#ifdef FFC_IMPL + +#ifndef FFC_COMMON_H +#define FFC_COMMON_H + +#include <float.h> // for NAN, FLT_MIN +#include <stdlib.h> +#include <string.h> // for memcpy +#include <stdbool.h> + +#ifndef ffc_internal +#define ffc_internal static +#endif + +#if defined(_MSC_VER) + #define ffc_inline __forceinline +#elif defined(__GNUC__) || defined(__clang__) + #define ffc_inline __attribute__((always_inline)) inline +#else + #define ffc_inline inline +#endif + +#if FFC_DEBUG +#include <stdio.h> +#define ffc_debug(...) do { fprintf(stderr, __VA_ARGS__); } while(0) +#else +#define ffc_debug(...) do { } while(0) +#endif + + +ffc_internal ffc_inline +uint64_t ffc_get_double_bits(double d) { + uint64_t bits; + memcpy(&bits, &d, sizeof(double)); + return bits; +} + +ffc_internal ffc_inline +uint32_t ffc_get_float_bits(float d) { + uint32_t bits; + memcpy(&bits, &d, sizeof(float)); + return bits; +} + +typedef union ffc_value { + double d; + float f; +} ffc_value; + +typedef union ffc_value_bits { + uint64_t di; + uint32_t fi; +} ffc_value_bits; + +typedef uint8_t ffc_value_kind; +enum ffc_value_kind_bits { + FFC_VALUE_KIND_FLOAT = 0, + FFC_VALUE_KIND_DOUBLE = 1, +}; + +typedef union ffc_int_value { + int64_t s64; + int32_t s32; + uint64_t u64; + uint32_t u32; +} ffc_int_value; + +typedef uint8_t ffc_int_kind; +enum ffc_int_kind_bits { + FFC_INT_KIND_S64 = 0, + FFC_INT_KIND_S32 = 1, + FFC_INT_KIND_U64 = 2, + FFC_INT_KIND_U32 = 3, +}; + +ffc_internal ffc_inline +bool ffc_int_kind_is_signed(ffc_int_kind ik) { + return ik == FFC_INT_KIND_S64 || ik == FFC_INT_KIND_S32; +} + + +ffc_internal ffc_inline +ffc_value_bits ffc_get_value_bits(ffc_value value, ffc_value_kind vk) { + ffc_value_bits bits; + if (vk == FFC_VALUE_KIND_DOUBLE) { + bits.di = ffc_get_double_bits(value.d); + } else { + bits.fi = ffc_get_float_bits(value.f); + } + return bits; +} + +ffc_internal ffc_inline +uint64_t ffc_int_value_max(ffc_int_kind ik) { + switch (ik) { + case FFC_INT_KIND_S64: return INT64_MAX; + case FFC_INT_KIND_S32: return INT32_MAX; + case FFC_INT_KIND_U64: return UINT64_MAX; + case FFC_INT_KIND_U32: return UINT32_MAX; + default: return 0; // should never happen + } +} + +#define ffc_set_value(ptr, value_kind, value) \ + (((value_kind) == FFC_VALUE_KIND_DOUBLE) ? ((ptr)->d = (double)(value)) : ((ptr)->f = (float)(value))) + +#define ffc_read_value(ptr, value_kind) \ + (((value_kind) == FFC_VALUE_KIND_DOUBLE) ? (ptr)->d : (ptr)->f) + +typedef struct ffc_adjusted_mantissa { + uint64_t mantissa; + int32_t power2; // a negative value indicates an invalid result +} ffc_adjusted_mantissa; + +ffc_internal ffc_inline size_t ffc_get_value_size(ffc_value_kind vk) { + if (vk == FFC_VALUE_KIND_DOUBLE) { + return sizeof(double); + } else { + return sizeof(float); + } +} + +// Bias so we can get the real exponent with an invalid adjusted_mantissa. +#define FFC_INVALID_AM_BIAS ((int32_t)-0x8000) + +/********************* context crack: word size *********************/ + +#if (defined(__x86_64) || defined(__x86_64__) || defined(_M_X64) || \ + defined(__amd64) || defined(__aarch64__) || defined(_M_ARM64) || \ + defined(__MINGW64__) || defined(__s390x__) || \ + (defined(__ppc64__) || defined(__PPC64__) || defined(__ppc64le__) || \ + defined(__PPC64LE__)) || \ + defined(__loongarch64) || (defined(__riscv) && __riscv_xlen == 64)) +#define FFC_64BIT 1 +#elif (defined(__i386) || defined(__i386__) || defined(_M_IX86) || \ + defined(__arm__) || defined(_M_ARM) || defined(__ppc__) || \ + defined(__MINGW32__) || defined(__EMSCRIPTEN__) || \ + (defined(__riscv) && __riscv_xlen == 32)) +#define FFC_32BIT 1 +#else +// Need to check incrementally, since SIZE_MAX is a size_t, avoid overflow. +// We can never tell the register width, but the SIZE_MAX is a good +// approximation. UINTPTR_MAX and INTPTR_MAX are optional, so avoid them for max +// portability. +#if SIZE_MAX == 0xffff +#error Unknown platform (16-bit, unsupported) +#elif SIZE_MAX == 0xffffffff +#define FFC_32BIT 1 +#elif SIZE_MAX == 0xffffffffffffffff +#define FFC_64BIT 1 +#else +#error Unknown platform (not 32-bit, not 64-bit?) +#endif +#endif + +/********************* context crack: intrinsics *********************/ +#if ((defined(_WIN32) || defined(_WIN64)) && !defined(__clang__)) || \ + (defined(_M_ARM64) && !defined(__MINGW32__)) +#include <intrin.h> +#endif + +#if defined(_MSC_VER) && !defined(__clang__) +#define FFC_VISUAL_STUDIO 1 +#endif + +/********************* context crack: byte order / endianness *********************/ +#if defined __BYTE_ORDER__ && defined __ORDER_BIG_ENDIAN__ +#define FFC_IS_BIG_ENDIAN (__BYTE_ORDER__ == __ORDER_BIG_ENDIAN__) +#elif defined _WIN32 +#define FFC_IS_BIG_ENDIAN 0 +#else +#if defined(__APPLE__) || defined(__FreeBSD__) +#include <machine/endian.h> +#elif defined(sun) || defined(__sun) +#include <sys/byteorder.h> +#elif defined(__MVS__) +#include <sys/endian.h> +#else +#ifdef __has_include +#if __has_include(<endian.h>) +#include <endian.h> +#endif //__has_include(<endian.h>) +#endif //__has_include +#endif +# +#ifndef __BYTE_ORDER__ +// safe choice +#define FFC_IS_BIG_ENDIAN 0 +#endif +# +#ifndef __ORDER_LITTLE_ENDIAN__ +// safe choice +#define FFC_IS_BIG_ENDIAN 0 +#endif +# +#if __BYTE_ORDER__ == __ORDER_LITTLE_ENDIAN__ +#define FFC_IS_BIG_ENDIAN 0 +#else +#define FFC_IS_BIG_ENDIAN 1 +#endif +#endif + +/********************* context crack: simd *********************/ +#if defined(__SSE2__) || (defined(FFC_VISUAL_STUDIO) && \ + (defined(_M_AMD64) || defined(_M_X64) || \ + (defined(_M_IX86_FP) && _M_IX86_FP == 2))) +#define FFC_SSE2 1 +#endif + +#if defined(__aarch64__) || defined(_M_ARM64) +#define FFC_NEON 1 +#endif + +#if defined(FFC_SSE2) || defined(FFC_NEON) +#define FFC_HAS_SIMD 1 +#endif + + +#ifdef FFC_SSE2 +#include <emmintrin.h> +#endif + +#ifdef FFC_NEON +#include <arm_neon.h> +#endif + + +#if defined(__GNUC__) +// disable -Wcast-align=strict (GCC only) +#define FFC_SIMD_DISABLE_WARNINGS \ + _Pragma("GCC diagnostic push") \ + _Pragma("GCC diagnostic ignored \"-Wcast-align\"") +#else +#define FFC_SIMD_DISABLE_WARNINGS +#endif + +#if defined(__GNUC__) +#define FFC_SIMD_RESTORE_WARNINGS _Pragma("GCC diagnostic pop") +#else +#define FFC_SIMD_RESTORE_WARNINGS +#endif + +ffc_internal ffc_inline +uint32_t ffc_count_leading_zeroes(uint64_t x) { +#if defined(__GNUC__) || defined(__clang__) + return x ? (uint32_t)__builtin_clzll(x) : 64u; +#else + if (x == 0) return 64u; + uint32_t n = 0; + if ((x >> 32) == 0) { n |= 32; x <<= 32; } + if ((x >> 48) == 0) { n |= 16; x <<= 16; } + if ((x >> 56) == 0) { n |= 8; x <<= 8; } + if ((x >> 60) == 0) { n |= 4; x <<= 4; } + if ((x >> 62) == 0) { n |= 2; x <<= 2; } + if ((x >> 63) == 0) { n |= 1; } + return n; +#endif +} + +typedef struct ffc_u128 { uint64_t low; uint64_t high; } ffc_u128; +ffc_internal ffc_inline +ffc_u128 ffc_mul_u64(uint64_t a, uint64_t b) { +#if defined(__SIZEOF_INT128__) + __uint128_t z = ((__uint128_t)a) * ((__uint128_t)b); + ffc_u128 output; + output.low = (uint64_t)z; + output.high = (uint64_t)(z >> 64); + return output; +#else + uint64_t a0 = a & 0xFFFFFFFF; + uint64_t a1 = a >> 32; + uint64_t b0 = b & 0xFFFFFFFF; + uint64_t b1 = b >> 32; + + uint64_t w0 = a0 * b0; + uint64_t t = (a1 * b0) + (w0 >> 32); + uint64_t w1 = t & 0xFFFFFFFF; + uint64_t w2 = t >> 32; + w1 += a0 * b1; + + ffc_u128 output; + output.low = a * b; + output.high = (a1 * b1) + w2 + (w1 >> 32); + return output; +#endif +} + +// slow emulation routine for 32-bit +ffc_internal ffc_inline uint64_t ffc_emulu(uint32_t x, uint32_t y) { + return x * (uint64_t)y; +} + +ffc_internal ffc_inline +uint64_t ffc_umul128_generic(uint64_t ab, uint64_t cd, uint64_t *hi) { + uint64_t ad = ffc_emulu((uint32_t)(ab >> 32), (uint32_t)cd); + uint64_t bd = ffc_emulu((uint32_t)ab, (uint32_t)cd); + uint64_t adbc = ad + ffc_emulu((uint32_t)ab, (uint32_t)(cd >> 32)); + uint64_t adbc_carry = (uint64_t)(adbc < ad); + uint64_t lo = bd + (adbc << 32); + *hi = ffc_emulu((uint32_t)(ab >> 32), (uint32_t)(cd >> 32)) + (adbc >> 32) + + (adbc_carry << 32) + (uint64_t)(lo < bd); + return lo; +} + +// compute 64-bit a*b +ffc_internal ffc_inline +ffc_u128 ffc_full_multiplication(uint64_t a, uint64_t b) { + ffc_u128 answer; +#if defined(_M_ARM64) && !defined(__MINGW32__) + // ARM64 has native support for 64-bit multiplications, no need to emulate + // But MinGW on ARM64 doesn't have native support for 64-bit multiplications + answer.high = __umulh(a, b); + answer.low = a * b; +#elif (defined(_WIN64) && !defined(__clang__) && \ + !defined(_M_ARM64) && !defined(__GNUC__)) + answer.low = _umul128(a, b, &answer.high); // _umul128 not available on ARM64 +#elif defined(FFC_64BIT) && defined(__SIZEOF_INT128__) + __uint128_t r = ((__uint128_t)a) * b; + answer.low = (uint64_t)(r); + answer.high = (uint64_t)(r >> 64); +#else + answer.low = ffc_umul128_generic(a, b, &answer.high); +#endif + return answer; +} + +#define FFC_DOUBLE_SMALLEST_POWER_OF_10 -342 +#define FFC_DOUBLE_LARGEST_POWER_OF_10 308 +#define FFC_DOUBLE_SIGN_INDEX 63 +#define FFC_DOUBLE_INFINITE_POWER 0x7FF +#define FFC_DOUBLE_MANTISSA_EXPLICIT_BITS 52 +#define FFC_DOUBLE_MINIMUM_EXPONENT -1023 +#define FFC_DOUBLE_MIN_EXPONENT_ROUND_TO_EVEN -4 +#define FFC_DOUBLE_MAX_EXPONENT_ROUND_TO_EVEN 23 +#define FFC_DOUBLE_MAX_EXPONENT_FAST_PATH 22 +#define FFC_DOUBLE_MAX_MANTISSA_FAST_PATH ((uint64_t)(2) << FFC_DOUBLE_MANTISSA_EXPLICIT_BITS) +#define FFC_DOUBLE_EXPONENT_MASK 0x7FF0000000000000 +#define FFC_DOUBLE_MANTISSA_MASK 0x000FFFFFFFFFFFFF +#define FFC_DOUBLE_HIDDEN_BIT_MASK 0x0010000000000000 +#define FFC_DOUBLE_MAX_DIGITS 769 + +#define FFC_FLOAT_SMALLEST_POWER_OF_10 -64 +#define FFC_FLOAT_LARGEST_POWER_OF_10 38 +#define FFC_FLOAT_SIGN_INDEX 31 +#define FFC_FLOAT_INFINITE_POWER 0xFF +#define FFC_FLOAT_MANTISSA_EXPLICIT_BITS 23 +#define FFC_FLOAT_MINIMUM_EXPONENT -127 +#define FFC_FLOAT_MIN_EXPONENT_ROUND_TO_EVEN -17 +#define FFC_FLOAT_MAX_EXPONENT_ROUND_TO_EVEN 10 +#define FFC_FLOAT_MAX_EXPONENT_FAST_PATH 10 +#define FFC_FLOAT_MAX_MANTISSA_FAST_PATH ((uint64_t)(2) << FFC_FLOAT_MANTISSA_EXPLICIT_BITS) +#define FFC_FLOAT_EXPONENT_MASK 0x7F800000 +#define FFC_FLOAT_MANTISSA_MASK 0x007FFFFF +#define FFC_FLOAT_HIDDEN_BIT_MASK 0x00800000 +#define FFC_FLOAT_MAX_DIGITS 114 + +#if (FLT_EVAL_METHOD != 1) && (FLT_EVAL_METHOD != 0) + #define FFC_DOUBLE_MIN_EXPONENT_FAST_PATH 0 + #define FFC_FLOAT_MIN_EXPONENT_FAST_PATH 0 +#else + #define FFC_DOUBLE_MIN_EXPONENT_FAST_PATH -22 + #define FFC_FLOAT_MIN_EXPONENT_FAST_PATH -10 +#endif + +#define ffc_const(value_kind, name) (value_kind == FFC_VALUE_KIND_DOUBLE ? FFC_DOUBLE_##name : FFC_FLOAT_##name) + +#define FFC_POWERS_OF_5_NUMBER_OF_ENTRIES (2 * (FFC_DOUBLE_LARGEST_POWER_OF_10 - FFC_DOUBLE_SMALLEST_POWER_OF_10 + 1)) + +// Powers of five from 5^-342 all the way to 5^308 rounded toward one. +ffc_internal uint64_t FFC_POWERS_OF_FIVE[FFC_POWERS_OF_5_NUMBER_OF_ENTRIES] = { + 0xeef453d6923bd65a, 0x113faa2906a13b3f, 0x9558b4661b6565f8, 0x4ac7ca59a424c507, 0xbaaee17fa23ebf76, 0x5d79bcf00d2df649, 0xe95a99df8ace6f53, 0xf4d82c2c107973dc, 0x91d8a02bb6c10594, 0x79071b9b8a4be869, 0xb64ec836a47146f9, 0x9748e2826cdee284, 0xe3e27a444d8d98b7, 0xfd1b1b2308169b25, 0x8e6d8c6ab0787f72, 0xfe30f0f5e50e20f7, 0xb208ef855c969f4f, 0xbdbd2d335e51a935, 0xde8b2b66b3bc4723, 0xad2c788035e61382, 0x8b16fb203055ac76, 0x4c3bcb5021afcc31, 0xaddcb9e83c6b1793, 0xdf4abe242a1bbf3d, 0xd953e862 [...] +}; + +// lookup table for ascii values +ffc_internal bool FFC_SPACE_LUT[256] = { + 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 1, 1, 1, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, + 0, 0, 0, 0, 0, 0, 0, 0, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, + 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, + 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, + 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, + 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, + 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, + 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, + 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, + 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, + 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0 +}; + +ffc_internal ffc_inline bool ffc_is_space(char c) { + return FFC_SPACE_LUT[(uint8_t)c]; +} + +ffc_internal uint8_t FFC_CHAR_TO_DIGIT_LUT[] = { + 255, 255, 255, 255, 255, 255, 255, 255, 255, 255, 255, 255, 255, 255, 255, + 255, 255, 255, 255, 255, 255, 255, 255, 255, 255, 255, 255, 255, 255, 255, + 255, 255, 255, 255, 255, 255, 255, 255, 255, 255, 255, 255, 255, 255, 255, + 255, 255, 255, 0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 255, 255, + 255, 255, 255, 255, 255, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, + 20, 21, 22, 23, 24, 25, 26, 27, 28, 29, 30, 31, 32, 33, 34, + 35, 255, 255, 255, 255, 255, 255, 10, 11, 12, 13, 14, 15, 16, 17, + 18, 19, 20, 21, 22, 23, 24, 25, 26, 27, 28, 29, 30, 31, 32, + 33, 34, 35, 255, 255, 255, 255, 255, 255, 255, 255, 255, 255, 255, 255, + 255, 255, 255, 255, 255, 255, 255, 255, 255, 255, 255, 255, 255, 255, 255, + 255, 255, 255, 255, 255, 255, 255, 255, 255, 255, 255, 255, 255, 255, 255, + 255, 255, 255, 255, 255, 255, 255, 255, 255, 255, 255, 255, 255, 255, 255, + 255, 255, 255, 255, 255, 255, 255, 255, 255, 255, 255, 255, 255, 255, 255, + 255, 255, 255, 255, 255, 255, 255, 255, 255, 255, 255, 255, 255, 255, 255, + 255, 255, 255, 255, 255, 255, 255, 255, 255, 255, 255, 255, 255, 255, 255, + 255, 255, 255, 255, 255, 255, 255, 255, 255, 255, 255, 255, 255, 255, 255, + 255, 255, 255, 255, 255, 255, 255, 255, 255, 255, 255, 255, 255, 255, 255, + 255}; + +ffc_internal ffc_inline +uint8_t ffc_char_to_digit(char c) { + return FFC_CHAR_TO_DIGIT_LUT[(uint8_t)c]; +} + +// Indexed by a 'base' but offset by 2, first entry is base 2, 64 digits, checks out right? +ffc_internal size_t FFC_MAXDIGITS_OF_BASE_U64[] = { + 64, 41, 32, 28, 25, 23, 22, 21, 20, 19, 18, 18, 17, 17, 16, 16, 16, 16, + 15, 15, 15, 15, 14, 14, 14, 14, 14, 14, 14, 13, 13, 13, 13, 13, 13}; + +ffc_internal ffc_inline size_t ffc_max_digits_u64(int base) { + return FFC_MAXDIGITS_OF_BASE_U64[base - 2]; +} + +ffc_internal ffc_inline bool ffc_is_integer(char c) { + // can be micro-optimized, but compilers are entirely able to optimize it well + return (unsigned)(c - '0') <= 9u; +} + + +ffc_internal uint64_t FFC_MIN_SAFE_OF_BASE_U64[] = { + 9223372036854775808ull, 12157665459056928801ull, 4611686018427387904, + 7450580596923828125, 4738381338321616896, 3909821048582988049, + 9223372036854775808ull, 12157665459056928801ull, 10000000000000000000ull, + 5559917313492231481, 2218611106740436992, 8650415919381337933, + 2177953337809371136, 6568408355712890625, 1152921504606846976, + 2862423051509815793, 6746640616477458432, 15181127029874798299ull, + 1638400000000000000, 3243919932521508681, 6221821273427820544, + 11592836324538749809ull, 876488338465357824, 1490116119384765625, + 2481152873203736576, 4052555153018976267, 6502111422497947648, + 10260628712958602189ull, 15943230000000000000ull, 787662783788549761, + 1152921504606846976, 1667889514952984961, 2386420683693101056, + 3379220508056640625, 4738381338321616896}; + +ffc_internal ffc_inline uint64_t ffc_min_safe_u64_of_base(int base) { + return FFC_MIN_SAFE_OF_BASE_U64[base - 2]; +} + +ffc_internal ffc_inline +bool ffc_strncasecmp3(char const *actual_mixedcase, char const *expected_lowercase, size_t char_width) { + uint64_t mask = 0; + if (char_width == 1) { mask = 0x2020202020202020; } + else if (char_width == 2) { mask = 0x0020002000200020; } + else if (char_width == 4) { mask = 0x0000002000000020; } + else { return false; } + + uint64_t val1 = 0; + uint64_t val2 = 0; + if (char_width == 1 || char_width == 2) { + memcpy(&val1, actual_mixedcase, 3 * char_width); + memcpy(&val2, expected_lowercase, 3 * char_width); + val1 |= mask; + val2 |= mask; + return val1 == val2; + } else if (char_width == 4) { + memcpy(&val1, actual_mixedcase, 2 * char_width); + memcpy(&val2, expected_lowercase, 2 * char_width); + val1 |= mask; + if (val1 != val2) { + return false; + } + return (actual_mixedcase[2] | 32) == (expected_lowercase[2]); + } else { + return false; + } +} + +ffc_internal ffc_inline +bool ffc_strncasecmp5(char *actual_mixedcase, char const *expected_lowercase, size_t char_width) { + uint64_t mask = 0; + uint64_t val1 = 0; + uint64_t val2 = 0; + if (char_width == 1) { + mask = 0x2020202020202020; + memcpy(&val1, actual_mixedcase, 5 * char_width); + memcpy(&val2, expected_lowercase, 5 * char_width); + val1 |= mask; + val2 |= mask; + return val1 == val2; + } else if (char_width == 2) { + mask = 0x0020002000200020; + memcpy(&val1, actual_mixedcase, 4 * char_width); + memcpy(&val2, expected_lowercase, 4 * char_width); + val1 |= mask; + if (val1 != val2) { + return false; + } + return (actual_mixedcase[4] | 32) == (expected_lowercase[4]); + } else if (char_width == 4) { + mask = 0x0000002000000020; + memcpy(&val1, actual_mixedcase, 2 * char_width); + memcpy(&val2, expected_lowercase, 2 * char_width); + val1 |= mask; + if (val1 != val2) { + return false; + } + memcpy(&val1, actual_mixedcase + 2, 2 * char_width); + memcpy(&val2, expected_lowercase + 2, 2 * char_width); + val1 |= mask; + if (val1 != val2) { + return false; + } + return (actual_mixedcase[4] | 32) == (expected_lowercase[4]); + } + return false; +} + +/** + * Returns true if the floating-pointing rounding mode is to 'nearest'. + * It is the default on most system. This function is meant to be inexpensive. + * Credit : @mwalcott3 + */ + +ffc_internal ffc_inline +bool ffc_rounds_to_nearest(void) { + // https://lemire.me/blog/2020/06/26/gcc-not-nearest/ +#if (FLT_EVAL_METHOD != 1) && (FLT_EVAL_METHOD != 0) + return false; +#endif + // See + // A fast function to check your floating-point rounding mode + // https://lemire.me/blog/2022/11/16/a-fast-function-to-check-your-floating-point-rounding-mode/ + // + // This function is meant to be equivalent to : + // prior: #include <cfenv> + // return fegetround() == FE_TONEAREST; + // However, it is expected to be much faster than the fegetround() + // function call. + // + // The volatile keyword prevents the compiler from computing the function + // at compile-time. + // There might be other ways to prevent compile-time optimizations (e.g., + // asm). The value does not need to be std::numeric_limits<float>::min(), any + // small value so that 1 + x should round to 1 would do (after accounting for + // excess precision, as in 387 instructions). + static float volatile fmin = FLT_MIN; + float fmini = fmin; // we copy it so that it gets loaded at most once. +// +// Explanation: +// Only when fegetround() == FE_TONEAREST do we have that +// fmin + 1.0f == 1.0f - fmin. +// +// FE_UPWARD: +// fmin + 1.0f > 1 +// 1.0f - fmin == 1 +// +// FE_DOWNWARD or FE_TOWARDZERO: +// fmin + 1.0f == 1 +// 1.0f - fmin < 1 +// +// Note: This may fail to be accurate if fast-math has been +// enabled, as rounding conventions may not apply. +#ifdef FFC_VISUAL_STUDIO +#pragma warning(push) +// todo: is there a VS warning? +// see +// https://stackoverflow.com/questions/46079446/is-there-a-warning-for-floating-point-equality-checking-in-visual-studio-2013 +#elif defined(__clang__) +#pragma clang diagnostic push +#pragma clang diagnostic ignored "-Wfloat-equal" +#elif defined(__GNUC__) +#pragma GCC diagnostic push +#pragma GCC diagnostic ignored "-Wfloat-equal" +#endif + return (fmini + 1.0f == 1.0f - fmini); +#ifdef FFC_VISUAL_STUDIO +#pragma warning(pop) +#elif defined(__clang__) +#pragma clang diagnostic pop +#elif defined(__GNUC__) +#pragma GCC diagnostic pop +#endif +} + +ffc_internal ffc_inline +// packs up an adjusted_mantissa into a double +void ffc_am_to_float(bool negative, ffc_adjusted_mantissa am, ffc_value* value, ffc_value_kind vk) { + if (vk == FFC_VALUE_KIND_FLOAT) { + uint32_t word = (uint32_t)am.mantissa; + word = word | (uint32_t)(am.power2) << ffc_const(vk, MANTISSA_EXPLICIT_BITS); + word = word | (uint32_t)(negative) << ffc_const(vk, SIGN_INDEX); + memcpy(value, &word, sizeof(uint32_t)); + } else { + uint64_t word = am.mantissa; + word = word | (uint64_t)(am.power2) << ffc_const(vk, MANTISSA_EXPLICIT_BITS); + word = word | (uint64_t)(negative) << ffc_const(vk, SIGN_INDEX); + memcpy(value, &word, sizeof(uint64_t)); + } +} + +static const double FFC_DOUBLE_POWERS_OF_TEN[] = { + 1e0, 1e1, 1e2, 1e3, 1e4, 1e5, 1e6, 1e7, 1e8, 1e9, 1e10, 1e11, + 1e12, 1e13, 1e14, 1e15, 1e16, 1e17, 1e18, 1e19, 1e20, 1e21, 1e22 +}; + +static const float FFC_FLOAT_POWERS_OF_TEN[] = { + 1e0f, 1e1f, 1e2f, 1e3f, 1e4f, 1e5f, + 1e6f, 1e7f, 1e8f, 1e9f, 1e10f +}; + +// Largest integer value v so that (5**index * v) <= 1<<53. +// 0x20000000000000 == 1 << 53 +#define FFC_55555 (5LL * 5LL * 5LL * 5LL * 5LL) +static const uint64_t FFC_DOUBLE_MAX_MANTISSA[] = { + (uint64_t)0x20000000000000L, + (uint64_t)0x20000000000000L / 5, + (uint64_t)0x20000000000000L / (5 * 5), + (uint64_t)0x20000000000000L / (5 * 5 * 5), + (uint64_t)0x20000000000000L / (5 * 5 * 5 * 5), + (uint64_t)0x20000000000000L / (FFC_55555), + (uint64_t)0x20000000000000L / (FFC_55555 * 5), + (uint64_t)0x20000000000000L / (FFC_55555 * 5 * 5), + (uint64_t)0x20000000000000L / (FFC_55555 * 5 * 5 * 5), + (uint64_t)0x20000000000000L / (FFC_55555 * 5 * 5 * 5 * 5), + (uint64_t)0x20000000000000L / (FFC_55555 * FFC_55555), + (uint64_t)0x20000000000000L / (FFC_55555 * FFC_55555 * 5), + (uint64_t)0x20000000000000L / (FFC_55555 * FFC_55555 * 5 * 5), + (uint64_t)0x20000000000000L / (FFC_55555 * FFC_55555 * 5 * 5 * 5), + (uint64_t)0x20000000000000L / (FFC_55555 * FFC_55555 * FFC_55555), + (uint64_t)0x20000000000000L / (FFC_55555 * FFC_55555 * FFC_55555 * 5), + (uint64_t)0x20000000000000L / (FFC_55555 * FFC_55555 * FFC_55555 * 5 * 5), + (uint64_t)0x20000000000000L / (FFC_55555 * FFC_55555 * FFC_55555 * 5 * 5 * 5), + (uint64_t)0x20000000000000L / (FFC_55555 * FFC_55555 * FFC_55555 * 5 * 5 * 5 * 5), + (uint64_t)0x20000000000000L / (FFC_55555 * FFC_55555 * FFC_55555 * FFC_55555), + (uint64_t)0x20000000000000L / (FFC_55555 * FFC_55555 * FFC_55555 * FFC_55555 * 5), + (uint64_t)0x20000000000000L / (FFC_55555 * FFC_55555 * FFC_55555 * FFC_55555 * 5 * 5), + (uint64_t)0x20000000000000L / (FFC_55555 * FFC_55555 * FFC_55555 * FFC_55555 * 5 * 5 * 5), + (uint64_t)0x20000000000000L / (FFC_55555 * FFC_55555 * FFC_55555 * FFC_55555 * 5 * 5 * 5 * 5)}; + +// Largest integer value v so that (5**index * v) <= 1<<24. +// 0x1000000 == 1<<24 +static const uint64_t FFC_FLOAT_MAX_MANTISSA[] = { + 0x1000000, + 0x1000000 / 5, + 0x1000000 / (5 * 5), + 0x1000000 / (5 * 5 * 5), + 0x1000000 / (5 * 5 * 5 * 5), + 0x1000000 / (FFC_55555), + 0x1000000 / (FFC_55555 * 5), + 0x1000000 / (FFC_55555 * 5 * 5), + 0x1000000 / (FFC_55555 * 5 * 5 * 5), + 0x1000000 / (FFC_55555 * 5 * 5 * 5 * 5), + 0x1000000 / (FFC_55555 * FFC_55555), + 0x1000000 / (FFC_55555 * FFC_55555 * 5)}; + +#ifndef FFC_ASSERT +#define FFC_ASSERT(x) \ + { ((void)(x)); } +#endif + +#ifndef FFC_DEBUG_ASSERT +#define FFC_DEBUG_ASSERT(x) \ + { ((void)(x)); } +#endif + +#endif // FFC_COMMON_H + +#ifndef FFC_PARSE_H +#define FFC_PARSE_H + +#include <math.h> + +/* section: read digits */ + +ffc_internal ffc_inline +uint64_t ffc_byteswap(uint64_t val) { + return (val & 0xFF00000000000000) >> 56 | (val & 0x00FF000000000000) >> 40 | + (val & 0x0000FF0000000000) >> 24 | (val & 0x000000FF00000000) >> 8 | + (val & 0x00000000FF000000) << 8 | (val & 0x0000000000FF0000) << 24 | + (val & 0x000000000000FF00) << 40 | (val & 0x00000000000000FF) << 56; +} + +ffc_internal ffc_inline +uint32_t ffc_byteswap_32(uint32_t val) { + return (val >> 24) | ((val >> 8) & 0x0000FF00u) | ((val << 8) & 0x00FF0000u) | + (val << 24); +} + +ffc_inline ffc_internal +uint64_t ffc_read8_to_u64(char const *chars) { + uint64_t val; + memcpy(&val, chars, sizeof(uint64_t)); +#if FFC_IS_BIG_ENDIAN == 1 + // Need to read as-if the number was in little-endian order. + val = ffc_byteswap(val); +#endif + return val; +} + +// Read 4 UC into a u32. Truncates UC if not char. +ffc_internal ffc_inline uint32_t +ffc_read4_to_u32(char const *chars) { + uint32_t val; + memcpy(&val, chars, sizeof(uint32_t)); +#if FFC_IS_BIG_ENDIAN == 1 + val = ffc_byteswap_32(val); +#endif + return val; +} + +#ifdef FFC_SSE2 + +ffc_internal ffc_inline +uint64_t ffc_simd_read8_to_u64_simdreg(__m128i const data) { + FFC_SIMD_DISABLE_WARNINGS + __m128i const packed = _mm_packus_epi16(data, data); +#ifdef FFC_64BIT + return (uint64_t)_mm_cvtsi128_si64(packed); +#else + uint64_t value; + // Visual Studio + older versions of GCC don't support _mm_storeu_si64 + _mm_storel_epi64((__m128i*)&value, packed); + return value; +#endif + FFC_SIMD_RESTORE_WARNINGS +} + +ffc_internal ffc_inline +uint64_t ffc_simd_read8_to_u64(uint16_t const *chars) { + FFC_SIMD_DISABLE_WARNINGS + return ffc_simd_read8_to_u64_simdreg( + _mm_loadu_si128((const __m128i*)chars)); + FFC_SIMD_RESTORE_WARNINGS +} + +#elif defined(FFC_NEON) + +ffc_internal ffc_inline +uint64_t ffc_simd_read8_to_u64_simdreg(uint16x8_t const data) { + FFC_SIMD_DISABLE_WARNINGS + uint8x8_t utf8_packed = vmovn_u16(data); + return vget_lane_u64(vreinterpret_u64_u8(utf8_packed), 0); + FFC_SIMD_RESTORE_WARNINGS +} + +ffc_internal ffc_inline +uint64_t ffc_simd_read8_to_u64(uint16_t const *chars) { + FFC_SIMD_DISABLE_WARNINGS + return ffc_simd_read8_to_u64_simdreg(vld1q_u16(chars)); + FFC_SIMD_RESTORE_WARNINGS +} + +#endif // FFC_SSE2 + +// credit @aqrit +ffc_internal ffc_inline uint32_t +ffc_parse_eight_digits_unrolled_swar(uint64_t val) { + uint64_t const mask = 0x000000FF000000FF; + uint64_t const mul1 = 0x000F424000000064; // 100 + (1000000ULL << 32) + uint64_t const mul2 = 0x0000271000000001; // 1 + (10000ULL << 32) + val -= 0x3030303030303030; + val = (val * 10) + (val >> 8); // val = (val * 2561) >> 8; + val = (((val & mask) * mul1) + (((val >> 16) & mask) * mul2)) >> 32; + return (uint32_t)val; +} + +// Call this if chars are definitely 8 digits. +ffc_internal ffc_inline +uint32_t ffc_parse_eight_digits_unrolled(char const *chars) { + return ffc_parse_eight_digits_unrolled_swar(ffc_read8_to_u64(chars)); // truncation okay +} + +// credit @aqrit +ffc_internal ffc_inline bool +ffc_is_made_of_eight_digits_fast(uint64_t val) { + return !((((val + 0x4646464646464646) | (val - 0x3030303030303030)) & + 0x8080808080808080)); +} + +ffc_internal ffc_inline bool +ffc_is_made_of_four_digits_fast(uint32_t val) { + return !((((val + 0x46464646) | (val - 0x30303030)) & 0x80808080)); +} + +ffc_internal ffc_inline +uint32_t ffc_parse_four_digits_unrolled(uint32_t val) { + val -= 0x30303030; + val = (val * 10) + (val >> 8); + return (((val & 0x00FF00FF) * 0x00640001) >> 16) & 0xFFFF; +} + +#ifdef FFC_HAS_SIMD + +// Call this if chars might not be 8 digits. +// Using this style (instead of is_made_of_eight_digits_fast() then +// parse_eight_digits_unrolled()) ensures we don't load SIMD registers twice. +ffc_internal ffc_inline +bool ffc_simd_parse_if_eight_digits_unrolled_simd(uint16_t const *chars, uint64_t* i) { +#ifdef FFC_SSE2 + FFC_SIMD_DISABLE_WARNINGS + __m128i const data = + _mm_loadu_si128((__m128i const *)chars); + + // (x - '0') <= 9 + // http://0x80.pl/articles/simd-parsing-int-sequences.html + __m128i const t0 = _mm_add_epi16(data, _mm_set1_epi16(32720)); + __m128i const t1 = _mm_cmpgt_epi16(t0, _mm_set1_epi16(-32759)); + + if (_mm_movemask_epi8(t1) == 0) { + *i = *i * 100000000 + ffc_parse_eight_digits_unrolled_swar(ffc_simd_read8_to_u64_simdreg(data)); + return true; + } else + return false; + FFC_SIMD_RESTORE_WARNINGS +#elif defined(FFC_NEON) + FFC_SIMD_DISABLE_WARNINGS + uint16x8_t const data = vld1q_u16(chars); + + // (x - '0') <= 9 + // http://0x80.pl/articles/simd-parsing-int-sequences.html + uint16x8_t const t0 = vsubq_u16(data, vmovq_n_u16('0')); + uint16x8_t const mask = vcltq_u16(t0, vmovq_n_u16('9' - '0' + 1)); + + if (vminvq_u16(mask) == 0xFFFF) { + *i = *i * 100000000 + ffc_parse_eight_digits_unrolled_swar(ffc_simd_read8_to_u64_simdreg(data)); + return true; + } else + return false; + FFC_SIMD_RESTORE_WARNINGS +#else + (void)chars; + (void)i; + return false; +#endif // FFC_SSE2 +} + +#endif // FFC_HAS_SIMD + +ffc_internal ffc_inline void +ffc_loop_parse_if_eight_digits(char const **p, char const *const pend, + uint64_t* i) { + // optimizes better than parse_if_eight_digits_unrolled() for char. + while ((pend - *p >= 8) && + ffc_is_made_of_eight_digits_fast(ffc_read8_to_u64(*p))) { + *i = (*i * 100000000) + + ffc_parse_eight_digits_unrolled_swar(ffc_read8_to_u64(*p)); + // in rare cases, this will overflow, but that's ok + *p += 8; + } +} + +/* section end: read digits */ + +/* section: parse */ + +ffc_parse_options ffc_parse_options_default(void) { + ffc_parse_options options; + options.format = FFC_PRESET_GENERAL; + options.decimal_point = '.'; + return options; +} + +typedef struct ffc_parsed { + int64_t exponent; + uint64_t mantissa; + /* Populated on error; indicates where parsing failed */ + char const *lastmatch; + bool negative; + bool valid; + bool too_many_digits; + char* int_part_start; + size_t int_part_len; + char* fraction_part_start; + size_t fraction_part_len; + + ffc_parse_outcome outcome; +} ffc_parsed; + +ffc_internal ffc_inline +ffc_parsed ffc_report_parse_error(char const *p, ffc_parse_outcome outcome) { + ffc_parsed answer; + answer.valid = false; + answer.lastmatch = p; + answer.outcome = outcome; + return answer; +} + +ffc_internal ffc_inline +ffc_parsed ffc_parse_number_string( + char const *p, + // CONTRACT: p < pend + char const *pend, + ffc_parse_options const options, + // explicitly passed to encourage optimizer to specialize + bool const basic_json_fmt +) { + ffc_format fmt = options.format; + char decimal_point = options.decimal_point; + + FFC_DEBUG_ASSERT(fmt != 0); + + ffc_parsed answer = {0}; + answer.negative = (*p == '-'); + // C++17 20.19.3.(7.1) explicitly forbids '+' sign here + // so we only allow it if we've been told to, and are not in json mode + bool allow_leading_plus = fmt & FFC_FORMAT_FLAG_ALLOW_LEADING_PLUS; + if ((*p == '-') || (uint64_t)(allow_leading_plus && !basic_json_fmt && *p == '+')) { + ++p; + if (p == pend) { + return ffc_report_parse_error(p, FFC_PARSE_OUTCOME_MISSING_INTEGER_OR_DOT_AFTER_SIGN); + } + if (basic_json_fmt) { + if (!ffc_is_integer(*p)) { // a sign must be followed by an integer + return ffc_report_parse_error(p, FFC_PARSE_OUTCOME_JSON_MISSING_INTEGER_AFTER_SIGN); + } + } else { + // a sign must be followed by an integer or the dot + if (!ffc_is_integer(*p) && (*p != decimal_point)) { + return ffc_report_parse_error(p, FFC_PARSE_OUTCOME_MISSING_INTEGER_OR_DOT_AFTER_SIGN); + } + } + } + + // phew, we've found the digits + char const *const start_digits = p; + + uint64_t i = 0; // an unsigned int avoids signed overflows (which are bad) + + while ((p != pend) && ffc_is_integer(*p)) { + // Horner's method: only ever multiplies by the constant 10 + // avoiding variable power-of-10 multiplies + + // might overflow, we will handle the overflow later + uint64_t digit_value = (uint64_t)(*p - '0'); + i = (10 * i) + digit_value; + ++p; + } + + char const *const end_of_integer_part = p; + + int64_t digit_count = (int64_t)(end_of_integer_part - start_digits); + answer.int_part_start = (char*)start_digits; + answer.int_part_len = (size_t)(digit_count); + + if (basic_json_fmt) { + // at least 1 digit in integer part + if (digit_count == 0) { + return ffc_report_parse_error(p, FFC_PARSE_OUTCOME_JSON_NO_DIGITS_IN_INTEGER_PART); + } + // no leading zeros + if ((start_digits[0] == '0' && digit_count > 1)) { + return ffc_report_parse_error(start_digits, FFC_PARSE_OUTCOME_JSON_LEADING_ZEROS_IN_INTEGER_PART); + } + } + + int64_t exponent = 0; + bool const has_decimal_point = (p != pend) && (*p == decimal_point); + + /* post-decimal exponential part (calculates a negative exponent) */ + if (has_decimal_point) { + ++p; + char const *before = p; + // can occur at most twice without overflowing, but let it occur more, since + // for integers with many digits, digit parsing is the primary bottleneck. + ffc_loop_parse_if_eight_digits(&p, pend, &i); + + while ((p != pend) && ffc_is_integer(*p)) { + uint8_t digit = (uint8_t)(*p - (char)('0')); + ++p; + i = i * 10 + digit; // in rare cases, this will overflow, but that's ok + } + + // pre: i = 123, digit_count = 3 + // 123.456 + // ^ pend + // ^ p + // ^ before + // exponent = -3 + // i = 123456 + // digit_count = 3 - (-3) = 6 + exponent = before - p; + answer.fraction_part_start = (char*)before; + answer.fraction_part_len = (size_t)(p - before); + digit_count -= exponent; + } + if (basic_json_fmt) { + // at least 1 digit in fractional part + if (has_decimal_point && exponent == 0) { + return ffc_report_parse_error(p, FFC_PARSE_OUTCOME_JSON_NO_DIGITS_IN_FRACTIONAL_PART); + } + } else if (digit_count == 0) { // we must have encountered at least one integer! + return ffc_report_parse_error(p, FFC_PARSE_OUTCOME_NO_DIGITS_IN_MANTISSA); + } + + /* explicit exponential part */ + int64_t exp_number = 0; + if (((uint64_t)(fmt & FFC_FORMAT_FLAG_SCIENTIFIC) && (p != pend) && + (('e' == *p) || ('E' == *p))) || + ((uint64_t)(fmt & FFC_FORMAT_FLAG_BASIC_FORTRAN) && (p != pend) && + (('+' == *p) || ('-' == *p) || ('d' == *p) || + ('D' == *p)))) { + char const *location_of_e = p; + if (('e' == *p) || ('E' == *p) || ('d' == *p) || + ('D' == *p)) { + ++p; + } + bool neg_exp = false; + if ((p != pend) && ('-' == *p)) { + neg_exp = true; + ++p; + } else if ((p != pend) && ('+' == *p)) { // '+' on exponent is allowed by C++17 20.19.3.(7.1) + ++p; + } + if ((p == pend) || !ffc_is_integer(*p)) { + if (!(uint64_t)(fmt & FFC_FORMAT_FLAG_FIXED)) { + // The exponential part is invalid for scientific notation, so it must + // be a trailing token for fixed notation. However, fixed notation is + // disabled, so report a scientific notation error. + return ffc_report_parse_error(p, FFC_PARSE_OUTCOME_MISSING_EXPONENTIAL_PART); + } + // Otherwise, we will be ignoring the 'e'. + p = location_of_e; + } else { + while ((p != pend) && ffc_is_integer(*p)) { + uint8_t digit = (uint8_t)(*p - '0'); + if (exp_number < 0x10000000) { + exp_number = 10 * exp_number + digit; + } + ++p; + } + if (neg_exp) { + exp_number = -exp_number; + } + exponent += exp_number; + } + } else { + // If it scientific and not fixed, we have to bail out. + if ((uint64_t)(fmt & FFC_FORMAT_FLAG_SCIENTIFIC) && + !(uint64_t)(fmt & FFC_FORMAT_FLAG_FIXED)) { + return ffc_report_parse_error(p, FFC_PARSE_OUTCOME_MISSING_EXPONENTIAL_PART); + } + } + answer.lastmatch = p; + answer.valid = true; + + // If we frequently had to deal with long strings of digits, + // we could extend our code by using a 128-bit integer instead + // of a 64-bit integer. However, this is uncommon. + // + // We can deal with up to 19 digits. + if (digit_count > 19) { // this is uncommon + ffc_debug("high digit_count %lld\n", digit_count); + // It is possible that the integer had an overflow. + // We have to handle the case where we have 0.0000somenumber. + // We need to be mindful of the case where we only have zeroes... + // E.g., 0.000000000...000. + char const *start = start_digits; + while ((start != pend) && (*start == '0' || *start == decimal_point)) { + if (*start == '0') { + digit_count--; + } + start++; + } + + if (digit_count > 19) { + answer.too_many_digits = true; + // Let us start again, this time, avoiding overflows. + // We don't need to call if is_integer, since we use the + // pre-tokenized spans from above. + i = 0; + p = answer.int_part_start; + char const *int_end = p + answer.int_part_len; + uint64_t const minimal_nineteen_digit_integer = 1000000000000000000; + while ((i < minimal_nineteen_digit_integer) && (p != int_end)) { + i = i * 10 + (uint64_t)(*p - '0'); + ++p; + } + if (i >= minimal_nineteen_digit_integer) { // We have a big integer + exponent = end_of_integer_part - p + exp_number; + } else { // We have a value with a fractional component. + p = answer.fraction_part_start; + char const *frac_end = p + answer.fraction_part_len; + while ((i < minimal_nineteen_digit_integer) && (p != frac_end)) { + i = i * 10 + (uint64_t)(*p - '0'); + ++p; + } + exponent = answer.fraction_part_start - p + exp_number; + } + // We have now corrected both exponent and i, to a truncated value + } + } + answer.exponent = exponent; + answer.mantissa = i; + return answer; +} + +/** + * Special case +inf, -inf, nan, infinity, -infinity. + * The case comparisons could be made much faster given that we know that the + * strings a null-free and fixed. + **/ +ffc_internal ffc_inline +ffc_result ffc_parse_infnan( + char *first, char *last, + ffc_value *value, + ffc_value_kind vk, + ffc_format fmt +) { + ffc_debug("parse_infnan\n"); + ffc_result answer; + answer.ptr = first; + answer.outcome = FFC_OUTCOME_OK; // be optimistic + // assume first < last, so dereference without checks; + bool const minus_sign = (*first == '-'); + // C++17 20.19.3.(7.1) explicitly forbids '+' sign here + if ((*first == '-') || + ((uint64_t)(fmt & FFC_FORMAT_FLAG_ALLOW_LEADING_PLUS) && + (*first == '+'))) { + ++first; + } + if (last - first >= 3) { + if (ffc_strncasecmp3(first, "nan", 1)) { + answer.ptr = (first += 3); + + // The macro casts the literal AFTER applying the negative sign. This is ok: + // -NAN is a double negative NaN, and (float)(-NAN) correctly produces a negative float NaN. + // Same for infinity — (float)(-INFINITY) gives negative float infinity + ffc_set_value(value, vk, minus_sign ? -NAN : NAN); + // Check for possible nan(n-char-seq-opt), C++17 20.19.3.7, + // C11 7.20.1.3.3. At least MSVC produces nan(ind) and nan(snan). + if (first != last && *first == '(') { + for (char *ptr = first + 1; ptr != last; ++ptr) { + if (*ptr == ')') { + answer.ptr = ptr + 1; // valid nan(n-char-seq-opt) + break; + } else if (!(('a' <= *ptr && *ptr <= 'z') || + ('A' <= *ptr && *ptr <= 'Z') || + ('0' <= *ptr && *ptr <= '9') || *ptr == '_')) + break; // forbidden char, not nan(n-char-seq-opt) + } + } + return answer; + } + if (ffc_strncasecmp3(first, "infinity", 1)) { + if ((last - first >= 8) && + ffc_strncasecmp5(first + 3, (char*)&"infinity"[3], 1)) { + answer.ptr = first + 8; + } else { + answer.ptr = first + 3; + } + ffc_set_value(value, vk, minus_sign ? -INFINITY : INFINITY); + return answer; + } + } + answer.outcome = FFC_OUTCOME_INVALID_INPUT; + return answer; +} + +ffc_internal ffc_inline +ffc_result ffc_parse_int_string( + char const *p, + char const *pend, + ffc_int_value *value, + ffc_int_kind ik, + ffc_parse_options options, + int const base + ) { + ffc_debug("input '%.*s'... ", (int)(pend - p), p); + ffc_format const fmt = options.format; + + if ((uint64_t)(fmt & FFC_FORMAT_FLAG_SKIP_WHITE_SPACE)) { + while ((p != pend) && ffc_is_space(*p)) { + p++; + } + } + + if (p == pend || base < 2 || base > 36) { + ffc_result invalid_input_result; + invalid_input_result.ptr = (char*)p; + invalid_input_result.outcome = FFC_OUTCOME_INVALID_INPUT; + return invalid_input_result; + } + + ffc_result answer; + char const *const first = p; + + bool const negative = (*p == (char)('-')); + + if (!ffc_int_kind_is_signed(ik) && negative) { + answer.outcome = FFC_OUTCOME_INVALID_INPUT; + answer.ptr = (char*)first; + return answer; + } + if ((*p == (char)('-')) || + ((uint64_t)(fmt & FFC_FORMAT_FLAG_ALLOW_LEADING_PLUS) && (*p == (char)('+')))) { + ++p; + } + + char const *const start_num = p; + + while (p != pend && *p == (char)('0')) { + ++p; + } + + bool const has_leading_zeros = p > start_num; + + char const *const start_digits = p; + + uint64_t i = 0; + if (base == 10) { + ffc_loop_parse_if_eight_digits(&p, pend, &i); // use SIMD if possible + } + while (p != pend) { + uint8_t digit = ffc_char_to_digit(*p); + if (digit >= base) { + break; + } + i = (uint64_t)(base) * i + digit; // might overflow, check this later + p++; + } + + size_t digit_count = (size_t)(p - start_digits); + + if (digit_count == 0) { + if (has_leading_zeros) { + value->u64 = 0; // Must zero the largest variant! + answer.outcome = FFC_OUTCOME_OK; + answer.ptr = p; + } else { + answer.outcome = FFC_OUTCOME_INVALID_INPUT; + answer.ptr = first; + } + return answer; + } + + answer.ptr = p; + + // check u64 overflow + size_t max_digits = ffc_max_digits_u64(base); + ffc_debug("digit_count %d, max_digits: %d\n", digit_count, max_digits); + if (digit_count > max_digits) { + answer.outcome = FFC_OUTCOME_OUT_OF_RANGE; + return answer; + } + // this check can be eliminated for all other types, but they will all require + // a max_digits(base) equivalent + if (digit_count == max_digits && i < ffc_min_safe_u64_of_base(base)) { + answer.outcome = FFC_OUTCOME_OUT_OF_RANGE; + return answer; + } + + ffc_debug("i is %lld, ik is %s\n", i, (ik == FFC_INT_KIND_U64 ? "u64" : + ik == FFC_INT_KIND_S64 ? "s64" : + ik == FFC_INT_KIND_U32 ? "u32" : + ik == FFC_INT_KIND_S32 ? "s32" : "?")); + // check other types overflow + if (ik != FFC_INT_KIND_U64) { + // Allow 1 greater magnitude when negative + if (i > ffc_int_value_max(ik) + (uint64_t)(negative)) { + answer.outcome = FFC_OUTCOME_OUT_OF_RANGE; + return answer; + } + } + + // All signed conversion goes through unsigned arithmetic to avoid UB + if (negative) { + uint64_t neg_i = ~i + 1; // This is the two's complement negation + // write into the signed slot via union + switch (ik) { + case FFC_INT_KIND_S64: value->s64 = (int64_t)neg_i; break; // implementation-defined, but works everywhere + case FFC_INT_KIND_S32: value->s32 = (int32_t)(int64_t)neg_i; break; + // unsigned kinds can't be negative — guarded by the range check above + // case FFC_INT_KIND_S16: value.i16 = (int16_t)(int64_t)neg_i; break; + // case FFC_INT_KIND_S8: value.i8 = (int8_t)(int64_t)neg_i; break; + } + } else { + switch (ik) { + case FFC_INT_KIND_S64: value->s64 = (int64_t)i; break; + case FFC_INT_KIND_S32: value->s32 = (int32_t)(int64_t)i; break; + case FFC_INT_KIND_U64: value->u64 = i; break; + case FFC_INT_KIND_U32: value->u32 = (uint32_t)i; break; + } + } + + answer.outcome = FFC_OUTCOME_OK; + return answer; +} + +#ifdef FFC_DEBUG + +#include <stdio.h> +ffc_internal ffc_inline +void ffc_dump_parsed(ffc_parsed const p) { + (void)p; + ffc_debug("mantissa: %llu\n", (unsigned long long)p.mantissa); + ffc_debug("exponent: %lld\n", (long long)p.exponent); + ffc_debug("negative: %d\n", p.negative); + ffc_debug("valid: %d\n", p.valid); + ffc_debug("too_many_digits: %d\n", p.too_many_digits); + ffc_debug("int_part_len: %zu\n", p.int_part_len); + ffc_debug("fraction_part_len: %zu\n", p.fraction_part_len); +} + +#endif + +/* section end: parse */ + +#endif // FFC_PARSE_H +#ifndef FFC_DIGIT_COMPARISON_H +#define FFC_DIGIT_COMPARISON_H + +#ifndef FFC_BIGINT_H +#define FFC_BIGINT_H + + +// rust style `try!()` macro, or `?` operator +#define FFC_TRY(x) \ + { \ + if (!(x)) \ + return false; \ + } + +// the limb width: we want efficient multiplication of double the bits in +// limb, or for 64-bit limbs, at least 64-bit multiplication where we can +// extract the high and low parts efficiently. this is every 64-bit +// architecture except for sparc, which emulates 128-bit multiplication. +// we might have platforms where `CHAR_BIT` is not 8, so let's avoid +// doing `8 * sizeof(limb)`. +#if defined(FFC_64BIT) && !defined(__sparc) +#define FFC_64BIT_LIMB 1 +typedef uint64_t ffc_bigint_limb; +#define FFC_LIMB_BITS 64 +#else +#define FFC_32BIT_LIMB +typedef uint32_t ffc_bigint_limb; +#define FFC_LIMB_BITS 32 +#endif + +typedef struct { ffc_bigint_limb* ptr; size_t len; } ffc_bigint_limb_span; + +ffc_internal ffc_inline +ffc_bigint_limb ffc_limb_span_index(ffc_bigint_limb_span limb_span, size_t index) { + FFC_DEBUG_ASSERT(index < limb_span.len); + return limb_span.ptr[index]; +} +#define ffc_span_index(span, index) ffc_limb_span_index(span, index) + +// number of bits in a bigint. this needs to be at least the number +// of bits required to store the largest bigint, which is +// `log2(10**(digits + max_exp))`, or `log2(10**(767 + 342))`, or +// ~3600 bits, so we round to 4000. +#define FFC_BIGINT_BITS 4000 + +// vector-like type that is allocated on the stack. the entire +// buffer is pre-allocated, and only the length changes. + +// SV_LIMB_COUNT should be 125 or 32-bit systems or 62 for 64-bit systems +#define SV_LIMB_COUNT FFC_BIGINT_BITS / FFC_LIMB_BITS + +typedef struct ffc_sv { + ffc_bigint_limb data[SV_LIMB_COUNT]; + // we never need more than 150 limbs + uint16_t len; +} ffc_sv; + +// add items to the vector, from a span, without bounds checking +ffc_internal ffc_inline +void ffc_sv_extend_unchecked(ffc_sv* sv, ffc_bigint_limb_span s) { + ffc_bigint_limb *ptr = sv->data + sv->len; + + size_t s_bytes = s.len * sizeof(ffc_bigint_limb); + memcpy(ptr, s.ptr, s_bytes); + sv->len += s.len; +} + +// try to add items to the vector, returning if items were added +ffc_internal ffc_inline +bool ffc_sv_try_extend(ffc_sv* sv, ffc_bigint_limb_span s) { + if (sv->len + s.len <= SV_LIMB_COUNT) { + ffc_sv_extend_unchecked(sv, s); + return true; + } else { + return false; + } +} + +// create from existing limb span. +ffc_internal ffc_inline +ffc_sv ffc_sv_create(ffc_bigint_limb_span s) { + ffc_sv new_one; + new_one.len = 0; + ffc_sv_try_extend(&new_one, s); + return new_one; +} + +ffc_internal ffc_inline +ffc_bigint_limb ffc_sv_index(ffc_sv sv, size_t index) { + FFC_DEBUG_ASSERT(index < sv.len); + return sv.data[index]; +} + +// index from the end of the container +ffc_internal ffc_inline +ffc_bigint_limb ffc_sv_rindex(ffc_sv sv, size_t index) { + FFC_DEBUG_ASSERT(index < sv.len); + size_t rindex = sv.len - index - 1; + return sv.data[rindex]; +} + +// append item to vector, without bounds checking +ffc_internal ffc_inline +void ffc_sv_push_unchecked(ffc_sv* sv, ffc_bigint_limb value) { + sv->data[sv->len] = value; + sv->len++; +} + +// append item to vector, returning if item was added +ffc_internal ffc_inline +bool ffc_sv_try_push(ffc_sv* sv, ffc_bigint_limb value) { + if (sv->len < SV_LIMB_COUNT) { + ffc_sv_push_unchecked(sv, value); + return true; + } else { + return false; + } +} + +// try to reserve new_len limbs, filling with limbs +// FF_DIVERGE: We remove some extra helpers and simply fill with zeros +ffc_internal ffc_inline +bool ffc_sv_try_reserve(ffc_sv* sv, size_t new_len) { + if (new_len > SV_LIMB_COUNT) { + return false; + } else { + if (new_len > sv->len) { + size_t fill_count = new_len - sv->len; + ffc_bigint_limb *first = sv->data + sv->len; + ffc_bigint_limb *last = first + fill_count; + for (ffc_bigint_limb* p = first; p < last; p++) { + *p = 0; + } + sv->len = (uint16_t)new_len; + } else { + sv->len = (uint16_t)new_len; + } + return true; + } +} + +// check if any limbs are non-zero after the given index. +ffc_internal ffc_inline +bool ffc_sv_exists_nonzero_after(ffc_sv sv, size_t index) { + while (index < sv.len) { + if (ffc_sv_rindex(sv, index) != 0) { + return true; + } + index++; + } + return false; +} + +// normalize the big integer, so most-significant zero limbs are removed. +ffc_internal ffc_inline +void ffc_sv_normalize(ffc_sv* sv) { + while (sv->len > 0 && ffc_sv_rindex(*sv, 0) == 0) { + sv->len--; + } +} + +ffc_internal ffc_inline +uint64_t ffc_uint64_hi64_1(uint64_t r0, bool* truncated) { + FFC_DEBUG_ASSERT(r0 != 0); + *truncated = false; + int shl = (int)ffc_count_leading_zeroes(r0); + return r0 << shl; +} + +ffc_internal ffc_inline +uint64_t ffc_uint64_hi64_2(uint64_t r0, uint64_t r1, bool* truncated) { + FFC_DEBUG_ASSERT(r0 != 0); + int shl = (int)ffc_count_leading_zeroes(r0); + if (shl == 0) { + *truncated = r1 != 0; + return r0; + } else { + int shr = 64 - shl; + *truncated = (r1 << shl) != 0; + return (r0 << shl) | (r1 >> shr); + } +} + +ffc_internal ffc_inline +uint64_t ffc_uint32_hi64_1(uint32_t r0, bool* truncated) { + return ffc_uint64_hi64_1(r0, truncated); +} + +ffc_internal ffc_inline +uint64_t ffc_uint32_hi64_2(uint32_t r0, uint32_t r1, bool* truncated) { + uint64_t x0 = r0; + uint64_t x1 = r1; + return ffc_uint64_hi64_1((x0 << 32) | x1, truncated); +} + +ffc_internal ffc_inline +uint64_t ffc_uint32_hi64_3(uint32_t r0, uint32_t r1, uint32_t r2, bool* truncated) { + uint64_t x0 = r0; + uint64_t x1 = r1; + uint64_t x2 = r2; + return ffc_uint64_hi64_2(x0, (x1 << 32) | x2, truncated); +} + +// add two small integers, checking for overflow. +// we want an efficient operation. for msvc, where +// we don't have built-in intrinsics, this is still +// pretty fast. +ffc_internal ffc_inline +ffc_bigint_limb ffc_bigint_scalar_add(ffc_bigint_limb x, ffc_bigint_limb y, bool* overflow) { + ffc_bigint_limb z; +// gcc and clang +#if defined(__has_builtin) +#if __has_builtin(__builtin_add_overflow) + *overflow = __builtin_add_overflow(x, y, &z); + return z; +#endif +#endif + + // generic, this still optimizes correctly on MSVC. + z = x + y; + *overflow = z < x; + return z; +} + +// multiply two small integers, getting both the high and low bits. +ffc_inline ffc_internal +ffc_bigint_limb ffc_bigint_scalar_mul(ffc_bigint_limb x, ffc_bigint_limb y, ffc_bigint_limb* carry) { +#ifdef FFC_64BIT_LIMB +#if defined(__SIZEOF_INT128__) + // GCC and clang both define it as an extension. + __uint128_t z = (__uint128_t)(x) * (__uint128_t)(y) + (__uint128_t)(*carry); + *carry = (ffc_bigint_limb)(z >> FFC_LIMB_BITS); + return (ffc_bigint_limb)(z); +#else + // fallback, no native 128-bit integer multiplication with carry. + // on msvc, this optimizes identically, somehow. + ffc_u128 z = ffc_full_multiplication(x, y); + bool overflow; + z.low = ffc_bigint_scalar_add(z.low, *carry, &overflow); + z.high += (uint64_t)(overflow); // cannot overflow + *carry = z.high; + return z.low; +#endif +#else + uint64_t z = (uint64_t)(x) * (uint64_t)(y) + (uint64_t)(*carry); + *carry = (ffc_bigint_limb)(z >> FFC_LIMB_BITS); + return (ffc_bigint_limb)(z); +#endif +} + +// add scalar value to bigint starting from offset. +// used in grade school multiplication +ffc_internal ffc_inline +bool ffc_bigint_small_add_from(ffc_sv* sv, ffc_bigint_limb y, size_t start) { + size_t index = start; + ffc_bigint_limb carry = y; + bool overflow; + while (carry != 0 && index < sv->len) { + sv->data[index] = ffc_bigint_scalar_add(sv->data[index], carry, &overflow); + carry = (ffc_bigint_limb)(overflow); + index += 1; + } + if (carry != 0) { + FFC_TRY(ffc_sv_try_push(sv, carry)); + } + return true; +} + +// add scalar value to bigint. +ffc_internal ffc_inline +bool ffc_bigint_small_add(ffc_sv* sv, ffc_bigint_limb y) { + return ffc_bigint_small_add_from(sv, y, 0); +} + +// multiply bigint by scalar value. +ffc_internal +bool ffc_bigint_small_mul(ffc_sv* sv, ffc_bigint_limb y) { + ffc_bigint_limb carry = 0; + for (size_t index = 0; index < sv->len; index++) { + sv->data[index] = ffc_bigint_scalar_mul(sv->data[index], y, &carry); + } + if (carry != 0) { + FFC_TRY(ffc_sv_try_push(sv, carry)); + } + return true; +} + +// add bigint to bigint starting from index. +// used in grade school multiplication +ffc_internal +bool ffc_bigint_large_add_from(ffc_sv* x, ffc_bigint_limb_span y, size_t start) { + // the effective x buffer is from `xstart..x.len()`, so exit early + // if we can't get that current range. + + // FFC_DIVERGE: We are calling our try_reserve instead of the o.g. try_resize + if (x->len < start || y.len > x->len - start) { + FFC_TRY(ffc_sv_try_reserve(x, y.len + start)); + } + + bool carry = false; + for (size_t index = 0; index < y.len; index++) { + ffc_bigint_limb xi = x->data[index + start]; + ffc_bigint_limb yi = ffc_span_index(y, index); + bool c1 = false; + bool c2 = false; + xi = ffc_bigint_scalar_add(xi, yi, &c1); + if (carry) { + xi = ffc_bigint_scalar_add(xi, 1, &c2); + } + x->data[index + start] = xi; + carry = c1 | c2; + } + + // handle overflow + if (carry) { + FFC_TRY(ffc_bigint_small_add_from(x, 1, y.len + start)); + } + return true; +} + +// add bigint to bigint. +ffc_internal ffc_inline +bool ffc_sv_large_add_from_zero(ffc_sv* x, ffc_bigint_limb_span y) { + return ffc_bigint_large_add_from(x, y, 0); +} + +// grade-school multiplication algorithm +ffc_internal +bool ffc_bigint_long_mul(ffc_sv* x, ffc_bigint_limb_span y) { + ffc_bigint_limb_span xs; + xs.ptr = x->data; + xs.len = x->len; + + // full copy of x into z + ffc_sv z = ffc_sv_create(xs); + + ffc_bigint_limb_span zs; + zs.ptr = z.data; + zs.len = z.len; + + if (y.len != 0) { + ffc_bigint_limb y0 = ffc_span_index(y, 0); + FFC_TRY(ffc_bigint_small_mul(x, y0)); + for (size_t index = 1; index < y.len; index++) { + + ffc_bigint_limb yi = ffc_span_index(y, index); + ffc_sv zi; // re-use the same buffer throughout + + if (yi != 0) { + zi.len = 0; + FFC_TRY(ffc_sv_try_extend(&zi, zs)); + FFC_TRY(ffc_bigint_small_mul(&zi, yi)); + ffc_bigint_limb_span zis; + zis.ptr = zi.data; + zis.len = zi.len; + FFC_TRY(ffc_bigint_large_add_from(x, zis, index)); + } + } + } + + ffc_sv_normalize(x); + return true; +} + +// grade-school multiplication algorithm +ffc_internal ffc_inline +bool ffc_bigint_large_mul(ffc_sv* x, ffc_bigint_limb_span y) { + if (y.len == 1) { + FFC_TRY(ffc_bigint_small_mul(x, ffc_span_index(y,0))); + } else { + FFC_TRY(ffc_bigint_long_mul(x, y)); + } + return true; +} + +static const uint32_t pow5_tables_large_step = 135; +static const uint64_t pow5_tables_small_powers[] = { + 1ULL, + 5ULL, + 25ULL, + 125ULL, + 625ULL, + 3125ULL, + 15625ULL, + 78125ULL, + 390625ULL, + 1953125ULL, + 9765625ULL, + 48828125ULL, + 244140625ULL, + 1220703125ULL, + 6103515625ULL, + 30517578125ULL, + 152587890625ULL, + 762939453125ULL, + 3814697265625ULL, + 19073486328125ULL, + 95367431640625ULL, + 476837158203125ULL, + 2384185791015625ULL, + 11920928955078125ULL, + 59604644775390625ULL, + 298023223876953125ULL, + 1490116119384765625ULL, + 7450580596923828125ULL, +}; +#ifdef FFC_64BIT_LIMB + static const ffc_bigint_limb ffc_large_power_of_5[] = { + 1414648277510068013ULL, 9180637584431281687ULL, 4539964771860779200ULL, + 10482974169319127550ULL, 198276706040285095ULL}; +#else + static const ffc_bigint_limb ffc_large_power_of_5[] = { + 4279965485U, 329373468U, 4020270615U, 2137533757U, 4287402176U, + 1057042919U, 1071430142U, 2440757623U, 381945767U, 46164893U}; +#endif + +// big integer type. implements a small subset of big integer +// arithmetic, using simple algorithms since asymptotically +// faster algorithms are slower for a small number of limbs. +// all operations assume the big-integer is normalized. +typedef struct ffc_bigint { + // storage of the limbs, in little-endian order. + ffc_sv vec; +} ffc_bigint; + +ffc_inline ffc_internal +ffc_bigint ffc_bigint_empty(void) { + ffc_sv sv; + sv.len = 0; + + ffc_bigint sv_bigint; + sv_bigint.vec = sv; + return sv_bigint; +} + +ffc_inline ffc_internal +ffc_bigint ffc_bigint_make(uint64_t value) { + ffc_sv sv; + sv.len = 0; +#ifdef FFC_64BIT_LIMB + ffc_sv_push_unchecked(&sv, value); +#else + ffc_sv_push_unchecked(&sv, (uint32_t)(value)); + ffc_sv_push_unchecked(&sv, (uint32_t)(value >> 32)); +#endif + ffc_sv_normalize(&sv); + + ffc_bigint sv_bigint; + sv_bigint.vec = sv; + return sv_bigint; +} + +// get the high 64 bits from the vector, and if bits were truncated. +// this is to get the significant digits for the float. +ffc_inline ffc_internal +uint64_t ffc_bigint_hi64(ffc_bigint me, bool* truncated) { + ffc_sv vec = me.vec; +#ifdef FFC_64BIT_LIMB + if (vec.len == 0) { + *truncated = false; + return 0; + } else if (vec.len == 1) { + return ffc_uint64_hi64_1(ffc_sv_rindex(vec,0), truncated); + } else { + uint64_t result = ffc_uint64_hi64_2(ffc_sv_rindex(vec, 0), ffc_sv_rindex(vec, 1), truncated); + *truncated |= ffc_sv_exists_nonzero_after(vec, 2); + return result; + } +#else + if (vec.len == 0) { + *truncated = false; + return 0; + } else if (vec.len == 1) { + return ffc_uint32_hi64_1(ffc_sv_rindex(vec,0), truncated); + } else if (vec.len == 2) { + return ffc_uint32_hi64_2(ffc_sv_rindex(vec,0), ffc_sv_rindex(vec,1), truncated); + } else { + uint64_t result = ffc_uint32_hi64_3( + ffc_sv_rindex(vec,0), + ffc_sv_rindex(vec,1), + ffc_sv_rindex(vec,2), + truncated + ); + *truncated |= ffc_sv_exists_nonzero_after(vec , 3); + return result; + } +#endif +} + +// compare two big integers, returning the large value. +// assumes both are normalized. if the return value is +// negative, other is larger, if the return value is +// positive, this is larger, otherwise they are equal. +// the limbs are stored in little-endian order, so we +// must compare the limbs in ever order. +ffc_internal ffc_inline +int ffc_bigint_compare(ffc_bigint me, ffc_bigint const *other) { + if (me.vec.len > other->vec.len) { + return 1; + } else if (me.vec.len < other->vec.len) { + return -1; + } else { + for (size_t index = me.vec.len; index > 0; index--) { + ffc_bigint_limb xi = ffc_sv_index(me.vec, index - 1); + ffc_bigint_limb yi = ffc_sv_index(other->vec, index - 1); + if (xi > yi) { + return 1; + } else if (xi < yi) { + return -1; + } + } + return 0; + } +} + +// shift left each limb n bits, carrying over to the new limb +// returns true if we were able to shift all the digits. +ffc_internal ffc_inline +bool ffc_bigint_shl_bits(ffc_bigint* me, size_t n) { + // Internally, for each item, we shift left by n, and add the previous + // right shifted limb-bits. + // For example, we transform (for u8) shifted left 2, to: + // b10100100 b01000010 + // b10 b10010001 b00001000 + FFC_DEBUG_ASSERT(n != 0); + FFC_DEBUG_ASSERT(n < sizeof(ffc_bigint_limb) * 8); + + size_t shl = n; + size_t shr = FFC_LIMB_BITS - shl; + ffc_bigint_limb prev = 0; + for (size_t index = 0; index < me->vec.len; index++) { + ffc_bigint_limb xi = ffc_sv_index(me->vec, index); + me->vec.data[index] = (xi << shl) | (prev >> shr); + prev = xi; + } + + ffc_bigint_limb carry = prev >> shr; + if (carry != 0) { + return ffc_sv_try_push(&me->vec, carry); + } + return true; +} + +// move the limbs left by `n` limbs. +ffc_internal ffc_inline +bool ffc_bigint_shl_limbs(ffc_bigint* me, size_t n) { + FFC_DEBUG_ASSERT(n != 0); + if (n + me->vec.len > SV_LIMB_COUNT) { + return false; + } else if (me->vec.len != 0) { + // move limbs + ffc_bigint_limb *dst = me->vec.data + n; + ffc_bigint_limb const *src = me->vec.data; + // std::copy_backward(src, src + vec.len(), dst + vec.len()); + // memmove to handle the overlap + memmove(dst, src, me->vec.len * sizeof(ffc_bigint_limb)); + + // fill in empty limbs + ffc_bigint_limb *first = me->vec.data; + // ffc_bigint_limb *last = first + n; + // ::std::fill(first, last, 0); + memset(first, 0, n * sizeof(ffc_bigint_limb)); + me->vec.len += n; + return true; + } else { + return true; + } +} + +// move the limbs left by `n` bits. +ffc_internal ffc_inline +bool ffc_bigint_shl(ffc_bigint* me, size_t n) { + size_t rem = n % FFC_LIMB_BITS; + size_t div = n / FFC_LIMB_BITS; + if (rem != 0) { + FFC_TRY(ffc_bigint_shl_bits(me, rem)); + } + if (div != 0) { + FFC_TRY(ffc_bigint_shl_limbs(me, div)); + } + return true; +} + +// get the number of leading zeros in the bigint. +ffc_internal ffc_inline +int ffc_bigint_ctlz(ffc_bigint me) { + if (me.vec.len == 0) { + return 0; + } else { +#ifdef FFC_64BIT_LIMB + return (int)ffc_count_leading_zeroes(ffc_sv_rindex(me.vec, 0)); +#else + // no use defining a specialized count_leading_zeros for a 32-bit type. + uint64_t r0 = ffc_sv_rindex(me.vec, 0); + return ffc_count_leading_zeroes(r0 << 32); +#endif + } +} + +// get the number of bits in the bigint. +ffc_internal ffc_inline +int ffc_bigint_bit_length(ffc_bigint me) { + int lz = ffc_bigint_ctlz(me); + return (int)(FFC_LIMB_BITS * me.vec.len) - lz; +} + +ffc_internal ffc_inline +bool ffc_bigint_mul(ffc_bigint* me, ffc_bigint_limb y) { return ffc_bigint_small_mul(&me->vec, y); } + +ffc_internal ffc_inline +bool ffc_bigint_add(ffc_bigint* me, ffc_bigint_limb y) { return ffc_bigint_small_add(&me->vec, y); } + +// multiply as if by 2 raised to a power. +ffc_internal ffc_inline +bool ffc_bigint_pow2(ffc_bigint* me, uint32_t exp) { return ffc_bigint_shl(me, exp); } + +// multiply as if by 5 raised to a power. +ffc_internal ffc_inline +bool ffc_bigint_pow5(ffc_bigint* me, uint32_t exp) { + // multiply by a power of 5 + size_t large_length = sizeof(ffc_large_power_of_5) / sizeof(ffc_bigint_limb); + + ffc_bigint_limb_span large; + large.ptr = (ffc_bigint_limb*)ffc_large_power_of_5; + large.len = large_length; + + while (exp >= pow5_tables_large_step) { + FFC_TRY(ffc_bigint_large_mul(&me->vec, large)); + exp -= pow5_tables_large_step; + } +#ifdef FFC_64BIT_LIMB + uint32_t small_step = 27; + ffc_bigint_limb max_native = 7450580596923828125UL; +#else + uint32_t small_step = 13; + ffc_bigint_limb max_native = 1220703125U; +#endif + while (exp >= small_step) { + FFC_TRY(ffc_bigint_small_mul(&me->vec, max_native)); + exp -= small_step; + } + if (exp != 0) { + FFC_TRY( + ffc_bigint_small_mul(&me->vec, (ffc_bigint_limb)(pow5_tables_small_powers[exp])) + ); + } + + return true; +} + +// multiply as if by 10 raised to a power. +ffc_internal ffc_inline +bool ffc_bigint_pow10(ffc_bigint* me, uint32_t exp) { + FFC_TRY(ffc_bigint_pow5(me, exp)); + return ffc_bigint_pow2(me, exp); +} + +#undef ffc_span_index +#undef sv_rindex + +#endif // FFC_BIGINT_H + +// 1e0 to 1e19 +static const uint64_t ffc_powers_of_ten_uint64[] = {1UL, + 10UL, + 100UL, + 1000UL, + 10000UL, + 100000UL, + 1000000UL, + 10000000UL, + 100000000UL, + 1000000000UL, + 10000000000UL, + 100000000000UL, + 1000000000000UL, + 10000000000000UL, + 100000000000000UL, + 1000000000000000UL, + 10000000000000000UL, + 100000000000000000UL, + 1000000000000000000UL, + 10000000000000000000UL}; + +// calculate the exponent, in scientific notation, of the number. +// this algorithm is not even close to optimized, but it has no practical +// effect on performance: in order to have a faster algorithm, we'd need +// to slow down performance for faster algorithms, and this is still fast. +ffc_inline ffc_internal int32_t +ffc_scientific_exponent(uint64_t mantissa, int32_t exponent) { + while (mantissa >= 10000) { + mantissa /= 10000; + exponent += 4; + } + while (mantissa >= 100) { + mantissa /= 100; + exponent += 2; + } + while (mantissa >= 10) { + mantissa /= 10; + exponent += 1; + } + return exponent; +} + +// this converts a native floating-point number to an extended-precision float. +ffc_internal ffc_inline ffc_adjusted_mantissa +ffc_to_extended(ffc_value value, ffc_value_kind vk) { + uint64_t const exponent_mask = ffc_const(vk, EXPONENT_MASK); + uint64_t const mantissa_mask = ffc_const(vk, MANTISSA_MASK); + uint64_t const hidden_bit_mask = ffc_const(vk, HIDDEN_BIT_MASK); + + // Finish converting this one + ffc_adjusted_mantissa am; + int32_t bias = ffc_const(vk, MANTISSA_EXPLICIT_BITS) - ffc_const(vk, MINIMUM_EXPONENT); + //ffc_int_value bits = ffc_get_value_bits(value, vk); + + switch (vk) { + case FFC_VALUE_KIND_DOUBLE: { + uint64_t bits = ffc_get_double_bits(value.d); + if ((bits & exponent_mask) == 0) { + // denormal + am.power2 = 1 - bias; + am.mantissa = bits & mantissa_mask; + } else { + // normal + am.power2 = (int32_t)((bits & exponent_mask) >> ffc_const(vk, MANTISSA_EXPLICIT_BITS)); + am.power2 -= bias; + am.mantissa = (bits & mantissa_mask) | hidden_bit_mask; + } + break; + } + case FFC_VALUE_KIND_FLOAT: { + uint32_t bits = ffc_get_float_bits(value.f); + if ((bits & exponent_mask) == 0) { + // denormal + am.power2 = 1 - bias; + am.mantissa = bits & mantissa_mask; + } else { + // normal + am.power2 = (int32_t)((bits & exponent_mask) >> ffc_const(vk, MANTISSA_EXPLICIT_BITS)); + am.power2 -= bias; + am.mantissa = (bits & mantissa_mask) | hidden_bit_mask; + } + break; + } + } + + return am; +} + +// get the extended precision value of the halfway point between b and b+u. +// we are given a native float that represents b, so we need to adjust it +// halfway between b and b+u. +ffc_internal ffc_inline +ffc_adjusted_mantissa ffc_to_extended_halfway(ffc_value value, ffc_value_kind vk) { + ffc_adjusted_mantissa am = ffc_to_extended(value, vk); + am.mantissa <<= 1; + am.mantissa += 1; + am.power2 -= 1; + return am; +} + +ffc_internal ffc_inline +void ffc_round_down_impl(ffc_adjusted_mantissa* am, int32_t shift) { + if (shift == 64) { + am->mantissa = 0; + } else { + am->mantissa >>= shift; + } + am->power2 += shift; +} + +// round nearest tie even, resolving ties using the truncated flag +ffc_internal ffc_inline +void ffc_round_nearest_tie_even_truncated(ffc_adjusted_mantissa *am, + int32_t shift, bool truncated) { + uint64_t const mask = (shift == 64) ? UINT64_MAX : ((uint64_t)(1) << shift) - 1; + uint64_t const halfway = (shift == 0) ? 0 : (uint64_t)(1) << (shift - 1); + uint64_t truncated_bits = am->mantissa & mask; + bool is_above = truncated_bits > halfway; + bool is_halfway = truncated_bits == halfway; + + // shift digits into position + if (shift == 64) { + am->mantissa = 0; + } else { + am->mantissa >>= shift; + } + am->power2 += shift; + + bool is_odd = (am->mantissa & 1) == 1; + am->mantissa += (uint64_t)(is_above || (is_halfway && truncated) || + (is_odd && is_halfway)); +} + +// round nearest tie even, resolving ties using a precomputed comparison result +ffc_internal ffc_inline +void ffc_round_nearest_tie_even_cmp(ffc_adjusted_mantissa *am, + int32_t shift, int ord) { + // shift digits into position + if (shift == 64) { + am->mantissa = 0; + } else { + am->mantissa >>= shift; + } + am->power2 += shift; + + bool is_odd = (am->mantissa & 1) == 1; + am->mantissa += (uint64_t)(ord > 0 || (ord == 0 && is_odd)); +} + +// Finalize rounding for a double after the shift callback has been applied. +// This is the common tail of ffc_round_double factored out to avoid +// duplicating the carry/infinite checks at every call site. +ffc_internal ffc_inline +void ffc_round_finish(ffc_adjusted_mantissa *am, ffc_value_kind vk) { + // check for carry + if (am->mantissa >= ((uint64_t)(2) << ffc_const(vk, MANTISSA_EXPLICIT_BITS))) { + am->mantissa = ((uint64_t)(1) << ffc_const(vk, MANTISSA_EXPLICIT_BITS)); + am->power2++; + } + + // check for infinite: we could have carried to an infinite power + am->mantissa &= ~((uint64_t)(1) << ffc_const(vk, MANTISSA_EXPLICIT_BITS)); + if (am->power2 >= ffc_const(vk, INFINITE_POWER)) { + am->power2 = ffc_const(vk, INFINITE_POWER); + am->mantissa = 0; + } +} + +// Common structure for round_double variants +#define ffc_round_core(sub_routine_call, vk) \ + do { \ + int32_t mantissa_shift = 64 - ffc_const(vk, MANTISSA_EXPLICIT_BITS) - 1; \ + if (-am->power2 >= mantissa_shift) { \ + int32_t _shift = -am->power2 + 1; \ + if (_shift > 64) _shift = 64; \ + sub_routine_call; \ + am->power2 = (am->mantissa < ((uint64_t)(1) << ffc_const(vk, MANTISSA_EXPLICIT_BITS))) \ + ? 0 \ + : 1; \ + return; \ + } \ + { int32_t _shift = mantissa_shift; sub_routine_call; } \ + ffc_round_finish(am, vk); \ + } while (0) + +// round down variant of round_double +ffc_internal ffc_inline +void ffc_round_down(ffc_adjusted_mantissa *am, ffc_value_kind vk) { + ffc_round_core(ffc_round_down_impl(am, _shift), vk); +} + +// tie-even (truncated) variant of round_double +ffc_internal ffc_inline +void ffc_round_tie_even_truncated(ffc_adjusted_mantissa *am, + bool truncated, ffc_value_kind vk) { + ffc_round_core(ffc_round_nearest_tie_even_truncated(am, _shift, truncated), vk); +} + +// tie-even (comparison) variant of round_double +ffc_internal ffc_inline +void ffc_round_double_tie_even_cmp(ffc_adjusted_mantissa *am, int ord, ffc_value_kind vk) { + ffc_round_core(ffc_round_nearest_tie_even_cmp(am, _shift, ord), vk); +} + +/* 1-byte chars (char, uint8_t) */ +#define FFC_INT_CMP_ZEROS_1 0x3030303030303030ULL +#define FFC_INT_CMP_LEN_1 8 + +/* 2-byte chars (uint16_t, wchar_t on Windows) */ +#define FFC_INT_CMP_ZEROS_2 0x0030003000300030ULL +#define FFC_INT_CMP_LEN_2 4 + +/* 4-byte chars (uint32_t, wchar_t on Linux) */ +#define FFC_INT_CMP_ZEROS_4 0x0000003000000030ULL +#define FFC_INT_CMP_LEN_4 2 + +ffc_internal ffc_inline +bool ffc_char_eq_zero(char const *p, size_t char_width) { + switch (char_width) { + case 1: return *p == '0'; + case 2: return *(uint16_t const *)p == 0x0030; + case 4: return *(uint32_t const *)p == 0x00000030; + default: return false; + } +} + +ffc_internal ffc_inline +void ffc_skip_zeros(char **first, char *last, size_t char_width) { + size_t cmp_len; + uint64_t cmp_mask; + switch (char_width) { + case 1: + cmp_len = FFC_INT_CMP_LEN_1; + cmp_mask = FFC_INT_CMP_ZEROS_1; + break; + case 2: + cmp_len = FFC_INT_CMP_LEN_2; + cmp_mask = FFC_INT_CMP_ZEROS_2; + break; + case 4: + cmp_len = FFC_INT_CMP_LEN_4; + cmp_mask = FFC_INT_CMP_ZEROS_4; + break; + default: + FFC_DEBUG_ASSERT(0); + return; + } + + uint64_t val; + while ((size_t)(last - *first) >= (cmp_len * char_width)) { + memcpy(&val, *first, sizeof(uint64_t)); + if (val != cmp_mask) { + break; + } + *first += cmp_len * char_width; + } + while (*first != last) { + if (!ffc_char_eq_zero(*first, char_width)) { + break; + } + *first += char_width; + } +} + +// determine if any non-zero digits were truncated. +// all characters must be valid digits. +ffc_internal ffc_inline +bool ffc_is_truncated(char const *first, char const *last, size_t char_width) { + size_t cmp_len; + uint64_t cmp_mask; + switch (char_width) { + case 1: + cmp_len = FFC_INT_CMP_LEN_1; + cmp_mask = FFC_INT_CMP_ZEROS_1; + break; + case 2: + cmp_len = FFC_INT_CMP_LEN_2; + cmp_mask = FFC_INT_CMP_ZEROS_2; + break; + case 4: + cmp_len = FFC_INT_CMP_LEN_4; + cmp_mask = FFC_INT_CMP_ZEROS_4; + break; + default: + FFC_DEBUG_ASSERT(0); + return 0; + } + // do 8-bit optimizations, can just compare to 8 literal 0s. + uint64_t val; + while ((size_t)(last - first) >= cmp_len) { + memcpy(&val, first, sizeof(uint64_t)); + if (val != cmp_mask) { + return true; + } + first += cmp_len * char_width; + } + while (first != last) { + if (!ffc_char_eq_zero(first, char_width)) { + return true; + } + first += char_width; + } + return false; +} + +ffc_internal ffc_inline +void ffc_parse_eight_digits(char **p, ffc_bigint_limb *value, size_t *counter, size_t *count) { + *value = *value * 100000000 + ffc_parse_eight_digits_unrolled(*p); + *p += 8; + *counter += 8; + *count += 8; +} + +ffc_internal ffc_inline +void ffc_parse_one_digit(char **p, ffc_bigint_limb *value, size_t *counter, size_t *count) { + *value = *value * 10 + (ffc_bigint_limb)(**p - '0'); + *p += 1; + *counter += 1; + *count += 1; +} + +ffc_internal ffc_inline +void ffc_add_native(ffc_bigint *big, ffc_bigint_limb power, ffc_bigint_limb value) { + ffc_bigint_mul(big, power); + ffc_bigint_add(big, value); +} + +ffc_internal ffc_inline +void ffc_round_up_bigint(ffc_bigint *big, size_t *count) { + // need to round-up the digits, but need to avoid rounding + // ....9999 to ...10000, which could cause a false halfway point. + ffc_add_native(big, 10, 1); + (*count)++; +} + +// parse the significant digits into a big integer +ffc_internal ffc_inline +void ffc_parse_mantissa(ffc_bigint *result, ffc_parsed num, + size_t max_digits, size_t *const digits) { + // try to minimize the number of big integer and scalar multiplication. + // therefore, try to parse 8 digits at a time, and multiply by the largest + // scalar value (9 or 19 digits) for each step. + size_t counter = 0; + *digits = 0; + ffc_bigint_limb value = 0; +#ifdef FFC_64BIT_LIMB + size_t step = 19; +#else + size_t step = 9; +#endif + + // process all integer digits. + char *p = num.int_part_start; + char *pend = p + num.int_part_len; + ffc_skip_zeros(&p, pend, 1); + // process all digits, in increments of step per loop + while (p != pend) { + while ((pend - p >= 8) && (step - counter >= 8) && + (max_digits - *digits >= 8)) { + ffc_parse_eight_digits(&p, &value, &counter, digits); + } + while (counter < step && p != pend && *digits < max_digits) { + ffc_parse_one_digit(&p, &value, &counter, digits); + } + if (*digits == max_digits) { + // add the temporary value, then check if we've truncated any digits + ffc_add_native(result, (ffc_bigint_limb)(ffc_powers_of_ten_uint64[counter]), value); + bool truncated = ffc_is_truncated(p, pend, 1); + if (num.fraction_part_start != NULL) { + truncated |= ffc_is_truncated(num.fraction_part_start, num.fraction_part_start + num.fraction_part_len, 1); + } + if (truncated) { + ffc_round_up_bigint(result, digits); + } + return; + } else { + ffc_add_native(result, (ffc_bigint_limb)(ffc_powers_of_ten_uint64[counter]), value); + counter = 0; + value = 0; + } + } + + // add our fraction digits, if they're available. + if (num.fraction_part_start != NULL) { + p = num.fraction_part_start; + pend = p + num.fraction_part_len; + if (*digits == 0) { + ffc_skip_zeros(&p, pend, 1); + } + // process all digits, in increments of step per loop + while (p != pend) { + while ((pend - p >= 8) && (step - counter >= 8) && + (max_digits - *digits >= 8)) { + ffc_parse_eight_digits(&p, &value, &counter, digits); + } + while (counter < step && p != pend && *digits < max_digits) { + ffc_parse_one_digit(&p, &value, &counter, digits); + } + if (*digits == max_digits) { + // add the temporary value, then check if we've truncated any digits + ffc_add_native(result, (ffc_bigint_limb)(ffc_powers_of_ten_uint64[counter]), value); + bool truncated = ffc_is_truncated(p, pend, 1); + if (truncated) { + ffc_round_up_bigint(result, digits); + } + return; + } else { + ffc_add_native(result, (ffc_bigint_limb)(ffc_powers_of_ten_uint64[counter]), value); + counter = 0; + value = 0; + } + } + } + + if (counter != 0) { + ffc_add_native(result, (ffc_bigint_limb)(ffc_powers_of_ten_uint64[counter]), value); + } +} + +ffc_internal ffc_inline +ffc_adjusted_mantissa ffc_positive_digit_comp(ffc_bigint *bigmant, int32_t exponent, ffc_value_kind vk) { + for (int i = 0; i < bigmant->vec.len; i++) { + ffc_debug("limb %d: %lld\n", i, bigmant->vec.data[i]); + } + FFC_ASSERT(ffc_bigint_pow10(bigmant, (uint32_t)(exponent))); + ffc_adjusted_mantissa answer; + bool truncated; + answer.mantissa = ffc_bigint_hi64(*bigmant, &truncated); + int bias = ffc_const(vk, MANTISSA_EXPLICIT_BITS) - ffc_const(vk, MINIMUM_EXPONENT); + answer.power2 = ffc_bigint_bit_length(*bigmant) - 64 + bias; + + ffc_round_tie_even_truncated(&answer, truncated, vk); + + return answer; +} + +// the scaling here is quite simple: we have, for the real digits `m * 10^e`, +// and for the theoretical digits `n * 2^f`. Since `e` is always negative, +// to scale them identically, we do `n * 2^f * 5^-f`, so we now have `m * 2^e`. +// we then need to scale by `2^(f- e)`, and then the two significant digits +// are of the same magnitude. +ffc_internal ffc_inline +ffc_adjusted_mantissa ffc_negative_digit_comp( + ffc_bigint *bigmant, ffc_adjusted_mantissa am, int32_t exponent, ffc_value_kind vk) { + ffc_bigint *real_digits = bigmant; + int32_t real_exp = exponent; + + // get the value of `b`, rounded down, and get a bigint representation of b+h + ffc_adjusted_mantissa am_b = am; + ffc_round_down(&am_b, vk); + ffc_value b; + ffc_am_to_float(false, am_b, &b, FFC_VALUE_KIND_DOUBLE); + + ffc_adjusted_mantissa theor = ffc_to_extended_halfway(b, vk); + ffc_bigint theor_digits = ffc_bigint_make(theor.mantissa); + int32_t theor_exp = theor.power2; + + // scale real digits and theor digits to be same power. + int32_t pow2_exp = theor_exp - real_exp; + uint32_t pow5_exp = (uint32_t)(-real_exp); + if (pow5_exp != 0) { + FFC_ASSERT(ffc_bigint_pow5(&theor_digits, pow5_exp)); + } + if (pow2_exp > 0) { + FFC_ASSERT(ffc_bigint_pow2(&theor_digits, (uint32_t)(pow2_exp))); + } else if (pow2_exp < 0) { + FFC_ASSERT(ffc_bigint_pow2(real_digits,(uint32_t)(-pow2_exp))); + } + + // compare digits, and use it to direct rounding + int ord = ffc_bigint_compare(*real_digits, &theor_digits); + ffc_adjusted_mantissa answer = am; + ffc_round_double_tie_even_cmp(&answer, ord, vk); + + return answer; +} + +// parse the significant digits as a big integer to unambiguously round +// the significant digits. here, we are trying to determine how to round +// an extended float representation close to `b+h`, halfway between `b` +// (the float rounded-down) and `b+u`, the next positive float. this +// algorithm is always correct, and uses one of two approaches. when +// the exponent is positive relative to the significant digits (such as +// 1234), we create a big-integer representation, get the high 64-bits, +// determine if any lower bits are truncated, and use that to direct +// rounding. in case of a negative exponent relative to the significant +// digits (such as 1.2345), we create a theoretical representation of +// `b` as a big-integer type, scaled to the same binary exponent as +// the actual digits. we then compare the big integer representations +// of both, and use that to direct rounding. +ffc_internal +ffc_adjusted_mantissa ffc_digit_comp(ffc_parsed num, ffc_adjusted_mantissa am, ffc_value_kind vk) { + ffc_debug("digit_comp\n"); + // remove the invalid exponent bias + am.power2 -= FFC_INVALID_AM_BIAS; + + int32_t sci_exp = + ffc_scientific_exponent(num.mantissa, (int32_t)(num.exponent)); + size_t max_digits = ffc_const(vk, MAX_DIGITS); + size_t digits = 0; + ffc_bigint bigmant = ffc_bigint_empty(); + ffc_parse_mantissa(&bigmant, num, max_digits, &digits); + // can't underflow, since digits is at most max_digits. + int32_t exponent = sci_exp + 1 - (int32_t)(digits); + if (exponent >= 0) { + return ffc_positive_digit_comp(&bigmant, exponent, vk); + } else { + return ffc_negative_digit_comp(&bigmant, am, exponent, vk); + } +} + +#endif // FFC_DIGIT_COMPARISON_H + +/* section: decimal to binary */ + +ffc_inline ffc_internal +ffc_u128 ffc_compute_product_approximation(int64_t q, uint64_t w, ffc_value_kind vk) { + // The required precision is mantissa_explicit_bits + 3 because + // 1. We need the implicit bit + // 2. We need an extra bit for rounding purposes + // 3. We might lose a bit due to the "upperbit" routine (result too small, + // requiring a shift) + uint64_t bit_precision = ffc_const(vk, MANTISSA_EXPLICIT_BITS) + 3; + uint64_t precision_mask = ((uint64_t)(0xFFFFFFFFFFFFFFFF) >> bit_precision); + + // Use FFC_DOUBLE_SMALLEST_POWER_OF_10 (-342) regardless of value kind + // to index into the shared 128 bit table + int const index = 2 * (int)(q - FFC_DOUBLE_SMALLEST_POWER_OF_10); + + // For small values of q, e.g., q in [0,27], the answer is always exact + // because ffc_mul_u64(w, powers_of_five[index]) gives the exact answer. + ffc_u128 firstproduct = ffc_mul_u64(w, FFC_POWERS_OF_FIVE[index]); + + if ((firstproduct.high & precision_mask) == precision_mask) { + // could further guard with (lower + w < lower) + // regarding the second product, we only need secondproduct.hi, but our + // expectation is that the compiler will optimize this extra work away if + // needed. + ffc_u128 secondproduct = ffc_mul_u64(w, FFC_POWERS_OF_FIVE[index + 1]); + + firstproduct.low += secondproduct.high; + if (secondproduct.high > firstproduct.low) { + firstproduct.high++; + } + } + return firstproduct; +} + +/** + * For q in (0,350), we have that + * f = (((152170 + 65536) * q ) >> 16); + * is equal to + * floor(p) + q + * where + * p = log(5**q)/log(2) = q * log(5)/log(2) + * + * For negative values of q in (-400,0), we have that + * f = (((152170 + 65536) * q ) >> 16); + * is equal to + * -ceil(p) + q + * where + * p = log(5**-q)/log(2) = -q * log(5)/log(2) + + * FF_DIVERGE: renamed from detail::power to b10_to_b2 + */ +ffc_internal ffc_inline int32_t ffc_b10_to_b2(int32_t q) { + return (((152170 + 65536) * q) >> 16) + 63; +} + +// Computes w * 10 ** q. +// The returned value should be a valid number that simply needs to be +// packed. However, in some very rare cases, the computation will fail. In such +// cases, we return an adjusted_mantissa with a negative power of 2: the caller +// should recompute in such cases. +ffc_inline ffc_internal +ffc_adjusted_mantissa ffc_compute_float(int64_t q, uint64_t w, ffc_value_kind vk) { + ffc_adjusted_mantissa answer; + if ((w == 0) || (q < ffc_const(vk, SMALLEST_POWER_OF_10))) { + answer.power2 = 0; + answer.mantissa = 0; + // result should be zero + return answer; + } + if (q > ffc_const(vk, LARGEST_POWER_OF_10)) { + // we want to get infinity: + answer.power2 = ffc_const(vk, INFINITE_POWER); + answer.mantissa = 0; + return answer; + } + // At this point in time q is in [powers::smallest_power_of_five, + // powers::largest_power_of_five]. + + // We want the most significant bit of i to be 1. Shift if needed. + int32_t lz = (int32_t)ffc_count_leading_zeroes(w); + w <<= lz; + + ffc_u128 product = ffc_compute_product_approximation(q, w, vk); + + // The computed 'product' is always sufficient. + // Mathematical proof: + // Noble Mushtak and Daniel Lemire, Fast Number Parsing Without Fallback (to + // appear) See script/mushtak_lemire.py + + // The "compute_product_approximation" function can be slightly slower than a + // branchless approach: value128 product = compute_product(q, w); but in + // practice, we can win big with the compute_product_approximation if its + // additional branch is easily predicted. Which is data specific. + int upperbit = (int)(product.high >> 63); + int shift = upperbit + 64 - ffc_const(vk, MANTISSA_EXPLICIT_BITS) - 3; + + answer.mantissa = product.high >> shift; + + // compute a biased-up power of 2 + answer.power2 = (int32_t)(ffc_b10_to_b2((int32_t)(q)) + upperbit - lz - ffc_const(vk, MINIMUM_EXPONENT)); + + if (answer.power2 <= 0) { // subnormal path + + if (-answer.power2 + 1 >= 64) { + // if we have more than 64 bits below the minimum exponent, you + // have a zero for sure. + answer.power2 = 0; + answer.mantissa = 0; + // result should be zero + return answer; + } + // shift is safe because -answer.power2 + 1 < 64 + answer.mantissa >>= -answer.power2 + 1; + + // Thankfully, we can't have both "round-to-even" and subnormals because + // "round-to-even" only occurs for powers close to 0 in the 32-bit and + // and 64-bit case (with no more than 19 digits), so we round up. + answer.mantissa += (answer.mantissa & 1); // round up + answer.mantissa >>= 1; + + // weird scenario: + // Suppose we start with 2.2250738585072013e-308, we end up + // with 0x3fffffffffffff x 2^-1023-53 which is technically subnormal + // whereas 0x40000000000000 x 2^-1023-53 is normal. Now, we need to round + // up 0x3fffffffffffff x 2^-1023-53 and once we do, we are no longer + // subnormal, but we can only know this after rounding. + // So we only declare a subnormal if we are smaller than the threshold. + answer.power2 = + (answer.mantissa < ((uint64_t)(1) << ffc_const(vk, MANTISSA_EXPLICIT_BITS))) ? 0 : 1; + return answer; + } // subnormal + + // usually, we round *up*, but if we fall right in between and we have an + // even basis, we need to round down + // We are only concerned with the cases where 5**q fits in single 64-bit word. + + // 'extremely sparse low bits in our product' and + // 'q is within the round to even range' and + // 'mantissa lowest 2 bits are exactly 01' + if ((product.low <= 1) && (q >= ffc_const(vk, MIN_EXPONENT_ROUND_TO_EVEN)) && + (q <= ffc_const(vk, MAX_EXPONENT_ROUND_TO_EVEN)) && + ((answer.mantissa & 3) == 1)) { // we may fall between two floats! + // + // To be in-between two floats we need that in doing + // answer.mantissa = product.high >> (upperbit + 64 - + // binary::mantissa_explicit_bits() - 3); + // ... we dropped out only zeroes. But if this happened, then we can go + // back!!! + + // mask off last bit + if ((answer.mantissa << shift) == product.high) { + answer.mantissa &= ~(uint64_t)(1); + } + } + + answer.mantissa += (answer.mantissa & 1); // round up + answer.mantissa >>= 1; + if (answer.mantissa >= ((uint64_t)(2) << ffc_const(vk, MANTISSA_EXPLICIT_BITS))) { + answer.mantissa = ((uint64_t)(1) << ffc_const(vk, MANTISSA_EXPLICIT_BITS)); + answer.power2++; // undo previous addition + } + + // normalize to pos INF? + answer.mantissa &= ~((uint64_t)(1) << ffc_const(vk, MANTISSA_EXPLICIT_BITS)); + if (answer.power2 >= ffc_const(vk, INFINITE_POWER)) { // infinity + answer.power2 = ffc_const(vk, INFINITE_POWER); + answer.mantissa = 0; + } + return answer; +} + +// create an adjusted mantissa, biased by the invalid power2 +// for significant digits already multiplied by 10 ** q. +ffc_internal ffc_inline +ffc_adjusted_mantissa ffc_compute_error_scaled(int64_t q, uint64_t w, int lz, ffc_value_kind vk) { + int hilz = (int)(w >> 63) ^ 1; + ffc_adjusted_mantissa answer; + answer.mantissa = w << hilz; + int bias = ffc_const(vk, MANTISSA_EXPLICIT_BITS) - ffc_const(vk, MINIMUM_EXPONENT); + answer.power2 = (int32_t)(ffc_b10_to_b2((int32_t)(q)) + bias - hilz - lz - 62 + + FFC_INVALID_AM_BIAS); + return answer; +} + +// w * 10 ** q, without rounding the representation up. +// the power2 in the exponent will be adjusted by invalid_am_bias. +ffc_internal ffc_inline +ffc_adjusted_mantissa ffc_compute_error(int64_t q, uint64_t w, ffc_value_kind vk) { + int32_t lz = (int32_t)ffc_count_leading_zeroes(w); + w <<= lz; + ffc_u128 product = ffc_compute_product_approximation(q, w, vk); + return ffc_compute_error_scaled(q, product.high, lz, vk); +} + +/* end section: decimal to binary */ + +/* section: entrypoint */ + +ffc_internal ffc_inline +bool ffc_clinger_fast_path_impl(uint64_t mantissa, int64_t exponent, bool is_negative, + ffc_value *value, ffc_value_kind value_kind) { + bool is_double = value_kind == FFC_VALUE_KIND_DOUBLE; + // The implementation of the Clinger's fast path is convoluted because + // we want round-to-nearest in all cases, irrespective of the rounding mode + // selected on the thread. + // We proceed optimistically, assuming that detail::rounds_to_nearest() + // returns true. + if (ffc_const(value_kind, MIN_EXPONENT_FAST_PATH) <= exponent && + exponent <= ffc_const(value_kind, MAX_EXPONENT_FAST_PATH)) { + // Unfortunately, the conventional Clinger's fast path is only possible + // when the system rounds to the nearest float. + // + // We expect the next branch to almost always be selected. + // We could check it first (before the previous branch), but + // there might be performance advantages at having the check + // be last. + if (ffc_rounds_to_nearest()) { + // We have that fegetround() == FE_TONEAREST. + // Next is Clinger's fast path. + if (mantissa <= ffc_const(value_kind, MAX_MANTISSA_FAST_PATH)) { + ffc_set_value(value, value_kind, mantissa); + + if (exponent < 0) { + if (is_double) { + value->d = value->d / FFC_DOUBLE_POWERS_OF_TEN[-exponent]; + } else { + value->f = value->f / FFC_FLOAT_POWERS_OF_TEN[-exponent]; + }; + } else { + if (is_double) { + value->d = value->d * FFC_DOUBLE_POWERS_OF_TEN[exponent]; + } else { + value->f = value->f * FFC_FLOAT_POWERS_OF_TEN[exponent]; + }; + } + if (is_negative) { + ffc_set_value(value, value_kind, -ffc_read_value(value, value_kind)); + } + return true; + } + } else { + // We do not have that fegetround() == FE_TONEAREST. + // Next is a modified Clinger's fast path, inspired by Jakub Jelínek's + // proposal + if (exponent >= 0 && + mantissa <= ffc_const(value_kind, MAX_MANTISSA)[exponent]) { +#if defined(__clang__) || defined(FFC_32BIT) + // Clang may map 0 to -0.0 when fegetround() == FE_DOWNWARD + if (mantissa == 0) { + ffc_set_value(value, value_kind, is_negative ? -0. : 0.); + return true; + } +#endif + if (is_double) { + value->d = (double)mantissa * FFC_DOUBLE_POWERS_OF_TEN[exponent]; + } else { + value->f = (float)mantissa * FFC_FLOAT_POWERS_OF_TEN[exponent]; + } + if (is_negative) { + ffc_set_value(value, value_kind, -ffc_read_value(value, value_kind)); + } + return true; + } + } + } + return false; +} + +ffc_internal ffc_inline +ffc_result ffc_from_chars_advanced(ffc_parsed const pns, ffc_value* value, ffc_value_kind vk) { + ffc_result answer; + + answer.outcome = FFC_OUTCOME_OK; // be optimistic :') + answer.ptr = (char*)pns.lastmatch; + + if (!pns.too_many_digits && + ffc_clinger_fast_path_impl(pns.mantissa, pns.exponent, pns.negative, value, vk)) { + ffc_debug("fast path hit"); + return answer; + } + + ffc_adjusted_mantissa am = ffc_compute_float(pns.exponent, pns.mantissa, vk); + ffc_debug("am.mantissa: %llu\n", am.mantissa); + ffc_debug("am.power2: %d\n", am.power2); + if (pns.too_many_digits && am.power2 >= 0) { + ffc_adjusted_mantissa am_plus_one = ffc_compute_float(pns.exponent, pns.mantissa + 1, vk); + bool equal = am.mantissa == am_plus_one.mantissa && am.power2 == am_plus_one.power2; + if (!equal) { + am = ffc_compute_error(pns.exponent, pns.mantissa, vk); + } + } + // If we called ffc_compute_float(pns.exponent, pns.mantissa) + // and we have an invalid power (am.power2 < 0), then we need to go the long + // way around again. This is very uncommon. + if (am.power2 < 0) { + am = ffc_digit_comp(pns, am, vk); + } + ffc_debug("am post mantissa: %llu\n", am.mantissa); + ffc_debug("am post power2: %d\n", am.power2); + ffc_am_to_float(pns.negative, am, value, vk); + + // Test for over/underflow. + if ((pns.mantissa != 0 && am.mantissa == 0 && am.power2 == 0) || + am.power2 == ffc_const(vk, INFINITE_POWER)) { + answer.outcome = FFC_OUTCOME_OUT_OF_RANGE; + } + return answer; +} + +ffc_internal ffc_inline +ffc_result ffc_from_chars(char* first, char* last, ffc_parse_options options, ffc_value *value, ffc_value_kind vk) { + + // Alias for parity with cpp code, no feature macros to apply + ffc_format const fmt = options.format; + + ffc_result answer; + if ((uint64_t)(fmt & FFC_FORMAT_FLAG_SKIP_WHITE_SPACE)) { + while ((first != last) && ffc_is_space(*first)) { + first++; + } + } + if (first == last) { + answer.outcome = FFC_OUTCOME_INVALID_INPUT; + answer.ptr = first; + return answer; + } + uint64_t json_mode = (uint64_t)(fmt & FFC_FORMAT_FLAG_BASIC_JSON); + ffc_parsed pns = ffc_parse_number_string(first, last, options, json_mode); + + #ifdef FFC_DEBUG + ffc_dump_parsed(pns); + #endif + + if (!pns.valid) { + if ((uint64_t)(fmt & FFC_FORMAT_FLAG_NO_INFNAN)) { + answer.outcome = FFC_OUTCOME_INVALID_INPUT; + answer.ptr = first; + return answer; + } else { + return ffc_parse_infnan(first, last, value, vk, fmt); + } + } + + // call overload that takes parsed_number_string directly. + return ffc_from_chars_advanced(pns, value, vk); +} + +ffc_result ffc_from_chars_double_options(const char *start, const char *end, double* out, ffc_parse_options options) { + // It would be UB to directly use *out as our ffc_value, even though its the same layout + ffc_value out_value = {0}; + + // The all-important call with a constant VALUE_KIND that should cascade in tons of inlining + ffc_result result = ffc_from_chars((char*)start, (char*)end, options, &out_value, FFC_VALUE_KIND_DOUBLE); + *out = out_value.d; + return result; +} +ffc_result ffc_from_chars_double(char const* first, char const* last, double* out) { + ffc_parse_options options = ffc_parse_options_default(); + return ffc_from_chars_double_options(first, last, out, options); +} +ffc_result ffc_parse_double(size_t len, const char *s, double *out) { + char *pend = (char*)(s + len); + return ffc_from_chars_double(s, pend, out); +} +double ffc_parse_double_simple(size_t len, const char *s, ffc_outcome *outcome) { + double out = 0.0; + ffc_result result = ffc_parse_double(len, s, &out); + if (outcome) { + *outcome = result.outcome; + } + return out; +} + +ffc_result ffc_from_chars_float_options(const char *start, const char *end, float* out, ffc_parse_options options) { + ffc_value out_value = {0}; + ffc_result result = ffc_from_chars((char*)start, (char*)end, options, &out_value, FFC_VALUE_KIND_FLOAT); + *out = out_value.f; + return result; +} +ffc_result ffc_from_chars_float(char const* first, char const* last, float* out) { + ffc_parse_options options = ffc_parse_options_default(); + return ffc_from_chars_float_options(first, last, out, options); +} +ffc_result ffc_parse_float(size_t len, const char *s, float *out) { + char *pend = (char*)(s + len); + return ffc_from_chars_float(s, pend, out); +} +float ffc_parse_float_simple(size_t len, const char *s, ffc_outcome *outcome) { + float out = 0.0; + ffc_result result = ffc_parse_float(len, s, &out); + if (outcome) { + *outcome = result.outcome; + } + return out; +} + +ffc_result ffc_parse_i64(size_t len, const char *input, int base, int64_t *out) { + char *pend = (char*)(input + len); + ffc_int_value value_out = {0}; + ffc_result result = ffc_parse_int_string(input, pend, &value_out, FFC_INT_KIND_S64, ffc_parse_options_default(), base); + *out = value_out.s64; + return result; +} +ffc_result ffc_parse_u64(size_t len, const char *input, int base, uint64_t *out) { + char *pend = (char*)(input + len); + ffc_int_value value_out = {0}; + ffc_result result = ffc_parse_int_string(input, pend, &value_out, FFC_INT_KIND_U64, ffc_parse_options_default(), base); + *out = value_out.u64; + return result; +} +ffc_result ffc_parse_i32(size_t len, const char *input, int base, int32_t *out) { + char *pend = (char*)(input + len); + ffc_int_value value_out = {0}; + ffc_result result = ffc_parse_int_string(input, pend, &value_out, FFC_INT_KIND_S32, ffc_parse_options_default(), base); + *out = value_out.s32; + return result; +} +ffc_result ffc_parse_u32(size_t len, const char *input, int base, uint32_t *out) { + char *pend = (char*)(input + len); + ffc_int_value value_out = {0}; + ffc_result result = ffc_parse_int_string(input, pend, &value_out, FFC_INT_KIND_U32, ffc_parse_options_default(), base); + *out = value_out.u32; + return result; +} + +int64_t ffc_parse_i64_simple(size_t len, const char *input, int base, ffc_outcome *outcome) { + int64_t out = 0; + ffc_result result = ffc_parse_i64(len, input, base, &out); + if (outcome) { + *outcome = result.outcome; + } + return out; +} +uint64_t ffc_parse_u64_simple(size_t len, const char *input, int base, ffc_outcome *outcome) { + uint64_t out = 0; + ffc_result result = ffc_parse_u64(len, input, base, &out); + if (outcome) { + *outcome = result.outcome; + } + return out; +} +int32_t ffc_parse_i32_simple(size_t len, const char *input, int base, ffc_outcome *outcome) { + int32_t out = 0; + ffc_result result = ffc_parse_i32(len, input, base, &out); + if (outcome) { + *outcome = result.outcome; + } + return out; +} +uint32_t ffc_parse_u32_simple(size_t len, const char *input, int base, ffc_outcome *outcome) { + uint32_t out = 0; + ffc_result result = ffc_parse_u32(len, input, base, &out); + if (outcome) { + *outcome = result.outcome; + } + return out; +} + +#undef FFC_DOUBLE_SMALLEST_POWER_OF_10 +#undef FFC_DOUBLE_LARGEST_POWER_OF_10 +#undef FFC_DOUBLE_SIGN_INDEX +#undef FFC_DOUBLE_INFINITE_POWER +#undef FFC_DOUBLE_MANTISSA_EXPLICIT_BITS +#undef FFC_DOUBLE_MINIMUM_EXPONENT +#undef FFC_DOUBLE_MIN_EXPONENT_ROUND_TO_EVEN +#undef FFC_DOUBLE_MAX_EXPONENT_ROUND_TO_EVEN +#undef FFC_DOUBLE_MAX_EXPONENT_FAST_PATH +#undef FFC_DOUBLE_MAX_MANTISSA_FAST_PATH +#undef FFC_DOUBLE_EXPONENT_MASK +#undef FFC_DOUBLE_MANTISSA_MASK +#undef FFC_DOUBLE_HIDDEN_BIT_MASK +#undef FFC_DOUBLE_MAX_DIGITS + +#undef FFC_FLOAT_SMALLEST_POWER_OF_10 +#undef FFC_FLOAT_LARGEST_POWER_OF_10 +#undef FFC_FLOAT_SIGN_INDEX +#undef FFC_FLOAT_INFINITE_POWER +#undef FFC_FLOAT_MANTISSA_EXPLICIT_BITS +#undef FFC_FLOAT_MINIMUM_EXPONENT +#undef FFC_FLOAT_MIN_EXPONENT_ROUND_TO_EVEN +#undef FFC_FLOAT_MAX_EXPONENT_ROUND_TO_EVEN +#undef FFC_FLOAT_MAX_EXPONENT_FAST_PATH +#undef FFC_FLOAT_MAX_MANTISSA_FAST_PATH +#undef FFC_FLOAT_EXPONENT_MASK +#undef FFC_FLOAT_MANTISSA_MASK +#undef FFC_FLOAT_HIDDEN_BIT_MASK +#undef FFC_FLOAT_MAX_DIGITS + +#undef FFC_POWERS_OF_5_NUMBER_OF_ENTRIES + +#endif /* FFC_IMPL */ + +#ifdef __cplusplus +} /* extern "C" */ +#endif + +#endif /* FFC_H */ + diff --git a/c_src/jiffy.c b/c_src/jiffy.c index 3083455..c44952a 100644 --- a/c_src/jiffy.c +++ b/c_src/jiffy.c @@ -5,6 +5,10 @@ // a single translation unit so the compiler can optimize across file // boundaries without lto +// Pull in ffc.h implementation once for the whole unity build. +#define FFC_IMPL +#include "ffc.h" + #include "decoder.c" #include "encoder.c" #include "objects.c"
