Christos Malliaridis created SOLR-18191:
-------------------------------------------
Summary: Disable lenient json parsing at API level
Key: SOLR-18191
URL: https://issues.apache.org/jira/browse/SOLR-18191
Project: Solr
Issue Type: Improvement
Components: JSON Request API, v2 API
Affects Versions: 10.0
Reporter: Christos Malliaridis
The current JSON parsing is lenient: when creating a resource, it allows values of fields such as the configset or collection name to be numbers instead of strings. This is problematic in cases where resources are auto-populated with numeric IDs.
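A minimal Java sketch of the difference between lenient and strict handling of a string-typed field. It assumes the request body has already been parsed into a Map in which JSON strings become String and JSON numbers become Number. StringFieldReader and its methods are hypothetical illustrations, not Solr code.

```java
import java.util.Map;

/**
 * Hypothetical helper (not Solr code) contrasting lenient and strict reading
 * of a string-typed request field such as a configset name. Assumes the JSON
 * body was parsed into a Map: JSON strings -> String, JSON numbers -> Number.
 */
final class StringFieldReader {

    private StringFieldReader() {}

    /** Lenient: any scalar is accepted and converted with toString(). */
    static String readLenient(Map<String, Object> body, String field) {
        Object value = body.get(field);
        if (value == null) {
            throw new IllegalArgumentException("Missing field: " + field);
        }
        // A JSON number such as 12345 silently becomes the string "12345".
        return value.toString();
    }

    /** Strict: only a JSON string is accepted; any other type is rejected. */
    static String readStrict(Map<String, Object> body, String field) {
        Object value = body.get(field);
        if (value instanceof String) {
            return (String) value;
        }
        String actual = value == null ? "missing" : value.getClass().getSimpleName();
        throw new IllegalArgumentException(
            "Field '" + field + "' must be a string, got: " + actual);
    }
}
```

With strict reading, a client sending a numeric ID gets an immediate, consistent type error instead of a conversion whose outcome depends on the shape of the number.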
h3. Problematic Use Cases
The following use cases for creating a configset (which apply to other resources as well) illustrate why lenient input is problematic:
||Numeric Input||Example Value||API Response||Notes||
|Normal Numbers|12345|200 _created_|Simple use case that makes sense; the number is converted to a string in the background|
|Prefixed / Padded Numbers|012345|500 error: _Invalid numeric value: Leading zeroes not allowed_|Leading zeros are allowed in strings but not in numbers (number parsing issue)|
|Prefixed / Padded Numbers as Strings|"012345"|200 _created_|The same value that caused an error before works when sent as a string|
|Negative Numbers|-12345|400 _Invalid configset: [-12345]. configset names must consist entirely of periods, underscores, hyphens, and alphanumerics as well as not start with a hyphen._|The negative number is converted to a string and fails because it starts with a hyphen|
|Extremely Large Numbers|99999...|500 _Number value length (6720) exceeds the maximum allowed (1000, from `StreamReadConstraints.getMaxNumberLength()`)_|Large numbers fail because they are first parsed as numbers|
|Extremely Large Numbers as Strings|"9999..."|200 _created_|Large values that previously failed succeed when sent as strings|
These use cases show that converting numbers to strings often fails and behaves inconsistently when lenient input is allowed.
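The Negative Numbers row follows from applying the naming rule quoted in its error message to the string form of the number. Below is a hedged Java approximation of that rule, written from the error text rather than from Solr's actual validator; ConfigsetNameRule is a hypothetical name, and "alphanumerics" is assumed to mean ASCII letters and digits.

```java
import java.util.regex.Pattern;

/**
 * Approximation (not Solr's validator) of the rule quoted in the error message:
 * names consist entirely of periods, underscores, hyphens and alphanumerics
 * (assumed ASCII here) and must not start with a hyphen.
 */
final class ConfigsetNameRule {
    // First character: any allowed character except '-'; the rest: the full allowed set.
    private static final Pattern NAME = Pattern.compile("[A-Za-z0-9._][A-Za-z0-9._-]*");

    private ConfigsetNameRule() {}

    static boolean isValid(String name) {
        return NAME.matcher(name).matches();
    }
}
```

Under lenient input, -12345 reaches a check like this as "-12345" and fails on the leading hyphen, whereas the JSON number 012345 never gets this far because the JSON parser rejects the leading zero first.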
h3. Proposed Solution
The proposed solution is to disable lenient JSON parsing in three steps, of which the first two may be skipped if we consider the current behavior a bug. Otherwise, since this is an API change, we should treat it as a breaking change and migrate as follows:
# Optional: Introduce a flag that enables lenient input, enabled by default, in the next 10.x release
# Optional: Change the flag's default from enabled to disabled in the following 10.x release (10.x+1)
# Remove the lenient input configuration for JSON parsing and require correct input types in 11.0
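The optional flag in steps 1 and 2 could look roughly like the following sketch. The system property name solr.json.lenientInput and the class LenientInputFlag are hypothetical and invented for illustration only. Passing the release default in as a parameter means moving from step 1 to step 2 only changes that argument, and step 3 deletes the flag altogether.

```java
/**
 * Hypothetical migration flag (not an existing Solr setting).
 * The property name below is invented for illustration.
 */
final class LenientInputFlag {
    static final String PROPERTY = "solr.json.lenientInput"; // hypothetical name

    private LenientInputFlag() {}

    /**
     * Returns whether lenient JSON input is enabled.
     * If the property is unset, the release default applies
     * (true in step 1, false in step 2); an explicit value always wins.
     */
    static boolean isLenient(boolean releaseDefault) {
        String raw = System.getProperty(PROPERTY);
        return raw == null ? releaseDefault : Boolean.parseBoolean(raw);
    }
}
```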
h3. Documentation Changes
We should also document the expected type and the validation rules that apply to each field, so that it is clear which inputs are valid in each case.
h3. Possible Issues
If we change the lenient parsing behavior globally (i.e., if one parser is used for all inputs), we should check which endpoints and fields are affected, and document at the endpoint level any fields whose behavior changes unexpectedly.
Note that a global change would also affect inputs that are expected to be numbers but are provided as strings.
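The reverse direction can be sketched in the same style: a numeric field (numShards is used here only as an example name) sent as the string "2" is accepted by lenient parsing but rejected in strict mode. IntFieldReader is a hypothetical helper, and the request body is assumed to have been parsed into a Map with JSON numbers as Integer or Long.

```java
import java.util.Map;

/** Hypothetical helper (not Solr code) for reading an integer request field. */
final class IntFieldReader {
    private IntFieldReader() {}

    /** Lenient: accepts an integral JSON number or a numeric string such as "2". */
    static int readLenient(Map<String, Object> body, String field) {
        Object value = body.get(field);
        if (value instanceof String) {
            // Numeric strings are parsed; non-numeric text throws NumberFormatException.
            return Integer.parseInt(((String) value).trim());
        }
        return readStrict(body, field);
    }

    /** Strict: accepts only an integral JSON number (Integer or Long). */
    static int readStrict(Map<String, Object> body, String field) {
        Object value = body.get(field);
        if (value instanceof Integer || value instanceof Long) {
            // toIntExact throws ArithmeticException if the value does not fit in an int.
            return Math.toIntExact(((Number) value).longValue());
        }
        String actual = value == null ? "missing" : value.getClass().getSimpleName();
        throw new IllegalArgumentException(
            "Field '" + field + "' must be an integer, got: " + actual);
    }
}
```

Clients that currently send numeric fields as strings would start receiving errors once strict parsing applies globally, which is why such fields need to be documented per endpoint.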
h3. What Is Not Considered Here
During testing I noticed that large numbers or long strings used as configset or collection names can also cause problems when they exceed 256 characters. This is out of scope here, as it is a separate validation issue that should be addressed on its own.
--
This message was sent by Atlassian Jira
(v8.20.10#820010)