Christos Malliaridis created SOLR-18191:
-------------------------------------------

             Summary: Disable lenient json parsing at API level
                 Key: SOLR-18191
                 URL: https://issues.apache.org/jira/browse/SOLR-18191
             Project: Solr
          Issue Type: Improvement
          Components: JSON Request API, v2 API
    Affects Versions: 10.0
            Reporter: Christos Malliaridis


The current JSON parsing is lenient and allows fields such as configset or 
collection names to be given as numbers instead of strings during creation. 
This is problematic when resources are auto-populated with numeric IDs.
h3. Problematic Use Cases

The following use cases for creating a configset (the same applies to other 
resources) illustrate why lenient input is problematic:
||Numeric Input||Example Value||API Response||Notes||
|Normal numbers|12345|200 _created_|Simple use case that would make sense; the number is converted to a string in the background|
|Prefixed / padded numbers|012345|500 error: _Invalid numeric value: Leading zeroes not allowed_|Leading zeros are allowed in strings but not in numbers (number parsing issue)|
|Prefixed / padded numbers as strings|"012345"|200 _created_|The same value that caused the error above works when provided as a string|
|Negative numbers|-12345|400 _Invalid configset: [-12345]. configset names must consist entirely of periods, underscores, hyphens, and alphanumerics as well as not start with a hyphen._|The negative number is converted to a string and fails because it starts with a hyphen|
|Extremely large numbers|99999...|500 _Number value length (6720) exceeds the maximum allowed (1000, from `StreamReadConstraints.getMaxNumberLength()`)_|Large numbers fail at the number-parsing stage, before any conversion to a string|
|Extremely large numbers as strings|"9999..."|200 _created_|The large number that failed above succeeds when provided as a string|

These use cases show that the conversion of numbers to strings often fails and 
is inconsistent when lenient input is allowed.
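
The inconsistency in the table can be reproduced outside Solr with a minimal sketch. It uses Python's standard `json` module as a stand-in for Solr's parser. The helper names (`lenient_name`, `outcome`) and the `NAME_RULE` pattern are assumptions made for illustration; the pattern only approximates the rule quoted in the 400 response above. The large-number rows are left out because number-length limits differ between parsers.

```python
import json
import re

# Approximation of the naming rule quoted in the 400 response above;
# the actual Solr validation may differ.
NAME_RULE = re.compile(r"^(?!-)[A-Za-z0-9._-]+$")


def lenient_name(body: str) -> str:
    """Parse a request body and coerce the "name" value to a string."""
    value = json.loads(body)["name"]  # raises on malformed numbers such as 012345
    name = str(value)                 # lenient step: number -> string
    if not NAME_RULE.match(name):
        raise ValueError(f"Invalid configset: [{name}]")
    return name


def outcome(body: str) -> str:
    """Summarize what happens to a request body under lenient handling."""
    try:
        return "created: " + lenient_name(body)
    except json.JSONDecodeError:
        # Must be caught first: JSONDecodeError is a subclass of ValueError.
        return "parse error"
    except ValueError as err:
        return "rejected: " + str(err)


for body in ('{"name": 12345}', '{"name": 012345}',
             '{"name": "012345"}', '{"name": -12345}'):
    print(body, "->", outcome(body))
```

The same digits end up created, failing to parse, or rejected depending only on quoting and sign, which is the behavior the table describes.
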
h3. Proposed Solution

The proposed solution is to disable this lenient JSON parsing behavior in 3 
steps, of which the first two may be skipped if we consider this a bug. 
Otherwise, since this is an API change, we should treat it as a breaking change 
and migrate as proposed below:
 # Optional: Introduce a flag for lenient input that is enabled by default in the next 10.x version
 # Optional: Change the flag's default from enabled to disabled in the 10.x+1 version
 # Remove the lenient input configuration for JSON parsing and require correct input types in 11.0
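
The three steps above could be sketched roughly as follows. This is not Solr code: `read_string_field`, `lenient_for`, and `LENIENT_DEFAULT` are hypothetical names, the version labels are placeholders taken from the list, and Python's `json` module stands in for the real parser.

```python
import json
from typing import Optional

# Hypothetical defaults for steps 1 and 2; the keys are placeholder labels.
LENIENT_DEFAULT = {"10.x": True, "10.x+1": False}


def lenient_for(version: str, override: Optional[bool] = None) -> bool:
    """Decide whether lenient input applies for a given release."""
    if version not in LENIENT_DEFAULT:
        return False  # step 3 (11.0): option removed, always strict
    return LENIENT_DEFAULT[version] if override is None else override


def read_string_field(body: str, field: str, lenient: bool) -> str:
    """Return a string field; coerce numbers only when lenient is enabled."""
    value = json.loads(body)[field]
    if isinstance(value, str):
        return value
    # bool is a subclass of int in Python, so exclude it explicitly.
    is_number = isinstance(value, (int, float)) and not isinstance(value, bool)
    if lenient and is_number:
        return str(value)
    raise TypeError(f"'{field}' must be a string, got {type(value).__name__}")
```

In this sketch a user on the 10.x+1 release could still opt back in with `lenient_for("10.x+1", override=True)`, while the override is ignored once the option is removed.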

h3. Documentation Changes

We should also document the expected type and the validation rules for each 
input, so that it is clear which values are valid in each case.
h3. Possible Issues

If we change the behavior for lenient input parsing globally (in case one 
parser is used for all inputs), we should check which endpoints / fields are 
affected by that, and document at endpoint level if specific fields are 
"unexpectedly changing".

Note also that a global change affects inputs that are expected to be numbers 
but are provided as strings.
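
One way to find affected fields is to compare each request body against the types an endpoint expects. The sketch below assumes a hypothetical per-endpoint schema (`EXPECTED`) and helper (`affected_fields`); the field names are illustrative only, and Python's `json` module stands in for the real parser.

```python
import json

# Hypothetical schema for one endpoint: field name -> expected JSON type.
EXPECTED = {"name": str, "numShards": int}


def affected_fields(body: str, expected: dict) -> list:
    """List fields whose JSON type differs from the expected type."""
    data = json.loads(body)
    mismatches = []
    for field, expected_type in expected.items():
        if field not in data:
            continue
        value = data[field]
        matches = isinstance(value, expected_type)
        # bool is a subclass of int in Python; do not accept it for int fields.
        if expected_type is int and isinstance(value, bool):
            matches = False
        if not matches:
            mismatches.append(field)
    return mismatches


# A number where a string is expected, and a string where a number is expected.
print(affected_fields('{"name": 12345, "numShards": "2"}', EXPECTED))
```

Both directions are reported, so string fields given as numbers and numeric fields given as strings show up in the same audit.
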
h3. What is not considered here

During the tests I noticed that large numbers / large strings as inputs for 
configset names or collection names can also become problematic when exceeding 
256 characters. This however should not yet be considered here, as it is 
another validation issue that should be addressed separately.



