[
https://issues.apache.org/jira/browse/SOLR-18191?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=18075241#comment-18075241
]
Christos Malliaridis commented on SOLR-18191:
---------------------------------------------
In my opinion, numbers should not be allowed where a string is expected. The
lenient behavior should be disabled, and only correctly typed JSON input
should be processed. So, theoretically, passing a number as the name in the
form {{123}} instead of {{"123"}} should return a 400 Bad Request error. The
API should be strongly typed and should not accept arbitrary input.
But that is only my opinion, and I am not sure whether it would break
workflows for users who pass numeric IDs as names without first converting
them to strings.
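As a rough sketch of the strict behavior described above: this is not Solr's parser. Python's standard json module stands in for it, and the helper name strict_name and the "name" field are hypothetical, chosen only for illustration.

```python
import json


def strict_name(body: str) -> tuple[int, str]:
    """Return (HTTP status, name or error message) for a JSON request body.

    Hypothetical helper: only a JSON string is accepted for "name".
    """
    try:
        payload = json.loads(body)
    except json.JSONDecodeError as exc:
        # Malformed JSON, e.g. a number with leading zeros such as 012345.
        return 400, f"Malformed JSON: {exc.msg}"
    if not isinstance(payload, dict) or "name" not in payload:
        return 400, 'Missing required field "name"'
    name = payload["name"]
    if not isinstance(name, str):
        # No implicit number-to-string conversion: 123 is rejected.
        return 400, f'"name" must be a string, got {type(name).__name__}'
    return 200, name


print(strict_name('{"name": "123"}'))
print(strict_name('{"name": 123}'))
```

With this approach {{"012345"}} and {{"-12345"}} are still passed on as strings, so any naming rules (such as not starting with a hyphen) would have to be enforced by a separate validation step.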
> Disable lenient json parsing at API level
> -----------------------------------------
>
> Key: SOLR-18191
> URL: https://issues.apache.org/jira/browse/SOLR-18191
> Project: Solr
> Issue Type: Improvement
> Components: JSON Request API, v2 API
> Affects Versions: 10.0
> Reporter: Christos Malliaridis
> Priority: Major
> Labels: V2, api, json
>
> The current JSON parsing is lenient and allows fields like configset or
> collection values during creation to be numbers instead of strings. This is
> problematic for cases where resources are auto-populated with numeric IDs.
> h3. Problematic Use Cases
> The following use cases for creating a configset (which also apply to other
> resources) illustrate why lenient input is problematic:
> ||Numeric Input||Example Value||API Response||Notes||
> |Normal Numbers|12345|200 _created_|Simple use case that would make sense; the number is converted to a string in the background|
> |Prefixed / Padded Numbers|012345|500 error: _Invalid numeric value: Leading zeroes not allowed_|Leading zeros are allowed in strings, but not in numbers (number parsing issue)|
> |Prefixed / Padded Numbers as Strings|"012345"|200 _created_|The same number that caused the error above works if converted to a string|
> |Negative Numbers|-12345|400 _Invalid configset: [-12345]. configset names must consist entirely of periods, underscores, hyphens, and alphanumerics as well as not start with a hyphen._|The negative number is converted to a string and fails because it starts with a hyphen|
> |Extremely large numbers|99999...|500 _Number value length (6720) exceeds the maximum allowed (1000, from `StreamReadConstraints.getMaxNumberLength()`)_|Large numbers fail at the number-parsing stage, before any conversion to string|
> |Extremely large numbers as strings|"9999..."|200 _created_|Large numbers that previously failed succeed when converted to a string beforehand|
> These use cases show that the conversion of numbers to strings often fails
> and is inconsistent when lenient input is used.
> h3. Proposed Solution
> The solution proposed here is to disable this json parsing behavior in 3
> steps, of which the first two may be skipped if we consider this a bug.
> Otherwise, since this is an API change, we should treat it as a breaking
> change and migrate as proposed below:
> # Optional: Introduce a flag for enabling the lenient input that is enabled
> by default in the next 10.x version
> # Optional: Change the default value from enabled to disabled in the 10.x+1
> version
> # Remove the lenient input configuration for json parsing and require
> correct input types in 11.0
> h3. Documentation Changes
> We should also document the expected types and the validation rules that
> apply, so that it is clear which inputs are valid in each case.
> h3. Possible Issues
> If we change the behavior for lenient input parsing globally (in case one
> parser is used for all inputs), we should check which endpoints / fields are
> affected by that, and document at endpoint level if specific fields are
> "unexpectedly changing".
> Note also that a global change would affect inputs that are expected to be
> numbers but are provided as strings.
> h3. What is not considered here
> During the tests I noticed that large numbers / large strings as inputs for
> configset names or collection names can also become problematic when
> exceeding 256 characters. This however should not yet be considered here, as
> it is another validation issue that should be addressed separately.
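The three-step migration in the proposed solution quoted above could look roughly like the following sketch. The lenient flag and the function coerce_name are hypothetical and do not exist in Solr; Python is used only for illustration.

```python
def coerce_name(value: object, lenient: bool) -> str:
    """Return the resource name as a string, or raise TypeError.

    `lenient` is a hypothetical flag: it would default to True in step 1,
    default to False in step 2, and be removed (always strict) in step 3.
    """
    if isinstance(value, str):
        return value
    # bool is a subclass of int in Python, so exclude it explicitly.
    if lenient and isinstance(value, int) and not isinstance(value, bool):
        return str(value)
    raise TypeError(f"expected a string name, got {type(value).__name__}")
```

With the flag enabled, -12345 is still converted to "-12345" and would then fail name validation because it starts with a hyphen, which matches the negative-number row in the table. With the flag disabled, the request is rejected on type alone, before name validation runs.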