[ 
https://issues.apache.org/jira/browse/SOLR-13850?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Oleksandr Drapushko updated SOLR-13850:
---------------------------------------
    Description: 
Atomically updating fields that are not pre-analyzed causes the data in the 
document's pre-analyzed fields (if any) to be lost.

*Steps to reproduce*

1. Index this document into techproducts
{code:json}
{
  "id": "a",
  "n_s": "s1",
  "pre": 
"{\"v\":\"1\",\"str\":\"Alaska\",\"tokens\":[{\"t\":\"alaska\",\"s\":0,\"e\":6,\"i\":1}]}"
}
{code}

2. Query the document
{code:json}
{
  "response":{"numFound":1,"start":0,"maxScore":1.0,"docs":[
    {
      "id":"a",
      "n_s":"s1",
      "pre":"Alaska",
      "_version_":1647475215142223872}]
}}
{code}

3. Update using atomic syntax
{code:json}
{
  "add": {
    "doc": {
      "id": "a",
      "n_s": {"set": "s2"}
}}}
{code}

4. Observe the warning in the Solr log
UI:
{noformat}
 WARN x:techproducts_shard2_replica_n6 PreAnalyzedField Error parsing 
pre-analyzed field 'pre'
{noformat}

solr.log:
{noformat}
WARN (qtp1384454980-23) [c:techproducts s:shard2 r:core_node8 
x:techproducts_shard2_replica_n6] o.a.s.s.PreAnalyzedField Error parsing 
pre-analyzed field 'pre' => java.io.IOException: Invalid JSON type 
java.lang.String, expected Map
 at 
org.apache.solr.schema.JsonPreAnalyzedParser.parse(JsonPreAnalyzedParser.java:86)
{noformat}

5. Query the document again
{code:json}
{
  "response":{"numFound":1,"start":0,"maxScore":1.0,"docs":[
    {
      "id":"a",
      "n_s":"s2",
      "_version_":1647475461695995904}]
}}
{code}

*Result*: There is no 'pre' field in the document anymore.
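
The update requests in steps 1 and 3 can be scripted. The following is only a 
sketch, not part of the original report: the URL (host, port 8983, the 
commit=true parameter) and the helper names pre_analyzed_value, 
build_update_request and send are assumptions made for illustration. It wraps 
both documents in the same "add" command form used in step 3.

```python
import json
import urllib.request

# Assumption: a local Solr node serving the techproducts collection.
SOLR_UPDATE_URL = "http://localhost:8983/solr/techproducts/update?commit=true"


def pre_analyzed_value(text, tokens):
    """Serialize a pre-analyzed value: a JSON object embedded as a string."""
    return json.dumps({"v": "1", "str": text, "tokens": tokens})


def build_update_request(doc, url=SOLR_UPDATE_URL):
    """Build (but do not send) a JSON "add" command for one document."""
    body = json.dumps({"add": {"doc": doc}}).encode("utf-8")
    return urllib.request.Request(
        url,
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def send(request):
    """Send a prepared request and return the HTTP status code."""
    with urllib.request.urlopen(request) as response:
        return response.status


# Step 1: the full document, including the pre-analyzed field 'pre'.
initial_doc = {
    "id": "a",
    "n_s": "s1",
    "pre": pre_analyzed_value("Alaska", [{"t": "alaska", "s": 0, "e": 6, "i": 1}]),
}
index_request = build_update_request(initial_doc)

# Step 3: the atomic update touches only 'n_s'; 'pre' is not mentioned.
atomic_request = build_update_request({"id": "a", "n_s": {"set": "s2"}})
```

Calling send(index_request) followed by send(atomic_request) against a running 
node should correspond to steps 1 and 3; the document can then be queried as in 
steps 2 and 5.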


_My thoughts on it_

1. Data loss can be prevented by replacing the warning with an error 
(re-throwing the exception). Atomic updates for such documents still won't 
work, but they will be explicitly rejected instead of silently dropping the 
field.

2. Solr reads the document from the index, merges it with the input document, 
and re-indexes the result. However, the value it reads back for a pre-analyzed 
field is in a different format (the plain string "Alaska" in the query above, 
not the JSON object that was submitted), so Solr cannot parse and re-index 
that field properly.
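
The two thoughts above can be illustrated with a toy model. This is not Solr's 
code: parse_pre_analyzed and atomic_update are hypothetical stand-ins, and the 
"strict" flag is an assumption used to contrast the current warn-and-skip 
behaviour with the proposed re-throwing behaviour.

```python
import json


def parse_pre_analyzed(value: str) -> dict:
    """Toy parser: accept only a string that holds a JSON object."""
    try:
        parsed = json.loads(value)
    except ValueError:
        parsed = value  # not JSON at all, e.g. the stored string "Alaska"
    if not isinstance(parsed, dict):
        raise ValueError(
            f"Invalid JSON type {type(parsed).__name__}, expected Map"
        )
    return parsed


def atomic_update(stored: dict, update: dict, pre_fields: set, strict: bool) -> dict:
    """Merge 'update' into the stored document, then re-check every field."""
    merged = dict(stored)
    for name, op in update.items():
        # Only the {"set": value} operation is modelled here.
        merged[name] = op["set"] if isinstance(op, dict) else op

    reindexed = {}
    for name, value in merged.items():
        if name in pre_fields:
            try:
                parse_pre_analyzed(value)
            except ValueError:
                if strict:
                    raise  # proposed behaviour: reject the whole update
                continue  # current behaviour: warn and drop the field
        reindexed[name] = value
    return reindexed


# What the index returns for the document: 'pre' is the plain stored string.
stored = {"id": "a", "n_s": "s1", "pre": "Alaska"}
update = {"id": "a", "n_s": {"set": "s2"}}

lenient_result = atomic_update(stored, update, {"pre"}, strict=False)
```

In this model, lenient_result no longer contains 'pre', mirroring step 5, while 
calling atomic_update with strict=True raises instead of losing data.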


> Atomic Updates with PreAnalyzedField
> ------------------------------------
>
>                 Key: SOLR-13850
>                 URL: https://issues.apache.org/jira/browse/SOLR-13850
>             Project: Solr
>          Issue Type: Bug
>      Security Level: Public(Default Security Level. Issues are Public) 
>    Affects Versions: 7.7.2, 8.2
>         Environment: Ubuntu 16.04 LTS / Java 8 (Zulu), Windows 10 / Java 11 
> (Oracle)
>            Reporter: Oleksandr Drapushko
>            Priority: Critical
>              Labels: AtomicUpdate



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

