[ 
https://issues.apache.org/jira/browse/SOLR-17120?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17808705#comment-17808705
 ] 

Christine Poerschke commented on SOLR-17120:
--------------------------------------------

Thanks Calvin for testing and sharing your findings!

A couple of thoughts and questions, if I may, again with the caveat that I'm 
not very familiar with the partial update functionality. They concern the 
puzzle of {{olderDoc.getFieldNames()}} returning the {{fieldName}} while 
{{olderDoc.getFieldValues()}} returns no corresponding value(s). I am wondering:
 * if the {{getFieldValues}} return value of {{null}} is due to the field being 
set to {{null}} somehow or the field not being set – ref: 
[https://github.com/apache/lucene-solr/blob/releases/lucene-solr/8.11.2/solr/solrj/src/java/org/apache/solr/common/SolrInputDocument.java#L121-L127]
 * (to the extent that you are able to share) what the field's 
schema/definition is, e.g. whether the type makes a difference, or whether the 
field is required, has a default, is multiValued, etc.
 * about the sequence of events e.g. is it perhaps a case of a "remove this 
field" happening before there ever was an actual value for the field, or maybe 
there are two consecutive "remove this field" happenings, or variants on that 
theme in case of multiValued fields

The context for my curiosity is that if we understand more about how the 
name-but-no-value scenario arises, that could point to a different 
(complementary or alternative) code change, and/or we might discover 
'something else' that is also unexpected.
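
To make the first question above concrete, here is a minimal, self-contained sketch of how a document can list a field name yet return {{null}} for its values. {{StandInDoc}} and {{describe}} are hypothetical names invented for illustration; this is not the real {{SolrInputDocument}}, only a map-backed model that assumes a getter which returns {{null}} both for an absent field and for a field whose stored value is {{null}}.

```java
import java.util.ArrayList;
import java.util.Collection;
import java.util.LinkedHashMap;
import java.util.Map;

public class NullFieldSketch {

    /** Hypothetical stand-in for a Solr input document; NOT the SolrJ class. */
    public static class StandInDoc {
        private final Map<String, Object> fields = new LinkedHashMap<>();

        /** Stores the field name even when the value is null. */
        public void setField(String name, Object value) {
            fields.put(name, value);
        }

        public Collection<String> getFieldNames() {
            return fields.keySet();
        }

        /** Assumed behavior: null for an absent field AND for a null value. */
        public Collection<Object> getFieldValues(String name) {
            Object value = fields.get(name);
            if (value == null) {
                return null;
            }
            Collection<Object> vals = new ArrayList<>(1);
            vals.add(value);
            return vals;
        }
    }

    /** Tells apart the two reasons getFieldValues can return null. */
    public static String describe(StandInDoc doc, String name) {
        if (doc.getFieldValues(name) != null) {
            return "has values";
        }
        return doc.getFieldNames().contains(name) ? "set to null" : "not set";
    }

    public static void main(String[] args) {
        StandInDoc olderDoc = new StandInDoc();
        olderDoc.setField("id", "123");
        olderDoc.setField("camera_unit", null); // name present, value null
        for (String name : new String[] {"id", "camera_unit", "playlist"}) {
            System.out.println(name + ": " + describe(olderDoc, name));
        }
    }
}
```

In this model, iterating over {{getFieldNames()}} and then looping over {{getFieldValues(name)}} without a null check would fail for {{camera_unit}}, which matches the shape of the reported exception. Whether the real code path reaches that state the same way is exactly the open question.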

> NullPointerException in UpdateLog.applyOlderUpdates in solr 6.6-9.4 involving 
> partial updates
> ---------------------------------------------------------------------------------------------
>
>                 Key: SOLR-17120
>                 URL: https://issues.apache.org/jira/browse/SOLR-17120
>             Project: Solr
>          Issue Type: Bug
>          Components: update
>    Affects Versions: 6.6.2, 8.11.2, 9.4
>         Environment: The issue occurred on Linux, CentOS 7.9, with the 
> following JDK version:
> {noformat}
> openjdk version "11.0.20" 2023-07-18 LTS
> OpenJDK Runtime Environment (Red_Hat-11.0.20.0.8-1.el7_9) (build 
> 11.0.20+8-LTS)
> OpenJDK 64-Bit Server VM (Red_Hat-11.0.20.0.8-1.el7_9) (build 11.0.20+8-LTS, 
> mixed mode, sharing){noformat}
>            Reporter: Calvin Smith
>            Priority: Major
>
> I mailed the solr-users mailing list about this issue, but didn't get any 
> responses there, so am creating this issue. The subject of the email thread 
> for additional context was "NullPointerException in 
> UpdateLog.applyOlderUpdates under solr 8&9 involving partial updates and high 
> update load" - link: 
> [https://lists.apache.org/thread/n9zm4gocl7cf073syy1159dy6ojjrywl]
> I'm seeing a Solr HTTP 500 error when performing a partial update of a 
> document. It turns out to be triggered by a recent update of the same 
> document that included a partial update setting a field to {{null}}. I've 
> observed the behavior in versions 6.6.2, 8.11.2, and 9.4.0, which are the 
> only 3 versions I've tried.
> To give an example, an update doc like
>  
> {code:java}
> {
>     "id": "123", 
>     "camera_unit": {"set": null}
> }{code}
>  
> was followed shortly thereafter (I'm not sure of the exact timing, but I was 
> using a {{commitWithin}} of 600s and the subsequent updates came less than 
> 20 seconds later), after some other updates had happened for different 
> documents, by another update of the same document, like
>  
> {code:java}
> {
>     "id": "123", 
>     "playlist": {
>       "set": [
>         12345
>       ]
>     },
>     "playlist_index_321": {
>       "set": 0
>     }
> }{code}
>  
> This later update may, but doesn't always, cause the 
> {{NullPointerException}}, so there is some other factor, such as the state 
> of the {{tlog}}, that also has to be satisfied for the error to occur.
> The exception is thrown by the following code in {{UpdateLog.java}} 
> ({{org.apache.solr.update.UpdateLog}}):
>  
> {code:java}
> /** Add all fields from olderDoc into newerDoc if not already present in newerDoc */
>   private void applyOlderUpdates(
>       SolrDocumentBase<?, ?> newerDoc, SolrInputDocument olderDoc, Set<String> mergeFields) {
>     for (String fieldName : olderDoc.getFieldNames()) {
>       // if the newerDoc has this field, then this field from olderDoc can be ignored
>       if (!newerDoc.containsKey(fieldName)
>           && (mergeFields == null || mergeFields.contains(fieldName))) {
>         for (Object val : olderDoc.getFieldValues(fieldName)) {
>           newerDoc.addField(fieldName, val);
>         }
>       }
>     }
>   }{code}
>  
> The exception is due to the inner {{for}} statement trying to iterate over 
> the {{null}} value returned by {{olderDoc.getFieldValues(fieldName)}}.
> When I change that method to the following:
>  
> {code:java}
> /** Add all fields from olderDoc into newerDoc if not already present in newerDoc */
>   private void applyOlderUpdates(
>       SolrDocumentBase<?, ?> newerDoc, SolrInputDocument olderDoc, Set<String> mergeFields) {
>     for (String fieldName : olderDoc.getFieldNames()) {
>       // if the newerDoc has this field, then this field from olderDoc can be ignored
>       if (!newerDoc.containsKey(fieldName)
>           && (mergeFields == null || mergeFields.contains(fieldName))) {
>         Collection<Object> values = olderDoc.getFieldValues(fieldName);
>         if (values == null) {
>           newerDoc.addField(fieldName, null);
>         } else {
>           for (Object val : values) {
>             newerDoc.addField(fieldName, val);
>           }
>         }
>       }
>     }
>   }{code}
>  
> Then after rebuilding the solr-core JAR with {{./gradlew devFull}} and 
> restarting Solr with that custom jar file, I can no longer reproduce the 
> error.
> I'm not familiar with the Solr codebase though and am not at all sure that 
> {{newerDoc.addField(fieldName, null)}} is what should be done there.



