[ 
https://issues.apache.org/jira/browse/SOLR-2960?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Michael Watts updated SOLR-2960:
--------------------------------

    Summary: XPathEntityProcessor does not clear nulls from empty multi-valued 
fields  (was: XPathEntityProcessor does not clear nulls from empty fields)
    
> XPathEntityProcessor does not clear nulls from empty multi-valued fields
> ------------------------------------------------------------------------
>
>                 Key: SOLR-2960
>                 URL: https://issues.apache.org/jira/browse/SOLR-2960
>             Project: Solr
>          Issue Type: Bug
>          Components: contrib - DataImportHandler
>            Reporter: Michael Watts
>            Priority: Minor
>             Fix For: 3.6
>
>
> I can't confidently say I completely understand everything that these 
> classes (XPathEntityProcessor and XPathRecordReader) so boldly tackle, but 
> there may be someone who does. Nonetheless, I think I've got some or most of 
> this right, and more likely there are others in the same position. So I 
> won't qualify everything I say with a maybe; consider this paragraph the 
> refactoring of all those maybes. 
> When mapping an XML file into a Solr index, the XPathRecordReader (used by 
> the XPathEntityProcessor within the DataImportHandler) handles multivalued 
> fields as follows. (A) If a multivalued field is perceived to be null, a 
> null is pushed onto its list of values (on top of any values it already 
> had). (B) Otherwise, any value found is pushed onto the field's existing 
> list of values, and the field is marked as found within the frame (a.k.a. 
> record). 
> (C) In general, when the end tag of a record is seen, the XPathRecordReader 
> clears the values of every field that has been marked as found, since they 
> are no longer needed. 
> However, suppose that for a given record and multivalued field, a value is 
> never found (though values may have been found for other fields in the 
> record). Then only (A) will have occurred and never (B), so the field is 
> never marked as found, and (C) never occurs for that field. 
> So the field remains, with its list of nulls. 
> This list of nulls grows until either the last record or a non-null value 
> is seen. 
> As a result, (1) an out-of-memory error may occur, given sufficiently many 
> records and a finite amount of memory. 
> Moreover, (2) a transformer cannot reliably depend on the number of nulls in 
> the field (and this information is not guaranteed to be recoverable from 
> some other value). 
> I will try to provide more information if this seems to be an issue and 
> there doesn't already seem to be an answer. 
> At this point, if I understand the problem correctly, the answer seems to be 
> to 'mark' those null fields as found, treating 'null' as an added value. 
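
The bookkeeping described in steps (A) through (C) can be sketched with a toy model. This is not the actual XPathRecordReader code: the class NullAccumulationSketch, its methods, the field names "title" and "tag", and the markNulls switch are all hypothetical names made up for illustration, and the real reader's data structures may differ. The sketch only shows why a field that receives nothing but nulls is never cleared, and how marking nulls as found (the fix suggested in the description) would change that.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

/** Toy model of the behaviour in the report; NOT the real XPathRecordReader. */
class NullAccumulationSketch {
    private final Map<String, List<Object>> values = new HashMap<>();
    private final Set<String> foundInRecord = new HashSet<>();
    // Hypothetical switch: true models the proposed fix (nulls count as found).
    private final boolean markNulls;

    NullAccumulationSketch(boolean markNulls) {
        this.markNulls = markNulls;
    }

    /** Cases (A) and (B): push a value, possibly null, onto a field's list. */
    void push(String field, Object value) {
        values.computeIfAbsent(field, k -> new ArrayList<>()).add(value);
        if (value != null || markNulls) {
            foundInRecord.add(field); // only marked fields get cleared later
        }
    }

    /** Case (C): at the record's end tag, clear only fields marked as found. */
    void endRecord() {
        for (String field : foundInRecord) {
            values.remove(field);
        }
        foundInRecord.clear();
    }

    int size(String field) {
        List<Object> list = values.get(field);
        return list == null ? 0 : list.size();
    }

    /** Feeds `records` records in which the field "tag" is always empty. */
    static int leftoverNulls(int records, boolean markNulls) {
        NullAccumulationSketch reader = new NullAccumulationSketch(markNulls);
        for (int i = 0; i < records; i++) {
            reader.push("title", "doc " + i); // has a value, so it is cleared
            reader.push("tag", null);          // empty field: null is pushed
            reader.endRecord();
        }
        return reader.size("tag");
    }

    public static void main(String[] args) {
        System.out.println(leftoverNulls(1000, false)); // prints 1000
        System.out.println(leftoverNulls(1000, true));  // prints 0
    }
}
```

In this model the list of nulls grows by one per record when nulls are not marked, which corresponds to consequence (1) in the description, and stays empty across records when they are.
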

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: 
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira