[jira] [Updated] (SOLR-2094) When using an XPathEntityProcessor nested within a SqlEntityProcessor, the xpathReader isn't reinitialized for each new document

Noble Paul (JIRA) Wed, 26 Oct 2016 03:57:55 -0700

     [ https://issues.apache.org/jira/browse/SOLR-2094?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]


Noble Paul updated SOLR-2094:
-----------------------------
    Description: 
I have a DIH config with a SqlEntityProcessor that retrieves a table. It has 
a sub-entity of type XPathEntityProcessor, which takes a value from each table 
row as input for parsing an XML document. 
The first document is created correctly, but the xpathReader of the 
XPathEntityProcessor is not reinitialized for the following documents, so the 
first document's input is reused for all of them. 

{code:xml}
<dataConfig>
    <dataSource name="hivseqdb" driver="com.mysql.jdbc.Driver"
                url="l"
                user="hivseqdb" password="hivseqdb" batchSize="1"/>

    <dataSource name="xmlFile" type="FileDataSource"/>

    <document>
        <entity name="Sequence" dataSource="hivseqdb" pk="se_id"
                query="SELECT * FROM hivseqdb.sequenceentry where se_id != '1'">

            <entity name="FMA_Tissue_Hierarchy"
                    dataSource="xmlFile"
                    pk="fma-id"
                    forEach="/tissue-samples"
                    processor="XPathEntityProcessor"
                    url="/opt/hivseqdb/solr/conf/sub_ontology_translated.xml"
                    stream="true">
                <field column="tissue-antology-parent-path"
                       xpath="/tissue-samples/tissue[@fma-id='${Sequence.sampleTissueCode}']/parent-path"/>
            </entity>
        </entity>
    </document>
</dataConfig>
{code}
DocBuilder does call init on the XPathEntityProcessor, but the init method 
only builds the xpathReader if it is still null:
{code:java}

  public void init(Context context) {
    super.init(context);
    // The reader is built only on the first call; later calls reuse it.
    if (xpathReader == null)
      initXpathReader();
    pk = context.getEntityAttribute("pk");
    dataSource = context.getDataSource();
    rowIterator = null;
  }
{code}
So the same xpathReader is reused again and again. Is there a way to 
reinitialize the xpathReader for every document? Or what is the specific 
design reason for preserving it?
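
To make the suspected mechanism concrete, here is a minimal, self-contained 
sketch. It is not Solr code and not the attached SOLR-2094.patch. The classes 
Reader and Processor, the resolve helper, the rebuildOnInit flag, and the 
tissue codes "A" and "B" are all hypothetical stand-ins. It assumes, as the 
report suggests, that the ${...} placeholder in the xpath is resolved at the 
moment the reader is built.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.List;
import java.util.Map;

public class StaleReaderSketch {

    /** Hypothetical stand-in for a compiled XPath reader: it freezes the resolved expression. */
    static final class Reader {
        final String resolvedXpath;

        Reader(String resolvedXpath) {
            this.resolvedXpath = resolvedXpath;
        }
    }

    /** Hypothetical stand-in for the entity processor; only the init logic is modelled. */
    static final class Processor {
        private final String xpathTemplate;
        private final boolean rebuildOnInit;
        private Reader reader;

        Processor(String xpathTemplate, boolean rebuildOnInit) {
            this.xpathTemplate = xpathTemplate;
            this.rebuildOnInit = rebuildOnInit;
        }

        /** Called once per parent row, mirroring how init is called for every document. */
        void init(Map<String, String> parentRow) {
            // With rebuildOnInit == false this is the null check from the report:
            // the reader, and the xpath resolved into it, survive across rows.
            if (rebuildOnInit || reader == null) {
                reader = new Reader(resolve(xpathTemplate, parentRow));
            }
        }

        String currentXpath() {
            return reader.resolvedXpath;
        }
    }

    /** Replaces each ${key} placeholder in the template with the value from the row. */
    static String resolve(String template, Map<String, String> row) {
        String out = template;
        for (Map.Entry<String, String> e : row.entrySet()) {
            out = out.replace("${" + e.getKey() + "}", e.getValue());
        }
        return out;
    }

    /** Runs one init call per tissue code and records the xpath in effect after each call. */
    static List<String> run(boolean rebuildOnInit, List<String> tissueCodes) {
        Processor p = new Processor(
            "/tissue-samples/tissue[@fma-id='${Sequence.sampleTissueCode}']/parent-path",
            rebuildOnInit);
        List<String> seen = new ArrayList<>();
        for (String code : tissueCodes) {
            p.init(Collections.singletonMap("Sequence.sampleTissueCode", code));
            seen.add(p.currentXpath());
        }
        return seen;
    }

    public static void main(String[] args) {
        List<String> codes = Arrays.asList("A", "B"); // made-up tissue codes
        System.out.println("cached : " + run(false, codes));
        System.out.println("rebuilt: " + run(true, codes));
    }
}
```

With the null check in place, the second row still sees the xpath resolved 
for "A". Rebuilding the reader on every init call picks up "B". Whether 
rebuilding per row is acceptable cost-wise in the real processor is a separate 
question that this sketch does not answer.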


                
                

  was:
I have a DIH config with a SqlEntityProcessor that retrieves a table. It has 
a sub-entity of type XPathEntityProcessor, which takes a value from each table 
row as input for parsing an XML document. 
The first document is created correctly, but the xpathReader of the 
XPathEntityProcessor is not reinitialized for the following documents, so the 
first document's input is reused for all of them. 


<dataSource name="hivseqdb" driver="com.mysql.jdbc.Driver"
           url="l"
           user="hivseqdb" password="hivseqdb" batchSize="1"/>
           
    <dataSource name="xmlFile" type="FileDataSource" />
    
        <document><entity name="Sequence" dataSource="hivseqdb" pk="se_id" 
query="SELECT * FROM hivseqdb.sequenceentry where se_id != '1'">
                        
            <entity name="FMA_Tissue_Hierarchy" 
                        dataSource="xmlFile"
                        pk="fma-id"
                        forEach="/tissue-samples" 
                        processor="XPathEntityProcessor" 
                        
url="/opt/hivseqdb/solr/conf/sub_ontology_translated.xml" 
                        stream="true">
                <field column="tissue-antology-parent-path" 
xpath="/tissue-samples/tissue[@fma-id='${Sequence.sampleTissueCode}']/parent-path"/>
            </entity>

DocBuilder does call init on the XPathEntityProcessor, but the init method 
only builds the xpathReader if it is still null:

  public void init(Context context) {
    super.init(context);
    if (xpathReader == null)
      initXpathReader();
    pk = context.getEntityAttribute("pk");
    dataSource = context.getDataSource();
    rowIterator = null;

  }

So the same xpathReader is reused again and again. Is there a way to 
reinitialize the xpathReader for every document? Or what is the specific 
design reason for preserving it?


                
                


> When using an XPathEntityProcessor nested within a SqlEntityProcessor, the 
> xpathReader isn't reinitialized for each new document 
> -------------------------------------------------------------------------------------------------------------------------------
>
>                 Key: SOLR-2094
>                 URL: https://issues.apache.org/jira/browse/SOLR-2094
>             Project: Solr
>          Issue Type: Bug
>          Components: contrib - DataImportHandler
>    Affects Versions: 1.4.1
>         Environment: Solr 1.4
>            Reporter: Niall O'Connor
>            Assignee: Alexandre Rafalovitch
>         Attachments: SOLR-2094.patch
>



