[
https://issues.apache.org/jira/browse/SOLR-2094?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Noble Paul updated SOLR-2094:
-----------------------------
Description:
I have a DIH config with a SqlEntityProcessor that retrieves a table. I then
have a sub-entity of type XPathEntityProcessor, which takes a value from
the table as input to parse through an XML doc.
I find that the first document is created correctly, but the xpathReader
of the XPathEntityProcessor is not reinitialized for the following documents,
so the first document's input is reused.
{code:xml}
<dataSource name="hivseqdb" driver="com.mysql.jdbc.Driver"
            url="l"
            user="hivseqdb" password="hivseqdb" batchSize="1"/>
<dataSource name="xmlFile" type="FileDataSource" />
<document>
  <entity name="Sequence" dataSource="hivseqdb" pk="se_id"
          query="SELECT * FROM hivseqdb.sequenceentry where se_id != '1'">
    <entity name="FMA_Tissue_Hierarchy"
            dataSource="xmlFile"
            pk="fma-id"
            forEach="/tissue-samples"
            processor="XPathEntityProcessor"
            url="/opt/hivseqdb/solr/conf/sub_ontology_translated.xml"
            stream="true">
      <field column="tissue-antology-parent-path"
             xpath="/tissue-samples/tissue[@fma-id='${Sequence.sampleTissueCode}']/parent-path"/>
    </entity>
  </entity>
</document>
{code}
DocBuilder does call init on the XPathEntityProcessor, but the init method
only initializes the xpathReader if it is still null:
{code:java}
public void init(Context context) {
  super.init(context);
  if (xpathReader == null)
    initXpathReader();
  pk = context.getEntityAttribute("pk");
  dataSource = context.getDataSource();
  rowIterator = null;
}
{code}
So the xpathReader is used again and again. Is there a way to reinitialize the
xpathReader for every document? Or what is the specific design reason for
preserving it?
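To make the symptom concrete, here is a minimal, self-contained Java sketch of
the behaviour described above. It is not Solr code: the class
XPathReinitSketch, its Processor stand-in, and the rebuildOnInit flag are all
hypothetical, and it assumes that the ${Sequence.sampleTissueCode} variable is
substituted at the moment the reader is built.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Illustrative sketch only: these classes are hypothetical stand-ins, not Solr/DIH APIs.
class XPathReinitSketch {

    // The xpath template from the config above, with its unresolved variable.
    static final String TEMPLATE =
        "/tissue-samples/tissue[@fma-id='${Sequence.sampleTissueCode}']/parent-path";

    // Assumption: the variable is substituted when the reader is built.
    static String resolve(String sampleTissueCode) {
        return TEMPLATE.replace("${Sequence.sampleTissueCode}", sampleTissueCode);
    }

    // Stand-in for the entity processor; the "reader" is just the resolved xpath.
    static final class Processor {
        private final boolean rebuildOnInit;
        private String reader; // null until the first init

        Processor(boolean rebuildOnInit) {
            this.rebuildOnInit = rebuildOnInit;
        }

        // Called once per parent row, as the report says DocBuilder does.
        void init(String sampleTissueCode) {
            // With rebuildOnInit == false this is the null check quoted above.
            if (rebuildOnInit || reader == null) {
                reader = resolve(sampleTissueCode);
            }
        }

        String currentXpath() {
            return reader;
        }
    }

    // Runs init once per tissue code and records the xpath in effect each time.
    static List<String> xpathsSeen(boolean rebuildOnInit, List<String> tissueCodes) {
        Processor processor = new Processor(rebuildOnInit);
        List<String> seen = new ArrayList<>();
        for (String code : tissueCodes) {
            processor.init(code);
            seen.add(processor.currentXpath());
        }
        return seen;
    }

    public static void main(String[] args) {
        List<String> codes = Arrays.asList("FMA-1", "FMA-2");
        System.out.println("null check only: " + xpathsSeen(false, codes));
        System.out.println("rebuild on init: " + xpathsSeen(true, codes));
    }
}
```

With the null check only, every row reports the xpath resolved for the first
tissue code; with the flag set, each row gets its own. Rebuilding on every init
would cost more when the xpath contains no variables, which may be why the
reader is preserved, but that is a guess.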
> When using an XPathEntityProcessor nested within a SqlEntityProcessor, the
> xpathReader isn't reinitialized for each new document
> -------------------------------------------------------------------------------------------------------------------------------
>
> Key: SOLR-2094
> URL: https://issues.apache.org/jira/browse/SOLR-2094
> Project: Solr
> Issue Type: Bug
> Components: contrib - DataImportHandler
> Affects Versions: 1.4.1
> Environment: Solr 1.4
> Reporter: Niall O'Connor
> Assignee: Alexandre Rafalovitch
> Attachments: SOLR-2094.patch
>
>