[ 
https://issues.apache.org/jira/browse/SOLR-7383?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Noble Paul updated SOLR-7383:
-----------------------------
    Description: 
The DIH example (solr/example/example-DIH/solr/rss/conf/rss-data-config.xml) is 
broken again. See associated issues.

Below is a config that should work.

This is caused by Slashdot seemingly oscillating between RDF/RSS and pure RSS. 
Perhaps we should depend upon something more static, rather than an external 
service that is free to change as it desires.
{code:xml}
<dataConfig>
    <dataSource type="URLDataSource" />
    <document>
        <entity name="slashdot"
                pk="link"
                url="http://rss.slashdot.org/Slashdot/slashdot"
                processor="XPathEntityProcessor"
                forEach="/RDF/item"
                transformer="DateFormatTransformer">
                                
            <field column="source" xpath="/RDF/channel/title" commonField="true" />
            <field column="source-link" xpath="/RDF/channel/link" commonField="true" />
            <field column="subject" xpath="/RDF/channel/subject" commonField="true" />
                        
            <field column="title" xpath="/RDF/item/title" />
            <field column="link" xpath="/RDF/item/link" />
            <field column="description" xpath="/RDF/item/description" />
            <field column="creator" xpath="/RDF/item/creator" />
            <field column="item-subject" xpath="/RDF/item/subject" />
            <field column="date" xpath="/RDF/item/date" dateTimeFormat="yyyy-MM-dd'T'HH:mm:ss" />
            <field column="slash-department" xpath="/RDF/item/department" />
            <field column="slash-section" xpath="/RDF/item/section" />
            <field column="slash-comments" xpath="/RDF/item/comments" />
        </entity>
    </document>
</dataConfig>
{code}
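
If the feed is served as pure RSS rather than RDF/RSS, the paths above will not match. A minimal sketch of the equivalent config follows, assuming the standard RSS 2.0 layout (rss/channel/item, with pubDate in RFC 822 style). The element paths and the date pattern are assumptions and have not been checked against the live feed.
{code:xml}
<dataConfig>
    <dataSource type="URLDataSource" />
    <document>
        <!-- Hypothetical variant: assumes a plain RSS 2.0 feed (rss/channel/item) -->
        <entity name="slashdot"
                pk="link"
                url="http://rss.slashdot.org/Slashdot/slashdot"
                processor="XPathEntityProcessor"
                forEach="/rss/channel/item"
                transformer="DateFormatTransformer">

            <field column="title" xpath="/rss/channel/item/title" />
            <field column="link" xpath="/rss/channel/item/link" />
            <field column="description" xpath="/rss/channel/item/description" />
            <!-- Assumed pubDate pattern, e.g. "Wed, 15 Mar 2017 10:00:00 +0000" -->
            <field column="date" xpath="/rss/channel/item/pubDate" dateTimeFormat="EEE, dd MMM yyyy HH:mm:ss Z" />
        </entity>
    </document>
</dataConfig>
{code}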

  was:
The DIH example (solr/example/example-DIH/solr/rss/conf/rss-data-config.xml) is 
broken again. See associated issues.

Below is a config that should work.

This is caused by Slashdot seemingly oscillating between RDF/RSS and pure RSS. 
Perhaps we should depend upon something more static, rather than an external 
service that is free to change as it desires.

<dataConfig>
    <dataSource type="URLDataSource" />
    <document>
        <entity name="slashdot"
                pk="link"
                url="http://rss.slashdot.org/Slashdot/slashdot"
                processor="XPathEntityProcessor"
                forEach="/RDF/item"
                transformer="DateFormatTransformer">
                                
            <field column="source" xpath="/RDF/channel/title" commonField="true" />
            <field column="source-link" xpath="/RDF/channel/link" commonField="true" />
            <field column="subject" xpath="/RDF/channel/subject" commonField="true" />
                        
            <field column="title" xpath="/RDF/item/title" />
            <field column="link" xpath="/RDF/item/link" />
            <field column="description" xpath="/RDF/item/description" />
            <field column="creator" xpath="/RDF/item/creator" />
            <field column="item-subject" xpath="/RDF/item/subject" />
            <field column="date" xpath="/RDF/item/date" dateTimeFormat="yyyy-MM-dd'T'HH:mm:ss" />
            <field column="slash-department" xpath="/RDF/item/department" />
            <field column="slash-section" xpath="/RDF/item/section" />
            <field column="slash-comments" xpath="/RDF/item/comments" />
        </entity>
    </document>
</dataConfig>



> DIH: rewrite XPathEntityProcessor/RSS example as the smallest good demo 
> possible
> --------------------------------------------------------------------------------
>
>                 Key: SOLR-7383
>                 URL: https://issues.apache.org/jira/browse/SOLR-7383
>             Project: Solr
>          Issue Type: Bug
>          Components: contrib - DataImportHandler
>    Affects Versions: 5.0, 6.0
>            Reporter: Upayavira
>            Assignee: Alexandre Rafalovitch
>            Priority: Minor
>         Attachments: atom_20170315.tgz, rss-data-config.xml
>
>
> The DIH example (solr/example/example-DIH/solr/rss/conf/rss-data-config.xml) 
> is broken again. See associated issues.
> Below is a config that should work.
> This is caused by Slashdot seemingly oscillating between RDF/RSS and pure 
> RSS. Perhaps we should depend upon something more static, rather than an 
> external service that is free to change as it desires.
> {code:xml}
> <dataConfig>
>     <dataSource type="URLDataSource" />
>     <document>
>         <entity name="slashdot"
>                 pk="link"
>                 url="http://rss.slashdot.org/Slashdot/slashdot"
>                 processor="XPathEntityProcessor"
>                 forEach="/RDF/item"
>                 transformer="DateFormatTransformer">
>                               
>             <field column="source" xpath="/RDF/channel/title" commonField="true" />
>             <field column="source-link" xpath="/RDF/channel/link" commonField="true" />
>             <field column="subject" xpath="/RDF/channel/subject" commonField="true" />
>                       
>             <field column="title" xpath="/RDF/item/title" />
>             <field column="link" xpath="/RDF/item/link" />
>             <field column="description" xpath="/RDF/item/description" />
>             <field column="creator" xpath="/RDF/item/creator" />
>             <field column="item-subject" xpath="/RDF/item/subject" />
>             <field column="date" xpath="/RDF/item/date" dateTimeFormat="yyyy-MM-dd'T'HH:mm:ss" />
>             <field column="slash-department" xpath="/RDF/item/department" />
>             <field column="slash-section" xpath="/RDF/item/section" />
>             <field column="slash-comments" xpath="/RDF/item/comments" />
>         </entity>
>     </document>
> </dataConfig>
> {code}



