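It sounds like, when you run this from code, you may be getting an
error page instead of the RSS feed, and that error page is HTML that
is not well-formed XML. The error fits that: HTML pages routinely leave
<meta> tags unclosed, so an XML parser reaches </head> while it still
expects </meta>.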
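To show why an HTML error page trips the parser, here is a minimal Python sketch. It is not Solr code, and both sample documents are made up for illustration. It runs a strict XML parser over a small HTML page with an unclosed <meta> (legal HTML) and over a minimal RSS feed:

```python
import xml.etree.ElementTree as ET


def parse_error(body: bytes):
    """Return None if body is well-formed XML, else the parser's error message."""
    try:
        ET.fromstring(body)
    except ET.ParseError as exc:
        return str(exc)
    return None


# Hypothetical HTML error page: <meta> is not self-closed, which HTML allows but
# XML does not, so the parser meets </head> while <meta> is still open.
ERROR_PAGE = (b'<html><head><meta charset="utf-8"><title>Error</title></head>'
              b'<body>Service unavailable</body></html>')

# Hypothetical minimal RSS feed with the structure the DIH entity iterates over.
RSS_FEED = (b'<rss version="2.0"><channel><item><title>t</title>'
            b'<link>http://example.com/1</link></item></channel></rss>')
```

parse_error(ERROR_PAGE) reports a "mismatched tag" error, while parse_error(RSS_FEED) returns None. This is the same kind of failure as the Woodstox exception in your log, reported in Python's wording.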
Do you have a proxy where you run the code? If so, your browser may be
using the proxy while your DIH code is not. You could try running
something like Wireshark, Fiddler or a similar tool to inspect the
request/response you are actually getting.
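If you want a quick check before using Wireshark or Fiddler, the following minimal Python sketch fetches the feed either directly or through a given proxy. It reports the HTTP status, the Content-Type and whether the body parses as XML. This is an assumption-laden helper, not part of Solr or DIH. The function names are made up, and any proxy address you pass is your own.

```python
import urllib.error
import urllib.request
import xml.etree.ElementTree as ET
from typing import Optional, Tuple


def root_tag(body: bytes) -> str:
    """Return the root element name, or a short note if body is not well-formed XML."""
    try:
        return ET.fromstring(body).tag
    except ET.ParseError as exc:
        return f"not well-formed XML ({exc})"


def fetch(url: str, proxy: Optional[str] = None,
          timeout: float = 30.0) -> Tuple[int, Optional[str], bytes]:
    """Fetch url and return (status, Content-Type, body).

    With proxy=None the request goes direct: an empty ProxyHandler mapping also
    ignores any http_proxy/https_proxy environment variables. Otherwise proxy is
    an address such as "http://proxy.example:3128" (placeholder).
    """
    proxies = {"http": proxy, "https": proxy} if proxy else {}
    opener = urllib.request.build_opener(urllib.request.ProxyHandler(proxies))
    try:
        with opener.open(url, timeout=timeout) as resp:
            return resp.status, resp.headers.get("Content-Type"), resp.read()
    except urllib.error.HTTPError as err:
        # 4xx/5xx responses raise, but the exception still carries the error page.
        return err.code, err.headers.get("Content-Type"), err.read()


def describe(url: str, proxy: Optional[str] = None) -> str:
    """One-line summary: what came back, and whether it is XML at all."""
    status, content_type, body = fetch(url, proxy)
    return f"HTTP {status}, Content-Type: {content_type}, root: {root_tag(body)}"
```

Call describe("http://www.alarabiya.net/.mrss/ar.xml") once without a proxy and once with your proxy address. If the calls return different results, you have found the difference. If the proxy turns out to be the cause, note that Java's URL connections normally take their proxy from the http.proxyHost and http.proxyPort system properties. They do not use the browser's settings.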
Regards,
Alex.
Personal website: http://www.outerthoughts.com/
Current project: http://www.solr-start.com/ - Accelerating your Solr proficiency
On Sat, Jun 7, 2014 at 10:52 AM, ienjreny ismaeel.enjr...@gmail.com wrote:
Hello,
I am using the following script to index RSS items:
<dataSource type="URLDataSource" encoding="UTF-8" />
<document>
  <entity name="slashdot"
          pk="link"
          url="http://www.alarabiya.net/.mrss/ar.xml"
          processor="XPathEntityProcessor"
          forEach="/rss/channel/item">
    <field column="category_name" name="category_name"
           xpath="/rss/channel/item/title" />
    <field column="link" name="url" xpath="/rss/channel/item/link" />
  </entity>
</document>
But I am getting the following error:
Caused by: com.ctc.wstx.exc.WstxParsingException: Unexpected close tag </head>; expected </meta>.
Can anybody help?
--
View this message in context:
http://lucene.472066.n3.nabble.com/Error-when-using-URLDataSource-to-index-RSS-items-tp4140548.html
Sent from the Solr - User mailing list archive at Nabble.com.