I printed the UpdateRequest object (getXML) and the XML is:
http://haha.comcontent
I can see that the issue is that the HTML/XML angle brackets (< and >)
are replaced by &lt; and &gt;. I understand that this is required to
keep them from interfering with the Solr XML document, but how do I
accomplish what I want? I need the HTML in the body field stripped out.
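
For what it's worth, the &lt;/&gt; escaping appears to be only the wire
format: an XML parser on the receiving end decodes the entities back to
the original markup before the field value is handed on. Below is a
minimal sketch of that round trip, using only the JDK's XML parser
rather than Solr itself. The class name EscapeRoundTrip, its helper
methods, and the sample HTML string are all made up for illustration.

```java
import java.io.ByteArrayInputStream;
import java.nio.charset.StandardCharsets;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;

// Hypothetical helper class, not part of SolrJ.
class EscapeRoundTrip {

    // Escape the characters that would otherwise break the XML document.
    static String escape(String s) {
        return s.replace("&", "&amp;").replace("<", "&lt;").replace(">", "&gt;");
    }

    // Build an update message shaped like a Solr <add> document (simplified).
    static String toAddXml(String body) {
        return "<add><doc><field name=\"body\">" + escape(body) + "</field></doc></add>";
    }

    // Parse the message the way a receiving XML parser would and
    // return the decoded text of the first <field> element.
    static String parseBodyField(String xml) {
        try {
            Document d = DocumentBuilderFactory.newInstance()
                    .newDocumentBuilder()
                    .parse(new ByteArrayInputStream(xml.getBytes(StandardCharsets.UTF_8)));
            return d.getElementsByTagName("field").item(0).getTextContent();
        } catch (Exception e) {
            throw new RuntimeException(e);
        }
    }

    public static void main(String[] args) {
        String html = "<html><body>content</body></html>"; // assumed sample value
        String xml = toAddXml(html);
        System.out.println(xml);                 // escaped form, as sent on the wire
        System.out.println(parseBodyField(xml)); // decoded form, as seen after parsing
    }
}
```

If the same holds for Solr's update handler, the escaping shown by
getXML() is probably not the reason the markup appears in the field. As
far as I know, char filters only change the terms that get indexed, not
the stored value that comes back in search results.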
Any help is highly appreciated.
Thanks
Aseem
On Tue, Nov 10, 2009 at 10:56 AM, aseem cheema wrote:
> Hey Guys,
> I have the HTMLStripCharFilterFactory char filter declared in my
> schema.xml for fieldType text (code below). I am using this field type
> for the body field of my schema. I am seeing different behavior when I
> use SolrJ to post a document (code below) and when I use analysis.jsp.
> The text I am putting in the field is content.
>
> When SolrJ is used, the field gets the whole value
> content, but when analysis.jsp is used, it shows only
> "content" being used for the field.
>
> What am I possibly doing wrong here? How do I get
> HTMLStripCharFilterFactory to work when I am pushing data using
> SolrJ? Thanks.
>
> Your help is highly appreciated.
> Thanks
> --
> Aseem
>
> ## schema.xml ##
>
>
>
> ignoreCase="true"
> words="stopwords.txt"
> enablePositionIncrements="true"
> />
> generateWordParts="1" generateNumberParts="1" catenateWords="1"
> catenateNumbers="1" catenateAll="0"
> splitOnCaseChange="1"/>
>
> synonyms="synonyms.txt" ignoreCase="true" expand="true"/>
> protected="protwords.txt"/>
>
>
>
> ## SolrJ Code ##
> CommonsHttpSolrServer server = new
> CommonsHttpSolrServer("http://aseem.desktop.amazon.com:8983/solr/sharepoint");
> SolrInputDocument doc = new SolrInputDocument();
> UpdateRequest req = new UpdateRequest();
> doc.addField("url", "http://haha.com");
> /*doc.addField("body", sbr.toString());*/
> doc.addField("body", "content");
> req.add(doc);
> req.setAction(ACTION.COMMIT, false, false);
> UpdateResponse resp = req.process(server);
> System.out.println(resp);
>
--
Aseem