Markus Klose created SOLR-3976:
----------------------------------

             Summary: HTMLStripTransformer strips the "tika" field not the field to index -> cannot have both (stripped and unstripped)
                 Key: SOLR-3976
                 URL: https://issues.apache.org/jira/browse/SOLR-3976
             Project: Solr
          Issue Type: Bug
          Components: contrib - DataImportHandler
    Affects Versions: 3.6
            Reporter: Markus Klose
            Priority: Minor

I ran into a problem while indexing an HTML file with the DataImportHandler and got unexpected output. I wanted to create one field with the original content and one field with the same content but without HTML markup. If I enable the HTMLStripTransformer on field text2, the other field (text1) is stripped as well.

Example configuration:

<dataConfig>
  <dataSource type="BinFileDataSource" name="bin"/>
  <document>
    <entity name="f" processor="FileListEntityProcessor" recursive="true"
            rootEntity="false" dataSource="null" baseDir="...."
            fileName=".*.html" onError="skip">
      <entity name="tika-test" processor="TikaEntityProcessor"
              url="${f.fileAbsolutePath}" format="html" dataSource="bin"
              onError="skip" transformer="HTMLStripTransformer,TemplateTransformer">
        <field column="id" template="${f.file}"/>
        <field column="text" name="text1"/>
        <field column="text" name="text2" stripHTML="true"/>
      </entity>
    </entity>
  </document>
</dataConfig>
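
The reported behaviour is what one would expect if the transformer reads and rewrites the row value keyed by the source column ("text") rather than by the target field name. The following Python sketch is only a simplified model of that assumption, not the actual DataImportHandler code; the function names (strip_html, html_strip_transform, build_document) and the sample HTML are hypothetical.

```python
import re


def strip_html(value):
    # Crude stand-in for real HTML stripping: drop anything that looks like a tag.
    return re.sub(r"<[^>]+>", "", value)


def html_strip_transform(row, fields):
    # Assumed behaviour: the value is looked up and written back by source
    # column, so the single "text" entry in the row is overwritten in place.
    for field in fields:
        if field.get("stripHTML") == "true":
            column = field["column"]
            if column in row:
                row[column] = strip_html(row[column])
    return row


def build_document(row, fields):
    # Each target field copies whatever the row currently holds for its column.
    doc = {}
    for field in fields:
        column = field["column"]
        if column in row:
            doc[field.get("name", column)] = row[column]
    return doc


# Mirrors the two <field column="text" .../> declarations from the config.
fields = [
    {"column": "text", "name": "text1"},
    {"column": "text", "name": "text2", "stripHTML": "true"},
]
row = {"text": "<p>Hello <b>world</b></p>"}

doc = build_document(html_strip_transform(row, fields), fields)
print(doc)
```

In this model both text1 and text2 come out stripped, because they share one source column and the stripped value replaces the original before either target field is populated.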