Re: Does DataImportHandler do any sanitizing?
If you want to sanitize these values during indexing, the regular expression tools can do it: write a regular expression that matches the bogus elements and replaces them. There is a regular expression transformer (RegexTransformer) in DIH, and a regular expression CharFilter (PatternReplaceCharFilter) in the Lucene text analysis stack.

On Wed, Aug 15, 2012 at 2:10 PM, Michael Della Bitta wrote:
> Hi, Jon,
>
> As far as I know, DataImportHandler doesn't transfer data to the rest
> of Solr via XML, so it shouldn't be a problem...
>
> Michael Della Bitta
>
> Appinions | 18 East 41st St., Suite 1806 | New York, NY 10017
> www.appinions.com
> Where Influence Isn’t a Game
>
> On Wed, Aug 15, 2012 at 5:03 PM, Jon Drukman wrote:
>> I am pulling some fields from a MySQL database using DataImportHandler and
>> some of them have invalid XML in them. Does DataImportHandler do any kind
>> of filtering/sanitizing to ensure that it will go in OK or is it all on me?
>>
>> Example bad data: orphaned ampersands ("Peanut Butter & Jelly"), curly
>> quotes ("we’re")
>>
>> -jsd-

--
Lance Norskog
goks...@gmail.com
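As a rough illustration of the kind of pattern Lance describes, here is a plain-Python sketch (not DIH or Lucene code) that escapes orphaned ampersands while leaving well-formed entities alone, and straightens curly quotes. The name `sanitize`, the choice of `&amp;` as the replacement, and the quote mapping are all hypothetical; in Solr you would express a similar regex in the RegexTransformer or PatternReplaceCharFilter configuration instead, and the right replacement depends on where the text ends up.

```python
import re

# Matches an "&" that does NOT begin a well-formed entity reference such as
# &amp;, &#38; or &#x26;, i.e. an orphaned ampersand.
ORPHAN_AMP = re.compile(r"&(?!(?:[A-Za-z][A-Za-z0-9]*|#[0-9]+|#x[0-9A-Fa-f]+);)")

# Hypothetical normalization choice: map curly quotes to ASCII quotes.
# (XML itself does not require this; U+2019 etc. are legal characters.)
CURLY_QUOTES = str.maketrans({
    "\u2018": "'", "\u2019": "'",
    "\u201c": '"', "\u201d": '"',
})


def sanitize(value: str) -> str:
    """Escape orphaned ampersands and straighten curly quotes."""
    value = ORPHAN_AMP.sub("&amp;", value)
    return value.translate(CURLY_QUOTES)


if __name__ == "__main__":
    print(sanitize("Peanut Butter & Jelly"))  # Peanut Butter &amp; Jelly
    print(sanitize("we\u2019re"))             # we're
    print(sanitize("AT&amp;T"))               # AT&amp;T (already escaped, left alone)
```

The negative lookahead is what keeps already-escaped text from being double-escaped; a plain `&` to `&amp;` replacement would turn `&amp;` into `&amp;amp;`.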
Re: Does DataImportHandler do any sanitizing?
Hi, Jon,

As far as I know, DataImportHandler doesn't transfer data to the rest of Solr via XML, so it shouldn't be a problem...

Michael Della Bitta

Appinions | 18 East 41st St., Suite 1806 | New York, NY 10017
www.appinions.com
Where Influence Isn’t a Game

On Wed, Aug 15, 2012 at 5:03 PM, Jon Drukman wrote:
> I am pulling some fields from a MySQL database using DataImportHandler and
> some of them have invalid XML in them. Does DataImportHandler do any kind
> of filtering/sanitizing to ensure that it will go in OK or is it all on me?
>
> Example bad data: orphaned ampersands ("Peanut Butter & Jelly"), curly
> quotes ("we’re")
>
> -jsd-
Does DataImportHandler do any sanitizing?
I am pulling some fields from a MySQL database using DataImportHandler, and some of them have invalid XML in them. Does DataImportHandler do any kind of filtering/sanitizing to ensure that it will go in OK, or is it all on me?

Example bad data: orphaned ampersands ("Peanut Butter & Jelly"), curly quotes ("we’re")

-jsd-
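If you bypass DataImportHandler and post documents to Solr's XML update handler yourself, the escaping is on you. A minimal sketch using Python's standard-library `xml.sax.saxutils`; the function name `solr_add_xml` is hypothetical, and it assumes a single document whose field values are all strings. Curly quotes such as "we’re" are legal XML characters and only need to be sent in a consistent encoding such as UTF-8; the orphaned ampersand is the one that must be escaped.

```python
from xml.sax.saxutils import escape, quoteattr


def solr_add_xml(doc: dict) -> str:
    """Build a Solr XML <add> message for one document (field name -> value)."""
    fields = "".join(
        # quoteattr() returns the attribute value already wrapped in quotes;
        # escape() turns &, < and > in the text into entity references.
        f"<field name={quoteattr(name)}>{escape(value)}</field>"
        for name, value in doc.items()
    )
    return f"<add><doc>{fields}</doc></add>"


if __name__ == "__main__":
    message = solr_add_xml({"id": "1", "title": "Peanut Butter & Jelly"})
    print(message)
    # <add><doc><field name="id">1</field><field name="title">Peanut Butter &amp; Jelly</field></doc></add>

    # Curly quotes pass through unchanged; encode the whole message as UTF-8
    # before sending it.
    payload = solr_add_xml({"title": "we\u2019re"}).encode("utf-8")
```

Note that `escape()` escapes every `&`, so values that already contain entity references such as `&amp;` would be double-escaped; it assumes raw, unescaped text from the database.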