Re: Does DataImportHandler do any sanitizing?

2012-08-15 Thread Lance Norskog
If you want to sanitize the fields during indexing, regular expressions
can do this. You would write a pattern that matches the invalid
characters and replace each match. The DIH has a regular expression
transformer (RegexTransformer), and the Lucene text analysis stack has a
regular expression CharFilter (PatternReplaceCharFilter).
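
For illustration only, here is a rough sketch of the kind of pattern
involved, written in Python rather than as DIH configuration. The
function name sanitize and the choice to straighten curly quotes are
assumptions made for this example, not anything DIH does by itself. The
pattern sticks to constructs (negative lookahead, non-capturing groups)
that Java regular expressions also support, so it should carry over to a
regex-based transformer with little change.

```python
import re

# An ampersand that does NOT begin an entity such as &amp; or &#8217;
ORPHAN_AMP = re.compile(
    r"&(?!(?:[A-Za-z][A-Za-z0-9]*|#[0-9]+|#[xX][0-9A-Fa-f]+);)"
)

# Assumed policy: map curly quotes to their plain ASCII equivalents
CURLY_QUOTES = str.maketrans({
    "\u2018": "'", "\u2019": "'",
    "\u201c": '"', "\u201d": '"',
})

def sanitize(text):
    """Escape orphaned ampersands, then straighten curly quotes."""
    return ORPHAN_AMP.sub("&amp;", text).translate(CURLY_QUOTES)

print(sanitize("Peanut Butter & Jelly"))  # Peanut Butter &amp; Jelly
print(sanitize("we\u2019re"))             # we're
```

Ampersands that already start an entity are left alone, so running the
function twice does not double-escape anything.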

On Wed, Aug 15, 2012 at 2:10 PM, Michael Della Bitta wrote:
> Hi, Jon,
>
> As far as I know, DataImportHandler doesn't transfer data to the rest
> of Solr via XML so it shouldn't be a problem...
>
> Michael Della Bitta
>
> 
> Appinions | 18 East 41st St., Suite 1806 | New York, NY 10017
> www.appinions.com
> Where Influence Isn’t a Game
>
>
> On Wed, Aug 15, 2012 at 5:03 PM, Jon Drukman wrote:
>> I am pulling some fields from a MySQL database using DataImportHandler and
>> some of them have invalid XML in them.  Does DataImportHandler do any kind
>> of filtering/sanitizing to ensure that it will go in OK or is it all on me?
>>
>> Example bad data:  orphaned ampersands ("Peanut Butter & Jelly"), curly
>> quotes ("we’re")
>>
>> -jsd-



-- 
Lance Norskog
goks...@gmail.com


Re: Does DataImportHandler do any sanitizing?

2012-08-15 Thread Michael Della Bitta
Hi, Jon,

As far as I know, DataImportHandler doesn't transfer data to the rest
of Solr via XML so it shouldn't be a problem...
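
To show why that matters, here is a small sketch (Python, standard
library only) of the general point: a bare ampersand only breaks things
once the text is pasted into XML markup. The helper name parses and the
<field> wrapper are made up for the example and say nothing about how
DIH is implemented.

```python
import xml.etree.ElementTree as ET
from xml.sax.saxutils import escape

raw = "Peanut Butter & Jelly"

def parses(xml_text):
    """Return True if xml_text is well-formed XML, False otherwise."""
    try:
        ET.fromstring(xml_text)
        return True
    except ET.ParseError:
        return False

# The raw string is fine as data; it fails only when embedded unescaped.
print(parses("<field>" + raw + "</field>"))          # False
print(parses("<field>" + escape(raw) + "</field>"))  # True
```

If the value never passes through an XML document, there is nothing to
escape in the first place.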

Michael Della Bitta




On Wed, Aug 15, 2012 at 5:03 PM, Jon Drukman wrote:
> I am pulling some fields from a MySQL database using DataImportHandler and
> some of them have invalid XML in them.  Does DataImportHandler do any kind
> of filtering/sanitizing to ensure that it will go in OK or is it all on me?
>
> Example bad data:  orphaned ampersands ("Peanut Butter & Jelly"), curly
> quotes ("we’re")
>
> -jsd-


Does DataImportHandler do any sanitizing?

2012-08-15 Thread Jon Drukman
I am pulling some fields from a MySQL database using DataImportHandler and
some of them have invalid XML in them.  Does DataImportHandler do any kind
of filtering/sanitizing to ensure that it will go in OK or is it all on me?

Example bad data:  orphaned ampersands ("Peanut Butter & Jelly"), curly
quotes ("we’re")

-jsd-