cas-metadata should handle this escaping/unescaping in its SerDe capabilities.
Kostas, can you provide the exact file that I can test on and upload it to JIRA?

------------------------
Chris Mattmann
chris.mattm...@gmail.com

-----Original Message-----
From: Lewis John Mcgibbney <lewis.mcgibb...@gmail.com>
Reply-To: <dev@oodt.apache.org>
Date: Thursday, October 9, 2014 at 2:59 AM
To: "dev@oodt.apache.org" <dev@oodt.apache.org>
Subject: Re: How to ingest files when metadata contain non standard characters?

>Hi Kos,
>Thanks for the reply.
>
>On Wed, Oct 8, 2014 at 5:16 PM, Konstantinos Mavrommatis <
>kmavromma...@celgene.com> wrote:
>
>> I escaped the characters using the CGI::escapeHTML function from the CGI
>> perl module.
>>
>
>Wow. I am surprised at this one. I wonder if this is a bug which results in
>the discrepancy or if this is intentional behaviour!
>
>
>>
>> The difference between the two versions (mine escaped vs yours escaped)
>> is in the encoding of the single quote "'" character, if I am not
>> mistaken.
>> I want to clarify this because your email came as simple ASCII (not
>> HTML)
>>
>
>Yes, that is correct.
>
>
>>
>> I did try your command and it worked !!!
>>
>
>OK grand.
>
>
>>
>> Now the question is how to do this encoding (your version) ☺
>>
>>
>Is this the question? My thoughts would be that this should be
>encapsulated within OODT somewhere and that it should not be necessary
>to escape everything as you/we have been doing. This is extremely time
>consuming and painful.
>
>I escaped everything here
>http://www.freeformatter.com/html-escape.html
>
>and compared the strings here
>http://text-compare.com/
>
>The latter resource will verify that it is the single quote that is the
>offending char here.
>Thanks
>Lewis
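
For illustration only, here is a small Python sketch of the kind of discrepancy discussed in this thread: two standard-library escapers that agree on `&`, `<` and `>` but treat the single quote differently. This is not OODT or cas-metadata code, and the sample metadata value is made up; it only shows why a single, round-trippable escape/unescape step inside the SerDe layer would be less error-prone than escaping by hand with assorted tools.

```python
import html
from xml.sax.saxutils import escape as xml_escape

# Hypothetical metadata value containing the characters at issue.
value = "Crohn's disease <stage 2> & \"other\""

# html.escape with quote=True also escapes both quote characters;
# the single quote becomes a numeric character reference.
html_escaped = html.escape(value, quote=True)

# xml.sax.saxutils.escape only handles &, < and > by default,
# so the single quote is left untouched.
xml_escaped = xml_escape(value)

# Extra entities can be requested explicitly; here the single quote
# becomes the named entity instead of a numeric reference.
xml_escaped_quotes = xml_escape(value, {"'": "&apos;", '"': "&quot;"})

print(html_escaped)
print(xml_escaped)
print(xml_escaped_quotes)

# All three forms decode back to the same original string.
for escaped in (html_escaped, xml_escaped, xml_escaped_quotes):
    assert html.unescape(escaped) == value
```

The three printed strings differ only in how the quote characters are written, yet each unescapes to the same value. A strict string comparison of the escaped forms would therefore flag a difference even though the underlying metadata is identical.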