Thanks Kostas. Can you upload it somewhere and post a link here? The mailing list strips attachments.
Cheers,
Chris

------------------------
Chris Mattmann
chris.mattm...@gmail.com

-----Original Message-----
From: Konstantinos Mavrommatis <kmavromma...@celgene.com>
Reply-To: <dev@oodt.apache.org>
Date: Thursday, October 9, 2014 at 5:48 AM
To: "dev@oodt.apache.org" <dev@oodt.apache.org>
Subject: RE: How to ingest files when metadata contain non standard characters?

>Thanks Chris,
>
>Attached is an offending file, before escaping.
>For the record, the Perl module HTML::Entities does provide an alternative
>to escapeHTML that produces acceptable files.
>
>Thanks
>K
>
>
>> -----Original Message-----
>> From: Chris Mattmann [mailto:chris.mattm...@gmail.com]
>> Sent: Wednesday, October 08, 2014 11:38 AM
>> To: dev@oodt.apache.org
>> Subject: Re: How to ingest files when metadata contain non standard
>> characters?
>>
>> cas-metadata should handle this escaping/unescaping in its SerDe
>> capabilities.
>>
>> Kostas, can you upload the exact file to JIRA so that I can test
>> on it?
>>
>> ------------------------
>> Chris Mattmann
>> chris.mattm...@gmail.com
>>
>>
>> -----Original Message-----
>> From: Lewis John Mcgibbney <lewis.mcgibb...@gmail.com>
>> Reply-To: <dev@oodt.apache.org>
>> Date: Thursday, October 9, 2014 at 2:59 AM
>> To: "dev@oodt.apache.org" <dev@oodt.apache.org>
>> Subject: Re: How to ingest files when metadata contain non standard
>> characters?
>>
>> >Hi Kos,
>> >Thanks for the reply.
>> >
>> >On Wed, Oct 8, 2014 at 5:16 PM, Konstantinos Mavrommatis <
>> >kmavromma...@celgene.com> wrote:
>> >
>> >> I escaped the characters using the CGI::escapeHTML function from the
>> >> CGI Perl module.
>> >>
>> >
>> >Wow. I am surprised at this one. I wonder whether this is a bug that
>> >causes the discrepancy or whether it is intentional behaviour!
>> >
>> >
>> >>
>> >> The difference between the two versions (my escaped one vs. yours)
>> >> is in the encoding of the single quote "'" character, if I am not
>> >> mistaken.
>> >> I want to clarify this because your email came through as plain
>> >> ASCII (not HTML).
>> >>
>> >
>> >Yes, that is correct.
>> >
>> >
>> >>
>> >> I did try your command and it worked!!!
>> >>
>> >
>> >OK, grand.
>> >
>> >
>> >>
>> >> Now the question is how to do this encoding (your version) ☺
>> >>
>> >>
>> >Is this the question? My thought is that this should be encapsulated
>> >within OODT somewhere, and that it should not be necessary to escape
>> >everything by hand as you/we have been doing. This is extremely
>> >time-consuming and painful.
>> >
>> >I escaped everything here
>> >http://www.freeformatter.com/html-escape.html
>> >
>> >and compared the strings here
>> >http://text-compare.com/
>> >
>> >The latter will confirm that the single quote is the offending
>> >character here.
>> >Thanks
>> >Lewis
>> >
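For reference, here is a minimal sketch of the approach Chris and Lewis point toward: write raw, unescaped metadata values and let an XML serializer do the escaping, instead of pre-escaping them with CGI::escapeHTML. It uses only the JDK's XML APIs, not cas-metadata's SerDe classes. The namespace URI and element names (cas:metadata, keyval, key, val) are assumptions modeled on the CAS .met layout and are not taken from this thread, so check them against your OODT version. The class and method names (MetFileSketch, toMetXml, firstVal) are hypothetical.

```java
import java.io.StringReader;
import java.io.StringWriter;
import java.util.LinkedHashMap;
import java.util.Map;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.transform.OutputKeys;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.dom.DOMSource;
import javax.xml.transform.stream.StreamResult;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.xml.sax.InputSource;

// Hypothetical helper: builds a .met-style XML file from raw (unescaped) values.
public class MetFileSketch {
    // ASSUMPTION: namespace URI and element names modeled on the CAS .met layout.
    static final String CAS_NS = "http://oodt.jpl.nasa.gov/1.0/cas";

    /** Writes raw key/value pairs; the serializer escapes markup characters such as & and <. */
    public static String toMetXml(Map<String, String> met) {
        try {
            DocumentBuilderFactory dbf = DocumentBuilderFactory.newInstance();
            dbf.setNamespaceAware(true);
            Document doc = dbf.newDocumentBuilder().newDocument();
            Element root = doc.createElementNS(CAS_NS, "cas:metadata");
            doc.appendChild(root);
            for (Map.Entry<String, String> entry : met.entrySet()) {
                Element keyval = doc.createElementNS(null, "keyval");
                Element key = doc.createElementNS(null, "key");
                key.setTextContent(entry.getKey());
                Element val = doc.createElementNS(null, "val");
                val.setTextContent(entry.getValue()); // no manual escaping here
                keyval.appendChild(key);
                keyval.appendChild(val);
                root.appendChild(keyval);
            }
            Transformer transformer = TransformerFactory.newInstance().newTransformer();
            transformer.setOutputProperty(OutputKeys.ENCODING, "UTF-8");
            StringWriter out = new StringWriter();
            transformer.transform(new DOMSource(doc), new StreamResult(out));
            return out.toString();
        } catch (Exception e) {
            throw new IllegalStateException("could not serialize metadata", e);
        }
    }

    /** Parses XML and returns the text of the first <val> element, with entities decoded. */
    public static String firstVal(String xml) {
        try {
            Document doc = DocumentBuilderFactory.newInstance().newDocumentBuilder()
                    .parse(new InputSource(new StringReader(xml)));
            return doc.getElementsByTagName("val").item(0).getTextContent();
        } catch (Exception e) {
            throw new IllegalStateException("could not parse metadata", e);
        }
    }

    public static void main(String[] args) {
        Map<String, String> met = new LinkedHashMap<>();
        met.put("Description", "Patient's sample <A&B>");
        String xml = toMetXml(met);
        System.out.println(xml);
        // Round trip: the parser hands back exactly the raw value.
        System.out.println(firstVal(xml).equals(met.get("Description")));
        // In XML, &#39; and &apos; both decode to a single quote.
        System.out.println(firstVal("<val>Patient&#39;s</val>"));
        System.out.println(firstVal("<val>Patient&apos;s</val>"));
    }
}
```

An XML parser decodes &#39; and &apos; identically, and an apostrophe in element text needs no escaping at all. So if a pre-escaped file still fails to ingest, it may be worth checking whether values were escaped twice (for example &amp;#39;), which a parser returns as the literal text &#39; rather than a quote.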