Re: How to ingest files when metadata contain non standard characters?
Thanks Kos,

Please see https://issues.apache.org/jira/browse/OODT-759
We will track it there from now on and determine what needs to be done.

On Wed, Oct 8, 2014 at 8:55 PM, Konstantinos Mavrommatis <kmavromma...@celgene.com> wrote:
> Here is the offending file before escape: [...]
> * THIS ELECTRONIC MAIL MESSAGE AND ANY ATTACHMENT IS CONFIDENTIAL AND MAY CONTAIN LEGALLY PRIVILEGED INFORMATION INTENDED ONLY FOR THE USE OF THE INDIVIDUAL OR INDIVIDUALS NAMED ABOVE. If the reader is not the intended recipient, or the employee or agent responsible to deliver it to the intended recipient, you are hereby notified that any dissemination, distribution or copying of this communication is strictly prohibited. If you have received this communication in error, please reply to the sender to notify us of the error and delete the original message. Thank You.

--
*Lewis*
Re: How to ingest files when metadata contain non standard characters?
Hi Kos,

I take you up on your challenge ;) However, I don't know if this will fix it.

On Tue, Oct 7, 2014 at 11:31 PM, Konstantinos Mavrommatis <kmavromma...@celgene.com> wrote:
> <val>sailfish quant --index /reference/v1/Homo-sapiens/GRCh37.p12/SailFishIndex --libtype 'T=PE:O=><:S=AS' -1 <(gunzip -c /gpfs/archive/RED/DA072/RNA-Seq/RawData/FastqFiles/HP1_3_R1.fastq.gz) -2 <(gunzip -c /gpfs/archive/RED/DA072/RNA-Seq/RawData/FastqFiles/HP1_3_R2.fastq.gz) -o /gpfs/archive/RED/DA072/RNA-Seq/Processed/Sailfish-transcriptCounts/HP1_3.Sailfish.txt -p 8 --no_bias_correct </val>

OK, the code above is what you initially pasted...

> <val>sailfish quant --index /reference/v1/Homo-sapiens/GRCh37.p12/SailFishIndex --libtype *&#39;T=PE:O=&gt;&lt;:S=AS&#39;* -1 &lt;(gunzip -c /gpfs/archive/RED/DA072/RNA-Seq/RawData/FastqFiles/HP1_3_R1.fastq.gz) -2 &lt;(gunzip -c /gpfs/archive/RED/DA072/RNA-Seq/RawData/FastqFiles/HP1_3_R2.fastq.gz) -o /gpfs/archive/RED/DA072/RNA-Seq/Processed/Sailfish-transcriptCounts/HP1_3.Sailfish.txt -p 8 --no_bias_correct </val>

The code above is what you pasted once you had escaped everything. Did you do this manually? I get a different output, which I've pasted below:

sailfish quant --index /reference/v1/Homo-sapiens/GRCh37.p12/SailFishIndex --libtype *'T=PE:O=&gt;&lt;:S=AS'* -1 &lt;(gunzip -c /gpfs/archive/RED/DA072/RNA-Seq/RawData/FastqFiles/HP1_3_R1.fastq.gz) -2 &lt;(gunzip -c /gpfs/archive/RED/DA072/RNA-Seq/RawData/FastqFiles/HP1_3_R2.fastq.gz) -o /gpfs/archive/RED/DA072/RNA-Seq/Processed/Sailfish-transcriptCounts/HP1_3.Sailfish.txt -p 8 --no_bias_correct

Please notice the difference in the part which I have boldened. Can you try reingesting and see if you come up donald trumps?

> org.apache.oodt.cas.filemgr.structs.exceptions.IngestException: exception ingesting product: [A1_1.Sailfish.sfish]: Message: Failed to ingest product [org.apache.oodt.cas.filemgr.structs.Product@7c1ba98b] : java.lang.Exception: org.apache.oodt.cas.filemgr.structs.exceptions.CatalogException: Error ingesting product [org.apache.oodt.cas.filemgr.structs.Product@5d3f1d87] : HTTP method failed: HTTP/1.1 400 Bad Request

BTW, you have also AGAIN highlighted the horrible opaque Product objects we get as Exception output. I logged an issue for this last week:
https://issues.apache.org/jira/browse/OODT-755
We need to fix this and I will try my darndest to hack it at the weekend.

Thanks
Lewis
RE: How to ingest files when metadata contain non standard characters?
Hi Lewis,

I escaped the characters using the CGI::escapeHTML function from the CGI Perl module. The difference between the two versions (mine escaped vs. yours escaped) is in the encoding of the single quote (') character, if I am not mistaken. I want to clarify this because your email came as plain ASCII (not HTML).

I did try your command and it worked!!! Now the question is how to do this encoding (your version) ☺

Thanks
K

-----Original Message-----
From: Lewis John Mcgibbney [mailto:lewis.mcgibb...@gmail.com]
Sent: Wednesday, October 08, 2014 1:43 PM
To: dev@oodt.apache.org
Subject: Re: How to ingest files when metadata contain non standard characters?

[...]
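One way to reproduce the encoding that worked, sketched in Python under the assumption that Python is available in the ingest pipeline (the thread itself used Perl). The standard library's xml.sax.saxutils.escape replaces only &, < and >, leaving the single quote untouched, which matches Lewis's output. Passing an extra entity mapping reproduces the &#39; form that CGI::escapeHTML produced and that, per this thread, was rejected. Both function names are hypothetical helpers, not part of OODT.

```python
from xml.sax.saxutils import escape

# Shortened generator_string value (paths trimmed, for illustration only).
CMD = ("sailfish quant --libtype 'T=PE:O=><:S=AS' "
       "-1 <(gunzip -c R1.fastq.gz) -2 <(gunzip -c R2.fastq.gz)")


def escape_like_working_version(text: str) -> str:
    """Escape only &, < and >; quotes are left as-is (Lewis's form)."""
    return escape(text)


def escape_like_cgi_version(text: str) -> str:
    """Also turn ' into &#39;, mirroring the CGI::escapeHTML output."""
    # Extra entities are applied after &, < and > have been escaped,
    # so the '&' inside '&#39;' is not escaped a second time.
    return escape(text, {"'": "&#39;"})


if __name__ == "__main__":
    print(escape_like_working_version(CMD))
    print(escape_like_cgi_version(CMD))
```

Escaping only the three characters that XML character data actually requires keeps the quote literal, which is the version that ingested in Kos's test.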
Re: How to ingest files when metadata contain non standard characters?
Hi Kos,

Thanks for the reply.

On Wed, Oct 8, 2014 at 5:16 PM, Konstantinos Mavrommatis <kmavromma...@celgene.com> wrote:
> I escaped the characters using the CGI::escapeHTML function from the CGI Perl module.

Wow. I am surprised at this one. I wonder if this is a bug which results in the discrepancy or if this is intentional behaviour!

> The difference between the two versions (mine escaped vs. yours escaped) is in the encoding of the single quote (') character, if I am not mistaken. I want to clarify this because your email came as plain ASCII (not HTML).

Yes, that is correct.

> I did try your command and it worked!!!

OK grand.

> Now the question is how to do this encoding (your version) ☺

Is this the question? My thought is that this should be encapsulated within OODT somewhere, and that it should not be necessary to escape everything as you/we have been doing. This is extremely time-consuming and painful.

I escaped everything at http://www.freeformatter.com/html-escape.html and compared the strings at http://text-compare.com/
The latter resource will confirm that the single quote is the offending character here.

Thanks
Lewis
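The string comparison Lewis did with an online diff tool can also be scripted. This sketch uses Python's difflib.SequenceMatcher to list every region where two strings differ; the helper name `differing_spans` and the shortened sample strings are illustrative assumptions.

```python
from difflib import SequenceMatcher

# Shortened excerpts of the two escaped command lines from this thread.
CGI_ESCAPED = "--libtype &#39;T=PE:O=&gt;&lt;:S=AS&#39; -1 &lt;(gunzip"
LEWIS_ESCAPED = "--libtype 'T=PE:O=&gt;&lt;:S=AS' -1 &lt;(gunzip"


def differing_spans(a: str, b: str) -> list[tuple[str, str]]:
    """Return (text_in_a, text_in_b) pairs for every non-matching region."""
    # autojunk=False so no characters are silently treated as junk.
    matcher = SequenceMatcher(None, a, b, autojunk=False)
    return [
        (a[i1:i2], b[j1:j2])
        for tag, i1, i2, j1, j2 in matcher.get_opcodes()
        if tag != "equal"
    ]


if __name__ == "__main__":
    for left, right in differing_spans(CGI_ESCAPED, LEWIS_ESCAPED):
        print(f"{left!r} -> {right!r}")
```

On these two excerpts the only differences reported are the two places where one string has &#39; and the other has a literal single quote.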
Re: How to ingest files when metadata contain non standard characters?
In addition, if you can get to the bottom of what you think the intended behaviour is here, please feel free to log a ticket in Jira:
https://issues.apache.org/jira/browse/OODT/?selectedTab=com.atlassian.jira.jira-projects-plugin:issues-panel

On Wed, Oct 8, 2014 at 5:59 PM, Lewis John Mcgibbney <lewis.mcgibb...@gmail.com> wrote:
> Hi Kos, [...]

--
*Lewis*
Re: How to ingest files when metadata contain non standard characters?
cas-metadata should handle this escaping/unescaping in its SerDe capabilities. Kostas, can you provide the exact file that I can test on and upload it to JIRA?

Chris Mattmann
chris.mattm...@gmail.com

-----Original Message-----
From: Lewis John Mcgibbney <lewis.mcgibb...@gmail.com>
Reply-To: dev@oodt.apache.org
Date: Thursday, October 9, 2014 at 2:59 AM
To: dev@oodt.apache.org <dev@oodt.apache.org>
Subject: Re: How to ingest files when metadata contain non standard characters?

[...]
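To illustrate the division of labour Chris describes, here is a Python sketch (not OODT's actual cas-metadata SerDe, which is Java) that writes the keyval structure with xml.etree.ElementTree, which escapes &, < and > in text on output, and parses it back, which decodes them on input. The caller passes raw strings and never handles entities. `to_cas_xml` and `from_cas_xml` are hypothetical names; the element layout follows the file Kos posted.

```python
import xml.etree.ElementTree as ET

CAS_NS = "http://oodt.jpl.nasa.gov/1.0/cas"


def to_cas_xml(metadata: dict[str, list[str]]) -> str:
    """Serialize key -> values into <cas:metadata><keyval>... form."""
    ET.register_namespace("cas", CAS_NS)
    root = ET.Element(f"{{{CAS_NS}}}metadata")
    for key, values in metadata.items():
        keyval = ET.SubElement(root, "keyval")
        ET.SubElement(keyval, "key").text = key
        for value in values:
            # ElementTree escapes &, < and > here; quotes stay literal.
            ET.SubElement(keyval, "val").text = value
    return ET.tostring(root, encoding="unicode")


def from_cas_xml(xml_text: str) -> dict[str, list[str]]:
    """Parse the XML back into key -> values; entities are decoded."""
    root = ET.fromstring(xml_text)
    return {
        kv.findtext("key"): [v.text or "" for v in kv.findall("val")]
        for kv in root.findall("keyval")
    }


if __name__ == "__main__":
    md = {"generator_string": ["sailfish quant --libtype 'T=PE:O=><:S=AS'"]}
    text = to_cas_xml(md)
    print(text)
    print(from_cas_xml(text) == md)
```

The point of the round trip is that escaping lives entirely inside the serializer and parser, so no hand-escaping step is needed before ingest.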
RE: How to ingest files when metadata contain non standard characters?
Thanks Chris, attached is an offending file before escape. For the record, the Perl module HTML::Entities does provide an alternative to escapeHTML that produces acceptable files.

Thanks
K

-----Original Message-----
From: Chris Mattmann [mailto:chris.mattm...@gmail.com]
Sent: Wednesday, October 08, 2014 11:38 AM
To: dev@oodt.apache.org
Subject: Re: How to ingest files when metadata contain non standard characters?

[...]
Re: How to ingest files when metadata contain non standard characters?
Thanks Kostas. Can you upload it somewhere and then point to it here? The mailing list strips attachments.

Cheers,
Chris

Chris Mattmann
chris.mattm...@gmail.com

-----Original Message-----
From: Konstantinos Mavrommatis <kmavromma...@celgene.com>
Reply-To: dev@oodt.apache.org
Date: Thursday, October 9, 2014 at 5:48 AM
To: dev@oodt.apache.org <dev@oodt.apache.org>
Subject: RE: How to ingest files when metadata contain non standard characters?

[...]
RE: How to ingest files when metadata contain non standard characters?
Here is the offending file before escape:

<cas:metadata xmlns:cas="http://oodt.jpl.nasa.gov/1.0/cas">
  <keyval>
    <key>derived_from</key>
    <val>/gpfs/celgene/reference/v1/Homo-sapiens/GRCh37.p12/SailFishIndex</val>
    <val>/gpfs/archive/RED/DA072/RNA-Seq/RawData/FastqFiles/HM1_1_R1.fastq.gz</val>
    <val>/gpfs/archive/RED/DA072/RNA-Seq/RawData/FastqFiles/HM1_1_R2.fastq.gz</val>
  </keyval>
  <keyval>
    <key>FilePath</key>
    <val>/gpfs/archive/RED/DA072/RNA-Seq/Processed/Sailfish-transcriptCounts/HM1_1.Sailfish.sfish</val>
  </keyval>
  <keyval>
    <key>start_execution</key>
    <val>Tue Oct 7 20:49:12 2014</val>
  </keyval>
  <keyval>
    <key>ingest_user</key>
    <val>kmavrommatis</val>
  </keyval>
  <keyval>
    <key>end_execution</key>
    <val>Tue Oct 7 21:03:47 2014</val>
  </keyval>
  <keyval>
    <key>run_user</key>
    <val>kmavrommatis</val>
  </keyval>
  <keyval>
    <key>file_host</key>
    <val>ussdgsphpccas02</val>
  </keyval>
  <keyval>
    <key>generator</key>
    <val>sailfish</val>
  </keyval>
  <keyval>
    <key>run_host</key>
    <val>ussdgsphpccmp01</val>
  </keyval>
  <keyval>
    <key>sample_id</key>
    <val>2569</val>
  </keyval>
  <keyval>
    <key>generator_version</key>
    <val>sailfish[0.6.3]</val>
  </keyval>
  <keyval>
    <key>ProductType</key>
    <val>GenericFile</val>
  </keyval>
  <keyval>
    <key>analysis_task</key>
    <val>38</val>
  </keyval>
  <keyval>
    <key>generator_string</key>
    <val>sailfish quant --index /gpfs/celgene/reference/v1/Homo-sapiens/GRCh37.p12/SailFishIndex --libtype 'T=PE:O=><:S=AS' -1 <(gunzip -c /gpfs/archive/RED/DA072/RNA-Seq/RawData/FastqFiles/HM1_1_R1.fastq.gz) -2 <(gunzip -c /gpfs/archive/RED/DA072/RNA-Seq/RawData/FastqFiles/HM1_1_R2.fastq.gz) -o /gpfs/archive/RED/DA072/RNA-Seq/Processed/Sailfish-transcriptCounts/HM1_1.Sailfish.txt -p 8 --no_bias_correct </val>
  </keyval>
</cas:metadata>
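The file above is not well-formed XML as it stands: the '<' in the libtype string and in each '<(gunzip' appear unescaped inside a val element, and XML requires '<' in character data to be written as &lt;, so an XML parser will reject it. A quick pre-ingest check, sketched in Python under the assumption that the .met file can be read locally; `is_well_formed` is a hypothetical helper and the sample document is trimmed to the one offending key.

```python
import xml.etree.ElementTree as ET
from xml.sax.saxutils import escape

# Trimmed generator_string value containing the problem characters.
VALUE = "sailfish quant --libtype 'T=PE:O=><:S=AS' -1 <(gunzip -c R1.fastq.gz)"
TEMPLATE = ('<cas:metadata xmlns:cas="http://oodt.jpl.nasa.gov/1.0/cas">'
            "<keyval><key>generator_string</key><val>{}</val></keyval>"
            "</cas:metadata>")

RAW = TEMPLATE.format(VALUE)              # as in the file before escape
ESCAPED = TEMPLATE.format(escape(VALUE))  # &, < and > escaped


def is_well_formed(xml_text: str) -> bool:
    """Return True if xml_text parses as XML, False otherwise."""
    try:
        ET.fromstring(xml_text)
    except ET.ParseError:
        return False
    return True


if __name__ == "__main__":
    print("raw:", is_well_formed(RAW))
    print("escaped:", is_well_formed(ESCAPED))
```

Running a check like this before handing files to the File Manager would surface the problem locally instead of as an opaque ingest failure.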