The problem could be a BOM (byte order mark), possibly a non-UTF-8 one, as
the first character in the file. Try opening the XML file in a Unicode-aware
editor that lets you choose the encoding, and then re-save it as US-ASCII.

https://en.wikipedia.org/wiki/Byte_order_mark
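
If no suitable editor is at hand, a small script can do the same check. The
following is only a sketch, not something from cTAKES: the function names are
made up for illustration, and it assumes the file is small enough to read into
memory. It looks for a known BOM at the start of the bytes and, if one is
found, rewrites the content as UTF-8 without a BOM (which is byte-identical to
US-ASCII when the text contains only ASCII characters).

```python
import codecs

# Known BOM signatures. UTF-32 comes first because the UTF-32 LE BOM
# starts with the same two bytes as the UTF-16 LE BOM.
BOMS = [
    (codecs.BOM_UTF32_LE, "utf-32-le"),
    (codecs.BOM_UTF32_BE, "utf-32-be"),
    (codecs.BOM_UTF8, "utf-8"),
    (codecs.BOM_UTF16_LE, "utf-16-le"),
    (codecs.BOM_UTF16_BE, "utf-16-be"),
]


def detect_bom(data: bytes):
    """Return (bom, encoding) if data starts with a known BOM, else (b"", None)."""
    for bom, encoding in BOMS:
        if data.startswith(bom):
            return bom, encoding
    return b"", None


def strip_bom_to_utf8(data: bytes) -> bytes:
    """Drop a leading BOM and return the content as UTF-8 bytes without a BOM."""
    bom, encoding = detect_bom(data)
    if encoding is None:
        return data  # no BOM found, leave the bytes untouched
    return data[len(bom):].decode(encoding).encode("utf-8")


def fix_file(path: str) -> bool:
    """Rewrite the file at path without a BOM. Return True if it was changed."""
    with open(path, "rb") as f:
        data = f.read()
    fixed = strip_bom_to_utf8(data)
    if fixed == data:
        return False
    with open(path, "wb") as f:
        f.write(fixed)
    return True
```

Calling fix_file("testpatient_cn_1.xml") (adjust the path to wherever the file
lives) returns True if a BOM was removed. If the file turns out to be UTF-16 or
UTF-32, the encoding named in its XML declaration would also need to be updated
by hand, since the script does not touch the declaration.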

Peter

On Wed, Dec 18, 2019 at 11:31 AM Finan, Sean <
[email protected]> wrote:

> Sorry - I missed this:
> > I'm using the two CDA files that come with the cTAKES package
> (testpatient_cn_2.xml and testpatient_cn_1.xml compatible with
> NotesIIST_RTF.DTD)
>
> Those files -should- be ok as they were originally used to test the CDA
> workflow.
>
> The code for CdaCasInitializer and ClinicalNotePreProcessor hasn't changed
> since 2015.
>
> The actual error is coming from the 3rd party xml parser (xerces):
> Caused by: org.xml.sax.SAXParseException; lineNumber: 1; columnNumber: 1;
> Content is not allowed in prolog.
>         at org.apache.xerces.parsers.AbstractSAXParser.parse(Unknown
> Source)
>
> I am not sure what would be causing this.
>
> I don't run CDA, so I can't speak to the operational status of those
> components or the pipeline in general.
>
> Does anybody else out there use CDA?
>
> Sean
>
>
> ________________________________________
> From: Finan, Sean <[email protected]>
> Sent: Wednesday, December 18, 2019 2:22 PM
> To: [email protected]
> Subject: Re: cTAKES handling HL7 CDA Level 1 [EXTERNAL] [SUSPICIOUS]
>
> * External Email - Caution *
>
>
> Hi Masoud,
>
> I am not an xml expert, so take this with a grain of salt.
>
> I think that something is wrong/unmatched with the first line of your xml
> document.
> Make sure that the first line is something like:
> <?xml version="1.0" encoding="utf-8"?>
>
> Sean
>
> ________________________________________
> From: Masoud Rouhizadeh <[email protected]>
> Sent: Wednesday, December 18, 2019 1:47 PM
> To: [email protected]
> Subject: Re: cTAKES handling HL7 CDA Level 1 [EXTERNAL]
>
> Hi all,
>
> I'm using the cTAKES user installation to process CDA documents with
> AggregateCdaProcessor.xml and AggregateCdaUMLSProcessor.xml located in
> /desc/ctakes-clinical-pipeline/desc/analysis_engine/
>
> My script to call this is
>
> java -Dctakes.umlsuser= -Dctakes.umlspw= -cp
> $CTAKES_HOME/lib/*:$CTAKES_HOME/desc/:$CTAKES_HOME/resources/
> -Dlog4j.configuration=file:$CTAKES_HOME/config/log4j.xml -Xms2g -Xmx3g
> org.apache.ctakes.core.cpe.CmdLineCpeRunner
> $CTAKES_HOME/desc/ctakes-clinical-pipeline/desc/collection_processing_engine/test_cda_masoud.xml
>
> test_cda_masoud.xml has a proper path to CDA input and output. I'm using
> the two CDA files that come with the cTAKES package (testpatient_cn_2.xml
> and testpatient_cn_1.xml compatible with NotesIIST_RTF.DTD).
>
> Unfortunately, it seems that CdaCasInitializer cannot run, and I get the
> attached errors. I get the same errors when using the GUI with
> AggregateCdaProcessor AE
>
> - Am I missing something obvious?
> - Does cTAKES *User* installation handle CDA documents?
> - Is org.apache.ctakes.core.cpe.CmdLineCpeRunner an appropriate pipeline
> for CdaCasInitializer?
>
> Thank you so much for your help in advance.
>
> Masoud
>
>
>
>
>
>
>
> On 11/8/19, 8:30 AM, "Finan, Sean" <[email protected]>
> wrote:
>
>
>     Hi Masoud,
>
>     I think that the CdaCasInitializer is at least 10 years old.  I would
> not expect it to conform to any recent standards.
>
>     Does anybody else have a reader or transformer that can handle HL7 CDA
> r2?
>
>     Sean
>
>     p.s.
>     If anybody is involved with HL7 International, you may want to get
> some movement on addressing the typo on the page header(s):
>
>     Section 1a: Clinical Document Architcture (CDA®)
>
>     ________________________________________
>     From: Masoud Rouhizadeh <[email protected]>
>     Sent: Thursday, November 7, 2019 5:59 PM
>     To: [email protected]
>     Subject: cTAKES handling HL7 CDA Level 1 [EXTERNAL]
>
>     Dear cTAKES developer mailing list,
>
>     We have been working on a project at Hopkins for converting
> Epic-generated RTF notes into Clinical Document Architecture Level One.
>
>     We have been using HL7 CDA® Release 2 Schema, and now we plan to use
> cTAKES for concept extraction from those documents. The CDA Schema and
> examples can be found here
>
> https://www.hl7.org/implement/standards/product_brief.cfm?product_id=7
>
>     In the cTAKES documentation, I see that CdaCasInitializer "does not
> handle all CDA documents. The CDA document must conform to the DTD
> resources/cda/NotesIIST_RTF.DTD."
>
>     Has anyone tested and evaluated cTAKES' ability to consume HL7 CDA
> Level 1 Release 2 documents?
>
>     Thank you,
>     Masoud
>
>     ----
>     Masoud Rouhizadeh, PhD
>     Faculty - Division of Health Science Informatics (DHSI)
>     NLP Lead - Institute for Clinical and Translational Research (ICTR)
>     Johns Hopkins University School of Medicine
>
> https://www.cs.jhu.edu/~mrou/
>
>
>
>
>
