Hi Manuel,

Thank you for the information.  I have a couple of response lines …


> I need to do it because cTAKES seems to not work with the Portuguese language 
> at all
                - Yes and no … You can create a dictionary of terms in the 
Portuguese language.  This would allow ctakes to at least recognize these terms 
and save them for posterity.  However, the more advanced processing available 
for English (negation, uncertainty detection, etc.) will not be available.  If 
you can find other nlp projects that work with Portuguese it may be possible to 
insert them into a ctakes pipeline.  The instructions for creating a custom 
dictionary are here (language selection is not documented, but it is in the GUI; 
download the UMLS with Portuguese SNOMED if you can):
https://cwiki.apache.org/confluence/display/CTAKES/Dictionary+Creator+GUI

> What I have in mind is to create a pipeline system that first translates the 
> texts from Portuguese to English
                - Probably a good way to go if you have a decent translation 
tool.

> From my research, I couldn't find anything relevant in this topic.
                - We definitely could use more documentation.

> Well, since this is the user version, I don't have the runPiperSubmitter.bat 
> available
                - Correct.  It is a tool that was created after the 4.0 release.

> When I try to run the bat files inside the bin of the Dev Version, I have the 
> results shown in the image attached to this e-mail.
                -  Your attachments were scrubbed so I can’t see them.  
However, I have a guess: did you run a “maven package”, unzip the created 
installation file and run from the bin/ directory there?  Or are you running 
with the bin/ inside your development sandbox?  The second method won’t work 
and will give you the “class not found” errors that you are seeing.  If you 
want to run using Intellij, turn on the profile “runPiperGui” and compile.  
Maven should launch the gui after compilation.

> Well, first of all, my objective is to share my experiences with cTAKES, in 
> order to share with the community what I'm going through. This way I can 
> contribute to the community and probably help others who are going through 
> the same as me.
                -  Excellent.  Would you be willing to write documentation for 
the ctakes wiki?  Your emails are clear and extremely well formatted!


  1.  Is this feasible? Am I aiming for something that I simply can't rely on 
cTAKES alone to do, because I have to translate the texts first?

-          Ctakes won’t translate for you, but if you can find a tool that will 
then processing with ctakes should be possible.
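
As a rough illustration of that flow, here is a minimal Python sketch. It assumes you supply your own translation function; `stage_translations`, `clinical_pipeline_command`, `translate` and `ctakes_home` are hypothetical names made up for this example. The only cTAKES-specific part is the command line, which reuses the runClinicalPipeline flags quoted further down in this thread.

```python
from pathlib import Path
from typing import Callable, List


def stage_translations(src_dir: Path, dst_dir: Path,
                       translate: Callable[[str], str]) -> List[Path]:
    """Translate every .txt note in src_dir into dst_dir, keeping file names.

    `translate` is a placeholder for whatever Portuguese-to-English tool
    you end up using; cTAKES does not provide one.
    """
    dst_dir.mkdir(parents=True, exist_ok=True)
    written = []
    for src in sorted(src_dir.glob("*.txt")):
        text = src.read_text(encoding="utf-8")
        out = dst_dir / src.name
        out.write_text(translate(text), encoding="utf-8")
        written.append(out)
    return written


def clinical_pipeline_command(ctakes_home: Path, input_dir: Path, xmi_dir: Path,
                              user: str, password: str) -> List[str]:
    """Build the runClinicalPipeline command line used in this thread."""
    return [
        str(ctakes_home / "bin" / "runClinicalPipeline"),
        "-i", str(input_dir),
        "--xmiOut", str(xmi_dir),
        "--user", user,
        "--pass", password,
    ]
```

The returned list could then be handed to `subprocess.run(cmd, check=True)` from the unzipped installation. On Windows the script name will probably need its .bat extension.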

  2.  Why don't I have a TypeSystem.xml file to feed CVD first, in the 
Development Version? I can only find it in the User Version, under /resources.

-          The typesystem.xml file is in the ctakes-type-system project until 
you “maven package” and create an “installation”.  If you just run from your 
developer environment you can point to the TypeSystem.xml in 
ctakes-type-system/src/main/resources/…
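
If it helps to locate the file in a checkout, a small Python sketch like the one below would list every TypeSystem.xml under that resources directory. The function name `find_type_systems` and the `checkout` argument are hypothetical; the only path it relies on is the ctakes-type-system/src/main/resources prefix mentioned above.

```python
from pathlib import Path
from typing import List


def find_type_systems(checkout: Path) -> List[Path]:
    """Return all TypeSystem.xml files below ctakes-type-system's resources."""
    resources = checkout / "ctakes-type-system" / "src" / "main" / "resources"
    if not resources.is_dir():
        # Not a cTAKES source checkout, or the module is missing.
        return []
    return sorted(resources.rglob("TypeSystem.xml"))
```

Any path it prints can be opened in CVD before loading the XMI files.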

  3.  Why do we have options in CVD for other languages, but it clearly only 
works for the English language?

-          The cvd is a tool that is part of Apache UIMA.  It is more generic 
than ctakes and can read xmi files created by other systems.  I have no idea 
what the details are concerning its language support.

  4.  Any other hint you can give me, concerning the big picture of what I'm 
trying to build here?

-          Not really, sorry.  Multilingual processing is outside my area of 
knowledge.

Sean


From: Manuel Lamy [mailto:[email protected]]
Sent: Thursday, January 25, 2018 2:28 PM
To: [email protected]
Subject: Re: Problem using CPE and XMI Writer CAS Consumer [EXTERNAL]

Hello Sean,

First of all, thanks a lot for the quick and detailed answer. Awesome support 
from you.

I'll give you a structured answer to be as objective and concise as 
possible. I guess it's important to tell you what I'm trying to achieve in 
order for you to help me.

My Project

I'm actually making a project with cTAKES in a partnership with a Portuguese 
hospital.

My goal is to create reports of the narrative parts of the EMRs of this 
hospital, in order to report the symptoms, diseases and clinical procedures 
found in each EMR.

What I have in mind is to create a pipeline system that first translates the 
texts from Portuguese to English, and then creates these reports based on the 
translated texts.

I'm not even sure yet I can create a pipeline system of this style with cTAKES. 
I need to do it because cTAKES seems to not work with the Portuguese language 
at all (despite that option being shown in the languages list when using CVD 
and that's confusing). So, well, I will translate it, I guess it's my best bet.

But just a note: I think there should be more support and documentation about 
how to work with cTAKES in languages other than English. From my research, 
I couldn't find anything relevant on this topic. Not even one reference stating 
clearly that cTAKES only works with English and not with other languages.

Version of cTAKES

Naturally, I'm running the development version of cTAKES. I'm using Intellij. 
I'm using the latest version of cTAKES, trunk, that corresponds to version 
4.0.1-SNAPSHOT.

So, I guess so far so good, just as you said, I'm using trunk.

I did everything as per the guide "Developer Install Guide", concerning the 
Intellij instructions. The guide I used can be found here: 
https://cwiki.apache.org/confluence/display/CTAKES/cTAKES+4.0+Developer+Install+Guide


Behavior of cTAKES when running pipelines

Well, I did what you told me. I ran the Default Clinical Pipeline and the Piper 
File Submitter as per the wiki pages. I have both the User and Development 
versions on my machine.

Now, I tried to run those pipelines in the User and Development versions. I ran 
the respective bat files:


  *   For the Default Clinical Pipeline I ran 'bin/runClinicalPipeline  -i 
inputDirectory  --xmiOut outputDirectory  --user umlsUsername  --pass 
umlsPassword'
  *   For the Piper File Submitter, I ran the 'bin/runPiperSubmitter'

Well, the results of running these two bat files were quite different for the 
User and Development versions.

User Version

Default Clinical Pipeline

In this version, I went to bin directory and just ran the line 
'bin/runClinicalPipeline  -i inputDirectory  --xmiOut outputDirectory  --user 
umlsUsername  --pass umlsPassword' with my parameters.

It worked well and created the XMI output files where they were supposed to be. 
And I could open them in CVD, first opening a TypeSystem.xml file and then the 
generated XMI files I wanted.

Piper File Submitter

Well, since this is the user version, I don't have the runPiperSubmitter.bat 
available. Is this normal? That's understandable and I guess normal, from what I 
understand from this quote: "If you are running from a development environment 
(checked out trunk from SVN) they can also be run using the Piper File 
Submitter GUI." But you tell me.

Well, I can say the User Version did what I wanted in this step, but I thought 
it would be nice to replicate it in the Development version, since I guess 
I'll have to use it in the future in order to implement everything I want for 
my project described at the beginning of this e-mail. And the problems arose in 
the Development version...

Development Version

Well, in this version, I tried to replicate what I did in the User version, 
thinking to myself it would output the same result. I was wrong.


Default Clinical Pipeline and Piper File Submitter

When I try to run the bat files inside the bin of the Dev Version, I have the 
results shown in the image attached to this e-mail.

Yes, could not find or load PiperFileRunner and PiperRunnerGui. Is this supposed 
to happen in the Development Version? Am I doing something wrong here? I 
just followed the guides you have available. My whole Development Version 
installation was done per the guide.


My objective with this e-mail

Well, first of all, my objective is to share my experiences with cTAKES, in 
order to share with the community what I'm going through. This way I can 
contribute to the community and probably help others who are going through the 
same as me.

Second, I would like to know your opinion about the feasibility of 
what I'm trying to make here. My goal is to build a pipeline system like:


  *   EMRs in Portuguese already in txt files in a directory -> Translation to 
English -> Process all of the texts with Clinical Pipeline -> Output XMI in 
order to open them in CVD

This is what I am aiming for with cTAKES. So I have the following questions:


  1.  Is this feasible? Am I aiming for something that I simply can't rely on 
cTAKES alone to do, because I have to translate the texts first?
  2.  Why don't I have a TypeSystem.xml file to feed CVD first, in the 
Development Version? I can only find it in the User Version, under /resources.
  3.  Why do we have options in CVD for other languages, but it clearly only 
works for the English language?
  4.  Any other hint you can give me, concerning the big picture of what I'm 
trying to build here?

If you need any additional information from my side, just tell me.

Thanks one more time for the quick answers and support Sean.

Best regards,

Manuel


2018-01-25 15:35 GMT+00:00 Finan, Sean <[email protected]>:
Hi Manuel,

My first comment is that you are running ctakes in a somewhat “ancient” manner, 
or better put, the xml descriptor workflow has been pretty much deprecated.

You should try to run ctakes 4.0.  If you are software savvy then I advise that 
you try the development version that is in trunk.  You’ve probably been on the 
ctakes download page, but just a reminder :
http://ctakes.apache.org/

The ctakes wiki has some useful information, and the 4.0 entry is here:
https://cwiki.apache.org/confluence/display/CTAKES/cTAKES+4.0

To start playing with ctakes I suggest that you try to run the default clinical 
pipeline, following the instructions here:
https://cwiki.apache.org/confluence/display/CTAKES/Default+Clinical+Pipeline

Those instructions will start the default clinical pipeline from a command 
line.  If you have the development version from trunk then there is a gui 
available to run pipelines:
https://cwiki.apache.org/confluence/display/CTAKES/Piper+File+Submitter+GUI

There are also many other pipeline configurations available in trunk to run 
more advanced / involved pipelines.  They are not in the 4.0 release.  The 
pipelines (including 4.0 default) are all defined using the replacement for 
those xml descriptor files.  The replacements are called “piper files”.
https://cwiki.apache.org/confluence/display/CTAKES/Piper+Files

I hope that you find the pipers easier to understand and use than the old xml 
descriptors.

Anyway, if you run the ctakes 4.0 default clinical pipeline as outlined in the 
wiki page it will use the new FileTreeReader and FileTreeXmiWriter combination.

Give it a whirl and let me know how things go.

Sean


From: Manuel Lamy [mailto:[email protected]]
Sent: Thursday, January 25, 2018 9:09 AM
To: [email protected]
Subject: Re: Problem using CPE and XMI Writer CAS Consumer [EXTERNAL]

Hello Sean,

First of all, thanks for your quick answer.

I'm probably making some confusion over here, so I have the following questions.


  1.  A CAS Consumer is defined by an XML file. What you are implying is that I 
should go to my consumer XML (__XmiWriterCasConsumer.xml) and change its 
<implementationName> tag to 'org.apache.ctakes.core.cc.FileTreeXmiWriter' 
instead of 'org.apache.ctakes.core.cc.XmiWriterCasConsumer'? Funny enough, it 
gives me a ClassNotFoundException if I do this. I would like your confirmation 
that I'm doing the right thing, please. The class is well defined in that path 
though.
  2.  Concerning the reader, I make the same analogy. Should I go to my 
descriptor and change its <implementationName> tag from 
'org.apache.ctakes.core.cr.FilesInDirectoryCollectionReader' to 
'org.apache.ctakes.core.cr.FileTreeReader'?

I did these two things and the error is the same concerning the new consumer 
'FileTreeXmiWriter', as you can see in the first image attached to this e-mail.

I would also like to ask you another question:


  3.  Why does my class 'FileTreeXmiWriter' have a lot of unresolved 
classes? You can see it in the second image attached to this e-mail. I can't 
seem to import them correctly. I tried to import the class that this class 
extends, just to check the result, and look how it resolved the import for me: 
'apache' is not recognized. I'm just kind of baffled by the hierarchy defined 
for this project. If you could give me a little clarification on this topic and 
how to solve it, I would appreciate it.

Thanks for your attention! I'm really looking forward to getting this to work. 
cTAKES seems awesome. It just needs these little tweaks.

Best regards,

Manuel





2018-01-24 22:26 GMT+00:00 Finan, Sean <[email protected]>:
Hi Manuel,

Your image got scrubbed by a server, but the problem may have been fixed in a 
recent xmi writer.  The latest xmi writer is in ctakes core and is named 
FileTreeXmiWriter.  One possible cause for a problem in the writer is if the 
document has some unexpected character or character combination.  A document 
reader should be massaging documents before they are processed and sent to the 
writer.  The most recent file reader is named FileTreeReader and is also in 
ctakes core.
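
To illustrate the kind of "massaging" meant here, the Python sketch below replaces characters that XML 1.0 does not allow (most control characters, for example) with spaces before the text reaches a writer. This is only an assumption about one likely class of problem characters, not a description of what FileTreeReader actually does, and `xml_safe` is a hypothetical name. Replacing rather than deleting keeps character offsets unchanged.

```python
import re

# Anything outside the XML 1.0 "Char" production:
# #x9 | #xA | #xD | [#x20-#xD7FF] | [#xE000-#xFFFD] | [#x10000-#x10FFFF]
_XML_INVALID = re.compile(
    r"[^\x09\x0A\x0D\x20-\uD7FF\uE000-\uFFFD\U00010000-\U0010FFFF]"
)


def xml_safe(text: str) -> str:
    """Replace characters that cannot appear in an XML 1.0 document."""
    # One space per invalid character, so annotation offsets still line up.
    return _XML_INVALID.sub(" ", text)
```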

Sean



From: Manuel Lamy [mailto:[email protected]]
Sent: Wednesday, January 24, 2018 5:10 PM
To: [email protected]
Subject: Problem using CPE and XMI Writer CAS Consumer [EXTERNAL]

Hello guys,

I'm having problems running the CPE using an XMI Writer CAS Consumer. However, 
it works with other consumers.

Problem

In the figure below, you can see my setup and the error I'm obtaining:

[Inline image 2]

Logs

Concerning logs, I'm obtaining this from Intellij:

org.apache.uima.resource.ResourceInitializationException
            at org.apache.uima.collection.impl.CollectionProcessingEngine_impl.initialize(CollectionProcessingEngine_impl.java:81)
            at org.apache.uima.impl.UIMAFramework_impl._produceCollectionProcessingEngine(UIMAFramework_impl.java:438)
            at org.apache.uima.UIMAFramework.produceCollectionProcessingEngine(UIMAFramework.java:918)
            at org.apache.uima.tools.cpm.CpmPanel.startProcessing(CpmPanel.java:573)
            at org.apache.uima.tools.cpm.CpmPanel.access$000(CpmPanel.java:105)
            at org.apache.uima.tools.cpm.CpmPanel$1.run(CpmPanel.java:713)
Caused by: org.apache.uima.resource.ResourceConfigurationException
            at org.apache.uima.collection.impl.cpm.container.CPEFactory.produceIntegratedCasProcessor(CPEFactory.java:1093)
            at org.apache.uima.collection.impl.cpm.container.CPEFactory.getCasProcessors(CPEFactory.java:547)
            at org.apache.uima.collection.impl.cpm.BaseCPMImpl.init(BaseCPMImpl.java:253)
            at org.apache.uima.collection.impl.cpm.BaseCPMImpl.<init>(BaseCPMImpl.java:127)
            at org.apache.uima.collection.impl.CollectionProcessingEngine_impl.initialize(CollectionProcessingEngine_impl.java:73)
            ... 5 more
Caused by: java.lang.Exception: The component XMI Writer CAS Consumer cannot be created. (Thread Name: Thread-5)
            ... 10 more

Attempted Solutions

I only found one guy with the same problem as me. The solution proposed in the 
thread, by Sean Finan, was to change the xml of my consumer 
(__XmiWriterCasConsumer.xml), particularly the content of the tag 
<implementationName>, from
 
<implementationName>org.apache.ctakes.core.cc.XmiWriterCasConsumerCtakes</implementationName>

to

<implementationName>org.apache.uima.tools.components.XmiWriterCasConsumer</implementationName>



However, this didn't work. The error is exactly the same. I'm out of ideas 
about what to do. I would like to have the report of CPE in XMI, in order to 
read it with CVD. You can see the thread here:

http://mail-archives.apache.org/mod_mbox/ctakes-dev/201701.mbox/%3C29cefd1fa1b44ce4a8dc92ec8b1cd882@CHEXMAIL1A.CHBOSTON.ORG%3E



Result Expected

The CPE process runs and produces its output as XMI files.



Result Obtained

Running the CPE results in an error, specifically for the consumer 
__XMIWriterCasConsumer.



Conclusion

Have any of you had this problem before? Do you have a suggestion about how 
it can be solved? Thanks a lot.



Best regards,

Manuel
