Re: Please test the Apache cTAKES 5.1.0 release candidate [EXTERNAL] [SUSPICIOUS] [SUSPICIOUS] [SUSPICIOUS] [SUSPICIOUS]

2024-05-15 Thread Miller, Timothy
Thanks Sean,
I was able to get it working – definitely a user/documentation issue and not an 
issue with the code. Looks like a great release. I’m happy to vote for release 
+1.
Tim


From: Finan, Sean 
Date: Tuesday, May 14, 2024 at 10:35 AM
To: dev@ctakes.apache.org 
Subject: Re: Please test the Apache cTAKES 5.1.0 release candidate [EXTERNAL] 
[SUSPICIOUS] [SUSPICIOUS] [SUSPICIOUS] [SUSPICIOUS]
* External Email - Caution *


Ah - are you just running the class within intellij?  If so, you need to set 
the classpath in the run configuration to be ctakes-examples.  Otherwise the 
classpath doesn't contain anything from modules outside ctakes-gui and 
ctakes-core.

Alternatively, run the maven compile step with the "runPiperGui" profile 
selected.  That will also run the piper file submitter gui with the correct 
classpath.

Using a binary build, after running bin/getUmlsDictionary, running 
bin/runPiperSubmitter also works.

I don't want to do it for 5.1.0, but I should make names of the class, profile 
and script match.

I will check the wiki instructions and make sure that -exact- details are in 
there.

Sean

________
From: Miller, Timothy 
Sent: Tuesday, May 14, 2024 12:55 PM
To: dev@ctakes.apache.org 
Subject: Re: Please test the Apache cTAKES 5.1.0 release candidate [EXTERNAL] 
[SUSPICIOUS] [SUSPICIOUS] [SUSPICIOUS]



I can check out and build successfully with mvn from the command line. I can 
successfully open in intellij and run the piper file submitter. I get an error 
trying to run the default fast pipeline piper file:

Loading Piper File DefaultTokenizerPipeline ...


Error: MESSAGE LOCALIZATION FAILED: Can't find resource for bundle 
java.util.PropertyResourceBundle, key No Analysis Component found for 
ContextDependentTokenizerAnnotator



It doesn’t seem to be able to find the ContextDependentTokenizerAnnotator.

Tim



From: Miller, Timothy 
Date: Tuesday, May 14, 2024 at 9:25 AM
To: dev@ctakes.apache.org 
Subject: Re: Please test the Apache cTAKES 5.1.0 release candidate [EXTERNAL] 
[SUSPICIOUS] [SUSPICIOUS]


What would you recommend for testing? Download the release tag to a clean 
system and try to do mvn compile and run some tests?
Tim


From: Finan, Sean 
Date: Thursday, May 2, 2024 at 6:57 AM
To: dev@ctakes.apache.org 
Subject: Re: Please test the Apache cTAKES 5.1.0 release candidate [EXTERNAL] 
[SUSPICIOUS]


Hi Gandhi,


  *   So post release I would be able to run mvn clean
install on web rest module rather than relying on resources in .m2 folder

The opposite.  Pre-release there are no jars on maven central, post-release 
there are.
Running 'mvn package' directly on the ctakes-web-rest project (in its directory)
or
running 'mvn package' on the ctakes -main- project (in the main ctakes root 
directory) with the web-rest-build profile enabled '-Pweb-rest-build'
will build the ctakes-web-rest.war web package.
That profile is defined in the main ctakes pom.
https://github.com/apache/ctakes/blob/main/pom.xml#L1074

I added a little bit to your instr

Re: Please test the Apache cTAKES 5.1.0 release candidate [EXTERNAL] [SUSPICIOUS] [SUSPICIOUS]

2024-05-14 Thread Miller, Timothy
I can check out and build successfully with mvn from the command line. I can 
successfully open in intellij and run the piper file submitter. I get an error 
trying to run the default fast pipeline piper file:

Loading Piper File DefaultTokenizerPipeline ...


Error: MESSAGE LOCALIZATION FAILED: Can't find resource for bundle 
java.util.PropertyResourceBundle, key No Analysis Component found for 
ContextDependentTokenizerAnnotator



It doesn’t seem to be able to find the ContextDependentTokenizerAnnotator.

Tim



From: Miller, Timothy 
Date: Tuesday, May 14, 2024 at 9:25 AM
To: dev@ctakes.apache.org 
Subject: Re: Please test the Apache cTAKES 5.1.0 release candidate [EXTERNAL] 
[SUSPICIOUS] [SUSPICIOUS]


What would you recommend for testing? Download the release tag to a clean 
system and try to do mvn compile and run some tests?
Tim


From: Finan, Sean 
Date: Thursday, May 2, 2024 at 6:57 AM
To: dev@ctakes.apache.org 
Subject: Re: Please test the Apache cTAKES 5.1.0 release candidate [EXTERNAL] 
[SUSPICIOUS]


Hi Gandhi,


  *   So post release I would be able to run mvn clean
install on web rest module rather than relying on resources in .m2 folder

The opposite.  Pre-release there are no jars on maven central, post-release 
there are.
Running 'mvn package' directly on the ctakes-web-rest project (in its directory)
or
running 'mvn package' on the ctakes -main- project (in the main ctakes root 
directory) with the web-rest-build profile enabled '-Pweb-rest-build'
will build the ctakes-web-rest.war web package.
That profile is defined in the main ctakes pom.
https://github.com/apache/ctakes/blob/main/pom.xml#L1074

I added a little bit to your instructions in the ctakes-web-rest README  
https://github.com/apache/ctakes/blob/main/ctakes-web-rest/README

The lines here indirectly apply to pre-release builds:
https://github.com/apache/ctakes/blob/main/ctakes-web-rest/README#L22

The 5.1.0-SNAPSHOT version of ctakes-web-rest has a dependency on the 5.1.0 
version of ctakes modules (not the SNAPSHOT).
https://github.com/apache/ctakes/blob/main/ctakes-web-rest/pom.xml#L14

Re: Please test the Apache cTAKES 5.1.0 release candidate [EXTERNAL] [SUSPICIOUS]

2024-05-14 Thread Miller, Timothy
What would you recommend for testing? Download the release tag to a clean 
system and try to do mvn compile and run some tests?
Tim


From: Finan, Sean 
Date: Thursday, May 2, 2024 at 6:57 AM
To: dev@ctakes.apache.org 
Subject: Re: Please test the Apache cTAKES 5.1.0 release candidate [EXTERNAL] 
[SUSPICIOUS]


Hi Gandhi,


  *   So post release I would be able to run mvn clean
install on web rest module rather than relying on resources in .m2 folder

The opposite.  Pre-release there are no jars on maven central, post-release 
there are.
Running 'mvn package' directly on the ctakes-web-rest project (in its directory)
or
running 'mvn package' on the ctakes -main- project (in the main ctakes root 
directory) with the web-rest-build profile enabled '-Pweb-rest-build'
will build the ctakes-web-rest.war web package.
That profile is defined in the main ctakes pom.
https://github.com/apache/ctakes/blob/main/pom.xml#L1074

I added a little bit to your instructions in the ctakes-web-rest README  
https://github.com/apache/ctakes/blob/main/ctakes-web-rest/README

The lines here indirectly apply to pre-release builds:
https://github.com/apache/ctakes/blob/main/ctakes-web-rest/README#L22

The 5.1.0-SNAPSHOT version of ctakes-web-rest has a dependency on the 5.1.0 
version of ctakes modules (not the SNAPSHOT).
https://github.com/apache/ctakes/blob/main/ctakes-web-rest/pom.xml#L14

The pre-release basically contains an equivalent to "changed code or resources" 
in that the code and resources in the pre-release do not exist on maven 
central, which is where a maven build would normally get them.
When maven builds the pre-release it will not be able to find version 5.1.0 of 
any jars through maven central, so it will look for them in your local .m2 
directory.
Maven puts the 5.1.0 jars in your .m2 directory when you run 'mvn install' on 
the main ctakes project.

In summary,
To build ctakes-web-rest to test the pre-release war, one must run 'mvn 
install' on the ctakes main project before they run 'mvn package' on the 
ctakes-web-rest project (or on the main project's web-rest-build profile).
To build ctakes-web-rest once ctakes 5.1.0 has been released, the extra 
preliminary step of running 'mvn install' will not be necessary.
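
The ordering in this summary can be sketched as a small Java helper that only lists the Maven invocations in the order they would be run. The class name, method name, and the bracketed directory labels are hypothetical and exist only for illustration; the commands themselves ('mvn install', 'mvn package', '-Pweb-rest-build') are the ones named above.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical helper: lists the Maven invocations described above, in order. */
final class WebRestBuildPlan {

    /**
     * @param released        true once the cTAKES 5.1.0 jars are on maven central
     * @param fromMainProject true to build through the main project's web-rest-build profile
     */
    static List<String> commands(boolean released, boolean fromMainProject) {
        List<String> plan = new ArrayList<>();
        if (!released) {
            // Pre-release: the 5.1.0 jars must first be placed in the local .m2 directory.
            plan.add("(ctakes root) mvn install");
        }
        // Either route builds the ctakes-web-rest war.
        plan.add(fromMainProject
                ? "(ctakes root) mvn package -Pweb-rest-build"
                : "(ctakes-web-rest) mvn package");
        return plan;
    }

    public static void main(String[] args) {
        // Print the pre-release plan when building inside the ctakes-web-rest directory.
        commands(false, false).forEach(System.out::println);
    }
}
```

Once 5.1.0 is released, calling the helper with released set to true drops the 'mvn install' step, which mirrors the last sentence of the summary.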


  *   If you have some time this week, we can connect to understand what 
exactly is the problem.

I can meet you tomorrow evening your time (4-7 pm IST) to work with you on the 
SQL problem.  If you'd rather keep your Friday night to yourself, I can work 
with the same time slot any time through next Monday evening.

Before the 6.0.0 release I will put some Release Manager information in the 
wiki.  The maven release process using a GitHub repo requires a little trick 
that took me a long time to figure out, and the pre-release testing deserves 
some recorded documentation.

Sean




From: gandhi rajan 
Sent: Thursday, May 2, 2024 1:42 AM
To: dev@ctakes.apache.org 
Subject: Re: Please test the Apache cTAKES 5.1.0 release candidate [EXTERNAL]



Hi Sean,

Thanks for the update. So post release I would be able to run mvn clean
install on web rest module rather than relying on resources in .m2 folder
you mean? In fact I was trying to build them on a machine which doesn't have
any historic jars in the .m2 folder an

Re: Examining Ctakes 5.0 - two sides of the same question [EXTERNAL] [SUSPICIOUS] [SUSPICIOUS] [SUSPICIOUS]

2023-12-20 Thread Miller, Timothy
To some extent I think (and hope!) it will be superseded by the PBJ code that 
will be in cTAKES 5.0.0 anyways.
Tim


From: Finan, Sean 
Date: Wednesday, December 20, 2023 at 3:43 PM
To: dev@ctakes.apache.org 
Subject: Re: Examining Ctakes 5.0 - two sides of the same question [EXTERNAL] 
[SUSPICIOUS] [SUSPICIOUS] [SUSPICIOUS]


Hi Tim,

Thanks for the explanation.  I am going to remove the BERTRest classes.

Sean

From: Miller, Timothy 
Sent: Wednesday, December 20, 2023 6:25 PM
To: dev@ctakes.apache.org 
Subject: Re: Examining Ctakes 5.0 - two sides of the same question [EXTERNAL] 
[SUSPICIOUS] [SUSPICIOUS]



Hi Sean and Peter,
I put the BERTRest stuff in, with the intention of finishing it and adding the 
python code to run the REST server, but just never finished it up. I’m ok with 
leaving it out for now. (Now that we are on GitHub it would be so much easier 
to do things like this in branches and only merge when it’s actually finished!)
Thanks
Tim


From: Finan, Sean 
Date: Tuesday, December 5, 2023 at 10:59 AM
To: dev@ctakes.apache.org 
Subject: Re: Examining Ctakes 5.0 - two sides of the same question [EXTERNAL] 
[SUSPICIOUS]


Hi Peter,

My thoughts on this:

> a newer version of Mastif than Ctakes is packaged with, and additional 
> modifications that I've made
- Are you saying that you made a local/custom version of mastif that builds 
upon their latest (1.6?) version?  Or do you just need to update the ctakes 
dependency from 1.4 to 1.6?
- If you have a custom version of mastif, one way that I have dealt with this 
in other projects is to keep changes to the standard library 'in parallel' and 
call the parallel versions in my/our code.  If mastif is written in java then 
this is fairly easy to do by creating another module that shares the package 
structure of mastif.  The parallel code can be put into ctakes.
- Another way to deal with it is to distribute your custom code and jar with 
ctakes.  Mastif appears to use an Apache 2.0 license, so I think that this can 
be done.  If your changes are extensive or make the parallel option 
inconvenient then this may be the way to go.  "Developed by the Apache Software 
Foundation <https://www.apache.org/> and introduced in 2004, the Apache 2.0 
License is a permissive free software license. The license permits use of the 
software for any purpose, users are able to distribute it, to modify it, and 
to distribute modified versions of the software."  - 
https://pitt.libguides.com/openlicensing/apache2#:~:text=Apache%20License,modified%20versions%20of%20the%20software
- For an inclusion of a modified mastif in ctakes, maybe just put the whole 
thing into a project named "ctakes-mastif".

> no documentation or references
- A common problem with ctakes code, tests, resources ...  unfortunate and 
difficult to deal with.  I am guilty of some of this paucity of information.

> dependent on a BertREST server
- In instances such as this I would say that somebody checked in unfinished 
code, or that somebody forgot to check in a resource.  However, this particular 
code probably came from a developer working on an external project and checked 
in code that is intended to be used by that project.
- For any new developers out there: It is a 'best practice' to create your own 
project and include ctakes as a dependency.  Keep your project code only in 
your project repository.  If you want to make changes to ctakes in parallel, 
you can also create a module in your ctakes source root and put your non-ctakes 
code only in that module.  Don't

Re: Examining Ctakes 5.0 - two sides of the same question [EXTERNAL] [SUSPICIOUS]

2023-12-20 Thread Miller, Timothy
Hi Sean and Peter,
I put the BERTRest stuff in, with the intention of finishing it and adding the 
python code to run the REST server, but just never finished it up. I’m ok with 
leaving it out for now. (Now that we are on GitHub it would be so much easier 
to do things like this in branches and only merge when it’s actually finished!)
Thanks
Tim


From: Finan, Sean 
Date: Tuesday, December 5, 2023 at 10:59 AM
To: dev@ctakes.apache.org 
Subject: Re: Examining Ctakes 5.0 - two sides of the same question [EXTERNAL] 
[SUSPICIOUS]


Hi Peter,

My thoughts on this:

> a newer version of Mastif than Ctakes is packaged with, and additional 
> modifications that I've made
- Are you saying that you made a local/custom version of mastif that builds 
upon their latest (1.6?) version?  Or do you just need to update the ctakes 
dependency from 1.4 to 1.6?
- If you have a custom version of mastif, one way that I have dealt with this 
in other projects is to keep changes to the standard library 'in parallel' and 
call the parallel versions in my/our code.  If mastif is written in java then 
this is fairly easy to do by creating another module that shares the package 
structure of mastif.  The parallel code can be put into ctakes.
- Another way to deal with it is to distribute your custom code and jar with 
ctakes.  Mastif appears to use an Apache 2.0 license, so I think that this can 
be done.  If your changes are extensive or make the parallel option 
inconvenient then this may be the way to go.  "Developed by the Apache Software 
Foundation and introduced in 2004, the Apache 2.0 License is a permissive free 
software license. The license permits use of the software for any purpose, 
users are able to distribute it, to modify it, and to distribute modified 
versions of the software."  - 
https://pitt.libguides.com/openlicensing/apache2#:~:text=Apache%20License,modified%20versions%20of%20the%20software
- For an inclusion of a modified mastif in ctakes, maybe just put the whole 
thing into a project named "ctakes-mastif".

> no documentation or references
- A common problem with ctakes code, tests, resources ...  unfortunate and 
difficult to deal with.  I am guilty of some of this paucity of information.

> dependent on a BertREST server
- In instances such as this I would say that somebody checked in unfinished 
code, or that somebody forgot to check in a resource.  However, this particular 
code probably came from a developer working on an external project and checked 
in code that is intended to be used by that project.
- For any new developers out there: It is a 'best practice' to create your own 
project and include ctakes as a dependency.  Keep your project code only in 
your project repository.  If you want to make changes to ctakes in parallel, 
you can also create a module in your ctakes source root and put your non-ctakes 
code only in that module.  Don't check in that module!
- All that said, everybody forgets/makes mistakes/hurries ...


Sean


From: Peter Abramowitsch 
Sent: Tuesday, December 5, 2023 12:38 PM
To: dev@ctakes.apache.org 
Subject: Examining Ctakes 5.0 - two sides of the same question [EXTERNAL]



The question is:  what is our policy if a resource in the ctakes archive
depends upon another resource that is not in the archive and may not be
available elsewhere?  I'm sure there are other examples, but here are
two:

1.   I've done some enhancements to the ZoneAnnotator for note section
detection, but these depend upon a newer version of Mastif than Ctakes is
packaged with, and additional modifications that I've made.   If I do add
the updates to the Zone Annotator, where should I put the customized Mastif
library - does it belong in cTakes?

2.  I found a couple of interesting annotators in the archive that are
dependent on a BertREST server, but there's no documentation or references
as to what code base that server comes from or whether its BERT model is
even publicly available.

DocTimeRelBertRestAnnotator
TemporalBertRestAnnotator
PolarityBertRestAnnotator

Here's my feeling:  Ctakes sources should be packaged to either be
self-sufficient or based on publicly available dependencies at the time of
check in.  If we really want to keep d

Best practices for documenting NLP versions

2022-10-21 Thread Miller, Timothy
We’ve recently been using cTAKES for some internal projects where we make 
modifications, often using the REST server, combined with an open-source python 
client that makes the output of the REST server easy to post-process:
https://github.com/Machine-Learning-for-Medical-Language/ctakes-client-py
written by my colleagues Andy McMurry and Mike Terry, and pip installable. The 
output is then either converted to FHIR or written to whatever convenient 
format we need.

But it’s useful to know for a given run on a given project, what was the NLP 
configuration that produced this output? Obviously, there are things like 
version numbers, but since cTAKES is highly configurable, and our 
post-processing libraries have versions, and we may use trunk or a previous 
commit instead of releases, things get complicated quickly. Does anyone have an 
existing solution they are willing to share? Or does anyone have any thoughts 
on this topic? This question goes slightly beyond cTAKES, but cTAKES is 
responsible for a lot of the complexity in figuring this out since it’s the 
most configurable component.

Thanks
Tim



Re: Apache cTAKES GitHub mirror is stuck in 2019 [EXTERNAL] [SUSPICIOUS]

2022-06-02 Thread Miller, Timothy
My recollection was that we ran into issues in previous attempts at migration 
with the large file sizes in our repo.
Tim


On Thu, 2022-06-02 at 20:55 +, Finan, Sean wrote:

Thank you Gandhi and Richard.

Unless somebody else beats me to it I will perform some research and see what 
approaches can be used and which might be best.  In the end the cTAKES Project 
Management Committee will need to vote for any action as sweeping as moving to 
GitHub.

Sean



From: gandhi rajan <gandhiraja...@gmail.com>
Sent: Thursday, June 2, 2022 9:02 AM
To: dev@ctakes.apache.org
Subject: Re: Apache cTAKES GitHub mirror is stuck in 2019 [EXTERNAL]

Hi Sean,

If we are sure that the SVN has all the latest changes and active
development is primarily on SVN, then why don't we request a fresh git
repository and push all the changes over there.

More info on
https://infra.apache.org/svn-to-git-migration.html



On Thu, Jun 2, 2022 at 5:52 PM Finan, Sean
<sean.fi...@childrens.harvard.edu.invalid> wrote:

Hi Richard, you bring up a valid concern.

cTAKES Developers:

The Apache Foundation has had an initiative to "move" all projects to
GitHub for some time now.

I don't know much about how this is done.  If anybody out there has
knowledge or experience that they can pass on, please share.

Thanks,
Sean



From: Richard Eckart de Castilho <r...@apache.org>
Sent: Thursday, June 2, 2022 3:39 AM
To: dev@ctakes.apache.org
Subject: Apache cTAKES GitHub mirror is stuck in 2019 [EXTERNAL]

Hi,

it appears that the GitHub mirror of Apache cTAKES may be stuck.

When I check the svn log of
https://svn.apache.org/repos/asf/ctakes/trunk/ , I can
see activity as recent as May 2022.

However, on GitHub, I can only see stale branches:
https://github.com/apache/ctakes/branches

Wouldn't it be good if the GitHub mirror would be kept up-to-date?

Best,

-- Richard




--
Regards,
Gandhi

"The best way to find urself is to lose urself in the service of others !!!"


Re: Ctakes + UMLS dictionary [EXTERNAL]

2022-01-18 Thread Miller, Timothy
I recently posted an updated 2021AA UMLS file to the ctakes resource 
sourceforge repo:
https://sourceforge.net/projects/ctakesresources/files/

which should be a drop-in replacement for the version included in the last 
ctakes release.

If you extract this new file in the same directory as your release version, 
this container setup is an example of how to download and where to put the file:
https://github.com/Machine-Learning-for-Medical-Language/ctakes-rest-package/blob/master/Dockerfile
and it references the upgrade of the dictionary descriptor here:
https://github.com/Machine-Learning-for-Medical-Language/ctakes-rest-package/blob/master/customDictionary.xml

Tim



On Tue, 2022-01-18 at 16:52 +, Shyam Bhimani wrote:

Peter,

Appreciate your response.

Shyam Bhimani










-Original Message-
From: Peter Abramowitsch <pabramowit...@gmail.com>
Sent: Tuesday, January 18, 2022 9:19 AM
To: dev@ctakes.apache.org
Subject: Re: Ctakes + UMLS dictionary

As distributed, it contains the mappings of cuis to 2015 snomed and rxnorm 
vocabularies.  It does not contain ICD 9 or 10 mappings.  But creating a custom 
dictionary is a normal aspect of any serious installation. This is how you can 
incorporate more recent versions of the umls and other vocabularies.  See the 
ctakes dictionary creator for more information.


On Tue, Jan 18, 2022, 7:36 AM Shyam Bhimani <sbhim...@targetrwe.com> wrote:

Hello,

When I dug a little deeper I found the information below on the cTAKES wiki.
Does it mean the default clinical pipeline uses the 2015 version of SNOMED,
RxNorm, ICD9, ICD10? Please advise.

Shyam Bhimani




*From:* Shyam Bhimani <sbhim...@targetrwe.com>
*Sent:* Thursday, January 13, 2022 8:10 PM
*To:* dev@ctakes.apache.org
*Subject:* Ctakes + UMLS dictionary

Hello,

I am new to cTAKES and having a hard time understanding what
year/version dictionary (SNOMED-CT, RxNorm, ICD9 etc) is being used by
the cTAKES default clinical pipeline.

I have some medication names that are not being picked up by cTAKES, e.g.
dupilumab, dupixent, so I am trying to understand why. Please advise.

TIA

Shyam Bhimani
*Software Engineer*
*Target RWE*
5001 S Miami Blvd, Suite 100
Durham, NC 27703
sbhim...@targetrwe.com
C: (817) 323-0632




Re: Performance of the cleartk history module [EXTERNAL]

2022-01-04 Thread Miller, Timothy
Peter,
That sounds really useful! Were you able to benchmark it for runtime on a 
reasonably sized sample of your notes? Just curious because I wouldn't have 
expected regex to be that much of a bottleneck.
Tim


On Tue, 2022-01-04 at 17:36 -0800, Peter Abramowitsch wrote:

* External Email - Caution *



Thank you for the fulsome and humorous response.  Yes, I understand
perfectly.  We definitely think along the same lines.  One of the drawbacks
of static, simple-to-understand utility functions like JCasUtil's is
that one can just slap things together without getting to grips with the
waste of resources that sometimes occurs.

This brings me to the topic of Negex.  I've made a lot of improvements to
it, including after I sent you that version last year.  It has been well
tested on over 100 million notes, so I think I can check it in.  But back to
performance: it used to execute 200+ regular expressions multiple times on
every sentence covering an identified annotation, regardless of whether
there was any hope of any of them matching.  My solution was to build an
inverted index of the compiled expressions keyed on unique words found in
the expressions, so that, based on the sentence, I could look up and execute
only the expressions that might match.  This might cut the number of regex
operations down to 5 or 10, and sometimes none at all.  There were many
other changes related to negation detection, of course.  For instance,
handling sentences that switch between negating and non-negating phrases
within the same sentence.
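
A minimal Java sketch of the inverted-index idea described above. It is not the actual Negex code: the class name RegexWordIndex and its methods are hypothetical, and the sketch assumes the caller supplies, for each expression, one literal word that must be present for the expression to match (the message describes keying on words found in the expressions themselves).

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.regex.Pattern;

// Hypothetical helper: an inverted index from trigger words to compiled patterns.
final class RegexWordIndex {
    // Lowercase trigger word -> patterns that can only match if that word is present.
    private final Map<String, List<Pattern>> index = new HashMap<>();

    // Compiles the expression once and files it under its required word.
    void add(String requiredWord, String regex) {
        index.computeIfAbsent(requiredWord.toLowerCase(), k -> new ArrayList<>())
             .add(Pattern.compile(regex, Pattern.CASE_INSENSITIVE));
    }

    // Returns only the patterns whose required word occurs in the sentence,
    // so the caller never runs expressions that have no chance of matching.
    List<Pattern> candidates(String sentence) {
        Set<Pattern> found = new LinkedHashSet<>();
        for (String word : sentence.toLowerCase().split("\\W+")) {
            List<Pattern> patterns = index.get(word);
            if (patterns != null) {
                found.addAll(patterns);
            }
        }
        return new ArrayList<>(found);
    }
}
```

With two registered expressions keyed on "denies" and "without", a sentence containing neither word yields an empty candidate list and no regex is executed at all.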


Peter


On Tue, Jan 4, 2022 at 10:47 AM Finan, Sean <sean.fi...@childrens.harvard.edu> wrote:


Great question.


The package name "windowed" isn't helpfully self-descriptive.  It contains
yet another bit of code that I wrote as quickly as possible to help
somebody in real time with a problem.

* There is only a 'procedural' difference between the two.  The models and
methods are the same.

The assertion engine has a bunch of objects delegating to objects
delegating to more objects.  Each object frequently calls one or more
JCasUtil.select() for the same types.  They also redundantly
call JCasUtil.selectCovered() and selectCovering() for the same types.


process( jcas ) {
  Collection<..> sentences = ...select(..);
  delegateA.handle( sentences );
}

class DelegateA {
  void handle( Collection<..> sentences ) {
    for ( Sentence sentence : sentences ) {
      // Traverses the cas once per sentence.
      Collection<..> tokens = JCasUtil.selectCovered( jcas, Token.class, sentence );
      delegateB.use( tokens );
    }
  }
}

class DelegateB {
  void use( Collection<..> tokens ) {
    // Traverses the cas again to recover the sentence the caller already had.
    Collection<..> sentence = JCasUtil.selectCovering( jcas, Sentence.class, tokens );
    ...
  }
}


The above isn't an exact representation, but you get the point.
The problem with code like this is repeated traversal of the (object)
array in the cas.  Every JCasUtil.select* pores through the whole thing.
For a small document with a small cas (or early in a pipeline), that array
may be small and the traversal fast.  However, when people are
(inadvisably) processing a single document whose size is in the gigabyte
range, repeatedly going through the cas takes a long time.

So, what I did was create a single container object that holds Collections
of the types of interest and their covering relationships, populate all
that stuff once per process( jcas ) and pass that container through to each
delegate object.  Basically, a jcas lite.  The biggest culprit in the
assertion engines was repeatedly iterating over the array for covered and
covering windows, hence the subpackage name "windowed".
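
A rough, self-contained Java sketch of the "populate once, pass the container along" idea. This is not the code in the windowed package: Span and WindowedIndex are hypothetical stand-ins that avoid any UIMA dependency, and the nested loop is only the simplest way to compute covering relationships once per document.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical stand-in for a cas annotation: a type name plus character offsets.
final class Span {
    final String type;
    final int begin;
    final int end;

    Span(String type, int begin, int end) {
        this.type = type;
        this.begin = begin;
        this.end = end;
    }
}

// Hypothetical "jcas lite": built once per document, then handed to every delegate
// so that no delegate has to traverse the full annotation array again.
final class WindowedIndex {
    final List<Span> sentences = new ArrayList<>();
    final Map<Span, List<Span>> tokensBySentence = new HashMap<>();
    final Map<Span, Span> sentenceByToken = new HashMap<>();

    WindowedIndex(List<Span> annotations) {
        List<Span> tokens = new ArrayList<>();
        // Single pass over everything to pick out the types of interest.
        for (Span a : annotations) {
            if (a.type.equals("Sentence")) {
                sentences.add(a);
            } else if (a.type.equals("Token")) {
                tokens.add(a);
            }
        }
        // Covered and covering relationships are computed here, once.
        for (Span s : sentences) {
            List<Span> covered = new ArrayList<>();
            for (Span t : tokens) {
                if (t.begin >= s.begin && t.end <= s.end) {
                    covered.add(t);
                    sentenceByToken.put(t, s);
                }
            }
            tokensBySentence.put(s, covered);
        }
    }
}
```

A delegate that receives a WindowedIndex looks up tokensBySentence or sentenceByToken directly instead of issuing its own selectCovered or selectCovering call.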


Is it faster for smaller docs?  Not so much.  Does it instantaneously
process the Encyclopedia Britannica as one text?  Of course not.  Is it
orders of magnitude faster on such onerous docs?  In my tests, yes.

Going through my delegating example above, the end delegate is the same.
Hence the processing is the same and repeatable.  In my tests on both small
and gargantuan documents, the windowed version and the original version
produced the same output.


Sean








From: Peter Abramowitsch <pabramowit...@gmail.com>
Sent: Tuesday, January 4, 2022 11:39 AM
To: dev@ctakes.apache.org
Subject: Re: Performance of the cleartk history module [EXTERNAL]



Hi Sean

OK.  I was confused about whether I was meant to find it in the sources.
But while you're reading this, is there a brief way to describe the
difference between the older package

org.apache.ctakes.assertion.medfacts.cleartk

and

org.apache.ctakes.assertion.medfacts.cleartk.windowed


Peter






On Tue, Jan 4, 2022 at 7:47 AM Finan, Sean <sean.fi...@childrens.harvard.edu> wrote:


Hi Peter,


I created a second engine that just used text matchi

Re: empty preferredText [EXTERNAL]

2021-12-07 Thread Miller, Timothy
OK, I thought this might be what's happening. I checked my 2021 UMLS release 
and the CUI does seem to have a preferred text there, but I think my container 
is using an older release. For what it's worth, the CUI is:
C0360554

and a sentence that reproduces the issue in CVD with the current release is:

'Patient had problems tolerating oral hydrocortisone.'

I will see if I can find the older UMLS release lying around. I think the right 
workaround for now is your suggestion of using the covered text.

Tim


On Tue, 2021-12-07 at 17:59 +0100, Peter Abramowitsch wrote:




Hi Tim,


Yes, I've definitely encountered it.  It happens when the concept has a
CUI_TERM which has matched the text, but there is no corresponding entry in
the SNOMED or other vocab table mapping CUI to SNOMED.  The obvious choice
is to use the covered text as a surrogate, but technically it could be PHI,
if that matters to you.  The other option is to see if there's an MSH term
that maps using the Metathesaurus.  If so, including MSH in your dictionary
as a src AND dest vocab will solve the problem.
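
A small Java sketch of the covered-text workaround, assuming the caller already has the preferredText, CUI and covered text as plain strings. DisplayText and its parameters are hypothetical names, not part of cTAKES or ctakes-web-rest; the allowCoveredText flag is there only to make the PHI concern explicit.

```java
// Hypothetical helper for choosing a human-readable label for a concept.
final class DisplayText {
    // Prefers the dictionary's preferredText. If that is missing, falls back to the
    // covered text only when the caller accepts that it may contain PHI; otherwise
    // falls back to the bare CUI.
    static String choose(String preferredText, String cui,
                         String coveredText, boolean allowCoveredText) {
        if (preferredText != null && !preferredText.trim().isEmpty()) {
            return preferredText;
        }
        if (allowCoveredText && coveredText != null && !coveredText.trim().isEmpty()) {
            return coveredText;
        }
        return cui;
    }
}
```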


Peter



On Tue, Dec 7, 2021 at 5:45 PM Miller, Timothy <timothy.mil...@childrens.harvard.edu> wrote:


Hello,

I'm using the dictionary lookup (through ctakes-web-rest) and trying to
read off the preferredText that comes back as a human-readable way to
display the CUI. On a very small percentage, there does not seem to be any
preferredText. Has anyone else encountered this? Is this a limitation of
the underlying ontologies or a bug we can address?

Tim




empty preferredText

2021-12-07 Thread Miller, Timothy
Hello,
I'm using the dictionary lookup (through ctakes-web-rest) and trying to read 
off the preferredText that comes back as a human-readable way to display the 
CUI. On a very small percentage, there does not seem to be any preferredText. 
Has anyone else encountered this? Is this a limitation of the underlying 
ontologies or a bug we can address?
Tim



Re: Another question about relationship extractors [EXTERNAL]

2021-10-27 Thread Miller, Timothy
Hi Peter,
I guess you're asking why there is annotator code for all the relations but 
only released models for location_of and degree_of (severity)? The simple 
reason is those are the only two that we felt were accurate enough to release. 
We had an annotated training corpus with all the relations, but some relation 
types did not have enough instances to train accurate models with the methods 
of the time. We're circling back pretty regularly to discuss whether newer 
methods might be able to do better with less data, we'll try to keep in touch 
about that.

Thanks
Tim


On Wed, 2021-10-27 at 11:35 +0200, Peter Abramowitsch wrote:




Hi (probably Sean), are the default model.jars for the
*CausesBringsAboutRelationExtractorAnnotator* and the
*ManagesTreatsRelationExtractorAnnotator* not part of the cTAKES
sources?  I looked through the source at all pipers and all unit tests,
and on the net, and I didn't find references to the usage of these
annotators.  When I run with them, they are definitely looking for models
of their own, and there is code to do the training, but this is an area
that's still a mystery to me.  Are these models proprietary to U of
Colorado, which is where the source seems to come from?


Peter


Re: Loading model - what? [EXTERNAL]

2021-09-13 Thread Miller, Timothy
Hi Ben,
Those come from the dependency parser and SRL system, and I think are generated 
from the external library (ClearNLP?) we depend on for those modules. As for 
the models themselves, the files are in ctakes-dependency-parser-res, but they 
are binary files that will be difficult to understand without ClearNLP.
Tim


On Mon, 2021-09-13 at 22:17 +0200, Benjamin hansen wrote:




Hi, when I run my cTAKES code I see loads of the following in stdout:


Loading model:

.

Loading model:

...

Loading model:

.

Loading model:



etc.


I understand that my pipeline is loading a lot of models, but which models
are they? Is there any way I can find out which models are being loaded in
the pipeline?

I have tried searching for "Loading model" in both the ctakes and opennlp
source code to figure out where it's being printed from, but to no avail :(
Where is this being printed from? How can I find out which models are loaded?


Thanks in advance


Re: rule-based lookup for custom lexicon [EXTERNAL] [SUSPICIOUS]

2021-05-18 Thread Miller, Timothy
But Sean, isn't what he's asking for essentially already implemented in cTAKES 
as the custom dictionary? I'm currently using that approach for my covid 
container:
https://github.com/Machine-Learning-for-Medical-Language/ctakes-covid-container
Tim


From: Finan, Sean 
Sent: Tuesday, May 18, 2021 11:55 AM
To: dev@ctakes.apache.org
Cc: Himanshu Shekhar Sahoo
Subject: Re: rule-based lookup for custom lexicon [EXTERNAL] [SUSPICIOUS]



Hi Greg,

From 30,000 ft, I think that you would want to use the RutaEngine.

https://uima.apache.org/d/ruta-current/tools.ruta.book.html#ugr.tools.ruta.ae.basic
https://javadoc.io/doc/org.apache.uima/ruta-core/latest/org/apache/uima/ruta/engine/RutaEngine.html
http://svn.apache.org/repos/asf/uima/ruta/trunk/ruta-core/src/main/java/org/apache/uima/ruta/engine/RutaEngine.java

That seems to be the actual analysis engine that loads and uses rules to create 
annotations.
While you could use an xml descriptor or use the piper "set" command and do 
things like mapping ruta to ctakes type systems, I would take the alternate 
approach of "copying" the initialize(..) and process (..) methods and modify 
them to use ctakes types directly.

Disclaimer:  I know very little about uima ruta.  At some point I did look into 
it but it was for a specific (ctakes-derivative) project and I didn't go 
further than basic doc perusal.

If you move forward with this please let us all know what you find.  I think 
that there will be great interest in the community.

Sean

From: Greg Silverman 
Sent: Tuesday, May 18, 2021 11:13 AM
To: dev@ctakes.apache.org
Cc: Himanshu Shekhar Sahoo
Subject: Re: rule-based lookup for custom lexicon [EXTERNAL]



Hi Sean,
I was wondering if there was a way to use rule-based lookup of a custom
lexicon within cTAKES (say, a locally curated list of COVID-19 symptoms).
When I Googled around, I stumbled on UIMA Ruta, but couldn't find anything
with respect to cTAKES specifics.

Thanks!


Greg--

On Tue, May 18, 2021 at 10:04 AM Finan, Sean <
sean.fi...@childrens.harvard.edu> wrote:

>  To which ctakes component(s) are you referring?
> 
> From: Greg Silverman 
> Sent: Sunday, May 16, 2021 6:02 PM
> To: dev@ctakes.apache.org; Himanshu Shekhar Sahoo
> Subject: rule-based lookup for custom lexicon [EXTERNAL]
>
>
>
> I looked all over and could not find any information on how to add this
> pipeline component to cTAKES. I assume it uses UIMA Ruta?
>
> Thanks in advance!
>
> Greg--
> --
> Greg M. Silverman
> Senior Systems Developer
> NLP/IE <https://healthinformatics.umn.edu/research/nlpie-group>
> Department of Surgery
> University of Minnesota
> g...@umn.edu
>


--
Greg M. Silverman
Senior Systems Developer
NLP/IE 

Department of Surgery
University of Minnesota
g...@umn.edu


multi-threads on REST client?

2021-03-25 Thread Miller, Timothy
Just wondering what the logistics of this are. The REST interface has a
CAS pool of 10, and when it gets a new request, it grabs a CAS and
sends it into a pipeline. So what happens if the REST endpoint is
getting hit by tons of different requests at the same time? I'm
experimenting with this in Python and getting hard-to-understand errors
(as best I can tell, it looks like it's complaining that the output is
None). Just wondering if anyone has any insight about what's going on
on the server side and whether a) this _should_ work, or b) it _could_
work if done properly.
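
For reference, a generic Java sketch of how a fixed-size resource pool behaves when more requests arrive than it has resources. This is not taken from ctakes-web-rest: BoundedPool is a hypothetical class, R stands in for a reusable resource such as a CAS, and the timeout-then-fail policy is just one possible design.

```java
import java.util.List;
import java.util.concurrent.ArrayBlockingQueue;
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.TimeUnit;
import java.util.function.Function;

// Hypothetical fixed-size pool; R stands in for a reusable resource such as a CAS.
final class BoundedPool<R> {
    private final BlockingQueue<R> idle;

    BoundedPool(List<R> resources) {
        idle = new ArrayBlockingQueue<>(resources.size(), false, resources);
    }

    // Borrows a resource, runs the work, and always hands the resource back.
    // A caller that finds the pool empty waits up to timeoutMillis, then fails loudly.
    <T> T withResource(long timeoutMillis, Function<R, T> work) {
        R resource;
        try {
            resource = idle.poll(timeoutMillis, TimeUnit.MILLISECONDS);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException("Interrupted while waiting for a resource", e);
        }
        if (resource == null) {
            throw new IllegalStateException("No resource became free within the timeout");
        }
        try {
            return work.apply(resource);
        } finally {
            // Never blocks: the queue capacity equals the number of pooled resources.
            idle.offer(resource);
        }
    }

    int available() {
        return idle.size();
    }
}
```

With a pool of 10, the eleventh concurrent request waits until one of the first ten finishes. Whether it waits, fails, or returns an empty result depends entirely on the policy the server chooses at that point.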

Thanks
Tim



Re: 4.0.0.1 patch [EXTERNAL]

2021-02-26 Thread Miller, Timothy
Hi Sean,
I can't answer your primary question, but my recollection is that
4.0.0.1 was an absolutely minimalist change to just fix the
authentication, so I don't think ytex would've been touched.
Tim


On Thu, 2021-02-25 at 17:24 +, Mullane, Sean *HS wrote:
> 
> 
> Hello,
> 
> I am just catching up with the NLM auth changes. I tried replacing
> the ctakes-core-4.0.0.jar file with ctakes-core-4.0.0.1.jar, and am
> getting this error:
> 
> ERROR [PiperFileRunner] MESSAGE LOCALIZATION FAILED: Can't find
> resource for bundle java.util.PropertyResourceBundle, key No Analysis
> Component found for org.apache.ctakes.core.ae.CuiFilterAnnotator
> 
> I saw a message from Tim Miller from December mentioning removing
> ytex components from ctakes-core. Was this done on the released
> version of 4.0.0.1? We're using ytex so I wonder if that may be the
> cause of this error. Or maybe applying the patch isn't as simple as
> drop-in replacing the jar? (I changed the API key in my config files
> and that seems to be working as expected).
> 
> Thanks,
> Sean
> 
> 


Re: Looking for comparable experiences with mysql [EXTERNAL]

2021-02-25 Thread Miller, Timothy
Gandhi,
Is that code public at all? I made a docker container for the REST
server that uses the hsql, but if mysql is even faster and the
dictionary building can be containerized that might be a nice next step
for better performance of the container.
Tim


On Thu, 2021-02-25 at 20:33 +0530, gandhi rajan wrote:
> 
> 
> Hi Peter,
> 
> Noticed a similar behavior while working on the cTAKES REST module. The
> in-memory HSQL in my case was stressing the application server memory and
> ended up slowing down the process, whereas MySQL performed better. Also,
> the engine you use in MySQL matters as well.
> 
> We tested the MySQL-based UMLS dictionary using multiple pods running
> ctakes rest and it was scaling fairly well. But we haven't explored 100+
> connections. I guess that with connection pool configuration in the MySQL
> DB it should be manageable. Hope it helps.
> 
> On Thu, Feb 25, 2021 at 7:37 PM Peter Abramowitsch <
> pabramowit...@gmail.com>
> wrote:
> 
> > Hi all,
> > 
> > As an experiment I extracted my rather large HSQL UMLS dictionary into a
> > local MySQL instance and ran the equivalent of 3 simultaneous ctakes
> > pipelines with the overlap lookup annotator against it with a set of 1000
> > notes.
> > 
> > Comparing that with the same setup running against the traditional
> > in-memory HSQL database (three separate instances), I was surprised to
> > find that the MySQL implementation was 30% faster even though it is an
> > out-of-process DB.
> > 
> > Has that been anyone else's experience as well?  And if so, do you have
> > any experience with a MySQL-based UMLS dictionary with 100+ pipeline
> > connections?
> > 
> > Peter
> > 
> 
> 


Re: neural negation model in ctakes [EXTERNAL]

2021-01-24 Thread Miller, Timothy
Peter, I'd be happy to try it, especially if it's made easy with a ctakes 
module! At the very least that sounds like it would be a good baseline 
comparison to use if we are benchmarking new ML methods. We have several 
datasets available internally that are not widely available in the research 
community.
Tim


From: Peter Abramowitsch 
Sent: Sunday, January 24, 2021 12:05 PM
To: dev@ctakes.apache.org
Subject: Re: neural negation model in ctakes [EXTERNAL]



That's great Tim - it sounds very sophisticated!

In fact I had made some changes to the Negex Annotator last fall which I
hadn't checked in, as I was waiting for Sean to test them.  In a great deal
of my own testing I discovered that Negex, which is easily expandable to
accommodate new constructions, had only a couple of serious flaws, and I
believe I have fixed these, as well as a performance issue it had.  If
you're interested in testing it against yours, that would be great.
Reading your description above, I wondered how it would do in the case of
strings of entities which were negated by a single negating trigger phrase
either ahead of or behind the series.  Or what happens when a series of
entities which begins as all being negated has one expressed in a way that
stops the negation pattern.  These are the weaknesses I addressed in my
changes.

Regards
Peter

On Sun, Jan 24, 2021 at 5:08 PM Miller, Timothy <
timothy.mil...@childrens.harvard.edu> wrote:

> Hi all,
> I just checked in a usable proof-of-concept for a neural (RoBERTa-based to
> be specific) negation classifier. The way it works is a tiny bit of python
> code (using FastAPI) sets up a REST interface that runs the classifier:
> ctakes-assertion/src/main/python/negation_rest.py
>
> it runs a default model that I trained and uploaded into Huggingface
> modelhub. It will automatically download the first time the server is run.
>
> there is a startup script there too:
> ctakes-assertion/src/main/python/start_negation_rest.sh
>
> The idea would be to run this on whatever machine you have with the
> appropriate GPU resources and it creates 3 REST endpoints:
> /negation/initialize  -- to load the model (takes longer the first time as
> it will download)
> /negation/process -- to classify the data and return negation values
> /negation/collection_process_complete -- to unload the model
>
> to mirror UIMA workflows. Then, the UIMA analysis engine sits in:
>
> ctakes-assertion/src/main/java/org/apache/ctakes/assertion/ae/PolarityBertRestAnnotator.java
>
> The main work here is converting the cTAKES entities/events into a simpler
> data structure that gets sent to the python REST server, making the REST
> call, and then converting the classifier output into the polarity property.
>
> Performance:
> The accuracy of this classifier is much better in my testing. I am looking
> forward to being able to hopefully make the path to improving the
> performance easier as it can potentially just be a change to the model
> string to have it grab a new model on modelhub.
>
> The speed is marginally slower if we do a 1-for-1 swap, but that's a
> little bit misleading, because we currently run 2 parsers to generate
> features for the default ML negation module. If we don't need those parsers
> we can dramatically cut the processing time even with the neural
> negation module. I tested this with the python code running on a machine
> with a 1070ti. The goal for these methods going forward if we want to scale
> should be to have the neural call do a few things with a single pass,
> especially if we are using large transformer models. But this proof of
> concept of a single task will hopefully make it easier for other folks to
> do that if they wish.
>
> FYI, another way of doing this is by using python libraries like cassis
> and actually having python functions be essentially UIMA AEs -- I think
> there will be a place for both approaches and I'm not trying to wall off
> work in that direction.
>
> Tim
>
>


neural negation model in ctakes

2021-01-24 Thread Miller, Timothy
Hi all,
I just checked in a usable proof-of-concept for a neural (RoBERTa-based to be 
specific) negation classifier. The way it works is a tiny bit of python code 
(using FastAPI) sets up a REST interface that runs the classifier:
ctakes-assertion/src/main/python/negation_rest.py

it runs a default model that I trained and uploaded into Huggingface modelhub. 
It will automatically download the first time the server is run.

there is a startup script there too:
ctakes-assertion/src/main/python/start_negation_rest.sh

The idea would be to run this on whatever machine you have with the appropriate 
GPU resources and it creates 3 REST endpoints:
/negation/initialize  -- to load the model (takes longer the first time as it 
will download)
/negation/process -- to classify the data and return negation values
/negation/collection_process_complete -- to unload the model

to mirror UIMA workflows. Then, the UIMA analysis engine sits in:
ctakes-assertion/src/main/java/org/apache/ctakes/assertion/ae/PolarityBertRestAnnotator.java

The main work here is converting the cTAKES entities/events into a simpler data 
structure that gets sent to the python REST server, making the REST call, and 
then converting the classifier output into the polarity property.
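
A minimal Java sketch of the last step (turning classifier output into polarity values), under loudly stated assumptions. It is not the code in PolarityBertRestAnnotator: the per-entity score list, the threshold, and the PolarityMapper name are hypothetical, and the -1 / 1 convention for negated / not negated is assumed here rather than taken from this thread.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical mapper from classifier scores to polarity values.
final class PolarityMapper {
    // Assumed convention: -1 marks a negated mention, 1 a non-negated one.
    static final int NEGATED = -1;
    static final int AFFIRMED = 1;

    // Assumes one score per entity, in the same order the entities were sent to the
    // server; scores above the threshold are treated as "negated".
    static List<Integer> toPolarities(List<Double> negationScores, double threshold) {
        List<Integer> polarities = new ArrayList<>(negationScores.size());
        for (double score : negationScores) {
            polarities.add(score > threshold ? NEGATED : AFFIRMED);
        }
        return polarities;
    }
}
```

Keeping the response in the same order as the request lets the annotator write each value back to the entity at the same position without sending identifiers across the REST call.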

Performance:
The accuracy of this classifier is much better in my testing. I am looking 
forward to being able to hopefully make the path to improving the performance 
easier as it can potentially just be a change to the model string to have it 
grab a new model on modelhub.

The speed is marginally slower if we do a 1-for-1 swap, but that's a little bit 
misleading, because we currently run 2 parsers to generate features for the 
default ML negation module. If we don't need those parsers we can dramatically 
cut the processing time even with the neural negation module. I tested 
this with the python code running on a machine with a 1070ti. The goal for 
these methods going forward if we want to scale should be to have the neural 
call do a few things with a single pass, especially if we are using large 
transformer models. But this proof of concept of a single task will hopefully 
make it easier for other folks to do that if they wish.

FYI, another way of doing this is by using python libraries like cassis and 
actually having python functions be essentially UIMA AEs -- I think there will 
be a place for both approaches and I'm not trying to wall off work in that 
direction.

Tim



Re: Apache cTAKES 4.0.0.1 : UMLS Authentication Patch [EXTERNAL]

2021-01-21 Thread Miller, Timothy
Seconded, thanks a lot Sean and Peter for getting this working and
turned around so quickly! 
Tim

On Wed, 2021-01-20 at 23:13 +0100, Peter Abramowitsch wrote:
> 
> 
> Thanks Sean!
> 
> Peter
> 
> On Wed, Jan 20, 2021 at 4:25 PM Finan, Sean <
> sean.fi...@childrens.harvard.edu> wrote:
> 
> > As some have experienced, the U.S.A. National Library of
> > Medicine (NLM)
> > has changed the authentication method for using the Unified Medical
> > Language System (UMLS).
> > 
> > https://www.nlm.nih.gov/research/umls/index.html
> >  
> > 
> > 
> > Though a bit late in its arrival, Apache cTAKES now has a patch
> > release
> > that supports the new UMLS authentication method.
> > 
> > 
> > The release number is 4.0.0.1, an update of the previous release
> > version
> > 4.0.0 with a single change to enable the new UMLS authentication.
> > 
> > No other code or functionality has been modified and there are no
> > enhancements to the previous release 4.0.0
> > 
> > 
> > There are instructions for use on the Apache cTAKES wiki.
> > 
> > https://cwiki.apache.org/confluence/display/CTAKES/cTAKES+4.0.0.1
> >  
> > 
> > 
> > The source code is available in the 4.0.0.1 tag Subversion (svn)
> > repository.
> > 
> > https://svn.apache.org/repos/asf/ctakes/tags/ctakes-4.0.0.1/
> >  
> > 
> > 
> > The jar and pom files are available from maven central and any
> > Applications utilizing Apache cTAKES as an Apache Maven dependency
> > should
> > update their pom files.
> > 
> > https://search.maven.org/search?q=ctakes
> >  
> > 
> > 
> > At this time the Apache infra script that points mirror download
> > servers
> > to the pre-built zip/archive files has not run.  I hope that the
> > mirror
> > servers are updated in a day or two.
> > 
> > When the mirror servers are updated the buttons on the "Downloads"
> > page of
> > ctakes.apache.org should trigger a download of the patch
> > version.  Until
> > then you will get a "page not found" error.
> > 
> > Until the pre-built archive downloads are available through the
> > website,
> > you can find them in the release repository.
> > 
> > 
> > https://repository.apache.org/content/repositories/releases/org/apache/ctakes/ctakes-core/4.0.0.1/
> >  
> > 
> > 
> > For more information please visit the wiki page on the Apache
> > cTAKES
> > 4.0.0.1 patch release.
> > 
> > https://cwiki.apache.org/confluence/display/CTAKES/cTAKES+4.0.0.1
> >  
> > 
> > 
> > 
> > A very special thanks goes to Peter Abramowitsch for conception and
> > original implementation of the authentication code and workflow.
> > 
> > 
> > Many thanks to those who boldly tested, documented and otherwise
> > made this
> > patch and its trunk equivalent possible, including
> > 
> > Kean Kaufmann
> > 
> > Gandhi Rajan
> > 
> > Eugenia Monogyiou
> > 
> > Timothy Miller
> > 
> > and anybody else that I have forgotten (apologies).
> > 
> > 
> > And for those of you who gave me a bit of prodding to get this
> > wrapped
> > up and published ... in the end I am grateful and you have done us
> > all a
> > service.
> > 
> > 
> > Cheers,
> > 
> > Sean
> > 
> > 


Re: 4.0.0 UMLS Authentication Patch - for Developers - Not a release [EXTERNAL] [SUSPICIOUS] [SUSPICIOUS] [SUSPICIOUS] [SUSPICIOUS] [SUSPICIOUS]

2020-12-08 Thread Miller, Timothy
Those are the ones that I set to the empty string, I don't know how
it's still finding something. I'll poke around.
Tim


On Tue, 2020-12-08 at 17:39 +, Finan, Sean wrote:
> 
> 
> In your dictionary configuration xml file?  That would be
> resources/org/apache/ctakes/dictionary/lookup/fast/sno_rx_16ab.xml  -
> depending upon what dictionary you are using.
> 
> You will find two sections that look like this:
> 
>  https://uts-ws.nlm.nih.gov/restful/isValidUMLSUser
>  
>      
>  
> 
> 
> 
> 
> From: Miller, Timothy 
> Sent: Tuesday, December 8, 2020 12:18 PM
> To: dev@ctakes.apache.org
> Subject: Re: 4.0.0 UMLS Authentication Patch - for Developers - Not a
> release [EXTERNAL] [SUSPICIOUS] [SUSPICIOUS] [SUSPICIOUS]
> [SUSPICIOUS]
> 
> 
> 
> I also forgot to follow some of the instructions for setting umls url
> and other fields in the descriptor to empty string. But now I get:
> 
> 08 Dec 2020 12:15:15  WARN UmlsUserApprover - Using alternate umlsURL
> found via: properties
> 08 Dec 2020 12:15:15  INFO UmlsUserApprover - Checking UMLS Account
> at
> https://uts-ws.nlm.nih.gov/restful/isValidUMLSUser
> :
> 
> 
> where is it finding this other umlsURL???
> 
> Tim
> 
> 
> On Tue, 2020-12-08 at 17:06 +, Finan, Sean wrote:
> > 
> > 
> > Hi Tim, Peter,
> > 
> > Just in case Peter can't get back to you right away,
> > 
> > > I'm actually setting this via my VM options as:
> > -Dctakes_umlspw=
> > 
> > I think that you want to use
> > -Dctakes.umls_apikey
> > 
> > On some systems/shells the dot doesn't work.  ctakes will also
> > accept
> > (dot to underscore)
> > -Dctakes_umls_apikey
> > 
> > 
> > I think that is what I used ...
> > 
> > 
> > From: Miller, Timothy 
> > Sent: Tuesday, December 8, 2020 11:52 AM
> > To: dev@ctakes.apache.org
> > Subject: Re: 4.0.0 UMLS Authentication Patch - for Developers - Not
> > a
> > release [EXTERNAL] [SUSPICIOUS] [SUSPICIOUS]
> > 
> > 
> > 
> > Hi Peter,
> > Sorry to leave you in the lurch so long, I've been trying it this
> > morning and running into some issues (I don't think patch-related
> > issues but just trouble getting to where I can test it).
> > 
> > So far:
> > 1) Had to add dictionary-fast to the dictionary pom to get it to
> > compile
> > 2) had to remove all ytex modules from main pom to get it to
> > compile
> > 3) org/apache/ctakes/dictionary/lookup2/util/UmlsUserTester.java
> > looks
> > to have some old code and doesn't compile (bypassed by changing to
> > "no
> > error check"
> > 
> > Then I had to remember what classes in 4.0.0 I could try to check
> > it,
> > and settled on
> > org/apache/ctakes/clinicalpipeline/ClinicalPipelineFactory.java
> > 
> > with no credentials, I get a UMLS authentication error (as
> > expected).
> > 
> > with my old credentials, I get these errors:
> > 08 Dec 2020 11:43:21 ERROR UmlsUserApprover - The user property
> > must
> > now be set to 'umls_api_key'
> > 08 Dec 2020 11:43:21 ERROR UmlsUserApprover -  Verify that you are
> > setting command-line option --user, or ctakes property umlsUser, or
> > environment variable umlsUser properly.
> > 08 Dec 2020 11:43:21 ERROR UmlsDictionaryLookupAnnotator - Error:
> > Invalid UMLS License.  A UMLS License is required to use the UMLS
> > dictionary lookup.
> > 
> > (also seems about right).
> > 
> > If i set ctakes_umlsuser=umls_api_key and ctakes_umlspw= > api
> > key>, I'm still getting an error:
> > 
> > 08 Dec 2020 11:49:05 ERROR UmlsUserApprover -   UMLS Account at
> > https://urldefense.proofpoint.com/v2/url?u=https-3A__uts-2Dws.nlm.ni

Re: 4.0.0 UMLS Authentication Patch - for Developers - Not a release [EXTERNAL] [SUSPICIOUS] [SUSPICIOUS] [SUSPICIOUS]

2020-12-08 Thread Miller, Timothy
I also forgot to follow some of the instructions for setting umls url
and other fields in the descriptor to empty string. But now I get:

08 Dec 2020 12:15:15  WARN UmlsUserApprover - Using alternate umlsURL
found via: properties
08 Dec 2020 12:15:15  INFO UmlsUserApprover - Checking UMLS Account at 
https://uts-ws.nlm.nih.gov/restful/isValidUMLSUser:


where is it finding this other umlsURL???

Tim
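
One way to track down where a value such as umlsURL is coming from is to check each possible source in turn and report the first one that defines it. The sketch below is illustrative only: the source labels and the precedence order are assumptions, and it is not the actual UmlsUserApprover logic.

```python
import os

def find_setting(name, cli_args, properties, environ=os.environ):
    """Return (value, source) for the first source that defines `name`.

    The order (command line, then properties, then environment) is an
    assumption for illustration, not necessarily the order cTAKES uses.
    """
    sources = [
        ("command line", cli_args),
        ("properties", properties),
        ("environment", environ),
    ]
    for label, mapping in sources:
        value = mapping.get(name)
        if value:  # skip missing and empty-string values
            return value, label
    return None, None

# Hypothetical case: the descriptor field was cleared, but a property remains.
value, source = find_setting(
    "umlsURL",
    cli_args={"umlsURL": ""},
    properties={"umlsURL": "https://uts-ws.nlm.nih.gov/restful/isValidUMLSUser"},
    environ={},
)
print(f"umlsURL found via: {source}")
```

An empty string is treated the same as a missing value, so clearing the descriptor field lets a value from a later source win.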


On Tue, 2020-12-08 at 17:06 +, Finan, Sean wrote:
> Hi Tim, Peter,
> 
> Just in case Peter can't get back to you right away,
> 
> > I'm actually setting this via my VM options as:
> -Dctakes_umlspw=
> 
> I think that you want to use
> -Dctakes.umls_apikey
> 
> On some systems/shells the dot doesn't work.  ctakes will also accept
> (dot to underscore)
> -Dctakes_umls_apikey
> 
> 
> I think that is what I used ...
> 
> 
> From: Miller, Timothy 
> Sent: Tuesday, December 8, 2020 11:52 AM
> To: dev@ctakes.apache.org
> Subject: Re: 4.0.0 UMLS Authentication Patch - for Developers - Not a
> release [EXTERNAL] [SUSPICIOUS] [SUSPICIOUS]
> 
> * External Email - Caution *
> 
> 
> Hi Peter,
> Sorry to leave you in the lurch so long, I've been trying it this
> morning and running into some issues (I don't think patch-related
> issues but just trouble getting to where I can test it).
> 
> So far:
> 1) Had to add dictionary-fast to the dictionary pom to get it to
> compile
> 2) had to remove all ytex modules from main pom to get it to compile
> 3) org/apache/ctakes/dictionary/lookup2/util/UmlsUserTester.java
> looks
> to have some old code and doesn't compile (bypassed by changing to
> "no
> error check"
> 
> Then I had to remember what classes in 4.0.0 I could try to check it,
> and settled on
> org/apache/ctakes/clinicalpipeline/ClinicalPipelineFactory.java
> 
> with no credentials, I get a UMLS authentication error (as expected).
> 
> with my old credentials, I get these errors:
> 08 Dec 2020 11:43:21 ERROR UmlsUserApprover - The user property must
> now be set to 'umls_api_key'
> 08 Dec 2020 11:43:21 ERROR UmlsUserApprover -  Verify that you are
> setting command-line option --user, or ctakes property umlsUser, or
> environment variable umlsUser properly.
> 08 Dec 2020 11:43:21 ERROR UmlsDictionaryLookupAnnotator - Error:
> Invalid UMLS License.  A UMLS License is required to use the UMLS
> dictionary lookup.
> 
> (also seems about right).
> 
> If i set ctakes_umlsuser=umls_api_key and ctakes_umlspw=<api key>,
> I'm still getting an error:
> 
> 08 Dec 2020 11:49:05 ERROR UmlsUserApprover -   UMLS Account at
> https://uts-ws.nlm.nih.gov/restful/isValidUMLSUser is not
> valid.
> 08 Dec 2020 11:49:05 ERROR UmlsUserApprover -  Verify that you are
> setting command-line option --user, or ctakes property umlsUser, or
> environment variable umlsUser properly.
> 08 Dec 2020 11:49:05 ERROR UmlsUserApprover -  Verify that you are
> setting command-line option --pass, or ctakes property umlsPass, or
> environment variable umlsPass properly.
> 
> 
> I'm actually setting this via my VM options as:
> -Dctakes_umlspw=
> 
> should I be doing something else?
> 
> Thanks
> Tim
> 
> 
> On Tue, 2020-12-08 at 12:21 +0100, Peter Abramowitsch wrote:
> > * External Email - Caution *
> > 
> > 
> > Attn Tim Miller
> > =
> > Hi Tim,
> > 
> > Were you able to test out the 4.0.0  umls authentication
> > patch?It
> > would
> > be good to know if it and its instructions can be dropped in
> > without
> > much
> > further work.
> > 
> > Peter
> > 
> > On Tue, Dec 1, 2020 at 3:34 PM Miller, Timothy <
> > timothy.mil...@childrens.harvard.edu> wrote:
> > 
> > > Peter, I saw the readme attachment, but it sounded from your
> > > email
> > > like
> > > there was a patch attachment too that I didn't see. Did that not
> > > come
> > > through?
> > > Tim
> > > 
> > > On Fri, 2020-11-27 at 18:19 +, Finan, Sean wrote:
> > > > * External Email - Caution *
> > > > 
> > > > 
> > > > Thanks Peter,
> > > > 
> > > > 
> > > > Happy Thanksgiving all
> > > > 
> > > > 

Re: 4.0.0 UMLS Authentication Patch - for Developers - Not a release [EXTERNAL] [SUSPICIOUS]

2020-12-08 Thread Miller, Timothy
Hi Peter,
Sorry to leave you in the lurch so long, I've been trying it this
morning and running into some issues (I don't think patch-related
issues but just trouble getting to where I can test it).

So far:
1) Had to add dictionary-fast to the dictionary pom to get it to
compile
2) had to remove all ytex modules from main pom to get it to compile
3) org/apache/ctakes/dictionary/lookup2/util/UmlsUserTester.java looks
to have some old code and doesn't compile (bypassed by changing to "no
error check")

Then I had to remember what classes in 4.0.0 I could try to check it,
and settled on 
org/apache/ctakes/clinicalpipeline/ClinicalPipelineFactory.java

with no credentials, I get a UMLS authentication error (as expected).

with my old credentials, I get these errors:
08 Dec 2020 11:43:21 ERROR UmlsUserApprover - The user property must
now be set to 'umls_api_key' 
08 Dec 2020 11:43:21 ERROR UmlsUserApprover -  Verify that you are
setting command-line option --user, or ctakes property umlsUser, or
environment variable umlsUser properly.
08 Dec 2020 11:43:21 ERROR UmlsDictionaryLookupAnnotator - Error:
Invalid UMLS License.  A UMLS License is required to use the UMLS
dictionary lookup. 

(also seems about right).

If I set ctakes_umlsuser=umls_api_key and ctakes_umlspw=<api key>,
I'm still getting an error:

08 Dec 2020 11:49:05 ERROR UmlsUserApprover -   UMLS Account at 
https://uts-ws.nlm.nih.gov/restful/isValidUMLSUser is not valid.
08 Dec 2020 11:49:05 ERROR UmlsUserApprover -  Verify that you are
setting command-line option --user, or ctakes property umlsUser, or
environment variable umlsUser properly.
08 Dec 2020 11:49:05 ERROR UmlsUserApprover -  Verify that you are
setting command-line option --pass, or ctakes property umlsPass, or
environment variable umlsPass properly.


I'm actually setting this via my VM options as:
-Dctakes_umlspw=

should I be doing something else?

Thanks
Tim
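
The reply elsewhere in this thread suggests -Dctakes.umls_apikey, with -Dctakes_umls_apikey accepted on systems where the dot does not work. A minimal sketch of that dot-to-underscore fallback follows. The helper names and the sample key are hypothetical, and this is not the cTAKES implementation.

```python
def property_name_variants(name):
    """Return the dotted name followed by its underscore form, without duplicates."""
    variants = [name, name.replace(".", "_")]
    return list(dict.fromkeys(variants))

def lookup_with_fallback(props, name):
    """Return the first non-empty value found under any variant of `name`."""
    for variant in property_name_variants(name):
        value = props.get(variant)
        if value:
            return value
    return None

# Hypothetical properties, as if parsed from -D options on the command line.
props = {"ctakes_umls_apikey": "my-key"}
print(lookup_with_fallback(props, "ctakes.umls_apikey"))
```

Trying the dotted form first keeps the documented name as the preferred one, with the underscore form only as a fallback.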


On Tue, 2020-12-08 at 12:21 +0100, Peter Abramowitsch wrote:
> Attn Tim Miller
> =
> Hi Tim,
> 
> Were you able to test out the 4.0.0 umls authentication patch? It
> would
> be good to know if it and its instructions can be dropped in without
> much
> further work.
> 
> Peter
> 
> On Tue, Dec 1, 2020 at 3:34 PM Miller, Timothy <
> timothy.mil...@childrens.harvard.edu> wrote:
> 
> > Peter, I saw the readme attachment, but it sounded from your email
> > like
> > there was a patch attachment too that I didn't see. Did that not
> > come
> > through?
> > Tim
> > 
> > On Fri, 2020-11-27 at 18:19 +, Finan, Sean wrote:
> > > * External Email - Caution *
> > > 
> > > 
> > > Thanks Peter,
> > > 
> > > 
> > > Happy Thanksgiving all
> > > 
> > > 
> > > 
> > > From: Peter Abramowitsch 
> > > Sent: Friday, November 27, 2020 11:47 AM
> > > To: dev@ctakes.apache.org
> > > Subject: 4.0.0 UMLS Authentication Patch - for Developers - Not a
> > > release [EXTERNAL]
> > > 
> > > * External Email - Caution *
> > > 
> > > 
> > > Hi Sean
> > > 
> > > Given that you're still deciding about the tagging or branching
> > > for
> > > the 4.0.0 back-patch, I won't check the changes in, but they are
> > > attached here. They need to be unloaded at the top of the
> > > source
> > > tree.
> > > 
> > > Gandhi:  I've attached a slightly modified version of the
> > > instructions for your Wiki updates.
> > > If anyone wants the two unofficial 4.0.0 jars for testing, I
> > > would be
> > > happy to put them in dropbox
> > > 
> > > Regards & Happy Thanksgiving
> > > Peter
> > > 


Re: 4.0.0 UMLS Authentication Patch - for Developers - Not a release [EXTERNAL] [SUSPICIOUS]

2020-12-01 Thread Miller, Timothy
Peter, I saw the readme attachment, but it sounded from your email like
there was a patch attachment too that I didn't see. Did that not come
through?
Tim

On Fri, 2020-11-27 at 18:19 +, Finan, Sean wrote:
> Thanks Peter,
> 
> 
> Happy Thanksgiving all
> 
> 
> 
> From: Peter Abramowitsch 
> Sent: Friday, November 27, 2020 11:47 AM
> To: dev@ctakes.apache.org
> Subject: 4.0.0 UMLS Authentication Patch - for Developers - Not a
> release [EXTERNAL]
> 
> * External Email - Caution *
> 
> 
> Hi Sean
> 
> Given that you're still deciding about the tagging or branching for
> the 4.0.0 back-patch, I won't check the changes in, but they are
> attached here. They need to be unloaded at the top of the source
> tree.
> 
> Gandhi:  I've attached a slightly modified version of the
> instructions for your Wiki updates.
> If anyone wants the two unofficial 4.0.0 jars for testing, I would be
> happy to put them in dropbox
> 
> Regards & Happy Thanksgiving
> Peter
> 


Re: Changes to UTS Authentication for Authorized Content Distributors [EXTERNAL] [SUSPICIOUS] [SUSPICIOUS] [SUSPICIOUS] [SUSPICIOUS]

2020-11-25 Thread Miller, Timothy
That link doesn't say anything about incremental update releases,
but even with the normal process I think we can get 4.0.1 out faster
than usual because it is such a small change and we are unlikely to
need multiple RCs to get one that works for everyone.
Does anyone want to volunteer to be release manager? It needs to be
someone on the PMC, so Sean, myself, Gandhi, or Chen probably.

Tim


On Tue, 2020-11-24 at 18:10 +, Finan, Sean wrote:
> > > I haven't looked into whether or not Apache svn servers have a
> > > locking mechanism ...
> > I think it's worth checking -- if we're allowed to just branch off
> > the
> > 4.0.0 tag we can get a 4.0.1 distribution that just has this one
> > change, and we could have it built and uploaded quickly so we're
> > ready
> > for the UMLS change. How would we find out?
> 
> A 4.0.1 made directly from 4.0 with only the authentication update is
> probably the way to go.
> I suppose that for people with dependencies, downloads etc. fixed at
> 4.0 would have to get their new umls key and change their ctakes
> config anyway, so telling them to update any coded version numbers
> doesn't involve too much extra effort.
> 
> The main apache org how to release documentation is at
> https://infra.apache.org/release-publishing.html
>  
> I am not sure of anything specifically regarding patches.
> I don't know if we need to go through the full process for a point
> release ...
> 
> 
> 
> From: Miller, Timothy 
> Sent: Tuesday, November 24, 2020 12:45 PM
> To: dev@ctakes.apache.org
> Subject: Re: Changes to UTS Authentication for Authorized Content
> Distributors [EXTERNAL] [SUSPICIOUS] [SUSPICIOUS] [SUSPICIOUS]
> 
> * External Email - Caution *
> 
> 
> On Tue, 2020-11-24 at 16:29 +, Finan, Sean wrote:
> > * External Email - Caution *
> > 
> > 
> > Hi Tim and all,
> > 
> > Peter kindly checked this into trunk last week.
> > I tested that version and it seemed to work.
> > 
> > Another question might be "how do we get this into the/a release?"
> > 
> > I haven't looked into whether or not Apache svn servers have a
> > locking mechanism on release branches, but if not I think that a
> > patch of 4.0 using the version that you and Greg tested should be a
> > simple checkin.
> 
> I think it's worth checking -- if we're allowed to just branch off
> the
> 4.0.0 tag we can get a 4.0.1 distribution that just has this one
> change, and we could have it built and uploaded quickly so we're
> ready
> for the UMLS change. How would we find out?
> 
> Tim
> 
> > I am sure that everybody is tired of hearing me say this, but I
> > would
> > like to get out a version 5 asap and disclaim that it is required
> > for
> > the new umls authentication.  That would make patching v4 a non-
> > issue.
> > 
> > Regardless of repository inclusion, the documentation (also written
> > by Peter) needs to get to the ctakes wiki  - and probably the main
> > ctakes web site.  On that note, the web site needs to be redone
> > asap.
> > 
> > Anyway, cheers to Peter for taking upon himself this update!
> > We do still have a few things left to do.
> > Volunteers?
> > 
> > Sean
> > 
> > 
> > From: Miller, Timothy 
> > Sent: Tuesday, November 24, 2020 11:07 AM
> > To: dev@ctakes.apache.org
> > Subject: Re: Changes to UTS Authentication for Authorized Content
> > Distributors [EXTERNAL] [SUSPICIOUS]
> > 
> > * External Email - Caution *
> > 
> > 
> > Peter,
> > I was able to try your changes and get this new authentication
> > mechanism to work in the default pipeline. Peter, Sean, et al, what
> > are
> > the next steps for getting this in to trunk? If you're not
> > comfortable
> > checking in directly maybe you can share the patch for review.
> > Tim
> > 
> > On Sun, 2020-11-15 at 20:54 +0100, Peter Abramowitsch wrote:
> > > * External Email - Caution *
> > > 
> > > 
> > > Hi Greg
> > > 
> > > I've got the modifications finished for the new UMLS
> > > authentication
> > > method
> > > 

Re: Changes to UTS Authentication for Authorized Content Distributors [EXTERNAL] [SUSPICIOUS] [SUSPICIOUS] [SUSPICIOUS] [SUSPICIOUS]

2020-11-24 Thread Miller, Timothy
There's no doubt, maybe even 5.0.0 could be justified, but the hope (at
least my hope!) was that if we could get out a 4.0.0.0.0.1 release
with just this change, it would satisfy anyone who just wants to make
sure their setup still works when the NLM switches off the REST server.
Tim
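
The numbering options being discussed (4.0.1 from the 4.0.0 branch, and trunk moving to either 4.1.0-SNAPSHOT or 5.0.0-SNAPSHOT) follow the usual MAJOR.MINOR.PATCH pattern. A small sketch of the arithmetic, assuming plain three-part versions with an optional suffix; the `bump` helper is hypothetical and not part of any cTAKES or Maven tooling.

```python
def bump(version, part):
    """Bump 'major', 'minor', or 'patch' of a MAJOR.MINOR.PATCH version string."""
    base = version.split("-", 1)[0]  # drop a suffix such as -SNAPSHOT
    major, minor, patch = (int(x) for x in base.split("."))
    if part == "major":
        return f"{major + 1}.0.0"
    if part == "minor":
        return f"{major}.{minor + 1}.0"
    if part == "patch":
        return f"{major}.{minor}.{patch + 1}"
    raise ValueError(f"unknown part: {part}")

print(bump("4.0.0", "patch"))                # patch release from the 4.0.0 branch
print(bump("4.0.0", "minor") + "-SNAPSHOT")  # one option for trunk
print(bump("4.0.0", "major") + "-SNAPSHOT")  # the other option for trunk
```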


On Tue, 2020-11-24 at 22:19 +0100, Peter Abramowitsch wrote:
> Sounds reasonable, but just a thought:  Are the changes in trunk
> sufficient to warrant a new major release?   Are there major
> structural or compatibility issues between 4.0 and trunk?  - it
> doesn’t strike me that there are.  How about 4.0.0 going to 4.0.1
> and trunk becoming 4.1.0-SNAPSHOT?   I.e. a new feature release...
> when it comes.
> 
> Peter
> 
> Sent from my iPad
> 
> > On Nov 24, 2020, at 22:10, Finan, Sean <
> > sean.fi...@childrens.harvard.edu> wrote:
> > 
> > Webapp email is killing me ... that email was sent prematurely.
> > 
> > > ctakes-4.0.0-rc3   to   ctakes-4.0.1
> > 
> > I think that is certainly one way to do it.
> > 
> > One could checkout the branch
> > https://svn.apache.org/repos/asf/ctakes/branches/ctakes-4.0.0/
> >  
> > and make the changes to that code.
> > 
> > Would the method be:
> > 1.  Checkout 4.0.0 branch
> > 2.  Apply the patch
> > 3.  Continue with the full release process, checkin and tag 4.0.1 ?
> > 4.  Keep working on trunk for the next release
> > 5.  Change the version numbers in trunk to ctakes-5.0.0-SNAPSHOT
> > ^- this would force all external projects using trunk to update
> > their dependency version.
> > 
> > Let us keep this rolling,
> > Sean
> > 
> > From: Finan, Sean 
> > Sent: Tuesday, November 24, 2020 4:02 PM
> > To: dev@ctakes.apache.org
> > Subject: Re: Changes to UTS Authentication for Authorized Content
> > Distributors [EXTERNAL] [SUSPICIOUS] [SUSPICIOUS] [SUSPICIOUS]
> > [SUSPICIOUS]
> > 
> > * External Email - Caution *
> > 
> > 
> > > ctakes-4.0.0-rc3   to   ctakes-4.0.1
> > 
> > I think that is certainly one way to do it.
> > 
> > One could checkout the branch
> > 
> > Would the method be:
> > 1.  Checkout 4.0.0-rc3
> > 2.  Apply the patch
> > 3.  Continue with the full release process, checkin and tag?
> > 4.  Keep working on trunk for the next release
> > 
> > 
> > 
> > From: Miller, Timothy 
> > Sent: Tuesday, November 24, 2020 2:22 PM
> > To: dev@ctakes.apache.org
> > Subject: Re: Changes to UTS Authentication for Authorized Content
> > Distributors [EXTERNAL] [SUSPICIOUS] [SUSPICIOUS] [SUSPICIOUS]
> > 
> > * External Email - Caution *
> > 
> > 
> > Specifically, is the way to go to branch from the tag at:
> > https://svn.apache.org/repos/asf/ctakes/tags/ctakes-4.0.0-rc3
> > 
> > (the latest release candidate before release I believe)
> > into
> > https://svn.apache.org/repos/asf/ctakes/tags/ctakes-4.0.1
> > 
> > ?
> > 
> > Tim
> > 
> > > On Tue, 2020-11-24 at 20:14 +0100, Peter Abramowitsch wrote:
> > > * External Email - Caution *
> > > 
> > > 
> > > Right, then.
> > > I'll get that done.
> > > 
> > > Peter
> > > 
> > > On Tue, Nov 24, 2020 at 7:53 PM Finan, Sean <
> > > sean.fi...@childrens.harvard.edu> wrote:
> > > 
> > > > I think so.  Whether we can 'release' it or not, branching code
> > > > from the
> > > > 4.0 release is probably a first step.
> > > > 
> > > > From: Peter Abramowitsch 
> > > > Sent: Tuesday, November 24, 2020 1:23 PM
> >

Re: Changes to UTS Authentication for Authorized Content Distributors [EXTERNAL] [SUSPICIOUS] [SUSPICIOUS]

2020-11-24 Thread Miller, Timothy
Specifically, is the way to go to branch from the tag at:
https://svn.apache.org/repos/asf/ctakes/tags/ctakes-4.0.0-rc3

(the latest release candidate before release I believe)
into
https://svn.apache.org/repos/asf/ctakes/tags/ctakes-4.0.1

?

Tim
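
The copy being asked about could be done as a single server-side `svn copy` between two repository URLs. The sketch below only builds the command and does not run it. The target path `branches/ctakes-4.0.1` and the commit message are hypothetical, and whether a branch or a tag is the right target is exactly the open question here.

```python
REPO = "https://svn.apache.org/repos/asf/ctakes"

def svn_copy_command(source, target, message):
    """Build (but do not run) a server-side 'svn copy' from source to target."""
    return ["svn", "copy", f"{REPO}/{source}", f"{REPO}/{target}", "-m", message]

# Hypothetical target path and message, for illustration only.
cmd = svn_copy_command(
    "tags/ctakes-4.0.0-rc3",
    "branches/ctakes-4.0.1",
    "Branch 4.0.0-rc3 for the UMLS authentication patch",
)
print(" ".join(cmd))
```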

On Tue, 2020-11-24 at 20:14 +0100, Peter Abramowitsch wrote:
> Right, then.
> I'll get that done.
> 
> Peter
> 
> On Tue, Nov 24, 2020 at 7:53 PM Finan, Sean <
> sean.fi...@childrens.harvard.edu> wrote:
> 
> > I think so.  Whether we can 'release' it or not, branching code
> > from the
> > 4.0 release is probably a first step.
> > 
> > From: Peter Abramowitsch 
> > Sent: Tuesday, November 24, 2020 1:23 PM
> > To: dev@ctakes.apache.org
> > Subject: Re: Changes to UTS Authentication for Authorized Content
> > Distributors [EXTERNAL] [SUSPICIOUS] [SUSPICIOUS]
> > 
> > * External Email - Caution *
> > 
> > 
> > Sean
> > 
> > In the meantime, I think I have the 4.0.0 source somewhere.  I can
> > take my
> > copy which is early 4.0.1 and make whatever small changes are
> > needed (if
> > any) to get it to build in 4.0.0.   Would that be useful?
> > 
> > Peter
> > 
> > On Tue, Nov 24, 2020 at 6:46 PM Miller, Timothy <
> > timothy.mil...@childrens.harvard.edu> wrote:
> > 
> > > On Tue, 2020-11-24 at 16:29 +, Finan, Sean wrote:
> > > > * External Email - Caution *
> > > > 
> > > > 
> > > > Hi Tim and all,
> > > > 
> > > > Peter kindly checked this into trunk last week.
> > > > I tested that version and it seemed to work.
> > > > 
> > > > Another question might be "how do we get this into the/a
> > > > release?
> > > > 
> > > > I haven't looked into whether or not Apache svn servers have a
> > > > locking mechanism on release branches, but if not I think that
> > > > a
> > > > patch of 4.0 using the version that you and Greg tested should
> > > > be a
> > > > simple checkin.
> > > 
> > > I think it's worth checking -- if we're allowed to just branch
> > > off the
> > > 4.0.0 tag we can get a 4.0.1 distribution that just has this one
> > > change, and we could have it built and uploaded quickly so we're
> > > ready
> > > for the UMLS change. How would we find out?
> > > 
> > > Tim
> > > 
> > > > I am sure that everybody is tired of hearing me say this, but I
> > > > would
> > > > like to get out a version 5 asap and disclaim that it is
> > > > required for
> > > > the new umls authentication.  That would make patching v4 a
> > > > non-
> > > > issue.
> > > > 
> > > > Regardless of repository inclusion, the documentation (also
> > > > written
> > > > by Peter) needs to get to the ctakes wiki  - and probably the
> > > > main
> > > > ctakes web site.  On that note, the web site needs to be redone
> > > > asap.
> > > > 
> > > > Anyway, cheers to Peter for taking upon himself this update!
> > > > We do still have a few things left to do.
> > > > Volunteers?
> > > > 
> > > > Sean
> > > > 
> > > > 
> > > > From: Miller, Timothy 
> > > > Sent: Tuesday, November 24, 2020 11:07 AM
> > > > To: dev@ctakes.apache.org
> > > > Subject: Re: Changes to UTS Authentication for Authorized
> > > > Content
> > > > Distributors [EXTERNAL] [SUSPICIOUS]
> > > > 
> > > > * External Email - Caution *
> > > > 
> > > > 
> > > > Peter,
> > > > I was able to try your changes and get this new authentication
> > > > mechanism to work in the default pipeline. Peter, Sean, et al,
> > > > what
> > > > are
> > > > the next steps for getting this in to trunk? If you're not
> > > > comfortable
> > > > checking in directly maybe you can share the patch for review.
> > > > Tim
> > > > 
> > > > On Sun, 2020-11-15 at 20:54 +0100, Peter Abramowitsch wrote:
> > > > > * External Email - Caution *
> > > > > 
> > > > > 
> > > > > Hi Greg
> > > > > 
> > > > >

Re: Changes to UTS Authentication for Authorized Content Distributors [EXTERNAL] [SUSPICIOUS] [SUSPICIOUS]

2020-11-24 Thread Miller, Timothy
On Tue, 2020-11-24 at 16:29 +, Finan, Sean wrote:
> Hi Tim and all,
> 
> Peter kindly checked this into trunk last week.  
> I tested that version and it seemed to work.
> 
> Another question might be "how do we get this into the/a release?"
> 
> I haven't looked into whether or not Apache svn servers have a
> locking mechanism on release branches, but if not I think that a
> patch of 4.0 using the version that you and Greg tested should be a
> simple checkin.

I think it's worth checking -- if we're allowed to just branch off the
4.0.0 tag we can get a 4.0.1 distribution that just has this one
change, and we could have it built and uploaded quickly so we're ready
for the UMLS change. How would we find out?

Tim

> 
> I am sure that everybody is tired of hearing me say this, but I would
> like to get out a version 5 asap and disclaim that it is required for
> the new umls authentication.  That would make patching v4 a non-
> issue.  
> 
> Regardless of repository inclusion, the documentation (also written
> by Peter) needs to get to the ctakes wiki  - and probably the main
> ctakes web site.  On that note, the web site needs to be redone
> asap.  
> 
> Anyway, cheers to Peter for taking upon himself this update!  
> We do still have a few things left to do.  
> Volunteers?
> 
> Sean
> 
> 
> From: Miller, Timothy 
> Sent: Tuesday, November 24, 2020 11:07 AM
> To: dev@ctakes.apache.org
> Subject: Re: Changes to UTS Authentication for Authorized Content
> Distributors [EXTERNAL] [SUSPICIOUS]
> 
> * External Email - Caution *
> 
> 
> Peter,
> I was able to try your changes and get this new authentication
> mechanism to work in the default pipeline. Peter, Sean, et al, what
> are
> the next steps for getting this in to trunk? If you're not
> comfortable
> checking in directly maybe you can share the patch for review.
> Tim
> 
> On Sun, 2020-11-15 at 20:54 +0100, Peter Abramowitsch wrote:
> > * External Email - Caution *
> > 
> > 
> > Hi Greg
> > 
> > I've got the modifications finished for the new UMLS authentication
> > method
> > using API keys.  If you're game, I'd like you to be next to test
> > it.
> > Contact me at pabramowit...@gmail.com and I'll get you a new
> > > ctakes-dictionary-lookup-fast.4.0.1.x.jar and Readme.
> > 
> > If it's smooth for you, I'll talk with Sean about checking it in
> > and
> > what
> > wiki locations need to be updated.
> > 
> > To get your key you'll need to log into UMLS, If you've not
> > been
> > there
> > recently you'll need to go through their profile upgrade process
> > where user
> > details will be rerouted through one of the  public authentication
> > mechanisms.
> > Once in, go to your profile section and you'll find the API_KEY.
> > 
> > All of you will need to do this eventually.
> > 
> > Regards
> > Peter
> > 
> > Regards, Peter
> > 
> > On Wed, Nov 11, 2020 at 10:13 PM Greg Silverman <
> > g...@umn.edu.invalid>
> > wrote:
> > 
> > > Hi Peter,
> > > Thanks, that would be great. I like the backwards compatible
> > > method. Our
> > > issue is that we have custom configurations for use in Docker and
> > > Kubernetes with UIMA-AS, so this would be ideal.
> > > 
> > > Greg--
> > > 
> > > 
> > > On Wed, Nov 11, 2020 at 3:07 PM Peter Abramowitsch <
> > > pabramowit...@gmail.com>
> > > wrote:
> > > 
> > > > Hi Greg
> > > > It's actually extremely simple for current UMLS licensees.
> > > > The new API uses an API_KEY instead of user/password. Just
> > > > login to
> > > the
> > > > UTS site, go to your profile area and check on your key
> > > > I or someone else will make changes to the cTAKES validator to
> > > > accept
> > > this
> > > > key in lieu of name and password
> > > > 
> > > > For new UMLS users, they will need a couple of extra
> > > > steps.   They will
> > > get
> > > > an identity from one of the authentication providers like
> > > > Login.gov as a
> > > > part of the UTS registration process.   But having completed
> > > > that, they
> > > > will have a profile page with the API_KEY as above
> > > > 
> > > > 
> > > > 
> > > >

Re: Changes to UTS Authentication for Authorized Content Distributors [EXTERNAL]

2020-11-24 Thread Miller, Timothy
Peter,
I was able to try your changes and get this new authentication
mechanism to work in the default pipeline. Peter, Sean, et al, what are
the next steps for getting this into trunk? If you're not comfortable
checking in directly, maybe you can share the patch for review.
Tim

On Sun, 2020-11-15 at 20:54 +0100, Peter Abramowitsch wrote:
> Hi Greg
> 
> I've got the modifications finished for the new UMLS authentication
> method
> using API keys.  If you're game, I'd like you to be next to test it.
> Contact me at pabramowit...@gmail.com and I'll get you a new
> ctakes-dictionary-lookup-fast.4.0.1.x.jar and Readme.
> 
> If it's smooth for you, I'll talk with Sean about checking it in and
> what
> wiki locations need to be updated.
> 
> To get your key you'll need to log into UMLS. If you've not been
> there
> recently you'll need to go through their profile upgrade process
> where user
> details will be rerouted through one of the  public authentication
> mechanisms.
> Once in, go to your profile section and you'll find the API_KEY.
> 
> All of you will need to do this eventually.
> 
> Regards
> Peter
> 
> Regards, Peter
> 
> On Wed, Nov 11, 2020 at 10:13 PM Greg Silverman 
> wrote:
> 
> > Hi Peter,
> > Thanks, that would be great. I like the backwards compatible
> > method. Our
> > issue is that we have custom configurations for use in Docker and
> > Kubernetes with UIMA-AS, so this would be ideal.
> > 
> > Greg--
> > 
> > 
> > On Wed, Nov 11, 2020 at 3:07 PM Peter Abramowitsch <
> > pabramowit...@gmail.com>
> > wrote:
> > 
> > > Hi Greg
> > > It's actually extremely simple for current UMLS licensees.
> > > The new API uses an API_KEY instead of user/password. Just
> > > login to
> > the
> > > UTS site, go to your profile area and check on your key
> > > I or someone else will make changes to the cTAKES validator to
> > > accept
> > this
> > > key in lieu of name and password
> > > 
> > > For new UMLS users, they will need a couple of extra
> > > steps.   They will
> > get
> > > an identity from one of the authentication providers like
> > > Login.gov as a
> > > part of the UTS registration process.   But having completed
> > > that, they
> > > will have a profile page with the API_KEY as above
> > > 
> > > 
> > > 
> > > On Wed, Nov 11, 2020 at 7:27 PM Greg Silverman <
> > > g...@umn.edu.invalid>
> > > wrote:
> > > 
> > > > For example, the user installation guide has not been updated
> > > > to
> > reflect
> > > > the changes NLM is implementing. The impact for our workflow is
> > > > pretty
> > > > significant, so without a clear picture about what we need to
> > > > do in
> > order
> > > > to not have any down time is - to put it mildly -  leaving us
> > > > in the
> > > dark.
> > > > Greg--
> > > > 
> > > > On Tue, Nov 10, 2020 at 9:18 AM Greg Silverman 
> > > > wrote:
> > > > 
> > > > > It's still unclear what this means for me as a user of a
> > > > > piece of
> > > > software
> > > > > that uses UTS for authentication purposes. Could someone
> > > > > please, in
> > > plain
> > > > > language, describe what we as normal users who use software
> > > > > reliant
> > on
> > > > this
> > > > > authentication mechanism will have to do in order to not
> > > > > disrupt any
> > > > > running workflows?
> > > > > 
> > > > > Thanks!
> > > > > 
> > > > > Greg--
> > > > > 
> > > > > 
> > > > > On Mon, Nov 9, 2020 at 7:13 AM McLaughlin, Patrick (NIH/NLM)
> > > > > [E]
> > > > >  wrote:
> > > > > 
> > > > > > Hello,
> > > > > > 
> > > > > > 
> > > > > > 
> > > > > > The UMLS Terminology Services (UTS) is moving from a
> > username/password
> > > > > > login to an NIH-federal identity provider system on Monday,
> > > > > > November
> > > 9.
> > > > > > UMLS users will begin migrating their accounts to the new
> > > > > > system on
> > > this
> > > > > > date with a migration deadline of January 15, 2021.
> > > > > > 
> > > > > > 
> > > > > > 
> > > > > > You will need to update any systems that use the UMLS user
> > validation
> > > > API
> > > > > > <
> > > > > > > https://uts.nlm.nih.gov/help/license/validateumlsuserhelp.html
> > > > > > >,
> > as
> > > > > > described in my previous emails. We recommend you implement
> > > > > > the new
> > > > > > workflow as soon as possible after November 9.
> > > > > > 
> > > > > > 
> > > > > > 
> > > > > > Attached are instructions for implementing UMLS user
> > > > > > validation with
> > > the
> > > > > > new system. You MUST supply NLM with the domains (e.g.,
> > > > > > https://urldefense.proofpoint.com/v2/url?u=https-3A__www.example.com&d=DwIFaQ&c=qS4goWBT7poplM69zy_3xhKwEW14JZMSdioCoppxeFU&r=Heup-IbsIg9Q1TPOylpP9FE4GTK-OqdTDRRNQXipowRLRjx0ibQrHEo8uYx6674h&m=mRLdzmP

Re: cTAKES data flow [EXTERNAL]

2020-10-13 Thread Miller, Timothy
With the default pipelines, the only information that leaves your
computer is your UMLS credentials, which are used to verify that you
are a registered/current UMLS user.
Tim


On Tue, 2020-10-13 at 15:37 +0530, moinuddeen smrk wrote:
> Hi Team,
> i am one of many users of cTAKES. i work with clinical trial
> sensitive
> data. I wanted to know about the data flow that cTAKES has. Following
> are
> my questions:
> 
> 1. Does cTAKES send any information (the text in the files) outside
> my
> workspace/computer ?
> 2. Does cTAKES store any information parsed to it outside of my
> computer?
> 
> Please do let me know the answers for this as soon as you can.
> 
> Thanks!
> 
> Regards,
> Riyaz


Re: Building a new custom dictionary or Updating/Adding values to the existing dictionary in cTAKES [EXTERNAL]

2020-09-15 Thread Miller, Timothy
Peter,
The parts of speech come from the ctakes-pos-tagger module, which uses
the OpenNLP pos tagger trained on clinical data. There is a
constituency parser as well, which I think in theory can tag even
better (that might be able to get you a unary branch in a tree from NN
-> CD -> .), but is a lot slower than the pos tagger and we
probably don't want to make it necessary to run for simple dictionary
pipelines. 
Tim

On Tue, 2020-09-15 at 12:12 -0700, Peter Abramowitsch wrote:
> Sean, this conversation raises a question that I've had for a while.
> Does the term-finding mechanism actually use a treebank to find the POS,
> or does it use another, less rigorous approach? If it were rigorous,
> wouldn't it be able to tag a pure number as an NN in the role of object
> if it played the corresponding role in the sentence?
> 
> I've not had the same problem as Ayyub, but I have been wondering why
> one needed to disable the identification of cm as a genetic acronym,
> given situations where cm is clearly part of a unit of measure and would
> show up as an entity's modifier in a treebank.
> 
> Does the question make sense?
> 
> Peter
> 
> On Tue, Sep 15, 2020, 9:02 AM Finan, Sean <
> sean.fi...@childrens.harvard.edu>
> wrote:
> 
> > I should mention that going the Paragraph route would only impact
> > term
> > lookup.
> > 
> > From: abad.ay...@cognizant.com 
> > Sent: Tuesday, September 15, 2020 11:54 AM
> > To: dev@ctakes.apache.org
> > Subject: RE: Building a new custom dictionary or Updating/Adding
> > values to
> > the existing dictionary in cTAKES [EXTERNAL]
> > 
> > Thank you Sean for the response. We shall definitely try that way. I
> > have one question on the "f84.1" problem: since we have now developed a
> > lot of features based on the output from cTAKES, is the impact of
> > changing the sentenceDetectorAnnotator going to be huge?
> > 
> > Thanks & Regards
> > 
> > Abad Ayyub
> > Vnet: 406170 | Cell : +91-9447379028
> > 
> > 
> > 
> > -Original Message-
> > From: Finan, Sean 
> > Sent: Tuesday, September 15, 2020 9:06 PM
> > To: dev@ctakes.apache.org
> > Subject: Re: Building a new custom dictionary or Updating/Adding
> > values to
> > the existing dictionary in cTAKES [EXTERNAL]
> > 
> > Hi Abad,
> > 
> > The first thing that I would try for the "97112" problem is changing
> > the parts of speech that are ignored for lookup.  Right now a pure
> > number is ignored - it is not a word.  So, similar to what I said in my
> > previous email, change the dictionary lookup parameter exclusionTags.
> > But to make sure that you get everything, you can first try no
> > exclusions:
> > set exclusionTags=""
> > 
> > My guess with the F84.1 problem is that your sentence splitter is
> > splitting "F84.1" but not splitting "F84 . 1".
> > 
> > I think that the best way to start debugging is adding the
> > PrettyTextWriter to the end of the piper and looking at its output (see
> > my previous email).  It will print each sentence on a line and indicate
> > the part of speech for each token.  If you can quickly and easily see
> > what the system is doing then you might start to understand what needs
> > to be changed to fit your data.
> > 
> > Sean
> > 
> > From: abad.ay...@cognizant.com 
> > Sent: Tuesday, September 15, 2020 11:15 AM
> > To: dev@ctakes.apache.org
> > Subject: RE: Building a new custom dictionary or Updating/Adding
> > values to
> > the existing dictionary in cTAKES [EXTERNAL]
> > 
> > Thank you Sean for the detailed response.  I think there was a
> > miscommunication from our end about the requirement. Your solution of
> > adding spaces between the entries worked, but it required the input text
> > to have the spaces as well. If the text comes in as 'F84.1', cTAKES does
> > not recognize the token, but if the text comes in as 'F84 . 1', then
> > cTAKES recognizes the tokens for the INSERT script below.
> > 
> > INSERT INTO CUI_TERMS VALUES(4352,0,3, 'F84 . 1','F84')
> > 
> > But we encountered a similar issue when we configured an INSERT entry
> > as below for CPT codes:
> > 
> > INSERT INTO CUI_TERMS VALUES(41154,0,1, '97112','97112')
> > 
> > where 97112 is a CPT code (which usually doesn't have decimals or '.').
> > We expected cTAKES to recognize the CPT code '97112' as a separate
> > token, but it didn't. Could you please advise us on why this issue came
> > up?
> > 
> > Is there something wrong in the configuration? Do we need to have
> > something additional for cTAKES to recognize the code alone as a
> > separate token? Is there any other way in which we can try to get the
> > respective ICD/CPT code of the identified annotation from cTAKES, like
> > querying the CPT/ICD table

Re: I think I found a bug. [EXTERNAL]

2020-08-31 Thread Miller, Timothy
Peter,
I think the email server doesn't let images through. Can you post an
imgur link maybe?
Tim

On Sun, 2020-08-30 at 14:35 -0700, Peter Abramowitsch wrote:
> Hi,
> I was getting a StringIndexOutOfBoundsException in
> DependencyUtil.doesSubsume(annot1, annot2)  with exactly this
> situation:
> 
> negex annotator
> the text begins  "negative for "
> 
> If the chunk negative for xyz is preceded by anything else, even a
> space, the problem goes away.  It also goes away when you choose
> another style of negation.   "no headache", for instance
> 
> I've traced the problem back to some illegal entries in the JCas. You
> can see from the image below that the ContextAnnotation's begin
> offset is illegal.
> 
> Clearly there's an off-by-one error and this triggered the exception
> because in my example, the Annotation is created right from the 0th
> char of my note text.  But it occurred to me that in every other
> case, where the annotation doesn't begin on the first character and
> it doesn't throw an exception, it might cause  downstream methods
> like doesSubsume to give the wrong result because the begin/end
> offsets are wrong.
> 
> I'm not sure how to follow this up.  But if anyone wants to tackle
> it?
> 
> This is from HistoryAttributeClassifier beginning at line 274
> 
> 
> 
> 
> 


Re: Sentence detector changes [EXTERNAL] [SUSPICIOUS]

2020-06-12 Thread Miller, Timothy
Hi Abad,
I've been following the thread but don't have much to add on top of what Sean's 
saying. The BIO version has one major benefit, in that it allows sentences to 
wrap newlines. But it does seem to break on Mr. and Dr. unfortunately. The 
solution is to create more training data but it's hard to get people excited 
about that. The next best solution is along the lines of what Sean suggested, 
to use post-processing to fix mistakes.
Tim


From: Finan, Sean 
Sent: Friday, June 12, 2020 1:20 PM
To: dev@ctakes.apache.org; u...@ctakes.apache.org
Subject: Re: Sentence detector changes [EXTERNAL] [SUSPICIOUS]

Hi Abad,

I can't say anything about Timothy Miller's availability.  He is on the ctakes 
dev mailing list so he may respond if he feels it is necessary.  He is quite 
busy with a lot of groundbreaking work, but I wanted to make sure that he got 
credit for the ..BIO annotator.

The piper file would be just as it was before for the Sentence..BIO with the 
classifier specified.
That would be followed by the lines

add EolSentenceFixer
add MrsDrSentenceJoiner
add AbadsNewDigitJoiner

where AbadsNewDigitJoiner is a custom AE using the logic of MrsDr.. that checks 
for digits before and after the dot (eg "5.5") instead of checking for a person 
title before the dot (eg "Mrs.")

Sean

From: abad.ay...@cognizant.com 
Sent: Friday, June 12, 2020 11:50 AM
To: dev@ctakes.apache.org; u...@ctakes.apache.org
Subject: RE: Sentence detector changes [EXTERNAL]

Thank you for that quick response Sean :). So you mean to say we can add a new 
custom AE using logic similar to MrsDr... and refer to it in the piper file? 
In that case, do we need to mention the classifier jar path again, as 
"classifierJarPath=/org/apache/ctakes/core/sentdetect/model.jar"?

Also, is Timothy Miller available to help us with the issues with 
'SentenceDetectorAnnotatorBIO', where sentences are split on decimals or on 
dates separated with '.'? I hope you guys are safe and doing well during this 
lockdown. Stay safe :)

Thanks & Regards

Abad Ayyub
Vnet: 406170 | Cell : +91-9447379028



-Original Message-
From: Finan, Sean 
Sent: Friday, June 12, 2020 9:06 PM
To: dev@ctakes.apache.org; u...@ctakes.apache.org
Subject: Re: Sentence detector changes [EXTERNAL]

Hi Abad,

The expert on SentenceDetectorAnnotatorBIO is Timothy Miller, so he might be 
able to weigh in on some of this.

I haven't noticed Sentence..BIO splitting sentences on decimals, but as an AI 
trained model you never quite know what might happen.

You could easily make something like the MrsDr.. that handles decimal problems.

Basically, make a copy of MrsDr.. and change lines ~62 from
 if ( (text.endsWith( " Mr." ) || text.endsWith( " Mrs." ) || text.endsWith( " Dr." )
   || text.endsWith( " a.m." ) || text.endsWith( " p.m." )
   || text.equals( "Mr." ) || text.equals( "Mrs." ) || text.equals( "Dr." ))
  && i < sentenceCount - 1
  && !newlines.contains( sentence.getEnd() ) ) {

to something like

 if ( text.length() > 1
  && text.charAt( text.length()-1 ) == '.'
  && Character.isDigit( text.charAt( text.length()-2 ) )
  && i < sentenceCount - 1
  && !sentences.get( i+1 ).getCoveredText().isEmpty()
  && Character.isDigit( sentences.get( i+1 ).getCoveredText().charAt( 0 ) ) ) {

That if (..) could be cleaned up a little, but that should do it.
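
For readers who want to try the digit check in isolation, here is a minimal sketch that works on plain strings rather than cTAKES Sentence annotations. The class and method names (DigitJoinCheck, shouldJoin) are hypothetical and are not part of cTAKES, and the null/length guards are assumptions added for the standalone version.

```java
// Hypothetical helper, not part of cTAKES: decides whether two adjacent
// sentence texts were probably split inside a decimal or a dotted date.
final class DigitJoinCheck {

   private DigitJoinCheck() {}

   // True when 'current' ends with "<digit>." and 'next' starts with a digit,
   // e.g. "Weight was 5." followed by "5 kg today."
   static boolean shouldJoin( final String current, final String next ) {
      if ( current == null || next == null || current.length() < 2 || next.isEmpty() ) {
         return false;
      }
      final int last = current.length() - 1;
      return current.charAt( last ) == '.'
             && Character.isDigit( current.charAt( last - 1 ) )
             && Character.isDigit( next.charAt( 0 ) );
   }

   public static void main( final String[] args ) {
      System.out.println( shouldJoin( "Weight was 5.", "5 kg today." ) );
      System.out.println( shouldJoin( "Seen on 01.", "01.2020 in clinic." ) );
      System.out.println( shouldJoin( "Seen by Dr.", "Smith today." ) );
   }
}
```

The first two calls report a join and the third does not. A real annotator would probably also want to confirm that the two sentence spans are directly adjacent in the document, since a sentence that legitimately ends in a number can be followed by one that starts with a number.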

Sean





From: abad.ay...@cognizant.com 
Sent: Friday, June 12, 2020 11:21 AM
To: dev@ctakes.apache.org; u...@ctakes.apache.org
Subject: RE: Sentence detector changes [EXTERNAL]

Hi Sean,

Thank you for your advice. We tried using the 'SentenceDetectorAnnotatorBIO' 
along with the changes required in the piper files as you mentioned, and we 
found that it splits the sentences based on '.' only. Actually, we were 
able to get similar output with the 'SentenceDetectorAnnotator' itself by 
just using '.' as the only eosCandidate in the EOSScannerImpl class.

So will 'SentenceDetectorAnnotatorBIO' be able to extract sentences in some 
other way? One problem we face is that 'SentenceDetectorAnnotatorBIO' 
splits the sentence whenever it sees a decimal point, as in 5.5, or a date 
separated using '.', as in 01.01.2020.

Can the AEs EolSentenceFixer and MrsDrSentenceJoiner resolve the 
above issues, where sentences are split on encountering decimals or 
'.'-separated dates? If they can, what changes do we need to make in the piper 
file to incorporate them?

Thanks & Regards

Abad Ayyub
Vnet: 406170 | Cell : +91-9447379028


-Original Message-
From: Finan, Sean 
Sent: Thursday, June 11, 2020 9:14 PM
To: dev@ctakes.apache.org; u...@ctakes.apache.org
Subject: Re: Sentence detector chang

Re: Missing Medication Frequency and Allergy attributes from MedicationMention [EXTERNAL]

2020-06-06 Thread Miller, Timothy
Hi Honey,
I created a module last year for doing some medication attribute extraction, 
but it is not part of core ctakes yet so you would have to integrate it 
yourself. It uses the typesystem and most of the regular ctakes pipeline so it 
shouldn't be that difficult.
Check it out here:
https://github.com/tmills/ctakes-ade

If you want to give it a try and have questions I'll be happy to try to help. 
There is also a ctakes-drugner module that probably does similar things but I 
don't have experience with it myself.

Tim


From: Honey gandhi 
Sent: Saturday, June 6, 2020 2:53 AM
To: dev@ctakes.apache.org
Subject: Re: Missing Medication Frequency and Allergy attributes from 
MedicationMention [EXTERNAL]

Is there any other way to find relationship between medication and its 
dose/route/frequency or between anatomical site and its sign symptoms?

Thanks,
Honey G.

> On 06-Jun-2020, at 12:09 PM, Peter Abramowitsch  
> wrote:
>
> Some granular areas are unfinished in cTakes and in these cases, attributes
> mentioned are just placeholders for functionality that needs to be filled
> in.   I can't speak specifically to Medication Freq/Dose/Route, but much
> work is left to be done and contributed throughout the system.  Bodysite is
> another one of these.  Or conditionality and confidence.  In some cases you
> will never find them populated, or in others you'll find that values can
> only be detected in a small number of contexts.
>
> Unless an army of highly qualified developers and informaticists with free
> time materializes to take it much further, cTakes will always be a work in
> progress.  But many of us have already found it to be highly effective in
> its current form, and some have made private customizations to suit our own
> needs.
>
> Peter
>
> On Fri, Jun 5, 2020 at 2:58 AM Honey gandhi
>  wrote:
>
>> Hi
>>
>> We are exploring ctakes capabilities to use it as our NLP engine to parse
>> clinical data.
>>
>> Though we are able to parse the data at a high level, we are not able to get
>> values for medication frequency, duration, allergy and other related
>> specifications.
>> Ideally, it should have populated values for 'MedicationFrequency',
>> 'MedicationAllergy' and other related fields in 'MedicationMention'.
>>
>> I have also tried including the RelationSubPipe.piper file from
>> ctakes-relation-extractor in my Full.piper file in the ctakes-web-rest module.
>> But I don't see any difference, as I am still not able to figure
>> out the relation between a medication entity and its frequency, dosage etc.
>>
>> We are relatively new to this. Please advise on how to proceed further.
>>
>>
>> Thanks,
>> Honey G.


Re: how to activate inactive features in cTAKES? [EXTERNAL]

2020-04-30 Thread Miller, Timothy
Akram, the typesystem in ctakes was created by a project with the aim of 
specifying things that are useful, without specifying implementations for them 
all. There are many items in the data model that there are no ctakes modules to 
fill. The idea was that when people bring things online there are placeholders 
for that information, so that new functionality is not added in a completely ad 
hoc way. So of the examples you describe:

- discoveryTechnique is always the same because you are running the same 
pipeline
- confidence is not filled in by the dictionary lookup -- the current method 
used does not generate a confidence score
- disambiguated is not filled but is technically correct because there is no 
disambiguation algorithm running
- polarity, uncertainty, conditional, generic, historyOf can be filled in by 
certain pipelines. You will have to add them after the DictionarySubPipe to see 
them filled in.

Tim


From: Akram 
Sent: Thursday, April 30, 2020 4:37 AM
To: dev@ctakes.apache.org
Subject: how to activate inactive features in cTAKES? [EXTERNAL]

Hi
I can extract many tags when I use the default .piper in cTakes
Tags such as LabMention, AnatomicalSiteMention, ProcedureMention, etc they all 
extracted from applying this piper

load DefaultTokenizerPipeline

load DictionarySubPipe

writeHtml
writeXmis

The problem is that some features do not change no matter how the text 
changes.
Most importantly, confidence is always 0.
How can I get the confidence of each term?
Other features are also constant:

discoveryTechnique always 1

polarity always 0

uncertainty always 0

conditional always false

generic always false

historyOf always 0

score always 0

disambiguated always false

How can I get these features working, and where can I find more info about 
these features and what they mean?
Thanks



Re: Relating MeasurementAnnotations to other IdentifiedAnnotations [EXTERNAL]

2019-08-20 Thread Miller, Timothy
Jeff, I don't think such a thing exists yet. You are right that the 
RelationExtractor would probably be the best place to put it.

I don't know whether there is anything in the type system intended for this -- 
I took a quick look and maybe AttributeRelation is the closest thing I could 
find.

To answer your other question, I think the type system and the structure of 
the project were both designed by people who study language, NLP, and information 
extraction, so they may be related in that that group of people probably has a 
lot in common. But since they were in fact many different people there are a 
lot of differences. My sense is that the type system was designed as more of a 
"top-down" exercise, while the modules are added more bottom-up as people get 
the motivation and resources to write them. Hope that is somewhat useful!

Tim

-Original Message-
From: Jeffrey Miller 
To: dev@ctakes.apache.org
Subject: Relating MeasurementAnnotations to other IdentifiedAnnotations 
[EXTERNAL]
Date: Tue, 20 Aug 2019 16:14:47 -0400


Hi,

Is there any configuration or component in cTAKES that can be used to
attribute a measurement annotation to another annotation that it applies
to? For example, for "2 mm incision" where we relate "2 mm" to "incision"?
It looks like there might be a roundabout way to find the head of the span
of the MeasurementAnnotation in the output of the dependency parser, but I
was wondering if this has been explored before? Perhaps the
RelationExtractor component?

I also have another more general question if anyone can help: how does the
structure of the cTAKES type system affect how cTAKES works? I am looking
for a general intuition of how the structure of the type system drives the
larger cTAKES architecture.

Thanks!
Jeff



Re: ML NER for cTakes [EXTERNAL]

2019-08-20 Thread Miller, Timothy
Yes, this is still true. I know there are different folks working on ML-based 
NER but none of it is in main line cTAKES yet. There is some ML in the 
pre-processing stages, and the outputs of that are used by the dictionary tool, 
but the lookup itself is done without learning.
Tim

-Original Message-
From: Maral Amir 
To: dev@ctakes.apache.org
Subject: ML NER for cTakes [EXTERNAL]
Date: Tue, 13 Aug 2019 11:52:40 -0700


Hi,

According to cTakes paper, "the ML NER module is not part of the current
cTAKES release". I was wondering if this is still true and the current
release still uses lookup for NER or we have ML NER for the current version.

Thanks,
Maral



Re: Clinical Processor [EXTERNAL]

2019-08-20 Thread Miller, Timothy
Can you send an error message that is as complete as possible? It is hard to 
tell from the information you've given.
Thanks
Tim


-Original Message-
From: Sébastien Boussard 
To: dev@ctakes.apache.org
Subject: Clinical Processor [EXTERNAL]
Date: Thu, 15 Aug 2019 10:28:51 -0700


I'm working on making a clinical processor, and I've been having a lot of
trouble with the JCasTermAnnotator. It's telling me that it's failing to
initialize. It is connecting to umls and validating. I've had this problem
for a while, is there any other java class I could use. I have the
dictionary and I tried to make a custom dictionary.

Thanks,
Sebastien Boussard



Re: unicode issues [EXTERNAL]

2019-07-18 Thread Miller, Timothy
Thanks Remy, that makes sense, but I'm wondering why I get the correct offsets 
in one way of accessing ctakes (the CVD) but the wrong offsets through another 
way (the REST interface)?

I guess for the fake notes I'm fully in favor of saving as plain text/ascii 
files to simplify things. But there are more unicode characters than we can 
write smart rules for and I'd like to make sure unicode strings at least don't 
screw up offsets, even if we don't process them meaningfully. I'm sure we all 
look forward to generation Z doctor's notes that use the thumbs up/down emojis 
for patient prognosis :).

Tim



-Original Message-
From: Remy Sanouillet 
To: dev@ctakes.apache.org
Subject: Re: unicode issues [EXTERNAL]
Date: Thu, 18 Jul 2019 13:37:33 -0700

Hi Tim,

What is happening is that your o'clock contains a smart quote (Unicode U+2019) 
which is encoded in UTF-8 as three bytes: 0xE28099, so you have to take those 
two extra bytes into account when counting offsets. For that particular 
character, it is much easier to just preprocess the text and replace all 
occurrences with the simple apostrophe (ASCII 0x27). The one on your keyboard. 
It won't change any interpretation and it makes life simpler for everyone 
downstream. You will probably want to deal with all extended Unicode 
characters, like emojis, too; otherwise you will encounter the same offset issues.
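
To make the offset arithmetic concrete, here is a small sketch comparing Java character offsets with UTF-8 byte offsets for a string containing U+2019. It only illustrates how a shift of 2 can arise if some layer counts bytes; whether that is what actually happens in the REST path is not confirmed in this thread. The class and method names (OffsetDemo and its helpers) are made up for the illustration.

```java
import java.nio.charset.StandardCharsets;

// Illustrative only: compares character offsets with UTF-8 byte offsets
// for a string containing U+2019 (right single quotation mark).
final class OffsetDemo {

   private OffsetDemo() {}

   // Offset of 'target' counted in Java chars (UTF-16 code units), or -1.
   static int charOffset( final String text, final String target ) {
      return text.indexOf( target );
   }

   // Offset of 'target' counted in UTF-8 bytes, or -1 if absent.
   static int utf8ByteOffset( final String text, final String target ) {
      final int index = text.indexOf( target );
      if ( index < 0 ) {
         return -1;
      }
      return text.substring( 0, index ).getBytes( StandardCharsets.UTF_8 ).length;
   }

   // Replaces the typographic apostrophe U+2019 with the ASCII apostrophe U+0027.
   static String normalizeApostrophes( final String text ) {
      return text.replace( '\u2019', '\'' );
   }

   public static void main( final String[] args ) {
      final String text = "1 o\u2019clock position, 2% xylocaine";
      System.out.println( charOffset( text, "xylocaine" ) );
      System.out.println( utf8ByteOffset( text, "xylocaine" ) );
      System.out.println( utf8ByteOffset( normalizeApostrophes( text ), "xylocaine" ) );
   }
}
```

The byte offset is 2 larger than the character offset because U+2019 takes three bytes in UTF-8 but one Java char. After normalizing the apostrophe the two counts agree again.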

Rémy Sanouillet
NLP Engineer
re...@foreseemed.com


ForeSee Medical, Inc.
12555 High Bluff Drive, Suite 100
San Diego, CA 92130



On Thu, Jul 18, 2019 at 1:20 PM Miller, Timothy wrote:
I'm having a weird issue with unicode characters in one of the sample notes 
distributed with ctakes. The sentence is:

The right breast and axilla were sterilely prepped and draped in the usual 
standard fashion.  First the right 1 o’clock position 5 cm from the nipple was 
targeted.  Local anesthesia was obtained with 2% xylocaine.  A small skin 
incision was made.  Under ultrasound guidance from a medial approach, 2 passes 
with a 14 gauge biopsy device were performed and sent to pathology.  A clip was 
placed.

The unicode characters are the right single quotes in "o'clock". If I just put 
it in the CVD everything works fine, e.g. I find the drug "xylocaine" at 
location 203-212 and it's highlighted correctly. However, if I use the REST 
interface and send it using the python requests API, I get back the span 
205:214. If we then grab that span we get the wrong string (offset by 2, so 
something like "locaine. ").

Any thoughts on where things might be going wrong for the REST interface? Does 
anyone more knowledgeable than me know how UIMA and cTAKES (and java for that 
matter) normally handle unicode?

Tim




unicode issues

2019-07-18 Thread Miller, Timothy
I'm having a weird issue with unicode characters in one of the sample notes 
distributed with ctakes. The sentence is:

The right breast and axilla were sterilely prepped and draped in the usual 
standard fashion.  First the right 1 o’clock position 5 cm from the nipple was 
targeted.  Local anesthesia was obtained with 2% xylocaine.  A small skin 
incision was made.  Under ultrasound guidance from a medial approach, 2 passes 
with a 14 gauge biopsy device were performed and sent to pathology.  A clip was 
placed.

The unicode characters are the right single quotes in "o'clock". If I just put 
it in the CVD everything works fine, e.g. I find the drug "xylocaine" at 
location 203-212 and it's highlighted correctly. However, if I use the REST 
interface and send it using the python requests API, I get back the span 
205:214. If we then grab that span we get the wrong string (offset by 2, so 
something like "locaine. ").

Any thoughts on where things might be going wrong for the REST interface? Does 
anyone more knowledgeable than me know how UIMA and cTAKES (and java for that 
matter) normally handle unicode?

Tim



Re: Accessing the External Resource from the UimaContext without Using XML descriptor [EXTERNAL] [SUSPICIOUS]

2019-06-30 Thread Miller, Timothy
Just wanted to make a general comment about this. I've worked on the spelling 
correction problem a tiny bit and it has all of the difficulties you all 
describe, and I think it is also slow in a kind of unavoidable way because it's 
doing quite a bit of extra work on each word.

I still would like a better solution, but I find myself wondering if there's 
good evidence for spelling correction having a real impact on a problem. I 
would like to see a paper saying, "we corrected all the spelling in this subset 
of Mimic, and it had the following effect on performance:"

phenotyping: X -> Y
NER: X -> Y
adverse event detection: X -> Y

This is a serious amount of work to carry out these experiments, and 
potentially for a result that could be negative and difficult to publish. Even 
if I just do it as a thought experiment I have a hard time convincing myself 
that I'll see large gains in these categories.

Tim


From: Finan, Sean 
Sent: Saturday, June 29, 2019 7:00 PM
To: dev@ctakes.apache.org
Subject: Re: Accessing the External Resource from the UimaContext without Using 
XML descriptor [EXTERNAL] [SUSPICIOUS]

I implemented a quick and dirty soundex a few years ago.  Terrible precision.  
I tried using it as a "catch" for terms that were not netted by the regular 
lookup.   Then I found myself running down that rabbit hole trying to identify 
topics like you (Pete) mention ... which just means that I had turned an 
attempt at solving one nlp problem to attempting to solving two.   I crawled 
out and haven't looked back.

Sean

From: Peter Abramowitsch 
Sent: Saturday, June 29, 2019 12:02 PM
To: dev@ctakes.apache.org
Subject: Re: Accessing the External Resource from the UimaContext without Using 
XML descriptor [EXTERNAL]

I've been wondering whether Levenshtein Distance or Soundex have any
potential in the cTakes pipeline. For example, if, after failing the
dictionary lookup, one used something like CSpell to find a potential
concept, but then used one of these linguistic similarity methods to
quantify the difference between it and the source over the text range and
turn that into a confidence value, would it help mitigate overfitting?  I
guess the answer would be how often radically different concepts can differ
by a single character.  Another factor as was hinted at above is that
spelling issues in consumer provided text are completely different in
character from that of the rushed clinician, and these may require
completely different solutions.
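
As a rough illustration of the idea of turning an edit distance into a confidence value, here is a minimal sketch of plain Levenshtein distance with a length-normalized similarity. The class name (EditSimilarity) and the normalization by the longer string's length are assumptions for the example; nothing like this is part of cTAKES.

```java
// Hypothetical sketch, not part of cTAKES: plain Levenshtein distance and a
// length-normalized similarity that could serve as a rough confidence value.
final class EditSimilarity {

   private EditSimilarity() {}

   // Classic dynamic-programming Levenshtein distance using two rows.
   static int distance( final String a, final String b ) {
      int[] previous = new int[ b.length() + 1 ];
      int[] current = new int[ b.length() + 1 ];
      for ( int j = 0; j <= b.length(); j++ ) {
         previous[ j ] = j;
      }
      for ( int i = 1; i <= a.length(); i++ ) {
         current[ 0 ] = i;
         for ( int j = 1; j <= b.length(); j++ ) {
            final int cost = a.charAt( i - 1 ) == b.charAt( j - 1 ) ? 0 : 1;
            current[ j ] = Math.min( Math.min( current[ j - 1 ] + 1, previous[ j ] + 1 ),
                                     previous[ j - 1 ] + cost );
         }
         // Reuse the two rows instead of allocating a full matrix.
         final int[] swap = previous;
         previous = current;
         current = swap;
      }
      return previous[ b.length() ];
   }

   // 1.0 for identical strings, decreasing toward 0.0 as they diverge.
   static double similarity( final String a, final String b ) {
      final int longest = Math.max( a.length(), b.length() );
      if ( longest == 0 ) {
         return 1.0;
      }
      return 1.0 - (double) distance( a, b ) / longest;
   }

   public static void main( final String[] args ) {
      System.out.println( distance( "aneurysm", "aneurism" ) );
      System.out.println( similarity( "paquetes", "paqeutes" ) );
   }
}
```

Note that plain Levenshtein counts a transposition of two adjacent letters as two edits, so swapped-letter typos score lower than single substitutions. A variant that treats transpositions as one edit (Damerau-Levenshtein) may suit typing errors better.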

On Fri, Jun 28, 2019 at 6:34 AM Remy Sanouillet 
wrote:

> Hi Siamak,
>
> I agree with Sean. Spelling correction in NLP is a bit of a tar baby. We
> attempted to integrate CSpell (
> https://lsg3.nlm.nih.gov/Specialist/Summary/cSpell.html ) to improve
> recall.
> Unfortunately we had to take it out because the overfitting affected
> precision and increased ambiguity too much.
>
>Remy
>
> On Fri, Jun 28, 2019 at 5:20 AM Finan, Sean <
> sean.fi...@childrens.harvard.edu> wrote:
>
> > Hi Siamak,
> >
> > The problem of misspelled terms is a big one.  I have read about
> > approaches taken by others for research, but nothing has been implemented
> > for ctakes.
> >
> > The only thing that has been done on my projects is addition to the
> > dictionary of common misspellings for a directed project.  For instance,
> > in a project specifically addressing brain aneurysms I added to the
> > (project) dictionary misspellings like "aneurism", "anurism" and
> > "anurysm".  I didn't worry about misspellings for terms that didn't
> > apply to the project; I didn't bother adding things like "skelatal" for
> > "skeletal" because I didn't really care if that term was missed.
> >
> > Sean
> > 
> > From: Siamak Barzegar 
> > Sent: Friday, June 28, 2019 6:12 AM
> > To: dev@ctakes.apache.org
> > Subject: Re: Accessing the External Resource from the UimaContext without
> > Using XML descriptor [EXTERNAL]
> >
> > Dear Sean,
> >
> > Thank you very much for your help.
> > As you suggested, I use "BsvRareWordDictionary" and created a BSV file for
> > my small lexicon.
> > I am using it on Spanish medical documents. As you know, medical
> > documents have a lot of typos.  I was wondering whether there is any
> > dictionary lookup in cTAKES, or another component from other projects, that
> > can detect these small typos.
> > For example, if we have this word in the dictionary file:
> > C001|T01|Fumador 2 paq*ue*tes
> >
> > and in the document we have "fumador 2 paq*eu*tes", is there any way to
> > annotate this misspelled word as well?
> >
> > With Best Wishes,
> > Siamak
> >
> >
> >
> > On Tue, 25 Jun 2019 

RE: Convert type system of a component to cTakes typesysem [EXTERNAL]

2019-06-07 Thread Miller, Timothy
I don't have much experience with Heideltime, but I think this would be a great 
addition to ctakes, so if you know Heideltime a bit and you're willing to put 
in the effort I'm happy to help with your understanding the typesystem. I don't 
know that there's an easy way of 'converting' other than just writing some java 
code in a UIMA analysis engine that converts UIMA types to whatever Heideltime 
reads, makes a call to Heideltime, and then iterates over types output from 
Heideltime and creates the equivalent UIMA types. If you have some more info 
on what sort of conversion you had in mind let me know.
Tim

-Original Message-
From: Siamak Barzegar  
Sent: Thursday, June 6, 2019 5:59 AM
To: dev@ctakes.apache.org
Subject: Convert type system of a component to cTakes typesysem [EXTERNAL]

Dear All,

I want to integrate the HeidelTime project (
https://github.com/HeidelTime/heideltime)
as a component into cTakes, to use it with other components in ctakes (to build 
a pipeline for my task). But the problem is that the two projects (HeidelTime 
and cTakes) have different typesystems.

Is there any way to convert the HeidelTime typesystem to the cTakes one?

PS: It seems Nactem had code for it, but it does not work (
https://github.com/argo-nactem/nactem-type-mapper)


With Best Wishes,
Siamak


Re: ctake web service [EXTERNAL]

2019-03-07 Thread Miller, Timothy
That's a good question that I've also heard from others, and unfortunately I 
don't know the answer. My use cases are typically a single job at a time making 
sequential calls, so I wasn't stressing it with multiple asynchronous calls. I 
would've thought that the Tomcat container would have some ability to manage 
that though!
Tim


From: Kathy Ferro 
Sent: Thursday, March 7, 2019 6:10 PM
To: dev@ctakes.apache.org
Subject: Re: ctake web service [EXTERNAL]

Tim,

Does the docker solution handle multiple instances?  I tested the REST web
service with 2 requests at the same time, and it errors out.  I removed the
part that writes the result xml file to the disk; it still errors out.

Best,
Kathy

On Mon, Mar 4, 2019 at 10:52 AM Miller, Timothy <
timothy.mil...@childrens.harvard.edu> wrote:

> I don't know what the solution was, but I leave my ctakes REST server
> running basically full time and haven't seen time outs yet.
> Tim
>
> 
> From: gandhi rajan 
> Sent: Monday, March 4, 2019 10:43 AM
> To: dev@ctakes.apache.org
> Subject: Re: ctake web service [EXTERNAL]
>
> Hi Kathy, Sean did respond that there is no timeout happening from cTAKES
> end. You might probably have to look at database settings for this closed
> connection issue.
>
> Does someone have any clue on this?
>
> On Monday, March 4, 2019, Kathy Ferro  wrote:
>
> > Gandhi,
> >
> > Did you get any response to this issue?  Does it try to keep the
> > connection open while WS is up? Or does it open and close after it's done?
> >
> > We are still getting this error.
> > "ERROR JdbcRareWordDictionary - No operations allowed after statement
> > closed."
> >
> > Thanks
> > Kathy
> >
> >
> >
> > On Fri, Aug 17, 2018 at 9:43 AM Gandhi Rajan Natarajan <
> > gandhi.natara...@arisglobal.com> wrote:
> >
> > > Hi Kathy,
> > >
> > > Sometime back we encountered this issue and the problem seems to be DB
> > > connections getting timed out.
> > >
> > > Currently we are using the following implementations:
> > > "org.apache.ctakes.dictionary.lookup2.dictionary.JdbcRareWordDictionary"
> > > and "org.apache.ctakes.dictionary.lookup2.concept.JdbcConceptFactory"
> > >
> > > Is anybody aware of any timeout settings that need to be set in these
> > > implementations to avoid the DB connection timeout issue?
> > >
> > > -Original Message-
> > > From: Kathy Ferro 
> > > Sent: Thursday, August 16, 2018 11:07 PM
> > > To: dev@ctakes.apache.org
> > > Subject: ctake web service
> > >
> > > Hi,
> > >
> > > Just want to see if anybody has experienced this issue.
> > >
> > > If the web service has been up for a day or two, it will drop the
> > > dictionary lookup.  The only results it returns are ConllDependencyNode
> > > tags in the xmi file;  no mention, no concept, etc...
> > >
> > > I haven't had a chance to investigate it yet.
> > >
> > > Kathy
> > > This email and any files transmitted with it are confidential and
> > intended
> > > solely for the use of the individual or entity to whom they are
> > addressed.
> > > If you are not the named addressee you should not disseminate,
> distribute
> > > or copy this e-mail. Please notify the sender or system manager by
> email
> > > immediately if you have received this e-mail by mistake and delete this
> > > e-mail from your system. If you are not the intended recipient you are
> > > notified that disclosing, copying, distributing or taking any action in
> > > reliance on the contents of this information is strictly prohibited and
> > > against the law.
> > >
> >
>
>
> --
> Regards,
> Gandhi
>
> "The best way to find urself is to lose urself in the service of others
> !!!"
>


Re: ctake web service [EXTERNAL]

2019-03-06 Thread Miller, Timothy
I basically took the lookup descriptor that is used by the fast aggregate 
pipeline in ctakes-clinical-pipeline/desc/
Tim


-Original Message-
From: gandhi rajan <gandhiraja...@gmail.com>
To: dev@ctakes.apache.org
Subject: Re: ctake web service [EXTERNAL]
Date: Wed, 6 Mar 2019 20:36:05 +0530


I guess this makes a big difference. Tim, are you using the other settings
in custom dictionary xml as is?

On Tuesday, March 5, 2019, Miller, Timothy <
timothy.mil...@childrens.harvard.edu> wrote:



Assuming you meant me, I'm hosting on an ubuntu linux machine, and I'm
using the hsql dictionary instead of the mysql dictionary.
Tim


-Original Message-
From: Kathy Ferro <healthcare1...@gmail.com>
Reply-to: dev@ctakes.apache.org
To: dev@ctakes.apache.org
Subject: Re: ctake web service [EXTERNAL]
Date: Mon, 4 Mar 2019 22:57:22 -0500


Sean,

What machine are you hosting the WS and MySQL on?  I am on a Windows 10
server.  The MySQL ini file looks fine.  I'm wondering whether Windows and
MySQL are just not getting along.

Thanks
Kathy


Re: ctake web service [EXTERNAL] [SUSPICIOUS]

2019-03-05 Thread Miller, Timothy
The custom dictionary lookup descriptor I use is the one I checked into the svn 
repo:
http://svn.apache.org/viewvc/ctakes/trunk/ctakes-web-rest/docker/customDictionary.xml?view=markup

For this to work, you need to have the snomed/rxnorm dictionary somewhere that
the dictionary module looks for dictionaries (like in $CTAKES_HOME/resources/).

You can also check out the Docker build file for a step-by-step of setting up 
and building the war file with that setup.
http://svn.apache.org/viewvc/ctakes/trunk/ctakes-web-rest/docker/Dockerfile?view=markup

Tim






Re: ctake web service [EXTERNAL]

2019-03-05 Thread Miller, Timothy
Assuming you meant me, I'm hosting on an ubuntu linux machine, and I'm using 
the hsql dictionary instead of the mysql dictionary.
Tim






Re: ctake web service [EXTERNAL]

2019-03-04 Thread Miller, Timothy
I don't know what the solution was, but I leave my ctakes REST server running 
basically full time and haven't seen time outs yet.
Tim


From: gandhi rajan 
Sent: Monday, March 4, 2019 10:43 AM
To: dev@ctakes.apache.org
Subject: Re: ctake web service [EXTERNAL]

Hi Kathy, Sean did respond that there is no timeout happening from cTAKES
end. You might probably have to look at database settings for this closed
connection issue.

Does someone have any clue on this?
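
One generic way to guard against a connection that was closed while the service sat idle is to validate it before each lookup and reopen it when the check fails. The sketch below only illustrates that pattern: it uses Python's sqlite3 as a stand-in for MySQL, and the class name, the `terms` table, and its columns are hypothetical. It is not the JdbcRareWordDictionary code.

```python
import sqlite3


class ReconnectingLookup:
    """Dictionary lookup that reopens its connection if it has been closed."""

    def __init__(self, db_path):
        self._db_path = db_path
        self._conn = sqlite3.connect(db_path)

    def _is_alive(self):
        """Run a cheap validation query; any database error means 'not usable'."""
        try:
            self._conn.execute("SELECT 1")
            return True
        except sqlite3.Error:
            return False

    def lookup(self, term):
        """Return the code stored for a term, or None if the term is unknown."""
        if not self._is_alive():
            # Reconnect instead of failing on the stale handle.
            self._conn = sqlite3.connect(self._db_path)
        row = self._conn.execute(
            "SELECT cui FROM terms WHERE word = ?", (term,)
        ).fetchone()
        return row[0] if row else None
```

In Java the comparable check would be `java.sql.Connection.isValid(int)`. If MySQL is the backend, its `wait_timeout` setting (8 hours by default) may be worth checking, since it closes idle connections on the server side; whether that is the cause here is not confirmed in this thread.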

On Monday, March 4, 2019, Kathy Ferro  wrote:

> Gandhi,
>
> Did you get any response to this issue?  Does it try to keep the connection
> open while WS is up? Or does it open and close after it's done?
>
> We are still getting this error.
> "ERROR JdbcRareWordDictionary - No operations allowed after statement
> closed."
>
> Thanks
> Kathy
>
>
>
> On Fri, Aug 17, 2018 at 9:43 AM Gandhi Rajan Natarajan <
> gandhi.natara...@arisglobal.com> wrote:
>
> > Hi Kathy,
> >
> > Sometime back we encountered this issue and the problem seems to be DB
> > connections getting timed out.
> >
> > Currently we are using the following implementations:
> > "org.apache.ctakes.dictionary.lookup2.dictionary.JdbcRareWordDictionary"
> > and "org.apache.ctakes.dictionary.lookup2.concept.JdbcConceptFactory"
> >
> > Is anybody aware of any timeout settings that need to be set in these
> > implementations to avoid the DB connection timeout issue?
> >
> > -Original Message-
> > From: Kathy Ferro 
> > Sent: Thursday, August 16, 2018 11:07 PM
> > To: dev@ctakes.apache.org
> > Subject: ctake web service
> >
> > Hi,
> >
> > Just want to see if anybody has experienced this issue.
> >
> > If the web service has been up for a day or two, it will drop the
> > dictionary lookup.  The only results it returns are ConllDependencyNode
> > tags in the xmi file;  no mention, no concept, etc...
> >
> > I haven't had a chance to investigate it yet.
> >
> > Kathy
>


--
Regards,
Gandhi

"The best way to find urself is to lose urself in the service of others !!!"


Re: Looking for cTakes deployment strategies [EXTERNAL]

2019-01-29 Thread Miller, Timothy
Yousof,
I have seen this with SentenceDetectorAnnotatorBIO.xml annotator, but with the 
one you describe, I thought it had a hard-coded rule to break on newlines and 
split them into sentences. Do you have any log files that you can copy/paste 
the initialization lines so we can verify which sentence segmenter you're 
running?
Tim


From: Joseph Erfani 
Sent: Monday, January 28, 2019 6:19 PM
To: dev@ctakes.apache.org
Subject: Re: Looking for cTakes deployment strategies [EXTERNAL]

Hello everyone,
I have a question regarding the cTakes sentence detector. I am using
the "SentenceDetectorAnnotator.xml"
analysis engine located in the ctakes-core for sentence boundary detection.
It seems that the sentence boundary engine is not able to find the sentence
boundary, when a sentence is finished with a carriage return instead of a
period or several spaces.
e.g. the note
"He is a smoker
He has hypertension"

Here all the text is treated as one sentence, even though there is a carriage
return after the word 'smoker' (at the end of the first sentence).
Have you encountered a similar problem, or do you have any suggestions for
this?
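
To make the expected behaviour concrete, here is a minimal sketch of a splitter that treats line breaks as sentence boundaries in addition to end-of-sentence punctuation. It is only an illustration of the desired output and could serve as a pre- or post-processing step; it is not how either cTAKES sentence detector is implemented, and the function name is made up.

```python
import re

# Split after ., ! or ? followed by whitespace, or on any run of line breaks.
_BOUNDARY = re.compile(r"(?<=[.!?])\s+|[\r\n]+")


def split_sentences(text):
    """Return non-empty sentence strings, treating line breaks as boundaries."""
    return [part.strip() for part in _BOUNDARY.split(text) if part.strip()]
```

With the note from the message above, `split_sentences("He is a smoker\nHe has hypertension")` yields two sentences rather than one.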

Thank you
Yousof

On Wed, Jan 16, 2019 at 10:47 AM Anusha Balasubramaniam <
anus...@foreseemed.com> wrote:

> Hello everyone,
>
> I am looking for a strategy to use cTakes to asynchronously process
> thousands of clinical notes by listening to a queue on AWS and maintaining
> a hot process with all the dictionaries loaded in memory. So far I've had
> some success using the REST server wrapper I found here:
> https://github.com/dirkweissenborn/ctakes-server, but it's still a
> synchronous call, which I found hard to scale.
> Are there any other wrappers out there that could be used to enable cTakes
> to listen to a port for input? Can anyone share some strategies they used
> to implement cTakes on AWS to achieve similar requirements?
>
> Thanks and Regards,
> Anusha
>


Re: ctakes-web-rest changes [EXTERNAL]

2019-01-23 Thread Miller, Timothy
I checked in some code to wrap the REST server in a docker container. The good 
news is, it lets you run a ctakes rest server with a pretty simple build 
command that should be system independent! The bad news is, the image is 16Gb, 
and it has a hard time running on a machine with 8Gb. So this is a work in 
progress, but if anyone wants to try it I'd be happy to hear how it works for 
you. It is in ctakes-web-rest/docker.
Just run:
docker build -t ctakes-web-rest .
from that directory, then run:
start_rest.sh
It will take a while for the server to start up because it needs to unpack the 
.war file and initialize all the UIMA modules. If you run:
docker logs 
you will be able to see how much progress it has made.
Once it's started you can navigate in a web browser to 
localhost:8080/ctakes-web-rest and you should see it. Or from a REST client api 
the url will be localhost:8080/ctakes-web-rest/service/analyze
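
For anyone calling the server from Python, a client might look roughly like the sketch below. The URL is the one given above; everything else is an assumption, in particular that the service accepts the raw note text as a plain-text POST body, so check the module's source before relying on it. The function names are made up for this example.

```python
import urllib.request

# Endpoint from the message above; host and port assume a local container.
ANALYZE_URL = "http://localhost:8080/ctakes-web-rest/service/analyze"


def build_analyze_request(text, url=ANALYZE_URL):
    """Build (but do not send) a POST request carrying the note text.

    Assumption: the service accepts the raw note as a plain-text POST body.
    """
    return urllib.request.Request(
        url,
        data=text.encode("utf-8"),
        headers={"Content-Type": "text/plain; charset=utf-8"},
        method="POST",
    )


def analyze(text, url=ANALYZE_URL, timeout=60):
    """Send the note to the server and return the response body as a string."""
    request = build_analyze_request(text, url)
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return response.read().decode("utf-8")
```

Calling `analyze("He has hypertension")` requires the container to be up and fully initialized; the builder function can be inspected without a running server.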

Thanks
Tim


-Original Message-
From: gandhi rajan <gandhiraja...@gmail.com>
To: dev@ctakes.apache.org
Subject: Re: ctakes-web-rest changes [EXTERNAL]
Date: Sat, 22 Dec 2018 08:40:20 +0530


Thanks Tim. Great work.

On Friday, December 21, 2018, Miller, Timothy <
timothy.mil...@childrens.harvard.edu> wrote:



There is certainly no need to apologize! It's 100x easier for me to change
an existing version that runs than to write it from scratch since I don't
really know REST that well, so thanks for contributing that code. That's
the beauty of open source teams with different expertise!
Tim


From: Gandhi Rajan Natarajan <gandhi.natara...@arisglobal.com>
Sent: Friday, December 21, 2018 3:13 AM
To: dev@ctakes.apache.org
Subject: RE: ctakes-web-rest changes [EXTERNAL]

Hi Tim,

Thanks for taking your time out and checking this. Have left my comments
in the JIRA issue. Sorry that I could not improvise on the REST module
which is more suitable for our business needs due to lack of domain
expertise.

Regards,
Gandhi

-Original Message-
From: Miller, Timothy <timothy.mil...@childrens.harvard.edu>
Sent: Friday, December 21, 2018 1:54 AM
To: dev@ctakes.apache.org
Subject: ctakes-web-rest changes

Hello all,
I've been trying out the ctakes-web-rest module for a project that uses
python where I wanted an easy way to send a sentence and get back some CUI
annotations. There was an issue where the returned json map was keyed by
the string of the concept, so it would only return one discovered concept
if more than one had the same string. In the course of fixing that I
noticed the code was writing the CAS to xmi, then manually parsing that
file, rather than just interrogating the JCas object, so I rewrote that as
well to use uimafit. Finally, I commented out the "full" pipeline -- it is
just too resource heavy to try to run 2 independent pipelines in parallel
on the same machine. I think the state of the module right now is suitable
for people who want to try and would make their own changes if they want
different pipelines (i.e., it's not yet shrink-wrapped) so I would prefer
it in a state with a simple pipeline that runs well.
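
The keying problem described above can be shown with a small sketch. This is not the ctakes-web-rest code: the annotation tuples, the placeholder CUIs, and the function names are invented for illustration. It shows why a map keyed by the concept string keeps only one concept per string, along with one possible alternative that keeps a list per key.

```python
from collections import defaultdict

# Hypothetical annotations: (covered text, CUI). The CUIs are placeholders,
# not real UMLS identifiers.
ANNOTATIONS = [
    ("cold", "C0000001"),
    ("cold", "C0000002"),
    ("fever", "C0000003"),
]


def keyed_by_text(annotations):
    """Map covered text -> single CUI; later entries overwrite earlier ones."""
    result = {}
    for text, cui in annotations:
        result[text] = cui
    return result


def grouped_by_text(annotations):
    """Map covered text -> list of every CUI found for that text."""
    result = defaultdict(list)
    for text, cui in annotations:
        result[text].append(cui)
    return dict(result)
```

In the first function the second "cold" entry silently replaces the first, which matches the symptom described; the second function returns both.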

Please take a look at the following issue with the attached patch and let
me know if there are any obvious problems.
https://issues.apache.org/jira/browse/CTAKES-529

Overall, it's in nice shape and I'm excited to get it into a usable state.
I think this is a use case that would satisfy a lot of users.

Tim









Re: uima-as examples [EXTERNAL]

2019-01-18 Thread Miller, Timothy
Greg - I've developed a cluster-like architecture that uses Docker-wrapped 
UIMA-AS components on AWS for scalability. It's a work in progress but it might 
be helpful:
https://github.com/tmills/ctakes-docker
Tim


-Original Message-
From: Greg Silverman <g...@umn.edu>
To: dev@ctakes.apache.org
Cc: Raymond Finzel <finze...@umn.edu>, Reed McEwan <rmce...@umn.edu>
Subject: Re: uima-as examples [EXTERNAL]
Date: Fri, 18 Jan 2019 12:23:53 -0600


Thanks Peter,
The architecture for our project
(https://github.com/nlpie/nlp-adapt-kube, uima-as branch under current
development) relies heavily on uima-as to work in conjunction with ActiveMQ
and a home spun multiplexer/collection processing client to do all the heavy
lifting for the nlp-engines we're using. Currently, CLAMP and BioMedICUS both
support UIMA-AS out-of-the-box (I'm looking into MetaMap, as I type this).

To the best of my knowledge, the MQ and broker work together (at least in
ActiveMQ).

Given the volume of documents we need to process and the constraint of
being tied to UIMA, UIMA-AS is the easiest option for implementing at
scale, for both speed and fault tolerance.

If anyone has done any work trying to integrate UIMA-AS into cTAKES we
would be very interested in this. Retrofitting a different solution into
our architecture at this time is not feasible.

Thanks very much!

Best!

Greg-



On Thu, Jan 17, 2019 at 10:08 PM Peter Abramowitsch <
pabramowit...@gmail.com> wrote:



I used a completely different approach that allows parallel but not async
processing: multiple [analysis engine+cas] pair objects pre-instantiated
into a threadsafe pool running behind a web service interface. We can
fully saturate a single ctakes server process using multiple client
processes talking to that API, each working synchronously, arriving at an
overall speed of 10-15 6K notes per second on a single server process.
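
The pooling idea can be sketched in a few lines. This is a generic illustration, not the implementation described above: `EnginePool`, `FakeEngine`, and `handle_request` are hypothetical names, and `FakeEngine` merely stands in for an analysis engine + CAS pair.

```python
import queue
from contextlib import contextmanager


class EnginePool:
    """Fixed-size pool of pre-built engine objects shared across threads."""

    def __init__(self, factory, size):
        self._idle = queue.Queue()
        for _ in range(size):
            self._idle.put(factory())  # pay the start-up cost once, up front

    @contextmanager
    def borrow(self, timeout=None):
        """Check an engine out of the pool and return it when the block exits."""
        engine = self._idle.get(timeout=timeout)  # blocks while all are busy
        try:
            yield engine
        finally:
            self._idle.put(engine)  # always hand the engine back


class FakeEngine:
    """Stand-in for an analysis engine + CAS pair (hypothetical)."""

    def process(self, note):
        return note.upper()


def handle_request(pool, note):
    """What a web-service handler might do for one synchronous call."""
    with pool.borrow() as engine:
        return engine.process(note)
```

Each caller blocks until an engine is free, so calls stay synchronous from the client's point of view while up to `size` notes are processed in parallel.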

I haven't used AS but it looks as if that middleware could have too many
moving parts for our needs.  They would generate many wakeups and context
switches adding undesired latency as a request makes its way to the
server.   I'm assuming that in AS, the broker and the MQ are separate
processes and not just in-process subsystems to the ctakes server process.
Is that right?

On Thu, Jan 17, 2019 at 4:09 PM Greg Silverman <g...@umn.edu> wrote:



Anyone out there developed a pipeline using UIMA-AS, as opposed to the
CPE/CPM file reader?

Thanks in advance!

Greg--

--
Greg M. Silverman
Senior Systems Developer
NLP/IE 

Cardiovascular Informatics 

University of Minnesota
g...@umn.edu

 ›  evaluate-it.org  ‹











Re: Looking for cTakes deployment strategies [EXTERNAL]

2019-01-16 Thread Miller, Timothy
Hi Anusha,
I've been working on a project that hasn't merged with ctakes yet, but has a 
github page:
https://github.com/tmills/ctakes-docker

it is a work in progress and so documentation is not great, but I've used it to 
do exactly what you're asking about -- setup a ctakes cluster on AWS to process 
millions of notes.

See the README for a general introduction and then take a look at the script 
bin/launch_cluster.sh

Tim


-Original Message-
From: Anusha Balasubramaniam <anus...@foreseemed.com>
To: dev@ctakes.apache.org
Subject: Looking for cTakes deployment strategies [EXTERNAL]
Date: Wed, 16 Jan 2019 10:40:55 -0800


Hello everyone,

I am looking for a strategy to use cTakes to asynchronously process
thousands of clinical notes by listening to a queue on AWS and maintaining
a hot process with all the dictionaries loaded in memory. So far I've had
some success using the REST server wrapper I found here:
https://github.com/dirkweissenborn/ctakes-server, but it's still a
synchronous call, which I found hard to scale.
Are there any other wrappers out there that could be used to enable cTakes
to listen to a port for input? Can anyone share some strategies they used
to implement cTakes on AWS to achieve similar requirements?

Thanks and Regards,
Anusha



Re: Question about negation [EXTERNAL]

2019-01-16 Thread Miller, Timothy
No, SHARPn was a later project. I'm not sure if there is any overlap in the 
datasets.

There are 2 ways to look at the features, one is to read this paper:
https://journals.plos.org/plosone/article?id=10.1371/journal.pone.0112774

and another is to look at the source:
http://svn.apache.org/viewvc/ctakes/trunk/ctakes-assertion/src/main/java/org/apache/ctakes/assertion/medfacts/cleartk/AssertionCleartkAnalysisEngine.java?view=markup

Tim

-Original Message-
From: ouyeyu panyu <ouy...@gmail.com>
To: u...@ctakes.apache.org
Cc: dev@ctakes.apache.org
Subject: Re: Question about negation [EXTERNAL]
Date: Wed, 16 Jan 2019 08:09:06 -0800

Hi Timothy,

Thank you very much for the quick response.

https://pdfs.semanticscholar.org/8f2c/a8b638d216a3e9ec10cd1c21bdaeaa74a229.pdf
says:
The Mayo-derived linguistically annotated corpus (Mayo) was developed in-house 
and consisted of 273 clinical notes (100 650 tokens; 7299 sentences; 61 
consult; 1 discharge summary; 4 educational visit; 4 general medical 
examination; 48 limited exam; 19 multi-system evaluation; 43 miscellaneous; 1 
preoperative medical evaluation; 3 report; 3 specialty evaluation; 5 dismissal 
summary; 73 subsequent visit; 5 therapy; 3 test-oriented miscellaneous).

Is SHARPn based on the aforementioned 273 clinical notes?
Also is there a way for me to look into the trained SVM model? Say what are 
features there and their weights?

Best,
Yu Pan


On Wed, Jan 16, 2019 at 7:58 AM Miller, Timothy <
timothy.mil...@childrens.harvard.edu> wrote:
It uses an SVM model. The training data is from a project called SHARPn, it is 
notes from Mayo Clinic with a variety of note types and specialties represented.

As for the example, is it a real example that someone wrote "Deny hepatitis"? 
That sounds more like a command than documentation of a negated concept 
("denies" or "denied" would seem more common?). Even if that is a real example, 
I think it's unusual enough that there are probably not examples of "Deny X" in 
the training data.

Tim


-Original Message-
From: ouyeyu panyu
To: u...@ctakes.apache.org, dev@ctakes.apache.org
Subject: Question about negation [EXTERNAL]
Date: Wed, 16 Jan 2019 07:51:20 -0800

Hi ctakes dev team,

I have one question, hope someone can help me with it.
For negation, "Denies hepatitis" returns polarity=-1, but "Deny hepatitis" 
returns polarity=1.
It is said that cTAKES uses ClearTK's PolarityCleartkAnalysisEngine for 
negation, which is machine-learning based.
It seems this issue is caused by the training data. Is this true? What is the 
training data, and which machine learning algorithm is used: logistic 
regression, SVM, random forest, or something else?
Thanks.



Re: Question about negation [EXTERNAL]

2019-01-16 Thread Miller, Timothy
It uses an SVM model. The training data is from a project called SHARPn; it 
consists of notes from Mayo Clinic with a variety of note types and specialties 
represented.

As for the example, is it a real example that someone wrote "Deny hepatitis"? 
That sounds more like a command than documentation of a negated concept 
("denies" or "denied" would seem more common?). Even if that is a real example, 
I think it's unusual enough that there are probably not examples of "Deny X" in 
the training data.

Tim


-Original Message-
From: ouyeyu panyu
To: u...@ctakes.apache.org, dev@ctakes.apache.org
Subject: Question about negation [EXTERNAL]
Date: Wed, 16 Jan 2019 07:51:20 -0800

Hi ctakes dev team,

I have one question, hope someone can help me with it.
For negation, "Denies hepatitis" returns polarity=-1, but "Deny hepatitis" 
returns polarity=1.
It is said that cTAKES uses ClearTK's PolarityCleartkAnalysisEngine for 
negation, which is machine-learning based.
It seems this issue is caused by the training data. Is this true? What is the 
training data, and which machine learning algorithm is used: logistic 
regression, SVM, random forest, or something else?
Thanks.


Re: AggregateCdaUmlsprocessor only annotates last section of CDA document [EXTERNAL] [SUSPICIOUS]

2019-01-11 Thread Miller, Timothy
Looks like someone fixed that as part of a different issue:
https://issues.apache.org/jira/browse/CTAKES-500
Tim


-Original Message-
From: "Finan, Sean"
To: dev@ctakes.apache.org
Subject: Re: AggregateCdaUmlsprocessor only annotates last section of CDA 
document [EXTERNAL] [SUSPICIOUS]
Date: Fri, 11 Jan 2019 16:05:21 +


Hi Sana,

This might be related to

https://issues.apache.org/jira/browse/CTAKES-450

If anybody has time to test and approve the patch attached to that tar please 
let me know so that it can be checked in.

Thanks,
Sean

From: Sana Riaz
Sent: Friday, January 11, 2019 5:33 AM
To: dev@ctakes.apache.org
Subject: AggregateCdaUmlsprocessor only annotates last section of CDA document 
[EXTERNAL]

Hi,
I am trying to process CDA documents with the AggregateCdaUMLSProcessor.xml
descriptor (clinical-pipeline). The CDA document includes sections like
problems, medications, allergies, tests etc. In the plain_view, all these
sections are visible in CVD, but all the annotations extracted by
AggregateCdaUMLSProcessor are on the last section only, i.e. there are no
annotations on the medications or problems.

I've looked into the CdaCasInitializer output, and it only passes one segment
(the last one), so all the other annotators only process that one. In
addition, every section's id (including the last) is assigned null:
[start section id="null"]

[end section id="null"]

Do I have to assign section ids myself? Any suggestion would be very
helpful.

Warm Regards,

Sana Riaz



SemanticCleanupTermConsumer

2018-12-31 Thread Miller, Timothy
Sean (and team),
I was using PrecisionTermConsumer for my ctakes-web-rest implementation hoping 
to avoid any overlaps at all, but when I saw some overlaps I noticed the 
comment:
PrecisionTermConsumer will only persist only the longest overlapping span of 
any semantic group.

So with this term consumer, "colon cancer" goes from 3 spans (colon, cancer, 
colon cancer) to 2 (colon, colon cancer) since cancer and colon cancer have the 
same semantic group. But if I want it to go to 1 (colon cancer), is that what 
SemanticCleanupTermConsumer does?
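
For illustration only, the two filtering behaviours in question can be sketched 
in plain Java. This is not the PrecisionTermConsumer or 
SemanticCleanupTermConsumer source: the Span class, the keepLongest method, the 
group names and the character offsets are all hypothetical, and whether 
SemanticCleanupTermConsumer behaves like the group-independent variant is 
exactly the open question.

```java
import java.util.ArrayList;
import java.util.List;

final class SpanFilterSketch {

    /** Hypothetical stand-in for an annotation: character offsets plus a semantic group name. */
    static final class Span {
        final int begin;
        final int end;
        final String group;

        Span(int begin, int end, String group) {
            this.begin = begin;
            this.end = end;
            this.group = group;
        }

        int length() {
            return end - begin;
        }

        boolean overlaps(Span other) {
            return begin < other.end && other.begin < end;
        }

        @Override
        public String toString() {
            return group + "[" + begin + "," + end + ")";
        }
    }

    /**
     * Drops a span when a strictly longer span overlaps it.
     * If sameGroupOnly is true, only spans of the same semantic group compete.
     */
    static List<Span> keepLongest(List<Span> spans, boolean sameGroupOnly) {
        List<Span> kept = new ArrayList<>();
        for (Span candidate : spans) {
            boolean dominated = false;
            for (Span other : spans) {
                boolean competes = !sameGroupOnly || candidate.group.equals(other.group);
                if (competes && other.overlaps(candidate) && other.length() > candidate.length()) {
                    dominated = true;
                    break;
                }
            }
            if (!dominated) {
                kept.add(candidate);
            }
        }
        return kept;
    }

    public static void main(String[] args) {
        // "colon cancer": colon = [0,5), cancer = [6,12), colon cancer = [0,12)
        List<Span> spans = new ArrayList<>();
        spans.add(new Span(0, 5, "ANATOMY"));
        spans.add(new Span(6, 12, "DISORDER"));
        spans.add(new Span(0, 12, "DISORDER"));
        System.out.println(keepLongest(spans, true));   // per-group filtering
        System.out.println(keepLongest(spans, false));  // group-independent filtering
    }
}
```

With sameGroupOnly set to true the "colon cancer" example keeps two spans 
(colon, colon cancer); with false only the longest span survives.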

Tim



Re: ctakes-web-rest changes [EXTERNAL]

2018-12-21 Thread Miller, Timothy
There is certainly no need to apologize! It's 100x easier for me to change an 
existing version that runs than to write it from scratch since I don't really 
know REST that well, so thanks for contributing that code. That's the beauty of 
open source teams with different expertise!
Tim


From: Gandhi Rajan Natarajan 
Sent: Friday, December 21, 2018 3:13 AM
To: dev@ctakes.apache.org
Subject: RE: ctakes-web-rest changes [EXTERNAL]

Hi Tim,

Thanks for taking your time out and checking this. Have left my comments in the 
JIRA issue. Sorry that I could not improvise on the REST module which is more 
suitable for our business needs due to lack of domain expertise.

Regards,
Gandhi

-Original Message-----
From: Miller, Timothy 
Sent: Friday, December 21, 2018 1:54 AM
To: dev@ctakes.apache.org
Subject: ctakes-web-rest changes

Hello all,
I've been trying out the ctakes-web-rest module for a project that uses python 
where I wanted an easy way to send a sentence and get back some CUI 
annotations. There was an issue where the returned json map was keyed by the 
string of the concept, so it would only return one discovered concept if more 
than one had the same string. In the course of fixing that I noticed the code 
was writing the CAS to xmi, then manually parsing that file, rather than just 
interrogating the JCas object, so I rewrote that as well to use uimafit. 
Finally, I commented out the "full" pipeline -- it is just too resource heavy 
to try to run 2 independent pipelines in parallel on the same machine. I think 
the module in its current state suits people who want to try it and are 
willing to make their own changes if they want different pipelines (i.e., it's 
not yet shrink-wrapped), so I would prefer it in a state with a simple pipeline 
that runs well.

Please take a look at the following issue with the attached patch and let me 
know if there are any obvious problems.
https://issues.apache.org/jira/browse/CTAKES-529

Overall, it's in nice shape and I'm excited to get it into a usable state. I 
think this is a use case that would satisfy a lot of users.

Tim

This email and any files transmitted with it are confidential and intended 
solely for the use of the individual or entity to whom they are addressed. If 
you are not the named addressee you should not disseminate, distribute or copy 
this e-mail. Please notify the sender or system manager by email immediately if 
you have received this e-mail by mistake and delete this e-mail from your 
system. If you are not the intended recipient you are notified that disclosing, 
copying, distributing or taking any action in reliance on the contents of this 
information is strictly prohibited and against the law.


ctakes-web-rest changes

2018-12-20 Thread Miller, Timothy
Hello all,
I've been trying out the ctakes-web-rest module for a project that uses python 
where I wanted an easy way to send a sentence and get back some CUI 
annotations. There was an issue where the returned json map was keyed by the 
string of the concept, so it would only return one discovered concept if more 
than one had the same string. In the course of fixing that I noticed the code 
was writing the CAS to xmi, then manually parsing that file, rather than just 
interrogating the JCas object, so I rewrote that as well to use uimafit. 
Finally, I commented out the "full" pipeline -- it is just too resource heavy 
to try to run 2 independent pipelines in parallel on the same machine. I think 
the module in its current state suits people who want to try it and are 
willing to make their own changes if they want different pipelines (i.e., it's 
not yet shrink-wrapped), so I would prefer it in a state with a simple pipeline 
that runs well.
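
To make the keying problem concrete, here is a minimal plain-Java sketch. It is 
not the ctakes-web-rest code and not necessarily how the patch solves it: the 
class and method names are made up, and the identifiers are placeholders rather 
than real CUIs. It only shows why a map keyed by the concept string loses 
entries, and one way to keep them all.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

final class ConceptKeySketch {

    /** Keyed by covered text: a later concept with the same text overwrites the earlier one. */
    static Map<String, String> keyedByText(String[][] textAndCui) {
        Map<String, String> byText = new LinkedHashMap<>();
        for (String[] pair : textAndCui) {
            byText.put(pair[0], pair[1]);
        }
        return byText;
    }

    /** Grouped by covered text: every concept is kept, in discovery order. */
    static Map<String, List<String>> groupedByText(String[][] textAndCui) {
        Map<String, List<String>> byText = new LinkedHashMap<>();
        for (String[] pair : textAndCui) {
            byText.computeIfAbsent(pair[0], key -> new ArrayList<>()).add(pair[1]);
        }
        return byText;
    }

    public static void main(String[] args) {
        // Placeholder identifiers, not real CUIs.
        String[][] found = { {"cold", "CUI-A"}, {"cold", "CUI-B"}, {"fever", "CUI-C"} };
        System.out.println(keyedByText(found));   // {cold=CUI-B, fever=CUI-C}
        System.out.println(groupedByText(found)); // {cold=[CUI-A, CUI-B], fever=[CUI-C]}
    }
}
```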

Please take a look at the following issue with the attached patch and let me 
know if there are any obvious problems.
https://issues.apache.org/jira/browse/CTAKES-529

Overall, it's in nice shape and I'm excited to get it into a usable state. I 
think this is a use case that would satisfy a lot of users.

Tim



Re: Recognising Concept and its Value for text without space [EXTERNAL]

2018-11-07 Thread Miller, Timothy
Hi Zakir,
I think the problem here is that the default tokenizer will never split up a 
string like POD10 into ['POD', '10'] since there is no whitespace. The 
dictionary lookup uses tokens as the unit of analysis, so unless something like 
POD10 is in the dictionary database you will not get a hit for POD (which I 
assume is what you wanted). The only solution I can think of is to write your 
own tokenizer class, and swap it for the default tokenizer and re-run your 
pipeline.
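
As a rough idea of the splitting step such a tokenizer would need, here is a 
plain-Java sketch that breaks a whitespace-free string at letter/digit 
boundaries. It is not a cTAKES tokenizer and does not use any cTAKES API; the 
class and method names are hypothetical, and wiring the result into token 
annotations is left out.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

final class AlnumSplitSketch {

    // A run of letters, a run of digits, or a run of anything else.
    private static final Pattern RUN = Pattern.compile("[A-Za-z]+|[0-9]+|[^A-Za-z0-9]+");

    /** Splits one whitespace-free string into letter, digit and other runs. */
    static List<String> split(String token) {
        List<String> parts = new ArrayList<>();
        Matcher matcher = RUN.matcher(token);
        while (matcher.find()) {
            parts.add(matcher.group());
        }
        return parts;
    }

    public static void main(String[] args) {
        System.out.println(split("POD10"));    // [POD, 10]
        System.out.println(split("Pulse120")); // [Pulse, 120]
    }
}
```

A rule this blunt also splits strings that should stay whole (for example 
"B12"), so in practice it would probably need a list of exceptions.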
Tim


-Original Message-
From: Zakir Saifi
To: dev@ctakes.apache.org
Subject: Recognising Concept and its Value for text without space [EXTERNAL]
Date: Thu, 1 Nov 2018 16:38:41 +0530


Hi everyone. I want cTAKES to recognise a concept and its value in strings
where there is no space between the concept and its value, e.g. POD10
(Post Operative Day 10) or Pulse120. How can I achieve this in cTAKES?




test

2018-09-14 Thread Miller, Timothy
Please ignore.
Tim



Re: Cannot authenticate license on REST API TRACKING:000308016 [EXTERNAL]

2018-07-19 Thread Miller, Timothy
Are you providing your password via the xml descriptor file or an environment 
variable? The only thing I can think of is that there might be a formatting 
problem in the xml, like an extra trailing space or newline in the field where 
the username or password goes.
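
A quick way to test that theory, sketched in plain Java. The class and method 
names are hypothetical and the sample value is made up; the point is only that 
a value with a trailing newline is not equal to the same value without it.

```java
final class CredentialWhitespaceSketch {

    /** True when the value starts or ends with whitespace such as a space or newline. */
    static boolean hasStrayWhitespace(String value) {
        return value != null && !value.equals(value.trim());
    }

    public static void main(String[] args) {
        String fromDescriptor = "my-password\n"; // hypothetical value copied from an xml field
        System.out.println(hasStrayWhitespace(fromDescriptor)); // true
        System.out.println(hasStrayWhitespace("my-password"));  // false
    }
}
```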
Tim


From: Jain, Ritika 
Sent: Thursday, July 19, 2018 7:15 AM
To: dev@ctakes.apache.org
Subject: RE: Cannot authenticate license on REST API TRACKING:000308016 
[EXTERNAL]

Hi Sean

See this reply from UMLS support


That endpoint (documented here:
https://uts.nlm.nih.gov/help/license/validateumlsuserhelp.html)
is not meant for end users, so it will not work with your license code and 
username.


The cTAKES CVD uses the same endpoint (also pointed out in the logs I 
shared).

Regards,
Ritika

-Original Message-
From: Finan, Sean 
Sent: Thursday, July 19, 2018 4:39 PM
To: dev@ctakes.apache.org
Subject: Re: Cannot authenticate license on REST API TRACKING:000308016 
[EXTERNAL]

Hi Ritika,

I am glad that adding your proxy information got you one step closer to a 
working configuration.  However, I cannot say why your password isn't being 
properly validated.  If you can reach the umls server and your credentials are 
correct then the umls server should reply positively and ctakes should let the 
pipeline continue.

Does anybody else on the devlist have any ideas?

Sean

From: Jain, Ritika 
Sent: Thursday, July 19, 2018 5:06 AM
To: dev@ctakes.apache.org
Subject: RE: Cannot authenticate license on REST API TRACKING:000308016 
[EXTERNAL]

I got it working by adding proxy parameters to the java command. Now I no 
longer get the connection timeout, but a different error saying that the user 
is not valid. If you follow the email chain below, the support person from UMLS 
says that my user is valid and that the endpoint used to validate the user is 
not meant for end users.

Can you help me with this?



14:29:06,054 DEBUG [DataBinder] DataBinder requires binding of required fields 
[]
14:29:06,060 TRACE [TypeConverterDelegate] Converting String to [class 
java.lang.String] using property editor 
[org.apache.uima.fit.internal.propertyeditors.GetAsTextStringEditor@39c6fd02]
14:29:06,067 TRACE [TypeConverterDelegate] Converting String to [class 
java.lang.String] using property editor 
[org.apache.uima.fit.internal.propertyeditors.GetAsTextStringEditor@39c6fd02]
14:29:06,072 INFO  [Chunker] Chunker model file: 
org/apache/ctakes/chunker/models/chunker-model.zip
14:29:07,745 DEBUG [DataBinder] DataBinder requires binding of required fields 
[]
14:29:07,746 INFO  [TokenizerAnnotatorPTB] Initializing 
org.apache.ctakes.core.ae.TokenizerAnnotatorPTB
14:29:07,756 DEBUG [DataBinder] DataBinder requires binding of required fields 
[]
14:29:07,770 INFO  [ContextDependentTokenizerAnnotator] Finite state machines 
loaded.
14:29:07,778 DEBUG [DataBinder] DataBinder requires binding of required fields 
[]
14:29:07,779 TRACE [TypeConverterDelegate] Converting String to [class 
java.lang.String] using property editor 
[org.apache.uima.fit.internal.propertyeditors.GetAsTextStringEditor@725e7dcc]
14:29:07,782 TRACE [TypeConverterDelegate] Converting String to [class 
java.lang.String] using property editor 
[org.apache.uima.fit.internal.propertyeditors.GetAsTextStringEditor@725e7dcc]
14:29:07,785 TRACE [TypeConverterDelegate] Converting String to [class 
java.lang.String] using property editor 
[org.apache.uima.fit.internal.propertyeditors.GetAsTextStringEditor@725e7dcc]
14:29:07,792 TRACE [TypeConverterDelegate] Converting String to [class 
java.lang.String] using property editor 
[org.apache.uima.fit.internal.propertyeditors.GetAsTextStringEditor@725e7dcc]
14:29:07,854 DEBUG [StandardEnvironment] Initializing new StandardEnvironment
14:29:07,857 DEBUG [StandardEnvir

Re: Parse Medical Research Papers [EXTERNAL]

2018-06-18 Thread Miller, Timothy
To get predicate argument structure the best method is probably to use the SRL 
(Semantic Role Labeling) annotator which is part of the 
ctakes-dependency-parser module. Check in the desc/ directory in that module 
for some sample pipelines to see its dependencies. Once you have that running, 
look for the types:
org.apache.ctakes.typesystem.type.textsem.Predicate
org.apache.ctakes.typesystem.type.textsem.SemanticArgument
org.apache.ctakes.typesystem.type.textsem.SemanticRoleRelation

in the CVD to get a feel for how predicate arguments are represented in the CAS.
If you are not familiar with SRL maybe check out this demo:
http://cogcomp.org/page/demo_view/SRL
and these slides (specifically the PropBank part, since that is the style cTAKES uses):
https://nlp.stanford.edu/kristina/papers/SRL-Tutorial-post-HLT-NAACL-06.pdf
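
To give a feel for how PropBank-style roles relate to the 
subject/predicate/object tuples asked about, here is a plain-Java sketch. It 
does not call cTAKES: the class and method names are hypothetical, the roles 
are hand-labelled for the first sentence of the research text quoted below, and 
the exact label strings the SRL annotator emits are an assumption that should 
be checked in the CVD. In PropBank-style labelling ARG0 is typically the agent 
and ARG1 the patient or theme.

```java
import java.util.LinkedHashMap;
import java.util.Map;

final class SrlTripleSketch {

    /**
     * Builds a "subject | predicate | object" string from role-labelled arguments,
     * reading ARG0 as the subject and ARG1 as the object.
     */
    static String toTriple(String predicate, Map<String, String> argsByLabel) {
        String subject = argsByLabel.getOrDefault("ARG0", "?");
        String object = argsByLabel.getOrDefault("ARG1", "?");
        return subject + " | " + predicate + " | " + object;
    }

    public static void main(String[] args) {
        // Hand-labelled by way of example; not produced by cTAKES.
        Map<String, String> roles = new LinkedHashMap<>();
        roles.put("ARG0", "A research team based in Houston");
        roles.put("ARG1", "a prototype");
        System.out.println(toTriple("developed", roles));
    }
}
```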

I believe StanfordNLP has a module to do this too, but of course not trained on 
clinical data and not using the augmented set of verb senses that were created 
by the PropBank team for the clinical domain.

Tim



From: Don Flinn 
Sent: Monday, June 18, 2018 5:40 AM
To: dev@ctakes.apache.org
Subject: Parse Medical Research Papers [EXTERNAL]

I want to parse medical research papers and am looking at using cTAKES.  I
do realize that cTAKES is aimed at clinical reports, but I would like to see
if I can use it for my purposes.  I'm initially looking to get a tuple of
Subject, Predicate, Object for each sentence, and later additional semantic
information.

I modified ClinicalPipelineFactory.java to use  the following portion of a
research report -

"A research team based in Houston has developed a prototype for a
“bionic” heart replacement. Other designs all mimic the beating of
a heart, but due to many moving parts, the mechanical hearts
would quickly wear out. The heart developed by BiVACOR does not
beat, and instead has one moving part which propels the blood
throughout the body. The bionic heart has been safely and
successfully transplanted into animals leading to very promising
results."

I got the following result -
Entity: heart === Polarity: 1 === Uncertain? false === Subject: patient ===
Generic? false === Conditional? false === History? false
Entity: replacement === Polarity: 1 === Uncertain? false === Subject:
patient === Generic? false === Conditional? false === History? false
Entity: mimic === Polarity: 0 === Uncertain? false === Subject: null ===
Generic? false === Conditional? false === History? false
Entity: heart === Polarity: 1 === Uncertain? false === Subject: patient ===
Generic? false === Conditional? false === History? false
Entity: heart === Polarity: 1 === Uncertain? false === Subject: patient ===
Generic? false === Conditional? false === History? false
Entity: heart === Polarity: 1 === Uncertain? false === Subject: patient ===
Generic? false === Conditional? false === History? false

I assume my problem is related to the Snomed database, which is not trained
for what I want.

My questions -
Is my assumption correct?
Should I attempt to modify/extend Snomed?
Is there a better/different way to query Snomed to meet my needs?
Is there an existing database that I could use with Ctakes that would more
meet my needs?
Should I instead use the Stanford Java NLP system or the Apache OpenNLP?
I'll still need a database.

Thank you for any suggestions
Don


Re: issues with line endings [EXTERNAL]

2018-05-07 Thread Miller, Timothy
Yes, there is a setting in git but I think I'm in a bit of a catch-22 with 
git-svn. If I don't do anything, it auto-changes a bunch of files and won't let 
me even pull without checking in those changes. I can modify the .gitattributes 
file to not care about line endings, but then I can't pull because I have the 
modified .gitattributes file! I think my solution is to check out a totally 
clean repo with git-svn, immediately push back the files with corrected (Unix) 
line endings, and then work from that copy.
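
For the "corrected (Unix) line endings" step, the conversion itself could look 
roughly like this in plain Java. This is a sketch under assumptions, not a 
project tool: the class and method names are hypothetical, it only handles CRLF 
pairs, and it assumes the file is text in a single-byte or UTF-8 encoding 
(binary files and UTF-16 text would have to be skipped).

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;

final class LineEndingSketch {

    /** Replaces every CRLF pair with a single LF. */
    static String toLf(String text) {
        return text.replace("\r\n", "\n");
    }

    /** Rewrites one file with LF endings; returns true if the file changed. */
    static boolean normalizeFile(Path file) throws IOException {
        // ISO-8859-1 maps every byte to exactly one char, so other bytes round-trip unchanged.
        String original = new String(Files.readAllBytes(file), StandardCharsets.ISO_8859_1);
        String converted = toLf(original);
        if (converted.equals(original)) {
            return false;
        }
        Files.write(file, converted.getBytes(StandardCharsets.ISO_8859_1));
        return true;
    }
}
```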
Tim


From: Gandhi Rajan Natarajan 
Sent: Saturday, May 5, 2018 3:38 AM
To: dev@ctakes.apache.org
Subject: RE: issues with line endings [EXTERNAL]

Hi Tim,

Though I'm not an expert in git, I guess there is a setting to turn off this 
feature of auto-correcting line endings in git-svn.

Just have a look at this link and see if it helps:
https://dzone.com/articles/git-showing-file-modified-even

Regards,
Gandhi





-Original Message-
From: Miller, Timothy [mailto:timothy.mil...@childrens.harvard.edu]
Sent: Saturday, May 05, 2018 2:25 AM
To: dev@ctakes.apache.org
Subject: issues with line endings

I'm trying to use git-svn to do ctakes development but it has this weird issue 
where it auto "fixes" line endings (mainly in -ytex* modules) to be LF from 
CRLF. So it won't let me pull until I've checked in those changes. And because 
it's automatic I can't clean my local copy (if I try they just show up again, 
it's like trying to strangle a ghost). Anyways, should we just do a brute force 
commit of all files to LF endings?

Tim




issues with line endings

2018-05-04 Thread Miller, Timothy
I'm trying to use git-svn to do ctakes development but it has this
weird issue where it auto "fixes" line endings (mainly in -ytex*
modules) to be LF from CRLF. So it won't let me pull until I've checked
in those changes. And because it's automatic I can't clean my local
copy (if I try they just show up again, it's like trying to strangle a
ghost). Anyways, should we just do a brute force commit of all files to
LF endings?
Tim


Re: SentenceDetector [EXTERNAL]

2018-04-06 Thread Miller, Timothy
The changes were mainly meant to adapt the OpenNLP model to
idiosyncrasies of clinical text, but you're right that they have some
shortcomings.

The newline rule comes from the data sources originally used to build the
model: there were frequent cases of headings/sentence fragments by
themselves on a line, and _no_ cases of mid-sentence newlines. That,
combined with the fact that OpenNLP's training file format (at the time)
itself used newlines as a separator, led to the creation of that simple
rule rather than trying to retrain with newline as a candidate sentence
splitter. I created a different training file format and an annotator that
does what you suggest, and built an alternative sentence splitter
model, here:
org/apache/ctakes/core/ae/SentenceDetectorAnnotatorBIO.java

It operates at the character level and splits a document into
sentences. For some people it works better. For data where there are
potentially mid-sentence newlines (like MIMIC), it is probably the only
model with usable results. Its typical failure mode is to lump two
sentences together, while the default annotator does the opposite.
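
To show what "operates at the character level" can mean, here is a plain-Java 
sketch of turning one label per character into sentences. It is not the 
SentenceDetectorAnnotatorBIO code: the class and method names are hypothetical, 
the labels are written by hand (the real annotator predicts them with a trained 
model), and the exact label scheme is assumed to be B = begins a sentence, I = 
inside a sentence, O = outside any sentence.

```java
import java.util.ArrayList;
import java.util.List;

final class BioSentenceSketch {

    /** Turns per-character B/I/O labels into sentence strings. */
    static List<String> decode(String text, String labels) {
        if (text.length() != labels.length()) {
            throw new IllegalArgumentException("need exactly one label per character");
        }
        List<String> sentences = new ArrayList<>();
        int start = -1; // index where the open sentence began, or -1 if none is open
        for (int i = 0; i < labels.length(); i++) {
            char label = labels.charAt(i);
            if (label == 'B' || label == 'O') {
                if (start >= 0) {
                    sentences.add(text.substring(start, i)); // close the open sentence
                }
                start = (label == 'B') ? i : -1;
            }
            // 'I' simply continues the open sentence
        }
        if (start >= 0) {
            sentences.add(text.substring(start));
        }
        return sentences;
    }

    public static void main(String[] args) {
        // The newline after "Hi." is labelled O; the one after "No" is labelled I.
        List<String> sentences = decode("Hi.\nNo\npain.", "BIIOBIIIIIII");
        System.out.println(sentences.size()); // 2
    }
}
```

Because a newline is just another character that may be labelled I, a 
mid-sentence newline does not force a split, which is the difference from the 
newline rule described above.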

Tim


On Fri, 2018-04-06 at 02:11 +, Ewan Mellor wrote:
> I'm looking at SentenceDetector from ctakes-core.  It has a surprising
> idea of what counts as a "sentence".  Before I delve any deeper, I wanted
> to ask whether there is a reason for what it's doing, in particular
> whether there's anything in the clinical pipeline that's depending on its
> behavior specifically.
> 
> The main problem I have is that it's splitting on characters like colon and
> semicolon, which aren't usually considered sentence separators, with the
> result that it often ends up tagging phrases rather than whole sentences.
> 
> It's using SentenceDetectorCtakes and EndOfSentenceScannerImpl, which seem
> to be derived from equivalents in OpenNLP, but with changes that I can't
> track (they date from the original edu.mayo import as far as I can tell).
> Other than the additional separator characters, I can't tell whether these
> classes are doing anything important that you wouldn't equally get from
> OpenNLP's SentenceDetectorME, so I don't know why they're being used.
> 
> SentenceDetector is also splitting on newlines after passing the text through
> the max entropy sentence model.  I don't see the point in this -- if you're
> going to split on newlines anyway, then why not do that before passing
> through the entropy model?  Or just have newline as one of the potential
> EOS characters and treat it as a possible break point rather than a definite
> one?
> 
> Any insight would be welcome.
> 
> Thanks,
> 
> Ewan.

Re: consequences of change to typesystem [EXTERNAL]

2018-04-03 Thread Miller, Timothy
Yes, that's right. Especially for one-off contributions, it is really
helpful to the project if you open up a jira issue and attach the patch
to the issue, then one of the committers will check it and commit it.
Let us know if you have any questions about that.

For people interested in contributing more regularly (i.e., getting
committing privileges), which we are more than happy to see, that is
usually a good way to start as well.

Tim



On Tue, 2018-04-03 at 18:10 +, Gandhi Rajan Natarajan wrote:
> Hi Sean,
> 
> Please find the response from Sean Finan for the similar question I
> asked him earlier:
> 
> "Ctakes doesn't really have a steadfast process for making upgrades.
> 
> You should create a jira item or use an existing one.  Any commits
> should have a comment/message starting with the jira item.  For
> instance "CTAKES-441: Add LabValueFinder".
> 
> You can use patch files, attaching them to a jira item and requesting
> that somebody test them before the changes are committed.  You may
> want to create the patch using your git version and then commit it to
> ctakes using svn.
> https://www.devroom.io/2009/10/26/how-to-create-and-apply-a-patch-with-git/
> https://www.devroom.io/2007/07/03/how-to-create-and-apply-a-patch-with-subversion/
> 
> If the change is significant then you could create an svn branch of
> ctakes and then commit your changes to that branch.  Ask for
> assistance testing the branch and then merge the branch into trunk."
> 
> Hope it makes sense.
> 
> Regards,
> Gandhi
> 
> -Original Message-
> From: Mullane, Sean *HS [mailto:sp...@hscmail.mcc.virginia.edu]
> Sent: Tuesday, April 03, 2018 11:28 PM
> To: 'Finan, Sean'; dev@ctakes.apache.org
> Subject: RE: consequences of change to typesystem [EXTERNAL]
> 
> I have made some minor changes to DocumentMapperServiceImpl.java to
> fix this. The bodyLocation attributes now get added via the anno_link
> table in the database. I created JIRA issue 503 [0] for this issue,
> per the cTAKES wiki.
> 
> Since this is my first time committing a change to the project I'm
> not sure how to go about it. Is there a tutorial on how to file a
> pull request I can reference?
> 
> [0] https://issues.apache.org/jira/browse/CTAKES-503
> 
> Thanks,
> Sean
> 
> -Original Message-
> From: Mullane, Sean *HS [mailto:sp...@hscmail.mcc.virginia.edu]
> Sent: Wednesday, March 28, 2018 6:54 PM
> To: 'Finan, Sean'; dev@ctakes.apache.org
> Subject: RE: consequences of change to typesystem [EXTERNAL]
> 
> Sean,
> 
> Glad I asked. I will try either what you suggested or the similar
> approach of adding some code to handle the bare-annotation-as-feature 
> case similarly to how annotations inside FSArrays are handled.
> 
> Thanks,
> Sean
> 
> -Original Message-
> From: Finan, Sean [mailto:sean.fi...@childrens.harvard.edu]
> Sent: Wednesday, March 28, 2018 8:40 AM
> To: dev@ctakes.apache.org
> Subject: Re: consequences of change to typesystem [EXTERNAL]
> 
> Hi Sean,
> 
> In case nobody else has replied,
> Yes, this would definitely break a whole lot of things.  I am not
> saying that it is a bad idea, just that the current
> BinaryTextRelation interface is used as-is in probably a thousand
> places, and while some refactoring might be trivial I wouldn't bet
> that it all would be as easy as one would like.
> 
> I haven't looked at the ytex DBConsumer, but could it possibly be
> easier to add some code there that would check BinaryTextRelations
> and create a new FSArray for each?  Stick those arrays in the cas
> immediately before and db write() and you should be able to do what
> you want without impacting the rest of ctakes.
> 
> Sean
> 
> From: Mullane, Sean *HS 
> Sent: Tuesday, March 27, 2018 6:05 PM
> To: dev@ctakes.apache.org
> Subject: consequences of change to typesystem [EXTERNAL]
> 
> I am trying out a change to the typesystem (explained below). If it
> works as I hope, I would want to contribute this back to the trunk.
> Before I invest too much time into this, can anyone tell me if this
> is likely to break things for other users? I a

uima 3

2018-03-15 Thread Miller, Timothy
Has some cool looking useful new functionality:
https://uima.apache.org/d/uimaj-3.0.0-alpha02/version_3_users_guide.html#uv3.overview.new

Support for arbitrary Java objects, transportable in the CAS
New types: FSHashSet
Automatic garbage collection of unreferenced Feature Structures
better performance

And an interesting new select api that interacts with java streaming
api:

Set<Type> foundTypes =
   myIndex.select(MyType.class)
   .coveredBy(myBoundingAnnotation)
   .nonOverlapping()
   .map(fs -> fs.getType())
   .collect(Collectors.toCollection(TreeSet::new));

Re: Sentence splitter [EXTERNAL]

2018-03-13 Thread Miller, Timothy
That sounds bizarre! I can think of two possibilities: a sentence break in the 
middle of the word (unlikely), or the different sentence splits confused the 
POS tagger, which then tagged the word aspirin as a forbidden part of speech, 
like a preposition or something. If you check the token annotation on the word 
aspirin you should be able to see the part of speech tag for that word.
Tim


From: Tomasz Oliwa 
Sent: Tuesday, March 13, 2018 5:34 PM
To: dev@ctakes.apache.org
Subject: Re: Sentence splitter [EXTERNAL]

Hi,

I tested SentenceDetectorAnnotatorBIO in cTAKES 4.0.0, simply by replacing 
SentenceDetectorAnnotator.xml with SentenceDetectorAnnotatorBIO.xml in 
AggregatePlaintextFastUMLSProcessor.xml.

While it seemed to work, I noticed that in one example, an IdentifiedAnnotation 
was not found, that was found for the same input with just 
SentenceDetectorAnnotator.xml.

Could somebody check this please? Run the cTAKES CVD with the following input 
(without the "):

"
aspirin

his leg
"

On the machine I tested this, the MedicationMention does not show up with 
SentenceDetectorAnnotatorBIO, but it does with SentenceDetectorAnnotator.


From: Masoud Rouhizadeh 
Sent: Tuesday, March 13, 2018 3:02:35 PM
To: dev@ctakes.apache.org
Subject: Re: Sentence splitter [EXTERNAL]

Hi Sean,

Thank you for the pointer. I was able to run the SentenceDetectorAnnotatorBIO 
from ctakes-core. The results are way better than the SentenceDetectorAnnotator 
but I still see some issues such as splitting “Dr.” as a separate sentence 
(most likely due to the period after the abbreviation). Do you think there is a 
way to define an abbreviation list for SentenceDetectorAnnotatorBIO so that it 
knows that this is a word-final (i.e. abbreviation-final) and not a 
sentence-final period?
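
To make the idea concrete: whether SentenceDetectorAnnotatorBIO accepts such a list is not established here, so the sketch below takes a different route and re-joins sentences after the fact. It is a minimal illustration in plain Java, assuming the detector's output is available as a list of strings. AbbreviationMerger, its methods and the abbreviation set are all hypothetical names; none of this is cTAKES API.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

final class AbbreviationMerger {

    // Hypothetical abbreviation list; extend it for a given corpus.
    static final Set<String> ABBREVIATIONS =
            new HashSet<>(Arrays.asList("Dr.", "Mr.", "Mrs.", "Ms.", "vs."));

    // Returns the last space-delimited word of a sentence.
    static String lastWord(String sentence) {
        String trimmed = sentence.trim();
        int cut = trimmed.lastIndexOf(' ');
        return cut < 0 ? trimmed : trimmed.substring(cut + 1);
    }

    // Re-joins any sentence that ends in a known abbreviation with its successor.
    static List<String> merge(List<String> sentences) {
        List<String> merged = new ArrayList<>();
        StringBuilder pending = new StringBuilder();
        for (String sentence : sentences) {
            if (pending.length() > 0) {
                pending.append(' ');
            }
            pending.append(sentence.trim());
            if (!ABBREVIATIONS.contains(lastWord(sentence))) {
                merged.add(pending.toString());
                pending.setLength(0);
            }
        }
        // A trailing abbreviation with nothing after it is kept as its own sentence.
        if (pending.length() > 0) {
            merged.add(pending.toString());
        }
        return merged;
    }
}
```

A fixed list like this treats every "Dr." as word-final, so it would wrongly merge a sentence that genuinely ends in an abbreviation; that trade-off is the reason a trained model might still be preferable.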

Thanks again,
Masoud





On 3/9/18, 5:35 PM, "Finan, Sean"  wrote:


Hi Masoud,

There is a very nice SentenceDetectorBIO in ctakes-core.  It will split 
sentences based upon features other than just a newline character, which 
appears to be what you want.

Sean



From: Masoud Rouhizadeh 
Sent: Friday, March 9, 2018 4:41 PM
To: dev@ctakes.apache.org
Subject: Sentence splitter [EXTERNAL]

Hello cTAKES team!



I was wondering what types of sentence splitters are available in cTAKES? 
The default sentence splitter does not appear to be the best one. See output 
for the demo example from the example in cTAKES installation guide:



Dr. Nutritious Medical Nutrition Therapy for Hyperlipidemia Referral from:

Julie Tester, RD, LD, CNSD Phone contact:

(555)

555-1212 Height:

144 cm Current Weight:

45 kg Date of current weight: 02-29-2001 Admit Weight:

[...]



Thanks so much,

Masoud







Masoud Rouhizadeh, PhD

NLP Specialist / Software Engineer

Institute for Clinical and Translational Research

Johns Hopkins University


https://urldefense.proofpoint.com/v2/url?u=http-3A__pages.jh.edu_-7Emrouhiz1&d=DwIGaQ&c=qS4goWBT7poplM69zy_3xhKwEW14JZMSdioCoppxeFU&r=fs67GvlGZstTpyIisCYNYmQCP6r0bcpKGd4f7d4gTao&m=aZ4yDE4zQbRJuUQ8p-T5nPrjhYvXF28sFoJWEtP3sGU&s=ob0U2sSfS7UijTI8PqCh_MwMucxPc14ovmcC2vq7rDA&e=










Re: UmlsUserApprover Error [EXTERNAL]

2018-02-26 Thread Miller, Timothy
Is it possible there is some network issue preventing connectivity? New
institutional firewall maybe?

Otherwise, it looks like somehow your credentials are not getting into
the right place. Possibly a configuration file had them before and it's
been changed out from under you?

One thing you can try, if you are using an IDE, you can directly put
your credentials into the VM options for your run configuration with:
-Dctakes.umlsuser=<username> -Dctakes.umlspw=<password>

and see if you still get the issue.
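
If it is unclear whether those VM options are actually reaching the JVM, a tiny stand-alone check can rule that out first. This is just a sketch: it does not call cTAKES or contact the UMLS server, it only reports whether the two system properties named above are present. The class name UmlsCredentialCheck is made up for the example.

```java
final class UmlsCredentialCheck {

    // Property names taken from the VM options mentioned above.
    static final String USER_PROPERTY = "ctakes.umlsuser";
    static final String PASS_PROPERTY = "ctakes.umlspw";

    // True when the named system property is present and not blank.
    static boolean isSet(String propertyName) {
        String value = System.getProperty(propertyName);
        return value != null && !value.trim().isEmpty();
    }

    // Describes which properties reached the JVM, without printing their values.
    static String describe() {
        return USER_PROPERTY + (isSet(USER_PROPERTY) ? " is set" : " is missing")
                + "; " + PASS_PROPERTY + (isSet(PASS_PROPERTY) ? " is set" : " is missing");
    }

    public static void main(String[] args) {
        System.out.println(describe());
    }
}
```

Run it from the same run configuration as the pipeline. If either property shows as missing, the problem is in how the options are passed rather than in the credentials themselves.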

Tim


On Sat, 2018-02-24 at 18:42 -0600, Andrew Phillips wrote:
> Hello,
> 
> I am getting an error after recompiling a script in my pipeline. My
> setup
> was working fine the last time I did a compile several months ago,
> and I
> have logged into my UMLS account to ensure it isn't an issue with my
> credentials, as well as done a complete reinstall from the GitHub
> repo and
> checked out the 4.0.0 release. The minor change I made in the script
> was
> just uncommenting something that I've used before, so I know there
> are no
> errors in it. Any insights as to what the issue may be? I've included
> the
> complete output below. Thank you.
> 
> 
> [INFO] Scanning for projects...
> [WARNING] The POM for org.eclipse.m2e:lifecycle-mapping:jar:1.0.0 is
> missing, no dependency information available
> [WARNING] Failed to retrieve plugin descriptor for
> org.eclipse.m2e:lifecycle-mapping:1.0.0: Plugin
> org.eclipse.m2e:lifecycle-mapping:1.0.0 or one of its dependencies
> could
> not be resolved: Failure to find
> org.eclipse.m2e:lifecycle-mapping:jar:1.0.0 in
> https://urldefense.proofpoint.com/v2/url?u=https-3A__repo.maven.apach
> e.org_maven2&d=DwIBaQ&c=qS4goWBT7poplM69zy_3xhKwEW14JZMSdioCoppxeFU&r
> =Heup-IbsIg9Q1TPOylpP9FE4GTK-
> OqdTDRRNQXipowRLRjx0ibQrHEo8uYx6674h&m=NHws3pftXkncEWsu-
> Y6fCtMKfY3WWkYQmDYrA4AVcvU&s=1C-i1p8UnA38es-UT_d0FMIUOx5yrfK0NQh-
> PSEuxpA&e= was cached in the local repository,
> resolution will not be reattempted until the update interval of
> central has
> elapsed or updates are forced
> [INFO]
> [INFO]
> ---
> -
> [INFO] Building Apache cTAKES Temporal Information Extraction 4.0.1-
> SNAPSHOT
> [INFO]
> ---
> -
> [WARNING] The POM for org.eclipse.m2e:lifecycle-mapping:jar:1.0.0 is
> missing, no dependency information available
> [WARNING] Failed to retrieve plugin descriptor for
> org.eclipse.m2e:lifecycle-mapping:1.0.0: Plugin
> org.eclipse.m2e:lifecycle-mapping:1.0.0 or one of its dependencies
> could
> not be resolved: Failure to find
> org.eclipse.m2e:lifecycle-mapping:jar:1.0.0 in
> https://urldefense.proofpoint.com/v2/url?u=https-3A__repo.maven.apach
> e.org_maven2&d=DwIBaQ&c=qS4goWBT7poplM69zy_3xhKwEW14JZMSdioCoppxeFU&r
> =Heup-IbsIg9Q1TPOylpP9FE4GTK-
> OqdTDRRNQXipowRLRjx0ibQrHEo8uYx6674h&m=NHws3pftXkncEWsu-
> Y6fCtMKfY3WWkYQmDYrA4AVcvU&s=1C-i1p8UnA38es-UT_d0FMIUOx5yrfK0NQh-
> PSEuxpA&e= was cached in the local repository,
> resolution will not be reattempted until the update interval of
> central has
> elapsed or updates are forced
> [INFO]
> [INFO] >>> exec-maven-plugin:1.2.1:java (default-cli) > validate @
> ctakes-misc >>>
> [INFO]
> [INFO] <<< exec-maven-plugin:1.2.1:java (default-cli) < validate @
> ctakes-misc <<<
> [INFO]
> [INFO]
> [INFO] --- exec-maven-plugin:1.2.1:java (default-cli) @ ctakes-misc
> ---
> log4j: reset attribute= "false".
> log4j: Threshold ="null".
> log4j: Retreiving an instance of org.apache.log4j.Logger.
> log4j: Setting [ProgressAppender] additivity to [false].
> log4j: Level value for ProgressAppender is  [INFO].
> log4j: ProgressAppender level set to INFO
> log4j: Class name: [org.apache.log4j.ConsoleAppender]
> log4j: Parsing layout of class: "org.apache.log4j.PatternLayout"
> log4j: Setting property [conversionPattern] to [%m].
> log4j: Adding appender named [noEolAppender] to category
> [ProgressAppender].
> log4j: Retreiving an instance of org.apache.log4j.Logger.
> log4j: Setting [ProgressDone] additivity to [false].
> log4j: Level value for ProgressDone is  [INFO].
> log4j: ProgressDone level set to INFO
> log4j: Class name: [org.apache.log4j.ConsoleAppender]
> log4j: Parsing layout of class: "org.apache.log4j.PatternLayout"
> log4j: Setting property [conversionPattern] to [%m%n].
> log4j: Adding appender named [eolAppender] to category
> [ProgressDone].
> log4j: Level value for root is  [INFO].
> log4j: root level set to INFO
> log4j: Class name: [org.apache.log4j.ConsoleAppender]
> log4j: Parsing layout of class: "org.apache.log4j.PatternLayout"
> log4j: Setting property [conversionPattern] to [%d{dd MMM 
> HH:mm:ss}
> %5p %c{1} - %m%n].
> log4j: Adding appender named [consoleAppender] to category [root].
> 24 Feb 2018 18:22:23  INFO LvgAnnotator - URL for lvg.properties
> =/home/aphillips5/ctakes/ctakes-
> misc/target/classes/org/apache/ctakes/lvg/data/config/lvg.properties
> 24 Feb 2018 18:22:23  INFO SentenceD

Re: Fast UMLS dictionary lookup description [EXTERNAL] [SUSPICIOUS]

2018-02-23 Thread Miller, Timothy
Didn't you have some slides at some point as well? I don't know if they
are suitable for public consumption but I remember it was helpful for
me at least.
Tim

On Fri, 2018-02-23 at 15:34 +, Finan, Sean wrote:
> Unfortunately, writing is not my jam.  I wrote about 50% of a paper
> and then shoved it aside for other tasks.  Now I have no idea where I
> saved it ...
> 
> However, there is an outline of sorts in the code repository within
> the ctakes-dictionary-lookup-fast module.  The doc/ directory
> contains a few files and the DictionaryLookupHelp document may
> address your question.  I apparently wrote it in March of 2014 (time
> flies) so I am guessing that some minor details have changed, but the
> main flow is the same.
> 
> Sean
> 
> -Original Message-
> From: Gandhi Rajan Natarajan [mailto:gandhi.natara...@arisglobal.com]
>  
> Sent: Friday, February 23, 2018 2:57 AM
> To: dev@ctakes.apache.org
> Subject: RE: Fast UMLS dictionary lookup description [EXTERNAL]
> 
> Hi Masoud,
> 
> 
> 
> In this link - https://urldefense.proofpoint.com/v2/url?u=https-3A__c
> wiki.apache.org_confluence_display_CTAKES_cTAKES-2B4.0-2B-2D-2BFast-
> 2BDictionary-
> 2BLookup&d=DwIGaQ&c=qS4goWBT7poplM69zy_3xhKwEW14JZMSdioCoppxeFU&r=fs6
> 7GvlGZstTpyIisCYNYmQCP6r0bcpKGd4f7d4gTao&m=2lx9jiMXTJ4lNLDbef7KG0qSHx
> D_AZH_DYqrQyAZWSY&s=UpxVWvyK8fZ_8vnYhIrFZlUza0qBHuqVme5n-8zEeqw&e=, I
> could see an information stating " A paper on rare word indexing is
> currently in progress."
> 
> 
> 
> May be Sean or Tim will be able to provide info on this I feel.
> 
> 
> 
> Regards,
> 
> Gandhi
> 
> 
> 
> -Original Message-
> 
> From: Masoud Rouhizadeh [mailto:m...@jhu.edu]
> 
> Sent: Thursday, February 22, 2018 9:57 PM
> 
> To: dev@ctakes.apache.org
> 
> Subject: Fast UMLS dictionary lookup description
> 
> 
> 
> Hello, cTAKES developing team,
> 
> 
> 
> We are using and comparing various NLP tools (including cTAKES) for
> processing over 5 million clinical notes within Johns Hopkins Medical
> Institutes. As a part of our comparisons, we are exploring the
> architecture of the NER and (UMLS) concept linking components of the
> tools.
> 
> 
> 
> I was able to find the description on the cTAKES default/original
> dictionary look up in the Savova et. al. 2010 paper but I was not
> able to find a paper or tech report describing the fast UMLS
> dictionary lookup (Fast UMLS Processor) yet.
> 
> 
> 
> Any description of the fast dictionary lookup algorithm is highly
> appreciated.
> 
> 
> 
> Thank you,
> 
> Masoud Rouhizadeh
> 
> 
> 
> 
> 
> Masoud Rouhizadeh, PhD
> 
> 
> 
> NLP Specialist / Software Engineer
> 
> Institute for Clinical and Translational Research Center for Clinical
> Data Analysis School of Medicine, Johns Hopkins University
> 
> https://urldefense.proofpoint.com/v2/url?u=http-3A__pages.jh.edu_-7Em
> rouhiz1&d=DwIGaQ&c=qS4goWBT7poplM69zy_3xhKwEW14JZMSdioCoppxeFU&r=fs67
> GvlGZstTpyIisCYNYmQCP6r0bcpKGd4f7d4gTao&m=2lx9jiMXTJ4lNLDbef7KG0qSHxD
> _AZH_DYqrQyAZWSY&s=sqC6maCH-rhpZGJ_y6zc1q1K1z5FDYjcN6HhX8e_ZbY&e=
> 
> 
> 
> This email and any files transmitted with it are confidential and
> intended solely for the use of the individual or entity to whom they
> are addressed. If you are not the named addressee you should not
> disseminate, distribute or copy this e-mail. Please notify the sender
> or system manager by email immediately if you have received this e-
> mail by mistake and delete this e-mail from your system. If you are
> not the intended recipient you are notified that disclosing, copying,
> distributing or taking any action in reliance on the contents of this
> information is strictly prohibited and against the law.
> 

Re: using umls dictionary lookup offline [EXTERNAL] [SUSPICIOUS]

2018-02-15 Thread Miller, Timothy
Again, not legal advice, but this is my rule of thumb:
- If you had to enter your UMLS credentials to download the copy of the
UMLS you're using with cTAKES, then you don't need to have the online
credentials check. (As Sean said, you are responsible for following
licenses in terms of redistribution.)
- If you did _not_ enter your UMLS credentials to download the copy of
the UMLS you're using with cTAKES (e.g., from our sourceforge mirror),
then you DO need to have the online credentials check. It is very
beneficial to the cTAKES project that we are allowed to redistribute
the UMLS in a format that's convenient for users getting started, so it
is really important not to abuse this.

Tim


On Thu, 2018-02-15 at 14:13 +, Finan, Sean wrote:
> Hi Devi,
> 
> There is a lot to say on this topic, and I can't possibly cover it
> all.  Disclaimer: the following is not meant to be complete.  It is
> the rambling of a layman, not a lawyer, who hasn't slept.  I did not
> draft the UMLS license, nor have I thoroughly read it since ... I
> want to say October.  If anybody notices that I state something
> inaccurate please correct me.  Also, apologies for shouting TAKES.
> 
> !!!  Please visit the UMLS license start page [1] for complete
> information on what you should do regarding its use.  Apache has no
> affiliation that I know of and this is not the best forum for legal
> matters.
> 
> In short, as things apply to Apache cTAKES:
> 1) There is available on sourceforge a prebuilt database containing a
> subset of the UMLS that is usable by Apache cTAKES.  
> 2) That database is not distributed or supported by Apache.  The
> licenses are incompatible.
> 3) The Apache cTAKES website downloads page [2] provides a link to it
> as a courtesy.
> 
> 4) Just like help on anything else 3rd party [3], information on
> using the dictionary [4] in the Apache cTAKES wiki, Apache cTAKES
> mailing lists, etc. is provided for assistance.
> 5) There are inherent expectations that those utilizing said help
> abide by all laws and restrictions of the third party.
> 
> 6) The "default" Apache cTAKES dictionary lookup uses a "rare word
> index" schema. [5]
> 7) While the database on sourceforge adheres to the rare word index
> schema,
> 8) An infinite number of databases can be created that conform to
> said schema and can be used by Apache cTAKES.  [6]
> 
> 9) There is also code in Apache cTAKES that can use other database
> schemas or bar-separated value flat files.
> 
> 10) While the "default clinical pipeline" [7] is possibly the most
> commonly run configuration,
> 11) The default clinical pipeline is far from being the only way to
> use Apache cTAKES.
> 
> 12) While the default dictionary lookup does require a check of the
> end user's UMLS license during initialization,
> 13) it is possible that the end user may want to run Apache cTAKES
> without the herein mentioned sourceforge database.
> 14) For that reason, there are configurations of the dictionary
> lookup that do not require a UMLS credential check.
> 
> I have run out of steam.  So,
> 1)  If you use the subset of the UMLS that exists on sourceforge,
> PLEASE keep the UMLS credential check enabled.
> 2)  If you use another database of your own making, you can do what
> you want.
> 3) I should also say that if you create your own dictionary using the
> UMLS, I am pretty certain that you are not allowed to distribute it
> without expressed permission from the NLM.  Please consult the UMLS
> license. [1]
> 
> 
> [1] https://urldefense.proofpoint.com/v2/url?u=https-3A__www.nlm.nih.
> gov_databases_umls.html&d=DwIGaQ&c=qS4goWBT7poplM69zy_3xhKwEW14JZMSdi
> oCoppxeFU&r=Heup-IbsIg9Q1TPOylpP9FE4GTK-
> OqdTDRRNQXipowRLRjx0ibQrHEo8uYx6674h&m=kLwhTWLiycm7tPA3QG6BD8swPRXxgO
> TpWHc6l_TCKoA&s=IpjGTDhTHstuDCNdgaxEo9doI7Djf-cWL7JWrtOeKwE&e=
> [2] https://urldefense.proofpoint.com/v2/url?u=http-3A__ctakes.apache
> .org_downloads.cgi&d=DwIGaQ&c=qS4goWBT7poplM69zy_3xhKwEW14JZMSdioCopp
> xeFU&r=Heup-IbsIg9Q1TPOylpP9FE4GTK-
> OqdTDRRNQXipowRLRjx0ibQrHEo8uYx6674h&m=kLwhTWLiycm7tPA3QG6BD8swPRXxgO
> TpWHc6l_TCKoA&s=5Pv5xjzH7FP4OSYumoLEsWrAzY5lRiZVBsYOmMoIR68&e=
> [3] https://urldefense.proofpoint.com/v2/url?u=https-3A__cwiki.apache
> .org_confluence_display_CTAKES_External-2BTools-2Band-
> 2BApplications&d=DwIGaQ&c=qS4goWBT7poplM69zy_3xhKwEW14JZMSdioCoppxeFU
> &r=Heup-IbsIg9Q1TPOylpP9FE4GTK-
> OqdTDRRNQXipowRLRjx0ibQrHEo8uYx6674h&m=kLwhTWLiycm7tPA3QG6BD8swPRXxgO
> TpWHc6l_TCKoA&s=8I3gCGeAzw4jkeGDPg536JUlUHJvmIacIg8Jjx46_kQ&e=
> [4] https://urldefense.proofpoint.com/v2/url?u=https-3A__cwiki.apache
> .org_confluence_display_CTAKES_cTAKES-2B4.0-2BDictionaries-2Band-
> 2BModels&d=DwIGaQ&c=qS4goWBT7poplM69zy_3xhKwEW14JZMSdioCoppxeFU&r=Heu
> p-IbsIg9Q1TPOylpP9FE4GTK-
> OqdTDRRNQXipowRLRjx0ibQrHEo8uYx6674h&m=kLwhTWLiycm7tPA3QG6BD8swPRXxgO
> TpWHc6l_TCKoA&s=nFS7-kIWdv_QpbHxdxl26WBnm3yGaauhs8cRHlpqMYM&e=
> [5] https://urldefense.proofpoint.com/v2/url?u=https-3A__cwiki.apache
> .org_confluence_display_C

Re: SubjectClearTkAnalysisEngine not working [EXTERNAL]

2018-01-16 Thread Miller, Timothy
OK, it sounds like a slight misunderstanding of what "subject" refers
to. The subject field refers to _who_ is the subject of an event.

This is important to differentiate diseases that are mentioned because
the patient is experiencing them ("pt has colon cancer") from those
that might be mentioned because a family member had them ("mother had
breast cancer").

What you're talking about sounds more like "Sections", which I think in
cTAKES are called "segments". There is a regex-based section finder in
cTAKES but it is not enabled by default because it would usually need
to be customized for a given institution's notes.
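
To give a feel for what "regex-based" means here, below is a rough plain-Java sketch that splits a note on header lines. It is not the cTAKES section finder and does not produce segment annotations. The header names are borrowed from the question quoted below, and the pattern, the class SectionHeaderSketch and its split method are assumptions made for the example.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

final class SectionHeaderSketch {

    // Hypothetical header pattern: a known section name at line start, followed by a colon.
    static final Pattern HEADER = Pattern.compile(
            "^(Vital Signs|Physical Examination|Family Medical History|Lab Results)\\s*:",
            Pattern.CASE_INSENSITIVE | Pattern.MULTILINE);

    // Maps each header found to the text running up to the next header (or end of note).
    static Map<String, String> split(String note) {
        Map<String, String> sections = new LinkedHashMap<>();
        Matcher m = HEADER.matcher(note);
        String currentName = null;
        int bodyStart = 0;
        while (m.find()) {
            if (currentName != null) {
                sections.put(currentName, note.substring(bodyStart, m.start()).trim());
            }
            currentName = m.group(1);
            bodyStart = m.end();
        }
        if (currentName != null) {
            sections.put(currentName, note.substring(bodyStart).trim());
        }
        return sections;
    }
}
```

The hard-coded header list is exactly the part that has to be customized per institution, since header wording and punctuation vary a lot between note templates.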

Tim


On Wed, 2018-01-17 at 01:10 +0530, Ratan Sharma wrote:
> I am trying to find out something like If an entity falls in one of
> these
> category, and my understanding was subject can get me these
> information.
> 
> SUBJECT it belongs to like -
> *"Vital Signs", "BP", "Physical Examination", "Family Medical
> History",
> "Lab Results"*
> 
> Any idea how to achieve this.
> 
> 
> On Wed, Jan 17, 2018 at 1:05 AM, Miller, Timothy <
> timothy.mil...@childrens.harvard.edu> wrote:
> 
> > 
> > What output would you like? What are you expecting?
> > 
> > This field in theory could have a few different values: patient,
> > family_member, other, donor(iirc?)
> > 
> > But in reality our training data was very skewed towards the
> > patient
> > label, and the representation we used for training is not great at
> > picking up section-wide cues that would be helpful (like a family
> > history section header). So in practice it almost always will say
> > "patient." It may occasionally get something very obvious: "Mother
> > had
> > breast cancer"
> > I don't know if it will get this exact example, it probably needs
> > to
> > look exactly like a training instance because we had very few to
> > generalize from.
> > Thanks
> > Tim
> > 
> > 
> > On Wed, 2018-01-17 at 00:57 +0530, Ratan Sharma wrote:
> > > 
> > > I am able to pull entity information for different section
> > > correctly.
> > > But
> > > facing issues when it comes to pull subject information. The
> > > subject
> > > is
> > > always pulled as "PATIENT".
> > > 
> > > I do have this added in the AssertionPipeline
> > > builder.add(
> > > SubjectCleartkAnalysisEngine.createAnnotatorDescription() );
> > > 
> > > 
> > > Here are some sample output :
> > > 
> > > Entity: 3 === Text: Blood Transfusion === Polarity: 1 ===
> > > Subject:
> > > patient
> > > === EntityName:
> > > org.apache.ctakes.typesystem.type.textsem.SignSymptomMention
> > > Entity: 6 === Text: Blood === Polarity: 1 === Subject: patient
> > > ===
> > > EntityName:
> > > org.apache.ctakes.typesystem.type.textsem.AnatomicalSiteMention
> > > Entity: 3 === Text: Transfusion Reaction === Polarity: 1 ===
> > > Subject:
> > > patient === EntityName:
> > > org.apache.ctakes.typesystem.type.textsem.SignSymptomMention
> > > Entity: 5 === Text: Transfusion === Polarity: 1 === Subject:
> > > patient
> > > ===
> > > EntityName:
> > > org.apache.ctakes.typesystem.type.textsem.ProcedureMention
> > > Entity: 2 === Text: HIV === Polarity: 1 === Subject: patient ===
> > > EntityName:
> > > org.apache.ctakes.typesystem.type.textsem.DiseaseDisorderMention
> > > Entity: 6 === Text: Sickle Cell === Polarity: 1 === Subject:
> > > patient
> > > ===
> > > EntityName:
> > > org.apache.ctakes.typesystem.type.textsem.AnatomicalSiteMention
> > > Entity: 2 === Text: Neurologic Disorders === Polarity: 1 ===
> > > Subject:
> > > patient === EntityName:
> > > org.apache.ctakes.typesystem.type.textsem.DiseaseDisorderMention
> > > Entity: 2 === Text: Autoimmune Disorders === Polarity: 1 ===
> > > Subject:
> > > patient === EntityName:
> > > org.apache.ctakes.typesystem.type.textsem.DiseaseDisorderMention
> > > Entity: 3 === Text: Autoimmune === Polarity: 1 === Subject:
> > > patient
> > > ===
> > > EntityName:
> > > org.apache.ctakes.typesystem.type.textsem.SignSymptomMention
> > > Entity: 2 === Text: Autoimmune Disorders === Polarity: -1 ===
> > > Subject:
> > > patient === EntityName:
> > > org.apache.ctakes.typesystem.type.textsem.DiseaseDisorderMention
> > > Entity: 3 === Text: Autoimmune === Polarity: 1 === Subject:
> > > patient
> > > ===
> > > EntityName:
> > > org.apache.ctakes.typesystem.type.textsem.SignSymptomMention

Re: SubjectClearTkAnalysisEngine not working [EXTERNAL]

2018-01-16 Thread Miller, Timothy
What output would you like? What are you expecting?

This field in theory could have a few different values: patient,
family_member, other, donor(iirc?)

But in reality our training data was very skewed towards the patient
label, and the representation we used for training is not great at
picking up section-wide cues that would be helpful (like a family
history section header). So in practice it almost always will say
"patient." It may occasionally get something very obvious: "Mother had
breast cancer"
I don't know if it will get this exact example, it probably needs to
look exactly like a training instance because we had very few to
generalize from.
Thanks
Tim


On Wed, 2018-01-17 at 00:57 +0530, Ratan Sharma wrote:
> I am able to pull entity information for different section correctly.
> But
> facing issues when it comes to pull subject information. The subject
> is
> always pulled as "PATIENT".
> 
> I do have this added in the AssertionPipeline
> builder.add(
> SubjectCleartkAnalysisEngine.createAnnotatorDescription() );
> 
> 
> Here are some sample output :
> 
> Entity: 3 === Text: Blood Transfusion === Polarity: 1 === Subject:
> patient
> === EntityName:
> org.apache.ctakes.typesystem.type.textsem.SignSymptomMention
> Entity: 6 === Text: Blood === Polarity: 1 === Subject: patient ===
> EntityName:
> org.apache.ctakes.typesystem.type.textsem.AnatomicalSiteMention
> Entity: 3 === Text: Transfusion Reaction === Polarity: 1 === Subject:
> patient === EntityName:
> org.apache.ctakes.typesystem.type.textsem.SignSymptomMention
> Entity: 5 === Text: Transfusion === Polarity: 1 === Subject: patient
> ===
> EntityName:
> org.apache.ctakes.typesystem.type.textsem.ProcedureMention
> Entity: 2 === Text: HIV === Polarity: 1 === Subject: patient ===
> EntityName:
> org.apache.ctakes.typesystem.type.textsem.DiseaseDisorderMention
> Entity: 6 === Text: Sickle Cell === Polarity: 1 === Subject: patient
> ===
> EntityName:
> org.apache.ctakes.typesystem.type.textsem.AnatomicalSiteMention
> Entity: 2 === Text: Neurologic Disorders === Polarity: 1 === Subject:
> patient === EntityName:
> org.apache.ctakes.typesystem.type.textsem.DiseaseDisorderMention
> Entity: 2 === Text: Autoimmune Disorders === Polarity: 1 === Subject:
> patient === EntityName:
> org.apache.ctakes.typesystem.type.textsem.DiseaseDisorderMention
> Entity: 3 === Text: Autoimmune === Polarity: 1 === Subject: patient
> ===
> EntityName:
> org.apache.ctakes.typesystem.type.textsem.SignSymptomMention
> Entity: 2 === Text: Autoimmune Disorders === Polarity: -1 ===
> Subject:
> patient === EntityName:
> org.apache.ctakes.typesystem.type.textsem.DiseaseDisorderMention
> Entity: 3 === Text: Autoimmune === Polarity: 1 === Subject: patient
> ===
> EntityName:
> org.apache.ctakes.typesystem.type.textsem.SignSymptomMention

Re: Can we build CollectionReader from database [EXTERNAL]

2018-01-12 Thread Miller, Timothy
Hi Kishore,
Take a look in this directory for many different collection reader options:
http://svn.apache.org/viewvc/ctakes/trunk/ctakes-core/src/main/java/org/apache/ctakes/core/cr/

JdbcCollectionReader may work for you.

Here are the parameters with comments:

/**
 * SQL statement to retrieve the document.
 */
public static final String PARAM_SQL = "SqlStatement";

/**
 * Name of column from resultset that contains the document text. Supported
 * column types are CHAR, VARCHAR, and CLOB.
 */
public static final String PARAM_DOCTEXT_COL = "DocTextColName";

/**
 * Name of external resource for database connection.
 */
public static final String PARAM_DB_CONN_RESRC = "DbConnResrcName";

/**
 * Optional parameter. Specifies column names that will be used to form a
 * document ID.
 */
public static final String PARAM_DOCID_COLS = "DocIdColNames";

/**
 * Optional parameter. Specifies delimiter used when document ID is built.
 */
public static final String PARAM_DOCID_DELIMITER = "DocIdDelimiter";
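
The two optional parameters suggest that the document ID is assembled from several column values joined by a delimiter. The sketch below illustrates that idea only. It is an assumption about the behaviour based on the comments above, not the reader's actual code, and DocIdSketch, buildDocId and the column names are made up for the example.

```java
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.stream.Collectors;

final class DocIdSketch {

    // Joins the values of the chosen columns, in the given order, with the delimiter.
    static String buildDocId(Map<String, String> row, List<String> idColumns, String delimiter) {
        return idColumns.stream()
                .map(column -> {
                    String value = row.get(column);
                    if (value == null) {
                        throw new IllegalArgumentException("No value for column " + column);
                    }
                    return value;
                })
                .collect(Collectors.joining(delimiter));
    }

    public static void main(String[] args) {
        // One result-set row, represented as column name -> value (hypothetical columns).
        Map<String, String> row = new LinkedHashMap<>();
        row.put("PATIENT_ID", "1234");
        row.put("NOTE_ID", "77");
        row.put("NOTE_TEXT", "Pt has colon cancer.");
        System.out.println(buildDocId(row, Arrays.asList("PATIENT_ID", "NOTE_ID"), "_"));
    }
}
```

Picking ID columns that are unique together (for example a patient ID plus a note ID) keeps output files from overwriting each other when many rows are processed.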


Tim


From: kishore 
Sent: Friday, January 12, 2018 6:26 AM
To: dev@ctakes.apache.org
Subject: Can we build CollectionReader from database [EXTERNAL]

Hi,
I got to know we can build CollectionReader using FileCollectionReader.
Do we have option to build CollectionReader from database? Can you suggest
me how to do that?

Thanks,
Kishore.


Re: Unable to understand the importance of attributes in IdentifiedAnnotations [EXTERNAL]

2018-01-06 Thread Miller, Timothy
PROCEDURES==
>   [,,,SNOMEDCT_US:32413006/C0018823] Heart Transplantation Cardiac
> transplant 15 2018-01-01 15:00:00 +0100~
> --
>
>
> Again, it picked up the History Of in the first clause where "history of"
> preceded its predicate, but not subsequent ones, or after a time
> expression indicating the past.
>
> I have a mind to work on this one day, but I think I'll be doing it in my
> CAS post processor rather than the annotator itself as the problem really
> involves a whole new solution that looks at the semantics of the whole
> sentence and not just "history of (x)"  For that we'd start looking at the
> conldep nodes, time annotations, and more.
>
> Peter
>
>
>
>
>
> On 1/5/18, 12:58 PM, "Miller, Timothy"
>  wrote:
>
> >Uncertainty is when the text indicates some hedging about the concept:
> >"possible asthma" should have asthma as an IdentifiedAnnotation with the
> >uncertainty flag set to 1.
> >This is done by machine learning and it is not easy so it is not perfect.
> >
> >HistoryOf is for concepts that are explicitly in patient history, often
> >in a history section.
> >"history of lymphoma as a child"
> >lymphoma should have its history flag set to 1.
> >This is done by machine learning and it is not easy so it is not perfect.
> >
> >Confidence is a field that I don't believe gets set by any current
> >annotators, but in theory it is for methods that might use statistical
> >methods that output a score to set the score there.
> >The cTAKES dictionary lookup either hits or doesn't, so it doesn't set
> >that score.
> >
> >DiscoveryTechnique is a way to flag which entities were annotated by
> >which annotator, since it's possible to have, e.g., multiple clinical
> >concept taggers. We use it occasionally internally
> >to separate gold standard entities from system-discovered entities (in a
> >machine learning evaluation) but I don't know if any standard pipeline
> >components set it.
> >
> >Tim
> >
> >
> >From: Kumari,Puja 
> >Sent: Friday, January 5, 2018 12:03 AM
> >To: dev@ctakes.apache.org
> >Subject: Re: Unable to understand the importance of attributes in
> >IdentifiedAnnotations [EXTERNAL]
> >
> >Hi,
> >
> >
> >
> >Thanks for the replies but I am still not able to understand the
> >significance of the attributes such as Uncertainty, HistoryOf,
> >Confidence, DiscoveryTechniques.
> >
> >Can anyone give some examples or any information which will help me to
> >understand these concepts in more depth?
> >
> >
> >
> >Thanks.
> >
> >Puja Kumari
> >
> >
> >
> >On 1/4/18, 5:30 PM, "Gandhi Rajan Natarajan"
> > wrote:
> >
> >
> >
> >Try out this link -
> >"https://urldefense.proofpoint.com/v2/url?u=https-
> 3A__na01.safelinks.prote
> >ction.outlook.com_-3Furl-3Dhttps-253A-252F-252Fcwiki.
> apache.org-252Fconflu
> >ence-252Fdisplay-252FCTAKES-252FcTAKES-252B4.0-252B-2D-
> 252BAssertion-26dat
> >a-3D02-257C01-257CPuja.Kumari3-2540cerner.com-
> 257C989437995db145fcbaa808d5
> >536ac609-257Cfbc493a80d244454a815f4ca58e8c09d-257C0-257C0-
> 257C636506640417
> >310103-26sdata-3D8WN2HIq9RiCiZJiTtp0i6Sk7ZVDM
> gNGoUbJRW1Hevp4-253D-26reserv
> >ed-3D0&d=DwIGaQ&c=qS4goWBT7poplM69zy_3xhKwEW14JZMSdioCoppxeFU&r=
> Heup-IbsIg
> >9Q1TPOylpP9FE4GTK-OqdTDRRNQXipowRLRjx0ibQrHEo8uY
> x6674h&m=muQ5_Uh4Q-5Uui87e
> >9eAWy2afrJRgcg4FrOmy2VyFP8&s=0NlpH8OCzjaVbZq3yTy4pQcWgTYtUK
> JOD5orbrpKGro&e
> >="
> >
> >
> >
> >Regards,
> >
> >Gandhi
> >
> >
> >
> >
> >
> >-Original Message-
> >
> >From: Kumari,Puja [mailto:puja.kuma...@cerner.com]
> >
> >Sent: Thursday, January 04, 2018 3:11 PM
> >
> >To: dev@ctakes.apache.org
> >
> >Subject: Re: Unable to understand the importance of attributes in
> >IdentifiedAnnotations
> >
> >
> >
> >Hi,
> >
> >
> >
> >Thanks for your reply Krishnareddy but the link given says “page not
> >found”. Any other suggestions/links that you can share would be
> >appreciable.
> >
> >
> >
> >Thanks
> >
> >Puja Kumari
> >
> >
> >
> >On 1/4/18, 2:51 PM, "Krishnareddy"  wrote:
> >
> >
> >

Re: Unable to understand the importance of attributes in IdentifiedAnnotations [EXTERNAL]

2018-01-05 Thread Miller, Timothy
Uncertainty is when the text indicates some hedging about the concept:
"possible asthma" should have asthma as an IdentifiedAnnotation with the 
uncertainty flag set to 1.
This is done by machine learning and it is not easy so it is not perfect.

HistoryOf is for concepts that are explicitly in patient history, often in a 
history section.
"history of lymphoma as a child"
lymphoma should have its history flag set to 1.
This is done by machine learning and it is not easy so it is not perfect.

Confidence is a field that I don't believe gets set by any current annotators, 
but in theory it is for methods that might use statistical methods that output 
a score to set the score there.
The cTAKES dictionary lookup either hits or doesn't, so it doesn't set that 
score.

DiscoveryTechnique is a way to flag which entities were annotated by which 
annotator, since it's possible to have, e.g., multiple clinical concept 
taggers. We use it occasionally internally
to separate gold standard entities from system-discovered entities (in a 
machine learning evaluation) but I don't know if any standard pipeline 
components set it.
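
As a rough illustration of how these flags tend to get used downstream, here is a plain-Java sketch that keeps only findings that are neither hedged nor historical. Mention is a hypothetical value class standing in for an annotation, not the cTAKES IdentifiedAnnotation type, and the field names simply mirror the attributes discussed above.

```java
import java.util.List;
import java.util.stream.Collectors;

// Hypothetical value class; NOT the cTAKES IdentifiedAnnotation type.
final class Mention {
    final String text;
    final int uncertainty; // 1 when the text hedges, e.g. "possible asthma"
    final int historyOf;   // 1 when the concept is in patient history

    Mention(String text, int uncertainty, int historyOf) {
        this.text = text;
        this.uncertainty = uncertainty;
        this.historyOf = historyOf;
    }
}

final class AttributeFilterSketch {

    // Keeps the text of mentions that are neither hedged nor historical.
    static List<String> currentAndCertain(List<Mention> mentions) {
        return mentions.stream()
                .filter(m -> m.uncertainty != 1 && m.historyOf != 1)
                .map(m -> m.text)
                .collect(Collectors.toList());
    }
}
```

Since both flags come from machine-learned classifiers that are not perfect, a filter like this will occasionally drop or keep the wrong mention, so it is worth spot-checking on local notes.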

Tim


From: Kumari,Puja 
Sent: Friday, January 5, 2018 12:03 AM
To: dev@ctakes.apache.org
Subject: Re: Unable to understand the importance of attributes in 
IdentifiedAnnotations [EXTERNAL]

Hi,



Thanks for the replies but I am still not able to understand the significance 
of the attributes such as Uncertainty, HistoryOf, Confidence, 
DiscoveryTechniques.

Can anyone give some examples or any information which will help me to 
understand these concepts in more depth?



Thanks.

Puja Kumari



On 1/4/18, 5:30 PM, "Gandhi Rajan Natarajan"  
wrote:



Try out this link - 
"https://urldefense.proofpoint.com/v2/url?u=https-3A__na01.safelinks.protection.outlook.com_-3Furl-3Dhttps-253A-252F-252Fcwiki.apache.org-252Fconfluence-252Fdisplay-252FCTAKES-252FcTAKES-252B4.0-252B-2D-252BAssertion-26data-3D02-257C01-257CPuja.Kumari3-2540cerner.com-257C989437995db145fcbaa808d5536ac609-257Cfbc493a80d244454a815f4ca58e8c09d-257C0-257C0-257C636506640417310103-26sdata-3D8WN2HIq9RiCiZJiTtp0i6Sk7ZVDMgNGoUbJRW1Hevp4-253D-26reserved-3D0&d=DwIGaQ&c=qS4goWBT7poplM69zy_3xhKwEW14JZMSdioCoppxeFU&r=Heup-IbsIg9Q1TPOylpP9FE4GTK-OqdTDRRNQXipowRLRjx0ibQrHEo8uYx6674h&m=muQ5_Uh4Q-5Uui87e9eAWy2afrJRgcg4FrOmy2VyFP8&s=0NlpH8OCzjaVbZq3yTy4pQcWgTYtUKJOD5orbrpKGro&e=";



Regards,

Gandhi





-Original Message-

From: Kumari,Puja [mailto:puja.kuma...@cerner.com]

Sent: Thursday, January 04, 2018 3:11 PM

To: dev@ctakes.apache.org

Subject: Re: Unable to understand the importance of attributes in 
IdentifiedAnnotations



Hi,



Thanks for your reply Krishnareddy but the link given says “page not 
found”. Any other suggestions/links that you can share would be appreciable.



Thanks

Puja Kumari



On 1/4/18, 2:51 PM, "Krishnareddy"  wrote:



Hi,



  You can find related information about these attributes in following 
link




_*https://urldefense.proofpoint.com/v2/url?u=https-3A__na01.safelinks.protection.outlook.com_-3Furl-3Dhttps-253A-252F-252Fcwiki.apache.org-252Fconfluence-252Fdisplay-252FCTAKES-252FcTAKES-252B4.0-252B-2D-252BAssertion-2A-5F-26data-3D02-257C01-257CPuja.Kumari3-2540cerner.com-257C738752ad0ee24b8bae6208d553547f25-257Cfbc493a80d244454a815f4ca58e8c09d-257C0-257C0-257C636506544740520408-26sdata-3DTjBeskHtrWn8ycT16NaoDopB8bTX0SJTNfWMOG8-252B5fo-253D-26reserved-3D0&d=DwIGaQ&c=qS4goWBT7poplM69zy_3xhKwEW14JZMSdioCoppxeFU&r=Heup-IbsIg9Q1TPOylpP9FE4GTK-OqdTDRRNQXipowRLRjx0ibQrHEo8uYx6674h&m=muQ5_Uh4Q-5Uui87e9eAWy2afrJRgcg4FrOmy2VyFP8&s=NmbWV7FVYRENHOhWtMCOu2UoOaw-esE6uyr0W8KKtpA&e=





Thank You



Krishna Reddy





On Thursday 04 January 2018 12:31 PM, Kumari,Puja wrote:

> Hi,
> I am working on IdentifiedAnnotations in Apache cTAKES and I am not able
> to interpret the meaning of the following attributes under
> IdentifiedAnnotations:
> 1. Uncertainty
> 2. History
> 3. Confidence
> 4. Discovery Techniques
>
> What is the importance of these attributes?
> How can we make use of these to make our work efficient?
> Any suggestion / link to understand more would be helpful.
>
> Thanks.
> Puja Kumari
> puja.kuma...@cerner.com
>

> CONFIDENTIALITY NOTICE This message and any included attachments are 
from Cerner Corporation and are intended only for the addressee. The 
information contained in this message is confidential and may constitute inside 
or non-public information under international, federal, or state securities 
laws. Unauthorized forwarding, printing, copying, distribution, or use of such 
information is strictly prohibited and may be unlawful. 

Re: How to use external CSV or BSV in addition to FastUMLS attention Sean [EXTERNAL]

2018-01-04 Thread Miller, Timothy
The UIMA Analysis Engine descriptor for the dictionary component has a 
parameter for what ctakes calls a "lookup descriptor". By default the lookup 
descriptor describes a lookup in a hsql engine. The xml files in that sample 
directory are lookup descriptors for a lookup using the bsv files they point 
to. If you want your bsv lookup to complement the default lookup it's possible 
to just have two dictionaries running with different lookup descriptors. I 
think it's also possible to have a lookup descriptor have multiple lookup types 
(i.e. multiple  sections inside ) but I can't 
guarantee that works!
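
For readers who have not opened the example directory, here is a minimal Java sketch of reading a bar-separated (BSV) term file. It assumes each row has the shape CUI|TUI|Text, which is an assumption about the file layout and not something this thread confirms. The class name BsvRowSketch and the sample rows are hypothetical, and the sketch uses no cTAKES API at all.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical helper: reads bar-separated rows of the ASSUMED shape CUI|TUI|Text. */
class BsvRowSketch {

    /** One dictionary row; the three-column layout is an assumption. */
    static final class Row {
        final String cui;
        final String tui;
        final String text;

        Row(String cui, String tui, String text) {
            this.cui = cui;
            this.tui = tui;
            this.text = text;
        }

        @Override
        public String toString() {
            return cui + " | " + tui + " | " + text;
        }
    }

    /** Parses rows, skipping blank lines and lines that start with "//". */
    static List<Row> parse(List<String> lines) {
        List<Row> rows = new ArrayList<>();
        for (String line : lines) {
            String trimmed = line.trim();
            if (trimmed.isEmpty() || trimmed.startsWith("//")) {
                continue; // nothing to load from this line
            }
            String[] cols = trimmed.split("\\|", -1);
            if (cols.length != 3) {
                throw new IllegalArgumentException("Expected CUI|TUI|Text but got: " + line);
            }
            rows.add(new Row(cols[0].trim(), cols[1].trim(), cols[2].trim()));
        }
        return rows;
    }

    public static void main(String[] args) {
        // Illustrative rows only: an acronym and a full term sharing one concept.
        List<Row> rows = parse(List.of(
                "// illustrative rows only",
                "C0011849|T047|DM",
                "C0011849|T047|diabetes mellitus"));
        rows.forEach(System.out::println);
    }
}
```

The point of the sketch is only that an acronym can be listed as an extra row under the same concept as the full term; how cTAKES itself loads such a file is determined by the lookup descriptor, not by this code.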
Tim


From: Abramowitsch, Peter 
Sent: Thursday, January 4, 2018 7:51 AM
To: dev@ctakes.apache.org
Subject: Re: How to use external CSV or BSV in addition to FastUMLS  attention 
Sean [EXTERNAL]

Thanks Tim,

I did see that folder and its contents and it seemed the right place to
begin.  What I couldn't find was how/where to refer to one of those
CustomCuiTui.Xml files in an engine description.

Peter

On 1/4/18, 1:41 PM, "Miller, Timothy"
 wrote:

>Peter, I know Sean is busy this week and he may not see this for a while.
>But I tried this method over the summer and got it to work so I'm fairly
>confident that's the right approach still. Some of the details may have
>changed from two years ago, so I would also check out this directory as a
>starting point:
>http://svn.apache.org/viewvc/ctakes/trunk/ctakes-dictionary-lookup-fast-res/src/main/resources/org/apache/ctakes/dictionary/lookup/fast/example/bsv/
>
>Tim
>
>
>From: Abramowitsch, Peter 
>Sent: Thursday, January 4, 2018 7:28 AM
>To: dev@ctakes.apache.org
>Subject: Re: How to use external CSV or BSV in addition to FastUMLS
>attention Sean [EXTERNAL]
>
>Further to my previous message, Sean, I was wondering if you could tell
>me whether this answer you gave in 2015, is still the right way to do
>things in ctakes4.x
>
>permalink: http://markmail.org/message/s3ztinppusvsciss
>
>Subject: RE: How to update cTAKES so that new top level categories
>come out based on local dictionary? [permalink]
>From:   Finan, Sean (sean...@childrens.harvard.edu)
>Date:   Oct 6, 2015 2:04:56 pm
>List:   org.apache.incubator.ctakes-dev
>
>
>Regards
>Peter
>
>From: , Peter Abramowitsch
>mailto:pabramowit...@hearst.com>>
>Date: Thursday, January 4, 2018 at 12:50 PM
>To: "dev@ctakes.apache.org<mailto:dev@ctakes.apache.org>"
>mailto:dev@ctakes.apache.org>>
>Subject: How to use external CSV or BSV in addition to FastUMLS
>
>Can someone point me to any up-to-date how-tos on how to include external
>CSV/BSV type resources to add synonyms, and other terms for dictionary
>lookup to augment the FAST UMLS resources that come out of the box.
>Perhaps I have missed something, but looking at the
>CTakesDictionaryCreator UI, it looks like it is designed only to choose
>subsets of the UMLS data set rather than allowing one to bring in
>completely new information sources.  I scoured the Marklogic ctakes user
>archive, but so many of the entries are old and I'm not sure they
>describe the current way of doing things.
>
>The only approach I could see would be to use the AggregateEngine
>description and have it point to the CSV annotator, creating a completely
>new AE but this would build other types of annotation, whereas what I'm
>thinking about is a case for creating identified mentions such as a
>DiseaseDisorderMention based on finding an acronym that the UMLS resource
>doesn't know about, even though the concept in its full textual form is
>there.
>
>I'm sure this is not a unique request and apologize in advance if it has
>already been answered somewhere
>
>- Peter



Re: How to use external CSV or BSV in addition to FastUMLS attention Sean [EXTERNAL]

2018-01-04 Thread Miller, Timothy
Peter, I know Sean is busy this week and he may not see this for a while. But I 
tried this method over the summer and got it to work so I'm fairly confident 
that's the right approach still. Some of the details may have changed from two 
years ago, so I would also check out this directory as a starting point:
http://svn.apache.org/viewvc/ctakes/trunk/ctakes-dictionary-lookup-fast-res/src/main/resources/org/apache/ctakes/dictionary/lookup/fast/example/bsv/

Tim


From: Abramowitsch, Peter 
Sent: Thursday, January 4, 2018 7:28 AM
To: dev@ctakes.apache.org
Subject: Re: How to use external CSV or BSV in addition to FastUMLS  attention 
Sean [EXTERNAL]

Further to my previous message, Sean, I was wondering if you could tell me 
whether the answer you gave in 2015 is still the right way to do things in 
cTAKES 4.x.

permalink: http://markmail.org/message/s3ztinppusvsciss

Subject:RE: How to update cTAKES so that new top level categories come 
out based on local 
dictionary?
 [permalink] 

From:   Finan, Sean (sean...@childrens.harvard.edu)
Date:   Oct 6, 2015 2:04:56 pm
List:   org.apache.incubator.ctakes-dev


Regards
Peter

From: , Peter Abramowitsch 
mailto:pabramowit...@hearst.com>>
Date: Thursday, January 4, 2018 at 12:50 PM
To: "dev@ctakes.apache.org" 
mailto:dev@ctakes.apache.org>>
Subject: How to use external CSV or BSV in addition to FastUMLS

Can someone point me to any up-to-date how-tos on how to include external 
CSV/BSV type resources to add synonyms, and other terms for dictionary lookup 
to augment the FAST UMLS resources that come out of the box.   Perhaps I have 
missed something, but looking at the CTakesDictionaryCreator UI, it looks like 
it is designed only to choose subsets of the UMLS data set rather than allowing 
one to bring in completely new information sources.  I scoured the Marklogic 
ctakes user archive, but so many of the entries are old and I'm not sure they 
describe the current way of doing things.

The only approach I could see would be to use the AggregateEngine 
description and have it point to the CSV annotator, creating a completely new 
AE but this would build other types of annotation, whereas what I'm thinking 
about is a case for creating identified mentions such as a 
DiseaseDisorderMention based on finding an acronym that the UMLS resource 
doesn't know about, even though the concept in its full textual form is there.

I'm sure this is not a unique request and apologize in advance if it has 
already been answered somewhere

- Peter


Re: Unable to get Confidence score for any entity [EXTERNAL]

2017-12-28 Thread Miller, Timothy
These items are created by a dictionary lookup -- not any kind of probabilistic 
algorithm -- which doesn't set the confidence score. There is nothing really 
like confidence distinguishing different kinds of found dictionary concepts.
Tim



From: Ratan Sharma 
Sent: Thursday, December 28, 2017 2:09 PM
To: dev@ctakes.apache.org
Subject: Unable to get Confidence score for any entity [EXTERNAL]

I am trying to find the confidence score for entities in different sections, such as:

ProcedureMention
SignSymptomMention
MedicationMention
DiseaseDisorderMention
AnatomicalSiteMention

But for all entities under any category, the Confidence score is always 0.0

Is there a specific setting I need to turn on to get these results?

Any suggestion / link to understand more would be helpful.


Re: non Medical entity extraction [EXTERNAL]

2017-12-21 Thread Miller, Timothy
By structured fields I mean non-note sources. Notes might be stored in
a database and other columns/tables in that database will contain
patient metadata, such as sex, birthdate, insurance status, etc.
Extracting this information is probably institution-specific. If you
don't have access to this kind of database and want to get it from
notes you will need to write your own uima annotator. There are
examples of how to do this in the ctakes-examples module.
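
As a rough illustration of the kind of logic such a custom annotator might contain, here is a minimal Java sketch that finds age expressions with a regular expression. Everything in it is an assumption made for illustration: the class name AgeSpanSketch, the pattern, and the idea that ages appear as "45-year-old". It deliberately uses only the Java standard library; in a real UIMA annotator the same matching would run over the document text inside the annotator's process method and create annotations from the offsets.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

/** Hypothetical sketch: locate "NN-year-old" style age mentions in note text. */
class AgeSpanSketch {

    /** Character offsets plus the parsed age, mirroring what an annotation would carry. */
    static final class AgeSpan {
        final int begin;
        final int end;
        final int years;

        AgeSpan(int begin, int end, int years) {
            this.begin = begin;
            this.end = end;
            this.years = years;
        }
    }

    // Assumed pattern: 1-3 digits, then "year old" joined by hyphens or spaces.
    private static final Pattern AGE =
            Pattern.compile("\\b(\\d{1,3})[- ]year[- ]old\\b", Pattern.CASE_INSENSITIVE);

    /** Returns one span per match, in document order. */
    static List<AgeSpan> find(String noteText) {
        List<AgeSpan> spans = new ArrayList<>();
        Matcher m = AGE.matcher(noteText);
        while (m.find()) {
            spans.add(new AgeSpan(m.start(), m.end(), Integer.parseInt(m.group(1))));
        }
        return spans;
    }

    public static void main(String[] args) {
        String note = "The patient is a 45-year-old male.";
        for (AgeSpan span : find(note)) {
            System.out.println(span.years + " at [" + span.begin + ", " + span.end + ")");
        }
    }
}
```

A pattern like this will miss many phrasings, which is part of why pulling such fields from a structured source is usually preferable when one is available.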
Tim


On Thu, 2017-12-21 at 07:14 -0800, Vedic Baatein wrote:
> That makes sense. 
> 
> What would be a good way to extract information about the “structured
> fields” from the notes. Is there a specific module for it. 
> 
> Thanks,
> Nitesh
> 
> > 
> > On Dec 21, 2017, at 4:24 AM, Miller, Timothy wrote:
> > 
> > No, there is not that I'm aware of. While that information is often
> > in
> > the note, it is also usually in structured fields where it can be
> > extracted with ~100% accuracy so it's not a high priority for NLP.
> > Thanks
> > Tim
> > 
> > 
> > On Thu, 2017-12-21 at 09:26 +, abilash.mat...@cognizant.com
> > wrote:
> > > 
> > > Hi All,
> > > 
> > > Is there an option currently available with CTAKES for extracting
> > > patient name, age etc. from Medical records and lab reports?
> > > 
> > > Thanks,
> > > Abilash Mathew
> > > This e-mail and any files transmitted with it are for the sole
> > > use of
> > > the intended recipient(s) and may contain confidential and
> > > privileged
> > > information. If you are not the intended recipient(s), please
> > > reply
> > > to the sender and destroy all copies of the original message. Any
> > > unauthorized review, use, disclosure, dissemination, forwarding,
> > > printing or copying of this email, and/or any action taken in
> > > reliance on the contents of this e-mail is strictly prohibited
> > > and
> > > may be unlawful. Where permitted by applicable law, this e-mail
> > > and
> > > other e-mail communications sent to and from Cognizant e-mail
> > > addresses may be monitored.

Re: non Medical entity extraction [EXTERNAL]

2017-12-21 Thread Miller, Timothy
No, there is not that I'm aware of. While that information is often in
the note, it is also usually in structured fields where it can be
extracted with ~100% accuracy so it's not a high priority for NLP.
Thanks
Tim


On Thu, 2017-12-21 at 09:26 +, abilash.mat...@cognizant.com wrote:
> Hi All,
> 
> Is there an option currently available with CTAKES for extracting
> patient name, age etc. from Medical records and lab reports?
> 
> Thanks,
> Abilash Mathew

Re: cTAKES as REST service [EXTERNAL] [SUSPICIOUS] [SUSPICIOUS]

2017-12-15 Thread Miller, Timothy
Great, that's very helpful.

I'll be happy to help with extracting the information needed from the
CAS the easy way. Sean, am I remembering right that there was an API
started for that somewhere? Or maybe that was part of DeepPhe?

Tim


On Fri, 2017-12-15 at 03:52 +, Gandhi Rajan Natarajan wrote:
> Hi Tim,
> 
> Thanks for taking time out and having a look at this. As you
> mentioned, the dictionary descriptor file contains details specific
> to my setup which needs to be changed to 127.0.0.1 by default. Will
> make the change accordingly.
> 
> The only reason we went ahead with the approach of parsing XML to
> JSON is due to our lack of in-depth knowledge in cTAKES
> implementations. If I could get some guidance on how to get the
> required JSON details directly from type systems, will be happy to
> implement the same as it will be a huge performance gain.
> 
> Also, as you said, we have two directories named ctakes-web-rest and
> ctakes-rest-service. The ctakes-rest-service directory is no longer
> active and is obsolete. We are just maintaining it for
> reference for the time being. We will remove it soon.
> 
> Thanks again for the detailed feedback.
> 
> Regards,
> Gandhi
> 
> 
> -Original Message-
> From: Miller, Timothy [mailto:timothy.mil...@childrens.harvard.edu]
> Sent: Friday, December 15, 2017 1:25 AM
> To: dev@ctakes.apache.org
> Subject: Re: cTAKES as REST service [EXTERNAL] [SUSPICIOUS]
> [SUSPICIOUS]
> 
> I looked at this today. Looks like a great start!
> 
> I was able to get as far as deploying to tomcat, seeing the web form,
> and submitting, but didn't get correct feedback because I don't have
> a mysql dictionary set up, which the default descriptor points at. I
> didn't see any instructions for building that and didn't have time to
> figure that out.
> 
> I think I mentioned in a different thread that if this whole thing
> could be wrapped in a docker container that would be really powerful,
> but if not, there are a few things that are obvious to you as
> developers but would make it easier for novices (like me) to deploy.
> 
> * download tomcat bin and start with bin/startup.sh (check at
> localhost:8080)
> * run mvn install on my ctakes installation to populate jar files in
> the .m2 directory that were missing
> * run mvn package inside the ctakes-web-rest subdirectory
> * copy the .war file into the webapps directory in my tomcat
> installation.
> * While I couldn't get the dictionary to work pointing to mysql, I
> noticed that the dictionary descriptor file has a hardcoded IP
> address when maybe it should be 127.0.0.1?
> 
> One other thing I noticed in the code is that in sending back JSON it
> looks like you're turning the JCas into xml and then parsing it
> yourself. It should be easier just to access typesystem objects
> directly. Sean may have some API code laying around to simplify that
> as well.
> 
> To iterate over signs/symptoms, for example, you would do:
> 
> for(SignSymptomMention ss : JCasUtil.select(jcas,
> SignSymptomMention.class)){
>   int begin = ss.getBegin(); // begin offset
>   int end = ss.getEnd(); // end offset ...
> }
> 
> Using the typesystem directly may help you to speed up that code or
> make it easier to read. But maybe there is a reason to write it to
> xml that I'm not aware of.
> 
> Finally, I see there are two sub-directories with similar names,
> ctakes-rest-service and ctakes-web-rest. If they are duplicates can
> you delete the old one?
> 
> I'll keep poking around, but hopefully this is helpful feedback for
> you guys. Thanks again for getting this off the ground!
> 
> Tim
> 
> 
> 
> 
> On Thu, 2017-12-07 at 14:16 +, Miller, Timothy wrote:
> > 
> > I am really interested in this too, just waiting until I have a few
> > free hours to look around. Don't want you to think it's not of
> > interest.
> > Tim
> > 
> > 
> > On Tue, 2017-12-05 at 19:18 +, Finan, Sean wrote:
> > > 
> > > 
> > > Hi all,
> > > 
> > > I am trying to clear a backlog at work.  I will most likely not
> > > be
> > > able to do anything with ctakes for another week.  Hopefully some
> > > rest expert out there can prove their worth by testing ...
> > > 
> > > Sean
> > > 
> > > -Original Message-
> > > From: Matthew Vita [mailto:matthewvit...@gmail.com]
> > > Sent: Tuesday, December 05, 2017 1:58 PM
> > > To: dev@ctakes.apache.org
> > > Subject: Re: cTAKES as REST service [EXTERNAL]
> > > 
> > > 
> > > 

Re: cTAKES as REST service [EXTERNAL] [SUSPICIOUS] [SUSPICIOUS] [SUSPICIOUS] [SUSPICIOUS]

2017-12-14 Thread Miller, Timothy
Another thought I just had is that it seems to load the pipeline when
the first call is made -- without knowing the REST APIs that well, is
it possible to load the pipelines when the war is deployed? With some
of our larger pipelines the first call may take quite a while. Would
every call re-load the pipeline?
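
One way to think about the question above is a holder that builds the pipeline once and hands the same instance to every later call. The following Java sketch is only that idea in isolation, not how the REST module is actually written: PipelineHolderSketch and the Supplier standing in for the pipeline loader are hypothetical names. In a servlet container, a plausible place to make the first call early would be a ServletContextListener's contextInitialized method, but whether the module can be wired that way is not established in this thread.

```java
import java.util.concurrent.atomic.AtomicInteger;
import java.util.function.Supplier;

/** Hypothetical holder: builds an expensive object once and reuses it afterwards. */
class PipelineHolderSketch<P> {
    private final Supplier<P> loader; // stands in for whatever builds the real pipeline
    private volatile P pipeline;      // null until the first successful load

    PipelineHolderSketch(Supplier<P> loader) {
        this.loader = loader;
    }

    /** Returns the shared pipeline, loading it on the first call only. */
    P get() {
        P local = pipeline;
        if (local == null) {
            synchronized (this) {
                local = pipeline;
                if (local == null) {
                    local = loader.get(); // the slow step happens here, once
                    pipeline = local;
                }
            }
        }
        return local;
    }

    public static void main(String[] args) {
        AtomicInteger loads = new AtomicInteger();
        PipelineHolderSketch<String> holder = new PipelineHolderSketch<>(() -> {
            loads.incrementAndGet(); // count how often the slow load runs
            return "pipeline";
        });
        holder.get(); // warm-up call, for example at deployment time
        holder.get(); // a later request reuses the same instance
        System.out.println("loads = " + loads.get());
    }
}
```

Calling get() once at startup moves the loading cost away from the first user request, and the double-checked locking keeps concurrent first requests from loading the pipeline twice.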
Tim

On Thu, 2017-12-14 at 20:16 +, Finan, Sean wrote:
> Hi Tim,
> 
> Many thanks for testing the new rest service!  And double that for
> the setup instructions!
> 
> > 
> > if this whole thing could be wrapped in a docker container that
> > would be really powerful
> - Matthew and I have had a short discussion or two on a docker that
> he is working on.  It was working, but performed a lot of the spring
> updates and some workarounds that should no longer be needed.  The
> next iteration should be cleaner and simpler.  We have also talked
> about making the container more compact.  He is busy with real work,
> but I think that this is definitely just over the horizon.
> 
> > 
> > One other thing I noticed in the code is that in sending back JSON
> > it looks like you're turning the JCas into xml and then parsing it
> > yourself. It should be easier just to access typesystem objects
> > directly. Sean may have some API code laying around to simplify
> > that as well.
> -  I am actually looking at the rest/util/XmlParser and had the very
> same thought.  It is a great start though, and as far as I know it is
> the first publicly available ctakes json writer.  If anybody else out
> there already has or knows of another, please share!
> 
> 
> Cheers all,
> Sean
> 
> 
> -Original Message-
> From: Miller, Timothy [mailto:timothy.mil...@childrens.harvard.edu] 
> Sent: Thursday, December 14, 2017 2:55 PM
> To: dev@ctakes.apache.org
> Subject: Re: cTAKES as REST service [EXTERNAL] [SUSPICIOUS]
> [SUSPICIOUS] [SUSPICIOUS]
> 
> I looked at this today. Looks like a great start!
> 
> I was able to get as far as deploying to tomcat, seeing the web form,
> and submitting, but didn't get correct feedback because I don't have
> a mysql dictionary set up, which the default descriptor points at. I
> didn't see any instructions for building that and didn't have time to
> figure that out.
> 
> I think I mentioned in a different thread that if this whole thing
> could be wrapped in a docker container that would be really powerful,
> but if not, there are a few things that are obvious to you as
> developers but would make it easier for novices (like me) to deploy.
> 
> * download tomcat bin and start with bin/startup.sh (check at
> localhost:8080)
> * run mvn install on my ctakes installation to populate jar files in
> the .m2 directory that were missing
> * run mvn package inside the ctakes-web-rest subdirectory
> * copy the .war file into the webapps directory in my tomcat
> installation.
> * While I couldn't get the dictionary to work pointing to mysql, I
> noticed that the dictionary descriptor file has a hardcoded IP
> address when maybe it should be 127.0.0.1?
> 
> One other thing I noticed in the code is that in sending back JSON it
> looks like you're turning the JCas into xml and then parsing it
> yourself. It should be easier just to access typesystem objects
> directly. Sean may have some API code laying around to simplify that
> as well.
> 
> To iterate over signs/symptoms, for example, you would do:
> 
> for(SignSymptomMention ss : JCasUtil.select(jcas,
> SignSymptomMention.class)){
>   int begin = ss.getBegin(); // begin offset
>   int end = ss.getEnd();     // end offset ...
> }
> 
> Using the typesystem directly may help you to speed up that code or
> make it easier to read. But maybe there is a reason to write it to
> xml that I'm not aware of.
> 
> Finally, I see there are two sub-directories with similar names,
> ctakes-rest-service and ctakes-web-rest. If they are duplicates can
> you delete the old one?
> 
> I'll keep poking around, but hopefully this is helpful feedback for
> you guys. Thanks again for getting this off the ground!
> 
> Tim
> 
> 
> 
> 
> On Thu, 2017-12-07 at 14:16 +, Miller, Timothy wrote:
> > 
> > I am really interested in this too, just waiting until I have a
> > few 
> > free hours to look around. Don't want you to think it's not of 
> > interest.
> > Tim
> > 
> > 
> > On Tue, 2017-12-05 at 19:18 +, Finan, Sean wrote:
> > > 
> > > 
> > > Hi all,
> > > 
> > > I am trying to clear a backlog at work.  I will most likely not
> > > be 
> > > able to do anything with ctakes for another week

Re: cTAKES as REST service [EXTERNAL] [SUSPICIOUS] [SUSPICIOUS]

2017-12-14 Thread Miller, Timothy
I looked at this today. Looks like a great start!

I was able to get as far as deploying to tomcat, seeing the web form,
and submitting, but didn't get correct feedback because I don't have a
mysql dictionary set up, which the default descriptor points at. I
didn't see any instructions for building that and didn't have time to
figure that out.

I think I mentioned in a different thread that if this whole thing
could be wrapped in a docker container that would be really powerful,
but if not, there are a few things that are obvious to you as
developers but would make it easier for novices (like me) to deploy.

* download tomcat bin and start with bin/startup.sh (check at
localhost:8080)
* run mvn install on my ctakes installation to populate jar files in
the .m2 directory that were missing
* run mvn package inside the ctakes-web-rest subdirectory
* copy the .war file into the webapps directory in my tomcat
installation.
* While I couldn't get the dictionary to work pointing to mysql, I
noticed that the dictionary descriptor file has a hardcoded IP address
when maybe it should be 127.0.0.1?

One other thing I noticed in the code is that in sending back JSON it
looks like you're turning the JCas into xml and then parsing it
yourself. It should be easier just to access typesystem objects
directly. Sean may have some API code laying around to simplify that as
well.

To iterate over signs/symptoms, for example, you would do:

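// imports needed by the loop below
import org.apache.uima.fit.util.JCasUtil;
import org.apache.ctakes.typesystem.type.textsem.SignSymptomMention;

// visit every sign/symptom annotation in the CAS
for (SignSymptomMention ss : JCasUtil.select(jcas, SignSymptomMention.class)) {
  int begin = ss.getBegin(); // begin offset
  int end = ss.getEnd();     // end offset
  ...
}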

Using the typesystem directly may help you to speed up that code or
make it easier to read. But maybe there is a reason to write it to xml
that I'm not aware of.
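
To make the suggestion concrete without depending on cTAKES classes, here is a minimal Java sketch that groups mention spans by type and prints them as JSON, roughly in the shape of the sample output posted earlier in this thread. The Mention class is a hypothetical stand-in for a typesystem annotation (type name plus begin/end offsets), and the hand-rolled JSON writer is only for illustration; a real service would more likely fill the list from JCasUtil.select and use a JSON library.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

/** Hypothetical sketch: turn typed spans into a {"Type": ["covered text", ...]} JSON object. */
class MentionJsonSketch {

    /** Stand-in for an annotation: a type name and character offsets. */
    static final class Mention {
        final String type;
        final int begin;
        final int end;

        Mention(String type, int begin, int end) {
            this.type = type;
            this.begin = begin;
            this.end = end;
        }
    }

    /** Groups the covered text of each mention under its type name, sorted by type. */
    static Map<String, List<String>> groupByType(String text, List<Mention> mentions) {
        Map<String, List<String>> grouped = new TreeMap<>();
        for (Mention m : mentions) {
            grouped.computeIfAbsent(m.type, k -> new ArrayList<>())
                   .add(text.substring(m.begin, m.end));
        }
        return grouped;
    }

    /** Writes the grouped map as a compact JSON object. */
    static String toJson(Map<String, List<String>> grouped) {
        StringBuilder sb = new StringBuilder("{");
        boolean firstKey = true;
        for (Map.Entry<String, List<String>> e : grouped.entrySet()) {
            if (!firstKey) {
                sb.append(",");
            }
            firstKey = false;
            sb.append(quote(e.getKey())).append(":[");
            for (int i = 0; i < e.getValue().size(); i++) {
                if (i > 0) {
                    sb.append(",");
                }
                sb.append(quote(e.getValue().get(i)));
            }
            sb.append("]");
        }
        return sb.append("}").toString();
    }

    /** Escapes backslashes and double quotes only; enough for this sketch. */
    private static String quote(String s) {
        return "\"" + s.replace("\\", "\\\\").replace("\"", "\\\"") + "\"";
    }

    public static void main(String[] args) {
        String text = "Patient has cancer and nausea.";
        List<Mention> mentions = List.of(
                new Mention("DiseaseDisorderMention", 12, 18),
                new Mention("SignSymptomMention", 23, 29));
        System.out.println(toJson(groupByType(text, mentions)));
    }
}
```

Working from offsets and types like this avoids serializing the CAS to XML and parsing it back, which is the round trip the message above is questioning.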

Finally, I see there are two sub-directories with similar names,
ctakes-rest-service and ctakes-web-rest. If they are duplicates can you
delete the old one?

I'll keep poking around, but hopefully this is helpful feedback for you
guys. Thanks again for getting this off the ground!

Tim




On Thu, 2017-12-07 at 14:16 +, Miller, Timothy wrote:
> I am really interested in this too, just waiting until I have a few
> free hours to look around. Don't want you to think it's not of
> interest.
> Tim
> 
> 
> On Tue, 2017-12-05 at 19:18 +, Finan, Sean wrote:
> > 
> > Hi all,
> > 
> > I am trying to clear a backlog at work.  I will most likely not be
> > able to do anything with ctakes for another week.  Hopefully some
> > rest expert out there can prove their worth by testing ...
> > 
> > Sean
> > 
> > -Original Message-
> > From: Matthew Vita [mailto:matthewvit...@gmail.com] 
> > Sent: Tuesday, December 05, 2017 1:58 PM
> > To: dev@ctakes.apache.org
> > Subject: Re: cTAKES as REST service [EXTERNAL]
> > 
> > 
> > Hi Gandhi, Sean, Tim, Alex, James,
> > 
> > I'm still getting back into the swing of things after my trip (I'm
> > on
> > business traveling at the moment, here in the states). I will be
> > jumping right back into cTAKES REST development next week
> > personally
> > and with a new team mate from the open source team.
> > 
> > I'm so sorry for my silence/lack of updates!!! Very excited to see
> > what Gandhi's updates are looking like and enriching the JSON
> > response payload.
> > 
> > Thanks,
> > 
> > Matthew Vita
> > www.matthewvita.com
> > 
> > On Tue, Dec 5, 2017 at 10:24 AM, Gandhi Rajan Natarajan <
> > Gandhi.Nata
> > ra...@arisglobal.com> wrote:
> > 
> > > 
> > > 
> > > Could someone help me out on the resources cleanup at least if not
> > > review?
> > > 
> > > Regards,
> > > Gandhi
> > > 
> > > 
> > > -Original Message-
> > > From: Gandhi Rajan Natarajan [mailto:Gandhi.Natarajan@arisglobal.
> > > co
> > > m]
> > > Sent: Monday, December 04, 2017 10:05 PM
> > > To: dev@ctakes.apache.org
> > > Subject: RE: cTAKES as REST service [EXTERNAL]
> > > 
> > > Hi Sean, Tim, Alex, Matthew, James and All,
> > > 
> > > I have placed the first cut version of cTAKES REST module in the 
> > > following path - 
> > > https://urldefense.proofpoint.com/v2/url?u=https-3A__github.com_G
> > > oT
> > > eam
> > > Epsilon_ctakes-2Drest-
> > > 2Dservice_tree_&d=DwIFaQ&c=qS4goWBT7poplM69zy_3x
> > > hKwEW14JZMSdioCoppxeFU&r=fs67GvlGZstTpyIisCYNYmQCP6r0bcpKGd4f7d4g
> > > Ta
> > > o&m
> > > =AaXwWeHrvVgjd3l30MX0K74_d9uL4nLj63jy45d5x_Y&s=KZ65xiQo

Re: cTAKES as REST service [EXTERNAL] [SUSPICIOUS]

2017-12-07 Thread Miller, Timothy
I am really interested in this too, just waiting until I have a few
free hours to look around. Don't want you to think it's not of
interest.
Tim


On Tue, 2017-12-05 at 19:18 +, Finan, Sean wrote:
> Hi all,
> 
> I am trying to clear a backlog at work.  I will most likely not be
> able to do anything with ctakes for another week.  Hopefully some
> rest expert out there can prove their worth by testing ...
> 
> Sean
> 
> -Original Message-
> From: Matthew Vita [mailto:matthewvit...@gmail.com] 
> Sent: Tuesday, December 05, 2017 1:58 PM
> To: dev@ctakes.apache.org
> Subject: Re: cTAKES as REST service [EXTERNAL]
> 
> 
> Hi Gandhi, Sean, Tim, Alex, James,
> 
> I'm still getting back into the swing of things after my trip (I'm on
> business traveling at the moment, here in the states). I will be
> jumping right back into cTAKES REST development next week personally
> and with a new team mate from the open source team.
> 
> I'm so sorry for my silence/lack of updates!!! Very excited to see
> what Gandhi's updates are looking like and enriching the JSON
> response payload.
> 
> Thanks,
> 
> Matthew Vita
> www.matthewvita.com
> 
> On Tue, Dec 5, 2017 at 10:24 AM, Gandhi Rajan Natarajan < Gandhi.Nata
> ra...@arisglobal.com> wrote:
> 
> > 
> > Could someone help me out on the resources cleanup at least if not
> > review?
> > 
> > Regards,
> > Gandhi
> > 
> > 
> > -Original Message-
> > From: Gandhi Rajan Natarajan [mailto:gandhi.natara...@arisglobal.co
> > m]
> > Sent: Monday, December 04, 2017 10:05 PM
> > To: dev@ctakes.apache.org
> > Subject: RE: cTAKES as REST service [EXTERNAL]
> > 
> > Hi Sean, Tim, Alex, Matthew, James and All,
> > 
> > I have placed the first cut version of cTAKES REST module in the 
> > following path - 
> > https://urldefense.proofpoint.com/v2/url?u=https-3A__github.com_GoT
> > eam
> > Epsilon_ctakes-2Drest-
> > 2Dservice_tree_&d=DwIFaQ&c=qS4goWBT7poplM69zy_3x
> > hKwEW14JZMSdioCoppxeFU&r=fs67GvlGZstTpyIisCYNYmQCP6r0bcpKGd4f7d4gTa
> > o&m
> > =AaXwWeHrvVgjd3l30MX0K74_d9uL4nLj63jy45d5x_Y&s=KZ65xiQopzQNQarVc3BP
> > MxK
> > izpqJwoUJtjIJZC8C6iA&e=
> > master/ctakes-web-rest/
> > 
> > Things pending in the module:
> > 1) Index Page to test the rest module using AJAX call
> > 2) Revamping the final output XML
> > 
> > Request you all to have a look at this module and provide your 
> > feedback. I would also require expert advice to clean up the
> > resources 
> > folder - 
> > https://urldefense.proofpoint.com/v2/url?u=https-3A__github.com_GoT
> > eam
> > Epsilon_ctakes-2Drest-
> > 2Dservice_tree_&d=DwIFaQ&c=qS4goWBT7poplM69zy_3x
> > hKwEW14JZMSdioCoppxeFU&r=fs67GvlGZstTpyIisCYNYmQCP6r0bcpKGd4f7d4gTa
> > o&m
> > =AaXwWeHrvVgjd3l30MX0K74_d9uL4nLj63jy45d5x_Y&s=KZ65xiQopzQNQarVc3BP
> > MxK
> > izpqJwoUJtjIJZC8C6iA&e= master/ctakes-web-
> > rest/src/main/resources/org
> > 
> > This module can be deployed as a web-app in Tomcat using the
> > generated 
> > WAR file . It can be tested using any REST client (like Chrome's 
> > Postman app) by accessing the following URL - 
> > http://:/ctakes-web-rest/service/analyze
> > and providing the analysis text as request body.
> > 
> > Sample input : "Patient has cancer and nausea. Earlier he has been 
> > deducted for red eye."
> > Sample output:
> >  {
> > "DrugChangeStatusAnnotation": [],
> > "StrengthAnnotation": [],
> > "FractionStrengthAnnotation": [],
> > "FrequencyUnitAnnotation": [],
> > "CompanyAnnotation": [],
> > "DiseaseDisorderMention": [
> > "CANCER"
> > ],
> > "SignSymptomMention": [
> > "RED EYE",
> > "NAUSEA"
> > ],
> > "RouteAnnotation": [],
> > "DateAnnotation": [],
> > "MeasurementAnnotation": [],
> > "ProcedureMention": [],
> > "TimeMention": [],
> > "StrengthUnitAnnotation": []
> > }
> > 
> > Regards,
> > Gandhi
> > 
> > -Original Message-
> > From: Gandhi Rajan Natarajan [mailto:gandhi.natara...@arisglobal.co
> > m]
> > Sent: Sunday, November 19, 2017 1:45 PM
> > To: dev@ctakes.apache.org
> > Subject: RE: cTAKES as REST service [EXTERNAL]
> > 
> > Hi All,
> > 
> > Have completed cTAKES Spring upgrade changes and checked in the
> > same 
> > to SVN. Please revert in case of any issues.
> > 
> > @Alex, Thanks a lot for taking time out and providing your review 
> > comments on Spring upgrade. Really appreciate it.
> > 
> > Now it will ease our effort in creating ctakes rest module.
> > 
> > Regards,
> > Gandhi
> > 
> > 
> > -Original Message-
> > From: Gandhi Rajan Natarajan [mailto:gandhi.natara...@arisglobal.co
> > m]
> > Sent: Sunday, November 19, 2017 4:20 AM
> > To: dev@ctakes.apache.org
> > Subject: RE: cTAKES as REST service [EXTERNAL]
> > 
> > Hi,
> > 
> > I have attached the patch file for cTAKES Spring upgrade in 
> > https://urldefense.proofpoint.com/v2/url?u=https-3A__issues.apache.
> > org
> > _jira_browse_CTAKES-
> > 2D472&d=DwIFaQ&c=qS4goWBT7poplM69zy_3xhKwEW14JZMSd
> > ioCoppxeFU&r=fs67GvlGZstTpyIisCYNYmQ

Re: exact match to CUI_TERM table question. [EXTERNAL]

2017-11-29 Thread Miller, Timothy
On Wed, 2017-11-29 at 09:36 -0500, Kathy Ferro wrote:
> Good Morning,
> 
> 1. I have a term for x-ray that has different spelling such as x.ray,
> x.rays, xray, xrays, etc...
> I see several files in
> resources\org\apache\ctakes\assertion\semantic_classes
> folder.
> I created x-ray.txt with all the terms above, hoping it would do the
> trick.  No luck.
> Is there a way to link all these terms to x-ray without having to
> modify the fast dictionary for every x-ray entry?

No, these files are not for the dictionary lookup and will not add
concepts to the CAS.
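
One possible workaround, which nobody in this thread proposed and which is not a cTAKES feature, is to normalize the spelling variants in the text before it reaches the pipeline. The sketch below is plain Python with a hypothetical pattern and function name; it assumes that rewriting the note text is acceptable for the use case.

```python
import re

# Hypothetical variant pattern; not part of cTAKES or its dictionary format.
XRAY_VARIANTS = re.compile(r"\bx[.\- ]?rays?\b", re.IGNORECASE)


def normalize_xray(text: str) -> str:
    """Rewrite x.ray, x.rays, xray, xrays, 'x ray' ... to the canonical 'x-ray'."""
    return XRAY_VARIANTS.sub("x-ray", text)


print(normalize_xray("Procedure: chest xrays."))
```

Note that this changes the length of the text, so annotation offsets would refer to the normalized note rather than the original, and plural forms collapse to the singular.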

> 
> 2. This might not have a solution, but I'll ask anyway.  It looks like
> terms have to be an exact match to terms in the CUI_TERMS table.  An
> example document has "x-ray right elbow" or "elbow x-ray".  In the
> dictionary, I have "x-ray of elbow" and "x-ray of the elbow".  Is there
> a way to pick up both entries in the dictionary without using a black
> box (list)?  The terms "left" and "right" might be important in some
> instances.
> 

How much is found really depends on the granularity of the source
resource (UMLS/SNOMED) and whatever tricks Sean's import tool applies.
UMLS often represents relations as concepts (elbow x-ray is in there).
But as modifiers get added, it is sometimes easier to model them as
relations. For example, if you can detect "left" as a modifier, "elbow"
as an AnatomicalSite, and "x-ray" as a procedure, then a relation
extractor should find that "left" modifies "elbow" and "x-ray" modifies
"elbow," to give a complete picture. cTAKES can do relations between
anatomical sites and other arguments, but I don't know if the default
release does body side (left,right).
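
To make the idea of modeling modifiers as relations concrete, here is a toy sketch in Python. It is not how the cTAKES relation extractor works (that is a trained model); the lexicons, the nearest-site rule, and all names are assumptions made up for illustration.

```python
from dataclasses import dataclass

# Toy lexicons, purely illustrative; cTAKES derives these from UMLS and trained models.
MODIFIERS = {"left", "right"}
SITES = {"elbow", "knee"}
PROCEDURES = {"x-ray"}


@dataclass(frozen=True)
class Relation:
    kind: str    # "side_of" or "procedure_on"
    source: str  # the modifier or procedure token
    target: str  # the anatomical site it attaches to


def extract_relations(text: str) -> list[Relation]:
    """Link each modifier or procedure token to the nearest anatomical site."""
    tokens = text.lower().replace(",", " ").split()
    site_positions = [i for i, tok in enumerate(tokens) if tok in SITES]
    relations = []
    for i, tok in enumerate(tokens):
        if tok in MODIFIERS:
            kind = "side_of"
        elif tok in PROCEDURES:
            kind = "procedure_on"
        else:
            continue
        if not site_positions:
            continue
        nearest = min(site_positions, key=lambda pos: abs(pos - i))
        relations.append(Relation(kind, tok, tokens[nearest]))
    return relations


for phrase in ("x-ray right elbow", "elbow x-ray", "x-ray of the elbow"):
    print(phrase, "->", extract_relations(phrase))
```

All three surface forms yield the same procedure_on(x-ray, elbow) relation without needing a dictionary entry for each phrasing, and the side modifier is kept as its own relation.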

> 3. This sample is kind of related to #2.  The document has the term
> "diabetes" in one sentence.  Several pages down, it has more specific
> terms such as "retinopathy" and "controlled with insulin".
> What is the best way to handle this?  Do you suggest I add
> "retinopathy"?
> Does cTAKES have term dependency?
> 
> It picks up the following.  (E08-E13) is a wide range of codes.
> PREFTERM VALUES(11849,'Diabetes Mellitus').
> ICD10CM VALUES(11849,'E08-E13').
> PREFTERM VALUES(11860,'Diabetes Mellitus, Non-Insulin-Dependent')
> ICD10CM VALUES(11849,'E11').
> 
> I should also have picked up these, but didn't because of the exact
> match.
> INSERT INTO CUI_TERMS VALUES(11884,0,3,'retinopathy ;
> diabetic','retinopathy')
> INSERT INTO CUI_TERMS VALUES(11884,3,6,'retina abnormal - diabet -
> relat','diabet')
> INSERT INTO CUI_TERMS VALUES(11884,1,2,'diabetic
> retinopathy','retinopathy')
> INSERT INTO CUI_TERMS VALUES(11884,0,2,'retinopathy
> diabetic','retinopathy')
> 
> 
> Snip of Sample text:
> chief complaint: Patient came in complaining of having chest pain.
> Procedure: chest xrays.
> Problems:
> Type 2 diabetes
> depression
> retinopathy
> patient controlled with insulin.
> 

It should definitely get "retinopathy" since that's in SNOMED. The
first thing I check when the dictionary misses something is whether the
linguistic annotations around it are correct (sentence, token, part of
speech).

> Sincerely appreciate your help.
> Kathy

Re: polarity tag in output for mention/concept. [EXTERNAL] [SUSPICIOUS]

2017-11-28 Thread Miller, Timothy
I'll just point out -- the kind of examples Kathy gave were the bane of
our existence while working on the ML-based assertion system. Even
though it is obvious what is going on to a human it was hard to encode
as a feature in a way that was learnable. But I think most rule-based
algorithms will also run into problems with this type of example
eventually if they have a hard-coded scoping mechanism (e.g., scope
extends up to 10 words to the right). If you make it larger then you
may increase the number of false positives your algorithm finds
(confusingly, here a false positive is an example the algorithm calls
negated that is not actually being negated).
Tim
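
The trade-off with a hard-coded scope can be shown with a toy example. The Python below is a sketch of a fixed right-hand window, not the ContextAnnotator or the ML-based assertion module; the trigger list, the single-token terms, and the sentence are all made up for illustration.

```python
# Hypothetical trigger words; real systems use much richer trigger lists.
TRIGGERS = {"denies", "denied", "no"}


def negated_terms(tokens, terms, max_right_scope):
    """Return the terms found within max_right_scope tokens to the right of a trigger."""
    negated = set()
    for i, tok in enumerate(tokens):
        if tok in TRIGGERS:
            window = tokens[i + 1 : i + 1 + max_right_scope]
            negated.update(t for t in window if t in terms)
    return negated


tokens = "denies fatigue , malaise , fever , chills but reports cough".split()
terms = {"fatigue", "malaise", "fever", "chills", "cough"}
for scope in (3, 7, 10):
    print(scope, sorted(negated_terms(tokens, terms, scope)))
```

With a scope of 3 the window stops after "malaise" and misses "fever" and "chills"; with 7 it covers the whole denied list; with 10 it also reaches "cough", which the sentence reports as present. That last case is the kind of false positive described above.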


On Tue, 2017-11-28 at 17:22 +, Finan, Sean wrote:
> Hi Kathy,
> 
> I am glad that you checked the wiki!  I should have pointed to it ...
> 
> In the example I sent the "relevant distance" between trigger terms
> and events would be 10.  There isn't any maximum as far as I know,
> but I think that 10 is the most that I've ever used.  The default is
> 7, and you can try with that (remove "*=*") before increasing the
> number(s).
> 
> The piper files aren't source code, they are just plain text and
> don't require compiling, etc.  How are you running the pipeline right
> now?  From a binary with a bin/run* script?
> 
> Sean
> 
> 
> -Original Message-
> From: Kathy Ferro [mailto:healthcare1...@gmail.com] 
> Sent: Tuesday, November 28, 2017 12:11 PM
> To: dev@ctakes.apache.org
> Subject: Re: polarity tag in output for mention/concept. [EXTERNAL]
> 
> Sean,
> 
> Thank you for information.
> 
> I was reading the document.  So, MaxLeftScopeSize and
> MaxRightScopeSize are limited to 10?  Is there any way to adjust them
> without modifying the source code?
> 
> https://urldefense.proofpoint.com/v2/url?u=https-3A__cwiki.apache.org
> _confluence_display_CTAKES_cTAKES-2B4.0-2B-2D-2BNE-
> 2BContexts&d=DwIBaQ&c=qS4goWBT7poplM69zy_3xhKwEW14JZMSdioCoppxeFU&r=f
> s67GvlGZstTpyIisCYNYmQCP6r0bcpKGd4f7d4gTao&m=4K9fxMmBiI0QZB0UhriFp_Yv
> XDL8rmXtGRiKVgxMCPE&s=hsCB9xPXLC8fpiwrGXuEW9snw_WZbY0e-E-mhPOO9N8&e=
> 
> 
> Thanks again,
> Kathy
> 
> 
> 
> On Tue, Nov 28, 2017 at 9:31 AM, Finan, Sean < Sean.Finan@childrens.h
> arvard.edu> wrote:
> 
> > 
> > Hi Kathy,
> > 
> > The negation annotator used in the default clinical pipeline is
> > based 
> > upon machine learning and trained on real data.  It is possible
> > that 
> > such "denies" lists were underrepresented in the training
> > data.  One 
> > thing that you can try is adding another negation annotator.  The 
> > ContextAnnotator in ctakes-ne-contexts will add negation to terms 
> > without removing existing negation.  It also has configurable
> > scope/distance that may be helpful.
> > 
> > To use this, create a new piper file containing the two lines
> > 
> > load DefaultFastPipeline
> > add ContextAnnotator MaxLeftScopeSize=10 MaxRightScopeSize=10
> > 
> > The default scope sizes are 7, but increasing  the MaxRight* might 
> > help with your "denies" discoveries.  7 might be ok for the left,
> > so 
> > feel free to remove "MaxLeftScopeSize=10" from the line.
> > 
> > Then run your piper file (command line, gui, maven profile, etc.) 
> > https://urldefense.proofpoint.com/v2/url?u=https-3A__cwiki.apache.o
> > rg_
> > confluence_display_CTAKES_Piper-
> > 2BFiles&d=DwIBaQ&c=qS4goWBT7poplM69zy_
> > 3xhKwEW14JZMSdioCoppxeFU&r=fs67GvlGZstTpyIisCYNYmQCP6r0bcpKGd4f7d4g
> > Tao
> > &m=4K9fxMmBiI0QZB0UhriFp_YvXDL8rmXtGRiKVgxMCPE&s=rXqsHq_poDXmwkCf3L
> > 2M5
> > ZlsByCbUHcSWD84JQQuh5A&e=
> > 
> > Sean
> > 
> > -Original Message-
> > From: Kathy Ferro [mailto:healthcare1...@gmail.com]
> > Sent: Monday, November 27, 2017 8:10 PM
> > To: dev@ctakes.apache.org
> > Subject: polarity tag in output for mention/concept. [EXTERNAL]
> > 
> > Good evening,
> > 
> > I ran a few sentences through default clinical pipeline.
> > 
> > It is really reliable if there is only one term after the negation,
> > but I am getting inconsistent polarity values for lists of terms.
> > Please see the examples below.
> > 
> > 1.   denies fatigue, malaise, fever, weight loss
> > SignSymptomMention:
> > polarity = -1:  fatigue, malaise, fever
> > polarity = 1: weight loss.
> > Why does weight loss get singled out?
> > 
> > 2.   denies ear pain or discharge, nasal obstruction or discharge,
> > sore
> > throat
> > polarity = -1: ear pain or discharge
> > polarity = 1: nasal obstruction or discharge, obstruction, sore
> > throat 
> > Doesn't even acknowledge the list.
> > 
> > 3.   denies back pain, joint swelling, joint stiffness, joint pain
> > polarity = -1: back pain, Swelling
> > polarity = 1: Joint swelling, Stiffness, pain
> > What! It totally messes up the pattern.
> > 
> > 4.   denied back pain, joint swelling, joint stiffness, joint pain
> > OK, maybe it doesn't like the word "denies"; I changed it to
> > "denied", "deny", etc.
> > polarity = -1 : Swelling
> > everything else is 1.
> > 
> > 
> > My question is:
> > How do I handle the negative c

Re: Contribute to ctakes: it is in your best interests! RE: unknown dependencies [EXTERNAL] [SUSPICIOUS] [SUSPICIOUS] [SUSPICIOUS] [SUSPICIOUS]

2017-11-21 Thread Miller, Timothy
el.jar
> > 1.3M ./ctakes-temporal-res/src/main/resources/org/apache/ctakes/temporal/ae/timeannotator/model.jar
> > 7.8M ./ctakes-pos-tagger-res/src/main/resources/org/apache/ctakes/postagger/models/clearnlp/mayo-en-pos-1.3.0.jar
> > 4.0K ./ctakes-coreference-res/src/main/resources/org/apache/ctakes/coreference/models/mention-cluster/model.jar
> > 1.5M ./ctakes-core-res/src/main/resources/org/apache/ctakes/core/sentdetect/model.jar
> > 
> > 504K ./ctakes-assertion-res/src/main/resources/org/apache/ctakes/assertion/models/subject/model.jar
> > 588K ./ctakes-assertion-res/src/main/resources/org/apache/ctakes/assertion/models/historyOf/model.jar
> > 332K ./ctakes-assertion-res/src/main/resources/org/apache/ctakes/assertion/models/uncertainty/model.jar
> > 740K ./ctakes-assertion-res/src/main/resources/org/apache/ctakes/assertion/models/conditional/model.jar
> > 592K ./ctakes-assertion-res/src/main/resources/org/apache/ctakes/assertion/models/polarity/sharpi2b2mipacqnegex/model.jar
> > 572K ./ctakes-assertion-res/src/main/resources/org/apache/ctakes/assertion/models/generic/model.jar
> > 1.5M ./ctakes-assertion-res/resources/model/sharpi2b2mipacqnegex/polarity/model.jar
> > 312K ./ctakes-dependency-parser-res/src/main/resources/org/apache/ctakes/dependency/parser/models/lemmatizer/dictionary-1.3.1.jar
> > 228M ./ctakes-dependency-parser-res/src/main/resources/org/apache/ctakes/dependency/parser/models/clearparser_models.jar
> > 5.8M ./ctakes-dependency-parser-res/src/main/resources/org/apache/ctakes/dependency/parser/models/srl/mayo-en-srl-1.3.0.jar
> > 452K ./ctakes-dependency-parser-res/src/main/resources/org/apache/ctakes/dependency/parser/models/pred/mayo-en-pred-1.3.0.jar
> > 1.2M ./ctakes-dependency-parser-res/src/main/resources/org/apache/ctakes/dependency/parser/models/role/mayo-en-role-1.3.0.jar
> > 25M  ./ctakes-dependency-parser-res/src/main/resources/org/apache/ctakes/dependency/parser/models/dependency/mayo-en-dep-1.3.0.jar
> > 688K ./ctakes-relation-extractor-res/src/main/resources/org/apache/ctakes/relationextractor/models/location_of/model.jar
> > 488K ./ctakes-relation-extractor-res/src/main/resources/org/apache/ctakes/relationextractor/models/degree_of/model.jar
> > 300K ./ctakes-relation-extractor-res/src/main/resources/org/apache/ctakes/relationextractor/models/modifier_extractor/model.jar
> > 
> > 282M total
> > 
> > or
> > 
> > $ find ./ -type f -size +5M | grep -v "\.jar" | grep -v "\.svn" | grep -v "\.git" | xargs du -hsc
> > 9.2M ./ctakes-coreference-res/src/main/resources/org/apache/ctakes/coreference/models/index_med_5k/_3.prx
> > 20M  ./ctakes-coreference-res/src/main/resources/org/apache/ctakes/coreference/models/index_med_5k/_3.tvf
> > 6.9M ./ctakes-coreference-res/src/main/resources/org/apache/ctakes/coreference/pref_probs.txt
> > 13M  ./ctakes-chunker-res/src/main/resources/org/apache/ctakes/chunker/models/chunker-model.zip
> > 6.4M ./ctakes-constituency-parser-res/src/main/resources/org/apache/ctakes/constituency/parser/models/thyme.bin
> > 15M  ./ctakes-constituency-parser-res/src/main/resources/org/apache/ctakes/constituency/parser/models/sharpacq-3.1.bin
> > 12M  ./ctakes-constituency-parser-res/src/main/resources/org/apache/ctakes/constituency/parser/models/sharpacq-1.5.bin
> > 84M  ./resources/org/apache/ctakes/dictionary/lookup/fast/sno_rx_16ab/sno_rx_16ab.script
> > 11M  ./ctakes-assertion-res/src/main/resources/org/apache/ctakes/assertion/models/pos.model
> > 38M  ./ctakes-assertion-res/resources/model/sharpi2b2mipacqnegex/polarity/training-data.liblinear
> > 9.6M ./ctakes-temporal/src/main/resources/org/apache/ctakes/temporal/thyme_word2vec_mapped_50.vec
> > 91M  ./ctakes-temporal/src/main/resources/org/apache/ctakes/temporal/gloveresult_3
> > 67M  ./ctakes-temporal/src/main/resources/org/apache/ctakes/temporal/mimic_vectors.txt
> > 
> > 378M total
> > 
> > Are all these resource

Re: Contribute to ctakes: it is in your best interests! RE: unknown dependencies [EXTERNAL] [SUSPICIOUS] [SUSPICIOUS]

2017-11-20 Thread Miller, Timothy
> 4 tasks can be performed by anybody of any experience level.   They build
> upon each other and should help the implementers better understand ctakes.
> After that the sky is the limit.
>
> A couple of years ago I sat on a panel at a workshop for open source
> scientific software.  For the half dozen or so highlighted projects
> (ctakes was one!) the common thread was that getting people to
> contribute is extremely difficult.
> I have a tendency to assume that people always act in their best
> interests.  Any student thinking of going towards industry should be
> jumping at the opportunity to contribute to a large,
> production-quality project.  They should also realize that
> contribution means potential recommendation (and possibly hiring
> interest) by established developers, physicians and researchers that
> use ctakes.  Even just answering questions on a user or dev list creates 
> credibility and can build a network.
> Active researchers could discover common thoughts and directions that
> could lead to collaboration outside ctakes.  Researchers and companies
> trying to build upon open source should realize that direct
> contribution is easier than custom substitution.  Plus, it is in their
> best interests that code does what they need it to do in the fastest,
> lightest, most stable way possible.
> With a project like ctakes there are a lot of things that can be done,
> there are great opportunities to really shine.  "I wrote this tool for
> my thesis that performs some nlp task" sounds good.  Appending "in an
> Apache product and it has been taken up by thousands across the globe"
> makes it sound a lot better.
> At my previous job in industry the company actively contributed to
> several open source projects.  We had a few people for whom that was
> 50% of their job.  Why?  Because we made a commitment to use that open source 
> software.
> It was a better use of our resources to contribute to it, improve it
> and keep its momentum going and prevent it from becoming stale (or
> abandoned) while our software continued to move forward.
>
> Hmm, that was a touch more than I had planned to write.  A whole cup
> of coffee in that one.
>
> Sean
>
>
>
>
> -Original Message-
> From: Miller, Timothy [mailto:timothy.mil...@childrens.harvard.edu]
> Sent: Saturday, November 18, 2017 8:13 AM
> To: dev@ctakes.apache.org
> Subject: Re: unknown dependencies [EXTERNAL] [SUSPICIOUS]
>
> Thanks Alex, looks like that was probably a fat-fingered auto-import
> on my part.
>
> I like your idea, and I don't know the best way to start either,
> but maybe one suggestion is to start with one or two focused things to
> clean up, and then ask for volunteers to take on specific modules?
> Then people can contribute an hour here and there to do cleanup on
> their task/module and try to fix that thing in a 1-2-month long
> sprint. I am happy to contribute to cleanup, I am responsible for my
> fair share of unclean code, but since I don't have strong software
> engineering chops it would be good to have people with that background
> propose the tasks and describe exactly what needs to be done. My idea
> of cleaning is just to delete commented out sections of evaluation code.
>
> Tim
>
> 
> From: Alexandru Zbarcea 
> Sent: Friday, November 17, 2017 4:46 PM
> To: Apache cTAKES Dev
> Subject: unknown dependencies [EXTERNAL]
>
> Hi,
>
> I notice that a mistaken dependency has slipped into the code:
> jdk.internal.org.objectweb.asm.commons.AnalyzerAdapter;
>
> Now that the Jenkins build is successful, I think it is easier to
> clean up the code. I would like this to be a common effort. I don't know
> the best way to approach this.
>
> Looking forward to your advice,
> Alex
>


Re: unknown dependencies [EXTERNAL]

2017-11-18 Thread Miller, Timothy
Thanks Alex, looks like that was probably a fat-fingered auto-import on my part.

I like your idea, and I don't know the best way to start either, but maybe 
one suggestion is to start with one or two focused things to clean up, and then 
ask for volunteers to take on specific modules? Then people can contribute an 
hour here and there to do cleanup on their task/module and try to fix that 
thing in a 1-2-month long sprint. I am happy to contribute to cleanup, I am 
responsible for my fair share of unclean code, but since I don't have strong 
software engineering chops it would be good to have people with that background 
propose the tasks and describe exactly what needs to be done. My idea of 
cleaning is just to delete commented out sections of evaluation code.

Tim


From: Alexandru Zbarcea 
Sent: Friday, November 17, 2017 4:46 PM
To: Apache cTAKES Dev
Subject: unknown dependencies [EXTERNAL]

Hi,

I notice that a mistaken dependency has slipped into the code:
jdk.internal.org.objectweb.asm.commons.AnalyzerAdapter;

Now that the Jenkins build is successful, I think it is easier to
clean up the code. I would like this to be a common effort. I don't know the
best way to approach this.

Looking forward to your advice,
Alex


Re: source code of user installation of cTakes. [EXTERNAL] [SUSPICIOUS]

2017-11-14 Thread Miller, Timothy
 tag, Polarity, etc.
> > > > > Now, I am more interested in finding Procedure, Medication,
> > > > > Drug, etc.
> > > > Could you please point me to the code file or help with code
> > > > snippet to capture above terms.
> > > >
> > > >
> > > >
> > > > > On 30 October 2017 at 19:36, Finan, Sean
> > > > > <sean.fi...@childrens.harvard.edu>
> > > > > wrote:
> > > >
> > > > > Hi Bhagwat,
> > > > >
> > > > > If you are interested in the default clinical pipeline, you
> > > > > can look at the wiki here:
> > > > > > https://cwiki.apache.org/confluence/display/CTAKES/Default+Clinical+Pipeline
> > > > > For a visual representation of what Tim described.
> > > > >
> > > > > The AEs used for the ctakes 4.0 default clinical pipeline are
> > > > > shown at the bottom of this wiki page:
> > > > > > https://cwiki.apache.org/confluence/display/CTAKES/Piper+Files
> > > > > The Class names are shown, but not the packages.  If you have
> > > > > a decent IDE they should be easy to find - for Intellij press
> > > > > CTRL-N and type the name of the class.
> > > > >
> > > > > Another option is to use the Simple Pipeline Fabricator gui to
> > > > > look at the available readers and AEs and see what they do
> > > > > (and their required inputs).  Check the wiki at:
> > > > > > https://cwiki.apache.org/confluence/display/CTAKES/Simple+Pipeline+Fabricator+GUI
> > > > > If you launch the gui and let it gather information, you can
> > > > > look at the pipe bit names and descriptions (reader, AE).  If
> > > > > it interests you, click the "add" button (big '+') and on the
> > > > > right you will see the path to the source code for that bit of
> > > > > > the pipeline.  Not all AEs are described ...
> > > > > > calling all community ...  but I think that most are.
> > > > >
> > > > > Sean
> > > > >
> > > > >
> > > > > -Original Message-
> > > > > From: Miller, Timothy
> > > > > > [mailto:timothy.mil...@childrens.harvard.edu]
> > > > > > Sent: Monday, October 30, 2017 9:48 AM
> > > > > > To: dev@ctakes.apache.org
> > > > > Subject: Re: source code of user installation of cTakes.
> > > > > [EXTERNAL] [SUSPICIOUS]
> > > > >
> > > > > cTAKES is based on Apache UIMA, which is a pipeline-building tool.
> > > > > So the output you see in the CVD is the result of many
> > > > > different pieces of the pipeline run i

Re: source code of user installation of cTakes. [EXTERNAL] [SUSPICIOUS]

2017-11-08 Thread Miller, Timothy
al representation of what Tim described.
> > >
> > > The AEs used for the ctakes 4.0 default clinical pipeline are shown
> > > at the bottom of this wiki page: 
> > > https://urldefense.proofpoint.com/v2/url?u=https-3A__cwiki.apache.org_&d=DwIBaQ&c=qS4goWBT7poplM69zy_3xhKwEW14JZMSdioCoppxeFU&r=Heup-IbsIg9Q1TPOylpP9FE4GTK-OqdTDRRNQXipowRLRjx0ibQrHEo8uYx6674h&m=ONC114Bki6vY6dmCLn3sPjdNegVyawdkxXvYuBFoonI&s=oN0sRQQgrlsp8j926ayeysmYTVO2kriknuUjfIjlUq8&e=
> > > confluence/display/CTAKES/Piper+Files
> > > The Class names are shown, but not the packages.  If you have a
> > > decent IDE they should be easy to find - for Intellij press CTRL-N
> > > and type the name of the class.
> > >
> > > Another option is to use the Simple Pipeline Fabricator gui to look
> > > at the available readers and AEs and see what they do (and their
> > > required inputs).  Check the wiki at: 
> > > https://urldefense.proofpoint.com/v2/url?u=https-3A__cwiki.apache.org_&d=DwIBaQ&c=qS4goWBT7poplM69zy_3xhKwEW14JZMSdioCoppxeFU&r=Heup-IbsIg9Q1TPOylpP9FE4GTK-OqdTDRRNQXipowRLRjx0ibQrHEo8uYx6674h&m=ONC114Bki6vY6dmCLn3sPjdNegVyawdkxXvYuBFoonI&s=oN0sRQQgrlsp8j926ayeysmYTVO2kriknuUjfIjlUq8&e=
> > > confluence/display/CTAKES/Simple+Pipeline+Fabricator+GUI
> > > If you launch the gui and let it gather information, you can look at
> > > the pipe bit names and descriptions (reader, AE).  If it interests
> > > you, click the "add" button (big '+') and on the right you will see
> > > the path to the source code for that bit of the pipeline.  Not all
> > > > AEs are described ...
> > > > calling all community ...  but I think that most are.
> > >
> > > Sean
> > >
> > >
> > > -Original Message-
> > > From: Miller, Timothy [mailto:timothy.mil...@childrens.harvard.edu]
> > > Sent: Monday, October 30, 2017 9:48 AM
> > > To: dev@ctakes.apache.org
> > > Subject: Re: source code of user installation of cTakes. [EXTERNAL]
> > > [SUSPICIOUS]
> > >
> > > cTAKES is based on Apache UIMA, which is a pipeline-building tool.
> > > So the output you see in the CVD is the result of many different
> > > pieces of the pipeline run in succession, and they are each in
> > > different modules of cTAKES. ctakes-core has the most basic elements
> > > that will run for every pipeline -- tokens, sentences, etc.
> > > ctakes-dictionary-lookup-fast is what maps text spans to UMLS concepts.
> > > > ctakes-assertion finds negation status.
> > > ctakes-chunker creates syntactic chunks and ctakes-pos-tagger finds
> > > part-of-speech tags for tokens. There are many others but I think
> > > this covers the basics. In general, if you see a type in the CVD
> > > that you find interesting, your best bet is to grep the code for
> > > that type and see where it is being created (if you don't want to
> > > > wait for an email from the list).
> > > Pipeline components are known as "Analysis Engines" (AEs) in UIMA
> > > lingo and as a result are often in a package ending in .ae.
> > > Hope this helps you navigate the code!
> > > Tim
> > >
> > > 
> > > From: Bhagwat Posane 
> > > Sent: Monday, October 30, 2017 7:24 AM
> > > To: dev@ctakes.apache.org
> > > Subject: Re: source code of user installation of cTakes. [EXTERNAL]
> > >
> > > Thanks Gandhi, for the quick response.
> > >
> > > I have source code of cTAKES which is available under
> > > https://urldefense.proofpoint.com/v2/url?u=https-3A__svn.
> > > apache.org_repos_asf_ctakes_trunk&d=DwIBaQ&c=qS4goWBT7poplM69zy_
> > > 3xhKwEW14JZMSdioCoppxeFU&r=Heup-IbsIg9Q1TPOylpP9FE4GTK-
> > > OqdTDRRNQXipowRLRjx0ibQrHEo8uYx6674h&m=Efsfuj37pWNoR_
> > > 6AidMyWm4ab03VgFjoRDFcJxdS9k0&s=ZquL0hWuNhJGyujJCmNBTCENaERN6B
> > > U3zisHhnM18Wo&e=. I see there are many projects in it.
> > >
> > > > I am checking the user version using \CTAKES_HOME\bin\runctakesCVD.bat,
> > > > which opens a UI. I could run the analysis engine on a clinical note
> > > > according to the guidelines in the user-install guide.
> > > > It gives me decent results in the left pane of the UI.
> > > > Now I am looking for the source code that produces this result for a
> > > > clinical note. Could you please point me to the project where I can
> > > > find it in the ctakes-trunk or so.
> > >
> > >

Re: source code of user installation of cTakes. [EXTERNAL]

2017-10-30 Thread Miller, Timothy
cTAKES is based on Apache UIMA, which is a pipeline-building tool. So the 
output you see in the CVD is the result of many different pieces of the 
pipeline run in succession, and they are each in different modules of cTAKES. 
ctakes-core has the most basic elements that will run for every pipeline -- 
tokens, sentences, etc. ctakes-dictionary-lookup-fast is what maps text spans 
to UMLS concepts. ctakes-assertion finds negation status. ctakes-chunker 
creates syntactic chunks and ctakes-pos-tagger finds part-of-speech tags for 
tokens. There are many others but I think this covers the basics. In general, 
if you see a type in the CVD that you find interesting, your best bet is to 
grep the code for that type and see where it is being created (if you don't 
want to wait for an email from the list). Pipeline components are known as 
"Analysis Engines" (AEs) in UIMA lingo and as a result are often in a package 
ending in .ae.
Hope this helps you navigate the code!
Tim
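
The "grep the code for that type" advice can be sketched as a small Python helper for anyone without grep at hand (for example on Windows). The function name is hypothetical, and the checkout directory in the usage note is an assumption; the type names are the ones shown in the CVD output.

```python
from pathlib import Path


def find_type_usages(source_root, type_name):
    """Yield (path, line_number, line) for each .java line that mentions type_name."""
    for path in sorted(Path(source_root).rglob("*.java")):
        with path.open(encoding="utf-8", errors="replace") as handle:
            for number, line in enumerate(handle, start=1):
                if type_name in line:
                    yield path, number, line.rstrip()
```

For example, iterating over find_type_usages("ctakes-trunk", "SignSymptomMention") and printing each hit lists the files that create or consume that type; hits in packages ending in .ae are usually the annotators of interest.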


From: Bhagwat Posane 
Sent: Monday, October 30, 2017 7:24 AM
To: dev@ctakes.apache.org
Subject: Re: source code of user installation of cTakes. [EXTERNAL]

Thanks Gandhi, for the quick response.

I have source code of cTAKES which is available under
https://urldefense.proofpoint.com/v2/url?u=https-3A__svn.apache.org_repos_asf_ctakes_trunk&d=DwIBaQ&c=qS4goWBT7poplM69zy_3xhKwEW14JZMSdioCoppxeFU&r=Heup-IbsIg9Q1TPOylpP9FE4GTK-OqdTDRRNQXipowRLRjx0ibQrHEo8uYx6674h&m=Efsfuj37pWNoR_6AidMyWm4ab03VgFjoRDFcJxdS9k0&s=ZquL0hWuNhJGyujJCmNBTCENaERN6BU3zisHhnM18Wo&e=.
 I see there are many
projects in it.

I am checking the user version using \CTAKES_HOME\bin\runctakesCVD.bat, which
opens a UI. I could run the analysis engine on a clinical note according to
the guidelines in the user-install guide.
It gives me decent results in the left pane of the UI.
Now I am looking for the source code that produces this result for a clinical
note. Could you please point me to the project where I can find it in
the ctakes-trunk or so.



On 30 October 2017 at 16:36, Gandhi Rajan Natarajan <
gandhi.natara...@arisglobal.com> wrote:

> Hi Bhagwat,
>
> The source code of cTAKES is available under 
> https://urldefense.proofpoint.com/v2/url?u=https-3A__svn.apache.org_repos_&d=DwIBaQ&c=qS4goWBT7poplM69zy_3xhKwEW14JZMSdioCoppxeFU&r=Heup-IbsIg9Q1TPOylpP9FE4GTK-OqdTDRRNQXipowRLRjx0ibQrHEo8uYx6674h&m=Efsfuj37pWNoR_6AidMyWm4ab03VgFjoRDFcJxdS9k0&s=O0hR4sqek-qxWLs6iyaqEJz4RPgChsBLYICTCQHnrmw&e=
> asf/ctakes/trunk
>
> Regarding resources to start off, the official cTAKES site documentation
> should be fine.
>
> I also feel the mailing list is the one-stop shop for all your other
> detailed queries.
>
> Regards,
> Gandhi
>
>
> -Original Message-
> From: Bhagwat Posane [mailto:bhagwat.pos...@gmail.com]
> Sent: Monday, October 30, 2017 4:30 PM
> To: dev@ctakes.apache.org
> Subject: source code of user installation of cTakes.
>
> Hello,
>
> I have seen the results of user installation of cTakes , the output is
> pretty interesting.
>
> Can anybody point to the source code of the same?
>
> I have just started exploring this project; if anybody could point me to
> good resources to understand it thoroughly, that would be a great help!
>
> I have downloaded the developer installation too.
>
> --
> Thanks,
> Bhagwat Posane
>



--
Thanks,
Bhagwat Posane


Re: cTAKES as REST service [EXTERNAL]

2017-10-29 Thread Miller, Timothy
Sounds great, Matthew and Gandhi, thanks for sharing your solution.
Tim


From: Matthew Vita 
Sent: Sunday, October 29, 2017 11:59 AM
To: dev@ctakes.apache.org
Subject: Re: cTAKES as REST service [EXTERNAL]

Sean,

Gandhi and I have met and we both agreed that his solution is superior to
the one I was working on. Therefore, I will be helping to see this project
through to the end so we can get it into the codebase!

Here are the remaining work items that I will be spending time on:

   1. Get it running (I'm using Linux Mint)
   2. Test it out (including stress tests)
   3. Automate it to run in Docker (just need UMLS credentials)
   4. Make a call to
   https://github.com/GoTeamEpsilon/cTAKES-Concept-Mention-Parser to get a
   nice JSON payload that is easy to traverse (this can be an optional switch,
   of course - I believe it may be best to rewrite this in Java should this be
   included with the solution)
   5. Test the output in my web viewer:
   https://github.com/GoTeamEpsilon/cTAKES-Friendly-Web-UI
   6. Work on preparing the solution for the cTAKES core codebase. I will
   prepare it with a very rich README.

I will provide my updates over the coming days.

Thanks,

Matthew Vita
www.matthewvita.com

On Sun, Oct 29, 2017 at 7:47 AM, Finan, Sean <
sean.fi...@childrens.harvard.edu> wrote:

> Hi Gandhi,
>
> Thank you for the additional information.  Having a reliable rest service
> included with ctakes would be a boon for everybody interested in web
> access.  I look forward to checking out the info in github as soon as I am
> able.
>
> Thanks to you and Matthew both!
>
> Sean
>
>
> -Original Message-
> From: Gandhi Rajan Natarajan [mailto:gandhi.natara...@arisglobal.com]
> Sent: Sunday, October 29, 2017 5:44 AM
> To: dev@ctakes.apache.org
> Subject: RE: cTAKES as REST service [EXTERNAL]
>
> Hi Sean,
>
> I feel it's better to upgrade the cTAKES Spring version to 4.x so that
> exposing it as a REST service becomes seamless. Please find the GitHub link
> that contains the proposed changes for the Spring upgrade in cTAKES:
>
> https://github.com/gandhirajan/cTAKES/tree/master/SpringUpgrade/ctakes-SVN-src
>
> I have not tested the changes in the ytex modules as I'm not sure how to go
> about that.
>
> Matthew Vita will be reviewing the changes. He is also reviewing and
> testing my REST service changes. He will provide more info once we
> are done with our testing, so that we can discuss productizing the
> changes.
>
> Regards,
> Gandhi
>
>
> -Original Message-
> From: Finan, Sean [mailto:sean.fi...@childrens.harvard.edu]
> Sent: Friday, October 27, 2017 12:53 AM
> To: dev@ctakes.apache.org
> Subject: RE: cTAKES as REST service [EXTERNAL]
>
> Hi Gandhi,
>
> That sounds really great!  Thank you for sharing the process!
>
> Sean
>
> -Original Message-
> From: Gandhi Rajan Natarajan [mailto:gandhi.natara...@arisglobal.com]
> Sent: Thursday, October 26, 2017 3:02 PM
> To: dev@ctakes.apache.org
> Subject: RE: cTAKES as REST service [EXTERNAL]
>
> Hi Sean,
>
> I'm glad to report that I was able to upgrade cTAKES to Spring 4 in my
> sandbox. As you mentioned, it is used by uimaFIT for firing some
> queries.
>
> In brief, I made the following changes:
>
> 1) Changing SimpleJdbcTemplate to JdbcTemplate in uima modules
> 2) Changing Spring version in cTAKES root pom.xml
> 3) Adding Spring versions in ctakes type system, ctakes assertion, ctakes
> ytex and ctakes ytex web modules.
>
> Now I'm able to expose cTAKES as a REST service which takes the clinical
> text as input and outputs the result.
>
> Hope it helps someone.
>
> Regards,
> Gandhi
>
> -Original Message-
> From: Gandhi Rajan Natarajan [mailto:gandhi.natara...@arisglobal.com]
> Sent: Wednesday, October 25, 2017 7:33 PM
> To: dev@ctakes.apache.org
> Subject: RE: cTAKES as REST service [EXTERNAL]
>
> Hi Sean,
>
> Thanks for the instant response. Will try to upgrade to Spring 4 and keep
> you posted about the progress.
>
> Regards,
> Gandhi
>
>
> -Original Message-
> From: Finan, Sean [mailto:sean.fi...@childrens.harvard.edu]
> Sent: Wednesday, October 25, 2017 7:28 PM
> To: dev@ctakes.apache.org

Re: CAS Visual Debugger - [EXTERNAL]

2017-10-25 Thread Miller, Timothy
I've had the same thought, and come to the same conclusions.
Tim


From: Melvin Ma 
Sent: Wednesday, October 25, 2017 1:33 PM
To: dev@ctakes.apache.org
Subject: CAS Visual Debugger - [EXTERNAL]

This is more of a question. I am fully aware that CAS Visual Debugger is
maintained in the UIMA project.

For now, I frequently need to use CVD to view .xmi files. It would be
really nice if I could pass the type system XML to CVD as a startup
argument (instead of manually looking up this file and loading it). Do you
know of any way to do it? I checked the documentation multiple times and
was not able to find anything.

Thanks.

Melvin


Re: Missing resources for script that extracts markables from a corpus for analysis [EXTERNAL]

2017-10-04 Thread Miller, Timothy
I had in mind the notes in:
/ctakes-examples-res/src/main/resources/org/apache/ctakes/examples/notes/rtf

which I believe are the fake notes Dr. John Green wrote for us. I don't know 
why they are rtf but they are nice, non-toy-length notes.
Tim


From: Alexandru Zbarcea 
Sent: Tuesday, October 3, 2017 5:32 PM
To: Apache cTAKES Dev
Subject: Re: Missing resources for script that extracts markables from a corpus 
for analysis [EXTERNAL]

Hi Tim,

That's great news. If you think there are sample notes that can be used, I
can start working on the Lucene index and slowly build the UTest for them.

I have created CTAKES-462[1] where we can track this work.

Looking into the ctakes-examples-res, what I can find is:
$ find . -type f | grep -v "\.class" | grep -v "\.iml" | grep -v "\.jar" |
grep -v "\.rtf" | grep -v "\.xml" | grep -v "\.bsv" | grep -v "\.piper"
./main/resources/org/apache/ctakes/examples/notes/pain_no_swelling.txt
./main/resources/org/apache/ctakes/examples/notes/claudication
./main/resources/org/apache/ctakes/examples/notes/shark_bite.txt
./main/resources/org/apache/ctakes/examples/notes/edge_cases_plaintext_1.txt
./main/resources/org/apache/ctakes/examples/notes/dr_nutritious_1.txt
./main/resources/org/apache/ctakes/examples/notes/right_knee_arthroscopy
./main/resources/org/apache/ctakes/examples/notes/SampleInputRadiologyNotes.txt
./main/resources/org/apache/ctakes/examples/notes/smoker/doc1_07543210_sample_past_smoker.txt
./main/resources/org/apache/ctakes/examples/notes/smoker/doc2_07543210_sample_past_smoker.txt
./main/resources/org/apache/ctakes/examples/notes/smoker/doc2_07543210_sample_current.txt
./main/resources/org/apache/ctakes/examples/notes/smoker/doc1_07543210_sample_unknown.txt
./main/resources/org/apache/ctakes/examples/notes/smoker/doc1_07543210_sample_current.txt
./main/resources/org/apache/ctakes/examples/notes/mother_goose/README
./main/resources/org/apache/ctakes/examples/notes/mother_goose/OneMistyMoistyMorning.txt
./main/resources/org/apache/ctakes/examples/notes/dr_nutritious_2.txt
./main/resources/org/apache/ctakes/examples/annotation/anafora_annotated/Peds_RoutBirthNote_1/Peds_RoutBirthNote_1
./main/resources/org/apache/ctakes/examples/annotation/anafora_annotated/VascSurg_AAA_Leak_1/VascSurg_AAA_Leak_1
./main/resources/org/apache/ctakes/examples/annotation/anafora_annotated/Peds_Dysphagia_1/Peds_Dysphagia_1
./main/resources/org/apache/ctakes/examples/annotation/anafora_annotated/OBGYN_LaborProgressNote_1/OBGYN_LaborProgressNote_1
./main/resources/org/apache/ctakes/examples/annotation/anafora_annotated/OBGYN_IUD_1/OBGYN_IUD_1
./main/resources/org/apache/ctakes/examples/annotation/anafora_annotated/OBGYN_HysterectomyAndBSO_1/OBGYN_HysterectomyAndBSO_1
./main/resources/org/apache/ctakes/examples/annotation/anafora_annotated/VascSurg_FollowUp_1/VascSurg_FollowUp_1
./main/resources/org/apache/ctakes/examples/annotation/anafora_annotated/OBGYN_PROMCheck_1/OBGYN_PROMCheck_1
./main/resources/org/apache/ctakes/examples/annotation/anafora_annotated/OBGYN_Gen_Abscess_1/OBGYN_Gen_Abscess_1
./main/resources/org/apache/ctakes/examples/annotation/anafora_annotated/Peds_FebrileSez_1/Peds_FebrileSez_1
./main/resources/org/apache/ctakes/examples/annotation/anafora_annotated/VascSurg_RO_AAA_1/VascSurg_RO_AAA_1
./main/resources/org/apache/ctakes/examples/annotation/anafora_annotated/VascSurg_RO_DVT_1/VascSurg_RO_DVT_1
./main/resources/org/apache/ctakes/examples/annotation/anafora_annotated/GenSurg_UmbilicalHernia_1/GenSurg_UmbilicalHernia_1
./main/resources/org/apache/ctakes/examples/annotation/anafora_annotated/VascSurg_PVD_1/VascSurg_PVD_1
./main/resources/org/apache/ctakes/examples/annotation/anafora_annotated/OBGYN_MVAPrego_1/OBGYN_MVAPrego_1

Which notes do you think I should start with (all)?

Alex

[1] - https://issues.apache.org/jira/browse/CTAKES-462
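
For anyone who wants to reproduce the listing above without the shell, here is a minimal Python sketch of the same filter. The function name list_candidate_notes and the EXCLUDED_SUFFIXES set are made up for illustration. One difference to be aware of: the grep -v chain drops any path that contains one of the patterns anywhere, while this sketch only checks the final file extension.

```python
from pathlib import Path

# Extensions excluded by the grep -v chain in the find command above.
EXCLUDED_SUFFIXES = {".class", ".iml", ".jar", ".rtf", ".xml", ".bsv", ".piper"}


def list_candidate_notes(root):
    """Return sorted relative paths of files under root whose extension is not excluded."""
    root = Path(root)
    return sorted(
        p.relative_to(root).as_posix()
        for p in root.rglob("*")
        if p.is_file() and p.suffix not in EXCLUDED_SUFFIXES
    )
```

Calling list_candidate_notes("ctakes-examples-res/src") from a checkout should give a list comparable to the one above, assuming the same directory layout.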


On Mon, Oct 2, 2017 at 6:46 PM, Miller, Timothy  wrote:

> Yeah, it might be nice to build a lucene index of all the sample notes in
> the ctakes-example module. I'll create a jira for it but probably won't be
> able to get to it right away.
> Tim
>
> 
> From: Alexandru Zbarcea 
> Sent: Monday, October 2, 2017 5:31 PM
> To: Apache cTAKES Dev
> Subject: Re: Missing resources for script that extracts markables from a
> corpus for analysis [EXTERNAL]
>
> Hi Tim,
>
> I understand, makes sense. Is it possible to anonymize the data you have or
> come up with a separate body of test data to generate a Lucene index a

Re: Enabling drugner pipeline and identifying dates [EXTERNAL] [SUSPICIOUS] [SUSPICIOUS] [SUSPICIOUS]

2017-10-03 Thread Miller, Timothy
r which he was started on Coumadin. During the
> hospitalization for a right atrial clot in 03/02 hepatocellular
> carcinoma was first noted and he was referred to an oncologist.  The
> patient started study treatment of Thalomid 200mg (days 1-21), and
> Epirubicin, 20 mg/m2 (days 1, 8, and 15) on 06/07/02 for the
> treatment of hepatocellular carcinoma.  He was concomitantly
> receiving Cardura, Ambien (for insomnia), Megace, Coumadin, and
> Oxycodone. This patient presented to the emergency room with the
> chief complaint of hematochezia. He reported noticing bright red
> blood and small clots mixed in with his stool. On 07/13/02, he was
> admitted due to gastrointestinal bleed.  The physician ordered 2
> large bore intravenous lines and planned to transfuse for hematocrit
> less than 30%. Due to the  INR (international normalized ratio) level
> of 3.0, Coumadin was held. He was also noted to have bilateral lower
> extremity edema with dyspnea on exertion.  On 07/13/02, he had a
> chest X-ray PA and lateral done that showed no evidence of acute
> pneumonia or congestive heart failure.  On 07/14/02, he underwent  an
> ultrasound which was negative for deep vein thrombosis. This patient
> did not take Thalomid on the day of his admittance to the hospital,
> but resumed treatment shortly after with no return of symptoms. On
> 07/15/02, he was discharged in stable condition. There have been no
> further reports of bleeding at this time. Thedoctor has assessed the
> hematochezia as related to Coumadin treatment and previously
> diagnosed diverticulosis, and not to protocol therapy with Thalomid
> and Epirubicin.Additional information received from the investigator
> on 27Aug02 reveals that this male patient began on 07Jun02 two cycles
> of therapy with Thalidomide and Epirubicin.  His post cycle two
> computed tomography scans revealed increase in size of liver lesion
> with development of multiple new satellite nodules.  On 29Jul02, the
> investigator removed this patient from protocol for progressive
> disease and recommended hospice care.  After seeking a second opinion
> from two other institutions, this patient was admitted to hospice on
> 05Aug02.  On 20Aug02, the investigator noted that this patient was
> suffering worsening fatigue and got tired getting out of his
> chair.  On 25Aug02, this patient died due to disease
> progression.  The investigator assessed the death as not related to
> study treatment and expected"
> 
> 
> 
> 
> -Original Message-
> From: Miller, Timothy [mailto:timothy.mil...@childrens.harvard.edu]
> Sent: Monday, October 02, 2017 10:36 AM
> To: dev@ctakes.apache.org
> Subject: Re: Enabling drugner pipeline and identifying dates
> [EXTERNAL] [SUSPICIOUS] [SUSPICIOUS]
> 
> My bad, I didn't read too closely and thought this was going to be a
> coreference patch. I don't know this FSM code that well, so I am not
> an expert. My biggest concern at a glance: these additions help
> find more true positives (as in your examples), but can we verify that
> they won't create false positives?
> Tim
> 
> On Fri, 2017-09-29 at 06:25 +, Gandhi Rajan Natarajan wrote:
> 
> > 
> > Hi Sean,
> >
> > Thanks again for the response. I guess it's a mistake on my side
> > that I didn't send the complete text. Did you mean that with the text I
> > sent, the co-reference superscript-1 will be lost?
> >
> > Also, as per your advice, we have created an issue,
> > https://issues.apache.org/jira/browse/CTAKES-459, for the
> > measurement FSM changes and attached the modified file.
> > Could someone have a look and let us know your thoughts, please?
> >
> > Regards,
> > Gandhi
> >
> > -Original Message-
> > 
> > Fr

Re: Missing resources for script that extracts markables from a corpus for analysis [EXTERNAL]

2017-10-02 Thread Miller, Timothy
Yeah, it might be nice to build a lucene index of all the sample notes in the 
ctakes-example module. I'll create a jira for it but probably won't be able to 
get to it right away.
Tim


From: Alexandru Zbarcea 
Sent: Monday, October 2, 2017 5:31 PM
To: Apache cTAKES Dev
Subject: Re: Missing resources for script that extracts markables from a corpus 
for analysis [EXTERNAL]

Hi Tim,

I understand, makes sense. Is it possible to anonymize the data you have or
come up with a separate body of test data to generate a Lucene index and
unit test the code? I think this would have the double benefit of the code
being tested and showing dev/users how the code is supposed to be used.

What do you think?

Alex


On Mon, Oct 2, 2017 at 9:45 AM, Miller, Timothy <
timothy.mil...@childrens.harvard.edu> wrote:

> Thanks Alex,
> This code is for processing a clinical text data corpus stored as a
> lucene index -- data that cannot be redistributed for privacy reasons.
> Since it's so related to the coref stuff I thought it should go
> alongside the coreference module. But maybe it makes more sense as an
> external project since it can't really function without externally
> created resources -- what do you think?
> Tim
>
>
> On Sun, 2017-10-01 at 19:54 -0400, Alexandru Zbarcea wrote:
> > Hi,
> >
> > I was trying to write a UTest for
> > org.apache.ctakes.coreference.data.PrintMimicMarkables (recently added),
> > but I couldn't find any existing resources that can be used for this.
> > Can anyone point me to a suitable resource (Lucene index) folder?
> >
> > org.apache.ctakes.coreference.data.PrintMimicMarkables \
> > /home/alex/projects/apache/ctakes/ctakes-dictionary-lookup-res/target/classes/org/apache/ctakes/dictionary/lookup/rxnorm_index \
> > index.out
> >
> > I was trying with the following lucene folder/resource:
> > ./ctakes-coreference-res/src/main/resources/org/apache/ctakes/coreference/models/index_med_5k
> >
> > And also the dictionaries:
> > ./ctakes-dictionary-lookup-res/src/main/resources/org/apache/ctakes/dictionary/lookup/snomed-like_codes_sample
> > ./ctakes-dictionary-lookup-res/src/main/resources/org/apache/ctakes/dictionary/lookup/assertion_cue_phrase_index
> > ./ctakes-dictionary-lookup-res/src/main/resources/org/apache/ctakes/dictionary/lookup/OrangeBook
> > ./ctakes-dictionary-lookup-res/src/main/resources/org/apache/ctakes/dictionary/lookup/snomed-like_sample
> > ./ctakes-dictionary-lookup-res/src/main/resources/org/apache/ctakes/dictionary/lookup/drug_index
> >
> > Every execution looks like:
> > 01 Oct 2017 19:50:19  INFO ConstituencyParser - Initializing parser...
> > Oct 01, 2017 7:50:20 PM org.apache.uima.collection.impl.cpm.engine.ArtifactProducer process
> > WARNING: Got Exception. (Thread Name: [CollectionReader Thread]::) Message:
> > docID must be >= 0 and < maxDoc=5000 (got docID=5000)
> > Oct 01, 2017 7:50:20 PM org.apache.uima.collection.impl.cpm.engine.ArtifactProducer run(820)
> > WARNING: docID must be >= 0 and < maxDoc=5000 (got docID=5000)
> > java.lang.IllegalArgumentException: docID must be >= 0 and < maxDoc=5000 (got docID=5000)
> > at org.apache.lucene.index.BaseCompositeReader.readerIndex(BaseCompositeReader.java:152)
> > at org.apache.lucene.index.BaseCompositeReader.document(BaseCompositeReader.java:115)
> > at org.apache.lucene.index.IndexReader.document(IndexReader.java:436)
> > at org.apache.ctakes.core.cr.LuceneCollectionReader.getNext(LuceneCollectionReader.java:90)
> > at org.apache.uima.collection.impl.cpm.engine.ArtifactProducer.readNext(ArtifactProducer.java:494)
> > at org.apache.uima.collection.impl.cpm.engine.ArtifactProducer.run(ArtifactProducer.java:711)
> >
> > Collection process complete called, closing file writer.
> >
> > I appreciate any of your help,
> > Alex


Re: CTAKES-460: coreference Test should not be part of main [EXTERNAL]

2017-10-02 Thread Miller, Timothy
Thanks Alex, I've committed this patch.
I unfortunately looked at the wrong tab when typing my commit message
and committed it with the wrong issue number (459).

Tim

On Mon, 2017-10-02 at 08:17 -0400, Alexandru Zbarcea wrote:
> Hi,
> 
> I have refactored a main class that should have been a UTest:
> https://issues.apache.org/jira/browse/CTAKES-460
> 
> This moves the test code from src/main to src/test and also includes
> some refactoring.
> 
> No impact. Can easily be merged.
> 
> Alex

Re: Enabling drugner pipeline and identifying dates [EXTERNAL] [SUSPICIOUS]

2017-10-02 Thread Miller, Timothy
My bad, I didn't read too closely and thought this was going to be a
coreference patch. I don't know this FSM code that well, so I am not an
expert. My biggest concern at a glance: these additions help find more
true positives (as in your examples), but can we verify that they
won't create false positives?
Tim
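
One way to make the false-positive question concrete is to run any new pattern against text that should not match as well as text that should. The sketch below is a Python illustration only: DOSE_PER_AREA is a hypothetical regex stand-in for a dose-per-area measurement such as "20 mg/m2", not the actual MeasurementFSM logic, and the helper names are made up. The example strings are taken from the sample note quoted in this thread.

```python
import re

# Hypothetical stand-in for a dose-per-body-surface-area pattern such as "20 mg/m2".
# This is NOT the cTAKES MeasurementFSM code, only a regex used to illustrate the check.
DOSE_PER_AREA = re.compile(r"\b\d+(?:\.\d+)?\s?(?:mg|g|mcg)/m2\b", re.IGNORECASE)


def find_measurements(text):
    """Return every substring that the illustrative pattern accepts."""
    return [m.group(0) for m in DOSE_PER_AREA.finditer(text)]


def false_positives(negative_examples):
    """Return the negative examples in which the pattern still fires."""
    return [s for s in negative_examples if find_measurements(s)]


# Text that should match, and text from the same note that should not.
positives = ["Epirubicin, 20 mg/m2 (days 1, 8, and 15)"]
negatives = [
    "Thalomid 200mg (days 1-21)",
    "hematocrit less than 30%",
    "admitted on 07/13/02",
]
```

A real check would run the modified annotator over a larger set of notes and compare the measurement annotations before and after the change; the list of negatives here is only a starting point.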


On Fri, 2017-09-29 at 06:25 +, Gandhi Rajan Natarajan wrote:
> Hi Sean,
> 
> Thanks again for the response. I guess it's a mistake on my side that
> I didn't send the complete text. Did you mean that with the text I
> sent, the co-reference superscript-1 will be lost?
> 
> Also, as per your advice, we have created an issue,
> https://issues.apache.org/jira/browse/CTAKES-459, for the
> measurement FSM changes and attached the modified file. Could
> someone have a look and let us know your thoughts, please?
> 
> Regards,
> Gandhi
> 
> 
> -Original Message-
> From: Finan, Sean [mailto:sean.fi...@childrens.harvard.edu]
> Sent: Thursday, September 28, 2017 8:21 PM
> To: dev@ctakes.apache.org
> Cc: Miller, Timothy 
> Subject: RE: Enabling drugner pipeline and identifying dates
> [EXTERNAL] [SUSPICIOUS]
> 
> Hi Gandhi,
> 
> I don't recall you sending me that entire snippet of text.  I think
> that I only had your single example sentence.
> You have discovered one of the quirks of software: "change the data,
> change the result."
> Ctakes is a system with many moving parts.  Things that precede or
> follow your original example sentence will change the evaluation of
> that sentence.
> With the pipeline you are using and the full note, you should see a
> number (mine is 4) next to the first "thalomid" in the original
> example sentence.  If you click that number you should see (to the
> right) 4 instances of "thalomid".
> Tim can correct me here, but maybe the coreference module ranked the
> links between "thalomid" as much higher than the rank between "study
> treatment of thalomid 200mg" and "the treatment of hepatocellular
> carcinoma" and discarded the encapsulating treatment texts from
> markables?  It is probably more complex than that.
> 
> > 
> > we have also made some code changes in MeasurementFSM.java to
> > identify certain measurements like '20 mg/m2' which was not
> > identified out of the box.  Should we send the code changes to you
> > so that you can consider the same to be productized ? Please
> > advise."
> I don't know if you've noticed the recent emails on the dev list
> involving Alexandru Zbarcea.  Alex has been creating or commenting on
> Jira items and attaching code for  fixes and enhancements.  This is a
> widely used process and is fairly easy to follow.   I think that the
> following links are relevant:
> Working with issues:  https://confluence.atlassian.com/jiracoreserver073/working-with-issues-861257307.html
> Creating patches:   https://confluence.atlassian.com/crucible/creating-patch-files-for-pre-commit-reviews-298977458.html
> Attaching files:   https://confluence.atlassian.com/jiracorecloud/attaching-files-and-screenshots-to-issues-765593805.html
> 
> I don't know if you have a jira account and permissions for the
> ctakes project.  An administrator may need to set that up for you.
> 
> Thanks,
> Sean
> 
> -Original Message-
> From: Gandhi Rajan Natarajan [mailto:gandhi.natara...@arisglobal.com]
> Sent: Thursday, September 28, 2017 4:09 

Re: Missing resources for script that extracts markables from a corpus for analysis [EXTERNAL]

2017-10-02 Thread Miller, Timothy
Thanks Alex,
This code is for processing a clinical text data corpus stored as a
lucene index -- data that cannot be redistributed for privacy reasons.
Since it's so related to the coref stuff I thought it should go
alongside the coreference module. But maybe it makes more sense as an
external project since it can't really function without externally
created resources -- what do you think?
Tim


On Sun, 2017-10-01 at 19:54 -0400, Alexandru Zbarcea wrote:
> Hi,
> 
> I was trying to write a UTest for
> org.apache.ctakes.coreference.data.PrintMimicMarkables (recently added),
> but I couldn't find any existing resources that can be used for this.
> Can anyone point me to a suitable resource (Lucene index) folder?
> 
> org.apache.ctakes.coreference.data.PrintMimicMarkables \
> /home/alex/projects/apache/ctakes/ctakes-dictionary-lookup-res/target/classes/org/apache/ctakes/dictionary/lookup/rxnorm_index \
> index.out
> 
> I was trying with the following lucene folder/resource:
> ./ctakes-coreference-res/src/main/resources/org/apache/ctakes/coreference/models/index_med_5k
> 
> And also the dictionaries:
> ./ctakes-dictionary-lookup-res/src/main/resources/org/apache/ctakes/dictionary/lookup/snomed-like_codes_sample
> ./ctakes-dictionary-lookup-res/src/main/resources/org/apache/ctakes/dictionary/lookup/assertion_cue_phrase_index
> ./ctakes-dictionary-lookup-res/src/main/resources/org/apache/ctakes/dictionary/lookup/OrangeBook
> ./ctakes-dictionary-lookup-res/src/main/resources/org/apache/ctakes/dictionary/lookup/snomed-like_sample
> ./ctakes-dictionary-lookup-res/src/main/resources/org/apache/ctakes/dictionary/lookup/drug_index
> 
> Every execution looks like:
> 01 Oct 2017 19:50:19  INFO ConstituencyParser - Initializing parser...
> Oct 01, 2017 7:50:20 PM org.apache.uima.collection.impl.cpm.engine.ArtifactProducer process
> WARNING: Got Exception. (Thread Name: [CollectionReader Thread]::) Message:
> docID must be >= 0 and < maxDoc=5000 (got docID=5000)
> Oct 01, 2017 7:50:20 PM org.apache.uima.collection.impl.cpm.engine.ArtifactProducer run(820)
> WARNING: docID must be >= 0 and < maxDoc=5000 (got docID=5000)
> java.lang.IllegalArgumentException: docID must be >= 0 and < maxDoc=5000 (got docID=5000)
> at org.apache.lucene.index.BaseCompositeReader.readerIndex(BaseCompositeReader.java:152)
> at org.apache.lucene.index.BaseCompositeReader.document(BaseCompositeReader.java:115)
> at org.apache.lucene.index.IndexReader.document(IndexReader.java:436)
> at org.apache.ctakes.core.cr.LuceneCollectionReader.getNext(LuceneCollectionReader.java:90)
> at org.apache.uima.collection.impl.cpm.engine.ArtifactProducer.readNext(ArtifactProducer.java:494)
> at org.apache.uima.collection.impl.cpm.engine.ArtifactProducer.run(ArtifactProducer.java:711)
> 
> Collection process complete called, closing file writer.
> 
> I appreciate any of your help,
> Alex
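
The exception in the trace above says a document with ID 5000 was requested from an index whose valid IDs run from 0 to 4999 (maxDoc=5000). Why the reader asked for that ID is not established in this thread. As a language-neutral illustration of the bound, here is a small Python sketch; BoundedDocReader and read_all are hypothetical names and this is not the LuceneCollectionReader code.

```python
class BoundedDocReader:
    """Hypothetical reader over an index whose doc IDs run from 0 to max_doc - 1."""

    def __init__(self, docs):
        self._docs = list(docs)
        self._next_id = 0

    @property
    def max_doc(self):
        return len(self._docs)

    def has_next(self):
        # Strictly less than: the ID equal to max_doc does not exist.
        return self._next_id < self.max_doc

    def get_next(self):
        if not self.has_next():
            raise IndexError(
                f"docID must be >= 0 and < maxDoc={self.max_doc} "
                f"(got docID={self._next_id})"
            )
        doc = self._docs[self._next_id]
        self._next_id += 1
        return doc


def read_all(reader):
    """Drain the reader, stopping before the first invalid doc ID."""
    docs = []
    while reader.has_next():
        docs.append(reader.get_next())
    return docs
```

If the real reader compared with "<=" or started counting at 1, the last call would land exactly on maxDoc, which would match the message in the log; that is a guess, not a diagnosis.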
