Apache cTAKES, java jdk 17 and UIMA 3

2024-07-16 Thread Finan, Sean
Hi all,

The ctakes version 6.0.0-SNAPSHOT in the main GitHub branch 
https://github.com/apache/ctakes has been upgraded to be compatible with Java 
17 and UIMA 3.

Please test your favorite workflows and let me know if you have any problems.

I would like to get an -update- release out soon.  This would be a release with 
no new functionality, just updates to bring ctakes into this decade.

Note:

If you are using IntelliJ and already have ctakes as a project you may 
experience a problem with the classes in the type system not being found.  To 
remedy this, after you update you may need to click the "Generate Sources and 
Update Folders for all Projects." button in the maven window (folder with 
circle/reload icon).  Then a clean compile should allow a run to find generated 
types.

Sean



Sean Finan
Research Computing Principal Engineer
Computational Health Informatics Program, Natural Language Processing Lab
Boston Children's Hospital
sean.fi...@tch.harvard.edu


Re: [EXTERNAL] Upgrade to JDK 17 Inquiry

2024-07-06 Thread Finan, Sean
Hi John,

Thank you for the information.  I didn't realize that hsqldb had changed its 
format again.  I will keep that in mind if and when another dictionary is 
released.

Sean

From: Petersam, John Contractor 
Sent: Friday, July 5, 2024 9:21 AM
To: dev@ctakes.apache.org 
Subject: RE: [EXTERNAL] Upgrade to JDK 17 Inquiry

* External Email - Caution *


Hi Sean,
I don't believe the migration from 17 to 21 required any changes at all.  The 
big change was getting to 13, and I did that back when 13 came out.  The only 
changes I've made to stay current since then are general library updates.  The 
only one I haven't been able to update is hsqldb because 2.7.3 isn't compatible 
with the files created with 2.3.4.  I'm planning on updating those in the 
August/September timeframe so I can check that box as well.

Thanks,
John

-Original Message-
From: Finan, Sean 
Sent: Wednesday, July 03, 2024 3:57 PM
To: dev@ctakes.apache.org
Subject: Re: [EXTERNAL] Upgrade to JDK 17 Inquiry

Hi John,

Would you be able to share some of the changes that you needed to make for 21?

Thanks,
Sean

From: Petersam, John Contractor 
Sent: Wednesday, July 3, 2024 3:45 PM
To: dev@ctakes.apache.org 
Subject: RE: [EXTERNAL] Upgrade to JDK 17 Inquiry

Hi Ryan,
I'm on Java 21 and will migrate to 23 when it comes out.  I upgraded it years 
ago, so I don't remember all the specifics of what I changed.  It's not that 
difficult, but does require making some code modifications as well as updating 
some dependencies so you will definitely need to roll up your sleeves a bit.

Thanks,
John

-Original Message-
From: Ryan Swenson 
Sent: Wednesday, July 03, 2024 3:40 PM
To: dev@ctakes.apache.org
Subject: [EXTERNAL] Upgrade to JDK 17 Inquiry

Hello,

We have a major application that uses Apache cTAKES to standardize clinical 
laboratory results.  It has been running on Java 8 under a security exception, 
and we now need to assess whether we can migrate to Java 17 or must move 
entirely to a new paradigm and approach.  Has anyone assessed the effort?  Are 
there any breaking changes, compatibility issues, or dependency version 
conflicts that need to be addressed to run under Java 17 and, soon, 22?

We are in favor of upgrading, but we can only maintain the security exception 
for so long.  If the upgrade is either too large an effort or a long way from 
being achieved, we will need to migrate soon.

Thanks,
Ryan


Re: [EXTERNAL] Upgrade to JDK 17 Inquiry

2024-07-03 Thread Finan, Sean
Hi John,

Would you be able to share some of the changes that you needed to make for 21?

Thanks,
Sean

From: Petersam, John Contractor 
Sent: Wednesday, July 3, 2024 3:45 PM
To: dev@ctakes.apache.org 
Subject: RE: [EXTERNAL] Upgrade to JDK 17 Inquiry

Hi Ryan,
I'm on Java 21 and will migrate to 23 when it comes out.  I upgraded it years 
ago, so I don't remember all the specifics of what I changed.  It's not that 
difficult, but does require making some code modifications as well as updating 
some dependencies so you will definitely need to roll up your sleeves a bit.

Thanks,
John

-Original Message-
From: Ryan Swenson 
Sent: Wednesday, July 03, 2024 3:40 PM
To: dev@ctakes.apache.org
Subject: [EXTERNAL] Upgrade to JDK 17 Inquiry

Hello,

We have a major application that uses Apache cTAKES to standardize clinical 
laboratory results.  It has been running on Java 8 under a security exception, 
and we now need to assess whether we can migrate to Java 17 or must move 
entirely to a new paradigm and approach.  Has anyone assessed the effort?  Are 
there any breaking changes, compatibility issues, or dependency version 
conflicts that need to be addressed to run under Java 17 and, soon, 22?

We are in favor of upgrading, but we can only maintain the security exception 
for so long.  If the upgrade is either too large an effort or a long way from 
being achieved, we will need to migrate soon.

Thanks,
Ryan


Re: Upgrade to JDK 17 Inquiry [EXTERNAL]

2024-07-03 Thread Finan, Sean
Hi Ryan,

I have recently been working on upgrades to ctakes focused around vulnerability 
mitigation, so your question is very timely.

The current version in the github master branch 
https://github.com/apache/ctakes , 6.0.0-SNAPSHOT, should build and run with 
java 17.

I don't know of any plans to upgrade ctakes to java 22 any time soon.

Please try the 6.0.0-SNAPSHOT with jdk17 and let me know if you have any 
problems.
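
Before testing the snapshot, it may help to confirm which JVM is actually 
launching the pipeline.  The following is a minimal sketch and not part of 
cTAKES: the class name JavaVersionCheck and the helper meetsMinimum are made up 
for illustration, and the minimum of 17 is taken from the message above.  
Runtime.version().feature() is standard JDK API (Java 10 and later).

```java
// Hypothetical helper, not part of cTAKES: reports whether the running JVM
// is at least the Java feature release that the 6.0.0-SNAPSHOT build targets.
final class JavaVersionCheck {

    // Minimum feature release, taken from the message above (Java 17).
    static final int MINIMUM_FEATURE = 17;

    // True when the given feature release satisfies the given minimum.
    static boolean meetsMinimum(int feature, int minimum) {
        return feature >= minimum;
    }

    public static void main(String[] args) {
        // Runtime.version().feature() returns e.g. 17 or 21 (since Java 10).
        int running = Runtime.version().feature();
        if (meetsMinimum(running, MINIMUM_FEATURE)) {
            System.out.println("Java " + running + " should be new enough.");
        } else {
            System.out.println("Java " + running + " is older than "
                    + MINIMUM_FEATURE + ".");
        }
    }
}
```

Running this with the same java executable that launches ctakes shows whether 
an old Java 8 installation is still first on the path.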

If you can contribute to the current ctakes vulnerability mitigation effort in 
any way, such as by providing a list of your exceptions, please do.

Thanks,

Sean





From: Ryan Swenson 
Sent: Wednesday, July 3, 2024 3:39 PM
To: dev@ctakes.apache.org 
Subject: Upgrade to JDK 17 Inquiry [EXTERNAL]

Hello,

We have a major application that uses Apache cTAKES to standardize clinical 
laboratory results.  It has been running on Java 8 under a security exception, 
and we now need to assess whether we can migrate to Java 17 or must move 
entirely to a new paradigm and approach.  Has anyone assessed the effort?  Are 
there any breaking changes, compatibility issues, or dependency version 
conflicts that need to be addressed to run under Java 17 and, soon, 22?

We are in favor of upgrading, but we can only maintain the security exception 
for so long.  If the upgrade is either too large an effort or a long way from 
being achieved, we will need to migrate soon.

Thanks,
Ryan


Apache cTAKES 5.1.0 has been released

2024-05-16 Thread Finan, Sean
Hi all,

I am pleased to announce that cTAKES 5.1.0 has been officially released.

Zip files containing the source code and a binary build are in the Release area 
on our GitHub site:
https://github.com/apache/ctakes/releases/tag/ctakes-5.1.0

cTAKES 5.1.0 jars are also accessible for maven dependencies.  For instance, 
the ctakes-clinical-pipeline:
https://central.sonatype.com/artifact/org.apache.ctakes/ctakes-clinical-pipeline

The main branch on the GitHub repository now contains version 6.0.0-SNAPSHOT.
https://github.com/apache/ctakes

Sean





Re: Fw: Please test the Apache cTAKES 5.1.0 release candidate [EXTERNAL]

2024-05-15 Thread Finan, Sean
Hi James,

That code is defunct and shouldn't be used.  I will take it out for the next 
release.

From: James Masanz 
Sent: Wednesday, May 15, 2024 4:05 PM
To: dev@ctakes.apache.org 
Subject: Re: Fw: Please test the Apache cTAKES 5.1.0 release candidate 
[EXTERNAL]

Hi all,

To get a clean environment, I started a new Windows Sandbox (on Windows
11), installed IntelliJ, and opened the downloaded ctakes sources as a
project.

Not sure if this was valid to try anymore - I tried to
run HelloWorldAggregatePipeline.class directly.

I haven't dug into this yet, but in case someone knows offhand - although
it completed with an exit code 0 and output a value for Polarity, there is
also this message included.

** Error: problem of opening/reading config file:
'file:\C:\Users\Public\cT5\ctakes-5.1.0-source-release\ctakes-5.1.0\ctakes-lvg\target\classes\org\apache\ctakes\lvg\data\config\lvg.properties'.
Use -x option to specify the config file path.

The lvg.properties file does exist in that directory.

The context for that message is below

I will do some other testing before I take a look at
HelloWorldAggregatePipeline again.

- James


15 May 2024 14:44:25  INFO LvgAnnotator - URL for lvg.properties
=/C:/Users/Public/cT5/ctakes-5.1.0-source-release/ctakes-5.1.0/ctakes-lvg/target/classes/org/apache/ctakes/lvg/data/config/lvg.properties
15 May 2024 14:44:26  INFO SentenceDetector - Sentence detector model file:
org/apache/ctakes/core/models/sentdetect/sd-med-model.zip
15 May 2024 14:44:26  INFO TokenizerAnnotatorPTB - Initializing
org.apache.ctakes.core.ae.TokenizerAnnotatorPTB
15 May 2024 14:44:26  INFO LvgCmdApiResourceImpl - Loading NLM Norm and Lvg
with config file =
file:\C:\Users\Public\cT5\ctakes-5.1.0-source-release\ctakes-5.1.0\ctakes-lvg\target\classes\org\apache\ctakes\lvg\data\config\lvg.properties
15 May 2024 14:44:26  INFO LvgCmdApiResourceImpl -   config file absolute
path =
C:\Users\Public\cT5\ctakes-5.1.0-source-release\ctakes-5.1.0\file:\C:\Users\Public\cT5\ctakes-5.1.0-source-release\ctakes-5.1.0\ctakes-lvg\target\classes\org\apache\ctakes\lvg\data\config\lvg.properties
15 May 2024 14:44:26  INFO LvgCmdApiResourceImpl - cwd =
C:\Users\Public\cT5\ctakes-5.1.0-source-release\ctakes-5.1.0
15 May 2024 14:44:26  INFO LvgCmdApiResourceImpl - cd
file:\C:\Users\Public\cT5\ctakes-5.1.0-source-release\ctakes-5.1.0\ctakes-lvg\target\classes\org\apache\ctakes\lvg\
** Configuration Error:
file:\C:\Users\Public\cT5\ctakes-5.1.0-source-release\ctakes-5.1.0\ctakes-lvg\target\classes\org\apache\ctakes\lvg\data\config\lvg.properties
(The filename, directory name, or volume label syntax is incorrect)
** Error: problem of opening/reading config file:
'file:\C:\Users\Public\cT5\ctakes-5.1.0-source-release\ctakes-5.1.0\ctakes-lvg\target\classes\org\apache\ctakes\lvg\data\config\lvg.properties'.
Use -x option to specify the config file path.
** Configuration Error:
file:\C:\Users\Public\cT5\ctakes-5.1.0-source-release\ctakes-5.1.0\ctakes-lvg\target\classes\org\apache\ctakes\lvg\data\config\lvg.properties
(The filename, directory name, or volume label syntax is incorrect)
** Error: problem of opening/reading config file:
'file:\C:\Users\Public\cT5\ctakes-5.1.0-source-release\ctakes-5.1.0\ctakes-lvg\target\classes\org\apache\ctakes\lvg\data\config\lvg.properties'.
Use -x option to specify the config file path.
15 May 2024 14:44:26  INFO LvgCmdApiResourceImpl - cd
C:\Users\Public\cT5\ctakes-5.1.0-source-release\ctakes-5.1.0
15 May 2024 14:44:26  INFO ContextDependentTokenizerAnnotator - Finite
state machines loaded.
15 May 2024 14:44:26  INFO POSTagger - POS tagger model file:
org/apache/ctakes/postagger/models/mayo-pos.zip
15 May 2024 14:44:26  INFO SentenceDetector - Starting processing.
15 May 2024 14:44:26  INFO TokenizerAnnotatorPTB - process(JCas) in
org.apache.ctakes.core.ae.TokenizerAnnotatorPTB
15 May 2024 14:44:26  INFO LvgAnnotator - process(JCas)
15 May 2024 14:44:26  INFO ContextDependentTokenizerAnnotator -
process(JCas)
15 May 2024 14:44:26  INFO POSTagger - process(JCas)
15 May 2024 14:44:26  INFO ConfigParameterExample - Token:Hello POS:NNP
15 May 2024 14:44:26  INFO ConfigParameterExample - Token:World POS:NNP
Entity: Hello === Polarity: 0
Entity: World === Polarity: 0

Process finished with exit code 0
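
The LvgCmdApiResourceImpl lines in the log above show the file: URL string 
being handled as if it were a relative path: it is appended to the working 
directory, and Windows then rejects the result.  For anyone who hits the same 
symptom in their own code, here is a minimal sketch of turning a well-formed 
file: URL into a filesystem path with the JDK.  It is not the cTAKES or LVG 
code; the class and method names are hypothetical.

```java
import java.net.URI;
import java.net.URISyntaxException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;

// Hypothetical helper, not cTAKES or LVG code.
final class FileUrlToPath {

    // Accepts either a well-formed file: URL or a plain path string
    // and returns a path that the local file system can open.
    static Path toPath(String location) throws URISyntaxException {
        if (location.startsWith("file:")) {
            // Paths.get(URI) converts the URL form into a platform path.
            return Paths.get(new URI(location));
        }
        return Paths.get(location);
    }

    public static void main(String[] args) throws Exception {
        // Stand-in for lvg.properties so the example runs anywhere.
        Path config = Files.createTempFile("lvg", ".properties");
        String asUrl = config.toUri().toString();

        System.out.println("URL form:  " + asUrl);
        System.out.println("Path form: " + toPath(asUrl));
        System.out.println("Readable:  " + Files.isReadable(toPath(asUrl)));

        Files.delete(config);
    }
}
```

Note that a string with backslashes after the scheme, like the file:\C:\... 
value in the log, is not a well-formed URI, so new URI(...) would reject it 
rather than convert it.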


On Wed, May 1, 2024 at 3:55 PM Finan, Sean
 wrote:

> Hi all,
>
> As you may have seen, the last 5.1.0 candidate had some issues.
>
> I have created a new 5.1.0 candidate, available here:
>
> https://repository.apache.org/content/repositories/staging/org/apache/ctakes/ctakes/5.1.0/
>
> As before, individual module jars are up two levels and in associated
> subdirectories.
>
> Hopefully this candidate fares better.  Please report any findings before
> next Mo

Re: Please test the Apache cTAKES 5.1.0 release candidate [EXTERNAL] [SUSPICIOUS] [SUSPICIOUS] [SUSPICIOUS] [SUSPICIOUS] [SUSPICIOUS]

2024-05-15 Thread Finan, Sean
Thanks Tim!


From: Miller, Timothy 
Sent: Wednesday, May 15, 2024 11:38 AM
To: dev@ctakes.apache.org 
Subject: Re: Please test the Apache cTAKES 5.1.0 release candidate [EXTERNAL] 
[SUSPICIOUS] [SUSPICIOUS] [SUSPICIOUS] [SUSPICIOUS] [SUSPICIOUS]

Thanks Sean,
I was able to get it working – definitely a user/documentation issue and not an 
issue with the code. Looks like a great release. I’m happy to vote for release 
+1.
Tim


From: Finan, Sean 
Date: Tuesday, May 14, 2024 at 10:35 AM
To: dev@ctakes.apache.org 
Subject: Re: Please test the Apache cTAKES 5.1.0 release candidate [EXTERNAL] 
[SUSPICIOUS] [SUSPICIOUS] [SUSPICIOUS] [SUSPICIOUS]

Ah - are you just running the class within intellij?  If so, you need to set 
the classpath in the run configuration to be ctakes-examples.  Otherwise the 
classpath doesn't contain anything from modules outside ctakes-gui and 
ctakes-core.

Alternatively, run the maven compile step with the "runPiperGui" profile 
selected.  That will also run the piper file submitter gui with the correct 
classpath.

Using a binary build, after running bin/getUmlsDictionary, running 
bin/runPiperSubmitter also works.

I don't want to do it for 5.1.0, but I should make names of the class, profile 
and script match.

I will check the wiki instructions and make sure that -exact- details are in 
there.

Sean


From: Miller, Timothy 
Sent: Tuesday, May 14, 2024 12:55 PM
To: dev@ctakes.apache.org 
Subject: Re: Please test the Apache cTAKES 5.1.0 release candidate [EXTERNAL] 
[SUSPICIOUS] [SUSPICIOUS] [SUSPICIOUS]

I can check out and build successfully with mvn from the command line. I can 
successfully open in intellij and run the piper file submitter. I get an error 
trying to run the default fast pipeline piper file:

Loading Piper File DefaultTokenizerPipeline ...


Error: MESSAGE LOCALIZATION FAILED: Can't find resource for bundle 
java.util.PropertyResourceBundle, key No Analysis Component found for 
ContextDependentTokenizerAnnotator



It doesn’t seem to be able to find the ContextDependentTokenizerAnnotator.
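
The wording of that error has two layers.  "Can't find resource for bundle 
java.util.PropertyResourceBundle, key ..." is the message the JDK produces when 
ResourceBundle.getString is asked for a key the bundle does not contain, so the 
real complaint is the text after "key": no analysis component was found for 
ContextDependentTokenizerAnnotator.  Below is a minimal sketch that reproduces 
the same message shape with plain JDK classes.  How UIMA or cTAKES reaches this 
point is an assumption and is not shown; the class and method names are 
hypothetical.

```java
import java.io.IOException;
import java.io.StringReader;
import java.util.MissingResourceException;
import java.util.PropertyResourceBundle;
import java.util.ResourceBundle;

// Illustration of JDK behaviour only; this is not UIMA or cTAKES code.
final class MissingBundleKeyDemo {

    // Looks up a key in a small in-memory bundle and returns the JDK's
    // failure message when the key is absent, or null when it is present.
    static String lookupFailure(String key) throws IOException {
        ResourceBundle bundle = new PropertyResourceBundle(
                new StringReader("known_key=Known message\n"));
        try {
            bundle.getString(key);
            return null;
        } catch (MissingResourceException e) {
            return e.getMessage();
        }
    }

    public static void main(String[] args) throws IOException {
        // A plain sentence passed where a bundle key is expected.
        System.out.println(lookupFailure(
                "No Analysis Component found for ContextDependentTokenizerAnnotator"));
    }
}
```

Read this way, the message points at a component that is not on the classpath 
rather than at a localization problem.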

Tim



From: Miller, Timothy 
Date: Tuesday, May 14, 2024 at 9:25 AM
To: dev@ctakes.apache.org 
Subject: Re: Please test the Apache cTAKES 5.1.0 release candidate [EXTERNAL] 
[SUSPICIOUS] [SUSPICIOUS]

What would you recommend for testing? Download the release tag to a clean 
system and try to do mvn compile and run some tests?
Tim


From: Finan, Sean 
Date: Thursday, May 2, 2024 at 6:57 AM
To: dev@ctakes.apache.org 
Subject: Re: Please test the Apache cTAKES 5.1.0 release candidate [EXTERNAL] 
[SUSPICIOUS]

Hi Gandhi,


  *   So post release I would be able to run mvn clean
install on web rest module rather than relying on resources in .m2 folder

The opposite.  Pre-release there are no jars on maven central, post-release 
there are.
Running 'mvn package' directly on the ctakes-web-rest project (in its directory)
or
running 'mvn package' on the ctakes -main- project (in the main ctakes root 
directory) with the web-rest-build profile enabled '-Pweb-rest-build'
will build the ctakes-web-rest.war web package.
That profile is defined in the main ctakes pom.
https://github.com/apache/ctakes/blob/main/pom.xml#L1074

Re: Please test the Apache cTAKES 5.1.0 release candidate [EXTERNAL] [SUSPICIOUS] [SUSPICIOUS] [SUSPICIOUS]

2024-05-14 Thread Finan, Sean
Ah - are you just running the class within intellij?  If so, you need to set 
the classpath in the run configuration to be ctakes-examples.  Otherwise the 
classpath doesn't contain anything from modules outside ctakes-gui and 
ctakes-core.

Alternatively, run the maven compile step with the "runPiperGui" profile 
selected.  That will also run the piper file submitter gui with the correct 
classpath.

Using a binary build, after running bin/getUmlsDictionary, running 
bin/runPiperSubmitter also works.

I don't want to do it for 5.1.0, but I should make names of the class, profile 
and script match.

I will check the wiki instructions and make sure that -exact- details are in 
there.

Sean


From: Miller, Timothy 
Sent: Tuesday, May 14, 2024 12:55 PM
To: dev@ctakes.apache.org 
Subject: Re: Please test the Apache cTAKES 5.1.0 release candidate [EXTERNAL] 
[SUSPICIOUS] [SUSPICIOUS] [SUSPICIOUS]

I can check out and build successfully with mvn from the command line. I can 
successfully open in intellij and run the piper file submitter. I get an error 
trying to run the default fast pipeline piper file:

Loading Piper File DefaultTokenizerPipeline ...


Error: MESSAGE LOCALIZATION FAILED: Can't find resource for bundle 
java.util.PropertyResourceBundle, key No Analysis Component found for 
ContextDependentTokenizerAnnotator



It doesn’t seem to be able to find the ContextDependentTokenizerAnnotator.

Tim



From: Miller, Timothy 
Date: Tuesday, May 14, 2024 at 9:25 AM
To: dev@ctakes.apache.org 
Subject: Re: Please test the Apache cTAKES 5.1.0 release candidate [EXTERNAL] 
[SUSPICIOUS] [SUSPICIOUS]

What would you recommend for testing? Download the release tag to a clean 
system and try to do mvn compile and run some tests?
Tim


From: Finan, Sean 
Date: Thursday, May 2, 2024 at 6:57 AM
To: dev@ctakes.apache.org 
Subject: Re: Please test the Apache cTAKES 5.1.0 release candidate [EXTERNAL] 
[SUSPICIOUS]

Hi Gandhi,


  *   So post release I would be able to run mvn clean
install on web rest module rather than relying on resources in .m2 folder

The opposite.  Pre-release there are no jars on maven central, post-release 
there are.
Running 'mvn package' directly on the ctakes-web-rest project (in its directory)
or
running 'mvn package' on the ctakes -main- project (in the main ctakes root 
directory) with the web-rest-build profile enabled '-Pweb-rest-build'
will build the ctakes-web-rest.war web package.
That profile is defined in the main ctakes pom.
https://github.com/apache/ctakes/blob/main/pom.xml#L1074

I added a little bit to your instructions in the ctakes-web-rest README  
https://github.com/apache/ctakes/blob/main/ctakes-web-rest/README

The lines here indirectly apply to pre-release builds:
https://github.com/apache/ctakes/blob/main/ctakes-web-rest/README#L22

Re: Please test the Apache cTAKES 5.1.0 release candidate [EXTERNAL] [SUSPICIOUS] [SUSPICIOUS] [SUSPICIOUS]

2024-05-14 Thread Finan, Sean
Hi Tim,

Thanks for testing, I'll look into this.

Sean

From: Miller, Timothy 
Sent: Tuesday, May 14, 2024 12:55 PM
To: dev@ctakes.apache.org 
Subject: Re: Please test the Apache cTAKES 5.1.0 release candidate [EXTERNAL] 
[SUSPICIOUS] [SUSPICIOUS] [SUSPICIOUS]

I can check out and build successfully with mvn from the command line. I can 
successfully open in intellij and run the piper file submitter. I get an error 
trying to run the default fast pipeline piper file:

Loading Piper File DefaultTokenizerPipeline ...


Error: MESSAGE LOCALIZATION FAILED: Can't find resource for bundle 
java.util.PropertyResourceBundle, key No Analysis Component found for 
ContextDependentTokenizerAnnotator



It doesn’t seem to be able to find the ContextDependentTokenizerAnnotator.

Tim



From: Miller, Timothy 
Date: Tuesday, May 14, 2024 at 9:25 AM
To: dev@ctakes.apache.org 
Subject: Re: Please test the Apache cTAKES 5.1.0 release candidate [EXTERNAL] 
[SUSPICIOUS] [SUSPICIOUS]

What would you recommend for testing? Download the release tag to a clean 
system and try to do mvn compile and run some tests?
Tim


From: Finan, Sean 
Date: Thursday, May 2, 2024 at 6:57 AM
To: dev@ctakes.apache.org 
Subject: Re: Please test the Apache cTAKES 5.1.0 release candidate [EXTERNAL] 
[SUSPICIOUS]

Hi Gandhi,


  *   So post release I would be able to run mvn clean
install on web rest module rather than relying on resources in .m2 folder

The opposite.  Pre-release there are no jars on maven central, post-release 
there are.
Running 'mvn package' directly on the ctakes-web-rest project (in its directory)
or
running 'mvn package' on the ctakes -main- project (in the main ctakes root 
directory) with the web-rest-build profile enabled '-Pweb-rest-build'
will build the ctakes-web-rest.war web package.
That profile is defined in the main ctakes pom.
https://github.com/apache/ctakes/blob/main/pom.xml#L1074

I added a little bit to your instructions in the ctakes-web-rest README  
https://github.com/apache/ctakes/blob/main/ctakes-web-rest/README

The lines here indirectly apply to pre-release builds:
https://github.com/apache/ctakes/blob/main/ctakes-web-rest/README#L22

The 5.1.0-SNAPSHOT version

Re: Resending without attachments., [EXTERNAL]

2024-05-09 Thread Finan, Sean
Hi all,

I don't think that I have any of the original communications, but just picking 
up here:


  *   From your dump, it looks as if the main concept dictionary is missing.

If you just need the standard dictionary, you can use the information on this 
ctakes wiki page:
https://github.com/apache/ctakes/wiki/cTAKES+UMLS+Package+Fetcher
The beginning of that page briefly outlines the UMLS key requirement.
If you don't want to build a binary distributable to run the script in bin/ you 
can execute the class org.apache.ctakes.gui.dictionary.DictionaryDownloader


  *   The dictionary in question is rather dated and intended to be a sample.  I 
found it here:

That dictionary is pretty old, and though it contains a lot of standard terms 
it is not "complete" for every purpose.
The dictionary on that github page is a copy of the ctakes dictionary.  We had 
to get specific permission to distribute any part of the umls, so by copying 
our dictionary in a public repo for redistribution this github group is doing a 
-bad thing-.  Please use the


  *   There are also models you may need, but not have.

Models for ctakes are in separate repositories.  When you build ctakes from the 
source obtained on github the models will automatically be downloaded from 
maven central.  Just for an example reference 
https://central.sonatype.com/artifact/org.apache.ctakes/ctakes-assertion-models


  *   *But first I recommend you get your license key and follow the instructions
about how to configure it into the WAR file.*

I think that I missed this part of the original communication.

I concur with what Peter said:
"you will continue to get a rather cryptic resource initialization
error until you've passed the API  key correctly."

For a quick "my first ctakes run", use the piper file submitter gui.  
https://github.com/apache/ctakes/wiki/Piper+File+Submitter
As you can see from the images on the wiki page, the default clinical pipeline 
does demand a key for the umls.  It can be entered on line 4 of the parameter 
table.  You'll notice that the value in the "Option" column, line 4, is 
"--key".   When you run ctakes through a command line, you can add the 
parameter --key followed by your umls key.
This older wiki page has a little information on the key  
https://cwiki.apache.org/confluence/display/CTAKES/cTAKES+4.0.0.1
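
For command-line runs that are launched from your own Java code, the --key 
parameter described above can be appended to the argument list 
programmatically.  This is a minimal sketch under stated assumptions: only the 
"--key" option name comes from this message; the class name UmlsKeyArgs, the 
helper withKey, and the environment variable UMLS_KEY are hypothetical.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical helper, not part of cTAKES.  Only the "--key" option name
// comes from the message above.
final class UmlsKeyArgs {

    // Returns a copy of the given arguments with "--key <umlsKey>" appended.
    static List<String> withKey(List<String> baseArgs, String umlsKey) {
        if (umlsKey == null || umlsKey.isBlank()) {
            throw new IllegalArgumentException("No UMLS key supplied");
        }
        List<String> result = new ArrayList<>(baseArgs);
        result.add("--key");
        result.add(umlsKey);
        return result;
    }

    public static void main(String[] args) {
        // UMLS_KEY is an assumed environment variable name, used here so the
        // key is not hard-coded in a script or checked into version control.
        String key = System.getenv("UMLS_KEY");
        if (key == null || key.isBlank()) {
            System.out.println("Set UMLS_KEY first.");
            return;
        }
        // Any other options your pipeline needs are passed as program arguments.
        List<String> full = withKey(Arrays.asList(args), key);
        System.out.println("Argument count with key: " + full.size());
    }
}
```

Reading the key from the environment keeps it out of shell history and shared 
scripts; entering it in the gui parameter table works just as well for a first 
run.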


Sean



From: Peter Abramowitsch 
Sent: Thursday, May 9, 2024 5:07 PM
To: dev@ctakes.apache.org ; 
joel-paul.jeripoth...@achalahealth.com 
Subject: Resending without attachments., [EXTERNAL]

Shifting this thread back to the main ctakes thread where it belongs...

Hi Joel,

From your dump, it looks as if the main concept dictionary is missing.

*"No Resource at
resources/org/apache/ctakes/dictionary/lookup/fast/sno_rx_16ab/sno_rx_16ab.script"*

It's currently configured to run with a standard but older dictionary.  But
first we need to establish whether you have a UMLS api-key that gives you
access to use that vocabulary resource.  If not, here's where to begin
https://documentation.uts.nlm.nih.gov/rest/authentication.html

The dictionary in question is rather dated and intended to be a sample.  I
found it here:
https://github.com/CDCgov/NLPWorkbench/blob/master/ctakes-patch/resources/org/apache/ctakes/dictionary/lookup/fast/sno_rx_16ab/sno_rx_16ab.script
Once you have your UMLS license you can also download the entire UMLS
vocabulary resource onto your machine, then run the cTakes Dictionary
Creator application  to build the vocabulary you need.  It selectively
fetches the parts you want from the UMLS files and builds a database for
use in cTakes.  I think most cTakes users build their own dictionaries
after they've become familiar with the application.

There are also models you may need, but not have.  These large binary
objects got shifted when the source was transferred onto GitHub and I'm not
sure where they are stored now.  Others on this thread will know.

*But first I recommend you get your license key and follow the instructions
about how to configure it into the WAR file.*  I haven't used that module
before and it's probably been a decade since I last used Apache Tomcat.  In
any case, you will continue to get a rather cryptic resource initialization
error until you've passed the API key correctly.

I'm about to head off to Europe, so you may need to lean on another
resource to get started.  That's why I've cc'd the ctakes thread and you
can take it from there.

Peter


Re: Remaining Maven errors visible in Eclipse [EXTERNAL]

2024-05-06 Thread Finan, Sean
Hi Peter,

Thanks again for testing.  I didn't have a problem with that ctakes.version in 
ytex-web, but I stuck a definition of it in the ytex-web pom just in case.  It 
does the same thing as using parent.version, but in case we ever change the 
parent I went with an explicit definition of ctakes.version.

Do you have a listing of the rest of the errors reported by Eclipse?  I use 
IntelliJ and while I do get a bunch of version warnings I don't get any errors.  
I think that ytex-web would need a fair amount of code overhaul to get rid of 
them, and unless there is a major demand from the community I don't know that 
it is worth doing.  Personally, I'd like to put ytex-web in the attic and refer 
to ctakes-web-rest as a replacement.  Perhaps we can do that in ctakes 6?

Thanks,

Sean


From: Peter Abramowitsch 
Sent: Sunday, May 5, 2024 4:48 PM
To: dev@ctakes.apache.org 
Subject: Remaining Maven errors visible in Eclipse [EXTERNAL]

Hi Sean, there are some minor 5.1.0 Maven glitches picked up by Eclipse,
one of which I can fix and others not.

in ctakes-ytex-web's pom.xml, I changed *ctakes*.version to *parent*.version.
I have not checked it in, in case it wasn't the right thing to do, but it
made the error go away.



<dependency>
  <groupId>org.apache.ctakes</groupId>
  <artifactId>ctakes-user-resources</artifactId>
  <version>${parent.version}</version>
</dependency>



But every pom except the master pom, shows this error:

"Cannot parse lifecycle mapping for maven project Maven Project"

In fact  there is no lifecycle mapping file.

I looked at various solutions online and none of them worked, including
creating a dummy mapping and including it in the project - all it did was
insert a blank line between every line of the master pom!  *Che palle*, as
they say in Italian

I'm pretty sure it's not my Eclipse installation because my own Maven
projects (admittedly smaller) don't show this error.

Is anyone else seeing a red 'x' error next to every pom in the source tree
in Eclipse?

Eclipse Version: 2023-12 (4.30.0)
M2E - Maven Integration for Eclipse 2.6.0.20240220-1109
org.eclipse.m2e.feature.feature.group Eclipse.org - m2e

Peter


Re: Mastif Zoner is there now [EXTERNAL]

2024-05-02 Thread Finan, Sean
Hi Peter,

I am glad that mastif is now in the distribution.  Yes, testing the pre-release 
is a little rough, but once released everything is streamlined and -easy-.  
After the release there shouldn't be any of the funny business of cleaning, 
grabbing a tag, installing local jars, etc.

Thanks again,

Sean

From: Peter Abramowitsch 
Sent: Thursday, May 2, 2024 3:11 PM
To: dev@ctakes.apache.org 
Subject: Mastif Zoner is there now [EXTERNAL]

Hi Sean

I did a clean build, also removing the mastif zoner library from my maven
cache.  It does get into the distribution now.

My git branch got a bit confused when I tried to merge the tag into it.
But by destroying my branch and using git switch -c to create a new one off the
5.1.0 tag it seemed to do the right thing.  I guess when 5.1.0 is merged
into main, that won't be an issue.
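
For anyone hitting the same branch confusion, the reset described above might look roughly like the sketch below.  It is only an illustration under assumptions: the branch name, the remote name "origin", and the run() dry-run helper are hypothetical, and the tag name ctakes-5.1.0 is taken from the release-candidate announcement.  By default it only prints the git commands.

```shell
#!/bin/sh
# Sketch: recreate a local test branch from the release-candidate tag.
# BRANCH, the remote name "origin" and run() are hypothetical choices.
BRANCH="${BRANCH:-rc-test}"
TAG="ctakes-5.1.0"

# Print the command when DRY_RUN=1 (the default here); otherwise execute it.
run() {
  if [ "${DRY_RUN:-1}" = "1" ]; then
    echo "+ $*"
  else
    "$@"
  fi
}

recreate_branch_from_tag() {
  run git fetch --tags origin         # make sure the tag exists locally
  run git switch --detach "$TAG"      # step off the branch before deleting it
  run git branch -D "$BRANCH"         # destroy the confused branch
  run git switch -c "$BRANCH" "$TAG"  # new branch starting at the tag
}

recreate_branch_from_tag
```

Set DRY_RUN=0 to actually run the commands; note that git branch -D discards any unmerged local commits on that branch.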

Peter


Re: Please test the Apache cTAKES 5.1.0 release candidate [EXTERNAL]

2024-05-02 Thread Finan, Sean
Hi Gandhi,


  *   So post release I would be able to run mvn clean
install on web rest module rather than relying on resources in .m2 folder

The opposite.  Pre-release there are no jars on maven central, post-release 
there are.
Running 'mvn package' directly on the ctakes-web-rest project (in its directory)
or
running 'mvn package' on the ctakes -main- project (in the main ctakes root 
directory) with the web-rest-build profile enabled '-Pweb-rest-build'
will build the ctakes-web-rest.war web package.
That profile is defined in the main ctakes pom.
https://github.com/apache/ctakes/blob/main/pom.xml#L1074

I added a little bit to your instructions in the ctakes-web-rest README  
https://github.com/apache/ctakes/blob/main/ctakes-web-rest/README

The lines here indirectly apply to pre-release builds:
https://github.com/apache/ctakes/blob/main/ctakes-web-rest/README#L22

The 5.1.0-SNAPSHOT version of ctakes-web-rest has a dependency on the 5.1.0 
version of ctakes modules (not the SNAPSHOT).
https://github.com/apache/ctakes/blob/main/ctakes-web-rest/pom.xml#L14

The pre-release basically contains an equivalent to "changed code or resources" 
in that the code and resources in the pre-release do not exist on maven 
central, which is where a maven build would normally get them.
When maven builds the pre-release it will not be able to find version 5.1.0 of 
any jars through maven central, so it will look for them in your local .m2 
directory.
Maven puts the 5.1.0 jars in your .m2 directory when you run 'mvn install' on 
the main ctakes project.

In summary,
To build ctakes-web-rest to test the pre-release war, one must run 'mvn 
install' on the ctakes main project before they run 'mvn package' on the 
ctakes-web-rest project (or on the main project's web-rest-build profile).
To build ctakes-web-rest once ctakes 5.1.0 has been released, the extra 
preliminary step of running 'mvn install' will not be necessary.
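
The build order summarized above can be sketched as a small shell script.  This is a sketch under assumptions, not an official build script: CTAKES_ROOT, DRY_RUN and the run() helper are hypothetical names introduced here, and -f is used only to point Maven at a specific pom instead of changing directories.  The goals and the -Pweb-rest-build profile are the ones named in this message.  By default the commands are only printed.

```shell
#!/bin/sh
# Sketch of the build order for ctakes-web-rest before and after a release.
# CTAKES_ROOT, DRY_RUN and run() are hypothetical names for illustration.
CTAKES_ROOT="${CTAKES_ROOT:-.}"

# Print the command when DRY_RUN=1 (the default here); otherwise execute it.
run() {
  if [ "${DRY_RUN:-1}" = "1" ]; then
    echo "+ $*"
  else
    "$@"
  fi
}

# $1 = "pre"  (5.1.0 jars are not yet on maven central) or
#      "post" (5.1.0 has been released)
build_web_rest() {
  if [ "$1" = "pre" ]; then
    # Pre-release only: install the 5.1.0 module jars into the local .m2.
    run mvn -f "$CTAKES_ROOT/pom.xml" install || return 1
  fi
  # Build the war from the ctakes-web-rest project itself ...
  run mvn -f "$CTAKES_ROOT/ctakes-web-rest/pom.xml" package
  # ... or, alternatively, through the profile on the main project:
  # run mvn -f "$CTAKES_ROOT/pom.xml" package -Pweb-rest-build
}

build_web_rest pre
```

Calling build_web_rest post skips the install step, which matches the post-release case where Maven can fetch the 5.1.0 jars from maven central.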


  *   If you have some time this week, we can connect to understand what 
exactly is the problem.

I can meet you tomorrow evening your time (4-7 pm IST) to work with you on the 
SQL problem.  If you'd rather keep your Friday night to yourself, I can work 
with the same time slot any time through next Monday evening.

Before the 6.0.0 release I will put some Release Manager information in the 
wiki.  The maven release process using a GitHub repo requires a little trick 
that took me a long time to figure out, and the pre-release testing deserves 
some recorded documentation.

Sean




From: gandhi rajan 
Sent: Thursday, May 2, 2024 1:42 AM
To: dev@ctakes.apache.org 
Subject: Re: Please test the Apache cTAKES 5.1.0 release candidate [EXTERNAL]

Hi Sean,

Thanks for the update. So post release I would be able to run mvn clean
install on the web rest module rather than relying on resources in the .m2
folder, you mean? In fact I was trying to build them on a machine which doesn't
have any historic jars in the .m2 folder and that's why it was failing.
And the ytex issue still remains a mystery to me. If you have some time this
week, we can connect to understand what exactly is the problem.

On Thu, 2 May 2024 at 02:32, Finan, Sean
 wrote:

> Hi Gandhi,
>
> I can build the web-rest module.  I should have mentioned that to build
> any of the rest projects you need to run mvn install.  As the rest requires
> 5.1.0 module jars and they don't exist externally (pre-release), maven must
> be able to fetch them from your .m2 directory.
>
> I haven't been able to duplicate the ytex problems that you see and don't
> know what might be causing them.
>
> Sean
>
> 
> From: gandhi rajan 
> Sent: Tuesday, April 30, 2024 11:18 AM
> To: dev@ctakes.apache.org 
> Subject: Re: Please test the Apache cTAKES 5.1.0 release candidate
> [EXTERNAL]
>
> Hi Peter,
>
> Thanks for the response. I don't think the generate test action is trying to
> use mysql but hsql DB. Anyways, I am able to build other modules apart from
> ytex and ytex-uima module.
>
> Sean, did you try building ctakes-web-rest module by any chance? It seems
> to be broken in my case.
>
> On Tue, 30 Apr 2024 at 01:28, Peter Abramowitsch 
> wrote:
>
> > Hi Gandhi,  I think the email from Jeff Painter may explain your
> > situation.  It's a question of your version of mysql being new.   The
> > crucial lines in your trace are:
> >
> > org.apache.ctakes.jdl.AppMain.main(AppMain.java:84)
> > [INFO]  [java] Caused by:
> > java.lang.reflect.InaccessibleObjectException: Unable to make protected
> > final java.lang.Class
> > java.lang.ClassLoader.defineClass(java.lang.String,byte[],int,int) throws
> > java.lang.ClassFormatError accessible: module java.base does not "opens
> > 

Re: Please test the Apache cTAKES 5.1.0 release candidate [EXTERNAL]

2024-05-01 Thread Finan, Sean
Hi Peter,

I think that I have the ctakes-mastif-zoner module behavior as desired.  Let me 
know if you have any problems with the new candidate.

Sean

From: Peter Abramowitsch 
Sent: Friday, April 26, 2024 11:41 PM
To: dev@ctakes.apache.org 
Subject: Re: Please test the Apache cTAKES 5.1.0 release candidate [EXTERNAL]

Hi Sean,

It all compiles, but one of the jars is missing from the distribution.
It's the one I added:  ctakes-mastif-zoner which is required if you're
going to use the Zone Annotator.

It's in the master pom, and in the pom of ctakes-distribution, and the jar
got built in its project, but it's not scooped up into the distribution.
I'm not sure where else to look.  Can you fix it?

Peter


On Fri, Apr 26, 2024 at 8:59 AM Finan, Sean
 wrote:

> Hi all,
>
> There is a candidate for version 5.1.0 of Apache cTAKES source code in a
> staging repository:
>
> https://repository.apache.org/content/repositories/staging/org/apache/ctakes/ctakes/5.1.0/
>
> The code is contained within the file:
> ctakes-5.1.0-source-release.zip<
> https://repository.apache.org/content/repositories/staging/org/apache/ctakes/ctakes/5.1.0/ctakes-5.1.0-source-release.zip
> >
>
> I welcome you all to test your favorite pipeline(s) and report any issues.
> I am calling a vote from the PMC to finish by 12:nn Eastern time, next
> Wednesday May 1.  Please report any issues before that time.  If any
> 'road-block' issues are found they will need to be addressed before a
> release.
>
> Thank you,
> Sean
>
>
> p.s.
>
> The 5.1.0 candidate is based upon the source code in the ctakes-5.1.0 tag:
> https://github.com/apache/ctakes/releases/tag/ctakes-5.1.0
>
> The ctakes-5.1.0 tag was made from the 5.1.0 branch:
> https://github.com/apache/ctakes/tree/5.1.0
>
> The 5.1.0 branch is a copy of the main branch:
> https://github.com/apache/ctakes/tree/main
> The version number in the 5.1.0 branch is different, but there are no code
> differences between the two branches.
>
>
>


Re: Please test the Apache cTAKES 5.1.0 release candidate [EXTERNAL]

2024-05-01 Thread Finan, Sean
; > without tests.
> > > >
> > > > However, check my previous email about your issue.  Whereas you'd
> > > narrowed
> > > > it down to a script, I found a line in your email which showed the
> > error
> > > > within that script's execution:  A java  program: jdl running as
> > App.Main
> > > > threw an assertion on one of the tasks connected with the mysql
> > database
> > > it
> > > > was trying to configure.  You could put some debugging statements in
> > > there
> > > > to see which one.
> > > >
> > > > Peter
> > > >
> > > > On Mon, Apr 29, 2024 at 4:55 AM gandhi rajan <
> gandhiraja...@gmail.com>
> > > > wrote:
> > > >
> > > >> Thanks for the insights Peter. I dint make it clear that I did ran
> the
> > > >> install on ytex module with test case execution toggled off. I used
> > the
> > > >> following command - "mvn -e clean install -Dmaven.test.skip=true"
> and
> > I
> > > >> still hit the same error.
> > > >>
> > > >> On digging deep, I could find that the build process is trying to
> > > execute
> > > >> ""
> in
> > > >> build-main.xml which in turn is trying to invoke the following
> target
> > in
> > > >> build.setup.xml:
> > > >>
> > > >>  > > >> depends="generateTestYtexProperties,templateToConfig,deleteTestDb">
> > > >> 
> > > >> 
> > > >>
> > > >> Did you try running this on a fresh setup Peter?
> > > >>
> > > >> On Sun, 28 Apr 2024 at 01:17, Peter Abramowitsch <
> > > pabramowit...@gmail.com
> > > >> >
> > > >> wrote:
> > > >>
> > > >> > Hi Gandhi
> > > >> > Your error appears to be at this line
> > > >> >
> > > >> >
> > C:\Gandhi\Project\ctakes-5.1.0\ctakes-ytex\scripts\data\build.xml:456:
> > > >> Java
> > > >> > returned: 1
> > > >> >
> > > >> > A test application being run here:  AppMain is in charge of
> loading
> > a
> > > >> > temporary mysqldb that is used to test that part of ytex.   For me
> > it
> > > is
> > > >> > working, but if  you can find a way to run that surefire test in
> the
> > > >> > debugger, you can find out why it's failing on one of the
> > assertions.
> > > >> > Otherwise you can  take this shortcut
> > > >> >
> > > >> > mvn  -Dmaven.test.skip=true
> > > >> >
> > > >> > To build the project without running any tests.
> > > >> >
> > > >> > On Sat, Apr 27, 2024 at 7:35 AM gandhi rajan <
> > gandhiraja...@gmail.com
> > > >
> > > >> > wrote:
> > > >> >
> > > >> > > Hi Sean,
> > > >> > >
> > > >> > > When I tried to build the complete ctakes suite, i get build
> > failure
> > > >> for
> > > >> > > ctakes-ytex module with the following error:
> > > >> > >
> > > >> > > [ERROR] Failed to execute goal
> > > >> > > org.apache.maven.plugins:maven-antrun-plugin:3.1.0:run
> > > >> > > (generate-test-config) on project ctakes-ytex: An Ant
> > BuildException
> > > >> has
> > > >> > > occured: The following error occurred while executing this line:
> > > >> > > [ERROR]
> > > >> > >
> > > >>
> > C:\Gandhi\Project\ctakes-5.1.0\ctakes-ytex\scripts\build-setup.xml:149:
> > > >> > The
> > > >> > > following error occurred while executing this line:
> > > >> > > [ERROR]
> > > >> > >
> > > C:\Gandhi\Project\ctakes-5.1.0\ctakes-ytex\scripts\data\build.xml:148:
> > > >> > The
> > > >> > > following error occurred while executing this line:
> > > >> > > [ERROR]
> > > >> > >
> > > C:\Gandhi\Project\ctakes-5.1.0\ctakes-ytex\scripts\data\build.xml:295:
> > > >> > The
> > > >> > > following error occurred while executing this line:
> > > >> > > [ERROR]
> 

Fw: Please test the Apache cTAKES 5.1.0 release candidate

2024-05-01 Thread Finan, Sean
Hi all,

As you may have seen, the last 5.1.0 candidate had some issues.

I have created a new 5.1.0 candidate, available here:
https://repository.apache.org/content/repositories/staging/org/apache/ctakes/ctakes/5.1.0/

As before, individual module jars are up two levels and in associated 
subdirectories.

Hopefully this candidate fares better.  Please report any findings before next 
Monday,  May 6th.

Thank you,

Sean

p.s.
If you test build any of the rest projects (e.g. ctakes-web-rest) or build an 
installation with Dockhand, you must first run mvn install.  Those builds 
require ctakes module jars to exist where they can be fetched, and as ctakes 
5.1.0 will not be available through maven central before a release, the jars 
must be in your .m2 directory.






From: Finan, Sean
Sent: Friday, April 26, 2024 11:58 AM
To: dev@ctakes.apache.org 
Subject: Please test the Apache cTAKES 5.1.0 release candidate

Hi all,

There is a candidate for version 5.1.0 of Apache cTAKES source code in a 
staging repository:
https://repository.apache.org/content/repositories/staging/org/apache/ctakes/ctakes/5.1.0/

The code is contained within the file:
ctakes-5.1.0-source-release.zip<https://repository.apache.org/content/repositories/staging/org/apache/ctakes/ctakes/5.1.0/ctakes-5.1.0-source-release.zip>

I welcome you all to test your favorite pipeline(s) and report any issues.
I am calling a vote from the PMC to finish by 12:nn Eastern time, next 
Wednesday May 1.  Please report any issues before that time.  If any 
'road-block' issues are found they will need to be addressed before a release.

Thank you,
Sean


p.s.

The 5.1.0 candidate is based upon the source code in the ctakes-5.1.0 tag:
https://github.com/apache/ctakes/releases/tag/ctakes-5.1.0

The ctakes-5.1.0 tag was made from the 5.1.0 branch:
https://github.com/apache/ctakes/tree/5.1.0

The 5.1.0 branch is a copy of the main branch:
https://github.com/apache/ctakes/tree/main
The version number in the 5.1.0 branch is different, but there are no code 
differences between the two branches.




Re: Please test the Apache cTAKES 5.1.0 release candidate [EXTERNAL]

2024-04-29 Thread Finan, Sean
Hi Gandhi, Peter,

I am on Windows and the tests work fine.  They must all run for the release 
phases.

Peter,

I found the problem.  In the candidate the -distribution bin definition was 
missing mastif-zoner.  I added it and all looks good except that the xml files 
aren't there.  I am putting them in a src/user/resources/ directory to match 
other projects and adding all of the requisites for that paradigm.  That will 
copy them to resources/ in a source compile / package and binary distribution 
zip(s).  It will place them in the ctakes-user-resources.jar for use as a 
dependency in maven projects.  --> If you would rather they be in the 
mastif-zoner jar then that isn't a problem, just let me know and I'll do it 
that way.

Sean


From: Peter Abramowitsch 
Sent: Monday, April 29, 2024 11:50 AM
To: dev@ctakes.apache.org 
Subject: Re: Please test the Apache cTAKES 5.1.0 release candidate [EXTERNAL]

I think this is the class where Java is exiting with 1
/ctakes-ytex/src/test/java/org/apache/ctakes/jdl/AppMainTest.java

btw my environment is MacOS and I notice yours is Windows, so the root
cause why this class is giving you trouble is something I wouldn't be able
to help you with.  But some debug statements rather than asserts would tell
you, I think.

Peter

On Mon, Apr 29, 2024 at 8:43 AM Peter Abramowitsch 
wrote:

> Hi Gandhi
> This project is an odd one in the sense that when you tell it to skip the
> tests, it still goes through the effort in building up the db environment
> that the tests would use.  But in any case, for me it does build either
> way.  In the attached log, I've run a maven clean before doing the build
> without tests.
>
> However, check my previous email about your issue.  Whereas you'd narrowed
> it down to a script, I found a line in your email which showed the error
> within that script's execution:  A java  program: jdl running as App.Main
> threw an assertion on one of the tasks connected with the mysql database it
> was trying to configure.  You could put some debugging statements in there
> to see which one.
>
> Peter
>
> On Mon, Apr 29, 2024 at 4:55 AM gandhi rajan 
> wrote:
>
>> Thanks for the insights Peter. I didn't make it clear that I ran the
>> install on the ytex module with test case execution toggled off. I used the
>> following command - "mvn -e clean install -Dmaven.test.skip=true" and I
>> still hit the same error.
>>
>> On digging deep, I could find that the build process is trying to execute
>> "" in
>> build-main.xml which in turn is trying to invoke the following target in
>> build.setup.xml:
>>
>> > depends="generateTestYtexProperties,templateToConfig,deleteTestDb">
>> 
>> 
>>
>> Did you try running this on a fresh setup Peter?
>>
>> On Sun, 28 Apr 2024 at 01:17, Peter Abramowitsch
>> wrote:
>>
>> > Hi Gandhi
>> > Your error appears to be at this line
>> >
>> > C:\Gandhi\Project\ctakes-5.1.0\ctakes-ytex\scripts\data\build.xml:456:
>> Java
>> > returned: 1
>> >
>> > A test application being run here:  AppMain is in charge of loading a
>> > temporary mysqldb that is used to test that part of ytex.   For me it is
>> > working, but if  you can find a way to run that surefire test in the
>> > debugger, you can find out why it's failing on one of the assertions.
>> > Otherwise you can  take this shortcut
>> >
>> > mvn  -Dmaven.test.skip=true
>> >
>> > To build the project without running any tests.
>> >
>> > On Sat, Apr 27, 2024 at 7:35 AM gandhi rajan 
>> > wrote:
>> >
>> > > Hi Sean,
>> > >
>> > > When I tried to build the complete ctakes suite, i get build failure
>> for
>> > > ctakes-ytex module with the following error:
>> > >
>> > > [ERROR] Failed to execute goal
>> > > org.apache.maven.plugins:maven-antrun-plugin:3.1.0:run
>> > > (generate-test-config) on project ctakes-ytex: An Ant BuildException
>> has
>> > > occured: The following error occurred while executing this line:
>> > > [ERROR]
>> > >
>> C:\Gandhi\Project\ctakes-5.1.0\ctakes-ytex\scripts\build-setup.xml:149:
>> > The
>> > > following error occurred while executing this line:
>> > > [ERROR]
>> > > C:\Gandhi\Project\ctakes-5.1.0\ctakes-ytex\scripts\data\build.xml:148:
>> > The
>> > > following error occurred while executing this line:
>> > > [ERROR]
>> > > C:\Gandhi\Project\ctakes-5.1.0\ctakes-ytex\scrip

Re: Please test the Apache cTAKES 5.1.0 release candidate [EXTERNAL]

2024-04-29 Thread Finan, Sean
Hi Gandhi,

Thank you for testing.  I have not seen this error but will try to see if I can 
reproduce it or otherwise diagnose it.

Before I build the release candidate I make sure that my build area, maven 
cache, temp directories, etc. are empty, but maybe I still have something left 
from a previous build.

Sean

From: gandhi rajan 
Sent: Monday, April 29, 2024 7:54 AM
To: dev@ctakes.apache.org 
Subject: Re: Please test the Apache cTAKES 5.1.0 release candidate [EXTERNAL]

Thanks for the insights Peter. I didn't make it clear that I ran the
install on the ytex module with test case execution toggled off. I used the
following command - "mvn -e clean install -Dmaven.test.skip=true" and I
still hit the same error.

On digging deep, I could find that the build process is trying to execute
"" in
build-main.xml which in turn is trying to invoke the following target in
build.setup.xml:





Did you try running this on a fresh setup Peter?

On Sun, 28 Apr 2024 at 01:17, Peter Abramowitsch 
wrote:

> Hi Gandhi
> Your error appears to be at this line
>
> C:\Gandhi\Project\ctakes-5.1.0\ctakes-ytex\scripts\data\build.xml:456: Java
> returned: 1
>
> A test application being run here:  AppMain is in charge of loading a
> temporary mysqldb that is used to test that part of ytex.   For me it is
> working, but if  you can find a way to run that surefire test in the
> debugger, you can find out why it's failing on one of the assertions.
> Otherwise you can  take this shortcut
>
> mvn  -Dmaven.test.skip=true
>
> To build the project without running any tests.
>
> On Sat, Apr 27, 2024 at 7:35 AM gandhi rajan 
> wrote:
>
> > Hi Sean,
> >
> > When I tried to build the complete ctakes suite, I get a build failure for
> > ctakes-ytex module with the following error:
> >
> > [ERROR] Failed to execute goal
> > org.apache.maven.plugins:maven-antrun-plugin:3.1.0:run
> > (generate-test-config) on project ctakes-ytex: An Ant BuildException has
> > occured: The following error occurred while executing this line:
> > [ERROR]
> > C:\Gandhi\Project\ctakes-5.1.0\ctakes-ytex\scripts\build-setup.xml:149: The
> > following error occurred while executing this line:
> > [ERROR]
> > C:\Gandhi\Project\ctakes-5.1.0\ctakes-ytex\scripts\data\build.xml:148: The
> > following error occurred while executing this line:
> > [ERROR]
> > C:\Gandhi\Project\ctakes-5.1.0\ctakes-ytex\scripts\data\build.xml:295: The
> > following error occurred while executing this line:
> > [ERROR]
> > C:\Gandhi\Project\ctakes-5.1.0\ctakes-ytex\scripts\data\build.xml:456: Java
> > returned: 1
> > [ERROR] around Ant part ... > target="test.setup">... @ 5:70 in
> > C:\Gandhi\Project\ctakes-5.1.0\ctakes-ytex\target\antrun\build-main.xml
> >
> > Is this expected Sean?
> >
> > On Fri, 26 Apr 2024 at 21:30, Finan, Sean
> >  wrote:
> >
> > > Hi all,
> > >
> > > There is a candidate for version 5.1.0 of Apache cTAKES source code in a
> > > staging repository:
> > >
> > > https://repository.apache.org/content/repositories/staging/org/apache/ctakes/ctakes/5.1.0/
> > >
> > > The code is contained within the file:
> > > ctakes-5.1.0-source-release.zip<
> > > https://repository.apache.org/content/repositories/staging/org/apache/ctakes/ctakes/5.1.0/ctakes-5.1.0-source-release.zip
> > > >
> > >
> > > I welcome you all to test your favorite pipeline(s) and report any issues.
> > > I am calling a vote from the PMC to finish by 12:nn Eastern time, next
> > > Wednesday May 1.  Please report any issues before that time.  If any
> > > 'road-block' issues are found they will need to be addressed before a
> > > release.
> > >
> > > Thank you,
> > > Sean
> > >
> > >
> > > p.s.
> > >
> > > The 5.1.0 candidate is based upon the source code in the ctakes-5.1.0 tag:
> > > https://github.com/apache/ctakes/releases/tag/ctakes-5.1.0
> > >
> > > The ctakes-5.1.0 tag was made from the 5.1.0 branch:
> > >

Re: Please test the Apache cTAKES 5.1.0 release candidate [EXTERNAL]

2024-04-29 Thread Finan, Sean
Hi Peter,

Thank you for testing!  I will see if I can get the mastif-zoner in the 
distribution and push a 5.1.1 candidate.


From: Peter Abramowitsch 
Sent: Saturday, April 27, 2024 1:48 AM
To: dev@ctakes.apache.org 
Subject: Re: Please test the Apache cTAKES 5.1.0 release candidate [EXTERNAL]

Hi again Sean
Perfect Compile
Within our context and our pipeline, it runs well.
Tried with simple and complex pipelines.
I have not used most of the piperRunner/Creator/ scripts.
I haven't exercised any of the PBJ stuff yet.
I don't use the REST projects or the YTEX DB stuff - we have our own

Apart from the missing project I mentioned in the previous email that does
need to be fixed, I would give 5.1.0 a plus for release.

Peter

On Fri, Apr 26, 2024 at 8:41 PM Peter Abramowitsch 
wrote:

> Hi Sean,
>
> It all compiles, but one of the jars is missing from the distribution.
> It's the one I added:  ctakes-mastif-zoner which is required if you're
> going to use the Zone Annotator.
>
> It's in the master pom, and in the pom of ctakes-distribution, and the jar
> got built in its project, but it's not scooped up into the distribution.
> I'm not sure where else to look.  Can you fix it?
>
> Peter
>
>
> On Fri, Apr 26, 2024 at 8:59 AM Finan, Sean
>  wrote:
>
>> Hi all,
>>
>> There is a candidate for version 5.1.0 of Apache cTAKES source code in a
>> staging repository:
>>
>> https://repository.apache.org/content/repositories/staging/org/apache/ctakes/ctakes/5.1.0/
>>
>> The code is contained within the file:
>> ctakes-5.1.0-source-release.zip<
>> https://repository.apache.org/content/repositories/staging/org/apache/ctakes/ctakes/5.1.0/ctakes-5.1.0-source-release.zip
>> >
>>
>> I welcome you all to test your favorite pipeline(s) and report any issues.
>> I am calling a vote from the PMC to finish by 12:nn Eastern time, next
>> Wednesday May 1.  Please report any issues before that time.  If any
>> 'road-block' issues are found they will need to be addressed before a
>> release.
>>
>> Thank you,
>> Sean
>>
>>
>> p.s.
>>
>> The 5.1.0 candidate is based upon the source code in the ctakes-5.1.0 tag:
>> https://github.com/apache/ctakes/releases/tag/ctakes-5.1.0
>>
>> The ctakes-5.1.0 tag was made from the 5.1.0 branch:
>> https://github.com/apache/ctakes/tree/5.1.0
>>
>> The 5.1.0 branch is a copy of the main branch:
>> https://github.com/apache/ctakes/tree/main
>> The version number in the 5.1.0 branch is different, but there are no
>> code differences between the two branches.
>>
>>
>>


Please test the Apache cTAKES 5.1.0 release candidate

2024-04-26 Thread Finan, Sean
Hi all,

There is a candidate for version 5.1.0 of Apache cTAKES source code in a 
staging repository:
https://repository.apache.org/content/repositories/staging/org/apache/ctakes/ctakes/5.1.0/

The code is contained within the file:
ctakes-5.1.0-source-release.zip

I welcome you all to test your favorite pipeline(s) and report any issues.
I am calling a vote from the PMC to finish by 12:nn Eastern time, next 
Wednesday May 1.  Please report any issues before that time.  If any 
'road-block' issues are found they will need to be addressed before a release.

Thank you,
Sean


p.s.

The 5.1.0 candidate is based upon the source code in the ctakes-5.1.0 tag:
https://github.com/apache/ctakes/releases/tag/ctakes-5.1.0

The ctakes-5.1.0 tag was made from the 5.1.0 branch:
https://github.com/apache/ctakes/tree/5.1.0

The 5.1.0 branch is a copy of the main branch:
https://github.com/apache/ctakes/tree/main
The version number in the 5.1.0 branch is different, but there are no code 
differences between the two branches.




Re: Custom dictionary no-"no" [was: Re: PREFTERMs not included in UMLS rare-word dictionary?] [EXTERNAL]

2024-04-16 Thread Finan, Sean
Hi Kean,

I think that it excludes terms that start with "no" because the usual desired 
behavior is to rely upon one of the negation engines to determine that status.

That being said, missing things like "non Hodgkin's" is definitely not a 
desired behavior.  I will look at the code and see if I can determine what 
happened there.
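
One way such over-filtering could arise is sketched below.  This is purely hypothetical: the two functions are not taken from the dictionary creator code, which is not shown in this thread; they only contrast a raw two-letter prefix check with a check on the first whole word, using terms from the report.

```shell
#!/bin/sh
# Hypothetical illustration only: neither function comes from the cTAKES
# dictionary creator.  They contrast two ways a "no" filter could match.

# Drops any term whose first two letters are "no" (over-broad).
dropped_by_prefix() {
  case "$1" in
    no*) return 0 ;;
    *)   return 1 ;;
  esac
}

# Drops a term only when its first whole word is "no".
dropped_by_first_word() {
  case "$1" in
    no|"no "*) return 0 ;;
    *)         return 1 ;;
  esac
}

for term in "no appetite" "nocturia" "nodule" "non hodgkin lymphoma"; do
  if dropped_by_prefix "$term"; then p=dropped; else p=kept; fi
  if dropped_by_first_word "$term"; then w=dropped; else w=kept; fi
  printf '%-22s prefix=%s first-word=%s\n' "$term" "$p" "$w"
done
```

Even the narrower first-word check would still drop "no appetite", so which terms to exclude is a policy choice as much as a matching detail.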

Thanks for reporting,

Sean


From: Kean Kaufmann 
Sent: Tuesday, April 16, 2024 10:42 AM
To: dev@ctakes.apache.org 
Subject: Custom dictionary no-"no" [was: Re: PREFTERMs not included in UMLS 
rare-word dictionary?] [EXTERNAL]

Hi Sean,

I ran the dictionary creator tool from ctakes 5.0.0, and am happy to see
that preferred texts are now also lookup texts -- thank you!
However, I also note that almost all terms starting with the letters "no"
are now omitted, except for a few starting with "no fh : ".
From my custom dictionary, that's about 21K terms missing, including common
ones like "no appetite", "nocturia", "nodule", "non hodgkin lymphoma",
"nondisplaced intertrochanteric fracture", "normal pressure glaucoma", ...
Is this filtering expected?  Is there a way for the user to control it?

Thanks as always,
Kean

On Wed, Dec 6, 2023 at 5:22 PM Finan, Sean
 wrote:

> Hi Kean,
>
> I can't think of a good reason for preferred text to not also be a lookup
> text.  It sounds like you might have uncovered a flaw in the dictionary
> creator tool.
>
> Time for a rebuild with the 5.0 release ...
>
> Thanks for the report,
>
> Sean
>
> 
> From: Kean Kaufmann 
> Sent: Wednesday, December 6, 2023 4:12 PM
> To: dev@ctakes.apache.org 
> Subject: PREFTERMs not included in UMLS rare-word dictionary? [EXTERNAL]
>
> * External Email - Caution *
>
>
> Hi Sean and fellow Fast Dictionary Lookup fans,
>
> I notice that the UmlsJdbcRareWordDictionary doesn't seem to index terms
> from PREFTERM, only CUI_TERMS.
> ...


Re: Build issues today [EXTERNAL]

2024-01-30 Thread Finan, Sean
Hi Jeff,

That sounds great - I am happy when ctakes is beneficial in any way.

As far as simplicity goes, there is an installation tool in 5.0.0-SNAPSHOT: 
https://repository.apache.org/content/repositories/snapshots/org/apache/ctakes/ctakes-dockhand/5.0.0-SNAPSHOT/
I have to run some final tests, but I have been thinking of placing it in 
GitHub as an artifact.  The single jar is all that is needed and it is a 
self-contained executable - as long as java is in the $PATH.  It builds an 
installation with a piper file that can be edited by the user.

We are still trying to make standard installation easier, so if you have any 
suggestions please post them on the ctakes GitHub repo here: 
https://github.com/apache/ctakes/issues
with the label "enhancement".

Thank you for all the enhancements and technical information,

Sean


From: Jeffery Painter 
Sent: Tuesday, January 30, 2024 11:40 AM
To: dev@ctakes.apache.org 
Subject: Re: Build issues today [EXTERNAL]

* External Email - Caution *


Thanks Sean,

I think we may end up with a series of pubs out of this work. I just
want to emphasize how much more efficient it is to compute these metrics
using cTakes than the old Perl UMLS::Similarity module. I don't think
anyone realizes the amount of computational time this saves us (days to
compute matrices with Perl versus seconds/minutes with cTakes).
Hopefully we can shed some light on the benefits (even if it is a bit
more complicated to set up and use) :-)


-

Jeff



On 1/30/24 10:59, Finan, Sean wrote:
> Hi Jeff,
>
> This looks pretty nice.  Thank you for the reference - I gave it a skim but 
> will be more thorough in a second run through whenever I get the time.
>
> Sean
> 
> From: Jeffery Painter 
> Sent: Tuesday, January 30, 2024 9:02 AM
> To: dev@ctakes.apache.org 
> Subject: Re: Build issues today [EXTERNAL]
>
> * External Email - Caution *
>
>
> Thanks Sean - it is working again with the 5.0.0 models. I'm actually
> glad you did not roll in my updates yet, as I found another subtle bug
> in the creation of the concept graphs and I have implemented 4 more of
> the kernel metrics which we are investigating based on the paper from
> Sanchez and Batet:
> https://www.sciencedirect.com/science/article/pii/S1532046411000645
>
>
> I am adding the Dice, Ochiai, Simpson and Braun-Blanquet methods
> shown in Table 3. I also discovered that some of the path metrics were
> inflated by invalid max-depth calculations in the original concept
> graph creation: the root concept was not being set to a depth of zero.
> I have fixed that and everything now appears to be working. I will cancel
> those PRs and update later today.
>
>
> Thanks,
>
> Jeff
>
>
>
> On 1/26/24 15:30, Finan, Sean wrote:
>> This is open for other responses, but I would probably use:
>>
>>
>> Savova, G. K. et al. Mayo clinical Text Analysis and Knowledge Extraction 
>> System (cTAKES): architecture, component evaluation and applications. J. Am. 
>> Med. Inform. Assoc. JAMIA 17, 507–513 (2010).
>>
>>
>> You can also point to the github repo if appropriate.
>>
>>
>> Sean
>>
>> 
>> From: Jeffery Painter 
>> Sent: Friday, January 26, 2024 1:19 PM
>> To: dev@ctakes.apache.org 
>> Subject: Re: Build issues today [EXTERNAL]
>>
>> * External Email - Caution *
>>
>>
>> Thanks!
>>
>> We are preparing an abstract to submit to ICPE in a couple of weeks - is
>> there a preferred way to reference cTakes in publications?
>>
>>
>> -
>>
>> Jeffery
>>
>>
>> On 1/26/24 12:29, Finan, Sean wrote:
>>> Hi Jeff,
>>>
>>> I updated the poms in 5.0.0-SNAPSHOT to use the correct models.  There 
>>> shouldn't be any problem building.
>>>
>>> Sean
>>> 
>>> From: Jeffery Painter 
>>> Sent: Tuesday, January 23, 2024 11:55 AM
>>> To: dev@ctakes.apache.org 
>>> Subject: Build issues today [EXTERNAL]
>>>
>>> * External Email - Caution *
>>>
>>>
>>> Just a heads up, getting several build issues due to the fact that the
>>> 5.0.0-SNAPSHOT jars for most of the ctakes-??-models.jar seem to have
>>> disappeared from the snapshot repository.
>>>
>>> Manually adjusting those to 5.0.1-SNAPSHOT is working for now, but
>>> thought I would let whoever controls the build process know that a fresh
>>> checkout from gi

Re: Build issues today [EXTERNAL]

2024-01-30 Thread Finan, Sean
Hi Jeff,

This looks pretty nice.  Thank you for the reference - I gave it a skim but 
will be more thorough in a second run through whenever I get the time.

Sean

From: Jeffery Painter 
Sent: Tuesday, January 30, 2024 9:02 AM
To: dev@ctakes.apache.org 
Subject: Re: Build issues today [EXTERNAL]

* External Email - Caution *


Thanks Sean - it is working again with the 5.0.0 models. I'm actually
glad you did not roll in my updates yet, as I found another subtle bug
in the creation of the concept graphs and I have implemented 4 more of
the kernel metrics which we are investigating based on the paper from
Sanchez and Batet:
https://www.sciencedirect.com/science/article/pii/S1532046411000645


I am adding the Dice, Ochiai, Simpson and Braun-Blanquet methods
shown in Table 3. I also discovered that some of the path metrics were
inflated by invalid max-depth calculations in the original concept
graph creation: the root concept was not being set to a depth of zero.
I have fixed that and everything now appears to be working. I will cancel
those PRs and update later today.
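
A minimal sketch of the two ideas in this paragraph, under stated assumptions: it is not the cTAKES implementation, the graph representation (a parent-to-children map) and all function names are hypothetical, and the four coefficients are written in their standard set-overlap forms, which may differ in detail from how Table 3 of the paper states them. Depths are computed breadth-first with the root fixed at depth zero.

```python
from collections import deque
from math import sqrt


def depths_from_root(children: dict, root: str) -> dict:
    """Breadth-first depths over a hypothetical parent -> children map; the root is depth 0."""
    depth = {root: 0}
    queue = deque([root])
    while queue:
        node = queue.popleft()
        for child in children.get(node, []):
            if child not in depth:  # first visit in BFS gives the shortest depth
                depth[child] = depth[node] + 1
                queue.append(child)
    return depth


def overlap_coefficients(a: set, b: set) -> dict:
    """Standard set-overlap coefficients for two non-empty feature sets."""
    if not a or not b:
        raise ValueError("both sets must be non-empty")
    shared = len(a & b)
    return {
        "dice": 2 * shared / (len(a) + len(b)),
        "ochiai": shared / sqrt(len(a) * len(b)),
        "simpson": shared / min(len(a), len(b)),
        "braun_blanquet": shared / max(len(a), len(b)),
    }


# Toy graph and feature sets, invented for illustration only.
toy = {"root": ["a", "b"], "a": ["c"]}
toy_depths = depths_from_root(toy, "root")
max_depth = max(toy_depths.values())
scores = overlap_coefficients({"root", "a", "c"}, {"root", "a"})
```

In this toy graph the maximum depth is 2. If the root were counted at depth 1 instead, every depth, including the maximum, would shift up by one.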


Thanks,

Jeff



On 1/26/24 15:30, Finan, Sean wrote:
> This is open for other responses, but I would probably use:
>
>
> Savova, G. K. et al. Mayo clinical Text Analysis and Knowledge Extraction 
> System (cTAKES): architecture, component evaluation and applications. J. Am. 
> Med. Inform. Assoc. JAMIA 17, 507–513 (2010).
>
>
> You can also point to the github repo if appropriate.
>
>
> Sean
>
> 
> From: Jeffery Painter 
> Sent: Friday, January 26, 2024 1:19 PM
> To: dev@ctakes.apache.org 
> Subject: Re: Build issues today [EXTERNAL]
>
> * External Email - Caution *
>
>
> Thanks!
>
> We are preparing an abstract to submit to ICPE in a couple of weeks - is
> there a preferred way to reference cTakes in publications?
>
>
> -
>
> Jeffery
>
>
> On 1/26/24 12:29, Finan, Sean wrote:
>> Hi Jeff,
>>
>> I updated the poms in 5.0.0-SNAPSHOT to use the correct models.  There 
>> shouldn't be any problem building.
>>
>> Sean
>> 
>> From: Jeffery Painter 
>> Sent: Tuesday, January 23, 2024 11:55 AM
>> To: dev@ctakes.apache.org 
>> Subject: Build issues today [EXTERNAL]
>>
>> * External Email - Caution *
>>
>>
>> Just a heads up, getting several build issues due to the fact that the
>> 5.0.0-SNAPSHOT jars for most of the ctakes-??-models.jar seem to have
>> disappeared from the snapshot repository.
>>
>> Manually adjusting those to 5.0.1-SNAPSHOT is working for now, but
>> thought I would let whoever controls the build process know that a fresh
>> checkout from github is currently broken.
>>
>> Example:
>>
>> [ERROR] Failed to execute goal on project ctakes-ytex-uima: Could not
>> resolve dependencies for project
>> org.apache.ctakes:ctakes-ytex-uima:jar:5.0.0-SNAPSHOT: The following
>> artifacts could not be resolved:
>> org.apache.ctakes:ctakes-dependency-parser-models:jar:5.0.0-SNAPSHOT
>> (absent):
>> org.apache.ctakes:ctakes-dependency-parser-models:jar:5.0.0-SNAPSHOT was
>> not found in 
>> https://repository.apache.org/content/groups/snapshots/
>> during a previous attempt. This failure was cached in the local
>> repository and resolution is not reattempted until the update interval
>> of apache.snapshots has elapsed or updates are forced -> [Help 1]
>>
>>
>> Verified that core models have disappeared:
>>
>> https://repository.apache.org/content/groups/snapshots/org/apache/ctakes/ctakes-core-models/5.0.0-SNAPSHOT/
>>
>> contains no jars
>>
>>
>> Thanks,
>>
>> Jeff
>>
>>
>>
>>


Re: Build issues today [EXTERNAL]

2024-01-26 Thread Finan, Sean
This is open for other responses, but I would probably use:


Savova, G. K. et al. Mayo clinical Text Analysis and Knowledge Extraction 
System (cTAKES): architecture, component evaluation and applications. J. Am. 
Med. Inform. Assoc. JAMIA 17, 507–513 (2010).


You can also point to the github repo if appropriate.


Sean


From: Jeffery Painter 
Sent: Friday, January 26, 2024 1:19 PM
To: dev@ctakes.apache.org 
Subject: Re: Build issues today [EXTERNAL]

* External Email - Caution *


Thanks!

We are preparing an abstract to submit to ICPE in a couple of weeks - is
there a preferred way to reference cTakes in publications?


-

Jeffery


On 1/26/24 12:29, Finan, Sean wrote:
> Hi Jeff,
>
> I updated the poms in 5.0.0-SNAPSHOT to use the correct models.  There 
> shouldn't be any problem building.
>
> Sean
> 
> From: Jeffery Painter 
> Sent: Tuesday, January 23, 2024 11:55 AM
> To: dev@ctakes.apache.org 
> Subject: Build issues today [EXTERNAL]
>
> * External Email - Caution *
>
>
> Just a heads up, getting several build issues due to the fact that the
> 5.0.0-SNAPSHOT jars for most of the ctakes-??-models.jar seem to have
> disappeared from the snapshot repository.
>
> Manually adjusting those to 5.0.1-SNAPSHOT is working for now, but
> thought I would let whoever controls the build process know that a fresh
> checkout from github is currently broken.
>
> Example:
>
> [ERROR] Failed to execute goal on project ctakes-ytex-uima: Could not
> resolve dependencies for project
> org.apache.ctakes:ctakes-ytex-uima:jar:5.0.0-SNAPSHOT: The following
> artifacts could not be resolved:
> org.apache.ctakes:ctakes-dependency-parser-models:jar:5.0.0-SNAPSHOT
> (absent):
> org.apache.ctakes:ctakes-dependency-parser-models:jar:5.0.0-SNAPSHOT was
> not found in 
> https://repository.apache.org/content/groups/snapshots/
> during a previous attempt. This failure was cached in the local
> repository and resolution is not reattempted until the update interval
> of apache.snapshots has elapsed or updates are forced -> [Help 1]
>
>
> Verified that core models have disappeared:
>
> https://repository.apache.org/content/groups/snapshots/org/apache/ctakes/ctakes-core-models/5.0.0-SNAPSHOT/
>
> contains no jars
>
>
> Thanks,
>
> Jeff
>
>
>
>


Re: Build issues today [EXTERNAL]

2024-01-26 Thread Finan, Sean
Hi Jeff,

I updated the poms in 5.0.0-SNAPSHOT to use the correct models.  There 
shouldn't be any problem building.

Sean

From: Jeffery Painter 
Sent: Tuesday, January 23, 2024 11:55 AM
To: dev@ctakes.apache.org 
Subject: Build issues today [EXTERNAL]

* External Email - Caution *


Just a heads up, getting several build issues due to the fact that the
5.0.0-SNAPSHOT jars for most of the ctakes-??-models.jar seem to have
disappeared from the snapshot repository.

Manually adjusting those to 5.0.1-SNAPSHOT is working for now, but
thought I would let whoever controls the build process know that a fresh
checkout from github is currently broken.

Example:

[ERROR] Failed to execute goal on project ctakes-ytex-uima: Could not
resolve dependencies for project
org.apache.ctakes:ctakes-ytex-uima:jar:5.0.0-SNAPSHOT: The following
artifacts could not be resolved:
org.apache.ctakes:ctakes-dependency-parser-models:jar:5.0.0-SNAPSHOT
(absent):
org.apache.ctakes:ctakes-dependency-parser-models:jar:5.0.0-SNAPSHOT was
not found in 
https://repository.apache.org/content/groups/snapshots/
during a previous attempt. This failure was cached in the local
repository and resolution is not reattempted until the update interval
of apache.snapshots has elapsed or updates are forced -> [Help 1]


Verified that core models have disappeared:

https://repository.apache.org/content/groups/snapshots/org/apache/ctakes/ctakes-core-models/5.0.0-SNAPSHOT/

contains no jars


Thanks,

Jeff





Re: Build issues today [EXTERNAL]

2024-01-24 Thread Finan, Sean
Hi Akram,


5.0.0 has not yet been released.

It is/will be exactly what is in the github repository currently labeled 
5.0.0-SNAPSHOT.  https://github.com/apache/ctakes

There is a 5.0.0 tag here 
https://github.com/apache/ctakes/releases/tag/ctakes-5.0.0   - though the 
versioning within still indicates -SNAPSHOT.

Sean



From: Akram 
Sent: Wednesday, January 24, 2024 12:13 PM
To: dev@ctakes.apache.org 
Subject: Re: Build issues today [EXTERNAL]

* External Email - Caution *


Is cTAKES 5.0 ready to be used, or is it still under development? Where can we 
download it?
On Wednesday, 24 January 2024 at 12:00:07 pm GMT-5, Finan, Sean 
 wrote:

 Hi Gandhi,

Jeff's problem is related to something that avoids the problem that you saw.  
You can get rid of your error if you switch the models to 5.0.0 instead of 
5.0.0-SNAPSHOT  (or 5.0.1-SNAPSHOT).  Those 'version' models are not served by 
the Snapshot server, so they can be picked up without masking problems from 
maven.  Putting together a 5.0.0 candidate has been troublesome, but it will 
happen.

Thanks,
Sean


From: gandhi rajan 
Sent: Wednesday, January 24, 2024 9:12 AM
To: dev@ctakes.apache.org 
Subject: Re: Build issues today [EXTERNAL]

* External Email - Caution *


Hi Sean,

This seems to be the same issue which I discussed with you and Peter
sometime last month that the fresh build and install is broken with version
5.0.0. It needs some tweaking to make it work.

On Wed, 24 Jan 2024 at 02:47, Finan, Sean
 wrote:

> Thank you for the heads-up.
> 
> From: Jeffery Painter 
> Sent: Tuesday, January 23, 2024 11:55 AM
> To: dev@ctakes.apache.org 
> Subject: Build issues today [EXTERNAL]
>
> * External Email - Caution *
>
>
> Just a heads up, getting several build issues due to the fact that the
> 5.0.0-SNAPSHOT jars for most of the ctakes-??-models.jar seem to have
> disappeared from the snapshot repository.
>
> Manually adjusting those to 5.0.1-SNAPSHOT is working for now, but
> thought I would let whoever controls the build process know that a fresh
> checkout from github is currently broken.
>
> Example:
>
> [ERROR] Failed to execute goal on project ctakes-ytex-uima: Could not
> resolve dependencies for project
> org.apache.ctakes:ctakes-ytex-uima:jar:5.0.0-SNAPSHOT: The following
> artifacts could not be resolved:
> org.apache.ctakes:ctakes-dependency-parser-models:jar:5.0.0-SNAPSHOT
> (absent):
> org.apache.ctakes:ctakes-dependency-parser-models:jar:5.0.0-SNAPSHOT was
> not found in
> https://repository.apache.org/content/groups/snapshots/
> during a previous attempt. This failure was cached in the local
> repository and resolution is not reattempted until the update interval
> of apache.snapshots has elapsed or updates are forced -> [Help 1]
>
>
> Verified that core models have disappeared:
>
>
> https://repository.apache.org/content/groups/snapshots/org/apache/ctakes/ctakes-core-models/5.0.0-SNAPSHOT/
>
> contains no jars
>
>
> Thanks,
>
> Jeff
>
>
>
>

--
Regards,
Gandhi

"The best way to find urself is to lose urself in the service of others !!!"



Re: Build issues today [EXTERNAL]

2024-01-24 Thread Finan, Sean
Hi Gandhi,

Jeff's problem is related to something that avoids the problem that you saw.  
You can get rid of your error if you switch the models to 5.0.0 instead of 
5.0.0-SNAPSHOT  (or 5.0.1-SNAPSHOT).  Those 'version' models are not served by 
the Snapshot server, so they can be picked up without masking problems from 
maven.  Putting together a 5.0.0 candidate has been troublesome, but it will 
happen.
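
As a rough aid for finding the affected entries, here is a minimal sketch that lists the "-models" dependencies in a pom that still carry a literal -SNAPSHOT version. It is a hypothetical helper, not part of ctakes or Maven. It assumes versions are written inline in each dependency (it would miss versions set through properties), and the sample pom below is invented for illustration; only the two artifact names come from this thread.

```python
import xml.etree.ElementTree as ET

# Standard namespace used by Maven POM files.
POM_NS = {"m": "http://maven.apache.org/POM/4.0.0"}


def snapshot_model_dependencies(pom_text: str) -> list:
    """Return (artifactId, version) for '-models' dependencies with a literal -SNAPSHOT version."""
    root = ET.fromstring(pom_text)
    found = []
    for dep in root.iterfind(".//m:dependency", POM_NS):
        artifact = dep.findtext("m:artifactId", default="", namespaces=POM_NS)
        version = dep.findtext("m:version", default="", namespaces=POM_NS)
        if artifact.endswith("-models") and version.endswith("-SNAPSHOT"):
            found.append((artifact, version))
    return found


# Invented sample pom; a real ctakes pom is larger and may be structured differently.
SAMPLE_POM = """<project xmlns="http://maven.apache.org/POM/4.0.0">
  <dependencies>
    <dependency>
      <groupId>org.apache.ctakes</groupId>
      <artifactId>ctakes-dependency-parser-models</artifactId>
      <version>5.0.0-SNAPSHOT</version>
    </dependency>
    <dependency>
      <groupId>org.apache.ctakes</groupId>
      <artifactId>ctakes-core-models</artifactId>
      <version>5.0.0</version>
    </dependency>
  </dependencies>
</project>"""

flagged = snapshot_model_dependencies(SAMPLE_POM)
```

Each flagged entry is a candidate for switching to the released 5.0.0 model version described above.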

Thanks,
Sean


From: gandhi rajan 
Sent: Wednesday, January 24, 2024 9:12 AM
To: dev@ctakes.apache.org 
Subject: Re: Build issues today [EXTERNAL]

* External Email - Caution *


Hi Sean,

This seems to be the same issue which I discussed with you and Peter
sometime last month that the fresh build and install is broken with version
5.0.0. It needs some tweaking to make it work.

On Wed, 24 Jan 2024 at 02:47, Finan, Sean
 wrote:

> Thank you for the heads-up.
> 
> From: Jeffery Painter 
> Sent: Tuesday, January 23, 2024 11:55 AM
> To: dev@ctakes.apache.org 
> Subject: Build issues today [EXTERNAL]
>
> * External Email - Caution *
>
>
> Just a heads up, getting several build issues due to the fact that the
> 5.0.0-SNAPSHOT jars for most of the ctakes-??-models.jar seem to have
> disappeared from the snapshot repository.
>
> Manually adjusting those to 5.0.1-SNAPSHOT is working for now, but
> thought I would let whoever controls the build process know that a fresh
> checkout from github is currently broken.
>
> Example:
>
> [ERROR] Failed to execute goal on project ctakes-ytex-uima: Could not
> resolve dependencies for project
> org.apache.ctakes:ctakes-ytex-uima:jar:5.0.0-SNAPSHOT: The following
> artifacts could not be resolved:
> org.apache.ctakes:ctakes-dependency-parser-models:jar:5.0.0-SNAPSHOT
> (absent):
> org.apache.ctakes:ctakes-dependency-parser-models:jar:5.0.0-SNAPSHOT was
> not found in
> https://repository.apache.org/content/groups/snapshots/
> during a previous attempt. This failure was cached in the local
> repository and resolution is not reattempted until the update interval
> of apache.snapshots has elapsed or updates are forced -> [Help 1]
>
>
> Verified that core models have disappeared:
>
>
> https://repository.apache.org/content/groups/snapshots/org/apache/ctakes/ctakes-core-models/5.0.0-SNAPSHOT/
>
> contains no jars
>
>
> Thanks,
>
> Jeff
>
>
>
>

--
Regards,
Gandhi

"The best way to find urself is to lose urself in the service of others !!!"


Re: Build issues today [EXTERNAL]

2024-01-23 Thread Finan, Sean
Hi Jeff,

Thanks for the PRs, I did get notifications but haven't yet had time to take a 
look.

The versioning issue with the models is because I am prepping a 5.0.0 release 
and had to make some -versioning- with the model artifacts first.  It is a long 
story, but 5.0.0 should be put up for a vote soon.  Right now there are 
technical release issues holding it up.

Because of the 5.0.0 release in-process, I think that we will need to wait 
until 5.0.1 (or whatever is next) to get your updates and improvements into a 
release.

Many thanks, and I look forward to seeing what you have done,

Sean


From: Jeffery Painter 
Sent: Tuesday, January 23, 2024 4:28 PM
To: dev@ctakes.apache.org 
Subject: Re: Build issues today [EXTERNAL]

* External Email - Caution *


Thanks Sean...

I tried just updating the parent pom.xml to 5.0.1-SNAPSHOT and that
seemed to work for now.

If you did not see, I uploaded my pull requests as well and verified each
one builds independently with the current main branch.

https://github.com/apache/ctakes/pull/10

https://github.com/apache/ctakes/pull/11

https://github.com/apache/ctakes/pull/12

I am computing some large similarity matrices with the updated code
today, and so far so good!

Thanks,

Jeff




On 1/23/24 16:17, Finan, Sean wrote:
> Thank you for the heads-up.
> 
> From: Jeffery Painter 
> Sent: Tuesday, January 23, 2024 11:55 AM
> To: dev@ctakes.apache.org 
> Subject: Build issues today [EXTERNAL]
>
> * External Email - Caution *
>
>
> Just a heads up, getting several build issues due to the fact that the
> 5.0.0-SNAPSHOT jars for most of the ctakes-??-models.jar seem to have
> disappeared from the snapshot repository.
>
> Manually adjusting those to 5.0.1-SNAPSHOT is working for now, but
> thought I would let whoever controls the build process know that a fresh
> checkout from github is currently broken.
>
> Example:
>
> [ERROR] Failed to execute goal on project ctakes-ytex-uima: Could not
> resolve dependencies for project
> org.apache.ctakes:ctakes-ytex-uima:jar:5.0.0-SNAPSHOT: The following
> artifacts could not be resolved:
> org.apache.ctakes:ctakes-dependency-parser-models:jar:5.0.0-SNAPSHOT
> (absent):
> org.apache.ctakes:ctakes-dependency-parser-models:jar:5.0.0-SNAPSHOT was
> not found in 
> https://repository.apache.org/content/groups/snapshots/
> during a previous attempt. This failure was cached in the local
> repository and resolution is not reattempted until the update interval
> of apache.snapshots has elapsed or updates are forced -> [Help 1]
>
>
> Verified that core models have disappeared:
>
> https://repository.apache.org/content/groups/snapshots/org/apache/ctakes/ctakes-core-models/5.0.0-SNAPSHOT/
>
> contains no jars
>
>
> Thanks,
>
> Jeff
>
>
>
>


Re: Build issues today [EXTERNAL]

2024-01-23 Thread Finan, Sean
Thank you for the heads-up.

From: Jeffery Painter 
Sent: Tuesday, January 23, 2024 11:55 AM
To: dev@ctakes.apache.org 
Subject: Build issues today [EXTERNAL]

* External Email - Caution *


Just a heads up, getting several build issues due to the fact that the
5.0.0-SNAPSHOT jars for most of the ctakes-??-models.jar seem to have
disappeared from the snapshot repository.

Manually adjusting those to 5.0.1-SNAPSHOT is working for now, but
thought I would let whoever controls the build process know that a fresh
checkout from github is currently broken.

Example:

[ERROR] Failed to execute goal on project ctakes-ytex-uima: Could not
resolve dependencies for project
org.apache.ctakes:ctakes-ytex-uima:jar:5.0.0-SNAPSHOT: The following
artifacts could not be resolved:
org.apache.ctakes:ctakes-dependency-parser-models:jar:5.0.0-SNAPSHOT
(absent):
org.apache.ctakes:ctakes-dependency-parser-models:jar:5.0.0-SNAPSHOT was
not found in 
https://repository.apache.org/content/groups/snapshots/
during a previous attempt. This failure was cached in the local
repository and resolution is not reattempted until the update interval
of apache.snapshots has elapsed or updates are forced -> [Help 1]


Verified that core models have disappeared:

https://repository.apache.org/content/groups/snapshots/org/apache/ctakes/ctakes-core-models/5.0.0-SNAPSHOT/

contains no jars


Thanks,

Jeff





Re: Examining Ctakes 5.0 - two sides of the same question [EXTERNAL] [SUSPICIOUS] [SUSPICIOUS] [SUSPICIOUS] [SUSPICIOUS]

2023-12-20 Thread Finan, Sean
I hope so!

From: Miller, Timothy 
Sent: Wednesday, December 20, 2023 6:55 PM
To: dev@ctakes.apache.org 
Subject: Re: Examining Ctakes 5.0 - two sides of the same question [EXTERNAL] 
[SUSPICIOUS] [SUSPICIOUS] [SUSPICIOUS] [SUSPICIOUS]

* External Email - Caution *


To some extent I think (and hope!) it will be superseded by the PBJ code that 
will be in cTAKES 5.0.0 anyways.
Tim


From: Finan, Sean 
Date: Wednesday, December 20, 2023 at 3:43 PM
To: dev@ctakes.apache.org 
Subject: Re: Examining Ctakes 5.0 - two sides of the same question [EXTERNAL] 
[SUSPICIOUS] [SUSPICIOUS] [SUSPICIOUS]
* External Email - Caution *


Hi Tim,

Thanks for the explanation.  I am going to remove the BERTRest classes.

Sean

From: Miller, Timothy 
Sent: Wednesday, December 20, 2023 6:25 PM
To: dev@ctakes.apache.org 
Subject: Re: Examining Ctakes 5.0 - two sides of the same question [EXTERNAL] 
[SUSPICIOUS] [SUSPICIOUS]

* External Email - Caution *


Hi Sean and Peter,
I put the BERTRest stuff in, with the intention of finishing it and adding the 
python code to run the REST server, but just never finished it up. I’m ok with 
leaving it out for now. (Now that we are on GitHub it would be so much easier 
to do things like this in branches and only merge when it’s actually finished!)
Thanks
Tim


From: Finan, Sean 
Date: Tuesday, December 5, 2023 at 10:59 AM
To: dev@ctakes.apache.org 
Subject: Re: Examining Ctakes 5.0 - two sides of the same question [EXTERNAL] 
[SUSPICIOUS]
* External Email - Caution *


Hi Peter,

My thoughts on this:

> a newer version of Mastif than Ctakes is packaged with, and additional 
> modifications that I've made
- Are you saying that you made a local/custom version of mastif that builds 
upon their latest (1.6?) version?  Or do you just need to update the ctakes 
dependency from 1.4 to 1.6?
- If you have a custom version of mastif, one way that I have dealt with this 
in other projects is to keep changes to the standard library 'in parallel' and 
call the parallel versions in my/our code.  If mastif is written in java then 
this is fairly easy to do by creating another module that shares the package 
structure of mastif.  The parallel code can be put into ctakes.
- Another way to deal with it is to distribute your custom code and jar with 
ctakes.  Mastif appears to use an Apache 2.0 license, so I think that this can 
be done.  If your changes are extensive or make the parallel option 
inconvenient then this may be the way to go.  "Developed by the Apache Software 
Foundation <https://www.apache.org/> and introduced in 2004, the Apache 2.0 
License is a permissive free software license. The license permits use of the 
software for any purpose, users are able to distribute it, to modify it, and to 
distribute modified versions of the software."  - 
https://pitt.libguides.com/openlicensing/apache2#:~:text=Apache%20License,modified%20versions%20of%20the%20software
- For an inclusion of a modified mastif in ctakes, maybe just put the whole 
thing into a project named "ctakes-mastif".

> no documentation or references
- A common problem with ctakes code, tests, resources ...  unfortunate and 
difficult to deal with.  I am guilty of some of this paucity of information.

> dependent on a BertREST server
- In instances such as this I would say that somebody checked in unfinished 
code, or that somebody forgot to check in a resource.  However, this particular 
code probably came from a developer working on an external project and checked 
in code that is intended to be used by that project.
- For any new developers out there: It is a 'best practice' to create your own 

Re: Examining Ctakes 5.0 - two sides of the same question [EXTERNAL] [SUSPICIOUS] [SUSPICIOUS]

2023-12-20 Thread Finan, Sean
Hi Tim,

Thanks for the explanation.  I am going to remove the BERTRest classes.

Sean

From: Miller, Timothy 
Sent: Wednesday, December 20, 2023 6:25 PM
To: dev@ctakes.apache.org 
Subject: Re: Examining Ctakes 5.0 - two sides of the same question [EXTERNAL] 
[SUSPICIOUS] [SUSPICIOUS]

* External Email - Caution *


Hi Sean and Peter,
I put the BERTRest stuff in, with the intention of finishing it and adding the 
python code to run the REST server, but just never finished it up. I’m ok with 
leaving it out for now. (Now that we are on GitHub it would be so much easier 
to do things like this in branches and only merge when it’s actually finished!)
Thanks
Tim


From: Finan, Sean 
Date: Tuesday, December 5, 2023 at 10:59 AM
To: dev@ctakes.apache.org 
Subject: Re: Examining Ctakes 5.0 - two sides of the same question [EXTERNAL] 
[SUSPICIOUS]
* External Email - Caution *


Hi Peter,

My thoughts on this:

> a newer version of Mastif than Ctakes is packaged with, and additional 
> modifications that I've made
- Are you saying that you made a local/custom version of mastif that builds 
upon their latest (1.6?) version?  Or do you just need to update the ctakes 
dependency from 1.4 to 1.6?
- If you have a custom version of mastif, one way that I have dealt with this 
in other projects is to keep changes to the standard library 'in parallel' and 
call the parallel versions in my/our code.  If mastif is written in java then 
this is fairly easy to do by creating another module that shares the package 
structure of mastif.  The parallel code can be put into ctakes.
- Another way to deal with it is to distribute your custom code and jar with 
ctakes.  Mastif appears to use an Apache 2.0 license, so I think that this can 
be done.  If your changes are extensive or make the parallel option 
inconvenient then this may be the way to go.  "Developed by the Apache Software 
Foundation <https://www.apache.org/> and introduced in 2004, the Apache 2.0 
License is a permissive free software license. The license permits use of the 
software for any purpose, users are able to distribute it, to modify it, and to 
distribute modified versions of the software."  - 
https://pitt.libguides.com/openlicensing/apache2#:~:text=Apache%20License,modified%20versions%20of%20the%20software
- For an inclusion of a modified mastif in ctakes, maybe just put the whole 
thing into a project named "ctakes-mastif".

> no documentation or references
- A common problem with ctakes code, tests, resources ...  unfortunate and 
difficult to deal with.  I am guilty of some of this paucity of information.

> dependent on a BertREST server
- In instances such as this I would say that somebody checked in unfinished 
code, or that somebody forgot to check in a resource.  However, this particular 
code probably came from a developer working on an external project and checked 
in code that is intended to be used by that project.
- For any new developers out there: It is a 'best practice' to create your own 
project and include ctakes as a dependency.  Keep your project code only in 
your project repository.  If you want to make changes to ctakes in parallel, 
you can also create a module in your ctakes source root and put your non-ctakes 
code only in that module.  Don't check in that module!
- All that said, everybody forgets/makes mistakes/hurries ...


Sean


From: Peter Abramowitsch 
Sent: Tuesday, December 5, 2023 12:38 PM
To: dev@ctakes.apache.org 
Subject: Examining Ctakes 5.0 - two sides of the same question [EXTERNAL]

* External Email - Caution *


The question is:  what is our policy if a resource in the ctakes archive
depends upon another resource that is not in the archive and may not be
available elsewhere.  I'm sure there are other examples, but here are
two

1.   I've done some enhancements to the ZoneAnnotator for note section
detection, but these depend upon a newer version of Mastif than Ctakes is
packaged with, and additional modifications that I've made.   If I do add
the updates to the Zone Annotator, where should I put the customized Mastif
library - does it belong in cTakes?

2.  I found a couple of interesting annotators in the archive that are
dependent on a BertREST server

Re: PREFTERMs not included in UMLS rare-word dictionary? [EXTERNAL]

2023-12-06 Thread Finan, Sean
Hi Kean,

I can't think of a good reason for preferred text to not also be a lookup text.
It sounds like you might have uncovered a flaw in the dictionary creator tool.

Time for a rebuild with the 5.0 release ...

Thanks for the report,

Sean


From: Kean Kaufmann 
Sent: Wednesday, December 6, 2023 4:12 PM
To: dev@ctakes.apache.org 
Subject: PREFTERMs not included in UMLS rare-word dictionary? [EXTERNAL]

Hi Sean and fellow Fast Dictionary Lookup fans,

I notice that the UmlsJdbcRareWordDictionary doesn't seem to index terms
from PREFTERM, only CUI_TERMS.

For instance, my custom dictionary script has the term "Angina Pectoris" as
a PREFTERM,
but "angina pectoris" isn't among the CUI_TERMS inserts:

INSERT INTO PREFTERM VALUES(2962,'Angina Pectoris')
INSERT INTO CUI_TERMS VALUES(2962,0,3,'ischemic chest pain','ischemic')
INSERT INTO CUI_TERMS VALUES(2962,0,3,'ischaemic chest pain','ischaemic')
INSERT INTO CUI_TERMS VALUES(2962,0,1,'angina','angina')
INSERT INTO CUI_TERMS VALUES(2962,4,5,'pain ; chest , ischemic','ischemic')
INSERT INTO CUI_TERMS VALUES(2962,0,2,'anginal discomfort','anginal')
INSERT INTO CUI_TERMS VALUES(2962,4,5,'chest ; pain , ischemic','ischemic')
INSERT INTO CUI_TERMS VALUES(2962,0,2,'anginal syndrome','anginal')
INSERT INTO CUI_TERMS VALUES(2962,2,3,'syndrome ; anginal','anginal')
INSERT INTO CUI_TERMS VALUES(2962,0,1,'stenocardia','stenocardia')
INSERT INTO CUI_TERMS VALUES(2962,0,3,'anginal ; syndrome','anginal')
INSERT INTO CUI_TERMS VALUES(2962,0,2,'angor pectoris','angor')
INSERT INTO CUI_TERMS VALUES(2962,0,1,'stenocardias','stenocardias')
INSERT INTO CUI_TERMS VALUES(2962,2,3,'chest pain ischemic','ischemic')
INSERT INTO CUI_TERMS VALUES(2962,0,2,'anginal pain','anginal')
INSERT INTO CUI_TERMS VALUES(2962,0,1,'anginas','anginas')

So, in text containing the phrase "angina pectoris", the UmlsLookup
annotators identify only the CUI_TERMS term "angina" as a
SignSymptomMention.
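
Judging only from the sample rows above, the CUI_TERMS columns appear to be (CUI, index of the rare word, token count, tokenized text, rare word). The following is a minimal sketch of how a row for a preferred term could be derived under that reading. The tokenizer, the `word_counts` table, and the rule "pick the token with the lowest count" are all assumptions made for illustration; this is not the dictionary creator's actual code.

```python
import re
from collections import Counter


def tokenize(text):
    # Lowercase, then split into word tokens and single punctuation tokens.
    return re.findall(r"[a-z0-9]+|[^\sa-z0-9]", text.lower())


def rare_word_row(cui, text, word_counts):
    """Build a (cui, rindex, tcount, text, rword) tuple for one term."""
    tokens = tokenize(text)
    # Only alphanumeric tokens are candidates for the rare word (assumption).
    candidates = [(word_counts[t], i) for i, t in enumerate(tokens) if t.isalnum()]
    _, rindex = min(candidates)
    return (cui, rindex, len(tokens), " ".join(tokens), tokens[rindex])


# Hypothetical corpus-wide counts in which "pectoris" is rarer than "angina".
counts = Counter({"angina": 40, "pectoris": 3})
row = rare_word_row(2962, "Angina Pectoris", counts)
print("INSERT INTO CUI_TERMS VALUES(%d,%d,%d,'%s','%s')" % row)
```

With a row like this present, the two-token phrase would be indexed under "pectoris" rather than relying on the single-token entry "angina".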

First off, am I missing something?
I haven't used the default ctakessnorx.script dictionary for years.
Is this a peculiarity of the custom dictionaries I've been building?
Is there an option to include PREFTERMs in the rare-word index?
Or is there some reason *not* to include PREFTERMs -- would they mess up
the rare-word indexing somehow?

Certainly many PREFTERMs would never occur in the wild (e.g. "Benign
essential hypertension (disorder)"), but there are quite a few common
clinical terms that are in PREFTERM but not CUI_TERMs.  Off the top:
C0017168 gastroesophageal reflux disease, C0018802 congestive heart
failure, C0022104 irritable bowel syndrome, ...
We've been adding these to a supplementary BSV file as they come up, but
there are many more. This HSQL query for PREFTERM-only disorders on my
custom dictionary returns 175K+ rows; at first blush, 20% look legit.

select cui,lcase(prefterm) as prefterm
from tui t join prefterm p on p.cui=t.cui
and t.tui in
(19,20,33,34,37,40,41,42,43,44,45,46,47,48,49,50,56,57,184,190,191)
except (select cui,text from cui_terms c where c.cui=cui);
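
For anyone who wants to try the idea without an HSQLDB dictionary at hand, here is a small sketch of the same "PREFTERM-only" check using Python's built-in sqlite3 module. It is not the query above: it swaps EXCEPT for a NOT EXISTS subquery, and the table layouts, the CUI 9001 rows, and the TUI assignments are assumptions chosen for illustration.

```python
import sqlite3

# In-memory stand-in for the dictionary tables; column names are assumed.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE prefterm (cui INTEGER, prefterm TEXT);
CREATE TABLE cui_terms (cui INTEGER, rindex INTEGER, tcount INTEGER,
                        text TEXT, rword TEXT);
CREATE TABLE tui (cui INTEGER, tui INTEGER);

INSERT INTO prefterm VALUES (2962, 'Angina Pectoris');
INSERT INTO cui_terms VALUES (2962, 0, 1, 'angina', 'angina');
INSERT INTO cui_terms VALUES (2962, 0, 2, 'angor pectoris', 'angor');
INSERT INTO tui VALUES (2962, 184);

INSERT INTO prefterm VALUES (9001, 'Example Disorder');
INSERT INTO cui_terms VALUES (9001, 0, 2, 'example disorder', 'example');
INSERT INTO tui VALUES (9001, 47);
""")

# Preferred terms whose lowercased text never appears as a lookup text
# for the same CUI, restricted to a couple of disorder/finding TUIs.
rows = conn.execute("""
SELECT DISTINCT p.cui, lower(p.prefterm) AS prefterm
FROM prefterm p
JOIN tui t ON t.cui = p.cui
WHERE t.tui IN (47, 184)
  AND NOT EXISTS (
      SELECT 1 FROM cui_terms c
      WHERE c.cui = p.cui AND c.text = lower(p.prefterm)
  )
ORDER BY p.cui
""").fetchall()

print(rows)
```

Correlating the subquery on both CUI and text means a preferred term is reported only when that exact text is missing for its own concept.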

Thanks for any thoughts, and happy holidays.

Kean Kaufmann
RecordsOne


Re: Examining Ctakes 5.0 - two sides of the same question [EXTERNAL]

2023-12-05 Thread Finan, Sean
Hi Peter,

My thoughts on this:

> a newer version of Mastif than Ctakes is packaged with, and additional 
> modifications that I've made
- Are you saying that you made a local/custom version of mastif that builds 
upon their latest (1.6?) version?  Or do you just need to update the ctakes 
dependency from 1.4 to 1.6?
- If you have a custom version of mastif, one way that I have dealt with this 
in other projects is to keep changes to the standard library 'in parallel' and 
call the parallel versions in my/our code.  If mastif is written in java then 
this is fairly easy to do by creating another module that shares the package 
structure of mastif.  The parallel code can be put into ctakes.
- Another way to deal with it is to distribute your custom code and jar with 
ctakes.  Mastif appears to use an Apache 2.0 license, so I think that this can 
be done.  If your changes are extensive or make the parallel option 
inconvenient then this may be the way to go.  "Developed by the Apache Software 
Foundation and introduced in 2004, the Apache 2.0 
License is a permissive free software license. The license permits use of 
the software for any purpose, users are able to distribute it, to modify it, 
and to distribute modified versions of the software."  - 
https://pitt.libguides.com/openlicensing/apache2#:~:text=Apache%20License,modified%20versions%20of%20the%20software.
- For an inclusion of a modified mastif in ctakes, maybe just put the whole 
thing into a project named "ctakes-mastif".

> no documentation or references
- A common problem with ctakes code, tests, resources ...  unfortunate and 
difficult to deal with.  I am guilty of some of this paucity of information.

> dependent on a BertREST server
- In instances such as this I would say that somebody checked in unfinished 
code, or that somebody forgot to check in a resource.  However, this particular 
code probably came from a developer working on an external project and checked 
in code that is intended to be used by that project.
- For any new developers out there: It is a 'best practice' to create your own 
project and include ctakes as a dependency.  Keep your project code only in 
your project repository.  If you want to make changes to ctakes in parallel, 
you can also create a module in your ctakes source root and put your non-ctakes 
code only in that module.  Don't check in that module!
- All that said, everybody forgets/makes mistakes/hurries ...


Sean


From: Peter Abramowitsch 
Sent: Tuesday, December 5, 2023 12:38 PM
To: dev@ctakes.apache.org 
Subject: Examining Ctakes 5.0 - two sides of the same question [EXTERNAL]

The question is:  what is our policy if a resource in the ctakes archive
depends upon another resource that is not in the archive and may not be
available elsewhere.  I'm sure there are other examples, but here are
two

1.   I've done some enhancements to the ZoneAnnotator for note section
detection, but these depend upon a newer version of Mastif than Ctakes is
packaged with, and additional modifications that I've made.   If I do add
the updates to the Zone Annotator, where should I put the customized Mastif
library - does it belong in cTakes?

2.  I found a couple of interesting annotators in the archive that are
dependent on a BertREST server, but there's no documentation or references
as to what code base that server comes from or whether its BERT model is
even publicly available.

DocTimeRelBertRestAnnotator
TemporalBertRestAnnotator
PolarityBertRestAnnotator

Here's my feeling:  Ctakes sources should be packaged to either be
self-sufficient or based on publicly available dependencies at the time of
check in.  If we really want to keep dangling sources, there should be a
separate folder for them rather than mixing them in with the living
product.   But, for now, I would be even happier if whoever checked in the
BertRest based annotators could provide links and documentation to the
dependencies

Your thoughts?
Peter


Re: Compilation Errors and the context.tokenizer [EXTERNAL]

2023-11-27 Thread Finan, Sean
As for slf4j being on http:, I don't know that I ever saw that.  If you check 
maven central it is actually https:

https://repo1.maven.org/maven2/org/slf4j/

As referred to here:
https://mvnrepository.com/artifact/org.slf4j/slf4j-api/2.0.5

I will do some more research on this tonight, though I welcome people to beat 
me to a solution!

Sean


From: Peter Abramowitsch 
Sent: Monday, November 27, 2023 7:01 AM
To: dev@ctakes.apache.org 
Subject: Re: Compilation Errors and the context.tokenizer [EXTERNAL]

Gandhi,

As I mentioned at the beginning of this thread, there's only one change I
needed to make to the settings.xml:  to comment out the dummy mirror server
that maven uses as a way of enforcing the https requirement.  Commenting it
out allowed maven to fetch slf4j-api.

Since slf4j is integral to the build, it halted everything.  The error
manifested itself as maven just hanging with a message saying it was trying
to access http://0.0.0.0

I made no other changes, and there's nothing about the settings.xml which is
ctakes-specific.

Comment out this part of settings:

<mirror>
  <id>maven-default-http-blocker</id>
  <mirrorOf>external:http:*</mirrorOf>
  <name>Pseudo repository to mirror external repositories initially using HTTP.</name>
  <url>http://0.0.0.0/</url>
  <blocked>true</blocked>
</mirror>

Have you already tried doing it?
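
To confirm whether the blocker mirror is still active in a given settings.xml, a short script can list every mirror marked as blocked. This is a minimal sketch using only the Python standard library; the sample document embedded below is an assumption that mirrors the entry shipped with Maven 3.8.1 and later, and you would substitute the contents of your own file.

```python
import xml.etree.ElementTree as ET


def local_name(tag):
    # Strip a "{namespace}" prefix if the file declares one.
    return tag.rsplit("}", 1)[-1]


def blocked_mirrors(settings_xml):
    """Return (id, mirrorOf) pairs for every mirror with <blocked>true</blocked>."""
    root = ET.fromstring(settings_xml)
    found = []
    for element in root.iter():
        if local_name(element.tag) != "mirror":
            continue
        fields = {local_name(child.tag): (child.text or "").strip() for child in element}
        if fields.get("blocked", "").lower() == "true":
            found.append((fields.get("id", ""), fields.get("mirrorOf", "")))
    return found


SAMPLE = """<settings>
  <mirrors>
    <mirror>
      <id>maven-default-http-blocker</id>
      <mirrorOf>external:http:*</mirrorOf>
      <name>Pseudo repository to mirror external repositories initially using HTTP.</name>
      <url>http://0.0.0.0/</url>
      <blocked>true</blocked>
    </mirror>
  </mirrors>
</settings>"""

print(blocked_mirrors(SAMPLE))
```

Because the XML parser skips comments, a mirror that has been commented out no longer shows up in the result, which makes this a quick way to verify the edit took effect.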

For the model.jars it seems they are just brought to your local maven repo
from maven.apache.org.
Then a goal called unpack-dependencies copies the relevant files into your
build tree.

Generally, if you look very carefully at the maven output, it will tell you
what is the original cause of your error.
Don't just go by what the last output lines are

If this doesn't help,  I'm afraid you'll need to ask someone else who may
be able to ask you better questions about your environment.

Peter

On Mon, Nov 27, 2023 at 11:07 AM gandhi rajan 
wrote:

> Hi Peter,
>
> I did find your old discussion with Sean in the following link -
> https://lists.apache.org/thread/w9c33421vxb21bnr6gd9r2tb3n1odnnw
>
> I am facing the same issue while building the ctakes core module. Could you
> please send me the URL details from your 'settings.xml' under
> '/usr/local/maven/config'
> folder to figure out from which repo you are trying to pull the
> dependencies from during build?
>
> On Mon, 27 Nov 2023 at 13:22, Peter Abramowitsch 
> wrote:
>
> > Hi Gandhi
> > I did some checking around and sure enough, the resource files are not in
> > the git archive.  I remember a conversation about this from long ago, that
> > git wasn't the best place for large binaries.  And they're not in git.  So
> > I looked in my maven repository to see where my model files had come from.
> > Those models and in fact all the resource data for ctakes comes from these
> > sources:
> >
> > here's the content of my
> > .m2/repository/org/apache/ctakes/ctakes-core-models/5.0.0-SNAPSHOT
> >
> > _remote.repositories
> > ctakes-core-models-5.0.0-20221224.062752-3.jar
> > ctakes-core-models-5.0.0-20221224.062752-3.jar.sha1
> > ctakes-core-models-5.0.0-20221224.062752-3.pom
> > ctakes-core-models-5.0.0-20221224.062752-3.pom.sha1
> > ctakes-core-models-5.0.0-SNAPSHOT-javadoc.jar.lastUpdated
> > ctakes-core-models-5.0.0-SNAPSHOT-sources.jar.lastUpdated
> > ctakes-core-models-5.0.0-SNAPSHOT.jar
> > ctakes-core-models-5.0.0-SNAPSHOT.pom
> > m2e-lastUpdated.properties
> > maven-metadata-apache.snapshots.xml
> > maven-metadata-apache.snapshots.xml.sha1
> > resolver-status.properties
> >
> > and here's the content of m2e-lastUpdated.properties
> >
> > #Fri Nov 17 22:03:31 CET 2023
> > apache.snapshots|https\://repository.apache.org/content/groups/snapshots/|javadoc=1700255011056
> > apache.snapshots|https\://repository.apache.org/content/groups/snapshots/|sources=1700255007489

Re: Compilation Errors and the context.tokenizer [EXTERNAL]

2023-11-27 Thread Finan, Sean
Hi guys,

There is supposed to be another way to get around this that doesn't require 
changes to the maven settings.xml.

I checked in the fix back in January and it looks like the change is still 
present in the main branch.
https://github.com/apache/ctakes/commit/ec37c5948a0abc6dcd278077911db2a651c288f8


The problem as I discovered was:

  1.  maven 3.8 stopped supporting http: links by default - which Peter has 
noted.
  2.  For some reason maven points to apache builds with the prefix http:  - 
even though they are actually https:

https://repository.apache.org/content/groups/snapshots/org/apache/ctakes/
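
One way to look for the second problem in a local checkout is to scan a pom.xml for any `<url>` that still uses plain http:. The sketch below assumes nothing about the actual ctakes pom; the sample POM and its repository ids are made up for illustration.

```python
import xml.etree.ElementTree as ET


def insecure_urls(pom_xml):
    """Return the text of every <url> element that starts with plain http://."""
    root = ET.fromstring(pom_xml)
    return [
        element.text.strip()
        for element in root.iter()
        if element.tag.rsplit("}", 1)[-1] == "url"
        and element.text
        and element.text.strip().lower().startswith("http://")
    ]


# Hypothetical POM fragment with one http: and one https: repository.
SAMPLE_POM = """<project>
  <repositories>
    <repository>
      <id>apache.snapshots</id>
      <url>http://repository.apache.org/content/groups/snapshots/</url>
    </repository>
    <repository>
      <id>central</id>
      <url>https://repo1.maven.org/maven2/</url>
    </repository>
  </repositories>
</project>"""

print(insecure_urls(SAMPLE_POM))
```

Note that this only sees URLs written in the file it is given. Repositories inherited from a parent POM would not appear, and a project homepage `<url>` using http: would be reported as well.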

Anyway, the change that I made to the main pom has since allowed everybody at 
Boston Children's Hospital to build without jumping through any hoops.  I 
suppose that there could be something else that needs to be done.
A different fix/workaround is listed here: 
https://stackoverflow.com/questions/67001968/how-to-disable-maven-blocking-external-http-repositories

I did not use that solution as it causes greater perturbation and (from our 
testing) didn't seem to be necessary.

I am glad that the problem's resurgence is being caught now, even though I 
cannot duplicate it without backing out my previous change.

Sean



Re: Starting to look at 5.0 repo and found this... [EXTERNAL]

2023-11-17 Thread Finan, Sean
Hi Pete,

Thanks for testing!

I haven't seen those compilation errors, but I think that changing the import 
statements to fit the project structure is the best thing to do.

Just out of curiosity, are those imports in any particular module?

Thanks,

Sean

From: Peter Abramowitsch 
Sent: Friday, November 17, 2023 5:12 PM
To: dev@ctakes.apache.org 
Subject: Starting to look at 5.0 repo and found this... [EXTERNAL]

Hi all,

Looking at the 5.0 repo, there's a compilation error across many
projects because what is being imported as
org.apache.ctakes.contexttokenizer.ae.*

is actually located in package
org.apache.ctakes.context.tokenizer.ae

and the maven artifact is declared as
ctakes-context-tokenizer

I've changed the import statements in 13 files, but if there's anyone who
feels strongly that I should leave those alone and change the package &
folder instead, let me know.
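
A bulk edit of this kind can be scripted rather than done by hand. The following is a minimal sketch, assuming the only change needed is swapping the package prefix in .java sources; the class name `SomeAnnotator` and the source root passed to `fix_tree` are hypothetical.

```python
import re
from pathlib import Path

OLD = "org.apache.ctakes.contexttokenizer.ae"
NEW = "org.apache.ctakes.context.tokenizer.ae"
# Match the old package only when it is followed by ".", ";" or whitespace,
# so longer package names that merely start with the same text are left alone.
PATTERN = re.compile(re.escape(OLD) + r"(?=[.;\s])")


def fix_imports(source):
    """Return the source text with the old package prefix replaced."""
    return PATTERN.sub(NEW, source)


def fix_tree(root):
    """Rewrite every .java file under root in place; return the changed paths."""
    changed = []
    for path in Path(root).rglob("*.java"):
        text = path.read_text(encoding="utf-8")
        fixed = fix_imports(text)
        if fixed != text:
            path.write_text(fixed, encoding="utf-8")
            changed.append(path)
    return changed


# SomeAnnotator is a made-up class name used only to show the rewrite.
print(fix_imports("import org.apache.ctakes.contexttokenizer.ae.SomeAnnotator;"))
```

Running `fix_tree` on a clean working copy and reviewing the resulting diff before committing keeps the change easy to back out.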

Did anyone else notice this too?

Peter


Re: Testing the 5.0 version [EXTERNAL]

2023-08-10 Thread Finan, Sean
Hi Gandhi,

That is a great idea!

I would like to put off adding new functionality until 5.0 is released.
I am hoping that we can release what is in the GitHub repo as it is right now, 
save for bug fixes.

I will try to keep your spring boot idea on my radar for a version 6 upgrade.  
Would you be able to help with that?

Thanks,
Sean


From: gandhi rajan 
Sent: Thursday, August 10, 2023 1:20 PM
To: dev@ctakes.apache.org 
Subject: Re: Testing the 5.0 version [EXTERNAL]

Hi Sean,

One area I could think of is to transform ctakes web rest module from
traditional spring framework to spring boot framework which can enable the
end users to bootstrap the REST API easily and test the same which could
improve the overall adoption rate without major complexity.


--
Regards,
Gandhi

"The best way to find urself is to lose urself in the service of others !!!"


Re: Testing the 5.0 version [EXTERNAL]

2023-08-10 Thread Finan, Sean
Hi Peter,

That is great news.  Sometime soon I will take a gander at ctakes and see if 
I can identify areas of importance or concern to me and what I might do to test 
them.  However, don't think of that as being a definitive list.

All, please take advantage of Peter's offer and share items that you would like 
to receive some attention.

If anybody can, please work with Peter to help keep ctakes a top-notch 
application for clinical NLP.

Cheers,

Sean


From: Peter Abramowitsch 
Sent: Monday, August 7, 2023 11:48 AM
To: dev@ctakes.apache.org 
Subject: Testing the 5.0 version [EXTERNAL]

Hi Sean,   looks like my funding for some experimentation with 5.0 is
finally going to happen in a month or so.  I'm going to be looking at all
the new functionality (I'm back on a branch of 4.0.1 on a custom
webservices platform),  but is there any particular area of 5.0 that you'd
like me to exercise?

Peter


Re: Building 2023AA Snomed RxNorm dictionary fails [EXTERNAL]

2023-08-07 Thread Finan, Sean
Hi Akram,

The first thing that I'll mention is that there are a lot of updates to the 
ctakes Dictionary builder in the unreleased version 5.0, so I am going to talk 
about its use.  https://github.com/apache/ctakes

>  1. Combining SNOMED and RxNorm in Dictionary Creation:
> I extracted data from umls-2023AA-full and RxNorm_full. After utilizing NLM 
> Metamorphosys to install UMLS, the conversion of SNOMED from umls-2023AA-full 
> into RRF files was successfully accomplished.
- For clarity, are you stating that you created RRF files for snomed from 
umls-2023AA_full and separate RRF files from RxNorm_full sources ?  If so, are 
you sure that UMLS 2023AA_full doesn't contain all of the RxNorm information 
that you need?

> I can only select one "UMLS Installation" source, limiting me to either 
> SNOMED or RxNorm.
- This is correct.  Normally a dictionary is built from RRF files created using 
metamorphosys on a single source.
- There are two possible clobberings to combine dictionaries from disparate 
sources:

  1.  Concatenate the source RRF files from both sources.  You should only need 
to do this with the MRCONSO RRF files.  Then select the directory containing 
the concatenated RRF (and other RRF files) as the umls source for the 
dictionary creator gui.
  2.   Build 2 ctakes dictionaries, one from each source.  Then concatenate all 
"INSERT" lines into one dictionary file.

- A cleaner method for your situation is to create one ctakes dictionary for 
snomed and a separate ctakes dictionary for rxnorm.  Then create a dictionary 
descriptor file for multiple dictionaries.  Tim Miller has a great example of 
one here:  
https://github.com/tmills/ctakes-docker/blob/master/ctakes-as-pipeline/MultipleDictionaryLookupSpecExample.xml
- The multiple dictionary approach is more flexible, but try not to use 
multiple dictionaries with a lot of overlap.
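
The first of the two concatenation options above can be sketched in a few lines. This is an illustration only: the file names are hypothetical, the demo rows are simplified stand-ins rather than real MRCONSO records, and skipping exact duplicate lines is an added assumption, not something the dictionary creator is known to require.

```python
import os
import tempfile


def concat_rrf(sources, target):
    """Append the lines of each source file to target, skipping exact duplicates."""
    seen = set()
    written = 0
    with open(target, "w", encoding="utf-8") as out:
        for source in sources:
            with open(source, encoding="utf-8") as lines:
                for line in lines:
                    line = line.rstrip("\n")
                    if line and line not in seen:
                        seen.add(line)
                        out.write(line + "\n")
                        written += 1
    return written


# Demo with two tiny stand-in files; real MRCONSO.RRF rows have many more fields.
with tempfile.TemporaryDirectory() as work:
    snomed = os.path.join(work, "snomed_MRCONSO.RRF")  # hypothetical name
    rxnorm = os.path.join(work, "rxnorm_MRCONSO.RRF")  # hypothetical name
    merged = os.path.join(work, "MRCONSO.RRF")
    with open(snomed, "w", encoding="utf-8") as f:
        f.write("C0002962|ENG|SNOMEDCT_US|angina|\n")
    with open(rxnorm, "w", encoding="utf-8") as f:
        f.write("C0002962|ENG|SNOMEDCT_US|angina|\nC0004057|ENG|RXNORM|aspirin|\n")
    written = concat_rrf([snomed, rxnorm], merged)
    with open(merged, encoding="utf-8") as f:
        merged_lines = f.read().splitlines()

print(written, merged_lines)
```

The in-memory set is fine for a subset, but a full MRCONSO file has millions of rows, so plain concatenation without the duplicate check may be the more practical choice there.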

> 2. Error Message During Dictionary Build:
> Log Message: user lacks privilege or object not found: MED in statement 
> [insert into MED-RT (CUI,MED-RT)  values (?,?)]
- I think that vocabularies containing a dash in the name such as "MED-RT" were 
problematic in older versions of the dictionary creator.  It should be ok with 
v.5
- The problem stemmed from SQL not allowing dash characters in table names 
without special treatment.  ctakes gets around it by converting the dash 
character to an underscore.
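
The log message is consistent with an unquoted MED-RT being read as "MED minus RT", which would explain why the database reports that object MED was not found. The dash-to-underscore workaround can be sketched as follows; the function names are hypothetical and this is not the actual ctakes code.

```python
def to_table_name(vocabulary):
    """Turn a vocabulary name such as "MED-RT" into a plain SQL identifier."""
    return vocabulary.replace("-", "_")


def insert_statement(vocabulary):
    # Mirrors the shape of the statement in the log, with a safe identifier.
    table = to_table_name(vocabulary)
    return "insert into %s (CUI,%s) values (?,?)" % (table, table)


print(insert_statement("MED-RT"))
```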

Sean


From: Akram 
Sent: Saturday, August 5, 2023 11:34 AM
To: dev@ctakes.apache.org 
Subject: Building 2023AA Snomed RxNorm dictionary fails [EXTERNAL]

Hi All

I've been working on creating a dictionary for the 2023AA UMLS, specifically 
incorporating SNOMED and RxNorm. However, I've encountered two main challenges 
that I'm hoping to get assistance with:

1. Combining SNOMED and RxNorm in Dictionary Creation:
   To initiate the process, I extracted data from umls-2023AA-full and 
RxNorm_full. After utilizing NLM Metamorphosys to install UMLS, the conversion 
of SNOMED from umls-2023AA-full into RRF files was successfully accomplished. 
However, when I proceeded to employ cTAKES Dictionary Creator for transforming 
UMLS SNOMED and RxNorm into a singular dictionary, I encountered an issue. The 
challenge lies in the fact that I can only select one "UMLS Installation" 
source, limiting me to either SNOMED or RxNorm. Is there a viable solution that 
would enable me to effectively incorporate both SNOMED and RxNorm into the 
dictionary generation process?

2. Error Message During Dictionary Build:
   Following the selection of the NLM Metamorphosys output folder as the "UMLS 
Installation" source and checking all relevant boxes for Vocabulary and 
Semantic Type, I clicked on "Build Dictionary". Unfortunately, this action 
resulted in an error message being displayed. I'm seeking guidance on how to 
address and resolve this error in order to successfully complete the dictionary 
creation process.

Error Message: Dictionary ctakesdictionary could not be built in F:\cTAKES
Log Message: user lacks privilege or object not found: MED in statement [insert 
into MED-RT (CUI,MED-RT)  values (?,?)]

I truly appreciate any assistance that can be provided in overcoming these 
challenges.

Thank you very much.



Re: What's new in 5.0 && testing JDK 11 [EXTERNAL]

2023-05-26 Thread Finan, Sean
Hi Peter,

In short (I have to run right now):

I am really glad that your employer will give you time to work with ctakes!

#1 The jdk 11 build compatibility was a necessity.  It turned out that the 
Apache Jenkins system no longer has a version of maven built with java 8 
(unless I just couldn't find one).  They seem to also be bumping up their 
lowest version of java.  It is still a selectable choice in the Jenkins 
configuration wizards, it just doesn't actually exist.  According to jira 
reports others found the same thing.
So, the only solution was for me to make ctakes buildable with jdk 11 and maven 
built with jdk 11.  However, it builds through java 11 as java 8 compatible 
byte code.  That was done for our CI on Jenkins.  I can still build ctakes with 
jdk 8.

#2 I am not certain what issues you had.  Unfortunately my employer's email 
system mangles anything that looks like a link, and they become completely 
unreadable.

#3 Dennis Johns and I are trying to get the 5.0 wiki into shape, including 
differences between versions 4 and 5.
You guessed right about pbj.  It stands for "Python Bridge to Java" and uses 
artemis as a go-between for ctakes and python pipelines.  Though it allows 
python to java, it can be used pretty much any way you would imagine.  java to 
java, python to python, java-python-java, python-java-python, scaling out, 
joining, splitting tasks between different systems, etc.  There are a few 
java-python-java examples in the ctakes resources, one of which uses some 
python from our friends at cnlpt
(https://github.com/Machine-Learning-for-Medical-Language/cnlp_transformers).


Unfortunately dictionary lookup doesn't have any recent major improvements.

Sean


From: Peter Abramowitsch 
Sent: Friday, May 26, 2023 1:46 PM
To: dev@ctakes.apache.org 
Subject: What's new in 5.0 && testing JDK 11 [EXTERNAL]

Hi Sean,

It looks like I may get some support from my employer to explore 5.0 this
summer, and while doing so, also test the jdk11 build, but I have a couple
of quick questions.

1.  If the system would still require 1.8 to run due to certain
dependencies, what would be the advantage of building it under 11? - or
were you suggesting that an 11 runtime would be possible by upgrading those
dependencies too.

2.  In building the complete 5.0 from git, I've run into a problem with
maven blocking certain artifacts due to http/https issues.  There are
global fixes and project by project specific fixes.  Which do you
recommend?   Ideally should maven be run with -o?


[INFO] --- maven-remote-resources-plugin:1.4:process (default) @ ctakes-core ---
Downloading from maven-default-http-blocker:
http://0.0.0.0/org/apache/ctakes/ctakes-models/5.0.0-SNAPSHOT/maven-metadata.xml

3.  Finally,  I had asked a while back if someone could point me to a list
of improvements or significant additions to cTakes that have occurred over
the last year or so.  Since no one responded, I decided to look at all the
SVN and Git commit messages and diffing the sources.

I did come across the PBJ project.  The readme doesn't actually explain
what it is for and there are various meanings of the term PBJ in the python
community.  This one looks like infrastructure to allow ctakes to be called
from a python pipeline using Artemis to decouple the processes -- or am I
wrong and it is the reverse (calling python from within a cTakes pipeline)

If there are any areas where  concept lookup has been improved through
better semantic contextualization please let us know!

Peter



Re: cTAKES build change, javac [EXTERNAL] [SUSPICIOUS]

2023-05-17 Thread Finan, Sean
Before anybody asks:

We as a community do want to update ctakes to be a 21st century application.

That means the latest cutting edge versions of java, uima, cleartk, log4j, etc.

If you can devote any time towards making this goal a reality, please let us 
know!
If you would like to coordinate a hackathon with modernization as a goal, 
please take charge of one and gain the credit!  At the very least I will show.  
:^)


Thanks all,

Sean






cTAKES build change, javac

2023-05-17 Thread Finan, Sean
Hi all,

I am trying to make this understandable for all readers, so the statements that 
follow are simplified or abbreviated without tl;dr context.

There have been recent improvements in the Apache build environments that 
require java 11.
Those improvements make the java 8 build environments that ctakes has been 
using less than favorable, if not completely untenable.

Things had to be fiddled about until ctakes could build in a java 11 
environment.

Though ctakes can now be "built with" java 11, ctakes is "built as a" java 8 
application.

You still need java 8 to run ctakes.  The simple reason for this is that though 
the ctakes jars themselves can build on java 11, many dependencies still 
require java 8, and we need to stick to that lowest common denominator.  For 
the time being.
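
One way to check the "built as" level of a compiled class is to read the version number stored in the class file header.  The sketch below is only an illustration and is not part of ctakes; the class name ClassFileVersion and its methods are made up for this example.  It assumes the standard class file layout (magic number 0xCAFEBABE, then minor version, then major version) and the usual mapping of major version 52 to java 8 and 55 to java 11.

```java
import java.io.IOException;
import java.nio.ByteBuffer;
import java.nio.file.Files;
import java.nio.file.Paths;

// Hypothetical helper, not part of ctakes: reports the java level a .class file targets.
public class ClassFileVersion {

    /** Returns the major version stored in the first 8 bytes of a class file. */
    public static int majorVersion(byte[] classBytes) {
        if (classBytes.length < 8) {
            throw new IllegalArgumentException("Too short to be a class file");
        }
        ByteBuffer header = ByteBuffer.wrap(classBytes); // big-endian by default
        if (header.getInt() != 0xCAFEBABE) {
            throw new IllegalArgumentException("Missing 0xCAFEBABE magic number");
        }
        header.getShort();                 // minor version, not needed here
        return header.getShort() & 0xFFFF; // major version, read as unsigned
    }

    /** Maps a major version to a java level for java 5 and later (52 is 8, 55 is 11). */
    public static int javaLevel(int major) {
        return major - 44;
    }

    public static void main(String[] args) throws IOException {
        // Each argument is a path to a .class file extracted from a jar.
        for (String path : args) {
            int major = majorVersion(Files.readAllBytes(Paths.get(path)));
            System.out.println(path + ": major version " + major + " = java " + javaLevel(major));
        }
    }
}
```

If the build targets java 8 as described above, a class taken from a ctakes jar built under java 11 should still report major version 52.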

I have local java 11 builds working and regression tests working (java 8 vs. 
11), but given the infinite possible ctakes pipelines I cannot cover everything.

The Apache Jenkins 'maven central'ish builds are also working.

Please post on dev@ any NEW problems that you have building ctakes or any 
discrepancies that you see running ctakes pipeline v5-SNAPSHOT 5/16/2023 vs. 
5/18/2023.

Thanks all,

Sean


Sean Finan
Research Computing Principal Engineer
Computational Health Informatics Program, Natural Language Processing Lab
Boston Children's Hospital
sean.fi...@tch.harvard.edu


Re: cTAKES questions [EXTERNAL]

2023-04-12 Thread Finan, Sean
Hi John,

Good questions.  Unfortunately, I can't really say what is going on as it seems 
that a lot of the information is in your images - 1000 words and all that.
Unfortunately, attachments and inserted images will not go through the dev@ 
email system.  Please copy/paste some plain text in this thread and we will try 
to help you.

The first "NOCODE" item might come from a table name mismatch in the database, 
e.g. "ICD-9" vs. "ICD_9", but that is a shot in the dark.

The second issue that you report is more concerning.  You are correct in that 
it is unexpected and most likely not a great thing to have happening.

Just in case it makes things easier, you can use another method for getting 
cuis.  For instance, add the SemanticTableFileWriter to the end of your 
pipeline.  It will write one file per note and accepts standard fileWriter 
parameter "SubDirectory", plus values for parameter "TableType": BSV, CSV, 
HTML, TAB.

Sean


From: JOHN R CASKEY 
Sent: Tuesday, April 11, 2023 11:45 PM
To: dev@ctakes.apache.org 
Subject: cTAKES questions [EXTERNAL]


Hello,

I have a minor bug to report, and a question that may be a part of a major bug.



If I create a custom dictionary with multiple vocabularies and then run cTAKES 
using this custom dictionary, cTAKES will sometimes replace the vocabulary name 
with the name of the custom dictionary. An example is shown in the attached 
image1.png that was run on the MIMIC dataset. I noticed that if I looked up the 
CUI C1548802 in the UMLS Metathesaurus Browser that had the incorrect 
vocabulary name inserted, it had ‘NOCODE’ for the code. This only seemed to 
occur with CUIs from the MTH vocabulary. Is this something that can be fixed 
within cTAKES?



The question, and possibly a major bug, is this: we ran the same dataset (50 
MIMIC notes) twice, once with the custom dictionary with multiple vocabularies 
described in the attached image1.png, and once with a custom dictionary that 
only included the snomed vocabulary. Next, we filtered the output from the 
multiple vocabulary dictionary to only include CUIs that were reported by 
snomed. The two outputs from cTAKES should have produced the same CUIs, but as 
can be seen in the attached Venn Diagrams, some of the CUIs reported by cTAKES 
running the snomed-only dictionary were not reported by cTAKES running the 
multiple vocabulary dictionary. Do you know why the two outputs would be 
different?
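
Since attachments do not reach the list, the comparison described above can be expressed as plain set differences.  This is a minimal sketch with made-up CUIs; the class name CuiSetDiff and the sample values are hypothetical and only stand in for the CUIs collected from the two runs.

```java
import java.util.Arrays;
import java.util.Set;
import java.util.TreeSet;

// Hypothetical helper: compares the CUIs produced by two cTAKES runs.
public class CuiSetDiff {

    /** Returns the CUIs present in 'left' but absent from 'right', sorted. */
    public static Set<String> onlyIn(Set<String> left, Set<String> right) {
        Set<String> diff = new TreeSet<>(left);
        diff.removeAll(right);
        return diff;
    }

    public static void main(String[] args) {
        // Made-up CUIs standing in for real output.
        Set<String> snomedOnlyRun = new TreeSet<>(Arrays.asList("C0000001", "C0000002", "C0000003"));
        Set<String> multiVocabFiltered = new TreeSet<>(Arrays.asList("C0000001", "C0000003", "C0000004"));

        System.out.println("Missing from multi-vocabulary run: " + onlyIn(snomedOnlyRun, multiVocabFiltered));
        System.out.println("Extra in multi-vocabulary run: " + onlyIn(multiVocabFiltered, snomedOnlyRun));
    }
}
```

Pasting the two resulting lists as plain text would show which CUIs differ between the runs.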



We’re running a user installation of cTAKES 4.0.0.1 via



./bin/runPiperFile.sh -p path/to/piperfile -l path/to/custom_dict.xml -i inputDir --xmiOut outputDir



And then extracting the CUIs from the output XMI files.
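
For reference, the extraction step is roughly equivalent to the sketch below.  It is an approximation, not the exact script we use: the class name XmiCuiExtractor is made up, and it assumes that concepts in the XMI appear as elements carrying "cui" and "codingScheme" attributes.

```java
import java.io.IOException;
import java.io.StringReader;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Paths;
import java.util.Set;
import java.util.TreeSet;
import javax.xml.stream.XMLInputFactory;
import javax.xml.stream.XMLStreamConstants;
import javax.xml.stream.XMLStreamException;
import javax.xml.stream.XMLStreamReader;

// Hypothetical helper: collects CUIs from the text of one XMI file.
public class XmiCuiExtractor {

    /**
     * Returns the sorted, distinct values of the "cui" attribute.
     * If codingScheme is not null, only elements whose "codingScheme"
     * attribute equals it are counted (assumed attribute names).
     */
    public static Set<String> extractCuis(String xmi, String codingScheme) {
        Set<String> cuis = new TreeSet<>();
        try {
            XMLStreamReader reader =
                    XMLInputFactory.newInstance().createXMLStreamReader(new StringReader(xmi));
            while (reader.hasNext()) {
                if (reader.next() != XMLStreamConstants.START_ELEMENT) {
                    continue;
                }
                String cui = reader.getAttributeValue(null, "cui");
                if (cui == null || cui.isEmpty()) {
                    continue; // not a concept element
                }
                String scheme = reader.getAttributeValue(null, "codingScheme");
                if (codingScheme == null || codingScheme.equals(scheme)) {
                    cuis.add(cui);
                }
            }
            reader.close();
        } catch (XMLStreamException e) {
            throw new IllegalArgumentException("Could not parse XMI", e);
        }
        return cuis;
    }

    public static void main(String[] args) throws IOException {
        // args[0]: path to an XMI file; args[1] (optional): coding scheme to keep.
        String xmi = new String(Files.readAllBytes(Paths.get(args[0])), StandardCharsets.UTF_8);
        String scheme = args.length > 1 ? args[1] : null;
        extractCuis(xmi, scheme).forEach(System.out::println);
    }
}
```

Passing a coding scheme name as the second argument corresponds to the filtering step described above.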



Please let me know if I should report this as an issue on the new GitHub 
repository instead of via email.



Thanks!



John Caskey




Re: It is Official! Steps toward a cTAKES 5.0 release.

2023-03-08 Thread Finan, Sean
Hi all Apache cTAKES developers and users,

I have news on the release front ...

The Apache Infrastructure team is working on a new Artifact Distribution 
Platform.  It will be used to upload and promote release artifacts, sign keys, 
and host distributions in a fashion that is informative and attractive to a 
user.

Some of the old/current items that are part of an Apache project release are 
going to be "legacy" and there are some new metadata items that go with a 
release artifact.

I see two paths moving forward:


  1.   We push on with a release of cTAKES 5.0 and release in the current style.
  2.   We wait a couple of months until the Apache Infrastructure team has the 
new Artifact Distribution Platform ready and use it to release.

For #1 please keep in mind that we still haven't had a volunteer for the 
primary Release Manager.  Gandhi Rajan has volunteered to be co-RM but it will 
be a two-person job.

Either way, we can create Release Candidate source branches on GitHub to be 
tested and have issues posted on the cTAKES GitHub issues list.

This manner of Release Candidate testing would be a deviation from the method 
of creating Release Candidate artifacts including binary installations and 
putting them in a Subversion (svn) repository online.
We can probably place "binary installation" artifacts on GitHub, but somebody 
will need to check on space limits and other rules before we can make any 
promises there.  If there is some barrier there then testers would need to test 
binary installations by building and packaging locally on their system - which 
is a good thing to have tested anyway.

So, please post any thoughts or questions in reply to this email and we can try 
to figure out where to go from here.

Many thanks,

Sean

________
From: Finan, Sean 
Sent: Monday, February 20, 2023 5:12 PM
To: dev@ctakes.apache.org ; u...@ctakes.apache.org 

Subject: It is Official! Steps toward a cTAKES 5.0 release. [EXTERNAL] 
[SUSPICIOUS]


Hi all,

The cTAKES Project Management Committee has voted that it is time to officially 
begin the release process for cTAKES 5.0

It has been almost 6 years since version 4.0.0 was released, and with a 
worldwide user count estimated in the thousands, a new release will be 
extremely valuable.

Releasing cTAKES 5.0 will involve some work, and the project needs volunteers 
to assist in the process.

The most important thing right now is the appointment of a Release Manager (RM).
While the position is not to be taken lightly and does involve work, it can be 
a great experience (and a resume builder).

We need a cTAKES committer to be the RM, but I am going to split the general 
responsibilities below.
I am doing this because I believe that any user familiar with cTAKES can be a 
co-RM.

Requiring a committer:
1.  Creating Release Candidates of the code.
2.  Deploying and Signing the actual Official Release.

Not requiring a committer:
1.  Coordinating people performing documentation, testing and bug fixing.
2.  Communicating progress with the developer list.

I am sure that I am forgetting something, but those are the 4 tasks that I can 
think of right now.

If you would like to be the Release Manager (or a co-RM), please volunteer on 
the dev@ctakes.apache.org mailing list.

Other tasks that must be performed for a release include:
1.  Testing the release candidates.
2.  Contributing documentation.
3.  Writing fixes for bugs that can be fixed for the release.
4.  Updating the release information on ctakes.apache.org

Anybody can test release candidates.  There are countless pipelines that can be 
built and tested, but I think that we should try to cover the 'most commonly 
used' pipelines.  If you run any pipeline, please report success - even if you 
don't run it specifically for release testing.
Documentation can be contributed by any user.  A cTAKES committer is required 
to actually push the documentation to the wiki, readme, release notes, etc. 
Sending out markdown, images, plain text or just recommendations is open to all 
users.
While only committers can actually push changes to cTAKES code, any user can 
contribute fixes by creating code patches or even just copy-pasting code in an 
email.
Updating the ctakes.apache.org website will require a committer, but 
non-committer assistance is possible just like it is for bug fixes.

One person (Tim Miller) has already volunteered to perform testing and another 
(Dennis Johns) is currently working on the GitHub wiki.
I don't think that people need to officially volunteer to perform the last 4 
listed tasks, but it may be beneficial to identify areas that you would like to 
cover in order to prevent duplicated work.

I suspect that I am forgetting at least some minor items, but they will come to 
light when encountered.

I urge you all to take part in the release process.  You can earn good karma, 
become famous as a cTAKES power user, and 

Re: It is Official! Steps toward a cTAKES 5.0 release. [EXTERNAL]

2023-02-22 Thread Finan, Sean
Hi Peter,

We definitely need some release notes, and a separate (linked) formatted list 
of what is new since 4.0 would be very beneficial.  I plan to write at least 
some of this information.  Like you, my life is really busy right now with a 
work project.

One thing that a Release Manager can do is try to come up with a schedule.  I 
hope that we can release 5.0 'soon', but I honestly do not know what will 
really happen.

Thanks, and I hope that you are able to get some free time soon.  Not for 
ctakes, but for your own self.

Sean


From: Peter Abramowitsch 
Sent: Wednesday, February 22, 2023 11:47 AM
To: dev@ctakes.apache.org 
Subject: Re: It is Official! Steps toward a cTAKES 5.0 release. [EXTERNAL]


Hi Sean and all,

If you expect the release process to last a couple of months, I can
volunteer.  At the moment and for the next few weeks I'm really busy.

One thing that would really help is to have a list of all the major changes
& additions that have happened since the 4.0.0 release.   I think that
would be valuable for everyone.
I also have some additions to fold in, but without a good knowledge of
what's been added/changed and why, it would not be safe to do that.

For instance there's a project ctakes-pbj in the sources.  Unless I've
missed something, its Readme doesn't have any explanation of what it
actually is.  And there are new annotators and functionality.  Is there a
comprehensive list?   Probably it would be for each author to document
their own additions, for accuracy and completeness.  I will be doing that
for sure.

Peter

On Wed, Feb 22, 2023 at 7:12 AM Finan, Sean
 wrote:

> Hi Gandhi,
>
> Thank you very much for volunteering!
>
> I am waiting to see if anybody else volunteers to be the RM, but I will
> help anybody that volunteers for any position as much as I can.
>
> Cheers,
>
> Sean
> 
> From: gandhi rajan 
> Sent: Monday, February 20, 2023 11:06 PM
> To: dev@ctakes.apache.org 
> Subject: Re: It is Official! Steps toward a cTAKES 5.0 release. [EXTERNAL]
>
> * External Email - Caution *
>
>
> Hi Sean,
>
> I can volunteer for co-RM so that I can work under your guidance. Thanks.
>

Re: It is Official! Steps toward a cTAKES 5.0 release. [EXTERNAL]

2023-02-22 Thread Finan, Sean
Hi Gandhi,

Thank you very much for volunteering!

I am waiting to see if anybody else volunteers to be the RM, but I will help 
anybody that volunteers for any position as much as I can.

Cheers,

Sean

From: gandhi rajan 
Sent: Monday, February 20, 2023 11:06 PM
To: dev@ctakes.apache.org 
Subject: Re: It is Official! Steps toward a cTAKES 5.0 release. [EXTERNAL]


Hi Sean,

I can volunteer for co-RM so that I can work under your guidance. Thanks.


--
Regards,
Gandhi

"The best way to find urself is to lose urself in the service of others !!!"


Re: dev Digest 20 Feb 2023 22:13:01 -0000 Issue 1432 [EXTERNAL]

2023-02-22 Thread Finan, Sean
Hi Alexis,

That is an excellent idea!

I know that some classes and engines commonly used in version 4 have been 
marked "deprecated", but hopefully there is full backward compatibility.  
"Famous last words".  We should document these deprecations, reasons for 
deprecation and existing replacements.

Any problems that arise during testing that relate to incompatibility should 
definitely be documented - if not fixed.

Many thanks,

Sean


From: Alexis Raykhel 
Sent: Tuesday, February 21, 2023 10:34 AM
To: dev-dig...@ctakes.apache.org 
Cc: dev@ctakes.apache.org 
Subject: Re: dev Digest 20 Feb 2023 22:13:01 - Issue 1432 [EXTERNAL]


Would it be possible to include instructions on upgrading from older
versions (4 and even earlier) with the next release? No, I'm not
volunteering, trust me :D
--


Alexis Raykhel

Senior NLP Engineer

She/Her
http://www.iodinesoftware.com/


On Mon, Feb 20, 2023 at 4:13 PM  wrote:

>
> dev Digest 20 Feb 2023 22:13:01 - Issue 1432
>
> Topics (messages 6959 through 6959)
>
> It is Official!  Steps toward a cTAKES 5.0 release.
> 6959 by: Finan, Sean
>
> Administrivia:
>
> -
> To post to the list, e-mail: dev@ctakes.apache.org
> To unsubscribe, e-mail: dev-digest-unsubscr...@ctakes.apache.org
> For additional commands, e-mail: dev-digest-h...@ctakes.apache.org
>
> ------
>
>
>
>

It is Official! Steps toward a cTAKES 5.0 release.

2023-02-20 Thread Finan, Sean
Hi all,

The cTAKES Project Management Committee has voted that it is time to officially 
begin the release process for cTAKES 5.0

It has been almost 6 years since version 4.0.0 was released, and with a 
worldwide user count estimated in the thousands, a new release will be 
extremely valuable.

Releasing cTAKES 5.0 will involve some work, and the project needs volunteers 
to assist in the process.

The most important thing right now is the appointment of a Release Manager (RM).
While the position is not to be taken lightly and does involve work, it can be 
a great experience (and a resume builder).

We need a cTAKES committer to be the RM, but I am going to split the general 
responsibilities below.
I am doing this because I believe that any user familiar with cTAKES can be a 
co-RM.

Requiring a committer:
1.  Creating Release Candidates of the code.
2.  Deploying and Signing the actual Official Release.

Not requiring a committer:
1.  Coordinating people performing documentation, testing and bug fixing.
2.  Communicating progress with the developer list.

I am sure that I am forgetting something, but those are the 4 tasks that I can 
think of right now.

If you would like to be the Release Manager (or a co-RM), please volunteer on 
the dev@ctakes.apache.org mailing list.

Other tasks that must be performed for a release include:
1.  Testing the release candidates.
2.  Contributing documentation.
3.  Writing fixes for bugs that can be fixed for the release.
4.  Updating the release information on ctakes.apache.org

Anybody can test release candidates.  There are countless pipelines that can be 
built and tested, but I think that we should try to cover the 'most commonly 
used' pipelines.  If you run any pipeline, please report success - even if you 
don't run it specifically for release testing.
Documentation can be contributed by any user.  A cTAKES committer is required 
to actually push the documentation to the wiki, readme, release notes, etc. 
Sending out markdown, images, plain text or just recommendations is open to all 
users.
While only committers can actually push changes to cTAKES code, any user can 
contribute fixes by creating code patches or even just copy-pasting code in an 
email.
Updating the ctakes.apache.org website will require a committer, but 
non-committer assistance is possible just like it is for bug fixes.

One person (Tim Miller) has already volunteered to perform testing and another 
(Dennis Johns) is currently working on the GitHub wiki.
I don't think that people need to officially volunteer to perform the last 4 
listed tasks, but it may be beneficial to identify areas that you would like to 
cover in order to prevent duplicated work.

I suspect that I am forgetting at least some minor items, but they will come to 
light when encountered.

I urge you all to take part in the release process.  You can earn good karma, 
become famous as a cTAKES power user, and perhaps be nominated as a Committer!

Thank you all,

Sean



Re: Which repo is the repo I should be reaping from? [EXTERNAL]

2023-02-07 Thread Finan, Sean
Thanks for the pointer, Rik.  I think that was boilerplate added from some 
general apache how-to.  IMO that kind of thing should not go into 
project-specific pages (links, maybe) and I wasn't aware of it.

I have removed it.

All future documentation will be done on the GitHub page 
https://github.com/apache/ctakes

Hopefully new documentation will be created quickly.

Sean


From: Rick Coleman 
Sent: Tuesday, February 7, 2023 11:55 AM
To: dev@ctakes.apache.org 
Subject: Re: Which repo is the repo I should be reaping from? [EXTERNAL]


Sean (& Gandhi),

Thanks for the updated info.

While you're at it, someone might want to update this page:

https://cwiki.apache.org/confluence/display/CTAKES/cTAKES+4.0+Developer+Install+Guide

It still assumes you want to use Git *with* Subversion.


Thanks again everyone for all of this very helpful information,

rik.


On 2/7/23 11:19, Finan, Sean wrote:
> Gandhi is 100% correct.
>
> The github repository contains the latest code for ctakes.  The svn repo 
> referenced in the website is now stale :^(.  The svn repo does not contain 
> anything that is not in the github version unless you count bugs.
>
> The ctakes.apache.org website downloads page should be updated asap.  If any 
> committers can do this then please notify other committers and have at it.
> One option is to just redirect to the github repo if nobody can devote effort 
> to updating it.
>
> It could be redone as a github page (Jekyll).
>
> If anybody has other ideas or feedback please post it on the devlist as at 
> this time our github doesn't have a discussion area.
>
> Thanks to all,
> Sean
>
> 
> From: gandhi rajan
> Sent: Tuesday, February 7, 2023 11:01 AM
> To:dev@ctakes.apache.org  
> Subject: Re: Which repo is the repo I should be reaping from? [EXTERNAL]
>
> * External Email - Caution *
>
>
> The code repo was recently moved from Subversion to GitHub. As of now, the
> active development is in GitHub as far as I know.
>
> On Tue, 7 Feb 2023 at 21:19, Rick Coleman  wrote:
>
>> Hi all,
>>
>> I was reading the last post, subject:
>> "[GitHub] [ctakes] Haags commented on issue #8: Issue clean install
>> using maven for ctakes-ytex"
>>
>> and that got me wondering, the download page mentions Subversion, but
>> there's a GitHub.
>>
>> Which repository is the *definitive* repository?  Why are there two on
>> two different systems?
>>
>> Thanks,
>>
>> rik.
>>
>
> --
> Regards,
> Gandhi
>
> "The best way to find urself is to lose urself in the service of others !!!"
>


Re: Which repo is the repo I should be reaping from? [EXTERNAL]

2023-02-07 Thread Finan, Sean
Gandhi is 100% correct.

The github repository contains the latest code for ctakes.  The svn repo 
referenced in the website is now stale :^(.  The svn repo does not contain 
anything that is not in the github version unless you count bugs.

The ctakes.apache.org website downloads page should be updated asap.  If any 
committers can do this then please notify other committers and have at it.
One option is to just redirect to the github repo if nobody can devote effort 
to updating it.

It could be redone as a github page (Jekyll).

If anybody has other ideas or feedback please post it on the devlist as at this 
time our github doesn't have a discussion area.

Thanks to all,
Sean


From: gandhi rajan 
Sent: Tuesday, February 7, 2023 11:01 AM
To: dev@ctakes.apache.org 
Subject: Re: Which repo is the repo I should be reaping from? [EXTERNAL]


The code repo was recently moved from Subversion to GitHub. As of now, the
active development is in GitHub as far as I know.

On Tue, 7 Feb 2023 at 21:19, Rick Coleman  wrote:

> Hi all,
>
> I was reading the last post, subject:
> "[GitHub] [ctakes] Haags commented on issue #8: Issue clean install
> using maven for ctakes-ytex"
>
> and that got me wondering, the download page mentions Subversion, but
> there's a GitHub.
>
> Which repository is the *definitive* repository?  Why are there two on
> two different systems?
>
> Thanks,
>
> rik.
>


--
Regards,
Gandhi

"The best way to find urself is to lose urself in the service of others !!!"


Re: Crash course in cTakes [EXTERNAL]

2023-02-06 Thread Finan, Sean
Hi John,

Can you share any more details on this?

Thanks,

Sean

From: Petersam, John Contractor 
Sent: Monday, February 6, 2023 7:13 AM
To: dev@ctakes.apache.org 
Subject: RE: Crash course in cTakes [EXTERNAL]


Hi Rik,
I run mine on Java 19, so it can be done.  But I have also updated dependencies 
and made code modifications to support it.

Thanks,
John

-Original Message-
From: Rick Coleman 
Sent: Friday, February 03, 2023 5:59 PM
To: dev@ctakes.apache.org
Subject: Re: Crash course in cTakes [EXTERNAL]

Sean,

Thanks for getting back to me on this.  I was afraid that was what the answer 
was going to be.

I appreciate you taking the time to fill in some of the gaps.  If it's so 
dependent on Java 1.8, someone should probably remove the "or higher"
on the download page.


I look forward to getting this application up and running.

Until then,

rik.

On 2/3/23 15:57, Finan, Sean wrote:
> Hi Rick,
>
> Thank you for the questions and for reminding us that the documentation is 
> sparse, outdated and not very detailed.  Everybody needs a prod now and then 
> to get things done.
>
> I hope that we can get a solid README and Wiki going on GitHub, as well as an 
> update to the primary website.  It will take a lot of work and some 
> cooperation by committers and users alike.
>
> I have tried to address your questions inline below.
>
> Sean
>
> 
> From: Rick Coleman 
> Sent: Friday, February 3, 2023 3:14 PM
> To: dev@ctakes.apache.org 
> Subject: Crash course in cTakes [EXTERNAL]
>
>
> Hello everyone,
>
> Can anyone point me to an exhaustive set of documentation regarding cTakes?
>
>*   Not really.  The wiki that you found is the most that there is.
>*   Most information is scattered across emails written on the dev and 
> user lists.  You can search them here:  
> https://apache.markmail.org/
>
> The main site feels like it was written by a marketing major, lots of
> flash and catchiness, but little in the way of detailed documentation.
> Even the User Install Guide and the Developer Install guide read like
> what they are, install guides.
>
> For example:
> Is cTakes the whole package, or just the front end?
>
>*   ctakes is a clinical nlp platform (vague enough?).   I would say 
> "whole package", but extendable.
>*   It is built on Apache UIMA and allows users to create pipelines of 
> various nlp and i/o components.
>*   It comes with many components that have been built for clinical nlp.
>*   It is extendable; UIMA components from other sources can be placed in 
> the pipelines.
>*   There are front-ends for some tasks, such as running a pipeline or 
> creating a custom dictionary.
>
> If it's just the front end, what's the back end?
>
>*   I would say that each UIMA component is a bit of back-end, as is the 
> controller that actually runs the pipeline.
>*   As mentioned above, you can extend it with non-ctakes back-end 
> components.
>
> It mentions using my UMLS credentials, can you use a local copy of the
> relevant UMLS data?  If so how?
>
>*   If you are compiling and running the source then ctakes will 
> automatically download a default dictionary.
>*   If you are running a packaged binary then you'll need to manually pull 
> down a dictionary.
>*   Prior to ctakes 5, downloading, unzipping and copying the dictionary 
> was a manual process.
>*   If you are using v5 then you can run bin/getUmlsDictionary and a 
> simple gui will do it for you.
>*   You can also create your own custom dictionary.
>*   The wiki has a page on the dictionary creator gui.
>*   There are instructions on youtube that start with first steps.
>
> Are the requirements listed, 1GB drive space, Oracle Java 1.8 the
> minimum or the recommended?  What about RAM or CPU? Is non-Oracle Java
> acceptable?  What about 1.17, the current LTS version?
>
>*   Disk: more than 1GB
>*   Java: exactly 1.8
>*   RAM: more than 2GB (4GB or more recommended)
>*   CPU: 64-bit
> OpenJDK seems to be fine.
>
> Every java release past 8 is bad for ctakes.  ctakes has a lot of 
> dependencies, many of which are old and rely on a java 8 feature here and 
> there.  ctakes itself probably requires a java 8 special here and there, but 
> I honestly don't know. Unfortunately, ctakes needs to have a serious update 
> effort - maybe for v6.  Part of the problem is actually its capabilities and 
> versatility - the availability of multiple available components and 
> workflows.  A 'minor' change can require a dozen end-to-end tests in dev and 
> user environments on multiple platforms.  Unit tests do not suffice.
>
>
> So, does anyone know where I can find out this information?
>
>
> Thanks.
>
> rik.
>
>


Re: Crash course in cTakes [EXTERNAL]

2023-02-03 Thread Finan, Sean
Hi Rick,

Thank you for the questions and for reminding us that the documentation is 
sparse, outdated and not very detailed.  Everybody needs a prod now and then to 
get things done.

I hope that we can get a solid README and Wiki going on GitHub, as well as an 
update to the primary website.  It will take a lot of work and some cooperation 
by committers and users alike.

I have tried to address your questions inline below.

Sean


From: Rick Coleman 
Sent: Friday, February 3, 2023 3:14 PM
To: dev@ctakes.apache.org 
Subject: Crash course in cTakes [EXTERNAL]


Hello everyone,

Can anyone point me to an exhaustive set of documentation regarding cTakes?

  *   Not really.  The wiki that you found is the most that there is.
  *   Most information is scattered across emails written on the dev and user 
lists.  You can search them here:  https://apache.markmail.org/

The main site feels like it was written by a marketing major, lots of
flash and catchiness, but little in the way of detailed documentation.
Even the User Install Guide and the Developer Install guide read like
what they are, install guides.

For example:
Is cTakes the whole package, or just the front end?

  *   ctakes is a clinical nlp platform (vague enough?).   I would say "whole 
package", but extendable.
  *   It is built on Apache UIMA and allows users to create pipelines of 
various nlp and i/o components.
  *   It comes with many components that have been built for clinical nlp.
  *   It is extendable; UIMA components from other sources can be placed in the 
pipelines.
  *   There are front-ends for some tasks, such as running a pipeline or 
creating a custom dictionary.

If it's just the front end, what's the back end?

  *   I would say that each UIMA component is a bit of back-end, as is the 
controller that actually runs the pipeline.
  *   As mentioned above, you can extend it with non-ctakes back-end 
components.

It mentions using my UMLS credentials, can you use a local copy of the
relevant UMLS data?  If so how?

  *   If you are compiling and running the source then ctakes will 
automatically download a default dictionary.
  *   If you are running a packaged binary then you'll need to manually pull 
down a dictionary.
  *   Prior to ctakes 5, downloading, unzipping and copying the dictionary 
was a manual process.
  *   If you are using v5 then you can run bin/getUmlsDictionary and a simple 
gui will do it for you.
  *   You can also create your own custom dictionary.
  *   The wiki has a page on the dictionary creator gui.
  *   There are instructions on youtube that start with first steps.

Are the requirements listed, 1GB drive space, Oracle Java 1.8 the
minimum or the recommended?  What about RAM or CPU? Is non-Oracle Java
acceptable?  What about 1.17, the current LTS version?

  *   Disk: more than 1GB
  *   Java: exactly 1.8
  *   RAM: more than 2GB (4GB or more recommended)
  *   CPU: 64-bit
OpenJDK seems to be fine.

Every java release past 8 is bad for ctakes.  ctakes has a lot of dependencies, 
many of which are old and rely on a java 8 feature here and there.  ctakes 
itself probably requires a java 8 special here and there, but I honestly don't 
know. Unfortunately, ctakes needs to have a serious update effort - maybe for 
v6.  Part of the problem is actually its capabilities and versatility - the 
many available components and workflows.  A 'minor' change 
can require a dozen end-to-end tests in dev and user environments on multiple 
platforms.  Unit tests do not suffice.
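
Since the answer above ties ctakes to Java 1.8, a launcher could check the 
runtime before starting a pipeline.  The following is only a minimal sketch, 
not part of ctakes: the class JavaVersionSketch and its method names are 
hypothetical.  It relies on the standard "java.specification.version" system 
property, which reads "1.8" on Java 8 and a plain number such as "17" on 
later releases.

```java
// Hypothetical helper, not part of cTAKES: reports the Java feature version.
final class JavaVersionSketch {

    // Maps a "java.specification.version" value to a feature number:
    // "1.8" -> 8 (legacy naming scheme), "17" -> 17 (scheme used since Java 9).
    static int featureVersion(String specVersion) {
        String[] parts = specVersion.trim().split("\\.");
        int first = Integer.parseInt(parts[0]);
        if (first == 1 && parts.length > 1) {
            return Integer.parseInt(parts[1]);
        }
        return first;
    }

    public static void main(String[] args) {
        String spec = System.getProperty("java.specification.version");
        int feature = featureVersion(spec);
        System.out.println("Running on Java " + feature);
        if (feature != 8) {
            // Assumption taken from this thread: releases after 8 are problematic for ctakes 4/5.
            System.out.println("Warning: this thread reports problems on Java releases after 8.");
        }
    }
}
```

The check only warns rather than exits, because at least one list member 
reports running on a newer Java after updating dependencies.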


So, does anyone know where I can find out this information?


Thanks.

rik.



Re: Issuing compiling from the main branch using github [EXTERNAL] [SUSPICIOUS] [SUSPICIOUS]

2023-02-03 Thread Finan, Sean
By the way, for all who are using "mvn install" and aren't interested in 
creating a binary installation:

There is a new maven profile called "no-zips-build" in ctakes 5.  When enabled 
it will skip the creation of src and binary tar.gz and .zip files during the 
package phase.  This significantly speeds up build-time and sometimes you just 
want the jars - such as for mvn install.

In addition, there are 2 new profiles in ctakes 5 for web war creation: 
"web-rest-build" and "ytex-web-build".  When enabled, these will build the 
corresponding war files.  By default the wars are not built.  Why?  Because 
doing so creates 2 or 3 extra copies of ctakes.  One copy of jars, one copy in 
the .war, and then a 3rd in the binary .zip file.  This all leads to a longer 
build time and much larger disk footprint.  In addition, most users do not use 
the web projects, and if they do they usually want one, not both.  With proper 
documentation (coming soon) this should not cause any confusion.

Sean

____
From: Finan, Sean 
Sent: Friday, February 3, 2023 11:24 AM
To: dev@ctakes.apache.org 
Subject: Re: Issuing compiling from the main branch using github [EXTERNAL] 
[SUSPICIOUS] [SUSPICIOUS]


I just checked in the revert on ctakes-ytex/scripts/data/build.xml

Packaging works on my machine, but it had also worked for all previous tests so 
maybe there is still an issue.

Sean

____
From: Finan, Sean 
Sent: Friday, February 3, 2023 11:09 AM
To: dev@ctakes.apache.org 
Subject: Re: Issuing compiling from the main branch using github [EXTERNAL] 
[SUSPICIOUS]


Hi Scott,

Thanks for the report - especially the setup, failure message and commit 
details.  Would you mind copying it into the GitHub issues area?  
https://github.com/apache/ctakes/issues

I think that I know what happened ...

The pom files for several modules used ${basedir} instead of the preferred 
${project.basedir}.  I updated them to standard.  I also changed ${basedir} in 
the ytex scripts and apparently, they don't have substitution for that 
expression.  I will have it fixed in a few minutes and try again.

Sean



From: Haag, Scott M 
Sent: Friday, February 3, 2023 10:10 AM
To: dev@ctakes.apache.org 
Subject: Issuing compiling from the main branch using github [EXTERNAL]


I was wondering if anybody had any advice for me on the following error.  I am 
trying to clean install from the head of the main branch in git.

Below are my commands; this error starts with commit 
d998331c1cfef48d792c78a8d3c1670498a8b925
The commands below work for the previous commit 
a97258b01d455f7816994070cf64deb311b29acc


git clone https://github.com/apache/ctakes.git;
cd ctakes;

mvn clean install -ff -DskipTests=true;


error message
/root/projects/ctakes/ctakes-ytex/scripts/build-setup.xml:149: The following 
error occurred while executing this line:
[ERROR] /root/projects/ctakes/ctakes-ytex/scripts/data/build.xml:148: The 
following error occurred while executing this line:
[ERROR] /root/projects/ctakes/ctakes-ytex/scripts/data/build.xml:531: Warning: 
Could not find file 
/root/projects/ctakes/ctakes-ytex/scripts/data/${project.basedir}/conn.xml.template
 to copy.


 mvn -v
Apache Maven 3.6.3
Maven home: /usr/share/maven
Java version: 1.8.0_282, vendor: AdoptOpenJDK, runtime: /opt/java/openjdk/jre
Default locale: en_US, platform encoding: UTF-8
OS name: "linux", version: "5.10.147+", arch: "amd64", family: "unix"



Re: Issuing compiling from the main branch using github [EXTERNAL] [SUSPICIOUS]

2023-02-03 Thread Finan, Sean
I just checked in the revert on ctakes-ytex/scripts/data/build.xml

Packaging works on my machine, but it had also worked for all previous tests so 
maybe there is still an issue.

Sean


From: Finan, Sean 
Sent: Friday, February 3, 2023 11:09 AM
To: dev@ctakes.apache.org 
Subject: Re: Issuing compiling from the main branch using github [EXTERNAL] 
[SUSPICIOUS]


Hi Scott,

Thanks for the report - especially the setup, failure message and commit 
details.  Would you mind copying it into the GitHub issues area?  
https://github.com/apache/ctakes/issues

I think that I know what happened ...

The pom files for several modules used ${basedir} instead of the preferred 
${project.basedir}.  I updated them to standard.  I also changed ${basedir} in 
the ytex scripts and apparently, they don't have substitution for that 
expression.  I will have it fixed in a few minutes and try again.

Sean



From: Haag, Scott M 
Sent: Friday, February 3, 2023 10:10 AM
To: dev@ctakes.apache.org 
Subject: Issuing compiling from the main branch using github [EXTERNAL]


I was wondering if anybody had any advice for me on the following error.  I am 
trying to clean install from the head of the main branch in git.

Below are my commands; this error starts with commit 
d998331c1cfef48d792c78a8d3c1670498a8b925
The commands below work for the previous commit 
a97258b01d455f7816994070cf64deb311b29acc


git clone https://github.com/apache/ctakes.git;
cd ctakes;

mvn clean install -ff -DskipTests=true;


error message
/root/projects/ctakes/ctakes-ytex/scripts/build-setup.xml:149: The following 
error occurred while executing this line:
[ERROR] /root/projects/ctakes/ctakes-ytex/scripts/data/build.xml:148: The 
following error occurred while executing this line:
[ERROR] /root/projects/ctakes/ctakes-ytex/scripts/data/build.xml:531: Warning: 
Could not find file 
/root/projects/ctakes/ctakes-ytex/scripts/data/${project.basedir}/conn.xml.template
 to copy.


 mvn -v
Apache Maven 3.6.3
Maven home: /usr/share/maven
Java version: 1.8.0_282, vendor: AdoptOpenJDK, runtime: /opt/java/openjdk/jre
Default locale: en_US, platform encoding: UTF-8
OS name: "linux", version: "5.10.147+", arch: "amd64", family: "unix"



Re: Issuing compiling from the main branch using github [EXTERNAL]

2023-02-03 Thread Finan, Sean
Hi Scott,

Thanks for the report - especially the setup, failure message and commit 
details.  Would you mind copying it into the GitHub issues area?  
https://github.com/apache/ctakes/issues

I think that I know what happened ...

The pom files for several modules used ${basedir} instead of the preferred 
${project.basedir}.  I updated them to standard.  I also changed ${basedir} in 
the ytex scripts and apparently, they don't have substitution for that 
expression.  I will have it fixed in a few minutes and try again.
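
The failure described above can be illustrated with a small stand-alone 
sketch.  It is not the ytex build code; PropertyExpansionSketch and its expand 
method are hypothetical.  It assumes that the build script only knows 
"basedir" and that placeholders with unknown names are left as literal text, 
which is consistent with the "${project.basedir}" that shows up inside the 
path in the reported error.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical illustration of ${name} expansion; not the actual Maven or Ant logic.
final class PropertyExpansionSketch {

    private static final Pattern PLACEHOLDER = Pattern.compile("\\$\\{([^}]+)\\}");

    // Replaces each ${name} with its value; unknown names are left untouched.
    static String expand(String text, Map<String, String> properties) {
        Matcher matcher = PLACEHOLDER.matcher(text);
        StringBuffer out = new StringBuffer();
        while (matcher.find()) {
            String value = properties.get(matcher.group(1));
            String replacement = (value != null) ? value : matcher.group(0);
            // quoteReplacement keeps '$' and '\' in the replacement literal.
            matcher.appendReplacement(out, Matcher.quoteReplacement(replacement));
        }
        matcher.appendTail(out);
        return out.toString();
    }

    public static void main(String[] args) {
        // Assumption: the script environment defines "basedir" but not "project.basedir".
        Map<String, String> scriptProperties = new HashMap<>();
        scriptProperties.put("basedir", "/root/projects/ctakes/ctakes-ytex/scripts/data");

        // Known name: expands to a usable path.
        System.out.println(expand("${basedir}/conn.xml.template", scriptProperties));
        // Unknown name: the placeholder survives and the file cannot be found.
        System.out.println(expand("${project.basedir}/conn.xml.template", scriptProperties));
    }
}
```

Under these assumptions the second call returns the text unchanged, so 
reverting the scripts to ${basedir} restores a resolvable path.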

Sean



From: Haag, Scott M 
Sent: Friday, February 3, 2023 10:10 AM
To: dev@ctakes.apache.org 
Subject: Issuing compiling from the main branch using github [EXTERNAL]


I was wondering if anybody had any advice for me on the following error.  I am 
trying to clean install from the head of the main branch in git.

Below are my commands; this error starts with commit 
d998331c1cfef48d792c78a8d3c1670498a8b925
The commands below work for the previous commit 
a97258b01d455f7816994070cf64deb311b29acc


git clone https://github.com/apache/ctakes.git;
cd ctakes;

mvn clean install -ff -DskipTests=true;


error message
/root/projects/ctakes/ctakes-ytex/scripts/build-setup.xml:149: The following 
error occurred while executing this line:
[ERROR] /root/projects/ctakes/ctakes-ytex/scripts/data/build.xml:148: The 
following error occurred while executing this line:
[ERROR] /root/projects/ctakes/ctakes-ytex/scripts/data/build.xml:531: Warning: 
Could not find file 
/root/projects/ctakes/ctakes-ytex/scripts/data/${project.basedir}/conn.xml.template
 to copy.


 mvn -v
Apache Maven 3.6.3
Maven home: /usr/share/maven
Java version: 1.8.0_282, vendor: AdoptOpenJDK, runtime: /opt/java/openjdk/jre
Default locale: en_US, platform encoding: UTF-8
OS name: "linux", version: "5.10.147+", arch: "amd64", family: "unix"



Re: CUI Question [EXTERNAL]

2023-02-02 Thread Finan, Sean
Hi John,

Each annotation gets a unique concept for every combination of possible codes, 
semantic types, etc.
You have pasted a good example of when that happens:  (abbreviated)

< code="7092007" tui="T109"/>




This is definitely a little confusing when the CUI for all 4 'unique' concepts 
is the same, in your case cui="C0025859".

If you are interested in gathering annotations, cuis, codes, concepts, semantic 
types etc. you should consider using the OntologyConceptUtil in ctakes-core.
https://ctakes.apache.org/apidocs/4.0.0/org/apache/ctakes/core/util/OntologyConceptUtil.html

As far as I can tell, methods with application to your question would be:

getAnnotationsByCui( jCas, "C0025859" )
  --> which would return 3 annotations given your example.

getCuiCounts(  jCas )
  --> which would return a Map<String,Long> where the cui is the key (String) 
and the # of annotations with that cui is the value (Long).  In your case this 
should be "C0025859", 3.

There are around 35 methods, so hopefully you can find some that fit your needs.

In case you really need something special, parsing the xmi files is probably 
not the best way to get information.


Sean



From: JOHN R CASKEY 
Sent: Thursday, February 2, 2023 1:58 PM
To: dev@ctakes.apache.org 
Subject: CUI Question [EXTERNAL]


Hello,
I’ve run into a problem and a question when running cTAKES. If I have a 
document and process it through cTAKES, then the XMI output will contain 
numerous XML tags. The tags our lab is interested in are the CUIs, for example, 
the XMI tag



Would indicate the CUI C0025859 for Metoprolol-containing product is found in a 
given document.

If I look at the input document text, then I can locate three instances of the 
drug Metoprolol in the document text. When I look at the cTAKES XMI output in 
the cTAKES XMI CVD viewer, each of the results for Metoprolol is part of 
ontologyConceptArr, with 4 members each, looking like this:

// found at org.apache.ctakes.typesystem.type.textsem.EventMention
//   org.apache.ctakes.typesystem.type.textsem.MedicationMention
//   ontologyConceptArr = uima.cas.FSArray[4]






Although not shown here, it is possible for there to be different CUIs within a 
single uima.cas.FSArray, with this array mapping to a single string of text in 
the document.

If I walk the XMI file and retrieve all CUIs, then the result will be the CUI 
C0025859 being found 12 times, however, if I extend the JCasAnnotator_ImplBase 
java class to extract the CUIs from the jCas annotations, then it only finds 
this CUI 3 times.

If part of the output needs to include a count of all CUIs found by cTAKES 
within a given document, which method is correct?

Thanks!


John Caskey, PhD
Senior Data Scientist
Department of Medicine
University of Wisconsin-Madison




Re: [EXT]Re: Apache cTAKES is now on GitHub ! [EXTERNAL]

2023-01-11 Thread Finan, Sean
"Issues" and "Wiki" are now available on the ctakes github page 
https://github.com/apache/ctakes

I will move open jira items to "issues" over the next few days.

Sean

From: Bethard, Steven - (bethard) 
Sent: Monday, January 2, 2023 2:17 PM
To: dev@ctakes.apache.org 
Subject: Re: [EXT]Re: Apache cTAKES is now on GitHub ! [EXTERNAL]


I would be in favor of using GitHub issues but only if that means we migrate 
all the Jira issues there and close down Jira. Having two different issue 
trackers sounds like a recipe for things getting lost.

Steve

On 1/2/23, 12:02, "gandhi rajan" wrote:

External Email

Hi Sean,

Most of the GitHub projects in general have issues enabled. I would also
propose enabling Github discussions as well. This helps to keep the
discussions close to the code and it also helps to convert discussions to
issues and vice versa. Please let us know your thoughts.

On Tue, 3 Jan 2023 at 00:24, Finan, Sean
 wrote:

> Hi all,
>
> I have poked around a tiny bit on other Apache GitHub repositories.  It
> looks like many (e.g. spark, Solr) also do not have enabled 'issues'
> sections, while others do.  In addition, most do not have a GitHub Wiki,
> while at least one (echarts) does.
>
> As I mentioned, we still have Jira for issues, we also still have our wiki
> on Confluence.
>
> I don't know if we can move both to GitHub, but if it is possible ... what
> does everybody think?  That is to ask, "Would the issues and wiki both
> being GitHub-based be a benefit?"
>
> Obviously any change is contingent upon the Apache Infra team being able
> to relocate the cTAKES online resources.
>
> Thanks,
> Sean
> 
> From: Benjamin hansen 
> Sent: Saturday, December 31, 2022 5:04 PM
> To: dev@ctakes.apache.org 
> Cc: u...@ctakes.apache.org 
> Subject: Re: Apache cTAKES is now on GitHub ! [EXTERNAL]
>
>
> Hi, this looks great. Awesome work.
>
> May I ask - is there any reason why the issues section in the new github
> repo has been deactivated?
>
> In my experience this is a great place to discuss issues and ask for help.
> It could be a good way to make the community a bit more active :)
>
> Thanks
>
> On Fri, Dec 30, 2022 at 7:49 PM Finan, Sean
>  wrote:
>
> > Hi all,
> >
> > I am pleased to announce that the cTAKES source code is now on GitHub at
> > https://github.com/apache/ctakes
> >
> > All current and future code development should be performed on the source
> > in GitHub.
> >
> >
> >Changes ( vs. Subversion Repository )
> >=
> >
> >   *   VERSION:   The project in GitHub has been versioned 5.0.0-SNAPSHOT.
> >   *   STRUCTURE: 

Re: [EXT]Re: Apache cTAKES is now on GitHub ! [EXTERNAL]

2023-01-02 Thread Finan, Sean
Hi Steve,

I am in complete agreement with you.  The same would have to go for the wiki v5 
and forward.

Sean


From: Bethard, Steven - (bethard) 
Sent: Monday, January 2, 2023 2:17 PM
To: dev@ctakes.apache.org 
Subject: Re: [EXT]Re: Apache cTAKES is now on GitHub ! [EXTERNAL]


I would be in favor of using GitHub issues but only if that means we migrate 
all the Jira issues there and close down Jira. Having two different issue 
trackers sounds like a recipe for things getting lost.

Steve

On 1/2/23, 12:02, "gandhi rajan" wrote:

External Email

Hi Sean,

Most of the GitHub projects in general have issues enabled. I would also
propose enabling Github discussions as well. This helps to keep the
discussions close to the code and it also helps to convert discussions to
issues and vice versa. Please let us know your thoughts.

On Tue, 3 Jan 2023 at 00:24, Finan, Sean
 wrote:

> Hi all,
>
> I have poked around a tiny bit on other Apache GitHub repositories.  It
> looks like many (e.g. spark, Solr) also do not have enabled 'issues'
> sections, while others do.  In addition, most do not have a GitHub Wiki,
> while at least one (echarts) does.
>
> As I mentioned, we still have Jira for issues, we also still have our wiki
> on Confluence.
>
> I don't know if we can move both to GitHub, but if it is possible ... what
> does everybody think?  That is to ask, "Would the issues and wiki both
> being GitHub-based be a benefit?"
>
> Obviously any change is contingent upon the Apache Infra team being able
> to relocate the cTAKES online resources.
>
> Thanks,
> Sean
> 
> From: Benjamin hansen 
> Sent: Saturday, December 31, 2022 5:04 PM
> To: dev@ctakes.apache.org 
> Cc: u...@ctakes.apache.org 
> Subject: Re: Apache cTAKES is now on GitHub ! [EXTERNAL]
>
>
> Hi, this looks great. Awesome work.
>
> May I ask - is there any reason why the issues section in the new github
> repo has been deactivated?
>
> In my experience this is a great place to discuss issues and ask for help.
> It could be a good way to make the community a bit more active :)
>
> Thanks
>
> On Fri, Dec 30, 2022 at 7:49 PM Finan, Sean
>  wrote:
>
> > Hi all,
> >
> > I am pleased to announce that the cTAKES source code is now on GitHub at
> > https://github.com/apache/ctakes
> >
> > All current and future code development should be performed on the source
> > in GitHub.
> >
> >
> >Changes ( vs. Subversion Repository )
> >=
> >
> >   *   VERSION:   The project in GitHub has been versioned 5.0.0-SNAPSHOT.
> >   *   STRUCTURE:   The project has been slightly restructured at a high
> > level.  The typical use

Re: Apache cTAKES is now on GitHub ! [EXTERNAL]

2023-01-02 Thread Finan, Sean
Hi all,

I have poked around a tiny bit on other Apache GitHub repositories.  It looks 
like many (e.g. spark, Solr) also do not have enabled 'issues' sections, while 
others do.  In addition, most do not have a GitHub Wiki, while at least one 
(echarts) does.

As I mentioned, we still have Jira for issues, we also still have our wiki on 
Confluence.

I don't know if we can move both to GitHub, but if it is possible ... what does 
everybody think?  That is to ask, "Would the issues and wiki both being 
GitHub-based be a benefit?"

Obviously any change is contingent upon the Apache Infra team being able to 
relocate the cTAKES online resources.

Thanks,
Sean

From: Benjamin hansen 
Sent: Saturday, December 31, 2022 5:04 PM
To: dev@ctakes.apache.org 
Cc: u...@ctakes.apache.org 
Subject: Re: Apache cTAKES is now on GitHub ! [EXTERNAL]


Hi, this looks great. Awesome work.

May I ask - is there any reason why the issues section in the new github
repo has been deactivated?

In my experience this is a great place to discuss issues and ask for help.
It could be a good way to make the community a bit more active :)

Thanks

On Fri, Dec 30, 2022 at 7:49 PM Finan, Sean
 wrote:

> Hi all,
>
> I am pleased to announce that the cTAKES source code is now on GitHub at
> https://github.com/apache/ctakes
>
> All current and future code development should be performed on the source
> in GitHub.
>
>
>Changes ( vs. Subversion Repository )
>=
>
>   *   VERSION:   The project in GitHub has been versioned 5.0.0-SNAPSHOT.
>   *   STRUCTURE:   The project has been slightly restructured at a high
> level.  The typical user should not notice the difference.
>   *   CODE API:   All package, class, method and constant names remain the
> same, so your code should not need to be refactored.
>   *   DEPENDENCIES:   If you include cTAKES modules as dependencies in
> your maven project, you can simply change the version to obtain new
> 5.0.0-SNAPSHOT builds. *
>   *   BINARY PACKAGE:   The binary package has some minor differences, but
> the typical user should not notice them.
>
> * If you use maven dependency exclusions for resource ('-res') modules
> because of unwanted ML models, you need to change the excluded name
> extension from '-res' to '-model'.
>
>
> Moving forward from the Subversion Repository
> =============================================
>
>   *   VERSION:   The project in the SVN repository was versioned
> 4.0.1-SNAPSHOT.
>   *   DEPRECATION:   The code and resources in the 4.0.1-SNAPSHOT
> Subversion (SVN) repository will remain available for checkout, but should
> be considered read-only.  4.0.1-SNAPSHOT built modules will remain
> available for maven dependencies.  All current and future code development
> should be performed on the source in GitHub.
>   *   RELEASE:   There is no cTAKES 4.0.1 release.
>
> Next Anticipated Release
> ========================
>
>   *   VERSION:   As you might guess from the snapshot version change, we
> are gearing up for a version 5.0.0 release.
>   *   WHY 5.0.0:   There are so many new features over cTAKES 4.0.0,
> including completely new modules, that the version number was bumped up.
>   *   DOCUMENTATION:   All of the new toys will be documented in the
> confluence wiki at the time of the 5.0.0 release.
>   *   DATE:   There is no release date yet, but hopefully it will be very
> very soon ...
>
> Happy New Year,
>
> Sean
>
>
>


Re: Apache cTAKES is now on GitHub ! [EXTERNAL]

2023-01-02 Thread Finan, Sean
Hi Benjamin,

I don't know why the issues section is not present.  We still have our jira 
area active, but it would be really nice to have as much 'one stop shopping' 
available as possible.

I will contact the Apache Infra team and see if I can get that feature up and 
running.

Thanks for bringing this to our attention,

Sean





Re: Apache cTAKES is now on GitHub ! [EXTERNAL]

2022-12-31 Thread Finan, Sean
Hi Peter,

Privileges for the GitHub repository are the same as they were for the old SVN 
repository.  Anybody who is an Apache cTAKES Committer should therefore have write 
permission for the repository.

You do need to have a GitHub account and connect it to your Apache account.
There are a couple of ways to handle it and I did this myself quite a while 
ago, but I think that the most direct method is:

  1.   Visit the Apache Account Utility at https://id.apache.org/
  2.  Log in with your Apache username and Apache password.
  3.  Halfway down the page, enter your GitHub Username in the first box 
labeled "Your GitHub Username".
  4.  Save Changes.

I think that it takes several hours for the system to establish the connection, 
at which point you might get some kind of notification email.


Regarding the NegEx, ZoneAnnotator and anything else that you might have 
locally, please do check it in!  For some reason I thought that the Negex 
change was done a long time ago, but I wasn't really paying close attention.  
Thanks again for the improvements!

Cheers,
Sean


From: Peter Abramowitsch 
Sent: Friday, December 30, 2022 11:31 PM
To: dev@ctakes.apache.org 
Subject: Re: Apache cTAKES is now on GitHub ! [EXTERNAL]

Thank you Sean - looks like you & others put in a lot of work to make this
transition.  I'm looking forward to the "toys" you mentioned.
Will the repository protocol be the same as it was during the SVN days with
designated contributors?

Although I didn't receive any feedback, I might check in some improvements
I made to the Negex module and to the ZoneAnnotator.  These have been in
production for a year now, so I'm pretty sure they're stable.

Peter

On Fri, Dec 30, 2022 at 10:49 AM Finan, Sean
 wrote:

> Hi all,
>
> I am pleased to announce that the cTAKES source code is now on GitHub at
> https://github.com/apache/ctakes
>
> All current and future code development should be performed on the source
> in GitHub.
>
>
> Changes ( vs. Subversion Repository )
> =====================================
>
>   *   VERSION:   The project in GitHub has been versioned 5.0.0-SNAPSHOT.
>   *   STRUCTURE:   The project has been slightly restructured at a high
> level.  The typical user should not notice the difference.
>   *   CODE API:   All package, class, method and constant names remain the
> same, so your code should not need to be refactored.
>   *   DEPENDENCIES:   If you include cTAKES modules as dependencies in
> your maven project, you can simply change the version to obtain new
> 5.0.0-SNAPSHOT builds. *
>   *   BINARY PACKAGE:   The binary package has some minor differences, but
> the typical user should not notice them.
>
> * If you use maven dependency exclusions for resource ('-res') modules
> because of unwanted ML models, you need to change the excluded name
> extension from '-res' to '-model'.
>
>
> Moving forward from the Subversion Repository
> =============================================
>
>   *   VERSION:   The project in the SVN repository was versioned
> 4.0.1-SNAPSHOT.
>   *   DEPRECATION:   The code and resources in the 4.0.1-SNAPSHOT
> Subversion (SVN) repository will remain available for checkout, but should
> be considered read-only.  4.0.1-SNAPSHOT built modules will remain
> available for maven dependencies.  All current and future code development
> should be performed on the source in GitHub.
>   *   RELEASE:   There is no cTAKES 4.0.1 release.
>
> Next Anticipated Release
> ========================
>
>   *   VERSION:   As you might guess from the snapshot version change, we
> are gearing up for a version 5.0.0 release.
>   *   WHY 5.0.0:   There are so many new features over cTAKES 4.0.0,
> including co

Apache cTAKES is now on GitHub !

2022-12-30 Thread Finan, Sean
Hi all,

I am pleased to announce that the cTAKES source code is now on GitHub at 
https://github.com/apache/ctakes

All current and future code development should be performed on the source in 
GitHub.


   Changes ( vs. Subversion Repository )
   =====================================

  *   VERSION:   The project in GitHub has been versioned 5.0.0-SNAPSHOT.
  *   STRUCTURE:   The project has been slightly restructured at a high level.  
The typical user should not notice the difference.
  *   CODE API:   All package, class, method and constant names remain the 
same, so your code should not need to be refactored.
  *   DEPENDENCIES:   If you include cTAKES modules as dependencies in your 
maven project, you can simply change the version to obtain new 5.0.0-SNAPSHOT 
builds. *
  *   BINARY PACKAGE:   The binary package has some minor differences, but the 
typical user should not notice them.

* If you use maven dependency exclusions for resource ('-res') modules because 
of unwanted ML models, you need to change the excluded name extension from 
'-res' to '-model'.


   Moving forward from the Subversion Repository
   =============================================

  *   VERSION:   The project in the SVN repository was versioned 4.0.1-SNAPSHOT.
  *   DEPRECATION:   The code and resources in the 4.0.1-SNAPSHOT Subversion 
(SVN) repository will remain available for checkout, but should be considered 
read-only.  4.0.1-SNAPSHOT built modules will remain available for maven 
dependencies.  All current and future code development should be performed on 
the source in GitHub.
  *   RELEASE:   There is no cTAKES 4.0.1 release.

   Next Anticipated Release
   ========================

  *   VERSION:   As you might guess from the snapshot version change, we are 
gearing up for a version 5.0.0 release.
  *   WHY 5.0.0:   There are so many new features over cTAKES 4.0.0, including 
completely new modules, that the version number was bumped up.
  *   DOCUMENTATION:   All of the new toys will be documented in the confluence 
wiki at the time of the 5.0.0 release.
  *   DATE:   There is no release date yet, but hopefully it will be very very 
soon ...

Happy New Year,

Sean




Re: Best practices for documenting NLP versions [EXTERNAL]

2022-10-26 Thread Finan, Sean
Hi all,

This versioning topic had come up at least once before, so I thought that I'd 
give it a shot before it fell off my radar.

I checked in some new stuff and rebuilt the snapshot, so you should be able to 
use one solution as of now.

I couldn't find a great solution for compile-time-only versioning.  Class file 
dates aren't great, and I tried four different supposed maven solutions to write 
properties files but none of them worked.

What did work was placing build information in the manifest files in ctakes jar 
files during the package phase.  What this means is that if you are running in 
an IDE or just from compiled classes you will not know the version.  If you run 
from a jar file (built via maven) then you will have the following in each 
ctakes jar file:

Manifest-Version: 1.0
Implementation-Title: Apache cTAKES core
Implementation-Version: 4.0.1-SNAPSHOT
Specification-Vendor: The Apache Software Foundation
Specification-Title: Apache cTAKES core
Build-Jdk-Spec: 1.8
Created-By: Maven JAR Plugin 3.3.0
Specification-Version: 4.0
Implementation-Vendor: The Apache Software Foundation
Implementation-Build-Date: 2022-10-26 17:02

Above is the content of ctakes-core-4.0.1-20221026.172844-167.jar - the current 
snapshot build at 
https://repository.apache.org/content/repositories/snapshots/org/apache/ctakes/ctakes-core/4.0.1-SNAPSHOT/
which is what you get if you use ctakes as a dependency in your own project.

If you maven package locally then everything will be the same except for the 
Implementation-Build-Date at the bottom of the list.

You can get to this information manually or programmatically.  I added a static 
public method named getBuildInfo() to the FinishedLogger (ctakes-core 
util.log.FinishedLogger.java) that returns a jar build version and build date.  
If you are running outside of a jar then it returns an empty string.
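
For readers who want to do something similar in their own code, here is a 
minimal sketch of reading those manifest attributes with the standard 
java.util.jar API.  It is not the FinishedLogger implementation: the class name 
BuildInfoSketch, its method names and the "version date" output format are 
assumptions made for illustration only.  Only the attribute names come from the 
manifest listing above.

```java
import java.io.File;
import java.io.IOException;
import java.net.URISyntaxException;
import java.net.URL;
import java.security.CodeSource;
import java.util.jar.Attributes;
import java.util.jar.JarFile;
import java.util.jar.Manifest;

// Hypothetical helper, not part of cTAKES.
final class BuildInfoSketch {

    private BuildInfoSketch() {
    }

    // Joins the version and build date found in a manifest's main attributes.
    // Returns an empty string when neither attribute is present.
    static String formatBuildInfo(Manifest manifest) {
        Attributes attributes = manifest.getMainAttributes();
        String version = attributes.getValue("Implementation-Version");
        String buildDate = attributes.getValue("Implementation-Build-Date");
        StringBuilder sb = new StringBuilder();
        if (version != null) {
            sb.append(version);
        }
        if (buildDate != null) {
            if (sb.length() > 0) {
                sb.append(' ');
            }
            sb.append(buildDate);
        }
        return sb.toString();
    }

    // Looks up the jar that the given class was loaded from and reads its manifest.
    // Returns an empty string when the class was not loaded from a jar file,
    // for example when running from compiled classes in an IDE.
    static String readBuildInfo(Class<?> clazz) {
        CodeSource source = clazz.getProtectionDomain().getCodeSource();
        if (source == null) {
            return "";
        }
        URL location = source.getLocation();
        if (location == null || !location.getPath().endsWith(".jar")) {
            return "";
        }
        try (JarFile jar = new JarFile(new File(location.toURI()))) {
            Manifest manifest = jar.getManifest();
            return manifest == null ? "" : formatBuildInfo(manifest);
        } catch (IOException | URISyntaxException | IllegalArgumentException e) {
            // Unreadable jar or a non-file location: treat as "no build info".
            return "";
        }
    }
}
```

Calling BuildInfoSketch.readBuildInfo( SomeCtakesClass.class ) from code that 
runs out of a maven-built jar should then yield the same two values shown in 
the manifest, and an empty string otherwise.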

In case you have no idea what FinishedLogger is, it prints stats on time 
(start, end, init, process, per note) and now it prints build information.  
piper: "add util.log.FinishedLogger"

I also added the build information to the ctakes banner at the "welcome to 
ctakes" step.  If you have no idea what the banners are about, add 
"set WriteBanner=yes" to any piper that uses a collection reader that extends 
the AbstractFileTreeReader.  If you don't know which collection reader you are 
using then you are probably using FileTreeReader, which is such an extension.

I hope that this is useful to somebody.

Sean



From: Greg Silverman 
Sent: Friday, October 21, 2022 6:23 PM
To: dev@ctakes.apache.org 
Subject: Re: Best practices for documenting NLP versions [EXTERNAL]

It was an off-the-cuff suggestion. Devil is obviously in the details.

On Fri, Oct 21, 2022 at 3:33 PM Peter Abramowitsch 
wrote:

> Interesting, but it would depend on how the docker is set up.  Our docker
> for instance, encapsulates all the code and imported jars, as you imply,
> but the piper and other runtime configuration such as section regex, negex,
> bsvs, etc are imported on a mounted FS during the container's runtime.
> Having them frozen into the docker instances would proliferate vast numbers
> of docker image-tars with 99% redundant data.  Or do you have a cleverer
> solution?
>
> Peter
>
> On Fri, Oct 21, 2022 at 10:18 PM Greg Silverman 
> wrote:
>
> > Why not use Docker and versioning by tags? See "C. Boettiger, An
> > introduction to Docker for reproducible research, SIGOPS Oper. Syst. Rev.
> > 49 (2015) 71–79. doi:10.1145/2723872.2723882."
> >
> >
> >
> > On Fri, Oct 21, 2022 at 3:15 PM Peter Abramowitsch <
> > pabramowit...@gmail.com>
> > wrote:
> >
> > > Well, obviously, the full range of permutations of all source files and
> > > all annotators and pre and post ctakes code would require a huge amount
> > > of commit information on thousands of files... and not only ctakes
> > > files...recently I made some pretty significant changes to the ZonerCli
> > > library which is only a dependency of the ctakes distribution. How would
> > > all the commit info be used to tag the end results.  I think the answer
> > > is that it's simply not feasible or useful. So we haven't gone to those
> > > lengths.  As far as we go at the UCs  is to version the piper file and
> > > then write the versioned_name of the piper back into the json object
> > > returned for each note... We have our own rest service and our own Java
> > > and Python clients, but they don't touch the internals of the message in
> > > a way that interferes with the clinical informatics.  The note concept
> > > collection object with its piper version is then persisted in our data
> > > store.  The server jar also has a version which 

Re: Apache cTAKES GitHub mirror is stuck in 2019 [EXTERNAL] [SUSPICIOUS] [SUSPICIOUS]

2022-06-28 Thread Finan, Sean
Hi Richard,

Thank you for this information, any and all help that you can provide is 
greatly appreciated.

>The use of Git and GitHub is well supported by the INFRA team.
-- True. I actually contacted them a year or two ago and they already had 
mechanisms in place to easily migrate code and hook up CI.  That doesn't really 
worry me.

>Jenkins also supports GitHub very well [4]. For example, in UIMA, we just drop 
>a `Jenkinsfile` 
> I'm happy to help you setting that up for cTAKES as well.
-- Your assistance would be appreciated.  A while ago, when Infra switched 
Jenkins platforms, we lost our configurations (which were kept there) and I had 
to create new setups on their current platform.  The wizard gui is helpful ... 
to a point.  Anyway, an editable build configuration stored in our code repo 
would definitely be an improvement.

>I fear that people may not have svn installed anymore
-- Also very true, and a great reason to get our code into GitHub.

>So requiring svn to download models and drop them into m2 might be an 
>inconvenience.
-- I agree wholeheartedly, and my writing may have been imprecise but that was 
definitely not my intention.

>If the models live in a Maven Repository and can be dragged in as a normal 
>dependency, that would seem most convenient.
--  Yup.  A new model creator could deal with svn and the svn model repo, but 
the 99.% of developers who don't contribute models to ctakes wouldn't need 
to worry about this.

I hope that we don't let this slip.   It will require some effort with setup 
and test, and I fear that it may require reorganization of the code and 
resources such as I have proposed.  It definitely should not be a 
one-person-job ...  I also think that we need to have a ctakes 5.0 release 
before any of this is undertaken, which requires the usual planning, effort and 
cooperation.

Sean



From: Richard Eckart de Castilho 
Sent: Tuesday, June 28, 2022 6:54 AM
To: dev@ctakes.apache.org
Subject: Re: Apache cTAKES GitHub mirror is stuck in 2019 [EXTERNAL] 
[SUSPICIOUS] [SUSPICIOUS]

Hi all,

> On 6. Jun 2022, at 16:09, Finan, Sean 
>  wrote:
>
> Hi Kean,
>
> Thank you for the suggestion and the link. I am really glad that people are 
> interested in this GitHub topic and taking it seriously. It would be great 
> if we could make it happen.
>
> While definitely a possibility, the git LFS paradigm is something that I 
> would like to avoid.
>
> Like keeping our models on SVN, it would also require separating models from 
> code into two different repos, e.g. github and bitbucket. As opposed to 
> bitbucket, the apache svn repos are long established, familiar to and 
> supported by the apache infrastructure team. The same goes for the apache 
> foundation use of github. I like being able to lean on the apache infra team 
> for help.

So GitHub seems to have support for LFS [1]. What I do not know is if the ASF's 
GitHub plan allows us to use this and if so if there is a volume limit. Would 
have to ask INFRA about that.

The use of Git and GitHub is well supported by the INFRA team. For example, 
there is self-service for creating and managing repos. [2]

There is also the `.asf.yaml` mechanism for configuring GitHub repos and 
hooking them up with the ASF infrastructure including mailing lists, website 
publishing, etc. etc. [3]

> The apache Jenkins servers are linked to the svn repos, making continuous 
> integration easy - on the rare occasion when somebody does change something 
> in a model repo. While I expect anybody savvy enough to work on models to 
> also have the knowhow and wherewithal to work with a separate svn repo, I 
> don't want them to need to get out to jenkins and manually kick off snapshot 
> builds.

Jenkins also supports GitHub very well [4]. For example, in UIMA, we just drop 
a `Jenkinsfile` [5,6] configuration file into each repo and Jenkins picks them 
up and even gives us support for pull requests [7].
I'm happy to help you set that up for cTAKES as well.

> Probably most important is the requirement of the client user to have the LFS 
> command line client. I think that there are enough hoops stuck in front of 
> getting ctakes installed/checked out/cloned/etc. and it seems to me that one 
> of the biggest reasons to use github is to make things easier for absolute 
> newbies to just pull down code and experiment.

It is an additional hoop to jump through indeed, but it is a one-time action to 
install LFS. Chances are that people may even already have it set up because 
they use it in other repos.

> Keeping the models on a separate svn repo would mean that they aren't checked 
> out as code, but would be put in the .m2 maven area when a user runs maven 
> compile. While the total footprint of full ctakes would still be the same 
> size, it woul

Re: Apache cTAKES GitHub mirror is stuck in 2019 [EXTERNAL] [SUSPICIOUS] [SUSPICIOUS]

2022-06-06 Thread Finan, Sean
Hi Kean,

Thank you for the suggestion and the link.  I am really glad that people are 
interested in this GitHub topic and taking it seriously.  It would be great if 
we could make it happen.

While definitely a possibility, the git LFS paradigm is something that I would 
like to avoid.  

Like keeping our models on SVN, it would also require separating models from 
code into two different repos, e.g. github and bitbucket.  As opposed to 
bitbucket, the apache svn repos are long established, familiar to and supported 
by the apache infrastructure team.  The same goes for the apache foundation use 
of github.  I like being able to lean on the apache infra team for help.

The apache Jenkins servers are linked to the svn repos, making continuous 
integration easy - on the rare occasion when somebody does change something in 
a model repo.  While I expect anybody savvy enough to work on models to also 
have the knowhow and wherewithal to work with a separate svn repo, I don't want 
them to need to get out to jenkins and manually kick off snapshot builds.

Probably most important is the requirement of the client user to have the LFS 
command line client.  I think that there are enough hoops stuck in front of 
getting ctakes installed/checked out/cloned/etc. and it seems to me that one of 
the biggest reasons to use github is to make things easier for absolute newbies 
to just pull down code and experiment.

Keeping the models on a separate svn repo would mean that they aren't checked 
out as code, but would be put in the .m2 maven area when a user runs maven 
compile.  While the total footprint of full ctakes would still be the same 
size, it would essentially make the code directory smaller and initial 
downloads/checkouts would be faster.  Plus, if done properly maybe it could 
"clean up" all of those nearly identically named modules in my intellij project 
window and I'd stop clicking on the wrong one when I've had too much coffee.

The LFS system is great for people who want to work on (in development) large 
files, but given the very lopsided ratio of model reuse vs. 
creation/modification in ctakes I don't think that we need to go that route.

I am only one voice of many, so this is obviously up for debate.  Thanks again,

Sean


From: Kean Kaufmann 
Sent: Monday, June 6, 2022 9:07 AM
To: dev@ctakes.apache.org
Subject: Re: Apache cTAKES GitHub mirror is stuck in 2019 [EXTERNAL] 
[SUSPICIOUS] [SUSPICIOUS]

Is Git LFS an option?
https://www.atlassian.com/git/tutorials/git-lfs#installing-git-lfs
Needs an LFS-aware host e.g. Bitbucket; I don't know what the Apache
hosting setup is like.


On Fri, Jun 3, 2022 at 9:31 AM Finan, Sean
 wrote:

> Hi Tim,
>
> >we ran into issues in previous attempts at migration with the large file
> sizes in our repo
>
> Indeed we did, and over the years I have had thoughts on that.
>
> Those large files are large ml models, which are (mostly) static,
> replaceable/interchangeable, not always necessary, and in separate resource
> (-res) modules separated from code modules.
>
> When I was a ctakes newbie I really disliked the separation of code from
> resources by entirely separate -res modules.  Since then, through working
> on projects that use ctakes code but not (huge) resources as dependencies,
> I have realized the wisdom of the modular separation.  In fact, I put a
> -huge- model in its own -res module so that I could exclude it from a
> ctakes-dependent project, saving compile (download) time and disk space.
> Like you, I don't like to "download the internet" with maven   ;^)
>
> Right now we have the ner dictionaries in sourceforge, not the apache
> repos.  While this is done for legal reasons it has worked pretty well.
>
> I think that we could maintain an apache SVN repo of -res modules
> containing only huge model files.   I am guessing that we would have to
> make it a "side/sub project" to maintain a separate repo (jenkins build,
> etc.).
>
> Anyway, it would give us the freedom to use a github repo for code (and
> non-model resources) without users needing to go through the github
> large-file workflow, which I see as a barrier to entry.
>
> Thoughts?
>
> 
> From: Miller, Timothy 
> Sent: Thursday, June 2, 2022 6:21 PM
> To: dev@ctakes.apache.org
> Subject: Re: Apache cTAKES GitHub mirror is stuck in 2019 [EXTERNAL]
> [SUSPICIOUS] [SUSPICIOUS]
>
> My recollection was that we ran into issues in previous attempts at
> migration with the large file sizes in our repo.
> Tim
>
>
> On Thu, 2022-06-02 at 20:55 +0

Re: Apache cTAKES GitHub mirror is stuck in 2019 [EXTERNAL] [SUSPICIOUS] [SUSPICIOUS]

2022-06-03 Thread Finan, Sean
Hi Tim,

>we ran into issues in previous attempts at migration with the large file sizes 
>in our repo

Indeed we did, and over the years I have had thoughts on that.  

Those large files are large ml models, which are (mostly) static, 
replaceable/interchangeable, not always necessary, and in separate resource 
(-res) modules separated from code modules.

When I was a ctakes newbie I really disliked the separation of code from 
resources by entirely separate -res modules.  Since then, through working on 
projects that use ctakes code but not (huge) resources as dependencies, I have 
realized the wisdom of the modular separation.  In fact, I put a -huge- model 
in its own -res module so that I could exclude it from a ctakes-dependent 
project, saving compile (download) time and disk space.  Like you, I don't like 
to "download the internet" with maven   ;^)

Right now we have the ner dictionaries in sourceforge, not the apache repos.  
While this is done for legal reasons it has worked pretty well.

I think that we could maintain an apache SVN repo of -res modules containing 
only huge model files.   I am guessing that we would have to make it a 
"side/sub project" to maintain a separate repo (jenkins build, etc.).   

Anyway, it would give us the freedom to use a github repo for code (and 
non-model resources) without users needing to go through the github large-file 
workflow, which I see as a barrier to entry.

Thoughts?


From: Miller, Timothy 
Sent: Thursday, June 2, 2022 6:21 PM
To: dev@ctakes.apache.org
Subject: Re: Apache cTAKES GitHub mirror is stuck in 2019 [EXTERNAL] 
[SUSPICIOUS] [SUSPICIOUS]

My recollection was that we ran into issues in previous attempts at migration 
with the large file sizes in our repo.
Tim


On Thu, 2022-06-02 at 20:55 +, Finan, Sean wrote:

Thank you Gandhi and Richard.

Unless somebody else beats me to it I will perform some research and see what 
approaches can be used and which might be best.  In the end the cTAKES Project 
Management Committee will need to vote for any action as sweeping as moving to 
github.

Sean

From: gandhi rajan <gandhiraja...@gmail.com>
Sent: Thursday, June 2, 2022 9:02 AM
To: dev@ctakes.apache.org
Subject: Re: Apache cTAKES GitHub mirror is stuck in 2019 [EXTERNAL]

Hi Sean,

If we are sure that the SVN has all the latest changes and active
development is primarily on SVN, then why don't we request a fresh git
repository and push all the changes over there.

More info on
https://infra.apache.org/svn-to-git-migration.html

On Thu, Jun 2, 2022 at 5:52 PM Finan, Sean
<sean.fi...@childrens.harvard.edu.invalid> wrote:

Hi Richard, you bring up a valid concern.

cTAKES Developers:

The Apache Foundation has had an initiative to "move" all projects to
GitHub for some time now.

I don't know much about how this is done.  If anybody out there has
knowledge or experience that they can pass on, please share.

Thanks,
Sean

From: Richard Eckart de Castilho <r...@apache.org>
Sent: Thursday, June 2, 2022 3:39 AM
To: dev@ctakes.apache.org
Subject: Apache cTAKES GitHub mirror is stuck in 2019 [EXTERNAL]

Hi,

it appears that the GitHub mirror of Apache cTAKES may be stuck.

When I check the svn log of
https://svn.apache.org/repos/asf/ctakes/trunk/
, I can see activity as recent as May 2022.

However, on GitHub, I can only see stale branches:

https://github.com/apache/ctakes/branches

Wouldn't it be good if the

Re: Apache cTAKES GitHub mirror is stuck in 2019 [EXTERNAL]

2022-06-02 Thread Finan, Sean
Thank you Gandhi and Richard.

Unless somebody else beats me to it I will perform some research and see what 
approaches can be used and which might be best.  In the end the cTAKES Project 
Management Committee will need to vote for any action as sweeping as moving to 
github.

Sean

From: gandhi rajan 
Sent: Thursday, June 2, 2022 9:02 AM
To: dev@ctakes.apache.org
Subject: Re: Apache cTAKES GitHub mirror is stuck in 2019 [EXTERNAL]

* External Email - Caution *


Hi Sean,

If we are sure that the SVN has all the latest changes and active
development is primarily on SVN, then why don't we request a fresh git
repository and push all the changes over there.

More info on 
https://urldefense.com/v3/__https://infra.apache.org/svn-to-git-migration.html__;!!NZvER7FxgEiBAiR_!rXFMCtlZM4NpDPkgzeq-X2pj1rNwzQNTpZkMZXDoYiZKdJp0n4tDY6q9IcsGRPGrA6KhvmouV_1y_txDVok-tGy3dVLaqefQlQ$

On Thu, Jun 2, 2022 at 5:52 PM Finan, Sean
 wrote:

> Hi Richard, you bring up a valid concern.
>
> cTAKES Developers:
>
> The Apache Foundation has had an initiative to "move" all projects to
> GitHub for some time now.
>
> I don't know much about how this is done.  If anybody out there has
> knowledge or experience that they can pass on, please share.
>
> Thanks,
> Sean
> 
> From: Richard Eckart de Castilho 
> Sent: Thursday, June 2, 2022 3:39 AM
> To: dev@ctakes.apache.org
> Subject: Apache cTAKES GitHub mirror is stuck in 2019 [EXTERNAL]
>
> * External Email - Caution *
>
>
> Hi,
>
> it appears that the GitHub mirror of Apache cTAKES may be stuck.
>
> When I check the svn log of
> https://urldefense.com/v3/__https://svn.apache.org/repos/asf/ctakes/trunk/__;!!NZvER7FxgEiBAiR_!pH7M7eePuLp7ejJW09QaoQOZsyoj1CD8QySUDx79FZmu6CUuooFcB0dk0hJQ7aI7G3Sq3Mz_GzoiL9XZi-zSEw$
> , I can
> see activity as recent as May 2022.
>
> However, on GitHub, I can only see stale branches:
>
>
> https://urldefense.com/v3/__https://github.com/apache/ctakes/branches__;!!NZvER7FxgEiBAiR_!pH7M7eePuLp7ejJW09QaoQOZsyoj1CD8QySUDx79FZmu6CUuooFcB0dk0hJQ7aI7G3Sq3Mz_GzoiL9Uu2s-59w$
>
> Wouldn't it be good if the GitHub mirror would be kept up-to-date?
>
> Best,
>
> -- Richard
>
>

--
Regards,
Gandhi

"The best way to find urself is to lose urself in the service of others !!!"


Re: SmokingStatus & Side effects - piper file [EXTERNAL]

2022-06-02 Thread Finan, Sean
There are a few things to talk about, including some bad news.

The bad news:
All 3 of those annotators predate the use of UimaFit: 
https://uima.apache.org/uimafit.html
Pipers use UimaFit to simplify specification of parameters and configure 
advanced pipelines.  
Piper files will not work with older annotators such as those you wish to 
utilize.

Some good news is that the problems are in the initialization of those 
annotators and not processing.
Some refactoring of those annotators to bring them up to date shouldn't be too 
difficult.

You have exemplified one of the reasons for creating the piper paradigm, which 
is the simplification of parameter specifications.  There isn't (shouldn't be) 
any need to specify urls, resources that point to the urls, then parameters 
that point to resources. 
For instance, a piper would just have:
set StopWordsFile=org/apache/ctakes/smokingstatus/data/PCS/stopwords_PCS.txt
set PCSKeyWordFile=org/apache/ctakes/smokingstatus/data/PCS/keywords_PCS.txt
set PathOfModel=org/apache/ctakes/smokingstatus/data/PCS/pcs_libsvm-2.91.model
add PcsClassifierAnnotator_libsvm

Some information on using piper files can be found here: 
https://cwiki.apache.org/confluence/display/CTAKES/Piper+Files

Sean


From: Muhammad Ali Syed 
Sent: Thursday, June 2, 2022 3:22 PM
To: dev@ctakes.apache.org
Subject: Re: SmokingStatus & Side effects - piper file [EXTERNAL]

* External Email - Caution *


Below is my piper file and the sample text I am using is
ctakes-smoking-status/data/test/doc2_07543210_sample_current.txt:
load FullTokenizerPipeline

// Add non-core annotators
add ContextDependentTokenizerAnnotator
addDescription POSTagger

// Add Chunkers
load ChunkerSubPipe

// Default fast dictionary lookup
load DictionarySubPipe

// Add Cleartk Entity Attribute annotators
load AttributeCleartkSubPipe

// ClassifiableEntries - errors out at
org.apache.ctakes.smokingstatus.ae.ClassifiableEntries.initialize(ClassifiableEntries.java:134)
set SectionsToIgnore=20109,20138
set
AllowedClassifications=SMOKER,CURRENT_SMOKER,NON_SMOKER,PAST_SMOKER,UNKNOWN
set UimaDescriptorStep1=file:
org/apache/ctakes/smokingstatus/analysis_engine/ProductionPostSentenceAggregate_step1.xml
set UimaDescriptorStep2=file:
org/apache/ctakes/smokingstatus/analysis_engine/ProductionPostSentenceAggregate_step2_libsvm.xml
add ClassifiableEntries UimaDescriptorStep1Key=UimaDescriptorStep1
UimaDescriptorStep2Key=UimaDescriptorStep2

// KuRuleBasedClassifierAnnotator-  works but commented out for now
//add KuRuleBasedClassifierAnnotator
SmokingWordsFile=/org/apache/ctakes/smokingstatus/data/KU/keywords.txt
UnknownWordsFile=/org/apache/ctakes/smokingstatus/data/KU/unknown_words.txt

// PcsClassifierAnnotator_libsvm - errors out at
libsvm.svm.svm_predict(svm.java:2343)
set StopWordsFileRes=file:
org/apache/ctakes/smokingstatus/data/PCS/stopwords_PCS.txt
set PathOfModelRes=file:
org/apache/ctakes/smokingstatus/data/PCS/pcs_libsvm-2.91.model
set PCSKeyWordFileResc=file:
org/apache/ctakes/smokingstatus/data/PCS/keywords_PCS.txt
//add PcsClassifierAnnotator_libsvm PathOfModel=PathOfModelResc
StopWordsFile=StopWordsFileRes PCSKeyWordFile=PCSKeyWordFileResc

// SideEffectAnnotator - errors out
set sideEffectDic=file:
org/apache/ctakes/sideeffect/lookup/sideEffect_dictionary.txt
//add SideEffectAnnotator sideEffectTable=sideEffectDic

addLast util.log.FinishedLogger

On Thu, Jun 2, 2022 at 2:39 PM Finan, Sean
 wrote:

> Hi Muhammad,
>
> Can you please copy & paste the contents of your piper file?
>
> Thanks,
> Sean
> 
> From: Muhammad Ali Syed 
> Sent: Thursday, June 2, 2022 2:30 PM
> To: dev@ctakes.apache.org
> Subject: SmokingStatus & Side effects - piper file [EXTERNAL]
>
> * External Email - Caution *
>
>
> Hi there,
>
> I am exploring cTakes Smoking Status & Side Effects components and have not
> come across any piper file version of their implementation. When trying to
> incrementally add AEs to FullTokenizerPipeline.piper from these 2
> components I am running into issues such as:
> - getting ResourceInitializationExceptions - when adding
> ClassifiableEntries (did set UimaDescriptorStep1Key
> and UimaDescriptorStep2Key) I tried add individual AEs from step1 and step2
> and ran into other issues
> - exceptions such as in PcsClassifierAnnotator_libsvm (added to pipeline
> after adding
> KuRuleBasedClassifierAnnotator): java.lang.NullPointerException at
> libsvm.svm.svm_predict(svm.java:2343)
>
> My question is: can these 2 components (containing multiple AEs) be
> implemented by piper files as of now? In other words, can any pipeline,
> that can be created using XML descriptor files, be also created by piper
> files?
>
> Is there any sample piper code for pipelines that include either of these
> components?
>
> Regards,
>


Re: SmokingStatus & Side effects - piper file [EXTERNAL]

2022-06-02 Thread Finan, Sean
Hi Muhammad,

Can you please copy & paste the contents of your piper file?

Thanks,
Sean

From: Muhammad Ali Syed 
Sent: Thursday, June 2, 2022 2:30 PM
To: dev@ctakes.apache.org
Subject: SmokingStatus & Side effects - piper file [EXTERNAL]

* External Email - Caution *


Hi there,

I am exploring cTakes Smoking Status & Side Effects components and have not
come across any piper file version of their implementation. When trying to
incrementally add AEs to FullTokenizerPipeline.piper from these 2
components I am running into issues such as:
- getting ResourceInitializationExceptions - when adding
ClassifiableEntries (did set UimaDescriptorStep1Key
and UimaDescriptorStep2Key) I tried add individual AEs from step1 and step2
and ran into other issues
- exceptions such as in PcsClassifierAnnotator_libsvm (added to pipeline
after adding
KuRuleBasedClassifierAnnotator): java.lang.NullPointerException at
libsvm.svm.svm_predict(svm.java:2343)

My question is: can these 2 components (containing multiple AEs) be
implemented by piper files as of now? In other words, can any pipeline,
that can be created using XML descriptor files, be also created by piper
files?

Is there any sample piper code for pipelines that include either of these
components?

Regards,


Re: Apache cTAKES GitHub mirror is stuck in 2019 [EXTERNAL]

2022-06-02 Thread Finan, Sean
Hi Richard, you bring up a valid concern.

cTAKES Developers:

The Apache Foundation has had an initiative to "move" all projects to GitHub 
for some time now.  

I don't know much about how this is done.  If anybody out there has knowledge 
or experience that they can pass on, please share.

Thanks,
Sean

From: Richard Eckart de Castilho 
Sent: Thursday, June 2, 2022 3:39 AM
To: dev@ctakes.apache.org
Subject: Apache cTAKES GitHub mirror is stuck in 2019 [EXTERNAL]

* External Email - Caution *


Hi,

it appears that the GitHub mirror of Apache cTAKES may be stuck.

When I check the svn log of 
https://urldefense.com/v3/__https://svn.apache.org/repos/asf/ctakes/trunk/__;!!NZvER7FxgEiBAiR_!pH7M7eePuLp7ejJW09QaoQOZsyoj1CD8QySUDx79FZmu6CUuooFcB0dk0hJQ7aI7G3Sq3Mz_GzoiL9XZi-zSEw$
 , I can
see activity as recent as May 2022.

However, on GitHub, I can only see stale branches:

https://urldefense.com/v3/__https://github.com/apache/ctakes/branches__;!!NZvER7FxgEiBAiR_!pH7M7eePuLp7ejJW09QaoQOZsyoj1CD8QySUDx79FZmu6CUuooFcB0dk0hJQ7aI7G3Sq3Mz_GzoiL9Uu2s-59w$

Wouldn't it be good if the GitHub mirror would be kept up-to-date?

Best,

-- Richard



Re: Issue in running developers version of Apache cTakes to process DefaultClinicalPipeline [EXTERNAL]

2022-04-07 Thread Finan, Sean
Hi Ankit,

Did you add CuiListFileWriter to your pipeline?  I don't think that it is in 
the default pipeline.

As long as you are running the latest code it should be present.   
It is in ctakes-core  org/apache/ctakes/core/cc/CuiListFileWriter.java  
It was added to ctakes on 02/14/2018.

Sean

From: Anand, Ankit (Campus) 
Sent: Thursday, April 7, 2022 2:29 PM
To: dev@ctakes.apache.org
Subject: RE: Issue in running developers version of Apache cTakes to process 
DefaultClinicalPipeline [EXTERNAL]

* External Email - Caution *


Thank you, the issue below is resolved, but I am now getting this error message. Is 
this something related to my setup?
I have setup ctakes on server which basically runs on linux. Since the build 
did not give any error and was successful, I think the setup should be fine.

[ankitk@login004 trunk]$ bin/runClinicalPipeline.sh -i ./inputdir --xmiOut 
./outputdir
log4j: reset attribute= "false".
log4j: Threshold ="null".
log4j: Retreiving an instance of org.apache.log4j.Logger.
log4j: Setting [ProgressAppender] additivity to [false].
log4j: Level value for ProgressAppender is  [INFO].
log4j: ProgressAppender level set to INFO
log4j: Class name: [org.apache.log4j.ConsoleAppender]
log4j: Parsing layout of class: "org.apache.log4j.PatternLayout"
log4j: Setting property [conversionPattern] to [%m].
log4j: Adding appender named [noEolAppender] to category [ProgressAppender].
log4j: Retreiving an instance of org.apache.log4j.Logger.
log4j: Setting [ProgressDone] additivity to [false].
log4j: Level value for ProgressDone is  [INFO].
log4j: ProgressDone level set to INFO
log4j: Class name: [org.apache.log4j.ConsoleAppender]
log4j: Parsing layout of class: "org.apache.log4j.PatternLayout"
log4j: Setting property [conversionPattern] to [%m%n].
log4j: Adding appender named [eolAppender] to category [ProgressDone].
log4j: Level value for root is  [INFO].
log4j: root level set to INFO
log4j: Class name: [org.apache.log4j.ConsoleAppender]
log4j: Parsing layout of class: "org.apache.log4j.PatternLayout"
log4j: Setting property [conversionPattern] to [%d{dd MMM  HH:mm:ss} %5p 
%c{1} - %m%n].
log4j: Adding appender named [consoleAppender] to category [root].
07 Apr 2022 10:19:20 ERROR PiperFileRunner - MESSAGE LOCALIZATION FAILED: Can't 
find resource for bundle java.util.PropertyResourceBundle, key No Analysis 
Component found for CuiListFileWriter

Thanks.
-Ankit

Best,

Ankit Anand
Graduate Assistant | Informatics Institute, UAB
Department of Computer Science, UAB
University of Alabama at Birmingham
Contact - (205) 585-7029


-Original Message-
From: Anand, Ankit (Campus)
Sent: Thursday, April 7, 2022 11:37 AM
To: pabramowit...@gmail.com
Cc: dev@ctakes.apache.org
Subject: RE: Issue in running developers version of Apache cTakes to process 
DefaultClinicalPipeline

Thank you so much Peter :)
My Issue is resolved. I was giving -xmiOut in my command line instead of 
--xmiOut which was causing the problem.

Best,

Ankit Anand
Graduate Assistant | Informatics Institute, UAB Department of Computer Science, 
UAB University of Alabama at Birmingham Contact - (205) 585-7029

-Original Message-
From: Peter Abramowitsch 
Sent: Thursday, April 7, 2022 1:30 AM
To: dev@ctakes.apache.org
Subject: Re: Issue in running developers version of Apache cTakes to process 
DefaultClinicalPipeline

Hi Ankit,

One normally doesn't put the input and output folders in the trunk area, but I 
just tried it and it works fine:

07 Apr 2022 08:15:53  INFO FinishedLogger - Run Start Time:   Thu Apr 07 08:14:53 CEST 2022
07 Apr 2022 08:15:53  INFO FinishedLogger - Processing Start Time:   Thu Apr 07 08:15:18 CEST 2022
07 Apr 2022 08:15:53  INFO FinishedLogger - Processing End Time:   Thu Apr 07 08:15:53 CEST 2022
07 Apr 2022 08:15:53  INFO FinishedLogger - Initialization Time Elapsed:   24 seconds
07 Apr 2022 08:15:53  INFO FinishedLogger - Processing Time Elapsed:   35 seconds
07 Apr 2022 08:15:53  INFO FinishedLogger - Total Run Time Elapsed:   1 minutes, 0 seconds
.
Your error message is complaining about your  *command line option for the 
input folder.*  That it is missing and that other args it's finding are 
unexpected.

So my guess is that your command line is not exactly as you have written.
For instance, the input folder may have a space and a hyphen in its name.
Having an argument of  -i input -folder   instead of -i input-folder  would
cause an error like yours.  I tried that and I get your error:

Option must have a value: [--option_u -u value] Option must have a value: 
[--option_t -t value] at
com.lexicalscope.jewel.cli.ValidationErrorBuilderImpl.validate(ValidationErrorBuilderImpl.java:64)
at
com.lexicalscope.jewel.cli.validation.ArgumentValidatorImpl.finishedProcessing(ArgumentValidatorImpl.java:104)
at
com.lexicalscope.jewel.cli.ArgumentCollectionBuilder.processArguments(ArgumentCollectionBuilder.java:129)
at

Re: End of the road for UIMAv2 - please upgrade to UIMAv3 [EXTERNAL]

2022-03-08 Thread Finan, Sean
Thank you Richard, this is great information.

From: Richard Eckart de Castilho 
Sent: Tuesday, March 8, 2022 3:43 AM
To: dev@ctakes.apache.org
Subject: End of the road for UIMAv2 - please upgrade to UIMAv3 [EXTERNAL]

* External Email - Caution *


On 17. Aug 2021, at 22:08, Finan, Sean  wrote:
>
> If you absolutely require uima 3 for some reason then I don't think that I 
> can help you.  You may want to ask the uima lists about mixing versions or 
> equivalent v2 solutions for your goals.

Besides connecting pipes through remote services, there is no way to combine 
UIMAv2 and UIMAv3.

Work on UIMAv2 has fully stopped.

UIMAv2 is very likely not going to get any more updates and bug fixes.
A very last uimaFIT 2.6.0 might still make it, but that's likely it.

I would strongly recommend that you upgrade to v3 as soon as possible.
If you have any trouble doing so, please let me know. The easiest way
is via the Apache UIMA users mailing list.

Best,

-- Richard

(Apache UIMA PMC Chair)



Re: Question for Sean et al? [EXTERNAL]

2022-02-10 Thread Finan, Sean
Hi Peter,

The fastest turnaround for you would probably be seen by modifying a local 
copy.  According to this site the source is available:

http://mastif.sourceforge.net/

Sean

From: Peter Abramowitsch 
Sent: Thursday, February 10, 2022 2:12 PM
To: dev@ctakes.apache.org
Subject: Question for Sean et al? [EXTERNAL]

* External Email - Caution *


Hi all

I've started using the mitre ZoneAnnotator and am making some optimizations and
necessary changes which prevent it from leaking note text into the log file
(horrors!!!)

But I have a question.   Turns out that back in 2012  someone left raw
system.outs in the mastif-zoner code which comes to cTakes as a
dependency.  They are not part of the cTakes repo.

These print
input (character offset): 0
input (character offset): 34
...
in the log,  which can get really annoying when you have 100 million notes
as I have.

Here's my question ... what would you do?

   - Would you try to see about changing it in the original code (not sure
   who maintains it)
   - Use a modified version locally  (mind my own business)
   - Absorb this small library into the cTakes repo and remove the external
   dependency?

Peter


Re: Performance of the cleartk history module [EXTERNAL]

2022-01-05 Thread Finan, Sean
Hi Peter,

Your indexing solution sounds pretty slick.  I am practically salivating at the 
prospect of testing and using another negation engine!

Tim,

I think that you have done negation testing before (Plos one, '14?).  Do you 
still have a test configuration (i2b2, sharpn, mipacq) lurking somewhere?

Sean

From: Peter Abramowitsch 
Sent: Wednesday, January 5, 2022 12:51 AM
To: dev@ctakes.apache.org
Subject: Re: Performance of the cleartk history module [EXTERNAL]

* External Email - Caution *


Hi Tim,
The performance boost was the frosting on the cake:  I had to make changes
(at least for our team) because Negex was not working correctly in
sentences with multiple identified annotations only some of which were
meant to be negated.  Negex became over-eager - applying negation when it
shouldn't have.  But even in its original version it was much more
effective than the cleartk polarity module. Shifting from Polarity to
the original Negex was decidedly slower - you could feel it.

However, you're right it would be good to benchmark it and get some real
numbers.  But as I say, it was the need to fix some of its problems that
was the primary issue.  I suspect that the regex cpu loading wasn't a big
issue in the early days of Negex when testing on grammatical biomedical
text and there were only a few negex trigger patterns.   But with 310
potential patterns and extremely dense notes it can make a real
difference.   The compiled regex from each pattern is fairly complex as
well.

I don't like code that does unnecessary work (literally billions of times
in my case)  - and in a large suite like cTakes all the little coding
shortcuts that waste CPU do add up.

I'll do a test and publish the results when I check in the code.
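
[Editorial sketch] The indexing idea discussed in this thread (an inverted
index from trigger words to compiled expressions, so that only expressions
that could possibly match a sentence are run) might look roughly like the
following.  This is not the Negex code itself.  The class name TriggerIndex,
its methods and the plain-phrase triggers are assumptions made purely for
illustration, and this version indexes every word of a trigger rather than
only the distinguishing ones.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashMap;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.regex.Pattern;

/** Hypothetical sketch of a word-keyed index of compiled patterns. */
public class TriggerIndex {
    // word -> compiled patterns whose trigger phrase contains that word
    private final Map<String, List<Pattern>> index = new HashMap<>();

    /** Triggers are plain phrases here; real triggers would be full regular expressions. */
    public TriggerIndex(List<String> triggerPhrases) {
        for (String phrase : triggerPhrases) {
            // Compile each pattern exactly once, up front.
            Pattern pattern = Pattern.compile(
                    "\\b" + Pattern.quote(phrase) + "\\b", Pattern.CASE_INSENSITIVE);
            for (String word : tokenize(phrase)) {
                index.computeIfAbsent(word, k -> new ArrayList<>()).add(pattern);
            }
        }
    }

    /** Lowercases and splits on anything that is not a letter or digit. */
    static List<String> tokenize(String text) {
        List<String> words = new ArrayList<>();
        for (String part : text.toLowerCase().split("[^a-z0-9]+")) {
            if (!part.isEmpty()) {
                words.add(part);
            }
        }
        return words;
    }

    /** Patterns that share at least one word with the sentence; the rest are never run. */
    public Set<Pattern> candidates(String sentence) {
        Set<Pattern> result = new LinkedHashSet<>();
        for (String word : tokenize(sentence)) {
            List<Pattern> hits = index.get(word);
            if (hits != null) {
                result.addAll(hits);
            }
        }
        return result;
    }

    /** Runs only the candidate patterns against the sentence. */
    public boolean anyMatch(String sentence) {
        for (Pattern pattern : candidates(sentence)) {
            if (pattern.matcher(sentence).find()) {
                return true;
            }
        }
        return false;
    }

    public static void main(String[] args) {
        TriggerIndex idx = new TriggerIndex(
                Arrays.asList("no evidence of", "denies", "ruled out"));
        System.out.println(idx.candidates("Patient has a fever.").size());
        System.out.println(idx.anyMatch("There is no evidence of pneumonia."));
    }
}
```

A sentence that shares no word with any trigger yields an empty candidate
set, so no regular expression is executed for it at all.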

On Tue, Jan 4, 2022 at 8:54 PM Miller, Timothy <
timothy.mil...@childrens.harvard.edu> wrote:

> Peter,
> That sounds really useful! Were you able to benchmark it for runtime on a
> reasonably sized sample of your notes? Just curious because I wouldn't have
> expected regex to be that much of a bottleneck.
> Tim
>
>
> On Tue, 2022-01-04 at 17:36 -0800, Peter Abramowitsch wrote:
>
> * External Email - Caution *
>
>
> Thank you for the fulsome and humorous response.  Yes, I understand
> perfectly.  We definitely think along the same lines.  One of the drawbacks
> of static and simple to understand utility functions like JCasUtil's  is
> that one can just slap things together without getting to grips with the
> wastage of resources that sometimes occur.
>
> This brings me to the topic of Negex.  I've done a lot of improvements to
> it, also after I sent you that version last year.  It has been well tested
> in over 100 million notes so I think I can check it in.  But back to
> performance - it used to execute 200+ regular expressions multiple times on
> every sentence covering an identified annotation regardless of whether
> there was any hope of any of them matching.   My solution was to build an
> inverted index of the compiled expressions keyed on unique words found in
> the expressions, so based on the sentence,  I could look up and execute
> only the expressions that might match.  This might cut the number of regex
> operations down to 5 or 10 and sometimes none at all.  There were many
> other changes that related to negation detection, of course.  For instance
> - handling sentences that switch between negating and non negating phrases
> within the same sentence.
>
> Peter
>
> On Tue, Jan 4, 2022 at 10:47 AM Finan, Sean
> <sean.fi...@childrens.harvard.edu> wrote:
>
> Great question.
>
> The package name "windowed" isn't helpfully self-descriptive.  It contains
> yet another bit of code that I wrote as quickly as possible to help
> somebody in real-time with a problem.
> * There is only a 'procedural' difference between the two.  The models and
> methods are the same.
>
> The assertion engine has a bunch of objects delegating to objects
> delegating to more objects.  Each object calls one or more
> JCasUtil.select() frequently for the same types.  They also redundantly
> call JCasUtil.selectCovered() and selectCovering() for the same types.
>
> process( jcas ) {
>   Collection<..> sentences = ...select(..);
>   delegateA.do( sentences );
> }
> class DelegateA {
>   void do( Collection<..> sentences ) {
>    for ( Sentence sentence : sentences ) {
>       Collection tokens = JCasUtil.selectCovered( jcas,
> Token.class, sentence );
>       delegateB.use( to

Re: Performance of the cleartk history module [EXTERNAL]

2022-01-04 Thread Finan, Sean
Great question.

The package name "windowed" isn't helpfully self-descriptive.  It contains yet 
another bit of code that I wrote as quickly as possible to help somebody in 
real-time with a problem.
* There is only a 'procedural' difference between the two.  The models and 
methods are the same.

The assertion engine has a bunch of objects delegating to objects delegating to 
more objects.  Each object calls one or more JCasUtil.select() frequently for 
the same types.  They also redundantly call JCasUtil.selectCovered() and 
selectCovering() for the same types.

process( jcas ) {
  Collection<..> sentences = ...select(..);
  delegateA.do( sentences );
}
class DelegateA {
  void do( Collection<..> sentences ) {
    for ( Sentence sentence : sentences ) {
      Collection tokens = JCasUtil.selectCovered( jcas, Token.class, sentence );
      delegateB.use( tokens );
    }
  }
}
class DelegateB {
  void use( Collection<..> tokens ) {
    Collection sentence = JCasUtil.selectCovering( jcas, Sentence.class, tokens );
    ...
  }
}

The above isn't an exact representation, but you get the point.
The problem with code like this is repeated traversal of the (object) array in 
the cas.  Every JCasUtil.select* pours through the whole thing.  For a small 
document with a small cas (or early in a pipeline), that array may be small and 
the traversal fast.  However, when people are (inadvisably) processing a single 
document that sizes in the gigabyte range, repeatedly going through the cas 
takes a long time.

So, what I did was create a single container object that holds Collections of 
the types of interest and their covering relationships, populate all that stuff 
once per process( jcas ) and pass that container through to each delegate 
object.  Basically, a jcas lite.  The biggest culprit in the assertion engines 
was repeatedly iterating over the array for covered and covering windows, hence 
the subpackage name "windowed".

Is it faster for smaller docs?  Not so much.  Does it instantaneously process 
the Encyclopedia Britannica as one text?  Of course not.  Is it orders of 
magnitude faster on such onerous docs?  In my tests, yes.

Going through my delegating example above, the end delegate is the same.  Hence 
the processing is the same and repeatable.  In my tests on both small and 
gargantuan documents the windowed version and the original version produced the 
same output.
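
[Editorial sketch] The "populate once, pass it along" container described
above could be sketched as below.  This is not the code in the windowed
package.  WindowCache and Span are hypothetical stand-ins that use plain
offsets instead of UIMA annotations, so the example runs without cTAKES or
UIMA on the classpath.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

/** Hypothetical sketch: covering relationships computed once per document. */
public class WindowCache {

    /** Minimal stand-in for an annotation with begin/end offsets. */
    public static final class Span {
        final int begin;
        final int end;

        public Span(int begin, int end) {
            this.begin = begin;
            this.end = end;
        }

        boolean covers(Span other) {
            return begin <= other.begin && other.end <= end;
        }
    }

    // Keys are compared by identity, which is enough for this sketch.
    private final Map<Span, List<Span>> tokensBySentence = new HashMap<>();
    private final Map<Span, Span> sentenceByToken = new HashMap<>();

    /** Built once per document; delegates then read from the maps instead of rescanning. */
    public WindowCache(List<Span> sentences, List<Span> tokens) {
        // A naive nested loop keeps the sketch short; sorted input would allow a single pass.
        for (Span sentence : sentences) {
            List<Span> covered = new ArrayList<>();
            for (Span token : tokens) {
                if (sentence.covers(token)) {
                    covered.add(token);
                    sentenceByToken.put(token, sentence);
                }
            }
            tokensBySentence.put(sentence, covered);
        }
    }

    /** Lookup standing in for a repeated "select covered" call. */
    public List<Span> coveredTokens(Span sentence) {
        return tokensBySentence.getOrDefault(sentence, new ArrayList<>());
    }

    /** Lookup standing in for a repeated "select covering" call. */
    public Span coveringSentence(Span token) {
        return sentenceByToken.get(token);
    }

    public static void main(String[] args) {
        Span sentence = new Span(0, 10);
        Span token = new Span(0, 4);
        WindowCache cache = new WindowCache(Arrays.asList(sentence), Arrays.asList(token));
        System.out.println(cache.coveredTokens(sentence).size());
    }
}
```

Each delegate would receive the same WindowCache instance, so the cost of
building the maps is paid once per document rather than once per call.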

Sean


   



From: Peter Abramowitsch 
Sent: Tuesday, January 4, 2022 11:39 AM
To: dev@ctakes.apache.org
Subject: Re: Performance of the cleartk history module [EXTERNAL]

* External Email - Caution *


Hi Sean
Ok..  I was confused whether I was meant to find it in the sources.
But while you're reading this, is there a brief way to describe the
difference between the older:
package org.apache.ctakes.assertion.medfacts.cleartk;
and
org.apache.ctakes.assertion.medfacts.cleartk.windowed

Peter





On Tue, Jan 4, 2022 at 7:47 AM Finan, Sean 
wrote:

> Hi Peter,
>
> I created a second engine that just used text matching or regular
> expressions given the discovered events.  It also uses covering section
> types, formatted text and other things, but the text match might be the
> most impactful item.
>
> You are an accomplished developer so the email scratch below is for the
> benefit of others who search archives.
>
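> import java.util.Arrays;
> import org.apache.ctakes.typesystem.type.constants.CONST;
> import org.apache.ctakes.typesystem.type.textsem.EventMention;
> import org.apache.uima.analysis_engine.AnalysisEngineProcessException;
> import org.apache.uima.fit.component.JCasAnnotator_ImplBase;
> import org.apache.uima.fit.util.JCasUtil;
> import org.apache.uima.jcas.JCas;
>
> class LazyHistoryFinder extends JCasAnnotator_ImplBase {
>   // Lowercase prefixes that mark an event as patient history.
>   static final String[] HISTORY = { "history of", "h/o", "h / o" };
>
>   // True if the event's covered text starts with one of the prefixes.
>   boolean isHistory( EventMention event ) {
>     String text = event.getCoveredText().toLowerCase();
>     return Arrays.stream( HISTORY ).anyMatch( text::startsWith );
>   }
>
>   @Override
>   public void process( JCas jcas ) throws AnalysisEngineProcessException {
>     // Flag every matching event as history.
>     JCasUtil.select( jcas, EventMention.class )
>             .stream()
>             .filter( this::isHistory )
>             .forEach( e -> e.setHistoryOf( CONST.NE_HISTORY_OF_PRESENT ) );
>   }
> }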
>
> It requires a stroll through the monstrous cas array and it certainly
> isn't sexy, but it gets the job done.
>
> Sean
>
>
> 
> From: Peter Abramowitsch 
> Sent: Monday, January 3, 2022 10:23 PM
> To: dev@ctakes.apache.org
> Subject: Re: Performance of the cleartk history module [EXTERNAL]
>
> * External Email - Caution *
>
>
> Thanks Sean
>
> By "following engine", you mean a second instance of the history engine
> that uses only the event spans, or you modified the current one to traverse
> the event-span within the context window?  I see you made some source
> changes in that area and will check tomorrow.
>
> Peter
>
> On Mon, Jan 3, 2022 at 2:26 PM Finan, Sean <
> sean.fi...@childrens.harvard.edu>
> wrote:
>
> > Hi Peter,
> >
> 

Re: Performance of the cleartk history module [EXTERNAL]

2022-01-04 Thread Finan, Sean
Hi Peter,

I created a second engine that just used text matching or regular expressions 
given the discovered events.  It also uses covering section types, formatted 
text and other things, but the text match might be the most impactful item.

You are an accomplished developer so the email scratch below is for the benefit 
of others who search archives. 

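import java.util.Arrays;
import org.apache.ctakes.typesystem.type.constants.CONST;
import org.apache.ctakes.typesystem.type.textsem.EventMention;
import org.apache.uima.analysis_engine.AnalysisEngineProcessException;
import org.apache.uima.fit.component.JCasAnnotator_ImplBase;
import org.apache.uima.fit.util.JCasUtil;
import org.apache.uima.jcas.JCas;

class LazyHistoryFinder extends JCasAnnotator_ImplBase {
  // Lowercase prefixes that mark an event as patient history.
  static final String[] HISTORY = { "history of", "h/o", "h / o" };

  // True if the event's covered text starts with one of the prefixes.
  boolean isHistory( EventMention event ) {
    String text = event.getCoveredText().toLowerCase();
    return Arrays.stream( HISTORY ).anyMatch( text::startsWith );
  }

  @Override
  public void process( JCas jcas ) throws AnalysisEngineProcessException {
    // Flag every matching event as history.
    JCasUtil.select( jcas, EventMention.class )
            .stream()
            .filter( this::isHistory )
            .forEach( e -> e.setHistoryOf( CONST.NE_HISTORY_OF_PRESENT ) );
  }
}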

It requires a stroll through the monstrous cas array and it certainly isn't 
sexy, but it gets the job done.  
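
[Editorial sketch] For readers without a cTAKES build at hand, the prefix
test alone can be tried with plain strings.  HistoryPrefixCheck is a
hypothetical class name, and the sample strings are the "range text" values
reported in this thread; nothing here touches the CAS.

```java
import java.util.Arrays;
import java.util.List;

/** Hypothetical, dependency-free version of the prefix test. */
public class HistoryPrefixCheck {
    // Same prefixes as in the email scratch code.
    static final List<String> HISTORY = Arrays.asList("history of", "h/o", "h / o");

    /** True if the covered text starts with a history prefix, ignoring case. */
    static boolean isHistory(String coveredText) {
        String text = coveredText.toLowerCase();
        return HISTORY.stream().anyMatch(text::startsWith);
    }

    public static void main(String[] args) {
        String[] rangeTexts = {
            "history of pregnancies", "adenopathy", "postpartum psychosis", "h/o pregnancies"
        };
        for (String rangeText : rangeTexts) {
            System.out.println(rangeText + " -> " + isHistory(rangeText));
        }
    }
}
```

Note that a check like this only flags events whose own span begins with the
history words.  An event such as "postpartum psychosis" taken from
"H/O: postpartum psychosis" would not be flagged by it.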

Sean



From: Peter Abramowitsch 
Sent: Monday, January 3, 2022 10:23 PM
To: dev@ctakes.apache.org
Subject: Re: Performance of the cleartk history module [EXTERNAL]

* External Email - Caution *


Thanks Sean

By "following engine", you mean a second instance of the history engine
that uses only the event spans, or you modified the current one to traverse
the event-span within the context window?  I see you made some source
changes in that area and will check tomorrow.

Peter

On Mon, Jan 3, 2022 at 2:26 PM Finan, Sean 
wrote:

> Hi Peter,
>
> I have noticed this and just added a following engine that recognized text
> within event spans.  It is a lazy solution, but it fit my needs and
> available time.
>
> Sean
> 
> From: Peter Abramowitsch 
> Sent: Monday, January 3, 2022 5:03 PM
> To: dev@ctakes.apache.org
> Subject: Performance of the cleartk history module [EXTERNAL]
>
> * External Email - Caution *
>
>
> Hi All
>
> I've noticed that the HistoryCleartkAnalysisEngine misses many common forms
> of subject history including the obvious "h/o" prefix.  Looking into the
> distribution, there's a model.jar and what  appears to be a weights file
> containing trigger words:
> resources/org/apache/ctakes/assertion/models/history.txt   where h, o, /
> are all given their own weights.   But I'm not sure that they're actually
> used in this way:  see below.   However, there's also a tiny file:
> /org/apache/ctakes/assertion/semantic_classes/history.txt
> which does contain a few entries including "h/o" which I assume is used for
> training but is never referred to anywhere.
>
> Here's the behavior I'm seeing:
>
> example input             | condition                                            | term found | history feature marked | range text
> history of pregnancies    | "history of" included in the cu_term and prefterm    | yes        | no                     | history of pregnancies
> history of adenopathy     | "history of" not included in the cu_term or prefterm | yes        | yes                    | adenopathy
> H/O postpartum psychosis  | "h/o" not included in the prefterm or cu_term        | yes        | yes                    | postpartum psychosis
> H/O: postpartum psychosis | "h/o" not included in the prefterm or cu_term        | yes        | no                     | postpartum psychosis
> H/O pregnancies           | "h/o" included in the cu_term                        | yes        | no                     | h/o pregnancies
>
> You can see that it is quite perverse -  there is a pattern suggesting that
> if the concept definition occupies the history words, then they cannot be
> seen by the history annotation engine.
>
> Has anyone else noticed this - and have they done anything about it?
>
> Peter
>


Re: Performance of the cleartk history module [EXTERNAL]

2022-01-03 Thread Finan, Sean
Hi Peter,

I have noticed this and just added a following engine that recognized text 
within event spans.  It is a lazy solution, but it fit my needs and available 
time.

Sean

From: Peter Abramowitsch 
Sent: Monday, January 3, 2022 5:03 PM
To: dev@ctakes.apache.org
Subject: Performance of the cleartk history module [EXTERNAL]

* External Email - Caution *


Hi All

I've noticed that the HistoryCleartkAnalysisEngine misses many common forms
of subject history including the obvious "h/o" prefix.  Looking into the
distribution, there's a model.jar and what  appears to be a weights file
containing trigger words:
resources/org/apache/ctakes/assertion/models/history.txt   where h, o, /
are all given their own weights.   But I'm not sure that they're actually
used in this way:  see below.   However, there's also a tiny file:
/org/apache/ctakes/assertion/semantic_classes/history.txt
which does contain a few entries including "h/o" which I assume is used for
training but is never referred to anywhere.

Here's the behavior I'm seeing:

example input             | condition                                            | term found | history feature marked | range text
history of pregnancies    | "history of" included in the cu_term and prefterm    | yes        | no                     | history of pregnancies
history of adenopathy     | "history of" not included in the cu_term or prefterm | yes        | yes                    | adenopathy
H/O postpartum psychosis  | "h/o" not included in the prefterm or cu_term        | yes        | yes                    | postpartum psychosis
H/O: postpartum psychosis | "h/o" not included in the prefterm or cu_term        | yes        | no                     | postpartum psychosis
H/O pregnancies           | "h/o" included in the cu_term                        | yes        | no                     | h/o pregnancies

You can see that it is quite perverse -  there is a pattern suggesting that
if the concept definition occupies the history words, then they cannot be
seen by the history annotation engine.

Has anyone else noticed this - and have they done anything about it?

Peter


Re: empty preferredText [EXTERNAL]

2021-12-07 Thread Finan, Sean
Hear, hear!

From: Peter Abramowitsch 
Sent: Tuesday, December 7, 2021 1:17 PM
To: dev@ctakes.apache.org
Subject: Re: empty preferredText [EXTERNAL]

* External Email - Caution *


"but I might revisit it on a snowy afternoon this winter. "  >>  let's hope
for lots of those and good holidays for all.

Peter

On Tue, Dec 7, 2021 at 7:05 PM Finan, Sean 
wrote:

> I think that you are both correct.
>
> One or more preferred texts for a concept should be available if the user
> created a local copy of the umls that contains a vocabulary that
> contains a preferred text.  This is the first step in the process of
> creating a ctakes dictionary and takes place before the ctakes dictionary
> creator gui is started.  I forget what the umls tool is called.
> Metamorphosys?  Using default settings in metamorphosys(?) should make
> available a lot of preferred texts.
>
> I think that the ctakes dictionary creator does need to have such
> vocabularies checked as src and dest. I think that the original intention
> of separate src and dest options was so that you could obtain information
> like synonyms and preferred text from a vocabulary without writing a column
> of its codes to the dictionary.  It may not be working as intended or I may
> have deviated from that tactic at some point.
>
> A checked dest vocabulary will add a column to the dictionary database,
> but you can accelerate searches in ctakes by excluding them as desired code
> sources.   I am probably diving too far into the weeds with that
> information.
>
> While it is possible that the ctakes dictionary creator leaves unspecified
> preferred texts unavailable for a 'good' reason, it is more likely that I
> wasn't paying enough attention during implementation.  It has been 2 years
> since I looked at the dictionary creator (for the case-sensitive lookup),
> but I might revisit it on a snowy afternoon this winter.  Of course, if
> anybody out there in the dev world would like to take a first crack at it
> ...
>
> Sean
> 
> From: Peter Abramowitsch 
> Sent: Tuesday, December 7, 2021 12:34 PM
> To: dev@ctakes.apache.org
> Subject: Re: empty preferredText [EXTERNAL]
>
> * External Email - Caution *
>
>
> I think the issue is that preferred text in the dictionary is only
> populated by matches from the "dest" vocabularies and it uses *their*
> preferred text.  If there's no match in any of them, then it should put the
> CUI's own preferred text entry in the dictionary, but it doesn't.  I'm
> pretty sure it's available during the dictionary creation process, but
> probably not used.
>
> On Tue, Dec 7, 2021 at 6:22 PM Miller, Timothy <
> timothy.mil...@childrens.harvard.edu> wrote:
>
> > OK, I thought this might be what's happening. I did check my 2021 UMLS
> > release and the cui does seem to have a preferred text but I think my
> > container is using an older release. For what it's worth the CUI is:
> > C0360554
> >
> > and a sentence that reproduces the issue in CVD with the current release
> > is:
> >
> > 'Patient had problems tolerating oral hydrocortisone.'
> >
> > I will see if I can find the older UMLS release lying around. I think the
> > right workaround for now is your suggestion of using the covered text.
> >
> > Tim
> >
> >
> > On Tue, 2021-12-07 at 17:59 +0100, Peter Abramowitsch wrote:
> >
> > * External Email - Caution *
> >
> >
> >
> > Hi Tim,
> >
> > Yes, I've definitely encountered it.   It happens when the concept has a
> > CUI_TERM which has matched the text, but there is no corresponding entry in
> > the SNOMED or other vocab table mapping CUI to SNOMED.  The obvious choice
> > is to use the covered text as a surrogate, but technically it could be PHI
> > if that matters to you.  The other thing is to see if there's an MSH term
> > that maps using the metathesaurus.  If so, including MSH in your dictionary
> > as a src AND dest vocab will solve the problem.
> >
> > Peter
> >
> > On Tue, Dec 7, 2021 at 5:45 PM Miller, Timothy <
> > timothy.mil...@childrens.harvard.edu> wrote:
> >
> > Hello,
> >
> > I'm using the dictionary lookup (through ctakes-web-rest) and trying to
> > read off the preferredText that comes back as a human-readable way to
> > display the CUI. On a very small percentage, there does not seem to be any
> > preferredText. Has anyone else encountered this? Is this a limitation of
> > the underlying ontologies or a bug we can address?
> >
> > Tim
> >
>
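
The workaround discussed in this thread (fall back to the covered text when preferredText is empty, unless PHI is a concern) could look roughly like the sketch below. This is plain Java with hypothetical names (`DisplayTextSketch`, `displayText`, `allowCoveredText`); it is not cTAKES API, and the caller is assumed to have already read the preferred text, CUI and covered text off the annotation.

```java
/** Hypothetical helper, not cTAKES API: chooses a human-readable label for a concept. */
final class DisplayTextSketch {

   private DisplayTextSketch() {
   }

   /**
    * @param preferredText    preferred text from the dictionary; may be null or empty
    * @param cui              concept unique identifier, used as the last resort
    * @param coveredText      document text covered by the annotation; may contain PHI
    * @param allowCoveredText set to false where covered text must not be displayed
    * @return the preferred text if present, otherwise the covered text if allowed, otherwise the CUI
    */
   static String displayText( final String preferredText,
                              final String cui,
                              final String coveredText,
                              final boolean allowCoveredText ) {
      if ( preferredText != null && !preferredText.trim().isEmpty() ) {
         return preferredText;
      }
      if ( allowCoveredText && coveredText != null && !coveredText.trim().isEmpty() ) {
         return coveredText;
      }
      return cui;
   }
}
```

The flag exists only because of the PHI caveat raised above; a deployment that never displays note text could pass false everywhere and show the bare CUI instead.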


Re: followup question [EXTERNAL] [SUSPICIOUS]

2021-10-27 Thread Finan, Sean
Oops, looking at what I just wrote:

   edgeNodes.put( relation.getCategory(), new Pair<>( argument1, argument2 ) );

would definitely NOT work, since every relation of a given category would 
overwrite the previous one.  This might be better, though a little more tedious:

   final Map<String, Integer> edgeTypeCounts = new HashMap<>();
   final Map<String, Pair<Annotation>> edgeNodes = new HashMap<>();
   for ( BinaryTextRelation relation : relations ) {
      final String edgeType = relation.getCategory();
      // Number each edge of a given type so that repeated types get distinct keys.
      final int typeCount = edgeTypeCounts.computeIfAbsent( edgeType, i -> 0 );
      edgeTypeCounts.put( edgeType, typeCount + 1 );
      final Annotation argument1 = relation.getArg1().getArgument();
      final Annotation argument2 = relation.getArg2().getArgument();
      edgeNodes.put( edgeType + "_" + typeCount, new Pair<>( argument1, argument2 ) );
   }

But then you would need to strip "_"{index} from all of the edges.  
Alternatively you could use Pair<Annotation> as the map key, but more than one 
relation type between two annotations would cause problems.  Or an ugly 
Map<String, Collection<Pair<Annotation>>> in which the pairs are constantly added 
to the collection ... 
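
The last option (a map from relation type to a collection of pairs) is less ugly than it sounds if computeIfAbsent does the bookkeeping. The sketch below is self-contained and deliberately avoids cTAKES types: `EdgeMapSketch` and `Edge` are hypothetical stand-ins, and each input row of {category, arg1, arg2} stands in for what would come from a BinaryTextRelation.

```java
import java.util.ArrayList;
import java.util.Collection;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

/** Hypothetical sketch: groups relation edges by relation type without losing duplicates. */
final class EdgeMapSketch {

   /** Stand-in for a pair of relation arguments; not the cTAKES Pair class. */
   static final class Edge {
      final String source;
      final String target;

      Edge( final String source, final String target ) {
         this.source = source;
         this.target = target;
      }
   }

   private EdgeMapSketch() {
   }

   /**
    * @param relations rows of {category, arg1, arg2}
    * @return every edge, collected under its relation category
    */
   static Map<String, Collection<Edge>> groupByCategory( final List<String[]> relations ) {
      final Map<String, Collection<Edge>> edgeNodes = new HashMap<>();
      for ( String[] relation : relations ) {
         // Create the collection on first sight of a category, then append to it.
         edgeNodes.computeIfAbsent( relation[ 0 ], category -> new ArrayList<>() )
                  .add( new Edge( relation[ 1 ], relation[ 2 ] ) );
      }
      return edgeNodes;
   }
}
```

With this shape no "_"{index} suffix is needed, and two relations of the same type between different annotations simply end up in the same collection.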

Sean
________
From: Finan, Sean 
Sent: Wednesday, October 27, 2021 10:15 AM
To: dev@ctakes.apache.org
Subject: Re: followup question [EXTERNAL] [SUSPICIOUS]

* External Email - Caution *


Hi Peter,

To your question:
>there isn't yet much code that pulls these relations out in any user-friendly 
>way?
the answer is mostly no.  The relation-related code in ctakes is mostly for 
experimentation and not utilization.

Annotations/Events and Relations are translated in a few places into other node 
and edge formats.

For instance, the ctakes-fhir module has a fhir reader and writer that utilize 
cas relations to create an extension in a fhir Element that consists of the 
relation type and a fhir Reference to the fhir Element representing the 
relation target annotation/event.

Elsewhere they are thrown into various visual output types, such as the 
PropertyTextWriter, PrettyTextWriter and HtmlTextWriter in ctakes-core.

There isn't any code that I know of that would do something like:
   Map<String, Pair<Annotation>> getEdgeAndNodes( JCas jcas ) {
      final Collection<BinaryTextRelation> relations = JCasUtil.select( jcas, BinaryTextRelation.class );
      if ( relations == null || relations.isEmpty() ) {
         return Collections.emptyMap();
      }
      final Map<String, Pair<Annotation>> edgeNodes = new HashMap<>();
      for ( BinaryTextRelation relation : relations ) {
         final Annotation argument1 = relation.getArg1().getArgument();
         final Annotation argument2 = relation.getArg2().getArgument();
         edgeNodes.put( relation.getCategory(), new Pair<>( argument1, argument2 ) );
      }
      return edgeNodes;
   }

But if that works then use it.

Sean


From: Peter Abramowitsch 
Sent: Wednesday, October 27, 2021 6:21 AM
To: dev@ctakes.apache.org
Subject: followup question [EXTERNAL]

* External Email - Caution *


Hi Sean,   I've been doing a bit of reading on propbanks, framesets, etc in
relation to what I'm seeing in the  CAS when I turn on some of the relation
extractors that do work (in contrast to the ones I mentioned before that
are missing a model).

Is it safe to say that these extractors are mostly experiments used to
validate semantic approaches proposed back in the day, and that there isn't
yet much code that pulls these relations out in any user-friendly way?   By
user-friendly I mean  creating a simpler "edge object" that simply joins
two identified events in a way that hides all the intermediate collections
of structures generated by these relation extractors and their dependencies.

Peter


Re: Question about use of Time Annotators in 4.0.1 (trunk) [EXTERNAL]

2021-10-26 Thread Finan, Sean
Hi Peter,

I use the piper files, and temporal sub piper TemporalSubPipe.piper in
ctakes-temporal-res/src/main/resources/org/apache/ctakes/temporal/pipeline

contains the following:

// Commands and parameters to create a default temporal processing sub-pipeline.  This is not a full pipeline.

// 'Generic' Events.  Use addDescription and let the EventAnnotator set itself up with defaults.
addDescription EventAnnotator

// Times.  Use addLogged to log start and finish of processing.  There aren't default models, so set specifically
add BackwardsTimeAnnotator classifierJarPath=/org/apache/ctakes/temporal/ae/timeannotator/model.jar

// DocTimeRel: the relation bin for Events to the Document Creation Time.
add DocTimeRelAnnotator classifierJarPath=/org/apache/ctakes/temporal/ae/doctimerel/model.jar

// Event - Time binary relations.
add EventTimeRelationAnnotator classifierJarPath=/org/apache/ctakes/temporal/ae/eventtime/model.jar

// Event - Event binary relations.
add EventEventRelationAnnotator classifierJarPath=/org/apache/ctakes/temporal/ae/eventevent/model.jar


The last time that I ran this it completed successfully.

Sean



From: Peter Abramowitsch 
Sent: Tuesday, October 26, 2021 9:09 AM
To: dev@ctakes.apache.org
Subject: Question about use of Time Annotators in 4.0.1 (trunk) [EXTERNAL]

* External Email - Caution *


I have a couple of questions about the TimeAnnotators  forward and backward

1.  The BackwardsTimeAnnotator complains that it doesn't know whether it is
in training mode when there is no "inTraining" parameter.  But when I
supply it with the value false, then it complains that it doesn't have a
classifier jar path, as if it now really thinks it's in training!  So
what's the trick to make it happy that it's not in training.

2.  The unit tests for the time annotators contain the two extra pipeline
steps:

   - CopyNPChunksToLookupWindowAnnotations.class
   - RemoveEnclosedLookupWindows.class

Are these needed for regular use of the Time annotators or is this just a
Unit test feature.

Peter


Re: An exception occured while executing the Java class. URI is not hierarchical [EXTERNAL]

2021-08-18 Thread Finan, Sean
Hi Benjamin,

My first question is: what pipeline are you trying to run?

My second question is: Do you really need to use LVG?

Sean

From: Benjamin hansen 
Sent: Wednesday, August 18, 2021 3:07 AM
To: dev@ctakes.apache.org
Subject: An exception occured while executing the Java class. URI is not 
hierarchical [EXTERNAL]

* External Email - Caution *


While working on a simple pipeline example I got this error:

java.lang.IllegalArgumentException: URI is not hierarchical
    at java.io.File.<init>(File.java:420)
    at org.apache.ctakes.lvg.resource.LvgCmdApiResourceImpl.load(LvgCmdApiResourceImpl.java:65)


I found that this issue was already reported 4 years ago here:
https://issues.apache.org/jira/browse/CTAKES-445


I am on MacOS, which the workaround patch proposed in that thread does not
fix... And like the last comment in the thread says - the patch likely also
does not work on linux.


This seems to be quite a serious bug since both mac and linux would be
serious development and production platforms for ctakes users.


Is there no fix for this after 4 years?
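
For context on the message itself: "URI is not hierarchical" is what the java.io.File constructor that takes a URI reports when it is handed an opaque URI, and a jar: URI (the kind you get for a resource packed inside a jar) is opaque. Whether that is exactly what happens in the LVG loader here is an assumption based on the stack trace above, but the behaviour of File can be shown in isolation. The class name `JarUriSketch` and the example paths are made up.

```java
import java.io.File;
import java.net.URI;

/** Hypothetical sketch: shows which URIs java.io.File accepts. */
final class JarUriSketch {

   private JarUriSketch() {
   }

   /**
    * @param uriText text of a URI
    * @return the file path if File accepts the URI, otherwise the exception message
    */
   static String tryFile( final String uriText ) {
      try {
         return new File( URI.create( uriText ) ).getPath();
      } catch ( IllegalArgumentException e ) {
         // Thrown for opaque URIs such as jar:file:/...!/...
         return e.getMessage();
      }
   }
}
```

Calling tryFile with a jar: URI such as "jar:file:/tmp/demo.jar!/org/example/data.txt" returns the exception message, while a plain "file:/tmp/demo.txt" URI is accepted. That is why code that turns a classpath resource into a File tends to work when run from unpacked directories and fail when the same resource sits inside a jar.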


Re: UIMA version 3 or later JCas required [EXTERNAL]

2021-08-17 Thread Finan, Sean

Without a little more information I cannot tell you exactly why you are getting 
that exact error.

ctakes 4.0.0.1 was built with UIMA v2.  Trying to use a newer version of uima 
and/or uimafit will (frequently) not work.

Any declared dependency on ctakes, such as you have for core and 
clinical-pipeline (which includes core) will import the required uima, uimafit 
and other dependencies.  No additional declaration of dependency is required.

If you absolutely require uima 3 for some reason then I don't think that I can 
help you.  You may want to ask the uima lists about mixing versions or 
equivalent v2 solutions for your goals.
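
When versions get mixed like this, it can help to see which jar a class is really being loaded from. The sketch below is a generic Java diagnostic, not anything from ctakes or uima; the class name `ClassOriginSketch` and the returned placeholder strings are made up. Run with the project's classpath and given a class name such as org.apache.uima.jcas.tcas.DocumentAnnotation, it reports the jar that supplied the class.

```java
import java.security.CodeSource;

/** Hypothetical diagnostic: reports where a class on the classpath was loaded from. */
final class ClassOriginSketch {

   private ClassOriginSketch() {
   }

   /**
    * @param className fully qualified class name
    * @return location of the jar or directory that supplied the class, or a short explanation
    */
   static String originOf( final String className ) {
      try {
         // Load without initializing so that static initializers do not run.
         final Class<?> type
               = Class.forName( className, false, ClassOriginSketch.class.getClassLoader() );
         final CodeSource source = type.getProtectionDomain().getCodeSource();
         return source == null
                ? "(no code source; likely a JDK class)"
                : String.valueOf( source.getLocation() );
      } catch ( ClassNotFoundException | LinkageError e ) {
         return "(not on classpath)";
      }
   }
}
```

Running mvn dependency:tree on the project shows the same information from the build side, including which declared dependency pulled in each uima artifact.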

Sean



From: Benjamin hansen 
Sent: Tuesday, August 17, 2021 3:35 PM
To: dev@ctakes.apache.org
Subject: UIMA version 3 or later JCas required [EXTERNAL]

* External Email - Caution *


Hello, I am trying to compile some of the examples from the ctakes example
directory using maven. Specifically I am trying to compile the example
called: HelloWorldAggregatePipeline

I am using ctakes v. 4.0.0.1 and uima fit v. 3.0.2 and I am getting the
following error when I run

mvn package

and then

mvn exec:java -Dexec.mainClass=com.myproj.app.HelloWorldAggregatePipeline



Caused by: org.apache.uima.cas.CASRuntimeException: The JCas cannot be
initialized.  The following errors occurred:

JCas Class "org.apache.uima.jcas.tcas.DocumentAnnotation", loaded from
"jar:file:/Users/myuser/.m2/repository/org/apache/uima/uimaj-document-annotation/2.9.0/uimaj-document-annotation-2.9.0.jar!/org/apache/uima/jcas/tcas/DocumentAnnotation.class",
is missing required constructor; likely cause is wrong version (UIMA
version 3 or later JCas required).


I don't understand why I am getting this error when I am using uima fit v.
3.0.2.

My pom.xml file is seen below


Can anyone help me fix this please?







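<project xmlns="http://maven.apache.org/POM/4.0.0" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
  xsi:schemaLocation="http://maven.apache.org/POM/4.0.0 http://maven.apache.org/xsd/maven-4.0.0.xsd">
  <modelVersion>4.0.0</modelVersion>

  <groupId>com.myproj.app</groupId>
  <artifactId>myproj</artifactId>
  <version>1.0-SNAPSHOT</version>

  <name>myproj</name>
  <url>http://www.example.com</url>

  <properties>
    <project.build.sourceEncoding>UTF-8</project.build.sourceEncoding>
    <maven.compiler.source>1.7</maven.compiler.source>
    <maven.compiler.target>1.7</maven.compiler.target>
  </properties>

  <dependencies>
    <dependency>
      <groupId>junit</groupId>
      <artifactId>junit</artifactId>
      <version>4.11</version>
      <scope>test</scope>
    </dependency>

    <dependency>
      <groupId>org.apache.uima</groupId>
      <artifactId>uimaj-core</artifactId>
      <version>3.2.0</version>
    </dependency>

    <dependency>
      <groupId>org.apache.uima</groupId>
      <artifactId>uimafit-core</artifactId>
      <version>3.2.0</version>
    </dependency>

    <dependency>
      <groupId>org.apache.ctakes</groupId>
      <artifactId>ctakes-core</artifactId>
      <version>4.0.0.1</version>
    </dependency>

    <dependency>
      <groupId>org.apache.ctakes</groupId>
      <artifactId>ctakes-clinical-pipeline</artifactId>
      <version>4.0.0.1</version>
    </dependency>
  </dependencies>

  <build>
    <pluginManagement>
      <plugins>
        <plugin>
          <artifactId>maven-clean-plugin</artifactId>
          <version>3.1.0</version>
        </plugin>
        <plugin>
          <artifactId>maven-resources-plugin</artifactId>
          <version>3.0.2</version>
        </plugin>
        <plugin>
          <artifactId>maven-compiler-plugin</artifactId>
          <version>3.8.0</version>
        </plugin>
        <plugin>
          <artifactId>maven-surefire-plugin</artifactId>
          <version>2.22.1</version>
        </plugin>
        <plugin>
          <artifactId>maven-jar-plugin</artifactId>
          <version>3.0.2</version>
        </plugin>
        <plugin>
          <artifactId>maven-install-plugin</artifactId>
          <version>2.5.2</version>
        </plugin>
        <plugin>
          <artifactId>maven-deploy-plugin</artifactId>
          <version>2.8.2</version>
        </plugin>
        <plugin>
          <artifactId>maven-site-plugin</artifactId>
          <version>3.7.1</version>
        </plugin>
        <plugin>
          <artifactId>maven-project-info-reports-plugin</artifactId>
          <version>3.0.0</version>
        </plugin>
      </plugins>
    </pluginManagement>
  </build>
</project>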

  

  



Re: Can you store cTAKES in an S3 bucket so you can use it with EMR for parallel processing? [EXTERNAL]

2021-08-13 Thread Finan, Sean
Hi Tom,

That is great news!  I am always happy to hear that people are using ctakes and 
have gotten it to run in novel ways!

- and I am happy that ctakes-dockhand is useful in your solution!

Cheers,
Sean



From: Thomas W Loehfelm 
Sent: Thursday, August 12, 2021 1:58 PM
To: dev@ctakes.apache.org
Subject: Re: Can you store cTAKES in an S3 bucket so you can use it with EMR 
for parallel processing? [EXTERNAL]

* External Email - Caution *


For parallel processing, consider using the ctakes-dockhand component. You can 
run ctakes as a docker service and then scale it using docker swarm to 
replicate that service across many nodes to expand processing capacity.

I am happy to share my experience doing it and the code I use, although I 
am not a java expert and so can’t necessarily explain WHY my stuff is the way 
it is, just that it gets the job done.

The problem I wanted to address was:

  1.  Using a custom dictionary…
  2.  Defining a custom processing pipeline (piper file)…
  3.  …set up an API endpoint to receive the text of a report
  4.  …and return a custom subset of ctakes output in json format
  5.  …achieve desired processing scale/throughput via docker

Thanks to Sean and many others on this forum we are up and running. If that 
general workflow is what you are looking for I am happy to help in any way I 
can.

Tom



From: John Doe 
Date: Tuesday, August 3, 2021 at 1:24 PM
To: dev@ctakes.apache.org 
Subject: Re: Can you store cTAKES in an S3 bucket so you can use it with EMR 
for parallel processing? [EXTERNAL]
Hello,

Thanks for the response. The reason we are using a shared location for
ctakes is so that we have everything in one place. If we need to add our
own components, dictionaries, etc., we can do it all in one spot. It also
saves us from having to download ctakes on every machine every time we
start up a cluster. I didn't know the regular java file API would still
work with S3 but will have to give that a try. I am relying on CTAKES_HOME
being set since ctakes is stored on EFS so the node wouldn't be able to
find it on its own local file system. I'm basically mounting the EFS folder
holding ctakes onto each node and setting CTAKES_HOME to that so it can
find all the files it needs to. For us anyway, S3 has come up as the
primary means of storage for EMR and I'm not sure if EFS will be available,
which is why I'm trying to see if I can do it on S3.

On Mon, Aug 2, 2021 at 11:41 AM Finan, Sean <
sean.fi...@childrens.harvard.edu> wrote:

> Hi John,
>
> I am not completely sure that I understand what you are asking, and I
> think that this is more of an s3 question than a ctakes question, but here
> are a couple of comments:
>
> > the cTAKES part of it relies on CTAKES_HOME being set
> - Is this requirement on your side?   I never bother to set CTAKES_HOME.
>
> > So I need to store cTAKES in a shared location
> - I am not sure why you need to do this when it is possible to spin up
> multiple machines, each with its own ctakes "installation."
>
> > Usually, in EMR, you would use S3 for this
> - This seems to be quite a blanket statement
>
> > cTAKES relies on a hierarchical file structure
> - ok ...
>
> > such as storing cTAKES on S3 instead
> - I have [essentially] done this.  If I remember correctly I didn't need
> to venture too far outside my comfort zone.
>
> > altering cTAKES to work with a flat file structure using the S3
> - I haven't touched it for many years, but the flat file structure was
> essentially internal to s3 and files can still be referenced via a complete
> "hierarchical path" - it is just that the filename is "bob/likes/ice.cream"
>
> Again, I haven't needed to work with this for about 5 years, so what I did
> might be completely irrelevant.  I would hope that implementation is now
> simpler, examples more prevalent and documentation better than back in the
> day.
>
> Sean
>
> 
> From: John Doe 
> Sent: Sunday, July 25, 2021 3:28 PM
> To: dev@ctakes.apache.org
> Subject: Can you store cTAKES in an S3 bucket so you can use it with EMR
> for parallel processing? [EXTERNAL]
>
> * External Email - Caution *
>
>
> I'm working on a solution for running cTAKES in an Amazon EMR environment
> with Apache Spark so I can run multiple instances of cTAKES in parallel for
> processing a bunch of notes. However, the cTAKES part of it relies on
> CTAKES_HOME being set on every machine for locating model files and such.
> So I need to store cTAKES in a shared location so every node can set
> CTAKES_HOME to that location. Usually, in EMR, you would use S3 for this
> but it seems that cTAKES relies on a hierarchical file structure for
> loading in files (model files, dictionary files, etc.)

Re: Can you store cTAKES in an S3 bucket so you can use it with EMR for parallel processing? [EXTERNAL]

2021-08-02 Thread Finan, Sean
Hi John,

I am not completely sure that I understand what you are asking, and I think 
that this is more of an s3 question than a ctakes question, but here are a 
couple of comments:

> the cTAKES part of it relies on CTAKES_HOME being set
- Is this requirement on your side?   I never bother to set CTAKES_HOME.

> So I need to store cTAKES in a shared location
- I am not sure why you need to do this when it is possible to spin up multiple 
machines, each with its own ctakes "installation."

> Usually, in EMR, you would use S3 for this 
- This seems to be quite a blanket statement

> cTAKES relies on a hierarchical file structure
- ok ...

> such as storing cTAKES on S3 instead
- I have [essentially] done this.  If I remember correctly I didn't need to 
venture too far outside my comfort zone.

> altering cTAKES to work with a flat file structure using the S3
- I haven't touched it for many years, but the flat file structure was 
essentially internal to s3 and files can still be referenced via a complete 
"hierarchical path" - it is just that the filename is "bob/likes/ice.cream"

Again, I haven't needed to work with this for about 5 years, so what I did 
might be completely irrelevant.  I would hope that implementation is now 
simpler, examples more prevalent and documentation better than back in the day.
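
To illustrate the point above about flat storage still looking hierarchical: a key like "bob/likes/ice.cream" is one flat string, and "directories" are just shared prefixes. The sketch below uses no AWS SDK and makes no claim about how S3 is implemented; `FlatKeySketch` and `childrenOf` are hypothetical names, and the key set stands in for a bucket listing.

```java
import java.util.Set;
import java.util.TreeSet;

/** Hypothetical sketch: treats flat keys containing '/' as if they were a directory tree. */
final class FlatKeySketch {

   private FlatKeySketch() {
   }

   /**
    * @param keys   flat keys, for example "bob/likes/ice.cream"
    * @param prefix "directory" to list, ending in '/', or the empty string for the root
    * @return immediate children under the prefix; sub-"directories" keep a trailing '/'
    */
   static Set<String> childrenOf( final Set<String> keys, final String prefix ) {
      final Set<String> children = new TreeSet<>();
      for ( String key : keys ) {
         if ( !key.startsWith( prefix ) || key.length() == prefix.length() ) {
            continue;
         }
         final String rest = key.substring( prefix.length() );
         final int slash = rest.indexOf( '/' );
         // Keep only the first path segment below the prefix.
         children.add( slash < 0 ? rest : rest.substring( 0, slash + 1 ) );
      }
      return children;
   }
}
```

Listing "bob/" over the keys "bob/likes/ice.cream", "bob/likes/cake" and "bob/readme.txt" gives "likes/" and "readme.txt", which is all a path-based loader really needs from a hierarchy.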

Sean


From: John Doe 
Sent: Sunday, July 25, 2021 3:28 PM
To: dev@ctakes.apache.org
Subject: Can you store cTAKES in an S3 bucket so you can use it with EMR for 
parallel processing? [EXTERNAL]

* External Email - Caution *


I'm working on a solution for running cTAKES in an Amazon EMR environment
with Apache Spark so I can run multiple instances of cTAKES in parallel for
processing a bunch of notes. However, the cTAKES part of it relies on
CTAKES_HOME being set on every machine for locating model files and such.
So I need to store cTAKES in a shared location so every node can set
CTAKES_HOME to that location. Usually, in EMR, you would use S3 for this
but it seems that cTAKES relies on a hierarchical file structure for
loading in files (model files, dictionary files, etc.). My current solution
uses EFS as an alternative. Is there a better alternative to this approach
to getting cTAKES integrated with EMR? I know there are alternative non-EMR
approaches to parallelizing cTAKES, but I may not have those technologies
available. I'm wondering if there is a good way around using EFS such as
storing cTAKES on S3 instead, but it seems like altering cTAKES to work
with a flat file structure using the S3 API may be a pretty big task.


Re: How to find IdentifiedAnnotationBuilder in build [EXTERNAL]

2021-05-28 Thread Finan, Sean
Hi John,

It shouldn't be excluded from ctakes-core, but if you are looking at the binary 
distributable or maven central artifact for ctakes 4.0.0.1 then it will 
definitely be absent as it didn't exist in that version.  It is only in the 
trunk version of ctakes currently under development.  You can use the trunk 
version in your maven project.

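In your pom:
...
   <properties>
      <ctakes.version>4.0.1-SNAPSHOT</ctakes.version>
...
      <dependency>
         <groupId>org.apache.ctakes</groupId>
         <artifactId>ctakes-core</artifactId>
         <version>${ctakes.version}</version>
...

And you might need:
   <repositories>
      <repository>
         <id>apache.snapshots</id>
         <name>Apache Development Snapshot Repository</name>
         <url>https://repository.apache.org/content/groups/snapshots/</url>
         <releases>
            <enabled>false</enabled>
         </releases>
         <snapshots>
            <enabled>true</enabled>
         </snapshots>
      </repository>
   </repositories>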




Use Example:

/**
 * Finds clinical procedures in text using regular expressions.
 * Accepts parameters for the procedure's regular expression and the procedure's CUI.
 */
public class ApacheConDemoEngine extends JCasAnnotator_ImplBase {

   @ConfigurationParameter(
         name = "REGEX",
         description = "Regular Expression to use for matching clinical procedures.",
         defaultValue = "biopsy"
   )
   private String _regex;

   @ConfigurationParameter(
         name = "REGEX_CUI",
         description = "CUI for matched clinical procedure expressions.",
         defaultValue = "AC123"
   )
   private String _regexCui;

   /**
    * Finds Procedures using a regular expression and creates Identified Annotations.
    */
   @Override
   public void process( JCas jCas ) throws AnalysisEngineProcessException {
      // One builder is reused for every match; only the span changes per annotation.
      IdentifiedAnnotationBuilder builder = new IdentifiedAnnotationBuilder()
            .group( SemanticGroup.PROCEDURE )
            .cui( _regexCui );
      try ( RegexSpanFinder finder = new RegexSpanFinder( _regex ) ) {
         finder.findSpans( jCas.getDocumentText() )
               .forEach( span ->
                     builder
                           .span( span )
                           .build( jCas ) );
      } catch ( IllegalArgumentException iaE ) {
         throw new AnalysisEngineProcessException( iaE );
      }
   }

}
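
For readers without the trunk version on their classpath, the span-finding step of the example can be approximated with nothing but java.util.regex. This is only a sketch of the idea and is not how RegexSpanFinder is implemented; the class name `RegexSpanSketch`, the int[]{begin, end} return shape and the case-insensitive matching are all assumptions.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

/** Hypothetical sketch: finds begin/end offsets of regex matches in a document text. */
final class RegexSpanSketch {

   private RegexSpanSketch() {
   }

   /**
    * @param regex regular expression to search for
    * @param text  document text
    * @return one {begin, end} pair per match, in document order; end is exclusive
    */
   static List<int[]> findSpans( final String regex, final String text ) {
      final List<int[]> spans = new ArrayList<>();
      // Case-insensitive matching is a choice made for this sketch.
      final Matcher matcher = Pattern.compile( regex, Pattern.CASE_INSENSITIVE ).matcher( text );
      while ( matcher.find() ) {
         spans.add( new int[]{ matcher.start(), matcher.end() } );
      }
      return spans;
   }
}
```

Each pair could then be used as the begin and end offsets of a new annotation, which is the role the builder plays in the engine above.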


Sean

 

From: John Doe 
Sent: Friday, May 28, 2021 10:45 AM
To: dev@ctakes.apache.org
Subject: How to find IdentifiedAnnotationBuilder in build [EXTERNAL]

* External Email - Caution *


Hello,

I'm just wondering where IdentifiedAnnotationBuilder is located in the
ctakes lib. I see it in the source code in ctakes-core but I can't find it
in any of the maven dependency packages. I also extracted the ctakes-core
jar in CTAKES_HOME/lib and still didn't find it. Is there a simple way to
use this builder? Ideally, I could just add a dependency to my pom and pull
it in but I can't seem to find any that have it. Why is it excluded from
ctakes-core?

Thanks.


Re: Discussion on Apache UIMA Ruta (Rule-based Text Annotation) [EXTERNAL] [SUSPICIOUS] [SUSPICIOUS]

2021-05-26 Thread Finan, Sean
BTW, I have created a new meeting and the invitation below is defunct.  You 
still need to email me with your info to get an invitation.

Sean

From: Finan, Sean 
Sent: Wednesday, May 26, 2021 8:09 AM
To: dev@ctakes.apache.org
Subject: Re: Discussion on Apache UIMA Ruta (Rule-based Text Annotation) 
[EXTERNAL] [SUSPICIOUS] [SUSPICIOUS]

* External Email - Caution *


Ok, so much for keeping this private ... I should have been more specific about 
my email address.

Michael - this was my fault, not yours.


From: Finan, Sean 
Sent: Wednesday, May 26, 2021 7:59 AM
To: dev@ctakes.apache.org
Subject: Re: Discussion on Apache UIMA Ruta (Rule-based Text Annotation) 
[EXTERNAL] [SUSPICIOUS]

* External Email - Caution *


Zoom meeting information:

Join from your computer or mobile device: 
https://bostonchildrens.zoom.us/j/98010721495?pwd=OHcxRURSUHVuMkFmeXhtWWZBakxmUT09
Password: 984073

Or dial in from your telephone:
Internally:   x28882
Externally:  646-558-8656 (Primary)
 408-638-0968 (If you are unable to dial into the 
primary number)

Or iPhone one-tap:
 +16465588656,,98010721495#  or 
+14086380968,,98010721495#

EWS link: 
https://urldefense.com/v3/__https://zoom.us/wc/98010721495/join__;!!NZvER7FxgEiBAiR_!7ilopTOtH-V7reyogElLU2lXfPc_tnBheCO_4U2j4L6xwwJkVtChHhTQgpUhLdTqkbZehKwAoyc$

Meeting ID: 980 1072 1495

From: Michael J Gurley 
Sent: Wednesday, May 26, 2021 7:54 AM
To: dev@ctakes.apache.org
Subject: Re: Discussion on Apache UIMA Ruta (Rule-based Text Annotation) 
[EXTERNAL]

* External Email - Caution *


I would like to attend.  My email address:

m-gur...@northwestern.edu



On 5/26/21, 7:43 AM, "Finan, Sean"  wrote:

Hi all,


I am happy to announce that Peter Klugl, creator of Apache UIMA Ruta, will 
participate in an informal discussion on Apache's UIMA Rule-based Text 
Annotator (Ruta).


This Apache UIMA(tm) component consists of two major parts: An Analysis 
Engine, which interprets and executes the rule-based scripting language, and 
the Eclipse-based tooling (Workbench), which provides various support for 
developing rules.

- https://uima.apache.org/ruta.html


"It was originally developed for segmenting and processing discharge 
letters and similar clinical documents. Since then (>10 years), Ruta has always 
been applied to clinical documents and is being deployed in production by 
several companies."

- Peter Klugl, creator of Apache UIMA Ruta.


The discussion will take place in a zoom meeting this Friday, May 28, at 
11 am U.S. Eastern Time (EDT).

To prevent unwanted trolling, please reply to this email with your zoom id 
or email address and I will add you to the participant list.


I look forward to a lively and informative discussion.


Sean Finan






Re: Discussion on Apache UIMA Ruta (Rule-based Text Annotation) [EXTERNAL] [SUSPICIOUS]

2021-05-26 Thread Finan, Sean
Sorry all, I should have said "If you are interested, send directly to my email 
address."  I forgot that Apache automatically routes through the lists.

Please send an email to:

sean.fi...@tch.harvard.edu

thanks,
Sean
____
From: Finan, Sean 
Sent: Wednesday, May 26, 2021 7:42 AM
To: dev@ctakes.apache.org; u...@ctakes.apache.org; u...@uima.apache.org; 
d...@uima.apache.org
Subject: Discussion on Apache UIMA Ruta (Rule-based Text Annotation) [EXTERNAL] 
[SUSPICIOUS]

* External Email - Caution *


Hi all,


I am happy to announce that Peter Klugl, creator of Apache UIMA Ruta, will 
participate in an informal discussion on Apache's UIMA Rule-based Text 
Annotator (Ruta).


This Apache UIMA(tm) component consists of two major parts: An Analysis Engine, 
which interprets and executes the rule-based scripting language, and the 
Eclipse-based tooling (Workbench), which provides various support for 
developing rules.

- https://uima.apache.org/ruta.html


"It was originally developed for segmenting and processing discharge letters 
and similar clinical documents. Since then (>10 years), Ruta has always been 
applied to clinical documents and is being deployed in production by several 
companies."

- Peter Klugl, creator of Apache UIMA Ruta.


The discussion will take place in a zoom meeting this Friday, May 28, at 11 am 
U.S. Eastern Time (EDT).

To prevent unwanted trolling, please reply to this email with your zoom id or 
email address and I will add you to the participant list.


I look forward to a lively and informative discussion.


Sean Finan




Re: Discussion on Apache UIMA Ruta (Rule-based Text Annotation) [EXTERNAL] [SUSPICIOUS]

2021-05-26 Thread Finan, Sean
Ok, so much for keeping this private ... I should have been more specific about 
my email address.

Michael - this was my fault, not yours.


From: Finan, Sean 
Sent: Wednesday, May 26, 2021 7:59 AM
To: dev@ctakes.apache.org
Subject: Re: Discussion on Apache UIMA Ruta (Rule-based Text Annotation) 
[EXTERNAL] [SUSPICIOUS]


Zoom meeting information:

Join from your computer or mobile device: 
https://bostonchildrens.zoom.us/j/98010721495?pwd=OHcxRURSUHVuMkFmeXhtWWZBakxmUT09
Password: 984073

Or dial in from your telephone:
Internally:   x28882
Externally:  646-558-8656 (Primary)
 408-638-0968 (If you are unable to dial into the 
primary number)

Or iPhone one-tap:
 +16465588656,,98010721495#  or 
+14086380968,,98010721495#

EWS link: https://zoom.us/wc/98010721495/join

Meeting ID: 980 1072 1495

From: Michael J Gurley 
Sent: Wednesday, May 26, 2021 7:54 AM
To: dev@ctakes.apache.org
Subject: Re: Discussion on Apache UIMA Ruta (Rule-based Text Annotation) 
[EXTERNAL]


I would like to attend.  My email address:

m-gur...@northwestern.edu



On 5/26/21, 7:43 AM, "Finan, Sean"  wrote:

Hi all,


I am happy to announce that Peter Klugl, creator of Apache UIMA Ruta, will 
participate in an informal discussion on Apache's UIMA Rule-based Text 
Annotator (Ruta).


This Apache UIMA(tm) component consists of two major parts: An Analysis 
Engine, which interprets and executes the rule-based scripting language, and 
the Eclipse-based tooling (Workbench), which provides various support for 
developing rules.

- https://uima.apache.org/ruta.html


"It was originally developed for segmenting and processing discharge
letters and similar clinical documents. Since then (>10 years), Ruta has always
been applied to clinical documents and is being deployed in production by
several companies."

- Peter Klugl, creator of Apache UIMA Ruta.


The discussion will take place in a zoom meeting this Friday, May 28, at
11:00 am U.S. Eastern Time (EDT).

To prevent unwanted trolling, please reply to this email with your zoom id 
or email address and I will add you to the participant list.


I look forward to a lively and informative discussion.


Sean Finan






Re: Discussion on Apache UIMA Ruta (Rule-based Text Annotation) [EXTERNAL]

2021-05-26 Thread Finan, Sean
Zoom meeting information:

Join from your computer or mobile device: 
https://bostonchildrens.zoom.us/j/98010721495?pwd=OHcxRURSUHVuMkFmeXhtWWZBakxmUT09
Password: 984073

Or dial in from your telephone:
Internally:   x28882 
Externally:  646-558-8656 (Primary)
 408-638-0968 (If you are unable to dial into the 
primary number)

Or iPhone one-tap:
 +16465588656,,98010721495#  or 
+14086380968,,98010721495# 

EWS link: https://zoom.us/wc/98010721495/join

Meeting ID: 980 1072 1495

From: Michael J Gurley 
Sent: Wednesday, May 26, 2021 7:54 AM
To: dev@ctakes.apache.org
Subject: Re: Discussion on Apache UIMA Ruta (Rule-based Text Annotation) 
[EXTERNAL]


I would like to attend.  My email address:

m-gur...@northwestern.edu



On 5/26/21, 7:43 AM, "Finan, Sean"  wrote:

Hi all,


I am happy to announce that Peter Klugl, creator of Apache UIMA Ruta, will 
participate in an informal discussion on Apache's UIMA Rule-based Text 
Annotator (Ruta).


This Apache UIMA(tm) component consists of two major parts: An Analysis 
Engine, which interprets and executes the rule-based scripting language, and 
the Eclipse-based tooling (Workbench), which provides various support for 
developing rules.

- https://uima.apache.org/ruta.html


"It was originally developed for segmenting and processing discharge
letters and similar clinical documents. Since then (>10 years), Ruta has always
been applied to clinical documents and is being deployed in production by
several companies."

- Peter Klugl, creator of Apache UIMA Ruta.


The discussion will take place in a zoom meeting this Friday, May 28, at
11:00 am U.S. Eastern Time (EDT).

To prevent unwanted trolling, please reply to this email with your zoom id 
or email address and I will add you to the participant list.


I look forward to a lively and informative discussion.


Sean Finan






Discussion on Apache UIMA Ruta (Rule-based Text Annotation)

2021-05-26 Thread Finan, Sean
Hi all,


I am happy to announce that Peter Klugl, creator of Apache UIMA Ruta, will 
participate in an informal discussion on Apache's UIMA Rule-based Text 
Annotator (Ruta).


This Apache UIMA(tm) component consists of two major parts: An Analysis Engine, 
which interprets and executes the rule-based scripting language, and the 
Eclipse-based tooling (Workbench), which provides various support for 
developing rules.

- https://uima.apache.org/ruta.html


"It was originally developed for segmenting and processing discharge letters
and similar clinical documents. Since then (>10 years), Ruta has always been
applied to clinical documents and is being deployed in production by several
companies."

- Peter Klugl, creator of Apache UIMA Ruta.


The discussion will take place in a zoom meeting this Friday, May 28, at
11:00 am U.S. Eastern Time (EDT).

To prevent unwanted trolling, please reply to this email with your zoom id or 
email address and I will add you to the participant list.


I look forward to a lively and informative discussion.


Sean Finan




Re: Java Question [EXTERNAL]

2021-05-21 Thread Finan, Sean
Hi John, I sent a reply directly to jrcaskey at medicine.wisc.edu
Let me know if you don't get it.
Sean

Re: Java Question [EXTERNAL]

2021-05-20 Thread Finan, Sean
Hi John,

>Can you help me with a few questions?
- I will try.  Others may offer alternate or additional information.

> If I wanted to create a modified workflow without the entire source code, 
> could I create a jar file of the module I wanted to modify and then replace 
> that jar file in the User Install
- Yes.
In IntelliJ you should be able to open the Maven panel to handle this.  If you
compiled by some other means and the panel is not showing, open it from the top
menu bar: View > Tool Windows > Maven.
The maven panel should display a list (tree) of all of the ctakes modules.  You 
can build an individual module here.  For instance, Apache cTAKES Dockhand > 
Lifecycle > package.  You will see some progress information in the run panel, 
including:
[INFO] Building jar: 
C:\Spiffy\ctakes_trunk\ctakes-dockhand\target\ctakes-dockhand-4.0.1-SNAPSHOT.jar
This builds a single module's jar - in this case the ctakes basic
installation gui.  The [INFO] line in your own run panel shows where the new
jar file was written on your system.

>or would I need to compile the entire source code for cTAKES?
- No, but if you do ever want to build the entire project, use the maven panel,
Apache cTAKES (root) > Lifecycle > package.

>When I try to create a jar file within IntelliJ IDEA, it asks for the main 
>class.
- It shouldn't do this if you use the maven package process as I outlined 
above.  If you still get a main class question then send me info on your 
complete process and I'll see if I can duplicate it.

>or should I build a jar file for the modified module without a main class and 
>then replace that in the lib/ folder of the User Install version of cTAKES?
- You should build one without specifying a main class and copy it to lib/ in 
the User Install.

> when I try to run cTAKES, I receive the error:
The feature org.apache.ctakes.typesystem.type.textspan.List:items is declared 
twice, with incompatible multipleReferencesAllowed specifications
- Is it an error that stops ctakes from running or is it just a warning?
The root of the problem is that different modules have copies of the type 
system xml.  This should be unnecessary and causes this problem if somebody 
modifies properties in one but not another.
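For instance, in ctakes-type-system TypeSystem.xml:

  <featureDescription>
    <name>items</name>
    <description/>
    <rangeTypeName>uima.cas.FSList</rangeTypeName>
    <elementType>uima.tcas.Annotation</elementType>
    <multipleReferencesAllowed>true</multipleReferencesAllowed>
  </featureDescription>

While in others:

  <featureDescription>
    <name>items</name>
    <description/>
    <rangeTypeName>uima.cas.FSList</rangeTypeName>
    <elementType>uima.tcas.Annotation</elementType>
  </featureDescription>
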
This is essentially a bug that I am "fixing" right .. about .. now ...
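
One way to see which descriptor copies disagree is to compare the multipleReferencesAllowed value for each type and feature.  The following is a minimal Java sketch using only the JDK's XML parser.  The class name FeatureFlagCheck and its helper methods are hypothetical, the two inline descriptors are trimmed illustrations rather than the real ctakes files, and it assumes that an absent multipleReferencesAllowed element counts as false.

```java
import java.io.ByteArrayInputStream;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Element;
import org.w3c.dom.Node;
import org.w3c.dom.NodeList;

public class FeatureFlagCheck {

    // Text of the first direct child element named tag, or "" if absent.
    static String childText(Element parent, String tag) {
        for (Node n = parent.getFirstChild(); n != null; n = n.getNextSibling()) {
            if (n instanceof Element && tag.equals(n.getNodeName())) {
                return n.getTextContent().trim();
            }
        }
        return "";
    }

    // Maps "typeName:featureName" to its multipleReferencesAllowed value.
    static Map<String, Boolean> readFlags(String xml) throws Exception {
        Element root = DocumentBuilderFactory.newInstance().newDocumentBuilder()
                .parse(new ByteArrayInputStream(xml.getBytes(StandardCharsets.UTF_8)))
                .getDocumentElement();
        Map<String, Boolean> flags = new LinkedHashMap<>();
        NodeList types = root.getElementsByTagName("typeDescription");
        for (int t = 0; t < types.getLength(); t++) {
            Element type = (Element) types.item(t);
            NodeList features = type.getElementsByTagName("featureDescription");
            for (int f = 0; f < features.getLength(); f++) {
                Element feature = (Element) features.item(f);
                String key = childText(type, "name") + ":" + childText(feature, "name");
                // Assumption: an absent element is treated as false.
                flags.put(key, Boolean.parseBoolean(
                        childText(feature, "multipleReferencesAllowed")));
            }
        }
        return flags;
    }

    // Features declared in both descriptors with different flag values.
    static List<String> conflicts(String xmlA, String xmlB) throws Exception {
        Map<String, Boolean> a = readFlags(xmlA);
        Map<String, Boolean> b = readFlags(xmlB);
        List<String> result = new ArrayList<>();
        for (Map.Entry<String, Boolean> entry : a.entrySet()) {
            Boolean other = b.get(entry.getKey());
            if (other != null && !other.equals(entry.getValue())) {
                result.add(entry.getKey());
            }
        }
        return result;
    }

    // Builds a trimmed, illustrative descriptor with an optional flag element.
    static String descriptor(String flagElement) {
        return "<typeSystemDescription><types><typeDescription>"
                + "<name>org.apache.ctakes.typesystem.type.textspan.List</name>"
                + "<features><featureDescription><name>items</name>"
                + "<rangeTypeName>uima.cas.FSList</rangeTypeName>"
                + "<elementType>uima.tcas.Annotation</elementType>"
                + flagElement
                + "</featureDescription></features>"
                + "</typeDescription></types></typeSystemDescription>";
    }

    public static void main(String[] args) throws Exception {
        String withFlag = descriptor(
                "<multipleReferencesAllowed>true</multipleReferencesAllowed>");
        String withoutFlag = descriptor("");
        System.out.println(conflicts(withFlag, withoutFlag));
    }
}
```

In practice the two strings would be read from the TypeSystem.xml copies found in the different module jars, and any key reported is a candidate for the "declared twice" message.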

>I haven’t made any modifications to how the type system is called, only in how 
>a custom dictionary is accessed.
- Just out of curiosity, how did you change custom dictionary access?  Maybe we 
can add it to ctakes.

Sean


From: JOHN R CASKEY 
Sent: Wednesday, May 19, 2021 5:01 PM
To: dev@ctakes.apache.org
Subject: Java Question [EXTERNAL]


Hello,
The cTAKES User Install is mostly sufficient for my lab, but I’ve found that I 
need to modify a few of the modules. I downloaded the cTAKES source and can 
successfully run workflows after updating the source code, but I’m having 
trouble building the modified modules and essentially creating an updated User 
Install of cTAKES from the source code. Can you help me with a few questions?


  *   I’m running IntelliJ IDEA, and I can compile the cTAKES source code with 
build profiles like runPiperGui without problems to run cTAKES programmatically 
or to start a GUI. I can also run essentially the same workflow in the User 
Install version by running the bash helper script ‘runPiperFile.sh’. If I 
wanted to create a modified workflow without the entire source code, could I 
create a jar file of the module I wanted to modify, and then replace that jar 
file in the User Install version, or would I need to compile the entire source 
code for cTAKES?

  *   When I try to create a jar file within IntelliJ IDEA, it asks for the 
main class. Would this be a class I create for the workflow I’m using, or 
something else? For example, could I use 
org.apache.ctakes.examples.pipeline.HelloWorldPiperRunner as a template to 
build a customized main class that would run programmatically, or should I 
build a jar file for the modified module without a main class and then replace 
that in the lib/ folder of the User Install version of cTAKES?

  *   I can create a jar file of a module within IntelliJ IDEA and then replace 
the modified jar file in the User Install version (for example, modifying 
ctakes-dictionary-lookup-fast-4.0.0.1.jar and replacing it in the lib/ folder), 
but when I try to run cTAKES, I receive the error:

The feature org.apache.ctakes.typesystem.type.textspan.List:items is declared 
twice, with incompatible multipleReferencesAllowed specifications

I haven’t made any 

Re: rule-based lookup for custom lexicon [EXTERNAL] [SUSPICIOUS]

2021-05-19 Thread Finan, Sean
Hi (other) Peter,

Many thanks for jumping in on this!

I would definitely be interested in seeing some examples, even though I don't 
have any specific use case right now.

I will ask a few local people and see if they are interested in an informal 
video chat.  If anybody out there in the general community is interested, 
please reply on this thread and maybe we can coordinate a single presentation 
time.

Cheers,

Sean

From: Peter Klügl 
Sent: Wednesday, May 19, 2021 3:33 PM
To: dev@ctakes.apache.org
Subject: Re: rule-based lookup for custom lexicon [EXTERNAL] [SUSPICIOUS]


Hi all,


if you are interested in UIMA Ruta and want to know more about it, you
can always ask on the UIMA user list or me directly (I am the creator of
UIMA Ruta). I can also prepare some slides and we can have an informal
video chat where I give an overview of Ruta.


I am of course not objective here (for several reasons) but I think UIMA
Ruta could be really useful for cTAKES. It was originally developed for
segmenting and processing discharge letters and similar clinical
documents. Since then (>10 years), Ruta has always been applied to
clinical documents and is being deployed in production by several
companies. The language has some advantages and disadvantages compared
to other rule languages. In the context of cTAKES, the
direct/comprehensive support of UIMA and the IDE dev support are maybe
the most relevant advantages.


I was thinking about creating some introductory examples for the
combination and usage of UIMA Ruta and cTAKES. If you have a good use
case, let me know.


Best,


(another) Peter


Am 19.05.2021 um 14:30 schrieb Finan, Sean:
> Hi all,
> Correct.
>
> Tim  is correct in the sense that he is using a custom dictionary (custom 
> synonyms, cuis, etc.) which kind of changes the "rules" of what the standard 
> dictionary lookup considers a valid term based upon available tokens in the 
> text.  There are other simple settings that further qualify how the standard 
> dictionary lookup accepts or discards synonyms.
>
> I think that what Greg is asking about is something with introduced "logic" 
> that can alter or remove terms already discovered by the standard dictionary 
> lookup.
>
> Peter and Kean both outline some custom annotators that they have created to 
> use logic that can alter/add/remove terms discovered by the standard 
> dictionary lookup.  I do the same thing for different projects and advise 
> everybody that applies ctakes to specific domains do the same.
>
> ctakes is a general purpose tool and results can definitely be improved when
> it is tailored to a narrower purpose.
>
> Back to Greg, I got the feeling that he might be interested in a more 
> versatile annotator.  Introducing an engine that can utilize something like 
> ruta has several advantages:
> 1.  You  can "easily" add complex rules in one place.
> 2.  You can change rules external to code ...
>   2a. the same pipeline can be catered to different projects without changing 
> code in an annotator or creating a new annotator.
>   2b.  An end user who knows nothing about ctakes can change a ruta script to 
> fit their purposes.
> 3. Rules are supported and documented by uima ruta, so you don't have to 
> worry about that extra headache.
> 4. Once Greg adds it to apache ctakes (right? ;^) everybody in the community 
> can apply ruta rules to their project.
>
> When I looked at it a few years ago it was for reason 2b.  In the end we went 
> for different annotators like Peter and Kean outlined and just use piper file 
> changes to satisfy #2 as that is definitely much easier.  However, it doesn't 
> benefit the community as a whole (#4).
>
> Cheers all, this is a great conversation!
>
> Sean
>
>
>
>
> 
> From: Kean Kaufmann 
> Sent: Wednesday, May 19, 2021 7:50 AM
> To: dev@ctakes.apache.org
> Subject: Re: rule-based lookup for custom lexicon [EXTERNAL] [SUSPICIOUS]
>
>
>> yes,  the line between "lookup" and rule execution is a little blurry
> sometimes.
>
> Sure is.  I blur it with a set of annotators that extend dictionary
> annotations based on words or annotations covered by the same Chunk, e.g.
>
> DiseaseDisorderMention + /screen(ing)?/i = ProcedureMention
> MedicationMention + /dependenc[ey]|addiction/i = DiseaseDisorderMention
> DiseaseDisorderMention + AnatomicalSiteMention in same Chunk =
> DiseaseDisorderMention
> ProcedureMention + AnatomicalSiteMention in same Chunk = ProcedureMention
>
> Higher recall than the regular UmlsLookupAnnotator;
> higher precision than the UmlsOverlapLookupAnnotator (which skips a
> specified num

Re: rule-based lookup for custom lexicon [EXTERNAL] [SUSPICIOUS]

2021-05-19 Thread Finan, Sean
Hi all,
Correct.

Tim  is correct in the sense that he is using a custom dictionary (custom 
synonyms, cuis, etc.) which kind of changes the "rules" of what the standard 
dictionary lookup considers a valid term based upon available tokens in the 
text.  There are other simple settings that further qualify how the standard 
dictionary lookup accepts or discards synonyms.

I think that what Greg is asking about is something with introduced "logic" 
that can alter or remove terms already discovered by the standard dictionary 
lookup.

Peter and Kean both outline some custom annotators that they have created to 
use logic that can alter/add/remove terms discovered by the standard dictionary 
lookup.  I do the same thing for different projects and advise everybody that 
applies ctakes to specific domains do the same.  

ctakes is a general purpose tool and results can definitely be improved when
it is tailored to a narrower purpose.

Back to Greg, I got the feeling that he might be interested in a more versatile 
annotator.  Introducing an engine that can utilize something like ruta has 
several advantages:
1.  You  can "easily" add complex rules in one place.
2.  You can change rules external to code ...
  2a. the same pipeline can be catered to different projects without changing 
code in an annotator or creating a new annotator.
  2b.  An end user who knows nothing about ctakes can change a ruta script to 
fit their purposes.
3. Rules are supported and documented by uima ruta, so you don't have to worry 
about that extra headache.
4. Once Greg adds it to apache ctakes (right? ;^) everybody in the community 
can apply ruta rules to their project.

When I looked at it a few years ago it was for reason 2b.  In the end we went 
for different annotators like Peter and Kean outlined and just use piper file 
changes to satisfy #2 as that is definitely much easier.  However, it doesn't 
benefit the community as a whole (#4).

Cheers all, this is a great conversation!

Sean





From: Kean Kaufmann 
Sent: Wednesday, May 19, 2021 7:50 AM
To: dev@ctakes.apache.org
Subject: Re: rule-based lookup for custom lexicon [EXTERNAL] [SUSPICIOUS]


> yes,  the line between "lookup" and rule execution is a little blurry
sometimes.

Sure is.  I blur it with a set of annotators that extend dictionary
annotations based on words or annotations covered by the same Chunk, e.g.

DiseaseDisorderMention + /screen(ing)?/i = ProcedureMention
MedicationMention + /dependenc[ey]|addiction/i = DiseaseDisorderMention
DiseaseDisorderMention + AnatomicalSiteMention in same Chunk =
DiseaseDisorderMention
ProcedureMention + AnatomicalSiteMention in same Chunk = ProcedureMention

Higher recall than the regular UmlsLookupAnnotator;
higher precision than the UmlsOverlapLookupAnnotator (which skips a
specified number of tokens regardless of syntax).
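
The first two rules above can be sketched in plain Java.  This is only an illustration of the idea, not the annotators described in this message: Mention, Chunk and Rule are hypothetical stand-ins for the ctakes types, the new mention simply covers the whole chunk text, and the trigger pattern is searched anywhere in the chunk.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Pattern;

public class ChunkPromotionSketch {

    // Hypothetical stand-ins for ctakes annotations; not the real type system.
    record Mention(String type, String text) {}
    record Chunk(String text, List<Mention> mentions) {}

    // A mention of sourceType plus a trigger word in the same chunk
    // yields a new mention of targetType.
    record Rule(String sourceType, Pattern trigger, String targetType) {}

    static final List<Rule> RULES = List.of(
            new Rule("DiseaseDisorderMention",
                    Pattern.compile("screen(ing)?", Pattern.CASE_INSENSITIVE),
                    "ProcedureMention"),
            new Rule("MedicationMention",
                    Pattern.compile("dependenc[ey]|addiction", Pattern.CASE_INSENSITIVE),
                    "DiseaseDisorderMention"));

    // Returns the mentions that the rules would add for one chunk.
    static List<Mention> apply(Chunk chunk) {
        List<Mention> added = new ArrayList<>();
        for (Rule rule : RULES) {
            boolean hasSource = chunk.mentions().stream()
                    .anyMatch(m -> m.type().equals(rule.sourceType()));
            if (hasSource && rule.trigger().matcher(chunk.text()).find()) {
                added.add(new Mention(rule.targetType(), chunk.text()));
            }
        }
        return added;
    }

    public static void main(String[] args) {
        Chunk chunk = new Chunk("colon cancer screening",
                List.of(new Mention("DiseaseDisorderMention", "colon cancer")));
        System.out.println(apply(chunk));
    }
}
```

Keeping the rules in a list like this is what makes a rule language such as Ruta attractive: the table could live outside the code.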

I've been wanting a more general framework to fit this into, and thinking
it might be Ruta.
Thanks for the pointer to TokensRegex; I'll look at that as well.


On Tue, May 18, 2021 at 5:39 PM Peter Abramowitsch 
wrote:

> Hi All,  yes,  the line between "lookup" and rule execution is a little
> blurry sometimes.   Here's some more blurriness.
>
> I've done something related, adapting a UIMA tokens regex engine for
> Ctakes.  You create a new type in the TypeSystem.  In my case it uses
> CONLLDEP Annotations as the tokens to reason over.   You can set up
> expressions (rules) that look like this.
> (Yes, this case is already covered in the dictionary, but it's an example)
>
> Matcher A:   (lemma=="be");
> Matcher B:   /partially|partly/;
> Matcher C:   /vaccinated/;
>
> Rule  vaccinated|CUI1234|SNOMED5678:  A? B?  C;
>
> You get the Annotation you've delegated to this task, with the entity
> value  "vaccinated|1234|5678"  and the range which spanned the tokens that
> caused the annotation rule to fire
>
> (See Stanford's Tokens Regex)
>
> Peter
>
>
> On Tue, May 18, 2021 at 1:29 PM Miller, Timothy <
> timothy.mil...@childrens.harvard.edu> wrote:
>
> > But Sean, isn't what he's asking for essentially already implemented in
> > cTAKES as the custom dictionary? I'm currently using that approach for my
> > covid container:
> >
> >
> > https://github.com/Machine-Learning-for-Medical-Language/ctakes-covid-container
> > Tim
> >
> > 
> > From: Finan, Sean 
> > Sent: Tuesday, May 18, 2021 11:55 AM
> > To: dev@ctakes.apache.org
> > Cc: Himanshu Shekhar Sahoo
> > Subject: Re: rule-based lookup for custom lexicon [EXTERNAL] [SUSPI

Re: rule-based lookup for custom lexicon [EXTERNAL]

2021-05-18 Thread Finan, Sean
Hi Greg,

From 30,000 ft, I think that you would want to use the RutaEngine.

https://uima.apache.org/d/ruta-current/tools.ruta.book.html#ugr.tools.ruta.ae.basic
https://javadoc.io/doc/org.apache.uima/ruta-core/latest/org/apache/uima/ruta/engine/RutaEngine.html
http://svn.apache.org/repos/asf/uima/ruta/trunk/ruta-core/src/main/java/org/apache/uima/ruta/engine/RutaEngine.java

That seems to be the actual analysis engine that loads and uses rules to create 
annotations.
While you could use an xml descriptor or use the piper "set" command and do
things like mapping ruta to ctakes type systems, I would take the alternate
approach of "copying" the initialize(..) and process(..) methods and modifying
them to use ctakes types directly.

Disclaimer:  I know very little about uima ruta.  At some point I did look into 
it but it was for a specific (ctakes-derivative) project and I didn't go 
further than basic doc perusal.  

If you move forward with this please let us all know what you find.  I think 
that there will be great interest in the community.

Sean

From: Greg Silverman 
Sent: Tuesday, May 18, 2021 11:13 AM
To: dev@ctakes.apache.org
Cc: Himanshu Shekhar Sahoo
Subject: Re: rule-based lookup for custom lexicon [EXTERNAL]


Hi Sean,
I was wondering if there was a way to use rule-based lookup of a custom
lexicon within cTAKES (say a locally curated list of covid-19 symptoms).
When I Googled around, I stumbled on UIMA Ruta, but couldn't find anything
with respect to cTAKES specifics.

Thanks!


Greg--

On Tue, May 18, 2021 at 10:04 AM Finan, Sean <
sean.fi...@childrens.harvard.edu> wrote:

>  To which ctakes component(s) are you referring?
> 
> From: Greg Silverman 
> Sent: Sunday, May 16, 2021 6:02 PM
> To: dev@ctakes.apache.org; Himanshu Shekhar Sahoo
> Subject: rule-based lookup for custom lexicon [EXTERNAL]
>
>
> I looked all over and could not find any information on how to add this
> pipeline component to cTAKES. I assume it uses UIMA Ruta?
>
> Thanks in advance!
>
> Greg--
> --
> Greg M. Silverman
> Senior Systems Developer
> NLP/IE <https://healthinformatics.umn.edu/research/nlpie-group>
> Department of Surgery
> University of Minnesota
> g...@umn.edu
>


--
Greg M. Silverman
Senior Systems Developer
NLP/IE <https://healthinformatics.umn.edu/research/nlpie-group>
Department of Surgery
University of Minnesota
g...@umn.edu


Re: rule-based lookup for custom lexicon [EXTERNAL]

2021-05-18 Thread Finan, Sean
 To which ctakes component(s) are you referring? 

From: Greg Silverman 
Sent: Sunday, May 16, 2021 6:02 PM
To: dev@ctakes.apache.org; Himanshu Shekhar Sahoo
Subject: rule-based lookup for custom lexicon [EXTERNAL]


I looked all over and could not find any information on how to add this
pipeline component to cTAKES. I assume it uses UIMA Ruta?

Thanks in advance!

Greg--
--
Greg M. Silverman
Senior Systems Developer
NLP/IE 

Department of Surgery
University of Minnesota
g...@umn.edu


Re: svn or github [EXTERNAL]

2021-05-03 Thread Finan, Sean
Thanks Javi,

I am aware of github's lfs.  It is a good idea, but I am not sure how well it 
would work for a project with a community as large as ctakes.  Using the lfs 
tool seems like it is an inhibitor to easy adoption - which is not where we 
want to go.

Just to be clear, I am not talking about a migration to github.  The desire is 
to have a mirror of the svn repo on github.  The last I spoke with apache infra 
on this, the lfs was not a viable solution to the problem because it didn't fit 
into the mirroring technique.  The details on that were all behind a door that 
I never opened, so that is where my knowledge of the matter ends.

Thanks, and keep the ideas rolling,

Sean

From: Javi Roman 
Sent: Monday, May 3, 2021 4:13 AM
To: dev@ctakes.apache.org
Subject: Re: svn or github [EXTERNAL]


A way to properly work with large files and GitHub is to use the Git Large
File Storage (LFS) plugin created by GitHub. The following is a session
using this feature:

$ ctakes-testbed.git(main)]$ find -size +50M
./rest-api/healthnlp-examples/ctakes-temporal-demo/src/main/resources/org/apache/ctakes/dictionary/lookup/fast/sno_rx_16ab/sno_rx_16ab.script
./rest-api/healthnlp-examples/ctakes-temporal-demo/src/main/resources/org/apache/ctakes/dictionary/lookup/fast/ctakessnorx/ctakessnorx.script
./rest-api/healthnlp-examples/ctakes-web-client/src/main/resources/org/apache/ctakes/dictionary/lookup/fast/sno_rx_16ab/sno_rx_16ab.script
./rest-api/healthnlp-examples/ctakes-web-client/src/main/resources/org/apache/ctakes/dictionary/lookup/fast/ctakessnorx/ctakessnorx.script
$ git lfs track ctakessnorx.script
$ git lfs track ctakessnorx.script
$ git lfs track
Listing tracked patterns
ctakessnorx.script (.gitattributes)
ctakessnorx.script (.gitattributes)
$ git add .gitattributes
$ git add .
$ git commit -m "."
$ git push origin main

That enables the git version control system to track huge binary blobs. It
does so by creating a text-based reference to the blob, then tracking and
storing the blob in a location external to the git repository itself, in
this case hosted by GitHub.
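
The first step of the session, finding files over 50 MB, can also be done without a shell.  A minimal Java sketch, assuming only the JDK; the class name LargeFileFinder is hypothetical and the threshold is passed in bytes:

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.List;
import java.util.stream.Collectors;
import java.util.stream.Stream;

public class LargeFileFinder {

    // Returns regular files under root whose size exceeds thresholdBytes.
    static List<Path> largerThan(Path root, long thresholdBytes) throws IOException {
        try (Stream<Path> paths = Files.walk(root)) {
            return paths.filter(Files::isRegularFile)
                    .filter(p -> sizeOf(p) > thresholdBytes)
                    .sorted()
                    .collect(Collectors.toList());
        }
    }

    // Files.size throws a checked exception, so wrap it for use in a stream.
    static long sizeOf(Path path) {
        try {
            return Files.size(path);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) throws IOException {
        Path root = Path.of(args.length > 0 ? args[0] : ".");
        // 50 MB threshold, mirroring the find command in the session above.
        largerThan(root, 50L * 1024 * 1024).forEach(System.out::println);
    }
}
```

Each path printed is a candidate for a "git lfs track" pattern.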

This is just an idea.
--
Javi Roman

Twitter: @javiromanrh
GitHub: github.com/javiroman
Linkedin: es.linkedin.com/in/javiroman
Big Data Blog: dataintensive.info


On Tue, Apr 27, 2021 at 8:16 PM Finan, Sean <
sean.fi...@childrens.harvard.edu> wrote:

> Hi Javi,
>
> https://github.com/apache/ctakes
> is / was an attempt at a mirror of the svn trunk repository.  There is
> nothing more complicated than that.
>
> Sean
>
> 
> From: Javi Roman 
> Sent: Tuesday, April 27, 2021 12:58 PM
> To: dev@ctakes.apache.org
> Subject: Re: svn or github [EXTERNAL]
>
>
> Many thanks Sean.
>
> Is there any documentation about the repository organization in Subversion? If I
> understand correctly, the mirror in Github is only the trunk folder in
> Subversion.
>
> --
> Javi Roman
>
> Twitter: @javiromanrh
> GitHub: github.com/javiroman
> Linkedin: es.linkedin.com/in/javiroman
> Big Data Blog: dataintensive.info
>
>
> On Tue, Apr 27, 2021 at 5:15 PM Finan, Sean <
> sean.fi...@childrens.harvard.edu> wrote:
>
> > Hi Javi,
> >
> > I too would like to get more developers and activity with source
> available
> > on Github.  Hopefully you can help us do it.
> >
> > One problem that we had in the past concerning use of Github is caused by
> > large machine learning models in ctakes.  Github has file size limits for
> > repositories and some of our models surpassed these limits, which caused
> a
> > corruption of the original migration attempt and errors with subsequent
> > auto-merge checkins.  ctakes had to be removed from the svn : github
> > "mirroring".
> >
> > While large files (models, etc.) can be hosted as "release" binaries in
> > github, modifying ctakes' github use in such a way breaks mirroring that
> > would keep both the apache svn and github repositories synchronized.
> > Removing the large model files from the svn area could require further
> > customization of not only that layout but also getting things published
> in
> > maven central.
> >
> > There might be a simple way to reorganize files, simply maintain version
> > control on large files, keep repository mirroring and publication
> automated
> > and document the whole paradigm so that a community can support it.
> > Unfortunately, when this topic was last visited nobody authored or
> > implemented such a solution.
> >
> > It

Re: svn or github [EXTERNAL]

2021-04-27 Thread Finan, Sean
Hi Javi,

https://github.com/apache/ctakes 
is / was an attempt at a mirror of the svn trunk repository.  There is nothing 
more complicated than that.

Sean


From: Javi Roman 
Sent: Tuesday, April 27, 2021 12:58 PM
To: dev@ctakes.apache.org
Subject: Re: svn or github [EXTERNAL]


Many thanks Sean.

Is there any documentation about the repository organization in Subversion? If I
understand correctly, the mirror in Github is only the trunk folder in
Subversion.

--
Javi Roman

Twitter: @javiromanrh
GitHub: github.com/javiroman
Linkedin: es.linkedin.com/in/javiroman
Big Data Blog: dataintensive.info


On Tue, Apr 27, 2021 at 5:15 PM Finan, Sean <
sean.fi...@childrens.harvard.edu> wrote:

> Hi Javi,
>
> I too would like to get more developers and activity with source available
> on Github.  Hopefully you can help us do it.
>
> One problem that we had in the past concerning use of Github is caused by
> large machine learning models in ctakes.  Github has file size limits for
> repositories and some of our models surpassed these limits, which caused a
> corruption of the original migration attempt and errors with subsequent
> auto-merge checkins.  ctakes had to be removed from the svn : github
> "mirroring".
>
> While large files (models, etc.) can be hosted as "release" binaries in
> github, modifying ctakes' github use in such a way breaks mirroring that
> would keep both the apache svn and github repositories synchronized.
> Removing the large model files from the svn area could require further
> customization of not only that layout but also getting things published in
> maven central.
>
> There might be a simple way to reorganize files, simply maintain version
> control on large files, keep repository mirroring and publication automated
> and document the whole paradigm so that a community can support it.
> Unfortunately, when this topic was last visited nobody authored or
> implemented such a solution.
>
> It has been many years since this topic was discussed, maybe some fresh
> perspectives or modernizations can get ctakes on github.
>
> Thanks,
> Sean
>
> 
> From: Javi Roman 
> Sent: Tuesday, April 27, 2021 10:26 AM
> To: dev@ctakes.apache.org
> Subject: Re: svn or github [EXTERNAL]
>
>
> I've just seen the development is based on subversion.
>
> It looks like there was some movement toward migrating from subversion to
> GitHub (most ASF projects have migrated to GitHub) in this issue [1]; however,
> the issue was created on 19/Nov/17 (it is still in progress) and there have
> been no updates.
>
> I would like to open this discussion (full migration to git) in order to
> get more developers and activity with an easier interface.
>
> Many thanks.
>
>
> [1]
> https://issues.apache.org/jira/browse/CTAKES-482
> --
> Javi Roman
>
> Twitter: @javiromanrh
> GitHub: github.com/javiroman
> Linkedin: es.linkedin.com/in/javiroman
> Big Data Blog: dataintensive.info
>
>
> On Tue, Apr 27, 2021 at 3:53 PM Javi Roman 
> wrote:
>
> > Hi community!
> >
> > Is cTakes development currently done in github or subversion?
> >
> > --
> > Javi Roman
> >
> > Twitter: @javiromanrh
> > GitHub: github.com/javiroman
> > Linkedin: es.linkedin.com/in/javiroman
> > Big Data Blog: dataintensive.info
> >
>


Re: svn or github [EXTERNAL]

2021-04-27 Thread Finan, Sean
Hi Javi,

I too would like to get more developers and activity with source available on 
Github.  Hopefully you can help us do it.

One problem that we had in the past concerning use of Github is caused by large 
machine learning models in ctakes.  Github has file size limits for 
repositories and some of our models surpassed these limits, which caused a 
corruption of the original migration attempt and errors with subsequent 
auto-merge checkins.  ctakes had to be removed from the svn : github 
"mirroring".

While large files (models, etc.) can be hosted as "release" binaries in github, 
modifying ctakes' github use in such a way breaks mirroring that would keep 
both the apache svn and github repositories synchronized.  Removing the large 
model files from the svn area could require further customization of not only 
that layout but also getting things published in maven central.

There might be a simple way to reorganize files, simply maintain version 
control on large files, keep repository mirroring and publication automated and 
document the whole paradigm so that a community can support it.  Unfortunately, 
when this topic was last visited nobody authored or implemented such a solution.

It has been many years since this topic was discussed.  Maybe some fresh 
perspectives or modernizations can get ctakes on github.

Thanks,
Sean


From: Javi Roman 
Sent: Tuesday, April 27, 2021 10:26 AM
To: dev@ctakes.apache.org
Subject: Re: svn or github [EXTERNAL]

I've just seen that development is based on Subversion.

It looks like there was some movement toward migrating from Subversion to
GitHub (most ASF projects have migrated to GitHub) in this issue [1].
However, the issue was created on 19/Nov/17 (it's still in progress) and
there have been no updates.

I would like to open this discussion (full migration to git) in order to
get more developers and activity with an easier interface.

Many thanks.


[1] https://issues.apache.org/jira/browse/CTAKES-482
--
Javi Roman

Twitter: @javiromanrh
GitHub: github.com/javiroman
Linkedin: es.linkedin.com/in/javiroman
Big Data Blog: dataintensive.info


On Tue, Apr 27, 2021 at 3:53 PM Javi Roman  wrote:

> Hi community!
>
> Is cTakes development currently done in github or subversion?
>
> --
> Javi Roman
>
> Twitter: @javiromanrh
> GitHub: github.com/javiroman
> Linkedin: es.linkedin.com/in/javiroman
> Big Data Blog: dataintensive.info
>

