Apache cTAKES GitHub mirror is stuck in 2019

2022-06-02 Thread Richard Eckart de Castilho
Hi,

it appears that the GitHub mirror of Apache cTAKES may be stuck.

When I check the svn log of https://svn.apache.org/repos/asf/ctakes/trunk/, I can
see activity as recent as May 2022.

However, on GitHub, I can only see stale branches:

https://github.com/apache/ctakes/branches

Wouldn't it be good if the GitHub mirror were kept up to date?

Best,

-- Richard



Re: Apache cTAKES GitHub mirror is stuck in 2019 [EXTERNAL]

2022-06-02 Thread Finan, Sean
Hi Richard, you bring up a valid concern.

cTAKES Developers:

The Apache Foundation has had an initiative to "move" all projects to GitHub 
for some time now.  

I don't know much about how this is done.  If anybody out there has knowledge 
or experience that they can pass on, please share.

Thanks,
Sean




Re: Apache cTAKES GitHub mirror is stuck in 2019 [EXTERNAL]

2022-06-02 Thread Richard Eckart de Castilho
On 2. Jun 2022, at 14:22, Finan, Sean wrote:
> 
> I don't know much about how this is done.  If anybody out there has knowledge 
> or experience that they can pass on, please share.

When we did this for UIMA, the steps were documented here:

  https://uima.apache.org/convert-to-git.html

Not 100% sure if this is still the way to go - INFRA may know more.

Basically, if the Git(Hub) mirror is working properly, then at some point you 
can tell Infra to make it the main repo and to put SVN into read-only.
But first, the Git(Hub) mirror needs to be up-to-date.
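
A rough way to check that, assuming you have a local clone of the mirror: feed
the output of `svn log --xml -l 1 <trunk URL>` and of `git log -1 --format=%cI`
to a small script like the sketch below. The function names are made up for
illustration, and it only compares commit dates, not content.

```python
import xml.etree.ElementTree as ET
from datetime import datetime, timezone


def latest_svn_commit(svn_log_xml: str) -> datetime:
    """Return the newest commit date found in `svn log --xml` output."""
    root = ET.fromstring(svn_log_xml)
    dates = [
        # svn writes dates as e.g. 2022-05-20T10:00:00.000000Z (always UTC)
        datetime.strptime(entry.findtext("date"), "%Y-%m-%dT%H:%M:%S.%fZ")
        .replace(tzinfo=timezone.utc)
        for entry in root.iter("logentry")
    ]
    if not dates:
        raise ValueError("no <logentry> elements found")
    return max(dates)


def mirror_lag_days(svn_log_xml: str, git_head_date: str) -> int:
    """Whole days by which the Git head trails the newest SVN commit.

    git_head_date is the strict ISO 8601 string printed by
    `git log -1 --format=%cI`, e.g. 2019-05-20T10:00:00+00:00.
    """
    git_date = datetime.fromisoformat(git_head_date)
    return (latest_svn_commit(svn_log_xml) - git_date).days
```

A lag of more than a few days would suggest that the mirror is not syncing.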

I'm hanging out on the ASF slack e.g. in the ComDev channel - feel free to ping 
me there.

Cheers,

-- Richard

Re: Apache cTAKES GitHub mirror is stuck in 2019 [EXTERNAL]

2022-06-02 Thread gandhi rajan
Hi Sean,

If we are sure that SVN has all the latest changes and that active
development is primarily on SVN, then why don't we request a fresh Git
repository and push all the changes over there?

More info on https://infra.apache.org/svn-to-git-migration.html


-- 
Regards,
Gandhi

"The best way to find urself is to lose urself in the service of others !!!"


SmokingStatus & Side effects - piper file

2022-06-02 Thread Muhammad Ali Syed
Hi there,

I am exploring the cTAKES Smoking Status & Side Effects components and have not
come across any piper file version of their implementation. When trying to
incrementally add AEs from these 2 components to FullTokenizerPipeline.piper,
I am running into issues such as:
- ResourceInitializationExceptions when adding ClassifiableEntries (I did set
UimaDescriptorStep1Key and UimaDescriptorStep2Key). I tried adding individual
AEs from step1 and step2 and ran into other issues.
- exceptions such as in PcsClassifierAnnotator_libsvm (added to the pipeline
after adding KuRuleBasedClassifierAnnotator):
java.lang.NullPointerException at libsvm.svm.svm_predict(svm.java:2343)

My question is: can these 2 components (each containing multiple AEs) be
implemented with piper files as of now? In other words, can any pipeline
that can be created using XML descriptor files also be created with piper
files?

Is there any sample piper code for pipelines that include either of these
components?

Regards,


Re: SmokingStatus & Side effects - piper file [EXTERNAL]

2022-06-02 Thread Finan, Sean
Hi Muhammad,

Can you please copy & paste the contents of your piper file?

Thanks,
Sean



Re: SmokingStatus & Side effects - piper file [EXTERNAL]

2022-06-02 Thread Muhammad Ali Syed
Below is my piper file. The sample text I am using is
ctakes-smoking-status/data/test/doc2_07543210_sample_current.txt.

load FullTokenizerPipeline

// Add non-core annotators
add ContextDependentTokenizerAnnotator
addDescription POSTagger

// Add Chunkers
load ChunkerSubPipe

// Default fast dictionary lookup
load DictionarySubPipe

// Add Cleartk Entity Attribute annotators
load AttributeCleartkSubPipe

// ClassifiableEntries - errors out at org.apache.ctakes.smokingstatus.ae.ClassifiableEntries.initialize(ClassifiableEntries.java:134)
set SectionsToIgnore=20109,20138
set AllowedClassifications=SMOKER,CURRENT_SMOKER,NON_SMOKER,PAST_SMOKER,UNKNOWN
set UimaDescriptorStep1=file: org/apache/ctakes/smokingstatus/analysis_engine/ProductionPostSentenceAggregate_step1.xml
set UimaDescriptorStep2=file: org/apache/ctakes/smokingstatus/analysis_engine/ProductionPostSentenceAggregate_step2_libsvm.xml
add ClassifiableEntries UimaDescriptorStep1Key=UimaDescriptorStep1 UimaDescriptorStep2Key=UimaDescriptorStep2

// KuRuleBasedClassifierAnnotator - works but commented out for now
//add KuRuleBasedClassifierAnnotator SmokingWordsFile=/org/apache/ctakes/smokingstatus/data/KU/keywords.txt UnknownWordsFile=/org/apache/ctakes/smokingstatus/data/KU/unknown_words.txt

// PcsClassifierAnnotator_libsvm - errors out at libsvm.svm.svm_predict(svm.java:2343)
set StopWordsFileRes=file: org/apache/ctakes/smokingstatus/data/PCS/stopwords_PCS.txt
set PathOfModelRes=file: org/apache/ctakes/smokingstatus/data/PCS/pcs_libsvm-2.91.model
set PCSKeyWordFileResc=file: org/apache/ctakes/smokingstatus/data/PCS/keywords_PCS.txt
//add PcsClassifierAnnotator_libsvm PathOfModel=PathOfModelResc StopWordsFile=StopWordsFileRes PCSKeyWordFile=PCSKeyWordFileResc

// SideEffectAnnotator - errors out
set sideEffectDic=file: org/apache/ctakes/sideeffect/lookup/sideEffect_dictionary.txt
//add SideEffectAnnotator sideEffectTable=sideEffectDic

addLast util.log.FinishedLogger



Re: SmokingStatus & Side effects - piper file [EXTERNAL]

2022-06-02 Thread Finan, Sean
There are a few things to talk about, including some bad news.

The bad news:
All 3 of those annotators predate the use of UimaFit:
https://uima.apache.org/uimafit.html
Pipers use UimaFit to simplify the specification of parameters and to configure
advanced pipelines.
Piper files will not work with older annotators such as those you wish to
utilize.

Some good news is that the problems are in the initialization of those 
annotators and not processing.
Some refactoring of those annotators to bring them up to date shouldn't be too 
difficult.

You have exemplified one of the reasons for creating the piper paradigm, which 
is the simplification of parameter specifications.  There isn't (shouldn't be) 
any need to specify urls, resources that point to the urls, then parameters 
that point to resources. 
For instance, a piper would just have:
set StopWordsFile=org/apache/ctakes/smokingstatus/data/PCS/stopwords_PCS.txt
set PCSKeyWordFile=org/apache/ctakes/smokingstatus/data/PCS/keywords_PCS.txt
set PathOfModel=org/apache/ctakes/smokingstatus/data/PCS/pcs_libsvm-2.91.model
add PcsClassifierAnnotator_libsvm

Some information on using piper files can be found here: 
https://cwiki.apache.org/confluence/display/CTAKES/Piper+Files

Sean




Re: Apache cTAKES GitHub mirror is stuck in 2019 [EXTERNAL]

2022-06-02 Thread Finan, Sean
Thank you Gandhi and Richard.

Unless somebody else beats me to it I will perform some research and see what 
approaches can be used and which might be best.  In the end the cTAKES Project 
Management Committee will need to vote for any action as sweeping as moving to 
github.

Sean



Re: Apache cTAKES GitHub mirror is stuck in 2019 [EXTERNAL] [SUSPICIOUS]

2022-06-02 Thread Miller, Timothy
My recollection was that we ran into issues in previous attempts at migration 
with the large file sizes in our repo.
Tim

