The cTAKES sentence detector is not changed in the YTEX branch. The YTEX branch has an *additional* sentence detector that does not automatically split sentences on newlines - users can use this if they like.
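To make the difference concrete, here is a toy Python sketch. It is not the cTAKES or YTEX implementation: both functions and the sample note are made up for illustration, and real sentence detectors are model-based rather than regex-based. It only contrasts a splitter that treats every newline as a sentence end with one that treats newlines as ordinary whitespace.

```python
import re


def split_on_newlines_and_punctuation(text):
    """Toy splitter: a newline, or '.', '!' or '?' followed by whitespace, ends a sentence."""
    parts = re.split(r"\n+|(?<=[.!?])\s+", text)
    return [p.strip() for p in parts if p.strip()]


def split_on_punctuation_only(text):
    """Toy splitter: newlines count as plain whitespace; only punctuation ends a sentence."""
    flattened = re.sub(r"\s*\n\s*", " ", text)
    parts = re.split(r"(?<=[.!?])\s+", flattened)
    return [p.strip() for p in parts if p.strip()]


# Hypothetical note text: the first newline is a hard wrap inside a sentence,
# the second newline ends a phrase that has no closing punctuation.
note = (
    "Patient denies chest\npain. "
    "Will consider biopsy\nGiven family history, follow up in 2 weeks."
)

newline_sentences = split_on_newlines_and_punctuation(note)
punctuation_sentences = split_on_punctuation_only(note)

for label, sentences in (
    ("newline-splitting", newline_sentences),
    ("punctuation-only", punctuation_sentences),
):
    print(label)
    for sentence in sentences:
        print("  ", repr(sentence))
```

On this sample the newline-splitting version cuts "Patient denies chest pain." in two, while the punctuation-only version fuses "Will consider biopsy" with the line that follows it, which is why having both detectors available as a choice can matter.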
-vj

On Thu, Feb 6, 2014 at 1:01 PM, Finan, Sean <sean.fi...@childrens.harvard.edu> wrote:

> Hi Vijay,
>
> > I have yet to run across clinical text from a real EMR where newlines
> > represent the end of a sentence
>
> Since James pointed out this possibility a couple weeks ago, I have kept
> my eyes open. The problem is pretty ubiquitous in a corpus that I'm
> working with right now. I just opened the first note and gave it a count
> ... 95 lines total, 9 are sentence/phrase (lacking punctuation) endings.
> This is not including lists, which comprise about half of the note.
> One possible conjoinment was "Will consider [...] biopsy\nGiven [...]".
> Depending upon how cTakes deals with it, the meaning could change
> drastically.
>
> > I believe cTAKES absolutely has to support sentences with newlines
> > within them
>
> Yes, cTakes should do so, but I hope that you aren't suggesting that it
> only support such a structure.
>
> Where is that easy button?
>
> -----Original Message-----
> From: vijay garla [mailto:vnga...@gmail.com]
> Sent: Thursday, February 06, 2014 10:31 AM
> To: dev@ctakes.apache.org
> Cc: ytex-us...@googlegroups.com; ctakes-...@incubator.apache.org;
>     vlad.valtchi...@gmail.com
> Subject: Re: YTEX cTAKES 3.1.1 ready
>
> I believe it is worth migrating to trunk.
>
> Note that the sentence detector is also complementary - the existing
> ctakes sentence detector is unchanged - users can choose which sentence
> detector to use. There are changes to assertion & dependency parsing to
> support sentences without newlines, and that works with both sentence
> detectors.
>
> I believe cTAKES absolutely has to support sentences with newlines within
> them - I have yet to run across clinical text from a real EMR where
> newlines represent the end of a sentence - the changes to assertion &
> dependency parsing will have to be done at some point.
>
> -vj
>
>
> On Thu, Feb 6, 2014 at 10:19 AM, Chen, Pei <pei.c...@childrens.harvard.edu> wrote:
>
> > VJ,
> > Aside from the changes to the existing cTAKES code (sentence detector,
> > etc.) [which we could leave out if it's still being debated], do you
> > think it's worth migrating the ytex code to trunk at this point?
> > As you mentioned earlier, it's largely complementary.
> > [I was just thinking of saving effort to maintain the separate branch
> > and for simplicity for dev...]
> >
> > --Pei
> >
> > > -----Original Message-----
> > > From: vijay garla [mailto:vnga...@gmail.com]
> > > Sent: Wednesday, February 05, 2014 9:30 PM
> > > To: ytex-us...@googlegroups.com; ctakes-...@incubator.apache.org;
> > >     vlad.valtchi...@gmail.com
> > > Subject: Re: YTEX cTAKES 3.1.1 ready
> > >
> > > Hi Vlad,
> > >
> > > I updated the umls install guide; see
> > > https://code.google.com/p/ytex/wiki/UMLS_SQL_SERVER_3_1
> > >
> > > I would prefer to add the docs in the ctakes confluence, but as far
> > > as I can tell, I don't have write access there - can somebody give
> > > me write privileges on the ctakes confluence site?
> > >
> > > There was a bug in the umls install; copy
> > > https://svn.apache.org/repos/asf/ctakes/branches/ytex/ctakes-ytex/scripts/data/build.xml
> > > over the corresponding file in your ctakes-3.1.2 install
> > > (CTAKES_HOME\bin\ctakes-ytex\scripts\data) and you should be set.
> > > The import is currently running on the UMLS 2013AA (I assume this
> > > will complete without issues as long as the umls schema hasn't
> > > changed from 2012).
> > >
> > > what trial and error did you have to go through to build the distro?
> > >
> > > -vj
> > >
> > >
> > > On Wed, Feb 5, 2014 at 5:33 PM, vijay garla <vnga...@gmail.com> wrote:
> > >
> > > > Hi Vlad,
> > > >
> > > > sorry that the instructions aren't clear.
> > > >
> > > > re 1) What I am trying to say is install
> > > > apache-ctakes-3.2.0-snapshot as usual (this is unchanged from
> > > > 3.1.1). After that you still have to apply the lib and resources
> > > > (these are things that cannot be distributed via apache).
> > > >
> > > > re 2) Yes, I need to update those docs. Hopefully will get to
> > > > that at some point. However, I assume you already have a UMLS DB
> > > > (also assume SQL Server). If you can't/don't want to use your
> > > > existing umls DB, please tell me. Then I'll prioritize upgrading
> > > > the doc on importing the umls tables (the scripts are there).
> > > >
> > > > best,
> > > >
> > > > VJ
> > > >
> > > >
> > > > On Wed, Feb 5, 2014 at 4:44 PM, <vlad.valtchi...@gmail.com> wrote:
> > > >
> > > >> Hi VJ-
> > > >>
> > > >> so, with trial and error we were able to make the distribution and
> > > >> now have the apache-ctakes-3.1.2-SNAPSHOT-bin.zip archive.
> > > >>
> > > >> Here's what's unclear.
> > > >>
> > > >> 1. Is now this the only (combined) thing that you need for ctakes
> > > >> 3.1.1 + Ytex?
> > > >> the current documentation
> > > >> (https://code.google.com/p/ytex/wiki/Installation_cTAKES_3_1?ts=1388793998&updated=Installation_cTAKES_3_1)
> > > >> which most probably is outdated, talks about installing cTakes
> > > >> 3.1.1 first and then applying 2 SNAPSHOT archives (downloadable),
> > > >> lib and resources.
> > > >> This is a confusion point.
> > > >>
> > > >> 2. The directions to import UMLS subset are then outdated as well.
> > > >> Maybe one should use the old version (ctakes 2.5 and ytex 0.8) to
> > > >> import the RRF files for the UMLS subset and then just use the
> > > >> resulting db. Thoughts?
> > > >>
> > > >> Thanks,
> > > >> Vlad Valtchinov
> > > >> Brigham Rad
> > > >>
> > > >>
> > > >> On Thursday, January 30, 2014 5:17:43 PM UTC-5, vijay garla wrote:
> > > >>
> > > >>> Hi Vlad,
> > > >>>
> > > >>> All of ytex has been moved into ctakes, it is currently in a
> > > >>> branch (https://svn.apache.org/repos/asf/ctakes/branches/ytex).
> > > >>> You don't have to install ytex-0.8 - instead you will have to
> > > >>> build and install from the ytex branch to create your own
> > > >>> distribution. Steps 2 & 3 are correct.
> > > >>>
> > > >>> Although it is a pain, if you have the jdk, maven, and svn, you
> > > >>> can easily build your own distro:
> > > >>> * open a command prompt
> > > >>> * make sure jdk, maven, and svn are in your path
> > > >>> * cd to some directory where you want to check stuff out (I like
> > > >>>   c:\temp)
> > > >>> * run the following commands
> > > >>> rmdir /s /q ctakes
> > > >>> svn co https://svn.apache.org/repos/asf/ctakes/branches/ytex ctakes
> > > >>> cd ctakes
> > > >>> mvn clean install -DskipTests
> > > >>>
> > > >>> And you will have the ctakes (with ytex) distro in
> > > >>> ctakes\ctakes-distribution\target\apache-ctakes-3.1.2-SNAPSHOT-bin.zip
> > > >>>
> > > >>> What is the process for getting the ytex branch merged into trunk?
> > > >>> As I mentioned, there are very few changes to other ctakes
> > > >>> classes/types - this should be completely complementary and not
> > > >>> affect any existing ctakes functionality.
> > > >>>
> > > >>> -vj
> > > >>>
> > > >>>
> > > >>> On Thu, Jan 30, 2014 at 4:56 PM, <vlad.va...@gmail.com> wrote:
> > > >>>
> > > >>>> Hi VJ--
> > > >>>>
> > > >>>> this is great!! Thanks for all the hard work on it!
> > > >>>>
> > > >>>> We're starting to look into the new install. For now we're
> > > >>>> trying the binaries out.
> > > >>>>
> > > >>>> There were these questions about the proper install steps:
> > > >>>>
> > > >>>> 1. Do we first install ytex-0.8
> > > >>>> 2. Then install the new cTakes 3.1.1 instance and also apply
> > > >>>>    the SNAPSHOT lib and resources zips
> > > >>>> 3. Work our way to install the UMLS ontologies in the db
> > > >>>>
> > > >>>> It is not entirely clear from the new document
> > > >>>> (https://code.google.com/p/ytex/wiki/Installation_cTAKES_3_1?ts=1388793998&updated=Installation_cTAKES_3_1)
> > > >>>> if there's still need to install ytex-0.8, or YTEX has been
> > > >>>> entirely merged into cTakes?
> > > >>>>
> > > >>>> If the last statement is correct, there are missing parts in
> > > >>>> i.e. the UMLS install steps that are linked from the new ctakes
> > > >>>> 3.1.1 document.
> > > >>>>
> > > >>>> Thanks,
> > > >>>> vlad
> > > >>>>
> > > >>>>
> > > >>>> On Friday, January 3, 2014 10:21:52 PM UTC-5, vijay garla wrote:
> > > >>>>>
> > > >>>>> Hello All,
> > > >>>>>
> > > >>>>> I have finished an initial cut at the port of YTEX to cTAKES
> > > >>>>> 3.1.1. Most of the YTEX functionality has been ported and
> > > >>>>> integrated with cTAKES, and I've tested with MySQL and MS SQL
> > > >>>>> Server (oracle tests pending).
> > > >>>>>
> > > >>>>> Most of the changes were made in new projects - very little
> > > >>>>> existing cTAKES code has been modified. The only non-trivial
> > > >>>>> changes are in
> > > >>>>> /ctakes-assertion/src/main/java/org/apache/ctakes/assertion/medfacts/i2b2/api
> > > >>>>> - here I modified
> > > >>>>> CharacterOffsetToLineTokenConverterCtakesImpl &
> > > >>>>> SingleDocumentProcessorCtakes to deal with newlines within
> > > >>>>> sentences correctly. Can somebody take a look at the changes
> > > >>>>> in the ytex branch?
> > > >>>>>
> > > >>>>> I believe that the branch
> > > >>>>> https://svn.apache.org/repos/asf/ctakes/branches/ytex is ready
> > > >>>>> to be merged into ctakes trunk, but would like other users to
> > > >>>>> test it as well. Questions:
> > > >>>>>
> > > >>>>> * How can I distribute the ctakes binary distribution to ytex
> > > >>>>>   users before the merge? Can we make the branch build available
> > > >>>>>   somewhere? The binary distribution is too large to host on
> > > >>>>>   the ytex google code site (max 200 MB)
> > > >>>>> * Non-ASF libraries - I have segregated these out into their
> > > >>>>>   own zip file that can be distributed via sourceforge. As a
> > > >>>>>   stopgap, I can upload this to the ytex google code site, but
> > > >>>>>   would prefer to upload to sourceforge.
> > > >>>>> * UMLS Derivatives - Ditto for these - would like to move to
> > > >>>>>   sourceforge.
> > > >>>>> * Documentation - How can I update the confluence docs? I
> > > >>>>>   would migrate the documentation from the google code website.
> > > >>>>>
> > > >>>>> Here are the installation instructions (putting the wagon in
> > > >>>>> front of the horse ...)
> > > >>>>>
> > > >>>>> https://code.google.com/p/ytex/wiki/Installation_cTAKES_3_1?ts=1388793998&updated=Installation_cTAKES_3_1
> > > >>>>>
> > > >>>>> Best,
> > > >>>>>
> > > >>>>> VJ
> > > >>>>>
> > > >>>>>
> > > >>>> --
> > > >>>> You received this message because you are subscribed to the
> > > >>>> Google Groups "ytex-users" group.
> > > >>>> To unsubscribe from this group and stop receiving emails from
> > > >>>> it, send an email to ytex-users+...@googlegroups.com.
> > > >>>> To post to this group, send email to ytex-...@googlegroups.com.
> > > >>>> To view this discussion on the web visit
> > > >>>> https://groups.google.com/d/msgid/ytex-users/70f03a80-ce1a-4c0e-b35d-5116d1c93ea0%40googlegroups.com.
> > > >>>>
> > > >>>> For more options, visit https://groups.google.com/groups/opt_out.