Steve and Dima,

Thanks very much for these replies; they are very helpful for getting started. I'll give this a try over the next couple of weeks and will let you know how it works.

Cheers,
Will

-----Original Message-----
From: Steven Bethard [mailto:[email protected]]
Sent: Friday, August 30, 2013 9:37 AM
To: [email protected]
Subject: Re: resources for training modules

On Fri, Aug 30, 2013 at 8:46 AM, Dmitriy Dligach <[email protected]> wrote:
> Retraining the relation extractor should be fairly easy. The
> instructions I am about to give you apply if you are using cTAKES 3.0.
> However, if you are planning to use the trunk version, my instructions
> may no longer be accurate. Relation extraction has undergone some
> changes recently in connection with cTAKES-190 issue and I don't fully
> understand these most recent changes yet (but I am working on it).

With the trunk version, there's no need to run PreprocessAndWriteXmi. Just run RelationExtractorEvaluation or RelationExtractorTrain directly. (The XMIs will be automatically written to target/xmi.) I believe the only required argument is --batches-dir, which gives the directory containing the directories containing Knowtator_XML directories. The other (optional) arguments should be similar to what Dima described (and you can see what they are by looking at the static Options classes (and their superclasses) in RelationExtractorEvaluation and RelationExtractorTrain).

> If you are planning to annotate your data, it might be easier to use
> Knowtator since we already have a gold standard reader for Knowtator.
> If you want to use a different annotation tool, you just have to make
> sure you add the manual annotations to the gold view of the XMI files.

In the trunk version, most of the SHARP-specific stuff is handled by the SHARPXMI class. So if you need to customize things away from what was done for SHARP, that's probably where you'll need to go. At the moment, RelationExtractorTrain and RelationExtractorEvaluation call static methods on SHARPXMI, which means that it's not very extensible. We could conceivably change these methods to non-static methods, and then extensions of relation-extractor could provide their own implementation. We're certainly open to modifying the infrastructure here, so if you have any suggestions please do pass them on.

Steve
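
As a rough illustration of the static versus non-static point above, here is a minimal Java sketch. It is not the actual cTAKES code: the class names (CorpusHelper, CustomCorpusHelper, Trainer, ExtensibilitySketch), the method annotationDirNames, and the "MyAnnotations" directory name are all hypothetical stand-ins, and the real SHARPXMI methods may look quite different. The idea is only that an instance method can be overridden by an extension, whereas a call to a static method is fixed at compile time.

```java
import java.util.Arrays;
import java.util.List;

/** Entry point for the sketch; all names here are hypothetical. */
class ExtensibilitySketch {
    public static void main(String[] args) {
        // Default behavior, analogous to the SHARP-specific setup.
        System.out.println(new Trainer(new CorpusHelper()).describe());
        // An extension swaps in its own helper without touching Trainer.
        System.out.println(new Trainer(new CustomCorpusHelper()).describe());
    }
}

/** Hypothetical stand-in for a corpus helper; NOT the real SHARPXMI API. */
class CorpusHelper {
    /** Instance (non-static) method, so a subclass can override it. */
    List<String> annotationDirNames() {
        return Arrays.asList("Knowtator_XML");
    }
}

/** Hypothetical extension for a differently organized corpus. */
class CustomCorpusHelper extends CorpusHelper {
    @Override
    List<String> annotationDirNames() {
        return Arrays.asList("MyAnnotations"); // assumed name, for illustration
    }
}

/** Hypothetical trainer that is handed a helper instead of calling statics. */
class Trainer {
    private final CorpusHelper helper;

    Trainer(CorpusHelper helper) {
        this.helper = helper;
    }

    String describe() {
        // Dynamic dispatch picks the subclass implementation if one is given.
        return "reading from " + helper.annotationDirNames();
    }
}
```

With a design along these lines, the training and evaluation classes would only need to accept a helper instance, and a project with a different corpus layout could supply its own subclass.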
