Hi again,

I think Jorn was going to expand this to the other models as well once we got a handle on the XML and the feature generator creation. I'll have to look into that again and see whether we are saving that information to the model, which would allow us to reload the model with the same feature generator that was used in training.
That aside, we wouldn't be able to support everything, and it would take some creative work to support the newer infrastructure. I don't think it would be a bad thing, though. I think Jorn chose the NameFinder first because it was simpler to expand to the XML architecture, and many users already need to define their own series of feature generators.

James

On 6/13/2011 11:35 PM, [email protected] wrote:
> Hi James,
>
> On Mon, Jun 13, 2011 at 11:32 PM, James Kosin <[email protected]> wrote:
>
>> On 6/13/2011 10:23 PM, [email protected] wrote:
>>> Hi,
>>>
>>> Currently we only have implemented custom feature generators that we can
>>> pass from command line only for NameFinder, but it would be very nice to
>>> have it for all tools.
>>> The Thai sentence detector customization is nice and simple, but to do
>>> something for other languages the user would need to branch the code. We
>>> should allow users to pass a factory class name from command line. Maybe we
>>> could do it for every tool that doesn't use sequence feature generator. Also
>>> would be nice to save the factory class name to the model to make sure we
>>> are using the same feature generator during runtime and evaluation.
>>>
>>> What do you think? Maybe you have thought a better solution for that.
>>>
>>> Thanks
>>> William
>>>
>> William,
>>
>> We discussed various options, unfortunately, most involved some security
>> risk for the Java engine; including allowing the saving of the actual
>> feature generator constructor itself to the model. Maybe the XML option
>> may be a better route for the long run. We could even save the copy of
>> the XML document in the model itself. But again that opens us up for
>> issues if someone writes bad XML to cause issues.
>>
> Yes, it is very nice with the NameFinder because we can reuse code using the
> XML descriptors.
>
>
>> Maybe, we could have the feature generator a generic class that needed a
>> constructor. Then each implementing language could have a new
>> constructor that correctly built the feature generator. Unfortunately,
>> it means a change would break any models.
>>
> I can't see why it would break the models. We could by default use the
> current feature generators.
> If we use factory to create the feature generator, the user is free to
> create it using any resource (another dictionary implementation for example)
>
>> We may need to re-open the issue when Jorn comes back or at least get
>> another discussion going so we can try and weed out the issues with the
>> options available.
>>
> Thanks
> William
>
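
A rough sketch of the factory idea from this thread, in case it helps the discussion: the factory class name is written under a manifest key when the model is saved, and read back to rebuild the same feature generator at load time. Everything named here is an assumption for illustration and not existing OpenNLP code: the FeatureGenerator and FeatureGeneratorFactory interfaces, the "featureGeneratorFactory" key, and java.util.Properties standing in for the model manifest. A model without the key falls back to the default factory, so existing models would keep loading.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Properties;

class FactoryLoadingSketch {

    /** Hypothetical stand-in for a feature generator (not the OpenNLP interface). */
    interface FeatureGenerator {
        List<String> features(String[] tokens, int index);
    }

    /** Hypothetical contract a user would implement and name on the command line. */
    interface FeatureGeneratorFactory {
        FeatureGenerator create();
    }

    /** Default factory, used whenever no factory class name is given. */
    public static class DefaultFactory implements FeatureGeneratorFactory {
        @Override
        public FeatureGenerator create() {
            return (tokens, index) -> {
                List<String> feats = new ArrayList<>();
                feats.add("w=" + tokens[index]);
                return feats;
            };
        }
    }

    /** Hypothetical manifest key; the real key name would be a project decision. */
    static final String FACTORY_KEY = "featureGeneratorFactory";

    /** Instantiates the named factory, or the default one if no name is given. */
    static FeatureGeneratorFactory load(String className)
            throws ReflectiveOperationException {
        if (className == null || className.isEmpty()) {
            return new DefaultFactory();
        }
        // Resolve the class without running its static initializers.
        Class<?> raw = Class.forName(
                className, false, FactoryLoadingSketch.class.getClassLoader());
        // Throws ClassCastException if the class is not a factory.
        Class<? extends FeatureGeneratorFactory> type =
                raw.asSubclass(FeatureGeneratorFactory.class);
        return type.getDeclaredConstructor().newInstance();
    }

    /** Training side: remember which factory built the feature generator. */
    static void recordFactory(Properties manifest, FeatureGeneratorFactory factory) {
        manifest.setProperty(FACTORY_KEY, factory.getClass().getName());
    }

    /** Runtime and evaluation side: rebuild the factory named in the manifest. */
    static FeatureGeneratorFactory fromManifest(Properties manifest)
            throws ReflectiveOperationException {
        return load(manifest.getProperty(FACTORY_KEY));
    }

    public static void main(String[] args) throws Exception {
        Properties manifest = new Properties();
        recordFactory(manifest, new DefaultFactory());

        FeatureGeneratorFactory restored = fromManifest(manifest);
        System.out.println(restored.getClass().getName());
        System.out.println(restored.create().features(new String[] {"Bangkok"}, 0));
    }
}
```

The class is resolved without initialization and checked with asSubclass before any of its code runs. That narrows the security concern raised earlier in the thread but does not remove it, since the named factory's constructor is still arbitrary code from the classpath.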
