Hi Rupert, all,

I am building a Speech-to-Text engine (see [1] for those who need an introduction). The engine requires the DataFileProvider infrastructure for handling the configuration files of the acoustic and language models. The client provides the *acoustic model folder*, the *dictionary file* and the *language model file* packaged in a jar file with the following layout. For example, sphinx4-data-1.0-SNAPSHOT.jar, the default model jar, contains:

  /edu/cmu/sphinx/models/language/en-us.lm.dmp            *File* for the language model
  /edu/cmu/sphinx/models/acoustic/wsj/dict/cmudict.0.6d   *File* for the dictionary
  /edu/cmu/sphinx/models/acoustic/wsj/                    *Folder* for the acoustic model
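For reference, the default layout above could be resolved from the classpath roughly as follows. This is a minimal sketch, assuming sphinx4-data-1.0-SNAPSHOT.jar is on the classpath. The class name DefaultModelPaths and its methods are hypothetical; only the three resource paths come from the jar.

```java
import java.net.URL;

/**
 * Hypothetical helper: the default model locations inside
 * sphinx4-data-1.0-SNAPSHOT.jar, resolved through the class loader.
 */
public final class DefaultModelPaths {
    // Language model (a single file)
    public static final String LANGUAGE_MODEL =
            "/edu/cmu/sphinx/models/language/en-us.lm.dmp";
    // Pronunciation dictionary (a single file)
    public static final String DICTIONARY =
            "/edu/cmu/sphinx/models/acoustic/wsj/dict/cmudict.0.6d";
    // Acoustic model (a folder; the trailing slash marks it as such)
    public static final String ACOUSTIC_MODEL_DIR =
            "/edu/cmu/sphinx/models/acoustic/wsj/";

    private DefaultModelPaths() {}

    // ClassLoader.getResource expects names without a leading slash.
    public static String toClassLoaderName(String absolutePath) {
        return absolutePath.startsWith("/") ? absolutePath.substring(1) : absolutePath;
    }

    // Returns the URL of the bundled resource, or null if it cannot be found.
    // For the acoustic model folder this only works if the jar contains
    // explicit directory entries.
    public static URL resolve(String absolutePath) {
        return DefaultModelPaths.class.getClassLoader()
                .getResource(toClassLoaderName(absolutePath));
    }

    public static void main(String[] args) {
        for (String p : new String[] {LANGUAGE_MODEL, DICTIONARY, ACOUSTIC_MODEL_DIR}) {
            System.out.println(p + " -> " + resolve(p));
        }
    }
}
```

Running main with the data jar on the classpath prints a jar: URL for each entry. Without the jar it prints null.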
This jar can be added to a project using the following dependency:

  <dependency>
    <groupId>edu.cmu.sphinx</groupId>
    <artifactId>sphinx4-data</artifactId>
    <version>1.0-SNAPSHOT</version>
  </dependency>

However, clients may want to use their own model files, and Stanbol has the DataFileProvider infrastructure for handling such large binary configuration files. I went through the DataFileProvider documentation [2]. I also read the source code of some enhancement engines that use the DataFileProvider service, such as the Sentiment Word Classifier, to see how it is implemented. I am still not clear on how to use it, though. Could you provide some *insights* or *links* that describe it in more detail? It would save a lot of time.

Regards,
Suman Saurabh

[1] https://sites.google.com/site/gsoc2014stanbol/home/abstract
[2] http://stanbol.apache.org/docs/trunk/utils/datafileprovider