Since you may be looking at Drupal integration down the line, I would look at using Python and NLTK to develop a web service that Drupal could then call.

On 01/07/2014 11:13 PM, "Katie" <konrad.ka...@gmail.com> wrote:
> Hello,
>
> Has anyone here experience in the world of natural language programming
> (while applying information retrieval techniques)?
>
> I'm currently trying to develop a tool that will:
>
> 1. take a pdf and extract the text (paying no attention to images or
> formatting)
> 2. analyze the text via term weighting, inverse document frequency, and
> other natural language processing techniques
> 3. assemble a list of suggested terms and concepts that are weighted
> heavily in that document
>
> Step 1 is straightforward and I've had much success there. Step 2 is the
> problem child. I've played around with a few APIs (like AlchemyAPI) but
> they have character length limitations or other shortcomings that keep me
> looking.
>
> The background behind this project is that I work for a digital library
> with a large pre-existing collection of pdfs with rudimentary metadata. The
> aforementioned tool will be used to classify and group the pdfs according
> to the themes of the library. Our CMS is Drupal so depending on my level of
> ambition, this *might* develop into a module.
>
> Does this sound like a project that has been done/attempted before? Any
> suggested tools or reading materials?
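
To make steps 2 and 3 of the quoted message a bit more concrete, here is a rough sketch of term weighting with inverse document frequency in plain Python. It assumes the text has already been extracted from the PDFs (step 1) and uses only the standard library, so the regex tokenizer is a stand-in for what NLTK would do better (stop words, stemming, and so on). The function names `tokenize`, `tf_idf`, and `top_terms` are made up for illustration, and the tf and idf formulas are one common variant, not the only one.

```python
import math
import re
from collections import Counter


def tokenize(text):
    """Lowercase the text and return its runs of ASCII letters as tokens."""
    return re.findall(r"[a-z]+", text.lower())


def tf_idf(documents):
    """Return one {term: weight} dict per input document.

    tf  = occurrences of the term in the document / tokens in the document
    idf = log(number of documents / number of documents containing the term)
    """
    tokenized = [tokenize(doc) for doc in documents]
    n_docs = len(tokenized)

    # Document frequency: in how many documents does each term appear?
    doc_freq = Counter()
    for tokens in tokenized:
        doc_freq.update(set(tokens))

    weights = []
    for tokens in tokenized:
        counts = Counter(tokens)
        total = len(tokens) or 1  # avoid dividing by zero for empty text
        weights.append({
            term: (count / total) * math.log(n_docs / doc_freq[term])
            for term, count in counts.items()
        })
    return weights


def top_terms(doc_weights, k=5):
    """Return the k heaviest (term, weight) pairs, ties broken alphabetically."""
    return sorted(doc_weights.items(), key=lambda kv: (-kv[1], kv[0]))[:k]


if __name__ == "__main__":
    # Tiny stand-in corpus; in practice each string would be one PDF's text.
    docs = [
        "the cat sat on the mat",
        "the dog sat on the log",
        "the library holds maps",
    ]
    for doc, doc_weights in zip(docs, tf_idf(docs)):
        print(doc, "->", [term for term, _ in top_terms(doc_weights, 3)])
```

A term that occurs in every document (like "the" above) gets a weight of zero, which is the point of the idf factor: only terms that set a document apart from the rest of the collection rise to the top. Wrapping something like `tf_idf` in a small HTTP service would be one way to let Drupal call it.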