Well, the way I see it, if we simply do a normal word count, we won't get the
full picture, because the pronouns in the paragraph won't add to the count of
the word they refer to. Suppose we are talking about a person X, and after
the first sentence X is referred to as "He" or "She". A plain count then
sees X only once, so X ends up with secondary importance even though the
whole paragraph is about X.
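
To make the idea concrete, here is a rough sketch in Python. It is only an
illustration, not the proposed implementation: the function name
count_with_pronouns, the fixed pronoun list, and the "credit the pronoun to
the most recently mentioned entity" rule are all assumptions I made up for
this example. Real coreference would come from the parser's dependency
output, not from this heuristic.

```python
import re
from collections import Counter

# Hypothetical, deliberately tiny pronoun list for illustration only.
PRONOUNS = {"he", "she", "him", "her", "his", "hers"}


def count_with_pronouns(text, entities):
    """Count words, crediting each pronoun to the most recently seen entity.

    `entities` is a set of lowercase names supplied by the caller; a real
    system would detect them instead of being told.
    """
    counts = Counter()
    last_entity = None
    for token in re.findall(r"[A-Za-z']+", text):
        word = token.lower()
        if word in entities:
            # Remember the entity so later pronouns can be credited to it.
            last_entity = word
            counts[word] += 1
        elif word in PRONOUNS and last_entity is not None:
            # Naive resolution: the pronoun counts as a mention of the entity.
            counts[last_entity] += 1
        else:
            counts[word] += 1
    return counts


text = "Alice wrote the parser. She tested it. Then she released it."
plain = Counter(w.lower() for w in re.findall(r"[A-Za-z']+", text))
resolved = count_with_pronouns(text, {"alice"})
print(plain["alice"], resolved["alice"])  # prints "1 3"
```

With the plain count "alice" appears once and ties with most other words;
with the pronouns folded in she is mentioned three times and comes out as the
most frequent word, which is the effect I am after.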

And, strictly speaking, it is more of a computation theory project. It isn't
totally Mahout-ish, as Ted said. The reason I wanted to do it with Mahout is
that it could be generalized and implemented in the cloud. This would
improve searching and yield better results.

If possible, we could do an IRC session, if you guys are comfortable with that.


On Wed, Mar 23, 2011 at 10:09 PM, Ted Dunning <[email protected]> wrote:

> Another important question is whether this is something that is Mahout-ish.
>
> Mahout is a project that supports scalable data mining.  That currently
> includes a mature recommendation framework, less mature clustering and
> classification tools and a smattering of other tools.
>
> What you are proposing sounds a bit more like an application made up of
> different tools, possibly some from Mahout, and some from other sources.
>
> How do you see this?
>
>
> On Wed, Mar 23, 2011 at 9:37 AM, Ted Dunning <[email protected]> wrote:
>
>> Let's take this back to the mailing list so all can see.
>>
>> If you are familiar with the Stanford parser, then this seems like a
>> feasible project for you to accomplish.  I would expect that very similar
>> results could be achieved using simple word or phrase counts, possibly with
>> the addition of a chunker.  My guess is that the parser would add very
>> little.
>>
>> Stefan Henß did some interesting and very simple work, for instance, for
>> automated FAQ generation that avoided parsing:
>>
>>
>> http://mail-archives.apache.org/mod_mbox/mahout-user/201102.mbox/%[email protected]%3E
>>
>> On Wed, Mar 23, 2011 at 3:24 AM, Harsh <[email protected]> wrote:
>>
>>> I want to build on the Stanford parser (the one I am familiar with) and
>>> create a dependency graph for the sentences. The most frequently occurring
>>> words in any paragraph generally depict its theme. With the help of the
>>> dependency graph and the word counts, I want to guess the theme of the
>>> paragraph.
>>>
>>>
>
