I think you are making a very big (and very wrong) assumption here.

Ungrammatical chunks generally do not hurt topic identification, and they
can actually help it quite a bit.

It is important to avoid "everybody knows" facts in your development at this
point.  Even if everybody you talk to agrees that you don't even need to
look at the data on this topic, you should still be suspicious of strong
statements without data.
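
One cheap way to get that data is to build both feature sets and compare the
clusterings they produce. Here is a minimal sketch of the two variants. It
assumes whitespace tokenization, contiguous n-grams rather than arbitrary
subsets, and that the grammatical core is simply the last token of the chunk.
All of those are simplifying assumptions of mine, and the function names are
made up for illustration. The example chunk is pieced together from your
message.

```python
def ngrams(tokens, max_n=None):
    """Return every contiguous n-gram of `tokens` as a tuple."""
    n_max = len(tokens) if max_n is None else min(max_n, len(tokens))
    return [
        tuple(tokens[i:i + n])
        for n in range(1, n_max + 1)
        for i in range(len(tokens) - n + 1)
    ]


def with_head(grams, head):
    """Keep only the n-grams that contain the head token."""
    return [g for g in grams if head in g]


# Assumed example chunk; the head is taken to be the last token.
chunk = "world's first aircraft".split()

all_grams = ngrams(chunk)                    # includes ("world's", "first")
core_grams = with_head(all_grams, chunk[-1])  # every n-gram has "aircraft"
```

Feed each feature set to the same clustering run and look at the results
before deciding that the unrestricted n-grams are harmful.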

On Sat, Dec 19, 2009 at 8:16 AM, Felix Lange <[email protected]> wrote:

> In particular, I have a question about building n-grams (subsets) from
> noun-chunks. In the power-sets of noun-chunks, we don't want to have
> subsets like "world's first". That would surely spoil the clustering.
> Every subset should include the grammatical core of the chunk, in this
> example, "aircraft".
>



-- 
Ted Dunning, CTO
DeepDyve
