Hi Mark,
    That was just one example. The documents are news articles, hence the 
broad coverage rather than specific, on-topic documents. Since this is news 
from third-party sources, I have no control over what comes into the index 
(i.e. I cannot separate the machine-generated content from the manually 
edited/curated content). That said, I could perhaps whittle the content 
down by making sure that the documents processed are indeed worthy news 
articles and not random blog posts and non-relevant docs.

I do agree with your earlier comment that the query may be too broad. As I 
have already mentioned, these are news articles. If the articles (which are 
provided by various sources) come with boilerplate text, there is not much 
I can do other than process the documents to remove it. (For now we are not 
looking into removing the boilerplate text, as it might give us some 
insight into other information.)

The initial investigative exercise in using significant terms was to 
identify terms that could perhaps enhance the content returned. There is, 
of course, some manual editing of the significant terms to remove terms 
that are nonsensical in context before I get to the final list of terms to 
be added to my query.
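
A minimal sketch of that workflow in Python, for illustration only. The 
field name "body", the query text, the helper names and the sample response 
are all placeholders (not my actual mapping or data); the request bodies 
are plain dicts that would be sent to the _search endpoint by whatever 
client is in use.

```python
def build_sig_terms_request(query_text, field="body", size=20):
    """Search body: a match query plus a significant_terms aggregation
    over the same (assumed) text field. Hits are suppressed with size 0
    because only the aggregation buckets are of interest."""
    return {
        "query": {"match": {field: query_text}},
        "size": 0,
        "aggs": {
            "candidate_terms": {
                "significant_terms": {"field": field, "size": size}
            }
        },
    }


def curated_terms(response, rejected):
    """Keep bucket keys in the order returned, dropping the terms that
    were rejected by hand as nonsensical in context."""
    buckets = response["aggregations"]["candidate_terms"]["buckets"]
    return [b["key"] for b in buckets if b["key"] not in rejected]


def expanded_query(query_text, terms, field="body"):
    """Original query as a required clause, curated terms as optional
    'should' clauses that only influence scoring."""
    return {
        "query": {
            "bool": {
                "must": [{"match": {field: query_text}}],
                "should": [{"match": {field: t}} for t in terms],
            }
        }
    }


# Hand-written stand-in for a real response; the counts are made up.
sample_response = {
    "aggregations": {
        "candidate_terms": {
            "buckets": [
                {"key": "patented", "doc_count": 12},
                {"key": "resistance", "doc_count": 9},
                {"key": "marketintelligence.com's", "doc_count": 30},
            ]
        }
    }
}

# The manual editing step: terms judged to be noise for this query.
rejected = {"marketintelligence.com's"}

terms = curated_terms(sample_response, rejected)
final_request = expanded_query("artificial intelligence", terms)
```

Putting the curated terms in "should" rather than "must" means they can 
only boost documents that already match the original query, so a bad term 
that slips through the manual review cannot drop results.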

Is there other functionality (experimental or otherwise) within ES that can 
help me do this?



On Friday, 2 May 2014 18:17:41 UTC-5, Mark Harwood wrote:
>
> Pages like this suggest where the terms "patented", "resistance" and 
> "marketintelligence.com's" are being picked up: 
> http://www.marketintelligencecenter.com/artificialintelligence.aspx?p=4
> Much of it looks machine-generated.
>
> Too much repetition of stock phrases mixed in with diverse topics make it 
> hard to pick up any kind of signal if this is the content you are including 
> in your searches.
>
> Cheers,
> Mark
>
