Hi Mark, that was just one example. The documents were news articles, hence the broad coverage rather than a set of on-topic documents. Since this is news from third-party sources, I do not have control over what comes into the index (i.e. I cannot separate the machine-generated content from the manually edited/curated content). That said, I could perhaps whittle the content down by making sure that the documents processed are genuine news articles and not random blog posts and irrelevant documents.
I do agree with your earlier comment that the query may be too broad. As I have already mentioned, these are news articles. If these articles (which are provided by various sources) contain boilerplate text, there is not much I can do other than process each document to remove it. (For now we are not looking into removing the boilerplate text, as it might give us some insight into other information.)

The initial investigative exercise in using significant terms was to identify terms that could enhance the content returned. There is, of course, some manual editing of the significant terms to remove nonsensical terms (in context, of course) to arrive at the final list of terms to be added to my query. Is there other functionality (experimental or otherwise) within ES that can help me do this?

On Friday, 2 May 2014 18:17:41 UTC-5, Mark Harwood wrote:

> Pages like this suggest where the terms "patented", "resistance" and
> "marketintelligence.com's" are being picked up:
> http://www.marketintelligencecenter.com/artificialintelligence.aspx?p=4
> Much of it looks machine-generated.
>
> Too much repetition of stock phrases mixed in with diverse topics makes it
> hard to pick up any kind of signal if this is the content you are including
> in your searches.
>
> Cheers,
> Mark
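The workflow described in the reply (run a significant terms aggregation, manually prune nonsensical terms, then add the survivors to the query) might be sketched roughly as below. This is a minimal illustration, not the poster's actual code: the field name `content`, the aggregation name `candidate_terms`, the helper functions and the sample bucket values are all hypothetical. It only builds and inspects request/response bodies as plain dicts, so it runs without a cluster; sending them to Elasticsearch is left to whichever client is in use. Adding the picked terms as `should` clauses (so they boost relevance without being required) is an assumption about how "added to my query" was meant.

```python
import json


def significant_terms_request(query_text, field="content", size=50):
    """Search body asking for significant terms in the docs matching query_text.

    'content' is a hypothetical field name; use the real analyzed text field.
    """
    return {
        "size": 0,  # only the aggregation is wanted, not the hits themselves
        "query": {"match": {field: query_text}},
        "aggs": {
            "candidate_terms": {
                "significant_terms": {"field": field, "size": size}
            }
        },
    }


def pick_terms(response, rejected, min_score=0.0):
    """Keep bucket keys that were not manually rejected and score above min_score."""
    buckets = response["aggregations"]["candidate_terms"]["buckets"]
    return [
        b["key"]
        for b in buckets
        if b["key"] not in rejected and b["score"] > min_score
    ]


def expanded_query(query_text, extra_terms, field="content"):
    """Original query must still match; the picked terms only boost relevance.

    Significant terms are indexed (already analyzed) tokens, so a term query
    on the same field matches them directly.
    """
    return {
        "query": {
            "bool": {
                "must": [{"match": {field: query_text}}],
                "should": [{"term": {field: t}} for t in extra_terms],
            }
        }
    }


if __name__ == "__main__":
    # Made-up response in the shape of a significant_terms result, for illustration only.
    sample_response = {
        "aggregations": {
            "candidate_terms": {
                "buckets": [
                    {"key": "neural", "doc_count": 40, "score": 2.1, "bg_count": 60},
                    {"key": "patented", "doc_count": 35, "score": 1.8, "bg_count": 50},
                    {"key": "robotics", "doc_count": 20, "score": 0.9, "bg_count": 70},
                ]
            }
        }
    }
    # "patented" stands in for a term rejected during the manual review step.
    terms = pick_terms(sample_response, rejected={"patented"})
    print(terms)
    print(json.dumps(expanded_query("artificial intelligence", terms), indent=2))
```

The manual review step is represented here only by the `rejected` set; a `min_score` threshold is one possible way to cut down how many candidates need reviewing by hand.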