Hi Ramdev,
>> Is there other functionality (experimental or otherwise) within ES that can help me do this?

I'd recommend splitting HTML files that clearly reference multiple diverse news stories into multiple ES documents, based on title headings or whatever indicates the start and end of each news item.

For boilerplate removal I have previously used this analyzer on an earlier incarnation of the significant_terms algo: https://issues.apache.org/jira/browse/LUCENE-725

Cheers
Mark
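
P.S. A rough sketch of the splitting idea, in case it helps. It assumes each news item starts at an h1/h2/h3 heading, which may well not hold for your pages, and the names StorySplitter and split_stories are made up for illustration. It only uses Python's standard html.parser and does not talk to ES; each returned dict would then be indexed as its own document.

```python
from html.parser import HTMLParser

# Assumption: a heading tag marks the start of a new news item.
HEADING_TAGS = {"h1", "h2", "h3"}


class StorySplitter(HTMLParser):
    """Hypothetical helper: collects one {"title", "body"} dict per heading."""

    def __init__(self):
        super().__init__()
        self.stories = []
        self._current = None
        self._in_heading = False

    def handle_starttag(self, tag, attrs):
        if tag in HEADING_TAGS:
            self._flush()  # a new heading closes the previous story
            self._current = {"title": "", "body": ""}
            self._in_heading = True

    def handle_endtag(self, tag):
        if tag in HEADING_TAGS:
            self._in_heading = False

    def handle_data(self, data):
        if self._current is None:
            return  # text before the first heading (nav, banners) is dropped
        key = "title" if self._in_heading else "body"
        self._current[key] += data

    def _flush(self):
        if self._current is not None:
            # Collapse runs of whitespace left over from the markup.
            cleaned = {k: " ".join(v.split()) for k, v in self._current.items()}
            self.stories.append(cleaned)
            self._current = None

    def close(self):
        super().close()
        self._flush()  # emit the last story


def split_stories(html):
    parser = StorySplitter()
    parser.feed(html)
    parser.close()
    return parser.stories


page = (
    "<html><body><div>nav</div>"
    "<h2>Story A</h2><p>First  body.</p>"
    "<h2>Story B</h2><p>Second body.</p>"
    "</body></html>"
)
stories = split_stories(page)
```

Dropping everything before the first heading is a crude form of boilerplate removal; real pages will likely need a site-specific rule for where each item ends.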