I have a question similar to Zaid's first one: keyword extraction along TF-IDF
lines. Specifically, I have a corpus of ~10K articles and am looking to
get a ranking of all the tokenized terms in each article based on their
frequency in the article and the terms' relative frequency across the
corpus. Thanks!
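For reference, here is a minimal in-memory sketch of the ranking being asked about, in plain Python rather than Elasticsearch. It assumes each article is already tokenized into a list of lowercase strings and uses the common tf x log(N / df) weighting, which is only one of several TF-IDF variants; `rank_terms` is a hypothetical helper name.

```python
import math
from collections import Counter


def rank_terms(docs):
    """Rank each document's terms by a simple TF-IDF score.

    docs: list of token lists (assumed already tokenized and lowercased).
    Returns one list of (term, score) pairs per document, highest score first.
    """
    n_docs = len(docs)
    # Document frequency: number of documents containing each term at least once.
    df = Counter()
    for tokens in docs:
        df.update(set(tokens))

    rankings = []
    for tokens in docs:
        tf = Counter(tokens)
        # Raw in-document count times inverse document frequency log(N / df).
        scores = {term: count * math.log(n_docs / df[term]) for term, count in tf.items()}
        # Sort by descending score; ties broken alphabetically for stable output.
        rankings.append(sorted(scores.items(), key=lambda kv: (-kv[1], kv[0])))
    return rankings


# Hypothetical toy corpus of three tokenized articles.
docs = [
    ["elasticsearch", "index", "index", "shard"],
    ["index", "query", "query"],
    ["shard", "replica"],
]
for ranking in rank_terms(docs):
    print(ranking[0][0])
# prints elasticsearch, query, replica (one per line)
```

Under this variant a term that occurs in every article gets a weight of zero. Elasticsearch's own relevance scoring uses smoothed formulas, so its numbers would not match these.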

On Sunday, May 3, 2015 at 3:49:07 AM UTC-4, Zaid Amir wrote:
>
> Hi,
>
> I am wondering if it is possible at all to get the top ten most frequent 
> words in an Elasticsearch field across an entire index or alias.
>
> Here is what I'm trying to do:
>
> I am indexing text documents extracted from various document types (Word, 
> PowerPoint, PDF, etc.). These are analyzed and stored in a field called 
> doc_content. I would like to know if there is a way to find the most 
> frequent word(s) in a particular index that are stored in the doc_content 
> field.
>
> To make it clearer, let's assume I am indexing invoices from Amazon and 
> eBay, for example. Now let's assume I have 100 invoices from Amazon and 20 
> invoices from eBay. Let's also assume that the word "amazon" occurs twice in 
> each Amazon invoice and the word "ebay" occurs 3 times in each eBay 
> invoice.
>
> Now, is there a way to get an aggregation of some sort that tells me that 
> the word "amazon" appears in my index 200 times (100 invoices x 2 
> occurrences/invoice) and the word "ebay" appears 60 times (20 invoices x 3 
> occurrences/invoice)?
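In other words, the desired count is the sum of per-document term frequencies over the whole index. A minimal plain-Python sketch of that arithmetic (not an Elasticsearch query; it assumes pre-tokenized documents, and `top_terms` is a hypothetical helper name):

```python
from collections import Counter


def top_terms(docs, n=10):
    """Return the n most frequent tokens, summed over all documents."""
    totals = Counter()
    for tokens in docs:
        # Count every occurrence, not just whether the term appears in the document.
        totals.update(tokens)
    return totals.most_common(n)


# Hypothetical corpus mirroring the example: 100 Amazon invoices containing
# "amazon" twice each, and 20 eBay invoices containing "ebay" three times each.
invoices = [["amazon", "amazon"]] * 100 + [["ebay", "ebay", "ebay"]] * 20
print(top_terms(invoices))
# [('amazon', 200), ('ebay', 60)]
```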
>
>
> My other question: if the former is possible, is there a way to 
> determine the most frequent word that comes after a certain word?
>
> For example, let's assume I have 100 documents. 60 of these documents 
> contain the phrase "Old Cat" and 40 contain the phrase "Old Dog", and for 
> the sake of argument let's assume that these words appear only once in each 
> document.
>
> Now, suppose we can get the frequency of the word "old", which in our case 
> should be 100. Can we then relate it to the word that comes right after 
> it, to get something like this:
>
>
>           __________ Cat (60)
>          |
> Old (100)|
>          |__________ Dog (40)
>
>
>
>
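The second question amounts to counting bigrams (adjacent token pairs). Below is a minimal plain-Python sketch of that counting over pre-tokenized documents, assuming tokens are already lowercased as Elasticsearch's standard analyzer would produce. `next_word_counts` is a hypothetical helper, not an Elasticsearch API; inside Elasticsearch, word pairs like these are usually indexed with a shingle token filter.

```python
from collections import Counter


def next_word_counts(docs, word):
    """Count how often `word` occurs and which tokens immediately follow it."""
    total = 0
    followers = Counter()
    for tokens in docs:
        for i, tok in enumerate(tokens):
            if tok == word:
                total += 1
                # Only record a follower when `word` is not the document's last token.
                if i + 1 < len(tokens):
                    followers[tokens[i + 1]] += 1
    return total, followers


# Hypothetical corpus mirroring the example: 60 "old cat" and 40 "old dog" documents.
docs = [["old", "cat"]] * 60 + [["old", "dog"]] * 40
total, followers = next_word_counts(docs, "old")
print(total, followers.most_common())
# 100 [('cat', 60), ('dog', 40)]
```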

-- 
You received this message because you are subscribed to the Google Groups 
"elasticsearch" group.
To view this discussion on the web visit 
https://groups.google.com/d/msgid/elasticsearch/aebf97f7-e20f-4d8e-a513-f79df4256b71%40googlegroups.com.