Re: Decay Function question

2014-11-10 Thread Britta Weber
Hi Mario,

I added an image which hopefully explains a little better how decay
functions work here:
https://github.com/elasticsearch/elasticsearch/pull/8420/files

> - origin is the start (x-value) of the slope, in this case the date 9/17/2013

yes

> - for all points up to offset, the slope is flat, so there's no negative 
> scoring for the 5 days after 9/17/2013

The decay function never returns a negative value. It will always be
between 0 and 1. For each value within +- offset from the defined
origin the decay function will just return 1.

> - the "end" of the slope is scale

No, the decay function keeps decreasing beyond that point. The scale
parameter just controls how quickly the function approaches 0.

> - at the data point 'scale', the slope will have a y-value (on the
> graph) of decay (0.5)

yes.

> My test show that they actually get a 'zero' multiplier for their score, so 
> basically their scores all end up being zero. I was under the impression that 
> they would get a score multiplier of 0.5.

No, the score keeps decreasing toward 0. If you need documents outside
the "scale" range to get a multiplier of 0.5, you need to define a
separate function which then adjusts the score for these documents as
needed.
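
To make the shape concrete, here is a minimal Python sketch of the gauss variant, following the formula as I read it in the function_score reference documentation. The function name `gauss_decay` is only for illustration, distances are in days, and the sketch assumes that `scale` is measured from the edge of the offset plateau rather than from the origin itself.

```python
import math

def gauss_decay(distance, scale, offset=0.0, decay=0.5):
    """Multiplier for a value `distance` away from origin (same unit as scale)."""
    # Inside the +-offset plateau the effective distance is 0, so the result is 1.
    effective = max(0.0, abs(distance) - offset)
    # sigma^2 is chosen so that the result equals `decay` when effective == scale.
    sigma_sq = -(scale ** 2) / (2.0 * math.log(decay))
    return math.exp(-(effective ** 2) / (2.0 * sigma_sq))

# Values from the question: scale = 10d, offset = 5d, decay = 0.5
for days in (0, 5, 15, 25, 45):
    print(days, gauss_decay(days, scale=10, offset=5))
```

Under these assumptions the multiplier is 1 up to 5 days, 0.5 at 15 days, 0.0625 at 25 days, and it keeps shrinking toward 0 after that without ever becoming negative.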

On Thu, Nov 6, 2014 at 8:07 PM, Marlo Epres  wrote:
> I've begun using the decay function in order to promote more recent results
> in our index. In particular I'm using what's documented here:
>
> http://www.elasticsearch.org/guide/en/elasticsearch/reference/current/query-dsl-function-score-query.html
>
> Here's the date example they use (let's assume gaussian slope):
>
> "DECAY_FUNCTION": {
> "FIELD_NAME": {
>   "origin": "2013-09-17",
>   "scale": "10d",
>   "offset": "5d",
>   "decay" : 0.5
> }
> }
>
> So in trying to visualize this decay, is it safe to make the following
> assumptions in terms of associating these input values to a graph:
>
> - origin is the start (x-value) of the slope, in this case the date
> 9/17/2013
> - for all points up to offset, the slope is flat, so there's no negative
> scoring for the 5 days after 9/17/2013
> - the "end" of the slope is scale
> - at the data point 'scale', the slope will have a y-value (on the graph) of
> decay (0.5)
>
> The last point I am not so sure about. Furthermore, I'm unclear as to what
> happens for articles outside of scale, so past 10 days. My test show that
> they actually get a 'zero' multiplier for their score, so basically their
> scores all end up being zero. I was under the impression that they would get
> a score multiplier of 0.5. Any help would be appreciated.
>
> --
> You received this message because you are subscribed to the Google Groups
> "elasticsearch" group.
> To unsubscribe from this group and stop receiving emails from it, send an
> email to elasticsearch+unsubscr...@googlegroups.com.
> To view this discussion on the web visit
> https://groups.google.com/d/msgid/elasticsearch/32d518ad-2b63-4f13-a952-0f408722bd79%40googlegroups.com.
> For more options, visit https://groups.google.com/d/optout.



Re: bulk http api broken?

2014-04-03 Thread Britta Weber
Hi Isaac,

I believe you ran into this bug:
https://github.com/elasticsearch/elasticsearch/pull/5623
It will be fixed in upcoming releases.
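
If you want to check your logs for affected indices, the comparison from your report (a "creating index" line without a matching "update_mapping" line) can be scripted. This is only a rough sketch: the helper name `indices_missing_mapping` is made up, the sample lines are abbreviated with "...", and it assumes each log entry is on a single line with the index name intact.

```python
import re

# Matches "[index_name] creating index" or "[index_name] update_mapping".
EVENT = re.compile(r"\[([^\[\]]+)\]\s+(creating index|update_mapping)")

def indices_missing_mapping(log_lines):
    """Return indices that were created but never got an update_mapping line."""
    created, updated = set(), set()
    for line in log_lines:
        match = EVENT.search(line)
        if not match:
            continue
        name, event = match.groups()
        (created if event == "creating index" else updated).add(name)
    return sorted(created - updated)

# Abbreviated sample lines; the "..." stands for the timestamp and node prefix.
sample = [
    "... [mypartitioned_2013.11.02] creating index, cause [auto(bulk api)], shards [5]/[1], mappings []",
    "... [mypartitioned_2013.11.01] creating index, cause [auto(bulk api)], shards [5]/[1], mappings []",
    "... [mypartitioned_2013.11.02] update_mapping [logs] (dynamic)",
]
print(indices_missing_mapping(sample))
```

For the sample above this reports only mypartitioned_2013.11.01.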

Britta

On Thu, Apr 3, 2014 at 3:54 PM, Isaac Dooley  wrote:
> I think I've found an intermittent bug for bulk updates over HTTP in
> Elasticsearch. It seems to happen on versions 0.90.11, 0.90.12, 1.0.1 when
> running a single server instance (not a cluster) on linux.
>
> I do a bulk PUT with documents going to multiple indices (mostly new ones).
> In my test I then wait a few seconds and then query the mapping, and get
> back empty results for one of the indices. I notice in the logs that I see
> "creating index" messages for all the relevant indices, but the
> "update_mapping" message is missing for the index that returns an empty
> mapping. For example here are those lines from my log:
>
>
>
> 2014-03-31 18:09:27,695 INFO [elasticsearch[6fb4d619-cfa5-45ea-b1d2-e3f94c1ad23f][clusterService#updateTask][T#1]] org.elasticsearch.cluster.metadata [6fb4d619-cfa5-45ea-b1d2-e3f94c1ad23f] [mypartitioned_2013.11.07] creating index, cause [auto(bulk api)], shards [5]/[1], mappings []
> 2014-03-31 18:09:27,940 INFO [elasticsearch[6fb4d619-cfa5-45ea-b1d2-e3f94c1ad23f][clusterService#updateTask][T#1]] org.elasticsearch.cluster.metadata [6fb4d619-cfa5-45ea-b1d2-e3f94c1ad23f] [mypartitioned_2013.11.06] creating index, cause [auto(bulk api)], shards [5]/[1], mappings []
> 2014-03-31 18:09:27,984 INFO [elasticsearch[6fb4d619-cfa5-45ea-b1d2-e3f94c1ad23f][clusterService#updateTask][T#1]] org.elasticsearch.cluster.metadata [6fb4d619-cfa5-45ea-b1d2-e3f94c1ad23f] [mypartitioned_2013.11.05] creating index, cause [auto(bulk api)], shards [5]/[1], mappings []
> 2014-03-31 18:09:28,026 INFO [elasticsearch[6fb4d619-cfa5-45ea-b1d2-e3f94c1ad23f][clusterService#updateTask][T#1]] org.elasticsearch.cluster.metadata [6fb4d619-cfa5-45ea-b1d2-e3f94c1ad23f] [mypartitioned_2013.11.04] creating index, cause [auto(bulk api)], shards [5]/[1], mappings []
> 2014-03-31 18:09:28,070 INFO [elasticsearch[6fb4d619-cfa5-45ea-b1d2-e3f94c1ad23f][clusterService#updateTask][T#1]] org.elasticsearch.cluster.metadata [6fb4d619-cfa5-45ea-b1d2-e3f94c1ad23f] [mypartitioned_2013.11.03] creating index, cause [auto(bulk api)], shards [5]/[1], mappings []
> 2014-03-31 18:09:28,112 INFO [elasticsearch[6fb4d619-cfa5-45ea-b1d2-e3f94c1ad23f][clusterService#updateTask][T#1]] org.elasticsearch.cluster.metadata [6fb4d619-cfa5-45ea-b1d2-e3f94c1ad23f] [mypartitioned_2013.11.02] creating index, cause [auto(bulk api)], shards [5]/[1], mappings []
> 2014-03-31 18:09:28,153 INFO [elasticsearch[6fb4d619-cfa5-45ea-b1d2-e3f94c1ad23f][clusterService#updateTask][T#1]] org.elasticsearch.cluster.metadata [6fb4d619-cfa5-45ea-b1d2-e3f94c1ad23f] [mypartitioned_2013.11.01] creating index, cause [auto(bulk api)], shards [5]/[1], mappings []
> 2014-03-31 18:09:28,851 INFO [elasticsearch[6fb4d619-cfa5-45ea-b1d2-e3f94c1ad23f][clusterService#updateTask][T#1]] org.elasticsearch.cluster.metadata [6fb4d619-cfa5-45ea-b1d2-e3f94c1ad23f] [mypartitioned_2013.11.07] update_mapping [logs] (dynamic)
> 2014-03-31 18:09:29,291 INFO [elasticsearch[6fb4d619-cfa5-45ea-b1d2-e3f94c1ad23f][clusterService#updateTask][T#1]] org.elasticsearch.cluster.metadata [6fb4d619-cfa5-45ea-b1d2-e3f94c1ad23f] [mypartitioned_2013.11.06] update_mapping [logs] (dynamic)
> 2014-03-31 18:09:29,297 INFO [elasticsearch[6fb4d619-cfa5-45ea-b1d2-e3f94c1ad23f][clusterService#updateTask][T#1]] org.elasticsearch.cluster.metadata [6fb4d619-cfa5-45ea-b1d2-e3f94c1ad23f] [mypartitioned_2013.11.04] update_mapping [logs] (dynamic)
> 2014-03-31 18:09:29,327 INFO [elasticsearch[6fb4d619-cfa5-45ea-b1d2-e3f94c1ad23f][clusterService#updateTask][T#1]] org.elasticsearch.cluster.metadata [6fb4d619-cfa5-45ea-b1d2-e3f94c1ad23f] [mypartitioned_2013.11.02] update_mapping [logs] (dynamic)
> 2014-03-31 18:09:29,700 INFO [elasticsearch[6fb4d619-cfa5-45ea-b1d2-e3f94c1ad23f][clusterService#updateTask][T#1]] org.elasticsearch.cluster.metadata [6fb4d619-cfa5-45ea-b1d2-e3f94c1ad23f] [mypartitioned_2013.11.05] update_mapping [logs] (dynamic)
> 2014-03-31 18:09:29,778 INFO [elasticsearch[6fb4d619-cfa5-45ea-b1d2-e3f94c1ad23f][clusterService#updateTask][T#1]] org.elasticsearch.cluster.metadata [6fb4d619-cfa5-45ea-b1d2-e3f94c1ad23f] [mypartitioned_2013.11.03] update_mapping [logs] (dynamic)
>
>  The index with broken mapping does exist and contains data, but the
> _mapping HTTP endpoint gives an empty result. Is this a known issue?

Re: Mixing bool and multi match/function score query

2014-04-03 Thread Britta Weber
Hi,

I am not sure I understood correctly - the query in the function_score
can be any query, including a bool query. If you want documents that do
not match the above query, but do match some other criterion (for
example a range query), to be scored with the function_score functions,
you can just replace the query_string with a
"bool": {"should": [{"query_string": {...}}, {"range": {...}}]}.
Was that your question?
If not, could you elaborate a little more on what you want to achieve?
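
In case it helps, here is a small Python sketch that builds such a request body as a plain dict. The helper name and the numeric field name `stockLevel` are made up for illustration (your message does not name the field), and the scoring functions themselves are left out, as in your snippet.

```python
import json

def function_score_with_fallback(query_text, fields, numeric_field):
    """function_score whose inner query is a bool of query_string OR range > 0."""
    return {
        "function_score": {
            "query": {
                "bool": {
                    "should": [
                        {"query_string": {"query": query_text, "fields": fields}},
                        # Also matches documents that miss the text query
                        # but have a positive value in the numeric field.
                        {"range": {numeric_field: {"gt": 0}}},
                    ]
                }
            }
        }
    }

body = function_score_with_fallback(
    "some query",
    ["id", "name", "strippedDescription", "colourSearch", "sizeSearch"],
    "stockLevel",  # hypothetical field name
)
print(json.dumps(body, indent=2))
```

Since both clauses sit under "should", a document matching either one is returned and then passed through whatever functions you add to the function_score.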



On Wed, Apr 2, 2014 at 5:01 PM, Garry Welding  wrote:
> I'm currently doing a query that's a mix of multi match and function score.
> The important bit of the JSON looks like this:
>
> "function_score":{
> "query":{
> "query_string":{
> "query":"some query",
>
> "fields":["id","name","strippedDescription","colourSearch","sizeSearch"]
> }
> }
> }
>
> However, I also want to include results that don't necessarily match the
> query but have a particular numeric value that's greater than 0. I think a
> bool query would do this, but I don't know how to use a bool query with a
> function score query.
>
> I understand that a multi match query is just shorthand for a bool query,
> and I could expand out the multi match query into its bool counter-part,
> however, I then don't know how I would do function score within that.
>
> Any ideas? I'm on version 1.1.0 by the way.



Re: Corrupted ElasticSearch index ?

2014-03-31 Thread Britta Weber
Hi,

I am a little late but maybe it brings some closure...I believe you
ran into this: https://github.com/elasticsearch/elasticsearch/pull/5623
The symptoms for this bug are exactly what you describe.

Britta

On Mon, Mar 17, 2014 at 10:07 PM, Mac Jouz  wrote:
>
> In the end I fixed the broken index dynamically, but taking your answer
> into account I'm going to add mapping files to avoid future problems
>
> Thanks Karol
>
> Regards
>
> José
>
> On Monday, 17 March 2014 19:25:31 UTC+1, bizzorama wrote:
>>
>> Hi, we tried both ways but:
>> First worked but was temporary and worked as index quickfix (after
>> powerdown it was lost again), of course we used the rest interfaces to fix
>> mappings that were already broken (we could not pump all data again so we
>> had to fix it somehow).
>>
>> We applied the mapping file as default (for all indexes) to avoid the
>> problem in future, we knew that all indexes can be started with same
>> mapping.
>>
>> 17-03-2014 17:56, "Mac Jouz" wrote:
>>>
>>> Hi,
>>>
>>> Thanks Karol, changing ES version does not change the problem indeed.
>>>
>>> 2 complementary questions if I may:
>>> - You wrote that you copied the mapping file on ES location, did you try
>>> a way to do so dynamically with a REST call ?
>>> - Otherwise did you apply the modification for the specific "corrupted"
>>> index or copy the mapping file in default config ES location (that is to say
>>> that it was valid for all index ?)
>>>
>>> Regards
>>>
>>> José
>>>
>>>
>>>
>>> On Sunday, 16 March 2014 16:37:19 UTC+1, bizzorama wrote:
>>>>
>>>> Hi,
>>>>
>>>> it turned out that it was not a problem of the ES version (we tested on both
>>>> 0.90.10 and 0.90.9) but just an ES bug ...
>>>> after restarting the PC or even just the service, indices got broken ... we
>>>> found out that this was a case of missing mappings.
>>>> We observed that broken indices had their mappings corrupted (only some
>>>> default fields were present).
>>>> You can check this by calling: http://es_address:9200/indexName/_mapping
>>>>
>>>> Our mappings were dynamic (not set manually - just figured out by ES
>>>> when the records were incoming).
>>>>
>>>> The solution was to add a static mapping file like the one described
>>>> here:
>>>>
>>>> http://www.elasticsearch.org/guide/en/elasticsearch/reference/current/mapping-conf-mappings.html
>>>> (we added the default one).
>>>>
>>>> I just copied mappings from a healthy index, made some changes, turned it
>>>> into a mapping file and copied it to the ES server.
>>>>
>>>> Now everything works just fine.
>>>>
>>>> Regards,
>>>> Karol
>>>>
>>>>
>>>> On Sunday, 16 March 2014 14:54:00 UTC+1, Mac Jouz wrote:
>
>
> Hi Bizzorama,
>
> I had a similar problem with the same configuration than you gave.
> ES ran since the 11th of February and was fed every day at 6:00 AM by 2
> LS.
> Everything worked well (kibana reports were correct and no data loss)
> until
> I restarted yesterday ES :-(
> Among 30 index (1 per day), 4 were unusable and data within kibana
> report
> for the related period were unavailable (same
> org.elasticsearch.search.facet.FacetPhaseExecutionException: Facet[0]: 
> (key)
> field [@timestamp] not found)
>
> Can you confirm that when you downgraded ES to 0.90.9 you got your data
> back (i.e. you were able to show your data in Kibana reports)?
>
> I will try to downgrade ES version as you suggested and will let you
> know
> more
>
> Thanks for your answer
>
>
>
> Sorry for the delay.
>
> Looks like you were right, after downgrading ES to 0.90.9 i couldn't
> reproduce the issue in such manner.
>
> Unfortunately, I found some other problems, and one looks like a
> blocker 
>
> After a powerdown of the whole ES cluster, ES just started replying
> 'no mapping for ...' to each request.
>
> On Thursday, 20 February 2014 16:42:20 UTC+1, Binh Ly wrote:
>>
>> Your error logs seem to indicate some kind of version mismatch. Is it
>> possible for you to test LS 1.3.2 against ES 0.90.9 and take a sample of 
>> raw
>> logs from those 3 days and test them through to see if those 3 days work 
>> in
>> Kibana? The reason I ask is because LS 1.3.2 (specifically the 
>> elasticsearch
>> output) was built using the binaries from ES 0.90.9.
>>
>> Thanks.
>
>
> On Tuesday, 11 February 2014 13:18:01 UTC+1, bizzorama wrote:
>>
>> Hi,
>>
>> I've noticed a very disturbing ElasticSearch behaviour ...
>> my environment is:
>>
>> 1 logstash (1.3.2) (+ redis to store some data) + 1 elasticsearch
>> (0.90.10) + kibana
>>
>> which process about 7 000 000 records per day.
>> Everything worked fine on our test environment, until we ran some
>> tests for a longer period (about 15 days).
>>
>> After that time, kibana was un

Re: Elastic Search Tokenizer (for tf-idf)

2014-02-07 Thread Britta Weber
Hi,

you can also get raw term statistics stored in the index such as doc
frequency, term frequency etc within a script (>=0.90.10):

http://www.elasticsearch.org/guide/en/elasticsearch/reference/current/modules-advanced-scripting.html

You can use this information to calculate your own score. If you want
to use a native script, there are also examples for (very simple)
implementations of common scoring functions (tf-idf, cosine and
language model) here:

https://github.com/imotov/elasticsearch-native-script-example

If you also need the field lengths for scoring, you can access them by
defining a field of type token_count as described here:

http://www.elasticsearch.org/guide/en/elasticsearch/reference/current/mapping-core-types.html
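
Once you have the raw statistics, computing your own tf-idf weight is only a few lines. Here is a minimal Python sketch; it uses the tf and idf shapes of Lucene's classic default similarity as I remember them (square root of the frequency, and 1 + ln(numDocs / (docFreq + 1))), so treat those two formulas as an assumption and swap in your own if you need a different variant. The function names and the example numbers are made up.

```python
import math

def classic_tf(term_freq):
    # Dampened term frequency: sqrt of the raw count within the document.
    return math.sqrt(term_freq)

def classic_idf(doc_freq, num_docs):
    # Rare terms (low doc_freq) get a higher weight.
    return 1.0 + math.log(num_docs / (doc_freq + 1.0))

def tf_idf(term_freq, doc_freq, num_docs):
    return classic_tf(term_freq) * classic_idf(doc_freq, num_docs)

# Made-up statistics: the term occurs 4 times in the document and
# appears in 9 of 1000 documents overall.
print(tf_idf(term_freq=4, doc_freq=9, num_docs=1000))
```

This is only the plain tf times idf product, without the norms and query normalization that a full similarity implementation applies.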

Cheers,
Britta


On Thu, Feb 6, 2014 at 4:10 PM, Zachary Tong  wrote:
> Depending on what you need to accomplish, the new Term Vector API in 1.0
> will likely provide what you need.  When you enable both `field_statistics`
> and `term_statistics`, it will show you TF + DF both in your dictionary and
> in the document:
>
> http://www.elasticsearch.org/guide/en/elasticsearch/reference/master/docs-termvectors.html
>
> -Zach
>
>
>
> On Thursday, February 6, 2014 7:09:52 AM UTC-5, Lital wrote:
>>
>> Hi,
>>
>> We are already using elastic search so I though we might get the TF-IDF
>> algorithm for "free" from it. Another option is to implement it ourselves.
>> Is it easy to use the Lucene embedded in the elastic search for this ?
>>
>> Thanks,
>> Lital
>>
>>
>> On Thursday, February 6, 2014 12:54:09 AM UTC+2, Itamar Syn-Hershko wrote:
>>>
>>> Lital, why do you need Elasticsearch for this? it is going to be way
>>> easier for you to use Lucene directly to do this?
>>>
>>> --
>>>
>>> Itamar Syn-Hershko
>>> http://code972.com | @synhershko
>>> Freelance Developer & Consultant
>>> Author of RavenDB in Action
>>>
>>>
>>> On Wed, Feb 5, 2014 at 4:02 PM, Lital  wrote:

>>>> Hi,
>>>>
>>>> We would like to use elastic search in order to generate idf score for
>>>> each token (for algorithm tf-idf).
>>>>
>>>> What are the types of built in tokenizers in the elastic search ?
>>>> Should we specify which tokenizer to use in the indexing level (when
>>>> inserting the data) or when performing search on it ?
>>>>
>>>> Is it also possible to make elastic search use a different tokenizer
>>>> (that was implemented by me) ?
>>>>
>>>> Thanks,
>>>> Lital
>>>>
>>>
>>>
>>



Re: calculation of whymatch in elasticsearch

2014-02-07 Thread Britta Weber
Hi Nik,

would a script field also work for that? Something like:

{
  "script_fields": {
    "field_match1": {
      "script": "if(_index['field1']['searchterm'].tf() > 0){return 1;} else {return 0;}"
    },
    "field_match2": {
      "script": "if(_index['field2']['searchterm'].tf() > 0){return 1;} else {return 0;}"
    }
  }
}

Or did I get that wrong?
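
If there are many fields, the script_fields section can be generated instead of written by hand. A rough Python sketch, assuming field names and the term contain no single quotes (the helper name is made up, and the script text is the same as in my example above):

```python
import json

def match_flag_script_fields(fields, term):
    """One script field per input field, returning 1 if the term occurs in it."""
    script_fields = {}
    for position, field in enumerate(fields, start=1):
        script = (
            "if(_index['%s']['%s'].tf() > 0){return 1;} else {return 0;}"
            % (field, term)
        )
        script_fields["field_match%d" % position] = {"script": script}
    return {"script_fields": script_fields}

print(json.dumps(match_flag_script_fields(["field1", "field2"], "searchterm"), indent=2))
```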

On Fri, Feb 7, 2014 at 2:36 PM, Nikolas Everett  wrote:
> I was thinking of putting together a simple "highlighter" that just returns
> if a field contains a match or not.  This sounds like a nice logic extension
> to that.  It probably wouldn't actually be a "highlighter" but I imagine
> it'd run during the highlight phase and function similarly.  It'd need
> something like the highlight_query, for example, to function properly.
> Anyway, I believe the short answer is, if you are looking for something
> specific, file an issue on github.
>
> Nik
>
>
> On Fri, Feb 7, 2014 at 8:28 AM, Britta Weber
>  wrote:
>>
>> If you know the fields that are contained in the document, you could
>> use a function_score query. For counting the number of fields a word
>> is contained in, you can use function_score with a boost_factor like
>> this:
>>
>> ```
>> {
>>    "query": {
>>       "function_score": {
>>          "functions": [
>>             {
>>                "filter": {
>>                   "term": {
>>                      "field1": "searchterm"
>>                   }
>>                },
>>                "boost_factor": 1
>>             },
>>             {
>>                "filter": {
>>                   "term": {
>>                      "field2": "searchterm"
>>                   }
>>                },
>>                "boost_factor": 1
>>             }
>>          ],
>>          "boost_mode": "replace",
>>          "score_mode": "sum"
>>       }
>>    }
>> }
>> ```
>> (Add one more filter entry like these for each further field.)
>> This will add 1 to the score for each field (field1, field2,... ) that
>> has the term "searchterm" and the final score for each document will
>> be the number of fields in the document containing the term. Is this
>> what you want?
>>
>> For getting the term frequencies, you can check out text scoring in scripts
>> here:
>>
>>
>> http://www.elasticsearch.org/guide/en/elasticsearch/reference/current/modules-advanced-scripting.html
>>
>> See also this thread:
>>
>> https://groups.google.com/forum/#!msg/elasticsearch/9fOEN1uArIY/7bVZP22zYg8J
>>
>> Cheers,
>> Britta
>>
>> On Fri, Feb 7, 2014 at 7:35 AM,   wrote:
>> > Hi,
>> >
>> > How do we implement whymatch concept in elasticsearch by finding the
>> > total
>> > number of fields in which the search term occurs and the frequency of
>> > that
>> > search term??
>> >
>> >
>> >
>>
>
>



Re: calculation of whymatch in elasticsearch

2014-02-07 Thread Britta Weber
If you know the fields that are contained in the document, you could
use a function_score query. For counting the number of fields a word
is contained in, you can use function_score with a boost_factor like
this:

```
{
   "query": {
      "function_score": {
         "functions": [
            {
               "filter": {
                  "term": {
                     "field1": "searchterm"
                  }
               },
               "boost_factor": 1
            },
            {
               "filter": {
                  "term": {
                     "field2": "searchterm"
                  }
               },
               "boost_factor": 1
            }
         ],
         "boost_mode": "replace",
         "score_mode": "sum"
      }
   }
}
```
(Add one more filter entry like these for each further field.)
This will add 1 to the score for each field (field1, field2,... ) that
has the term "searchterm" and the final score for each document will
be the number of fields in the document containing the term. Is this
what you want?
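
For a longer list of fields, the same request body can be built programmatically. A minimal Python sketch (the helper name `field_count_query` is made up; the structure is the one from the query above):

```python
import json

def field_count_query(fields, term):
    """Score each document by the number of given fields containing `term`."""
    functions = [
        # Each matching filter contributes exactly 1 to the summed score.
        {"filter": {"term": {field: term}}, "boost_factor": 1}
        for field in fields
    ]
    return {
        "query": {
            "function_score": {
                "functions": functions,
                "boost_mode": "replace",
                "score_mode": "sum",
            }
        }
    }

print(json.dumps(field_count_query(["field1", "field2", "field3"], "searchterm"), indent=2))
```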

For getting the term frequencies, you can check out text scoring in scripts here:

http://www.elasticsearch.org/guide/en/elasticsearch/reference/current/modules-advanced-scripting.html

See also this thread:
https://groups.google.com/forum/#!msg/elasticsearch/9fOEN1uArIY/7bVZP22zYg8J

Cheers,
Britta

On Fri, Feb 7, 2014 at 7:35 AM,   wrote:
> Hi,
>
> How do we implement whymatch concept in elasticsearch by finding the total
> number of fields in which the search term occurs and the frequency of that
> search term??
>
>
>



Re: how to implement whymatch in elastic search

2014-02-07 Thread Britta Weber
Hi,

you can use a script to find out the term frequency (and other
statistics) of a particular word in a field. This feature has been
available since 0.90.10. Check it out here:

http://www.elasticsearch.org/guide/en/elasticsearch/reference/current/modules-advanced-scripting.html

You can use this either inside a function_score [1] query if you want
to use the tf to influence the score, or you can define a script field
[2] if you only want to have the information returned.
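
Just to illustrate what "term frequency" means for your example, here is a tiny Python sketch that counts terms in a single field value. It is not how Elasticsearch does it internally: the analysis here is a naive lowercase-and-split, and the title text for g123 is made up.

```python
from collections import Counter

def term_frequencies(text):
    # Naive analysis: lowercase and split on whitespace. Real analyzers differ.
    return Counter(text.lower().split())

title = "Google founder talks Google history at Google"  # made-up title for doc g123
print(term_frequencies(title)["google"])
```

For this made-up title the count for "google" is 3, which is the number a tf() call in a script would be expected to report for the title field of that document, given a comparable analyzer.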

If you want to write a native script, some examples of how to do that
from Java are in this repository:
https://github.com/imotov/elasticsearch-native-script-example

Hope that helps,
Britta


[1] 
http://www.elasticsearch.org/guide/en/elasticsearch/reference/current/query-dsl-function-score-query.html
[2] 
http://www.elasticsearch.org/guide/en/elasticsearch/reference/current/search-request-script-fields.html

On Fri, Feb 7, 2014 at 7:24 AM, Navneet Mathpal
 wrote:
> Hi,
>
> how to implement whymatch in elastic search, Is there is any procedure to
> calculate the term frequency for particular word.
>
> for example :
>
> search query- google founder
>
> results are :- g123,g456,g789  (these are the docId corresponding to results
> )
>
> In g123  google is in title field 3 times.
>
>  so how to calculate them.
>



Re: score based on term frequency only

2014-01-07 Thread Britta Weber
You could also use a script as described here:

http://www.elasticsearch.org/guide/en/elasticsearch/reference/current/modules-advanced-scripting.html
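
A rough sketch of what such a request could look like, built as a Python dict. This assumes a function_score query with a script_score function and boost_mode "replace", so that the raw term frequency becomes the score; the field name `body` is made up, and the helper is only for illustration.

```python
import json

def tf_only_query(field, term):
    """Rank matching documents purely by how often `term` occurs in `field`."""
    return {
        "query": {
            "function_score": {
                "query": {"term": {field: term}},
                # Raw term frequency from the index, ignoring idf and norms.
                "script_score": {"script": "_index['%s']['%s'].tf()" % (field, term)},
                "boost_mode": "replace",
            }
        }
    }

print(json.dumps(tf_only_query("body", "apple"), indent=2))
```

With this, a document containing "apple apple" would be expected to score 2 and one containing "apple" to score 1.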


Cheers,
Britta

On Mon, Jan 6, 2014 at 2:13 AM, Ivan Brusic  wrote:
> You could provide your own Similarity class as a plugin. Don't have any
> sample code in front of me, but it would be based on TFIDFSimilarity and
> you would basically need to ignore the norms and other values.
>
> http://lucene.apache.org/core/4_6_0/core/org/apache/lucene/search/similarities/TFIDFSimilarity.html
>
> The IDF portion could probably remain since it ranks the different terms in
> your query, not the score of each term.
>
> Cheers,
>
> Ivan
>
>
>
> On Sun, Jan 5, 2014 at 1:57 PM, Kevin S  wrote:
>>
>> I would like to score based entirely on term count.
>>
>> For example, given the following two documents:
>>
>> 1) { "apple" }
>>
>> 2) { "apple apple" }
>>
>> Searching "apple" ranks the first before the second.  I wish to rank the
>> second, in which the term occurs twice, with a higher score.
>>
>> Can someone please point me in the right direction for this?
>>
>> Thank you.
>>
>
>
