Re: elasticsearch phraze term frequency .tf() containing multiple words

2014-10-29 Thread barry
You can also look at developing a custom analyzer so that your phrase is 
not broken up at white space when indexed. 

http://www.elasticsearch.org/guide/en/elasticsearch/reference/current/analysis.html

Selecting the correct combination of char filters and tokenizers will 
retain phrases.

For example, using the whitespace analyzer will separate on whitespace:

curl '192.168.w.xyz:9200/test/_analyze?pretty=1&analyzer=whitespace' -d 'foo bar baz'
{
  "tokens" : [ {
    "token" : "foo",
    "start_offset" : 0,
    "end_offset" : 3,
    "type" : "word",
    "position" : 1
  }, {
    "token" : "bar",
    "start_offset" : 4,
    "end_offset" : 7,
    "type" : "word",
    "position" : 2
  }, {
    "token" : "baz",
    "start_offset" : 8,
    "end_offset" : 11,
    "type" : "word",
    "position" : 3
  } ]
}


However, using the keyword analyzer will retain the entire phrase:

curl '192.168.w.xyz:9200/test/_analyze?pretty=1&analyzer=keyword' -d 'foo bAr baZ'
{
  "tokens" : [ {
    "token" : "foo bAr baZ",
    "start_offset" : 0,
    "end_offset" : 11,
    "type" : "word",
    "position" : 1
  } ]
}
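
As a rough illustration of why the choice of analyzer matters for a phrase lookup, here is a minimal Python sketch. It does not call Elasticsearch; the two tokenizer functions are simplified stand-ins for the whitespace and keyword analyzers, and all function names are made up for this example.

```python
from collections import Counter


def whitespace_tokens(text):
    """Split on whitespace, roughly what the whitespace analyzer emits."""
    return text.split()


def keyword_tokens(text):
    """Emit the whole input as one token, as the keyword analyzer does."""
    return [text]


def term_frequency(tokens, term):
    """Count how many indexed tokens are exactly equal to `term`."""
    return Counter(tokens)[term]


doc = "green energy"
print(term_frequency(whitespace_tokens(doc), "green energy"))  # 0
print(term_frequency(keyword_tokens(doc), "green energy"))     # 1
# The keyword approach only matches when the entire field value is the phrase.
print(term_frequency(keyword_tokens("cheap green energy"), "green energy"))  # 0
```

Note the last line: with a single-token field, the phrase is found only if it is the whole field value, so this suits short fields rather than free text.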

On Tuesday, October 28, 2014 10:00:01 AM UTC, vineeth mohan wrote:
>
> Hello Valergi,
>
> This won't work, normally because the string would be tokenized into green 
> and energy.
> If you use the shingle token filter and set the shingle size to 2, it might work.
> Or in this case, you can look at the position values of both tokens using 
> the script, and if they are next to each other, you can count it as an 
> occurrence. 
>
> Thanks
>   Vineeth
>
> On Tue, Oct 28, 2014 at 3:06 PM,  > wrote:
>
>> I want to access the frequency of a phrase made up of multiple words, e.g. 
>> "green energy".
>>
>> I can access the tf of "green" and "energy", for example:
>>
>> "function_score":
>> {
>>   "filter" : {
>>     "terms" : { "content" : ["energy","green"]}
>>   },
>>   "script_score": {
>>     "script": "_index['content']['energy'].tf() + _index['content']['green'].tf()",
>>     "lang":"groovy"
>>   }
>> }
>>
>> This works fine. However, how can I find the frequency of the term "green 
>> energy"? The following does not work:
>>
>> _index['content']['green energy'].tf()
>>
>
>
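
The two ideas in the quoted reply (two-word shingles, or checking that token positions are adjacent) can be sketched in plain Python. This is only an illustration of the counting logic, not Elasticsearch's shingle filter or its scripting API; the function names and the sample text are invented for the example, and lowercasing plus whitespace splitting is an assumed analysis chain.

```python
def shingles(tokens, size=2):
    """Join each run of `size` adjacent tokens into a single term."""
    return [" ".join(tokens[i:i + size]) for i in range(len(tokens) - size + 1)]


def phrase_tf_by_shingles(text, phrase):
    """Count the phrase among shingles as long as the phrase itself."""
    tokens = text.lower().split()
    size = len(phrase.split())
    return shingles(tokens, size).count(phrase.lower())


def phrase_tf_by_positions(text, first, second):
    """Count places where `second` sits at the position right after `first`."""
    tokens = text.lower().split()
    first_positions = [i for i, tok in enumerate(tokens) if tok == first]
    second_positions = {i for i, tok in enumerate(tokens) if tok == second}
    return sum(1 for pos in first_positions if pos + 1 in second_positions)


text = "green energy beats coal but green policy needs green energy"
print(phrase_tf_by_shingles(text, "green energy"))        # 2
print(phrase_tf_by_positions(text, "green", "energy"))    # 2
```

Both approaches agree here; the shingle route moves the cost to index time, while the position check is done at query time.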

-- 
You received this message because you are subscribed to the Google Groups 
"elasticsearch" group.
To unsubscribe from this group and stop receiving emails from it, send an email 
to elasticsearch+unsubscr...@googlegroups.com.
To view this discussion on the web visit 
https://groups.google.com/d/msgid/elasticsearch/87fbc699-ade2-489f-b715-a987066d6cc4%40googlegroups.com.
For more options, visit https://groups.google.com/d/optout.


Re: Scoring of queries on nested documents

2014-10-22 Thread barry
After some investigation, it appears that each nested doc is counted 
individually, along with the root doc. 
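
To show why that counting matters for a document-count statistic such as IDF, here is a small Python sketch. It assumes the classic Lucene TF-IDF formula idf = 1 + ln(docCount / (docFreq + 1)); the people and cities are hypothetical data, and nothing here is taken from an actual explain output.

```python
import math


def classic_idf(doc_count, doc_freq):
    """Classic Lucene-style IDF (assumed formula): 1 + ln(docCount / (docFreq + 1))."""
    return 1.0 + math.log(doc_count / (doc_freq + 1))


# Hypothetical data: three people, each with a list of visited cities.
people = {
    "alice": ["paris", "rome", "oslo"],
    "bob": ["paris", "rome"],
    "carol": ["paris"],
}

root_docs = len(people)
nested_docs = sum(len(cities) for cities in people.values())
# Number of nested city docs whose name is "paris".
doc_freq = sum(1 for cities in people.values() for city in cities if city == "paris")

idf_roots_only = classic_idf(root_docs, doc_freq)
idf_with_nested = classic_idf(root_docs + nested_docs, doc_freq)

print(root_docs, nested_docs, doc_freq)
print(idf_roots_only, idf_with_nested)
```

Under these assumptions, counting the nested docs alongside the root docs raises the document count and therefore the IDF of the same term.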

On Tuesday, October 21, 2014 4:55:56 PM UTC+1, ba...@intalex.com wrote:
>
> Thanks for the help Mark. 
> When calculating relevance can I assume that TF is the number of times 
> that the term appears in the collapsed nested field? I.e. all of the city 
> names get merged into one field, or is it handled a different way? Is the 
> Field Length Norm calculated in the same way?
>
> Barry
>
> On Tuesday, October 21, 2014 3:48:15 PM UTC+1, Mark Harwood wrote:
>>
>> The "score_mode" setting determines how the scores of the various child 
>> docs are attributed to the parent doc which is the final scored element.
>> See 
>> http://www.elasticsearch.org/guide/en/elasticsearch/reference/current/query-dsl-nested-query.html#query-dsl-nested-query
>>
>> You can for example choose to take the average, max or sum of all the 
>> child documents that match your nested query and reward the parent doc with 
>> that value
>>
>>
>>
>> On Tuesday, October 21, 2014 9:56:51 AM UTC+1, ba...@intalex.com wrote:
>>>
>>> Hello,
>>> I am having a problem understanding how scoring of nested documents 
>>> works. I have found other people with similar questions which have remained 
>>> unanswered:
>>>
>>>
>>> http://stackoverflow.com/questions/25619632/elasticsearch-how-is-the-score-for-nested-queries-computed
>>>
>>>
>>> http://stackoverflow.com/questions/26263562/elasticsearch-boost-score-with-nested-query
>>>
>>> The relevant section of my current mapping (with nested parts) is:
>>> "mappings": {
>>>   "person": {
>>>     "properties": {
>>>       "city": {
>>>         "type": "nested",
>>>         "properties": {
>>>           "visityear": {
>>>             "type": "integer"
>>>           },
>>>           "name": {
>>>             "type": "string"
>>>           }
>>>         }
>>>       }
>>>     }
>>>   }
>>> }
>>>
>>> If I have three people who have visited different numbers of cities and 
>>> I search for a common city they have all visited I get different score 
>>> values. The person who visited the greatest number of cities is ranked 
>>> first, with the person who visited only one city getting a score of 1 
>>> (currently ranked lowest). The output of the explanation is that the score 
>>> is based on 'child doc range from 0 to x'. My question is how do TF, IDF 
>>> and Field Norm work for nested documents when the score is being 
>>> calculated? 
>>>
>>> Many thanks,
>>> Barry
>>>
>>



Re: Scoring of queries on nested documents

2014-10-21 Thread barry
Thanks for the help Mark. 
When calculating relevance can I assume that TF is the number of times that 
the term appears in the collapsed nested field? I.e. all of the city names 
get merged into one field, or is it handled a different way? Is the Field 
Length Norm calculated in the same way?

Barry

On Tuesday, October 21, 2014 3:48:15 PM UTC+1, Mark Harwood wrote:
>
> The "score_mode" setting determines how the scores of the various child 
> docs are attributed to the parent doc which is the final scored element.
> See 
> http://www.elasticsearch.org/guide/en/elasticsearch/reference/current/query-dsl-nested-query.html#query-dsl-nested-query
>
> You can for example choose to take the average, max or sum of all the 
> child documents that match your nested query and reward the parent doc with 
> that value
>
>
>
> On Tuesday, October 21, 2014 9:56:51 AM UTC+1, ba...@intalex.com wrote:
>>
>> Hello,
>> I am having a problem understanding how scoring of nested documents 
>> works. I have found other people with similar questions which have remained 
>> unanswered:
>>
>>
>> http://stackoverflow.com/questions/25619632/elasticsearch-how-is-the-score-for-nested-queries-computed
>>
>>
>> http://stackoverflow.com/questions/26263562/elasticsearch-boost-score-with-nested-query
>>
>> The relevant section of my current mapping (with nested parts) is:
>> "mappings": {
>>   "person": {
>>     "properties": {
>>       "city": {
>>         "type": "nested",
>>         "properties": {
>>           "visityear": {
>>             "type": "integer"
>>           },
>>           "name": {
>>             "type": "string"
>>           }
>>         }
>>       }
>>     }
>>   }
>> }
>>
>> If I have three people who have visited different numbers of cities and I 
>> search for a common city they have all visited I get different score 
>> values. The person who visited the greatest number of cities is ranked 
>> first, with the person who visited only one city getting a score of 1 
>> (currently ranked lowest). The output of the explanation is that the score 
>> is based on 'child doc range from 0 to x'. My question is how do TF, IDF 
>> and Field Norm work for nested documents when the score is being 
>> calculated? 
>>
>> Many thanks,
>> Barry
>>
>
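
The "score_mode" behaviour described in the quoted reply (average, max or sum of the matching child scores) can be sketched as follows. This is an illustration only: the function name is made up, the mode labels follow the wording of the reply, and the exact option names Elasticsearch accepts should be taken from the linked documentation.

```python
def combine_child_scores(child_scores, score_mode="avg"):
    """Fold the scores of matching nested docs into one parent score."""
    if not child_scores:
        return 0.0
    if score_mode == "avg":
        return sum(child_scores) / len(child_scores)
    if score_mode == "max":
        return max(child_scores)
    if score_mode == "sum":
        return sum(child_scores)
    raise ValueError("unknown score_mode: %r" % score_mode)


# Hypothetical scores of three matching nested docs under one parent.
scores = [0.5, 1.0, 1.5]
print(combine_child_scores(scores, "avg"))  # 1.0
print(combine_child_scores(scores, "max"))  # 1.5
print(combine_child_scores(scores, "sum"))  # 3.0
```

With a summing mode, a parent with more matching children scores higher; with an averaging mode, the number of matches has no direct effect.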



Re: Scoring of queries on nested documents

2014-10-21 Thread barry
Edit: There is only one shard being used in this mapping. 

On Tuesday, October 21, 2014 9:56:51 AM UTC+1, ba...@intalex.com wrote:
>
> Hello,
> I am having a problem understanding how scoring of nested documents works. 
> I have found other people with similar questions which have remained 
> unanswered:
>
>
> http://stackoverflow.com/questions/25619632/elasticsearch-how-is-the-score-for-nested-queries-computed
>
>
> http://stackoverflow.com/questions/26263562/elasticsearch-boost-score-with-nested-query
>
> The relevant section of my current mapping (with nested parts) is:
> "mappings": {
>   "person": {
>     "properties": {
>       "city": {
>         "type": "nested",
>         "properties": {
>           "visityear": {
>             "type": "integer"
>           },
>           "name": {
>             "type": "string"
>           }
>         }
>       }
>     }
>   }
> }
>
> If I have three people who have visited different numbers of cities and I 
> search for a common city they have all visited I get different score 
> values. The person who visited the greatest number of cities is ranked 
> first, with the person who visited only one city getting a score of 1 
> (currently ranked lowest). The output of the explanation is that the score 
> is based on 'child doc range from 0 to x'. My question is how do TF, IDF 
> and Field Norm work for nested documents when the score is being 
> calculated? 
>
> Many thanks,
> Barry
>



Scoring of queries on nested documents

2014-10-21 Thread barry
Hello,
I am having a problem understanding how scoring of nested documents works. 
I have found other people with similar questions which have remained 
unanswered:

http://stackoverflow.com/questions/25619632/elasticsearch-how-is-the-score-for-nested-queries-computed

http://stackoverflow.com/questions/26263562/elasticsearch-boost-score-with-nested-query

The relevant section of my current mapping (with nested parts) is:
"mappings": {
  "person": {
    "properties": {
      "city": {
        "type": "nested",
        "properties": {
          "visityear": {
            "type": "integer"
          },
          "name": {
            "type": "string"
          }
        }
      }
    }
  }
}

If I have three people who have visited different numbers of cities and I 
search for a common city they have all visited I get different score 
values. The person who visited the greatest number of cities is ranked 
first, with the person who visited only one city getting a score of 1 
(currently ranked lowest). The output of the explanation is that the score 
is based on 'child doc range from 0 to x'. My question is how do TF, IDF 
and Field Norm work for nested documents when the score is being 
calculated? 

Many thanks,
Barry



Updating Datatype in Elasticsearch

2014-07-16 Thread Barry Williams
Hello All,
I'm a n00b, and I'm having trouble changing a field's datatype in 
elasticsearch - so that kibana can use it.

I read in a CSV with logstash. Here is a sample of that CSV:

DateTime,Session,Event,Data/Duration
2014-05-12T21:51:44,1399945863,Pressure,7.00



Here is my logstash config:

input {
  file {
    path => "/elk/Samples/CPAP_07_14_2014/CSV/SleepSheep_07_14_2014_no_header.csv"
    start_position => beginning
  }
}

filter {
  csv {
    columns => ["DateTime","Session","Event","Data/Duration"]
  }
}

output {
  elasticsearch {
    host => localhost
  }
  stdout { codec => rubydebug }
}



Whenever the data reaches elasticsearch, the mapping shows the 
"Data/Duration" field as a string, not a float, preventing kibana from 
using it for graphing.  I tried to use PUT on elasticsearch to overwrite 
the mapping, but it won't let me.


Where should I configure this datatype? In the CSV filter, in the output, 
in elasticsearch?

Thanks,
Barry
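
The symptom can be reproduced outside the ELK stack: a CSV parser hands back every column as text unless something converts it. The sketch below uses Python's csv module, not Logstash, with the sample row and column names from this message; the parse_rows helper and its convert argument are invented for the example. The analogous step in Logstash would presumably be a type conversion in the filter stage (for instance the mutate filter's convert option), which is an assumption and not something confirmed in this thread.

```python
import csv
import io

sample = "2014-05-12T21:51:44,1399945863,Pressure,7.00\n"
columns = ["DateTime", "Session", "Event", "Data/Duration"]


def parse_rows(text, convert=None):
    """Parse CSV text into dicts; `convert` maps a column name to a cast function."""
    convert = convert or {}
    rows = []
    for values in csv.reader(io.StringIO(text)):
        row = dict(zip(columns, values))
        for name, cast in convert.items():
            row[name] = cast(row[name])
        rows.append(row)
    return rows


raw = parse_rows(sample)[0]
typed = parse_rows(sample, {"Data/Duration": float})[0]
print(type(raw["Data/Duration"]).__name__)    # str
print(type(typed["Data/Duration"]).__name__)  # float
```

Converting before the document is indexed means the field arrives as a number the first time the mapping is created, rather than trying to change the type afterwards.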



kibana ES version errors

2014-02-03 Thread Barry Kaplan
I'm running ES 0.90.10 and Kibana master. Kibana emits an error saying it 
needs ES 0.90.9 or better. But kibana does not say what version it thinks 
it's using, or even what URL it is using. 

From ES:

curl http://10.0.136.6:9200
{
  "ok" : true,
  "status" : 200,
  "name" : "s-ops-monitor-01",
  "version" : {
    "number" : "0.90.10",
    "build_hash" : "0a5781f44876e8d1c30b6360628d59cb2a7a2bbb",
    "build_timestamp" : "2014-01-10T10:18:37Z",
    "build_snapshot" : false,
    "lucene_version" : "4.6"
  },
  "tagline" : "You Know, for Search"
}

From kibana config:

elasticsearch: http://10.0.136.6:9200,

Is there any way to debug kibana?
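
As a sanity check that 0.90.10 really does satisfy a "0.90.9 or better" requirement, here is a small Python sketch that compares dotted versions numerically. The helper names are made up, and how Kibana performs its own check is not shown in this thread, so the string comparison at the end is only an example of how a version check can go wrong, not a diagnosis.

```python
def parse_version(version):
    """Turn '0.90.10' into (0, 90, 10) so that parts compare as numbers."""
    return tuple(int(part) for part in version.split("."))


def meets_minimum(actual, minimum):
    """True when `actual` is at least `minimum`, comparing part by part."""
    return parse_version(actual) >= parse_version(minimum)


print(meets_minimum("0.90.10", "0.90.9"))  # True
# A plain string comparison gets this wrong, because "1" sorts before "9".
print("0.90.10" >= "0.90.9")               # False
```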
