Re: IllegalStateException[field [DISPLAY_NAME] was indexed without position data]
Hi Ivan,

Running the following query returns the records below:

{ "query": { "match": { "DISPLAY_NAME": "Happy People" } } }

Result: https://gist.github.com/cheehoo/073ab926baa123b18224

But running the suggested span query:

{ "from": 100, "size": 100, "query": { "span_first": { "match": { "span_near": { "clauses": [ { "span_term": { "DISPLAY_NAME": "happy" } }, { "span_term": { "DISPLAY_NAME": "people" } } ], "slop": 1, "in_order": true } }, "end": 2 } } }

no results are returned. Any clues? :) Thanks.

On Wed, Apr 30, 2014 at 12:04 PM, Ivan Brusic i...@brusic.com wrote:

Do you have any documents that start with "happy people"? -- Ivan

On Tue, Apr 29, 2014 at 7:21 PM, chee hoo lum cheeho...@gmail.com wrote:

Hi Ivan, Tried with end set to 2 and 3 with no luck.

{ "from": 100, "size": 100, "query": { "span_first": { "match": { "span_near": { "clauses": [ { "span_term": { "DISPLAY_NAME": "happy" } }, { "span_term": { "DISPLAY_NAME": "people" } } ], "slop": 1, "in_order": true } }, "end": 2 } } }

The field uses the standard analyzer with stopwords set to _none_:

"DISPLAY_NAME": { "type": "string", "analyzer": "standard" }
"index.analysis.analyzer.standard.type": "standard"
"index.analysis.analyzer.standard.stopwords": "_none_"

Any clue on this? :) Thanks

On Wed, Apr 30, 2014 at 12:37 AM, Ivan Brusic i...@brusic.com wrote:

The end parameter is too low. It needs to be at a minimum the number of clauses in the span_near query.
-- Ivan

On Mon, Apr 28, 2014 at 7:05 PM, chee hoo lum cheeho...@gmail.com wrote:

Hi Ivan, Not able to get any result with the following query:

{ "from": 100, "size": 100, "query": { "span_first": { "match": { "span_near": { "clauses": [ { "span_term": { "DISPLAY_NAME": "happy" } }, { "span_term": { "DISPLAY_NAME": "people" } } ], "slop": 1, "in_order": true } }, "end": 1 } } }

Meanwhile I tried:

{ "from": 100, "size": 100, "query": { "span_first": { "match": { "span_term": { "DISPLAY_NAME": "happy" } }, "end": 1 } } }

and it returns:

"_index": "jdbc_dev", "_type": "media", "_id": "9556", "_score": 4.612431, "_source": { "DISPLAY_NAME": "Happy People", ...

Anything wrong with my first query? Thanks

On Tue, Apr 29, 2014 at 12:16 AM, Ivan Brusic i...@brusic.com wrote:

The main limitation of span queries is that they only operate on analyzed terms. The terms used in span_term must match the terms in the index. In your case, there is no single term "happy holiday" in your index, because the original document was tokenized into [happy, birthday, to, you]. You would need to do a span_near query of the two terms with a slop of 1 and in order. That span_near query will then be the argument to the span_first. Here is a good explanation of span queries in Lucene: http://searchhub.org/2009/07/18/the-spanquery/ -- Ivan

On Sun, Apr 27, 2014 at 11:24 PM, cyrilforce cheeho...@gmail.com wrote:

Hi Ivan, I recreated the mapping, re-indexed the documents, and it is now working fine. Thanks. Btw, I would like to ask how I could search two or more words in the span_first query, as I need it to support the following searches: 1) happy 2) happy holiday 3) happy birthday to you

{ "from": 100, "size": 100, "query": { "span_first": { "match": { "span_term": { "DISPLAY_NAME": "happy holiday" } }, "end": 1 } } }

returns an empty list even though we have documents whose DISPLAY_NAME starts with "happy holiday". Thanks.

On Sunday, April 27, 2014 2:55:37 AM UTC+8, cyrilforce wrote:

Hi Ivan, I am using version elasticsearch-0.90.1. Nope, we don't have any templates.
Not sure whether you are referring to the full index mapping; here are the gists: media mapping https://gist.github.com/cheehoo/11327970 and full index mapping https://gist.github.com/cheehoo/11327996 Thanks in advance.

On Sat, Apr 26, 2014 at 8:31 AM, Ivan Brusic i...@brusic.com wrote:

Your mapping looks correct. Which version are you running? Do you have any templates? Just to be on the safe side, can you provide the mapping that Elasticsearch is using (not the one you provided): http://localhost:9200/jdbc_dev/media/_mapping -- Ivan

On Fri, Apr 25, 2014 at 3:24 AM, cyrilforce cheeho...@gmail.com wrote:

Hi, I am trying to query some records via the span_first query as below:

{ "from": 100, "size": 100, "query": { "span_first": { "match": { "span_term": { "DISPLAY_NAME": "happy" } },
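Pulling the thread together, a corrected form of the suggested query, with valid JSON quoting and `end` at least as large as the number of clauses, would look like the sketch below. Note also that the queries in this thread page from hit 100 (`"from": 100`), so if fewer than 100 documents match, the result list will be empty regardless of the query; dropping `from` while debugging is worth trying.

```json
{
  "query": {
    "span_first": {
      "match": {
        "span_near": {
          "clauses": [
            { "span_term": { "DISPLAY_NAME": "happy" } },
            { "span_term": { "DISPLAY_NAME": "people" } }
          ],
          "slop": 0,
          "in_order": true
        }
      },
      "end": 2
    }
  }
}
```

Here `"slop": 0` requires the two terms to be adjacent (the thread used slop 1, which also allows one intervening term), and the span_term values must be the lowercased, analyzed terms actually stored in the index.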
Truncating scores
Hello everybody, I am using the function_score query to compute a custom score for items I am indexing into Elasticsearch. I use a native script (written in Java) to compute my score, based on a date (Date.getTime()). When I log what my native script returns, I get what I want, but when I look at the score of items returned by the query (I use the replace mode), I get a truncated number (e.g. if the score computed in the native script is 1 392 028 423 243, it comes back as 1 392 028 420 000 on the returned items). The problem is that I am losing milliseconds and seconds (I only keep the tens-of-seconds part). Losing milliseconds would be acceptable, but I can't lose seconds. Is this a limitation of Elasticsearch? Is there any way to work around this problem? Thanks in advance for your replies. Regards, Loïc Wenkin -- You received this message because you are subscribed to the Google Groups elasticsearch group. To unsubscribe from this group and stop receiving emails from it, send an email to elasticsearch+unsubscr...@googlegroups.com. To view this discussion on the web visit https://groups.google.com/d/msgid/elasticsearch/ccf7c19e-aa70-42ac-a4a4-d7174ab0de49%40googlegroups.com. For more options, visit https://groups.google.com/d/optout.
Re: performance issue with script scoring with fields having a large array
Hello, Using _source in scripts is typically slow, because ES has to go to each stored document and extract fields from it. A faster approach is to use something like doc['field3'].values[12], which will use the field data cache (already loaded in memory, at least after the first run): http://www.elasticsearch.org/guide/en/elasticsearch/reference/current/modules-scripting.html#_document_fields More details about field data can be found here: http://www.elasticsearch.org/guide/en/elasticsearch/reference/current/index-modules-fielddata.html Best regards, Radu -- Performance Monitoring * Log Analytics * Search Analytics Solr Elasticsearch Support * http://sematext.com/

On Wed, Apr 30, 2014 at 12:27 PM, NM n.maisonne...@gmail.com wrote:

I have documents with fields containing a large array. I would like to score according to the value of the nth element of such an array, but I get very slow responses (5 s) for only 10K documents indexed.

My mapping:

document { id: value, field2: string, field3: [ int_1, int_2, ..., int_10k ] } -- a large array of 10K integers

Assume I generated and indexed 10K documents with 1K random integer values in field3. I then use the following search query:

GET /test/document/_search { "query": { "function_score": { "script_score": { "script": "_source.fields3[12] * _source.fields3[11]" } } } }

=> takes 5000 ms. However, with plain Java objects and a simple nested loop -- for all documents, score[i] = doc[i].fields[12] * doc[i].fields[11], then sort by score -- it takes 50 ms. ES is 100x slower than a simple loop. How can I get similar performance with ES?
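The doc-values approach Radu describes would look roughly like this; it is a sketch using the field name from the question, and it assumes the array fits comfortably in the field data cache:

```json
{
  "query": {
    "function_score": {
      "query": { "match_all": {} },
      "script_score": {
        "script": "doc['field3'].values[12] * doc['field3'].values[11]"
      }
    }
  }
}
```

One caveat worth checking: field data for a multi-valued field does not necessarily preserve the original order of the values in _source, so positional access through doc[...] may not address the same element as _source-based access does.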
Re: Elasticsearch Deployment architecture
It will work, but if you want to maintain HA then it'd make sense to keep your inputs separate from your outputs. At least, that's my take :) Regards, Mark Walkom Infrastructure Engineer Campaign Monitor email: ma...@campaignmonitor.com web: www.campaignmonitor.com

On 30 April 2014 19:48, Norberto Meijome num...@gmail.com wrote:

Sending indexing requests to the SLB - is this less optimal, or would it outright fail?

On 30/04/2014 9:04 am, Mark Walkom ma...@campaignmonitor.com wrote:

For searches, yes. You'd want the indexing to go to the masters. Regards, Mark Walkom

On 30 April 2014 09:02, Norberto Meijome num...@gmail.com wrote:

On a related note, if you have separate SLB and master nodes, your main LB (say, haproxy) would be pointing to the SLBs, not the masters, right?

On 29/04/2014 8:40 pm, Dinesh Chandra shadow.on.f...@gmail.com wrote:

Hi, I am very new to Elasticsearch and am trying to deploy it in my dev environment. While there are many ways in which Elasticsearch can be deployed, my team and I have arrived at this architecture: 4 data nodes, 3 master-eligible nodes, 2 search load balancers (SLB). Now my questions are: Does it make sense to have the SLBs at all? Can I just have master nodes and have them perform the job of the SLBs too? Please enlighten me on a sensible Elasticsearch architecture!
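For reference, the node roles discussed in this thread map onto elasticsearch.yml settings; a minimal sketch of the three flavors:

```yaml
# Dedicated master-eligible node: coordinates the cluster, holds no data
node.master: true
node.data: false

# Data node: holds shards, not master-eligible
node.master: false
node.data: true

# Search load balancer ("client" node): neither master nor data,
# just routes and scatter-gathers requests
node.master: false
node.data: false
```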
Re: index binary files
Hello, Normally, you would send indexing requests to the REST API with the content you want Elasticsearch to index: http://www.elasticsearch.org/guide/en/elasticsearch/reference/current/docs-index_.html If you want Elasticsearch to automatically fetch files from the file system for you, have a look at David's FileSystem River: https://github.com/dadoonet/fsriver Best regards, Radu -- Performance Monitoring * Log Analytics * Search Analytics Solr Elasticsearch Support * http://sematext.com/

On Tue, Apr 29, 2014 at 6:40 PM, anass benjelloun anass@gmail.com wrote:

Hello, I installed Elasticsearch and it works well: I can index and search XML and JSON content using Dev HTTP Client. I need your help to index binary files in Elasticsearch and then search them by content. I added mapper-attachments to Elasticsearch, but what I don't know is how to specify the folder of PDF or DOCX files to index, something like base64 or I don't know. Thanks for helping me.
Specify metadata per word/term in a string
Hi, Given a text, say "hello elastic search world", is there a way I can associate a field or some metadata with each word in the text, on which I can later query? For example: give a code number to each word, and then be able to search like text = hello AND code = 25, i.e. return all "hello" words which have 25 in their code metadata.
Re: Specify metadata per word/term in a string
Hello Neeraj, First of all, you can't return just "hello" from Elasticsearch. Elasticsearch works at the document level, which means if you search for "hello", you will get the document with the text "hello elasticsearch search world", not just "hello". The only way I can think of is to create a different document for each word, so a document would look like:

{ "word": "hello", "code": 25 }

That way you can get it to work. If you want to retrieve the text as well, structure it as follows:

{ "text": "hello from Elasticsearch", "words": [ { "word": "hello", "code": 25 }, { "word": "from", "code": 22 } ] }

where the words field is of nested type. Thanks, Vineeth

On Wed, Apr 30, 2014 at 4:36 PM, Neeraj Makam neeraj23...@gmail.com wrote:
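Vineeth's nested suggestion, written out as a mapping and query sketch (index and field names here are illustrative, not from the thread):

```json
{
  "mappings": {
    "doc": {
      "properties": {
        "text": { "type": "string" },
        "words": {
          "type": "nested",
          "properties": {
            "word": { "type": "string", "index": "not_analyzed" },
            "code": { "type": "integer" }
          }
        }
      }
    }
  }
}
```

A nested query can then match word and code within the same array element:

```json
{
  "query": {
    "nested": {
      "path": "words",
      "query": {
        "bool": {
          "must": [
            { "term": { "words.word": "hello" } },
            { "term": { "words.code": 25 } }
          ]
        }
      }
    }
  }
}
```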
Re: Specify metadata per word/term in a string
There is a related feature called payloads for terms. In Elasticsearch you can assign a payload to each term, e.g. numbers for custom scoring. See also https://github.com/elasticsearch/elasticsearch/issues/3772 and https://github.com/elasticsearch/elasticsearch/pull/4161 It uses the DelimitedPayloadTokenFilter: http://lucene.apache.org/core/4_7_0/analyzers-common/org/apache/lucene/analysis/payloads/DelimitedPayloadTokenFilter.html Jörg

On Wed, Apr 30, 2014 at 1:06 PM, Neeraj Makam neeraj23...@gmail.com wrote:
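The delimited-payload setup Jörg refers to would be wired up roughly as below: tokens are indexed as word|payload pairs, and the filter strips the payload off the term while storing it alongside. Analyzer and filter names here are illustrative:

```json
{
  "settings": {
    "analysis": {
      "analyzer": {
        "payload_analyzer": {
          "type": "custom",
          "tokenizer": "whitespace",
          "filter": [ "lowercase", "payloads" ]
        }
      },
      "filter": {
        "payloads": {
          "type": "delimited_payload_filter",
          "delimiter": "|",
          "encoding": "int"
        }
      }
    }
  }
}
```

A field analyzed with this would take text like "hello|25 world|3"; reading the payloads back at query time requires custom scoring, as discussed in the linked issue and pull request.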
Re: Truncating scores
Scores are Java floats, so I'd expect them to be less precise than the long that getTime returns. I believe you could look at sorting rather than scoring, or at reducing the precision of the top bits of your long. You know, y2k-bug style. The reason the score is a float is that for text scoring it's exact enough. Also, some of the Lucene data structures are actually more lossy than float: the field norm, IIRC, is a floating-point number packed into 8 bits rather than the float's 32. Nik

On Wed, Apr 30, 2014 at 5:56 AM, Loïc Wenkin loic.wen...@gmail.com wrote:
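Nik's float-precision point can be checked numerically: round-tripping the millisecond timestamp from the original post through a 32-bit float (the type Lucene uses for scores) snaps it to a multiple of 2^17 milliseconds, about two minutes of resolution at this magnitude, which matches the truncation Loïc observed. A self-contained sketch:

```python
import struct

ts = 1392028423243  # millisecond timestamp from the thread

# Lucene scores are 32-bit floats; simulate storing the long in one.
as_float = struct.unpack('f', struct.pack('f', float(ts)))[0]

# At this magnitude a float32 can only represent multiples of 2**17 ms,
# so the seconds and milliseconds are rounded away.
print(int(as_float), 'error in ms:', abs(int(as_float) - ts))
```

This also explains why dividing by a constant or subtracting a fixed epoch helps: both shrink the magnitude, which shrinks the spacing between representable float values.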
Re: SearchParseExceptions in Marvel monitoring cluster
Hi Mihir, This type of error typically occurs when the Marvel index doesn't contain the right data. I'm intrigued by the ClusterBlockException on your monitoring cluster. Can you gist the output of:

curl SERVER:9200/_cat/shards/?v

for both nodes of your Marvel cluster? Thx, Boaz

On Monday, April 28, 2014 2:43:30 PM UTC+2, Mihir M wrote:

Hi, We have 2 Elasticsearch clusters in our development environment. One of them is our development cluster, with 9 nodes: 4 data nodes (with 4 GB heap), 3 master-eligible nodes (default heap), and 2 search load balancers (default heap). The second is our monitoring cluster for storing Marvel data from the development cluster. This cluster has 2 nodes running with the default configuration. All the above nodes are running the latest ES version, 1.1.1, and the latest Marvel version, which is 1.1.0. Of late we have been seeing issues in the Marvel cluster. One of the nodes in the Marvel cluster throws the following exception continuously:

[.marvel-2014.04.25][0], node[dA2UtjgdQ1S55zgvQHOHYQ], [P], s[STARTED]: Failed to execute [org.elasticsearch.action.search.SearchRequest@24de815] org.elasticsearch.search.SearchParseException: [.marvel-2014.04.25][0]: from[-1],size[-1]: Parse Failure [Failed to parse source [{facets:{0:{date_histogram:{key_field:@timestamp,value_field:total.search.query_total,interval:1m},global:true,facet_filter:{fquery:{query:{filtered:{query:{query_string:{query:_type:indices_stats}},filter:{bool:{must:[{range:{@timestamp:{from:1398434986844,to:now}}}],size:50,query:{filtered:{query:{query_string:{query:_type:cluster_event OR _type:node_event}},filter:{bool:{must:[{range:{@timestamp:{from:1398434986844,to:now}}}],sort:[{@timestamp:{order:desc}},{@timestamp:{order:desc}}]}]]
at org.elasticsearch.search.SearchService.parseSource(SearchService.java:634)
at org.elasticsearch.search.SearchService.createContext(SearchService.java:507)
at org.elasticsearch.search.SearchService.createAndPutContext(SearchService.java:480)
at
org.elasticsearch.search.SearchService.executeFetchPhase(SearchService.java:324) at org.elasticsearch.search.action.SearchServiceTransportAction.sendExecuteFetch(SearchServiceTransportAction.java:304) at org.elasticsearch.action.search.type.TransportSearchQueryAndFetchAction$AsyncAction.sendExecuteFirstPhase(TransportSearchQueryAndFetchAction.java:71) at org.elasticsearch.action.search.type.TransportSearchTypeAction$BaseAsyncAction.performFirstPhase(TransportSearchTypeAction.java:216) at org.elasticsearch.action.search.type.TransportSearchTypeAction$BaseAsyncAction$4.run(TransportSearchTypeAction.java:296) at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145) at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615) at java.lang.Thread.run(Thread.java:744) Caused by: org.elasticsearch.search.facet.FacetPhaseExecutionException: Facet [0]: (value) field [total.search.query_total] not found at org.elasticsearch.search.facet.datehistogram.DateHistogramFacetParser.parse(DateHistogramFacetParser.java:186) at org.elasticsearch.search.facet.FacetParseElement.parse(FacetParseElement.java:93) at org.elasticsearch.search.SearchService.parseSource(SearchService.java:622) ... 10 more It keeps repeating at regular intervals. Also this is observed in only one of the 2 nodes of the monitoring cluster. Usually it is the master which shows this exception. Similar exceptions are observed in the Marvel dashboard - Cluster Overview page. Also in the development cluster in one of the Master nodes, we see ClusterBlockException [shard state 0 not initialized or recovered] for the monitoring cluster. Please explain why this is happening. One more thing to add, we are facing this problem ever since we migrated to ES 1.1.0. Before that while running 1.0.0, no such things were observed. Looking forward to your reply. 
- Regards

-- View this message in context: http://elasticsearch-users.115913.n3.nabble.com/SearchParseExceptions-in-Marvel-monitoring-cluster-tp4054926.html Sent from the ElasticSearch Users mailing list archive at Nabble.com.
Re: Truncating scores
Hello Nikolas, Thanks for your reply. I have done something like what you just explained: I divide the score by 5000 before returning it. Doing this, I remove the milliseconds and keep a precision of 5 seconds, which I expect to be enough. If it's still a problem, I may try to subtract some years from the date to get a smaller number. I think that sorting would be hard work, since I have something like this in my documents:

a: { b: { objectsSortableByDate: [ ... ] }, c: { objectsSortableByDate: [ ... ] } }

I want to filter my entities according to the smallest (or highest) date of any objectsSortableByDate (whether in b or in c), and sometimes I may have more than two nested objects, so I think the easiest way to sort is using a computed score. If you have a better idea, I will take it :) Loïc

Le mercredi 30 avril 2014 14:48:37 UTC+2, Nikolas Everett a écrit :
Re: Aggregation bug? Or user error?
This looks wrong indeed. By any chance, would you have a curl recreation of this issue?

On Tue, Apr 29, 2014 at 7:35 PM, mooky nick.minute...@gmail.com wrote:

It looks like a bug to me, but if it's user error, then obviously I can fix it a lot quicker :)

On Tuesday, 29 April 2014 13:04:53 UTC+1, mooky wrote:

I am seeing some very odd aggregation results, where the sum of the sub-aggregations is more than the parent bucket. Results:

"CSSX": {
  "doc_count": 24,
  "intentDate": {
    "buckets": [
      { "key": "Overdue", "to": 1.3981248E12, "to_as_string": "2014-04-22", "doc_count": 1, "ME": { "doc_count": 0 }, "NOT_ME": { "doc_count": 24 } },
      { "key": "May", "from": 1.3981248E12, "from_as_string": "2014-04-22", "to": 1.4006304E12, "to_as_string": "2014-05-21", "doc_count": 23, "ME": { "doc_count": 0 }, "NOT_ME": { "doc_count": 24 } },
      { "key": "June", "from": 1.4006304E12, "from_as_string": "2014-05-21", "to": 1.4033088E12, "to_as_string": "2014-06-21", "doc_count": 0, "ME": { "doc_count": 0 }, "NOT_ME": { "doc_count": 24 } }
    ]
  }
},

I wouldn't have thought that to be possible at all. Here is the request that generated the dodgy results.
"CSSX": { "filter": { "and": { "filters": [ { "type": { "value": "inventory" } }, { "term": { "isAllocated": false } }, { "term": { "intentMarketCode": "CSSX" } }, { "terms": { "groupCompanyId": [ "0D13EF2D0E114D43BFE362F5024D8873", "0D593DE0CFBE49BEA3BF5AD7CD965782", "1E9C36CC45C64FCAACDEE0AF4FB91FBA", "33A946DC2B0E494EB371993D345F52E4", "6471AA50DFCF4192B8DD1C2E72A032C7", "9FB2FFDC0FF0797FE04014AC6F0616B6", "9FB2FFDC0FF1797FE04014AC6F0616B6", "9FB2FFDC0FF2797FE04014AC6F0616B6", "9FB2FFDC0FF3797FE04014AC6F0616B6", "9FB2FFDC0FF5797FE04014AC6F0616B6", "9FB2FFDC0FF6797FE04014AC6F0616B6", "AFE0FED33F06AFB6E04015AC5E060AA3" ] } }, { "not": { "filter": { "terms": { "status": [ "Cancelled", "Completed" ] } } } } ] } }, "aggregations": { "intentDate": { "date_range": { "field": "intentDate", "ranges": [ { "key": "Overdue", "to": "2014-04-22" }, { "key": "May", "from": "2014-04-22", "to": "2014-05-21" }, { "key": "June", "from": "2014-05-21", "to": "2014-06-21" } ] }, "aggregations": { "ME": { "filter": { "term": { "trafficOperatorSid": "S-1-5-21-20xx ...

-- Adrien Grand
Date range query ignores month
Hi guys, I've been using Elasticsearch as my data store and I have lots of documents in it. My problem is that Elasticsearch seems to ignore the month part of my date field, so I can't get the search response I expect. Here is what I have in my index and my query; please tell me if I'm wrong:

curl -XPUT 'http://localhost:9200/tt6/' -d '{}'
curl -XPUT 'http://localhost:9200/tt6/tweet/_mapping' -d '{"tweet" : {"properties" : {"date" : {"type" : "date", "format" : "-MM-DD HH:mm:ss"}}}}'
curl -XPUT 'http://localhost:9200/tt6/tweet/1' -d '{"date" : "2014-02-14 04:00:45"}'
curl -XGET 'http://localhost:9200/tt6/_search' -d '
{
  "query" : {
    "bool" : {
      "must" : [
        { "range" : { "tweet.date" : { "from" : "2014-12-01 00:00:00", "to" : "2014-12-30 00:00:00" } } }
      ],
      "must_not" : [],
      "should" : []
    }
  },
  "from" : 0,
  "size" : 10,
  "sort" : [],
  "facets" : {}
}'

And my response is:

{ "took" : 3, "timed_out" : false, "_shards" : { "total" : 5, "successful" : 5, "failed" : 0 }, "hits" : { "total" : 1, "max_score" : 1, "hits" : [ { "_index" : "tt6", "_type" : "tweet", "_id" : "1", "_score" : 1, "_source" : { "date" : "2014-02-14 04:00:45", "name" : "test" } } ] } }

Given the date range of 1 December 2014 to 30 December 2014, this document should not match, but it is returned anyway. Any help will be appreciated. Regards, Fatih.
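A likely explanation (my guess; the thread itself doesn't confirm it): the mapping's date format looks truncated (it starts at `-MM-DD`, with no year pattern), and in the Joda-style patterns Elasticsearch uses, `DD` means day-of-year rather than day-of-month (`dd`). A mapping along these lines should make range queries honour the month:

```json
{
  "tweet" : {
    "properties" : {
      "date" : {
        "type" : "date",
        "format" : "yyyy-MM-dd HH:mm:ss"
      }
    }
  }
}
```

Note that the format of an existing date field can't simply be changed in place; the index would need to be recreated and the documents reindexed.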
Re: Registering node event listeners
Would the DiscoveryService solve my initial problem or only get around constructing a DiscoveryNodesProvider? DiscoveryService only uses the InitialStateDiscoveryListener, which doesn't publish interesting events. I won't be near a computer in the next few days to test. -- Ivan On Wed, Apr 30, 2014 at 4:40 AM, joergpra...@gmail.com joergpra...@gmail.com wrote: Have you looked at InternalNode.java? From my understanding you could try to implement your own DiscoveryModule with DiscoveryService and start it like this: DiscoveryService discoService = injector.getInstance(DiscoveryService.class).start(); Jörg On Wed, Apr 30, 2014 at 12:17 AM, Ivan Brusic i...@brusic.com wrote: I am looking to transition a piece of my search infrastructure from polling the cluster's health status to hopefully receiving notifications whenever an event occurs. Using the TransportService, I registered various relevant listeners, but none of them are triggered. Here is the gist of the code: https://gist.github.com/brusic/2dcced28e0ed753b6632 Most of it I stole^H^H^H^H^Hborrowed from ZenDiscovery. I am assuming something is not quite right with the TransportService. I tried using both a node client and a master-less/data-less client. I also suspect that the DiscoveryNodesProvider might not have been initialized correctly, but I am primarily after the events from NodesFaultDetection, which does not use the DiscoveryNodesProvider. I know I am missing something obvious, but I cannot quite spot it. Is there perhaps a different route using the TransportClient? Cheers, Ivan
Substring match in search term order using Elasticsearch
Posted the same question on Stack Overflow (http://stackoverflow.com/questions/23244796/substring-match-in-search-term-order-using-elasticsearch) but am still looking for an answer. I'm new to Elasticsearch. I want to perform substring/partial word matching, and I want the results returned in a particular order. To explain my problem, here is how I create my index and mappings, and the records I use.

Creating index and mappings:

PUT /my_index1
{
  "settings" : {
    "analysis" : {
      "filter" : {
        "trigrams_filter" : { "type" : "ngram", "min_gram" : 3, "max_gram" : 3 }
      },
      "analyzer" : {
        "trigrams" : {
          "type" : "custom",
          "tokenizer" : "standard",
          "filter" : [ "lowercase", "trigrams_filter" ]
        }
      }
    }
  },
  "mappings" : {
    "my_type1" : {
      "properties" : {
        "text" : { "type" : "string", "analyzer" : "trigrams" }
      }
    }
  }
}

Bulk record insert:

POST /my_index1/my_type1/_bulk
{ "index" : { "_id" : 1 }}
{ "text" : "men's shaver" }
{ "index" : { "_id" : 2 }}
{ "text" : "men's foil shaver" }
{ "index" : { "_id" : 3 }}
{ "text" : "men's foil advanced shaver" }
{ "index" : { "_id" : 4 }}
{ "text" : "norelco men's foil advanced shaver" }
{ "index" : { "_id" : 5 }}
{ "text" : "men's shavers" }
{ "index" : { "_id" : 6 }}
{ "text" : "women's shaver" }
{ "index" : { "_id" : 7 }}
{ "text" : "women's foil shaver" }
{ "index" : { "_id" : 8 }}
{ "text" : "women's foil advanced shaver" }
{ "index" : { "_id" : 9 }}
{ "text" : "norelco women's foil advanced shaver" }
{ "index" : { "_id" : 10 }}
{ "text" : "women's shavers" }

Now I want to search for "en's shaver", using the following query:

POST /my_index1/my_type1/_search
{
  "query" : {
    "match" : {
      "text" : {
        "query" : "en's shaver",
        "minimum_should_match" : "100%"
      }
    }
  }
}

I want the results in the following sequence:

1. men's shaver -- closest match, same keyword order as "en's shaver"
2. women's shaver -- closest match, same keyword order as "en's shaver"
3. men's foil shaver -- distance increased by 1
4. women's foil shaver -- distance increased by 1
5. men's foil advanced shaver -- distance increased by 2
6. women's foil advanced shaver -- distance increased by 2
7. men's shavers -- substring match for "shavers"
8. women's shavers -- substring match for "shavers"

I'm running the following query, but it does not give me the results in the order I want:

POST /my_index1/my_type1/_search
{
  "query" : {
    "query_string" : {
      "default_field" : "text",
      "query" : "men's shaver",
      "minimum_should_match" : "90%"
    }
  }
}
Using snapshotrestore to separate indexing from searching
As I posted before, our system does not fit very well into a cluster structure, because we have many small indices (about 1k indices with an average of 6k records each). Our guess is that with so many small indices, the cluster spent too much time and resources deciding which node should be master, where to locate absurdly small shards, etc. Bottom line: the cluster always ended up not working right. BTW, I suspect that with a few advanced tuning options (shard routing and the like) we might be able to make it work again, but unfortunately we can't find that kind of knowledge in the standard docs. If any of you have any hint on this, it would be greatly appreciated! Anyway, we need to scale the system somehow, and this is what we've come up with: - Our indices can have configuration variations that make a reindex necessary at any time. It doesn't happen a lot, but it happens, and with 1k indices it's bound to happen. - Index data is regenerated every day, so every day the whole set of indices is re-created (we figured it's much faster to recreate an index than to update an existing one, replacing every one of its records). We would like the machines used for searching to be used only for that, and never for indexing/reindexing, because we don't want the user experience to suffer when searching against a server that is already loaded with heavy indexing. In our ideal scenario, indexing/reindexing would be done on dedicated machines, which can be as many as needed, and searching would be done on different machines. We plan to use the snapshot/restore feature for that. Any time an index/reindex is needed, it would be done on one of the indexing machines; the fresh index would then be snapshotted and restored to the search machines afterwards. We should have some client-side control to make sure only one snapshot process runs at a time; it's my understanding that this is not needed for the restore process (i.e. you can have more than one restore process running on a cluster). Individual item indexing can happen occasionally, but I figure when that happens we can just index to both the search machines and the indexing machines, because it's never going to be big. (Please read "cluster" wherever I wrote "machine".) How crazy does this whole thing sound? Is there any other way we can get some scalability?
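For reference, the per-index handoff described above could be sketched like this (repository name, path, and index name are made up for illustration; snapshot/restore requires ES 1.0+ and a repository location visible to both the indexing and the search cluster):

```
# Register the same filesystem repository on both clusters (hypothetical path)
PUT /_snapshot/handoff
{ "type" : "fs", "settings" : { "location" : "/mnt/shared/es_handoff" } }

# On the indexing cluster: snapshot one freshly built index
PUT /_snapshot/handoff/idx_0042_v2?wait_for_completion=true
{ "indices" : "idx_0042_v2" }

# On the search cluster: restore it
POST /_snapshot/handoff/idx_0042_v2/_restore
{ "indices" : "idx_0042_v2" }
```

Note that Elasticsearch itself only runs one snapshot per cluster at a time, which lines up with the client-side control mentioned above.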
Lucene Date Range Query in Kibana
Is there a way in Kibana or Lucene to define a date range query as "today minus 60 days"? Something along the logical lines of: visit_date:[*-60 TO *]
Re: Security of ES
Hi, Elasticsearch doesn't support any form of authentication or authorization at the moment. The way users usually deal with this is by giving access to Elasticsearch through a proxy that handles security based on the path of the URL. On Wed, Apr 30, 2014 at 5:56 PM, Patrick Proniewski elasticsea...@patpro.net wrote: Hello, As a BOfH, I'm quite used to providing auth-based access to IT resources. As CISO I must guarantee that users get only what they need, especially regarding sensitive content. Unfortunately I can't find anything about authentication and security in the ES documentation. It looks like the product is designed like memcached: it's there and free to use. Is there any way to provide some partitioning inside an ES cluster, so that we can share the cluster without sharing the data? thanks, Patrick -- Adrien Grand
Re: Security of ES
Yes. For now, you have to deal with security yourself. So: secure URLs using Nginx, for example, and use aliases, which let you expose an alias URL rather than the direct index URL. Use filters in aliases. Example: let's say you have a groupid field in your documents and a doc index. Doc A belongs to groupid marketing; doc B belongs to groupid finances. Create an alias marketing which uses the doc index with a prebuilt filter on groupid = marketing. Same for finances. Then secure your URLs using Nginx and let users access only the URLs (aliases) they should see. My 2 cents. -- David Pilato | Technical Advocate | Elasticsearch.com @dadoonet | @elasticsearchfr On 30 April 2014 at 17:56:10, Patrick Proniewski (elasticsea...@patpro.net) wrote: Hello, As a BOfH, I'm quite used to providing auth-based access to IT resources. As CISO I must guarantee that users get only what they need, especially regarding sensitive content. Unfortunately I can't find anything about authentication and security in the ES documentation. It looks like the product is designed like memcached: it's there and free to use. Is there any way to provide some partitioning inside an ES cluster, so that we can share the cluster without sharing the data? thanks, Patrick
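David's filtered-alias suggestion can be sketched as follows (the `docs` index and `groupid` values are hypothetical, taken from his example; syntax as of ES 1.x):

```json
POST /_aliases
{
  "actions" : [
    { "add" : { "index" : "docs", "alias" : "marketing",
                "filter" : { "term" : { "groupid" : "marketing" } } } },
    { "add" : { "index" : "docs", "alias" : "finances",
                "filter" : { "term" : { "groupid" : "finances" } } } }
  ]
}
```

The proxy then only exposes /marketing/_search to the marketing group and /finances/_search to finance; neither group ever sees the underlying /docs index.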
Re: ES and SAN storage
I think anyone will find it difficult to answer such questions, just because several factors drive the decision: latency requirements, high-availability requirements, how shared the SAN storage is, the impact of somebody stealing IO under the hood, etc. The best way is to develop a test model and test it out. Look at the cluster settings for how to disable/enable shard allocation. On Wed, Apr 30, 2014 at 8:47 AM, Patrick Proniewski elasticsea...@patpro.net wrote: Hello, I'm still testing ES at a very small scale (1 node on a multipurpose server), but I would like to extend its use at work as a backend for logstash. It means that the LS+ES cluster would have to ingest a few GB of data every day, up to 15 or 20 GB later if things go well. I'm doing all this as a side project: no investment apart from work hours. I will recycle blades and storage we plan to decommission from our virtualization farm. So I'm likely to end up with 2 or 3 dual-Xeon blades, but no real internal storage (an SD card), and a LUN on a SAN. How does ES behave in shared storage conditions? What are the best practices about nodes/shards/replicas/...? The intended audience is the operations team, so fewer than 10 people. So no big search concurrency, but probably mostly deep searches and ill-designed queries :) thanks, Patrick
Multiple or per field highlight type
I have a mapping where I set one field's term_vector to with_positions_offsets. I would then like to search with highlighting on all the fields; is that possible?
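One approach (my suggestion, not from the thread; the per-field `type` option exists in recent 0.90.x/1.x releases, so check your version): set the highlighter type per field in the `highlight` section, so the field with term vectors uses the fast vector highlighter while the others fall back to the plain highlighter. Field names here are hypothetical:

```json
{
  "query" : { "match" : { "_all" : "search terms" } },
  "highlight" : {
    "fields" : {
      "body"  : { "type" : "fvh" },
      "title" : { "type" : "plain" }
    }
  }
}
```

Without an explicit `type`, Elasticsearch picks the highlighter automatically, using the fast vector highlighter only where the field's term vectors allow it.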
Re: ES and SAN storage
Well, then maybe my questions were not precise enough. My first goal was to make sure ES works when sharing a unique storage volume across all nodes. My second goal was to learn whether each node requires its own dedicated file tree, or whether you can put all the files together as if there were only one ES node. Does it make sense to have replicas when the filesystem IOs are ultimately shared? Does moving a shard from one node to another pass the data through the CPU, or is ES smart enough to just hand over a pointer to the file? On 30 avr. 2014, at 18:33, Mohit Anchlia wrote: I think anyone will find it difficult to answer such questions, just because several factors drive the decision: latency requirements, high-availability requirements, how shared the SAN storage is, the impact of somebody stealing IO under the hood, etc. The best way is to develop a test model and test it out. Look at the cluster settings for how to disable/enable shard allocation. On Wed, Apr 30, 2014 at 8:47 AM, Patrick Proniewski elasticsea...@patpro.net wrote: Hello, I'm still testing ES at a very small scale (1 node on a multipurpose server), but I would like to extend its use at work as a backend for logstash. It means that the LS+ES cluster would have to ingest a few GB of data every day, up to 15 or 20 GB later if things go well. I'm doing all this as a side project: no investment apart from work hours. I will recycle blades and storage we plan to decommission from our virtualization farm. So I'm likely to end up with 2 or 3 dual-Xeon blades, but no real internal storage (an SD card), and a LUN on a SAN. How does ES behave in shared storage conditions? What are the best practices about nodes/shards/replicas/...? The intended audience is the operations team, so fewer than 10 people. So no big search concurrency, but probably mostly deep searches and ill-designed queries :) thanks, Patrick
Re: Security of ES
Thanks Adrien. On 30 avr. 2014, at 18:02, Adrien Grand wrote: Hi, Elasticsearch doesn't support any form of authentication or authorization at the moment. The way users usually deal with this is by giving access to Elasticsearch through a proxy that handles security based on the path of the URL. On Wed, Apr 30, 2014 at 5:56 PM, Patrick Proniewski elasticsea...@patpro.net wrote: Hello, As a BOfH, I'm quite used to providing auth-based access to IT resources. As CISO I must guarantee that users get only what they need, especially regarding sensitive content. Unfortunately I can't find anything about authentication and security in the ES documentation. It looks like the product is designed like memcached: it's there and free to use. Is there any way to provide some partitioning inside an ES cluster, so that we can share the cluster without sharing the data?
Re: Security of ES
Hmmm, OK, I'll have to think about this. I do get the proxy part; very easy, I've been doing that kind of stuff for eons. Now you write that I can discriminate URLs by injecting an arbitrary field into my data and creating an alias that names a prebuilt filter. I discovered aliases just 2 hours ago, so I'll have to dive in to understand exactly how this works, and in particular how it can be used in a logstash install. Thanks for the tip. On 30 avr. 2014, at 18:04, David Pilato wrote: Yes. For now, you have to deal with security yourself. So: secure URLs using Nginx, for example, and use aliases, which let you expose an alias URL rather than the direct index URL. Use filters in aliases. Example: let's say you have a groupid field in your documents and a doc index. Doc A belongs to groupid marketing; doc B belongs to groupid finances. Create an alias marketing which uses the doc index with a prebuilt filter on groupid = marketing. Same for finances. Then secure your URLs using Nginx and let users access only the URLs (aliases) they should see. My 2 cents. -- David Pilato | Technical Advocate | Elasticsearch.com @dadoonet | @elasticsearchfr On 30 April 2014 at 17:56:10, Patrick Proniewski (elasticsea...@patpro.net) wrote: Hello, As a BOfH, I'm quite used to providing auth-based access to IT resources. As CISO I must guarantee that users get only what they need, especially regarding sensitive content. Unfortunately I can't find anything about authentication and security in the ES documentation. It looks like the product is designed like memcached: it's there and free to use. Is there any way to provide some partitioning inside an ES cluster, so that we can share the cluster without sharing the data? thanks, Patrick
Re: ES and SAN storage
I'll try and answer as much as I know: ES shouldn't have any issues working with SAN, NFS or EBS. Yes, each node needs its own unique file path; nodes don't share files with other nodes. Replicas here only make sense if you are solving for a VM or node failure per se, or if your SAN storage comes from a different array. I don't follow your last question. On Wed, Apr 30, 2014 at 10:04 AM, Patrick Proniewski elasticsea...@patpro.net wrote: Well, then maybe my questions were not precise enough. My first goal was to make sure ES works when sharing a unique storage volume across all nodes. My second goal was to learn whether each node requires its own dedicated file tree, or whether you can put all the files together as if there were only one ES node. Does it make sense to have replicas when the filesystem IOs are ultimately shared? Does moving a shard from one node to another pass the data through the CPU, or is ES smart enough to just hand over a pointer to the file? On 30 avr. 2014, at 18:33, Mohit Anchlia wrote: I think anyone will find it difficult to answer such questions, just because several factors drive the decision: latency requirements, high-availability requirements, how shared the SAN storage is, the impact of somebody stealing IO under the hood, etc. The best way is to develop a test model and test it out. Look at the cluster settings for how to disable/enable shard allocation. On Wed, Apr 30, 2014 at 8:47 AM, Patrick Proniewski elasticsea...@patpro.net wrote: Hello, I'm still testing ES at a very small scale (1 node on a multipurpose server), but I would like to extend its use at work as a backend for logstash. It means that the LS+ES cluster would have to ingest a few GB of data every day, up to 15 or 20 GB later if things go well. I'm doing all this as a side project: no investment apart from work hours. I will recycle blades and storage we plan to decommission from our virtualization farm. So I'm likely to end up with 2 or 3 dual-Xeon blades, but no real internal storage (an SD card), and a LUN on a SAN. How does ES behave in shared storage conditions? What are the best practices about nodes/shards/replicas/...? The intended audience is the operations team, so fewer than 10 people. So no big search concurrency, but probably mostly deep searches and ill-designed queries :) thanks, Patrick
Significant Term aggregation
Hi: I have been trying to use (and successfully did use) the significant terms aggregation in release 1.1.0. The blog post about this feature, http://www.elasticsearch.org/blog/significant-terms-aggregation/, was extremely helpful. Since this feature is in an experimental stage and the authors had requested feedback, and since I don't know how else to provide feedback on specific features, I am resorting to posting on this group. I had posted on a different thread about accessing the TF-IDF scores for terms so that I could investigate ways to enhance my queries. This led me to look at the experimental significant terms aggregation. It does what it says quite well, and I am glad this functionality exists. However, I would like to suggest some possible enhancements. What I noticed in my aggregation results is a lot of stopwords (a, an, the, at, and, etc.) being included as significant terms. Perhaps allow stopword lists, so that these words are excluded from the significant-term calculation. (The significance is calculated based on how many times a term appears in the query result vs. how many times it appears in the whole index.) For common stopwords, this calculation is going to make them look very significant. Another possible enhancement would be phrase significance (multi-term rather than single-term significance). In the blog post, a similar effect is obtained by highlighting the terms identified as significant, but it would be nice to determine that just by looking at the buckets. Cheers, and thanks for all the fish. Ramdev
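A possible workaround for the stopword noise (my suggestion, not from the thread; check whether your release supports it): significant_terms accepts terms-agg-style value filtering, so an `exclude` regex can drop a stopword list from the buckets. Alternatively, index the field with an analyzer that strips stopwords. A hypothetical sketch:

```json
{
  "query" : { "match" : { "text" : "bird flu" } },
  "aggregations" : {
    "keywords" : {
      "significant_terms" : {
        "field" : "text",
        "exclude" : "a|an|and|at|the|of|to|in"
      }
    }
  }
}
```

The exclusion only hides the terms from the buckets; it does not change how the remaining terms are scored.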
Re: Lucene Date Range Query in Kibana
Lucene, and hence Elasticsearch, and hence Kibana, allow a date range to be queried as [NOW-60DAY TO NOW], similar to what you said. On Wednesday, 30 April 2014 10:37:33 UTC-5, Uli Bethke wrote: Is there a way in Kibana or Lucene to define a date range query as "today minus 60 days"? Something along the logical lines of: visit_date:[*-60 TO *]
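A note on syntax (my addition, not from the thread): in Elasticsearch's query_string date math, the anchor is lowercase `now` and the units are short suffixes such as `d` for days, so the query would typically be written:

```
visit_date:[now-60d TO now]
```

Kibana's time picker can also express a relative range like "last 60 days" without a hand-written query.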
Re: Substring match in search term order using Elasticsearch
what happens when you query as you indicated? Did you try a wildcard query? Also, perhaps an analyzer with the shingle token filter (http://www.elasticsearch.org/guide/en/elasticsearch/reference/current/analysis-shingle-tokenfilter.html#analysis-shingle-tokenfilter) would work better for your purposes? Ramdev

On Wednesday, 30 April 2014 09:15:35 UTC-5, Kruti Shukla wrote: I posted the same question on Stack Overflow (http://stackoverflow.com/questions/23244796/substring-match-in-search-term-order-using-elasticsearch) but am still looking for an answer. I'm new to Elasticsearch and I want to perform substring/partial word matching, with results returned in a particular order. To explain my problem I will show how I create my index and mappings, and the records I use.

Creating index and mappings:

PUT /my_index1
{
  "settings": {
    "analysis": {
      "filter": {
        "trigrams_filter": { "type": "ngram", "min_gram": 3, "max_gram": 3 }
      },
      "analyzer": {
        "trigrams": {
          "type": "custom",
          "tokenizer": "standard",
          "filter": [ "lowercase", "trigrams_filter" ]
        }
      }
    }
  },
  "mappings": {
    "my_type1": {
      "properties": {
        "text": { "type": "string", "analyzer": "trigrams" }
      }
    }
  }
}

Bulk record insert:

POST /my_index1/my_type1/_bulk
{ "index": { "_id": 1 }}
{ "text": "men's shaver" }
{ "index": { "_id": 2 }}
{ "text": "men's foil shaver" }
{ "index": { "_id": 3 }}
{ "text": "men's foil advanced shaver" }
{ "index": { "_id": 4 }}
{ "text": "norelco men's foil advanced shaver" }
{ "index": { "_id": 5 }}
{ "text": "men's shavers" }
{ "index": { "_id": 6 }}
{ "text": "women's shaver" }
{ "index": { "_id": 7 }}
{ "text": "women's foil shaver" }
{ "index": { "_id": 8 }}
{ "text": "women's foil advanced shaver" }
{ "index": { "_id": 9 }}
{ "text": "norelco women's foil advanced shaver" }
{ "index": { "_id": 10 }}
{ "text": "women's shavers" }

Now I want to search for "en's shaver", using the following query:

POST /my_index1/my_type1/_search
{
  "query": {
    "match": {
      "text": {
        "query": "en's shaver",
        "minimum_should_match": "100%"
      }
    }
  }
}

I want the results in the following sequence:

1. men's shaver -- closest match, same search keyword order as "en's shaver"
2. women's shaver -- closest match, same search keyword order as "en's shaver"
3. men's foil shaver -- distance increased by 1
4. women's foil shaver -- distance increased by 1
5. men's foil advanced shaver -- distance increased by 2
6. women's foil advanced shaver -- distance increased by 2
7. men's shavers -- substring match for "shavers"
8. women's shavers -- substring match for "shavers"

I'm currently performing the following query; it is not giving me results in the order I want:

POST /my_index1/my_type1/_search
{
  "query": {
    "query_string": {
      "default_field": "text",
      "query": "men's shaver",
      "minimum_should_match": "90%"
    }
  }
}

-- You received this message because you are subscribed to the Google Groups elasticsearch group. To unsubscribe from this group and stop receiving emails from it, send an email to elasticsearch+unsubscr...@googlegroups.com. To view this discussion on the web visit https://groups.google.com/d/msgid/elasticsearch/df570460-9e71-4c4b-9208-c5a7f467cde5%40googlegroups.com. For more options, visit https://groups.google.com/d/optout.
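Since the question turns on how the trigram analyzer tokenizes these strings, here is a minimal Python sketch of the analysis chain described in the post (lowercase plus character 3-grams). It only approximates the standard tokenizer by splitting on whitespace; it is not Lucene's actual implementation:

```python
def trigrams(token, n=3):
    """Emit the character n-grams the ngram filter would produce for one token."""
    token = token.lower()
    return [token[i:i + n] for i in range(len(token) - n + 1)]

def analyze(text):
    # crude stand-in for the standard tokenizer: split on whitespace
    grams = []
    for token in text.split():
        grams.extend(trigrams(token))
    return grams

query_grams = set(analyze("en's shaver"))
doc_grams = set(analyze("men's shaver"))

# every query trigram also occurs in the document, so even with
# minimum_should_match: 100% the document matches
print(query_grams <= doc_grams)  # True
```

This also hints at why the ordering is hard to control with a plain match query: all the documents above share most of the query's trigrams, so their relevance scores differ only slightly and not by keyword order.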
Cannot asynchronously update replica settings over many tables (ES 0.90.7)
Hi all, I am trying to grow my replicas from 0 to 2 across about 300 tables. I'm doing this by asynchronously issuing an UpdateSettingsRequest (through the Java client) for each table. The first 100 go through fine (each responding with an UpdateSettingsResponse), but the final ~200 fail with this exception: Failure is org.elasticsearch.transport.RemoteTransportException: [my-cluster][inet[/w.x.y.z:9300]][indices/settings/update] We're using ES version 0.90.7. Any ideas what might be clogging the pipes?
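For what it's worth, one thing to rule out is simply flooding the cluster with ~300 simultaneous settings requests. A hedged sketch of bounding the number of in-flight updates with a worker pool; `update_replicas` below is a hypothetical stand-in for the real client call, not an Elasticsearch API:

```python
from concurrent.futures import ThreadPoolExecutor

def update_replicas(index):
    # hypothetical stand-in for issuing one UpdateSettingsRequest;
    # the real call would go through the Java (or REST) client, e.g.
    # PUT /{index}/_settings with {"number_of_replicas": 2}
    return (index, 2)

indices = ["table-%03d" % i for i in range(300)]

# cap concurrency at 10 instead of submitting all 300 requests at once
with ThreadPoolExecutor(max_workers=10) as pool:
    results = list(pool.map(update_replicas, indices))

print(len(results))  # 300
```

If the failures disappear under a bounded pool, the original problem was likely queue exhaustion on the management side rather than anything wrong with the requests themselves.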
help with jdbc rivers and type mapping
i can't seem to understand how to fully set up my type mappings while using the JDBC river and SQL Server. Here's an example:

PUT /_river/mytest_river/_meta
{
  "type": "jdbc",
  "jdbc": {
    "url": "jdbc:sqlserver://mydbserver:1433;databaseName=mydatabase",
    "user": "myuser",
    "password": "xxx",
    "sql": "select * from dbo.musicalbum (nolock)",
    "strategy": "oneshot",
    "index": "myindex",
    "type": "album",
    "bulk_size": 100,
    "max_retries": 5,
    "max_retries_wait": "30s",
    "max_bulk_requests": 5,
    "bulk_flush_interval": "5s",
    "type_mapping": {
      "album": {
        "properties": {
          "AlbumDescription": { "type": "string" },
          "AlbumID": { "type": "string" },
          "Artist": { "type": "string" },
          "Genre": { "type": "string", "index": "not_analyzed" },
          "Label": { "type": "string" },
          "Title": { "type": "string" },
          "_id": { "path": "AlbumID" }
        }
      }
    }
  }
}

So you can see I've specified both a select statement (which normally would dynamically produce the mapping for me) and also a type mapping. In the type mapping I've tried to specify that I want the _id to be the same as AlbumID, and also that I want Genre to be not_analyzed. It ends up throwing multiple errors, only indexing one document, and not creating my full mapping. Here's what the mapping ends up looking like (skipping some of the columns altogether!):

{
  "myindex": {
    "mappings": {
      "album": {
        "properties": {
          "AlbumDescription": { "type": "string" },
          "AlbumID": { "type": "string" },
          "Artist": { "type": "string" },
          "Genre": { "type": "string" },
          "Title": { "type": "string" }
        }
      }
    }
  }
}

Any assistance would be helpful. It's driving me nuts.
Re: Sense on github abandoned?
Agree 100%. Sense must return to the Chrome Store! On Tuesday, 29 April 2014 11:52:49 UTC-3, Joshua Worden wrote: Would love to see this return to the Chrome store. I was rather surprised to see it gone when getting another developer started with Elasticsearch. Even if it was buggy, it was the best way to get started.
Re: ES and SAN storage
On 30 Apr 2014, at 19:34, Mohit Anchlia wrote: I'll try and answer as much as I know: ES shouldn't have any issues working with SAN, NFS or EBS. Yes, each node needs its own unique file path; nodes don't share files with other nodes. ok. Replicas in this case only make sense if you are solving for a VM or node failure per se. Or it also makes sense if you have SAN storage coming from a different array. ok. I don't follow your last question. My English is limited, sorry. As far as I understand ES, some shard balancing occurs in the background: when some shards are created or deleted, others will move from node to node so the number of shards stays even across nodes. When storage is isolated per node, moving a shard to another node requires the files to go through the source node's CPU/RAM, then the network, then the CPU/RAM of the remote node, then storage. It would be very nice in a shared-storage scenario if the shard were not moved through fs-cpu-ram-network-cpu-ram-fs but through a simple rename-and-tell action. Does that make sense? On Wed, Apr 30, 2014 at 10:04 AM, Patrick Proniewski elasticsea...@patpro.net wrote: Well, then maybe my questions were not precise enough. My first goal was to make sure ES does work sharing a unique storage for all nodes. My second goal was to learn whether each node requires its own dedicated file tree, or whether you can put all files together as if there were only one ES node. Does it make sense to have replicas when filesystem IOs are ultimately shared? Does moving a shard from one node to another make the data pass through the CPU, or is ES smart enough to just pass a pointer to the files?
Re: help with jdbc rivers and type mapping
Thanks for the report. Does it work if you create the index with the custom mapping beforehand, with a tool like curl? The JDBC river will use the existing index then. Jörg On Wed, Apr 30, 2014 at 9:56 PM, Eric Sims eric.sims.aent@gmail.com wrote: i can't seem to understand how to fully set up my type mappings while using jdbc rivers and sql server. [...]
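Following Jörg's suggestion of pre-creating the index, here is a sketch of what the create-index body could look like, built as a Python dict so quoting mistakes surface early. The field list mirrors the river's type_mapping in the quoted post; the 0.90-era `_id.path` syntax is an assumption based on that post:

```python
import json

# mapping body for the "album" type, created before registering the river
create_index_body = {
    "mappings": {
        "album": {
            "_id": {"path": "AlbumID"},  # derive each document id from AlbumID
            "properties": {
                "AlbumDescription": {"type": "string"},
                "AlbumID": {"type": "string"},
                "Artist": {"type": "string"},
                "Genre": {"type": "string", "index": "not_analyzed"},
                "Label": {"type": "string"},
                "Title": {"type": "string"},
            },
        }
    }
}

# serialized body for: curl -XPUT 'localhost:9200/myindex' -d @body.json
print(json.dumps(create_index_body, indent=2))
```

With the index created this way first, the river should only bulk-index documents and never touch the mapping.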
Re: The effect of multi-fields and copy_to on storage size
Ideas anyone?
Re: Sense on github abandoned?
Must is a strong word. I highlighted some alternatives earlier. On Apr 30, 2014 1:01 PM, @mromagnoli marce.romagn...@gmail.com wrote: Agree 100%. Sense must return to the Chrome Store! [...]
Re: Sense on github abandoned?
Yeah, maybe you are right. Anyway, I have installed Marvel and made a bookmark in Chrome with the URL to Sense. Perhaps I cried out too soon ;P On Wednesday, 30 April 2014 17:19:36 UTC-3, Ivan Brusic wrote: Must is a strong word. I highlighted some alternatives earlier. [...]
Re: help with jdbc rivers and type mapping
no. i just tried deleting all indexes, then I did:

PUT /myindex

then

PUT /myindex/album/_mapping
{
  "myindex": {
    "mappings": {
      "album": {
        "properties": {
          "AlbumDescription": { "type": "string" },
          "AlbumID": { "type": "string" },
          "Artist": { "type": "string" },
          "Genre": { "type": "string", "index": "not_analyzed" },
          "Label": { "type": "string" },
          "Title": { "type": "string" },
          "_id": { "path": "AlbumID" }
        }
      }
    }
  }
}

then I ran the PUT statement in my previous post. It still treats it as dynamic mappings. On Wednesday, April 30, 2014 3:56:22 PM UTC-4, Eric Sims wrote: i can't seem to understand how to fully set up my type mappings while using jdbc rivers and sql server. [...]
Re: ES and SAN storage
It would make sense, if only it were that simple :) The reason shards need to move through the higher levels of the stack is that every node maintains its own indexes (Lucene segments) and they can't just be switched over. And I think that is primarily because of how internal structures are maintained in Lucene. You might be able to develop a workaround using one or more of these settings: http://www.elasticsearch.org/guide/en/elasticsearch/reference/current/cluster-update-settings.html On Wed, Apr 30, 2014 at 1:05 PM, Patrick Proniewski elasticsea...@patpro.net wrote: On 30 avr. 2014, at 19:34, Mohit Anchlia wrote: [...]
Re: help with jdbc rivers and type mapping
The mapping has errors. Something like this might work better:

DELETE /myindex

PUT /myindex

PUT /myindex/album/_mapping
{
  "album": {
    "properties": {
      "AlbumDescription": { "type": "string" },
      "AlbumID": { "type": "string" },
      "Artist": { "type": "string" },
      "Genre": { "type": "string", "index": "not_analyzed" },
      "Label": { "type": "string" },
      "Title": { "type": "string" },
      "_id": { "index_name": "album.AlbumID", "path": "full", "type": "string" }
    }
  }
}

GET /myindex/album/_mapping

Jörg On Wed, Apr 30, 2014 at 10:34 PM, Eric Sims eric.sims.aent@gmail.com wrote: no. i just tried deleting all indexes [...]
Limit the amount of data generated by Marvel with marvel.agent.interval ?
I'm managing a pretty badass 11 node Elasticsearch cluster that is powering a customer facing dashboard reporting platform. 20 cores per node, 64GB RAM, SSDs, dual 10 GbE of awesome. I evaluated Marvel while we were still in development on the new platform and I found it to be a very valuable tool. At first Marvel was indexing to the same cluster we were monitoring, and this was okay while we were in development as there were plenty of extra cycles in the cluster to handle the load, but now that we are in production it doesn't make sense to burden the cluster with this. The nature of our reporting system requires us to have an index for each customer, so we're currently at 328 indexes and over 10,000 shards total. The amount of data indexed by Marvel increases dramatically as the number of indices increases, so once we got over 300 indices the daily Marvel index ended up at around 400 GB replicated and was indexing around 2,000 documents a second by itself. What I want to do is have Marvel index to a not-as-awesome 2 node Elasticsearch monitoring cluster: 12 cores, 64 GB RAM and spinning disks. But in practice these 2 nodes are unable to keep up with the load and get completely bogged down. I'm thinking I can sacrifice redundancy and buy myself some cycles by not using any replicas on the Marvel index. My other idea is to raise marvel.agent.interval from the default 10s to something like 30s, on the assumption that this will cut the amount of data generated to a third. Does this sound sane, or does anyone have other ideas on what I can try to limit the load? marvel.agent.interval Controls the interval between data samples. Defaults to 10s. Set to -1 to temporarily disable exporting. This setting is update-able via the Cluster Update Settings API. Thanks -Logan-
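The expected effect of the interval change is easy to estimate: sampling every 30s instead of every 10s keeps one sample in three, so the data volume should drop roughly in proportion. A back-of-the-envelope sketch using the figures quoted above (400 GB/day, ~2,000 docs/sec):

```python
daily_gb = 400          # observed daily Marvel index size (replicated)
docs_per_sec = 2000     # observed Marvel indexing rate

old_interval, new_interval = 10, 30   # marvel.agent.interval, in seconds
scale = old_interval / new_interval   # fraction of samples kept

print(round(daily_gb * scale))        # ~133 GB/day
print(round(docs_per_sec * scale))    # ~667 docs/sec
```

This is only a proportional estimate; per-document overhead and the number of indices being sampled also matter, so the real saving may differ.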
Re: help with jdbc rivers and type mapping
here's another weird bit. It doesn't seem to show the mappings right after I set them:

PUT /myindex/album/_mapping
{
  "myindex": {
    "mappings": {
      "album": {
        "properties": {
          "albumdescription": { "type": "string" },
          "albumid": { "type": "string" },
          "artist": { "type": "string" },
          "genre": { "type": "string", "index": "not_analyzed" },
          "label": { "type": "string", "analyzer": "whitespace" },
          "title": { "type": "string" },
          "time": { "type": "string" },
          "_id": { "index_name": "album.AlbumID", "path": "full", "type": "string" }
        }
      }
    }
  }
}

GET /myindex/album/_mapping

returns this:

{
  "myindex": {
    "mappings": {
      "album": {
        "properties": {}
      }
    }
  }
}
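One plausible explanation for the empty result is the shape of the PUT body: the put-mapping API expects the type name as the top-level key, while the body above wraps it in index-name and "mappings" keys, which is the shape that GET /_mapping returns, not the shape PUT expects. A small sketch of the difference; whether this particular ES 0.90 build silently ignores the extra wrapper (rather than erroring) is an assumption worth verifying:

```python
properties = {"title": {"type": "string"}}

# shape that was sent: wrapped the way GET /_mapping *returns* mappings
sent = {"myindex": {"mappings": {"album": {"properties": properties}}}}

# shape the put-mapping API expects: the type name at the top level
expected = {"album": {"properties": properties}}

print(list(expected) == ["album"])  # True
```

Stripping the outer wrapper so the body starts at the "album" key, as in Jörg's earlier example, is the first thing to try.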
Re: Limit the amount of data generated by Marvel with marvel.agent.interval ?
That's pretty sane. I believe the newest version of Marvel increased the default interval from 5s to 10s. But be aware, you are breaking the license for Marvel with that number of nodes - http://www.elasticsearch.org/overview/marvel/ Regards, Mark Walkom Infrastructure Engineer Campaign Monitor email: ma...@campaignmonitor.com web: www.campaignmonitor.com On 1 May 2014 06:52, Logan Hardy loganbha...@gmail.com wrote: I'm managing a pretty badass 11 node Elasticsearch cluster that is powering a customer facing dashboard reporting platform. [...]
Re: Performance of Indexed-Shape Queries Vs Geoshape Queries
Hi Alex, Thanks for your response. Does this mean that the shape I query by does not need to be indexed by Elasticsearch on the fly? Or does it mean that the indexing of the shape is so quick it does not affect the query latency? Thank you, Ilya. On 21 April 2014 22:46, Alexander Reelsen a...@spinscale.de wrote: Hey, the main difference is basically the network overhead. What happens behind the curtains is that a GET request for the shape is executed if you specify it in the request, and then this shape is used as if it had been provided inline. Makes sense? --Alex On Tue, Apr 15, 2014 at 6:50 AM, ipari...@thoughtworks.com wrote: Hi, We ran tests comparing the performance of indexed-shape queries to custom geoshape queries. We found that Elasticsearch yielded roughly the same results in both cases. We expected indexed-shape queries to be faster than custom geoshape queries; our understanding is that Elasticsearch has to convert the custom geoshapes to a quadtree on the fly, as opposed to having it pre-generated. I was wondering if anyone could let us know why there is no difference in performance between these two query types.

Experiment Design

We indexed suburb boundary geometries into one doctype, and geocoded points of interest (POIs) into another. We picked the top 20 suburbs with the geometries that have the most vertices, and ran the two following queries for each suburb geometry.

Geoshape Query

GET /spike_index/doc_type_pois/_search
{
  "query": {
    "geo_shape": {
      "field_geocode": {
        "shape": {
          "type": "polygon",
          "coordinates": [ suburb multipolygon ]
        }
      }
    }
  }
}

Indexed-Shape Query

GET /spike_index/doc_type_pois/_search
{
  "query": {
    "geo_shape": {
      "field_geocode": {
        "indexed_shape": {
          "id": "pre-indexed-geometry-id",
          "type": "doc_type_suburb_quadtree",
          "index": "spike_index",
          "path": "field_geometry"
        }
      }
    }
  }
}

The test was carried out using Siege from a box located within the same VPC as the Elasticsearch instances. Please find the results below.

Indexed-Shape Query Results

Transactions: 749559 hits
Availability: 100.00 %
Elapsed time: 602.80 secs
Data transferred: 10342.97 MB
Response time: 0.01 secs
Transaction rate: 1243.46 trans/sec
Throughput: 17.16 MB/sec
Concurrency: 14.92
Successful transactions: 749559
Failed transactions: 0
Longest transaction: 5.01
Shortest transaction: 0.00

Geoshape Query Results

Transactions: 723894 hits
Availability: 100.00 %
Elapsed time: 599.16 secs
Data transferred: 9988.83 MB
Response time: 0.01 secs
Transaction rate: 1208.18 trans/sec
Throughput: 16.67 MB/sec
Concurrency: 14.92
Successful transactions: 723894
Failed transactions: 0
Longest transaction: 1.02
Shortest transaction: 0.00

If anyone could shed some light on why the results of these queries are the same, that would be very helpful.
Re: how to aggregate by metadata (types/field names)?
bump I'm new to Elasticsearch, considering a move from a proprietary system... I'm blocked on the fact that I can't get a list of field hits per document as part of the search results... Any help, any clue?