inconsistent paging

2014-08-18 Thread Ron Sher
Hi,

We've noticed a strange behavior in elasticsearch during paging.

In one case we use a paging size of 60 and we have 63 documents. So the
first page is using size 60 and offset 0. The second page is using size 60
and offset 60. What we see is that the result is inconsistent. Meaning, on
the 2nd page, we sometimes get results that were before in the 1st page.

The query we use orders by a numeric field for which many documents share
the same value (0).
It looks like the ordering between documents that share that value isn't
consistent.

Did anyone encounter such behavior? Any suggestions on resolving this?

We're using version 1.3.1.

Thanks,
Ron

-- 
You received this message because you are subscribed to the Google Groups 
elasticsearch group.
To unsubscribe from this group and stop receiving emails from it, send an email 
to elasticsearch+unsubscr...@googlegroups.com.
To view this discussion on the web visit 
https://groups.google.com/d/msgid/elasticsearch/CAKHuyJpcYKepYzh%2BBU2MSD2RQ19zjHYiXgf3anWBL9esq9fkGQ%40mail.gmail.com.
For more options, visit https://groups.google.com/d/optout.


Re: inconsistent paging

2014-08-18 Thread David Pilato
You need to use scroll if you have that requirement.

See: 
http://www.elasticsearch.org/guide/en/elasticsearch/reference/current/search-request-scroll.html#search-request-scroll
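For illustration, a scrolled search in 1.x looks roughly like this (index name, sort field, and the scroll id are placeholders):

```shell
# Open a scroll; each subsequent call returns the next page from a
# consistent point-in-time view of the index
curl -XGET 'localhost:9200/myindex/_search?scroll=1m' -d '{
  "query": { "match_all": {} },
  "sort": [ { "somefield": "asc" } ],
  "size": 60
}'
# Fetch the next page by passing back the _scroll_id from the previous response
curl -XGET 'localhost:9200/_search/scroll?scroll=1m' -d '<_scroll_id from previous response>'
```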

--
David ;-)
Twitter : @dadoonet / @elasticsearchfr / @scrutmydocs




Re: inconsistent paging

2014-08-18 Thread Adrien Grand
Hi Ron,

The cause of this issue is that Elasticsearch uses Lucene's internal doc
IDs as tie-breakers. Internal doc IDs might be completely different across
replicas of the same data, so this explains why documents that have the
same sort values are not consistently ordered.

There are two potential ways to fix this problem:
 1. Use scroll, as David mentioned. It creates a context around your
request and makes sure that the same shards are used for all pages.
However, it also provides another guarantee, namely that the same
point-in-time view of the index is used for each page, and this is
expensive to maintain.
 2. Use a custom string value as a preference in order to always hit the
same shards for a given session[1]. This helps with always hitting the
same shards, similar to option 1, but without the additional cost of a
scroll.

[1]
http://www.elasticsearch.org/guide/en/elasticsearch/reference/current/search-request-preference.html
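For example, using a session identifier as the preference value (index name, sort field, and session string are illustrative):

```shell
# Passing the same preference string for every page of a session routes
# all of that session's requests to the same shard copies
curl -XGET 'localhost:9200/myindex/_search?preference=session-42' -d '{
  "query": { "match_all": {} },
  "sort": [ { "somefield": "asc" } ],
  "from": 60,
  "size": 60
}'
```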





-- 
Adrien Grand



Re: Help with the percentiles aggregation

2014-08-18 Thread Adrien Grand
Hi John,

You should be able to do something like:

{
  "aggs": {
    "verb": {
      "terms": {
        "field": "verb"
      },
      "aggs": {
        "load_time_outliers": {
          "percentiles": {
            "field": "responsetime"
          }
        }
      }
    }
  }
}

This will first break down your documents according to the http verb that
is being used and then compute percentiles separately for each unique verb.
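Put together with the curl invocation from the original message, the full request would look something like:

```shell
curl -XPOST 'http://localhost:9200/_search?search_type=count&pretty=true' -d '{
  "aggs": {
    "verb": {
      "terms": { "field": "verb" },
      "aggs": {
        "load_time_outliers": {
          "percentiles": { "field": "responsetime" }
        }
      }
    }
  }
}'
```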



On Fri, Aug 15, 2014 at 11:23 AM, John Ogden johnog65...@gmail.com wrote:

 Hi,

 Am trying to run a single command which calculates percentiles for
 multiple search queries.
 The data for this is an Apache log file, and I want to get the percentile
 response times for the gets, posts, heads (etc) in one go

 If I run this:
 curl -XPOST 'http://localhost:9200/_search?search_type=count&pretty=true' -d '{
   "facets": {
     "0": {"query": {"term": {"verb": "get"}}},
     "1": {"query": {"term": {"verb": "post"}}}
   },
   "aggs": {"load_time_outlier": {"percentiles": {"field": "responsetime"}}}
 }'

 The response I get back has the counts for each subquery but only does the
 aggregations for the overall dataset:

   "facets" : {
     "0" : {
       "_type" : "query",
       "count" : 5678
     },
     "1" : {
       "_type" : "query",
       "count" : 1234
     }
   },
   "aggregations" : {
     "load_time_outlier" : {
       "values" : {
         "1.0" : 0.0,
         ...
         "99.0" : 1234
       }
     }
   }

 I can't figure out how to structure the request so that I get the
 percentiles separately for each of the queries.

 Could someone point me in the right direction please.

 Many thanks
 John





-- 
Adrien Grand



Re: impact of stored fields on performance

2014-08-18 Thread Adrien Grand
Hi Ashish,

On Thu, Aug 14, 2014 at 12:35 AM, Ashish Mishra laughingbud...@gmail.com
wrote:

 That sounds possible.  We are using spindle disks.  I have ~36Gb free for
 the filesystem cache, and the previous data size (without the added field)
 was 60-65Gb per node.  So it's likely that 50% of queries were previously
 addressed out of the FS cache, even more if queries are unevenly
 distributed.
 Data size is now 200Gb/node.  So only ~18% of queries could hit the cache
 and the rest would incur seek times.

 Hmm... given this knowledge, is there a way to mitigate the effect without
 moving everything to SSD?  Only a minority of queries return the stored
 field and it is not indexed.  Ideally, it would be stored in separate
 (colocated) files from the indexed fields.  That way, most queries would be
 unaffected and only those returning the value incur the seek cost.

 I imagine indexes with _source enabled would see similar effects.

 Is a parent-child relationship a good way to achieve the scenario above?
  The parent can contain indexed fields and the child has stored fields.
 Not sure if this just introduces new problems.


I think that you don't even need parent/child relations for this. If you
identify a few large stored fields that you rarely need, you could store
them in a different index with the same _id and only GET them on demand.
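A sketch of that approach (index, type, and field names are hypothetical):

```shell
# Index the frequently queried fields in one index...
curl -XPUT 'localhost:9200/docs/doc/1' -d '{ "title": "foo", "body": "searchable text" }'
# ...and the rarely needed large stored field in a second index under the same id
curl -XPUT 'localhost:9200/docs_blobs/doc/1' -d '{ "payload": "large, rarely used value" }'
# Searches only touch the small index; fetch the big field on demand with a GET
curl -XGET 'localhost:9200/docs_blobs/doc/1'
```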


-- 
Adrien Grand



Re: accessing field data faster in script

2014-08-18 Thread Adrien Grand
Script filters are inherently slow due to the fact that they cannot
leverage the inverted index in order to skip efficiently over non-matching
documents. Even if they were written in assembly, this would likely still
be slow.

What kind of filtering are you trying to do with scripts?


On Thu, Aug 14, 2014 at 8:42 AM, avacados kotadia.ak...@gmail.com wrote:

 How can I access field data faster from a native (Java) script? Should I
 enable 'doc values'?

 I am already using doc().getField() and casting to long. It is a date
 field type. But whenever my argument to the script changes, the search
 query performs poorly. A subsequent call with the same argument performs
 well (might be because _cache is true for that script filter).

 Thanks.






-- 
Adrien Grand



Re: Return selected fields from aggregation?

2014-08-18 Thread Adrien Grand
Can you elaborate more on what you are after?


On Wed, Aug 13, 2014 at 5:16 PM, project2501 darreng5...@gmail.com wrote:

 The old facet DSL was very nice and easy to understand. I could declare
 only which fields I wanted returned.

 How is this done with aggregations? The docs do not say.

 I am only interested in the aggregation metrics, not all the document
 results.

 I tried setting "size": 0 but that DOES NOT EVEN WORK.

 Any help appreciated.

 Thank you,
 D





-- 
Adrien Grand



inconsistent paging

2014-08-18 Thread ronsher
We've noticed a strange behavior in elasticsearch during paging.

In one case we use a paging size of 60 and we have 63 documents. So the
first page is using size 60 and offset 0. The second page is using size 60
and offset 60.

What we see is that the result is inconsistent. Meaning, on the 2nd page, we
sometimes get results that were before in the 1st page.

The query we use has an order by some numeric field that has many documents
with the same value (0).
It looks like the ordering between documents according to the same value,
which is 0, isn't consistent.

Did anyone encounter such behavior? Any suggestions on resolving this?

We're using version 1.3.1.

Thanks,
Ron



--
View this message in context: 
http://elasticsearch-users.115913.n3.nabble.com/inconsistent-paging-tp4061986.html
Sent from the ElasticSearch Users mailing list archive at Nabble.com.



Re: Access to AbstractAggregationBuilder.name

2014-08-18 Thread Adrien Grand
Hi Phil,

We would indeed consider a PR for that change if it makes things easier for
you. Feel free to ping me when you open it so that I don't miss it.


On Wed, Aug 13, 2014 at 3:55 PM, Phil Wills otherp...@gmail.com wrote:

 Hello,

 In the Java API AbstractAggregationBuilder's name property is protected.
 Is there a particular reason it can't be public, or have an accessor added,
 or is this something you'd consider a PR for?

 Not having access is making things more complicated than I'd like.

 Thanks,

 Phil





-- 
Adrien Grand



Re: inconsistent paging

2014-08-18 Thread vineeth mohan
You have asked the same question from another Gmail ID.
Please refer to the answers over there.

Thanks
   Vineeth





Re: accessing field data faster in script

2014-08-18 Thread avacados
Thanks, Adrien, for the reply.


My script filter was:
===
{
  "script": {
    "script": "xyz",
    "params": {
      "startRange": 1407939675,   // timestamp in milliseconds ... keeps changing across queries
      "endRange": 1410531675      // timestamp in milliseconds ... keeps changing across queries
    },
    "lang": "native",
    "_cache": true   // I removed this caching and I found a significant performance improvement... do you know why? :-)
  }
},

===
My native (Java) script code   // returns true if the date ranges overlap

===

ScriptDocValues XsDocValue = (ScriptDocValues) doc().get("start_time");
long XsLong = 0L;
if (XsDocValue != null && !XsDocValue.isEmpty()) {
    XsLong = ((ScriptDocValues.Longs) doc().get("start_time")).getValue();
}

ScriptDocValues XeDocValue = (ScriptDocValues) doc().get("end_time");
long XeLong = 0L;
if (XeDocValue != null && !XeDocValue.isEmpty()) {
    XeLong = ((ScriptDocValues.Longs) doc().get("end_time")).getValue();
}

if ((endRange >= XsLong) && (startRange <= XeLong)) {
    return true;
}

===




Using a char_filter in combination with a lowercase filter

2014-08-18 Thread Matthias Hogerheijde
Hi,

We're using Elasticsearch with an analyzer that maps the `y` character to
`ij` (a *char_filter* named char_mapper), since in Dutch these two are
somewhat interchangeable. We're also using a *lowercase filter*.

This is the configuration:

{
  "analysis": {
    "analyzer": {
      "index": {
        "type": "custom",
        "tokenizer": "standard",
        "filter": [
          "lowercase",
          "synonym_twoway",
          "standard",
          "asciifolding"
        ],
        "char_filter": [
          "char_mapper"
        ]
      },
      "index_prefix": {
        "type": "custom",
        "tokenizer": "standard",
        "filter": [
          "lowercase",
          "synonym_twoway",
          "standard",
          "asciifolding",
          "prefixes"
        ],
        "char_filter": [
          "char_mapper"
        ]
      },
      "search": {
        "alias": [
          "default"
        ],
        "type": "custom",
        "tokenizer": "standard",
        "filter": [
          "lowercase",
          "synonym",
          "synonym_twoway",
          "standard",
          "asciifolding"
        ],
        "char_filter": [
          "char_mapper"
        ]
      },
      "postal_code": {
        "tokenizer": "keyword",
        "filter": [
          "lowercase"
        ]
      }
    },
    "tokenizer": {
      "standard": {
        "stopwords": []
      }
    },
    "filter": {
      "synonym": {
        "type": "synonym",
        "synonyms": [
          "st => sint",
          "jp => jan pieterszoon",
          "mh => maarten harpertszoon"
        ]
      },
      "synonym_twoway": {
        "type": "synonym",
        "synonyms": [
          "den haag, s gravenhage",
          "den bosch, s hertogenbosch"
        ]
      },
      "prefixes": {
        "type": "edgeNGram",
        "side": "front",
        "min_gram": 1,
        "max_gram": 30
      }
    },
    "char_filter": {
      "char_mapper": {
        "type": "mapping",
        "mappings": [
          "y => ij"
        ]
      }
    }
  }
}

When indexing cities, we're using this mapping:

{
  "properties": {
    "city": {
      "type": "multi_field",
      "fields": {
        "city": {
          "type": "string"
        },
        "prefix": {
          "type": "string",
          "boost": 0.5,
          "index_analyzer": "index_prefix"
        }
      }
    },
    "province_code": {
      "type": "string"
    },
    "unique_name": {
      "type": "boolean"
    },
    "point": {
      "type": "geo_point"
    },
    "search_terms": {
      "type": "multi_field",
      "fields": {
        "search_terms": {
          "type": "string"
        },
        "prefix": {
          "boost": 0.5,
          "index_analyzer": "index_prefix",
          "type": "string"
        }
      }
    }
  },
  "search_analyzer": "search",
  "index_analyzer": "index"
}

When we index all the (Dutch) cities from our data-source, there are cities
starting with both `IJ` and `Y`. (For example, these city names exist:
*IJssel*, *IJsselstein*, *Yerseke* and *Ysselsteyn*.) It seems that these
characters are not lowercased before the char_mapping is applied.

Querying the index results in:

/top/city/_search?q=ijsselstein - works, returns the document for IJsselstein
/top/city/_search?q=Ijsselstein - works, returns the document for IJsselstein
/top/city/_search?q=yerseke - *doesn't* work, returns nothing
/top/city/_search?q=Yerseke - *does* work, returns the document for Yerseke
/top/city/_search?q=YsselsteYn - *doesn't* work, returns nothing
/top/city/_search?q=Ysselsteyn - *does* work, returns the document for Ysselsteyn

Changing the case of any other letter doesn't affect the results.

I've worked around this issue by adding the mapping "Y => ij", i.e.:

"char_filter": {
  "char_mapper": {
    "type": "mapping",
    "mappings": [
      "y => ij",
      "Y => ij"
    ]
  }
}

This solves the problem, but I'd rather see that the lowercase filter is 
applied before the mapping, or, that I can make the order explicit. Is 
there any stance on this issue? Or is this intended behaviour?
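One way to observe the effect directly is the _analyze API, which shows the tokens an analyzer produces (assuming the analyzer above is installed on the `top` index):

```shell
# Compare the tokens produced for the two spellings; with only the one-way
# "y => ij" mapping, "Yerseke" keeps its uppercase Y through the char_filter
curl -XGET 'localhost:9200/top/_analyze?analyzer=index&pretty' -d 'Yerseke'
curl -XGET 'localhost:9200/top/_analyze?analyzer=index&pretty' -d 'yerseke'
```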

Regards,
Matthias Hogerheijde





Re: accessing field data faster in script

2014-08-18 Thread Adrien Grand
Your filter would be faster if you used range filters on the start/end
dates instead of using a script.

On Mon, Aug 18, 2014 at 10:52 AM, avacados kotadia.ak...@gmail.com wrote:

 "_cache": true   // I removed this caching and I found significant
 performance improvement... do you know why? :-)


Yes: when caching a filter, it needs to be evaluated over all documents of
your index in order to be loaded into a bit set. On the other hand, when a
script filter is not cached it will typically only be evaluated on
documents that match the query.
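A sketch of the equivalent range-filter version (index name hypothetical, field names and timestamps as in the original script filter):

```shell
# Two ranges [startRange, endRange] and [start_time, end_time] overlap iff
# start_time <= endRange AND end_time >= startRange
curl -XGET 'localhost:9200/myindex/_search' -d '{
  "query": {
    "filtered": {
      "filter": {
        "bool": {
          "must": [
            { "range": { "start_time": { "lte": 1410531675 } } },
            { "range": { "end_time":   { "gte": 1407939675 } } }
          ]
        }
      }
    }
  }
}'
```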

-- 
Adrien Grand



Exception in suggester response

2014-08-18 Thread makr
Hi!
I'm trying out the elasticsearch suggester, but I get a strange error.

user@user:/user/esconfig # curl -X POST 'localhost:9200/dwh_direct/_suggest?pretty' -d @suggester
{
  "_shards" : {
    "total" : 5,
    "successful" : 0,
    "failed" : 5,
    "failures" : [ {
      "index" : "dwh_direct",
      "shard" : 0,
      "reason" : "BroadcastShardOperationFailedException[[dwh_direct][0] ]; nested: ElasticsearchException[failed to execute suggest]; nested: ClassCastException[org.elasticsearch.index.mapper.core.StringFieldMapper cannot be cast to org.elasticsearch.index.mapper.core.CompletionFieldMapper]; "
    }, {
      "index" : "dwh_direct",
      "shard" : 1,
      "reason" : "BroadcastShardOperationFailedException[[dwh_direct][1] ]; nested: ElasticsearchException[failed to execute suggest]; nested: ClassCastException[org.elasticsearch.index.mapper.core.StringFieldMapper cannot be cast to org.elasticsearch.index.mapper.core.CompletionFieldMapper]; "
    }, {
      "index" : "dwh_direct",
      "shard" : 2,
      "reason" : "BroadcastShardOperationFailedException[[dwh_direct][2] ]; nested: ElasticsearchException[failed to execute suggest]; nested: ClassCastException[org.elasticsearch.index.mapper.core.StringFieldMapper cannot be cast to org.elasticsearch.index.mapper.core.CompletionFieldMapper]; "
    }, {
      "index" : "dwh_direct",
      "shard" : 3,
      "reason" : "BroadcastShardOperationFailedException[[dwh_direct][3] ]; nested: ElasticsearchException[failed to execute suggest]; nested: ClassCastException[org.elasticsearch.index.mapper.core.StringFieldMapper cannot be cast to org.elasticsearch.index.mapper.core.CompletionFieldMapper]; "
    } ]
  }
}

user@user:/user/esconfig # more suggester
{
  "my-suggest" : {
    "text" : "co",
    "completion" : {
      "field" : "name"
    }
  }
}

Is this a bug in elasticsearch, or did I make a mistake in the configuration or query?
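For what it's worth, the ClassCastException suggests the `name` field is mapped as a plain string rather than as a completion field; the completion suggester requires a mapping along these lines (type name hypothetical):

```shell
curl -XPUT 'localhost:9200/dwh_direct/mytype/_mapping' -d '{
  "mytype": {
    "properties": {
      "name": {
        "type": "completion"
      }
    }
  }
}'
```

Note that an existing field cannot be changed from string to completion in place; the index would have to be recreated (or a separate suggest field added) and the data reindexed.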


--

Maxim Krasovsky



Enhancing perf for my cluster

2014-08-18 Thread Pierrick Boutruche
Hi everyone!

I'm currently working on a tool with *ES and the Twitter Streaming API*, in
which I try to find interesting profiles on Twitter, based on what they
tweet, RT, and which of their interactions are shared/RT'd.

Anyway, I use ES to index and search among tweets. To do that, I take the
Twitter stream data and put it in a *single index with users & tweets (2
types)*, linked by the user id via a parent-child relation. I thought about
my indexing a lot, and this is the best way to do it:
- I need to update users very often (because I score them and because they
update their profiles quite often), so nesting the user in the tweet is not
an option (too many duplicated copies).
- I could put a user's tweets directly in the user object, but I would end
up with huge objects and I don't really want that.

I work on a SoYouStart server: 4c/4t 3.2 GHz, 32 GB RAM, 4 TB HDD.

My settings for the index are:

settings = {
    "index": {
        "number_of_replicas": 0,
        "refresh_interval": "10s",
        "routing.allocation.disable_allocation": False
    },
    "analysis": {
        "analyzer": {
            "snowFrench": {
                "type": "snowball",
                "language": "French"
            },
            "snowEnglish": {
                "type": "snowball",
                "language": "English"
            },
            "snowGerman": {
                "type": "snowball",
                "language": "German"
            },
            "snowRussian": {
                "type": "snowball",
                "language": "Russian"
            },
            "snowSpanish": {
                "type": "snowball",
                "language": "Spanish"
            },
            "snowJapanese": {
                "type": "snowball",
                "language": "Japanese"
            },
            "edgeNGramAnalyzer": {
                "tokenizer": "myEdgeNGram"
            },
            "name_analyzer": {
                "tokenizer": "whitespace",
                "type": "custom",
                "filter": ["lowercase", "multi_words", "name_filter"]
            },
            "city_analyzer": {
                "type": "snowball",
                "language": "English"
            }
        },
        "tokenizer": {
            "myEdgeNGram": {
                "type": "edgeNGram",
                "min_gram": 2,
                "max_gram": 5
            },
            "name_tokenizer": {
                "type": "edgeNGram",
                "max_gram": 100,
                "min_gram": 4
            }
        },
        "filter": {
            "multi_words": {
                "type": "shingle",
                "min_shingle_size": 2,
                "max_shingle_size": 10
            },
            "name_filter": {
                "type": "edgeNGram",
                "max_gram": 100,
                "min_gram": 4
            }
        }
    }
}


And my mappings are:

tweet_mapping = {
    "_all": {"enabled": False},
    "_ttl": {"enabled": True, "default": "400d"},
    "_parent": {"type": "user"},
    "properties": {
        "textfr": {"type": "string", "_analyzer": "snowFrench", "copy_to": "text"},
        "texten": {"type": "string", "_analyzer": "snowEnglish", "copy_to": "text"},
        "textde": {"type": "string", "_analyzer": "snowGerman", "copy_to": "text"},
        "textja": {"type": "string", "_analyzer": "snowJapanese", "copy_to": "text"},
        "textru": {"type": "string", "_analyzer": "snowRussian", "copy_to": "text"},
        "textes": {"type": "string", "_analyzer": "snowSpanish", "copy_to": "text"},
        "text": {"type": "string", "null_value": "", "index": "analyzed", "store": "yes"},
        "entities": {
            "type": "object",
            "index": "analyzed",
            "store": "yes",
            "properties": {
                "hashtags": {
                    "index": "analyzed",
                    "store": "yes",
                    "type": "string",
                    "_analyzer": "edgeNGramAnalyzer"
                },
                "mentions": {
                    "index": "not_analyzed",
                    "store": "yes",
                    "type": "long",
                    "precision_step": 64
                }
            }
        },
        "lang": {"index": "not_analyzed", "store": "yes", "type": "string"},
        "created_at": {
            "index": "not_analyzed",
            "store": "yes",
            "type": "date",
            "format": "dd-MM- HH:mm:ss"
        }
    }
}

user_mapping = {
    "_all": {"enabled": False},
    "_ttl": {"enabled": True, "default": "600d"},
    "properties": {
        "lang": {"index": "not_analyzed", "store": "yes", "type": "string"},
        "name": {
            "index": "analyzed",
            "store": "yes",
            "type": "string",
            "_analyzer": "edgeNGramAnalyzer"
        },
        "screen_name": {
            "index": "analyzed",
            "store": "yes",
            "type": "string",
            "_analyzer": "edgeNGramAnalyzer"
        },
        "descfr": {"type": "string", "_analyzer": "snowFrench", "copy_to": "description"},
        "descen": {"type": "string", "_analyzer": "snowEnglish", "copy_to": "description"},
        "descde": {"type": "string", "_analyzer": "snowGerman", "copy_to": "description"},
        "descja": {"type": "string", "_analyzer": "snowJapanese", "copy_to": "description"},
        "descru": {"type": "string", "_analyzer": "snowRussian", "copy_to": "description"},
        "desces": {"type": "string", "_analyzer": "snowSpanish", "copy_to": "description"},
        "description": {"type": "string", "null_value": "", "index": "analyzed", "store": "yes"},
        "created_at": {
            "index": "not_analyzed",
            "store": "yes",
            "type": "date",

elasticsearch php api with multiple hosts

2014-08-18 Thread Niv Penso


I followed this link to create a two-node elasticsearch cluster on Azure:
http://thomasardal.com/running-elasticsearch-in-a-cluster-on-azure/

The installation and configuration went fine.

When I started to check the cluster I found some strange behaviour in the
php client.

I declared 2 hosts in the client:

$ELSEARCH_SERVER = array("dns1:9200", "dns2:9200");
$params = array();
$params['hosts'] = $ELSEARCH_SERVER;
$dstEl = new Elasticsearch\Client($params);

The expected behaviour is that the client will try to insert the documents into
dns1 and, if that fails, *automatically* switch to dns2. But for some
reason, when one of the servers is down during insertion, the php client
simply throws an exception saying it couldn't connect to the host.

Is there any way to cause the client automatically choose an online server?
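For what it's worth, the failover pattern being described — try each configured host before giving up — can be sketched in a few lines. This is plain Python with a stand-in send() function, not the actual PHP client API:

```python
def send_with_failover(hosts, payload, send):
    # Try each host in turn; give up only when every host has failed.
    errors = {}
    for host in hosts:
        try:
            return send(host, payload)
        except ConnectionError as exc:
            errors[host] = exc  # remember the failure and move on
    raise ConnectionError("all hosts failed: %s" % sorted(errors))

def fake_send(host, payload):
    # Stand-in transport: pretend dns1 is down.
    if host == "dns1:9200":
        raise ConnectionError("connection refused")
    return "indexed on " + host

print(send_with_failover(["dns1:9200", "dns2:9200"], {"doc": 1}, fake_send))
# indexed on dns2:9200
```

As far as I know the Elasticsearch-PHP client is supposed to do this internally through its connection pool, so if it gives up after the first host it may be worth checking how many retries it is configured to perform.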

thnx, Niv

-- 
You received this message because you are subscribed to the Google Groups 
elasticsearch group.
To unsubscribe from this group and stop receiving emails from it, send an email 
to elasticsearch+unsubscr...@googlegroups.com.
To view this discussion on the web visit 
https://groups.google.com/d/msgid/elasticsearch/7489eb44-bff3-41d1-baa1-da70b508ef66%40googlegroups.com.
For more options, visit https://groups.google.com/d/optout.


How to update nest from 0.12 to 1.0

2014-08-18 Thread Dmitriy Bashkalin
Hello. Does someone use NEST for .NET?
Please help me.
Some time ago I asked how to get part of a text field. I wanted to do it with
the Highlight param no_match_size, but it's only supported since NEST version
1.0RC1. After updating nest.dll from 0.12 to 1.0 I found that nothing
works. Looking through the changelog on GitHub didn't help.

-- 
You received this message because you are subscribed to the Google Groups 
elasticsearch group.
To unsubscribe from this group and stop receiving emails from it, send an email 
to elasticsearch+unsubscr...@googlegroups.com.
To view this discussion on the web visit 
https://groups.google.com/d/msgid/elasticsearch/5952eaf6-7d31-4682-9789-bf4a720768ee%40googlegroups.com.
For more options, visit https://groups.google.com/d/optout.


Re: A few questions about node types + usage

2014-08-18 Thread Alex
Hello again Mark, 

Thanks for your response. Your answers really are very helpful.

As with our previous conversation 
https://groups.google.com/d/topic/elasticsearch/ZouS4NVsTJw/discussion I 
am confused about how to make a client node also be master eligible. This 
is what I posted there, I would really like some help understanding this:

I've done more investigating and it seems that a Client (AKA Query) node 
cannot also be a Master node. As it says here 
http://www.elasticsearch.org/guide/en/elasticsearch/reference/current/modules-discovery-zen.html#master-election

*Nodes can be excluded from becoming a master by setting node.master to 
false. Note, once a node is a client node (node.client set to true), it 
will not be allowed to become a master (node.master is automatically set to 
false).*

And from the elasticsearch.yml config file it says:


# 2. You want this node to only serve as a master: to not store any data and
#    to have free resources. This will be the "coordinator" of your cluster.
#
# node.master: true
# node.data: false
#
# 3. You want this node to be neither master nor data node, but
#    to act as a "search load balancer" (fetching data from nodes,
#    aggregating results, etc.)
#
# node.master: false
# node.data: false

So I'm wondering how exactly you set up your client nodes to also be master 
nodes. It seems like a master node can only either be purely a master or 
master + data.

Perhaps you could show the relevant parts of one of your client node's 
config?

Many thanks, Alex

On Saturday, 16 August 2014 01:04:37 UTC+1, Mark Walkom wrote:

 1 - Up to you. We use the http output and then just use a round robin A 
 record to our 3 masters.
 2 - They are routed but it makes more sense to specify.
 3 - You're right, but most people only use 1 or 2 masters which is why 
 they get recommended to have at least 3.
 4 - That sounds like a lot. We use masters that double as clients and they 
 only have 8GB, our use sounds similar and we don't have issues.

 I wouldn't bother with 3 client only nodes to start, use them as master 
 and client and then if you find you are hitting memory issues due to 
 queries you can re-evaluate things.

 Regards,
 Mark Walkom

 Infrastructure Engineer
 Campaign Monitor
 email: ma...@campaignmonitor.com
 web: www.campaignmonitor.com


 On 15 August 2014 20:11, Alex alex@gmail.com wrote:

 Bump. Any help? Thanks

 On Wednesday, 13 August 2014 12:10:14 UTC+1, Alex wrote:

 Hello I would like some clarification about node types and their usage. 

 We will have 3 client nodes and 6 data nodes. The 6 1TB data nodes can 
 also be masters (discovery.zen.minimum_master_nodes set to 4). We will 
 use Logstash and Kibana. Kibana will be used 24/7 by between a couple and 
 handfuls of people.

 Some questions:

1. Should incoming Logstash write requests be sent to the cluster in
general (using the *cluster* setting in the *elasticsearch* output), or
specifically to the client nodes, or to the data nodes (via load balancer)?
I am unsure what kind of node is best for handling writes.

2. If client nodes exist in the cluster, are Kibana requests
automatically routed to them? Do I need to somehow specify to Kibana which
nodes to contact?

3. I have heard different information about master nodes and the
minimum_master_nodes setting. I've heard that you should have an odd number
of master nodes, but I fail to see why the parity of the number of masters
matters as long as minimum_master_nodes is set to at least N/2 + 1. Does it
really need to be odd?

4. I have been advised that the client nodes will use a huge amount of
memory (which makes sense due to the nature of the Kibana facet queries).
64GB per client node was recommended but I have no idea if that sounds
right or not. I don't have the ability to actually test it right now, so any
more guidance on that would be helpful.
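On question 3, the quorum arithmetic is easy to check directly: with minimum_master_nodes set to a strict majority (N/2 + 1 using integer division), an even number of masters tolerates no more failures than the next-lower odd number, which is why odd counts are usually recommended. A small illustrative sketch:

```python
def quorum(n_masters):
    # Strict majority: the safe minimum_master_nodes value.
    return n_masters // 2 + 1

def tolerated_failures(n_masters):
    # How many masters can fail while a majority can still be formed.
    return n_masters - quorum(n_masters)

for n in range(3, 8):
    print(n, quorum(n), tolerated_failures(n))
# 5 and 6 masters both tolerate 2 failures: the 6th master buys nothing.
```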

 I'd be so grateful to hear from you even if you only know something 
 about one of my queries.

 Thank you for your time,
 Alex

  -- 
 You received this message because you are subscribed to the Google Groups 
 elasticsearch group.
 To unsubscribe from this group and stop receiving emails from it, send an 
 email to elasticsearc...@googlegroups.com.
 To view this discussion on the web visit 
 https://groups.google.com/d/msgid/elasticsearch/70b16a1e-319c-4f7c-b129-b68258b3652f%40googlegroups.com
  
 https://groups.google.com/d/msgid/elasticsearch/70b16a1e-319c-4f7c-b129-b68258b3652f%40googlegroups.com?utm_medium=emailutm_source=footer
 .
 For more options, visit https://groups.google.com/d/optout.




-- 
You received this message because you are subscribed to the Google Groups 
elasticsearch group.
To unsubscribe from this group and stop receiving emails from it, send an email 
to elasticsearch+unsubscr...@googlegroups.com.
To view this discussion on the web 

river-csv plugin

2014-08-18 Thread HansPeterSloot
Hi, 

This is for elasticsearch: elasticsearch-1.3.2-1.noarch.
There are 2 nodes in the cluster.
I have installed the river-csv plugin.

When loading a file with 5 million rows, loading stops after 477,400 rows.

I load with :
curl -XPUT localhost:9200/_river/my_csv_river/_meta -d '
{
    "type" : "csv",
    "csv_file" : {
        "folder" : "/u01/app/div",
        "first_line_is_header" : "true"
    }
}'

In the logfile I see :
[2014-08-18 14:44:53,216][INFO 
][org.agileworks.elasticsearch.river.csv.CSVRiver] [Stanley] 
[csv][my_csv_river] Going to execute new bulk composed of 100 actions
[2014-08-18 14:44:53,275][INFO 
][org.agileworks.elasticsearch.river.csv.CSVRiver] [Stanley] 
[csv][my_csv_river] Executed bulk composed of 100 actions
[2014-08-18 14:44:53,280][INFO 
][org.agileworks.elasticsearch.river.csv.CSVRiver] [Stanley] 
[csv][my_csv_river] Going to execute new bulk composed of 100 actions
[2014-08-18 14:44:53,299][INFO 
][org.agileworks.elasticsearch.river.csv.CSVRiver] [Stanley] 
[csv][my_csv_river] Executed bulk composed of 100 actions
[2014-08-18 14:44:53,385][INFO 
][org.agileworks.elasticsearch.river.csv.CSVRiver] [Stanley] 
[csv][my_csv_river] Executed bulk composed of 100 actions

./es -v indices
status name pri rep size bytes   docs
green  _river 1   1  15452  2
green  my_csv_river   5   1  296047073 477400

Am I doing something wrong?

Regards HansP

-- 
You received this message because you are subscribed to the Google Groups 
elasticsearch group.
To unsubscribe from this group and stop receiving emails from it, send an email 
to elasticsearch+unsubscr...@googlegroups.com.
To view this discussion on the web visit 
https://groups.google.com/d/msgid/elasticsearch/76cafcc4-9966-4c0b-b891-b18b9376a74f%40googlegroups.com.
For more options, visit https://groups.google.com/d/optout.


Re: Exception in suggester response

2014-08-18 Thread vineeth mohan
Hello Maxim,

Can you show the schema and some sample data that you have indexed?

Thanks
  Vineeth
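Given the ClassCastException in the response below (StringFieldMapper cannot be cast to CompletionFieldMapper), my guess is that the "name" field is mapped as a plain string, while the completion suggester requires a field of type "completion". A sketch of such a mapping (the type name "your_type" is a placeholder):

```
{
  "mappings": {
    "your_type": {
      "properties": {
        "name": {
          "type": "completion"
        }
      }
    }
  }
}
```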


On Mon, Aug 18, 2014 at 3:31 PM, m...@ciklum.com wrote:

 Hi!
 I'm trying to test the elasticsearch suggester, but I got a strange error.

 user@user:/user/esconfig # curl -X POST
 'localhost:9200/dwh_direct/_suggest?pretty' -d @suggester
  {
    "_shards" : {
      "total" : 5,
      "successful" : 0,
      "failed" : 5,
      "failures" : [ {
        "index" : "dwh_direct",
        "shard" : 0,
        "reason" : "BroadcastShardOperationFailedException[[dwh_direct][0]]; nested: ElasticsearchException[failed to execute suggest]; nested: ClassCastException[org.elasticsearch.index.mapper.core.StringFieldMapper cannot be cast to org.elasticsearch.index.mapper.core.CompletionFieldMapper]; "
      }, {
        "index" : "dwh_direct",
        "shard" : 1,
        "reason" : "BroadcastShardOperationFailedException[[dwh_direct][1]]; nested: ElasticsearchException[failed to execute suggest]; nested: ClassCastException[org.elasticsearch.index.mapper.core.StringFieldMapper cannot be cast to org.elasticsearch.index.mapper.core.CompletionFieldMapper]; "
      }, {
        "index" : "dwh_direct",
        "shard" : 2,
        "reason" : "BroadcastShardOperationFailedException[[dwh_direct][2]]; nested: ElasticsearchException[failed to execute suggest]; nested: ClassCastException[org.elasticsearch.index.mapper.core.StringFieldMapper cannot be cast to org.elasticsearch.index.mapper.core.CompletionFieldMapper]; "
      }, {
        "index" : "dwh_direct",
        "shard" : 3,
        "reason" : "BroadcastShardOperationFailedException[[dwh_direct][3]]; nested: ElasticsearchException[failed to execute suggest]; nested: ClassCastException[org.elasticsearch.index.mapper.core.StringFieldMapper cannot be cast to org.elasticsearch.index.mapper.core.CompletionFieldMapper]; "
      } ]
    }
  }

 user@user:/user/esconfig # more suggester
  {
    "my-suggest" : {
      "text" : "co",
      "completion" : {
        "field" : "name"
      }
    }
  }

 Is this a bug in elasticsearch, or did I make a mistake in the configuration or the query?


 --

 Maxim Krasovsky

  --
 You received this message because you are subscribed to the Google Groups
 elasticsearch group.
 To unsubscribe from this group and stop receiving emails from it, send an
 email to elasticsearch+unsubscr...@googlegroups.com.
 To view this discussion on the web visit
 https://groups.google.com/d/msgid/elasticsearch/465403c8-6f5a-4151-9c0f-e6e490fdfe13%40googlegroups.com
 https://groups.google.com/d/msgid/elasticsearch/465403c8-6f5a-4151-9c0f-e6e490fdfe13%40googlegroups.com?utm_medium=emailutm_source=footer
 .
 For more options, visit https://groups.google.com/d/optout.


-- 
You received this message because you are subscribed to the Google Groups 
elasticsearch group.
To unsubscribe from this group and stop receiving emails from it, send an email 
to elasticsearch+unsubscr...@googlegroups.com.
To view this discussion on the web visit 
https://groups.google.com/d/msgid/elasticsearch/CAGdPd5%3DMQmX788p%3Db0G%2Bqk_z6wwsA9HBtLu66fLRoKxvupRo%3Dw%40mail.gmail.com.
For more options, visit https://groups.google.com/d/optout.


Re: EsRejectedExecutionException: rejected execution (queue capacity 1000)

2014-08-18 Thread Sávio S . Teles de Oliveira
You can set *threadpool.search.type: cached* in elasticsearch.yml to get an
unbounded queue for reads.


2014-08-10 9:52 GMT-03:00 James digital...@gmail.com:

  On Sat, 2014-08-09 at 23:53 -0700, Deep wrote:

 Hi,



  Elasticsearch internally has thread pools, and a queue size is associated
 with each pool. You can have pools for search threads, index threads, etc.
 Please see the elasticsearch documentation at
 http://www.elasticsearch.org/guide/en/elasticsearch/reference/current/modules-threadpool.html
 . I think it is possible to override these properties in the
 elasticsearch.yml configuration file.



  Regards,

  Ishwardeep

 On Saturday, 9 August 2014 00:54:02 UTC+5:30, digit...@gmail.com wrote:

 So I've seen a few posts on this, but I've not seen any solutions posted.
 I've been log monitoring and I was trying to determine how to fix the
 error below... any information would be great, thank you.

 [2014-08-08 19:14:12,578][DEBUG][action.search.type   ] [Jericho
 Drumm] [bro-201408032100][2], node[fgjxNK0cQ3O5Usn7wyjaMA], [P],
 s[STARTED]: Failed to execute
 [org.elasticsearch.action.search.SearchRequest@126067b7] lastShard [true]
 org.elasticsearch.common.util.concurrent.EsRejectedExecutionException:
 rejected execution (queue capacity 1000) on
 org.elasticsearch.search.action.SearchServiceTransportAction$23@5a879352
 at
 org.elasticsearch.common.util.concurrent.EsAbortPolicy.rejectedExecution(EsAbortPolicy.java:62)
 at
 java.util.concurrent.ThreadPoolExecutor.reject(ThreadPoolExecutor.java:821)
 at
 java.util.concurrent.ThreadPoolExecutor.execute(ThreadPoolExecutor.java:1372)
 at
 org.elasticsearch.search.action.SearchServiceTransportAction.execute(SearchServiceTransportAction.java:509)
 at
 org.elasticsearch.search.action.SearchServiceTransportAction.sendExecuteQuery(SearchServiceTransportAction.java:203)
 at
 org.elasticsearch.action.search.type.TransportSearchQueryThenFetchAction$AsyncAction.sendExecuteFirstPhase(TransportSearchQueryThenFetchAction.java:80)
 at
 org.elasticsearch.action.search.type.TransportSearchTypeAction$BaseAsyncAction.performFirstPhase(TransportSearchTypeAction.java:171)
 at
 org.elasticsearch.action.search.type.TransportSearchTypeAction$BaseAsyncAction.start(TransportSearchTypeAction.java:153)
 at
 org.elasticsearch.action.search.type.TransportSearchQueryThenFetchAction.doExecute(TransportSearchQueryThenFetchAction.java:59)
 at
 org.elasticsearch.action.search.type.TransportSearchQueryThenFetchAction.doExecute(TransportSearchQueryThenFetchAction.java:49)
 at
 org.elasticsearch.action.support.TransportAction.execute(TransportAction.java:63)
 at
 org.elasticsearch.action.search.TransportSearchAction.doExecute(TransportSearchAction.java:101)
 at
 org.elasticsearch.action.search.TransportSearchAction.doExecute(TransportSearchAction.java:43)
 at
 org.elasticsearch.action.support.TransportAction.execute(TransportAction.java:63)
 at
 org.elasticsearch.client.node.NodeClient.execute(NodeClient.java:92)
 at
 org.elasticsearch.client.support.AbstractClient.search(AbstractClient.java:212)
 at
 org.elasticsearch.rest.action.search.RestSearchAction.handleRequest(RestSearchAction.java:75)
 at
 org.elasticsearch.rest.RestController.executeHandler(RestController.java:159)
 at
 org.elasticsearch.rest.RestController.dispatchRequest(RestController.java:142)
 at
 org.elasticsearch.http.HttpServer.internalDispatchRequest(HttpServer.java:121)
 at
 org.elasticsearch.http.HttpServer$Dispatcher.dispatchRequest(HttpServer.java:83)
 at
 org.elasticsearch.http.netty.NettyHttpServerTransport.dispatchRequest(NettyHttpServerTransport.java:294)
 at
 org.elasticsearch.http.netty.HttpRequestHandler.messageReceived(HttpRequestHandler.java:44)
 at
 org.elasticsearch.common.netty.channel.SimpleChannelUpstreamHandler.handleUpstream(SimpleChannelUpstreamHandler.java:70)
 at
 org.elasticsearch.common.netty.channel.DefaultChannelPipeline.sendUpstream(DefaultChannelPipeline.java:564)
 at
 org.elasticsearch.common.netty.channel.DefaultChannelPipeline$DefaultChannelHandlerContext.sendUpstream(DefaultChannelPipeline.java:791)
 at
 org.elasticsearch.common.netty.handler.codec.http.HttpChunkAggregator.messageReceived(HttpChunkAggregator.java:145)
 at
 org.elasticsearch.common.netty.channel.SimpleChannelUpstreamHandler.handleUpstream(SimpleChannelUpstreamHandler.java:70)
 at
 org.elasticsearch.common.netty.channel.DefaultChannelPipeline.sendUpstream(DefaultChannelPipeline.java:564)
 at
 org.elasticsearch.common.netty.channel.DefaultChannelPipeline$DefaultChannelHandlerContext.sendUpstream(DefaultChannelPipeline.java:791)
 at
 org.elasticsearch.common.netty.channel.Channels.fireMessageReceived(Channels.java:296)
 at
 

Re: Query problem

2014-08-18 Thread Luc Evers
 David hi,

  How can I configure the mapping so that the default analyzer will be the
whitespace one?







On Wed, Aug 13, 2014 at 2:46 PM, David Pilato da...@pilato.fr wrote:

 Getting no answer at all is not good; I think something is going wrong here.
 Maybe you will see something in the logs.

 That said, if you don't want to break your string as tokens at index time,
 you could set index:not_analyzed for fields you don't want to analyze.

 But, you should read this part of the book:
 http://www.elasticsearch.org/guide/en/elasticsearch/guide/current/analysis-intro.html#analysis-intro

 --
 *David Pilato* | *Technical Advocate* | *Elasticsearch.com*
 @dadoonet https://twitter.com/dadoonet | @elasticsearchfr
 https://twitter.com/elasticsearchfr


 Le 13 août 2014 à 14:39:20, Luc Evers (lucev...@gmail.com) a écrit:


   I'd like to use elasticsearch as a NoSQL database + search engine for data
 coming from text files (router configs) and databases.
   First I converted a router config to a JSON file, which I indexed.

   Mapping:

 {
   "configs" : {
     "mappings" : {
       "test" : {
         "properties" : {
           "ConfLength" : {
             "type" : "string"
           },
           "NVRAM" : {
             "type" : "string"
           },
           "aaa" : {
             "type" : "string"
           },
           "enable" : {
             "type" : "string"
           },
           "hostname" : {
             "type" : "string"
           },
           "lastChange" : {
             "type" : "string"
           },
           "logging" : {
             "type" : "string"
           },
           "model" : {
             "type" : "string"
           },
           "policy-map" : {
             "type" : "string"
           }
         }
       }
     }
   }
 }


 Document:

 {
   "_index" : "configs",
   "_type" : "test",
   "_id" : "7",
   "_score" : 1,
   "_source" : {
     "hostname" : [
       "hostname test-1234"
     ]
   }
 },


 Example of a simple search: search a hostname.

 If I start a query:

 curl -XGET 'http://127.0.0.1:9200/configs/_search?q="hostname test-1234"'
 curl: (52) Empty reply from server

 No response.

 If I start a second query, without "hostname", I get an answer:

 curl -XGET 'http://127.0.0.1:9200/configs/_search?q=test-1234'
 OKE

 Analyser: standard

 Why can a search find "test-1234" but not "hostname test-1234"?













  --
 You received this message because you are subscribed to the Google Groups
 elasticsearch group.
 To unsubscribe from this group and stop receiving emails from it, send an
 email to elasticsearch+unsubscr...@googlegroups.com.

 To view this discussion on the web visit
 https://groups.google.com/d/msgid/elasticsearch/b25127bb-2dca-440c-a7b3-937b5ddccd6d%40googlegroups.com
 https://groups.google.com/d/msgid/elasticsearch/b25127bb-2dca-440c-a7b3-937b5ddccd6d%40googlegroups.com?utm_medium=emailutm_source=footer
 .
 For more options, visit https://groups.google.com/d/optout.

  --
 You received this message because you are subscribed to a topic in the
 Google Groups elasticsearch group.
 To unsubscribe from this topic, visit
 https://groups.google.com/d/topic/elasticsearch/xOrC6RMG_nw/unsubscribe.
 To unsubscribe from this group and all its topics, send an email to
 elasticsearch+unsubscr...@googlegroups.com.
 To view this discussion on the web visit
 https://groups.google.com/d/msgid/elasticsearch/etPan.53eb5e1b.3804823e.18f0%40MacBook-Air-de-David.local
 https://groups.google.com/d/msgid/elasticsearch/etPan.53eb5e1b.3804823e.18f0%40MacBook-Air-de-David.local?utm_medium=emailutm_source=footer
 .

 For more options, visit https://groups.google.com/d/optout.


-- 
You received this message because you are subscribed to the Google Groups 
elasticsearch group.
To unsubscribe from this group and stop receiving emails from it, send an email 
to elasticsearch+unsubscr...@googlegroups.com.
To view this discussion on the web visit 
https://groups.google.com/d/msgid/elasticsearch/CAA0yNqLNnQK0%2BtJgRkOqwgJawqngMjmWJfXDgijpcuEbQYbyZw%40mail.gmail.com.
For more options, visit https://groups.google.com/d/optout.


ThreadPool reject_policy

2014-08-18 Thread Sávio S . Teles de Oliveira
How does the threadpool *caller* reject_policy work?

Can I catch the EsRejectedExecutionException exception (using the Java API)
during heavy writes?
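If it helps, *caller* corresponds to a caller-runs policy: instead of rejecting the task when the bounded queue is full (the *abort* behaviour behind EsRejectedExecutionException), the submitting thread executes the task itself, which naturally throttles producers. A language-neutral sketch of the two behaviours (plain Python, an analogy rather than ES internals):

```python
import queue

def submit(q, task, policy="abort"):
    # Enqueue a task; on a full queue either reject (abort) or have the
    # calling thread run the task itself (caller-runs).
    try:
        q.put_nowait(task)
        return "queued"
    except queue.Full:
        if policy == "caller":
            return task()  # the caller does the work, throttling itself
        raise RuntimeError("rejected execution (queue capacity %d)" % q.maxsize)

q = queue.Queue(maxsize=1)
print(submit(q, lambda: "ran"))                   # queued
print(submit(q, lambda: "ran", policy="caller"))  # ran (queue already full)
```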

-- 
Atenciosamente,
Sávio S. Teles de Oliveira
voice: +55 62 9136 6996
http://br.linkedin.com/in/savioteles
Mestrando em Ciências da Computação - UFG
Arquiteto de Software
CUIA Internet Brasil

-- 
You received this message because you are subscribed to the Google Groups 
elasticsearch group.
To unsubscribe from this group and stop receiving emails from it, send an email 
to elasticsearch+unsubscr...@googlegroups.com.
To view this discussion on the web visit 
https://groups.google.com/d/msgid/elasticsearch/CAFKmhPtnm9xV21nhvtE%3D0hv4GoLXhugNpkJXqC9Mec93892USg%40mail.gmail.com.
For more options, visit https://groups.google.com/d/optout.


Re: Query problem

2014-08-18 Thread David Pilato
I think this could help you:
http://www.elasticsearch.org/guide/en/elasticsearch/guide/current/custom-dynamic-mapping.html
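If the goal is simply to make whitespace the index-wide default analyzer, defining an analyzer named "default" in the index settings should do it. A sketch against the 1.x settings API (the index name "configs" is taken from the earlier mail):

```
curl -XPUT 'http://127.0.0.1:9200/configs' -d '
{
  "settings": {
    "analysis": {
      "analyzer": {
        "default": {
          "type": "whitespace"
        }
      }
    }
  }
}'
```

Note this only applies to fields without an explicit analyzer in the mapping, and it has to be set when the index is created.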

-- 
David Pilato | Technical Advocate | Elasticsearch.com
@dadoonet | @elasticsearchfr


Le 18 août 2014 à 15:39:36, Luc Evers (lucev...@gmail.com) a écrit:

 David hi,

  How can I configure the mapping so that the default analyzer will be the 
whitespace one?
  
 





On Wed, Aug 13, 2014 at 2:46 PM, David Pilato da...@pilato.fr wrote:
Getting no answer at all is not good; I think something is going wrong here.
Maybe you will see something in the logs.

That said, if you don't want to break your string as tokens at index time, you 
could set index:not_analyzed for fields you don't want to analyze.

But, you should read this part of the book: 
http://www.elasticsearch.org/guide/en/elasticsearch/guide/current/analysis-intro.html#analysis-intro

-- 
David Pilato | Technical Advocate | Elasticsearch.com
@dadoonet | @elasticsearchfr


Le 13 août 2014 à 14:39:20, Luc Evers (lucev...@gmail.com) a écrit:

  I'd like to use elasticsearch as a NoSQL database + search engine for data coming
from text files (router configs) and databases.
  First I converted a router config to a JSON file, which I indexed.

  Mapping:

{
  "configs" : {
    "mappings" : {
      "test" : {
        "properties" : {
          "ConfLength" : {
            "type" : "string"
          },
          "NVRAM" : {
            "type" : "string"
          },
          "aaa" : {
            "type" : "string"
          },
          "enable" : {
            "type" : "string"
          },
          "hostname" : {
            "type" : "string"
          },
          "lastChange" : {
            "type" : "string"
          },
          "logging" : {
            "type" : "string"
          },
          "model" : {
            "type" : "string"
          },
          "policy-map" : {
            "type" : "string"
          }
        }
      }
    }
  }
}


Document:

{
  "_index" : "configs",
  "_type" : "test",
  "_id" : "7",
  "_score" : 1,
  "_source" : {
    "hostname" : [
      "hostname test-1234"
    ]
  }
},


Example of a simple search: search a hostname.

If I start a query:

curl -XGET 'http://127.0.0.1:9200/configs/_search?q="hostname test-1234"'
curl: (52) Empty reply from server

No response.

If I start a second query, without "hostname", I get an answer:

curl -XGET 'http://127.0.0.1:9200/configs/_search?q=test-1234'
OKE

Analyser: standard

Why can a search find "test-1234" but not "hostname test-1234"?













--
You received this message because you are subscribed to the Google Groups 
elasticsearch group.
To unsubscribe from this group and stop receiving emails from it, send an email 
to elasticsearch+unsubscr...@googlegroups.com.

To view this discussion on the web visit 
https://groups.google.com/d/msgid/elasticsearch/b25127bb-2dca-440c-a7b3-937b5ddccd6d%40googlegroups.com.
For more options, visit https://groups.google.com/d/optout.
--
You received this message because you are subscribed to a topic in the Google 
Groups elasticsearch group.
To unsubscribe from this topic, visit 
https://groups.google.com/d/topic/elasticsearch/xOrC6RMG_nw/unsubscribe.
To unsubscribe from this group and all its topics, send an email to 
elasticsearch+unsubscr...@googlegroups.com.
To view this discussion on the web visit 
https://groups.google.com/d/msgid/elasticsearch/etPan.53eb5e1b.3804823e.18f0%40MacBook-Air-de-David.local.

For more options, visit https://groups.google.com/d/optout.

--
You received this message because you are subscribed to the Google Groups 
elasticsearch group.
To unsubscribe from this group and stop receiving emails from it, send an email 
to elasticsearch+unsubscr...@googlegroups.com.
To view this discussion on the web visit 
https://groups.google.com/d/msgid/elasticsearch/CAA0yNqLNnQK0%2BtJgRkOqwgJawqngMjmWJfXDgijpcuEbQYbyZw%40mail.gmail.com.
For more options, visit https://groups.google.com/d/optout.

-- 
You received this message because you are subscribed to the Google Groups 
elasticsearch group.
To unsubscribe from this group and stop receiving emails from it, send an email 
to elasticsearch+unsubscr...@googlegroups.com.
To view this discussion on the web visit 
https://groups.google.com/d/msgid/elasticsearch/etPan.53f202c7.2eb141f2.132%40MacBook-Air-de-David.local.
For more options, visit https://groups.google.com/d/optout.


How to normalize score when combining regular query and function_score?

2014-08-18 Thread JohnnyM
First of all, kudos on the awesome job everyone here is doing!

I was wondering if you guys can help me solve this puzzle
(also available on Stack Overflow:
http://stackoverflow.com/questions/25361795/elasticsearch-how-to-normalize-score-when-combining-regular-query-and-function):

Ideally what I am trying to achieve is to assign weights to queries such
that query1 constitutes 30% of the final score and query2 constitutes the other
70%, so to achieve the maximum score a document has to have the highest
possible score on both query1 and query2. My study of the documentation did not
yield any hints as to how to achieve this, so let's try to solve a simpler
problem.

Consider a query in following form:

{
    "query": {
        "bool": {
            "should": [
                {
                    "function_score": {
                        "query": {"match_all": {}},
                        "script_score": {
                            "script": "some_script"
                        }
                    }
                },
                {
                    "match": {
                        "message": "this is a test"
                    }
                }
            ]
        }
    }
}

The script can return an arbitrary number (think: it can return something
like 12392002).

How do I make sure that the result from the script will not dominate the
overall score? (My experiments using explain show that this can indeed
happen very often.)

Is there any way to normalize it? For example, instead of the script score,
return the ratio to a max_script_score (the score achieved by the document with
the highest score)?
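For what it's worth, until something like max_script_score exists, the 30/70 weighting can be approximated client-side by rescoring the top-N hits yourself: fetch both raw scores, max-normalize each list into [0, 1], then take the weighted sum. A sketch with synthetic scores (not an ES API):

```python
def max_normalize(scores):
    # Scale raw scores into [0, 1] by dividing by the maximum.
    top = max(scores.values())
    return {doc: s / top for doc, s in scores.items()} if top else scores

def combine(script_scores, text_scores, w_script=0.7, w_text=0.3):
    # Weighted sum of per-query max-normalized scores; weights sum to 1.
    a, b = max_normalize(script_scores), max_normalize(text_scores)
    return {d: w_script * a.get(d, 0.0) + w_text * b.get(d, 0.0)
            for d in set(a) | set(b)}

ranked = combine({"d1": 12392002, "d2": 5000000}, {"d1": 0.2, "d2": 0.9})
print(sorted(ranked, key=ranked.get, reverse=True))  # ['d1', 'd2']
```

The normalization caps each query's contribution at its weight, so a huge raw script score can no longer drown out the text match.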

-- 
You received this message because you are subscribed to the Google Groups 
elasticsearch group.
To unsubscribe from this group and stop receiving emails from it, send an email 
to elasticsearch+unsubscr...@googlegroups.com.
To view this discussion on the web visit 
https://groups.google.com/d/msgid/elasticsearch/2179ed93-575c-47d5-a13a-42d1e2244baa%40googlegroups.com.
For more options, visit https://groups.google.com/d/optout.


help with a grok filter

2014-08-18 Thread Kevin M
Could someone help me write a grok filter for this log real quick? Here is
what the log looks like:

Aug 18 09:40:39 server01 webmin_log: 172.16.16.96 - username
[18/Aug/2014:09:40:39 -0400] "GET /right.cgi?open=system&open=status
HTTP/1.1" 200 3228

Here is what I have so far:

match = [ "message", "%{SYSLOGTIMESTAMP:timestamp} %{WORD:Server}
webmin_log: %{IP:IP_Address} - %{USERNAME:username} (stuck at this middle
part: [18/Aug/2014:09:40:39 -0400]) %{WORD:method}
%{URIPATHPARAM:request} HTTP/1.1 %{NUMBER:bytes} %{NUMBER:duration}" ]
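For the middle part, grok's built-in HTTPDATE pattern should match — something like \[%{HTTPDATE:access_time}\] with the square brackets escaped (access_time is just an example field name). The underlying regex idea can be sanity-checked in plain Python:

```python
import re

# Rough equivalent of grok's \[%{HTTPDATE}\]: dd/Mon/yyyy:HH:MM:SS zone
httpdate = r"\[(?P<ts>\d{2}/[A-Za-z]{3}/\d{4}:\d{2}:\d{2}:\d{2} [+-]\d{4})\]"

line = ('Aug 18 09:40:39 server01 webmin_log: 172.16.16.96 - username '
        '[18/Aug/2014:09:40:39 -0400] "GET /right.cgi?open=system&open=status '
        'HTTP/1.1" 200 3228')

m = re.search(httpdate, line)
print(m.group("ts"))  # 18/Aug/2014:09:40:39 -0400
```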

-- 
You received this message because you are subscribed to the Google Groups 
elasticsearch group.
To unsubscribe from this group and stop receiving emails from it, send an email 
to elasticsearch+unsubscr...@googlegroups.com.
To view this discussion on the web visit 
https://groups.google.com/d/msgid/elasticsearch/4784c4b4-65ab-4894-8a1b-a8ab0fba0ed6%40googlegroups.com.
For more options, visit https://groups.google.com/d/optout.


Re: Help with the percentiles aggregation

2014-08-18 Thread John Ogden
That's spot on. Thanks!
On 18 Aug 2014 09:08, Adrien Grand adrien.gr...@elasticsearch.com wrote:

 Hi John,

 You should be able to do something like:

  {
    "aggs": {
      "verb": {
        "terms": {
          "field": "verb"
        },
        "aggs": {
          "load_time_outliers": {
            "percentiles": {
              "field": "responsetime"
            }
          }
        }
      }
    }
  }

 This will first break down your documents according to the http verb that
 is being used and then compute percentiles separately for each unique verb.
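Conceptually, the nested aggregation is a group-by followed by a percentile per group; a plain-Python sketch of what it computes (synthetic data, nearest-rank percentile rather than the TDigest approximation ES uses):

```python
from collections import defaultdict

def percentile(values, p):
    # Nearest-rank percentile on a sorted copy (simplified vs. TDigest).
    vs = sorted(values)
    k = max(0, min(len(vs) - 1, round(p / 100 * (len(vs) - 1))))
    return vs[k]

docs = [("get", 120), ("get", 80), ("get", 450), ("post", 300), ("post", 310)]
by_verb = defaultdict(list)
for verb, rt in docs:
    by_verb[verb].append(rt)  # the terms agg: one bucket per unique verb

for verb, times in sorted(by_verb.items()):
    print(verb, percentile(times, 50), percentile(times, 99))
```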



 On Fri, Aug 15, 2014 at 11:23 AM, John Ogden johnog65...@gmail.com
 wrote:

 Hi,

 I am trying to run a single command which calculates percentiles for
 multiple search queries.
 The data for this is an Apache log file, and I want to get the percentile
 response times for the gets, posts, heads (etc.) in one go.

 If I run this:
 curl -XPOST 'http://localhost:9200/_search?search_type=count&pretty=true'
 -d '{
 "facets": {
     "0": {"query" : {"term" : { "verb" : "get"  }}},
     "1": {"query" : {"term" : { "verb" : "post" }}}
 },
 "aggs" : {"load_time_outlier" : {"percentiles" : {"field" :
 "responsetime"}}}
 }'

 The response I get back has the counts for each subquery but only does
 the aggregations for the overall dataset:

   "facets" : {
     "0" : {
       "_type" : "query",
       "count" : 5678
     },
     "1" : {
       "_type" : "query",
       "count" : 1234
     }
   },
   "aggregations" : {
     "load_time_outlier" : {
       "values" : {
         "1.0" : 0.0,
         ...
         "99.0" : 1234
       }
     }
   }

 I can't figure out how to structure the request so that I get the
 percentiles separately for each of the queries.

 Could someone point me in the right direction please.

 Many thanks
 John





 --
 Adrien Grand



-- 
You received this message because you are subscribed to the Google Groups 
elasticsearch group.
To unsubscribe from this group and stop receiving emails from it, send an email 
to elasticsearch+unsubscr...@googlegroups.com.
To view this discussion on the web visit 
https://groups.google.com/d/msgid/elasticsearch/CAGfq%3DRjVu58Jetkgf%3DGvJ4BkLjhWYPvm789UGPrr0U%2BOiA_Wxg%40mail.gmail.com.
For more options, visit https://groups.google.com/d/optout.


ES ignores index.query.bool.max_clause_count in elasticsearch.yml

2014-08-18 Thread l . daedelow
It seems to me that ES ignores the index.query.bool.max_clause_count 
setting in elasticsearch.yml.

Setting index.query.bool.max_clause_count: 5000 results in the following 
error:

Caused by: org.apache.lucene.search.BooleanQuery$TooManyClauses: 
maxClauseCount is set to 1024

Any idea what's going wrong here?
 



Re: Help with the percentiles aggregation

2014-08-18 Thread John Ogden
Slight follow on - do you know if returning this sort of stuff via Kibana 
is on the cards?
Just looking for an easy way to graph the results.

Thanks.








Help with multiple data ranges in a single query

2014-08-18 Thread John Ogden
I've been given a requirement to produce a single kibana dashboard showing 
app response times for multiple date ranges, and am stumped at how to 
proceed.
The user wants to see today's graph, along with the previous working day, 
day -7, day -28 and day -364 on the same screen - ideally all 4 metrics in 
the same histogram. If they select another date range, they want that to 
show the day -1, day -7 (etc.) results too.

The only thing I've been able to come up with so far is pushing each source 
event into Elasticsearch 4 times (once with the right timestamp, once with 
+1 day, once with +7 days, once with +28 days, etc.) and writing separate 
queries for each, but this just feels wrong.
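
For what it's worth, the shifted-copy idea described above can be sketched in
Python (the field names @timestamp, offset_days and resp_ms are purely
illustrative, not from the thread):

```python
from datetime import datetime, timedelta

# One copy of each event per comparison offset, with the timestamp moved
# forward so that "today" lines up with last week / last month / last year.
OFFSETS_DAYS = [0, 1, 7, 28, 364]

def shifted_copies(event):
    """Yield one copy of the event per offset, with @timestamp moved forward."""
    ts = datetime.strptime(event["@timestamp"], "%Y-%m-%dT%H:%M:%S")
    for days in OFFSETS_DAYS:
        copy = dict(event)
        copy["@timestamp"] = (ts + timedelta(days=days)).strftime("%Y-%m-%dT%H:%M:%S")
        copy["offset_days"] = days  # lets a dashboard query filter one series
        yield copy

docs = list(shifted_copies({"@timestamp": "2014-08-18T09:40:39", "resp_ms": 120}))
print(len(docs))  # one document per offset
```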

Any ideas how else the requirement could be met?


Many thanks.




Re: Help with the percentiles aggregation

2014-08-18 Thread Adrien Grand
Support for aggregations is indeed something that is on the roadmap for the
next version of Kibana (Kibana 4), see this message from Rashid:
https://groups.google.com/forum/?utm_medium=emailutm_source=footer#!msg/elasticsearch/I7um1mX4GSk/aUsT2EmyxysJ


On Mon, Aug 18, 2014 at 4:33 PM, John Ogden johnog65...@gmail.com wrote:

 Slight follow on - do you know if returning this sort of stuff via Kibana
 is on the cards?
 Just looking for an easy way to graph the results.

 Thanks.










-- 
Adrien Grand



Aggregates - include source data

2014-08-18 Thread John D. Ament
Hi,

From looking at the docs, it didn't seem overly clear. Is it possible to 
include the source data in an aggregate, or is it counts only?

John



Re: Aggregates - include source data

2014-08-18 Thread Adrien Grand
Aggregations only report counts or various metrics (see the metrics
aggregations: stats, min, max, sum, percentiles, cardinality, top_hits,
...). Maybe top_hits is what you are looking for?

http://www.elasticsearch.org/guide/en/elasticsearch/reference/current/search-aggregations-metrics-top-hits-aggregation.html


On Mon, Aug 18, 2014 at 5:34 PM, John D. Ament john.d.am...@gmail.com
wrote:

 Hi,

 From looking at the docs, didn't seem overly clear.  Is it possible to
 include the data in an aggregate, or is it counts only?

 John





-- 
Adrien Grand



Re: Optimization Questions

2014-08-18 Thread Andrew Selden
Hi Greg,

I believe max_num_segments is technically a hint that can be overridden by the 
merge algorithm if it decides to. You might try simply re-running the optimize 
again to get from ~25 down closer to 1. Sorry but I don't know of any way to 
see when the optimize is finished - it's really just forcing a merge so looking 
at merge stats is what you want.

Hope that helps.
Andrew


On Aug 15, 2014, at 8:01 PM, Gregory Sutcliffe gsutcli...@publishthis.com 
wrote:

 Hey Guys, 
 We were doing some updates to our ES (1.3.1) clusters recently and had some 
 questions about _optimize.  We optimized with max_num_segments 1 and we're 
 still seeing ~25 segments per shard.  The index that was optimized had no 
 writes going to it during that time; it was actually freshly re-opened after 
 an upgrade.  Also, are there any tricks to seeing when an optimize is done 
 other than watching merge stats and disk IO?  Maybe some data in Marvel? 
 
 Thanks for your assistance, 
 Greg
 



Re: Enhancing perf for my cluster

2014-08-18 Thread Pierrick Boutruche
Hey guys,

Finally I changed all my queries to constant score queries. It's way better, 
but still, certain pages take a lot of time... I don't understand 
why, and I don't have anything in my ES logs... 

Now the average time to search 20 users and their mentions/timeline and 
score them is about 4s (and almost 4s of that is the search). 
But when it's slow, it's still 60s for 1 page! 

I tried reading the explain data I get back from the query, but there's no 
response time in it. How can I understand why certain queries take so 
much time?

Thanks!

Le lundi 18 août 2014 12:29:10 UTC+2, Pierrick Boutruche a écrit :

 Hi everyone !

 I'm currently working on a tool with *ES and the Twitter Streaming API*, in 
 which I try to find interesting profiles on Twitter, based on what they 
 tweet, what they RT, and which of their interactions are shared/RT'd.

 Anyway, I use ES to index and search among tweets. To do that, I take the 
 Twitter stream data and put it in a *single index with users & tweets (2 
 types)*, linked by the user id via a parent-child relation. Actually, I 
 thought about my indexing a lot and this is the best way to do it: 
 - I need to update users very often (because I score them and because they 
 update their profile quite often), so nesting the user in each tweet is 
 not an option (too many copies). 
 - I could put a user's tweets directly in the user object, but I would have 
 huge objects and I don't really want that.

 I work on a SoYouStart server, 4c/4t 3.2GHz, 32 GB RAM, 4 TB HDD.

 My settings for the index are :

 settings = {
     "index": {
         "number_of_replicas": 0,
         "refresh_interval": "10s",
         "routing.allocation.disable_allocation": False
     },
     "analysis": {
         "analyzer": {
             "snowFrench":   {"type": "snowball", "language": "French"},
             "snowEnglish":  {"type": "snowball", "language": "English"},
             "snowGerman":   {"type": "snowball", "language": "German"},
             "snowRussian":  {"type": "snowball", "language": "Russian"},
             "snowSpanish":  {"type": "snowball", "language": "Spanish"},
             "snowJapanese": {"type": "snowball", "language": "Japanese"},
             "edgeNGramAnalyzer": {"tokenizer": "myEdgeNGram"},
             "name_analyzer": {
                 "type": "custom",
                 "tokenizer": "whitespace",
                 "filter": ["lowercase", "multi_words", "name_filter"]
             },
             "city_analyzer": {"type": "snowball", "language": "English"}
         },
         "tokenizer": {
             "myEdgeNGram":    {"type": "edgeNGram", "min_gram": 2, "max_gram": 5},
             "name_tokenizer": {"type": "edgeNGram", "min_gram": 4, "max_gram": 100}
         },
         "filter": {
             "multi_words": {"type": "shingle", "min_shingle_size": 2, "max_shingle_size": 10},
             "name_filter": {"type": "edgeNGram", "min_gram": 4, "max_gram": 100}
         }
     }
 }


 And my mappings are :

 tweet_mapping = {
     "_all": {"enabled": False},
     "_ttl": {"enabled": True, "default": "400d"},
     "_parent": {"type": "user"},
     "properties": {
         "textfr": {"type": "string", "_analyzer": "snowFrench",   "copy_to": "text"},
         "texten": {"type": "string", "_analyzer": "snowEnglish",  "copy_to": "text"},
         "textde": {"type": "string", "_analyzer": "snowGerman",   "copy_to": "text"},
         "textja": {"type": "string", "_analyzer": "snowJapanese", "copy_to": "text"},
         "textru": {"type": "string", "_analyzer": "snowRussian",  "copy_to": "text"},
         "textes": {"type": "string", "_analyzer": "snowSpanish",  "copy_to": "text"},
         "text": {"type": "string", "null_value": "", "index": "analyzed", "store": "yes"},
         "entities": {
             "type": "object", "index": "analyzed", "store": "yes",
             "properties": {
                 "hashtags": {"type": "string", "index": "analyzed", "store": "yes",
                              "_analyzer": "edgeNGramAnalyzer"},
                 "mentions": {"type": "long", "index": "not_analyzed", "store": "yes",
                              "precision_step": 64}
             }
         },
         "lang": {"type": "string", "index": "not_analyzed", "store": "yes"},
         "created_at": {"type": "date", "index": "not_analyzed", "store": "yes",
                        "format": "dd-MM- HH:mm:ss"}
     }
 }
 user_mapping = {
     "_all": {"enabled": False},
     "_ttl": {"enabled": True, "default": "600d"},
     "properties": {
         "lang": {"type": "string", "index": "not_analyzed", "store": "yes"},
         "name": {"type": "string", "index": "analyzed", "store": "yes",
                  "_analyzer": "edgeNGramAnalyzer"},
         "screen_name": {"type": "string", "index": "analyzed", "store": "yes",
                         "_analyzer": "edgeNGramAnalyzer"},
         "descfr": {"type": "string", "_analyzer":

indexing problem when using logstash

2014-08-18 Thread vitaly . bulgakov
I am using the following config file:

filter {
  grok {
    match => [
      "message",
      "(?:\?|\&)C\=%{DATA:kw}\&%{DATA}\sT\s%{DATA:town}\sS\s%{WORD:state}\s%{DATA}%{IP:ip}"
    ]
  }
  grok {
    match => [
      "message",
      "(?:\?|\&)SRC\=%{DATA:src}(?:\&|$)"
    ]
  }
}
output {
  elasticsearch {
    host => "localhost"
  }
  stdout { codec => rubydebug }
}
And I thought kw, town, state, etc. would become fields in Elasticsearch. 
But trying

http://localhost:9200/_search?q=town:* AND state:*

I am getting

{"took":5,"timed_out":false,"_shards":{"total":5,"successful":5,"failed":0},"hits":{"total":0,"max_score":null,"hits":[]}}
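
One thing worth ruling out first: the query string above is not URL-encoded,
and an unencoded space in the ?q= parameter is often swallowed by the shell
or HTTP client before it reaches Elasticsearch. A small Python sketch of
encoding it (local host/port assumed):

```python
from urllib.parse import quote

# Percent-encode the Lucene query string before putting it in ?q=
lucene_query = "town:* AND state:*"
url = "http://localhost:9200/_search?q=" + quote(lucene_query)
print(url)
```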

 



Re: help with a grok filter

2014-08-18 Thread vitaly

On Monday, August 18, 2014 9:57:41 AM UTC-4, Kevin M wrote:

 Could someone help me write a grok filter for this log real quick here is 
 what the log looks like:


 Aug 18 09:40:39 server01 webmin_log: 172.16.16.96 - username *[18/Aug/2014:09:40:39 
 -0400]* "GET /right.cgi?open=system&open=status HTTP/1.1" 200 3228

 here is what I have so far:

 match => [ "message", "%{SYSLOGTIMESTAMP:timestamp} %{WORD:Server} 
 webmin_log: %{IP:IP_Address} - %{USERNAME:username} *[ stuck at this 
 middle part [18/Aug/2014:09:40:39 -0400] ]* %{WORD:method} 
 %{URIPATHPARAM:request} HTTP/1.1 %{NUMBER:bytes} %{NUMBER:duration}" ]

 
It is just a sequence of regular expressions catching fields one by one. 
Look, e.g at my post.   



Re: help with a grok filter

2014-08-18 Thread Kevin M
I don't see your post - what I am stuck with is whenever the date changes on 
that log, for example:


*[18/Aug/2014:09:40:39 -0400]*

*[20/Aug/2014:11:40:39 -0104]*
*[19/Aug/2014:08:40:39 -0500]*

the filter will not match it
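
For reference, that bracketed timestamp is the standard Apache/HTTPD date
layout, and grok ships an %{HTTPDATE} pattern for it (written as
\[%{HTTPDATE:timestamp}\] since the brackets are not part of the pattern).
A quick check of a roughly equivalent raw regex in Python against the three
samples above:

```python
import re

# Rough regex equivalent of grok's HTTPDATE, wrapped in literal brackets:
# [dd/Mon/yyyy:HH:MM:SS +ZZZZ]
HTTPDATE = r"\[\d{2}/[A-Z][a-z]{2}/\d{4}:\d{2}:\d{2}:\d{2} [+-]\d{4}\]"

samples = [
    "[18/Aug/2014:09:40:39 -0400]",
    "[20/Aug/2014:11:40:39 -0104]",
    "[19/Aug/2014:08:40:39 -0500]",
]
matches = [bool(re.fullmatch(HTTPDATE, s)) for s in samples]
print(matches)
```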






[ANN] Experimental Highlighter 0.0.11 Released

2014-08-18 Thread Nikolas Everett
I released version 0.0.11 of the Experimental Highlighter
https://github.com/wikimedia/search-highlighter that we've been using. It's
compatible with Elasticsearch 1.3.x and has a few new features:
1.  Conditional highlighting - skip highlighting fields you aren't going to
use!  Save time and IO bandwidth!
2.  Regular expressions - now you have two problems!

Read more at the link above if you are interested.

It's in use on our beta site
http://en.wikipedia.beta.wmflabs.org/wiki/Main_Page so you can try it and
verify that it doesn't crash and stuff.

Cheers,


Nik



[ANN] Elasticsearch Mapper Attachment plugin 2.2.1 released

2014-08-18 Thread David Pilato
Heya,

We are pleased to announce the release of the Elasticsearch Mapper Attachment 
plugin, version 2.2.1.

The mapper attachments plugin adds the attachment type to Elasticsearch using 
Apache Tika.
Release Notes - Version 2.2.1

Earlier today there was an Apache POI release to address a security 
vulnerability.  For some document types, the attachment mapper plugin will 
indirectly use POI. This attachment mapper plugin release forces an update to 
Apache POI and is a response to the POI issue. Previously, the attachment 
mapper did not have an explicit dependency on POI. With this release, we have 
added a direct dependency and set it to the recent versions of POI. This will 
help users of the attachment mapper, who might be unaware of these 
vulnerabilities, avoid them.

You can read more about the reported issues in CVE-2014-3529 and CVE-2014-3574  
 

We encourage anyone using the attachment mapper plugin with untrusted documents 
to update the plugin. 
Update

[80] - Update a few dependencies
Doc

[79] - Docs: make the welcome page more obvious
Issues, Pull requests, Feature requests are warmly welcome on 
elasticsearch-mapper-attachments project repository!

For questions or comments around this plugin, feel free to use elasticsearch 
mailing list!

Enjoy,

- The Elasticsearch team



[ANN] Elasticsearch Mapper Attachment plugin 2.3.1 released

2014-08-18 Thread David Pilato
Heya,
We are pleased to announce the release of the Elasticsearch Mapper Attachment 
plugin, version 2.3.1.

The mapper attachments plugin adds the attachment type to Elasticsearch using 
Apache Tika.
Release Notes - Version 2.3.1

Earlier today there was an Apache POI release to address a security 
vulnerability.  For some document types, the attachment mapper plugin will 
indirectly use POI. This attachment mapper plugin release forces an update to 
Apache POI and is a response to the POI issue. Previously, the attachment 
mapper did not have an explicit dependency on POI. With this release, we have 
added a direct dependency and set it to the recent versions of POI. This will 
help users of the attachment mapper, who might be unaware of these 
vulnerabilities, avoid them.

You can read more about the reported issues in CVE-2014-3529 and CVE-2014-3574  
 

We encourage anyone using the attachment mapper plugin with untrusted documents 
to update the plugin. 
Update

[80] - Update a few dependencies
Doc

[79] - Docs: make the welcome page more obvious
Issues, Pull requests, Feature requests are warmly welcome on 
elasticsearch-mapper-attachments project repository!

For questions or comments around this plugin, feel free to use elasticsearch 
mailing list!

Enjoy,

- The Elasticsearch team



Unassigned Node and shards

2014-08-18 Thread IronMan2014
I've seen this problem twice now.
I start with a green two-node cluster, default 5 shards/index. I index about 
50,000 docs, and the shards/replicas look great and well balanced across the 2 
nodes.

I try the same test with 8 million docs. I come back when it's done, and I 
see all primary shards on node1, 2 replicas on node2, and three 
unassigned replicas on a third, unassigned node.

I will look through the logs, but I was wondering if anyone has seen 
something similar or has any idea where/why this is coming from before I 
dig?



[ANN] swift-repository-plugin v0.5 released

2014-08-18 Thread Chad Horohoe
Hi all,

Just released to Central v0.5 of the swift-repository plugin.
It mainly contains documentation updates, but is also built
against 1.3.2 instead of 1.1.0.

https://github.com/wikimedia/search-repository-swift

-Chad



Re: How to increase memory

2014-08-18 Thread joergpra...@gmail.com
What version of ES do you use?

Jörg


On Mon, Aug 18, 2014 at 9:42 PM, rookie7799 pavelbara...@gmail.com wrote:

 Hello there,

 We are having the same exact problem with a really resource hungry query:
 5 nodes with 16GB ES_HEAP_SIZE
 1.2 Billion records inside 1 index with 5 shards

  Whenever we start running an aggregate query, the whole cluster breaks and
  disconnects. Why can't it just not return results and simply give an error
  without actually killing the entire cluster?

 Cheers!


 On Saturday, February 9, 2013 1:05:54 PM UTC-5, Igor Motov wrote:

 ES_HEAP_SIZE ES_MAX_MEM ES_MIN_MEM are environment variables. They need
 to be specified on the command line. For example:

 ES_HEAP_SIZE=4g bin/elasticsearch -f

 To get JVM stats, you need to set jvm=true on stats request:

  curl -XGET 'http://localhost:9200/_cluster/nodes/stats?jvm=true&pretty=true'

 To understand how much memory you need, give it as much as you can, put
 some load and monitor jvm.mem.heap_used in the output of the stats
 command above. If this number ever goes and stays above 90%
 of available heap it's typically a good indicator that you need more.

  There is a small Russian elasticsearch forum - 
  https://groups.google.com/forum/?fromgroups=#!forum/elasticsearch-ru

 On Saturday, February 9, 2013 12:57:04 PM UTC-5, Николай Измайлов wrote:

  In continuation of the topic 
  https://github.com/elasticsearch/elasticsearch/issues/2636#issuecomment-13332877

  On the page 
  http://www.elasticsearch.org/guide/reference/setup/installation.html it is 
  said that it is necessary to increase ES_HEAP_SIZE, ES_MAX_MEM and 
  ES_MIN_MEM, but I have not found this setting in 
  /etc/elasticsearch/elasticsearch.yml. Here are my cluster stats:

  {
    "cluster_name" : "elasticsearch",
    "nodes" : {
      "VPjABUm-REmy24NQ_AkXDQ" : {
        "timestamp" : 1360432148849,
        "name" : "Sin",
        "transport_address" : "inet[/ip:9300]",
        "hostname" : "Ubuntu-1204-precise-64-minimal",
        "indices" : {
          "store" : { "size" : "34.6gb", "size_in_bytes" : 37221752556,
                      "throttle_time" : "0s", "throttle_time_in_millis" : 0 },
          "docs" : { "count" : 58480, "deleted" : 4759 },
          "indexing" : { "index_total" : 20, "index_time" : "1.7s",
                         "index_time_in_millis" : 1748, "index_current" : 0,
                         "delete_total" : 0, "delete_time" : "0s",
                         "delete_time_in_millis" : 0, "delete_current" : 0 },
          "get" : { "total" : 2, "time" : "5ms", "time_in_millis" : 5,
                    "exists_total" : 0, "exists_time" : "0s",
                    "exists_time_in_millis" : 0, "missing_total" : 2,
                    "missing_time" : "5ms", "missing_time_in_millis" : 5,
                    "current" : 0 },
          "search" : { "query_total" : 1726375, "query_time" : "7.7m",
                       "query_time_in_millis" : 462631, "query_current" : 0,
                       "fetch_total" : 61663, "fetch_time" : "20.9s",
                       "fetch_time_in_millis" : 20955, "fetch_current" : 0 },
          "cache" : { "field_evictions" : 0, "field_size" : "0b",
                      "field_size_in_bytes" : 0, "filter_count" : 5896,
                      "filter_evictions" : 0, "filter_size" : "511.6kb",
                      "filter_size_in_bytes" : 523944, "bloom_size" : "22.1kb",
                      "bloom_size_in_bytes" : 22640, "id_cache_size" : "0b",
                      "id_cache_size_in_bytes" : 0 },
          "merges" : { "current" : 0, "current_docs" : 0, "current_size" : "0b",
                       "current_size_in_bytes" : 0, "total" : 0,
                       "total_time" : "0s", "total_time_in_millis" : 0,
                       "total_docs" : 0, "total_size" : "0b",
                       "total_size_in_bytes" : 0 },
          "refresh" : { "total" : 15, "total_time" : "143ms",
                        "total_time_in_millis" : 143 },
          "flush" : { "total" : 25, "total_time" : "3.2s",
                      "total_time_in_millis" : 3205 }
        }
      }
    }
  }


  I'd like to understand how much memory I need to allocate for Elasticsearch, 
  and in general a description of each of the parameters.

  Is there a Russian community?


Top hits aggregation default sort

2014-08-18 Thread Dan Tuffery
I'm using the top hits aggregation with a has_child query. The top_hits 
aggregation documentation says '*By default the hits are sorted by the 
score of the main query*', but I'm not seeing that in the results for my 
query:

{
  "from": 0,
  "size": 3,
  "query": {
    "has_child": {
      "score_mode": "max",
      "type": "child_type",
      "query": {
        "match": {
          "myField": {
            "query": "some text"
          }
        }
      }
    }
  },
  "aggs": {
    "replies": {
      "terms": {
        "field": "parent_type_id",
        "size": 3
      },
      "aggs": {
        "topChildren": {
          "top_hits": {
            "size": 1
          }
        }
      }
    }
  }
}

the has_child query returns three parent results with the following scores.

   - doc 1 = 0.83619833
   - doc 2 = 0.7210085
   - doc 3 = 0.7210085

The score for the top hits aggregations are:

   - first top hit aggregation = 0.29160267
   - second top hit aggregation  = 0.83619833
   - third top hit aggregation = 0.58320534

Shouldn't the 'second top hit aggregation' be returned first, followed by the 
aggregations with score 0.7210085?
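If the default ordering is misbehaving, one workaround is to make the ordering explicit: top_hits accepts a sort option. A sketch against the aggregation above (same field names; sorting on _score descending should reproduce the documented default):

```json
"aggs": {
  "topChildren": {
    "top_hits": {
      "size": 1,
      "sort": [
        { "_score": { "order": "desc" } }
      ]
    }
  }
}
```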





-- 
You received this message because you are subscribed to the Google Groups 
elasticsearch group.
To unsubscribe from this group and stop receiving emails from it, send an email 
to elasticsearch+unsubscr...@googlegroups.com.
To view this discussion on the web visit 
https://groups.google.com/d/msgid/elasticsearch/0b6849ad-4308-4afe-a76b-80153620f74b%40googlegroups.com.
For more options, visit https://groups.google.com/d/optout.


Re: How to increase memory

2014-08-18 Thread rookie7799
Hi, it's 1.3.2

On Monday, August 18, 2014 5:49:03 PM UTC-4, Jörg Prante wrote:

 What version of ES do you use?

 Jörg


 On Mon, Aug 18, 2014 at 9:42 PM, rookie7799 pavelb...@gmail.com wrote:

 Hello there,

 We are having the same exact problem with a really resource-hungry query:
 5 nodes with 16GB ES_HEAP_SIZE
 1.2 billion records in 1 index with 5 shards

 Whenever we start running an aggregate query, the whole cluster breaks and 
 disconnects. Why can't it just return no results and simply give an error 
 without actually killing the entire cluster?

 Cheers!


 On Saturday, February 9, 2013 1:05:54 PM UTC-5, Igor Motov wrote:

 ES_HEAP_SIZE ES_MAX_MEM ES_MIN_MEM are environment variables. They need 
 to be specified on the command line. For example:

 ES_HEAP_SIZE=4g bin/elasticsearch -f

 To get JVM stats, you need to set jvm=true on stats request:

 curl -XGET 'http://localhost:9200/_cluster/nodes/stats?jvm=true&pretty=true'

 To understand how much memory you need, give it as much as you can, put 
 some load and monitor jvm.mem.heap_used in the output of the stats 
 command above. If this number ever goes and stays above 90% 
 of available heap it's typically a good indicator that you need more.
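
 Igor's 90% rule of thumb can be checked mechanically from the stats output. A minimal sketch (field names follow the jvm.mem section of the nodes-stats response; the numbers here are made up):

```python
def heap_used_ratio(node_stats):
    """Fraction of the max heap currently in use, for one node's stats."""
    mem = node_stats["jvm"]["mem"]
    return mem["heap_used_in_bytes"] / mem["heap_max_in_bytes"]

# Hypothetical values; in practice parse the JSON returned by the
# nodes stats API call shown above.
stats = {"jvm": {"mem": {"heap_used_in_bytes": 3_800_000_000,
                         "heap_max_in_bytes": 4_000_000_000}}}
if heap_used_ratio(stats) > 0.90:
    print("heap consistently above 90% -- consider a larger ES_HEAP_SIZE")
```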

 There is a small Russian elasticsearch forum - 
 https://groups.google.com/forum/?fromgroups=#!forum/elasticsearch-ru

 On Saturday, February 9, 2013 12:57:04 PM UTC-5, Николай Измайлов wrote:

 In continuation of the topic: https://github.com/elasticsearch/elasticsearch/issues/2636#issuecomment-13332877
 On the page http://www.elasticsearch.org/guide/reference/setup/installation.html 
 it is said that it is necessary to increase ES_HEAP_SIZE, ES_MAX_MEM and 
 ES_MIN_MEM, but I have not found these settings in 
 /etc/elasticsearch/elasticsearch.yml. Here's my cluster:

 {
   "cluster_name" : "elasticsearch",
   "nodes" : {
     "VPjABUm-REmy24NQ_AkXDQ" : {
       "timestamp" : 1360432148849,
       "name" : "Sin",
       "transport_address" : "inet[/ip:9300]",
       "hostname" : "Ubuntu-1204-precise-64-minimal",
       "indices" : {
         "store" : {
           "size" : "34.6gb",
           "size_in_bytes" : 37221752556,
           "throttle_time" : "0s",
           "throttle_time_in_millis" : 0
         },
         "docs" : {
           "count" : 58480,
           "deleted" : 4759
         },
         "indexing" : {
           "index_total" : 20,
           "index_time" : "1.7s",
           "index_time_in_millis" : 1748,
           "index_current" : 0,
           "delete_total" : 0,
           "delete_time" : "0s",
           "delete_time_in_millis" : 0,
           "delete_current" : 0
         },
         "get" : {
           "total" : 2,
           "time" : "5ms",
           "time_in_millis" : 5,
           "exists_total" : 0,
           "exists_time" : "0s",
           "exists_time_in_millis" : 0,
           "missing_total" : 2,
           "missing_time" : "5ms",
           "missing_time_in_millis" : 5,
           "current" : 0
         },
         "search" : {
           "query_total" : 1726375,
           "query_time" : "7.7m",
           "query_time_in_millis" : 462631,
           "query_current" : 0,
           "fetch_total" : 61663,
           "fetch_time" : "20.9s",
           "fetch_time_in_millis" : 20955,
           "fetch_current" : 0
         },
         "cache" : {
           "field_evictions" : 0,
           "field_size" : "0b",
           "field_size_in_bytes" : 0,
           "filter_count" : 5896,
           "filter_evictions" : 0,
           "filter_size" : "511.6kb",
           "filter_size_in_bytes" : 523944,
           "bloom_size" : "22.1kb",
           "bloom_size_in_bytes" : 22640,
           "id_cache_size" : "0b",
           "id_cache_size_in_bytes" : 0
         },
         "merges" : {
           "current" : 0,
           "current_docs" : 0,
           "current_size" : "0b",
           "current_size_in_bytes" : 0,
           "total" : 0,
           "total_time" : "0s",
           "total_time_in_millis" : 0,
           "total_docs" : 0,
           "total_size" : "0b",
           "total_size_in_bytes" : 0
         },
         "refresh" : {
           "total" : 15,
           "total_time" : "143ms",
           "total_time_in_millis" : 143
         },
         "flush" : {
           "total" : 25,
           "total_time" : "3.2s",
           "total_time_in_millis" : 3205
         }
       }
     }
   }
 }


 How do I work out how much memory I need to allocate for Elasticsearch, and 
 where can I find a general description of each of these parameters?

 Is there a Russian community?


How to safely migrate from one mount to another mount in Elasticsearch to store the data

2014-08-18 Thread shriyansh jain
Hi,

I have an Elasticsearch cluster of 2 nodes. I have configured them to store 
data at /auto/share. I want to point one of the two nodes in the cluster to 
another location, say /auto/foo, to store its data.
What would be the best way of achieving this without losing any data? And is 
it possible at all without losing any data?

Thank you,
Shriyansh

-- 
You received this message because you are subscribed to the Google Groups 
elasticsearch group.
To unsubscribe from this group and stop receiving emails from it, send an email 
to elasticsearch+unsubscr...@googlegroups.com.
To view this discussion on the web visit 
https://groups.google.com/d/msgid/elasticsearch/415f8d41-4fa9-4f6d-86b9-41b2059ab67f%40googlegroups.com.
For more options, visit https://groups.google.com/d/optout.


Re: How to safely migrate from one mount to another mount in Elasticsearch to store the data

2014-08-18 Thread Mark Walkom
Do you want to copy the existing data in /auto/share to /auto/foo, or start
with no data?

Regards,
Mark Walkom

Infrastructure Engineer
Campaign Monitor
email: ma...@campaignmonitor.com
web: www.campaignmonitor.com


On 19 August 2014 08:23, shriyansh jain shriyanshaj...@gmail.com wrote:

 Hi,

 I have a Elasticsearch Cluster of 2 nodes. I have configured them to store
 data at the location which is /auto/share. I want to point one of the two
 nodes in the cluster to some other location to store the data say /auto/foo.
 What would be the best way of achieving the above task without loosing any
 data.? And is it possible to do that without loosing any data.?

 Thank you,
 Shriyansh



-- 
You received this message because you are subscribed to the Google Groups 
elasticsearch group.
To unsubscribe from this group and stop receiving emails from it, send an email 
to elasticsearch+unsubscr...@googlegroups.com.
To view this discussion on the web visit 
https://groups.google.com/d/msgid/elasticsearch/CAEM624aPBZLB0i-y-_JFJFJzVVLiZBz3VDiJUYTt%2BduUZ-Br6Q%40mail.gmail.com.
For more options, visit https://groups.google.com/d/optout.


Re: How to safely migrate from one mount to another mount in Elasticsearch to store the data

2014-08-18 Thread Mark Walkom
If you want no data in /auto/foo then just create the directory, give it
the right permissions and then update the config to point to it.
It's the same process you did for /auto/share.


Do you have replicas set on your indexes?

Regards,
Mark Walkom

Infrastructure Engineer
Campaign Monitor
email: ma...@campaignmonitor.com
web: www.campaignmonitor.com


On 19 August 2014 08:32, shriyansh jain shriyanshaj...@gmail.com wrote:

 I would prefer no data in /auto/foo, but I would like to go with whichever 
 way is efficient and more reliable.


 Thank you,
 Shriyansh

 On Monday, August 18, 2014 3:26:39 PM UTC-7, Mark Walkom wrote:

 Do you want to copy the existing data in /auto/share to /auto/foo, or
 start with no data?

 Regards,
 Mark Walkom

 Infrastructure Engineer
 Campaign Monitor
 email: ma...@campaignmonitor.com
 web: www.campaignmonitor.com


 On 19 August 2014 08:23, shriyansh jain shriyan...@gmail.com wrote:

 Hi,

 I have a Elasticsearch Cluster of 2 nodes. I have configured them to
 store data at the location which is /auto/share. I want to point one of the
 two nodes in the cluster to some other location to store the data say
 /auto/foo.
 What would be the best way of achieving the above task without loosing
 any data.? And is it possible to do that without loosing any data.?

 Thank you,
 Shriyansh





-- 
You received this message because you are subscribed to the Google Groups 
elasticsearch group.
To unsubscribe from this group and stop receiving emails from it, send an email 
to elasticsearch+unsubscr...@googlegroups.com.
To view this discussion on the web visit 
https://groups.google.com/d/msgid/elasticsearch/CAEM624bMQP2HW4UxbV%3DX81fN-M0r7diZ9HVyZFRDcf9Q0P6mrA%40mail.gmail.com.
For more options, visit https://groups.google.com/d/optout.


Re: How to safely migrate from one mount to another mount in Elasticsearch to store the data

2014-08-18 Thread shriyansh jain

Yes, I have set *index.number_of_replicas: 1*. If I just point one of the 2 
nodes to some other location, won't it lose the data stored by that node?

Thank you,
Shriyansh

On Monday, August 18, 2014 3:34:48 PM UTC-7, Mark Walkom wrote:

 If you want no data in /auto/foo then just create the directory, give it 
 the right permissions and then update the config to point to it.
 It's the same process you did for /auto/share.


 Do you have replicas set on your indexes?

 Regards,
 Mark Walkom

 Infrastructure Engineer
 Campaign Monitor
 email: ma...@campaignmonitor.com
 web: www.campaignmonitor.com


 On 19 August 2014 08:32, shriyansh jain shriyan...@gmail.com wrote:

 I would prefer with no data in /auto/foo.? But would like to go with way, 
 which is efficient and more reliable.


 Thank you,
 Shriyansh

 On Monday, August 18, 2014 3:26:39 PM UTC-7, Mark Walkom wrote:

 Do you want to copy the existing data in /auto/share to /auto/foo, or 
 start with no data?

 Regards,
 Mark Walkom

 Infrastructure Engineer
 Campaign Monitor
 email: ma...@campaignmonitor.com
 web: www.campaignmonitor.com
  

 On 19 August 2014 08:23, shriyansh jain shriyan...@gmail.com wrote:

 Hi,

 I have a Elasticsearch Cluster of 2 nodes. I have configured them to 
 store data at the location which is /auto/share. I want to point one of 
 the 
 two nodes in the cluster to some other location to store the data say 
 /auto/foo.
 What would be the best way of achieving the above task without loosing 
 any data.? And is it possible to do that without loosing any data.?

 Thank you,
 Shriyansh







-- 
You received this message because you are subscribed to the Google Groups 
elasticsearch group.
To unsubscribe from this group and stop receiving emails from it, send an email 
to elasticsearch+unsubscr...@googlegroups.com.
To view this discussion on the web visit 
https://groups.google.com/d/msgid/elasticsearch/eb602b14-4d3e-430c-93d8-935da98af66a%40googlegroups.com.
For more options, visit https://groups.google.com/d/optout.


Re: How to safely migrate from one mount to another mount in Elasticsearch to store the data

2014-08-18 Thread shriyansh jain

Yes, I have set *index.number_of_replicas: 1*. If I just point one of the 2 
nodes to some other location, won't it lose the data stored by that node?

Thank you,
Shriyansh

On Monday, August 18, 2014 3:34:48 PM UTC-7, Mark Walkom wrote:

 If you want no data in /auto/foo then just create the directory, give it 
 the right permissions and then update the config to point to it.
 It's the same process you did for /auto/share.


 Do you have replicas set on your indexes?

 Regards,
 Mark Walkom

 Infrastructure Engineer
 Campaign Monitor
 email: ma...@campaignmonitor.com
 web: www.campaignmonitor.com


 On 19 August 2014 08:32, shriyansh jain shriyan...@gmail.com wrote:

 I would prefer with no data in /auto/foo.? But would like to go with way, 
 which is efficient and more reliable.


 Thank you,
 Shriyansh

 On Monday, August 18, 2014 3:26:39 PM UTC-7, Mark Walkom wrote:

 Do you want to copy the existing data in /auto/share to /auto/foo, or 
 start with no data?

 Regards,
 Mark Walkom

 Infrastructure Engineer
 Campaign Monitor
 email: ma...@campaignmonitor.com
 web: www.campaignmonitor.com
  

 On 19 August 2014 08:23, shriyansh jain shriyan...@gmail.com wrote:

 Hi,

 I have a Elasticsearch Cluster of 2 nodes. I have configured them to 
 store data at the location which is /auto/share. I want to point one of 
 the 
 two nodes in the cluster to some other location to store the data say 
 /auto/foo.
 What would be the best way of achieving the above task without loosing 
 any data.? And is it possible to do that without loosing any data.?

 Thank you,
 Shriyansh







-- 
You received this message because you are subscribed to the Google Groups 
elasticsearch group.
To unsubscribe from this group and stop receiving emails from it, send an email 
to elasticsearch+unsubscr...@googlegroups.com.
To view this discussion on the web visit 
https://groups.google.com/d/msgid/elasticsearch/01d32d9d-3041-4fb7-babe-0e73e3908b31%40googlegroups.com.
For more options, visit https://groups.google.com/d/optout.


Re: How to safely migrate from one mount to another mount in Elasticsearch to store the data

2014-08-18 Thread Mark Walkom
If you point the instance to a new data location then yes, it will start up
with no data, but it won't lose the data completely, as the data will still be
located in your original /auto/share directory.

However given you have replicas set what will happen is when the node
starts up pointing to the new location it will simply start to copy the
data from the other node so that you fulfil your replica requirements.
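
Concretely, the change described here is a one-line edit to elasticsearch.yml on the node being moved (the path shown is the target from this thread), followed by a restart; the replicas on the other node then re-populate the new location:

```yaml
# elasticsearch.yml on the node being migrated
# (assumes a single data path; comma-separated multiple paths are also possible)
path.data: /auto/foo
```

After the restart, wait for cluster health to return to green before removing anything under the old /auto/share directory.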

Regards,
Mark Walkom

Infrastructure Engineer
Campaign Monitor
email: ma...@campaignmonitor.com
web: www.campaignmonitor.com


On 19 August 2014 08:58, shriyansh jain shriyanshaj...@gmail.com wrote:


 Yes, I have set *index.number_of_replicas: 1*. If I just point one of the
 2 nodes to some other location, wont it lose the data stored by that node.?


 Thank you,
 Shriyansh

 On Monday, August 18, 2014 3:34:48 PM UTC-7, Mark Walkom wrote:

 If you want no data in /auto/foo then just create the directory, give it
 the right permissions and then update the config to point to it.
 It's the same process you did for /auto/share.


 Do you have replicas set on your indexes?

 Regards,
 Mark Walkom

 Infrastructure Engineer
 Campaign Monitor
 email: ma...@campaignmonitor.com
 web: www.campaignmonitor.com


 On 19 August 2014 08:32, shriyansh jain shriyan...@gmail.com wrote:

 I would prefer with no data in /auto/foo.? But would like to go with
 way, which is efficient and more reliable.


 Thank you,
 Shriyansh

 On Monday, August 18, 2014 3:26:39 PM UTC-7, Mark Walkom wrote:

 Do you want to copy the existing data in /auto/share to /auto/foo, or
 start with no data?

 Regards,
 Mark Walkom

 Infrastructure Engineer
 Campaign Monitor
 email: ma...@campaignmonitor.com
 web: www.campaignmonitor.com


 On 19 August 2014 08:23, shriyansh jain shriyan...@gmail.com wrote:

 Hi,

 I have a Elasticsearch Cluster of 2 nodes. I have configured them to
 store data at the location which is /auto/share. I want to point one of 
 the
 two nodes in the cluster to some other location to store the data say
 /auto/foo.
 What would be the best way of achieving the above task without loosing
 any data.? And is it possible to do that without loosing any data.?

 Thank you,
 Shriyansh







-- 
You received this message because you are subscribed to the Google Groups 
elasticsearch group.
To unsubscribe from this group and stop receiving emails from it, send an email 
to elasticsearch+unsubscr...@googlegroups.com.
To view this discussion on the web visit 
https://groups.google.com/d/msgid/elasticsearch/CAEM624YSXt_0L4W%3DNfA_3PjxyMz%2BXKZi0SS2noQQ3qdb0pOJWw%40mail.gmail.com.
For more options, visit https://groups.google.com/d/optout.


Re: how to get char_filter to work?

2014-08-18 Thread Ivan Brusic
Sorry for not replying sooner; I was on vacation.

I would use the two-field solution, especially since you simply cannot
store a stripped version. The _source field is compressed, so the additional
index size is content-dependent. I've never used highlighting, so I cannot
recommend alternative approaches.

I use jsoup to strip HTML before the data reaches Elasticsearch. Not sure
if it is the best, but I have been using it for years.
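
For the strip-before-indexing route, here is a minimal sketch of what jsoup does there, using Python's standard-library html.parser instead (jsoup itself is a Java library):

```python
from html.parser import HTMLParser

class TagStripper(HTMLParser):
    """Collect text nodes and drop tags -- a rough analogue of Jsoup.parse(html).text()."""
    def __init__(self):
        super().__init__()
        self.parts = []

    def handle_data(self, data):
        self.parts.append(data)

def strip_html(markup):
    # Index this stripped text; keep the original markup in a separate,
    # non-indexed field if you still need it for display.
    stripper = TagStripper()
    stripper.feed(markup)
    return "".join(stripper.parts)

print(strip_html("<html>trying out <b>Elasticsearch</b>, This is an html test</html>"))
# -> trying out Elasticsearch, This is an html test
```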

Cheers,

Ivan


On Wed, Aug 13, 2014 at 8:16 AM, IronMan2014 sabdall...@gmail.com wrote:

 Ivan,

 A follow-up question: as I mentioned earlier, storing HTML and applying a
 char_filter doesn't really work, especially with highlighted fields coming
 back with mangled HTML.
 So I am thinking of stripping the HTML before indexing, so there is no HTML
 in the index or the source, but I will add an extra field like html_content
 which is meant to store the HTML version and not be indexed.
 Do you see any problems with my approach? I see one: a bigger index size.
 What do you recommend for an ideal solution? I am still confused, as I
 thought this would be a common problem.


 On Friday, August 8, 2014 8:16:09 PM UTC-4, IronMan wrote:

 Thanks again. I wasn't expecting it to remove what's between the tags. I
 believe I understand the behavior and maybe its the case where I was greedy
 and expecting ElasticSearch to do it all.
 Here is a scenario that I was looking for: Assume I am looking to get an
 excerpt of text (Extracted text from a document), Elastic Search query will
 give me excerpt with html tags, but the tags are out of context, so I would
 have liked to be to display this excerpt with no html tags, I know I can
 probably strip the tags after the fact, but that's what I was trying to
 avoid.  In other words, in a perfect world, I would have liked 2 versions
 of the document, the original html one and another stripped one. When I
 need to query things like excerpts, I would query the stripped one, and
 when I needed the html, I would query the source. Hopefully I didn't make
 this more confusing.

 On Friday, August 8, 2014 4:58:03 PM UTC-4, Ivan Brusic wrote:

 The tokens that appear in the analyze API are the ones that are put into
 the inverted index. When you search for one of the terms that is not an
 HTML tag, there will be a match. What I don't understand after reading in
 detail your original, is exactly what behavior you are expecting.

 You indexed the phrase
 <html>trying out <b>Elasticsearch</b>, This is an html test</html>

 but you expected a query for the term html not to match. However, the
 word html is clearly in the content. The HTML stripper will not remove
 the content between the tags, just the tags themselves. The analyze API
 should show you the correct terms.

 Lucene has more control over what information you can retrieve, but the
 only way to get the analyzed token stream back from Elasticsearch is to use
 the analyze API on the field. Most people do not want an analyzed token
 stream, just the original field.

 --
 Ivan


 On Fri, Aug 8, 2014 at 12:01 PM, IronMike sabda...@gmail.com wrote:

 Also, here is a link to someone who had the same problem; I am not
 sure if there was a final answer to that one:
 http://grokbase.com/t/gg/elasticsearch/126r4kv8tx/problem-with-standard-html-strip
 I have to admit that I am a bit confused now about this topic. I
 understand analyzers will tokenize the sentence and strip HTML in the case
 of html_strip, and _analyze works fine using the analyzer. What I am
 failing to understand is how I can get the results of these tokens. Isn't
 the whole idea to be able to search for those tokens eventually?

 If not, what's the solution to what I would think is a common scenario:
 having to index HTML documents where the HTML tags don't need to be indexed,
 while keeping the original HTML for presentation purposes? Any ideas
 (besides stripping the HTML tags manually before indexing)?


 On Friday, August 8, 2014 1:02:07 PM UTC-4, IronMike wrote:

 Thanks for explaining. So, is there a way to be able to get non html
 from the index? I thought I read that it was possible to index without the
 html tags while keeping source intact. So, how would I get at the index
 with non html tags if you will?

 On Friday, August 8, 2014 12:52:37 PM UTC-4, Ivan Brusic wrote:

 The field is derived from the source and not generated from the
 tokens.

 If we indexed the sentence The quick brown foxes jumped over the
 lazy dogs with the english analyzer, the tokens would be

  http://localhost:9200/_analyze?text=The%20quick%20brown%20foxes%20jumped%20over%20the%20lazy%20dogs&analyzer=english

 quick brown fox jump over lazi dog

 After applying stopwords and stemming, the tokens do not form a
 sentence that looks like the original.

 --
 Ivan


 On Fri, Aug 8, 2014 at 9:42 AM, IronMike sabda...@gmail.com wrote:

 Ivan,

 The search results I am showing is for the field title not for the
 source. I thought I could query the field not the source and look at it
 

Re: How to safely migrate from one mount to another mount in Elasticsearch to store the data

2014-08-18 Thread Mark Walkom
Why do you want to do this if you are worried about data loss?

Regards,
Mark Walkom

Infrastructure Engineer
Campaign Monitor
email: ma...@campaignmonitor.com
web: www.campaignmonitor.com


On 19 August 2014 11:50, shriyansh jain shriyanshaj...@gmail.com wrote:

 As you mentioned, the node will not lose the data completely; is there any
 possibility that it will lose some data?

 Thank you,
 Shriyansh

 On Monday, August 18, 2014 4:17:54 PM UTC-7, Mark Walkom wrote:

 If you point the instance to a new data location then yes, it will
 startup with no data, but it won't lose the data completely as it will
 still be located in your original /auto/share directory.

 However given you have replicas set what will happen is when the node
 starts up pointing to the new location it will simply start to copy the
 data from the other node so that you fulfil your replica requirements.

 Regards,
 Mark Walkom

 Infrastructure Engineer
 Campaign Monitor
 email: ma...@campaignmonitor.com
 web: www.campaignmonitor.com


 On 19 August 2014 08:58, shriyansh jain shriyan...@gmail.com wrote:


 Yes, I have set *index.number_of_replicas: 1*. If I just point one of
 the 2 nodes to some other location, wont it lose the data stored by that
 node.?


 Thank you,
 Shriyansh

 On Monday, August 18, 2014 3:34:48 PM UTC-7, Mark Walkom wrote:

 If you want no data in /auto/foo then just create the directory, give
 it the right permissions and then update the config to point to it.
 It's the same process you did for /auto/share.


 Do you have replicas set on your indexes?

 Regards,
 Mark Walkom

 Infrastructure Engineer
 Campaign Monitor
 email: ma...@campaignmonitor.com
 web: www.campaignmonitor.com


 On 19 August 2014 08:32, shriyansh jain shriyan...@gmail.com wrote:

 I would prefer with no data in /auto/foo.? But would like to go with
 way, which is efficient and more reliable.


 Thank you,
 Shriyansh

 On Monday, August 18, 2014 3:26:39 PM UTC-7, Mark Walkom wrote:

 Do you want to copy the existing data in /auto/share to /auto/foo, or
 start with no data?

 Regards,
 Mark Walkom

 Infrastructure Engineer
 Campaign Monitor
 email: ma...@campaignmonitor.com
 web: www.campaignmonitor.com


 On 19 August 2014 08:23, shriyansh jain shriyan...@gmail.com wrote:

 Hi,

 I have a Elasticsearch Cluster of 2 nodes. I have configured them to
 store data at the location which is /auto/share. I want to point one of 
 the
 two nodes in the cluster to some other location to store the data say
 /auto/foo.
 What would be the best way of achieving the above task without
 loosing any data.? And is it possible to do that without loosing any 
 data.?

 Thank you,
 Shriyansh









-- 

Re: How to safely migrate from one mount to another mount in Elasticsearch to store the data

2014-08-18 Thread shriyansh jain
Just to make sure that if /auto/share goes down, I still have data in /auto/foo.

Thanks,
Shriyansh

On Monday, August 18, 2014 6:55:59 PM UTC-7, Mark Walkom wrote:

 Why do you want to do this if you are worried about data loss?

 Regards,
 Mark Walkom

 Infrastructure Engineer
 Campaign Monitor
 email: ma...@campaignmonitor.com
 web: www.campaignmonitor.com
  

 On 19 August 2014 11:50, shriyansh jain shriyan...@gmail.com wrote:

 As you mentioned the node will not lose the data completely, is there any 
 possibility that it will lose some data.?

 Thank you,
 Shriyansh

 On Monday, August 18, 2014 4:17:54 PM UTC-7, Mark Walkom wrote:

 If you point the instance to a new data location then yes, it will 
 startup with no data, but it won't lose the data completely as it will 
 still be located in your original /auto/share directory.

 However given you have replicas set what will happen is when the node 
 starts up pointing to the new location it will simply start to copy the 
 data from the other node so that you fulfil your replica requirements.

 Regards,
 Mark Walkom

 Infrastructure Engineer
 Campaign Monitor
 email: ma...@campaignmonitor.com
 web: www.campaignmonitor.com


 On 19 August 2014 08:58, shriyansh jain shriyan...@gmail.com wrote:


 Yes, I have set *index.number_of_replicas: 1*. If I just point one of 
 the 2 nodes to some other location, wont it lose the data stored by that 
 node.?


 Thank you,
 Shriyansh

 On Monday, August 18, 2014 3:34:48 PM UTC-7, Mark Walkom wrote:

 If you want no data in /auto/foo then just create the directory, give 
 it the right permissions and then update the config to point to it.
 It's the same process you did for /auto/share.


 Do you have replicas set on your indexes?

 Regards,
 Mark Walkom

 Infrastructure Engineer
 Campaign Monitor
 email: ma...@campaignmonitor.com
 web: www.campaignmonitor.com


 On 19 August 2014 08:32, shriyansh jain shriyan...@gmail.com wrote:

 I would prefer with no data in /auto/foo.? But would like to go with 
 way, which is efficient and more reliable.


 Thank you,
 Shriyansh

 On Monday, August 18, 2014 3:26:39 PM UTC-7, Mark Walkom wrote:

 Do you want to copy the existing data in /auto/share to /auto/foo, 
 or start with no data?

 Regards,
 Mark Walkom

 Infrastructure Engineer
 Campaign Monitor
 email: ma...@campaignmonitor.com
 web: www.campaignmonitor.com
  

 On 19 August 2014 08:23, shriyansh jain shriyan...@gmail.com 
 wrote:

 Hi,

 I have a Elasticsearch Cluster of 2 nodes. I have configured them 
 to store data at the location which is /auto/share. I want to point 
 one of 
 the two nodes in the cluster to some other location to store the data 
 say 
 /auto/foo.
 What would be the best way of achieving the above task without 
 loosing any data.? And is it possible to do that without loosing any 
 data.?

 Thank you,
 Shriyansh








Re: How to safely migrate from one mount to another mount in Elasticsearch to store the data

2014-08-18 Thread shriyansh jain
To make sure that if /auto/share goes down, I still have data in /auto/foo. 
And I am also short of space on /auto/share. Mainly because of these two reasons.

Thanks,
Shriyansh

On Monday, August 18, 2014 6:55:59 PM UTC-7, Mark Walkom wrote:

 Why do you want to do this if you are worried about data loss?

 Regards,
 Mark Walkom

 Infrastructure Engineer
 Campaign Monitor
 email: ma...@campaignmonitor.com
 web: www.campaignmonitor.com
  

 On 19 August 2014 11:50, shriyansh jain shriyan...@gmail.com wrote:

 As you mentioned the node will not lose the data completely, is there any 
 possibility that it will lose some data.?

 Thank you,
 Shriyansh

 On Monday, August 18, 2014 4:17:54 PM UTC-7, Mark Walkom wrote:

 If you point the instance to a new data location then yes, it will 
 startup with no data, but it won't lose the data completely as it will 
 still be located in your original /auto/share directory.

 However given you have replicas set what will happen is when the node 
 starts up pointing to the new location it will simply start to copy the 
 data from the other node so that you fulfil your replica requirements.

 Regards,
 Mark Walkom

 Infrastructure Engineer
 Campaign Monitor
 email: ma...@campaignmonitor.com
 web: www.campaignmonitor.com


 On 19 August 2014 08:58, shriyansh jain shriyan...@gmail.com wrote:


 Yes, I have set *index.number_of_replicas: 1*. If I just point one of 
 the 2 nodes to some other location, wont it lose the data stored by that 
 node.?


 Thank you,
 Shriyansh

 On Monday, August 18, 2014 3:34:48 PM UTC-7, Mark Walkom wrote:

 If you want no data in /auto/foo then just create the directory, give 
 it the right permissions and then update the config to point to it.
 It's the same process you did for /auto/share.


 Do you have replicas set on your indexes?

 Regards,
 Mark Walkom

 Infrastructure Engineer
 Campaign Monitor
 email: ma...@campaignmonitor.com
 web: www.campaignmonitor.com


 On 19 August 2014 08:32, shriyansh jain shriyan...@gmail.com wrote:

 I would prefer with no data in /auto/foo.? But would like to go with 
 way, which is efficient and more reliable.


 Thank you,
 Shriyansh

 On Monday, August 18, 2014 3:26:39 PM UTC-7, Mark Walkom wrote:

 Do you want to copy the existing data in /auto/share to /auto/foo, 
 or start with no data?

 Regards,
 Mark Walkom

 Infrastructure Engineer
 Campaign Monitor
 email: ma...@campaignmonitor.com
 web: www.campaignmonitor.com
  

 On 19 August 2014 08:23, shriyansh jain shriyan...@gmail.com 
 wrote:

 Hi,

 I have a Elasticsearch Cluster of 2 nodes. I have configured them 
 to store data at the location which is /auto/share. I want to point 
 one of 
 the two nodes in the cluster to some other location to store the data 
 say 
 /auto/foo.
 What would be the best way of achieving the above task without 
 loosing any data.? And is it possible to do that without loosing any 
 data.?

 Thank you,
 Shriyansh








Re: How to safely migrate from one mount to another mount in Elasticsearch to store the data

2014-08-18 Thread Mark Walkom
This is why you have replicas: they give you redundancy at a higher level
than the filesystem.
If you are still concerned then you should add another node and increase
your replicas.

Playing around on the FS to create replicas is only extra management
overhead and likely to end up causing more problems than it's worth.
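To make the replica point concrete, here is a toy Python model (an illustration only, not Elasticsearch internals; the placement rule is made up for the sketch): with two nodes and number_of_replicas: 1, every shard ends up with a copy on each node, so losing one node's data path loses no shards.

```python
# Toy shard-placement model (illustration only, not ES internals):
# each shard gets num_replicas + 1 copies, spread across distinct nodes.
def surviving_shards(num_shards, num_replicas, num_nodes, failed_node):
    placements = {
        shard: {(shard + copy) % num_nodes for copy in range(num_replicas + 1)}
        for shard in range(num_shards)
    }
    # a shard survives if at least one copy lives on a healthy node
    return sorted(s for s, nodes in placements.items() if nodes - {failed_node})

# 5 primary shards, 1 replica, 2 nodes, node 0's mount is lost:
print(surviving_shards(5, 1, 2, 0))  # [0, 1, 2, 3, 4] -- nothing is lost
# with no replicas, half the shards would be gone:
print(surviving_shards(5, 0, 2, 0))  # [1, 3]
```

Adding a third node and a second replica extends the same idea: any single mount can be lost without losing a shard.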

Regards,
Mark Walkom

Infrastructure Engineer
Campaign Monitor
email: ma...@campaignmonitor.com
web: www.campaignmonitor.com


On 19 August 2014 11:59, shriyansh jain shriyanshaj...@gmail.com wrote:

 Just to make sure if /auto/share goes down I have data in /auto/foo.

 Thanks,
 Shriyansh

 On Monday, August 18, 2014 6:55:59 PM UTC-7, Mark Walkom wrote:

 Why do you want to do this if you are worried about data loss?

 Regards,
 Mark Walkom

 Infrastructure Engineer
 Campaign Monitor
 email: ma...@campaignmonitor.com
 web: www.campaignmonitor.com


 On 19 August 2014 11:50, shriyansh jain shriyan...@gmail.com wrote:

 As you mentioned the node will not lose the data completely, is there
 any possibility that it will lose some data.?

 Thank you,
 Shriyansh

 On Monday, August 18, 2014 4:17:54 PM UTC-7, Mark Walkom wrote:

 If you point the instance to a new data location then yes, it will
 startup with no data, but it won't lose the data completely as it will
 still be located in your original /auto/share directory.

 However given you have replicas set what will happen is when the node
 starts up pointing to the new location it will simply start to copy the
 data from the other node so that you fulfil your replica requirements.

 Regards,
 Mark Walkom

 Infrastructure Engineer
 Campaign Monitor
 email: ma...@campaignmonitor.com
 web: www.campaignmonitor.com


 On 19 August 2014 08:58, shriyansh jain shriyan...@gmail.com wrote:


 Yes, I have set *index.number_of_replicas: 1*. If I just point one of
 the 2 nodes to some other location, wont it lose the data stored by that
 node.?


 Thank you,
 Shriyansh

 On Monday, August 18, 2014 3:34:48 PM UTC-7, Mark Walkom wrote:

 If you want no data in /auto/foo then just create the directory, give
 it the right permissions and then update the config to point to it.
 It's the same process you did for /auto/share.


 Do you have replicas set on your indexes?

 Regards,
 Mark Walkom

 Infrastructure Engineer
 Campaign Monitor
 email: ma...@campaignmonitor.com
 web: www.campaignmonitor.com


 On 19 August 2014 08:32, shriyansh jain shriyan...@gmail.com wrote:

 I would prefer with no data in /auto/foo.? But would like to go with
 way, which is efficient and more reliable.


 Thank you,
 Shriyansh

 On Monday, August 18, 2014 3:26:39 PM UTC-7, Mark Walkom wrote:

 Do you want to copy the existing data in /auto/share to /auto/foo,
 or start with no data?

 Regards,
 Mark Walkom

 Infrastructure Engineer
 Campaign Monitor
 email: ma...@campaignmonitor.com
 web: www.campaignmonitor.com


 On 19 August 2014 08:23, shriyansh jain shriyan...@gmail.com
  wrote:

 Hi,

 I have a Elasticsearch Cluster of 2 nodes. I have configured them
 to store data at the location which is /auto/share. I want to point 
 one of
 the two nodes in the cluster to some other location to store the data 
 say
 /auto/foo.
 What would be the best way of achieving the above task without
 loosing any data.? And is it possible to do that without loosing any 
 data.?

 Thank you,
 Shriyansh






Re: How to safely migrate from one mount to another mount in Elasticsearch to store the data

2014-08-18 Thread Mark Walkom
Apart from replicas, that's really outside the scope of what ES provides.

Regards,
Mark Walkom

Infrastructure Engineer
Campaign Monitor
email: ma...@campaignmonitor.com
web: www.campaignmonitor.com


On 19 August 2014 12:12, shriyansh jain shriyanshaj...@gmail.com wrote:

 I got your point, sir, but if my entire /auto/share goes down, then I won't
 have any chance to recover the data in /auto/share.
 Is there any other way to recover the data?

 Thanks,
 Shriyansh

 On Monday, August 18, 2014 7:03:34 PM UTC-7, Mark Walkom wrote:

 This is why you have replicas, they give you redundancy at a higher level
 that the filesystem,
 If you are still concerned then you should add another node and increase
 your replicas.

 Playing around on the FS to create replicas is only extra management
 overhead and likely to end up causing more problems than it's worth.

 Regards,
 Mark Walkom

 Infrastructure Engineer
 Campaign Monitor
 email: ma...@campaignmonitor.com
 web: www.campaignmonitor.com


 On 19 August 2014 11:59, shriyansh jain shriyan...@gmail.com wrote:

 Just to make sure if /auto/share goes down I have data in /auto/foo.

 Thanks,
 Shriyansh

 On Monday, August 18, 2014 6:55:59 PM UTC-7, Mark Walkom wrote:

 Why do you want to do this if you are worried about data loss?

 Regards,
 Mark Walkom

 Infrastructure Engineer
 Campaign Monitor
 email: ma...@campaignmonitor.com
 web: www.campaignmonitor.com


 On 19 August 2014 11:50, shriyansh jain shriyan...@gmail.com wrote:

 As you mentioned the node will not lose the data completely, is there
 any possibility that it will lose some data.?

 Thank you,
 Shriyansh

 On Monday, August 18, 2014 4:17:54 PM UTC-7, Mark Walkom wrote:

 If you point the instance to a new data location then yes, it will
 startup with no data, but it won't lose the data completely as it will
 still be located in your original /auto/share directory.

 However given you have replicas set what will happen is when the node
 starts up pointing to the new location it will simply start to copy the
 data from the other node so that you fulfil your replica requirements.

 Regards,
 Mark Walkom

 Infrastructure Engineer
 Campaign Monitor
 email: ma...@campaignmonitor.com
 web: www.campaignmonitor.com


 On 19 August 2014 08:58, shriyansh jain shriyan...@gmail.com wrote:


 Yes, I have set *index.number_of_replicas: 1*. If I just point one
 of the 2 nodes to some other location, wont it lose the data stored by 
 that
 node.?


 Thank you,
 Shriyansh

 On Monday, August 18, 2014 3:34:48 PM UTC-7, Mark Walkom wrote:

 If you want no data in /auto/foo then just create the directory,
 give it the right permissions and then update the config to point to 
 it.
 It's the same process you did for /auto/share.


 Do you have replicas set on your indexes?

 Regards,
 Mark Walkom

 Infrastructure Engineer
 Campaign Monitor
 email: ma...@campaignmonitor.com
 web: www.campaignmonitor.com


 On 19 August 2014 08:32, shriyansh jain shriyan...@gmail.com
 wrote:

 I would prefer with no data in /auto/foo.? But would like to go
 with way, which is efficient and more reliable.


 Thank you,
 Shriyansh

 On Monday, August 18, 2014 3:26:39 PM UTC-7, Mark Walkom wrote:

 Do you want to copy the existing data in /auto/share to
 /auto/foo, or start with no data?

 Regards,
 Mark Walkom

 Infrastructure Engineer
 Campaign Monitor
 email: ma...@campaignmonitor.com
 web: www.campaignmonitor.com


 On 19 August 2014 08:23, shriyansh jain shriyan...@gmail.com
 wrote:

 Hi,

 I have a Elasticsearch Cluster of 2 nodes. I have configured
 them to store data at the location which is /auto/share. I want to 
 point
 one of the two nodes in the cluster to some other location to store 
 the
 data say /auto/foo.
 What would be the best way of achieving the above task without
 loosing any data.? And is it possible to do that without loosing 
 any data.?

 Thank you,
 Shriyansh




Re: How to safely migrate from one mount to another mount in Elasticsearch to store the data

2014-08-18 Thread shriyansh jain
Thank you for helping me out. I really appreciate it.

Regards,
Shriyansh

On Monday, August 18, 2014 7:23:50 PM UTC-7, Mark Walkom wrote:

 Apart from replica's, that's really outside the scope of what ES provides.

 Regards,
 Mark Walkom

 Infrastructure Engineer
 Campaign Monitor
 email: ma...@campaignmonitor.com
 web: www.campaignmonitor.com
  

 On 19 August 2014 12:12, shriyansh jain shriyan...@gmail.com wrote:

 I got your point sir, but if my entire /auto/share goes down. Then I wont 
 have any chance to recover the data in /auto/share. 
 Is there any other way to recover the data.?

 Thanks,
 Shriyansh

 On Monday, August 18, 2014 7:03:34 PM UTC-7, Mark Walkom wrote:

 This is why you have replicas, they give you redundancy at a higher 
 level that the filesystem,
 If you are still concerned then you should add another node and increase 
 your replicas.

 Playing around on the FS to create replicas is only extra management 
 overhead and likely to end up causing more problems than it's worth.

 Regards,
 Mark Walkom

 Infrastructure Engineer
 Campaign Monitor
 email: ma...@campaignmonitor.com
 web: www.campaignmonitor.com


 On 19 August 2014 11:59, shriyansh jain shriyan...@gmail.com wrote:

 Just to make sure if /auto/share goes down I have data in /auto/foo.

 Thanks,
 Shriyansh

 On Monday, August 18, 2014 6:55:59 PM UTC-7, Mark Walkom wrote:

 Why do you want to do this if you are worried about data loss?

 Regards,
 Mark Walkom

 Infrastructure Engineer
 Campaign Monitor
 email: ma...@campaignmonitor.com
 web: www.campaignmonitor.com


 On 19 August 2014 11:50, shriyansh jain shriyan...@gmail.com wrote:

 As you mentioned the node will not lose the data completely, is there 
 any possibility that it will lose some data.?

 Thank you,
 Shriyansh

 On Monday, August 18, 2014 4:17:54 PM UTC-7, Mark Walkom wrote:

 If you point the instance to a new data location then yes, it will 
 startup with no data, but it won't lose the data completely as it will 
 still be located in your original /auto/share directory.

 However given you have replicas set what will happen is when the 
 node starts up pointing to the new location it will simply start to 
 copy 
 the data from the other node so that you fulfil your replica 
 requirements.

 Regards,
 Mark Walkom

 Infrastructure Engineer
 Campaign Monitor
 email: ma...@campaignmonitor.com
 web: www.campaignmonitor.com


 On 19 August 2014 08:58, shriyansh jain shriyan...@gmail.com 
 wrote:


 Yes, I have set *index.number_of_replicas: 1*. If I just point one 
 of the 2 nodes to some other location, wont it lose the data stored by 
 that 
 node.?


 Thank you,
 Shriyansh

 On Monday, August 18, 2014 3:34:48 PM UTC-7, Mark Walkom wrote:

 If you want no data in /auto/foo then just create the directory, 
 give it the right permissions and then update the config to point to 
 it.
 It's the same process you did for /auto/share.


 Do you have replicas set on your indexes?

 Regards,
 Mark Walkom

 Infrastructure Engineer
 Campaign Monitor
 email: ma...@campaignmonitor.com
 web: www.campaignmonitor.com


 On 19 August 2014 08:32, shriyansh jain shriyan...@gmail.com 
 wrote:

 I would prefer with no data in /auto/foo.? But would like to go 
 with way, which is efficient and more reliable.


 Thank you,
 Shriyansh

 On Monday, August 18, 2014 3:26:39 PM UTC-7, Mark Walkom wrote:

 Do you want to copy the existing data in /auto/share to 
 /auto/foo, or start with no data?

 Regards,
 Mark Walkom

 Infrastructure Engineer
 Campaign Monitor
 email: ma...@campaignmonitor.com
 web: www.campaignmonitor.com


 On 19 August 2014 08:23, shriyansh jain shriyan...@gmail.com 
 wrote:

 Hi,

 I have a Elasticsearch Cluster of 2 nodes. I have configured 
 them to store data at the location which is /auto/share. I want to 
 point 
 one of the two nodes in the cluster to some other location to 
 store the 
 data say /auto/foo.
 What would be the best way of achieving the above task without 
 loosing any data.? And is it possible to do that without loosing 
 any data.?

 Thank you,
 Shriyansh




Re: How to safely migrate from one mount to another mount in Elasticsearch to store the data

2014-08-18 Thread shriyansh jain
I would like to know one more thing: what would be the steps if I want to copy 
the data from /auto/share to /auto/foo for a particular node?

Thanks,
Shriyansh

On Monday, August 18, 2014 3:26:39 PM UTC-7, Mark Walkom wrote:

 Do you want to copy the existing data in /auto/share to /auto/foo, or 
 start with no data?

 Regards,
 Mark Walkom

 Infrastructure Engineer
 Campaign Monitor
 email: ma...@campaignmonitor.com
 web: www.campaignmonitor.com
  

 On 19 August 2014 08:23, shriyansh jain shriyan...@gmail.com wrote:

 Hi,

 I have a Elasticsearch Cluster of 2 nodes. I have configured them to 
 store data at the location which is /auto/share. I want to point one of the 
 two nodes in the cluster to some other location to store the data say 
 /auto/foo.
 What would be the best way of achieving the above task without loosing 
 any data.? And is it possible to do that without loosing any data.?

 Thank you,
 Shriyansh







Re: Using a char_filter in combination with a lowercase filter

2014-08-18 Thread Ivan Brusic
Char filters are applied before the text is tokenized, and therefore they
are applied before the normal filters are used, which is why they are a
separate class of filter. With Lucene, the order is:

char filters -> tokenizer -> token filters

Have you looked into the ICU analyzer?
http://www.elasticsearch.org/guide/en/elasticsearch/reference/current/analysis-icu-plugin.html

I have no idea how well it works with Dutch.
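A rough Python sketch of that pipeline (a simplification for illustration; `split()` stands in for the standard tokenizer) shows why the thread's y => ij mapping misses an uppercase Y: the char filter sees the raw text before the lowercase token filter ever runs.

```python
# Simplified model of the Lucene analysis chain:
# char filters -> tokenizer -> token filters (e.g. lowercase).
def char_mapper(text, mappings):
    for src, dst in mappings:
        text = text.replace(src, dst)
    return text

def analyze(text):
    text = char_mapper(text, [("y", "ij")])  # char filter: raw, case-sensitive
    tokens = text.split()                    # stand-in for the standard tokenizer
    return [token.lower() for token in tokens]  # lowercase runs last

print(analyze("Yerseke"))  # ['yerseke']  -- the uppercase Y was never mapped
print(analyze("yerseke"))  # ['ijerseke'] -- so this query term cannot match
```

This matches the behaviour reported in the question: "Yerseke" is indexed as the token "yerseke", while the all-lowercase query "yerseke" is rewritten to "ijerseke" and finds nothing.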

Cheers,

Ivan


On Mon, Aug 18, 2014 at 2:14 AM, Matthias Hogerheijde 
matthias.hogerhei...@goabout.com wrote:

 Hi,

 We're using Elasticsearch with an analyzer that maps the `y` character to
 `ij` (a *char_filter* named char_mapper), since in Dutch these two are
 somewhat interchangeable. We're also using a *lowercase filter*.

 This is the configuration:

 {
   "analysis": {
     "analyzer": {
       "index": {
         "type": "custom",
         "tokenizer": "standard",
         "filter": [
           "lowercase",
           "synonym_twoway",
           "standard",
           "asciifolding"
         ],
         "char_filter": [
           "char_mapper"
         ]
       },
       "index_prefix": {
         "type": "custom",
         "tokenizer": "standard",
         "filter": [
           "lowercase",
           "synonym_twoway",
           "standard",
           "asciifolding",
           "prefixes"
         ],
         "char_filter": [
           "char_mapper"
         ]
       },
       "search": {
         "alias": [
           "default"
         ],
         "type": "custom",
         "tokenizer": "standard",
         "filter": [
           "lowercase",
           "synonym",
           "synonym_twoway",
           "standard",
           "asciifolding"
         ],
         "char_filter": [
           "char_mapper"
         ]
       },
       "postal_code": {
         "tokenizer": "keyword",
         "filter": [
           "lowercase"
         ]
       }
     },
     "tokenizer": {
       "standard": {
         "stopwords": []
       }
     },
     "filter": {
       "synonym": {
         "type": "synonym",
         "synonyms": [
           "st => sint",
           "jp => jan pieterszoon",
           "mh => maarten harpertszoon"
         ]
       },
       "synonym_twoway": {
         "type": "synonym",
         "synonyms": [
           "den haag, s gravenhage",
           "den bosch, s hertogenbosch"
         ]
       },
       "prefixes": {
         "type": "edgeNGram",
         "side": "front",
         "min_gram": 1,
         "max_gram": 30
       }
     },
     "char_filter": {
       "char_mapper": {
         "type": "mapping",
         "mappings": [
           "y => ij"
         ]
       }
     }
   }
 }

 When indexing cities, we're using this mapping:

 {
   "properties": {
     "city": {
       "type": "multi_field",
       "fields": {
         "city": {
           "type": "string"
         },
         "prefix": {
           "type": "string",
           "boost": 0.5,
           "index_analyzer": "index_prefix"
         }
       }
     },
     "province_code": {
       "type": "string"
     },
     "unique_name": {
       "type": "boolean"
     },
     "point": {
       "type": "geo_point"
     },
     "search_terms": {
       "type": "multi_field",
       "fields": {
         "search_terms": {
           "type": "string"
         },
         "prefix": {
           "boost": 0.5,
           "index_analyzer": "index_prefix",
           "type": "string"
         }
       }
     }
   },
   "search_analyzer": "search",
   "index_analyzer": "index"
 }

 When we index all the (Dutch) cities from our data-source, there are
 cities starting with both `IJ` and `Y` (for example, these city names
 exist: *IJssel*, *IJsselstein*, *Yerseke* and *Ysselsteyn*). It seems
 that these characters are not lowercased before the char_mapping is
 applied.

 Querying the index results in:

 /top/city/_search?q=ijsselstein - works, returns the document for IJsselstein
 /top/city/_search?q=Ijsselstein - works, returns the document for IJsselstein
 /top/city/_search?q=yerseke - *doesn't* work, returns nothing
 /top/city/_search?q=Yerseke - *does* work, returns the document for Yerseke
 /top/city/_search?q=YsselsteYn - *doesn't* work, returns nothing
 /top/city/_search?q=Ysselsteyn - *does* work, returns the document for Ysselsteyn

 Changing the case of any other letter doesn't affect the results.

 I've worked around this issue by adding the mapping Y => ij, i.e.:

 "char_filter": {
   "char_mapper": {
     "type": "mapping",
     "mappings": [
       "y => ij",
       "Y => ij"
     ]
   }
 }

 This solves the problem, but I'd rather see that the lowercase filter is
 applied before the mapping, or that I can make the order explicit. Is
 there any stance on this issue? Or is this intended behaviour?

 Regards,
 Matthias Hogerheijde




Re: A few questions about node types + usage

2014-08-18 Thread Mark Walkom
Master, data and client are really just abstractions of different
combinations of node.data and node.master values.

A node.master=true, node.data=false can handle both cluster management and
queries.
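
For readers following along, the combination Mark describes (master-eligible but holding no data) is, as far as I understand it, just two settings in elasticsearch.yml — a sketch, not taken from anyone's actual config:

```
# Master-eligible "client" node: can coordinate the cluster and serve
# search requests, but stores no shards.
node.master: true
node.data: false
```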

Regards,
Mark Walkom

Infrastructure Engineer
Campaign Monitor
email: ma...@campaignmonitor.com
web: www.campaignmonitor.com


On 18 August 2014 22:49, Alex alex.mon...@gmail.com wrote:

 Hello again Mark,

 Thanks for your response. Your answers really are very helpful.

 As with our previous conversation
 https://groups.google.com/d/topic/elasticsearch/ZouS4NVsTJw/discussion I
 am confused about how to make a client node also be master eligible. This
 is what I posted there, I would really like some help understanding this:

 I've done more investigating and it seems that a Client (AKA Query) node
 cannot also be a Master node. As it says here:
 http://www.elasticsearch.org/guide/en/elasticsearch/reference/current/modules-discovery-zen.html#master-election

 *Nodes can be excluded from becoming a master by setting node.master to
 false. Note, once a node is a client node (node.client set to true), it
 will not be allowed to become a master (node.master is automatically set to
 false).*

 And from the elasticsearch.yml config file it says:

  # 2. You want this node to only serve as a master: to not store any data and
  # to have free resources. This will be the coordinator of your cluster.
  #
  #node.master: true
  #node.data: false
  #
  # 3. You want this node to be neither master nor data node, but
  # to act as a search load balancer (fetching data from nodes,
  # aggregating results, etc.)
  #
  #node.master: false
  #node.data: false

 So I'm wondering how exactly you set up your client nodes to also be
 master nodes. It seems like a master node can only either be purely a
 master or master + data.

 Perhaps you could show the relevant parts of one of your client node's
 config?

 Many thanks, Alex

 On Saturday, 16 August 2014 01:04:37 UTC+1, Mark Walkom wrote:

 1 - Up to you. We use the http output and then just use a round robin A
 record to our 3 masters.
 2 - They are routed but it makes more sense to specify.
 3 - You're right, but most people only use 1 or 2 masters which is why
 they get recommended to have at least 3.
 4 - That sounds like a lot. We use masters that double as clients and
 they only have 8GB, our use sounds similar and we don't have issues.

 I wouldn't bother with 3 client only nodes to start, use them as master
 and client and then if you find you are hitting memory issues due to
 queries you can re-evaluate things.

 Regards,
 Mark Walkom

 Infrastructure Engineer
 Campaign Monitor
 email: ma...@campaignmonitor.com
 web: www.campaignmonitor.com


 On 15 August 2014 20:11, Alex alex@gmail.com wrote:

 Bump. Any help? Thanks

 On Wednesday, 13 August 2014 12:10:14 UTC+1, Alex wrote:

 Hello I would like some clarification about node types and their usage.

 We will have 3 client nodes and 6 data nodes. The 6 1TB data nodes can
 also be masters (discovery.zen.minimum_master_nodes set to 4). We will
 use Logstash and Kibana. Kibana will be used 24/7 by between a couple and
 handfuls of people.

 Some questions:

 1. Should incoming Logstash write requests be sent to the cluster
    in general (using the *cluster* setting in the *elasticsearch*
    output) or specifically to the client nodes or to the data nodes (via load
    balancer)? I am unsure what kind of node is best for handling writes.

 2. If client nodes exist in the cluster, are Kibana requests
    automatically routed to them? Do I need to somehow specify to Kibana which
    nodes to contact?

 3. I have heard different information about master nodes and the
    minimum_master_nodes setting. I've heard that you should have an odd number
    of master nodes, but I fail to see why the parity of the number of masters
    matters as long as minimum_master_nodes is set to at least N/2 + 1. Does it
    really need to be odd?

 4. I have been advised that the client nodes will use huge amounts
    of memory (which makes sense due to the nature of the Kibana facet
    queries). 64GB per client node was recommended, but I have no idea if that
    sounds right or not. I don't have the ability to actually test it right now,
    so any more guidance on that would be helpful.
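
 Question 3 can be answered with a bit of arithmetic (my own illustration, not from the thread): with a quorum of N/2 + 1, an even number of master-eligible nodes tolerates no more failures than the odd number just below it, which is why odd counts are usually recommended.

```python
def min_master_nodes(n_masters):
    # Quorum required to elect a master without risking split brain:
    # strictly more than half of the master-eligible nodes.
    return n_masters // 2 + 1

def tolerated_failures(n_masters):
    # Master-eligible nodes that can fail while a quorum still exists.
    return n_masters - min_master_nodes(n_masters)

for n in range(1, 7):
    print(n, min_master_nodes(n), tolerated_failures(n))
# 3 masters tolerate 1 failure; 4 masters need a quorum of 3 and
# still tolerate only 1, so the extra even node adds no resilience.
```

 So an even count is not unsafe if minimum_master_nodes is set correctly; it just buys no extra fault tolerance over the odd count below it.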

 I'd be so grateful to hear from you even if you only know something
 about one of my queries.

 Thank you for your time,
 Alex
