Re: How many Java clients do I need?

2014-07-31 Thread David Pilato
Yeah. Only one client instance for the JVM.
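
For illustration, a minimal sketch of sharing one 1.x TransportClient per JVM (the cluster name and address are placeholders, not from this thread):

import org.elasticsearch.client.transport.TransportClient;
import org.elasticsearch.common.settings.ImmutableSettings;
import org.elasticsearch.common.transport.InetSocketTransportAddress;

public final class EsClientHolder {
    private static volatile TransportClient client;

    // Lazily create a single TransportClient and reuse it everywhere in the JVM.
    public static TransportClient client() {
        if (client == null) {
            synchronized (EsClientHolder.class) {
                if (client == null) {
                    client = new TransportClient(ImmutableSettings.settingsBuilder()
                            .put("cluster.name", "my-cluster") // placeholder
                            .build())
                            .addTransportAddress(new InetSocketTransportAddress("localhost", 9300));
                }
            }
        }
        return client;
    }
}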

--
David ;-)
Twitter : @dadoonet / @elasticsearchfr / @scrutmydocs


On 31 Jul 2014, at 06:48, Andrew Gaydenko andrew.gayde...@gmail.com wrote:

As far as I understand, a Java client instance is stateless, and its methods are 
pure functions (I mean the operating methods, rather than those related to initial 
configuration just after instantiation). As a result, it is sufficient to have 
only one client per cluster per JVM. Is that true? Or are there any 
benefits in, say, creating a separate client for every index?


Re: How many Java clients do I need?

2014-07-31 Thread Andrew Gaydenko
On Thursday, July 31, 2014 10:18:52 AM UTC+4, David Pilato wrote:

 Yeah. Only one client instance for the JVM.


Now I'm happy to be sure, thanks!



Re: How to get more than 10 terms bucket with Java API?

2014-07-31 Thread Alain Désilets
Thx. That did the trick. I now realize that I was setting the size on the 
return value of addAggregation(), when in fact, I needed to set it on the 
return value of the field() method.
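
For future readers, a corrected sketch (imports included; client and names as in the thread):

import org.elasticsearch.action.search.SearchResponse;
import org.elasticsearch.index.query.QueryBuilders;
import org.elasticsearch.search.aggregations.AggregationBuilders;
import org.elasticsearch.search.aggregations.bucket.terms.Terms;

SearchResponse sr = esClient.prepareSearch()
        .setQuery(QueryBuilders.matchAllQuery())
        .setSize(0) // hits are not needed here, only the aggregation
        .addAggregation(AggregationBuilders.terms("category_names")
                .field("category")
                .size(20)) // size() belongs on the terms builder itself
        .execute()
        .actionGet();

Terms terms = sr.getAggregations().get("category_names");
System.out.println("Number of category buckets found: " + terms.getBuckets().size());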

On Tuesday, 29 July 2014 15:56:29 UTC-4, Adrien Grand wrote:

 Hi,

 You need to set the size on the terms aggregation:

 AggregationBuilders.terms("category_names").field("category").size(20)


 On Mon, Jul 28, 2014 at 9:22 PM, Alain Désilets alainde...@gmail.com wrote:

 I have an ES where I have indexed a bunch of files. Each file is tagged 
 with a category field. I want to write Java code to get a list of all the 
 categories. I am trying to do this using the terms aggregation:

 QueryBuilder qb = QueryBuilders.matchAllQuery();
 SearchResponse sr = esClient.prepareSearch()
     .setQuery(qb)
     .setSize(20)
     .addAggregation(AggregationBuilders.terms("category_names").field("category")).setSize(20)
     .execute()
     .actionGet();

 Terms terms = sr.getAggregations().get("category_names");
 int numCategories = terms.getBuckets().size();
 System.out.println("Number of category buckets found: " + numCategories);

 But this code always prints out that it found 10 category buckets. Yet, 
 if I execute the following query in Sense:

 GET files-index/file_with_category/_search
 {
   "query": {
     "match_all": {}
   },
   "aggs": {
     "categories": {
       "terms": {
         "field": "category",
         "size": 100
       }
     }
   }
 }

 I get 18 category buckets. What am I doing wrong in the Java code?

 Thx.

 Alain






 -- 
 Adrien Grand
  



Yet another OOME: Java heap space thread :S

2014-07-31 Thread Chris Neal
Hi everyone,

First off, apologies for the thread.  I know OOME discussions are somewhat
overdone in the group, but I need to reach out for some help for this one.

I have a 2 node development cluster in EC2 on c3.4xlarge AMIs.  That means
16 vCPUs, 30GB RAM, 1Gb network, and I have 2 500GB EBS volumes for
Elasticsearch data on each AMI.

I'm running Java 1.7.0_55, and using the G1 collector.  The Java args are:

/usr/bin/java -Xms8g -Xmx8g -Xss256k -Djava.awt.headless=true -server
-XX:+UseCompressedOops -XX:+UseG1GC -XX:MaxGCPauseMillis=20
-XX:+DisableExplicitGC -XX:+HeapDumpOnOutOfMemoryError
The index has 2 shards, each with 1 replica.

I have a daily index being filled with application log data.  The index, on
average, gets to be about:
486M documents
53.1GB (primary size)
106.2GB (total size)

Other than indexing, there really is nothing going on in the cluster.  No
searches, or percolators, just collecting data.

I have:

   - Tweaked the index.merge.policy
   - Tweaked the indices.fielddata.breaker.limit and cache.size
   - Changed the index refresh_interval from 1s to 60s
   - Created a default template for the index such that _all is disabled,
   and all fields in the mapping are set to not_analyzed.

Here is my complete elasticsearch.yml:

action:
  disable_delete_all_indices: true
cluster:
  name: elasticsearch-dev
discovery:
  zen:
minimum_master_nodes: 2
ping:
  multicast:
enabled: false
  unicast:
hosts: 10.0.0.45,10.0.0.41
gateway:
  recover_after_nodes: 2
index:
  merge:
policy:
  max_merge_at_once: 5
  max_merged_segment: 15gb
  number_of_replicas: 1
  number_of_shards: 2
  refresh_interval: 60s
indices:
  fielddata:
breaker:
  limit: 50%
cache:
  size: 30%
node:
  name: elasticsearch-ip-10-0-0-45
path:
  data:
  - /usr/local/ebs01/elasticsearch
  - /usr/local/ebs02/elasticsearch
threadpool:
  bulk:
queue_size: 500
size: 75
type: fixed
  get:
queue_size: 200
size: 100
type: fixed
  index:
queue_size: 1000
size: 100
type: fixed
  search:
queue_size: 200
size: 100
type: fixed

The heap sits at about 13GB used.  I had been battling OOME exceptions for
a while, and thought I had it licked, but one just popped up again.  My
cluster had been up and running fine for 14 days, and I just got this OOME:

=
[2014-07-30 11:52:28,394][INFO ][monitor.jvm  ]
[elasticsearch-ip-10-0-0-41] [gc][young][1158834][109906] duration [770ms],
collections [1]/[1s], total [770ms]/[43.2m], memory
[13.4gb]->[13.4gb]/[16gb], all_pools {[young]
[648mb]->[8mb]/[0b]}{[survivor] [0b]->[0b]/[0b]}{[old]
[12.8gb]->[13.4gb]/[16gb]}
[2014-07-30 15:03:01,070][WARN ][index.engine.internal]
[elasticsearch-ip-10-0-0-41] [derbysoft-20140730][0] failed engine [out of
memory]
[2014-07-30 15:03:10,324][WARN
][netty.channel.socket.nio.AbstractNioSelector] Unexpected exception in the
selector loop.
java.lang.OutOfMemoryError: Java heap space
[2014-07-30 15:03:10,335][WARN
][netty.channel.socket.nio.AbstractNioSelector] Unexpected exception in the
selector loop.
java.lang.OutOfMemoryError: Java heap space
[2014-07-30 15:03:10,324][WARN ][index.merge.scheduler]
[elasticsearch-ip-10-0-0-41] [derbysoft-20140730][0] failed to merge
java.lang.OutOfMemoryError: Java heap space
[2014-07-30 15:03:28,595][WARN ][index.translog   ]
[elasticsearch-ip-10-0-0-41] [derbysoft-20140730][0] failed to flush shard
on translog threshold
org.elasticsearch.index.engine.FlushFailedEngineException:
[derbysoft-20140730][0] Flush failed
at
org.elasticsearch.index.engine.internal.InternalEngine.flush(InternalEngine.java:805)
at
org.elasticsearch.index.shard.service.InternalIndexShard.flush(InternalIndexShard.java:604)
at
org.elasticsearch.index.translog.TranslogService$TranslogBasedFlush$1.run(TranslogService.java:202)
at
java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
at
java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
at java.lang.Thread.run(Thread.java:744)
Caused by: java.lang.IllegalStateException: this writer hit an
OutOfMemoryError; cannot commit
at org.apache.lucene.index.IndexWriter.startCommit(IndexWriter.java:4416)
at
org.apache.lucene.index.IndexWriter.prepareCommitInternal(IndexWriter.java:2989)
at org.apache.lucene.index.IndexWriter.commitInternal(IndexWriter.java:3096)
at org.apache.lucene.index.IndexWriter.commit(IndexWriter.java:3063)
at
org.elasticsearch.index.engine.internal.InternalEngine.flush(InternalEngine.java:797)
... 5 more
[2014-07-30 15:03:28,658][WARN ][cluster.action.shard ]
[elasticsearch-ip-10-0-0-41] [derbysoft-20140730][0] sending failed shard
for [derbysoft-20140730][0], node[W-7FsjjZTyOXZdaJhhqxEA], [R], s[STARTED],
indexUUID [QC5Sg0FDSnOGUiFg30qNxA], reason [engine failure, message [out of
memory][IllegalStateException[this writer hit an OutOfMemoryError; cannot
commit]]]
[2014-07-30 15:34:36,418][WARN

Recommendation for using stop words

2014-07-31 Thread sulemanmubarik
Hi
What are good recommendations for using stop words?
Is using the default stop words provided by Elasticsearch good,
or should I use some custom stop words too?
More than 60% of the data is from Twitter.
Thanks
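
A minimal sketch, assuming the 1.x Java admin API and an existing Client named client, of an index whose analyzer combines the built-in English stopword list with a few Twitter-flavoured extras (the index name and the extra terms are purely illustrative):

// Hedged sketch: a standard-type analyzer with a custom stopword list.
client.admin().indices().prepareCreate("tweets")
        .setSettings(ImmutableSettings.settingsBuilder()
                .put("analysis.analyzer.tweet_text.type", "standard")
                .putArray("analysis.analyzer.tweet_text.stopwords",
                        "_english_", "rt", "via") // illustrative extras
                .build())
        .execute().actionGet();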






Re: Something like Wildcard Filter?

2014-07-31 Thread zouxcs
I have the same question now. How did you solve it? Give me a hint, thank you.





Re: sum-aggregation script doesn't allow negative values?

2014-07-31 Thread Colin Goodheart-Smithe
The Elasticsearch log files can be found in the logs directory of your 
node's Elasticsearch directory.  If you re-create the error and have a look 
at the end of the log file, you should see the stacktrace.

Colin

On Wednesday, 30 July 2014 10:53:05 UTC+1, Valentin wrote:

 Hi Colin,

 I tried increasing it up to 40 but nothing changed. I would post the stack 
 trace, but I don't know how to find it.

 Thanks
 Valentin

 On Wednesday, July 30, 2014 10:24:09 AM UTC+2, Colin Goodheart-Smithe 
 wrote:

 Also, your shard_size parameter should always be greater than the size 
 parameter.  So if you are asking for a size of 10, then I would try setting 
 shard_size to 20 or 30.
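
 In the 1.x Java API that would look like this (a minimal sketch; field and aggregation names taken from the query in this thread):

 // Hedged sketch: size vs. shard_size on a terms aggregation builder.
 AggregationBuilders.terms("winners")
         .field("tit")
         .size(10)       // buckets returned to the client
         .shardSize(30); // candidate buckets collected per shard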

 On Wednesday, 30 July 2014 09:22:16 UTC+1, Colin Goodheart-Smithe wrote:

 Would you be able to re-run your query and post the stack trace from the 
 Elasticsearch server logs? This might help to work out what's going on.

 Thanks

 Colin

 On Tuesday, 29 July 2014 12:29:00 UTC+1, Valentin wrote:

 Ok. I think I found the problem. As soon as I try to sort on the script 
 value, it ceases to work.

 Works, but unsorted:
 {
   "size": 0,
   "aggs": {
     "winners": {
       "terms": {
         "field": "tit",
         "size": 10,
         "shard_size": 4
       },
       "aggs": {
         "articles_over_time": {
           "date_histogram": {
             "field": "datetime",
             "interval": "1d"
           }
         },
         "diff": {
           "sum": {
             "script": "(doc['datetime'].value < 140641200) ? -1 : 1",
             "lang": "groovy"
           }
         }
       }
     }
   }
 }

 Does not work:
 {
   "size": 0,
   "aggs": {
     "winners": {
       "terms": {
         "field": "tit",
         "size": 10,
         "order": {
           "diff": "desc"
         },
         "shard_size": 4
       },
       "aggs": {
         "articles_over_time": {
           "date_histogram": {
             "field": "datetime",
             "interval": "1d"
           }
         },
         "diff": {
           "sum": {
             "script": "(doc['datetime'].value < 140641200) ? -1 : 1",
             "lang": "groovy"
           }
         }
       }
     }
   }
 }




 On Tuesday, July 29, 2014 12:40:15 PM UTC+2, Valentin wrote:

 Hi Colin,

 I could figure out the shard_size problem thanks to your help.

 For the 'datetime' error: I checked and it exists in all the indices. 
 It has the correct mappings and therefore probably could not have wrong 
 values, I guess. And using the elasticsearch-head plugin I don't get the 
 error but a wrong result, which really seems strange.

 Thanks
 Valentin

 On Tuesday, July 29, 2014 11:54:08 AM UTC+2, Colin Goodheart-Smithe 
 wrote:

 Firstly, I think the reason you are only getting results from one 
 index when you are asking for a size of 1 in your terms aggregation is 
 that you are asking for the top 1 bucket from each shard on each index. 
 These will then be merged together and only the top bucket will be kept. 
 If the top bucket is not the same on all indices, then you will not get 
 results from all indices.  Setting the shard_size parameter to something 
 like 10 can help with this (see 
 http://www.elasticsearch.org/guide/en/elasticsearch/reference/current/search-aggregations-bucket-terms-aggregation.html#_document_counts_are_approximate
 for more information).

 Second, I wonder if the reason you are getting the error from your 
 script is that you don't have a 'datetime' value for all of your 
 documents 
 in some of your indices?

 Regards,

 Colin

 On Monday, 28 July 2014 16:04:55 UTC+1, Valentin wrote:

 Hi Colin,

 Now it gets really strange. First, my aliases:

 curl 'http://localhost:9200/_alias?pretty'
 {
   "live-2014-07-27" : {
     "aliases" : {
       "aggtest" : { }
     }
   },
   "live-2014-07-26" : {
     "aliases" : {
       "aggtest" : { }
     }
   }
 }


 I tried two different queries:
 curl -XPOST 'http://localhost:9200/aggtest/video/_search?pretty=true' -d '{
   "size": 0,
   "aggs": {
     "winners": {
       "terms": {
         "field": "tit",
         "order": {
           "diff": "desc"
         },
         "size": 1
       },
       "aggs": {
         "articles_over_time": {
           "date_histogram": {
             "field": "datetime",
             "interval": "1d"
           }
         },
         "diff": {
           "sum": {
             "script": "(doc['datetime'].value < 140641200) ? -1 : 1",
             "lang": "groovy"
           }
         }
       }
     }
   }
 }'

 and

 curl -XPOST 'http://localhost:9200/live-2014-07-26,live-2014-07-27/video/_search?pretty=true' ...

 Both give me a result (but a wrong one) when I query using 
 elasticsearch-head, but they result in an error if I use the command line:

 {
   "error" : "SearchPhaseExecutionException[Failed to execute phase 
 [query], all shards failed; shardFailures 
 {[_MxuihP3TfmZV4FYUQaRQQ][live-2014-07-26][1]: 
 QueryPhaseExecutionException[[live-2014-07-26][1]: 
 query[ConstantScore(cache(_type:video))], from[0], size[0]: Query Failed 
 [Failed to execute main query]];

EsRejectedExecutionException

2014-07-31 Thread Anand kumar
Hi All,

In my cluster, I have around 500 indices. When I try to start the 
Elasticsearch instance, it shows the following exception.
Why is this happening, and what should be done to resolve it?

Thanks,
Anand


[2014-07-31 11:50:01,551][DEBUG][action.search.type   ] [ESCS_NODE] 
[components][3], node[VQMkI5CBRB2hDEnZ5yVLcw], [P], s[STARTED]: Failed to 
execute [org.elasticsearch.action.search.SearchRequest@c34bbb8] lastShard 
[true]
org.elasticsearch.common.util.concurrent.EsRejectedExecutionException: 
rejected execution (queue capacity 1000) on 
org.elasticsearch.search.action.SearchServiceTransportAction$23@60cfc409
at 
org.elasticsearch.common.util.concurrent.EsAbortPolicy.rejectedExecution(EsAbortPolicy.java:62)
at 
java.util.concurrent.ThreadPoolExecutor.reject(ThreadPoolExecutor.java:821)
at 
java.util.concurrent.ThreadPoolExecutor.execute(ThreadPoolExecutor.java:1372)
at 
org.elasticsearch.search.action.SearchServiceTransportAction.execute(SearchServiceTransportAction.java:509)
at 
org.elasticsearch.search.action.SearchServiceTransportAction.sendExecuteQuery(SearchServiceTransportAction.java:203)
at 
org.elasticsearch.action.search.type.TransportSearchQueryThenFetchAction$AsyncAction.sendExecuteFirstPhase(TransportSearchQueryThenFetchAction.java:80)
at 
org.elasticsearch.action.search.type.TransportSearchTypeAction$BaseAsyncAction.performFirstPhase(TransportSearchTypeAction.java:171)
at 
org.elasticsearch.action.search.type.TransportSearchTypeAction$BaseAsyncAction.start(TransportSearchTypeAction.java:153)
at 
org.elasticsearch.action.search.type.TransportSearchQueryThenFetchAction.doExecute(TransportSearchQueryThenFetchAction.java:59)
at 
org.elasticsearch.action.search.type.TransportSearchQueryThenFetchAction.doExecute(TransportSearchQueryThenFetchAction.java:49)
at 
org.elasticsearch.action.support.TransportAction.execute(TransportAction.java:63)
at 
org.elasticsearch.action.search.TransportSearchAction.doExecute(TransportSearchAction.java:101)
at 
org.elasticsearch.action.search.TransportSearchAction.doExecute(TransportSearchAction.java:43)
at 
org.elasticsearch.action.support.TransportAction.execute(TransportAction.java:63)
at org.elasticsearch.client.node.NodeClient.execute(NodeClient.java:92)
at 
org.elasticsearch.client.support.AbstractClient.search(AbstractClient.java:212)
at 
org.elasticsearch.rest.action.search.RestSearchAction.handleRequest(RestSearchAction.java:75)
at 
org.elasticsearch.rest.RestController.executeHandler(RestController.java:159)
at 
org.elasticsearch.rest.RestController.dispatchRequest(RestController.java:142)
at 
org.elasticsearch.http.HttpServer.internalDispatchRequest(HttpServer.java:121)
at 
org.elasticsearch.http.HttpServer$Dispatcher.dispatchRequest(HttpServer.java:83)
at 
org.elasticsearch.http.netty.NettyHttpServerTransport.dispatchRequest(NettyHttpServerTransport.java:294)
at 
org.elasticsearch.http.netty.HttpRequestHandler.messageReceived(HttpRequestHandler.java:44)
at 
org.elasticsearch.common.netty.channel.SimpleChannelUpstreamHandler.handleUpstream(SimpleChannelUpstreamHandler.java:70)
at 
org.elasticsearch.common.netty.channel.DefaultChannelPipeline.sendUpstream(DefaultChannelPipeline.java:564)
at 
org.elasticsearch.common.netty.channel.DefaultChannelPipeline$DefaultChannelHandlerContext.sendUpstream(DefaultChannelPipeline.java:791)
at 
org.elasticsearch.common.netty.handler.codec.http.HttpChunkAggregator.messageReceived(HttpChunkAggregator.java:145)
at 
org.elasticsearch.common.netty.channel.SimpleChannelUpstreamHandler.handleUpstream(SimpleChannelUpstreamHandler.java:70)
at 
org.elasticsearch.common.netty.channel.DefaultChannelPipeline.sendUpstream(DefaultChannelPipeline.java:564)
at 
org.elasticsearch.common.netty.channel.DefaultChannelPipeline$DefaultChannelHandlerContext.sendUpstream(DefaultChannelPipeline.java:791)
at 
org.elasticsearch.common.netty.channel.Channels.fireMessageReceived(Channels.java:296)
at 
org.elasticsearch.common.netty.handler.codec.frame.FrameDecoder.unfoldAndFireMessageReceived(FrameDecoder.java:459)
at 
org.elasticsearch.common.netty.handler.codec.replay.ReplayingDecoder.callDecode(ReplayingDecoder.java:536)
at 
org.elasticsearch.common.netty.handler.codec.replay.ReplayingDecoder.messageReceived(ReplayingDecoder.java:435)
at 
org.elasticsearch.common.netty.channel.SimpleChannelUpstreamHandler.handleUpstream(SimpleChannelUpstreamHandler.java:70)
at 
org.elasticsearch.common.netty.channel.DefaultChannelPipeline.sendUpstream(DefaultChannelPipeline.java:564)
at 
org.elasticsearch.common.netty.channel.DefaultChannelPipeline$DefaultChannelHandlerContext.sendUpstream(DefaultChannelPipeline.java:791)
at 
org.elasticsearch.common.netty.OpenChannelsHandler.handleUpstream(OpenChannelsHandler.java:74)
at 
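
A hedged sketch for later readers: the trace above shows the search thread pool's queue (capacity 1000) rejecting a request. Assuming a 1.x cluster, where thread pool settings are dynamically updatable, and an existing Client named client, the queue size can be raised like this (whether that is the right fix, rather than reducing concurrent searches, depends on the cluster):

// Hedged sketch, 1.x dynamic cluster settings API; the new size is illustrative.
client.admin().cluster().prepareUpdateSettings()
        .setTransientSettings(ImmutableSettings.settingsBuilder()
                .put("threadpool.search.queue_size", 2000)
                .build())
        .execute().actionGet();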

Re: Geo distance filter exceptions

2014-07-31 Thread Joffrey Hercule
city.location was an example ;)
You should use yours.

On Wednesday, 30 July 2014 18:13:53 UTC+2, Madhavan Ramachandran wrote:

 Nope, it did not work. I got an exception: 
 QueryParsingException[[offlocations_geo] failed to find geo_point field 
 [city.location

 Regards
 Madhavan.TR
 On Wednesday, July 30, 2014 10:08:45 AM UTC-5, Joffrey Hercule wrote:

 Hi!
 Use a filtered query, for example:

 {
   "query" : {
     "filtered" : {
       "query" : {
         "match_all" : {}
       },
       "filter" : {
         "geo_distance" : {
           "distance" : "50km",
           "city.location" : {
             "lat" : 43.4,
             "lon" : 5.4
           }
         }
       }
     }
   }
 }

 On Tuesday, 29 July 2014 22:30:12 UTC+2, Madhavan Ramachandran wrote:

 Hi Team,

 I am trying to find a solution for the below:

 1. Geo-boundary-based search. My index has properties for lat and 
 lon as double, not as a geo_point. Here is the mapping for my index.

 How do I use the lon and lat from the below mapping for a geo distance 
 filter / geo distance range filter?

 Name        Type
 data        string
 getpath     string
 id          double
 Region      string
 Submarket   string
 addr1       string
 addr2       string
 city        string
 citymarket  string
 country     string
 countryid   long
 cultureid   long
 data        string
 details     string
 fax         string
 id          string
 language    string
 lat         double
 lon         double

 When I search for the documents, I get the below exception.

 Query:
 {
   "filter": {
     "geo_distance" : {
       "distance" : "300km",
       "location" : {
         "lat" : 45,
         "lon" : -122
       }
     }
   }
 }

 Exception:
 "error": "SearchPhaseExecutionException[Failed to execute phase [query], 
 all shards failed; shardFailures 
 {[o3f66HetT3OSpVw895w0nA][offlocations][4]: 
 SearchParseException[[offlocations][4]: from[-1],size[-1]: Parse Failure 
 [Failed to parse source [{\n\"filter\": {\n \"geo_distance\" : {\n 
 \"distance\" : \"300km\",\n \"location\" : {\n \"lat\" : 45,\n \"lon\" : 
 -122\n }\n } \n }\n}]]]; nested:

 I tried removing the location key, which I don't have in my mapping:

 {"filter": { "geo_distance" : { "distance" : "300km", "lat" : 45, "lon" : -122 } }}

 I got an exception saying lon is not a geo_point field:

 ElasticsearchIllegalArgumentException[the character '-' is not a valid 
 geohash character]; }]

 If I remove the - in front of the lon value, then the exception says:

 QueryParsingException[[offlocations] field [lon] is not a geo_point 
 field];
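
For reference, a minimal sketch (1.x Java API; an existing Client named client is assumed, and the type name "office" and field name "location" are illustrative) of adding the geo_point field that the geo_distance filter needs:

// Hedged sketch: geo_distance requires a field mapped as geo_point.
String mapping = "{\"office\":{\"properties\":{"
        + "\"location\":{\"type\":\"geo_point\"}}}}";
client.admin().indices().preparePutMapping("offlocations_geo")
        .setType("office")
        .setSource(mapping)
        .execute().actionGet();

The existing lat/lon doubles would then need to be reindexed into that field.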






Re: Yet another OOME: Java heap space thread :S

2014-07-31 Thread David Pilato
Why do you start with 8gb HEAP? Can't you give 16gb or so?

/usr/bin/java -Xms8g -Xmx8g 



--
David ;-)
Twitter : @dadoonet / @elasticsearchfr / @scrutmydocs


On 30 Jul 2014, at 19:47, Chris Neal chris.n...@derbysoft.net wrote:

[...]

Re: ORA-01882: timezone region not found

2014-07-31 Thread George DRAGU
We have multiple Oracle installations, but we tested with 9i and 10g. My OS 
is OS X Mavericks, and the JDK is the latest one (1.7). The only JDBC driver 
which works with JDK 1.7 is for Oracle 12c Release 1 
(http://www.oracle.com/technetwork/database/features/jdbc/jdbc-drivers-12c-download-1958347.html).
As I noted, with a 10g server there is no problem, but with 9i I get the 
error "timezone region not found". As a general rule, in a production 
environment it is recommended to use the latest JDK (or JRE), so version 7 
(http://www.oracle.com/technetwork/java/javasebusiness/downloads/java-archive-downloads-javase14-419411.html).
Here we can find a warning about older versions:

WARNING: These older versions of the JRE and JDK are provided to help 
developers debug issues in older systems. They are not updated with the 
latest security patches and are not recommended for use in production.

But what if our customers still use Oracle 9i? Are we forced to use a JDK 
version 4 (1.4)?

On Wednesday, 30 July 2014, 21:31:48 UTC+3, Jörg Prante wrote:

 Oops, I mean of course, this is *not* ES related ...

 Jörg


 On Wed, Jul 30, 2014 at 7:59 PM, joerg...@gmail.com wrote:

 This is ES related, but, what Oracle JDBC version is this and what Oracle 
 Database Server version?

 Jörg


 On Wed, Jul 30, 2014 at 3:59 PM, George DRAGU george@gmail.com wrote:

 Hello,

 Is there any possibility to specify a parameter value on the java command line 
 behind the JDBC River?
 I'm thinking of -Duser.timezone=Europe/Istanbul, for example.
 When I try to create a JDBC River for an Oracle database (with the jprante 
 plugin) I get this error.

 Thanks







Constant Score Query & Bool Query

2014-07-31 Thread Shawn Ritchie
Hi Guys,

Quick question regarding scoring when ConstantScoreQuery & Bool Query are 
used in conjunction.

So here is the query

{
  "from": 0,
  "size": 10,
  "explain": false,
  "sort": ["_score"],
  "query": {
    "filtered": {
      "query": {
        "bool": {
          "should": [{
            "constant_score": {
              "query": {
                "match": {
                  "_all": {
                    "query": "test"
                  }
                }
              },
              "boost": 1.0
            }
          }, {
            "constant_score": {
              "query": {
                "match": {
                  "_all": {
                    "query": "check"
                  }
                }
              },
              "boost": 1.0
            }
          }],
          "disable_coord": 1
        }
      },
      "filter": [{
        "or": [{
          "query": {
            "match": {
              "_all": {
                "query": "test"
              }
            }
          }
        }, {
          "query": {
            "match": {
              "_all": {
                "query": "check"
              }
            }
          }
        }]
      }]
    }
  }
}


Shouldn't the above query return a score of either 1 or 2? Why is it 
returning a score of, say, 1.4 if I am wrapping the sub-queries with a 
constant_score query?

Regards
Shawn



Re: ORA-01882: timezone region not found

2014-07-31 Thread joergpra...@gmail.com
With an Oracle JDBC client jar of 11gR2 or later, you will receive timezone
issues when connecting to Oracle DB 9i. Check Oracle Metalink ID 1068063.1

I feel a bit odd giving Oracle support on the ES community mailing list. Sorry
for the noise. You could also open an issue at
https://github.com/jprante/elasticsearch-river-jdbc/issues

Jörg


On Thu, Jul 31, 2014 at 10:22 AM, George DRAGU george.gd.dr...@gmail.com
wrote:

 [...]




knapsack use case

2014-07-31 Thread Matteo Moci
Hi All,
I have some questions about the knapsack plugin [1].

My idea is to use the tool to do a backup to a file, starting from a 0.90.x
instance, and then restore it on a different 1.2.x or 1.3.x instance. I see
it can't be done directly by copying to a local/remote cluster.

Would it work doing an intermediate step with a file?
Or does the backup still have metadata about the ES version it was generated
from, making it impossible?

Is the snapshot and restore feature [2] useful in my use case, or not?

Is the knapsack plugin able to back up and restore aliases and mappings too,
or do I have to migrate them manually before restoring data?

Thanks for the patience and the great work!
Matteo

[1] https://github.com/jprante/elasticsearch-knapsack
[2]
http://www.elasticsearch.org/guide/en/elasticsearch/reference/current/modules-snapshots.html

-- 
Matteo Moci
http://mox.fm



ElasticSearch memory usage on centralized log clusters

2014-07-31 Thread Tim Stoop
Hi all,

We've been running an ElasticSearch cluster of three nodes since last 
December. We're running them on Debian Wheezy. Due to the size of our 
network, we're getting about 600 messages/s (800 at peak times). Using 
logstash and daily indices, we're currently at about 3TB of data spread 
over 158 indices. We use 2 replicas, so all machines have all data.

I'm struggling to understand what expected working limits are for 
ElasticSearch in a centralized log situation and would really appreciate 
input from others. What we eventually try to do is provide about 13 months 
of searchable logs via Kibana, but we're mostly running into RAM 
constraints. Each node has 3TB of data, but the heap usage is almost 
constantly up at 27-29GB.

We did have a problem with garbage collections earlier, which took so long 
that a node would drop from the cluster. To fix this, we switched to the G1 
GC, which seems very well suited for this workload. The machines we're 
using have 12 cores, so the added CPU overhead is negligible (general usage 
of the cores is less than 30%). But I'm not enough of a Java dev to judge 
whether this switch could be the cause of the constant high heap usage. We're 
currently at about 1GB RAM per 100GB of disk usage.

My questions:

1- Is 1GB RAM usage per 100GB disk usage an expected usage pattern for an 
index heavy cluster?
2- Aside from closing indices, are there other ways of lowering this?
2.5- Should I worry about it?
3- Are we approaching this the wrong way and should we change our setup?
4- Would upgrading to 1.3.1 change the usage significantly, due to the fix 
in issue 6856 or is it unrelated?

The numbers:

3 nodes with the following config:
- 92GB of RAM
- 12 cores with HT
- 6x 3.6TB spinning disks (we're contemplating adding CacheCade, since the 
controller supports it)
  - We expose the disks to elasticsearch via 6 mount points
- Debian Wheezy
- Java 1.7.0_65 OpenJDK JRE with IcedTea 2.5.1 (Debian package with version 
7u65-2.5.1-2~deb7u1)
- ElasticSearch 1.2.1 from the ElasticSearch repositories

Config snippets (leaving out ips and the like):

bootstrap:
  mlockall: true
cluster:
  name: logging
  routing:
    allocation:
      node_concurrent_recoveries: 4
discovery:
  zen:
    minimum_master_nodes: 2
    ping:
      unicast:
        hosts:
        [snip]
index:
  number_of_replicas: 2
  number_of_shards: 6
indices:
  memory:
    index_buffer_size: 50%
  recovery:
    concurrent_streams: 5
    max_bytes_per_sec: 100mb
node:
  concurrent_recoveries: 6
  name: stor1-stor
path:
  data:
    - /srv/elasticsearch/a
    - /srv/elasticsearch/b
    - /srv/elasticsearch/c
    - /srv/elasticsearch/d
    - /srv/elasticsearch/f
    - /srv/elasticsearch/e

Java options: -Xms30g -Xmx30g -Xss256k -Djava.awt.headless=true 
-XX:+UseG1GC -XX:+HeapDumpOnOutOfMemoryError

Thanks for your help! Please let me know if you need more information

--
Kind regards,
Tim Stoop



Re: ORA-01882: timezone region not found

2014-07-31 Thread George DRAGU
Thanks for your answer, and excuse me for bothering you with problems 
unrelated to ES.

On Thursday, 31 July 2014, 11:46:51 UTC+3, Jörg Prante wrote:

 With an Oracle JDBC client jar of 11gR2 or later, you will receive 
 timezone issues when connecting to Oracle DB 9i. Check Oracle Metalink ID 
 1068063.1

 I feel a bit odd to give Oracle support on ES community mailing list. 
 Sorry for the noise. You could also open an issue at 
 https://github.com/jprante/elasticsearch-river-jdbc/issues 

 Jörg





Re: Integration testing a native script

2014-07-31 Thread Nick T
Thanks for the help.

My working solution is:

1. Create the plugins/scripts directory in /tmp for each test node
2. Copy my script jar into it
3. Register the script in the settings:
   .put("script.native.NAME.type", "NAME OF SCRIPT")
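
Put together, a minimal sketch of such a test node (1.x API; the path and the script name/factory class are placeholders, not the thread's actual values):

import org.elasticsearch.common.settings.ImmutableSettings;
import org.elasticsearch.common.settings.Settings;
import org.elasticsearch.node.Node;
import org.elasticsearch.node.NodeBuilder;

// Hedged sketch: an in-JVM test node with a native script registered via settings.
Settings settings = ImmutableSettings.settingsBuilder()
        .put("path.data", "/tmp/es-test-node/data")            // placeholder path
        .put("script.native.myscript.type",
                "com.example.MyScriptFactory")                 // placeholder factory
        .build();
Node node = NodeBuilder.nodeBuilder().settings(settings).local(true).node();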

I am super grateful for all your help. Btw, I wasn't creating it as a 
plugin.


On Wednesday, 30 July 2014 11:33:45 UTC+1, Thomas wrote:

 I noticed that you mention a native java script, so have you implemented 
 it as a plugin?
 If so, try the following in your settings:

 final Settings settings = settingsBuilder()
     ...
     .put("plugin.types", YourPlugin.class.getName())

 Thomas


 On Wednesday, 30 July 2014 12:31:06 UTC+3, Nick T wrote:

 Is there a way to have a native java script accessible in integration 
 tests? In my integration tests I am creating a test node in the /tmp 
 folder. 

 I've tried copying the script to /tmp/plugins/scripts but that was quite 
 hopeful and unfortunately does not work.

 Desperate for help.

 Thanks



-- 
You received this message because you are subscribed to the Google Groups 
elasticsearch group.
To unsubscribe from this group and stop receiving emails from it, send an email 
to elasticsearch+unsubscr...@googlegroups.com.
To view this discussion on the web visit 
https://groups.google.com/d/msgid/elasticsearch/0d589406-5c0c-450b-b9b6-901d8278c4dd%40googlegroups.com.
For more options, visit https://groups.google.com/d/optout.


Understanding tokenization for auto-complete

2014-07-31 Thread Petar Djekic
I'm using ES's auto-completion and I'd like to understand how the prefix 
tokenization works. Example queries and the results they currently return:

1) 'blackb' -> 'Blackberry Q10 Red'
-- Expected

2) 'q' -> 'Blackberry Q10 Red'
-- Expected

3) 'q10' -> No result, expected is 'Blackberry Q10 Red'
-- Why are results returned when typing in 'q' but not 'q10'?

4) 'blackberry q10'
-- Expected

5) 'sam' -> 'Samsung Galaxy S5'
-- Expected

6) 'galax' -> 'Samsung Galaxy S5'
-- Expected

7) 'S5' -> No result, expected is 'Samsung Galaxy S5'

I'm indexing the documents using input: ["blackberry", "Q10 Red"] and input: 
["samsung", "galaxy s5"]; please find the mapping and query below. I thought 
the standard tokenizer would also tokenize on whitespace and hence give a 
result for 'S5', and I also don't understand why 'q' gives results but 'q10' 
doesn't. Can I use the prefix tokenizer for such a use case, or would I 
need to switch to ngrams completely?

My mapping looks as follows:

 "mappings" : {
   "suggestions" : {
     "_timestamp": {
       "enabled": true,
       "path" : "lastTimestamp"
     },
     "properties" : {
       "suggest" : {
         "type" : "completion",
         "index_analyzer" : "standard",
         "search_analyzer" : "simple",
         "payloads" : true,
         "context" : {
           "type" : {
             "type" : "category",
             "path" : "entity"

and query like this:

{
  "suggestions" : {
    "text" : "query",
    "completion" : {
      "size" : 5,
      "field" : "suggest",
      "fuzzy" : {
        "fuzziness" : 1
      },
      "context" : {
        "type" : "internalcategory"
      }
    }
  }
}
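
A hedged sketch of one workaround, assuming the 1.x Java API, an existing Client named client, and the mapping above: since the completion suggester only matches prefixes of the inputs you supply, providing each token ('q10', 's5', ...) as its own input makes mid-title prefixes searchable (the index name and the "entity" value are illustrative):

import org.elasticsearch.common.xcontent.XContentBuilder;
import org.elasticsearch.common.xcontent.XContentFactory;

// Hedged sketch: one explicit input per token, plus the full title.
XContentBuilder doc = XContentFactory.jsonBuilder()
        .startObject()
            .field("entity", "internalcategory") // context path from the mapping
            .startObject("suggest")
                .array("input", "blackberry", "q10", "red", "Blackberry Q10 Red")
                .field("output", "Blackberry Q10 Red")
            .endObject()
        .endObject();
client.prepareIndex("suggestions_index", "suggestions").setSource(doc).execute().actionGet();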







Data too large error

2014-07-31 Thread Rhys Campbell
I occasionally get the following error in Kibana from elasticsearch

Oops! ElasticsearchException[org.elasticsearch.common.breaker.CircuitBreakingException: 
Data too large, data for field [@timestamp] would be larger than limit 
of [639015321/609.4mb]]

I even get this when searching for 5 min of data through the Kibana 
logstash search. It will disappear and reappear for no reason apparent to me. 
My total data size is not that big yet.

Elasticsearch has been left on the default of splitting the indices by 
day. The current size is 168M for today.

Any hints?

Cheers,

Rhys
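
A hedged sketch for anyone hitting this: the message comes from the fielddata circuit breaker, and in 1.x its limit can be raised dynamically (assuming an existing Client named client; the value is illustrative, and reducing fielddata pressure, e.g. with doc_values, is the more durable fix):

// Hedged sketch, 1.x dynamic cluster settings API.
client.admin().cluster().prepareUpdateSettings()
        .setTransientSettings(ImmutableSettings.settingsBuilder()
                .put("indices.fielddata.breaker.limit", "75%") // illustrative
                .build())
        .execute().actionGet();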



Re: knapsack use case

2014-07-31 Thread joergpra...@gmail.com
Snapshot/restore is always recommended, but it is a 1.0+ feature. It is a
standard API of ES and well supported by the ES team. With it, you can
handle all kinds of index data safely on a binary level, fully and
incrementally.
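
For reference, a minimal sketch of that API from Java (1.x; an existing Client named client is assumed, and the repository name, filesystem path and snapshot name are placeholders):

// Hedged sketch: register an fs repository, then snapshot into it.
client.admin().cluster().preparePutRepository("backup_repo")
        .setType("fs")
        .setSettings(ImmutableSettings.settingsBuilder()
                .put("location", "/mnt/es_backups") // placeholder path
                .build())
        .execute().actionGet();

client.admin().cluster().prepareCreateSnapshot("backup_repo", "snapshot_1")
        .setWaitForCompletion(true)
        .execute().actionGet();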

Knapsack plugin is for document export/import only. I wrote it to transport
_source data, harvested over a long time period, from a < 1.0 system to a
production system. It works on _source or stored fields only. It uses the
search/query and bulk indexing APIs without snapshots, so it is up to the
admin to stop index writes while knapsack runs. There is also a lookup of
index settings and mappings; this information is included in the
export archive file and re-applied at import time. But there is no check
that these settings/mappings can be applied successfully on the target; it
is left to the admin to prepare plugins, analyzers, etc. Aliases are not
transported, but that is a good idea for an improvement.

Currently, knapsack plugin does not work on ES 1.3 but I am progressing to
implement this. I am adding a Java-level API. Currently it is REST only.

Jörg


On Thu, Jul 31, 2014 at 11:05 AM, Matteo Moci mox...@gmail.com wrote:

 [...]



Nested documents returned in search results by error

2014-07-31 Thread Jennifer Cumming
I'm trying to use a nested object type to store some additional data, but 
I'm running into some inconsistencies versus the documentation when it comes 
to searching. From the documentation, it sounds like a simple search like

{
  "query": {
    "bool": {
      "must" : {
        "query_string": {
          "query": "jane"
        }
      }
    }
  }
}

should not return any documents where the only instance of "jane" is in a 
nested document. Yet when I run this search against my test data, it does. 
The mystery deepens when I use the explain API to work out why it was 
returned: it says the document was matched on _all, yet I have 
include_in_all set to false in the mapping.

Mapping and data for the two documents: https://gist.github.com/anonymous/febab9c09bdf9ea9849c
Search results: https://gist.github.com/anonymous/74311fab5e2452938505
Explain: https://gist.github.com/anonymous/4fed3653658d70fb0df8
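
For comparison, a hedged sketch of how a nested field is normally queried (1.x Java API; the path "contacts" and index name are illustrative, not taken from the gists):

import org.elasticsearch.index.query.QueryBuilder;
import org.elasticsearch.index.query.QueryBuilders;

// Hedged sketch: nested fields are matched through a nested query on their path.
QueryBuilder q = QueryBuilders.nestedQuery("contacts",
        QueryBuilders.queryString("jane"));
SearchResponse resp = client.prepareSearch("test_index") // placeholder index
        .setQuery(q).execute().actionGet();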



Re: ElasticSearch memory usage on centralized log clusters

2014-07-31 Thread Mark Walkom
A lot of the answers for performance and capacity are "it depends".
You'll get *much* better performance from Oracle Java; 1.7u55 is the current
recommendation. Given you're experimenting with the G1 GC (which isn't current
best practice, hence "experimenting"), you might even want to try Java 1.8.

If you want to drop memory use you can disable bloom filtering.
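
A hedged sketch of that, assuming the 1.x setting index.codec.bloom.load, an existing Client named client, and an illustrative index pattern:

// Hedged sketch: stop loading bloom filters on existing 1.x indices.
client.admin().indices().prepareUpdateSettings("logstash-*") // illustrative pattern
        .setSettings(ImmutableSettings.settingsBuilder()
                .put("index.codec.bloom.load", false)
                .build())
        .execute().actionGet();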

Regards,
Mark Walkom

Infrastructure Engineer
Campaign Monitor
email: ma...@campaignmonitor.com
web: www.campaignmonitor.com


On 31 July 2014 19:16, Tim Stoop tim.st...@gmail.com wrote:

 [...]




3rd party scoring service

2014-07-31 Thread Alex S.V.
Hello,

My idea is to use a 3rd-party scoring service (REST); currently I'd like 
to use native scripts and play with NativeScriptFactory.
The approach has many drawbacks. 

Here is my problem: assume we have two entities, products and product 
prices. I should filter by price. 
Price is a complex thing, because it depends on many factors, like the request 
date, remote user information, and custom provided parameters. In the case of 
a regular parent-child relation and a has_child query, it's too complex and 
too slow to implement using scripting (currently mvel).

One more condition: I don't have many products, around 25K, but around 
25M different base price items (which are the basis for future price 
calculation).
I have these ideas:
1. Have a service which returns the exact price for all products given custom 
parameters. The drawback is that there would be 5 identical calls, one from each 
shard (if 5 by default). In this case it doesn't matter where base prices 
are stored: in an elasticsearch index, in a database, or in in-memory storage. 
2. Write code which operates over child price documents on a concrete 
shard. In this case it will generate prices only for the products from 
that particular shard. But I don't know if I can access the shard index, or make 
calls to the index from a concrete shard, in a NativeScriptFactory class. 

Could you point me the right way?

P.S. Initially I was interested in Redis-Elasticsearch 
example http://java.dzone.com/articles/connecting-redis-elasticsearch

Thanks,
Alex
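
A minimal sketch of idea 1 as a native script, assuming the prices are fetched from the REST service once per request (outside the script) and passed in as script params. The class, parameter, and field names below (ExternalPriceScriptFactory, "prices", "product_id") are hypothetical, not from this thread:

import java.util.Map;

import org.elasticsearch.common.Nullable;
import org.elasticsearch.script.AbstractFloatSearchScript;
import org.elasticsearch.script.ExecutableScript;
import org.elasticsearch.script.NativeScriptFactory;

// Registered from a plugin, e.g. module.registerScript("external_price", ExternalPriceScriptFactory.class)
public class ExternalPriceScriptFactory implements NativeScriptFactory {

    @Override
    public ExecutableScript newScript(@Nullable Map<String, Object> params) {
        // "prices" is a hypothetical script param: product id -> price,
        // fetched once from the external service before the query is built.
        @SuppressWarnings("unchecked")
        Map<String, Number> prices = (Map<String, Number>) params.get("prices");
        return new ExternalPriceScript(prices);
    }

    static class ExternalPriceScript extends AbstractFloatSearchScript {
        private final Map<String, Number> prices;

        ExternalPriceScript(Map<String, Number> prices) {
            this.prices = prices;
        }

        @Override
        public float runAsFloat() {
            // Look up the pre-fetched price for the current document.
            String productId = docFieldStrings("product_id").getValue();
            Number price = prices.get(productId);
            return price == null ? 0f : price.floatValue();
        }
    }
}

This keeps the external call at one per search request rather than one per document, though the per-shard duplication described above still applies to anything executed shard-side.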



Re: 3rd party scoring service

2014-07-31 Thread Itamar Syn-Hershko
You should bring the price over to Elasticsearch and not the other way
around. Scoring against an external service is an added friction with huge
performance costs.

--

Itamar Syn-Hershko
http://code972.com | @synhershko https://twitter.com/synhershko
Freelance Developer & Consultant
Author of RavenDB in Action http://manning.com/synhershko/




Re: 3rd party scoring service

2014-07-31 Thread Alex S.V.
I think it's acceptable if the service responds within 20ms, using some Thrift 
protocol for example. That's much better than the current 500ms-5s calculations 
using Elasticsearch scripting. 
If we have 25K products, then it could be around a 300KB data package from 
this service. The risk is possible broken communication or increased latency.

Alex



Re: ElasticSearch memory usage on centralized log clusters

2014-07-31 Thread Tim Stoop
Hi Mark,

Thanks for your reply!

On Thursday, 31 July 2014 12:16:36 UTC+2, Mark Walkom wrote:

 A lot of the answers for performance and capacity are "it depends".


Well, I'm currently not that worried about performance, as long as it can 
keep up with the amount of data we throw at it. It's currently 800 
messages/s at peak times, but we expect to grow to about 2000. Beyond that, 
as long as searches do not time out, I'd be happy. I'm much more worried 
about stability at the moment, hence my question about whether I should be 
worried about the memory usage I'm seeing.

 You'll get *much* better performance from Oracle Java; 1.7u55 is currently 
 recommended. Given you're experimenting with G1 (which isn't current 
 best practice, hence "experimenting"), you might even want to try Java 1.8.


Ok, I will try the Oracle JRE. Regarding G1 being experimental, I assume you 
mean from ES's point of view, right? From what I read, it's fully supported in 
Java 7. I didn't find any other way to solve the 'stop the world' GC pauses the 
CMS ran into every few hours :S I'm not a Java dev, however; I just wanted 
something that didn't crash once a day.

 If you want to drop memory use, you can disable bloom filtering.

Done, and that did indeed help a little.

-- 
Kind regards,
Tim Stoop



Elastic Search losing data not in _source on _update

2014-07-31 Thread Jordan Reiter
Hi, 

Every time I run a POST request using _update, I notice that any indexed 
information I didn't put in _source appears to go missing.

Obviously, it would be ideal if I didn't have to store, for example, the 
contents of a several-megabyte file in _source in order to keep it in my 
record after calling the _update method on my index/mapping. 

To start, here is the version info for Elasticsearch:

{
  "status" : 200,
  "name" : "Feron",
  "version" : {
    "number" : "1.3.1",
    "build_hash" : "2de6dc5268c32fb49b205233c138d93aaf772015",
    "build_timestamp" : "2014-07-28T14:45:15Z",
    "build_snapshot" : false,
    "lucene_version" : "4.9"
  },
  "tagline" : "You Know, for Search"
}


Here's my cluster health: 


{
  "cluster_name" : "my-cluster",
  "status" : "green",
  "timed_out" : false,
  "number_of_nodes" : 2,
  "number_of_data_nodes" : 2,
  "active_primary_shards" : 5,
  "active_shards" : 10,
  "relocating_shards" : 0,
  "initializing_shards" : 0,
  "unassigned_shards" : 0
}


A script for recreating the issue is attached. In it, I create a mapping and 
save a record using the attachment plugin. The records correctly match searches 
on a field in _source, a field excluded from _source, and within the content 
(attachment) field (also excluded from source).


As soon as I make the POST request to …/_update, searches against fields 
excluded from _source return 0 hits.


Is the only solution to this to store all fields in _source if I plan on 
calling _update on the record?

indexserver=example.org:9200
indexname=sample
curl -XDELETE "http://$indexserver/$indexname/losing_data/_mapping?pretty=1"
curl -XPUT "http://$indexserver/$indexname/losing_data/_mapping?pretty=1" -d '
{
  "losing_data" : {
    "_source" : {
      "enabled" : true,
      "excludes" : [ "content", "not_sourced" ]
    },
    "properties" : {
      "record_counts" : {
        "type" : "nested",
        "include_in_parent" : true,
        "properties" : {
          "first_count" : {
            "type" : "long"
          },
          "second_count" : {
            "type" : "long"
          }
        }
      },
      "description" : {
        "type" : "string"
      },
      "not_sourced" : {
        "type" : "string"
      },
      "blarf" : {
        "type" : "string"
      },
      "content" : {
        "type" : "attachment"
      }
    }
  }
}
'

file_path='test-1.rtf' # test-1.rtf is an RTF file containing the phrase "This contains red"
file_content=`cat $file_path | perl -MMIME::Base64 -ne 'print encode_base64($_)'`
json='
{
   "content" : "'${file_content}'",
   "not_sourced" : "Giraffe",
   "description" : "This is not about bears",
   "record_counts" : {
     "first_count" : 1,
     "second_count" : 100
   }
}
'
echo $json > json.file
curl -XPUT "http://$indexserver/$indexname/losing_data/1" -d @json.file

curl "http://$indexserver/$indexname/losing_data/_search?q=red&pretty=1"
# should return one hit (from the file)
curl "http://$indexserver/$indexname/losing_data/_search?q=giraffe&pretty=1"
# should return one hit (from the not_sourced field)
curl "http://$indexserver/$indexname/losing_data/_search?q=bears&pretty=1"
# should return one hit (from description)

curl "http://$indexserver/$indexname/losing_data/1?pretty=1"
# record_counts > first_count should be 1
curl -XPOST "http://$indexserver/$indexname/losing_data/1/_update" -d '{
  "script" : "ctx._source.record_counts.first_count += 1",
  "lang" : "groovy"
}'
curl "http://$indexserver/$indexname/losing_data/1/?pretty=1"
# record_counts > first_count should be 2

curl "http://$indexserver/$indexname/losing_data/_search?q=red&pretty=1"
# I get 0 hits
curl "http://$indexserver/$indexname/losing_data/_search?q=giraffe&pretty=1"
# I get 0 hits
curl "http://$indexserver/$indexname/losing_data/_search?q=bears&pretty=1"
# description is in source, so I still get a hit

Re: ElasticSearch memory usage on centralized log clusters

2014-07-31 Thread Mark Walkom
G1 is experimental in the sense that it's not recommended by the ES team, as
you guessed, even if it is supported within Java.

Regards,
Mark Walkom

Infrastructure Engineer
Campaign Monitor
email: ma...@campaignmonitor.com
web: www.campaignmonitor.com




Re: Recommendations needed for large ELK system design

2014-07-31 Thread Alex
Hello Mark,

Thank you for your reply, it certainly helps to clarify many things.

Of course I have some new questions for you!

   1. I haven't looked into it much yet, but I'm guessing Curator can handle 
   different index naming schemes, e.g. logs-2014.06.30 and 
   stats-2014.06.30. We'd actually want to store the stats data for 2 
   years and the logs for 90 days, so it would indeed be helpful to split the 
   data into different index sets. Do you use Curator?
   
   2. You say that you have 3 masters that also handle queries... but I 
   thought all masters did was handle queries? What is a master node that 
   *doesn't* handle queries? Should we have search load balancer nodes, i.e. 
   nodes that are neither master nor data nodes?
   
   3. In the interest of reducing the number of node combinations for us to 
   test, would you say that 3 master (and query(??)) only nodes and the 
   6 1TB data-only nodes would be good?
   
   4. Quorum and split brain are new to me. This page about split brain 
   (http://blog.trifork.com/2013/10/24/how-to-avoid-the-split-brain-problem-in-elasticsearch/) 
   recommends setting *discovery.zen.minimum_master_nodes* equal to *N/2 + 1*. 
   That formula is similar to the one given in the documentation for quorum 
   (http://www.elasticsearch.org/guide/en/elasticsearch/reference/current/docs-index_.html#index-consistency): 
   index operations only succeed if a quorum (replicas/2 + 1) of active shards 
   is available. I completely understand the split brain issue, but not 
   quorum. Is quorum handled automatically, or should I change some settings? 
   (See the sketch below.)
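
For what it's worth, a minimal sketch of the write-side quorum with the Java API (not from the thread; the index and field names are made up). QUORUM is the default consistency level for index operations; it is spelled out here only for clarity:

import org.elasticsearch.action.WriteConsistencyLevel;
import org.elasticsearch.client.Client;

public class QuorumWriteExample {

    // The write succeeds only if a quorum (replicas/2 + 1) of the
    // shard's copies is active; otherwise it fails after a timeout.
    public static void indexWithQuorum(Client client) {
        client.prepareIndex("logs-2014.06.30", "logs")
              .setSource("message", "hello quorum")
              .setConsistencyLevel(WriteConsistencyLevel.QUORUM)
              .execute().actionGet();
    }
}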

Thanks again for your help, we appreciate your time and knowledge!
Regards,
Alex

On Thursday, 31 July 2014 05:57:35 UTC+1, Mark Walkom wrote:

 1 - Looks ok, but why two replicas? You're chewing up disk for what 
 reason? Extra comments below.
 2 - It's personal preference really and depends on how your end points 
 send to redis.
 3 - 4GB for redis will cache quite a lot of data if you're only doing 50 
 events p/s (ie hours or even days based on what I've seen).
 4 - No, spread it out to all the nodes. More on that below though.
 5 - No it will handle that itself. Again, more on that below though.

 Suggestions;
 Set your indexes to (factors of) 6 shards, ie one per node, it spreads 
 query performance. I say factors of in that you can set it to 12 shards 
 per index to start and easily scale the node count and still spread the 
 load.
 Split your stats and your log data into different indexes, it'll make 
 management and retention easier.
 You can consider a master only node or (ideally) three that also handle 
 queries.
 Preferably have an uneven number of master eligible nodes, whether you 
 make them VMs or physicals, that way you can ensure quorum is reached with 
 minimal fuss and stop split brain.
 If you use VMs for master + query nodes then you might want to look at 
 load balancing the queries via an external service.

 To give you an idea, we have a 27 node cluster - 3 masters that also 
 handle queries and 24 data nodes. Masters are 8GB with small disks, data 
 nodes are 60GB (30 heap) and 512GB disk.
 We're running with one replica and have 11TB of logging data. At a high 
 level we're running out of disk more than heap or CPU and we're very write 
 heavy, with an average of 1K events p/s and comparatively minimal reads.

 Regards,
 Mark Walkom

 Infrastructure Engineer
 Campaign Monitor
 email: ma...@campaignmonitor.com
 web: www.campaignmonitor.com


 On 31 July 2014 01:35, Alex alex@gmail.com wrote:

 Hello,

 We wish to set up an entire ELK system with the following features:

- Input from Logstash shippers located on 400 Linux VMs. Only a 
handful of log sources on each VM. 
- Data retention for 30 days, which is roughly 2TB of data in indexed 
ES JSON form (not including replica shards)
- Estimated input data rate of 50 messages per second at peak hours. 
Mostly short or medium length one-line messages but there will be Java 
traces and very large service responses (in the form of XML) to deal with 
too. 
- The entire system would be on our company LAN.
- The stored data will be a mix of application logs (info, errors 
etc) and server stats (CPU, memory usage etc) and would mostly be 
 accessed 
through Kibana. 

 This is our current plan:

- Have the LS shippers perform minimal parsing (but would do 
multiline). Have them point to two load-balanced servers containing Redis 
and LS indexers (which would do all parsing). 
- 2 replica shards for each index, which ramps the total data storage 
up to 6TB
- ES cluster spread over 6 nodes. Each node is 1TB in size 
- LS indexers pointing to cluster.

 So I have a couple questions regarding the setup and would greatly 
 appreciate the advice of someone with experience!

1. Does the balance between the number of nodes, the number of 
replica 

Failure connecting to x.x.x.x: dial tcp x.x.x.x:x: connection refused

2014-07-31 Thread Indirajith V
Hi all, I am new to Elasticsearch and Logstash. Logstash suddenly stopped 
working. When I looked into the parsed logs, I can only see the following:

Jul 31 14:55:16 logs 2014-07-31T14:55:16+02:00 
logstash-forwarder[22514]: 2014/07/31 14:55:16.595025 Failure connecting to 
x.x.x.x: dial tcp x.x.x.x:x: connection refused.

Can anyone help me rectify this? I cannot find any solution. Thanks in 
advance!



Re: Elastic Search losing data not in _source on _update

2014-07-31 Thread David Pilato
Yes, you need the complete source, so excluding fields won't work as expected.
In that case, you need to send the attachment again, I guess.



-- 
David Pilato | Technical Advocate | Elasticsearch.com
@dadoonet | @elasticsearchfr




Re: more like this vs. mlt

2014-07-31 Thread Peter Li
Sorry, I copied and pasted the wrong query. It still didn't work with the 
min* clauses in the body:

curl -XGET "$url/ease/RadiologyResult/_search?routing=07009409&pretty" -d '
{
    "query" : {
        "more_like_this" : {
            "fields" : [
                "Observation.Value"
            ],
            "ids" : [ "90642" ],
            "min_term_freq" : 1,
            "min_doc_freq" : 1
        }
    }
}'

This returned nothing either.
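
For comparison, here is the same search expressed with the Java API; this is a sketch, not from the thread, and it assumes an ES version (1.2+) where MoreLikeThisQueryBuilder exposes ids(...):

import org.elasticsearch.action.search.SearchResponse;
import org.elasticsearch.client.Client;
import org.elasticsearch.index.query.QueryBuilders;

public class MltExample {

    public static SearchResponse findSimilar(Client client) {
        // Same parameters as the curl above: one source doc plus low frequency cutoffs.
        return client.prepareSearch("ease")
                .setTypes("RadiologyResult")
                .setRouting("07009409")
                .setQuery(QueryBuilders.moreLikeThisQuery("Observation.Value")
                        .ids("90642")
                        .minTermFreq(1)
                        .minDocFreq(1))
                .execute().actionGet();
    }
}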

On Wednesday, July 30, 2014 11:03:28 PM UTC-5, vineeth mohan wrote:

 Hello Peter,

 You have set these variables on the API call and not in the query; that is 
 why it works there - min_term_freq=1, min_doc_freq=1.

 Thanks
Vineeth


 On Thu, Jul 31, 2014 at 5:02 AM, Peter Li jenli...@gmail.com wrote:

 I ran a query:

 curl -XGET
 "$url/ease/RadiologyResult/90642/_mlt?routing=07009409&mlt_fields=Observation.Value&min_term_freq=1&min_doc_freq=1&pretty"

 It worked and returned several documents. But if I ran this:

 curl -XGET "$url/ease/RadiologyResult/_search?routing=07009409&pretty" -d '
 {
     "query" : {
         "more_like_this" : {
             "fields" : [
                 "Observation.Value"
             ],
             "ids" : [ "90642" ]
         }
     }
 }'

 It returned nothing. Is there something I am missing ?

 Thanks in advance.


Re: Elastic Search losing data not in _source on _update

2014-07-31 Thread Jordan Reiter
So I guess using updates is not a good idea for records with file 
attachments.



Re: Elastic Search losing data not in _source on _update

2014-07-31 Thread David Pilato
In your case, it's not, because you excluded the attachment field.

If you are a Java developer, you could easily use Tika directly in your own 
code and send Elasticsearch only the extracted content, not the binary 
file.
In that case, you could remove the mapper attachment plugin.

If not, I think you need to send the full JSON document again, including the 
binary file.
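
A minimal sketch of that suggestion, assuming Apache Tika on the classpath and an existing Client. The index, type, and id reuse the script from this thread; "content_text" is a hypothetical plain string field that replaces the attachment-typed field:

import java.io.File;

import org.apache.tika.Tika;
import org.elasticsearch.client.Client;
import org.elasticsearch.common.xcontent.XContentFactory;

public class TikaExtractExample {

    public static void indexExtractedText(Client client) throws Exception {
        // Extract plain text from the binary file on the client side.
        String text = new Tika().parseToString(new File("test-1.rtf"));

        // Index only the extracted text. The full document now fits in
        // _source, so a partial _update no longer loses indexed fields.
        client.prepareIndex("sample", "losing_data", "1")
              .setSource(XContentFactory.jsonBuilder()
                      .startObject()
                          .field("content_text", text)
                          .field("description", "This is not about bears")
                      .endObject())
              .execute().actionGet();
    }
}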

-- 
David Pilato | Technical Advocate | Elasticsearch.com
@dadoonet | @elasticsearchfr




Diagnosing a slow query

2014-07-31 Thread Christopher Ambler
Okay, let's attack this directly. We have a cluster of 6 machines (6 
nodes) and an index of just under 3.5 million documents. Each document 
represents an Internet domain name. We are performing queries against this 
index to see which names exist in it. Most queries come back in 
the sub-50ms range, but a bunch take 600ms to 900ms and thus 
show up in our slow query log. If they ALL were performing at this 
speed, I wouldn't be nearly as confused, but it looks like only about 10% 
to 20% of the queries are slow. That's clearly too much.

Head reports that this index looks like this:

aftermarket-2014-07-31_02-38-19
size: 424Mi (2.47Gi)
docs: 3,428,471 (3,428,471)

Here is the configuration for a typical node (they're all pretty much the 
same). We have 2 machines in a dev data center, 2 machines in a mesa data 
center, and 2 machines in a phx data center. Each of the two machines in a 
data center has a node.zone tag set, and, as you can see, cluster routing 
allocation awareness is set to use "zone" as its attribute. The 
data pipes between the data centers are beefy, and while I acknowledge that 
cross-DC isn't something that's generally smiled upon, it appears to work 
fine.

Each machine has 96G of RAM. We start ES giving it 30G for the heap size. 
File descriptors are set at 64,000. Note that I've selected the memory 
mapped file system.

#
# Server-specific settings for cluster domainiq-es
#
cluster.name: domainiq-es
node.name: Mesa-03
node.zone: es-mesa-prod
discovery.zen.ping.unicast.hosts: [dev2.glbt1.gdg, m1p1.mesa1.gdg, 
m1p4.mesa1.gdg, p3p3.phx3.gdg, p3p4.phx3.gdg]
#
# The following configuration items should be the same for all ES servers
#
node.master: true
node.data: true
index.number_of_shards: 5
index.number_of_replicas: 5
index.store.type: mmapfs
index.memory.index_buffer_size: 30%
index.translog.flush_threshold_ops: 25000
index.refresh_interval: 30s
bootstrap.mlockall: true
cluster.routing.allocation.awareness.attributes: zone
gateway.recover_after_nodes: 4
gateway.recover_after_time: 2m
gateway.expected_nodes: 6
discovery.zen.minimum_master_nodes: 3
discovery.zen.ping.timeout: 10s
discovery.zen.ping.retries: 3
discovery.zen.ping.interval: 15s
discovery.zen.ping.multicast.enabled: false

And here is a typical slow query:

[2014-07-31 07:35:31,530][WARN ][index.search.slowlog.query] [Mesa-03] 
[aftermarket-2014-07-31_02-38-19][2] took[707.6ms], took_millis[707], 
types[premium], stats[], search_type[QUERY_THEN_FETCH], total_shards[5], 
source[], 
extra_source[{"size":35,"query":{"query_string":{"query":"sld:petusies^20.0 
OR tokens:(((pet^1.2 pets^1.0 *^1.0)AND(us^1.2 *^0.8)AND(ie^1.2 
*^0.6)AND(s^1.2 *^0.4)) OR((pet^1.2 pets^1.0)AND(us^1.2)AND(ie^1.2))^3.0) 
AND tld:(com^1.001 OR in^0.99 OR co.in^0.941174367459617 OR 
net.in^0.8848832474555992 OR us^0.85 OR org.in^0.8397882862729736 OR 
gen.in^0.785829669672289 OR firm.in^0.7414549824163524 OR ind.in^0.7 OR 
org^0.6) OR 
_id:petusi.es^5.0 -domaintype:partner","lowercase_expanded_terms":true,"analyze_wildcard":false}}}],
 


So note that I create 5 shards and 5 replicas, so that each node has all 5 
shards at all times. I THOUGHT THIS MEANT BETTER PERFORMANCE. That is, I 
thought having all 5 shards on every node meant that a query to a node 
didn't have to ask another node for data. IS THIS NOT TRUE?

Here's where it also gets interesting: I tried setting the number of shards 
to 2 (with 5 replicas) and my slow queries went to almost 2 seconds 
(2000ms). This is also terribly counter-intuitive! I thought fewer shards 
meant less lookup time.

Clearly, I want to optimize for read here. I don't care if indexing is 
three times as slow, we need our queries to be sub-100ms.

Any help is SERIOUSLY appreciated (and if you're in the Bay Area, I'm not 
above bribes of beer :-))
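
One thing that may be worth testing, given that every node holds a full copy of each shard: the "_local" search preference asks the node that receives the request to execute against its own shard copies. A sketch with the Java API, with a deliberately simplified query:

import org.elasticsearch.action.search.SearchResponse;
import org.elasticsearch.client.Client;
import org.elasticsearch.index.query.QueryBuilders;

public class LocalPreferenceExample {

    public static SearchResponse searchLocalCopies(Client client) {
        return client.prepareSearch("aftermarket-2014-07-31_02-38-19")
                .setPreference("_local") // prefer shard copies on the receiving node
                .setQuery(QueryBuilders.queryString("sld:petusies^20.0"))
                .setSize(35)
                .execute().actionGet();
    }
}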



Re: Yet another OOME: Java heap space thread :S

2014-07-31 Thread Chris Neal
Oops, sorry. That was a copy/paste error. It is using 16GB. Here are
the correct process arguments:

/usr/bin/java -Xms16g -Xmx16g -Xss256k -Djava.awt.headless=true -server
-XX:+UseCompressedOops -XX:+UseG1GC -XX:MaxGCPauseMillis=20
-XX:+DisableExplicitGC -XX:+HeapDumpOnOutOfMemoryError -Delasticsearch
-Des.pidfile=/var/run/elasticsearch/elasticsearch.pid
-Des.path.home=/usr/share/elasticsearch [snip CP]

Thanks!
Chris


On Thu, Jul 31, 2014 at 2:43 AM, David Pilato da...@pilato.fr wrote:

 Why do you start with 8gb HEAP? Can't you give 16gb or so?

 /usr/bin/java -Xms8g -Xmx8g



 --
 David ;-)
 Twitter : @dadoonet / @elasticsearchfr / @scrutmydocs


 On 30 July 2014 at 19:47, Chris Neal chris.n...@derbysoft.net wrote:

 Hi everyone,

 First off, apologies for the thread.  I know OOME discussions are somewhat
 overdone in the group, but I need to reach out for some help for this one.

 I have a 2 node development cluster in EC2 on c3.4xlarge AMIs.  That means
 16 vCPUs, 30GB RAM, 1Gb network, and I have 2 500GB EBS volumes for
 Elasticsearch data on each AMI.

 I'm running Java 1.7.0_55, and using the G1 collector.  The Java args are:

 /usr/bin/java -Xms8g -Xmx8g -Xss256k -Djava.awt.headless=true -server
 -XX:+UseCompressedOops -XX:+UseG1GC -XX:MaxGCPauseMillis=20
 -XX:+DisableExplicitGC -XX:+HeapDumpOnOutOfMemoryError
 The index has 2 shards, each with 1 replica.

 I have a daily index being filled with application log data.  The index,
 on average, gets to be about:
 486M documents
 53.1GB (primary size)
 106.2GB (total size)

 Other than indexing, there really is nothing going on in the cluster.  No
 searches, or percolators, just collecting data.

 I have:

- Tweaked the idex.merge.policy
- Tweaked the indices.fielddata.breaker.limit and cache.size
- change the index refresh_interval from 1s to 60s
- created a default template for the index such that _all is disabled,
and all fields in the mapping are set to not_analyzed.

 Here is my complete elasticsearch.yml:

 action:
   disable_delete_all_indices: true
 cluster:
   name: elasticsearch-dev
 discovery:
   zen:
     minimum_master_nodes: 2
     ping:
       multicast:
         enabled: false
       unicast:
         hosts: 10.0.0.45,10.0.0.41
 gateway:
   recover_after_nodes: 2
 index:
   merge:
     policy:
       max_merge_at_once: 5
       max_merged_segment: 15gb
   number_of_replicas: 1
   number_of_shards: 2
   refresh_interval: 60s
 indices:
   fielddata:
     breaker:
       limit: 50%
     cache:
       size: 30%
 node:
   name: elasticsearch-ip-10-0-0-45
 path:
   data:
   - /usr/local/ebs01/elasticsearch
   - /usr/local/ebs02/elasticsearch
 threadpool:
   bulk:
     queue_size: 500
     size: 75
     type: fixed
   get:
     queue_size: 200
     size: 100
     type: fixed
   index:
     queue_size: 1000
     size: 100
     type: fixed
   search:
     queue_size: 200
     size: 100
     type: fixed

 The heap sits about 13GB used.   I had been batting OOME exceptions for
 awhile, and thought I had it licked, but one just popped up again.  My
 cluster has been up and running fine for 14 days, and I just got this OOME:

 =
 [2014-07-30 11:52:28,394][INFO ][monitor.jvm  ]
 [elasticsearch-ip-10-0-0-41] [gc][young][1158834][109906] duration [770ms],
 collections [1]/[1s], total [770ms]/[43.2m], memory
 [13.4gb]-[13.4gb]/[16gb], all_pools {[young]
 [648mb]-[8mb]/[0b]}{[survivor] [0b]-[0b]/[0b]}{[old]
 [12.8gb]-[13.4gb]/[16gb]}
 [2014-07-30 15:03:01,070][WARN ][index.engine.internal]
 [elasticsearch-ip-10-0-0-41] [derbysoft-20140730][0] failed engine [out of
 memory]
 [2014-07-30 15:03:10,324][WARN
 ][netty.channel.socket.nio.AbstractNioSelector] Unexpected exception in the
 selector loop.
 java.lang.OutOfMemoryError: Java heap space
 [2014-07-30 15:03:10,335][WARN
 ][netty.channel.socket.nio.AbstractNioSelector] Unexpected exception in the
 selector loop.
 java.lang.OutOfMemoryError: Java heap space
 [2014-07-30 15:03:10,324][WARN ][index.merge.scheduler]
 [elasticsearch-ip-10-0-0-41] [derbysoft-20140730][0] failed to merge
 java.lang.OutOfMemoryError: Java heap space
 [2014-07-30 15:03:28,595][WARN ][index.translog   ]
 [elasticsearch-ip-10-0-0-41] [derbysoft-20140730][0] failed to flush shard
 on translog threshold
 org.elasticsearch.index.engine.FlushFailedEngineException:
 [derbysoft-20140730][0] Flush failed
 at
 org.elasticsearch.index.engine.internal.InternalEngine.flush(InternalEngine.java:805)
  at
 org.elasticsearch.index.shard.service.InternalIndexShard.flush(InternalIndexShard.java:604)
 at
 org.elasticsearch.index.translog.TranslogService$TranslogBasedFlush$1.run(TranslogService.java:202)
  at
 java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
 at
 java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
  at java.lang.Thread.run(Thread.java:744)
 Caused by: java.lang.IllegalStateException: this writer hit an
 

Can the elasticsearch-reindex plugin preserve internal search indices?

2014-07-31 Thread Chris Berry
Greetings,

Recently, as suggested on this mailing list and around the web in several 
posts, we used the elasticsearch-reindex plugin to change the number of 
shards for a given large index, because the shards for that index had 
become too large.

But after the reindex, we could not search against the new index. 
All of the stored data was transferred properly, though.

In essence, none of the internal search indices transferred from the 
old index to the new one.
(The first sign of this was that the new index was about 1/4 the size of 
the old.)

I am assuming that I just did this wrong?
Is it possible to reindex from one index into another, preserving all the 
internal search indices?

BTW: it doesn't seem to matter whether we had the _source saved or 
not.

Thanks,
— Chris 
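
For context: source-based reindexing, which is what such tools generally do, re-sends each document's _source to the target index, where it is re-parsed and re-analyzed; Lucene's internal postings are never copied. Anything that cannot be rebuilt from _source is therefore lost. A minimal scan/scroll reindex sketch with the ES 1.x Java API (index names are placeholders):

import org.elasticsearch.action.bulk.BulkRequestBuilder;
import org.elasticsearch.action.search.SearchResponse;
import org.elasticsearch.action.search.SearchType;
import org.elasticsearch.client.Client;
import org.elasticsearch.common.unit.TimeValue;
import org.elasticsearch.index.query.QueryBuilders;
import org.elasticsearch.search.SearchHit;

public class ScrollReindexExample {

    public static void reindex(Client client, String from, String to) {
        SearchResponse scroll = client.prepareSearch(from)
                .setSearchType(SearchType.SCAN)
                .setScroll(TimeValue.timeValueMinutes(2))
                .setQuery(QueryBuilders.matchAllQuery())
                .setSize(500) // hits per shard per scroll round
                .execute().actionGet();
        while (true) {
            scroll = client.prepareSearchScroll(scroll.getScrollId())
                    .setScroll(TimeValue.timeValueMinutes(2))
                    .execute().actionGet();
            if (scroll.getHits().getHits().length == 0) {
                break;
            }
            BulkRequestBuilder bulk = client.prepareBulk();
            for (SearchHit hit : scroll.getHits()) {
                // Re-analysis happens on the target index; fields excluded
                // from _source cannot be rebuilt this way.
                bulk.add(client.prepareIndex(to, hit.getType(), hit.getId())
                        .setSource(hit.getSourceAsString()));
            }
            bulk.execute().actionGet();
        }
    }
}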



Re: Failed to configure logging error on start up.

2014-07-31 Thread Ivan Brusic
That scenario should not happen since FileVisitOption.FOLLOW_LINKS is
enabled.

https://github.com/elasticsearch/elasticsearch/blob/fe86c8bc88a321bf587dd8eb4df52aaed9ed2156/src/main/java/org/elasticsearch/common/logging/log4j/LogConfigurator.java#L107

Seems like a bug somewhere.
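
A standalone sketch of the same walk LogConfigurator performs, handy for checking whether the JDK on the affected machine actually follows the symlink (the path comes from Peter's layout below):

import java.io.IOException;
import java.nio.file.FileVisitOption;
import java.nio.file.FileVisitResult;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.nio.file.SimpleFileVisitor;
import java.nio.file.attribute.BasicFileAttributes;
import java.util.EnumSet;

public class WalkConfigExample {

    public static void main(String[] args) throws IOException {
        Path config = Paths.get("/data/elastic-1/config"); // contains the scripts symlink
        Files.walkFileTree(config, EnumSet.of(FileVisitOption.FOLLOW_LINKS),
                Integer.MAX_VALUE, new SimpleFileVisitor<Path>() {
                    @Override
                    public FileVisitResult visitFile(Path file, BasicFileAttributes attrs) {
                        System.out.println(file); // every file reachable through the walk
                        return FileVisitResult.CONTINUE;
                    }
                });
    }
}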

-- 

Ivan


On Wed, Jul 30, 2014 at 11:54 AM, Peter Li jenli.pe...@gmail.com wrote:

 Did more experiments. If I use a real scripts directory instead of a
 symbolic link, then there is no error message. But does this mean that I
 will have to drop the same script into every server's config/scripts
 directory? It would be nice to use symbolic links for this.

 Any suggestions?

 On Wednesday, July 30, 2014 1:36:24 PM UTC-5, Peter Li wrote:

 I have a setup with multiple servers.
 The file tree for each is like the following:

 /data/
configs/
  elastic-1.yml
  logging-1.yml
  scripts/
(empty)
elastic-core/   (from distribution)
  bin/...
  config/...
  lib/...
  logs/...
  elastic-1/
     bin -> ../elastic-core/bin
     config/
        elasticsearch.yml -> ../../configs/elastic-1.yml
        logging.yml -> ../../configs/logging-1.yml
        scripts -> ../../configs/scripts
     data/...
     lib -> ../elastic-core/lib
     logs/...

 In the elastic-1.yml, I have:

path.conf=/data/elastic-1/config

 When I start the node without the config/scripts symbolic link:

 /data/elastic-1/bin/elasticsearch -Des.config=/data/elastic-1/
 config/elasticsearch.yml

 It runs fine. But if I have the scripts link/directory, it complains of:

 Failed to configure logging...
 org.elasticsearch.ElasticsearchException: Failed to load logging
 configuration
 at org.elasticsearch.common.logging.log4j.LogConfigurator.
 resolveConfig(LogConfigurator.java:117)
 at org.elasticsearch.common.logging.log4j.LogConfigurator.
 configure(LogConfigurator.java:81)
 at org.elasticsearch.bootstrap.Bootstrap.setupLogging(Bootstrap.java:94)
 at org.elasticsearch.bootstrap.Bootstrap.main(Bootstrap.java:178)
 at org.elasticsearch.bootstrap.Elasticsearch.main(Elasticsearch.java:32)
 Caused by: java.nio.file.FileSystemException: /data/elastic-1/config:
 Unknown error 1912615013
 at sun.nio.fs.UnixException.translateToIOException(Unknown Source)
 at sun.nio.fs.UnixException.asIOException(Unknown Source)
 at sun.nio.fs.UnixDirectoryStream$UnixDirectoryIterator.readNextEntry(Unknown
 Source)
 at sun.nio.fs.UnixDirectoryStream$UnixDirectoryIterator.hasNext(Unknown
 Source)
 at java.nio.file.FileTreeWalker.walk(Unknown Source)
 at java.nio.file.FileTreeWalker.walk(Unknown Source)
 at java.nio.file.Files.walkFileTree(Unknown Source)
 at org.elasticsearch.common.logging.log4j.LogConfigurator.
 resolveConfig(LogConfigurator.java:107)
 ... 4 more
 log4j:WARN No appenders could be found for logger (common.jna).
 log4j:WARN Please initialize the log4j system properly.
 log4j:WARN See http://logging.apache.org/log4j/1.2/faq.html#noconfig for
 more info.

 Any guesses as to why it is complaining ?

 Thanks in advance.



Re: ip type and support for a port?

2014-07-31 Thread David Pilato
Yes. :80 should not be part of the IP. It's a port.

-- 
David Pilato | Technical Advocate | Elasticsearch.com
@dadoonet | @elasticsearchfr


On 31 July 2014 at 16:56:29, Chris Neal (chris.n...@derbysoft.net) wrote:

Hi Jorg,

Here's an example:

An "ip" field named 'host' in the mapping:
               "host": {
                 "type" : "ip"
               },

Then I add the below to ES:
POST /test_index/test
{
  "host" : "127.0.0.1:80"
}

ES returns this error:
{
   "error" : "MapperParsingException[failed to parse [host]]; nested: 
ElasticsearchIllegalArgumentException[failed to parse ip [127.0.0.1:80]]; 
nested: NumberFormatException[For input string: \"1:80\"]; ",
   "status" : 400
}

If I only post the IP without the port, like this:
POST /test_index/test
{
  "host" : "127.0.0.1"
}
ES returns success.

Does that help explain the issue?
Thanks!
Chris


On Wed, Jul 30, 2014 at 4:35 PM, joergpra...@gmail.com wrote:
Can you give an example of what you mean by IP ports?

Transport protocols like TCP have ports, but IP (Internet addresses) is used to 
address hosts on a network.

Jörg



On Wed, Jul 30, 2014 at 11:02 PM, Chris Neal chris.n...@derbysoft.net wrote:
Hi all,

I'm trying to use the ip type in ES, but my IPs also have ports.  That doesn't 
seem to be supported, which was a bit of a surprise!

Does anyone know of a way to do this?  Or does it sound like a good feature to 
add support for to this type?

Thanks!
Chris


Re: ip type and support for a port?

2014-07-31 Thread joergpra...@gmail.com
Maybe it would help to implement a new ES data type for URI/URL?

Usage example:

POST /test/test
{
    "url" : "http://127.0.0.1:80"
}

Jörg




Re: ip type and support for a port?

2014-07-31 Thread David Pilato
I think you should handle that at index time: separate the IP from the port 
and index each in its own field (a sketch follows the quoted question below).

-- 
David Pilato | Technical Advocate | Elasticsearch.com
@dadoonet | @elasticsearchfr


On 31 July 2014 at 18:38:09, Chris Neal (chris.n...@derbysoft.net) wrote:

Right, but I was more wondering about the use case of supporting a port in 
*some* fashion, in some type?
Or is a string type the best type for holding an ip:port combination?

Thanks for the help :)
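
A sketch of David's suggestion above, splitting at index time with the Java API; it assumes a mapping where "host" has type "ip" and "port" is an integer:

import org.elasticsearch.client.Client;
import org.elasticsearch.common.xcontent.XContentFactory;

public class HostPortSplitExample {

    public static void indexHostAndPort(Client client, String hostAndPort) throws Exception {
        // e.g. "127.0.0.1:80" -> ip "127.0.0.1", port 80
        int colon = hostAndPort.lastIndexOf(':');
        String ip = hostAndPort.substring(0, colon);
        int port = Integer.parseInt(hostAndPort.substring(colon + 1));

        client.prepareIndex("test_index", "test")
              .setSource(XContentFactory.jsonBuilder()
                      .startObject()
                          .field("host", ip)
                          .field("port", port)
                      .endObject())
              .execute().actionGet();
    }
}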





Re: ip type and support for a port?

2014-07-31 Thread David Pilato
I don't see the value we can add here.
For example, does a range query on a URL make sense?

If it's only about breaking the string into two subfields at index time, that 
can be done by a plugin, like the mapper attachment plugin does. Although I'd 
prefer using Logstash, for example, to do that before sending the index 
request to ES.

Just my 0.05 cents here :)

-- 
David Pilato | Technical Advocate | Elasticsearch.com
@dadoonet | @elasticsearchfr


Le 31 juillet 2014 à 18:39:33, joergpra...@gmail.com (joergpra...@gmail.com) a 
écrit:

Maybe it would help to implement a new ES data type for URI/URL?

Usage example:

POST /test/test
{
    url : http://127.0.0.1:80;
}

Jörg


On Thu, Jul 31, 2014 at 4:56 PM, Chris Neal chris.n...@derbysoft.net wrote:
Hi Jorg,

Here's an example:

An ip field named as 'host' in mapping.
               host: {
                 type :ip
               },

Then I add below to ES
POST /test_index/test
{
  host:127.0.0.1:80
}

ES returns this error
{
   error: MapperParsingException[failed to parse [host]]; nested: 
ElasticsearchIllegalArgumentException[failed to parse ip [127.0.0.1:80]]; 
nested: NumberFormatException[For input string: \1:80\]; ,
   status: 400
}

If I only post IP without port like this
POST /test_index/test
{
  host:127.0.0.1
}
ES return success.

Does that help explain the issue?
Thanks!
Chris


On Wed, Jul 30, 2014 at 4:35 PM, joergpra...@gmail.com joergpra...@gmail.com 
wrote:
Can you give an example what you mean by IP ports?

Transport protocols like TCP have ports, but IP (Internet addresses) is used to 
address hosts on a network.

Jörg



On Wed, Jul 30, 2014 at 11:02 PM, Chris Neal chris.n...@derbysoft.net wrote:
Hi all,

I'm trying to use the ip type in ES, but my IPs also have ports.  That doesn't 
seem to be supported, which was a bit of a surprise!

Does anyone know of a way to do this?  Or does it sound like a good feature to 
add support for to this type?

Thanks!
Chris


Re: Elasticsearch still scans all types in an index even if I specify a type

2014-07-31 Thread Ivan Brusic
All types eventually belong to the same Lucene index and Lucene cannot
handle different types for the same field name. Avoid using the same name
across types if the field type is different.

http://www.elasticsearch.org/guide/en/elasticsearch/guide/current/mapping.html#_avoiding_type_gotchas
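
For instance, a minimal sketch of that advice (the index name and the val_num
/ val_str field names below are made up), giving each type its own field name
so the underlying Lucene field keeps one consistent type:

curl -XPUT 'localhost:9200/testindex2' -d '
{
  "mappings": {
    "action1": { "properties": { "val_num": { "type": "long"   } } },
    "action2": { "properties": { "val_str": { "type": "string" } } }
  }
}'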

-- 
Ivan


On Wed, Jul 30, 2014 at 8:58 PM, panfei cnwe...@gmail.com wrote:

 First, put some sample data:

 curl -XPUT 'localhost:9200/testindex/action1/1?pretty' -d '
 {
   "title": "jumping tom",
   "val": 101
 }'

 curl -XPUT 'localhost:9200/testindex/action2/1?pretty' -d '
 {
   "title": "jumping jerry",
   "val": "test"
 }'

 and the resulting mapping is:

 {
   "action1" : {
     "properties" : {
       "val" : { "type" : "long" },
       "title" : { "type" : "string" }
     }
   },
   "action2" : {
     "properties" : {
       "val" : { "type" : "string" },
       "title" : { "type" : "string" }
     }
   }
 }

 But when running an aggs query:

 curl 'http://192.168.2.245:9200/testindex/action1/_search' -d '
 {
   "aggs": {
     "vals": {
       "terms": {
         "field": "val"
       }
     }
   }
 }'


 {
   "took" : 37,
   "timed_out" : false,
   "_shards" : {
     "total" : 5,
     "successful" : 4,
     "failed" : 1,
     "failures" : [ {
       "index" : "testindex",
       "shard" : 2,
       "status" : 500,
       "reason" : "RemoteTransportException[[a00][inet[/192.168.2.246:9300]][search/phase/query]]; nested: ElasticsearchException[java.lang.NumberFormatException: Invalid shift value (84) in prefixCoded bytes (is encoded value really an INT?)]; nested: UncheckedExecutionException[java.lang.NumberFormatException: Invalid shift value (84) in prefixCoded bytes (is encoded value really an INT?)]; nested: NumberFormatException[Invalid shift value (84) in prefixCoded bytes (is encoded value really an INT?)]; "
     } ]
   },
   "hits" : {
     "total" : 0,
     "max_score" : null,
     "hits" : [ ]
   },
   "aggregations" : {
     "vals" : {
       "buckets" : [ ]
     }
   }
 }

 The val field in the action1 type is mapped to long, but it seems that ES
 still scans the action2 type even if I specify the action1 type.

 Any advice to resolve this issue? Thanks.
 --
 不学习,不知道 (if you don't learn, you don't know)



Re: Recommendation for using stop words

2014-07-31 Thread Ivan Brusic
Tuning stop words can be as long a process as you want it to be. Saving
your queries/results and doing some search analytics can help you fine tune
the stop words. In general, the default stop words list is very good for
English, but Twitterspeak is not really English. :)

You can look at all the terms in your inverted index (
https://github.com/jprante/elasticsearch-index-termlist) to see what the
top words are and see if they are relevant.

Have you looked at the common words query?

http://www.elasticsearch.org/guide/en/elasticsearch/reference/current/query-dsl-common-terms-query.html
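
A minimal sketch of it (the body field and the cutoff value here are
illustrative):

GET /_search
{
  "query": {
    "common": {
      "body": {
        "query": "this is bonsai cool",
        "cutoff_frequency": 0.001
      }
    }
  }
}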

Cheers,

Ivan


On Wed, Jul 30, 2014 at 4:35 PM, sulemanmubarik sulemanmuba...@gmail.com
wrote:

 Hi
 What are good recommendations for using stop words?
 Is using the default stop words provided by elasticsearch good,
 or should I use some custom stop words too?
 More than 60% of the data is from twitter.
 Thanks




 --
 View this message in context:
 http://elasticsearch-users.115913.n3.nabble.com/Recommendation-for-using-stop-words-tp4060924.html
 Sent from the ElasticSearch Users mailing list archive at Nabble.com.



Re: EsRejectedExecutionException

2014-07-31 Thread Ivan Brusic
The thread pool will reject any search requests when there are 1000 actions
already queued.

http://www.elasticsearch.org/guide/en/elasticsearch/reference/current/modules-threadpool.html
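
If the bursts are legitimate and you really need a deeper queue (usually it is
better to fix whatever is flooding the cluster), the size can be raised in
elasticsearch.yml; the value below is only an example:

threadpool.search.queue_size: 2000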

Do you have this many search requests at one time? Do you have warmers
and/or percolators running since you mentioned it occurred at startup?

-- 
Ivan


On Thu, Jul 31, 2014 at 12:36 AM, Anand kumar anandv1...@gmail.com wrote:

 Hi All,

 In my cluster, I have around 500 indices. When I try to start the
 elasticsearch instance, it shows the following exception.
 Why is this happening, and what should be done to resolve it?

 Thanks,
 Anand


 [2014-07-31 11:50:01,551][DEBUG][action.search.type   ] [ESCS_NODE]
 [components][3], node[VQMkI5CBRB2hDEnZ5yVLcw], [P], s[STARTED]: Failed to
 execute [org.elasticsearch.action.search.SearchRequest@c34bbb8] lastShard
 [true]
 org.elasticsearch.common.util.concurrent.EsRejectedExecutionException:
 rejected execution (queue capacity 1000) on
 org.elasticsearch.search.action.SearchServiceTransportAction$23@60cfc409
 at
 org.elasticsearch.common.util.concurrent.EsAbortPolicy.rejectedExecution(EsAbortPolicy.java:62)
 at
 java.util.concurrent.ThreadPoolExecutor.reject(ThreadPoolExecutor.java:821)
 at
 java.util.concurrent.ThreadPoolExecutor.execute(ThreadPoolExecutor.java:1372)
 at
 org.elasticsearch.search.action.SearchServiceTransportAction.execute(SearchServiceTransportAction.java:509)
 at
 org.elasticsearch.search.action.SearchServiceTransportAction.sendExecuteQuery(SearchServiceTransportAction.java:203)
 at
 org.elasticsearch.action.search.type.TransportSearchQueryThenFetchAction$AsyncAction.sendExecuteFirstPhase(TransportSearchQueryThenFetchAction.java:80)
 at
 org.elasticsearch.action.search.type.TransportSearchTypeAction$BaseAsyncAction.performFirstPhase(TransportSearchTypeAction.java:171)
 at
 org.elasticsearch.action.search.type.TransportSearchTypeAction$BaseAsyncAction.start(TransportSearchTypeAction.java:153)
 at
 org.elasticsearch.action.search.type.TransportSearchQueryThenFetchAction.doExecute(TransportSearchQueryThenFetchAction.java:59)
 at
 org.elasticsearch.action.search.type.TransportSearchQueryThenFetchAction.doExecute(TransportSearchQueryThenFetchAction.java:49)
 at
 org.elasticsearch.action.support.TransportAction.execute(TransportAction.java:63)
 at
 org.elasticsearch.action.search.TransportSearchAction.doExecute(TransportSearchAction.java:101)
 at
 org.elasticsearch.action.search.TransportSearchAction.doExecute(TransportSearchAction.java:43)
 at
 org.elasticsearch.action.support.TransportAction.execute(TransportAction.java:63)
 at org.elasticsearch.client.node.NodeClient.execute(NodeClient.java:92)
 at
 org.elasticsearch.client.support.AbstractClient.search(AbstractClient.java:212)
 at
 org.elasticsearch.rest.action.search.RestSearchAction.handleRequest(RestSearchAction.java:75)
 at
 org.elasticsearch.rest.RestController.executeHandler(RestController.java:159)
 at
 org.elasticsearch.rest.RestController.dispatchRequest(RestController.java:142)
 at
 org.elasticsearch.http.HttpServer.internalDispatchRequest(HttpServer.java:121)
 at
 org.elasticsearch.http.HttpServer$Dispatcher.dispatchRequest(HttpServer.java:83)
 at
 org.elasticsearch.http.netty.NettyHttpServerTransport.dispatchRequest(NettyHttpServerTransport.java:294)
 at
 org.elasticsearch.http.netty.HttpRequestHandler.messageReceived(HttpRequestHandler.java:44)
 at
 org.elasticsearch.common.netty.channel.SimpleChannelUpstreamHandler.handleUpstream(SimpleChannelUpstreamHandler.java:70)
 at
 org.elasticsearch.common.netty.channel.DefaultChannelPipeline.sendUpstream(DefaultChannelPipeline.java:564)
 at
 org.elasticsearch.common.netty.channel.DefaultChannelPipeline$DefaultChannelHandlerContext.sendUpstream(DefaultChannelPipeline.java:791)
 at
 org.elasticsearch.common.netty.handler.codec.http.HttpChunkAggregator.messageReceived(HttpChunkAggregator.java:145)
 at
 org.elasticsearch.common.netty.channel.SimpleChannelUpstreamHandler.handleUpstream(SimpleChannelUpstreamHandler.java:70)
 at
 org.elasticsearch.common.netty.channel.DefaultChannelPipeline.sendUpstream(DefaultChannelPipeline.java:564)
 at
 org.elasticsearch.common.netty.channel.DefaultChannelPipeline$DefaultChannelHandlerContext.sendUpstream(DefaultChannelPipeline.java:791)
 at
 org.elasticsearch.common.netty.channel.Channels.fireMessageReceived(Channels.java:296)
 at
 org.elasticsearch.common.netty.handler.codec.frame.FrameDecoder.unfoldAndFireMessageReceived(FrameDecoder.java:459)
 at
 org.elasticsearch.common.netty.handler.codec.replay.ReplayingDecoder.callDecode(ReplayingDecoder.java:536)
 at
 org.elasticsearch.common.netty.handler.codec.replay.ReplayingDecoder.messageReceived(ReplayingDecoder.java:435)
 at
 

Re: ip type and support for a port?

2014-07-31 Thread joergpra...@gmail.com
Of course, range queries make much sense: for example, over schemes, IP
ranges/subnets, subdomains, or host names.

URL/URIs must also be canonicalized to make them comparable which is a
non-trivial issue, not resolvable by regex/pattern parsing.

URL/URIs data types are a building block for web crawlers.

I do not see Logstash being a webcrawler such as Nutch, Heritrix, or
Crawler4j.

My 2¢s.

Jörg



Re: Diagnosing a slow query

2014-07-31 Thread Christopher Ambler
It has been suggested that what I'm seeing is a CPU-bound issue: the
large number of OR directives in our query could make many of these queries
take a long time.

As I'm not an expert on crafting queries, any expert opinions?

Because I'm feeling pretty good about my configuration about now...



Re: Memory Explosion: Heap Dump in less than one minute

2014-07-31 Thread Tom Wilson
What exactly do I need to delete and how do I do it?

On Wednesday, July 30, 2014 5:45:03 PM UTC-7, Mark Walkom wrote:

 Unless you are attached to the stats you have in the marvel index for 
 today it might be easier to delete them than try to recover the unavailable 
 shards.

 Regards,
 Mark Walkom

 Infrastructure Engineer
 Campaign Monitor
 email: ma...@campaignmonitor.com javascript:
 web: www.campaignmonitor.com
  

 On 31 July 2014 10:36, Tom Wilson twils...@gmail.com javascript: 
 wrote:

 Upping to 1GB, memory usage seems to level off at 750MB, but there's a 
 problem in there somewhere. I'm getting a failure message, and the marvel 
 dashboard isn't able to fetch.


 C:\elasticsearch-1.1.1\binelasticsearch
 Picked up _JAVA_OPTIONS: -Djava.net.preferIPv4Stack=true
 [2014-07-30 17:33:27,138][INFO ][node ] [Mondo] 
 version[1.1.1], pid[10864], build[f1585f0/2014-04-16
 T14:27:12Z]
 [2014-07-30 17:33:27,139][INFO ][node ] [Mondo] 
 initializing ...
 [2014-07-30 17:33:27,163][INFO ][plugins  ] [Mondo] 
 loaded [ldap-river, marvel], sites [marvel]
 [2014-07-30 17:33:30,731][INFO ][node ] [Mondo] 
 initialized
 [2014-07-30 17:33:30,731][INFO ][node ] [Mondo] 
 starting ...
 [2014-07-30 17:33:31,027][INFO ][transport] [Mondo] 
 bound_address {inet[/0.0.0.0:9300]}, publish_address
  {inet[/192.168.0.6:9300]}
 [2014-07-30 17:33:34,202][INFO ][cluster.service  ] [Mondo] 
 new_master [Mondo][liyNQAHAS0-8f-qDDqa5Rg][twilson-T
 HINK][inet[/192.168.0.6:9300]], reason: zen-disco-join (elected_as_master)
 [2014-07-30 17:33:34,239][INFO ][discovery] [Mondo] 
 elasticsearch/liyNQAHAS0-8f-qDDqa5Rg
  [2014-07-30 17:33:34,600][INFO ][http ] [Mondo] 
 bound_address {inet[/0.0.0.0:9200]}, publish_address
  {inet[/192.168.0.6:9200]}
 [2014-07-30 17:33:35,799][INFO ][gateway  ] [Mondo] 
 recovered [66] indices into cluster_state
 [2014-07-30 17:33:35,815][INFO ][node ] [Mondo] 
 started
 [2014-07-30 17:33:39,823][DEBUG][action.search.type   ] [Mondo] All 
 shards failed for phase: [query_fetch]
 [2014-07-30 17:33:39,830][DEBUG][action.search.type   ] [Mondo] All 
 shards failed for phase: [query_fetch]
 [2014-07-30 17:33:39,837][DEBUG][action.search.type   ] [Mondo] All 
 shards failed for phase: [query_fetch]
 [2014-07-30 17:33:39,838][DEBUG][action.search.type   ] [Mondo] All 
 shards failed for phase: [query_fetch]
 [2014-07-30 17:33:43,973][DEBUG][action.search.type   ] [Mondo] All 
 shards failed for phase: [query_fetch]
 [2014-07-30 17:33:44,212][DEBUG][action.search.type   ] [Mondo] All 
 shards failed for phase: [query_fetch]
 [2014-07-30 17:33:44,357][DEBUG][action.search.type   ] [Mondo] All 
 shards failed for phase: [query_fetch]
 [2014-07-30 17:33:44,501][DEBUG][action.search.type   ] [Mondo] All 
 shards failed for phase: [query_fetch]
 [2014-07-30 17:33:53,294][DEBUG][action.search.type   ] [Mondo] All 
 shards failed for phase: [query_fetch]
 [2014-07-30 17:33:53,309][DEBUG][action.search.type   ] [Mondo] All 
 shards failed for phase: [query_fetch]
 [2014-07-30 17:33:53,310][DEBUG][action.search.type   ] [Mondo] All 
 shards failed for phase: [query_fetch]
 [2014-07-30 17:33:53,310][DEBUG][action.search.type   ] [Mondo] All 
 shards failed for phase: [query_fetch]
 [2014-07-30 17:34:03,281][DEBUG][action.search.type   ] [Mondo] All 
 shards failed for phase: [query_fetch]
 [2014-07-30 17:34:03,283][DEBUG][action.search.type   ] [Mondo] All 
 shards failed for phase: [query_fetch]
 [2014-07-30 17:34:03,286][DEBUG][action.search.type   ] [Mondo] All 
 shards failed for phase: [query_fetch]
 [2014-07-30 17:34:45,662][ERROR][marvel.agent.exporter] [Mondo] 
 create failure (index:[.marvel-2014.07.31] type: [no
 de_stats]): UnavailableShardsException[[.marvel-2014.07.31][0] [2] 
 shardIt, [0] active : Timeout waiting for [1m], reque
 st: org.elasticsearch.action.bulk.BulkShardRequest@39b65640]



 On Wednesday, July 30, 2014 5:30:29 PM UTC-7, Mark Walkom wrote:

 Up that to 1GB and see if it starts.
 512MB is pretty tiny, you're better off starting at 1/2GB if you can.

 Regards,
 Mark Walkom

 Infrastructure Engineer
 Campaign Monitor
 email: ma...@campaignmonitor.com
 web: www.campaignmonitor.com
  

 On 31 July 2014 10:28, Tom Wilson twils...@gmail.com wrote:

  JDK 1.7.0_51

 It has 512MB of heap, which was enough -- I've been running it like 
 that for the past few months, and I only have two indexes and around 
 300-400 documents. This is a development instance I'm running on my local 
 machine. This only happened when I started it today. 

 -tom


 On Wednesday, July 30, 2014 5:16:11 PM UTC-7, Mark Walkom wrote:

 What java version? How much heap have you allocated and how much RAM 
 on the server?

 Basically you have too much data for the heap size, 

Re: Memory Explosion: Heap Dump in less than one minute

2014-07-31 Thread Ivan Brusic
Look into the curator, which should help:
https://github.com/elasticsearch/curator
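
If you only want today's broken Marvel index gone (the index name below is
taken from the error in your log), a plain delete should do it:

curl -XDELETE 'localhost:9200/.marvel-2014.07.31'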

If you have just a single development instance, perhaps Marvel is an
overkill. Do you need historical metrics? If not, just use some other
plugin such as head/bigdesk/hq.

Cheers,

Ivan





Re: ip type and support for a port?

2014-07-31 Thread Chris Neal
Thanks guys for the discussion.  I'm glad we're talking it through!

For an immediate solution it is easy to parse off the port and store it
separately. Longer term I do see value in a url/uri type as Jorg mentions.
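
For example, a rough sketch of that parse in a Logstash filter (the field
names host_ip and host_port are illustrative):

filter {
  grok {
    # split "127.0.0.1:80" into separate ip and port fields
    match => [ "host", "%{IP:host_ip}:%{POSINT:host_port}" ]
  }
}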

Thanks again.  Much appreciated!
Chris



[ANN] Elasticsearch Juju charm

2014-07-31 Thread Jorge Castro
Hello everyone,

We (Ubuntu) have been working on a Juju[1] charm for Elasticsearch so that 
our users can deploy ES easily on as many clouds as possible. 

- https://github.com/charms/elasticsearch/tree/trusty

We use ansible to install the latest packages from the upstream repository, 
though the charm has an option of changing that for those of you that 
maintain your own internal mirrors of the packages. With this charm it's 
possible to stand up ES clusters on just about any cloud where Ubuntu runs. 
This 
is the 2nd iteration of our charm and we're using it in production, so 
we're keen on getting more eyeballs on it. The basic goal is to enable 
every Ubuntu user (and eventually Windows and CentOS) to easily deploy ES 
clusters out of the box in 14.04 (and in 12.04 after that as well). 

I was hoping we could get some community folks to give our code a peer 
review, perhaps point out places where we could improve, and to ensure that 
we're following upstream best practices. 

Any help/guidance would be appreciated!

--
Jorge Castro

PS: On a related note we do have charms for Kibana and Logstash as well:

- https://github.com/charms/kibana/
- https://github.com/charms/logstash-indexer/

And I am also currently working on what we call a bundle, which is a set 
of charms bundled together if anyone is interested in checking it out, the 
idea is for people to be able to drag and drop bundles for deployment: 
http://manage.jujucharms.com/bundle/~jorge/elasticsearch/cluster

[1]: http://juju.ubuntu.com



timestamp format

2014-07-31 Thread Cristian Falcas
Hello all,

I'm currently using rsyslogd to send messages to elasticsearch, with kibana
as a GUI.

Rsyslogd is sending the @timestamp in the following format:

2014-07-31T21:01:16.515922+03:00

I was wondering if elasticsearch is able to understand this format, because
kibana sorting doesn't do anything with it. The sorting is completely random.

Should I send the timestamp in another format? Can I keep the microseconds?

Best regards,
Cristian Falcas



Re: slow filter execution

2014-07-31 Thread Kireet Reddy
Quick update, I found that if I explicitly set _cache to true, things seem 
to work more as expected, i.e. subsequent executions of the query sped up. 
I looked at DateFieldMapper.rangeFilter() and to me it looks like if a 
number is passed, caching will be disabled unless it's explicitly set to 
true. Not sure if this has been fixed in 1.3.x yet or not. This meshes with 
my observed behavior. 
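
For reference, forcing the flag on the same filter as in the query below looks
like this:

{
  "range": {
    "published": {
      "to": 1406064191883
    },
    "_cache": true
  }
}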

On Wednesday, July 30, 2014 8:59:37 AM UTC-7, Kireet Reddy wrote:

 Thanks for the detailed reply. 

 I am a bit confused about and vs bool filter execution. I read this post 
 http://www.elasticsearch.org/blog/all-about-elasticsearch-filter-bitsets/ 
 on 
 the elasticsearch blog. From that, I thought the bool filter would work by 
 basically creating a bitset for the entire segment(s) being examined. If 
 the filter value changes every time, will this still be cheaper than an AND 
 filter that will just examine the matching docs? My segments can be very 
 big and this query for example on matched one document.

 There is no match_all query filter, there is a match query filter on a 
 field named all. :)

 Based on your feedback, I moved all filters, including the query filter, 
 into the bool filter. However it didn't change things: the query runs an 
 order of magnitude slower with the range filter, unless I set execution to 
 fielddata. I am using 1.2.2, I tried the strategy anyways and it didn't 
 make a difference.

 {
   "query": {
     "filtered": {
       "query": {
         "match_all": {}
       },
       "filter": {
         "bool": {
           "must": [
             { "terms": { "source_id": ["s1", "s2", "s3"] } },
             { "query": { "match": { "all": { "query": "foo" } } } },
             { "range": { "published": { "to": 1406064191883 } } }
           ]
         }
       }
     }
   },
   "sort": [
     { "crawlDate": { "order": "desc" } }
   ]
 }

 On Wednesday, July 30, 2014 4:30:10 AM UTC-7, Clinton Gormley wrote:

 Don't use the `and` filter - use the `bool` filter instead.  They have 
 different execution modes and the `bool` filter works best with bitset 
 filters (but also knows how to handle non-bitset filters like geo etc).  

 Just remove the `and`, `or` and `not` filters from your DSL vocabulary.

 Also, not sure why you are ANDing with a match_all filter - that doesn't 
 make much sense.

 Depending on which version of ES you're using, you may be encountering a 
 bug in the filtered query which ended up always running the query first, 
 instead of the filter. This was fixed in v1.2.0 
 https://github.com/elasticsearch/elasticsearch/issues/6247 .  If you are 
 on an earlier version you can force filter-first execution manually by 
 specifying a strategy of random_access_100.  See 
 http://www.elasticsearch.org/guide/en/elasticsearch/reference/current/query-dsl-filtered-query.html#_filter_strategy

 In summary, (and taking your less granular datetime clause into account) 
 your query would be better written as:

 GET /_search
 {
   "query": {
     "filtered": {
       "strategy": "random_access_100",   <-- pre 1.2 only
       "filter": {
         "bool": {
           "must": [
             { "terms": { "source_id": [ "s1", "s2", "s3" ] } },
             {
               "range": {
                 "published": {
                   "gte": "now-1d/d"   <-- coarse grained, cached
                 }
               }
             },
             {
               "range": {
                 "published": {
                   "gte": "now-30m"   <-- fine grained, not cached, could use fielddata too
                 },
                 "_cache": false
               }
             }
           ]
         }
       }
     }
   }
 }





 On 30 July 2014 10:55, David Pilato da...@pilato.fr wrote:

 Maybe a stupid question: why did you put that filter inside a query and 
 not within the same filter you have at the end?


 For my test case it's the same every time. In the real query it will 
 change every time, but I planned to not cache this filter and have a less 
 granular date filter in the bool filter that would be cached. However 
 while 
 debugging I 

Re: slow filter execution

2014-07-31 Thread Clinton Gormley
On 31 July 2014 20:25, Kireet Reddy kir...@feedly.com wrote:

 Quick update, I found that if I explicitly set _cache to true, things seem
 to work more as expected, i.e. subsequent executions of the query sped up.
 I looked at DateFieldMapper.rangeFilter() and to me it looks like if a
 number is passed, caching will be disabled unless it's explicitly set to
 true. Not sure if this has been fixed in 1.3.x yet or not. This meshes with
 my observed behavior.


Nice catch!!!

That's a notable bug!  Opened here:
https://github.com/elasticsearch/elasticsearch/issues/7114



Complete suggestion Issue

2014-07-31 Thread Madhavan Ramachandran
Hi 

I have created an index for typeahead, aka completion suggestion, in ES. I
fed the data below into ES.

us-en-us-chicago
us-en-us-chicago, central loop
us-en-us-chicago, east loop
us-en-us-chicago, eastern east-west
us-en-us-chicago, i-80 & i-55 corridors
us-en-us-chicago, i-90 corridor
us-en-us-chicago, north (cook county)
us-en-us-chicago, north (lake county)
us-en-us-chicago, north i-94 corridor
us-en-us-chicago, north michigan avenue
us-en-us-chicago, northwest
us-en-us-chicago, northwest indiana
us-en-us-chicago, o'hare
us-en-us-chicago, river north
us-en-us-chicago, west loop
us-en-us-chicago, western east-west

When I run the suggest query below, I get only 5 results out of the
above:

{
    "my-suggest" : {
        "text" : "us-en-us-c",
        "completion" : {
            "field" : "search"
        }
    }
}


{
    "_shards": {
        "total": 5,
        "successful": 5,
        "failed": 0
    },
    "my-suggest": [
        {
            "text": "us-en-us-c",
            "offset": 0,
            "length": 10,
            "options": [
                { "text": "us-en-us-chicago", "score": 1 },
                { "text": "us-en-us-chicago, central loop", "score": 1 },
                { "text": "us-en-us-chicago, east loop", "score": 1 },
                { "text": "us-en-us-chicago, eastern east-west", "score": 1 },
                { "text": "us-en-us-chicago, i-80 & i-55 corridors", "score": 1 }
            ]
        }
    ]
}

But if I do a search, I see all the results. How do I get all the
results using _suggest?
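
One sketch that may help: the completion suggester accepts a size option,
which defaults to 5 (matching the five options above), so something like this
should return more:

{
    "my-suggest" : {
        "text" : "us-en-us-c",
        "completion" : {
            "field" : "search",
            "size" : 20
        }
    }
}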

Regards
Madhavan.TR



Is it possible to not tokenize words with elasticsearch?

2014-07-31 Thread Fedele Mantuano
Hi,

I have a question. I have an elasticsearch server and I use logstash to put
logs into it, but I noticed that email addresses are tokenized:

t...@example-test.com becomes [test, example, test].

Is it possible to keep words from being tokenized in elasticsearch?

I found this guide:

http://www.elasticsearch.org/guide/en/elasticsearch/reference/current/analysis-word-delimiter-tokenfilter.html

but I can't get it to work.

Is this code correct?

curl -XPUT http://localhost:9200/test_index/ -d '
{
  "settings": {
    "analysis" : {
      "filter" : {
        "word_filter" : {
          "type": "word_delimiter",
          "generate_word_parts": false,
          "generate_number_parts": false,
          "split_on_numerics": false,
          "split_on_case_change": false,
          "preserve_original": true
}'


Thanks



Re: Is it possible to not tokenize words with elasticsearch?

2014-07-31 Thread David Pilato
Change the mapping for this field and change index to not_analyzed.
So it will be kept as is.
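
A minimal sketch of such a mapping (the type name logs and field name mail
are illustrative; note that an already-indexed field generally requires a
reindex for the change to take effect):

curl -XPUT 'localhost:9200/test_index/_mapping/logs' -d '
{
  "properties": {
    "mail": {
      "type": "string",
      "index": "not_analyzed"
    }
  }
}'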

--
David ;-)
Twitter : @dadoonet / @elasticsearchfr / @scrutmydocs




How to perform a wildcard in phrase search

2014-07-31 Thread Joyce Wu
Hi,

I have a document that contains the phrase interest entities. If I use a
wildcard query to search entit*, that document is returned. However, if I use
a wildcard query to do a phrase search interest*entit*, that document is
not found. I also tried a prefix query as well as a wildcard query with
interest entit*; no document is found either.

I would expect the following query to work, according to the Elasticsearch documentation:
http://www.elasticsearch.org/guide/en/elasticsearch/guide/current/_literal_wildcard_literal_and_literal_regexp_literal_queries.html

  {"wildcard": {"document": "interest*entit*"}}

The document was indexed using the standard tokenizer and standard filter,
so it should not be stemmed.

Does anyone have an idea how to get the wildcard query to work with a wildcard
in a phrase?
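
One possible approach, sketched here untested: the standard tokenizer indexes
interest and entities as separate terms, and a single wildcard never matches
across the space between terms, so one way to express the phrase is to wrap
per-term wildcards in a span_near:

{
  "query": {
    "span_near": {
      "clauses": [
        { "span_multi": { "match": { "wildcard": { "document": "interest*" } } } },
        { "span_multi": { "match": { "wildcard": { "document": "entit*" } } } }
      ],
      "slop": 0,
      "in_order": true
    }
  }
}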

Many thanks!


Joyce






Re: Remote access through SSH

2014-07-31 Thread Chia-Eng Chang
I tried curl -XGET http://IPADDRESS:9200/ but it failed.
Actually I can't even telnet to port 9200 from my machine.
My guess is that elasticsearch port 9200 is hidden behind ssh port 22.

So I use an ssh tunnel, forwarding port 9200 on the server to my machine,
like:

ssh -L my-port:target-machine:target-port user@target-machine
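
For instance, to expose the server's local port 9200 on your own machine's
port 9200 (the host name is illustrative):

ssh -L 9200:localhost:9200 user@my-cloud-server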

Then I can simply run curl -XGET localhost:9200 to query elasticsearch on my
cloud server.
The Java API transport client might need the same setting to make it work.


On Wednesday, July 30, 2014 5:48:28 PM UTC-7, Mark Walkom wrote:

 You can also curl from your local machine to the server, without having to 
 SSH to it - curl -XGET http://IPADDRESS:9200/

 You don't need to provide SSH credentials for that transport client 
 example.

 Regards,
 Mark Walkom

 Infrastructure Engineer
 Campaign Monitor
 email: ma...@campaignmonitor.com javascript:
 web: www.campaignmonitor.com


 On 31 July 2014 10:35, Chia-Eng Chang chia...@uci.edu javascript: 
 wrote:

 Thank you for the links. Yeah, I am new to ES. (and http rest)
 What I understand is that if I want to get the index documents on my SSH 
 server, I can SSH log in the server.
 And then rest http get from localhost:9200.

 Could you explain more about "use SSH directly for it"? 
 I think what I want to do is close to this transport client example 
 http://www.elasticsearch.org/guide/en/elasticsearch/client/java-api/current/client.html#transport-client
  
 But I have to provide ssh credential.


 On Wednesday, July 30, 2014 4:47:22 PM UTC-7, Mark Walkom wrote:

 You may want to look at http://www.elasticsearch.
 org/guide/en/elasticsearch/reference/current/search.html

 If you are just learning ES, then check out http://
 exploringelasticsearch.com/

 Regards,
 Mark Walkom

 Infrastructure Engineer
 Campaign Monitor
 email: ma...@campaignmonitor.com
 web: www.campaignmonitor.com
  

 On 31 July 2014 09:35, Chia-Eng Chang chia...@uci.edu wrote:

  Thanks @Mark
 I have a public key on server and I know how to SSH to server then get 
 the index from localhost:9200.
 But what I want to do is remotely obtain the index on the SSH server 
 (which I know its public IP)


 On Wednesday, July 30, 2014 3:56:04 PM UTC-7, Mark Walkom wrote:

 You need to use SSH directly for it, curl won't work.

 ssh user@host -i ~/.ssh/id_rsa.pub

 Assuming you have a public key on the server.

 Regards,
 Mark Walkom

 Infrastructure Engineer
 Campaign Monitor
 email: ma...@campaignmonitor.com
 web: www.campaignmonitor.com


 On 31 July 2014 08:47, Chia-Eng Chang chia...@uci.edu wrote:

  About the HTTP API, I wonder if I want to remote access a cluster 
 on SSH server, what should I include in my http rest command:

 example as mapping:

 curl -XGET 'http://localhost:9200/{index}/_mapping/{type}' 

 I  tried something like below but got failed:

 curl -XGET -u user_name: --key ~/.ssh/id_rsa --pubkey 
 ~/.ssh/id_rsa.pub  'xx.xxx.xxx.xxx:9200/index/_mapping/type'

 Is there anyone knows the solution?


Re: Remote access through SSH

2014-07-31 Thread Patrick Proniewski
Hi,

Looks like your problem is not related to ES. It's a networking problem. Your ES 
server appears to live behind a firewall (or is bound to 127.0.0.1 only), and 
your only way to contact it is to create an ssh tunnel to the server and to 
pipe your queries into this tunnel so that you can get in touch with the ES 
process.

You might want to get in touch with your hosting provider. Also, if you can run 
a web server (apache, nginx...) on your server, you might want to configure a 
proxy in this web server, so that requests sent to a particular URL will be 
redirected to the ES process. You would have something like this: 

client -> (public IP, port 80 or 443) http-server -> (localhost port 9200) ES process

It allows you to add some kind of access control to the ES process by using 
access control directives from Apache or Nginx.
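
A rough nginx sketch of such a proxy (names and ports illustrative):

server {
    listen 80;
    location / {
        # forward everything to the local Elasticsearch HTTP port
        proxy_pass http://127.0.0.1:9200;
    }
}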


Issue (Bug?) with field_value_factor and Search Templates

2014-07-31 Thread Germán Carrillo
Hi,

I'm getting the following error message while attempting to use the Search
Template (I use ES 1.3.1):

 *nested: ElasticsearchException[Unable to find a field mapper for field
[weight]*


I'm storing the template in the .scripts index as stated here [1]. The
field 'weight' is used inside a field_value_factor (function_score).

The query runs appropriately when not using a Search Template.


You can find a mapping, sample data, a working query, a sample template, and a
non-working template query at [2].
Is that a bug?


I'd appreciate any help,

Germán


[1]
http://www.elasticsearch.org/guide/en/elasticsearch/reference/current/search-template.html#_filling_in_a_query_string_with_a_single_value
 [2] https://titanpad.com/es-fvf-issue

-- 
You received this message because you are subscribed to the Google Groups 
elasticsearch group.
To unsubscribe from this group and stop receiving emails from it, send an email 
to elasticsearch+unsubscr...@googlegroups.com.
To view this discussion on the web visit 
https://groups.google.com/d/msgid/elasticsearch/CANaz7mwZPi5ZpYnBNHpvapyqW6kZfy8zfkuRgqROYjivFdHCnQ%40mail.gmail.com.
For more options, visit https://groups.google.com/d/optout.


Re: Recommendations needed for large ELK system design

2014-07-31 Thread Mark Walkom
1 - Curator FTW.
2 - Masters handle cluster state, shard allocation and a whole bunch of
other stuff around managing the cluster and its members and data. A node
with both master and data set to false is considered a search node. But the
role of being a master is not onerous, so it made sense for us to double up
the roles. We then just round robin any queries to these three masters.
3 - Yes, but it's entirely dependent on your environment. If you're
happy with that and you can get the go-ahead then see where it takes you.
4 - Quorum is automatic and having the n/2+1 means that a majority of
nodes will have to take part in an election, which reduces the possibility
of split brain. If you set the discovery settings then you are also
essentially setting the quorum settings.
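
For example, a minimal sketch in elasticsearch.yml (assuming three
master-eligible nodes, which is not stated above):

# quorum for 3 master-eligible nodes: 3/2 + 1 = 2
discovery.zen.minimum_master_nodes: 2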

Regards,
Mark Walkom

Infrastructure Engineer
Campaign Monitor
email: ma...@campaignmonitor.com
web: www.campaignmonitor.com


On 31 July 2014 22:27, Alex alex.mon...@gmail.com wrote:

 Hello Mark,

 Thank you for your reply, it certainly helps to clarify many things.

 Of course I have some new questions for you!

1. I haven't looked into it much yet but I'm guessing Curator can
handle different index naming schemes. E.g. logs-2014.06.30 and
stats-2014.06.30. We'd actually be wanting to store the stats data for 2
years and logs for 90 days so it would indeed be helpful to split the data
into different index sets. Do you use Curator?

2. You say that you have 3 masters that also handle queries... but I
thought all masters did was handle queries? What is a master node that
*doesn't* handle queries? Should we have search load balancer nodes?
AKA not master and not data nodes.

3. In the interests of reducing the number of node combinations for us
to test out would you say, then, that 3 master (and query(??)) only nodes,
and the 6 1TB data only nodes would be good?

4. Quorum and split brain are new to me. This webpage about split brain
(http://blog.trifork.com/2013/10/24/how-to-avoid-the-split-brain-problem-in-elasticsearch/)
recommends setting *discovery.zen.minimum_master_nodes* equal
to *N/2 + 1*. This formula is similar to the one given in the
documentation for quorum
(http://www.elasticsearch.org/guide/en/elasticsearch/reference/current/docs-index_.html#index-consistency):
"index operations only succeed if a quorum (replicas/2+1) of active shards
are available". I completely understand the split brain issue, but not
quorum. Is quorum handled automatically or should I change some settings?

 Thanks again for your help, we appreciate your time and knowledge!
 Regards,
 Alex


 On Thursday, 31 July 2014 05:57:35 UTC+1, Mark Walkom wrote:

 1 - Looks ok, but why two replicas? You're chewing up disk for what
 reason? Extra comments below.
 2 - It's personal preference really and depends on how your end points
 send to redis.
 3 - 4GB for redis will cache quite a lot of data if you're only doing 50
 events p/s (ie hours or even days based on what I've seen).
 4 - No, spread it out to all the nodes. More on that below though.
 5 - No it will handle that itself. Again, more on that below though.

 Suggestions;
 Set your indexes to (factors of) 6 shards, ie one per node, it spreads
 query performance. I say factors of in that you can set it to 12 shards
 per index to start and easily scale the node count and still spread the
 load.
 Split your stats and your log data into different indexes, it'll make
 management and retention easier.
 You can consider a master only node or (ideally) three that also handle
 queries.
 Preferably have an uneven number of master eligible nodes, whether you
 make them VMs or physicals, that way you can ensure quorum is reached with
 minimal fuss and stop split brain.
 If you use VMs for master + query nodes then you might want to look at
 load balancing the queries via an external service.

 To give you an idea, we have a 27 node cluster - 3 masters that also
 handle queries and 24 data nodes. Masters are 8GB with small disks, data
 nodes are 60GB (30 heap) and 512GB disk.
 We're running with one replica and have 11TB of logging data. At a high
 level we're running out of disk more than heap or CPU and we're very write
 heavy, with an average of 1K events p/s and comparatively minimal reads.

 Regards,
 Mark Walkom

 Infrastructure Engineer
 Campaign Monitor
 email: ma...@campaignmonitor.com
 web: www.campaignmonitor.com


 On 31 July 2014 01:35, Alex alex@gmail.com wrote:

  Hello,

 We wish to set up an entire ELK system with the following features:

- Input from Logstash shippers located on 400 Linux VMs. Only a
handful of log sources on each VM.
- Data retention for 30 days, which is roughly 2TB of data in
indexed ES JSON form (not including replica shards)
- Estimated input data rate of 50 messages per second at peak hours.
Mostly short or medium length one-line messages but there will be Java
 

Re: It is possibile don't token word with elasticsearch?

2014-07-31 Thread Fedele Mantuano
Thanks.

I do like this:

curl -XPUT http://localhost:9200/index_test/ -d '
{
  "mappings": {
    "type_test1" : {
      "properties" : {
        "sender-address" : {
          "type" : "string",
          "index" : "not_analyzed" },
        "recipient-address" : {
          "type" : "string",
          "index" : "not_analyzed" }}},
    "type_test2" : {
      "properties" : {
        "User-Agent" : {
          "type" : "string",
          "index" : "not_analyzed" }}}}}'

and I solved.
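
A quick way to verify the mapping: since the field is now not_analyzed, a term
query for the full address should match (a sketch; the address is made up):

curl -XGET 'http://localhost:9200/index_test/type_test1/_search' -d '
{
  "query" : {
    "term" : { "sender-address" : "test@example-test.com" }
  }
}'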

Bye bye

On Thursday, July 31, 2014 9:55:42 PM UTC+2, Fedele Mantuano wrote:

 Hi,

 I have a question. I have an elasticsearch server and I use logstash to 
 put logs into it, but I noticed that mail addresses are tokenized:
 
 t...@example-test.com became [test, example, test].
 
 Is it possible to stop elasticsearch from tokenizing a field?
 
 I found this guide:
 
 http://www.elasticsearch.org/guide/en/elasticsearch/reference/current/analysis-word-delimiter-tokenfilter.html
 
 but I can't get it to work.

 Is this code correct?

 curl -XPUT http://localhost:9200/test_index/ -d '
 {
   "settings": {
     "analysis" : {
       "filter" : {
         "word_filter" : {
           "type" : "word_delimiter",
           "generate_word_parts" : false,
           "generate_number_parts" : false,
           "split_on_numerics" : false,
           "split_on_case_change" : false,
           "preserve_original" : true
         }
       }
     }
   }
 }'


 Thanks


-- 
You received this message because you are subscribed to the Google Groups 
elasticsearch group.
To unsubscribe from this group and stop receiving emails from it, send an email 
to elasticsearch+unsubscr...@googlegroups.com.
To view this discussion on the web visit 
https://groups.google.com/d/msgid/elasticsearch/9fa76939-84ec-41c4-a47f-a0b3543a77a0%40googlegroups.com.
For more options, visit https://groups.google.com/d/optout.


Re: Diagnosing a slow query

2014-07-31 Thread Otis Gospodnetic
ORs certainly tend to be slower than simpler/shorter term queries, but I'd 
suspect the cross-DC part, because your index is tiny and your servers are 
beefy and plentiful.  Maybe you can look at your network and query metrics 
and correlate a drop or spike in traffic or packet loss with slow queries?

Otis
--
Performance Monitoring * Log Analytics * Search Analytics
Solr & Elasticsearch Support * http://sematext.com/

tel: +1 347 480 1610   fax: +1 718 679 9190



On Thursday, July 31, 2014 7:49:43 PM UTC+2, Christopher Ambler wrote:

 It has been suggested that what I'm seeing is a CPU-bound issue in that 
 the large number of OR directives in our query could make many of these 
 queries take a long time.

 As I'm not an expert on crafting queries, any expert opinions?

 Because I'm feeling pretty good about my configuration about now...


-- 
You received this message because you are subscribed to the Google Groups 
elasticsearch group.
To unsubscribe from this group and stop receiving emails from it, send an email 
to elasticsearch+unsubscr...@googlegroups.com.
To view this discussion on the web visit 
https://groups.google.com/d/msgid/elasticsearch/18093462-aa38-4f6b-835f-0f6578dd69dc%40googlegroups.com.
For more options, visit https://groups.google.com/d/optout.


Re: Recommendations needed for large ELK system design

2014-07-31 Thread Otis Gospodnetic
You can further simplify your architecture by using rsyslog with 
omelasticsearch instead of LS.

This might be handy: 
http://blog.sematext.com/2013/07/01/recipe-rsyslog-elasticsearch-kibana/
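
For example, a minimal omelasticsearch sketch for rsyslog.conf (the index and
type names are assumptions for illustration; other parameters keep their
defaults):

module(load="omelasticsearch")
# ship every message to a local elasticsearch over the bulk API
*.* action(type="omelasticsearch"
           server="localhost"
           serverport="9200"
           searchIndex="system-logs"
           searchType="events"
           bulkmode="on")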

Otis
--
Performance Monitoring * Log Analytics * Search Analytics
Solr & Elasticsearch Support * http://sematext.com/

tel: +1 347 480 1610   fax: +1 718 679 9190


On Friday, August 1, 2014 12:57:26 AM UTC+2, Mark Walkom wrote:

 1 - Curator FTW.
 2 - Masters handle cluster state, shard allocation and a whole bunch of 
 other stuff around managing the cluster and its members and data. A node 
 with both master and data set to false is considered a search node. But the 
 role of being a master is not onerous, so it made sense for us to double up 
 the roles. We then just round robin any queries to these three masters.
 3 - Yes, but it's entirely dependent on your environment. If you're 
 happy with that and you can get the go-ahead then see where it takes you.
 4 - Quorum is automatic and having the n/2+1 means that a majority of 
 nodes will have to take part in an election, which reduces the possibility 
 of split brain. If you set the discovery settings then you are also 
 essentially setting the quorum settings.

 Regards,
 Mark Walkom

 Infrastructure Engineer
 Campaign Monitor
 email: ma...@campaignmonitor.com javascript:
 web: www.campaignmonitor.com


 On 31 July 2014 22:27, Alex alex@gmail.com javascript: wrote:

 Hello Mark,

 Thank you for your reply, it certainly helps to clarify many things.

 Of course I have some new questions for you!

1. I haven't looked into it much yet but I'm guessing Curator can 
handle different index naming schemes. E.g. logs-2014.06.30 and 
stats-2014.06.30. We'd actually be wanting to store the stats data for 2 
years and logs for 90 days so it would indeed be helpful to split the data 
into different index sets. Do you use Curator?

2. You say that you have 3 masters that also handle queries... but 
I thought all masters did was handle queries? What is a master node that 
*doesn't* handle queries? Should we have "search load balancer" nodes? 
AKA not master and not data nodes.

3. In the interests of reducing the number of node combinations for 
us to test out would you say, then, that 3 master (and query(??)) only 
nodes, and the 6 1TB data only nodes would be good?

4. Quorum and split brain are new to me. This webpage about split brain 
(http://blog.trifork.com/2013/10/24/how-to-avoid-the-split-brain-problem-in-elasticsearch/) 
recommends setting *discovery.zen.minimum_master_nodes* equal 
to *N/2 + 1*. This formula is similar to the one given in the 
documentation for quorum 
(http://www.elasticsearch.org/guide/en/elasticsearch/reference/current/docs-index_.html#index-consistency): 
"index operations only succeed if a quorum (replicas/2+1) of active shards 
are available". I completely understand the split brain issue, but not 
quorum. Is quorum handled automatically or should I change some settings? 

 Thanks again for your help, we appreciate your time and knowledge!
 Regards,
 Alex


 On Thursday, 31 July 2014 05:57:35 UTC+1, Mark Walkom wrote:

 1 - Looks ok, but why two replicas? You're chewing up disk for what 
 reason? Extra comments below.
 2 - It's personal preference really and depends on how your end points 
 send to redis.
 3 - 4GB for redis will cache quite a lot of data if you're only doing 50 
 events p/s (ie hours or even days based on what I've seen).
 4 - No, spread it out to all the nodes. More on that below though.
 5 - No it will handle that itself. Again, more on that below though.

 Suggestions;
 Set your indexes to (factors of) 6 shards, ie one per node, it spreads 
 query performance. I say factors of in that you can set it to 12 shards 
 per index to start and easily scale the node count and still spread the 
 load.
 Split your stats and your log data into different indexes, it'll make 
 management and retention easier.
 You can consider a master only node or (ideally) three that also handle 
 queries.
 Preferably have an uneven number of master eligible nodes, whether you 
 make them VMs or physicals, that way you can ensure quorum is reached with 
 minimal fuss and stop split brain.
 If you use VMs for master + query nodes then you might want to look at 
 load balancing the queries via an external service.

 To give you an idea, we have a 27 node cluster - 3 masters that also 
 handle queries and 24 data nodes. Masters are 8GB with small disks, data 
 nodes are 60GB (30 heap) and 512GB disk.
 We're running with one replica and have 11TB of logging data. At a high 
 level we're running out of disk more than heap or CPU and we're very write 
 heavy, with an average of 1K events p/s and comparatively minimal reads.

 Regards,
 Mark Walkom

 Infrastructure Engineer
 Campaign Monitor
 email: ma...@campaignmonitor.com
 web: 

Re: Diagnosing a slow query

2014-07-31 Thread Christopher Ambler
Suspecting this, we tried taking things down to a single server and still 
got exactly the same response.

That said, optimizing the query to get rid of some of those ORs has helped, 
so I think that's the path we're taking.
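
As an illustration of that kind of rewrite (a sketch only; the field name and
values are made up, not taken from the actual query), a long chain of OR'd
term queries can often be collapsed into a single terms filter, which skips
scoring and can be cached:

curl -XPOST 'localhost:9200/myindex/_search' -d '
{
  "query" : {
    "filtered" : {
      "query" : { "match_all" : {} },
      "filter" : { "terms" : { "status" : [ "new", "open", "pending" ] } }
    }
  }
}'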

-- 
You received this message because you are subscribed to the Google Groups 
elasticsearch group.
To unsubscribe from this group and stop receiving emails from it, send an email 
to elasticsearch+unsubscr...@googlegroups.com.
To view this discussion on the web visit 
https://groups.google.com/d/msgid/elasticsearch/17cdcddd-fa02-40bf-b59c-dfee613c3a84%40googlegroups.com.
For more options, visit https://groups.google.com/d/optout.


Cluster making

2014-07-31 Thread arshpreet singh
Hi, I am new to clusters. At first I was using elasticsearch on a single 4 GB
machine, but now I want to scale it out. I am using Rocks Linux for building
the cluster. I have installed the frontend but am having problems forming the
cluster. I am following the exact tutorial provided on the Rocks Linux website,
but the frontend does not detect the node.
Is there a problem with my DHCP configuration?

-- 
You received this message because you are subscribed to the Google Groups 
elasticsearch group.
To unsubscribe from this group and stop receiving emails from it, send an email 
to elasticsearch+unsubscr...@googlegroups.com.
To view this discussion on the web visit 
https://groups.google.com/d/msgid/elasticsearch/CAAstK2FJ-MhHyarbxX21OdPh5i5SBdJUyx4aa7YaTgRApQd1gg%40mail.gmail.com.
For more options, visit https://groups.google.com/d/optout.


Re: Cluster making

2014-07-31 Thread arshpreet singh
On 1 Aug 2014 06:57, Mark Walkom ma...@campaignmonitor.com wrote:

 It will help if you link to the tutorial page, as this doesn't really
make much sense.

Thanks for the quick reply. I have installed Rocks on the frontend and am now
following the documentation for installing the cluster.

http://www.rocksclusters.org/roll-documentation/base/5.5/install-compute-nodes.html

-- 
You received this message because you are subscribed to the Google Groups 
elasticsearch group.
To unsubscribe from this group and stop receiving emails from it, send an email 
to elasticsearch+unsubscr...@googlegroups.com.
To view this discussion on the web visit 
https://groups.google.com/d/msgid/elasticsearch/CAAstK2F0iKFgyy_JFL_%3DyQBbpwJc4q3S1-pvVMhKaF1LUVQ63Q%40mail.gmail.com.
For more options, visit https://groups.google.com/d/optout.


Re: Cluster making

2014-07-31 Thread Mark Walkom
I don't think you need this - ES handles clustering by itself.
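
For the ES side, a minimal sketch of elasticsearch.yml (host names are
placeholders): give the nodes the same cluster name and, if multicast is not
available, list them explicitly:

cluster.name: my-cluster
# disable multicast discovery and point the nodes at each other
discovery.zen.ping.multicast.enabled: false
discovery.zen.ping.unicast.hosts: ["node1.example.com", "node2.example.com"]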

Regards,
Mark Walkom

Infrastructure Engineer
Campaign Monitor
email: ma...@campaignmonitor.com
web: www.campaignmonitor.com


On 1 August 2014 11:38, arshpreet singh arsh...@gmail.com wrote:


 On 1 Aug 2014 06:57, Mark Walkom ma...@campaignmonitor.com wrote:
 
  It will help if you link to the tutorial page, as this doesn't really
 make much sense.

 Thanks for quick reply. I have installed rocks on frontend and now
 following the documentation for cluster installing.


 http://www.rocksclusters.org/roll-documentation/base/5.5/install-compute-nodes.html

 --
 You received this message because you are subscribed to the Google Groups
 elasticsearch group.
 To unsubscribe from this group and stop receiving emails from it, send an
 email to elasticsearch+unsubscr...@googlegroups.com.
 To view this discussion on the web visit
 https://groups.google.com/d/msgid/elasticsearch/CAAstK2F0iKFgyy_JFL_%3DyQBbpwJc4q3S1-pvVMhKaF1LUVQ63Q%40mail.gmail.com
 https://groups.google.com/d/msgid/elasticsearch/CAAstK2F0iKFgyy_JFL_%3DyQBbpwJc4q3S1-pvVMhKaF1LUVQ63Q%40mail.gmail.com?utm_medium=emailutm_source=footer
 .

 For more options, visit https://groups.google.com/d/optout.


-- 
You received this message because you are subscribed to the Google Groups 
elasticsearch group.
To unsubscribe from this group and stop receiving emails from it, send an email 
to elasticsearch+unsubscr...@googlegroups.com.
To view this discussion on the web visit 
https://groups.google.com/d/msgid/elasticsearch/CAEM624YUOi9cxzeUJ-9tod2x0%2BBpAtg%2ByHG23j%2BcTLqqyygtew%40mail.gmail.com.
For more options, visit https://groups.google.com/d/optout.


index size impact on search performance?

2014-07-31 Thread Wang Yong
Hi folks, I have an index storing lots of time-series data. The data are put
into the index by:

curl -XPUT 'localhost:9200/testindex/action1/1?pretty' -d '
{
  "val": 23,
  "timestamp": 1406822400
}'

And the only thing I search in this index is a histogram facet over a very short
time range, like “the last 5 minutes”. I found that the performance was pretty
good at first, but when the index got bigger, the performance dropped to an
unacceptable level. Checking the output of iostat suggests that IO may be the
bottleneck.

My question is: even though I only facet over a very short time range, why does
the size of the index have such a big impact on the performance of that query?
Do I have to use daily indices, just like logstash?

-- 
You received this message because you are subscribed to the Google Groups 
elasticsearch group.
To unsubscribe from this group and stop receiving emails from it, send an email 
to elasticsearch+unsubscr...@googlegroups.com.
To view this discussion on the web visit 
https://groups.google.com/d/msgid/elasticsearch/00fd01cfad2e%2457938420%2406ba8c60%24%40gmail.com.
For more options, visit https://groups.google.com/d/optout.


Re: Elastic Search losing data not in _source on _update

2014-07-31 Thread Jordan Reiter
Is there any way to do this so it can be stored but I don't get it
when I pull in the _source record? Even extracted text is going to be
huge when you're talking about 20-30+ page documents.

On Thu, Jul 31, 2014 at 10:34 AM, David Pilato da...@pilato.fr wrote:
 In your case, it's not. Because you excluded the attachment field.

 If you are a Java developer, you could easily use Tika directly in your own
 code and send to elasticsearch only the extracted content and not the binary
 file.
 In that case, you could remove mapper attachment plugin.

 If not, I think you need to send again the full JSON document, including the
 binary file.
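
As an illustration of that suggestion, a minimal Tika sketch (the file name is
made up; tika-core plus tika-parsers, or tika-app, must be on the classpath):

import org.apache.tika.Tika;
import java.io.File;

public class Extract {
    public static void main(String[] args) throws Exception {
        Tika tika = new Tika();
        // extract plain text from the binary file before indexing
        String content = tika.parseToString(new File("report.pdf"));
        // index `content` as an ordinary string field instead of a base64 blob
        System.out.println(content.length() + " chars extracted");
    }
}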

 --
 David Pilato | Technical Advocate | Elasticsearch.com
 @dadoonet | @elasticsearchfr


 On 31 July 2014 at 16:32:04, Jordan Reiter (jordantheco...@gmail.com) wrote:

 So I guess using updates is not a good idea for records with file
 attachments.
 --
 You received this message because you are subscribed to the Google Groups
 elasticsearch group.
 To unsubscribe from this group and stop receiving emails from it, send an
 email to elasticsearch+unsubscr...@googlegroups.com.

 To view this discussion on the web visit
 https://groups.google.com/d/msgid/elasticsearch/972f1b41-d10b-4ebc-8ac2-c83b80891924%40googlegroups.com.
 For more options, visit https://groups.google.com/d/optout.

 --
 You received this message because you are subscribed to a topic in the
 Google Groups elasticsearch group.
 To unsubscribe from this topic, visit
 https://groups.google.com/d/topic/elasticsearch/26HBTz6XKgM/unsubscribe.
 To unsubscribe from this group and all its topics, send an email to
 elasticsearch+unsubscr...@googlegroups.com.
 To view this discussion on the web visit
 https://groups.google.com/d/msgid/elasticsearch/etPan.53da5401.684a481a.f0d0%40MacBook-Air-de-David.local.

 For more options, visit https://groups.google.com/d/optout.



-- 
Jordan Reiter
AACE - Association for the Advancement of Computing in Education
Email: jor...@aace.org | Website: www.aace.org | +1.267.438.2388

-- 
You received this message because you are subscribed to the Google Groups 
elasticsearch group.
To unsubscribe from this group and stop receiving emails from it, send an email 
to elasticsearch+unsubscr...@googlegroups.com.
To view this discussion on the web visit 
https://groups.google.com/d/msgid/elasticsearch/CAD4hTsUW6VZ2dhRjqyMaRV5uiTGYuAWiz4Z%3D0y0dtbozJeHMLA%40mail.gmail.com.
For more options, visit https://groups.google.com/d/optout.


Re: Elastic Search losing data not in _source on _update

2014-07-31 Thread Jordan Reiter
Never mind, a little googling answered that question:
http://www.elasticsearch.org/guide/en/elasticsearch/guide/current/root-object.html#source-field
"In a search request, you can ask for only certain fields by
specifying the _source parameter in the request body."

That neatly resolves my issue!
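
For example, a minimal sketch (index, type, and field name are assumed from
earlier in this thread):

curl -XGET 'localhost:9200/myindex/doc/_search' -d '
{
  "_source" : { "exclude" : [ "attachment" ] },
  "query" : { "match_all" : {} }
}'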

It does mean I'm going to have to change my mapping and, probably,
re-index my entire collection.

Thanks for your help
Jordan

On Thu, Jul 31, 2014 at 10:21 PM, Jordan Reiter jor...@aace.org wrote:
 Is there any way to do this so it can be stored but I don't get it
 when I pull in the _source record? Even extracted text is going to be
 huge when you're talking about 20-30+ page documents.

 On Thu, Jul 31, 2014 at 10:34 AM, David Pilato da...@pilato.fr wrote:
 In your case, it's not. Because you excluded the attachment field.

 If you are a Java developer, you could easily use Tika directly in your own
 code and send to elasticsearch only the extracted content and not the binary
 file.
 In that case, you could remove mapper attachment plugin.

 If not, I think you need to send again the full JSON document, including the
 binary file.

 --
 David Pilato | Technical Advocate | Elasticsearch.com
 @dadoonet | @elasticsearchfr


 On 31 July 2014 at 16:32:04, Jordan Reiter (jordantheco...@gmail.com) wrote:

 So I guess using updates is not a good idea for records with file
 attachments.
 --
 You received this message because you are subscribed to the Google Groups
 elasticsearch group.
 To unsubscribe from this group and stop receiving emails from it, send an
 email to elasticsearch+unsubscr...@googlegroups.com.

 To view this discussion on the web visit
 https://groups.google.com/d/msgid/elasticsearch/972f1b41-d10b-4ebc-8ac2-c83b80891924%40googlegroups.com.
 For more options, visit https://groups.google.com/d/optout.

 --
 You received this message because you are subscribed to a topic in the
 Google Groups elasticsearch group.
 To unsubscribe from this topic, visit
 https://groups.google.com/d/topic/elasticsearch/26HBTz6XKgM/unsubscribe.
 To unsubscribe from this group and all its topics, send an email to
 elasticsearch+unsubscr...@googlegroups.com.
 To view this discussion on the web visit
 https://groups.google.com/d/msgid/elasticsearch/etPan.53da5401.684a481a.f0d0%40MacBook-Air-de-David.local.

 For more options, visit https://groups.google.com/d/optout.



 --
 Jordan Reiter
 AACE - Association for the Advancement of Computing in Education
 Email: jor...@aace.org | Website: www.aace.org | +1.267.438.2388



-- 
Jordan Reiter
AACE - Association for the Advancement of Computing in Education
Email: jor...@aace.org | Website: www.aace.org | +1.267.438.2388

-- 
You received this message because you are subscribed to the Google Groups 
elasticsearch group.
To unsubscribe from this group and stop receiving emails from it, send an email 
to elasticsearch+unsubscr...@googlegroups.com.
To view this discussion on the web visit 
https://groups.google.com/d/msgid/elasticsearch/CAD4hTsWbKFb2iA9x-ezsz-EiY8j1gH%2BfkMpGv-khQnyUqv%3DqzA%40mail.gmail.com.
For more options, visit https://groups.google.com/d/optout.


Re: index size impact on search performance?

2014-07-31 Thread David Pilato
Well. I guess it depends on your query. What does it look like?

--
David ;-)
Twitter : @dadoonet / @elasticsearchfr / @scrutmydocs


On 1 August 2014 at 04:14, Wang Yong cnwangy...@gmail.com wrote:

Hi folks, I have an index storing lots of time-series data. The data are put
into the index by:

curl -XPUT 'localhost:9200/testindex/action1/1?pretty' -d '
{
  "val": 23,
  "timestamp": 1406822400
}'

And the only thing I search in this index is a histogram facet over a very short
time range, like “the last 5 minutes”. I found that the performance was pretty
good at first, but when the index got bigger, the performance dropped to an
unacceptable level. Checking the output of iostat suggests that IO may be the
bottleneck.

My question is: even though I only facet over a very short time range, why does
the size of the index have such a big impact on the performance of that query?
Do I have to use daily indices, just like logstash?
-- 
You received this message because you are subscribed to the Google Groups 
elasticsearch group.
To unsubscribe from this group and stop receiving emails from it, send an email 
to elasticsearch+unsubscr...@googlegroups.com.
To view this discussion on the web visit 
https://groups.google.com/d/msgid/elasticsearch/00fd01cfad2e%2457938420%2406ba8c60%24%40gmail.com.
For more options, visit https://groups.google.com/d/optout.

-- 
You received this message because you are subscribed to the Google Groups 
elasticsearch group.
To unsubscribe from this group and stop receiving emails from it, send an email 
to elasticsearch+unsubscr...@googlegroups.com.
To view this discussion on the web visit 
https://groups.google.com/d/msgid/elasticsearch/0D3FCF70-13F5-4E92-8407-47E907736F79%40pilato.fr.
For more options, visit https://groups.google.com/d/optout.


Re: index size impact on search performance?

2014-07-31 Thread Mark Walkom
If you're using time-series data then it makes sense to use time-based
indexes.

Regards,
Mark Walkom

Infrastructure Engineer
Campaign Monitor
email: ma...@campaignmonitor.com
web: www.campaignmonitor.com


On 1 August 2014 12:43, David Pilato da...@pilato.fr wrote:

 Well. I guess it depends on your query. What does it look like?

 --
 David ;-)
 Twitter : @dadoonet / @elasticsearchfr / @scrutmydocs


 On 1 August 2014 at 04:14, Wang Yong cnwangy...@gmail.com wrote:

 Hi folks, I have an index storing lots of time-series data. The data are
 put into the index by:

 curl -XPUT 'localhost:9200/testindex/action1/1?pretty' -d '
 {
   "val": 23,
   "timestamp": 1406822400
 }'

 And the only thing I search in this index is a histogram facet over a very
 short time range, like “the last 5 minutes”. I found that the performance was
 pretty good at first, but when the index got bigger, the performance
 dropped to an unacceptable level. Checking the output of iostat suggests
 that IO may be the bottleneck.

 My question is: even though I only facet over a very short time range, why
 does the size of the index have such a big impact on the performance of that
 query? Do I have to use daily indices, just like logstash?

 --
 You received this message because you are subscribed to the Google Groups
 elasticsearch group.
 To unsubscribe from this group and stop receiving emails from it, send an
 email to elasticsearch+unsubscr...@googlegroups.com.
 To view this discussion on the web visit
 https://groups.google.com/d/msgid/elasticsearch/00fd01cfad2e%2457938420%2406ba8c60%24%40gmail.com
 https://groups.google.com/d/msgid/elasticsearch/00fd01cfad2e%2457938420%2406ba8c60%24%40gmail.com?utm_medium=emailutm_source=footer
 .
 For more options, visit https://groups.google.com/d/optout.

 --
 You received this message because you are subscribed to the Google Groups
 elasticsearch group.
 To unsubscribe from this group and stop receiving emails from it, send an
 email to elasticsearch+unsubscr...@googlegroups.com.
 To view this discussion on the web visit
 https://groups.google.com/d/msgid/elasticsearch/0D3FCF70-13F5-4E92-8407-47E907736F79%40pilato.fr
 https://groups.google.com/d/msgid/elasticsearch/0D3FCF70-13F5-4E92-8407-47E907736F79%40pilato.fr?utm_medium=emailutm_source=footer
 .

 For more options, visit https://groups.google.com/d/optout.


-- 
You received this message because you are subscribed to the Google Groups 
elasticsearch group.
To unsubscribe from this group and stop receiving emails from it, send an email 
to elasticsearch+unsubscr...@googlegroups.com.
To view this discussion on the web visit 
https://groups.google.com/d/msgid/elasticsearch/CAEM624YEg4XVNGAN%2BL8P6QHKjFn_9JTqkJE7TtMHh8fPqZhFxg%40mail.gmail.com.
For more options, visit https://groups.google.com/d/optout.


stop types for tokenizer filter?

2014-07-31 Thread David Weinstein
Hi group,

I want to be able to specify what `types` to use as stop words. So:

curl -XGET '192.168.42.43:9200/_analyze?tokenizer=uax_url_email&char_filter=html_strip&filter=keyword' -d 'http://www.google.com hello'

{"tokens":[{"token":"http://www.google.com","start_offset":0,"end_offset":21,"type":"<URL>","position":1},{"token":"hello","start_offset":22,"end_offset":27,"type":"<ALPHANUM>","position":2}]}
 


I'd like to filter out the <ALPHANUM> tokens and keep the <URL> token.


I didn't see a token filter that would do this... did I miss something?


Regards,


David

-- 
You received this message because you are subscribed to the Google Groups 
elasticsearch group.
To unsubscribe from this group and stop receiving emails from it, send an email 
to elasticsearch+unsubscr...@googlegroups.com.
To view this discussion on the web visit 
https://groups.google.com/d/msgid/elasticsearch/3934b9de-c111-47d6-a204-d918c6beab7f%40googlegroups.com.
For more options, visit https://groups.google.com/d/optout.


extracted text visibility from a Tika-processed attachment

2014-07-31 Thread Erik Paulson
Hello -

I'm using Elasticsearch 1.2.1, with mapper-attachments-2.0.0. I'm a little
baffled by how to surface the text that Tika extracts from a PDF into the
structured document that ES is storing.

Long story short, with a trivial PDF file with one line of text, I'm
getting something like this:

{
  "_index" : "test",
  "_type" : "doc",
  "_id" : "1",
  "_score" : 0.067124054,
  "fields" : {
    "my_attachment.date" : [ "2014-07-31T23:29:45.000Z" ],
    "my_attachment.keywords" : [ "TestKeyword1, TestKeyword2" ],
    "my_attachment.title" : [ "Untitled" ]
  }
}


When what I want is this (with the content of the file included):

{
  "_index" : "test",
  "_type" : "doc",
  "_id" : "1",
  "_score" : 0.067124054,
  "fields" : {
    "my_attachment.date" : [ "2014-07-31T23:29:45.000Z" ],
    "my_attachment.keywords" : [ "TestKeyword1, TestKeyword2" ],
    "my_attachment.title" : [ "Untitled" ],
    "my_attachment.file" : "This is the easiest PDF ever."
  }
}


A somewhat related question: I'm also a bit confused as to the difference
between the fields from the attachment, and other fields in my document
that I'm storing in my _source. If I ask for the attachment fields, I don't
get anything else I stored in the document; if I don't ask for any fields,
I get everything from _source. Is there a way I can make the
my_attachment.* fields and the "Thing" field I store in my document
co-equal? I think what I want is for the my_attachment fields to show up
without having to explicitly ask for them.

My sample PDF documents are here:

http://pages.cs.wisc.edu/~epaulson/simplepdfs/Untitled1.pdf

http://pages.cs.wisc.edu/~epaulson/simplepdfs/Untitled2.pdf

And my curl/shell is below, followed by the sample output of a run.

curl -X DELETE localhost:9200/test
curl -X PUT localhost:9200/test

curl -X PUT localhost:9200/test/doc/_mapping -d '
{
  "doc" : {
    "properties" : {
      "my_attachment" : {
        "type" : "attachment",
        "fields" : {
          "title" : { "store" : "yes" },
          "date" : { "store" : "yes" },
          "author" : { "store" : "yes" },
          "keywords" : { "store" : "yes" },
          "content_type" : { "store" : "yes" },
          "content_length" : { "store" : "yes" },
          "language" : { "store" : "yes" },
          "file" : { "store" : "yes", "term_vector" : "with_positions_offsets" }
        }
      }
    }
  }
}'

echo
echo "Uploading a PDF with 'This is the easiest PDF ever'"
coded=`cat simple/Untitled1.pdf | base64`
json="{\"Thing\":\"first\",\"my_attachment\":\"${coded}\"}"
echo $json > json.file
curl -X PUT 'localhost:9200/test/doc/1?refresh=true' -d @json.file
rm json.file

echo
echo "Uploading a PDF with 'This is the second easiest PDF ever'"
coded=`cat simple/Untitled2.pdf | base64`
json="{\"Thing\": \"followup\", \"my_attachment\":\"${coded}\"}"
echo $json > json.file
curl -X PUT 'localhost:9200/test/doc/2?refresh=true' -d @json.file
rm json.file

echo
echo "Querying: Should get two hits"
curl -X POST 'localhost:9200/test/doc/_search?pretty=true' -d '{
  "fields": ["title", "author", "date", "file", "keywords"],
  "query" : { "match" : { "_all" : "easiest" } }
}'
echo
echo
echo "Querying: Should get one hit"
curl -X POST 'localhost:9200/test/doc/_search?pretty=true' -d '{
  "fields": "*",
  "query" : { "match" : { "_all" : "second" } }
}'
echo
echo
echo "Directly loading object 1"
echo
curl 'localhost:9200/test/doc/1'
echo

And the output:

{"acknowledged":true}{"acknowledged":true}{"acknowledged":true}

Uploading a PDF with 'This is the easiest PDF ever'

{"_index":"test","_type":"doc","_id":"1","_version":1,"created":true}

Uploading a PDF with 'This is the second easiest PDF ever'

{"_index":"test","_type":"doc","_id":"2","_version":1,"created":true}

Querying: Should get two hits

{
  "took" : 2,
  "timed_out" : false,
  "_shards" : {
    "total" : 5,
    "successful" : 5,
    "failed" : 0
  },
  "hits" : {
    "total" : 2,
    "max_score" : 0.067124054,
    "hits" : [ {
      "_index" : "test",
      "_type" : "doc",
      "_id" : "2",
      "_score" : 0.067124054,
      "fields" : {
        "my_attachment.date" : [ "2014-07-31T21:48:21.000Z" ],
        "my_attachment.keywords" : [ "" ],
        "my_attachment.title" : [ "Untitled" ]
      }
    }, {
      "_index" : "test",
      "_type" : "doc",
      "_id" : "1",
      "_score" : 0.067124054,
      "fields" : {
        "my_attachment.date" : [ "2014-07-31T23:29:45.000Z" ],
        "my_attachment.keywords" : [ "TestKeyword1, TestKeyword2" ],
        "my_attachment.title" : [ "Untitled" ]
      }
    } ]
  }
}


Querying: Should get one hit

{
  "took" : 2,
  "timed_out" : false,
  "_shards" : {
    "total" : 5,
    "successful" : 5,
    "failed" : 0
  },
  "hits" : {
    "total" : 1,
    "max_score" : 0.067124054,
    "hits" : [ {
      "_index" : "test",
      "_type" : "doc",
      "_id" : "2",
      "_score" : 0.067124054,
      "fields" : {
        "my_attachment.content_type" : [ "application/pdf" ],
        "my_attachment.keywords" : [ "" ],
        "my_attachment.title" : [ "Untitled" ],

 

Re: index size impact on search performance?

2014-07-31 Thread Wang Yong
Thank you David.

Most of my queries look like:

{
  "query": {
    "filtered": {
      "query": {
        "match_all": {}
      },
      "filter": {
        "range": {
          "timestamp": {
            "from": 1403567280,
            "to": 1403567340,
            "include_lower": true,
            "include_upper": false
          }
        }
      }
    }
  },
  "facets" : {
    "val" : {
      "statistical" : {
        "field" : "val"
      }
    }
  }
}






Sent from Surface





From: David Pilato
Sent: ‎Friday‎, ‎August‎ ‎1‎, ‎2014 ‎10‎:‎43‎ ‎AM
To: elasticsearch@googlegroups.com





Well. I guess it depends on your query. What does it look like?


--

David ;-)

Twitter : @dadoonet / @elasticsearchfr / @scrutmydocs





On 1 August 2014 at 04:14, Wang Yong cnwangy...@gmail.com wrote:





Hi folks, I have an index storing lots of time-series data. The data are put
into the index by:

curl -XPUT 'localhost:9200/testindex/action1/1?pretty' -d '
{
  "val": 23,
  "timestamp": 1406822400
}'

And the only thing I search in this index is a histogram facet over a very short
time range, like “the last 5 minutes”. I found that the performance was pretty
good at first, but when the index got bigger, the performance dropped to an
unacceptable level. Checking the output of iostat suggests that IO may be the
bottleneck.

My question is: even though I only facet over a very short time range, why does
the size of the index have such a big impact on the performance of that query?
Do I have to use daily indices, just like logstash?

-- 
You received this message because you are subscribed to the Google Groups 
elasticsearch group.
To unsubscribe from this group and stop receiving emails from it, send an email 
to elasticsearch+unsubscr...@googlegroups.com.
To view this discussion on the web visit 
https://groups.google.com/d/msgid/elasticsearch/00fd01cfad2e%2457938420%2406ba8c60%24%40gmail.com.
For more options, visit https://groups.google.com/d/optout.

-- 
You received this message because you are subscribed to the Google Groups 
elasticsearch group.
To unsubscribe from this group and stop receiving emails from it, send an email 
to elasticsearch+unsubscr...@googlegroups.com.
To view this discussion on the web visit 
https://groups.google.com/d/msgid/elasticsearch/0D3FCF70-13F5-4E92-8407-47E907736F79%40pilato.fr.
For more options, visit https://groups.google.com/d/optout.

-- 
You received this message because you are subscribed to the Google Groups 
elasticsearch group.
To unsubscribe from this group and stop receiving emails from it, send an email 
to elasticsearch+unsubscr...@googlegroups.com.
To view this discussion on the web visit 
https://groups.google.com/d/msgid/elasticsearch/53db1703.a567440a.7a19.36b5%40mx.google.com.
For more options, visit https://groups.google.com/d/optout.


Re: index size impact on search performance?

2014-07-31 Thread Wang Yong
Thank you Mark,

By “time based indexes”, do you mean creating one index every day? If 
I index my data in this way, I have to specify which index to search when 
creating the query in my Java client, based on the “from” and “to”.
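
One common way to handle that (a sketch; index names and dates are made up):
derive the daily index names from the query's time range and list them,
comma-separated, in the search URL:

# a range spanning June 23-24 only needs those two daily indices
curl -XGET 'localhost:9200/testindex-2014.06.23,testindex-2014.06.24/_search' -d '
{
  "query" : {
    "range" : { "timestamp" : { "from" : 1403567280, "to" : 1403567340 } }
  }
}'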






Sent from Surface





From: Mark Walkom
Sent: ‎Friday‎, ‎August‎ ‎1‎, ‎2014 ‎11‎:‎02‎ ‎AM
To: elasticsearch@googlegroups.com





If you're using time series data then it makes sense to use time based indexes.




Regards,
Mark Walkom

Infrastructure Engineer
Campaign Monitor
email: ma...@campaignmonitor.com
web: www.campaignmonitor.com



On 1 August 2014 12:43, David Pilato da...@pilato.fr wrote:



Well. I guess it depends on your query. What does it look like?


--

David ;-)

Twitter : @dadoonet / @elasticsearchfr / @scrutmydocs





Le 1 août 2014 à 04:14, Wang Yong cnwangy...@gmail.com a écrit :







Hi folks, I have an index storing lots of time-series data. The data are put
into the index by:

curl -XPUT 'localhost:9200/testindex/action1/1?pretty' -d '
{
  "val": 23,
  "timestamp": 1406822400
}'

And the only thing I search in this index is a histogram facet over a very short
time range, like “the last 5 minutes”. I found that the performance was pretty
good at first, but when the index got bigger, the performance dropped to an
unacceptable level. Checking the output of iostat suggests that IO may be the
bottleneck.

My question is: even though I only facet over a very short time range, why does
the size of the index have such a big impact on the performance of that query?
Do I have to use daily indices, just like logstash?

-- 
You received this message because you are subscribed to the Google Groups 
elasticsearch group.
To unsubscribe from this group and stop receiving emails from it, send an email 
to elasticsearch+unsubscr...@googlegroups.com.
To view this discussion on the web visit 
https://groups.google.com/d/msgid/elasticsearch/00fd01cfad2e%2457938420%2406ba8c60%24%40gmail.com.
For more options, visit https://groups.google.com/d/optout.



-- 
You received this message because you are subscribed to the Google Groups 
elasticsearch group.
To unsubscribe from this group and stop receiving emails from it, send an email 
to elasticsearch+unsubscr...@googlegroups.com.
To view this discussion on the web visit 
https://groups.google.com/d/msgid/elasticsearch/0D3FCF70-13F5-4E92-8407-47E907736F79%40pilato.fr.


For more options, visit https://groups.google.com/d/optout.



-- 
You received this message because you are subscribed to the Google Groups 
elasticsearch group.
To unsubscribe from this group and stop receiving emails from it, send an email 
to elasticsearch+unsubscr...@googlegroups.com.
To view this discussion on the web visit 
https://groups.google.com/d/msgid/elasticsearch/CAEM624YEg4XVNGAN%2BL8P6QHKjFn_9JTqkJE7TtMHh8fPqZhFxg%40mail.gmail.com.
For more options, visit https://groups.google.com/d/optout.

-- 
You received this message because you are subscribed to the Google Groups 
elasticsearch group.
To unsubscribe from this group and stop receiving emails from it, send an email 
to elasticsearch+unsubscr...@googlegroups.com.
To view this discussion on the web visit 
https://groups.google.com/d/msgid/elasticsearch/53db1a2f.8d5b460a.30e5.41d5%40mx.google.com.
For more options, visit https://groups.google.com/d/optout.


Unable to install plugin

2014-07-31 Thread sahil
Hey, I'm new to ES. I'm running elasticsearch as a service and installed it 
using this guide:

Run Elastic Search as a service on linux 
http://www.elasticsearch.org/guide/en/elasticsearch/reference/current/setup-service.html#setup-service

I'm unable to find any bin/plugin script and hence can't install 
plugins.

A little help would be really appreciated.
Thanks
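
For reference, a sketch assuming the DEB/RPM package layout described in that
guide (the plugin name is just an example): the plugin script lives under
/usr/share/elasticsearch rather than a local bin/ directory:

sudo /usr/share/elasticsearch/bin/plugin --install mobz/elasticsearch-head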

-- 
You received this message because you are subscribed to the Google Groups 
elasticsearch group.
To unsubscribe from this group and stop receiving emails from it, send an email 
to elasticsearch+unsubscr...@googlegroups.com.
To view this discussion on the web visit 
https://groups.google.com/d/msgid/elasticsearch/2697075a-ead1-4b78-adc6-9bb86a096b42%40googlegroups.com.
For more options, visit https://groups.google.com/d/optout.