date:20141204

Did you see this? 
http://www.elasticsearch.org/guide/en/elasticsearch/reference/current/setup-service-win.html

What is missing?

David

> Le 5 déc. 2014 à 07:05, Cheten Dev  a écrit :
> 
> Hi,
> 
> I am new to elastic search . i am going through documentation .
> documentation is mostly about Linux/unix , it doesn't mention how to 
> configuration for windows
> can i get some help on this?
> 
> Thanks 
> 
> -- 
> You received this message because you are subscribed to the Google Groups 
> "elasticsearch" group.
> To unsubscribe from this group and stop receiving emails from it, send an 
> email to elasticsearch+unsubscr...@googlegroups.com.
> To view this discussion on the web visit 
> https://groups.google.com/d/msgid/elasticsearch/ed340e57-4be7-484e-a933-76c55e810ac7%40googlegroups.com.
> For more options, visit https://groups.google.com/d/optout.

-- 
You received this message because you are subscribed to the Google Groups 
"elasticsearch" group.
To unsubscribe from this group and stop receiving emails from it, send an email 
to elasticsearch+unsubscr...@googlegroups.com.
To view this discussion on the web visit 
https://groups.google.com/d/msgid/elasticsearch/848A2B43-1D28-4D6C-B06F-5721C5ACE656%40pilato.fr.
For more options, visit https://groups.google.com/d/optout.

Re: How to get ES to autodetect geo_point type and if it can't, where should I define the mapping?

You basically define a mapping once and you're done.
Have a look at templates. They could help you using naming convention for 
fields. Like all *location field names will be geo_point.

David

> Le 4 déc. 2014 à 22:44, am  a écrit :
> 
> Hello, I am using ES js wrapper in a nodejs application. I would like to get 
> ES to do some geospatial searches, so from what I understand, I need to set 
> the following options on mylocation field:
> 
> mylocation: {
> properties: {
> type: 'geo_point',
> lat_lon: true
> }
> }
> 
> I am inserting the data like so:
> 
> // In my nodejs insert()
> client.index({
> index: 'myapp',
> type: 'user',
> id: user.id,
> body: {
> name: user.name,
> mylocation: { lat: mylat, lon: mylon}
> }
> });
> 
> 
> This inserts the lat,lon as doubles though. To counter this, I have a 
> declared a mapping:
> 
> client.indicies.putMappings({
> index: 'myapp',
> type: 'user',
> body: {
> 'user': {
>  properties: {
>  mylocation: { type: 'geo_point', lat_lon: true }
>  }
> }
> }
> });
> 
> This all works, but it is it possible to supply ES a certain format so that 
> ES will automatically create a geo_point type with lat_lon set to true?
> 
> If not, could someone clarify where I should declare the mappings? Should 
> mappings be declared each time before an insert/update happens in ES or is it 
> just one time? If it's just one time, do I put the mapping inside my root 
> nodejs application? What happens if I restart the nodejs application: would 
> the mapping attempt to go through all ES documents and reindex the mylocation 
> field to geo_point even though it's already been set from the previous reboot 
> of the app?
> 
> Thanks!
> -- 
> You received this message because you are subscribed to the Google Groups 
> "elasticsearch" group.
> To unsubscribe from this group and stop receiving emails from it, send an 
> email to elasticsearch+unsubscr...@googlegroups.com.
> To view this discussion on the web visit 
> https://groups.google.com/d/msgid/elasticsearch/d1ac1327-365f-4bcb-a0f9-94dd89580c7d%40googlegroups.com.
> For more options, visit https://groups.google.com/d/optout.

-- 
You received this message because you are subscribed to the Google Groups 
"elasticsearch" group.
To unsubscribe from this group and stop receiving emails from it, send an email 
to elasticsearch+unsubscr...@googlegroups.com.
To view this discussion on the web visit 
https://groups.google.com/d/msgid/elasticsearch/6B1DC450-AFF8-4F55-B07C-7F5087C1DA98%40pilato.fr.
For more options, visit https://groups.google.com/d/optout.

Re: Got thread waiting in java when search

2014-12-04 Thread nodexy

Fixed this .

The key point is NEVER invoke cross-server or cross-process service in a 
for loop . This is not a question to elasticsearch but common programming 
sutff.
Sorry for disturbation. Thanks.


On Tuesday, December 2, 2014 5:55:21 PM UTC+8, nodexy wrote:
>
> Hi,
>
> I got this issue when build a basic  search service by ElasticSearch in 
> Java as belown:
>
> SearchResponse resp = client
> .prepareSearch(ElasticSearchManager.INDEX_NAME)
> .setTypes(ElasticSearchManager.INDEX_TYPE)
> // way 1
> // .setQuery(QueryBuilders.termQuery("pkgName",
> // pkgName.toLowerCase()))
> // way 2
> .setQuery(QueryBuilders.matchAllQuery())
> .setPostFilter(
> FilterBuilders.termFilter("pkgName",
> pkgName.toLowerCase())).execute().
> actionGet();
>
>
>  And this method is invoked in a for loop on a list with 20 elements .
>
> When we run this service in Tomcat container under 500 concurrent thread, 
> so many threads are waiting  like this(TPS is about 1500):
> 名称: catalina-exec-15
> 状态: org.elasticsearch.common.util.concurrent.BaseFuture$Sync@23772759上的
> WAITING
> 总阻止数: 1,136, 总等待数: 166,545
>
> 堆栈跟踪: 
> sun.misc.Unsafe.park(Native Method)
> java.util.concurrent.locks.LockSupport.park(LockSupport.java:186)
> java.util.concurrent.locks.AbstractQueuedSynchronizer.
> parkAndCheckInterrupt(AbstractQueuedSynchronizer.java:834)
> java.util.concurrent.locks.AbstractQueuedSynchronizer.
> doAcquireSharedInterruptibly(AbstractQueuedSynchronizer.java:994)
> java.util.concurrent.locks.AbstractQueuedSynchronizer.
> acquireSharedInterruptibly(AbstractQueuedSynchronizer.java:1303)
> org.elasticsearch.common.util.concurrent.BaseFuture$Sync.get(BaseFuture.
> java:274)
> org.elasticsearch.common.util.concurrent.BaseFuture.get(BaseFuture.java:
> 113)
> org.elasticsearch.action.support.AdapterActionFuture.actionGet(
> AdapterActionFuture.java:45)
> com.xxx.service.elastic.ElasticSearchService.getByPkgName(
> ElasticSearchService.java:324)
> ... ...
>
> (sorry for the Chinese system language )
>
> monitor info from jvisualvm:
>
>
> Notes: 
> 1. All ElasticsSearch is default .
> 2. Another service without for loop is really good , 1 concurrent with 
> 5000+ tps .
>
> So is for loop the real reason ? How should I update this ?  
> Really thanks for your attention .
>  
> nodexy
>

-- 
You received this message because you are subscribed to the Google Groups 
"elasticsearch" group.
To unsubscribe from this group and stop receiving emails from it, send an email 
to elasticsearch+unsubscr...@googlegroups.com.
To view this discussion on the web visit 
https://groups.google.com/d/msgid/elasticsearch/bf1ec0db-434a-42a9-a549-934ca7ba047c%40googlegroups.com.
For more options, visit https://groups.google.com/d/optout.

Re: Does Elastic Search 2.0 have shard splitting?

Why not using aliases and add more index/shards when you need to?
What is wrong with this design?

David

> Le 5 déc. 2014 à 01:56, Kevin Burton  a écrit :
> 
> I just assumed that ES was planning on building in shard splitting at some 
> point since it's a glaringly obvious addition to the feature set.
> 
> Then I saw this:
> 
> http://www.elasticsearch.org/guide/en/elasticsearch/guide/current/overallocation.html
> 
> > Users often ask why Elasticsearch doesn’t support shard-splitting — the 
> > ability to split each shard into two or more pieces. The reason is that 
> > shard-splitting is a bad idea
> 
> I really hate that answer.  Seems like it's just trying to excuse not having 
> a major feature.
> 
> The ability to incrementally add capacity to your cluster seems like a clear 
> win.  One would even say this type of scalability is "elastic" ... 
> 
> Are there plans to add this in at some future point?  
> -- 
> You received this message because you are subscribed to the Google Groups 
> "elasticsearch" group.
> To unsubscribe from this group and stop receiving emails from it, send an 
> email to elasticsearch+unsubscr...@googlegroups.com.
> To view this discussion on the web visit 
> https://groups.google.com/d/msgid/elasticsearch/bd89f8d2-6ad5-4d4e-8b8f-a0656b63afba%40googlegroups.com.
> For more options, visit https://groups.google.com/d/optout.

-- 
You received this message because you are subscribed to the Google Groups 
"elasticsearch" group.
To unsubscribe from this group and stop receiving emails from it, send an email 
to elasticsearch+unsubscr...@googlegroups.com.
To view this discussion on the web visit 
https://groups.google.com/d/msgid/elasticsearch/BE60E8C8-6D36-4C50-92E1-ACE40E458D88%40pilato.fr.
For more options, visit https://groups.google.com/d/optout.

Configuration for elastic search in windows

2014-12-04 Thread Cheten Dev

Hi,

I am new to elastic search . i am going through documentation .
documentation is mostly about Linux/unix , it doesn't mention how to 
configuration for windows
can i get some help on this?

Thanks 

-- 
You received this message because you are subscribed to the Google Groups 
"elasticsearch" group.
To unsubscribe from this group and stop receiving emails from it, send an email 
to elasticsearch+unsubscr...@googlegroups.com.
To view this discussion on the web visit 
https://groups.google.com/d/msgid/elasticsearch/ed340e57-4be7-484e-a933-76c55e810ac7%40googlegroups.com.
For more options, visit https://groups.google.com/d/optout.

Re: Is there a way to convert @timestamp of ES to Unix epoch time in milliseconds?

This is exactly what elasticsearch does behind the scene.
It index ms since epoch.

David

> Le 5 déc. 2014 à 05:29, Mungeol Heo  a écrit :
> 
> Hi,
> 
> As I mentioned at the title of this question, I wonder is there a way
> to convert @timestamp of ES to Unix epoch time in milliseconds by
> using ES query?
> For instance, "2014-11-10T15:00:00.000Z" to "141559920".
> Any help will be great.
> 
> Thanks.
> 
> -- 
> You received this message because you are subscribed to the Google Groups 
> "elasticsearch" group.
> To unsubscribe from this group and stop receiving emails from it, send an 
> email to elasticsearch+unsubscr...@googlegroups.com.
> To view this discussion on the web visit 
> https://groups.google.com/d/msgid/elasticsearch/CADQPeWwW_bitzy5GA_R8ukpXDCEj%3DXCM66N994iY35KpYPtdRw%40mail.gmail.com.
> For more options, visit https://groups.google.com/d/optout.

-- 
You received this message because you are subscribed to the Google Groups 
"elasticsearch" group.
To unsubscribe from this group and stop receiving emails from it, send an email 
to elasticsearch+unsubscr...@googlegroups.com.
To view this discussion on the web visit 
https://groups.google.com/d/msgid/elasticsearch/349D0B40-1CB3-4567-A35B-9535783AF8B5%40pilato.fr.
For more options, visit https://groups.google.com/d/optout.

Re: When a master node goes down, how the client query works?

2014-12-04 Thread Aaron

Thank you both Elvar and Jorg for your replies.

Aaron



On Thursday, December 4, 2014 7:58:07 AM UTC-5, Elvar Böðvarsson wrote:
>
> Two options
>
> 1. Have a client instance of elasticsearch on a different server or on the 
> same server that does the query. That node must be set to master=false and 
> data=false. Being a member of the cluster means you know where the data is.
> 2. Use a http reverse proxy that connects to all the nodes in the cluster, 
> if http 9200 is unavailable on one node then traffic is sent to the other 
> nodes.
>
> Best option is to combine the two. Have two elasticsearch nodes as client 
> members of the cluster, install a reverse proxy on those two and 
> loadbalance between them on the IP level with a solution of choice.
>

-- 
You received this message because you are subscribed to the Google Groups 
"elasticsearch" group.
To unsubscribe from this group and stop receiving emails from it, send an email 
to elasticsearch+unsubscr...@googlegroups.com.
To view this discussion on the web visit 
https://groups.google.com/d/msgid/elasticsearch/a50ae28e-34c4-4369-a043-6cda5bf30fd2%40googlegroups.com.
For more options, visit https://groups.google.com/d/optout.

new API request for MultiGet

2014-12-04 Thread Ted Smith

Hello:

 I am trying to get a list of docs (same index and type) using a list of 
"ids", but only limited to certain "fields"
instead of whole doc
Currently I am doing as below

  MultiGetRequestBuilder mr;
for (String id : ids)
{
Item item = new Item(index, type, id).fields(fields);
mr.add(item);
}
MultiGetResponse res = mr.execute().actionGet();

Is there plan to have new API so that I could do 

mr.add(index, type,ids).fields(fields)

this would make a huge performance difference at client side (maybe service 
side as well)

Thanks

-- 
You received this message because you are subscribed to the Google Groups 
"elasticsearch" group.
To unsubscribe from this group and stop receiving emails from it, send an email 
to elasticsearch+unsubscr...@googlegroups.com.
To view this discussion on the web visit 
https://groups.google.com/d/msgid/elasticsearch/cd45203a-add8-437c-99b5-b5740660f6cd%40googlegroups.com.
For more options, visit https://groups.google.com/d/optout.

Is there a way to convert @timestamp of ES to Unix epoch time in milliseconds?

2014-12-04 Thread Mungeol Heo

Hi,

As I mentioned at the title of this question, I wonder is there a way
to convert @timestamp of ES to Unix epoch time in milliseconds by
using ES query?
For instance, "2014-11-10T15:00:00.000Z" to "141559920".
Any help will be great.

Thanks.

-- 
You received this message because you are subscribed to the Google Groups 
"elasticsearch" group.
To unsubscribe from this group and stop receiving emails from it, send an email 
to elasticsearch+unsubscr...@googlegroups.com.
To view this discussion on the web visit 
https://groups.google.com/d/msgid/elasticsearch/CADQPeWwW_bitzy5GA_R8ukpXDCEj%3DXCM66N994iY35KpYPtdRw%40mail.gmail.com.
For more options, visit https://groups.google.com/d/optout.

Re: Elasticsearch hardware planning

RAID is useful, you just need to understand the limits. And the potential
for data loss with multiple ES nodes writing to multiple data directories
is not inconsequential if it's an important system with business
requirements.
To reiterate because it's really important this is known - if you lose one
of the data.dir points on a node you lose *all* data on the node. The ES
dev team has had talks about improving this so things are written on a
segment level rather than a direct stripe, but there's no ETA on that that
I am aware of.

ES will obviously be limited by the OS/FS/hardware/etc throughput of any
given channel, I haven't seen anyone do testing of RAID0 versus ES striping
though so it's an interesting question.

On 5 December 2014 at 12:11, Kevin Burton  wrote:

> Saying that RAID is good for anything is a bit of a stretch :-P
>
> I'm not sure how good ES is with splitting the index across volumes but
> the database has a lot more options here for load distribution.  RAID is
> naive by design and the optimizations a RAID controller/impl are limited.
>
> If ES can outperform RAID0 zero then that might be the better route ...
> just not sure how good ES is at it though.
>
> On Thursday, December 4, 2014 2:42:25 PM UTC-8, Mark Walkom wrote:
>>
>> Be aware that using multiple data locations in ES is akin to RAID0; which
>> means if you lose a disk then you lose all the data on that node.
>> Personally, I'd suggest you leverage hardware RAID and let it do what it
>> is good at, otherwise you just have more management overhead and greater
>> risk of a hardware failure causing a bigger problem.
>>
>>  --
> You received this message because you are subscribed to the Google Groups
> "elasticsearch" group.
> To unsubscribe from this group and stop receiving emails from it, send an
> email to elasticsearch+unsubscr...@googlegroups.com.
> To view this discussion on the web visit
> https://groups.google.com/d/msgid/elasticsearch/68a76c15-e363-4d01-9133-bdebd8797a21%40googlegroups.com
> 
> .
> For more options, visit https://groups.google.com/d/optout.
>

-- 
You received this message because you are subscribed to the Google Groups 
"elasticsearch" group.
To unsubscribe from this group and stop receiving emails from it, send an email 
to elasticsearch+unsubscr...@googlegroups.com.
To view this discussion on the web visit 
https://groups.google.com/d/msgid/elasticsearch/CAEYi1X9S9toOpaqzkjQPLg-cndKG4fZ7vsuvTuANGEdUAugZig%40mail.gmail.com.
For more options, visit https://groups.google.com/d/optout.

Re: Elasticsearch hardware planning

2014-12-04 Thread Kevin Burton

Saying that RAID is good for anything is a bit of a stretch :-P

I'm not sure how good ES is with splitting the index across volumes but the 
database has a lot more options here for load distribution.  RAID is naive 
by design and the optimizations a RAID controller/impl are limited.

If ES can outperform RAID0 zero then that might be the better route ... 
just not sure how good ES is at it though.

On Thursday, December 4, 2014 2:42:25 PM UTC-8, Mark Walkom wrote:
>
> Be aware that using multiple data locations in ES is akin to RAID0; which 
> means if you lose a disk then you lose all the data on that node.
> Personally, I'd suggest you leverage hardware RAID and let it do what it 
> is good at, otherwise you just have more management overhead and greater 
> risk of a hardware failure causing a bigger problem.
>
>

-- 
You received this message because you are subscribed to the Google Groups 
"elasticsearch" group.
To unsubscribe from this group and stop receiving emails from it, send an email 
to elasticsearch+unsubscr...@googlegroups.com.
To view this discussion on the web visit 
https://groups.google.com/d/msgid/elasticsearch/68a76c15-e363-4d01-9133-bdebd8797a21%40googlegroups.com.
For more options, visit https://groups.google.com/d/optout.

Does Elastic Search 2.0 have shard splitting?

2014-12-04 Thread Kevin Burton

I just assumed that ES was planning on building in shard splitting at some 
point since it's a glaringly obvious addition to the feature set.

Then I saw this:

http://www.elasticsearch.org/guide/en/elasticsearch/guide/current/overallocation.html

> Users often ask why Elasticsearch doesn’t support *shard-splitting* — the 
ability to split each shard into two or more pieces.  
The
 
reason is that shard-splitting is a bad idea

I really hate that answer.  Seems like it's just trying to excuse not 
having a major feature.

The ability to incrementally add capacity to your cluster seems like a 
clear win.  One would even say this type of scalability is "elastic" ... 

Are there plans to add this in at some future point?  

-- 
You received this message because you are subscribed to the Google Groups 
"elasticsearch" group.
To unsubscribe from this group and stop receiving emails from it, send an email 
to elasticsearch+unsubscr...@googlegroups.com.
To view this discussion on the web visit 
https://groups.google.com/d/msgid/elasticsearch/bd89f8d2-6ad5-4d4e-8b8f-a0656b63afba%40googlegroups.com.
For more options, visit https://groups.google.com/d/optout.

Re: Elasticsearch adds one more shard when on index creation

Did you specify the replica count as well as the shard count? By default ES
will add a replica unless you specifically tell it not to.
You can check this using the _cat APIs
http://www.elasticsearch.org/guide/en/elasticsearch/reference/current/cat.html#cat

Also 10 shards is a bit of overkill for a single node, unless you are
intending on extending the cluster to more nodes of course.

On 5 December 2014 at 10:42, Marko Krstic  wrote:

> Hi,
> I have a problem with one node cluster. When i create index with 10
> shards, it adds one more active shard. First i thought that it is replica
> shard, but than i read that there is no replica on single node cluster. Is
> this some problem in settings or it is normal?
>
> Thanks
>
> --
> You received this message because you are subscribed to the Google Groups
> "elasticsearch" group.
> To unsubscribe from this group and stop receiving emails from it, send an
> email to elasticsearch+unsubscr...@googlegroups.com.
> To view this discussion on the web visit
> https://groups.google.com/d/msgid/elasticsearch/e2493064-3113-41c9-8c89-9eb3509c2ff1%40googlegroups.com
> 
> .
> For more options, visit https://groups.google.com/d/optout.
>

-- 
You received this message because you are subscribed to the Google Groups 
"elasticsearch" group.
To unsubscribe from this group and stop receiving emails from it, send an email 
to elasticsearch+unsubscr...@googlegroups.com.
To view this discussion on the web visit 
https://groups.google.com/d/msgid/elasticsearch/CAEYi1X9BYb3SmPJ%3Duh8u3s%2B-Pmz1zrJ%2BKmbDAjp74iJAJbNm_A%40mail.gmail.com.
For more options, visit https://groups.google.com/d/optout.

Re: zen discovery gossip router?

I figured as much but wanted to make sure :)

Essentially these are just plain old ES nodes, so you can pick a few at
random from your cluster and list them. It's not a specific, independent
server(s).

On 5 December 2014 at 09:54, Fernando Padilla  wrote:

> So yeah, I probably mis-named it :)  from gossip-router to gossip-server.
> But I think it still means the same thing :)
>
>
> On 12/4/14 2:52 PM, Fernando Padilla wrote:
>
>> Um, it does.. ZenDiscovery Unicast:
>>
>> http://www.elasticsearch.org/guide/en/elasticsearch/
>> reference/current/modules-discovery-zen.html#unicast
>>
>> It accepts a list of well known servers to drive it's "gossip" protocol.
>> I want to run free-standing servers/nodes with well known host:port
>> combos.  The docs don't really talk about that at all.. do i really have
>> to install and run a full ES instance to accomplish this?
>>
>>
>> On Thursday, December 4, 2014 11:06:18 AM UTC-8, Fernando Padilla wrote:
>>
>> I can't find any information in the guide nor google on how to setup
>> a zen gossip router.  Can anyone help me out?
>>
>> I really want to use elasticsearch, but I need to get over this one
>> snag. :)
>>
>> --
>> You received this message because you are subscribed to a topic in the
>> Google Groups "elasticsearch" group.
>> To unsubscribe from this topic, visit
>> https://groups.google.com/d/topic/elasticsearch/OE0AWbvPdPc/unsubscribe.
>> To unsubscribe from this group and all its topics, send an email to
>> elasticsearch+unsubscr...@googlegroups.com
>> .
>> To view this discussion on the web visit
>> https://groups.google.com/d/msgid/elasticsearch/74ad51c8-
>> 4b26-4f85-a499-e20e9d10b5ce%40googlegroups.com
>> > 4b26-4f85-a499-e20e9d10b5ce%40googlegroups.com?utm_medium=
>> email&utm_source=footer>.
>> For more options, visit https://groups.google.com/d/optout.
>>
>
> --
> You received this message because you are subscribed to the Google Groups
> "elasticsearch" group.
> To unsubscribe from this group and stop receiving emails from it, send an
> email to elasticsearch+unsubscr...@googlegroups.com.
> To view this discussion on the web visit https://groups.google.com/d/
> msgid/elasticsearch/5480E629.7000407%40gmail.com.
>
> For more options, visit https://groups.google.com/d/optout.
>

-- 
You received this message because you are subscribed to the Google Groups 
"elasticsearch" group.
To unsubscribe from this group and stop receiving emails from it, send an email 
to elasticsearch+unsubscr...@googlegroups.com.
To view this discussion on the web visit 
https://groups.google.com/d/msgid/elasticsearch/CAEYi1X8AV%2B3%3DWw%2BRbu_UiV18OhwGTUzdXg9L4joi9op5Zt2aGA%40mail.gmail.com.
For more options, visit https://groups.google.com/d/optout.

Re: downgrading from 1.4 to1.3

This is why it's good to test before rolling out to critical platforms.

On 5 December 2014 at 09:29, Jack Judge  wrote:

> Wrong in every fundamental aspect.
>
> This was a clusterfuck and still is. The K3 dashboards are used by our
> devs, network guys and management for a variety of tasks. When they stopped
> working we lost sight of large parts of our operation.
> Because of the lack of documentation and the time pressure, I don't have
> the luxury of sitting down, picking up a coffee and learning all about
> CORS, so yeah we've downgraded back to 1.3
>
> Unless you know how to fix it, which I don't, then this is a clusterfuck,
> it rendered our system useless to us.
>
>
> On Thursday, 4 December 2014 07:40:47 UTC-8, Itamar Syn-Hershko wrote:
>>
>> Classic CORS error - maybe * is blocked by ES. Haven't had to deal with
>> this myself (yet) so can't help you here. All in all just a small rough
>> edge to smooth, not a clusterfuck.
>>
>> A quick solution would be to install K3 as a site plugin and use it
>> internally (don't expose it to the web)
>>
>> --
>>
>> Itamar Syn-Hershko
>> http://code972.com | @synhershko 
>> Freelance Developer & Consultant
>> Author of RavenDB in Action 
>>
>> On Thu, Dec 4, 2014 at 3:20 AM, Jack Judge  wrote:
>>
>>> Well you're right there's JS errors, CORS related;
>>>
>>> XMLHttpRequest cannot load http://10.5.41.120:9200/
>>> logstash-2014.12.04/_search. Request header field Content-Type is not
>>> allowed by Access-Control-Allow-Headers.
>>>
>>> In my elasticsearch.yml I've got this on all nodes,
>>>
>>> http.cors.allow-origin: "/.*/"
>>> http.cors.enabled: true
>>>
>>> Which google leads me to believe should open it up for anything. K3 is
>>> fronted by apache and a bit more googling prompted me to add this to the
>>>  section of httpd.conf
>>>
>>> Header set Access-Control-Allow-Origin "*"
>>>
>>> Still getting the same errors :(
>>> I'm at a loss to know what else to do now.
>>>
>>>
>>>
>>> On Wednesday, 3 December 2014 15:48:28 UTC-8, Itamar Syn-Hershko wrote:

 I'm not aware of compat issues with K3 and ES 1.4 other than
 https://github.com/elasticsearch/kibana/issues/1637 . I'd check for
 javascript errors, and try to see what's going on under the hood, really.
 When you have more data about this, you can either quickly resolve, or open
 a concrete bug :)


  --
>>> You received this message because you are subscribed to the Google
>>> Groups "elasticsearch" group.
>>> To unsubscribe from this group and stop receiving emails from it, send
>>> an email to elasticsearc...@googlegroups.com.
>>> To view this discussion on the web visit https://groups.google.com/d/
>>> msgid/elasticsearch/19d57d39-b764-493c-bd60-c8ae3aff087a%
>>> 40googlegroups.com
>>> 
>>> .
>>>
>>> For more options, visit https://groups.google.com/d/optout.
>>>
>>
>>  --
> You received this message because you are subscribed to the Google Groups
> "elasticsearch" group.
> To unsubscribe from this group and stop receiving emails from it, send an
> email to elasticsearch+unsubscr...@googlegroups.com.
> To view this discussion on the web visit
> https://groups.google.com/d/msgid/elasticsearch/feb498f7-b7aa-424b-b839-66445019d204%40googlegroups.com
> 
> .
> For more options, visit https://groups.google.com/d/optout.
>

-- 
You received this message because you are subscribed to the Google Groups 
"elasticsearch" group.
To unsubscribe from this group and stop receiving emails from it, send an email 
to elasticsearch+unsubscr...@googlegroups.com.
To view this discussion on the web visit 
https://groups.google.com/d/msgid/elasticsearch/CAEYi1X8SAOhM8NC4b0vR1njMKai4hwKethKnj267yhySoaK6Xw%40mail.gmail.com.
For more options, visit https://groups.google.com/d/optout.

slow performance on phrase queries in should clause

2014-12-04 Thread Kireet Reddy

Our system is normally very responsive, but very occasionally people submit 
long phrase queries which timeout and cause high system load. Not all long 
phrase queries cause issues, but I have been debugging one that I've 
found.[1]

The query is in the filter section of a constant score query as below. This 
form times out. However if I move the query out of the should section and 
into the must section, the query runs very quickly (in the full query, 
there was another filter in the should section). Converting this to an AND 
filter is also fast. Is there a reason for this? Are should filters 
executed on the full set and not short circuited with the results of must 
filters?

{

"query": {

"constant_score": {

"filter": {

"bool": {

"must": { "terms": { -- selective terms filter -- } 
 },

"should": { "query": { "match": { "text": { "query": 
"…", "type": "phrase" } } } }

}

}

}

}

}





[1] query 
-- 
ぶ新サービスは2015年春にリリースの予定。IoTのハードウェアそのものではなく、SDKやデータベース、解析、IDといったバックグラウンド環境をサービスとして提供するというものだ。発表後、松本氏は「例えばイケてる時計型のプロダクトを作ったとして、（機能面では）単体での価値は1〜2割だったりする。でも本当に重要なのはバックエンド。しかしユーザーから見てみれば時計というプロダクトそのものに大きな価値を感じることが多い。そうであれば、IoTのバックエンドをBaaS（Backend
 
as a 
Service：ユーザーの登録や管理、データ保管といったバックエンド環境をサービスとして提供すること）のように提供できればプロダクトの開発に集中できると思う。クラウドが出てネットサービスの開発が手軽になったのと同じような環境を提供したい」とサービスについて語ってくれた。
 

-- 
You received this message because you are subscribed to the Google Groups 
"elasticsearch" group.
To unsubscribe from this group and stop receiving emails from it, send an email 
to elasticsearch+unsubscr...@googlegroups.com.
To view this discussion on the web visit 
https://groups.google.com/d/msgid/elasticsearch/5b1e6260-5c19-4ac7-bf1e-939360bf509e%40googlegroups.com.
For more options, visit https://groups.google.com/d/optout.

Re: ES 1.0.3 CPU usage drastically increased

You still are overloaded with replicas, it's pointless having them there
and it keeps your cluster out of a green state.

On 5 December 2014 at 09:52, Dunaeth  wrote:

> Actualy the master node is also a datanode (si web have two datanodes),
> but just the only one that our application is aware of. We have several
> metrics on the VM, and our outsourcer may have metrics on the physical
> host. What's strange is that this ES setup ran without trouble for many
> months before this issue occured. From what I can see from our graphs, the
> only metric that could be significative is the JVM heap used as it often
> reach 80% of the assigned heap size and seems to be flushed then, it can
> occure at most every 1 or 2 hours.
>
> --
> You received this message because you are subscribed to the Google Groups
> "elasticsearch" group.
> To unsubscribe from this group and stop receiving emails from it, send an
> email to elasticsearch+unsubscr...@googlegroups.com.
> To view this discussion on the web visit
> https://groups.google.com/d/msgid/elasticsearch/d24c1d0e-d3c2-4ce4-9d65-9df422abd15e%40googlegroups.com
> .
> For more options, visit https://groups.google.com/d/optout.
>

-- 
You received this message because you are subscribed to the Google Groups 
"elasticsearch" group.
To unsubscribe from this group and stop receiving emails from it, send an email 
to elasticsearch+unsubscr...@googlegroups.com.
To view this discussion on the web visit 
https://groups.google.com/d/msgid/elasticsearch/CAEYi1X9T4i0%2BCfbAH5QoC6xwzjZ7w15apd-aq9ctZAuY-R%2BV7g%40mail.gmail.com.
For more options, visit https://groups.google.com/d/optout.

Elasticsearch adds one more shard when on index creation

2014-12-04 Thread Marko Krstic

Hi,
I have a problem with one node cluster. When i create index with 10 shards, 
it adds one more active shard. First i thought that it is replica shard, 
but than i read that there is no replica on single node cluster. Is this 
some problem in settings or it is normal?

Thanks

-- 
You received this message because you are subscribed to the Google Groups 
"elasticsearch" group.
To unsubscribe from this group and stop receiving emails from it, send an email 
to elasticsearch+unsubscr...@googlegroups.com.
To view this discussion on the web visit 
https://groups.google.com/d/msgid/elasticsearch/e2493064-3113-41c9-8c89-9eb3509c2ff1%40googlegroups.com.
For more options, visit https://groups.google.com/d/optout.

How to disable repository verification in 1.4.0?

2014-12-04 Thread Lev

from the docs:
http://www.elasticsearch.org/guide/en/elasticsearch/reference/current/modules-snapshots.html#_repository_verification

This kills one of our super cool use cases which is restoring production 
indices from S3 to development clusters. I don't want to give s3 write 
access to the dev clusters but repo verificaiton fails without it.

How can I disable repository verification in 1.4?

Thanks,
Lev

-- 
You received this message because you are subscribed to the Google Groups 
"elasticsearch" group.
To unsubscribe from this group and stop receiving emails from it, send an email 
to elasticsearch+unsubscr...@googlegroups.com.
To view this discussion on the web visit 
https://groups.google.com/d/msgid/elasticsearch/0cac9a2a-6141-4f93-8737-4e42e8c1c0be%40googlegroups.com.
For more options, visit https://groups.google.com/d/optout.

Re: zen discovery gossip router?

2014-12-04 Thread Fernando Padilla

So yeah, I probably mis-named it :) from gossip-router to
gossip-server. But I think it still means the same thing :)

On 12/4/14 2:52 PM, Fernando Padilla wrote:

Um, it does.. ZenDiscovery Unicast:

http://www.elasticsearch.org/guide/en/elasticsearch/reference/current/modules-discovery-zen.html#unicast

It accepts a list of well known servers to drive it's "gossip" protocol.
I want to run free-standing servers/nodes with well known host:port
combos. The docs don't really talk about that at all.. do i really have
to install and run a full ES instance to accomplish this?

On Thursday, December 4, 2014 11:06:18 AM UTC-8, Fernando Padilla wrote:

I can't find any information in the guide nor google on how to setup
a zen gossip router. Can anyone help me out?

I really want to use elasticsearch, but I need to get over this one
snag. :)

--
You received this message because you are subscribed to a topic in the
Google Groups "elasticsearch" group.
To unsubscribe from this topic, visit
https://groups.google.com/d/topic/elasticsearch/OE0AWbvPdPc/unsubscribe.
To unsubscribe from this group and all its topics, send an email to
elasticsearch+unsubscr...@googlegroups.com
.
To view this discussion on the web visit
https://groups.google.com/d/msgid/elasticsearch/74ad51c8-4b26-4f85-a499-e20e9d10b5ce%40googlegroups.com
.
For more options, visit https://groups.google.com/d/optout.

--
You received this message because you are subscribed to the Google Groups
"elasticsearch" group.
To unsubscribe from this group and stop receiving emails from it, send an email
to elasticsearch+unsubscr...@googlegroups.com.
To view this discussion on the web visit
https://groups.google.com/d/msgid/elasticsearch/5480E629.7000407%40gmail.com.
For more options, visit https://groups.google.com/d/optout.

Re: ES 1.0.3 CPU usage drastically increased

2014-12-04 Thread Dunaeth

Actualy the master node is also a datanode (si web have two datanodes), but 
just the only one that our application is aware of. We have several metrics on 
the VM, and our outsourcer may have metrics on the physical host. What's 
strange is that this ES setup ran without trouble for many months before this 
issue occured. From what I can see from our graphs, the only metric that could 
be significative is the JVM heap used as it often reach 80% of the assigned 
heap size and seems to be flushed then, it can occure at most every 1 or 2 
hours.

-- 
You received this message because you are subscribed to the Google Groups 
"elasticsearch" group.
To unsubscribe from this group and stop receiving emails from it, send an email 
to elasticsearch+unsubscr...@googlegroups.com.
To view this discussion on the web visit 
https://groups.google.com/d/msgid/elasticsearch/d24c1d0e-d3c2-4ce4-9d65-9df422abd15e%40googlegroups.com.
For more options, visit https://groups.google.com/d/optout.

Re: zen discovery gossip router?

2014-12-04 Thread Fernando Padilla

Um, it does.. ZenDiscovery Unicast:

http://www.elasticsearch.org/guide/en/elasticsearch/reference/current/modules-discovery-zen.html#unicast

It accepts a list of well known servers to drive it's "gossip" protocol. I 
want to run free-standing servers/nodes with well known host:port combos.  
The docs don't really talk about that at all.. do i really have to install 
and run a full ES instance to accomplish this?


On Thursday, December 4, 2014 11:06:18 AM UTC-8, Fernando Padilla wrote:
>
> I can't find any information in the guide nor google on how to setup a zen 
> gossip router.  Can anyone help me out?
>
> I really want to use elasticsearch, but I need to get over this one snag. 
> :)
>

-- 
You received this message because you are subscribed to the Google Groups 
"elasticsearch" group.
To unsubscribe from this group and stop receiving emails from it, send an email 
to elasticsearch+unsubscr...@googlegroups.com.
To view this discussion on the web visit 
https://groups.google.com/d/msgid/elasticsearch/74ad51c8-4b26-4f85-a499-e20e9d10b5ce%40googlegroups.com.
For more options, visit https://groups.google.com/d/optout.

Re: zen discovery gossip router?

What do you mean by router?
ES doesn't have that concept so perhaps you are confusing it with something
else :)

On 5 December 2014 at 06:06, Fernando Padilla  wrote:

> I can't find any information in the guide nor google on how to setup a zen
> gossip router.  Can anyone help me out?
>
> I really want to use elasticsearch, but I need to get over this one snag.
> :)
>
> --
> You received this message because you are subscribed to the Google Groups
> "elasticsearch" group.
> To unsubscribe from this group and stop receiving emails from it, send an
> email to elasticsearch+unsubscr...@googlegroups.com.
> To view this discussion on the web visit
> https://groups.google.com/d/msgid/elasticsearch/1c7e47d9-1f4e-4551-8a05-74798c606aba%40googlegroups.com
> 
> .
> For more options, visit https://groups.google.com/d/optout.
>

-- 
You received this message because you are subscribed to the Google Groups 
"elasticsearch" group.
To unsubscribe from this group and stop receiving emails from it, send an email 
to elasticsearch+unsubscr...@googlegroups.com.
To view this discussion on the web visit 
https://groups.google.com/d/msgid/elasticsearch/CAEYi1X9-AZQdk75m6EHfY_1PCWsa9Owk2mfB7guMq8rOgaKPQA%40mail.gmail.com.
For more options, visit https://groups.google.com/d/optout.

Re: Elasticsearch hardware planning

Be aware that using multiple data locations in ES is akin to RAID0; which
means if you lose a disk then you lose all the data on that node.
Personally, I'd suggest you leverage hardware RAID and let it do what it is
good at, otherwise you just have more management overhead and greater risk
of a hardware failure causing a bigger problem.

The rest of your setup looks sane, gong with VMs adds a minimal performance
loss but gives you much more flexibility, and I'd start with the lower
amount of RAM as you mentioned.

On 4 December 2014 at 22:27, Elvar Böðvarsson  wrote:

> I am preparing proposals on hardware for our Elasticsearch log storage.
>
> What I would love to have are SSD's for most recent logs or SSD's for hot
> data. For that I have come down to two solutions with 3x physical servers.
>
> 1. Use Windows 2012 R2 as the OS, use Storage Spaces to prvide a tiered
> storage for SSD's and HDD's to Elasticsearch.
>
> 2. Install ESXi. Create two VM's on each server, one that has access to
> SSD disks and one who has access to HDD's. That will give me a 6 node
> Elasticsearch cluster. Use tags to keep latest indices on the SSD nodes,
> when they are x days old a script or curator will remove the SSD tag from
> the index and add a HDD tag, hopefully resulting in a migration to the HDD
> nodes. Either Windows 2012 R2 or Ubuntu will be used here.
>
> Each Elasticsearch node will get either 32gig memory or 64gig memory.
> Undecided on that at the moment, might go with the lower amount to have the
> option of expanding it if there is need. With 192gigs in the cluster there
> might not even be a need for SSD's.
>
> Not sure about the number of cores.
>
> Plan to skip raid on everything except the OS disks and use Elasticsearch
> striping for the data. Total storage will be about 20gigs a day without
> replication, and total storage will be about 15tb.
>
> Any angles I'm missing?
>
> --
> You received this message because you are subscribed to the Google Groups
> "elasticsearch" group.
> To unsubscribe from this group and stop receiving emails from it, send an
> email to elasticsearch+unsubscr...@googlegroups.com.
> To view this discussion on the web visit
> https://groups.google.com/d/msgid/elasticsearch/1e6bbc44-fc3b-4724-b657-63722a951d06%40googlegroups.com
> 
> .
> For more options, visit https://groups.google.com/d/optout.
>

-- 
You received this message because you are subscribed to the Google Groups 
"elasticsearch" group.
To unsubscribe from this group and stop receiving emails from it, send an email 
to elasticsearch+unsubscr...@googlegroups.com.
To view this discussion on the web visit 
https://groups.google.com/d/msgid/elasticsearch/CAEYi1X8sOZvre9Mbb8%3DU%2B2xxGnede4hdL_SRKeMV95uFRDL0Fw%40mail.gmail.com.
For more options, visit https://groups.google.com/d/optout.

Re: downgrading from 1.4 to1.3

2014-12-04 Thread Jack Judge

Wrong in every fundamental aspect.

This was a clusterfuck and still is. The K3 dashboards are used by our 
devs, network guys and management for a variety of tasks. When they stopped 
working we lost sight of large parts of our operation. 
Because of the lack of documentation and the time pressure, I don't have 
the luxury of sitting down, picking up a coffee and learning all about 
CORS, so yeah we've downgraded back to 1.3 

Unless you know how to fix it, which I don't, then this is a clusterfuck, 
it rendered our system useless to us.


On Thursday, 4 December 2014 07:40:47 UTC-8, Itamar Syn-Hershko wrote:
>
> Classic CORS error - maybe * is blocked by ES. Haven't had to deal with 
> this myself (yet) so can't help you here. All in all just a small rough 
> edge to smooth, not a clusterfuck.
>
> A quick solution would be to install K3 as a site plugin and use it 
> internally (don't expose it to the web)
>
> --
>
> Itamar Syn-Hershko
> http://code972.com | @synhershko 
> Freelance Developer & Consultant
> Author of RavenDB in Action 
>
> On Thu, Dec 4, 2014 at 3:20 AM, Jack Judge  > wrote:
>
>> Well you're right there's JS errors, CORS related;
>>
>> XMLHttpRequest cannot load 
>> http://10.5.41.120:9200/logstash-2014.12.04/_search. Request header 
>> field Content-Type is not allowed by Access-Control-Allow-Headers.
>>
>> In my elasticsearch.yml I've got this on all nodes,
>>
>> http.cors.allow-origin: "/.*/"
>> http.cors.enabled: true
>>
>> Which google leads me to believe should open it up for anything. K3 is 
>> fronted by apache and a bit more googling prompted me to add this to the 
>>  section of httpd.conf 
>>
>> Header set Access-Control-Allow-Origin "*"
>>
>> Still getting the same errors :( 
>> I'm at a loss to know what else to do now.
>>
>>
>>
>> On Wednesday, 3 December 2014 15:48:28 UTC-8, Itamar Syn-Hershko wrote:
>>>
>>> I'm not aware of compat issues with K3 and ES 1.4 other than 
>>> https://github.com/elasticsearch/kibana/issues/1637 . I'd check for 
>>> javascript errors, and try to see what's going on under the hood, really. 
>>> When you have more data about this, you can either quickly resolve, or open 
>>> a concrete bug :)
>>>
>>>
>>>  -- 
>> You received this message because you are subscribed to the Google Groups 
>> "elasticsearch" group.
>> To unsubscribe from this group and stop receiving emails from it, send an 
>> email to elasticsearc...@googlegroups.com .
>> To view this discussion on the web visit 
>> https://groups.google.com/d/msgid/elasticsearch/19d57d39-b764-493c-bd60-c8ae3aff087a%40googlegroups.com
>>  
>> 
>> .
>>
>> For more options, visit https://groups.google.com/d/optout.
>>
>
>

-- 
You received this message because you are subscribed to the Google Groups 
"elasticsearch" group.
To unsubscribe from this group and stop receiving emails from it, send an email 
to elasticsearch+unsubscr...@googlegroups.com.
To view this discussion on the web visit 
https://groups.google.com/d/msgid/elasticsearch/feb498f7-b7aa-424b-b839-66445019d204%40googlegroups.com.
For more options, visit https://groups.google.com/d/optout.

Re: ES 1.0.3 CPU usage drastically increased

Why so many replicas when you only have one data node? You won't even be
able to allocate them!
Your heap is also pretty small, 2GB is something you'd generally run on a
dev instance, I'd suggest going to 4GB if you can.

You need some monitoring around this to really put things into perspective.
Try installing a plugin like ElasticHQ and definitely look at Marvel as
well, you should also be monitoring the VMs on a system level to give you a
better idea on general resource usage.

Try updating to later versions of ES, there are always good improvements as
new releases come out so you may get a positive gain there. But without
understanding your resource use more it's hard to say,

On 5 December 2014 at 02:25, Dunaeth  wrote:

> Hi,
>
> We're running a two-nodes ES 1.0.3 cluster with the current setup :
>
> VM on host A :
> 4 vCore CPU
> 32GB RAM
> ES master (only node being queried)
> MySQL slave (used as a backup, never queried)
> JVM settings
>
> /usr/lib/jvm/java-7-openjdk-amd64//bin/java -Xms2g -Xmx2g -Xss256k
> -Djava.awt.headless=true -XX:+UseParNewGC -XX:+UseConcMarkSweepGC
> -XX:CMSInitiatingOccupancyFraction=75 -XX:+UseCMSInitiatingOccupancyOnly
> -XX:+HeapDumpOnOutOfMemoryError -Delasticsearch
> -Des.pidfile=/var/run/elasticsearch.pid
> -Des.path.home=/usr/share/elasticsearch -cp
> :/usr/share/elasticsearch/lib/elasticsearch-1.0.3.jar:/usr/share/elasticsearch/lib/*:/usr/share/elasticsearch/lib/sigar/*
> -Des.default.config=/etc/elasticsearch/elasticsearch.yml
> -Des.default.path.home=/usr/share/elasticsearch
> -Des.default.path.logs=/home/log/elasticsearch
> -Des.default.path.data=/home/elasticsearch
> -Des.default.path.work=/tmp/elasticsearch
> -Des.default.path.conf=/etc/elasticsearch
> org.elasticsearch.bootstrap.Elasticsearch
>
>
> VM on host B :
> 2 vCore CPU
> 16GB RAM
> ES datanode (search are dispatched, no indexing)
> MySQL master
> JVM settings
>
> /usr/lib/jvm/java-7-openjdk-amd64//bin/java -Xms2g -Xmx2g -Xss256k
> -Djava.awt.headless=true -XX:+UseParNewGC -XX:+UseConcMarkSweepGC
> -XX:CMSInitiatingOccupancyFraction=75 -XX:+UseCMSInitiatingOccupancyOnly
> -XX:+HeapDumpOnOutOfMemoryError -Delasticsearch
> -Des.pidfile=/var/run/elasticsearch.pid
> -Des.path.home=/usr/share/elasticsearch -cp
> :/usr/share/elasticsearch/lib/elasticsearch-1.0.3.jar:/usr/share/elasticsearch/lib/*:/usr/share/elasticsearch/lib/sigar/*
> -Des.default.config=/etc/elasticsearch/elasticsearch.yml
> -Des.default.path.home=/usr/share/elasticsearch
> -Des.default.path.logs=/home/log/elasticsearch
> -Des.default.path.data=/home/elasticsearch
> -Des.default.path.work=/tmp/elasticsearch
> -Des.default.path.conf=/etc/elasticsearch
> org.elasticsearch.bootstrap.Elasticsearch
>
> ---
>
> Before we got a 4 vCore CPU / 32GB VM, the master node was the same as the
> secondary node.
> On this cluster, we have a 5 shards (+5 replica) index - we'll call it
> main - with ~130k documents at the moment for a 120MB size which, we update
> documents that were updated by our customers in our application with a cron
> than run every 5 minutes and updates at most 2k docs, we can have a few
> thousands docs in queue.
> We are also using logstash to log some user actions that our application
> relies on in monthly basis indices. Those indices have 2 shards (+2
> replica) with 1~6M docs for a size from 380MB to 1.5GB. At the moment, we
> have 11 log indices.
> We do search queries on both the main and the latest log indices.
> Marginally, some queries can occure on older log indices.
> Looking at our stats, I'd say we have a 2:1 indexing / searching ratio,
> but it can vary depending on seasonality.
> We also have a 1 shard (+1 replica) dedicated percolator index on which
> we're executing percolation queries before each log that will be indexed in
> ES through logstash.
> We never optimized any index.
>
> Our issue :
>
> Since we updated ES to v1.0.3 to deal with a field data breaker bug,
> everything was running fine until we experienced a drastic CPU usage
> increase(from near 100% to 200%) without any reason (no change in our
> application nor on the traffic we got). No ES restart have been able to
> give us back a normal CPU usage. In emergency, we decided to switch our
> main node from 2 vCore CPU/ 16GB to 4 vCore CPU / 32 GB and the CPU usage
> of the new node never went beyond 30% for almost 10 days. And then the
> issue happened again, the CPU usage raised to 400% without any reason.
>
> It is worth noting that the secondary node is not subject to this issue.
>
> Our outsourcer told us this CPU increase was due to deadlocks caused by
> malformed queries, but those malformed queries already happened before and
> restarting ES didn't solve the high CPU usage.
> He also told us our server had not enough resources and it would be better
> to have 2 serveurs for MySQL master / slave and 2 to 3 distinct serveurs
> for the ES cluster, which seems weird when we saw the main ES server had a
> maximum 30% CPU usage for days.
>
> We plan

Re: ElasticSearch can't automatically recover after a big HEAP utilization

What ES version, what Java version?
How much actual data?

On 5 December 2014 at 04:31, Sergio Henrique 
wrote:

> Hi guys, everything ok?
>
> I want to talk about a problem that we are facing with our ES cluster.
>
> Today we have four machines in our cluster, each machine has 16GB of RAM
> (8GB HEAP and 8GB OS).
> We have a total of 73,975,578 documents, 998 shards and 127 indices.
> To index our docs we use the bulk API. Today our bulk request is made with
> a total
> up to 300 items. We put our docs in a queue so we can make the request in
> the background.
> The log below shows a piece of the information about the amount of
> documents that was
> sent for ES to index:
>
> [2014-12-03 11:19:32 -0200] execute Event Create with 77 items in app 20
> [2014-12-03 11:19:32 -0200] execute User Create with 1 items in app 67
> [2014-12-03 11:19:40 -0200] execute User Create with 1 items in app 61
> [2014-12-03 11:19:49 -0200] execute User Create with 1 items in app 62
> [2014-12-03 11:19:50 -0200] execute User Create with 1 items in app 27
> [2014-12-03 11:19:50 -0200] execute User Create with 2 items in app 20
> [2014-12-03 11:19:54 -0200] execute User Create with 5 items in app 61
> [2014-12-03 11:19:58 -0200] execute User Update with 61 items in app 20
> [2014-12-03 11:20:02 -0200] execute User Create with 2 items in app 61
> [2014-12-03 11:20:02 -0200] execute User Create with 1 items in app 27
> [2014-12-03 11:20:10 -0200] execute User Create with 2 items in app 20
> [2014-12-03 11:20:19 -0200] execute User Create with 5 items in app 61
> [2014-12-03 11:20:20 -0200] execute User Create with 3 items in app 20
> [2014-12-03 11:20:20 -0200] execute User Create with 1 items in app 24
> [2014-12-03 11:20:25 -0200] execute User Create with 1 items in app 61
> [2014-12-03 11:20:28 -0200] execute User Create with 1 items in app 20
> [2014-12-03 11:20:37 -0200] execute Event Create with 91 items in app 20
> [2014-12-03 11:20:42 -0200] execute User Create with 1 items in app 76
> [2014-12-03 11:20:42 -0200] execute Event Create with 300 items in app 61
> [2014-12-03 11:20:50 -0200] execute User Create with 4 items in app 61
> [2014-12-03 11:20:51 -0200] execute User Create with 1 items in app 62
> [2014-12-03 11:20:51 -0200] execute User Create with 2 items in app 20
> [2014-12-03 11:20:55 -0200] execute User Create with 3 items in app 61
>
> Sometimes the request occurs with just one item in the bulk. Another
> interesting
> thing is: we send that data frequently, in other words, the stress we put
> in
> ES is pretty high.
>
> The big problem is when ES HEAP start approaching 75% of utilization and
> the GC does not reach its normal value.
>
> This log entrance show some GC working:
>
> [2014-12-02 21:28:04,766][WARN ][monitor.jvm  ] [es-node-2]
> [gc][old][43249][56] duration [48s], collections [2]/[48.2s], total
> [48s]/[17.9m], memory [8.2gb]->[8.3gb]/[8.3gb], all_pools {[young]
> [199.6mb]->[199.6mb]/[199.6mb]}{[survivor]
> [14.1mb]->[18.9mb]/[24.9mb]}{[old] [8gb]->[8gb]/[8gb]}
> [2014-12-02 21:28:33,120][WARN ][monitor.jvm  ] [es-node-2]
> [gc][old][43250][57] duration [28.3s], collections [1]/[28.3s], total
> [28.3s]/[18.4m], memory [8.3gb]->[8.3gb]/[8.3gb], all_pools {[young]
> [199.6mb]->[199.6mb]/[199.6mb]}{[survivor]
> [18.9mb]->[17.5mb]/[24.9mb]}{[old] [8gb]->[8gb]/[8gb]}
> [2014-12-02 21:29:21,222][WARN ][monitor.jvm  ] [es-node-2]
> [gc][old][43251][59] duration [47.9s], collections [2]/[48.1s], total
> [47.9s]/[19.2m], memory [8.3gb]->[8.3gb]/[8.3gb], all_pools {[young]
> [199.6mb]->[199.6mb]/[199.6mb]}{[survivor]
> [17.5mb]->[21.2mb]/[24.9mb]}{[old] [8gb]->[8gb]/[8gb]}
> [2014-12-02 21:30:08,916][WARN ][monitor.jvm  ] [es-node-2]
> [gc][old][43252][61] duration [47.5s], collections [2]/[47.6s], total
> [47.5s]/[20m], memory [8.3gb]->[8.3gb]/[8.3gb], all_pools {[young]
> [199.6mb]->[199.6mb]/[199.6mb]}{[survivor]
> [21.2mb]->[20.8mb]/[24.9mb]}{[old] [8gb]->[8gb]/[8gb]}
> [2014-12-02 21:30:56,208][WARN ][monitor.jvm  ] [es-node-2]
> [gc][old][43253][63] duration [47.1s], collections [2]/[47.2s], total
> [47.1s]/[20.7m], memory [8.3gb]->[8.3gb]/[8.3gb], all_pools {[young]
> [199.6mb]->[199.6mb]/[199.6mb]}{[survivor]
> [20.8mb]->[24.8mb]/[24.9mb]}{[old] [8gb]->[8gb]/[8gb]}
> [2014-12-02 21:32:07,013][WARN ][transport] [es-node-2]
> Received response for a request that has timed out, sent [165744ms] ago,
> timed out [8ms] ago, action [discovery/zen/fd/ping], node
> [[es-node-1][sXwCdIhSRZKq7xZ6TAQiBg][localhost][inet[xxx.xxx.xxx.xxx/xxx.xxx.xxx.xxx:9300]]],
> id [3002106]
> [2014-12-02 21:36:41,880][WARN ][monitor.jvm  ] [es-node-2]
> [gc][old][43254][78] duration [5.7m], collections [15]/[5.7m], total
> [5.7m]/[26.5m], memory [8.3gb]->[8.3gb]/[8.3gb], all_pools {[young]
> [199.6mb]->[199.6mb]/[199.6mb]}{[survivor]
> [24.8mb]->[24.4mb]/[24.9mb]}{[old] [8gb]->[8gb]/[8gb]}
>
> Another part that we use a lot is the ES sea

Re: Differences between _source in Kibana and in elasticsearch

2014-12-04 Thread Jay Swan

I would guess that you need to refresh your field list in the Settings > 
Indices > Index pattern section of Kibana4; this is a new thing in Kibana4 
that's very different from v3. Drove me crazy trying to figure it out until 
I filed an issue. See Rashid's answer to my Github issue here:

https://github.com/elasticsearch/kibana/issues/1995

It would be nice to see this happen at least semi-automagically in the 
future; it's going to bite a lot of people during migration.

-- 
You received this message because you are subscribed to the Google Groups 
"elasticsearch" group.
To unsubscribe from this group and stop receiving emails from it, send an email 
to elasticsearch+unsubscr...@googlegroups.com.
To view this discussion on the web visit 
https://groups.google.com/d/msgid/elasticsearch/fa95a091-752b-40de-b5f4-99dc3cc0c5aa%40googlegroups.com.
For more options, visit https://groups.google.com/d/optout.

How to get ES to autodetect geo_point type and if it can't, where should I define the mapping?

2014-12-04 Thread am

Hello, I am using ES js wrapper in a nodejs application. I would like to 
get ES to do some geospatial searches, so from what I understand, I need to 
set the following options on mylocation field:

mylocation: {
properties: {
type: 'geo_point',
lat_lon: true
}
}

I am inserting the data like so:

// In my nodejs insert()
client.index({
index: 'myapp',
type: 'user',
id: user.id,
body: {
name: user.name,
mylocation: { lat: mylat, lon: mylon}
}
});


This inserts the lat,lon as doubles though. To counter this, I have a 
declared a mapping:

client.indicies.putMappings({
index: 'myapp',
type: 'user',
body: {
'user': {
 properties: {
 mylocation: { type: 'geo_point', lat_lon: true }
 }
}
}
});

This all works, but it is it possible to supply ES a certain format so that 
ES will automatically create a geo_point type with lat_lon set to true?

If not, could someone clarify where I should declare the mappings? Should 
mappings be declared each time before an insert/update happens in ES or is 
it just one time? If it's just one time, do I put the mapping inside my 
root nodejs application? What happens if I restart the nodejs application: 
would the mapping attempt to go through all ES documents and reindex the 
mylocation field to geo_point even though it's already been set from the 
previous reboot of the app?

Thanks!

-- 
You received this message because you are subscribed to the Google Groups 
"elasticsearch" group.
To unsubscribe from this group and stop receiving emails from it, send an email 
to elasticsearch+unsubscr...@googlegroups.com.
To view this discussion on the web visit 
https://groups.google.com/d/msgid/elasticsearch/d1ac1327-365f-4bcb-a0f9-94dd89580c7d%40googlegroups.com.
For more options, visit https://groups.google.com/d/optout.

Re: Good merge settings for interactively maintained index

2014-12-04 Thread Michael McCandless

OK I ran a quick test using Wikipedia docs; net/net I think
TieredMergePolicy's (the default) behavior is fine.  Once a too-large
segment has > 50% deletes it is eligible for merging and will be
aggressively merged.

To visualize this, I first built a 33.3M doc Wikipedia index (append
only), then ran forever randomly replacing each doc, which is a worst
case test since every update also deletes a previous doc.

I set max merged segment size to 800 MB, so I had a good number (17)
of them; otherwise I left TMP at defaults.

I refreshed every 3 seconds, and plotted the resulting graph of %tg
deleted but not yet merge docs over time:



It quickly ramps up from 0 at the start and only falls again once
the too-large segments start being merged and eventually stabilizes
to a fairly narrow range of 33%-45%.

Mike McCandless

http://blog.mikemccandless.com

On Thu, Dec 4, 2014 at 5:30 AM, Michael McCandless 
wrote:

> 25-40% is definitely "normal" for an index where many docs are being
> replaced; I've seen this go up to ~65% before large merges bring it back
> down.
>
> On 2) there may be some improvements we can make to Lucene default
> TieredMergePolicy here, to reclaim deletes for the "too large" segments ...
> I'll have a look.
>
> Mike McCandless
>
> http://blog.mikemccandless.com
>
> On Thu, Dec 4, 2014 at 4:06 AM, Michal Taborsky  > wrote:
>
>> Hello Nikolas,
>>
>> we are facing similar behavior. Did you find out anything?
>>
>> Thank you,
>> Michal
>>
>> Dne pondělí, 8. září 2014 22:55:12 UTC+2 Nikolas Everett napsal(a):
>>
>>> My indexes change somewhat frequently.  If I let leave the merge
>>> settings as the default I end up with 25%-40% deleted documents (some
>>> indexes higher, some lower).  I'm looking for some generic advice on:
>>> 1.  Is that 25%-40% ok?
>>> 2.  What kind of settings should I set to keep that in an acceptable
>>> range?  For some meaning of acceptable.
>>>
>>> On (1) I'm pretty sure 25%-40% is OK for my low query traffic indexes -
>>> no use optimizing them anyway.  But for my high search traffic indexes I
>>> _think_ I see a performance improvement when I have lower (<5%) deleted
>>> documents and fewer segments.  But computers are complicated and my
>>> performance tests might just have been testing cache warming  Does this
>>> conclusion match other's experience?
>>>
>>> On (2) I'm not really sure what to do.  It _looks_ _like_ Lucene isn't
>>> picking up the bigger segments to merge the deletes out of them.  I assume
>>> that is because they are bumping against the max allowed segment size and
>>> therefor it can only merge one at a time so it always has something better
>>> to do.  I'm not sure that is healthy though.  Some of those old segments
>>> can get really bloated - like 40%-50% deleted.
>>>
>>> Thanks!
>>>
>>> Nik
>>>
>>  --
>> You received this message because you are subscribed to the Google Groups
>> "elasticsearch" group.
>> To unsubscribe from this group and stop receiving emails from it, send an
>> email to elasticsearch+unsubscr...@googlegroups.com.
>> To view this discussion on the web visit
>> https://groups.google.com/d/msgid/elasticsearch/faec06a2-c352-4e3e-bea0-41ace2b35d6f%40googlegroups.com
>> 
>> .
>>
>> For more options, visit https://groups.google.com/d/optout.
>>
>
>

-- 
You received this message because you are subscribed to the Google Groups 
"elasticsearch" group.
To unsubscribe from this group and stop receiving emails from it, send an email 
to elasticsearch+unsubscr...@googlegroups.com.
To view this discussion on the web visit 
https://groups.google.com/d/msgid/elasticsearch/CAD7smRe_cN%2B2PtNT68z%2B5%3DDJ4W-vaO4-pUJ3bo1o0AFe%3D-4B1Q%40mail.gmail.com.
For more options, visit https://groups.google.com/d/optout.

Re: Startup issues with ES 1.3.5

2014-12-04 Thread Support Monkey

I would think the network is a prime suspect then, as there is no 
significant difference between 1.2.x and 1.3.x in relation to memory usage. 
And you'd certainly see OOMs in node logs if it was a memory issue.

On Thursday, December 4, 2014 12:45:58 PM UTC-8, Chris Moore wrote:
>
> There is nothing (literally) in the log of either data node after the node 
> joined events and nothing in the master log between index recovery and the 
> first error message.
>
> There are 0 queries run before the errors start occurring (access to the 
> nodes is blocked via a firewall, so the only communications are between the 
> nodes). We have 50% of the RAM allocated to the heap on each node (4GB 
> each).
>
> This cluster operated without issue under 1.1.2. Did something change 
> between 1.1.2 and 1.3.5 that drastically increased idle heap requirements?
>
>
> On Thursday, December 4, 2014 3:29:23 PM UTC-5, Support Monkey wrote:
>>
>> Generally ReceiveTimeoutTransportException is due to network disconnects 
>> or a node failing to respond due to heavy load. What does the log 
>> of pYi3z5PgRh6msJX_armz_A show you? Perhaps it has too little heap 
>> allocated. Rule of thumb is 1/2 available memory but <= 31GB
>>
>> On Wednesday, December 3, 2014 12:52:58 PM UTC-8, Jeff Keller wrote:
>>>
>>>
>>> ES Version: 1.3.5
>>>
>>> OS: Ubuntu 14.04.1 LTS
>>>
>>> Machine: 2 Intel(R) Xeon(R) CPU E5-2670 v2 @ 2.50GHz, 8 GB RAM at AWS
>>>
>>> master (ip-10-0-1-18), 2 data nodes (ip-10-0-1-19, ip-10-0-1-20)
>>>
>>>
>>> *After upgrading from ES 1.1.2...*
>>>
>>>
>>> 1. Startup ES on master
>>> 2. All nodes join cluster
>>> 3. [2014-12-03 20:30:54,789][INFO ][gateway  ] 
>>> [ip-10-0-1-18.ec2.internal] recovered [157] indices into cluster_state
>>> 4. Checked health a few times
>>>
>>>
>>> curl -XGET localhost:9200/_cat/health?v
>>>
>>>
>>> 5. 6 minutes after cluster recovery initiates (and 5:20 after the 
>>> recovery finishes), the log on the master node (10.0.1.18) reports:
>>>
>>>
>>> [2014-12-03 20:36:57,532][DEBUG][action.admin.cluster.node.stats] 
>>> [ip-10-0-1-18.ec2.internal] failed to execute on node 
>>> [pYi3z5PgRh6msJX_armz_A]
>>>
>>> org.elasticsearch.transport.ReceiveTimeoutTransportException: 
>>> [ip-10-0-1-20.ec2.internal][inet[/10.0.1.20:9300]][cluster/nodes/stats/n] 
>>> request_id [17564] timed out after [15001ms]
>>>
>>> at 
>>> org.elasticsearch.transport.TransportService$TimeoutHandler.run(TransportService.java:356)
>>>
>>> at 
>>> java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
>>>
>>> at 
>>> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
>>>
>>> at java.lang.Thread.run(Thread.java:745)
>>>
>>>
>>> 6. Every 30 seconds or 60 seconds, the above error is reported for one 
>>> or more of the data nodes
>>>
>>> 7. During this time, queries (search, index, etc.) don’t return. They 
>>> hang until the error state temporarily resolves itself (a varying time 
>>> around 15-20 minutes) at which point the expected result is returned.
>>>
>>>

-- 
You received this message because you are subscribed to the Google Groups 
"elasticsearch" group.
To unsubscribe from this group and stop receiving emails from it, send an email 
to elasticsearch+unsubscr...@googlegroups.com.
To view this discussion on the web visit 
https://groups.google.com/d/msgid/elasticsearch/c17de300-8b29-4ef9-ae64-d723cc0ad45c%40googlegroups.com.
For more options, visit https://groups.google.com/d/optout.

Re: Startup issues with ES 1.3.5

2014-12-04 Thread Chris Moore

There is nothing (literally) in the log of either data node after the node 
joined events and nothing in the master log between index recovery and the 
first error message.

There are 0 queries run before the errors start occurring (access to the 
nodes is blocked via a firewall, so the only communications are between the 
nodes). We have 50% of the RAM allocated to the heap on each node (4GB 
each).

This cluster operated without issue under 1.1.2. Did something change 
between 1.1.2 and 1.3.5 that drastically increased idle heap requirements?


On Thursday, December 4, 2014 3:29:23 PM UTC-5, Support Monkey wrote:
>
> Generally ReceiveTimeoutTransportException is due to network disconnects 
> or a node failing to respond due to heavy load. What does the log 
> of pYi3z5PgRh6msJX_armz_A show you? Perhaps it has too little heap 
> allocated. Rule of thumb is 1/2 available memory but <= 31GB
>
> On Wednesday, December 3, 2014 12:52:58 PM UTC-8, Jeff Keller wrote:
>>
>>
>> ES Version: 1.3.5
>>
>> OS: Ubuntu 14.04.1 LTS
>>
>> Machine: 2 Intel(R) Xeon(R) CPU E5-2670 v2 @ 2.50GHz, 8 GB RAM at AWS
>>
>> master (ip-10-0-1-18), 2 data nodes (ip-10-0-1-19, ip-10-0-1-20)
>>
>>
>> *After upgrading from ES 1.1.2...*
>>
>>
>> 1. Startup ES on master
>> 2. All nodes join cluster
>> 3. [2014-12-03 20:30:54,789][INFO ][gateway  ] 
>> [ip-10-0-1-18.ec2.internal] recovered [157] indices into cluster_state
>> 4. Checked health a few times
>>
>>
>> curl -XGET localhost:9200/_cat/health?v
>>
>>
>> 5. 6 minutes after cluster recovery initiates (and 5:20 after the 
>> recovery finishes), the log on the master node (10.0.1.18) reports:
>>
>>
>> [2014-12-03 20:36:57,532][DEBUG][action.admin.cluster.node.stats] 
>> [ip-10-0-1-18.ec2.internal] failed to execute on node 
>> [pYi3z5PgRh6msJX_armz_A]
>>
>> org.elasticsearch.transport.ReceiveTimeoutTransportException: 
>> [ip-10-0-1-20.ec2.internal][inet[/10.0.1.20:9300]][cluster/nodes/stats/n] 
>> request_id [17564] timed out after [15001ms]
>>
>> at 
>> org.elasticsearch.transport.TransportService$TimeoutHandler.run(TransportService.java:356)
>>
>> at 
>> java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
>>
>> at 
>> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
>>
>> at java.lang.Thread.run(Thread.java:745)
>>
>>
>> 6. Every 30 seconds or 60 seconds, the above error is reported for one or 
>> more of the data nodes
>>
>> 7. During this time, queries (search, index, etc.) don’t return. They 
>> hang until the error state temporarily resolves itself (a varying time 
>> around 15-20 minutes) at which point the expected result is returned.
>>
>>

-- 
You received this message because you are subscribed to the Google Groups 
"elasticsearch" group.
To unsubscribe from this group and stop receiving emails from it, send an email 
to elasticsearch+unsubscr...@googlegroups.com.
To view this discussion on the web visit 
https://groups.google.com/d/msgid/elasticsearch/50dfaccc-b8c6-4f72-afad-d641078d42e5%40googlegroups.com.
For more options, visit https://groups.google.com/d/optout.

Re: Startup issues with ES 1.3.5

2014-12-04 Thread Support Monkey

Generally ReceiveTimeoutTransportException is due to network disconnects or 
a node failing to respond due to heavy load. What does the log 
of pYi3z5PgRh6msJX_armz_A show you? Perhaps it has too little heap 
allocated. Rule of thumb is 1/2 available memory but <= 31GB

On Wednesday, December 3, 2014 12:52:58 PM UTC-8, Jeff Keller wrote:
>
>
> ES Version: 1.3.5
>
> OS: Ubuntu 14.04.1 LTS
>
> Machine: 2 Intel(R) Xeon(R) CPU E5-2670 v2 @ 2.50GHz, 8 GB RAM at AWS
>
> master (ip-10-0-1-18), 2 data nodes (ip-10-0-1-19, ip-10-0-1-20)
>
>
> *After upgrading from ES 1.1.2...*
>
>
> 1. Startup ES on master
> 2. All nodes join cluster
> 3. [2014-12-03 20:30:54,789][INFO ][gateway  ] 
> [ip-10-0-1-18.ec2.internal] recovered [157] indices into cluster_state
> 4. Checked health a few times
>
>
> curl -XGET localhost:9200/_cat/health?v
>
>
> 5. 6 minutes after cluster recovery initiates (and 5:20 after the recovery 
> finishes), the log on the master node (10.0.1.18) reports:
>
>
> [2014-12-03 20:36:57,532][DEBUG][action.admin.cluster.node.stats] 
> [ip-10-0-1-18.ec2.internal] failed to execute on node 
> [pYi3z5PgRh6msJX_armz_A]
>
> org.elasticsearch.transport.ReceiveTimeoutTransportException: 
> [ip-10-0-1-20.ec2.internal][inet[/10.0.1.20:9300]][cluster/nodes/stats/n] 
> request_id [17564] timed out after [15001ms]
>
> at 
> org.elasticsearch.transport.TransportService$TimeoutHandler.run(TransportService.java:356)
>
> at 
> java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
>
> at 
> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
>
> at java.lang.Thread.run(Thread.java:745)
>
>
> 6. Every 30 seconds or 60 seconds, the above error is reported for one or 
> more of the data nodes
>
> 7. During this time, queries (search, index, etc.) don’t return. They hang 
> until the error state temporarily resolves itself (a varying time around 
> 15-20 minutes) at which point the expected result is returned.
>
>

-- 
You received this message because you are subscribed to the Google Groups 
"elasticsearch" group.
To unsubscribe from this group and stop receiving emails from it, send an email 
to elasticsearch+unsubscr...@googlegroups.com.
To view this discussion on the web visit 
https://groups.google.com/d/msgid/elasticsearch/5b3a2f05-af84-4334-a924-4b828c33138c%40googlegroups.com.
For more options, visit https://groups.google.com/d/optout.

Re: Java API synchronous vs asynchronous?

2014-12-04 Thread mansishah

I have a related question about synchronous behavior of Java APIs. I 
understand refresh will make sure that the indexed document becomes 
searchable but what about subsequent updates / deletes to the same document.

If I index a document and it is going to be done asynchronously does that 
mean an immediate update or delete on this document could potentially fail 
because the document is not yet actually saved? Note that the update/ 
delete could be coming from potentially different client but still 
serially. 

Is there a API like "flush" which guarantees that everything that has been 
indexed so far is now available in ES (not for search but for 
get/update/delete) at least in the trans log.

On Sunday, April 13, 2014 8:34:19 AM UTC-7, Jörg Prante wrote:
>
> There is massive impact - but try and see for yourself. 
>
> Jörg
>
>
> On Sun, Apr 13, 2014 at 4:54 PM, Edward Chin  > wrote:
>
>> If I do that, and there are concurrent threads doing the same thing, is 
>> there impact to threads that are simply searching?  I don't mind the 
>> indexing threads being slow, but i don't want to impact the search threads 
>> too much.
>>
>> thanks
>> ed
>>
>> On Apr 13, 2014, at 10:47 AM, joerg...@gmail.com  wrote:
>>
>> Look at IndexRequestBuilder, there is a method setRefresh(true) so you 
>> can enforce a post refresh per index request.
>>
>> Jörg
>>
>>
>>
>> -- 
>> You received this message because you are subscribed to a topic in the 
>> Google Groups "elasticsearch" group.
>> To unsubscribe from this topic, visit 
>> https://groups.google.com/d/topic/elasticsearch/DWN30UYzeN4/unsubscribe.
>> To unsubscribe from this group and all its topics, send an email to 
>> elasticsearc...@googlegroups.com .
>> To view this discussion on the web visit 
>> https://groups.google.com/d/msgid/elasticsearch/CAKdsXoHJay-pvs%3D4QoOyfYs%3DZkC_fQTtedO%2B6EeS0iyL_KHffA%40mail.gmail.com
>>  
>> 
>> .
>>
>> For more options, visit https://groups.google.com/d/optout.
>>
>>
>>  -- 
>> You received this message because you are subscribed to the Google Groups 
>> "elasticsearch" group.
>> To unsubscribe from this group and stop receiving emails from it, send an 
>> email to elasticsearc...@googlegroups.com .
>> To view this discussion on the web visit 
>> https://groups.google.com/d/msgid/elasticsearch/5F90E219-EAF2-41AA-848D-99D27CB0952F%40gmail.com
>>  
>> 
>> .
>>
>> For more options, visit https://groups.google.com/d/optout.
>>
>
>

-- 
You received this message because you are subscribed to the Google Groups 
"elasticsearch" group.
To unsubscribe from this group and stop receiving emails from it, send an email 
to elasticsearch+unsubscr...@googlegroups.com.
To view this discussion on the web visit 
https://groups.google.com/d/msgid/elasticsearch/f143fb0f-625f-4965-8416-346017080b0c%40googlegroups.com.
For more options, visit https://groups.google.com/d/optout.

ES 1.4.0 TransportClient intermittent disconnects (NodeDisconnectedException)

2014-12-04 Thread Zed



We're in the middle of testing a new ES implementation in our QA 
environment.   We have set up a service which has a singleton 
TrasportClient instance.   After times of inactivity, when invoking a 
search via the client we receive NodeDisconnectedExceptions.   

We normally can connect and run queries, however this appears to happen 
after queries have not run for some time (unsure of how long -- possibly an 
hour or more).Doesn't the client by-default use a keepalive so that the 
firewall does not close any open communications with the server?Besides 
defining the cluster.name and specifying the transport hosts, we have no 
other client settings.

Environment:   ES 1.4.0

Client:  TransportClient (singleton used inside a Spring container)

We see errors like below:

org.elasticsearch.client.transport.NoNodeAvailableException: None of the 
configured nodes were available: [[Elektra 
Natchios][I-9Rp2hITi6MLTYfgTQm4w][blablahhost][inet[/10.230.2.159:9300]], 
[Toro][CeyRN2idRzGgJC3gwmsxdw][bwzldelas1][inet[/10.230.2.158:9300]]]

at 
org.elasticsearch.client.transport.TransportClientNodesService$RetryListener.onFailure(TransportClientNodesService.java:234)

at 
org.elasticsearch.action.TransportActionNodeProxy$1.handleException(TransportActionNodeProxy.java:78)

at 
org.elasticsearch.transport.TransportService$Adapter$3.run(TransportService.java:323)

at 
java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)

at 
java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)

at java.lang.Thread.run(Thread.java:744)

Caused by: org.elasticsearch.transport.NodeDisconnectedException: [Elektra 
Natchios][inet[/10.230.2.159:9300]][indices:data/read/search] disconnected

Can one of you ES gurus please point me in the right direction.Thanks!

Zed

-- 
You received this message because you are subscribed to the Google Groups 
"elasticsearch" group.
To unsubscribe from this group and stop receiving emails from it, send an email 
to elasticsearch+unsubscr...@googlegroups.com.
To view this discussion on the web visit 
https://groups.google.com/d/msgid/elasticsearch/2db4e96f-bc89-43aa-881f-d3fdc79da216%40googlegroups.com.
For more options, visit https://groups.google.com/d/optout.

NodeDisconnectedException - help!

2014-12-04 Thread Zed



We're in the middle of testing a new ES implementation in our QA 
environment.   We have set up a service which has a singleton 
TrasportClient instance.   After times of inactivity, when invoking a 
search via the client we receive NodeDisconnectedExceptions.   

We normally can connect and run queries, however this appears to happen 
after queries have not run for some time (unsure of how long -- possibly an 
hour or more).Doesn't the client by-default use a keepalive so that the 
firewall does not close any open communications with the server?Besides 
defining the cluster.name and specifying the transport hosts, we have no 
other client settings.

Environment:   ES 1.4.1

Client:  TransportClient (singleton used inside a Spring container)

We see errors like below:

org.elasticsearch.client.transport.NoNodeAvailableException: None of the 
configured nodes were available: [[Elektra 
Natchios][I-9Rp2hITi6MLTYfgTQm4w][blablahhost][inet[/10.230.2.159:9300]], 
[Toro][CeyRN2idRzGgJC3gwmsxdw][bwzldelas1][inet[/10.230.2.158:9300]]]

at 
org.elasticsearch.client.transport.TransportClientNodesService$RetryListener.onFailure(TransportClientNodesService.java:234)

at 
org.elasticsearch.action.TransportActionNodeProxy$1.handleException(TransportActionNodeProxy.java:78)

at 
org.elasticsearch.transport.TransportService$Adapter$3.run(TransportService.java:323)

at 
java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)

at 
java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)

at java.lang.Thread.run(Thread.java:744)

Caused by: org.elasticsearch.transport.NodeDisconnectedException: [Elektra 
Natchios][inet[/10.230.2.159:9300]][indices:data/read/search] disconnected

Can one of you ES gurus please point me in the right direction.Thanks!

Zed

-- 
You received this message because you are subscribed to the Google Groups 
"elasticsearch" group.
To unsubscribe from this group and stop receiving emails from it, send an email 
to elasticsearch+unsubscr...@googlegroups.com.
To view this discussion on the web visit 
https://groups.google.com/d/msgid/elasticsearch/cac8db95-e4d4-474a-91a6-7426a661ff1d%40googlegroups.com.
For more options, visit https://groups.google.com/d/optout.

zen discovery gossip router?

2014-12-04 Thread Fernando Padilla

I can't find any information in the guide nor google on how to setup a zen 
gossip router.  Can anyone help me out?

I really want to use elasticsearch, but I need to get over this one snag. :)

-- 
You received this message because you are subscribed to the Google Groups 
"elasticsearch" group.
To unsubscribe from this group and stop receiving emails from it, send an email 
to elasticsearch+unsubscr...@googlegroups.com.
To view this discussion on the web visit 
https://groups.google.com/d/msgid/elasticsearch/1c7e47d9-1f4e-4551-8a05-74798c606aba%40googlegroups.com.
For more options, visit https://groups.google.com/d/optout.

Re: Do I tell the world to hit one node? Or many? Or load balance?

2014-12-04 Thread Christopher Ambler

I'm curious why no data. Wouldn't having the data local mean faster lookups?

On Wednesday, December 3, 2014 1:14:10 PM UTC-8, Christian Hedegaard wrote:
>
>  In our environment our cluster is inside EC2/VPC. We have an ELB in 
> front of the cluster. We use DNS to assign a CNAME to the ELB for easier 
> internal use. The cluster is currently at 15 nodes, 3 of which are “master 
> only, no data” and associate themselves with the ELB. The ELB balances 
> requests to/from the master nodes. The master nodes are slightly smaller in 
> memory, but faster in CPU than the rest of the nodes so they can quickly 
> serve requests. The rest of the nodes are “data only” nodes. They are not 
> master eligible and they just store and serve data to/from the masters via 
> the ELB.
>  

-- 
You received this message because you are subscribed to the Google Groups 
"elasticsearch" group.
To unsubscribe from this group and stop receiving emails from it, send an email 
to elasticsearch+unsubscr...@googlegroups.com.
To view this discussion on the web visit 
https://groups.google.com/d/msgid/elasticsearch/a7b92cd3-1cd9-4d58-bb66-17e7aa02647b%40googlegroups.com.
For more options, visit https://groups.google.com/d/optout.

Re: geo_point mapped field is returned as string in java api

2014-12-04 Thread Nicholas Knize

Glad to hear you resolved your issue.  Let us know if you have any other 
questions.

- Nick

On Thursday, December 4, 2014 12:04:14 PM UTC-6, T Vinod Gupta wrote:
>
> nevermind, i solved it by doing something like this - 
> GeoPoint latLng = GeoPoint.parseFromLatLon((String) 
> sourceMap.get("lat_lng"));
>
> at the time of indexing, i am passing as ","
>
> earlier i was passing as GeoPoint but that caused a major problem and 
> messed up my mapping. When i read the document and then change some other 
> field and do index, then it would raise this exception - 
> ElasticsearchParseException("field must be either '" + LATITUDE + "', '" + 
> LONGITUDE + "' or '" + GEOHASH + "'");
>
> so i figured that indexing by passing the string (even though mapping is 
> geo_point) works.
> thanks 
>
> On Thu, Dec 4, 2014 at 6:32 AM, Nicholas Knize <
> nichola...@elasticsearch.com > wrote:
>
>> Can you post a code example from your use case for how you're inserting, 
>> retrieving, and reading the documents?
>>
>> - Nick
>>
>>
>> On Tuesday, December 2, 2014 2:16:30 PM UTC-6, T Vinod Gupta wrote:
>>>
>>> has anyone seen this problem? my mapping says that the field is of type 
>>> geo_point. but when i read documents using java api and get the sourcemap, 
>>> the type of the field is string and i can't cast it to a GeoPoint.
>>>
>>> ...
>>>   "lat_lng" : {
>>> "type" : "geo_point",
>>> "lat_lon" : true
>>>   },
>>> ..
>>>
>>> thanks
>>>
>>  -- 
>> You received this message because you are subscribed to the Google Groups 
>> "elasticsearch" group.
>> To unsubscribe from this group and stop receiving emails from it, send an 
>> email to elasticsearc...@googlegroups.com .
>> To view this discussion on the web visit 
>> https://groups.google.com/d/msgid/elasticsearch/b5ed03a4-eaad-491b-9856-087ae3ab2587%40googlegroups.com
>>  
>> 
>> .
>> For more options, visit https://groups.google.com/d/optout.
>>
>
>

-- 
You received this message because you are subscribed to the Google Groups 
"elasticsearch" group.
To unsubscribe from this group and stop receiving emails from it, send an email 
to elasticsearch+unsubscr...@googlegroups.com.
To view this discussion on the web visit 
https://groups.google.com/d/msgid/elasticsearch/138a1355-7815-4afa-915d-924332ce494a%40googlegroups.com.
For more options, visit https://groups.google.com/d/optout.

Re: geo_point mapped field is returned as string in java api

2014-12-04 Thread T Vinod Gupta

nevermind, i solved it by doing something like this -
GeoPoint latLng = GeoPoint.parseFromLatLon((String)
sourceMap.get("lat_lng"));

at the time of indexing, i am passing as ","

earlier i was passing as GeoPoint but that caused a major problem and
messed up my mapping. When i read the document and then change some other
field and do index, then it would raise this exception -
ElasticsearchParseException("field must be either '" + LATITUDE + "', '" +
LONGITUDE + "' or '" + GEOHASH + "'");

so i figured that indexing by passing the string (even though mapping is
geo_point) works.
thanks

On Thu, Dec 4, 2014 at 6:32 AM, Nicholas Knize <
nicholas.kn...@elasticsearch.com> wrote:

> Can you post a code example from your use case for how you're inserting,
> retrieving, and reading the documents?
>
> - Nick
>
>
> On Tuesday, December 2, 2014 2:16:30 PM UTC-6, T Vinod Gupta wrote:
>>
>> has anyone seen this problem? my mapping says that the field is of type
>> geo_point. but when i read documents using java api and get the sourcemap,
>> the type of the field is string and i can't cast it to a GeoPoint.
>>
>> ...
>>   "lat_lng" : {
>> "type" : "geo_point",
>> "lat_lon" : true
>>   },
>> ..
>>
>> thanks
>>
>  --
> You received this message because you are subscribed to the Google Groups
> "elasticsearch" group.
> To unsubscribe from this group and stop receiving emails from it, send an
> email to elasticsearch+unsubscr...@googlegroups.com.
> To view this discussion on the web visit
> https://groups.google.com/d/msgid/elasticsearch/b5ed03a4-eaad-491b-9856-087ae3ab2587%40googlegroups.com
> 
> .
> For more options, visit https://groups.google.com/d/optout.
>

-- 
You received this message because you are subscribed to the Google Groups 
"elasticsearch" group.
To unsubscribe from this group and stop receiving emails from it, send an email 
to elasticsearch+unsubscr...@googlegroups.com.
To view this discussion on the web visit 
https://groups.google.com/d/msgid/elasticsearch/CAHau4yvDyp5uO_bhj%3DFwNtNB_fPu9sHpbKfb-OMr4hBygL8bfQ%40mail.gmail.com.
For more options, visit https://groups.google.com/d/optout.

ElasticSearch can't automatically recover after a big HEAP utilization

2014-12-04 Thread Sergio Henrique

Hi guys, everything ok?

I want to talk about a problem that we are facing with our ES cluster.

Today we have four machines in our cluster, each machine has 16GB of RAM 
(8GB HEAP and 8GB OS).
We have a total of 73,975,578 documents, 998 shards and 127 indices.
To index our docs we use the bulk API. Today our bulk request is made with 
a total
up to 300 items. We put our docs in a queue so we can make the request in 
the background.
The log below shows a piece of the information about the amount of 
documents that was
sent for ES to index:

[2014-12-03 11:19:32 -0200] execute Event Create with 77 items in app 20
[2014-12-03 11:19:32 -0200] execute User Create with 1 items in app 67
[2014-12-03 11:19:40 -0200] execute User Create with 1 items in app 61
[2014-12-03 11:19:49 -0200] execute User Create with 1 items in app 62
[2014-12-03 11:19:50 -0200] execute User Create with 1 items in app 27
[2014-12-03 11:19:50 -0200] execute User Create with 2 items in app 20
[2014-12-03 11:19:54 -0200] execute User Create with 5 items in app 61
[2014-12-03 11:19:58 -0200] execute User Update with 61 items in app 20
[2014-12-03 11:20:02 -0200] execute User Create with 2 items in app 61
[2014-12-03 11:20:02 -0200] execute User Create with 1 items in app 27
[2014-12-03 11:20:10 -0200] execute User Create with 2 items in app 20
[2014-12-03 11:20:19 -0200] execute User Create with 5 items in app 61
[2014-12-03 11:20:20 -0200] execute User Create with 3 items in app 20
[2014-12-03 11:20:20 -0200] execute User Create with 1 items in app 24
[2014-12-03 11:20:25 -0200] execute User Create with 1 items in app 61
[2014-12-03 11:20:28 -0200] execute User Create with 1 items in app 20
[2014-12-03 11:20:37 -0200] execute Event Create with 91 items in app 20
[2014-12-03 11:20:42 -0200] execute User Create with 1 items in app 76
[2014-12-03 11:20:42 -0200] execute Event Create with 300 items in app 61
[2014-12-03 11:20:50 -0200] execute User Create with 4 items in app 61
[2014-12-03 11:20:51 -0200] execute User Create with 1 items in app 62
[2014-12-03 11:20:51 -0200] execute User Create with 2 items in app 20
[2014-12-03 11:20:55 -0200] execute User Create with 3 items in app 61

Sometimes the request occurs with just one item in the bulk. Another 
interesting
thing is: we send that data frequently, in other words, the stress we put in
ES is pretty high.

The big problem is when ES HEAP start approaching 75% of utilization and 
the GC does not reach its normal value.

This log entrance show some GC working:

[2014-12-02 21:28:04,766][WARN ][monitor.jvm  ] [es-node-2] 
[gc][old][43249][56] duration [48s], collections [2]/[48.2s], total 
[48s]/[17.9m], memory [8.2gb]->[8.3gb]/[8.3gb], all_pools {[young] 
[199.6mb]->[199.6mb]/[199.6mb]}{[survivor] 
[14.1mb]->[18.9mb]/[24.9mb]}{[old] [8gb]->[8gb]/[8gb]}
[2014-12-02 21:28:33,120][WARN ][monitor.jvm  ] [es-node-2] 
[gc][old][43250][57] duration [28.3s], collections [1]/[28.3s], total 
[28.3s]/[18.4m], memory [8.3gb]->[8.3gb]/[8.3gb], all_pools {[young] 
[199.6mb]->[199.6mb]/[199.6mb]}{[survivor] 
[18.9mb]->[17.5mb]/[24.9mb]}{[old] [8gb]->[8gb]/[8gb]}
[2014-12-02 21:29:21,222][WARN ][monitor.jvm  ] [es-node-2] 
[gc][old][43251][59] duration [47.9s], collections [2]/[48.1s], total 
[47.9s]/[19.2m], memory [8.3gb]->[8.3gb]/[8.3gb], all_pools {[young] 
[199.6mb]->[199.6mb]/[199.6mb]}{[survivor] 
[17.5mb]->[21.2mb]/[24.9mb]}{[old] [8gb]->[8gb]/[8gb]}
[2014-12-02 21:30:08,916][WARN ][monitor.jvm  ] [es-node-2] 
[gc][old][43252][61] duration [47.5s], collections [2]/[47.6s], total 
[47.5s]/[20m], memory [8.3gb]->[8.3gb]/[8.3gb], all_pools {[young] 
[199.6mb]->[199.6mb]/[199.6mb]}{[survivor] 
[21.2mb]->[20.8mb]/[24.9mb]}{[old] [8gb]->[8gb]/[8gb]}
[2014-12-02 21:30:56,208][WARN ][monitor.jvm  ] [es-node-2] 
[gc][old][43253][63] duration [47.1s], collections [2]/[47.2s], total 
[47.1s]/[20.7m], memory [8.3gb]->[8.3gb]/[8.3gb], all_pools {[young] 
[199.6mb]->[199.6mb]/[199.6mb]}{[survivor] 
[20.8mb]->[24.8mb]/[24.9mb]}{[old] [8gb]->[8gb]/[8gb]}
[2014-12-02 21:32:07,013][WARN ][transport] [es-node-2] 
Received response for a request that has timed out, sent [165744ms] ago, 
timed out [8ms] ago, action [discovery/zen/fd/ping], node 
[[es-node-1][sXwCdIhSRZKq7xZ6TAQiBg][localhost][inet[xxx.xxx.xxx.xxx/xxx.xxx.xxx.xxx:9300]]],
 
id [3002106]
[2014-12-02 21:36:41,880][WARN ][monitor.jvm  ] [es-node-2] 
[gc][old][43254][78] duration [5.7m], collections [15]/[5.7m], total 
[5.7m]/[26.5m], memory [8.3gb]->[8.3gb]/[8.3gb], all_pools {[young] 
[199.6mb]->[199.6mb]/[199.6mb]}{[survivor] 
[24.8mb]->[24.4mb]/[24.9mb]}{[old] [8gb]->[8gb]/[8gb]}

Another part that we use a lot is the ES search, these lines show some log 
entrances that
were generated when the search was done

[2014-12-03 11:43:22 -0200] buscou pagina 1 de 111235 (10 por pagina) do 
app 61
[2014-12-03 11:44:12 -0200] buscou pagina 1 de 30628 (10 por pagina) do app

Re: Sustainable way to regularly purge deleted docs

2014-12-04 Thread Jonathan Foy

Hello

I do agree with both of you that my use of optimize as regular maintenance 
isn't the correct way to do things, but it's been the only thing that I've 
found that keeps the deleted doc count/memory under control.  I very much 
want to find something that works to avoid it.

I came to much the same conclusions that you did regarding the merge 
settings and logic.  It took a while (and eventually just reading the code) 
to find out that though dynamic, the merge settings don't actually take 
effect until a shard is moved/created (fixed in 1.4), so a lot of my early 
work thinking I'd changed settings wasn't really valid.  That said, my 
merge settings are still largely what I have listed earlier in the thread, 
though repeating them for convenience:

indices.store.throttle.type: none
index.merge.policy.reclaim_deletes_weight: 6.0 <-- This one I know is 
quite high, I kept bumping it up before I realized the changes weren't 
taking effect immediately
index.merge.policy.max_merge_at_once: 5
index.merge.policy.max_merge_at_once_explicit: 5
index.merge.policy.segments_per_tier: 5
index.merge.policy.max_merged_segment: 2gb

I DO have a mess of nested documents in the type that I know is the most 
troublesome...perhaps the merge logic doesn't take deleted nested documents 
into account when deciding what segment to merge?  Or perhaps since I have 
a small max_merged_segment, it's like Nikolas said and those max sized 
segments are just rarely reclaimed in normal operation, and so the deleted 
doc count (and the memory they take up) grows.  I don't have memory issues 
during normal merge operations, so I think I may start testing with a 
larger max segment size.

I'll let you know if I ever get it resolved.



On Wednesday, December 3, 2014 3:05:18 PM UTC-5, Govind Chandrasekhar wrote:
>
> Jonathan,
>
> Your current setup doesn't look ideal. As Nikolas pointed out, optimize 
> should be run under exceptional circumstances, not for regular maintenance. 
> That's what the merge policy setting are for, and the right settings should 
> meet your needs, atleast theoretically. That said, I can't say I've always 
> heeded this advice, since I've often resorted to using only_expunge_deletes 
> when things have gotten out of hand, because it's an easy remedy to a large 
> problem.
>
> I'm trying out a different set of settings to those Nikolas just pointed 
> out. Since my issue is OOMs when merges take place, not so much I/O, I 
> figured the issue is with one of two things:
>  1. Too many segments are being merged concurrently.
>  2. The size of the merged segments are large.
> I reduced "max_merge_at_once", but this didn't fix the issue. So it had to 
> be that the segments being merged were quite large. I noticed that my 
> largest segments often formed >50% of each shard and had upto 30% deletes, 
> and OOMs occurred since when these massive segments were being "merged" to 
> expunge deletes, since it led to the amount of data on the shard almost 
> doubling.
>
> To remedy this, I've REDUCED the size of "max_merged_segment" (I can live 
> with more segments) and reindexed all of my data (since this doesn't help 
> reduced existing large segments). If I understand merge settings correctly, 
> this means that in the worst case scenario, the amount of memory used for 
> merging will be (max_marged_segment x max_merge_at_once) GB. 
>
> Since these settings don't apply retrospectively to existing large 
> segments, I've reindexed all of my data. All of this was done in the last 
> day or so, so I've yet to see how it works out, though I'm optimistic.
>
> By the way, I believe "max_marged_segment" limits are not observed for 
> explicit optimize, so atleast in my setup, I'm going to have to shy away 
> from explicitly expunging deletes. It could be that in your case, because 
> of repeated explicit optimizes, or use of max_num_segments, coupled with 
> the fact that you have a lot of reindexing going on (that too with child 
> documents, since any change in any one of the child documents results in 
> all other child documents and the parent document being marked as deleted), 
> things have gotten particularly out of hand.
>
>
> On 3 December 2014 at 06:29, Nikolas Everett  > wrote:
>
>>
>>
>> On Wed, Dec 3, 2014 at 8:32 AM, Jonathan Foy > > wrote:
>>
>> Interesting...does the very large max_merged_segment not result in memory 
>>> issues when the largest segments are merged?  When I run my the cleanup 
>>> command (_optimize?only_expunge_deletes) I see a steep spike in memor as 
>>> each merge is completing, followed by an immediate drop, presumably as the 
>>> new segment is fully initialized and then the old ones are subsequently 
>>> dropped.  I'd be worried that I'd run out of memory when initializing the 
>>> larger segments.  That being said, I only notice the large spikes when 
>>> merging via the explicit optimize/only_expunge_deletes command, the 
>>> continuous merging throughout the day results in very

Advices on bookmarking docs

2014-12-04 Thread Roger de Cordova Farias

We have a lot of docs like this:

{
  "_type": "doc",
  "_id": "123",
  "_source": {
"parent_name": "abc"
  }
}

Each doc has only one parent_name but multiple docs can have the same
parent. It is like a many-to-one relationship, but the parent has no other
info apart of its name, so we didn't create a separate doc for them

Now we want to provide our users the option to bookmark parents so he can
later do queries on docs that are children of his bookmarked parents only.
We could easily do that with a terms filter like that:

{
  "filter": {
"terms": {
  "parent_name": [
"abc",
"def",
"ghi"
  ]
}
  }
}

We could pass to the filter all the user's bookmarked parents names that
are persisted, let's say, in a relational database.

But the problem is that we have more than 50 million docs and the user can
bookmark millions of parents. That would be too heavy to send a filter with
millions of terms in every request. So we need to handle the bookmarks
directly on ElasticSearch.

We considered using a filtered alias, so that we have that very same filter
persisted in the Elastic and we won't have to pass it in every request.
This would be already way better than passing the filter in each request,
but we want more, we want it to be very performatic. Filtering with
millions of terms would be slow, even if we don't need to send the filter
in the request

Now we decided to add in our docs a meta field with information like "who
bookmarked me", somethink like this:

{
  "_type": "doc",
  "_id": "123",
  "_source": {
"parent_name": "abc",
"bookmarked_by": [
  "roger",
  "john"
]
  }
}

Then we can use a term (term, without the "s") filter like this:

{
  "filter": {
"term": {
  "bookmarked_by": "roger"
}
  }
}

That would be  (I hope) way more performatic than our last approach, but
still has issues.

The problem we would have now is about updating bookmarks.
When the user bookmarks/un-bookmarks a parent, we can do a query for all
docs with this parent and update their "bookmarked_by" field with the user
identifier. That is ok.
But what happens when we add a new doc with a parent the user bookmarked
before?
We could query for the other docs with the same parent and copy the
bookmarked_by field to the new doc, but that is ugly.

So we concluded we need to have the bookmarked_by field centralized in a
parent doc.

We considered the following approaches:

*1 - parent-child relationship*

{
  "_type": "parent",
  "_id": "1",
  "_source": {
"bookmarked_by": [
  "roger",
  "john"
]
  }
}

{
  "_type": "child",
  "_id": "1",
  "_parent": "1",
  "_source": {}
}
{
  "_type": "child",
  "_id": "2",
  "_parent": "1",
  "_source": {}
}

Then, when user "roger" does a query on the children, the query would also
have a has_parent filter like this:

{
  "has_parent": {
"parent_type": "parent",
"filter": {
  "term": {
"bookmarked_by": "roger"
  }
}
  }
}

*2 - nested type*

{
  "_type": "parent",
  "_id": "1",
  "_source": {
"bookmarked_by": [
  "roger",
  "john"
],
"children": [
  {
"id": 1
  },
  {
"id": 2
  }
]
  }
}

Then,  when user "roger" does a query, we use a nested query to query only
the children with bookmarked parents:

{
  "nested": {
"path": "children",
"query": {
  
  "filter": {
"has_parent": {
  "parent_type": "parent",
  "filter": {
"term": {
  "bookmarked_by": "roger"
}
  }
}
  }
}
  }
}


*3 - No actual joins approach*


{
  "_type": "parent",
  "_id": "1",
  "_source": {
"name": "abc",
"bookmarked_by": [
  "roger",
  "john"
]
  }
}


{
  "_type": "child",
  "_id": "1",
  "_source": {
"parent_name": "abc",
"bookmarked_by": [
  "roger",
  "john"
]
  }
}

{
  "_type": "child",
  "_id": "2",
  "_source": {
"parent_name": "abc",
"bookmarked_by": [
  "roger",
  "john"
]
  }
}

Then, every time a parent gets updated, we query for all its children
(using the parent_name field) and update their bookmarked_by fields to
reflect the updated parent's bookmarked_by field.
And every time we add a new child doc we query for its parent and copy the
parent's bookmarked_by field to the new doc





The main problem with the first 2 approaches is the need to do join in
runtime. I didn't test them, but I think that joining with millions of docs
could be way slower than not joining at all.
Also, the nested type approach has the issue of returning the parent doc on
queries, and we need to return the matching children only.
The third approach looks to be the more performatic one, but it is almost
as ugly as not having the parent in a separate doc at all.

I may have put some wrong information here, as I didn't test every
approach. I'm only using common knowledge with some guessing, but I hope I
have described o

Re: kibana connection failed (even with http.cors.allow-origin in elasticsearch.ym)

2014-12-04 Thread Dan Langille

SOLVED.

Not sure how, but while messing around with the nginx.conf files, it 
started working.  Sorry I do not have more information which points 
directly to the solution.

-- 
You received this message because you are subscribed to the Google Groups 
"elasticsearch" group.
To unsubscribe from this group and stop receiving emails from it, send an email 
to elasticsearch+unsubscr...@googlegroups.com.
To view this discussion on the web visit 
https://groups.google.com/d/msgid/elasticsearch/adda21ef-cda3-4b11-a772-0b2c8b1f803a%40googlegroups.com.
For more options, visit https://groups.google.com/d/optout.

Re: Advices on migrating 1.3.2 to 1.4.1

2014-12-04 Thread Roger de Cordova Farias

Thank you for the advice

2014-12-04 9:30 GMT-02:00 Elvar Böðvarsson :

> I upgraded our logging cluster to 1.4 without any problems.
>
> When I looked into upgrading a separate dev/test instance used for a
> different purpose I ran into problems with the plugins. If you are using
> plugins, make sure they are supported in 1.4.
>
> --
> You received this message because you are subscribed to the Google Groups
> "elasticsearch" group.
> To unsubscribe from this group and stop receiving emails from it, send an
> email to elasticsearch+unsubscr...@googlegroups.com.
> To view this discussion on the web visit
> https://groups.google.com/d/msgid/elasticsearch/1873d1cb-6f49-413d-8157-1220b64411e0%40googlegroups.com
> 
> .
>
> For more options, visit https://groups.google.com/d/optout.
>

-- 
You received this message because you are subscribed to the Google Groups 
"elasticsearch" group.
To unsubscribe from this group and stop receiving emails from it, send an email 
to elasticsearch+unsubscr...@googlegroups.com.
To view this discussion on the web visit 
https://groups.google.com/d/msgid/elasticsearch/CAJp2533%3D%3Dr1d__%2BKgQr%2Ba66rQ%3Df4WEgMNwphAY0hu06APomqeA%40mail.gmail.com.
For more options, visit https://groups.google.com/d/optout.

Re: downgrading from 1.4 to1.3

2014-12-04 Thread Itamar Syn-Hershko

Classic CORS error - maybe * is blocked by ES. Haven't had to deal with
this myself (yet) so can't help you here. All in all just a small rough
edge to smooth, not a clusterfuck.

A quick solution would be to install K3 as a site plugin and use it
internally (don't expose it to the web)

--

Itamar Syn-Hershko
http://code972.com | @synhershko 
Freelance Developer & Consultant
Author of RavenDB in Action 

On Thu, Dec 4, 2014 at 3:20 AM, Jack Judge  wrote:

> Well you're right there's JS errors, CORS related;
>
> XMLHttpRequest cannot load
> http://10.5.41.120:9200/logstash-2014.12.04/_search. Request header field
> Content-Type is not allowed by Access-Control-Allow-Headers.
>
> In my elasticsearch.yml I've got this on all nodes,
>
> http.cors.allow-origin: "/.*/"
> http.cors.enabled: true
>
> Which google leads me to believe should open it up for anything. K3 is
> fronted by apache and a bit more googling prompted me to add this to the
>  section of httpd.conf
>
> Header set Access-Control-Allow-Origin "*"
>
> Still getting the same errors :(
> I'm at a loss to know what else to do now.
>
>
>
> On Wednesday, 3 December 2014 15:48:28 UTC-8, Itamar Syn-Hershko wrote:
>>
>> I'm not aware of compat issues with K3 and ES 1.4 other than
>> https://github.com/elasticsearch/kibana/issues/1637 . I'd check for
>> javascript errors, and try to see what's going on under the hood, really.
>> When you have more data about this, you can either quickly resolve, or open
>> a concrete bug :)
>>
>>
>>  --
> You received this message because you are subscribed to the Google Groups
> "elasticsearch" group.
> To unsubscribe from this group and stop receiving emails from it, send an
> email to elasticsearch+unsubscr...@googlegroups.com.
> To view this discussion on the web visit
> https://groups.google.com/d/msgid/elasticsearch/19d57d39-b764-493c-bd60-c8ae3aff087a%40googlegroups.com
> 
> .
>
> For more options, visit https://groups.google.com/d/optout.
>

-- 
You received this message because you are subscribed to the Google Groups 
"elasticsearch" group.
To unsubscribe from this group and stop receiving emails from it, send an email 
to elasticsearch+unsubscr...@googlegroups.com.
To view this discussion on the web visit 
https://groups.google.com/d/msgid/elasticsearch/CAHTr4ZsXoVG%3DqYm8_0RC0H1%3DtWQfBtXhOScv6bjwKN9winF4cQ%40mail.gmail.com.
For more options, visit https://groups.google.com/d/optout.

ES 1.0.3 CPU usage drastically increased

2014-12-04 Thread Dunaeth

Hi,

We're running a two-nodes ES 1.0.3 cluster with the current setup :

VM on host A :
4 vCore CPU
32GB RAM
ES master (only node being queried)
MySQL slave (used as a backup, never queried)
JVM settings

/usr/lib/jvm/java-7-openjdk-amd64//bin/java -Xms2g -Xmx2g -Xss256k 
-Djava.awt.headless=true -XX:+UseParNewGC -XX:+UseConcMarkSweepGC 
-XX:CMSInitiatingOccupancyFraction=75 -XX:+UseCMSInitiatingOccupancyOnly 
-XX:+HeapDumpOnOutOfMemoryError -Delasticsearch 
-Des.pidfile=/var/run/elasticsearch.pid 
-Des.path.home=/usr/share/elasticsearch -cp 
:/usr/share/elasticsearch/lib/elasticsearch-1.0.3.jar:/usr/share/elasticsearch/lib/*:/usr/share/elasticsearch/lib/sigar/*
 
-Des.default.config=/etc/elasticsearch/elasticsearch.yml 
-Des.default.path.home=/usr/share/elasticsearch 
-Des.default.path.logs=/home/log/elasticsearch 
-Des.default.path.data=/home/elasticsearch 
-Des.default.path.work=/tmp/elasticsearch 
-Des.default.path.conf=/etc/elasticsearch 
org.elasticsearch.bootstrap.Elasticsearch


VM on host B :
2 vCore CPU
16GB RAM
ES datanode (search are dispatched, no indexing)
MySQL master
JVM settings

/usr/lib/jvm/java-7-openjdk-amd64//bin/java -Xms2g -Xmx2g -Xss256k 
-Djava.awt.headless=true -XX:+UseParNewGC -XX:+UseConcMarkSweepGC 
-XX:CMSInitiatingOccupancyFraction=75 -XX:+UseCMSInitiatingOccupancyOnly 
-XX:+HeapDumpOnOutOfMemoryError -Delasticsearch 
-Des.pidfile=/var/run/elasticsearch.pid 
-Des.path.home=/usr/share/elasticsearch -cp 
:/usr/share/elasticsearch/lib/elasticsearch-1.0.3.jar:/usr/share/elasticsearch/lib/*:/usr/share/elasticsearch/lib/sigar/*
 
-Des.default.config=/etc/elasticsearch/elasticsearch.yml 
-Des.default.path.home=/usr/share/elasticsearch 
-Des.default.path.logs=/home/log/elasticsearch 
-Des.default.path.data=/home/elasticsearch 
-Des.default.path.work=/tmp/elasticsearch 
-Des.default.path.conf=/etc/elasticsearch 
org.elasticsearch.bootstrap.Elasticsearch

---

Before we got a 4 vCore CPU / 32GB VM, the master node was the same as the 
secondary node.
On this cluster, we have a 5 shards (+5 replica) index - we'll call it main 
- with ~130k documents at the moment for a 120MB size which, we update 
documents that were updated by our customers in our application with a cron 
than run every 5 minutes and updates at most 2k docs, we can have a few 
thousands docs in queue.
We are also using logstash to log some user actions that our application 
relies on in monthly basis indices. Those indices have 2 shards (+2 
replica) with 1~6M docs for a size from 380MB to 1.5GB. At the moment, we 
have 11 log indices.
We do search queries on both the main and the latest log indices. 
Marginally, some queries can occure on older log indices.
Looking at our stats, I'd say we have a 2:1 indexing / searching ratio, but 
it can vary depending on seasonality.
We also have a 1 shard (+1 replica) dedicated percolator index on which 
we're executing percolation queries before each log that will be indexed in 
ES through logstash.
We never optimized any index.

Our issue :

Since we updated ES to v1.0.3 to deal with a field data breaker bug, 
everything was running fine until we experienced a drastic CPU usage 
increase(from near 100% to 200%) without any reason (no change in our 
application nor on the traffic we got). No ES restart have been able to 
give us back a normal CPU usage. In emergency, we decided to switch our 
main node from 2 vCore CPU/ 16GB to 4 vCore CPU / 32 GB and the CPU usage 
of the new node never went beyond 30% for almost 10 days. And then the 
issue happened again, the CPU usage raised to 400% without any reason.

It is worth noting that the secondary node is not subject to this issue.

Our outsourcer told us this CPU increase was due to deadlocks caused by 
malformed queries, but those malformed queries already happened before and 
restarting ES didn't solve the high CPU usage.
He also told us our server had not enough resources and it would be better 
to have 2 serveurs for MySQL master / slave and 2 to 3 distinct serveurs 
for the ES cluster, which seems weird when we saw the main ES server had a 
maximum 30% CPU usage for days.

We plan to update ES version to see if this issue is a bug that was already 
solved, but are there other things we could try ? I wondered if our jvm 
heap was enough since we have many data, that we use many filters in our 
search queries and that we have more than 8GB unused memory on our main 
node. Does the fact that our secondary node is not subject to this issue 
says it's an indexing issue ?

-- 
You received this message because you are subscribed to the Google Groups 
"elasticsearch" group.
To unsubscribe from this group and stop receiving emails from it, send an email 
to elasticsearch+unsubscr...@googlegroups.com.
To view this discussion on the web visit 
https://groups.google.com/d/msgid/elasticsearch/59e995ba-cf68-4fb6-848a-d7e4f0c8bb99%40googlegroups.com.
For more options, visit https://groups.google.com/d/optout.

kibana connection failed (even with http.cors.allow-origin in elasticsearch.ym)

2014-12-04 Thread Dan Langille

I'm running:

elasticsearch-1.4.0_1
logstash-1.4.2_1
kibana-3.1.1
on FreeBSD 9.3

On initial setup, the prebuilt dashboard: (Logstash Dashboard) at 
/index.html#/dashboard/file/logstash.json worked

I added more panels from github.  The Logstash Dashboard still worked. 
 Then it didn't.

screen shot: http://img.ly/images/8746523/full

I have confirmed elasticsearch is running by browsing to port 9200 and 
checking with ps.  Yes, it's running.

Next I tried changes to configuration:

$ sudo grep cors /usr/local/etc/elasticsearch/elasticsearch.yml
http.cors.allow-origin: "/.*/"
http.cors.enabled: true

I also tried the FQDN of the host, but the results were the same.

No errors are seen in /var/log/elasticsearch/elasticsearch.log when the 
above message is produced.

Google searching led me to the above suggestions.  Now I'm at a loss as to 
what to try next.

Suggestions please?

-- 
You received this message because you are subscribed to the Google Groups 
"elasticsearch" group.
To unsubscribe from this group and stop receiving emails from it, send an email 
to elasticsearch+unsubscr...@googlegroups.com.
To view this discussion on the web visit 
https://groups.google.com/d/msgid/elasticsearch/902f8df3-c5ae-442c-b83e-4a7a649c2f41%40googlegroups.com.
For more options, visit https://groups.google.com/d/optout.

Differences between _source in Kibana and in elasticsearch

2014-12-04 Thread Stefan Meisner Larsen

Hi,

I use logstash's syslog plugin to collect logs, searching elastic search 
and kibana for the same object gives different results in the _source 
field...

Elasticsearch version 1.4.0, Kibana 4.0.0-BETA2

When querying elasticsearch with curl I get:

curl -XGET http://localhost:9200/logstash*/_search?pretty
stml@riakcs:~/work/java/elasticsearch/data/stml_elasticsearch/nodes/0/indices$ 
curl -XGET 
'http://localhost:9200/logstash*/_search?pretty&q=_id:AUoVYl3Ayvv7Nc0uRA6X'
{
  "took" : 7,
  "timed_out" : false,
  "_shards" : {
"total" : 5,
"successful" : 5,
"failed" : 0
  },
  "hits" : {
"total" : 1,
"max_score" : 1.0,
"hits" : [ {
  "_index" : "logstash-2014.12.04",
  "_type" : "syslog",
  "_id" : "AUoVYl3Ayvv7Nc0uRA6X",
  "_score" : 1.0,
  "_source":{"message":"pam_authenticate: Authentication 
failure","@version":"1","@timestamp":"2014-12-04T12:59:35.000Z","type":"syslog","host":"0:0:0:0:0:0:0:1","priority":83,"timestamp":"Dec
  
4 
13:59:35","logsource":"riakcs","program":"su","pid":"15292","severity":3,"facility":10,"facility_label":"security/authorization","severity_label":"Error"}
} ]
  }
}


But in Kibana I get:


@timestamp   December 4th 2014, 13:59:35.000  @version   1  _id   
AUoVYl3Ayvv7Nc0uRA6X  _index   logstash-2014.12.04  _source   
{"message":"pam_authenticate: 
Authentication 
failure","@version":"1","@timestamp":"2014-12-04T12:59:35.000Z","type":"syslog","host":"0:0:0:0:0:0:0:1"}
  
_type   syslog  host   0:0:0:0:0:0:0:1  message   pam_authenticate: 
Authentication failure  type   syslog

Missing a lot of fields in _source...

I would have expected these views of the same field to be alike...have I 
misunderstood something

-- 
You received this message because you are subscribed to the Google Groups 
"elasticsearch" group.
To unsubscribe from this group and stop receiving emails from it, send an email 
to elasticsearch+unsubscr...@googlegroups.com.
To view this discussion on the web visit 
https://groups.google.com/d/msgid/elasticsearch/25a96d8d-6e51-4e48-8294-14bd9b52be34%40googlegroups.com.
For more options, visit https://groups.google.com/d/optout.

Re: geo_point mapped field is returned as string in java api

2014-12-04 Thread Nicholas Knize

Can you post a code example from your use case for how you're inserting, 
retrieving, and reading the documents?

- Nick

On Tuesday, December 2, 2014 2:16:30 PM UTC-6, T Vinod Gupta wrote:
>
> has anyone seen this problem? my mapping says that the field is of type 
> geo_point. but when i read documents using java api and get the sourcemap, 
> the type of the field is string and i can't cast it to a GeoPoint.
>
> ...
>   "lat_lng" : {
> "type" : "geo_point",
> "lat_lon" : true
>   },
> ..
>
> thanks
>

-- 
You received this message because you are subscribed to the Google Groups 
"elasticsearch" group.
To unsubscribe from this group and stop receiving emails from it, send an email 
to elasticsearch+unsubscr...@googlegroups.com.
To view this discussion on the web visit 
https://groups.google.com/d/msgid/elasticsearch/b5ed03a4-eaad-491b-9856-087ae3ab2587%40googlegroups.com.
For more options, visit https://groups.google.com/d/optout.

MapperParsingException

2014-12-04 Thread Stefan

Hey! 
I am quite new to the ELK-Stack but what I have seen so far is awesome even 
if it's sometimes challenging for me! :D 

But I am facing a huge problem right now. The loglines often contain 
xml/soap messages and i want to filter them using the xml filter. 

When doing so I sometimes get the following errormessage 
"org.elasticsearch.index.mapper.MapperParsingException: object mapping 
[someId] tr ying to serialize a value with no field associated with it, 
current value [ABC-A3C-XYZ]. 

Another example (the errormessage from the es log):

[2014-12-04 15:07:00,312][DEBUG][action.bulk  ] [Sludge] [
logstash-2014.12.04][0] failed to execute bulk item (index) index {[logstash
-2014.12.04][tomcat][Rg37kc1TQ8OD4jvhQwX0PQ], source[{"message":"01.12.2014 
11:10:03.117 [INFO ] [.60-6180-exec-3] [abc/stgService] [x.y.z.stgService] 
- Inbound Message  ID: 845 Address: 
http://localhost:1234/test/testservice Encoding: ISO-8859-1 Http-Method: 
POST Content-Type: text/xml Headers: {connection=[close], 
Content-Length=[477], content-type=[text/xml]} Payload:  http://schemas.xmlsoap.org/soap/envelope/\"; 
xmlns:ser=\"http://service.sc.test.testcompany.com/\";>AUK-CZX-U6Q1
 

89NGN-6067
 
-- ","@version":"1","@timestamp":
"2014-12-04T14:06:59.341Z","type":"tomcat","host":"FERI-PC-7-64","path":
"C:\\logstash-1.4.2\\bin\\hibernate.log","tags":["multiline","gsubed",
"gsubed","Inbound Message"],"timestamp":"01.12.2014 11:10:03.117","loglevel"
:"[INFO ]","thread":"[.60-6180-exec-3]","service":"[abc/stgService]",
"javaclass":"[x.y.z.stgService]","messagetype":"Inbound Message",
"trennzeichen":"","messageid":"ID: 
845","url":"Address: 
http://localhost:1234/test/testservice","codierung":"Encoding: ISO-8859-1",
"httpmethode":"Http-Method: POST","contenttype":"Content-Type: text/xml",
"headers":"Headers: {connection=[close], Content-Length=[477], 
content-type=[text/xml]}","xmlmsg":" http://schemas.xmlsoap.org/soap/envelope/\"; 
xmlns:ser=\"http://service.sc.test.testcompany.com/\";>AUK-CZX-U6Q1
 

89NGN-6067"
,"greedydata":"-- ","parsedxml":{
"xmlns:soapenv":"http://schemas.xmlsoap.org/soap/envelope/","xmlns:ser":
"http://service.sc.test.testcompany.com/","Body":[{"erzeugeServicename":[{
"ServicenameRequest":[{"ID":["AUK-CZX-U6Q"],"sbSOS":["1 89"],
"sbServicename":["NGN-6067"],"bbSOS":[{}],"bbServicename":[{}]}]}]}]}}]}
org.elasticsearch.index.mapper.MapperParsingException: object mapping [sbSOS
] trying to serialize a value with no field associated with it, current 
value [1 89]
at org.elasticsearch.index.mapper.object.ObjectMapper.serializeValue(
ObjectMapper.java:703)
at org.elasticsearch.index.mapper.object.ObjectMapper.parse(ObjectMapper
.java:500)
at org.elasticsearch.index.mapper.object.ObjectMapper.serializeValue(
ObjectMapper.java:707)
at org.elasticsearch.index.mapper.object.ObjectMapper.
serializeNonDynamicArray(ObjectMapper.java:696)
at org.elasticsearch.index.mapper.object.ObjectMapper.serializeArray(
ObjectMapper.java:605)
at org.elasticsearch.index.mapper.object.ObjectMapper.parse(ObjectMapper
.java:492)
at org.elasticsearch.index.mapper.object.ObjectMapper.serializeObject(
ObjectMapper.java:555)
at org.elasticsearch.index.mapper.object.ObjectMapper.
serializeNonDynamicArray(ObjectMapper.java:686)
at org.elasticsearch.index.mapper.object.ObjectMapper.serializeArray(
ObjectMapper.java:605)
at org.elasticsearch.index.mapper.object.ObjectMapper.parse(ObjectMapper
.java:492)
at org.elasticsearch.index.mapper.object.ObjectMapper.serializeObject(
ObjectMapper.java:555)
at org.elasticsearch.index.mapper.object.ObjectMapper.
serializeNonDynamicArray(ObjectMapper.java:686)
at org.elasticsearch.index.mapper.object.ObjectMapper.serializeArray(
ObjectMapper.java:605)
at org.elasticsearch.index.mapper.object.ObjectMapper.parse(ObjectMapper
.java:492)
at org.elasticsearch.index.mapper.object.ObjectMapper.serializeObject(
ObjectMapper.java:555)
at org.elasticsearch.index.mapper.object.ObjectMapper.
serializeNonDynamicArray(ObjectMapper.java:686)
at org.elasticsearch.index.mapper.object.ObjectMapper.serializeArray(
ObjectMapper.java:605)
at org.elasticsearch.index.mapper.object.ObjectMapper.parse(ObjectMapper
.java:492)
at org.elasticsearch.index.mapper.object.ObjectMapper.serializeObject(
ObjectMapper.java:555)
at org.elasticsearch.index.mapper.object.ObjectMapper.parse(ObjectMapper
.java:490)
at org.elasticsearch.index.mapper.DocumentMapper.parse(DocumentMapper.
java:541)
at org.elasticsearch.index.mapper.DocumentMapper.parse(DocumentMapper.
java:490)
at org.elasticsearch.index.shard.service.InternalIndexShard.
prepareCreate(InternalIndexShard.java:392)
at org.elasticsearch.action.bulk.TransportShardBulkAction.
shardIndexOperation(TransportShardBulkAction.java:

delete by query api - ClusterBlockException[blocked by: [FORBIDDEN/8/index write (api)];] status": 403

2014-12-04 Thread pmiles . mail

Hi,

I'm trying to delete data by query from our elastic search cluster. If i 
run the delete on the current active index it works fine.

However if i try to run the delete against an older index ( which is still 
open ) I get an exception.

{
"error": "ClusterBlockException[blocked by: [FORBIDDEN/8/index write 
(api)];]",
"status": 403
}


The query is http://1.2.3.4:9200/graylog2_102/message/_query

{
  "size": 3,
"query": {
"match": {"message": "apples"}
 }}


Can you advise on how to delete these specifc 'apple' messages without 
deleting the whole index ?

Thanks

Paul.

-- 
You received this message because you are subscribed to the Google Groups 
"elasticsearch" group.
To unsubscribe from this group and stop receiving emails from it, send an email 
to elasticsearch+unsubscr...@googlegroups.com.
To view this discussion on the web visit 
https://groups.google.com/d/msgid/elasticsearch/e1807c73-3edd-4a86-b419-2197a0d95e30%40googlegroups.com.
For more options, visit https://groups.google.com/d/optout.

Re: When a master node goes down, how the client query works?

With Java client, you have not to worry about that. You have either
multiple node connections or explicitly chosen the nodes to connect to
(transport client).

With HTTP client, the official Elasticsearch clients provide methods to
connect to multiple hosts. If one host goes down, the next one is chosen.

Jörg

On Wed, Dec 3, 2014 at 5:50 PM, Aaron  wrote:

> This is a newbie question about how the cluster works. I try to find the
> answer from the group, but seems not exact the same question I have.
> By reading the guide from elasticsearch.com, I understand when a master
> node goes down, a new master node will be elected automatically. However, a
> client does not know that and he still tries to query the old master node.
> I was wondering what the result will be.
>
> Assume I already have data indexed into 5 nodes, n1 to n5, where n1 is the
> original master node. so when the client queries, the client do something
> like:  curl -XPOST "http://n1:9200/movies/movie/_search?q=*:Godfather";,
> life is good so far.
>
> When n1 node goes down, assume n2 becomes the new master node. Since the
> client does no know n1 is down, he still submits the same query to n1, what
> result will be returned?
>
> Should a client query the cluster instead of querying a master node? How
> to submit a query to the cluster?
>
> Thanks a lot in advance!
>
> Aaron
>
>
>  --
> You received this message because you are subscribed to the Google Groups
> "elasticsearch" group.
> To unsubscribe from this group and stop receiving emails from it, send an
> email to elasticsearch+unsubscr...@googlegroups.com.
> To view this discussion on the web visit
> https://groups.google.com/d/msgid/elasticsearch/7b963b8c-625d-48eb-a75e-05a8ee4b53b3%40googlegroups.com
> 
> .
> For more options, visit https://groups.google.com/d/optout.
>

-- 
You received this message because you are subscribed to the Google Groups 
"elasticsearch" group.
To unsubscribe from this group and stop receiving emails from it, send an email 
to elasticsearch+unsubscr...@googlegroups.com.
To view this discussion on the web visit 
https://groups.google.com/d/msgid/elasticsearch/CAKdsXoG3%2BJTkk7CaHKRqV%2BtSpUzshRC0Fv5y3dLPn%3DLr1Lq8hA%40mail.gmail.com.
For more options, visit https://groups.google.com/d/optout.

Re: what would be the effect of using an arbitrary large count

2014-12-04 Thread kazoompa

I am still not expert an in ES but surely when not paging the process time 
will be higher because more documents have to be brought back in the 
response. However, depending on what kind of queries you perform, the 
subsequent queries will be faster. I am thinking of *filter bool queries* 
with caching and bitset query representation. I doubt that a size of *1m* 
will have any effect if there are only 100 documents indexed.

I wouldn't advise removing the paging for the sake of bandwidth and network 
traffic especially if you have many clients using the same server, may be a 
very optimized server setup can help.

We do a full query only when doing statistical calculation requiring all 
documents.

On Wednesday, December 3, 2014 11:40:05 PM UTC-5, Ron Sher wrote:
>
> Hi 
> We have a multi tenant SaaS solution and we expose our use of 
> elasticsearch through an API. We noticed that clients who use the API 
> directly, and not through our front-end, don't bother with paging and just 
> use a count of 500k or 1m even if the actual count is much lower. I was 
> wondering about the effect of not limiting the count since we're thinking 
> of making a breaking change and limit this to 100. 
>
> Thanks for your help 
> Ron

-- 
You received this message because you are subscribed to the Google Groups 
"elasticsearch" group.
To unsubscribe from this group and stop receiving emails from it, send an email 
to elasticsearch+unsubscr...@googlegroups.com.
To view this discussion on the web visit 
https://groups.google.com/d/msgid/elasticsearch/51e07a71-3649-4e99-9217-21b0f32b20b8%40googlegroups.com.
For more options, visit https://groups.google.com/d/optout.

Re: Elasticsearch not assigning replica shards and not reallocating

2014-12-04 Thread Sebastián Schepens

Nevermind, i think i fixed it, it seems somewhere between 1.3.2 and 1.4.0 
config settingts for watermark changed, i changed this:
cluster.routing.allocation.disk.watermark.low: 85%
cluster.routing.allocation.disk.watermark.high: 90%

for:
cluster.routing.allocation.disk.watermark.low: 85%
cluster.routing.allocation.disk.watermark.high: 90%

And it worked!


On Thursday, December 4, 2014 10:27:07 AM UTC-3, Sebastián Schepens wrote:
>
> I have 25 nodes and 12 java clients that bulk index.
> This is the command i'm using:
>
> curl -XPUT localhost:9200/_cluster/settings -d '{
> "transient" : {
> "cluster.routing.allocation.exclude._host" : "HOST"
> }
> }'
>
>
> Got any idea why replica shards are not being allocated?
>
> This is my config file:
> indices.fielddata.cache.size: 30%
> indices.fielddata.cache.expire: 10m
> indices.breaker.fielddata.limit: 60%
>
> cluster.name: elasticsearch
> action.auto_create_index: true
>
> cluster.routing.allocation.disk.threshold_enabled: true
> cluster.routing.allocation.disk.watermark.low: .85
> cluster.routing.allocation.disk.watermark.high: .90
>
> node.name: "##HOSTNAME##"
> node.master: false
> node.data: true
> node.client: false
>
> index.number_of_shards: 5
> index.number_of_replicas: 1
>
> bootstrap.mlockall: true
>
> transport.tcp.compress: true
> http.port: 8080
>
> gateway.type: local
> gateway.recover_after_nodes: 2
> gateway.recover_after_time: 1m
> gateway.expected_nodes: ##SERVERS_COUNT##
>
> cluster.routing.allocation.node_initial_primaries_recoveries: 20
> cluster.routing.allocation.node_concurrent_recoveries: 10
> indices.recovery.max_bytes_per_sec: 50mb
> indices.recovery.concurrent_streams: 5
>
> discovery.zen.minimum_master_nodes: 2
> discovery.zen.ping.timeout: 10s
> discovery.zen.ping.multicast.enabled: false
> discovery.zen.ping.unicast.hosts: ##MASTER_NODES##
>
> processors: 16
> index.mapper.dynamic: true
> action.disable_delete_all_indices: true
> index.refresh_internal: 10s
> cluster.routing.allocation.cluster_concurrent_rebalance: 20
> script.disable_dynamic: true
> http.compression: true
>
> Thanks,
> Sebastian
>
>
> On Thursday, December 4, 2014 1:00:24 AM UTC-3, Mark Walkom wrote:
>>
>> How many nodes do you have?
>>
>> Can you provide the command you are sending ES to set the exclude?
>>
>> On 4 December 2014 at 13:46, Sebastián Schepens <
>> sebastian...@mercadolibre.com> wrote:
>>
>>> I have an issue with elasticsearch version 1.4.1, i had indexes without 
>>> replicas and i recently i added 1 replica to all indexes and they are not 
>>> getting allocated.
>>> I also have an issue when using cluster setting to exclude a node from 
>>> allocation, it just doesnt work, data is not moved from the node, this used 
>>> to work on 1.3.2.
>>>
>>> I don't know what could be wrong and i dont see anything meaningful in 
>>> logs.
>>>
>>> Can someone help me?
>>>
>>> Thanks,
>>> Sebastian
>>>
>>> -- 
>>> You received this message because you are subscribed to the Google 
>>> Groups "elasticsearch" group.
>>> To unsubscribe from this group and stop receiving emails from it, send 
>>> an email to elasticsearc...@googlegroups.com.
>>> To view this discussion on the web visit 
>>> https://groups.google.com/d/msgid/elasticsearch/51d4d2f5-026d-43e2-9087-927ba7279067%40googlegroups.com
>>>  
>>> 
>>> .
>>> For more options, visit https://groups.google.com/d/optout.
>>>
>>
>>

-- 
You received this message because you are subscribed to the Google Groups 
"elasticsearch" group.
To unsubscribe from this group and stop receiving emails from it, send an email 
to elasticsearch+unsubscr...@googlegroups.com.
To view this discussion on the web visit 
https://groups.google.com/d/msgid/elasticsearch/b40d0fb2-e1d7-4686-be2d-c6ad5c6ea62d%40googlegroups.com.
For more options, visit https://groups.google.com/d/optout.

Re: data visualization with elasticsearch aggregations and d3 explaining

2014-12-04 Thread kazoompa

You really need to go over the query documentation first, there are plenty 
of good examples and tutorials that will guide you and help you understand 
your snippet.

I gather you understand the query part, the aggregation part (there are 
lots of docs) shows the frequency (count) of touchdowns on field "qtr".

You really need to dive into the docs :)



On Thursday, December 4, 2014 2:00:56 AM UTC-5, Mohd Syafiq wrote:
>
> Hi guys, i want to build graph using d3 and query from elassticsearch, but 
> anyone here tell me what exactly mean by this code ? 
>
>
>  client.search({
> index: 'nfl',
> size: 5,
> body: {
> // Begin query.
> query: {
> // Boolean query for matching and excluding items.
> bool: {
> must: {
>  match: { 
> "description": "TOUCHDOWN" 
>   // "season": "2012"
> }
> },
> must_not: { match: { "qtr": 5 }}
> }
> },
> // Aggregate on the results
> aggs: {
> touchdowns: {
> terms: {
> field: "qtr",
> order: { "_term" : "asc" }
> }
> }
> }
> // End query.
> }
>

-- 
You received this message because you are subscribed to the Google Groups 
"elasticsearch" group.
To unsubscribe from this group and stop receiving emails from it, send an email 
to elasticsearch+unsubscr...@googlegroups.com.
To view this discussion on the web visit 
https://groups.google.com/d/msgid/elasticsearch/230e182a-529e-4343-8cb6-3eb7a39f2fa7%40googlegroups.com.
For more options, visit https://groups.google.com/d/optout.

Re: Elasticsearch not assigning replica shards and not reallocating

2014-12-04 Thread Sebastián Schepens

I have 25 nodes and 12 java clients that bulk index.
This is the command i'm using:

curl -XPUT localhost:9200/_cluster/settings -d '{
"transient" : {
"cluster.routing.allocation.exclude._host" : "HOST"
}
}'


Got any idea why replica shards are not being allocated?

This is my config file:
indices.fielddata.cache.size: 30%
indices.fielddata.cache.expire: 10m
indices.breaker.fielddata.limit: 60%

cluster.name: elasticsearch
action.auto_create_index: true

cluster.routing.allocation.disk.threshold_enabled: true
cluster.routing.allocation.disk.watermark.low: .85
cluster.routing.allocation.disk.watermark.high: .90

node.name: "##HOSTNAME##"
node.master: false
node.data: true
node.client: false

index.number_of_shards: 5
index.number_of_replicas: 1

bootstrap.mlockall: true

transport.tcp.compress: true
http.port: 8080

gateway.type: local
gateway.recover_after_nodes: 2
gateway.recover_after_time: 1m
gateway.expected_nodes: ##SERVERS_COUNT##

cluster.routing.allocation.node_initial_primaries_recoveries: 20
cluster.routing.allocation.node_concurrent_recoveries: 10
indices.recovery.max_bytes_per_sec: 50mb
indices.recovery.concurrent_streams: 5

discovery.zen.minimum_master_nodes: 2
discovery.zen.ping.timeout: 10s
discovery.zen.ping.multicast.enabled: false
discovery.zen.ping.unicast.hosts: ##MASTER_NODES##

processors: 16
index.mapper.dynamic: true
action.disable_delete_all_indices: true
index.refresh_internal: 10s
cluster.routing.allocation.cluster_concurrent_rebalance: 20
script.disable_dynamic: true
http.compression: true

Thanks,
Sebastian


On Thursday, December 4, 2014 1:00:24 AM UTC-3, Mark Walkom wrote:
>
> How many nodes do you have?
>
> Can you provide the command you are sending ES to set the exclude?
>
> On 4 December 2014 at 13:46, Sebastián Schepens <
> sebastian...@mercadolibre.com > wrote:
>
>> I have an issue with elasticsearch version 1.4.1, i had indexes without 
>> replicas and i recently i added 1 replica to all indexes and they are not 
>> getting allocated.
>> I also have an issue when using cluster setting to exclude a node from 
>> allocation, it just doesnt work, data is not moved from the node, this used 
>> to work on 1.3.2.
>>
>> I don't know what could be wrong and i dont see anything meaningful in 
>> logs.
>>
>> Can someone help me?
>>
>> Thanks,
>> Sebastian
>>
>> -- 
>> You received this message because you are subscribed to the Google Groups 
>> "elasticsearch" group.
>> To unsubscribe from this group and stop receiving emails from it, send an 
>> email to elasticsearc...@googlegroups.com .
>> To view this discussion on the web visit 
>> https://groups.google.com/d/msgid/elasticsearch/51d4d2f5-026d-43e2-9087-927ba7279067%40googlegroups.com
>>  
>> 
>> .
>> For more options, visit https://groups.google.com/d/optout.
>>
>
>

-- 
You received this message because you are subscribed to the Google Groups 
"elasticsearch" group.
To unsubscribe from this group and stop receiving emails from it, send an email 
to elasticsearch+unsubscr...@googlegroups.com.
To view this discussion on the web visit 
https://groups.google.com/d/msgid/elasticsearch/18fb961b-a443-489c-9d97-015e351728ac%40googlegroups.com.
For more options, visit https://groups.google.com/d/optout.

Re: term suggester : strange results.

2014-12-04 Thread DH

Yes ! Thank a lot, "mode : always" did the trick. 

Now, I just have some strange frequency numbers .. suggesters boasts 203 
"tomato", whereas a query only return 97.
The frequencies are less important, so I guess I'll be able to live with 
that.

Thank again

-- 
You received this message because you are subscribed to the Google Groups 
"elasticsearch" group.
To unsubscribe from this group and stop receiving emails from it, send an email 
to elasticsearch+unsubscr...@googlegroups.com.
To view this discussion on the web visit 
https://groups.google.com/d/msgid/elasticsearch/8e58d7a1-5cac-4535-afb9-fe58be574aa5%40googlegroups.com.
For more options, visit https://groups.google.com/d/optout.

Re: What is elastic search bounded by? Is it cpu, memory etc

Why do you set bulk indexing queue size to 3000?

Why do you limit field data cache to 25%?

What documents are in the index?

How do your queries look like?

Jörg

On Tue, Dec 2, 2014 at 1:06 PM, rmadd  wrote:

> I am running elastic search in my personal box.
>
> Memory: 6GB
> Processor: Intel® Core™ i3-3120M CPU @ 2.50GHz × 4
> OS: Ubuntu 12.04 - 64-bit
>
> *ElasticSearch* Settings: Only running locally
> Version : 1.2.2
> ES_MIN_MEM=3g
> ES_MAX_MEM=3g
> threadpool.bulk.queue_size: 3000
> indices.fielddata.cache.size: 25%
> http.compression: true
> bootstrap.mlockall: true
> script.disable_dynamic: true
> cluster.name: elasticsearch
>
> *Scenario*: I am trying to test the performance of my bulk
> queries/aggregations. The test case is to run asynchronous http requests to
> node.js which in turn will call elastic search. The tests are running from
> a
> Java method. Started with 50 requests at a time. Each request is divided
> and
> parallized in to two asynchronous(async.parallel) bulk queries in node.js.
> I
> am using  node-elasticsearch 
> api (uses elasticsearch 1.3 api). The two bulk queries contain 13 and 10
> queries respectively.And the two are asynchronously sent to elastic search
> from node.js. When the Elastic Search returns, the query results are
> combined and sent back to the test case.
>
> *Observations*: I see that all the cpu cores are utilized 100%. Memory is
> utilized around 90%. The response time for all 50 requests combined is 30
> seconds. If I run just the single queries each alone, in the bulk queries,
> each are returning in less than 100 milli-seconds. Node.js is taking
> negligible time to forward requests to elastic search and combine responses
> from elastic search. Even if run the test case synchronously from java, the
> response time does not change. I may say that elastic search is not doing
> parallel processing. Is this because I am CPU or memory bound?
>
> *Questions*: Is this expected, because I am running Elastic Search in my
> local machine and not in a cluster? Can I improve my performance in my
> local
> machine? I would definitely start a cluster. But I want to know, how to
> improve the performance scalably. What is it that the elastic search is
> bound to?
>
> I am not able to find this in forums. And am sure this would help others.
> Thanks for your help.
>
>
>
> --
> View this message in context:
> http://elasticsearch-users.115913.n3.nabble.com/What-is-elastic-search-bounded-by-Is-it-cpu-memory-etc-tp4067016.html
> Sent from the ElasticSearch Users mailing list archive at Nabble.com.
>
> --
> You received this message because you are subscribed to the Google Groups
> "elasticsearch" group.
> To unsubscribe from this group and stop receiving emails from it, send an
> email to elasticsearch+unsubscr...@googlegroups.com.
> To view this discussion on the web visit
> https://groups.google.com/d/msgid/elasticsearch/1417522017795-4067016.post%40n3.nabble.com
> .
> For more options, visit https://groups.google.com/d/optout.
>

-- 
You received this message because you are subscribed to the Google Groups 
"elasticsearch" group.
To unsubscribe from this group and stop receiving emails from it, send an email 
to elasticsearch+unsubscr...@googlegroups.com.
To view this discussion on the web visit 
https://groups.google.com/d/msgid/elasticsearch/CAKdsXoG2yVtE1LrmRg5XDU5ZfLbU6z7ztcX_o3nLPbkhZBgY6A%40mail.gmail.com.
For more options, visit https://groups.google.com/d/optout.

Re: When a master node goes down, how the client query works?

Two options

1. Have a client instance of elasticsearch on a different server or on the 
same server that does the query. That node must be set to master=false and 
data=false. Being a member of the cluster means you know where the data is.
2. Use a http reverse proxy that connects to all the nodes in the cluster, 
if http 9200 is unavailable on one node then traffic is sent to the other 
nodes.

Best option is to combine the two. Have two elasticsearch nodes as client 
members of the cluster, install a reverse proxy on those two and 
loadbalance between them on the IP level with a solution of choice.

-- 
You received this message because you are subscribed to the Google Groups 
"elasticsearch" group.
To unsubscribe from this group and stop receiving emails from it, send an email 
to elasticsearch+unsubscr...@googlegroups.com.
To view this discussion on the web visit 
https://groups.google.com/d/msgid/elasticsearch/7e0c1382-399f-4a0b-ba77-59fa64c9ac2c%40googlegroups.com.
For more options, visit https://groups.google.com/d/optout.

What is elastic search bounded by? Is it cpu, memory etc

2014-12-04 Thread rmadd

I am running elastic search in my personal box.

Memory: 6GB
Processor: Intel® Core™ i3-3120M CPU @ 2.50GHz × 4
OS: Ubuntu 12.04 - 64-bit

*ElasticSearch* Settings: Only running locally
Version : 1.2.2
ES_MIN_MEM=3g
ES_MAX_MEM=3g
threadpool.bulk.queue_size: 3000
indices.fielddata.cache.size: 25%
http.compression: true
bootstrap.mlockall: true
script.disable_dynamic: true
cluster.name: elasticsearch

*Scenario*: I am trying to test the performance of my bulk
queries/aggregations. The test case is to run asynchronous http requests to
node.js which in turn will call elastic search. The tests are running from a
Java method. Started with 50 requests at a time. Each request is divided and
parallized in to two asynchronous(async.parallel) bulk queries in node.js. I
am using node-elasticsearch
api (uses elasticsearch 1.3 api). The two bulk queries contain 13 and 10
queries respectively.And the two are asynchronously sent to elastic search
from node.js. When the Elastic Search returns, the query results are
combined and sent back to the test case.

*Observations*: I see that all the cpu cores are utilized 100%. Memory is
utilized around 90%. The response time for all 50 requests combined is 30
seconds. If I run just the single queries each alone, in the bulk queries,
each are returning in less than 100 milli-seconds. Node.js is taking
negligible time to forward requests to elastic search and combine responses
from elastic search. Even if run the test case synchronously from java, the
response time does not change. I may say that elastic search is not doing
parallel processing. Is this because I am CPU or memory bound?

*Questions*: Is this expected, because I am running Elastic Search in my
local machine and not in a cluster? Can I improve my performance in my local
machine? I would definitely start a cluster. But I want to know, how to
improve the performance scalably. What is it that the elastic search is
bound to?

I am not able to find this in forums. And am sure this would help others.
Thanks for your help.

--
View this message in context:
http://elasticsearch-users.115913.n3.nabble.com/What-is-elastic-search-bounded-by-Is-it-cpu-memory-etc-tp4067016.html
Sent from the ElasticSearch Users mailing list archive at Nabble.com.

--
You received this message because you are subscribed to the Google Groups
"elasticsearch" group.
To unsubscribe from this group and stop receiving emails from it, send an email
to elasticsearch+unsubscr...@googlegroups.com.
To view this discussion on the web visit
https://groups.google.com/d/msgid/elasticsearch/1417522017795-4067016.post%40n3.nabble.com.
For more options, visit https://groups.google.com/d/optout.

Re: term suggester : strange results.

2014-12-04 Thread Nikolas Everett

On Thu, Dec 4, 2014 at 7:27 AM, DH  wrote:

Hi, everyone,
>
> I'm trying to figure put some discrepencies (I think) in the results of my
> suggesters, with ES V0.90.5.
>
> My indices are big and can contain a wide array of language.
> when I do this (NB : tomate is the french for tomato)
>   ;
>
> {
>   "query": {
> "match_all": {}
>   },
>   "suggest": {
> "my_suggester": {
>   "text": "tomate",
>   "term": {
> "field": "my_field"
>   }
> }
>   }
> }
>
> Es doesn't suggest "tomato". And the terms it suggest are rather
> low-scoring (it even suggests "tote", with a measly 0.5 score)
>
> however, if I do this :
>
> {
>   "query": {
> "match_all": {}
>   },
>   "suggest": {
> "my_suggester": {
>   "text": "tomato",
>   "term": {
> "field": "my_field"
>   }
> }
>   }
> }
>
> Es suggests "tomate", along with a buch of lower scoring terms.
>
> As far as I undestand, tomate is as close to tomato  as tomato is close to
> tomate, and thus ES should suggest tomato when I'm asking for tomate.
> I'm positive that the two terms are present in the indices; and brought
> back by my request, so that would not be the issue.
>
> So, I wonder ..
> Is there something I did not understand regarding suggesters?
> Is that behaviour normal?
> Is it due to the older version of ES I'm still using?
>
> If anyone using suggesters would help me make sense of this, that'd be
> helpful.
> Thanks
>


Try setting these:
"suggest_mode": "always",
"size": 10,
"max_term_freq": 0.65

and then backing off from there.  Your suggestion should come back with all
the settings and you'll learn more about what is going on.

Depending on how the documents are sharded you can see changes in
behavior.  Its more likely when you have fewer documents.

Nik

-- 
You received this message because you are subscribed to the Google Groups 
"elasticsearch" group.
To unsubscribe from this group and stop receiving emails from it, send an email 
to elasticsearch+unsubscr...@googlegroups.com.
To view this discussion on the web visit 
https://groups.google.com/d/msgid/elasticsearch/CAPmjWd0rdwOoSBsSdVMuq_P7ntqn%2B7YOOZpD_PyFcqg4%3D99vbA%40mail.gmail.com.
For more options, visit https://groups.google.com/d/optout.

jvm.dll crashes

2014-12-04 Thread Vinayak Bhosale

I have a test cluster with three nodes. Each node has 4 gigs of ram. What I 
see is, jvm on each nodes crashes once every day. Following is the event 
log message:

Faulting application name: elasticsearch-service-x64.exe, version: 
1.0.15.0, time stamp: 0x51543b9d
Faulting module name: jvm.dll, version: 21.0.0.17, time stamp: 0x4e08569d
Exception code: 0xc005
Fault offset: 0x001a6318
Faulting process id: 0xe8c
Faulting application start time: 0x01d00c6f04806f74
Faulting application path: 
G:\elasticsearch-1.1.0\bin\elasticsearch-service-x64.exe
Faulting module path: C:\Program Files\Java\jre7\bin\server\jvm.dll
Report Id: 9af4e958-7939-11e4-80c0-000d3a108017
Faulting package full name: 
Faulting package-relative application ID: 

Most of the settings (heap size etc) are defaults on all nodes. Can anyone 
please suggest a solution to this issue? We are on version 1.1. Is this 
issue resolved in later versions?


Thanks
Vinayak

-- 
You received this message because you are subscribed to the Google Groups 
"elasticsearch" group.
To unsubscribe from this group and stop receiving emails from it, send an email 
to elasticsearch+unsubscr...@googlegroups.com.
To view this discussion on the web visit 
https://groups.google.com/d/msgid/elasticsearch/5eef415f-49b2-4fda-bf7e-ab502405948b%40googlegroups.com.
For more options, visit https://groups.google.com/d/optout.

Re: Progress on Hive "Push Down Filtering"

2014-12-04 Thread Costin Leau

Sorry I missed the other thread; I'll respond here.

Yes, that's in the pipeline - see issue #276.

As you pointed out with push down this could potentially be done 
automatically...

Cheers,

P.S. Thanks for the kind words. If you encounter issues/bug or have 
suggestions, please keep the feedback coming.

On 12/4/14 12:06 PM, James Andrew-Smith wrote:

Hi Costin,

Thank you for the rapid response - just wanted to say I appreciate the Hadoop 
install works so easily just as advertised.

Shame about the push down filter but this is what I expected.

I'll focus on keeping the projection as lightweight as possible - on that note 
- I started another thread
(https://groups.google.com/forum/m/#!topic/elasticsearch/-3Lbdw5Wigg
) about 
using aggregation queries (which I am a
huge fan of) via Hive. Is this in possible/in the pipeline?

Cheers
James

On Thursday, 4 December 2014 20:04:17 UTC+11, Costin Leau wrote:

Hi,

There are two aspects when dealing with large tables.

1. Projection

The table mapping/definition is necessary as it indicates what information 
is needed - a small mapping excludes a
lot of
unnecessary data.

2. Push Down filtering

Unfortunately there hasn't been much happening on this front since the 
functionality is fairly restricted and not
really
pluggable especially when
dealing with non-HDFS resources. The ORC support has improved things a bit 
however it's still early days...

Cheers,

On 12/4/14 7:41 AM, James Andrew-Smith wrote:
> Has there been any progress on the "Push Down Filtering" mentioned by 
Costin?
> 
(http://ryrobes.com/systems/connecting-tableau-to-elasticsearch-read-how-to-query-elasticsearch-with-hive-sql-and-hadoop/#comment-1169375542

)

>
>
> Right now I am working around this by creating a lot of specific table 
mappings to maintain performance.
>
> --
> You received this message because you are subscribed to the Google Groups 
"elasticsearch" group.
> To unsubscribe from this group and stop receiving emails from it, send an 
email to
>elasticsearc...@googlegroups.com  
.
> To view this discussion on the web visit

>https://groups.google.com/d/msgid/elasticsearch/bd8151a0-af89-4617-9efe-aa738e70862c%40googlegroups.com

> 
>.

> For more options, visithttps://groups.google.com/d/optout 
.

--
Costin

--
You received this message because you are subscribed to the Google Groups 
"elasticsearch" group.
To unsubscribe from this group and stop receiving emails from it, send an email 
to
elasticsearch+unsubscr...@googlegroups.com 
.
To view this discussion on the web visit
https://groups.google.com/d/msgid/elasticsearch/612d65f5-90f9-4ac5-93ca-5cf8fe6fb03f%40googlegroups.com
.
For more options, visit https://groups.google.com/d/optout.

--
Costin

--
You received this message because you are subscribed to the Google Groups 
"elasticsearch" group.
To unsubscribe from this group and stop receiving emails from it, send an email 
to elasticsearch+unsubscr...@googlegroups.com.
To view this discussion on the web visit 
https://groups.google.com/d/msgid/elasticsearch/54805358.5020208%40gmail.com.
For more options, visit https://groups.google.com/d/optout.

term suggester : strange results.

2014-12-04 Thread DH

Hi, everyone,

I'm trying to figure put some discrepencies (I think) in the results of my 
suggesters, with ES V0.90.5.

My indices are big and can contain a wide array of language.
when I do this (NB : tomate is the french for tomato)
  ;

{
  "query": {
"match_all": {}
  },
  "suggest": {
"my_suggester": {
  "text": "tomate",   
  "term": {
"field": "my_field"
  }
}
  }
}

Es doesn't suggest "tomato". And the terms it suggest are rather 
low-scoring (it even suggests "tote", with a measly 0.5 score)

however, if I do this :

{
  "query": {
"match_all": {}
  },
  "suggest": {
"my_suggester": {
  "text": "tomato",   
  "term": {
"field": "my_field"
  }
}
  }
}

Es suggests "tomate", along with a buch of lower scoring terms. 

As far as I undestand, tomate is as close to tomato  as tomato is close to 
tomate, and thus ES should suggest tomato when I'm asking for tomate.
I'm positive that the two terms are present in the indices; and brought 
back by my request, so that would not be the issue.

So, I wonder ..
Is there something I did not understand regarding suggesters?
Is that behaviour normal?
Is it due to the older version of ES I'm still using?

If anyone using suggesters would help me make sense of this, that'd be 
helpful.
Thanks

-- 
You received this message because you are subscribed to the Google Groups 
"elasticsearch" group.
To unsubscribe from this group and stop receiving emails from it, send an email 
to elasticsearch+unsubscr...@googlegroups.com.
To view this discussion on the web visit 
https://groups.google.com/d/msgid/elasticsearch/d4beac46-8520-472d-9acc-fdaa02dd5d4e%40googlegroups.com.
For more options, visit https://groups.google.com/d/optout.

Re: How to Upgrade from 1.1.1 to 1.2.2 in a windows enviroment (as windows service)

2014-12-04 Thread Costin Leau

The version is used not just in the title and description but also in the
starting script (to setup the classpath and such).
This is done on purpose, to make sure the incorrect version is not loaded by
accident (path rename, reinstall, etc...).
You could use sym links but then you'd have to re-edit the startup scripts as
well. It's simpler to reinstall the service:

service remove

service install

On 12/3/14 7:26 PM, Steve Camire wrote:

Would setting up a symbolic link still necessitate re-installing the Windows
service after each upgrade? I noticed that
the service, when installed, contains version-specific information in places
such as the display name and description.

IE:
Elasticsearch 1.4.1 (node-01)
Elasticsearch 1.4.1 Windows Service - http://elasticsearch.org

Is there anything else (and possibly more important) that would make the
Windows service more version-specific and
necessitate re-installing it after every upgrade?

On Thursday, July 17, 2014 3:51:31 AM UTC-4, Costin Leau wrote:

Hi,

Remove the old service (service remove) then install it again using the new
path.
Going forward you might want to look into using file-system links (which
Windows Vista + supports) so that you can make
an alias
to the folder, install the service for it and reuse that across installs.
That is, install under c:\elasticsearch\current (which can point to 1.1.0,
then 1.1.1, 1.2.2, etc...) while your
service
points to
\current.
Of course, you need to check whether each version introduces some changes
into the init/stop script (does happen though
rarely)
and use that.

Cheers,
--
Costin

--
You received this message because you are subscribed to the Google Groups
"elasticsearch" group.
To unsubscribe from this group and stop receiving emails from it, send an email
to
elasticsearch+unsubscr...@googlegroups.com
.
To view this discussion on the web visit
https://groups.google.com/d/msgid/elasticsearch/8d94910a-6003-475e-bf9c-d670db61fa59%40googlegroups.com
.
For more options, visit https://groups.google.com/d/optout.

--
Costin

--
You received this message because you are subscribed to the Google Groups
"elasticsearch" group.
To unsubscribe from this group and stop receiving emails from it, send an email
to elasticsearch+unsubscr...@googlegroups.com.
To view this discussion on the web visit
https://groups.google.com/d/msgid/elasticsearch/5480525A.6070800%40gmail.com.
For more options, visit https://groups.google.com/d/optout.

Re: How to Upgrade from 1.1.1 to 1.2.2 in a windows enviroment (as windows service)

Use NSSM ( http://nssm.cc/ ) to create the service instead.

Organize your folders like this

C:\Elasticsearch\
C:\Elasticsearch\nssm.exe
C:\Elasticsearch\elasticsearch.bat
C:\Elasticsearch\elasticsearch-1.1.0\
C:\Elasticsearch\elasticsearch-1.2.2\
C:\Elasticsearch\data
C:\Elasticsearch\logs
C:\Elasticsearch\the other folders

in elasticsearch.bat only add the full path to the elasticsearch.bat that 
is included with each version, for example 
C:\Elasticsearch\elasticsearch-1.1.0\bin\elasticsearch.bat

Make sure your config has the folders C:\Elasticsearch\data defined

When upgrading, just add the new version in a folder, modify the bat in the 
root and restart the service.

-- 
You received this message because you are subscribed to the Google Groups 
"elasticsearch" group.
To unsubscribe from this group and stop receiving emails from it, send an email 
to elasticsearch+unsubscr...@googlegroups.com.
To view this discussion on the web visit 
https://groups.google.com/d/msgid/elasticsearch/02085eb9-68a5-4d9d-8538-3504822a10c1%40googlegroups.com.
For more options, visit https://groups.google.com/d/optout.

Re: Advices on migrating 1.3.2 to 1.4.1

I upgraded our logging cluster to 1.4 without any problems.

When I looked into upgrading a separate dev/test instance used for a 
different purpose I ran into problems with the plugins. If you are using 
plugins, make sure they are supported in 1.4.

-- 
You received this message because you are subscribed to the Google Groups 
"elasticsearch" group.
To unsubscribe from this group and stop receiving emails from it, send an email 
to elasticsearch+unsubscr...@googlegroups.com.
To view this discussion on the web visit 
https://groups.google.com/d/msgid/elasticsearch/1873d1cb-6f49-413d-8157-1220b64411e0%40googlegroups.com.
For more options, visit https://groups.google.com/d/optout.

Elasticsearch hardware planning

I am preparing proposals on hardware for our Elasticsearch log storage.

What I would love to have are SSD's for most recent logs or SSD's for hot
data. For that I have come down to two solutions with 3x physical servers.

1. Use Windows 2012 R2 as the OS, use Storage Spaces to prvide a tiered
storage for SSD's and HDD's to Elasticsearch.

2. Install ESXi. Create two VM's on each server, one that has access to SSD
disks and one who has access to HDD's. That will give me a 6 node
Elasticsearch cluster. Use tags to keep latest indices on the SSD nodes,
when they are x days old a script or curator will remove the SSD tag from
the index and add a HDD tag, hopefully resulting in a migration to the HDD
nodes. Either Windows 2012 R2 or Ubuntu will be used here.

Each Elasticsearch node will get either 32gig memory or 64gig memory.
Undecided on that at the moment, might go with the lower amount to have the
option of expanding it if there is need. With 192gigs in the cluster there
might not even be a need for SSD's.

Not sure about the number of cores.

Plan to skip raid on everything except the OS disks and use Elasticsearch
striping for the data. Total storage will be about 20gigs a day without
replication, and total storage will be about 15tb.

Any angles I'm missing?

--
You received this message because you are subscribed to the Google Groups
"elasticsearch" group.
To unsubscribe from this group and stop receiving emails from it, send an email
to elasticsearch+unsubscr...@googlegroups.com.
To view this discussion on the web visit
https://groups.google.com/d/msgid/elasticsearch/1e6bbc44-fc3b-4724-b657-63722a951d06%40googlegroups.com.
For more options, visit https://groups.google.com/d/optout.

Re: Good merge settings for interactively maintained index

2014-12-04 Thread Michael McCandless

25-40% is definitely "normal" for an index where many docs are being
replaced; I've seen this go up to ~65% before large merges bring it back
down.

On 2) there may be some improvements we can make to Lucene default
TieredMergePolicy here, to reclaim deletes for the "too large" segments ...
I'll have a look.

Mike McCandless

http://blog.mikemccandless.com

On Thu, Dec 4, 2014 at 4:06 AM, Michal Taborsky 
wrote:

> Hello Nikolas,
>
> we are facing similar behavior. Did you find out anything?
>
> Thank you,
> Michal
>
> Dne pondělí, 8. září 2014 22:55:12 UTC+2 Nikolas Everett napsal(a):
>
>> My indexes change somewhat frequently.  If I let leave the merge settings
>> as the default I end up with 25%-40% deleted documents (some indexes
>> higher, some lower).  I'm looking for some generic advice on:
>> 1.  Is that 25%-40% ok?
>> 2.  What kind of settings should I set to keep that in an acceptable
>> range?  For some meaning of acceptable.
>>
>> On (1) I'm pretty sure 25%-40% is OK for my low query traffic indexes -
>> no use optimizing them anyway.  But for my high search traffic indexes I
>> _think_ I see a performance improvement when I have lower (<5%) deleted
>> documents and fewer segments.  But computers are complicated and my
>> performance tests might just have been testing cache warming  Does this
>> conclusion match other's experience?
>>
>> On (2) I'm not really sure what to do.  It _looks_ _like_ Lucene isn't
>> picking up the bigger segments to merge the deletes out of them.  I assume
>> that is because they are bumping against the max allowed segment size and
>> therefor it can only merge one at a time so it always has something better
>> to do.  I'm not sure that is healthy though.  Some of those old segments
>> can get really bloated - like 40%-50% deleted.
>>
>> Thanks!
>>
>> Nik
>>
>  --
> You received this message because you are subscribed to the Google Groups
> "elasticsearch" group.
> To unsubscribe from this group and stop receiving emails from it, send an
> email to elasticsearch+unsubscr...@googlegroups.com.
> To view this discussion on the web visit
> https://groups.google.com/d/msgid/elasticsearch/faec06a2-c352-4e3e-bea0-41ace2b35d6f%40googlegroups.com
> 
> .
>
> For more options, visit https://groups.google.com/d/optout.
>

-- 
You received this message because you are subscribed to the Google Groups 
"elasticsearch" group.
To unsubscribe from this group and stop receiving emails from it, send an email 
to elasticsearch+unsubscr...@googlegroups.com.
To view this discussion on the web visit 
https://groups.google.com/d/msgid/elasticsearch/CAD7smRetuA9UDMMqPi9pZuGqUtdGGxrZM5ugP%2BVO3SVCUxTD6g%40mail.gmail.com.
For more options, visit https://groups.google.com/d/optout.

Re: Elasticsearch Maven plugin on GitHub

2014-12-04 Thread Chetan Padhye

Hi Good plugin . .  I tried to run it but it start and then stop once pom 
execution is finished.  How can we modify plugin to keep it running once 
started. My intention is to use this plugin for demo installations. so i 
can install elastic search node and start it on any machine for my demo. 



On Friday, 17 January 2014 06:14:22 UTC+5:30, AlexC wrote:
>
> If anyone is interested in using a Maven plugin to run Elasticsearch for 
> integration testing, I just published one on GitHub:
> https://github.com/alexcojocaru/elasticsearch-maven-plugin.
>
> It is an alternative to starting a node through the code.
>
> The readme should provide enough information, but let me know if something 
> is missing or not clear enough.
> It uses ES v0.90.7, but it can be easily updated to the latest ES version 
> by changing the dependency version in the pom.xml file.
>
> alex
>
>

-- 
You received this message because you are subscribed to the Google Groups 
"elasticsearch" group.
To unsubscribe from this group and stop receiving emails from it, send an email 
to elasticsearch+unsubscr...@googlegroups.com.
To view this discussion on the web visit 
https://groups.google.com/d/msgid/elasticsearch/a9188126-93df-4d43-aaf8-4c324ceb12a3%40googlegroups.com.
For more options, visit https://groups.google.com/d/optout.

Re: Progress on Hive "Push Down Filtering"

2014-12-04 Thread James Andrew-Smith

Hi Costin,

Thank you for the rapid response - just wanted to say I appreciate the 
Hadoop install works so easily just as advertised.

Shame about the push down filter but this is what I expected.

I'll focus on keeping the projection as lightweight as possible - on that 
note - I started another thread (
https://groups.google.com/forum/m/#!topic/elasticsearch/-3Lbdw5Wigg) about 
using aggregation queries (which I am a huge fan of) via Hive. Is this in 
possible/in the pipeline?

Cheers
James

On Thursday, 4 December 2014 20:04:17 UTC+11, Costin Leau wrote:
>
> Hi, 
>
> There are two aspects when dealing with large tables. 
>
> 1. Projection 
>
> The table mapping/definition is necessary as it indicates what information 
> is needed - a small mapping excludes a lot of 
> unnecessary data. 
>
> 2. Push Down filtering 
>
> Unfortunately there hasn't been much happening on this front since the 
> functionality is fairly restricted and not really 
> pluggable especially when 
> dealing with non-HDFS resources. The ORC support has improved things a bit 
> however it's still early days... 
>
> Cheers, 
>
>
> On 12/4/14 7:41 AM, James Andrew-Smith wrote: 
> > Has there been any progress on the "Push Down Filtering" mentioned by 
> Costin? 
> > (
> http://ryrobes.com/systems/connecting-tableau-to-elasticsearch-read-how-to-query-elasticsearch-with-hive-sql-and-hadoop/#comment-1169375542)
>  
>
> > 
> > 
> > Right now I am working around this by creating a lot of specific table 
> mappings to maintain performance. 
> > 
> > -- 
> > You received this message because you are subscribed to the Google 
> Groups "elasticsearch" group. 
> > To unsubscribe from this group and stop receiving emails from it, send 
> an email to 
> > elasticsearc...@googlegroups.com   elasticsearch+unsubscr...@googlegroups.com >. 
> > To view this discussion on the web visit 
> > 
> https://groups.google.com/d/msgid/elasticsearch/bd8151a0-af89-4617-9efe-aa738e70862c%40googlegroups.com
>  
> > <
> https://groups.google.com/d/msgid/elasticsearch/bd8151a0-af89-4617-9efe-aa738e70862c%40googlegroups.com?utm_medium=email&utm_source=footer>.
>  
>
> > For more options, visit https://groups.google.com/d/optout. 
>
> -- 
> Costin 
>

-- 
You received this message because you are subscribed to the Google Groups 
"elasticsearch" group.
To unsubscribe from this group and stop receiving emails from it, send an email 
to elasticsearch+unsubscr...@googlegroups.com.
To view this discussion on the web visit 
https://groups.google.com/d/msgid/elasticsearch/612d65f5-90f9-4ac5-93ca-5cf8fe6fb03f%40googlegroups.com.
For more options, visit https://groups.google.com/d/optout.

Re: This version of Kibana requires at least Elasticsearch 1.4.0.Beta1 but using 1.4.1

2014-12-04 Thread Stefan Meisner Larsen

I had the same problem when I accidentially joined in cluster with a 
collegue who was using an older version of elasticsearch.
Changed the cluster name and everything worked perfectly ;-)

/Stefan

Den torsdag den 4. december 2014 08.01.09 UTC+1 skrev David Montgomery:
>
>
> I added the below to elasticsearch.ymk config.  Still kibana provides the 
> same error
>
> http.cors.enabled: true
> http.cors.allow-origin: http://monitor-development-east.test.com:5601
>
> For those at elastiseaerch..can you provide me color on what my be going 
> on?  
>
> Thanks
>
>
>
>
>
>
>
> On Thursday, December 4, 2014 7:53:50 AM UTC+8, David Montgomery wrote:
>>
>> Hi,
>>
>>  have no agents as of yet.  Just logstash server with the below config
>>
>> input {
>>   redis {
>> host => "<%=@redis_host%>"
>> data_type => "list"
>> key => "logstash"
>> codec => json
>>   }
>> }
>>
>> output {
>> stdout { }
>> elasticsearch { 
>> host => "<%=node[:ipaddress]%>" 
>> }
>> }
>>
>> On Thursday, December 4, 2014 6:19:36 AM UTC+8, Jack Judge wrote:
>>>
>>> This tripped me up too.
>>>
>>> Are your logstash agents using the elasticsearch output ? If so, they'll 
>>> be running the embedded version of ES that comes with Logstash, and that's 
>>> the version that's stopping kibana from working. Basically *any* ES node 
>>> that connects to your cluster in any form must be 1.4.x
>>> My solution was to change the logstash output to elasticsearch_http
>>>
>>> JJ
>>>
>>

-- 
You received this message because you are subscribed to the Google Groups 
"elasticsearch" group.
To unsubscribe from this group and stop receiving emails from it, send an email 
to elasticsearch+unsubscr...@googlegroups.com.
To view this discussion on the web visit 
https://groups.google.com/d/msgid/elasticsearch/5dad34dc-fe0f-4ab4-86bb-a33571fc434c%40googlegroups.com.
For more options, visit https://groups.google.com/d/optout.

Re: Good merge settings for interactively maintained index

2014-12-04 Thread Michal Taborsky

Hello Nikolas,

we are facing similar behavior. Did you find out anything?

Thank you,
Michal

Dne pondělí, 8. září 2014 22:55:12 UTC+2 Nikolas Everett napsal(a):
>
> My indexes change somewhat frequently.  If I let leave the merge settings 
> as the default I end up with 25%-40% deleted documents (some indexes 
> higher, some lower).  I'm looking for some generic advice on:
> 1.  Is that 25%-40% ok?
> 2.  What kind of settings should I set to keep that in an acceptable 
> range?  For some meaning of acceptable.
>
> On (1) I'm pretty sure 25%-40% is OK for my low query traffic indexes - no 
> use optimizing them anyway.  But for my high search traffic indexes I 
> _think_ I see a performance improvement when I have lower (<5%) deleted 
> documents and fewer segments.  But computers are complicated and my 
> performance tests might just have been testing cache warming  Does this 
> conclusion match other's experience?
>
> On (2) I'm not really sure what to do.  It _looks_ _like_ Lucene isn't 
> picking up the bigger segments to merge the deletes out of them.  I assume 
> that is because they are bumping against the max allowed segment size and 
> therefor it can only merge one at a time so it always has something better 
> to do.  I'm not sure that is healthy though.  Some of those old segments 
> can get really bloated - like 40%-50% deleted.
>
> Thanks!
>
> Nik
>

-- 
You received this message because you are subscribed to the Google Groups 
"elasticsearch" group.
To unsubscribe from this group and stop receiving emails from it, send an email 
to elasticsearch+unsubscr...@googlegroups.com.
To view this discussion on the web visit 
https://groups.google.com/d/msgid/elasticsearch/faec06a2-c352-4e3e-bea0-41ace2b35d6f%40googlegroups.com.
For more options, visit https://groups.google.com/d/optout.

Re: Progress on Hive "Push Down Filtering"

2014-12-04 Thread Costin Leau

Hi,

There are two aspects when dealing with large tables.

1. Projection

The table mapping/definition is necessary as it indicates what information is needed - a small mapping excludes a lot of
unnecessary data.

2. Push Down filtering

Unfortunately there hasn't been much happening on this front since the functionality is fairly restricted and not really
pluggable especially when

dealing with non-HDFS resources. The ORC support has improved things a bit
however it's still early days...

Cheers,

On 12/4/14 7:41 AM, James Andrew-Smith wrote:

Has there been any progress on the "Push Down Filtering" mentioned by Costin?
(http://ryrobes.com/systems/connecting-tableau-to-elasticsearch-read-how-to-query-elasticsearch-with-hive-sql-and-hadoop/#comment-1169375542)

Right now I am working around this by creating a lot of specific table mappings
to maintain performance.

--
You received this message because you are subscribed to the Google Groups
"elasticsearch" group.
To unsubscribe from this group and stop receiving emails from it, send an email
to
elasticsearch+unsubscr...@googlegroups.com
.
To view this discussion on the web visit
https://groups.google.com/d/msgid/elasticsearch/bd8151a0-af89-4617-9efe-aa738e70862c%40googlegroups.com
.
For more options, visit https://groups.google.com/d/optout.

--
Costin

--
You received this message because you are subscribed to the Google Groups
"elasticsearch" group.
To unsubscribe from this group and stop receiving emails from it, send an email
to elasticsearch+unsubscr...@googlegroups.com.
To view this discussion on the web visit
https://groups.google.com/d/msgid/elasticsearch/54802382.5060704%40gmail.com.
For more options, visit https://groups.google.com/d/optout.

Re: URI search not behaving the same as in Query DSL

You have to enable "analyze_wildcard: true"

Jörg

On Wed, Dec 3, 2014 at 4:58 PM, drjz  wrote:

> Hi all,
>
> I am testing using wildcards in field names. I have the following URI
> search:
>
> _search?q=p.\*:pair&explain
>
> It returns me results (in the browser).
>
> However, when I express this query in Query DSL like this, then no results
> get returned:
>
> {
> "query" : {
> "query_string" : {
> "fields" : ["p.*"],
> "query" : "pair"
> }
> }
> }
>
> Is this a bug or do I have the wrong translation of the query in Query DSL?
>
> Thanks!
>
> /JZ
>
>
>
>
>  --
> You received this message because you are subscribed to the Google Groups
> "elasticsearch" group.
> To unsubscribe from this group and stop receiving emails from it, send an
> email to elasticsearch+unsubscr...@googlegroups.com.
> To view this discussion on the web visit
> https://groups.google.com/d/msgid/elasticsearch/d0236cbc-17e2-4f14-83b7-da5941351d48%40googlegroups.com
> 
> .
> For more options, visit https://groups.google.com/d/optout.
>

-- 
You received this message because you are subscribed to the Google Groups 
"elasticsearch" group.
To unsubscribe from this group and stop receiving emails from it, send an email 
to elasticsearch+unsubscr...@googlegroups.com.
To view this discussion on the web visit 
https://groups.google.com/d/msgid/elasticsearch/CAKdsXoHhFgbyqH9grwoa%3DyST5NS%3DTa208Tg8kSf_zJ8sYYtU8w%40mail.gmail.com.
For more options, visit https://groups.google.com/d/optout.

Re: Scan and Scroll - handling failure

Yes, if you get an error while scan/scroll is active, you have to close the
procedure and restart from the beginning.

Not sure what you mean by an "extended period of time" but you can surely
keep the cursor open for some minutes without too much impact.

Jörg

On Wed, Dec 3, 2014 at 12:30 AM, Barrett Kern  wrote:

> Hello,
>
> I have a very large data set spread over multiple indexes that I want to
> basically grab each record/transform into another index. Reading the docs
> points me towards scan & scroll and then some bulk indexing. What concerns
> me is failure during this copy it seems there is no way to 'resume' this
> job if it fails in the middle. T Based off some initial tests this copy
> will take a long time to run and I wonder if I have overlooked some options
> or I am not thinking of something. The only thing I can think of is
> persisting scroll id and keeping them open for an extended period of time
> but the downside being this will have strong negative impact on ES memory.
>
> Thanks,
> Barry
>
> --
> You received this message because you are subscribed to the Google Groups
> "elasticsearch" group.
> To unsubscribe from this group and stop receiving emails from it, send an
> email to elasticsearch+unsubscr...@googlegroups.com.
> To view this discussion on the web visit
> https://groups.google.com/d/msgid/elasticsearch/4ce039ea-e44b-4ab6-acb0-ed7e4dfea35e%40googlegroups.com
> 
> .
> For more options, visit https://groups.google.com/d/optout.
>

-- 
You received this message because you are subscribed to the Google Groups 
"elasticsearch" group.
To unsubscribe from this group and stop receiving emails from it, send an email 
to elasticsearch+unsubscr...@googlegroups.com.
To view this discussion on the web visit 
https://groups.google.com/d/msgid/elasticsearch/CAKdsXoH0w_xmS%2BbB1bFegOBvcqXv3euwobr-gDhZi_yMaVZi7A%40mail.gmail.com.
For more options, visit https://groups.google.com/d/optout.

Re: Elasticsearch snapshots throttle problems

2014-12-04 Thread Johan Öhr

I noticed these warnings on some of my nodes while executing the snapshot, 
maybe it has to do something with why its so slow.

[2014-12-03 15:57:35,699][WARN ][snapshots] [xx06] 
[[xxx-2014-11-20][7]] [my_backup:snapshot_test] failed to create snapshot

org.elasticsearch.index.snapshots.IndexShardSnapshotFailedException: 
[xxx-2014-11-20][7] Aborted

at 
org.elasticsearch.index.snapshots.blobstore.BlobStoreIndexShardRepository$SnapshotContext$AbortableInputStream.checkAborted(BlobStoreIndexShardRepository.java:632)

at 
org.elasticsearch.index.snapshots.blobstore.BlobStoreIndexShardRepository$SnapshotContext$AbortableInputStream.read(BlobStoreIndexShardRepository.java:625)

at java.io.FilterInputStream.read(FilterInputStream.java:107)

at 
org.elasticsearch.index.snapshots.blobstore.BlobStoreIndexShardRepository$SnapshotContext.snapshotFile(BlobStoreIndexShardRepository.java:557)

at 
org.elasticsearch.index.snapshots.blobstore.BlobStoreIndexShardRepository$SnapshotContext.snapshot(BlobStoreIndexShardRepository.java:501)

at 
org.elasticsearch.index.snapshots.blobstore.BlobStoreIndexShardRepository.snapshot(BlobStoreIndexShardRepository.java:139)

at 
org.elasticsearch.index.snapshots.IndexShardSnapshotAndRestoreService.snapshot(IndexShardSnapshotAndRestoreService.java:86)

at 
org.elasticsearch.snapshots.SnapshotsService$5.run(SnapshotsService.java:818)

at 
java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)

at 
java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)

at java.lang.Thread.run(Thread.java:745

Den onsdagen den 3:e december 2014 kl. 10:25:38 UTC+1 skrev Johan Öhr:
>
> Hi, 
>
> I have 12 elasticsearch nodes, with 10gb eth 
>
> Ive been having alot of problem with the performance of snapshots, its 
> throttles to 20 mb/s even tho i set max_snapshot_bytes_per_sec to something 
> else, ive tried to set it in bytes, in megabytes (500m, 500mb) 
>
> Ive tried to move 100gb file from elastic-node to my nfs-server, its about 
> 1gb/s 
> Ive tried to move 100gb file from elastic-node down to my share, its about 
> 1gb/s 
> Ive tried to just cp -rp my index from elastic-node to my share, its about 
> 1gb/s 
>
> Am i missing something here? How is the max_snapshot_bytes_per_sec suppose 
> to look like? 
> Are there any other settings (like recovery streams etc) that affects 
> this? 
>
> My backup dir: 
> curl -XPUT 'http://localhost:9200/_snapshot/my_backup' -d '{ 
> "type": "fs", 
> "settings": { 
>"location": "/misc/backup_elastic/snapshot", 
>"compress": true, 
>"verify": true, 
>"max_snapshot_bytes_per_sec" : "1048576000", 
>"max_restore_bytes_per_sec" : "1048576000" 
> } 
> }' 
>
> And snapshot: 
>  curl -XPUT "localhost:9200/_snapshot/my_backup/snapshot_test" -d '{ 
> "indices": "index-2014-01-03", 
> "ignore_unavailable": "true", 
> "include_global_state": "true", 
> "partial": "true" 
> }' 
>
> Regards, Johan
>

-- 
You received this message because you are subscribed to the Google Groups 
"elasticsearch" group.
To unsubscribe from this group and stop receiving emails from it, send an email 
to elasticsearch+unsubscr...@googlegroups.com.
To view this discussion on the web visit 
https://groups.google.com/d/msgid/elasticsearch/13857ee7-5e36-45a5-a2b5-3ffa66e8fa21%40googlegroups.com.
For more options, visit https://groups.google.com/d/optout.

Re: Term query not working in all indices