date:20140912

Hello Jigish ,

I dont think you can achieve all of these in Elasticsearch.
You can restrict the HTTP methods to GET and POST in Elasticsearch.
But for most of other tasks , Nginx would be a better option.
Elasticsearch jetty plugin might also help you -
https://github.com/sonian/elasticsearch-jetty

Thanks
   Vineeth

On Sat, Sep 13, 2014 at 9:03 AM, jigish thakar 
wrote:

> We are using elasticsearch as back-end for our in-house logging and
> monitoring system. We have multiple sites pouring in data to one ES cluster
> but in different index. e.g. abc-us has data from US site, abc-india has it
> from India site.
> Now concerns are we need some security checks before pushing in data to
> cluster.
>
>1. data coming to index is coming from right IP address
>2. incoming json request is of inserting new data and not delete/update
>3. while reading we want certain IP should not be able to read data of
>other index.
>
> Kindly let me know if its possible to achieve using elasticsearch.
>
> --
> You received this message because you are subscribed to the Google Groups
> "elasticsearch" group.
> To unsubscribe from this group and stop receiving emails from it, send an
> email to elasticsearch+unsubscr...@googlegroups.com.
> To view this discussion on the web visit
> https://groups.google.com/d/msgid/elasticsearch/b1ec03df-245a-4705-92ef-8c26002a7f82%40googlegroups.com
> 
> .
> For more options, visit https://groups.google.com/d/optout.
>

-- 
You received this message because you are subscribed to the Google Groups 
"elasticsearch" group.
To unsubscribe from this group and stop receiving emails from it, send an email 
to elasticsearch+unsubscr...@googlegroups.com.
To view this discussion on the web visit 
https://groups.google.com/d/msgid/elasticsearch/CAGdPd5kjhd2C6Jrgy6RmRjsW_-C-rN85yKBiYbnCyAGdb8h3Ag%40mail.gmail.com.
For more options, visit https://groups.google.com/d/optout.

Elasticsearch: security concerns

2014-09-12 Thread jigish thakar

We are using elasticsearch as back-end for our in-house logging and 
monitoring system. We have multiple sites pouring in data to one ES cluster 
but in different index. e.g. abc-us has data from US site, abc-india has it 
from India site.
Now concerns are we need some security checks before pushing in data to 
cluster.

   1. data coming to index is coming from right IP address
   2. incoming json request is of inserting new data and not delete/update
   3. while reading we want certain IP should not be able to read data of 
   other index.
   
Kindly let me know if its possible to achieve using elasticsearch.

-- 
You received this message because you are subscribed to the Google Groups 
"elasticsearch" group.
To unsubscribe from this group and stop receiving emails from it, send an email 
to elasticsearch+unsubscr...@googlegroups.com.
To view this discussion on the web visit 
https://groups.google.com/d/msgid/elasticsearch/b1ec03df-245a-4705-92ef-8c26002a7f82%40googlegroups.com.
For more options, visit https://groups.google.com/d/optout.

Re: Elasticsearch.net client, endpoint strategy?

Hello Lasse ,

Following is my idea on the whole thing -

Routing -
http://www.elasticsearch.org/guide/en/elasticsearch/reference/current/docs-index_.html#index-routing
When a index request comes , based on the ID of the request , a hash
function computes the shard to which the request has to be routed.
Hence we are achieving a load balancing based on this procedure. It has to
be also noted that the routing key can be controlled.

Sniffing - While creating a client , you can use the sniffing feature for
elasticsearch to determine all the nodes and use them all in a load
balanced way.

Thanks
   Vineeth

On Fri, Sep 12, 2014 at 2:39 PM, Lasse Schou  wrote:

> Hi,
>
> Not sure if this is the right user group, but here goes:
>
> I'm planning to use ElasticSearch.net as the client for connecting to my
> ES cluster. I have one question I haven't been able to find the answer to.
> I know that the ConnectionPool feature can check if nodes fail, but can the
> client also ensure that data is written to the right shard, or does it
> simply use round robin to connect?
>
> Example:
>
> - A document "1234" is created
> - Based on the current number of shards, the document should be put on
> node 6 and a replica on node 11.
> - Is the request sent directly to node 6, or is it sent to a random node
> which will then forward the request to the right servers?
>
> Thanks,
> Lasse
>
> --
> You received this message because you are subscribed to the Google Groups
> "elasticsearch" group.
> To unsubscribe from this group and stop receiving emails from it, send an
> email to elasticsearch+unsubscr...@googlegroups.com.
> To view this discussion on the web visit
> https://groups.google.com/d/msgid/elasticsearch/52209da0-ad03-470e-b8d4-f4fd283dda4e%40googlegroups.com
> 
> .
> For more options, visit https://groups.google.com/d/optout.
>

-- 
You received this message because you are subscribed to the Google Groups 
"elasticsearch" group.
To unsubscribe from this group and stop receiving emails from it, send an email 
to elasticsearch+unsubscr...@googlegroups.com.
To view this discussion on the web visit 
https://groups.google.com/d/msgid/elasticsearch/CAGdPd5%3D_-%3DcCHMJ%3DWEZQ-6FJx_xy%3DHV13UzHwj7M%2BEwpGi%2B1cA%40mail.gmail.com.
For more options, visit https://groups.google.com/d/optout.

Re: Purge the deleted documents on disk

Hello Wei ,

You can use the in optimize  API - max_num_segments as 1 or
 only_expunge_deletes  as true .
OPTIMIZE -
http://www.elasticsearch.org/guide/en/elasticsearch/reference/current/indices-optimize.html#indices-optimize

Thanks
 Vineeth

On Sat, Sep 13, 2014 at 5:32 AM, Wei  wrote:

> Hi all,
>
> If there's any api to clear all the deleted documents on disk?
> I read that
> Deleting a document doesn’t immediately remove the document from disk — it
> just marks it as deleted.
> Elasticsearch will clean up deleted documents in the background as you
> continue to index more data.
>
> By request the index stats, it shows
> "primaries": {
>
>- "docs": {
>   - "count": 3268352,
>   - "deleted": 71249
>} }
>
> I run refresh, or optimize on my index, But it only cleaned a small number
> of deleted documents from disk. Is there any way I can clean all the
> deleted docs? to reduce this number to 0?
>
> Thanks
> Wei
>
> --
> You received this message because you are subscribed to the Google Groups
> "elasticsearch" group.
> To unsubscribe from this group and stop receiving emails from it, send an
> email to elasticsearch+unsubscr...@googlegroups.com.
> To view this discussion on the web visit
> https://groups.google.com/d/msgid/elasticsearch/adc8877e-dbc4-4751-bf4b-05f346e6c497%40googlegroups.com
> 
> .
> For more options, visit https://groups.google.com/d/optout.
>

-- 
You received this message because you are subscribed to the Google Groups 
"elasticsearch" group.
To unsubscribe from this group and stop receiving emails from it, send an email 
to elasticsearch+unsubscr...@googlegroups.com.
To view this discussion on the web visit 
https://groups.google.com/d/msgid/elasticsearch/CAGdPd5kqh9JRyE77K1%3DgT3U0F7L1S9ERTCCPpz0obKH4Eqa5vw%40mail.gmail.com.
For more options, visit https://groups.google.com/d/optout.

Re: Do I need the JDBC driver

2014-09-12 Thread Ivan Brusic

Elasticsearch is no different than any other data store: your application
can add data by using the prescribed methods. Every data store has some
sort of data input method. Elasticsearch allows river plugins, which mean
that the Elasticsearch process can pull data instead of the standard push
model. The pull model is usually employed when two data sources should be
in sync (CouchDB, RDBMS).

I would stick to the standard push model. Have your client application
index data via the PHP library.

Cheers,

Ivan

On Fri, Sep 12, 2014 at 10:54 AM, Employ  wrote:

> I must admit I'm new to this so I find some of the information hard to
> understand. So sorry if I am asking stupid questions.
>
> On 12 Sep 2014, at 18:26, Ivan Brusic  wrote:
>
> I would strongly prefer to maintain control of the indexing side and not
> in Elasticsearch. In fact, the Elasticsearch team has talked about
> deprecating river plugins. I do not have any numbers, but I would suspect
> that the majority of users do not use a river plugin. And yes, the correct
> term is the JDBC plugin, not driver. The wrong term confused many. :)
>
> --
> Ivan
>
> On Fri, Sep 12, 2014 at 3:24 AM, joergpra...@gmail.com <
> joergpra...@gmail.com> wrote:
>
>> You can use either style, it is a matter of taste, or convenience.
>>
>> With the JDBC plugin, you can also push data instead of pull.
>>
>> Jörg
>>
>> On Fri, Sep 12, 2014 at 12:11 PM, James  wrote:
>>
>>> I want to close this issue but I still do not understand if I should be
>>> pushing documents from my database using the PHP client or using the JDBC
>>> river to pull them into elasticsearch from the SQL database.
>>>
>>> They can both achieve the same thing, but what is the usecase which
>>> defines when is the right time to use each implementation.
>>>
>>>
>>>
>>> On Wednesday, September 10, 2014 10:59:18 AM UTC+1, James wrote:

 Hi,

 I'm setting up a system where I have a main SQL database which is
 synced with elasticsearch. My plan is to use the main PHP library for
 elasticsearch.

 I was going to have a cron run every thirty minuets to check for items
 in my database that not only have an "active" flag but that also do not
 have an "indexed" flag, that means I need to add them to the index. Then I
 was going to add that item to the index. Since I am using taking this path,
 it doesn't seem like I need the JDBC driver, as I can add items to
 elasticsearch using the PHP library.

 So, my question is, can I get away without using the JDBC driver?

 James


  --
>>> You received this message because you are subscribed to the Google
>>> Groups "elasticsearch" group.
>>> To unsubscribe from this group and stop receiving emails from it, send
>>> an email to elasticsearch+unsubscr...@googlegroups.com.
>>> To view this discussion on the web visit
>>> https://groups.google.com/d/msgid/elasticsearch/6c244e00-1f89-447d-8eb5-114f0b5efcbd%40googlegroups.com
>>> 
>>> .
>>>
>>> For more options, visit https://groups.google.com/d/optout.
>>>
>>
>>  --
>> You received this message because you are subscribed to the Google Groups
>> "elasticsearch" group.
>> To unsubscribe from this group and stop receiving emails from it, send an
>> email to elasticsearch+unsubscr...@googlegroups.com.
>> To view this discussion on the web visit
>> https://groups.google.com/d/msgid/elasticsearch/CAKdsXoHT2DcMHJwMjxBZ0RsV4_eKJyB2KjBCiqB2ZTac8fzkTg%40mail.gmail.com
>> 
>> .
>>
>> For more options, visit https://groups.google.com/d/optout.
>>
>
>  --
> You received this message because you are subscribed to a topic in the
> Google Groups "elasticsearch" group.
> To unsubscribe from this topic, visit
> https://groups.google.com/d/topic/elasticsearch/0dzSMbARlks/unsubscribe.
> To unsubscribe from this group and all its topics, send an email to
> elasticsearch+unsubscr...@googlegroups.com.
> To view this discussion on the web visit
> https://groups.google.com/d/msgid/elasticsearch/CALY%3DcQBvDga8q--Au8yWaX2RMGgcDTpYMhLu243tB9w7z0W0_A%40mail.gmail.com
> 
> .
> For more options, visit https://groups.google.com/d/optout.
>
>  --
> You received this message because you are subscribed to the Google Groups
> "elasticsearch" group.
> To unsubscribe from this group and stop receiving emails from it, send an
> email to elasticsearch+unsubscr...@googlegroups.com.
> To view this discussion on the web visit
> https://groups.google.com/d/msgid/elasticsearch/9EBFE31B-8DEA-4E05-8342-9E0013BC450B%40employ.com
>

Purge the deleted documents on disk

2014-09-12 Thread Wei

Hi all,

If there's any api to clear all the deleted documents on disk?
I read that 
Deleting a document doesn’t immediately remove the document from disk — it 
just marks it as deleted. 
Elasticsearch will clean up deleted documents in the background as you 
continue to index more data.

By request the index stats, it shows 
"primaries": {
   
   - "docs": {
  - "count": 3268352,
  - "deleted": 71249
   } }

I run refresh, or optimize on my index, But it only cleaned a small number 
of deleted documents from disk. Is there any way I can clean all the 
deleted docs? to reduce this number to 0?

Thanks 
Wei

-- 
You received this message because you are subscribed to the Google Groups 
"elasticsearch" group.
To unsubscribe from this group and stop receiving emails from it, send an email 
to elasticsearch+unsubscr...@googlegroups.com.
To view this discussion on the web visit 
https://groups.google.com/d/msgid/elasticsearch/adc8877e-dbc4-4751-bf4b-05f346e6c497%40googlegroups.com.
For more options, visit https://groups.google.com/d/optout.

Re: High CPU on random idle node

2014-09-12 Thread Justin Lintz

Thanks, it ended up being the Ganglia plugin that was causing crazy CPU 
consumption.  I've disabled it since we'll end up buying Marvel once we've 
fully deployed.  

On Friday, September 12, 2014 6:36:27 PM UTC-4, Jörg Prante wrote:
>
> Looks like you have a monitoring tool running and it got stuck in the node 
> stats call while traversing a number of shards/segments.
>
> How many shards/segments are in your migration? It seems to be very active.
>
> Maybe the bloom filter format conversion is expensive but I am not sure.
>
> Jörg
>
> On Sat, Sep 13, 2014 at 12:10 AM, Justin Lintz  > wrote:
>
>> Hi,
>>
>> We're deploying ES for logstash and I recently setup the cluster to 
>> migrate older indexes to "slower" nodes for longer term storage.  The nodes 
>> are not being queried or doing any indexing but CPU is at 75% constantly. 
>>  Here is output from hot_threads , jstack and some settings
>>
>> $ java -version
>> java version "1.7.0_65"
>> OpenJDK Runtime Environment (IcedTea 2.5.1) 
>> (7u65-2.5.1-4ubuntu1~0.12.04.2)
>> OpenJDK 64-Bit Server VM (build 24.65-b04, mixed mode)
>>
>> hot threads: https://gist.github.com/jlintz/3d965940284f1a7acf1e
>> settings: https://gist.github.com/jlintz/c701496a3db26ff0a20e
>> jstack: https://gist.github.com/jlintz/35924149197850e52931
>>
>> I just realized while posting this my index buffers are set too high for 
>> these nodes since they arent doing indexing, but just in case that's not 
>> the issue, I'll post anyway.  I'll report back if issue is still present
>>
>> -- 
>> You received this message because you are subscribed to the Google Groups 
>> "elasticsearch" group.
>> To unsubscribe from this group and stop receiving emails from it, send an 
>> email to elasticsearc...@googlegroups.com .
>> To view this discussion on the web visit 
>> https://groups.google.com/d/msgid/elasticsearch/6b987879-3997-4c5f-9753-14cab0dae545%40googlegroups.com
>>  
>> 
>> .
>> For more options, visit https://groups.google.com/d/optout.
>>
>
>

-- 
You received this message because you are subscribed to the Google Groups 
"elasticsearch" group.
To unsubscribe from this group and stop receiving emails from it, send an email 
to elasticsearch+unsubscr...@googlegroups.com.
To view this discussion on the web visit 
https://groups.google.com/d/msgid/elasticsearch/463d5e56-57da-4680-ab39-65dc484daa11%40googlegroups.com.
For more options, visit https://groups.google.com/d/optout.

Re: Seeking opinions on cluster platforms

Not sure what is "extreme". The design of ES may be a surprise for those
who are not familiar with distributed system architecture.

ES can handle faults in software. I pile up cheap 1U rack servers with 32
cores, 64G RAM, ~1TB RAID 0. All nodes are equally provisioned.

If a server fails, mostly spindle drives or fans, it is decommissioned and
repaired.

No need to monitor for master failure or making backups. Master is switched
over automatically by ES, and replica level 1 (or higher) is a must.

Jörg


On Sat, Sep 13, 2014 at 12:37 AM, Mark Walkom 
wrote:

> Personally, I'd go with the latter and then let the software handle all
> the redundancy. You can get super cheap 1RU pizza boxes from Quanta or the
> like and save yourself a bundle in that area and then leverage automation
> and configuration using The Foreman and Puppet.
>
> Tie a bit more smarts into it and you would have an awesome elastic
> compute platform. Or just use something like OpenStack, though it might be
> a bit heavy.
>
> Regards,
> Mark Walkom
>
> Infrastructure Engineer
> Campaign Monitor
> email: ma...@campaignmonitor.com
> web: www.campaignmonitor.com
>
>
> On 13 September 2014 03:16, Jack Park  wrote:
>
>> Let me pose a question by suggesting two extremes for hardware to create
>> and maintain a growing ElasticSearch cluster datacenter (not in the cloud).
>>
>> One extreme places redundancy at the server hardware level, by which I
>> mean:
>> dual power supplies, RAID hard drives
>>
>> Another extreme places redundancy in a multitude of backup servers:
>> commodity servers, single power supply, no RAID on the disks, low cost,
>> with a cluster monitor that can advise of a failed master or backup, and
>> can rebuild the replacement
>>
>> I would love to learn how others see or implement within the boundaries
>> of those extremes, with the understanding that the two poles are just
>> suggestions, there may be other ways to slice this space.
>>
>> Many thanks in advance
>> Jack
>> ps: documents I read based on a broad query:
>> https://github.com/aphyr/partitions-post
>> http://www.elasticsearch.org/case-study/maptimize/
>>
>> http://www.elasticsearch.org/guide/en/elasticsearch/reference/current/modules-gateway.html
>>
>> http://www.slideshare.net/clintongormley/scaling-realtime-search-and-analytics-with-elasticsearch
>>
>> --
>> You received this message because you are subscribed to the Google Groups
>> "elasticsearch" group.
>> To unsubscribe from this group and stop receiving emails from it, send an
>> email to elasticsearch+unsubscr...@googlegroups.com.
>> To view this discussion on the web visit
>> https://groups.google.com/d/msgid/elasticsearch/CAH6s0fyHDhB9N8rOCwuf%2B3GR1E8xQ4aqSoQD8cYKZwo72bHw7A%40mail.gmail.com
>> 
>> .
>> For more options, visit https://groups.google.com/d/optout.
>>
>
>  --
> You received this message because you are subscribed to the Google Groups
> "elasticsearch" group.
> To unsubscribe from this group and stop receiving emails from it, send an
> email to elasticsearch+unsubscr...@googlegroups.com.
> To view this discussion on the web visit
> https://groups.google.com/d/msgid/elasticsearch/CAEM624bho%2BcJStsAAMvx5ZMApNEqCSz3a4oEofrU7VfEeuVX%2Bg%40mail.gmail.com
> 
> .
>
> For more options, visit https://groups.google.com/d/optout.
>

-- 
You received this message because you are subscribed to the Google Groups 
"elasticsearch" group.
To unsubscribe from this group and stop receiving emails from it, send an email 
to elasticsearch+unsubscr...@googlegroups.com.
To view this discussion on the web visit 
https://groups.google.com/d/msgid/elasticsearch/CAKdsXoFuXTUVQrhKpueJWKPT2ws4CtXm6mB-4rGLB-0DFkC37w%40mail.gmail.com.
For more options, visit https://groups.google.com/d/optout.

Re: Cluster allocation awareness - opposite

It's a little unclear what you are doing.

You indicate you have a single cluster but then no replication happens
between the nodes in each DC?
By gateway do you mean a tribe node?
And by replicate the local indexes do you want to set replicas = 1 and then
have them on this new node in each DC?

Regards,
Mark Walkom

Infrastructure Engineer
Campaign Monitor
email: ma...@campaignmonitor.com
web: www.campaignmonitor.com


On 12 September 2014 22:40, spezam  wrote:

> Hi All,
> we currently have an Elasticsearch (1.1.1) cluster distributed among DCs
>
> * 3 Data nodes in 3 DC
> * 1 Gateway node
>
> each DC index its own data and no shards replica is happening between DC.
> The gateway lets us to query all the indexes in all the DC.
>
> Now, for performance and redundancy we would like to add one data node per
> DC and replicate the 'local' indexes.
>
> We read about cluster allocation awareness, and thought that was the
> perfect solution, until we realize it actually acts opposite of what we
> want.
> If node members are in the same awareness attribute, they would just
> 'ignore' each others and don't spread shards or replicate.
>
> Is such a deployment possible/does it makes sense?
>
> Thanks a million,
> Matteo
>
> --
> You received this message because you are subscribed to the Google Groups
> "elasticsearch" group.
> To unsubscribe from this group and stop receiving emails from it, send an
> email to elasticsearch+unsubscr...@googlegroups.com.
> To view this discussion on the web visit
> https://groups.google.com/d/msgid/elasticsearch/3cff8e6e-9b3b-432c-baba-03dd44b3862c%40googlegroups.com
> 
> .
> For more options, visit https://groups.google.com/d/optout.
>

-- 
You received this message because you are subscribed to the Google Groups 
"elasticsearch" group.
To unsubscribe from this group and stop receiving emails from it, send an email 
to elasticsearch+unsubscr...@googlegroups.com.
To view this discussion on the web visit 
https://groups.google.com/d/msgid/elasticsearch/CAEM624Y-3rjNgomPDz24mCKRb0c5Fo6Hd8KKwRaoo8M%2BL%2BWPsQ%40mail.gmail.com.
For more options, visit https://groups.google.com/d/optout.

Re: Marvel enhancement

I'd add it here - https://github.com/elasticsearch/elasticsearch/issues

Regards,
Mark Walkom

Infrastructure Engineer
Campaign Monitor
email: ma...@campaignmonitor.com
web: www.campaignmonitor.com


On 13 September 2014 01:42, Andrew Ochsner  wrote:

> Hi:
>
> Is there a place to submit suggestions for Marvel enhancements?  It would
> be nice if the Cluster Name was included in the Browser Title...
>
> Thanks
> Andy O
>
> --
> You received this message because you are subscribed to the Google Groups
> "elasticsearch" group.
> To unsubscribe from this group and stop receiving emails from it, send an
> email to elasticsearch+unsubscr...@googlegroups.com.
> To view this discussion on the web visit
> https://groups.google.com/d/msgid/elasticsearch/94b7eb13-53c6-4527-b8ba-eadfc224b9cf%40googlegroups.com
> 
> .
> For more options, visit https://groups.google.com/d/optout.
>

-- 
You received this message because you are subscribed to the Google Groups 
"elasticsearch" group.
To unsubscribe from this group and stop receiving emails from it, send an email 
to elasticsearch+unsubscr...@googlegroups.com.
To view this discussion on the web visit 
https://groups.google.com/d/msgid/elasticsearch/CAEM624a0WTPMsjNWMDQCBFkMDGFV-UaBR9%3DA5gzdtXWojpyTXA%40mail.gmail.com.
For more options, visit https://groups.google.com/d/optout.

Re: Seeking opinions on cluster platforms

Personally, I'd go with the latter and then let the software handle all the
redundancy. You can get super cheap 1RU pizza boxes from Quanta or the like
and save yourself a bundle in that area and then leverage automation and
configuration using The Foreman and Puppet.

Tie a bit more smarts into it and you would have an awesome elastic compute
platform. Or just use something like OpenStack, though it might be a bit
heavy.

Regards,
Mark Walkom

Infrastructure Engineer
Campaign Monitor
email: ma...@campaignmonitor.com
web: www.campaignmonitor.com


On 13 September 2014 03:16, Jack Park  wrote:

> Let me pose a question by suggesting two extremes for hardware to create
> and maintain a growing ElasticSearch cluster datacenter (not in the cloud).
>
> One extreme places redundancy at the server hardware level, by which I
> mean:
> dual power supplies, RAID hard drives
>
> Another extreme places redundancy in a multitude of backup servers:
> commodity servers, single power supply, no RAID on the disks, low cost,
> with a cluster monitor that can advise of a failed master or backup, and
> can rebuild the replacement
>
> I would love to learn how others see or implement within the boundaries of
> those extremes, with the understanding that the two poles are just
> suggestions, there may be other ways to slice this space.
>
> Many thanks in advance
> Jack
> ps: documents I read based on a broad query:
> https://github.com/aphyr/partitions-post
> http://www.elasticsearch.org/case-study/maptimize/
>
> http://www.elasticsearch.org/guide/en/elasticsearch/reference/current/modules-gateway.html
>
> http://www.slideshare.net/clintongormley/scaling-realtime-search-and-analytics-with-elasticsearch
>
> --
> You received this message because you are subscribed to the Google Groups
> "elasticsearch" group.
> To unsubscribe from this group and stop receiving emails from it, send an
> email to elasticsearch+unsubscr...@googlegroups.com.
> To view this discussion on the web visit
> https://groups.google.com/d/msgid/elasticsearch/CAH6s0fyHDhB9N8rOCwuf%2B3GR1E8xQ4aqSoQD8cYKZwo72bHw7A%40mail.gmail.com
> 
> .
> For more options, visit https://groups.google.com/d/optout.
>

-- 
You received this message because you are subscribed to the Google Groups 
"elasticsearch" group.
To unsubscribe from this group and stop receiving emails from it, send an email 
to elasticsearch+unsubscr...@googlegroups.com.
To view this discussion on the web visit 
https://groups.google.com/d/msgid/elasticsearch/CAEM624bho%2BcJStsAAMvx5ZMApNEqCSz3a4oEofrU7VfEeuVX%2Bg%40mail.gmail.com.
For more options, visit https://groups.google.com/d/optout.

Re: High CPU on random idle node

Looks like you have a monitoring tool running and it got stuck in the node
stats call while traversing a number of shards/segments.

How many shards/segments are in your migration? It seems to be very active.

Maybe the bloom filter format conversion is expensive but I am not sure.

Jörg

On Sat, Sep 13, 2014 at 12:10 AM, Justin Lintz  wrote:

> Hi,
>
> We're deploying ES for logstash and I recently setup the cluster to
> migrate older indexes to "slower" nodes for longer term storage.  The nodes
> are not being queried or doing any indexing but CPU is at 75% constantly.
>  Here is output from hot_threads , jstack and some settings
>
> $ java -version
> java version "1.7.0_65"
> OpenJDK Runtime Environment (IcedTea 2.5.1) (7u65-2.5.1-4ubuntu1~0.12.04.2)
> OpenJDK 64-Bit Server VM (build 24.65-b04, mixed mode)
>
> hot threads: https://gist.github.com/jlintz/3d965940284f1a7acf1e
> settings: https://gist.github.com/jlintz/c701496a3db26ff0a20e
> jstack: https://gist.github.com/jlintz/35924149197850e52931
>
> I just realized while posting this my index buffers are set too high for
> these nodes since they arent doing indexing, but just in case that's not
> the issue, I'll post anyway.  I'll report back if issue is still present
>
> --
> You received this message because you are subscribed to the Google Groups
> "elasticsearch" group.
> To unsubscribe from this group and stop receiving emails from it, send an
> email to elasticsearch+unsubscr...@googlegroups.com.
> To view this discussion on the web visit
> https://groups.google.com/d/msgid/elasticsearch/6b987879-3997-4c5f-9753-14cab0dae545%40googlegroups.com
> 
> .
> For more options, visit https://groups.google.com/d/optout.
>

-- 
You received this message because you are subscribed to the Google Groups 
"elasticsearch" group.
To unsubscribe from this group and stop receiving emails from it, send an email 
to elasticsearch+unsubscr...@googlegroups.com.
To view this discussion on the web visit 
https://groups.google.com/d/msgid/elasticsearch/CAKdsXoGCJvGZU-tjQaCxC57MgXcy63TR7mDE1reHwAKzyRfoMg%40mail.gmail.com.
For more options, visit https://groups.google.com/d/optout.

High CPU on random idle node

2014-09-12 Thread Justin Lintz

Hi,

We're deploying ES for logstash and I recently setup the cluster to migrate 
older indexes to "slower" nodes for longer term storage.  The nodes are not 
being queried or doing any indexing but CPU is at 75% constantly.  Here is 
output from hot_threads , jstack and some settings

$ java -version
java version "1.7.0_65"
OpenJDK Runtime Environment (IcedTea 2.5.1) (7u65-2.5.1-4ubuntu1~0.12.04.2)
OpenJDK 64-Bit Server VM (build 24.65-b04, mixed mode)

hot threads: https://gist.github.com/jlintz/3d965940284f1a7acf1e
settings: https://gist.github.com/jlintz/c701496a3db26ff0a20e
jstack: https://gist.github.com/jlintz/35924149197850e52931

I just realized while posting this my index buffers are set too high for 
these nodes since they arent doing indexing, but just in case that's not 
the issue, I'll post anyway.  I'll report back if issue is still present

-- 
You received this message because you are subscribed to the Google Groups 
"elasticsearch" group.
To unsubscribe from this group and stop receiving emails from it, send an email 
to elasticsearch+unsubscr...@googlegroups.com.
To view this discussion on the web visit 
https://groups.google.com/d/msgid/elasticsearch/6b987879-3997-4c5f-9753-14cab0dae545%40googlegroups.com.
For more options, visit https://groups.google.com/d/optout.

Re: Context in Native Scripts

2014-09-12 Thread Zeev Sands

I was hoping to be able to intercept the information somewhere between
searching and scoring to avoid extra round trips, but,I guess, having
two queries works as well, although it might be slow.

Thank you for your help!

On 09/12/2014 11:30 AM, vineeth mohan wrote:

Hello Zeev ,

The only way i can think is using 2 query -

1. Find sum of all scores -
{
"aggs": {
"sum": {
"sum": {
"script": "doc.score"
}
}
}
}
2. In the second request , using scripting in function score query
and find the deviation.

Thanks
Vineeth

On Fri, Sep 12, 2014 at 7:35 PM, Zeev Sands > wrote:

Hi,

Thank you for the reply. Here is an example of a scoring behavior
I'm talking about:

a) given a user query a set of documents is produced. Let's
call this set S.
b) suppose each document has a numeric field called "F".
The average of this field values for the set of documents
S is calculated. Let's call this average A.
So is A = sum(F) / N, sum(F) is the sum of the values of
field F for each document in S, and N is |S|, the number of
documents in S.
c) final score for each document is the deviation from the
average: score = F - A.

So, in order to calculate the score for each document, I need to
know "A", which depends on *all* documents produced by the query.
This is a simplified example, the actual score calculation is more
involved.

Here is a different case, where I might like to know all the
documents produce by a query in order to score them: I have an
external server that handles the actual score calculation for each
document, communicating millions of documents to this server one
document at a time is expensive. I would prefer to first get all
the documents selected by a query, then send all of this info to
the server and get one reply containing custom scores for all the
documents at once.

While I'm at it, a quick additional question: the rescore query
and post_filter look interesting. Is there any native (java) api
to implement custom re-scorer or custom post_filter? Just a link
to api would be very helpful.

Thank you again,
ZS

On 09/11/2014 08:20 PM, vineeth mohan wrote:

Hello ,

Can you give a more elaborate explanation on the behavior of
scoring you want ?
I dont see any direct way to achieve this.

Also re-scoring might interest you -

http://www.elasticsearch.org/guide/en/elasticsearch/reference/current/search-request-rescore.html

Thanks
Vineeth

On Thu, Sep 11, 2014 at 11:19 PM, mailto:zeev.sa...@gmail.com>> wrote:

Hello everyone,

I've been playing with native scripts and have a few
questions:

Is there any notion of context for native scripts?

For example, is there a way to know that a method
"runAsDouble", for example, is called for the last time?
I might, for instance, like to send some sort of
statistics after a search is done.

Is there any way to know how many documents the search
produced, beforehand?
I might want to do some pre calculations based on this
number before the actual scoring begins.

Is there any way to get all the documents (or ids)
somehow to process (score) them in bulk?
My scoring might depend on the search result, I might
want to calculate an average of a search result field and
base my scores on this number.

I apologize in advance, if some of my questions are
uninformed. I'm new to ES, trying to switch from Solr.

Thank you,

--
You received this message because you are subscribed to the

Google Groups "elasticsearch" group.
To unsubscribe from this group and stop receiving emails from
it, send an email to
elasticsearch+unsubscr...@googlegroups.com
.
To view this discussion on the web visit

https://groups.google.com/d/msgid/elasticsearch/48643754-67cc-497c-8c84-c1565dfcb867%40googlegroups.com

.
For more options, visit https://groups.google.com/d/optout.

--
You received this message because you are subscribed to a topic

in the Google Groups "elasticsearch" group.
To unsubscribe from this topic, visit
https://groups.google.com/d/topic/elasticsearch/8PFme4-9Ykw/unsubscribe.
To unsubscribe from this group and all its topics, send an email
to elasticsearch+unsubscr...@googlegroups.com

percolation against same non-changed docs?

2014-09-12 Thread sabdalla80

I understand the concept of how docs are checked against existing 
percolators. One thing I am not clear on, does elasticSearch run against 
percolators again for same un-changed documents?

eg. 
I just indexed 5 million docs and ran it against all percolators.
The next day, I ran the same 5 million docs again, does it check against 
the percolators again, or is it smart enough to know these are same docs? 
is percolating done in-memory, so it doesn't take a lot of time to get 
matches back?

Thanks

-- 
You received this message because you are subscribed to the Google Groups 
"elasticsearch" group.
To unsubscribe from this group and stop receiving emails from it, send an email 
to elasticsearch+unsubscr...@googlegroups.com.
To view this discussion on the web visit 
https://groups.google.com/d/msgid/elasticsearch/a0aa904d-c755-4c9e-ae21-fc6d06906e26%40googlegroups.com.
For more options, visit https://groups.google.com/d/optout.

Trouble with search preference=_local

2014-09-12 Thread Kurt Hutchison




I have ES 1.0.2 and Lucene 4.6. 

I was hoping preference=_local would query on the current node, but it does 
not. 

The same node comes up at the top of the list for all shards regardless of 
which node I search from, 
it is not the primary node, neither is it the local node. 

I am testing this with the _search_shards API like this: 

curl -XGET localhost:9200/2/_search_shards?preference=_local 

The larger problem I am trying to solve is to restrict searches to a geo, 
we are a two geo site 
and don't want searches going across the WAN. 

I have experimented with allocation awareness (not forced), also with mixed 
results, searches 
always go to the first zone, this might actually be workable since we are a 
primary/DR 
setup, but it is not behaving as documented so I am leery of relying on 
this behavior. 

-- 
You received this message because you are subscribed to the Google Groups 
"elasticsearch" group.
To unsubscribe from this group and stop receiving emails from it, send an email 
to elasticsearch+unsubscr...@googlegroups.com.
To view this discussion on the web visit 
https://groups.google.com/d/msgid/elasticsearch/ceabad94-f86c-40c6-a098-e4ea2381abfe%40googlegroups.com.
For more options, visit https://groups.google.com/d/optout.

Re: powerful cluster is not able to handle 1.5Tb of data, how to optimize?

Yes, 2 servers are not enough from a fault tolerance perspective.

It is hard to find out why your ES cluster runs slow without information.
Maybe a few settings changes is all you need, I do not know. Maybe you can
find out from the logs what to do.

For sizing an ELK stack, there are many hints on the net, the best are
available from the company.

I hesitate to recommend anything on AWS service or elsewhere. Personally I
am in the situation that I use bare metal server in my own data center with
the specifications I want.

In the end, it is up to you to decide if you go the path "many servers with
less power" or "few servers with much power". This might also not be a
technical issue but also a strategic question.

As Mark said, Elasticsearch was designed to scale out. This means, you can
add servers very easily, and this improves the capacity and power of the
overall system. For many it is enough to add nodes and see the problems go
away, without thinking hard about the reasons.

Jörg



On Fri, Sep 12, 2014 at 1:51 PM, Pavel P  wrote:

> 2Jörg
>
> 1. How I decided that 3 is enough.
> I've started from 2 nodes in the cluster. And it was not able to manage
> the index load.
> Then, in this conversation,
> https://groups.google.com/forum/#!topic/elasticsearch/7XHQjAoKPfw, you
> explained me that 2 nodes cluster is not a cluster. So I went to 3 nodes.
> The indexing process now goes smoothly and I'm satisfied by it.
> 2. The maximum capacity = the size of the data? The data is spreaded
> equally, each has ~500Gb. The shards are located not equally of course,
> because it' shard to split 5 shards between servers.
> 3.
>
>>  if your requirements allow that according to the data patterns and the
>> search load, but not with the ES OOTB settings
>
>
> what is the ES OOTB?
>
> The main target of our cluster is to save all the logs from our internal
> aplications and then allow us to search through them, and, do some
> analytics using Kibana.
> The search load currently is something about 0, because as soon I'm trying
> to search it works quite slow, and when I'm trying to aggregate the values
> - I even have my cluster down.
>
> What is your view on this issue, Jörg, should we go to the 10 small
> servers, rather then 3 big ones?
>
> Regards,
>
> On Friday, September 12, 2014 2:43:15 PM UTC+3, Jörg Prante wrote:
>>
>> Regarding the shards, if you have 3 nodes and 1 index, with 5 shards you
>> have a sort of "impedance mismatch" because 5 (or 10 with replica) shards
>> do not distribute equally on 3 nodes.
>>
>> Rule: use a shard count that is always a factor of the node, e.g. 3, 6,
>> 9, 12  for 3 nodes.
>>
>> Can you tell what the maximum capacity of a single node is for your
>> installation? Somehow you must have concluded that 3 nodes are sufficient -
>> how did you do that? It does not only depend on observing index size. You
>> can even run 1.5TB index on a single node, if your requirements allow that
>> according to the data patterns and the search load, but not with the ES
>> OOTB settings, which is for development installations.
>>
>> Also note that Kibana is great but I have the impression (I do not use
>> it) that many queries from the UI are not optimized regarding filter caches
>> and tend to waste resources. There is much space left for improvement.
>>
>> Jörg
>>
>> On Fri, Sep 12, 2014 at 12:26 PM, Mark Walkom 
>> wrote:
>>
>>> As I initially mentioned, it all depends on your use case but generally
>>> ES does scale better horizontally rather than vertically. If you can, spin
>>> up another cluster along side the one you have and then replica the data
>>> set and query usage and compare the performance.
>>>
>>> Ideally you should aim for one primary shard per node but you can over
>>> allocate if you expect to grow - ie create 6 shards if you expect to grow
>>> to 6 servers. This applies on larger clusters as well, to a point.
>>>
>>> Regards,
>>> Mark Walkom
>>>
>>> Infrastructure Engineer
>>> Campaign Monitor
>>> email: ma...@campaignmonitor.com
>>> web: www.campaignmonitor.com
>>>
>>>
>>> On 12 September 2014 19:24, Pavel P  wrote:
>>>
 Do you say, that 10 servers like 2 CPU, 7.5 RAM (so totally 20 CPUs and
 75Gb RAM) cluster would be more powerful then the 3 serves of 8 CPU and 30
 RAM (in total 24 CPU and 90RAM) ?
 Assuming that the information would be spread there equally.

 btw, what about the shards allocation. Currently I use the default one
 5 shards and 1 replica. Could this be a potential thing to optimisation?
 How the shards scheme should look on the cluster with the bigger number
 of the nodes?

 Regards,

 On Friday, September 12, 2014 12:11:32 PM UTC+3, Mark Walkom wrote:
>
> The answer is it depends on what sort of use case you have.
> But if you are experiencing problems like you are then usually it's
> due to the cluster being at capacity and needing more resources.
>
> You may find it cheaper to mo

Index Missing : illegal Argument exception

2014-09-12 Thread Muthu Kumar

Hi All,

I configured elastic search to use search and visualize the hadoop log 
files.
I have hadoop cluster with 8 nodes and i use eco systems Hive/Pig to push 
the log files to elastic search.
I configured elastic search successfully(single node) and i started kibana 
as well.
When i try to run a sample Hive scripts which was given in elastic search 
site. iam getting the following error.


hive> select * from artists;
OK
Failed with exception java.io.IOException:java.lang.IllegalException:Index 
[radio/artists] missing and settings [es.field.read.empty.as.null] is set 
to false
Time taken 0.348 seconds
hive>

Can somebody help me to fix this issue?

Many thanks in advance!

Regards,
Muthu

-- 
You received this message because you are subscribed to the Google Groups 
"elasticsearch" group.
To unsubscribe from this group and stop receiving emails from it, send an email 
to elasticsearch+unsubscr...@googlegroups.com.
To view this discussion on the web visit 
https://groups.google.com/d/msgid/elasticsearch/e4b6c528-e638-4cb5-b346-4c571031e997%40googlegroups.com.
For more options, visit https://groups.google.com/d/optout.

Re: Do I need the JDBC driver

2014-09-12 Thread Employ

I must admit I'm new to this so I find some of the information hard to 
understand. So sorry if I am asking stupid questions.

> On 12 Sep 2014, at 18:26, Ivan Brusic  wrote:
> 
> I would strongly prefer to maintain control of the indexing side and not in 
> Elasticsearch. In fact, the Elasticsearch team has talked about deprecating 
> river plugins. I do not have any numbers, but I would suspect that the 
> majority of users do not use a river plugin. And yes, the correct term is the 
> JDBC plugin, not driver. The wrong term confused many. :)
> 
> -- 
> Ivan
> 
>> On Fri, Sep 12, 2014 at 3:24 AM, joergpra...@gmail.com 
>>  wrote:
>> You can use either style, it is a matter of taste, or convenience.
>> 
>> With the JDBC plugin, you can also push data instead of pull.
>> 
>> Jörg
>> 
>>> On Fri, Sep 12, 2014 at 12:11 PM, James  wrote:
>>> I want to close this issue but I still do not understand if I should be 
>>> pushing documents from my database using the PHP client or using the JDBC 
>>> river to pull them into elasticsearch from the SQL database. 
>>> 
>>> They can both achieve the same thing, but what is the usecase which defines 
>>> when is the right time to use each implementation.
>>> 
>>> 
>>> 
 On Wednesday, September 10, 2014 10:59:18 AM UTC+1, James wrote:
 Hi,
 
 I'm setting up a system where I have a main SQL database which is synced 
 with elasticsearch. My plan is to use the main PHP library for 
 elasticsearch. 
 
 I was going to have a cron run every thirty minuets to check for items in 
 my database that not only have an "active" flag but that also do not have 
 an "indexed" flag, that means I need to add them to the index. Then I was 
 going to add that item to the index. Since I am using taking this path, it 
 doesn't seem like I need the JDBC driver, as I can add items to 
 elasticsearch using the PHP library.
 
 So, my question is, can I get away without using the JDBC driver?
 
 James
>>> 
>>> -- 
>>> You received this message because you are subscribed to the Google Groups 
>>> "elasticsearch" group.
>>> To unsubscribe from this group and stop receiving emails from it, send an 
>>> email to elasticsearch+unsubscr...@googlegroups.com.
>>> To view this discussion on the web visit 
>>> https://groups.google.com/d/msgid/elasticsearch/6c244e00-1f89-447d-8eb5-114f0b5efcbd%40googlegroups.com.
>>> 
>>> For more options, visit https://groups.google.com/d/optout.
>> 
>> -- 
>> You received this message because you are subscribed to the Google Groups 
>> "elasticsearch" group.
>> To unsubscribe from this group and stop receiving emails from it, send an 
>> email to elasticsearch+unsubscr...@googlegroups.com.
>> To view this discussion on the web visit 
>> https://groups.google.com/d/msgid/elasticsearch/CAKdsXoHT2DcMHJwMjxBZ0RsV4_eKJyB2KjBCiqB2ZTac8fzkTg%40mail.gmail.com.
>> 
>> For more options, visit https://groups.google.com/d/optout.
> 
> -- 
> You received this message because you are subscribed to a topic in the Google 
> Groups "elasticsearch" group.
> To unsubscribe from this topic, visit 
> https://groups.google.com/d/topic/elasticsearch/0dzSMbARlks/unsubscribe.
> To unsubscribe from this group and all its topics, send an email to 
> elasticsearch+unsubscr...@googlegroups.com.
> To view this discussion on the web visit 
> https://groups.google.com/d/msgid/elasticsearch/CALY%3DcQBvDga8q--Au8yWaX2RMGgcDTpYMhLu243tB9w7z0W0_A%40mail.gmail.com.
> For more options, visit https://groups.google.com/d/optout.

-- 
You received this message because you are subscribed to the Google Groups 
"elasticsearch" group.
To unsubscribe from this group and stop receiving emails from it, send an email 
to elasticsearch+unsubscr...@googlegroups.com.
To view this discussion on the web visit 
https://groups.google.com/d/msgid/elasticsearch/9EBFE31B-8DEA-4E05-8342-9E0013BC450B%40employ.com.
For more options, visit https://groups.google.com/d/optout.

Re: Do I need the JDBC driver

2014-09-12 Thread Employ

Ah that's really interesting, it's good to get some comparison. In that case 
you are saying to use the official php library for the document indexing?

James

> On 12 Sep 2014, at 18:26, Ivan Brusic  wrote:
> 
> I would strongly prefer to maintain control of the indexing side and not in 
> Elasticsearch. In fact, the Elasticsearch team has talked about deprecating 
> river plugins. I do not have any numbers, but I would suspect that the 
> majority of users do not use a river plugin. And yes, the correct term is the 
> JDBC plugin, not driver. The wrong term confused many. :)
> 
> -- 
> Ivan
> 
>> On Fri, Sep 12, 2014 at 3:24 AM, joergpra...@gmail.com 
>>  wrote:
>> You can use either style, it is a matter of taste, or convenience.
>> 
>> With the JDBC plugin, you can also push data instead of pull.
>> 
>> Jörg
>> 
>>> On Fri, Sep 12, 2014 at 12:11 PM, James  wrote:
>>> I want to close this issue but I still do not understand if I should be 
>>> pushing documents from my database using the PHP client or using the JDBC 
>>> river to pull them into elasticsearch from the SQL database. 
>>> 
>>> They can both achieve the same thing, but what is the usecase which defines 
>>> when is the right time to use each implementation.
>>> 
>>> 
>>> 
 On Wednesday, September 10, 2014 10:59:18 AM UTC+1, James wrote:
 Hi,
 
 I'm setting up a system where I have a main SQL database which is synced 
 with elasticsearch. My plan is to use the main PHP library for 
 elasticsearch. 
 
 I was going to have a cron run every thirty minuets to check for items in 
 my database that not only have an "active" flag but that also do not have 
 an "indexed" flag, that means I need to add them to the index. Then I was 
 going to add that item to the index. Since I am using taking this path, it 
 doesn't seem like I need the JDBC driver, as I can add items to 
 elasticsearch using the PHP library.
 
 So, my question is, can I get away without using the JDBC driver?
 
 James
>>> 
>>> -- 
>>> You received this message because you are subscribed to the Google Groups 
>>> "elasticsearch" group.
>>> To unsubscribe from this group and stop receiving emails from it, send an 
>>> email to elasticsearch+unsubscr...@googlegroups.com.
>>> To view this discussion on the web visit 
>>> https://groups.google.com/d/msgid/elasticsearch/6c244e00-1f89-447d-8eb5-114f0b5efcbd%40googlegroups.com.
>>> 
>>> For more options, visit https://groups.google.com/d/optout.
>> 
>> -- 
>> You received this message because you are subscribed to the Google Groups 
>> "elasticsearch" group.
>> To unsubscribe from this group and stop receiving emails from it, send an 
>> email to elasticsearch+unsubscr...@googlegroups.com.
>> To view this discussion on the web visit 
>> https://groups.google.com/d/msgid/elasticsearch/CAKdsXoHT2DcMHJwMjxBZ0RsV4_eKJyB2KjBCiqB2ZTac8fzkTg%40mail.gmail.com.
>> 
>> For more options, visit https://groups.google.com/d/optout.
> 
> -- 
> You received this message because you are subscribed to a topic in the Google 
> Groups "elasticsearch" group.
> To unsubscribe from this topic, visit 
> https://groups.google.com/d/topic/elasticsearch/0dzSMbARlks/unsubscribe.
> To unsubscribe from this group and all its topics, send an email to 
> elasticsearch+unsubscr...@googlegroups.com.
> To view this discussion on the web visit 
> https://groups.google.com/d/msgid/elasticsearch/CALY%3DcQBvDga8q--Au8yWaX2RMGgcDTpYMhLu243tB9w7z0W0_A%40mail.gmail.com.
> For more options, visit https://groups.google.com/d/optout.

-- 
You received this message because you are subscribed to the Google Groups 
"elasticsearch" group.
To unsubscribe from this group and stop receiving emails from it, send an email 
to elasticsearch+unsubscr...@googlegroups.com.
To view this discussion on the web visit 
https://groups.google.com/d/msgid/elasticsearch/8AB631FC-020F-4EF4-B639-F2D84D5AFC0C%40employ.com.
For more options, visit https://groups.google.com/d/optout.

Re: Do I need the JDBC driver

2014-09-12 Thread Ivan Brusic

I would strongly prefer to maintain control of the indexing side and not in
Elasticsearch. In fact, the Elasticsearch team has talked about deprecating
river plugins. I do not have any numbers, but I would suspect that the
majority of users do not use a river plugin. And yes, the correct term is
the JDBC plugin, not driver. The wrong term confused many. :)

-- 
Ivan

On Fri, Sep 12, 2014 at 3:24 AM, joergpra...@gmail.com <
joergpra...@gmail.com> wrote:

> You can use either style, it is a matter of taste, or convenience.
>
> With the JDBC plugin, you can also push data instead of pull.
>
> Jörg
>
> On Fri, Sep 12, 2014 at 12:11 PM, James  wrote:
>
>> I want to close this issue but I still do not understand if I should be
>> pushing documents from my database using the PHP client or using the JDBC
>> river to pull them into elasticsearch from the SQL database.
>>
>> They can both achieve the same thing, but what is the usecase which
>> defines when is the right time to use each implementation.
>>
>>
>>
>> On Wednesday, September 10, 2014 10:59:18 AM UTC+1, James wrote:
>>>
>>> Hi,
>>>
>>> I'm setting up a system where I have a main SQL database which is synced
>>> with elasticsearch. My plan is to use the main PHP library for
>>> elasticsearch.
>>>
>>> I was going to have a cron run every thirty minuets to check for items
>>> in my database that not only have an "active" flag but that also do not
>>> have an "indexed" flag, that means I need to add them to the index. Then I
>>> was going to add that item to the index. Since I am using taking this path,
>>> it doesn't seem like I need the JDBC driver, as I can add items to
>>> elasticsearch using the PHP library.
>>>
>>> So, my question is, can I get away without using the JDBC driver?
>>>
>>> James
>>>
>>>
>>>  --
>> You received this message because you are subscribed to the Google Groups
>> "elasticsearch" group.
>> To unsubscribe from this group and stop receiving emails from it, send an
>> email to elasticsearch+unsubscr...@googlegroups.com.
>> To view this discussion on the web visit
>> https://groups.google.com/d/msgid/elasticsearch/6c244e00-1f89-447d-8eb5-114f0b5efcbd%40googlegroups.com
>> 
>> .
>>
>> For more options, visit https://groups.google.com/d/optout.
>>
>
>  --
> You received this message because you are subscribed to the Google Groups
> "elasticsearch" group.
> To unsubscribe from this group and stop receiving emails from it, send an
> email to elasticsearch+unsubscr...@googlegroups.com.
> To view this discussion on the web visit
> https://groups.google.com/d/msgid/elasticsearch/CAKdsXoHT2DcMHJwMjxBZ0RsV4_eKJyB2KjBCiqB2ZTac8fzkTg%40mail.gmail.com
> 
> .
>
> For more options, visit https://groups.google.com/d/optout.
>

-- 
You received this message because you are subscribed to the Google Groups 
"elasticsearch" group.
To unsubscribe from this group and stop receiving emails from it, send an email 
to elasticsearch+unsubscr...@googlegroups.com.
To view this discussion on the web visit 
https://groups.google.com/d/msgid/elasticsearch/CALY%3DcQBvDga8q--Au8yWaX2RMGgcDTpYMhLu243tB9w7z0W0_A%40mail.gmail.com.
For more options, visit https://groups.google.com/d/optout.

Re: Data loss after network disconnect

2014-09-12 Thread Igor Motov

How were these nodes doing in terms of available heap space before the 
disconnects occurred? 

On Wednesday, September 10, 2014 6:26:19 AM UTC-4, Israel Tsadok wrote:
>
> A temporary network disconnect of the master node caused a torrent of 
> RELOCATING shards, and then one shard remained UNASSIGNED and the cluster 
> state was left red.
>
> looking inside the index directory for the shard on the disk, I found that 
> it was empty (i.e., the _state and translog dirs were there, but the index 
> dir had no files).
>
> Looking at the log files, I see that the disconnect happened around 
> 11:42:05, and a few minutes later I start seeing these error messages:
>
> *[2014-09-10 11:45:33,341]*[WARN ][indices.cluster  ] 
> [buzzilla_data008] [el-2011-10-31-][0] failed to start shard
> *[2014-09-10 11:45:33,342]*[WARN ][cluster.action.shard ] 
> [buzzilla_data008] [el-2011-10-31-][0] sending failed shard for 
> [el-2011-10-31-][0], node[RAR26zfuTiKl4mdbRVTtNA], [P], 
> s[INITIALIZING], indexUUID [_na_], reason [Failed to start shard, message 
> [IndexShardGatewayRecoveryException[[el-2011-10-31-][0] failed to fetch 
> index version after copying it over]; nested: 
> IndexShardGatewayRecoveryException[[el-2011-10-31-][0] shard allocated 
> for local recovery (post api), should exist, but doesn't, current files: 
> []]; nested: IndexNotFoundException[no segments* file found in 
> store(least_used[rate_limited(mmapfs(/home/omgili/data/elasticsearch/data/buzzilla/nodes/0/indices/el-2011-10-31-/0/index),
>  
> type=MERGE, rate=20.0)]): files: []]; ]]
>
> The relevant log files are at 
> https://gist.github.com/itsadok/97453743d6b211681aca
> data009 is the original master, data017 is the new master, and data008 is 
> where I found the empty index directory.
>
> I had to delete the unassigned index from the cluster to return to green 
> state.
> I am running Elasticsearch 1.2.1 in a 20 node cluster. 
>
> How does this happen? What can I do to prevent this from happening again?
>

-- 
You received this message because you are subscribed to the Google Groups 
"elasticsearch" group.
To unsubscribe from this group and stop receiving emails from it, send an email 
to elasticsearch+unsubscr...@googlegroups.com.
To view this discussion on the web visit 
https://groups.google.com/d/msgid/elasticsearch/749729f6-daa1-470c-a835-d8f5dd85ad87%40googlegroups.com.
For more options, visit https://groups.google.com/d/optout.

Seeking opinions on cluster platforms

2014-09-12 Thread Jack Park

Let me pose a question by suggesting two extremes for hardware to create
and maintain a growing ElasticSearch cluster datacenter (not in the cloud).

One extreme places redundancy at the server hardware level, by which I mean:
dual power supplies, RAID hard drives

Another extreme places redundancy in a multitude of backup servers:
commodity servers, single power supply, no RAID on the disks, low cost,
with a cluster monitor that can advise of a failed master or backup, and
can rebuild the replacement

I would love to learn how others see or implement within the boundaries of
those extremes, with the understanding that the two poles are just
suggestions, there may be other ways to slice this space.

Many thanks in advance
Jack
ps: documents I read based on a broad query:
https://github.com/aphyr/partitions-post
http://www.elasticsearch.org/case-study/maptimize/
http://www.elasticsearch.org/guide/en/elasticsearch/reference/current/modules-gateway.html
http://www.slideshare.net/clintongormley/scaling-realtime-search-and-analytics-with-elasticsearch

-- 
You received this message because you are subscribed to the Google Groups 
"elasticsearch" group.
To unsubscribe from this group and stop receiving emails from it, send an email 
to elasticsearch+unsubscr...@googlegroups.com.
To view this discussion on the web visit 
https://groups.google.com/d/msgid/elasticsearch/CAH6s0fyHDhB9N8rOCwuf%2B3GR1E8xQ4aqSoQD8cYKZwo72bHw7A%40mail.gmail.com.
For more options, visit https://groups.google.com/d/optout.

Re: Regex queries possible ?

2014-09-12 Thread Nikolas Everett

If not you can write a script filter that runs the regex.  Its slow but it
doesn't sound like you need it to be fast.

On Fri, Sep 12, 2014 at 11:33 AM, vineeth mohan 
wrote:

> Hi ,
>
> If this pattern is a single word , regex query might do the trick -
> http://www.elasticsearch.org/guide/en/elasticsearch/reference/current/query-dsl-regexp-query.html#query-dsl-regexp-query
>
> Thanks
>   Vineeth
>
> On Fri, Sep 12, 2014 at 7:35 PM, Log Muncher 
> wrote:
>
>> Hi,
>>
>> One of my servers appears to be feeding nonsense into Fluentd which is
>> then ending up in elastic search.
>>
>> Is it possible to use regex in queries ?
>>
>> The syslog message content is always the same they start with numbers
>> followed by close bracket, etc.
>>
>> 123)
>>
>> 89)
>>
>> 203)
>>
>>
>> Is there a way to do the equivalent of ^\d+)  in a elastic search query ?
>>
>>
>> Thanks !
>>
>>
>>
>>  --
>> You received this message because you are subscribed to the Google Groups
>> "elasticsearch" group.
>> To unsubscribe from this group and stop receiving emails from it, send an
>> email to elasticsearch+unsubscr...@googlegroups.com.
>> To view this discussion on the web visit
>> https://groups.google.com/d/msgid/elasticsearch/19b17dc5-f188-4223-8d72-40732112814c%40googlegroups.com
>> 
>> .
>> For more options, visit https://groups.google.com/d/optout.
>>
>
>  --
> You received this message because you are subscribed to the Google Groups
> "elasticsearch" group.
> To unsubscribe from this group and stop receiving emails from it, send an
> email to elasticsearch+unsubscr...@googlegroups.com.
> To view this discussion on the web visit
> https://groups.google.com/d/msgid/elasticsearch/CAGdPd5n4yowfX98esw1MuUxDtVSjyxRtNHvnjqarnZ20o32N0A%40mail.gmail.com
> 
> .
>
> For more options, visit https://groups.google.com/d/optout.
>

-- 
You received this message because you are subscribed to the Google Groups 
"elasticsearch" group.
To unsubscribe from this group and stop receiving emails from it, send an email 
to elasticsearch+unsubscr...@googlegroups.com.
To view this discussion on the web visit 
https://groups.google.com/d/msgid/elasticsearch/CAPmjWd3hmLjyw-LZ5sKFUCvyOujD_aj5VUymNh8U19Qfp9ALbQ%40mail.gmail.com.
For more options, visit https://groups.google.com/d/optout.

Marvel enhancement

2014-09-12 Thread Andrew Ochsner

Hi:

Is there a place to submit suggestions for Marvel enhancements?  It would 
be nice if the Cluster Name was included in the Browser Title...

Thanks
Andy O

-- 
You received this message because you are subscribed to the Google Groups 
"elasticsearch" group.
To unsubscribe from this group and stop receiving emails from it, send an email 
to elasticsearch+unsubscr...@googlegroups.com.
To view this discussion on the web visit 
https://groups.google.com/d/msgid/elasticsearch/94b7eb13-53c6-4527-b8ba-eadfc224b9cf%40googlegroups.com.
For more options, visit https://groups.google.com/d/optout.

_cluster/settings transient/persistent conflict

2014-09-12 Thread Andrew Ochsner

Hi:  

When I have a cluster state that looks like this...which wins?  Is there 
any way to clear a setting?  Guessing I need to wait 
until https://github.com/elasticsearch/elasticsearch/issues/6732 but what 
do I do in the meantime?  Set both?

{
  "persistent" : {
"cluster" : {
  "routing" : {
"allocation" : {
  "enable" : "all",
  "disable_allocation" : "false"
}
  }
}
  },
  "transient" : {
"cluster" : {
  "routing" : {
"allocation" : {
  "enable" : "none",
}
  }
}
  }
}



-- 
You received this message because you are subscribed to the Google Groups 
"elasticsearch" group.
To unsubscribe from this group and stop receiving emails from it, send an email 
to elasticsearch+unsubscr...@googlegroups.com.
To view this discussion on the web visit 
https://groups.google.com/d/msgid/elasticsearch/ed74ec91-29d5-4e4a-91cb-1f9537f3e93a%40googlegroups.com.
For more options, visit https://groups.google.com/d/optout.

Re: Linking of query/search

2014-09-12 Thread Alex Kamil

you can combine ES with RDBMS, and run your SQL queries either directly
against db, or pull data via JDBC River into ES, I wrote about it here:
http://lessc0de.github.io/connecting_hbase_to_elasticsearch.html


On Fri, Sep 12, 2014 at 10:55 AM, Ivan Brusic  wrote:

> You cannot join documents in Lucene/Elasticsearch (at least not like a
> RDBMS). You would need to either denormalize your data, join on the client
> side or execute 2+ queries.
>
> --
> Ivan
>
> On Fri, Sep 12, 2014 at 12:45 AM,  wrote:
>
>> Hello!
>>
>> Can anyone shine some light on my question?
>> Is the query in question achievable in ES directly?
>>
>> If not, I can probably do that in application later, but it would be
>> nicer if ES could serve me the final results.
>>
>> Matej
>>
>>  --
>> You received this message because you are subscribed to the Google Groups
>> "elasticsearch" group.
>> To unsubscribe from this group and stop receiving emails from it, send an
>> email to elasticsearch+unsubscr...@googlegroups.com.
>> To view this discussion on the web visit
>> https://groups.google.com/d/msgid/elasticsearch/6f3345f2-4b25-4b06-b203-4ad0de201e8f%40googlegroups.com
>> 
>> .
>>
>> For more options, visit https://groups.google.com/d/optout.
>>
>
>  --
> You received this message because you are subscribed to the Google Groups
> "elasticsearch" group.
> To unsubscribe from this group and stop receiving emails from it, send an
> email to elasticsearch+unsubscr...@googlegroups.com.
> To view this discussion on the web visit
> https://groups.google.com/d/msgid/elasticsearch/CALY%3DcQBgybZpCz1bKV%3DE7XF_cHGDuFKS1wruKNAYZTbo8t0jvA%40mail.gmail.com
> 
> .
>
> For more options, visit https://groups.google.com/d/optout.
>

-- 
You received this message because you are subscribed to the Google Groups 
"elasticsearch" group.
To unsubscribe from this group and stop receiving emails from it, send an email 
to elasticsearch+unsubscr...@googlegroups.com.
To view this discussion on the web visit 
https://groups.google.com/d/msgid/elasticsearch/CAOtKWX623repUH5k2XbkFBFNu-b3cSKyObuyf793AVhOt3Gb-Q%40mail.gmail.com.
For more options, visit https://groups.google.com/d/optout.

Re: Regex queries possible ?

Hi ,

If this pattern is a single word , regex query might do the trick -
http://www.elasticsearch.org/guide/en/elasticsearch/reference/current/query-dsl-regexp-query.html#query-dsl-regexp-query

Thanks
  Vineeth

On Fri, Sep 12, 2014 at 7:35 PM, Log Muncher 
wrote:

> Hi,
>
> One of my servers appears to be feeding nonsense into Fluentd which is
> then ending up in elastic search.
>
> Is it possible to use regex in queries ?
>
> The syslog message content is always the same they start with numbers
> followed by close bracket, etc.
>
> 123)
>
> 89)
>
> 203)
>
>
> Is there a way to do the equivalent of ^\d+)  in a elastic search query ?
>
>
> Thanks !
>
>
>
>  --
> You received this message because you are subscribed to the Google Groups
> "elasticsearch" group.
> To unsubscribe from this group and stop receiving emails from it, send an
> email to elasticsearch+unsubscr...@googlegroups.com.
> To view this discussion on the web visit
> https://groups.google.com/d/msgid/elasticsearch/19b17dc5-f188-4223-8d72-40732112814c%40googlegroups.com
> 
> .
> For more options, visit https://groups.google.com/d/optout.
>

-- 
You received this message because you are subscribed to the Google Groups 
"elasticsearch" group.
To unsubscribe from this group and stop receiving emails from it, send an email 
to elasticsearch+unsubscr...@googlegroups.com.
To view this discussion on the web visit 
https://groups.google.com/d/msgid/elasticsearch/CAGdPd5n4yowfX98esw1MuUxDtVSjyxRtNHvnjqarnZ20o32N0A%40mail.gmail.com.
For more options, visit https://groups.google.com/d/optout.

Re: Context in Native Scripts

Hello Zeev ,

The only way i can think is using 2 query -


   1. Find sum of all scores -
   {
 "aggs": {
   "sum": {
 "sum": {
   "script": "doc.score"
 }
   }
 }
   }
   2. In the second request , using scripting in function score query and
   find the deviation.

Thanks
  Vineeth




















On Fri, Sep 12, 2014 at 7:35 PM, Zeev Sands  wrote:

>  Hi,
>
> Thank you for the reply. Here is an example of a scoring behavior I'm
> talking about:
>
> a) given a user query a set of documents is produced. Let's call this
> set S.
> b) suppose each document has a numeric field called "F".
> The average of this field values for the set of documents S is
> calculated. Let's call this average A.
>  So is A = sum(F) / N, sum(F) is the sum of the values of field F
> for each document in S, and  N is |S|, the number of documents in S.
> c) final score for each document is the deviation from the average:
> score = F - A.
>
> So, in order to calculate the score for each document, I need to know "A",
> which depends on *all* documents produced by the query. This is a
> simplified example, the actual score calculation is more involved.
>
> Here is a different case, where I might like to know all the documents
> produce by a query in order to score them: I have an external server that
> handles the actual score calculation for each document, communicating
> millions of documents to this server one document at a time is expensive. I
> would prefer to first get all the documents selected by a query, then send
> all of this info to the server and get one reply containing custom scores
> for all the documents at once.
>
> While I'm at it, a quick additional question: the rescore query and
> post_filter look interesting. Is there any native (java) api to implement
> custom re-scorer or custom post_filter? Just a link to api would be very
> helpful.
>
> Thank you again,
> ZS
>
>
>
> On 09/11/2014 08:20 PM, vineeth mohan wrote:
>
> Hello ,
>
>  Can you give a more elaborate explanation on the behavior of scoring you
> want ?
> I dont see any direct way to achieve this.
>
>  Also re-scoring might  interest  you -
> http://www.elasticsearch.org/guide/en/elasticsearch/reference/current/search-request-rescore.html
>
>  Thanks
>Vineeth
>
> On Thu, Sep 11, 2014 at 11:19 PM,  wrote:
>
>>
>> Hello everyone,
>>
>>  I've been playing with native scripts and have a few questions:
>>
>>  Is there any notion of context for native scripts?
>>
>> For example, is there a way to know that a method "runAsDouble", for
>> example, is called for the last time?
>> I might, for instance, like to send some sort of statistics after a
>> search is done.
>>
>> Is there any way to know how many documents the search produced,
>> beforehand?
>> I might want to do some pre calculations based on this number before
>> the actual scoring begins.
>>
>> Is there any way to get all the documents (or ids) somehow to process
>> (score) them in bulk?
>> My scoring might depend on the search result, I might want to
>> calculate an average of a search result field and base my scores on this
>> number.
>>
>> I apologize in advance, if some of my questions are uninformed. I'm
>> new to ES, trying to switch from Solr.
>>
>> Thank you,
>>
>> ZS
>>
>>
>>
>>
>>  --
>> You received this message because you are subscribed to the Google Groups
>> "elasticsearch" group.
>> To unsubscribe from this group and stop receiving emails from it, send an
>> email to elasticsearch+unsubscr...@googlegroups.com.
>> To view this discussion on the web visit
>> https://groups.google.com/d/msgid/elasticsearch/48643754-67cc-497c-8c84-c1565dfcb867%40googlegroups.com
>> 
>> .
>> For more options, visit https://groups.google.com/d/optout.
>>
>
>  --
> You received this message because you are subscribed to a topic in the
> Google Groups "elasticsearch" group.
> To unsubscribe from this topic, visit
> https://groups.google.com/d/topic/elasticsearch/8PFme4-9Ykw/unsubscribe.
> To unsubscribe from this group and all its topics, send an email to
> elasticsearch+unsubscr...@googlegroups.com.
> To view this discussion on the web visit
> https://groups.google.com/d/msgid/elasticsearch/CAGdPd5mJ2jQ5ueZQepu8Z%2B0Sjo%3DwxhTh%3D3AvREOiJtKMaFOMXA%40mail.gmail.com
> 
> .
> For more options, visit https://groups.google.com/d/optout.
>
>
>  --
> You received this message because you are subscribed to the Google Groups
> "elasticsearch" group.
> To unsubscribe from this group and stop receiving emails from it, send an
> email to elasticsearch+unsubscr...@googlegroups.com.
> To view

Indexing + Quering GeoPoint documents

2014-09-12 Thread Ashesh Ambasta

This is more of a how-to question. I've created this mapping for 
"places/shops":


   - places: 
   {
  - mappings: 
  {
 - shops: 
 {
- dynamic: "true",
- numeric_detection: true,
- properties: 
{
   - available: 
   {
  - type: "boolean"
  },
   - location: 
   {
  - type: "geo_point",
  - lat_lon: true,
  - geohash: true
  }
   }
}
 }
  }
   
Where I'm using location as the point I'd like to use for GeoDistance 
queries.

I'm then indexing a document that looks like;

{"available": true, location: [50, 50]}

I see that the document is indexed when doing a GET for places/shops/1 (I'm 
indexing the document at id = 1)

I'm then executing a query using elastic4s in scala which is again quite 
simple;
val nearbyFuture:Future[SearchResponse] =
  client execute {
  search in "places" -> "shops" filter {
geoDistance("location") point(lat, lon) distance(kms)
  }
}


In this case lat & lon are both 50.0, and kms = 50 as well. This is a 
really simple query and yet I end up with 0 hits.

Whats's going wrong here?

-- 
You received this message because you are subscribed to the Google Groups 
"elasticsearch" group.
To unsubscribe from this group and stop receiving emails from it, send an email 
to elasticsearch+unsubscr...@googlegroups.com.
To view this discussion on the web visit 
https://groups.google.com/d/msgid/elasticsearch/b7b56190-5dca-4483-a4cb-63f57a709750%40googlegroups.com.
For more options, visit https://groups.google.com/d/optout.

Re: Linking of query/search

2014-09-12 Thread Ivan Brusic

You cannot join documents in Lucene/Elasticsearch (at least not like a
RDBMS). You would need to either denormalize your data, join on the client
side or execute 2+ queries.

-- 
Ivan

On Fri, Sep 12, 2014 at 12:45 AM,  wrote:

> Hello!
>
> Can anyone shine some light on my question?
> Is the query in question achievable in ES directly?
>
> If not, I can probably do that in application later, but it would be nicer
> if ES could serve me the final results.
>
> Matej
>
>  --
> You received this message because you are subscribed to the Google Groups
> "elasticsearch" group.
> To unsubscribe from this group and stop receiving emails from it, send an
> email to elasticsearch+unsubscr...@googlegroups.com.
> To view this discussion on the web visit
> https://groups.google.com/d/msgid/elasticsearch/6f3345f2-4b25-4b06-b203-4ad0de201e8f%40googlegroups.com
> 
> .
>
> For more options, visit https://groups.google.com/d/optout.
>

-- 
You received this message because you are subscribed to the Google Groups 
"elasticsearch" group.
To unsubscribe from this group and stop receiving emails from it, send an email 
to elasticsearch+unsubscr...@googlegroups.com.
To view this discussion on the web visit 
https://groups.google.com/d/msgid/elasticsearch/CALY%3DcQBgybZpCz1bKV%3DE7XF_cHGDuFKS1wruKNAYZTbo8t0jvA%40mail.gmail.com.
For more options, visit https://groups.google.com/d/optout.

Re: Context in Native Scripts

2014-09-12 Thread Zeev Sands

Hi,

Thank you for the reply. Here is an example of a scoring behavior I'm
talking about:

a) given a user query a set of documents is produced. Let's call
this set S.

b) suppose each document has a numeric field called "F".
The average of this field values for the set of documents S is
calculated. Let's call this average A.
So is A = sum(F) / N, sum(F) is the sum of the values of field
F for each document in S, and N is |S|, the number of documents in S.
c) final score for each document is the deviation from the average:
score = F - A.

So, in order to calculate the score for each document, I need to know
"A", which depends on *all* documents produced by the query. This is a
simplified example, the actual score calculation is more involved.

Here is a different case, where I might like to know all the documents
produce by a query in order to score them: I have an external server
that handles the actual score calculation for each document,
communicating millions of documents to this server one document at a
time is expensive. I would prefer to first get all the documents
selected by a query, then send all of this info to the server and get
one reply containing custom scores for all the documents at once.

While I'm at it, a quick additional question: the rescore query and
post_filter look interesting. Is there any native (java) api to
implement custom re-scorer or custom post_filter? Just a link to api
would be very helpful.

Thank you again,
ZS

On 09/11/2014 08:20 PM, vineeth mohan wrote:

Hello ,

Can you give a more elaborate explanation on the behavior of scoring
you want ?

I dont see any direct way to achieve this.

Also re-scoring might interest you -
http://www.elasticsearch.org/guide/en/elasticsearch/reference/current/search-request-rescore.html

Thanks
Vineeth

On Thu, Sep 11, 2014 at 11:19 PM, > wrote:

Hello everyone,

I've been playing with native scripts and have a few questions:

Is there any notion of context for native scripts?

For example, is there a way to know that a method
"runAsDouble", for example, is called for the last time?
I might, for instance, like to send some sort of statistics
after a search is done.

Is there any way to know how many documents the search
produced, beforehand?
I might want to do some pre calculations based on this number
before the actual scoring begins.

Is there any way to get all the documents (or ids) somehow to
process (score) them in bulk?
My scoring might depend on the search result, I might want to
calculate an average of a search result field and base my scores
on this number.

I apologize in advance, if some of my questions are
uninformed. I'm new to ES, trying to switch from Solr.

Thank you,

--
You received this message because you are subscribed to the Google

Groups "elasticsearch" group.
To unsubscribe from this group and stop receiving emails from it,
send an email to elasticsearch+unsubscr...@googlegroups.com
.
To view this discussion on the web visit

https://groups.google.com/d/msgid/elasticsearch/48643754-67cc-497c-8c84-c1565dfcb867%40googlegroups.com

.
For more options, visit https://groups.google.com/d/optout.

--
You received this message because you are subscribed to a topic in the
Google Groups "elasticsearch" group.
To unsubscribe from this topic, visit
https://groups.google.com/d/topic/elasticsearch/8PFme4-9Ykw/unsubscribe.
To unsubscribe from this group and all its topics, send an email to
elasticsearch+unsubscr...@googlegroups.com
.
To view this discussion on the web visit
https://groups.google.com/d/msgid/elasticsearch/CAGdPd5mJ2jQ5ueZQepu8Z%2B0Sjo%3DwxhTh%3D3AvREOiJtKMaFOMXA%40mail.gmail.com
.

For more options, visit https://groups.google.com/d/optout.

--
You received this message because you are subscribed to the Google Groups
"elasticsearch" group.
To unsubscribe from this group and stop receiving emails from it, send an email
to elasticsearch+unsubscr...@googlegroups.com.
To view this discussion on the web visit
https://groups.google.com/d/msgid/elasticsearch/5412FDB8.9050805%40gmail.com.
For more options, visit https://groups.google.com/d/optout.

Regex queries possible ?

2014-09-12 Thread Log Muncher

Hi,

One of my servers appears to be feeding nonsense into Fluentd which is then 
ending up in elastic search.

Is it possible to use regex in queries ?

The syslog message content is always the same they start with numbers 
followed by close bracket, etc.

123)

89)

203)


Is there a way to do the equivalent of ^\d+)  in a elastic search query ?


Thanks !



-- 
You received this message because you are subscribed to the Google Groups 
"elasticsearch" group.
To unsubscribe from this group and stop receiving emails from it, send an email 
to elasticsearch+unsubscr...@googlegroups.com.
To view this discussion on the web visit 
https://groups.google.com/d/msgid/elasticsearch/19b17dc5-f188-4223-8d72-40732112814c%40googlegroups.com.
For more options, visit https://groups.google.com/d/optout.

Re: Field Data Cache Size and Eviction

2014-09-12 Thread Philippe Laflamme

Forgot to mention that we're using ES 1.1.1

On Friday, September 12, 2014 9:21:23 AM UTC-4, Philippe Laflamme wrote:
>
> Hi,
>
> I have a cluster with nodes configured with a 18G heap. We've noticed a 
> degradation in performance recently after increasing the volume of data 
> we're indexing.
>
> I think the issue is due to the field data cache doing eviction. Some 
> nodes are doing lots of them, some aren't doing any. This is explained by 
> our routing strategy which results in non-uniform document distribution. 
> Maybe we can improve this eventually, but in the meantime, I'm trying to 
> understand why the nodes are evicting cached data.
>
> The metrics show that the field data cache is only ~1.5GB in size, yet we 
> have this in our elasticsearch.yml:
>
> indices.fielddata.cache.size: 10gb
>
> Why would a node evict cache entries when it should still have plenty of 
> room to store more? Are we missing another setting? Is there a way to tell 
> what the actual fielddata cache size is at runtime (maybe it did not pickup 
> the configuration setting for some reason)?
>
> Thanks,
> Philippe
>

-- 
You received this message because you are subscribed to the Google Groups 
"elasticsearch" group.
To unsubscribe from this group and stop receiving emails from it, send an email 
to elasticsearch+unsubscr...@googlegroups.com.
To view this discussion on the web visit 
https://groups.google.com/d/msgid/elasticsearch/512be87e-561a-4031-a465-d256ad400bbb%40googlegroups.com.
For more options, visit https://groups.google.com/d/optout.

Field Data Cache Size and Eviction

2014-09-12 Thread Philippe Laflamme

Hi,

I have a cluster with nodes configured with a 18G heap. We've noticed a 
degradation in performance recently after increasing the volume of data 
we're indexing.

I think the issue is due to the field data cache doing eviction. Some nodes 
are doing lots of them, some aren't doing any. This is explained by our 
routing strategy which results in non-uniform document distribution. Maybe 
we can improve this eventually, but in the meantime, I'm trying to 
understand why the nodes are evicting cached data.

The metrics show that the field data cache is only ~1.5GB in size, yet we 
have this in our elasticsearch.yml:

indices.fielddata.cache.size: 10gb

Why would a node evict cache entries when it should still have plenty of 
room to store more? Are we missing another setting? Is there a way to tell 
what the actual fielddata cache size is at runtime (maybe it did not pickup 
the configuration setting for some reason)?

Thanks,
Philippe

-- 
You received this message because you are subscribed to the Google Groups 
"elasticsearch" group.
To unsubscribe from this group and stop receiving emails from it, send an email 
to elasticsearch+unsubscr...@googlegroups.com.
To view this discussion on the web visit 
https://groups.google.com/d/msgid/elasticsearch/e619f974-1632-4694-a0f9-40c32100c504%40googlegroups.com.
For more options, visit https://groups.google.com/d/optout.

complex nested aggregation query based on time

2014-09-12 Thread Christophe Vandeplas

Hello there,


I am trying to write a rather complex aggregation

Let's say my json documents contains the following fields: timestamp, 
username, subject

The search should return documents where:
- two identical "subject" fields, 
- by the same username, 
- within an interval of X minutes. 

Using nested aggregation I can group by username, and count the identical 
subjects (terms).
However I can't find a way to also specify a time interval within the 
query.  (the identical subjects should be within an interval of X minutes)

All pointers are welcome.

Thanks
Christophe 

-- 
You received this message because you are subscribed to the Google Groups 
"elasticsearch" group.
To unsubscribe from this group and stop receiving emails from it, send an email 
to elasticsearch+unsubscr...@googlegroups.com.
To view this discussion on the web visit 
https://groups.google.com/d/msgid/elasticsearch/2f90d46f-2330-4a0f-8658-8cbdf6824415%40googlegroups.com.
For more options, visit https://groups.google.com/d/optout.

Cluster allocation awareness - opposite

2014-09-12 Thread spezam

Hi All,
we currently have an Elasticsearch (1.1.1) cluster distributed among DCs

* 3 Data nodes in 3 DC
* 1 Gateway node

each DC index its own data and no shards replica is happening between DC.
The gateway lets us to query all the indexes in all the DC.

Now, for performance and redundancy we would like to add one data node per 
DC and replicate the 'local' indexes.

We read about cluster allocation awareness, and thought that was the 
perfect solution, until we realize it actually acts opposite of what we 
want.
If node members are in the same awareness attribute, they would just 
'ignore' each others and don't spread shards or replicate.

Is such a deployment possible/does it makes sense?

Thanks a million,
Matteo

-- 
You received this message because you are subscribed to the Google Groups 
"elasticsearch" group.
To unsubscribe from this group and stop receiving emails from it, send an email 
to elasticsearch+unsubscr...@googlegroups.com.
To view this discussion on the web visit 
https://groups.google.com/d/msgid/elasticsearch/3cff8e6e-9b3b-432c-baba-03dd44b3862c%40googlegroups.com.
For more options, visit https://groups.google.com/d/optout.

Re: powerful cluster is not able to handle 1.5Tb of data, how to optimize?

2Jörg

1. How I decided that 3 is enough.
I've started from 2 nodes in the cluster. And it was not able to manage the 
index load.
Then, in this 
conversation, 
https://groups.google.com/forum/#!topic/elasticsearch/7XHQjAoKPfw, 
you explained me that 2 nodes cluster is not a cluster. So I went to 3 
nodes.
The indexing process now goes smoothly and I'm satisfied by it.
2. The maximum capacity = the size of the data? The data is spreaded 
equally, each has ~500Gb. The shards are located not equally of course, 
because it' shard to split 5 shards between servers.
3. 

>  if your requirements allow that according to the data patterns and the 
> search load, but not with the ES OOTB settings


what is the ES OOTB?

The main target of our cluster is to save all the logs from our internal 
aplications and then allow us to search through them, and, do some 
analytics using Kibana. 
The search load currently is something about 0, because as soon I'm trying 
to search it works quite slow, and when I'm trying to aggregate the values 
- I even have my cluster down.

What is your view on this issue, Jörg, should we go to the 10 small 
servers, rather then 3 big ones?

Regards,

On Friday, September 12, 2014 2:43:15 PM UTC+3, Jörg Prante wrote:
>
> Regarding the shards, if you have 3 nodes and 1 index, with 5 shards you 
> have a sort of "impedance mismatch" because 5 (or 10 with replica) shards 
> do not distribute equally on 3 nodes.
>
> Rule: use a shard count that is always a factor of the node, e.g. 3, 6, 9, 
> 12  for 3 nodes.
>
> Can you tell what the maximum capacity of a single node is for your 
> installation? Somehow you must have concluded that 3 nodes are sufficient - 
> how did you do that? It does not only depend on observing index size. You 
> can even run 1.5TB index on a single node, if your requirements allow that 
> according to the data patterns and the search load, but not with the ES 
> OOTB settings, which is for development installations. 
>
> Also note that Kibana is great but I have the impression (I do not use it) 
> that many queries from the UI are not optimized regarding filter caches and 
> tend to waste resources. There is much space left for improvement.
>
> Jörg
>
> On Fri, Sep 12, 2014 at 12:26 PM, Mark Walkom  > wrote:
>
>> As I initially mentioned, it all depends on your use case but generally 
>> ES does scale better horizontally rather than vertically. If you can, spin 
>> up another cluster along side the one you have and then replica the data 
>> set and query usage and compare the performance.
>>
>> Ideally you should aim for one primary shard per node but you can over 
>> allocate if you expect to grow - ie create 6 shards if you expect to grow 
>> to 6 servers. This applies on larger clusters as well, to a point.
>>
>> Regards,
>> Mark Walkom
>>
>> Infrastructure Engineer
>> Campaign Monitor
>> email: ma...@campaignmonitor.com 
>> web: www.campaignmonitor.com
>>
>>
>> On 12 September 2014 19:24, Pavel P > 
>> wrote:
>>
>>> Do you say, that 10 servers like 2 CPU, 7.5 RAM (so totally 20 CPUs and 
>>> 75Gb RAM) cluster would be more powerful then the 3 serves of 8 CPU and 30 
>>> RAM (in total 24 CPU and 90RAM) ?
>>> Assuming that the information would be spread there equally.
>>>
>>> btw, what about the shards allocation. Currently I use the default one 5 
>>> shards and 1 replica. Could this be a potential thing to optimisation?
>>> How the shards scheme should look on the cluster with the bigger number 
>>> of the nodes?
>>>
>>> Regards,
>>>
>>> On Friday, September 12, 2014 12:11:32 PM UTC+3, Mark Walkom wrote:

 The answer is it depends on what sort of use case you have.
 But if you are experiencing problems like you are then usually it's due 
 to the cluster being at capacity and needing more resources.

 You may find it cheaper to move to more numerous and smaller nodes that 
 you can distribute the load across, as that is where ES excels and also 
 how 
 many other big data platforms operate.

 Regards,
 Mark Walkom

 Infrastructure Engineer
 Campaign Monitor
 email: ma...@campaignmonitor.com
 web: www.campaignmonitor.com


 On 12 September 2014 19:01, Pavel P  wrote:

> Java version is "1.7.0_55"
> Elasticsearch is 1.3.1
>
> Well, the cost of the whole setup is the question.
> currently it's something about 1000$ per month on AWS. Do we really 
> need to pay a lot more then 1000$/month to support the 1.5Tb data?
>
> Could you briefly describe how much nodes do you expect to handle that 
> much of data?
>
> The side question is, how the the really Big Data solution works, when 
> they do the search or aggregation from the data which size is far more 
> then 
> 1.5Tb? Or it's as well is the size of the architecture.
>
> Regards,
>
> On Friday, September 12, 2014 11:53:35 AM UTC+3, Mark Walkom wrote:
>>
>>

Re: powerful cluster is not able to handle 1.5Tb of data, how to optimize?

Regarding the shards, if you have 3 nodes and 1 index, with 5 shards you
have a sort of "impedance mismatch" because 5 (or 10 with replica) shards
do not distribute equally on 3 nodes.

Rule: use a shard count that is always a factor of the node, e.g. 3, 6, 9,
12  for 3 nodes.

Can you tell what the maximum capacity of a single node is for your
installation? Somehow you must have concluded that 3 nodes are sufficient -
how did you do that? It does not only depend on observing index size. You
can even run 1.5TB index on a single node, if your requirements allow that
according to the data patterns and the search load, but not with the ES
OOTB settings, which is for development installations.

Also note that Kibana is great but I have the impression (I do not use it)
that many queries from the UI are not optimized regarding filter caches and
tend to waste resources. There is much space left for improvement.

Jörg

On Fri, Sep 12, 2014 at 12:26 PM, Mark Walkom 
wrote:

> As I initially mentioned, it all depends on your use case but generally ES
> does scale better horizontally rather than vertically. If you can, spin up
> another cluster along side the one you have and then replica the data set
> and query usage and compare the performance.
>
> Ideally you should aim for one primary shard per node but you can over
> allocate if you expect to grow - ie create 6 shards if you expect to grow
> to 6 servers. This applies on larger clusters as well, to a point.
>
> Regards,
> Mark Walkom
>
> Infrastructure Engineer
> Campaign Monitor
> email: ma...@campaignmonitor.com
> web: www.campaignmonitor.com
>
>
> On 12 September 2014 19:24, Pavel P  wrote:
>
>> Do you say, that 10 servers like 2 CPU, 7.5 RAM (so totally 20 CPUs and
>> 75Gb RAM) cluster would be more powerful then the 3 serves of 8 CPU and 30
>> RAM (in total 24 CPU and 90RAM) ?
>> Assuming that the information would be spread there equally.
>>
>> btw, what about the shards allocation. Currently I use the default one 5
>> shards and 1 replica. Could this be a potential thing to optimisation?
>> How the shards scheme should look on the cluster with the bigger number
>> of the nodes?
>>
>> Regards,
>>
>> On Friday, September 12, 2014 12:11:32 PM UTC+3, Mark Walkom wrote:
>>>
>>> The answer is it depends on what sort of use case you have.
>>> But if you are experiencing problems like you are then usually it's due
>>> to the cluster being at capacity and needing more resources.
>>>
>>> You may find it cheaper to move to more numerous and smaller nodes that
>>> you can distribute the load across, as that is where ES excels and also how
>>> many other big data platforms operate.
>>>
>>> Regards,
>>> Mark Walkom
>>>
>>> Infrastructure Engineer
>>> Campaign Monitor
>>> email: ma...@campaignmonitor.com
>>> web: www.campaignmonitor.com
>>>
>>>
>>> On 12 September 2014 19:01, Pavel P  wrote:
>>>
 Java version is "1.7.0_55"
 Elasticsearch is 1.3.1

 Well, the cost of the whole setup is the question.
 currently it's something about 1000$ per month on AWS. Do we really
 need to pay a lot more then 1000$/month to support the 1.5Tb data?

 Could you briefly describe how much nodes do you expect to handle that
 much of data?

 The side question is, how the the really Big Data solution works, when
 they do the search or aggregation from the data which size is far more then
 1.5Tb? Or it's as well is the size of the architecture.

 Regards,

 On Friday, September 12, 2014 11:53:35 AM UTC+3, Mark Walkom wrote:
>
> That's a lot of data for 3 nodes!
> You really need to adjust your infrastructure; add more nodes, more
> ram, or alternatively remove some old indexes (delete or close).
>
> What ES and java version are you running?
>
> Regards,
> Mark Walkom
>
> Infrastructure Engineer
> Campaign Monitor
> email: ma...@campaignmonitor.com
> web: www.campaignmonitor.com
>
>
> On 12 September 2014 18:48, Pavel P  wrote:
>
>> Hi,
>>
>> Again I have an issue with the power of the cluster.
>>
>> I have the cluster from 3 servers, each has 30RAM, 8 CPUs and 1Tb
>> disk attached.
>>
>>
>> 
>>
>>
>> There are 1323957069 docs (1.64TB) there, the documents distribution
>> is the next:
>>
>>
>> 
>>
>> All the 3 nodes are data nodes.
>>
>> The index throughput is something about 10-20k documents per minute.
>> (it's the logstash -> elasticsearch setup, we store different logs in the
>> cluster)
>>
>> My concerns are the next:
>>
>> 1. When I load the index page of kibana - the loading of the document
>> t

Re: I need to call my server xxx.xx.xx.xxx:xxxxx using elasticsearch api in python

Can you please tell me the step to do that from starting. That will be a 
great help.

Regards,
Nimit

On Friday, 12 September 2014 16:35:11 UTC+5:30, Honza Král wrote:
>
> I am sorry, I cannot help you. Elasticsearch requires no password and 
> the URL you supplied doesn't correspond to any API in Elasticsearch 
> (unless you are pointing to a single document). I assume there is 
> something weird with your setup. 
>
> On Fri, Sep 12, 2014 at 1:01 PM, Nimit Jain  > wrote: 
> > I am not think so that I have reached Elasticsearch using this. But 
> below is 
> > the command that I have used. 
> > 
> > curl -i -XGET 10.xxx.66.xxx:6xxx8/ea/api/discovery.json -d ' 
> > 
> > There is no proxy in between the client and server but yes it do have 
> the 
> > login which we are not doing right now. Could you please tell me the way 
> to 
> > provide username and password using elasticsearch. 
> > 
> > It would be very helpful. 
> > 
> > Regards, 
> > Nimit 
> > 
> > On Friday, 12 September 2014 15:39:28 UTC+5:30, Honza Král wrote: 
> >> 
> >> what is the curl comman you use to reach elasticsearch? 
> >> 
> >> On Fri, Sep 12, 2014 at 11:51 AM, Nimit Jain  
> wrote: 
> >> > With the same URL I am able to get the json from curl command. 
> >> > Full url is http://10.xxx.66.xxx:6xxx8/ea/api/discovery.json  but 
> with 
> >> > elasticsearch the Status is N/A. I don't know why this is happening. 
> >> > 
> >> > Regards, 
> >> > Nimit 
> >> > 
> >> > On Friday, 12 September 2014 12:29:39 UTC+5:30, Magnus Bäck wrote: 
> >> >> 
> >> >> On Friday, September 12, 2014 at 04:45 CEST, 
> >> >>  Nimit Jain  wrote: 
> >> >> 
> >> >> > Thanks Honza for your reply. While trying the below code with 
> >> >> > print(es.info()) I am getting the below error. 
> >> >> > pydev debugger: starting (pid: 9652) 
> >> >> > GET / [status:401 request:0.563s] 
> >> >> > Traceback (most recent call last): 
> >> >> 
> >> >> [...] 
> >> >> 
> >> >> > elasticsearch.exceptions.TransportError: TransportError(401, '') 
> >> >> > Here the status is 401. Please help. 
> >> >> 
> >> >> HTTP status 401 is "Unauthorized". Do you have a proxy that 
> >> >> expects authentication between your host and Elasticsearch? 
> >> >> I suspect Elasticsearch itself isn't capable of returning 401. 
> >> >> 
> >> >> -- 
> >> >> Magnus Bäck| Software Engineer, Development Tools 
> >> >> magnu...@sonymobile.com | Sony Mobile Communications 
> >> > 
> >> > -- 
> >> > You received this message because you are subscribed to the Google 
> >> > Groups 
> >> > "elasticsearch" group. 
> >> > To unsubscribe from this group and stop receiving emails from it, 
> send 
> >> > an 
> >> > email to elasticsearc...@googlegroups.com. 
> >> > To view this discussion on the web visit 
> >> > 
> >> > 
> https://groups.google.com/d/msgid/elasticsearch/3657ab99-706f-4bfb-af74-44d55c618902%40googlegroups.com.
>  
>
> >> > 
> >> > For more options, visit https://groups.google.com/d/optout. 
> > 
> > -- 
> > You received this message because you are subscribed to the Google 
> Groups 
> > "elasticsearch" group. 
> > To unsubscribe from this group and stop receiving emails from it, send 
> an 
> > email to elasticsearc...@googlegroups.com . 
> > To view this discussion on the web visit 
> > 
> https://groups.google.com/d/msgid/elasticsearch/d3eec423-9cee-4fbf-90a8-60d4f79ffa23%40googlegroups.com.
>  
>
> > 
> > For more options, visit https://groups.google.com/d/optout. 
>

-- 
You received this message because you are subscribed to the Google Groups 
"elasticsearch" group.
To unsubscribe from this group and stop receiving emails from it, send an email 
to elasticsearch+unsubscr...@googlegroups.com.
To view this discussion on the web visit 
https://groups.google.com/d/msgid/elasticsearch/bf06fa28-2bbc-4055-b943-9f94bd08d434%40googlegroups.com.
For more options, visit https://groups.google.com/d/optout.

Re: I need to call my server xxx.xx.xx.xxx:xxxxx using elasticsearch api in python

2014-09-12 Thread Honza Král

I am sorry, I cannot help you. Elasticsearch requires no password and
the URL you supplied doesn't correspond to any API in Elasticsearch
(unless you are pointing to a single document). I assume there is
something weird with your setup.

On Fri, Sep 12, 2014 at 1:01 PM, Nimit Jain  wrote:
> I am not think so that I have reached Elasticsearch using this. But below is
> the command that I have used.
>
> curl -i -XGET 10.xxx.66.xxx:6xxx8/ea/api/discovery.json -d '
>
> There is no proxy in between the client and server but yes it do have the
> login which we are not doing right now. Could you please tell me the way to
> provide username and password using elasticsearch.
>
> It would be very helpful.
>
> Regards,
> Nimit
>
> On Friday, 12 September 2014 15:39:28 UTC+5:30, Honza Král wrote:
>>
>> what is the curl comman you use to reach elasticsearch?
>>
>> On Fri, Sep 12, 2014 at 11:51 AM, Nimit Jain  wrote:
>> > With the same URL I am able to get the json from curl command.
>> > Full url is http://10.xxx.66.xxx:6xxx8/ea/api/discovery.json  but with
>> > elasticsearch the Status is N/A. I don't know why this is happening.
>> >
>> > Regards,
>> > Nimit
>> >
>> > On Friday, 12 September 2014 12:29:39 UTC+5:30, Magnus Bäck wrote:
>> >>
>> >> On Friday, September 12, 2014 at 04:45 CEST,
>> >>  Nimit Jain  wrote:
>> >>
>> >> > Thanks Honza for your reply. While trying the below code with
>> >> > print(es.info()) I am getting the below error.
>> >> > pydev debugger: starting (pid: 9652)
>> >> > GET / [status:401 request:0.563s]
>> >> > Traceback (most recent call last):
>> >>
>> >> [...]
>> >>
>> >> > elasticsearch.exceptions.TransportError: TransportError(401, '')
>> >> > Here the status is 401. Please help.
>> >>
>> >> HTTP status 401 is "Unauthorized". Do you have a proxy that
>> >> expects authentication between your host and Elasticsearch?
>> >> I suspect Elasticsearch itself isn't capable of returning 401.
>> >>
>> >> --
>> >> Magnus Bäck| Software Engineer, Development Tools
>> >> magnu...@sonymobile.com | Sony Mobile Communications
>> >
>> > --
>> > You received this message because you are subscribed to the Google
>> > Groups
>> > "elasticsearch" group.
>> > To unsubscribe from this group and stop receiving emails from it, send
>> > an
>> > email to elasticsearc...@googlegroups.com.
>> > To view this discussion on the web visit
>> >
>> > https://groups.google.com/d/msgid/elasticsearch/3657ab99-706f-4bfb-af74-44d55c618902%40googlegroups.com.
>> >
>> > For more options, visit https://groups.google.com/d/optout.
>
> --
> You received this message because you are subscribed to the Google Groups
> "elasticsearch" group.
> To unsubscribe from this group and stop receiving emails from it, send an
> email to elasticsearch+unsubscr...@googlegroups.com.
> To view this discussion on the web visit
> https://groups.google.com/d/msgid/elasticsearch/d3eec423-9cee-4fbf-90a8-60d4f79ffa23%40googlegroups.com.
>
> For more options, visit https://groups.google.com/d/optout.

-- 
You received this message because you are subscribed to the Google Groups 
"elasticsearch" group.
To unsubscribe from this group and stop receiving emails from it, send an email 
to elasticsearch+unsubscr...@googlegroups.com.
To view this discussion on the web visit 
https://groups.google.com/d/msgid/elasticsearch/CABfdDir1UxRxz%2BW3LH%3D40YfafJarDwN7W%2B0gE-8Div3HAbjSfA%40mail.gmail.com.
For more options, visit https://groups.google.com/d/optout.

Re: I need to call my server xxx.xx.xx.xxx:xxxxx using elasticsearch api in python

I am not think so that I have reached Elasticsearch using this. But below 
is the command that I have used.

curl -i -XGET 10.xxx.66.xxx:6xxx8/ea/api/discovery.json -d '

There is no proxy in between the client and server but yes it do have the 
login which we are not doing right now. Could you please tell me the way to 
provide username and password using elasticsearch.

It would be very helpful.

Regards,
Nimit

On Friday, 12 September 2014 15:39:28 UTC+5:30, Honza Král wrote:
>
> what is the curl comman you use to reach elasticsearch? 
>
> On Fri, Sep 12, 2014 at 11:51 AM, Nimit Jain  > wrote: 
> > With the same URL I am able to get the json from curl command. 
> > Full url is http://10.xxx.66.xxx:6xxx8/ea/api/discovery.json  but with 
> > elasticsearch the Status is N/A. I don't know why this is happening. 
> > 
> > Regards, 
> > Nimit 
> > 
> > On Friday, 12 September 2014 12:29:39 UTC+5:30, Magnus Bäck wrote: 
> >> 
> >> On Friday, September 12, 2014 at 04:45 CEST, 
> >>  Nimit Jain  wrote: 
> >> 
> >> > Thanks Honza for your reply. While trying the below code with 
> >> > print(es.info()) I am getting the below error. 
> >> > pydev debugger: starting (pid: 9652) 
> >> > GET / [status:401 request:0.563s] 
> >> > Traceback (most recent call last): 
> >> 
> >> [...] 
> >> 
> >> > elasticsearch.exceptions.TransportError: TransportError(401, '') 
> >> > Here the status is 401. Please help. 
> >> 
> >> HTTP status 401 is "Unauthorized". Do you have a proxy that 
> >> expects authentication between your host and Elasticsearch? 
> >> I suspect Elasticsearch itself isn't capable of returning 401. 
> >> 
> >> -- 
> >> Magnus Bäck| Software Engineer, Development Tools 
> >> magnu...@sonymobile.com | Sony Mobile Communications 
> > 
> > -- 
> > You received this message because you are subscribed to the Google 
> Groups 
> > "elasticsearch" group. 
> > To unsubscribe from this group and stop receiving emails from it, send 
> an 
> > email to elasticsearc...@googlegroups.com . 
> > To view this discussion on the web visit 
> > 
> https://groups.google.com/d/msgid/elasticsearch/3657ab99-706f-4bfb-af74-44d55c618902%40googlegroups.com.
>  
>
> > 
> > For more options, visit https://groups.google.com/d/optout. 
>

-- 
You received this message because you are subscribed to the Google Groups 
"elasticsearch" group.
To unsubscribe from this group and stop receiving emails from it, send an email 
to elasticsearch+unsubscr...@googlegroups.com.
To view this discussion on the web visit 
https://groups.google.com/d/msgid/elasticsearch/d3eec423-9cee-4fbf-90a8-60d4f79ffa23%40googlegroups.com.
For more options, visit https://groups.google.com/d/optout.

Re: powerful cluster is not able to handle 1.5Tb of data, how to optimize?

As I initially mentioned, it all depends on your use case but generally ES
does scale better horizontally rather than vertically. If you can, spin up
another cluster along side the one you have and then replica the data set
and query usage and compare the performance.

Ideally you should aim for one primary shard per node but you can over
allocate if you expect to grow - ie create 6 shards if you expect to grow
to 6 servers. This applies on larger clusters as well, to a point.

Regards,
Mark Walkom

Infrastructure Engineer
Campaign Monitor
email: ma...@campaignmonitor.com
web: www.campaignmonitor.com


On 12 September 2014 19:24, Pavel P  wrote:

> Do you say, that 10 servers like 2 CPU, 7.5 RAM (so totally 20 CPUs and
> 75Gb RAM) cluster would be more powerful then the 3 serves of 8 CPU and 30
> RAM (in total 24 CPU and 90RAM) ?
> Assuming that the information would be spread there equally.
>
> btw, what about the shards allocation. Currently I use the default one 5
> shards and 1 replica. Could this be a potential thing to optimisation?
> How the shards scheme should look on the cluster with the bigger number of
> the nodes?
>
> Regards,
>
> On Friday, September 12, 2014 12:11:32 PM UTC+3, Mark Walkom wrote:
>>
>> The answer is it depends on what sort of use case you have.
>> But if you are experiencing problems like you are then usually it's due
>> to the cluster being at capacity and needing more resources.
>>
>> You may find it cheaper to move to more numerous and smaller nodes that
>> you can distribute the load across, as that is where ES excels and also how
>> many other big data platforms operate.
>>
>> Regards,
>> Mark Walkom
>>
>> Infrastructure Engineer
>> Campaign Monitor
>> email: ma...@campaignmonitor.com
>> web: www.campaignmonitor.com
>>
>>
>> On 12 September 2014 19:01, Pavel P  wrote:
>>
>>> Java version is "1.7.0_55"
>>> Elasticsearch is 1.3.1
>>>
>>> Well, the cost of the whole setup is the question.
>>> currently it's something about 1000$ per month on AWS. Do we really need
>>> to pay a lot more then 1000$/month to support the 1.5Tb data?
>>>
>>> Could you briefly describe how much nodes do you expect to handle that
>>> much of data?
>>>
>>> The side question is, how the the really Big Data solution works, when
>>> they do the search or aggregation from the data which size is far more then
>>> 1.5Tb? Or it's as well is the size of the architecture.
>>>
>>> Regards,
>>>
>>> On Friday, September 12, 2014 11:53:35 AM UTC+3, Mark Walkom wrote:

 That's a lot of data for 3 nodes!
 You really need to adjust your infrastructure; add more nodes, more
 ram, or alternatively remove some old indexes (delete or close).

 What ES and java version are you running?

 Regards,
 Mark Walkom

 Infrastructure Engineer
 Campaign Monitor
 email: ma...@campaignmonitor.com
 web: www.campaignmonitor.com


 On 12 September 2014 18:48, Pavel P  wrote:

> Hi,
>
> Again I have an issue with the power of the cluster.
>
> I have the cluster from 3 servers, each has 30RAM, 8 CPUs and 1Tb disk
> attached.
>
>
> 
>
>
> There are 1323957069 docs (1.64TB) there, the documents distribution
> is the next:
>
>
> 
>
> All the 3 nodes are data nodes.
>
> The index throughput is something about 10-20k documents per minute.
> (it's the logstash -> elasticsearch setup, we store different logs in the
> cluster)
>
> My concerns are the next:
>
> 1. When I load the index page of kibana - the loading of the document
> types panel takes about a minute. It that ok?
> 2. For the document type user_account, when I try to build the terms
> panel for the field "message.raw" (the string of 20-30 characters). My
> cluster stucks.
> In the logs I can find the next
>
> [2014-09-11 08:03:34,507][ERROR][indices.fielddata.breaker] [morbius]
>> New used memory 6499531395 [6gb] from field [message.raw] would be larger
>> than configured breaker: 6414558822 [5.9gb], breaking
>
>
> But, despite of the breakers, when it tries to calculate that terms
> pie, it stops indexing the input documents. The queue goes up. Then, it
> happens that I see the heap exceptions and to solve them the only thing I
> could do is to reboot the cluster.
>
> *My question is the next:*
>
> It looks like I have quite powerful servers and the correct
> configuration (my ES_HEAP_SIZE is set to 15g), while they are still
> not able to process the 1.5Tb of information or doing that quite slowly.
> Do you have any advice of how to overcome that and make my cluster to
> r

Re: Do I need the JDBC driver

You can use either style, it is a matter of taste, or convenience.

With the JDBC plugin, you can also push data instead of pull.

Jörg

On Fri, Sep 12, 2014 at 12:11 PM, James  wrote:

> I want to close this issue but I still do not understand if I should be
> pushing documents from my database using the PHP client or using the JDBC
> river to pull them into elasticsearch from the SQL database.
>
> They can both achieve the same thing, but what is the usecase which
> defines when is the right time to use each implementation.
>
>
>
> On Wednesday, September 10, 2014 10:59:18 AM UTC+1, James wrote:
>>
>> Hi,
>>
>> I'm setting up a system where I have a main SQL database which is synced
>> with elasticsearch. My plan is to use the main PHP library for
>> elasticsearch.
>>
>> I was going to have a cron run every thirty minuets to check for items in
>> my database that not only have an "active" flag but that also do not have
>> an "indexed" flag, that means I need to add them to the index. Then I was
>> going to add that item to the index. Since I am using taking this path, it
>> doesn't seem like I need the JDBC driver, as I can add items to
>> elasticsearch using the PHP library.
>>
>> So, my question is, can I get away without using the JDBC driver?
>>
>> James
>>
>>
>>  --
> You received this message because you are subscribed to the Google Groups
> "elasticsearch" group.
> To unsubscribe from this group and stop receiving emails from it, send an
> email to elasticsearch+unsubscr...@googlegroups.com.
> To view this discussion on the web visit
> https://groups.google.com/d/msgid/elasticsearch/6c244e00-1f89-447d-8eb5-114f0b5efcbd%40googlegroups.com
> 
> .
>
> For more options, visit https://groups.google.com/d/optout.
>

-- 
You received this message because you are subscribed to the Google Groups 
"elasticsearch" group.
To unsubscribe from this group and stop receiving emails from it, send an email 
to elasticsearch+unsubscr...@googlegroups.com.
To view this discussion on the web visit 
https://groups.google.com/d/msgid/elasticsearch/CAKdsXoHT2DcMHJwMjxBZ0RsV4_eKJyB2KjBCiqB2ZTac8fzkTg%40mail.gmail.com.
For more options, visit https://groups.google.com/d/optout.

Re: Do I need the JDBC driver

2014-09-12 Thread James

I want to close this issue but I still do not understand if I should be 
pushing documents from my database using the PHP client or using the JDBC 
river to pull them into elasticsearch from the SQL database. 

They can both achieve the same thing, but what is the usecase which defines 
when is the right time to use each implementation.



On Wednesday, September 10, 2014 10:59:18 AM UTC+1, James wrote:
>
> Hi,
>
> I'm setting up a system where I have a main SQL database which is synced 
> with elasticsearch. My plan is to use the main PHP library for 
> elasticsearch. 
>
> I was going to have a cron run every thirty minuets to check for items in 
> my database that not only have an "active" flag but that also do not have 
> an "indexed" flag, that means I need to add them to the index. Then I was 
> going to add that item to the index. Since I am using taking this path, it 
> doesn't seem like I need the JDBC driver, as I can add items to 
> elasticsearch using the PHP library.
>
> So, my question is, can I get away without using the JDBC driver?
>
> James
>
>
>

-- 
You received this message because you are subscribed to the Google Groups 
"elasticsearch" group.
To unsubscribe from this group and stop receiving emails from it, send an email 
to elasticsearch+unsubscr...@googlegroups.com.
To view this discussion on the web visit 
https://groups.google.com/d/msgid/elasticsearch/6c244e00-1f89-447d-8eb5-114f0b5efcbd%40googlegroups.com.
For more options, visit https://groups.google.com/d/optout.

Re: I need to call my server xxx.xx.xx.xxx:xxxxx using elasticsearch api in python

2014-09-12 Thread Honza Král

what is the curl comman you use to reach elasticsearch?

On Fri, Sep 12, 2014 at 11:51 AM, Nimit Jain  wrote:
> With the same URL I am able to get the json from curl command.
> Full url is http://10.xxx.66.xxx:6xxx8/ea/api/discovery.json  but with
> elasticsearch the Status is N/A. I don't know why this is happening.
>
> Regards,
> Nimit
>
> On Friday, 12 September 2014 12:29:39 UTC+5:30, Magnus Bäck wrote:
>>
>> On Friday, September 12, 2014 at 04:45 CEST,
>>  Nimit Jain  wrote:
>>
>> > Thanks Honza for your reply. While trying the below code with
>> > print(es.info()) I am getting the below error.
>> > pydev debugger: starting (pid: 9652)
>> > GET / [status:401 request:0.563s]
>> > Traceback (most recent call last):
>>
>> [...]
>>
>> > elasticsearch.exceptions.TransportError: TransportError(401, '')
>> > Here the status is 401. Please help.
>>
>> HTTP status 401 is "Unauthorized". Do you have a proxy that
>> expects authentication between your host and Elasticsearch?
>> I suspect Elasticsearch itself isn't capable of returning 401.
>>
>> --
>> Magnus Bäck| Software Engineer, Development Tools
>> magnu...@sonymobile.com | Sony Mobile Communications
>
> --
> You received this message because you are subscribed to the Google Groups
> "elasticsearch" group.
> To unsubscribe from this group and stop receiving emails from it, send an
> email to elasticsearch+unsubscr...@googlegroups.com.
> To view this discussion on the web visit
> https://groups.google.com/d/msgid/elasticsearch/3657ab99-706f-4bfb-af74-44d55c618902%40googlegroups.com.
>
> For more options, visit https://groups.google.com/d/optout.

-- 
You received this message because you are subscribed to the Google Groups 
"elasticsearch" group.
To unsubscribe from this group and stop receiving emails from it, send an email 
to elasticsearch+unsubscr...@googlegroups.com.
To view this discussion on the web visit 
https://groups.google.com/d/msgid/elasticsearch/CABfdDir7Q%3DWg_jpMvmtixYArsPWSmm148Z%3DtttAX4HLs_H75kg%40mail.gmail.com.
For more options, visit https://groups.google.com/d/optout.

Re: I need to call my server xxx.xx.xx.xxx:xxxxx using elasticsearch api in python

Also, can we mention the username and password of the url from python in 
ElasticSearch.

On Friday, 12 September 2014 15:21:50 UTC+5:30, Nimit Jain wrote:
>
> With the same URL I am able to get the json from curl command. 
> Full url is http://10.xxx.66.xxx:6xxx8/ea/api/discovery.json  but with 
> elasticsearch the Status is N/A. I don't know why this is happening.
>
> Regards,
> Nimit
>
> On Friday, 12 September 2014 12:29:39 UTC+5:30, Magnus Bäck wrote:
>>
>> On Friday, September 12, 2014 at 04:45 CEST, 
>>  Nimit Jain  wrote: 
>>
>> > Thanks Honza for your reply. While trying the below code with 
>> > print(es.info()) I am getting the below error. 
>> > pydev debugger: starting (pid: 9652) 
>> > GET / [status:401 request:0.563s] 
>> > Traceback (most recent call last): 
>>
>> [...] 
>>
>> > elasticsearch.exceptions.TransportError: TransportError(401, '') 
>> > Here the status is 401. Please help. 
>>
>> HTTP status 401 is "Unauthorized". Do you have a proxy that 
>> expects authentication between your host and Elasticsearch? 
>> I suspect Elasticsearch itself isn't capable of returning 401. 
>>
>> -- 
>> Magnus Bäck| Software Engineer, Development Tools 
>> magnu...@sonymobile.com | Sony Mobile Communications 
>>
>

-- 
You received this message because you are subscribed to the Google Groups 
"elasticsearch" group.
To unsubscribe from this group and stop receiving emails from it, send an email 
to elasticsearch+unsubscr...@googlegroups.com.
To view this discussion on the web visit 
https://groups.google.com/d/msgid/elasticsearch/e5ed8eb3-4ab8-40fc-ba92-8d8222b9abb2%40googlegroups.com.
For more options, visit https://groups.google.com/d/optout.

Re: I need to call my server xxx.xx.xx.xxx:xxxxx using elasticsearch api in python

With the same URL I am able to get the json from curl command. 
Full url is http://10.xxx.66.xxx:6xxx8/ea/api/discovery.json  but with 
elasticsearch the Status is N/A. I don't know why this is happening.

Regards,
Nimit

On Friday, 12 September 2014 12:29:39 UTC+5:30, Magnus Bäck wrote:
>
> On Friday, September 12, 2014 at 04:45 CEST, 
>  Nimit Jain > wrote: 
>
> > Thanks Honza for your reply. While trying the below code with 
> > print(es.info()) I am getting the below error. 
> > pydev debugger: starting (pid: 9652) 
> > GET / [status:401 request:0.563s] 
> > Traceback (most recent call last): 
>
> [...] 
>
> > elasticsearch.exceptions.TransportError: TransportError(401, '') 
> > Here the status is 401. Please help. 
>
> HTTP status 401 is "Unauthorized". Do you have a proxy that 
> expects authentication between your host and Elasticsearch? 
> I suspect Elasticsearch itself isn't capable of returning 401. 
>
> -- 
> Magnus Bäck| Software Engineer, Development Tools 
> magnu...@sonymobile.com  | Sony Mobile Communications 
>

-- 
You received this message because you are subscribed to the Google Groups 
"elasticsearch" group.
To unsubscribe from this group and stop receiving emails from it, send an email 
to elasticsearch+unsubscr...@googlegroups.com.
To view this discussion on the web visit 
https://groups.google.com/d/msgid/elasticsearch/3657ab99-706f-4bfb-af74-44d55c618902%40googlegroups.com.
For more options, visit https://groups.google.com/d/optout.

Re: powerful cluster is not able to handle 1.5Tb of data, how to optimize?

Do you say, that 10 servers like 2 CPU, 7.5 RAM (so totally 20 CPUs and 
75Gb RAM) cluster would be more powerful then the 3 serves of 8 CPU and 30 
RAM (in total 24 CPU and 90RAM) ?
Assuming that the information would be spread there equally.

btw, what about the shards allocation. Currently I use the default one 5 
shards and 1 replica. Could this be a potential thing to optimisation?
How the shards scheme should look on the cluster with the bigger number of 
the nodes?

Regards,

On Friday, September 12, 2014 12:11:32 PM UTC+3, Mark Walkom wrote:
>
> The answer is it depends on what sort of use case you have.
> But if you are experiencing problems like you are then usually it's due to 
> the cluster being at capacity and needing more resources.
>
> You may find it cheaper to move to more numerous and smaller nodes that 
> you can distribute the load across, as that is where ES excels and also how 
> many other big data platforms operate.
>
> Regards,
> Mark Walkom
>
> Infrastructure Engineer
> Campaign Monitor
> email: ma...@campaignmonitor.com 
> web: www.campaignmonitor.com
>
>
> On 12 September 2014 19:01, Pavel P > 
> wrote:
>
>> Java version is "1.7.0_55"
>> Elasticsearch is 1.3.1
>>
>> Well, the cost of the whole setup is the question.
>> currently it's something about 1000$ per month on AWS. Do we really need 
>> to pay a lot more then 1000$/month to support the 1.5Tb data?
>>
>> Could you briefly describe how much nodes do you expect to handle that 
>> much of data?
>>
>> The side question is, how the the really Big Data solution works, when 
>> they do the search or aggregation from the data which size is far more then 
>> 1.5Tb? Or it's as well is the size of the architecture.
>>
>> Regards,
>>
>> On Friday, September 12, 2014 11:53:35 AM UTC+3, Mark Walkom wrote:
>>>
>>> That's a lot of data for 3 nodes!
>>> You really need to adjust your infrastructure; add more nodes, more ram, 
>>> or alternatively remove some old indexes (delete or close).
>>>
>>> What ES and java version are you running?
>>>
>>> Regards,
>>> Mark Walkom
>>>
>>> Infrastructure Engineer
>>> Campaign Monitor
>>> email: ma...@campaignmonitor.com
>>> web: www.campaignmonitor.com
>>>
>>>
>>> On 12 September 2014 18:48, Pavel P  wrote:
>>>
 Hi,

 Again I have an issue with the power of the cluster.

 I have the cluster from 3 servers, each has 30RAM, 8 CPUs and 1Tb disk 
 attached.

 There are 1323957069 docs (1.64TB) there, the documents distribution 
 is the next:

 All the 3 nodes are data nodes.

 The index throughput is something about 10-20k documents per minute. 
 (it's the logstash -> elasticsearch setup, we store different logs in the 
 cluster)

 My concerns are the next:

 1. When I load the index page of kibana - the loading of the document 
 types panel takes about a minute. It that ok?
 2. For the document type user_account, when I try to build the terms 
 panel for the field "message.raw" (the string of 20-30 characters). My 
 cluster stucks.
 In the logs I can find the next

 [2014-09-11 08:03:34,507][ERROR][indices.fielddata.breaker] [morbius] 
> New used memory 6499531395 [6gb] from field [message.raw] would be larger 
> than configured breaker: 6414558822 [5.9gb], breaking

 But, despite of the breakers, when it tries to calculate that terms 
 pie, it stops indexing the input documents. The queue goes up. Then, it 
 happens that I see the heap exceptions and to solve them the only thing I 
 could do is to reboot the cluster.

 *My question is the next:*

 It looks like I have quite powerful servers and the correct 
 configuration (my ES_HEAP_SIZE is set to 15g), while they are still 
 not able to process the 1.5Tb of information or doing that quite slowly.
 Do you have any advice of how to overcome that and make my cluster to 
 response more fast? How should I adjust the infrastructure?

 Which hardware should I need to manipulate the 1.5Tb in the reasonable 
 amount of time?

 Any thoughts are welcome.

 Regards,

  -- 
 You received this message because you are subscribed to the Google 
 Groups "elasticsearch" group.
 To unsubscribe from this group and stop receiving emails from it, send 
 an email to elasticsearc...@googlegroups.com.
 To view this discussion on the web visit https://groups.google.com/d/
 msgid/elasticsearch/707ed8a1-8f94-48cc-a78a-0e1f63f32b8d%
 40googlegroups.com

Re: powerful cluster is not able to handle 1.5Tb of data, how to optimize?

The answer is it depends on what sort of use case you have.
But if you are experiencing problems like you are then usually it's due to
the cluster being at capacity and needing more resources.

You may find it cheaper to move to more numerous and smaller nodes that you
can distribute the load across, as that is where ES excels and also how
many other big data platforms operate.

Regards,
Mark Walkom

Infrastructure Engineer
Campaign Monitor
email: ma...@campaignmonitor.com
web: www.campaignmonitor.com


On 12 September 2014 19:01, Pavel P  wrote:

> Java version is "1.7.0_55"
> Elasticsearch is 1.3.1
>
> Well, the cost of the whole setup is the question.
> currently it's something about 1000$ per month on AWS. Do we really need
> to pay a lot more then 1000$/month to support the 1.5Tb data?
>
> Could you briefly describe how much nodes do you expect to handle that
> much of data?
>
> The side question is, how the the really Big Data solution works, when
> they do the search or aggregation from the data which size is far more then
> 1.5Tb? Or it's as well is the size of the architecture.
>
> Regards,
>
> On Friday, September 12, 2014 11:53:35 AM UTC+3, Mark Walkom wrote:
>>
>> That's a lot of data for 3 nodes!
>> You really need to adjust your infrastructure; add more nodes, more ram,
>> or alternatively remove some old indexes (delete or close).
>>
>> What ES and java version are you running?
>>
>> Regards,
>> Mark Walkom
>>
>> Infrastructure Engineer
>> Campaign Monitor
>> email: ma...@campaignmonitor.com
>> web: www.campaignmonitor.com
>>
>>
>> On 12 September 2014 18:48, Pavel P  wrote:
>>
>>> Hi,
>>>
>>> Again I have an issue with the power of the cluster.
>>>
>>> I have the cluster from 3 servers, each has 30RAM, 8 CPUs and 1Tb disk
>>> attached.
>>>
>>>
>>> 
>>>
>>>
>>> There are 1323957069 docs (1.64TB) there, the documents distribution is
>>> the next:
>>>
>>>
>>> 
>>>
>>> All the 3 nodes are data nodes.
>>>
>>> The index throughput is something about 10-20k documents per minute.
>>> (it's the logstash -> elasticsearch setup, we store different logs in the
>>> cluster)
>>>
>>> My concerns are the next:
>>>
>>> 1. When I load the index page of kibana - the loading of the document
>>> types panel takes about a minute. It that ok?
>>> 2. For the document type user_account, when I try to build the terms
>>> panel for the field "message.raw" (the string of 20-30 characters). My
>>> cluster stucks.
>>> In the logs I can find the next
>>>
>>> [2014-09-11 08:03:34,507][ERROR][indices.fielddata.breaker] [morbius]
 New used memory 6499531395 [6gb] from field [message.raw] would be larger
 than configured breaker: 6414558822 [5.9gb], breaking
>>>
>>>
>>> But, despite of the breakers, when it tries to calculate that terms pie,
>>> it stops indexing the input documents. The queue goes up. Then, it happens
>>> that I see the heap exceptions and to solve them the only thing I could do
>>> is to reboot the cluster.
>>>
>>> *My question is the next:*
>>>
>>> It looks like I have quite powerful servers and the correct
>>> configuration (my ES_HEAP_SIZE is set to 15g), while they are still not
>>> able to process the 1.5Tb of information or doing that quite slowly.
>>> Do you have any advice of how to overcome that and make my cluster to
>>> response more fast? How should I adjust the infrastructure?
>>>
>>> Which hardware should I need to manipulate the 1.5Tb in the reasonable
>>> amount of time?
>>>
>>> Any thoughts are welcome.
>>>
>>> Regards,
>>>
>>>
>>>
>>>  --
>>> You received this message because you are subscribed to the Google
>>> Groups "elasticsearch" group.
>>> To unsubscribe from this group and stop receiving emails from it, send
>>> an email to elasticsearc...@googlegroups.com.
>>> To view this discussion on the web visit https://groups.google.com/d/
>>> msgid/elasticsearch/707ed8a1-8f94-48cc-a78a-0e1f63f32b8d%
>>> 40googlegroups.com
>>> 
>>> .
>>> For more options, visit https://groups.google.com/d/optout.
>>>
>>
>>  --
> You received this message because you are subscribed to the Google Groups
> "elasticsearch" group.
> To unsubscribe from this group and stop receiving emails from it, send an
> email to elasticsearch+unsubscr...@googlegroups.com.
> To view this discussion on the web visit
> https://groups.google.com/d/msgid/elasticsearch/3aa93dc1-1c75-4b75-b864-8b391ec218c6%40googlegroups.com
> 
> .
> For more options, visit https://groups.google.com/d/optout.
>

-- 
You rece

Elasticsearch.net client, endpoint strategy?

2014-09-12 Thread Lasse Schou

Hi,

Not sure if this is the right user group, but here goes:

I'm planning to use ElasticSearch.net as the client for connecting to my ES 
cluster. I have one question I haven't been able to find the answer to. I 
know that the ConnectionPool feature can check if nodes fail, but can the 
client also ensure that data is written to the right shard, or does it 
simply use round robin to connect?

Example:

- A document "1234" is created
- Based on the current number of shards, the document should be put on node 
6 and a replica on node 11.
- Is the request sent directly to node 6, or is it sent to a random node 
which will then forward the request to the right servers?

Thanks,
Lasse

-- 
You received this message because you are subscribed to the Google Groups 
"elasticsearch" group.
To unsubscribe from this group and stop receiving emails from it, send an email 
to elasticsearch+unsubscr...@googlegroups.com.
To view this discussion on the web visit 
https://groups.google.com/d/msgid/elasticsearch/52209da0-ad03-470e-b8d4-f4fd283dda4e%40googlegroups.com.
For more options, visit https://groups.google.com/d/optout.

Re: powerful cluster is not able to handle 1.5Tb of data, how to optimize?

Java version is "1.7.0_55"
Elasticsearch is 1.3.1

Well, the cost of the whole setup is the question.
currently it's something about 1000$ per month on AWS. Do we really need to 
pay a lot more then 1000$/month to support the 1.5Tb data?

Could you briefly describe how much nodes do you expect to handle that much 
of data?

The side question is, how the the really Big Data solution works, when they 
do the search or aggregation from the data which size is far more then 
1.5Tb? Or it's as well is the size of the architecture.

Regards,

On Friday, September 12, 2014 11:53:35 AM UTC+3, Mark Walkom wrote:
>
> That's a lot of data for 3 nodes!
> You really need to adjust your infrastructure; add more nodes, more ram, 
> or alternatively remove some old indexes (delete or close).
>
> What ES and java version are you running?
>
> Regards,
> Mark Walkom
>
> Infrastructure Engineer
> Campaign Monitor
> email: ma...@campaignmonitor.com 
> web: www.campaignmonitor.com
>
>
> On 12 September 2014 18:48, Pavel P > 
> wrote:
>
>> Hi,
>>
>> Again I have an issue with the power of the cluster.
>>
>> I have the cluster from 3 servers, each has 30RAM, 8 CPUs and 1Tb disk 
>> attached.
>>
>>
>> 
>>
>>
>> There are 1323957069 docs (1.64TB) there, the documents distribution is 
>> the next:
>>
>>
>> 
>>
>> All the 3 nodes are data nodes.
>>
>> The index throughput is something about 10-20k documents per minute. 
>> (it's the logstash -> elasticsearch setup, we store different logs in the 
>> cluster)
>>
>> My concerns are the next:
>>
>> 1. When I load the index page of kibana - the loading of the document 
>> types panel takes about a minute. It that ok?
>> 2. For the document type user_account, when I try to build the terms 
>> panel for the field "message.raw" (the string of 20-30 characters). My 
>> cluster stucks.
>> In the logs I can find the next
>>
>> [2014-09-11 08:03:34,507][ERROR][indices.fielddata.breaker] [morbius] New 
>>> used memory 6499531395 [6gb] from field [message.raw] would be larger than 
>>> configured breaker: 6414558822 [5.9gb], breaking
>>
>>
>> But, despite of the breakers, when it tries to calculate that terms pie, 
>> it stops indexing the input documents. The queue goes up. Then, it happens 
>> that I see the heap exceptions and to solve them the only thing I could do 
>> is to reboot the cluster.
>>
>> *My question is the next:*
>>
>> It looks like I have quite powerful servers and the correct configuration 
>> (my ES_HEAP_SIZE is set to 15g), while they are still not able to 
>> process the 1.5Tb of information or doing that quite slowly.
>> Do you have any advice of how to overcome that and make my cluster to 
>> response more fast? How should I adjust the infrastructure?
>>
>> Which hardware should I need to manipulate the 1.5Tb in the reasonable 
>> amount of time?
>>
>> Any thoughts are welcome.
>>
>> Regards,
>>
>>
>>
>>  -- 
>> You received this message because you are subscribed to the Google Groups 
>> "elasticsearch" group.
>> To unsubscribe from this group and stop receiving emails from it, send an 
>> email to elasticsearc...@googlegroups.com .
>> To view this discussion on the web visit 
>> https://groups.google.com/d/msgid/elasticsearch/707ed8a1-8f94-48cc-a78a-0e1f63f32b8d%40googlegroups.com
>>  
>> 
>> .
>> For more options, visit https://groups.google.com/d/optout.
>>
>
>

-- 
You received this message because you are subscribed to the Google Groups 
"elasticsearch" group.
To unsubscribe from this group and stop receiving emails from it, send an email 
to elasticsearch+unsubscr...@googlegroups.com.
To view this discussion on the web visit 
https://groups.google.com/d/msgid/elasticsearch/3aa93dc1-1c75-4b75-b864-8b391ec218c6%40googlegroups.com.
For more options, visit https://groups.google.com/d/optout.

Re: Discrete value aggregations on a URL field

2014-09-12 Thread Magnus Bäck

On Friday, September 12, 2014 at 09:23 CEST,
 Ali Kheyrollahi  wrote:

>On Friday, 12 September 2014 08:18:19 UTC+1, Ali Kheyrollahi wrote:
>
> > I am trying to find numbers of discrete value per URL in a day and
> > the result is not what I expect.

[...]

> > Result is bizarre, I mean it breaks my URL into its segments
> > and aggregates on that. Do I need to use Hash of the URL (I prefer
> > not to)?
>
> OK, it seems that I need to use not_analyzed on the field. Is that
> correct?

Yes.

-- 
Magnus Bäck| Software Engineer, Development Tools
magnus.b...@sonymobile.com | Sony Mobile Communications

-- 
You received this message because you are subscribed to the Google Groups 
"elasticsearch" group.
To unsubscribe from this group and stop receiving emails from it, send an email 
to elasticsearch+unsubscr...@googlegroups.com.
To view this discussion on the web visit 
https://groups.google.com/d/msgid/elasticsearch/20140912085425.GA9172%40seldlx20533.corpusers.net.
For more options, visit https://groups.google.com/d/optout.

Re: powerful cluster is not able to handle 1.5Tb of data, how to optimize?

That's a lot of data for 3 nodes!
You really need to adjust your infrastructure; add more nodes, more ram, or
alternatively remove some old indexes (delete or close).

What ES and java version are you running?

Regards,
Mark Walkom

Infrastructure Engineer
Campaign Monitor
email: ma...@campaignmonitor.com
web: www.campaignmonitor.com


On 12 September 2014 18:48, Pavel P  wrote:

> Hi,
>
> Again I have an issue with the power of the cluster.
>
> I have the cluster from 3 servers, each has 30RAM, 8 CPUs and 1Tb disk
> attached.
>
>
> 
>
>
> There are 1323957069 docs (1.64TB) there, the documents distribution is
> the next:
>
>
> 
>
> All the 3 nodes are data nodes.
>
> The index throughput is something about 10-20k documents per minute. (it's
> the logstash -> elasticsearch setup, we store different logs in the cluster)
>
> My concerns are the next:
>
> 1. When I load the index page of kibana - the loading of the document
> types panel takes about a minute. It that ok?
> 2. For the document type user_account, when I try to build the terms panel
> for the field "message.raw" (the string of 20-30 characters). My cluster
> stucks.
> In the logs I can find the next
>
> [2014-09-11 08:03:34,507][ERROR][indices.fielddata.breaker] [morbius] New
>> used memory 6499531395 [6gb] from field [message.raw] would be larger than
>> configured breaker: 6414558822 [5.9gb], breaking
>
>
> But, despite of the breakers, when it tries to calculate that terms pie,
> it stops indexing the input documents. The queue goes up. Then, it happens
> that I see the heap exceptions and to solve them the only thing I could do
> is to reboot the cluster.
>
> *My question is the next:*
>
> It looks like I have quite powerful servers and the correct configuration
> (my ES_HEAP_SIZE is set to 15g), while they are still not able to process
> the 1.5Tb of information or doing that quite slowly.
> Do you have any advice of how to overcome that and make my cluster to
> response more fast? How should I adjust the infrastructure?
>
> Which hardware should I need to manipulate the 1.5Tb in the reasonable
> amount of time?
>
> Any thoughts are welcome.
>
> Regards,
>
>
>
>  --
> You received this message because you are subscribed to the Google Groups
> "elasticsearch" group.
> To unsubscribe from this group and stop receiving emails from it, send an
> email to elasticsearch+unsubscr...@googlegroups.com.
> To view this discussion on the web visit
> https://groups.google.com/d/msgid/elasticsearch/707ed8a1-8f94-48cc-a78a-0e1f63f32b8d%40googlegroups.com
> 
> .
> For more options, visit https://groups.google.com/d/optout.
>

-- 
You received this message because you are subscribed to the Google Groups 
"elasticsearch" group.
To unsubscribe from this group and stop receiving emails from it, send an email 
to elasticsearch+unsubscr...@googlegroups.com.
To view this discussion on the web visit 
https://groups.google.com/d/msgid/elasticsearch/CAEM624bMpf0T2Spn0gBOowsLK-7rxSKf%2BE30FaFqV397LQSaKw%40mail.gmail.com.
For more options, visit https://groups.google.com/d/optout.

powerful cluster is not able to handle 1.5Tb of data, how to optimize?

Hi,

Again I have an issue with the power of the cluster.

I have the cluster from 3 servers, each has 30RAM, 8 CPUs and 1Tb disk
attached.

There are 1323957069 docs (1.64TB) there, the documents distribution is the
next:

All the 3 nodes are data nodes.

The index throughput is something about 10-20k documents per minute. (it's
the logstash -> elasticsearch setup, we store different logs in the cluster)

My concerns are the next:

1. When I load the index page of kibana - the loading of the document types
panel takes about a minute. It that ok?
2. For the document type user_account, when I try to build the terms panel
for the field "message.raw" (the string of 20-30 characters). My cluster
stucks.
In the logs I can find the next

[2014-09-11 08:03:34,507][ERROR][indices.fielddata.breaker] [morbius] New
> used memory 6499531395 [6gb] from field [message.raw] would be larger than
> configured breaker: 6414558822 [5.9gb], breaking

But, despite of the breakers, when it tries to calculate that terms pie, it
stops indexing the input documents. The queue goes up. Then, it happens
that I see the heap exceptions and to solve them the only thing I could do
is to reboot the cluster.

*My question is the next:*

It looks like I have quite powerful servers and the correct configuration
(my ES_HEAP_SIZE is set to 15g), while they are still not able to process
the 1.5Tb of information or doing that quite slowly.
Do you have any advice of how to overcome that and make my cluster to
response more fast? How should I adjust the infrastructure?

Which hardware should I need to manipulate the 1.5Tb in the reasonable
amount of time?

Any thoughts are welcome.

Regards,

--
You received this message because you are subscribed to the Google Groups
"elasticsearch" group.
To unsubscribe from this group and stop receiving emails from it, send an email
to elasticsearch+unsubscr...@googlegroups.com.
To view this discussion on the web visit
https://groups.google.com/d/msgid/elasticsearch/707ed8a1-8f94-48cc-a78a-0e1f63f32b8d%40googlegroups.com.
For more options, visit https://groups.google.com/d/optout.

Re: index size load from what file?

2014-09-12 Thread Jason Wee

anyone?

On Thursday, September 4, 2014 9:44:29 PM UTC+8, Jason Wee wrote:
>
> Hello ES,
>
> With curl showing the index statistics as below:
>
> $ curl 'http://localhost:9200/_cat/indices?v'
> health index   pri rep docs.count docs.deleted store.size pri.store.size 
> green  twitter   1   0  00   123b   123b 
>
>
> 123b is the index size? and where does this information comes from? shard 
> state file? index state file or global file?
>
> /data/nodes/0/indices/twitter/0/_state/state-2
> /data/nodes/0/indices/twitter/0/index/segments_1
> /data/nodes/0/indices/twitter/0/index/segments.gen
> /data/nodes/0/indices/twitter/_state/state-1
> /data/nodes/0/_state/global-1
>
> Thank you.
>
> Jason
>

-- 
You received this message because you are subscribed to the Google Groups 
"elasticsearch" group.
To unsubscribe from this group and stop receiving emails from it, send an email 
to elasticsearch+unsubscr...@googlegroups.com.
To view this discussion on the web visit 
https://groups.google.com/d/msgid/elasticsearch/77ca94f7-d6e1-46ee-8f10-9dde02f2620c%40googlegroups.com.
For more options, visit https://groups.google.com/d/optout.

Re: Logstash Parsing Error