Re: Any experience with ES and Data Compressing Filesystems?

2014-08-03 Thread horst knete
Hi again,

a quick report regarding compression:

we are now using a 3 TB btrfs volume with a 32k block size, which reduced the 
amount of data from 3.2 TB to 1.1 TB without any significant performance 
loss (we are using an 8-CPU, 20 GB memory machine with an iSCSI link to 
the volume).

So for our use case I can only suggest using a btrfs volume for long-term storage.
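
For reference, a rough sketch of the kind of mount setup described above, assuming the volume is /dev/sdb and the data path is /data/es (device and paths are placeholders, not our exact configuration):

mount -o compress=zlib /dev/sdb /data/es        # transparent zlib compression for new writes
# lzo is lighter on CPU if zlib hurts indexing throughput:
# mount -o compress=lzo /dev/sdb /data/es
btrfs filesystem defragment -r -czlib /data/es  # recompress data written before the option was set
btrfs filesystem df /data/es                    # check space usage afterwards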

On Monday, July 21, 2014 at 08:48:12 UTC+2, Patrick Proniewski wrote:
>
> Hi, 
>
> gzip/zlib compression is very bad for performance, so it can be 
> interesting for closed indices, but for live data I would not recommend it. 
> Also, you must know that: 
>
> Compression using LZ4 is already enabled inside indices, 
> ES/Lucene/Java usually read & write 4k blocks, 
>
> -> hence, compression is achieved on 4k blocks. If your filesystem uses 4k 
> blocks and you add FS compression, you will probably have a very small 
> gain, if any. I've tried on ZFS: 
>
> Filesystem     Size    Used    Avail   Capacity   Mounted on 
> zdata/ES-lz4   1.1T    1.9G    1.1T    0%         /zdata/ES-lz4 
> zdata/ES       1.1T    1.9G    1.1T    0%         /zdata/ES 
>
> If you are using a larger block size, like 128k, a compressed filesystem 
> does show some benefit: 
>
> Filesystem     Size    Used    Avail   Capacity   Mounted on 
> zdata/ES-lz4   1.1T    1.1G    1.1T    0%         /zdata/ES-lz4   -> compressratio  1.73x 
> zdata/ES-gzip  1.1T    901M    1.1T    0%         /zdata/ES-gzip  -> compressratio  2.27x 
> zdata/ES       1.1T    1.9G    1.1T    0%         /zdata/ES 
>
> But a filesystem block larger than 4k is very suboptimal for IO (ES reads 
> or writes one 4k block -> your FS must read or write a 128k block). 
>
> On July 21, 2014, at 07:58, horst knete wrote: 
>
> > Hey guys, 
> > 
> > we have mounted a btrfs file system with the compression method "zlib" 
> > for testing purposes on our Elasticsearch server and copied one of the 
> > indices onto the btrfs volume. Unfortunately it had no effect and the 
> > index still has a size of 50 GB :/ 
> > 
> > I will try other compression methods and report back here. 
> > 
> > On Saturday, July 19, 2014 at 07:21:20 UTC+2, Otis Gospodnetic wrote: 
> >> 
> >> Hi Horst, 
> >> 
> >> I wouldn't bother with this for the reasons Joerg mentioned, but should 
> >> you try it anyway, I'd love to hear your findings/observations. 
> >> 
> >> Otis 
> >> -- 
> >> Performance Monitoring * Log Analytics * Search Analytics 
> >> Solr & Elasticsearch Support * http://sematext.com/ 
> >> 
> >> 
> >> 
> >> On Wednesday, July 16, 2014 6:56:36 AM UTC-4, horst knete wrote: 
> >>> 
> >>> Hey Guys, 
> >>> 
> >>> to save a lot of hard disk space, we are going to use a compressing file 
> >>> system, which gives us transparent compression for the ES indices. (It 
> >>> seems ES indices compress very well; we got up to a 65% 
> >>> compression rate in some tests.) 
> >>> 
> >>> Currently the indices live on an ext4 Linux filesystem, which 
> >>> unfortunately doesn't offer transparent compression. 
> >>> 
> >>> Has anyone got experience with compressing file systems like BTRFS or 
> >>> ZFS/OpenZFS, and can you tell us whether this led to big performance losses? 
> >>> 
> >>> Thanks for responding 
>

-- 
You received this message because you are subscribed to the Google Groups 
"elasticsearch" group.
To unsubscribe from this group and stop receiving emails from it, send an email 
to elasticsearch+unsubscr...@googlegroups.com.
To view this discussion on the web visit 
https://groups.google.com/d/msgid/elasticsearch/1f9bf509-b185-4c66-99c5-d8f69e95bea8%40googlegroups.com.
For more options, visit https://groups.google.com/d/optout.


Re: boosting query howto?

2014-08-03 Thread Bernd Fehling
Hi Jörg,
thanks for the advice, it seems to be the solution for me.

Are there any API javadocs for ES?
It takes me 3 to 4 times longer to write something for ES than for Solr 
because I have to search through the sources
and there are no useful javadocs.

Bernd


On Friday, August 1, 2014 at 16:07:10 UTC+2, Jörg Prante wrote:
>
> Have you tried boosting boolean query clauses?
>
>
> http://www.elasticsearch.org/guide/en/elasticsearch/guide/current/_boosting_query_clauses.html
>
> Jörg
>
>
>
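
For anyone finding this later, a minimal curl sketch of boosting bool query clauses, roughly what the linked guide page describes (the field names "title" and "content" are just placeholders):

curl -XPOST 'localhost:9200/_search?pretty' -d '{
  "query": {
    "bool": {
      "should": [
        { "match": { "title":   { "query": "full text search", "boost": 3 } } },
        { "match": { "content": { "query": "full text search" } } }
      ]
    }
  }
}'

Clauses with a higher boost contribute more to the _score, which is often all that is needed instead of a separate boosting query.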

-- 
You received this message because you are subscribed to the Google Groups 
"elasticsearch" group.
To unsubscribe from this group and stop receiving emails from it, send an email 
to elasticsearch+unsubscr...@googlegroups.com.
To view this discussion on the web visit 
https://groups.google.com/d/msgid/elasticsearch/489eceec-df34-4cda-86ef-bb354a162848%40googlegroups.com.
For more options, visit https://groups.google.com/d/optout.


Re: Design HA ES for 16 TB logs data | Is SAN storage a good idea?

2014-08-03 Thread Mark Walkom
Heavy aggregations = lots of RAM.
For storage, use SSD if you can.

The only rule of thumb is get the best possible hardware that you can
afford.

Regards,
Mark Walkom

Infrastructure Engineer
Campaign Monitor
email: ma...@campaignmonitor.com
web: www.campaignmonitor.com


On 4 August 2014 13:09, John Cherniavsky  wrote:

> SAN question aside - what are guidelines on the balance of CPU/RAM/Storage
> so that no one thing is the obvious bottleneck.
>
> I know it depends on workload, so
>
> * For aggregation heavy workloads, about how much RAM : Storage?
>
> * For high volume, but smaller queries (individual log retrieval), what's
> the right CPU : Storage for spinning disk? Too much CPU and all the extra
> queries are waiting on the disks to return, too much disk and the CPU can't
> keep up (or does that never happen?)
>
> Obviously every configuration is different - so does anyone have
> guidelines or past experience?
>
> On Sunday, August 3, 2014 1:49:09 PM UTC-7, Jörg Prante wrote:
>>
>> A. There are many unknown factors regarding "SAN storage", e.g. how is
>> the latency and the IOPS? Most of SAN are black boxes and do not scale over
>> the number of connected hosts, so you should test it thoroughly to make an
>> educated decision. There is no simple "yes" or "no". As a matter of fact, I
>> would never use SAN, only local storage, because SAN comes with the risk of
>> being bottleneck.
>>
>> B. No matter what specifications, you should test your configuration
>> first if it fits your performance requirements, there is no "yes" or "no".
>> The minimum number of nodes is 3 to avoid situations like split brain.
>>
>> C. You should expect more throughput if you can decouple client workload
>> from server workload, but that also depends on your workload pattern and
>> your tests. For example if you must preprocess data before indexing, or
>> postprocess search results, you will welcome additional nodes as a great
>> help.
>>
>> Jörg
>>
>>
>> On Sun, Aug 3, 2014 at 9:26 PM, sirkubax 
>> wrote:
>>
>>> Hi,
>>>
>>> I'm testing/planning implementation for 16 TB data logs (1 month, daily
>>> indexes about 530GB/day). Indexes are deleted after 1 month (TTL is 1
>>> month).
>>>
>>> The documents size vary from few bytes to 1MB (average of ~3 kb).
>>>
>>> We have 2 data center, and the requirement is to provide access to
>>> dataset when one is down.
>>>
>>> My current implementation looks like this:
>>>
>>>   cluster.routing.allocation.awareness.attributes: datacenter
>>>
>>>   cluster.routing.allocation.awareness.force.datacenter.values:
>>> datacenterA,datacenterB
>>>
>>> So the indexes are located on nodes in datacenterA and datacenterB.
>>> There is 1 replica for each index, so the index/replica is  balanced
>>> between locations.
>>>
>>> The problem A:
>>>
>>>  I have been offered a SAN storage space that could be provided to any
>>> of ES node machines. Now, it index/replica scenario, I need 2 * 16 TB = 32
>>> TB disk storage. If in raid1, it makes 64TB "real world" disk storage.
>>>
>>> Providing "independent, high quality" storage may (if ES would allow)
>>> reduce the size to required 16TB. I said "if ES would allow", because up to
>>> my current knowledge, nodes can not "share" dataset. If many nodes run on a
>>> common storage, they create own, unique path. Is that correct?
>>>
>>>  Could I run ES cluster where indexes have no replica, but still, nodeX
>>> failure does not affect accessibility of nodeXdataset to the Cluster?
>>>
>>> In my current idea of indexes without replica scenario, powering off (or
>>> failure) of the "NodeXDatacenterA" would make datasetX unavailable to read
>>> in cluster, at least until I start NodeXDatacenterB that would have access
>>> to datasetX (the same path configuration). Of course NodeXDatacenterA and
>>> NodeXDatacenterB could not run both in the same time.
>>>
>>> I just guess, that workaround suggested above is not "in the ES
>>> philosophy of shared storage and self-balancing". It would make upgrade of
>>> single node problematic, less fault-tolerant, etc.
>>>
>>>  Facts that makes me think about this solution is, that I have
>>> available some "24-core, 64GH Ram, limited disk storage" machines and a
>>> 16TB SAN storage that I could mount to that machines.
>>>
>>>  Do You have any suggestion of SAN storage usage? Is that a good idea
>>> at all?
>>>
>>>  The problem B: Design
>>>
>>>  My current idea of building the environment is to order N (6-8? or
>>> more) machines with big HDD's and run "normal ES cluster" with shards and
>>> replicas stored locally.
>>>
>>> The question is: how many of them would be enough :)
>>>
>>> Providing 24-core,64GB RAM and 4TB each it would make 4 machines to run
>>> minimal cluster settings in single Datacenter, and 8 machines total for
>>> both datacenters. What do you think about possible performance.
>>>
>>> Actually to be storage-safe I would go for 6-8 TB disk storage per
>>> machine. That would allow to run on "less than 4" nodes while 

Re: Design HA ES for 16 TB logs data | Is SAN storage a good idea?

2014-08-03 Thread John Cherniavsky
SAN question aside - what are guidelines on the balance of CPU/RAM/Storage 
so that no one thing is the obvious bottleneck.

I know it depends on workload, so

* For aggregation heavy workloads, about how much RAM : Storage?

* For high volume, but smaller queries (individual log retrieval), what's 
the right CPU : Storage for spinning disk? Too much CPU and all the extra 
queries are waiting on the disks to return, too much disk and the CPU can't 
keep up (or does that never happen?)

Obviously every configuration is different - so does anyone have guidelines 
or past experience?

On Sunday, August 3, 2014 1:49:09 PM UTC-7, Jörg Prante wrote:
>
> A. There are many unknown factors regarding "SAN storage", e.g. how is the 
> latency and the IOPS? Most of SAN are black boxes and do not scale over the 
> number of connected hosts, so you should test it thoroughly to make an 
> educated decision. There is no simple "yes" or "no". As a matter of fact, I 
> would never use SAN, only local storage, because SAN comes with the risk of 
> being bottleneck.
>
> B. No matter what specifications, you should test your configuration first 
> if it fits your performance requirements, there is no "yes" or "no". The 
> minimum number of nodes is 3 to avoid situations like split brain.
>
> C. You should expect more throughput if you can decouple client workload 
> from server workload, but that also depends on your workload pattern and 
> your tests. For example if you must preprocess data before indexing, or 
> postprocess search results, you will welcome additional nodes as a great 
> help.
>
> Jörg
>
>
> On Sun, Aug 3, 2014 at 9:26 PM, sirkubax  > wrote:
>
>> Hi,
>>
>> I'm testing/planning implementation for 16 TB data logs (1 month, daily 
>> indexes about 530GB/day). Indexes are deleted after 1 month (TTL is 1 
>> month).
>>
>> The documents size vary from few bytes to 1MB (average of ~3 kb).
>>
>> We have 2 data center, and the requirement is to provide access to 
>> dataset when one is down.
>>
>> My current implementation looks like this:
>>
>>   cluster.routing.allocation.awareness.attributes: datacenter
>>
>>   cluster.routing.allocation.awareness.force.datacenter.values: 
>> datacenterA,datacenterB
>>
>> So the indexes are located on nodes in datacenterA and datacenterB. There 
>> is 1 replica for each index, so the index/replica is  balanced between 
>> locations.
>>
>> The problem A:
>>
>>  I have been offered a SAN storage space that could be provided to any 
>> of ES node machines. Now, it index/replica scenario, I need 2 * 16 TB = 32 
>> TB disk storage. If in raid1, it makes 64TB "real world" disk storage.
>>
>> Providing "independent, high quality" storage may (if ES would allow) 
>> reduce the size to required 16TB. I said "if ES would allow", because up to 
>> my current knowledge, nodes can not "share" dataset. If many nodes run on a 
>> common storage, they create own, unique path. Is that correct? 
>>
>>  Could I run ES cluster where indexes have no replica, but still, nodeX 
>> failure does not affect accessibility of nodeXdataset to the Cluster?
>>
>> In my current idea of indexes without replica scenario, powering off (or 
>> failure) of the "NodeXDatacenterA" would make datasetX unavailable to read 
>> in cluster, at least until I start NodeXDatacenterB that would have access 
>> to datasetX (the same path configuration). Of course NodeXDatacenterA and 
>> NodeXDatacenterB could not run both in the same time.
>>
>> I just guess, that workaround suggested above is not "in the ES 
>> philosophy of shared storage and self-balancing". It would make upgrade of 
>> single node problematic, less fault-tolerant, etc.
>>
>>  Facts that makes me think about this solution is, that I have available 
>> some "24-core, 64GH Ram, limited disk storage" machines and a 16TB SAN 
>> storage that I could mount to that machines.
>>
>>  Do You have any suggestion of SAN storage usage? Is that a good idea at 
>> all?
>>
>>  The problem B: Design
>>
>>  My current idea of building the environment is to order N (6-8? or 
>> more) machines with big HDD's and run "normal ES cluster" with shards and 
>> replicas stored locally.
>>
>> The question is: how many of them would be enough :)
>>
>> Providing 24-core,64GB RAM and 4TB each it would make 4 machines to run 
>> minimal cluster settings in single Datacenter, and 8 machines total for 
>> both datacenters. What do you think about possible performance. 
>>
>> Actually to be storage-safe I would go for 6-8 TB disk storage per 
>> machine. That would allow to run on "less than 4" nodes while operation in 
>> single datacenter.
>>
>> I wonder if 64GB RAM would be enough. 
>>
>> The whole process of acquiring new servers takes time - is there a "good 
>> practise" guide to determine minimum number of servers in the cluster?
>>
>>  How many shards would You suggest?
>>
>>  Question C:
>>
>>  I have seen some performance advices to make "client" ES nodes as a 
>> machine without da

Hi, I have a question about copying Elasticsearch index data. Can you give me a suggestion?

2014-08-03 Thread huangshanjay
I have built an Elasticsearch cluster in a data center in city A, and indexing 
also happens in city A.
At some point I will need the same cluster in city B, including the index 
data. Can I copy the index data from the cluster in city A to the cluster in 
city B and then continue indexing in city B?
If this idea is OK, what should I do in detail? (I have only used ES for one month.)
I am waiting for a reply. Thanks very much.

-- 
You received this message because you are subscribed to the Google Groups 
"elasticsearch" group.
To unsubscribe from this group and stop receiving emails from it, send an email 
to elasticsearch+unsubscr...@googlegroups.com.
To view this discussion on the web visit 
https://groups.google.com/d/msgid/elasticsearch/61b5483e-07cd-4a92-9676-ff95837e55ca%40googlegroups.com.
For more options, visit https://groups.google.com/d/optout.


Re: action.admin.indices.create failed to create

2014-08-03 Thread Stephen Samuel
Bit of a late reply as I never saw this, but as the other guys said, it 
sounds like you're not specifying the index name. What was the syntax you were 
using?
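
For reference, a minimal sketch of creating an index with an explicit (non-empty, lowercase) name via the REST API; "myindex" and the settings are placeholders:

curl -XPUT 'localhost:9200/myindex' -d '{
  "settings": { "number_of_shards": 5, "number_of_replicas": 1 }
}'

The StringIndexOutOfBoundsException below appears to come from validating an empty index name (note the empty [] in the log line).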

On Monday, April 21, 2014 6:37:39 PM UTC+1, miki haiat wrote:
>
> Hi,
>
> I am using elastic4s as an API client, and I can't index anything. 
> I am getting this error: 
>
> [2014-04-21 20:34:02,735][DEBUG][action.admin.indices.create] [Captain UK] 
> [] failed to create
> java.lang.StringIndexOutOfBoundsException: String index out of range: 0
>  at java.lang.String.charAt(String.java:658)
>  at org.elasticsearch.cluster.metadata.MetaDataCreateIndexService.
> validateIndexName(MetaDataCreateIndexService.java:168)
>  at org.elasticsearch.cluster.metadata.MetaDataCreateIndexService.validate
> (MetaDataCreateIndexService.java:523)
>  at org.elasticsearch.cluster.metadata.MetaDataCreateIndexService.
> access$100(MetaDataCreateIndexService.java:87)
>  at org.elasticsearch.cluster.metadata.MetaDataCreateIndexService$2.
> execute(MetaDataCreateIndexService.java:220)
>  at org.elasticsearch.cluster.service.InternalClusterService$UpdateTask.
> run(InternalClusterService.java:308)
>  at org.elasticsearch.common.util.concurrent.
> PrioritizedEsThreadPoolExecutor$TieBreakingPrioritizedRunnable.run(
> PrioritizedEsThreadPoolExecutor.java:134)
>  at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.
> java:1145)
>  at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.
> java:615)
>  at java.lang.Thread.run(Thread.java:745)
>
> I tried several methods and options but the result is the same.
>
> What am I doing wrong?
>
> Thanks, 
> miki
>

-- 
You received this message because you are subscribed to the Google Groups 
"elasticsearch" group.
To unsubscribe from this group and stop receiving emails from it, send an email 
to elasticsearch+unsubscr...@googlegroups.com.
To view this discussion on the web visit 
https://groups.google.com/d/msgid/elasticsearch/8064b263-fcaa-47b2-a7ab-5e77717a59a7%40googlegroups.com.
For more options, visit https://groups.google.com/d/optout.


Re: rest api or java client?

2014-08-03 Thread Stephen Samuel
It does support 2.11 of course.

And about the Java client documentation - one more reason to use the Scala 
DSL in Elastic4s as you'll get code completion.

For example you can do this

`search in "places"->"cities" query "paris" start 5 limit 10` and each step 
of the way the DSL will let you know what's applicable for the syntax.

On Friday, July 25, 2014 11:06:30 AM UTC+1, CB wrote:
>
> thanks for the answers, here are my thoughts:
>
> 1. If using pure REST client - Using a Load Balancer will make sure that 
> the endpoint address goes to any of the "live" nodes (round robin) so that 
> if one of those nodes "dies" or if I scale out the cluster (add more nodes) 
> it is transparent to the client. Does that make sense?
>
> 2. Jörg - can you please provide more details / a link explaining why 
> and how the "REST API sits on top of a Java client"?
>
> 3. The java client is fine but the documentation of the actual query API 
> is pretty basic and will always send you to the REST documentation. I found 
> it hard to "translate" the REST API docs to native java client APIs
>
> elastic4s seems very promising, although not sure it supports scala 2.11. 
> I might give it a spin - thanks for the tip ;)
>
> BTW - do you know if the Java client uses a binary protocol? That 
> might be a big advantage over REST for large query results.
>
>
> On Friday, July 25, 2014 10:59:43 AM UTC+3, Jörg Prante wrote:
>>
>> 1. No. ES is already managing connections, see TransportClient
>>
>> 2. REST API sits on top of native Java client. So, because of HTTP, you 
>> have overhead with REST. Async call API with HTTP is a mess.
>>
>> 3. All actions are routed automatically to the relevant shards only, no 
>> matter what client.
>>
>> 4. There are scala clients out there like elastic4s that wrap the native 
>> Java API, so I wonder why you do not use them?
>>
>> Jörg
>>
>>
>> On Fri, Jul 25, 2014 at 8:25 AM, CB  wrote:
>>
>>> hi all,
>>>
>>> i'm new to elastic search and would like to ask some basic questions.
>>>
>>> we are developing a system based on the play framework (non blocking io, 
>>> event loop, scala)
>>>
>>> we are currently working with elastic search through the rest api which 
>>> is working ok in dev. we are concerned about performance once we move to 
>>> production environment. here are some questions:
>>>
>>> 1. can i point the rest api end point to a load balancer configured in 
>>> front of the ES cluster? is that a common best practice?
>>>
>>> 2. is there any performance boost if we switch from rest api calls to 
>>> native java client? if so - is it lagging behind with features?
>>>
>>> 3. java client - is this a smart client? meaning - can the client direct 
>>> the queries  to the relevant shard / shards for faster result retrieval?
>>>
>>> 4. any other advice / suggestion in regards to native client vs REST API 
>>> for using ES?
>>>
>>> thanks!
>>> CB
>>>
>>> -- 
>>> You received this message because you are subscribed to the Google 
>>> Groups "elasticsearch" group.
>>> To unsubscribe from this group and stop receiving emails from it, send 
>>> an email to elasticsearc...@googlegroups.com.
>>> To view this discussion on the web visit 
>>> https://groups.google.com/d/msgid/elasticsearch/3ca59232-8462-4e66-8400-8a5aca18fe0c%40googlegroups.com
>>>  
>>> 
>>> .
>>> For more options, visit https://groups.google.com/d/optout.
>>>
>>
>>

-- 
You received this message because you are subscribed to the Google Groups 
"elasticsearch" group.
To unsubscribe from this group and stop receiving emails from it, send an email 
to elasticsearch+unsubscr...@googlegroups.com.
To view this discussion on the web visit 
https://groups.google.com/d/msgid/elasticsearch/782eb968-5e09-452f-8c91-004b997f8f04%40googlegroups.com.
For more options, visit https://groups.google.com/d/optout.


Re: Shard rebalancing

2014-08-03 Thread Mark Walkom
Shard size will depend entirely on how many shards you've set and how big
the index is.
Allocation of data to shards happens in a round-robin manner, so balancing
isn't needed.

What do you mean by shards changing in the background?
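
If you want to watch what the shards are actually doing, a quick sketch using the _cat APIs (available in ES 1.0+):

curl 'localhost:9200/_cat/shards?v'     # where each shard lives, its state and size
curl 'localhost:9200/_cat/recovery?v'   # any relocations/recoveries currently in progress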

Regards,
Mark Walkom

Infrastructure Engineer
Campaign Monitor
email: ma...@campaignmonitor.com
web: www.campaignmonitor.com


On 4 August 2014 04:45, 'Sandeep Ramesh Khanzode' via elasticsearch <
elasticsearch@googlegroups.com> wrote:

> What is the behavior of ES when it comes to shard sizes? Does it do
> automatic shard rebalancing at any point of time? If so, is it also
> controlled through an API?
>
> How can I know if the shards are changing in the background? If I do not
> add any new node or change any cluster configuration once indexing has
> started, is there any pattern to this behavior? Please let me know.
>
> Thanks,
> Sandeep
>
> --
> You received this message because you are subscribed to the Google Groups
> "elasticsearch" group.
> To unsubscribe from this group and stop receiving emails from it, send an
> email to elasticsearch+unsubscr...@googlegroups.com.
> To view this discussion on the web visit
> https://groups.google.com/d/msgid/elasticsearch/9f3dee22-0c33-4446-a0dc-eaf1a314d2c4%40googlegroups.com
> 
> .
> For more options, visit https://groups.google.com/d/optout.
>

-- 
You received this message because you are subscribed to the Google Groups 
"elasticsearch" group.
To unsubscribe from this group and stop receiving emails from it, send an email 
to elasticsearch+unsubscr...@googlegroups.com.
To view this discussion on the web visit 
https://groups.google.com/d/msgid/elasticsearch/CAEM624aCRon%2B1%2BFWKnC7tkKU5oMYyHiONGaOiTJs%3DCHhE%3DezoA%40mail.gmail.com.
For more options, visit https://groups.google.com/d/optout.


Re: Guice Creation Error

2014-08-03 Thread David Pilato
Do you create a new client for each request?
If so, create only one client when your application starts.

--
David ;-)
Twitter : @dadoonet / @elasticsearchfr / @scrutmydocs


On August 4, 2014 at 00:42, Subacini B wrote:

Hi,

We are using Spring Framework with elasticsearch.


ES Version : 1.2.1
Code

Client client = new TransportClient().addTransportAddress(
        new InetSocketTransportAddress(url, 9300));

SearchResponse response = client.prepareSearch("escore")
        .addAggregation(AggregationBuilders.cardinality("CUSTOMER").field("CUSTOMER"))
        .setSize(0).execute().actionGet();

client.close();


It works fine for the first few hits; after that it throws the exception 
below. Is there anything wrong with the code? I'd really appreciate any 
pointers/help on this.


java.lang.RuntimeException: org.apache.cxf.interceptor.Fault: Guice creation 
errors:

1) Error injecting constructor, java.lang.OutOfMemoryError: unable to create 
new native thread
  at org.elasticsearch.threadpool.ThreadPool.<init>(Unknown Source)
  while locating org.elasticsearch.threadpool.ThreadPool
Caused by: java.lang.OutOfMemoryError: unable to create new native thread
at java.lang.Thread.start0(Native Method)
at java.lang.Thread.start(Thread.java:713)
at org.elasticsearch.threadpool.ThreadPool.<init>(ThreadPool.java:144)
at sun.reflect.GeneratedConstructorAccessor69.newInstance(Unknown 
Source)
at 
sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:45)
at java.lang.reflect.Constructor.newInstance(Constructor.java:526)
at 
org.elasticsearch.common.inject.DefaultConstructionProxyFactory$1.newInstance(DefaultConstructionProxyFactory.java:54)
at 
org.elasticsearch.common.inject.ConstructorInjector.construct(ConstructorInjector.java:86)
at 
org.elasticsearch.common.inject.ConstructorBindingImpl$Factory.get(ConstructorBindingImpl.java:98)
at 
org.elasticsearch.common.inject.ProviderToInternalFactoryAdapter$1.call(ProviderToInternalFactoryAdapter.java:45)
at 
org.elasticsearch.common.inject.InjectorImpl.callInContext(InjectorImpl.java:837)
at 
org.elasticsearch.common.inject.ProviderToInternalFactoryAdapter.get(ProviderToInternalFactoryAdapter.java:42)
at org.elasticsearch.common.inject.Scopes$1$1.get(Scopes.java:57)
at 
org.elasticsearch.common.inject.InternalFactoryToProviderAdapter.get(InternalFactoryToProviderAdapter.java:45)
at 
org.elasticsearch.common.inject.InjectorBuilder$1.call(InjectorBuilder.java:200)
at 
org.elasticsearch.common.inject.InjectorBuilder$1.call(InjectorBuilder.java:193)
at 
org.elasticsearch.common.inject.InjectorImpl.callInContext(InjectorImpl.java:830)
at 
org.elasticsearch.common.inject.InjectorBuilder.loadEagerSingletons(InjectorBuilder.java:193)
at 
org.elasticsearch.common.inject.InjectorBuilder.injectDynamically(InjectorBuilder.java:175)
at 
org.elasticsearch.common.inject.InjectorBuilder.build(InjectorBuilder.java:110)
at org.elasticsearch.common.inject.Guice.createInjector(Guice.java:93)
at org.elasticsearch.common.inject.Guice.createInjector(Guice.java:70)
at 
org.elasticsearch.common.inject.ModulesBuilder.createInjector(ModulesBuilder.java:59)
at 
org.elasticsearch.client.transport.TransportClient.<init>(TransportClient.java:187)
at 
org.elasticsearch.client.transport.TransportClient.<init>(TransportClient.java:117)
-- 
You received this message because you are subscribed to the Google Groups 
"elasticsearch" group.
To unsubscribe from this group and stop receiving emails from it, send an email 
to elasticsearch+unsubscr...@googlegroups.com.
To view this discussion on the web visit 
https://groups.google.com/d/msgid/elasticsearch/CAJfX5E2rVXE7eVe08JTkgx8rpmOQPsFCjUdEgrxz8rZuurBEog%40mail.gmail.com.
For more options, visit https://groups.google.com/d/optout.

-- 
You received this message because you are subscribed to the Google Groups 
"elasticsearch" group.
To unsubscribe from this group and stop receiving emails from it, send an email 
to elasticsearch+unsubscr...@googlegroups.com.
To view this discussion on the web visit 
https://groups.google.com/d/msgid/elasticsearch/D2EB8DCB-6E3B-4324-981F-22273EE46808%40pilato.fr.
For more options, visit https://groups.google.com/d/optout.


Re: If I have ELK stack running on EC2. How can I make the ES as a cluster?

2014-08-03 Thread Mark Walkom
ES can take disk space into account, 1.3.X does this automatically -
http://www.elasticsearch.org/guide/en/elasticsearch/reference/current/index-modules-allocation.html#disk
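
The watermarks can also be adjusted at runtime; a minimal sketch (assuming ES 1.3+, and the values shown are roughly the defaults):

curl -XPUT 'localhost:9200/_cluster/settings' -d '{
  "transient": {
    "cluster.routing.allocation.disk.threshold_enabled": true,
    "cluster.routing.allocation.disk.watermark.low": "85%",
    "cluster.routing.allocation.disk.watermark.high": "90%"
  }
}'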

Regards,
Mark Walkom

Infrastructure Engineer
Campaign Monitor
email: ma...@campaignmonitor.com
web: www.campaignmonitor.com


On 4 August 2014 03:36, Aaron Mefford  wrote:

> I don't know that ES has any intelligence to support varied node sizes so
> I would say yes they should be the same size. I've not looked into this so
> I may be wrong.
>
> Also, I use multiple EBS volumes in a software RAID to increase
> non-provisioned IOPS.  Not necessary if you use PIOPS.
>
> Aaron
>
>
> Sent from my iPhone
>
> On Aug 2, 2014, at 9:41 AM, vjbangis  wrote:
>
> Thanks to that Aaron.
>
> BTW, should the "data" path be the same size and path with each other
> (node/s).
>
> let say
> Node* 1*
> path.data: /data/elasticsearch
> size = 100GB (EBS)
>
> Node *2*
> path.data: /data/elasticsearch
> size = 100GB (EBS)
>
>  --
> You received this message because you are subscribed to a topic in the
> Google Groups "elasticsearch" group.
> To unsubscribe from this topic, visit
> https://groups.google.com/d/topic/elasticsearch/GVs4RKYRGXs/unsubscribe.
> To unsubscribe from this group and all its topics, send an email to
> elasticsearch+unsubscr...@googlegroups.com.
> To view this discussion on the web visit
> https://groups.google.com/d/msgid/elasticsearch/7b5aa1bc-dc8b-48bd-a2c8-f6ee3ef3e809%40googlegroups.com
> 
> .
> For more options, visit https://groups.google.com/d/optout.
>
>  --
> You received this message because you are subscribed to the Google Groups
> "elasticsearch" group.
> To unsubscribe from this group and stop receiving emails from it, send an
> email to elasticsearch+unsubscr...@googlegroups.com.
> To view this discussion on the web visit
> https://groups.google.com/d/msgid/elasticsearch/-611812713062967954%40unknownmsgid
> 
> .
> For more options, visit https://groups.google.com/d/optout.
>

-- 
You received this message because you are subscribed to the Google Groups 
"elasticsearch" group.
To unsubscribe from this group and stop receiving emails from it, send an email 
to elasticsearch+unsubscr...@googlegroups.com.
To view this discussion on the web visit 
https://groups.google.com/d/msgid/elasticsearch/CAEM624ZH7S_hbh6RYmqpoN5qJX5M6MRdLEtH_WUbXTcbwrGNXw%40mail.gmail.com.
For more options, visit https://groups.google.com/d/optout.


Guice Creation Error

2014-08-03 Thread Subacini B
Hi,

We are using Spring Framework with elasticsearch.


ES Version: 1.2.1

Code:

Client client = new TransportClient().addTransportAddress(
        new InetSocketTransportAddress(url, 9300));

SearchResponse response = client.prepareSearch("escore")
        .addAggregation(AggregationBuilders.cardinality("CUSTOMER").field("CUSTOMER"))
        .setSize(0).execute().actionGet();

client.close();


It works fine for the first few hits; after that it throws the
exception below. Is there anything wrong with the code? I'd really appreciate
any pointers/help on this.


java.lang.RuntimeException: org.apache.cxf.interceptor.Fault: Guice creation errors:

1) Error injecting constructor, java.lang.OutOfMemoryError: unable to create new native thread
  at org.elasticsearch.threadpool.ThreadPool.<init>(Unknown Source)
  while locating org.elasticsearch.threadpool.ThreadPool
Caused by: java.lang.OutOfMemoryError: unable to create new native thread
at java.lang.Thread.start0(Native Method)
at java.lang.Thread.start(Thread.java:713)
at org.elasticsearch.threadpool.ThreadPool.<init>(ThreadPool.java:144)
at sun.reflect.GeneratedConstructorAccessor69.newInstance(Unknown 
Source)
at 
sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:45)
at java.lang.reflect.Constructor.newInstance(Constructor.java:526)
at 
org.elasticsearch.common.inject.DefaultConstructionProxyFactory$1.newInstance(DefaultConstructionProxyFactory.java:54)
at 
org.elasticsearch.common.inject.ConstructorInjector.construct(ConstructorInjector.java:86)
at 
org.elasticsearch.common.inject.ConstructorBindingImpl$Factory.get(ConstructorBindingImpl.java:98)
at 
org.elasticsearch.common.inject.ProviderToInternalFactoryAdapter$1.call(ProviderToInternalFactoryAdapter.java:45)
at 
org.elasticsearch.common.inject.InjectorImpl.callInContext(InjectorImpl.java:837)
at 
org.elasticsearch.common.inject.ProviderToInternalFactoryAdapter.get(ProviderToInternalFactoryAdapter.java:42)
at org.elasticsearch.common.inject.Scopes$1$1.get(Scopes.java:57)
at 
org.elasticsearch.common.inject.InternalFactoryToProviderAdapter.get(InternalFactoryToProviderAdapter.java:45)
at 
org.elasticsearch.common.inject.InjectorBuilder$1.call(InjectorBuilder.java:200)
at 
org.elasticsearch.common.inject.InjectorBuilder$1.call(InjectorBuilder.java:193)
at 
org.elasticsearch.common.inject.InjectorImpl.callInContext(InjectorImpl.java:830)
at 
org.elasticsearch.common.inject.InjectorBuilder.loadEagerSingletons(InjectorBuilder.java:193)
at 
org.elasticsearch.common.inject.InjectorBuilder.injectDynamically(InjectorBuilder.java:175)
at 
org.elasticsearch.common.inject.InjectorBuilder.build(InjectorBuilder.java:110)
at org.elasticsearch.common.inject.Guice.createInjector(Guice.java:93)
at org.elasticsearch.common.inject.Guice.createInjector(Guice.java:70)
at 
org.elasticsearch.common.inject.ModulesBuilder.createInjector(ModulesBuilder.java:59)
at 
org.elasticsearch.client.transport.TransportClient.<init>(TransportClient.java:187)
at 
org.elasticsearch.client.transport.TransportClient.<init>(TransportClient.java:117)

-- 
You received this message because you are subscribed to the Google Groups 
"elasticsearch" group.
To unsubscribe from this group and stop receiving emails from it, send an email 
to elasticsearch+unsubscr...@googlegroups.com.
To view this discussion on the web visit 
https://groups.google.com/d/msgid/elasticsearch/CAJfX5E2rVXE7eVe08JTkgx8rpmOQPsFCjUdEgrxz8rZuurBEog%40mail.gmail.com.
For more options, visit https://groups.google.com/d/optout.


Re: Design HA ES for 16 TB logs data | Is SAN storage a good idea?

2014-08-03 Thread joergpra...@gmail.com
A. There are many unknown factors regarding "SAN storage", e.g. what are the
latency and the IOPS? Most SANs are black boxes and do not scale with the
number of connected hosts, so you should test it thoroughly to make an
educated decision. There is no simple "yes" or "no". As a matter of fact, I
would never use a SAN, only local storage, because a SAN comes with the risk of
becoming a bottleneck.

B. No matter what the specifications are, you should first test whether your
configuration fits your performance requirements; there is no "yes" or "no".
The minimum number of nodes is 3 to avoid situations like split brain.
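
A minimal sketch of the setting that enforces this quorum, assuming 3 master-eligible nodes; it can be set in elasticsearch.yml or dynamically:

curl -XPUT 'localhost:9200/_cluster/settings' -d '{
  "persistent": { "discovery.zen.minimum_master_nodes": 2 }
}'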

C. You should expect more throughput if you can decouple client workload
from server workload, but that also depends on your workload pattern and
your tests. For example if you must preprocess data before indexing, or
postprocess search results, you will welcome additional nodes as a great
help.

Jörg


On Sun, Aug 3, 2014 at 9:26 PM, sirkubax 
wrote:

> Hi,
>
> I'm testing/planning implementation for 16 TB data logs (1 month, daily
> indexes about 530GB/day). Indexes are deleted after 1 month (TTL is 1
> month).
>
> The documents size vary from few bytes to 1MB (average of ~3 kb).
>
> We have 2 data center, and the requirement is to provide access to dataset
> when one is down.
>
> My current implementation looks like this:
>
>   cluster.routing.allocation.awareness.attributes: datacenter
>
>   cluster.routing.allocation.awareness.force.datacenter.values:
> datacenterA,datacenterB
>
> So the indexes are located on nodes in datacenterA and datacenterB. There
> is 1 replica for each index, so the index/replica is  balanced between
> locations.
>
> The problem A:
>
> I have been offered a SAN storage space that could be provided to any of
> ES node machines. Now, it index/replica scenario, I need 2 * 16 TB = 32 TB
> disk storage. If in raid1, it makes 64TB "real world" disk storage.
>
> Providing "independent, high quality" storage may (if ES would allow)
> reduce the size to required 16TB. I said "if ES would allow", because up to
> my current knowledge, nodes can not "share" dataset. If many nodes run on a
> common storage, they create own, unique path. Is that correct?
>
>  Could I run ES cluster where indexes have no replica, but still, nodeX
> failure does not affect accessibility of nodeXdataset to the Cluster?
>
> In my current idea of indexes without replica scenario, powering off (or
> failure) of the "NodeXDatacenterA" would make datasetX unavailable to read
> in cluster, at least until I start NodeXDatacenterB that would have access
> to datasetX (the same path configuration). Of course NodeXDatacenterA and
> NodeXDatacenterB could not run both in the same time.
>
> I just guess, that workaround suggested above is not "in the ES philosophy
> of shared storage and self-balancing". It would make upgrade of single node
> problematic, less fault-tolerant, etc.
>
>  Facts that makes me think about this solution is, that I have available
> some "24-core, 64GH Ram, limited disk storage" machines and a 16TB SAN
> storage that I could mount to that machines.
>
>  Do You have any suggestion of SAN storage usage? Is that a good idea at
> all?
>
> The problem B: Design
>
> My current idea of building the environment is to order N (6-8? or more)
> machines with big HDD's and run "normal ES cluster" with shards and
> replicas stored locally.
>
> The question is: how many of them would be enough :)
>
> Providing 24-core,64GB RAM and 4TB each it would make 4 machines to run
> minimal cluster settings in single Datacenter, and 8 machines total for
> both datacenters. What do you think about possible performance.
>
> Actually to be storage-safe I would go for 6-8 TB disk storage per
> machine. That would allow to run on "less than 4" nodes while operation in
> single datacenter.
>
> I wonder if 64GB RAM would be enough.
>
> The whole process of acquiring new servers takes time - is there a "good
> practise" guide to determine minimum number of servers in the cluster?
>
>  How many shards would You suggest?
>
> Question C:
>
> I have seen some performance advices to make "client" ES nodes as a
> machine without data storage so it would not suffer from I/O issues. If
> having 2 of them, how would you scale it?
>
> Do you think it's worth having 2 client-only machines, or better 2 more
> "complete" nodes with data storage, as extra nodes to ES cluster (so 10
> instead of 8 nodes).
>
>
>
>  --
> You received this message because you are subscribed to the Google Groups
> "elasticsearch" group.
> To unsubscribe from this group and stop receiving emails from it, send an
> email to elasticsearch+unsubscr...@googlegroups.com.
> To view this discussion on the web visit
> https://groups.google.com/d/msgid/elasticsearch/0565daed-f398-48da-be62-8646844581d0%40googlegroups.com
> 
> .
> For more

Design HA ES for 16 TB logs data | Is SAN storage a good idea?

2014-08-03 Thread sirkubax


Hi,

I'm testing/planning an implementation for 16 TB of log data (1 month of daily 
indexes, about 530 GB/day). Indexes are deleted after 1 month (the TTL is 1 
month).

The document sizes vary from a few bytes to 1 MB (average of ~3 kB).

We have 2 data centers, and the requirement is to provide access to the dataset 
when one is down.

My current implementation looks like this:

  cluster.routing.allocation.awareness.attributes: datacenter

  cluster.routing.allocation.awareness.force.datacenter.values: 
datacenterA,datacenterB

So the indexes are located on nodes in datacenterA and datacenterB. There 
is 1 replica for each index, so the index/replica is  balanced between 
locations.
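
For reference, a sketch of how the per-index replica count behind this is set, assuming daily indices named like logstash-YYYY.MM.DD (the name is a placeholder):

curl -XPUT 'localhost:9200/logstash-2014.08.03/_settings' -d '{
  "index.number_of_replicas": 1
}'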

The problem A:

I have been offered SAN storage space that could be provided to any of the ES 
node machines. Now, in the index/replica scenario, I need 2 * 16 TB = 32 TB of 
disk storage. If in RAID1, that makes 64 TB of "real world" disk storage.

Providing "independent, high quality" storage may (if ES allowed it) 
reduce the size to the required 16 TB. I said "if ES allowed it" because, to 
my current knowledge, nodes cannot "share" a dataset. If many nodes run on 
common storage, they each create their own, unique path. Is that correct? 

 Could I run an ES cluster where indexes have no replica, but a nodeX 
failure still does not affect the accessibility of nodeX's dataset to the cluster?

In my current idea of the no-replica scenario, powering off (or failure of) 
"NodeXDatacenterA" would make datasetX unavailable for reading in the cluster, 
at least until I start NodeXDatacenterB, which would have access 
to datasetX (the same path configuration). Of course, NodeXDatacenterA and 
NodeXDatacenterB could not both run at the same time.

I just guess that the workaround suggested above is not "in the ES philosophy" 
of shared storage and self-balancing. It would make upgrading a single node 
problematic, be less fault-tolerant, etc.

 The fact that makes me think about this solution is that I have some 
"24-core, 64 GB RAM, limited disk storage" machines available and 16 TB of SAN 
storage that I could mount on those machines.

 Do you have any suggestions on SAN storage usage? Is that a good idea at 
all?

The problem B: Design

My current idea for building the environment is to order N (6-8? or more) 
machines with big HDDs and run a "normal ES cluster" with shards and 
replicas stored locally.

The question is: how many of them would be enough? :)

With 24 cores, 64 GB RAM and 4 TB each, it would take 4 machines to run a 
minimal cluster in a single datacenter, and 8 machines in total for 
both datacenters. What do you think about the possible performance? 

Actually, to be storage-safe I would go for 6-8 TB of disk storage per machine. 
That would allow running on "less than 4" nodes while operating in a single 
datacenter.

I wonder if 64 GB RAM would be enough. 

The whole process of acquiring new servers takes time - is there a "good 
practice" guide to determine the minimum number of servers in the cluster?

 How many shards would you suggest?

Question C:

I have seen some performance advice to run "client" ES nodes on machines 
without data storage so they would not suffer from I/O issues. If I had 2 of 
them, how would you scale that?

Do you think it's worth having 2 client-only machines, or would 2 more 
"complete" nodes with data storage be better, as extra nodes in the ES cluster 
(so 10 instead of 8 nodes)?



-- 
You received this message because you are subscribed to the Google Groups 
"elasticsearch" group.
To unsubscribe from this group and stop receiving emails from it, send an email 
to elasticsearch+unsubscr...@googlegroups.com.
To view this discussion on the web visit 
https://groups.google.com/d/msgid/elasticsearch/0565daed-f398-48da-be62-8646844581d0%40googlegroups.com.
For more options, visit https://groups.google.com/d/optout.


Re: Elasticsearch field mapping, dynamic_templates

2014-08-03 Thread sirkubax
I did migrate to ES 1.3.1.

I tried to do the same trick, but it fails to PUT the original, just-dumped 
settings back.
Any ideas?

curl -XGET localhost:9200/_template?pretty > template_all

curl -XPUT localhost:9200/_template/*?pretty -d @template_all

{
  "error" : "ActionRequestValidationException[Validation Failed: 1: template is missing;]",
  "status" : 500
}
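
A sketch of what seems to work instead, one template at a time, assuming a template named "logstash" (the name is a placeholder). The GET response wraps each template under its name, so only the inner object can be PUT back:

curl -XGET 'localhost:9200/_template/logstash?pretty' > logstash_template.json
# edit the file so only the inner { "template": ..., "mappings": ... } body remains, then:
curl -XPUT 'localhost:9200/_template/logstash' -d @logstash_template.json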
 

-- 
You received this message because you are subscribed to the Google Groups 
"elasticsearch" group.
To unsubscribe from this group and stop receiving emails from it, send an email 
to elasticsearch+unsubscr...@googlegroups.com.
To view this discussion on the web visit 
https://groups.google.com/d/msgid/elasticsearch/b0981737-8788-4e90-8f2d-e8afc345c1a3%40googlegroups.com.
For more options, visit https://groups.google.com/d/optout.


Re: Route query so that data for a shard is localized

2014-08-03 Thread joergpra...@gmail.com
Have you consulted the docs

http://www.elasticsearch.org/guide/en/elasticsearch/reference/current/query-dsl-terms-filter.html#_terms_lookup_mechanism

about the optimizations of term lookup for TermFilter?

There are caches in use, and for term lookup, you can also use routing to
select a particular shard.
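
A minimal curl sketch of that terms lookup with routing (the index, type, field and id names below are placeholders):

curl -XPOST 'localhost:9200/records/_search?pretty' -d '{
  "query": {
    "filtered": {
      "filter": {
        "terms": {
          "userId": {
            "index": "lookup",
            "type": "idlist",
            "id": "group-1",
            "path": "userIds",
            "routing": "group-1"
          }
        }
      }
    }
  }
}'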

Regarding the "tree-like data mapping": ES rolls the tree notation into a
flat format to make use of the Lucene API for fields in documents. There is
no performance implication with this. If you decide to use an extraordinarily
high number of fields (>>1000), you will notice that each field consumes a bit
of RAM, but this is not related to a "tree-like data mapping".

Jörg



On Sun, Aug 3, 2014 at 8:37 PM, 'Sandeep Ramesh Khanzode' via elasticsearch
 wrote:

> Hi,
>
> I have fairly large data and a ES cluster. Can I use some shard knowledge
> to execute queries so that only data relevant to a particular shard is
> fetched for that shard/node? I want to make sure that if I have a filter,
> then the values in the TermFilter only hold records that are relevant to
> the shard it will act upon. Is this a known problem? If so, how is it
> solved?
>
> Is there any performance implication in using the tree-like data mapping
> in ES? I am evaluating it now, and I wanted to know if it is feasible to
> maintain a treelike structure in ES, or just split it into multiple records
> or multiple indices?
>
> Thanks,
> Sandeep
>
> --
> You received this message because you are subscribed to the Google Groups
> "elasticsearch" group.
> To unsubscribe from this group and stop receiving emails from it, send an
> email to elasticsearch+unsubscr...@googlegroups.com.
> To view this discussion on the web visit
> https://groups.google.com/d/msgid/elasticsearch/aef299d8-65f2-4b34-a2ae-8c9abeb9a7b2%40googlegroups.com
> 
> .
> For more options, visit https://groups.google.com/d/optout.
>

-- 
You received this message because you are subscribed to the Google Groups 
"elasticsearch" group.
To unsubscribe from this group and stop receiving emails from it, send an email 
to elasticsearch+unsubscr...@googlegroups.com.
To view this discussion on the web visit 
https://groups.google.com/d/msgid/elasticsearch/CAKdsXoFy48Ga63bH3Q8bmOwa-sRH4yVVODOw9NhxJ0YQD8AC7A%40mail.gmail.com.
For more options, visit https://groups.google.com/d/optout.


Shard rebalancing

2014-08-03 Thread 'Sandeep Ramesh Khanzode' via elasticsearch
What is the behavior of ES when it comes to shard sizes? Does it do 
automatic shard rebalancing at any point of time? If so, is it also 
controlled through an API? 

How can I know if the shards are changing in the background? If I do not 
add any new node or change any cluster configuration once indexing has 
started, is there any pattern to this behavior? Please let me know.

Thanks,
Sandeep

-- 
You received this message because you are subscribed to the Google Groups 
"elasticsearch" group.
To unsubscribe from this group and stop receiving emails from it, send an email 
to elasticsearch+unsubscr...@googlegroups.com.
To view this discussion on the web visit 
https://groups.google.com/d/msgid/elasticsearch/9f3dee22-0c33-4446-a0dc-eaf1a314d2c4%40googlegroups.com.
For more options, visit https://groups.google.com/d/optout.


Route query so that data for a shard is localized

2014-08-03 Thread 'Sandeep Ramesh Khanzode' via elasticsearch
Hi,

I have a fairly large amount of data and an ES cluster. Can I use some shard knowledge 
to execute queries so that only data relevant to a particular shard is 
fetched for that shard/node? I want to make sure that if I have a filter, 
then the values in the TermFilter only hold records that are relevant to 
the shard it will act upon. Is this a known problem? If so, how is it 
solved?

Is there any performance implication in using the tree-like data mapping in 
ES? I am evaluating it now, and I wanted to know if it is feasible to 
maintain a treelike structure in ES, or just split it into multiple records 
or multiple indices?

Thanks, 
Sandeep

-- 
You received this message because you are subscribed to the Google Groups 
"elasticsearch" group.
To unsubscribe from this group and stop receiving emails from it, send an email 
to elasticsearch+unsubscr...@googlegroups.com.
To view this discussion on the web visit 
https://groups.google.com/d/msgid/elasticsearch/aef299d8-65f2-4b34-a2ae-8c9abeb9a7b2%40googlegroups.com.
For more options, visit https://groups.google.com/d/optout.


Re: If I have ELK stack running on EC2. How can I make the ES as a cluster?

2014-08-03 Thread Aaron Mefford
I don't know that ES has any intelligence to support varied node sizes, so I
would say yes, they should be the same size. I've not looked into this, so I
may be wrong.

Also, I use multiple EBS volumes in a software RAID to increase
non-provisioned IOPS.  Not necessary if you use PIOPS.
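
A rough sketch of that kind of setup, assuming two EBS volumes attached as /dev/xvdf and /dev/xvdg (device names and mount point are placeholders):

mdadm --create /dev/md0 --level=0 --raid-devices=2 /dev/xvdf /dev/xvdg   # RAID0 stripe for more IOPS
mkfs.ext4 /dev/md0
mount /dev/md0 /data/elasticsearch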

Aaron


Sent from my iPhone

On Aug 2, 2014, at 9:41 AM, vjbangis  wrote:

Thanks to that Aaron.

BTW, should the "data" path be the same size and the same path on each
node?

let say
Node* 1*
path.data: /data/elasticsearch
size = 100GB (EBS)

Node *2*
path.data: /data/elasticsearch
size = 100GB (EBS)

 --
You received this message because you are subscribed to a topic in the
Google Groups "elasticsearch" group.
To unsubscribe from this topic, visit
https://groups.google.com/d/topic/elasticsearch/GVs4RKYRGXs/unsubscribe.
To unsubscribe from this group and all its topics, send an email to
elasticsearch+unsubscr...@googlegroups.com.
To view this discussion on the web visit
https://groups.google.com/d/msgid/elasticsearch/7b5aa1bc-dc8b-48bd-a2c8-f6ee3ef3e809%40googlegroups.com

.
For more options, visit https://groups.google.com/d/optout.

-- 
You received this message because you are subscribed to the Google Groups 
"elasticsearch" group.
To unsubscribe from this group and stop receiving emails from it, send an email 
to elasticsearch+unsubscr...@googlegroups.com.
To view this discussion on the web visit 
https://groups.google.com/d/msgid/elasticsearch/-611812713062967954%40unknownmsgid.
For more options, visit https://groups.google.com/d/optout.


How to find null_value in query_string like we can in missing filter

2014-08-03 Thread pulkitsinghal
I'm using elasticsearch v0.90.5

With a missing filter, we can track missing fields:
http://www.elasticsearch.org/guide/en/elasticsearch/reference/current/query-dsl-missing-filter.html
and make sure that a null_value also counts as missing.
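
For reference, a minimal sketch of that missing filter form (the field name "user" is a placeholder):

curl -XPOST 'localhost:9200/myindex/_search?pretty' -d '{
  "query": {
    "filtered": {
      "filter": {
        "missing": { "field": "user", "existence": true, "null_value": true }
      }
    }
  }
}'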

How can we do the same in a query_string?
http://www.elasticsearch.org/guide/en/elasticsearch/reference/current/query-dsl-query-string-query.html#_field_names
Based on my test so far, the non-existent field counts as missing but the 
null_value field counts as present.

How should I write my query?

-- 
You received this message because you are subscribed to the Google Groups 
"elasticsearch" group.
To unsubscribe from this group and stop receiving emails from it, send an email 
to elasticsearch+unsubscr...@googlegroups.com.
To view this discussion on the web visit 
https://groups.google.com/d/msgid/elasticsearch/c51f170b-3775-4b8a-909c-8d00a9095c69%40googlegroups.com.
For more options, visit https://groups.google.com/d/optout.


Re: SocketTimeoutException while using JEST to connect to ES

2014-08-03 Thread Eitan Vesely
Hi, I'm facing the same issue. Can you please elaborate on your HTTP proxy 
issue?
Or are there any other options?
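
In case it helps with debugging: a quick sketch of connectivity checks from the client host, assuming ES listens on My_Host:9200 and a hypothetical proxy at proxy:3128:

curl -sv --max-time 20 'http://My_Host:9200'                          # direct, bypassing any proxy
curl -sv --max-time 20 -x 'http://proxy:3128' 'http://My_Host:9200'   # through the proxy

If the direct call hangs the same way, it is more likely a network/firewall issue than a Jest problem.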

On Saturday, August 2, 2014 7:09:30 PM UTC+3, anuj maheshwari wrote:
>
> My client was not able to communicate with ES server. It was a HTTP proxy 
> issue. Try looking around this for your case.
>
>
> On Sat, Aug 2, 2014 at 4:48 PM, Renu > 
> wrote:
>
>> I'm facing the same issue. Using Jest 0.1.0 version. Is there any 
>> solution for this problem?
>>
>>
>> On Wednesday, June 25, 2014 1:53:29 PM UTC+5:30, anuj maheshwari wrote:
>>>
>>> Hi,
>>>
>>> I am evaluating Log Stash and Elastic Search for one of our requirement. 
>>> I have to query data from Elastic Search in a Java Application. I am trying 
>>> to use JEST to make a HTTP/REST call to ES. When I am trying to add a 
>>> simple index in ES using Jest Client, I am getting 
>>> "java.net.SocketTimeoutException: 
>>> Read timed out" exception at the time of client execute statement. 
>>>
>>> HttpClientConfig clientConfig = new HttpClientConfig.Builder(
>>> "http://My_Host:9200").readTimeout(2).multiThreaded(true).build();
>>> JestClientFactory factory = new JestClientFactory();
>>> factory.setHttpClientConfig(clientConfig);
>>> 
>>> JestHttpClient jestClient = (JestHttpClient) factory.getObject();
>>>
>>>
>>> *jestClient.execute(new CreateIndex.Builder("myIndex").build()); 
>>> // Here I am getting Socket Read Timeout Exception.*
>>>
>>>
>>>
>>> Below is the full error stack trace - 
>>>
>>>
>>> Exception in thread "main" java.net.SocketTimeoutException: Read timed 
>>> out
>>>  at java.net.SocketInputStream.socketRead0(Native Method)
>>> at java.net.SocketInputStream.read(SocketInputStream.java:150)
>>>  at java.net.SocketInputStream.read(SocketInputStream.java:121)
>>> at org.apache.http.impl.io.SessionInputBufferImpl.streamRead(
>>> SessionInputBufferImpl.java:136)
>>>  at org.apache.http.impl.io.SessionInputBufferImpl.fillBuffer(
>>> SessionInputBufferImpl.java:152)
>>>  at org.apache.http.impl.io.SessionInputBufferImpl.readLine(
>>> SessionInputBufferImpl.java:270)
>>>  at org.apache.http.impl.conn.DefaultHttpResponseParser.parseHead(
>>> DefaultHttpResponseParser.java:140)
>>> at org.apache.http.impl.conn.DefaultHttpResponseParser.parseHead(
>>> DefaultHttpResponseParser.java:57)
>>>  at org.apache.http.impl.io.AbstractMessageParser.parse(
>>> AbstractMessageParser.java:260)
>>>  at org.apache.http.impl.DefaultBHttpClientConnection.
>>> receiveResponseHeader(DefaultBHttpClientConnection.java:161)
>>> at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
>>>  at sun.reflect.NativeMethodAccessorImpl.invoke(
>>> NativeMethodAccessorImpl.java:57)
>>> at sun.reflect.DelegatingMethodAccessorImpl.invoke(
>>> DelegatingMethodAccessorImpl.java:43)
>>>  at java.lang.reflect.Method.invoke(Method.java:601)
>>> at org.apache.http.impl.conn.CPoolProxy.invoke(CPoolProxy.java:138)
>>>  at com.sun.proxy.$Proxy0.receiveResponseHeader(Unknown Source)
>>> at org.apache.http.protocol.HttpRequestExecutor.doReceiveResponse(
>>> HttpRequestExecutor.java:271)
>>>  at org.apache.http.protocol.HttpRequestExecutor.execute(
>>> HttpRequestExecutor.java:123)
>>> at org.apache.http.impl.execchain.MainClientExec.
>>> execute(MainClientExec.java:254)
>>>  at org.apache.http.impl.execchain.ProtocolExec.
>>> execute(ProtocolExec.java:195)
>>> at org.apache.http.impl.execchain.RetryExec.execute(RetryExec.java:86)
>>>  at org.apache.http.impl.execchain.RedirectExec.
>>> execute(RedirectExec.java:108)
>>> at org.apache.http.impl.client.InternalHttpClient.doExecute(
>>> InternalHttpClient.java:186)
>>>  at org.apache.http.impl.client.CloseableHttpClient.execute(
>>> CloseableHttpClient.java:82)
>>> at org.apache.http.impl.client.CloseableHttpClient.execute(
>>> CloseableHttpClient.java:106)
>>>  at io.searchbox.client.http.JestHttpClient.execute(
>>> JestHttpClient.java:59)
>>> at webservice.ESClient.main(ESClient.java:50)
>>> Process exited with exit code 1.
>>>
>>>  -- 
>> You received this message because you are subscribed to a topic in the 
>> Google Groups "elasticsearch" group.
>> To unsubscribe from this topic, visit 
>> https://groups.google.com/d/topic/elasticsearch/gsweC_svK38/unsubscribe.
>> To unsubscribe from this group and all its topics, send an email to 
>> elasticsearc...@googlegroups.com .
>> To view this discussion on the web visit 
>> https://groups.google.com/d/msgid/elasticsearch/97d349ca-f12e-4be0-85fb-c05e17ede84e%40googlegroups.com
>>  
>> 
>> .
>>
>> For more options, visit https://groups.google.com/d/optout.
>>
>
>

-- 
You received this message because you are subscribed to the Google Groups 
"elasticsearch" group.
To unsubscribe from this group and stop receiving emails from it, send an email 
to elasticsearch+unsubscr...@googlegroups.com.
To view th

Inexplicable wrong results in automated tests

2014-08-03 Thread John D. Ament
Hi

So after running a few rounds of local automated tests, I've noticed that 
sometimes I get the wrong results in my index.  This seems to only be an 
issue with my automated tests and not when running the application manually 
(at least I haven't seen the wrong results after several executions).

My search looks like this:

SearchResponse searchResponse = esClient.client().prepareSearch(indexName)
        .setTypes(RECORD_TYPE)
        .setFetchSource(true)
        .setPostFilter(FilterBuilders.andFilter(
                FilterBuilders.inFilter("typeId", types.toArray(new Integer[]{})).cache(false),
                FilterBuilders.inFilter("stateId", states.toArray(new Integer[]{})).cache(false)
        ).cache(false))
        .addSort("dateCreated.value", SortOrder.DESC)
        .addSort("recordId", SortOrder.DESC)
        .execute().actionGet();

The issue appears both with and without the cache flag passed in.

The way my tests work is that I execute a bunch of seeds, then run queries 
against the seeded data to verify I get the right results.  I'll create 5 
records in my test, where the typeIds are always 1, 2, 3, 4 and the stateIds 
are anything between 1 and 14, except for 6.

5 is a special state in my case.  I only want to include that state 
sometimes.  So I'll run one query with all the states except 5 and 6.  I 
expect that this will give me 4 records back (the 5th record is in state 
5).  Instead I'm getting back 5 results, as if ES is also including state 5 
in the list even though I didn't want it.
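
For reference, this is roughly the JSON body I'd expect the builder above to 
produce for that query (written out by hand, so treat the exact shape as 
approximate; inFilter should serialize to a terms filter):

{
  "post_filter": {
    "and": {
      "filters": [
        { "terms": { "typeId":  [1, 2, 3, 4] } },
        { "terms": { "stateId": [1, 2, 3, 4, 7, 8, 9, 10, 11, 12, 13, 14] } }
      ]
    }
  },
  "sort": [
    { "dateCreated.value": { "order": "desc" } },
    { "recordId": { "order": "desc" } }
  ]
}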

In my test I run this query twice.  The test fails sometimes on the first 
execution, never on the second (I have an Arquillian deployment: I start up 
the app once, then seed data, run the first query in one test method, and 
run the second query in a second test method).  I'm assuming that these 
filters act like a pure AND, i.e. the record must match both fields to be 
returned.  So, any idea why I might be getting the wrong results?

John

-- 
You received this message because you are subscribed to the Google Groups 
"elasticsearch" group.
To unsubscribe from this group and stop receiving emails from it, send an email 
to elasticsearch+unsubscr...@googlegroups.com.
To view this discussion on the web visit 
https://groups.google.com/d/msgid/elasticsearch/c65c050a-2898-4065-b3a5-c8ad0cda0ed1%40googlegroups.com.
For more options, visit https://groups.google.com/d/optout.


Nest - range filter in the form of a BaseFilter

2014-08-03 Thread Adam Porat
Hi,
I'm using Nest version 0.12.0.
I need to get a range filter in the form of a BaseFilter.
However, this line of code creates a faulty BaseFilter which doesn't 
contain the actual condition:
agg.ElasticsearchFilter = Nest.Filter.Range(i => i.GreaterOrEquals(filteredUpdateDate));
Is this a bug, or am I doing something wrong?
Any way to cast from a RangeFilterDescriptor to a BaseFilter will do, 
because .FacetFilter(i => i.Bool(j => j.Must(...))) accepts a BaseFilter[].
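
For reference, the JSON I am ultimately trying to end up with is just a plain 
range filter along these lines (the field name and date value here are only 
placeholders, not my real mapping):

{
  "range": {
    "updateDate": {
      "gte": "2014-08-01T00:00:00"
    }
  }
}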
Thank you.

-- 
You received this message because you are subscribed to the Google Groups 
"elasticsearch" group.
To unsubscribe from this group and stop receiving emails from it, send an email 
to elasticsearch+unsubscr...@googlegroups.com.
To view this discussion on the web visit 
https://groups.google.com/d/msgid/elasticsearch/7bb41594-74c8-440a-92fb-c78678129933%40googlegroups.com.
For more options, visit https://groups.google.com/d/optout.


[Kibana] Plugin to add new chart type

2014-08-03 Thread vineeth mohan
Hi,

I would like to add a new chart type for two-level term grouping (treemap 
visualization).

Is there any way I can add this as a plugin to the existing Kibana?


Thanks
Vineeth

-- 
You received this message because you are subscribed to the Google Groups 
"elasticsearch" group.
To unsubscribe from this group and stop receiving emails from it, send an email 
to elasticsearch+unsubscr...@googlegroups.com.
To view this discussion on the web visit 
https://groups.google.com/d/msgid/elasticsearch/CAGdPd5kgn1kvkMSZonzxGQNwe1RaZxORpmi0P4W4acDmX%3DJq%3DQ%40mail.gmail.com.
For more options, visit https://groups.google.com/d/optout.


Using existing field for mapping

2014-08-03 Thread Ayush
I am new to Elasticsearch. I have created an index "cmn" with a type 
"mention". I am trying to import data from my existing Solr setup into 
Elasticsearch, so I want to map an existing field to the _id field.

I have created the following file under /config/mappings/cmn/:

{
  "mappings": {
    "mentions": {
      "_id": {
        "path": "docKey"
      }
    }
  }
}

But this doesn't seem to be working; every time I index a record, an _id 
like the following is generated:

"_index": "cmn",
"_type": "mentions",
"_id": "k4E0dJr6Re2Z39HAIjYMmg",
"_score": 1

Also, the mapping is not reflected. I have also tried the following option:

{
  "mappings": {
    "_id": {
      "path": "docKey"
    }
  }
}

SAMPLE DOCUMENT: Basically a tweet.

{
  "usrCreatedDate": "2012-01-24 21:34:47",
  "sex": "U",
  "listedCnt": 2,
  "follCnt": 432,
  "state": "Southampton",
  "classified": 0,
  "favCnt": 468,
  "timeZone": "Casablanca",
  "twitterId": 47038,
  "lang": "en",
  "stnostem": "#ootd #ootw #fashion #styling #photography #white #pink #playsuit #prada #sunny #spring http://t.co/YbPFrXlpuh",
  "sourceId": "tw",
  "timestamp": "2014-04-09T22:58:00.396Z",
  "sentiment": 0,
  "updatedOnGMTDate": "2014-04-09T22:56:57.000Z",
  "userLocation": "Southampton",
  "age": 0,
  "priorityScore": 57.4700012207031,
  "statusCnt": 14612,
  "name": "YazzyK",
  "profilePicUrl": "http://pbs.twimg.com/profile_images/453578494556270594/orsA0pKi_normal.jpeg",
  "mentions": "",
  "sourceStripped": "Instagram",
  "collectionName": "STREAMING",
  "tags": "557/161/193/197",
  "msgid": 1397084280396.33,
  "_version_": 1464949081784713200,
  "url2": "{\"urls\":[{\"url\":\"http://t.co/YbPFrXlpuh\",\"expandedURL\":\"http://instagram.com/p/mliZbgxVZm/\",\"displayURL\":\"instagram.com/p/mliZbgxVZm/\",\"start\":88,\"end\":110}]}",
  "links": "http://t.co/YbPFrXlpuh",
  "retweetedStatus": "",
  "twtScreenName": "YazKader",
  "postId": "454030232501358592",
  "country": "Bermuda",
  "message": "#ootd #ootw #fashion #styling #photography #white #pink #playsuit #prada #sunny #spring http://t.co/YbPFrXlpuh",
  "source": "<a href=\"http://instagram.com\" rel=\"nofollow\">Instagram</a>",
  "parentStatusId": -1,
  "bio": "Live and breathe Fashion. Persian and proud- Instagram: @Yazkader",
  "createdOnGMTDate": "2014-04-09T22:56:57.000Z",
  "searchText": "#ootd #ootw #fashion #styling #photography #white #pink #playsuit #prada #sunny #spring http://t.co/YbPFrXlpuh",
  "isFavorited": "False",
  "frenCnt": 214,
  "docKey": "tw_454030232501358592"
}
Also, how can we create a unique mapping for each type, and not just for the 
index?
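
For comparison, this is the shape I would expect for the equivalent 
index-creation request, sent as the body of PUT /cmn: each type gets its own 
block under "mappings" (the second type and its field below are purely 
hypothetical, just to illustrate the per-type layout):

{
  "mappings": {
    "mentions": {
      "_id": { "path": "docKey" }
    },
    "someOtherType": {
      "_id": { "path": "someOtherField" }
    }
  }
}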

Thanks

-- 
You received this message because you are subscribed to the Google Groups 
"elasticsearch" group.
To unsubscribe from this group and stop receiving emails from it, send an email 
to elasticsearch+unsubscr...@googlegroups.com.
To view this discussion on the web visit 
https://groups.google.com/d/msgid/elasticsearch/15c33940-d95c-48b4-afcd-a3cccdabafaf%40googlegroups.com.
For more options, visit https://groups.google.com/d/optout.


Kibana Search Question

2014-08-03 Thread AK
Hi,
I'm using ELK to store my API logs.
My message field contains a JSON object with an API request, something like this:

{"initObj": {"mediaType": 0, "pageSize": 100, "pageIndex": 0, "exact": 
false, "orderBy": "NONE", "orderDir": "ASC", "orderMeta": ""}

I'm looking to find all the messages where "pageSize" is greater than 50.
Is there a way to do so without extracting pageSize in Logstash?
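
For what it's worth, if the request JSON were indexed as real fields (so that 
something like initObj.pageSize existed as a numeric field; the field name 
here is only my guess), the filter I am after would be a simple range filter:

{
  "range": {
    "initObj.pageSize": {
      "gt": 50
    }
  }
}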

Thanks
AK

-- 
You received this message because you are subscribed to the Google Groups 
"elasticsearch" group.
To unsubscribe from this group and stop receiving emails from it, send an email 
to elasticsearch+unsubscr...@googlegroups.com.
To view this discussion on the web visit 
https://groups.google.com/d/msgid/elasticsearch/ffc2c939-faaf-4972-b446-e59d76887193%40googlegroups.com.
For more options, visit https://groups.google.com/d/optout.


Re: elasticsearch jdbc-river does not import data from mysql

2014-08-03 Thread joergpra...@gmail.com
As the error message indicates, at line 10, you have a comma after

  "type" : "test",

and before a closing "}", which is invalid JSON syntax.
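
With that trailing comma removed, the river definition should parse fine 
(everything else is copied unchanged from your settings):

{
  "type" : "jdbc",
  "jdbc" : {
    "url" : "jdbc:mysql://127.0.0.1:3306/test",
    "user" : "root",
    "password" : "123123",
    "sql" : "select id as _id from documents",
    "index" : "test",
    "type" : "test"
  }
}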

Jörg


On Sun, Aug 3, 2014 at 3:14 AM, mithril  wrote:

> There is an error that seems to be the key.
>
>
>
>>> [2014-08-03 09:10:57,791][DEBUG][action.index] [Amina Synge] [_river][0], node[95_1PcghS5aXgd3PQqqBFA], [P], s[STARTED]: Failed to execute [index {[_river][my_jdbc_river][_meta], source[{
>>> "type" : "jdbc",
>>> "jdbc" : {
>>>   "url" : "jdbc:mysql://127.0.0.1:3306/test",
>>>   "user" : "root",
>>>   "password" : "123123",
>>>   "sql" : "select id as _id from documents",
>>>   "index" : "test",
>>>   "type" : "test",
>>> }
>>> }
>>> ]}]
>>> org.elasticsearch.index.mapper.MapperParsingException: failed to parse
>>> at org.elasticsearch.index.mapper.DocumentMapper.parse(DocumentMapper.java:536)
>>> at org.elasticsearch.index.mapper.DocumentMapper.parse(DocumentMapper.java:462)
>>> at org.elasticsearch.index.shard.service.InternalIndexShard.prepareIndex(InternalIndexShard.java:394)
>>> at org.elasticsearch.action.index.TransportIndexAction.shardOperationOnPrimary(TransportIndexAction.java:194)
>>> at org.elasticsearch.action.support.replication.TransportShardReplicationOperationAction$AsyncShardOperationAction.performOnPrimary(TransportShardReplicationOperationAction.java:534)
>>> at org.elasticsearch.action.support.replication.TransportShardReplicationOperationAction$AsyncShardOperationAction$1.run(TransportShardReplicationOperationAction.java:433)
>>> at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
>>> at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
>>> at java.lang.Thread.run(Thread.java:744)
>>> Caused by: org.elasticsearch.common.jackson.core.JsonParseException: Unexpected character ('}' (code 125)): was expecting either valid name character (for unquoted name) or double-quote (for quoted) to start field name
>>>  at [Source: [B@3f0cb9e7; line: 10, column: 6]
>>> at org.elasticsearch.common.jackson.core.JsonParser._constructError(JsonParser.java:1524)
>>> at org.elasticsearch.common.jackson.core.base.ParserMinimalBase._reportError(ParserMinimalBase.java:557)
>>> at org.elasticsearch.common.jackson.core.base.ParserMinimalBase._reportUnexpectedChar(ParserMinimalBase.java:475)
>>> at org.elasticsearch.common.jackson.core.json.UTF8StreamJsonParser._handleOddName(UTF8StreamJsonParser.java:1792)
>>> at org.elasticsearch.common.jackson.core.json.UTF8StreamJsonParser._parseName(UTF8StreamJsonParser.java:1480)
>>> at org.elasticsearch.common.jackson.core.json.UTF8StreamJsonParser.nextToken(UTF8StreamJsonParser.java:700)
>>> at org.elasticsearch.common.xcontent.json.JsonXContentParser.nextToken(JsonXContentParser.java:50)
>>> at org.elasticsearch.index.mapper.object.ObjectMapper.parse(ObjectMapper.java:492)
>>> at org.elasticsearch.index.mapper.object.ObjectMapper.serializeObject(ObjectMapper.java:538)
>>> at org.elasticsearch.index.mapper.object.ObjectMapper.parse(ObjectMapper.java:480)
>>> at org.elasticsearch.index.mapper.DocumentMapper.parse(DocumentMapper.java:515)
>>> ... 8 more
>>>

-- 
You received this message because you are subscribed to the Google Groups 
"elasticsearch" group.
To unsubscribe from this group and stop receiving emails from it, send an email 
to elasticsearch+unsubscr...@googlegroups.com.
To view this discussion on the web visit 
https://groups.google.com/d/msgid/elasticsearch/CAKdsXoF-qm1Dfd9zrHntd92tp_gRCoKjhEtwckxynRDNwOe%2BWw%40mail.gmail.com.
For more options, visit https://groups.google.com/d/optout.