Re: Indexing large number of files each with a huge size

2014-08-26 Thread Sandeep Ramesh Khanzode
Hi Jorg, This is mostly standard code that I am referring to. It is called from multiple threads, each for a different set of files on disk. Please provide your suggestions. Thanks,

Re: aggregate on analyzed field

2014-08-26 Thread Adrien Grand
Hi, Multi-fields are usually the way to go in such cases, see http://www.elasticsearch.org/guide/en/elasticsearch/guide/current/aggregations-and-analysis.html On Mon, Aug 25, 2014 at 9:49 PM, kti...@hotmail.com wrote: I am aggregating documents by customer name to find how many documents we
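
A minimal sketch of the multi-field approach mentioned above, assuming Elasticsearch 1.x mapping syntax. The type name `invoice` and the field name `customer_name` are hypothetical; the idea is that the main field stays analyzed for search while a `not_analyzed` sub-field (`raw` here) is used for the terms aggregation.

```python
import json


def multi_field_mapping(doc_type, field):
    """Build a mapping with an analyzed string field plus a not_analyzed 'raw' sub-field."""
    return {
        doc_type: {
            "properties": {
                field: {
                    "type": "string",  # analyzed by default, used for full-text search
                    "fields": {
                        # exact-value copy, used for aggregations
                        "raw": {"type": "string", "index": "not_analyzed"}
                    },
                }
            }
        }
    }


def terms_aggregation(field, size=10):
    """Build a search body that aggregates on the exact-value sub-field."""
    return {
        "size": 0,  # return no hits, only the aggregation buckets
        "aggs": {"by_" + field: {"terms": {"field": field + ".raw", "size": size}}},
    }


# Hypothetical names, for illustration only.
print(json.dumps(multi_field_mapping("invoice", "customer_name"), indent=2))
print(json.dumps(terms_aggregation("customer_name"), indent=2))
```

The two dictionaries are only request bodies; sending them to a cluster is left out on purpose.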

Re: How to index Office files? *.txt and *.pdf are working...

2014-08-26 Thread David Pilato
I see what happened. Could you open an issue in the mapper plugin? Will fix that next week. Thanks for the details! -- David ;-) Twitter : @dadoonet / @elasticsearchfr / @scrutmydocs On 25 August 2014 at 15:03, Dirk Bauer dirk.ba...@gmail.com wrote: Hi David, thx for your help, but it's

Multi Tenant DB and JDBC River

2014-08-26 Thread Nitin Maheshwari
Hi Jörg, I am working on a multi tenant application where each tenant has its own database. I am planning to use ES for indexing the data, and JDBC river for doing periodic bulk indexing. I do not want to create one river per DB per object type. This will lead to too many rivers. I wanted to

Re: Multi Tenant DB and JDBC River

2014-08-26 Thread joergpra...@gmail.com
For multi tenant, the river concept is awkward. River is a singleton and is bound to single user execution, and you are right, creating river instances per DB and per index does not scale. There are several options: - write a more sophisticated plugin which acts as a service and not as a

Re: Building an ERP with Elasticsearch. Am I crazy?

2014-08-26 Thread Jilles van Gurp
This is the generally accepted dogma and it has some merit. However, having two storage systems is more than a bit annoying. If you are aware of the limitations and caveats, elasticsearch is actually a perfectly good document store that happens to have a deeply integrated querying engine. This

Re: Shards

2014-08-26 Thread Markus Wiesenbacher
Hi, I've found the problem: the JSON structure was not correct. It has to look like this if you are using the Java API: { "analysis": { ... }, "index": { "number_of_replicas": 1, "number_of_shards": 3 } } Thanks Markus ;) From: elasticsearch@googlegroups.com

Get distinct result by using multi_match and suggestion

2014-08-26 Thread Ramy
Is there a way to solve the following problem? I have created a search field with suggestions functionality. The user is able to search for names, categories, etc. These fields are mapped like: { "settings": { "analysis": { "analyzer": { "autocomplete": { "type": "custom",

Re: Building an ERP with Elasticsearch. Am I crazy?

2014-08-26 Thread Raphael Waldmann
Mohit Anchlia, How do you sync ES with your main DB? That's what I'm thinking for my project because I don't have much experience with ES. Thanks On Aug 26, 2014 1:55 AM, Mohit Anchlia mohitanch...@gmail.com wrote: In general use elasticsearch only as a secondary index. Have a copy of data

Re: AutoCompletion Suggester - Duplicate record in suggestion return

2014-08-26 Thread alistairj
Hi Alexander, If I may, I have a follow-up question to your response here. How does the completion suggester behave with fields such as payload and score when it is unifying the response based on output? Are scores increased based on this combination? If payloads are different, which ones

Timezone in Simple Query

2014-08-26 Thread Gianni Livolsi
All dates are UTC. Internally, a date maps to a number type long. When applied on date fields the range filter accepts also a time_zone parameter { "range" : { "born" : { "gte": "2012-01-01", "time_zone": "+1:00" } } } but this is not possible {
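
A small worked example of what the `time_zone` parameter means for the bound above, under the assumption that a date-only value is read as midnight in the given offset and then converted to UTC for comparison. The helper name is hypothetical and this is plain date arithmetic, not Elasticsearch code.

```python
from datetime import datetime, timedelta, timezone


def local_midnight_to_utc(date_str, offset_hours):
    """Interpret a YYYY-MM-DD date as midnight at the given UTC offset and convert it to UTC."""
    tz = timezone(timedelta(hours=offset_hours))
    local = datetime.strptime(date_str, "%Y-%m-%d").replace(tzinfo=tz)
    return local.astimezone(timezone.utc)


# "gte": "2012-01-01" with "time_zone": "+1:00"
print(local_midnight_to_utc("2012-01-01", 1).isoformat())  # 2011-12-31T23:00:00+00:00
```

So a lower bound of 2012-01-01 at +1:00 corresponds to 2011-12-31T23:00:00 in UTC, one hour earlier than the same bound without the parameter.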

Re: Building an ERP with Elasticsearch. Am I crazy?

2014-08-26 Thread Raphael Waldmann
I am reading a lot and studying what the best approach for this is. My main question can be summarized in two points. If I choose ES to index my PostgreSQL data, what's the best way to do that? Do I need a cluster? Most of the problems that I read about were related to that. If this is true and I can run in one node

Aggregation across indices

2014-08-26 Thread 'Sandeep Ramesh Khanzode' via elasticsearch
Hi, If I have two indices each having part of the record and joined using some common identifier, can I issue a query across both indices and have aggregations apply taking into consideration both indices? Example: Index 1: Type 1: ID: String Field1: String Field2: String Index 2: Type 2: ID:

Re: Ability to search accross 'types' in the same index, with different search parameters yet applying the same size and from values, in a single search query

2014-08-26 Thread vineeth mohan
Hello AJ, You can do this as follows: { "query_string" : { "query" : "test-type1.status:1 || test-type2.status:2" } } But there is a bug associated with a corner case of this: https://github.com/elasticsearch/elasticsearch/issues/4081 So be a bit careful. Thanks Vineeth Thanks

Re: Aggregation across indices

2014-08-26 Thread vineeth mohan
Hello Sandeep, What you are intending is not possible. But Elasticsearch does have some good relational operations, which need to be defined before indexing. If you can elaborate on your use case, we can help with this. Thanks Vineeth On Tue, Aug 26, 2014 at 6:04 PM, 'Sandeep Ramesh

Failed start of 2nd instance on same host with mlockall=true

2014-08-26 Thread R. Toma
Hi all, In an attempt to squeeze more power out of our physical servers we want to run multiple ES JVMs per server. Some specs: - servers have 24 cores, 256GB ram - each instance binds on different (alias) ip - each instance has 32GB heap - both instances run under user 'elastic' - limits for

Re: Need some advice to build a log central.

2014-08-26 Thread vineeth mohan
Hello Sang, Can I ask why you are using Hive? I feel you can do the analysis in Elasticsearch itself. The rest seems good to me. Thanks Vineeth On Tue, Aug 26, 2014 at 8:03 AM, Sang Dang zkid...@gmail.com wrote: Hello All, I have selected #2 as my solution. I write data to ES, and

Re: Is it possible to register a RestFilter without creating a plugin?

2014-08-26 Thread vineeth mohan
Hello Jinyuan, I don't think this is possible. With such a provision, how would you define what the REST API will do? Thanks Vineeth On Tue, Aug 26, 2014 at 2:41 AM, Jinyuan Zhou zhou.jiny...@gmail.com wrote: Thanks, -- You received this message because you are subscribed to the

define multiple types in an index

2014-08-26 Thread HansPeterSloot
Hello, I am using elasticsearch 1.3.2 and try to understand elasticsearch (with my Oracle background ;-)). For testing I use the data available on http://fec.gov/disclosurep/PDownload.do There is a datafile for every state of the USA. I don't know whether it is a good idea but I want to make 1

Re: Swap indexes?

2014-08-26 Thread Lee Gee
I was looking for the index alias, thanks all. On Tuesday, June 17, 2014 9:31:00 AM UTC+1, Lee Gee wrote: Is it possible to have one ES instance create an index and then have a second instance use that created index, without downtime? tia lee
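
A hedged sketch of how an index alias can be used to switch to a newly built index without downtime. It only builds the body for the `_aliases` endpoint; the alias and index names (`products`, `products_v1`, `products_v2`) are made up for illustration. Both actions go in one request, so the alias moves in a single step.

```python
import json


def alias_swap_actions(alias, old_index, new_index):
    """Build an _aliases request body that moves an alias from old_index to new_index."""
    return {
        "actions": [
            {"remove": {"index": old_index, "alias": alias}},
            {"add": {"index": new_index, "alias": alias}},
        ]
    }


# Hypothetical names: clients always query "products", never the versioned index.
print(json.dumps(alias_swap_actions("products", "products_v1", "products_v2"), indent=2))
```

Clients that search through the alias do not need to know which concrete index currently sits behind it.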

Re: _suggest suggestion/question

2014-08-26 Thread Lee Gee
Thank you, Vineeth. On Sunday, August 17, 2014 12:04:20 PM UTC+1, vineeth mohan wrote: Hello Lee , You will need to use context suggester for this purpose - http://www.elasticsearch.org/guide/en/elasticsearch/reference/current/suggester-context.html Also this difference stems from the

Re: How to get the field infomation when _all and _source was set disabled

2014-08-26 Thread vineeth mohan
Hello Wang, By default the _source field stores the input JSON and gives it back for each document match. If you disable it, ES won't be able to return it. Hence the result you see. By default ES won't make any effort to tap the stored information; it rather takes the JSON stored in the _source field.

gateway.recover_after_nodes minimum_master_nodes in a distributed environment?

2014-08-26 Thread Chris Neal
Hello all, Question about gateway.recover_after_nodes and discovery.zen.minimum_master_nodes in a distributed ES cluster. By distributed I mean I have: 2 nodes that are data only: 'node.data' = 'true', 'node.master' = 'false', 'http.enabled' = 'false', 1 node that is a

Re: Using elasticsearch as a realtime fire hose

2014-08-26 Thread Jilles van Gurp
You might want to look at developing a plugin for this or maybe using an existing one. This one for example might do partly what you need: https://github.com/derryx/elasticsearch-changes-plugin If you develop your own plugin, you should be able to tap into what is happening in the cluster at a

Re: Logstash stop communicating with Elasticsearch

2014-08-26 Thread Jilles van Gurp
I had some issues with logstash as well and ended up modifying the elasticsearch_http plugin to tell me what was going on. Turned out my cluster was red because my index template required more replicas than was possible:-). The problem was that logstash does not fail very gracefully and

Re: Java API or REST API for client development ?

2014-08-26 Thread Jilles van Gurp
I use an in-house developed Java REST client for Elasticsearch. Unfortunately it's not in any shape to untangle from our code base and put on GitHub yet, but I might consider that if there's more interest. Basically I use Apache HttpClient, and I implemented a simple round robin strategy so I can
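
The round robin strategy mentioned above could look roughly like the sketch below. This is not the poster's client (which is Java and not published); it is an illustrative, thread-safe host rotation with hypothetical class and host names, and it leaves out the HTTP calls and any failure handling.

```python
import itertools
import threading


class RoundRobinHosts:
    """Hand out hosts from a fixed list in rotation, safe to call from several threads."""

    def __init__(self, hosts):
        hosts = list(hosts)
        if not hosts:
            raise ValueError("at least one host is required")
        self._cycle = itertools.cycle(hosts)
        self._lock = threading.Lock()

    def next_host(self):
        # The lock keeps concurrent callers from advancing the iterator at the same time.
        with self._lock:
            return next(self._cycle)


# Hypothetical node addresses.
pool = RoundRobinHosts(["http://es1:9200", "http://es2:9200"])
print(pool.next_host())  # http://es1:9200
print(pool.next_host())  # http://es2:9200
print(pool.next_host())  # http://es1:9200
```

A real client would also want to skip hosts that are currently failing, which this sketch does not attempt.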

Re: Failed start of 2nd instance on same host with mlockall=true

2014-08-26 Thread joergpra...@gmail.com
You should run one node per host. Two nodes add overhead and suffer from the effects you described. For mlockall, the user needs the privilege to allocate the specified locked memory, and the OS needs contiguous RAM per mlockall call. If the user's memlock limit is exhausted, or if RAM allocation gets

Re: how to use my customer lucene analyzer(tokenizer)?

2014-08-26 Thread art
Thanks Jun, that was helpful. It helped me to realize I had not fully connected my analyzer plugin. On Thursday, August 21, 2014 11:23:47 PM UTC-7, Jun Ohtani wrote: Hi Art, I wrote an example specifying the kuromoji analyzer(kuromoji) and custom analyzer(my_analyzer) for a field. curl

Function Query with an aggregation function of nested field

2014-08-26 Thread Srinivasan Ramaswamy
I have documents with the above mentioned schema. authorId : 10 authorName: Joshua Bloch books: { { bookId: 101 bookName: Effective Java description : effective java book with useful recommendations Category: 1 sales: { { keyword: effective java count: 200 }, { keyword: java tips count: 100 },

Re: JVM crash on 64 bit SPARC with Elasticsearch 1.2.2 due to unaligned memory access

2014-08-26 Thread joergpra...@gmail.com
Thanks for the logstash mapping command. I can reproduce it now. It's the LZF encoder that bails out at org.elasticsearch.common.compress.lzf.impl.UnsafeChunkEncoderBE._getInt which uses in turn sun.misc.Unsafe.getInt I have created a gist of the JVM crash file at

Re: indices.memory.index_buffer_size

2014-08-26 Thread Yongtao You
Thanks Mark. What confuses me are global setting (which suggests cluster-wide setting) and on a specific node (which suggests node level setting). I could just try it out, but it's hard to tell if the setting worked or not. :( On Sunday, August 24, 2014 3:13:17 PM UTC-7, Mark Walkom wrote:

Re: indices.memory.index_buffer_size

2014-08-26 Thread Nikolas Everett
I just looked at this code! It's a setting that you set globally at the cluster level. It takes effect per node. What that means is that every active shard on a node gets an equal share of that much space. Active means has been written to in the past six minutes or so. When a node
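
A back-of-the-envelope sketch of the "equal share per active shard" behaviour described above. The function name and the numbers are illustrative assumptions: it treats the setting as a fraction of the node's heap and ignores any per-shard minimum or maximum that the real implementation may apply.

```python
def per_shard_buffer_bytes(heap_bytes, buffer_fraction, active_shards):
    """Split a node's indexing buffer evenly across its active shards."""
    if active_shards <= 0:
        raise ValueError("need at least one active shard")
    node_buffer = int(heap_bytes * buffer_fraction)  # total buffer for this node
    return node_buffer // active_shards              # equal share per active shard


# Assumed example: 4 GiB heap, buffer set to 10% of heap, 8 shards being written to.
heap = 4 * 1024 ** 3
share = per_shard_buffer_bytes(heap, 0.10, 8)
print(round(share / 1024 ** 2, 1), "MiB per active shard")
```

The point of the arithmetic is that the per-shard share shrinks as more shards on the same node become active, even though the setting itself does not change.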

Re: JVM crash on 64 bit SPARC with Elasticsearch 1.2.2 due to unaligned memory access

2014-08-26 Thread joergpra...@gmail.com
Still broken with lzf-compress 1.0.3 https://gist.github.com/jprante/d2d829b497db4963aea5 Jörg On Tue, Aug 26, 2014 at 7:54 PM, joergpra...@gmail.com joergpra...@gmail.com wrote: Thanks for the logstash mapping command. I can reproduce it now. It's the LZF encoder that bails out at

Elastic HQ not getting back vendor info from Elasticsearch.

2014-08-26 Thread John Smith
I posted an issue with Elastic HQ here: https://github.com/royrusso/elasticsearch-HQ/issues/164 But just in case maybe an Elastic dev can have a look and see if it's Elasticsearch issue or not. Thanks

Re: groovy for scripting

2014-08-26 Thread Alex S.V.
Providing a self-update: I found that I could create a cross-request cache using the following script (like a cross-request incrementer): POST /test/_search { query: {match_all:{}}, script_fields: { a: { script: import groovy.lang.Script;class A extends Script{static i=0;def

term_stats return sometime return meaningless number

2014-08-26 Thread youwei chen
Our Elasticsearch instance sometimes returns meaningless numbers for terms_stats, while the query itself returns correct data. I am using Kibana as the front end; this is the generated query:

Failing Replica Shards

2014-08-26 Thread David Kleiner
Hello, In the past couple of days I've been getting a lot of error messages about corrupted replica shards. The primary shards come up fast after ES process restart but replicas take a long time to come back. Sometimes it takes a few node restarts to 'kick' the nodes to start replica shards.

Data per node in ES

2014-08-26 Thread Gaurav Tiwari
Hi , We are analyzing ES for storing our log data (~ 400 GB/Day) and will be integrating Logstash and ES. What is the maximum amount of data that can be stored on one node of ES ? Regards, Gaurav

Re: Reduce Number of Segments

2014-08-26 Thread Michael McCandless
OK, I would suggest setting index.merge.scheduler.max_thread_count to 1 for spinning disks. Maybe try also disabling merge throttling and see if that has an effect? 6 MB/sec seems slow... Mike McCandless http://blog.mikemccandless.com On Mon, Aug 25, 2014 at 8:57 PM, Chris Decker

Re: Elasticsearch for logging. HOW to configure automatic creation of the new index every day?

2014-08-26 Thread David Kleiner
Hello Konstantin, You can use an index value of name-%{+YYYY.MM.dd} in your elasticsearch output in logstash (link: http://logstash.net/docs/1.4.2/outputs/elasticsearch#index) HTH, David On Tuesday, August 26, 2014 10:01:39 AM UTC-7, Konstantin Erman wrote: Most of the guides I could find
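
To illustrate what a date pattern such as `%{+YYYY.MM.dd}` in the Logstash `index` option produces, here is a small sketch that derives the same kind of daily index name from an event timestamp. The helper name and the `name` prefix are placeholders, and the sketch assumes the date is taken in UTC.

```python
from datetime import datetime, timezone


def daily_index_name(prefix, when):
    """Return '<prefix>-YYYY.MM.dd' for a timezone-aware timestamp, using the UTC date."""
    return "{}-{}".format(prefix, when.astimezone(timezone.utc).strftime("%Y.%m.%d"))


event_time = datetime(2014, 8, 26, 17, 1, 39, tzinfo=timezone.utc)
print(daily_index_name("name", event_time))  # name-2014.08.26
```

Because the name changes with the date, the first event of each day lands in a new index, which is what gives one index per day without any scheduled job.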

Re: indices.memory.index_buffer_size

2014-08-26 Thread Michael McCandless
See also https://github.com/elasticsearch/elasticsearch/pull/7440 (will be in 1.4.0) which returns the actual RAM buffer size assigned to that shard by the little dance. Mike McCandless http://blog.mikemccandless.com On Tue, Aug 26, 2014 at 2:15 PM, Nikolas Everett nik9...@gmail.com wrote: I

Re: Can't open file to read checksums

2014-08-26 Thread Ivan Brusic
A few questions: What version of Elasticsearch are you using? Are you using the Java client, and is it the same version as the cluster? Did you upgrade recently, and was the index built with an older version of Elasticsearch? Elasticsearch recently added checksum verification (1.3?), so perhaps

elasticsearch processing pipeline capability?

2014-08-26 Thread Kevin B
Is there any facility in elasticsearch to help with sending terms to an external process after lucene processing (tokenization, filters, etc)? The idea here is having some external analysis / nlp code run against the documents while keeping all the pre-processing choices consistent and in

Re: term_stats return sometime return meaningless number

2014-08-26 Thread youwei chen
Additional information: taking the mean of a boolean value returns 4,607,182,418,800,017,408 Thanks On Tuesday, August 26, 2014 3:58:43 PM UTC-4, youwei chen wrote: Our elasticsearch instance sometime return meaningless number for terms_stats, query return correct data. I am using Kibana as front
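
One observation that may help when reading the number above: 4,607,182,418,800,017,408 is exactly the 64-bit IEEE 754 bit pattern of the double 1.0 (0x3FF0000000000000) read as a signed integer. Whether that is what actually happens inside terms_stats is an assumption, not something the thread confirms; the sketch below only demonstrates the arithmetic.

```python
import struct


def double_bits_as_long(value):
    """Reinterpret the 8 bytes of a double as a signed 64-bit integer."""
    return struct.unpack(">q", struct.pack(">d", value))[0]


def long_bits_as_double(bits):
    """Reinterpret a signed 64-bit integer as the double with the same 8 bytes."""
    return struct.unpack(">d", struct.pack(">q", bits))[0]


print(double_bits_as_long(1.0))                  # 4607182418800017408
print(hex(double_bits_as_long(1.0)))             # 0x3ff0000000000000
print(long_bits_as_double(4607182418800017408))  # 1.0
```

If the suspicious value decodes to a plausible double like this, it suggests the number is being shown in its raw encoded form rather than being random garbage.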

Re: Marvel not showing nodes stats

2014-08-26 Thread Jeff Byrnes
I'm experiencing a similar issue to this. We have two clusters: - 2 node monitoring cluster (1 master/data 1 just data) - 5 node production cluster (2 data, 3 masters) The output below is from the non-master data node of the Marvel monitoring cluster. There are no errors being

Re: Parent/Child query performance in version 1.1.2

2014-08-26 Thread Mark Greene
Just wanted to close the loop on this in case anyone stumbled upon the same issue. After upgrading to version 1.3.2 which had the performance increase stemming from https://github.com/elasticsearch/elasticsearch/pull/5846, we were able to see a dramatic decrease in parent/child query latency.

Re: elasticsearch processing pipeline capability?

2014-08-26 Thread joergpra...@gmail.com
If you want to retrieve the term list of an index after Lucene processing via REST HTTP API, you can try https://github.com/jprante/elasticsearch-index-termlist Jörg On Tue, Aug 26, 2014 at 10:41 PM, Kevin B blaisde...@gmail.com wrote: Is there any facility in elasticsearch to help with

How do I start elasticsearch as a service?

2014-08-26 Thread Eric Greene
Forgive me, I'm a little lost. I am working on deploying elasticsearch on an AWS server. Previously in development I have started elasticsearch using ./bin/elasticsearch -Des.config=/etc/elasticsearch/elasticsearch.yml But in live deployment, I want to keep elasticsearch running as a service...

Re: How do I start elasticsearch as a service?

2014-08-26 Thread Mark Walkom
Check the logs under /var/log/elasticsearch, they should have something. Also please be aware that 1.2.0 has a critical bug and you should be using 1.2.1 instead. Regards, Mark Walkom Infrastructure Engineer Campaign Monitor email: ma...@campaignmonitor.com web: www.campaignmonitor.com On 27

Re: Elastic HQ not getting back vendor info from Elasticsearch.

2014-08-26 Thread Mark Walkom
ElasticHQ is a community plugin, the ES devs can't help here. I have raised issues against ElasticHQ in the past and Roy has fixed them pretty quickly :) Regards, Mark Walkom Infrastructure Engineer Campaign Monitor email: ma...@campaignmonitor.com web: www.campaignmonitor.com On 27 August

Aggregation query works with search, but not with msearch

2014-08-26 Thread Dhruv Garg
I am trying to troubleshoot the following observation: Following code works as expected: Elasticsearch::Model.client.search search_type: 'count', index: target_indices, body: query Response: {took=2, timed_out=false, _shards={total=2, successful=2, failed=0}, hits={total=6, max_score=0.0,
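
For comparison with the single search call above, this is a sketch of how a raw `_msearch` request body is laid out: one header line and one body line per search, newline-delimited, with per-search options such as `search_type` carried in the header. It is not a diagnosis of the problem in this thread; the index name and the aggregation are hypothetical, and how a particular client library builds this body may differ.

```python
import json


def msearch_body(searches):
    """Serialize (header, body) pairs into the newline-delimited _msearch format."""
    lines = []
    for header, body in searches:
        lines.append(json.dumps(header))  # e.g. index and search_type for this search
        lines.append(json.dumps(body))    # the query / aggregation itself
    return "\n".join(lines) + "\n"        # the payload must end with a newline


# Hypothetical index name and aggregation.
payload = msearch_body([
    (
        {"index": "events", "search_type": "count"},
        {"aggs": {"by_status": {"terms": {"field": "status"}}}},
    )
])
print(payload, end="")
```

When a query behaves differently between search and msearch, checking that options like the search type ended up in the header line rather than being dropped is one thing worth ruling out.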

alerting in Marvel

2014-08-26 Thread kti_sk
Hi, I started using Marvel for my cluster monitoring. Does Marvel have a way to set notification such as send me email if cpu load is over 80%? Thanks

Getting different results while using bool query vs bool query with function score query

2014-08-26 Thread Akshay Shukla
I am trying to add a custom boost to the different should clauses in the bool query, but I am getting different number of results when I use the bool query with 2 should clauses containing 2 simple query string query vs a bool query with 2 should clauses with 2 function score query

Re: alerting in Marvel

2014-08-26 Thread Mark Walkom
Nope. Regards, Mark Walkom Infrastructure Engineer Campaign Monitor email: ma...@campaignmonitor.com web: www.campaignmonitor.com On 27 August 2014 09:14, kti...@hotmail.com wrote: Hi, I started using Marvel for my cluster monitoring. Does Marvel have a way to set notification such as send

Re: gateway.recover_after_nodes minimum_master_nodes in a distributed environment?

2014-08-26 Thread Mark Walkom
Only master-eligible nodes count toward discovery.zen.minimum_master_nodes, so in your case it is 1. And that's bad as you can end up with a split brain situation. You should, if you can, make all three nodes master eligible. gateway.recover_after_nodes counts all nodes, as per
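
The usual rule of thumb for discovery.zen.minimum_master_nodes is a majority of the master-eligible nodes, i.e. (N / 2) + 1 with integer division. A small sketch of that arithmetic, with a hypothetical function name:

```python
def minimum_master_nodes(master_eligible):
    """Majority quorum of master-eligible nodes: floor(N / 2) + 1."""
    if master_eligible < 1:
        raise ValueError("need at least one master-eligible node")
    return master_eligible // 2 + 1


for n in (1, 2, 3, 5):
    print(n, "master-eligible ->", minimum_master_nodes(n))
# 1 master-eligible -> 1
# 2 master-eligible -> 2
# 3 master-eligible -> 2
# 5 master-eligible -> 3
```

With all three nodes master eligible the value would be 2, so the cluster can lose any single node and the remaining two can still elect a master.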

Re: How do I start elasticsearch as a service?

2014-08-26 Thread Eric Greene
Thanks Mark, I found that if I comment out the line in elasticsearch.yml that sets the data path, it works. I will upgrade as you have suggested, thanks for that. On Tuesday, August 26, 2014 4:04:05 PM UTC-7, Mark Walkom wrote: Check the logs under /var/log/elasticsearch, they should have

Re: Data per node in ES

2014-08-26 Thread Mark Walkom
Depends. How much disk do you have? RAM? CPU? Java version and release? ES version? What's your query load like? Are you doing lots of aggregates or facets? The best way to know is to start using ELK on a platform indicative of your intended server size and then see how much data a single node

Re: alerting in Marvel

2014-08-26 Thread Mark Walkom
Also, you should really be monitoring your systems and core measurements (disk, CPU etc) with something specific for the job. Regards, Mark Walkom Infrastructure Engineer Campaign Monitor email: ma...@campaignmonitor.com web: www.campaignmonitor.com On 27 August 2014 09:16, Mark Walkom

Re: alerting in Marvel

2014-08-26 Thread kti_sk
Hi, My goal was to figure out whether I need to scale out if there is a sudden spike in the load. Can you be more specific about what you mean by "something specific for the job"? On Tuesday, August 26, 2014 4:32:32 PM UTC-7, Mark Walkom wrote: Also, you should really be monitoring your systems and core measurements

Re: Micro Analysis in Kibana

2014-08-26 Thread Mungeol Heo
The question is how the micro analysis of the Kibana cloud does this without setting 'not_analyzed' on the fields? On Saturday, April 5, 2014 4:55:20 AM UTC+9, Binh Ly wrote: You'll need to set the field name to not_analyzed so that you can get a distinct value for the whole field (instead of

Re: Micro Analysis in Kibana

2014-08-26 Thread Mungeol Heo
The question is how the micro analysis of Kibana can do this without setting 'not_analyzed' on the fields? On Saturday, April 5, 2014 4:55:20 AM UTC+9, Binh Ly wrote: You'll need to set the field name to not_analyzed so that you can get a distinct value for the whole field (instead

RE: alerting in Marvel

2014-08-26 Thread KimTaein
OK, so they are for monitoring the system running Elasticsearch. However, if I want to be notified of ES-specific data points such as its JVM memory %, there doesn't seem to be a solution. Thanks From: ma...@campaignmonitor.com Date: Wed, 27 Aug 2014 10:05:05 +1000 Subject: Re: alerting in

Re: gateway.recover_after_nodes minimum_master_nodes in a distributed environment?

2014-08-26 Thread Chris Neal
Thank you Mark. Makes perfect sense. Chris On Tue, Aug 26, 2014 at 6:25 PM, Mark Walkom ma...@campaignmonitor.com wrote: Only master eligible for discovery.zen.minimum_master_nodes, so in your case it is 1. And that's bad as you can end up with a split brain situation. You should, if you

Re: Is it possible to register a RestFilter without creating a plugin?

2014-08-26 Thread Jinyuan Zhou
Thanks Vineeth, But I guess it would not change anything about the REST API if Elasticsearch offered a way, easier than building a plugin, to register RestFilters for REST API calls. For a lot of frameworks, it is very common to provide a configuration-based approach to register some of

Re: Function Query with an aggregation function of nested field

2014-08-26 Thread Srinivasan Ramaswamy
Any thoughts anyone ? I am primarily looking for an answer to my 2nd question. On Tuesday, August 26, 2014 10:14:37 AM UTC-7, Srinivasan Ramaswamy wrote: I have documents with the above mentioned schema. authorId : 10 authorName: Joshua Bloch books: { { bookId: 101 bookName:

got QueryPhaseExecutionException when using custom query parser

2014-08-26 Thread Peiyong Lin
Hi all, I wrote my own custom query parser, and extended elasticsearch as a plugin, the code is in the following link. query parser http://pastebin.mozilla.org/6172836 customized query http://pastebin.mozilla.org/6172837 plugin http://pastebin.mozilla.org/6172844 I used the default settings of