Hi Jorg,
This is mostly standard code that I am referring to. It is called from
multiple threads for different sets of files on disk.
Please provide your suggestions. Thanks,
Hi,
Multi-fields are usually the way to go in such cases, see
http://www.elasticsearch.org/guide/en/elasticsearch/guide/current/aggregations-and-analysis.html
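A minimal sketch of such a multi-field mapping (type and field names are assumptions, ES 1.x syntax): the analyzed field serves full-text search while the not_analyzed sub-field gives aggregations whole values.

```json
{
  "mappings": {
    "doc": {
      "properties": {
        "customer_name": {
          "type": "string",
          "fields": {
            "raw": { "type": "string", "index": "not_analyzed" }
          }
        }
      }
    }
  }
}
```

Aggregations would then target customer_name.raw while full-text queries keep using customer_name.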
On Mon, Aug 25, 2014 at 9:49 PM, kti...@hotmail.com wrote:
I am aggregating documents by customer name to find how many documents we
I see what happened. Could you open an issue in the mapper plugin?
Will fix that next week.
Thanks for the details!
--
David ;-)
Twitter : @dadoonet / @elasticsearchfr / @scrutmydocs
On 25 August 2014 at 15:03, Dirk Bauer dirk.ba...@gmail.com wrote:
Hi David,
thx for your help, but it's
Hi Jörg,
I am working on a multi tenant application where each tenant has its own
database. I am planning to use ES for indexing the data, and JDBC river for
doing periodic bulk indexing. I do not want to create one river per DB per
object type. This will lead to too many rivers.
I wanted to
For multi tenant, the river concept is awkward. A river is a singleton and is
bound to single-user execution, and you are right, creating river instances
per DB and per index does not scale.
There are several options:
- write a more sophisticated plugin which acts as a service and not as a
This is the generally accepted dogma and it has some merit. However, having
two storage systems is more than a bit annoying. If you are aware of the
limitations and caveats, elasticsearch is actually a perfectly good
document store that happens to have a deeply integrated querying engine.
This
Hi,
I've found the problem: the JSON structure was not correct. It has to be
like this if you are using the Java API:
{
  "analysis": { ... },
  "index": {
    "number_of_replicas": 1,
    "number_of_shards": 3
  }
}
Thanks
Markus ;)
From: elasticsearch@googlegroups.com
Is there a way to solve the following problem?
I have created a search field with suggestions functionality. The user is
able to search for names, categories, etc. These fields are mapped like:
{
  "settings": {
    "analysis": {
      "analyzer": {
        "autocomplete": {
          "type": "custom",
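For reference, a complete custom analyzer along these lines might look like the following sketch (the filter name and ngram sizes are assumptions, ES 1.x syntax):

```json
{
  "settings": {
    "analysis": {
      "filter": {
        "autocomplete_filter": {
          "type": "edge_ngram",
          "min_gram": 1,
          "max_gram": 20
        }
      },
      "analyzer": {
        "autocomplete": {
          "type": "custom",
          "tokenizer": "standard",
          "filter": ["lowercase", "autocomplete_filter"]
        }
      }
    }
  }
}
```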
Mohit Anchlia,
How do you sync ES with your main DB?
That's what I'm thinking for my project because I don't have much
experience with ES.
Thanks
On Aug 26, 2014 1:55 AM, Mohit Anchlia mohitanch...@gmail.com wrote:
In general use elasticsearch only as a secondary index. Have a copy of
data
Hi Alexander,
If I may, I have a follow-up question to your response here. How does the
completion suggester behave with fields such as payload and score when it
is unifying the response based on output? Are scores increased based on
this combination? If payloads are different, which ones
All dates are UTC. Internally, a date maps to a number of type long.
When applied to date fields, the range filter also accepts a time_zone
parameter:
{
  "range" : {
    "born" : {
      "gte": "2012-01-01",
      "time_zone": "+1:00"
    }
  }
}
but this is not possible
{
I am reading a lot, studying what the best approach for this is.
My main questions can be summarized in two points:
If I choose ES to index my PostgreSQL data, what's the best way to do that?
Do I need a cluster? Most of the problems I read about were related to that. If
this is true and I can run on one node
Hi,
If I have two indices each having part of the record and joined using some
common identifier, can I issue a query across both indices and have
aggregations apply taking into consideration both indices?
Example:
Index 1: Type 1:
ID: String
Field1: String
Field2: String
Index 2: Type 2:
ID:
Hello AJ,
You can do this as follows:
{
  "query_string" : {
    "query" : "test-type1.status:1 || test-type2.status:2"
  }
}
But then there is a bug associated with a corner condition of this -
https://github.com/elasticsearch/elasticsearch/issues/4081
So be a bit careful.
Thanks
Vineeth
Thanks
Hello Sandeep,
What you are intending is not possible.
But Elasticsearch does have some good relational operations, which need
to be defined before indexing.
If you can elaborate on your use case, we can help with this.
Thanks
Vineeth
On Tue, Aug 26, 2014 at 6:04 PM, 'Sandeep Ramesh
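The relational features mentioned above (nested objects, parent/child) are declared in the mapping before indexing; a nested-type sketch with assumed type and field names:

```json
{
  "mappings": {
    "order": {
      "properties": {
        "items": {
          "type": "nested",
          "properties": {
            "sku": { "type": "string", "index": "not_analyzed" },
            "price": { "type": "double" }
          }
        }
      }
    }
  }
}
```

Nested queries and aggregations can then match fields of the same items element together, which a flat object mapping cannot do.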
Hi all,
In an attempt to squeeze more power out of our physical servers we want to
run multiple ES JVMs per server.
Some specs:
- servers have 24 cores, 256GB RAM
- each instance binds on different (alias) ip
- each instance has 32GB heap
- both instances run under user 'elastic'
- limits for
Hello Sang,
Can I know why you are using Hive?
I feel you can do the analysis in Elasticsearch itself.
The rest seems good to me.
Thanks
Vineeth
On Tue, Aug 26, 2014 at 8:03 AM, Sang Dang zkid...@gmail.com wrote:
Hello All,
I have selected #2 as my solution.
I write data to ES, and
Hello Jinyuan,
I don't feel this is possible.
In such a provision, how would you define what the REST API will do?
Thanks
Vineeth
On Tue, Aug 26, 2014 at 2:41 AM, Jinyuan Zhou zhou.jiny...@gmail.com
wrote:
Thanks,
--
You received this message because you are subscribed to the
Hello,
I am using elasticsearch 1.3.2 and trying to understand it (with my
Oracle background ;-)).
For testing I use the data available
on http://fec.gov/disclosurep/PDownload.do
There is a datafile for every state of the USA.
I don't know whether it is a good idea but I want to make 1
I was looking for the index alias, thanks all.
On Tuesday, June 17, 2014 9:31:00 AM UTC+1, Lee Gee wrote:
Is it possible to have one ES instance create an index and then have a
second instance use that created index, without downtime?
tia
lee
Thank you, Vineeth.
On Sunday, August 17, 2014 12:04:20 PM UTC+1, vineeth mohan wrote:
Hello Lee ,
You will need to use context suggester for this purpose -
http://www.elasticsearch.org/guide/en/elasticsearch/reference/current/suggester-context.html
Also this difference stems from the
Hello Wang,
By default the _source field stores the input JSON and gives it back for
each document match.
If you disable it, ES won't be able to return it.
Hence the result you see.
By default ES won't make any effort to tap the stored information; it
rather takes the JSON stored in the _source field.
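To illustrate, a mapping that disables _source but stores one field explicitly might look like this sketch (type and field names assumed, ES 1.x syntax); only stored fields can then be returned, via the fields parameter:

```json
{
  "mappings": {
    "doc": {
      "_source": { "enabled": false },
      "properties": {
        "title": { "type": "string", "store": true }
      }
    }
  }
}
```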
Hello all,
Question
about gateway.recover_after_nodes and discovery.zen.minimum_master_nodes in
a distributed ES cluster. By distributed I mean I have:
2 nodes that are data only:
'node.data' = 'true',
'node.master' = 'false',
'http.enabled' = 'false',
1 node that is a
You might want to look at developing a plugin for this or maybe using an
existing one. This one for example might do partly what you
need: https://github.com/derryx/elasticsearch-changes-plugin
If you develop your own plugin, you should be able to tap into what is
happening in the cluster at a
I had some issues with logstash as well and ended up modifying the
elasticsearch_http plugin to tell me what was going on. Turned out my
cluster was red because my index template required more replicas than were
possible :-). The problem was that logstash does not fail very gracefully
and
I use an in-house developed Java REST client for elasticsearch.
Unfortunately it's not in any shape to untangle from our code base and put
on GitHub yet, but I might consider that if there's more interest.
Basically I use Apache HttpClient; I implemented a simple round-robin
strategy so I can
You should run one node per host.
Two nodes add overhead and suffer from the effects you described.
For mlockall, the user needs the privilege to allocate the specified locked
memory, and the OS needs contiguous RAM per mlockall call. If the user's
memlock limit is exhausted, or if RAM allocation gets
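For reference, the usual mlockall pieces are (a sketch; adjust the user and limits for your system):

```
# elasticsearch.yml
bootstrap.mlockall: true

# /etc/security/limits.conf, for the 'elastic' user from the post above
elastic soft memlock unlimited
elastic hard memlock unlimited
```

Whether locking actually worked can be checked in the process section of the nodes info API.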
Thanks Jun, that was helpful. It helped me to realize I had not fully
connected my analyzer plugin.
On Thursday, August 21, 2014 11:23:47 PM UTC-7, Jun Ohtani wrote:
Hi Art,
I wrote an example specifying the kuromoji analyzer(kuromoji) and custom
analyzer(my_analyzer) for a field.
curl
I have documents with the following schema.
{
  "authorId": 10,
  "authorName": "Joshua Bloch",
  "books": [
    {
      "bookId": 101,
      "bookName": "Effective Java",
      "description": "effective java book with useful recommendations",
      "Category": 1,
      "sales": [
        {
          "keyword": "effective java",
          "count": 200
        },
        {
          "keyword": "java tips",
          "count": 100
        },
Thanks for the logstash mapping command. I can reproduce it now.
It's the LZF encoder that bails out at
org.elasticsearch.common.compress.lzf.impl.UnsafeChunkEncoderBE._getInt
which in turn uses sun.misc.Unsafe.getInt
I have created a gist of the JVM crash file at
Thanks Mark.
What confuses me are "global setting" (which suggests a cluster-wide setting)
and "on a specific node" (which suggests a node-level setting). I could just
try it out, but it's hard to tell if the setting worked or not. :(
On Sunday, August 24, 2014 3:13:17 PM UTC-7, Mark Walkom wrote:
I just looked at this code!
It's a setting that you set globally at the cluster level. It takes effect
per node. What that means is that every active shard on each node gets an
equal share of that much space. Active means it has been written to in the
past six minutes or so. When a node
Still broken with lzf-compress 1.0.3
https://gist.github.com/jprante/d2d829b497db4963aea5
Jörg
On Tue, Aug 26, 2014 at 7:54 PM, joergpra...@gmail.com
joergpra...@gmail.com wrote:
Thanks for the logstash mapping command. I can reproduce it now.
It's the LZF encoder that bails out at
I posted an issue with Elastic HQ here:
https://github.com/royrusso/elasticsearch-HQ/issues/164
But just in case, maybe an Elasticsearch dev can have a look and see if it's
an Elasticsearch issue or not.
Thanks
Providing a self-update:
I found that I could create a cross-request cache using the following script
(like a cross-request incrementer):
POST /test/_search
{
  "query": { "match_all": {} },
  "script_fields": {
    "a": {
      "script": "import groovy.lang.Script; class A extends Script { static
i=0; def
Our elasticsearch instance sometimes returns meaningless numbers for
terms_stats; the query returns correct data.
I am using Kibana as the front end; this is the generated query:
Hello,
In the past couple of days I've been getting a lot of error messages about
corrupted replica shards. The primary shards come up fast after ES process
restart but replicas take a long time to come back. Sometimes it takes a
few node restarts to 'kick' the nodes to start replica shards.
Hi ,
We are analyzing ES for storing our log data (~400 GB/day) and will be
integrating Logstash and ES. What is the maximum amount of data that can
be stored on one node of ES?
Regards,
Gaurav
OK, I would suggest setting index.merge.scheduler.max_thread_count to 1 for
spinning disks.
Maybe try also disabling merge throttling and see if that has an effect? 6
MB/sec seems slow...
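As a sketch, those two changes could be applied like this (the merge scheduler setting is per node in elasticsearch.yml; the throttle type is a dynamic cluster setting in 1.x — verify against your version):

```
# elasticsearch.yml (per node, then restart)
index.merge.scheduler.max_thread_count: 1

# disable merge (store) throttling cluster-wide
curl -XPUT localhost:9200/_cluster/settings -d '{
  "transient": { "indices.store.throttle.type": "none" }
}'
```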
Mike McCandless
http://blog.mikemccandless.com
On Mon, Aug 25, 2014 at 8:57 PM, Chris Decker
Hello Konstantin,
You can use an index value of name-%{+YYYY.MM.dd} in your elasticsearch
output in logstash
(link: http://logstash.net/docs/1.4.2/outputs/elasticsearch#index)
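In context, that corresponds to an output section roughly like this sketch (host and index prefix are assumptions):

```
output {
  elasticsearch {
    host => "localhost"
    index => "name-%{+YYYY.MM.dd}"
  }
}
```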
HTH,
David
On Tuesday, August 26, 2014 10:01:39 AM UTC-7, Konstantin Erman wrote:
Most of the guides I could find
See also https://github.com/elasticsearch/elasticsearch/pull/7440 (will be
in 1.4.0) which returns the actual RAM buffer size assigned to that shard
by the little dance.
Mike McCandless
http://blog.mikemccandless.com
On Tue, Aug 26, 2014 at 2:15 PM, Nikolas Everett nik9...@gmail.com wrote:
I
A few questions:
What version of Elasticsearch are you using?
Are you using the Java client, and is it the same version as the cluster?
Did you upgrade recently and was the index built with an older version of
Elasticsearch?
Elasticsearch recently added checksum verification (1.3?), so perhaps
Is there any facility in elasticsearch to help with sending terms to an
external process after Lucene processing (tokenization, filters, etc.)?
The idea here is having some external analysis / nlp code run against the
documents while keeping all the pre-processing choices consistent and in
Additional information:
Taking the mean of a boolean value returns 4,607,182,418,800,017,408.
(That number is the raw long bit pattern of the double value 1.0, which
suggests the value is being reinterpreted rather than converted.)
Thanks
On Tuesday, August 26, 2014 3:58:43 PM UTC-4, youwei chen wrote:
Our elasticsearch instance sometimes returns meaningless numbers for
terms_stats; the query returns correct data.
I am using Kibana as front
I'm experiencing a similar issue to this. We have two clusters:
- 2 node monitoring cluster (1 master/data 1 just data)
- 5 node production cluster (2 data, 3 masters)
The output below is from the non-master data node of the Marvel monitoring
cluster. There are no errors being
Just wanted to close the loop on this in case anyone stumbled upon the same
issue.
After upgrading to version 1.3.2 which had the performance increase
stemming from https://github.com/elasticsearch/elasticsearch/pull/5846, we
were able to see a dramatic decrease in parent/child query latency.
If you want to retrieve the term list of an index after Lucene processing
via REST HTTP API, you can try
https://github.com/jprante/elasticsearch-index-termlist
Jörg
On Tue, Aug 26, 2014 at 10:41 PM, Kevin B blaisde...@gmail.com wrote:
Is there any facility in elasticsearch to help with
Forgive me, I'm a little lost.
I am working on deploying elasticsearch on an AWS server. Previously in
development I have started elasticsearch using ./bin/elasticsearch
-Des.config=/etc/elasticsearch/elasticsearch.yml
But in live deployment, I want to keep elasticsearch running as a service...
Check the logs under /var/log/elasticsearch, they should have something.
Also please be aware that 1.2.0 has a critical bug and you should be using
1.2.1 instead.
Regards,
Mark Walkom
Infrastructure Engineer
Campaign Monitor
email: ma...@campaignmonitor.com
web: www.campaignmonitor.com
On 27
ElasticHQ is a community plugin, the ES devs can't help here.
I have raised issues against ElasticHQ in the past and Roy has fixed them
pretty quickly :)
Regards,
Mark Walkom
Infrastructure Engineer
Campaign Monitor
email: ma...@campaignmonitor.com
web: www.campaignmonitor.com
On 27 August
I am trying to troubleshoot the following observation:
The following code works as expected:
Elasticsearch::Model.client.search search_type: 'count', index:
target_indices, body: query
Response:
{took=2, timed_out=false, _shards={total=2, successful=2,
failed=0}, hits={total=6, max_score=0.0,
Hi,
I started using Marvel for my cluster monitoring.
Does Marvel have a way to set notifications, such as sending me an email if
CPU load is over 80%?
Thanks
I am trying to add a custom boost to the different should clauses in the
bool query, but I am getting a different number of results when I use the
bool query with 2 should clauses containing 2 simple query string queries vs
a bool query with 2 should clauses with 2 function score queries
Nope.
Regards,
Mark Walkom
Infrastructure Engineer
Campaign Monitor
email: ma...@campaignmonitor.com
web: www.campaignmonitor.com
On 27 August 2014 09:14, kti...@hotmail.com wrote:
Hi,
I started using Marvel for my cluster monitoring.
Does Marvel have a way to set notification such as send
Only master-eligible nodes count for discovery.zen.minimum_master_nodes, so
in your case it is 1. And that's bad, as you can end up with a split-brain
situation. You should, if you can, make all three nodes master eligible.
gateway.recover_after_nodes counts all nodes, as per
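With all three nodes master eligible, the corresponding settings sketch would be (quorum of 3 master-eligible nodes = 3/2 + 1 = 2):

```
# on all three nodes, in elasticsearch.yml
node.master: true
discovery.zen.minimum_master_nodes: 2
```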
Thanks Mark, I found that if I comment out the line in elasticsearch.yml
that sets the data path, it works.
I will upgrade as you have suggested, thanks for that.
On Tuesday, August 26, 2014 4:04:05 PM UTC-7, Mark Walkom wrote:
Check the logs under /var/log/elasticsearch, they should have
Depends.
How much disk do you have? RAM? CPU? Java version and release? ES version?
What's your query load like? Are you doing lots of aggregates or facets?
The best way to know is to start using ELK on a platform indicative of
your intended server size and then see how much data a single node
Also, you should really be monitoring your systems and core measurements
(disk, CPU etc) with something specific for the job.
Regards,
Mark Walkom
Infrastructure Engineer
Campaign Monitor
email: ma...@campaignmonitor.com
web: www.campaignmonitor.com
On 27 August 2014 09:16, Mark Walkom
Hi,
My goal was to figure out if I need to scale out if there is a sudden spike
in the load.
Can you be more specific about "something specific for the job"?
On Tuesday, August 26, 2014 4:32:32 PM UTC-7, Mark Walkom wrote:
Also, you should really be monitoring your systems and core measurements
The question is how the micro analysis of the Kibana cloud does this without
setting 'not_analyzed' on the fields?
On Saturday, April 5, 2014 4:55:20 AM UTC+9, Binh Ly wrote:
You'll need to set the field name to not_analyzed so that you can get a
distinct value for the whole field (instead of
The question is how the micro analysis of Kibana can do this
without setting 'not_analyzed' on the fields?
On Saturday, April 5, 2014 4:55:20 AM UTC+9, Binh Ly wrote:
You'll need to set the field name to not_analyzed so that you can get a
distinct value for the whole field (instead
OK, so they are for monitoring the system running Elasticsearch.
However, if I want to be notified of ES-specific data points, such as its JVM
memory %, there doesn't seem to be a solution.
Thanks
From: ma...@campaignmonitor.com
Date: Wed, 27 Aug 2014 10:05:05 +1000
Subject: Re: alerting in
Thank you Mark. Makes perfect sense.
Chris
On Tue, Aug 26, 2014 at 6:25 PM, Mark Walkom ma...@campaignmonitor.com
wrote:
Only master eligible for discovery.zen.minimum_master_nodes, so in your
case it is 1. And that's bad as you can end up with a split brain
situation. You should, if you
Thanks Vineeth,
But I guess it would not change anything about the REST API if elasticsearch
offered some easier way than building a plugin to allow registering
RestFiles for REST API calls. For a lot of frameworks, it is very common to
provide a configuration-based approach to register some of
Any thoughts, anyone? I am primarily looking for an answer to my 2nd
question.
On Tuesday, August 26, 2014 10:14:37 AM UTC-7, Srinivasan Ramaswamy wrote:
I have documents with the above mentioned schema.
authorId : 10
authorName: Joshua Bloch
books: {
{
bookId: 101
bookName:
Hi all,
I wrote my own custom query parser and extended elasticsearch as a plugin;
the code is at the following links.
query parser http://pastebin.mozilla.org/6172836
customized query http://pastebin.mozilla.org/6172837
plugin http://pastebin.mozilla.org/6172844
I used the default settings of