Re: Indexing large number of files each with a huge size

2014-08-26 Thread Sandeep Ramesh Khanzode
Hi Jorg,

This is mostly standard code that I am referring to. It is called from
multiple threads for different sets of files on disk.
Please provide your suggestions. Thanks,


BulkRequestBuilder bulkRequest = client.prepareBulk();
bulkRequest.setRefresh(false);

// for every input file in the input list, do ...
    Map<String, Object> jsonDocument = new HashMap<String, Object>();

    jsonDocument.put("fileContent", STRING_CONTENT_OF_FILE);
    jsonDocument.put("fileProperty1", FILE_PROPERTY_1_STRING);
    jsonDocument.put("fileProperty2", FILE_PROPERTY_2_STRING);
    jsonDocument.put("fileProperty3", FILE_PROPERTY_3_STRING);
    jsonDocument.put("filePath", new BytesRef(filePath.toString()));

    bulkRequest.add(client.prepareIndex(indexName, typeName).setSource(jsonDocument));
} // end of loop over input files

BulkResponse bulkResponse = bulkRequest.execute().actionGet();



Thanks,
Sandeep


On Mon, Aug 25, 2014 at 10:40 PM, joergpra...@gmail.com wrote:

 Can you show the program how you index?

 Before tuning heap sizes or batch sizes, it is good to check if the
 program works correctly.

 Jörg


 On Mon, Aug 25, 2014 at 7:00 PM, 'Sandeep Ramesh Khanzode' via
 elasticsearch elasticsearch@googlegroups.com wrote:

 Hi,

 I am trying to index documents, each file approx. 10-20 MB. I start
 seeing memory issues if I try to index them all in a multi-threaded
 environment from a single TransportClient on one machine to a single node
 cluster with 32GB ES server. It seems like the memory is an issue on the
 client as well as server side, and I probably understand and expect that
 :).

 I have tried tuning the heap sizes and batch sizes in the Bulk APIs. However,
 am I trying to push the limits too much? One thought is to stream the data so
 that I do not hold it all in memory. Is that possible? Is this a general
 problem, or is my usage just wrong?

 Thanks,
 Sandeep

 --
 You received this message because you are subscribed to the Google Groups
 elasticsearch group.
 To unsubscribe from this group and stop receiving emails from it, send an
 email to elasticsearch+unsubscr...@googlegroups.com.

 To view this discussion on the web visit
 https://groups.google.com/d/msgid/elasticsearch/d2612109-b31c-4127-857b-f8aa27fb0aeb%40googlegroups.com
 https://groups.google.com/d/msgid/elasticsearch/d2612109-b31c-4127-857b-f8aa27fb0aeb%40googlegroups.com?utm_medium=emailutm_source=footer
 .
 For more options, visit https://groups.google.com/d/optout.





Re: aggregate on analyzed field

2014-08-26 Thread Adrien Grand
Hi,

Multi-fields are usually the way to go in such cases, see
http://www.elasticsearch.org/guide/en/elasticsearch/guide/current/aggregations-and-analysis.html
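
A minimal sketch of the multi-field approach that page describes (field and type names here are hypothetical): the analyzed field stays available for full-text search, while a not_analyzed sub-field serves aggregations:

```json
{
  "mappings": {
    "customer_docs": {
      "properties": {
        "customer_name": {
          "type": "string",
          "fields": {
            "raw": { "type": "string", "index": "not_analyzed" }
          }
        }
      }
    }
  }
}
```

A terms aggregation on customer_name.raw then buckets whole names such as "Tom Cruise" instead of individual tokens.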


On Mon, Aug 25, 2014 at 9:49 PM, kti...@hotmail.com wrote:

 I am aggregating documents by customer name to find how many documents we
 have per customer.
 The aggregates bucketize words in names. For example, if I have the customer
 Tom Cruise, I would get 2 buckets, Tom and Cruise.

 How would I treat the analyzed field as not_analyzed in an aggregation query?
 I still want the field to remain analyzed so that I can do full-text search.

 thanks





-- 
Adrien Grand



Re: How to index Office files? *.txt and *.pdf are working...

2014-08-26 Thread David Pilato
I see what happened. Could you open an issue on the mapper plugin?

Will fix that next week.

Thanks for the details!

--
David ;-)
Twitter : @dadoonet / @elasticsearchfr / @scrutmydocs

 On 25 August 2014 at 15:03, Dirk Bauer dirk.ba...@gmail.com wrote:
 
 Hi David,
 
 thx for your help, but it's still not working.
 
 What I did:
 
 The query 
 
  {
    "query": {
      "match": {
        "_all": "test"
      }
    }
  }
 
 delivers all my indexed documents (also the *.doc / *.docx files) and I can
 see the base64 stuff in the file.file field.
 So this looks good to me.
 
 Then I went to ..\config\logging.yml and added under the logger: section an 
 entry for 
 
 1st attempt: org.apache.plugin.mapper.attachments: TRACE
 2nd attempt: org.apache.tika: TRACE
 
 After shutting down ES, restarting, deleting the existing index, and
 reindexing my test documents, there was no additional entry from the mapper
 plugin or Tika in the log.
 ES is logging fine...
 
 logger:
   # log action execution errors for easier debugging
   action: DEBUG
   # reduce the logging for aws, too much is logged under the default INFO
   com.amazonaws: WARN
 
   # gateway
   #gateway: DEBUG
   #index.gateway: DEBUG
 
   # peer shard recovery
   #indices.recovery: DEBUG
 
   # discovery
   #discovery: TRACE
 
   index.search.slowlog: TRACE, index_search_slow_log_file
   index.indexing.slowlog: TRACE, index_indexing_slow_log_file
 
   # DBA: Enabled logger for plugin mapper.attachments
   org.apache.plugin.mapper.attachments: TRACE
 
 
 The next idea was that maybe the mapper plugin is missing some libraries for
 parsing Office documents.
 In the plug-in folder I can see these *.jar files:
 
 rome-0.9.jar
 tagsoup-1.2.1.jar
 tika-core-1.5.jar
 tika-parsers-1.5.jar
 vorbis-java-core-0.1.jar
 vorbis-java-core-0.1-tests.jar
 vorbis-java-tika-0.1.jar
 xercesImpl-2.8.1.jar
 xml-apis-1.3.03.jar
 xmpcore-5.1.2.jar
 xz-1.2.jar
 apache-mime4j-core-0.7.2.jar
 apache-mime4j-dom-0.7.2.jar
 asm-debug-all-4.1.jar
 aspectjrt-1.6.11.jar
 bcmail-jdk15-1.45.jar
 bcprov-jdk15-1.45.jar
 boilerpipe-1.1.0.jar
 commons-compress-1.5.jar
 commons-logging-1.1.1.jar
 elasticsearch-mapper-attachments-2.3.1.jar
 fontbox-1.8.4.jar
 geronimo-stax-api_1.0_spec-1.0.1.jar
 isoparser-1.0-RC-1.jar
 jdom-1.0.jar
 jempbox-1.8.4.jar
 jhighlight-1.0.jar
 juniversalchardet-1.0.3.jar
 metadata-extractor-2.6.2.jar
 netcdf-4.2-min.jar
 pdfbox-1.8.4.jar
 
 I'm not sure, but here you will find additional poi*.jar files that should be
 responsible for parsing the Office files:
 
 http://mvnrepository.com/artifact/org.apache.tika/tika-parsers/1.5
 
 The following files were downloaded to the plugin folder but the documents 
 are still not parsed...
 
 poi-3.10-beta2.jar
 poi-ooxml-3.10-beta2.jar
 poi-scratchpad-3.10-beta2.jar
 
 The last check was to make sure the Word documents are not corrupted. A
 colleague of mine checked a test file with
 
 java -jar tika-app-1.5.jar -g
 
 and the output was fine for the document.
 
 So, does anyone have any more ideas?
 
 Thanks
 Dirk
 
 On Monday, 25 August 2014 at 10:56:54 UTC+2, David Pilato wrote:
 
 From my experience, this should work. Indexing Word docs should work, as Tika
 supports office docs.
 
 Not sure what you are doing wrong. Try to send a match all query and ask for 
 field file.file.
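 
 Such a match-all query asking for the stored field might look like this (a
 sketch; the field name comes from the attachment mapping):
 
 ```json
 {
   "query": { "match_all": {} },
   "fields": [ "file.file" ]
 }
 ```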
 
 Also, you could set mapper plugin to TRACE mode in logging.yml and see if it 
 tells something interesting.
 
 HTH
 
 --
 David ;-)
 Twitter : @dadoonet / @elasticsearchfr / @scrutmydocs
 
 On 25 August 2014 at 09:05, Dirk Bauer dirk@gmail.com wrote:
 
 Hi,
 
 using elasticsearch-1.3.2 with 
 
 Plug-in
 -
 name: mapper-attachments
 version: 2.3.1
 description: Adds the attachment type allowing to parse difference 
 attachment formats
 jvm: true
 site: false
 
 on Windows 8 for evaluation purpose.
 
 JVM 
 -
 version: 1.7.0_67
 vm_name: Java HotSpot(TM) Client VM
 vm_version: 24.65-b04
 vm_vendor: Oracle Corporation
 
 
 I have created the following mapping:
 
  {
    "myIndex": {
      "mappings": {
        "dokument": {
          "properties": {
            "created": {
              "type": "date",
              "format": "dateOptionalTime"
            },
            "description": {
              "type": "string"
            },
            "file": {
              "type": "attachment",
              "path": "full",
              "fields": {
                "file": {
                  "type": "string",
                  "store": true,
                  "term_vector": "with_positions_offsets"
                },
                "author": { "type": "string" },
                "title": { "type": "string" },
                "name": { "type": "string" },
                "date": {
                  "type": "date",
                  "format": "dateOptionalTime"
                },
                "keywords": { "type": "string" },
                "content_type": { "type": "string" },
                "content_length": { "type": "integer" },
                "language": { "type": "string" }
              }
            },
            "id": { "type": "string" },
            "title": { "type": "string" }
          }
        }
      }
    }
  }
 
 Because I'd like to use ES from C#/.NET, I have created a little C# app that
 reads a file as a base64-encoded stream from the hard drive and puts the
 document into the ES index. I'm working with this POST request:
 
 {
   "id": "8dbf1d73-44d1-4e20-aa35-13b18ddf5057",
   "title": "Test",
   "description": "Test Description",
   "created": "2014-01-20T19:04:20.1019885+01:00",
   "file": {
     "_content_type": "application/pdf",
     "_name": "Test.pdf",

Multi Tenant DB and JDBC River

2014-08-26 Thread Nitin Maheshwari
Hi Jörg,

I am working on a multi tenant application where each tenant has its own 
database. I am planning to use ES for indexing the data, and JDBC river for 
doing periodic bulk indexing. I do not want to create one river per DB per 
object type. This will lead to too many rivers. 

I wanted to modify the JDBC river so that I can give it a parent DB location, 
where all tenant DB connection information is available, and then modify the 
river such that a feeder thread is created for each tenant database.

Do you see any issue with this or do you have any other recommendation?

Thanks,
Nitin



Re: Multi Tenant DB and JDBC River

2014-08-26 Thread joergpra...@gmail.com
For multi-tenancy, the river concept is awkward. A river is a singleton and is
bound to single-user execution, and you are right, creating river instances
per DB and per index does not scale.

There are several options:

- write a more sophisticated plugin which acts as a service and not as a
singleton. The ES service component, which would maintain state in the
cluster state, could accept job requests where each job request is
equivalent to a JDBC pull. The job requests are delegated to a node which
is not very busy with jobs (load balancing). The code of the JDBC river can
be reused for that.

- write a separate middleware for your tenants where they can have separate
access to the DB and prepare ES JSON bulk files (maybe via REST API calls
similar in style to ES). This would be a domain-specific solution but offers
the most flexibility to the tenants; they are free to decide how and when to
create and index the data from the DB.

Jörg




On Tue, Aug 26, 2014 at 11:21 AM, Nitin Maheshwari ask4ni...@gmail.com
wrote:

 Hi Jörg,

 I am working on a multi tenant application where each tenant has its own
 database. I am planning to use ES for indexing the data, and JDBC river for
 doing periodic bulk indexing. I do not want to create one river per DB per
 object type. This will lead to too many rivers.

 I wanted to modify the JDBC river so that I can give it a parent DB location,
 where all tenant DB connection information is available, and then modify the
 river such that a feeder thread is created for each tenant database.

 Do you see any issue with this or do you have any other recommendation?

 Thanks,
 Nitin





Re: Building an ERP with Elasticsearch. Am I crazy?

2014-08-26 Thread Jilles van Gurp
This is the generally accepted dogma and it has some merit. However, having 
two storage systems is more than a bit annoying. If you are aware of the 
limitations and caveats, Elasticsearch is actually a perfectly good 
document store that happens to have a deeply integrated querying engine. 
This is useful since most setups involving a secondary store have a much 
less capable querying engine, plus additional latency and architectural 
complexity related to pumping data around to Elasticsearch. 

Elasticsearch CRUD operations are atomic, i.e. you can read your own writes 
across the cluster. If you use the version attribute during updates, you 
can detect version conflicts and prevent overwriting updates with stale 
data as well. This is similar to the model you would find in e.g. CouchDB 
and similar document stores. There are not that many sharded, replicated, 
horizontally scalable document stores out there, and even fewer with decent 
querying ability.

The caveat is that elasticsearch is not as battle tested as other solutions 
in this space and that various people have shown that ways exist to cause 
an Elasticsearch cluster to lose data, to corrupt data, etc. So, you need 
to be prepared to be able to recover from such situations. That means you 
need backups (e.g. use the snapshots feature) and a plan for when things go 
bad. 

The flip side is that other solutions have issues as well. PostgreSQL 
clustering is brand new and probably has issues, and if you use it in 
non-clustered mode, the failure scenarios get even more interesting. I use 
a MariaDB Galera cluster and it sucks big time and needs a lot of 
handholding during upgrades. CouchDB doesn't shard and shares server 
failure scenarios with Elasticsearch. MongoDB and Cassandra have each had 
their share of issues related to data corruption and data loss in the 
recent past, and both have recently fixed major issues related to that. So, 
there are lots of solutions out there and none of them are perfect.

Elasticsearch has several major areas where it needs improvement (and which 
are indeed being worked on in recent versions):
1) it has many ways it can run out of memory. If you skim through the 
release notes of recent versions, you'll see a lot of fixes related to that 
including the use of e.g. circuit breakers. The problem with OOM's is that 
it can cause a cascading cluster failure where one node becomes slow, 
eventually drops out of the cluster and then other nodes start having the 
same issues. I've personally seen Kibana kill our cluster on two occasions. 
In both cases the logs of all nodes were full of OOM's and the cluster died 
while simply clicking through different dashboards in Kibana. This has not 
happened with the current 1.3.x version (yet) but that doesn't mean it is 
impossible.
2) split brain situations when a quorum is lost but not detected are fairly 
easy to trigger. Every time I do a rolling update, the cluster takes 
several seconds to catch up with the fact that I'm shutting down nodes. I have 
a three node cluster. One node goes down, means my cluster should be 
yellow. Two nodes down means red and it should no longer accept writes. The 
problem is that during those few seconds, the cluster status may not 
reflect reality and nodes may in fact be accepting writes when they 
shouldn't. 
3) A full cluster restart needs a lot of handholding. The reason for this 
is that most of the failure scenarios relate to there not being a quorum 
and detecting that. For example, if you simply restart the nodes one by one 
quickly you will easily get your cluster in a red state where it should no 
longer be accepting writes. The problem as described above is that 
detecting this relies on timeouts and there may be some nodes that continue 
to write for a few seconds after they should have stopped doing that. By 
the time your cluster goes red, it's too late and you are going to have to 
manually decide which shards you want to lose. That's why you need to keep 
an eye on cluster status during rolling updates. Imagine somebody power 
cycling your Elasticsearch node cluster or, worse, rebooting the switch 
that connects your nodes.
4) Elasticsearch under load may throw 503s occasionally. I've seen this 
happen on our test infrastructure a couple of times and it worries me. This 
is not something you want to see when you are writing customer data. 

Mitigation for these issues typically involves using specialized nodes for 
read and write traffic and cluster management. Additionally, you need to 
heavily tweak things to make certain failure scenarios less likely. Out of 
the box, there is a lot of stuff that can go wrong.  

We're actually deprecating our MariaDB architecture and switching to an 
Elasticsearch-only architecture. I'm well aware that I'm taking a risk here 
and I have a backup plan for most of those risks. This includes changing 
plans and switching to CouchDB or a similar document store 

AW: Shards

2014-08-26 Thread Markus Wiesenbacher
Hi,

 

I've found the problem: the JSON structure was not correct. It has to be
like this if you are using the Java API:

 

{
   "analysis": { ... },
   "index": {
      "number_of_replicas": 1,
      "number_of_shards": 3
   }
}
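
For comparison, when creating an index over the REST API (index name here is hypothetical, e.g. `PUT /my_index`), the same options are nested under "settings" in the request body:

```json
{
  "settings": {
    "analysis": { ... },
    "index": {
      "number_of_replicas": 1,
      "number_of_shards": 3
    }
  }
}
```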

 

Thanks

Markus ;)

 

From: elasticsearch@googlegroups.com [mailto:elasticsearch@googlegroups.com]
On behalf of Markus Wiesenbacher
Sent: Monday, 25 August 2014 23:55
To: elasticsearch@googlegroups.com
Subject: Shards

 

Hi folks,

 

I am using a single-node cluster (v1.3.2) on my PC, and I was wondering why
there are always 5 shards in the file system (separate Lucene indices), no
matter how many I configure in elasticsearch.yml or programmatically with the
Java API (loadFromSource with a JSON string). Do I misunderstand something?

 

Many thanks!

 

Markus ;)

 

BTW: Here's my JSON for the settings:

 

{
   "analysis": { ... },
   "settings": {
      "index": {
         "number_of_replicas": 1,
         "number_of_shards": 3
      }
   }
}

 




Get distinct result by using multi_match and suggestion

2014-08-26 Thread Ramy
Is there a way to solve the following problem?

I have created a search field with suggestions functionality. The user is 
able to search for names, categories, etc. These fields are mapped like:

{
  "settings": {
    "analysis": {
      "analyzer": {
        "autocomplete": {
          "type": "custom",
          "tokenizer": "edge_ngram_tokenizer",
          "filter": [ "lowercase" ]
        }
      },
      "tokenizer": {
        "edge_ngram_tokenizer": {
          "type": "edgeNGram",
          "min_gram": 1,
          "max_gram": 20,
          "token_chars": [ "letter", "digit" ]
        }
      }
    }
  },
  "mappings": {
    "my_type": {
      "dynamic": "strict",
      "properties": {
        "id": {
          "type": "long"
        },
        "name": {
          "type": "string",
          "analyzer": "english",
          "fields": {
            "autocomplete": {
              "type": "string",
              "index_analyzer": "autocomplete",
              "search_analyzer": "standard"
            }
          }
        },
        "category": {
          "type": "string",
          "analyzer": "english",
          "fields": {
            "autocomplete": {
              "type": "string",
              "index_analyzer": "autocomplete",
              "search_analyzer": "standard"
            }
          }
        },
        ...

Now when I do something like this:
curl -XGET "http://localhost:9200/my_index/my_type/_search" -d'
{
  "_source": false,
  "query": {
    "multi_match": {
      "query": "pet",
      "fields": [
        "*.autocomplete"
      ]
    }
  }
}'

I get results like these:
- Peter
- Peter
- Peter
- Petra
- Petra
etc.

How can I reduce (distinct) the results on the server side, like these?
- Peter
- Petra
- etc.
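
One server-side option (a sketch, not from the thread; it assumes a hypothetical not_analyzed sub-field such as name.raw, since buckets on the analyzed field would be per token) is to request distinct values with a terms aggregation instead of reading hits:

```json
{
  "size": 0,
  "query": {
    "multi_match": {
      "query": "pet",
      "fields": [ "*.autocomplete" ]
    }
  },
  "aggs": {
    "distinct_names": {
      "terms": { "field": "name.raw" }
    }
  }
}
```

Each bucket key is then a distinct value ("Peter", "Petra", ...), with a doc count alongside.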

thx, Ramy



Re: Building an ERP with Elasticsearch. Am I crazy?

2014-08-26 Thread Raphael Waldmann
Mohit Anchlia,

How do you sync ES with your main DB?

That's what I'm thinking about for my project, because I don't have much
experience with ES.

Thanks
 On Aug 26, 2014 1:55 AM, Mohit Anchlia mohitanch...@gmail.com wrote:

 In general, use Elasticsearch only as a secondary index. Keep a copy of the
 data somewhere else which is more reliable. Elasticsearch often runs into
 index corruption issues which are hard to resolve.


 On Mon, Aug 25, 2014 at 9:30 PM, xiehai...@gmail.com wrote:


 On Tuesday, August 26, 2014 6:46:12 AM UTC+8, Raphael Waldmann wrote:

 Hi,

 First I would like to thank all of you for Elasticsearch. I am thinking of
 using it in an ERP that I am building. What do you think about this? Am I
 crazy?

 Has someone faced this? I really don't think that I am comfortable enough to
 do this: trading the problems that I already know for new problems that I
 really don't know how to deal with.

 I believe that NoSQL will prevail over traditional SQL, but I don't know
 if I am ready for this task.

 So how do you think I should integrate (or not) PostgreSQL with
 Elasticsearch?


 Do you plan to use ES to index data in PostgreSQL?

 I have a similar idea: I want to use ES instead of a data warehouse.

 Some problems I can see:
 1) Data in an RDBMS are stored in tables, connected by relationships. You
 can use very complex SQL to query a complex result; how do you do that in ES?
 2) If you want to run some analysis algorithms on existing data, how do you
 run them in ES?
 3) If your data is big enough, will searching one keyword in the '_all'
 field be slow in ES?


 Thanks.
 -Terrs

 Thanks again,


 rsw1981







Re: AutoCompletion Suggester - Duplicate record in suggestion return

2014-08-26 Thread alistairj
Hi Alexander,

If I may, I have a follow-up question to your response here. How does the 
completion suggester behave with fields such as payload and score when it 
is unifying the response based on output? Are scores increased based on 
this combination? If payloads are different, which ones are returned?

Thanks for your help!

Alistair


On Monday, April 21, 2014 2:26:13 PM UTC+2, Alexander Reelsen wrote:

 Hey,

 the output is used to unify the search results; otherwise the input is 
 used. The payload itself is just meta information.
 The main reason why you see the suggestion twice is that, even though a 
 document is deleted and cannot be found anymore, the suggest data 
 structures are only cleaned up during merges/optimizations. Running 
 optimize should fix this.

 Makes sense?


 --Alex



 On Sun, Apr 13, 2014 at 12:49 PM, kidkid zki...@gmail.com wrote:

 I have figured out the problem.
 The main problem is that I used the same output for all inputs, so ES
 behaved incorrectly in this case.

 I am still trying to improve the performance. I am testing on a 64 GB RAM
 server (32 GB for ES 1.0.1) with 24 cores.
 There are only 2 records, but it takes 3 ms to get a suggestion.






 On Sunday, April 13, 2014 4:53:21 PM UTC+7, kidkid wrote:

 There is something really strange.
 I don't know whether anyone has worked with this feature, or whether it's
 just an unstable feature.
 If we index the same input with different output/payload, then only one
 result is found.

 Can anyone tell me how I could fix it?








Timezone in Simple Query

2014-08-26 Thread Gianni Livolsi
All dates are UTC. Internally, a date maps to a number of type long.
When applied to date fields, the range filter also accepts a time_zone
parameter:

{
  "range": {
    "born": {
      "gte": "2012-01-01",
      "time_zone": "+1:00"
    }
  }
}

but this is not possible:

{
  "match": {
    "post_date": "2012-01-01",
    "time_zone": "+1:00"    <-- this does not work
  }
}

How can I let each user query correctly with respect to their own time zone 
by appending it to the query?
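
One way to get the equivalent of that match query (a sketch using the range filter's time_zone support mentioned above; field name as in the example) is to wrap a range filter in a filtered query:

```json
{
  "query": {
    "filtered": {
      "filter": {
        "range": {
          "post_date": {
            "gte": "2012-01-01",
            "lt": "2012-01-02",
            "time_zone": "+1:00"
          }
        }
      }
    }
  }
}
```

The client would then only need to append the user's time_zone value to the range clause rather than to the match.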


 

Tnx 



Re: Building an ERP with Elasticsearch. Am I crazy?

2014-08-26 Thread Raphael Waldmann
I am reading a lot, studying what the best approach for this is.

My main question can be summarized in two points:

If I choose ES to index my PostgreSQL data, what's the best way to do that?

Do I need a cluster? Most of the problems I read about were related to that.
If that's true, and I can run on one node, should I do that?

Thanks for sharing your experience.

Have a nice day

Raphael Waldmann



Aggregation across indices

2014-08-26 Thread 'Sandeep Ramesh Khanzode' via elasticsearch
Hi,

If I have two indices, each holding part of a record joined by some common 
identifier, can I issue a query across both indices and have aggregations 
applied taking both indices into consideration?

Example:
Index 1: Type 1:
ID: String
Field1: String
Field2: String

Index 2: Type 2:
ID: String (Same as above; I can keep it the same so it behaves like a foreign key.)
Field3: String
Field4: String

Can I effect a join across both indices and aggregate on Field4 for example?

Please let me know. Thanks,
Sandeep 



Re: Ability to search accross 'types' in the same index, with different search parameters yet applying the same size and from values, in a single search query

2014-08-26 Thread vineeth mohan
Hello AJ ,

You can do this as follows:

{
  "query": {
    "query_string": {
      "query": "test-type1.status:1 || test-type2.status:2"
    }
  }
}

But there is a bug associated with a corner condition of this:
https://github.com/elasticsearch/elasticsearch/issues/4081
So be a bit careful.

Thanks
  Vineeth
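Editor's note, hedged: an alternative that sidesteps the query_string corner case mentioned above is a filtered query whose bool filter pairs a `_type` term with the per-type status term (ES 1.x syntax; the `from`/`size` values are illustrative, not from the thread):

```json
{
  "query": {
    "filtered": {
      "filter": {
        "bool": {
          "should": [
            { "bool": { "must": [
                { "term": { "_type": "test-type-1" } },
                { "term": { "status": 1 } }
            ] } },
            { "bool": { "must": [
                { "term": { "_type": "test-type-2" } },
                { "term": { "status": 2 } }
            ] } }
          ]
        }
      }
    }
  },
  "from": 0,
  "size": 10
}
```

Because a single query does the work, a single `size`/`from` pair pages over the merged result set, avoiding the client-side merge described below.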


On Tue, Aug 26, 2014 at 1:21 PM, Ajinkya Apte ajin...@gmail.com wrote:

 Hello,
 Examples of some documents:

 POST /test-index/test-type-1/doc-1
 {
   "text" : "Some text",
   "status" : 1
 }

 POST /test-index/test-type-2/doc-1
 {
   "text" : "Some new text",
   "status" : 1
 }

 POST /test-index/test-type-2/doc-2
 {
   "text" : "Some even new text",
   "status" : 2
 }

 Is there a single query I can use so that I can get all the documents that
 have 'status'=1 in 'type'='test-type-1' and 'status'=2 in
 'type'='test-type-2', applying the same 'size' and 'from' params? As of
 right now I am running two different queries and then merging the results
 programmatically. Is there a better way you would recommend?

 AJ





Re: Aggregation across indices

2014-08-26 Thread vineeth mohan
Hello Sandeep ,

What you are intending is not possible.
Elasticsearch does have some good relational operations, but they need
to be defined before indexing.
If you can elaborate on your use case, we can help with this.
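Editor's hedged illustration of one such operation: in ES 1.x the closest built-in relational feature is a parent/child mapping declared at index-creation time, which lets documents of one type be queried against related documents of another type within a single index. The type and field names below follow Sandeep's example; this is a sketch, not a drop-in answer:

```json
{
  "mappings": {
    "type1": {
      "properties": {
        "ID":     { "type": "string" },
        "Field1": { "type": "string" },
        "Field2": { "type": "string" }
      }
    },
    "type2": {
      "_parent": { "type": "type1" },
      "properties": {
        "Field3": { "type": "string" },
        "Field4": { "type": "string" }
      }
    }
  }
}
```

With this in place, `has_parent`/`has_child` queries can relate the two types; cross-index joins remain unsupported.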

Thanks
   Vineeth


On Tue, Aug 26, 2014 at 6:04 PM, 'Sandeep Ramesh Khanzode' via
elasticsearch elasticsearch@googlegroups.com wrote:

 Hi,

 If I have two indices each having part of the record and joined using some
 common identifier, can I issue a query across both indices and have
 aggregations apply taking into consideration both indices?

 Example:
 Index 1: Type 1:
 ID: String
 Field1: String
 Field2: String

 Index 2: Type 2:
 ID: String (From above. I can keep this same to behave like a foreign key.)
 Field3: String
 Field4: String

 Can I effect a join across both indices and aggregate on Field4 for
 example?

 Please let me know. Thanks,
 Sandeep





Failed start of 2nd instance on same host with mlockall=true

2014-08-26 Thread R. Toma
Hi all,

In an attempt to squeeze more power out of our physical servers we want to 
run multiple ES jvm's per server.

Some specs:
- servers have 24 cores, 256GB ram
- each instance binds on a different (alias) ip
- each instance has a 32GB heap
- both instances run under user 'elastic'
- limits for the 'elastic' user: memlock=unlimited
- es config for both instances: bootstrap.mlockall=true

The 1st instance has been running for weeks.

When starting the 2nd instance, the following things happen:
- increase of overall cpu load
- lots of I/O to disks
- no logging for the 2nd instance
- 2nd instance hangs
- 1st instance keeps running, but gets slowish
- cd /proc/pid causes a hang of the cd process (until the 2nd instance is killed)
- exec 'ps axuw' causes a hang of the ps process (until the 2nd instance is killed)

Maybe (un)related: I have never been able to run Elasticsearch in a 
virtualbox with memlock=unlimited and mlockall=true.


After an hour of trial & error I found that removing the setting 
'bootstrap.mlockall' (setting it to false) from the 2nd instance's 
configuration fixes things.

I am confused, but acknowledge I do not know anything about memlocking.

Any ideas?

Regards,
Renzo







Re: Need some advice to build a log central.

2014-08-26 Thread vineeth mohan
Hello Sang ,

Can I know why you are using Hive?
I feel you can do the analysis in Elasticsearch itself.
The rest looks good to me.

Thanks
   Vineeth


On Tue, Aug 26, 2014 at 8:03 AM, Sang Dang zkid...@gmail.com wrote:

 Hello All,
 I have selected #2 as my solution.
 I write data to ES, and use kibana+ to realtime monitor.
 For stats, I use Hive.

 For each project I will create an index, and each type of log I will put
 in an ES type, e.g. for ProjectX:
   log_debug
   log_error
   Stats_API
   Stats_PageView
   Stats_XYZ

 I am wondering whether this is good?
 Should I separate by time for each type of project?

 Regards.





Re: Is it possible to register a RestFilter without creating a plugin?

2014-08-26 Thread vineeth mohan
Hello Jinyuan ,

I don't feel this is possible.
With such a provision, how would you define what the REST API should do?

Thanks
   Vineeth


On Tue, Aug 26, 2014 at 2:41 AM, Jinyuan Zhou zhou.jiny...@gmail.com
wrote:

 Thanks,





define multiple types in an index

2014-08-26 Thread HansPeterSloot
Hello,

I am using elasticsearch 1.3.2 and try to understand elasticsearch (with my 
Oracle background ;-)).
For testing I use the data available 
on http://fec.gov/disclosurep/PDownload.do

There is a datafile for every state of the USA.
I don't know whether it is a good idea, but I want to make one index with a 
type for every state.

I want to define the fields and their types in advance.

Can I create the index with type AL and add other types after creation?
I tried but was not able to do it.

I created the following index:

curl -XPOST localhost:9200/contributions -d '{
  "settings": {
    "number_of_shards": 10,
    "number_of_replicas": 1,
    "_index": true
  },
  "mappings": {
    "AK": {
      "properties": {
        "cand_id": { "type": "string", "index": "not_analyzed" },
        "cand_nm": { "type": "string" },
        "cmte_id": { "type": "string" }
      }
    },
    "AL": {
      "properties": {
        "cand_id": { "type": "string", "index": "not_analyzed" },
        "cand_nm": { "type": "string" },
        "cmte_id": { "type": "string" }
      }
    }
  }
}'

Can I add types for AR and AZ after creation?
They have the same column definition. 
Is there a better way to achieve this?

Regards HansP
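Editor's hedged sketch of one answer to the question above: in Elasticsearch 1.x, additional types can be added to an existing index with the put-mapping API, e.g. `curl -XPUT localhost:9200/contributions/_mapping/AR -d '...'`, with a body mirroring the per-state mapping already defined (AR used as an example):

```json
{
  "AR": {
    "properties": {
      "cand_id": { "type": "string", "index": "not_analyzed" },
      "cand_nm": { "type": "string" },
      "cmte_id": { "type": "string" }
    }
  }
}
```

Since every state reportedly has the same field definitions, an index template or a single shared type with a `state` field may also be worth considering.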



Re: Swap indexes?

2014-08-26 Thread Lee Gee
I was looking for the index alias, thanks all.
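Editor's hedged illustration of the alias mechanism referred to above: both alias actions in a single `POST /_aliases` request are applied atomically, which is what makes a no-downtime swap between an old and a new index possible (index and alias names here are examples, not from the thread):

```json
{
  "actions": [
    { "remove": { "index": "items_v1", "alias": "items" } },
    { "add":    { "index": "items_v2", "alias": "items" } }
  ]
}
```

Clients always search the `items` alias, so they never notice the switch from `items_v1` to `items_v2`.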

On Tuesday, June 17, 2014 9:31:00 AM UTC+1, Lee Gee wrote:

 Is it possible to have one ES instance create an index and then have a 
 second instance use that created index, without downtime?

 tia
 lee




Re: _suggest suggestion/question

2014-08-26 Thread Lee Gee
Thank you, Vineeth.

On Sunday, August 17, 2014 12:04:20 PM UTC+1, vineeth mohan wrote:

 Hello Lee ,

 You will need to use context suggester for this purpose - 
 http://www.elasticsearch.org/guide/en/elasticsearch/reference/current/suggester-context.html

 Also, this difference stems from the fact that the actual data and the 
 auto-completion data are stored in different data structures.
 This is to make sure that the auto-completion data is memory-resident and 
 thus super fast.

 Thanks
   Vineeth


 On Sun, Aug 17, 2014 at 3:32 PM, Lee Gee lee...@gmail.com wrote:

 My reading of this clear and concise post [1], which may not be accurate, 
 is that it is not possible to use a reference to an existing field as an 
 argument to a suggester's 'input' or 'payload' fields.

 Please would you clarify if I have missed something?

 If I was correct, would it be much work to add these features?

 TIA
 Lee

 [1] http://www.elasticsearch.org/blog/you-complete-me/







Re: How to get the field infomation when _all and _source was set disabled

2014-08-26 Thread vineeth mohan
Hello Wang ,

By default the _source field stores the input JSON and gives it back for
each document match.
If you disable it, ES won't be able to return it.
Hence the result you see.
By default ES makes no effort to tap the stored-field information; it
takes the JSON stored in the _source field instead.

Now, to get back text that was set as stored, you need to use the fields option.
Typically, you tell ES which fields you want.
That information is then looked up in the stored-field space rather than
in _source.

In your query, you need to mention the fields you are interested in.
The Java equivalent is:

searchRequestBuilder.setTypes("type1").addField("title");


 Thanks
   Vineeth


On Mon, Aug 25, 2014 at 1:09 PM, Wang Mingxing wmx...@gmail.com wrote:

  Hi,
 I created an index, which was named test_all, and it has a table :
 type1. I want to test the usage of _all and _source. Now , I change
 their status to false. The mapping as follows:
 $ curl -XGET 'localhost:9200/test_all/_mapping/type1?pretty'
 {
   "test_all" : {
     "mappings" : {
       "type1" : {
         "_all" : {
           "enabled" : false
         },
         "_source" : {
           "enabled" : false
         },
         "properties" : {
           "content" : {
             "type" : "string",
             "analyzer" : "ik"
           },
           "title" : {
             "type" : "string",
             "store" : true,
             "analyzer" : "ik"
           }
         }
       }
     }
   }
 }

 In the type type1, I store the title information. I inserted five
 documents in type1. But when retrieving them, I could not find the
 "title" field information.

 $ curl -XGET 'localhost:9200/test_all/type1/_search?pretty'
 {
   "took" : 16,
   "timed_out" : false,
   "_shards" : {
     "total" : 5,
     "successful" : 5,
     "failed" : 0
   },
   "hits" : {
     "total" : 5,
     "max_score" : 1.0,
     "hits" : [ {
       "_index" : "test_all",
       "_type" : "type1",
       "_id" : "zWQno3rLS56hkwJ_Y108Dg",
       "_score" : 1.0
     }, {
       "_index" : "test_all",
       "_type" : "type1",
       "_id" : "BDKa-IP7TDK_iM2VNGFPYw",
       "_score" : 1.0
     }, {
       "_index" : "test_all",
       "_type" : "type1",
       "_id" : "n97suWSwQACgx35APTOqPg",
       "_score" : 1.0
     }, {
       "_index" : "test_all",
       "_type" : "type1",
       "_id" : "2P7OblUiQB2Y8ZCtWWWTdg",
       "_score" : 1.0
     }, {
       "_index" : "test_all",
       "_type" : "type1",
       "_id" : "Lo_PFVeKTEWazwCLbyKAqQ",
       "_score" : 1.0
     } ]
   }
 }

 Then, I tried to resolve it via the Java API:

 public static void indexSearch(Client client) {
     SearchRequestBuilder searchRequestBuilder = client.prepareSearch("test_all");
     searchRequestBuilder.setTypes("type1");
     SearchResponse searchResponse = searchRequestBuilder.execute().actionGet();
     SearchHit[] hits = searchResponse.getHits().getHits();
     System.out.println("count: " + hits.length);
     for (SearchHit hit : hits) {
         System.out.println();
         System.out.println("docID: " + hit.getId());
         System.out.println("score: " + hit.getScore());
         System.out.println("title: " + hit.getFields().get("title").toString());
     }
 }

  and it shows:

 Exception in thread main count: 5
 
 java.lang.NullPointerException
 at es.api.Test_All.indexSearch(Test_All.java:64)
 at es.api.Test_All.main(Test_All.java:73)
 docID: zWQno3rLS56hkwJ_Y108Dg
 score: 1.0

 I guess the value doesn't exist.

 Can you tell me why?

 Many Thanks.





gateway.recover_after_nodes minimum_master_nodes in a distributed environment?

2014-08-26 Thread Chris Neal
Hello all,

A question about gateway.recover_after_nodes and
discovery.zen.minimum_master_nodes in a distributed ES cluster. By
distributed I mean I have:

2 nodes that are data only:
'node.data' = 'true',
'node.master' = 'false',
'http.enabled' = 'false',

1 node that is a master/search only node:
'node.master' = 'true',
'node.data' = 'false',
'http.enabled' = 'true',

When setting discovery.zen.minimum_master_nodes, is the (n / 2) + 1 formula
including *all* nodes of all types in the cluster, or just those who can be
masters?

Similarly, when setting gateway.recover_after_nodes, is this value the
number of all nodes of all types in the cluster, or just those that are
data nodes?

Thank you very much for your time!
Chris
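Editor's note, hedged, based on the Elasticsearch reference docs rather than a reply in this thread: discovery.zen.minimum_master_nodes is a quorum over master-eligible nodes only, while gateway.recover_after_nodes counts joined nodes of any type (gateway.recover_after_data_nodes and gateway.recover_after_master_nodes exist for the finer-grained cases). For the 3-node layout above that would suggest, as a sketch:

```yaml
# only 1 node is master-eligible, so the (n / 2) + 1 quorum over masters is 1
discovery.zen.minimum_master_nodes: 1
# count every node (2 data + 1 master/search) before starting recovery
gateway.recover_after_nodes: 3
```

A single master-eligible node is of course a single point of failure; three master-eligible nodes with minimum_master_nodes set to 2 is the usual recommendation.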



Re: Using elasticsearch as a realtime fire hose

2014-08-26 Thread Jilles van Gurp
You might want to look at developing a plugin for this or maybe using an 
existing one. This one for example might do partly what you 
need: https://github.com/derryx/elasticsearch-changes-plugin

If you develop your own plugin, you should be able to tap into what is 
happening in the cluster at a pretty low level.

Jilles

On Monday, August 25, 2014 9:27:42 AM UTC+2, Jim Alateras wrote:

 What kind of events do you think of? Single new document indexed? Batch of 
 docs indexed? Node-wide? Or cluster wide?

 An event whenever a document is added to an index, cluster-wide.
  


 You mention Redis; for something like a publish/subscribe pattern, you'd 
 have to use a persistent connection and implement your own ES actions, 
 which is possible with e.g. HTTP websockets.

 A sketchy implementation can be found here:

 https://github.com/jprante/elasticsearch-transport-websocket


 thanks for the reference,  I will have a deeper look at it. 



 Jörg



 On Sat, Aug 23, 2014 at 8:09 PM, Jim Alateras j...@sutoiku.com wrote:

 I was wondering whether there were any mechanisms to use ES as a 
 realtime feed for downstream systems. I have a cluster that gathers 
 observations from many sensors. I have a need to maintain a list of 
 realtime counters in REDIS so I want to further process these observation 
 once they hit the database. Additionally I also want to be able to create 
 event streams for different type of feeds. 

 I could do all this outside ES but I was wondering whether there were 
 mechanisms within ES that will allow me to subscribe to add events for a 
 particular type or index.


 cheers
 /jima







Re: Logstash stop communicating with Elasticsearch

2014-08-26 Thread Jilles van Gurp
I had some issues with logstash as well and ended up modifying the 
elasticsearch_http plugin to tell me what was going on. It turned out my 
cluster was red because my index template required more replicas than were 
possible :-). The problem was that logstash does not fail very gracefully, 
and its logging is not that great either (which I find ironic for a logging-
centric product). So I modified it to simply log the actual Elasticsearch 
response, which was a 503 Unavailable. From there it was pretty clear what 
to fix.

I filed a bug + pull request for this but it seems nobody has done anything 
with it so far: https://github.com/elasticsearch/logstash/issues/1367

Jilles

On Saturday, August 23, 2014 2:51:18 PM UTC+2, 凌波清风 wrote:

 Hello, 
 I have also run into this problem; in my case the error occurs every 
 morning. I do not know how to solve it and hope you can give some help.

 Thx.

 On Friday, July 18, 2014 at 8:56:54 PM UTC+8, Alexandre Fricker wrote:

 Everything was working fine until 4 h this morning, when Logstash stopped 
 sending new logs to Elasticsearch. When I stop and then restart the 
 logstash process, it reprocesses a bulk of new log lines, and when it 
 starts to send them to Elasticsearch it starts writing this message again 
 and again:

 {:timestamp=2014-07-18T09:46:29.593000+0200, :message=Failed to 
 flush outgoing items, :outgoing_count=86, :exception=#RuntimeError: 
 Non-OK response code from Elasticsearch: 404, 
 :backtrace=[/soft/sth/lib/logstash/outputs/elasticsearch/protocol.rb:127:in
  
 `bulk_ftw', 
 /soft/sth/lib/logstash/outputs/elasticsearch/protocol.rb:80:in `bulk', 
 /soft/sth/lib/logstash/outputs/elasticsearch.rb:321:in `flush', 
 /soft/sth/vendor/bundle/jruby/1.9/gems/stud-0.0.17/lib/stud/buffer.rb:219:in
  
 `buffer_flush', org/jruby/RubyHash.java:1339:in `each', 
 /soft/sth/vendor/bundle/jruby/1.9/gems/stud-0.0.17/lib/stud/buffer.rb:216:in
  
 `buffer_flush', 
 /soft/sth/vendor/bundle/jruby/1.9/gems/stud-0.0.17/lib/stud/buffer.rb:193:in
  
 `buffer_flush', 
 /soft/sth/vendor/bundle/jruby/1.9/gems/stud-0.0.17/lib/stud/buffer.rb:112:in
  
 `buffer_initialize', org/jruby/RubyKernel.java:1521:in `loop', 
 /soft/sth/vendor/bundle/jruby/1.9/gems/stud-0.0.17/lib/stud/buffer.rb:110:in
  
 `buffer_initialize'], :level=:warn}

 But when I check Elastisearch status in Elastisearch HQ everything is 
 Green and OK

 From the day beafore nothing change except that I added a new type of 
 data but only 15 logs every 1 minute





Re: Java API or REST API for client development ?

2014-08-26 Thread Jilles van Gurp
I use an in-house developed Java REST client for Elasticsearch. 
Unfortunately it's not in any shape to untangle from our code base and put 
on GitHub yet, but I might consider that if there's more interest.

Basically I use Apache HttpClient. I implemented a simple round-robin 
strategy so I can fail over if nodes go down, and I implemented a simple 
REST client around this to support put/post/delete/get requests. I also 
added some basic interpretation of statuses and mapped those to sensible 
exceptions. 

The idea is that this client is wrapped with another client that supports 
more high level APIs that are exposed from elasticsearch. So you can do 
things like index/delete documents, manage aliases, do bulk indexing etc. 
My long term goal was actually to have two implementations of that client 
one for REST and one for embedded elasticsearch. That would be an 
interesting project because it would give you choice. Except, I never got 
around to doing the embedded client implementation since we don't really 
need it so far. Something else that we use is to model the query DSL using 
static java methods and provides a simple DSL for creating queries in Java. 
This in turn uses my github jsonj project that allows you to 
programmatically manipulate json structures. 

None of this is particularly complicated but altogether there is quite a 
bit of code to write and quite a few things you can get wrong. It's always 
hard to separate the general purpose stuff from the application specific 
stuff and thats one reason why I have not yet put this code out. 

Jilles
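The round-robin failover idea described in the post can be sketched in a few lines of plain Java; the class and method names below are illustrative, not taken from the author's actual client (which wraps Apache HttpClient):

```java
import java.util.List;
import java.util.concurrent.atomic.AtomicInteger;

// Minimal round-robin host selector: each next() call returns the next
// host in rotation, and a host reported as down is skipped until it is
// marked up again.
class RoundRobinHosts {
    private final List<String> hosts;
    private final boolean[] down;
    private final AtomicInteger cursor = new AtomicInteger();

    RoundRobinHosts(List<String> hosts) {
        this.hosts = hosts;
        this.down = new boolean[hosts.size()];
    }

    synchronized String next() {
        // Trying hosts.size() consecutive cursor positions visits every
        // index once, so an up host is found if any exists.
        for (int i = 0; i < hosts.size(); i++) {
            int idx = Math.floorMod(cursor.getAndIncrement(), hosts.size());
            if (!down[idx]) {
                return hosts.get(idx);
            }
        }
        throw new IllegalStateException("all hosts marked down");
    }

    synchronized void markDown(String host) { down[hosts.indexOf(host)] = true; }
    synchronized void markUp(String host)   { down[hosts.indexOf(host)] = false; }
}
```

The wrapping client would call markDown when a request fails with a connection error, retry on the host next() returns, and mark the node up again once a periodic health check succeeds.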


On Wednesday, March 26, 2014 10:46:16 AM UTC+1, Subhadip Bagui wrote:

 Hi, 

 We have a cloud management framework where all the event data are to be 
 stored in elasticsearch. I have to start the client side code for this.
  
 I need a suggestion here. Which one should I use, elasticsearch Java API 
 or REST API for the client ?

 Kindly suggest and mention the pros and cons for the same so it will be 
 easy for me to decide the product design than latter hassel.

 Subhadip
  





Re: Failed start of 2nd instance on same host with mlockall=true

2014-08-26 Thread joergpra...@gmail.com
You should run one node per host.

Two nodes add overhead and suffer from the effects you described.

For mlockall, the user needs the privilege to allocate the specified locked
memory, and the OS needs contiguous RAM per mlockall call. If the user's
memlock limit is exhausted, or if RAM allocation gets fragmented,
memlocking is no longer possible and fails.

Jörg


On Tue, Aug 26, 2014 at 2:54 PM, R. Toma renzo.t...@gmail.com wrote:

 Hi all,

 In an attempt to squeeze more power out of our physical servers we want to
 run multiple ES jvm's per server.

 Some specs:
 - servers has 24 cores, 256GB ram
 - each instance binds on different (alias) ip
 - each instance has 32GB heap
 - both instances run under user 'elastic'
 - limits for 'elastic' user: memlock=unlimited
 - es config for both instances: bootstrap.mlockall=true

 The 1st instance has been running for weeks.

 When starting the 2nd instance the following things happen:
 - increase of overal cpu load
 - lots of I/O to disks
 - no logging for 2nd instance
 - 2nd instance hangs
 - 1st instance keeps running, but gets slowish
 - cd /proc/pid causes a hang of cd process (until 2nd instance is killed)
 - exec 'ps axuw' causes a hang of ps process (until 2nd instance is killed)

 Maybe (un)related: I have never been able to run Elasticsearch in a
 virtualbox with memlock=unlimited and mlockall=true.


 After an hour of trial & error I found that removing the setting
 'bootstrap.mlockall' (setting it to false) from the 2nd instance's
 configuration fixes things.

 I am confused, but acknowledge I do not know anything about memlocking.

 Any ideas?

 Regards,
 Renzo









Re: how to use my customer lucene analyzer(tokenizer)?

2014-08-26 Thread art
Thanks Jun, that was helpful.  It helped me to realize I had not fully 
connected my analyzer plugin.

On Thursday, August 21, 2014 11:23:47 PM UTC-7, Jun Ohtani wrote:

 Hi Art,

 I wrote an example specifying the kuromoji analyzer(kuromoji) and custom 
 analyzer(my_analyzer) for a field.

 curl -XPUT "http://localhost:9200/kuromoji-sample" -d '
 {
   "settings": {
     "index": {
       "analysis": {
         "analyzer": {
           "my_analyzer": {
             "tokenizer": "kuromoji_tokenizer",
             "filter": [
               "kuromoji_baseform"
             ]
           }
         }
       }
     }
   },
   "mappings": {
     "sample": {
       "properties": {
         "title": {
           "type": "string",
           "analyzer": "my_analyzer"
         },
         "body": {
           "type": "string",
           "analyzer": "kuromoji"
         }
       }
     }
   }
 }'

 I hope that it will be helpful for you.


 2014-08-22 9:18 GMT+09:00 a...@safeshepherd.com:

 I have the same question about using an analyzer I have written as a 
 plug-in for ElasticSearch 1.3.


 https://github.com/elasticsearch/elasticsearch-analysis-kuromoji/blob/es-1.3/README.md
  
 demonstrates only how to use the tokenizers in combination with the 
 built-in CustomAnalyzer. They do not show how to use the kuromoji analyzer 
 itself.

 When I try to specify my analyzer for a field, I get errors like this:

 MapperParsingException[Analyzer [special_analyzer] not found for field 
 [foo]];

 Can you show an example of how to specify the kuromoji analyzer for a 
 field?  I should then be able to adapt it for use with my plugin analyzer.

 Thanks in advance,
 Art



 On Tuesday, August 5, 2014 12:34:42 AM UTC-7, Jun Ohtani wrote:

 Hi,

 I think this plugin will be helpful for you.

 https://github.com/elasticsearch/elasticsearch-analysis-kuromoji
 2014/08/05 15:58 fanc...@gmail.com:

 I want to use my own Chinese analyzer and I can write lucene analyzer 
 class myself. How can I integrate it to elasticsearch?
 I googled and found http://www.elasticsearch.org/guide/en/
  elasticsearch/guide/current/custom-analyzers.html. But it only combines 
  existing tokenizers and filters. I can write a tokenizer in Java by 
  myself.

 -- 
 You received this message because you are subscribed to the Google 
 Groups elasticsearch group.
 To unsubscribe from this group and stop receiving emails from it, send 
 an email to elasticsearc...@googlegroups.com.

 To view this discussion on the web visit https://groups.google.com/d/
 msgid/elasticsearch/c3fe52cd-8cb5-4c53-b0fe-87183deb45bf%
 40googlegroups.com 
 https://groups.google.com/d/msgid/elasticsearch/c3fe52cd-8cb5-4c53-b0fe-87183deb45bf%40googlegroups.com?utm_medium=emailutm_source=footer
 .
 For more options, visit https://groups.google.com/d/optout.

  -- 
 You received this message because you are subscribed to the Google Groups 
 elasticsearch group.
 To unsubscribe from this group and stop receiving emails from it, send an 
  email to elasticsearc...@googlegroups.com.
 To view this discussion on the web visit 
 https://groups.google.com/d/msgid/elasticsearch/da795847-3ea2-4afb-9a7b-aefdd6f111a0%40googlegroups.com
  
 https://groups.google.com/d/msgid/elasticsearch/da795847-3ea2-4afb-9a7b-aefdd6f111a0%40googlegroups.com?utm_medium=emailutm_source=footer
 .

 For more options, visit https://groups.google.com/d/optout.




 -- 
 ---
 Jun Ohtani
 blog : http://blog.johtani.info
  

-- 
You received this message because you are subscribed to the Google Groups 
elasticsearch group.
To unsubscribe from this group and stop receiving emails from it, send an email 
to elasticsearch+unsubscr...@googlegroups.com.
To view this discussion on the web visit 
https://groups.google.com/d/msgid/elasticsearch/a792d08d-534f-4619-bfcb-0f01262b6c51%40googlegroups.com.
For more options, visit https://groups.google.com/d/optout.


Function Query with an aggregation function of nested field

2014-08-26 Thread Srinivasan Ramaswamy
I have documents with the following schema.


{
  "authorId": 10,
  "authorName": "Joshua Bloch",
  "books": [
    {
      "bookId": 101,
      "bookName": "Effective Java",
      "description": "effective java book with useful recommendations",
      "Category": 1,
      "sales": [
        { "keyword": "effective java", "count": 200 },
        { "keyword": "java tips", "count": 100 },
        { "keyword": "java joshua bloch", "count": 50 }
      ],
      "createDate": "08-25-2014"
    },
    {
      "bookId": 102,
      "bookName": "Java Puzzlers",
      "description": "Java Puzzlers: Traps, Pitfalls, and Corner Cases",
      "Category": 2,
      "sales": [
        { "keyword": "java puzzlers", "count": 100 },
        { "keyword": "joshua bloch puzzler", "count": 50 }
      ]
    }
  ]
}

The sales information is stored with each book along with the search query 
that led to those sales. If the user applied a category filter, I would 
like to count only books that belong to that category.

I would like to sort the list of authors returned based on a function of 
sales data and text match. For example, if the search query is "java", I would 
like to return the above mentioned doc and all other author documents which 
have the term "java" in them. I came up with the following query:

{
   "query": {
      "function_score": {
         "boost_mode": "replace",
         "query": {
            "match": { "bookName": "java" }
         },
         "script_score": {
            "params": {
               "param1": 2
            },
            "script": "doc['books.sales.count'].isEmpty() ? _score : _score * doc['books.sales.count'].value * param1"
         }
      }
   }
}


I have a few questions about the query above:
1. The results don't look sorted by sales; authors who don't have any 
books with sales appear at the top.
2. How do I use the sum of all sales for an author (across all books within 
the author document) in the script? Is there a sum function for nested 
fields inside a document when using script_score? Note that sales is a 
nested field inside another nested field, books.
3. As a next step I would also like to use a filter on keyword within the 
script_score to only include sales whose keyword value matches the 
search query term.
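For reference, the roll-up asked about in questions 2 and 3 can be sketched client-side. This is an illustrative Python sketch over documents shaped like the schema above (field names taken from it), not an Elasticsearch feature:

```python
def total_sales(author_doc, keyword=None):
    """Sum nested sales counts across all books of an author,
    optionally keeping only sales whose keyword contains a term
    (mirrors questions 2 and 3 above)."""
    total = 0
    for book in author_doc.get("books", []):
        for sale in book.get("sales", []):
            if keyword is None or keyword in sale.get("keyword", ""):
                total += sale.get("count", 0)
    return total

doc = {"books": [
    {"sales": [{"keyword": "effective java", "count": 200},
               {"keyword": "java tips", "count": 100}]},
    {"sales": [{"keyword": "java puzzlers", "count": 100}]},
]}
# total_sales(doc) == 400; total_sales(doc, "puzzlers") == 100
```

Inside Elasticsearch itself, doing this per hit would need the script to walk the nested objects rather than rely on a flattened doc value.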

Any help would be much appreciated. 

Thanks
Srini

-- 
You received this message because you are subscribed to the Google Groups 
elasticsearch group.
To unsubscribe from this group and stop receiving emails from it, send an email 
to elasticsearch+unsubscr...@googlegroups.com.
To view this discussion on the web visit 
https://groups.google.com/d/msgid/elasticsearch/f858caee-bb43-45e1-ada3-212a78378aa0%40googlegroups.com.
For more options, visit https://groups.google.com/d/optout.


Re: JVM crash on 64 bit SPARC with Elasticsearch 1.2.2 due to unaligned memory access

2014-08-26 Thread joergpra...@gmail.com
Thanks for the logstash mapping command. I can reproduce it now.

It's the LZF encoder that bails out at
org.elasticsearch.common.compress.lzf.impl.UnsafeChunkEncoderBE._getInt

which uses in turn sun.misc.Unsafe.getInt

I have created a gist of the JVM crash file at

https://gist.github.com/jprante/79f4b4c0b9fd83eb1c9b

There has been a fix in LZF lately
https://github.com/ning/compress/commit/db7f51bddc5b7beb47da77eeeab56882c650bff7

for version 1.0.3 which has been released recently.

I will build a snapshot ES version with LZF 1.0.3 and see if this works...

Jörg



On Mon, Aug 25, 2014 at 11:30 PM, tony.apo...@iqor.com wrote:

 I captured a WireShark trace of the interaction between ES and Logstash
 1.4.1.  The error occurs even before my data is sent.  Can you try to
 reproduce it on your testbed with this message I captured?

 curl -XPUT http://amssc103-mgmt-app2:9200/_template/logstash -d @y

 Contents of file 'y':
 {
   "template": "logstash-*",
   "settings": { "index.refresh_interval": "5s" },
   "mappings": {
     "_default_": {
       "_all": { "enabled": true },
       "dynamic_templates": [ {
         "string_fields": {
           "match": "*",
           "match_mapping_type": "string",
           "mapping": {
             "type": "string", "index": "analyzed", "omit_norms": true,
             "fields": {
               "raw": { "type": "string", "index": "not_analyzed", "ignore_above": 256 }
             }
           }
         }
       } ],
       "properties": {
         "@version": { "type": "string", "index": "not_analyzed" },
         "geoip": {
           "type": "object", "dynamic": true, "path": "full",
           "properties": { "location": { "type": "geo_point" } }
         }
       }
     }
   }
 }



 On Monday, August 25, 2014 3:53:18 PM UTC-4, tony@iqor.com wrote:

 I have no plugins installed (yet) and only changed es.logger.level to
 DEBUG in logging.yml.

 elasticsearch.yml:
 cluster.name: es-AMS1Cluster
 node.name: KYLIE1
 node.rack: amssc2client02
 path.data: /export/home/apontet/elasticsearch/data
 path.work: /export/home/apontet/elasticsearch/work
 path.logs: /export/home/apontet/elasticsearch/logs
 network.host:    = sanitized line; file contains actual
 server IP
 discovery.zen.ping.multicast.enabled: false
 discovery.zen.ping.unicast.hosts: [s1, s2, s3, s5 , s6, s7]
   = Also sanitized

 Thanks,
 Tony




 On Saturday, August 23, 2014 6:29:40 AM UTC-4, Jörg Prante wrote:

 I tested a simple Hello World document on Elasticsearch 1.3.2 with
 Oracle JDK 1.7.0_17 64-bit Server VM, Sparc Solaris 10, default settings.

 No issues.

 So I would like to know more about the settings in elasticsearch.yml,
 the mappings, and the installed plugins.

 Jörg


 On Sat, Aug 23, 2014 at 11:25 AM, joerg...@gmail.com joerg...@gmail.com
  wrote:

 I have some Solaris 10 Sparc V440/V445 servers available and can try to
 reproduce over the weekend.

 Jörg


 On Sat, Aug 23, 2014 at 4:37 AM, Robert Muir 
 rober...@elasticsearch.com wrote:

 How big is it? Maybe i can have it anyway? I pulled two ancient
 ultrasparcs out of my closet to try to debug your issue, but unfortunately
 they are a pita to work with (dead nvram battery on both, zeroed mac
 address, etc.) Id still love to get to the bottom of this.
  On Aug 22, 2014 3:59 PM, tony@iqor.com wrote:

 Hi Adrien,
 It's a bunch of garbled binary data, basically a dump of the process
 image.
 Tony


 On Thursday, August 21, 2014 6:36:12 PM UTC-4, Adrien Grand wrote:

 Hi Tony,

 Do you have more information in the core dump file? (cf. the Core
 dump written line that you pasted)


 On Thu, Aug 21, 2014 at 7:53 PM, tony@iqor.com wrote:

 Hello,
 I installed ES 1.3.2 on a spare Solaris 11/ T4-4 SPARC server to
 scale out of small x86 machine.  I get a similar exception running ES 
 with
 JAVA_OPTS=-d64.  When Logstash 1.4.1 sends the first message I get the
 error below on the ES process:


 #
 # A fatal error has been detected by the Java Runtime Environment:
 #
 #  SIGBUS (0xa) at pc=0x7a9a3d8c, pid=14473, tid=209
 #
 # JRE version: 7.0_25-b15
 # Java VM: Java HotSpot(TM) 64-Bit Server VM (23.25-b01 mixed mode
 solaris-sparc compressed oops)
 # Problematic frame:
 # V  [libjvm.so+0xba3d8c]  Unsafe_GetInt+0x158
 #
 # Core dump written. Default location: /export/home/elasticsearch/
 elasticsearch-1.3.2/core or core.14473
 #
 # If you would like to submit a bug report, please visit:
 #   http://bugreport.sun.com/bugreport/crash.jsp
 #

 ---  T H R E A D  ---

 Current thread (0x000107078000):  JavaThread
 elasticsearch[KYLIE1][http_server_worker][T#17]{New I/O worker
 #147} daemon [_thread_in_vm, id=209, stack(0x5b80,
 0x5b84)]

 siginfo:si_signo=SIGBUS: si_errno=0, si_code=1 (BUS_ADRALN),
 si_addr=0x000709cc09e7


 I can run ES using 32bit java but have to shrink ES_HEAPS_SIZE more
 than I want to.  Any assistance would be appreciated.

 Regards,
 Tony


 On Tuesday, July 22, 2014 5:43:28 AM UTC-4, David Roberts wrote:

Re: indices.memory.index_buffer_size

2014-08-26 Thread Yongtao You
Thanks Mark.

What confuses me are "global setting" (which suggests a cluster-wide setting) 
and "on a specific node" (which suggests a node-level setting). I could just 
try it out, but it's hard to tell whether the setting worked or not. :(

On Sunday, August 24, 2014 3:13:17 PM UTC-7, Mark Walkom wrote:


 http://www.elasticsearch.org/guide/en/elasticsearch/reference/current/modules-indices.html
  
  states "It is a global setting that bubbles down to all the different 
  shards allocated on a specific node."

 Regards,
 Mark Walkom

 Infrastructure Engineer
 Campaign Monitor
  email: ma...@campaignmonitor.com
 web: www.campaignmonitor.com


  On 25 August 2014 03:12, Yongtao You yongt...@gmail.com wrote:

 Hi,

 Is the indices.memory.index_buffer_size configuration a cluster wide 
 configuration or per node configuration? Do I need to set it on every node? 
 Or just the master (eligible) node?

 Thanks.
 Yongtao

 -- 
 You received this message because you are subscribed to the Google Groups 
 elasticsearch group.
 To unsubscribe from this group and stop receiving emails from it, send an 
  email to elasticsearc...@googlegroups.com.
 To view this discussion on the web visit 
 https://groups.google.com/d/msgid/elasticsearch/f67e3a30-521c-4c13-8620-c79133cea01c%40googlegroups.com
  
 https://groups.google.com/d/msgid/elasticsearch/f67e3a30-521c-4c13-8620-c79133cea01c%40googlegroups.com?utm_medium=emailutm_source=footer
 .
 For more options, visit https://groups.google.com/d/optout.




-- 
You received this message because you are subscribed to the Google Groups 
elasticsearch group.
To unsubscribe from this group and stop receiving emails from it, send an email 
to elasticsearch+unsubscr...@googlegroups.com.
To view this discussion on the web visit 
https://groups.google.com/d/msgid/elasticsearch/d76f4c67-9250-4ab9-b02d-0f0c78b33be6%40googlegroups.com.
For more options, visit https://groups.google.com/d/optout.


Re: indices.memory.index_buffer_size

2014-08-26 Thread Nikolas Everett
I just looked at this code!

It's a setting that you set globally at the cluster level.  It takes effect
per node: every active shard on a node gets an equal share of that much
space.  "Active" means the shard has been written to in the past six minutes
or so.  When a node first starts, all shards are assumed active, and those
that are not updated at all lose active status after the timeout.  You can
watch the little dance it does by setting
  index.engine.internal: DEBUG
in logging.yml.

Now - I'm not actually sure how important a setting it is.  I opened
https://github.com/elasticsearch/elasticsearch/issues/7441 to suggest
allowing it to be spread around better.  Mike'll probably close it if
spreading it around wouldn't really help things much.
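As an illustrative sketch (not the actual Elasticsearch code), the even split across active shards works out roughly like this:

```python
def per_shard_buffer_bytes(heap_bytes, index_buffer_pct, active_shards):
    """Split a node-level indexing buffer (a percentage of heap)
    evenly across the node's active shards. Illustrative only;
    the real accounting lives inside Elasticsearch."""
    if active_shards == 0:
        return 0
    total = int(heap_bytes * index_buffer_pct / 100)
    return total // active_shards

# e.g. a 30 GB heap, the default 10% buffer, 12 active shards on the node
share = per_shard_buffer_bytes(30 * 1024**3, 10, 12)
```

So adding shards to a node shrinks each shard's share, which is why shards dropping out of "active" status frees up buffer for the rest.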

Nik

On Tue, Aug 26, 2014 at 2:07 PM, Yongtao You yongtao@gmail.com wrote:

 Thanks Mark.

 What confuses me are global setting (which suggests cluster-wide
 setting) and on a specific node (which suggests node level setting). I
 could just try it out, but it's hard to tell if the setting worked or not.
 :(


 On Sunday, August 24, 2014 3:13:17 PM UTC-7, Mark Walkom wrote:

 http://www.elasticsearch.org/guide/en/elasticsearch/
 reference/current/modules-indices.html states It is a global setting
 that bubbles down to all the different shards allocated on a specific node.

 Regards,
 Mark Walkom

 Infrastructure Engineer
 Campaign Monitor
 email: ma...@campaignmonitor.com
 web: www.campaignmonitor.com


 On 25 August 2014 03:12, Yongtao You yongt...@gmail.com wrote:

  Hi,

 Is the indices.memory.index_buffer_size configuration a cluster wide
 configuration or per node configuration? Do I need to set it on every node?
 Or just the master (eligible) node?

 Thanks.
 Yongtao

 --
 You received this message because you are subscribed to the Google
 Groups elasticsearch group.
 To unsubscribe from this group and stop receiving emails from it, send
 an email to elasticsearc...@googlegroups.com.

 To view this discussion on the web visit https://groups.google.com/d/
 msgid/elasticsearch/f67e3a30-521c-4c13-8620-c79133cea01c%
 40googlegroups.com
 https://groups.google.com/d/msgid/elasticsearch/f67e3a30-521c-4c13-8620-c79133cea01c%40googlegroups.com?utm_medium=emailutm_source=footer
 .
 For more options, visit https://groups.google.com/d/optout.


  --
 You received this message because you are subscribed to the Google Groups
 elasticsearch group.
 To unsubscribe from this group and stop receiving emails from it, send an
 email to elasticsearch+unsubscr...@googlegroups.com.
 To view this discussion on the web visit
 https://groups.google.com/d/msgid/elasticsearch/d76f4c67-9250-4ab9-b02d-0f0c78b33be6%40googlegroups.com
 https://groups.google.com/d/msgid/elasticsearch/d76f4c67-9250-4ab9-b02d-0f0c78b33be6%40googlegroups.com?utm_medium=emailutm_source=footer
 .

 For more options, visit https://groups.google.com/d/optout.


-- 
You received this message because you are subscribed to the Google Groups 
elasticsearch group.
To unsubscribe from this group and stop receiving emails from it, send an email 
to elasticsearch+unsubscr...@googlegroups.com.
To view this discussion on the web visit 
https://groups.google.com/d/msgid/elasticsearch/CAPmjWd1CmkjPAPJns3PjCmsFicu8KYV0DRjv9T2qacx636sy7g%40mail.gmail.com.
For more options, visit https://groups.google.com/d/optout.


Re: JVM crash on 64 bit SPARC with Elasticsearch 1.2.2 due to unaligned memory access

2014-08-26 Thread joergpra...@gmail.com
Still broken with lzf-compress 1.0.3

https://gist.github.com/jprante/d2d829b497db4963aea5

Jörg


On Tue, Aug 26, 2014 at 7:54 PM, joergpra...@gmail.com 
joergpra...@gmail.com wrote:

 Thanks for the logstash mapping command. I can reproduce it now.

 It's the LZF encoder that bails out at
 org.elasticsearch.common.compress.lzf.impl.UnsafeChunkEncoderBE._getInt

 which uses in turn sun.misc.Unsafe.getInt

 I have created a gist of the JVM crash file at

 https://gist.github.com/jprante/79f4b4c0b9fd83eb1c9b

 There has been a fix in LZF lately
 https://github.com/ning/compress/commit/db7f51bddc5b7beb47da77eeeab56882c650bff7

 for version 1.0.3 which has been released recently.

 I will build a snapshot ES version with LZF 1.0.3 and see if this works...

 Jörg



 On Mon, Aug 25, 2014 at 11:30 PM, tony.apo...@iqor.com wrote:

 I captured a WireShark trace of the interaction between ES and Logstash
 1.4.1.  The error occurs even before my data is sent.  Can you try to
 reproduce it on your testbed with this message I captured?

 curl -XPUT http://amssc103-mgmt-app2:9200/_template/logstash -d @y

  Contents of file 'y':
  {
    "template": "logstash-*",
    "settings": { "index.refresh_interval": "5s" },
    "mappings": {
      "_default_": {
        "_all": { "enabled": true },
        "dynamic_templates": [ {
          "string_fields": {
            "match": "*",
            "match_mapping_type": "string",
            "mapping": {
              "type": "string", "index": "analyzed", "omit_norms": true,
              "fields": {
                "raw": { "type": "string", "index": "not_analyzed", "ignore_above": 256 }
              }
            }
          }
        } ],
        "properties": {
          "@version": { "type": "string", "index": "not_analyzed" },
          "geoip": {
            "type": "object", "dynamic": true, "path": "full",
            "properties": { "location": { "type": "geo_point" } }
          }
        }
      }
    }
  }



 On Monday, August 25, 2014 3:53:18 PM UTC-4, tony@iqor.com wrote:

 I have no plugins installed (yet) and only changed es.logger.level to
 DEBUG in logging.yml.

 elasticsearch.yml:
 cluster.name: es-AMS1Cluster
 node.name: KYLIE1
 node.rack: amssc2client02
 path.data: /export/home/apontet/elasticsearch/data
 path.work: /export/home/apontet/elasticsearch/work
 path.logs: /export/home/apontet/elasticsearch/logs
 network.host:    = sanitized line; file contains actual
 server IP
 discovery.zen.ping.multicast.enabled: false
 discovery.zen.ping.unicast.hosts: [s1, s2, s3, s5 , s6, s7]
   = Also sanitized

 Thanks,
 Tony




 On Saturday, August 23, 2014 6:29:40 AM UTC-4, Jörg Prante wrote:

 I tested a simple Hello World document on Elasticsearch 1.3.2 with
 Oracle JDK 1.7.0_17 64-bit Server VM, Sparc Solaris 10, default settings.

 No issues.

 So I would like to know more about the settings in elasticsearch.yml,
 the mappings, and the installed plugins.

 Jörg


 On Sat, Aug 23, 2014 at 11:25 AM, joerg...@gmail.com 
 joerg...@gmail.com wrote:

 I have some Solaris 10 Sparc V440/V445 servers available and can try
 to reproduce over the weekend.

 Jörg


 On Sat, Aug 23, 2014 at 4:37 AM, Robert Muir 
 rober...@elasticsearch.com wrote:

 How big is it? Maybe i can have it anyway? I pulled two ancient
 ultrasparcs out of my closet to try to debug your issue, but 
 unfortunately
 they are a pita to work with (dead nvram battery on both, zeroed mac
 address, etc.) Id still love to get to the bottom of this.
  On Aug 22, 2014 3:59 PM, tony@iqor.com wrote:

 Hi Adrien,
 It's a bunch of garbled binary data, basically a dump of the process
 image.
 Tony


 On Thursday, August 21, 2014 6:36:12 PM UTC-4, Adrien Grand wrote:

 Hi Tony,

 Do you have more information in the core dump file? (cf. the Core
 dump written line that you pasted)


 On Thu, Aug 21, 2014 at 7:53 PM, tony@iqor.com wrote:

 Hello,
 I installed ES 1.3.2 on a spare Solaris 11/ T4-4 SPARC server to
 scale out of small x86 machine.  I get a similar exception running ES 
 with
 JAVA_OPTS=-d64.  When Logstash 1.4.1 sends the first message I get the
 error below on the ES process:


 #
 # A fatal error has been detected by the Java Runtime Environment:
 #
 #  SIGBUS (0xa) at pc=0x7a9a3d8c, pid=14473, tid=209
 #
 # JRE version: 7.0_25-b15
 # Java VM: Java HotSpot(TM) 64-Bit Server VM (23.25-b01 mixed mode
 solaris-sparc compressed oops)
 # Problematic frame:
 # V  [libjvm.so+0xba3d8c]  Unsafe_GetInt+0x158
 #
 # Core dump written. Default location: /export/home/elasticsearch/
 elasticsearch-1.3.2/core or core.14473
 #
 # If you would like to submit a bug report, please visit:
 #   http://bugreport.sun.com/bugreport/crash.jsp
 #

 ---  T H R E A D  ---

 Current thread (0x000107078000):  JavaThread
 elasticsearch[KYLIE1][http_server_worker][T#17]{New I/O worker
 #147} daemon [_thread_in_vm, id=209, stack(0x5b80,
 0x5b84)]

 siginfo:si_signo=SIGBUS: si_errno=0, si_code=1 (BUS_ADRALN),
 si_addr=0x000709cc09e7


 I can 

Elastic HQ not getting back vendor info from Elasticsearch.

2014-08-26 Thread John Smith
I posted an issue with Elastic HQ here: 
https://github.com/royrusso/elasticsearch-HQ/issues/164

But just in case maybe an Elastic dev can have a look and see if it's 
Elasticsearch issue or not.

Thanks

-- 
You received this message because you are subscribed to the Google Groups 
elasticsearch group.
To unsubscribe from this group and stop receiving emails from it, send an email 
to elasticsearch+unsubscr...@googlegroups.com.
To view this discussion on the web visit 
https://groups.google.com/d/msgid/elasticsearch/a60bf5ec-c167-469f-b856-355faeea5601%40googlegroups.com.
For more options, visit https://groups.google.com/d/optout.


Elastic HQ not getting back vendor info.

2014-08-26 Thread John Smith
I posted an issue with Elastic HQ 
here: https://github.com/royrusso/elasticsearch-HQ/issues/164

But just in case maybe an Elastic dev can have a look and see if it's 
Elasticsearch issue or not.

Thanks

-- 
You received this message because you are subscribed to the Google Groups 
elasticsearch group.
To unsubscribe from this group and stop receiving emails from it, send an email 
to elasticsearch+unsubscr...@googlegroups.com.
To view this discussion on the web visit 
https://groups.google.com/d/msgid/elasticsearch/c6161414-ad80-4881-bf87-ede7f1818437%40googlegroups.com.
For more options, visit https://groups.google.com/d/optout.


Re: groovy for scripting

2014-08-26 Thread Alex S.V.
providing self-update:

I found that I could create a cross-request cache using the following script (like a 
cross-request incrementer):

POST /test/_search
{
  "query": { "match_all": {} },
  "script_fields": {
    "a": {
      "script": "import groovy.lang.Script;class A extends Script{static i=0;def run() {i++}}",
      "lang": "groovy"
    }
  }
}

Formatted for readability, the script is:

import groovy.lang.Script

class A extends Script{
  static i=0

  def run() {
 i++
  }
}

Actually the *i* variable here is not thread-safe, but the idea is clear - you need 
to define a class that inherits from Script and implements the abstract method run.
This class is also accessible on each node thread.
Now I'm looking for a solution to make a query-scoped counter (for a 
one-node configuration). I think it could be done by passing a unique 
query_id in the parameters, but I'm afraid of making the code non-thread-safe, or 
vice versa - thread-safe, but with reduced performance.
Researching more...
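For illustration only, the thread-safety trade-off described above can be sketched with a lock-guarded per-key counter. This is Python rather than Groovy, and the query_id key is the hypothetical parameter mentioned above:

```python
import threading
from collections import defaultdict

class QueryScopedCounter:
    """Per-query_id counters guarded by a lock, sketching the
    query-scoped incrementer idea; the synchronization that makes
    it thread-safe is exactly what costs some throughput."""
    def __init__(self):
        self._lock = threading.Lock()
        self._counts = defaultdict(int)

    def increment(self, query_id):
        with self._lock:
            self._counts[query_id] += 1
            return self._counts[query_id]

counter = QueryScopedCounter()
first = counter.increment("q1")   # 1
second = counter.increment("q1")  # 2
```

A lock-free alternative would be an atomic integer per key, which trades memory for less contention.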

-- 
You received this message because you are subscribed to the Google Groups 
elasticsearch group.
To unsubscribe from this group and stop receiving emails from it, send an email 
to elasticsearch+unsubscr...@googlegroups.com.
To view this discussion on the web visit 
https://groups.google.com/d/msgid/elasticsearch/fb402d2c-8820-4a1f-99e0-0453c0c82cf6%40googlegroups.com.
For more options, visit https://groups.google.com/d/optout.


term_stats return sometime return meaningless number

2014-08-26 Thread youwei chen
Our elasticsearch instance sometimes returns meaningless numbers for 
terms_stats, while queries return correct data.

I am using Kibana as a front end; this is the generated query: 

{"facets":{"terms":{
  "terms_stats":{"value_field":"metric","key_field":"host","size":10,"order":"count"},
  "facet_filter":{"fquery":{"query":{"filtered":{
    "query":{"bool":{"should":[{"query_string":{"query":
      "(service:\".StorageProxy.RecentReadLatencyMicros\") AND (layer:\"cassandra\") AND (@timestamp:[now-1m TO now]) AND host:169.26.4.167"}}]}},
    "filter":{"bool":{"must":[
      {"range":{"@timestamp":{"from":1408992976055,"to":"now"}}},
      {"terms":{"host":["169.26.4.167"]}},
      {"terms":{"host":["169.26.4.167"]}}]}}}}}}}},
"size":0}

max=4.6366831074216192E+18
mean=1.5455610358072064E+18
min=0
term=169.26.4.167
total=4.6366831074216192E+18

The metric field holds numbers between 0 and 100, while the terms stats 
report huge numbers.  If I delete the index, it shows correct term 
stats again.

I tried refresh and close/open index; nothing seems to work except deleting the 
index and recreating it.  Has anyone faced a similar issue?

Thanks.

-- 
You received this message because you are subscribed to the Google Groups 
elasticsearch group.
To unsubscribe from this group and stop receiving emails from it, send an email 
to elasticsearch+unsubscr...@googlegroups.com.
To view this discussion on the web visit 
https://groups.google.com/d/msgid/elasticsearch/5e141e56-7e01-4899-949f-c3d7f69a353d%40googlegroups.com.
For more options, visit https://groups.google.com/d/optout.


Failing Replica Shards

2014-08-26 Thread David Kleiner
Hello,

In the past couple of days I've been getting a lot of error messages about 
corrupted replica shards.  The primary shards come up fast after ES process 
restart but replicas take a long time to come back. Sometimes it takes a 
few node restarts to 'kick' the nodes to start replica shards.

ES version is 1.3.1 running on CentOS 6.5 hosted at Softlayer.  It's a 
3-way cluster with 4 logstash feeders hanging off it. 

Here are the errors;

[2014-08-26 15:01:18,682][WARN ][cluster.action.shard ] [log03 / 
Salvador Dali] [downloader-2014.08][4] received shard failed for 
[downloader-2014.08][4], node[l9-BQTHSSF-ElhgpPBZ24w], [R], 
s[INITIALIZING], indexUUID [2vRrb5YlQP6MTVr1chOezg], reason [engine 
failure, message [corrupted preexisting 
index][CorruptIndexException[[downloader-2014.08][4] Corrupted index 
[corrupted_SkU0-ZHZRxivSnGczABb_g] caused by: CorruptIndexException[codec 
footer mismatch: actual footer=-1676705023 vs expected footer=-1071082520 
(resource: 
NIOFSIndexInput(path=/acc/ES/NBS/nodes/0/indices/downloader-2014.08/4/index/_k9a_es090_0.doc))
[2014-08-26 15:01:18,682][WARN ][cluster.action.shard ] [log03 / 
Salvador Dali] [eventlog-2014.06][0] received shard failed for 
[eventlog-2014.06][0], node[l9-BQTHSSF-ElhgpPBZ24w], [R], s[INITIALIZING], 
indexUUID [jbvChdRrRB6HTutxPvxMmQ], reason [engine failure, message 
[corrupted preexisting index][CorruptIndexException[[eventlog-2014.06][0] 
Corrupted index [corrupted__712QIBQQqafzpBoQwZtcg] caused by: 
CorruptIndexException[codec footer mismatch: actual footer=0 vs expected 
footer=-1071082520 (resource: 
NIOFSIndexInput(path=/acc/ES/NBS/nodes/0/indices/eventlog-2014.06/0/index/_1k4x.nvd))
[2014-08-26 15:01:18,684][WARN ][cluster.action.shard ] [log03 / 
Salvador Dali] [eventlog-2014.07][0] received shard failed for 
[eventlog-2014.07][0], node[l9-BQTHSSF-ElhgpPBZ24w], [R], s[INITIALIZING], 
indexUUID [T4tTXkPjTaCdSVNTjHfOcg], reason [engine failure, message 
[corrupted preexisting index][CorruptIndexException[[eventlog-2014.07][0] 
Corrupted index [corrupted_OzfNRRGyTIq8a1PRhLYG2w] caused by: 
CorruptIndexException[codec footer mismatch: actual footer=0 vs expected 
footer=-1071082520 (resource: 
NIOFSIndexInput(path=/acc/ES/NBS/nodes/0/indices/eventlog-2014.07/0/index/_rqf.nvd))



Thanks,

David

-- 
You received this message because you are subscribed to the Google Groups 
elasticsearch group.
To unsubscribe from this group and stop receiving emails from it, send an email 
to elasticsearch+unsubscr...@googlegroups.com.
To view this discussion on the web visit 
https://groups.google.com/d/msgid/elasticsearch/c0af53fb-6fdd-4624-bf6c-9b9d50081689%40googlegroups.com.
For more options, visit https://groups.google.com/d/optout.


Data per node in ES

2014-08-26 Thread Gaurav Tiwari
Hi ,

We are analyzing ES for storing our log data (~ 400 GB/Day) and will be 
integrating Logstash and ES.  What is the maximum amount of data that can 
be stored on one node of ES ?

Regards,
Gaurav

-- 
You received this message because you are subscribed to the Google Groups 
elasticsearch group.
To unsubscribe from this group and stop receiving emails from it, send an email 
to elasticsearch+unsubscr...@googlegroups.com.
To view this discussion on the web visit 
https://groups.google.com/d/msgid/elasticsearch/c1b5123f-51a7-41b4-9915-d4ea705d23de%40googlegroups.com.
For more options, visit https://groups.google.com/d/optout.


Re: Reduce Number of Segments

2014-08-26 Thread Michael McCandless
OK, I would suggest setting index.merge.scheduler.max_thread_count to 1 for
spinning disks.

Maybe try also disabling merge throttling and see if that has an effect?  6
MB/sec seems slow...

Mike McCandless

http://blog.mikemccandless.com


On Mon, Aug 25, 2014 at 8:57 PM, Chris Decker ch...@chris-decker.com
wrote:

 Mike,

 Thanks for the response.

 I'm running ES 1.2.1.  It appears the issue that you reported / corrected
 was included with ES 1.2.0.

 *Any other ideas / suggestions?  *Were the settings that I posted sane?


 Thanks!,
 Chris

 On Monday, August 25, 2014 1:52:46 PM UTC-4, Michael McCandless wrote:

 Which version of ES are you using?  Versions before 1.2 have a bug that
 caused merge throttling to throttle far more than requested such that you
 couldn't get any faster than ~8 MB / sec.  See https://github.com/
 elasticsearch/elasticsearch/issues/6018

 Tiered merge policy is best.

 Mike McCandless

 http://blog.mikemccandless.com


 On Mon, Aug 25, 2014 at 1:08 PM, Chris Decker ch...@chris-decker.com
 wrote:

  All,

 I’m looking for advice on how to reduce the number of segments for my
 indices because in my use case (log analysis), quick searches are more
 important than real-time access to data.  I've turned many of the knobs
 available within ES, and read many blog postings, ES documentation, etc.,
 but still feel like there is room for improvement.

 Specific questions I have:
 1. How can I increase the current merge rate?  According to Elastic HQ,
 my merge rate is 6 MB/s.  I know I don't have
 SSDs, but with 15k drives it seems like I should be able to get better
 rates.  I tried increasing indices.store.throttle.max_bytes_per_sec
 from the default of 20mb to 40mb in my templates, but I didn't see a
 noticeable change in disk IOps or the merge rate the next day.  Did I do
 something incorrectly?  I'm going to experiment with setting it overall
 with index.store.throttle.max_bytes_per_sec and removing it from my
 templates.
 2. Should I move away from the default merge policy, or stick with the
 default (tiered)?

 Any advice you have is much appreciated; additional details on my
 situation are below.

 

 - I generate 2 indices per day - “high” and “low”.  I usually end up
 with ~ 450 segments for my ‘high’ index (see attached), and another ~ 200
 segments for my ‘low’ index, which I then optimize once I roll-over to the
 next day’s indices.
 - 4 ES servers (soon to be 8).
   — Each server has:
 12 Xeon cores running at 2.3 GHz
 15k drives
 128 GB of RAM
 68 GB used for OS / file system cache
 60 GB used by 2 JVMs
 - Index ~ 750 GB per day; 1.5 TB if you include the replicas
 - Relevant configs:
 TEMPLATE:
   "index.refresh_interval": "60s",
   "index.number_of_replicas": 1,
   "index.number_of_shards": 4,
   "index.merge.policy.max_merged_segment": "50g",
   "index.merge.policy.segments_per_tier": 5,
   "index.merge.policy.max_merge_at_once": "5",
   "indices.store.throttle.max_bytes_per_sec": "40mb"

 ELASTICSEARCH.YML:
 indices.memory.index_buffer_size: 30%



 Thanks in advance!,
 Chris

 --
 You received this message because you are subscribed to the Google
 Groups elasticsearch group.
 To unsubscribe from this group and stop receiving emails from it, send
 an email to elasticsearc...@googlegroups.com.

 To view this discussion on the web visit https://groups.google.com/d/
 msgid/elasticsearch/002cb4cc-fa2e-43c3-b2d3-29580742c91a%
 40googlegroups.com
 https://groups.google.com/d/msgid/elasticsearch/002cb4cc-fa2e-43c3-b2d3-29580742c91a%40googlegroups.com?utm_medium=emailutm_source=footer
 .
 For more options, visit https://groups.google.com/d/optout.


  --
 You received this message because you are subscribed to the Google Groups
 elasticsearch group.
 To unsubscribe from this group and stop receiving emails from it, send an
 email to elasticsearch+unsubscr...@googlegroups.com.
 To view this discussion on the web visit
 https://groups.google.com/d/msgid/elasticsearch/46ecc658-502f-46c7-b2b9-db9fd0e9f58f%40googlegroups.com
 https://groups.google.com/d/msgid/elasticsearch/46ecc658-502f-46c7-b2b9-db9fd0e9f58f%40googlegroups.com?utm_medium=emailutm_source=footer
 .

 For more options, visit https://groups.google.com/d/optout.



Re: Elasticsearch for logging. HOW to configure automatic creation of the new index every day?

2014-08-26 Thread David Kleiner
Hello Konstantin,

You can use an index value of name-%{+YYYY.MM.dd} in your elasticsearch 
output in logstash

(link: http://logstash.net/docs/1.4.2/outputs/elasticsearch#index)
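A minimal sketch of such an output block for logstash 1.4.x (the host and the index prefix are placeholders):

```
output {
  elasticsearch {
    host  => "localhost"
    index => "myapp-%{+YYYY.MM.dd}"
  }
}
```

Each day's first event then targets an index name that does not yet exist, and Elasticsearch's automatic index creation brings the new daily index into existence; no scheduled job is needed on the Elasticsearch side.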

HTH,

David

On Tuesday, August 26, 2014 10:01:39 AM UTC-7, Konstantin Erman wrote:

 Most of the guides I could find recommend creation of *one index per day* 
 when Elastic is used to store and query log files. Unfortunately not a 
 single guide dares to explain *HOW exactly shall I configure freshly 
 installed Elastic to create new index every day*. Could somebody please 
 help me with it?

 A few bits of additional info: I deal with Elastic on Windows Server (or 
 maybe on Azure, but not any Linux) and I plan to send log events to 
 Elastic using Serilog. Any advice for those special circumstances is 
 appreciated.

 Thank you!
 Konstantin




Re: indices.memory.index_buffer_size

2014-08-26 Thread Michael McCandless
See also https://github.com/elasticsearch/elasticsearch/pull/7440 (will be
in 1.4.0) which returns the actual RAM buffer size assigned to that shard
by the little dance.

Mike McCandless

http://blog.mikemccandless.com


On Tue, Aug 26, 2014 at 2:15 PM, Nikolas Everett nik9...@gmail.com wrote:

 I just looked at this code!

 It's a setting that you set globally at the cluster level.  It takes effect
 per node.  What that means is that every active shard on the node gets an
 equal share of that much space.  Active means has been written to in the
 past six minutes or so.  When a node first starts, all shards are assumed
 active, and those that are not updated at all lose active status after the
 timeout.  You can watch the little dance it does by setting
   index.engine.internal: DEBUG
 in logging.yml.
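 In elasticsearch.yml terms, that amounts to one line on each data node (the 30% here is just an example value):

```
# Set on every data node: each node reserves this fraction of its heap
# for indexing buffers and splits it evenly across its active shards.
indices.memory.index_buffer_size: 30%
```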

 Now - I'm not actually sure how important a setting it is.  I opened
 https://github.com/elasticsearch/elasticsearch/issues/7441 to suggest
 allowing better spreading of it around.  Mike'll probably close it if
 spreading it around wouldn't really help things much.

 Nik

 On Tue, Aug 26, 2014 at 2:07 PM, Yongtao You yongtao@gmail.com
 wrote:

 Thanks Mark.

 What confuses me are "global setting" (which suggests a cluster-wide
 setting) and "on a specific node" (which suggests a node-level setting). I
 could just try it out, but it's hard to tell if the setting worked or not.
 :(


 On Sunday, August 24, 2014 3:13:17 PM UTC-7, Mark Walkom wrote:

 http://www.elasticsearch.org/guide/en/elasticsearch/
 reference/current/modules-indices.html states "It is a global setting
 that bubbles down to all the different shards allocated on a specific node."

 Regards,
 Mark Walkom

 Infrastructure Engineer
 Campaign Monitor
 email: ma...@campaignmonitor.com
 web: www.campaignmonitor.com


 On 25 August 2014 03:12, Yongtao You yongt...@gmail.com wrote:

  Hi,

 Is the indices.memory.index_buffer_size configuration a cluster wide
 configuration or per node configuration? Do I need to set it on every node?
 Or just the master (eligible) node?

 Thanks.
 Yongtao








Re: Can't open file to read checksums

2014-08-26 Thread Ivan Brusic
A few questions:

What version of Elasticsearch are you using?
Are you using the Java client and is it the same version of the cluster?
Did you upgrade recently and was the index built with an older version of
Elasticsearch?

Elasticsearch recently added checksum verification (1.3?), so perhaps you
have some sort of version mismatch.

Cheers,

Ivan



On Mon, Aug 25, 2014 at 10:52 AM, Casper Thrane casper.s.thr...@gmail.com
wrote:

 Hi!

 We get the following errors, on two of our nodes. And after that our
 cluster doesn't work. I have no idea what it means.

 [2014-08-25 17:46:39,323][WARN ][indices.store]
 [p-elasticlog03] Can't open file to read checksums
 java.io.FileNotFoundException: No such file [_6cq_es090_0.doc]
 at
 org.elasticsearch.index.store.DistributorDirectory.getDirectory(DistributorDirectory.java:173)
 at
 org.elasticsearch.index.store.DistributorDirectory.getDirectory(DistributorDirectory.java:144)
 at
 org.elasticsearch.index.store.DistributorDirectory.openInput(DistributorDirectory.java:130)
 at
 org.elasticsearch.index.store.Store$MetadataSnapshot.checksumFromLuceneFile(Store.java:532)
 at
 org.elasticsearch.index.store.Store$MetadataSnapshot.buildMetadata(Store.java:459)
 at
 org.elasticsearch.index.store.Store$MetadataSnapshot.<init>(Store.java:433)
 at
 org.elasticsearch.index.store.Store.readMetadataSnapshot(Store.java:271)
 at
 org.elasticsearch.indices.store.TransportNodesListShardStoreMetaData.listStoreMetaData(TransportNodesListShardStoreMetaData.java:186)
 at
 org.elasticsearch.indices.store.TransportNodesListShardStoreMetaData.nodeOperation(TransportNodesListShardStoreMetaData.java:140)
 at
 org.elasticsearch.indices.store.TransportNodesListShardStoreMetaData.nodeOperation(TransportNodesListShardStoreMetaData.java:61)
 at
 org.elasticsearch.action.support.nodes.TransportNodesOperationAction$NodeTransportHandler.messageReceived(TransportNodesOperationAction.java:277)
 at
 org.elasticsearch.action.support.nodes.TransportNodesOperationAction$NodeTransportHandler.messageReceived(TransportNodesOperationAction.java:268)
 at
 org.elasticsearch.transport.netty.MessageChannelHandler$RequestHandler.run(MessageChannelHandler.java:275)
 at java.util.concurrent.ThreadPoolExecutor.runWorker(Unknown Source)
 at java.util.concurrent.ThreadPoolExecutor$Worker.run(Unknown Source)
 at java.lang.Thread.run(Unknown Source)

 Br
 Casper





elasticsearch processing pipeline capability?

2014-08-26 Thread Kevin B
Is there any facility in elasticsearch to help with sending terms to an 
external process after lucene processing (tokenization, filters, etc)? 
The idea here is having some external analysis / nlp code run against the 
documents while keeping all the pre-processing choices consistent and in 
one place (i.e. the analysis setup in the elasticsearch index configuration).

I am not very familiar with Lucene, but I believe possibly their update 
request processor is intended for scenarios like this needing a simple 
pipeline.
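One partial option that keeps the analysis choices in the index configuration is to have the external process call the _analyze API, so text is tokenized with exactly the analyzers defined on the index; a hedged sketch (the index name and field are placeholders):

```
curl -XGET 'localhost:9200/myindex/_analyze?field=fileContent' -d 'Some document text'
```

This returns the token stream the index's own analyzer would produce, which the external nlp code can then consume without duplicating the analyzer setup.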




Re: term_stats return sometime return meaningless number

2014-08-26 Thread youwei chen
Additional information:

taking the mean of a boolean field returns 4,607,182,418,800,017,408
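A possible clue: 4,607,182,418,800,017,408 is exactly the raw IEEE-754 bit pattern of the double 1.0, so values like this usually indicate that the stored bytes of a double are being reinterpreted as a long somewhere (for example, via a stale or mismatched field mapping). A quick Java check:

```java
public class BitPatternCheck {
    public static void main(String[] args) {
        // The "meaningless" value reported by terms_stats
        long reported = 4607182418800017408L;

        // Reinterpreting those bits as an IEEE-754 double yields 1.0
        System.out.println(Double.longBitsToDouble(reported)); // prints 1.0

        // Encoding 1.0 back gives the reported value
        System.out.println(Double.doubleToLongBits(1.0));      // prints 4607182418800017408
    }
}
```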

Thanks

On Tuesday, August 26, 2014 3:58:43 PM UTC-4, youwei chen wrote:

 Our elasticsearch instance sometimes returns meaningless numbers for 
 terms_stats; the query itself returns correct data.

 I am using Kibana as front end, this is generated query: 

 {facets:{terms:{terms_stats:{value_field:metric,key_field:host,size:10,order:count},facet_filter:{fquery:{query:{filtered:{query:{bool:{should:[{query_string:{query:(service:\.StorageProxy.RecentReadLatencyMicros\)
  
 AND (layer:\cassandra\) AND (@timestamp:[now-1m TO now]) AND 
 host:169.26.4.167}}]}},filter:{bool:{must:[{range:{@timestamp:{from:1408992976055,to:now}}},{terms:{host:[169.26.4.167]}},{terms:{host:[169.26.4.167]}}],size:0}

 max=4.6366831074216192E+18
 mean=1.5455610358072064E+18
 min=0
 term=169.26.4.167
 total=4.6366831074216192E+18

 the metric field would have some number between 0 and 100 while the term 
 stat report huge number.  If i delete index, it will show correct term 
 stats again.

 I tried refresh, close/open index, none seem to work except delete the 
 index and recreate it.  Have anyone face similar issue?

 Thanks.





Re: Marvel not showing nodes stats

2014-08-26 Thread Jeff Byrnes
I'm experiencing a similar issue to this. We have two clusters:

   - 2 node monitoring cluster (1 master/data & 1 just data)
   - 5 node production cluster (2 data, 3 masters)
   
The output below is from the non-master data node of the Marvel monitoring 
cluster. There are no errors being reported by any of the production nodes.

[2014-08-26 21:10:51,503][DEBUG][action.search.type   ] 
[stage-search-marvel-1c] [.marvel-2014.08.26][2], 
node[iGRH8Gc2QO698RMlWy8rgQ], [P], s[STARTED]: Failed to execute 
[org.elasticsearch.action.search.SearchRequest@355e93ff]
org.elasticsearch.transport.RemoteTransportException: 
[stage-search-marvel-1b][inet[/10.99.111.122:9300]][search/phase/query]
Caused by: org.elasticsearch.search.SearchParseException: 
[.marvel-2014.08.26][2]: query[ConstantScore(BooleanFilter(+*:* 
+cache(_type:index_stats) +cache(@timestamp:[140908680 TO 
140908746])))],from[-1],size[10]: Parse Failure [Failed to parse source 
[{size:10,query:{filtered:{query:{match_all:{}},filter:{bool:{must:[{match_all:{}},{term:{_type:index_stats}},{range:{@timestamp:{from:now-10m/m,to:now/m}}}],facets:{timestamp:{terms_stats:{key_field:index.raw,value_field:@timestamp,order:term,size:2000}},primaries.docs.count:{terms_stats:{key_field:index.raw,value_field:primaries.docs.count,order:term,size:2000}},primaries.indexing.index_total:{terms_stats:{key_field:index.raw,value_field:primaries.indexing.index_total,order:term,size:2000}},total.search.query_total:{terms_stats:{key_field:index.raw,value_field:total.search.query_total,order:term,size:2000}},total.merges.total_size_in_bytes:{terms_stats:{key_field:index.raw,value_field:total.merges.total_size_in_bytes,order:term,size:2000}},total.fielddata.memory_size_in_bytes:{terms_stats:{key_field:index.raw,value_field:total.fielddata.memory_size_in_bytes,order:term,size:2000]]
at 
org.elasticsearch.search.SearchService.parseSource(SearchService.java:664)
at 
org.elasticsearch.search.SearchService.createContext(SearchService.java:515)
at 
org.elasticsearch.search.SearchService.createAndPutContext(SearchService.java:487)
at 
org.elasticsearch.search.SearchService.executeQueryPhase(SearchService.java:256)
at 
org.elasticsearch.search.action.SearchServiceTransportAction$SearchQueryTransportHandler.messageReceived(SearchServiceTransportAction.java:688)
at 
org.elasticsearch.search.action.SearchServiceTransportAction$SearchQueryTransportHandler.messageReceived(SearchServiceTransportAction.java:677)
at 
org.elasticsearch.transport.netty.MessageChannelHandler$RequestHandler.run(MessageChannelHandler.java:275)
at 
java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
at 
java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
at java.lang.Thread.run(Thread.java:745)
Caused by: org.elasticsearch.search.facet.FacetPhaseExecutionException: 
Facet [timestamp]: failed to find mapping for index.raw
at 
org.elasticsearch.search.facet.termsstats.TermsStatsFacetParser.parse(TermsStatsFacetParser.java:126)
at 
org.elasticsearch.search.facet.FacetParseElement.parse(FacetParseElement.java:93)
at 
org.elasticsearch.search.SearchService.parseSource(SearchService.java:648)
... 9 more
[2014-08-26 21:10:51,503][DEBUG][action.search.type   ] 
[stage-search-marvel-1c] [.marvel-2014.08.26][2], 
node[iGRH8Gc2QO698RMlWy8rgQ], [P], s[STARTED]: Failed to execute 
[org.elasticsearch.action.search.SearchRequest@32f235e9]
org.elasticsearch.transport.RemoteTransportException: 
[stage-search-marvel-1b][inet[/10.99.111.122:9300]][search/phase/query]
Caused by: org.elasticsearch.search.SearchParseException: 
[.marvel-2014.08.26][2]: query[ConstantScore(BooleanFilter(+*:* 
+cache(_type:node_stats) +cache(@timestamp:[140908680 TO 
140908746])))],from[-1],size[10]: Parse Failure [Failed to parse source 
[{size:10,query:{filtered:{query:{match_all:{}},filter:{bool:{must:[{match_all:{}},{term:{_type:node_stats}},{range:{@timestamp:{from:now-10m/m,to:now/m}}}],facets:{timestamp:{terms_stats:{key_field:node.ip_port.raw,value_field:@timestamp,order:term,size:2000}},master_nodes:{terms:{field:node.ip_port.raw,size:2000},facet_filter:{term:{node.master:true}}},os.cpu.usage:{terms_stats:{key_field:node.ip_port.raw,value_field:os.cpu.usage,order:term,size:2000}},os.load_average.1m:{terms_stats:{key_field:node.ip_port.raw,value_field:os.load_average.1m,order:term,size:2000}},jvm.mem.heap_used_percent:{terms_stats:{key_field:node.ip_port.raw,value_field:jvm.mem.heap_used_percent,order:term,size:2000}},fs.total.available_in_bytes:{terms_stats:{key_field:node.ip_port.raw,value_field:fs.total.available_in_bytes,order:term,size:2000}},fs.total.disk_io_op:{terms_stats:{key_field:node.ip_port.raw,value_field:fs.total.disk_io_op,order:term,size:2000]]
at 
org.elasticsearch.search.SearchService.parseSource(SearchService.java:664)
at 

Re: Parent/Child query performance in version 1.1.2

2014-08-26 Thread Mark Greene
Just wanted to close the loop on this in case anyone stumbled upon the same 
issue.

After upgrading to version 1.3.2 which had the performance increase 
stemming from https://github.com/elasticsearch/elasticsearch/pull/5846, we 
were able to see a dramatic decrease in parent/child query latency. We're 
executing queries under 150ms which is manageable for now and will be 
eagerly awaiting further improvements from the work Clinton highlighted 
here: https://github.com/elasticsearch/elasticsearch/issues/7394.

Along the way in our testing we got a little confused as we attempted to do 
our troubleshooting on 1 data node in order to keep things simple, this 
manifested in some misplaced assumptions around the performance increases 
that came from work released in 1.2.0. In our testing on a single node, we 
did _not_ observe a latency decrease at all when going from 1.1.2 to 1.3.2. 
However, when we changed our test cluster to use two data nodes, we saw a 
huge improvement. So my earlier assertion around not seeing those 
improvements in version 1.3.2 was incorrect although I'm still confused as 
to why a single node configuration was not benefiting.

In any case, wanted to thank the ES developers for being generous with 
their time helping us track this issue down. Now that I realize the 
incredible pace in which ES versions are released, we'll be much more 
vigilant about keeping up.

Thanks again!


On Monday, August 25, 2014 11:32:38 AM UTC-4, Mark Greene wrote:

 Hey Clinton,

 Thanks for the heads up on what's on the horizon. That definitely sounds 
 like a drastic improvement. That being said, my fear here is that even with 
 that improvement, this data model (parent/child) doesn't seem to be that 
 performant with a moderate number of documents. In order for us to really 
 adopt this methodology of using parent/child, we'd expect to see sub-100ms 
 performance so long as we were feeding ES enough RAM. 

 My hunch here is there must be some code path that is hit when running on 
 more than 1 data node that either doesn't write to the cache or skips it on 
 the read and hits the disk. We don't have a ton of load on our data nodes, 
 CPU is well under 30% and IOWait is usually under 0.30.

 Just to reiterate: when we run the parent/child query on one data node, it 
 runs in less than 100ms; when it runs across two data nodes, it's 10s. 
 This is being experienced on versions 1.1.2 and 1.3.2.

 On Monday, August 25, 2014 10:55:15 AM UTC-4, Clinton Gormley wrote:

 Something else to note: parent-child now uses global ordinals to make 
 queries 3x faster than they were previously, but global ordinals need to be 
 rebuilt after the index has refreshed (assuming some data has changed).

 Currently there is no way to refresh p/c global ordinals eagerly (ie 
 during the refresh phase) and so it happens on the first query after a 
 refresh.  1.3.3 and 1.4.0 will include an option to allow eager building of 
 global ordinals which should remove this latency spike: 
 https://github.com/elasticsearch/elasticsearch/issues/7394

 You may want to consider increasing the refresh_interval so that global 
 ordinals remain valid for longer.
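 For example, a sketch of raising it via the update-settings API (the index name and interval are placeholders):

```
PUT /myindex/_settings
{
  "index" : { "refresh_interval" : "30s" }
}
```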


 On 25 August 2014 16:48, Mark Greene ma...@evertrue.com wrote:

 Hi Adrien,

 Thanks for reaching out.

 We actually were excited to see the performance improvements stated in 
 the 1.2.0 release notes, so we upgraded to 1.3.2. We saw some performance 
 improvement, but it wasn't orders of magnitude and queries are still running 
 very slow.

 We also tried your suggestion of using the 'preference=_local' query 
 param but we didn't see any difference there. Additionally, running the 
 query 10 times, we saw no improvement in speed.

 Currently, the only major performance increase we've seen with 
 parent/child queries is dropping down to 1 data node, at which point we see 
 queries executing well under the 100ms mark.




 On Friday, August 22, 2014 6:42:27 PM UTC-4, Adrien Grand wrote:

 Hi Mark,

 Given that you had 1 replica in your first setup, it could take several 
 queries to warm up the field data cache completely, does the query still 
 take 16 seconds to run if you run it eg. 10 times? (3 should be enough, 
 but 
 just to be sure)

 Does it change anything if you query elasticsearch with 
 preference=_local? This should be equivalent to your single-node setup, so 
 it would be interesting to see if that changes something.

 As a side note, you might want to try out a more recent version of 
 Elasticsearch since parent/child performance improved quite significantly 
 in 1.2.0 because of https://github.com/elasticsearch/elasticsearch/
 pull/5846



 On Fri, Aug 22, 2014 at 11:15 PM, Mark Greene ma...@evertrue.com 
 wrote:

 I wanted to update the list with an interesting piece of information. 
 We found that when we took one of our two data nodes out of the cluster, 
 leaving just one data node with no replicas, the query performance 
 increased dramatically. The queries are now 

Re: elasticsearch processing pipeline capability?

2014-08-26 Thread joergpra...@gmail.com
If you want to retrieve the term list of an index after Lucene processing
via REST HTTP API, you can try

https://github.com/jprante/elasticsearch-index-termlist

Jörg


On Tue, Aug 26, 2014 at 10:41 PM, Kevin B blaisde...@gmail.com wrote:

 Is there any facility in elasticsearch to help with sending terms to an
 external processes after lucene processing (tokenization, filters, etc)?
  The idea here is having some external analysis / nlp code run against the
 documents while keeping all the pre-processing choices consistent and in
 one place (i.e. the analysis setup in elasticsearch index configuration).

 I am not very familiar with Lucene, but I believe possibly their update
 request processor is intended for scenarios like this needing a simple
 pipeline.






How do I start elasticsearch as a service?

2014-08-26 Thread Eric Greene
Forgive me, I'm a little lost.

I am working on deploying elasticsearch on an AWS server.  Previously in 
development I have started elasticsearch using ./bin/elasticsearch 
-Des.config=/etc/elasticsearch/elasticsearch.yml

But in live deployment, I want to keep elasticsearch running as a service...

I have 1.2.0 installed on Ubuntu 12.04 on my AWS instance.

I run sudo /etc/init.d/elasticsearch start and I get:
* Starting Elasticsearch server

I check sudo /etc/init.d/elasticsearch status and I get:
* elasticsearch is not running

I'm not sure how to troubleshoot.  Any advice or suggestions?  Thanks



Re: How do I start elasticsearch as a service?

2014-08-26 Thread Mark Walkom
Check the logs under /var/log/elasticsearch, they should have something.

Also please be aware that 1.2.0 has a critical bug and you should be using
1.2.1 instead.

Regards,
Mark Walkom

Infrastructure Engineer
Campaign Monitor
email: ma...@campaignmonitor.com
web: www.campaignmonitor.com


On 27 August 2014 08:42, Eric Greene ericdgre...@gmail.com wrote:

 Forgive me I'm a little lost.

 I am working on deploying elasticsearch on a AWS server.  Previously in
 development I have started elasticsearch using ./bin/elasticsearch
 -Des.config=/etc/elasticsearch/elasticsearch.yml

 But in live deployment, I want to keep elasticsearch running as a
 service...

 I have 1.2.0 installed on Ubuntu 12.04 on my AWS instance.

 I run sudo /etc/init.d/elasticsearch start and I get:
 * Starting Elasticsearch server

 I check sudo /etc/init.d/elasticsearch status and I get:
 * elasticsearch is not running

 I'm not sure how to troubleshoot.  Any advice or suggestions?  Thanks





Re: Elastic HQ not getting back vendor info from Elasticsearch.

2014-08-26 Thread Mark Walkom
ElasticHQ is a community plugin, the ES devs can't help here.

I have raised issues against ElasticHQ in the past and Roy has fixed them
pretty quickly :)

Regards,
Mark Walkom

Infrastructure Engineer
Campaign Monitor
email: ma...@campaignmonitor.com
web: www.campaignmonitor.com


On 27 August 2014 04:44, John Smith java.dev@gmail.com wrote:

 I posted an issue with Elastic HQ here:
 https://github.com/royrusso/elasticsearch-HQ/issues/164

 But just in case maybe an Elastic dev can have a look and see if it's
 Elasticsearch issue or not.

 Thanks





Aggregation query works with search, but not with msearch

2014-08-26 Thread Dhruv Garg
I am trying to troubleshoot the following observation:

Following code works as expected: 

Elasticsearch::Model.client.search search_type: 'count', index: 
target_indices, body: query

Response:

{"took"=>2, "timed_out"=>false, "_shards"=>{"total"=>2, "successful"=>2, 
"failed"=>0}, "hits"=>{"total"=>6, "max_score"=>0.0, "hits"=>[]}, 
"aggregations"=>{"recent"=>{"doc_count"=>3, 
"searches"=>{"buckets"=>[{"key"=>"user-1", "doc_count"=>3}]

However, when using the above in an msearch, the response is not useful:

Elasticsearch::Model.client.msearch body: [{ search_type: 'count', index: 
target_indices, search: query }]

Response:

{"responses"=>[{"took"=>0, "timed_out"=>false, "_shards"=>{"total"=>2, 
"successful"=>2, "failed"=>0}, "hits"=>{"total"=>6, "max_score"=>0.0, 
"hits"=>[]}}]}

---

What am I missing?



alerting in Marvel

2014-08-26 Thread kti_sk
Hi,
I started using Marvel for my cluster monitoring. 
Does Marvel have a way to set notification such as send me email if cpu 
load is over 80%?

Thanks



Getting different results while using bool query vs bool query with function score query

2014-08-26 Thread Akshay Shukla
I am trying to add a custom boost to the different should clauses in a 
bool query, but I get a different number of results when I use a bool 
query whose 2 should clauses contain 2 simple_query_string queries vs. a 
bool query whose 2 should clauses wrap the same simple_query_string 
queries in function_score queries.
The following query returns me 2 results for my data set: 
{
  "query" : {
    "filtered" : {
      "query" : {
        "bool" : {
          "should" : [ {
            "simple_query_string" : {
              "query" : "128",
              "fields" : [ "content.name_enu.simple" ]
            }
          }, {
            "simple_query_string" : {
              "query" : "128",
              "fields" : [ "content.name_enu.simple_with_numeric" ]
            }
          } ]
        }
      },
      "filter" : {
        "bool" : {
          "must" : [ {
            "term" : {
              "securityInfo.securityType" : "open"
            }
          }, {
            "bool" : {
              "must" : [ {
                "term" : {
                  "sourceId.sourceSystem" : "jmeter_007971_numeric"
                }
              }, {
                "term" : {
                  "sourceId.type" : "file"
                }
              } ]
            }
          } ],
          "_cache" : true
        }
      }
    }
  },
  "fields" : [ "elementId", "sourceId.id", "sourceId.type",
               "sourceId.sourceSystem", "sourceVersion", "content.name_enu" ]
}



Whereas if I use the following query, with the same simple query strings
wrapped in function scores, I get 5 results:
{
  "query" : {
    "filtered" : {
      "query" : {
        "bool" : {
          "should" : [ {
            "function_score" : {
              "query" : {
                "simple_query_string" : {
                  "query" : "128",
                  "fields" : [ "content.name_enu.simple" ]
                }
              },
              "boost_factor" : 1.5
            }
          }, {
            "function_score" : {
              "query" : {
                "simple_query_string" : {
                  "query" : "128",
                  "fields" : [ "content.name_enu.simple_with_numeric" ]
                }
              },
              "boost_factor" : 2.5
            }
          } ]
        }
      },
      "filter" : {
        "bool" : {
          "must" : [ {
            "term" : { "securityInfo.securityType" : "open" }
          }, {
            "bool" : {
              "must" : [ {
                "term" : { "sourceId.sourceSystem" : "jmeter_007971_numeric" }
              }, {
                "term" : { "sourceId.type" : "file" }
              } ]
            }
          } ],
          "_cache" : true
        }
      }
    }
  },
  "fields" : [ "elementId", "sourceId.id", "sourceId.type",
               "sourceId.sourceSystem", "sourceVersion", "content.name_enu" ]
}



From my understanding of how the should clause works, I was expecting both
queries to return 5 results, but I cannot understand why the first query
returns only 2 results for my data set. The content.name_enu.simple subfield
uses the simple analyzer, whereas simple_with_numeric uses a whitespace
tokenizer and a lowercase filter.
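Not a definitive diagnosis, but one thing worth ruling out: the standard simple analyzer is built on a letter tokenizer, which discards digits, so a purely numeric query term such as 128 may produce no tokens at all against the .simple subfield. A rough Python emulation of the two analysis chains (the regex and split are approximations, not the actual Lucene tokenizers):

```python
import re

def simple_analyzer(text):
    """Rough emulation of the 'simple' analyzer:
    letter tokenizer (splits on non-letters, dropping digits) + lowercase."""
    return [t.lower() for t in re.findall(r"[^\W\d_]+", text)]

def whitespace_lowercase(text):
    """Rough emulation of a whitespace tokenizer + lowercase filter."""
    return [t.lower() for t in text.split()]

# A numeric term yields no tokens under the simple analyzer,
# but survives the whitespace chain:
print(simple_analyzer("128"))              # []
print(whitespace_lowercase("128"))         # ['128']
print(simple_analyzer("Effective Java 2nd"))  # ['effective', 'java', 'nd']
```

Running the query string through the _analyze API against both subfields would confirm whether this is what is happening on the real index.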



Re: alerting in Marvel

2014-08-26 Thread Mark Walkom
Nope.

Regards,
Mark Walkom

Infrastructure Engineer
Campaign Monitor
email: ma...@campaignmonitor.com
web: www.campaignmonitor.com


On 27 August 2014 09:14, kti...@hotmail.com wrote:

 Hi,
 I started using Marvel for my cluster monitoring.
 Does Marvel have a way to set notification such as send me email if cpu
 load is over 80%?

 Thanks

 --
 You received this message because you are subscribed to the Google Groups
 elasticsearch group.
 To unsubscribe from this group and stop receiving emails from it, send an
 email to elasticsearch+unsubscr...@googlegroups.com.
 To view this discussion on the web visit
 https://groups.google.com/d/msgid/elasticsearch/45fb7614-5b65-425c-b858-e4dce4bee4d3%40googlegroups.com
 https://groups.google.com/d/msgid/elasticsearch/45fb7614-5b65-425c-b858-e4dce4bee4d3%40googlegroups.com?utm_medium=emailutm_source=footer
 .
 For more options, visit https://groups.google.com/d/optout.




Re: gateway.recover_after_nodes minimum_master_nodes in a distributed environment?

2014-08-26 Thread Mark Walkom
Only master-eligible nodes count for discovery.zen.minimum_master_nodes, so in
your case it is 1. That's bad, as you can end up with a split-brain
situation. You should, if you can, make all three nodes master eligible.

gateway.recover_after_nodes is all nodes, as per
http://www.elasticsearch.org/guide/en/elasticsearch/reference/current/modules-gateway.html#recover-after
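For concreteness, the quorum arithmetic can be sketched as follows (a minimal illustration; the node counts are taken from the question above):

```python
def minimum_master_nodes(master_eligible_count):
    """Quorum formula from the ES docs: (master-eligible nodes / 2) + 1.
    Counts only master-eligible nodes, not data-only or client nodes."""
    return master_eligible_count // 2 + 1

# The cluster described above has 1 master-eligible node -> quorum of 1,
# which is the split-brain-prone setting:
print(minimum_master_nodes(1))  # 1
# With all three nodes master eligible, a safe quorum of 2:
print(minimum_master_nodes(3))  # 2
```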

Regards,
Mark Walkom

Infrastructure Engineer
Campaign Monitor
email: ma...@campaignmonitor.com
web: www.campaignmonitor.com


On 26 August 2014 23:37, Chris Neal chris.n...@derbysoft.net wrote:

 Hello all,

 Question
 about gateway.recover_after_nodes and discovery.zen.minimum_master_nodes in
 a distributed ES cluster.  By distributed I mean I have:

 2 nodes that are data only:
 'node.data' = 'true',
 'node.master' = 'false',
 'http.enabled' = 'false',

 1 node that is a master/search only node:
 'node.master' = 'true',
 'node.data' = 'false',
 'http.enabled' = 'true',

 When setting discovery.zen.minimum_master_nodes, is the (n / 2) + 1
 formula including *all* nodes of all types in the cluster, or just
 those who can be masters?

 Similarly, when setting gateway.recover_after_nodes, is this value the
 number of all nodes of all types in the cluster, or just those that are
 data nodes?

 Thank you very much for your time!
 Chris

 --
 You received this message because you are subscribed to the Google Groups
 elasticsearch group.
 To unsubscribe from this group and stop receiving emails from it, send an
 email to elasticsearch+unsubscr...@googlegroups.com.
 To view this discussion on the web visit
 https://groups.google.com/d/msgid/elasticsearch/CAND3DphDg41GUrj-YLfU7W0_L6veTXMJjJPJ7Wfu6V9VsvdKHw%40mail.gmail.com
 https://groups.google.com/d/msgid/elasticsearch/CAND3DphDg41GUrj-YLfU7W0_L6veTXMJjJPJ7Wfu6V9VsvdKHw%40mail.gmail.com?utm_medium=emailutm_source=footer
 .
 For more options, visit https://groups.google.com/d/optout.




Re: How do I start elasticsearch as a service?

2014-08-26 Thread Eric Greene
Thanks Mark, I found that if I comment out the line in elasticsearch.yml 
that sets the data path, it works.

I will upgrade as you have suggested, thanks for that.


On Tuesday, August 26, 2014 4:04:05 PM UTC-7, Mark Walkom wrote:

 Check the logs under /var/log/elasticsearch, they should have something.

 Also please be aware that 1.2.0 has a critical bug and you should be using 
 1.2.1 instead.

 Regards,
 Mark Walkom

 Infrastructure Engineer
 Campaign Monitor
 email: ma...@campaignmonitor.com javascript:
 web: www.campaignmonitor.com


 On 27 August 2014 08:42, Eric Greene ericd...@gmail.com javascript: 
 wrote:

 Forgive me I'm a little lost.

 I am working on deploying elasticsearch on a AWS server.  Previously in 
 development I have started elasticsearch using ./bin/elasticsearch 
 -Des.config=/etc/elasticsearch/elasticsearch.yml

 But in live deployment, I want to keep elasticsearch running as a 
 service...

 I have 1.2.0 installed on Ubuntu 12.04 on my AWS instance.

 I run sudo /etc/init.d/elasticsearch start and I get:
 * Starting Elasticsearch server

 I check sudo /etc/init.d/elasticsearch status and I get:
 * elasticsearch is not running

 I'm not sure how to troubleshoot.  Any advice or suggestions?  Thanks

  -- 
 You received this message because you are subscribed to the Google Groups 
 elasticsearch group.
 To unsubscribe from this group and stop receiving emails from it, send an 
 email to elasticsearc...@googlegroups.com javascript:.
 To view this discussion on the web visit 
 https://groups.google.com/d/msgid/elasticsearch/1c191994-ba5c-495d-b5e8-4e0bed3c4845%40googlegroups.com
  
 https://groups.google.com/d/msgid/elasticsearch/1c191994-ba5c-495d-b5e8-4e0bed3c4845%40googlegroups.com?utm_medium=emailutm_source=footer
 .
 For more options, visit https://groups.google.com/d/optout.






Re: Data per node in ES

2014-08-26 Thread Mark Walkom
Depends.
How much disk do you have? RAM? CPU? Java version and release? ES version?
What's your query load like? Are you doing lots of aggregates or facets?

The best way to know is to start using ELK on a platform indicative of
your intended server size, see how much data a single node can handle, and
then extrapolate.

Regards,
Mark Walkom

Infrastructure Engineer
Campaign Monitor
email: ma...@campaignmonitor.com
web: www.campaignmonitor.com


On 27 August 2014 06:24, Gaurav Tiwari gtins...@gmail.com wrote:

 Hi ,

 We are analyzing ES for storing our log data (~ 400 GB/Day) and will be
 integrating Logstash and ES.  What is the maximum amount of data that can
 be stored on one node of ES ?

 Regards,
 Gaurav

 --
 You received this message because you are subscribed to the Google Groups
 elasticsearch group.
 To unsubscribe from this group and stop receiving emails from it, send an
 email to elasticsearch+unsubscr...@googlegroups.com.
 To view this discussion on the web visit
 https://groups.google.com/d/msgid/elasticsearch/c1b5123f-51a7-41b4-9915-d4ea705d23de%40googlegroups.com
 https://groups.google.com/d/msgid/elasticsearch/c1b5123f-51a7-41b4-9915-d4ea705d23de%40googlegroups.com?utm_medium=emailutm_source=footer
 .
 For more options, visit https://groups.google.com/d/optout.




Re: alerting in Marvel

2014-08-26 Thread Mark Walkom
Also, you should really be monitoring your systems and core measurements
(disk, CPU etc) with something specific for the job.

Regards,
Mark Walkom

Infrastructure Engineer
Campaign Monitor
email: ma...@campaignmonitor.com
web: www.campaignmonitor.com


On 27 August 2014 09:16, Mark Walkom ma...@campaignmonitor.com wrote:

 Nope.

 Regards,
 Mark Walkom

 Infrastructure Engineer
 Campaign Monitor
 email: ma...@campaignmonitor.com
 web: www.campaignmonitor.com


 On 27 August 2014 09:14, kti...@hotmail.com wrote:

 Hi,
 I started using Marvel for my cluster monitoring.
 Does Marvel have a way to set notification such as send me email if cpu
 load is over 80%?

 Thanks

 --
 You received this message because you are subscribed to the Google Groups
 elasticsearch group.
 To unsubscribe from this group and stop receiving emails from it, send an
 email to elasticsearch+unsubscr...@googlegroups.com.
 To view this discussion on the web visit
 https://groups.google.com/d/msgid/elasticsearch/45fb7614-5b65-425c-b858-e4dce4bee4d3%40googlegroups.com
 https://groups.google.com/d/msgid/elasticsearch/45fb7614-5b65-425c-b858-e4dce4bee4d3%40googlegroups.com?utm_medium=emailutm_source=footer
 .
 For more options, visit https://groups.google.com/d/optout.






Re: alerting in Marvel

2014-08-26 Thread kti_sk
Hi,
My goal was to figure out whether I need to scale out if there is a sudden
spike in the load.
Can you be more specific about something specific for the job?


On Tuesday, August 26, 2014 4:32:32 PM UTC-7, Mark Walkom wrote:

 Also, you should really be monitoring your systems and core measurements 
 (disk, CPU etc) with something specific for the job.

 Regards,
 Mark Walkom

 Infrastructure Engineer
 Campaign Monitor
 email: ma...@campaignmonitor.com javascript:
 web: www.campaignmonitor.com
  

 On 27 August 2014 09:16, Mark Walkom ma...@campaignmonitor.com 
 javascript: wrote:

 Nope.

 Regards,
 Mark Walkom

 Infrastructure Engineer
 Campaign Monitor
 email: ma...@campaignmonitor.com javascript:
 web: www.campaignmonitor.com


 On 27 August 2014 09:14, kti...@hotmail.com javascript: wrote:

 Hi,
 I started using Marvel for my cluster monitoring. 
 Does Marvel have a way to set notification such as send me email if cpu 
 load is over 80%?

 Thanks

 -- 
 You received this message because you are subscribed to the Google 
 Groups elasticsearch group.
 To unsubscribe from this group and stop receiving emails from it, send 
 an email to elasticsearc...@googlegroups.com javascript:.
 To view this discussion on the web visit 
 https://groups.google.com/d/msgid/elasticsearch/45fb7614-5b65-425c-b858-e4dce4bee4d3%40googlegroups.com
  
 https://groups.google.com/d/msgid/elasticsearch/45fb7614-5b65-425c-b858-e4dce4bee4d3%40googlegroups.com?utm_medium=emailutm_source=footer
 .
 For more options, visit https://groups.google.com/d/optout.







Re: Micro Analysis in Kibana

2014-08-26 Thread Mungeol Heo
The question is how the micro analysis panel in Kibana can do this without
setting 'not_analyzed' on the fields.

On Saturday, April 5, 2014 4:55:20 AM UTC+9, Binh Ly wrote:

 You'll need to set the field name to not_analyzed so that you can get a 
 distinct value for the whole field (instead of tokenized values):

 {
   mappings: {
 doc: {
   properties: {
 name: {
   type: string,
   index: not_analyzed
 }
   }
 }
   }
 }

 After that, you can do a terms facet on name and you'll get the count that 
 you want.





RE: alerting in Marvel

2014-08-26 Thread KimTaein
OK, so those are for monitoring the system running Elasticsearch.
However, if I want to be notified of ES-specific data points, such as its JVM
memory %, there doesn't seem to be a solution.
 
Thanks
 
From: ma...@campaignmonitor.com
Date: Wed, 27 Aug 2014 10:05:05 +1000
Subject: Re: alerting in Marvel
To: elasticsearch@googlegroups.com

Nagios, Zabbix, PRTG, Observium, or anything cloud hosted.

Regards,
Mark Walkom

Infrastructure Engineer
Campaign Monitor
email: ma...@campaignmonitor.com
web: www.campaignmonitor.com





Re: gateway.recover_after_nodes minimum_master_nodes in a distributed environment?

2014-08-26 Thread Chris Neal
Thank you Mark.  Makes perfect sense.

Chris


On Tue, Aug 26, 2014 at 6:25 PM, Mark Walkom ma...@campaignmonitor.com
wrote:

 Only master eligible for discovery.zen.minimum_master_nodes, so in your
 case it is 1. And that's bad as you can end up with a split brain
 situation. You should, if you can, make all three nodes master eligible.

 gateway.recover_after_nodes is all nodes, as per
 http://www.elasticsearch.org/guide/en/elasticsearch/reference/current/modules-gateway.html#recover-after

 Regards,
 Mark Walkom

 Infrastructure Engineer
 Campaign Monitor
 email: ma...@campaignmonitor.com
 web: www.campaignmonitor.com


 On 26 August 2014 23:37, Chris Neal chris.n...@derbysoft.net wrote:

 Hello all,

 Question
 about gateway.recover_after_nodes and discovery.zen.minimum_master_nodes in
 a distributed ES cluster.  By distributed I mean I have:

 2 nodes that are data only:
 'node.data' = 'true',
 'node.master' = 'false',
 'http.enabled' = 'false',

 1 node that is a master/search only node:
 'node.master' = 'true',
 'node.data' = 'false',
 'http.enabled' = 'true',

 When setting discovery.zen.minimum_master_nodes, is the (n / 2) + 1
 formula including *all* nodes of all types in the cluster, or just
 those who can be masters?

 Similarly, when setting gateway.recover_after_nodes, is this value the
 number of all nodes of all types in the cluster, or just those that are
 data nodes?

 Thank you very much for your time!
 Chris

 --
 You received this message because you are subscribed to the Google Groups
 elasticsearch group.
 To unsubscribe from this group and stop receiving emails from it, send an
 email to elasticsearch+unsubscr...@googlegroups.com.
 To view this discussion on the web visit
 https://groups.google.com/d/msgid/elasticsearch/CAND3DphDg41GUrj-YLfU7W0_L6veTXMJjJPJ7Wfu6V9VsvdKHw%40mail.gmail.com
 https://groups.google.com/d/msgid/elasticsearch/CAND3DphDg41GUrj-YLfU7W0_L6veTXMJjJPJ7Wfu6V9VsvdKHw%40mail.gmail.com?utm_medium=emailutm_source=footer
 .
 For more options, visit https://groups.google.com/d/optout.


  --
 You received this message because you are subscribed to the Google Groups
 elasticsearch group.
 To unsubscribe from this group and stop receiving emails from it, send an
 email to elasticsearch+unsubscr...@googlegroups.com.
 To view this discussion on the web visit
 https://groups.google.com/d/msgid/elasticsearch/CAEM624biBGL9O7zaT%3Dfm3%2BfNRCMoDrQDR_CRCV%3DhM9FZCAqOpw%40mail.gmail.com
 https://groups.google.com/d/msgid/elasticsearch/CAEM624biBGL9O7zaT%3Dfm3%2BfNRCMoDrQDR_CRCV%3DhM9FZCAqOpw%40mail.gmail.com?utm_medium=emailutm_source=footer
 .
 For more options, visit https://groups.google.com/d/optout.




Re: Is it possible to register a RestFilter without creating a plugin?

2014-08-26 Thread Jinyuan Zhou
Thanks Vineeth,
But I guess it would not change anything about the REST API if Elasticsearch
offered some way, easier than building a plugin, to register RestFilters
around REST API calls. Many frameworks provide a configuration-based approach
to register pre/post processors around services. I hope ES provides this kind
of mechanism, but my first impression is that it does not have such support at
this time.
Regards,
Jack

Jinyuan (Jack) Zhou


On Tue, Aug 26, 2014 at 5:57 AM, vineeth mohan vm.vineethmo...@gmail.com
wrote:

 Hello Jinyuan ,

 I don't feel this is possible.
 In such a provision, how would you define what the REST API will do?

 Thanks
Vineeth


 On Tue, Aug 26, 2014 at 2:41 AM, Jinyuan Zhou zhou.jiny...@gmail.com
 wrote:

 Thanks,

 --
 You received this message because you are subscribed to the Google Groups
 elasticsearch group.
 To unsubscribe from this group and stop receiving emails from it, send an
 email to elasticsearch+unsubscr...@googlegroups.com.
 To view this discussion on the web visit
 https://groups.google.com/d/msgid/elasticsearch/41dab07d-b7f1-4622-8c77-a9d56b19abed%40googlegroups.com
 https://groups.google.com/d/msgid/elasticsearch/41dab07d-b7f1-4622-8c77-a9d56b19abed%40googlegroups.com?utm_medium=emailutm_source=footer
 .
 For more options, visit https://groups.google.com/d/optout.


  --
 You received this message because you are subscribed to a topic in the
 Google Groups elasticsearch group.
 To unsubscribe from this topic, visit
 https://groups.google.com/d/topic/elasticsearch/g_veXqDhQP4/unsubscribe.
 To unsubscribe from this group and all its topics, send an email to
 elasticsearch+unsubscr...@googlegroups.com.
 To view this discussion on the web visit
 https://groups.google.com/d/msgid/elasticsearch/CAGdPd5kEkN%3DsgLqmkT-vwQ%2BptCs0LmPa0BDw5hX3A5Yzg-Wx_A%40mail.gmail.com
 https://groups.google.com/d/msgid/elasticsearch/CAGdPd5kEkN%3DsgLqmkT-vwQ%2BptCs0LmPa0BDw5hX3A5Yzg-Wx_A%40mail.gmail.com?utm_medium=emailutm_source=footer
 .
 For more options, visit https://groups.google.com/d/optout.




Re: Function Query with an aggregation function of nested field

2014-08-26 Thread Srinivasan Ramaswamy
Any thoughts, anyone? I am primarily looking for an answer to my second
question.

On Tuesday, August 26, 2014 10:14:37 AM UTC-7, Srinivasan Ramaswamy wrote:

 I have documents with the following schema:


 {
   "authorId" : 10,
   "authorName" : "Joshua Bloch",
   "books" : [
     {
       "bookId" : 101,
       "bookName" : "Effective Java",
       "description" : "effective java book with useful recommendations",
       "Category" : 1,
       "sales" : [
         { "keyword" : "effective java", "count" : 200 },
         { "keyword" : "java tips", "count" : 100 },
         { "keyword" : "java joshua bloch", "count" : 50 }
       ],
       "createDate" : "08-25-2014"
     },
     {
       "bookId" : 102,
       "bookName" : "Java Puzzlers",
       "description" : "Java Puzzlers: Traps, Pitfalls, and Corner Cases",
       "Category" : 2,
       "sales" : [
         { "keyword" : "java puzzlers", "count" : 100 },
         { "keyword" : "joshua bloch puzzler", "count" : 50 }
       ]
     }
   ]
 }

 The sales information is stored with each book, along with the search query
 that led to those sales. If the user applied a category filter, I would
 like to count only books that belong to that category.

 I would like to sort the list of authors returned based on a function of
 sales data and text match. For example, if the search query is java, I would
 like to return the above document and all other author documents that
 contain the term java. I came up with the following query:

 {
   "query" : {
     "function_score" : {
       "boost_mode" : "replace",
       "query" : {
         "match" : { "bookName" : "java" }
       },
       "script_score" : {
         "params" : {
           "param1" : 2
         },
         "script" : "doc['books.sales.count'].isEmpty() ? _score : _score * doc['books.sales.count'].value * param1"
       }
     }
   }
 }


 I have a few questions about the query above:
 1. The results don't look sorted by sales; authors who don't have any
 books with sales appear at the top.
 2. How do I use the sum of all sales for an author (across all books
 within the author document) in the script? Is there a sum function for
 nested fields inside a document when using script_score? Note that sales
 is a nested field inside another nested field, books.
 3. As a next step I would also like to use a filter for keyword within the
 script_score, to only include sales whose keyword value matches the
 search query term.

 Any help would be much appreciated. 

 Thanks
 Srini
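As an aside, the per-author ranking the second question aims for can be sketched outside Elasticsearch. This is plain Python modelling the intended math with the sample data from the question, not MVEL/Groovy script syntax; whether script_score can express this nested sum is exactly what is being asked:

```python
def author_score(text_score, books, param1=2):
    """Sketch of the intended ranking: text relevance multiplied by total
    sales across all of an author's books, falling back to the raw relevance
    score when the author has no sales at all."""
    total_sales = sum(s["count"] for b in books for s in b.get("sales", []))
    return text_score if total_sales == 0 else text_score * total_sales * param1

# Sample data from the author document above (sales counts only):
books = [
    {"sales": [{"keyword": "effective java", "count": 200},
               {"keyword": "java tips", "count": 100},
               {"keyword": "java joshua bloch", "count": 50}]},
    {"sales": [{"keyword": "java puzzlers", "count": 100},
               {"keyword": "joshua bloch puzzler", "count": 50}]},
]
print(author_score(1.5, books))  # 1.5 * 500 * 2 = 1500.0
```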





got QueryPhaseExecutionException when using custom query parser

2014-08-26 Thread Peiyong Lin
Hi all,

I wrote my own custom query parser and extended Elasticsearch via a plugin;
the code is in the following links.

query parser http://pastebin.mozilla.org/6172836
customized query http://pastebin.mozilla.org/6172837
plugin http://pastebin.mozilla.org/6172844

I used the default settings of Elasticsearch, and the documents I PUT are:
{
  "test" : "haha"
}
{
  "test" : "ahah"
}

I used the query:
{
  "query" : {
    "backwards" : {
      "test" : "haha"
    }
  }
}

And the error message I got is:

[2014-08-27 13:26:41,678][DEBUG][action.search.type   ] [Poison] 
[test][2], node[w4ORe_ERQBeOVpII3P9w1w], [P], s[STARTED]: Failed to execute 
[org.elasticsearch.action.search.SearchRequest@7e1416e] lastShard [true]
org.elasticsearch.search.query.QueryPhaseExecutionException: [test][2]: 
query[filtered(BackwardsQuery: 
test:ahah)-cache(_type:test)],from[0],size[10]: Query Failed [Failed to 
execute main query]
at 
org.elasticsearch.search.query.QueryPhase.execute(QueryPhase.java:162)
at 
org.elasticsearch.search.SearchService.executeQueryPhase(SearchService.java:261)
at 
org.elasticsearch.search.action.SearchServiceTransportAction$5.call(SearchServiceTransportAction.java:206)
at 
org.elasticsearch.search.action.SearchServiceTransportAction$5.call(SearchServiceTransportAction.java:203)
at 
org.elasticsearch.search.action.SearchServiceTransportAction$23.run(SearchServiceTransportAction.java:517)
at 
java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
at 
java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
at java.lang.Thread.run(Thread.java:745)
Caused by: java.lang.NullPointerException
at 
org.elasticsearch.backwardstermquery.BackwardsTermQuery$BackwardsScorer.docID(BackwardsTermQuery.java:118)
at 
org.elasticsearch.backwardstermquery.BackwardsTermQuery$BackwardsScorer.nextDoc(BackwardsTermQuery.java:133)
at 
org.apache.lucene.search.Weight$DefaultBulkScorer.scoreAll(Weight.java:192)
at 
org.apache.lucene.search.Weight$DefaultBulkScorer.score(Weight.java:163)
at org.apache.lucene.search.BulkScorer.score(BulkScorer.java:35)
at 
org.apache.lucene.search.IndexSearcher.search(IndexSearcher.java:621)
at 
org.elasticsearch.search.internal.ContextIndexSearcher.search(ContextIndexSearcher.java:175)
at 
org.apache.lucene.search.IndexSearcher.search(IndexSearcher.java:491)
at 
org.apache.lucene.search.IndexSearcher.search(IndexSearcher.java:448)
at 
org.apache.lucene.search.IndexSearcher.search(IndexSearcher.java:281)
at 
org.apache.lucene.search.IndexSearcher.search(IndexSearcher.java:269)
at 
org.elasticsearch.search.query.QueryPhase.execute(QueryPhase.java:156)
... 7 more

I am very confused by it; could someone please point out what's wrong?
Thank you so much!
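Without being able to inspect the linked pastebins, a NullPointerException inside docID() often means the scorer reads per-document state before nextDoc() has been called, or dereferences a wrapped sub-scorer that can legitimately be null. As a reminder of the iterator contract that the bulk scorer in the stack trace relies on, here is a hedged Python model (not Lucene API, just the contract):

```python
NO_MORE_DOCS = 2**31 - 1  # Lucene's DocIdSetIterator.NO_MORE_DOCS sentinel

class ScorerSketch:
    """Language-agnostic sketch of the doc-iterator contract a custom Scorer
    (like BackwardsScorer) must honor. Not real Lucene code."""
    def __init__(self, matching_docs):
        self._docs = iter(sorted(matching_docs))
        self._doc = -1  # contract: docID() is -1 before the first nextDoc()

    def doc_id(self):
        # Must be side-effect free and safe to call at any point.
        return self._doc

    def next_doc(self):
        self._doc = next(self._docs, NO_MORE_DOCS)
        return self._doc

s = ScorerSketch([3, 7])
print(s.doc_id())    # -1
print(s.next_doc())  # 3
print(s.next_doc())  # 7
print(s.next_doc())  # 2147483647 (NO_MORE_DOCS)
```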
