Re: Indexing large number of files each with a huge size

2014-08-26 Thread Sandeep Ramesh Khanzode
Hi Jorg,

This is mostly standard code that I am referring to. It is called from
multiple threads for different sets of files on disk.
Please provide your suggestions. Thanks,


BulkRequestBuilder bulkRequest = client.prepareBulk();
bulkRequest.setRefresh(false);

// for every input file in the input list, do ...
    Map<String, Object> jsonDocument = new HashMap<String, Object>();

    jsonDocument.put("fileContent", STRING_CONTENT_OF_FILE);
    jsonDocument.put("fileProperty1", FILE_PROPERTY_1_STRING);
    jsonDocument.put("fileProperty2", FILE_PROPERTY_2_STRING);
    jsonDocument.put("fileProperty3", FILE_PROPERTY_3_STRING);
    jsonDocument.put("filePath", new BytesRef(filePath.toString()));

    bulkRequest.add(client.prepareIndex(indexName, typeName).setSource(jsonDocument));
} // end of loop over input files

BulkResponse bulkResponse = bulkRequest.execute().actionGet();



Thanks,
Sandeep


On Mon, Aug 25, 2014 at 10:40 PM, joergpra...@gmail.com wrote:

 Can you show the program how you index?

 Before tuning heap sizes or batch sizes, it is good to check if the
 program works correctly.

 Jörg


 On Mon, Aug 25, 2014 at 7:00 PM, 'Sandeep Ramesh Khanzode' via
 elasticsearch elasticsearch@googlegroups.com wrote:

 Hi,

 I am trying to index documents, each file approx. 10-20 MB. I start
 seeing memory issues if I try to index them all in a multi-threaded
 environment from a single TransportClient on one machine to a single node
 cluster with 32GB ES server. It seems like the memory is an issue on the
 client as well as server side, and I probably understand and expect that
 :).

 I have tried tuning the heap sizes and batch sizes in the Bulk APIs. However,
 am I trying to push the limits too much? One thought is to stream the data so
 that I do not hold it all in memory. Is that possible? Is this a general
 problem, or is my usage just wrong?

 Thanks,
 Sandeep

 --
 You received this message because you are subscribed to the Google Groups
 elasticsearch group.
 To unsubscribe from this group and stop receiving emails from it, send an
 email to elasticsearch+unsubscr...@googlegroups.com.

 To view this discussion on the web visit
 https://groups.google.com/d/msgid/elasticsearch/d2612109-b31c-4127-857b-f8aa27fb0aeb%40googlegroups.com
 https://groups.google.com/d/msgid/elasticsearch/d2612109-b31c-4127-857b-f8aa27fb0aeb%40googlegroups.com?utm_medium=emailutm_source=footer
 .
 For more options, visit https://groups.google.com/d/optout.





Re: aggregate on analyzed field

2014-08-26 Thread Adrien Grand
Hi,

Multi-fields are usually the way to go in such cases, see
http://www.elasticsearch.org/guide/en/elasticsearch/guide/current/aggregations-and-analysis.html
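
A minimal sketch of the multi-field approach that page describes (field and type names here are hypothetical): the analyzed field stays available for full-text search, while a not_analyzed sub-field serves aggregations:

```json
{
  "mappings": {
    "customer_docs": {
      "properties": {
        "customer_name": {
          "type": "string",
          "fields": {
            "raw": { "type": "string", "index": "not_analyzed" }
          }
        }
      }
    }
  }
}
```

A terms aggregation on customer_name.raw then buckets whole names such as "Tom Cruise" instead of individual tokens.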


On Mon, Aug 25, 2014 at 9:49 PM, kti...@hotmail.com wrote:

 I am aggregating documents by customer name to find how many documents we
 have per customer.
 The aggregates bucketize words in names. For example, if I have the customer
 Tom Cruise, I would get 2 buckets, Tom and Cruise.

 How would I treat the analyzed field as not_analyzed in an aggregation query?
 I still want the field to remain analyzed so that I can do full-text search.

 thanks





-- 
Adrien Grand



Re: How to index Office files? *.txt and *.pdf are working...

2014-08-26 Thread David Pilato
I see what happened. Could you open an issue on the mapper plugin?

Will fix that next week.

Thanks for the details!

--
David ;-)
Twitter : @dadoonet / @elasticsearchfr / @scrutmydocs

 On 25 August 2014 at 15:03, Dirk Bauer dirk.ba...@gmail.com wrote:
 
 Hi David,
 
 thx for your help, but it's still not working.
 
 What I did:
 
 The query 
 
  {
    "query": {
      "match": {
        "_all": "test"
      }
    }
  }
 
 delivers all my indexed documents (also the *.doc / *.docx files) and I can
 see the base64 stuff in the file.file field.
 So this looks good to me.
 
 Then I went to ..\config\logging.yml and added under the logger: section an 
 entry for 
 
 1st attempt: org.apache.plugin.mapper.attachments: TRACE
 2nd attempt: org.apache.tika: TRACE
 
 After shutting down ES, restarting, deleting the existing index, and
 reindexing my test documents, there was no additional entry from the mapper
 plugin or Tika in the log.
 ES is logging fine...
 
 logger:
   # log action execution errors for easier debugging
   action: DEBUG
   # reduce the logging for aws, too much is logged under the default INFO
   com.amazonaws: WARN
 
   # gateway
   #gateway: DEBUG
   #index.gateway: DEBUG
 
   # peer shard recovery
   #indices.recovery: DEBUG
 
   # discovery
   #discovery: TRACE
 
   index.search.slowlog: TRACE, index_search_slow_log_file
   index.indexing.slowlog: TRACE, index_indexing_slow_log_file
 
   # DBA: Enabled logger for plugin mapper.attachments
   org.apache.plugin.mapper.attachments: TRACE
 
 
 The next idea was that maybe the mapper plugin is missing some libraries for
 parsing Office documents.
 In the plug-in folder I can see these *.jar files:
 
 rome-0.9.jar
 tagsoup-1.2.1.jar
 tika-core-1.5.jar
 tika-parsers-1.5.jar
 vorbis-java-core-0.1.jar
 vorbis-java-core-0.1-tests.jar
 vorbis-java-tika-0.1.jar
 xercesImpl-2.8.1.jar
 xml-apis-1.3.03.jar
 xmpcore-5.1.2.jar
 xz-1.2.jar
 apache-mime4j-core-0.7.2.jar
 apache-mime4j-dom-0.7.2.jar
 asm-debug-all-4.1.jar
 aspectjrt-1.6.11.jar
 bcmail-jdk15-1.45.jar
 bcprov-jdk15-1.45.jar
 boilerpipe-1.1.0.jar
 commons-compress-1.5.jar
 commons-logging-1.1.1.jar
 elasticsearch-mapper-attachments-2.3.1.jar
 fontbox-1.8.4.jar
 geronimo-stax-api_1.0_spec-1.0.1.jar
 isoparser-1.0-RC-1.jar
 jdom-1.0.jar
 jempbox-1.8.4.jar
 jhighlight-1.0.jar
 juniversalchardet-1.0.3.jar
 metadata-extractor-2.6.2.jar
 netcdf-4.2-min.jar
 pdfbox-1.8.4.jar
 
 I'm not sure, but here you will find additional poi*.jar files that should be
 responsible for parsing the Office files:
 
 http://mvnrepository.com/artifact/org.apache.tika/tika-parsers/1.5
 
 The following files were downloaded to the plugin folder but the documents 
 are still not parsed...
 
 poi-3.10-beta2.jar
 poi-ooxml-3.10-beta2.jar
 poi-scratchpad-3.10-beta2.jar
 
 The last check was to make sure the Word documents are not corrupted. A
 colleague of mine checked a test file with
 
 java -jar tika-app-1.5.jar -g
 
 and the output was fine for the document.
 
 So, does anyone have any more ideas?
 
 Thanks
 Dirk
 
 On Monday, 25 August 2014 at 10:56:54 UTC+2, David Pilato wrote:
 
 From my experience, this should work. Indexing Word docs should work, as Tika
 supports office docs.
 
 Not sure what you are doing wrong. Try to send a match all query and ask for 
 field file.file.
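 
 Such a match-all query asking for the stored field might look like this (a
 sketch; the field name comes from the attachment mapping):
 
 ```json
 {
   "query": { "match_all": {} },
   "fields": [ "file.file" ]
 }
 ```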
 
 Also, you could set mapper plugin to TRACE mode in logging.yml and see if it 
 tells something interesting.
 
 HTH
 
 --
 David ;-)
 Twitter : @dadoonet / @elasticsearchfr / @scrutmydocs
 
 On 25 August 2014 at 09:05, Dirk Bauer dirk@gmail.com wrote:
 
 Hi,
 
 using elasticsearch-1.3.2 with 
 
 Plug-in
 -
 name: mapper-attachments
 version: 2.3.1
 description: Adds the attachment type allowing to parse difference 
 attachment formats
 jvm: true
 site: false
 
 on Windows 8 for evaluation purpose.
 
 JVM 
 -
 version: 1.7.0_67
 vm_name: Java HotSpot(TM) Client VM
 vm_version: 24.65-b04
 vm_vendor: Oracle Corporation
 
 
 I have created the following mapping:
 
  {
    "myIndex": {
      "mappings": {
        "dokument": {
          "properties": {
            "created": {
              "type": "date",
              "format": "dateOptionalTime"
            },
            "description": {
              "type": "string"
            },
            "file": {
              "type": "attachment",
              "path": "full",
              "fields": {
                "file": {
                  "type": "string",
                  "store": true,
                  "term_vector": "with_positions_offsets"
                },
                "author": { "type": "string" },
                "title": { "type": "string" },
                "name": { "type": "string" },
                "date": {
                  "type": "date",
                  "format": "dateOptionalTime"
                },
                "keywords": { "type": "string" },
                "content_type": { "type": "string" },
                "content_length": { "type": "integer" },
                "language": { "type": "string" }
              }
            },
            "id": { "type": "string" },
            "title": { "type": "string" }
          }
        }
      }
    }
  }
 
 Because I'd like to use ES from C#/.NET, I have created a little C# app that
 reads a file as a base64-encoded stream from the hard drive and puts the
 document into the ES index. I'm working with this POST request:
 
 {
   "id": "8dbf1d73-44d1-4e20-aa35-13b18ddf5057",
   "title": "Test",
   "description": "Test Description",
   "created": "2014-01-20T19:04:20.1019885+01:00",
   "file": {
     "_content_type": "application/pdf",
     "_name": "Test.pdf",

Multi Tenant DB and JDBC River

2014-08-26 Thread Nitin Maheshwari
Hi Jörg,

I am working on a multi tenant application where each tenant has its own 
database. I am planning to use ES for indexing the data, and JDBC river for 
doing periodic bulk indexing. I do not want to create one river per DB per 
object type. This will lead to too many rivers. 

I wanted to modify the JDBC river so that I can give it a parent DB location, 
where all tenant DB connection information is available, and then modify the 
river such that a feeder thread is created for each tenant database.

Do you see any issue with this or do you have any other recommendation?

Thanks,
Nitin



Re: Multi Tenant DB and JDBC River

2014-08-26 Thread joergpra...@gmail.com
For multi-tenancy, the river concept is awkward. A river is a singleton and is
bound to single-user execution, and you are right, creating river instances
per DB and per index does not scale.

There are several options:

- write a more sophisticated plugin which acts as a service and not as a
singleton. The ES service component, which would maintain state in the
cluster state, could accept job requests where each job request is
equivalent to a JDBC pull. The job requests are delegated to a node which
is not very busy with jobs (load balancing). The code of the JDBC river can
be reused for that.

- write a separate middleware for your tenants where they can have separate
access to the DB and prepare ES JSON bulk files (maybe via REST API calls
similar in style to ES). This would be a domain-specific solution but offers
the most flexibility to the tenants; they are free to decide how and when to
create and index the data from the DB.

Jörg




On Tue, Aug 26, 2014 at 11:21 AM, Nitin Maheshwari ask4ni...@gmail.com
wrote:

 Hi Jörg,

 I am working on a multi tenant application where each tenant has its own
 database. I am planning to use ES for indexing the data, and JDBC river for
 doing periodic bulk indexing. I do not want to create one river per DB per
 object type. This will lead to too many rivers.

 I wanted to modify the JDBC river so that I can give it a parent DB location,
 where all tenant DB connection information is available, and then modify the
 river such that a feeder thread is created for each tenant database.

 Do you see any issue with this or do you have any other recommendation?

 Thanks,
 Nitin





Re: Building an ERP with Elasticsearch. Am I crazy?

2014-08-26 Thread Jilles van Gurp
This is the generally accepted dogma and it has some merit. However, having 
two storage systems is more than a bit annoying. If you are aware of the 
limitations and caveats, Elasticsearch is actually a perfectly good 
document store that happens to have a deeply integrated querying engine. 
This is useful since most setups involving a secondary store have a much 
less capable querying engine, plus additional latency and architectural 
complexity related to pumping data around to Elasticsearch. 

Elasticsearch CRUD operations are atomic, i.e. you can read your own writes 
across the cluster. If you use the version attribute during updates, you 
can detect version conflicts and prevent overwriting updates with stale 
data as well. This is similar to the model you would find in e.g. CouchDB 
and similar document stores. There are not that many sharded, replicated, 
horizontally scalable document stores out there, and even fewer with decent 
querying ability.

The caveat is that elasticsearch is not as battle tested as other solutions 
in this space and that various people have shown that ways exist to cause 
an Elasticsearch cluster to lose data, to corrupt data, etc. So, you need 
to be prepared to be able to recover from such situations. That means you 
need backups (e.g. use the snapshots feature) and a plan for when things go 
bad. 

The flip side is that other solutions have issues as well. PostgreSQL 
clustering is brand new and probably has issues, and if you use it in 
non-clustered mode, the failure scenarios get even more interesting. I use 
a MariaDB Galera cluster and it sucks big time and needs a lot of 
handholding during upgrades. CouchDB doesn't shard and shares server 
failure scenarios with Elasticsearch. MongoDB and Cassandra have each had 
their share of issues related to data corruption and data loss in the 
recent past, and both have recently fixed major issues related to that. So, 
there are lots of solutions out there and none of them are perfect.

Elasticsearch has several major areas where it needs improvement (and which 
are indeed being worked on in recent versions):
1) it has many ways it can run out of memory. If you skim through the 
release notes of recent versions, you'll see a lot of fixes related to that 
including the use of e.g. circuit breakers. The problem with OOM's is that 
it can cause a cascading cluster failure where one node becomes slow, 
eventually drops out of the cluster and then other nodes start having the 
same issues. I've personally seen Kibana kill our cluster on two occasions. 
In both cases the logs of all nodes were full of OOM's and the cluster died 
while simply clicking through different dashboards in Kibana. This has not 
happened with the current 1.3.x version (yet) but that doesn't mean it is 
impossible.
2) split brain situations when a quorum is lost but not detected are fairly 
easy to trigger. Every time I do a rolling update, the cluster takes 
several seconds to catch up with the fact that I'm shutting down nodes. I have 
a three node cluster. One node goes down, means my cluster should be 
yellow. Two nodes down means red and it should no longer accept writes. The 
problem is that during those few seconds, the cluster status may not 
reflect reality and nodes may in fact be accepting writes when they 
shouldn't. 
3) A full cluster restart needs a lot of handholding. The reason for this 
is that most of the failure scenarios relate to there not being a quorum 
and detecting that. For example, if you simply restart the nodes one by one 
quickly you will easily get your cluster in a red state where it should no 
longer be accepting writes. The problem as described above is that 
detecting this relies on timeouts and there may be some nodes that continue 
to write for a few seconds after they should have stopped doing that. By 
the time your cluster goes red, it's too late and you are going to have to 
manually decide which shards you want to lose. That's why you need to keep 
an eye on cluster status during rolling updates. Imagine somebody power 
cycling your Elasticsearch node cluster or, worse, rebooting the switch 
that connects your nodes.
4) Elasticsearch under load may throw 503s occasionally. I've seen this 
happen on our test infrastructure a couple of times and it worries me. This 
is not something you want to see when you are writing customer data. 

Mitigation for these issues typically involves using specialized nodes for 
read and write traffic and cluster management. Additionally, you need to 
heavily tweak things to make certain failure scenarios less likely. Out of 
the box, there is a lot of stuff that can go wrong.  

We're actually deprecating our MariaDB architecture and switching to an 
Elasticsearch-only architecture. I'm well aware that I'm taking a risk here 
and I have a backup plan for most of those risks. This includes changing 
plans and switching to CouchDB or a similar document store 

AW: Shards

2014-08-26 Thread Markus Wiesenbacher
Hi,

 

I've found the problem: the JSON structure was not correct. It has to be
like this if you are using the Java API:

 

{
   "analysis": { ... },
   "index": {
      "number_of_replicas": 1,
      "number_of_shards": 3
   }
}
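
For comparison, when creating an index over the REST API (index name here is hypothetical, e.g. `PUT /my_index`), the same options are nested under "settings" in the request body:

```json
{
  "settings": {
    "analysis": { ... },
    "index": {
      "number_of_replicas": 1,
      "number_of_shards": 3
    }
  }
}
```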

 

Thanks

Markus ;)

 

From: elasticsearch@googlegroups.com [mailto:elasticsearch@googlegroups.com]
On behalf of Markus Wiesenbacher
Sent: Monday, 25 August 2014 23:55
To: elasticsearch@googlegroups.com
Subject: Shards

 

Hi folks,

 

I am using a single-node cluster (v1.3.2) on my PC, and I was wondering why
there are always 5 shards in the file system (separate Lucene indices), no
matter how many I configure in elasticsearch.yml or programmatically with the
Java API (loadFromSource with a JSON string). Do I misunderstand something?

 

Many thanks!

 

Markus ;)

 

BTW: Here's my JSON for the settings:

 

{
   "analysis": { ... },
   "settings": {
      "index": {
         "number_of_replicas": 1,
         "number_of_shards": 3
      }
   }
}

 




Get distinct result by using multi_match and suggestion

2014-08-26 Thread Ramy
Is there a way to solve the following problem?

I have created a search field with suggestions functionality. The user is 
able to search for names, categories, etc. These fields are mapped like:

{
  "settings": {
    "analysis": {
      "analyzer": {
        "autocomplete": {
          "type": "custom",
          "tokenizer": "edge_ngram_tokenizer",
          "filter": [ "lowercase" ]
        }
      },
      "tokenizer": {
        "edge_ngram_tokenizer": {
          "type": "edgeNGram",
          "min_gram": 1,
          "max_gram": 20,
          "token_chars": [ "letter", "digit" ]
        }
      }
    }
  },
  "mappings": {
    "my_type": {
      "dynamic": "strict",
      "properties": {
        "id": {
          "type": "long"
        },
        "name": {
          "type": "string",
          "analyzer": "english",
          "fields": {
            "autocomplete": {
              "type": "string",
              "index_analyzer": "autocomplete",
              "search_analyzer": "standard"
            }
          }
        },
        "category": {
          "type": "string",
          "analyzer": "english",
          "fields": {
            "autocomplete": {
              "type": "string",
              "index_analyzer": "autocomplete",
              "search_analyzer": "standard"
            }
          }
        },
        ...

Now when I do something like this:
curl -XGET "http://localhost:9200/my_index/my_type/_search" -d'
{
  "_source": false,
  "query": {
    "multi_match": {
      "query": "pet",
      "fields": [
        "*.autocomplete"
      ]
    }
  }
}'

I get results like these:
- Peter
- Peter
- Peter
- Petra
- Petra
etc.

How can I reduce (distinct) the results on the server side, like these?
- Peter
- Petra
- etc.
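
One server-side option (a sketch, not from the thread; it assumes a hypothetical not_analyzed sub-field such as name.raw, since buckets on the analyzed field would be per token) is to request distinct values with a terms aggregation instead of reading hits:

```json
{
  "size": 0,
  "query": {
    "multi_match": {
      "query": "pet",
      "fields": [ "*.autocomplete" ]
    }
  },
  "aggs": {
    "distinct_names": {
      "terms": { "field": "name.raw" }
    }
  }
}
```

Each bucket key is then a distinct value ("Peter", "Petra", ...), with a doc count alongside.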

thx, Ramy



Re: Building an ERP with Elasticsearch. Am I crazy?

2014-08-26 Thread Raphael Waldmann
Mohit Anchlia,

How do you sync ES with your main DB?

That's what I'm thinking about for my project, because I don't have much
experience with ES.

Thanks
 On Aug 26, 2014 1:55 AM, Mohit Anchlia mohitanch...@gmail.com wrote:

 In general, use Elasticsearch only as a secondary index. Keep a copy of the
 data somewhere else which is more reliable. Elasticsearch often runs into
 index corruption issues which are hard to resolve.


 On Mon, Aug 25, 2014 at 9:30 PM, xiehai...@gmail.com wrote:


 On Tuesday, August 26, 2014 6:46:12 AM UTC+8, Raphael Waldmann wrote:

 Hi,

 First I would like to thank all of you for Elasticsearch. I am thinking of
 using it in an ERP that I am building. What do you think about this? Am I
 crazy?

 Has someone faced this? I really don't think that I am comfortable enough to
 do this: trading the problems that I already know for new problems that I
 really don't know how to deal with.

 I believe that NoSQL will prevail over traditional SQL, but I don't know
 if I am ready for this task.

 So how do you think I should integrate (or not) PostgreSQL with
 Elasticsearch?


 Do you plan to use ES to index data in PostgreSQL?

 I have a similar idea: I want to use ES instead of a data warehouse.

 Some problems I can see:
 1) Data in an RDBMS are stored in tables, connected by relationships. You
 can use very complex SQL to query a complex result; how do you do that in ES?
 2) If you want to run some analysis algorithms on existing data, how do you
 run them in ES?
 3) If your data is big enough, will searching one keyword in the '_all'
 field be slow in ES?


 Thanks.
 -Terrs

 Thanks again,


 rsw1981







Re: AutoCompletion Suggester - Duplicate record in suggestion return

2014-08-26 Thread alistairj
Hi Alexander,

If I may, I have a follow-up question to your response here. How does the 
completion suggester behave with fields such as payload and score when it 
is unifying the response based on output? Are scores increased based on 
this combination? If payloads are different, which ones are returned?

Thanks for your help!

Alistair


On Monday, April 21, 2014 2:26:13 PM UTC+2, Alexander Reelsen wrote:

 Hey,

 the output is used to unify the search results; otherwise the input is 
 used. The payload itself is just meta information.
 The main reason why you see the suggestion twice is that, even though a 
 document is deleted and cannot be found anymore, the suggest data 
 structures are only cleaned up during merges/optimizations. Running 
 optimize should fix this.

 Makes sense?


 --Alex



 On Sun, Apr 13, 2014 at 12:49 PM, kidkid zki...@gmail.com wrote:

 I have figured out the problem.
 The main problem is that I used the same output for all inputs, so ES
 behaved incorrectly in this case.

 I am still trying to improve the performance. I am testing on a 64 GB RAM
 server (32 GB for ES 1.0.1) with 24 cores.
 There are only 2 records, but it takes 3 ms to get a suggestion.






 On Sunday, April 13, 2014 4:53:21 PM UTC+7, kidkid wrote:

 There is something really strange.
 I don't know whether anyone has worked with this feature, or whether it's
 just an unstable feature.
 If we index the same input with different output/payload, then only one
 result is found.

 Can anyone tell me how I could fix it?








Timezone in Simple Query

2014-08-26 Thread Gianni Livolsi
All dates are UTC. Internally, a date maps to a number of type long.
When applied to date fields, the range filter also accepts a time_zone
parameter:

{
  "range": {
    "born": {
      "gte": "2012-01-01",
      "time_zone": "+1:00"
    }
  }
}

but this is not possible:

{
  "match": {
    "post_date": "2012-01-01",
    "time_zone": "+1:00"    <-- this does not work
  }
}

How can I let each user query correctly with respect to their own time zone 
by appending it to the query?
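
One way to get the equivalent of that match query (a sketch using the range filter's time_zone support mentioned above; field name as in the example) is to wrap a range filter in a filtered query:

```json
{
  "query": {
    "filtered": {
      "filter": {
        "range": {
          "post_date": {
            "gte": "2012-01-01",
            "lt": "2012-01-02",
            "time_zone": "+1:00"
          }
        }
      }
    }
  }
}
```

The client would then only need to append the user's time_zone value to the range clause rather than to the match.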


 

Tnx 



Re: Building an ERP with Elasticsearch. Am I crazy?

2014-08-26 Thread Raphael Waldmann
I am reading a lot, studying what the best approach for this is.

My main question can be summarized in two points:

If I choose ES to index my PostgreSQL data, what's the best way to do that?

Do I need a cluster? Most of the problems I read about were related to that.
If that's true, and I can run on one node, should I do that?

Thanks for sharing your experience.

Have a nice day

Raphael Waldmann



Aggregation across indices

2014-08-26 Thread 'Sandeep Ramesh Khanzode' via elasticsearch
Hi,

If I have two indices, each holding part of a record joined by some common 
identifier, can I issue a query across both indices and have aggregations 
applied taking both indices into consideration?

Example:
Index 1: Type 1:
ID: String
Field1: String
Field2: String

Index 2: Type 2:
ID: String (Same as above; I can keep it the same so it behaves like a foreign key.)
Field3: String
Field4: String

Can I effect a join across both indices and aggregate on Field4 for example?

Please let me know. Thanks,
Sandeep 



Re: Ability to search accross 'types' in the same index, with different search parameters yet applying the same size and from values, in a single search query

2014-08-26 Thread vineeth mohan
Hello AJ ,

You can do this as follows:

{
  "query": {
    "query_string": {
      "query": "test-type1.status:1 || test-type2.status:2"
    }
  }
}

But there is a bug associated with a corner condition of this:
https://github.com/elasticsearch/elasticsearch/issues/4081
So be a bit careful.

Thanks
  Vineeth
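Editor's note, hedged: an alternative that sidesteps the query_string corner case mentioned above is a filtered query whose bool filter pairs a `_type` term with the per-type status term (ES 1.x syntax; the `from`/`size` values are illustrative, not from the thread):

```json
{
  "query": {
    "filtered": {
      "filter": {
        "bool": {
          "should": [
            { "bool": { "must": [
                { "term": { "_type": "test-type-1" } },
                { "term": { "status": 1 } }
            ] } },
            { "bool": { "must": [
                { "term": { "_type": "test-type-2" } },
                { "term": { "status": 2 } }
            ] } }
          ]
        }
      }
    }
  },
  "from": 0,
  "size": 10
}
```

Because a single query does the work, a single `size`/`from` pair pages over the merged result set, avoiding the client-side merge described below.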


On Tue, Aug 26, 2014 at 1:21 PM, Ajinkya Apte ajin...@gmail.com wrote:

 Hello,
 Examples of some documents:

 POST /test-index/test-type-1/doc-1
 {
   "text" : "Some text",
   "status" : 1
 }

 POST /test-index/test-type-2/doc-1
 {
   "text" : "Some new text",
   "status" : 1
 }

 POST /test-index/test-type-2/doc-2
 {
   "text" : "Some even new text",
   "status" : 2
 }

 Is there a single query I can use so that I can get all the documents that
 have 'status'=1 in 'type'='test-type-1' and 'status'=2 in
 'type'='test-type-2', applying the same 'size' and 'from' params? As of
 right now I am running two different queries and then merging the results
 programmatically. Is there a better way you would recommend?

 AJ





Re: Aggregation across indices

2014-08-26 Thread vineeth mohan
Hello Sandeep ,

What you are intending is not possible.
Elasticsearch does have some good relational operations, but they need
to be defined before indexing.
If you can elaborate on your use case, we can help with this.
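Editor's hedged illustration of one such operation: in ES 1.x the closest built-in relational feature is a parent/child mapping declared at index-creation time, which lets documents of one type be queried against related documents of another type within a single index. The type and field names below follow Sandeep's example; this is a sketch, not a drop-in answer:

```json
{
  "mappings": {
    "type1": {
      "properties": {
        "ID":     { "type": "string" },
        "Field1": { "type": "string" },
        "Field2": { "type": "string" }
      }
    },
    "type2": {
      "_parent": { "type": "type1" },
      "properties": {
        "Field3": { "type": "string" },
        "Field4": { "type": "string" }
      }
    }
  }
}
```

With this in place, `has_parent`/`has_child` queries can relate the two types; cross-index joins remain unsupported.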

Thanks
   Vineeth


On Tue, Aug 26, 2014 at 6:04 PM, 'Sandeep Ramesh Khanzode' via
elasticsearch elasticsearch@googlegroups.com wrote:

 Hi,

 If I have two indices each having part of the record and joined using some
 common identifier, can I issue a query across both indices and have
 aggregations apply taking into consideration both indices?

 Example:
 Index 1: Type 1:
 ID: String
 Field1: String
 Field2: String

 Index 2: Type 2:
 ID: String (From above. I can keep this same to behave like a foreign key.)
 Field3: String
 Field4: String

 Can I effect a join across both indices and aggregate on Field4 for
 example?

 Please let me know. Thanks,
 Sandeep





Failed start of 2nd instance on same host with mlockall=true

2014-08-26 Thread R. Toma
Hi all,

In an attempt to squeeze more power out of our physical servers we want to 
run multiple ES jvm's per server.

Some specs:
- servers have 24 cores, 256GB ram
- each instance binds on a different (alias) ip
- each instance has a 32GB heap
- both instances run under user 'elastic'
- limits for the 'elastic' user: memlock=unlimited
- es config for both instances: bootstrap.mlockall=true

The 1st instance has been running for weeks.

When starting the 2nd instance, the following things happen:
- increase of overall cpu load
- lots of I/O to disks
- no logging for the 2nd instance
- 2nd instance hangs
- 1st instance keeps running, but gets slowish
- cd /proc/pid causes a hang of the cd process (until the 2nd instance is killed)
- exec 'ps axuw' causes a hang of the ps process (until the 2nd instance is killed)

Maybe (un)related: I have never been able to run Elasticsearch in a 
virtualbox with memlock=unlimited and mlockall=true.


After an hour of trial & error I found that removing the setting 
'bootstrap.mlockall' (setting it to false) from the 2nd instance's 
configuration fixes things.

I am confused, but acknowledge I do not know anything about memlocking.

Any ideas?

Regards,
Renzo







Re: Need some advice to build a log central.

2014-08-26 Thread vineeth mohan
Hello Sang ,

Can I know why you are using Hive?
I feel you can do the analysis in Elasticsearch itself.
The rest looks good to me.

Thanks
   Vineeth


On Tue, Aug 26, 2014 at 8:03 AM, Sang Dang zkid...@gmail.com wrote:

 Hello All,
 I have selected #2 as my solution.
 I write data to ES, and use kibana+ to realtime monitor.
 For stats, I use Hive.

 For each project I will create an index, and each type of log I will put
 in an ES type, e.g. for ProjectX:
   log_debug
   log_error
   Stats_API
   Stats_PageView
   Stats_XYZ

 I am wondering whether this is good?
 Should I separate by time for each type of project?

 Regards.





Re: Is it possible to register a RestFilter without creating a plugin?

2014-08-26 Thread vineeth mohan
Hello Jinyuan ,

I don't feel this is possible.
With such a provision, how would you define what the REST API should do?

Thanks
   Vineeth


On Tue, Aug 26, 2014 at 2:41 AM, Jinyuan Zhou zhou.jiny...@gmail.com
wrote:

 Thanks,





define multiple types in an index

2014-08-26 Thread HansPeterSloot
Hello,

I am using elasticsearch 1.3.2 and try to understand elasticsearch (with my 
Oracle background ;-)).
For testing I use the data available 
on http://fec.gov/disclosurep/PDownload.do

There is a datafile for every state of the USA.
I don't know whether it is a good idea, but I want to make one index with a 
type for every state.

I want to define the fields and their types in advance.

Can I create the index with type AL and add other types after creation?
I tried but was not able to do it.

I created the following index:

curl -XPOST localhost:9200/contributions -d '{
  "settings": {
    "number_of_shards": 10,
    "number_of_replicas": 1,
    "_index": true
  },
  "mappings": {
    "AK": {
      "properties": {
        "cand_id": { "type": "string", "index": "not_analyzed" },
        "cand_nm": { "type": "string" },
        "cmte_id": { "type": "string" }
      }
    },
    "AL": {
      "properties": {
        "cand_id": { "type": "string", "index": "not_analyzed" },
        "cand_nm": { "type": "string" },
        "cmte_id": { "type": "string" }
      }
    }
  }
}'

Can I add types for AR and AZ after creation?
They have the same column definition. 
Is there a better way to achieve this?

Regards HansP
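Editor's hedged sketch of one answer to the question above: in Elasticsearch 1.x, additional types can be added to an existing index with the put-mapping API, e.g. `curl -XPUT localhost:9200/contributions/_mapping/AR -d '...'`, with a body mirroring the per-state mapping already defined (AR used as an example):

```json
{
  "AR": {
    "properties": {
      "cand_id": { "type": "string", "index": "not_analyzed" },
      "cand_nm": { "type": "string" },
      "cmte_id": { "type": "string" }
    }
  }
}
```

Since every state reportedly has the same field definitions, an index template or a single shared type with a `state` field may also be worth considering.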



Re: Swap indexes?

2014-08-26 Thread Lee Gee
I was looking for the index alias, thanks all.
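Editor's hedged illustration of the alias mechanism referred to above: both alias actions in a single `POST /_aliases` request are applied atomically, which is what makes a no-downtime swap between an old and a new index possible (index and alias names here are examples, not from the thread):

```json
{
  "actions": [
    { "remove": { "index": "items_v1", "alias": "items" } },
    { "add":    { "index": "items_v2", "alias": "items" } }
  ]
}
```

Clients always search the `items` alias, so they never notice the switch from `items_v1` to `items_v2`.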

On Tuesday, June 17, 2014 9:31:00 AM UTC+1, Lee Gee wrote:

 Is it possible to have one ES instance create an index and then have a 
 second instance use that created index, without downtime?

 tia
 lee




Re: _suggest suggestion/question

2014-08-26 Thread Lee Gee
Thank you, Vineeth.

On Sunday, August 17, 2014 12:04:20 PM UTC+1, vineeth mohan wrote:

 Hello Lee ,

 You will need to use context suggester for this purpose - 
 http://www.elasticsearch.org/guide/en/elasticsearch/reference/current/suggester-context.html

 Also, this difference stems from the fact that the actual data and the 
 auto-completion data are stored in different data structures.
 This is to make sure that the auto-completion data is memory-resident and 
 thus super fast.

 Thanks
   Vineeth


 On Sun, Aug 17, 2014 at 3:32 PM, Lee Gee lee...@gmail.com wrote:

 My reading of this clear and concise post [1], which may not be accurate, 
 is that it is not possible to use a reference to an existing field as an 
 argument to a suggester's 'input' or 'payload' fields.

 Please would you clarify if I have missed something?

 If I was correct, would it be much work to add these features?

 TIA
 Lee

 [1] http://www.elasticsearch.org/blog/you-complete-me/







Re: How to get the field infomation when _all and _source was set disabled

2014-08-26 Thread vineeth mohan
Hello Wang ,

By default the _source field stores the input JSON and gives it back for
each document match.
If you disable it, ES won't be able to return it.
Hence the result you see.
By default ES makes no effort to tap the stored-field information; it
takes the JSON stored in the _source field instead.

Now, to get back text that was set as stored, you need to use the fields option.
Typically, you tell ES which fields you want.
That information is then looked up in the stored-field space rather than
in _source.

In your query, you need to mention the fields you are interested in.
The Java equivalent is:

searchRequestBuilder.setTypes("type1").addField("title");


 Thanks
   Vineeth


On Mon, Aug 25, 2014 at 1:09 PM, Wang Mingxing wmx...@gmail.com wrote:

  Hi,
 I created an index, which was named test_all, and it has a table :
 type1. I want to test the usage of _all and _source. Now , I change
 their status to false. The mapping as follows:
 $ curl -XGET 'localhost:9200/test_all/_mapping/type1?pretty'
 {
   "test_all" : {
     "mappings" : {
       "type1" : {
         "_all" : {
           "enabled" : false
         },
         "_source" : {
           "enabled" : false
         },
         "properties" : {
           "content" : {
             "type" : "string",
             "analyzer" : "ik"
           },
           "title" : {
             "type" : "string",
             "store" : true,
             "analyzer" : "ik"
           }
         }
       }
     }
   }
 }

 In the type type1, I store the title information. I inserted five
 documents in type1. But when retrieving them, I could not find the
 "title" field information.

 $ curl -XGET 'localhost:9200/test_all/type1/_search?pretty'
 {
   "took" : 16,
   "timed_out" : false,
   "_shards" : {
     "total" : 5,
     "successful" : 5,
     "failed" : 0
   },
   "hits" : {
     "total" : 5,
     "max_score" : 1.0,
     "hits" : [ {
       "_index" : "test_all",
       "_type" : "type1",
       "_id" : "zWQno3rLS56hkwJ_Y108Dg",
       "_score" : 1.0
     }, {
       "_index" : "test_all",
       "_type" : "type1",
       "_id" : "BDKa-IP7TDK_iM2VNGFPYw",
       "_score" : 1.0
     }, {
       "_index" : "test_all",
       "_type" : "type1",
       "_id" : "n97suWSwQACgx35APTOqPg",
       "_score" : 1.0
     }, {
       "_index" : "test_all",
       "_type" : "type1",
       "_id" : "2P7OblUiQB2Y8ZCtWWWTdg",
       "_score" : 1.0
     }, {
       "_index" : "test_all",
       "_type" : "type1",
       "_id" : "Lo_PFVeKTEWazwCLbyKAqQ",
       "_score" : 1.0
     } ]
   }
 }

 Then, I tried to resolve it via the Java API:

 public static void indexSearch(Client client) {
     SearchRequestBuilder searchRequestBuilder = client.prepareSearch("test_all");
     searchRequestBuilder.setTypes("type1");
     SearchResponse searchResponse = searchRequestBuilder.execute().actionGet();
     SearchHit[] hits = searchResponse.getHits().getHits();
     System.out.println("count: " + hits.length);
     for (SearchHit hit : hits) {
         System.out.println();
         System.out.println("docID: " + hit.getId());
         System.out.println("score: " + hit.getScore());
         System.out.println("title: " + hit.getFields().get("title").toString());
     }
 }

  and it shows:

 Exception in thread main count: 5
 
 java.lang.NullPointerException
 at es.api.Test_All.indexSearch(Test_All.java:64)
 at es.api.Test_All.main(Test_All.java:73)
 docID: zWQno3rLS56hkwJ_Y108Dg
 score: 1.0

 I guess the value doesn't exist.

 Can you tell me why?

 Many Thanks.





gateway.recover_after_nodes minimum_master_nodes in a distributed environment?

2014-08-26 Thread Chris Neal
Hello all,

A question about gateway.recover_after_nodes and
discovery.zen.minimum_master_nodes in a distributed ES cluster. By
distributed I mean I have:

2 nodes that are data only:
'node.data' = 'true',
'node.master' = 'false',
'http.enabled' = 'false',

1 node that is a master/search only node:
'node.master' = 'true',
'node.data' = 'false',
'http.enabled' = 'true',

When setting discovery.zen.minimum_master_nodes, is the (n / 2) + 1 formula
including *all* nodes of all types in the cluster, or just those who can be
masters?

Similarly, when setting gateway.recover_after_nodes, is this value the
number of all nodes of all types in the cluster, or just those that are
data nodes?

Thank you very much for your time!
Chris
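Editor's note, hedged, based on the Elasticsearch reference docs rather than a reply in this thread: discovery.zen.minimum_master_nodes is a quorum over master-eligible nodes only, while gateway.recover_after_nodes counts joined nodes of any type (gateway.recover_after_data_nodes and gateway.recover_after_master_nodes exist for the finer-grained cases). For the 3-node layout above that would suggest, as a sketch:

```yaml
# only 1 node is master-eligible, so the (n / 2) + 1 quorum over masters is 1
discovery.zen.minimum_master_nodes: 1
# count every node (2 data + 1 master/search) before starting recovery
gateway.recover_after_nodes: 3
```

A single master-eligible node is of course a single point of failure; three master-eligible nodes with minimum_master_nodes set to 2 is the usual recommendation.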



Re: Using elasticsearch as a realtime fire hose

2014-08-26 Thread Jilles van Gurp
You might want to look at developing a plugin for this or maybe using an 
existing one. This one for example might do partly what you 
need: https://github.com/derryx/elasticsearch-changes-plugin

If you develop your own plugin, you should be able to tap into what is 
happening in the cluster at a pretty low level.

Jilles

On Monday, August 25, 2014 9:27:42 AM UTC+2, Jim Alateras wrote:

 What kind of events do you think of? Single new document indexed? Batch of 
 docs indexed? Node-wide? Or cluster wide?

 An event whenever a document is added to an index, cluster-wide.
  


 You mention Redis; for something like a publish/subscribe pattern, you'd 
 have to use a persistent connection and implement your own ES actions, 
 which is possible with e.g. HTTP websockets.

 A sketchy implementation can be found here:

 https://github.com/jprante/elasticsearch-transport-websocket


 thanks for the reference,  I will have a deeper look at it. 



 Jörg



 On Sat, Aug 23, 2014 at 8:09 PM, Jim Alateras j...@sutoiku.com wrote:

 I was wondering whether there were any mechanisms to use ES as a 
 realtime feed for downstream systems. I have a cluster that gathers 
 observations from many sensors. I have a need to maintain a list of 
 realtime counters in REDIS so I want to further process these observation 
 once they hit the database. Additionally I also want to be able to create 
 event streams for different type of feeds. 

 I could do all this outside ES but I was wondering whether there were 
 mechanisms within ES that will allow me to subscribe to add events for a 
 particular type or index.


 cheers
 /jima







Re: Logstash stop communicating with Elasticsearch

2014-08-26 Thread Jilles van Gurp
I had some issues with logstash as well and ended up modifying the 
elasticsearch_http plugin to tell me what was going on. It turned out my 
cluster was red because my index template required more replicas than were 
possible :-). The problem was that logstash does not fail very gracefully, 
and its logging is not that great either (which I find ironic for a logging-
centric product). So I modified it to simply log the actual Elasticsearch 
response, which was a 503 Unavailable. From there it was pretty clear what 
to fix.

I filed a bug + pull request for this but it seems nobody has done anything 
with it so far: https://github.com/elasticsearch/logstash/issues/1367

Jilles

On Saturday, August 23, 2014 2:51:18 PM UTC+2, 凌波清风 wrote:

 Hello, 
 I have also run into this problem; in my case the error occurs every 
 morning. I do not know how to solve it and hope you can give some help.

 Thx.

 On Friday, July 18, 2014 at 8:56:54 PM UTC+8, Alexandre Fricker wrote:

 Everything was working fine until 4 h this morning, when Logstash stopped 
 sending new logs to Elasticsearch. When I stop and then restart the 
 logstash process, it reprocesses a bulk of new log lines, and when it 
 starts to send them to Elasticsearch it starts writing this message again 
 and again:

 {:timestamp=2014-07-18T09:46:29.593000+0200, :message=Failed to 
 flush outgoing items, :outgoing_count=86, :exception=#RuntimeError: 
 Non-OK response code from Elasticsearch: 404, 
 :backtrace=[/soft/sth/lib/logstash/outputs/elasticsearch/protocol.rb:127:in
  
 `bulk_ftw', 
 /soft/sth/lib/logstash/outputs/elasticsearch/protocol.rb:80:in `bulk', 
 /soft/sth/lib/logstash/outputs/elasticsearch.rb:321:in `flush', 
 /soft/sth/vendor/bundle/jruby/1.9/gems/stud-0.0.17/lib/stud/buffer.rb:219:in
  
 `buffer_flush', org/jruby/RubyHash.java:1339:in `each', 
 /soft/sth/vendor/bundle/jruby/1.9/gems/stud-0.0.17/lib/stud/buffer.rb:216:in
  
 `buffer_flush', 
 /soft/sth/vendor/bundle/jruby/1.9/gems/stud-0.0.17/lib/stud/buffer.rb:193:in
  
 `buffer_flush', 
 /soft/sth/vendor/bundle/jruby/1.9/gems/stud-0.0.17/lib/stud/buffer.rb:112:in
  
 `buffer_initialize', org/jruby/RubyKernel.java:1521:in `loop', 
 /soft/sth/vendor/bundle/jruby/1.9/gems/stud-0.0.17/lib/stud/buffer.rb:110:in
  
 `buffer_initialize'], :level=:warn}

 But when I check Elastisearch status in Elastisearch HQ everything is 
 Green and OK

 From the day beafore nothing change except that I added a new type of 
 data but only 15 logs every 1 minute





Re: Java API or REST API for client development ?

2014-08-26 Thread Jilles van Gurp
I use an in-house developed Java REST client for Elasticsearch. 
Unfortunately it's not in any shape to untangle from our code base and put 
on GitHub yet, but I might consider that if there's more interest.

Basically I use Apache HttpClient. I implemented a simple round-robin 
strategy so I can fail over if nodes go down, and I implemented a simple 
REST client around this to support put/post/delete/get requests. I also 
added some basic interpretation of statuses and mapped those to sensible 
exceptions. 

The idea is that this client is wrapped with another client that supports 
more high level APIs that are exposed from elasticsearch. So you can do 
things like index/delete documents, manage aliases, do bulk indexing etc. 
My long term goal was actually to have two implementations of that client 
one for REST and one for embedded elasticsearch. That would be an 
interesting project because it would give you choice. Except, I never got 
around to doing the embedded client implementation since we don't really 
need it so far. Something else that we use is to model the query DSL using 
static java methods and provides a simple DSL for creating queries in Java. 
This in turn uses my github jsonj project that allows you to 
programmatically manipulate json structures. 

None of this is particularly complicated but altogether there is quite a 
bit of code to write and quite a few things you can get wrong. It's always 
hard to separate the general purpose stuff from the application specific 
stuff and thats one reason why I have not yet put this code out. 

Jilles
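The round-robin failover idea described in the post can be sketched in a few lines of plain Java; the class and method names below are illustrative, not taken from the author's actual client (which wraps Apache HttpClient):

```java
import java.util.List;
import java.util.concurrent.atomic.AtomicInteger;

// Minimal round-robin host selector: each next() call returns the next
// host in rotation, and a host reported as down is skipped until it is
// marked up again.
class RoundRobinHosts {
    private final List<String> hosts;
    private final boolean[] down;
    private final AtomicInteger cursor = new AtomicInteger();

    RoundRobinHosts(List<String> hosts) {
        this.hosts = hosts;
        this.down = new boolean[hosts.size()];
    }

    synchronized String next() {
        // Trying hosts.size() consecutive cursor positions visits every
        // index once, so an up host is found if any exists.
        for (int i = 0; i < hosts.size(); i++) {
            int idx = Math.floorMod(cursor.getAndIncrement(), hosts.size());
            if (!down[idx]) {
                return hosts.get(idx);
            }
        }
        throw new IllegalStateException("all hosts marked down");
    }

    synchronized void markDown(String host) { down[hosts.indexOf(host)] = true; }
    synchronized void markUp(String host)   { down[hosts.indexOf(host)] = false; }
}
```

The wrapping client would call markDown when a request fails with a connection error, retry on the host next() returns, and mark the node up again once a periodic health check succeeds.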


On Wednesday, March 26, 2014 10:46:16 AM UTC+1, Subhadip Bagui wrote:

 Hi, 

 We have a cloud management framework where all the event data are to be 
 stored in elasticsearch. I have to start the client side code for this.
  
 I need a suggestion here. Which one should I use, elasticsearch Java API 
 or REST API for the client ?

 Kindly suggest and mention the pros and cons for the same so it will be 
 easy for me to decide the product design than latter hassel.

 Subhadip
  





Re: Failed start of 2nd instance on same host with mlockall=true

2014-08-26 Thread joergpra...@gmail.com
You should run one node per host.

Two nodes add overhead and suffer from the effects you described.

For mlockall, the user needs the privilege to allocate the specified locked
memory, and the OS needs contiguous RAM per mlockall call. If the user's
memlock limit is exhausted, or if RAM allocation gets fragmented,
memlocking is no longer possible and fails.

Jörg


On Tue, Aug 26, 2014 at 2:54 PM, R. Toma renzo.t...@gmail.com wrote:

 Hi all,

 In an attempt to squeeze more power out of our physical servers we want to
 run multiple ES jvm's per server.

 Some specs:
 - servers has 24 cores, 256GB ram
 - each instance binds on different (alias) ip
 - each instance has 32GB heap
 - both instances run under user 'elastic'
 - limits for 'elastic' user: memlock=unlimited
 - es config for both instances: bootstrap.mlockall=true

 The 1st instance has been running for weeks.

 When starting the 2nd instance the following things happen:
 - increase of overal cpu load
 - lots of I/O to disks
 - no logging for 2nd instance
 - 2nd instance hangs
 - 1st instance keeps running, but gets slowish
 - cd /proc/pid causes a hang of cd process (until 2nd instance is killed)
 - exec 'ps axuw' causes a hang of ps process (until 2nd instance is killed)

 Maybe (un)related: I have never been able to run Elasticsearch in a
 virtualbox with memlock=unlimited and mlockall=true.


 After an hour of trial & error I found that removing the setting
 'bootstrap.mlockall' (setting it to false) from the 2nd instance's
 configuration fixes things.

 I am confused, but acknowledge I do not know anything about memlocking.

 Any ideas?

 Regards,
 Renzo









Re: how to use my customer lucene analyzer(tokenizer)?

2014-08-26 Thread art
Thanks Jun, that was helpful.  It helped me to realize I had not fully 
connected my analyzer plugin.

On Thursday, August 21, 2014 11:23:47 PM UTC-7, Jun Ohtani wrote:

 Hi Art,

 I wrote an example specifying the kuromoji analyzer(kuromoji) and custom 
 analyzer(my_analyzer) for a field.

 curl -XPUT "http://localhost:9200/kuromoji-sample" -d '
 {
   "settings": {
     "index": {
       "analysis": {
         "analyzer": {
           "my_analyzer": {
             "tokenizer": "kuromoji_tokenizer",
             "filter": [
               "kuromoji_baseform"
             ]
           }
         }
       }
     }
   },
   "mappings": {
     "sample": {
       "properties": {
         "title": {
           "type": "string",
           "analyzer": "my_analyzer"
         },
         "body": {
           "type": "string",
           "analyzer": "kuromoji"
         }
       }
     }
   }
 }'

 I hope that it will be helpful for you.


 2014-08-22 9:18 GMT+09:00 a...@safeshepherd.com:

 I have the same question about using an analyzer I have written as a 
 plug-in for ElasticSearch 1.3.


 https://github.com/elasticsearch/elasticsearch-analysis-kuromoji/blob/es-1.3/README.md
  
 demonstrates only how to use the tokenizers in combination with the 
 built-in CustomAnalyzer. They do not show how to use the kuromoji analyzer 
 itself.

 When I try to specify my analyzer for a field, I get errors like this:

 MapperParsingException[Analyzer [special_analyzer] not found for field 
 [foo]];

 Can you show an example of how to specify the kuromoji analyzer for a 
 field?  I should then be able to adapt it for use with my plugin analyzer.

 Thanks in advance,
 Art



 On Tuesday, August 5, 2014 12:34:42 AM UTC-7, Jun Ohtani wrote:

 Hi,

 I think this plugin will be helpful for you.

 https://github.com/elasticsearch/elasticsearch-analysis-kuromoji
 2014/08/05 15:58 fanc...@gmail.com:

 I want to use my own Chinese analyzer and I can write lucene analyzer 
 class myself. How can I integrate it to elasticsearch?
 I googled and found http://www.elasticsearch.org/guide/en/
  elasticsearch/guide/current/custom-analyzers.html. But it only combines 
  existing tokenizers and filters. I can write a tokenizer in Java by 
  myself.

 -- 
 You received this message because you are subscribed to the Google 
 Groups elasticsearch group.
 To unsubscribe from this group and stop receiving emails from it, send 
 an email to elasticsearc...@googlegroups.com.

 To view this discussion on the web visit https://groups.google.com/d/
 msgid/elasticsearch/c3fe52cd-8cb5-4c53-b0fe-87183deb45bf%
 40googlegroups.com 
 https://groups.google.com/d/msgid/elasticsearch/c3fe52cd-8cb5-4c53-b0fe-87183deb45bf%40googlegroups.com?utm_medium=emailutm_source=footer
 .
 For more options, visit https://groups.google.com/d/optout.

  -- 
 You received this message because you are subscribed to the Google Groups 
 elasticsearch group.
 To unsubscribe from this group and stop receiving emails from it, send an 
  email to elasticsearc...@googlegroups.com.
 To view this discussion on the web visit 
 https://groups.google.com/d/msgid/elasticsearch/da795847-3ea2-4afb-9a7b-aefdd6f111a0%40googlegroups.com
  
 https://groups.google.com/d/msgid/elasticsearch/da795847-3ea2-4afb-9a7b-aefdd6f111a0%40googlegroups.com?utm_medium=emailutm_source=footer
 .

 For more options, visit https://groups.google.com/d/optout.




 -- 
 ---
 Jun Ohtani
 blog : http://blog.johtani.info
  

-- 
You received this message because you are subscribed to the Google Groups 
elasticsearch group.
To unsubscribe from this group and stop receiving emails from it, send an email 
to elasticsearch+unsubscr...@googlegroups.com.
To view this discussion on the web visit 
https://groups.google.com/d/msgid/elasticsearch/a792d08d-534f-4619-bfcb-0f01262b6c51%40googlegroups.com.
For more options, visit https://groups.google.com/d/optout.


Function Query with an aggregation function of nested field

2014-08-26 Thread Srinivasan Ramaswamy
I have documents with the following schema.


{
  "authorId": 10,
  "authorName": "Joshua Bloch",
  "books": [
    {
      "bookId": 101,
      "bookName": "Effective Java",
      "description": "effective java book with useful recommendations",
      "Category": 1,
      "sales": [
        { "keyword": "effective java", "count": 200 },
        { "keyword": "java tips", "count": 100 },
        { "keyword": "java joshua bloch", "count": 50 }
      ],
      "createDate": "08-25-2014"
    },
    {
      "bookId": 102,
      "bookName": "Java Puzzlers",
      "description": "Java Puzzlers: Traps, Pitfalls, and Corner Cases",
      "Category": 2,
      "sales": [
        { "keyword": "java puzzlers", "count": 100 },
        { "keyword": "joshua bloch puzzler", "count": 50 }
      ]
    }
  ]
}

The sales information is stored with each book along with the search query 
that led to those sales. If the user applied a category filter, I would 
like to count only books that belong to that category.

I would like to sort the list of authors returned based on a function of 
sales data and text match. For example, if the search query is "java", I would 
like to return the above mentioned doc and all other author documents which 
have the term "java" in them. I came up with the following query:

{
   "query": {
      "function_score": {
         "boost_mode": "replace",
         "query": {
            "match": { "bookName": "java" }
         },
         "script_score": {
            "params": {
               "param1": 2
            },
            "script": "doc['books.sales.count'].isEmpty() ? _score : _score * doc['books.sales.count'].value * param1"
         }
      }
   }
}


I have a few questions about the query above:
1. The results don't look sorted by sales; authors who don't have any 
books with sales appear at the top.
2. How do I use the sum of all sales for an author (across all books within 
the author document) in the script? Is there a sum function for nested 
fields inside a document when using script_score? Note that sales is a 
nested field inside another nested field, books.
3. As a next step I would also like to use a filter on keyword within the 
script_score to only include sales whose keyword value matches the 
search query term.
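For reference, the roll-up asked about in questions 2 and 3 can be sketched client-side. This is an illustrative Python sketch over documents shaped like the schema above (field names taken from it), not an Elasticsearch feature:

```python
def total_sales(author_doc, keyword=None):
    """Sum nested sales counts across all books of an author,
    optionally keeping only sales whose keyword contains a term
    (mirrors questions 2 and 3 above)."""
    total = 0
    for book in author_doc.get("books", []):
        for sale in book.get("sales", []):
            if keyword is None or keyword in sale.get("keyword", ""):
                total += sale.get("count", 0)
    return total

doc = {"books": [
    {"sales": [{"keyword": "effective java", "count": 200},
               {"keyword": "java tips", "count": 100}]},
    {"sales": [{"keyword": "java puzzlers", "count": 100}]},
]}
# total_sales(doc) == 400; total_sales(doc, "puzzlers") == 100
```

Inside Elasticsearch itself, doing this per hit would need the script to walk the nested objects rather than rely on a flattened doc value.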

Any help would be much appreciated. 

Thanks
Srini

-- 
You received this message because you are subscribed to the Google Groups 
elasticsearch group.
To unsubscribe from this group and stop receiving emails from it, send an email 
to elasticsearch+unsubscr...@googlegroups.com.
To view this discussion on the web visit 
https://groups.google.com/d/msgid/elasticsearch/f858caee-bb43-45e1-ada3-212a78378aa0%40googlegroups.com.
For more options, visit https://groups.google.com/d/optout.


Re: JVM crash on 64 bit SPARC with Elasticsearch 1.2.2 due to unaligned memory access

2014-08-26 Thread joergpra...@gmail.com
Thanks for the logstash mapping command. I can reproduce it now.

It's the LZF encoder that bails out at
org.elasticsearch.common.compress.lzf.impl.UnsafeChunkEncoderBE._getInt

which uses in turn sun.misc.Unsafe.getInt

I have created a gist of the JVM crash file at

https://gist.github.com/jprante/79f4b4c0b9fd83eb1c9b

There has been a fix in LZF lately
https://github.com/ning/compress/commit/db7f51bddc5b7beb47da77eeeab56882c650bff7

for version 1.0.3 which has been released recently.

I will build a snapshot ES version with LZF 1.0.3 and see if this works...

Jörg



On Mon, Aug 25, 2014 at 11:30 PM, tony.apo...@iqor.com wrote:

 I captured a WireShark trace of the interaction between ES and Logstash
 1.4.1.  The error occurs even before my data is sent.  Can you try to
 reproduce it on your testbed with this message I captured?

 curl -XPUT http://amssc103-mgmt-app2:9200/_template/logstash -d @y

 Contents of file 'y':
 {
   "template": "logstash-*",
   "settings": { "index.refresh_interval": "5s" },
   "mappings": {
     "_default_": {
       "_all": { "enabled": true },
       "dynamic_templates": [ {
         "string_fields": {
           "match": "*",
           "match_mapping_type": "string",
           "mapping": {
             "type": "string", "index": "analyzed", "omit_norms": true,
             "fields": {
               "raw": { "type": "string", "index": "not_analyzed", "ignore_above": 256 }
             }
           }
         }
       } ],
       "properties": {
         "@version": { "type": "string", "index": "not_analyzed" },
         "geoip": {
           "type": "object", "dynamic": true, "path": "full",
           "properties": { "location": { "type": "geo_point" } }
         }
       }
     }
   }
 }



 On Monday, August 25, 2014 3:53:18 PM UTC-4, tony@iqor.com wrote:

 I have no plugins installed (yet) and only changed es.logger.level to
 DEBUG in logging.yml.

 elasticsearch.yml:
 cluster.name: es-AMS1Cluster
 node.name: KYLIE1
 node.rack: amssc2client02
 path.data: /export/home/apontet/elasticsearch/data
 path.work: /export/home/apontet/elasticsearch/work
 path.logs: /export/home/apontet/elasticsearch/logs
 network.host:    = sanitized line; file contains actual
 server IP
 discovery.zen.ping.multicast.enabled: false
 discovery.zen.ping.unicast.hosts: [s1, s2, s3, s5 , s6, s7]
   = Also sanitized

 Thanks,
 Tony




 On Saturday, August 23, 2014 6:29:40 AM UTC-4, Jörg Prante wrote:

 I tested a simple Hello World document on Elasticsearch 1.3.2 with
 Oracle JDK 1.7.0_17 64-bit Server VM, Sparc Solaris 10, default settings.

 No issues.

 So I would like to know more about the settings in elasticsearch.yml,
 the mappings, and the installed plugins.

 Jörg


 On Sat, Aug 23, 2014 at 11:25 AM, joerg...@gmail.com joerg...@gmail.com
  wrote:

 I have some Solaris 10 Sparc V440/V445 servers available and can try to
 reproduce over the weekend.

 Jörg


 On Sat, Aug 23, 2014 at 4:37 AM, Robert Muir 
 rober...@elasticsearch.com wrote:

 How big is it? Maybe i can have it anyway? I pulled two ancient
 ultrasparcs out of my closet to try to debug your issue, but unfortunately
 they are a pita to work with (dead nvram battery on both, zeroed mac
 address, etc.) Id still love to get to the bottom of this.
  On Aug 22, 2014 3:59 PM, tony@iqor.com wrote:

 Hi Adrien,
 It's a bunch of garbled binary data, basically a dump of the process
 image.
 Tony


 On Thursday, August 21, 2014 6:36:12 PM UTC-4, Adrien Grand wrote:

 Hi Tony,

 Do you have more information in the core dump file? (cf. the Core
 dump written line that you pasted)


 On Thu, Aug 21, 2014 at 7:53 PM, tony@iqor.com wrote:

 Hello,
 I installed ES 1.3.2 on a spare Solaris 11/ T4-4 SPARC server to
 scale out of small x86 machine.  I get a similar exception running ES 
 with
 JAVA_OPTS=-d64.  When Logstash 1.4.1 sends the first message I get the
 error below on the ES process:


 #
 # A fatal error has been detected by the Java Runtime Environment:
 #
 #  SIGBUS (0xa) at pc=0x7a9a3d8c, pid=14473, tid=209
 #
 # JRE version: 7.0_25-b15
 # Java VM: Java HotSpot(TM) 64-Bit Server VM (23.25-b01 mixed mode
 solaris-sparc compressed oops)
 # Problematic frame:
 # V  [libjvm.so+0xba3d8c]  Unsafe_GetInt+0x158
 #
 # Core dump written. Default location: /export/home/elasticsearch/
 elasticsearch-1.3.2/core or core.14473
 #
 # If you would like to submit a bug report, please visit:
 #   http://bugreport.sun.com/bugreport/crash.jsp
 #

 ---  T H R E A D  ---

 Current thread (0x000107078000):  JavaThread
 elasticsearch[KYLIE1][http_server_worker][T#17]{New I/O worker
 #147} daemon [_thread_in_vm, id=209, stack(0x5b80,
 0x5b84)]

 siginfo:si_signo=SIGBUS: si_errno=0, si_code=1 (BUS_ADRALN),
 si_addr=0x000709cc09e7


 I can run ES using 32bit java but have to shrink ES_HEAPS_SIZE more
 than I want to.  Any assistance would be appreciated.

 Regards,
 Tony


 On Tuesday, July 22, 2014 5:43:28 AM UTC-4, David Roberts wrote:

Re: indices.memory.index_buffer_size

2014-08-26 Thread Yongtao You
Thanks Mark.

What confuses me are "global setting" (which suggests a cluster-wide setting) 
and "on a specific node" (which suggests a node-level setting). I could just 
try it out, but it's hard to tell whether the setting worked or not. :(

On Sunday, August 24, 2014 3:13:17 PM UTC-7, Mark Walkom wrote:


 http://www.elasticsearch.org/guide/en/elasticsearch/reference/current/modules-indices.html
  
  states "It is a global setting that bubbles down to all the different 
  shards allocated on a specific node."

 Regards,
 Mark Walkom

 Infrastructure Engineer
 Campaign Monitor
  email: ma...@campaignmonitor.com
 web: www.campaignmonitor.com


  On 25 August 2014 03:12, Yongtao You yongt...@gmail.com wrote:

 Hi,

 Is the indices.memory.index_buffer_size configuration a cluster wide 
 configuration or per node configuration? Do I need to set it on every node? 
 Or just the master (eligible) node?

 Thanks.
 Yongtao

 -- 
 You received this message because you are subscribed to the Google Groups 
 elasticsearch group.
 To unsubscribe from this group and stop receiving emails from it, send an 
  email to elasticsearc...@googlegroups.com.
 To view this discussion on the web visit 
 https://groups.google.com/d/msgid/elasticsearch/f67e3a30-521c-4c13-8620-c79133cea01c%40googlegroups.com
  
 https://groups.google.com/d/msgid/elasticsearch/f67e3a30-521c-4c13-8620-c79133cea01c%40googlegroups.com?utm_medium=emailutm_source=footer
 .
 For more options, visit https://groups.google.com/d/optout.




-- 
You received this message because you are subscribed to the Google Groups 
elasticsearch group.
To unsubscribe from this group and stop receiving emails from it, send an email 
to elasticsearch+unsubscr...@googlegroups.com.
To view this discussion on the web visit 
https://groups.google.com/d/msgid/elasticsearch/d76f4c67-9250-4ab9-b02d-0f0c78b33be6%40googlegroups.com.
For more options, visit https://groups.google.com/d/optout.


Re: indices.memory.index_buffer_size

2014-08-26 Thread Nikolas Everett
I just looked at this code!

It's a setting that you set globally at the cluster level.  It takes effect
per node: every active shard on a node gets an equal share of that much
space.  "Active" means the shard has been written to in the past six minutes
or so.  When a node first starts, all shards are assumed active, and those
that are not updated at all lose active status after the timeout.  You can
watch the little dance it does by setting
  index.engine.internal: DEBUG
in logging.yml.

Now - I'm not actually sure how important a setting it is.  I opened
https://github.com/elasticsearch/elasticsearch/issues/7441 to suggest
allowing it to be spread around better.  Mike'll probably close it if
spreading it around wouldn't really help things much.
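As an illustrative sketch (not the actual Elasticsearch code), the even split across active shards works out roughly like this:

```python
def per_shard_buffer_bytes(heap_bytes, index_buffer_pct, active_shards):
    """Split a node-level indexing buffer (a percentage of heap)
    evenly across the node's active shards. Illustrative only;
    the real accounting lives inside Elasticsearch."""
    if active_shards == 0:
        return 0
    total = int(heap_bytes * index_buffer_pct / 100)
    return total // active_shards

# e.g. a 30 GB heap, the default 10% buffer, 12 active shards on the node
share = per_shard_buffer_bytes(30 * 1024**3, 10, 12)
```

So adding shards to a node shrinks each shard's share, which is why shards dropping out of "active" status frees up buffer for the rest.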

Nik

On Tue, Aug 26, 2014 at 2:07 PM, Yongtao You yongtao@gmail.com wrote:

 Thanks Mark.

 What confuses me are global setting (which suggests cluster-wide
 setting) and on a specific node (which suggests node level setting). I
 could just try it out, but it's hard to tell if the setting worked or not.
 :(


 On Sunday, August 24, 2014 3:13:17 PM UTC-7, Mark Walkom wrote:

 http://www.elasticsearch.org/guide/en/elasticsearch/
 reference/current/modules-indices.html states It is a global setting
 that bubbles down to all the different shards allocated on a specific node.

 Regards,
 Mark Walkom

 Infrastructure Engineer
 Campaign Monitor
 email: ma...@campaignmonitor.com
 web: www.campaignmonitor.com


 On 25 August 2014 03:12, Yongtao You yongt...@gmail.com wrote:

  Hi,

 Is the indices.memory.index_buffer_size configuration a cluster wide
 configuration or per node configuration? Do I need to set it on every node?
 Or just the master (eligible) node?

 Thanks.
 Yongtao

 --
 You received this message because you are subscribed to the Google
 Groups elasticsearch group.
 To unsubscribe from this group and stop receiving emails from it, send
 an email to elasticsearc...@googlegroups.com.

 To view this discussion on the web visit https://groups.google.com/d/
 msgid/elasticsearch/f67e3a30-521c-4c13-8620-c79133cea01c%
 40googlegroups.com
 https://groups.google.com/d/msgid/elasticsearch/f67e3a30-521c-4c13-8620-c79133cea01c%40googlegroups.com?utm_medium=emailutm_source=footer
 .
 For more options, visit https://groups.google.com/d/optout.


  --
 You received this message because you are subscribed to the Google Groups
 elasticsearch group.
 To unsubscribe from this group and stop receiving emails from it, send an
 email to elasticsearch+unsubscr...@googlegroups.com.
 To view this discussion on the web visit
 https://groups.google.com/d/msgid/elasticsearch/d76f4c67-9250-4ab9-b02d-0f0c78b33be6%40googlegroups.com
 https://groups.google.com/d/msgid/elasticsearch/d76f4c67-9250-4ab9-b02d-0f0c78b33be6%40googlegroups.com?utm_medium=emailutm_source=footer
 .

 For more options, visit https://groups.google.com/d/optout.


-- 
You received this message because you are subscribed to the Google Groups 
elasticsearch group.
To unsubscribe from this group and stop receiving emails from it, send an email 
to elasticsearch+unsubscr...@googlegroups.com.
To view this discussion on the web visit 
https://groups.google.com/d/msgid/elasticsearch/CAPmjWd1CmkjPAPJns3PjCmsFicu8KYV0DRjv9T2qacx636sy7g%40mail.gmail.com.
For more options, visit https://groups.google.com/d/optout.


Re: JVM crash on 64 bit SPARC with Elasticsearch 1.2.2 due to unaligned memory access

2014-08-26 Thread joergpra...@gmail.com
Still broken with lzf-compress 1.0.3

https://gist.github.com/jprante/d2d829b497db4963aea5

Jörg


On Tue, Aug 26, 2014 at 7:54 PM, joergpra...@gmail.com 
joergpra...@gmail.com wrote:

 Thanks for the logstash mapping command. I can reproduce it now.

 It's the LZF encoder that bails out at
 org.elasticsearch.common.compress.lzf.impl.UnsafeChunkEncoderBE._getInt

 which uses in turn sun.misc.Unsafe.getInt

 I have created a gist of the JVM crash file at

 https://gist.github.com/jprante/79f4b4c0b9fd83eb1c9b

 There has been a fix in LZF lately
 https://github.com/ning/compress/commit/db7f51bddc5b7beb47da77eeeab56882c650bff7

 for version 1.0.3 which has been released recently.

 I will build a snapshot ES version with LZF 1.0.3 and see if this works...

 Jörg



 On Mon, Aug 25, 2014 at 11:30 PM, tony.apo...@iqor.com wrote:

 I captured a WireShark trace of the interaction between ES and Logstash
 1.4.1.  The error occurs even before my data is sent.  Can you try to
 reproduce it on your testbed with this message I captured?

 curl -XPUT http://amssc103-mgmt-app2:9200/_template/logstash -d @y

  Contents of file 'y':
  {
    "template": "logstash-*",
    "settings": { "index.refresh_interval": "5s" },
    "mappings": {
      "_default_": {
        "_all": { "enabled": true },
        "dynamic_templates": [ {
          "string_fields": {
            "match": "*",
            "match_mapping_type": "string",
            "mapping": {
              "type": "string", "index": "analyzed", "omit_norms": true,
              "fields": {
                "raw": { "type": "string", "index": "not_analyzed", "ignore_above": 256 }
              }
            }
          }
        } ],
        "properties": {
          "@version": { "type": "string", "index": "not_analyzed" },
          "geoip": {
            "type": "object", "dynamic": true, "path": "full",
            "properties": { "location": { "type": "geo_point" } }
          }
        }
      }
    }
  }



 On Monday, August 25, 2014 3:53:18 PM UTC-4, tony@iqor.com wrote:

 I have no plugins installed (yet) and only changed es.logger.level to
 DEBUG in logging.yml.

 elasticsearch.yml:
 cluster.name: es-AMS1Cluster
 node.name: KYLIE1
 node.rack: amssc2client02
 path.data: /export/home/apontet/elasticsearch/data
 path.work: /export/home/apontet/elasticsearch/work
 path.logs: /export/home/apontet/elasticsearch/logs
 network.host:    = sanitized line; file contains actual
 server IP
 discovery.zen.ping.multicast.enabled: false
 discovery.zen.ping.unicast.hosts: [s1, s2, s3, s5 , s6, s7]
   = Also sanitized

 Thanks,
 Tony




 On Saturday, August 23, 2014 6:29:40 AM UTC-4, Jörg Prante wrote:

 I tested a simple Hello World document on Elasticsearch 1.3.2 with
 Oracle JDK 1.7.0_17 64-bit Server VM, Sparc Solaris 10, default settings.

 No issues.

 So I would like to know more about the settings in elasticsearch.yml,
 the mappings, and the installed plugins.

 Jörg


 On Sat, Aug 23, 2014 at 11:25 AM, joerg...@gmail.com 
 joerg...@gmail.com wrote:

 I have some Solaris 10 Sparc V440/V445 servers available and can try
 to reproduce over the weekend.

 Jörg


 On Sat, Aug 23, 2014 at 4:37 AM, Robert Muir 
 rober...@elasticsearch.com wrote:

 How big is it? Maybe i can have it anyway? I pulled two ancient
 ultrasparcs out of my closet to try to debug your issue, but 
 unfortunately
 they are a pita to work with (dead nvram battery on both, zeroed mac
 address, etc.) Id still love to get to the bottom of this.
  On Aug 22, 2014 3:59 PM, tony@iqor.com wrote:

 Hi Adrien,
 It's a bunch of garbled binary data, basically a dump of the process
 image.
 Tony


 On Thursday, August 21, 2014 6:36:12 PM UTC-4, Adrien Grand wrote:

 Hi Tony,

 Do you have more information in the core dump file? (cf. the Core
 dump written line that you pasted)


 On Thu, Aug 21, 2014 at 7:53 PM, tony@iqor.com wrote:

 Hello,
 I installed ES 1.3.2 on a spare Solaris 11/ T4-4 SPARC server to
 scale out of small x86 machine.  I get a similar exception running ES 
 with
 JAVA_OPTS=-d64.  When Logstash 1.4.1 sends the first message I get the
 error below on the ES process:


 #
 # A fatal error has been detected by the Java Runtime Environment:
 #
 #  SIGBUS (0xa) at pc=0x7a9a3d8c, pid=14473, tid=209
 #
 # JRE version: 7.0_25-b15
 # Java VM: Java HotSpot(TM) 64-Bit Server VM (23.25-b01 mixed mode
 solaris-sparc compressed oops)
 # Problematic frame:
 # V  [libjvm.so+0xba3d8c]  Unsafe_GetInt+0x158
 #
 # Core dump written. Default location: /export/home/elasticsearch/
 elasticsearch-1.3.2/core or core.14473
 #
 # If you would like to submit a bug report, please visit:
 #   http://bugreport.sun.com/bugreport/crash.jsp
 #

 ---  T H R E A D  ---

 Current thread (0x000107078000):  JavaThread
 elasticsearch[KYLIE1][http_server_worker][T#17]{New I/O worker
 #147} daemon [_thread_in_vm, id=209, stack(0x5b80,
 0x5b84)]

 siginfo:si_signo=SIGBUS: si_errno=0, si_code=1 (BUS_ADRALN),
 si_addr=0x000709cc09e7


 I can 

Elastic HQ not getting back vendor info from Elasticsearch.

2014-08-26 Thread John Smith
I posted an issue with Elastic HQ here: 
https://github.com/royrusso/elasticsearch-HQ/issues/164

But just in case maybe an Elastic dev can have a look and see if it's 
Elasticsearch issue or not.

Thanks

-- 
You received this message because you are subscribed to the Google Groups 
elasticsearch group.
To unsubscribe from this group and stop receiving emails from it, send an email 
to elasticsearch+unsubscr...@googlegroups.com.
To view this discussion on the web visit 
https://groups.google.com/d/msgid/elasticsearch/a60bf5ec-c167-469f-b856-355faeea5601%40googlegroups.com.
For more options, visit https://groups.google.com/d/optout.


Elastic HQ not getting back vendor info.

2014-08-26 Thread John Smith
I posted an issue with Elastic HQ 
here: https://github.com/royrusso/elasticsearch-HQ/issues/164

But just in case maybe an Elastic dev can have a look and see if it's 
Elasticsearch issue or not.

Thanks

-- 
You received this message because you are subscribed to the Google Groups 
elasticsearch group.
To unsubscribe from this group and stop receiving emails from it, send an email 
to elasticsearch+unsubscr...@googlegroups.com.
To view this discussion on the web visit 
https://groups.google.com/d/msgid/elasticsearch/c6161414-ad80-4881-bf87-ede7f1818437%40googlegroups.com.
For more options, visit https://groups.google.com/d/optout.


Re: groovy for scripting

2014-08-26 Thread Alex S.V.
providing self-update:

I found that I could create a cross-request cache using the following script (like a 
cross-request incrementer):

POST /test/_search
{
  "query": { "match_all": {} },
  "script_fields": {
    "a": {
      "script": "import groovy.lang.Script;class A extends Script{static i=0;def run() {i++}}",
      "lang": "groovy"
    }
  }
}

Formatted for readability, the script is:

import groovy.lang.Script

class A extends Script{
  static i=0

  def run() {
 i++
  }
}

Actually the *i* variable here is not thread-safe, but the idea is clear - you need 
to define a class that inherits from Script and implements the abstract method run.
This class is also accessible on each node thread.
Now I'm looking for a solution to make a query-scoped counter (for a 
one-node configuration). I think it could be done by passing a unique 
query_id in the parameters, but I'm afraid of making the code non-thread-safe, or 
vice versa - thread-safe, but with reduced performance.
Researching more...
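For illustration only, the thread-safety trade-off described above can be sketched with a lock-guarded per-key counter. This is Python rather than Groovy, and the query_id key is the hypothetical parameter mentioned above:

```python
import threading
from collections import defaultdict

class QueryScopedCounter:
    """Per-query_id counters guarded by a lock, sketching the
    query-scoped incrementer idea; the synchronization that makes
    it thread-safe is exactly what costs some throughput."""
    def __init__(self):
        self._lock = threading.Lock()
        self._counts = defaultdict(int)

    def increment(self, query_id):
        with self._lock:
            self._counts[query_id] += 1
            return self._counts[query_id]

counter = QueryScopedCounter()
first = counter.increment("q1")   # 1
second = counter.increment("q1")  # 2
```

A lock-free alternative would be an atomic integer per key, which trades memory for less contention.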

-- 
You received this message because you are subscribed to the Google Groups 
elasticsearch group.
To unsubscribe from this group and stop receiving emails from it, send an email 
to elasticsearch+unsubscr...@googlegroups.com.
To view this discussion on the web visit 
https://groups.google.com/d/msgid/elasticsearch/fb402d2c-8820-4a1f-99e0-0453c0c82cf6%40googlegroups.com.
For more options, visit https://groups.google.com/d/optout.


term_stats return sometime return meaningless number

2014-08-26 Thread youwei chen
Our elasticsearch instance sometimes returns meaningless numbers for 
terms_stats, while queries return correct data.

I am using Kibana as a front end; this is the generated query: 

{"facets":{"terms":{
  "terms_stats":{"value_field":"metric","key_field":"host","size":10,"order":"count"},
  "facet_filter":{"fquery":{"query":{"filtered":{
    "query":{"bool":{"should":[{"query_string":{"query":
      "(service:\".StorageProxy.RecentReadLatencyMicros\") AND (layer:\"cassandra\") AND (@timestamp:[now-1m TO now]) AND host:169.26.4.167"}}]}},
    "filter":{"bool":{"must":[
      {"range":{"@timestamp":{"from":1408992976055,"to":"now"}}},
      {"terms":{"host":["169.26.4.167"]}},
      {"terms":{"host":["169.26.4.167"]}}]}}}}}}}},
"size":0}

max=4.6366831074216192E+18
mean=1.5455610358072064E+18
min=0
term=169.26.4.167
total=4.6366831074216192E+18

The metric field holds numbers between 0 and 100, while the terms stats 
report huge numbers.  If I delete the index, it shows correct term 
stats again.

I tried refresh and close/open index; nothing seems to work except deleting the 
index and recreating it.  Has anyone faced a similar issue?

Thanks.

-- 
You received this message because you are subscribed to the Google Groups 
elasticsearch group.
To unsubscribe from this group and stop receiving emails from it, send an email 
to elasticsearch+unsubscr...@googlegroups.com.
To view this discussion on the web visit 
https://groups.google.com/d/msgid/elasticsearch/5e141e56-7e01-4899-949f-c3d7f69a353d%40googlegroups.com.
For more options, visit https://groups.google.com/d/optout.


Failing Replica Shards

2014-08-26 Thread David Kleiner
Hello,

In the past couple of days I've been getting a lot of error messages about 
corrupted replica shards.  The primary shards come up fast after ES process 
restart but replicas take a long time to come back. Sometimes it takes a 
few node restarts to 'kick' the nodes to start replica shards.

ES version is 1.3.1 running on CentOS 6.5 hosted at Softlayer.  It's a 
3-way cluster with 4 logstash feeders hanging off it. 

Here are the errors;

[2014-08-26 15:01:18,682][WARN ][cluster.action.shard ] [log03 / 
Salvador Dali] [downloader-2014.08][4] received shard failed for 
[downloader-2014.08][4], node[l9-BQTHSSF-ElhgpPBZ24w], [R], 
s[INITIALIZING], indexUUID [2vRrb5YlQP6MTVr1chOezg], reason [engine 
failure, message [corrupted preexisting 
index][CorruptIndexException[[downloader-2014.08][4] Corrupted index 
[corrupted_SkU0-ZHZRxivSnGczABb_g] caused by: CorruptIndexException[codec 
footer mismatch: actual footer=-1676705023 vs expected footer=-1071082520 
(resource: 
NIOFSIndexInput(path=/acc/ES/NBS/nodes/0/indices/downloader-2014.08/4/index/_k9a_es090_0.doc))
[2014-08-26 15:01:18,682][WARN ][cluster.action.shard ] [log03 / 
Salvador Dali] [eventlog-2014.06][0] received shard failed for 
[eventlog-2014.06][0], node[l9-BQTHSSF-ElhgpPBZ24w], [R], s[INITIALIZING], 
indexUUID [jbvChdRrRB6HTutxPvxMmQ], reason [engine failure, message 
[corrupted preexisting index][CorruptIndexException[[eventlog-2014.06][0] 
Corrupted index [corrupted__712QIBQQqafzpBoQwZtcg] caused by: 
CorruptIndexException[codec footer mismatch: actual footer=0 vs expected 
footer=-1071082520 (resource: 
NIOFSIndexInput(path=/acc/ES/NBS/nodes/0/indices/eventlog-2014.06/0/index/_1k4x.nvd))
[2014-08-26 15:01:18,684][WARN ][cluster.action.shard ] [log03 / 
Salvador Dali] [eventlog-2014.07][0] received shard failed for 
[eventlog-2014.07][0], node[l9-BQTHSSF-ElhgpPBZ24w], [R], s[INITIALIZING], 
indexUUID [T4tTXkPjTaCdSVNTjHfOcg], reason [engine failure, message 
[corrupted preexisting index][CorruptIndexException[[eventlog-2014.07][0] 
Corrupted index [corrupted_OzfNRRGyTIq8a1PRhLYG2w] caused by: 
CorruptIndexException[codec footer mismatch: actual footer=0 vs expected 
footer=-1071082520 (resource: 
NIOFSIndexInput(path=/acc/ES/NBS/nodes/0/indices/eventlog-2014.07/0/index/_rqf.nvd))



Thanks,

David

-- 
You received this message because you are subscribed to the Google Groups 
elasticsearch group.
To unsubscribe from this group and stop receiving emails from it, send an email 
to elasticsearch+unsubscr...@googlegroups.com.
To view this discussion on the web visit 
https://groups.google.com/d/msgid/elasticsearch/c0af53fb-6fdd-4624-bf6c-9b9d50081689%40googlegroups.com.
For more options, visit https://groups.google.com/d/optout.


Data per node in ES

2014-08-26 Thread Gaurav Tiwari
Hi ,

We are analyzing ES for storing our log data (~ 400 GB/Day) and will be 
integrating Logstash and ES.  What is the maximum amount of data that can 
be stored on one node of ES ?

Regards,
Gaurav

-- 
You received this message because you are subscribed to the Google Groups 
elasticsearch group.
To unsubscribe from this group and stop receiving emails from it, send an email 
to elasticsearch+unsubscr...@googlegroups.com.
To view this discussion on the web visit 
https://groups.google.com/d/msgid/elasticsearch/c1b5123f-51a7-41b4-9915-d4ea705d23de%40googlegroups.com.
For more options, visit https://groups.google.com/d/optout.


Re: Reduce Number of Segments

2014-08-26 Thread Michael McCandless
OK, I would suggest setting index.merge.scheduler.max_thread_count to 1 for
spinning disks.

Maybe try also disabling merge throttling and see if that has an effect?  6
MB/sec seems slow...

Mike McCandless

http://blog.mikemccandless.com


On Mon, Aug 25, 2014 at 8:57 PM, Chris Decker ch...@chris-decker.com
wrote:

 Mike,

 Thanks for the response.

 I'm running ES 1.2.1.  It appears the issue that you reported / corrected
 was included with ES 1.2.0.

 *Any other ideas / suggestions?  *Were the settings that I posted sane?


 Thanks!,
 Chris

 On Monday, August 25, 2014 1:52:46 PM UTC-4, Michael McCandless wrote:

 Which version of ES are you using?  Versions before 1.2 have a bug that
 caused merge throttling to throttle far more than requested such that you
 couldn't get any faster than ~8 MB / sec.  See https://github.com/
 elasticsearch/elasticsearch/issues/6018

 Tiered merge policy is best.

 Mike McCandless

 http://blog.mikemccandless.com


 On Mon, Aug 25, 2014 at 1:08 PM, Chris Decker ch...@chris-decker.com
 wrote:

  All,

 I’m looking for advice on how to reduce the number of segments for my
 indices because in my use case (log analysis), quick searches are more
 important than real-time access to data.  I've turned many of the knobs
 available within ES, and read many blog postings, ES documentation, etc.,
 but still feel like there is room for improvement.

 Specific questions I have:
 1. How can I increase the current merge rate?  According to Elastic HQ,
 my merge rate is 6 MB/s.  I know I don't have
 SSDs, but with 15k drives it seems like I should be able to get better
 rates.  I tried increasing indices.store.throttle.max_bytes_per_sec
 from the default of 20mb to 40mb in my templates, but I didn't see a
 noticeable change in disk IOps or the merge rate the next day.  Did I do
 something incorrectly?  I'm going to experiment with setting it overall
 with index.store.throttle.max_bytes_per_sec and removing it from my
 templates.
 2. Should I move away from the default merge policy, or stick with the
 default (tiered)?

 Any advice you have is much appreciated; additional details on my
 situation are below.

 

 - I generate 2 indices per day - “high” and “low”.  I usually end up
 with ~ 450 segments for my ‘high’ index (see attached), and another ~ 200
 segments for my ‘low’ index, which I then optimize once I roll-over to the
 next day’s indices.
 - 4 ES servers (soon to be 8).
   — Each server has:
 12 Xeon cores running at 2.3 GHz
 15k drives
 128 GB of RAM
 68 GB used for OS / file system cache
 60 GB used by 2 JVMs
 - Index ~ 750 GB per day; 1.5 TB if you include the replicas
 - Relevant configs:
 TEMPLATE:
   "index.refresh_interval": "60s",
   "index.number_of_replicas": 1,
   "index.number_of_shards": 4,
   "index.merge.policy.max_merged_segment": "50g",
   "index.merge.policy.segments_per_tier": 5,
   "index.merge.policy.max_merge_at_once": "5",
   "indices.store.throttle.max_bytes_per_sec": "40mb"

 ELASTICSEARCH.YML:
 indices.memory.index_buffer_size: 30%



 Thanks in advance!,
 Chris

 --
 You received this message because you are subscribed to the Google
 Groups elasticsearch group.
 To unsubscribe from this group and stop receiving emails from it, send
 an email to elasticsearc...@googlegroups.com.

 To view this discussion on the web visit https://groups.google.com/d/
 msgid/elasticsearch/002cb4cc-fa2e-43c3-b2d3-29580742c91a%
 40googlegroups.com
 https://groups.google.com/d/msgid/elasticsearch/002cb4cc-fa2e-43c3-b2d3-29580742c91a%40googlegroups.com?utm_medium=emailutm_source=footer
 .
 For more options, visit https://groups.google.com/d/optout.


  --
 You received this message because you are subscribed to the Google Groups
 elasticsearch group.
 To unsubscribe from this group and stop receiving emails from it, send an
 email to elasticsearch+unsubscr...@googlegroups.com.
 To view this discussion on the web visit
 https://groups.google.com/d/msgid/elasticsearch/46ecc658-502f-46c7-b2b9-db9fd0e9f58f%40googlegroups.com
 https://groups.google.com/d/msgid/elasticsearch/46ecc658-502f-46c7-b2b9-db9fd0e9f58f%40googlegroups.com?utm_medium=emailutm_source=footer
 .

 For more options, visit https://groups.google.com/d/optout.



Re: Elasticsearch for logging. HOW to configure automatic creation of the new index every day?

2014-08-26 Thread David Kleiner
Hello Konstantin,

You can use an index value of name-%{+YYYY.MM.dd} in your elasticsearch 
output in logstash

(link: http://logstash.net/docs/1.4.2/outputs/elasticsearch#index)
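A minimal sketch of such an output block for logstash 1.4.x (the host and the index prefix are placeholders):

```
output {
  elasticsearch {
    host  => "localhost"
    index => "myapp-%{+YYYY.MM.dd}"
  }
}
```

Each day's first event then targets an index name that does not yet exist, and Elasticsearch's automatic index creation brings the new daily index into existence; no scheduled job is needed on the Elasticsearch side.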

HTH,

David

On Tuesday, August 26, 2014 10:01:39 AM UTC-7, Konstantin Erman wrote:

 Most of the guides I could find recommend creation of *one index per day* 
 when Elastic is used to store and query log files. Unfortunately not a 
 single guide dares to explain *HOW exactly shall I configure freshly 
 installed Elastic to create new index every day*. Could somebody please 
 help me with it?

 A few bits of additional info: I deal with Elastic on Windows Server (or 
 maybe on Azure, but not any Linux) and I plan to send log events to 
 Elastic using Serilog. Any advice for those special circumstances is 
 appreciated.

 Thank you!
 Konstantin




Re: indices.memory.index_buffer_size

2014-08-26 Thread Michael McCandless
See also https://github.com/elasticsearch/elasticsearch/pull/7440 (will be
in 1.4.0) which returns the actual RAM buffer size assigned to that shard
by the little dance.

Mike McCandless

http://blog.mikemccandless.com


On Tue, Aug 26, 2014 at 2:15 PM, Nikolas Everett nik9...@gmail.com wrote:

 I just looked at this code!

 It's a setting that you set globally at the cluster level.  It takes effect
 per node.  What that means is that every active shard on the node gets an
 equal share of that much space.  Active means has been written to in the
 past six minutes or so.  When a node first starts, all shards are assumed
 active, and those that are not updated at all lose active status after the
 timeout.  You can watch the little dance it does by setting
   index.engine.internal: DEBUG
 in logging.yml.
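 In elasticsearch.yml terms, that amounts to one line on each data node (the 30% here is just an example value):

```
# Set on every data node: each node reserves this fraction of its heap
# for indexing buffers and splits it evenly across its active shards.
indices.memory.index_buffer_size: 30%
```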

 Now - I'm not actually sure how important a setting it is.  I opened
 https://github.com/elasticsearch/elasticsearch/issues/7441 to suggest
 allowing better spreading of it around.  Mike'll probably close it if
 spreading it around wouldn't really help things much.

 Nik

 On Tue, Aug 26, 2014 at 2:07 PM, Yongtao You yongtao@gmail.com
 wrote:

 Thanks Mark.

 What confuses me are "global setting" (which suggests a cluster-wide
 setting) and "on a specific node" (which suggests a node-level setting). I
 could just try it out, but it's hard to tell if the setting worked or not.
 :(


 On Sunday, August 24, 2014 3:13:17 PM UTC-7, Mark Walkom wrote:

 http://www.elasticsearch.org/guide/en/elasticsearch/
 reference/current/modules-indices.html states "It is a global setting
 that bubbles down to all the different shards allocated on a specific node."

 Regards,
 Mark Walkom

 Infrastructure Engineer
 Campaign Monitor
 email: ma...@campaignmonitor.com
 web: www.campaignmonitor.com


 On 25 August 2014 03:12, Yongtao You yongt...@gmail.com wrote:

  Hi,

 Is the indices.memory.index_buffer_size configuration a cluster wide
 configuration or per node configuration? Do I need to set it on every node?
 Or just the master (eligible) node?

 Thanks.
 Yongtao








Re: Can't open file to read checksums

2014-08-26 Thread Ivan Brusic
A few questions:

What version of Elasticsearch are you using?
Are you using the Java client and is it the same version of the cluster?
Did you upgrade recently and was the index built with an older version of
Elasticsearch?

Elasticsearch recently added checksum verification (1.3?), so perhaps you
have some sort of version mismatch.

Cheers,

Ivan



On Mon, Aug 25, 2014 at 10:52 AM, Casper Thrane casper.s.thr...@gmail.com
wrote:

 Hi!

 We get the following errors, on two of our nodes. And after that our
 cluster doesn't work. I have no idea what it means.

 [2014-08-25 17:46:39,323][WARN ][indices.store]
 [p-elasticlog03] Can't open file to read checksums
 java.io.FileNotFoundException: No such file [_6cq_es090_0.doc]
 at
 org.elasticsearch.index.store.DistributorDirectory.getDirectory(DistributorDirectory.java:173)
 at
 org.elasticsearch.index.store.DistributorDirectory.getDirectory(DistributorDirectory.java:144)
 at
 org.elasticsearch.index.store.DistributorDirectory.openInput(DistributorDirectory.java:130)
 at
 org.elasticsearch.index.store.Store$MetadataSnapshot.checksumFromLuceneFile(Store.java:532)
 at
 org.elasticsearch.index.store.Store$MetadataSnapshot.buildMetadata(Store.java:459)
 at
 org.elasticsearch.index.store.Store$MetadataSnapshot.<init>(Store.java:433)
 at
 org.elasticsearch.index.store.Store.readMetadataSnapshot(Store.java:271)
 at
 org.elasticsearch.indices.store.TransportNodesListShardStoreMetaData.listStoreMetaData(TransportNodesListShardStoreMetaData.java:186)
 at
 org.elasticsearch.indices.store.TransportNodesListShardStoreMetaData.nodeOperation(TransportNodesListShardStoreMetaData.java:140)
 at
 org.elasticsearch.indices.store.TransportNodesListShardStoreMetaData.nodeOperation(TransportNodesListShardStoreMetaData.java:61)
 at
 org.elasticsearch.action.support.nodes.TransportNodesOperationAction$NodeTransportHandler.messageReceived(TransportNodesOperationAction.java:277)
 at
 org.elasticsearch.action.support.nodes.TransportNodesOperationAction$NodeTransportHandler.messageReceived(TransportNodesOperationAction.java:268)
 at
 org.elasticsearch.transport.netty.MessageChannelHandler$RequestHandler.run(MessageChannelHandler.java:275)
 at java.util.concurrent.ThreadPoolExecutor.runWorker(Unknown Source)
 at java.util.concurrent.ThreadPoolExecutor$Worker.run(Unknown Source)
 at java.lang.Thread.run(Unknown Source)

 Br
 Casper





elasticsearch processing pipeline capability?

2014-08-26 Thread Kevin B
Is there any facility in elasticsearch to help with sending terms to an 
external process after lucene processing (tokenization, filters, etc)? 
The idea here is having some external analysis / nlp code run against the 
documents while keeping all the pre-processing choices consistent and in 
one place (i.e. the analysis setup in the elasticsearch index configuration).

I am not very familiar with Lucene, but I believe possibly their update 
request processor is intended for scenarios like this needing a simple 
pipeline.
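One partial option that keeps the analysis choices in the index configuration is to have the external process call the _analyze API, so text is tokenized with exactly the analyzers defined on the index; a hedged sketch (the index name and field are placeholders):

```
curl -XGET 'localhost:9200/myindex/_analyze?field=fileContent' -d 'Some document text'
```

This returns the token stream the index's own analyzer would produce, which the external nlp code can then consume without duplicating the analyzer setup.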




Re: term_stats return sometime return meaningless number

2014-08-26 Thread youwei chen
Additional information:

taking the mean of a boolean field returns 4,607,182,418,800,017,408
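A possible clue: 4,607,182,418,800,017,408 is exactly the raw IEEE-754 bit pattern of the double 1.0, so values like this usually indicate that the stored bytes of a double are being reinterpreted as a long somewhere (for example, via a stale or mismatched field mapping). A quick Java check:

```java
public class BitPatternCheck {
    public static void main(String[] args) {
        // The "meaningless" value reported by terms_stats
        long reported = 4607182418800017408L;

        // Reinterpreting those bits as an IEEE-754 double yields 1.0
        System.out.println(Double.longBitsToDouble(reported)); // prints 1.0

        // Encoding 1.0 back gives the reported value
        System.out.println(Double.doubleToLongBits(1.0));      // prints 4607182418800017408
    }
}
```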

Thanks

On Tuesday, August 26, 2014 3:58:43 PM UTC-4, youwei chen wrote:

 Our elasticsearch instance sometimes returns meaningless numbers for 
 terms_stats; the query itself returns correct data.

 I am using Kibana as front end, this is generated query: 

 {facets:{terms:{terms_stats:{value_field:metric,key_field:host,size:10,order:count},facet_filter:{fquery:{query:{filtered:{query:{bool:{should:[{query_string:{query:(service:\.StorageProxy.RecentReadLatencyMicros\)
  
 AND (layer:\cassandra\) AND (@timestamp:[now-1m TO now]) AND 
 host:169.26.4.167}}]}},filter:{bool:{must:[{range:{@timestamp:{from:1408992976055,to:now}}},{terms:{host:[169.26.4.167]}},{terms:{host:[169.26.4.167]}}],size:0}

 max=4.6366831074216192E+18
 mean=1.5455610358072064E+18
 min=0
 term=169.26.4.167
 total=4.6366831074216192E+18

 the metric field would have some number between 0 and 100 while the term 
 stat report huge number.  If i delete index, it will show correct term 
 stats again.

 I tried refresh, close/open index, none seem to work except delete the 
 index and recreate it.  Have anyone face similar issue?

 Thanks.





Re: Marvel not showing nodes stats

2014-08-26 Thread Jeff Byrnes
I'm experiencing a similar issue to this. We have two clusters:

   - 2 node monitoring cluster (1 master/data & 1 just data)
   - 5 node production cluster (2 data, 3 masters)
   
The output below is from the non-master data node of the Marvel monitoring 
cluster. There are no errors being reported by any of the production nodes.

[2014-08-26 21:10:51,503][DEBUG][action.search.type   ] 
[stage-search-marvel-1c] [.marvel-2014.08.26][2], 
node[iGRH8Gc2QO698RMlWy8rgQ], [P], s[STARTED]: Failed to execute 
[org.elasticsearch.action.search.SearchRequest@355e93ff]
org.elasticsearch.transport.RemoteTransportException: 
[stage-search-marvel-1b][inet[/10.99.111.122:9300]][search/phase/query]
Caused by: org.elasticsearch.search.SearchParseException: 
[.marvel-2014.08.26][2]: query[ConstantScore(BooleanFilter(+*:* 
+cache(_type:index_stats) +cache(@timestamp:[140908680 TO 
140908746])))],from[-1],size[10]: Parse Failure [Failed to parse source 
[{size:10,query:{filtered:{query:{match_all:{}},filter:{bool:{must:[{match_all:{}},{term:{_type:index_stats}},{range:{@timestamp:{from:now-10m/m,to:now/m}}}],facets:{timestamp:{terms_stats:{key_field:index.raw,value_field:@timestamp,order:term,size:2000}},primaries.docs.count:{terms_stats:{key_field:index.raw,value_field:primaries.docs.count,order:term,size:2000}},primaries.indexing.index_total:{terms_stats:{key_field:index.raw,value_field:primaries.indexing.index_total,order:term,size:2000}},total.search.query_total:{terms_stats:{key_field:index.raw,value_field:total.search.query_total,order:term,size:2000}},total.merges.total_size_in_bytes:{terms_stats:{key_field:index.raw,value_field:total.merges.total_size_in_bytes,order:term,size:2000}},total.fielddata.memory_size_in_bytes:{terms_stats:{key_field:index.raw,value_field:total.fielddata.memory_size_in_bytes,order:term,size:2000]]
at 
org.elasticsearch.search.SearchService.parseSource(SearchService.java:664)
at 
org.elasticsearch.search.SearchService.createContext(SearchService.java:515)
at 
org.elasticsearch.search.SearchService.createAndPutContext(SearchService.java:487)
at 
org.elasticsearch.search.SearchService.executeQueryPhase(SearchService.java:256)
at 
org.elasticsearch.search.action.SearchServiceTransportAction$SearchQueryTransportHandler.messageReceived(SearchServiceTransportAction.java:688)
at 
org.elasticsearch.search.action.SearchServiceTransportAction$SearchQueryTransportHandler.messageReceived(SearchServiceTransportAction.java:677)
at 
org.elasticsearch.transport.netty.MessageChannelHandler$RequestHandler.run(MessageChannelHandler.java:275)
at 
java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
at 
java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
at java.lang.Thread.run(Thread.java:745)
Caused by: org.elasticsearch.search.facet.FacetPhaseExecutionException: 
Facet [timestamp]: failed to find mapping for index.raw
at 
org.elasticsearch.search.facet.termsstats.TermsStatsFacetParser.parse(TermsStatsFacetParser.java:126)
at 
org.elasticsearch.search.facet.FacetParseElement.parse(FacetParseElement.java:93)
at 
org.elasticsearch.search.SearchService.parseSource(SearchService.java:648)
... 9 more
[2014-08-26 21:10:51,503][DEBUG][action.search.type   ] 
[stage-search-marvel-1c] [.marvel-2014.08.26][2], 
node[iGRH8Gc2QO698RMlWy8rgQ], [P], s[STARTED]: Failed to execute 
[org.elasticsearch.action.search.SearchRequest@32f235e9]
org.elasticsearch.transport.RemoteTransportException: 
[stage-search-marvel-1b][inet[/10.99.111.122:9300]][search/phase/query]
Caused by: org.elasticsearch.search.SearchParseException: 
[.marvel-2014.08.26][2]: query[ConstantScore(BooleanFilter(+*:* 
+cache(_type:node_stats) +cache(@timestamp:[140908680 TO 
140908746])))],from[-1],size[10]: Parse Failure [Failed to parse source 
[{size:10,query:{filtered:{query:{match_all:{}},filter:{bool:{must:[{match_all:{}},{term:{_type:node_stats}},{range:{@timestamp:{from:now-10m/m,to:now/m}}}],facets:{timestamp:{terms_stats:{key_field:node.ip_port.raw,value_field:@timestamp,order:term,size:2000}},master_nodes:{terms:{field:node.ip_port.raw,size:2000},facet_filter:{term:{node.master:true}}},os.cpu.usage:{terms_stats:{key_field:node.ip_port.raw,value_field:os.cpu.usage,order:term,size:2000}},os.load_average.1m:{terms_stats:{key_field:node.ip_port.raw,value_field:os.load_average.1m,order:term,size:2000}},jvm.mem.heap_used_percent:{terms_stats:{key_field:node.ip_port.raw,value_field:jvm.mem.heap_used_percent,order:term,size:2000}},fs.total.available_in_bytes:{terms_stats:{key_field:node.ip_port.raw,value_field:fs.total.available_in_bytes,order:term,size:2000}},fs.total.disk_io_op:{terms_stats:{key_field:node.ip_port.raw,value_field:fs.total.disk_io_op,order:term,size:2000]]
at 
org.elasticsearch.search.SearchService.parseSource(SearchService.java:664)
at 

Re: Parent/Child query performance in version 1.1.2

2014-08-26 Thread Mark Greene
Just wanted to close the loop on this in case anyone stumbled upon the same 
issue.

After upgrading to version 1.3.2 which had the performance increase 
stemming from https://github.com/elasticsearch/elasticsearch/pull/5846, we 
were able to see a dramatic decrease in parent/child query latency. We're 
executing queries under 150ms which is manageable for now and will be 
eagerly awaiting further improvements from the work Clinton highlighted 
here: https://github.com/elasticsearch/elasticsearch/issues/7394.

Along the way in our testing we got a little confused as we attempted to do 
our troubleshooting on 1 data node in order to keep things simple, this 
manifested in some misplaced assumptions around the performance increases 
that came from work released in 1.2.0. In our testing on a single node, we 
did _not_ observe a latency decrease at all when going from 1.1.2 to 1.3.2. 
However, when we changed our test cluster to use two data nodes, we saw a 
huge improvement. So my earlier assertion around not seeing those 
improvements in version 1.3.2 was incorrect although I'm still confused as 
to why a single node configuration was not benefiting.

In any case, wanted to thank the ES developers for being generous with 
their time helping us track this issue down. Now that I realize the 
incredible pace in which ES versions are released, we'll be much more 
vigilant about keeping up.

Thanks again!


On Monday, August 25, 2014 11:32:38 AM UTC-4, Mark Greene wrote:

 Hey Clinton,

 Thanks for the heads up on what's on the horizon. That definitely sounds 
 like a drastic improvement. That being said, my fear here is that even with 
 that improvement, this data model (parent/child) doesn't seem to be that 
 performant with a moderate number of documents. In order for us to really 
 adopt this methodology of using parent/child, we'd expect to see sub-100ms 
 performance so long as we were feeding ES enough RAM. 

 My hunch here is there must be some code path that is hit when running on 
 more than 1 data node that either doesn't write to the cache or skips it on 
 the read and hits the disk. We don't have a ton of load on our data nodes, 
 CPU is well under 30% and IOWait is usually under 0.30.

 Just to reiterate: when we run the parent/child query on one data node, it 
 runs in less than 100ms; when it runs across two data nodes, it's 10s. 
 This is being experienced on versions 1.1.2 and 1.3.2.

 On Monday, August 25, 2014 10:55:15 AM UTC-4, Clinton Gormley wrote:

 Something else to note: parent-child now uses global ordinals to make 
 queries 3x faster than they were previously, but global ordinals need to be 
 rebuilt after the index has refreshed (assuming some data has changed).

 Currently there is no way to refresh p/c global ordinals eagerly (ie 
 during the refresh phase) and so it happens on the first query after a 
 refresh.  1.3.3 and 1.4.0 will include an option to allow eager building of 
 global ordinals which should remove this latency spike: 
 https://github.com/elasticsearch/elasticsearch/issues/7394

 You may want to consider increasing the refresh_interval so that global 
 ordinals remain valid for longer.
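 For example, a sketch of raising it via the update-settings API (the index name and interval are placeholders):

```
PUT /myindex/_settings
{
  "index" : { "refresh_interval" : "30s" }
}
```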


 On 25 August 2014 16:48, Mark Greene ma...@evertrue.com wrote:

 Hi Adrien,

 Thanks for reaching out.

 We actually were excited to see the performance improvements stated in 
 the 1.2.0 release notes, so we upgraded to 1.3.2. We saw some performance 
 improvement, but it wasn't orders of magnitude and queries are still running 
 very slow.

 We also tried your suggestion of using the 'preference=_local' query 
 param but we didn't see any difference there. Additionally, running the 
 query 10 times, we saw no improvement in speed.

 Currently, the only major performance increase we've seen with 
 parent/child queries is dropping down to 1 data node, at which point we see 
 queries executing well under the 100ms mark.




 On Friday, August 22, 2014 6:42:27 PM UTC-4, Adrien Grand wrote:

 Hi Mark,

 Given that you had 1 replica in your first setup, it could take several 
 queries to warm up the field data cache completely, does the query still 
 take 16 seconds to run if you run it eg. 10 times? (3 should be enough, 
 but 
 just to be sure)

 Does it change anything if you query elasticsearch with 
 preference=_local? This should be equivalent to your single-node setup, so 
 it would be interesting to see if that changes something.

 As a side note, you might want to try out a more recent version of 
 Elasticsearch since parent/child performance improved quite significantly 
 in 1.2.0 because of https://github.com/elasticsearch/elasticsearch/
 pull/5846



 On Fri, Aug 22, 2014 at 11:15 PM, Mark Greene ma...@evertrue.com 
 wrote:

 I wanted to update the list with an interesting piece of information. 
 We found that when we took one of our two data nodes out of the cluster, 
 leaving just one data node with no replicas, the query performance 
 increased dramatically. The queries are now 

Re: elasticsearch processing pipeline capability?

2014-08-26 Thread joergpra...@gmail.com
If you want to retrieve the term list of an index after Lucene processing
via REST HTTP API, you can try

https://github.com/jprante/elasticsearch-index-termlist

Jörg


On Tue, Aug 26, 2014 at 10:41 PM, Kevin B blaisde...@gmail.com wrote:

 Is there any facility in elasticsearch to help with sending terms to an
 external processes after lucene processing (tokenization, filters, etc)?
  The idea here is having some external analysis / nlp code run against the
 documents while keeping all the pre-processing choices consistent and in
 one place (i.e. the analysis setup in elasticsearch index configuration).

 I am not very familiar with Lucene, but I believe possibly their update
 request processor is intended for scenarios like this needing a simple
 pipeline.






How do I start elasticsearch as a service?

2014-08-26 Thread Eric Greene
Forgive me, I'm a little lost.

I am working on deploying elasticsearch on an AWS server.  Previously in 
development I have started elasticsearch using ./bin/elasticsearch 
-Des.config=/etc/elasticsearch/elasticsearch.yml

But in live deployment, I want to keep elasticsearch running as a service...

I have 1.2.0 installed on Ubuntu 12.04 on my AWS instance.

I run sudo /etc/init.d/elasticsearch start and I get:
* Starting Elasticsearch server

I check sudo /etc/init.d/elasticsearch status and I get:
* elasticsearch is not running

I'm not sure how to troubleshoot.  Any advice or suggestions?  Thanks



Re: How do I start elasticsearch as a service?

2014-08-26 Thread Mark Walkom
Check the logs under /var/log/elasticsearch, they should have something.

Also please be aware that 1.2.0 has a critical bug and you should be using
1.2.1 instead.

Regards,
Mark Walkom

Infrastructure Engineer
Campaign Monitor
email: ma...@campaignmonitor.com
web: www.campaignmonitor.com


On 27 August 2014 08:42, Eric Greene ericdgre...@gmail.com wrote:

 Forgive me I'm a little lost.

 I am working on deploying elasticsearch on a AWS server.  Previously in
 development I have started elasticsearch using ./bin/elasticsearch
 -Des.config=/etc/elasticsearch/elasticsearch.yml

 But in live deployment, I want to keep elasticsearch running as a
 service...

 I have 1.2.0 installed on Ubuntu 12.04 on my AWS instance.

 I run sudo /etc/init.d/elasticsearch start and I get:
 * Starting Elasticsearch server

 I check sudo /etc/init.d/elasticsearch status and I get:
 * elasticsearch is not running

 I'm not sure how to troubleshoot.  Any advice or suggestions?  Thanks





Re: Elastic HQ not getting back vendor info from Elasticsearch.

2014-08-26 Thread Mark Walkom
ElasticHQ is a community plugin, the ES devs can't help here.

I have raised issues against ElasticHQ in the past and Roy has fixed them
pretty quickly :)

Regards,
Mark Walkom

Infrastructure Engineer
Campaign Monitor
email: ma...@campaignmonitor.com
web: www.campaignmonitor.com


On 27 August 2014 04:44, John Smith java.dev@gmail.com wrote:

 I posted an issue with Elastic HQ here:
 https://github.com/royrusso/elasticsearch-HQ/issues/164

 But just in case maybe an Elastic dev can have a look and see if it's
 Elasticsearch issue or not.

 Thanks





Aggregation query works with search, but not with msearch

2014-08-26 Thread Dhruv Garg
I am trying to troubleshoot the following observation:

Following code works as expected: 

Elasticsearch::Model.client.search search_type: 'count', index: 
target_indices, body: query

Response:

{"took"=>2, "timed_out"=>false, "_shards"=>{"total"=>2, "successful"=>2, 
"failed"=>0}, "hits"=>{"total"=>6, "max_score"=>0.0, "hits"=>[]}, 
"aggregations"=>{"recent"=>{"doc_count"=>3, 
"searches"=>{"buckets"=>[{"key"=>"user-1", "doc_count"=>3}]

However, when using the above in an msearch, the response is not useful:

Elasticsearch::Model.client.msearch body: [{ search_type: 'count', index: 
target_indices, search: query }]

Response:

{"responses"=>[{"took"=>0, "timed_out"=>false, "_shards"=>{"total"=>2, 
"successful"=>2, "failed"=>0}, "hits"=>{"total"=>6, "max_score"=>0.0, 
"hits"=>[]}}]}

---

What am I missing?



alerting in Marvel

2014-08-26 Thread kti_sk
Hi,
I started using Marvel for my cluster monitoring. 
Does Marvel have a way to set notification such as send me email if cpu 
load is over 80%?

Thanks



Getting different results while using bool query vs bool query with function score query

2014-08-26 Thread Akshay Shukla
I am trying to add a custom boost to the different should clauses in a 
bool query, but I get a different number of results when I use a bool 
query whose 2 should clauses contain 2 simple_query_string queries vs. a 
bool query whose 2 should clauses wrap the same simple_query_string 
queries in function_score queries.
The following query returns me 2 results for my data set: 
{
  "query" : {
    "filtered" : {
      "query" : {
        "bool" : {
          "should" : [ {
            "simple_query_string" : {
              "query" : "128",
              "fields" : [ "content.name_enu.simple" ]
            }
          }, {
            "simple_query_string" : {
              "query" : "128",
              "fields" : [ "content.name_enu.simple_with_numeric" ]
            }
          } ]
        }
      },
      "filter" : {
        "bool" : {
          "must" : [ {
            "term" : {
              "securityInfo.securityType" : "open"
            }
          }, {
            "bool" : {
              "must" : [ {
                "term" : {
                  "sourceId.sourceSystem" : "jmeter_007971_numeric"
                }
              }, {
                "term" : {
                  "sourceId.type" : "file"
                }
              } ]
            }
          } ],
          "_cache" : true
        }
      }
    }
  },
  "fields" : [ "elementId", "sourceId.id", "sourceId.type",
               "sourceId.sourceSystem", "sourceVersion", "content.name_enu" ]
}



Whereas if I use the following query, with the same simple query strings
wrapped in function scores, I get 5 results:
{
  "query" : {
    "filtered" : {
      "query" : {
        "bool" : {
          "should" : [ {
            "function_score" : {
              "query" : {
                "simple_query_string" : {
                  "query" : "128",
                  "fields" : [ "content.name_enu.simple" ]
                }
              },
              "boost_factor" : 1.5
            }
          }, {
            "function_score" : {
              "query" : {
                "simple_query_string" : {
                  "query" : "128",
                  "fields" : [ "content.name_enu.simple_with_numeric" ]
                }
              },
              "boost_factor" : 2.5
            }
          } ]
        }
      },
      "filter" : {
        "bool" : {
          "must" : [ {
            "term" : { "securityInfo.securityType" : "open" }
          }, {
            "bool" : {
              "must" : [ {
                "term" : { "sourceId.sourceSystem" : "jmeter_007971_numeric" }
              }, {
                "term" : { "sourceId.type" : "file" }
              } ]
            }
          } ],
          "_cache" : true
        }
      }
    }
  },
  "fields" : [ "elementId", "sourceId.id", "sourceId.type",
               "sourceId.sourceSystem", "sourceVersion", "content.name_enu" ]
}



From my understanding of how the should clause works, I was expecting both
queries to return 5 results, but I cannot understand why the first query
returns only 2 results for my data set. The content.name_enu.simple subfield
uses the simple analyzer, whereas simple_with_numeric uses a whitespace
tokenizer and a lowercase filter.
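Not a definitive diagnosis, but one thing worth ruling out: the standard simple analyzer is built on a letter tokenizer, which discards digits, so a purely numeric query term such as 128 may produce no tokens at all against the .simple subfield. A rough Python emulation of the two analysis chains (the regex and split are approximations, not the actual Lucene tokenizers):

```python
import re

def simple_analyzer(text):
    """Rough emulation of the 'simple' analyzer:
    letter tokenizer (splits on non-letters, dropping digits) + lowercase."""
    return [t.lower() for t in re.findall(r"[^\W\d_]+", text)]

def whitespace_lowercase(text):
    """Rough emulation of a whitespace tokenizer + lowercase filter."""
    return [t.lower() for t in text.split()]

# A numeric term yields no tokens under the simple analyzer,
# but survives the whitespace chain:
print(simple_analyzer("128"))              # []
print(whitespace_lowercase("128"))         # ['128']
print(simple_analyzer("Effective Java 2nd"))  # ['effective', 'java', 'nd']
```

Running the query string through the _analyze API against both subfields would confirm whether this is what is happening on the real index.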



Re: alerting in Marvel

2014-08-26 Thread Mark Walkom
Nope.

Regards,
Mark Walkom

Infrastructure Engineer
Campaign Monitor
email: ma...@campaignmonitor.com
web: www.campaignmonitor.com


On 27 August 2014 09:14, kti...@hotmail.com wrote:

 Hi,
 I started using Marvel for my cluster monitoring.
 Does Marvel have a way to set notification such as send me email if cpu
 load is over 80%?

 Thanks

 --
 You received this message because you are subscribed to the Google Groups
 elasticsearch group.
 To unsubscribe from this group and stop receiving emails from it, send an
 email to elasticsearch+unsubscr...@googlegroups.com.
 To view this discussion on the web visit
 https://groups.google.com/d/msgid/elasticsearch/45fb7614-5b65-425c-b858-e4dce4bee4d3%40googlegroups.com
 https://groups.google.com/d/msgid/elasticsearch/45fb7614-5b65-425c-b858-e4dce4bee4d3%40googlegroups.com?utm_medium=emailutm_source=footer
 .
 For more options, visit https://groups.google.com/d/optout.




Re: gateway.recover_after_nodes minimum_master_nodes in a distributed environment?

2014-08-26 Thread Mark Walkom
Only master-eligible nodes count for discovery.zen.minimum_master_nodes, so in
your case it is 1. That's bad, as you can end up with a split-brain
situation. You should, if you can, make all three nodes master eligible.

gateway.recover_after_nodes is all nodes, as per
http://www.elasticsearch.org/guide/en/elasticsearch/reference/current/modules-gateway.html#recover-after
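For concreteness, the quorum arithmetic can be sketched as follows (a minimal illustration; the node counts are taken from the question above):

```python
def minimum_master_nodes(master_eligible_count):
    """Quorum formula from the ES docs: (master-eligible nodes / 2) + 1.
    Counts only master-eligible nodes, not data-only or client nodes."""
    return master_eligible_count // 2 + 1

# The cluster described above has 1 master-eligible node -> quorum of 1,
# which is the split-brain-prone setting:
print(minimum_master_nodes(1))  # 1
# With all three nodes master eligible, a safe quorum of 2:
print(minimum_master_nodes(3))  # 2
```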

Regards,
Mark Walkom

Infrastructure Engineer
Campaign Monitor
email: ma...@campaignmonitor.com
web: www.campaignmonitor.com


On 26 August 2014 23:37, Chris Neal chris.n...@derbysoft.net wrote:

 Hello all,

 Question
 about gateway.recover_after_nodes and discovery.zen.minimum_master_nodes in
 a distributed ES cluster.  By distributed I mean I have:

 2 nodes that are data only:
 'node.data' = 'true',
 'node.master' = 'false',
 'http.enabled' = 'false',

 1 node that is a master/search only node:
 'node.master' = 'true',
 'node.data' = 'false',
 'http.enabled' = 'true',

 When setting discovery.zen.minimum_master_nodes, is the (n / 2) + 1
 formula including *all* nodes of all types in the cluster, or just
 those who can be masters?

 Similarly, when setting gateway.recover_after_nodes, is this value the
 number of all nodes of all types in the cluster, or just those that are
 data nodes?

 Thank you very much for your time!
 Chris

 --
 You received this message because you are subscribed to the Google Groups
 elasticsearch group.
 To unsubscribe from this group and stop receiving emails from it, send an
 email to elasticsearch+unsubscr...@googlegroups.com.
 To view this discussion on the web visit
 https://groups.google.com/d/msgid/elasticsearch/CAND3DphDg41GUrj-YLfU7W0_L6veTXMJjJPJ7Wfu6V9VsvdKHw%40mail.gmail.com
 https://groups.google.com/d/msgid/elasticsearch/CAND3DphDg41GUrj-YLfU7W0_L6veTXMJjJPJ7Wfu6V9VsvdKHw%40mail.gmail.com?utm_medium=emailutm_source=footer
 .
 For more options, visit https://groups.google.com/d/optout.




Re: How do I start elasticsearch as a service?

2014-08-26 Thread Eric Greene
Thanks Mark, I found that if I comment out the line in elasticsearch.yml 
that sets the data path, it works.

I will upgrade as you have suggested, thanks for that.


On Tuesday, August 26, 2014 4:04:05 PM UTC-7, Mark Walkom wrote:

 Check the logs under /var/log/elasticsearch, they should have something.

 Also please be aware that 1.2.0 has a critical bug and you should be using 
 1.2.1 instead.

 Regards,
 Mark Walkom

 Infrastructure Engineer
 Campaign Monitor
 email: ma...@campaignmonitor.com javascript:
 web: www.campaignmonitor.com


 On 27 August 2014 08:42, Eric Greene ericd...@gmail.com javascript: 
 wrote:

 Forgive me I'm a little lost.

 I am working on deploying elasticsearch on a AWS server.  Previously in 
 development I have started elasticsearch using ./bin/elasticsearch 
 -Des.config=/etc/elasticsearch/elasticsearch.yml

 But in live deployment, I want to keep elasticsearch running as a 
 service...

 I have 1.2.0 installed on Ubuntu 12.04 on my AWS instance.

 I run sudo /etc/init.d/elasticsearch start and I get:
 * Starting Elasticsearch server

 I check sudo /etc/init.d/elasticsearch status and I get:
 * elasticsearch is not running

 I'm not sure how to troubleshoot.  Any advice or suggestions?  Thanks

  -- 
 You received this message because you are subscribed to the Google Groups 
 elasticsearch group.
 To unsubscribe from this group and stop receiving emails from it, send an 
 email to elasticsearc...@googlegroups.com javascript:.
 To view this discussion on the web visit 
 https://groups.google.com/d/msgid/elasticsearch/1c191994-ba5c-495d-b5e8-4e0bed3c4845%40googlegroups.com
  
 https://groups.google.com/d/msgid/elasticsearch/1c191994-ba5c-495d-b5e8-4e0bed3c4845%40googlegroups.com?utm_medium=emailutm_source=footer
 .
 For more options, visit https://groups.google.com/d/optout.






Re: Data per node in ES

2014-08-26 Thread Mark Walkom
Depends.
How much disk do you have? RAM? CPU? Java version and release? ES version?
What's your query load like? Are you doing lots of aggregates or facets?

The best way to know is to start using ELK on a platform indicative of
your intended server size, see how much data a single node can handle, and
then extrapolate.

Regards,
Mark Walkom

Infrastructure Engineer
Campaign Monitor
email: ma...@campaignmonitor.com
web: www.campaignmonitor.com


On 27 August 2014 06:24, Gaurav Tiwari gtins...@gmail.com wrote:

 Hi ,

 We are analyzing ES for storing our log data (~ 400 GB/Day) and will be
 integrating Logstash and ES.  What is the maximum amount of data that can
 be stored on one node of ES ?

 Regards,
 Gaurav

 --
 You received this message because you are subscribed to the Google Groups
 elasticsearch group.
 To unsubscribe from this group and stop receiving emails from it, send an
 email to elasticsearch+unsubscr...@googlegroups.com.
 To view this discussion on the web visit
 https://groups.google.com/d/msgid/elasticsearch/c1b5123f-51a7-41b4-9915-d4ea705d23de%40googlegroups.com
 https://groups.google.com/d/msgid/elasticsearch/c1b5123f-51a7-41b4-9915-d4ea705d23de%40googlegroups.com?utm_medium=emailutm_source=footer
 .
 For more options, visit https://groups.google.com/d/optout.




Re: alerting in Marvel

2014-08-26 Thread Mark Walkom
Also, you should really be monitoring your systems and core measurements
(disk, CPU etc) with something specific for the job.

Regards,
Mark Walkom

Infrastructure Engineer
Campaign Monitor
email: ma...@campaignmonitor.com
web: www.campaignmonitor.com


On 27 August 2014 09:16, Mark Walkom ma...@campaignmonitor.com wrote:

 Nope.

 Regards,
 Mark Walkom

 Infrastructure Engineer
 Campaign Monitor
 email: ma...@campaignmonitor.com
 web: www.campaignmonitor.com


 On 27 August 2014 09:14, kti...@hotmail.com wrote:

 Hi,
 I started using Marvel for my cluster monitoring.
 Does Marvel have a way to set notification such as send me email if cpu
 load is over 80%?

 Thanks

 --
 You received this message because you are subscribed to the Google Groups
 elasticsearch group.
 To unsubscribe from this group and stop receiving emails from it, send an
 email to elasticsearch+unsubscr...@googlegroups.com.
 To view this discussion on the web visit
 https://groups.google.com/d/msgid/elasticsearch/45fb7614-5b65-425c-b858-e4dce4bee4d3%40googlegroups.com
 https://groups.google.com/d/msgid/elasticsearch/45fb7614-5b65-425c-b858-e4dce4bee4d3%40googlegroups.com?utm_medium=emailutm_source=footer
 .
 For more options, visit https://groups.google.com/d/optout.






Re: alerting in Marvel

2014-08-26 Thread kti_sk
Hi,
My goal was to figure out whether I need to scale out if there is a sudden
spike in the load.
Can you be more specific about something specific for the job?


On Tuesday, August 26, 2014 4:32:32 PM UTC-7, Mark Walkom wrote:

 Also, you should really be monitoring your systems and core measurements 
 (disk, CPU etc) with something specific for the job.

 Regards,
 Mark Walkom

 Infrastructure Engineer
 Campaign Monitor
 email: ma...@campaignmonitor.com javascript:
 web: www.campaignmonitor.com
  

 On 27 August 2014 09:16, Mark Walkom ma...@campaignmonitor.com 
 javascript: wrote:

 Nope.

 Regards,
 Mark Walkom

 Infrastructure Engineer
 Campaign Monitor
 email: ma...@campaignmonitor.com javascript:
 web: www.campaignmonitor.com


 On 27 August 2014 09:14, kti...@hotmail.com javascript: wrote:

 Hi,
 I started using Marvel for my cluster monitoring. 
 Does Marvel have a way to set notification such as send me email if cpu 
 load is over 80%?

 Thanks

 -- 
 You received this message because you are subscribed to the Google 
 Groups elasticsearch group.
 To unsubscribe from this group and stop receiving emails from it, send 
 an email to elasticsearc...@googlegroups.com javascript:.
 To view this discussion on the web visit 
 https://groups.google.com/d/msgid/elasticsearch/45fb7614-5b65-425c-b858-e4dce4bee4d3%40googlegroups.com
  
 https://groups.google.com/d/msgid/elasticsearch/45fb7614-5b65-425c-b858-e4dce4bee4d3%40googlegroups.com?utm_medium=emailutm_source=footer
 .
 For more options, visit https://groups.google.com/d/optout.







Re: Micro Analysis in Kibana

2014-08-26 Thread Mungeol Heo
The question is how the micro analysis panel in Kibana can do this without
setting 'not_analyzed' on the fields.

On Saturday, April 5, 2014 4:55:20 AM UTC+9, Binh Ly wrote:

 You'll need to set the field name to not_analyzed so that you can get a 
 distinct value for the whole field (instead of tokenized values):

 {
   mappings: {
 doc: {
   properties: {
 name: {
   type: string,
   index: not_analyzed
 }
   }
 }
   }
 }

 After that, you can do a terms facet on name and you'll get the count that 
 you want.





RE: alerting in Marvel

2014-08-26 Thread KimTaein
OK, so those are for monitoring the system running Elasticsearch.
However, if I want to be notified of ES-specific data points, such as its JVM
memory %, there doesn't seem to be a solution.
 
Thanks
 
From: ma...@campaignmonitor.com
Date: Wed, 27 Aug 2014 10:05:05 +1000
Subject: Re: alerting in Marvel
To: elasticsearch@googlegroups.com

Nagios, Zabbix, PRTG, Observium, or anything cloud hosted.

Regards,
Mark Walkom

Infrastructure Engineer
Campaign Monitor
email: ma...@campaignmonitor.com
web: www.campaignmonitor.com





Re: gateway.recover_after_nodes minimum_master_nodes in a distributed environment?

2014-08-26 Thread Chris Neal
Thank you Mark.  Makes perfect sense.

Chris


On Tue, Aug 26, 2014 at 6:25 PM, Mark Walkom ma...@campaignmonitor.com
wrote:

 Only master eligible for discovery.zen.minimum_master_nodes, so in your
 case it is 1. And that's bad as you can end up with a split brain
 situation. You should, if you can, make all three nodes master eligible.

 gateway.recover_after_nodes is all nodes, as per
 http://www.elasticsearch.org/guide/en/elasticsearch/reference/current/modules-gateway.html#recover-after

 Regards,
 Mark Walkom

 Infrastructure Engineer
 Campaign Monitor
 email: ma...@campaignmonitor.com
 web: www.campaignmonitor.com


 On 26 August 2014 23:37, Chris Neal chris.n...@derbysoft.net wrote:

 Hello all,

 Question
 about gateway.recover_after_nodes and discovery.zen.minimum_master_nodes in
 a distributed ES cluster.  By distributed I mean I have:

 2 nodes that are data only:
 'node.data' = 'true',
 'node.master' = 'false',
 'http.enabled' = 'false',

 1 node that is a master/search only node:
 'node.master' = 'true',
 'node.data' = 'false',
 'http.enabled' = 'true',

 When setting discovery.zen.minimum_master_nodes, is the (n / 2) + 1
 formula including *all* nodes of all types in the cluster, or just
 those who can be masters?

 Similarly, when setting gateway.recover_after_nodes, is this value the
 number of all nodes of all types in the cluster, or just those that are
 data nodes?

 Thank you very much for your time!
 Chris

 --
 You received this message because you are subscribed to the Google Groups
 elasticsearch group.
 To unsubscribe from this group and stop receiving emails from it, send an
 email to elasticsearch+unsubscr...@googlegroups.com.
 To view this discussion on the web visit
 https://groups.google.com/d/msgid/elasticsearch/CAND3DphDg41GUrj-YLfU7W0_L6veTXMJjJPJ7Wfu6V9VsvdKHw%40mail.gmail.com
 https://groups.google.com/d/msgid/elasticsearch/CAND3DphDg41GUrj-YLfU7W0_L6veTXMJjJPJ7Wfu6V9VsvdKHw%40mail.gmail.com?utm_medium=emailutm_source=footer
 .
 For more options, visit https://groups.google.com/d/optout.


  --
 You received this message because you are subscribed to the Google Groups
 elasticsearch group.
 To unsubscribe from this group and stop receiving emails from it, send an
 email to elasticsearch+unsubscr...@googlegroups.com.
 To view this discussion on the web visit
 https://groups.google.com/d/msgid/elasticsearch/CAEM624biBGL9O7zaT%3Dfm3%2BfNRCMoDrQDR_CRCV%3DhM9FZCAqOpw%40mail.gmail.com
 https://groups.google.com/d/msgid/elasticsearch/CAEM624biBGL9O7zaT%3Dfm3%2BfNRCMoDrQDR_CRCV%3DhM9FZCAqOpw%40mail.gmail.com?utm_medium=emailutm_source=footer
 .
 For more options, visit https://groups.google.com/d/optout.




Re: Is it possible to register a RestFilter without creating a plugin?

2014-08-26 Thread Jinyuan Zhou
Thanks Vineeth,
But I guess it would not change anything about the REST API if Elasticsearch
offered some way, easier than building a plugin, to register RestFilters
around REST API calls. Many frameworks provide a configuration-based approach
to register pre/post processors around services. I hope ES provides this kind
of mechanism, but my first impression is that it does not have such support at
this time.
Regards,
Jack

Jinyuan (Jack) Zhou


On Tue, Aug 26, 2014 at 5:57 AM, vineeth mohan vm.vineethmo...@gmail.com
wrote:

 Hello Jinyuan ,

 I don't feel this is possible.
 In such a provision, how would you define what the REST API will do?

 Thanks
Vineeth


 On Tue, Aug 26, 2014 at 2:41 AM, Jinyuan Zhou zhou.jiny...@gmail.com
 wrote:

 Thanks,

 --
 You received this message because you are subscribed to the Google Groups
 elasticsearch group.
 To unsubscribe from this group and stop receiving emails from it, send an
 email to elasticsearch+unsubscr...@googlegroups.com.
 To view this discussion on the web visit
 https://groups.google.com/d/msgid/elasticsearch/41dab07d-b7f1-4622-8c77-a9d56b19abed%40googlegroups.com
 https://groups.google.com/d/msgid/elasticsearch/41dab07d-b7f1-4622-8c77-a9d56b19abed%40googlegroups.com?utm_medium=emailutm_source=footer
 .
 For more options, visit https://groups.google.com/d/optout.


  --
 You received this message because you are subscribed to a topic in the
 Google Groups elasticsearch group.
 To unsubscribe from this topic, visit
 https://groups.google.com/d/topic/elasticsearch/g_veXqDhQP4/unsubscribe.
 To unsubscribe from this group and all its topics, send an email to
 elasticsearch+unsubscr...@googlegroups.com.
 To view this discussion on the web visit
 https://groups.google.com/d/msgid/elasticsearch/CAGdPd5kEkN%3DsgLqmkT-vwQ%2BptCs0LmPa0BDw5hX3A5Yzg-Wx_A%40mail.gmail.com
 https://groups.google.com/d/msgid/elasticsearch/CAGdPd5kEkN%3DsgLqmkT-vwQ%2BptCs0LmPa0BDw5hX3A5Yzg-Wx_A%40mail.gmail.com?utm_medium=emailutm_source=footer
 .
 For more options, visit https://groups.google.com/d/optout.




Re: Function Query with an aggregation function of nested field

2014-08-26 Thread Srinivasan Ramaswamy
Any thoughts, anyone? I am primarily looking for an answer to my second
question.

On Tuesday, August 26, 2014 10:14:37 AM UTC-7, Srinivasan Ramaswamy wrote:

 I have documents with the following schema:


 {
   "authorId" : 10,
   "authorName" : "Joshua Bloch",
   "books" : [
     {
       "bookId" : 101,
       "bookName" : "Effective Java",
       "description" : "effective java book with useful recommendations",
       "Category" : 1,
       "sales" : [
         { "keyword" : "effective java", "count" : 200 },
         { "keyword" : "java tips", "count" : 100 },
         { "keyword" : "java joshua bloch", "count" : 50 }
       ],
       "createDate" : "08-25-2014"
     },
     {
       "bookId" : 102,
       "bookName" : "Java Puzzlers",
       "description" : "Java Puzzlers: Traps, Pitfalls, and Corner Cases",
       "Category" : 2,
       "sales" : [
         { "keyword" : "java puzzlers", "count" : 100 },
         { "keyword" : "joshua bloch puzzler", "count" : 50 }
       ]
     }
   ]
 }

 The sales information is stored with each book, along with the search query
 that led to those sales. If the user applied a category filter, I would
 like to count only books that belong to that category.

 I would like to sort the list of authors returned based on a function of
 sales data and text match. For example, if the search query is java, I would
 like to return the above document and all other author documents that
 contain the term java. I came up with the following query:

 {
   "query" : {
     "function_score" : {
       "boost_mode" : "replace",
       "query" : {
         "match" : { "bookName" : "java" }
       },
       "script_score" : {
         "params" : {
           "param1" : 2
         },
         "script" : "doc['books.sales.count'].isEmpty() ? _score : _score * doc['books.sales.count'].value * param1"
       }
     }
   }
 }


 I have a few questions about the query above:
 1. The results don't look sorted by sales; authors who don't have any
 books with sales appear at the top.
 2. How do I use the sum of all sales for an author (across all books
 within the author document) in the script? Is there a sum function for
 nested fields inside a document when using script_score? Note that sales
 is a nested field inside another nested field, books.
 3. As a next step I would also like to use a filter for keyword within the
 script_score, to only include sales whose keyword value matches the
 search query term.

 Any help would be much appreciated. 

 Thanks
 Srini
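As an aside, the per-author ranking the second question aims for can be sketched outside Elasticsearch. This is plain Python modelling the intended math with the sample data from the question, not MVEL/Groovy script syntax; whether script_score can express this nested sum is exactly what is being asked:

```python
def author_score(text_score, books, param1=2):
    """Sketch of the intended ranking: text relevance multiplied by total
    sales across all of an author's books, falling back to the raw relevance
    score when the author has no sales at all."""
    total_sales = sum(s["count"] for b in books for s in b.get("sales", []))
    return text_score if total_sales == 0 else text_score * total_sales * param1

# Sample data from the author document above (sales counts only):
books = [
    {"sales": [{"keyword": "effective java", "count": 200},
               {"keyword": "java tips", "count": 100},
               {"keyword": "java joshua bloch", "count": 50}]},
    {"sales": [{"keyword": "java puzzlers", "count": 100},
               {"keyword": "joshua bloch puzzler", "count": 50}]},
]
print(author_score(1.5, books))  # 1.5 * 500 * 2 = 1500.0
```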





got QueryPhaseExecutionException when using custom query parser

2014-08-26 Thread Peiyong Lin
Hi all,

I wrote my own custom query parser and extended Elasticsearch via a plugin;
the code is in the following links.

query parser http://pastebin.mozilla.org/6172836
customized query http://pastebin.mozilla.org/6172837
plugin http://pastebin.mozilla.org/6172844

I used the default settings of Elasticsearch, and the documents I PUT are:
{
  "test" : "haha"
}
{
  "test" : "ahah"
}

I used the query:
{
  "query" : {
    "backwards" : {
      "test" : "haha"
    }
  }
}

And the error message I got is:

[2014-08-27 13:26:41,678][DEBUG][action.search.type   ] [Poison] 
[test][2], node[w4ORe_ERQBeOVpII3P9w1w], [P], s[STARTED]: Failed to execute 
[org.elasticsearch.action.search.SearchRequest@7e1416e] lastShard [true]
org.elasticsearch.search.query.QueryPhaseExecutionException: [test][2]: 
query[filtered(BackwardsQuery: 
test:ahah)-cache(_type:test)],from[0],size[10]: Query Failed [Failed to 
execute main query]
at 
org.elasticsearch.search.query.QueryPhase.execute(QueryPhase.java:162)
at 
org.elasticsearch.search.SearchService.executeQueryPhase(SearchService.java:261)
at 
org.elasticsearch.search.action.SearchServiceTransportAction$5.call(SearchServiceTransportAction.java:206)
at 
org.elasticsearch.search.action.SearchServiceTransportAction$5.call(SearchServiceTransportAction.java:203)
at 
org.elasticsearch.search.action.SearchServiceTransportAction$23.run(SearchServiceTransportAction.java:517)
at 
java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
at 
java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
at java.lang.Thread.run(Thread.java:745)
Caused by: java.lang.NullPointerException
at 
org.elasticsearch.backwardstermquery.BackwardsTermQuery$BackwardsScorer.docID(BackwardsTermQuery.java:118)
at 
org.elasticsearch.backwardstermquery.BackwardsTermQuery$BackwardsScorer.nextDoc(BackwardsTermQuery.java:133)
at 
org.apache.lucene.search.Weight$DefaultBulkScorer.scoreAll(Weight.java:192)
at 
org.apache.lucene.search.Weight$DefaultBulkScorer.score(Weight.java:163)
at org.apache.lucene.search.BulkScorer.score(BulkScorer.java:35)
at 
org.apache.lucene.search.IndexSearcher.search(IndexSearcher.java:621)
at 
org.elasticsearch.search.internal.ContextIndexSearcher.search(ContextIndexSearcher.java:175)
at 
org.apache.lucene.search.IndexSearcher.search(IndexSearcher.java:491)
at 
org.apache.lucene.search.IndexSearcher.search(IndexSearcher.java:448)
at 
org.apache.lucene.search.IndexSearcher.search(IndexSearcher.java:281)
at 
org.apache.lucene.search.IndexSearcher.search(IndexSearcher.java:269)
at 
org.elasticsearch.search.query.QueryPhase.execute(QueryPhase.java:156)
... 7 more

I am very confused by it; could someone please point out what's wrong?
Thank you so much!
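Without being able to inspect the linked pastebins, a NullPointerException inside docID() often means the scorer reads per-document state before nextDoc() has been called, or dereferences a wrapped sub-scorer that can legitimately be null. As a reminder of the iterator contract that the bulk scorer in the stack trace relies on, here is a hedged Python model (not Lucene API, just the contract):

```python
NO_MORE_DOCS = 2**31 - 1  # Lucene's DocIdSetIterator.NO_MORE_DOCS sentinel

class ScorerSketch:
    """Language-agnostic sketch of the doc-iterator contract a custom Scorer
    (like BackwardsScorer) must honor. Not real Lucene code."""
    def __init__(self, matching_docs):
        self._docs = iter(sorted(matching_docs))
        self._doc = -1  # contract: docID() is -1 before the first nextDoc()

    def doc_id(self):
        # Must be side-effect free and safe to call at any point.
        return self._doc

    def next_doc(self):
        self._doc = next(self._docs, NO_MORE_DOCS)
        return self._doc

s = ScorerSketch([3, 7])
print(s.doc_id())    # -1
print(s.next_doc())  # 3
print(s.next_doc())  # 7
print(s.next_doc())  # 2147483647 (NO_MORE_DOCS)
```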
