Index creation on Plugin instantiation

2015-05-26 Thread Thomas
Hi,

I'm experimenting with elasticsearch plugin creation, and I'm trying to 
create an index (if missing) on plugin startup.

I wanted to ask: what is the best place to add the code snippet for index 
creation? I have added it in an injected binding with Client as a 
constructor parameter, but I get the following error:

no known master node, scheduling a retry
 [2015-05-26 12:03:27,289][ERROR][bootstrap] {1.4.1}: 
 Initialization Failed ...
 1) UncategorizedExecutionException[Failed execution]
 ExecutionException[java.lang.NullPointerException]
 NullPointerException


My guess is that the Client is not ready yet to handle index-creation 
requests. My code snippet is the following:

import org.elasticsearch.action.admin.indices.exists.indices.IndicesExistsResponse;
import org.elasticsearch.client.Client;
import org.elasticsearch.common.inject.Inject;
import org.elasticsearch.common.logging.ESLogger;
import org.elasticsearch.common.logging.Loggers;
import org.elasticsearch.common.settings.Settings;

public class IndexCreator {

    private final String indexName;
    private final ESLogger LOG;

    @Inject
    public IndexCreator(Settings settings, Client client) {
        this.LOG = Loggers.getLogger(getClass(), settings);
        this.indexName = settings.get("metis.index.name", ".metis-registry");

        // Create the index if it does not exist yet.
        IndicesExistsResponse resp =
                client.admin().indices().prepareExists(indexName).get();

        if (!resp.isExists()) {
            client.admin().indices().prepareCreate(indexName).get();
        }
    }
}


And I add this as a binding in my module:

public class MyModule extends AbstractModule {

    private final Settings settings;

    public MyModule(Settings settings) {
        this.settings = Preconditions.checkNotNull(settings);
    }

    @Override
    protected void configure() {
        bind(IndexCreator.class).asEagerSingleton();
    }
}


But it produces the aforementioned error. Any ideas?
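
One possible direction, since plugin bindings are constructed before the node 
has joined the cluster and elected a master, is to defer the index creation 
until the cluster reports at least yellow health. A rough sketch, not 
validated against the 1.4 plugin lifecycle (the background thread and names 
are illustrative; the health, exists, and create calls are standard client 
APIs):

@Inject
public IndexCreator(Settings settings, final Client client) {
    final String index = settings.get("metis.index.name", ".metis-registry");
    new Thread(new Runnable() {
        @Override
        public void run() {
            // Block until a master is elected and the cluster is usable.
            client.admin().cluster().prepareHealth()
                  .setWaitForYellowStatus().get();
            if (!client.admin().indices().prepareExists(index).get().isExists()) {
                client.admin().indices().prepareCreate(index).get();
            }
        }
    }, "index-creator").start();
}

An alternative would be to hook the creation into a component that runs after 
node startup (for example a LifecycleComponent's doStart) instead of doing it 
at injection time.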

Thanks,

Thomas



Re: Kibana 4 - ability to see source data from Dashboard

2015-04-14 Thread Thomas Bratt
A colleague just pointed out that you can add a search to the dashboard. 
Seems to work :)

On Tuesday, 14 April 2015 14:57:43 UTC+1, Thomas Bratt wrote:

 Hi,

 I can't seem to get access to the original data by drilling down on the 
 visualizations on the dashboard. Am I missing something?

 Many thanks,

 Thomas




Kibana 4 - ability to select a date range on dashboard that is reflected in other visualizations

2015-04-14 Thread Thomas Bratt
Hi,

I am using Kibana 4 with a Date Histogram. I can select a time range with 
the mouse, but the other visualizations on the dashboard do not seem to 
update. I only have data from today, which might be affecting things.

I would appreciate it if someone could tell me how to get this to work :)

Thomas



Kibana 4 - ability to see source data from Dashboard

2015-04-14 Thread Thomas Bratt
Hi,

I can't seem to get access to the original data by drilling down on the 
visualizations on the dashboard. Am I missing something?

Many thanks,

Thomas



Re: Kibana: Mark warnings as solved

2015-04-09 Thread Thomas Güttler
I know how to use a programming language, and I could start my own project.

But I would like to avoid that, since it leads to plumbing. I guess other 
people have the same use case,
and I would like to use (and improve) an existing project.

But I have not found any up to now.

How do other ELK users solve my use case?

I guess I am missing something.

Regards,
  Thomas Güttler


On Wednesday, 8 April 2015 at 11:02:35 UTC+2, James Green wrote:

 Couldn't you update the document with a flag on a field?

 On 8 April 2015 at 09:43, Thomas Güttler h...@tbz-pariv.de 
 wrote:

 We are evaluating if ELK is the right tool for our logs and event 
 messages.

 We need a way to mark warnings as "done". All warnings of this type 
 should be invisible in the future.

 Use case:

 There was a bug in our code and the dev team has created a fix. 
 Continuous Integration is running,
 and soon the bug in the production system will be gone.

 We need a way to mark the warnings as "this type of warning is already 
 handled, and the fix will be in the production system during the next 
 three hours."

 Can you understand what I want?

 How to handle this with ELK?

 Just removing these logs from ElasticSearch is not a solution, since 
 during the next hours (after setting the "done" flag) new events can 
 still come into the system.








Kibana: Mark warnings as solved

2015-04-08 Thread Thomas Güttler
We are evaluating if ELK is the right tool for our logs and event messages.

We need a way to mark warnings as "done". All warnings of this type should 
be invisible in the future.

Use case:

There was a bug in our code and the dev team has created a fix. Continuous 
Integration is running,
and soon the bug in the production system will be gone.

We need a way to mark the warnings as "this type of warning is already 
handled, and the fix will be in the production system during the next 
three hours."

Can you understand what I want?

How to handle this with ELK?

Just removing these logs from ElasticSearch is not a solution, since during 
the next hours (after setting the "done" flag) new events can still come 
into the system.
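
A minimal sketch of the flag-a-field approach suggested in the reply above 
(the index name "logs", type "warning", document id, and the boolean field 
"done" are placeholders; the _update endpoint and the filtered/not query 
constructs are standard APIs):

curl -XPOST 'localhost:9200/logs/warning/<doc_id>/_update' -d '{
  "doc": { "done": true }
}'

curl -XGET 'localhost:9200/logs/warning/_search' -d '{
  "query": {
    "filtered": {
      "filter": { "not": { "term": { "done": true } } }
    }
  }
}'

New events that arrive after the flag is set would still show up, which 
matches the requirement above; they simply don't carry the flag yet.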




ELK for logfiles

2015-03-27 Thread Thomas Güttler
Hi,

I am planning to use ELK for our log files.

I read docs about logstash, elasticsearch and kibana.

Still, the whole picture is not solid for me yet. 

The reporting area especially is something I haven't understood up to now.

Kibana seems to be a great tool to do the visualization. 

But can I get at the individual log entries to debug the root cause of problems?

Example: I see that 99 systems work fine, and 1 system emits warnings.

Which interface could I use to see the logs in ElasticSearch 
for this system?

Needed features:

Show all logs from system foo in the period between 2015-03-27 00:00 and 
00:10 (ten minutes).

Show all logs with log level error from system foo on 2015-03-27.
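
Both of those features map directly onto an elasticsearch query. A minimal 
sketch (the index name assumes logstash's daily indices; the field names 
"system" and "level" are assumptions, while "@timestamp" is logstash's 
default):

curl -XGET 'localhost:9200/logstash-2015.03.27/_search' -d '{
  "query": {
    "filtered": {
      "filter": {
        "bool": {
          "must": [
            { "term":  { "system": "foo" } },
            { "term":  { "level": "error" } },
            { "range": { "@timestamp": {
                "gte": "2015-03-27T00:00:00",
                "lt":  "2015-03-27T00:10:00" } } }
          ]
        }
      }
    }
  }
}'

Kibana builds queries of this shape for you under the hood.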

Is Kibana the right tool for this?

Or am I on the wrong track?

Which tool could be used to analyze log data in ElasticSearch?



Using ELK to analyze log warnings and exceptions - and mark them as solved

2015-03-13 Thread Thomas Güttler
We run several servers running our code.

Of course there are bugs which cause exceptions, and warnings when 
something unusual occurs.

I want to analyze our logs to find unhandled warnings.

I am unsure if ELK can help us.

There needs to be some way to aggregate warnings into a warning of type X (to 
remove duplicates).

If a warning was handled and solved, we need a way to mark the warnings of 
type X as solved.

The flag should only be set for a limited period of time (for example 48 
hours). During this time the new code should be deployed and the error 
should not occur again.

If it still occurs after N hours, the warning should be visible again.

Can you understand what I want?

Can this be done with ELK, or am I on the wrong track?
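
The time-limited flag could be modeled as a date field instead of a boolean. 
A sketch (the field name "muted_until" and index/type names are assumptions): 
when a warning type is marked solved, set muted_until to now + 48h on the 
matching documents, and exclude still-muted warnings at query time:

curl -XGET 'localhost:9200/logs/warning/_search' -d '{
  "query": {
    "filtered": {
      "filter": {
        "not": { "range": { "muted_until": { "gte": "now" } } }
      }
    }
  }
}'

Warnings become visible again automatically once muted_until has passed, 
with no second pass needed to clear the flag.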

Regards,
  Thomas Güttler



Easy ELK Stack setup

2015-02-25 Thread Thomas Güttler
Hi,

I want to set up an ELK stack without wasting time. That's why I ask here 
before starting.

My environment is simple: all traffic comes in from localhost. There is 
only one server for the ELK setup.

There will be several ELK stacks running in the future, but again, each 
stack's traffic will come in only from localhost.
The systems will run isolated. 

I see these solutions:

  - use a Docker container

  - do it by hand (RPM install)

  - use Chef/Puppet (but up to now we don't use either of those tools)

  - any other idea?

What do you think?


Regards,
  Thomas Güttler



Re: Leaving out information from the response

2015-02-25 Thread Thomas Matthijs
http://www.elasticsearch.org/guide/en/elasticsearch/reference/current/search-request-fields.html
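
The fields option on the page above does this; source filtering is another 
way. A minimal sketch (the index name "myindex" is a placeholder; the field 
names come from the question quoted below):

curl -XGET 'localhost:9200/myindex/_search' -d '{
  "_source": ["Name", "Location", "Description"],
  "query": { "match_all": {} }
}'

Only the listed fields are then returned in _source for each hit, so the PHP 
application no longer has to hide anything.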

On Wed, Feb 25, 2015 at 9:13 AM, James m...@employ.com wrote:

 Hi,

 I want to have certain data in my elasticsearch index, but I don't want it
 to be returned with a query. At the moment it seems to return every bit of
 data I have for each index, and then I use my PHP application to hide it. Is
 it possible to select which fields elasticsearch returns in its response to
 my PHP application?

 For example, for each item:

 Name
 Location
 Description
 Keywords
 Unique ID
 Create date

 I just want to have in the response from elasticsearch:

 Name
 Location
 Description






Re: Kibana 4 behind reverse proxy. Is it possible?

2015-02-06 Thread Cijo Thomas
I hit the same issue when accessing the site using the DNS name. When I am on
the machine, http://localhost:<port>/ works, though. I have not figured out
the fix for this yet. It seems like a Kibana 4 CORS issue.

On Thu, Jan 29, 2015 at 3:38 PM, Konstantin Erman kon...@gmail.com wrote:

 Yes, Kibana 4 beta 3. And I have just one URL rewrite rule (pictured).
 Were you getting the same error when it was not working for you?


 https://lh3.googleusercontent.com/-oDiu_ncjJlA/VMrEJL-Qj_I/Aic/so2IvrgTQbY/s1600/RewriteRule.png


 On Thursday, January 29, 2015 at 3:31:56 PM UTC-8, Cijo Thomas wrote:

 Can you show your URL rewrite rules? Also, are you using Kibana 4 beta 3?

 On Thu, Jan 29, 2015 at 1:09 PM, Konstantin Erman kon...@gmail.com
 wrote:

 Unfortunately I could not replicate your success :-(

 Let me show you what I did, in case you notice any difference from
 your case.


 https://lh6.googleusercontent.com/-HzQRKhGl9ag/VMqfkWnSF8I/Ah0/SsXrJlQ2vW8/s1600/Output_Caching.png


 https://lh6.googleusercontent.com/-V2VTx-iT888/VMqf0K7jChI/Ah8/qC7umA0XP_U/s1600/AppPool1.png


 https://lh6.googleusercontent.com/-4jL3Hyoq0QY/VMqgF7d0-II/AiE/77VOeAZP2e0/s1600/AppPool2.png


 https://lh5.googleusercontent.com/-aBFCh_BZKn4/VMqgnM9ejhI/AiM/zxnsdD-VK8U/s1600/Error.png
 Any ideas what I may be missing?

 Thanks!
 Konstantin

 On Thursday, January 29, 2015 at 10:13:40 AM UTC-8, Cijo Thomas wrote:

 I have been fighting with this for quite some time; I finally found the
 workaround. Let me know if it helps you!

 On Thu, Jan 29, 2015 at 10:12 AM, Konstantin Erman kon...@gmail.com
 wrote:

 Thank you for the good news! I'm a little swamped currently, but I
 will definitely give it a try when I get a minute.

 Just to make sure - disable Output cache for the website - where is
 it in IIS Management Console?


 On Wednesday, January 28, 2015 at 4:38:01 PM UTC-8, Cijo Thomas wrote:

 It's possible to use IIS with the following steps:
 1) Disable output caching for the website you are using as the reverse
 proxy.
 2) Run the website in a new app pool which does not have any managed
 code.

 With the above two steps, Kibana 4 runs fine with IIS as a reverse proxy.


 On Saturday, December 27, 2014 at 4:19:31 PM UTC-8, Konstantin Erman
 wrote:

 We currently use Kibana 3 hosted in IIS behind an IIS reverse proxy for
 authentication. Naturally we are looking at Kibana 4 Beta 3, expecting it
 to replace Kibana 3 soon. Kibana 4 is self-hosted and works nicely when
 accessed directly, but we need authentication, and whatever I do I cannot
 make it work from behind the reverse proxy! Sooner or later I get a 401
 accessing some internal resource.

 I wonder if anybody has hit a similar problem and has any insight into how
 to make it work.
 We cannot use Shield, as its price is way beyond our bounds.

 Thanks!
 Konstantin





 --
 Warm Regards,
 Cijo Thomas
 +1 3125606441





 --
 Warm Regards,
 Cijo Thomas
 +1 3125606441


Re: Kibana 4 behind reverse proxy. Is it possible?

2015-01-29 Thread Cijo Thomas
I have been fighting with this for quite some time; I finally found the
workaround. Let me know if it helps you!

On Thu, Jan 29, 2015 at 10:12 AM, Konstantin Erman kon...@gmail.com wrote:

 Thank you for the good news! I'm a little swamped currently, but I will
 definitely give it a try when I get a minute.

 Just to make sure - disable Output cache for the website - where is it
 in IIS Management Console?


 On Wednesday, January 28, 2015 at 4:38:01 PM UTC-8, Cijo Thomas wrote:

 It's possible to use IIS with the following steps:
 1) Disable output caching for the website you are using as the reverse proxy.
 2) Run the website in a new app pool which does not have any managed code.

 With the above two steps, Kibana 4 runs fine with IIS as a reverse proxy.


 On Saturday, December 27, 2014 at 4:19:31 PM UTC-8, Konstantin Erman
 wrote:

 We currently use Kibana 3 hosted in IIS behind an IIS reverse proxy for
 authentication. Naturally we are looking at Kibana 4 Beta 3, expecting it
 to replace Kibana 3 soon. Kibana 4 is self-hosted and works nicely when
 accessed directly, but we need authentication, and whatever I do I cannot
 make it work from behind the reverse proxy! Sooner or later I get a 401
 accessing some internal resource.

 I wonder if anybody has hit a similar problem and has any insight into how
 to make it work.
 We cannot use Shield, as its price is way beyond our bounds.

 Thanks!
 Konstantin





-- 
Warm Regards,
Cijo Thomas
+1 3125606441



Re: Kibana 4 behind reverse proxy. Is it possible?

2015-01-29 Thread Cijo Thomas
Can you show your URL rewrite rules? Also, are you using Kibana 4 beta 3?

On Thu, Jan 29, 2015 at 1:09 PM, Konstantin Erman kon...@gmail.com wrote:

 Unfortunately I could not replicate your success :-(

 Let me show you what I did, in case you notice any difference from
 your case.


 https://lh6.googleusercontent.com/-HzQRKhGl9ag/VMqfkWnSF8I/Ah0/SsXrJlQ2vW8/s1600/Output_Caching.png


 https://lh6.googleusercontent.com/-V2VTx-iT888/VMqf0K7jChI/Ah8/qC7umA0XP_U/s1600/AppPool1.png


 https://lh6.googleusercontent.com/-4jL3Hyoq0QY/VMqgF7d0-II/AiE/77VOeAZP2e0/s1600/AppPool2.png


 https://lh5.googleusercontent.com/-aBFCh_BZKn4/VMqgnM9ejhI/AiM/zxnsdD-VK8U/s1600/Error.png
 Any ideas what I may be missing?

 Thanks!
 Konstantin

 On Thursday, January 29, 2015 at 10:13:40 AM UTC-8, Cijo Thomas wrote:

 I have been fighting with this for quite some time; I finally found the
 workaround. Let me know if it helps you!

 On Thu, Jan 29, 2015 at 10:12 AM, Konstantin Erman kon...@gmail.com
 wrote:

 Thank you for the good news! I'm a little swamped currently, but I will
 definitely give it a try when I get a minute.

 Just to make sure - disable Output cache for the website - where is it
 in IIS Management Console?


 On Wednesday, January 28, 2015 at 4:38:01 PM UTC-8, Cijo Thomas wrote:

 It's possible to use IIS with the following steps:
 1) Disable output caching for the website you are using as the reverse proxy.
 2) Run the website in a new app pool which does not have any managed code.

 With the above two steps, Kibana 4 runs fine with IIS as a reverse proxy.


 On Saturday, December 27, 2014 at 4:19:31 PM UTC-8, Konstantin Erman
 wrote:

 We currently use Kibana 3 hosted in IIS behind an IIS reverse proxy for
 authentication. Naturally we are looking at Kibana 4 Beta 3, expecting it
 to replace Kibana 3 soon. Kibana 4 is self-hosted and works nicely when
 accessed directly, but we need authentication, and whatever I do I cannot
 make it work from behind the reverse proxy! Sooner or later I get a 401
 accessing some internal resource.

 I wonder if anybody has hit a similar problem and has any insight into how
 to make it work.
 We cannot use Shield, as its price is way beyond our bounds.

 Thanks!
 Konstantin





 --
 Warm Regards,
 Cijo Thomas
 +1 3125606441





-- 
Warm Regards,
Cijo Thomas
+1 3125606441



Re: Kibana 4 behind reverse proxy. Is it possible?

2015-01-28 Thread Cijo Thomas
It's possible to use IIS with the following steps:
1) Disable output caching for the website you are using as the reverse proxy.
2) Run the website in a new app pool which does not have any managed code.

With the above two steps, Kibana 4 runs fine with IIS as a reverse proxy.
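
For reference, with the URL Rewrite and Application Request Routing modules 
installed, a minimal rewrite rule for this setup might look like the 
following sketch (assuming Kibana 4 on its default port 5601; the rule name 
is arbitrary):

<system.webServer>
  <rewrite>
    <rules>
      <rule name="kibana4" stopProcessing="true">
        <match url="(.*)" />
        <!-- Forward every request on this site to the local Kibana 4 server -->
        <action type="Rewrite" url="http://localhost:5601/{R:1}" />
      </rule>
    </rules>
  </rewrite>
</system.webServer>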


On Saturday, December 27, 2014 at 4:19:31 PM UTC-8, Konstantin Erman wrote:

 We currently use Kibana 3 hosted in IIS behind an IIS reverse proxy for 
 authentication. Naturally we are looking at Kibana 4 Beta 3, expecting it 
 to replace Kibana 3 soon. Kibana 4 is self-hosted and works nicely when 
 accessed directly, but we need authentication, and whatever I do I cannot 
 make it work from behind the reverse proxy! Sooner or later I get a 401 
 accessing some internal resource. 

 I wonder if anybody has hit a similar problem and has any insight into how 
 to make it work. 
 We cannot use Shield, as its price is way beyond our bounds. 

 Thanks! 
 Konstantin



Out of memory on start with 38GB index

2015-01-15 Thread Thomas Cataldo
(BufferedUpdatesStream.java:287)

at 
org.apache.lucene.index.IndexWriter.applyAllDeletesAndUpdates(IndexWriter.java:3271)

at 
org.apache.lucene.index.IndexWriter.maybeApplyDeletes(IndexWriter.java:3262)

at org.apache.lucene.index.IndexWriter.getReader(IndexWriter.java:421)

at 
org.apache.lucene.index.StandardDirectoryReader.doOpenFromWriter(StandardDirectoryReader.java:292)

at 
org.apache.lucene.index.StandardDirectoryReader.doOpenIfChanged(StandardDirectoryReader.java:267)

at 
org.apache.lucene.index.StandardDirectoryReader.doOpenIfChanged(StandardDirectoryReader.java:257)

at 
org.apache.lucene.index.DirectoryReader.openIfChanged(DirectoryReader.java:171)

at 
org.apache.lucene.search.SearcherManager.refreshIfNeeded(SearcherManager.java:118)

at 
org.apache.lucene.search.SearcherManager.refreshIfNeeded(SearcherManager.java:58)

at 
org.apache.lucene.search.ReferenceManager.doMaybeRefresh(ReferenceManager.java:176)

at 
org.apache.lucene.search.ReferenceManager.maybeRefresh(ReferenceManager.java:225)

at 
org.elasticsearch.index.engine.internal.InternalEngine.refresh(InternalEngine.java:796)

at 
org.elasticsearch.index.engine.internal.InternalEngine.delete(InternalEngine.java:692)

at 
org.elasticsearch.index.shard.service.InternalIndexShard.performRecoveryOperation(InternalIndexShard.java:798)

at 
org.elasticsearch.index.gateway.local.LocalIndexShardGateway.recover(LocalIndexShardGateway.java:268)

at 
org.elasticsearch.index.gateway.IndexShardGatewayService$1.run(IndexShardGatewayService.java:132)

at 
java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)

at 
java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)

at java.lang.Thread.run(Thread.java:745)

[2015-01-14 12:01:32,238][DEBUG][index.service] [Saint Elmo] 
[mailspool] [1] closing... (reason: [engine failure, message [refresh 
failed][OutOfMemoryError[Java heap space]]])

[2015-01-14 12:01:32,238][DEBUG][index.shard.service  ] [Saint Elmo] 
[mailspool][1] state: [RECOVERING]-[CLOSED], reason [engine failure, 
message [refresh failed][OutOfMemoryError[Java heap space]]]

[2015-01-14 12:01:32,315][DEBUG][index.service] [Saint Elmo] 
[mailspool] [1] closed (reason: [engine failure, message [refresh 
failed][OutOfMemoryError[Java heap space]]])


I tried adding a few settings to my elasticsearch.yml, as suggested in the 
referenced issue:

index.load_fixed_bitset_filters_eagerly: false

index.warmer.enabled: false
indices.breaker.total.limit: 30% 

But none of these settings seems to work for me.

Our mapping is visible here:
http://git.blue-mind.net/gitlist/bluemind/blob/master/esearch/config/templates/mailspool.json

It is used to store a full-text index of emails. It uses a parent/child 
structure:
The msgBody type contains the full text of the messages and attachments.
The msg type contains user flags (unread, important, the folder it is stored 
in, etc.).

We use this structure because msg is often updated: mails are often marked as 
read or moved. The msgBody can be pretty big, so we don't want to update the 
whole document when a simple email flag is changed.
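
For reference, the parent/child relation in the mapping looks roughly like 
this (a condensed sketch; the real mapping is in the link above, and the 
field names here are illustrative):

{
  "msgBody": {
    "properties": { "body": { "type": "string" } }
  },
  "msg": {
    "_parent": { "type": "msgBody" },
    "properties": { "is": { "type": "string", "index": "not_analyzed" } }
  }
}

One caveat with this structure on 1.x is that parent/child joins keep 
per-segment parent id data in heap, so heap requirements grow with index 
size, which may be related to the OutOfMemoryError above.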

Does this kind of index structure remind you of a particular bug or required 
setting? Any rule of thumb for sizing memory relative to index size on disk?

Regards,
Thomas.






Re: out of memory at startup with large index and parent/child relation

2015-01-14 Thread Thomas Cataldo
Hi,

By removing all my translog files, ES can start without error.


On Wednesday, January 14, 2015 at 2:56:48 PM UTC+1, Thomas Cataldo wrote:

 Hi,

 I encounter a problem with a large index (38GB) that prevents ES 1.4.2 
 from starting.
 The problem looks pretty similar to the one in 
 https://github.com/elasticsearch/elasticsearch/issues/8394

 I tried some of the recommendations from this post (and linked ones):

 index.load_fixed_bitset_filters_eagerly: false
 index.warmer.enabled: false
 indices.breaker.total.limit: 30%

 And even with that, my server does not start [1].

 I uploaded the mapping for the index to gist: 
 https://gist.github.com/tcataldo/c0b6b3dfec9823bf6523

 I tried several OS memory, ES heap combinations, the biggest being
 48GiB for the operating system and 32GiB for ES heap and it still
 fails with that.

 Any idea or link to an open issue I could follow ?

 Regards,
 Thomas.



 1. debug output:

 [2015-01-14 12:01:55,740][DEBUG][indices.cluster  ] [Saint Elmo] 
 [mailspool][0] creating shard
 [2015-01-14 12:01:55,741][DEBUG][index.service] [Saint Elmo] 
 [mailspool] creating shard_id [0]
 [2015-01-14 12:01:56,041][DEBUG][index.deletionpolicy ] [Saint Elmo] 
 [mailspool][0] Using [keep_only_last] deletion policy
 [2015-01-14 12:01:56,041][DEBUG][index.merge.policy   ] [Saint Elmo] 
 [mailspool][0] using [tiered] merge mergePolicy with 
 expunge_deletes_allowed[10.0], floor_segment[2mb], max_merge_at_on\
 ce[10], max_merge_at_once_explicit[30], max_merged_segment[5gb], 
 segments_per_tier[10.0], reclaim_deletes_weight[2.0]
 [2015-01-14 12:01:56,041][DEBUG][index.merge.scheduler] [Saint Elmo] 
 [mailspool][0] using [concurrent] merge scheduler with max_thread_count[2], 
 max_merge_count[4]
 [2015-01-14 12:01:56,042][DEBUG][index.shard.service  ] [Saint Elmo] 
 [mailspool][0] state: [CREATED]
 [2015-01-14 12:01:56,043][DEBUG][index.translog   ] [Saint Elmo] 
 [mailspool][0] interval [5s], flush_threshold_ops [2147483647], 
 flush_threshold_size [200mb], flush_threshold_period [3\
 0m]
 [2015-01-14 12:01:56,044][DEBUG][index.shard.service  ] [Saint Elmo] 
 [mailspool][0] state: [CREATED]-[RECOVERING], reason [from gateway]
 [2015-01-14 12:01:56,044][DEBUG][index.gateway] [Saint Elmo] 
 [mailspool][0] starting recovery from local ...
 [2015-01-14 12:01:56,048][DEBUG][river.cluster] [Saint Elmo] 
 processing [reroute_rivers_node_changed]: execute
 [2015-01-14 12:01:56,048][DEBUG][river.cluster] [Saint Elmo] 
 processing [reroute_rivers_node_changed]: no change in cluster_state
 [2015-01-14 12:01:56,048][DEBUG][cluster.service  ] [Saint Elmo] 
 processing [shard-failed ([mailspool][3], node[gOgAuHo4SXyfyuPpws0Usw], 
 [P], s[INITIALIZING]), reason [engine failure, \
 message [refresh failed][OutOfMemoryError[Java heap space: done 
 applying updated cluster_state (version: 4)
 [2015-01-14 12:01:56,062][DEBUG][index.engine.internal] [Saint Elmo] 
 [mailspool][0] starting engine
 [2015-01-14 12:02:19,701][WARN ][index.engine.internal] [Saint Elmo] 
 [mailspool][0] failed engine [refresh failed]
 java.lang.OutOfMemoryError: Java heap space
 at org.apache.lucene.util.FixedBitSet.init(FixedBitSet.java:187)
 at 
 org.apache.lucene.search.MultiTermQueryWrapperFilter.getDocIdSet(MultiTermQueryWrapperFilter.java:104)
 at 
 org.elasticsearch.index.cache.filter.weighted.WeightedFilterCache$FilterCacheFilterWrapper.getDocIdSet(WeightedFilterCache.java:177)
 at 
 org.elasticsearch.common.lucene.search.OrFilter.getDocIdSet(OrFilter.java:55)
 at 
 org.elasticsearch.common.lucene.search.ApplyAcceptedDocsFilter.getDocIdSet(ApplyAcceptedDocsFilter.java:46)
 at 
 org.apache.lucene.search.FilteredQuery$1.scorer(FilteredQuery.java:130)
 at 
 org.apache.lucene.search.FilteredQuery$RandomAccessFilterStrategy.filteredScorer(FilteredQuery.java:542)
 at 
 org.apache.lucene.search.FilteredQuery$1.scorer(FilteredQuery.java:136)
 at 
 org.apache.lucene.search.QueryWrapperFilter$1.iterator(QueryWrapperFilter.java:59)
 at 
 org.apache.lucene.index.BufferedUpdatesStream.applyQueryDeletes(BufferedUpdatesStream.java:554)
 at 
 org.apache.lucene.index.BufferedUpdatesStream.applyDeletesAndUpdates(BufferedUpdatesStream.java:287)
 at 
 org.apache.lucene.index.IndexWriter.applyAllDeletesAndUpdates(IndexWriter.java:3271)
 at 
 org.apache.lucene.index.IndexWriter.maybeApplyDeletes(IndexWriter.java:3262)
 at 
 org.apache.lucene.index.IndexWriter.getReader(IndexWriter.java:421)
 at 
 org.apache.lucene.index.StandardDirectoryReader.doOpenFromWriter(StandardDirectoryReader.java:292)
 at 
 org.apache.lucene.index.StandardDirectoryReader.doOpenIfChanged(StandardDirectoryReader.java:267)
 at 
 org.apache.lucene.index.StandardDirectoryReader.doOpenIfChanged(StandardDirectoryReader.java:257

out of memory at startup with large index and parent/child relation

2015-01-14 Thread Thomas Cataldo
Hi,

I encounter a problem with a large index (38GB) that prevents ES 1.4.2 from 
starting.
The problem looks pretty similar to the one in 
https://github.com/elasticsearch/elasticsearch/issues/8394

I tried some of the recommendations from this post (and linked ones):

index.load_fixed_bitset_filters_eagerly: false
index.warmer.enabled: false
indices.breaker.total.limit: 30%

And even with that, my server does not start [1].

I uploaded the mapping for the index to gist: 
https://gist.github.com/tcataldo/c0b6b3dfec9823bf6523

I tried several OS memory, ES heap combinations, the biggest being
48GiB for the operating system and 32GiB for ES heap and it still
fails with that.

Any idea or link to an open issue I could follow ?

Regards,
Thomas.



1. debug output:

[2015-01-14 12:01:55,740][DEBUG][indices.cluster  ] [Saint Elmo] 
[mailspool][0] creating shard
[2015-01-14 12:01:55,741][DEBUG][index.service] [Saint Elmo] 
[mailspool] creating shard_id [0]
[2015-01-14 12:01:56,041][DEBUG][index.deletionpolicy ] [Saint Elmo] 
[mailspool][0] Using [keep_only_last] deletion policy
[2015-01-14 12:01:56,041][DEBUG][index.merge.policy   ] [Saint Elmo] 
[mailspool][0] using [tiered] merge mergePolicy with 
expunge_deletes_allowed[10.0], floor_segment[2mb], max_merge_at_on\
ce[10], max_merge_at_once_explicit[30], max_merged_segment[5gb], 
segments_per_tier[10.0], reclaim_deletes_weight[2.0]
[2015-01-14 12:01:56,041][DEBUG][index.merge.scheduler] [Saint Elmo] 
[mailspool][0] using [concurrent] merge scheduler with max_thread_count[2], 
max_merge_count[4]
[2015-01-14 12:01:56,042][DEBUG][index.shard.service  ] [Saint Elmo] 
[mailspool][0] state: [CREATED]
[2015-01-14 12:01:56,043][DEBUG][index.translog   ] [Saint Elmo] 
[mailspool][0] interval [5s], flush_threshold_ops [2147483647], 
flush_threshold_size [200mb], flush_threshold_period [3\
0m]
[2015-01-14 12:01:56,044][DEBUG][index.shard.service  ] [Saint Elmo] 
[mailspool][0] state: [CREATED]-[RECOVERING], reason [from gateway]
[2015-01-14 12:01:56,044][DEBUG][index.gateway] [Saint Elmo] 
[mailspool][0] starting recovery from local ...
[2015-01-14 12:01:56,048][DEBUG][river.cluster] [Saint Elmo] 
processing [reroute_rivers_node_changed]: execute
[2015-01-14 12:01:56,048][DEBUG][river.cluster] [Saint Elmo] 
processing [reroute_rivers_node_changed]: no change in cluster_state
[2015-01-14 12:01:56,048][DEBUG][cluster.service  ] [Saint Elmo] 
processing [shard-failed ([mailspool][3], node[gOgAuHo4SXyfyuPpws0Usw], 
[P], s[INITIALIZING]), reason [engine failure, \
message [refresh failed][OutOfMemoryError[Java heap space: done 
applying updated cluster_state (version: 4)
[2015-01-14 12:01:56,062][DEBUG][index.engine.internal] [Saint Elmo] 
[mailspool][0] starting engine
[2015-01-14 12:02:19,701][WARN ][index.engine.internal] [Saint Elmo] 
[mailspool][0] failed engine [refresh failed]
java.lang.OutOfMemoryError: Java heap space
at org.apache.lucene.util.FixedBitSet.init(FixedBitSet.java:187)
at 
org.apache.lucene.search.MultiTermQueryWrapperFilter.getDocIdSet(MultiTermQueryWrapperFilter.java:104)
at 
org.elasticsearch.index.cache.filter.weighted.WeightedFilterCache$FilterCacheFilterWrapper.getDocIdSet(WeightedFilterCache.java:177)
at 
org.elasticsearch.common.lucene.search.OrFilter.getDocIdSet(OrFilter.java:55)
at 
org.elasticsearch.common.lucene.search.ApplyAcceptedDocsFilter.getDocIdSet(ApplyAcceptedDocsFilter.java:46)
at 
org.apache.lucene.search.FilteredQuery$1.scorer(FilteredQuery.java:130)
at 
org.apache.lucene.search.FilteredQuery$RandomAccessFilterStrategy.filteredScorer(FilteredQuery.java:542)
at 
org.apache.lucene.search.FilteredQuery$1.scorer(FilteredQuery.java:136)
at 
org.apache.lucene.search.QueryWrapperFilter$1.iterator(QueryWrapperFilter.java:59)
at 
org.apache.lucene.index.BufferedUpdatesStream.applyQueryDeletes(BufferedUpdatesStream.java:554)
at 
org.apache.lucene.index.BufferedUpdatesStream.applyDeletesAndUpdates(BufferedUpdatesStream.java:287)
at 
org.apache.lucene.index.IndexWriter.applyAllDeletesAndUpdates(IndexWriter.java:3271)
at 
org.apache.lucene.index.IndexWriter.maybeApplyDeletes(IndexWriter.java:3262)
at 
org.apache.lucene.index.IndexWriter.getReader(IndexWriter.java:421)
at 
org.apache.lucene.index.StandardDirectoryReader.doOpenFromWriter(StandardDirectoryReader.java:292)
at 
org.apache.lucene.index.StandardDirectoryReader.doOpenIfChanged(StandardDirectoryReader.java:267)
at 
org.apache.lucene.index.StandardDirectoryReader.doOpenIfChanged(StandardDirectoryReader.java:257)
at 
org.apache.lucene.index.DirectoryReader.openIfChanged(DirectoryReader.java:171)
at 
org.apache.lucene.search.SearcherManager.refreshIfNeeded(SearcherManager.java:118

RepositoryMissingException when restoring into a new cluster

2015-01-06 Thread Thomas Ardal
I'm using the snapshot/restore feature of Elasticsearch, together with the 
Azure plugin to backup snapshots to Azure blob storage. Everything works 
when doing snapshots from a cluster and restoring to the same cluster. Now 
I'm in a situation where I want to restore an entirely new cluster (let's 
call that cluster b) from a snapshot generated from cluster a. When I run a 
restore request on cluster b, I get a 404. Doing a _status call on the 
snapshot, I get the same error:

{"error": "RepositoryMissingException[[elasticsearch_logs] missing]", "status": 404}


The new cluster is configured with the Azure plugin and the same settings for 
Azure. I guess the error is caused by the fact that Elasticsearch generates 
some metadata about the snapshots and stores it locally in a _snapshot index, 
and this index is not on the new cluster. The same error happens if I delete 
the data dir on cluster a and try to restore cluster a from a snapshot.


How would I deal with a situation like this?



Re: RepositoryMissingException when restoring into a new cluster

2015-01-06 Thread Thomas Ardal
That was exactly what I was missing. I didn't create the repository named 
elasticsearch_logs on cluster B. After I created it, the backup runs 
smoothly.

Thanks, David!
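
For anyone landing here later: the repository has to be registered on the 
target cluster before any snapshot call will resolve. A minimal sketch using 
the Azure plugin's repository type (the container and base_path settings are 
placeholders for your own storage layout):

curl -XPUT 'localhost:9200/_snapshot/elasticsearch_logs' -d '{
  "type": "azure",
  "settings": {
    "container": "backups",
    "base_path": "elasticsearch"
  }
}'

After this, _status and restore requests against the repository work as on 
the original cluster.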

On Wednesday, January 7, 2015 8:31:08 AM UTC+1, David Pilato wrote:

 Did you create the repository on cluster B?
 How?

 --
 David ;-)
 Twitter : @dadoonet / @elasticsearchfr / @scrutmydocs

 On 7 Jan 2015 at 08:19, Thomas Ardal thoma...@gmail.com 
 wrote:

 I'm using the snapshot/restore feature of Elasticsearch, together with the 
 Azure plugin to backup snapshots to Azure blob storage. Everything works 
 when doing snapshots from a cluster and restoring to the same cluster. Now 
 I'm in a situation where I want to restore an entirely new cluster (let's 
 call that cluster b) from a snapshot generated from cluster a. When I run a 
 restore request on cluster b, I get a 404. Doing a _status call on the 
 snapshot, I get the same error:

 {"error": "RepositoryMissingException[[elasticsearch_logs] missing]", "status": 404}


 The new cluster is configured with the Azure plugin and the same settings
 for Azure. I guess the error is caused by the fact that Elasticsearch
 generates some metadata about the snapshots and stores it locally in a
 _snapshot index, and this index is not on the new cluster. The same error
 happens if I delete the data dir on cluster a and try to restore cluster a
 from a snapshot.


 How would I deal with a situation like this?






Re: Elasticsearch Frontend webapp and boilerplate query code

2015-01-02 Thread Thomas
Yes,

Let's say that you want to render a pie chart with some sort of aggregated 
data extracted from elasticsearch. Instead of writing the query in 
javascript, or having it client-side in the code, we need something like a 
simple GET API call, for instance getMyPieData(), against another service 
that returns the data needed to render that information.

If we let elasticsearch do that, we may want to use search templates:
http://www.elasticsearch.org/guide/en/elasticsearch/reference/current/search-template.html#pre-registered-templates

If we also want to cache that query, we may use the elasticsearch shard query cache:
http://www.elasticsearch.org/guide/en/elasticsearch/reference/current/index-modules-shard-query-cache.html#index-modules-shard-query-cache
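
For illustration, a pre-registered template keeps the query itself out of the 
client. A sketch (assuming a mustache template stored as 
config/scripts/getMyPieData.mustache on each node; the index, template, and 
parameter names are placeholders):

curl -XGET 'localhost:9200/myindex/_search/template' -d '{
  "template": { "file": "getMyPieData" },
  "params": { "from_date": "2015-01-01" }
}'

The client then only knows the template name and its parameters, not the 
query behind it.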

That is more or less what elasticsearch provides for my use case, afaik. My 
question is whether there is something else I can do to cache and hide my 
queries, apart from the above solutions which bind me to elasticsearch (some 
separate module, technology, etc. based on best practices), so that if I 
later want to replace elasticsearch with something else my frontend code is 
not affected, and, more importantly, so that I don't constantly hit 
elasticsearch for the same data.

I'm trying to first verify that I would not be reinventing the wheel by 
building my own solution.

Thank you again 

On Friday, 2 January 2015 17:04:59 UTC+2, Thomas wrote:

 Hi,

 I wish everybody a happy new year, all the best for 2015, and in 
 continuation of the great success of ES,

 In our project we intend to create a simple webapp that will query 
 elasticsearch for insights. We do not want to directly query elasticsearch 
 for two reasons:

- security
- avoid boilerplate query code and to be able to decouple it

 What is the best way to achieve that? We are currently evaluating building 
 the frontend as a python/django project. Has anyone faced a similar task, 
 and is it possible to share some thoughts?

 In other situations NGinX was a solution for security, but for avoiding 
 having all the boilerplate query code client-side (e.g. in javascript), 
 what is the most well-established way?

 Finally, there are cases where some caching may be needed to avoid hitting 
 elasticsearch constantly for the same data; how is this tackled? Do we need 
 to build our own module to do all this?

 thank you in advance

 Thomas




Elasticsearch Frontend webapp and boilerplate query code

2015-01-02 Thread Thomas
Hi,

I wish everybody a happy new year, all the best for 2015, and in 
continuation of the great success of ES,

In our project we intend to create a simple webapp that will query 
elasticsearch for insights. We do not want to directly query elasticsearch 
for two reasons:

   - security
   - avoid boilerplate query code and to be able to decouple it

What is the best way to achieve that? We are currently evaluating building 
the frontend as a python/django project. Has anyone faced a similar task, 
and is it possible to share some thoughts?

In other situations NGinX was a solution for security, but for avoiding 
having all the boilerplate query code client-side (e.g. in javascript), 
what is the most well-established way?

Finally, there are cases where some caching may be needed to avoid hitting 
elasticsearch constantly for the same data; how is this tackled? Do we need 
to build our own module to do all this?

thank you in advance

Thomas



Re: using a nested object field in a multi_match query

2014-12-30 Thread thomas . vaughan


On Wednesday, December 10, 2014 4:33:12 PM UTC-3, thomas@beatport.com 
wrote:



 On Monday, August 11, 2014 1:29:56 PM UTC-4, Mike Topper wrote:

 Hello,

 I'm having trouble coming up with how to supply a field within a nested 
 object in the multi_match fields list.  I'm using the multi_match query in 
 order to perform query time field boosting, but something like:


   "query": {
     "multi_match": {
       "query": "China Mieville",
       "operator": "and",
       "fields": [
         "_all", "title^2", "author.name^1.5"
       ]
     }
   }

 doesn't seem to work. The title is boosted fine, but in fact if I take 
 out the _all field then I can see that author.name is never being used. 
 Is there a way to supply nested fields within a multi_match query?


I've just been bitten by this too. Does anyone know how to make this work?


In our case we switched the mapping type from nested to object and then 
this worked. I'm aware of the implications of this switch. We don't need 
the features provided by nested. Others may, of course.
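
For reference, the switch amounts to a mapping like the following sketch 
(the field names follow the example above):

"author": {
  "type": "object",
  "properties": {
    "name": { "type": "string" }
  }
}

With type object the sub-field is flattened to the path author.name, which 
multi_match can reference directly; with nested, the field has to be reached 
through a nested query or filter instead.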

Thanks.

-Tom

 





Re: using a nested object field in a multi_match query

2014-12-10 Thread thomas . vaughan


On Monday, August 11, 2014 1:29:56 PM UTC-4, Mike Topper wrote:

 Hello,

 I'm having trouble coming up with how to supply a field within a nested 
 object in the multi_match fields list.  I'm using the multi_match query in 
 order to perform query time field boosting, but something like:


   "query": {
     "multi_match": {
       "query": "China Mieville",
       "operator": "and",
       "fields": [
         "_all", "title^2", "author.name^1.5"
       ]
     }
   }

 doesn't seem to work. The title is boosted fine, but in fact if I take out 
 the _all field then I can see that author.name is never being used. Is 
 there a way to supply nested fields within a multi_match query?


I've just been bitten by this too. Does anyone know how to make this work?

Thanks.

-Tom




Re: Performance issue while indexing lot of documents

2014-11-06 Thread Thomas Matthijs
On Thu, Nov 6, 2014 at 11:09 AM, Moshe Recanati re.mo...@gmail.com wrote:

 // bulkRequest = client.prepareBulk();



Please fix your code so that it clearly sends only 1000 documents per bulk
request. It looks like you are just increasing the size of the bulk request
and executing it over and over.
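
A sketch of the intended pattern; the commented-out line in the quote above 
suggests the builder reset was dropped (the index, type, and documents 
iterable are placeholders):

BulkRequestBuilder bulk = client.prepareBulk();
int count = 0;
for (String json : documents) {
    bulk.add(client.prepareIndex("myindex", "mytype").setSource(json));
    if (++count % 1000 == 0) {
        bulk.execute().actionGet();   // ship this batch of 1000
        bulk = client.prepareBulk();  // start a fresh builder
    }
}
if (bulk.numberOfActions() > 0) {
    bulk.execute().actionGet();       // flush the remainder
}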



Re: Modify the index setting after the index created ? what's the function of search_quote_analyzer ?

2014-10-27 Thread Thomas Christie
Bump, I'm having the same problem.

On Thursday, June 12, 2014 10:32:14 PM UTC-5, Ivan Ji wrote:

 Hi all,

 I want to modify one field's search analyzer from standard to keyword 
 after the index is created. So I try to PUT the mapping:

 $ curl -XPUT 'http://localhost:9200/qindex/main/_mapping' -d '
 {
   "main": {
     "properties": {
       "name": { "type": "string", "index": "analyzed",
                 "index_analyzer": "filename_ngram", "search_analyzer": "keyword" }
     }
   }
 }
 '


 The operation seems to have succeeded. I expected it might conflict: in 
 what situations could a conflict occur? This is my first question.

 Anyway, I then try to get the mapping out (partial):

   "name": {
     "type": "string",
     "index_analyzer": "filename_ngram",
     "search_analyzer": "keyword",
     "include_in_all": true,
     "search_quote_analyzer": "standard"
   }


 So I am wondering: did my operation succeed? And what is the function of 
 search_quote_analyzer? It still remains standard; does that 
 matter?

 Could anyone answer me these questions?

 Cheers,

 Ivan




Floating point precision in response

2014-09-17 Thread Thomas
Hi,

I have a quick question with regard to the response for numeric values. I 
perform a sum aggregation, and when I get the response back from a curl 
request the number is shown as follows:


"aggs": {
  "day_clicks": {
    "sum": {
      "field": "clicks"
    }
  }
}


response:

...
  "doc_count": 384,
  "day_clicks": {
    "value": 2.7372883E7
  },
...

Notice the E7, i.e. scientific notation, instead of just the plain number:

...
  "value": 27372883
...


Has anyone faced a similar case? At what level is this happening: in 
elasticsearch's response, or later? I have noticed in marvel/sense that the 
response comes in this way and the transformation happens client side. Is 
there a way to change that in the response from ES?
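
For what it's worth, 2.7372883E7 and 27372883 denote the same number: JSON 
carries the value as a double, and the E7 form is simply Java's default 
formatting for doubles of that magnitude, as a small illustration shows:

double value = 2.7372883E7;          // as parsed from the JSON response
System.out.println(value);           // prints 2.7372883E7
System.out.printf("%.0f%n", value);  // prints 27372883

So if a plain rendering is wanted, it is the client's number formatting that 
has to change.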

Thank you very much

Thomas



Re: Using ES as a primary datastore.

2014-09-17 Thread Thomas
Hi,

You have to calculate the volume you will keep in one shard first, then 
break your total volume into the number of shards you will maintain, and 
then scale accordingly across a number of nodes; at the least, as your 
volumes grow you should grow your cluster as well.

It is difficult to predict what problems may arise; your case is too 
generic. What will the usage of the cluster be? What queries will you 
perform? Will you mostly index and occasionally query, or will you query 
your data intensively?

Most important, you need to think about how you will partition your data: 
will you have one index, or multiple indices as in the logstash approach?
Maybe check here: https://www.found.no/foundation/sizing-elasticsearch/

For data older than a year, what will you do: delete it? Can you afford to 
lose data? Will you keep backups?

IMHO, these are some of the questions you must answer in order to see 
whether such an approach suits your needs. It comes down to hardware, and 
the structure and partitioning of your data.

Thomas

On Wednesday, 17 September 2014 13:41:55 UTC+3, P Suman wrote:

 Hello,

  We are planning to use ES as a primary datastore. 

 Here is my usecase

  We receive a million transactions per day (all are inserts). 
  Each transaction is around 500KB in size; a transaction has 10 fields, and 
  we should be able to search on all 10 fields. 
  We want to keep around 1 year's worth of data, which comes to around 180TB.

  Can you please let me know any problems that might arise if I use elastic 
  search as the primary datastore?



 Regards,
 Suman








Re: Elasticsearch script execution on missing field

2014-09-17 Thread Thomas
I think the correct way to see if there is a missing field is the following

doc['countryid'].empty == true

Check also:
http://www.elasticsearch.org/guide/en/elasticsearch/reference/current/modules-scripting.html#_document_fields
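Applied to the original facet, that would look something like this (a 
sketch, keeping Manoj's field name and his 1 fallback):

"facets": {
  "loca": {
    "terms": {
      "field": "countryid",
      "script": "doc['countryid'].empty ? 1 : doc['countryid'].value"
    }
  }
}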

btw why such an old version of ES?

Thomas

On Wednesday, 17 September 2014 13:53:08 UTC+3, Manoj wrote:

 I am currently using ES version 0.19. For a feature requirement, I wanted 
 to execute a script on a missing field through a terms facet. The curl I 
 tried is something like below:

 code
 {
   "query": {
     "term": {
       "content": "deep"
     }
   },
   "filter": {
     "and": {
       "filters": [
         {
           "type": {
             "value": "twitter"
           }
         }
       ]
     }
   },
   "facets": {
     "loca": {
       "terms": {
         "field": "countryid",
         "script": "doc['countryid']==null?1:doc['countryid'].value"
       }
     }
   }
 }
 /code

 I assume that missing fields can be accessed by the condition 
 doc['countryid']==null. But it looks like this is not the way to identify 
 the missing field in script :-(

 For which I am always receiving the response as missing:

 code{
   "took" : 1,
   "timed_out" : false,
   "_shards" : {
     "total" : 6,
     "successful" : 6,
     "failed" : 0
   },
   "hits" : {
     "total" : 0,
     "max_score" : null,
     "hits" : [ ]
   },
   "facets" : {
     "loca" : {
       "_type" : "terms",
       "missing" : 1,
       "total" : 0,
       "other" : 0,
       "terms" : [ ]
     }
   }
 }/code

 Could anybody help me get this right?

 Thanks in advance, Manoj




Re: Indexing is becoming slow, what to look for?

2014-09-09 Thread Thomas
Setting this parameter has raised some additional questions:

If I set indices.memory.index_buffer_size on a specific node and not on all 
nodes of the cluster, will this configuration be taken into account by all 
nodes? Is it going to be cluster wide, or will it apply only to index 
operations on that specific node? In other words, do I need to set this on 
all nodes one by one, do a restart, and then see the effects?

Finally, if we index data into an index of 10 shards and I have 5 nodes, 
that means a particular node will index into 2 shards; will 
indices.memory.index_buffer_size then refer to those specific two shards?
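(For reference, this is the setting in question; a sketch of the relevant 
elasticsearch.yml line, where 10% is the 1.x default:)

# elasticsearch.yml, set per node, restart required
indices.memory.index_buffer_size: 20%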

Thank you very much

Thomas

On Friday, 5 September 2014 11:44:42 UTC+3, Thomas wrote:

 Hi,

 I have been performing indexing operations in my elasticsearch cluster for 
 some time now. Suddenly, I have been facing some latency while indexing and 
 I'm trying to find the reason for it. 

 Details:

 I have a custom process which is uploading every interval a number of logs 
 with bulk API. This process was taking about 5-7 minutes every time. For 
 some reason, the last days I noticed that the exact same procedure, same 
 volumes, takes about 15-20 minutes. While manipulating the data I run 
 update operations through scripting (groovy). My cluster is a set of 5 
 nodes, my first impression was that I need to scale therefore I added an 
 extra node. The problem seemed that it was solved but after a day again I 
 face the same issue. 

 Is it possible to give some ideas about what to check, or what seems to be 
 the issue? How is possible to check if a background process is running or 
 creating any issues (expunge etc.)? Does anyone has any similar problems?

 Any help appreciated, let me know what info to share

 ES version is 1.3.1
 JDK is 1.7

 Thanks




Indexing is becoming slow, what to look for?

2014-09-05 Thread Thomas
Hi,

I have been performing indexing operations in my elasticsearch cluster for 
some time now. Suddenly, I have been facing some latency while indexing and 
I'm trying to find the reason for it. 

Details:

I have a custom process which is uploading every interval a number of logs 
with bulk API. This process was taking about 5-7 minutes every time. For 
some reason, the last days I noticed that the exact same procedure, same 
volumes, takes about 15-20 minutes. While manipulating the data I run 
update operations through scripting (groovy). My cluster is a set of 5 
nodes, my first impression was that I need to scale therefore I added an 
extra node. The problem seemed that it was solved but after a day again I 
face the same issue. 

Is it possible to give some ideas about what to check, or what seems to be 
the issue? How is possible to check if a background process is running or 
creating any issues (expunge etc.)? Does anyone has any similar problems?

Any help appreciated, let me know what info to share

ES version is 1.3.1
JDK is 1.7

Thanks



Re: aggregations

2014-09-05 Thread Thomas
What version of ES have you been using? AFAIK, in later versions you can 
control the percentage of heap space to utilize with the update settings 
API. Try to increase it a bit and see what happens; the default is 60%, so 
increase it for example to 70%:

http://www.elasticsearch.org/guide/en/elasticsearch/reference/current/index-modules-fielddata.html#fielddata-circuit-breaker
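Something like the following (a sketch against the 1.x cluster update 
settings API; note the setting was renamed to indices.breaker.fielddata.limit 
in 1.4):

curl -XPUT "localhost:9200/_cluster/settings" -d '{
  "persistent": {
    "indices.fielddata.breaker.limit": "70%"
  }
}'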

T.

On Wednesday, 3 September 2014 19:58:02 UTC+3, navdeep agarwal wrote:

 hi,

 I am a bit new to Elasticsearch. While testing elasticsearch's aggregation 
 feature, I keep hitting data too large errors. I understand that 
 aggregations are very memory intensive, so is there a way to query in ES 
 where one query's output can be fed into an aggregation, so that the number 
 of inputs to the aggregation is limited? I have used filters and queries 
 before the aggregations.

 I have around a 60 GB index on 5 shards.

 Queries I tried:

 GET **/_search
 {
   "query": {
     "term": {
       "file_sha2": {
         "value": 
       }
     }
   },
   "aggs": {
     "top_filename": {
       "max": {
         "field": "portalid"
       }
     }
   }
 }

 ---

 GET /_search
 {
   "aggs": {
     "top filename": {
       "filter": {
         "term": {
           "file_sha2": "xx"
         }
       },
       "aggs": {
         "top_filename": {
           "max": {
             "field": "portalid"
           }
         }
       }
     }
   }
 }


 thanks in advance .
  



Re: Indexing is becoming slow, what to look for?

2014-09-05 Thread Thomas
Thx Michael,

I will read the post in detail and let you know of any findings.

Thomas.

On Friday, 5 September 2014 11:44:42 UTC+3, Thomas wrote:

 Hi,

 I have been performing indexing operations in my elasticsearch cluster for 
 some time now. Suddenly, I have been facing some latency while indexing and 
 I'm trying to find the reason for it. 

 Details:

 I have a custom process which is uploading every interval a number of logs 
 with bulk API. This process was taking about 5-7 minutes every time. For 
 some reason, the last days I noticed that the exact same procedure, same 
 volumes, takes about 15-20 minutes. While manipulating the data I run 
 update operations through scripting (groovy). My cluster is a set of 5 
 nodes, my first impression was that I need to scale therefore I added an 
 extra node. The problem seemed that it was solved but after a day again I 
 face the same issue. 

 Is it possible to give some ideas about what to check, or what seems to be 
 the issue? How is possible to check if a background process is running or 
 creating any issues (expunge etc.)? Does anyone has any similar problems?

 Any help appreciated, let me know what info to share

 ES version is 1.3.1
 JDK is 1.7

 Thanks




Re: Indexing is becoming slow, what to look for?

2014-09-05 Thread Thomas
Hi,

I wanted to clarify something from the blog post you mentioned. You state 
that, based on calculations, we should give at most ~512 MB of indexing 
buffer per active shard. What I wanted to ask is: what do we mean by the 
term "active"? Do you mean the primary only or not?

Thank you again

On Friday, 5 September 2014 11:44:42 UTC+3, Thomas wrote:

 Hi,

 I have been performing indexing operations in my elasticsearch cluster for 
 some time now. Suddenly, I have been facing some latency while indexing and 
 I'm trying to find the reason for it. 

 Details:

 I have a custom process which is uploading every interval a number of logs 
 with bulk API. This process was taking about 5-7 minutes every time. For 
 some reason, the last days I noticed that the exact same procedure, same 
 volumes, takes about 15-20 minutes. While manipulating the data I run 
 update operations through scripting (groovy). My cluster is a set of 5 
 nodes, my first impression was that I need to scale therefore I added an 
 extra node. The problem seemed that it was solved but after a day again I 
 face the same issue. 

 Is it possible to give some ideas about what to check, or what seems to be 
 the issue? How is possible to check if a background process is running or 
 creating any issues (expunge etc.)? Does anyone has any similar problems?

 Any help appreciated, let me know what info to share

 ES version is 1.3.1
 JDK is 1.7

 Thanks




Re: Indexing is becoming slow, what to look for?

2014-09-05 Thread Thomas
Got it, thanks.

On Friday, 5 September 2014 11:44:42 UTC+3, Thomas wrote:

 Hi,

 I have been performing indexing operations in my elasticsearch cluster for 
 some time now. Suddenly, I have been facing some latency while indexing and 
 I'm trying to find the reason for it. 

 Details:

 I have a custom process which is uploading every interval a number of logs 
 with bulk API. This process was taking about 5-7 minutes every time. For 
 some reason, the last days I noticed that the exact same procedure, same 
 volumes, takes about 15-20 minutes. While manipulating the data I run 
 update operations through scripting (groovy). My cluster is a set of 5 
 nodes, my first impression was that I need to scale therefore I added an 
 extra node. The problem seemed that it was solved but after a day again I 
 face the same issue. 

 Is it possible to give some ideas about what to check, or what seems to be 
 the issue? How is possible to check if a background process is running or 
 creating any issues (expunge etc.)? Does anyone has any similar problems?

 Any help appreciated, let me know what info to share

 ES version is 1.3.1
 JDK is 1.7

 Thanks




Cloud-aws version for 1.3.1 of elasticsearch

2014-07-30 Thread Thomas
Hi,

I wanted to ask whether the version of the cloud-aws plugin is 2.1.1 for 
elasticsearch 1.3.1, judging by the github page:
https://github.com/elasticsearch/elasticsearch-cloud-aws/tree/es-1.3

How come the plugin version for elasticsearch 1.3.1 goes backwards? For 
elasticsearch 1.2.x the version of cloud-aws is 2.2.0.
Is this correct?

Thank you very much
Thomas



Re: Integration testing a native script

2014-07-30 Thread Thomas
Hi,

I have tried the same approach and it worked for me, meaning I copy the 
script I want to test and then run my integration test.

I do the following steps:

1) Set up the required paths for elasticsearch

 final Settings settings
     = settingsBuilder()
         .put("http.enabled", true)
         .put("path.conf", confDir)
         .put("path.data", dataDir)
         .put("path.work", workDir)
         .put("path.logs", logsDir)


2) copy your scripts to the appropriate location
3) fire up a local node
 node = nodeBuilder().local(true).settings(settings).clusterName(nodeName).node();
 node.start();

Maybe you first start the node and then add the script; this might not work 
because I think ES scans for new scripts once per minute, and the 
integration test does not allow this to happen. Hence you should first copy 
your script and then start the node.
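Step 2 in code could look something like the following (a sketch; the script 
name and resources path are illustrative, and in 1.x file scripts are picked 
up from path.conf/scripts):

import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.nio.file.StandardCopyOption;

// copy the script under path.conf/scripts before starting the node
Path scriptsDir = Paths.get(confDir, "scripts");
Files.createDirectories(scriptsDir);
Files.copy(Paths.get("src/test/resources/myscript.groovy"),
        scriptsDir.resolve("myscript.groovy"),
        StandardCopyOption.REPLACE_EXISTING);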

Hope it helps

Thomas

On Wednesday, 30 July 2014 12:31:06 UTC+3, Nick T wrote:

 Is there a way to have a native java script accessible in integration 
 tests? In my integration tests I am creating a test node in the /tmp 
 folder. 

 I've tried copying the script to /tmp/plugins/scripts but that was quite 
 hopeful and unfortunately does not work.

 Desperate for help.

 Thanks




Re: Integration testing a native script

2014-07-30 Thread Thomas
I noticed that you mention a native Java script, so have you implemented it 
as a plugin? If so, try the following in your settings:

   final Settings settings
       = settingsBuilder()
           ...
           .put("plugin.types", YourPlugin.class.getName())

Thomas


On Wednesday, 30 July 2014 12:31:06 UTC+3, Nick T wrote:

 Is there a way to have a native java script accessible in integration 
 tests? In my integration tests I am creating a test node in the /tmp 
 folder. 

 I've tried copying the script to /tmp/plugins/scripts but that was quite 
 hopeful and unfortunately does not work.

 Desperate for help.

 Thanks




Re: 1.1.1 to 1.3 upgrade possible?

2014-07-29 Thread Thomas
Thnx Mark,

I can see that, as you mentioned, the new version 1.3.1 has been released.

Thomas

On Monday, 28 July 2014 11:11:57 UTC+3, Thomas wrote:

 Hi,

 I maintain a working cluster which is in version 1.1.1 and I'm planning to 
 upgrade to version 1.3.0 which is released the previous week. I wanted to 
 ask whether it is compatible to upgrade or whether I will have any known 
 issues/problems, what to expect in general.

 Thank you very much
 Thomas




1.1.1 to 1.3 upgrade possible?

2014-07-28 Thread Thomas
Hi,

I maintain a working cluster which is in version 1.1.1 and I'm planning to 
upgrade to version 1.3.0 which is released the previous week. I wanted to 
ask whether it is compatible to upgrade or whether I will have any known 
issues/problems, what to expect in general.

Thank you very much
Thomas



Re: 1.1.1 to 1.3 upgrade possible?

2014-07-28 Thread Thomas
Great,

thanks 4 your reply Mark

On Monday, 28 July 2014 11:11:57 UTC+3, Thomas wrote:

 Hi,

 I maintain a working cluster which is in version 1.1.1 and I'm planning to 
 upgrade to version 1.3.0 which is released the previous week. I wanted to 
 ask whether it is compatible to upgrade or whether I will have any known 
 issues/problems, what to expect in general.

 Thank you very much
 Thomas




Aggregation on parent/child documents

2014-07-25 Thread Thomas
Hi,

I wanted to ask whether is possible to perform aggregations combining 
parent/child documents, something similar with the nested aggregation and 
the reverse nested aggregation. It would be very helpful to have the 
ability to create for instance buckets based on parent document fields and 
get back aggregations that contain fields of both parent and children 
documents combined.

Any thoughts, future features to be added in the near releases, related to 
the above?

Thank you
Thomas



Re: Aggregation on parent/child documents

2014-07-25 Thread Thomas
Hi Adrien and thank you for the reply,

This is exactly what i had in mind alongside with the reversed search 
equivalent with the reverse_nested, this is planed for version 1.4.0 
onwards as i see, will keep track of any updates on this, thanks

Thomas

On Friday, 25 July 2014 14:54:50 UTC+3, Thomas wrote:

 Hi,

 I wanted to ask whether is possible to perform aggregations combining 
 parent/child documents, something similar with the nested aggregation and 
 the reverse nested aggregation. It would be very helpful to have the 
 ability to create for instance buckets based on parent document fields and 
 get back aggregations that contain fields of both parent and children 
 documents combined.

 Any thoughts, future features to be added in the near releases, related to 
 the above?

 Thank you
 Thomas




Re: elasticsearch init script for centos or rhel ?

2014-07-16 Thread Thomas Kuther
The one from the elasticsearch CentOS rpm repository works fine here on EL6.

http://www.elasticsearch.org/guide/en/elasticsearch/reference/current/setup-repositories.html
(there are also 1.0 and 1.1 repos, simply adjust the baseurl)

The source is here:
https://github.com/elasticsearch/elasticsearch/blob/master/src/rpm/init.d/elasticsearch
...but I recommend the rpm from the repo because of /etc/sysconfig, the 
install locations, etc.; much easier that way.

~Tom

On 16.07.2014 09:12, Aesop Wolf wrote:
 Did you ever find a script that works on CentOS? I'm also looking for one.

 On Friday, March 14, 2014 9:18:04 AM UTC-7, Dominic Nicholas wrote:

 Thanks. 
 Does anyone know of a version that
 uses  /etc/rc.d/init.d/functions instead of /lib/lsb, that would
 work on CentOS and work with elasticsearch 1.0.1 ?
 Dom

 On Friday, March 14, 2014 9:24:12 AM UTC-4, David Pilato wrote:

  May be this?
  https://github.com/elasticsearch/elasticsearch/blob/master/src/deb/init.d/elasticsearch

 --
 David ;-)
 Twitter : @dadoonet / @elasticsearchfr / @scrutmydocs


  On 14 March 2014 at 14:19, Dominic Nicholas dominic.s...@gmail.com wrote:

 Hi - can someone please point me to an /etc/init.d script for
 elasticsearch 1.0.1 for CentOS or RHEL ?

 Thanks




Re: Setting id of document with elasticsearch-hadoop that is not in source document

2014-07-11 Thread Brian Thomas
I was just curious if there was a way of doing this without adding the 
field; I can add it if necessary.

For alternatives: what if, in addition to es.mapping.id, another property 
were available, like es.mapping.id.include.in.src, where you could specify 
whether the mapped field actually gets included in the source document? In 
elasticsearch you can create and update documents without having to include 
the id in the source document, so I think it would make sense to be able to 
do that with elasticsearch-hadoop also.
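In the meantime the documented route is to carry the id in the document and 
point es.mapping.id at it (a sketch; docId is a hypothetical field added to 
the MapWritable):

// the field named here must exist in each MapWritable document
conf.set("es.mapping.id", "docId");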

On Thursday, July 10, 2014 5:49:18 PM UTC-4, Costin Leau wrote:

 You need to specify the id of the document you want to update somehow. 
 Since in es-hadoop things are batch focused, each 
 doc needs its own id specified somehow hence the use of 'es.mapping.id' 
 to indicate its value. 
 Is there a reason why this approach does not work for you - any 
 alternatives that you thought of? 

 Cheers, 

 On 7/7/14 10:48 PM, Brian Thomas wrote: 
  I am trying to update an elasticsearch index using elasticsearch-hadoop. 
  I am aware of the *es.mapping.id* 
  configuration where you can specify that field in the document to use as 
 an id, but in my case the source document does 
  not have the id (I used elasticsearch's autogenerated id when indexing 
 the document).  Is it possible to specify the id 
  to update without having to add a new field to the MapWritable object? 
  
  

 -- 
 Costin 




Re: Setting id of document with elasticsearch-hadoop that is not in source document

2014-07-11 Thread Brian Thomas
I was just curious if there was a way of doing this without adding the 
field; I can add it if necessary.

For alternatives: what if, in addition to es.mapping.id, another property 
were available, like es.mapping.id.exclude, that would not include the id 
field in the source document? In elasticsearch you can create and update 
documents without having to include the id in the source document, so I 
think it would make sense to be able to do that with elasticsearch-hadoop 
also.

On Thursday, July 10, 2014 5:49:18 PM UTC-4, Costin Leau wrote:

 You need to specify the id of the document you want to update somehow. 
 Since in es-hadoop things are batch focused, each 
 doc needs its own id specified somehow hence the use of 'es.mapping.id' 
 to indicate its value. 
 Is there a reason why this approach does not work for you - any 
 alternatives that you thought of? 

 Cheers, 

 On 7/7/14 10:48 PM, Brian Thomas wrote: 
  I am trying to update an elasticsearch index using elasticsearch-hadoop. 
  I am aware of the *es.mapping.id* 
  configuration where you can specify that field in the document to use as 
 an id, but in my case the source document does 
  not have the id (I used elasticsearch's autogenerated id when indexing 
 the document).  Is it possible to specify the id 
  to update without having to add a new field to the MapWritable object? 
  
  

 -- 
 Costin 




Re: java.lang.NoSuchFieldError: ALLOW_UNQUOTED_FIELD_NAMES when trying to query elasticsearch using spark

2014-07-07 Thread Brian Thomas
Here is the gradle build I was using originally:

apply plugin: 'java'
apply plugin: 'eclipse'

sourceCompatibility = 1.7
version = '0.0.1'
group = 'com.spark.testing'

repositories {
mavenCentral()
}

dependencies {
compile 'org.apache.spark:spark-core_2.10:1.0.0'
compile 'edu.stanford.nlp:stanford-corenlp:3.3.1'
compile group: 'edu.stanford.nlp', name: 'stanford-corenlp', version: 
'3.3.1', classifier:'models'
compile files('lib/elasticsearch-hadoop-2.0.0.jar')
testCompile 'junit:junit:4.+'
testCompile group: 'com.github.tlrx', name: 'elasticsearch-test', version: '1.2.1'
}


When I ran dependencyInsight on jackson, I got the following output:

C:\dev\workspace\SparkProjectgradle dependencyInsight --dependency 
jackson-core

:dependencyInsight
com.fasterxml.jackson.core:jackson-core:2.3.0
\--- com.fasterxml.jackson.core:jackson-databind:2.3.0
 +--- org.json4s:json4s-jackson_2.10:3.2.6
 |\--- org.apache.spark:spark-core_2.10:1.0.0
 | \--- compile
 \--- com.codahale.metrics:metrics-json:3.0.0
  \--- org.apache.spark:spark-core_2.10:1.0.0 (*)

org.codehaus.jackson:jackson-core-asl:1.0.1
\--- org.codehaus.jackson:jackson-mapper-asl:1.0.1
 \--- org.apache.hadoop:hadoop-core:1.0.4
  \--- org.apache.hadoop:hadoop-client:1.0.4
   \--- org.apache.spark:spark-core_2.10:1.0.0
\--- compile

Version 1.0.1 of jackson-core-asl does not have the field 
ALLOW_UNQUOTED_FIELD_NAMES, but later versions of it do.
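An alternative that should also work in Gradle (a sketch; forcing the newer 
versions instead of declaring a direct dependency):

configurations.all {
    resolutionStrategy {
        // hadoop-core 1.0.4 drags in jackson-*-asl 1.0.1, which predates
        // ALLOW_UNQUOTED_FIELD_NAMES; force a newer release instead
        force 'org.codehaus.jackson:jackson-core-asl:1.9.13',
              'org.codehaus.jackson:jackson-mapper-asl:1.9.13'
    }
}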

On Sunday, July 6, 2014 4:28:56 PM UTC-4, Costin Leau wrote:

 Hi,

 Glad to see you sorted out the problem. Out of curiosity what version of 
 jackson were you using and what was pulling it in? Can you share your maven 
 pom/gradle build?


 On Sun, Jul 6, 2014 at 10:27 PM, Brian Thomas brianjt...@gmail.com wrote:

 I figured it out, dependency issue in my classpath.  Maven was pulling 
 down a very old version of the jackson jar.  I added the following line to 
 my dependencies and the error went away:

 compile 'org.codehaus.jackson:jackson-mapper-asl:1.9.13'


 On Friday, July 4, 2014 3:22:30 PM UTC-4, Brian Thomas wrote:

  I am trying to test querying elasticsearch from Apache Spark using 
 elasticsearch-hadoop. I am just trying to run a query against the 
 elasticsearch server and return the count of results.

 Below is my test class using the Java API:

 import org.apache.hadoop.conf.Configuration;
 import org.apache.hadoop.io.MapWritable;
 import org.apache.hadoop.io.Text;
 import org.apache.spark.SparkConf;
 import org.apache.spark.api.java.JavaPairRDD;
 import org.apache.spark.api.java.JavaSparkContext;
 import org.apache.spark.serializer.KryoSerializer;
 import org.elasticsearch.hadoop.mr.EsInputFormat;

 import scala.Tuple2;

 public class ElasticsearchSparkQuery{

  public static int query(String masterUrl, String elasticsearchHostPort) {
      SparkConf sparkConfig = new SparkConf().setAppName("ESQuery").setMaster(masterUrl);
      sparkConfig.set("spark.serializer", KryoSerializer.class.getName());
      JavaSparkContext sparkContext = new JavaSparkContext(sparkConfig);

      Configuration conf = new Configuration();
      conf.setBoolean("mapred.map.tasks.speculative.execution", false);
      conf.setBoolean("mapred.reduce.tasks.speculative.execution", false);
      conf.set("es.nodes", elasticsearchHostPort);
      conf.set("es.resource", "media/docs");
      conf.set("es.query", "?q=*");

      JavaPairRDD<Text, MapWritable> esRDD =
          sparkContext.newAPIHadoopRDD(conf, EsInputFormat.class, Text.class, MapWritable.class);
      return (int) esRDD.count();
  }
  }


 When I try to run this I get the following error:


 4/07/04 14:58:07 INFO executor.Executor: Running task ID 0
 14/07/04 14:58:07 INFO storage.BlockManager: Found block broadcast_0 
 locally
 14/07/04 14:58:07 INFO rdd.NewHadoopRDD: Input split: ShardInputSplit 
 [node=[5UATWUzmTUuNzhmGxXWy_w/S'byll|10.45.71.152:9200],shard=0]
 14/07/04 14:58:07 WARN mr.EsInputFormat: Cannot determine task id...
 14/07/04 14:58:07 ERROR executor.Executor: Exception in task ID 0
 java.lang.NoSuchFieldError: ALLOW_UNQUOTED_FIELD_NAMES
 at org.elasticsearch.hadoop.serialization.json.
 JacksonJsonParser.clinit(JacksonJsonParser.java:38)
 at org.elasticsearch.hadoop.serialization.ScrollReader.
 read(ScrollReader.java:75)
 at org.elasticsearch.hadoop.rest.RestRepository.scroll(
 RestRepository.java:267)
 at org.elasticsearch.hadoop.rest.ScrollQuery.hasNext(
 ScrollQuery.java:75)
 at org.elasticsearch.hadoop.mr.EsInputFormat$ShardRecordReader.next(
 EsInputFormat.java:319)
 at org.elasticsearch.hadoop.mr.EsInputFormat$ShardRecordReader.
 nextKeyValue(EsInputFormat.java:255)
 at org.apache.spark.rdd.NewHadoopRDD$$anon$1.hasNext(
 NewHadoopRDD.scala:122)
 at org.apache.spark.InterruptibleIterator.hasNext(
 InterruptibleIterator.scala:39)
 at org.apache.spark.util.Utils$.getIteratorSize

Setting id of document with elasticsearch-hadoop that is not in source document

2014-07-07 Thread Brian Thomas
I am trying to update an elasticsearch index using elasticsearch-hadoop. I 
am aware of the *es.mapping.id* configuration, where you can specify a field 
in the document to use as the id, but in my case the source document does 
not have the id (I used elasticsearch's autogenerated id when indexing the 
document). Is it possible to specify the id to update without having to add 
a new field to the MapWritable object?




Re: java.lang.NoSuchFieldError: ALLOW_UNQUOTED_FIELD_NAMES when trying to query elasticsearch using spark

2014-07-06 Thread Brian Thomas
I figured it out, dependency issue in my classpath.  Maven was pulling down 
a very old version of the jackson jar.  I added the following line to my 
dependencies and the error went away:

compile 'org.codehaus.jackson:jackson-mapper-asl:1.9.13'

On Friday, July 4, 2014 3:22:30 PM UTC-4, Brian Thomas wrote:

  I am trying to test querying elasticsearch from Apache Spark using 
 elasticsearch-hadoop. I am just trying to run a query against the 
 elasticsearch server and return the count of results.

 Below is my test class using the Java API:

 import org.apache.hadoop.conf.Configuration;
 import org.apache.hadoop.io.MapWritable;
 import org.apache.hadoop.io.Text;
 import org.apache.spark.SparkConf;
 import org.apache.spark.api.java.JavaPairRDD;
 import org.apache.spark.api.java.JavaSparkContext;
 import org.apache.spark.serializer.KryoSerializer;
 import org.elasticsearch.hadoop.mr.EsInputFormat;

 import scala.Tuple2;

 public class ElasticsearchSparkQuery{

  public static int query(String masterUrl, String elasticsearchHostPort) {
      SparkConf sparkConfig = new SparkConf().setAppName("ESQuery").setMaster(masterUrl);
      sparkConfig.set("spark.serializer", KryoSerializer.class.getName());
      JavaSparkContext sparkContext = new JavaSparkContext(sparkConfig);

      Configuration conf = new Configuration();
      conf.setBoolean("mapred.map.tasks.speculative.execution", false);
      conf.setBoolean("mapred.reduce.tasks.speculative.execution", false);
      conf.set("es.nodes", elasticsearchHostPort);
      conf.set("es.resource", "media/docs");
      conf.set("es.query", "?q=*");

      JavaPairRDD<Text, MapWritable> esRDD =
          sparkContext.newAPIHadoopRDD(conf, EsInputFormat.class, Text.class, MapWritable.class);
      return (int) esRDD.count();
  }
  }


 When I try to run this I get the following error:


 4/07/04 14:58:07 INFO executor.Executor: Running task ID 0
 14/07/04 14:58:07 INFO storage.BlockManager: Found block broadcast_0 
 locally
 14/07/04 14:58:07 INFO rdd.NewHadoopRDD: Input split: ShardInputSplit 
 [node=[5UATWUzmTUuNzhmGxXWy_w/S'byll|10.45.71.152:9200],shard=0]
 14/07/04 14:58:07 WARN mr.EsInputFormat: Cannot determine task id...
 14/07/04 14:58:07 ERROR executor.Executor: Exception in task ID 0
 java.lang.NoSuchFieldError: ALLOW_UNQUOTED_FIELD_NAMES
 at 
 org.elasticsearch.hadoop.serialization.json.JacksonJsonParser.clinit(JacksonJsonParser.java:38)
 at 
 org.elasticsearch.hadoop.serialization.ScrollReader.read(ScrollReader.java:75)
 at 
 org.elasticsearch.hadoop.rest.RestRepository.scroll(RestRepository.java:267)
 at 
 org.elasticsearch.hadoop.rest.ScrollQuery.hasNext(ScrollQuery.java:75)
 at 
 org.elasticsearch.hadoop.mr.EsInputFormat$ShardRecordReader.next(EsInputFormat.java:319)
 at 
 org.elasticsearch.hadoop.mr.EsInputFormat$ShardRecordReader.nextKeyValue(EsInputFormat.java:255)
 at 
 org.apache.spark.rdd.NewHadoopRDD$$anon$1.hasNext(NewHadoopRDD.scala:122)
 at 
 org.apache.spark.InterruptibleIterator.hasNext(InterruptibleIterator.scala:39)
 at org.apache.spark.util.Utils$.getIteratorSize(Utils.scala:1014)
 at org.apache.spark.rdd.RDD$$anonfun$count$1.apply(RDD.scala:847)
 at org.apache.spark.rdd.RDD$$anonfun$count$1.apply(RDD.scala:847)
 at 
 org.apache.spark.SparkContext$$anonfun$runJob$4.apply(SparkContext.scala:1080)
 at 
 org.apache.spark.SparkContext$$anonfun$runJob$4.apply(SparkContext.scala:1080)
 at org.apache.spark.scheduler.ResultTask.runTask(ResultTask.scala:111)
 at org.apache.spark.scheduler.Task.run(Task.scala:51)
 at 
 org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:187)
 at 
 java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
 at 
 java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
 at java.lang.Thread.run(Thread.java:745)

 Has anyone run into this issue with the JacksonJsonParser?





java.lang.NoSuchFieldError: ALLOW_UNQUOTED_FIELD_NAMES when trying to query elasticsearch using spark

2014-07-04 Thread Brian Thomas
 I am trying to test querying elasticsearch from Apache Spark using 
elasticsearch-hadoop. I am just trying to run a query against the 
elasticsearch server and return the count of results.

Below is my test class using the Java API:

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.io.MapWritable;
import org.apache.hadoop.io.Text;
import org.apache.spark.SparkConf;
import org.apache.spark.api.java.JavaPairRDD;
import org.apache.spark.api.java.JavaSparkContext;
import org.apache.spark.serializer.KryoSerializer;
import org.elasticsearch.hadoop.mr.EsInputFormat;

import scala.Tuple2;

public class ElasticsearchSparkQuery{

public static int query(String masterUrl, String elasticsearchHostPort) {
    SparkConf sparkConfig = new SparkConf().setAppName("ESQuery").setMaster(masterUrl);
    sparkConfig.set("spark.serializer", KryoSerializer.class.getName());
    JavaSparkContext sparkContext = new JavaSparkContext(sparkConfig);

    Configuration conf = new Configuration();
    conf.setBoolean("mapred.map.tasks.speculative.execution", false);
    conf.setBoolean("mapred.reduce.tasks.speculative.execution", false);
    conf.set("es.nodes", elasticsearchHostPort);
    conf.set("es.resource", "media/docs");
    conf.set("es.query", "?q=*");

    JavaPairRDD<Text, MapWritable> esRDD =
        sparkContext.newAPIHadoopRDD(conf, EsInputFormat.class, Text.class, MapWritable.class);
    return (int) esRDD.count();
}
}


When I try to run this I get the following error:


4/07/04 14:58:07 INFO executor.Executor: Running task ID 0
14/07/04 14:58:07 INFO storage.BlockManager: Found block broadcast_0 locally
14/07/04 14:58:07 INFO rdd.NewHadoopRDD: Input split: ShardInputSplit 
[node=[5UATWUzmTUuNzhmGxXWy_w/S'byll|10.45.71.152:9200],shard=0]
14/07/04 14:58:07 WARN mr.EsInputFormat: Cannot determine task id...
14/07/04 14:58:07 ERROR executor.Executor: Exception in task ID 0
java.lang.NoSuchFieldError: ALLOW_UNQUOTED_FIELD_NAMES
at 
org.elasticsearch.hadoop.serialization.json.JacksonJsonParser.clinit(JacksonJsonParser.java:38)
at 
org.elasticsearch.hadoop.serialization.ScrollReader.read(ScrollReader.java:75)
at 
org.elasticsearch.hadoop.rest.RestRepository.scroll(RestRepository.java:267)
at 
org.elasticsearch.hadoop.rest.ScrollQuery.hasNext(ScrollQuery.java:75)
at 
org.elasticsearch.hadoop.mr.EsInputFormat$ShardRecordReader.next(EsInputFormat.java:319)
at 
org.elasticsearch.hadoop.mr.EsInputFormat$ShardRecordReader.nextKeyValue(EsInputFormat.java:255)
at 
org.apache.spark.rdd.NewHadoopRDD$$anon$1.hasNext(NewHadoopRDD.scala:122)
at 
org.apache.spark.InterruptibleIterator.hasNext(InterruptibleIterator.scala:39)
at org.apache.spark.util.Utils$.getIteratorSize(Utils.scala:1014)
at org.apache.spark.rdd.RDD$$anonfun$count$1.apply(RDD.scala:847)
at org.apache.spark.rdd.RDD$$anonfun$count$1.apply(RDD.scala:847)
at 
org.apache.spark.SparkContext$$anonfun$runJob$4.apply(SparkContext.scala:1080)
at 
org.apache.spark.SparkContext$$anonfun$runJob$4.apply(SparkContext.scala:1080)
at org.apache.spark.scheduler.ResultTask.runTask(ResultTask.scala:111)
at org.apache.spark.scheduler.Task.run(Task.scala:51)
at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:187)
at 
java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
at 
java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
at java.lang.Thread.run(Thread.java:745)

Has anyone run into this issue with the JacksonJsonParser?



Re: Proper parsing of String values like 1m, 1q HOUR etc.

2014-06-25 Thread Thomas
Hi,

Thanks again for your time. What I'm trying to do is to generate, for 
example, the time in milliseconds the same way the elasticsearch core does 
when you pass the 1q interval into the date histogram. I'm trying to 
simulate the date histogram's interval without using the date histogram, if 
possible. What is the one-liner of code (if you can call it that) that 
transforms 1q into milliseconds such that elasticsearch can produce the date 
histogram intervals? I compare time intervals against the date histogram, 
and I want them to be exactly the same.

And please allow me to ask one more question: since elasticsearch uses Joda, 
is the start of the week always considered to be Monday, independently of 
the timezone?

Thanks!!

Thomas

On Tuesday, 17 June 2014 18:31:37 UTC+3, Thomas wrote:

 Hi,

 I was wondering whether there is a proper Utility class to parse the given 
 values and get the duration in milliseconds probably for values such as 1m 
 (which means 1 minute) 1q (which means 1 quarter) etc.

 I have found that elasticsearch utilizes class TimeValue but it only 
 parses up to week, and values such as WEEK, HOUR are not accepted. So is in 
 elasticsearch source any utility class that does the job ? (for Histograms, 
 ranges wherever is needed)

 Thank you
 Thomas





Re: Proper parsing of String values like 1m, 1q HOUR etc.

2014-06-24 Thread Thomas
Hi Brian,

Thanks for your reply. I understand your point, but if you check the source 
code of TimeValue it supports neither the quarter nor the year, so I was 
wondering what class supports the transformation of the string 1q (or 1y) 
into milliseconds, if any.
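My current understanding, which may be part of the answer: a quarter or a 
year is not a fixed number of milliseconds, so the date histogram has to 
round on calendar boundaries instead of adding a constant duration. A 
Joda-based sketch of that kind of rounding (my own illustration, not ES 
code):

import org.joda.time.DateTime;
import org.joda.time.DateTimeZone;

public class QuarterRounding {
    // round a timestamp down to the start of its quarter, the way a
    // calendar-aware "1q" interval behaves
    public static DateTime quarterStart(DateTime t) {
        int firstMonthOfQuarter = ((t.getMonthOfYear() - 1) / 3) * 3 + 1;
        return new DateTime(t.getYear(), firstMonthOfQuarter, 1, 0, 0, t.getZone());
    }

    public static void main(String[] args) {
        DateTime t = new DateTime(2014, 6, 24, 15, 30, DateTimeZone.UTC);
        System.out.println(quarterStart(t)); // 2014-04-01T00:00:00.000Z
    }
}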

Thanks

On Tuesday, 17 June 2014 18:31:37 UTC+3, Thomas wrote:

 Hi,

 I was wondering whether there is a proper Utility class to parse the given 
 values and get the duration in milliseconds probably for values such as 1m 
 (which means 1 minute) 1q (which means 1 quarter) etc.

 I have found that elasticsearch utilizes class TimeValue but it only 
 parses up to week, and values such as WEEK, HOUR are not accepted. So is in 
 elasticsearch source any utility class that does the job ? (for Histograms, 
 ranges wherever is needed)

 Thank you
 Thomas





Aggregation Framework, possible to get distribution of requests per user

2014-06-24 Thread Thomas
Hi,

I wanted to ask whether it is possible to get with the aggregation 
framework the distribution of one specific type of documents sent per user, 
I'm interested for occurrences of documents per user, e.g. :

1000 users sent 1 document 
500 users sent 2 documents
X number of unique users sent Y documents (each)
etc.

on each document i index the user_id

Is there a way to support such a query, or partially support it (get the 
first 10 rows of this kind of list, not the exhaustive one)? Can you give me 
a hint?

Thanks



Re: Aggregation Framework, possible to get distribution of requests per user

2014-06-24 Thread Thomas
Hi David 

Thank you for your reply. So, based on your suggestion, I should maintain a 
document (e.g. per user) with some aggregated values and update it as we 
move along with the indexing of our data, correct?

This, though, would only give me totals; I cannot apply something like a 
range. I also found a similar discussion here:
https://groups.google.com/forum/#!msg/elasticsearch/UsrCG2Abj-A/IDO9DX_PoQwJ.
Maybe something similar to the terms and histogram aggregations could 
support this logic, for example:

{
    "aggs" : {
        "requests_distribution" : {
            "distribution" : {
                "field" : "user_id",
                "interval" : 50
            }
        }
    }
}

and the result could be:

{
    "aggregations": {
        "requests_distribution" : {
            "buckets": [
                {
                    "key": 0,
                    "doc_count": 2
                },
                {
                    "key": 50,
                    "doc_count": 400
                },
                {
                    "key": 150,
                    "doc_count": 30
                }
            ]
        }
    }
}

Where each key represents a range of documents per user, e.g. the first 
bucket would mean that 2 unique users sent between 0 and 50 documents each, 
etc.

Just an idea

Thanks
Thomas

On Tuesday, 24 June 2014 13:32:13 UTC+3, Thomas wrote:

 Hi,

 I wanted to ask whether it is possible to get with the aggregation 
 framework the distribution of one specific type of documents sent per user, 
 I'm interested for occurrences of documents per user, e.g. :

 1000 users sent 1 document 
 500 ussers  sent 2 documents
 X number of unique users sent Y documents (each)
 etc.

 on each document i index the user_id

 Is there a way to support such a query, or partially support it? get the 
 first 10 rows of this type of list not the exhaustive list. Can you give me 
 some hint? 

 Thanks




Re: Aggregation Framework, possible to get distribution of requests per user

2014-06-24 Thread Thomas
My mistake sorry,

Here is an example:

I have the request document:

"request": {
    "dynamic" : "strict",
    "properties" : {
        "time" : {
            "format" : "dateOptionalTime",
            "type" : "date"
        },
        "user_id" : {
            "index" : "not_analyzed",
            "type" : "string"
        },
        "country" : {
            "index" : "not_analyzed",
            "type" : "string"
        }
    }
}

I want to find the number of (unique) user_ids that have X number of 
documents, e.g. for country US, and ideally I need the full list e.g.:


1000 users have 43 documents
..
100 users have 234 documents
150 users have 500 documents
etc..

In other words, the distribution of documents (requests) per unique user 
count. Of course I understand that this is a pretty heavy operation in terms 
of memory, but we could limit it to the top 100 rows for instance, or find a 
workaround.
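A partial workaround I could live with (a sketch; index and type names are 
illustrative): a plain terms aggregation on user_id returns the document 
count for the top N users, and the users-per-count histogram can then be 
built client side:

curl -XPOST "localhost:9200/myindex/request/_search?search_type=count" -d '{
  "aggs": {
    "docs_per_user": {
      "terms": {
        "field": "user_id",
        "size": 100
      }
    }
  }
}'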

Thanks again for your time
Thomas

On Tuesday, 24 June 2014 13:32:13 UTC+3, Thomas wrote:

 Hi,

 I wanted to ask whether it is possible to get with the aggregation 
 framework the distribution of one specific type of documents sent per user, 
 I'm interested for occurrences of documents per user, e.g. :

 1000 users sent 1 document 
 500 ussers  sent 2 documents
 X number of unique users sent Y documents (each)
 etc.

 on each document i index the user_id

 Is there a way to support such a query, or partially support it? get the 
 first 10 rows of this type of list not the exhaustive list. Can you give me 
 some hint? 

 Thanks




Re: Splunk vs. Elastic search performance?

2014-06-19 Thread Thomas Paulsen
We had a 2.2TB/day installation of Splunk and ran it on VMware with 12 
indexers and 2 search heads. Each indexer had 1000 IOPS guaranteed. The 
system is slow but OK to use.

We tried Elasticsearch and were able to get the same performance with the 
same number of machines. Unfortunately, with Elasticsearch you need almost 
double the storage, plus a LOT of patience to make it run. It took us six 
months to set it up properly, and even now the system is quite buggy and 
unstable, and from time to time we lose data with Elasticsearch.

I don't recommend ELK for a critical production system; for just dev work it 
is OK, if you don't mind the hassle of setting it up and operating it. The 
costs you save by not buying a Splunk license you have to invest in 
consultants to get it up and running. Our dev teams hate Elasticsearch and 
prefer Splunk.

On Saturday, 19 April 2014 at 00:07:44 UTC+2, Mark Walkom wrote:

 That's a lot of data! I don't know of any installations that big but 
 someone else might.

 What sort of infrastructure are you running splunk on now, what's your 
 current and expected retention?

 Regards,
 Mark Walkom

 Infrastructure Engineer
 Campaign Monitor
 email: ma...@campaignmonitor.com
 web: www.campaignmonitor.com


 On 19 April 2014 07:33, Frank Flynn faultle...@gmail.com wrote:

 We have a large Splunk instance. We load about 1.25 TB of logs a day. We 
 have about 1,300 loaders (servers that collect and load logs; they may do 
 other things too).

 As I look at Elasticsearch / Logstash / Kibana does anyone know of a 
 performance comparison guide?  Should I expect to run on very similar 
 hardware?  More? or Less?

 Sure it depends on exactly what we're doing, the exact queries and the 
 frequency we'd run them but I'm trying to get any kind of idea before we 
 start.

 Are there any white papers or other documents about switching?  It seems 
 an obvious choice but I can only find very little performance comparisons 
 (I did see that Elasticsearch just hired the former VP of Products at 
 Splunk, Gaurav Gupta - but there were few numbers in that article either).

 Thanks,
 Frank







Need help, multiple aggregations with filters extremely slow, where to look for optimizations?

2014-06-13 Thread Thomas
Hi,

I'm facing a performance issue with some aggregations I perform, and I need 
your help if possible:

I have two documents, the *request* and the *event*. The request is the 
parent of the event. Below is a (sample) mapping:

"event" : {
    "dynamic" : "strict",
    "_parent" : {
        "type" : "request"
    },
    "properties" : {
        "event_time" : {
            "format" : "dateOptionalTime",
            "type" : "date"
        },
        "count" : {
            "type" : "integer"
        },
        "event" : {
            "index" : "not_analyzed",
            "type" : "string"
        }
    }
}

"request" : {
    "dynamic" : "strict",
    "_id" : {
        "path" : "uniqueId"
    },
    "properties" : {
        "uniqueId" : {
            "index" : "not_analyzed",
            "type" : "string"
        },
        "user" : {
            "index" : "not_analyzed",
            "type" : "string"
        },
        "code" : {
            "type" : "integer"
        },
        "country" : {
            "index" : "not_analyzed",
            "type" : "string"
        },
        "city" : {
            "index" : "not_analyzed",
            "type" : "string"
        }
    }
}

My cluster is becoming really big (almost 2 TB of data with billions of 
documents) and I maintain one index per day, whereas I occasionally delete 
old indices. My daily index is about 20 GB big. The version of Elasticsearch 
that I use is 1.1.1. 

My problems start when I want to get some aggregations of events with some 
criteria which is applied to the parent request document. For example, count 
the events of type *click* for country = "US" and code = 12. What I was 
initially doing was to generate a scriptFilter for the request document (in 
Groovy) and I was adding multiple aggregations in one search request. This 
ended up being very slow, so I removed the scripting logic and implemented 
my logic in Java code.

What seemed to be solved on my local machine changed nothing when I got back 
to the cluster. Again my app performs really, really poorly: 
I get more than 10 seconds to perform a search with ~10 sub-aggregations.

What seems strange is that I notice that the cluster is pretty OK with 
regard to load average, CPU, etc. 

Any hints on where to look to solve this and identify the 
bottleneck?

*Ask for any additional information*; I didn't want to make this 
post too long to read.
Thank you

-- 
You received this message because you are subscribed to the Google Groups 
elasticsearch group.
To unsubscribe from this group and stop receiving emails from it, send an email 
to elasticsearch+unsubscr...@googlegroups.com.
To view this discussion on the web visit 
https://groups.google.com/d/msgid/elasticsearch/8134f5b0-f947-406f-ab57-c44c6c82ce66%40googlegroups.com.
For more options, visit https://groups.google.com/d/optout.


Re: Need help, multiple aggregations with filters extremely slow, where to look for optimizations?

2014-06-13 Thread Thomas
Below is an example aggregation I perform; are there any optimizations I can 
make? Maybe disabling some features I do not need, etc.

curl -XPOST "http://localhost:9200/logs-idx.20140613/event/_search?search_type=count" -d '
{
  "aggs": {
    "f1": {
      "filter": {
        "or": [
          {
            "and": [
              {
                "has_parent": {
                  "type": "request",
                  "filter": {
                    "and": {
                      "filters": [
                        { "term": { "country": "US" } },
                        { "term": { "city": "NY" } },
                        { "term": { "code": 12 } }
                      ]
                    }
                  }
                }
              },
              {
                "range": {
                  "event_time": {
                    "gte": "2014-06-13T10:00:00",
                    "lt": "2014-06-13T11:00:00"
                  }
                }
              }
            ]
          },
          {
            "and": [
              {
                "has_parent": {
                  "type": "request",
                  "filter": {
                    "and": {
                      "filters": [
                        { "term": { "country": "US" } },
                        { "term": { "city": "NY" } },
                        { "term": { "code": 12 } },
                        {
                          "range": {
                            "request_time": {
                              "gte": "2014-06-13T10:00:00",
                              "lt": "2014-06-13T11:00:00"
                            }
                          }
                        }
                      ]
                    }
                  }
                }
              },
              {
                "range": {
                  "event_time": { "lt": "2014-06-13T10:00:00" }
                }
              }
            ]
          }
        ]
      },
      "aggs": {
        "per_interval": {
          "date_histogram": {
            "field": "event_time",
            "interval": "minute"
          },
          "aggs": {
            "metrics": {
              "terms": { "field": "event", "size": 10 }
            }
          }
        }
      }
    }
  }
}'


On Friday, 13 June 2014 10:09:46 UTC+3, Thomas wrote:

 Hi,

 I'm facing a performance issue with some aggregations I perform, and I 
 need your help if possible:

 I have to documents, the *request* and the *event*. The request is the 
 parent of the event. Below is a (sample) mapping

 "event" : {
     "dynamic" : "strict",
     "_parent" : {
         "type" : "request"
     },
     "properties" : {
         "event_time" : {
             "format" : "dateOptionalTime",
             "type" : "date"
         },
         "count" : {
             "type" : "integer"
         },
         "event" : {
             "index" : "not_analyzed",
             "type" : "string"
         }
     }
 }

 "request" : {
     "dynamic" : "strict",
     "_id" : {
         "path" : "uniqueId"
     },
     "properties" : {
         "uniqueId" : {
             "index" : "not_analyzed",
             "type" : "string"
         },
         "user" : {
             "index" : "not_analyzed",
             "type" : "string"
         },
         "code" : {
             "type" : "integer"
         },
         "country" : {
             "index" : "not_analyzed",
             "type" : "string"
         },
         "city" : {
             "index" : "not_analyzed",
             "type" : "string"
         }
     }
 }

 My cluster is becoming really big (almost 2 TB of data with billions of 
 documents) and i maintain one index per day, whereas I occasionally delete 
 old indices. My daily index is about 20GB big. The version of elasticsearch 
 that I use is 1.1.1. 

 My problems start when I want to get some aggregations of events with some 
 criteria which is applied in the parent request document. For example count 
 be the events of type *click for country = US and code=12. What I was 
 initially doing was to generate a scriptFilter for the request document (in 
 Groovy) and I was adding multiple aggregations in one search request. This 
 ended up being very slow so I removed the scripting logic and I supported 
 my logic with java code.*

 What seems to be initially solved in my local machine, when I got back to 
 the cluster, nothing has changed. Again my app performs really really poor. 
 I get more than 10 seconds to perform a search with ~10 sub-aggregations

Re: Need help, multiple aggregations with filters extremely slow, where to look for optimizations?

2014-06-13 Thread Thomas
So I restructured my curl as follows; is this what you mean? From some 
first runs I do get a slight improvement, but I need to check it against 
production data:

Thank you, I will try it and come back with results.

curl -XPOST "http://10.129.2.42:9200/logs-idx.20140613/event/_search?search_type=count" -d '
{
  "query": {
    "filtered": {
      "filter": {
        "or": [
          {
            "and": [
              {
                "has_parent": {
                  "type": "request",
                  "filter": {
                    "and": {
                      "filters": [
                        { "term": { "country": "US" } },
                        { "term": { "city": "NY" } },
                        { "term": { "code": 12 } }
                      ]
                    }
                  }
                }
              },
              {
                "range": {
                  "event_time": {
                    "gte": "2014-06-13T10:00:00",
                    "lt": "2014-06-13T11:00:00"
                  }
                }
              }
            ]
          },
          {
            "and": [
              {
                "has_parent": {
                  "type": "request",
                  "filter": {
                    "and": {
                      "filters": [
                        { "term": { "country": "US" } },
                        { "term": { "city": "NY" } },
                        { "term": { "code": 12 } },
                        {
                          "range": {
                            "request_time": {
                              "gte": "2014-06-13T10:00:00",
                              "lt": "2014-06-13T11:00:00"
                            }
                          }
                        }
                      ]
                    }
                  }
                }
              },
              {
                "range": {
                  "event_time": { "lt": "2014-06-13T10:00:00" }
                }
              }
            ]
          }
        ]
      }
    }
  },
  "aggs": {
    "per_interval": {
      "date_histogram": {
        "field": "event_time",
        "interval": "minute"
      },
      "aggs": {
        "metrics": {
          "terms": { "field": "event", "size": 12 }
        }
      }
    }
  }
}'


On Friday, 13 June 2014 10:09:46 UTC+3, Thomas wrote:

 Hi,

 I'm facing a performance issue with some aggregations I perform, and I 
 need your help if possible:

 I have to documents, the *request* and the *event*. The request is the 
 parent of the event. Below is a (sample) mapping

 "event" : {
     "dynamic" : "strict",
     "_parent" : {
         "type" : "request"
     },
     "properties" : {
         "event_time" : {
             "format" : "dateOptionalTime",
             "type" : "date"
         },
         "count" : {
             "type" : "integer"
         },
         "event" : {
             "index" : "not_analyzed",
             "type" : "string"
         }
     }
 }

 "request" : {
     "dynamic" : "strict",
     "_id" : {
         "path" : "uniqueId"
     },
     "properties" : {
         "uniqueId" : {
             "index" : "not_analyzed",
             "type" : "string"
         },
         "user" : {
             "index" : "not_analyzed",
             "type" : "string"
         },
         "code" : {
             "type" : "integer"
         },
         "country" : {
             "index" : "not_analyzed",
             "type" : "string"
         },
         "city" : {
             "index" : "not_analyzed",
             "type" : "string"
         }
     }
 }

 My cluster is becoming really big (almost 2 TB of data with billions of 
 documents) and i maintain one index per day, whereas I occasionally delete 
 old indices. My daily index is about 20GB big. The version of elasticsearch 
 that I use is 1.1.1. 

 My problems start when I want to get some aggregations of events with some 
 criteria which is applied in the parent request document. For example count 
 be the events of type *click for country = US and code=12. What I was 
 initially doing was to generate a scriptFilter for the request document (in 
 Groovy) and I was adding multiple aggregations in one search request. This 
 ended up being very slow so I removed the scripting logic and I supported 
 my logic with java code.*

 What seems to be initially solved in my local machine, when I got back to 
 the cluster, nothing has changed. Again my app performs really really poor. 
 I get more than 10 seconds to perform a search with ~10

Re: Indexing nonstandard geo_point field.

2014-06-01 Thread Brian Thomas
I looked at the documentation for Elasticsearch's geo_shape and it looks 
like it uses [longitude, latitude].

Found this note on the geo_shape documentation page 
http://www.elasticsearch.org/guide/en/elasticsearch/reference/current/mapping-geo-shape-type.html

Note: In GeoJSON, and therefore Elasticsearch, the correct *coordinate order 
is longitude, latitude (X, Y)* within coordinate arrays. This differs from 
many Geospatial APIs (e.g., Google Maps) that generally use the colloquial 
latitude, longitude (Y, X).


An alternative I found was to use the computed fields plugin
https://github.com/SkillPages/elasticsearch-computed-fields

and create a mapping like this:

"@coordinates-str" : {
    "type" : "computed",
    "script" : "_source.geo.coordinates[0] + ',' + _source.geo.coordinates[1]",
    "result" : {
        "type" : "geo_point",
        "store" : true
    }
}

This seems to create the string in the correct format for the geo_point. 
The issue I am having with this method right now is that Elasticsearch 
returns an error if the source document does not have the 
geo.coordinates field.  




On Sunday, June 1, 2014 4:28:24 PM UTC-4, Alexander Reelsen wrote:

 Hey,

 you could index this as a geo shape (as this is valid GeoJSON). If you 
 really need the functionality for a geo_point, you need to change the 
 structure of the data.


 --Alex


 On Sat, May 31, 2014 at 3:36 PM, Brian Thomas mynam...@gmail.com wrote:

 I am new to Elasticsearch and I am trying to index a json document with a 
 nonstandard lat/long format.

 I know the standard format for a geo_point array is [lon, lat], but the 
 documents I am indexing has format [lat, lon].  

 This is what the JSON element looks like:

 "geo": {
   "type": "Point",
   "coordinates": [
     38.673459,
     -77.336781
   ]
 }

 Is there anyway I could have elasticsearch reorder this array or convert 
 this field to a string without having to modify the source document prior 
 to indexing? Could this be done using a field mapping or script in 
 elasticsearch?


  -- 
 You received this message because you are subscribed to the Google Groups 
 elasticsearch group.
 To unsubscribe from this group and stop receiving emails from it, send an 
 email to elasticsearc...@googlegroups.com.
 To view this discussion on the web visit 
 https://groups.google.com/d/msgid/elasticsearch/0e688310-5777-4906-889e-cd77693c3908%40googlegroups.com.
 For more options, visit https://groups.google.com/d/optout.




-- 
You received this message because you are subscribed to the Google Groups 
elasticsearch group.
To unsubscribe from this group and stop receiving emails from it, send an email 
to elasticsearch+unsubscr...@googlegroups.com.
To view this discussion on the web visit 
https://groups.google.com/d/msgid/elasticsearch/48b0ae24-aaa8-4a05-9690-23032974da31%40googlegroups.com.
For more options, visit https://groups.google.com/d/optout.


Indexing nonstandard geo_point field.

2014-05-31 Thread Brian Thomas
I am new to Elasticsearch and I am trying to index a json document with a 
nonstandard lat/long format.

I know the standard format for a geo_point array is [lon, lat], but the 
documents I am indexing have the format [lat, lon].  

This is what the JSON element looks like:

"geo": {
  "type": "Point",
  "coordinates": [
    38.673459,
    -77.336781
  ]
}

Is there any way I could have Elasticsearch reorder this array or convert 
this field to a string without having to modify the source document prior 
to indexing? Could this be done using a field mapping or script in 
Elasticsearch?


-- 
You received this message because you are subscribed to the Google Groups 
elasticsearch group.
To unsubscribe from this group and stop receiving emails from it, send an email 
to elasticsearch+unsubscr...@googlegroups.com.
To view this discussion on the web visit 
https://groups.google.com/d/msgid/elasticsearch/0e688310-5777-4906-889e-cd77693c3908%40googlegroups.com.
For more options, visit https://groups.google.com/d/optout.


Question about week granularity elasticsearch uses

2014-05-23 Thread Thomas
Hello,

I stepped into a situation where I need to truncate a timestamp field 
to the week, and I want to do it the exact way Elasticsearch does 
it in the date_histogram aggregation, in order to be able to perform 
comparisons. Does anyone know how I should perform the truncation to the 
week? I notice that date_histogram returns the beginning of the week 
(MONDAY); is it safe to use the Calendar way as follows?

Calendar cal = Calendar.getInstance();
cal.setTimeZone(TimeZone.getTimeZone("GMT"));
// cal.set(Calendar.DAY_OF_WEEK, cal.getFirstDayOfWeek());
cal.set(Calendar.DAY_OF_WEEK, Calendar.MONDAY);
Date time = cal.getTime();
System.out.println("time = " + time);


Does the first day of the week depend on the Locale in Elasticsearch or not?

Thank you
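
(For comparison, a minimal Joda-Time sketch. It assumes, as the Monday 
bucket boundaries suggest, that ES 1.x buckets weeks on ISO-8601 boundaries 
via Joda-Time, independent of the JVM Locale; please verify against your 
own date_histogram output.)

import org.joda.time.DateTime;
import org.joda.time.DateTimeZone;

public class WeekTruncate {
    // Floor a timestamp (millis since epoch) to the start of its ISO week,
    // i.e. Monday 00:00 in UTC, regardless of the JVM default Locale.
    public static DateTime startOfWeek(long timestampMillis) {
        return new DateTime(timestampMillis, DateTimeZone.UTC)
                .weekOfWeekyear().roundFloorCopy();
    }
}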

-- 
You received this message because you are subscribed to the Google Groups 
elasticsearch group.
To unsubscribe from this group and stop receiving emails from it, send an email 
to elasticsearch+unsubscr...@googlegroups.com.
To view this discussion on the web visit 
https://groups.google.com/d/msgid/elasticsearch/4d763e87-d1ab-491f-af7e-8a0b4ba71d39%40googlegroups.com.
For more options, visit https://groups.google.com/d/optout.


Clear cache on demand and Circuit breaker Problem

2014-05-13 Thread Thomas
Hi,

I'm trying to get some aggregated information by querying Elasticsearch via 
my app. What I notice is that after some time I get a CircuitBreaker 
exception and my query fails. I can assume that I load too much fielddata 
and eventually the CircuitBreaker stops my query. Inside my application I 
have a logic where I do the query sequentially by time. For example, I split 
a query with a range of two hours into 4 queries of half an hour each, 
instead of doing one query for the full two-hour period. And this is 
something I can configure.

My question is whether it makes sense to perform a clearCache request 
between my requests (or every 15 minutes, for instance) in order to avoid 
the CircuitBreaker exception. I know it will make things slower, but to my 
mind it is better to perform a bit poorly rather than stop the operation. 
Knowing that the query remains the same (with different parameters), does 
this make sense? Or will I end up deleting and creating the same cache 
again and again?

client.admin().indices().prepareClearCache(indexName).get();


Are there other alternatives to avoid the CircuitBreaker in a more efficient 
way? Of course, if I leave it unbounded I eventually get a heap space 
exception.
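
(A sketch of two alternatives; the API parameter and setting names are as I 
remember them from ES 1.x, so treat them as assumptions to verify. Bounding 
the fielddata cache is usually the cleaner guard, since old entries get 
evicted instead of tripping the breaker:)

# Clear only the fielddata cache for one index via the clear-cache API:
curl -XPOST 'http://localhost:9200/indexName/_cache/clear?field_data=true'

# Or bound the cache in elasticsearch.yml so entries are evicted under pressure:
indices.fielddata.cache.size: 40%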

Thank you

-- 
You received this message because you are subscribed to the Google Groups 
elasticsearch group.
To unsubscribe from this group and stop receiving emails from it, send an email 
to elasticsearch+unsubscr...@googlegroups.com.
To view this discussion on the web visit 
https://groups.google.com/d/msgid/elasticsearch/d093c6df-b40b-4704-b0dd-c6bc300299c3%40googlegroups.com.
For more options, visit https://groups.google.com/d/optout.


Scripts reload on demand

2014-05-08 Thread Thomas
Hi,

I was wondering whether there is a way to reload on demand the scripts 
provided under config/scripts. I'm facing a weird situation where, although 
the documentation describes that the scripts are reloaded every xx amount of 
time (configurable), I do not see that happening, and there is no way to see 
a new script I put there unless I restart my node(s). Is there a curl request 
to force a reload of the scripts? Additionally, is there any curl 
command that can display which scripts are loaded into an ES node and which 
are not?

I use elasticsearch 1.1.1 and my scripts are in Groovy (with groovy lang 
plugin installed)

Thank you

-- 
You received this message because you are subscribed to the Google Groups 
elasticsearch group.
To unsubscribe from this group and stop receiving emails from it, send an email 
to elasticsearch+unsubscr...@googlegroups.com.
To view this discussion on the web visit 
https://groups.google.com/d/msgid/elasticsearch/51e0da62-8934-4e67-9fb8-792353f532da%40googlegroups.com.
For more options, visit https://groups.google.com/d/optout.


access parent bucket's key from child aggregation in geohash grid

2014-05-01 Thread Thomas Gruner
Hello!

I have been progressing well with aggregations, but this one has got me 
stumped. 

I'm trying to figure out how to access the key of the parent bucket from a 
child aggregation. 

The parent bucket is a geohash_grid, and the child aggregation is avg (trying 
to get the avg lat and lon, but only for points that match the parent 
bucket's geohash key).

Something like this:
"aggregations" : { 
  "LocationsGrid" : {
    "geohash_grid" : {
      "field" : "Locations",
      "precision" : 7
    },
    "aggregations" : {
      "avg_lat" : {
        "avg" : {
          "script" : "if (doc['Locations'].value.geohash.startsWith(*parent_bucket.key*)) doc['Locations'].value.lat;"
        }
      }
    }
  }
}


Thanks for any help or ideas with this!

-- 
You received this message because you are subscribed to the Google Groups 
elasticsearch group.
To unsubscribe from this group and stop receiving emails from it, send an email 
to elasticsearch+unsubscr...@googlegroups.com.
To view this discussion on the web visit 
https://groups.google.com/d/msgid/elasticsearch/624d0bdd-c380-4c72-b642-e6afff3458a9%40googlegroups.com.
For more options, visit https://groups.google.com/d/optout.


Upgrade cluster from 0.90.11 to 1.1.1

2014-04-26 Thread Thomas Ardal
I'm running a two-node cluster with Elasticsearch 0.90.11. I want to 
upgrade to the newest version (1.1.1), but I'm not entirely sure how to 
do it. 0.90.11 is based on Lucene 4.6.1 and 1.1.1 on Lucene 4.7.2. Can I do 
the following:

1. stop node 1.
2. install 1.1.1 on node 1.
3. copy data folder to 1.1.1.
4. start node 1 and wait for it to synchronize.
5. stop node 2.
6. install 1.1.1 on node 2.
7. copy data folder to 1.1.1.
8. start node 2 and wait for it to synchronize.

I can live with downtime if not possible otherwise.
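
(For what it's worth, a sketch of the usual restart-upgrade dance around 
steps 1-8, assuming the 0.90.x/1.x cluster settings API; since downtime is 
acceptable, a full-cluster restart is the conservative route across the 
0.90-to-1.x boundary:)

# Before stopping a node, stop shard reallocation so the cluster does not
# start rebalancing while the node is down:
curl -XPUT 'http://localhost:9200/_cluster/settings' -d '{
  "transient": { "cluster.routing.allocation.disable_allocation": true }
}'

# ...stop the node, install 1.1.1 pointing at the same data path, start it...

# Then re-enable allocation and wait for green before doing the next node:
curl -XPUT 'http://localhost:9200/_cluster/settings' -d '{
  "transient": { "cluster.routing.allocation.disable_allocation": false }
}'
curl -XGET 'http://localhost:9200/_cluster/health?wait_for_status=green'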

-- 
You received this message because you are subscribed to the Google Groups 
elasticsearch group.
To unsubscribe from this group and stop receiving emails from it, send an email 
to elasticsearch+unsubscr...@googlegroups.com.
To view this discussion on the web visit 
https://groups.google.com/d/msgid/elasticsearch/26670af5-f4d5-4d11-859b-bdbbc08367f1%40googlegroups.com.
For more options, visit https://groups.google.com/d/optout.


Re: Parent/Child combination Script possible?

2014-04-14 Thread Thomas
Thanks Sven,

Yes, this would solve a lot of use cases.

Can anyone say whether we should create an issue for 
that? The link provided does not mention whether this was finally 
opened as an issue.

Thanks 
Thomas

On Friday, 11 April 2014 18:53:08 UTC+3, Thomas wrote:

 Hello,


 I have two document types which are utilizing a parent/child relation. I 
 want to perform an aggregation where the script utilizes fields from both 
 documents. Is that possible?

 More specifically:

 Parent Document 

 {
   "tag": {
     "_id": {
       "path": "tag_id"
     },
     "properties": {
       "tag_id": { "index": "not_analyzed", "type": "string" },
       "name": { "index": "not_analyzed", "type": "string" },
       "tag_counter": { "type": "integer" }
     }
   }
 }



 Child Document

 {
   "click": {
     "_parent": {
       "type": "tag"
     },
     "properties": {
       "type": { "index": "not_analyzed", "type": "string" },
       "clicks_counter": { "type": "integer" }
     }
   }
 }



 curl -XGET "http://localhost:9200/tags-index/tags,clicks/_search" -d '
 {
   "aggregations": {
     "one_day_filter": {
       "filter": {
         "range": {
           "ts": {
             "gte": "2014-03-15T00:00:00",
             "lt": "2014-03-15T01:00:00"
           }
         }
       },
       "aggregations": {
         "parent": {
           "filter": {
             "has_child": {
               "type": "clicks",
               "query": { "match_all": {} }
             }
           },
           "aggregations": {
             "metrics": {
               "terms": {
                 "script": "doc[\"tags.tag_counter\"].value - doc[\"clicks.clicks_counter\"].value"
               }
             }
           }
         }
       }
     }
   },
   "size": 0
 }'



 Thanks
 Thomas


-- 
You received this message because you are subscribed to the Google Groups 
elasticsearch group.
To unsubscribe from this group and stop receiving emails from it, send an email 
to elasticsearch+unsubscr...@googlegroups.com.
To view this discussion on the web visit 
https://groups.google.com/d/msgid/elasticsearch/9a188e78-e220-4afd-b840-e85ea0cade4e%40googlegroups.com.
For more options, visit https://groups.google.com/d/optout.


Parent/Child combination Script possible?

2014-04-11 Thread Thomas
Hello,


I have two document types which are utilizing a parent/child relation. I 
want to perform an aggregation where the script utilizes fields from both 
documents. Is that possible?

More specifically:

Parent Document 

{
  "tag": {
    "_id": {
      "path": "tag_id"
    },
    "properties": {
      "tag_id": { "index": "not_analyzed", "type": "string" },
      "name": { "index": "not_analyzed", "type": "string" },
      "tag_counter": { "type": "integer" }
    }
  }
}



Child Document

{
  "click": {
    "_parent": {
      "type": "tag"
    },
    "properties": {
      "type": { "index": "not_analyzed", "type": "string" },
      "clicks_counter": { "type": "integer" }
    }
  }
}



curl -XGET "http://localhost:9200/tags-index/tags,clicks/_search" -d '
{
  "aggregations": {
    "one_day_filter": {
      "filter": {
        "range": {
          "ts": {
            "gte": "2014-03-15T00:00:00",
            "lt": "2014-03-15T01:00:00"
          }
        }
      },
      "aggregations": {
        "parent": {
          "filter": {
            "has_child": {
              "type": "clicks",
              "query": { "match_all": {} }
            }
          },
          "aggregations": {
            "metrics": {
              "terms": {
                "script": "doc[\"tags.tag_counter\"].value - doc[\"clicks.clicks_counter\"].value"
              }
            }
          }
        }
      }
    }
  },
  "size": 0
}'



Thanks
Thomas

-- 
You received this message because you are subscribed to the Google Groups 
elasticsearch group.
To unsubscribe from this group and stop receiving emails from it, send an email 
to elasticsearch+unsubscr...@googlegroups.com.
To view this discussion on the web visit 
https://groups.google.com/d/msgid/elasticsearch/3ffee913-9e82-4862-befe-e0f7ff9038e8%40googlegroups.com.
For more options, visit https://groups.google.com/d/optout.


Terms aggregation scripts running slower than expected

2014-04-09 Thread Thomas S.
Hi,

I am currently exploring the option of using scripts with aggregations, and 
I noticed that for some reason scripts for terms aggregations are executed 
much more slowly than for other aggregations, even if the script doesn't 
access any fields yet. This also happens for native Java scripts. I'm 
running Elasticsearch 1.1.0.

For example, on my data set the simple script "1" takes around 400 ms for 
the sum and histogram aggregations, but takes around 25 s to run on a terms 
aggregation, even on repeated runs. What is going on here? Terms 
aggregations without a script are very fast, and histogram/sum aggregations 
with scripts that access the document are also very fast: I had to 
transform a script aggregation that should have been a terms aggregation 
into a histogram and convert the numeric values back into terms on the 
client so the aggregation would be executed in reasonable time.


In [2]: app.search.search({'size': 0, 'query': { 'match_all': {} }, 
'aggregations': { 'test_script': { 'terms': { 'script': '1' } } }})
Out[2]:
{u'_shards': {u'failed': 0, u'successful': 246, u'total': 246},
 u'aggregations': {u'test_script': {u'buckets': [{u'doc_count': 4231327,
 u'key': u'1'}]}},
 u'hits': {u'hits': [], u'max_score': 0.0, u'total': 4231327},
 u'timed_out': False,
 u'took': 24986}


In [10]: app.search.search({'size': 0, 'query': { 'match_all': {} }, 
'aggregations': { 'test_script': { 'sum': { 'script': '1' } } }})
Out[10]:
{u'_shards': {u'failed': 0, u'successful': 246, u'total': 246},
 u'aggregations': {u'test_script': {u'value': 4231327.0}},
 u'hits': {u'hits': [], u'max_score': 0.0, u'total': 4231327},
 u'timed_out': False,
 u'took': 363}


In [8]: app.search.search({'size': 0, 'query': { 'match_all': {} }, 
'aggregations': { 'test_script': { 'histogram': { 'script': '1', 
'interval': 1 } } }})
Out[8]:
{u'_shards': {u'failed': 0, u'successful': 246, u'total': 246},
 u'aggregations': {u'test_script': {u'buckets': [{u'doc_count': 4231327,
 u'key': 1}]}},
 u'hits': {u'hits': [], u'max_score': 0.0, u'total': 4231327},
 u'timed_out': False,
 u'took': 421}


Thomas

-- 
You received this message because you are subscribed to the Google Groups 
elasticsearch group.
To unsubscribe from this group and stop receiving emails from it, send an email 
to elasticsearch+unsubscr...@googlegroups.com.
To view this discussion on the web visit 
https://groups.google.com/d/msgid/elasticsearch/4af8942c-db46-47fa-9d38-370051a15c5c%40googlegroups.com.
For more options, visit https://groups.google.com/d/optout.


Re: query field computed by child query?

2014-03-31 Thread Thomas Andres

Thanks for the examples. Looks quite interesting. If I understand them 
correctly, I'd have to write a plugin doing my subquery. Too bad I don't 
have much time right now :( Sounds like an interesting challenge :)

-- 
You received this message because you are subscribed to the Google Groups 
elasticsearch group.
To unsubscribe from this group and stop receiving emails from it, send an email 
to elasticsearch+unsubscr...@googlegroups.com.
To view this discussion on the web visit 
https://groups.google.com/d/msgid/elasticsearch/f76b7750-fcba-48dd-a1be-61d385b12bd4%40googlegroups.com.
For more options, visit https://groups.google.com/d/optout.


query field computed by child query?

2014-03-28 Thread Thomas Andres
I have documents in a parent/child relation. In a query run on the parent, 
I'd like to know whether the found parents have children matching some query. 
I don't want to filter to only parents with some conditions on the child, but 
just get the information that they have children matching some query.

Any idea if that's possible? I've been thinking of maybe adding a script_field 
that would compute that, but I have no idea how to run child queries from a 
script field.

An example to clarify my problem:
the child has a boolean field "error".

I run a query on the parent and want to show an indication of whether any of 
the children has the error flag set.

Any hint would be welcome.
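
(One avenue that may fit, sketched under the assumption that named filters 
work on has_child in your version; I have not verified this on 1.x. A 
has_child filter inside a should clause never excludes a parent, but its 
_name should show up in matched_queries for each hit that has a matching 
child. Index, type, and field names below are made up:)

curl -XGET 'http://localhost:9200/myindex/parent/_search' -d '{
  "query": {
    "filtered": {
      "query": { "match_all": {} },
      "filter": {
        "bool": {
          "should": [
            { "match_all": {} },
            {
              "has_child": {
                "type": "child",
                "_name": "has_error_child",
                "filter": { "term": { "error": true } }
              }
            }
          ]
        }
      }
    }
  }
}'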


-- 
You received this message because you are subscribed to the Google Groups 
elasticsearch group.
To unsubscribe from this group and stop receiving emails from it, send an email 
to elasticsearch+unsubscr...@googlegroups.com.
To view this discussion on the web visit 
https://groups.google.com/d/msgid/elasticsearch/c5bbf2eb-7960-4740-9e0c-a70dbe98a9aa%40googlegroups.com.
For more options, visit https://groups.google.com/d/optout.


Re: query field computed by child query?

2014-03-28 Thread Thomas Andres
I want to return all parents (or those matching some other query 
conditions) but in addition to the other data in the document, I want to 
compute for each parent, if he has any child with a set error flag. I don't 
want to filter on this condition in this case.

Am Freitag, 28. März 2014 14:21:30 UTC+1 schrieb Binh Ly:

 Not sure I understand. So if you run a _search on the parent, and use the 
 has_child filter to return only parents that match some child condition, is 
 that not what you want? 


-- 
You received this message because you are subscribed to the Google Groups 
elasticsearch group.
To unsubscribe from this group and stop receiving emails from it, send an email 
to elasticsearch+unsubscr...@googlegroups.com.
To view this discussion on the web visit 
https://groups.google.com/d/msgid/elasticsearch/c99eb564-0767-4adc-b2f6-e4ca00c879a3%40googlegroups.com.
For more options, visit https://groups.google.com/d/optout.


Inconsistent search cluster status and search results after long GC run

2014-03-27 Thread Thomas S.
)
    at org.elasticsearch.discovery.zen.ZenDiscovery$7.execute(ZenDiscovery.java:556)
    at org.elasticsearch.cluster.service.InternalClusterService$UpdateTask.run(InternalClusterService.java:308)
    at org.elasticsearch.common.util.concurrent.PrioritizedEsThreadPoolExecutor$TieBreakingPrioritizedRunnable.run(PrioritizedEsThreadPoolExecutor.java:134)
    at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
    at java.lang.Thread.run(Thread.java:724)
Caused by: org.elasticsearch.transport.NodeNotConnectedException: [node2][inet[/10.216.32.81:9300]] Node not connected
    at org.elasticsearch.transport.netty.NettyTransport.nodeChannel(NettyTransport.java:859)
    at org.elasticsearch.transport.netty.NettyTransport.sendRequest(NettyTransport.java:540)
    at org.elasticsearch.transport.TransportService.sendRequest(TransportService.java:189)
    ... 7 more


NODE 2
[2014-03-27 07:19:02,871][INFO ][cluster.service  ] [node2] removed {[node3][RRqWlTWnQ7ygvsOaJS0_mA][node3][inet[/10.235.38.84:9300]]{master=true},}, reason: zen-disco-node_failed([node3][RRqWlTWnQ7ygvsOaJS0_mA][node3][inet[/10.235.38.84:9300]]{master=true}), reason failed to ping, tried [2] times, each with maximum [9s] timeout


NODE 3
[2014-03-27 07:19:20,055][WARN ][monitor.jvm  ] [node3] [gc][old][539697][754] duration [35.1s], collections [1]/[35.8s], total [35.1s]/[2.7m], memory [4.9gb]->[4.2gb]/[7.9gb], all_pools {[young] [237.8mb]->[7.4mb]/[266.2mb]}{[survivor] [25.5mb]->[0b]/[33.2mb]}{[old] [4.6gb]->[4.2gb]/[7.6gb]}
[2014-03-27 07:19:20,112][INFO ][discovery.zen] [node3] master_left [[node2][A45sMYqtQsGrwY5exK0sEg][node2][inet[/10.216.32.81:9300]]{master=true}], reason [do not exists on master, act as master failure]
[2014-03-27 07:19:20,117][INFO ][cluster.service  ] [node3] master {new [node1][DxlcpaqOTmmpNSRoqt1sZg][node1.example][inet[/10.252.78.88:9300]]{data=false, master=true}, previous [node2][A45sMYqtQsGrwY5exK0sEg][node2][inet[/10.216.32.81:9300]]{master=true}}, removed {[node2][A45sMYqtQsGrwY5exK0sEg][node2][inet[/10.216.32.81:9300]]{master=true},}, reason: zen-disco-master_failed ([node2][A45sMYqtQsGrwY5exK0sEg][node2][inet[/10.216.32.81:9300]]{master=true})


After this scenario, the cluster doesn't recover properly: The worst thing 
is that node 1 sees nodes 1+3, node 2 sees nodes 1+2 and node 3 sees nodes 
1+3. Since the cluster is set up to operate with two nodes, both data nodes 
2 and 3 accept data and searches, causing inconsistent results and 
requiring us to do a full cluster restart and reindex all production data 
to make sure the cluster is consistent again.


NODE 1 (GET /_nodes):
{
  "cluster_name" : "elasticsearch",
  "nodes" : {
    "DxlcpaqOTmmpNSRoqt1sZg" : {
      "name" : "node1",
      ...
    },
    "RRqWlTWnQ7ygvsOaJS0_mA" : {
      "name" : "node3",
      ...
    }
  }
}

NODE 2 (GET /_nodes):
{
  "cluster_name" : "elasticsearch",
  "nodes" : {
    "A45sMYqtQsGrwY5exK0sEg" : {
      "name" : "node2",
      ...
    },
    "DxlcpaqOTmmpNSRoqt1sZg" : {
      "name" : "node1",
      ...
    }
  }
}

NODE 3 (GET /_nodes):
{
  "cluster_name" : "elasticsearch",
  "nodes" : {
    "DxlcpaqOTmmpNSRoqt1sZg" : {
      "name" : "node1",
      ...
    },
    "RRqWlTWnQ7ygvsOaJS0_mA" : {
      "name" : "node3",
      ...
    }
  }
}


Here are the configurations:

BASE CONFIG (for all nodes):
action:
  disable_delete_all_indices: true
discovery:
  zen:
fd:
  ping_retries: 2
  ping_timeout: 9s
minimum_master_nodes: 2
ping:
  multicast:
enabled: false
  unicast:
hosts: [node1.example, node2.example, node3.example]
index:
  fielddata:
cache: node
indices:
  fielddata:
cache:
  size: 40%
  memory:
index_buffer_size: 20%
threadpool:
  bulk:
queue_size: 100
type: fixed
transport:
  tcp:
connect_timeout: 3s

NODE 1:
node:
  data: false
  master: true
  name: node1

NODE 2:
node:
  data: true
  master: true
  name: node2

NODE 3:
node:
  data: true
  master: true
  name: node3


Questions:
1) What can we do to minimize long GC runs, so the nodes don't become 
unresponsive and disconnect in the first place? (FYI: Our index is 
currently about 80 GB in size with over 2M docs (per node), 60 shards, heap 
size 8 GB. We run both searches and aggregations on it.)
2) Obviously, having the cluster in a state like the above is 
unacceptable, and we therefore want to make sure that even if a node 
disconnects because of GC, the cluster can fully recover, and only one of 
the two data nodes accepts data and searches while a node is 
disconnected. Is there anything that needs to be changed in the 
Elasticsearch code to fix this issue?

Thanks,
Thomas

-- 
You received this message because you are subscribed to the Google Groups 
elasticsearch group.
To unsubscribe from this group and stop receiving emails from it, send an email 
to elasticsearch+unsubscr...@googlegroups.com.

Split brain problem on Azure

2014-03-27 Thread Thomas Ardal
I'm experiencing a split-brain problem on my Elasticsearch cluster on Azure, 
consisting of two nodes. I've read about the zen.ping.timeout 
and discovery.zen.minimum_master_nodes settings, but I guess that I can't 
use those settings when using the Azure plugin. Any ideas for avoiding 
split brain when using the Azure plugin?
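
(For reference, the standard guard is a majority quorum of master-eligible 
nodes; a minimal elasticsearch.yml sketch, with the caveat that a two-node 
cluster has no quorum that both prevents split brain and survives a node 
loss, so a third master-eligible node is usually added:)

discovery:
  zen:
    minimum_master_nodes: 2   # (master-eligible nodes / 2) + 1, here for 3 nodes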

-- 
You received this message because you are subscribed to the Google Groups 
elasticsearch group.
To unsubscribe from this group and stop receiving emails from it, send an email 
to elasticsearch+unsubscr...@googlegroups.com.
To view this discussion on the web visit 
https://groups.google.com/d/msgid/elasticsearch/169394c1-6a8c-430e-a9c1-0286b1789fce%40googlegroups.com.
For more options, visit https://groups.google.com/d/optout.


Re: Split brain problem on Azure

2014-03-27 Thread Thomas Ardal
Ok. Also using the zen.* keys?

-- 
You received this message because you are subscribed to the Google Groups 
elasticsearch group.
To unsubscribe from this group and stop receiving emails from it, send an email 
to elasticsearch+unsubscr...@googlegroups.com.
To view this discussion on the web visit 
https://groups.google.com/d/msgid/elasticsearch/7e5eaf00-b943-4c64-aaad-a5ac78400ea3%40googlegroups.com.
For more options, visit https://groups.google.com/d/optout.


Re: Inconsistent search cluster status and search results after long GC run

2014-03-27 Thread Thomas S.
Thanks Jörg,

I can increase the ping_timeout to 60s for now. However, shouldn't the goal 
be to minimize the time GC runs? Is the node blocked while GC runs, delaying 
any requests to it? If so, it would be very bad to allow long GC 
runs.

Regarding the bulk thread pool: I specifically set this to a higher value 
to avoid errors when we perform bulk indexing (we sometimes had errors when 
the queue was full and set to 50; I was also going to increase the index 
queue since there are sometimes errors). I will try keeping the limit and 
giving indexing more heap space instead, as you suggested.
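
(For anyone following along, a sketch of the elasticsearch.yml keys the 
advice maps to; the values are just the numbers Jörg suggested, not 
verified recommendations:)

discovery:
  zen:
    fd:
      ping_timeout: 60s      # was 9s; long enough to ride out ~35s GC pauses
threadpool:
  bulk:
    queue_size: 50           # back from 100, to lower memory pressure
indices:
  memory:
    index_buffer_size: 50%   # was 20%; more heap for indexing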

Regarding Java 8: We're currently running Java 7 and haven't tweaked any 
GC-specific settings. Do you think it makes sense to switch to Java 8 on 
production already and enable the G1 garbage collector?

Thanks again,
Thomas

On Thursday, March 27, 2014 9:41:10 PM UTC+1, Jörg Prante wrote:

 It seems you run into trouble because you changed some of the default 
 settings, worsening your situation.

 Increase ping_timout from 9s to 60s as first band aid - you have GCs with 
 35secs running.

 You should reduce the bulk thread pool of 100 to 50, this reduces high 
 memory pressure on the 20% memory you allow. Give more heap space to 
 indexing, use 50% instead of 20%.

 Better help would be to diagnose the nodes if you exceed the capacity for 
 search and index operations. If so, think about adding nodes.

 More finetuning after adding nodes could include G1 GC with Java 8, which 
 is targeted to minimize GC stalls. This would not solve node capacity 
 problems though.

 Jörg


 On Thu, Mar 27, 2014 at 4:46 PM, Binh Ly binh...@yahoo.com wrote:

 I would probably not master enable any node that can potentially gc for a 
 couple seconds. You want your master-eligible nodes to make decisions as 
 quick as possible.

 About your GC situation, I'd find out what the underlying cause is:

 1) Do you have bootstrap.mlockall set to true?

 2) Does it usually triggered while running queries? Or is there a pattern 
 on when it usually triggers?

 3) Is there anything else running on these nodes that would overload and 
 affect normal ES operations?
  
 -- 
 You received this message because you are subscribed to the Google Groups 
 elasticsearch group.
 To unsubscribe from this group and stop receiving emails from it, send an 
 email to elasticsearc...@googlegroups.com.
 To view this discussion on the web visit 
 https://groups.google.com/d/msgid/elasticsearch/cd594a91-00c4-43ae-97d8-bbda35618d8e%40googlegroups.com.

 For more options, visit https://groups.google.com/d/optout.




-- 
You received this message because you are subscribed to the Google Groups 
elasticsearch group.
To unsubscribe from this group and stop receiving emails from it, send an email 
to elasticsearch+unsubscr...@googlegroups.com.
To view this discussion on the web visit 
https://groups.google.com/d/msgid/elasticsearch/86db1b12-038f-47d6-9fac-9e8eb8314dbc%40googlegroups.com.
For more options, visit https://groups.google.com/d/optout.


Re: Inconsistent search cluster status and search results after long GC run

2014-03-27 Thread Thomas S.
Forgot to reply to your questions, Binh:

1) No, I haven't set this. However, I wonder if this has any significant 
effect since swap space is barely used.
2) It seems to happen when the cluster is under high load, but I haven't 
seen any specific pattern so far.
3) No, there's not. There's a very small Redis instance running on node1, 
but there's nothing else on the nodes with shards (where the GC problem 
happens).

If I were going to disable master on any node that has shards, I'd have to 
add another dummy node with master: true so the cluster stays in a good state 
if any one of the nodes is down.


On Thursday, March 27, 2014 4:46:41 PM UTC+1, Binh Ly wrote:

 I would probably not master enable any node that can potentially gc for a 
 couple seconds. You want your master-eligible nodes to make decisions as 
 quick as possible.

 About your GC situation, I'd find out what the underlying cause is:

 1) Do you have bootstrap.mlockall set to true?

 2) Does it usually triggered while running queries? Or is there a pattern 
 on when it usually triggers?

 3) Is there anything else running on these nodes that would overload and 
 affect normal ES operations?


-- 
You received this message because you are subscribed to the Google Groups 
elasticsearch group.
To unsubscribe from this group and stop receiving emails from it, send an email 
to elasticsearch+unsubscr...@googlegroups.com.
To view this discussion on the web visit 
https://groups.google.com/d/msgid/elasticsearch/0ae93e7c-a6f7-4784-8b4a-71d6f52552a7%40googlegroups.com.
For more options, visit https://groups.google.com/d/optout.


Delete by query fails often with HTTP 503

2014-03-18 Thread Thomas S.
Hi,

We often get failures when using the delete by query API. The response is 
an HTTP 503 with a body like this:

{"_indices": {"myindex": {"_shards": {"successful": 2, "failed": 58, "total": 60}}}}

Is there a way to figure out what is causing this error? It seems to mostly 
happen when the search cluster is busy.

Thomas

-- 
You received this message because you are subscribed to the Google Groups 
elasticsearch group.
To unsubscribe from this group and stop receiving emails from it, send an email 
to elasticsearch+unsubscr...@googlegroups.com.
To view this discussion on the web visit 
https://groups.google.com/d/msgid/elasticsearch/f8c84eaf-79b9-4f4e-9b26-732d11544fb9%40googlegroups.com.
For more options, visit https://groups.google.com/d/optout.


Re: Delete by query fails often with HTTP 503

2014-03-18 Thread Thomas S.
Thanks Clint,

We have two nodes with 60 shards per node. I will increase the queue size. 
Hopefully this will reduce the number of rejections.

Thomas
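
(For reference, a sketch of the change being discussed; the setting name is 
from the ES 1.x thread pool module, and the value is picked only for 
illustration:)

# elasticsearch.yml: enlarge the index thread pool queue, trading memory
# for fewer EsRejectedExecutionExceptions on delete-by-query bursts.
threadpool:
  index:
    queue_size: 1000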


On Tuesday, March 18, 2014 6:11:27 PM UTC+1, Clinton Gormley wrote:

 Do you have lots of shards on just a few nodes? Delete by query is handled 
 by the `index` thread pool, but those threads are shared across all shards 
 on a node.  Delete by query can produce a large number of changes, which 
 can fill up the thread pool queue and result in rejections.

 You can either just (a) retry or (b) increase the queue size for the 
 `index` thread pool (which will use more memory as more delete requests 
 will need to be queued)

 See 
 http://www.elasticsearch.org/guide/en/elasticsearch/reference/current/modules-threadpool.html#types

 clint


 On 18 March 2014 08:13, Thomas S. thom...@gmail.com wrote:

 Hi,

 We often get failures when using the delete by query API. The response is 
 an HTTP 503 with a body like this:

 {"_indices": {"myindex": {"_shards": {"successful": 2, "failed": 58, "total": 60}}}}

 Is there a way to figure out what is causing this error? It seems to 
 mostly happen when the search cluster is busy.

 Thomas

  -- 
 You received this message because you are subscribed to the Google Groups 
 elasticsearch group.
 To unsubscribe from this group and stop receiving emails from it, send an 
 email to elasticsearc...@googlegroups.com.
 To view this discussion on the web visit 
 https://groups.google.com/d/msgid/elasticsearch/f8c84eaf-79b9-4f4e-9b26-732d11544fb9%40googlegroups.com.
 For more options, visit https://groups.google.com/d/optout.




-- 
You received this message because you are subscribed to the Google Groups 
elasticsearch group.
To unsubscribe from this group and stop receiving emails from it, send an email 
to elasticsearch+unsubscr...@googlegroups.com.
To view this discussion on the web visit 
https://groups.google.com/d/msgid/elasticsearch/b815184a-8382-4b25-8a54-b98753f6cbb4%40googlegroups.com.
For more options, visit https://groups.google.com/d/optout.


Unable to load script under config/scripts

2014-03-10 Thread Thomas
Hi,

I'm trying to keep some scripts under config/scripts, but Elasticsearch 
seems unable to locate them. What could be a possible reason for this?

When I need to invoke one, ES fails with the following: 

No such property: scriptname for class: Script1

Any ideas?

Thanks
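
(For reference: in ES 1.x a file script is referenced by its file name 
without the extension, and the lang must match the file suffix. A minimal 
sketch to check whether the node has picked a script up, assuming a Groovy 
script saved as config/scripts/myscript.groovy on every node, and a 
hypothetical index myindex:)

curl -XGET 'http://localhost:9200/myindex/_search' -d '{
  "size": 1,
  "script_fields": {
    "test": { "script": "myscript", "lang": "groovy" }
  }
}'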

-- 
You received this message because you are subscribed to the Google Groups 
elasticsearch group.
To unsubscribe from this group and stop receiving emails from it, send an email 
to elasticsearch+unsubscr...@googlegroups.com.
To view this discussion on the web visit 
https://groups.google.com/d/msgid/elasticsearch/f96b90e4-7704-49e0-8dd6-38ef1ebe6558%40googlegroups.com.
For more options, visit https://groups.google.com/d/optout.


Re: Queue capacity and EsRejectedExecutionException leads to loss of data

2014-02-26 Thread Thomas
Thanks David,

So this is a RabbitMQ river issue. Is there a need to open a separate issue? 
(I've never done the procedure, I will look it up.)

Thomas

On Wednesday, 26 February 2014 15:48:55 UTC+2, Thomas wrote:

 Hi,

 We have installed the RabbitMQ river plugin to pull data from our Queue 
 and adding them to ES. The thing is that at some point we are receiving the 
 following exception and we have as a result to *lose data*.

 [1775]: index [events-idx], type [click], id 
 [3f6e4604146b435aabcf4ea5a493fd32], message 
 [EsRejectedExecutionException[rejected execution (queue capacity 50) on 
 org.elasticsearch.action.support.replication.TransportShardReplicationOperationAction$AsyncShardOperationAction$1@12843ca2]]


 We have changed the configuration  of queue size to 1000 and the problem 
 disappeared. 

 My question is that is there any configuration/way to tell ES to instead 
 of throwing this exception and discarding the document to wait for 
 available resources (with the corresponding performance impact)?

 Thanks

 Thomas




-- 
You received this message because you are subscribed to the Google Groups 
elasticsearch group.
To unsubscribe from this group and stop receiving emails from it, send an email 
to elasticsearch+unsubscr...@googlegroups.com.
To view this discussion on the web visit 
https://groups.google.com/d/msgid/elasticsearch/03dcb0ea-2b6a-478b-b678-f52ecbc09298%40googlegroups.com.
For more options, visit https://groups.google.com/groups/opt_out.


Re: Avoiding duplicate documents with versioning

2014-02-18 Thread Thomas
Just for any other people that might find this post useful, we finally 
managed to get the expected functionality as described here.

Thanks
Thomas

On Saturday, 15 February 2014 16:53:20 UTC+2, Thomas wrote:

 Hi,

 First of all congrats for the 1.0 release!! Thumbs up for the aggregation 
 framework :)

 I'm trying to build a system which is kind of querying for analytics. I 
 have a document called *event*, and I have events of specific type (e.g. 
 click open etc.) per page. So per page i might have for example an *open 
 event*. The thing is that I might as well take the open event *more than 
 once*, but I want to count it only once. So I use the versioning API and 
 I provide the same document id having as a result the version to increase. 

 In my queries I use the _timestamp field to determine the last document 
 that I counted. But my problem is that since ES reindex the document, it 
 updates _timestamp so it seems as recent document, and in my queries I 
 count it again.

 Is there a way to simply *discard* the document if the document with the 
 same id exists, without stopping the bulk operation of uploading documents?

 Thanks 
 Thomas


-- 
You received this message because you are subscribed to the Google Groups 
elasticsearch group.
To unsubscribe from this group and stop receiving emails from it, send an email 
to elasticsearch+unsubscr...@googlegroups.com.
To view this discussion on the web visit 
https://groups.google.com/d/msgid/elasticsearch/49af9451-023c-4c49-9211-255b07ca2191%40googlegroups.com.
For more options, visit https://groups.google.com/groups/opt_out.


Avoiding duplicate documents with versioning

2014-02-15 Thread Thomas
Hi,

First of all congrats for the 1.0 release!! Thumbs up for the aggregation 
framework :)

I'm trying to build a system which does a kind of querying for analytics. I 
have a document called *event*, and I have events of a specific type (e.g. 
click, open, etc.) per page. So per page I might have, for example, an *open 
event*. The thing is that I might receive the open event *more than 
once*, but I want to count it only once. So I use the versioning API and I 
provide the same document id, with the result that the version increases. 

In my queries I use the _timestamp field to determine the last document 
that I counted. But my problem is that since ES reindexes the document, it 
updates _timestamp, so it looks like a recent document, and in my queries I 
count it again.

Is there a way to simply *discard* the document if a document with the 
same id exists, without stopping the bulk operation of uploading documents?

Thanks 
Thomas

-- 
You received this message because you are subscribed to the Google Groups 
elasticsearch group.
To unsubscribe from this group and stop receiving emails from it, send an email 
to elasticsearch+unsubscr...@googlegroups.com.
To view this discussion on the web visit 
https://groups.google.com/d/msgid/elasticsearch/15a8062b-a60c-4c2e-ae41-6dd31b4b360b%40googlegroups.com.
For more options, visit https://groups.google.com/groups/opt_out.


Re: Avoiding duplicate documents with versioning

2014-02-15 Thread Thomas
Just an update:

If we use op_type=create in the index request, it will probably discard 
the duplicate document. But in the case where we do a bulk operation, will 
it stop the bulk upload, or will it report the error and move on to the next 
document?

thanks
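
(For what it's worth, bulk items fail independently: a create that hits an 
existing id gets a per-item error in the response while the other items 
proceed, so the bulk is not stopped. A minimal sketch of the bulk create 
form; index, type, and field names are made up:)

curl -XPOST 'http://localhost:9200/myindex/event/_bulk' -d '
{ "create": { "_id": "abc123" } }
{ "event": "open", "page": "home" }
{ "create": { "_id": "abc123" } }
{ "event": "open", "page": "home" }
'
# The second create for "abc123" returns an error for that item only.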

On Saturday, 15 February 2014 16:53:20 UTC+2, Thomas wrote:

 Hi,

 First of all congrats for the 1.0 release!! Thumbs up for the aggregation 
 framework :)

 I'm trying to build a system which is kind of querying for analytics. I 
 have a document called *event*, and I have events of specific type (e.g. 
 click open etc.) per page. So per page i might have for example an *open 
 event*. The thing is that I might as well take the open event *more than 
 once*, but I want to count it only once. So I use the versioning API and 
 I provide the same document id having as a result the version to increase. 

 In my queries I use the _timestamp field to determine the last document 
 that I counted. But my problem is that since ES reindex the document, it 
 updates _timestamp so it seems as recent document, and in my queries I 
 count it again.

 Is there a way to simply *discard* the document if the document with the 
 same id exists, without stopping the bulk operation of uploading documents?

 Thanks 
 Thomas


-- 
You received this message because you are subscribed to the Google Groups 
elasticsearch group.
To unsubscribe from this group and stop receiving emails from it, send an email 
to elasticsearch+unsubscr...@googlegroups.com.
To view this discussion on the web visit 
https://groups.google.com/d/msgid/elasticsearch/dbf19235-5b76-4a09-8b86-9a0fbf7e8d1c%40googlegroups.com.
For more options, visit https://groups.google.com/groups/opt_out.


Marvel houskeeping

2014-02-14 Thread Thomas Andres
I upgraded Elasticsearch to 0.90.11 and installed Marvel. Congratulations 
on a really nice tool!

Now I have a small issue: since Marvel is generating quite a lot of data 
(for our development system), I would like to configure an automatic delete of 
old data. Is there such an option? I didn't find anything in the 
documentation. It would be great to be able to specify a rolling window of 
n days of data to keep.
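
(For anyone with the same question, one common approach is a scheduled 
delete of old daily indices; a sketch, assuming Marvel's default 
.marvel-YYYY.MM.DD index naming and GNU date:)

# Drop the Marvel index from 7 days ago (run daily from cron):
curl -XDELETE "http://localhost:9200/.marvel-$(date -d '7 days ago' +%Y.%m.%d)"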

-- 
You received this message because you are subscribed to the Google Groups 
elasticsearch group.
To unsubscribe from this group and stop receiving emails from it, send an email 
to elasticsearch+unsubscr...@googlegroups.com.
To view this discussion on the web visit 
https://groups.google.com/d/msgid/elasticsearch/90ac3f1f-23c4-461f-95d5-f054f1fc5706%40googlegroups.com.
For more options, visit https://groups.google.com/groups/opt_out.


ElasticSearch Analytics Capabilites

2014-02-13 Thread Binil Thomas
ES seems to have the ability to run analytic queries. I have read about people 
using it as an OLAP solution [1], although I have not yet seen anyone 
describe their experience. In that respect, how do ES's analytics 
capabilities compare against:

1) Dremel clones [2] like Impala and Presto (for near real-time, ad hoc 
analytic queries over large datasets)
2) Lambda Architecture [3] systems (where queries are known up-front, but 
need to run against a large dataset)

Does anyone here have experience running ES in such use cases, beyond the 
free-text searching ES is well known for?

Thanks,
Binil

[1]: https://groups.google.com/forum/#!topic/elasticsearch/iTy9IYL23as
[2]: 
http://static.googleusercontent.com/media/research.google.com/en/us/pubs/archive/36632.pdf
[3]: 
http://jameskinley.tumblr.com/post/37398560534/the-lambda-architecture-principles-for-architecting

-- 
You received this message because you are subscribed to the Google Groups 
elasticsearch group.
To unsubscribe from this group and stop receiving emails from it, send an email 
to elasticsearch+unsubscr...@googlegroups.com.
To view this discussion on the web visit 
https://groups.google.com/d/msgid/elasticsearch/5c75a380-3971-45cd-b10d-a91b3b97ecc3%40googlegroups.com.
For more options, visit https://groups.google.com/groups/opt_out.


Re: EC2 Discovery is not working with AutoScaling group (AWS)

2014-02-07 Thread Thomas FATTAL
Finally, I fixed my problem.
There was a mistake in the field discovery.ec2.groups: instead of a 
string, I had to put an array of strings.
And I also forgot to add the tag platform:prod in CloudFormation when 
launching my stack. 

Fixed!
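
(For anyone hitting the same thing, a sketch of the working form in 
elasticsearch.yml; the security group name is made up, the setting names 
are from the cloud-aws plugin:)

discovery:
  type: ec2
  ec2:
    groups: ["my-es-security-group"]   # must be a list, not a plain string
    tag:
      platform: prod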

On Friday, 7 February 2014 14:54:05 UTC+1, Thomas FATTAL wrote:

 Hi,

 I'm trying to configure two Elasticsearch nodes in AWS in the same 
 autoscaling group (CloudFormation).
 I am having some problems with them discovering each other.

 The following shows the elasticsearch.log I have on the first machine with 
 the instance-id i-2db5db03.
 The second machine has an instance-id i-324e6612.

 It seems that both nodes recognize each other, thanks to 
 discovery.ec2.tag.* field I added but then there are some problems that 
 make them not to join together:

 [2014-02-07 13:17:08,852][INFO ][node ] 
 [ip-10-238-225-133.ec2.internal] version[1.0.0.Beta2], pid[15342], 
 build[296cfbe/2013-12-02T15:46:27Z]
 [2014-02-07 13:17:08,853][INFO ][node ] 
 [ip-10-238-225-133.ec2.internal] initializing ...
 [2014-02-07 13:17:08,917][INFO ][plugins  ] 
 [ip-10-238-225-133.ec2.internal] loaded [cloud-aws], sites [paramedic]
 [2014-02-07 13:17:15,452][DEBUG][discovery.zen.ping.unicast] 
 [ip-10-238-225-133.ec2.internal] using initial hosts [], with 
 concurrent_connects [10]
 [2014-02-07 13:17:15,455][DEBUG][discovery.ec2] 
 [ip-10-238-225-133.ec2.internal] using ping.timeout [3s], 
 master_election.filter_client [true], master_election.filter_data [false]
 [2014-02-07 13:17:15,456][DEBUG][discovery.zen.elect  ] 
 [ip-10-238-225-133.ec2.internal] using minimum_master_nodes [1]
 [2014-02-07 13:17:15,457][DEBUG][discovery.zen.fd ] 
 [ip-10-238-225-133.ec2.internal] [master] uses ping_interval [1s], 
 ping_timeout [30s], ping_retries [3]
 [2014-02-07 13:17:15,500][DEBUG][discovery.zen.fd ] 
 [ip-10-238-225-133.ec2.internal] [node  ] uses ping_interval [1s], 
 ping_timeout [30s], ping_retries [3]
 [2014-02-07 13:17:16,769][DEBUG][discovery.ec2] 
 [ip-10-238-225-133.ec2.internal] using host_type [PRIVATE_IP], tags 
 [{platform=prod}], groups [[]] with any_group [true], availability_zones 
 [[]]
 [2014-02-07 13:17:19,930][INFO ][node ] 
 [ip-10-238-225-133.ec2.internal] initialized
 [2014-02-07 13:17:19,931][INFO ][node ] 
 [ip-10-238-225-133.ec2.internal] starting ...
 [2014-02-07 13:17:20,455][INFO ][transport] 
 [ip-10-238-225-133.ec2.internal] bound_address {inet[/0.0.0.0:9300]}, 
 publish_address {inet[/10.238.225.133:9300]}
 [2014-02-07 13:17:20,527][TRACE][discovery] 
 [ip-10-238-225-133.ec2.internal] waiting for 30s for the initial state to 
 be set by the discovery
 [2014-02-07 13:17:21,981][TRACE][discovery.ec2] 
 [ip-10-238-225-133.ec2.internal] building dynamic unicast discovery nodes...
 [2014-02-07 13:17:21,982][TRACE][discovery.ec2] 
 [ip-10-238-225-133.ec2.internal] filtering out instance i-2db5db03 based 
 tags {platform=prod}, not part of [{Key: aws:cloudformation:stack-id, 
 Value: 
 arn:aws:cloudformation:us-east-1:876119091332:stack/ES-10/daf53050-8ff8-11e3-bdce-50e241629418,
  
 }, {Key: aws:cloudformation:stack-name, Value: ES-10, }, {Key: 
 aws:cloudformation:logical-id, Value: ESASG, }, {Key: 
 aws:autoscaling:groupName, Value: ES-10-ESASG-BHGX7KKQ9QPR, }]
 [2014-02-07 13:17:21,983][TRACE][discovery.ec2] 
 [ip-10-238-225-133.ec2.internal] filtering out instance i-324e6612 based 
 tags {platform=prod}, not part of [{Key: aws:cloudformation:logical-id, 
 Value: ESASG, }, {Key: aws:cloudformation:stack-id, Value: 
 arn:aws:cloudformation:us-east-1:876119091332:stack/ES-10/daf53050-8ff8-11e3-bdce-50e241629418,
  
 }, {Key: aws:cloudformation:stack-name, Value: ES-10, }, {Key: 
 aws:autoscaling:groupName, Value: ES-10-ESASG-BHGX7KKQ9QPR, }]
 [2014-02-07 13:17:21,983][DEBUG][discovery.ec2] 
 [ip-10-238-225-133.ec2.internal] using dynamic discovery nodes []
 [2014-02-07 13:17:23,744][TRACE][discovery.ec2] 
 [ip-10-238-225-133.ec2.internal] building dynamic unicast discovery nodes...
 [2014-02-07 13:17:23,745][TRACE][discovery.ec2] 
 [ip-10-238-225-133.ec2.internal] filtering out instance i-2db5db03 based 
 tags {platform=prod}, not part of [{Key: aws:cloudformation:stack-id, 
 Value: 
 arn:aws:cloudformation:us-east-1:876119091332:stack/ES-10/daf53050-8ff8-11e3-bdce-50e241629418,
  
 }, {Key: aws:cloudformation:stack-name, Value: ES-10, }, {Key: 
 aws:cloudformation:logical-id, Value: ESASG, }, {Key: 
 aws:autoscaling:groupName, Value: ES-10-ESASG-BHGX7KKQ9QPR, }]
 [2014-02-07 13:17:23,745][TRACE][discovery.ec2] 
 [ip-10-238-225-133.ec2.internal] filtering out instance i-324e6612 based 
 tags {platform=prod}, not part of [{Key: aws:cloudformation:logical-id, 
 Value: ESASG, }, {Key: aws:cloudformation:stack-id, Value

Deployment of a ES cluster on AWS

2014-02-06 Thread Thomas FATTAL
Hi!

I want to deploy a cluster of Elasticsearch nodes on AWS.
All our existing infrastructure uses CloudFormation with Chef 
cookbooks. We have also set up an Auto Scaling group to restart application 
nodes automatically when they go down.

I have several questions concerning the ES cluster I am trying to set up:
1) I was wondering what the best practices are for managing an ES cluster on 
AWS. Is it recommended to put the EC2 ES nodes in an auto-scaling group as 
well, or is that a problem for the EC2 discovery? (See the config sketch 
after this list.)

2) If CPU usage hits 100% on a machine, is it recommended to upgrade the 
machine to a more powerful instance type or to add a new node?

3) Is there a recommended configuration in terms of the number of nodes 
in the cluster?
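
For reference, here is a minimal elasticsearch.yml sketch of the discovery 
setup I have in mind (a sketch only: it assumes the cloud-aws plugin is 
installed, and the platform=prod tag and us-east-1 region are illustrative 
values from our environment):

    # Use the cloud-aws plugin's EC2 discovery instead of multicast
    discovery.type: ec2
    discovery.zen.ping.multicast.enabled: false
    # Advertise private IPs and only join instances whose EC2 tag
    # "platform" equals "prod" (illustrative tag from our setup)
    discovery.ec2.host_type: private_ip
    discovery.ec2.tag.platform: prod
    # Region the instances run in (assumed)
    cloud.aws.region: us-east-1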

Thanks a lot for your answer,
Thomas (@nypias)



Re: There were no results because no indices were found that match your selected time span

2014-02-02 Thread Thomas Ardal
Okay, thanks!

On Tuesday, January 28, 2014 8:53:27 PM UTC+1, David Pilato wrote:

 Should work from 0.90.9. 

 -- 
 David Pilato | Technical Advocate | Elasticsearch.com
 @dadoonet (https://twitter.com/dadoonet) |
 @elasticsearchfr (https://twitter.com/elasticsearchfr)


 On 28 January 2014 at 20:51:14, Thomas Ardal (thoma...@gmail.com)
 wrote:

 I know and that's the plan. But with 1.0.0 right around the corner and a 
 lot of data to migrate, I'll probably wait for that one. 

 Does Marvel only support the most recent versions of ES?

 On Tuesday, January 28, 2014 8:43:26 PM UTC+1, David Pilato wrote: 

  0.90.1?
 You should update to 0.90.10.

 --
 David ;-) 
 Twitter : @dadoonet / @elasticsearchfr / @scrutmydocs
  
 On 28 Jan 2014 at 20:11, Thomas Ardal thoma...@gmail.com wrote:

 As bonus info, I'm running Elasticsearch 0.90.1 on Windows Server 2012. 
 I'm using the Jetty plugin to force https and basic authentication, but am 
 accessing Marvel from localhost through http. My browser asks me for 
 credentials when opening the Marvel URL, so it could be caused by the basic 
 authentication setup. Could that be it?

 On Tuesday, January 28, 2014 8:01:21 PM UTC+1, Thomas Ardal wrote: 

 When trying out Marvel on my Elasticsearch installation, I get the error 
 "There were no results because no indices were found that match your 
 selected time span" at the top of the page.

 If I understand the documentation correctly, Marvel automatically collects 
 statistics from all indices on the node. What am I doing wrong?
  






There were no results because no indices were found that match your selected time span

2014-01-28 Thread Thomas Ardal
When trying out Marvel on my Elasticsearch installation, I get the error 
"There were no results because no indices were found that match your 
selected time span" at the top of the page.

If I understand the documentation correctly, Marvel automatically collects 
statistics from all indices on the node. What am I doing wrong?
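
As a quick sanity check (a sketch, assuming Marvel's default .marvel-* index 
naming), listing the indices should show whether the agent is writing 
anything at all:

    # List all indices and aliases known to the cluster
    curl -XGET 'localhost:9200/_aliases?pretty'
    # Check stats for Marvel's own indices specifically
    curl -XGET 'localhost:9200/.marvel-*/_stats?pretty'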



Re: There were no results because no indices were found that match your selected time span

2014-01-28 Thread Thomas Ardal
As bonus info, I'm running Elasticsearch 0.90.1 on Windows Server 2012. I'm 
using the Jetty plugin to force https and basic authentication, but am 
accessing Marvel from localhost through http. My browser asks me for 
credentials when opening the Marvel URL, so it could be caused by the basic 
authentication setup. Could that be it?
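
If the authentication is the culprit, one quick check (a sketch; user and 
password are placeholders for whatever the Jetty plugin is configured with) 
is whether the Marvel indices are reachable once credentials are supplied:

    # -u passes basic auth credentials, -k skips certificate validation
    curl -u user:password -k 'https://localhost:9200/.marvel-*/_stats?pretty'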

On Tuesday, January 28, 2014 8:01:21 PM UTC+1, Thomas Ardal wrote:

 When trying out Marvel on my Elasticsearch installation, I get the error 
 "There were no results because no indices were found that match your 
 selected time span" at the top of the page.

 If I understand the documentation correctly, Marvel automatically collects 
 statistics from all indices on the node. What am I doing wrong?




Re: There were no results because no indices were found that match your selected time span

2014-01-28 Thread Thomas Ardal
I know and that's the plan. But with 1.0.0 right around the corner and a 
lot of data to migrate, I'll probably wait for that one.

Does Marvel only support the most recent versions of ES?
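
For reference, the node's root endpoint reports the running version, which 
makes it easy to confirm what a node is on before planning the upgrade:

    # Returns the node name and version.number as JSON
    curl -XGET 'localhost:9200/?pretty'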

On Tuesday, January 28, 2014 8:43:26 PM UTC+1, David Pilato wrote:

 0.90.1?
 You should update to 0.90.10.

 --
 David ;-)
 Twitter : @dadoonet / @elasticsearchfr / @scrutmydocs

 On 28 Jan 2014 at 20:11, Thomas Ardal thoma...@gmail.com 
 wrote:

 As bonus info, I'm running Elasticsearch 0.90.1 on Windows Server 2012. I'm 
 using the Jetty plugin to force https and basic authentication, but am 
 accessing Marvel from localhost through http. My browser asks me for 
 credentials when opening the Marvel URL, so it could be caused by the basic 
 authentication setup. Could that be it?

 On Tuesday, January 28, 2014 8:01:21 PM UTC+1, Thomas Ardal wrote:

 When trying out Marvel on my Elasticsearch installation, I get the error 
 "There were no results because no indices were found that match your 
 selected time span" at the top of the page.

 If I understand the documentation correctly, Marvel automatically collects 
 statistics from all indices on the node. What am I doing wrong?






Re: parent/child where parent exists query

2014-01-24 Thread Thomas
Hi Adrien and thanks for the reply,

This sounds like what I was looking for :) Will investigate it
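
For anyone finding this thread later, a minimal sketch of what I understood 
(the parent type deliveryLog is from my own schema; a match_all parent query 
simply requires that a parent document exists):

    {
        "query": {
            "has_parent": {
                "type": "deliveryLog",
                "query": { "match_all": {} }
            }
        }
    }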

Thanks
Thomas

On Thursday, 23 January 2014 20:25:01 UTC+2, Thomas wrote:

 Hi,

 I have been working on a parent/child schema, and I was wondering if 
 there is a way to search child documents with a query that requires the 
 parent to exist, and get only those documents.

 I can index child documents without a parent document being mandatory. 
 Therefore, I want to get only the children that have a parent document.

 Is this functionality possible? How can such a query be performed? I am 
 working with the latest version, 1.0.0.RC1.


 Thank you
 Thomas 





Re: parent/child where parent exists query

2014-01-24 Thread Thomas
Hi,

I have two questions with regards this feature

1) I have been reading the documentation, and at some point it says that 
all _ids are loaded into memory:

"all _id values are loaded to memory (heap) in order to support fast lookups"

Does this mean only the parent _ids that match are loaded? Or all the _ids 
involved in the parent/child relationship?

 
2) Is it possible to perform an aggregation that takes some values from the 
parent document and some from the child document? For example, take some 
information from the parent document (e.g. city) and some from the 
child document (e.g. car model = BMW), do some counts, and 
present them all together?

e.g.

{
    "parentDoc": {
        "properties": {
            "city": {
                "type": "string",
                "index": "not_analyzed"
            }
        }
    }
}

{
    "childDoc": {
        "properties": {
            "carModel": {
                "type": "string",
                "index": "not_analyzed"
            }
        }
    }
}
 

Aggregate child documents where a parent exists (because I have child 
documents without parent info and I do not want to include those), per city 
and carModel. Below is an idea of what I'm trying to do:

curl -XGET 'localhost:9200/delivery_logs_pc_idx/childDoc/_search?pretty' -d '{
    "query": {
        "match_all": {}
    },
    "aggregations": {
        "myFilter": {
            "filter": {
                "has_parent": {
                    "type": "deliveryLog",
                    "query": { "match_all": {} }
                }
            },
            "aggregations": {
                "preferrence": {
                    "terms": {
                        "script": "doc.parent.city.value + \" \" + doc.child.carModel.value",
                        "size": 100
                    }
                }
            }
        }
    },
    "size": 0
}'

Is there an alternative way of achieving that?

Thank you

On Thursday, 23 January 2014 20:25:01 UTC+2, Thomas wrote:

 Hi,

 I have been working on a parent/child schema, and I was wondering if 
 there is a way to search child documents with a query that requires the 
 parent to exist, and get only those documents.

 I can index child documents without a parent document being mandatory. 
 Therefore, I want to get only the children that have a parent document.

 Is this functionality possible? How can such a query be performed? I am 
 working with the latest version, 1.0.0.RC1.


 Thank you
 Thomas 





Re: Cannot set string type to analyzed

2013-12-17 Thread Thomas
Thanks for your reply.

Here is an example:

{
    "query": {
        "filtered": {
            "query": { "term": { "userId": "testUser1" } },
            "filter": {
                "nested": {
                    "path": "extra",
                    "query": {
                        "filtered": {
                            "query": { "match_all": {} },
                            "filter": {
                                "bool": {
                                    "must": [
                                        { "term": { "key": "city" } },
                                        { "term": { "value": "Stockholm" } }
                                    ]
                                }
                            }
                        }
                    }
                }
            }
        }
    },
    "size": 0
}

I have inserted data with key "city" and value "Stockholm" in my scenario. 
The thing is, if I change the mapping to not_analyzed and run the same 
scenario, I get data back, whereas if I set index to analyzed, or do not 
set it at all (the default), I get no results. This leads me to the 
conclusion that somehow analyzed is not accepted when setting the mapping.
Thomas

On Tuesday, December 17, 2013 5:10:58 PM UTC+2, Thomas wrote:

 Hi,

 I'm trying to create a mapping for a nested document, and I realize that I 
 cannot set the index type of a string field to analyzed:

 {
     "action": {
         "_all": { "enabled": false },
         "_type": { "index": "no" },
         "_timestamp": { "enabled": true, "store": "yes" },
         "_routing": { "required": true, "path": "userId" },
         "properties": {
             "userId": {
                 "type": "string",
                 "index": "not_analyzed"
             },
             "extra": {
                 "type": "nested",
                 "properties": {
                     "key": {
                         "type": "string",
                         "index": "not_analyzed"
                     },
                     "value": {
                         "type": "string",
                         "index": "analyzed"
                     }
                 }
             }
         }
     }
 }


 The above ends up with the following result when I fetch the mapping 
 (curl -XGET localhost:9200/my_index/action/_mapping?pretty):

 {
     "action": {
         "_all": { "enabled": false },
         "_type": { "index": "no" },
         "_timestamp": { "enabled": true, "store": "yes" },
         "_routing": { "required": true, "path": "userId" },
         "properties": {
             "userId": {
                 "type": "string",
                 "index": "not_analyzed",
                 "omit_norms": true,
                 "index_options": "docs"
             },
             "extra": {
                 "type": "nested",
                 "properties": {
                     "key": {
                         "type": "string",
                         "index": "not_analyzed",
                         "omit_norms": true,
                         "index_options": "docs"
                     },
                     "value": {
                         "type": "string"
                     }
                 }
             }
         }
     }
 }

 Shouldn't it display the index type as analyzed? Furthermore, I cannot 
 search on this field. Should I search by specifying the analyzer?

 ES version is 0.90.7, but I have noticed this since 0.90.2; I'm probably 
 doing something wrong here.

 Looking forward to your reply.
 Thank you



