Malformed Json for an object with a rawfield only using XContentBuilder

2013-12-24 Thread Andrea Zinicola
Hi,

I couldn't find a way to create a valid document using the XContentBuilder 
that includes only a raw field. 

Following is the code I used:

  String jsonStr = "{\"field\":\"value\"}";
  XContentBuilder xb = jsonBuilder()
      .startObject()
      .rawField("object_name", jsonStr.getBytes())
      .endObject();

This code resulted in a malformed JSON document: { , "object_name" : 
{"field":"value"} }. Looking at the JsonXContentGenerator implementation of 
writeRawField, it looks like this is exactly what the code is expected to do; in 
fact:

@Override
public void writeRawField(String fieldName, byte[] content,
        OutputStream bos) throws IOException {
    generator.writeRaw(", \"");
    generator.writeRaw(fieldName);
    generator.writeRaw("\" : ");
    flush();
    bos.write(content);
}

In order to overcome this I had to add a dummy empty field to the document:

  XContentBuilder xb = jsonBuilder()
      .startObject()
      .field("dummy", "")
      .rawField(DOC_OBJECT_NAME, jsonStr.getBytes())
      .endObject();

I wonder whether there is any other method that allows adding only a raw field 
to a document using XContentBuilder.

Thank you in advance,

Andrea



Re: Need help retrieving field from ES

2013-12-24 Thread Nick Toseland
Hi Karol

Thanks for the reply. We have been left this ES setup by a previous member
of staff.

We create new indexes every hour using the following Perl statement.

Are you saying I have to add store => 'yes' to keepalive?
We don't do that for the others, as you can see:


create_index(
    index    => $index,
    settings => {
        _timestamp         => { enabled => 1, store => 1 },
        number_of_shards   => 3,
        number_of_replicas => 1,
    },
    mappings => {
        varnish => {
            _timestamp => { enabled => 1, store => 1 },
            properties => {
                content_length => { type => 'integer' },
                age            => { type => 'integer' },
                keepalive      => { type => 'integer' },
                resp_time      => { type => 'float' },
                host           => { type => 'string', index => 'not_analyzed' },
                time           => { type => 'string', store => 'yes' },
                # SNIPPED
                location       => { type => 'string', index => 'not_analyzed' },
                addr => {
                    fields => {
                        ip   => { type => 'ip' },
                        addr => { type => 'string', index => 'not_analyzed' },
                    }
                },
            }
        }
    },
);

I wouldn't know if the _source field has been disabled; how do I check? Does
this help more:

{
    "_index": "2013122312",
    "_type": "log",
    "_id": "Juh_YQJaT4GQ8Pjwk1bnqw",
    "_score": 1,
    "_source": {
        "protocol": "HTTP/1.0",
        "cdn": "-",
        "vary": "Accept-Encoding,ETag",
        "browser": "Mozilla/5.0 (Windows; U; Windows NT 6.0; en-US; rv:1.9.1.1) GeckaSeka/20090911 Firefox/3.5.1",
        "encoding": "-",
        "location": "-",
        "geo": "US",
        "ref": "-",
        "origin": "-",
        "cookie": "-",
        "uri": "/",
        "cache_control": "-",
        "content_length": 54053,
        "userid": 0,
        "age": 11556,
        "resp_time": 0.000110149,
        "method": "GET",
        "accept": "-",
        "ssl": 0,
        "response_code": 200,
        "accept_language": "-",
        "varnstat": "hit",
        "_src": "log",
        "addr": "41.5.97.6"
    }
}


Do you need anymore information to help?

Thanks again

Nick
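
For reference, a sketch of how this is usually handled (names are taken from the 
snippets in this thread; the store setting cannot be changed on an existing 
field, so it has to go into the mapping used by the hourly create_index call):

    keepalive => { type => 'integer', store => 'yes' },

and then the field can be requested explicitly at search time:

GET /2013122312/log/_search
{
  "query": { "match_all": {} },
  "fields": ["keepalive", "cdn"]
}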



On Tue, Dec 24, 2013 at 2:27 AM, Karol Gwaj ka...@gwaj.me wrote:

 the keepalive field is stored in the _source field (if you want to store it
 separately you have to add "store": true to the mapping)
 hard to tell more based on your example,
 also maybe you disabled the _source field completely?


 On Monday, December 23, 2013 8:40:17 PM UTC, Nick Toseland wrote:

 Hi All

 I am new to ElasticSearch, please forgive my stupidity.

 I can't seem to get the keepalive field out of ES.

 {
   "_index" : "lj-2013122320",
   "_type" : "varnish",
   "_id" : "Y1M18ZItTDaap_rOAS5YOA",
   "_score" : 1.0
 }

 I can get other fields out of it, e.g. cdn:

 {
   "_index" : "2013122320",
   "_type" : "log",
   "_id" : "2neLlVNKQCmXq6etTE6Kcw",
   "_score" : 1.0,
   "fields" : {
     "cdn" : "-"
   }
 }

 The mapping is there:

 {"log":{"_timestamp":{"enabled":true,"store":true},"properties":
 {"keepalive":{"type":"integer"

 Any help is much appreciated.

 Thanks in advance

 Nick



Setting path.data stops the EC2 stuff

2013-12-24 Thread Steinar Bang
Platform: ES 0.90.7 on Ubuntu 12.04 on an AWS EC2 instance

I now have the EC2 clustering stuff working by itself.

What I'm trying to do now is index on ephemeral storage (where I have
space), save the index to S3, and restore it on EC2 instance startup.

Note: I am only running a single node in a single EC2 instance.

Note 2: I will try the EBS approach as well, but right now I have an S3 bucket
set up for me and can experiment with it. I need someone with EC2 admin
privileges to set up the EBS volume (...or so I think...?).

My problem now is that if I set path.data in
/etc/elasticsearch/elasticsearch.yml to /mnt/elasticsearch (to get
ephemeral storage, since my main problem here is running out of disk
space on an EC2 instance that is started and stopped), then the EC2
discovery and S3 settings seem to be skipped.

Here is the complete elasticsearch.yml file with keys XXX'd:
 https://gist.github.com/steinarb/8094353

I.e. if I don't set path.data, then the file
/var/log/elasticsearch/mysecretname-cluster.log is written to, and messages
like this are output:
 [2013-12-24 12:17:14,341][TRACE][discovery.ec2] [Foster, Bill] 
building dynamic unicast discovery nodes...
 [2013-12-24 12:17:14,342][DEBUG][discovery.ec2] [Foster, Bill] 
using dynamic discovery nodes []
 ...
 [2013-12-24 12:17:17,715][DEBUG][gateway.s3   ] [Foster, Bill] 
reading state from gateway 
org.elasticsearch.gateway.shared.SharedStorageGateway$1@2eb9f428 ...

If I do set path.data, /var/log/elasticsearch/elasticsearch.log is
written to and no .ec2 or .s3 messages can be found in the log.

(Once I get this working, one more problem will be to ensure that
/mnt/elasticsearch is created and owned by user elasticsearch before ES
is started.)
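
For reference, the settings involved are all flat top-level keys in
elasticsearch.yml, so a minimal sketch of how they can coexist (bucket and key
values are placeholders, the cluster name is inferred from the log file name, and
the cloud-aws plugin's ec2 discovery and s3 gateway are assumed):

path.data: /mnt/elasticsearch
cluster.name: mysecretname-cluster
discovery.type: ec2
gateway.type: s3
gateway.s3.bucket: my-bucket
cloud.aws.access_key: XXX
cloud.aws.secret_key: XXX

If adding the path.data line makes the ec2/s3 log lines disappear, it is worth
double-checking that the edit went into the elasticsearch.yml the service
actually reads and that the YAML indentation of the lines below it did not change.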



Re: Aggregate vs facets vs nested documents ?

2013-12-24 Thread Adrien Grand
Hi,

Your data model assumes a 1-N relationship between transactions and
worker events. There are several ways you can solve this kind of issue
with Elasticsearch:
 - denormalization,
 - nested documents,
 - parent/child relations.

Maybe the easiest way to do it would be to store data directly in the
expected format. On a start event, you would insert a new transaction into
your index using the transaction id as a document id, and later when the
transaction ends, you could update the transaction to record the fact that
the transaction finished and its duration. With this option, data is
indexed in a way that is easily searchable and you'll be able to leverage
all the power of aggregations and facets to compute things like the number of
non-terminated transactions, the average duration, etc.
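
A minimal sketch of that approach (index, type, field names and values are
illustrative): on the start event,

PUT /transactions/transaction/tx-42
{ "status": "started", "started_at": "2013-12-24T10:00:00" }

and when the matching end event arrives,

POST /transactions/transaction/tx-42/_update
{ "doc": { "status": "finished", "duration_ms": 1830 } }

so each transaction ends up as a single document that searches and aggregations
can work on directly.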

However, if you have a very high ingestion rate, this option might become a
bit too slow, in which case you might want instead to record transactions
as parent documents and events as child documents using parent/child
relations. This will make indexing faster but require more memory and
you'll have less power at query time. For example, finding transactions
that started but didn't finish would require finding the start events,
then resolve their transaction ids, then find the end events, resolve their
transaction ids and finally to compute the difference of the two sets of
transactions. This kind of query would typically execute much slower than
if you already had all information in a single transaction document (as
with the previous option). Moreover computing things like the average
duration of transactions using facets or aggregations wouldn't be possible
anymore with parent/child relations.

See
http://www.elasticsearch.org/blog/managing-relations-inside-elasticsearch/
for more information.

-- 
Adrien Grand



Re: ElasticSearch and Oracle Java version 1.7 update 45

2013-12-24 Thread Adrien Grand
Hi,

Indeed, Lucene suffers from bugs in Oracle's Java 1.7u40 and 1.7u45.
However, 1.7u25 should be safe. You can find information about these bugs
in Lucene's JIRA and mailing list. For example, here is one that affects
Java >= 1.7u40: https://issues.apache.org/jira/browse/LUCENE-5212

-- 
Adrien Grand



Compared to Solr (with Solr Cloud), what is the advantage(s) of Elasticsearch?

2013-12-24 Thread Daniel Guo
I have never used Apache Solr before, and I'm trying ElasticSearch in my project.
The documentation of ES is a little scarce, but I have to explain to my 
supervisor why I chose ES over Solr.

As far as I know, Solr (with Solr Cloud) also supports distributed 
indexing, near real-time update and searching, and automatic load 
balancing, 
which are the main features of ElasticSearch. 

What are the advantages of ES compared to Apache Solr? Could anybody give 
me a tip, or some information links?
Thanks a lot.



Re: Compared to Solr (with Solr Cloud), what is the advantage(s) of Elasticsearch?

2013-12-24 Thread David Pilato
I would say: play with both for some hours.
I really think you will get some answers by yourself!

I don't want to say more than this as I have probably a biased opinion ;-)

-- 
David Pilato | Technical Advocate | Elasticsearch.com
@dadoonet | @elasticsearchfr


On 24 December 2013 at 15:16:59, Daniel Guo (daniel5...@gmail.com) wrote:

I never used Apache Solr before, and I'm trying ElasticSearch in my project.
The document of ES is a little scarce, but I have to explain to my supervisor 
why I chose ES over Solr.

As far as I know, Solr (with Solr Cloud) also supports distributed indexing, 
near real-time update and searching, and automatic load balancing, 
which are the main features of ElasticSearch. 

What are the advantages of ES comparing to Apache Solr? Could anybody give me a 
tip, or some information links?
Thanks a lot.



Re: Compared to Solr (with Solr Cloud), what is the advantage(s) of Elasticsearch?

2013-12-24 Thread Nikolas Everett
About six months ago I spent a week porting a prototype from Solr Cloud to
Elasticsearch with the intent of evaluating Elasticsearch and either
throwing out the port or building off of it.  By the third day or so I was
convinced I'd stick with Elasticsearch because:
1.  I was impressed with
http://www.elasticsearch.org/contributing-to-elasticsearch/.
2.  The documentation is better.
3.  I liked the query DSL better than solr's.
4.  There is some HTTP GET that you can hit in Solr that will delete the
index (or a shard or something).  That shook my faith in humanity a little.
 Especially when I pasted it into IRC and my coworker clicked it or moused
over it or something.  GETs.  Idempotent.
5.  I liked the phrase suggester.
6.  My ops team seemed to like it better.
7.  There was (and still is) a deb package.
8.  I liked the way Elasticsearch was tested.  I admit I haven't actually
looked into how Solr is tested.

Since then:
1.  I've enjoyed the process of landing changes in Elasticsearch much more
than in Lucene.  I assume Solr would be the same because it is in the same
repository as Lucene.  The GitHub process (pull requests, etc.) is better
than JIRA/svn/patch files.  I also think the Elasticsearch
committers/repository collaborators are easier to work with than the Lucene
folks.
2.  The phrase suggester needed some work to be as good as our
(surprisingly advanced) home grown suggester.  It is now that good.
3.  Elasticsearch has really improved the process of maintaining their
documentation so I imagine it'll only get better.
4.  It seems to be working.  We're using 0.90.7 at this point (see
https://en.wikisource.org/wiki/Special:Version) to power the search on a
couple hundred wikis without any trouble.  Try it:
https://en.wikisource.org/w/index.php?search=alias&title=Special%3ASearch

Nik


On Tue, Dec 24, 2013 at 9:45 AM, David Pilato da...@pilato.fr wrote:

 I would say: play with both for some hours.
 I really think you will get some answers by yourself!

 I don't want to say more than this as I have probably a biased opinion ;-)

 --
 *David Pilato* | *Technical Advocate* | *Elasticsearch.com*
 @dadoonet https://twitter.com/dadoonet | 
 @elasticsearchfrhttps://twitter.com/elasticsearchfr


 On 24 December 2013 at 15:16:59, Daniel Guo
 (daniel5...@gmail.com) wrote:

 I never used Apache Solr before, and I'm trying ElasticSearch in my
 project.
 The document of ES is a little scarce, but I have to explain to my
 supervisor why I chose ES over Solr.

 As far as I know, Solr (with Solr Cloud) also supports distributed
 indexing, near real-time update and searching, and automatic load
 balancing,
 which are the main features of ElasticSearch.

 What are the advantages of ES comparing to Apache Solr? Could anybody give
 me a tip, or some information links?
 Thanks a lot.



Re: Possible to make ES Node Name same as Hostname?

2013-12-24 Thread Tony Su
Hello Jorg,
I was not aware of this page, but after skimming it, it is not what I 
need in this situation.
The majority of the commands on this page are invoked as attributes of 
systemctl, which is the command-line tool to inspect, modify and manage 
systemd on a running system.
I'm not yet sure why some commands are included that AFAIK are more 
appropriately invoked within a unit config file, but maybe those commands 
can be invoked in both situations.
 
But, since these systemd environment commands are to be run from within 
systemctl, they're basically ways to modify a running environment 
interactively; they're not the way to set up and modify the environment on 
bootup.
 
In systemd, environment variables can be set up during boot, primarily in 
the .target unit files, but because systemd is fully compatible 
with other Linux subsystems, almost all of the traditional ways to set up, 
modify and run things are supported. In openSUSE's case, it's migrating from the 
well-known SysVinit subsystem, so until the legacy init and bash ways of 
invoking code are replaced, they are still a perfectly legitimate way of doing 
things.
 
So, that is why in my previous post I described how I traditionally used 
the bash-script way of creating an environment variable 
(/etc/profile.local) and then tested to make sure it works... so that 
doesn't appear to be the problem. 
 
Instead, I believe the problem lies specifically in the Elasticsearch code, since 
the following very specific error was returned:
elasticsearch.service: main process exited, code=exited, 
status=3/NOTIMPLEMENTED
 
It could have been any error, but why one so specific?
 
Tony
 
 
 
 
 

On Monday, December 23, 2013 2:54:35 PM UTC-8, Jörg Prante wrote:

 Note, if you are using systemd, you must set environment vars with 
 systemctl http://www.freedesktop.org/software/systemd/man/systemctl.html

 Not sure what the -Des commands are. If you mean the elasticsearch command 
 line, many ES config variables can be prefixed with es.; the -D flag is 
 Java's.

 Jörg





Re: How can I merge the results of two aggregations?

2013-12-24 Thread Adrien Grand
You can use scripts to run aggregations on the union of two fields. Here is
an example:

GET /test/_search
{
  "aggregations": {
    "name": {
      "terms": {
        "script": "_doc['firstname'].values + _doc['lastname'].values"
      }
    }
  }
}

Another alternative would be to index both first names and last names in a
single 'name' field and to run the aggregation on this field at search
time. Although this would require more disk space, this would also be
faster.
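
A sketch of that alternative (field names are illustrative): write both values
into one extra field at index time and aggregate on it:

PUT /test/person/1
{ "firstname": "john", "lastname": "smith", "name": ["john", "smith"] }

GET /test/_search
{
  "aggregations": {
    "name": {
      "terms": { "field": "name" }
    }
  }
}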

On Fri, Dec 20, 2013 at 10:53 AM, Tim S timsti...@gmail.com wrote:

 Sorry, maybe I'm missing something. Afaics the OR filter would operate on
 filters or queries, not on aggregations. Can you give me an example of how
 I'd use this to merge the result of the two aggregations?

 Thanks.


 On Friday, December 20, 2013 2:54:14 AM UTC, kidkid wrote:

 Hi, sorry, it would be my mistake:

 Could you take a look at the OrFilter:
 http://www.elasticsearch.org/guide/en/elasticsearch/reference/current/query-dsl-or-filter.html

 In your case I think you could use a match_all query & use OrFilter & let
 ES merge the result.





 On Thursday, December 19, 2013 2:29:06 AM UTC-8, Tim S wrote:

 How can I use bool query on the result of an aggregation? I want both
 aggregations to independently facet on the whole index, then merge the
 results. I can see how I would use a bool query to limit the set of docs
 I'm aggregating on, but I can't see how I would use it to merge the results
 of two aggregations?

 Thanks,

 Tim.

 On Wednesday, December 18, 2013 4:40:37 PM UTC, kidkid wrote:

 You could use bool query and let ElasticSearch do the rest.

 http://www.elasticsearch.org/guide/en/elasticsearch/
 reference/current/query-dsl-bool-query.html

 On Wednesday, December 18, 2013 3:58:07 PM UTC+7, Tim S wrote:

 In the example below, I ask elasticsearch for two aggregations (I've
 simplified it, it's actually got some nested aggregations in there).

 {
   "aggregations": {
     "agg1": {
       "terms": {
         "field": "forname"
       }
     },
     "agg2": {
       "terms": {
         "field": "surname"
       }
     }
   }
 }

 What I get back is two sets of results, i.e.

 {
   "aggregations" : {
     "agg1" : {
       "buckets" : [ {
         "key" : "john",
         "doc_count" : 1
       }, {
         "key" : "bob",
         "doc_count" : 4
       } ]
     },
     "agg2" : {
       "buckets" : [ {
         "key" : "smith",
         "doc_count" : 3
       }, {
         "key" : "jones",
         "doc_count" : 2
       } ]
     }
   }
 }

 What I'd like to get back is one set of results. I.e. a list of terms
 appearing in either of the fields, with the counts summed across both, 
 e.g.

 {
   "aggregations" : {
     "agg1 OR agg2" : {
       "buckets" : [ {
         "key" : "john",
         "doc_count" : 1
       }, {
         "key" : "bob",
         "doc_count" : 4
       }, {
         "key" : "smith",
         "doc_count" : 3
       }, {
         "key" : "jones",
         "doc_count" : 2
       } ]
     }
   }
 }

 Is there any way of doing this? I could request them and merge them in
 my own code, but if there's a built in way then I'd rather use that.

 Thanks,

 Tim.





-- 
Adrien Grand



Completion suggester with separated doc types (multiple indices?)

2013-12-24 Thread Facundo Olano
Hello, I've started playing around with the completion suggester to 
implement autocomplete functionality in my application and found it pretty 
straightforward and simple to use. 

The problem I have is that I don't want mixed types in my suggestions: if I 
have a music index with song and artist types, I want to have song 
and artist autocompletes. From the documentation I get that this doesn't 
seem to be supported, so I'm considering having a separate index for each 
type. I've read
http://elasticsearch-users.115913.n3.nabble.com/More-indices-vs-more-types-td3999423.html#a4002051
that this is not the best practice, but from the size of my data I presume 
I won't be having problems: I have around 10 types, most of them with 
around 2000 documents. It's worth noting that I'm just using elasticsearch 
for this autocomplete functionality (although I may use it for regular 
search eventually).

So I wanted to check if it makes sense modeling my indices that way or if 
there's an alternative solution to my problem (for example, using the suggest 
plugin https://github.com/spinscale/elasticsearch-suggest-plugin?)

Thanks, 
Facundo.



Re: Possible to make ES Node Name same as Hostname?

2013-12-24 Thread Tony Su
OK, 
I ran each of your tests, but because I'm invoking elasticsearch as a 
service and not from the CLI, I interactively ran the command you wanted 
prepended, then started the elasticsearch service, then reloaded 
elasticsearch-head pointing to the elasticsearch node.
 
So, as follows, I first disabled the service so it doesn't start 
automatically. After each block below I ran elasticsearch-head, with the 
same result: a random friendly node name was created in 
elasticsearch-head.
 
# echo $NAME
/bin/hostname
# $NAME
ELASTICSEAR-1
 
# systemctl stop elasticsearch.service
# export NAME=hostname
# systemctl start elasticsearch.service
 
# systemctl stop elasticsearch.service
# export NAME=$HOSTNAME
# systemctl start elasticsearch.service
 
# systemctl stop elasticsearch.service
# export NAME=$hostname
# systemctl start elasticsearch.service
 
For your reference, here are the contents of the elasticsearch.service unit file 
(aka the service configuration file). Although it references several external 
files (via the -Des options), I doubt in this case that anything in them is likely 
to be relevant, because we seem to be dealing with a variable that (if supported) 
has a null value.
 
[Unit]
Description=Starts and stops a single elasticsearch instance on this system
Documentation=http://www.elasticsearch.org
[Service]
Type=forking
EnvironmentFile=/etc/sysconfig/elasticsearch
User=elasticsearch
Group=elasticsearch
PIDFile=/var/run/elasticsearch/elasticsearch.pid
ExecStart=/usr/share/elasticsearch/bin/elasticsearch -p 
/var/run/elasticsearch/elasticsearch.pid -Des.default.config=$CONF_FILE 
-Des.default.path.home=$ES_HOME -Des.default.path.logs=$LOG_DIR 
-Des.default.path.data=$DATA_DIR -Des.default.path.work=$WORK_DIR 
-Des.default.path.conf=$CONF_DIR
# See MAX_OPEN_FILES
LimitNOFILE=65535
# See MAX_LOCKED_MEMORY, use infinity when MAX_LOCKED_MEMORY=unlimited 
and using bootstrap.mlockall: true
#LimitMEMLOCK=infinity
[Install]
WantedBy=multi-user.target
 
 
A thought: if you're running the same version as me, I wonder if there 
might be a difference between the RPM build (which I am using) and yours, 
which of course is a DEB build unless you built from source. I'm using the 
RPM downloaded directly from the Elasticsearch website.
 
Tony

 
 
 
 
 
 
 
 
 
 
 

On Monday, December 23, 2013 6:16:45 PM UTC-8, Karol Gwaj wrote:

 Im using elasticsearch version 0.90.7 (which is this same as the one you 
 mentioned in your question)
 Also im running my elasticsearch cluster on ubuntu, so my example was more 
 suited for this linux distribution

 and coming back to your problem, to diagnose it you can:
 - add *echo $NAME* statement at the beginning of bin/elasticsearch script 
 (if nothing is printed then your environment variable is not declared 
 correctly)
 - add *export NAME=`hostname`* at the beginning of bin/elasticsearch 
 script 
 - add *export NAME=$HOSTNAME* at the beginning of bin/elasticsearch script

 can you try the steps above first (one at the time), before defining your 
 environment variable in /etc/profile.local

 Cheers,

 On Monday, December 23, 2013 9:52:40 PM UTC, Tony Su wrote:

 After considering this post,
 I successfully created an environmental variable by adding to the bash 
 profile file (actually on the openSUSE I'm running, I created a file 
 /etc/profile.local which contains system customizations, the original 
 /etc/profile should not be edited). BTW - on a non-Windows box, hostname 
 must be in lower case, not upper case.
  
 /etc/profile.local
 export NAME=/bin/hostname
  
 After running source /etc/profile.local to activate the contents of the 
 file I can successfully test the new variable, it does return the machine's 
 hostname.
  
 $NAME
  
 But, when I modify the elasticsearch.yml file exactly as described
  
   node.name: ${NAME}
  
 The result is that the elasticsearch service fails to start with the 
 following error:
  
 ELASTICSEAR-1 systemd[1]: Starting Starts and stops a single 
 elasticsearch instance on this system...
 Dec 23 13:23:18 ELASTICSEAR-1 systemd[1]: PID file 
 /var/run/elasticsearch/elasticsearch.pid not readable (yet?) after start.
 Dec 23 13:23:43 ELASTICSEAR-1 systemd[1]: Started Starts and stops a 
 single elasticsearch instance on this system.
 Dec 23 13:23:43 ELASTICSEAR-1 systemd[1]: elasticsearch.service: main 
 process exited, code=exited, status=3/NOTIMPLEMENTED
 Dec 23 13:23:43 ELASTICSEAR-1 systemd[1]: Unit elasticsearch.service 
 entered failed state.
 I also tried without the curly braces, but then the string is read 
 literally and not as a variable.
 Commenting out the attempt to set the node name to the hostname allows 
 the elasticsearch service to start again.
  
 From the above error(not implemented), is it possible that the current 
 stable elasticsearch release does not support your recommendation and I 
 need to maybe install an unstable version?
  
 Thx,
 Tony
  
  
  
  
  

 On Sunday, December 22, 2013 4:53:48 PM UTC-8, Karol Gwaj 

Reports and Notifications.

2013-12-24 Thread CP
We have a HUGE Splunk install and are constantly running into our limits. 
 We have decided to go with a tiered solution using 
kibana+logstash+elasticsearch.  The one thing that we really need is a way 
to have reports generated like we can with Splunk.  Does anyone know if 
there is a plugin or third-party app to do this sort of thing?

Thanks,
CP



Re: Possible to make ES Node Name same as Hostname?

2013-12-24 Thread Karol Gwaj
Yep, my setup is a little bit different from yours:
I actually downloaded elasticsearch as a tar.gz file.

Probably the problem you are experiencing is related more to the fact that 
systemd is not passing environment variables to the executed script.
I had a similar problem with upstart (which is the equivalent of systemd on 
Ubuntu).

From what I know about systemd, the way to define environment variables 
that will be passed to the executed script is through EnvironmentFile, 
so maybe try to define the NAME environment variable in 
*/etc/sysconfig/elasticsearch* (the EnvironmentFile).

From your service unit file I see that the elasticsearch startup script is 
located at: */usr/share/elasticsearch/bin/elasticsearch*
If you add the definition of your environment variable at the beginning of this 
file, then you will not have to worry about systemd not passing 
environment variables to your script.

So try something like that first:

export NAME=$HOSTNAME
echo $NAME   # this should print your hostname
/usr/share/elasticsearch/bin/elasticsearch
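
If the EnvironmentFile route is preferred instead, a sketch (note that systemd
does not run shell expansion on EnvironmentFile entries, so the value has to be
a literal; the hostname below is just an example):

# /etc/sysconfig/elasticsearch (referenced by EnvironmentFile= in the unit file)
NAME=elasticsear-1

# elasticsearch.yml
node.name: ${NAME}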


On Tuesday, December 24, 2013 4:24:02 PM UTC, Tony Su wrote:

 OK, 
 I ran each of your tests, but because I'm invoking elasticsearch as a 
 service and not from the CLI, I interactively ran the command you wanted 
 prepended, then started the elasticsearch service, then reloaded 
 elasticsearch-head pointing to the elasticsearch node.
  
 So, as follows, I first disabled the service so it doesn't start 
 automatically. After each block below I ran elasticsearch-head with the 
 same results, a random friendly node name was created in 
 elasticsearch-head
  
 # echo $NAME
 /bin/hostname
 # $NAME
 ELASTICSEAR-1
  
 # systemctl stop elasticsearch.service
 # export NAME=hostname
 # systemctl start elasticsearch.service
  
 # systemtcl stop elasticsearch.service
 # export NAME=$HOSTNAME
 # systemctl start elasticsearch.service
  
 # systemctl stop elasticsearch.service
 # export NAME=$hostname
 # systemctl start elasticsearch.service
  
 For your reference is the contents of the elasticsearch.service Unit file 
 (aka service configuration file). Although it references many exterior 
 files (the -Des commands), I doubt in this case anything in them are likely 
 to be relevant because we seem to be dealing with a variable (if supported) 
 is a null value.
  
 [Unit]
 Description=Starts and stops a single elasticsearch instance on this system
 Documentation=http://www.elasticsearch.org
 [Service]
 Type=forking
 EnvironmentFile=/etc/sysconfig/elasticsearch
 User=elasticsearch
 Group=elasticsearch
 PIDFile=/var/run/elasticsearch/elasticsearch.pid
 ExecStart=/usr/share/elasticsearch/bin/elasticsearch -p 
 /var/run/elasticsearch/elasticsearch.pid -Des.default.config=$CONF_FILE 
 -Des.default.path.home=$ES_HOME -Des.default.path.logs=$LOG_DIR 
 -Des.default.path.data=$DATA_DIR -Des.default.path.work=$WORK_DIR 
 -Des.default.path.conf=$CONF_DIR
 # See MAX_OPEN_FILES
 LimitNOFILE=65535
 # See MAX_LOCKED_MEMORY, use infinity when MAX_LOCKED_MEMORY=unlimited 
 and using bootstrap.mlockall: true
 #LimitMEMLOCK=infinity
 [Install]
 WantedBy=multi-user.target
  
  
 A thought, if you're running the same version as me, I wonder if there 
 might be a difference between the RPM build (which I am using) and yours 
 which of course is a DEB build unless you built from source. I'm using the 
 RPM downloaded directly through the Elasticsearch website.
  
 Tony

  
  
  
  
  
  
  
  
  
  
  

 On Monday, December 23, 2013 6:16:45 PM UTC-8, Karol Gwaj wrote:

 Im using elasticsearch version 0.90.7 (which is this same as the one you 
 mentioned in your question)
 Also im running my elasticsearch cluster on ubuntu, so my example was 
 more suited for this linux distribution

 and coming back to your problem, to diagnose it you can:
 - add *echo $NAME* statement at the beginning of bin/elasticsearch 
 script (if nothing is printed then your environment variable is not 
 declared correctly)
 - add *export NAME=`hostname`* at the beginning of bin/elasticsearch 
 script 
 - add *export NAME=$HOSTNAME* at the beginning of bin/elasticsearch 
 script

 can you try the steps above first (one at the time), before defining your 
 environment variable in /etc/profile.local

 Cheers,

 On Monday, December 23, 2013 9:52:40 PM UTC, Tony Su wrote:

 After considering this post,
 I successfully created an environmental variable by adding to the bash 
 profile file (actually on the openSUSE I'm running, I created a file 
 /etc/profile.local which contains system customizations, the original 
 /etc/profile should not be edited). BTW - on a non-Windows box, hostname 
 must be in lower case, not upper case.
  
 /etc/profile.local
 export NAME=/bin/hostname
  
 After running source /etc/profile.local to activate the contents of 
 the file I can successfully test the new variable, it does return the 
 machine's hostname.
  
 $NAME
  
 But, when I modify the elasticsearch.yml file exactly as 

Re: Completion suggester with separated doc types (multiple indices?)

2013-12-24 Thread Alexander Reelsen
Hey,

the way the completion suggester is implemented, it does not support
filtering by types (neither does the suggest plugin) - so this makes for a
pretty clear decision process for your use-case. The reason for this is
the different approach to how suggest data is stored and queried - in a
nutshell, the suggest data structure simply takes the whole index data and
uses it for suggestions. The type is, simply put, just more
metadata, which cannot be filtered out.

I'd go with the completion suggester if possible (as I wrote the suggest
plugin, I can tell that the completion suggester has a way better design
and, obviously, is part of the core).
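
For the per-index route, a minimal sketch (index, type and field names are
illustrative; this assumes a 0.90.3+ cluster, where the completion suggester and
the _suggest endpoint are available):

PUT /songs
{
  "mappings": {
    "song": {
      "properties": {
        "title_suggest": { "type": "completion" }
      }
    }
  }
}

POST /songs/_suggest
{
  "song-suggest": {
    "text": "bohem",
    "completion": { "field": "title_suggest" }
  }
}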

Hope this helps...


--Alex



On Tue, Dec 24, 2013 at 4:50 PM, Facundo Olano facundo.ol...@gmail.com wrote:

 Hello, I've started playing around with the completion suggester to
 implement autocomplete functionality in my application and found it pretty
 straightforward and simple to use.

 The problem I have is that I don't want mixed types in my suggestions: if
 have a music index with song and artist types, I want to have song
 and artist autocompletes. From the documentation I get that this doesn't
 seem to be supported, so I'm considering having a separate index for each
 type. I've 
 readhttp://elasticsearch-users.115913.n3.nabble.com/More-indices-vs-more-types-td3999423.html#a4002051that
  this is not the best practice, but from the size of my data I presume
 I won't be having problems: I have around 10 types, most of them with
 around 2000 documents. It's worth noting that I'm just using elasticsearch
 for this autocomplete functionality (although I may use it for regular
 search eventually).

 So I wanted to check if it makes sense modeling my indices that way or if
 there's an alternative solution to my problem (for example, using the suggest
 plugin https://github.com/spinscale/elasticsearch-suggest-plugin?)

 Thanks,
 Facundo.



Re: Nodes are not able to connect to the master

2013-12-24 Thread Amit Soni
Alex - so if I were to look for the right suggester that works on a part of
the index (filtered) and not the entire index, is edge n-gram the best choice?

-Amit.
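
For reference, an edge n-gram autocomplete setup typically looks like the sketch
below (index, type, analyzer and field names are illustrative). Unlike the
completion suggester it is a normal query-time feature, so it can be combined
with filters on types or other fields:

PUT /music
{
  "settings": {
    "analysis": {
      "filter": {
        "autocomplete_filter": { "type": "edge_ngram", "min_gram": 1, "max_gram": 20 }
      },
      "analyzer": {
        "autocomplete": {
          "type": "custom",
          "tokenizer": "standard",
          "filter": ["lowercase", "autocomplete_filter"]
        }
      }
    }
  },
  "mappings": {
    "song": {
      "properties": {
        "title": {
          "type": "string",
          "index_analyzer": "autocomplete",
          "search_analyzer": "standard"
        }
      }
    }
  }
}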


On Tue, Dec 24, 2013 at 4:21 PM, Alexander Reelsen a...@spinscale.de wrote:

 Hey,

 Connection timed out means that the other host is not reachable. This can
 have dozens of reasons. If the node never joins the cluster, you might have
 a firewall problem. If the node had already joined the cluster, you might
 have a temporary network outage, or maybe your node is under an extremely
 high load. Without proper monitoring and more digging through the log files
 this is really hard to tell.

 A first try might be to see if you are able to reach that port manually on that
 host - completely independent from elasticsearch itself. If this does not
 work, you have other problems.


 --Alex


 On Tue, Dec 24, 2013 at 8:33 AM, deep saxena sandy100s...@gmail.com wrote:

 org.elasticsearch.transport.ConnectTransportException: [Crooked
 Man][inet[/192.168.202.1:9300]] connect_timeout[30s]
 at
 org.elasticsearch.transport.netty.NettyTransport.connectToChannels(NettyTransport.java:671)
 at
 org.elasticsearch.transport.netty.NettyTransport.connectToNode(NettyTransport.java:610)
 at
 org.elasticsearch.transport.netty.NettyTransport.connectToNode(NettyTransport.java:580)
 at
 org.elasticsearch.transport.TransportService.connectToNode(TransportService.java:127)
 at
 org.elasticsearch.cluster.service.InternalClusterService$2.run(InternalClusterService.java:300)
 at
 org.elasticsearch.common.util.concurrent.PrioritizedEsThreadPoolExecutor$TieBreakingPrioritizedRunnable.run(PrioritizedEsThreadPoolExecutor.java:95)
 at
 java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:895)
 at
 java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:918)
 at java.lang.Thread.run(Thread.java:662)
 Caused by: java.net.ConnectException: Connection timed out: no further
 information: /192.168.202.1:9300
 at sun.nio.ch.SocketChannelImpl.checkConnect(Native Method)
 at
 sun.nio.ch.SocketChannelImpl.finishConnect(SocketChannelImpl.java:599)
 at
 org.elasticsearch.common.netty.channel.socket.nio.NioClientBoss.connect(NioClientBoss.java:150)
 at
 org.elasticsearch.common.netty.channel.socket.nio.NioClientBoss.processSelectedKeys(NioClientBoss.java:105)
 at
 org.elasticsearch.common.netty.channel.socket.nio.NioClientBoss.process(NioClientBoss.java:79)
 at
 org.elasticsearch.common.netty.channel.socket.nio.AbstractNioSelector.run(AbstractNioSelector.java:312)
 at
 org.elasticsearch.common.netty.channel.socket.nio.NioClientBoss.run(NioClientBoss.java:42)
 at
 org.elasticsearch.common.netty.util.ThreadRenamingRunnable.run(ThreadRenamingRunnable.java:108)
 at
 org.elasticsearch.common.netty.util.internal.DeadLockProofWorker$1.run(DeadLockProofWorker.java:42)

 The .yaml file contains the default settings, no changes in that, and it is still
 not connecting. Is there any issue with the network settings or the
 elasticsearch configuration?



Re: Reports and Notifications.

2013-12-24 Thread chad patterson
In Splunk there is the ability to create a report or notification if a
certain threshold/event/trigger is captured; then a script or report is
triggered/sent.  We are looking for similar functionality.  I will check
out what you sent.
On Dec 24, 2013 7:31 PM, Otis Gospodnetic otis.gospodne...@gmail.com
wrote:

 Hi,

 Could you please describe what you mean by reports?  Are you looking for
 daily/weekly email with graphs or something else?

 We have that in SPM (monitoring) and Logsene (log analytics) is getting
 it, too.  Kibana has this as well via phantomjs, I believe, though I'm not
 sure how/if it's hooked up to email.

 Otis
 --
 Performance Monitoring * Log Analytics * Search Analytics
 Solr & Elasticsearch Support * http://sematext.com/



 On Tuesday, December 24, 2013 2:53:22 PM UTC-5, CP wrote:

 We have a HUGE splunk install and constantly are running into our limits.
  We have decided to go with a tiered solution using
 kibana+logstash+elasticsearch.  The one thing that we really need is a way
 to have reports generated like we can with splunk.  Does anyone know if
 there is a plugin or third party app to do this sort of thing?

 Thanks,
 CP



Re: Several questions on ES in production environment

2013-12-24 Thread Otis Gospodnetic
Hi,

On Tuesday, December 24, 2013 8:47:54 AM UTC-5, Han JU wrote:

 Hi,

 We're approaching the first release of our product and we use 
 ElasticSearch as a key component in our system. But there's still some 
 questions and doubts so I'd like to listen to the more experienced users 
 and ElasticSearch folks here.

 1. We use ElasticSearch as a search tool but also the storage of all 
 documents. It means that the front-end retrieves fields from ES just as if 
 it's a database. We've already disabled indexing (index: no) on the fields 
 that don't need to be searched (list of ids etc.) but is this a good usage 
 of ElasticSearch? Given that we expected to have ~ 1 billion documents (~ 
 1.4kb each) in our first 3 months in a single index.


1.4KB is pretty small, so that's fine.  Often keeping it all in ES is 
simpler - it doesn't require another hop to another server (e.g. a DB) to 
retrieve display data, and there is one moving piece fewer, which makes 
everything simpler.  I'd keep your display data in ES and worry about 
changing it later IFF you have issues.
 

 2. We will use thrift to push documents in production because we've seen a 
 performance gain. Is there any downside of using thrift over plain json?

 3. Some of our queries use a regexp filter. In my understanding this needs 
 to load the target field of every document to see if it matches, so it's 
 pretty costly for an index of 1 billion docs?


Yes, regexps are not the fastest.  What are you trying to do that requires 
a regexp filter? 
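
If the regexps are really doing prefix matching, a prefix filter on a
not_analyzed field is usually much cheaper. A sketch, with illustrative index
and field names:

GET /docs/_search
{
  "query": {
    "filtered": {
      "query": { "match_all": {} },
      "filter": { "prefix": { "some_field": "abc" } }
    }
  }
}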

Otis
--
Performance Monitoring * Log Analytics * Search Analytics
Solr & Elasticsearch Support * http://sematext.com/
