Re: Count of Words (Text Based Search) Using Facets

2014-02-04 Thread Jun Ohtani
Hi Hiro,

I think you should use the term statistics that are available in scripts:
http://www.elasticsearch.org/guide/en/elasticsearch/reference/current/modules-advanced-scripting.html#_term_statistics

I posted sample JSON and query DSL in a gist:
https://gist.github.com/johtani/8818938

Note: The term “Java” is indexed as “java”, because the standard analyzer 
applies a lowercase filter.
  My sample script therefore uses “java”, not “Java”.
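
To illustrate the note above, here is a minimal Python sketch of why a lookup for "Java" finds nothing once the text has been lowercased at index time. The `analyze` function is only a rough stand-in for the standard analyzer (an assumption for illustration, not the real tokenizer), and the sample documents are made up.

```python
import re
from collections import Counter


def analyze(text):
    """Rough stand-in for the standard analyzer: split on non-word chars, lowercase."""
    return [token.lower() for token in re.findall(r"\w+", text)]


def term_frequency(text, term):
    """Count how often `term` occurs among the analyzed tokens of `text`."""
    return Counter(analyze(text))[term]


# Hypothetical documents, for illustration only.
doc_a = "Java is everywhere. I write Java daily; JAVA rocks."
doc_b = "Python and java."

print(term_frequency(doc_a, "java"))  # 3
print(term_frequency(doc_a, "Java"))  # 0, the indexed term is lowercase
print(term_frequency(doc_b, "java"))  # 1
```

The per-document count is what the term statistics in the script expose; the sketch only shows why the lowercase form has to be used when asking for it.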

I hope this helps.

Regards

Jun Ohtani
joht...@gmail.com
blog : http://blog.johtani.info
twitter : http://twitter.com/johtani

On 2014/01/28 15:54, Hiro Gangwani wrote:

> Hi,
> We are indexing PDF and Word documents in ES using the attachment type. Text-based 
> search is implemented using QueryBuilder and a field query. Is it possible to 
> get the count of the words given in the search criteria for each result 
> returned?
> 
> For example:
> Document A contains the keyword Java 50 times and Document B contains it 
> 30 times.
> When the search criteria is "Java" and the text-based search is executed, we get 2 
> documents in the search results.
> Is it possible to get the count of Java in document A and document B?
> I have used terms facets, which only give the count of documents that contain 
> the text Java (in this case only 2). What we need is the count of the word Java in each 
> document returned in the result.
> 
> We are stuck on this requirement and unable to find a solution. 
> Any help is appreciated; thanks in advance.
> 
> Hiro
> 
> -- 
> You received this message because you are subscribed to the Google Groups 
> "elasticsearch" group.
> To unsubscribe from this group and stop receiving emails from it, send an 
> email to elasticsearch+unsubscr...@googlegroups.com.
> To view this discussion on the web visit 
> https://groups.google.com/d/msgid/elasticsearch/73629eee-7b58-44d4-87b3-aeb0d18b4c03%40googlegroups.com.
> For more options, visit https://groups.google.com/groups/opt_out.





Re: how to calculate relevancy by the help of precision and recall

2014-02-04 Thread Navneet Mathpal

Ivan but
On Wednesday, 5 February 2014 11:36:50 UTC+5:30, Navneet Mathpal wrote:
>
> Hi,
>
> I want to know how to calculate relevancy with the help of precision 
> and recall.
>
> For example:
> A = the number of relevant records retrieved, 
> B = the number of relevant records not retrieved, and 
> C = the number of irrelevant records retrieved. 
> In this example A = 45, B = 35 (80 - 45) and C = 15 (60 - 45).
> Recall = (45 / (45 + 35)) * 100% => 45/80 * 100% = 56%
> Precision = (45 / (45 + 15)) * 100% => 45/60 * 100% = 75%
>
>
> What would its relevancy be, and how is it calculated?
>
>



Re: Adding a river using the python driver

2014-02-04 Thread Honza Král
Hi Mihnea,

You can issue any request that has no dedicated API endpoint by
calling `.transport.perform_request` on your Elasticsearch instance
manually.
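
A minimal sketch of what that could look like for a river. It assumes the 2014-era elasticsearch-py signature `perform_request(method, url, params=None, body=None)`; the helper name, the river name, and the `_meta` contents are hypothetical and only there for illustration.

```python
def put_river_meta(es, river_name, meta):
    """PUT the river's _meta document through the low-level transport.

    Assumption: `es` is an elasticsearch-py client whose transport exposes
    perform_request(method, url, params=None, body=None).
    """
    url = "/_river/%s/_meta" % river_name
    return es.transport.perform_request("PUT", url, body=meta)


# Hypothetical river definition; a real river needs its own type and settings.
example_meta = {"type": "dummy"}
```

With a live cluster this would be called as `put_river_meta(Elasticsearch(), "my_river", example_meta)`.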

Hope this helps,
Honza

On Wed, Feb 5, 2014 at 12:56 AM, Mihnea Dobrescu-Balaur
 wrote:
> Hello,
>
> I can't find how to add a river using the python driver.
>
> I tried various ways of calling es.index with '_river' as the index, but I
> couldn't find a way to add the "_meta" document. The closest I got was an
> error from ES: "no river _meta document found after 5 attempts".
>
>
> Can this be done? And if so, how?
>
> Thanks,
> Mihnea
>



Re: how to calculate relevancy by the help of precision and recall

2014-02-04 Thread Navneet Mathpal
hey thanks ivan :)

On Wednesday, 5 February 2014 12:44:46 UTC+5:30, Ivan Brusic wrote:
>
> Interesting topic. Not elasticsearch specific, but nevertheless 
> interesting. One method to calculate relevancy given the precision and 
> recall of a query is by using the F1 score: 
> http://en.wikipedia.org/wiki/F1_score
>
> F1 would be equal to 2 * (P * R) / (P + R), where P is the precision and R 
> is the recall.
>
> I am glad someone is using elasticsearch as a search engine. Lately it 
> feels like it is mainly an analytical tool. :)
>
> Cheers,
>
> Ivan
>
>
> On Tue, Feb 4, 2014 at 10:06 PM, Navneet Mathpal 
> 
> > wrote:
>
>> Hi,
>>
>> I want to know how to calculate relevancy with the help of precision 
>> and recall.
>>
>> For example:
>> A = the number of relevant records retrieved, 
>> B = the number of relevant records not retrieved, and 
>> C = the number of irrelevant records retrieved. 
>> In this example A = 45, B = 35 (80 - 45) and C = 15 (60 - 45).
>> Recall = (45 / (45 + 35)) * 100% => 45/80 * 100% = 56%
>> Precision = (45 / (45 + 15)) * 100% => 45/60 * 100% = 75%
>>
>>
>> What would its relevancy be, and how is it calculated?
>>
>>
>
>



Re: Elasticsearch LXC on Ubuntu 14.04 and recomended settings

2014-02-04 Thread engel der
Hi Mark,

thank you for your answer.

On Tuesday, 4 February 2014 23:08:32 UTC+1, Mark Walkom wrote:
>
> That looks ok, similar to how we do things with virtualised master/data 
> nodes.
> I wouldn't specify your shard/replica count on the node though, do it in 
> the index as it allows you to change with ease. 
>

What do you mean by "specify your shard/replica count on the node"? I did 
specify them in the config file /etc/elasticsearch/elasticsearch.yml. Is 
that wrong? Where else should I specify them?
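
For context, one reading of "do it in the index" is to pass the counts as index settings when the index is created, and to adjust the replica count later through the index settings API. The sketch below only builds the request bodies; the index name is hypothetical, and the values are the ones from the config quoted further down.

```python
import json

# Hypothetical index name, for illustration only.
index_name = "myindex"

# Body for PUT /myindex : shard and replica counts set per index at creation.
create_index_body = {
    "settings": {
        "number_of_shards": 5,
        "number_of_replicas": 2,
    }
}

# Body for PUT /myindex/_settings : the replica count can be changed later on
# a live index; the shard count is fixed once the index exists.
update_settings_body = {"index": {"number_of_replicas": 1}}

print("PUT /%s" % index_name)
print(json.dumps(create_index_body, indent=2))
print("PUT /%s/_settings" % index_name)
print(json.dumps(update_settings_body, indent=2))
```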

Regards,
Flo


> Regards,
> Mark Walkom
>
> Infrastructure Engineer
> Campaign Monitor
> email: ma...@campaignmonitor.com 
> web: www.campaignmonitor.com
>
>
> On 5 February 2014 05:19, Tony Su wrote:
>
>> Hi Enger,
>> Although I don't yet have enough experience building ES clusters to 
>> directly answer your question(s),
>>  
>> Typically when I'm involved in this type of provisioning, I generally 
>> start off with a set of objectives and then design accordingly. I'd be 
>> interested in your objectives list and then match those to your proposed 
>> configuration.
>>  
>> Thx,
>> Tony
>>  
>>  
>>  
>>
>> On Tuesday, February 4, 2014 9:54:48 AM UTC-8, engel der wrote:
>>
>>> Hi,
>>>
>>> We are setting up an Elasticsearch 1.0 (RC2) cluster and I think I need 
>>> some help on where to start (settings-related). We have 6 physical 
>>> servers with 265GB RAM and 2TB local SAS storage (separated into two RAID10 
>>> groups as LVM VGs). Those six servers are running Ubuntu 14.04. All "roles" 
>>> (Application Server [NGINX+PHP-FPM+GlusterFS-Client+Elasticsearch 
>>> "searcher"], Database Server [Galera Cluster], Storage Server [GlusterFS], 
>>> Cache Server [Redis] ...) will be running in LXC containers, most of them 
>>> on Ubuntu 14.04; only the Galera Cluster runs on 12.04.
>>> We expect about 100GB of data to index, and the data does not change that 
>>> fast (5% per day?). The idea is to install Elasticsearch on all 6 
>>> Application Servers as "searcher" with:
>>>
>>> cluster.name: search001
>>> node.master: false
>>> node.data: false
>>> #node.master: true
>>> #node.data: true
>>> node.max_local_storage_nodes: 1
>>> index.number_of_shards: 5
>>> index.number_of_replicas: 2
>>>
>>> Add 3 "data" Nodes with:
>>>
>>> cluster.name: search001
>>> node.master: false
>>> #node.data: false
>>> #node.master: true
>>> node.data: true
>>> node.max_local_storage_nodes: 1
>>> index.number_of_shards: 5
>>> index.number_of_replicas: 2
>>> bootstrap.mlockall: true
>>>
>>> and 3 "master" nodes:
>>>
>>> cluster.name: search001
>>> #node.master: false
>>> node.data: false
>>> node.master: true
>>> #node.data: true
>>> node.max_local_storage_nodes: 1
>>> index.number_of_shards: 5
>>> index.number_of_replicas: 2
>>>
>>> The LXCs for those "searchers" get 8GB RAM, the "masters" get 2GB RAM, 
>>> and the "data" LXCs get 60GB RAM and 300GB storage.
>>>
>>> What about the Java settings for those "data" nodes???
>>>
>>> cat /etc/default/elasticsearch 
>>> # Run Elasticsearch as this user ID and group ID
>>> ES_USER=elasticsearch
>>> ES_GROUP=elasticsearch
>>>
>>> # Heap Size (defaults to 256m min, 1g max)
>>> ES_HEAP_SIZE=30g
>>>
>>> # Heap new generation
>>> ES_HEAP_NEWSIZE=1g
>>>
>>> # max direct memory
>>> ES_DIRECT_SIZE=???
>>>
>>> # Maximum number of open files, defaults to 65535.
>>> MAX_OPEN_FILES=65535
>>>
>>> # Maximum locked memory size. Set to "unlimited" if you use the
>>> # bootstrap.mlockall option in elasticsearch.yml. You must also set
>>> # ES_HEAP_SIZE.
>>> MAX_LOCKED_MEMORY=unlimited
>>>
>>> # Maximum number of VMA (Virtual Memory Areas) a process can own
>>> MAX_MAP_COUNT=262144  #more
>>>
>>> # Elasticsearch log directory
>>> #LOG_DIR=/var/log/elasticsearch
>>>
>>> # Elasticsearch data directory
>>> #DATA_DIR=/var/lib/elasticsearch
>>>
>>> # Elasticsearch work directory
>>> #WORK_DIR=/tmp/elasticsearch
>>>
>>> # Elasticsearch configuration directory
>>> #CONF_DIR=/etc/elasticsearch
>>>
>>> # Elasticsearch configuration file (elasticsearch.yml)
>>> #CONF_FILE=/etc/elasticsearch/elasticsearch.yml
>>>
>>> # Additional Java OPTS
>>> #ES_JAVA_OPTS=
>>>
>>> # Configure restart on package upgrade (true, every other setting will lead to not restarting)
>>> #RESTART_ON_UPGRADE=true
>>>
>>>
>>> What about the master and searcher settings? I guess I do not have to 
>>> tune them?
>>>
>>> Thank you for any help!
>>>
>>> Regards,
>>> Flo
>>>
>>
>
>


Re: Index relocation during initialization

2014-02-04 Thread Anantha Govindarajan
Hi Clinton,

We are also facing this issue. I verified that recovery starts only after the 
expected nodes have arrived. In the case of a full cluster restart, all the 
shards are initially unavailable and the master starts allocating the unassigned shards. 
During allocation, BalancedShardAllocator comes into play and changes the 
allocation that was balanced before the full restart.
 



Re: how to calculate relevancy by the help of precision and recall

2014-02-04 Thread Ivan Brusic
Interesting topic. Not elasticsearch specific, but nevertheless
interesting. One method to calculate relevancy given the precision and
recall of a query is by using the F1 score:
http://en.wikipedia.org/wiki/F1_score

F1 would be equal to 2 * (P * R) / (P + R), where P is the precision and R
is the recall.
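
As a worked check of the formula with the numbers from the original question (A = 45, B = 35, C = 15), here is a minimal Python sketch. The function name is my own, and F1 is only one of several ways to fold precision and recall into a single number.

```python
def precision_recall_f1(relevant_retrieved, relevant_missed, irrelevant_retrieved):
    """Return (precision, recall, F1) from the three counts A, B and C."""
    a = float(relevant_retrieved)
    precision = a / (a + irrelevant_retrieved)  # A / (A + C)
    recall = a / (a + relevant_missed)          # A / (A + B)
    f1 = 2 * (precision * recall) / (precision + recall)
    return precision, recall, f1


p, r, f1 = precision_recall_f1(45, 35, 15)
print("P=%.4f R=%.4f F1=%.4f" % (p, r, f1))  # P=0.7500 R=0.5625 F1=0.6429
```

So with 75% precision and roughly 56% recall, the F1 score comes out at about 64%.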

I am glad someone is using elasticsearch as a search engine. Lately it
feels like it is mainly an analytical tool. :)

Cheers,

Ivan


On Tue, Feb 4, 2014 at 10:06 PM, Navneet Mathpal  wrote:

> Hi,
>
> I want to know how to calculate relevancy with the help of precision
> and recall.
>
> For example:
> A = the number of relevant records retrieved,
> B = the number of relevant records not retrieved, and
> C = the number of irrelevant records retrieved.
> In this example A = 45, B = 35 (80 - 45) and C = 15 (60 - 45).
> Recall = (45 / (45 + 35)) * 100% => 45/80 * 100% = 56%
> Precision = (45 / (45 + 15)) * 100% => 45/60 * 100% = 75%
>
>
> What would its relevancy be, and how is it calculated?
>
>



Re: Query score calculation algorithm for multiple scoring option given in query

2014-02-04 Thread Narinder Kaur
Thanks for the reply, but I have a lot of other scoring filters too, which 
are more complicated. With this change I saw the score reach infinity, and 
sorting was a mess. I cannot use this, so my priority is to 
understand the logic rather than rely on this trick. I need to understand how 
these two scoring mechanisms are combined to give the final score.

On Thursday, 30 January 2014 19:05:38 UTC+5:30, Binh Ly wrote:
>
> Narinder,
>
> Can you try changing the script to boost for each of your filters. 
> Something like this:
>
>   "filters": [
>     {
>       "filter": {
>         "term": {
>           "subtype": "GeoNeighborhood"
>         }
>       },
>       "boost": 1
>     },
>     {
>       "filter": {
>         "term": {
>           "subtype": "GeoCity"
>         }
>       },
>       "boost": 100
>     },
>     {
>       "filter": {
>         "term": {
>           "subtype": "GeoState"
>         }
>       },
>       "boost": 1
>     },
>     {
>       "filter": {
>         "term": {
>           "subtype": "GeoCountry"
>         }
>       },
>       "boost": 100
>     }
>   ],
>



how to calculate relevancy by the help of precision and recall

2014-02-04 Thread Navneet Mathpal
Hi,

I want to know how to calculate relevancy with the help of precision and 
recall.

For example:
A = the number of relevant records retrieved, 
B = the number of relevant records not retrieved, and 
C = the number of irrelevant records retrieved. 
In this example A = 45, B = 35 (80 - 45) and C = 15 (60 - 45).
Recall = (45 / (45 + 35)) * 100% => 45/80 * 100% = 56%
Precision = (45 / (45 + 15)) * 100% => 45/60 * 100% = 75%


What would its relevancy be, and how is it calculated?



Re: cassandra river plugin installation issue

2014-02-04 Thread shamsul haque

>
> The issue is with the Hector API; I have posted an issue at the Hector API issue 
> link. I have changed the river implementation so that it gets data from 
> Cassandra with the Cassandra Java Driver instead of the Hector API.
>



Re: How to configure and implement Synonyms with multi words.

2014-02-04 Thread Jayesh Bhoyar
Hi Matt,

Thanks for your reply. I have modified your mapping to suit my 
requirement and added it to a Gist:
https://gist.github.com/jsbonline2006/8817443

Regards,
Jayesh Bhoyar
http://www.linkedin.com/in/jayeshbhoyar



Re: RE: How do I get whole values of a field, as a facet? (not individual terms!)

2014-02-04 Thread Jayesh Bhoyar
Hi,

You need to follow the steps given by Kimchy and David. To summarize, 
here are the steps:
1) Define the field you facet on as a multi_field, as follows:

  "mappings": {
    "data": {
      "properties": {
        "name": {
          "type": "multi_field",
          "fields": {
            "name": {
              "type": "string",
              "index": "analyzed"
            },
            "untouched": {
              "type": "string",
              "index": "not_analyzed"
            }
          }
        },

Here my "name" field is a multi_field. I can use "name" for searching 
and "name.untouched" for faceting.

I was facing the same issue as the one mentioned in this thread, and the 
mapping and usage above helped me resolve it.
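
As a sketch of the usage described above, a search body could query the analyzed "name" field while running a terms facet on "name.untouched". The query text, the facet name "names", and the size are assumptions for illustration; only the two field names come from the mapping.

```python
import json

search_body = {
    # Full-text search runs against the analyzed sub-field.
    "query": {"match": {"name": "delhi"}},  # hypothetical query text
    # The terms facet runs against the not_analyzed sub-field, so each
    # bucket is a whole field value rather than an individual term.
    "facets": {
        "names": {  # hypothetical facet name
            "terms": {"field": "name.untouched", "size": 10}
        }
    },
}

print(json.dumps(search_body, indent=2))
```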

Regards,
Jayesh Bhoyar
http://www.linkedin.com/in/jayeshbhoyar



query_string bug in Elasticsearch-0.90.3, please tell me if it really is a bug ?

2014-02-04 Thread coder
I started using the explain API with query_string, and in the process I think I found 
a bug (I don't know whether it really is a bug or the intended behaviour of 
query_string). This is going to be a long post, so please bear with me.

I'm using a doc:{name:"new delhi to goa",st:"goa"}
Running the analyze API with the index analyzer, I got these tokens:

{
  "tokens" : [ {
"token" : "new",
"start_offset" : 0,
"end_offset" : 3,
"type" : "word",
"position" : 1
  }, {
"token" : "new",
"start_offset" : 0,
"end_offset" : 9,
"type" : "word",
"position" : 1
  }, {
"token" : "new ",
"start_offset" : 0,
"end_offset" : 9,
"type" : "word",
"position" : 1
  }, {
"token" : "new d",
"start_offset" : 0,
"end_offset" : 9,
"type" : "word",
"position" : 1
  }, {
"token" : "new de",
"start_offset" : 0,
"end_offset" : 9,
"type" : "word",
"position" : 1
  }, {
"token" : "new del",
"start_offset" : 0,
"end_offset" : 9,
"type" : "word",
"position" : 1
  }, {
"token" : "new delh",
"start_offset" : 0,
"end_offset" : 9,
"type" : "word",
"position" : 1
  }, {
"token" : "new delhi",
"start_offset" : 0,
"end_offset" : 9,
"type" : "word",
"position" : 1
  }, {
"token" : "new",
"start_offset" : 0,
"end_offset" : 12,
"type" : "word",
"position" : 1
  }, {
"token" : "new ",
"start_offset" : 0,
"end_offset" : 12,
"type" : "word",
"position" : 1
  }, {
"token" : "new d",
"start_offset" : 0,
"end_offset" : 12,
"type" : "word",
"position" : 1
  }, {
"token" : "new de",
"start_offset" : 0,
"end_offset" : 12,
"type" : "word",
"position" : 1
  }, {
"token" : "new del",
"start_offset" : 0,
"end_offset" : 12,
"type" : "word",
"position" : 1
  }, {
"token" : "new delh",
"start_offset" : 0,
"end_offset" : 12,
"type" : "word",
"position" : 1
  }, {
"token" : "new delhi",
"start_offset" : 0,
"end_offset" : 12,
"type" : "word",
"position" : 1
  }, {
"token" : "new delhi ",
"start_offset" : 0,
"end_offset" : 12,
"type" : "word",
"position" : 1
  }, {
"token" : "new delhi t",
"start_offset" : 0,
"end_offset" : 12,
"type" : "word",
"position" : 1
  }, {
"token" : "new delhi to",
"start_offset" : 0,
"end_offset" : 12,
"type" : "word",
"position" : 1
  }, {
"token" : "new",
"start_offset" : 0,
"end_offset" : 16,
"type" : "word",
"position" : 1
  }, {
"token" : "new ",
"start_offset" : 0,
"end_offset" : 16,
"type" : "word",
"position" : 1
  }, {
"token" : "new d",
"start_offset" : 0,
"end_offset" : 16,
"type" : "word",
"position" : 1
  }, {
"token" : "new de",
"start_offset" : 0,
"end_offset" : 16,
"type" : "word",
"position" : 1
  }, {
"token" : "new del",
"start_offset" : 0,
"end_offset" : 16,
"type" : "word",
"position" : 1
  }, {
"token" : "new delh",
"start_offset" : 0,
"end_offset" : 16,
"type" : "word",
"position" : 1
  }, {
"token" : "new delhi",
"start_offset" : 0,
"end_offset" : 16,
"type" : "word",
"position" : 1
  }, {
"token" : "new delhi ",
"start_offset" : 0,
"end_offset" : 16,
"type" : "word",
"position" : 1
  }, {
"token" : "new delhi t",
"start_offset" : 0,
"end_offset" : 16,
"type" : "word",
"position" : 1
  }, {
"token" : "new delhi to",
"start_offset" : 0,
"end_offset" : 16,
"type" : "word",
"position" : 1
  }, {
"token" : "new delhi to ",
"start_offset" : 0,
"end_offset" : 16,
"type" : "word",
"position" : 1
  }, {
"token" : "new delhi to g",
"start_offset" : 0,
"end_offset" : 16,
"type" : "word",
"position" : 1
  }, {
"token" : "new delhi to go",
"start_offset" : 0,
"end_offset" : 16,
"type" : "word",
"position" : 1
  }, {
"token" : "new delhi to goa",
"start_offset" : 0,
"end_offset" : 16,
"type" : "word",
"position" : 1
  }, {
"token" : "del",
"start_offset" : 4,
"end_offset" : 9,
"type" : "word",
"position" : 2
  }, {
"token" : "delh",
"start_offset" : 4,
"end_offset" : 9,
"type" : "word",
"position" : 2
  }, {
"token" : "delhi",
"start_offset" : 4,
"end_offset" : 9,
"type" : "word",
"position" : 2
  }, {
"token" : "del",
"start_offset" : 4,
"end_offset" : 12,
"type" : "word",
"position" : 2
  }, {
"token" : "delh",
"start_offset" : 4,
"end_offset" : 12,
"type" : "word",
"position" : 2
  }, {
"token" : "delhi",
"start_offset" : 4,
"end_offset" : 12,
"type" : "word",
"position" : 2
  }, {
"token" : "delhi ",
"start_offset" : 4,
"end_offset" : 12,
"type" : "word",
"position" : 2
  }, {
"token" : 

Re: sorting problems

2014-02-04 Thread Jayesh Bhoyar
Hi,

Can you try the following syntax for sorting on the date value?
"sort": [
  {
    "date_created": {
      "order": "desc"
    }
  }
]

Let me know if this solves your problem.

I have used this syntax successfully for an integer value.

Regards,
Jayesh Bhoyar
http://www.linkedin.com/in/jayeshbhoyar



Re: Problem: Facets tokenize tags with spaces. Is there a solution?

2014-02-04 Thread Jayesh Bhoyar
Hi All,

Here is the solution:
1) Define the field you facet on as a multi_field, as follows:

  "mappings": {
    "data": {
      "properties": {
        "name": {
          "type": "multi_field",
          "fields": {
            "name": {
              "type": "string",
              "index": "analyzed"
            },
            "untouched": {
              "type": "string",
              "index": "not_analyzed"
            }
          }
        },

Here my "name" field is a multi_field. I can use "name" for searching 
and "name.untouched" for faceting.

I was facing the same issue as the one you mentioned in this thread, and the 
mapping and usage above helped me resolve it.
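
To show why the not_analyzed sub-field matters for tags with spaces, here is a small Python simulation. It is not Elasticsearch code: lowercasing and splitting on whitespace is only a rough stand-in for an analyzed field, and the sample values are made up.

```python
from collections import Counter

# Hypothetical field values, one per document.
names = ["New York", "New York", "New Delhi"]

# Analyzed field: every value is broken into lowercase terms before counting.
analyzed_counts = Counter(term for name in names for term in name.lower().split())

# not_analyzed field: every value is counted as a single, unmodified term.
untouched_counts = Counter(names)

print(analyzed_counts.most_common())   # [('new', 3), ('york', 2), ('delhi', 1)]
print(untouched_counts.most_common())  # [('New York', 2), ('New Delhi', 1)]
```

The first result is what faceting on "name" looks like; the second is what faceting on "name.untouched" gives.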

Regards,
Jayesh Bhoyar
http://www.linkedin.com/in/jayeshbhoyar



Re: Identical data uploaded - What to expect, overwrite/update/something else?

2014-02-04 Thread Jayesh Bhoyar
Hi Tony,

In my experience with ES so far, it simply overwrites the existing data 
and changes the version.
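
A toy Python model of that behaviour, assuming the document is re-indexed under the same ID (with auto-generated IDs a second copy would be stored instead). The class is purely illustrative and has nothing to do with the real storage engine.

```python
class ToyIndex(object):
    """In-memory stand-in: one (version, source) pair per document ID."""

    def __init__(self):
        self.docs = {}

    def index(self, doc_id, source):
        """Store `source` under `doc_id`, replacing any previous copy."""
        previous_version = self.docs.get(doc_id, (0, None))[0]
        version = previous_version + 1
        self.docs[doc_id] = (version, source)
        return version


idx = ToyIndex()
first = idx.index("1", {"title": "same data"})
second = idx.index("1", {"title": "same data"})  # identical data, same ID

print(first, second, len(idx.docs))  # 1 2 1
```

The document count stays at one while the version goes up, which matches the overwrite described above.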

Regards,
Jayesh Bhoyar
http://www.linkedin.com/in/jayeshbhoyar


On Wednesday, February 5, 2014 6:16:32 AM UTC+5:30, Tony Su wrote:
>
> If data is re-loaded which is identical in every way to data which already 
> exists in ES,
>  
> Is new metadata created that simply over-writes existing (zero net effect)?
> Or is duplicate, unique metadata created?
> Or, maybe since identical data is found to already exist, although the 
> update API was not used it would be invoked anyway?
>  
> I think I understand that if data was inserted using the update API, then 
> there would be an orderly addition to current metadata, incrementing the 
> version but I'm interested in what would happen if the update api is not 
> used.
>  
> Thx,
> Tony
>



Re: Simple question about a two-node cluster

2014-02-04 Thread Jack Park
I left everything as defaults.
The client is based on ES 0.90.7 using TransportClient.

I just restarted the entire platform, got a clean set of logs after startup.
Then, started my program. While it was booting, both servers went wild
with error messages on the console, then settled down. So, I started
the import process. The game is to build a topic map and several word
graphs while importing ontologies. Right now, it gives every
indication that it's running; lights indicating disk activity on both
servers are active, and the consoles reflect no distress. Didn't look
at logs.

Still no clues, but running, at least for the moment.

Running the same platform on a single server went without any errors.
The full import took 5 days. I'd like to hope this will run faster.

Thanks
Jack

On Tue, Feb 4, 2014 at 5:23 PM, Mark Walkom  wrote:
> Did you change the cluster name as the blog suggested?
>
> And can you clarify what client you are using as well?
>
> Regards,
> Mark Walkom
>
> Infrastructure Engineer
> Campaign Monitor
> email: ma...@campaignmonitor.com
> web: www.campaignmonitor.com
>
>
> On 5 February 2014 12:20, Jack Park  wrote:
>>
>> That's very timely.
>>
>> The second-to-start node is receiving a join request from 10.1.10.80:9303
>> even though 10.1.10.80 does not ping from anywhere in the network
>>
>> I'll look into adding http://www.elastichq.org/ to the browser and see
>> what it says.
>>
>> Many thanks
>> Jack
>>
>> On Tue, Feb 4, 2014 at 5:13 PM, Mark Walkom 
>> wrote:
>> > What client are you using?
>> >
>> > You'd be well placed to install a plugin such as elastichq or kopf to
>> > monitor things as well. It might help tell you where this other node
>> > (10.1.10.80) is coming from.
>> >
>> > Regards,
>> > Mark Walkom
>> >
>> > Infrastructure Engineer
>> > Campaign Monitor
>> > email: ma...@campaignmonitor.com
>> > web: www.campaignmonitor.com
>> >
>> >
>> > On 5 February 2014 12:09, Jack Park  wrote:
>> >>
>> >> Hi Tony,
>> >>
>> >> I did look at the logs.
>> >>
>> >> Also, I restarted everything and followed the instructions found at
>> >>
>> >>
>> >> http://techhari.blogspot.com/2013/03/elasticsearch-cluster-setup-in-2-minutes.html
>> >>
>> >> Both nodes are started; I got a message from the first one that it
>> >> added a cluster when the second node started.
>> >> I then opened a browser and got the JSON string indicated on that
>> >> page, meaning, at least to me, that I have an operating cluster.
>> >>
>> >> Next up is to fire up the client again and see if things explode
>> >> again. In the last run, I was sending over data to be indexed, and the
>> >> client's log never showed any signs of distress, even though the
>> >> consoles of both servers were filling with error messages.
>> >>
>> >> Jack
>> >>
>> >> On Tue, Feb 4, 2014 at 4:06 PM, Tony Su  wrote:
>> >> > Hi Jack,
>> >> > Although I'm a bit new to this, too...
>> >> >
>> >> > 1. You should take a look at your ES log files. Depending on how you
>> >> > installed and are running ES, the log files could be in different
>> >> > places. If I were to guess, though, you should look in the following
>> >> > directory: /var/log/elasticsearch/
>> >> >
>> >> > 2. One of the first things I did was to install and run
>> >> > elasticsearch-head to get near-runtime visibility into the status and
>> >> > distribution of the nodes, indexes, shards, etc.
>> >> >
>> >> > HTH,
>> >> > Tony
>> >> >
>> >> >
>> >> > On Tuesday, February 4, 2014 3:50:27 PM UTC-8, Jack Park wrote:
>> >> >>
>> >> >> I confess that, at least for me, documentation, including purchased
>> >> >> books, remains a bit ambiguous, where the context is that of making
>> >> >> my ES client talk to two different servers.
>> >> >>
>> >> >> In the end, I did nothing to the elasticsearch.yml files at each
>> >> >> server; it simply was not clear what needed to be changed.
>> >> >>
>> >> >> I did present two IP addresses to the client, but nothing else. That
>> >> >> is, I didn't set "sniff" to true, or tell it to ignore cluster names
>> >> >> since each server box has just one ES installation running.
>> >> >>
>> >> >> At startup, I could see that both servers were responding, but soon
>> >> >> they each blew up with a flurry of error messages which mean little
>> >> >> to
>> >> >> me. I bet they're meaningful, except that somewhere near the top
>> >> >> where
>> >> >> the initial error occurred and which is no-longer visible, perhaps
>> >> >> something important was stated.
>> >> >>
>> >> >> The client's log file correctly stated:
>> >> >> connected to 10.1.10.179:9300
>> >> >> and
>> >> >> connected to 10.1.10.178:9300
>> >> >>
>> >> >> but the log of the 179 server said words to this effect:
>> >> >> zen-disco-node_failed[...][inet 10.1.10.80:9301]
>> >> >>
>> >> >> I guess I missed something: I don't have a 10.1.10.80 on that
>> >> >> network...
>> >> >>
>> >> >> On the surface, is there something obvious I missed?
>> >> >>
>> >> >> Many thanks in adv

Re: Simple question about a two-node cluster

2014-02-04 Thread Mark Walkom
Did you change the cluster name as the blog suggested?

And can you clarify what client you are using as well?

Regards,
Mark Walkom

Infrastructure Engineer
Campaign Monitor
email: ma...@campaignmonitor.com
web: www.campaignmonitor.com


On 5 February 2014 12:20, Jack Park  wrote:

> That's very timely.
>
> The second-to-start node is receiving a join request from 10.1.10.80:9303
> even though 10.1.10.80 does not ping from anywhere in the network
>
> I'll look into adding http://www.elastichq.org/ to the browser and see
> what it says.
>
> Many thanks
> Jack
>
> On Tue, Feb 4, 2014 at 5:13 PM, Mark Walkom 
> wrote:
> > What client are you using?
> >
> > You'd be well placed to install a plugin such as elastichq or kopf to
> > monitor things as well. It might help tell you where this other node
> > (10.1.10.80) is coming from.
> >
> > Regards,
> > Mark Walkom
> >
> > Infrastructure Engineer
> > Campaign Monitor
> > email: ma...@campaignmonitor.com
> > web: www.campaignmonitor.com
> >
> >
> > On 5 February 2014 12:09, Jack Park  wrote:
> >>
> >> Hi Tony,
> >>
> >> I did look at the logs.
> >>
> >> Also, I restarted everything and followed the instructions found at
> >>
> >>
> >> http://techhari.blogspot.com/2013/03/elasticsearch-cluster-setup-in-2-minutes.html
> >>
> >> Both nodes are started; I got a message from the first one that it
> >> added a cluster when the second node started.
> >> I then opened a browser and got the JSON string indicated on that
> >> page, meaning, at least to me, that I have an operating cluster.
> >>
> >> Next up is to fire up the client again and see if things explode
> >> again. In the last run, I was sending over data to be indexed, and the
> >> client's log never showed any signs of distress, even though the
> >> consoles of both servers were filling with error messages.
> >>
> >> Jack
> >>
> >> On Tue, Feb 4, 2014 at 4:06 PM, Tony Su  wrote:
> >> > Hi Jack,
> >> > Although I'm a bit new to this, too...
> >> >
> >> > 1. You should take a look at your ES log files. Depending on how you
> >> > installed and are running ES, the log files could be in different
> >> > places. If I were to guess though, you should look in the following
> >> > directory
> >> > /var/log/elasticsearch/
> >> >
> >> > 2. One of the first things I did was to install and run
> >> > elasticsearch-head to get near-runtime visibility into the status and
> >> > distribution of the nodes, indexes, shards, etc.
> >> >
> >> > HTH,
> >> > Tony
> >> >
> >> >
> >> > On Tuesday, February 4, 2014 3:50:27 PM UTC-8, Jack Park wrote:
> >> >>
> >> >> I confess that, at least for me, documentation, including purchased
> >> >> books, remains a bit ambiguous, where the context is that of making
> >> >> my ES client talk to two different servers.
> >> >>
> >> >> In the end, I did nothing to the elasticsearch.yml files at each
> >> >> server; it simply was not clear what needed to be changed.
> >> >>
> >> >> I did present two IP addresses to the client, but nothing else. That
> >> >> is, I didn't set "sniff" to true, or tell it to ignore cluster names
> >> >> since each server box has just one ES installation running.
> >> >>
> >> >> At startup, I could see that both servers were responding, but soon
> >> >> they each blew up with a flurry of error messages which mean little
> >> >> to me. I bet they're meaningful, except that somewhere near the top
> >> >> where the initial error occurred and which is no-longer visible,
> >> >> perhaps something important was stated.
> >> >>
> >> >> The client's log file correctly stated:
> >> >> connected to 10.1.10.179:9300
> >> >> and
> >> >> connected to 10.1.10.178:9300
> >> >>
> >> >> but the log of the 179 server said words to this effect:
> >> >> zen-disco-node_failed[...][inet 10.1.10.80:9301]
> >> >>
> >> >> I guess I missed something: I don't have a 10.1.10.80 on that
> >> >> network...
> >> >>
> >> >> On the surface, is there something obvious I missed?
> >> >>
> >> >> Many thanks in advance for ideas.
> >> >>
> >> >> Jack
> >> >

Re: Simple question about a two-node cluster

2014-02-04 Thread Jack Park
That's very timely.

The second-to-start node is receiving a join request from 10.1.10.80:9303
even though 10.1.10.80 does not ping from anywhere in the network

I'll look into adding http://www.elastichq.org/ to the browser and see
what it says.

Many thanks
Jack

On Tue, Feb 4, 2014 at 5:13 PM, Mark Walkom  wrote:
> What client are you using?
>
> You'd be well placed to install a plugin such as elastichq or kopf to
> monitor things as well. It might help tell you where this other node
> (10.1.10.80) is coming from.
>
> Regards,
> Mark Walkom
>
> Infrastructure Engineer
> Campaign Monitor
> email: ma...@campaignmonitor.com
> web: www.campaignmonitor.com
>
>
> On 5 February 2014 12:09, Jack Park  wrote:
>>
>> Hi Tony,
>>
>> I did look at the logs.
>>
>> Also, I restarted everything and followed the instructions found at
>>
>> http://techhari.blogspot.com/2013/03/elasticsearch-cluster-setup-in-2-minutes.html
>>
>> Both nodes are started; I got a message from the first one that it
>> added a cluster when the second node started.
>> I then opened a browser and got the JSON string indicated on that
>> page, meaning, at least to me, that I have an operating cluster.
>>
>> Next up is to fire up the client again and see if things explode
>> again. In the last run, I was sending over data to be indexed, and the
>> client's log never showed any signs of distress, even though the
>> consoles of both servers were filling with error messages.
>>
>> Jack
>>
>> On Tue, Feb 4, 2014 at 4:06 PM, Tony Su  wrote:
>> > Hi Jack,
>> > Although I'm a bit new to this, too...
>> >
>> > 1. You should take a look at your ES log files. Depending on how you
>> > installed and are running ES, the log files could be in different
>> > places. If
>> > I were to guess though, you should look in the following directory
>> > /var/log/elasticsearch/
>> >
>> > 2. One of the first things I did was to install and run
>> > elasticsearch-head
>> > to get near-runtime visibility into the status and distribution of the nodes,
>> > indexes, shards, etc.
>> >
>> > HTH,
>> > Tony
>> >
>> >
>> > On Tuesday, February 4, 2014 3:50:27 PM UTC-8, Jack Park wrote:
>> >>
>> >> I confess that, at least for me, documentation, including purchased
>> >> books, remains a bit ambiguous, where the context is that of making my
>> >> ES client talk to two different servers.
>> >>
>> >> In the end, I did nothing to the elasticsearch.yml files at each
>> >> server; it simply was not clear what needed to be changed.
>> >>
>> >> I did present two IP addresses to the client, but nothing else. That
>> >> is, I didn't set "sniff" to true, or tell it to ignore cluster names
>> >> since each server box has just one ES installation running.
>> >>
>> >> At startup, I could see that both servers were responding, but soon
>> >> they each blew up with a flurry of error messages which mean little to
>> >> me. I bet they're meaningful, except that somewhere near the top where
>> >> the initial error occurred and which is no-longer visible, perhaps
>> >> something important was stated.
>> >>
>> >> The client's log file correctly stated:
>> >> connected to 10.1.10.179:9300
>> >> and
>> >> connected to 10.1.10.178:9300
>> >>
>> >> but the log of the 179 server said words to this effect:
>> >> zen-disco-node_failed[...][inet 10.1.10.80:9301]
>> >>
>> >> I guess I missed something: I don't have a 10.1.10.80 on that
>> >> network...
>> >>
>> >> On the surface, is there something obvious I missed?
>> >>
>> >> Many thanks in advance for ideas.
>> >>
>> >> Jack
>> >


Re: Simple question about a two-node cluster

2014-02-04 Thread Mark Walkom
What client are you using?

You'd be well placed to install a plugin such as elastichq or kopf to
monitor things as well. It might help tell you where this other node
(10.1.10.80) is coming from.

Regards,
Mark Walkom

Infrastructure Engineer
Campaign Monitor
email: ma...@campaignmonitor.com
web: www.campaignmonitor.com


On 5 February 2014 12:09, Jack Park  wrote:

> Hi Tony,
>
> I did look at the logs.
>
> Also, I restarted everything and followed the instructions found at
>
> http://techhari.blogspot.com/2013/03/elasticsearch-cluster-setup-in-2-minutes.html
>
> Both nodes are started; I got a message from the first one that it
> added a cluster when the second node started.
> I then opened a browser and got the JSON string indicated on that
> page, meaning, at least to me, that I have an operating cluster.
>
> Next up is to fire up the client again and see if things explode
> again. In the last run, I was sending over data to be indexed, and the
> client's log never showed any signs of distress, even though the
> consoles of both servers were filling with error messages.
>
> Jack
>
> On Tue, Feb 4, 2014 at 4:06 PM, Tony Su  wrote:
> > Hi Jack,
> > Although I'm a bit new to this, too...
> >
> > 1. You should take a look at your ES log files. Depending on how you
> > installed and are running ES, the log files could be in different
> > places. If I were to guess though, you should look in the following
> > directory
> > /var/log/elasticsearch/
> >
> > 2. One of the first things I did was to install and run
> > elasticsearch-head to get near-runtime visibility into the status and
> > distribution of the nodes, indexes, shards, etc.
> >
> > HTH,
> > Tony
> >
> >
> > On Tuesday, February 4, 2014 3:50:27 PM UTC-8, Jack Park wrote:
> >>
> >> I confess that, at least for me, documentation, including purchased
> >> books, remains a bit ambiguous, where the context is that of making my
> >> ES client talk to two different servers.
> >>
> >> In the end, I did nothing to the elasticsearch.yml files at each
> >> server; it simply was not clear what needed to be changed.
> >>
> >> I did present two IP addresses to the client, but nothing else. That
> >> is, I didn't set "sniff" to true, or tell it to ignore cluster names
> >> since each server box has just one ES installation running.
> >>
> >> At startup, I could see that both servers were responding, but soon
> >> they each blew up with a flurry of error messages which mean little to
> >> me. I bet they're meaningful, except that somewhere near the top where
> >> the initial error occurred and which is no-longer visible, perhaps
> >> something important was stated.
> >>
> >> The client's log file correctly stated:
> >> connected to 10.1.10.179:9300
> >> and
> >> connected to 10.1.10.178:9300
> >>
> >> but the log of the 179 server said words to this effect:
> >> zen-disco-node_failed[...][inet 10.1.10.80:9301]
> >>
> >> I guess I missed something: I don't have a 10.1.10.80 on that network...
> >>
> >> On the surface, is there something obvious I missed?
> >>
> >> Many thanks in advance for ideas.
> >>
> >> Jack
> >



Re: Simple question about a two-node cluster

2014-02-04 Thread Jack Park
Hi Tony,

I did look at the logs.

Also, I restarted everything and followed the instructions found at
http://techhari.blogspot.com/2013/03/elasticsearch-cluster-setup-in-2-minutes.html

Both nodes are started; I got a message from the first one that it
added a cluster when the second node started.
I then opened a browser and got the JSON string indicated on that
page, meaning, at least to me, that I have an operating cluster.

Next up is to fire up the client again and see if things explode
again. In the last run, I was sending over data to be indexed, and the
client's log never showed any signs of distress, even though the
consoles of both servers were filling with error messages.

Jack

On Tue, Feb 4, 2014 at 4:06 PM, Tony Su  wrote:
> Hi Jack,
> Although I'm a bit new to this, too...
>
> 1. You should take a look at your ES log files. Depending on how you
> installed and are running ES, the log files could be in different places. If
> I were to guess though, you should look in the following directory
> /var/log/elasticsearch/
>
> 2. One of the first things I did was to install and run elasticsearch-head
> to get near-runtime visibility into the status and distribution of the nodes,
> indexes, shards, etc.
>
> HTH,
> Tony
>
>
> On Tuesday, February 4, 2014 3:50:27 PM UTC-8, Jack Park wrote:
>>
>> I confess that, at least for me, documentation, including purchased
>> books, remains a bit ambiguous, where the context is that of making my
>> ES client talk to two different servers.
>>
>> In the end, I did nothing to the elasticsearch.yml files at each
>> server; it simply was not clear what needed to be changed.
>>
>> I did present two IP addresses to the client, but nothing else. That
>> is, I didn't set "sniff" to true, or tell it to ignore cluster names
>> since each server box has just one ES installation running.
>>
>> At startup, I could see that both servers were responding, but soon
>> they each blew up with a flurry of error messages which mean little to
>> me. I bet they're meaningful, except that somewhere near the top where
>> the initial error occurred and which is no-longer visible, perhaps
>> something important was stated.
>>
>> The client's log file correctly stated:
>> connected to 10.1.10.179:9300
>> and
>> connected to 10.1.10.178:9300
>>
>> but the log of the 179 server said words to this effect:
>> zen-disco-node_failed[...][inet 10.1.10.80:9301]
>>
>> I guess I missed something: I don't have a 10.1.10.80 on that network...
>>
>> On the surface, is there something obvious I missed?
>>
>> Many thanks in advance for ideas.
>>
>> Jack
>



Re: Elasticsearch index mapping in java

2014-02-04 Thread Kevin Wang
An index request only indexes a document; to define a mapping, you should 
use a put mapping request instead.

For example:
PutMappingResponse response = 
client.admin().indices().preparePutMapping(INDEX).setType(INDEX_TYPE).setSource(source).get();
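
To illustrate the point above: a put mapping request expects only the 
type-level mapping, without the "settings" and "mappings" wrappers used in 
the builder from the question. The following is a minimal sketch, not code 
from this thread, that builds such a body with plain Java collections. The 
class name TweetMapping and the helper field() are hypothetical; the field 
names and options are copied from the question. It assumes the resulting map 
would be handed to the put mapping request (for instance through a setSource 
overload that accepts a Map) and that the index already exists.

```java
import java.util.LinkedHashMap;
import java.util.Map;

public class TweetMapping {

    // Hypothetical helper: one stored field definition with the given type and index option.
    static Map<String, Object> field(String type, String index) {
        Map<String, Object> definition = new LinkedHashMap<>();
        definition.put("type", type);
        definition.put("store", "yes");
        definition.put("index", index);
        return definition;
    }

    // Builds {"<indexType>": {"properties": {...}}}: only the mapping for one type,
    // with no "settings" or "mappings" wrapper around it.
    static Map<String, Object> build(String indexType) {
        // Field options below are copied unchanged from the question.
        Map<String, Object> properties = new LinkedHashMap<>();
        properties.put("user", field("string", "analyzed"));
        properties.put("postDate", field("date", "analyzed"));
        properties.put("message", field("string", "not_analyzed"));

        Map<String, Object> typeMapping = new LinkedHashMap<>();
        typeMapping.put("properties", properties);

        Map<String, Object> source = new LinkedHashMap<>();
        source.put(indexType, typeMapping);
        return source;
    }

    public static void main(String[] args) {
        // "tweet" is the type name that appears in the JSON shown in the question.
        System.out.println(build("tweet"));
    }
}
```

Settings such as number_of_shards are not part of a mapping, so they would 
have to be supplied separately when the index is created.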


On Wednesday, February 5, 2014 1:27:41 AM UTC+11, Doru Sular wrote:
>
> Hi guys,
>
> I am trying to create an index with the following code:
> XContentBuilder source = XContentFactory.jsonBuilder().startObject()
>     .startObject("settings")
>         .field("number_of_shards", 1)
>     .endObject() // end settings
>     .startObject("mappings")
>         .startObject(INDEX_TYPE)
>             .startObject("properties")
>                 .startObject("user") // start user
>                     .field("type", "string")
>                     .field("store", "yes")
>                     .field("index", "analyzed")
>                 .endObject() // end user
>                 .startObject("postDate")
>                     .field("type", "date")
>                     .field("store", "yes")
>                     .field("index", "analyzed")
>                 .endObject() // end post date
>                 .startObject("message")
>                     .field("type", "string")
>                     .field("store", "yes")
>                     .field("index", "not_analyzed")
>                 .endObject() // end message
>             .endObject() // end properties
>         .endObject() // end index type
>     .endObject() // end mappings
> .endObject(); // end the container object
>
> IndexResponse response = this.client.prepareIndex(INDEX, INDEX_TYPE)
>     .setSource(source)
>     .setType(INDEX_TYPE)
>     .execute()
>     .actionGet();
>
>
> I want to have the "message" field not analyzed, because later I want to 
> use facets to obtain unique messages.
> Unfortunately my code seems to add just a document in index with the 
> following structure:
> {
>   "settings": {
>     "number_of_shards": 1
>   },
>   "mappings": {
>     "tweet": {
>       "properties": {
>         "user": {
>           "type": "string",
>           "store": "yes",
>           "index": "analyzed"
>         },
>         "postDate": {
>           "type": "date",
>           "store": "yes",
>           "index": "analyzed"
>         },
>         "message": {
>           "type": "string",
>           "store": "yes",
>           "index": "not_analyzed"
>         }
>       }
>     }
>   }
> }
>
> Please help me spot the error; it seems that the mappings are not created.
> Thank you very much,
> Doru
>
>



Re: Understanding ElasticSearch with MySQL

2014-02-04 Thread joergpra...@gmail.com
Congrats on deciding to start indexing RDBMS data with Elasticsearch; I
hope I can be helpful.

To add to 4): you can craft a select query (or queries) to either
overwrite old docs in ES or add timeframe-based incremental updates to ES.

Autocommit is not related to updates in ES. In fact, there are no automatic
updates. With a crontab notation, the river can start periodically and fire
a series of SQL statements, in the hope of selecting all the data for
indexing.

JDBC river can generate nested JSON docs out of SQL result rows by column
name notation, so there is not necessarily a 1:1 relationship between rows
in the DB and docs in ES.
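
The column name notation mentioned above can be pictured as follows. This is
a minimal sketch of the idea only, not the JDBC river's actual
implementation: it assumes that a dot in a column label separates levels of
the resulting JSON document. The class NestedRow, the method nest(), and the
column labels and values are all hypothetical.

```java
import java.util.LinkedHashMap;
import java.util.Map;

public class NestedRow {

    // Turns a flat row such as {"author.name": "Jack"} into {"author": {"name": "Jack"}}.
    @SuppressWarnings("unchecked")
    static Map<String, Object> nest(Map<String, Object> row) {
        Map<String, Object> doc = new LinkedHashMap<>();
        for (Map.Entry<String, Object> column : row.entrySet()) {
            String[] path = column.getKey().split("\\.");
            Map<String, Object> current = doc;
            // Walk (and create where needed) one nested object per path segment
            // except the last, which names the field itself.
            for (int i = 0; i < path.length - 1; i++) {
                Object child = current.get(path[i]);
                if (!(child instanceof Map)) {
                    child = new LinkedHashMap<String, Object>();
                    current.put(path[i], child);
                }
                current = (Map<String, Object>) child;
            }
            current.put(path[path.length - 1], column.getValue());
        }
        return doc;
    }

    public static void main(String[] args) {
        // Hypothetical result row, keyed by column label.
        Map<String, Object> row = new LinkedHashMap<>();
        row.put("_id", 1);
        row.put("author.name", "Jack");
        row.put("author.city", "Berlin");
        System.out.println(nest(row));
    }
}
```

In this picture, two columns that share a prefix end up in the same nested
object, which is one way a single SQL row can carry structured data.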

Quite a few instructional docs are missing from the JDBC river wiki; I'm
sorry for that, they will be added soon. If you would like to write up your
experiences or describe how to use the JDBC river, you can drop me a note
or add it to the JDBC river wiki.

You can use the standard PHP ES client to search and to administer the
cluster; from then on, there is no difference from regular ES applications.

Jörg



Identical data uploaded - What to expect, overwrite/update/something else?

2014-02-04 Thread Tony Su
If data is re-loaded which is identical in every way to data which already 
exists in ES,
 
Is new metadata created that simply over-writes existing (zero net effect)?
Or is duplicate, unique metadata created?
Or, since identical data is found to already exist, would the update 
behavior be invoked anyway even though the update API was not used?
 
I think I understand that if data is inserted using the update API, there 
is an orderly addition to the current metadata that increments the 
version, but I'm interested in what happens if the update API is not 
used.
 
Thx,
Tony



Re: Joining node to cluster without restarting entire machine?

2014-02-04 Thread Tony Su
Hi Mark,
I've done all that to no effect.
 
FYI if it makes a diff,
I'm running on a distro that uses systemd, so in theory when the service is 
started, it's supposed to create a cgroup in which the new process is run, 
and any processes that are spawned (including but not limited to new ES 
processes) are all supposed to be managed by that cgroup. 
This generally means that, compared to SystemV, when the cgroup is shut 
down it reliably shuts down all child processes, leaving no orphaned 
processes running.
 
So, when I stop the ES service, it really should be shut down.
But when I start it up again, the node never joins, even after I've waited 
over 5 minutes on a small but active cluster that is accepting new data. 
After rebooting the orphaned node and starting the ES service, however, it 
rarely takes more than about 15 seconds to join (according to ES-head).
 
Tony
 

On Tuesday, February 4, 2014 2:10:14 PM UTC-8, Mark Walkom wrote:

> If you give the service a restart, it's a stop and then a start 
> (obviously).
> This will/should reread the config and attempt to rejoin the cluster in 
> the config.
>
> Can you try an explicit stop, then sleep for 5, then start? It could be 
> the process isn't properly closing when requested.
>
> Regards,
> Mark Walkom
>
> Infrastructure Engineer
> Campaign Monitor
> email: ma...@campaignmonitor.com 
> web: www.campaignmonitor.com
>
>
> On 5 February 2014 04:22, Tony Su wrote:
>
>> Unless I'm missing something in the docs or these forums,
>>  
>> I've surprisingly found that if a node fails to join the cluster, it's 
>> not sufficient to simply restart ES on the machine. I would have thought 
>> that restarting ES thereby re-reading its config files should be sufficient 
>> to announce its intention to join the cluster.
>>  
>> But, I haven't found that to be the case, every time I've had to reboot 
>> the entire machine to join the cluster.
>>  
>> Is there a config I'm missing?
>>  
>> Thx,
>> Tony
>>
>
>



Re: Bulk indexing tips for Elastic search and Cassandra River

2014-02-04 Thread Utkarsh Sengar
Can you please file a bug (https://github.com/eBay/cassandra-river/issues)
or share the stacktrace?

Thanks,
-Utkarsh


On Tue, Feb 4, 2014 at 8:54 AM, AKhan  wrote:

> cassandra-river is not working in my case too and I am getting exceptions
> on server side.
>
> elasticsearch.common.UUID;
>
> On Friday, March 29, 2013 10:01:14 PM UTC+1, utkar...@gmail.com wrote:
>>
>> Hello,
>>
>> I have been working on a cassandra river which triggers periodically and
>> indexes all data in a cassandra column family. The implementation for now
>> spawns 10 threads and processes 10k documents (with 13 columns) per thread.
>> The performance initially was very good: it indexed 1M documents in
>> 10 minutes. But after 1 hour, the indexing became very slow, and it has
>> indexed around 8M documents. I am trying to index a total of 50M documents.
>>
>> I have attached a screenshot of the memory and CPU usage. What I noticed
>> was that a lot of merge threads spawned, which reduced the speed
>> considerably:
>> "elasticsearch[Doppelganger][[prodinfo][1]: Lucene Merge Thread #329]"
>> daemon prio=10 tid=0x2a63 nid=0x4c28 runnable [0x246bd000]
>>
>> So, I believe this has to do with some configuration which I can tweak to
>> improve bulk indexing. I am running 1 node with 5 shards, 2GB of
>> ES_HEAP_SIZE, and no replicas for now.
>>
>> Shay mentioned some tips here in 2011:
>> https://groups.google.com/forum/?fromgroups=#!topic/elasticsearch/APWxRLrMOeU
>> Wanted to know if there are any bulk indexing performance improvements?
>>
>> I am also using bulk.execute().addListener() (async) in place of
>> bulk.execute().actionGet() (sync).
>>
>> I am planning to share the cassandra-river as soon as it achieves
>> acceptable performance.
>>
>> Thanks,
>> -Utkarsh
>>



-- 
Thanks,
-Utkarsh



Re: Improving Bulk Indexing

2014-02-04 Thread joergpra...@gmail.com
SSD is the best you can do for the persistence layer. I have such an ES
server at home, with 4 SSDs in RAID0 and an 800 MB/sec sustained write I/O
rate. The servers at my day job are some years old, from a time when a few
TB of SSD cost a fortune.

The higher the write rate and IOPS capacity of the drives, the more
throughput you can expect. Ramp up your monitoring tools, run bulk indexing
for an hour, and watch the segment merging; then you will understand how
bulk indexing behaves. With slow drives, you will see the bulk indexing
rate decay; with fast drives, mostly not.

The 10-12 MB/sec sustained rate includes transforming docs on a single
remote server and sending them to a cluster of 3 nodes, using some dozens
of threads. I'm pretty sure the ETL process is CPU bound; there is still
network bandwidth available.

Jörg



Re: Building custom panels in Kibana

2014-02-04 Thread Tony Su
I've been struggling with this a bit, too.
 
Lately, I've had some success understanding how the "new" Kibana works by 
installing Marvel and then looking around.
The biggest issue I'm currently experiencing is exactly what can be 
specified for display. A dropdown would be useful, but for now you just 
need to know.
 
Tony
 

On Tuesday, February 4, 2014 3:01:11 PM UTC-8, Gabe Gorelick-Feldman wrote:

> Is there any documentation on implementing custom panels in Kibana?
>
 
 



Re: Simple question about a two-node cluster

2014-02-04 Thread Tony Su
Hi Jack,
Although I'm a bit new to this, too...
 
1. You should take a look at your ES log files. Depending on how you 
installed and are running ES, the log files could be in different places. 
If I were to guess though, you should look in the following directory
/var/log/elasticsearch/
 
2. One of the first things I did was to install and run elasticsearch-head 
to get near-real-time visibility into the status and distribution of the 
nodes, indexes, shards, etc.
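
When the first error has already scrolled out of view, scanning the log file for the earliest WARN or ERROR line can help. This is a rough Python sketch: it assumes the `[timestamp][LEVEL][logger]` line prefix that Elasticsearch log lines use, and the sample lines are invented for illustration.

```python
import re

# Prefix of an Elasticsearch log line, e.g.
# [2014-02-04 22:40:15,518][DEBUG][action.admin.cluster.node.info] ...
LOG_LINE = re.compile(
    r"^\[(?P<ts>[^\]]+)\]\[(?P<level>[A-Z]+)\s*\]\[(?P<logger>[^\]]+)\]"
)


def first_problem_line(lines, levels=("WARN", "ERROR")):
    """Return the first line logged at one of the given levels, or None."""
    for line in lines:
        match = LOG_LINE.match(line)
        if match and match.group("level") in levels:
            return line.rstrip("\n")
    return None


# Invented sample lines; a real run would iterate over the opened log file.
sample = [
    "[2014-02-04 15:50:01,100][INFO ][node                     ] [node-a] started",
    "[2014-02-04 15:50:09,412][WARN ][discovery.zen            ] [node-a] example warning",
    "[2014-02-04 15:50:10,000][DEBUG][action.admin.cluster.node.info] [node-a] later noise",
]
print(first_problem_line(sample))
```

To use it on a real node, pass an open file object for the log under the directory mentioned in point 1 (the exact file name depends on your installation).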
 
HTH,
Tony
 

On Tuesday, February 4, 2014 3:50:27 PM UTC-8, Jack Park wrote:

> I confess that, at least for me, documentation, including purchased 
> books, remains a bit ambiguous, where the context is that of making my 
> ES client talk to two different servers. 
>
> In the end, I did nothing to the elasticsearch.yml files at each 
> server; it simply was not clear what needed to be changed. 
>
> I did present two IP addresses to the client, but nothing else. That 
> is, I didn't set "sniff" to true, or tell it to ignore cluster names 
> since each server box has just one ES installation running. 
>
> At startup, I could see that both servers were responding, but soon 
> they each blew up with a flurry of error messages which mean little to 
> me. I bet they're meaningful, except that somewhere near the top where 
> the initial error occurred and which is no-longer visible, perhaps 
> something important was stated. 
>
> The client's log file correctly stated: 
> connected to 10.1.10.179:9300 
> and 
> connected to 10.1.10.178:9300 
>
> but the log of the 179 server said words to this effect: 
> zen-disco-node_failed[...][inet 10.1.10.80:9301] 
>
> I guess I missed something: I don't have a 10.1.10.80 on that network... 
>
> On the surface, is there something obvious I missed? 
>
> Many thanks in advance for ideas. 
>
> Jack 
>



Adding a river using the python driver

2014-02-04 Thread Mihnea Dobrescu-Balaur
Hello,

I can't find how to add a river using the python driver.

I tried various ways of calling es.index with '_river' as the index, but I 
couldn't find a way to add the "_meta" document. The closest I got was an 
error from ES: "no river _meta document found after 5 attempts".


Can this be done? And if so, how?
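
For reference, a river is registered by indexing a document with the id `_meta` into the `_river` index, under a type named after the river. The Python sketch below only builds the keyword arguments for the client's `index()` call, so it runs without a cluster. The river name and the settings body are hypothetical placeholders, and the real body depends on the river plugin in use.

```python
def river_meta_call(river_name, meta):
    """Build keyword arguments for Elasticsearch.index() that create a river.

    The river name becomes the document type, and the document id must be
    the literal string "_meta".
    """
    return {
        "index": "_river",
        "doc_type": river_name,
        "id": "_meta",
        "body": meta,
    }


# Hypothetical river name and settings, for illustration only.
kwargs = river_meta_call("my_river", {"type": "my_river_type"})
print(kwargs)

# With a running node and the elasticsearch package installed:
#   from elasticsearch import Elasticsearch
#   Elasticsearch().index(**kwargs)
```

If the `_meta` id is missing, the document lands under an auto-generated id, which would be consistent with the "no river _meta document found" error quoted above.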

Thanks,
Mihnea



Re: Improving Bulk Indexing

2014-02-04 Thread ZenMaster80
Good to know, I will keep this in mind, even though I will try to go for 
SSDs, as I have personally had great success with them in the past! When you 
say 10-12 MB/sec, is this with doc parsing/processing or just ES index time? 
In my humble test on a quad-core laptop, I am pushing 6 MB/sec with 
processing and 9 MB/sec if I don't include processing time. I tried playing 
with many different settings; I think this is about all it's going to do 
given the machine I am running on. 
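
Taking the two figures above at face value, the speed of the processing step alone can be estimated. The Python sketch below assumes processing and indexing run strictly one after the other. That is an assumption: if the two overlap across threads, the result no longer holds. The function name is made up.

```python
def implied_stage_rate(combined_rate, other_stage_rate):
    """Rate of the remaining stage when two stages run back to back.

    Time per MB adds up: 1/combined = 1/this_stage + 1/other_stage.
    """
    if combined_rate >= other_stage_rate:
        raise ValueError("combined rate must be lower than the single-stage rate")
    return 1.0 / (1.0 / combined_rate - 1.0 / other_stage_rate)


# 6 MB/sec with processing, 9 MB/sec for indexing alone (figures from above).
processing_rate = implied_stage_rate(6.0, 9.0)
print(round(processing_rate, 2))
```

Under that assumption the processing step alone runs at roughly 18 MB/sec, so indexing accounts for about two thirds of the time per MB.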

On Tuesday, February 4, 2014 4:22:10 PM UTC-5, Jörg Prante wrote:
>
> My use case is bibliographic data indexing of academic and public 
> libraries. There are ~100m records from various sources that I regularly 
> extract, transform into JSON-LD, and load into Elasticsearch. Some are 
> files, some are fetched by JDBC. I have six 32-core servers in our place, 
> organized in 2 ES clusters. Self installed and configured - no cloud VMs :) 
> With bulk indexing I can push around 10-12m/sec to an ES cluster. 
> Transforming docs is rather complex, needs re-processing of indexed data. 
> The job is done in a few hours so I can perform ETL every night. No SSD, 
> too expensive, but SAS-2 (6Gbit/sec) RAID-0 drives of ~1TB per server.
>
> Jörg
>
>
>
> On Tue, Feb 4, 2014 at 5:22 PM, ZenMaster80 
> > wrote:
>
>> Jörg,
>>
>> Great, I learned a lot about the process from your responses. Could you 
>> elaborate more on your use case, mine I think will be similar to yours 
>> where processing/feeding is on one server and I will use transport client, 
>> index nodes will be on EC2. So, when I do get to setting up Ec2 nodes, I 
>> believe I should be mostly looking for big cores and SSD.
>> For current test, besides running long feeds to guage performance and 
>> checking for analyzers, I take it there isn't much else I can do to make 
>> significant impact?
>>
>>
>> On Tuesday, February 4, 2014 3:11:14 AM UTC-5, Jörg Prante wrote:
>>>
>>> SSD will improve overall performance very much, yes. Disk drives are the 
>>> slowest part in the chain and this will help. No more low IOPS, so it will 
>>> significantly reduce the load on CPU (less IO waits).
>>>
>>> More RAM will not help that much. In fact, more RAM will slow down 
>>> persisting, it increases pressure on the memory-to-disk part. ES obviously 
>>> does not depend on large RAM for persisting data, some MB suffice, but you 
>>> can try and see for yourself.
>>>
>>> 85 MB is not sufficient for testing index segment merging and GC 
>>> effects, you should run a bulk indexing feed not for seconds, but for at 
>>> least 20-30 minutes, if not for hours.
>>>
>>> Also check if your mapping can be simplified, the less complex 
>>> analyzers, the faster ES can index.
>>>
>>> You should also exercise your feed program how long it takes to process 
>>> your input without the part of bulk indexing. Then you see a bottom line, 
>>> and maybe more space for improvement outside ES. 
>>>
>>> In my use case, it helped to move the feed program to another server and 
>>> use the TransportClient with a speedup of ~30%.
>>>
>>> I agree that 5.5M/sec is not the end of the line but that heavily 
>>> depends on your hard- and software configuration (machine, OS, file 
>>> systems, JVM).
>>>
>>> Jörg
>>>
>
>



Issue post upgrade from version 0.90.5 to 0.90.10

2014-02-04 Thread venku123
We are planning to upgrade from 0.90.5 to 0.90.10 in our live environment. 
Before that, we wanted to do the same in the test environment. Below are 
the environment, the upgrade steps and details of the post-upgrade issue.

Environment:

   - Master Only (no data node) - 3
   - data only nodes - 3
   - version 0.90.5


Upgrade steps

  We did not plan for any downtime, so we followed the steps below 
while the cluster was up and running


   - upgraded the data nodes to 0.90.10 one by one
   - upgraded the eligible master nodes to 0.90.10 one by one
   - upgraded the master node to 0.90.10


The upgrade was successful without any impact to the application.

Post upgrade, the following issue keeps occurring

When I open the head plugin URL, no data is shown beyond the cluster 
overview section (no index, node or shard details). But in the Browser 
tab, the index does exist. I also see the following error:

[2014-02-04 22:40:15,518][DEBUG][action.admin.cluster.node.info] [ppmn-elastic01-21947.phx-os1.stratus.dev.ebay.com] failed to execute on node [ict1HABCQC-wGK3VvIHhXA]
org.elasticsearch.transport.NodeDisconnectedException: [es-upgrade-0.90.10-3-163533.phx-os1.stratus.dev.ebay.com][inet[/10.9.218.139:9300]][cluster/nodes/info/n] disconnected.

On restarting the data node with this issue, the cluster will come back 
normally. But after a while the same issue happens.
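
To see whether it is always the same node that drops out, the disconnect messages can be tallied per node. This is a small Python sketch that assumes the exception text has the shape shown in the error above. The function name is made up, and a real run would pass in the contents of the log file.

```python
import re
from collections import Counter

# Captures the host name and transport address from lines such as
# ...NodeDisconnectedException: [host][inet[/10.9.218.139:9300]][action] disconnected.
DISCONNECT = re.compile(
    r"NodeDisconnectedException:\s*"
    r"\[(?P<host>[^\]]+)\]\[inet\[/(?P<addr>[^\]]+)\]\]"
)


def count_disconnects(log_text):
    """Count NodeDisconnectedException occurrences per (host, address)."""
    return Counter(
        (m.group("host"), m.group("addr")) for m in DISCONNECT.finditer(log_text)
    )


log_text = (
    "org.elasticsearch.transport.NodeDisconnectedException: "
    "[es-upgrade-0.90.10-3-163533.phx-os1.stratus.dev.ebay.com]"
    "[inet[/10.9.218.139:9300]][cluster/nodes/info/n] disconnected."
)
counts = count_disconnects(log_text)
print(counts)
```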

Any pointers?

Thanks
Venkatesh


[Pasted text from the elasticsearch-head page: connected to 
es-upgrade-0.90.10-161885.phx-os1.stratus.dev.ebay.com, cluster health: 
green (6, 1). Only the menu items (Info, Overview, Browser, Structured 
Query, Any Request) and the Cluster Overview controls (New Index, Sort 
Cluster, Refresh) appear.]




Simple question about a two-node cluster

2014-02-04 Thread Jack Park
I confess that, at least for me, documentation, including purchased
books, remains a bit ambiguous, where the context is that of making my
ES client talk to two different servers.

In the end, I did nothing to the elasticsearch.yml files at each
server; it simply was not clear what needed to be changed.

I did present two IP addresses to the client, but nothing else. That
is, I didn't set "sniff" to true, or tell it to ignore cluster names
since each server box has just one ES installation running.

At startup, I could see that both servers were responding, but soon
they each blew up with a flurry of error messages which mean little to
me. I bet they're meaningful, except that somewhere near the top where
the initial error occurred and which is no-longer visible, perhaps
something important was stated.

The client's log file correctly stated:
connected to 10.1.10.179:9300
and
connected to 10.1.10.178:9300

but the log of the 179 server said words to this effect:
zen-disco-node_failed[...][inet 10.1.10.80:9301]

I guess I missed something: I don't have a 10.1.10.80 on that network...

On the surface, is there something obvious I missed?

Many thanks in advance for ideas.

Jack



Understanding ElasticSearch with MySQL

2014-02-04 Thread Usman Ehtesham
Hello,

I am trying to add Elasticsearch to an eCommerce website built on PHP and 
MySQL, and spent today researching Elasticsearch. I learnt a lot and have a 
fairly good idea of what to do. I just wanted some guidance on whether my 
understanding of Elasticsearch with MySQL is correct. From what I gathered:

1) Get elasticsearch, the jdbc river, and the MySQL driver for the jdbc river
2) Run the elasticsearch server in a terminal
3) Create the jdbc river using a curl command in a new terminal
4) Feed the data from MySQL to elasticsearch. This will ensure that all 
data in our database will also be available in elasticsearch and, 
hopefully, by setting the autocommit parameter to true, every time MySQL is 
updated, the elasticsearch server will also be updated

Assuming this is correct, my question is: how can I use PHP to take the 
search query from the front end and get results from the elasticsearch 
server? I still have trouble understanding this concept. Most of the 
examples that I saw were either using PHP with document-based storage or 
using curl commands for searching. I want to know how I can use PHP to get 
the results of our search using elasticsearch. 
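
Whatever the language, a search is an HTTP request with a JSON body sent to the `_search` endpoint, so the curl examples translate directly. The sketch below uses Python only to show the shape of the request. The host, the index name `products` and the field `name` are hypothetical, and the same URL and body can be sent from PHP with any HTTP client.

```python
import json


def build_search_request(base_url, index, field, text, size=10):
    """Return the URL and JSON body for a simple full-text match query."""
    url = "{}/{}/_search".format(base_url.rstrip("/"), index)
    body = {"query": {"match": {field: text}}, "size": size}
    return url, json.dumps(body)


# Hypothetical index and field names, for illustration only.
url, payload = build_search_request(
    "http://localhost:9200", "products", "name", "red shoes"
)
print(url)
print(payload)
```

The response is JSON as well: the matching documents are under `hits.hits`, each with its original fields in `_source`, which the front end can decode and render.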

I went through 
https://github.com/jprante/elasticsearch-river-jdbc/wiki/_pages and many 
other resources online and have a pretty good idea about elasticsearch. I 
just need a few more things to figure out. 

This is the very first time that I have read about elasticsearch, and I 
really like the search engine and would like to add it to the website. I 
would really appreciate some guidance on this.  

Usman Ehtesham Gul



Building custom panels in Kibana

2014-02-04 Thread Gabe Gorelick-Feldman
Is there any documentation on implementing custom panels in Kibana?



Re: Azure Cloud Plugin Problems

2014-02-04 Thread Andrew Westgarth
Hi David,
here's the gist for the logs from the three nodes of Cluster 2 
- https://gist.github.com/apwestgarth/8813941. The first thing I noticed, 
which is strange, is that node 1 refers to the cluster as 
sageerpdev_escluster whereas nodes 2 and 3 correctly refer to it as 
sageerpdevescluster. The config files (elasticsearch.yml) are the same on 
each node :s so I'm not sure why that's happening.
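
A quick way to double-check that the effective `cluster.name` really matches across nodes is to pull it out of each config file and compare. The Python sketch below handles only the flat `cluster.name: value` form, not nested YAML. The node labels and config snippets are made up, and a real check would read each node's elasticsearch.yml.

```python
def read_cluster_name(config_text, default="elasticsearch"):
    """Return the first uncommented cluster.name value, or the default."""
    for raw in config_text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#"):
            continue  # ignore blank lines and comments
        key, sep, value = line.partition(":")
        if sep and key.strip() == "cluster.name":
            return value.strip().strip("\"'")
    return default


# Made-up config contents standing in for the three nodes' files.
configs = {
    "node1": "cluster.name: sageerpdev_escluster\n",
    "node2": " cluster.name: sageerpdevescluster\n",
    "node3": "# cluster.name: old\ncluster.name: sageerpdevescluster\n",
}
names = {node: read_cluster_name(text) for node, text in configs.items()}
distinct = set(names.values())
print(names)
print("mismatch" if len(distinct) > 1 else "all nodes agree")
```

If the files agree and the log still shows a different name, the value is probably being set somewhere else, for example on the service command line.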

I've since reverted cluster 1 back to unicast mode so I can carry on 
working with the old environment.

Thanks

Andrew

On Tuesday, 4 February 2014 22:22:49 UTC, David Pilato wrote:
>
> Could you please GIST your logs on both nodes?
> Also, could you change Log level to TRACE for discovery? (See 
> config/logging.yml file)
>
> Thanks
>
> --
> David ;-)
> Twitter : @dadoonet / @elasticsearchfr / @scrutmydocs
>
>

Re: Azure Cloud Plugin Problems

2014-02-04 Thread David Pilato
Could you please GIST your logs on both nodes?
Also, could you change Log level to TRACE for discovery? (See 
config/logging.yml file)

Thanks

--
David ;-)
Twitter : @dadoonet / @elasticsearchfr / @scrutmydocs


On 4 Feb 2014, at 22:42, Andrew Westgarth wrote:

Hi,
I read with interest the news about the Azure Cloud Plugin over the weekend 
and today have been trying to get it working with Windows VMs on Azure with 
mixed levels of success.

I have two environments/clusters, one which has been running for a few weeks 
and another which is brand new and has only been running for a couple of days; 
both have the head plugin installed so I can see the status of the cluster(s).

All of the clusters consist of 3 machines using the Windows Server 2012 R2 
Datacenter base image with Java 7 added, and elasticsearch 0.90.10 installed as 
a service set to automatic startup.

Cluster 1 - has been running with multicast discovery disabled and the IP 
addresses of the nodes listed.  I have since installed the Azure cloud plugin, 
added the certificate and configuration to the node, enabled multicast 
discovery again and commented out the list of IP addresses.  Now when I view 
the details of the cluster, none of the nodes can see each other and the 
cluster health status is marked in amber as the full cluster is no longer 
available.

the elasticsearch.yml file is as follows:

# ElasticSearch Configuration Example #
# This file contains an overview of various configuration settings,
# targeted at operations staff. Application developers should
# consult the guide at .
#
# The installation procedure is covered at
# 
.
#
# ElasticSearch comes with reasonable defaults for most settings,
# so you can try it out without bothering with configuration.
#
# Most of the time, these defaults are just fine for running a production
# cluster. If you're fine-tuning your cluster, or wondering about the
# effect of certain configuration option, please _do ask_ on the
# mailing list or IRC channel [http://elasticsearch.org/community].
# Any element in the configuration can be replaced with environment variables
# by placing them in ${...} notation. For example:
#
# node.rack: ${RACK_ENV_VAR}
# For information on supported formats and syntax for the config file, see
# 


### Cluster ###
# Cluster name identifies your cluster for auto-discovery. If you're running
# multiple clusters on the same network, make sure you're using unique names.
#
 cluster.name: elasticsearch

### Node ###
# Node names are generated dynamically on startup, so you're relieved
# from configuring them manually. You can tie this node to a specific name:
#
# node.name: "Franz Kafka"
# Every node can be configured to allow or deny being eligible as the master,
# and to allow or deny to store the data.
#
# Allow this node to be eligible as a master node (enabled by default):
#
# node.master: true
#
# Allow this node to store data (enabled by default):
#
# node.data: true
# You can exploit these settings to design advanced cluster topologies.
#
# 1. You want this node to never become a master node, only to hold data.
#This will be the "workhorse" of your cluster.
#
# node.master: false
# node.data: true
#
# 2. You want this node to only serve as a master: to not store any data and
#to have free resources. This will be the "coordinator" of your cluster.
#
# node.master: true
# node.data: false
#
# 3. You want this node to be neither master nor data node, but
#to act as a "search load balancer" (fetching data from nodes,
#aggregating results, etc.)
#
# node.master: false
# node.data: false
# Use the Cluster Health API [http://localhost:9200/_cluster/health], the
# Node Info API [http://localhost:9200/_cluster/nodes] or GUI tools
# such as  and
#  to inspect the cluster state.
# A node can have generic attributes associated with it, which can later be used
# for customized shard allocation filtering, or allocation awareness. An attribute
# is a simple key value pair, similar to node.key: value, here is an example:
#
# node.rack: rack314
# By default, multiple nodes are allowed to start from the same installation location
# to disable it, set the following:
# node.max_local_storage_nodes: 1

### Index ###
# You can set a number of options (such as shard/replica options, mapping
# or analyzer definitions, translog settings, ...) for indices globally,
# in this file.
#
# Note, that it makes more sense to configure index settings specific

Re: Joining node to cluster without restarting entire machine?

2014-02-04 Thread Mark Walkom
If you give the service a restart, it's a stop and then a start (obviously).
This will/should reread the config and attempt to rejoin the cluster in the
config.

Can you try an explicit stop, then sleep for 5, then start? It could be the
process isn't properly closing when requested.
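
The stop, pause, start sequence can be scripted so the pause is never skipped. This is a minimal Python sketch that assumes the node is managed through a `service elasticsearch` init script, which may not match your installation. The function name is made up, and the command runner and sleep function are parameters so the sequence can be tried without touching a real service.

```python
import subprocess
import time


def stop_pause_start(run=subprocess.check_call, sleep=time.sleep,
                     pause=5, service="elasticsearch"):
    """Stop the service, wait for the process to exit, then start it again."""
    run(["service", service, "stop"])
    sleep(pause)  # give the old JVM time to shut down cleanly
    run(["service", service, "start"])


# Dry run with stand-ins that only record what would happen.
calls = []
stop_pause_start(run=calls.append, sleep=calls.append)
print(calls)
```

Calling `stop_pause_start()` with the defaults would run the real commands and needs the privileges to manage the service.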

Regards,
Mark Walkom

Infrastructure Engineer
Campaign Monitor
email: ma...@campaignmonitor.com
web: www.campaignmonitor.com


On 5 February 2014 04:22, Tony Su  wrote:

> Unless I'm missing something in the docs or these forums,
>
> I've surprisingly found that if a node fails to join the cluster, it's not
> sufficient to simply restart ES on the machine. I would have thought that
> restarting ES thereby re-reading its config files should be sufficient to
> announce its intention to join the cluster.
>
> But, I haven't found that to be the case, every time I've had to reboot
> the entire machine to join the cluster.
>
> Is there a config I'm missing?
>
> Thx,
> Tony
>



suggestion completion across multiple types in an index

2014-02-04 Thread Avinash Mohan
Hi,

Is it possible to run a completion suggestion on a type? I'm able to do it 
on an index:

POST /data/_suggest
{
  "data" : {
    "text" : "tr",
    "completion" : {
      "field" : "sattributes",
      "size" : 50
    }
  }
}

When I do it on a type 

POST /data/suggestion/_suggest
{
  "data" : {
    "text" : "tr",
    "completion" : {
      "field" : "sattributes",
      "size" : 50
    }
  }
}

("suggestion" is the type) 

I don't get any results. I need to run suggestions on two different types, 
articles and books. Do I need to create separate indexes to make this work, 
or is there a way in elasticsearch to accomplish it? If I have to search on 
my index "data", is there a way to get 50 results for type "article" and 50 
results for type "book"?
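
One possible approach, offered as an assumption rather than a confirmed answer: give each type its own completion field and ask for two named suggestions in a single index-level `_suggest` request, since a request body may carry several named suggestions. The Python sketch below only builds that body. The field names `article_suggest` and `book_suggest` are hypothetical and would have to exist in the mappings.

```python
import json


def per_type_suggest_body(text, fields, size=50):
    """Build one _suggest body with a named completion suggestion per field.

    fields: mapping of suggestion name -> completion field name.
    """
    return {
        name: {"text": text, "completion": {"field": field, "size": size}}
        for name, field in fields.items()
    }


# Hypothetical completion field names, one per type.
body = per_type_suggest_body(
    "tr", {"articles": "article_suggest", "books": "book_suggest"}
)
print(json.dumps(body, indent=2, sort_keys=True))
```

Sent as `POST /data/_suggest`, each named entry would come back with up to 50 options of its own.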

Any help is highly appreciated.

- Avinash



Re: Elasticsearch LXC on Ubuntu 14.04 and recomended settings

2014-02-04 Thread Mark Walkom
That looks ok, similar to how we do things with virtualised master/data
nodes.
I wouldn't specify your shard/replica count in the node config though; set
it per index, as that lets you change it with ease.
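
As a sketch of what setting the counts per index looks like, the Python below builds the two request bodies involved: one for creating an index with explicit shard and replica counts, and one for changing the replica count later. The function names are made up, and the numbers are the ones from the quoted config.

```python
import json


def create_index_body(shards, replicas):
    """Body for PUT /<index> when creating the index."""
    return {"settings": {"number_of_shards": shards,
                         "number_of_replicas": replicas}}


def update_replicas_body(replicas):
    """Body for PUT /<index>/_settings on an existing index."""
    return {"index": {"number_of_replicas": replicas}}


create_body = create_index_body(5, 2)
update_body = update_replicas_body(1)
print(json.dumps(create_body))
print(json.dumps(update_body))
```

The shard count is fixed once the index exists, while the replica count can be updated at any time, which is the main reason to keep both out of the node config.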

Regards,
Mark Walkom

Infrastructure Engineer
Campaign Monitor
email: ma...@campaignmonitor.com
web: www.campaignmonitor.com


On 5 February 2014 05:19, Tony Su  wrote:

> Hi Enger,
> Although I don't yet have enough experience building ES clusters to
> directly answer your question(s),
>
> Typically when I'm involved in this type of provisioning, I generally
> start off with a set of objectives and then design accordingly. I'd be
> interested in your objectives list and then match those to your proposed
> configuration.
>
> Thx,
> Tony
>
>
>
>
> On Tuesday, February 4, 2014 9:54:48 AM UTC-8, engel der wrote:
>
>> Hi,
>>
>> we are setting up a Elasticsearch 1.0 (RC2) Cluster and I think I need
>> some help were to start with (settings related). We have got 6 physical
>> server with 265GB RAM and 2TB local SAS storage (seperated in two Raid10
>> Groups as LVM VGs). Those six servers are running Ubuntu 14.04. All "roles"
>> (Application Server [NGINX+PHP-FPM+GlusterFS-Client+Elasticsearch
>> "searcher"], Database Server [Galera Cluster], Storage Server [GlusterFS],
>> Cache Server [Redis] ...) will be running in LXC containers. Most of them
>> Ubuntu 14.04 only the Galera Cluster in 12.04.
>> We expect about 100GB of data to index and the data is changing not that
>> fast (5% per day?). The idea is to install Elastic Search on all 6
>> Application Severs as "searcher" with:
>>
>> cluster.name: search001
>> node.master: false
>> node.data: false
>> #node.master: true
>> #node.data: true
>> node.max_local_storage_nodes: 1
>> index.number_of_shards: 5
>> index.number_of_replicas: 2
>>
>> Add 3 "data" Nodes with:
>>
>> cluster.name: search001
>> node.master: false
>> #node.data: false
>> #node.master: true
>> node.data: true
>> node.max_local_storage_nodes: 1
>> index.number_of_shards: 5
>> index.number_of_replicas: 2
>> bootstrap.mlockall: true
>>
>> and 3 "master" nodes:
>>
>> cluster.name: search001
>> #node.master: false
>> node.data: false
>> node.master: true
>> #node.data: true
>> node.max_local_storage_nodes: 1
>> index.number_of_shards: 5
>> index.number_of_replicas: 2
>>
>> The LXCs for those "searchers" get 8GB RAM, the "masters" get 2GB RAM and
>> the "data" LXCs get 60GB and 300GB storage.
>>
>> What about the Java settings for those "data" nodes???
>>
>> cat /etc/default/elasticsearch
>> # Run Elasticsearch as this user ID and group ID
>> ES_USER=elasticsearch
>> ES_GROUP=elasticsearch
>>
>> # Heap Size (defaults to 256m min, 1g max)
>> ES_HEAP_SIZE=30g
>>
>> # Heap new generation
>> ES_HEAP_NEWSIZE=1g
>>
>> # max direct memory
>> ES_DIRECT_SIZE=???
>>
>> # Maximum number of open files, defaults to 65535.
>> MAX_OPEN_FILES=65535
>>
>> # Maximum locked memory size. Set to "unlimited" if you use the
>> # bootstrap.mlockall option in elasticsearch.yml. You must also set
>> # ES_HEAP_SIZE.
>> MAX_LOCKED_MEMORY=unlimited
>>
>> # Maximum number of VMA (Virtual Memory Areas) a process can own
>> MAX_MAP_COUNT=262144  #more
>>
>> # Elasticsearch log directory
>> #LOG_DIR=/var/log/elasticsearch
>>
>> # Elasticsearch data directory
>> #DATA_DIR=/var/lib/elasticsearch
>>
>> # Elasticsearch work directory
>> #WORK_DIR=/tmp/elasticsearch
>>
>> # Elasticsearch configuration directory
>> #CONF_DIR=/etc/elasticsearch
>>
>> # Elasticsearch configuration file (elasticsearch.yml)
>> #CONF_FILE=/etc/elasticsearch/elasticsearch.yml
>>
>> # Additional Java OPTS
>> #ES_JAVA_OPTS=
>>
>> # Configure restart on package upgrade (true, every other setting will
>> lead to not restarting)
>> #RESTART_ON_UPGRADE=true
>>
>>
>> What about the master and searcher settings? I guess I do not have to
>> tune them?
>>
>> Thank you for any help!
>>
>> Regards,
>> Flo
>>



Re: Problem: Facets tokenize tags with spaces. Is there a solution?

2014-02-04 Thread mohammad
Hello everyone,
I am new to Elasticsearch and I am facing difficulties similar to those
mentioned above. I tried implementing some of the suggested solutions, but to
no avail.
I am posting part of my code and would be very grateful if somebody could help
me out. Thanks in advance.

The code is written in Java:
// I have the following in the mapping part
CreateIndexRequestBuilder builder = client.admin().indices().prepareCreate(index)
        .setSettings(ImmutableSettings.settingsBuilder().loadFromSource(configIndex));

builder.addMapping("StatTest", "{\n" +
        "  \"StatTest\" : {\n" +
        "    \"_all\" : {\n" +
        "      \"analyzer\" : \"francais\"\n" +
        "    },\n" +
        "    \"properties\" : {\n" +
        "      \"idUser\" : {\"type\" : \"string\", \"analyzer\" : \"francais\"},\n" +
        "      \"loginOfUser\" : {\"type\" : \"string\", \"analyzer\" : \"francais\"},\n" +
        "      \"nameOfUser\" : {\"type\" : \"string\", \"analyzer\" : \"francais\"}\n" +
        "    }\n" +
        "  }\n" +
        "}");

//the sample data stored are the following
{idUser: "0121", loginOfUser: "login0121", nameOfUser :"mona lisa"},
{idUser: "0122", loginOfUser: "login0122", nameOfUser :"James Dean"},

// I am trying to get facets based on the name of the user
//TermsFacetBuilder fb = FacetBuilders.termsFacet("idOfUser").field("loginOfUser");
TermsFacetBuilder fb = FacetBuilders.termsFacet("idOfUser").field("nameOfUser");
SearchRequestBuilder srb1 = client.prepareSearch().setIndices(index).addFacet(fb);
AndFilterBuilder myFilters = FilterBuilders.andFilter();
myFilters.add(FilterBuilders.termFilter("year", "2014"));
FilterBuilder fbBuilder = FilterBuilders.andFilter(myFilters);
FilteredQueryBuilder q = QueryBuilders.filteredQuery(QueryBuilders.matchAllQuery(), fbBuilder);
SearchResponse sr = srb1.setQuery(q).execute().actionGet();
TermsFacet f = (TermsFacet) sr.getFacets().facetsAsMap().get("idOfUser");
for (TermsFacet.Entry entry : f) {
    String type = entry.getTerm().toString();
    //System.out.println("enter type : " + type);
    //System.out.println("enter entry.getCount() : " + entry.getCount());
}


// Problem: whenever I facet on the login of the user, everything works well.
The variable type returns:
login0121
login0122

However, when I facet on nameOfUser, the following is returned:
mona
lisa
James
Dean

I want to retrieve the user names as one token only.
Am I missing some code somewhere?
I would be very thankful if anyone could help me with this.
Thanks in advance.
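
One possible explanation, offered as a sketch rather than a confirmed diagnosis: a terms facet counts the indexed terms of a field, and an analyzed string field is indexed as one term per word. Mapping nameOfUser with "index" : "not_analyzed" (a standard option for string fields in Elasticsearch versions of this era) keeps each value as a single term. The plain-Java illustration below uses no Elasticsearch classes; the class FacetTokenSketch, its methods, and the whitespace split standing in for the "francais" analyzer are all assumptions made for the example.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Illustration only: mimics how facet terms differ for analyzed and
// not_analyzed fields. No Elasticsearch API is called here.
class FacetTokenSketch {

    // Assumed replacement for the nameOfUser entry in the mapping above.
    static final String NOT_ANALYZED_MAPPING =
            "\"nameOfUser\" : {\"type\" : \"string\", \"index\" : \"not_analyzed\"}";

    // Stand-in for an analyzer: one term per whitespace-separated word.
    static List<String> analyzedTerms(String value) {
        return new ArrayList<>(Arrays.asList(value.trim().split("\\s+")));
    }

    // Counts terms the way a terms facet would see them.
    static Map<String, Integer> facet(List<String> values, boolean analyzed) {
        Map<String, Integer> counts = new LinkedHashMap<>();
        for (String value : values) {
            List<String> terms = analyzed ? analyzedTerms(value) : Arrays.asList(value);
            for (String term : terms) {
                Integer old = counts.get(term);
                counts.put(term, old == null ? 1 : old + 1);
            }
        }
        return counts;
    }

    public static void main(String[] args) {
        List<String> names = Arrays.asList("mona lisa", "James Dean");
        System.out.println(facet(names, true).keySet());   // [mona, lisa, James, Dean]
        System.out.println(facet(names, false).keySet());  // [mona lisa, James Dean]
    }
}
```

If full-text search on the name is still needed, a multi-field mapping that keeps both an analyzed and a not_analyzed version of the field is a common alternative. Either way, the mapping change only affects documents indexed after it is applied, so existing data would need to be reindexed.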



--
View this message in context: 
http://elasticsearch-users.115913.n3.nabble.com/Problem-Facets-tokenize-tags-with-spaces-Is-there-a-solution-tp3651335p4048817.html
Sent from the ElasticSearch Users mailing list archive at Nabble.com.



Exception with geoDistanceFilter

2014-02-04 Thread Oren Kagan
Hello there, 
I tried to solve this for a while with no success and I'm asking for help 
with using the geoDistanceFilter (and this problem happens both on Java as 
well as on the REST api).
When I'm using geoDistanceFilter, such as here:
FilterBuilder locationFilter = FilterBuilders.geoDistanceFilter("location")
.point(-71.12163, 42.342587)  
.distance(1000, DistanceUnit.METERS);
I'm getting the exception below (Note that if I replace my locationFilter 
with FilterBuilders.matchAllFilter() the search works OK with no 
exception). 
Do you have any idea why I am getting this and how to make it work?  
Thanks,
OK

org.elasticsearch.action.search.SearchPhaseExecutionException: Failed to 
execute phase [query], all shards failed; shardFailures 
{[QXuVaxICQWWe8hf6QV3eDw][stores][4]: RemoteTransportException[[Andrew 
Gervais][inet[/67.168.58.25:9302]][search/phase/query]]; nested: 
SearchParseException[[stores][4]: from[-1],size[100]: Parse Failure [Failed 
to parse source 
[{"size":100,"query":{"filtered":{"query":{"match_all":{}},"filter":{"geo_distance":{"location":[42.342587,-71.12163],"distance":"1000.0m"}]]];
 
nested: QueryParsingException[[stores] failed to find geo_point field 
[location]]; }{[QXuVaxICQWWe8hf6QV3eDw][stores][2]: 
RemoteTransportException[[Andrew 
Gervais][inet[/68.169.58.25:9302]][search/phase/query]]; nested: 
SearchParseException[[stores][2]: from[-1],size[100]: Parse Failure [Failed 
to parse source 
[{"size":100,"query":{"filtered":{"query":{"match_all":{}},"filter":{"geo_distance":{"location":[42.342587,-71.12163],"distance":"1000.0m"}]]];
 
nested: QueryParsingException[[stores] failed to find geo_point field 
[location]]; }{[QXuVaxICQWWe8hf6QV3eDw][stores][3]: 
RemoteTransportException[[Andrew 
Gervais][inet[/68.169.58.25:9302]][search/phase/query]]; nested: 
SearchParseException[[stores][3]: from[-1],size[100]: Parse Failure [Failed 
to parse source 
[{"size":100,"query":{"filtered":{"query":{"match_all":{}},"filter":{"geo_distance":{"location":[42.342587,-71.12163],"distance":"1000.0m"}]]];
 
nested: QueryParsingException[[stores] failed to find geo_point field 
[location]]; }{[hlytGdVrTi6zoYt2K_9ATw][stores][1]: 
SearchParseException[[stores][1]: from[-1],size[100]: Parse Failure [Failed 
to parse source 
[{"size":100,"query":{"filtered":{"query":{"match_all":{}},"filter":{"geo_distance":{"location":[42.342587,-71.12163],"distance":"1000.0m"}]]];
 
nested: QueryParsingException[[stores] failed to find geo_point field 
[location]]; }{[QXuVaxICQWWe8hf6QV3eDw][stores][0]: 
RemoteTransportException[[Andrew 
Gervais][inet[/68.169.58.25:9302]][search/phase/query]]; nested: 
SearchParseException[[stores][0]: from[-1],size[100]: Parse Failure [Failed 
to parse source 
[{"size":100,"query":{"filtered":{"query":{"match_all":{}},"filter":{"geo_distance":{"location":[42.342587,-71.12163],"distance":"1000.0m"}]]];
 
nested: QueryParsingException[[stores] failed to find geo_point field 
[location]]; }
at 
org.elasticsearch.action.search.type.TransportSearchTypeAction$BaseAsyncAction.onFirstPhaseResult(TransportSearchTypeAction.java:272)
at 
org.elasticsearch.action.search.type.TransportSearchTypeAction$BaseAsyncAction$3.onFailure(TransportSearchTypeAction.java:224)
at 
org.elasticsearch.search.action.SearchServiceTransportAction$4.handleException(SearchServiceTransportAction.java:222)
at 
org.elasticsearch.transport.netty.MessageChannelHandler.handleException(MessageChannelHandler.java:181)
at 
org.elasticsearch.transport.netty.MessageChannelHandler.handlerResponseError(MessageChannelHandler.java:171)
at 
org.elasticsearch.transport.netty.MessageChannelHandler.messageReceived(MessageChannelHandler.java:123)
at 
org.elasticsearch.common.netty.channel.SimpleChannelUpstreamHandler.handleUpstream(SimpleChannelUpstreamHandler.java:70)
at 
org.elasticsearch.common.netty.channel.DefaultChannelPipeline.sendUpstream(DefaultChannelPipeline.java:564)
at 
org.elasticsearch.common.netty.channel.DefaultChannelPipeline$DefaultChannelHandlerContext.sendUpstream(DefaultChannelPipeline.java:791)
at 
org.elasticsearch.common.netty.channel.Channels.fireMessageReceived(Channels.java:296)
at 
org.elasticsearch.common.netty.handler.codec.frame.FrameDecoder.unfoldAndFireMessageReceived(FrameDecoder.java:462)
at 
org.elasticsearch.common.netty.handler.codec.frame.FrameDecoder.callDecode(FrameDecoder.java:443)
at 
org.elasticsearch.common.netty.handler.codec.frame.FrameDecoder.messageReceived(FrameDecoder.java:303)
at 
org.elasticsearch.common.netty.channel.SimpleChannelUpstreamHandler.handleUpstream(SimpleChannelUpstreamHandler.java:70)
at 
org.elasticsearch.common.netty.channel.DefaultChannelPipeline.sendUpstream(DefaultChannelPipeline.java:564)
at 
org.elasticsearch.common.netty.channel.DefaultChannelPipeline.sendUpstream(DefaultChannelPipeline.java:559)
at 
org.elasticsearch.common.netty.channel.Channels.fireMessageReceived(Channels.java:268)
at 
org.elasticsearch.common.netty.channel
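
A hedged reading of the trace above: the nested message "failed to find geo_point field [location]" suggests that the location field in the stores index is not mapped as geo_point. Elasticsearch does not detect geo_point through dynamic mapping, so the type has to be declared explicitly before documents are indexed. The sketch below only builds the JSON strings involved and calls no Elasticsearch API; the type name "store", the class GeoPointSketch, and its method names are assumptions for illustration. It also shows that the array form of a point is [lon, lat], which may matter here: the failing request contains [42.342587,-71.12163], so if 42.342587 was meant as the latitude, the arguments to point() may be swapped.

```java
// Illustration only: builds request bodies as strings, no Elasticsearch classes.
class GeoPointSketch {

    // Mapping body declaring a field as geo_point.
    // "type" is a hypothetical type name; only "stores" and "location" appear in the error.
    static String geoPointMapping(String type, String field) {
        return "{\"" + type + "\":{\"properties\":{\""
                + field + "\":{\"type\":\"geo_point\"}}}}";
    }

    // geo_distance filter body; the array form of a point is [lon, lat].
    static String geoDistanceFilter(String field, double lat, double lon, String distance) {
        return "{\"geo_distance\":{\"" + field + "\":[" + lon + "," + lat
                + "],\"distance\":\"" + distance + "\"}}";
    }

    public static void main(String[] args) {
        System.out.println(geoPointMapping("store", "location"));
        // Assumes latitude 42.342587 and longitude -71.12163 were intended.
        System.out.println(geoDistanceFilter("location", 42.342587, -71.12163, "1000.0m"));
    }
}
```

Documents indexed before the geo_point mapping exists keep their old field type, so the index would likely need to be recreated and the data reindexed.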

Azure Cloud Plugin Problems

2014-02-04 Thread Andrew Westgarth
Hi,
I read with interest the news about the Azure Cloud Plugin over the 
weekend and today have been trying to get it working with Windows VMs on 
Azure with mixed levels of success.

I have two environments/clusters: one has been running for a few weeks, 
and the other is brand new and has only been running for a couple of days. 
Both have the head plugin installed so I can see the status of the 
cluster(s).

Each cluster consists of 3 machines using the Windows Server 2012 
R2 Datacenter base image with Java 7 added, and Elasticsearch 0.90.10 
installed as a service set to automatic startup.

Cluster 1 had been running with multicast discovery disabled and the IP 
addresses of the nodes listed. I have since installed the Azure Cloud 
plugin, added the certificate and configuration to the node, enabled 
multicast discovery again, and commented out the list of IP addresses. Now 
when I view the details of the cluster, none of the nodes can see each 
other, and the cluster health status is marked in amber because the full 
cluster is no longer available.

The elasticsearch.yml file is as follows:

##################### ElasticSearch Configuration Example #####################

# This file contains an overview of various configuration settings,
# targeted at operations staff. Application developers should
# consult the guide at .
#
# The installation procedure is covered at
# <http://elasticsearch.org/guide/en/elasticsearch/reference/current/setup.html>.
#
# ElasticSearch comes with reasonable defaults for most settings,
# so you can try it out without bothering with configuration.
#
# Most of the time, these defaults are just fine for running a production
# cluster. If you're fine-tuning your cluster, or wondering about the
# effect of certain configuration option, please _do ask_ on the
# mailing list or IRC channel [http://elasticsearch.org/community].

# Any element in the configuration can be replaced with environment variables
# by placing them in ${...} notation. For example:
#
# node.rack: ${RACK_ENV_VAR}

# For information on supported formats and syntax for the config file, see
# <http://elasticsearch.org/guide/en/elasticsearch/reference/current/setup-configuration.html>


################################### Cluster ###################################

# Cluster name identifies your cluster for auto-discovery. If you're running
# multiple clusters on the same network, make sure you're using unique names.
#
 cluster.name: elasticsearch


#################################### Node #####################################

# Node names are generated dynamically on startup, so you're relieved
# from configuring them manually. You can tie this node to a specific name:
#
# node.name: "Franz Kafka"

# Every node can be configured to allow or deny being eligible as the master,
# and to allow or deny to store the data.
#
# Allow this node to be eligible as a master node (enabled by default):
#
# node.master: true
#
# Allow this node to store data (enabled by default):
#
# node.data: true

# You can exploit these settings to design advanced cluster topologies.
#
# 1. You want this node to never become a master node, only to hold data.
#    This will be the "workhorse" of your cluster.
#
# node.master: false
# node.data: true
#
# 2. You want this node to only serve as a master: to not store any data and
#    to have free resources. This will be the "coordinator" of your cluster.
#
# node.master: true
# node.data: false
#
# 3. You want this node to be neither master nor data node, but
#    to act as a "search load balancer" (fetching data from nodes,
#    aggregating results, etc.)
#
# node.master: false
# node.data: false

# Use the Cluster Health API [http://localhost:9200/_cluster/health], the
# Node Info API [http://localhost:9200/_cluster/nodes] or GUI tools
# such as  and
#  to inspect the cluster state.

# A node can have generic attributes associated with it, which can later be used
# for customized shard allocation filtering, or allocation awareness. An attribute
# is a simple key value pair, similar to node.key: value, here is an example:
#
# node.rack: rack314

# By default, multiple nodes are allowed to start from the same installation location
# to disable it, set the following:
# node.max_local_storage_nodes: 1


#################################### Index ####################################

# You can set a number of options (such as shard/replica options, mapping
# or analyzer definitions, translog settings, ...) for indices globally,
# in this file.
#
# Note, that it makes more sense to configure index settings specifically for
# a certain index, either when creating it or by using the index templates API.
#
# See <http://elasticsearch.org/guide/en/elasticsearch/reference/current/index-modules.html> and
# <http://elasticsearch.org/guide/en/elasticsearch/reference/

Having issues with Azure Cloud plugin

2014-02-04 Thread Andrew Westgarth
Hi,
I've been trying to make use of the Azure Cloud plugin today to enable 
Azure discovery of nodes, with very little success.

I have two clusters and both are exhibiting the same behaviour. I'm testing 
using the head plugin.

Cluster 1 - previously working fine with multicast disabled and IP 
addresses listed.

Now, using the Azure Cloud plugin, it cannot see any of the other nodes in 
the cluster, i.e. each machine thinks it is a cluster of one node.

Cluster 2 - new clean cluster with no data, default configuration out of the 
box, only changed cluster.name and added config for the Azure Cloud plugin.

It cannot see any of the nodes in the cluster, i.e. each machine thinks it 
is a cluster of one node.

Here is an example of my elasticsearch.yml file from cluster 1; the second 
is exactly the same as out of the box, with just the cluster.name value set 
and the Azure plugin information as below:

##################### ElasticSearch Configuration Example #####################

# This file contains an overview of various configuration settings,
# targeted at operations staff. Application developers should
# consult the guide at .
#
# The installation procedure is covered at
# <http://elasticsearch.org/guide/en/elasticsearch/reference/current/setup.html>.
#
# ElasticSearch comes with reasonable defaults for most settings,
# so you can try it out without bothering with configuration.
#
# Most of the time, these defaults are just fine for running a production
# cluster. If you're fine-tuning your cluster, or wondering about the
# effect of certain configuration option, please _do ask_ on the
# mailing list or IRC channel [http://elasticsearch.org/community].

# Any element in the configuration can be replaced with environment variables
# by placing them in ${...} notation. For example:
#
# node.rack: ${RACK_ENV_VAR}

# For information on supported formats and syntax for the config file, see
# <http://elasticsearch.org/guide/en/elasticsearch/reference/current/setup-configuration.html>


################################### Cluster ###################################

# Cluster name identifies your cluster for auto-discovery. If you're running
# multiple clusters on the same network, make sure you're using unique names.
#
 cluster.name: elasticsearch


#################################### Node #####################################

# Node names are generated dynamically on startup, so you're relieved
# from configuring them manually. You can tie this node to a specific name:
#
# node.name: "Franz Kafka"

# Every node can be configured to allow or deny being eligible as the master,
# and to allow or deny to store the data.
#
# Allow this node to be eligible as a master node (enabled by default):
#
# node.master: true
#
# Allow this node to store data (enabled by default):
#
# node.data: true

# You can exploit these settings to design advanced cluster topologies.
#
# 1. You want this node to never become a master node, only to hold data.
#    This will be the "workhorse" of your cluster.
#
# node.master: false
# node.data: true
#
# 2. You want this node to only serve as a master: to not store any data and
#    to have free resources. This will be the "coordinator" of your cluster.
#
# node.master: true
# node.data: false
#
# 3. You want this node to be neither master nor data node, but
#    to act as a "search load balancer" (fetching data from nodes,
#    aggregating results, etc.)
#
# node.master: false
# node.data: false

# Use the Cluster Health API [http://localhost:9200/_cluster/health], the
# Node Info API [http://localhost:9200/_cluster/nodes] or GUI tools
# such as  and
#  to inspect the cluster state.

# A node can have generic attributes associated with it, which can later be used
# for customized shard allocation filtering, or allocation awareness. An attribute
# is a simple key value pair, similar to node.key: value, here is an example:
#
# node.rack: rack314

# By default, multiple nodes are allowed to start from the same installation location
# to disable it, set the following:
# node.max_local_storage_nodes: 1


#################################### Index ####################################

# You can set a number of options (such as shard/replica options, mapping
# or analyzer definitions, translog settings, ...) for indices globally,
# in this file.
#
# Note, that it makes more sense to configure index settings specifically for
# a certain index, either when creating it or by using the index templates API.
#
# See <http://elasticsearch.org/guide/en/elasticsearch/reference/current/index-modules.html> and
# <http://elasticsearch.org/guide/en/elasticsearch/reference/current/indices-create-index.html>
# for more information.

# Set the number of shards (splits) of an index (5 by default):
#
# index.number_of_shards: 5

# Set the number of replicas (additional copies) of an index (1 by default):
#
#

Configuring refresh_interval at the query level

2014-02-04 Thread nariman
We've had some success improving bulk insertion times using a higher value 
for refresh_interval when doing bulk inserts.

However, the global nature of this setting seems to cause some problems.

We want some insertions processed with a higher value and others processed 
immediately (under the default 1s), but there's no way to safely do this in 
a concurrent environment where end-user actions are triggering index 
updates. 

Any suggestions on how to handle this? 
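
One direction that might help, sketched under assumptions rather than as a verified fix: refresh_interval is an index-level setting (index.refresh_interval), not a cluster-wide one, and an individual index request can ask for an immediate refresh with the refresh=true parameter. So the index could keep a long interval for bulk loads while user-triggered writes request a refresh themselves. The sketch below only builds the REST paths and bodies as strings; the class RefreshSketch, its method names, and the index, type, and id values are hypothetical.

```java
// Illustration only: builds REST request pieces as strings, nothing is sent.
class RefreshSketch {

    // Body for PUT /{index}/_settings to raise the interval during bulk loads.
    static String refreshIntervalSettings(String interval) {
        return "{\"index\":{\"refresh_interval\":\"" + interval + "\"}}";
    }

    // Path for indexing one document; refresh=true makes it searchable right away.
    static String indexPath(String index, String type, String id, boolean refreshNow) {
        String path = "/" + index + "/" + type + "/" + id;
        return refreshNow ? path + "?refresh=true" : path;
    }

    public static void main(String[] args) {
        // Hypothetical names: "catalog", "item", and the ids are placeholders.
        System.out.println("PUT /catalog/_settings " + refreshIntervalSettings("30s"));
        System.out.println("PUT " + indexPath("catalog", "item", "1", false)); // bulk-style write
        System.out.println("PUT " + indexPath("catalog", "item", "2", true));  // user-triggered write
    }
}
```

Requesting a refresh on a write forces the affected shard to refresh, so it is best kept for the writes that need immediate visibility. Sending bulk loads to a separate index with its own interval is another option.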



Re: Improving Bulk Indexing

2014-02-04 Thread joergpra...@gmail.com
My use case is bibliographic data indexing of academic and public
libraries. There are ~100m records from various sources that I regularly
extract, transform into JSON-LD, and load into Elasticsearch. Some are
files, some are fetched by JDBC. I have six 32-core servers in our place,
organized in 2 ES clusters. Self installed and configured - no cloud VMs :)
With bulk indexing I can push around 10-12m/sec to an ES cluster.
Transforming docs is rather complex, needs re-processing of indexed data.
The job is done in a few hours so I can perform ETL every night. No SSD,
too expensive, but SAS-2 (6Gbit/sec) RAID-0 drives of ~1TB per server.

Jörg



On Tue, Feb 4, 2014 at 5:22 PM, ZenMaster80  wrote:

> Jörg,
>
> Great, I learned a lot about the process from your responses. Could you
> elaborate more on your use case, mine I think will be similar to yours
> where processing/feeding is on one server and I will use transport client,
> index nodes will be on EC2. So, when I do get to setting up Ec2 nodes, I
> believe I should be mostly looking for big cores and SSD.
> For the current test, besides running long feeds to gauge performance and
> checking for analyzers, I take it there isn't much else I can do to make
> significant impact?
>
>
> On Tuesday, February 4, 2014 3:11:14 AM UTC-5, Jörg Prante wrote:
>>
>> SSD will improve overall performance very much, yes. Disk drives are the
>> slowest part in the chain and this will help. No more low IOPS, so it will
>> significantly reduce the load on CPU (less IO waits).
>>
>> More RAM will not help that much. In fact, more RAM will slow down
>> persisting, it increases pressure on the memory-to-disk part. ES obviously
>> does not depend on large RAM for persisting data, some MB suffice, but you
>> can try and see for yourself.
>>
>> 85 MB is not sufficient for testing index segment merging and GC effects,
>> you should run a bulk indexing feed not for seconds, but for at least 20-30
>> minutes, if not for hours.
>>
>> Also check if your mapping can be simplified, the less complex analyzers,
>> the faster ES can index.
>>
>> You should also exercise your feed program how long it takes to process
>> your input without the part of bulk indexing. Then you see a bottom line,
>> and maybe more space for improvement outside ES.
>>
>> In my use case, it helped to move the feed program to another server and
>> use the TransportClient with a speedup of ~30%.
>>
>> I agree that 5.5M/sec is not the end of the line but that heavily depends
>> on your hard- and software configuration (machine, OS, file systems, JVM).
>>
>> Jörg
>>


Re: 1.0.0.RC2 lots of [WARN ][discovery.zen.ping.multicast] [Pixx] failed to read requesting data from ***

2014-02-04 Thread Chen Wang
Alex,
Thanks for your reply.
There are other ES instances (0.90.10) running, but I have configured mine 
to have a different cluster name. It still throws "failed to read 
requesting data" warnings. Can these warnings be safely ignored?
Thanks,
Chen

On Monday, February 3, 2014 11:47:58 PM UTC-8, Alexander Reelsen wrote:
>
> Hey,
>
> yes, elasticsearch now starts in the foreground by default, please see 
> http://www.elasticsearch.org/guide/en/elasticsearch/reference/master/breaking-changes.html
> for a list of breaking changes compared to 0.90
>
> The other problem might stem from the fact that you are still using an 
> old elasticsearch version somewhere (maybe a node client?) or different JVM 
> versions (but if it worked before, I rather think the former).
>
> Can you check that? Also, you got an IP (10.93.x.y), does that run a valid 
> elasticsearch instance or just something that tries to connect to your 
> cluster?
>
>
> --Alex
>
>
> On Tue, Feb 4, 2014 at 8:18 AM, Chen Wang 
> > wrote:
>
>> Hi,
>> It seems that ./bin elasticsearch has been changed to run in the 
>> foreground instead of the background by default; is this true? With beta2 
>> it still seemed to run fine, but after upgrading to 1.0.0.RC1 or RC2, 
>> running ./bin elasticsearch starts in the foreground and 
>> gives me lots of warnings like:
>> [WARN ][discovery.zen.ping.multicast] [Pixx] failed to read requesting 
>> data from 
>> /10.93.69.138:54328
>> java.io.IOException: No transport address mapped to [21623]
>> at 
>> org.elasticsearch.common.transport.TransportAddressSerializers.addressFromStream(TransportAddressSerializers.java:71)
>> at 
>> org.elasticsearch.cluster.node.DiscoveryNode.readFrom(DiscoveryNode.java:267)
>> at 
>> org.elasticsearch.cluster.node.DiscoveryNode.readNode(DiscoveryNode.java:257)
>> at 
>> org.elasticsearch.discovery.zen.ping.multicast.MulticastZenPing$Receiver.run(MulticastZenPing.java:410)
>> at java.lang.Thread.run(Thread.java:662)
>>
>> Is this expected?
>> Thanks,
>> Chen
>>


Re: elasticesearch and event correlation

2014-02-04 Thread Jason Weber
John,
I have the same question. Did you ever figure anything out on this?

Jason




On Friday, June 7, 2013 4:35:22 AM UTC-4, John Zhang wrote:
>
> Hi guys,
>
> I am new to Elasticsearch. 
>
> I am trying ElasticSearch + Kibana + Logstash for my security log 
> management. I also need to do event correlation on this platform, like what 
> Simple Event Correlator (SEC, http://simple-evcorr.sourceforge.net/) does. 
>
> My question is:
> How do I do event correlation with ElasticSearch + Kibana + Logstash? Or can 
> I make SEC work with ElasticSearch + Kibana + Logstash?
>
> Any suggestions or comments will be highly appreciated!
>
> Thanks!
>
> Best regards,
> John
>



Re: RE: How do I get whole values of a field, as a facet? (not individual terms!)

2014-02-04 Thread Mohammad Shafraz Subdurally
Hello everyone,
I am new to Elasticsearch and I am facing difficulties similar to those 
mentioned above. I tried implementing some of the suggested solutions, but 
to no avail.
I am posting part of my code and would be very grateful if somebody could help 
me out. Thanks in advance.

The code is written in Java:
// I have the following in the mapping part
CreateIndexRequestBuilder builder = client.admin().indices().prepareCreate(index)
        .setSettings(ImmutableSettings.settingsBuilder().loadFromSource(configIndex));

builder.addMapping("StatTest", "{\n" +
        "  \"StatTest\" : {\n" +
        "    \"_all\" : {\n" +
        "      \"analyzer\" : \"francais\"\n" +
        "    },\n" +
        "    \"properties\" : {\n" +
        "      \"idUser\" : {\"type\" : \"string\", \"analyzer\" : \"francais\"},\n" +
        "      \"loginOfUser\" : {\"type\" : \"string\", \"analyzer\" : \"francais\"},\n" +
        "      \"nameOfUser\" : {\"type\" : \"string\", \"analyzer\" : \"francais\"}\n" +
        "    }\n" +
        "  }\n" +
        "}");

//the sample data stored are the following
{idUser: "0121", loginOfUser: "login0121", nameOfUser :"mona lisa"},
{idUser: "0122", loginOfUser: "login0122", nameOfUser :"James Dean"},

// I am trying to get facets based on the name of the user
//TermsFacetBuilder fb = FacetBuilders.termsFacet("idOfUser").field("loginOfUser");
TermsFacetBuilder fb = FacetBuilders.termsFacet("idOfUser").field("nameOfUser");
SearchRequestBuilder srb1 = client.prepareSearch().setIndices(index).addFacet(fb);
AndFilterBuilder myFilters = FilterBuilders.andFilter();
myFilters.add(FilterBuilders.termFilter("year", "2014"));
FilterBuilder fbBuilder = FilterBuilders.andFilter(myFilters);
FilteredQueryBuilder q = QueryBuilders.filteredQuery(QueryBuilders.matchAllQuery(), fbBuilder);
SearchResponse sr = srb1.setQuery(q).execute().actionGet();

TermsFacet f = (TermsFacet) sr.getFacets().facetsAsMap().get("idOfUser");
for (TermsFacet.Entry entry : f) {
    String type = entry.getTerm().toString();
    //System.out.println("enter type : " + type);
    //System.out.println("enter entry.getCount() : " + entry.getCount());
}


// Problem: whenever I facet on the login of the user, everything works well.
The variable type returns:
login0121
login0122

However, when I facet on nameOfUser, the following is returned:
mona
lisa
James
Dean

I want to retrieve the user names as one token only.
Am I missing some code somewhere?
I would be very thankful if anyone could help me with this.
Thanks in advance.

On Wednesday, 24 August 2011 22:36:02 UTC+4, ogregras wrote:
>
> Thanks kimchy!



Re: Elasticsearch LXC on Ubuntu 14.04 and recomended settings

2014-02-04 Thread Tony Su
Hi Enger,
I don't yet have enough experience building ES clusters to 
directly answer your question(s).
 
Typically when I'm involved in this type of provisioning, I start 
off with a set of objectives and then design accordingly. I'd be interested 
in seeing your objectives list and matching it against your proposed configuration.
 
Thx,
Tony
 
 
 

On Tuesday, February 4, 2014 9:54:48 AM UTC-8, engel der wrote:

> Hi,
>
> We are setting up an Elasticsearch 1.0 (RC2) cluster and I think I need 
> some help with where to start (settings related). We have 6 physical 
> servers with 265GB RAM and 2TB local SAS storage (separated into two RAID10 
> groups as LVM VGs). Those six servers are running Ubuntu 14.04. All "roles" 
> (Application Server [NGINX+PHP-FPM+GlusterFS-Client+Elasticsearch 
> "searcher"], Database Server [Galera Cluster], Storage Server [GlusterFS], 
> Cache Server [Redis] ...) will be running in LXC containers. Most of them 
> run Ubuntu 14.04; only the Galera Cluster runs 12.04.
> We expect about 100GB of data to index, and the data is not changing that 
> fast (5% per day?). The idea is to install Elasticsearch on all 6 
> Application Servers as "searcher" with:
>
> cluster.name: search001
> node.master: false
> node.data: false
> #node.master: true
> #node.data: true
> node.max_local_storage_nodes: 1
> index.number_of_shards: 5
> index.number_of_replicas: 2
>
> Add 3 "data" Nodes with:
>
> cluster.name: search001
> node.master: false
> #node.data: false
> #node.master: true
> node.data: true
> node.max_local_storage_nodes: 1
> index.number_of_shards: 5
> index.number_of_replicas: 2
> bootstrap.mlockall: true
>
> and 3 "master" nodes:
>
> cluster.name: search001
> #node.master: false
> node.data: false
> node.master: true
> #node.data: true
> node.max_local_storage_nodes: 1
> index.number_of_shards: 5
> index.number_of_replicas: 2
>
> The LXCs for those "searchers" get 8GB RAM, the "masters" get 2GB RAM and 
> the "data" LXCs get 60GB RAM and 300GB storage.
>
> What about the Java settings for those "data" nodes???
>
> cat /etc/default/elasticsearch 
> # Run Elasticsearch as this user ID and group ID
> ES_USER=elasticsearch
> ES_GROUP=elasticsearch
>
> # Heap Size (defaults to 256m min, 1g max)
> ES_HEAP_SIZE=30g
>
> # Heap new generation
> ES_HEAP_NEWSIZE=1g
>
> # max direct memory
> ES_DIRECT_SIZE=???
>
> # Maximum number of open files, defaults to 65535.
> MAX_OPEN_FILES=65535
>
> # Maximum locked memory size. Set to "unlimited" if you use the
> # bootstrap.mlockall option in elasticsearch.yml. You must also set
> # ES_HEAP_SIZE.
> MAX_LOCKED_MEMORY=unlimited
>
> # Maximum number of VMA (Virtual Memory Areas) a process can own
> MAX_MAP_COUNT=262144  #more
>
> # Elasticsearch log directory
> #LOG_DIR=/var/log/elasticsearch
>
> # Elasticsearch data directory
> #DATA_DIR=/var/lib/elasticsearch
>
> # Elasticsearch work directory
> #WORK_DIR=/tmp/elasticsearch
>
> # Elasticsearch configuration directory
> #CONF_DIR=/etc/elasticsearch
>
> # Elasticsearch configuration file (elasticsearch.yml)
> #CONF_FILE=/etc/elasticsearch/elasticsearch.yml
>
> # Additional Java OPTS
> #ES_JAVA_OPTS=
>
> # Configure restart on package upgrade (true, every other setting will lead
> # to not restarting)
> #RESTART_ON_UPGRADE=true
>
>
> What about the master and searcher settings? I guess I do not have to tune 
> them?
>
> Thank you for any help!
>
> Regards,
> Flo
>

-- 
You received this message because you are subscribed to the Google Groups 
"elasticsearch" group.
To unsubscribe from this group and stop receiving emails from it, send an email 
to elasticsearch+unsubscr...@googlegroups.com.
To view this discussion on the web visit 
https://groups.google.com/d/msgid/elasticsearch/7ab42442-3680-4364-851a-c4f3b590f00e%40googlegroups.com.
For more options, visit https://groups.google.com/groups/opt_out.


Re: query_string queries don't handle multiple types

2014-02-04 Thread Amit Soni
I am using the simple query string query and get the same number format
exception for the same reason. However, there seems to be no "lenient" flag
for the simple query string query.

Thoughts?

-Amit.


On Mon, Dec 30, 2013 at 6:52 AM, Yarin Miran  wrote:

> Silly me,
> I've found that there's a flag for ignoring these exceptions using
> "lenient"
>
>
> http://www.elasticsearch.org/guide/en/elasticsearch/reference/current/query-dsl-query-string-query.html
>
>
> On Monday, December 30, 2013 4:44:13 PM UTC+2, Yarin Miran wrote:
>>
>> Hello everyone,
>>
>> I'm trying to implement a search for one of my indices using the
>> following query:
>>
>> {
>> "query" : {
>> "query_string" : {
>>  "query" : "some text",
>> "fields" : ["collected.*"]
>> }
>>  }
>> }
>>
>> The documents in the index have a field named "collected" which is
>> dynamic and changes between documents.
>>
>> When I try to run this query I get NumberFormatException since some of
>> the fields are Numeric and I guess elasticsearch tries to cast the input
>> string to a number.
>> Is there any way to make the query_string query perform a 'best effort'
>> search that ignores the fields it can't cast instead of raising that
>> exception?
>>
>> Thanks,
>> Yarin
>>
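
For reference, the query from Yarin's original post with the "lenient" flag he mentions would look roughly like the sketch below. It only builds and prints the request body in Python; the helper name lenient_query_string is made up for illustration. Whether the same flag is accepted by the simple query string variant depends on the Elasticsearch version, so check the docs for your release.

```python
import json


def lenient_query_string(text, fields):
    """Build a query_string request body with the lenient flag set.

    The helper name is hypothetical; "query", "fields" and "lenient" are
    the parameters discussed in the thread.
    """
    return {
        "query": {
            "query_string": {
                "query": text,
                "fields": fields,
                # Ask Elasticsearch to skip format failures, e.g. text
                # searched against a numeric field.
                "lenient": True,
            }
        }
    }


body = lenient_query_string("some text", ["collected.*"])
print(json.dumps(body, indent=2))
```

The printed JSON can be passed as the -d payload of a _search request.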
>  --
> To view this discussion on the web visit
> https://groups.google.com/d/msgid/elasticsearch/64982b6f-0152-4a02-bfe0-85c7806b0f6b%40googlegroups.com
> .

-- 
To view this discussion on the web visit 
https://groups.google.com/d/msgid/elasticsearch/CAAOGaQKHNGSMtFMZLUP%2BMAPZfHUPZh0M1TUKWtZf2UMqZoq-gA%40mail.gmail.com.


sorting problems

2014-02-04 Thread damian noseda
 

Hello 


I'm having problems with sorting. I want to get the latest document (the one 
with the biggest date_created).

So I curl like this:

curl -vsX GET myelastic.com/notifications/notification/_search -d '{

 "query": {
 "bool": {
 "must": [
 {
 "term": {
 "type": "config_withdraw"
 }
 }
 ],
 "must_not": [],
 "should": []
 }
 },
 "sort": [
 {
 "date_created": "desc"
 }
 ],
 "size": 1
}'|python -mjson.tool
  
RESULT:

{
 "_shards": {
 "failed": 0, 
 "successful": 10, 
 "total": 10
 }, 
 "hits": {
 "hits": [
 {
 "_id": "a3Rks7DNTYWGDRZzVg5g8Q", 
 "_index": "notifications", 
 "_score": null, 
 "_source": {
 "date_created": "2024-02-01T13:49:52.454-04:00", 
 "expiration_date": "2024-02-01T13:49:52.452-04:00", 
 "level": null, 
 "model": {
 "user_id": 80843387
 }, 
 "reference_id": null, 
 "status": "new", 
 "type": "config_withdraw", 
 "user_id": "80843387"
 }, 
 "_type": "notification", 
 "sort": [
 1706809792454
 ]
 }
 ], 
 "max_score": null, 
 "total": 10533937
 }, 
 "timed_out": false, 
 "took": 163
}


But this is not the latest data, because in another search I get a bigger 
date_created:

curl -vsX GET myelastic.com/notifications/notification/_search -d '{
 "query": {
 "bool": {
 "must": [
 {
 "term": {
 "user_id": "153423413"
 }
 }
 ],
 "must_not": [],
 "should": []
 }
 },
 "sort": [
 {
 "date_created": "desc"
 }
 ],
 "size": 1
}'|python -mjson.tool
 
RESULT:
{
 "_shards": {
 "failed": 0, 
 "successful": 10, 
 "total": 10
 }, 
 "hits": {
 "hits": [
 {
 "_id": "PfWSkIDxSk2ou62Yiqbudw", 
 "_index": "notifications", 
 "_score": null, 
 "_source": {
 "date_created": "2014-02-03T13:48:11.524-04:00", 
 "expiration_date": "2024-02-01T13:48:11.524-04:00", 
 "level": null, 
 "model": {
 "user_id": 153423413
 }, 
 "reference_id": null, 
 "status": "new", 
 "type": "config_withdraw", 
 "user_id": "153423413"
 }, 
 "_type": "notification", 
 "sort": [
 1391449691524
 ]
 }
 ], 
 "max_score": null, 
 "total": 2
 }, 
 "timed_out": false, 
 "took": 36
}



 

So I don't know what the problem is; maybe you can give me some pointers.

BTW this is the mapping of the type
 
curl -vsX GET myelastic.com/notifications/notification/_mapping

RESULT:

{
 "notification": {
 "properties": {
 "date_created": {
 "format": "dateOptionalTime", 
 "type": "date"
 }, 
 "expiration_date": {
 "format": "dateOptionalTime", 
 "type": "date"
 }, 
 "id": {
 "type": "string"
 }, 
 "model": {
 "properties": {
 "bank": {
 "type": "string"
 }, 
 "new_status": {
 "type": "string"
 }, 
 "status": {
 "type": "string"
 }, 
 "user_id": {
 "type": "long"
 }
 }
 }, 
 "reference_id": {
 "type": "long"
 }, 
 "status": {
 "type": "string"
 }, 
 "type": {
 "type": "string"
 }, 
 "user_id": {
 "type": "string"
 }
 }
 }
}
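
A quick sanity check on the two responses above: the values in the "sort" arrays look like epoch milliseconds, so they can be decoded and compared directly. The sketch below assumes that interpretation (the helper name sort_value_to_utc is made up). If it holds, the first hit's date_created really is in 2024, as its _source also shows, so a descending sort returning it first would be the expected behaviour and not a sorting bug.

```python
from datetime import datetime, timedelta, timezone

EPOCH = datetime(1970, 1, 1, tzinfo=timezone.utc)


def sort_value_to_utc(millis):
    """Decode an epoch-milliseconds sort value into a UTC datetime."""
    # timedelta arithmetic keeps the millisecond part exact.
    return EPOCH + timedelta(milliseconds=millis)


first = sort_value_to_utc(1706809792454)   # sort value from the first result
second = sort_value_to_utc(1391449691524)  # sort value from the second result

print(first.isoformat())
print(second.isoformat())
print("first is later than second:", first > second)
```

If the 2024 value is not intended, the place to look would be how that document's date_created was written, not the sort clause.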

-- 
To view this discussion on the web visit 
https://groups.google.com/d/msgid/elasticsearch/30763a38-3b05-4c22-a910-f24b0f8e35d4%40googlegroups.com.


Re: GPGPU?

2014-02-04 Thread Tony Su
H
Looks interesting. Of course, for the moment it is a very narrow 
implementation, but it may pave the way for more.
 
At first glance it sounds like one could write a CUDA app in OpenCL and 
access the JRE installed and running on the CPU (not GPU).
If I understand that correctly, then it might warrant inspection, a close 
look at what is running where.
 
The link to aparapi in your reference also looks interesting although I'm 
not too excited about anything that waits until runtime to do the byte 
translation.
 
Tony
 

On Tuesday, February 4, 2014 9:28:14 AM UTC-8, Ivan Brusic wrote:

> The JRE can access the GPU via libraries/bindings such as JOCL: 
> http://www.jocl.org/
>
> That said, elasticsearch can be thought of as a wrapper around Lucene. It 
> would make sense for the Lucene layer to take advantage of the GPU. Also, 
> for the most part, elasticsearch tends to be more memory and IO bound and 
> not CPU.
>
> Cheers,
>
> Ivan
>
>
> On Tue, Feb 4, 2014 at 9:17 AM, Tony Su wrote:
>
>> :)
>> I don't even know if there is a way to run a JRE on GPU computing (I 
>> haven't looked, either).
>> My current exposure has generally given me the impression all code is 
>> moderately low level... C, with some scripting.
>>  
>> But, if a JRE exists, then it's an interesting option considering the 
>> movement towards GPU computing for inexpensive massive computing for highly 
>> parallel tasks.
>>  
>> Tony
>>  
>>  
>>  
>> On Tuesday, February 4, 2014 8:39:46 AM UTC-8, depahelix wrote:
>>
>>> Is there anyway to make elasticsearch take advantage of GPGPU, when 
>>> available?  It would be nice to have some sort of plugin for this type of 
>>> thing, in the future.
>>>
>>> See here:
>>> http://en.wikipedia.org/wiki/General-purpose_computing_on_graphics_processing_units
>>>
>
>

-- 
To view this discussion on the web visit 
https://groups.google.com/d/msgid/elasticsearch/c37457f9-b652-4b18-bcf8-5cc54ba18609%40googlegroups.com.


Elasticsearch LXC on Ubuntu 14.04 and recomended settings

2014-02-04 Thread engel der
Hi,

we are setting up an Elasticsearch 1.0 (RC2) cluster and I think I need some 
help on where to start (settings related). We have got 6 physical servers 
with 265GB RAM and 2TB local SAS storage (separated into two RAID10 groups as 
LVM VGs). Those six servers are running Ubuntu 14.04. All "roles" 
(Application Server [NGINX+PHP-FPM+GlusterFS-Client+Elasticsearch 
"searcher"], Database Server [Galera Cluster], Storage Server [GlusterFS], 
Cache Server [Redis] ...) will be running in LXC containers. Most of them 
run Ubuntu 14.04; only the Galera Cluster runs 12.04.
We expect about 100GB of data to index, and the data does not change that 
fast (5% per day?). The idea is to install Elasticsearch on all 6 
Application Servers as "searcher" with:

cluster.name: search001
node.master: false
node.data: false
#node.master: true
#node.data: true
node.max_local_storage_nodes: 1
index.number_of_shards: 5
index.number_of_replicas: 2

Add 3 "data" Nodes with:

cluster.name: search001
node.master: false
#node.data: false
#node.master: true
node.data: true
node.max_local_storage_nodes: 1
index.number_of_shards: 5
index.number_of_replicas: 2
bootstrap.mlockall: true

and 3 "master" nodes:

cluster.name: search001
#node.master: false
node.data: false
node.master: true
#node.data: true
node.max_local_storage_nodes: 1
index.number_of_shards: 5
index.number_of_replicas: 2

The LXCs for those "searchers" get 8GB RAM, the "masters" get 2GB RAM and 
the "data" LXCs get 60GB RAM and 300GB storage.

What about the Java settings for those "data" nodes???

cat /etc/default/elasticsearch 
# Run Elasticsearch as this user ID and group ID
ES_USER=elasticsearch
ES_GROUP=elasticsearch

# Heap Size (defaults to 256m min, 1g max)
ES_HEAP_SIZE=30g

# Heap new generation
ES_HEAP_NEWSIZE=1g

# max direct memory
ES_DIRECT_SIZE=???

# Maximum number of open files, defaults to 65535.
MAX_OPEN_FILES=65535

# Maximum locked memory size. Set to "unlimited" if you use the
# bootstrap.mlockall option in elasticsearch.yml. You must also set
# ES_HEAP_SIZE.
MAX_LOCKED_MEMORY=unlimited

# Maximum number of VMA (Virtual Memory Areas) a process can own
MAX_MAP_COUNT=262144  #more

# Elasticsearch log directory
#LOG_DIR=/var/log/elasticsearch

# Elasticsearch data directory
#DATA_DIR=/var/lib/elasticsearch

# Elasticsearch work directory
#WORK_DIR=/tmp/elasticsearch

# Elasticsearch configuration directory
#CONF_DIR=/etc/elasticsearch

# Elasticsearch configuration file (elasticsearch.yml)
#CONF_FILE=/etc/elasticsearch/elasticsearch.yml

# Additional Java OPTS
#ES_JAVA_OPTS=

# Configure restart on package upgrade (true, every other setting will lead
# to not restarting)
#RESTART_ON_UPGRADE=true


What about the master and searcher settings? I guess I do not have to tune 
them?

Thank you for any help!

Regards,
Flo
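
On the heap question, a commonly cited rule of thumb (not something confirmed for this particular cluster) is to give the JVM roughly half of the RAM available to the node, leave the rest to the filesystem cache, and stay a little below 32GB so the JVM can keep using compressed object pointers. The sketch below just encodes that arithmetic; the function name and the 31GB cap are assumptions for illustration.

```python
def suggested_heap_gb(container_ram_gb, cap_gb=31):
    """Rule-of-thumb heap size: half the container RAM, capped below 32GB.

    Both the 50% split and the cap are conventions, not hard limits.
    """
    return min(container_ram_gb // 2, cap_gb)


# Container sizes taken from the post: data, searcher and master LXCs.
for role, ram_gb in [("data", 60), ("searcher", 8), ("master", 2)]:
    print(role, "->", suggested_heap_gb(ram_gb), "GB heap")
```

Under these assumptions the 30g already set for the 60GB data containers lines up with the rule, and the smaller containers would get proportionally smaller heaps.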

-- 
To view this discussion on the web visit 
https://groups.google.com/d/msgid/elasticsearch/15991fc2-749e-4197-b854-88b5f64fc14a%40googlegroups.com.


Re: Trying to build a faceted search that works like a charm, except for my locations_path

2014-02-04 Thread georgi . mateev
There is a complete curl recreation in this gist:

https://gist.github.com/gmateev/8808650

-- 
To view this discussion on the web visit 
https://groups.google.com/d/msgid/elasticsearch/3d2c57fd-6b15-480e-bdbc-8ef210c564dc%40googlegroups.com.


Re: Joining node to cluster without restarting entire machine?

2014-02-04 Thread Tony Su
 
 Hi,
I understand you probably meant to post this to one of my other threads
https://groups.google.com/forum/#!topic/elasticsearch/dC48AAeL544
 
Interesting late development.
Too bad it sounds like what IBM is developing will be available only on IBM 
servers, but it's understandable.
 
Unless you want to pay for an IBM, I guess it'll be a wait.
 
Tony

On Tuesday, February 4, 2014 9:26:38 AM UTC-8, depahelix wrote:

> Here is something:
> http://blogs.nvidia.com/blog/2013/09/22/gpu-coming-to-java/
>
>

-- 
To view this discussion on the web visit 
https://groups.google.com/d/msgid/elasticsearch/099a3c0d-d926-403f-a2c8-545661b46e26%40googlegroups.com.


Re: Restarting a cluster with existing data - Status Red?

2014-02-04 Thread Tony Su
Good stuff about data integrity if a fool or disaster strikes.
 
Maybe down the road it would be important to document the atomicity of ES 
transactions (I understand there are likely higher priorities now, and just 
ensuring integrity needs to be done before documentation).
 
Tony
 
 

On Tuesday, February 4, 2014 7:20:04 AM UTC-8, InquiringMind wrote:

>
>
>> *2) Though *not* recommended - kill -9 should not result in data loss. If 
>> so it's a bug and should be reported.*
>>
>>
> It *should* not, but it *may*. A kill -9 ends a process without allowing 
> it to flush any unwritten buffers to disk, close any open files, or even 
> finish writing what it started. No process can detect or capture it; 
> therefore no process can perform any cleanup, shutdown, or completion. 
>
> So file all the bugs you wish, but there is no code change that can be 
> made to detect or handle a kill -9. Nothing in the Java code, and nothing 
> in the underlying JVM that is the process itself. Unless ES is redesigned 
> so that any given disk block can be written or not and the entire index 
> remains fully consistent. Because while the process cannot detect a kill 
> -9, the OS waits until it returns from any kernel call before ripping the 
> rug out from under it.
>
> Kill -9 is just dangerous. No, it's not a guarantee of disaster. But the 
> same could be said about walking blindfolded across the Autobahn.
>
> Brian
>
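
Brian's point that no process can detect or capture a kill -9 can be seen from any language runtime. The sketch below uses Python's signal module as a stand-in for the JVM case and assumes a POSIX system; the helper name can_install_handler is made up. It tries to register a handler for a signal and reports whether the OS allowed it.

```python
import signal


def can_install_handler(signum):
    """Return True if a handler can be registered for signum.

    Assumes a POSIX system and a call from the main thread. The OS
    refuses handlers for SIGKILL, which is why no cleanup code can run
    after a kill -9.
    """
    try:
        previous = signal.signal(signum, lambda received, frame: None)
    except (OSError, ValueError, RuntimeError):
        return False
    if previous is not None:
        # Put back whatever handler was installed before the probe.
        signal.signal(signum, previous)
    return True


print("SIGTERM can be handled:", can_install_handler(signal.SIGTERM))
print("SIGKILL can be handled:", can_install_handler(signal.SIGKILL))
```

A plain kill (SIGTERM) gives the process a chance to flush and close files; kill -9 does not.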

-- 
To view this discussion on the web visit 
https://groups.google.com/d/msgid/elasticsearch/f9fdd057-571f-4638-86c6-0d3f492b8a58%40googlegroups.com.


Re: GPGPU?

2014-02-04 Thread Ivan Brusic
The JRE can access the GPU via libraries/bindings such as JOCL:
http://www.jocl.org/

That said, elasticsearch can be thought of as a wrapper around Lucene. It
would make sense for the Lucene layer to take advantage of the GPU. Also,
for the most part, elasticsearch tends to be more memory and IO bound and
not CPU.

Cheers,

Ivan


On Tue, Feb 4, 2014 at 9:17 AM, Tony Su  wrote:

> :)
> I don't even know if there is a way to run a JRE on GPU computing (I
> haven't looked, either).
> My current exposure has generally given me the impression all code is
> moderately low level... C, with some scripting.
>
> But, if a JRE exists, then it's an interesting option considering the
> movement towards GPU computing for inexpensive massive computing for highly
> parallel tasks.
>
> Tony
>
>
>
> On Tuesday, February 4, 2014 8:39:46 AM UTC-8, depahelix wrote:
>
>> Is there anyway to make elasticsearch take advantage of GPGPU, when
>> available?  It would be nice to have some sort of plugin for this type of
>> thing, in the future.
>>
>> See here:
>> http://en.wikipedia.org/wiki/General-purpose_computing_on_graphics_processing_units
>>

-- 
To view this discussion on the web visit 
https://groups.google.com/d/msgid/elasticsearch/CALY%3DcQCt47Rvg-2uM%3DyP%2BM8-vuJAyKCrToReQpLnyYpOxE6R%3DQ%40mail.gmail.com.


Re: Joining node to cluster without restarting entire machine?

2014-02-04 Thread depahelix
Here is something:
http://blogs.nvidia.com/blog/2013/09/22/gpu-coming-to-java/

-- 
To view this discussion on the web visit 
https://groups.google.com/d/msgid/elasticsearch/bad9c4e0-7512-46b4-8762-191796ac412d%40googlegroups.com.


Joining node to cluster without restarting entire machine?

2014-02-04 Thread Tony Su
Unless I'm missing something in the docs or these forums, I've found, 
surprisingly, that if a node fails to join the cluster, it's not sufficient 
to simply restart ES on the machine. I would have thought that restarting 
ES, and thereby re-reading its config files, would be enough to announce 
its intention to join the cluster.
 
But I haven't found that to be the case. Every time, I've had to reboot the 
entire machine to get the node to join the cluster.
 
Is there a config I'm missing?
 
Thx,
Tony

-- 
To view this discussion on the web visit 
https://groups.google.com/d/msgid/elasticsearch/02c4b578-f430-44ba-a98c-7337b684125d%40googlegroups.com.


Re: GPGPU?

2014-02-04 Thread Tony Su
:)
I don't even know if there is a way to run a JRE on GPU computing (I 
haven't looked, either).
My current exposure has generally given me the impression all code is 
moderately low level... C, with some scripting.
 
But, if a JRE exists, then it's an interesting option considering the 
movement towards GPU computing for inexpensive massive computing for highly 
parallel tasks.
 
Tony
 
 
 
On Tuesday, February 4, 2014 8:39:46 AM UTC-8, depahelix wrote:

> Is there anyway to make elasticsearch take advantage of GPGPU, when 
> available?  It would be nice to have some sort of plugin for this type of 
> thing, in the future.
>
> See here:
>
> http://en.wikipedia.org/wiki/General-purpose_computing_on_graphics_processing_units
>
>

-- 
To view this discussion on the web visit 
https://groups.google.com/d/msgid/elasticsearch/eece2db6-6aca-43fc-a701-e2a004239a3d%40googlegroups.com.


ES 1.0 and comments embedded in elasticsearch.yml

2014-02-04 Thread Tony Su
Just a reminder to whomever...
 
I'm noticing that some URLs referenced in the comments in elasticsearch.yml 
worked before 1.0 but may no longer work.
 
e.g.
Node Discovery (and viewing health)
http://localhost:9200/_cluster/nodes
 
I'm not sure but I think it should be changed to
http://localhost:9200/_nodes/nodenameoraddress
 
Tony

-- 
To view this discussion on the web visit 
https://groups.google.com/d/msgid/elasticsearch/0047c01d-2193-4d67-8580-963076979c6a%40googlegroups.com.


Re: Bulk indexing tips for Elastic search and Cassandra River

2014-02-04 Thread AKhan
cassandra-river is not working in my case either, and I am getting exceptions 
on the server side.

elasticsearch.common.UUID; 

On Friday, March 29, 2013 10:01:14 PM UTC+1, utkar...@gmail.com wrote:
>
> Hello,
>
> I have been working on a cassandra river which triggers periodically and 
> indexes all data in a cassandra column family. The implementation for now 
> spawns 10 threads and processes 10k documents (with 13 columns)/thread.
> The performance initially was very good. It indexed 1M documents in 
> 10mins. But after a 1hour, the indexing became very slow and it indexed 
> around 8M documents. I am trying to index a total of 50M documents.
>
> I have attached a screenshot of the memory and CPU usage. What I noticed 
> was, a lot of merge threads spawned up which reduced the speed considerably:
> "elasticsearch[Doppelganger][[prodinfo][1]: Lucene Merge Thread #329]" 
> daemon prio=10 tid=0x2a63 nid=0x4c28 runnable [0x246bd000]
>
> So, I believe this has to do with some configuration which I can tweak to 
> improve bulk indexing. I am running 1 node with 5 shards, 2GB of 
> ES_HEAP_SIZE, and no replicas for now.
>
> Shay mentioned some tips here: 
> https://groups.google.com/forum/?fromgroups=#!topic/elasticsearch/APWxRLrMOeU 
> in 2011.
> I wanted to know if there are any bulk indexing performance improvements.
>
> I am also using: bulk.execute().addListener() (async) in place of 
> bulk.execute().actionGet() (sync)
>
> I am planning to share the cassandra-river as soon as it achieves 
> acceptable performance.
>
> Thanks,
> -Utkarsh
>

-- 
To view this discussion on the web visit 
https://groups.google.com/d/msgid/elasticsearch/1f5550ca-d53e-4513-b691-8992e0504533%40googlegroups.com.


GPGPU?

2014-02-04 Thread depahelix
Is there any way to make elasticsearch take advantage of GPGPU, when 
available?  It would be nice to have some sort of plugin for this type of 
thing in the future.

See here:
http://en.wikipedia.org/wiki/General-purpose_computing_on_graphics_processing_units

-- 
To view this discussion on the web visit 
https://groups.google.com/d/msgid/elasticsearch/7c0c7add-3b61-4b72-a51d-150618a897ef%40googlegroups.com.


Re: Trying to build a faceted search that works like a charm, except for my locations_path

2014-02-04 Thread georgi . mateev

Another strange thing I see is that if the field is mapped as a "string", 
I am not able to find anything on it, nor can I get facets. If it is a 
multi_field, I get both facets and results.

The documentation doesn't mention this, which is why I think I have other 
errors in the setup. The documentation doesn't say anything on that topic 
either, so I don't know what they are.

-- 
To view this discussion on the web visit 
https://groups.google.com/d/msgid/elasticsearch/a736eb64-b513-496f-a892-7d3938f74bb9%40googlegroups.com.


Re: Improving Bulk Indexing

2014-02-04 Thread ZenMaster80
Jörg,

Great, I learned a lot about the process from your responses. Could you 
elaborate more on your use case? I think mine will be similar to yours: 
processing/feeding runs on one server using the transport client, and the 
index nodes will be on EC2. So, when I do get to setting up the EC2 nodes, I 
believe I should be mostly looking for big cores and SSD.
For the current test, besides running long feeds to gauge performance and 
checking the analyzers, I take it there isn't much else I can do to make a 
significant impact?

On Tuesday, February 4, 2014 3:11:14 AM UTC-5, Jörg Prante wrote:
>
> SSD will improve overall performance very much, yes. Disk drives are the 
> slowest part in the chain and this will help. No more low IOPS, so it will 
> significantly reduce the load on CPU (less IO waits).
>
> More RAM will not help that much. In fact, more RAM will slow down 
> persisting, it increases pressure on the memory-to-disk part. ES obviously 
> does not depend on large RAM for persisting data, some MB suffice, but you 
> can try and see for yourself.
>
> 85 MB is not sufficient for testing index segment merging and GC effects, 
> you should run a bulk indexing feed not for seconds, but for at least 20-30 
> minutes, if not for hours.
>
> Also check if your mapping can be simplified, the less complex analyzers, 
> the faster ES can index.
>
> You should also exercise your feed program how long it takes to process 
> your input without the part of bulk indexing. Then you see a bottom line, 
> and maybe more space for improvement outside ES. 
>
> In my use case, it helped to move the feed program to another server and 
> use the TransportClient with a speedup of ~30%.
>
> I agree that 5.5M/sec is not the end of the line but that heavily depends 
> on your hard- and software configuration (machine, OS, file systems, JVM).
>
> Jörg
>
>
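
Jörg's suggestion to time the feed program without the bulk indexing part could be sketched as below. This is a minimal illustration in Python, not the feeder anyone in the thread is using: parse_records stands in for whatever reads and transforms the real input, and the no-op sink replaces the bulk request.

```python
import time


def measure_feed(docs, sink):
    """Run docs through sink and return (count, elapsed_seconds)."""
    start = time.perf_counter()
    count = 0
    for doc in docs:
        sink(doc)
        count += 1
    return count, time.perf_counter() - start


def parse_records(n):
    """Hypothetical input stage: yields n small documents."""
    for i in range(n):
        yield {"id": i, "title": "document %d" % i}


# Baseline: same input processing, but nothing is sent to Elasticsearch.
count, seconds = measure_feed(parse_records(10000), lambda doc: None)
print("%d docs in %.3f s without indexing" % (count, seconds))
```

Running the same loop with the real bulk call as the sink and comparing the two timings shows how much of the total is spent outside Elasticsearch.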

-- 
To view this discussion on the web visit 
https://groups.google.com/d/msgid/elasticsearch/8db08c83-c91d-45df-bd28-5fe49f7f32cd%40googlegroups.com.


Trying to build a faceted search that works like a charm, except for my locations_path

2014-02-04 Thread georgi . mateev
Hi guys,
I have been trying to build a faceted search and it works pretty well so 
far. There is just one problem that I have been struggling with in the last 
few days. 

It is a multinational jobsearch and the jobs are categorized under a 
country and a state. I want to be able to search for different states in 
different countries (It could happen in the EU, where some people would be 
interested to work abroad in a neighboring district). 

So what I tried out was this tutorial 
http://www.springyweb.com/2012/01/hierarchical-faceting-with-elastic.html . 
I skipped the part with the new tokenizer, as I don't care about the number 
of the level. 

Unfortunately it didn't work. The field seems to get tokenized: when I 
search for the regexp "44" it finds /44/10, and if I search for "10" it 
finds /44/10, but if I search for "/44" or "44.*" I get 0 hits.

Then I removed the path_tokenizer and tried to work with a not_analyzed 
string, hoping that it would do the job, but again it didn't work out.
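
For comparison, here is a small sketch of the terms a path-hierarchy style tokenizer would be expected to emit for a value such as /44/10, next to what a plain word split gives. It is a Python emulation for illustration only, not the Elasticsearch implementation; both function names are made up, and it assumes paths start with the delimiter.

```python
import re


def path_hierarchy_tokens(path, delimiter="/"):
    """Emit one token per path prefix: /44/10 -> /44, /44/10."""
    tokens = []
    current = ""
    for part in path.split(delimiter):
        if not part:
            continue  # skip the empty piece before the leading delimiter
        current += delimiter + part
        tokens.append(current)
    return tokens


def word_tokens(path):
    """Rough stand-in for a word-splitting analyzer: /44/10 -> 44, 10."""
    return re.findall(r"\w+", path)


print(path_hierarchy_tokens("/44/10"))
print(word_tokens("/44/10"))
```

If the index held the second kind of terms, a term lookup for "44" or "10" would hit while "/44" would not, which matches part of what is described above. Checking the actual terms with the analyze API would confirm which analyzer is really applied to locations_path.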

What am I doing wrong? This is the mapping; the field in question is locations_path:

https://gist.github.com/anonymous/8805721

{
"vacancy": {
"index_analyzer": "indexAnalyzer",
"search_analyzer": "searchAnalyzer",
"_boost": {
"name": "_boost",
"null_value": 1
},
"properties": {
"vId": {
"type": "integer",
"include_in_all": false
},
"title": {
"type": "string",
"include_in_all": true
},
"teaser": {
"type": "string",
"include_in_all": true
},
"keywords": {
"type": "multi_field",
"include_in_all": true
},
"completion": {
"type": "completion",
"analyzer": "standard"
},
"description": {
"type": "string",
"include_in_all": true
},
"company": {
"type": "multi_field",
"include_in_all": true
},
"company_id": {
"type": "multi_field",
"include_in_all": false
},
"workingSchedule": {
"type": "object",
"properties": {
"de": {
"type": "string",
"include_in_all": true
},
"ch": {
"type": "string",
"include_in_all": true
},
"pl": {
"type": "string",
"include_in_all": true
}
}
},
"careerPosition": {
"type": "object",
"properties": {
"de": {
"type": "string",
"include_in_all": true
},
"ch": {
"type": "string",
"include_in_all": true
},
"pl": {
"type": "string",
"include_in_all": true
}
}
},
"jobExperience": {
"type": "object",
"properties": {
"de": {
"type": "string",
"include_in_all": true
},
"ch": {
"type": "string",
"include_in_all": true
},
"pl": {
"type": "string",
"include_in_all": true
}
}
},
"seniority": {
"type": "object",
"properties": {
"de": {
"type": "string",
"include_in_all": true
},
"ch": {
"type": "string",
"include_in_all": true
},
"pl": {
"type": "string",
"include_in_all": true
}
}
},
"subjects": {
"type": "multi_field",
"include_in_all": false
},
"subjects_fulltext": {
"type": "object",
"properties": {
"de": {
"type": "multi_field",
"include_in_all": true
},
"ch": {
"type": "multi_field",
"include_in_all": true
},

Re: Marvel and basic_auth

2014-02-04 Thread Boaz Leskes
Hey Al,

We just released Marvel 1.0.2, which contains support for basic auth for the 
data shipping. Can you give it a spin? 
See: http://www.elasticsearch.org/guide/en/marvel/current/#configuration

Cheers,
Boaz

On Thursday, January 30, 2014 11:59:00 AM UTC+1, Boaz Leskes wrote:
>
> Hi Al,
>
> This is noted and we'll look into it. Thanks for reporting.
>
> Cheers,
> Boaz
>
> On Thursday, January 30, 2014 9:24:32 AM UTC+1, Al Smith wrote:
>>
>> Thanks Boaz... tried that, and it doesn't seem to want to use the 
>> credentials; from tcpdumping on lo I can see that it wants to do a PUT to 
>> /_template/marvel, doesn't try supplying auth credentials and of course it 
>> gets a 401 back. It doesn't seem to want to retry the request with 
>> authentication credentials.
>>
>> Regards,
>> Al.
>>
>> On Thursday, January 30, 2014 2:15:33 AM UTC+1, Boaz Leskes wrote:
>>>
>>> Hi Al,
>>>
>>> try setting the following in your elasticsearch.yml:
>>>
>>>
>>> marvel.agent.exporter.es.hosts: [ "user:passwd@host:9200" ]
>>>
>>>
>>> Cheers,
>>> Boaz
>>>
>>> On Wednesday, January 29, 2014 3:37:59 PM UTC+1, Al Smith wrote:

>>>> Perhaps a silly question, and yes I've RTFM'd the online documentation - 
>>>> I'd like to know how to tell Marvel to use a username and password (and 
>>>> basic_auth) to talk to the ES servers? We have jetty configured to deny 
>>>> write access without user/pass and of course Marvel needs to know what 
>>>> those credentials are.
>>>>
>>>> Thanks,
>>>> A.

>>>

-- 
You received this message because you are subscribed to the Google Groups 
"elasticsearch" group.
To unsubscribe from this group and stop receiving emails from it, send an email 
to elasticsearch+unsubscr...@googlegroups.com.
To view this discussion on the web visit 
https://groups.google.com/d/msgid/elasticsearch/76e7421d-bfcd-47fe-b691-7b5d33ecb867%40googlegroups.com.
For more options, visit https://groups.google.com/groups/opt_out.


Re: Marvel behind Nginx and https

2014-02-04 Thread Boaz Leskes
Hi,

Just wanted to drop a note that we just released Marvel 1.0.2, which contains 
a fix for this. 
See http://www.elasticsearch.org/guide/en/marvel/current/#_change_list .

Cheers,
Boaz

On Friday, January 31, 2014 3:23:33 PM UTC+1, Sean Gallagher wrote:
>
> Thanks for reporting this issue!  I've submitted the issue to the internal 
> repo and the team will take care of it.  Keep 'em coming!
>
> On Tuesday, January 28, 2014 10:45:04 AM UTC-5, J. Schulz wrote:
>>
>> Hi,
>>
>> I have Nginx configured as reverse proxy to access elasticsearch over 
>> https + auth basic. As example
>>
>> Unfortunately Marvel tries to connect to http://hostname/.
>>
>> The affected code line is in 
>> /usr/share/elasticsearch/plugins/marvel/_site/kibana/config.js
>>
>> elasticsearch: "http://"+window.location.hostname+(window.location.port 
>> !== '' ? ':'+window.location.port : ''),
>>
>> should be
>>
>> elasticsearch: 
>> window.location.protocol+"//"+window.location.hostname+(window.location.port 
>> !== '' ? ':'+window.location.port : '')
>>
>> Cheers,
>> Jonny
>>
>



Re: Marvel - How is "Free Disk Space" evaluated? (Displayed in red)

2014-02-04 Thread Tony Su
Thx,
For anyone who views this thread who wants to see an example how this looks 
in Marvel,
I've posted a screenshot
https://github.com/putztzu/Misc_images/blob/master/screenshot.png
 
Tony
 
 
 

On Tuesday, February 4, 2014 2:16:07 AM UTC-8, Boaz Leskes wrote:

> Hi Tony,
>
> The red color does mean it needs attention. By default, Marvel will warn 
> you if you have less than 50GB of free space (displayed in yellow) and will 
> go red if you have less than 20GB. If you had higher numbers displayed in 
> color, please let me know as it is a bug. 
>
> Cheers,
> Boaz
>
> On Monday, February 3, 2014 7:16:04 PM UTC+1, Tony Su wrote:
>>
>> In this screenshot,
>> *https://github.com/putztzu/Misc_images/blob/master/marvel_only.png*
>>  
>> The "Free Disk Space" is displayed in red.
>>  
>> Does this red color mean something, eg a warning of some kind or is it 
>> simply stylistic?
>> I remember when I was pointing the data directory to hundreds of 
>> gigabytes instead of tens, the color was still red.
>>  
>> Thx,
>> Tony
>>
>
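The thresholds Boaz describes (yellow below 50GB free, red below 20GB) amount to a simple classification. The following is a minimal sketch of that rule only, not Marvel's actual code: the class and method names are hypothetical, and treating 1GB as 1024^3 bytes is an assumption.

```java
public class DiskSpaceStatus {
    // Assumption: 1 GB = 1024^3 bytes; the thread only says "50GB" and "20GB".
    static final long GB = 1024L * 1024L * 1024L;
    static final long YELLOW_BELOW = 50 * GB;
    static final long RED_BELOW = 20 * GB;

    // Returns the display color for a given amount of free disk space.
    static String colorFor(long freeBytes) {
        if (freeBytes < RED_BELOW) return "red";
        if (freeBytes < YELLOW_BELOW) return "yellow";
        return "normal";
    }

    public static void main(String[] args) {
        System.out.println(colorFor(10 * GB));   // red
        System.out.println(colorFor(30 * GB));   // yellow
        System.out.println(colorFor(300 * GB));  // normal
    }
}
```

By this rule a data directory with hundreds of gigabytes free should show no warning color, which is why a red value at that size would count as a bug.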



Re: Restarting a cluster with existing data - Status Red?

2014-02-04 Thread InquiringMind


> *2) Though *not* recommended - kill -9 should not result in data loss. If 
> so it's a bug and should be reported.*
>
>
It *should* not, but it *may*. A kill -9 ends a process without allowing it 
to flush any unwritten buffers to disk, close any open files, or even 
finish writing what it started. No process can detect or capture it; 
therefore no process can perform any cleanup, shutdown, or completion. 

So file all the bugs you wish, but there is no code change that can be made 
to detect or handle a kill -9: nothing in the Java code, and nothing in the 
underlying JVM that is the process itself. The only remedy would be to 
redesign ES so that any given disk block can be written or not while the 
entire index remains fully consistent, because although the process cannot 
detect a kill -9, the OS waits until it returns from any kernel call before 
ripping the rug out from under it.

Kill -9 is just dangerous. No, it's not a guarantee of disaster. But the same 
could be said about walking blindfolded across the Autobahn.

Brian
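
To illustrate the point from the JVM's side: a normal kill (SIGTERM) lets the JVM run its shutdown hooks, whereas kill -9 (SIGKILL) ends the process before any hook can run. The following is a minimal sketch using only the standard Runtime.addShutdownHook API. The class name and messages are hypothetical, and it says nothing about what Elasticsearch itself does during shutdown.

```java
public class ShutdownHookDemo {
    // Registers a hook that the JVM runs on normal exit or SIGTERM.
    // On SIGKILL (kill -9) the process is removed before this can run.
    static Thread registerHook() {
        Thread hook = new Thread(() ->
            System.out.println("cleanup: flushing buffers and closing files"));
        Runtime.getRuntime().addShutdownHook(hook);
        return hook;
    }

    public static void main(String[] args) throws InterruptedException {
        registerHook();
        System.out.println("running; send 'kill <pid>' or 'kill -9 <pid>' from another terminal");
        Thread.sleep(60_000); // leaves time to send a signal
    }
}
```

Run it twice: with a plain kill the cleanup line is printed, with kill -9 it is not.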



Re: Restarting a cluster with existing data - Status Red?

2014-02-04 Thread Boaz Leskes
Hi Tony,

It's good you're going to use the minimum_master_nodes setting. Once this
number of master-eligible nodes have started (more on this in a second),
one of them will be picked at random and will remain master until it
becomes unreachable (e.g. it is shut down).

If you want to control which nodes can become master, you can set
node.master to false in elasticsearch.yml on the nodes that should not be
elected. Only nodes that have this set to true can become master. True is
the default, which makes all nodes eligible. It is important to note that
the minimum master nodes setting relates to the number of nodes in the
cluster which have node.master set to true, not to all the nodes in the
cluster - so change it accordingly.

Cheers,
Boaz
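
The value of 3 for a five-node cluster follows the commonly recommended majority rule: (N / 2) + 1 with integer division, where N counts only the master-eligible nodes. A minimal sketch of that arithmetic, with hypothetical class and method names:

```java
public class MasterQuorum {
    // Majority of master-eligible nodes: (n / 2) + 1 using integer division.
    static int minimumMasterNodes(int masterEligibleNodes) {
        if (masterEligibleNodes < 1) {
            throw new IllegalArgumentException("need at least one master-eligible node");
        }
        return masterEligibleNodes / 2 + 1;
    }

    public static void main(String[] args) {
        // Five nodes, all with node.master: true (the default)
        System.out.println(minimumMasterNodes(5)); // prints 3
        // Same cluster after setting node.master: false on two nodes
        System.out.println(minimumMasterNodes(3)); // prints 2
    }
}
```

The second call shows why the setting has to be revisited whenever node.master is changed on some nodes.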




On Tue, Feb 4, 2014 at 4:00 PM, Tony Su  wrote:

> I've restarted the cluster a couple times since and not seen what I saw
> before.
>
> Been reading more of the documentation, am going to set the "min-max
> master" to 3 which is suggested for a 5 node cluster.
> Currently speculating, although I thought I've been very careful to start
> the master significantly before any other node, something may have happened
> the time that causes a persistently red status.
>
> Questions related to this general topic (restarting a cluster)
> Q - Once a cluster has started up with a different node as the master, is
> there persistence in continuing to assign that role to that node or is it
> completely arbitrary on every startup (ie what are the attributes an
> election is based on)?
>
> Q - If a cluster has started up with the wrong nodes with the Master role,
> is it possible or advisable to try to modify their roles while the cluster
> is running or is it advisable to shutdown the cluster, re-configure and
> start up again?
>
> Thx,
> Tony
>
>
>
> On Tuesday, February 4, 2014 2:27:30 AM UTC-8, Boaz Leskes wrote:
>
>> A couple of points:
>>
>> 1) If you bring down a whole cluster and start it back up, it may be that
>> during the start process the cluster is red. The reason is that until all
>> nodes have rejoined some data may not be (yet) available for searching. This
>> should be resolved as soon as all the nodes are back (potentially earlier
>> depending on your replication settings)
>> 2) Though *not* recommended - kill -9 should not result in data loss. If
>> so it's a bug and should be reported.
>>
>>
>>
>> On Monday, February 3, 2014 11:15:24 PM UTC+1, Tony Su wrote:
>>>
>>> Thx for the input.
>>> Nope, ES is being shutdown "normally" usually by simply stopping the
>>> configured ES service, and only after it fully completes executing a
>>> shutdown.
>>>
>>> Tony
>>>
>>>
>>>
>>> On Monday, February 3, 2014 2:09:08 PM UTC-8, InquiringMind wrote:
>>>
>>>> Tony,
>>>>
>>>> You're not doing a kill -9 during shutdown, I hope. If so, that would
>>>> result in a large window of opportunity for index corruption.
>>>>
>>>> Just something to check for...
>>>>
>>>> We always do a normal kill to the pid within the pid file to shut down
>>>> an ES instance before shutting down the machine itself, or before upgrading
>>>> the software. And we have never seen any issues with the cluster coming back
>>>> up in the same (usable, usually yellow or green) state that it was before
>>>> the shutdown.
>>>>
>>>> On two occasions we have had machines power off due to thermal overload
>>>> in the server room. This is a drastic event that is usually as dangerous
>>>> (to disk data integrity) as a kill -9, but in these cases there wasn't any
>>>> load on the machine and we experienced no data loss nor did we see the
>>>> cluster as anything but green once the machine came back up and the node
>>>> restarted.
>>>>
>>>> Brian
>>>>



Re: Restarting a cluster with existing data - Status Red?

2014-02-04 Thread Tony Su
I've restarted the cluster a couple times since and not seen what I saw 
before.
 
Been reading more of the documentation, am going to set the "min-max 
master" to 3 which is suggested for a 5 node cluster.
Currently speculating, although I thought I've been very careful to start 
the master significantly before any other node, something may have happened 
the time that causes a persistently red status.
 
Questions related to this general topic (restarting a cluster)
Q - Once a cluster has started up with a different node as the master, is 
there persistence in continuing to assign that role to that node or is it 
completely arbitrary on every startup (ie what are the attributes an 
election is based on)?
 
Q - If a cluster has started up with the wrong nodes with the Master role, 
is it possible or advisable to try to modify their roles while the cluster 
is running or is it advisable to shutdown the cluster, re-configure and 
start up again?
 
Thx,
Tony
 
 

On Tuesday, February 4, 2014 2:27:30 AM UTC-8, Boaz Leskes wrote:

> A couple of points:
>
> 1) If you bring down a whole cluster and start it back up, it may be that 
> during the start process the cluster is red. The reason is that until all 
> nodes have rejoined some data may not be (yet) available for searching. This 
> should be resolved as soon as all the nodes are back (potentially earlier 
> depending on your replication settings)
> 2) Though *not* recommended - kill -9 should not result in data loss. If 
> so it's a bug and should be reported.
>
>
>
> On Monday, February 3, 2014 11:15:24 PM UTC+1, Tony Su wrote:
>>
>> Thx for the input.
>> Nope, ES is being shutdown "normally" usually by simply stopping the 
>> configured ES service, and only after it fully completes executing a 
>> shutdown.
>>  
>> Tony
>>  
>>  
>>
>> On Monday, February 3, 2014 2:09:08 PM UTC-8, InquiringMind wrote:
>>
>>> Tony,
>>>
>>> You're not doing a kill -9 during shutdown, I hope. If so, that would 
>>> result in a large window of opportunity for index corruption.
>>>
>>> Just something to check for...
>>>
>>> We always do a normal kill to the pid within the pid file to shut down 
>>> an ES instance before shutting down the machine itself, or before upgrading 
>>> the software. And we have never seen any issues with the cluster coming back 
>>> up in the same (usable, usually yellow or green) state that it was before 
>>> the shutdown.
>>>
>>> On two occasions we have had machines power off due to thermal overload 
>>> in the server room. This is a drastic event that is usually as dangerous 
>>> (to disk data integrity) as a kill -9, but in these cases there wasn't any 
>>> load on the machine and we experienced no data loss nor did we see the 
>>> cluster as anything but green once the machine came back up and the node 
>>> restarted.
>>>
>>> Brian
>>>
>>



Elasticsearch index mapping in java

2014-02-04 Thread Doru Sular
Hi guys,

I am trying to create an index with the following code:
XContentBuilder source = XContentFactory.jsonBuilder().startObject()//
.startObject("settings")
.field("number_of_shards", 1)
.endObject()// end settings
.startObject("mappings")
.startObject(INDEX_TYPE)//
.startObject("properties")//
.startObject("user")
.field("type", "string") // start user
.field("store", "yes")
.field("index", "analyzed")//
.endObject()// end user
.startObject("postDate")//
.field("type", "date")
.field("store", "yes")
.field("index", "analyzed")//
.endObject()// end post date
.startObject("message") //
.field("type", "string")
.field("store", "yes")
.field("index", "not_analyzed")
.endObject() // end user field
.endObject() // end properties
.endObject() // end index type
.endObject() // end mappings
.endObject(); // end the container object

IndexResponse response = this.client.prepareIndex(INDEX, INDEX_TYPE
).setSource(source)
.setType(INDEX_TYPE).execute()
.actionGet();


I want to have the "message" field not analyzed, because later I want to 
use facets to obtain unique messages.
Unfortunately my code seems to add just a document in index with the 
following structure:
{
  "settings": {
    "number_of_shards": 1
  },
  "mappings": {
    "tweet": {
      "properties": {
        "user": {
          "type": "string",
          "store": "yes",
          "index": "analyzed"
        },
        "postDate": {
          "type": "date",
          "store": "yes",
          "index": "analyzed"
        },
        "message": {
          "type": "string",
          "store": "yes",
          "index": "not_analyzed"
        }
      }
    }
  }
}

Please help me spot the error; it seems that the mappings are not created.
Thank you very much,
Doru



Re: Inconsistent responses from aggregations (ES1.0.0RC1)

2014-02-04 Thread Nils Dijk
To follow up,

I have a self-contained test suite at https://gist.github.com/thanodnl/8803745 
for this problem. It contains two files:

   1. aggsbug.sh
   2. aggsbug.json

The .json file contains ~1M newline-separated documents to load into the 
database; I was not able to create a curl request to load them directly 
into the index.
The .sh file (https://gist.github.com/thanodnl/8803745/raw/aggsbug.sh) 
contains the instructions for recreating this behavior.

I have run these against the following versions:

   1. 1.0.0.Beta2
   2. 1.0.0.RC1
   3. 1.0.0-SNAPSHOT as compiled from the git 1.0 branch on commit 
   0f8b41ffad9b5ecdfd543d7c73edcf404e6fc763

When run on 1.0.0.Beta2 it gives the same output consistently when I run 
the _search over and over again.
When run on 1.0.0.RC1 it gives me multiple different outcomes, 
comparable to the numbers I posted earlier in the thread.
When run on 1.0.0-SNAPSHOT it behaves the same as on 1.0.0.RC1.

That it still was working on 1.0.0.Beta2 proves to me that it is a bug that 
got into RC1. I could not find any related ticket on the issues page of the 
github repository. Hopefully this is enough information to recreate the 
problem.

The json file is quite big and may hang your browser if you open the gist 
there. Cloning the gist locally will work best:
$ git clone https://gist.github.com/8803745.git

I do not really know how to move on from here. Do you want me to open an 
issue for this problem at github.com/elasticsearch/elasticsearch? It would 
be nice to fix this problem before a release of 1.0.0 since that is the 
first release containing the aggregations for analytics.

On Tuesday, February 4, 2014 12:31:10 PM UTC+1, Nils Dijk wrote:

> I've loaded the same dataset in ES1.0.0.Beta2 with the same index 
> configuration as in the topic start.
>
> However now the numbers are consistent if I call the same aggregation 
> multiple times in a row AND the number match the numbers of the facets. 
> This leads me to the conclusion something is broken from Beta2 to RC1!
>
> I would like to test this on master, but I could not find any nightly 
> builds of elasticsearch. Is there a location where they are stored or 
> should I compile it myself?
>
> On Friday, January 31, 2014 6:43:07 PM UTC+1, Nils Dijk wrote:
>>
>> Hi Binh Ly,
>>
>> Thanks for the response.
>>
>> I'm aware that the numbers are not exact (hence the link to issue #1305 
>> in my initial post), and have been advocating slightly incorrect numbers 
>> with my colleges and customers for some time already to prepare them for 
>> the moment we provide analytics with ES. But what bothers me is that they 
>> are *inconsistent*.
>>
>> If you look at my gist you see that I ran the same aggs 3 times right 
>> after each other. If we just look at the top item we see the following 
>> results:
>>
>>1. { "key": "totaltrafficbos", "doc_count": 2880 }
>>2. { "key": "totaltrafficbos", "doc_count": 2552 }
>>3. { "key": "totaltrafficbos", "doc_count": 2179 }
>>
>> These results are taken within seconds without any change to the number of 
>> documents in the index. If I run them even more you see that it rotates 
>> between a handful of numbers. Is this also behavior one would expect from 
>> the aggs? And if so, why do the facets show the same number over and over 
>> again?
>>
>> Anyway, I will try to work myself through the aggs code this weekend to get 
>> a better hang of what we could do with it, and what not.
>>
>> -- Nils
>>
>> On Friday, January 31, 2014 6:18:43 PM UTC+1, Binh Ly wrote:
>>>
>>> Nils,
>>>
>>> This is just the nature of splitting data around in shards. Actually the 
>>> terms facet has the same limitations (i.e. it will also give "approximate 
>>> counts"). Neither the terms facet nor the terms aggregation is better or 
>>> worse than the other - they are both approximations (using different 
>>> implementations). It is correct that if you put all your data in 1 shard, 
>>> then all the counts are exact. If you need to shard, you can increase the 
>>> "shard_size" parameter inside the terms aggregation to "improve accuracy". 
>>> Play with that number until it suits your purposes but the important thing 
>>> is they are just approximations the more documents you have in the index - 
>>> so just don't expect absolute numbers from them if you have more than 1 
>>> shard.
>>>
>>> {
>>>   "size": 0,
>>>   "aggs": {
>>>     "a": {
>>>       "terms": {
>>>         "field": "actor.displayName",
>>>         "shard_size": 1
>>>       }
>>>     }
>>>   }
>>> }
>>>
>>
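
Binh's point about approximate counts can be illustrated with a toy model: each shard reports only its own top shard_size terms, and the coordinating node sums whatever it receives. The following is a minimal sketch under that simplified assumption, not Elasticsearch's implementation; the class, methods and sample data are hypothetical.

```java
import java.util.*;

public class TermsMergeSketch {
    // Each shard reports only its own top `shardSize` terms (ties broken by term).
    static Map<String, Integer> topOfShard(Map<String, Integer> shard, int shardSize) {
        List<Map.Entry<String, Integer>> entries = new ArrayList<>(shard.entrySet());
        entries.sort((a, b) -> {
            int byCount = Integer.compare(b.getValue(), a.getValue());
            return byCount != 0 ? byCount : a.getKey().compareTo(b.getKey());
        });
        Map<String, Integer> top = new LinkedHashMap<>();
        for (Map.Entry<String, Integer> e : entries.subList(0, Math.min(shardSize, entries.size()))) {
            top.put(e.getKey(), e.getValue());
        }
        return top;
    }

    // The coordinating node sums whatever the shards reported.
    static Map<String, Integer> merge(List<Map<String, Integer>> shards, int shardSize) {
        Map<String, Integer> merged = new TreeMap<>();
        for (Map<String, Integer> shard : shards) {
            for (Map.Entry<String, Integer> e : topOfShard(shard, shardSize).entrySet()) {
                merged.merge(e.getKey(), e.getValue(), Integer::sum);
            }
        }
        return merged;
    }

    // Hypothetical data: both terms really occur 17 times across the two shards.
    static List<Map<String, Integer>> sampleShards() {
        Map<String, Integer> s1 = new HashMap<>();
        s1.put("alice", 10); s1.put("bob", 9);
        Map<String, Integer> s2 = new HashMap<>();
        s2.put("bob", 8); s2.put("alice", 7);
        return Arrays.asList(s1, s2);
    }

    public static void main(String[] args) {
        System.out.println(merge(sampleShards(), 1)); // {alice=10, bob=8}
        System.out.println(merge(sampleShards(), 2)); // {alice=17, bob=17}
    }
}
```

A small shard_size undercounts, a larger one recovers the exact totals. Note that for a fixed index this model is deterministic, so it explains inexact counts but not counts that change from one run to the next.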


Re: cassandra river plugin installation issue

2014-02-04 Thread Ansar Rafique
Shamsul Haque, I am also getting the same error on Elastic Search console 
even though the type of id is string not int in my case. Any clue ?

On Wednesday, January 8, 2014 11:06:36 AM UTC+1, shamsul haque wrote:
>
> I have downloaded river from: https://github.com/eBay/cassandra-river
>
> change the settings in file: CassandraRiver.java as per my Cassandra 
> setting:
>
> if (riverSettings.settings().containsKey("cassandra")) {
> @SuppressWarnings("unchecked")
> Map couchSettings = (Map) 
> settings.settings().get("cassandra");
> this.clusterName = 
> XContentMapValues.nodeStringValue(couchSettings.get("cluster_name"), "Test 
> Cluster");
> this.keyspace = 
> XContentMapValues.nodeStringValue(couchSettings.get("keyspace"), 
> "topic_space");
> this.columnFamily = 
> XContentMapValues.nodeStringValue(couchSettings.get("column_family"), 
> "users");
> this.batchSize = 
> XContentMapValues.nodeIntegerValue(couchSettings.get("batch_size"), 1000);
> this.hosts = 
> XContentMapValues.nodeStringValue(couchSettings.get("hosts"), 
> "localhost:9160");
> this.username = 
> XContentMapValues.nodeStringValue(couchSettings.get("username"), 
> "USERNAME");
> this.password = 
> XContentMapValues.nodeStringValue(couchSettings.get("password"), "P$$WD");
> } else {
> /*
>  * Set default values
>  */
> this.clusterName = "Test Cluster";
> this.keyspace = "topic_space";
> this.columnFamily = "users";
> this.batchSize = 1000;
> this.hosts = "localhost:9160";
> this.username = "USERNAME";
> this.password = "P$$WD";
> }
>
> When I build with Maven using the given command, mvn clean package, the 
> test log shows:
>
> ---
>  T E S T S
> ---
> Running org.elasticsearch.river.cassandra.CassandraRiverIntegrationTest
> Configuring TestNG with: 
> org.apache.maven.surefire.testng.conf.TestNG652Configurator@67eaf25d
> Exception in thread "Queue-Indexer-thread-0" java.lang.NullPointerException
> at 
> org.elasticsearch.river.cassandra.CassandraRiver$Indexer.run(CassandraRiver.java:149)
> at 
> java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
> at 
> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
> at java.lang.Thread.run(Thread.java:744)
> Exception in thread "Queue-Indexer-thread-2" java.lang.NullPointerException
> at 
> org.elasticsearch.river.cassandra.CassandraRiver$Indexer.run(CassandraRiver.java:149)
> at 
> java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
> at 
> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
> at java.lang.Thread.run(Thread.java:744)
> Exception in thread "Queue-Indexer-thread-5" java.lang.NullPointerException
> at 
> org.elasticsearch.river.cassandra.CassandraRiver$Indexer.run(CassandraRiver.java:149)
> at 
> java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
> at 
> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
> at java.lang.Thread.run(Thread.java:744)
> Exception in thread "Queue-Indexer-thread-4" java.lang.NullPointerException
>
> I tried the same after installing the plugin in ES; it shows the same error 
> continuously.
> Does anybody have any idea what's going wrong with my setup?
>
>
>



Re: _mapping API throws IndexMissingException even with examples from ES site

2014-02-04 Thread pitty . the . fools
Hi Karel,

Thanks for your response. OK, it just seems kind of counterintuitive to me: 
after all, I have supplied everything it would need to create the index and 
type and add the mapping (you know, being elastic and all :-)), and it can 
easily be achieved via the Java API (I'm sure it used to be supported). The 
documentation should probably mention that the put mapping API requires the 
index to exist first, to avoid confusion.

Cheers,

Jon

On Friday, January 31, 2014 6:11:10 AM UTC, Karel Minařík wrote:
>
> With the first command, you *update* the mapping for an existing index -- 
> when that index doesn't exist, you'll get, predictably, an 
> error: {"error":"IndexMissingException[[twitter] missing]","status":404}
>
> With the second, command, you're creating an index called "twitter" (hence 
> PUT with name, as per RESTful conventions), while also supplying additional 
> configuration (mapping) for this index.
>
> Karel
>
> On Thursday, January 30, 2014 11:13:33 AM UTC+1, pitty.t...@gmail.com wrote:
>>
>> Hi guys and gals,
>>
>> I have come across a curious problem, possibly a bug while working with 
>> mappings this morning. It's been a few months since I have had to create 
>> any mapping but created series of PHP scripts to do so a few months ago. 
>> Everything worked great until my latest upgrade to 0.90.10, now when I run 
>> my scripts I get error 404 IndexMissingException. So I checked using the 
>> mappings suggested in the ES documentation specifically:
>>
>> $ curl -XPUT 'http://localhost:9200/twitter/tweet/_mapping' -d '
>> {
>> "tweet" : {
>> "properties" : {
>> "message" : {"type" : "string", "store" : "yes"}
>> }
>> }
>> }
>> '
>>
>> This throws the same error, after further investigation it seems this can be 
>> used after an initial command is sent to create the index first, like:
>>
>> $ curl -XPUT 'http://localhost:9200/twitter/'
>>
>> Additionally, this can be bypassed by using the following syntax:
>>
>> $ curl -XPUT 'http://localhost:9200/twitter/' -d '
>> {
>> "mapping" : {
>> "tweet" : {
>> "properties" : {
>> "message" : {"type" : "string", "store" : "yes"}
>> }
>> }
>> }
>> }
>> '
>>
>> I'm not sure if this is purposeful but since it is still documented on 
>> the ES site I think it should be addressed or clarified.
>>
>> Additionally, I can confirm that the mapping works correctly via the Java 
>> API using:
>>
>>
>> client.admin().indices().preparePutMapping(index).setType(type).setSource(mapping).execute().actionGet();
>>
>> So this would suggest that the REST implementation is bugged. Any 
>> feedback or clarification would be greatly appreciated.
>>
>> Cheers,
>>
>> Jon
>>
>
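
Karel's distinction between the two PUT requests can be sketched over plain HTTP: first create the index with a PUT to /twitter/ (optionally carrying its mappings in the same body), and only afterwards use PUT /twitter/tweet/_mapping to update it. The following is a minimal sketch using only java.net. The class and helper names are hypothetical, it assumes a node on localhost:9200 as in the curl examples, and it uses the `mappings` key of the create index API.

```java
import java.io.OutputStream;
import java.net.HttpURLConnection;
import java.net.URL;
import java.nio.charset.StandardCharsets;

public class CreateIndexSketch {
    // Body for PUT /twitter/: creates the index and its "tweet" mapping in one request.
    static String createIndexBody() {
        return "{\"mappings\":{\"tweet\":{\"properties\":{"
             + "\"message\":{\"type\":\"string\",\"store\":\"yes\"}}}}}";
    }

    // Sends a PUT with a JSON body and returns the HTTP status code.
    static int put(String url, String body) throws Exception {
        HttpURLConnection conn = (HttpURLConnection) new URL(url).openConnection();
        conn.setRequestMethod("PUT");
        conn.setDoOutput(true);
        conn.setRequestProperty("Content-Type", "application/json");
        try (OutputStream out = conn.getOutputStream()) {
            out.write(body.getBytes(StandardCharsets.UTF_8));
        }
        int status = conn.getResponseCode();
        conn.disconnect();
        return status;
    }

    public static void main(String[] args) throws Exception {
        // Requires a running node; the URL mirrors the curl examples in the thread.
        System.out.println(put("http://localhost:9200/twitter/", createIndexBody()));
    }
}
```

Sending the mapping body to /twitter/tweet/_mapping before the index exists is the case that returns the 404 IndexMissingException described above.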



Re: Filter on deeply nested data?

2014-02-04 Thread Hendrik
Maybe this helps: https://github.com/salyh/elasticsearch-security-plugin

On Thursday, January 23, 2014 5:37:20 PM UTC+1, David Haimson wrote:
>
> Our data is stored in MongoDB 2.4.8, and indexed to ElasticSearch 0.90.7 
> using the ElasticSearch MongoDB River 1.7.3.
>
> Our data indexes correctly, and I can successfully search the fields we 
> want to search. But I also need to filter on permission - of course we only 
> want to return results the calling user can actually read.
>
> In the code on our server, I have the calling user's authorizations as an 
> array, for example:
>
> [ "Role:REGISTERED_USER", "Account:52c74b25da06f102c90d52f4", "Role:USER", 
> "Group:52cb057cda06ca463e78f0d7" ]
>
>
> An example of the unit data we're searching follows:
>
> {
> "_id" : ObjectId("52dffbd6da06422559386f7d"),
> "content" : "various stuff",
> "ownerId" : ObjectId("52d96bfada0695fcbdb41daf"),
> "acls" : [ 
> {
> "accessMap" : {},
> "sourceClass" : 
> "com.bulb.learn.domain.units.PublishedPageUnit",
> "sourceId" : ObjectId("52dffbd6da06422559386f7d")
> }, 
> {
> "accessMap" : {
> "Role:USER" : {
> "allow" : [ 
> "READ"
> ]
> },
> "Account:52d96bfada0695fcbdb41daf" : {
> "allow" : [ 
> "CREATE", 
> "READ", 
> "UPDATE", 
> "DELETE", 
> "GRANT"
> ]
> }
> },
> "sourceClass" : "com.bulb.learn.domain.units.CompositeUnit",
> "sourceId" : ObjectId("52dffb54da06422559386f57")
> }
> ]
> }
>
>
> In the sample data above, I have replaced all the searchable content with
>  "content" : "various stuff"
>
> The authorization data is in the "acls" array. The filter I need to write 
> would do the following (in English):
>
> pass all units where the "acls" array
> contains an "accessMap" object
> that contains a property whose name is one of the user's authorization 
> strings
> and whose "allow" property contains "READ"
> and whose "deny" property does not contain "READ"
>
> In the example above, the user has "Role:USER" authorization, and this 
> unit has an accessMap that has "Role:USER", which contains "allow", which 
> contains "READ", and "Role:USER" contains no "deny". So this unit would 
> pass the filter.
>
> I am not seeing how to write a filter for this using ElasticSearch.
>
> I get the impression that there are two ways to deal with nested arrays 
> like this: "nested", or "has_child" (or "has_parent").
>
> We are reluctant to use the "nested" filter because it apparently requires 
> that the whole block be re-indexed when any of the data changes. Searchable 
> content and authorization data can change at any time, in response to user 
> actions.
>
> It looks to me as though in order to use "has_child" or "has_parent", the 
> authorization data would have to be separate from the unit data (in a 
> different collection?), and when a node is indexed, it would have to have 
> its parent or child specified. I don't know whether the ElasticSearch 
> MongoDB River is capable of doing this.
>
> So is this even possible? Or should we rearrange the authorization data?
>
>

-- 
You received this message because you are subscribed to the Google Groups 
"elasticsearch" group.
To unsubscribe from this group and stop receiving emails from it, send an email 
to elasticsearch+unsubscr...@googlegroups.com.
To view this discussion on the web visit 
https://groups.google.com/d/msgid/elasticsearch/7624eeab-c0ba-4554-9c8e-a454add6d0d1%40googlegroups.com.
For more options, visit https://groups.google.com/groups/opt_out.


Re: hadoop to ES problem

2014-02-04 Thread Phil gib
Thanks for your reply.
My software context is:
 ES 0.90.3, elasticsearch-hadoop-1.3.0.M1, Eclipse, Java, Windows 7

(Perhaps it is a configuration problem in Eclipse; I will investigate.)
Best regards,
Philippe

On Tuesday, February 4, 2014 12:22:54 PM UTC+1, Costin Leau wrote:
>
> The port and host are 9200 and localhost by default. What's your ES 
> version? 
> ES-Hadoop requires 0.90 or higher (preferably the latest 0.90). 
>
> On 04/02/2014 12:59 PM, Phil gib wrote: 
> > Hello Costin, 
> > changing to "es.resource" does not help... :-( 
> >   argh.. i see the socket connection closed in the ES logs.. so ES is 
> contacted ... the 9200 is correct ? 
> > 
> > philippe 
> > best regards 
> > 
> >  configuration.set("es.resource", "eshindex2/eshlog2/_search?q=*"); 
> >  configuration.set("es.host", "localhost"); 
> >  configuration.set("es.port", "9200"); 
> >  System.out.println("conf:"+ configuration); 
> >  JobConf job = new JobConf(configuration, 
> SimpleJobES2Hadoop.class); 
> > ... 
> > 
> > 
> > On Tuesday, February 4, 2014 11:07:37 AM UTC+1, Costin Leau wrote: 
> > 
> > If you are using M1 you should use es.resource instead of es.query 
> in your latest example. 
> > es.query is supported in the upcoming 1.3.0.M2 (not in m1). 
> > 
> > On 04/02/2014 12:01 PM, Phil gib wrote: 
> > > hello my context: 
> > >   ES 0.90.3,   elasticsearch-hadoop-1.3.0.M1 , eclipse- java 
> > > i am experimenting ES->Hadoop  and Hadoop->ES 
> > > no problem with Hadoop2ES  with these 3 settings  ( i see the 
> index + data  through head plugin, perfect) 
> > > job.set("es.resource", "eshindex2/eshlog2"); 
> > > job.set("es.host", "localhost"); 
> > > job.set("es.port", "9200"); 
> > > but experimenting the inverse , ie ES 2 Hadoop 
> > >  
> > >  configuration.set("es.query", 
> "eshindex2/eshlog2/_search?q=*"); 
> > >  configuration.set("es.host", "localhost"); 
> > >  configuration.set("es.port", "9200"); 
> > >  System.out.println("conf:"+ configuration); 
> > >  JobConf job = new JobConf(configuration, 
> SimpleJobES2Hadoop.class); 
> > > ... 
> > > i got the error 
> > > 10:58:27 INFO mapred.JobClient: Cleaning up the staging area 
> > > 
> file:/tmp/hadoop-xxx/mapred/staging/xxx-1302405475/.staging/job_local_0001 
> > > Exception in thread "main" java.lang.NullPointerException 
> > >  at 
> org.elasticsearch.hadoop.rest.dto.Node.(Node.java:30) 
> > >  at 
> org.elasticsearch.hadoop.rest.RestClient.getNodes(RestClient.java:139) 
> > > 
> > > 
> > > Any idea? 
> > > bestsregards 
> > > philippe 
> > > 
> > 
> > -- 
> > Costin 
> > 
>
> -- 
> Costin 
>



Re: What is the difference between query_string and multi-match for querying docs ?

2014-02-04 Thread Mukul Gupta
Hi Ivan,

I followed your advice and started using the explain API for query_string,
but in the process I think I found a bug (I don't know whether it really is
a bug or the intended behaviour of query_string). This is going to be a long
post, so please bear with me.

I'm using this doc: {name:"new delhi to goa",st:"goa"}
Running the analyze API with the index-time analyzer, I got these tokens:

{
  "tokens" : [ {
"token" : "new",
"start_offset" : 0,
"end_offset" : 3,
"type" : "word",
"position" : 1
  }, {
"token" : "new",
"start_offset" : 0,
"end_offset" : 9,
"type" : "word",
"position" : 1
  }, {
"token" : "new ",
"start_offset" : 0,
"end_offset" : 9,
"type" : "word",
"position" : 1
  }, {
"token" : "new d",
"start_offset" : 0,
"end_offset" : 9,
"type" : "word",
"position" : 1
  }, {
"token" : "new de",
"start_offset" : 0,
"end_offset" : 9,
"type" : "word",
"position" : 1
  }, {
"token" : "new del",
"start_offset" : 0,
"end_offset" : 9,
"type" : "word",
"position" : 1
  }, {
"token" : "new delh",
"start_offset" : 0,
"end_offset" : 9,
"type" : "word",
"position" : 1
  }, {
"token" : "new delhi",
"start_offset" : 0,
"end_offset" : 9,
"type" : "word",
"position" : 1
  }, {
"token" : "new",
"start_offset" : 0,
"end_offset" : 12,
"type" : "word",
"position" : 1
  }, {
"token" : "new ",
"start_offset" : 0,
"end_offset" : 12,
"type" : "word",
"position" : 1
  }, {
"token" : "new d",
"start_offset" : 0,
"end_offset" : 12,
"type" : "word",
"position" : 1
  }, {
"token" : "new de",
"start_offset" : 0,
"end_offset" : 12,
"type" : "word",
"position" : 1
  }, {
"token" : "new del",
"start_offset" : 0,
"end_offset" : 12,
"type" : "word",
"position" : 1
  }, {
"token" : "new delh",
"start_offset" : 0,
"end_offset" : 12,
"type" : "word",
"position" : 1
  }, {
"token" : "new delhi",
"start_offset" : 0,
"end_offset" : 12,
"type" : "word",
"position" : 1
  }, {
"token" : "new delhi ",
"start_offset" : 0,
"end_offset" : 12,
"type" : "word",
"position" : 1
  }, {
"token" : "new delhi t",
"start_offset" : 0,
"end_offset" : 12,
"type" : "word",
"position" : 1
  }, {
"token" : "new delhi to",
"start_offset" : 0,
"end_offset" : 12,
"type" : "word",
"position" : 1
  }, {
"token" : "new",
"start_offset" : 0,
"end_offset" : 16,
"type" : "word",
"position" : 1
  }, {
"token" : "new ",
"start_offset" : 0,
"end_offset" : 16,
"type" : "word",
"position" : 1
  }, {
"token" : "new d",
"start_offset" : 0,
"end_offset" : 16,
"type" : "word",
"position" : 1
  }, {
"token" : "new de",
"start_offset" : 0,
"end_offset" : 16,
"type" : "word",
"position" : 1
  }, {
"token" : "new del",
"start_offset" : 0,
"end_offset" : 16,
"type" : "word",
"position" : 1
  }, {
"token" : "new delh",
"start_offset" : 0,
"end_offset" : 16,
"type" : "word",
"position" : 1
  }, {
"token" : "new delhi",
"start_offset" : 0,
"end_offset" : 16,
"type" : "word",
"position" : 1
  }, {
"token" : "new delhi ",
"start_offset" : 0,
"end_offset" : 16,
"type" : "word",
"position" : 1
  }, {
"token" : "new delhi t",
"start_offset" : 0,
"end_offset" : 16,
"type" : "word",
"position" : 1
  }, {
"token" : "new delhi to",
"start_offset" : 0,
"end_offset" : 16,
"type" : "word",
"position" : 1
  }, {
"token" : "new delhi to ",
"start_offset" : 0,
"end_offset" : 16,
"type" : "word",
"position" : 1
  }, {
"token" : "new delhi to g",
"start_offset" : 0,
"end_offset" : 16,
"type" : "word",
"position" : 1
  }, {
"token" : "new delhi to go",
"start_offset" : 0,
"end_offset" : 16,
"type" : "word",
"position" : 1
  }, {
"token" : "new delhi to goa",
"start_offset" : 0,
"end_offset" : 16,
"type" : "word",
"position" : 1
  }, {
"token" : "del",
"start_offset" : 4,
"end_offset" : 9,
"type" : "word",
"position" : 2
  }, {
"token" : "delh",
"start_offset" : 4,
"end_offset" : 9,
"type" : "word",
"position" : 2
  }, {
"token" : "delhi",
"start_offset" : 4,
"end_offset" : 9,
"type" : "word",
"position" : 2
  }, {
"token" : "del",
"start_offset" : 4,
"end_offset" : 12,
"type" : "word",
"position" : 2
  }, {
"token" : "delh",
"start_offset" : 4,
"end_offset" : 12,
"type" : "word",
"position" : 2
  }, {
"token" : "delhi",
"start_offset" : 4,
"end_offset" : 12,
"type" : "word",
"position" : 2
  }, {
"token" : "delhi ",
"start_offset" : 4,
"end_offset" : 12,
"type" : "word",
"po

Re: Inconsistent responses from aggregations (ES1.0.0RC1)

2014-02-04 Thread Nils Dijk
I've loaded the same dataset in ES1.0.0.Beta2 with the same index 
configuration as in the topic start.

However, the numbers are now consistent when I call the same aggregation 
multiple times in a row, AND the numbers match those of the facets. 
This leads me to conclude that something broke between Beta2 and RC1!

I would like to test this on master, but I could not find any nightly 
builds of elasticsearch. Is there a location where they are stored or 
should I compile it myself?

On Friday, January 31, 2014 6:43:07 PM UTC+1, Nils Dijk wrote:
>
> Hi Binh Ly,
>
> Thanks for the response.
>
> I'm aware that the numbers are not exact (hence the link to issue #1305 in 
> my initial post), and I have been preparing my colleagues and customers for 
> slightly inexact numbers for some time already, ahead of the moment we 
> provide analytics with ES. But what bothers me is that they are 
> *inconsistent*.
>
> If you look at my gist you see that I ran the same aggs 3 times right 
> after each other. If we just look at the top item we see the following 
> results:
>
>1. { "key": "totaltrafficbos", "doc_count": 2880 }
>2. { "key": "totaltrafficbos", "doc_count": 2552 }
>3. { "key": "totaltrafficbos", "doc_count": 2179 }
>
> These results were taken within seconds of each other, without any change 
> to the number of documents in the index. If I run the aggs even more often, 
> the count rotates between a handful of numbers. Is this also behavior one 
> would expect from the aggs? And if so, why do the facets show the same 
> number over and over again?
>
> Anyway, I will try to work my way through the aggs code this weekend to get 
> a better grasp of what we can and cannot do with it.
>
> -- Nils
>
> On Friday, January 31, 2014 6:18:43 PM UTC+1, Binh Ly wrote:
>>
>> Nils,
>>
>> This is just the nature of splitting data around in shards. Actually the 
>> terms facet has the same limitations (i.e. it will also give "approximate 
>> counts"). Neither the terms facet nor the terms aggregation is better or 
>> worse than the other - they are both approximations (using different 
>> implementations). It is correct that if you put all your data in 1 shard, 
>> then all the counts are exact. If you need to shard, you can increase the 
>> "shard_size" parameter inside the terms aggregation to "improve accuracy". 
>> Play with that number until it suits your purposes but the important thing 
>> is they are just approximations the more documents you have in the index - 
>> so just don't expect absolute numbers from them if you have more than 1 
>> shard.
>>
>> {
>>   "size": 0,
>>   "aggs": {
>>     "a": {
>>       "terms": {
>>         "field": "actor.displayName",
>>         "shard_size": 1
>>       }
>>     }
>>   }
>> }
>>
>



Re: hadoop to ES problem

2014-02-04 Thread Costin Leau

The port and host are 9200 and localhost by default. What's your ES version?
ES-Hadoop requires 0.90 or higher (preferably the latest 0.90).

On 04/02/2014 12:59 PM, Phil gib wrote:

Hello Costin,
changing to "es.resource" does not help... :-(
  argh.. i see the socket connection closed in the ES logs.. so ES is contacted 
... the 9200 is correct ?

philippe
best regards

 configuration.set("es.resource", "eshindex2/eshlog2/_search?q=*");
 configuration.set("es.host", "localhost");
 configuration.set("es.port", "9200");
 System.out.println("conf:"+ configuration);
 JobConf job = new JobConf(configuration, SimpleJobES2Hadoop.class);
...


On Tuesday, February 4, 2014 11:07:37 AM UTC+1, Costin Leau wrote:

If you are using M1 you should use es.resource instead of es.query in your 
latest example.
es.query is supported in the upcoming 1.3.0.M2 (not in m1).

On 04/02/2014 12:01 PM, Phil gib wrote:
> hello my context:
>   ES 0.90.3,   elasticsearch-hadoop-1.3.0.M1 , eclipse- java
> i am experimenting ES->Hadoop  and Hadoop->ES
> no problem with Hadoop2ES  with these 3 settings  ( i see the index + 
data  through head plugin, perfect)
> job.set("es.resource", "eshindex2/eshlog2");
> job.set("es.host", "localhost");
> job.set("es.port", "9200");
> but experimenting the inverse , ie ES 2 Hadoop
> 
>  configuration.set("es.query", "eshindex2/eshlog2/_search?q=*");
>  configuration.set("es.host", "localhost");
>  configuration.set("es.port", "9200");
>  System.out.println("conf:"+ configuration);
>  JobConf job = new JobConf(configuration, 
SimpleJobES2Hadoop.class);
> ...
> i got the error
> 10:58:27 INFO mapred.JobClient: Cleaning up the staging area
> file:/tmp/hadoop-xxx/mapred/staging/xxx-1302405475/.staging/job_local_0001
> Exception in thread "main" java.lang.NullPointerException
>  at org.elasticsearch.hadoop.rest.dto.Node.(Node.java:30)
>  at 
org.elasticsearch.hadoop.rest.RestClient.getNodes(RestClient.java:139)
>
>
> Any idea?
> bestsregards
> philippe
>

--
Costin



--
Costin



Query object at specific occurrence/index of nested type

2014-02-04 Thread Michael Lawler
Hi,

When working with a nested type that is a list of objects, is it possible 
to constrain the scope of the query to the object at a specific index?

i.e. if 'foo' is a nested type, can I search only for foo[0] rather than 
foo[1] within the parent document.

i.e. I want the path of my nested query to be 'foo[0]' not 'foo'

Michael



Re: hadoop to ES problem

2014-02-04 Thread Phil gib
Hello Costin,
Changing to "es.resource" does not help... :-(
I do see the socket connection being closed in the ES logs, so ES is 
being contacted. Is port 9200 correct?

philippe 
best regards

configuration.set("es.resource", "eshindex2/eshlog2/_search?q=*");
configuration.set("es.host", "localhost");
configuration.set("es.port", "9200");
System.out.println("conf:"+ configuration);
JobConf job = new JobConf(configuration, SimpleJobES2Hadoop.class);
...


On Tuesday, February 4, 2014 11:07:37 AM UTC+1, Costin Leau wrote:
>
> If you are using M1 you should use es.resource instead of es.query in your 
> latest example. 
> es.query is supported in the upcoming 1.3.0.M2 (not in m1). 
>
> On 04/02/2014 12:01 PM, Phil gib wrote: 
> > hello my context: 
> >   ES 0.90.3,   elasticsearch-hadoop-1.3.0.M1 , eclipse- java 
> > i am experimenting ES->Hadoop  and Hadoop->ES 
> > no problem with Hadoop2ES  with these 3 settings  ( i see the index + 
> data  through head plugin, perfect) 
> > job.set("es.resource", "eshindex2/eshlog2"); 
> > job.set("es.host", "localhost"); 
> > job.set("es.port", "9200"); 
> > but experimenting the inverse , ie ES 2 Hadoop 
> >  
> >  configuration.set("es.query", "eshindex2/eshlog2/_search?q=*"); 
> >  configuration.set("es.host", "localhost"); 
> >  configuration.set("es.port", "9200"); 
> >  System.out.println("conf:"+ configuration); 
> >  JobConf job = new JobConf(configuration, 
> SimpleJobES2Hadoop.class); 
> > ... 
> > i got the error 
> > 10:58:27 INFO mapred.JobClient: Cleaning up the staging area 
> > 
> file:/tmp/hadoop-xxx/mapred/staging/xxx-1302405475/.staging/job_local_0001 
> > Exception in thread "main" java.lang.NullPointerException 
> >  at org.elasticsearch.hadoop.rest.dto.Node.(Node.java:30) 
> >  at 
> org.elasticsearch.hadoop.rest.RestClient.getNodes(RestClient.java:139) 
> > 
> > 
> > Any idea? 
> > bestsregards 
> > philippe 
> > 
>
> -- 
> Costin 
>



Re: how to run default script when user request the api

2014-02-04 Thread Hendrik
Maybe this helps: https://github.com/salyh/elasticsearch-security-plugin

On Sunday, January 26, 2014 05:39:59 UTC+1, David shi wrote:
>
> Hi guys:
>
> I have data  like: {"emolument":2, "partment":"Financial", "ACL": 
> {"jack":"rw","david":"r"} }
>
> Now, when a user calls the API with "DELETE" or "PUT" (with a username such 
> as jack or david), I want to check that username's permission in the ACL.
>
> What would be the most convenient way to perform this check by default? 
> For example: if ACL.username is in [rw, w], continue; otherwise stop the 
> request and raise an exception.
>
>



Refresh call is getting stuck if merge scheduler runs in between

2014-02-04 Thread vineeth mohan
Hi ,

I set refresh to true in my bulk request call.
It normally works fine, but when the following entries appear in the
logs, the call gets stuck:

[2014-02-04 00:04:36,715][DEBUG][index.merge.scheduler] [Richard Rider] [relations][0] merge [_1vl] done, took [1.1m]
[2014-02-04 00:04:48,064][DEBUG][index.merge.scheduler] [Richard Rider] [relations][1] merge [_1ye] done, took [1.2m]


I am making this call from inside the rest plugin interface using the
client object given to me in that interface.

I am using ES version 0.90.9.
Is this a bug, and is there a workaround for it, such as disabling the
merge scheduler for the time being until I complete indexing?

Thanks
   Vineeth



[Ann] ElasticSearch OSEM and ElasticSearch Redis Transport

2014-02-04 Thread Kevin Wang
Hi

I've released an Object/Search Engine Mapping (OSEM) for ElasticSearch and 
a Redis Transport for ElasticSearch:

https://github.com/kzwang/elasticsearch-osem

https://github.com/kzwang/elasticsearch-transport-redis


Thanks.
Kevin



Re: Restarting a cluster with existing data - Status Red?

2014-02-04 Thread Boaz Leskes
A couple of points:

1) If you bring down a whole cluster and start it back up, the cluster may 
be red during the start process. The reason is that until all nodes have 
rejoined, some data may not (yet) be available for searching. This should 
resolve as soon as all the nodes are back (potentially earlier, depending 
on your replication settings).
2) Though *not* recommended, kill -9 should not result in data loss. If it 
does, it's a bug and should be reported.



On Monday, February 3, 2014 11:15:24 PM UTC+1, Tony Su wrote:
>
> Thx for the input.
> Nope, ES is being shutdown "normally" usually by simply stopping the 
> configured ES service, and only after it fully completes executing a 
> shutdown.
>  
> Tony
>  
>  
>
> On Monday, February 3, 2014 2:09:08 PM UTC-8, InquiringMind wrote:
>
>> Tony,
>>
>> You're not doing a kill -9 during shutdown, I hope. If so, that would 
>> result in a large window of opportunity for index corruption.
>>
>> Just something to check for...
>>
>> We always do a normal kill to the pid within the pid file to shut down an 
>> ES instance before shutting down the machine itself, or before upgrading 
>> the software. And we have never seen any issues with the cluster coming 
>> back up in the same (usable, usually yellow or green) state that it was in 
>> before the shutdown.
>>
>> On two occasions we have had machines power off due to thermal overload 
>> in the server room. This is a drastic event that is usually as dangerous 
>> (to disk data integrity) as a kill -9, but in these cases there wasn't any 
>> load on the machine and we experienced no data loss nor did we see the 
>> cluster as anything but green once the machine came back up and the node 
>> restarted.
>>
>> Brian
>>
>



Re: ES 0.20.5 adding a new node to a running cluster (unicast mode)

2014-02-04 Thread Boaz Leskes
Hey,

Unicast discovery is just a different way of doing discovery, and if 
configured correctly it shouldn't introduce any overhead. Think of the 
list of hosts in the unicast list as a "seed list"; it doesn't need 
to be complete. When a node starts, it pings these addresses to register 
itself and discover all other nodes currently in the cluster. After this 
initial discovery bootstrapping, the nodes gossip with each other and do 
not use this list.

Typically you'd put a short list there, say 3 nodes that you expect to 
always be part of your cluster. Note that it's OK to temporarily take one 
of those nodes offline, as the rest of the list will be used as a fallback.

Cheers,
Boaz



On Monday, February 3, 2014 9:44:20 PM UTC+1, Dan Fairs wrote:
>
> yes of course, in one word: unicast seems like an overhead ... think
> of a scenario where you need to add more nodes to the cluster (a pretty
> common one for us): you need to change the yml config on each node and
> restart the cluster node by node, making sure that all other nodes
> "see" the new node
>
>
> I'm pretty sure this is incorrect actually. I suspect if you had added 
> some existing nodes' IPs to the discovery list on the new node, all the 
> nodes would have formed a cluster. Information about what nodes are in the 
> cluster is stored canonically on the master and replicated around - once a 
> node is known about, its existence should be communicated to the other 
> nodes.
>
> Cheers,
> Dan
> --
> Dan Fairs | dan@gmail.com  | @danfairs | secondsync.com
>  
>



Re: Marvel - How is "Free Disk Space" evaluated? (Displayed in red)

2014-02-04 Thread Boaz Leskes
Hi Tony,

The red color does mean it needs attention. By default, Marvel will warn 
you if you have less than 50GB of free space (displayed in yellow) and will 
go red if you have less than 20GB. If you had higher numbers displayed in 
color, please let me know as it is a bug. 
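
The thresholds above can be written down as a small check. This is only a sketch of the rule as described in this message, not Marvel's actual code: the class and method names are made up for illustration, and treating 1GB as 1024^3 bytes is an assumption.

```java
// Hypothetical helper: maps free disk space to the warning colors described
// in the message. Only the 50GB and 20GB thresholds come from the thread;
// every name here is illustrative.
final class FreeDiskSpaceStatus {
    // Assumption: 1GB is taken as 1024^3 bytes.
    static final long GB = 1024L * 1024L * 1024L;
    static final long YELLOW_BELOW_BYTES = 50L * GB;
    static final long RED_BELOW_BYTES = 20L * GB;

    // Returns "red", "yellow", or "none" (no highlight) for the given free space.
    static String colorFor(long freeBytes) {
        if (freeBytes < RED_BELOW_BYTES) {
            return "red";
        }
        if (freeBytes < YELLOW_BELOW_BYTES) {
            return "yellow";
        }
        return "none";
    }

    public static void main(String[] args) {
        long[] samples = {10L * GB, 35L * GB, 300L * GB};
        for (long freeBytes : samples) {
            System.out.println((freeBytes / GB) + "GB free -> " + colorFor(freeBytes));
        }
    }
}
```

Under this rule, a data directory with hundreds of gigabytes free should not be highlighted at all, which is why a red value in that situation would point to a bug.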

Cheers,
Boaz

On Monday, February 3, 2014 7:16:04 PM UTC+1, Tony Su wrote:
>
> In this screenshot,
> *https://github.com/putztzu/Misc_images/blob/master/marvel_only.png*
>  
> The "Free Disk Space" is displayed in red.
>  
> Does this red color mean something, eg a warning of some kind or is it 
> simply stylistic?
> I remember when I was pointing the data directory to hundreds of gigabytes 
> instead of tens, the color was still red.
>  
> Thx,
> Tony
>



Re: Marvel Document Creation Concern

2014-02-04 Thread Boaz Leskes
Hi David, Joseph,

By default, Marvel generates reports every 5 seconds. Depending on the 
number of nodes, shards and indices, this can result in different amounts 
of data. All of Marvel's data is stored (again by default) in daily indices 
named .marvel-YYYY.MM.dd (replace with the date). You can safely delete 
older data based on your retention needs.

This little tool can be very handy 
for that: https://github.com/elasticsearch/curator
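
To illustrate the retention idea, here is a minimal sketch that picks out which daily indices are older than a retention window. It assumes the daily index names end in a yyyy.MM.dd date after a ".marvel-" prefix; the class and method names are hypothetical, and the sketch only selects names. The actual deletion would be done separately, for example with the tool linked above.

```java
import java.time.LocalDate;
import java.time.format.DateTimeFormatter;
import java.time.format.DateTimeParseException;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical helper: selects daily Marvel indices that fall outside a
// retention window. It does not talk to Elasticsearch at all.
final class MarvelRetention {
    // Assumption: daily indices are named ".marvel-" followed by yyyy.MM.dd.
    static final String PREFIX = ".marvel-";
    static final DateTimeFormatter DAY = DateTimeFormatter.ofPattern("uuuu.MM.dd");

    // Returns the names whose date is strictly before (today - keepDays).
    static List<String> expired(List<String> indexNames, LocalDate today, int keepDays) {
        LocalDate cutoff = today.minusDays(keepDays);
        List<String> result = new ArrayList<>();
        for (String name : indexNames) {
            if (!name.startsWith(PREFIX)) {
                continue; // not a Marvel index, never touch it
            }
            try {
                LocalDate day = LocalDate.parse(name.substring(PREFIX.length()), DAY);
                if (day.isBefore(cutoff)) {
                    result.add(name);
                }
            } catch (DateTimeParseException e) {
                // Name does not end in a date; leave it alone.
            }
        }
        return result;
    }

    public static void main(String[] args) {
        List<String> names = Arrays.asList(
                ".marvel-2014.01.20", ".marvel-2014.01.28", ".marvel-2014.02.04", "logs-2014.01.01");
        // Keep 7 days of data as of 2014-02-04; only older Marvel indices are listed.
        System.out.println(expired(names, LocalDate.of(2014, 2, 4), 7));
    }
}
```

Matching on the prefix first keeps the selection from ever touching application indices that happen to carry a date suffix.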

Cheers,
Boaz

On Monday, February 3, 2014 9:16:29 PM UTC+1, Joseph Lombardo wrote:
>
> Same issue here!
>
> 1GB of total data with very little traffic and Marvel is generating a 
> 4.5GB index every day.
>
> On Monday, February 3, 2014 3:11:50 PM UTC-5, David Harrigan wrote:
>>
>> Hi,
>>
>> Firstly, thanks for an awesome plugin. I'm trying it out and so far very 
>> impressed.
>>
>> However, I'm very concerned about something. I have a small dataset of 
>> about 30,000 documents which only grows a few thousand every few days. 
>> Since installing Marvel (on launch day), my total number of documents has 
>> shot up to over 2 million and growing! It seems that Marvel is writing 
>> documents every few milliseconds into it's indices. I'm watching the 
>> Document Count just going up and up! This leads me on to my first question. 
>> Is this normal? Secondly, if I don't want this number of documents what do 
>> I do? Can I get Marvel to delete data over X days old?
>>
>> Any guidance and advice on this would be greatly appreciated.
>>
>> Thank you kindly,
>>
>> -=david=-
>>
>



hadoop to ES problem

2014-02-04 Thread Phil gib
Hello, my context:
 ES 0.90.3, elasticsearch-hadoop-1.3.0.M1, Eclipse, Java
I am experimenting with ES->Hadoop and Hadoop->ES.
There is no problem with Hadoop->ES using these 3 settings (I can see the
index and its data through the head plugin, perfect):
job.set("es.resource", "eshindex2/eshlog2");
job.set("es.host", "localhost");
job.set("es.port", "9200");
But when I try the inverse, i.e. ES->Hadoop:

configuration.set("es.query", "eshindex2/eshlog2/_search?q=*");
configuration.set("es.host", "localhost");
configuration.set("es.port", "9200");
System.out.println("conf:"+ configuration);
JobConf job = new JobConf(configuration, SimpleJobES2Hadoop.class);
...
I get this error:
10:58:27 INFO mapred.JobClient: Cleaning up the staging area file:/tmp/hadoop-xxx/mapred/staging/xxx-1302405475/.staging/job_local_0001
Exception in thread "main" java.lang.NullPointerException
    at org.elasticsearch.hadoop.rest.dto.Node.<init>(Node.java:30)
    at org.elasticsearch.hadoop.rest.RestClient.getNodes(RestClient.java:139)


Any idea?
Best regards,
Philippe



[Ann] Elasticsearch Windows MSI Installer updated to ES 0.90.11

2014-02-04 Thread Hendrik
Hi,

I'd like to announce an update of the Windows MSI Installer for 
Elasticsearch, which now installs ES 0.90.11 with Oracle JRE 7 Update 51.

It can be found here:
https://github.com/salyh/elasticsearch-msi-installer/releases/tag/2.0.90.11

What is Windows MSI Installer for Elasticsearch?

Create your own Elasticsearch MSI installer with a customized ES config 
that fits your needs.
Or simply download the standard preconfigured installer, which comes 
configured with:

   - 2G Heapsize
   - analysis-phonetic Plugin preinstalled
   - mapper-attachments Plugin preinstalled
   - dynamic scripting turned off (for more security)
   - "embedded" Java Runtime (Java 7 Update 51 x64 Server JRE)
   - Windows "elasticsearch" service will be automatically started by the 
   installer and after OS boots
   - Uninstalling is supported
   
Runs on x64 Win 7/Server 2008*/Win 2012

It's implemented as a set of non-interactive cmd and PowerShell 2 scripts that 
build (on top of the WiX Toolset) an MSI installer for installing Elasticsearch 
as a Windows service. 
The scripts fetch Elasticsearch and the Java Server JRE (and the WiX Toolset) 
directly from the internet and then create an MSI installer with an 
embedded JRE.

I hope it is useful for all of you who have to deal with ES and Windows.

Suggestions, corrections, improvements are very welcome!
Thanks and best regards
Hendrik



Re: hadoop to ES problem

2014-02-04 Thread Costin Leau

If you are using M1, you should use es.resource instead of es.query in your 
latest example.
es.query is supported in the upcoming 1.3.0.M2 (not in M1).
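
A minimal sketch of that change, written in Python rather than as the Java job setup so it runs stand-alone. The setting names and values are the ones from this thread; the helper name `settings_for_m1` is made up for illustration, and it only mirrors the key rename described above. It does not talk to Elasticsearch or Hadoop.

```python
def settings_for_m1(settings):
    """Return a copy of the job settings with es.query moved to es.resource.

    Hypothetical helper: it only illustrates the key rename suggested
    above for elasticsearch-hadoop 1.3.0.M1.
    """
    fixed = dict(settings)
    if "es.query" in fixed:
        # Per the advice above, M1 expects this value under es.resource.
        fixed["es.resource"] = fixed.pop("es.query")
    return fixed


# Settings as posted in the ES->Hadoop example.
read_settings = {
    "es.query": "eshindex2/eshlog2/_search?q=*",
    "es.host": "localhost",
    "es.port": "9200",
}

m1_settings = settings_for_m1(read_settings)
for key in sorted(m1_settings):
    print(key, "=", m1_settings[key])
```

In the Java job this corresponds to calling `configuration.set("es.resource", ...)` where the example currently sets `es.query`.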

On 04/02/2014 12:01 PM, Phil gib wrote:

> hello my context:
>   ES 0.90.3,   elasticsearch-hadoop-1.3.0.M1 , eclipse- java
> i am experimenting ES->Hadoop  and Hadoop->ES
> no problem with Hadoop2ES  with these 3 settings  ( i see the index + data  
> through head plugin, perfect)
> job.set("es.resource", "eshindex2/eshlog2");
> job.set("es.host", "localhost");
> job.set("es.port", "9200");
> but experimenting the inverse , ie ES 2 Hadoop
>
>  configuration.set("es.query", "eshindex2/eshlog2/_search?q=*");
>  configuration.set("es.host", "localhost");
>  configuration.set("es.port", "9200");
>  System.out.println("conf:"+ configuration);
>  JobConf job = new JobConf(configuration, SimpleJobES2Hadoop.class);
> ...
> i got the error
> 10:58:27 INFO mapred.JobClient: Cleaning up the staging area
> file:/tmp/hadoop-xxx/mapred/staging/xxx-1302405475/.staging/job_local_0001
> Exception in thread "main" java.lang.NullPointerException
>  at org.elasticsearch.hadoop.rest.dto.Node.<init>(Node.java:30)
>  at org.elasticsearch.hadoop.rest.RestClient.getNodes(RestClient.java:139)
>
>
> Any idea?
> bestsregards
> philippe



--
Costin



date_histogram facet issue

2014-02-04 Thread samuel . merlet
Hi,
I have some documents like this:

{
  "date": "2014-01-01",
  "periods": [
    { "start": 0, "duration": 55 },
    { "start": 1, "duration": 55 },
    { "start": 2, "duration": 55 },
    { "start": 3, "duration": 55 },
    etc...
  ]
}



I run a query on a date range with a statistical facet to get the sum of all 
durations:


{
  "query": {
    "filtered": {
      "query": {
        "match_all": {}
      },
      "filter": {
        "range": {
          "date": {
            "gte": "2014-01-01",
            "lte": "2014-01-11"
          }
        }
      }
    }
  },
  "facets": {
    "stat1": {
      "statistical": {
        "field": "duration"
      },
      "nested": "periods"
    }
  }
}


This works perfectly.
Now I need to get the sum of durations for each day, so I tried using 
the date_histogram facet, but I have had no luck with it (I guess because 
periods is a nested object).


Here is my attempt:

{
  "query": {
    "filtered": {
      "query": {
        "match_all": {}
      },
      "filter": {
        "range": {
          "date": {
            "gte": "2013-01-10",
            "lte": "2013-01-11"
          }
        }
      }
    }
  },
  "facets": {
    "stat1": {
      "statistical": {
        "field": "duration"
      },
      "nested": "periods"
    },
    "stat2": {
      "date_histogram": {
        "key_field": "date",
        "value_field": "duration",
        "interval": "day"
      },
      "nested": "periods"
    }
  }
}


Can anyone help me with this? Thanks.
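
To make the expected result concrete, here is a small Python sketch that computes the same per-day sum on the client side, from documents shaped like the one above. It is not an Elasticsearch query and does not explain why the facet fails; the function name `sum_duration_per_day` and the second sample document are invented for illustration.

```python
from collections import defaultdict


def sum_duration_per_day(documents):
    """Sum periods.duration for each distinct value of the date field."""
    totals = defaultdict(int)
    for doc in documents:
        # Each document carries one date and a list of nested periods.
        for period in doc.get("periods", []):
            totals[doc["date"]] += period["duration"]
    return dict(totals)


# Hypothetical sample data in the shape described in this message.
docs = [
    {
        "date": "2014-01-01",
        "periods": [
            {"start": 0, "duration": 55},
            {"start": 1, "duration": 55},
        ],
    },
    {
        "date": "2014-01-02",
        "periods": [
            {"start": 0, "duration": 30},
        ],
    },
]

totals = sum_duration_per_day(docs)
for day in sorted(totals):
    print(day, totals[day])
```

This is the output the date_histogram facet is meant to produce: one bucket per day, each holding the total of the nested durations for that day.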



Re: And/Or Filter using Java API

2014-02-04 Thread Mohammad Shafraz Subdurally

Hello everyone,
I am also trying to do something like this.
My intended query is:
select from range (dateA to dateB);


AndFilterBuilder myFilters = FilterBuilders.andFilter();
myFilters.add(FilterBuilders.rangeFilter("dateFormatted").gte(dateDebut).lte(dateFin));
 


However, this returns nothing.
The values dateDebut and dateFin are in the following format:
/MM/dd


and the field *dateFormatted* is in the following format:




Re: boosting in es

2014-02-04 Thread Navneet Mathpal

When I run the above command it shows a SearchPhaseExecutionException 
error, but the command you suggested works fine.
Thanks.







On Tuesday, 4 February 2014 14:24:39 UTC+5:30, Jayesh Bhoyar wrote:
>
> Hi Navneet,
>
> What error you are getting while running above command?
>
> Try following Query:
> curl -XPOST "localhost:9200/indexName/indexType/_search?pretty=true" -d ' 
> { 
>   "query" : { 
> "boosting" : { 
> "positive" : { 
> "term" : { 
> "name" : "kamal" 
> } 
> }, 
> "negative" : { 
> "term" : { 
> "email" : "abc"
> } 
> }, 
> "negative_boost" : 0.5 
> } 
>   } 
> }'
>
> ---
>
>



Re: boosting in es

2014-02-04 Thread Jayesh Bhoyar
Hi Navneet,

What error are you getting when running the above command?

Try the following query:
curl -XPOST "localhost:9200/indexName/indexType/_search?pretty=true" -d '
{
  "query": {
    "boosting": {
      "positive": {
        "term": {
          "name": "kamal"
        }
      },
      "negative": {
        "term": {
          "email": "abc"
        }
      },
      "negative_boost": 0.5
    }
  }
}'

---
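
The query suggested here differs from the one in the original post only by the `negative_boost` entry. As a rough sketch, the same request body can be built in Python so that the value cannot be left out by accident; the helper name `boosting_query` and its check are illustrative assumptions, not part of any Elasticsearch client.

```python
import json


def boosting_query(positive, negative, negative_boost):
    """Build a boosting query body, refusing to omit negative_boost."""
    if negative_boost is None or negative_boost < 0:
        raise ValueError("negative_boost must be set to a non-negative number")
    return {
        "query": {
            "boosting": {
                "positive": positive,
                "negative": negative,
                "negative_boost": negative_boost,
            }
        }
    }


# Same terms and boost value as in the curl example above.
body = boosting_query(
    positive={"term": {"name": "kamal"}},
    negative={"term": {"email": "abc"}},
    negative_boost=0.5,
)
print(json.dumps(body, indent=2))
```

The printed JSON can be passed as the `-d` payload of the curl command.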



Re: upgrade 0.20 to 1.0

2014-02-04 Thread GX
Thanks for the clarification Mark

On Tuesday, February 4, 2014 9:04:43 AM UTC+2, Mark Walkom wrote:
>
> The *easiest* is to do an upgrade directly to v1.0, but I highly doubt if 
> that will even work after the upgrade due to the number of changes between 
> 0.2X, 0.90.X and 1.0.0.
> And frankly, you'd be insane to consider it if you wanted to keep your 
> data.
>
> If you can export your data to disk, then reimport/reindex it, that might 
> be an easier option for you.
>
> Regards,
> Mark Walkom
>
> Infrastructure Engineer
> Campaign Monitor
> email: ma...@campaignmonitor.com 
> web: www.campaignmonitor.com
>  
>
> On 4 February 2014 17:20, GX wrote:
>
>> Hi Mark
>>
>> Thanks for the speedy reply. I was not thinking of an immediate upgrade, 
>> since it needs to be scheduled, tested, etc. Rather, I am considering skipping 
>> 0.90, since 1.0 is so close to release, and doing a one-step upgrade instead of 
>> a two-step one: rather than upgrade now and then again in a few months, just 
>> go straight to 1.0 when its final release is out. My main concern is whether it is 
>> possible to skip 0.90 in production. Of course testing will be done; however, 
>> it never matches real-world circumstances: replicating a test 
>> environment with 4 nodes and gigabytes of data may in theory be possible, 
>> but actual load and results are difficult to reproduce.
>> I suppose in the extreme case we can just do a fresh install of 1.0 and 
>> reindex all the data; it just means several hours of 'downtime'.
>>
>> So I guess the real question is which route is easiest, as opposed to 
>> safest:
>> 0.20 -> 0.90 -> 1.0
>> 0.20 -> 1.0
>> nuke and pave straight to 1.0
>>
>> GX
>>
>> On Tuesday, February 4, 2014 7:42:11 AM UTC+2, Mark Walkom wrote:
>>>
>>> You will probably want to upgrade to 0.90.0 first.
>>> v1.0 is still RC so if this is production you might want to hold off.
>>>
>>> Regards,
>>> Mark Walkom
>>>
>>> Infrastructure Engineer
>>> Campaign Monitor
>>> email: ma...@campaignmonitor.com
>>> web: www.campaignmonitor.com
>>>  
>>>
>>> On 4 February 2014 16:21, GX  wrote:
>>>
>>>> Hi All
>>>>
>>>> It seems I have been slacking on keeping up to date with updates. Will 
>>>> it be possible to upgrade a 0.22.1 cluster to 1.0, or is it better to do it 
>>>> in two steps, i.e. 0.90.x first?
>>>>
>>>> Thanks
>>>>
>>>> GX


>>>
>>
>
>



boosting in es

2014-02-04 Thread Navneet Mathpal
I am trying to run the following query, but it returns an error:



{
  "query": {
    "boosting": {
      "positive": {
        "term": {
          "name": "kamal"
        }
      },
      "negative": {
        "term": {
          "email": "abc"
        }
      }
    }
  }
}


