Re: Exact phrase match - city names example

2014-02-26 Thread prashy
Try using the term query: the term query is not analyzed, so it matches the
exact term only.

{
  "query": {
    "term": { "street": "xxx" }
  }
}
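
For example, as a full request (a sketch assuming the street field from this
thread is mapped not_analyzed, so the indexed term is the whole value):

curl -XGET 'localhost:9200/test/_search' -d '{
  "query": {
    "term": { "street": "Main" }
  }
}'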





How to count??

2014-02-26 Thread Nick Chang
Hello

I want to analyze "who read how many books?".

When the bookId is duplicated, I don't want to count it again.

POST /bookdatas/_search
{
  "size": 0,
  "facets": {
    "ips_stats": {
      "terms_stats": {
        "key_field": "userId",
        "value_field": "duration",
        "size": 5
      },
      "facet_filter": {
        "fquery": {
          "query": {
            "filtered": {
              "query": {
                "bool": {
                  "should": [
                    {
                      "query_string": {
                        "query": "\"(Google)b...@gmail.com\""
                      }
                    }
                  ]
                }
              }
            }
          }
        }
      }
    }
  }
}

result:

"facets": {
  "ips_stats": {
    "_type": "terms_stats",
    "missing": 0,
    "terms": [
      {
        "term": "(Google)b...@gmail.com",
        "count": 64,
        "total_count": 3,
        "min": 2,
        "max": 139436,
        "total": 140733,
        "mean": 46911
      },


userId                 prdId       duration  chapId
(Google)bbb@gmail.com  0100202444  2         1
(Google)bbb@gmail.com  0100202444  139436    1
(Google)bbb@gmail.com  0100202444  1295      1
A user only read one book, but the count is three.

Can you give me a suggestion?

Thanks



Re: Relation Between Heap Size and Total Data Size

2014-02-26 Thread Umutcan

So, I am wondering: is there any relationship between heap size and total
data size? Is there any formula to determine heap size based on data size?


You might want to check that you're not running out of file handles:

   http://www.elasticsearch.org/tutorials/too-many-open-files/
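
(A quick way to check, sketched here; the _nodes process stats report the
process's open file limit, though exact field names can vary by version:)

   ulimit -n
   curl 'localhost:9200/_nodes/process?pretty'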


Thanks Dan. This article solved my problem.



Re: Help Understanding custom_filters_score Error

2014-02-26 Thread James Martin
Hi Chris,

I'm in the same boat: looking to combine an "or" filter (so one or the other
filter matches) with a custom_filters_score in order to boost results which
meet certain criteria.
Did you have any luck solving this?


On Friday, 3 May 2013 15:26:52 UTC+10, Chris wrote:
>
> Of course, posting in public results in me (finally) seeing the obvious 
> error:
>
>  "filter": {
> "boost": "1.5",
>
> Should be:
>  
>  "boost": "1.5",
>  "filter": {
>
>
> However, unlike my previous OR filter, which only returned results which 
> matched one of the filters, now it returns all records (with the right 
> records boosted to the top) - even those that don't match any filters? 
>
> Is it possible to replicate the behavior of an OR filter query with a 
> custom_filters_score query?
>
>
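
For reference, a minimal sketch of one way to get both behaviors: filter
first (so only matching documents come back at all), then boost via
custom_filters_score. Field and value names here are hypothetical:

{
  "query": {
    "custom_filters_score": {
      "query": {
        "filtered": {
          "query": { "match_all": {} },
          "filter": {
            "or": [
              { "term": { "category": "premium" } },
              { "term": { "category": "featured" } }
            ]
          }
        }
      },
      "filters": [
        { "filter": { "term": { "category": "premium" } }, "boost": 1.5 }
      ]
    }
  }
}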



Re: Include special Symbol

2014-02-26 Thread Nick Chang
Hello

I already solved this problem. Your suggestion was right.

Thanks
Nick

On Wednesday, February 26, 2014 at 11:03:29 PM UTC+8, Binh Ly wrote:
>
> You'll likely need that field to be unanalyzed (i.e. tell ES not to cut it 
> up in the index). One way is to predefine that field in your mapping as:
>
> "user": {
>   "type": "string",
>   "index": "not_analyzed"
> }
>
> More details here:
>
>
> http://www.elasticsearch.org/guide/en/elasticsearch/reference/current/mapping-core-types.html
>
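
For the archive, a sketch of applying that mapping with curl (index and type
names are hypothetical):

curl -XPUT 'localhost:9200/myindex/mytype/_mapping' -d '{
  "mytype": {
    "properties": {
      "user": { "type": "string", "index": "not_analyzed" }
    }
  }
}'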



Re: How to join 2 indexes at query time

2014-02-26 Thread Matt Weber
How about using parent/child functionality?

https://gist.github.com/mattweber/96f3515fc4453a5cb0db

Thanks,
Matt Weber
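
For the archive, a minimal sketch of the parent/child approach using the
index/type names from this thread (the gist above has the full version; the
category field name is an assumption):

curl -XPUT 'localhost:9200/productindex/offertype/_mapping' -d '{
  "offertype": {
    "_parent": { "type": "categorytype" }
  }
}'

curl -XGET 'localhost:9200/productindex/offertype/_search' -d '{
  "query": {
    "has_parent": {
      "parent_type": "categorytype",
      "query": { "match": { "category": "Flat TV" } }
    }
  }
}'

Note that each offertype document must then be indexed with a parent
parameter pointing at its categorytype document.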



On Wed, Feb 26, 2014 at 7:45 PM, Jayesh Bhoyar wrote:

> Hi Binh,
>
> Thanks for the answer.
>
> Is there any case if I index this data into same index with different
> category GIST@ https://gist.github.com/jsbonline2006/9243973
> I have 1 index:
>
> productindex/ Type: offertype
> productindex/ Type: categorytype
>
>
> Now as per my index data:
> My input will be category "Flat TV"
> And in output: I want all skuid for "Flat TV" and their corresponding
> offer_id.
>
> Regards,
> Jayesh Bhoyar
>
> GIST @ https://gist.github.com/jsbonline2006/9243973
>
>
> On Wednesday, February 26, 2014 8:07:01 PM UTC+5:30, Binh Ly wrote:
>>
>> Unfortunately, ES is not like SQL in this respect. You'll need to
>> denormalize somewhat because ES is more "document-oriented". You'd probably
>> need to either denormalize offer_id into categorytype, or category into
>> offertype to get all the data you want returned in 1 query.
>>


Re: How to join 2 indexes at query time

2014-02-26 Thread Jayesh Bhoyar
Hi Binh,

Thanks for the answer.

Is there any case if I index this data into same index with different 
category GIST@ https://gist.github.com/jsbonline2006/9243973
I have 1 index:

productindex/ Type: offertype
productindex/ Type: categorytype


Now as per my index data:
My input will be category "Flat TV"
And in output: I want all skuid for "Flat TV" and their corresponding offer_id.

Regards,
Jayesh Bhoyar

GIST @ https://gist.github.com/jsbonline2006/9243973


On Wednesday, February 26, 2014 8:07:01 PM UTC+5:30, Binh Ly wrote:
>
> Unfortunately, ES is not like SQL in this respect. You'll need to 
> denormalize somewhat because ES is more "document-oriented". You'd probably 
> need to either denormalize offer_id into categorytype, or category into 
> offertype to get all the data you want returned in 1 query.
>



Re: APT repositories available?

2014-02-26 Thread Mark Walkom
I think you have it the wrong way around;
deb http://packages.elasticsearch.org/elasticsearch/1.0/debian stable main

Not;
deb http://packages.elasticsearch.org/elasticsearch/1.0/debian main stable
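
For example, a sketch of the fix on Debian/Ubuntu (the file name is
illustrative):

echo "deb http://packages.elasticsearch.org/elasticsearch/1.0/debian stable main" | \
    sudo tee /etc/apt/sources.list.d/elasticsearch.list
sudo apt-get update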

Regards,
Mark Walkom

Infrastructure Engineer
Campaign Monitor
email: ma...@campaignmonitor.com
web: www.campaignmonitor.com


On 27 February 2014 12:41, Trey Hyde  wrote:

> I'm trying to integrate the apt repositories into our setup according to
>
> http://www.elasticsearch.org/blog/apt-and-yum-repositories/
> and
>
> http://www.elasticsearch.org/guide/en/elasticsearch/reference/current/setup-repositories.html
>
>
> A few days ago, apt would also give 403 for those repos.  At this point
> they are giving 404s.   Did I miss something or are the repos now
> unavailable?
>
> Thanks
>
>
> $ cat /etc/apt/sources.list.d/*
>
> deb http://packages.elasticsearch.org/elasticsearch/1.0/debian main
> stable
>
> deb http://packages.elasticsearch.org/logstash/1.3/debian  main stable
>
>
> $ sudo apt-get update
>
> ...
>
> Err http://packages.elasticsearch.org main/stable amd64 Packages
>
>   404  Not Found
>
> Ign http://packages.elasticsearch.org main/stable Translation-en_US
>
> Ign http://packages.elasticsearch.org main/stable Translation-en
>
> Err http://packages.elasticsearch.org main/stable amd64 Packages
>
>   404  Not Found
>
> ...
>
> Ign http://packages.elasticsearch.org main/stable Translation-en_US
>
> Ign http://packages.elasticsearch.org main/stable Translation-en
>
> ...
>
> W: Failed to fetch
> http://packages.elasticsearch.org/elasticsearch/1.0/debian/dists/main/stable/binary-amd64/Packages
> 404  Not Found
>
> W: Failed to fetch
> http://packages.elasticsearch.org/logstash/1.3/debian/dists/main/stable/binary-amd64/Packages
> 404  Not Found
>
>
> E: Some index files failed to download. They have been ignored, or old
> ones used instead.
>
>


APT repositories available?

2014-02-26 Thread Trey Hyde
I'm trying to integrate the apt repositories into our setup according to

http://www.elasticsearch.org/blog/apt-and-yum-repositories/
and 
http://www.elasticsearch.org/guide/en/elasticsearch/reference/current/setup-repositories.html


A few days ago, apt would also give 403 for those repos.  At this point 
they are giving 404s.   Did I miss something or are the repos now 
unavailable?

Thanks


$ cat /etc/apt/sources.list.d/*

deb http://packages.elasticsearch.org/elasticsearch/1.0/debian main 
stable

deb http://packages.elasticsearch.org/logstash/1.3/debian  main stable


$ sudo apt-get update

...

Err http://packages.elasticsearch.org main/stable amd64 Packages

  404  Not Found

Ign http://packages.elasticsearch.org main/stable Translation-en_US

Ign http://packages.elasticsearch.org main/stable Translation-en

Err http://packages.elasticsearch.org main/stable amd64 Packages

  404  Not Found

...

Ign http://packages.elasticsearch.org main/stable Translation-en_US

Ign http://packages.elasticsearch.org main/stable Translation-en

...

W: Failed to fetch 
http://packages.elasticsearch.org/elasticsearch/1.0/debian/dists/main/stable/binary-amd64/Packages
  
404  Not Found

W: Failed to fetch 
http://packages.elasticsearch.org/logstash/1.3/debian/dists/main/stable/binary-amd64/Packages
  
404  Not Found


E: Some index files failed to download. They have been ignored, or old ones 
used instead.




Re: Bug in Java PutMapping API?

2014-02-26 Thread David Pilato
It looks good to me.
Could you check the response?

PutMappingResponse response = client.admin().indices()
        .preparePutMapping(index)
        .setType(type)
        .setSource(xcontent)
        .execute().actionGet();
if (!response.isAcknowledged()) {
    throw new Exception("Could not define mapping for type ["
            + index + "]/[" + type + "].");
}

Maybe there is something else wrong in your actual code. Could you gist your
full code?

--
David ;-)
Twitter : @dadoonet / @elasticsearchfr / @scrutmydocs


On 27 February 2014 at 00:51, Andre Encarnacao wrote:

I am trying to use the Java API (v1.0.0) to create a custom mapping for my
index but have run into a problem. More specifically, some of my mapping fields
(for example, "format" and "index") are not being stored as part of the mapping
in Elasticsearch. In fact, the only field that is being stored is the "type"
field. However, if I use the cURL API, everything works as expected and all
fields are stored properly in my mapping. Has anyone seen this problem when
using the Java API? Is this a bug? I put together a very simple example that
demonstrates my bug below:


--- SOURCE CODE ---

Client client = new TransportClient().addTransportAddress(
        new InetSocketTransportAddress(_hostname, _port));

XContentBuilder mapping = XContentFactory.jsonBuilder()
    .startObject().startObject("type_name")
        .startObject("properties")
            .startObject("ATTRIBUTE1")
                .field("type", "string")
                .field("format", "dateOptionalTime")
            .endObject()
            .startObject("ATTRIBUTE2")
                .field("type", "long")
                .field("index", "not_analyzed")
            .endObject()
        .endObject()
    .endObject()
    .endObject();

client.admin().indices().preparePutMapping("index_name").setType("type_name")
        .setSource(mapping).execute().actionGet();


--- OUTPUT ---

Below is the mapping created in ES after running the code snippet above (from 
http://localhost:9200/index_name/_mapping?pretty). Notice that only the "type" 
fields were stored in the mapping.
{
  "index_name" : {
"mappings" : {
  "type_name" : {
"properties" : {
  "ATTRIBUTE1" : {
"type" : "string"
  },
  "ATTRIBUTE2" : {
"type" : "long"
  }
}
  }
}
  }
}



Re: Exact phrase match - city names example

2014-02-26 Thread thale jacobs
Thanks for the reply Binh Ly - I think the mappings in your example are
almost like the example I posted, and I believe they are functionally
equivalent. But my query against the not_analyzed field returns all the
docs with the word "Main" in them. From the query side I also thought I
could specify "analyzer" : "keyword", but I get the same results. You are
correct that something seems off: the case of the search term does not seem
to impact the results, which tells me a search analyzer is being used.
 

On Wednesday, February 26, 2014 5:12:02 PM UTC-5, Binh Ly wrote:

> Thale,
>
> Can you double check the mapping. Something seems off to me. Should be 
> something like this:
>
> {
>   "mappings": {
> "name": {
>   "properties": {
> "street": {
>   "type": "string",
>   "index" : "not_analyzed"
> }
>   }
> }
>   }
> }
>
> And don't forget, not_analyzed means case-sensitive matches, fyi. :)
>
> On Wednesday, February 26, 2014 4:51:40 PM UTC-5, thale jacobs wrote:
>>
>> I am having a similar problem too.  Here is how I set up the
>> test index:
>>
>> Create the index:
>> curl -s -XPUT 'localhost:9200/test' -d '{
>> "mappings": {
>> "properties": {
>> "name": {
>> "street": {
>> "type": "string",
>> "index_analyzer": "not_analyzed",
>> "search_analyzer": "not_analyzed",
>> "index" : "not_analyzed"
>> }
>> }
>> }
>> }
>> }'
>>
>>
>>
>> Insert some data:
>> curl -s -XPUT 'localhost:9200/test/name/5' -d '{ "street": ["E Main 
>> St"]}'
>> curl -s -XPUT 'localhost:9200/test/name/6' -d '{ "street": ["W Main St"] 
>> }'
>> curl -s -XPUT 'localhost:9200/test/name/7' -d '{ "street": ["East Main 
>> Rd"] }'
>> curl -s -XPUT 'localhost:9200/test/name/8' -d '{ "street": ["West Main 
>> Rd"] }'
>> curl -s -XPUT 'localhost:9200/test/name/9' -d '{ "street": ["Main"] }'
>> curl -s -XPUT 'localhost:9200/test/name/10' -d '{ "street": ["Main St"] 
>> }'
>>
>>
>>
>>
>> --Now attempt to search for "Main"... Not "Main St", Not "East Main 
>> Rd"...I only want to return doc #9 - "Main"
>> curl -s -XGET 'localhost:9200/test/_search?pretty=true' -d '{
>>"query":{
>>   "bool":{
>>  "must":[
>> {
>>"match":{
>>   "street":{
>>  "query":"main",
>>  "type":"phrase",
>>  "analyzer" : "keyword"
>>   }
>>}
>> }
>>  ]
>>   }
>>}
>> }';
>>
>> The best document returned is "Main", but I don't know how to filter out 
>> the others that are not exact matches (although they contain matching 
>> terms).
>> ...
>> Here the results from my example above:
>>   "_score" : 0.2876821, "_source" : { "street": ["Main"] }
>>   "_score" : 0.25316024, "_source" : { "street": ["East Main Rd"] }
>>   "_score" : 0.25316024, "_source" : { "street": ["W Main St"] }
>>   "_score" : 0.25316024, "_source" : { "street": ["E Main St"]}
>>   "_score" : 0.1805489, "_source" : { "street": ["Main St"] }
>>   "_score" : 0.14638957, "_source" : { "street": ["West Main Rd"] }
>>
>>
>>
>>
>>
>> On Thursday, June 14, 2012 3:38:31 PM UTC-4, Colin Dellow wrote:
>>>
>>> Does "index": "not_analyzed" not work for you (
>>> http://www.elasticsearch.org/guide/reference/mapping/core-types.html) ?
>>>
>>>
>>> On Thursday, 14 June 2012 14:02:28 UTC-4, Greg Silin wrote:

 Hi,
 One of our fields in the index stores city names, and we need to ensure 
 that the term is matched exactly.

 So if we have "san francisco" indexed, we need to ensure that *only* 
 the term "san francisco" matches; "san" or "francisco" or "south san 
 francisco" should all be misses.

 In particular, I don't have a solution on how to make sure "san 
 francisco" does not match against "south san francisco"

 Thanks
 -greg

>>>



Bug in Java PutMapping API?

2014-02-26 Thread Andre Encarnacao
I am trying to use the Java API (v1.0.0) to create a custom mapping for my
index but have run into a problem. More specifically, some of my mapping
fields (for example, "format" and "index") are not being stored as part of
the mapping in Elasticsearch. In fact, the only field that is being stored
is the "type" field. However, if I use the cURL API, everything works as
expected and all fields are stored properly in my mapping. Has anyone seen
this problem when using the Java API? Is this a bug? I put together a very
simple example that demonstrates my bug below:


--- SOURCE CODE ---

Client client = new TransportClient().addTransportAddress(
        new InetSocketTransportAddress(_hostname, _port));

XContentBuilder mapping = XContentFactory.jsonBuilder()
    .startObject().startObject("type_name")
        .startObject("properties")
            .startObject("ATTRIBUTE1")
                .field("type", "string")
                .field("format", "dateOptionalTime")
            .endObject()
            .startObject("ATTRIBUTE2")
                .field("type", "long")
                .field("index", "not_analyzed")
            .endObject()
        .endObject()
    .endObject()
    .endObject();

client.admin().indices().preparePutMapping("index_name").setType("type_name")
        .setSource(mapping).execute().actionGet();


--- OUTPUT ---

Below is the mapping created in ES after running the code snippet above 
(from http://localhost:9200/index_name/_mapping?pretty). Notice that only 
the "type" fields were stored in the mapping.
{
  "index_name" : {
"mappings" : {
  "type_name" : {
"properties" : {
  "ATTRIBUTE1" : {
"type" : "string"
  },
  "ATTRIBUTE2" : {
"type" : "long"
  }
}
  }
}
  }
}



indexing binary

2014-02-26 Thread ZenMaster80
I index PDFs using the attachment type (which uses Apache Tika) with the following mapping.


.field("type", "attachment")
.field("fields")
    .startObject()
        .startObject("file")
            .field("store", "yes")
        .endObject()

I want to index photos, and I am able to extract the text using OCR. I am
confused about how to index that text, though: do I treat it like any other
document and not as an attachment? The extracted text is a plain String, not
base64 like in the case of PDFs. I am also confused about how it gets stored
and how it works if I need to make it available during search. Can someone
explain how I do this?

XContentFactory.jsonBuilder().startObject()
    .startObject(INDEX_TYPE)
        // disables storing the whole base64 _source
        .startObject("_source").field("enabled", "no").endObject()
        .startObject("properties")



So, my photo object (jsonObject) becomes something like this - but what about
the source (the image itself)?

{
  "content": "text extracted from image",
  "name": "my_photo.png"
}


// add to the bulk processor for indexing
bulkProcessor.add(Requests.indexRequest(INDEX_NAME).type(INDEX_TYPE)
        .id(jsonObject.getString("name")).source(jsonObject.toString()));
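
Since the OCR output is already plain text, it can be indexed like any other
string field; a sketch with curl (the field names follow the snippet above,
the index/type names are hypothetical):

curl -XPUT 'localhost:9200/photos/photo/my_photo.png' -d '{
  "name": "my_photo.png",
  "content": "text extracted from image"
}'

The "content" field is then analyzed and searchable like any string; only
binary documents such as PDFs need the base64 attachment treatment.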



Re: Interesting question on Transaction Log record mutability

2014-02-26 Thread Yuri Panchenko
Thanks for the explanation!!  I thought that if a record was contained in
the transaction log, it would not be part of a segment, and that as soon as
we flush the transaction log, it re-indexes the changes into the segment and
then commits to disk.  But it sounds like a record can be both in the
transaction log and in the Lucene segment itself, in memory.  That sounds
believable :)  I'm trying to come up with a data model that would be
efficient for a Customer record that can have many transactions.  I've ruled
out inner objects and nested objects, and am now tinkering with parent/child
or complete denormalization.

Thanks again Binh!!

On Wednesday, February 26, 2014 1:46:14 PM UTC-8, Binh Ly wrote:
>
> Thanks, I think I understand better now. I deleted my previous post so 
> that I can clarify better. The transaction log is just a backup mechanism 
> for durability. When you index a document, it eventually goes into a 
> segment (in memory). When you update it, the old doc is marked as deleted 
> and then a new one is indexed into a/the segment. If no flush/commit has 
> been made so far, the documents/segments are still in memory and each 
> operation is also recorded in the transaction log (one for the first index, 
> and then another for the update, and so on). When you do a flush, the 
> in-memory segments are then written to disk and then the transaction log is 
> emptied out (since we no longer need it as "backup" at this point). If on 
> the other hand you simply do a refresh, the "new" segments in memory are 
> simply made searchable (even though they are not necessarily written to 
> disk yet) and no flush to disk happens. In this case, the transaction log 
> still contains whatever it had in it so far.
>
> So to answer your question, each update will require a new document to be 
> indexed (no way around it). And the transaction log is probably not 
> something that would matter in your scenario. I hope that helps. :)
>



Re: Exact phrase match - city names example

2014-02-26 Thread Binh Ly
Thale,

Can you double check the mapping. Something seems off to me. Should be 
something like this:

{
  "mappings": {
"name": {
  "properties": {
"street": {
  "type": "string",
  "index" : "not_analyzed"
}
  }
}
  }
}

And don't forget, not_analyzed means case-sensitive matches, fyi. :)

On Wednesday, February 26, 2014 4:51:40 PM UTC-5, thale jacobs wrote:
>
> I am having a similar problem too.  Here is how I set up the
> test index:
>
> Create the index:
> curl -s -XPUT 'localhost:9200/test' -d '{
> "mappings": {
> "properties": {
> "name": {
> "street": {
> "type": "string",
> "index_analyzer": "not_analyzed",
> "search_analyzer": "not_analyzed",
> "index" : "not_analyzed"
> }
> }
> }
> }
> }'
>
>
>
> Insert some data:
> curl -s -XPUT 'localhost:9200/test/name/5' -d '{ "street": ["E Main St"]}'
> curl -s -XPUT 'localhost:9200/test/name/6' -d '{ "street": ["W Main St"] 
> }'
> curl -s -XPUT 'localhost:9200/test/name/7' -d '{ "street": ["East Main 
> Rd"] }'
> curl -s -XPUT 'localhost:9200/test/name/8' -d '{ "street": ["West Main 
> Rd"] }'
> curl -s -XPUT 'localhost:9200/test/name/9' -d '{ "street": ["Main"] }'
> curl -s -XPUT 'localhost:9200/test/name/10' -d '{ "street": ["Main St"] }'
>
>
>
>
> --Now attempt to search for "Main"... Not "Main St", Not "East Main 
> Rd"...I only want to return doc #9 - "Main"
> curl -s -XGET 'localhost:9200/test/_search?pretty=true' -d '{
>"query":{
>   "bool":{
>  "must":[
> {
>"match":{
>   "street":{
>  "query":"main",
>  "type":"phrase",
>  "analyzer" : "keyword"
>   }
>}
> }
>  ]
>   }
>}
> }';
>
> The best document returned is "Main", but I don't know how to filter out 
> the others that are not exact matches (although they contain matching 
> terms).
> ...
> Here the results from my example above:
>   "_score" : 0.2876821, "_source" : { "street": ["Main"] }
>   "_score" : 0.25316024, "_source" : { "street": ["East Main Rd"] }
>   "_score" : 0.25316024, "_source" : { "street": ["W Main St"] }
>   "_score" : 0.25316024, "_source" : { "street": ["E Main St"]}
>   "_score" : 0.1805489, "_source" : { "street": ["Main St"] }
>   "_score" : 0.14638957, "_source" : { "street": ["West Main Rd"] }
>
>
>
>
>
> On Thursday, June 14, 2012 3:38:31 PM UTC-4, Colin Dellow wrote:
>>
>> Does "index": "not_analyzed" not work for you (
>> http://www.elasticsearch.org/guide/reference/mapping/core-types.html) ?
>>
>>
>> On Thursday, 14 June 2012 14:02:28 UTC-4, Greg Silin wrote:
>>>
>>> Hi,
>>> One of our fields in the index stores city names, and we need to ensure 
>>> that the term is matched exactly.
>>>
>>> So if we have "san francisco" indexed, we need to ensure that *only* the 
>>> term "san francisco" matches; "san" or "francisco" or "south san francisco" 
>>> should all be misses.
>>>
>>> In particular, I don't have a solution on how to make sure "san 
>>> francisco" does not match against "south san francisco"
>>>
>>> Thanks
>>> -greg
>>>
>>



Re: elasticsearch cache configuration

2014-02-26 Thread Mark Walkom
Configuration settings go in your elasticsearch.yml file.
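
For example, the fielddata and filter cache limits from the docs linked
below can be capped there (an elasticsearch.yml sketch; the percentages are
illustrative, not recommendations):

indices.fielddata.cache.size: 40%
indices.cache.filter.size: 10%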

Regards,
Mark Walkom

Infrastructure Engineer
Campaign Monitor
email: ma...@campaignmonitor.com
web: www.campaignmonitor.com


On 27 February 2014 02:51, Zachary Tong  wrote:

> If you simply want to decrease the amount of memory that Elasticsearch is
> using, you need to change your heap size (via the ES_HEAP_SIZE environment
> variable).  That controls the total memory allocated to Elasticsearch.
>
> Echoing what Binh said...try not to change the field-data settings unless
> you know what you are doing.  You should not enable soft references for
> field-data, this is a very bad option.  It causes excessive GC thrashing
> and is not needed (especially now that 1.0 has circuit breaker logic built
> in).  Similarly, the time expiration on field-data is typically a poor
> option, since it unnecessarily thrashes the GC too.
>
>
>
> On Wednesday, February 26, 2014 10:19:40 AM UTC-5, Hediye Delkhosh wrote:
>>
>>
>> @Binh Ly thank you for reply.
>> I don't know where I should insert these configs.
>> I've installed FOS-elastica in my project; there is an elasticsearch.yml
>> in my project and another one in /etc/elasticsearch. Where should I
>> insert the configs?
>>
>> I've checked the server's memory usage: the Elasticsearch service uses a
>> lot of memory.
>>
>> Thank you :)
>>
>> On Wednesday, February 26, 2014 6:29:04 PM UTC+3:30, Binh Ly wrote:
>>>
>>> For ES 1.0, the field data settings are here:
>>>
>>> http://www.elasticsearch.org/guide/en/elasticsearch/
>>> reference/current/index-modules-fielddata.html
>>>
>>> The filter cache settings are here:
>>>
>>> http://www.elasticsearch.org/guide/en/elasticsearch/
>>> reference/current/index-modules-cache.html
>>>
>>> The easiest way is to set these in the elasticsearch.yml file.
>>>
>>> I would caution that you'll probably need to first investigate and
>>> understand your memory usage before trying to change any of these settings.
>>>


Re: [hadoop] Store GeoJson using PIG

2014-02-26 Thread Costin Leau
Can you post your Pig script and your index mapping? Note that for non-native types (Pig doesn't support IP or geo)
you should define the mapping in ES in advance, since otherwise es-hadoop will map them as strings.
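
For example, a geo_point mapping defined up front might look like this
(index and type names are hypothetical):

curl -XPUT 'localhost:9200/myindex/mytype/_mapping' -d '{
  "mytype": {
    "properties": {
      "location": { "type": "geo_point" }
    }
  }
}'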


On 2/5/2014 8:30 PM, Dumitru Pascu wrote:

Anyone tried this before?

Regards,
Dumitru

On Monday, February 3, 2014 9:49:10 PM UTC+2, Dumitru Pascu wrote:

How can I store GeoJson using PIG?

I tried to use TOBAG(lon, lat); however, the location got stored in the
following form:

location: [
  { 0: 23.80323889 },
  { 0: 44.31903611 }
]

Regards,
Dumitru



--
Costin



Re: Exact phrase match - city names example

2014-02-26 Thread Binh Ly
Greg, to add to Colin's reply: try not_analyzed, or if you want
case-insensitive searches, you can use a custom analyzer consisting of the
keyword tokenizer + lowercase filter. You might also be interested in the
multi-fields feature if you want to search the same field in many
different ways:

http://www.elasticsearch.org/guide/en/elasticsearch/reference/current/mapping-core-types.html#_multi_fields_3
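
For the archive, a sketch of such a custom analyzer (index, analyzer, and
field names are illustrative, following this thread's examples):

curl -XPUT 'localhost:9200/test' -d '{
  "settings": {
    "analysis": {
      "analyzer": {
        "lowercase_keyword": {
          "type": "custom",
          "tokenizer": "keyword",
          "filter": ["lowercase"]
        }
      }
    }
  },
  "mappings": {
    "name": {
      "properties": {
        "street": { "type": "string", "analyzer": "lowercase_keyword" }
      }
    }
  }
}'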



Re: [hadoop] Push _id to ES via PIG ESStorage

2014-02-26 Thread Costin Leau

Yes, see
http://www.elasticsearch.org/guide/en/elasticsearch/hadoop/current/configuration.html#_mapping
and
http://www.elasticsearch.org/guide/en/elasticsearch/hadoop/current/pig.html
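
In short, point es.mapping.id at the field carrying the id. A sketch,
assuming a relation with an "id" field and the es-hadoop jar already
REGISTERed (file, index, and type names are hypothetical):

docs = LOAD 'data.tsv' USING PigStorage() AS (id:chararray, name:chararray);
STORE docs INTO 'myindex/mytype'
    USING org.elasticsearch.hadoop.pig.EsStorage('es.mapping.id = id');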

Cheers,

On 1/3/2014 12:24 AM, Dumitru Pascu wrote:

Hi,

Is it possible to push the _id field via ESStorage / PIG towards the ES cluster?

Thanks,
Dumitru



--
Costin



Re: EsRejectedExecutionException when searching date based indices.

2014-02-26 Thread Binh Ly
Would love to hear a success story anytime. :)

On Wednesday, February 26, 2014 4:58:03 PM UTC-5, Alex Clark wrote:
>
> Finally, just as a data point, we're really indexing 750M records x 365 
> days a year x 7 years which gives 1,916,250,000,000 documents for the ES 
> cluster to chew on.  It'll definitely be a good test of the technology and 
> interesting to see how the performance holds!  It's maybe even a good 
> customer success story to put on the elasticsearch website if all goes 
> well.  ;-)
>



Re: EsRejectedExecutionException when searching date based indices.

2014-02-26 Thread Alex Clark
Thank you for the link, it's very helpful.  The reason I chose 20 per daily 
index was because each day would hold around 750 million documents (each 
with just under 1000 fields).  This seemed like a fairly high data 
requirement that would require many nodes.  

If I only have one shard and one replica, then I'll have 365 x 2 = 730
total shards per year.  If I run them on a 10 node cluster, will the
shards be allocated evenly (73 shards per node) even though it is really 2
shards per index (and 365 indices)?  If I then need to grow the
cluster to 20 servers, will the collection automatically re-balance in a 
reasonable time? That's a lot of data for the cluster to move!  My main 
goal is to be able to add hardware to the cluster if needed without 
re-indexing 750M x 365 = 273,750,000,000 documents (each with 1000 fields) 
since this could take a considerably long time to do.  Also, is it 
reasonable to expect high performance out of a single shard index with 750M 
records each with 1000 fields?

Finally, just as a data point, we're really indexing 750M records x 365 
days a year x 7 years which gives 1,916,250,000,000 documents for the ES 
cluster to chew on.  It'll definitely be a good test of the technology and 
interesting to see how the performance holds!  It's maybe even a good 
customer success story to put on the elasticsearch website if all goes 
well.  ;-)

On Wednesday, February 26, 2014 9:13:02 AM UTC-8, Jörg Prante wrote:
>
> I think you have a misconception about shard over-allocation and 
> re-indexing, so you should read
>
> https://groups.google.com/d/msg/elasticsearch/49q-_AgQCp8/MRol0t9asEcJ
>
> where kimchy explains how over-allocation of shards work.
>
> If you have time-series indexes, you need not 20 shards per day, just in 
> fear to be able to stretch out to 20 nodes in the future. That is only true 
> for single, static, non-time-series indexes. With index aliasing and 
> routing applied to time-series data, 1 shard (+1 replica) per day might be 
> enough (maybe some more like 2 or 3, or more replica, it depends on 
> balancing out indexing and search load). For a year with a shard per day, 
> you will end up in 365 shards plus 365 replica shards which is quite a 
> handful, and in theory enough to distribute over 365 nodes. If shards start 
> to get tight on resources, use index aliasing and routing. Or just add 
> nodes, and ES will automatically redistribute the existing shards to become 
> happy again. No re-indexing at all.
>
> Jörg
>
>
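
(For the archive, a sketch of the aliasing + routing Jörg describes; index,
alias, and field names here are hypothetical:)

curl -XPOST 'localhost:9200/_aliases' -d '{
  "actions": [
    {
      "add": {
        "index": "events-2014-02-26",
        "alias": "customer_a",
        "routing": "customer_a",
        "filter": { "term": { "customer": "customer_a" } }
      }
    }
  ]
}'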



Re: How to generate ES index in the hadoop

2014-02-26 Thread Costin Leau

On 2/26/2014 11:26 PM, drew dahlke wrote:

Hi Costin,

We're very interested in offline processing as well. To draw a parallel to HBase,
you could write a hadoop job that writes
out to a table over the thrift API. However if you're going to load in many
terabytes of data, there's the option to
write out directly to the HTable file format and bulk load the file into your 
cluster once generated. Bulk loading was
orders of magnitude faster than the HTTP based API.

Writing out lucene segments from a hadoop job is nothing new (check out the 
katta project
http://katta.sourceforge.net/). I saw that ES has snapshot backup/restore in 
the pipeline. It'd be fantastic if we could
write hadoop jobs that output data in the same format as ES backups and then 
use the restore functionality in ES to bulk
load the data directly without having to go through a REST API. I feel like 
that would be faster and it would provide
the flexibility to scale out the hadoop cluster independently of the ES cluster.



If you are concerned about indexing data directly into a live cluster, you could just have a different, staging one 
setup (with a complete topology as well) to which you can index data. Then do a snapshot (potentially to HDFS) and then 
load the data into your live one.

This is already supported - see this blog post [1].
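
(For the archive, a sketch of that snapshot/restore flow with the
repository-hdfs plugin installed; the repository name and settings are
illustrative, see the blog post for the exact setup:)

curl -XPUT 'localhost:9200/_snapshot/hdfs_repo' -d '{
  "type": "hdfs",
  "settings": {
    "uri": "hdfs://namenode:8020",
    "path": "/backups/es"
  }
}'

# then, on the live cluster, restore from the same repository:
curl -XPOST 'localhost:9200/_snapshot/hdfs_repo/snapshot_1/_restore'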

Note that ES uses Lucene internally, but the Lucene segments are just one part
of its internal data and metadata.

Recreating an ES index directly into a job means hacking and reimplementing a lot of the distributed work that ES is 
already doing for you without a lot (if any) performance gains:
- each job/task would have its own ES instance.  This means a 1:1 mapping between the job tasks and ES nodes which is a 
waste of resources.
- each ES instance would rely on the running task/job machine. This can overload the hardware since you are forced to 
co-locate the two whether you want it or not.
- at the end each ES instance would have to export its data somehow. Since each node only gets some chunk of the data, 
the indices would have to be aggregated.
This implies significant I/O (since you are moving the same data multiple times) and at least twice the amount of disk 
space.  The network I/O gets significantly amplified when using HDFS for indexing since the disk is not local; with ES 
you can choose to use HDFS or not (for best performance I would advise against it).


Consider here the allocation of data within ES shards/nodes (based on the cluster topology + user settings). For the 
most part, this will be similar to another reindexing.


The current approach of es-hadoop has none of these issues and all the benefits. You can scale Hadoop or ES independent 
of each other - your job can have 10s (or 100s in some cases) of tasks that are streaming data to an ES cluster of 5-10 
beefy nodes. You can start with ES co-located on the same physical machines as Hadoop and, as you grow move some or all 
the nodes to a different setup.
Since es-hadoop parallelizes _both_ reads and writes, the hadoop job gets full access to the ES cluster; the bigger the 
target index is, the more shards it can talk to in parallel.


Additionally, there's minimal I/O - we only move the data needed _once_ to ES.

If you have a performance problem caused by es-hadoop, I'd be happy to look at 
the numbers/stats.

Hope this helps,

[1] http://www.elasticsearch.org/blog/elasticsearch-hadoop-1-3-m2/



On Saturday, June 22, 2013 10:18:57 AM UTC-4, Costin Leau wrote:

I'm not sure what you mean by "offline in Hadoop"...
Indexing the data requires ES or you could try and replicate it manually 
but I would argue you'll end up duplicating
the
work done in ES.
You could potentially setup a smaller (even one node) ES cluster just for 
indexing in parallel or collocated with your
Hadoop cluster - you could use this to do the indexing and then copy the 
indexes over to the live cluster.
That is, you'll have two ES clusters: one for staging/indexing and another 
one for live/read-only data...


On 21/06/2013 9:27 PM, Jack Liu wrote:
> Thanks Costin,
>
> I am afraid that I am not allowed to use it ( or any API), because of the 
cluster policy. What I am looking for is to
> complete the indexing part entirely offline
> in the hadoop, is it feasible though?
>
>
>
> On Friday, June 21, 2013 10:47:25 AM UTC-7, Costin Leau wrote:
>
> Have you looked at Elasticsearch-Hadoop [1] ? You can use it to 
stream data to/from ES to/from Hadoop.
>
> [1]https://github.com/elasticsearch/elasticsearch-hadoop/ 

>
>
> On 21/06/2013 8:38 PM, Jack Liu wrote:
> > Hi all,
> >
> > I am new to ES, and we have large set of data need to be indexed 
into ES cluster daily (there is n

Re: Exact phrase match - city names example

2014-02-26 Thread thale jacobs
I am having a similar problem too.  Here is how I set up the
test index:

Create the index:
curl -s -XPUT 'localhost:9200/test' -d '{
"mappings": {
"properties": {
"name": {
"street": {
"type": "string",
"index_analyzer": "not_analyzed",
"search_analyzer": "not_analyzed",
"index" : "not_analyzed"
}
}
}
}
}'



Insert some data:
curl -s -XPUT 'localhost:9200/test/name/5' -d '{ "street": ["E Main St"]}'
curl -s -XPUT 'localhost:9200/test/name/6' -d '{ "street": ["W Main St"] }'
curl -s -XPUT 'localhost:9200/test/name/7' -d '{ "street": ["East Main Rd"] 
}'
curl -s -XPUT 'localhost:9200/test/name/8' -d '{ "street": ["West Main Rd"] 
}'
curl -s -XPUT 'localhost:9200/test/name/9' -d '{ "street": ["Main"] }'
curl -s -XPUT 'localhost:9200/test/name/10' -d '{ "street": ["Main St"] }'




--Now attempt to search for "Main"... Not "Main St", Not "East Main Rd"...I 
only want to return doc #9 - "Main"
curl -s -XGET 'localhost:9200/test/_search?pretty=true' -d '{
   "query":{
  "bool":{
 "must":[
{
   "match":{
  "street":{
 "query":"main",
 "type":"phrase",
 "analyzer" : "keyword"
  }
   }
}
 ]
  }
   }
}';

The best document returned is "Main", but I don't know how to filter out 
the others that are not exact matches (although they contain matching 
terms).
...
Here the results from my example above:
  "_score" : 0.2876821, "_source" : { "street": ["Main"] }
  "_score" : 0.25316024, "_source" : { "street": ["East Main Rd"] }
  "_score" : 0.25316024, "_source" : { "street": ["W Main St"] }
  "_score" : 0.25316024, "_source" : { "street": ["E Main St"]}
  "_score" : 0.1805489, "_source" : { "street": ["Main St"] }
  "_score" : 0.14638957, "_source" : { "street": ["West Main Rd"] }





On Thursday, June 14, 2012 3:38:31 PM UTC-4, Colin Dellow wrote:
>
> Does "index": "not_analyzed" not work for you (
> http://www.elasticsearch.org/guide/reference/mapping/core-types.html) ?
>
>
> On Thursday, 14 June 2012 14:02:28 UTC-4, Greg Silin wrote:
>>
>> Hi,
>> One of our fields in the index stores city names, and we need to ensure 
>> that the term is matched exactly.
>>
>> So if we have "san francisco" indexed, we need to ensure that *only* the 
>> term "san francisco" matches; "san" or "francisco" or "south san francisco" 
>> should all be misses.
>>
>> In particular, I don't have a solution on how to make sure "san 
>> francisco" does not match against "south san francisco"
>>
>> Thanks
>> -greg
>>
>



Re: Interesting question on Transaction Log record mutability

2014-02-26 Thread Binh Ly
Thanks, I think I understand better now. I deleted my previous post so that 
I can clarify better. The transaction log is just a backup mechanism for 
durability. When you index a document, it eventually goes into a segment 
(in memory). When you update it, the old doc is marked as deleted and then 
a new one is indexed into a/the segment. If no flush/commit has been made 
so far, the documents/segments are still in memory and each operation is 
also recorded in the transaction log (one for the first index, and then 
another for the update, and so on). When you do a flush, the in-memory 
segments are then written to disk and then the transaction log is emptied 
out (since we no longer need it as "backup" at this point). If on the other 
hand you simply do a refresh, the "new" segments in memory are simply made 
searchable (even though they are not necessarily written to disk yet) and 
no flush to disk happens. In this case, the transaction log still contains 
whatever it had in it so far.

So to answer your question, each update will require a new document to be 
indexed (no way around it). And the transaction log is probably not 
something that would matter in your scenario. I hope that helps. :)
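
(For reference, both operations can also be triggered explicitly; a sketch
against a hypothetical index:)

curl -XPOST 'localhost:9200/myindex/_flush'    # write in-memory segments to disk, empty the transaction log
curl -XPOST 'localhost:9200/myindex/_refresh'  # make recent segments searchable without flushing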



Re: Migration from 0.90.10 to 1.0.1

2014-02-26 Thread Mark Walkom
What OS are you on, and are you using the packaged version or the standalone
(zip)?

Regards,
Mark Walkom

Infrastructure Engineer
Campaign Monitor
email: ma...@campaignmonitor.com
web: www.campaignmonitor.com


On 27 February 2014 04:31, Mariano Battistessa wrote:

> Hello,
>
> I have installed the version 0.90.10 of Elasticsearch. I have large
> amounts of indexed information.
>
> How can I migrate the information to Elasticsearch 1.0.1?
>


How to absorb a lot of incremental partial updates efficiently on a denormalized record?

2014-02-26 Thread Yuri Panchenko
Guys,

I'm evaluating a denormalized data structure in ES that basically looks 
like a Customer record with a lot of transactions with dollar amounts and 
dates.  It roughly looks like this:

{
  "id": 123,
  "name": "Gavin",
  ...
  "transactions": {
    "txn_uid_1": { "date": "02-19-2013", "amount": 19.99 },
    "txn_uid_2": { "date": "02-20-2013", "amount": 23.00 },
    ...
    "txn_uid_N": { "date": "02-21-2013", "amount": 99.99 }
  }
}

The transactions in particular can come in quite frequently and in batches.
I would like to be able to change the document using a batch of partial
updates carrying the changes to the same document.  But since the overall
document can be quite beefy (customer name, characteristics, etc.), I would
like to avoid re-indexing the same document per single partial update.  I
would like to be able to turn off ES indexing, apply the batch (hoping the
batch will be merged in the transaction log), and turn indexing back on,
hoping the total end result will be re-indexed.  Is this not possible?  If
not, what is the best way to solve this type of use case?  Would you
normalize the data structure into a parent Customer with children, etc.?
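
(As a sketch of what such a batch might look like with the bulk API - index
and type names are hypothetical, and note that each update still re-indexes
the whole document internally:)

curl -XPOST 'localhost:9200/customers/customer/_bulk' -d '
{ "update": { "_id": "123", "_retry_on_conflict": 3 } }
{ "doc": { "transactions": { "txn_uid_1": { "date": "02-19-2013", "amount": 19.99 } } } }
{ "update": { "_id": "123", "_retry_on_conflict": 3 } }
{ "doc": { "transactions": { "txn_uid_2": { "date": "02-20-2013", "amount": 23.00 } } } }
'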

How efficiently can ES handle many partial updates to the same document if
the document is, say, a few pages long with 20-30 different fields, some of
which are multi-valued arrays?

Thanks so much for your input!!



Re: How to generate ES index in the hadoop

2014-02-26 Thread drew dahlke
Hi Costin,

We're very interested in offline processing as well. To draw a parallel to 
HBase, you could write a hadoop job that writes out to a table over the 
thrift API. However if you're going to load in many terabytes of data, 
there's the option to write out directly to the HTable file format and bulk 
load the file into your cluster once generated. Bulk loading was orders of 
magnitude faster than the HTTP based API.

Writing out lucene segments from a hadoop job is nothing new (check out the 
katta project http://katta.sourceforge.net/). I saw that ES has snapshot 
backup/restore in the pipeline. It'd be fantastic if we could write hadoop 
jobs that output data in the same format as ES backups and then use the 
restore functionality in ES to bulk load the data directly without having 
to go through a REST API. I feel like that would be faster and it would 
provide the flexibility to scale out the hadoop cluster independently of 
the ES cluster.


On Saturday, June 22, 2013 10:18:57 AM UTC-4, Costin Leau wrote:
>
> I'm not sure what you mean by "offline in Hadoop"... 
> Indexing the data requires ES or you could try and replicate it manually 
> but I would argue you'll end up duplicating the 
> work done in ES. 
> You could potentially setup a smaller (even one node) ES cluster just for 
> indexing in parallel or collocated with your 
> Hadoop cluster - you could use this to do the indexing and then copy the 
> indexes over to the live cluster. 
> That is, you'll have two ES clusters: one for staging/indexing and another 
> one for live/read-only data... 
>
>
> On 21/06/2013 9:27 PM, Jack Liu wrote: 
> > Thanks Costin, 
> > 
> > I am afraid that I am not allowed to use it ( or any API), because of 
> the cluster policy. What I am looking for is to 
> > complete the indexing part entirely offline 
> > in the hadoop, is it feasible though? 
> > 
> > 
> > 
> > On Friday, June 21, 2013 10:47:25 AM UTC-7, Costin Leau wrote: 
> > 
> > Have you looked at Elasticsearch-Hadoop [1] ? You can use it to 
> stream data to/from ES to/from Hadoop. 
> > 
> > [1] https://github.com/elasticsearch/elasticsearch-hadoop/ <
> https://github.com/elasticsearch/elasticsearch-hadoop/> 
> > 
> > On 21/06/2013 8:38 PM, Jack Liu wrote: 
> > > Hi all, 
> > > 
> > > I am new to ES, and we have large set of data need to be indexed 
> into ES cluster daily (there is no delta available, we 
> > > only have 7~8 nodes). 
> > > I know use mapper function to directly call client api should be 
> fine, however, our hadoop cluster policy does not allow 
> > > that. 
> > > So I am wondering if there is a way to just generate ES index in 
> the hadoop, and then copy them into the cluster and ES 
> > > could pick them up when reloading. 
> > > Or could anyone point me to right place in the source code that is 
> related to it. 
> > > 
> > > Any suggestion could be very helpful ! 
> > > 
> > > Many thanks 
> > > Jack 
> > > 
> > > -- 
> > > You received this message because you are subscribed to the Google 
> Groups "elasticsearch" group. 
> > > To unsubscribe from this group and stop receiving emails from it, 
> send an email to 
> > >elasticsearc...@googlegroups.com . 
> > > For more options, visithttps://groups.google.com/groups/opt_out <
> https://groups.google.com/groups/opt_out>. 
> > > 
> > > 
> > 
> > -- 
> > Costin 
> > 
> > -- 
> > You received this message because you are subscribed to the Google 
> Groups "elasticsearch" group. 
> > To unsubscribe from this group and stop receiving emails from it, send 
> an email to 
> > elasticsearc...@googlegroups.com . 
> > For more options, visit https://groups.google.com/groups/opt_out. 
> > 
> > 
>
> -- 
> Costin 
>

-- 
You received this message because you are subscribed to the Google Groups 
"elasticsearch" group.
To unsubscribe from this group and stop receiving emails from it, send an email 
to elasticsearch+unsubscr...@googlegroups.com.
To view this discussion on the web visit 
https://groups.google.com/d/msgid/elasticsearch/2a32eb96-5c30-491a-a501-0a6950d1918f%40googlegroups.com.
For more options, visit https://groups.google.com/groups/opt_out.


Re: How to visualize statistics on time series data in Kibana

2014-02-26 Thread Binh Ly
Oh yeah forgot about the datatype - that's good that you caught that. Good 
to hear!

-- 
You received this message because you are subscribed to the Google Groups 
"elasticsearch" group.
To unsubscribe from this group and stop receiving emails from it, send an email 
to elasticsearch+unsubscr...@googlegroups.com.
To view this discussion on the web visit 
https://groups.google.com/d/msgid/elasticsearch/00e68528-8744-4a43-bf3b-f42982929230%40googlegroups.com.
For more options, visit https://groups.google.com/groups/opt_out.


Re: Interesting question on Transaction Log record mutability

2014-02-26 Thread Yuri Panchenko
Thanks Binh, but I don't think you got the full gist of my question.  I 
want to minimize reindexing the same document over and over.
 What I would like to do is to turn off indexing/refreshing and even 
transaction log flushing in between of the batched partial updates.  If I 
do turn off all of these mechanisms and send a batch of partial updates to 
the same document, then it seems there would be no need to reindex the 
document into Lucene segments too many times.  The whole batch could 
operate on the same document and even increment the version numbers in the 
transaction log itself.  But I think you're implying that the document 
would be reindexed into a lucene segment per partial update?  What I'm 
looking for is roughly this sequence of events:

1. document A is indexed and merged into the segment: document VERSION 1
2. turn off all indexing and transaction log flushing
3. send in a batch of changes to document A containing partial updates: { 
A', A'', A''', A'''' }
4. transaction log operates on document A applying the partial updates above
5. modified document A now looks like A'''' and shows document VERSION 5
6. turn on indexing and transaction log flushing
7. document A with version 5 gets merged and indexed into the segment

What I want to achieve is to absorb a lot of incremental updates to a 
document in the transaction log without re-indexing per partial update.  Is 
this possible?
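
For concreteness, this is the shape of the batch I have in mind, written as 
a _bulk sketch (the index, type and field names here are made up):

curl -XPOST 'localhost:9200/_bulk' -d '
{ "update" : { "_index" : "test", "_type" : "customer", "_id" : "A" } }
{ "doc" : { "field1" : "v1" } }
{ "update" : { "_index" : "test", "_type" : "customer", "_id" : "A" } }
{ "doc" : { "field2" : "v2" } }
'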

Thanks!!





On Wednesday, February 26, 2014 5:52:24 AM UTC-8, Binh Ly wrote:
>
> Yes each partial update will record to the transaction log. Whenever the 
> log is flushed, each update is replayed and the document version is 
> incremented per update.
>

-- 
You received this message because you are subscribed to the Google Groups 
"elasticsearch" group.
To unsubscribe from this group and stop receiving emails from it, send an email 
to elasticsearch+unsubscr...@googlegroups.com.
To view this discussion on the web visit 
https://groups.google.com/d/msgid/elasticsearch/998f00e6-42b2-44ce-a4f2-1c90e68750ff%40googlegroups.com.
For more options, visit https://groups.google.com/groups/opt_out.


Too many nodes started up on some data nodes - best approach to fix?

2014-02-26 Thread Josh Harrison
I restarted my cluster the other day, but something odd stuck, resulting in 
15/16 data nodes starting up an extra ES instance in the same cluster. This 
ended badly as there were two nodes with identical display names, the 
system locked up, etc.
When restarting again, to my horror, we were missing shards. I quickly 
figured out that the missing shards had gotten moved into the second 
instance storage location.
What is the best way to resolve this? Should we spawn second ES 
instances on the culprit machines (with different instance names), or can a 
simple 
mv escluster/nodes/1/indices/data1/* escluster/nodes/0/indices/data1/ 
do the job?
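
Side note for anyone else who hits this: if I'm reading the docs right, a 
setting like the following in elasticsearch.yml should stop a second 
instance from ever starting against the same data directory (I haven't 
verified this myself):

# refuse to start more than one instance against this node's data path
node.max_local_storage_nodes: 1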

Thanks!

-- 
You received this message because you are subscribed to the Google Groups 
"elasticsearch" group.
To unsubscribe from this group and stop receiving emails from it, send an email 
to elasticsearch+unsubscr...@googlegroups.com.
To view this discussion on the web visit 
https://groups.google.com/d/msgid/elasticsearch/5ec79698-5ea3-4ba9-a81d-0665a23a9bd5%40googlegroups.com.
For more options, visit https://groups.google.com/groups/opt_out.


Re: How to visualize statistics on time series data in Kibana

2014-02-26 Thread David Snigier Jr.
That did the trick! I was able to keep the spaces in the field name, but
did need to cast the field to a float in logstash for the metric to work.

Really loving how quickly valuable data hidden in the logs can be drawn out
and visualized with logstash+elasticsearch+kibana. Props to y'all for
making it happen.

Thanks!
-Dave


On Wed, Feb 26, 2014 at 9:52 AM, Binh Ly  wrote:

> When you add a Histogram panel, look in the setting Chart Value. There are
> options for max and mean in there and then in the Value Field, you can
> specify "scan duration" (or "connect duration") - I'm not 100% sure if the
> spaces in your field name might fail but if it does, you'll probably need
> to fix your LS config to output field names with no spaces. The only
> limitation right now is you can't plot multiple time series stats (Chart
> Value + Value Field) in 1 histogram at the moment. So you'll need to create
> separate histograms per Chart Value + Value Field.
>
> --
> You received this message because you are subscribed to a topic in the
> Google Groups "elasticsearch" group.
> To unsubscribe from this topic, visit
> https://groups.google.com/d/topic/elasticsearch/mc7bmixJGe8/unsubscribe.
> To unsubscribe from this group and all its topics, send an email to
> elasticsearch+unsubscr...@googlegroups.com.
> To view this discussion on the web visit
> https://groups.google.com/d/msgid/elasticsearch/80167728-cacd-4be1-829b-9fc2abb1ab3a%40googlegroups.com
> .
>
> For more options, visit https://groups.google.com/groups/opt_out.
>

-- 
You received this message because you are subscribed to the Google Groups 
"elasticsearch" group.
To unsubscribe from this group and stop receiving emails from it, send an email 
to elasticsearch+unsubscr...@googlegroups.com.
To view this discussion on the web visit 
https://groups.google.com/d/msgid/elasticsearch/CAA%3DO8vL7m1CJ9GRrmnLyJOHmWKKNoFgQc2vp-0b6pBC8ZmdHXw%40mail.gmail.com.
For more options, visit https://groups.google.com/groups/opt_out.


Re: Put mapping documentation -- What options are available? Specifically, how to store a property but without indexing it?

2014-02-26 Thread Daniel Winterstein
Sorry Ivan! I'm not having much luck on this thread.
Daniel

Sent from my phone. Please excuse the brevity.
On 26 Feb 2014 01:58, "Ivan Brusic"  wrote:

> Luke? :)
>
>
> On Tue, Feb 25, 2014 at 1:09 PM, Daniel Winterstein <
> daniel.winterst...@gmail.com> wrote:
>
>> Dear Hariharan, Alex, Luke,
>>
>> My apologies. You're quite right. The information is there -- I just
>> didn't read far enough down.
>>
>> Thank you for your help & persistence.
>>
>> Best regards,
>>  - Daniel
>>
>> --
>> You received this message because you are subscribed to the Google Groups
>> "elasticsearch" group.
>> To unsubscribe from this group and stop receiving emails from it, send an
>> email to elasticsearch+unsubscr...@googlegroups.com.
>> To view this discussion on the web visit
>> https://groups.google.com/d/msgid/elasticsearch/CAEmLStnHQCUuMPJHhbcoq8_iQgFX%3D22t9%3DS9gOwWC7C1OtDToA%40mail.gmail.com
>> .
>> For more options, visit https://groups.google.com/groups/opt_out.
>>
>
>  --
> You received this message because you are subscribed to a topic in the
> Google Groups "elasticsearch" group.
> To unsubscribe from this topic, visit
> https://groups.google.com/d/topic/elasticsearch/qER5uOq2A20/unsubscribe.
> To unsubscribe from this group and all its topics, send an email to
> elasticsearch+unsubscr...@googlegroups.com.
> To view this discussion on the web visit
> https://groups.google.com/d/msgid/elasticsearch/CALY%3DcQD%3Dk0htmXcEwXBBB4T%2BwqNAyA_fOz41DX5cinf3aYsQGg%40mail.gmail.com
> .
> For more options, visit https://groups.google.com/groups/opt_out.
>

-- 
You received this message because you are subscribed to the Google Groups 
"elasticsearch" group.
To unsubscribe from this group and stop receiving emails from it, send an email 
to elasticsearch+unsubscr...@googlegroups.com.
To view this discussion on the web visit 
https://groups.google.com/d/msgid/elasticsearch/CAEmLSt%3DjVdN_j1GQ3BYbpW0REeTOSLOJdH39UxCTmy7mnsbpEg%40mail.gmail.com.
For more options, visit https://groups.google.com/groups/opt_out.


Re: Elasticsearch 1.0.0 is now GA

2014-02-26 Thread Tony Su
Cool,
I'll try that next.
 
Thx,
Tony
 

On Tuesday, February 25, 2014 7:56:34 AM UTC-8, InquiringMind wrote:

> I always start Elasticsearch from within my own wrapper script, es.sh.
>
> Inside this wrapper script is the following incantation:
>
> NODE_OPT="-Des.node.name=$(uname -n | cut -d'.' -f1)"
>
> This is verified to work on Linux, Mac OS X, and Solaris (at least).
>
> I then pass $NODE_OPT as a command-line argument to the elasticsearch 
> start-up script.
>
> BTW, I seem to recall reading that the "es." prefix on the node.name 
> variable is no longer needed for 1.0 GA. But it still works fine, so I 
> have left it there.
>
> This has always worked since ES 0.19.4 (the very first version I installed 
> and started using). I worked closely with our deployment engineer, and we 
> settled on a set of wrapper scripts that let me start everything on my 
> laptop in exactly the same way that it all starts on a production server.
>
> Brian
>
> On Tuesday, February 25, 2014 10:21:29 AM UTC-5, Tony Su wrote:
>>
>> One other issue.
>>  
>> I have never been able to deploy an elasticsearch.yml which names the 
>> cluster node the same as the machine hostname despite the suggestions in 
>> another thread. It just won't work, and based on another thread I strongly 
>> suspect the underlying Java code implements single quotes instead of double 
>> quotes when evaluating the variable. 
>>  
>> So, because it's a unique variable that needs to be set on each machine, 
>> that part of the config won't allow simply pointing all nodes to the same 
>> config script.
>>  
>> Is why, short of looking for the error in the Java code I've been looking 
>> at various simple and more enterprise tools that write individual config 
>> files to each node.
>>  
>> Tony
>>
>

-- 
You received this message because you are subscribed to the Google Groups 
"elasticsearch" group.
To unsubscribe from this group and stop receiving emails from it, send an email 
to elasticsearch+unsubscr...@googlegroups.com.
To view this discussion on the web visit 
https://groups.google.com/d/msgid/elasticsearch/b154d115-912d-4293-939d-5a88282d652b%40googlegroups.com.
For more options, visit https://groups.google.com/groups/opt_out.


Re: Elasticsearch 1.0.0 is now GA

2014-02-26 Thread Tony Su
It's not working for me with or without any quotes.
 
I don't think I'm just making some kind of incredible user error, and I'm 
not talking about the user inserting quotes (or not)... I'm talking about 
the underlying Java code which accepts the input.
 
Although I can't think of how this would have anything to do with the 
distro, I've experienced the same on both CentOS and openSUSE nodes.
 
Tony
 
 

On Tuesday, February 25, 2014 2:16:51 PM UTC-8, Ivan Brusic wrote:

> I do not use quotes at all. Simply:
>
> node.name: ${HOSTNAME}
>
> -- 
> Ivan
>
>
> On Tue, Feb 25, 2014 at 7:56 AM, InquiringMind 
> 
> > wrote:
>
>> I always start Elasticsearch from within my own wrapper script, es.sh.
>>
>> Inside this wrapper script is the following incantation:
>>
>> NODE_OPT="-Des.node.name=$(uname -n | cut -d'.' -f1)"
>>
>> This is verified to work on Linux, Mac OS X, and Solaris (at least).
>>
>> I then pass $NODE_OPT as a command-line argument to the 
>> elasticsearch start-up script.
>>
>> BTW, I seem to recall reading that the "es." prefix on the node.name 
>> variable is no longer needed for 1.0 GA. But it still works fine, so I 
>> have left it there.
>>
>> This has always worked since ES 0.19.4 (the very first version I 
>> installed and started using). I worked closely with our deployment 
>> engineer, and we settled on a set of wrapper scripts that let me start 
>> everything on my laptop in exactly the same way that it all starts on a 
>> production server.
>>
>> Brian
>>
>>
>> On Tuesday, February 25, 2014 10:21:29 AM UTC-5, Tony Su wrote:
>>>
>>> One other issue.
>>>  
>>> I have never been able to deploy an elasticsearch.yml which names the 
>>> cluster node the same as the machine hostname despite the suggestions in 
>>> another thread. It just won't work, and based on another thread I strongly 
>>> suspect the underlying Java code implements single quotes instead of double 
>>> quotes when evaluating the variable. 
>>>  
>>> So, because it's a unique variable that needs to be set on each machine, 
>>> that part of the config won't allow simply pointing all nodes to the same 
>>> config script.
>>>  
>>> Is why, short of looking for the error in the Java code I've been 
>>> looking at various simple and more enterprise tools that write individual 
>>> config files to each node.
>>>  
>>> Tony
>>>
>>  -- 
>> You received this message because you are subscribed to the Google Groups 
>> "elasticsearch" group.
>> To unsubscribe from this group and stop receiving emails from it, send an 
>> email to elasticsearc...@googlegroups.com .
>> To view this discussion on the web visit 
>> https://groups.google.com/d/msgid/elasticsearch/5af309d0-22b5-4809-907d-92b099b36632%40googlegroups.com
>> .
>>
>> For more options, visit https://groups.google.com/groups/opt_out.
>>
>
>

-- 
You received this message because you are subscribed to the Google Groups 
"elasticsearch" group.
To unsubscribe from this group and stop receiving emails from it, send an email 
to elasticsearch+unsubscr...@googlegroups.com.
To view this discussion on the web visit 
https://groups.google.com/d/msgid/elasticsearch/ba4895d3-fc0f-4185-8dec-0ad4f74b24b2%40googlegroups.com.
For more options, visit https://groups.google.com/groups/opt_out.


Re: Kibana empty after upgrade to ES 1.0.1

2014-02-26 Thread Terry Healy
I had to modify my URL back to point to 
Glassfish: http://192.168.4.254:8080/#/dashboard/file/guided.json


On Wednesday, February 26, 2014 10:26:02 AM UTC-5, Terry Healy wrote:
>
> I just upgraded all my ES systems to 1.0.1 and they seem to be working 
> fine - except for Kibana 3. I had installed Kibana 3 Milestone pre-5 
> (8512132). Previously I was using it just with _all enabled. Now when I 
> attempt to use a filter for "*" and a time filter for the past hour or so, 
> it lists none of my 3 active indices. 
>
> Prior to the upgrades, my bookmark went to 
> http://192.168.4.254:8080/#/dashboard/file/guided.json. The Kibana 
> installation instructions seem to be telling me that this URL should now be 
> just  http://192.168.4.254:9200, which just gives me node status (below) 
> so I must be misunderstanding something.
>
> {
>   "status" : 200,
>   "name" : "t5",
>   "version" : {
> "number" : "1.0.1",
> "build_hash" : "5c03844e1978e5cc924dab2a423dc63ce881c42b",
> "build_timestamp" : "2014-02-25T15:52:53Z",
> "build_snapshot" : false,
> "lucene_version" : "4.6"
>   },
>   "tagline" : "You Know, for Search"
> }
>
>
> In config.js, my elasticsearch is set as:
>
>
> /** @scratch /configuration/config.js/5
>  *  elasticsearch
>  *
>  * The URL to your elasticsearch server. You almost certainly don't
>  * want +http://localhost:9200+ here. Even if Kibana and Elasticsearch 
> are on
>  * the same host. By default this will attempt to reach ES at the same 
> host you have
>  * kibana installed on. You probably want to set it to the FQDN of your
>  * elasticsearch host
>  */
> elasticsearch: "http://192.168.4.254:9200",
>
>
> Thanks for any suggestions.
>
>

-- 
You received this message because you are subscribed to the Google Groups 
"elasticsearch" group.
To unsubscribe from this group and stop receiving emails from it, send an email 
to elasticsearch+unsubscr...@googlegroups.com.
To view this discussion on the web visit 
https://groups.google.com/d/msgid/elasticsearch/28d253e3-495f-4938-a8bb-7c11307615e9%40googlegroups.com.
For more options, visit https://groups.google.com/groups/opt_out.


Migration from 0.90.10 to 1.0.1

2014-02-26 Thread Mariano Battistessa
Hello, 

I have version 0.90.10 of Elasticsearch installed, with large amounts of 
indexed information. 

How can I migrate the information to Elasticsearch 1.0.1?

-- 
You received this message because you are subscribed to the Google Groups 
"elasticsearch" group.
To unsubscribe from this group and stop receiving emails from it, send an email 
to elasticsearch+unsubscr...@googlegroups.com.
To view this discussion on the web visit 
https://groups.google.com/d/msgid/elasticsearch/16ac0990-8243-4294-9609-876e343f9e39%40googlegroups.com.
For more options, visit https://groups.google.com/groups/opt_out.


Re: BigDecimal support

2014-02-26 Thread joergpra...@gmail.com
ES accepts BigDecimal input. You can specify scale and rounding mode to
format the BigDecimal.

https://github.com/jprante/elasticsearch/commit/8ef8cd149b867e3e45bc3055dfd6da80e4e9c7ec

Internally, BigDecimal is automatically converted to a JSON string if the
number does not fit into double format. Numbers have an advantage because
Lucene can use them for range searches.

But I agree, another option could be to enforce string conversion in any
case, for example storing currency values as strings for financial
services, without arithmetic operations in the index.

Maybe the toEngineeringString() was not a smart decision and
toPlainString() works better.

So I would welcome improvements, or should I suggest one in a pull request?

Jörg



On Wed, Feb 26, 2014 at 6:05 PM, mooky  wrote:

> In financial services space, we almost never use float/double in our
> domain - we always use BigDecimal.
>
> In elastic, I would like to be able to index/store BigDecimal in a
> lossless manner (ie what I get back from _source has the same precision,
> etc as what I put in).
>
> When I have had to preserve the json serialisation of BigDecimal, I have
> usually had custom serialiser/deserialisers that printed it out as a json
> number - but whose textual value was toPlainString(). When deserialising,
> creating the BigDecimal with the string value (e.g. '42.5400') maintained
> the precision that was originally serialised
> e.g.
>
> {
>   verySmallNumber : 0.012000,
>   otherNumber : 42.5400
> }
>
> Perhaps elastic could index bigdecimal as a double - but store it in the
> source in a lossless fashion.
> It would require a user setting, I guess, to treat all floating point
> numbers as BigDecimal.
>
> Thoughts?
>
> --
> You received this message because you are subscribed to the Google Groups
> "elasticsearch" group.
> To unsubscribe from this group and stop receiving emails from it, send an
> email to elasticsearch+unsubscr...@googlegroups.com.
> To view this discussion on the web visit
> https://groups.google.com/d/msgid/elasticsearch/b54dfd5a-3a0e-4946-aa5f-28b3794a92ac%40googlegroups.com
> .
> For more options, visit https://groups.google.com/groups/opt_out.
>

-- 
You received this message because you are subscribed to the Google Groups 
"elasticsearch" group.
To unsubscribe from this group and stop receiving emails from it, send an email 
to elasticsearch+unsubscr...@googlegroups.com.
To view this discussion on the web visit 
https://groups.google.com/d/msgid/elasticsearch/CAKdsXoGBKpPHFN%3DoFFka%3Dk%3Dtk%3DOLmSqB9kbY0RSOC0nM4C5Lww%40mail.gmail.com.
For more options, visit https://groups.google.com/groups/opt_out.


Re: EsRejectedExecutionException when searching date based indices.

2014-02-26 Thread joergpra...@gmail.com
I think you have a misconception about shard over-allocation and
re-indexing, so you should read

https://groups.google.com/d/msg/elasticsearch/49q-_AgQCp8/MRol0t9asEcJ

where kimchy explains how over-allocation of shards work.

If you have time-series indexes, you need not 20 shards per day, just in
fear to be able to stretch out to 20 nodes in the future. That is only true
for single, static, non-time-series indexes. With index aliasing and
routing applied to time-series data, 1 shard (+1 replica) per day might be
enough (maybe some more like 2 or 3, or more replica, it depends on
balancing out indexing and search load). For a year with a shard per day,
you will end up in 365 shards plus 365 replica shards which is quite a
handful, and in theory enough to distribute over 365 nodes. If shards start
to get tight on resources, use index aliasing and routing. Or just add
nodes, and ES will automatically redistribute the existing shards to become
happy again. No re-indexing at all.
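
For illustration, a sketch of adding an alias with routing plus a filter 
(the index, alias, and field values here are invented):

curl -XPOST 'localhost:9200/_aliases' -d '{
  "actions" : [
    { "add" : {
        "index" : "messages_20140101",
        "alias" : "messages-user12",
        "routing" : "12",
        "filter" : { "term" : { "user_id" : "12" } }
    } }
  ]
}'

Index and search requests that go through such an alias touch only the 
shard the routing value maps to, instead of fanning out to all shards.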

Jörg

-- 
You received this message because you are subscribed to the Google Groups 
"elasticsearch" group.
To unsubscribe from this group and stop receiving emails from it, send an email 
to elasticsearch+unsubscr...@googlegroups.com.
To view this discussion on the web visit 
https://groups.google.com/d/msgid/elasticsearch/CAKdsXoHoPHqx6GZoPw2QFqjRhc%2BS0AX93fe1WBuwFp_0ZA08NQ%40mail.gmail.com.
For more options, visit https://groups.google.com/groups/opt_out.


Re: Removing elasticsearch logs

2014-02-26 Thread computer engineer

thanks. Will look into that. Do you perhaps know where the directory is 
that stores all these messages or log files

On Wednesday, February 26, 2014 8:10:09 AM UTC-5, Binh Ly wrote:
>
> There is currently discussion around this, but in the meantime, try this 
> to see if it helps:
>
> https://github.com/elasticsearch/curator
>

-- 
You received this message because you are subscribed to the Google Groups 
"elasticsearch" group.
To unsubscribe from this group and stop receiving emails from it, send an email 
to elasticsearch+unsubscr...@googlegroups.com.
To view this discussion on the web visit 
https://groups.google.com/d/msgid/elasticsearch/184a675d-8280-46b0-81c4-88c96116b05a%40googlegroups.com.
For more options, visit https://groups.google.com/groups/opt_out.


BigDecimal support

2014-02-26 Thread mooky
In the financial services space, we almost never use float/double in our 
domain; we always use BigDecimal.

In elastic, I would like to be able to index/store BigDecimal in a lossless 
manner (ie what I get back from _source has the same precision, etc as what 
I put in).

When I have had to preserve the json serialisation of BigDecimal, I have 
usually had custom serialiser/deserialisers that printed it out as a json 
number - but whose textual value was toPlainString(). When deserialising, 
creating the BigDecimal with the string value (e.g. '42.5400') maintained 
the precision that was originally serialised
e.g.

{
  verySmallNumber : 0.012000,
  otherNumber : 42.5400
}
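
To make the serialiser idea concrete, here is a minimal sketch with Jackson 
(the class name is mine, not from any library):

import java.io.IOException;
import java.math.BigDecimal;
import com.fasterxml.jackson.core.JsonGenerator;
import com.fasterxml.jackson.databind.JsonSerializer;
import com.fasterxml.jackson.databind.SerializerProvider;

// Writes a BigDecimal as a plain-notation JSON number, so '42.5400' keeps
// its trailing zeros instead of becoming 42.54 or 4.254E+1.
public class PlainBigDecimalSerializer extends JsonSerializer<BigDecimal> {
    @Override
    public void serialize(BigDecimal value, JsonGenerator gen, SerializerProvider provider)
            throws IOException {
        // writeNumber(String) emits the given text as a raw JSON number
        gen.writeNumber(value.toPlainString());
    }
}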

Perhaps elastic could index bigdecimal as a double - but store it in the 
source in a lossless fashion.
It would require a user setting, I guess, to treat all floating point 
numbers as BigDecimal.

Thoughts?

-- 
You received this message because you are subscribed to the Google Groups 
"elasticsearch" group.
To unsubscribe from this group and stop receiving emails from it, send an email 
to elasticsearch+unsubscr...@googlegroups.com.
To view this discussion on the web visit 
https://groups.google.com/d/msgid/elasticsearch/b54dfd5a-3a0e-4946-aa5f-28b3794a92ac%40googlegroups.com.
For more options, visit https://groups.google.com/groups/opt_out.


Re: EsRejectedExecutionException when searching date based indices.

2014-02-26 Thread Alex Clark
That is correct, I was mixing the terms "nodes" and "shards" (sorry about 
that).  I'm running the test on a single node (machine).  I've chosen 20 
shards so we could eventually go to a 20-server cluster without 
re-indexing.  It's unlikely we'll ever need to go that high, but we never 
know, and given we receive 750 million messages a day, the thought of 
reindexing after collecting a year's worth of data makes me nervous.  If I 
can "over shard" and avoid a massive reindex then I'll be a happy guy.

I thought about reducing the 20 shards, but even if I go to say 5 shards on 
5 machines (1 shard per machine?) I'll still run into the issue if a 
user searches several years back.  Any other thoughts on a possible 
solution?  Would increasing the queue size be a good option?  Is there a 
downside (performance hit, running out of resources, etc.)?
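
For reference, this is the setting I'd be changing, in elasticsearch.yml 
(assuming I'm reading the thread pool docs right; the 2000 is arbitrary):

# raise the search thread pool queue from its default of 1000
threadpool.search.queue_size: 2000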

Thanks again!

On Tuesday, February 25, 2014 11:32:26 PM UTC-8, David Pilato wrote:
>
> You are mixing nodes and shards, right?
> How many elasticsearch nodes do you have to manage your 7300 shards?
> Why did you set 20 shards per index?
>
> You can increase the queue size in elasticsearch.yml but I'm not sure it's 
> the right thing to do here.
>
> My 2 cents
>
> --
> David ;-)
> Twitter : @dadoonet / @elasticsearchfr / @scrutmydocs
>
>
> Le 26 févr. 2014 à 01:36, Alex Clark > a 
> écrit :
>
> Hello all, I’m getting failed nodes when running searches and I’m hoping 
> someone can point me in the right direction.  I have indices created per 
> day to store messages.  The pattern is pretty straight forward: the index 
> for January 1 is "messages_20140101", for January 2 is "messages_20140102" 
> and so on.  Each index is created against a template that specifies 20 
> shards. A full year will give 365 indices * 20 shards = 7300 nodes. I have 
> recently upgraded to ES 1.0.
>
> When I search for all messages in a year (either using an alias or 
> specifying “messages_2013*”), I get many failed nodes.  The reason given 
> is: “EsRejectedExecutionException[rejected execution (queue capacity 
> 1000) on 
> org.elasticsearch.action.search.type.TransportSearchTypeAction$BaseAsyncAction$4@651b8924]”).
>   
> The more often I search, the fewer failed nodes I get (probably caching in 
> ES) but I can’t get down to 0 failed nodes.  I’m using ES for analytics so 
> the document counts coming back have to be accurate. The aggregate counts 
> will change depending on the number of node failures.  We use the Java API 
> to create a local node to index and search the documents.  However, we also 
> see the issue if we use the URL search API on port 9200.
>
> If I restrict the search for 30 days then I do not see any failures (it’s 
> under 1000 nodes so as expected).  However, it is a pretty common use case 
> for our customers to search messages spanning an entire year.  Any 
> suggestions on how I can prevent these failures?  
>
> Thank you for your help!
>
> -- 
> You received this message because you are subscribed to the Google Groups 
> "elasticsearch" group.
> To unsubscribe from this group and stop receiving emails from it, send an 
> email to elasticsearc...@googlegroups.com .
> To view this discussion on the web visit 
> https://groups.google.com/d/msgid/elasticsearch/9bf6d3bb-34e5-44c4-8d76-24f868d283a0%40googlegroups.com
> .
> For more options, visit https://groups.google.com/groups/opt_out.
>

-- 
You received this message because you are subscribed to the Google Groups 
"elasticsearch" group.
To unsubscribe from this group and stop receiving emails from it, send an email 
to elasticsearch+unsubscr...@googlegroups.com.
To view this discussion on the web visit 
https://groups.google.com/d/msgid/elasticsearch/954f7266-6587-4509-8159-aae5897dc2b6%40googlegroups.com.
For more options, visit https://groups.google.com/groups/opt_out.


Re: Single thread with high CPU usage

2014-02-26 Thread Nikolas Everett
Check to see how much GC you are doing when it spikes.  If it is high, try
to clear the cache:
http://www.elasticsearch.org/guide/en/elasticsearch/reference/current/indices-clearcache.html
I'd try clearing each cache one at a time to see which one helps.  If that
is the problem you can configure Elasticsearch to limit the size of those
caches to some percent of heap.
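
For example, something like this (a sketch; the parameter names are as I
remember them from the page above, so double-check there):

# clear only the filter cache
curl -XPOST 'localhost:9200/_cache/clear?filter=true&field_data=false'
# clear only the field data cache
curl -XPOST 'localhost:9200/_cache/clear?filter=false&field_data=true'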

Nik


On Wed, Feb 26, 2014 at 11:41 AM, Magnus Hyllander <
magnus.hyllan...@gmail.com> wrote:

>  I have an ES 0.90.11 cluster with three nodes (d0, d1, d2), with 4 cores
> and 7GB memory, running Ubuntu and JDK 7u45. The ES instances are all
> master+data, configured with 3.5GB heap size. They are pretty much running
> a vanilla configuration. Logstash is currently storing on average 200 logs
> per second to the cluster, and we use kibana as a frontend. Usually when
> the cluster is started the nodes run at around 20% cpu. However after some
> time, one or more of the nodes will jump up to around 90-100% cpu. And
> there they stay for what appears to be forever (until I tire and restart
> them).
>
> Using "top -H" I can see that there is one thread in each elasticsearch
> process that is using most of the cpu. Here are examples from two of the
> nodes:
>
> Node d1:
>
>   PID USER  PR  NI  VIRT  RES  SHR S  %CPU %MEMTIME+  COMMAND
> 41969 elastic   20   0 5814m 3.5g  11m R  82.8 52.0   1036:30 java
>  45601 elastic   20   0 5814m 3.5g  11m S  31.9 52.0  23:02.45 java
> 41965 elastic   20   0 5814m 3.5g  11m S  19.1 52.0  25:25.97 java
> 41966 elastic   20   0 5814m 3.5g  11m S  12.7 52.0  25:25.95 java
> 41967 elastic   20   0 5814m 3.5g  11m S  12.7 52.0  25:23.10 java
> 41968 elastic   20   0 5814m 3.5g  11m S  12.7 52.0  25:23.27 java
> 45810 elastic   20   0 5814m 3.5g  11m S   6.4 52.0  22:59.55 java
>
> Node d2:
>
>   PID USER  PR  NI  VIRT  RES  SHR S  %CPU %MEMTIME+  COMMAND
> 40604 elastic   20   0 5812m 3.6g  11m R  99.9 53.2 926:23.96 java
> 41487 elastic   20   0 5812m 3.6g  11m S   6.5 53.2   4:35.11 java
> 42443 elastic   20   0 5812m 3.6g  11m S   6.5 53.2  47:03.65 java
> 42446 elastic   20   0 5812m 3.6g  11m S   6.5 53.2  47:05.12 java
> 42447 elastic   20   0 5812m 3.6g  11m S   6.5 53.2  46:38.30 java
> 31827 elastic   20   0 5812m 3.6g  11m S   6.5 53.2   0:00.59 java
>
> As you can see there is one thread in each process that seems to be
> running amok.
>
> I have tried to use the _nodes/hot_threads API to see which thread is
> using the cpu, but I can't identify any single thread with the same cpu
> percentage that top reports. In addition, I have tried using jstack to dump
> the threads, but the stack dump doesn't even list the thread with the
> thread PID from top.
>
> Here are a couple of charts showing the cpu user percentage:
>
> [chart: CPU user percentage per node]
>
> As you can see all the nodes went from 20% to 100% at around 3 PM. At
> midnight I got tired of waiting and restarted ES, one node at a time.
>
> The next chart is from some hours later:
>
> [chart: CPU user percentage per node, some hours later]
>
> In this case the nodes' cpu usage increased at different points in time.
>
> Cpu iowait remains low (5-10%) the whole time.
>
> I'm thinking that maybe this behavior is triggered by large queries, but I
> don't have a specific test case that triggers it.
>
> So, what can I do to find out what is going on? Any help would be greatly
> appreciated!
>
> Regards,
> Magnus Hyllander
>
> --
> You received this message because you are subscribed to the Google Groups
> "elasticsearch" group.
> To unsubscribe from this group and stop receiving emails from it, send an
> email to elasticsearch+unsubscr...@googlegroups.com.
> To view this discussion on the web visit
> https://groups.google.com/d/msgid/elasticsearch/6af1c79d-8402-4de6-9ec2-07893c6b54f2%40googlegroups.com
> .
> For more options, visit https://groups.google.com/groups/opt_out.
>

-- 
You received this message because you are subscribed to the Google Groups 
"elasticsearch" group.
To unsubscribe from this group and stop receiving emails from it, send an email 
to elasticsearch+unsubscr...@googlegroups.com.
To view this discussion on the web visit 
https://groups.google.com/d/msgid/elasticsearch/CAPmjWd2w7MEmj7HhC8uBbrvqgKyESXvMeSg7yzMcVTeRZXAi7Q%40mail.gmail.com.
For more options, visit https://groups.google.com/groups/opt_out.


Single thread with high CPU usage

2014-02-26 Thread Magnus Hyllander
 I have an ES 0.90.11 cluster with three nodes (d0, d1, d2), with 4 cores 
and 7GB memory, running Ubuntu and JDK 7u45. The ES instances are all 
master+data, configured with 3.5GB heap size. They are pretty much running 
a vanilla configuration. Logstash is currently storing on average 200 logs 
per second to the cluster, and we use kibana as a frontend. Usually when 
the cluster is started the nodes run at around 20% cpu. However after some 
time, one or more of the nodes will jump up to around 90-100% cpu. And 
there they stay for what appears to be forever (until I tire and restart 
them).

Using "top -H" I can see that there is one thread in each elasticsearch 
process that is using most of the cpu. Here are examples from two of the 
nodes:

Node d1:

  PID USER  PR  NI  VIRT  RES  SHR S  %CPU %MEMTIME+  COMMAND
41969 elastic   20   0 5814m 3.5g  11m R  82.8 52.0   1036:30 java
45601 elastic   20   0 5814m 3.5g  11m S  31.9 52.0  23:02.45 java
41965 elastic   20   0 5814m 3.5g  11m S  19.1 52.0  25:25.97 java
41966 elastic   20   0 5814m 3.5g  11m S  12.7 52.0  25:25.95 java
41967 elastic   20   0 5814m 3.5g  11m S  12.7 52.0  25:23.10 java
41968 elastic   20   0 5814m 3.5g  11m S  12.7 52.0  25:23.27 java
45810 elastic   20   0 5814m 3.5g  11m S   6.4 52.0  22:59.55 java

Node d2:

  PID USER  PR  NI  VIRT  RES  SHR S  %CPU %MEMTIME+  COMMAND
40604 elastic   20   0 5812m 3.6g  11m R  99.9 53.2 926:23.96 java
41487 elastic   20   0 5812m 3.6g  11m S   6.5 53.2   4:35.11 java
42443 elastic   20   0 5812m 3.6g  11m S   6.5 53.2  47:03.65 java
42446 elastic   20   0 5812m 3.6g  11m S   6.5 53.2  47:05.12 java
42447 elastic   20   0 5812m 3.6g  11m S   6.5 53.2  46:38.30 java
31827 elastic   20   0 5812m 3.6g  11m S   6.5 53.2   0:00.59 java

As you can see there is one thread in each process that seems to be 
running amok. 

I have tried to use the _nodes/hot_threads API to see which thread is using 
the cpu, but I can't identify any single thread with the same cpu 
percentage that top reports. In addition, I have tried using jstack to dump 
the threads, but the stack dump doesn't even list the thread with the 
thread PID from top.
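
(Though maybe I'm missing that top -H prints thread ids in decimal while 
jstack reports them as hex "nid=" values; if so, a mapping along these 
lines should work:)

# convert the top -H thread id to the hex nid that jstack prints
printf 'nid=0x%x\n' 41969                 # prints nid=0xa3f1
jstack $ES_PID | grep -A 5 'nid=0xa3f1'   # ES_PID = the Elasticsearch process id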

Here are a couple of charts showing the cpu user percentage:

[chart: CPU user percentage per node]

As you can see all the nodes went from 20% to 100% at around 3 PM. At 
midnight I got tired of waiting and restarted ES, one node at a time.

The next chart is from some hours later:

[chart: CPU user percentage per node, some hours later]

In this case the nodes' cpu usage increased at different points in time.

Cpu iowait remains low (5-10%) the whole time.

I'm thinking that maybe this behavior is triggered by large queries, but I 
don't have a specific test case that triggers it.

So, what can I do to find out what is going on? Any help would be greatly 
appreciated!

Regards,
Magnus Hyllander

-- 
You received this message because you are subscribed to the Google Groups 
"elasticsearch" group.
To unsubscribe from this group and stop receiving emails from it, send an email 
to elasticsearch+unsubscr...@googlegroups.com.
To view this discussion on the web visit 
https://groups.google.com/d/msgid/elasticsearch/6af1c79d-8402-4de6-9ec2-07893c6b54f2%40googlegroups.com.
For more options, visit https://groups.google.com/groups/opt_out.


Re: Queue capacity and EsRejectedExecutionException leads to loss of data

2014-02-26 Thread David Pilato
I think that adding a comment into the existing issue would be fine.

-- 
David Pilato | Technical Advocate | Elasticsearch.com
@dadoonet | @elasticsearchfr


On 26 February 2014 at 17:00:12, Thomas (thomas.bo...@gmail.com) wrote:

Thanks David,

So this is a RabbitMQ river issue; is there a need to open a separate issue? 
(I've never done the procedure; I'll look into it.)

Thomas

On Wednesday, 26 February 2014 15:48:55 UTC+2, Thomas wrote:
Hi,

We have installed the RabbitMQ river plugin to pull data from our queue and 
add it to ES. The thing is that at some point we receive the following 
exception and, as a result, we lose data.

[1775]: index [events-idx], type [click], id 
[3f6e4604146b435aabcf4ea5a493fd32], message 
[EsRejectedExecutionException[rejected execution (queue capacity 50) on 
org.elasticsearch.action.support.replication.TransportShardReplicationOperationAction$AsyncShardOperationAction$1@12843ca2]]

We have changed the queue size configuration to 1000 and the problem 
disappeared. 

My question is: is there any configuration/way to tell ES, instead of 
throwing this exception and discarding the document, to wait for available 
resources (with the corresponding performance impact)?

Thanks

Thomas



--
You received this message because you are subscribed to the Google Groups 
"elasticsearch" group.
To unsubscribe from this group and stop receiving emails from it, send an email 
to elasticsearch+unsubscr...@googlegroups.com.
To view this discussion on the web visit 
https://groups.google.com/d/msgid/elasticsearch/03dcb0ea-2b6a-478b-b678-f52ecbc09298%40googlegroups.com.
For more options, visit https://groups.google.com/groups/opt_out.

-- 
You received this message because you are subscribed to the Google Groups 
"elasticsearch" group.
To unsubscribe from this group and stop receiving emails from it, send an email 
to elasticsearch+unsubscr...@googlegroups.com.
To view this discussion on the web visit 
https://groups.google.com/d/msgid/elasticsearch/etPan.530e1318.440badfc.5e46%40MacBook-Air-de-David.local.
For more options, visit https://groups.google.com/groups/opt_out.


Re: [Hadoop] ERROR security.UserGroupInformation: PriviledgedActionException as:hue (auth:SIMPLE) cause:BeeswaxException

2014-02-26 Thread Costin Leau
Looking at the stacktrace, the issue seems to be caused by Beeswax/Hive, 
independent of es-hadoop (which doesn't appear in the stacktrace).



On 2/26/2014 3:53 PM, Yann Barraud wrote:

Hi,

Still struggling to get Es + Hadoop smoothly working together.

Anyone facing this type of issue ?

ERROR security.UserGroupInformation: PriviledgedActionException as:hue 
(auth:SIMPLE) cause:BeeswaxException




14/02/26 05:42:34 INFO Configuration.deprecation: 
mapred.input.dir.recursive is deprecated. Instead, use 
mapreduce.input.fileinputformat.input.dir.recursive
14/02/26 05:42:34 INFO ql.Driver: 
14/02/26 05:42:34 INFO ql.Driver: 
14/02/26 05:42:34 INFO ql.Driver: 
14/02/26 05:42:34 INFO ql.Driver: 
14/02/26 05:42:34 INFO parse.ParseDriver: Parsing command: use default
14/02/26 05:42:34 INFO parse.ParseDriver: Parse Completed
14/02/26 05:42:34 INFO ql.Driver: 
14/02/26 05:42:34 INFO ql.Driver: 
14/02/26 05:42:34 INFO ql.Driver: Semantic Analysis Completed
14/02/26 05:42:34 INFO ql.Driver: 
14/02/26 05:42:34 INFO ql.Driver: Returning Hive schema: 
Schema(fieldSchemas:null, properties:null)
14/02/26 05:42:34 INFO ql.Driver: 
14/02/26 05:42:34 INFO ql.Driver: 
14/02/26 05:42:34 INFO ql.Driver: Starting command: use default
14/02/26 05:42:34 INFO ql.Driver: 
14/02/26 05:42:34 INFO ql.Driver: 
14/02/26 05:42:34 INFO ql.Driver: 
14/02/26 05:42:34 INFO ql.Driver: 
14/02/26 05:42:34 INFO ql.Driver: 
14/02/26 05:42:34 INFO ql.Driver: 
OK
14/02/26 05:42:34 INFO ql.Driver: OK
14/02/26 05:42:34 INFO ql.Driver: 
14/02/26 05:42:34 INFO ql.Driver: 
14/02/26 05:42:34 INFO ql.Driver: 
14/02/26 05:42:34 INFO ql.Driver: 
14/02/26 05:42:34 INFO ql.Driver: 
14/02/26 05:42:34 INFO parse.ParseDriver: Parsing command: INSERT OVERWRITE 
TABLE eslogs SELECT s.time, s.ext, s.ip, s.req, s.res, s.agent FROM logs s
14/02/26 05:42:34 INFO parse.ParseDriver: Parse Completed
14/02/26 05:42:34 INFO ql.Driver: 
14/02/26 05:42:34 INFO ql.Driver: 
14/02/26 05:42:34 INFO parse.SemanticAnalyzer: Starting Semantic Analysis
14/02/26 05:42:34 INFO parse.SemanticAnalyzer: Completed phase 1 of 
Semantic Analysis
14/02/26 05:42:34 INFO parse.SemanticAnalyzer: Get metadata for source 
tables
14/02/26 05:42:34 INFO parse.SemanticAnalyzer: Get metadata for subqueries
14/02/26 05:42:34 INFO parse.SemanticAnalyzer: Get metadata for destination 
tables
14/02/26 05:42:34 INFO parse.SemanticAnalyzer: Completed getting MetaData 
in Semantic Analysis
14/02/26 05:42:34 INFO ppd.OpProcFactory: Processing for FS(9)
14/02/26 05:42:34 INFO ppd.OpProcFactory: Processing for SEL(8)
14/02/26 05:42:34 INFO ppd.OpProcFactory: Processing for TS(7)
14/02/26 05:42:34 INFO physical.MetadataOnlyOptimizer: Looking for table 
scans where optimization is applicable
14/02/26 05:42:34 INFO physical.MetadataOnlyOptimizer: Found 0 metadata 
only table scans
14/02/26 05:42:34 INFO parse.SemanticAnalyzer: Completed plan generation
14/02/26 05:42:34 INFO ql.Driver: Semantic Analysis Completed
14/02/26 05:42:34 INFO ql.Driver: 
14/02/26 05:42:34 INFO ql.Driver: Returning Hive schema: 
Schema(fieldSchemas:[FieldSchema(name:time, type:string, comment:null), 
FieldSchema(name:ext, type:string, comment:null), FieldSchema(name:ip, 
type:string, comment:null), FieldSchema(name:req, type:string, comment:null), 
FieldSchema(name:res, type:int, comment:null), FieldSchema(name:agent, 
type:string, comment:null)], properties:null)
14/02/26 05:42:34 INFO ql.Driver: 
14/02/26 05:42:34 INFO ql.Driver: 
14/02/26 05:42:34 INFO ql.Driver: Starting command: INSERT OVERWRITE TABLE 
eslogs SELECT s.time, s.ext, s.ip, s.req, s.res, s.agent FROM logs s
Total MapReduce jobs = 1
14/02/26 05:42:34 INFO ql.Driver: Total MapReduce jobs = 1
14/02/26 05:42:34 INFO ql.Driver: 
14/02/26 05:42:34 INFO ql.Driver: 
14/02/26 05:42:34 INFO ql.Driver: 
Launching Job 1 out of 1
14/02/26 05:42:34 INFO ql.Driver: Launching Job 1 out of 1
Number of reduce tasks is set to 0 since there's no reduce operator
14/02/26 05:42:34 INFO exec.Task: Number of reduce tasks is set to 0 since 
there's no reduce operator
14/02/26 05:42:34 INFO ql.Context: New scratch dir is 
hdfs://sandbox.hortonworks.com:8020/tmp/hive-beeswax-hue/hive_2014-02-26_05-42-34_494_4288001753889524446-3
14/02/26 05:42:34 INFO Configuration.deprecation: 
mapred.reduce.tasks.speculative.execution is deprecated. Instead, use 
mapreduce.reduce.speculative
14/02/26 05:42:34 INFO mr.ExecDriver: Using 
org.apache.hadoop.hive.ql.io.CombineHiveInputFormat
14/02/26 05:42:34 INFO exec.Utilities: Processing alias s
14/02/26 05:42:34 INFO exec.Utilities: Adding input file 
hdfs://sandbox.hortonworks.com:8020/apps/hive/warehouse/logs
14/02/26 05:42:34 INFO exec.Utilities: Content Summary not cached for 
hdfs://sandbox.hortonw

Re: Queue capacity and EsRejectedExecutionException leads to loss of data

2014-02-26 Thread Thomas
Thanks David,

So this is a RabbitMQ river issue; is there a need to open a separate issue? 
(I've never done the procedure; I'll look into it.)
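
For reference, this is the change that made the problem disappear for us, 
in elasticsearch.yml (assuming the "queue capacity 50" in the error refers 
to the bulk thread pool queue, whose default is 50):

# raise the bulk thread pool queue from its default of 50
threadpool.bulk.queue_size: 1000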

Thomas

On Wednesday, 26 February 2014 15:48:55 UTC+2, Thomas wrote:
>
> Hi,
>
> We have installed the RabbitMQ river plugin to pull data from our queue 
> and add it to ES. The thing is that at some point we receive the 
> following exception and, as a result, we *lose data*.
>
> [1775]: index [events-idx], type [click], id 
>> [3f6e4604146b435aabcf4ea5a493fd32], message 
>> [EsRejectedExecutionException[rejected execution (queue capacity 50) on 
>> org.elasticsearch.action.support.replication.TransportShardReplicationOperationAction$AsyncShardOperationAction$1@12843ca2]]
>
>
> We have changed the queue size configuration to 1000 and the problem 
> disappeared. 
>
> My question is: is there any configuration/way to tell ES, instead 
> of throwing this exception and discarding the document, to wait for 
> available resources (with the corresponding performance impact)?
>
> Thanks
>
> Thomas
>
>
>

-- 
You received this message because you are subscribed to the Google Groups 
"elasticsearch" group.
To unsubscribe from this group and stop receiving emails from it, send an email 
to elasticsearch+unsubscr...@googlegroups.com.
To view this discussion on the web visit 
https://groups.google.com/d/msgid/elasticsearch/03dcb0ea-2b6a-478b-b678-f52ecbc09298%40googlegroups.com.
For more options, visit https://groups.google.com/groups/opt_out.


Re: What's wrong with my match query in Java API?

2014-02-26 Thread Binh Ly
Hmmm, not sure. I tried this and it worked for me:

  q = QueryBuilders.matchQuery("tvName", "决战华岩寺")
  .minimumShouldMatch("1")
  .operator(MatchQueryBuilder.Operator.OR);

Perhaps can you give a complete example of an index, 1 document, and the 
actual full Java query to duplicate your problem?

-- 
You received this message because you are subscribed to the Google Groups 
"elasticsearch" group.
To unsubscribe from this group and stop receiving emails from it, send an email 
to elasticsearch+unsubscr...@googlegroups.com.
To view this discussion on the web visit 
https://groups.google.com/d/msgid/elasticsearch/723c084a-7549-4f3e-9109-77342bbc24bf%40googlegroups.com.
For more options, visit https://groups.google.com/groups/opt_out.


Re: elasticsearch cache configuration

2014-02-26 Thread Zachary Tong
If you simply want to decrease the amount of memory that Elasticsearch is 
using, you need to change your heap size (via the ES_HEAP_SIZE environment 
variable).  That controls the total memory allocated to Elasticsearch.
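
For example (a sketch; the 2g value is arbitrary):

# set in the environment of whatever launches Elasticsearch
export ES_HEAP_SIZE=2g
bin/elasticsearch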

Echoing what Binh said...try not to change the field-data settings unless 
you know what you are doing.  You should not enable soft references for 
field-data; this is a very bad option.  It causes excessive GC thrashing 
and is not needed (especially now that 1.0 has circuit breaker logic built 
in).  Similarly, the time expiration on field-data is typically a poor 
option, since it unnecessarily thrashes the GC too.



On Wednesday, February 26, 2014 10:19:40 AM UTC-5, Hediye Delkhosh wrote:
>
>
> @Binh Ly thank you for reply.
> I don't know where should I insert this configs.
> I've installes FOS-elastica in my project, there is a elasticsearch.yml in 
> my project and another one is in /etc/elasticsearch. where should I insert 
> configs?
>
> I've tested server memory usage. Elasticserach service use memory alot.
>
> Thank you :)
>
> On Wednesday, February 26, 2014 6:29:04 PM UTC+3:30, Binh Ly wrote:
>>
>> For ES 1.0, the field data settings are here:
>>
>>
>> http://www.elasticsearch.org/guide/en/elasticsearch/reference/current/index-modules-fielddata.html
>>
>> The filter cache settings are here:
>>
>>
>> http://www.elasticsearch.org/guide/en/elasticsearch/reference/current/index-modules-cache.html
>>
>> The easiest way is to set these in the elasticsearch.yml file. 
>>
>> I would caution that you'll probably need to first investigate and 
>> understand your memory usage before trying to change any of these settings.
>>
>

-- 
You received this message because you are subscribed to the Google Groups 
"elasticsearch" group.
To unsubscribe from this group and stop receiving emails from it, send an email 
to elasticsearch+unsubscr...@googlegroups.com.
To view this discussion on the web visit 
https://groups.google.com/d/msgid/elasticsearch/325c57fb-7cef-4805-8708-c59ef14f6d73%40googlegroups.com.
For more options, visit https://groups.google.com/groups/opt_out.


[Hadoop] New tutorial submitted

2014-02-26 Thread Yann Barraud
Hi,

I'm glad to announce that I've published a new tutorial on integrating ES 
with the Hortonworks sandbox.

https://github.com/hortonworks/hadoop-tutorials/pull/9

Cheers,
Yann

-- 
You received this message because you are subscribed to the Google Groups 
"elasticsearch" group.
To unsubscribe from this group and stop receiving emails from it, send an email 
to elasticsearch+unsubscr...@googlegroups.com.
To view this discussion on the web visit 
https://groups.google.com/d/msgid/elasticsearch/a719f3c0-3b3b-4e2c-82d8-912e3710719a%40googlegroups.com.
For more options, visit https://groups.google.com/groups/opt_out.


Kibana empty after upgrade to ES 1.0.1

2014-02-26 Thread Terry Healy
I just upgraded all my ES systems to 1.0.1 and they seem to be working fine 
- except for Kibana 3. I had installed Kibana 3 Milestone pre-5 (8512132). 
Previously I was using it just with _all enabled. Now when I attempt to use 
a filter for "*" and a time filter for the past hour or so, it lists none 
of my 3 active indices. 

Prior to the upgrades, my bookmark went 
to http://192.168.4.254:8080/#/dashboard/file/guided.json. The Kibana 
installation instructions seem to be telling me that this URL should now be 
just http://192.168.4.254:9200, which only gives me node status (below), so 
I must be misunderstanding something.

{
  "status" : 200,
  "name" : "t5",
  "version" : {
"number" : "1.0.1",
"build_hash" : "5c03844e1978e5cc924dab2a423dc63ce881c42b",
"build_timestamp" : "2014-02-25T15:52:53Z",
"build_snapshot" : false,
"lucene_version" : "4.6"
  },
  "tagline" : "You Know, for Search"
}


In config.js, my elasticsearch is set as:


/** @scratch /configuration/config.js/5
 *  elasticsearch
 *
 * The URL to your elasticsearch server. You almost certainly don't
 * want +http://localhost:9200+ here. Even if Kibana and Elasticsearch are 
on
 * the same host. By default this will attempt to reach ES at the same host 
you have
 * kibana installed on. You probably want to set it to the FQDN of your
 * elasticsearch host
 */
elasticsearch: "http://192.168.4.254:9200",


Thanks for any suggestions.

-- 
You received this message because you are subscribed to the Google Groups 
"elasticsearch" group.
To unsubscribe from this group and stop receiving emails from it, send an email 
to elasticsearch+unsubscr...@googlegroups.com.
To view this discussion on the web visit 
https://groups.google.com/d/msgid/elasticsearch/e88230ec-e27a-49dd-8c3f-26c73f35c5b8%40googlegroups.com.
For more options, visit https://groups.google.com/groups/opt_out.


Re: Include special Symbol

2014-02-26 Thread Nick Chang
Hello Binh

I used the elasticsearch-river-mongo plugin.

How do I modify this index? Would something like the sketch below, run 
before the river creates the index, be the right way?
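
(A sketch of what I mean; the index name matches my queries, but the type 
name "data" is a guess:)

curl -XPUT 'localhost:9200/datas' -d '{
  "mappings" : {
    "data" : {
      "properties" : {
        "user" : { "type" : "string", "index" : "not_analyzed" }
      }
    }
  }
}'
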
Thanks for your reply



On Wednesday, February 26, 2014 at 11:03:29 PM UTC+8, Binh Ly wrote:
>
> You'll likely need that field to be unanalyzed (i.e. tell ES not to cut it 
> up in the index). One way is to predefine that field in your mapping as:
>
> "user": {
> "type": "string",
> "index": "not_analyzed"
> }
>
> More details here:
>
>
> http://www.elasticsearch.org/guide/en/elasticsearch/reference/current/mapping-core-types.html
>

-- 
You received this message because you are subscribed to the Google Groups 
"elasticsearch" group.
To unsubscribe from this group and stop receiving emails from it, send an email 
to elasticsearch+unsubscr...@googlegroups.com.
To view this discussion on the web visit 
https://groups.google.com/d/msgid/elasticsearch/96a3b201-adef-4fc8-95da-96633ffa6112%40googlegroups.com.
For more options, visit https://groups.google.com/groups/opt_out.


Re: elasticsearch cache configuration

2014-02-26 Thread Hediye Delkhosh

@Binh Ly, thank you for the reply.
I don't know where I should insert these configs.
I've installed FOS-elastica in my project; there is an elasticsearch.yml in 
my project and another one in /etc/elasticsearch. Where should I insert the 
configs?

I've tested the server's memory usage. The Elasticsearch service uses a lot 
of memory.

Thank you :)

On Wednesday, February 26, 2014 6:29:04 PM UTC+3:30, Binh Ly wrote:
>
> For ES 1.0, the field data settings are here:
>
>
> http://www.elasticsearch.org/guide/en/elasticsearch/reference/current/index-modules-fielddata.html
>
> The filter cache settings are here:
>
>
> http://www.elasticsearch.org/guide/en/elasticsearch/reference/current/index-modules-cache.html
>
> The easiest way is to set these in the elasticsearch.yml file. 
>
> I would caution that you'll probably need to first investigate and 
> understand your memory usage before trying to change any of these settings.
>

-- 
You received this message because you are subscribed to the Google Groups 
"elasticsearch" group.
To unsubscribe from this group and stop receiving emails from it, send an email 
to elasticsearch+unsubscr...@googlegroups.com.
To view this discussion on the web visit 
https://groups.google.com/d/msgid/elasticsearch/7ebf2e93-7f38-47fc-9cc0-c1eee98e60c2%40googlegroups.com.
For more options, visit https://groups.google.com/groups/opt_out.


What's wrong with my match query in Java API?

2014-02-26 Thread Daniel Guo
I want to do a match query, and the query works fine in REST:

curl -XGET 'localhost:9200/search/video_search/_search?pretty' -d @query.
json

query.json:
{
"query": {
"match": {
"tvName": {
"query": "决战华岩寺",
"operator": "or",
"minimum_should_match": "2"
}
}
}
}

but I don't know how to do it in Java, my code below doesn't work:

MatchQueryBuilder queryBuilder = QueryBuilders.matchQuery("tvName", keyword)
.minimumShouldMatch("2")
.operator(MatchQueryBuilder.Operator.OR);
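
For context, this is roughly how I execute it (client is a regular Java 
API Client; setup omitted):

SearchResponse response = client.prepareSearch("search")
        .setTypes("video_search")
        .setQuery(queryBuilder)
        .execute().actionGet();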

The exception is: "Unexpected end of block of data". 

Could anybody help? Thanks very much.

-- 
You received this message because you are subscribed to the Google Groups 
"elasticsearch" group.
To unsubscribe from this group and stop receiving emails from it, send an email 
to elasticsearch+unsubscr...@googlegroups.com.
To view this discussion on the web visit 
https://groups.google.com/d/msgid/elasticsearch/86a21181-8007-4595-abc9-4b45f81b67a3%40googlegroups.com.
For more options, visit https://groups.google.com/groups/opt_out.


Re: Include special Symbol

2014-02-26 Thread Binh Ly
You'll likely need that field to be unanalyzed (i.e. tell ES not to cut it 
up in the index). One way is to predefine that field in your mapping as:

"user": {
"type": "string",
"index": "not_analyzed"
}

More details here:

http://www.elasticsearch.org/guide/en/elasticsearch/reference/current/mapping-core-types.html

-- 
You received this message because you are subscribed to the Google Groups 
"elasticsearch" group.
To unsubscribe from this group and stop receiving emails from it, send an email 
to elasticsearch+unsubscr...@googlegroups.com.
To view this discussion on the web visit 
https://groups.google.com/d/msgid/elasticsearch/805a1632-7e84-416f-8d7a-d5b9f327b8aa%40googlegroups.com.
For more options, visit https://groups.google.com/groups/opt_out.


Include special Symbol

2014-02-26 Thread Nick Chang
Hello

I have a column that includes special symbols.
Ex: 
user : (Google)a...@gmail.com

I want to count them. 

POST /datas/_search
{
  "facets": {
"terms": {
  "terms": {
"field": "user",
"size": 10
  }
}
  }
}

But the result is not right.
 "term": "gmail.com",
   "count": 564987
},
{
   "term": "facebook",
   "count": 475632
},
{
   "term": "google",
   "count": 411384

How can I solve this problem?

 Thanks


-- 
You received this message because you are subscribed to the Google Groups 
"elasticsearch" group.
To unsubscribe from this group and stop receiving emails from it, send an email 
to elasticsearch+unsubscr...@googlegroups.com.
To view this discussion on the web visit 
https://groups.google.com/d/msgid/elasticsearch/532e8f3c-a643-452d-934c-b71774201f18%40googlegroups.com.
For more options, visit https://groups.google.com/groups/opt_out.


Re: elasticsearch cache configuration

2014-02-26 Thread Binh Ly
For ES 1.0, the field data settings are here:

http://www.elasticsearch.org/guide/en/elasticsearch/reference/current/index-modules-fielddata.html

The filter cache settings are here:

http://www.elasticsearch.org/guide/en/elasticsearch/reference/current/index-modules-cache.html

The easiest way is to set these in the elasticsearch.yml file. 

I would caution that you'll probably need to first investigate and 
understand your memory usage before trying to change any of these settings.



Re: How to visualize statistics on time series data in Kibana

2014-02-26 Thread Binh Ly
When you add a Histogram panel, look in the setting Chart Value. There are 
options for max and mean in there and then in the Value Field, you can 
specify "scan duration" (or "connect duration") - I'm not 100% sure if the 
spaces in your field name might fail but if it does, you'll probably need 
to fix your LS config to output field names with no spaces. The only 
limitation right now is you can't plot multiple time series stats (Chart 
Value + Value Field) in 1 histogram at the moment. So you'll need to create 
separate histograms per Chart Value + Value Field.
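
If the spaces do turn out to be a problem, a small Logstash sketch along these lines (a mutate filter shown in isolation; fold it into your existing config) would rename the fields:

filter {
  mutate {
    # replace the spaces so the fields are addressable everywhere
    rename => [ "scan duration", "scan_duration",
                "connect duration", "connect_duration" ]
  }
}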



Re: Invalid Version Format, but version same

2014-02-26 Thread Binh Ly
Just a guess: can you also check that you have the exact same Java version across all nodes?
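
A quick way to check, run on each node (plain shell; nothing ES-specific):

# compare vendor and exact version string across the cluster
java -version 2>&1
echo $JAVA_HOME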



Re: How to join 2 indexes at query time

2014-02-26 Thread Binh Ly
Unfortunately, ES is not like SQL in this respect. You'll need to 
denormalize somewhat because ES is more "document-oriented". You'd probably 
need to either denormalize offer_id into categorytype, or category into 
offertype to get all the data you want returned in 1 query.
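
As a sketch of the second option (field names borrowed from the gist in the question; the values are illustrative), each offer document carries its category, so one query returns both skuid and offer_id:

curl -XPUT 'localhost:9200/offerindex/offertype/1' -d '{
  "skuid": "SKU-1",
  "offer_id": "OFFER-9",
  "category": "Flat TV"
}'

curl -XGET 'localhost:9200/offerindex/offertype/_search' -d '{
  "query": { "match": { "category": "Flat TV" } }
}'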



Re: Free (cloud) hosting Elasticsearch provider

2014-02-26 Thread Itamar Syn-Hershko
I can recommend the dedicated hosting service at qbox.io
On Sep 5, 2013 8:56 AM, "ferhatsb"  wrote:

> Hi Charles,
>
> We are offering free plan with very limited resources.
> https://www.searchbox.io/plans_and_pricing
>
> Ferhat
>
> On Wednesday, September 4, 2013 9:49:21 AM UTC+3, Charles Moulliard wrote:
>>
>> Hi,
>>
>> Does it exist a free (cloud or not) hosting Elasticsearch provider ?
>>
>> Regards,
>>
>> Charles
>>



Re: Queue capacity and EsRejectedExecutionException leads to loss of data

2014-02-26 Thread David Pilato
I think you should open an issue.

To me it looks somehow related to this issue:
https://github.com/elasticsearch/elasticsearch-river-rabbitmq/issues/47
and its pending PR:
https://github.com/elasticsearch/elasticsearch-river-rabbitmq/pull/48

I think we should have a new option that lets the user decide whether to ack messages in case of failure or leave them in the queue.

My 2 cents

-- 
David Pilato | Technical Advocate | Elasticsearch.com
@dadoonet | @elasticsearchfr


Le 26 février 2014 à 14:49:00, Thomas (thomas.bo...@gmail.com) a écrit:

Hi,

We have installed the RabbitMQ river plugin to pull data from our Queue and 
adding them to ES. The thing is that at some point we are receiving the 
following exception and we have as a result to lose data.

[1775]: index [events-idx], type [click], id 
[3f6e4604146b435aabcf4ea5a493fd32], message 
[EsRejectedExecutionException[rejected execution (queue capacity 50) on 
org.elasticsearch.action.support.replication.TransportShardReplicationOperationAction$AsyncShardOperationAction$1@12843ca2]]

We have changed the configuration  of queue size to 1000 and the problem 
disappeared. 

My question is that is there any configuration/way to tell ES to instead of 
throwing this exception and discarding the document to wait for available 
resources (with the corresponding performance impact)?

Thanks

Thomas






Re: Elasticsearch Reverse Suggester Problem

2014-02-26 Thread Nikolas Everett
I'm not sure what to do then.  I only use the phrase suggester in forwards
mode and only know the theory behind the reverse stuff.

Nik


On Wed, Feb 26, 2014 at 3:55 AM, Garry Welding  wrote:

> However, I did give it a try removing the pre filter, but it didn't change
> the results.
>
>
> On Tuesday, February 25, 2014 8:00:14 PM UTC, Nikolas Everett wrote:
>
>> I believe the job of the reverse filter is to efficiently provide
>> suggestions that share a suffix with the provided term rather than a
>> prefix.  You might try removing the pre_filter to see if it handles
>> reversed words.
>>
>> The reason for the reverse index for the suffix is that lucene stores
>> terms in sorted order and the suggester requires there to be a prefix match
>> to slice the portion of the index that must be scanned for terms.
>>
>> Nik
>>
>>
>> On Tue, Feb 25, 2014 at 2:28 PM, Garry Welding  wrote:
>>
>>> Really, nobody has an answer to this?



Re: Histogram of high-cardinality aggregate

2014-02-26 Thread Binh Ly
Unfortunately, I don't believe you can do a sub-aggregation on a 
single-value metric at the moment. For now, you'll probably have to index 
the actual ("min") values and then aggregate on them.
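
As a sketch of that workaround (the field name min_value is invented for illustration): once the per-document minimum is indexed as an ordinary numeric field, a plain histogram aggregation over it gives the distribution:

{
  "aggs": {
    "min_distribution": {
      "histogram": {
        "field": "min_value",
        "interval": 10
      }
    }
  }
}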



Re: Free (cloud) hosting Elasticsearch provider

2014-02-26 Thread Mattias Nordberg
https://facetflow.com/ - We offer hosted Elasticsearch with 500MB and 5,000 
documents for free. 

On Wednesday, September 4, 2013 8:49:21 AM UTC+2, Charles Moulliard wrote:
>
> Hi,
>
> Does it exist a free (cloud or not) hosting Elasticsearch provider ?
>
> Regards,
>
> Charles
>



elasticsearch cache configuration

2014-02-26 Thread Hediye Delkhosh
Hello.
I've installed Elasticsearch on a web server and it uses most of the memory. I searched and finally found the cache module and its configuration, but I don't know how to configure Elasticsearch. Where should I insert the configuration below?


index.cache.field.max_size: 5
index.cache.field.expire: 10m
index.cache.field.type: soft

Should I enable the cache module? And where?

Thank you so much :)



How to visualize statistics on time series data in Kibana

2014-02-26 Thread Dave Snigier
Howdy everyone,
I have events with the following structure in ES:

{
  "_index": "logstash-2014.02.25",
  "_type": "symantecav-logs",
  "_id": "_5Hig6lPTUi2p-palnuplA",
  "_score": null,
  "_source": {
"message": [
  
"1393368016|0|2|5|3|69.16.1.13/UMTL300X.rtf|4|UMTL300X.rtf|39|192.168.23.7|17|0.167|18|0.232|43|192.168.25.22|44|9003|45|12133924"
],
"@version": 1,
"@timestamp": "2014-02-25T22:40:16.000Z",
"host": "antivirus1.domain.net",
"tags": [
  "antivirus",
  "test",
  "boston"
],
"file": "/antivirus/log/SSE20140225.log",
"type": "symantecav-logs",
"typecode": "0",
"filename": "UMTL300X.rtf",
"client": "client.domain.net",
"scan duration": 0.167,
"connect duration": 0.232,
"extension": "rtf"
  },
  "sort": [
1393368016000,
1393368016000
  ]
}


My goal is to visualize the max and mean of the scan and connect duration 
over time as a line graph within Kibana. Is this possible with the widgets 
currently available? I've been trying out several but haven't had much luck 
getting them to do what I'm looking for. 


Here are the ES queries I'm using on the Kibana dashboard:
type:"symantecav-logs" AND tags:"test" AND host:"antivirus1.domain.net"
type:"symantecav-logs" AND tags:"test" AND host:"antivirus2.domain.net"


thanks for any and all help you can lend to a neophyte such as myself!
-Dave



[Hadoop] ERROR security.UserGroupInformation: PriviledgedActionException as:hue (auth:SIMPLE) cause:BeeswaxException

2014-02-26 Thread Yann Barraud
Hi,

Still struggling to get ES + Hadoop working together smoothly.

Is anyone else facing this type of issue?

ERROR security.UserGroupInformation: PriviledgedActionException as:hue (auth:SIMPLE) cause:BeeswaxException




14/02/26 05:42:34 INFO Configuration.deprecation: mapred.input.dir.recursive is deprecated. Instead, use mapreduce.input.fileinputformat.input.dir.recursive
14/02/26 05:42:34 INFO parse.ParseDriver: Parsing command: use default
14/02/26 05:42:34 INFO parse.ParseDriver: Parse Completed
14/02/26 05:42:34 INFO ql.Driver: Semantic Analysis Completed
14/02/26 05:42:34 INFO ql.Driver: Returning Hive schema: Schema(fieldSchemas:null, properties:null)
14/02/26 05:42:34 INFO ql.Driver: Starting command: use default
14/02/26 05:42:34 INFO ql.Driver: OK
14/02/26 05:42:34 INFO parse.ParseDriver: Parsing command: INSERT OVERWRITE TABLE eslogs SELECT s.time, s.ext, s.ip, s.req, s.res, s.agent FROM logs s
14/02/26 05:42:34 INFO parse.ParseDriver: Parse Completed
14/02/26 05:42:34 INFO parse.SemanticAnalyzer: Starting Semantic Analysis
14/02/26 05:42:34 INFO parse.SemanticAnalyzer: Completed phase 1 of Semantic Analysis
14/02/26 05:42:34 INFO parse.SemanticAnalyzer: Get metadata for source tables
14/02/26 05:42:34 INFO parse.SemanticAnalyzer: Get metadata for subqueries
14/02/26 05:42:34 INFO parse.SemanticAnalyzer: Get metadata for destination tables
14/02/26 05:42:34 INFO parse.SemanticAnalyzer: Completed getting MetaData in Semantic Analysis
14/02/26 05:42:34 INFO ppd.OpProcFactory: Processing for FS(9)
14/02/26 05:42:34 INFO ppd.OpProcFactory: Processing for SEL(8)
14/02/26 05:42:34 INFO ppd.OpProcFactory: Processing for TS(7)
14/02/26 05:42:34 INFO physical.MetadataOnlyOptimizer: Looking for table scans where optimization is applicable
14/02/26 05:42:34 INFO physical.MetadataOnlyOptimizer: Found 0 metadata only table scans
14/02/26 05:42:34 INFO parse.SemanticAnalyzer: Completed plan generation
14/02/26 05:42:34 INFO ql.Driver: Semantic Analysis Completed
14/02/26 05:42:34 INFO ql.Driver: Returning Hive schema: Schema(fieldSchemas:[FieldSchema(name:time, type:string, comment:null), FieldSchema(name:ext, type:string, comment:null), FieldSchema(name:ip, type:string, comment:null), FieldSchema(name:req, type:string, comment:null), FieldSchema(name:res, type:int, comment:null), FieldSchema(name:agent, type:string, comment:null)], properties:null)
14/02/26 05:42:34 INFO ql.Driver: Starting command: INSERT OVERWRITE TABLE eslogs SELECT s.time, s.ext, s.ip, s.req, s.res, s.agent FROM logs s
14/02/26 05:42:34 INFO ql.Driver: Total MapReduce jobs = 1
14/02/26 05:42:34 INFO ql.Driver: Launching Job 1 out of 1
14/02/26 05:42:34 INFO exec.Task: Number of reduce tasks is set to 0 since there's no reduce operator
14/02/26 05:42:34 INFO ql.Context: New scratch dir is hdfs://sandbox.hortonworks.com:8020/tmp/hive-beeswax-hue/hive_2014-02-26_05-42-34_494_4288001753889524446-3
14/02/26 05:42:34 INFO Configuration.deprecation: mapred.reduce.tasks.speculative.execution is deprecated. Instead, use mapreduce.reduce.speculative
14/02/26 05:42:34 INFO mr.ExecDriver: Using

Re: Compute TF/IDF across indexes

2014-02-26 Thread Binh Ly
I tried this and indeed it works, so thanks Ivan for the tip!



Re: Interesting question on Transaction Log record mutability

2014-02-26 Thread Binh Ly
Yes, each partial update is recorded in the transaction log. When the log is replayed, each update is re-applied and the document version is incremented per update.



Queue capacity and EsRejectedExecutionException leads to loss of data

2014-02-26 Thread Thomas
Hi,

We have installed the RabbitMQ river plugin to pull data from our queue and add it to ES. The problem is that at some point we receive the following exception and, as a result, *lose data*.

[1775]: index [events-idx], type [click], id 
> [3f6e4604146b435aabcf4ea5a493fd32], message 
> [EsRejectedExecutionException[rejected execution (queue capacity 50) on 
> org.elasticsearch.action.support.replication.TransportShardReplicationOperationAction$AsyncShardOperationAction$1@12843ca2]]


We changed the queue size configuration to 1000 and the problem disappeared.

My question is: is there any configuration or way to tell ES, instead of throwing this exception and discarding the document, to wait for available resources (with the corresponding performance impact)?

Thanks

Thomas
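
For reference, the queue in the error message is a thread pool queue; assuming the river writes via the bulk API (the "queue capacity 50" appears to match the bulk pool's default), the change described above would look like this in elasticsearch.yml:

# raise the bounded queue so bursts are absorbed instead of rejected
threadpool.bulk.queue_size: 1000

# or remove the bound entirely, so requests wait at the risk of
# unbounded memory use under sustained overload:
# threadpool.bulk.queue_size: -1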




Re: Index existing data and integrate elasticsearch with PHP

2014-02-26 Thread Santosh Maskar
Thank you, Zach. Your input will definitely be helpful.

Santosh
On 26-Feb-2014 5:52 pm, "Zachary Tong"  wrote:

>
>1. There is no way to index straight from MySQL itself.  You will need
>some kind of adapter/connector to move the data from MySQL to
>Elasticsearch.  This is necessary for several reasons, but mainly because
>you need to transform the rows into usable JSON documents.  That's
>something only your application will know how to do.  The adapter could be
>written in PHP, Python, Perl, etc...any of your favorite dynamic languages
>will work.
>
>2. Synchronizing databases is another job best left to the application
>and user.  The easiest solution is to have a dual-pipeline:  when an update
>(or delete or index) arrives at your system, it is sent to MySQL and
>Elasticsearch simultaneously.  It is usually easier to perform the
>operation in parallel rather than sending to the database and then trying
>to keep the two synchronized.
>
>3. That's an awfully broad question!  :P  The PHP client quickstart
>and docs show the syntax to use the client, which gives you access to all
>the Elasticsearch APIs.  How you want to integrate Elasticsearch into your
>application is entirely up to you, and it is very dependent on your
>particular use-case.
>
> -Zach
>
>
> On Wednesday, February 26, 2014 6:29:14 AM UTC-5, Santosh wrote:
>>
>> Dear All,
>>
>> I have gone through the link - http://www.elasticsearch.org/
>> guide/en/elasticsearch/client/php-api/current/_quickstart.html to
>> understand the setup.  Can someone point to the documentation where I can
>>
>> 1. Index existing data from mysql - I can index the data using curl but
>> trying to figure out how to do it in one go. Dont want to do through PHP
>> since size of data is very large.
>>
>> 2. Sync the data whenever there are any update into the database through
>> application. Is there a way to achieve this.
>>
>> 3. How can We use elastic search in PHP
>>
>> If someone can point to documentation which give some getting started
>> like stuff that would be great help.
>>
>> -Santosh
>>
>  --
> You received this message because you are subscribed to a topic in the
> Google Groups "elasticsearch" group.
> To unsubscribe from this topic, visit
> https://groups.google.com/d/topic/elasticsearch/TKrj5q6w9gs/unsubscribe.
> To unsubscribe from this group and all its topics, send an email to
> elasticsearch+unsubscr...@googlegroups.com.
> To view this discussion on the web visit
> https://groups.google.com/d/msgid/elasticsearch/8c94a64f-f1fe-463e-9750-71aa7b9836ff%40googlegroups.com
> .
> For more options, visit https://groups.google.com/groups/opt_out.
>



Re: nodes spend all time in RamUsageEstimator after upgrade to 0.90.11

2014-02-26 Thread Benoît
The release of the latest version, 0.90.12, solves (or at least hides) this problem.

Thanks to the team!

Benoît

On Tuesday, February 25, 2014 5:50:47 PM UTC+1, Benoît wrote:
>
> Thank you Binh Ly,
>
> On Tuesday, February 25, 2014 4:25:59 PM UTC+1, Binh Ly wrote:
>>
>> This is a known issue and will be fixed shortly. For now, what you can do 
>> is run _optimize on all your indexes and set max_num_segments to 1, like 
>> below. Note that this may take a while depending on the size of your 
>> indexes.
>>
>> http://localhost:9200/_optimize?max_num_segments=1
>>
>>
>> Your suggestion confirm what Jörg Prante said here 
> https://groups.google.com/d/msg/elasticsearch/7mrDhqe6LEo/3gjOJka85OYJ
> This is a problem with Lucene segment of version 3.x
>
> I have around 1T of index, so I'm not really happy to run optimize, I will 
> try on one of the smallest index.
>
> If I stop all the request to the statistics API, I should see the load 
> decreasing ?
>
> Regards.
>
>
> Benoît
>



Error in running cluster query

2014-02-26 Thread prashy
Hi All,
I am executing a cluster query to fetch documents and cluster them using Carrot2.

If I run the query with the words "*mobiles, samsung, test*" it works fine, but if I run it with the word "*mobile*" it throws the following exception:
*error: ReduceSearchPhaseException[Failed to execute phase [fetch], [reduce] ]; nested: c[bytes can be at most 32766 in length; got 131074]; status: 503*

Is there any way to resolve this type of exception, or how can I find its root cause?



--
View this message in context: 
http://elasticsearch-users.115913.n3.nabble.com/Error-in-running-cluster-query-tp4050531.html
Sent from the ElasticSearch Users mailing list archive at Nabble.com.



Re: Removing elasticsearch logs

2014-02-26 Thread Binh Ly
There is currently discussion around this, but in the meantime, try this to 
see if it helps:

https://github.com/elasticsearch/curator



Re: DateRange aggregation semantics - include_lower/include_upper?

2014-02-26 Thread mooky
I think it's necessary to be able to specify an *include_lower*/*include_upper* option, as with filters.


On Tuesday, 25 February 2014 14:54:24 UTC, Binh Ly wrote:
>
> Yes, you are correct. The "from" is inclusive, and the "to" is exclusive.
>
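
For context, a sketch of the current semantics (field name and dates invented for illustration): with the ranges below, a document dated exactly 2014-02-01 lands in the second bucket only, because "from" is inclusive and "to" is exclusive:

{
  "aggs": {
    "by_period": {
      "date_range": {
        "field": "date",
        "ranges": [
          { "from": "2014-01-01", "to": "2014-02-01" },
          { "from": "2014-02-01", "to": "2014-03-01" }
        ]
      }
    }
  }
}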



Re: ES Response Time

2014-02-26 Thread Zachary Tong
The `took` parameter is the number of milliseconds that the query took to 
execute on the Elasticsearch server.  It's basically the time required to 
parse the query, broadcast it to the shards and collect the results.  It 
doesn't include network time going to and from Elasticsearch itself (since 
that is basically unknowable information to ES).

If you want to record total end-to-end, you'll have to do it in your 
application.  Start a timer, send the request, stop the timer when the 
response comes back.  That will be your total time.  You can also subtract 
the `took` parameter from the total time to see the effects of network 
(e.g. total time is 2 seconds, but `took` parameter is 100ms...that means 
you had 1.9s of network latency causing the slowdown).

-Zach


On Wednesday, February 26, 2014 6:31:27 AM UTC-5, Prashy wrote:
>
> Hi All, 
> Is there any way to measure end-to-end response time for a query in 
> elastic 
> search? 
> That is the time taken from the time query is executed and the result is 
> shown on the ES UI. 
>
> And what does took parameter means in response ouput? 
>
>
>
> -- 
> View this message in context: 
> http://elasticsearch-users.115913.n3.nabble.com/ES-Response-Time-tp4050525.html
>  
> Sent from the ElasticSearch Users mailing list archive at Nabble.com. 
>
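
A shell-level sketch of the end-to-end measurement described above (curl's -o/-w options; the index name is illustrative):

# client-side round-trip time, to compare against the server-side "took"
curl -s -o /dev/null -w "total: %{time_total}s\n" \
  -XGET 'localhost:9200/myindex/_search?q=*:*'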



Re: Index existing data and integrate elasticsearch with PHP

2014-02-26 Thread Zachary Tong

   
   1. There is no way to index straight from MySQL itself.  You will need 
   some kind of adapter/connector to move the data from MySQL to 
   Elasticsearch.  This is necessary for several reasons, but mainly because 
   you need to transform the rows into usable JSON documents.  That's 
   something only your application will know how to do.  The adapter could be 
   written in PHP, Python, Perl, etc...any of your favorite dynamic languages 
   will work.
   
   2. Synchronizing databases is another job best left to the application 
   and user.  The easiest solution is to have a dual-pipeline:  when an update 
   (or delete or index) arrives at your system, it is sent to MySQL and 
   Elasticsearch simultaneously.  It is usually easier to perform the 
   operation in parallel rather than sending to the database and then trying 
   to keep the two synchronized.
   
   3. That's an awfully broad question!  :P  The PHP client quickstart and 
   docs show the syntax to use the client, which gives you access to all the 
   Elasticsearch APIs.  How you want to integrate Elasticsearch into your 
   application is entirely up to you, and it is very dependent on your 
   particular use-case.

-Zach


On Wednesday, February 26, 2014 6:29:14 AM UTC-5, Santosh wrote:
>
> Dear All,
>
> I have gone through the link - 
> http://www.elasticsearch.org/guide/en/elasticsearch/client/php-api/current/_quickstart.htmlto
>  understand the setup.  Can someone point to the documentation where I can
>
> 1. Index existing data from mysql - I can index the data using curl but 
> trying to figure out how to do it in one go. Dont want to do through PHP 
> since size of data is very large.
>
> 2. Sync the data whenever there are any update into the database through 
> application. Is there a way to achieve this.
>
> 3. How can We use elastic search in PHP 
>
> If someone can point to documentation which give some getting started like 
> stuff that would be great help.
>
> -Santosh
>
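
As a sketch of point 1 (index, type, and fields invented for illustration): whatever language the adapter is written in, it can emit the bulk API's newline-delimited format and load large volumes in one go with curl:

# bulk.json: an action line followed by a source line per row,
# with a trailing newline at the end of the file:
#   {"index":{"_index":"myindex","_type":"mytype","_id":"1"}}
#   {"name":"row one","created":"2014-02-26"}

curl -XPOST 'localhost:9200/_bulk' --data-binary @bulk.json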



Re: Relation Between Heap Size and Total Data Size

2014-02-26 Thread Dan Fairs
> So, I am wondering that is there any relationship between heap size and total 
> data size? Is there any formula to determine heap size based on data size?


You might want to check that you're not running out of file handles:

  http://www.elasticsearch.org/tutorials/too-many-open-files/

Cheers,
Dan

--
Dan Fairs | dan.fa...@gmail.com | @danfairs | secondsync.com



ES Response Time

2014-02-26 Thread prashy
Hi All,
Is there any way to measure the end-to-end response time for a query in Elasticsearch?
That is, the time from when the query is executed to when the result is shown on the ES UI.

And what does the took parameter in the response output mean?



--
View this message in context: 
http://elasticsearch-users.115913.n3.nabble.com/ES-Response-Time-tp4050525.html
Sent from the ElasticSearch Users mailing list archive at Nabble.com.



Index existing data and integrate elasticsearch with PHP

2014-02-26 Thread Santosh
Dear All,

I have gone through the link - 
http://www.elasticsearch.org/guide/en/elasticsearch/client/php-api/current/_quickstart.html
 
to understand the setup. Can someone point me to the documentation where I can:

1. Index existing data from MySQL - I can index the data using curl, but I'm trying to figure out how to do it in one go. I don't want to do it through PHP since the data set is very large.

2. Sync the data whenever there is an update to the database through the application. Is there a way to achieve this?

3. How can we use Elasticsearch in PHP?

If someone can point me to documentation with some getting-started material, that would be a great help.

-Santosh



Nested object scripting

2014-02-26 Thread Mehrez Marouani
Hi everyone,

I need to know how to make a facet script for a nested object. 

The mapping:

{
    "vehicle": {
        "properties": {
            "id": {
                "type": "long"
            },
            "fitments": {
                "type": "nested",
                "properties": {
                    "dimensions": {
                        "type": "nested",
                        "properties": {
                            "id": {
                                "type": "long"
                            },
                            "position": {
                                "type": "string",
                                "index_analyzer": "string_lowercase"
                            }
                        }
                    },
                    "id": {
                        "type": "long"
                    }
                }
            }
        }
    }
}


I've tried:
*Query:*
{
...  
 "facets" : {
"size" : {
  "terms" : {
"size" : 15,
"order" : "term",
"all_terms" : false,
"script" : "_source.fitments.dimensions.id + ';' + 
_source.fitments.dimensions.position"
  },
  "nested" : "fitments.dimensions"
}
  }
}

*Response:*
QueryPhaseExecutionException[[en_us][3]: query[filtered(+(year:[1983 TO 
1983]) +(carMaker:audi) +(carSegment:4000) 
+(carType:base))->cache(_type:vehicle)],from[0],size[0]: Query Failed 
[Failed to execute main query]]; nested: PropertyAccessException[[Error: 
could not access: fitments; in class: 
org.elasticsearch.search.lookup.SourceLookup]\n[Near : {... 
_source.fitments.dimensions.id }]\n ^\n[Line: 1, Column: 
1]];


I've tried also: 

*Query:*
{
...  
 "facets" : {
"size" : {
  "terms" : {
"size" : 15,
"order" : "term",
"all_terms" : false,
"script" : "doc['fitments.dimensions.id'] + ';' + 
doc['fitments.dimensions.position']"
  },
  "nested" : "fitments.dimensions"
}
  }
}


*Response:*
"facets": {
"size": {
"_type": "terms",
"missing": 0,
"total": 2,
"other": 0,
"terms": [
{
"term": 
"org.elasticsearch.index.fielddata.ScriptDocValues$Longs@1b3562af;org.elasticsearch.index.fielddata.ScriptDocValues$Strings@4de1dd1d",
"count": 2
}
]
}
}


Thank you in advance for your help guys :)
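
For reference, doc['field'] in a script returns a ScriptDocValues wrapper, which is what the second response above is printing; a sketch of the usual fix is to read .value on each field:

"script": "doc['fitments.dimensions.id'].value + ';' + doc['fitments.dimensions.position'].value"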



Invalid Version Format, but version same

2014-02-26 Thread Jorj Ives
Hello,

I'm trying to bring a second node into my cluster. I've set it up as unicast and the two nodes are trying to communicate; however, they claim they're running incompatible versions.

Node One:

> {
>   "ok" : true,
>   "status" : 200,
>   "name" : "ES Server Node",
>   "version" : {
> "number" : "0.90.11",
> "build_hash" : "11da1bacf39cec400fd97581668acb2c5450516c",
> "build_timestamp" : "2014-02-03T15:27:39Z",
> "build_snapshot" : false,
> "lucene_version" : "4.6"
>   },
>   "tagline" : "You Know, for Search"
> }
>
>

Node Two:

> {
>   "ok" : true,
>   "status" : 200,
>   "name" : "Front End Server",
>   "version" : {
> "number" : "0.90.11",
> "build_hash" : "11da1bacf39cec400fd97581668acb2c5450516c",
> "build_timestamp" : "2014-02-03T15:27:39Z",
> "build_snapshot" : false,
> "lucene_version" : "4.6"
>   },
>   "tagline" : "You Know, for Search"
> }
>
>
As you can see, the version and build hash are identical, but I'm still 
getting this error:

Caught exception while handling client http traffic, closing connection [id: 0x1f08af31, /79.125.27.104:60874 :> /10.33.159.105:9200]

java.lang.IllegalArgumentException: invalid version format: SERVERZLAOJ5QORYWPGIVHHQARNQ
    at org.elasticsearch.common.netty.handler.codec.http.HttpVersion.<init>(HttpVersion.java:102)
    at org.elasticsearch.common.netty.handler.codec.http.HttpVersion.valueOf(HttpVersion.java:62)
    at org.elasticsearch.common.netty.handler.codec.http.HttpRequestDecoder.createMessage(HttpRequestDecoder.java:75)
    at org.elasticsearch.common.netty.handler.codec.http.HttpMessageDecoder.decode(HttpMessageDecoder.java:189)
    at org.elasticsearch.common.netty.handler.codec.http.HttpMessageDecoder.decode(HttpMessageDecoder.java:101)
    at org.elasticsearch.common.netty.handler.codec.replay.ReplayingDecoder.callDecode(ReplayingDecoder.java:500)
    at org.elasticsearch.common.netty.handler.codec.replay.ReplayingDecoder.cleanup(ReplayingDecoder.java:554)
    at org.elasticsearch.common.netty.handler.codec.frame.FrameDecoder.channelDisconnected(FrameDecoder.java:365)
    at org.elasticsearch.common.netty.channel.SimpleChannelUpstreamHandler.handleUpstream(SimpleChannelUpstreamHandler.java:102)
    at org.elasticsearch.common.netty.channel.DefaultChannelPipeline.sendUpstream(DefaultChannelPipeline.java:564)
    at org.elasticsearch.common.netty.channel.DefaultChannelPipeline$DefaultChannelHandlerContext.sendUpstream(DefaultChannelPipeline.java:791)
    at org.elasticsearch.common.netty.OpenChannelsHandler.handleUpstream(OpenChannelsHandler.java:74)
    at org.elasticsearch.common.netty.channel.DefaultChannelPipeline.sendUpstream(DefaultChannelPipeline.java:564)
    at org.elasticsearch.common.netty.channel.DefaultChannelPipeline.sendUpstream(DefaultChannelPipeline.java:559)
    at org.elasticsearch.common.netty.channel.Channels.fireChannelDisconnected(Channels.java:396)
    at org.elasticsearch.common.netty.channel.socket.nio.AbstractNioWorker.close(AbstractNioWorker.java:360)
    at org.elasticsearch.common.netty.channel.socket.nio.NioServerSocketPipelineSink.handleAcceptedSocket(NioServerSocketPipelineSink.java:81)
    at org.elasticsearch.common.netty.channel.socket.nio.NioServerSocketPipelineSink.eventSunk(NioServerSocketPipelineSink.java:36)
    at org.elasticsearch.common.netty.channel.DefaultChannelPipeline$DefaultChannelHandlerContext.sendDownstream(DefaultChannelPipeline.java:779)
    at org.elasticsearch.common.netty.handler.codec.oneone.OneToOneEncoder.handleDownstream(OneToOneEncoder.java:54)
    at org.elasticsearch.common.netty.channel.DefaultChannelPipeline.sendDownstream(DefaultChannelPipeline.java:591)
    at org.elasticsearch.common.netty.channel.DefaultChannelPipeline.sendDownstream(DefaultChannelPipeline.java:582)
    at org.elasticsearch.common.netty.channel.Channels.close(Channels.java:812)
    at org.elasticsearch.common.netty.channel.AbstractChannel.close(AbstractChannel.java:197)
    at org.elasticsearch.http.netty.NettyHttpServerTransport.exceptionCaught(NettyHttpServerTransport.java:307)
    at org.elasticsearch.http.netty.HttpRequestHandler.exceptionCaught(HttpRequestHandler.java:49)
    at org.elasticsearch.common.netty.channel.SimpleChannelUpstreamHandler.handleUpstream(SimpleChannelUpstreamHandler.java:112)
    at org.elasticsearch.common.netty.channel.DefaultChannelPipeline.sendUpstream(DefaultChannelPipeline.java:564)
    at org.elasticsearch.common.netty.channel.DefaultChannelPipeline$DefaultChannelHandlerContext.sendUpstream(DefaultChannelPipeline.java:791)
    at org.elasticsearch.common.netty.channel.SimpleChannelUpstreamHandler.exceptionCaught(SimpleChannelUpstreamHandler.java:153)
    at org.elasticsearch.common.netty.channel.SimpleChannelUpstreamHandler.handleUpstream(SimpleChannelUpstreamHandler.java:112)
    at org.elasticsearch.common.netty.channel.DefaultChannelPipeline.sendUpstream(DefaultChannelPipeline.java:564)
    at org.elasticsearch.common.netty.channel.Defau

How to join 2 indexes at query time

2014-02-26 Thread Jayesh Bhoyar
Hi All,

I want to join 2 indexes at query time.
I have created a Gist for this @ 
https://gist.github.com/jsbonline2006/9227299

I have 2 indexes:

Index 1: offerindex/ Type: offertype
Index 2: categoryindex/ Type: categorytype

Now, as per my index data, my input will be the category "Flat TV",
and in the output I want all skuid values for "Flat TV" and their corresponding offer_id.

Regards,
Jayesh Bhoyar

*GIST @ https://gist.github.com/jsbonline2006/9227299*



Re: Text Categorization in ES

2014-02-26 Thread Hannes Korte
On 26.02.2014 08:28, prashy wrote:
> To be specific I want a query like :
> Searching for Laptop will automatically give result for "Dell, Sony, HP,
> Lenevo, Samsung..." as well.

I'm not sure I got that correctly. Besides the text classification we
talked about, this sentence could also mean that you want to expand your
query. So instead of searching only for the term "Laptop" you want the
query to be expanded automatically by adding highly correlated words
like "Dell", "Sony", "HP", etc. to get a broader search result. Is it
like that?

http://en.wikipedia.org/wiki/Query_expansion

Hannes



Re: Text Categorization in ES

2014-02-26 Thread Dawid Weiss
> So it means that all the classification has to be done prior, on the basis of
> user defined scenario.

For proper faceting yes -- this information would either come with
each document or would be extracted statically (when indexing each
document). I'm sure OpenNLP and other text mining projects have named
entity recognition that would be of help here. You may want to check
out Grant's book on the subject.

http://www.manning.com/ingersoll/

> And automatically this feature is not supported either through carrot or
> Lingo3g. Like we have the feature of word-delimiter, hunspell filter etc.

Feel free to try Carrot2 (and Lingo3G) on your data. Cluster labels
*are* sort of dynamic facet labels, but they are not as "ideal" as
statically indexed facets. Also, they are context-dependent (they will
be created from scratch for each search result). They are essentially
a different tool for a different task (to get a fast glimpse into a
larger window of search results, for which static facets are not
indexed).

Dawid



Re: Add an extra flag in ES query to check if a field exists

2014-02-26 Thread Prince
Hi, many thanks for the pointer. I didn't try the code you mentioned, but changed my code like this:

  "script": "!doc['arranged_retweets.author_gender'].empty"

This worked for me. Thank you very much for your help.

On Wednesday, February 19, 2014 9:52:54 PM UTC+5:30, Binh Ly wrote:

> I see, looks like it's an object so you'd probably need to check down to 
> the leaf level, like for example:
>
> doc["arranged_retweets.author_gender"].isEmpty() || 
> doc["arranged_retweets.author_link"].isEmpty()
>



Re: Text Categorization in ES

2014-02-26 Thread prashy
So it means that all the classification has to be done beforehand, based on a user-defined scenario.

And this feature is not supported automatically, either through Carrot2 or Lingo3G, in the way that built-in features like the word-delimiter or hunspell filters are.



--
View this message in context: 
http://elasticsearch-users.115913.n3.nabble.com/Text-Categorization-in-ES-tp4050194p4050517.html
Sent from the ElasticSearch Users mailing list archive at Nabble.com.



Re: Elasticsearch Reverse Suggester Problem

2014-02-26 Thread Garry Welding
However, I did try removing the pre filter, but it didn't change the results.

On Tuesday, February 25, 2014 8:00:14 PM UTC, Nikolas Everett wrote:
>
> I believe the job of the reverse filter is to efficiently provide 
> suggestions that share a suffix with the provided term rather than a 
> prefix.  You might try removing the pre_filter to see if it handles 
> reversed words.
>
> The reason for the reverse index for the suffix is that lucene stores 
> terms in sorted order and the suggester requires there to be a prefix match 
> to slice the portion of the index that must be scanned for terms.
>
> Nik
>
>
> On Tue, Feb 25, 2014 at 2:28 PM, Garry Welding 
> > wrote:
>
>> Really, nobody has an answer to this?



Re: Elasticsearch Reverse Suggester Problem

2014-02-26 Thread Garry Welding
Hi Nik, thanks for the suggestion. That's why I'm using the pre and post 
filters as I want to match the suffix of "upshchair" because I understand 
how Lucene stores terms. As such I have set up a new property called 
name_reverse that stores the product name as reversed tokens. I'm then 
trying to do the 2nd suggester against that and reverse the query passed to 
it by using the reverse pre/post filters to do matching on the suffix 
instead of the prefix.

On Tuesday, February 25, 2014 8:00:14 PM UTC, Nikolas Everett wrote:
>
> I believe the job of the reverse filter is to efficiently provide 
> suggestions that share a suffix with the provided term rather than a 
> prefix.  You might try removing the pre_filter to see if it handles 
> reversed words.
>
> The reason for the reverse index for the suffix is that lucene stores 
> terms in sorted order and the suggester requires there to be a prefix match 
> to slice the portion of the index that must be scanned for terms.
>
> Nik
>
>
> On Tue, Feb 25, 2014 at 2:28 PM, Garry Welding 
> > wrote:
>
>> Really, nobody has an answer to this?



Re: Text Categorization in ES

2014-02-26 Thread Dawid Weiss
> Searching for Laptop will automatically give result for "Dell, Sony, HP,
> Lenevo, Samsung..." as well. As lingo3g is used for clustering the documents
> so it will store the reference for above terms as well.

There is no way to get a clear, intuitive classification like this
from an unsupervised clustering algorithm. You rely on prior knowledge
(that these are companies, that they produce laptops, etc.).

I would use faceting and pre-tag your documents with all the labels
you may wish to display in your user interface. This will be more
reliable and faster. You can then add clustering on top of that as a
form of "dynamic faceting" which users may use to lookup keywords/ key
phrases of groups of search results not covered in regular facets.

> So what should be my query wrt lingo3g to search the specified items.

The plugin contains the required documentation. Like I said though,
the results will be disappointing if you expect perfect ontology from
raw text.

Dawid
