Re: data distribution over shards and replicas

2014-04-02 Thread Subhadip Bagui
Thanks a lot Mark. That explains a lot.

By backup I meant copy of same data.

One last question: for fast searching, which is the better choice, a single 
index with multiple shards or multiple indices with a single shard each?
 
Can you please give some reference on how Lucene splits documents and stores 
them in shards? That will help me get a better idea.


Thanks,
Subhadip



Re: Street Address Queries

2014-04-02 Thread Henri van den Bulk
Thanks for the great explanation. Is there also a comparable equivalent 
when using a query_string query? 

On Wednesday, March 26, 2014 2:25:05 PM UTC-6, Binh Ly wrote:
>
> You probably want to "upgrade" to the match query - "text" queries are 
> older and no longer exist in 1.x. But anyway when you query:
>
> "match": { "f": "S Fun St" }
>
> You are effectively doing (roughly):
>
> f=S or f=fun or f=St
>
> You could make it do AND if you want (in which case a match is only found 
> if the document/field value contains all terms):
>
> {
> "match" : {
> "f" : {
> "query" : "S Fun St",
> "operator" : "and"
> }
> }
> }
>
>
> You could also do OR with a minimum_should_match parameter to specify how 
> many of the individual terms should match the document/field value:
>
> {
> "match" : {
> "f" : {
> "query" : "S Fun St",
> "minimum_should_match" : 2
> }
> }
> }
>
>
>
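For reference, a query_string form with roughly the same AND behaviour as the match example above might look like this (a sketch; the field name f is taken from the quoted example):

{
  "query_string" : {
    "default_field" : "f",
    "query" : "S Fun St",
    "default_operator" : "AND"
  }
}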



Is there a way to know which synonym matched along with field value in the search output

2014-04-02 Thread saiprasad mishra
Hi All

Let's say we are searching on a field which has some stemming filter 
configured for synonyms. If the field has a value x for which there is 
a synonym y, can the search result return both x and y?
Should I do something at index time to store this beforehand, or is 
there any out-of-the-box feature?
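
For context, the kind of synonym filter being described is usually defined in the index analysis settings, roughly like this (a sketch; the analyzer/filter names and the x, y pair are placeholders):

{
  "settings" : {
    "analysis" : {
      "filter" : {
        "my_synonyms" : {
          "type" : "synonym",
          "synonyms" : [ "x, y" ]
        }
      },
      "analyzer" : {
        "my_synonym_analyzer" : {
          "tokenizer" : "standard",
          "filter" : [ "lowercase", "my_synonyms" ]
        }
      }
    }
  }
}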

Regards
Sai



Can _search_with_cluster also cluster the result if content is of type "attachment"?

2014-04-02 Thread Pratikshya Kuinkel
When trying to use Carrot2 with elasticsearch, I need to map the field 
which is of type attachment for creating the logical clusters. Will it be 
able to cluster the results if the content is in base64-encoded format (as 
that field is of type attachment)? At the moment it does not seem to be 
doing so. Please suggest.



Can a nested attachment type be highlighted?

2014-04-02 Thread Pratikshya Kuinkel
I am new to elasticsearch, so I may not be constructing the mapping in a 
correct way. But, my mapping looks as follows:

/myindex/messages/_mapping

{
  "messages": {
    "properties": {
      "author": { "type": "String" },
      "pipe_id": { "type": "String" },
      "files": {
        "type": "nested",
        "properties": {
          "file": {
            "type": "attachment",
            "path": "full",
            "fields": {
              "file_author": { "type": "String" },
              "file_id": { "type": "String" },
              "file_mimetype": { "type": "String" },
              "filename": { "type": "String" },
              "content": {
                "term_vector": "with_positions_offsets",
                "store": "yes",
                "type": "String"
              }
            }
          }
        }
      },
      "tags": {
        "properties": {
          "tag_id": { "type": "String" },
          "tag_name": { "type": "String" }
        },
        "type": "nested"
      },
      "email": { "type": "String" },
      "message_description": { "type": "String" },
      "message_title": { "type": "String" }
    }
  }
}


Now I have indexed the document and been able to search correctly on the 
attachments. But the highlighting on the "content" does not work. My query 
is as follows:

POST /myindex/messages/_search

{
  "query" : {
    "query_string" : {
      "query" : "music"
    }
  },
  "highlight" : {
    "fields" : {
      "files.file.content" : {}
    }
  }
}

It matches the term "music" in my file and displays the base64-encoded 
string, but there is no highlight field in the response. Please suggest 
whether my mapping is correct and why I am not getting the highlighted result.
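
One variation that may be worth trying while debugging (a sketch only, not a confirmed fix): point the query_string at the attachment's content sub-field explicitly, so it is clear whether that field on its own is matchable and highlightable:

POST /myindex/messages/_search
{
  "query" : {
    "query_string" : {
      "query" : "music",
      "fields" : [ "files.file.content" ]
    }
  },
  "highlight" : {
    "fields" : {
      "files.file.content" : {}
    }
  }
}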



Copying fields to a geopoint type ?

2014-04-02 Thread Pascal VINCENT
Hi,

I'm new to elasticsearch. My use case is to load a CSV file containing some 
agencies with geo locations; each line looks like:

id;label;address;zipcode;city;region;latitude;longitude;(and some 
other fields)

I'm using the csv river plugin to index the file.

My mapping is :

{
  "office": {
    "properties": {

      ... (first fields omitted) ...

      "latitude": {
        "type": "double"
      },
      "longitude": {
        "type": "double"
      },
      "location": {
        "type": "geo_point",
        "lat_lon": "true"
      }
    }
  }
}

I'd like to populate location.lat and location.lon from the latitude and 
longitude fields. I tried copy_to with no success:
  "latitude": {
"type": "double",
"copy_to": "location.lat"
  },
  "longitude": {
"type": "double",
"copy_to": "location.lon"
  },

Is there any way to feed the "location" property from the latitude and 
longitude fields at index time?

My point is that I don't want to modify the input CSV file to adapt it to 
the GeoJSON format (i.e., concatenate lat and lon into one field in the CSV file).
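
For reference, a geo_point field accepts several input shapes at index time, so any step that can emit one of these into the location field should work (the coordinate values below are placeholders):

"location" : { "lat" : 48.85, "lon" : 2.35 }
"location" : "48.85,2.35"
"location" : [ 2.35, 48.85 ]   (array form is lon, lat)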

Thank you for any hints.

Pascal.



Re: how to modify term frequency formula?

2014-04-02 Thread Ivan Brusic
Are you using a full class name? I have no problems with

curl -XPOST 'http://localhost:9200/sim/' -d '
{
 "settings" : {
   "similarity" : {
"my_similarity" : {
 "type" :
"org.elasticsearch.index.similarity.NormRemovalSimilarityProvider"
}
  }
 },
 "mappings" : {
  "post" : {
   "properties" : {
"id" : { "type" : "long", "store" : "yes", "precision_step" : "0" },
"name" : { "type" : "string", "store" : "yes", "index" : "analyzed"},
"contents" : { "type" : "string", "store" : "no", "index" : "analyzed",
"similarity" : "my_similarity"}
   }
  }
 }
}
'



On Wed, Apr 2, 2014 at 12:03 PM, geantbrun  wrote:

> In order to better understand the error, I copied your
> NormRemovalSimilarity and NormRemovalSimilarityProvider code snippets in
> usr/share/elasticsearch/lib. I put these 2 files in a jar named
> NormRemovalSimilarity.jar. After restarting the elasticsearch service, I
> tried to create the index with the same mapping as before (except that I
> put "type" : "NormRemoval" in the settings of my_similarity.
>
> The result is the same:
> {"error":"IndexCreationException[[exbd] failed to create index]; nested:
> NoClassSettingsException[Failed to load class setting [type] with value
> [NormRemoval]]; nested:
> ClassNotFoundException[org.elasticsearch.index.similarity.normremoval.NormRemovalSimilarityProvider];
> ","status":500}]
>
> I deleted the jar file just to see if the error is the same: yes it is.
> It's like the new similarity is never found or loaded. Is it still working
> without modifications on your side?
> Cheers,
> Patrick
>
>
> Le mercredi 2 avril 2014 00:31:44 UTC-4, Ivan Brusic a écrit :
>>
>> It has been a while since I used a custom similarity, but what you have
>> looks right. Can you try a full class name instead?
>> Use org.elasticsearch.index.similarity.tfCappedSimilarityProvider.
>> According to the error, it is looking for org.elasticsearch.index.si
>> milarity.tfcappedsimilarity.tfCappedSimilaritySimilarityProvider.
>>
>> --
>> Ivan
>>
>>
>> On Tue, Apr 1, 2014 at 7:00 AM, geantbrun  wrote:
>>
>>> Sure.
>>>
>>> {
>>>  "settings" : {
>>>   "index" : {
>>>"similarity" : {
>>> "my_similarity" : {
>>>  "type" : "tfCappedSimilarity"
>>> }
>>>}
>>>   }
>>>  },
>>>  "mappings" : {
>>>   "post" : {
>>>"properties" : {
>>> "id" : { "type" : "long", "store" : "yes", "precision_step" : "0" },
>>> "name" : { "type" : "string", "store" : "yes", "index" : "analyzed"},
>>> "contents" : { "type" : "string", "store" : "no", "index" :
>>> "analyzed", "similarity" : "my_similarity"}
>>>}
>>>   }
>>>  }
>>> }
>>>
>>> If I substitute tfCappedSimilarity for tfCapped in the mapping, the
>>> error is the same except that provider is referred as
>>> tfCappedSimilarityProvider and not as tfCappedSimilaritySimilarit
>>> yProvider.
>>> Cheers,
>>> Patrick
>>>
>>>
>>> Le lundi 31 mars 2014 17:13:24 UTC-4, Ivan Brusic a écrit :

 Can you also post your mapping where you defined the similarity?

 --
 Ivan


 On Mon, Mar 31, 2014 at 10:36 AM, geantbrun wrote:

> I realize that I probably have to define the similarity property of my
> field as "my_similarity" (and not as "tfCappedSimilarity") and define in
> the settings my_similarity as being of type tfCappedSimilarity.
> When I do that, I get the following error at the index/mapping
> creation:
>
> {"error":"IndexCreationException[[exbd] failed to create index];
> nested: NoClassSettingsException[Failed to load class setting [type]
> with value [tfCappedSimilarity]]; nested: ClassNotFoundException[org.
> elasticsearch.index.similarity.tfcappedsimilarity.tfCappedSim
> ilaritySimilarityProvider]; ","status":500}]
>
> Note that the provider is referred in the error as
> tfCappedSimilaritySimilarityProvider (similarity repeated 2 times). Is
> it normal?
> Patrick
>
> Le lundi 31 mars 2014 13:06:00 UTC-4, geantbrun a écrit :
>
>> Hi Ivan,
>> I followed your instructions but it does not seem to work, I must be
>> wrong somewhere. I created the jar file from the following two java 
>> files,
>> could you tell me if they are ok?
>>
>> tfCappedSimilarity.java
>> ***
>> package org.elasticsearch.index.similarity;
>>
>> import org.apache.lucene.search.similarities.DefaultSimilarity;
>> import org.elasticsearch.common.logging.ESLogger;
>> import org.elasticsearch.common.logging.Loggers;
>>
>> public class tfCappedSimilarity extends DefaultSimilarity {
>>
>> private ESLogger logger;
>>
>> public tfCappedSimilarity() {
>> logger = Loggers.getLogger(getClass());
>> }
>>
>> /**
>>  * Capped tf value
>>  */
>> @Override
>> public float tf(float freq) {
>> return (float)Math.sqrt(M

Re: data distribution over shards and replicas

2014-04-02 Thread Mark Walkom
1 - Data from both will be available, you've just told ES not to use the
defaults for one index. A replica is not a backup, it's a 1:1 replica so it
will contain the same data as the primary shard.
2 - Not sure, but I don't think so, as Lucene will try to split things.
Routing is the recommended method for what you want (see the sketch below).
3 - Yes, although you are unlikely to have them both on one node unless it
is a single-node cluster. What do you mean by backup? If you're talking about
replicas instead, then the cluster will build a new replica if one dies.
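
For the routing suggestion in point 2, a minimal sketch of index-time and search-time routing (index, type and routing value are placeholders):

# all documents indexed with the same routing value land on the same shard;
# shard = hash(routing) % number_of_primary_shards, and routing defaults to the doc _id
curl -XPUT 'localhost:9200/myindex/mytype/1?routing=groupA' -d '{"field":"value"}'

# searching with the same routing value only queries the shard(s) for that value
curl -XGET 'localhost:9200/myindex/_search?routing=groupA' -d '{"query":{"match_all":{}}}'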

Regards,
Mark Walkom

Infrastructure Engineer
Campaign Monitor
email: ma...@campaignmonitor.com
web: www.campaignmonitor.com


On 3 April 2014 00:21, Subhadip Bagui  wrote:

> Thanks Mark for the prompt reply, I have some more doubts
>
> 1. Suppose one index is running with 3 shards and 1 replica and another
> index is running with the cluster settings, i.e. 5 shards and 2 replicas.
> Will a total of 3+1 or 5+2 shards be available in the cluster? I have
> installed the elasticsearch-head plugin but the replica shards are not showing there.
>
> For data distribution, does a replica shard also keep other indices'
> documents, or is it used only to keep a copy of the same data?
>
> 2. So documents under the same index will be split due to sharding and
> distributed over the shards, right? Can we push all the documents for the same
> index into a particular shard? I don't want to use custom routing, as then I
> would need one field value common to all the documents. How can we find out
> which shard is holding which documents?
>
> 3. If I make one index with 2 shards and no replica, and the node in the
> cluster holding these 2 shards dies, will I lose the data, or will the data
> have a copy in a cluster-level replica? If I have only 1 replica and
> the node holding the replica dies, then how will recovery happen?
>



Re: elasticsearch data node and kibana on different machines

2014-04-02 Thread Mark Walkom
When you say different data nodes, do you mean nodes that are part of the
same cluster? If so, then all you do is point Kibana to one node and it will
read any data you request from that cluster.

You need to remove the extra quotes you have in that variable; you only
need them around the entire value, not around the http, IP and port separately.
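
In other words, the line would become (keeping the placeholder address from the quoted config below):

elasticsearch: "http://192.168.xx.xxx:9200"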

Regards,
Mark Walkom

Infrastructure Engineer
Campaign Monitor
email: ma...@campaignmonitor.com
web: www.campaignmonitor.com


On 3 April 2014 02:33, computer engineer  wrote:

> I would like to know what is the best setup to have an elasticsearch data
> node and kibana server on separate machines. I have setup multiple
> elasticsearch data nodes and would like to show all dashboards on one
> server but not sure how to do that. I do not want to have different urls to
> view different dashboards. I set up the data nodes with logstash shipper on
> each machine so all I need now is to have kibana get data from each
> different data node. Is that possible? I edited the config file for kibana
> as follows:
>
> elasticsearch: "http://"192.168.xx.xxx":9200";
>



Re: Aggregations on nested array types

2014-04-02 Thread dazraf
Thanks very much Mark! I'll study this and respond back on this thread.

On Wednesday, 2 April 2014 18:31:29 UTC+1, Mark Harwood wrote:
>
> A rough Gist here that sums OK with one level of nesting: 
> https://gist.github.com/markharwood/9938890
>
>
>
> On Wednesday, April 2, 2014 5:13:22 PM UTC+1, dazraf wrote:
>>
>> Hi,
>> I've also experimented with nested types using dynamic templates. 
>> Interesting (empty!) aggregation results!
>> Gist: https://gist.github.com/dazraf/9937198
>>
>> Would be grateful if anyone can shed some light on this please?
>>
>> Thank you.
>>
>> On Wednesday, 2 April 2014 16:05:00 UTC+1, dazraf wrote:
>>>
>>> Hi,
>>>
>>> Gist: https://gist.github.com/dazraf/9935814
>>>
>>> Basically, I'd like to be able to aggregate a field of an array of 
>>> observations, grouped by an ancestor/parent id. 
>>> So for example (see gist): Aggregate the timings per contestant across a 
>>> set of contests.
>>>
>>> I realise that the data can be structured differently - effectively 
>>> flattened to a document per contest-contestant-contest. 
>>> However, I don't have the luxury of doing this in the real-world case. 
>>>
>>> Any help much appreciated.
>>>
>>> Thank you.
>>>
>>
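
For anyone following the gists above, the shape of aggregation being discussed, a nested aggregation that groups by a contestant field and sums a timing field, would look roughly like this (field names are placeholders, not taken from the gists):

{
  "size" : 0,
  "aggs" : {
    "contestants" : {
      "nested" : { "path" : "contestants" },
      "aggs" : {
        "per_contestant" : {
          "terms" : { "field" : "contestants.name" },
          "aggs" : {
            "total_time" : { "sum" : { "field" : "contestants.timing" } }
          }
        }
      }
    }
  }
}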



Re: how to modify term frequency formula?

2014-04-02 Thread geantbrun
In order to better understand the error, I copied your 
NormRemovalSimilarity and NormRemovalSimilarityProvider code snippets into 
usr/share/elasticsearch/lib. I put these 2 files in a jar named 
NormRemovalSimilarity.jar. After restarting the elasticsearch service, I 
tried to create the index with the same mapping as before (except that I 
put "type" : "NormRemoval" in the settings of my_similarity).

The result is the same: 
{"error":"IndexCreationException[[exbd] failed to create index]; nested: 
NoClassSettingsException[Failed to load class setting [type] with value 
[NormRemoval]]; nested: 
ClassNotFoundException[org.elasticsearch.index.similarity.normremoval.NormRemovalSimilarityProvider];
 
","status":500}]

I deleted the jar file just to see if the error is the same: yes it is. 
It's like the new similarity is never found or loaded. Is it still working 
without modifications on your side?
Cheers,
Patrick


Le mercredi 2 avril 2014 00:31:44 UTC-4, Ivan Brusic a écrit :
>
> It has been a while since I used a custom similarity, but what you have 
> looks right. Can you try a full class name instead? 
> Use org.elasticsearch.index.similarity.tfCappedSimilarityProvider. 
> According to the error, it is looking for org.elasticsearch.index.
> similarity.tfcappedsimilarity.tfCappedSimilaritySimilarityProvider.
>
> -- 
> Ivan
>
>
> On Tue, Apr 1, 2014 at 7:00 AM, geantbrun 
> > wrote:
>
>> Sure.
>>
>> {
>>  "settings" : {
>>   "index" : {
>>"similarity" : {
>> "my_similarity" : {
>>  "type" : "tfCappedSimilarity"
>> }
>>}
>>   }
>>  },
>>  "mappings" : {
>>   "post" : {
>>"properties" : {
>> "id" : { "type" : "long", "store" : "yes", "precision_step" : "0" },
>> "name" : { "type" : "string", "store" : "yes", "index" : "analyzed"},
>> "contents" : { "type" : "string", "store" : "no", "index" : 
>> "analyzed", "similarity" : "my_similarity"}
>>}
>>   }
>>  }
>> }
>>
>> If I substitute tfCappedSimilarity for tfCapped in the mapping, the 
>> error is the same except that provider is referred as 
>> tfCappedSimilarityProvider and not as tfCappedSimilaritySimilarit
>> yProvider.
>> Cheers,
>> Patrick
>>
>>
>> Le lundi 31 mars 2014 17:13:24 UTC-4, Ivan Brusic a écrit :
>>>
>>> Can you also post your mapping where you defined the similarity?
>>>
>>> -- 
>>> Ivan
>>>
>>>
>>> On Mon, Mar 31, 2014 at 10:36 AM, geantbrun  wrote:
>>>
 I realize that I probably have to define the similarity property of my 
 field as "my_similarity" (and not as "tfCappedSimilarity") and define in 
 the settings my_similarity as being of type tfCappedSimilarity.
 When I do that, I get the following error at the index/mapping creation:

 {"error":"IndexCreationException[[exbd] failed to create index]; 
 nested: NoClassSettingsException[Failed to load class setting [type] 
 with value [tfCappedSimilarity]]; nested: ClassNotFoundException[org.
 elasticsearch.index.similarity.tfcappedsimilarity.
 tfCappedSimilaritySimilarityProvider]; ","status":500}]

 Note that the provider is referred in the error as 
 tfCappedSimilaritySimilarityProvider (similarity repeated 2 times). Is 
 it normal?
 Patrick

 Le lundi 31 mars 2014 13:06:00 UTC-4, geantbrun a écrit :

> Hi Ivan,
> I followed your instructions but it does not seem to work, I must be 
> wrong somewhere. I created the jar file from the following two java 
> files, 
> could you tell me if they are ok?
>
> tfCappedSimilarity.java
> ***
> package org.elasticsearch.index.similarity;
>
> import org.apache.lucene.search.similarities.DefaultSimilarity;
> import org.elasticsearch.common.logging.ESLogger;
> import org.elasticsearch.common.logging.Loggers;
>  
> public class tfCappedSimilarity extends DefaultSimilarity {
>
> private ESLogger logger;
>
> public tfCappedSimilarity() {
> logger = Loggers.getLogger(getClass());
> }
>
> /**
>  * Capped tf value
>  */
> @Override
> public float tf(float freq) {
> return (float)Math.sqrt(Math.min(9, freq));
> }
> }
>
> tfCappedSimilarityProvider.java
> *
> package org.elasticsearch.index.similarity;
>
> import org.elasticsearch.common.inject.Inject;
> import org.elasticsearch.common.inject.assistedinject.Assisted;
> import org.elasticsearch.common.settings.Settings;
>
> public class tfCappedSimilarityProvider extends 
> AbstractSimilarityProvider {
>
> private tfCappedSimilarity similarity;
>
> @Inject
> public tfCappedSimilarityProvider(@Assisted String name, 
> @Assisted Settings settings) {
>  super(name);
> this.similarity = new tfCapped

Re: using java get document api within a script field

2014-04-02 Thread mat taylor
Yes that would be very interesting. 
I have also got a good workaround to my issue now by using the lookup 
script from https://github.com/imotov/elasticsearch-native-script-example


On Wednesday, April 2, 2014 1:17:52 AM UTC-7, Jörg Prante wrote:
>
> I wrote a denormalizer plugin where I use a node client from a field 
> analyzer for a field type "deref". A node client is started as a singleton 
> per node where the plugin is installed. It can ask other indexes/types for 
> a doc by given ID for injecting additional terms from an array of terms of 
> the referenced doc into the current tokenizer stream.
>
> It is not well tested and early in development, but I can share the code 
> if there is interest.
>
> I would never start a client node from a script, because of the enormous 
> overhead, as you have experienced.
>
> Jörg
>
>



Re: Sense on github abandoned?

2014-04-02 Thread AlexR
If it is a matter of paying for Sense, I would vote for a paid Chrome extension 
at a reasonable price, so people who need Sense can purchase it independently 
of Marvel.



Re: Rolling restart of a cluster?

2014-04-02 Thread Nikolas Everett
I'm not sure what is up, but my advice is to make sure you read the cluster
state from the node you are restarting. That'll make sure it is up in the
first place, and you'll get that node's view of the cluster.
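
A minimal sketch of that check with curl (the node address is a placeholder):

# ask the node being restarted directly, rather than a load balancer or another node
curl -s "http://$NODE:9200/_cluster/health?wait_for_status=green&timeout=120s"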


Nik


On Wed, Apr 2, 2014 at 2:08 PM, Mike Deeks  wrote:

> That is exactly what I'm doing. For some reason the cluster reports as
> green even though an entire node is down. The cluster doesn't seem to
> notice the node is gone and change to yellow until many seconds later. By
> then my rolling restart script has already gotten to the second node and
> killed it because the cluster was still green for some reason.
>
>
> On Wednesday, April 2, 2014 4:23:32 AM UTC-7, Petter Abrahamsson wrote:
>
>> Mike,
>>
>> Your script needs to check for the status of the cluster before shutting
>> down a node, ie if the state is yellow wait until it becomes green again
>> before shutting down the next node. You'll probably want do disable
>> allocation of shards while each node is being restarted (enable when node
>> comes back) in order to minimize the amount of data that needs to be
>> rebalanced.
>> Also make sure to have 'discovery.zen.minimum_master_nodes' correctly
>> set in your elasticsearch.yml file.
>>
>> Meta code
>>
>> for node in $cluster_nodes; do
>>if [ $cluster_status == 'green' ]; then
>> cluster_disable_allocation()
>> shutdown_node($node)
>> wait_for_node_to_rejoin()
>> cluster_enable_allocation()
>> wait_for_cluster_status_green()
>>   fi
>> done
>>
>> http://www.elasticsearch.org/guide/en/elasticsearch/
>> reference/current/modules-cluster.html
>>
>> /petter
>>
>>
>> On Tue, Apr 1, 2014 at 6:19 PM, Mike Deeks  wrote:
>>
>>> What is the proper way of performing a rolling restart of a cluster? I
>>> currently have my stop script check for the cluster health to be green
>>> before stopping itself. Unfortunately this doesn't appear to be working.
>>>
>>> My setup:
>>> ES 1.0.0
>>> 3 node cluster w/ 1 replica.
>>>
>>> When I perform the rolling restart I see the cluster still reporting a
>>> green state when a node is down. In theory that should be a yellow state
>>> since some shards will be unallocated. My script output during a rolling
>>> restart:
>>> 1396388310 21:38:30 dev_cluster green 3 3 1202 601 2 0 0
>>> 1396388310 21:38:30 dev_cluster green 3 3 1202 601 2 0 0
>>> 1396388310 21:38:30 dev_cluster green 3 3 1202 601 2 0 0
>>>
>>> 1396388312 21:38:32 dev_cluster green 3 3 1202 601 2 0 0
>>> 1396388312 21:38:32 dev_cluster green 3 3 1202 601 2 0 0
>>> 1396388312 21:38:32 dev_cluster green 3 3 1202 601 2 0 0
>>>
>>> curl: (52) Empty reply from server
>>> 1396388313 21:38:33 dev_cluster green 3 3 1202 601 2 0 0
>>> 1396388313 21:38:33 dev_cluster green 3 3 1202 601 2 0 0
>>>
>>> curl: (52) Empty reply from server
>>> 1396388314 21:38:34 dev_cluster green 3 3 1202 601 2 0 0
>>> 1396388314 21:38:34 dev_cluster green 3 3 1202 601 2 0 0
>>> ... continues as green for many more seconds...
>>>
>>> Since it is reporting as green, the second node thinks it can stop and
>>> ends up putting the cluster into a broken red state:
>>> curl: (52) Empty reply from server
>>> curl: (52) Empty reply from server
>>> 1396388339 21:38:59 dev_cluster green 2 2 1202 601 2 0 0
>>>
>>> curl: (52) Empty reply from server
>>> curl: (52) Empty reply from server
>>> 1396388341 21:39:01 dev_cluster yellow 2 2 664 601 2 8 530
>>>
>>> curl: (52) Empty reply from server
>>> curl: (52) Empty reply from server
>>> 1396388342 21:39:02 dev_cluster yellow 2 2 664 601 2 8 530
>>>
>>> curl: (52) Empty reply from server
>>> curl: (52) Empty reply from server
>>> 1396388343 21:39:03 dev_cluster yellow 2 2 664 601 2 8 530
>>>
>>> curl: (52) Empty reply from server
>>> curl: (52) Empty reply from server
>>> 1396388345 21:39:05 dev_cluster yellow 1 1 664 601 2 8 530
>>>
>>> curl: (52) Empty reply from server
>>> curl: (52) Empty reply from server
>>> 1396388346 21:39:06 dev_cluster yellow 1 1 664 601 2 8 530
>>>
>>> curl: (52) Empty reply from server
>>> curl: (52) Empty reply from server
>>> 1396388347 21:39:07 dev_cluster red 1 1 156 156 0 0 1046
>>>
>>> My stop script issues a call to http://localhost:9200/_
>>> cluster/nodes/_local/_shutdown to kill the node. Is it possible the
>>> other nodes are waiting to timeout the down node before moving into the
>>> yellow state? I would assume the shutdown API call would inform the other
>>> nodes that it is going down.
>>>
>>> Appreciate any help on how to do this properly.
>>>

Re: Rolling restart of a cluster?

2014-04-02 Thread Ivan Brusic
My scripts wait for yellow before waiting for green because, as you
noticed, the cluster does not enter a yellow state immediately following
a cluster event (shutdown, replica change).
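
A minimal sketch of that ordering, using the cluster health API's wait_for_status parameter (host and timeouts are placeholders):

# first wait until the cluster has at least noticed the restart (yellow),
# then wait for it to fully recover (green) before touching the next node
curl -s 'localhost:9200/_cluster/health?wait_for_status=yellow&timeout=60s'
curl -s 'localhost:9200/_cluster/health?wait_for_status=green&timeout=600s'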

-- 
Ivan


On Wed, Apr 2, 2014 at 11:08 AM, Mike Deeks  wrote:

> That is exactly what I'm doing. For some reason the cluster reports as
> green even though an entire node is down. The cluster doesn't seem to
> notice the node is gone and change to yellow until many seconds later. By
> then my rolling restart script has already gotten to the second node and
> killed it because the cluster was still green for some reason.
>
>
> On Wednesday, April 2, 2014 4:23:32 AM UTC-7, Petter Abrahamsson wrote:
>
>> Mike,
>>
>> Your script needs to check for the status of the cluster before shutting
>> down a node, ie if the state is yellow wait until it becomes green again
>> before shutting down the next node. You'll probably want do disable
>> allocation of shards while each node is being restarted (enable when node
>> comes back) in order to minimize the amount of data that needs to be
>> rebalanced.
>> Also make sure to have 'discovery.zen.minimum_master_nodes' correctly
>> set in your elasticsearch.yml file.
>>
>> Meta code
>>
>> for node in $cluster_nodes; do
>>   if [ $cluster_status == 'green' ]; then
>> cluster_disable_allocation()
>> shutdown_node($node)
>> wait_for_node_to_rejoin()
>> cluster_enable_allocation()
>> wait_for_cluster_status_green()
>>   fi
>> done
>>
>> http://www.elasticsearch.org/guide/en/elasticsearch/
>> reference/current/modules-cluster.html
>>
>> /petter
>>
>>
>> On Tue, Apr 1, 2014 at 6:19 PM, Mike Deeks  wrote:
>>
>>> What is the proper way of performing a rolling restart of a cluster? I
>>> currently have my stop script check for the cluster health to be green
>>> before stopping itself. Unfortunately this doesn't appear to be working.
>>>
>>> My setup:
>>> ES 1.0.0
>>> 3 node cluster w/ 1 replica.
>>>
>>> When I perform the rolling restart I see the cluster still reporting a
>>> green state when a node is down. In theory that should be a yellow state
>>> since some shards will be unallocated. My script output during a rolling
>>> restart:
>>> 1396388310 21:38:30 dev_cluster green 3 3 1202 601 2 0 0
>>> 1396388310 21:38:30 dev_cluster green 3 3 1202 601 2 0 0
>>> 1396388310 21:38:30 dev_cluster green 3 3 1202 601 2 0 0
>>>
>>> 1396388312 21:38:32 dev_cluster green 3 3 1202 601 2 0 0
>>> 1396388312 21:38:32 dev_cluster green 3 3 1202 601 2 0 0
>>> 1396388312 21:38:32 dev_cluster green 3 3 1202 601 2 0 0
>>>
>>> curl: (52) Empty reply from server
>>> 1396388313 21:38:33 dev_cluster green 3 3 1202 601 2 0 0
>>> 1396388313 21:38:33 dev_cluster green 3 3 1202 601 2 0 0
>>>
>>> curl: (52) Empty reply from server
>>> 1396388314 21:38:34 dev_cluster green 3 3 1202 601 2 0 0
>>> 1396388314 21:38:34 dev_cluster green 3 3 1202 601 2 0 0
>>> ... continues as green for many more seconds...
>>>
>>> Since it is reporting as green, the second node thinks it can stop and
>>> ends up putting the cluster into a broken red state:
>>> curl: (52) Empty reply from server
>>> curl: (52) Empty reply from server
>>> 1396388339 21:38:59 dev_cluster green 2 2 1202 601 2 0 0
>>>
>>> curl: (52) Empty reply from server
>>> curl: (52) Empty reply from server
>>> 1396388341 21:39:01 dev_cluster yellow 2 2 664 601 2 8 530
>>>
>>> curl: (52) Empty reply from server
>>> curl: (52) Empty reply from server
>>> 1396388342 21:39:02 dev_cluster yellow 2 2 664 601 2 8 530
>>>
>>> curl: (52) Empty reply from server
>>> curl: (52) Empty reply from server
>>> 1396388343 21:39:03 dev_cluster yellow 2 2 664 601 2 8 530
>>>
>>> curl: (52) Empty reply from server
>>> curl: (52) Empty reply from server
>>> 1396388345 21:39:05 dev_cluster yellow 1 1 664 601 2 8 530
>>>
>>> curl: (52) Empty reply from server
>>> curl: (52) Empty reply from server
>>> 1396388346 21:39:06 dev_cluster yellow 1 1 664 601 2 8 530
>>>
>>> curl: (52) Empty reply from server
>>> curl: (52) Empty reply from server
>>> 1396388347 21:39:07 dev_cluster red 1 1 156 156 0 0 1046
>>>
>>> My stop script issues a call to http://localhost:9200/_
>>> cluster/nodes/_local/_shutdown to kill the node. Is it possible the
>>> other nodes are waiting to timeout the down node before moving into the
>>> yellow state? I would assume the shutdown API call would inform the other
>>> nodes that it is going down.
>>>
>>> Appreciate any help on how to do this properly.
>>>

Delete documents after split brain

2014-04-02 Thread Greg
Hi,

I am running a cluster of 5 servers, Elasticsearch version 0.90.5.

Today we ran into split brain. One of the servers saw all servers and was a 
master, and the other 4 servers saw only 4 servers and had another server as 
their master. We restarted the "broken" server, so the problem was gone.

I need to delete a few documents from an index. The index has 5 shards and 1 
replica. But if I try delete-by-query, elasticsearch says 5 shards failed 
and the documents stay there. If I try to delete a document by id, elasticsearch 
says the document does not exist. Do you have any ideas how I can delete the 
documents?

Thanks

Greg



Re: Rolling restart of a cluster?

2014-04-02 Thread Mike Deeks
That is exactly what I'm doing. For some reason the cluster reports as 
green even though an entire node is down. The cluster doesn't seem to 
notice the node is gone and change to yellow until many seconds later. By 
then my rolling restart script has already gotten to the second node and 
killed it because the cluster was still green for some reason.

On Wednesday, April 2, 2014 4:23:32 AM UTC-7, Petter Abrahamsson wrote:
>
> Mike,
>
> Your script needs to check for the status of the cluster before shutting 
> down a node, ie if the state is yellow wait until it becomes green again 
> before shutting down the next node. You'll probably want do disable 
> allocation of shards while each node is being restarted (enable when node 
> comes back) in order to minimize the amount of data that needs to be 
> rebalanced.
> Also make sure to have 'discovery.zen.minimum_master_nodes' correctly set 
> in your elasticsearch.yml file.
>
> Meta code
>
> for node in $cluster_nodes; do
>   if [ $cluster_status == 'green' ]; then
> cluster_disable_allocation()
> shutdown_node($node)
> wait_for_node_to_rejoin()
> cluster_enable_allocation()
> wait_for_cluster_status_green()
>   fi
> done
>
>
> http://www.elasticsearch.org/guide/en/elasticsearch/reference/current/modules-cluster.html
>
> /petter
>
>
> On Tue, Apr 1, 2014 at 6:19 PM, Mike Deeks 
> > wrote:
>
>> What is the proper way of performing a rolling restart of a cluster? I 
>> currently have my stop script check for the cluster health to be green 
>> before stopping itself. Unfortunately this doesn't appear to be working.
>>
>> My setup:
>> ES 1.0.0
>> 3 node cluster w/ 1 replica.
>>
>> When I perform the rolling restart I see the cluster still reporting a 
>> green state when a node is down. In theory that should be a yellow state 
>> since some shards will be unallocated. My script output during a rolling 
>> restart:
>> 1396388310 21:38:30 dev_cluster green 3 3 1202 601 2 0 0
>> 1396388310 21:38:30 dev_cluster green 3 3 1202 601 2 0 0
>> 1396388310 21:38:30 dev_cluster green 3 3 1202 601 2 0 0
>>
>> 1396388312 21:38:32 dev_cluster green 3 3 1202 601 2 0 0
>> 1396388312 21:38:32 dev_cluster green 3 3 1202 601 2 0 0
>> 1396388312 21:38:32 dev_cluster green 3 3 1202 601 2 0 0
>>
>> curl: (52) Empty reply from server
>> 1396388313 21:38:33 dev_cluster green 3 3 1202 601 2 0 0
>> 1396388313 21:38:33 dev_cluster green 3 3 1202 601 2 0 0
>>
>> curl: (52) Empty reply from server
>> 1396388314 21:38:34 dev_cluster green 3 3 1202 601 2 0 0
>> 1396388314 21:38:34 dev_cluster green 3 3 1202 601 2 0 0
>> ... continues as green for many more seconds...
>>
>> Since it is reporting as green, the second node thinks it can stop and 
>> ends up putting the cluster into a broken red state:
>> curl: (52) Empty reply from server
>> curl: (52) Empty reply from server
>> 1396388339 21:38:59 dev_cluster green 2 2 1202 601 2 0 0
>>
>> curl: (52) Empty reply from server
>> curl: (52) Empty reply from server
>> 1396388341 21:39:01 dev_cluster yellow 2 2 664 601 2 8 530
>>
>> curl: (52) Empty reply from server
>> curl: (52) Empty reply from server
>> 1396388342 21:39:02 dev_cluster yellow 2 2 664 601 2 8 530
>>
>> curl: (52) Empty reply from server
>> curl: (52) Empty reply from server
>> 1396388343 21:39:03 dev_cluster yellow 2 2 664 601 2 8 530
>>
>> curl: (52) Empty reply from server
>> curl: (52) Empty reply from server
>> 1396388345 21:39:05 dev_cluster yellow 1 1 664 601 2 8 530
>>
>> curl: (52) Empty reply from server
>> curl: (52) Empty reply from server
>> 1396388346 21:39:06 dev_cluster yellow 1 1 664 601 2 8 530
>>
>> curl: (52) Empty reply from server
>> curl: (52) Empty reply from server
>> 1396388347 21:39:07 dev_cluster red 1 1 156 156 0 0 1046
>>
>> My stop script issues a call to 
>> http://localhost:9200/_cluster/nodes/_local/_shutdown to kill the node. 
>> Is it possible the other nodes are waiting to timeout the down node before 
>> moving into the yellow state? I would assume the shutdown API call would 
>> inform the other nodes that it is going down.
>>
>> Appreciate any help on how to do this properly.
>>
>
>


Re: Relevancy sorting of result returned

2014-04-02 Thread chee hoo lum
Hi Ivan,

Nope i didn't disable the norm. Here's the mapping :

{
"media": {
"properties": {
"AUDIO": {
"type": "string"
},
"BILLINGTYPE_ID": {
"type": "long"
},
"CATMEDIA_CDATE": {
"type": "date",
"format": "dateOptionalTime"
},
"CATMEDIA_NAME": {
"type": "string"
},
"CATMEDIA_RANK": {
"type": "long"
},
"CAT_ID": {
"type": "long"
},
"CAT_NAME": {
"type": "string",
"analyzer": "string_lowercase",
"include_in_all": true
},
"CAT_PARENT": {
"type": "long"
},
"CHANNEL_ID": {
"type": "long"
},
"CKEY": {
"type": "long"
},
"DISPLAY_NAME": {
"type": "string"
},
"FTID": {
"type": "string"
},
"GENRE": {
"type": "string"
},
"ITEMCODE": {
"type": "string"
},
"KEYWORDS": {
"type": "string"
},
"LANG_ID": {
"type": "long"
},
"LONG_DESCRIPTION": {
"type": "string"
},
"MAPPINGS": {
"type": "string",
"analyzer": "string_lowercase",
"include_in_all": true
},
"MEDIA_ID": {
"type": "long"
},
"MEDIA_PKEY": {
"type": "string"
},
"PERFORMER": {
"type": "string"
},
"PLAYER": {
"type": "string"
},
"POSITION": {
"type": "long"
},
"PRICE": {
"type": "double"
},
"PRIORITY": {
"type": "long"
},
"SHORTCODE": {
"type": "string"
},
"SHORT_DESCRIPTION": {
"type": "string"
},
"TYPE_ID": {
"type": "long"
},
"VIEW_ID": {
"type": "long"
}
}
}
}


My client is nagging about the relevancy of the results returned. You know, business
users always compare with Google search results and such. For now I am
scratching my head trying to sort this problem out. My use case is to search
by DISPLAY_NAME and PERFORMER and show the closest matches at the
top of the list.

e.g.:

1) Happy
2) Happy
3) Be Happy

It would be deeply appreciated if you could shed some light on this. Thanks






On Thu, Apr 3, 2014 at 1:51 AM, Ivan Brusic  wrote:

> All the documents have the same score since they have the same field
> weight, idf (always the same when you only have one search term) and term
> frequency (each document has the term once).
>
> It appears that you disabled norms on the DISPLAY_NAME field since the
> field norm is 1. Is this correct? Can you provide the mapping? If you
> disable norms, you will no longer get length normalization, which would
> provide the ordering you desire since the field norms will penalize the
> longer field, but it might not be ideal for every search. Relevancy
> ultimately depends on you and your use cases. Another option is to enable
> term vectors [1] (or index the number of terms yourself) and see if the
> resulting field has the same number of tokens returned.  Very kludgy.
>
> [1]
> http://www.elasticsearch.org/guide/en/elasticsearch/reference/current/docs-termvectors.html
>
> Cheers,
>
> Ivan
>
> On Wed, Apr 2, 2014 at 4:02 AM, chee hoo lum  wrote:
>
>> Hi Binh,
>>
>> The same problem again. I have the following queries :
>>
>> 1)
>>
>> {
>>   "from" : 0,
>>   "size" : 100,
>>   "explain" : true,
>>   "query" : {
>> "filtered" : {
>>   "query" : {
>>  "multi_match": {
>>   "query": "happy",
>>   "fields": [ "DISPLAY_NAME^6", "PERFORMER" ]
>> }
>>   },
>>   "filter" : {
>> "query" : {
>>   "bool" : {
>>   "must" : {
>> "term" : {
>>   "CHANNEL_ID" : "1"
>> }
>>   }
>> }
>> }
>>   }
>> }
>>   }
>> }
>>
>> However the results display in reverse order for #2 and #3. I have added
>> a boost on DISPLAY_NAME but it still yields the same behaviour:
>>
>> 1)
>> * "_score": 10.960511,*
>> "_source": {
>> "DISPLAY_NAME": "Happy",
>> "PRICE": 5,
>> "CHANNEL_ID": 1,
>> "CAT_PARENT": 981,
>> "MEDIA_ID": 390933,
>> "GENRE": "Happy",
>>  

Re: Relevancy sorting of result returned

2014-04-02 Thread Ivan Brusic
All the documents have the same score since they have the same field
weight, idf (always the same when you only have one search term) and term
frequency (each document has the term once).

It appears that you disabled norms on the DISPLAY_NAME field, since the
field norm is 1. Is this correct? Can you provide the mapping? If you
disable norms, you will no longer get length normalization, which would
provide the ordering you desire since the field norms penalize the
longer field, but it might not be ideal for every search. Relevancy
ultimately depends on you and your use cases. Another option is to enable
term vectors [1] (or index the number of terms yourself) and see if the
resulting field has the same number of tokens returned. Very kludgy.

[1]
http://www.elasticsearch.org/guide/en/elasticsearch/reference/current/docs-termvectors.html
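
For illustration, a 1.x string mapping with norms explicitly enabled would look roughly like this (a sketch using the field name from this thread; changing the setting on an existing index would likely still require reindexing):

{
  "media" : {
    "properties" : {
      "DISPLAY_NAME" : {
        "type" : "string",
        "norms" : { "enabled" : true }
      }
    }
  }
}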

Cheers,

Ivan

On Wed, Apr 2, 2014 at 4:02 AM, chee hoo lum  wrote:

> Hi Binh,
>
> The same problem again. I have the following queries :
>
> 1)
>
> {
>   "from" : 0,
>   "size" : 100,
>   "explain" : true,
>   "query" : {
> "filtered" : {
>   "query" : {
>  "multi_match": {
>   "query": "happy",
>   "fields": [ "DISPLAY_NAME^6", "PERFORMER" ]
> }
>   },
>   "filter" : {
> "query" : {
>   "bool" : {
>   "must" : {
> "term" : {
>   "CHANNEL_ID" : "1"
> }
>   }
> }
> }
>   }
> }
>   }
> }
>
> However the results display in reverse order for #2 and #3. I have added
> a boost on DISPLAY_NAME but it still yields the same behaviour:
>
> 1)
> * "_score": 10.960511,*
> "_source": {
> "DISPLAY_NAME": "Happy",
> "PRICE": 5,
> "CHANNEL_ID": 1,
> "CAT_PARENT": 981,
> "MEDIA_ID": 390933,
> "GENRE": "Happy",
> "MEDIA_PKEY": "838644",
> "COMPOSER": null,
> "PLAYER": null,
> "CATMEDIA_NAME": "*Happy*",
> "FTID": null,
> "VIEW_ID": 43,
> "POSITION": 51399,
> "ITEMCODE": null,
> "CAT_ID": 982,
> "PRIORITY": 80,
> "CKEY": 757447,
> "CATMEDIA_RANK": 3,
> "BILLINGTYPE_ID": 1,
> "CAT_NAME": "POP",
> "KEYWORDS": null,
> "LONG_DESCRIPTION": null,
> "SHORT_DESCRIPTION": null,
> "TYPE_ID": 74,
> "ARTIST_GENDER": null,
>* "PERFORMER": "Mario Pacchioli",*
> "MAPPINGS": "1_43_982_POP_981_51399_5",
> "SHORTCODE": null,
> "CATMEDIA_CDATE": "2014-01-12T15:12:27.000Z",
> "LANG_ID": 1
> },
> "_explanation": {
> "value": 10.960511,
> "description": "max of:",
> "details": [
> {
> "value": 10.960511,
> "description": "weight(DISPLAY_NAME:happy^6.0
> in 23025) [PerFieldSimilarity], result of:",
> "details": [
> {
> "value": 10.960511,
> "description": "fieldWeight in 23025,
> product of:",
> "details": [
> {
> "value": 1,
> "description": "tf(freq=1.0),
> with freq of:",
> "details": [
> {
> "value": 1,
> "description":
> "termFreq=1.0"
> }
> ]
> },
> {
> "value": 10.960511,
> "description":
> "idf(docFreq=58, maxDocs=1249243)"
> },
> {
> "value": 1,
> "description":
> "fieldNorm(doc=23025)"
> }
> ]
> }
> ]
> }
> ]
> }
> }
>
>
> 2)
> "_id": "10194",
>   *  "_score": 10.699952,*
> 

ES document field.

2014-04-02 Thread san
Could the JSON fields of a document indexed in Elasticsearch have the 
following:

1. Capital letters
2. Special characters such as spaces, etc.?



Re: Sense on github abandoned?

2014-04-02 Thread Boaz Leskes
People can try and use Marvel, and thus Sense, for free in their dev environment. 
If they want to use it with a production cluster, they need a license for that 
cluster. It doesn't matter how many developers are using it.

On Wed, Apr 2, 2014 at 7:14 PM, ppearcy  wrote:

> Hi, 
>   Since Marvel requires a license for production usage, does this mean in 
> order to use the Marvel bundled Sense against a production instance 
> requires you to buy a license? 
> I just got out of a meeting where I told a bunch of people to go download 
> sense off the chrome store. Whoops :) 
> Thanks!
> Paul
> On Tuesday, April 1, 2014 12:14:44 PM UTC-4, kimchy wrote:
>>
>> Sense started as a weekend project, and Boaz did not place a license on 
>> it. As you mentioned, this license effectively applies: 
>> http://choosealicense.com/no-license/. We consulted our lawyers, who 
>> specialize in open source, and changing the license to open source one is 
>> complex, expensive, and requires a lot of resources. The reason is that its 
>> not only getting the committers agreement, but also reaching all possible 
>> users and have them agree to it (or at least showing big investment in 
>> trying to do so, + a rather large time window to allow for people to 
>> object).
>>
>> When Boaz created Sense, he was not employed by Elasticsearch. Obviously 
>> any project started by our employees has a clear license (as you can notice 
>> with the many projects we created).
>>
>> Regarding Marvel:
>>
>> - You are only required to pay for it when used in production.
>> - You don’t have to be a support customer of Elasticsearch the company, 
>> you can buy a license for Marvel easily on the web. We made it super cheap 
>> since we think its something that a lot of people will find benefit from.
>>
>> On Apr 1, 2014, at 17:00, Ivan Brusic > 
>> wrote:
>>
>> I personally do not require an open source license for Marvel/Sense, but I 
>> would like to see an explicit clarification about the use of Marvel in this 
>> scenario. Marvel does require a license to use and that would apply to any 
>> of its subsystems. Then again, Sense does not have a license, which means 
>> its use is also somewhat restricted.
>>
>> Sense is an excellent tool and users dependency on the tool is quite 
>> apparent from this thread. :)
>>
>> I haven't packaged a Chrome plugin in about 3 years. Not only has my 
>> memory faded, but I would assume the mechanism has changed in our fast 
>> changing world of development. It would be a fun exercise to attempt to do 
>> it again.
>>
>> Cheers,
>> Ivan
>>
>>
>> On Tue, Apr 1, 2014 at 5:48 AM, Tim S >wrote:
>>
>>> @kimchy the whole reason for me asking these questions is that sometimes 
>>> a customer is using elasticsearch but they don't (yet) have a support 
>>> contract, but don't consider themselves "in development" either, and thus 
>>> wouldn't allow me to use Marvel. Yes, there are other tools for poking 
>>> around, but sense is invaluable for constructing complicated queries etc 
>>> quickly. In this situation they wouldn't let me install a chrome plugin 
>>> either, but sense works nicely as an elasticsearch plugin too.
>>>
>>> So, if sense (the abandoned version on github) had some kind of 
>>> permissive licence, I could turn up on customer site and use sense to poke 
>>> around.
>>> Ideally, it would have a licence like AL2 which would allow me to modify 
>>> it if necessary.
>>>
>>> I realise that you don't want updates pushed back to the version of sense 
>>> on github because those changes are helping you to make money from Marvel, 
>>> I understand that. But if the abandoned version of sense did have an 
>>> appropriate licence, it would allow us to use the current version - it's 
>>> still useful even if it's not kept up to date. I might even be tempted to 
>>> try and keep it up to date in my spare time. But clearly I can't do this 
>>> unless it has a licence that allows me to do it.
>>>
>>> Glad to see I'm not the only person thinking along these lines.
>>>
>>>
>>>
>>> On Tuesday, April 1, 2014 11:15:07 AM UTC+1, Jörg Prante wrote:

 +1 for Sense standalone packaging
 +1 for Sense in Chrome Web Store

 Sense is used here all the time, it's essential.

 I have also forked the code in case Sense goes away, hoping for a FOSS 
 license.

 Not that I'm fluid in writing browser plugins, but if I find time, I am 
 not afraid of the learning curve.

 Jörg




Re: Aggregations on nested array types

2014-04-02 Thread Mark Harwood
A rough Gist here that sums OK with one level of 
nesting: https://gist.github.com/markharwood/9938890



On Wednesday, April 2, 2014 5:13:22 PM UTC+1, dazraf wrote:
>
> Hi,
> I've also experimented with nested types using dynamic templates. 
> Interesting (empty!) aggregation results!
> Gist: https://gist.github.com/dazraf/9937198
>
> Would be grateful if anyone can shed some light on this please?
>
> Thank you.
>
> On Wednesday, 2 April 2014 16:05:00 UTC+1, dazraf wrote:
>>
>> Hi,
>>
>> Gist: https://gist.github.com/dazraf/9935814
>>
>> Basically, I'd like to be able to aggregate a field of an array of 
>> observations, grouped by an ancestor/parent id. 
>> So for example (see gist): Aggregate the timings per contestant across a 
>> set of contests.
>>
>> I realise that the data can be structured differently - effectively 
>> flattened to a document per contest-contestant-contest. 
>> However, I don't have the luxury of doing this in the real-world case. 
>>
>> Any help much appreciated.
>>
>> Thank you.
>>
>



Re: [Tool Contribution] Alfred the ElasticSearch Butler

2014-04-02 Thread Ivan Brusic
Colton,

Interesting tool and thanks for contributing. Will definitely check it out.
One of the main index maintenance tasks that I do is to remove all replicas
on older (backup) indices. I am currently doing this task manually because
there needs to be human verification of certain criteria before indices are
closed/deleted.

BTW, Bruce Wayne is DC Comics, not Marvel.

Cheers,

Ivan (not a comic book reader)


On Wed, Apr 2, 2014 at 3:57 AM, Colton  wrote:

>  Hello ElasticSearch Community,
>
> My name is Colton McInroy and I work with DOSarrest Internet Security
> LTD. Over the past few months I have been working with ElasticSearch fairly
> closely and building an infrastructure for it. When dealing with lots of
> indices, managing lots of them can be somewhat difficult in most of the web
> interfaces we found. We wanted to be able to, for instance, have indices
> over a certain age expire out of the cluster. We came across
> curator (https://github.com/elasticsearch/curator) which came fairly
> close, but had some limitations. I decided to spend a couple of days
> building our own tool from scratch, which, after discussion, we have decided
> to release to the public as open source. We have called this tool Alfred,
> after Bruce Wayne's butler Alfred Pennyworth, keeping in line with the
> Marvel comics theme.
>
> Alfred can be set up in a cronjob to automatically groom your indices
> so that you only keep a certain amount of data, optimize indexes, change
> settings (such as changing routing), and more. By default no changes are
> made unless you specify the -r or --run parameter. In its default mode, you
> can test this tool all you want and get output to see what would have been
> done without changes actually occurring. You can use the -D option to
> specify more debug output also if you want to see what's going on (such as
> "-D debug"). Once you are ready, add the -r parameter and watch Alfred do
> all the work for you.
>
> Alfred was developed in Java, but does not use the ElasticSearch Java
> API, rather it uses the restful api through the use of Apache HttpClient (
> http://hc.apache.org/httpclient-3.x/). The following libraries are
> included via maven into Alfred...
>
> joda-time 2.3
> httpcore 4.3.2
> gson 2.2.4
> httpclient 4.3.3
> commons-logging 1.1.3
> commons-codec 1.6
> commons-cli 1.2
>
> A jar build is located at
> https://github.com/DOSarrest-Internet-Security/alfred/raw/master/builds/alfred-0.0.1.jar
> Our Github page with source and README is located at
> https://github.com/DOSarrest-Internet-Security/alfred
>
> Here is some of that README file to explain how to use alfred...
>
> usage: alfred
>  -b,--debloom  Disable Bloom on Indexes
>  -B,--bloomEnable Bloom on Indexes
>  -c,--closeClose Indexes
>  -D,--debug   Display debug (debug|info|warn|error|fatal)
>  -d,--delete   Delete Indexes
>  -E,--expiresize  Byte size limit  (Default 10 GB)
>  -e,--expiretime  Number of time units old (Default 24)
> --examples Show some examples of how to use Alfred
>  -f,--flushFlush Indexes
>  -h,--help Help Page (Viewing Now)
> --hostElasticSearch Host
>  -i,--index   Index pattern to match (Default _all)
> --max_num_segmentsOptimize max_num_segments (Default 2)
>  -o,--optimize Optimize Indexes
>  -O,--open Open Indexes
> --portElasticSearch Port
>  -r,--run  Required to execute changes on
>ElasticSearch
>  -s,--style   Clean up style (time|size) (Default time)
>  -S,--settingsPUT settings
> --ssl  ElasticSearch SSL
>  -T,--time-unit   Specify time units (hour|day|none) (Default
>hour)
>  -t,--timeout ElasticSearch Timeout (Default 30)
> Alfred Version: 0.0.1
>
>
> Alfred was built as a tool to handle maintenance work on ElasticSearch.
> Alfred will delete, flush cache, optimize, close/open, enable/disable bloom
> filter, as well as put settings on indexes. Alfred can do any of these
> actions based on either time or size parameters.
>
> Examples:
>
> java -jar alfred.jar -e48 -i"cron_*" -d
>
> Delete any indexes starting with "cron_" that are older than 48 hours
>
> java -jar alfred.jar -e24 -i"cron_*" 
> -S'{"index.routing.allocation.require.tag":"historical"}'
>
> Set routing to require historical tag on any indexes starting with "cron_"
> that are older than 24 hours
>
> java -jar alfred.jar -e24 -i"cron_*" -b -o
>
> Disable bloom filter and optimize any indexes starting with "cron_" that
> are older than 24 hours
>
> java -jar alfred.jar -ssize -E"1 GB" -d
>
> Find all indexes, group by prefix, and delete indexes over a limit of 1 GB.
> Using the size style with an expire 

Re: Sense on github abandoned?

2014-04-02 Thread ppearcy
Hi, 
  Since Marvel requires a license for production usage, does this mean that 
using the Marvel-bundled Sense against a production instance requires you to 
buy a license? 

I just got out of a meeting where I told a bunch of people to go download 
sense off the chrome store. Whoops :) 

Thanks!
Paul

On Tuesday, April 1, 2014 12:14:44 PM UTC-4, kimchy wrote:
>
> Sense started as a weekend project, and Boaz did not place a license on 
> it. As you mentioned, this license effectively applies: 
> http://choosealicense.com/no-license/. We consulted our lawyers, who 
> specialize in open source, and changing the license to open source one is 
> complex, expensive, and requires a lot of resources. The reason is that its 
> not only getting the committers agreement, but also reaching all possible 
> users and have them agree to it (or at least showing big investment in 
> trying to do so, + a rather large time window to allow for people to 
> object).
>
> When Boaz created Sense, he was not employed by Elasticsearch. Obviously 
> any project started by our employees has a clear license (as you can notice 
> with the many projects we created).
>
> Regarding Marvel:
>
> - You are only required to pay for it when used in production.
> - You don’t have to be a support customer of Elasticsearch the company, 
> you can buy a license for Marvel easily on the web. We made it super cheap 
> since we think its something that a lot of people will find benefit from.
>
> On Apr 1, 2014, at 17:00, Ivan Brusic > 
> wrote:
>
> I personally do not require an open source license for Marvel/Sense, but I 
> would like to see an explicit clarification about the use of Marvel in this 
> scenario. Marvel does require a license to use and that would apply to any 
> of its subsystems. Then again, Sense does not have a license, which means 
> its use is also somewhat restricted.
>
> Sense is an excellent tool and users dependency on the tool is quite 
> apparent from this thread. :)
>
> I haven't packaged a Chrome plugin in about 3 years. Not only has my 
> memory faded, but I would assume the mechanism has changed in our fast 
> changing world of development. It would be a fun exercise to attempt to do 
> it again.
>
> Cheers,
> Ivan
>
>
> On Tue, Apr 1, 2014 at 5:48 AM, Tim S >wrote:
>
>> @kimchy the whole reason for me asking these questions is that sometimes 
>> a customer is using elasticsearch but they don't (yet) have a support 
>> contract, but don't consider themselves "in development" either, and thus 
>> wouldn't allow me to use Marvel. Yes, there are other tools for poking 
>> around, but sense is invaluable for constructing complicated queries etc 
>> quickly. In this situation they wouldn't let me install a chrome plugin 
>> either, but sense works nicely as an elasticsearch plugin too.
>>
>> So, if sense (the abandoned version on github) had some kind of 
>> permissive licence, I could turn up on customer site and use sense to poke 
>> around.
>> Ideally, it would have a licence like AL2 which would allow me to modify 
>> it if necessary.
>>
>> I realise that you don't want updates pushed back to the version of sense 
>> on github because those changes are helping you to make money from Marvel, 
>> I understand that. But if the abandoned version of sense did have an 
>> appropriate licence, it would allow us to use the current version - it's 
>> still useful even if it's not kept up to date. I might even be tempted to 
>> try and keep it up to date in my spare time. But clearly I can't do this 
>> unless it has a licence that allows me to do it.
>>
>> Glad to see I'm not the only person thinking along these lines.
>>
>>
>>
>> On Tuesday, April 1, 2014 11:15:07 AM UTC+1, Jörg Prante wrote:
>>>
>>> +1 for Sense standalone packaging
>>> +1 for Sense in Chrome Web Store
>>>
>>> Sense is used here all the time, it's essential.
>>>
>>> I have also forked the code in case Sense goes away, hoping for a FOSS 
>>> license.
>>>
>>> Not that I'm fluid in writing browser plugins, but if I find time, I am 
>>> not afraid of the learning curve.
>>>
>>> Jörg
>>>
>>>
>>>
>> -- 
>> You received this message because you are subscribed to the Google Groups 
>> "elasticsearch" group.
>> To unsubscribe from this group and stop receiving emails from it, send an 
>> email to elasticsearc...@googlegroups.com .
>> To view this discussion on the web visit 
>> https://groups.google.com/d/msgid/elasticsearch/837794c8-1a0a-411f-a29c-852133d6fbc2%40googlegroups.com
>> .
>>
>> For more options, visit https://groups.google.com/d/optout.
>>
>
>
> -- 
> You received this message because you are subscribed to the Google Groups 
> "elasticsearch" group.
> To unsubscribe from this group and stop receiving emails from it, send an 
> email to elasticsearc...@googlegroups.com .
> To view this discussion on the web vis

Re: [Hadoop] New Feature - to write bulks to different indexes from hadoop...

2014-04-02 Thread Costin Leau

Hi,

The functionality is available in master and will soon be released through es-hadoop 1.3 M3. The docs are not there yet, but 
in short, you can declare a dynamic index/type using the data being parsed. For example:


es.resource={media_type}/location_{id}
where 'media_type' and 'id' are resolved from the current entry. In M/R this means looking into the current 
MapWritable, in Cascading and Pig the current tuple, and for Hive the current 'column'.


Of course, raw JSON can also be used, in which case the field values will be extracted from it.
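
For the M/R case, a rough driver sketch of where such a configuration could go (only the es.resource pattern comes from this thread; the class names, job wiring, field names and cluster address are assumptions):

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.mapreduce.Job;
import org.elasticsearch.hadoop.mr.EsOutputFormat;

public class DynamicIndexJob {
    public static void main(String[] args) throws Exception {
        Configuration conf = new Configuration();
        conf.set("es.nodes", "localhost:9200");                 // assumed ES address
        // index/type resolved per record from the 'media_type' and 'id' entries
        conf.set("es.resource", "{media_type}/location_{id}");

        Job job = Job.getInstance(conf, "es-dynamic-index");
        job.setOutputFormatClass(EsOutputFormat.class);
        // ... configure a mapper that emits MapWritable values containing
        //     'media_type' and 'id' keys for each record ...
        job.waitForCompletion(true);
    }
}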

Try it out and let us know what you think.
Cheers,

On 4/2/14 12:24 PM, Igor Romanov wrote:

Hey,

I am designing a solution for indexing using Hadoop.
I'm thinking of using the same logic as Logstash and creating an index per time period
for my records (10 days or a month), in order to avoid working with very large indices
(from experience, merging huge segments in Lucene makes the whole index slow). That way
I also don't limit myself to a certain number of shards, and I will be able to change
the period dynamically and move indexes between nodes in the cluster...

So I thought of adding an option to elasticsearch-hadoop for extracting the index name
from the value object - or even using the key as the index name - then holding a
RestRepository object per index name that would buffer bulks per index and send them
when a bulk is full or the Hadoop job ends.

Another option is to just write the index name + type into the bulk and send the bulk
to the master ES node (rather than taking the shard list of a certain index and choosing
one shard per Hadoop instance), but in that scenario I think the master ES node will
work too hard, because many mappers/reducers will write to the same node and it will
need to route those records one by one...

To those who have worked with the elasticsearch-hadoop code - I would like your input:
what do you think? Which is better?

Thanks,
Igor


--
You received this message because you are subscribed to the Google Groups 
"elasticsearch" group.
To unsubscribe from this group and stop receiving emails from it, send an email 
to
elasticsearch+unsubscr...@googlegroups.com 
.
To view this discussion on the web visit
https://groups.google.com/d/msgid/elasticsearch/696de734-e97e-4cb5-ae80-5fa8717b6190%40googlegroups.com
.
For more options, visit https://groups.google.com/d/optout.


--
Costin

--
You received this message because you are subscribed to the Google Groups 
"elasticsearch" group.
To unsubscribe from this group and stop receiving emails from it, send an email 
to elasticsearch+unsubscr...@googlegroups.com.
To view this discussion on the web visit 
https://groups.google.com/d/msgid/elasticsearch/533C43D0.8050207%40gmail.com.
For more options, visit https://groups.google.com/d/optout.


Re: elasticsearch data node and kibana on different machines

2014-04-02 Thread computer engineer
You need to ask some difficult questions to get some help around 
here... oops, wait, this was my post.

On Wednesday, April 2, 2014 11:33:38 AM UTC-4, computer engineer wrote:
>
> I would like to know what is the best setup to have an elasticsearch data 
> node and kibana server on separate machines. I have setup multiple 
> elasticsearch data nodes and would like to show all dashboards on one 
> server but not sure how to do that. I do not want to have different urls to 
> view different dashboards. I set up the data nodes with logstash shipper on 
> each machine so all I need now is to have kibana get data from each 
> different data node. Is that possible? I edited the config file for kibana 
> as follows:
>
> elasticsearch: "http://"192.168.xx.xxx":9200";
>

-- 
You received this message because you are subscribed to the Google Groups 
"elasticsearch" group.
To unsubscribe from this group and stop receiving emails from it, send an email 
to elasticsearch+unsubscr...@googlegroups.com.
To view this discussion on the web visit 
https://groups.google.com/d/msgid/elasticsearch/5f14be71-fa3e-4cb2-8368-1ff6ca5ce32e%40googlegroups.com.
For more options, visit https://groups.google.com/d/optout.


Re: ESRejectedExecutionException

2014-04-02 Thread joergpra...@gmail.com
If you have 40 search threads on the node running and no queue, you should
not use more than 40 search threads on the client, otherwise rejections are
to be expected.
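
One way to respect that limit on the client side is to gate the search calls with a semaphore sized to the node's search pool - an illustrative sketch only, with the permit count matching the threadpool.search.size of 40 from the configuration quoted below:

import java.util.concurrent.Semaphore;

import org.elasticsearch.action.search.SearchResponse;
import org.elasticsearch.client.Client;

public class ThrottledSearcher {
    // matches threadpool.search.size: 40 on the node
    private final Semaphore permits = new Semaphore(40);
    private final Client client;

    public ThrottledSearcher(Client client) {
        this.client = client;
    }

    public SearchResponse search(String index) throws InterruptedException {
        permits.acquire();   // block the caller instead of overloading the node
        try {
            return client.prepareSearch(index).execute().actionGet();
        } finally {
            permits.release();
        }
    }
}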

Jörg


On Wed, Apr 2, 2014 at 9:00 AM, Pandiyan  wrote:

> I've been testing concurrent queries, I have just one node in a server (1
> * 4
> core CPU, 12G memory) and create a index (4 shards, 1 replica).  I use 1000
> concurrent threads to query(use TransportClient, search condition contains
> a
> termFilter and sort in a field). I've found sometimes the testing could be
> finished,  sometimes it cound't, because there are many
> EsRejectedExecutionException exceptions in ES log file.
>
> Thread pool configuration ::
>
> # use routing concept in elasticserach-java-api. based on this entry   0 to
> n-1
> index.number_of_shards: 4
> index.number_of_replicas: 1
>
> #Compress the data
> index.store.compress.stored: true
>
> # Force all memory to be locked, forcing the JVM to never swap
> bootstrap.mlockall: true
>
> ## Threadpool Settings ##
>
> # Search pool
> threadpool.search.type: fixed
> threadpool.search.size: 40
> #threadpool.search.queue_size: 100
>
> # Bulk pool
> threadpool.bulk.type: fixed
> threadpool.bulk.size: 60
> #threadpool.bulk.queue_size: 300
>
> # Index pool
> threadpool.index.type: fixed
> threadpool.index.size: 10
> #threadpool.index.queue_size: 100
>
> # Indices settings
> indices.memory.index_buffer_size: 30%
> #indices.memory.min_shard_index_buffer_size: 12mb
> #indices.memory.min_index_buffer_size: 96mb
>
> # Cache Sizes
> indices.fielddata.cache.size: 15%
> indices.fielddata.cache.expire: 6m
> indices.cache.filter.size: 15%
> indices.cache.filter.expire: 6m
>
> # Indexing Settings for Writes
> #index.refresh_interval: 30s
> index.translog.flush_threshold_ops: 5
>
>
>
>
> --
> View this message in context:
> http://elasticsearch-users.115913.n3.nabble.com/ESRejectedExecutionException-tp4053295.html
> Sent from the ElasticSearch Users mailing list archive at Nabble.com.
>
> --
> You received this message because you are subscribed to the Google Groups
> "elasticsearch" group.
> To unsubscribe from this group and stop receiving emails from it, send an
> email to elasticsearch+unsubscr...@googlegroups.com.
> To view this discussion on the web visit
> https://groups.google.com/d/msgid/elasticsearch/1396422023093-4053295.post%40n3.nabble.com
> .
> For more options, visit https://groups.google.com/d/optout.
>

-- 
You received this message because you are subscribed to the Google Groups 
"elasticsearch" group.
To unsubscribe from this group and stop receiving emails from it, send an email 
to elasticsearch+unsubscr...@googlegroups.com.
To view this discussion on the web visit 
https://groups.google.com/d/msgid/elasticsearch/CAKdsXoFUSvpNE2TOWPaYavaC79BvP8JO%3DbVyyzdWUHwB34Ftng%40mail.gmail.com.
For more options, visit https://groups.google.com/d/optout.


Re: Grouping entries together in a query, need some help with aggregations

2014-04-02 Thread Vincent Massol
Actually I've just realized I'm going to hit a problem... I wanted to use 
Kibana to graph this for me but I'm not sure Kibana supports 
"aggregations"...

Any idea?

Thanks
-Vincent
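
For reference, a minimal sketch combining the count search type with the cardinality aggregation described in the quoted reply below; it reuses the index, field and time range from the original query and is an illustration, not the contents of the linked gist:

curl -XPOST "http://localhost:9200/installs/install/_search?search_type=count&pretty" -d '{
  "aggs": {
    "lastday": {
      "filter": { "range": { "_timestamp": { "gt": "now-1d" } } },
      "aggs": {
        "unique_instances": { "cardinality": { "field": "instanceId" } }
      }
    }
  }
}'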

On Wednesday, April 2, 2014 11:38:14 AM UTC+2, Vincent Massol wrote:
>
> Thanks a lot for your fast response Adrien!
>
> * I noticed the cardinality aggregation but I was worried by the "an 
> approximate count of distinct values." part of the documentation. I need an 
> exact value, not an approximate one :) However I've read more the 
> documentation and it may not be a real problem in practice, especially if I 
> use a threshold of 4 (the max apparently). I couldn't find the default 
> precision value BTW in the documentation.
> * From your answer I gather that using aggregations is the only solution 
> to my problem and there's no way to use the Query DSL to solve it.
>
> Thanks, it helps a lot!
> -Vincent
>
> On Wednesday, April 2, 2014 11:17:17 AM UTC+2, Adrien Grand wrote:
>>
>> Hi Vincent,
>>
>> I left some replies inline:
>>
>> On Wed, Apr 2, 2014 at 10:02 AM, Vincent Massol  wrote:
>>
>>> Hi guys,
>>>
>>> I'd like to count all entries in my ES instance, having a timestamp from 
>>> the *last day* and *group together all entries having the same 
>>> "instanceId"*. With the data below, the count result should be 1 (and 
>>> not 2) since 2 entries are within the last day but they have the same 
>>> instanceId of "def".
>>>
>>> I tried the following:
>>>
>>> curl -XPOST "
>>> http://localhost:9200/installs/install/_search?pretty=1&fields=_source,_timestamp";
>>>  
>>> -d'
>>> {
>>> "aggs": {
>>> "lastday" : {
>>> "filter" : { 
>>> "range" : { 
>>> "_timestamp" : {
>>> "gt" : "now-1d"
>>> }
>>> }
>>> },
>>> "aggs" : {
>>> "instanceids" : {
>>> "terms" : { "field" : "instanceId" }
>>> }
>>> }
>>> }
>>> }
>>> }'
>>>
>>> But I have 3 problems with this:
>>> * It's not a count but a search. "aggs" don't seem to work with _count
>>> * It returns all entries in the result before the aggs data
>>>
>>
>> For these two issues, you probably want to check out the count search 
>> type[1] which works with aggregations. It's like a regular search, but 
>> doesn't do perform the fetch phase in order to fetch the top hits.
>>
>> [1] 
>> http://www.elasticsearch.org/guide/en/elasticsearch/reference/current/search-request-search-type.html#count
>>  
>>
>>> * In the aggs I don't get a direct count value and I have to count the 
>>> number of buckets to get my answer
>>>
>>
>> We recently (Elasticsearch 1.1.0) added a cardinality[2] aggregation, 
>> that allows for counting unique values. In previous versions of 
>> Elasticsearch, counting was indeed only possible through the terms 
>> aggregation with a high `size` parameter, but this was inefficient on 
>> high-cardinality fields.
>>
>> [2] 
>> http://www.elasticsearch.org/guide/en/elasticsearch/reference/current/search-aggregations-metrics-cardinality-aggregation.html#search-aggregations-metrics-cardinality-aggregation
>>  
>> Here is a gist that gives an example of the count search_type and the 
>> cardinality aggregation:
>>   https://gist.github.com/jpountz/9930690
>>
>> -- 
>> Adrien Grand
>>  
>

-- 
You received this message because you are subscribed to the Google Groups 
"elasticsearch" group.
To unsubscribe from this group and stop receiving emails from it, send an email 
to elasticsearch+unsubscr...@googlegroups.com.
To view this discussion on the web visit 
https://groups.google.com/d/msgid/elasticsearch/0a0ba031-ab73-40d7-8397-dc536343ddf8%40googlegroups.com.
For more options, visit https://groups.google.com/d/optout.


Re: Aggregations on nested array types

2014-04-02 Thread dazraf
Hi,
I've also experimented with nested types using dynamic templates. 
Interesting (empty!) aggregation results!
Gist: https://gist.github.com/dazraf/9937198

Would be grateful if anyone can shed some light on this please?

Thank you.

On Wednesday, 2 April 2014 16:05:00 UTC+1, dazraf wrote:
>
> Hi,
>
> Gist: https://gist.github.com/dazraf/9935814
>
> Basically, I'd like to be able to aggregate a field of an array of 
> observations, grouped by an ancestor/parent id. 
> So for example (see gist): Aggregate the timings per contestant across a 
> set of contests.
>
> I realise that the data can be structured differently - effectively 
> flattened to a document per contest-contestant-contest. 
> However, I don't have the luxury of doing this in the real-world case. 
>
> Any help much appreciated.
>
> Thank you.
>

-- 
You received this message because you are subscribed to the Google Groups 
"elasticsearch" group.
To unsubscribe from this group and stop receiving emails from it, send an email 
to elasticsearch+unsubscr...@googlegroups.com.
To view this discussion on the web visit 
https://groups.google.com/d/msgid/elasticsearch/2f780e18-66ea-4bee-bded-5b73632b532c%40googlegroups.com.
For more options, visit https://groups.google.com/d/optout.


Re: Rolling restart of a cluster?

2014-04-02 Thread Nikolas Everett
I just used this to upgrade our labs environment a couple of days ago:

#!/bin/bash

export prefix=deployment-elastic0
export suffix=.eqiad.wmflabs
rm -f servers
for i in {1..4}; do
echo $prefix$i$suffix >> servers
done

cat << __commands__ > /tmp/commands
wget
https://download.elasticsearch.org/elasticsearch/elasticsearch/elasticsearch-1.1.0.deb
sudo dpkg -i --force-confdef --force-confold elasticsearch-1.1.0.deb
curl -s -XPUT localhost:9200/_cluster/settings?pretty -d '{
"transient" : {
"cluster.routing.allocation.enable": "primaries"
}
}'
sudo /etc/init.d/elasticsearch restart
until curl -s localhost:9200/_cluster/health?pretty; do
sleep 1
done
curl -s -XPUT localhost:9200/_cluster/settings?pretty -d '{
"transient" : {
"cluster.routing.allocation.enable": "all"
}
}'
until curl -s localhost:9200/_cluster/health?pretty | tee /tmp/health |
grep green; do
cat /tmp/health
sleep 1
done
__commands__

for server in $(cat servers); do
scp /tmp/commands $server:/tmp/commands
ssh $server bash /tmp/commands
done



Production will swap wget and dpkg with apt-get update and apt-get install
elasticsearch but you get the idea.

It isn't foolproof.  If it dies it doesn't know how to start where it left
off and you might have to kill it if the cluster doesn't come back like
you'd expect.  It really only covers the "everything worked out as
expected" scenario.  But it is nice when that happens.

Nik


On Wed, Apr 2, 2014 at 7:23 AM, Petter Abrahamsson  wrote:

> Mike,
>
> Your script needs to check for the status of the cluster before shutting
> down a node, ie if the state is yellow wait until it becomes green again
> before shutting down the next node. You'll probably want do disable
> allocation of shards while each node is being restarted (enable when node
> comes back) in order to minimize the amount of data that needs to be
> rebalanced.
> Also make sure to have 'discovery.zen.minimum_master_nodes' correctly set
> in your elasticsearch.yml file.
>
> Meta code
>
> for node in $cluster_nodes; do
>if [ $cluster_status == 'green' ]; then
> cluster_disable_allocation()
> shutdown_node($node)
> wait_for_node_to_rejoin()
> cluster_enable_allocation()
> wait_for_cluster_status_green()
>   fi
> done
>
>
> http://www.elasticsearch.org/guide/en/elasticsearch/reference/current/modules-cluster.html
>
> /petter
>
>
> On Tue, Apr 1, 2014 at 6:19 PM, Mike Deeks  wrote:
>
>> What is the proper way of performing a rolling restart of a cluster? I
>> currently have my stop script check for the cluster health to be green
>> before stopping itself. Unfortunately this doesn't appear to be working.
>>
>> My setup:
>> ES 1.0.0
>> 3 node cluster w/ 1 replica.
>>
>> When I perform the rolling restart I see the cluster still reporting a
>> green state when a node is down. In theory that should be a yellow state
>> since some shards will be unallocated. My script output during a rolling
>> restart:
>> 1396388310 21:38:30 dev_cluster green 3 3 1202 601 2 0 0
>> 1396388310 21:38:30 dev_cluster green 3 3 1202 601 2 0 0
>> 1396388310 21:38:30 dev_cluster green 3 3 1202 601 2 0 0
>>
>> 1396388312 21:38:32 dev_cluster green 3 3 1202 601 2 0 0
>> 1396388312 21:38:32 dev_cluster green 3 3 1202 601 2 0 0
>> 1396388312 21:38:32 dev_cluster green 3 3 1202 601 2 0 0
>>
>> curl: (52) Empty reply from server
>> 1396388313 21:38:33 dev_cluster green 3 3 1202 601 2 0 0
>> 1396388313 21:38:33 dev_cluster green 3 3 1202 601 2 0 0
>>
>> curl: (52) Empty reply from server
>> 1396388314 21:38:34 dev_cluster green 3 3 1202 601 2 0 0
>> 1396388314 21:38:34 dev_cluster green 3 3 1202 601 2 0 0
>> ... continues as green for many more seconds...
>>
>> Since it is reporting as green, the second node thinks it can stop and
>> ends up putting the cluster into a broken red state:
>> curl: (52) Empty reply from server
>> curl: (52) Empty reply from server
>> 1396388339 21:38:59 dev_cluster green 2 2 1202 601 2 0 0
>>
>> curl: (52) Empty reply from server
>> curl: (52) Empty reply from server
>> 1396388341 21:39:01 dev_cluster yellow 2 2 664 601 2 8 530
>>
>> curl: (52) Empty reply from server
>> curl: (52) Empty reply from server
>> 1396388342 21:39:02 dev_cluster yellow 2 2 664 601 2 8 530
>>
>> curl: (52) Empty reply from server
>> curl: (52) Empty reply from server
>> 1396388343 21:39:03 dev_cluster yellow 2 2 664 601 2 8 530
>>
>> curl: (52) Empty reply from server
>> curl: (52) Empty reply from server
>> 1396388345 21:39:05 dev_cluster yellow 1 1 664 601 2 8 530
>>
>> curl: (52) Empty reply from server
>> curl: (52) Empty reply from server
>> 1396388346 21:39:06 dev_cluster yellow 1 1 664 601 2 8 530
>>
>> curl: (52) Empty reply from server
>> curl: (52) Empty reply from server
>> 1396388347 21:39:07 dev_cluster red 1 1 156 156 0 0 1046
>>
>> My stop script issues a call to
>> http://localhost:9200/_cluster/nodes/_local/_shutdown to kill the node.
>> Is it possible the other nodes are wait

Re: bulk thread pool rejections

2014-04-02 Thread Drew Raines

shift wrote:

I am seeing a high number of rejections for the bulk thread pool 
on a 32 core system.  Should I leave the thread pool size fixed 
to the # of cores and the default queue size at 50?  Are these 
rejections re-processed?


From my clients sending bulk documents (logstash), do I need to 
limit the number of connections to 32?  I currently have 200 
output threads to each elasticsearch node.


The rejections are telling you that ES's bulk thread pool is busy 
and it can't enqueue any more to wait for an open thread.  They 
aren't retried.  The exception your client gets is the final word 
for that request.


Lower your logstash threads to 16 or 32, monitor rejections, and 
gradually raise.  You could also increase the queue size, but keep 
in mind that's only useful to handle spikes.  You probably don't 
want to keep thousands around waiting since they take resources.
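
If you do raise the queue, the change goes in elasticsearch.yml on each node; a sketch with illustrative numbers only, not a recommendation:

threadpool.bulk.type: fixed
threadpool.bulk.size: 32           # one thread per core on the 32-core box
threadpool.bulk.queue_size: 500    # enough to absorb short spikes without hoarding memory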


Drew



"bulk" : {
  "threads" : 32,
 * "queue" : 50,*
  "active" : 32,
 * "rejected" : 12592108,*
  "largest" : 32,
  "completed" : 584407554
}

Thanks!  Any feedback is appreciated.


--
You received this message because you are subscribed to the Google Groups 
"elasticsearch" group.
To unsubscribe from this group and stop receiving emails from it, send an email 
to elasticsearch+unsubscr...@googlegroups.com.
To view this discussion on the web visit 
https://groups.google.com/d/msgid/elasticsearch/m2lhvnpwx8.fsf%40mid.raines.me.
For more options, visit https://groups.google.com/d/optout.


elasticsearch data node and kibana on different machines

2014-04-02 Thread computer engineer


I would like to know what the best setup is to have an elasticsearch data 
node and a kibana server on separate machines. I have set up multiple 
elasticsearch data nodes and would like to show all dashboards on one 
server but not sure how to do that. I do not want to have different urls to 
view different dashboards. I set up the data nodes with logstash shipper on 
each machine so all I need now is to have kibana get data from each 
different data node. Is that possible? I edited the config file for kibana 
as follows:

elasticsearch: "http://"192.168.xx.xxx":9200";

-- 
You received this message because you are subscribed to the Google Groups 
"elasticsearch" group.
To unsubscribe from this group and stop receiving emails from it, send an email 
to elasticsearch+unsubscr...@googlegroups.com.
To view this discussion on the web visit 
https://groups.google.com/d/msgid/elasticsearch/dc9a4fab-72e1-4e98-a3ea-1c43d51e050f%40googlegroups.com.
For more options, visit https://groups.google.com/d/optout.


Aggregations on nested array types

2014-04-02 Thread dazraf
Hi,

Gist: https://gist.github.com/dazraf/9935814

Basically, I'd like to be able to aggregate a field of an array of 
observations, grouped by an ancestor/parent id. 
So for example (see gist): Aggregate the timings per contestant across a 
set of contests.

I realise that the data can be structured differently - effectively 
flattened to a document per contest-contestant-contest. 
However, I don't have the luxury of doing this in the real-world case. 

Any help much appreciated.

Thank you.

-- 
You received this message because you are subscribed to the Google Groups 
"elasticsearch" group.
To unsubscribe from this group and stop receiving emails from it, send an email 
to elasticsearch+unsubscr...@googlegroups.com.
To view this discussion on the web visit 
https://groups.google.com/d/msgid/elasticsearch/22d6273a-b6a9-4e7b-8364-9011bf34ef5e%40googlegroups.com.
For more options, visit https://groups.google.com/d/optout.


Mixing bool and multi match/function score query

2014-04-02 Thread Garry Welding
I'm currently doing a query that's a mix of multi match and function score. 
The important bit of the JSON looks like this:

"function_score":{
"query":{
"query_string":{
"query":"some query",

"fields":["id","name","strippedDescription","colourSearch","sizeSearch"]
}
}
}

However, I also want to include results that don't necessarily match the 
query but have a particular numeric value that's greater than 0. I think a 
bool query would do this, but I don't know how to use a bool query with a 
function score query.

I understand that a multi match query is just shorthand for a bool query, 
and I could expand the multi match query out into its bool counterpart; 
however, I then don't know how I would apply the function score within that.

Any ideas? I'm on version 1.1.0 by the way.
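
One shape that may be worth trying - a hypothetical sketch only, where "popularity" stands in for the numeric field and is not taken from the original mapping - is to keep the function_score wrapper and give it a bool query, with the original query_string as one should clause and a range clause on the numeric field as another:

"function_score": {
    "query": {
        "bool": {
            "should": [
                {
                    "query_string": {
                        "query": "some query",
                        "fields": ["id","name","strippedDescription","colourSearch","sizeSearch"]
                    }
                },
                { "range": { "popularity": { "gt": 0 } } }
            ],
            "minimum_should_match": 1
        }
    }
}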

-- 
You received this message because you are subscribed to the Google Groups 
"elasticsearch" group.
To unsubscribe from this group and stop receiving emails from it, send an email 
to elasticsearch+unsubscr...@googlegroups.com.
To view this discussion on the web visit 
https://groups.google.com/d/msgid/elasticsearch/4279a874-eef8-47ec-9f49-f91efed1f851%40googlegroups.com.
For more options, visit https://groups.google.com/d/optout.


Adding payload and retrieve them in highlighting

2014-04-02 Thread Karol Sikora
Hi,

My simplified use case is to search within the pages of a book and show back to the user 
on which pages the search phrase was found.
My first thought for such a case was to denormalize the pages structure into fields 
on the book, e.g. page_1, page_2, ... The important thing is that I need to 
return on which page the phrase occurrence was found.
This approach fails, because I need to search in the content fields, but 
querying and highlighting fields named page_* is impossible.
I also thought about storing pages in nested structures, but that also 
fails because I cannot get back information about the page in the highlighting.
The tested structures looked as follows:
pages: [ {'name': 1, 'content': 'lorem ipsum'}, ...] (inner objects)
pages: {1: 'lorem ipsum', ...} (nested objects)
Both of the above also fail.

The way I'm thinking now is to store a payload (the page number) with the content 
attribute, so that indexing the structure as inner objects would theoretically be 
able to return the page number with the highlighting response.

I'm asking here for advice from more advanced users. As far as I know, storing 
payloads and returning them in the highlight response is currently not possible, 
but I can try to develop support for such a feature.
But maybe there is an easier way that I'm missing now.

Regards,
Karol


-- 
You received this message because you are subscribed to the Google Groups 
"elasticsearch" group.
To unsubscribe from this group and stop receiving emails from it, send an email 
to elasticsearch+unsubscr...@googlegroups.com.
To view this discussion on the web visit 
https://groups.google.com/d/msgid/elasticsearch/cc4a5e0c-56b0-486b-9595-6d7ffac37212%40googlegroups.com.
For more options, visit https://groups.google.com/d/optout.


Re: transport.tcp.port doesn't work for localhost?

2014-04-02 Thread Dario Rossi
Thanks, it works now. 

I suggest pointing out the detail about local transport in the docs for 
TransportClient.

On Wednesday, 2 April 2014 15:31:06 UTC+1, Igor Motov wrote:
>
> You should specify the same cluster name for both node and transport 
> client. It looks like they are running in different clusters:
>
> [2014-04-02 15:19:23,262][WARN ][org.elasticsearch.client.transport] [
> Humus Sapien] node [#transport#-1][d][inet[localhost/127.0.0.1:9300]] not 
> part of the cluster Cluster [elasticsearch], ignoring...
>
> On Wednesday, April 2, 2014 10:27:19 AM UTC-4, Dario Rossi wrote:
>>
>> I forgot, after setting up the embedded node, I wait the cluster status 
>> to be Yellow with
>>
>>  Client client = node.client();
>> client.admin().cluster().prepareHealth().setWaitForYellowStatus
>> ().execute().actionGet();
>>
>>
>> this is done on the embedded node.
>>
>> On Wednesday, 2 April 2014 15:24:35 UTC+1, Dario Rossi wrote:
>>>
>>> I set it to local = false and now I don't get anymore the connection 
>>> refused. But unfortunaly I get another thing:
>>>
>>>
>>> [2014-04-02 15:19:23,218][DEBUG][org.elasticsearch.transport.netty] [
>>> Humus Sapien] using worker_count[12], port[9300-9400], 
>>> bind_host[null],publish_host
>>> [null], compress[false], connect_timeout[30s], connections_per_node[2/3/
>>> 6/1/1], receive_predictor[512kb->512kb]
>>> [2014-04-02 15:19:23,218][DEBUG][org.elasticsearch.client.transport] [
>>> Humus Sapien] node_sampler_interval[5s]
>>> [2014-04-02 15:19:23,232][DEBUG][org.elasticsearch.client.transport] [
>>> Humus Sapien] adding address [[
>>> #transport#-1][d][inet[localhost/127.0.0.1:9300]]]
>>> [2014-04-02 15:19:23,232][TRACE][org.elasticsearch.client.transport] [
>>> Humus Sapien] connecting to listed node (light) [[
>>> #transport#-1][d][inet[localhost/127.0.0.1:9300]]]
>>> [2014-04-02 15:19:23,233][TRACE][org.elasticsearch.transport.netty] [
>>> Jean Grey-Summers] channel opened: [id: 0x6045a678, /127.0.0.1:51541 => 
>>> /127.0.0.1:9300]
>>> [2014-04-02 15:19:23,233][DEBUG][org.elasticsearch.transport.netty] [
>>> Humus Sapien] connected to node [[
>>> #transport#-1][d][inet[localhost/127.0.0.1:9300]]]
>>> [2014-04-02 15:19:23,250][TRACE][org.elasticsearch.plugins] [Jean Grey-
>>> Summers] starting to fetch info on plugins
>>> [2014-04-02 15:19:23,262][WARN ][org.elasticsearch.client.transport] [
>>> Humus Sapien] node [#transport#-1][d][inet[localhost/127.0.0.1:9300]] 
>>> not part of the cluster Cluster [elasticsearch], ignoring...
>>>
>>>
>>> org.elasticsearch.client.transport.NoNodeAvailableException: No node 
>>> available
>>>  at org.elasticsearch.client.transport.TransportClientNodesService.
>>> execute(TransportClientNodesService.java:219)
>>>  at org.elasticsearch.client.transport.support.InternalTransportClient.
>>> execute(InternalTransportClient.java:106)
>>>  at org.elasticsearch.client.support.AbstractClient.index(AbstractClient
>>> .java:82)
>>>  at org.elasticsearch.client.transport.TransportClient.index(
>>> TransportClient.java:330)
>>>  at org.elasticsearch.action.index.IndexRequestBuilder.doExecute(
>>> IndexRequestBuilder.java:314)
>>>  at org.elasticsearch.action.ActionRequestBuilder.execute(
>>> ActionRequestBuilder.java:85)
>>>  at org.elasticsearch.action.ActionRequestBuilder.execute(
>>> ActionRequestBuilder.java:59)
>>>  at com.mycode.estests.test.TransportTest.transportTest(TransportTest.
>>> java:20)
>>>  at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
>>>  at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl
>>> .java:57)
>>>  at sun.reflect.DelegatingMethodAccessorImpl.invoke(
>>> DelegatingMethodAccessorImpl.java:43)
>>>  at org.junit.runners.model.FrameworkMethod$1.runReflectiveCall(
>>> FrameworkMethod.java:47)
>>>  at org.junit.internal.runners.model.ReflectiveCallable.run(
>>> ReflectiveCallable.java:12)
>>>  at org.junit.runners.model.FrameworkMethod.invokeExplosively(
>>> FrameworkMethod.java:44)
>>>  at org.junit.internal.runners.statements.InvokeMethod.
>>> ...
>>
>>

-- 
You received this message because you are subscribed to the Google Groups 
"elasticsearch" group.
To unsubscribe from this group and stop receiving emails from it, send an email 
to elasticsearch+unsubscr...@googlegroups.com.
To view this discussion on the web visit 
https://groups.google.com/d/msgid/elasticsearch/b05929e1-9aa8-4034-a9e3-91b75c3937a0%40googlegroups.com.
For more options, visit https://groups.google.com/d/optout.


ESRejectedExecutionException

2014-04-02 Thread Pandiyan
I've been testing concurrent queries. I have just one node on a server (1 * 4
core CPU, 12G memory) and created an index (4 shards, 1 replica). I use 1000
concurrent threads to query (using TransportClient; the search condition contains a
termFilter and a sort on a field). I've found that sometimes the test finishes
and sometimes it can't, because there are many
EsRejectedExecutionException exceptions in the ES log file.

Thread pool configuration ::

# use routing concept in elasticsearch-java-api, based on this entry 0 to n-1
index.number_of_shards: 4
index.number_of_replicas: 1

#Compress the data
index.store.compress.stored: true

# Force all memory to be locked, forcing the JVM to never swap
bootstrap.mlockall: true

## Threadpool Settings ##

# Search pool
threadpool.search.type: fixed
threadpool.search.size: 40
#threadpool.search.queue_size: 100

# Bulk pool
threadpool.bulk.type: fixed
threadpool.bulk.size: 60
#threadpool.bulk.queue_size: 300

# Index pool
threadpool.index.type: fixed
threadpool.index.size: 10
#threadpool.index.queue_size: 100

# Indices settings
indices.memory.index_buffer_size: 30%
#indices.memory.min_shard_index_buffer_size: 12mb
#indices.memory.min_index_buffer_size: 96mb

# Cache Sizes
indices.fielddata.cache.size: 15%
indices.fielddata.cache.expire: 6m
indices.cache.filter.size: 15%
indices.cache.filter.expire: 6m

# Indexing Settings for Writes
#index.refresh_interval: 30s
index.translog.flush_threshold_ops: 5




--
View this message in context: 
http://elasticsearch-users.115913.n3.nabble.com/ESRejectedExecutionException-tp4053295.html
Sent from the ElasticSearch Users mailing list archive at Nabble.com.

-- 
You received this message because you are subscribed to the Google Groups 
"elasticsearch" group.
To unsubscribe from this group and stop receiving emails from it, send an email 
to elasticsearch+unsubscr...@googlegroups.com.
To view this discussion on the web visit 
https://groups.google.com/d/msgid/elasticsearch/1396422023093-4053295.post%40n3.nabble.com.
For more options, visit https://groups.google.com/d/optout.


Re: transport.tcp.port doesn't work for localhost?

2014-04-02 Thread Dario Rossi
I forgot: after setting up the embedded node, I wait for the cluster status to 
be yellow with

 Client client = node.client();
client.admin().cluster().prepareHealth().setWaitForYellowStatus().
execute().actionGet();


this is done on the embedded node.

On Wednesday, 2 April 2014 15:24:35 UTC+1, Dario Rossi wrote:
>
> I set it to local = false and now I don't get anymore the connection 
> refused. But unfortunaly I get another thing:
>
>
> [2014-04-02 15:19:23,218][DEBUG][org.elasticsearch.transport.netty] [Humus 
> Sapien] using worker_count[12], port[9300-9400], bind_host[null],publish_host
> [null], compress[false], connect_timeout[30s], connections_per_node[2/3/6/
> 1/1], receive_predictor[512kb->512kb]
> [2014-04-02 15:19:23,218][DEBUG][org.elasticsearch.client.transport] [
> Humus Sapien] node_sampler_interval[5s]
> [2014-04-02 15:19:23,232][DEBUG][org.elasticsearch.client.transport] [
> Humus Sapien] adding address [[
> #transport#-1][d][inet[localhost/127.0.0.1:9300]]]
> [2014-04-02 15:19:23,232][TRACE][org.elasticsearch.client.transport] [
> Humus Sapien] connecting to listed node (light) [[
> #transport#-1][d][inet[localhost/127.0.0.1:9300]]]
> [2014-04-02 15:19:23,233][TRACE][org.elasticsearch.transport.netty] [Jean 
> Grey-Summers] channel opened: [id: 0x6045a678, /127.0.0.1:51541 => /127.0.
> 0.1:9300]
> [2014-04-02 15:19:23,233][DEBUG][org.elasticsearch.transport.netty] [Humus 
> Sapien] connected to node [[
> #transport#-1][d][inet[localhost/127.0.0.1:9300]]]
> [2014-04-02 15:19:23,250][TRACE][org.elasticsearch.plugins] [Jean Grey-
> Summers] starting to fetch info on plugins
> [2014-04-02 15:19:23,262][WARN ][org.elasticsearch.client.transport] [
> Humus Sapien] node [#transport#-1][d][inet[localhost/127.0.0.1:9300]] not 
> part of the cluster Cluster [elasticsearch], ignoring...
>
>
> org.elasticsearch.client.transport.NoNodeAvailableException: No node 
> available
>  at org.elasticsearch.client.transport.TransportClientNodesService.execute
> (TransportClientNodesService.java:219)
>  at org.elasticsearch.client.transport.support.InternalTransportClient.
> execute(InternalTransportClient.java:106)
>  at org.elasticsearch.client.support.AbstractClient.index(AbstractClient.
> java:82)
>  at org.elasticsearch.client.transport.TransportClient.index(
> TransportClient.java:330)
>  at org.elasticsearch.action.index.IndexRequestBuilder.doExecute(
> IndexRequestBuilder.java:314)
>  at org.elasticsearch.action.ActionRequestBuilder.execute(
> ActionRequestBuilder.java:85)
>  at org.elasticsearch.action.ActionRequestBuilder.execute(
> ActionRequestBuilder.java:59)
>  at com.mycode.estests.test.TransportTest.transportTest(TransportTest.java
> :20)
>  at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
>  at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.
> java:57)
>  at sun.reflect.DelegatingMethodAccessorImpl.invoke(
> DelegatingMethodAccessorImpl.java:43)
>  at org.junit.runners.model.FrameworkMethod$1.runReflectiveCall(
> FrameworkMethod.java:47)
>  at org.junit.internal.runners.model.ReflectiveCallable.run(
> ReflectiveCallable.java:12)
>  at org.junit.runners.model.FrameworkMethod.invokeExplosively(
> FrameworkMethod.java:44)
>  at org.junit.internal.runners.statements.InvokeMethod.
> ...

-- 
You received this message because you are subscribed to the Google Groups 
"elasticsearch" group.
To unsubscribe from this group and stop receiving emails from it, send an email 
to elasticsearch+unsubscr...@googlegroups.com.
To view this discussion on the web visit 
https://groups.google.com/d/msgid/elasticsearch/abfaf316-eb85-4b77-a656-a119faeb326e%40googlegroups.com.
For more options, visit https://groups.google.com/d/optout.


Re: wait_for_completion doesn't seem to be working when making a snapshot

2014-04-02 Thread Robin Clarke
Thanks, that works!  I didn't notice that detail.  Odd that some parameters 
work in the URL or the body, and some only in the URL... o_O

Cheers,
-Robin-

On Wednesday, 2 April 2014 16:11:27 UTC+2, Igor Motov wrote:
>
> The wait_for_completion flag has to be specified on URL not in the body. 
> Try this:
>
> curl -XPUT "
> http://localhost:9200/_snapshot/backup/snapshot_kibana?wait_for_completion=true&pretty"
>  
> -d '{
> "indices": "kibana-int",
> "ignore_unavailable": true,
> "include_global_state": false
> }'
>
> On Wednesday, April 2, 2014 9:43:14 AM UTC-4, Robin Clarke wrote:
>>
>> I am writing a small script to create a snapshot of my kibana-int index, 
>> and hit an odd race condition.
>>
>> I delete the old snapshot if it exists:
>> curl -XDELETE '
>> http://localhost:9200/_snapshot/backup/snapshot_kibana?pretty'
>>
>> Then make the new snapshot
>> curl -XPUT "http://localhost:9200/_snapshot/backup/snapshot_kibana?pretty"; 
>> -d '{
>> "indices": "kibana-int",
>> "ignore_unavailable": true,
>> "wait_for_completion": true,
>> "include_global_state": false
>> }'
>>
>> Then create a tarball of the backup to transfer to another machine
>> tar czf /DATA/elasticsearch/kibana-int.tgz -C /DATA/elasticsearch ./backup
>>
>> When scripted, it seems that the DELETE, and/or PUT are not complete, and 
>> I get this error:
>> tar: ./backup/indices/kibana-int/4: file changed as we read it
>> tar: ./backup/indices/kibana-int/2: file changed as we read it
>> tar: ./backup/indices/kibana-int/3: file changed as we read it
>>
>> I tried putting a sleep between the DELETE and the PUT, and same error, 
>> so it seems that perhaps the "wait_for_completion" is not doing what it 
>> should...?
>>
>> Any ideas, other than just putting a sleep in there?
>>
>> Cheers!
>>
>> -Robin-
>>
>

-- 
You received this message because you are subscribed to the Google Groups 
"elasticsearch" group.
To unsubscribe from this group and stop receiving emails from it, send an email 
to elasticsearch+unsubscr...@googlegroups.com.
To view this discussion on the web visit 
https://groups.google.com/d/msgid/elasticsearch/e4f2dca1-b40e-427d-9bec-a44dcd4479ce%40googlegroups.com.
For more options, visit https://groups.google.com/d/optout.


Re: transport.tcp.port doesn't work for localhost?

2014-04-02 Thread Igor Motov
You should specify the same cluster name for both node and transport 
client. It looks like they are running in different clusters:

[2014-04-02 15:19:23,262][WARN ][org.elasticsearch.client.transport] [Humus 
Sapien] node [#transport#-1][d][inet[localhost/127.0.0.1:9300]] not part of 
the cluster Cluster [elasticsearch], ignoring...
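
A minimal sketch of what that looks like on the transport client side ("mycluster" is a placeholder - it must be the same cluster.name the embedded node is started with):

// ImmutableSettings is in org.elasticsearch.common.settings
Settings settings = ImmutableSettings.settingsBuilder()
        .put("cluster.name", "mycluster")   // placeholder - must match the node's cluster name
        .build();
TransportClient client = new TransportClient(settings);
client.addTransportAddress(new InetSocketTransportAddress("localhost", 9300));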

On Wednesday, April 2, 2014 10:27:19 AM UTC-4, Dario Rossi wrote:
>
> I forgot, after setting up the embedded node, I wait the cluster status to 
> be Yellow with
>
>  Client client = node.client();
> client.admin().cluster().prepareHealth().setWaitForYellowStatus().
> execute().actionGet();
>
>
> this is done on the embedded node.
>
> On Wednesday, 2 April 2014 15:24:35 UTC+1, Dario Rossi wrote:
>>
>> I set it to local = false and now I don't get anymore the connection 
>> refused. But unfortunaly I get another thing:
>>
>>
>> [2014-04-02 15:19:23,218][DEBUG][org.elasticsearch.transport.netty] [
>> Humus Sapien] using worker_count[12], port[9300-9400], 
>> bind_host[null],publish_host
>> [null], compress[false], connect_timeout[30s], connections_per_node[2/3/6
>> /1/1], receive_predictor[512kb->512kb]
>> [2014-04-02 15:19:23,218][DEBUG][org.elasticsearch.client.transport] [
>> Humus Sapien] node_sampler_interval[5s]
>> [2014-04-02 15:19:23,232][DEBUG][org.elasticsearch.client.transport] [
>> Humus Sapien] adding address [[
>> #transport#-1][d][inet[localhost/127.0.0.1:9300]]]
>> [2014-04-02 15:19:23,232][TRACE][org.elasticsearch.client.transport] [
>> Humus Sapien] connecting to listed node (light) [[
>> #transport#-1][d][inet[localhost/127.0.0.1:9300]]]
>> [2014-04-02 15:19:23,233][TRACE][org.elasticsearch.transport.netty] [Jean 
>> Grey-Summers] channel opened: [id: 0x6045a678, /127.0.0.1:51541 => /127.0
>> .0.1:9300]
>> [2014-04-02 15:19:23,233][DEBUG][org.elasticsearch.transport.netty] [
>> Humus Sapien] connected to node [[
>> #transport#-1][d][inet[localhost/127.0.0.1:9300]]]
>> [2014-04-02 15:19:23,250][TRACE][org.elasticsearch.plugins] [Jean Grey-
>> Summers] starting to fetch info on plugins
>> [2014-04-02 15:19:23,262][WARN ][org.elasticsearch.client.transport] [
>> Humus Sapien] node [#transport#-1][d][inet[localhost/127.0.0.1:9300]] 
>> not part of the cluster Cluster [elasticsearch], ignoring...
>>
>>
>> org.elasticsearch.client.transport.NoNodeAvailableException: No node 
>> available
>>  at org.elasticsearch.client.transport.TransportClientNodesService.
>> execute(TransportClientNodesService.java:219)
>>  at org.elasticsearch.client.transport.support.InternalTransportClient.
>> execute(InternalTransportClient.java:106)
>>  at org.elasticsearch.client.support.AbstractClient.index(AbstractClient.
>> java:82)
>>  at org.elasticsearch.client.transport.TransportClient.index(
>> TransportClient.java:330)
>>  at org.elasticsearch.action.index.IndexRequestBuilder.doExecute(
>> IndexRequestBuilder.java:314)
>>  at org.elasticsearch.action.ActionRequestBuilder.execute(
>> ActionRequestBuilder.java:85)
>>  at org.elasticsearch.action.ActionRequestBuilder.execute(
>> ActionRequestBuilder.java:59)
>>  at com.mycode.estests.test.TransportTest.transportTest(TransportTest.
>> java:20)
>>  at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
>>  at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.
>> java:57)
>>  at sun.reflect.DelegatingMethodAccessorImpl.invoke(
>> DelegatingMethodAccessorImpl.java:43)
>>  at org.junit.runners.model.FrameworkMethod$1.runReflectiveCall(
>> FrameworkMethod.java:47)
>>  at org.junit.internal.runners.model.ReflectiveCallable.run(
>> ReflectiveCallable.java:12)
>>  at org.junit.runners.model.FrameworkMethod.invokeExplosively(
>> FrameworkMethod.java:44)
>>  at org.junit.internal.runners.statements.InvokeMethod.
>> ...
>
>

-- 
You received this message because you are subscribed to the Google Groups 
"elasticsearch" group.
To unsubscribe from this group and stop receiving emails from it, send an email 
to elasticsearch+unsubscr...@googlegroups.com.
To view this discussion on the web visit 
https://groups.google.com/d/msgid/elasticsearch/7da21dd1-e3cd-4e3e-8636-1eabb736593d%40googlegroups.com.
For more options, visit https://groups.google.com/d/optout.


Re: transport.tcp.port doesn't work for localhost?

2014-04-02 Thread Igor Motov
If you want to be able to connect to it using the Transport Client - yes, or 
remove it completely. If you still get a failure, post the complete 
log here.

On Wednesday, April 2, 2014 10:09:16 AM UTC-4, Dario Rossi wrote:
>
> So shall I set local to false?
>
> On Wednesday, 2 April 2014 15:06:04 UTC+1, Igor Motov wrote:
>>
>> You are starting local node, which is using local transport, which is not 
>> listening on port 9300. The log message that you see is from transport 
>> client that tries to connect to port 9300 but cannot. Try starting just 
>> your node and you will be see that nobody listens on port 9300.
>>
>> On Tuesday, April 1, 2014 12:52:41 PM UTC-4, Dario Rossi wrote:
>>>
>>> I've the following problem: to do an integration test, I set up an 
>>> embedded node and then I create a TransportClient to connect to it. 
>>>
>>> The setup of the embedded node is (among other things):
>>>
>>>
>>>  port = 11547; // User ports range 1024 - 49151
>>> tcpport = 9300;
>>> settings.put("http.port", port);
>>> settings.put("transport.tcp.port", tcpport);
>>>  
>>> Settings esSettings = settings.build();
>>>
>>>
>>> node = NodeBuilder.nodeBuilder().local(true).settings(esSettings
>>> ).node();  //I tried setting local to false too
>>> node.start();
>>>
>>>
>>> and the transportclient is as simple as:
>>>
>>>
>>>
>>>
>>>   TransportClient client = new TransportClient();
>>> client.addTransportAddress(new InetSocketTransportAddress(
>>> "localhost", 9300));
>>>
>>>
>>> client.prepareIndex("test", "type").setSource("field", "value").
>>> execute().actionGet();
>>>
>>>
>>>
>>>
>>> (I tried both localhost and 127.0.0.1). 
>>>
>>> Anyway I get a connection refused when running the above code:
>>>
>>>
>>> Caused by: java.net.ConnectException: Connection refused: localhost/
>>> 127.0.0.1:9300
>>>  at sun.nio.ch.SocketChannelImpl.checkConnect(Native Method)
>>>  at sun.nio.ch.SocketChannelImpl.finishConnect(SocketChannelImpl.java:
>>> 708)
>>>  at org.elasticsearch.common.netty.channel.socket.nio.NioClientBoss.
>>> connect(NioClientBoss.java:150)
>>>  at org.elasticsearch.common.netty.channel.socket.nio.NioClientBoss.
>>> processSelectedKeys(NioClientBoss.java:105)
>>>  at org.elasticsearch.common.netty.channel.socket.nio.NioClientBoss.
>>> process(NioClientBoss.java:79)
>>>  at org.elasticsearch.common.netty.channel.socket.nio.
>>> AbstractNioSelector.run(AbstractNioSelector.java:318)
>>>  at org.elasticsearch.common.netty.channel.socket.nio.NioClientBoss.run(
>>> NioClientBoss.java:42)
>>>  at org.elasticsearch.common.netty.util.ThreadRenamingRunnable.run(
>>> ThreadRenamingRunnable.java:108)
>>>  at org.elasticsearch.common.netty.util.internal.DeadLockProofWorker$1.
>>> run(DeadLockProofWorker.java:42)
>>>  at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor
>>> .java:1145)
>>>  at java.util.concurrent.ThreadPoolExecutor$Worker.run(
>>> ThreadPoolExecutor.java:615)
>>>  at java.lang.Thread.run(Thread.java:724)
>>> [2014-04-01 17:48:10,836][TRACE][org.elasticsearch.transport.netty] [Cap 'N 
>>> Hawk] connect exception caught on transport layer [[id: 0x9526b405]]
>>> java.net.ConnectException: Connection refused: localhost/127.0.0.1:9300
>>>  at sun.nio.ch.SocketChannelImpl.checkConnect(Native Method)
>>>  at 
>>> sun.nio.ch.SocketChannelImpl.finishConnect(SocketChannelImpl.java:708)
>>>  at 
>>> org.elasticsearch.common.netty.channel.socket.nio.NioClientBoss.connect(NioClientBoss.java:150)
>>>  at 
>>> org.elasticsearch.common.netty.channel.socket.nio.NioClientBoss.processSelectedKeys(NioClientBoss.java:105)
>>>  at 
>>> org.elasticsearch.common.netty.channel.socket.nio.NioClientBoss.process(NioClientBoss.java:79)
>>>  at 
>>> org.elasticsearch.common.netty.channel.socket.nio.AbstractNioSelector.run(AbstractNioSelector.java:318)
>>>  at 
>>> org.elasticsearch.common.netty.channel.socket.nio.NioClientBoss.run(NioClientBoss.java:42)
>>>  at 
>>> org.elasticsearch.common.netty.util.ThreadRenamingRunnable.run(ThreadRenamingRunnable.java:108)
>>>  at 
>>> org.elasticsearch.common.netty.util.internal.DeadLockProofWorker$1.run(DeadLockProofWorker.java:42)
>>>  at 
>>> java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
>>>  at 
>>> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
>>>  at java.lang.Thread.run(Thread.java:724)
>>>
>>>
>>>
>>> my colleague was successful when he tried to connect to another host. 
>>> but he fails with localhost. 
>>>
>>

-- 
You received this message because you are subscribed to the Google Groups 
"elasticsearch" group.
To unsubscribe from this group and stop receiving emails from it, send an email 
to elasticsearch+unsubscr...@googlegroups.com.
To view this discussion on the web visit 
https://groups.google.com/d/msgid/elasticsearch/4a1a6942-8b26-4376-81cc-bd75d5f3c863%40googlegroups.com.
For more options, visit https://groups.google.co

Re: wait_for_completion doesn't seem to be working when making a snapshot

2014-04-02 Thread Igor Motov
The wait_for_completion flag has to be specified on URL not in the body. 
Try this:

curl -XPUT "http://localhost:9200/_snapshot/backup/snapshot_kibana?wait_for_completion=true&pretty" -d '{
    "indices": "kibana-int",
    "ignore_unavailable": true,
    "include_global_state": false
}'

On Wednesday, April 2, 2014 9:43:14 AM UTC-4, Robin Clarke wrote:
>
> I am writing a small script to create a snapshot of my kibana-int index, 
> and hit an odd race condition.
>
> I delete the old snapshot if it exists:
> curl -XDELETE '
> http://localhost:9200/_snapshot/backup/snapshot_kibana?pretty'
>
> Then make the new snapshot
> curl -XPUT "http://localhost:9200/_snapshot/backup/snapshot_kibana?pretty"; 
> -d '{
> "indices": "kibana-int",
> "ignore_unavailable": true,
> "wait_for_completion": true,
> "include_global_state": false
> }'
>
> Then create a tarball of the backup to transfer to another machine
> tar czf /DATA/elasticsearch/kibana-int.tgz -C /DATA/elasticsearch ./backup
>
> When scripted, it seems that the DELETE, and/or PUT are not complete, and 
> I get this error:
> tar: ./backup/indices/kibana-int/4: file changed as we read it
> tar: ./backup/indices/kibana-int/2: file changed as we read it
> tar: ./backup/indices/kibana-int/3: file changed as we read it
>
> I tried putting a sleep between the DELETE and the PUT, and same error, so 
> it seems that perhaps the "wait_for_completion" is not doing what it 
> should...?
>
> Any ideas, other than just putting a sleep in there?
>
> Cheers!
>
> -Robin-
>

-- 
You received this message because you are subscribed to the Google Groups 
"elasticsearch" group.
To unsubscribe from this group and stop receiving emails from it, send an email 
to elasticsearch+unsubscr...@googlegroups.com.
To view this discussion on the web visit 
https://groups.google.com/d/msgid/elasticsearch/beb0d9d0-73ef-4ce1-8e1f-c70e083d65d5%40googlegroups.com.
For more options, visit https://groups.google.com/d/optout.


Re: transport.tcp.port doesn't work for localhost?

2014-04-02 Thread Dario Rossi
So shall I set local to false?

On Wednesday, 2 April 2014 15:06:04 UTC+1, Igor Motov wrote:
>
> You are starting local node, which is using local transport, which is not 
> listening on port 9300. The log message that you see is from transport 
> client that tries to connect to port 9300 but cannot. Try starting just 
> your node and you will be see that nobody listens on port 9300.
>
> On Tuesday, April 1, 2014 12:52:41 PM UTC-4, Dario Rossi wrote:
>>
>> I've the following problem: to do an integration test, I set up an 
>> embedded node and then I create a TransportClient to connect to it. 
>>
>> The setup of the embedded node is (among other things):
>>
>>
>>  port = 11547; // User ports range 1024 - 49151
>> tcpport = 9300;
>> settings.put("http.port", port);
>> settings.put("transport.tcp.port", tcpport);
>>  
>> Settings esSettings = settings.build();
>>
>>
>> node = NodeBuilder.nodeBuilder().local(true).settings(esSettings
>> ).node();  //I tried setting local to false too
>> node.start();
>>
>>
>> and the transportclient is as simple as:
>>
>>
>>
>>
>>   TransportClient client = new TransportClient();
>> client.addTransportAddress(new InetSocketTransportAddress(
>> "localhost", 9300));
>>
>>
>> client.prepareIndex("test", "type").setSource("field", "value").
>> execute().actionGet();
>>
>>
>>
>>
>> (I tried both localhost and 127.0.0.1). 
>>
>> Anyway I get a connection refused when running the above code:
>>
>>
>> Caused by: java.net.ConnectException: Connection refused: localhost/127.0
>> .0.1:9300
>>  at sun.nio.ch.SocketChannelImpl.checkConnect(Native Method)
>>  at sun.nio.ch.SocketChannelImpl.finishConnect(SocketChannelImpl.java:708
>> )
>>  at org.elasticsearch.common.netty.channel.socket.nio.NioClientBoss.
>> connect(NioClientBoss.java:150)
>>  at org.elasticsearch.common.netty.channel.socket.nio.NioClientBoss.
>> processSelectedKeys(NioClientBoss.java:105)
>>  at org.elasticsearch.common.netty.channel.socket.nio.NioClientBoss.
>> process(NioClientBoss.java:79)
>>  at org.elasticsearch.common.netty.channel.socket.nio.AbstractNioSelector
>> .run(AbstractNioSelector.java:318)
>>  at org.elasticsearch.common.netty.channel.socket.nio.NioClientBoss.run(
>> NioClientBoss.java:42)
>>  at org.elasticsearch.common.netty.util.ThreadRenamingRunnable.run(
>> ThreadRenamingRunnable.java:108)
>>  at org.elasticsearch.common.netty.util.internal.DeadLockProofWorker$1.
>> run(DeadLockProofWorker.java:42)
>>  at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.
>> java:1145)
>>  at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor
>> .java:615)
>>  at java.lang.Thread.run(Thread.java:724)
>> [2014-04-01 17:48:10,836][TRACE][org.elasticsearch.transport.netty] [Cap 'N 
>> Hawk] connect exception caught on transport layer [[id: 0x9526b405]]
>> java.net.ConnectException: Connection refused: localhost/127.0.0.1:9300
>>  at sun.nio.ch.SocketChannelImpl.checkConnect(Native Method)
>>  at sun.nio.ch.SocketChannelImpl.finishConnect(SocketChannelImpl.java:708)
>>  at 
>> org.elasticsearch.common.netty.channel.socket.nio.NioClientBoss.connect(NioClientBoss.java:150)
>>  at 
>> org.elasticsearch.common.netty.channel.socket.nio.NioClientBoss.processSelectedKeys(NioClientBoss.java:105)
>>  at 
>> org.elasticsearch.common.netty.channel.socket.nio.NioClientBoss.process(NioClientBoss.java:79)
>>  at 
>> org.elasticsearch.common.netty.channel.socket.nio.AbstractNioSelector.run(AbstractNioSelector.java:318)
>>  at 
>> org.elasticsearch.common.netty.channel.socket.nio.NioClientBoss.run(NioClientBoss.java:42)
>>  at 
>> org.elasticsearch.common.netty.util.ThreadRenamingRunnable.run(ThreadRenamingRunnable.java:108)
>>  at 
>> org.elasticsearch.common.netty.util.internal.DeadLockProofWorker$1.run(DeadLockProofWorker.java:42)
>>  at 
>> java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
>>  at 
>> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
>>  at java.lang.Thread.run(Thread.java:724)
>>
>>
>>
>> My colleague was successful when he tried to connect to another host, but 
>> he fails with localhost. 
>>
>

-- 
You received this message because you are subscribed to the Google Groups 
"elasticsearch" group.
To unsubscribe from this group and stop receiving emails from it, send an email 
to elasticsearch+unsubscr...@googlegroups.com.
To view this discussion on the web visit 
https://groups.google.com/d/msgid/elasticsearch/c7a89428-58e3-45b6-a8a4-2b4a651351aa%40googlegroups.com.
For more options, visit https://groups.google.com/d/optout.


Make a autosuggest-search searching in realtime doesn't work properly

2014-04-02 Thread Alex K
Hi there,

I have the following Request I send to ES:

{
"query": {
"filtered": {
"query": {
"bool": {
"should": [
{
"multi_match": {
"query": "socks purple",
"fields": [
"TITLE"
],
"type": "phrase_prefix"
}
},
{
"multi_match": {
"query": "socks purple",
"fields": [
"TITLE"
                    ]
}
}
]
}
},
"filter": {
"and": [
{
"terms": {
"ACTIVE": [
1
]
}
}
]
}
}
},
"size": 7
}

Now, the first multi_match gives me good results when I input the words in 
the correct order (e.g. "Purple Socks").
But when I enter them in the 'wrong' order (e.g. "Socks Purple") it doesn't 
find anything.
A colleague of mine said I could try using a second multi_match.
I don't have much knowledge of ES; almost all of the above was already there, 
I just extended the code with the second multi_match.
But now the problem is that if I input "socks" it gives me all matches for 
socks. When I then continue typing "purple", it gives me not just purple 
socks but everything matching purple (although I would expect only purple 
socks).
Does anyone know what the problem here is?

-- 
You received this message because you are subscribed to the Google Groups 
"elasticsearch" group.
To unsubscribe from this group and stop receiving emails from it, send an email 
to elasticsearch+unsubscr...@googlegroups.com.
To view this discussion on the web visit 
https://groups.google.com/d/msgid/elasticsearch/074ec987-379b-4591-a5dc-0d2b482d4ec8%40googlegroups.com.
For more options, visit https://groups.google.com/d/optout.


bulk thread pool rejections

2014-04-02 Thread shift
I am seeing a high number of rejections for the bulk thread pool on a 32 
core system.  Should I leave the thread pool size fixed to the # of cores 
and the default queue size at 50?  Are these rejections re-processed?

From my clients sending bulk documents (logstash), do I need to limit the 
number of connections to 32?  I currently have 200 output threads to each 
elasticsearch node.

"bulk" : {
  "threads" : 32,
 * "queue" : 50,*
  "active" : 32,
 * "rejected" : 12592108,*
  "largest" : 32,
  "completed" : 584407554
}
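
(For reference, the relevant knobs live in elasticsearch.yml; the values below are purely illustrative, not a recommendation:)

# elasticsearch.yml -- illustrative values only
threadpool.bulk.type: fixed
threadpool.bulk.size: 32
threadpool.bulk.queue_size: 200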

Thanks!  Any feedback is appreciated.

-- 
You received this message because you are subscribed to the Google Groups 
"elasticsearch" group.
To unsubscribe from this group and stop receiving emails from it, send an email 
to elasticsearch+unsubscr...@googlegroups.com.
To view this discussion on the web visit 
https://groups.google.com/d/msgid/elasticsearch/1a7b5b6b-510c-42b0-933e-7b771e8b350b%40googlegroups.com.
For more options, visit https://groups.google.com/d/optout.


Re: transport.tcp.port doesn't work for localhost?

2014-04-02 Thread Igor Motov
You are starting a local node, which uses local transport and is therefore not 
listening on port 9300. The log message that you see is from the transport 
client that tries to connect to port 9300 but cannot. Try starting just 
your node and you will see that nobody listens on port 9300.

On Tuesday, April 1, 2014 12:52:41 PM UTC-4, Dario Rossi wrote:
>
> I've the following problem: to do an integration test, I set up an 
> embedded node and then I create a TransportClient to connect to it. 
>
> The setup of the embedded node is (among other things):
>
>
>  port = 11547; // User ports range 1024 - 49151
> tcpport = 9300;
> settings.put("http.port", port);
> settings.put("transport.tcp.port", tcpport);
>  
> Settings esSettings = settings.build();
>
>
> node = NodeBuilder.nodeBuilder().local(true).settings(esSettings).
> node();  //I tried setting local to false too
> node.start();
>
>
> and the transportclient is as simple as:
>
>
>
>
>   TransportClient client = new TransportClient();
> client.addTransportAddress(new InetSocketTransportAddress(
> "localhost", 9300));
>
>
> client.prepareIndex("test", "type").setSource("field", "value").
> execute().actionGet();
>
>
>
>
> (I tried both localhost and 127.0.0.1). 
>
> Anyway I get a connection refused when running the above code:
>
>
> Caused by: java.net.ConnectException: Connection refused: localhost/127.0.
> 0.1:9300
>  at sun.nio.ch.SocketChannelImpl.checkConnect(Native Method)
>  at sun.nio.ch.SocketChannelImpl.finishConnect(SocketChannelImpl.java:708)
>  at org.elasticsearch.common.netty.channel.socket.nio.NioClientBoss.
> connect(NioClientBoss.java:150)
>  at org.elasticsearch.common.netty.channel.socket.nio.NioClientBoss.
> processSelectedKeys(NioClientBoss.java:105)
>  at org.elasticsearch.common.netty.channel.socket.nio.NioClientBoss.
> process(NioClientBoss.java:79)
>  at org.elasticsearch.common.netty.channel.socket.nio.AbstractNioSelector.
> run(AbstractNioSelector.java:318)
>  at org.elasticsearch.common.netty.channel.socket.nio.NioClientBoss.run(
> NioClientBoss.java:42)
>  at org.elasticsearch.common.netty.util.ThreadRenamingRunnable.run(
> ThreadRenamingRunnable.java:108)
>  at org.elasticsearch.common.netty.util.internal.DeadLockProofWorker$1.run
> (DeadLockProofWorker.java:42)
>  at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.
> java:1145)
>  at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.
> java:615)
>  at java.lang.Thread.run(Thread.java:724)
> [2014-04-01 17:48:10,836][TRACE][org.elasticsearch.transport.netty] [Cap 'N 
> Hawk] connect exception caught on transport layer [[id: 0x9526b405]]
> java.net.ConnectException: Connection refused: localhost/127.0.0.1:9300
>  at sun.nio.ch.SocketChannelImpl.checkConnect(Native Method)
>  at sun.nio.ch.SocketChannelImpl.finishConnect(SocketChannelImpl.java:708)
>  at 
> org.elasticsearch.common.netty.channel.socket.nio.NioClientBoss.connect(NioClientBoss.java:150)
>  at 
> org.elasticsearch.common.netty.channel.socket.nio.NioClientBoss.processSelectedKeys(NioClientBoss.java:105)
>  at 
> org.elasticsearch.common.netty.channel.socket.nio.NioClientBoss.process(NioClientBoss.java:79)
>  at 
> org.elasticsearch.common.netty.channel.socket.nio.AbstractNioSelector.run(AbstractNioSelector.java:318)
>  at 
> org.elasticsearch.common.netty.channel.socket.nio.NioClientBoss.run(NioClientBoss.java:42)
>  at 
> org.elasticsearch.common.netty.util.ThreadRenamingRunnable.run(ThreadRenamingRunnable.java:108)
>  at 
> org.elasticsearch.common.netty.util.internal.DeadLockProofWorker$1.run(DeadLockProofWorker.java:42)
>  at 
> java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
>  at 
> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
>  at java.lang.Thread.run(Thread.java:724)
>
>
>
> My colleague was successful when he tried to connect to another host, but 
> he fails with localhost. 
>

-- 
You received this message because you are subscribed to the Google Groups 
"elasticsearch" group.
To unsubscribe from this group and stop receiving emails from it, send an email 
to elasticsearch+unsubscr...@googlegroups.com.
To view this discussion on the web visit 
https://groups.google.com/d/msgid/elasticsearch/d573c5ca-ada7-48b9-acb5-12cbd925bdee%40googlegroups.com.
For more options, visit https://groups.google.com/d/optout.


wait_for_completion doesn't seem to be working when making a snapshot

2014-04-02 Thread Robin Clarke
I am writing a small script to create a snapshot of my kibana-int index, 
and hit an odd race condition.

I delete the old snapshot if it exists:
curl -XDELETE 
'http://localhost:9200/_snapshot/backup/snapshot_kibana?pretty'

Then make the new snapshot
curl -XPUT "http://localhost:9200/_snapshot/backup/snapshot_kibana?pretty"; 
-d '{
"indices": "kibana-int",
"ignore_unavailable": true,
"wait_for_completion": true,
"include_global_state": false
}'

Then create a tarball of the backup to transfer to another machine
tar czf /DATA/elasticsearch/kibana-int.tgz -C /DATA/elasticsearch ./backup

When scripted, it seems that the DELETE, and/or PUT are not complete, and I 
get this error:
tar: ./backup/indices/kibana-int/4: file changed as we read it
tar: ./backup/indices/kibana-int/2: file changed as we read it
tar: ./backup/indices/kibana-int/3: file changed as we read it

I tried putting a sleep between the DELETE and the PUT, and same error, so 
it seems that perhaps the "wait_for_completion" is not doing what it 
should...?

Any ideas, other than just putting a sleep in there?

Cheers!

-Robin-

-- 
You received this message because you are subscribed to the Google Groups 
"elasticsearch" group.
To unsubscribe from this group and stop receiving emails from it, send an email 
to elasticsearch+unsubscr...@googlegroups.com.
To view this discussion on the web visit 
https://groups.google.com/d/msgid/elasticsearch/7962b247-f4d1-47d3-9e73-25544ff7aa84%40googlegroups.com.
For more options, visit https://groups.google.com/d/optout.


Re: data distribution over shards and replicas

2014-04-02 Thread Subhadip Bagui


Thanks Mark for the prompt reply. I have some more doubts:

1. Suppose one index is running with 3 shards and 1 replica and another index 
is running with the cluster settings, i.e. 5 shards and 2 replicas; will a 
total of 3+1 or 5+2 shards then be available in the cluster? I have installed 
the elasticsearch-head plugin but the replica shards are not showing there. 

For data distribution, does a replica shard also hold documents of other 
indices, or is it used only to keep a backup copy of the data?

2. So documents under the same index will be split due to sharding and 
distributed over the shards, right? Can we push all the documents for the 
same index into a particular shard? I don't want to use custom routing, as 
then I need one field value common to all the documents. How can we find out 
which shard is holding which documents?

3. If I make one index with 2 shards and no replicas and the node in the 
cluster holding these 2 shards dies, will I lose the data, or will the data 
have a copy in a cluster-level replica? If I have only 1 replica and the node 
holding the replica dies, how will the backup happen?

-- 
You received this message because you are subscribed to the Google Groups 
"elasticsearch" group.
To unsubscribe from this group and stop receiving emails from it, send an email 
to elasticsearch+unsubscr...@googlegroups.com.
To view this discussion on the web visit 
https://groups.google.com/d/msgid/elasticsearch/4d7d0243-dcd1-4ac7-9fef-1d6e44599ea1%40googlegroups.com.
For more options, visit https://groups.google.com/d/optout.


Re: Lucene index corruption on nodes restart

2014-04-02 Thread simonw
hey,

is it possible to look at this index / shard? do you still have it / can 
you save it for further investigation? You can ping me directly at simon 
AT elasticsearch DOT com

On Wednesday, April 2, 2014 11:23:38 AM UTC+2, Paweł Chabierski wrote:
>
> A few days ago we found that we get the same error when we search for data. 
>
> reason: "FetchPhaseExecutionException[[site_production][1]: 
> query[ConstantScore(cache(_type:ademail))],from[0],size[648]: Fetch Failed 
> [Failed to fetch doc id [9615533]]]; nested: EOFException[seek past EOF: 
> MMapIndexInput(path="/opt/elasticsearch/data/naprawa/nodes/0/indices/site_production/1/index/_573oa.fdt")];
>  
>
> After this error, elasticsearch stops searching and doesn't return all 
> results, even if they are returned properly in another query. Is there any 
> way to fix this index? We tried CheckIndex from the Lucene core library; 
> after that we lost ~20 million documents but the error still occurs. We also 
> tried to restore the index from a snapshot, but the error still occurs :/.
>
> On Saturday, March 22, 2014 at 14:04:56 UTC+1, Andrey Perminov 
> wrote:
>>
>> We are using a small elasticsearch cluster of three nodes, version 1.0.1. 
>> Each node has 7 GB RAM. Our software creates daily indexes for storing its 
>> data. A daily index is something around 5 GB. Unfortunately, for some reason, 
>> Elasticsearch eats up all RAM and hangs the node, even though heap size is 
>> set to 6 GB max. So we decided to use monit to restart it on reaching a 
>> memory limit of 90%. It works, but sometimes we get errors such as:
>>
>> [2014-03-22 16:56:04,943][DEBUG][action.search.type   ] [es-00] 
>> [product-22-03-2014][0], node[jbUDVzuvS5GTM7iOG8iwzQ], [P], s[STARTED]: 
>> Failed to execute [org.elasticsearch.action.search.SearchRequest@687dc039]
>> org.elasticsearch.search.fetch.FetchPhaseExecutionException: 
>> [product-22-03-2014][0]: query[filtered(ToParentBlockJoinQuery 
>> (filtered(history.created:[1392574921000 TO 
>> *])->cache(_type:__history)))->cache(_type:product)],from[0],size[1000],sort[>  
>> org.elasticsearch.index.search.nested.NestedFieldComparatorSource@15e4ece9>]:
>>  
>> Fetch Failed [Failed to fetch doc id [7263214]]
>> at 
>> org.elasticsearch.search.fetch.FetchPhase.loadStoredFields(FetchPhase.java:230)
>> at 
>> org.elasticsearch.search.fetch.FetchPhase.execute(FetchPhase.java:156)
>> at 
>> org.elasticsearch.search.SearchService.executeFetchPhase(SearchService.java:332)
>> at 
>> org.elasticsearch.search.action.SearchServiceTransportAction.sendExecuteFetch(SearchServiceTransportAction.java:304)
>> at 
>> org.elasticsearch.action.search.type.TransportSearchQueryAndFetchAction$AsyncAction.sendExecuteFirstPhase(TransportSearchQueryAndFetchAction.java:71)
>> at 
>> org.elasticsearch.action.search.type.TransportSearchTypeAction$BaseAsyncAction.performFirstPhase(TransportSearchTypeAction.java:216)
>> at 
>> org.elasticsearch.action.search.type.TransportSearchTypeAction$BaseAsyncAction$4.run(TransportSearchTypeAction.java:292)
>> at java.util.concurrent.ThreadPoolExecutor.runWorker(Unknown 
>> Source)
>> at java.util.concurrent.ThreadPoolExecutor$Worker.run(Unknown 
>> Source)
>> at java.lang.Thread.run(Unknown Source)
>> Caused by: java.io.EOFException: seek past EOF: 
>> MMapIndexInput(path="/opt/elasticsearch/main/nodes/0/indices/product-22-03-2014/0/index/_9lz.fdt")
>> at 
>> org.apache.lucene.store.ByteBufferIndexInput.seek(ByteBufferIndexInput.java:174)
>> at 
>> org.apache.lucene.codecs.compressing.CompressingStoredFieldsReader.visitDocument(CompressingStoredFieldsReader.java:229)
>> at 
>> org.apache.lucene.index.SegmentReader.document(SegmentReader.java:276)
>> at 
>> org.apache.lucene.index.BaseCompositeReader.document(BaseCompositeReader.java:110)
>> at 
>> org.apache.lucene.search.IndexSearcher.doc(IndexSearcher.java:196)
>> at 
>> org.elasticsearch.search.fetch.FetchPhase.loadStoredFields(FetchPhase.java:228)
>> ... 9 more
>> [2014-03-22 16:56:04,944][DEBUG][action.search.type   ] [es-00] All 
>> shards failed for phase: [query_fetch]
>>
>> According to our logs, this might happen when one or two nodes get 
>> restarted. More strangely, the same shard got corrupted on all nodes of 
>> the cluster. Why could this happen? How can we fix it? Can you suggest how 
>> to fix the memory usage?
>>
>

-- 
You received this message because you are subscribed to the Google Groups 
"elasticsearch" group.
To unsubscribe from this group and stop receiving emails from it, send an email 
to elasticsearch+unsubscr...@googlegroups.com.
To view this discussion on the web visit 
https://groups.google.com/d/msgid/elasticsearch/b9d9aa67-5adc-4c06-8659-9031cb673d39%40googlegroups.com.
For more options, visit https://groups.google.com/d/optout.


Re: Rolling restart of a cluster?

2014-04-02 Thread Petter Abrahamsson
Mike,

Your script needs to check the status of the cluster before shutting
down a node, i.e. if the state is yellow, wait until it becomes green again
before shutting down the next node. You'll probably want to disable
allocation of shards while each node is being restarted (and re-enable it
when the node comes back) in order to minimize the amount of data that needs
to be rebalanced.
Also make sure to have 'discovery.zen.minimum_master_nodes' correctly set
in your elasticsearch.yml file.

Meta code

for node in $cluster_nodes; do
  if [ $cluster_status == 'green' ]; then
cluster_disable_allocation()
shutdown_node($node)
wait_for_node_to_rejoin()
cluster_enable_allocation()
wait_for_cluster_status_green()
  fi
done
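
A rough bash sketch of the same flow (the coordinating host, node names, shutdown mechanism and restart step are assumptions you would adapt):

#!/bin/bash
# Rolling restart sketch -- ES_HOST, NODES and the restart step are assumptions.
ES_HOST=localhost
NODES="node-1 node-2 node-3"

wait_for_green() {
  until curl -s "http://$ES_HOST:9200/_cluster/health" | grep -q '"status":"green"'; do
    sleep 5
  done
}

set_allocation() {  # "none" while a node is down, "all" afterwards
  curl -s -XPUT "http://$ES_HOST:9200/_cluster/settings" \
    -d "{\"transient\":{\"cluster.routing.allocation.enable\":\"$1\"}}"
}

for NODE in $NODES; do
  wait_for_green
  set_allocation none
  # shut the node down via the shutdown API (node id or name)
  curl -s -XPOST "http://$ES_HOST:9200/_cluster/nodes/$NODE/_shutdown"
  # ... restart elasticsearch on $NODE and wait for it to rejoin the cluster ...
  set_allocation all
  wait_for_green
done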

http://www.elasticsearch.org/guide/en/elasticsearch/reference/current/modules-cluster.html

/petter


On Tue, Apr 1, 2014 at 6:19 PM, Mike Deeks  wrote:

> What is the proper way of performing a rolling restart of a cluster? I
> currently have my stop script check for the cluster health to be green
> before stopping itself. Unfortunately this doesn't appear to be working.
>
> My setup:
> ES 1.0.0
> 3 node cluster w/ 1 replica.
>
> When I perform the rolling restart I see the cluster still reporting a
> green state when a node is down. In theory that should be a yellow state
> since some shards will be unallocated. My script output during a rolling
> restart:
> 1396388310 21:38:30 dev_cluster green 3 3 1202 601 2 0 0
> 1396388310 21:38:30 dev_cluster green 3 3 1202 601 2 0 0
> 1396388310 21:38:30 dev_cluster green 3 3 1202 601 2 0 0
>
> 1396388312 21:38:32 dev_cluster green 3 3 1202 601 2 0 0
> 1396388312 21:38:32 dev_cluster green 3 3 1202 601 2 0 0
> 1396388312 21:38:32 dev_cluster green 3 3 1202 601 2 0 0
>
> curl: (52) Empty reply from server
> 1396388313 21:38:33 dev_cluster green 3 3 1202 601 2 0 0
> 1396388313 21:38:33 dev_cluster green 3 3 1202 601 2 0 0
>
> curl: (52) Empty reply from server
> 1396388314 21:38:34 dev_cluster green 3 3 1202 601 2 0 0
> 1396388314 21:38:34 dev_cluster green 3 3 1202 601 2 0 0
> ... continues as green for many more seconds...
>
> Since it is reporting as green, the second node thinks it can stop and
> ends up putting the cluster into a broken red state:
> curl: (52) Empty reply from server
> curl: (52) Empty reply from server
> 1396388339 21:38:59 dev_cluster green 2 2 1202 601 2 0 0
>
> curl: (52) Empty reply from server
> curl: (52) Empty reply from server
> 1396388341 21:39:01 dev_cluster yellow 2 2 664 601 2 8 530
>
> curl: (52) Empty reply from server
> curl: (52) Empty reply from server
> 1396388342 21:39:02 dev_cluster yellow 2 2 664 601 2 8 530
>
> curl: (52) Empty reply from server
> curl: (52) Empty reply from server
> 1396388343 21:39:03 dev_cluster yellow 2 2 664 601 2 8 530
>
> curl: (52) Empty reply from server
> curl: (52) Empty reply from server
> 1396388345 21:39:05 dev_cluster yellow 1 1 664 601 2 8 530
>
> curl: (52) Empty reply from server
> curl: (52) Empty reply from server
> 1396388346 21:39:06 dev_cluster yellow 1 1 664 601 2 8 530
>
> curl: (52) Empty reply from server
> curl: (52) Empty reply from server
> 1396388347 21:39:07 dev_cluster red 1 1 156 156 0 0 1046
>
> My stop script issues a call to
> http://localhost:9200/_cluster/nodes/_local/_shutdown to kill the node.
> Is it possible the other nodes are waiting to timeout the down node before
> moving into the yellow state? I would assume the shutdown API call would
> inform the other nodes that it is going down.
>
> Appreciate any help on how to do this properly.
>
> --
> You received this message because you are subscribed to the Google Groups
> "elasticsearch" group.
> To unsubscribe from this group and stop receiving emails from it, send an
> email to elasticsearch+unsubscr...@googlegroups.com.
> To view this discussion on the web visit
> https://groups.google.com/d/msgid/elasticsearch/baba0a96-a991-42e3-a827-43881240e889%40googlegroups.com
> .
> For more options, visit https://groups.google.com/d/optout.
>

-- 
You received this message because you are subscribed to the Google Groups 
"elasticsearch" group.
To unsubscribe from this group and stop receiving emails from it, send an email 
to elasticsearch+unsubscr...@googlegroups.com.
To view this discussion on the web visit 
https://groups.google.com/d/msgid/elasticsearch/CALAhT_hertv4oX1Rcq71ELQUBdyq33ncktqT5%3DZn%3D0cOfkBxaA%40mail.gmail.com.
For more options, visit https://groups.google.com/d/optout.


Re: near real time alerts for syslogs

2014-04-02 Thread Antoine Brun
Hello Ryan,

I am trying to build the same type of application (device log collecting) 
and I'm also very new to logstash and elasticsearch.
I'm having a hard time setting up a lab environment that can sustain the 
load (2000 logs/sec, 1024 kB logs) and only 60% of the logs are indexed (I 
count the number of Lucene documents).

So maybe you can give me a few tips or some advice on how you tuned your 
environment.

How do you start logstash? Just with the script provided in the project?
Are you using the syslog plugin to listen on port 514?
How many elasticsearch nodes do you have? 

I would really appreciate if you could take some time to share your 
experience on this.

Thank you,

Antoine Brun

On Wednesday, May 29, 2013 03:10:12 UTC+2, Ryan Palamara wrote:
>
> I am using Elasticsearch combined with Logstash and Kibana for collecting 
> log data from a number of different network devices. I just set it up in 
> the past few days and so far it has been handling the load wonderfully. I 
> would like to setup alerts for certain events that can be taken from the 
> logs. Things like getting an alert after a certain amount of events in a 
> time period or alerts for certain log events.
>
> Now I am very new at this and have been searching through for some way to 
> do this, but was hoping that someone could help point me in the right 
> direction.
>

-- 
You received this message because you are subscribed to the Google Groups 
"elasticsearch" group.
To unsubscribe from this group and stop receiving emails from it, send an email 
to elasticsearch+unsubscr...@googlegroups.com.
To view this discussion on the web visit 
https://groups.google.com/d/msgid/elasticsearch/d4ca2099-86d2-4071-8359-565f902f390c%40googlegroups.com.
For more options, visit https://groups.google.com/d/optout.


Re: Relevancy sorting of result returned

2014-04-02 Thread chee hoo lum
Hi Binh,

The same problem again. I have the following queries:

1)

{
  "from" : 0,
  "size" : 100,
  "explain" : true,
  "query" : {
"filtered" : {
  "query" : {
 "multi_match": {
  "query": "happy",
  "fields": [ "DISPLAY_NAME^6", "PERFORMER" ]
}
  },
  "filter" : {
"query" : {
  "bool" : {
  "must" : {
"term" : {
  "CHANNEL_ID" : "1"
}
  }
}
}
  }
}
  }
}

However the results display in reverse order for #2 and #3. I have added a
boost on DISPLAY_NAME but it still yields the same behaviour:

1)
* "_score": 10.960511,*
"_source": {
"DISPLAY_NAME": "Happy",
"PRICE": 5,
"CHANNEL_ID": 1,
"CAT_PARENT": 981,
"MEDIA_ID": 390933,
"GENRE": "Happy",
"MEDIA_PKEY": "838644",
"COMPOSER": null,
"PLAYER": null,
"CATMEDIA_NAME": "*Happy*",
"FTID": null,
"VIEW_ID": 43,
"POSITION": 51399,
"ITEMCODE": null,
"CAT_ID": 982,
"PRIORITY": 80,
"CKEY": 757447,
"CATMEDIA_RANK": 3,
"BILLINGTYPE_ID": 1,
"CAT_NAME": "POP",
"KEYWORDS": null,
"LONG_DESCRIPTION": null,
"SHORT_DESCRIPTION": null,
"TYPE_ID": 74,
"ARTIST_GENDER": null,
   * "PERFORMER": "Mario Pacchioli",*
"MAPPINGS": "1_43_982_POP_981_51399_5",
"SHORTCODE": null,
"CATMEDIA_CDATE": "2014-01-12T15:12:27.000Z",
"LANG_ID": 1
},
"_explanation": {
"value": 10.960511,
"description": "max of:",
"details": [
{
"value": 10.960511,
"description": "weight(DISPLAY_NAME:happy^6.0
in 23025) [PerFieldSimilarity], result of:",
"details": [
{
"value": 10.960511,
"description": "fieldWeight in 23025,
product of:",
"details": [
{
"value": 1,
"description": "tf(freq=1.0),
with freq of:",
"details": [
{
"value": 1,
"description":
"termFreq=1.0"
}
]
},
{
"value": 10.960511,
"description": "idf(docFreq=58,
maxDocs=1249243)"
},
{
"value": 1,
"description":
"fieldNorm(doc=23025)"
}
]
}
]
}
]
}
}


2)
"_id": "10194",
  *  "_score": 10.699952,*
"_source": {
"DISPLAY_NAME": "Be *Happy*",
"PRICE": 1.5,
"CHANNEL_ID": 1,
"CAT_PARENT": 557,
"MEDIA_ID": 10194,
"GENRE": "Be Happy",
"MEDIA_PKEY": "534570",
"COMPOSER": null,
"PLAYER": null,
"CATMEDIA_NAME": "Be Happy",
"FTID": null,
"VIEW_ID": 241,
"POSITION": 6733,
"ITEMCODE": "33271",
"CAT_ID": 558,
"PRIORITY": 100,
"CKEY": 528380,
"CATMEDIA_RANK": 3,
"BILLINGTYPE_ID": 1,
"CAT_NAME": "POP",
"KEYWORDS": null,
"LONG_DESCRIPTION": null,
"SHORT_DESCRIPTION": null,
"TYPE_ID": 76,
"ARTIST_GENDER": null,
   * "PERFORMER": "Mary J. Blige",*
"MAPPINGS": "1_241_558_POP_557_6733_1.5",
"SHO

[Tool Contribution] Alfred the ElasticSearch Butler

2014-04-02 Thread Colton

Hello ElasticSearch Community,

My name is Colton McInroy and I work with DOSarrest Internet 
Security LTD. Over the past few months I have been working with 
ElasticSearch fairly closely and building an infrastructure for it. When 
dealing with lots of indices, managing lots of them can be somewhat 
difficult in most of the web interfaces we found. We wanted to be able to, 
for instance, have indices over a certain age expire out of the 
cluster. We came across curator 
(https://github.com/elasticsearch/curator), which came fairly close but 
had some limitations. I decided to spend a couple of days building our 
own tool from scratch, which, after discussion, we have decided to release 
to the public as open source. We have called this tool Alfred, after 
Bruce Wayne's butler Alfred Pennyworth, keeping in line with the comic 
book theme.


Alfred can be set up in a cronjob to automatically groom your 
indices so that you only keep a certain amount of data, optimize 
indexes, change settings (such as changing routing), and more. By 
default no changes are made unless you specify the -r or --run 
parameter. In its default mode, you can test this tool all you want and 
get output to see what would have been done without changes actually 
occurring. You can also use the -D option to get more debug output 
if you want to see what's going on (such as "-D debug"). Once you are 
ready, add the -r parameter and watch Alfred do all the work for you.
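
For example, a hypothetical crontab entry (paths and retention values are illustrative only):

# every hour, delete "cron_*" indices older than 48 hours
0 * * * *  java -jar /opt/alfred/alfred.jar --host localhost --port 9200 -i"cron_*" -e48 -d -r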


Alfred was developed in Java, but does not use the ElasticSearch 
Java API; rather, it uses the REST API through Apache 
HttpClient (http://hc.apache.org/httpclient-3.x/). The following 
libraries are included in Alfred via Maven...


joda-time 2.3
httpcore 4.3.2
gson 2.2.4
httpclient 4.3.3
commons-logging 1.1.3
commons-codec 1.6
commons-cli 1.2

A jar build is located at 
https://github.com/DOSarrest-Internet-Security/alfred/raw/master/builds/alfred-0.0.1.jar
Our Github page with source and README is located at 
https://github.com/DOSarrest-Internet-Security/alfred


Here is some of that README file to explain how to use alfred...

usage: alfred
 -b,--debloom  Disable Bloom on Indexes
 -B,--bloomEnable Bloom on Indexes
 -c,--closeClose Indexes
 -D,--debug   Display debug (debug|info|warn|error|fatal)
 -d,--delete   Delete Indexes
 -E,--expiresize  Byte size limit  (Default 10 GB)
 -e,--expiretime  Number of time units old (Default 24)
--examples Show some examples of how to use Alfred
 -f,--flushFlush Indexes
 -h,--help Help Page (Viewing Now)
--hostElasticSearch Host
 -i,--index   Index pattern to match (Default _all)
--max_num_segmentsOptimize max_num_segments (Default 2)
 -o,--optimize Optimize Indexes
 -O,--open Open Indexes
--portElasticSearch Port
 -r,--run  Required to execute changes on
   ElasticSearch
 -s,--style   Clean up style (time|size) (Default time)
 -S,--settingsPUT settings
--ssl  ElasticSearch SSL
 -T,--time-unit   Specify time units (hour|day|none) (Default
   hour)
 -t,--timeout ElasticSearch Timeout (Default 30)
Alfred Version: 0.0.1


Alfred was built as a tool to handle maintenance work on ElasticSearch. 
Alfred will delete, flush cache, optimize, close/open, enable/disable 
bloom filter, as well as put settings on indexes. Alfred can do any of 
these actions based on either time or size parameters.


Examples:

java -jar alfred.jar -e48 -i"cron_*" -d

Delete any indexes starting with "cron_" that are older than 48 hours

|java -jar alfred.jar -e24 -i"cron_*" 
-S'{"index.routing.allocation.require.tag":"historical"}'
|

Set routing to require the historical tag on any indexes starting with 
"cron_" that are older than 24 hours


java -jar alfred.jar -e24 -i"cron_*" -b -o

Disable the bloom filter and optimize any indexes starting with "cron_" that 
are older than 24 hours


java -jar alfred.jar -ssize -E"1 GB" -d

Find all indexes, group by prefix, and delete indexes over a limit of 1 
GB. Using the size style with an expire size does not check space based 
on a single index but rather on the indexes adding up over time, such as 
the following...


|java -jar alfred.jar -i"cron_*" -d -ssize -E"500 GB"
GENERAL: cron_2014_04_02_08 is 469.9 GiB bytes before the cuttoff.
GENERAL: cron_2014_04_02_07 is 436.5 GiB bytes before the cuttoff.
GENERAL: cron_2014_04_02_06 is 404.0 GiB bytes before the cuttoff.
GENERAL: cron_2014_04_02_05 is 372.1 GiB bytes before the cuttoff.
GENERAL: cron_2014_04_02_04 is 341.2 GiB bytes before the cuttoff.
GENERAL: cron_2014_04_02_03 is 310.1 GiB bytes before the cuttoff.
GENERAL: cron

Re: Relevancy sorting of result returned

2014-04-02 Thread chee hoo lum
Hi Binh,

Great. Thanks for that.


On Wed, Apr 2, 2014 at 12:05 AM, Binh Ly  wrote:

> If you specify explain=true in your query, it will tell you in detail how
> the score is computed:
>
> {
>   "explain": true,
>   "query": {}
> }
>
> Some useful info:
>
>
> http://lucene.apache.org/core/4_0_0/core/org/apache/lucene/search/similarities/TFIDFSimilarity.html
>
> http://jontai.me/blog/2012/10/lucene-scoring-and-elasticsearch-_all-field/
>
>  --
> You received this message because you are subscribed to a topic in the
> Google Groups "elasticsearch" group.
> To unsubscribe from this topic, visit
> https://groups.google.com/d/topic/elasticsearch/RXuuSlkDSyA/unsubscribe.
> To unsubscribe from this group and all its topics, send an email to
> elasticsearch+unsubscr...@googlegroups.com.
> To view this discussion on the web visit
> https://groups.google.com/d/msgid/elasticsearch/523ccc24-90a5-4b1a-aca1-bd1018e041aa%40googlegroups.com
> .
>
> For more options, visit https://groups.google.com/d/optout.
>



-- 
Regards,

Chee Hoo

-- 
You received this message because you are subscribed to the Google Groups 
"elasticsearch" group.
To unsubscribe from this group and stop receiving emails from it, send an email 
to elasticsearch+unsubscr...@googlegroups.com.
To view this discussion on the web visit 
https://groups.google.com/d/msgid/elasticsearch/CAGS0%2Bg_onEHeshd%3Do2_wzN%3DcaWbqBBdJwUW2Y5p_0P5rL%2B8-1w%40mail.gmail.com.
For more options, visit https://groups.google.com/d/optout.


Re: Need some help for creating my model

2014-04-02 Thread Stefan Kruse
Hello,

many thanks for your answer. 
Could you maybe give me a little example of how to add/remove a single child 
to/from a parent object? 
I would like to do this with the elasticsearch PHP module. Is this possible?

Regards Stefan

On Tuesday, April 1, 2014 16:45:16 UTC+2, Binh Ly wrote:
>
> This might help:
>
> http://www.elasticsearch.org/blog/managing-relations-inside-elasticsearch/
>
> Out of the box, you can't model many to many in ES (unless you do it 
> yourself in code). One to many is supported using either nested or 
> parent-child.
>
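
(For reference, the parent/child round trip over the REST API looks roughly like the sketch below; the index, type and id names are made up:)

# declare the parent link when the child type is first created
curl -XPUT "localhost:9200/shop/variant/_mapping" -d '{
  "variant": { "_parent": { "type": "product" } }
}'
# add a child document under parent product 1
curl -XPUT "localhost:9200/shop/variant/17?parent=1" -d '{ "color": "red" }'
# remove that child again
curl -XDELETE "localhost:9200/shop/variant/17?parent=1"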

-- 
You received this message because you are subscribed to the Google Groups 
"elasticsearch" group.
To unsubscribe from this group and stop receiving emails from it, send an email 
to elasticsearch+unsubscr...@googlegroups.com.
To view this discussion on the web visit 
https://groups.google.com/d/msgid/elasticsearch/3ff28df3-39ca-4e3c-b678-8c49f631473d%40googlegroups.com.
For more options, visit https://groups.google.com/d/optout.


Re: Grouping entries together in a query, need some help with aggregations

2014-04-02 Thread Vincent Massol
Thanks a lot for your fast response Adrien!

* I noticed the cardinality aggregation but I was worried by the "an 
approximate count of distinct values" part of the documentation. I need an 
exact value, not an approximate one :) However, I've read more of the 
documentation and it may not be a real problem in practice, especially if I 
use a precision threshold of 40000 (the max apparently). BTW, I couldn't find 
the default precision value in the documentation.
* From your answer I gather that using aggregations is the only solution to 
my problem and there's no way to use the Query DSL to solve it.

Thanks, it helps a lot!
-Vincent

On Wednesday, April 2, 2014 11:17:17 AM UTC+2, Adrien Grand wrote:
>
> Hi Vincent,
>
> I left some replies inline:
>
> On Wed, Apr 2, 2014 at 10:02 AM, Vincent Massol 
> > wrote:
>
>> Hi guys,
>>
>> I'd like to count all entries in my ES instance, having a timestamp from 
>> the *last day* and *group together all entries having the same 
>> "instanceId"*. With the data below, the count result should be 1 (and 
>> not 2) since 2 entries are within the last day but they have the same 
>> instanceId of "def".
>>
>> I tried the following:
>>
>> curl -XPOST "
>> http://localhost:9200/installs/install/_search?pretty=1&fields=_source,_timestamp";
>>  
>> -d'
>> {
>> "aggs": {
>> "lastday" : {
>> "filter" : { 
>> "range" : { 
>> "_timestamp" : {
>> "gt" : "now-1d"
>> }
>> }
>> },
>> "aggs" : {
>> "instanceids" : {
>> "terms" : { "field" : "instanceId" }
>> }
>> }
>> }
>> }
>> }'
>>
>> But I have 3 problems with this:
>> * It's not a count but a search. "aggs" don't seem to work with _count
>> * It returns all entries in the result before the aggs data
>>
>
> For these two issues, you probably want to check out the count search 
> type[1] which works with aggregations. It's like a regular search, but 
> doesn't perform the fetch phase in order to fetch the top hits.
>
> [1] 
> http://www.elasticsearch.org/guide/en/elasticsearch/reference/current/search-request-search-type.html#count
>  
>
>> * In the aggs I don't get a direct count value and I have to count the 
>> number of buckets to get my answer
>>
>
> We recently (Elasticsearch 1.1.0) added a cardinality[2] aggregation, that 
> allows for counting unique values. In previous versions of Elasticsearch, 
> counting was indeed only possible through the terms aggregation with a high 
> `size` parameter, but this was inefficient on high-cardinality fields.
>
> [2] 
> http://www.elasticsearch.org/guide/en/elasticsearch/reference/current/search-aggregations-metrics-cardinality-aggregation.html#search-aggregations-metrics-cardinality-aggregation
>  
> Here is a gist that gives an example of the count search_type and the 
> cardinality aggregation:
>   https://gist.github.com/jpountz/9930690
>
> -- 
> Adrien Grand
>  

-- 
You received this message because you are subscribed to the Google Groups 
"elasticsearch" group.
To unsubscribe from this group and stop receiving emails from it, send an email 
to elasticsearch+unsubscr...@googlegroups.com.
To view this discussion on the web visit 
https://groups.google.com/d/msgid/elasticsearch/2260e806-b42b-4936-a9ec-5079e691108f%40googlegroups.com.
For more options, visit https://groups.google.com/d/optout.


[Hadoop] New Feature - to write bulks to different indexes from hadoop...

2014-04-02 Thread Igor Romanov
Hey,

I am designing a solution for indexing using Hadoop.
I am thinking of using the same logic as Logstash and creating an index per 
time period for my records (10 days or a month), in order to avoid working 
with big index sizes (from experience, merging huge segments in Lucene makes 
the whole index slow). That way I also don't limit myself to a certain number 
of shards, and I will be able to modify the period dynamically and move 
indexes between nodes in the cluster...

So I thought of adding to elasticsearch-hadoop an option to extract the index 
name from the value object - or even to use the key as the index name - and 
then holding one RestRepository object per index name, which would buffer 
bulks per index and send them when a bulk is full or the Hadoop job ends.

Another option is to just write the index name + type into the bulk and send 
the bulk to the master ES node (rather than taking the shard list of a 
certain index and choosing one shard depending on the Hadoop instance).
(But in that scenario I think the master ES node will work too hard, because 
many mappers/reducers will write to the same node and it will need to route 
those index records one by one...)

For those who have worked with the elasticsearch-hadoop code - I would like 
to get your input: what do you think? Which is better?

Thanks,
Igor


-- 
You received this message because you are subscribed to the Google Groups 
"elasticsearch" group.
To unsubscribe from this group and stop receiving emails from it, send an email 
to elasticsearch+unsubscr...@googlegroups.com.
To view this discussion on the web visit 
https://groups.google.com/d/msgid/elasticsearch/696de734-e97e-4cb5-ae80-5fa8717b6190%40googlegroups.com.
For more options, visit https://groups.google.com/d/optout.


Re: Lucene index corruption on nodes restart

2014-04-02 Thread Paweł Chabierski
A few days ago we found that we get the same error when we search for data. 

reason: "FetchPhaseExecutionException[[site_production][1]: 
query[ConstantScore(cache(_type:ademail))],from[0],size[648]: Fetch Failed 
[Failed to fetch doc id [9615533]]]; nested: EOFException[seek past EOF: 
MMapIndexInput(path="/opt/elasticsearch/data/naprawa/nodes/0/indices/site_production/1/index/_573oa.fdt")];
 

After this error, elasticsearch stops searching and doesn't return all 
results, even if they are returned properly in another query. Is there any 
way to fix this index? We tried CheckIndex from the Lucene core library; 
after that we lost ~20 million documents but the error still occurs. We also 
tried to restore the index from a snapshot, but the error still occurs :/.

On Saturday, March 22, 2014 at 14:04:56 UTC+1, Andrey Perminov 
wrote:
>
> We are using a small elasticsearch cluster of three nodes, version 1.0.1. 
> Each node has 7 GB RAM. Our software creates daily indexes for storing its 
> data. A daily index is something around 5 GB. Unfortunately, for some reason, 
> Elasticsearch eats up all RAM and hangs the node, even though heap size is 
> set to 6 GB max. So we decided to use monit to restart it on reaching a 
> memory limit of 90%. It works, but sometimes we get errors such as:
>
> [2014-03-22 16:56:04,943][DEBUG][action.search.type   ] [es-00] 
> [product-22-03-2014][0], node[jbUDVzuvS5GTM7iOG8iwzQ], [P], s[STARTED]: 
> Failed to execute [org.elasticsearch.action.search.SearchRequest@687dc039]
> org.elasticsearch.search.fetch.FetchPhaseExecutionException: 
> [product-22-03-2014][0]: query[filtered(ToParentBlockJoinQuery 
> (filtered(history.created:[1392574921000 TO 
> *])->cache(_type:__history)))->cache(_type:product)],from[0],size[1000],sort[  
> org.elasticsearch.index.search.nested.NestedFieldComparatorSource@15e4ece9>]: 
> Fetch Failed [Failed to fetch doc id [7263214]]
> at 
> org.elasticsearch.search.fetch.FetchPhase.loadStoredFields(FetchPhase.java:230)
> at 
> org.elasticsearch.search.fetch.FetchPhase.execute(FetchPhase.java:156)
> at 
> org.elasticsearch.search.SearchService.executeFetchPhase(SearchService.java:332)
> at 
> org.elasticsearch.search.action.SearchServiceTransportAction.sendExecuteFetch(SearchServiceTransportAction.java:304)
> at 
> org.elasticsearch.action.search.type.TransportSearchQueryAndFetchAction$AsyncAction.sendExecuteFirstPhase(TransportSearchQueryAndFetchAction.java:71)
> at 
> org.elasticsearch.action.search.type.TransportSearchTypeAction$BaseAsyncAction.performFirstPhase(TransportSearchTypeAction.java:216)
> at 
> org.elasticsearch.action.search.type.TransportSearchTypeAction$BaseAsyncAction$4.run(TransportSearchTypeAction.java:292)
> at java.util.concurrent.ThreadPoolExecutor.runWorker(Unknown 
> Source)
> at java.util.concurrent.ThreadPoolExecutor$Worker.run(Unknown 
> Source)
> at java.lang.Thread.run(Unknown Source)
> Caused by: java.io.EOFException: seek past EOF: 
> MMapIndexInput(path="/opt/elasticsearch/main/nodes/0/indices/product-22-03-2014/0/index/_9lz.fdt")
> at 
> org.apache.lucene.store.ByteBufferIndexInput.seek(ByteBufferIndexInput.java:174)
> at 
> org.apache.lucene.codecs.compressing.CompressingStoredFieldsReader.visitDocument(CompressingStoredFieldsReader.java:229)
> at 
> org.apache.lucene.index.SegmentReader.document(SegmentReader.java:276)
> at 
> org.apache.lucene.index.BaseCompositeReader.document(BaseCompositeReader.java:110)
> at 
> org.apache.lucene.search.IndexSearcher.doc(IndexSearcher.java:196)
> at 
> org.elasticsearch.search.fetch.FetchPhase.loadStoredFields(FetchPhase.java:228)
> ... 9 more
> [2014-03-22 16:56:04,944][DEBUG][action.search.type   ] [es-00] All 
> shards failed for phase: [query_fetch]
>
> According to our logs, this might happen when one or two nodes get 
> restarted. More strangely, the same shard got corrupted on all nodes of 
> the cluster. Why could this happen? How can we fix it? Can you suggest how 
> to fix the memory usage?
>

-- 
You received this message because you are subscribed to the Google Groups 
"elasticsearch" group.
To unsubscribe from this group and stop receiving emails from it, send an email 
to elasticsearch+unsubscr...@googlegroups.com.
To view this discussion on the web visit 
https://groups.google.com/d/msgid/elasticsearch/8831e9e9-9c4d-4a2e-bd16-99c33f589b7f%40googlegroups.com.
For more options, visit https://groups.google.com/d/optout.


Re: Grouping entries together in a query, need some help with aggregations

2014-04-02 Thread Adrien Grand
Hi Vincent,

I left some replies inline:

On Wed, Apr 2, 2014 at 10:02 AM, Vincent Massol  wrote:

> Hi guys,
>
> I'd like to count all entries in my ES instance, having a timestamp from
> the *last day* and *group together all entries having the same
> "instanceId"*. With the data below, the count result should be 1 (and not
> 2) since 2 entries are within the last day but they have the same
> instanceId of "def".
>
> I tried the following:
>
> curl -XPOST "
> http://localhost:9200/installs/install/_search?pretty=1&fields=_source,_timestamp";
> -d'
> {
> "aggs": {
> "lastday" : {
> "filter" : {
> "range" : {
> "_timestamp" : {
> "gt" : "now-1d"
> }
> }
> },
> "aggs" : {
> "instanceids" : {
> "terms" : { "field" : "instanceId" }
> }
> }
> }
> }
> }'
>
> But I have 3 problems with this:
> * It's not a count but a search. "aggs" don't seem to work with _count
> * It returns all entries in the result before the aggs data
>

For these two issues, you probably want to check out the count search
type[1] which works with aggregations. It's like a regular search, but
doesn't perform the fetch phase in order to fetch the top hits.

[1]
http://www.elasticsearch.org/guide/en/elasticsearch/reference/current/search-request-search-type.html#count


> * In the aggs I don't get a direct count value and I have to count the
> number of buckets to get my answer
>

We recently (Elasticsearch 1.1.0) added a cardinality[2] aggregation, that
allows for counting unique values. In previous versions of Elasticsearch,
counting was indeed only possible through the terms aggregation with a high
`size` parameter, but this was inefficient on high-cardinality fields.

[2]
http://www.elasticsearch.org/guide/en/elasticsearch/reference/current/search-aggregations-metrics-cardinality-aggregation.html#search-aggregations-metrics-cardinality-aggregation

Here is a gist that gives an example of the count search_type and the
cardinality aggregation:
  https://gist.github.com/jpountz/9930690
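
A rough sketch of such a request, using the field names from the question (the
gist has the authoritative version):

curl -XPOST "localhost:9200/installs/install/_search?search_type=count&pretty" -d '{
  "aggs": {
    "lastday": {
      "filter": { "range": { "_timestamp": { "gt": "now-1d" } } },
      "aggs": {
        "unique_instances": { "cardinality": { "field": "instanceId" } }
      }
    }
  }
}'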

-- 
Adrien Grand

-- 
You received this message because you are subscribed to the Google Groups 
"elasticsearch" group.
To unsubscribe from this group and stop receiving emails from it, send an email 
to elasticsearch+unsubscr...@googlegroups.com.
To view this discussion on the web visit 
https://groups.google.com/d/msgid/elasticsearch/CAL6Z4j4BaSGiyoNoSdu6qCxjjU4n1xCh3hT35cmcTGPmemcLtg%40mail.gmail.com.
For more options, visit https://groups.google.com/d/optout.


Re: Aggregation error( Java heap space)

2014-04-02 Thread Adrien Grand
On Wed, Apr 2, 2014 at 10:52 AM, 张阳  wrote:

> But I can do the aggregation on the 'banner' field on both clusters. Is that
> because the values of 'banner' are not as unique as those of the 'ip' field?
>

Very likely, yes. Memory usage of field data is higher on high-cardinality
fields.

-- 
Adrien Grand

-- 
You received this message because you are subscribed to the Google Groups 
"elasticsearch" group.
To unsubscribe from this group and stop receiving emails from it, send an email 
to elasticsearch+unsubscr...@googlegroups.com.
To view this discussion on the web visit 
https://groups.google.com/d/msgid/elasticsearch/CAL6Z4j7Fzw6Aud-J2RFb7a2DvfzrDfjyNdMLP0DcjuWgd0Ax9g%40mail.gmail.com.
For more options, visit https://groups.google.com/d/optout.


Re: Aggregation error( Java heap space)

2014-04-02 Thread 张阳
But I can do the aggregation on the 'banner' field on both clusters. Is that
because the values of 'banner' are not as unique as those of the 'ip' field?


2014-04-02 16:27 GMT+08:00 Adrien Grand :

> Given your description of the problem, I think the issue is that your
> Elasticsearch cluster doesn't have enough memory to load field data for the
> ip field (which needs to be done for all documents, not only those that
> match your query). So you either need to give more nodes to your cluster,
> more memory to your nodes, or use doc values for your ip field[1] (the
> latter option requires reindexing).
>
> [1]
> http://www.elasticsearch.org/blog/disk-based-field-data-a-k-a-doc-values/
>
>
> On Wed, Apr 2, 2014 at 10:09 AM,  wrote:
>
>> The smaller index has 1 million lines of data. They are the lines
>> filtered by "prefix":{"ip":"100.1"} from the bigger one.
>>
>> On Wednesday, April 2, 2014 at 4:04:27 PM UTC+8, vir@gmail.com wrote:
>>
>>> I do an aggregation search on my index (6 nodes). There are about 200
>>> million lines of data (port scanning). Each line looks like this:
>>> {"ip":"85.18.68.5", "banner":"cisco-IOS", "country":"IT", "_type":"port-80"}.
>>> So you can imagine I have these data sorted into different types by the
>>> port they are scanning. Now I want to know who opens a lot of ports at the
>>> same time. So I choose to do an aggregation on the IP field, and I get an
>>> OOM error, which may be reasonable because most of them open only one port,
>>> so that there are too many buckets? I guess.
>>>
>>>
>>> And then, I use aggregation filter.
>>>
>>> {
>>> "aggs":{
>>> "just_name1":{
>>> "filter":{
>>> "prefix":{
>>> "ip":"100.1"
>>> }
>>> },
>>> "aggs":{
>>> "just_name2":{
>>> "terms":{
>>> "field":"ip",
>>> "execution_hint":"map"
>>> }
>>> }
>>> }
>>> }
>>> }
>>> }(yes, my ip field is set as string)
>>>
>>> I think this time I could make ES narrow down the set for the aggregation. 
>>> But I still get an OOM error, while it works on a smaller index (another 
>>> cluster, one node). Why would this happen? After filtering, the 2 clusters 
>>> should have equal-volume sets. Why did the bigger one fail?
>>>
>>>  --
>> You received this message because you are subscribed to the Google Groups
>> "elasticsearch" group.
>> To unsubscribe from this group and stop receiving emails from it, send an
>> email to elasticsearch+unsubscr...@googlegroups.com.
>> To view this discussion on the web visit
>> https://groups.google.com/d/msgid/elasticsearch/d384bea8-4a60-4521-aa0e-34bb2fd61ec5%40googlegroups.com
>> .
>>
>> For more options, visit https://groups.google.com/d/optout.
>>
>
>
>
> --
> Adrien Grand
>
> --
> You received this message because you are subscribed to a topic in the
> Google Groups "elasticsearch" group.
> To unsubscribe from this topic, visit
> https://groups.google.com/d/topic/elasticsearch/cf6dpcV7G3w/unsubscribe.
> To unsubscribe from this group and all its topics, send an email to
> elasticsearch+unsubscr...@googlegroups.com.
> To view this discussion on the web visit
> https://groups.google.com/d/msgid/elasticsearch/CAL6Z4j6kOx7RXmBzU9wfhesUYiz-2Qx8mrZStb_rCGdQv%2BpqNQ%40mail.gmail.com
> .
>
> For more options, visit https://groups.google.com/d/optout.
>

-- 
You received this message because you are subscribed to the Google Groups 
"elasticsearch" group.
To unsubscribe from this group and stop receiving emails from it, send an email 
to elasticsearch+unsubscr...@googlegroups.com.
To view this discussion on the web visit 
https://groups.google.com/d/msgid/elasticsearch/CAJp1%3DtwM3KJ1QYvsKGcXi4bDfjwDF-bRviSsYX6jUBEg6w5qgQ%40mail.gmail.com.
For more options, visit https://groups.google.com/d/optout.


Re: Elasticsearch hardware requirement

2014-04-02 Thread Jorge Román
Thanks for the reply. I have done the test with 1 node (16GB RAM and 8 CPUs,
allocating 8GB to ES), and I have been able to deal with all events with
only 1 node. Now I'm trying to find out where the bottleneck is.

Next step, I'm going to try benchmarking elasticsearch without external
elements in order to take measurements.

Best Regards!


Jorge Román Novalbos
CEO
jro...@servotic.com
679 99 08 62
 
  

 


From:  Binh Ly 
Reply-To:  
Date:  Fri, 28 Mar 2014 07:06:18 -0700 (PDT)
To:  
Subject:  Re: Elasticsearch hardware requirement

The best way is to test it. Take 1 node with say 16GB of RAM and allocate
8GB to ES. Then start pushing 1 day worth of logs into an index with 5
shards and 0 replicas on that one node AND run your typical queries. Take
measurements like throughput/dps, query latency, ram usage, cpu, and disk.
Then you'll know how much you can do on a single node and then start
extrapolating from there.

-- 
You received this message because you are subscribed to a topic in the
Google Groups "elasticsearch" group.
To unsubscribe from this topic, visit
https://groups.google.com/d/topic/elasticsearch/tVywigD5iU8/unsubscribe.
To unsubscribe from this group and all its topics, send an email to
elasticsearch+unsubscr...@googlegroups.com.
To view this discussion on the web visit
https://groups.google.com/d/msgid/elasticsearch/9ee91a2d-2941-4327-82f0-2793
eb1cb242%40googlegroups.com
 .
For more options, visit https://groups.google.com/d/optout.


-- 
You received this message because you are subscribed to the Google Groups 
"elasticsearch" group.
To unsubscribe from this group and stop receiving emails from it, send an email 
to elasticsearch+unsubscr...@googlegroups.com.
To view this discussion on the web visit 
https://groups.google.com/d/msgid/elasticsearch/CF619208.1737D%25jroman%40servotic.com.
For more options, visit https://groups.google.com/d/optout.


Re: Aggregation error( Java heap space)

2014-04-02 Thread Adrien Grand
Given your description of the problem, I think the issue is that your
Elasticsearch cluster doesn't have enough memory to load field data for the
ip field (which needs to be done for all documents, not only those that
match your query). So you either need to give more nodes to your cluster,
more memory to your nodes, or use doc values for your ip field[1] (the
latter option requires reindexing).

[1]
http://www.elasticsearch.org/blog/disk-based-field-data-a-k-a-doc-values/
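
As a rough sketch, enabling doc values for the ip field at (re)index time
could look like the mapping below (index and type names are placeholders):

curl -XPUT "localhost:9200/scans" -d '{
  "mappings": {
    "port-80": {
      "properties": {
        "ip": {
          "type": "string",
          "index": "not_analyzed",
          "fielddata": { "format": "doc_values" }
        }
      }
    }
  }
}'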


On Wed, Apr 2, 2014 at 10:09 AM,  wrote:

> The smaller index has 1 million lines of data. They are the lines
> filtered by "prefix":{"ip":"100.1"} from the bigger one.
>
> On Wednesday, April 2, 2014 at 4:04:27 PM UTC+8, vir@gmail.com wrote:
>
>> I do an aggregation search on my index (6 nodes). There are about 200
>> million lines of data (port scanning). Each line looks like this:
>> {"ip":"85.18.68.5", "banner":"cisco-IOS", "country":"IT", "_type":"port-80"}.
>> So you can imagine I have these data sorted into different types by the
>> port they are scanning. Now I want to know who opens a lot of ports at the
>> same time. So I choose to do an aggregation on the IP field, and I get an
>> OOM error, which may be reasonable because most of them open only one port,
>> so that there are too many buckets? I guess.
>>
>>
>> And then, I use aggregation filter.
>>
>> {
>> "aggs":{
>> "just_name1":{
>>  "filter":{
>>  "prefix":{
>>  "ip":"100.1"
>>  }
>>  },
>>  "aggs":{
>>  "just_name2":{
>>  "terms":{
>>  "field":"ip",
>>  "execution_hint":"map"
>>  }
>>  }
>>  }
>>  }
>> }
>> }(yes, my ip field is set as string)
>>
>> I think this time I could make ES narrow down the set for the aggregation. 
>> But I still get an OOM error, while it works on a smaller index (another 
>> cluster, one node). Why would this happen? After filtering, the 2 clusters 
>> should have equal-volume sets. Why did the bigger one fail?
>>
>>  --
> You received this message because you are subscribed to the Google Groups
> "elasticsearch" group.
> To unsubscribe from this group and stop receiving emails from it, send an
> email to elasticsearch+unsubscr...@googlegroups.com.
> To view this discussion on the web visit
> https://groups.google.com/d/msgid/elasticsearch/d384bea8-4a60-4521-aa0e-34bb2fd61ec5%40googlegroups.com
> .
>
> For more options, visit https://groups.google.com/d/optout.
>



-- 
Adrien Grand

-- 
You received this message because you are subscribed to the Google Groups 
"elasticsearch" group.
To unsubscribe from this group and stop receiving emails from it, send an email 
to elasticsearch+unsubscr...@googlegroups.com.
To view this discussion on the web visit 
https://groups.google.com/d/msgid/elasticsearch/CAL6Z4j6kOx7RXmBzU9wfhesUYiz-2Qx8mrZStb_rCGdQv%2BpqNQ%40mail.gmail.com.
For more options, visit https://groups.google.com/d/optout.


Automatically build `input` for `completion` fields?

2014-04-02 Thread Sviatoslav Abakumov
Hello,

I have an index `facebook` with type `post`. I need to provide users with 
autocompletions using terms that appear in `post.message`. Also the list of 
completions should be sorted by score.

The mapping is as follows:
{
"post": {
"properties": {
"created_time": {
"type": "date",
"format": "dateOptionalTime"
},
"link": {
"type": "string"
},
"message": {
"type": "string"
},
"object_id": {
"type": "string"
},
"picture": {
"type": "string"
},
"shares_count": {
"type": "long"
},
"type": {
"type": "string"
},
"update_time": {
"type": "date",
"format": "dateOptionalTime"
},
"user": {
"type": "long"
}
}
}
}

An example document:
{
"picture": "...",
"update_time": "2014-03-19T23:16:59",
"message": "The
 day has finally arrived - the first piece of the 1,000,000 Swag Bucks 
pie has been served!!  Check to see if you're our first winner!",
"object_id": "",
"shares_count": 0,
"link": "...",
"user": ...,
"created_time": "2014-03-17T21:02:32",
"type": "link"
}

To achieve the goal I've added one more field to the mapping:
"message_suggest": {
"type": "completion"
}

Every time I write a document, I query ES to tokenize the string `message`:
POST _analyze?tokenizer=standard
The day has finally arrived - the first piece of the 1,000,000 Swag Bucks 
pie has been served!!  Check to see if you're our first winner!

Then I get the list of tokens from the response and add it to 
`post.message_suggest.input`.
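
For example, the indexed document then ends up looking roughly like this (token list shortened, other fields omitted, and the output field is optional):

POST facebook/post
{
  "message": "The day has finally arrived - the first piece of the 1,000,000 Swag Bucks pie has been served!!  Check to see if you're our first winner!",
  "message_suggest": {
    "input": ["day", "finally", "arrived", "first", "piece", "swag", "bucks", "pie", "served", "check", "winner"],
    "output": "The day has finally arrived..."
  }
}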

When I do the following request, I get what I wanted:
POST facebook/_suggest
{
"messages": {
"text": "pi",
"completion": {
"field": "message_suggest"
}
}
}

I sense that this approach is not right or at least not optimal. I am new 
to Elasticsearch and I would appreciate any input.

Best,
Sviatoslav.

-- 
You received this message because you are subscribed to the Google Groups 
"elasticsearch" group.
To unsubscribe from this group and stop receiving emails from it, send an email 
to elasticsearch+unsubscr...@googlegroups.com.
To view this discussion on the web visit 
https://groups.google.com/d/msgid/elasticsearch/7ea7946c-59e3-4b6b-89f8-0ff327d19017%40googlegroups.com.
For more options, visit https://groups.google.com/d/optout.


Delete index directory

2014-04-02 Thread 김주은
I've tried deleting an index by calling $ curl -XDELETE 
'http://localhost:9200/indexName', 
but in the actual file system the 'indexName' directory still persists in the path 
'repository/elasticsearch/data/228.5.8.6/nodes/0/indices/'.
The delete API only deletes the '_state' directory under the 'indexName' directory.
Because of this, I can't recreate an index with the same name.
Does anyone have any workaround for this?

-- 
You received this message because you are subscribed to the Google Groups 
"elasticsearch" group.
To unsubscribe from this group and stop receiving emails from it, send an email 
to elasticsearch+unsubscr...@googlegroups.com.
To view this discussion on the web visit 
https://groups.google.com/d/msgid/elasticsearch/2f426122-558f-4b03-92ad-0c3e957a6773%40googlegroups.com.
For more options, visit https://groups.google.com/d/optout.


Re: using java get document api within a script field

2014-04-02 Thread joergpra...@gmail.com
I wrote a denormalizer plugin where I use a node client from a field
analyzer for a field type "deref". A node client is started as a singleton
per node where the plugin is installed. It can ask other indexes/types for
a doc by a given ID, injecting additional terms from an array of terms of
the referenced doc into the current tokenizer stream.

It is not well tested and early in development, but I can share the code if
there is interest.

I would never start a client node from a script, because of the enormous
overhead, as you have experienced.

Jörg

-- 
You received this message because you are subscribed to the Google Groups 
"elasticsearch" group.
To unsubscribe from this group and stop receiving emails from it, send an email 
to elasticsearch+unsubscr...@googlegroups.com.
To view this discussion on the web visit 
https://groups.google.com/d/msgid/elasticsearch/CAKdsXoEYBU64spM4QvtRSQDVxjMK7-yRENWC%2BqFhr4mkTHN49w%40mail.gmail.com.
For more options, visit https://groups.google.com/d/optout.


Re: Aggregation error( Java heap space)

2014-04-02 Thread vir . candy
The smaller index has 1 million lines of data. They are the lines filtered 
by "prefix":{"ip":"100.1"} from the bigger one.

On Wednesday, April 2, 2014, at 4:04:27 PM UTC+8, vir@gmail.com wrote:
>
> I do an *aggregation* search on my index(*6 nodes*). There are about *200 
> million lines* of data(port scanning). Each line is same* like this 
> :**{"ip":"85.18.68.5", 
> "banner":"cisco-IOS", "country":"IT", "_type":"port-80"}.* 
> So you can image I have these data sort into different type by port they 
> are scanning. Now, I want to know who open a lot of ports at the same time. 
> So, I choose to do aggregation on IP field, and I get an OOM error that may 
> be reasonable because of most of them open only one port so that there are 
> too many buckets? I guess.
>
>
> And then, I use aggregation filter. 
>
> {
> "aggs":{
> "just_name1":{
>   "filter":{
>   "prefix":{
>   "ip":"100.1"
>   }
>   },
>   "aggs":{
>   "just_name2":{
>   "terms":{
>   "field":"ip",
>   "execution_hint":"map"
>   }
>   }
>   }
>   }
> }
> }(yes, my ip field is set as string)
>
> I think this time, I could make ES narrow down the set for aggregation. But I 
> still get an OOM error. While It works on a smaller index(another cluster, 
> one node). Why would this happen? After filtering, 2 cluster should have an 
> equal-volume set. Why the bigger one failed?  
>
>

-- 
You received this message because you are subscribed to the Google Groups 
"elasticsearch" group.
To unsubscribe from this group and stop receiving emails from it, send an email 
to elasticsearch+unsubscr...@googlegroups.com.
To view this discussion on the web visit 
https://groups.google.com/d/msgid/elasticsearch/d384bea8-4a60-4521-aa0e-34bb2fd61ec5%40googlegroups.com.
For more options, visit https://groups.google.com/d/optout.


Re: search cascading in ES

2014-04-02 Thread Adrien Grand
Hi,

There is no such built-in functionality in Elasticsearch, and I don't know
of a third-party library that would provide this.
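
If it helps, the chaining itself is usually done client-side in two round trips. A rough sketch, with placeholder index and field names: run the first search, pull the values you need out of its hits, then feed them into the second search as a terms filter:

# first search: collect the interesting values (here a hypothetical "userId" field) from hits[]._source
curl -XPOST "http://localhost:9200/first-index/_search" -d'
{ "query": { "match": { "status": "active" } } }'

# second search: feed the collected values in as the new criteria
curl -XPOST "http://localhost:9200/second-index/_search" -d'
{
  "query": {
    "filtered": {
      "filter": { "terms": { "userId": ["u1", "u7", "u42"] } }
    }
  }
}'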


On Wed, Apr 2, 2014 at 8:10 AM, Chetana  wrote:

> We are developing an application which requires cascaded (flow based)
> search where the search result of one will become the input criteria for
> the next search.
>
> Is there a way to do this in ES ? If not, can you suggest some third party
> library which can provide cascading functionality over ES search
>
>
> Thanks
>
>
>
> --
> You received this message because you are subscribed to the Google Groups
> "elasticsearch" group.
> To unsubscribe from this group and stop receiving emails from it, send an
> email to elasticsearch+unsubscr...@googlegroups.com.
> To view this discussion on the web visit
> https://groups.google.com/d/msgid/elasticsearch/ce66247d-f8d3-4d0e-8dc0-ecc848542240%40googlegroups.com
> .
> For more options, visit https://groups.google.com/d/optout.
>



-- 
Adrien Grand

-- 
You received this message because you are subscribed to the Google Groups 
"elasticsearch" group.
To unsubscribe from this group and stop receiving emails from it, send an email 
to elasticsearch+unsubscr...@googlegroups.com.
To view this discussion on the web visit 
https://groups.google.com/d/msgid/elasticsearch/CAL6Z4j7ZD4-K8%3DXArw%3D765TgeXJSLGaj4ob2Y14qkGVEzZh6pg%40mail.gmail.com.
For more options, visit https://groups.google.com/d/optout.


Aggregation error( Java heap space)

2014-04-02 Thread vir . candy
I do an *aggregation* search on my index (*6 nodes*). There are about *200 
million lines* of data (port scanning). Each line has the same shape, like this: 
*{"ip":"85.18.68.5", "banner":"cisco-IOS", "country":"IT", "_type":"port-80"}.* 
So you can imagine I have these data sorted into different types by the port they 
are scanning. Now, I want to know which IPs open a lot of ports at the same time. 
So I choose to do an aggregation on the IP field, and I get an OOM error, which may 
be reasonable because most of them open only one port, so there are 
too many buckets? I guess.


And then, I use a filter aggregation. 

{
"aggs":{
"just_name1":{
"filter":{
"prefix":{
"ip":"100.1"
}
},
"aggs":{
"just_name2":{
"terms":{
"field":"ip",
"execution_hint":"map"
}
}
}
}
}
}(yes, my ip field is set as string)

I think this time I could make ES narrow down the set for the aggregation. But I 
still get an OOM error, while it works on a smaller index (another cluster, one 
node). Why would this happen? After filtering, the two clusters should have an 
equal-volume set to aggregate over. Why did the bigger one fail?  

-- 
You received this message because you are subscribed to the Google Groups 
"elasticsearch" group.
To unsubscribe from this group and stop receiving emails from it, send an email 
to elasticsearch+unsubscr...@googlegroups.com.
To view this discussion on the web visit 
https://groups.google.com/d/msgid/elasticsearch/d66bef21-b1e9-4538-b621-e93949b389cc%40googlegroups.com.
For more options, visit https://groups.google.com/d/optout.


Grouping entries together in a query, need some help with aggregations

2014-04-02 Thread Vincent Massol
Hi guys,

I'd like to count all entries in my ES instance that have a timestamp from 
the *last day*, *grouping together all entries that have the same "instanceId"*. 
With the data below, the count result should be 1 (and not 2), since 2 
entries are within the last day but they have the same instanceId of "def".

I tried the following:

curl -XPOST 
"http://localhost:9200/installs/install/_search?pretty=1&fields=_source,_timestamp";
 
-d'
{
"aggs": {
"lastday" : {
"filter" : { 
"range" : { 
"_timestamp" : {
"gt" : "now-1d"
}
}
},
"aggs" : {
"instanceids" : {
"terms" : { "field" : "instanceId" }
}
}
}
}
}'

But I have 3 problems with this:
* It's not a count but a search. "aggs" don't seem to work with _count
* It returns all entries in the result before the aggs data
* In the aggs I don't get a direct count value and I have to count the 
number of buckets to get my answer

I'm pretty sure there's a simpler way but I'm having a hard time figuring 
it out. Also could this query be expressed fully in the Query DSL?
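
(One possible shape, assuming Elasticsearch 1.1+ where the cardinality aggregation is available, and using search_type=count so no hits are returned, might be:)

curl -XPOST "http://localhost:9200/installs/install/_search?search_type=count&pretty=1" -d'
{
  "aggs": {
    "lastday": {
      "filter": {
        "range": { "_timestamp": { "gt": "now-1d" } }
      },
      "aggs": {
        "distinct_instances": {
          "cardinality": { "field": "instanceId" }
        }
      }
    }
  }
}'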

Data:
=

curl -XDELETE "http://localhost:9200/installs";

curl -XPUT "http://localhost:9200/installs";

curl -XPUT "http://localhost:9200/installs/install/_mapping"; -d'
{
  "install" : {
"_timestamp" : { 
  "enabled" : true,
  "store" : true
},
"properties" : {
  "formatVersion" : { "type" : "string", "index" : "not_analyzed" },
  "instanceId" : { "type" : "string", "index" : "not_analyzed" },
  "distributionId" : { "type" : "string", "index" : "not_analyzed" },
  "distributionVersion" : { "type" : "string", "index" : "not_analyzed" 
}
}
  }
}'

curl -XPOST "http://localhost:9200/installs/install?timestamp=2014-03-20"; 
-d'
{
  "formatVersion" : "2.0",
  "instanceId" : "abc",
  "distributionId" : "org.xwiki.enterprise:xwiki-enterprise-web",
  "distributionVersion" : "6.0-milestone-1"
}'

curl -XPOST "http://localhost:9200/installs/install"; -d'
{
  "formatVersion" : "2.0",
  "instanceId" : "def",
  "distributionId" : "org.xwiki.enterprise:xwiki-enterprise-web",
  "distributionVersion" : "5.4.3"
}'

curl -XPOST "http://localhost:9200/installs/install"; -d'
{
  "formatVersion" : "2.0",
  "instanceId" : "def",
  "distributionId" : "org.xwiki.enterprise:xwiki-enterprise-web",
  "distributionVersion" : "5.4.3"
}'

Thanks a lot for any help or pointers.
-Vincent

-- 
You received this message because you are subscribed to the Google Groups 
"elasticsearch" group.
To unsubscribe from this group and stop receiving emails from it, send an email 
to elasticsearch+unsubscr...@googlegroups.com.
To view this discussion on the web visit 
https://groups.google.com/d/msgid/elasticsearch/162805ff-5fa8-4a9a-9c77-a13922c09486%40googlegroups.com.
For more options, visit https://groups.google.com/d/optout.


Re: How cause ElasticSearch to not throw DocumentAlreadyExistsException?

2014-04-02 Thread Igor Romanov
Eventually I solved the issue in a pretty ugly way - I added a new "add" 
command to Elasticsearch that does what "create" does but doesn't throw an 
exception... The bad thing about it is that for each new Elasticsearch version I 
want to use, I will need to merge those changes :/
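
For anyone hitting the same thing: with the REST bulk API and the "create" op type, documents that already exist come back as per-item 409 errors in the response body rather than failing the whole request, so they can simply be skipped when reading the response. A rough sketch, with placeholder index/type names:

curl -XPOST "http://localhost:9200/myindex/mytype/_bulk" -d'
{ "create": { "_id": "1" } }
{ "field1": "value1" }
{ "create": { "_id": "2" } }
{ "field1": "value2" }
'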

On Monday, March 31, 2014 3:47:04 PM UTC+3, Igor Romanov wrote:
>
> In case of "Create" I wanted to ignore response full of errors (for 
> example errors on Creating 5000 docs in a batch...)
> Also I want to save time by not checking the document...it is additional 
> query to ES, when actually ES checks same when it try to create...so better 
> it be in one batch upload...
> I guess I am missing functionality in ES to CreateOnlyNew , so it will not 
> throw exceptions on existing documents ... :/
>
> Maybe there is a way to configure in ElasticSearch to not throw certain 
> exception to client?
>
> Igor
>
> On Monday, March 31, 2014 7:33:06 AM UTC+3, kidkid wrote:
>>
>> Hi,
>> If you use "Create" Operation, so when your doc exists, you need to catch 
>> & leave this exception.
>> To avoid it, I think you could check your document, if it's not exist 
>> then you do index.
>>
>>
>> On Sunday, March 30, 2014 11:48:34 PM UTC+7, Igor Romanov wrote:
>>>
>>> Hi all,
>>>
>>> I would like that elastic search will index only new documents, but for 
>>> existing one will not throw any exception and quietly continue
>>>
>>> Is that possible?
>>>
>>> I tried to index using "create" operation , but then I get 
>>> "DocumentAlreadyExistsException" - I would like not to get it at all...so 
>>> elastic search will do nothing in case that document already exists
>>>
>>> Also I tried to set "version_type" : external and each document set same 
>>> version, but then I get exception "VersionConflictEngineException"
>>>
>>> So question how I configure ElasticSearch or its index to just index new 
>>> docs and ignore existing once...
>>>
>>> (I need that because I run some hadoop jobs, and don't want that elastic 
>>> search will reindex same doc data after each job execution...I want to save 
>>> queries to ES to check if doc already exists...)
>>>
>>> Thanks
>>> Igor
>>>
>>

-- 
You received this message because you are subscribed to the Google Groups 
"elasticsearch" group.
To unsubscribe from this group and stop receiving emails from it, send an email 
to elasticsearch+unsubscr...@googlegroups.com.
To view this discussion on the web visit 
https://groups.google.com/d/msgid/elasticsearch/0c3e07fe-04db-4311-8e5d-e586174c46e3%40googlegroups.com.
For more options, visit https://groups.google.com/d/optout.