date:20140806

Maybe you are doing a lot of updates (reusing the same IDs)?
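
To illustrate the point above: Elasticsearch segments are append-only, so indexing a document under an ID that already exists writes a new copy and marks the old one as deleted until a segment merge reclaims it. The sketch below is a toy Python model of that bookkeeping, not Elasticsearch code; the `ToyIndex` class and its methods are hypothetical names used only for illustration.

```python
class ToyIndex:
    """Toy model of an append-only index (hypothetical, for illustration only)."""

    def __init__(self):
        self.live = {}         # document ID -> current version of the document
        self.docs_deleted = 0  # superseded copies that a merge has not yet purged

    def index(self, doc_id, doc):
        # Re-indexing an existing ID replaces the live copy and leaves the old
        # copy behind as a deleted document.
        if doc_id in self.live:
            self.docs_deleted += 1
        self.live[doc_id] = doc

    def merge(self):
        # A segment merge physically drops the deleted copies.
        self.docs_deleted = 0

    @property
    def docs_count(self):
        return len(self.live)


idx = ToyIndex()
for run in range(3):                 # ingest the same three CSV rows three times
    for row_id in ("a", "b", "c"):
        idx.index(row_id, {"run": run})

print(idx.docs_count, idx.docs_deleted)  # 3 6
```

If the ingest script derives its IDs from a CSV column that repeats, or is rerun over rows it has already loaded, the counters would behave like the `_cat/indices` output quoted in this thread: a flat `docs.count` and a growing `docs.deleted`.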

--
David ;-)
Twitter : @dadoonet / @elasticsearchfr / @scrutmydocs


On 7 August 2014 at 05:52, vjbangis wrote:

Hello guys,

Could you help me understand why "docs.count" below is not increasing? It's stuck at 
2307764, while "docs.deleted" keeps increasing.

I'm just running a PHP script to ingest the CSV source data into ES.


[login@machine elasticsearch]$ curl 'localhost:9200/_cat/indices?v'
health index      pri rep docs.count docs.deleted store.size pri.store.size
green  kibana-int   5   1          1            0     25.6kb         12.8kb
green  basic-info   5   1    2307764       412834      1.2gb          638mb

[login@machine elasticsearch]$ curl 'localhost:9200/_cat/indices?v'
health index      pri rep docs.count docs.deleted store.size pri.store.size
green  kibana-int   5   1          1            0     25.6kb         12.8kb
green  basic-info   5   1    2307764       421090      1.2gb        637.8mb

[login@machine elasticsearch]$ curl 'localhost:9200/_cat/indices?v'
health index      pri rep docs.count docs.deleted store.size pri.store.size
green  kibana-int   5   1          1            0     25.6kb         12.8kb
green  basic-info   5   1    2307764       434017      1.2gb        638.5mb

[login@machine elasticsearch]$ curl 'localhost:9200/_cat/indices?v'
health index      pri rep docs.count docs.deleted store.size pri.store.size
green  kibana-int   5   1          1            0     25.6kb         12.8kb
green  basic-info   5   1    2307764       437557      1.2gb        632.6mb

[login@machine elasticsearch]$ curl 'localhost:9200/_cat/indices?v'
health index      pri rep docs.count docs.deleted store.size pri.store.size
green  kibana-int   5   1          1            0     25.6kb         12.8kb
green  basic-info   5   1    2307764       451098      1.2gb        642.4mb

[login@machine elasticsearch]$ curl 'localhost:9200/_cat/indices?v'
health index      pri rep docs.count docs.deleted store.size pri.store.size
green  kibana-int   5   1          1            0     25.6kb         12.8kb
green  basic-info   5   1    2307764       514957      1.3gb        670.1mb

[login@machine elasticsearch]$ curl 'localhost:9200/_cat/indices?v'
health index      pri rep docs.count docs.deleted store.size pri.store.size
green  kibana-int   5   1          1            0     25.6kb         12.8kb
green  basic-info   5   1    2307764       882639      1.4gb        734.7mb



It's running on 4 c3.large nodes with ES 1.2.3. Is there anything I need to add to 
the elasticsearch.yml below?


cluster.name: clustername
node.name: "machine00"
discovery.zen.ping.multicast.enabled: false
discovery.zen.ping.unicast.hosts: ["machine01","machine02","machine03"]
discovery.zen.minimum_master_nodes: 3


-- 
You received this message because you are subscribed to the Google Groups 
"elasticsearch" group.
To unsubscribe from this group and stop receiving emails from it, send an email 
to elasticsearch+unsubscr...@googlegroups.com.
To view this discussion on the web visit 
https://groups.google.com/d/msgid/elasticsearch/9b056741-69f8-4288-9b4a-e77755c66249%40googlegroups.com.
For more options, visit https://groups.google.com/d/optout.


Re: Parse Failure [Expected [START_OBJECT] under [filter], but got a [START_ARRAY

This is incorrect:

  "x_book": {
    "filter": [
      {
        "term": {
          "book.raw": "X"
        }
      }
    ],

It should be:

  "x_book": {
    "filter": {
      "term": {
        "book.raw": "X"
      }
    },


So it's not an Elasticsearch issue; it's an error in your code.
That's why I closed your "issue".
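
A minimal sketch of the same fix applied programmatically, assuming the request body is built as a Python dict before it is sent. `unwrap_filter_arrays` is a hypothetical helper, not part of any Elasticsearch client. It replaces a one-element list under `filter` with the object it contains, which is the shape the parser error asks for.

```python
import json


def unwrap_filter_arrays(aggs):
    """Return a copy of `aggs` in which each one-element "filter" list is
    replaced by the single filter object it contains (hypothetical helper)."""
    fixed = {}
    for name, body in aggs.items():
        body = dict(body)  # shallow copy so the caller's dict is left untouched
        flt = body.get("filter")
        if isinstance(flt, list):
            if len(flt) != 1:
                # Several filters would need an explicit combining filter;
                # this sketch does not guess which one is intended.
                raise ValueError("%s: cannot unwrap %d filters" % (name, len(flt)))
            body["filter"] = flt[0]
        if "aggs" in body:
            body["aggs"] = unwrap_filter_arrays(body["aggs"])  # sub-aggregations
        fixed[name] = body
    return fixed


broken = {
    "x_book": {
        "filter": [{"term": {"book.raw": "X"}}],
        "aggs": {"pnl_x_sum": {"sum": {"field": "daily_profit_aud"}}},
    }
}
fixed = unwrap_filter_arrays(broken)
print(json.dumps(fixed["x_book"]["filter"]))  # {"term": {"book.raw": "X"}}
```

Correcting the code that builds the query is still the better fix; a helper like this is only useful for checking what the corrected body should look like.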


--
David ;-)
Twitter : @dadoonet / @elasticsearchfr / @scrutmydocs


On 7 August 2014 at 05:37, Vincent Gross wrote:

I tried to upgrade ES yesterday (from 1.1.1 to 1.3.1) and I have an issue 
when I try to use a complex query with an aggregation.

Parse Failure [Expected [START_OBJECT] under [filter], but got a [START_ARRAY

I've had this bug since version 1.2.0 (I've tried all the versions after it 
and they all have this bug).

My system is very complex and this bug comes from a filter on an aggregation. I 
don't have enough time to share my mapping and everything needed to reproduce the 
bug, sorry about that. For my application I had to downgrade to 1.1.2 
to get my project working well.

I opened an issue on GitHub and it was closed; I don't understand why. I 
still have my issue...

Here is my query (in my application I use the official PHP client for 
Elasticsearch):

{
  "index": "dailysnapshot_201406",
  "type": "dailysnapshot",
  "body": {
    "size": 0,
    "query": {
      "filtered": {
        "filter": {
          "bool": {
            "must": [
              { "term": { "is_mam": false } },
              { "term": { "datetime": "20140619" } }
            ],
            "_cache": true
          }
        }
      }
    },
    "aggs": {
      "x_book": {
        "filter": [
          { "term": { "book.raw": "X" } }
        ],
        "aggs": {
          "unrealised_x_sum": { "sum": { "field": "unrealised_change_aud" } },
          "closed_x_sum": { "sum": { "field": "closed_profit_aud" } },
          "ib_x_sum": { "sum": { "field": "agent_commission_aud" } },
          "pnl_x_sum": { "sum": { "field": "daily_profit_aud" } }
        }
      },
      "s_book": {
        "filter": [
          { "term": { "book.raw": "S" } }
        ],
        "aggs": {
          "unrealised_s_sum": { "sum": { "field": "unrealised_change_aud" } },
          "closed_s_sum": { "sum": { "field": "closed_profit_aud" } },
          "ib_s_sum": { "sum": { "field": "agent_commission_aud" } },
          "pnl_s_sum": { "sum": { "field": "daily_profit_aud" } }
        }
      }
    }
  }
}





Re: Can aggregation use a prepared result of real-time Map Reduce task?

2014-08-06 Thread Tong Liu

I would still like to understand the basic theory behind that efficient manner.

On Thursday, August 7, 2014 12:00:31 PM UTC+8, Tong Liu wrote:
>
> (I moved this topic here from a GitHub issue.)
>
> I want to understand the theory behind ES aggregations.
> Maybe it works in one of these ways:
> (1) Like a database: the aggregation is computed when the aggregation query 
> arrives.
> (2) Like Storm: each incoming document is aggregated once as it arrives, so 
> nothing needs to be aggregated at query time. The aggregation query uses a 
> prepared result, so it is very quick. Storm is like a real-time Hadoop.
>
> So is ES like (1) a common database, or (2) Storm?
> Thank you!
>
>
> (
>
> This is imotov's answer:
>
> In short, the answer to your question is that aggregations are computed when 
> an aggregation query comes in, but because of the data structures that 
> Elasticsearch uses, they can be computed in a very efficient manner.
>
> )
>
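
As a rough illustration of "computed at query time, but efficiently": the sketch below assumes a column-oriented layout in which each field's values sit in an array indexed by document number, so an aggregation only has to walk the matching document numbers and read the columns it needs. This is a simplified Python model, not Elasticsearch's actual implementation; the field names and values are made up.

```python
# Column-oriented storage: position i holds the value of the field for document i.
book_column = ["X", "S", "X", "X", "S"]
profit_column = [10.0, -2.5, 4.0, 1.5, 3.0]


def sum_by_book(matching_doc_ids):
    """Compute a per-book sum at query time over the documents that matched."""
    totals = {}
    for doc_id in matching_doc_ids:
        key = book_column[doc_id]  # direct array lookup, no document parsing
        totals[key] = totals.get(key, 0.0) + profit_column[doc_id]
    return totals


print(sum_by_book(range(5)))  # {'X': 15.5, 'S': 0.5}
print(sum_by_book([0, 1]))    # {'X': 10.0, 'S': -2.5}
```

Nothing is precomputed per bucket here, which corresponds to option (1) in the question: the work happens when the query runs, and a different filter simply changes which document numbers are visited.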


Re: Can aggregation use a prepared result of real-time Map Reduce task?

2014-08-06 Thread Tong Liu

I would still like to understand the basic theory behind that efficient manner.
Thank you very much!

On Thursday, August 7, 2014 at 12:00:31 PM UTC+8, Tong Liu wrote:
>
> (I moved this topic here from a GitHub issue.)
>
> I want to understand the theory behind ES aggregations.
> Maybe it works in one of these ways:
> (1) Like a database: the aggregation is computed when the aggregation query 
> arrives.
> (2) Like Storm: each incoming document is aggregated once as it arrives, so 
> nothing needs to be aggregated at query time. The aggregation query uses a 
> prepared result, so it is very quick. Storm is like a real-time Hadoop.
>
> So is ES like (1) a common database, or (2) Storm?
> Thank you!
>
>
> (
>
> This is imotov's answer:
>
> In short, the answer to your question is that aggregations are computed when 
> an aggregation query comes in, but because of the data structures that 
> Elasticsearch uses, they can be computed in a very efficient manner.
>
> )
>


Can aggregation use a prepared result of real-time Map Reduce task?

2014-08-06 Thread Tong Liu



(I moved this topic here from a GitHub issue.)

I want to understand the theory behind ES aggregations.
Maybe it works in one of these ways:
(1) Like a database: the aggregation is computed when the aggregation query 
arrives.
(2) Like Storm: each incoming document is aggregated once as it arrives, so 
nothing needs to be aggregated at query time. The aggregation query uses a 
prepared result, so it is very quick. Storm is like a real-time Hadoop.

So is ES like (1) a common database, or (2) Storm?
Thank you!


(

This is imotov's answer:

In short, the answer to your question is that aggregations are computed when 
an aggregation query comes in, but because of the data structures that 
Elasticsearch uses, they can be computed in a very efficient manner.

)


doc.deleted keeps increasing in indices

2014-08-06 Thread vjbangis

Hello guys,

Could you help me understand why "docs.count" below is not increasing? It's stuck at 
2307764, while "docs.deleted" keeps increasing.

I'm just running a PHP script to ingest the CSV source data into ES.


[login@machine elasticsearch]$ curl 'localhost:9200/_cat/indices?v'
health index      pri rep docs.count docs.deleted store.size pri.store.size
green  kibana-int   5   1          1            0     25.6kb         12.8kb
green  basic-info   5   1    2307764       412834      1.2gb          638mb

[login@machine elasticsearch]$ curl 'localhost:9200/_cat/indices?v'
health index      pri rep docs.count docs.deleted store.size pri.store.size
green  kibana-int   5   1          1            0     25.6kb         12.8kb
green  basic-info   5   1    2307764       421090      1.2gb        637.8mb

[login@machine elasticsearch]$ curl 'localhost:9200/_cat/indices?v'
health index      pri rep docs.count docs.deleted store.size pri.store.size
green  kibana-int   5   1          1            0     25.6kb         12.8kb
green  basic-info   5   1    2307764       434017      1.2gb        638.5mb

[login@machine elasticsearch]$ curl 'localhost:9200/_cat/indices?v'
health index      pri rep docs.count docs.deleted store.size pri.store.size
green  kibana-int   5   1          1            0     25.6kb         12.8kb
green  basic-info   5   1    2307764       437557      1.2gb        632.6mb

[login@machine elasticsearch]$ curl 'localhost:9200/_cat/indices?v'
health index      pri rep docs.count docs.deleted store.size pri.store.size
green  kibana-int   5   1          1            0     25.6kb         12.8kb
green  basic-info   5   1    2307764       451098      1.2gb        642.4mb

[login@machine elasticsearch]$ curl 'localhost:9200/_cat/indices?v'
health index      pri rep docs.count docs.deleted store.size pri.store.size
green  kibana-int   5   1          1            0     25.6kb         12.8kb
green  basic-info   5   1    2307764       514957      1.3gb        670.1mb

[login@machine elasticsearch]$ curl 'localhost:9200/_cat/indices?v'
health index      pri rep docs.count docs.deleted store.size pri.store.size
green  kibana-int   5   1          1            0     25.6kb         12.8kb
green  basic-info   5   1    2307764       882639      1.4gb        734.7mb


It's running on 4 c3.large nodes with ES 1.2.3. Is there anything I need to add 
to the elasticsearch.yml below?


cluster.name: clustername
node.name: "machine00"
discovery.zen.ping.multicast.enabled: false
discovery.zen.ping.unicast.hosts: ["machine01","machine02","machine03"]
discovery.zen.minimum_master_nodes: 3


Parse Failure [Expected [START_OBJECT] under [filter], but got a [START_ARRAY

2014-08-06 Thread Vincent Gross



I tried to upgrade ES yesterday (from 1.1.1 to 1.3.1) and I have an issue 
when I try to use a complex query with an aggregation.

Parse Failure [Expected [START_OBJECT] under [filter], but got a [START_ARRAY

I've had this bug since version 1.2.0 (I've tried all the versions after it 
and they all have this bug).

My system is very complex and this bug comes from a filter on an 
aggregation. I don't have enough time to share my mapping and everything 
needed to reproduce the bug, sorry about that. For my application I had to 
downgrade to 1.1.2 to get my project working well.

I opened an issue on GitHub and it was closed; I don't understand why. I 
still have my issue...

Here is my query (in my application I use the official PHP client for 
Elasticsearch):


{
  "index": "dailysnapshot_201406",
  "type": "dailysnapshot",
  "body": {
    "size": 0,
    "query": {
      "filtered": {
        "filter": {
          "bool": {
            "must": [
              { "term": { "is_mam": false } },
              { "term": { "datetime": "20140619" } }
            ],
            "_cache": true
          }
        }
      }
    },
    "aggs": {
      "x_book": {
        "filter": [
          { "term": { "book.raw": "X" } }
        ],
        "aggs": {
          "unrealised_x_sum": { "sum": { "field": "unrealised_change_aud" } },
          "closed_x_sum": { "sum": { "field": "closed_profit_aud" } },
          "ib_x_sum": { "sum": { "field": "agent_commission_aud" } },
          "pnl_x_sum": { "sum": { "field": "daily_profit_aud" } }
        }
      },
      "s_book": {
        "filter": [
          { "term": { "book.raw": "S" } }
        ],
        "aggs": {
          "unrealised_s_sum": { "sum": { "field": "unrealised_change_aud" } },
          "closed_s_sum": { "sum": { "field": "closed_profit_aud" } },
          "ib_s_sum": { "sum": { "field": "agent_commission_aud" } },
          "pnl_s_sum": { "sum": { "field": "daily_profit_aud" } }
        }
      }
    }
  }
}




Re: Elasticsearch still scan all types in a index even if I specify a type

2014-08-06 Thread panfei

Thanks for the information


2014-08-01 0:55 GMT+08:00 Ivan Brusic :

> All types eventually belong to the same Lucene index and Lucene cannot
> handle different types for the same field name. Avoid using the same name
> across types if the field type is different.
>
>
> http://www.elasticsearch.org/guide/en/elasticsearch/guide/current/mapping.html#_avoiding_type_gotchas
>
> --
> Ivan
>
>
> On Wed, Jul 30, 2014 at 8:58 PM, panfei  wrote:
>
>> First, put some sample data:
>>
>> curl -XPUT 'localhost:9200/testindex/action1/1?pretty' -d '
>> {
>> "title": "jumping tom",
>> "val": 101
>> }'
>>
>> curl -XPUT 'localhost:9200/testindex/action2/1?pretty' -d '
>> {
>> "title": "jumping jerry",
>> "val": "test"
>> }'
>>
>> As you can see below, the mapping is:
>>
>> {
>>   "action1" : {
>>     "properties" : {
>>       "val" : {
>>         "type" : "long"
>>       },
>>       "title" : {
>>         "type" : "string"
>>       }
>>     }
>>   },
>>   "action2" : {
>>     "properties" : {
>>       "val" : {
>>         "type" : "string"
>>       },
>>       "title" : {
>>         "type" : "string"
>>       }
>>     }
>>   }
>> }
>>
>> But when I run an aggregation:
>>
>> curl 'http://192.168.2.245:9200/testindex/action1/_search' -d '
>> {
>>   "aggs": {
>>     "vals": {
>>       "terms": {
>>         "field": "val"
>>       }
>>     }
>>   }
>> }'
>>
>>
>> {
>>   "took" : 37,
>>   "timed_out" : false,
>>   "_shards" : {
>>     "total" : 5,
>>     "successful" : 4,
>>     "failed" : 1,
>>     "failures" : [
>>       {
>>         "index" : "testindex",
>>         "shard" : 2,
>>         "status" : 500,
>>         "reason" : "RemoteTransportException[[a00][inet[/192.168.2.246:9300]][search/phase/query]]; nested: ElasticsearchException[java.lang.NumberFormatException: Invalid shift value (84) in prefixCoded bytes (is encoded value really an INT?)]; nested: UncheckedExecutionException[java.lang.NumberFormatException: Invalid shift value (84) in prefixCoded bytes (is encoded value really an INT?)]; nested: NumberFormatException[Invalid shift value (84) in prefixCoded bytes (is encoded value really an INT?)]; "
>>       }
>>     ]
>>   },
>>   "hits" : {
>>     "total" : 0,
>>     "max_score" : null,
>>     "hits" : [ ]
>>   },
>>   "aggregations" : {
>>     "vals" : {
>>       "buckets" : [ ]
>>     }
>>   }
>> }
>>
>> The val field in the action1 type is mapped to long, but it seems that ES
>> still scans the action2 type even if I specify the action1 type.
>>
>> Any advice on how to resolve this issue? Thanks.
>>  --
>> If you don't study, you don't know
>>
>
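
The constraint Ivan describes can be checked before any data is indexed. The sketch below is a hypothetical Python helper (`find_type_conflicts` is not an Elasticsearch API) that takes a mapping in the shape shown in this thread and reports field names that are mapped to different field types across the types of one index.

```python
def find_type_conflicts(mappings):
    """Map each conflicting field name to {type_name: field_type}.

    `mappings` has the shape {type_name: {"properties": {field: {"type": ...}}}}.
    """
    seen = {}
    for type_name, mapping in mappings.items():
        for field, spec in mapping.get("properties", {}).items():
            seen.setdefault(field, {})[type_name] = spec.get("type")
    # A field conflicts when more than one distinct field type was recorded for it.
    return {f: by_type for f, by_type in seen.items()
            if len(set(by_type.values())) > 1}


mappings = {
    "action1": {"properties": {"val": {"type": "long"}, "title": {"type": "string"}}},
    "action2": {"properties": {"val": {"type": "string"}, "title": {"type": "string"}}},
}
print(find_type_conflicts(mappings))
# {'val': {'action1': 'long', 'action2': 'string'}}
```

For the example in this thread, renaming one of the two `val` fields (or giving them the same field type) would leave the helper with nothing to report.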



-- 
If you don't study, you don't know


Re: System Requirements for ElasticSearch stack

2014-08-06 Thread 熊贻青

I have found quite a few similar emails about capacity planning. Although
it makes sense that there are a lot of variables and factors, it would be great
for new users to have some sort of baseline. It could be simple: just a
single type of index and a load that is not too heavy. Maybe there are already
blogs or articles covering this topic, but it would be worth a pointer in the
official documentation.

My 2c


Re: Search result only with unique value of the specific field

I'll definitely upgrade.
Thanks 

On Thursday, August 7, 2014 1:07:01 AM UTC+3, Ivan Brusic wrote:
>
> Sorry, I meant to specify the version, but I forgot. If you do upgrade, 
> here is another explanation of top hits: 
> http://www.elasticsearch.org/guide/en/elasticsearch/guide/current/top-hits.html
>
> -- 
> Ivan
>
>
> On Wed, Aug 6, 2014 at 2:59 PM, David Pilato wrote:
>
>> This has been added in 1.3.0: 
>> https://github.com/elasticsearch/elasticsearch/pull/6124
>>
>>  -- 
>> *David Pilato* | *Technical Advocate* | *Elasticsearch.com*
>> @dadoonet | @elasticsearchfr
>>
>>
>> On 6 August 2014 at 23:49:25, slavag (sla...@gmail.com) wrote:
>>
>> Hi, thanks for the reply.
>> I'm trying to define a top hits aggregation but I'm getting this error: "Parse 
>> Failure [Could not find aggregator type [top_hits] in [single_result]]]; }]" 
>> This is my aggregation definition. The first bucket is grouped by id, the 
>> nested bucket is grouped by date, and I want to get only one document 
>> from each nested bucket.
>>
>> "aggs" : {
>>   "id" : {
>>     "terms" : { "field" : "id" },
>>     "aggs" : {
>>       "bckdate" : {
>>         "terms" : { "field" : "date" },
>>         "aggs" : {
>>           "single_result" : {
>>             "top_hits" : {
>>               "sort" : [ { "id" : { "order" : "desc" } } ],
>>               "_source" : { "include" : [ "*" ] },
>>               "size" : 1
>>             }
>>           }
>>         }
>>       }
>>     }
>>   }
>> }
>>
>> What could be the issue with my aggregation? I'm using ES 1.2.1.
>>
>> Thanks
>>
>> On Wednesday, August 6, 2014 10:06:40 PM UTC+3, Ivan Brusic wrote: 
>>>
>>> Perhaps the top hits aggregation can help:
>>> http://www.elasticsearch.org/guide/en/elasticsearch/reference/current/search-aggregations-metrics-top-hits-aggregation.html
>>>
>>> -- 
>>> Ivan
>>>  
>>>
>>> On Wed, Aug 6, 2014 at 11:21 AM, slavag  wrote:
>>>
>>>> Hi,
>>>> I need some advice.
>>>> I have indexed documents, and each document has an internal id that is 
>>>> indexed as just another field; this id is not used as the indexed 
>>>> document id (_id).
>>>> The same document may be indexed more than once (each indexed instance 
>>>> will have a different Elasticsearch _id). When I search, the results 
>>>> include all of those documents (including multiple instances of the same 
>>>> source document). Is there any way to get distinct results, that is, 
>>>> search results containing only unique documents, based on some field 
>>>> from the indexed document?
>>>>
>>>> Thanks

Re: Search result only with unique value of the specific field

Sorry, I meant to specify the version, but I forgot. If you do upgrade,
here is another explanation of top hits:
http://www.elasticsearch.org/guide/en/elasticsearch/guide/current/top-hits.html

-- 
Ivan


On Wed, Aug 6, 2014 at 2:59 PM, David Pilato  wrote:

> This has been added in 1.3.0:
> https://github.com/elasticsearch/elasticsearch/pull/6124
>
> --
> *David Pilato* | *Technical Advocate* | *Elasticsearch.com*
> @dadoonet  | @elasticsearchfr
> 
>
>
> On 6 August 2014 at 23:49:25, slavag (slav...@gmail.com) wrote:
>
> Hi, thanks for the reply.
> I'm trying to define a top hits aggregation but I'm getting this error: "Parse
> Failure [Could not find aggregator type [top_hits] in [single_result]]]; }]"
> This is my aggregation definition. The first bucket is grouped by id, the
> nested bucket is grouped by date, and I want to get only one document
> from each nested bucket.
>
> "aggs" : {
>   "id" : {
>     "terms" : { "field" : "id" },
>     "aggs" : {
>       "bckdate" : {
>         "terms" : { "field" : "date" },
>         "aggs" : {
>           "single_result" : {
>             "top_hits" : {
>               "sort" : [ { "id" : { "order" : "desc" } } ],
>               "_source" : { "include" : [ "*" ] },
>               "size" : 1
>             }
>           }
>         }
>       }
>     }
>   }
> }
>
> What could be the issue with my aggregation? I'm using ES 1.2.1.
>
> Thanks
>
> On Wednesday, August 6, 2014 10:06:40 PM UTC+3, Ivan Brusic wrote:
>>
>> Perhaps the top hits aggregation can help:
>> http://www.elasticsearch.org/guide/en/elasticsearch/reference/current/search-aggregations-metrics-top-hits-aggregation.html
>>
>> --
>> Ivan
>>
>>
>> On Wed, Aug 6, 2014 at 11:21 AM, slavag  wrote:
>>
>>> Hi,
>>> I need some advice.
>>> I have indexed documents, and each document has an internal id that is
>>> indexed as just another field; this id is not used as the indexed
>>> document id (_id).
>>> The same document may be indexed more than once (each indexed instance
>>> will have a different Elasticsearch _id). When I search, the results
>>> include all of those documents (including multiple instances of the same
>>> source document). Is there any way to get distinct results, that is,
>>> search results containing only unique documents, based on some field
>>> from the indexed document?
>>>
>>> Thanks

Re: Search result only with unique value of the specific field

Oops, my bad, sorry.
The top_hits explanation page 
(http://www.elasticsearch.org/guide/en/elasticsearch/reference/current/search-aggregations-metrics-top-hits-aggregation.html) 
mentions top_docs, but I can't find any other reference to that 
aggregator. How can I use it?

Thanks.

P.S. To include all fields in the _source, is * enough or should I use _all?

Thanks.


On Thursday, August 7, 2014 12:59:39 AM UTC+3, David Pilato wrote:
>
> This has been added in 1.3.0: 
> https://github.com/elasticsearch/elasticsearch/pull/6124
>
> -- 
> *David Pilato* | *Technical Advocate* | *Elasticsearch.com*
> @dadoonet  | @elasticsearchfr 
> 
>
>
> On 6 August 2014 at 23:49:25, slavag (sla...@gmail.com) wrote:
>
> Hi, thanks for the reply.
> I'm trying to define a top hits aggregation but I'm getting this error: "Parse 
> Failure [Could not find aggregator type [top_hits] in [single_result]]]; }]" 
> This is my aggregation definition. The first bucket is grouped by id, the 
> nested bucket is grouped by date, and I want to get only one document 
> from each nested bucket.
>
> "aggs" : {
>   "id" : {
>     "terms" : { "field" : "id" },
>     "aggs" : {
>       "bckdate" : {
>         "terms" : { "field" : "date" },
>         "aggs" : {
>           "single_result" : {
>             "top_hits" : {
>               "sort" : [ { "id" : { "order" : "desc" } } ],
>               "_source" : { "include" : [ "*" ] },
>               "size" : 1
>             }
>           }
>         }
>       }
>     }
>   }
> }
>
> What could be the issue with my aggregation? I'm using ES 1.2.1.
>
> Thanks
>
> On Wednesday, August 6, 2014 10:06:40 PM UTC+3, Ivan Brusic wrote: 
>>
>> Perhaps the top hits aggregation can help: 
>> http://www.elasticsearch.org/guide/en/elasticsearch/reference/current/search-aggregations-metrics-top-hits-aggregation.html
>>  
>>
>> -- 
>> Ivan
>>  
>>
>> On Wed, Aug 6, 2014 at 11:21 AM, slavag  wrote:
>>
>>> Hi,
>>> I need some advice.
>>> I have indexed documents, and each document has an internal id that is 
>>> indexed as just another field; this id is not used as the indexed 
>>> document id (_id).
>>> The same document may be indexed more than once (each indexed instance 
>>> will have a different Elasticsearch _id). When I search, the results 
>>> include all of those documents (including multiple instances of the same 
>>> source document). Is there any way to get distinct results, that is, 
>>> search results containing only unique documents, based on some field 
>>> from the indexed document?
>>>
>>> Thanks
>>>  --
>>> You received this message because you are subscribed to the Google 
>>> Groups "elasticsearch" group.
>>> To unsubscribe from this group and stop receiving emails from it, send 
>>> an email to elasticsearc...@googlegroups.com.
>>> To view this discussion on the web visit 
>>> https://groups.google.com/d/msgid/elasticsearch/afc78f11-6050-4471-baec-7e1d2faddb0b%40googlegroups.com
>>>  
>>> 
>>> .
>>> For more options, visit https://groups.google.com/d/optout.
>>>
>>  

-- 
You received this message because you are subscribed to the Google Groups 
"elasticsearch" group.
To unsubscribe from this group and stop receiving emails from it, send an email 
to elasticsearch+unsubscr...@googlegroups.com.
To view this discussion on the web visit 
https://groups.google.com/d/msgid/elasticsearch/cde9a830-5a72-47a2-abd2-114603199578%40googlegroups.com.
For more options, visit https://groups.google.com/d/optout.

Re: Search result only with unique value of the specific field

The top_hits aggregation was added in 1.3.0, so it is not available in 1.2.1: 
https://github.com/elasticsearch/elasticsearch/pull/6124
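
For illustration, here is a minimal Python sketch of a request body that keeps one hit per distinct value of a field, plus a check that the cluster version is at least 1.3.0. It only builds JSON and does not talk to a cluster. The helper names (supports_top_hits, distinct_by_field), the aggregation name prefix "by_" and the default bucket count are made up for this example; "id" is the field from the question.

```python
import json


def supports_top_hits(version):
    """Return True for versions >= 1.3.0, where top_hits became available."""
    major, minor = (int(part) for part in version.split(".")[:2])
    return (major, minor) >= (1, 3)


def distinct_by_field(field, buckets=10):
    """Build a search body that keeps one hit per distinct value of `field`."""
    return {
        "size": 0,  # skip the regular hits, only the aggregation is of interest
        "aggs": {
            "by_" + field: {
                # one bucket per distinct value (hypothetical bucket limit)
                "terms": {"field": field, "size": buckets},
                # keep a single document from each bucket
                "aggs": {"single_result": {"top_hits": {"size": 1}}},
            }
        },
    }


if __name__ == "__main__":
    for version in ("1.2.1", "1.3.0"):
        print(version, supports_top_hits(version))
    print(json.dumps(distinct_by_field("id"), indent=2))
```

Note that the terms aggregation only returns a limited number of buckets, so this is a sketch for the most frequent values rather than a full "distinct" over a large index.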

-- 
David Pilato | Technical Advocate | Elasticsearch.com
@dadoonet | @elasticsearchfr


On 6 August 2014 at 23:49:25, slavag (slav...@gmail.com) wrote:

Hi, thanks for the reply.
I'm trying to define a top hits aggregation but I'm getting this error: "Parse Failure 
[Could not find aggregator type [top_hits] in [single_result]]]; }]"
This is my aggregation definition: the first bucket is grouped by id, the nested 
bucket is grouped by date, and I want to get only one document from each 
nested bucket.

"aggs" : {
  "id" : {
    "terms" : {
      "field" : "id"
    },
    "aggs" : {
      "bckdate" : {
        "terms" : {
          "field" : "date"
        },
        "aggs" : {
          "single_result" : {
            "top_hits" : {
              "sort" : [
                {
                  "id" : {
                    "order" : "desc"
                  }
                }
              ],
              "_source" : {
                "include" : [
                  "*"
                ]
              },
              "size" : 1
            }
          }
        }
      }
    }
  }
}

What could be the issue with my aggregation? I'm using ES 1.2.1.

Thanks

On Wednesday, August 6, 2014 10:06:40 PM UTC+3, Ivan Brusic wrote:
Perhaps the top hits aggregation can help: 
http://www.elasticsearch.org/guide/en/elasticsearch/reference/current/search-aggregations-metrics-top-hits-aggregation.html

-- 
Ivan


On Wed, Aug 6, 2014 at 11:21 AM, slavag  wrote:
Hi, 
Need some advise.
I have indexed documents, each document has internal id that also indexed as 
just another indexed field, this id is not used as indexed document id (_id).
There could be situation when same document is indexed more than once (each of 
the indexed instances will have different elasticsearch _id), when
I search I'm getting result all those documents (including multiple instance of 
the same source document), is there any way to get kind of distinct results, I 
mean 
to get search result only unique documents, based on some field form the 
indexed document ?

Thanks

Re: elasticsearch cluster spreading the bulk tasks

1. Yes, it is spread automatically

2. No

Bulk requests queue up on the nodes that hold the shards, so check your shard
distribution. Each node should hold an equal number of an index's shards;
otherwise your system load is unbalanced.
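
One way to check the distribution is to count shards per node from the output of curl 'localhost:9200/_cat/shards'. A rough Python sketch follows. The sample text is made up (index name, node names, IPs and numbers are hypothetical), and the parser assumes the default column order: index, shard, prirep, state, docs, store, ip, node.

```python
from collections import Counter

# Hypothetical output of: curl 'localhost:9200/_cat/shards'
SAMPLE = """\
logstash-2014.08.06 0 p STARTED 1200 1.1mb 10.0.0.1 node-a
logstash-2014.08.06 1 p STARTED 1180 1.0mb 10.0.0.1 node-a
logstash-2014.08.06 2 p STARTED 1210 1.1mb 10.0.0.1 node-a
logstash-2014.08.06 3 p STARTED 1190 1.0mb 10.0.0.2 node-b
logstash-2014.08.06 4 p UNASSIGNED
"""


def shards_per_node(cat_shards_text):
    """Count shards per (index, node); unassigned shards are grouped separately."""
    counts = Counter()
    for line in cat_shards_text.splitlines():
        # The node name is the last column and may contain spaces.
        parts = line.split(None, 7)
        if len(parts) < 4:
            continue  # blank or malformed line
        node = parts[7] if len(parts) == 8 else "UNASSIGNED"
        counts[(parts[0], node)] += 1
    return counts


if __name__ == "__main__":
    for (index, node), count in sorted(shards_per_node(SAMPLE).items()):
        print(index, node, count)
```

In the made-up sample, node-a holds three shards of the index and node-b only one, which is the kind of imbalance that would leave one bulk queue growing while the other node stays idle.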

Jörg


On Wed, Aug 6, 2014 at 10:36 PM, Pavel P  wrote:

> Still interested to know your view on the issue.
>
> On Wednesday, August 6, 2014 5:12:41 PM UTC+3, Pavel P wrote:
>>
>> Hi,
>>
>> Could someone clarify me the next:
>>
>> When I have the ES cluster, consisting from 2 machines, how should I send
>> the bulk index requests to them.
>>
>> 1. Do I understand right that I can send everything to any node I have,
>> then it would be spreaded for indexing among the cluster automatically?
>> 2. Do I need to cover the cluster with the load balancer so each node
>> would receive some portion of the indexing pressure?
>>
>> How it supposed to work by design?
>>
>> Currently I use the load balancer over my two instances, and as I see
>> with the Bigdesk - the bulk queue is growing on the master node, while the
>> slave node feels itself quite relaxed.
>>
>> Master node:
>>
>>
>> 
>>
>>
>> Slave node:
>>
>>
>> 
>>
>>
>> Is that ok, that my ES cluster from 2 machines, which are c3.large (4
>> CPU, 8Gb memory) is only able to index 13k small documents per 10 seconds
>> (I use it as output for the logstash)?
>> Which performance should I expect?
>>
>> Regards,
>>

Re: Search result only with unique value of the specific field

Hi, thanks for the reply.
I'm trying to define a top hits aggregation but I'm getting this error: "Parse 
Failure [Could not find aggregator type [top_hits] in [single_result]]]; }]"
This is my aggregation definition: the first bucket is grouped by id, the 
nested bucket is grouped by date, and I want to get only one document 
from each nested bucket.

"aggs" : {
  "id" : {
    "terms" : {
      "field" : "id"
    },
    "aggs" : {
      "bckdate" : {
        "terms" : {
          "field" : "date"
        },
        "aggs" : {
          "single_result" : {
            "top_hits" : {
              "sort" : [
                {
                  "id" : {
                    "order" : "desc"
                  }
                }
              ],
              "_source" : {
                "include" : [
                  "*"
                ]
              },
              "size" : 1
            }
          }
        }
      }
    }
  }
}

What could be the issue with my aggregation? I'm using ES 1.2.1.

Thanks

On Wednesday, August 6, 2014 10:06:40 PM UTC+3, Ivan Brusic wrote:
>
> Perhaps the top hits aggregation can help: 
> http://www.elasticsearch.org/guide/en/elasticsearch/reference/current/search-aggregations-metrics-top-hits-aggregation.html
>
> -- 
> Ivan
>
>
> On Wed, Aug 6, 2014 at 11:21 AM, slavag wrote:
>
>> Hi, 
>> Need some advise.
>> I have indexed documents, each document has internal id that also indexed 
>> as just another indexed field, this id is not used as indexed document id 
>> (_id).
>> There could be situation when same document is indexed more than once 
>> (each of the indexed instances will have different elasticsearch _id), when
>> I search I'm getting result all those documents (including multiple 
>> instance of the same source document), is there any way to get kind of 
>> distinct results, I mean 
>> to get search result only unique documents, based on some field form the 
>> indexed document ?
>>
>> Thanks
>>

[ANN] Elasticsearch Twitter River plugin 2.3.0 released

2014-08-06 Thread Elasticsearch Team

Heya,


We are pleased to announce the release of the Elasticsearch Twitter River 
plugin, version 2.3.0.

The Twitter river indexes the public twitter stream, aka the hose, and makes it 
searchable.

https://github.com/elasticsearch/elasticsearch-river-twitter/

Release Notes - elasticsearch-river-twitter - Version 2.3.0



Update:
 * [59] - Remove deprecated camelCase properties 
(https://github.com/elasticsearch/elasticsearch-river-twitter/issues/59)
 * [53] - Update to elasticsearch 1.3.0 
(https://github.com/elasticsearch/elasticsearch-river-twitter/issues/53)


Doc:
 * [63] - Docs: make the welcome page more obvious 
(https://github.com/elasticsearch/elasticsearch-river-twitter/issues/63)


Issues, pull requests, and feature requests are warmly welcome on the 
elasticsearch-river-twitter project repository: 
https://github.com/elasticsearch/elasticsearch-river-twitter/
For questions or comments about this plugin, feel free to use the Elasticsearch 
mailing list: https://groups.google.com/forum/#!forum/elasticsearch

Enjoy,

-The Elasticsearch team


[ANN] Elasticsearch Google Compute Engine cloud plugin 2.3.0 released

2014-08-06 Thread Elasticsearch Team

Heya,


We are pleased to announce the release of the Elasticsearch Google Compute 
Engine cloud plugin, version 2.3.0.

The Google Compute Engine (GCE) Cloud plugin allows using the GCE API for the 
unicast discovery mechanism.

https://github.com/elasticsearch/elasticsearch-cloud-gce/

Release Notes - elasticsearch-cloud-gce - Version 2.3.0



Update:
 * [33] - Tests: refactor tests 
(https://github.com/elasticsearch/elasticsearch-cloud-gce/issues/33)
 * [32] - Update to Lucene 4.9.0 
(https://github.com/elasticsearch/elasticsearch-cloud-gce/issues/32)
 * [30] - Force Token URL to `http://metadata.google.internal/...` 
(https://github.com/elasticsearch/elasticsearch-cloud-gce/pull/30)
 * [24] - Update to elasticsearch 1.3.0 
(https://github.com/elasticsearch/elasticsearch-cloud-gce/issues/24)

New:
 * [27] - Add multiple zones support 
(https://github.com/elasticsearch/elasticsearch-cloud-gce/issues/27)

Doc:
 * [31] - Docs: make the welcome page more obvious 
(https://github.com/elasticsearch/elasticsearch-cloud-gce/issues/31)
 * [26] - Included notes about compute engine permissions 
(https://github.com/elasticsearch/elasticsearch-cloud-gce/pull/26)


Issues, pull requests, and feature requests are warmly welcome on the 
elasticsearch-cloud-gce project repository: 
https://github.com/elasticsearch/elasticsearch-cloud-gce/
For questions or comments about this plugin, feel free to use the Elasticsearch 
mailing list: https://groups.google.com/forum/#!forum/elasticsearch

Enjoy,

-The Elasticsearch team


Re: elasticsearch cluster spreading the bulk tasks

2014-08-06 Thread Pavel P

Still interested to know your view on the issue.

On Wednesday, August 6, 2014 5:12:41 PM UTC+3, Pavel P wrote:
>
> Hi,
>
> Could someone clarify me the next:
>
> When I have the ES cluster, consisting from 2 machines, how should I send 
> the bulk index requests to them.
>
> 1. Do I understand right that I can send everything to any node I have, 
> then it would be spreaded for indexing among the cluster automatically?
> 2. Do I need to cover the cluster with the load balancer so each node 
> would receive some portion of the indexing pressure?
>
> How it supposed to work by design?
>
> Currently I use the load balancer over my two instances, and as I see with 
> the Bigdesk - the bulk queue is growing on the master node, while the slave 
> node feels itself quite relaxed.
>
> Master node:
>
>
> 
>
>
> Slave node:
>
>
> 
>
>
> Is that ok, that my ES cluster from 2 machines, which are c3.large (4 CPU, 
> 8Gb memory) is only able to index 13k small documents per 10 seconds (I use 
> it as output for the logstash)?
> Which performance should I expect?
>
> Regards,
>
>


Query for nested objects count

2014-08-06 Thread Paulo Correa

Hi,

I need to retrieve documents (of type "bucket") that have at least 
2 nested objects (of type "products") inside them (details of my mapping 
and documents are in the gist below).
https://gist.github.com/anonymous/4f06c9322186ce9d4708

As far as I can tell from searching, there is no way to accomplish this in 
Elasticsearch (unless perhaps I used parent/child for bucket/products 
and a "has child query" with "min_children=2"), but I'd rather avoid 
changing my mapping if I can.
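
For reference, the parent/child variant mentioned above could look roughly like the Python sketch below, which only builds the query body. It assumes a mapping where "products" is a child type of "bucket" (not the nested mapping from the gist) and an Elasticsearch version in which has_child accepts min_children. The function name and the match_all inner query are placeholders chosen for this example.

```python
import json


def buckets_with_min_products(min_children=2):
    """Build a has_child query matching parents with at least `min_children` children."""
    return {
        "query": {
            "has_child": {
                "type": "products",  # child type, assumed to have "bucket" as parent
                "min_children": min_children,
                "query": {"match_all": {}},  # placeholder: any child counts
            }
        }
    }


if __name__ == "__main__":
    print(json.dumps(buckets_with_min_products(2), indent=2))
```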

Does anybody know if it is possible?

PS: this is not really the same situation, but perhaps option number 2 from 
this answer would be a solution: 
http://stackoverflow.com/questions/19609498/elastic-search-order-by-count-of-nested-object
 
Does anybody see another alternative?

Thanks!


Re: Search result only with unique value of the specific field

Perhaps the top hits aggregation can help:
http://www.elasticsearch.org/guide/en/elasticsearch/reference/current/search-aggregations-metrics-top-hits-aggregation.html

-- 
Ivan


On Wed, Aug 6, 2014 at 11:21 AM, slavag  wrote:

> Hi,
> Need some advise.
> I have indexed documents, each document has internal id that also indexed
> as just another indexed field, this id is not used as indexed document id
> (_id).
> There could be situation when same document is indexed more than once
> (each of the indexed instances will have different elasticsearch _id), when
> I search I'm getting result all those documents (including multiple
> instance of the same source document), is there any way to get kind of
> distinct results, I mean
> to get search result only unique documents, based on some field form the
> indexed document ?
>
> Thanks
>

Re: Stripping html for indexing only?

1. Correct.
2. Also correct. The analysis chain only affects how the terms are indexed
and placed in the inverted index. The original document remains as is.
3. Not sure since I have never done highlighting. Highlighting might not
depend on the source since the term positions/offsets are used, but
hopefully someone will correct me.
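
To make points 1 and 2 concrete, here is a minimal Python sketch of index settings and a mapping that run the html_strip char filter only at analysis time. It just builds the request body. The analyzer name "html_text" and the type name "doc" are hypothetical, "content" is the field from the question, and the mapping syntax (string fields) is the one used by the 1.x releases.

```python
import json

ANALYZER_NAME = "html_text"  # hypothetical analyzer name

index_body = {
    "settings": {
        "analysis": {
            "analyzer": {
                ANALYZER_NAME: {
                    "type": "custom",
                    # html_strip removes markup before tokenizing; _source is untouched
                    "char_filter": ["html_strip"],
                    "tokenizer": "standard",
                    "filter": ["lowercase"],
                }
            }
        }
    },
    "mappings": {
        "doc": {  # hypothetical type name
            "properties": {
                "content": {"type": "string", "analyzer": ANALYZER_NAME}
            }
        }
    },
}

if __name__ == "__main__":
    # This body would be sent when creating the index.
    print(json.dumps(index_body, indent=2))
```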

-- 
Ivan


On Wed, Aug 6, 2014 at 11:45 AM, IronMike  wrote:

> I searched this topic but some of the answers were still vague to me.
>
> My goal is to index html docs but have the html stripped for the indexing,
> at the same time, I would like _source to have the original html document
> for display purposes.
>
> //My doc format:
> {
>   content:  Hello this is an html content 
>   rank:1
>   date:2014-8-8
>   title: Some title
>   
> }
>
> The questions that I am still not very clear on:
>
> 1 - if I understand correctly, I can push html doc like it is to Index,
> and it will strip html provided I do the charfilter referenced here?
>
> http://www.elasticsearch.org/guide/en/elasticsearch/reference/current/analysis-htmlstrip-charfilter.html
>
> 2- Will the stripping not affect the _source? In other words, _source will
> still have the original html?
>
> 3- Highlighting comes from the _source? this means highlighting will have
> html, meaning I will have to strip any html tags after the search comes
> back?
>
>
> Thanks
>

Stripping html for indexing only?

2014-08-06 Thread IronMike

I searched this topic but some of the answers were still vague to me.

My goal is to index html docs but have the html stripped for the indexing, 
at the same time, I would like _source to have the original html document 
for display purposes.

//My doc format:
{
  "content": "Hello this is an html content",
  "rank": 1,
  "date": "2014-8-8",
  "title": "Some title"
}

The questions that I am still not very clear on:

1 - If I understand correctly, I can push the html doc as it is to the index, and 
it will strip the html provided I configure the char filter referenced here?

http://www.elasticsearch.org/guide/en/elasticsearch/reference/current/analysis-htmlstrip-charfilter.html

2 - Will the stripping leave the _source unaffected? In other words, will _source 
still have the original html?

3 - Does highlighting come from the _source? That would mean highlighting will contain 
html, so I will have to strip any html tags after the search comes 
back?


Thanks


Re: Is the snapshot incremental?

No. I don't think so.

--
David ;-)
Twitter : @dadoonet / @elasticsearchfr / @scrutmydocs


On 6 August 2014 at 20:04, IronMike wrote:

Thanks, it makes sense in this case. I don't think I can prevent something like 
that from happening?

> On Wednesday, August 6, 2014 1:29:40 PM UTC-4, David Pilato wrote:
> Well. It is incremental.
> 
> But let's say you have saved old Lucene segments and that old segments has 
> been merged in the meantime to a new bigger one, the next snapshot will copy 
> the new BIG segment and remove the old ones.
> 
> It means that old data will be copied twice in this scenario.
> 
> Makes sense?
> 
> --
> David ;-)
> Twitter : @dadoonet / @elasticsearchfr / @scrutmydocs
> 
>> On 6 August 2014 at 18:36, IronMike wrote:
>> 
>> 
>> 
>> curl -XPUT http://localhost:9200/_snapshot/myRepository/myIndex_`date "+%Y-%m-%d"`?wait_for_completion=true
>> 
>> This cron job runs daily which backs up my index to AWS S3, each day the 
>> snapshot has a different name. 
>> 
>> I want to make sure that I am not duplicating a 10GB index for example 
>> everyday in S3? Does it look at my index from yesterday and only index the 
>> changes? What if there were no changes, What does it mean for todays 
>> snapshot vs yesterday's snapshot (Is there a duplicate?)
>> 
>> 
>> 
>> Thanks
>> 
>> 
>> 

Search result only with unique value of the specific field

Hi, 
I need some advice.
I have indexed documents. Each document has an internal id that is indexed 
as just another field; this id is not used as the document id 
(_id).
The same document may be indexed more than once (each 
indexed instance will have a different Elasticsearch _id). When
I search, the results include all of those documents (including multiple 
instances of the same source document). Is there any way to get 
distinct results, that is, 
search results containing only unique documents, based on some field from the 
indexed document?

Thanks


Re: Is the snapshot incremental?

2014-08-06 Thread IronMike

Thanks, it makes sense in this case. I don't think I can prevent something 
like that from happening?

On Wednesday, August 6, 2014 1:29:40 PM UTC-4, David Pilato wrote:
>
> Well. It is incremental.
>
> But let's say you have saved old Lucene segments and that old segments has 
> been merged in the meantime to a new bigger one, the next snapshot will 
> copy the new BIG segment and remove the old ones.
>
> It means that old data will be copied twice in this scenario.
>
> Makes sense?
>
> --
> David ;-)
> Twitter : @dadoonet / @elasticsearchfr / @scrutmydocs
>
> On 6 August 2014 at 18:36, IronMike wrote:
>
>
>
> curl -XPUT http://localhost:9200/_snapshot/myRepository/myIndex_`date "+%Y-%m-%d"`?wait_for_completion=true
>
> This cron job runs daily which backs up my index to AWS S3, each day the 
> snapshot has a different name. 
>
> I want to make sure that I am not duplicating a 10GB index for example 
> everyday in S3? Does it look at my index from yesterday and only index the 
> changes? What if there were no changes, What does it mean for todays 
> snapshot vs yesterday's snapshot (Is there a duplicate?)
>
>
> Thanks
>
>

Re: Is the snapshot incremental?

Well. It is incremental.

But suppose you have already snapshotted some old Lucene segments and those 
segments have since been merged into a new, bigger one. The next snapshot will 
copy the new BIG segment and remove the old ones.

It means that old data will be copied twice in this scenario.
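
A toy Python model of the copying part may help. It is only an illustration of the idea, not how Elasticsearch implements snapshots, and it leaves the cleanup of old files out. The segment names and sizes are made up, and take_snapshot is a hypothetical helper.

```python
def take_snapshot(repository_files, index_segments):
    """Copy only the segment files that the repository does not hold yet.

    Both arguments map a segment file name to its size in GB.
    Returns the files that had to be copied for this snapshot.
    """
    copied = {
        name: size
        for name, size in index_segments.items()
        if name not in repository_files
    }
    repository_files.update(copied)
    return copied


# Hypothetical segment names and sizes, for illustration only.
repository = {}
day1 = take_snapshot(repository, {"_0": 4, "_1": 3, "_2": 3})
day2 = take_snapshot(repository, {"_0": 4, "_1": 3, "_2": 3})  # nothing changed
day3 = take_snapshot(repository, {"_3": 10})  # the three segments were merged

for label, copied in (("day 1", day1), ("day 2", day2), ("day 3", day3)):
    print(label, "copied", sum(copied.values()), "GB")
```

In this model the unchanged index on day 2 copies nothing, while the merge on day 3 copies the full 10 GB again even though no document changed.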

Makes sense?

--
David ;-)
Twitter : @dadoonet / @elasticsearchfr / @scrutmydocs

> On 6 August 2014 at 18:36, IronMike wrote:
> 
> 
> 
> curl -XPUT http://localhost:9200/_snapshot/myRepository/myIndex_`date "+%Y-%m-%d"`?wait_for_completion=true
> 
> This cron job runs daily which backs up my index to AWS S3, each day the 
> snapshot has a different name. 
> 
> I want to make sure that I am not duplicating a 10GB index for example 
> everyday in S3? Does it look at my index from yesterday and only index the 
> changes? What if there were no changes, What does it mean for todays snapshot 
> vs yesterday's snapshot (Is there a duplicate?)
> 
> 
> 
> Thanks
> 
> 
> 

Re: Recovering From Corrupted Shard Following Upgrade to 1.3.1

2014-08-06 Thread Nariman Haghighi

I should mention that there is a primary shard 4 on the other node. I just 
need to understand why it is not auto-recovering here, and what I can do to 
manually remove the corrupted shard so that the primary is replicated to this 
node. 

On Wednesday, August 6, 2014 12:44:41 PM UTC-4, Nariman Haghighi wrote:
>
> A few days after the upgrade to 1.3.1 we experienced our first corrupted 
> shard in a 2 node cluster:
>
> [2014-08-06 15:54:28,815][WARN ][indices.cluster  ] 
> [FiveAces.Coffee.Web_IN_0] [streamentry5][4] failed to start shard
> org.elasticsearch.index.gateway.IndexShardGatewayRecoveryException: 
> [streamentry5][4] failed to fetch index version after copying it over
> at 
> org.elasticsearch.index.gateway.local.LocalIndexShardGateway.recover(LocalIndexShardGateway.java:152)
> at 
> org.elasticsearch.index.gateway.IndexShardGatewayService$1.run(IndexShardGatewayService.java:132)
> at java.util.concurrent.ThreadPoolExecutor.runWorker(Unknown Source)
> at java.util.concurrent.ThreadPoolExecutor$Worker.run(Unknown Source)
> at java.lang.Thread.run(Unknown Source)
> Caused by: org.apache.lucene.index.CorruptIndexException: 
> [streamentry5][4] Corrupted index [corrupted_fuDt8NuqR_egGJK0fcjl6g] caused 
> by: CorruptIndexException[Invalid fieldsStream maxPointer (file 
> truncated?): maxPointer=6833538, length=524288]
> at org.elasticsearch.index.store.Store.failIfCorrupted(Store.java:343)
> at org.elasticsearch.index.store.Store.failIfCorrupted(Store.java:328)
> at 
> org.elasticsearch.index.gateway.local.LocalIndexShardGateway.recover(LocalIndexShardGateway.java:119)
> ... 4 more
>
> How do we recover from this?
>
> We've tried explicitly assigning via the reroute API:
>
> { "commands" : [ { "allocate" : { "index" : "streamentry5", "shard" : 4 , 
> "node" : "FiveAces.Coffee.Web_IN_0", "allow_primary" : 1 }}]}
>
> This puts the shard in INITIALIZING but quickly reverts back to 
> UNALLOCATED with a similar error in the logs.
>
> I'm interested in theories on how this could have happened assuming no 
> significant changes on our end during this period and never having 
> experienced this on ES before but more importantly how to recover from it.
>
> Thank you
>
>
>


Re: ES + spark 1.0.1 - unable to send RDDs to ES

2014-08-06 Thread Costin Leau


Hi Phil,

Glad to see the work in es-hadoop master is being picked up even without any 
public announcement of it :)

The issue has been fixed in master [1] and already pushed to Maven - can you 
please update and try again?

FTR: The issue seems to be caused by multiple versions of Jackson being pulled 
into the classpath (one from Hadoop, another from Spark), which on some 
platforms causes class loading issues in Jackson during start-up. The fix in 
master hopefully remedies that.

Cheers,

[1] https://github.com/elasticsearch/elasticsearch-hadoop/issues/239
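
The duplicate-Jackson situation described above can be checked for by hand. The following is a minimal sketch, assuming jar files follow the common `name-version.jar` naming pattern; `find_version_conflicts` and the example paths are hypothetical and are not part of es-hadoop, Spark, or Hadoop.

```python
import os
import re
from collections import defaultdict

# Assumption: jar files are named "<artifact>-<version>.jar".
JAR_PATTERN = re.compile(r"^(?P<name>.+?)-(?P<version>\d[\w.\-]*)\.jar$")


def find_version_conflicts(classpath_entries):
    """Return {artifact: [versions]} for artifacts present in more than one version."""
    versions = defaultdict(set)
    for entry in classpath_entries:
        match = JAR_PATTERN.match(os.path.basename(entry))
        if match:
            versions[match.group("name")].add(match.group("version"))
    return {name: sorted(found) for name, found in versions.items() if len(found) > 1}


# Made-up classpath entries, for illustration only.
classpath = [
    "/opt/hadoop/lib/jackson-mapper-asl-1.8.8.jar",
    "/opt/spark/lib/jackson-mapper-asl-1.9.13.jar",
    "/usr/lib/spark-1.0/lib/elasticsearch-hadoop-2.1.0.jar",
]
conflicts = find_version_conflicts(classpath)
print(conflicts)
```

An empty result only means that no artifact shows up under two version strings; it does not rule out classes bundled inside other jars.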

On 8/6/14 7:32 PM, Phil gib wrote:

hello
my context: spark, spark-shell 1.0.1, jdk1.7, scala 2.10.4, ES-Hadoop 2.1.0 
(nightly build)
my problem:
  - unable to send RDDs from spark to ES
I got a NoClassDefFoundError, see below 
(org/codehaus/jackson/annotate/JsonClass)
Do I need to add Jackson jars to the spark shell?

philippe
best regards

--
  $bin/spark-shell  --jars /usr/lib/spark-1.0/lib/elasticsearch-hadoop-2.1.0.jar
..
spark   version 1.0.1
Using Scala version 2.10.4
..
14/08/06 17:19:36 INFO SparkContext: Added JAR 
file:/usr/lib/spark-1.0/lib/elasticsearch-hadoop-2.1.0.jar
scala>
import org.elasticsearch.hadoop.mr.EsOutputFormat
import org.elasticsearch.hadoop.mr.EsInputFormat
import org.elasticsearch.hadoop.cfg.ConfigurationOptions
import org.apache.hadoop.mapred.{FileOutputCommitter, FileOutputFormat, 
JobConf, OutputFormat}
import org.apache.hadoop.fs.Path
import org.apache.hadoop.io.{MapWritable, Text, NullWritable}
val jobConf = new JobConf(sc.hadoopConfiguration)
jobConf.set("es.resource", "myindex/mytype")
jobConf.set("es.query", "?q=*")
val esRDD = sc.hadoopRDD(jobConf, classOf[EsInputFormat[Text, MapWritable]],
  classOf[Text], classOf[MapWritable])
// up to here everything is OK
esRDD.count()
>
java.lang.NoClassDefFoundError: org/codehaus/jackson/annotate/JsonClass
 at
org.codehaus.jackson.map.introspect.JacksonAnnotationIntrospector.findDeserializationType(JacksonAnnotationIntrospector.java:524)
.
 at 
org.elasticsearch.hadoop.rest.RestClient.parseContent(RestClient.java:119)



--
Costin


Re: transport client? really?

Since version 1.0, there should be fewer binary protocol issues between any
nodes, including the clients, making rolling upgrades doable. Older clients
should be able to interact with newer server nodes, but the inverse is not
always the case.

-- 
Ivan


On Wed, Aug 6, 2014 at 8:47 AM, Brian  wrote:

> Here is my experience. Yours may vary.
>
> I also use the TransportClient. And then I wrap our business rules behind
> another server that offers an HTTP REST API but talks to Elasticsearch on
> the back end via the TransportClient. This server uses Netty and the LMAX
> Disruptor to provide low-resource high-throughput processing; it is
> somewhat like Node.js but in Java instead of JavaScript.
>
> Then I have a bevy of command-line maintenance and test tools that also
> use the TransportClient. I wrap them inside a shell script (for example,
> Foobar.main is wrapped inside foobar.sh) and convert command-line options
> (such as -t person) into Java properties (such as TypeName=person), and
> also set the classpath to all of the Elasticsearch jars plus all of mine.
>
> Whenever there is a compelling change to Elasticsearch, I upgrade, and
> many times I have watched my Java builds fail with all of the breaking
> changes. But even with the worst of the breaking changes, it was down for
> maybe a day or two at the most; the API is rather clean, and this newsgroup
> is a life saver, and so I never got stuck. And when I was done, I had
> learned even more about the ES Java API.
>
> So it's either a huge pain or it's the joy of learning, depending on your
> point of view. I have always viewed it as the joy of learning.
>
> I just wish the Facets-to-Aggregations migration was smoother. But I sense
> that there will be another breaking change on my horizon. This will be
> particularly sad for me, as I had implemented a rather nice hierarchical
> term frequency combining mvel and facets. Which are now deprecated and on
> the list to be removed. But again, I'll learn a lot when making the
> migration.
>
> I believe it was Thomas Edison who said that most people miss
> opportunities because the opportunities come dressed in overalls and look
> like work. But I digress :-)
>
> Brian
>


Recovering From Corrupted Shard Following Upgrade to 1.3.1

2014-08-06 Thread Nariman Haghighi

A few days after the upgrade to 1.3.1 we experienced our first corrupted 
shard in a 2 node cluster:

[2014-08-06 15:54:28,815][WARN ][indices.cluster  ] 
[FiveAces.Coffee.Web_IN_0] [streamentry5][4] failed to start shard
org.elasticsearch.index.gateway.IndexShardGatewayRecoveryException: 
[streamentry5][4] failed to fetch index version after copying it over
at 
org.elasticsearch.index.gateway.local.LocalIndexShardGateway.recover(LocalIndexShardGateway.java:152)
at 
org.elasticsearch.index.gateway.IndexShardGatewayService$1.run(IndexShardGatewayService.java:132)
at java.util.concurrent.ThreadPoolExecutor.runWorker(Unknown Source)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(Unknown Source)
at java.lang.Thread.run(Unknown Source)
Caused by: org.apache.lucene.index.CorruptIndexException: [streamentry5][4] 
Corrupted index [corrupted_fuDt8NuqR_egGJK0fcjl6g] caused by: 
CorruptIndexException[Invalid fieldsStream maxPointer (file truncated?): 
maxPointer=6833538, length=524288]
at org.elasticsearch.index.store.Store.failIfCorrupted(Store.java:343)
at org.elasticsearch.index.store.Store.failIfCorrupted(Store.java:328)
at 
org.elasticsearch.index.gateway.local.LocalIndexShardGateway.recover(LocalIndexShardGateway.java:119)
... 4 more

How do we recover from this?

We've tried explicitly assigning via the reroute API:

{ "commands" : [ { "allocate" : { "index" : "streamentry5", "shard" : 4 , 
"node" : "FiveAces.Coffee.Web_IN_0", "allow_primary" : 1 }}]}

This puts the shard in INITIALIZING but quickly reverts back to UNALLOCATED 
with a similar error in the logs.
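
For reference, the reroute request above can also be assembled programmatically. This is a minimal sketch using only the Python standard library; it assumes the cluster listens on `localhost:9200` (the post does not give an address), and `build_allocate_command` and `post_reroute` are hypothetical helper names. It only mirrors the request from the post and does not by itself fix the corruption.

```python
import json
import urllib.request


def build_allocate_command(index, shard, node, allow_primary=False):
    """Build a body for POST /_cluster/reroute holding one allocate command."""
    return {
        "commands": [
            {
                "allocate": {
                    "index": index,
                    "shard": shard,
                    "node": node,
                    "allow_primary": allow_primary,
                }
            }
        ]
    }


def post_reroute(body, base_url="http://localhost:9200"):
    """Send the reroute request; base_url is an assumption, not from the post."""
    request = urllib.request.Request(
        base_url + "/_cluster/reroute",
        data=json.dumps(body).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    with urllib.request.urlopen(request) as response:
        return json.loads(response.read().decode("utf-8"))


# Same values as in the post; nothing is sent until post_reroute(body) is called.
body = build_allocate_command(
    "streamentry5", 4, "FiveAces.Coffee.Web_IN_0", allow_primary=True
)
print(json.dumps(body, indent=2))
```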

I'm interested in theories on how this could have happened assuming no 
significant changes on our end during this period and never having 
experienced this on ES before but more importantly how to recover from it.

Thank you



Re: ES + spark 1.0.1 - unable to send RDDs to ES

2014-08-06 Thread Phil gib


Sorry for the mistake; the problem should read: unable to read from ES and create RDDs
On Wednesday, August 6, 2014 6:32:02 PM UTC+2, Phil gib wrote:
>
> hello
> my context: spark, spark-shell 1.0.1, jdk1.7, scala 2.10.4, ES-Hadoop 
> 2.1.0 (nightly build)
> my problem:
>  - unable to read from ES and create RDDs
> I got a NoClassDefFoundError, see below 
> (org/codehaus/jackson/annotate/JsonClass)
> Do I need to add Jackson jars to the spark shell?
>
> philippe 
> best regards
>
>


ES + spark 1.0.1 - unable to send RDDs to ES

2014-08-06 Thread Phil gib

hello
my context: spark, spark-shell 1.0.1, jdk1.7, scala 2.10.4, ES-Hadoop 
2.1.0 (nightly build)
my problem:
 - unable to send RDDs from spark to ES
I got a NoClassDefFoundError, see below 
(org/codehaus/jackson/annotate/JsonClass)
Do I need to add Jackson jars to the spark shell?

philippe 
best regards

--
 $bin/spark-shell  --jars 
/usr/lib/spark-1.0/lib/elasticsearch-hadoop-2.1.0.jar 
..
spark   version 1.0.1
Using Scala version 2.10.4 
..
14/08/06 17:19:36 INFO SparkContext: Added JAR 
file:/usr/lib/spark-1.0/lib/elasticsearch-hadoop-2.1.0.jar 
scala>
import org.elasticsearch.hadoop.mr.EsOutputFormat
import org.elasticsearch.hadoop.mr.EsInputFormat
import org.elasticsearch.hadoop.cfg.ConfigurationOptions
import org.apache.hadoop.mapred.{FileOutputCommitter, FileOutputFormat, 
JobConf, OutputFormat}
import org.apache.hadoop.fs.Path
import org.apache.hadoop.io.{MapWritable, Text, NullWritable}
val jobConf = new JobConf(sc.hadoopConfiguration)
jobConf.set("es.resource", "myindex/mytype")
jobConf.set("es.query", "?q=*")
val esRDD = sc.hadoopRDD(jobConf, classOf[EsInputFormat[Text, MapWritable]],
  classOf[Text], classOf[MapWritable])
// up to here everything is OK
esRDD.count()
>  
java.lang.NoClassDefFoundError: org/codehaus/jackson/annotate/JsonClass
at 
org.codehaus.jackson.map.introspect.JacksonAnnotationIntrospector.findDeserializationType(JacksonAnnotationIntrospector.java:524)
   .
at 
org.elasticsearch.hadoop.rest.RestClient.parseContent(RestClient.java:119)


Is the snapshot incremental?

2014-08-06 Thread IronMike



curl -XPUT http://localhost:9200/_snapshot/myRepository/myIndex_`date 
"+%Y-%m-%d"`?wait_for_completion=true

This cron job runs daily and backs up my index to AWS S3; each day the 
snapshot has a different name. 

I want to make sure that I am not duplicating, for example, a 10GB index in 
S3 every day. Does it look at yesterday's snapshot of my index and only back 
up the changes? What if there were no changes? What does that mean for today's 
snapshot vs. yesterday's snapshot (is there a duplicate)?


Thanks
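
The daily snapshot name produced by the cron command above can be sketched in Python. This is a minimal illustration that only builds the request URL; `snapshot_url` is a hypothetical helper, and the base URL simply repeats the `localhost:9200` address from the curl command.

```python
import datetime


def snapshot_url(repository, index_prefix, day, base_url="http://localhost:9200"):
    """Build the snapshot URL with a date suffix, like `date "+%Y-%m-%d"` in the cron job."""
    name = "{}_{}".format(index_prefix, day.strftime("%Y-%m-%d"))
    return "{}/_snapshot/{}/{}?wait_for_completion=true".format(base_url, repository, name)


# A fixed date is used here so the result is reproducible.
url = snapshot_url("myRepository", "myIndex", datetime.date(2014, 8, 6))
print(url)
```

In a real job, `datetime.date.today()` would take the place of the fixed date.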



Re: Group by field and then sum the groups

2014-08-06 Thread Cameron Barker

This worked perfectly!  Thank you for your help.

On Wednesday, August 6, 2014 3:49:57 AM UTC-4, Tihomir Lichev wrote:
>
> Thanks! You're absolutely right. Copy/paste error :)
>
> {
>   "aggs": {
>     "user_likes": {
>       "terms": {
>         "field": "user_id"
>       },
>       "aggs": {
>         "likes_sum": {
>           "sum": {
>             "field": "likes"
>           }
>         }
>       }
>     }
>   }
> }
>
> On Wednesday, August 6, 2014, 10:06:38 UTC+3, Jun Ohtani wrote:
>>
>> Hi,
>>
>> I think the second "aggs" should use "sum" instead of "terms", in "likes_sum".
>>
>>
>> 2014-08-06 14:32 GMT+09:00 Tihomir Lichev :
>>
>>> You can use aggregations:
>>> {
>>>   "aggs": {
>>>     "user_likes": {
>>>       "terms": {
>>>         "field": "user_id"
>>>       },
>>>       "aggs": {
>>>         "likes_sum": {
>>>           "terms": {
>>>             "field": "likes"
>>>           }
>>>         }
>>>       }
>>>     }
>>>   }
>>> }
>>>
>>>
>>> On Tuesday, August 5, 2014, 23:11:59 UTC+3, Cameron Barker wrote:
>>>
>>>> Hi all,
>>>>
>>>> I have an elastic database of posts; each post has a *user_id* and a 
>>>> *likes* field.  My goal is to output, for a query, how many likes in 
>>>> total each user has.
>>>>
>>>> I wondered if anyone had any advice/direction I could take to achieve 
>>>> this?
>>>>
>>>> input:
>>>> {user_id: 10, likes: 20}
>>>> {user_id: 9, likes: 10}
>>>> {user_id: 10, likes: 25}
>>>> {user_id: 9, likes: 15}
>>>>
>>>> output:
>>>> User: 10 likes: 45
>>>> User: 9 likes: 25
>>>>
>>
>>
>>
>> -- 
>> ---
>> Jun Ohtani
>> blog : http://blog.johtani.info
>>  
>
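
The terms-plus-sum aggregation quoted above can be checked against the sample input from the original question. The following is a minimal sketch that reproduces the grouping in plain Python rather than calling Elasticsearch; `sum_likes_per_user` is a hypothetical helper, and `AGG_BODY` simply restates the corrected request body.

```python
from collections import defaultdict

# The corrected aggregation request from the thread, as a Python dict.
AGG_BODY = {
    "aggs": {
        "user_likes": {
            "terms": {"field": "user_id"},
            "aggs": {"likes_sum": {"sum": {"field": "likes"}}},
        }
    }
}


def sum_likes_per_user(posts):
    """Group posts by user_id and sum their likes (what terms + sum computes)."""
    totals = defaultdict(int)
    for post in posts:
        totals[post["user_id"]] += post["likes"]
    return dict(totals)


# Sample input from the original question.
posts = [
    {"user_id": 10, "likes": 20},
    {"user_id": 9, "likes": 10},
    {"user_id": 10, "likes": 25},
    {"user_id": 9, "likes": 15},
]
totals = sum_likes_per_user(posts)
for user_id in sorted(totals, reverse=True):
    print("User: {} likes: {}".format(user_id, totals[user_id]))
```

This prints the same totals as the expected output in the question: 45 for user 10 and 25 for user 9.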


Re: leave content in mySQL and use ElasticSearch only for Index

JDBC plugin is not for migration.

It can be configured to select the data from the RDBMS you want. You can
fetch the metadata fields and index them into Elasticsearch with a simple
SQL select statement.

Jörg
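
The idea of selecting only the metadata fields can be sketched as follows. This is a minimal illustration, not the JDBC plugin itself: it uses Python's built-in `sqlite3` as a stand-in for mySQL so that it runs on its own, and the table name `mail`, its columns, and `fetch_metadata_documents` are all assumptions made for the example.

```python
import sqlite3


def fetch_metadata_documents(connection):
    """Select only metadata columns; the body column is never read."""
    cursor = connection.execute("SELECT id, sender, subject FROM mail ORDER BY id")
    # mail_id is the reference back to the row that still holds the body.
    return [
        {"mail_id": row[0], "sender": row[1], "subject": row[2]}
        for row in cursor
    ]


# In-memory stand-in for the mail table (hypothetical schema).
connection = sqlite3.connect(":memory:")
connection.execute(
    "CREATE TABLE mail (id INTEGER PRIMARY KEY, sender TEXT, subject TEXT, body TEXT)"
)
connection.execute(
    "INSERT INTO mail (sender, subject, body) VALUES (?, ?, ?)",
    ("a@example.com", "Hello", "a long mail body that stays in the database"),
)
documents = fetch_metadata_documents(connection)
print(documents)
```

Each resulting dict is what would be indexed; a search hit then carries `mail_id`, which the application can use to load the body from the database.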


On Wed, Aug 6, 2014 at 3:48 PM, Andrej Rosenheinrich <
andrej.rosenheinr...@unister.de> wrote:

> What I don't understand is why you generate an index and want to store it
> in elasticsearch. You could use the plugin as Jörg suggested, transfer your
> data to elasticsearch, set index:true for the fields you want and set
> store:false in the mapping. This way you get an index built by
> elasticsearch, can search on it, get the id as result and the data is not
> stored (except metadata, if you set it to be stored). See
> http://www.elasticsearch.org/guide/en/elasticsearch/reference/current/mapping-core-types.html#mapping-core-types
> .
>
> Cheers,
> Andrej
>
> On Wednesday, August 6, 2014, 15:34:11 UTC+2, asekn...@gmail.com wrote:
>
>> Using this plugin would lead to a migration from mysql data into
>> Elasticsearch.
>>
>> So let me reformulate my question:
>>
>> My infrastructure is like this:
>>
>> client ------> Elasticsearch
>>    |
>>    |
>>    +---------> mySQL
>>
>> So I have a client which generates an index and some metadata for a
>> mail(header and body). The mail is stored in mySQL. And the client-side
>> generated index and metadata is stored in Elasticsearch.
>>
>> The reason is because I have > 1 TB of mail content every day. This
>> content shall still be written to mySQL. Elasticsearch shall keep only the
>> index. Is that possible? And how?
>>
>> Regards
>> Michael
>>
>>
>> On Wednesday, August 6, 2014, 13:21:09 UTC+2, asekn...@gmail.com wrote:
>>>
>>> Hello,
>>>
>>> I want to use Elasticsearch only for indexing and searching e-mails. We
>>> want to store the meta-info within Elasticsearch, keeping the content/body
>>> of every mail in a mySQL database. So Elasticsearch shall have a reference
>>> to the mail body.
>>>
>>> Is that possible and how?
>>>
>>> Regards
>>> Michael
>>>


Re: transport client? really?

2014-08-06 Thread Brian

Here is my experience. Yours may vary.

I also use the TransportClient. And then I wrap our business rules behind
another server that offers an HTTP REST API but talks to Elasticsearch on
the back end via the TransportClient. This server uses Netty and the LMAX
Disruptor to provide low-resource high-throughput processing; it is
somewhat like Node.js but in Java instead of JavaScript.

Then I have a bevy of command-line maintenance and test tools that also use
the TransportClient. I wrap them inside a shell script (for example,
Foobar.main is wrapped inside foobar.sh) and convert command-line options
(such as -t person) into Java properties (such as TypeName=person), and
also set the classpath to all of the Elasticsearch jars plus all of mine.

Whenever there is a compelling change to Elasticsearch, I upgrade, and many
times I have watched my Java builds fail with all of the breaking changes.
But even with the worst of the breaking changes, it was down for maybe a
day or two at the most; the API is rather clean, and this newsgroup is a
life saver, and so I never got stuck. And when I was done, I had learned
even more about the ES Java API.

So it's either a huge pain or it's the joy of learning, depending on your
point of view. I have always viewed it as the joy of learning.

I just wish the Facets-to-Aggregations migration was smoother. But I sense
that there will be another breaking change on my horizon. This will be
particularly sad for me, as I had implemented a rather nice hierarchical
term frequency combining mvel and facets. Which are now deprecated and on
the list to be removed. But again, I'll learn a lot when making the
migration.

I believe it was Thomas Edison who said that most people miss opportunities
because the opportunities come dressed in overalls and look like work. But
I digress :-)

Brian


Re: Some observations with Curator

2014-08-06 Thread Brian

Aaron,

Well, now I feel a little foolish. Perhaps it was from my initial attempt 
to put --logfile at the end of the command instead of before the action:

$ curator delete --older-than 8 --logfile /tmp/curator.log
usage: curator [-h] [-v] [--host HOST] [--url_prefix URL_PREFIX] [--port 
PORT]
   [--ssl] [--auth AUTH] [-t TIMEOUT] [--master-only] [-n] [-D]
   [--loglevel LOG_LEVEL] [-l LOG_FILE] [--logformat LOGFORMAT]
   {show,allocation,alias,snapshot,close,bloom,optimize,delete}
   ...
curator: error: unrecognized arguments: --logfile /tmp/curator.log

So I changed it to -l before I moved it, based on the error message above. 
But you're correct: It does accept both forms of the option:

# For testing: Works fine and stores the log in /tmp/curator.log

$ curator --logfile /tmp/curator.log delete --older-than 8
# Older CentOS server; it's 2.7.5 on my MacBook (Mavericks) and
# HP laptop (Ubuntu 14.04 LTS):
$ python --version
Python 2.6.6

# Latest released version:
$ curator --version
curator 1.2.2

Brian
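
The ordering behaviour described above can be reproduced with a small `argparse` parser. This is a minimal sketch, not Curator's actual source: the parser name `curator-sketch` and its two options are assumptions chosen to mirror the commands in this message. An option defined on the top-level parser is only recognized before the subcommand; after it, the subparser sees it and reports it as unrecognized.

```python
import argparse


def build_parser():
    """Top-level option (--logfile) plus a 'delete' subcommand with its own option."""
    parser = argparse.ArgumentParser(prog="curator-sketch")
    parser.add_argument("-l", "--logfile", dest="log_file", default=None)
    subparsers = parser.add_subparsers(dest="command")
    delete = subparsers.add_parser("delete")
    delete.add_argument("--older-than", dest="older_than", type=int)
    return parser


parser = build_parser()

# Global option before the subcommand: parsed by the top-level parser.
ok, extra_ok = parser.parse_known_args(
    ["--logfile", "/tmp/curator.log", "delete", "--older-than", "8"]
)

# Global option after the subcommand: left over as unrecognized arguments.
bad, extra_bad = parser.parse_known_args(
    ["delete", "--older-than", "8", "--logfile", "/tmp/curator.log"]
)

print(ok.log_file, extra_ok)
print(bad.log_file, extra_bad)
```

`parse_known_args` is used here only so the leftover arguments can be inspected; `parse_args` would instead exit with an "unrecognized arguments" error like the one shown above.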

On Tuesday, August 5, 2014 8:18:24 PM UTC-4, Aaron Mildenstein wrote:
>
> Hmm.  What version of python are you using?  I am able to use --logfile 
> or -l interchangeably.
>
> I'm glad you like Curator, and I like KELTIC :)  Nice acronym.
>
>


Re: java.lang.ClassNotFoundException: org.elasticsearch.transport.RemoteTransportException

2014-08-06 Thread Earle Nietzel

Hi Jörg,

I was able to resolve this by moving elasticsearch and its dependencies 
into Tomcat's shared/lib. Since Tomcat's shared classloader is always in the 
chain of classloaders that is searched, the ClassNotFoundExceptions are gone.

There shouldn't have been any reason for elasticsearch not to work in the 
custom ComponentLoader, though; this did work in the 0.90.x series. We only 
experienced these classloader issues when ES was sending shard data to 
another node in the cluster, which led me to believe that the Netty 
transport for some reason was not using the classloader from the original 
ImmutableSettings.

My guess is that the ImmutableSettings object was not present, which is why 
it chose to use Classes.getDefaultClassLoader().

Thanks, your direction was very helpful in figuring this out,
Earle

On Tuesday, August 5, 2014 11:39:28 AM UTC-4, Jörg Prante wrote:
>
> The cluster wants to transport an exception to your web app container, and 
> the web app does not have access to elasticsearch jar.
>
> You should have a look at the ES server logs, if there are any exceptions, 
> to find the real problem. 
>
> Then, after fixing the real problem, you should try to configure your web 
> app so the elasticsearch jar is in the classpath.
>
> To your question about WebAppClassLoader: ES uses the thread context class 
> loader 
> https://github.com/elasticsearch/elasticsearch/blob/master/src/main/java/org/elasticsearch/common/Classes.java#L56
>
> Also, do you use any ES plugins? If plugins throw exceptions, they must 
> also be available in the web app.
>
> Jörg
>
>
> On Tue, Aug 5, 2014 at 4:57 PM, Earle Nietzel  > wrote:
>
>> Using elasticsearch embedded in tomcat 7 where we have custom classloader 
>> that shares spring application beans with many webapps. The API's to these 
>> implementations are in shared but the implementations are in a separate 
>> classloader "ComponentLoader". Our search implementation is loaded from 
>> ComponentLoader where elasticsearch has been promoted as our default search 
>> engine.
>>
>> Everything works fine on a single node but when in a cluster seeing the 
>> following ClassNotFoundException issue when shards are trying to update on 
>> other nodes.
>>
>> Caused by: java.lang.ClassNotFoundException: 
>> org.elasticsearch.transport.RemoteTransportException 
>> at org.apache.catalina.loader.WebappClassLoader.loadClass(WebappClassLoader.java:1702)
>> at org.apache.catalina.loader.WebappClassLoader.loadClass(WebappClassLoader.java:1547)
>> at org.elasticsearch.common.io.ThrowableObjectInputStream.loadClass(ThrowableObjectInputStream.java:93)
>> at org.elasticsearch.common.io.ThrowableObjectInputStream.readClassDescriptor(ThrowableObjectInputStream.java:67)
>> at java.io.ObjectInputStream.readNonProxyDesc(ObjectInputStream.java:1601)
>>
>> I thought this might have been related to 
>> https://github.com/elasticsearch/elasticsearch/issues/4634 but that 
>> doesn't seem to be the case.
>>
>> My next idea is to look at why 
>> org.elasticsearch.common.io.ThrowableObjectInputStream.loadClass is 
>> invoking WebappClassLoader instead of our custom ComponentLoader and coerce 
>> it to use ours. 
>>
>> But I wanted to get some opinions on this strategy as I am new to 
>> elasticsearch ;)
>>
>> my thanks,
>> Earle
>>
>>
>> Full stack trace:
>>
>> 2014-08-05 01:25:26,685 WARN elasticsearch[app02][generic][T#1] 
>> org.elasticsearch.indices.cluster - [app02] [sakai_index][0] failed to 
>> start shard 
>> org.elasticsearch.indices.recovery.RecoveryFailedException: 
>> [sakai_index][0]: Recovery failed from 
>> [app01][0YtJIFeHSuehUfjjfMgv6A][ip-10-93-162-196][inet[/10.93.162.196:9300]]{local=false}
>> into 
>> [app02][yshRulD0QjaNyu40Z6EGWQ][ip-10-7-174-145][inet[/10.7.174.145:9300]]{local=false}
>> at org.elasticsearch.indices.recovery.RecoveryTarget.doRecovery(RecoveryTarget.java:307)
>> at org.elasticsearch.indices.recovery.RecoveryTarget.access$300(RecoveryTarget.java:65)
>> at org.elasticsearch.indices.recovery.RecoveryTarget$2.run(RecoveryTarget.java:175)
>> at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
>> at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
>> at java.lang.Thread.run(Thread.java:745) 
>> Caused by: org.elasticsearch.transport.RemoteTransportException: Failed 
>> to deserialize exception response from stream 
>> Caused by: org.elasticsearch.transport.TransportSerializationException: 
>> Failed to deserialize exception response from stream 
>> at org.elasticsearch.transport.netty.MessageChannelHandler.handlerResponseError(MessageChannelHandler.java:169)
>> at org.elasticsearch.transport.netty.MessageChannelHandler.messageReceived(MessageChannelHandler.java:123)
>> at org.elasticsearch.common.netty.channel.SimpleChannelUpstreamHandler.handle

Re: Searching email

I haven't tested such an aggregation, but I think the way I wrote it should 
give you the oldest email that matches the request from each thread. I'm not 
sure how they will be sorted ...  
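
What the terms-plus-min aggregation quoted below computes can be sketched in plain Python. This is a minimal illustration on made-up hits, assuming (as stated later in the thread) that thread and email ids are integers and that a lower `email_id` means an earlier message; `first_email_per_thread` is a hypothetical helper.

```python
def first_email_per_thread(hits):
    """For each thread_id, keep the smallest email_id among the matching hits."""
    first = {}
    for hit in hits:
        thread_id = hit["thread_id"]
        if thread_id not in first or hit["email_id"] < first[thread_id]:
            first[thread_id] = hit["email_id"]
    return first


# Made-up matching messages, for illustration only.
hits = [
    {"thread_id": 1, "email_id": 12},
    {"thread_id": 1, "email_id": 10},
    {"thread_id": 2, "email_id": 11},
]
first = first_email_per_thread(hits)
print(first)
```

The result carries only ids, not relevance scores, so it says nothing about how the threads should be ranked against each other.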

On Wednesday, August 6, 2014, 17:31:56 UTC+3, Mark Fletcher wrote:
>
> Thanks again for your response. I don't have much experience with 
> aggregations, but wouldn't that just give me a set of thread id's ordered 
> by how many messages are in each thread? In my results, it's possible to 
> have a match on a message body be ranked higher than a match on a subject. 
> Using this aggregation, wouldn't this just end up showing all subject 
> matches first?
>
> Thanks,
> Mark
>
>
> On Wed, Aug 6, 2014 at 7:08 AM, Tihomir Lichev  > wrote:
>
>> So you should be able to use aggregation to get the first email from each 
>> thread.
>> Kind of :
>>
>> {
>>   "aggs": {
>>     "threads": {
>>       "terms": {
>>         "field": "thread_id"
>>       },
>>       "aggs": {
>>         "first_email": {
>>           "min": {
>>             "field": "email_id"
>>           }
>>         }
>>       }
>>     }
>>   }
>> }
>>
>> On Wednesday, August 6, 2014, 17:02:21 UTC+3, Mark Fletcher wrote:
>>
>>> Each thread has a unique integer id (so, every message in a given thread 
>>> has a particular thread id). And each email has a unique integer id as 
>>> well. 
>>>
>>> On Wednesday, August 6, 2014 6:59:36 AM UTC-7, Tihomir Lichev wrote:

>>>> So how can you distinguish the first email from any thread?
>>>> Do you have some additional parameter?
>>>>
>>>> On Wednesday, August 6, 2014, 16:56:10 UTC+3, Mark Fletcher wrote:
>>>>>
>>>>> Thanks for your response. If I do as you suggested, a subject match 
>>>>> will return all the messages in that thread (because they all match). I 
>>>>> want the search results to only contain one result if there's a thread 
>>>>> match. 
>>>>>
>>>>> I suppose I could just grab all the results and then 'collapse' the 
>>>>> thread matches, but I was hoping to be able to do something better.
>>>>>
>>>>> Thanks,
>>>>> Mark
>>>>>
>>>>> On Tuesday, August 5, 2014 10:12:32 PM UTC-7, Tihomir Lichev wrote:
>>>>>>
>>>>>> Isn't it better to create a single document for each mail with fields 
>>>>>> "subject" and "body" (and whatever else you need from the mail)?
>>>>>> This way you can search by any or all of the fields, and you can also 
>>>>>> define boosting for each field. For instance, when your search matches 
>>>>>> the subject, the mail will be scored higher in the result than if it 
>>>>>> matches the body, and you will get a single set of results.
>>>>>>
>>>>>> On Wednesday, August 6, 2014, 02:12:52 UTC+3, Mark Fletcher wrote:
>>>>>>>
>>>>>>> Hi,
>>>>>>>
>>>>>>> We're using ES to index email, specifically mailing list messages. 
>>>>>>> We'd like search to work similar to Gmail in that we'd like to match 
>>>>>>> on either the subject or body of the email, and if it matches on the 
>>>>>>> subject, we only want to display one result for that match (say the 
>>>>>>> first message in that thread). In our naive implementation, we have 
>>>>>>> an ES index for subjects and another for message bodies. But that 
>>>>>>> gets us two sets of results, not combined. Is there a better way to 
>>>>>>> structure the data, or a query that we're missing so that we get one 
>>>>>>> set of combined results?
>>>>>>>
>>>>>>> Thanks,
>>>>>>> Mark
>>>>>>>
>
>


Re: Searching email

2014-08-06 Thread Mark Fletcher

Thanks again for your response. I don't have much experience with
aggregations, but wouldn't that just give me a set of thread id's ordered
by how many messages are in each thread? In my results, it's possible to
have a match on a message body be ranked higher than a match on a subject.
Using this aggregation, wouldn't this just end up showing all subject
matches first?

Thanks,
Mark


On Wed, Aug 6, 2014 at 7:08 AM, Tihomir Lichev  wrote:

> So you should be able to use aggregation to get the first email from each
> thread.
> Kind of :
>
> {
>   "aggs": {
>     "threads": {
>       "terms": {
>         "field": "thread_id"
>       },
>       "aggs": {
>         "first_email": {
>           "min": {
>             "field": "email_id"
>           }
>         }
>       }
>     }
>   }
> }
>
> On Wednesday, 6 August 2014, 17:02:21 UTC+3, Mark Fletcher wrote:
>
>> Each thread has a unique integer id (so, every message in a given thread
>> has a particular thread id). And each email has a unique integer id as
>> well.
>>
>> On Wednesday, August 6, 2014 6:59:36 AM UTC-7, Tihomir Lichev wrote:
>>>
>>> So how you can distinguish the first email from any thread ?
>>> Do you have some additional parameter ?
>>>
>>> On Wednesday, 6 August 2014, 16:56:10 UTC+3, Mark Fletcher wrote:
>>>>
>>>> Thanks for your response. If I do as you suggested, a subject match
>>>> will return all the messages in that thread (because they all match). I
>>>> want the search results to only contain one result if there's a thread
>>>> match.
>>>>
>>>> I suppose I could just grab all the results and then 'collapse' the
>>>> thread matches, but I was hoping to be able to do something better.
>>>>
>>>> Thanks,
>>>> Mark
>>>>
>>>> On Tuesday, August 5, 2014 10:12:32 PM UTC-7, Tihomir Lichev wrote:
>>>>>
>>>>> Isn't better to create single document for each mail with fields
>>>>> "subject" and "body" (and whatever else you need from the mail) ?
>>>>> This way you can search by any or all of the fields, also you can
>>>>> define boosting for each field. For instance when your search matches the
>>>>> subject the mail will be scored higher in the result than if it matches
>>>>> the body, and you will get single set of results.
>>>>>
>>>>> On Wednesday, 6 August 2014, 02:12:52 UTC+3, Mark Fletcher wrote:
>>>>>>
>>>>>> Hi,
>>>>>>
>>>>>> We're using ES to index email, specifically mailing list messages.
>>>>>> We'd like search to work similar to Gmail in that we'd like to match on
>>>>>> either the subject or body of the email, and if it matches on the
>>>>>> subject, we only want to display one result for that match (say the
>>>>>> first message in that thread). In our naive implementation, we have an
>>>>>> ES index for subjects and another for message bodies. But that gets us
>>>>>> two sets of results, not combined. Is there a better way to structure
>>>>>> the data, or a query that we're missing so that we get one set of
>>>>>> combined results?
>>>>>>
>>>>>> Thanks,
>>>>>> Mark
>>>>>>

elasticsearch cluster spreading the bulk tasks

2014-08-06 Thread Pavel P

Hi,

Could someone clarify the following for me?

When I have an ES cluster consisting of 2 machines, how should I send 
bulk index requests to it?

1. Do I understand correctly that I can send everything to any node, and 
the indexing work is then spread across the cluster automatically?
2. Or do I need to put a load balancer in front of the cluster so that each 
node receives a share of the indexing load?

How is it supposed to work by design?

Currently I use a load balancer in front of my two instances, and as I see in 
Bigdesk, the bulk queue keeps growing on the master node, while the slave 
node is nearly idle.

Master node:




Slave node:




Is it OK that my ES cluster of 2 c3.large machines (4 CPU, 8 GB memory) 
can only index 13k small documents per 10 seconds (I use it as the output 
for Logstash)?
What performance should I expect?

Regards,

transport client? really?

2014-08-06 Thread Luis García Acosta

Hi folks,

The question is: what client are you using out there?

Here at company X we have many different Java applications using 
Elasticsearch, and they all use the transport client. This decision was made 
for the developers, given the ease of use that the transport client provides. 
BUT for upgrading Elasticsearch this is a pain in the ass, because every time 
we upgrade the cluster, the transport client has to be upgraded too, and this 
is hard to orchestrate with zero downtime (we have to redeploy many 
applications with the newest transport client).

Is anyone out there familiar with this situation? What are you using?

Thanks

L

Re: Searching email

You should be able to use an aggregation to get the first email from each 
thread.
Something like:

{
  "aggs": {
    "threads": {
      "terms": {
        "field": "thread_id"
      },
      "aggs": {
        "first_email": {
          "min": {
            "field": "email_id"
          }
        }
      }
    }
  }
}

On Wednesday, 6 August 2014, 17:02:21 UTC+3, Mark Fletcher wrote:
>
> Each thread has a unique integer id (so, every message in a given thread 
> has a particular thread id). And each email has a unique integer id as 
> well. 
>
> On Wednesday, August 6, 2014 6:59:36 AM UTC-7, Tihomir Lichev wrote:
>>
>> So how you can distinguish the first email from any thread ?
>> Do you have some additional parameter ?
>>
>> On Wednesday, 6 August 2014, 16:56:10 UTC+3, Mark Fletcher wrote:
>>>
>>> Thanks for your response. If I do as you suggested, a subject match will 
>>> return all the messages in that thread (because they all match). I want the 
>>> search results to only contain one result if there's a thread match. 
>>>
>>> I suppose I could just grab all the results and then 'collapse' the 
>>> thread matches, but I was hoping to be able to do something better.
>>>
>>> Thanks,
>>> Mark
>>>
>>> On Tuesday, August 5, 2014 10:12:32 PM UTC-7, Tihomir Lichev wrote:
>>>>
>>>> Isn't better to create single document for each mail with fields 
>>>> "subject" and "body" (and whatever else you need from the mail) ?
>>>> This way you can search by any or all of the fields, also you can 
>>>> define boosting for each field. For instance when your search matches the 
>>>> subject the mail will be scored higher in the result than if it matches 
>>>> the body, and you will get single set of results.
>>>>
>>>> On Wednesday, 6 August 2014, 02:12:52 UTC+3, Mark Fletcher wrote:
>>>>>
>>>>> Hi,
>>>>>
>>>>> We're using ES to index email, specifically mailing list messages. 
>>>>> We'd like search to work similar to Gmail in that we'd like to match on 
>>>>> either the subject or body of the email, and if it matches on the 
>>>>> subject, we only want to display one result for that match (say the 
>>>>> first message in that thread). In our naive implementation, we have an 
>>>>> ES index for subjects and another for message bodies. But that gets us 
>>>>> two sets of results, not combined. Is there a better way to structure 
>>>>> the data, or a query that we're missing so that we get one set of 
>>>>> combined results?
>>>>>
>>>>> Thanks,
>>>>> Mark
>>>>>


Re: Searching email

2014-08-06 Thread Mark Fletcher

Each thread has a unique integer id (so, every message in a given thread 
has a particular thread id). And each email has a unique integer id as 
well. 

On Wednesday, August 6, 2014 6:59:36 AM UTC-7, Tihomir Lichev wrote:
>
> So how you can distinguish the first email from any thread ?
> Do you have some additional parameter ?
>
>> On Wednesday, 6 August 2014, 16:56:10 UTC+3, Mark Fletcher wrote:
>>
>> Thanks for your response. If I do as you suggested, a subject match will 
>> return all the messages in that thread (because they all match). I want the 
>> search results to only contain one result if there's a thread match. 
>>
>> I suppose I could just grab all the results and then 'collapse' the 
>> thread matches, but I was hoping to be able to do something better.
>>
>> Thanks,
>> Mark
>>
>> On Tuesday, August 5, 2014 10:12:32 PM UTC-7, Tihomir Lichev wrote:
>>>
>>> Isn't better to create single document for each mail with fields 
>>> "subject" and "body" (and whatever else you need from the mail) ?
>>> This way you can search by any or all of the fields, also you can define 
>>> boosting for each field. For instance when your search matches the subject 
>>> the mail will be scored higher in the result than if it matches the body, 
>>> and you will get single set of results.
>>>
>>> On Wednesday, 6 August 2014, 02:12:52 UTC+3, Mark Fletcher wrote:
>>>>
>>>> Hi,
>>>>
>>>> We're using ES to index email, specifically mailing list messages. We'd 
>>>> like search to work similar to Gmail in that we'd like to match on either 
>>>> the subject or body of the email, and if it matches on the subject, we 
>>>> only want to display one result for that match (say the first message in 
>>>> that thread). In our naive implementation, we have an ES index for 
>>>> subjects and another for message bodies. But that gets us two sets of 
>>>> results, not combined. Is there a better way to structure the data, or a 
>>>> query that we're missing so that we get one set of combined results?
>>>>
>>>> Thanks,
>>>> Mark
>>>>

Re: Searching email

So how can you distinguish the first email in a thread?
Do you have some additional field for that?

On Wednesday, 6 August 2014, 16:56:10 UTC+3, Mark Fletcher wrote:
>
> Thanks for your response. If I do as you suggested, a subject match will 
> return all the messages in that thread (because they all match). I want the 
> search results to only contain one result if there's a thread match. 
>
> I suppose I could just grab all the results and then 'collapse' the thread 
> matches, but I was hoping to be able to do something better.
>
> Thanks,
> Mark
>
> On Tuesday, August 5, 2014 10:12:32 PM UTC-7, Tihomir Lichev wrote:
>>
>> Isn't better to create single document for each mail with fields 
>> "subject" and "body" (and whatever else you need from the mail) ?
>> This way you can search by any or all of the fields, also you can define 
>> boosting for each field. For instance when your search matches the subject 
>> the mail will be scored higher in the result than if it matches the body, 
>> and you will get single set of results.
>>
>> On Wednesday, 6 August 2014, 02:12:52 UTC+3, Mark Fletcher wrote:
>>>
>>> Hi,
>>>
>>> We're using ES to index email, specifically mailing list messages. We'd 
>>> like search to work similar to Gmail in that we'd like to match on either 
>>> the subject or body of the email, and if it matches on the subject, we only 
>>> want to display one result for that match (say the first message in that 
>>> thread). In our naive implementation, we have an ES index for subjects and 
>>> another for message bodies. But that gets us two sets of results, not 
>>> combined. Is there a better way to structure the data, or a query that 
>>> we're missing so that we get one set of combined results?
>>>
>>> Thanks,
>>> Mark
>>>
>>

Re: Searching email

2014-08-06 Thread Mark Fletcher

Thanks for your response. If I do as you suggested, a subject match will 
return all the messages in that thread (because they all match). I want the 
search results to only contain one result if there's a thread match. 

I suppose I could just grab all the results and then 'collapse' the thread 
matches, but I was hoping to be able to do something better.
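
The client-side 'collapse' mentioned above could look roughly like the sketch below. It assumes each hit is a dict carrying `thread_id` and `email_id` fields (names borrowed from later in this thread, but hypothetical as document fields) and that the hits arrive sorted best-first, as a search response would return them. The function name is illustrative only.

```python
def collapse_by_thread(hits):
    """Keep only the highest-ranked hit per thread, preserving rank order.

    `hits` is assumed to be a list of dicts sorted best-first, each with a
    hypothetical "thread_id" key.
    """
    seen = set()
    collapsed = []
    for hit in hits:
        thread_id = hit["thread_id"]
        if thread_id in seen:
            continue  # a better-ranked message from this thread was already kept
        seen.add(thread_id)
        collapsed.append(hit)
    return collapsed


# Made-up sample hits, already ordered by descending score.
hits = [
    {"email_id": 12, "thread_id": 1, "score": 3.2},
    {"email_id": 40, "thread_id": 2, "score": 2.9},
    {"email_id": 10, "thread_id": 1, "score": 2.5},
]
print([h["email_id"] for h in collapse_by_thread(hits)])
```

The obvious drawback is paging: to fill a page of ten collapsed results, more than ten raw hits may have to be fetched.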

Thanks,
Mark

On Tuesday, August 5, 2014 10:12:32 PM UTC-7, Tihomir Lichev wrote:
>
> Isn't better to create single document for each mail with fields "subject" 
> and "body" (and whatever else you need from the mail) ?
> This way you can search by any or all of the fields, also you can define 
> boosting for each field. For instance when your search matches the subject 
> the mail will be scored higher in the result than if it matches the body, 
> and you will get single set of results.
>
> On Wednesday, 6 August 2014, 02:12:52 UTC+3, Mark Fletcher wrote:
>>
>> Hi,
>>
>> We're using ES to index email, specifically mailing list messages. We'd 
>> like search to work similar to Gmail in that we'd like to match on either 
>> the subject or body of the email, and if it matches on the subject, we only 
>> want to display one result for that match (say the first message in that 
>> thread). In our naive implementation, we have an ES index for subjects and 
>> another for message bodies. But that gets us two sets of results, not 
>> combined. Is there a better way to structure the data, or a query that 
>> we're missing so that we get one set of combined results?
>>
>> Thanks,
>> Mark
>>
>

Re: leave content in mySQL and use ElasticSearch only for Index

2014-08-06 Thread Andrej Rosenheinrich

What I don't understand is why you generate an index yourself and want to 
store it in Elasticsearch. You could use the plugin as Jörg suggested, 
transfer your data to Elasticsearch, set index:true for the fields you want 
and set store:false in the mapping. This way you get an index built by 
Elasticsearch, can search on it and get the id as the result, and the data is 
not stored (except metadata, if you set it to be stored). See 
http://www.elasticsearch.org/guide/en/elasticsearch/reference/current/mapping-core-types.html#mapping-core-types.
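
As a rough illustration of the mapping described above, the sketch below builds a mapping body as a Python dict. The type name `mail`, the field names and the `mysql_id` reference field are all hypothetical. It assumes the 1.x-era mapping syntax, where a string field's `index` setting takes `analyzed`, `not_analyzed` or `no`, and `store` is a boolean. Disabling `_source` is an extra assumption added here, since otherwise the original JSON document is still kept alongside the index.

```python
import json


def build_mail_mapping():
    """Return a mapping that indexes mail text without keeping the content."""
    return {
        "mail": {  # hypothetical type name
            # Assumption: drop the original JSON so the body is not kept in ES.
            "_source": {"enabled": False},
            "properties": {
                "subject": {"type": "string", "index": "analyzed", "store": False},
                "body": {"type": "string", "index": "analyzed", "store": False},
                # Hypothetical stored reference back to the row in MySQL.
                "mysql_id": {"type": "long", "store": True},
            },
        }
    }


print(json.dumps(build_mail_mapping(), indent=2))
```

Using the MySQL primary key as the document id would probably make the separate `mysql_id` field unnecessary.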

Cheers,
Andrej

On Wednesday, 6 August 2014 15:34:11 UTC+2, asekn...@gmail.com wrote:
>
> Using this plugin would lead to a migration from mysql data into 
> Elasticsearch.
>
> So let me reformulate my question:
>
> My infrastructure is like this:
>
> client ---> Elasticsearch
>    |
>    |
>    +------> mySQL
>
> So I have a client which generates an index and some metadata for a 
> mail(header and body). The mail is stored in mySQL. And the client-side 
> generated index and metadata is stored in Elasticsearch.
>
> The reason is because I have > 1 TB of mail content every day. This 
> content shall still be written to mySQL. Elasticsearch shall keep only the 
> index. Is that possible? And how?
>
> Regards
> Michael
>
>
> On Wednesday, 6 August 2014 13:21:09 UTC+2, asekn...@gmail.com wrote:
>>
>> Hello,
>>
>> I want to use Elasticsearch or only indexing and searching E-Mails. We 
>> want to store the meta-info within Elasticsearch, keeping the content/body 
>> of every Mail in an mySQL database. So Elasticsearch shall have a reference 
>> to the mail body.
>>
>> Is that possible and how?
>>
>> Regards
>> Michael
>>
>

Re: leave content in mySQL and use ElasticSearch only for Index

2014-08-06 Thread aseknoppik

Using this plugin would mean migrating the MySQL data into 
Elasticsearch.

So let me rephrase my question:

My infrastructure looks like this:

client ---> Elasticsearch
   |
   |
   +------> mySQL

So I have a client which generates an index and some metadata for a 
mail (header and body). The mail is stored in mySQL, and the client-side 
generated index and metadata are stored in Elasticsearch.

The reason is that I have > 1 TB of mail content every day. This content 
shall still be written to mySQL. Elasticsearch shall keep only the index. 
Is that possible? And how?

Regards
Michael


On Wednesday, 6 August 2014 13:21:09 UTC+2, asekn...@gmail.com wrote:
>
> Hello,
>
> I want to use Elasticsearch or only indexing and searching E-Mails. We 
> want to store the meta-info within Elasticsearch, keeping the content/body 
> of every Mail in an mySQL database. So Elasticsearch shall have a reference 
> to the mail body.
>
> Is that possible and how?
>
> Regards
> Michael
>

How to install ELK stack, Logforwarder(nxlog),Redis on Windows ?

2014-08-06 Thread Dinesh Bandaru

Hi All,

I followed the link below and was able to set up the ELK stack in my test 
environment, but the setup it describes needs more modifications.  ++Link: 
http://community.ulyaoth.net/threads/how-to-install-logstash-on-a-windows-server-with-kibana-in-iis.17/
How do I add filters such as extension, geoip and others on Windows 
machines? Also, I need a better logstash.conf for parsing IIS logs, 
event logs and other types of logs. 
Basically, I need the steps to install 
Logstashforwarder->redis->logstash->elasticsearch->kibana on Windows 
Server or Windows 7.


Thanks 
Dinesh

Re: How to install ELK stack, Logforwarder(nxlog),Redis on Windows ?

2014-08-06 Thread Dinesh Bandaru


++Link: 
http://community.ulyaoth.net/threads/how-to-install-logstash-on-a-windows-server-with-kibana-in-iis.17/


On Wednesday, August 6, 2014 6:25:50 PM UTC+5:30, Dinesh Bandaru wrote:
>
> I followed below link and I was able to setup ELK stack on my test 
> environment, but below link requires more modifications.  
> How to add filters like extension,geoip and many more filters on Windows 
> platform machines. Also, I need better logstash.conf for parsing IIS logs, 
> event logs, all types of logs. 
> Basically, I need steps to install the 
> -Logstashforwarder->redis->logstash->elasticsearch->kibana on Windows 
> server or windows 7?
>
>
> Thanks 
> Dinesh
>

How to install ELK stack, Logforwarder(nxlog),Redis on Windows ?

2014-08-06 Thread Dinesh Bandaru

I followed the link below and was able to set up the ELK stack in my test 
environment, but the setup it describes needs more modifications.  
How do I add filters such as extension, geoip and others on Windows 
machines? Also, I need a better logstash.conf for parsing IIS logs, 
event logs and other types of logs. 
Basically, I need the steps to install 
Logstashforwarder->redis->logstash->elasticsearch->kibana on Windows 
Server or Windows 7.


Thanks 
Dinesh

Re: leave content in mySQL and use ElasticSearch only for Index

Have a look at the JDBC plugin. With that plugin, you can push metadata
from MySQL to Elasticsearch.

https://github.com/jprante/elasticsearch-river-jdbc

Jörg


On Wed, Aug 6, 2014 at 1:21 PM,  wrote:

> Hello,
>
> I want to use Elasticsearch or only indexing and searching E-Mails. We
> want to store the meta-info within Elasticsearch, keeping the content/body
> of every Mail in an mySQL database. So Elasticsearch shall have a reference
> to the mail body.
>
> Is that possible and how?
>
> Regards
> Michael
>

leave content in mySQL and use ElasticSearch only for Index

2014-08-06 Thread aseknoppik

Hello,

I want to use Elasticsearch only for indexing and searching e-mails. We want 
to store the meta-info in Elasticsearch, keeping the content/body of 
every mail in a mySQL database. So Elasticsearch shall hold a reference to 
the mail body.

Is that possible, and how?

Regards
Michael

Re: Sorting Problem, ClassCastException.

2014-08-06 Thread Ian Harrigan


I'm getting the exact same problem, on ES version 1.2.1. If I use 
something like this:


{
  "from" : 0,
  "size" : 10,
  "query" : {
    "match_all" : { }
  }
}



All is fine. However, if I add a sort to it, e.g.:

{
  "from" : 0,
  "size" : 10,
  "query" : {
    "match_all" : { }
  },
  "sort" : [ {
    "dateTime" : {
      "order" : "desc"
    }
  } ]
}



Then I get:

[2014-08-06 12:31:01,685][DEBUG][action.search.type   ] [Maur-Konn] [portal-boots-2014-08][0]: Failed to execute [org.elasticsearch.action.search.SearchRequest@558af98a] while moving to second phase
java.lang.ClassCastException: java.lang.Long cannot be cast to org.apache.lucene.util.BytesRef
    at org.apache.lucene.util.BytesRef.compareTo(BytesRef.java:33)
    at org.apache.lucene.search.FieldComparator.compareValues(FieldComparator.java:221)
    at org.elasticsearch.search.controller.ShardFieldDocSortedHitQueue.lessThan(ShardFieldDocSortedHitQueue.java:118)
    at org.elasticsearch.search.controller.ShardFieldDocSortedHitQueue.lessThan(ShardFieldDocSortedHitQueue.java:35)
    at org.apache.lucene.util.PriorityQueue.insertWithOverflow(PriorityQueue.java:159)
    at org.elasticsearch.search.controller.SearchPhaseController.sortDocs(SearchPhaseController.java:360)
    at org.elasticsearch.search.controller.SearchPhaseController.sortDocs(SearchPhaseController.java:146)
    at org.elasticsearch.action.search.type.TransportSearchQueryThenFetchAction$AsyncAction.moveToSecondPhase(TransportSearchQueryThenFetchAction.java:85)
    at org.elasticsearch.action.search.type.TransportSearchTypeAction$BaseAsyncAction.innerMoveToSecondPhase(TransportSearchTypeAction.java:404)
    at org.elasticsearch.action.search.type.TransportSearchTypeAction$BaseAsyncAction.onFirstPhaseResult(TransportSearchTypeAction.java:198)
    at org.elasticsearch.action.search.type.TransportSearchTypeAction$BaseAsyncAction$1.onResult(TransportSearchTypeAction.java:174)
    at org.elasticsearch.action.search.type.TransportSearchTypeAction$BaseAsyncAction$1.onResult(TransportSearchTypeAction.java:171)
    at org.elasticsearch.search.action.SearchServiceTransportAction$23.run(SearchServiceTransportAction.java:526)
    at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
    at java.lang.Thread.run(Thread.java:744)



This seems to have something to do with using multiple indexes: when I run 
the query with the Inquisitor plugin against a single index it works fine. 
Is there a workaround for this? It is causing a major problem in our 
production environment. 
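
One possible cause, which nothing above confirms, is that `dateTime` is mapped with different types in different indices (for example a date in one and a string in another), so the shards return sort values that cannot be compared with each other. The sketch below checks for that. It assumes the field types per index have already been extracted from the mapping API into a plain dict; the helper name and the `portal-boots-2014-07` index name are hypothetical.

```python
def find_type_conflicts(field_types_by_index):
    """Return {field: {type: [indices]}} for fields mapped with more than one type.

    `field_types_by_index` is assumed to look like
    {"index-a": {"dateTime": "date"}, "index-b": {"dateTime": "string"}}.
    """
    by_field = {}
    for index, fields in field_types_by_index.items():
        for field, field_type in fields.items():
            by_field.setdefault(field, {}).setdefault(field_type, []).append(index)
    # Keep only fields that appear with at least two distinct types.
    return {f: types for f, types in by_field.items() if len(types) > 1}


# Made-up example data: the same field mapped differently in two indices.
mappings = {
    "portal-boots-2014-07": {"dateTime": "string", "user": "string"},
    "portal-boots-2014-08": {"dateTime": "date", "user": "string"},
}
print(find_type_conflicts(mappings))
```

If such a conflict shows up, reindexing the odd index with a consistent mapping would be the likely fix.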

Thanks in advance,
Ian

Total TTL confusion

2014-08-06 Thread Dennis de Boer

I'm getting regular TTL exceptions in my Elasticsearch setup while
updating documents. I want to figure out exactly how TTL works.
The more I read about it, the more confusing it gets.

I have several questions and hope someone is willing to answer them:

*** Using Elasticsearch 0.90.5
*** In my mapping I set the _ttl to 30 days using "_ttl": { "enabled":
true, "default": "30d" }. I can confirm this works by querying the _mapping
endpoint. ***

1) when I update an existing document, will the ttl be updated or not?
http://www.elasticsearch.org/guide/en/elasticsearch/reference/0.90/docs-update.html

and also https://github.com/elasticsearch/elasticsearch/issues/3180 make
reference to the fact that the TTL will not be updated when you do an
update request. Which makes sense since you basically do a delete and
reindex for that document. However, on
http://www.elasticsearch.org/guide/en/elasticsearch/reference/0.90/mapping-ttl-field.html

it states :

*Note that the expiration procedure handle versioning properly so if a
document is updated between the collection of documents to expire and the
delete order, the document won’t be deleted.*

Do the two resources conflict with each other?

2) Since I have a default TTL of 30 days, I would expect my documents to
have a TTL of 2592000000 ms at most. However, when I query for all the TTLs
in my index (using a statistical facet, for example), the minimum value I get
is 1407373291910 ms, which is over 44 years!
http://www.elasticsearch.org/guide/en/elasticsearch/reference/0.90/mapping-ttl-field.html

states that

*If no default is set and no _ttl value is given then the document has an
infinite _ttl and will not expire.*

Is this what happened here? Does the update query not (correctly / at all)
set the ttl?
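
A quick arithmetic check may help with question 2. The sketch below converts 30 days to milliseconds and reads the observed minimum as milliseconds since the Unix epoch. Treating the stored value as an absolute expiry timestamp rather than a duration is an assumption here, not something the quoted documentation states.

```python
from datetime import datetime, timedelta, timezone

# 30 days expressed in milliseconds.
thirty_days_ms = int(timedelta(days=30).total_seconds() * 1000)
print(thirty_days_ms)

# Assumption: the facet value is an absolute time in epoch milliseconds.
observed_min_ms = 1407373291910
epoch = datetime(1970, 1, 1, tzinfo=timezone.utc)
as_date = epoch + timedelta(milliseconds=observed_min_ms)
print(as_date.isoformat())
```

Read that way, the value falls on 7 August 2014, one day after this message was posted, which would fit a document indexed roughly 30 days earlier with the default TTL.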

3)
http://www.elasticsearch.org/guide/en/elasticsearch/reference/0.90/docs-index_.html#index-ttl

states:

*The expiration date that will be set for a document with a provided ttl is
relative to the timestamp of the document*

However, I have NOT set a timestamp OR enabled it in my mapping. As far as
I know my documents do not have any timestamp.
How does Elasticsearch compute the expiration date for these documents?

Any help would be much appreciated. Thank you!

Re: System Requirements for ElasticSearch stack

There is no "one size fits all", no strict measure for RAM, CPU cores,
shard/node. This all depends on your testing results and your requirements.
Do not trust other test results more than your own.

You can index 2 GB with Elasticsearch in a few minutes, using commodity
hardware. Do not expect problems here.

For sizing, you should also take into consideration the total volume you
have to keep (for disk space setup) and how much query workload you need to
serve (for replicas). The number of users is a hint, but it depends on the
type of query too (filters, aggregations, etc.)

For fault tolerance, you should take into consideration the availability of
the system. If you don't care, one node might be sufficient, but production
should at least use three nodes, for better fault tolerance.

Jörg


On Wed, Aug 6, 2014 at 6:02 AM, Gopinath Nallappan <
gopinathnallap...@gmail.com> wrote:

> Hello all,
>
> I'm new to the ELK stack. I will be logging Windows Events, Syslogs from
> firewalls, routers etc into my elasticsearch.
>
> I am expecting daily data of around 2GB to be logged into my elasticsearch
> server. I will be creating indices on daily or weekly basis.
>
> And my logs are going to be stored for at least a year online and offline
> after that.
>
> I have been looking around and also searched this forum, but I was not
> able to find a definitive guide that explained how to design the
> architecture - RAM, # of CPU cores, # of Elasticsearch nodes and shards /
> node.
>
> The system will be mainly used for logging purposes only. So there won't
> be that many concurrent users.
>
> Appreciate any pointers on best practices in setting up the Elasticsearch
> deployment.
>
> Thanks,
> Gopinath
>

Re: Being able to search the documentation on the elasticsearch website would be great

2014-08-06 Thread Robert Gardam

Hehehe, this looks new! ;-) 


Maybe I shouldn't post questions when I'm tired! Apologies. :)

On Wednesday, August 6, 2014 11:03:40 AM UTC+2, Tihomir Lichev wrote:
>
> The big search field at the top
>
> On Wednesday, August 6, 2014 12:02:31 UTC+3, Tihomir Lichev wrote:
>>
>> Perhaps you should give it a try here :
>> http://www.elasticsearch.org/guide/
>> ;-)
>>
>> On Wednesday, August 6, 2014 11:14:09 UTC+3, Robert Gardam wrote:
>>>
>>> I don't mean to bash elasticsearch, but for a company that writes a 
>>> search engine, it would be great to be able to search your 
>>> documentation.
>>>
>>

Re: Being able to search the documentation on the elasticsearch website would be great

The big search field at the top

On Wednesday, August 6, 2014 12:02:31 UTC+3, Tihomir Lichev wrote:
>
> Perhaps you should give it a try here :
> http://www.elasticsearch.org/guide/
> ;-)
>
> On Wednesday, August 6, 2014 11:14:09 UTC+3, Robert Gardam wrote:
>>
>> I don't mean to bash elasticsearch, but for a company that writes a 
>> search engine, it would be great to be able to search your 
>> documentation.
>>
>

Re: Being able to search the documentation on the elasticsearch website would be great

2014-08-06 Thread 'Sandeep Ramesh Khanzode' via elasticsearch

Perhaps you should give it a try here :
http://www.elasticsearch.org/guide/
;-)

On Wednesday, August 6, 2014 11:14:09 UTC+3, Robert Gardam wrote:
>
> I don't mean to bash elasticsearch, but for a company that writes a search 
> engine, it would be great to be able to search your documentation.
>

Re: Aggregate results over multiple indices

2014-08-06 Thread Alexander Reelsen

Hey,

sure you can, just specify those indices in your request, see
http://www.elasticsearch.org/guide/en/elasticsearch/reference/current/multi-index.html
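
A minimal sketch of what such a request could look like over REST: the indices are joined with commas in the URL path, and the sort and aggregation in the body run across the combined results. The index names and field names below are hypothetical, and the helper only builds the path and body without sending anything. In the Java API the equivalent should be passing several index names to `prepareSearch`, which takes the indices as varargs.

```python
import json


def multi_index_search(indices, agg_field, sort_field):
    """Build the path and JSON body for one search spanning several indices."""
    path = "/" + ",".join(indices) + "/_search"
    body = {
        "query": {"match_all": {}},
        # Sorting is applied to the merged hits from all listed indices.
        "sort": [{sort_field: {"order": "desc"}}],
        # A terms aggregation computed over documents from all listed indices.
        "aggs": {"by_" + agg_field: {"terms": {"field": agg_field}}},
    }
    return path, json.dumps(body)


# Hypothetical index and field names, for illustration only.
path, body = multi_index_search(["index1", "index2", "index3"], "status", "created")
print(path)
print(body)
```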


--Alex


On Wed, Aug 6, 2014 at 10:07 AM, 'Sandeep Ramesh Khanzode' via
elasticsearch wrote:

> Hi,
>
> If I have three different indices with the same schema mapping for a type,
> can I use the SearchRequestBuilder (or any other class) to simultaneously
> query all three indices and have ElasticSearch perform aggregations/sorts
> on the results from all three?
>
> Thanks,
> Sandeep
>

Being able to search the documentation on the elasticsearch website would be great

2014-08-06 Thread Robert Gardam

I don't mean to bash elasticsearch, but for a company that writes a search 
engine, it would be great to be able to search your documentation.

Aggregate results over multiple indices

Hi,

If I have three different indices with the same schema mapping for a type, 
can I use the SearchRequestBuilder (or any other class) to simultaneously 
query all three indices and have ElasticSearch perform aggregations/sorts 
on the results from all three?

Thanks,
Sandeep

Re: can i set default mapping for field store？

Have a look here:
http://www.elasticsearch.org/guide/en/elasticsearch/reference/current/mapping-root-object-type.html#_dynamic_templates

You can set default properties for your fields
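
For example, a sketch of a mapping that disables `_source` and uses a catch-all dynamic template to store every dynamically mapped field. The type name `my_type` and the template name `store_all` are made up for illustration, and the syntax follows the 1.x-era reference linked above, so check it against the version you run:

```python
import json

# Index-creation body: _source is disabled and every dynamically
# detected field gets "store": true through a catch-all template.
mapping = {
    "mappings": {
        "my_type": {  # hypothetical type name
            "_source": {"enabled": False},
            "dynamic_templates": [
                {
                    "store_all": {  # hypothetical template name
                        "match": "*",
                        "mapping": {"store": True},
                    }
                }
            ],
        }
    }
}

print(json.dumps(mapping, indent=2))
```

The printed JSON would be sent as the body when creating the index.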

On Wednesday, August 6, 2014 10:51:23 UTC+3, huangs...@gmail.com wrote:
>
> Hi,
> I want to disable _source, and at the same time I want to use dynamic
> mapping. Can I set a default mapping for the "store" setting so that
> fields are stored?
>
> Hoping for your reply, thanks.
>

can i set default mapping for field store？

2014-08-06 Thread huangshanjay

Hi,
I want to disable _source, and at the same time I want to use dynamic
mapping. Can I set a default mapping for the "store" setting so that
fields are stored?

Hoping for your reply, thanks.

Re: Group by field and then sum the groups