Re: Alerting in ELK stack?

2014-07-07 Thread Otis Gospodnetic
We have and use SPM for all our metrics (ES, Kafka, Apache, MySQL, Hadoop, 
everything) and we feed our logs to Logsene (it has a Kibana UI and a 
"native" UI).  SPM has alerting and anomaly detection, so we use that to get 
out of bed early (nah, not really), but we currently lack alerting in 
Logsene (i.e. alerting on numerical data in logs or on patterns).  Since 
Logsene has a Kibana UI, can be fed via Logstash, and has an Elasticsearch 
API and backend, that's the closest we've gotten to ELK+Alerts.

Otis
--
Performance Monitoring * Log Analytics * Search Analytics
Solr & Elasticsearch Support * http://sematext.com/



On Wednesday, June 25, 2014 11:18:01 AM UTC-4, Michael Hart wrote:
>
> We use Nagios for alerting. I originally was using the nsca output plugin 
> for logstash, but found that it took close to a second to execute the 
> command line nsca client, and if we got flooded with alert messages, 
> logstash would fall behind. I've since switched to use the http output and 
> send json to the nagios-api server (https://github.com/zorkian/nagios-api). 
> That seems to scale a lot better.
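>
> A minimal sketch of that http output, for reference (the nagios-api URL 
> and settings here are placeholders, not our exact config):
>
> output {
>   http {
>     # hypothetical nagios-api endpoint
>     url => "http://nagios-host:8080/"
>     http_method => "post"
>     format => "json"
>   }
> }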
>
> We do also have metrics sent from logstash to statsd/graphite, but mostly 
> so I can see message rates.
>
> mike
>
> On Monday, June 23, 2014 4:50:22 AM UTC-4, Siddharth Trikha wrote:
>>
>> We are using the `ELK stack (logstash, elasticsearch, kibana)` to analyze 
>> our logs. So far, so good.
>>
>> But now we want notifications generated for some particular kinds of logs, 
>> e.g. when a failed-login log appears more than 5 times (threshold crossed), 
>> an email should be sent to the sysadmin.
>>
>> I looked online and heard about `statsd`, `riemann`, `nagios`, and the 
>> logstash `metric` filter for achieving our requirement. 
>>
>> Can anyone suggest which fits best with the ELK stack? I am new to this. 
>> Thanks
>>
>

-- 
You received this message because you are subscribed to the Google Groups 
"elasticsearch" group.
To unsubscribe from this group and stop receiving emails from it, send an email 
to elasticsearch+unsubscr...@googlegroups.com.
To view this discussion on the web visit 
https://groups.google.com/d/msgid/elasticsearch/71f99e2b-6557-4be4-a68d-2df08e53e595%40googlegroups.com.
For more options, visit https://groups.google.com/d/optout.


Re: What does it take to make a custom stemmer for ES?

2014-07-07 Thread Otis Gospodnetic
Hi Nandiya,

Have a look at Lucene and its source code for token filters.  You'd 
implement a custom stemmer at the Lucene level, and then just use that in ES.
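
A minimal sketch of what that could look like at the Lucene level (class and 
method names here are made up; stem() would be the port of your Python code):

import java.io.IOException;
import org.apache.lucene.analysis.TokenFilter;
import org.apache.lucene.analysis.TokenStream;
import org.apache.lucene.analysis.tokenattributes.CharTermAttribute;

public final class PaliStemFilter extends TokenFilter {

    private final CharTermAttribute termAtt = addAttribute(CharTermAttribute.class);

    public PaliStemFilter(TokenStream input) {
        super(input);
    }

    @Override
    public boolean incrementToken() throws IOException {
        if (!input.incrementToken()) {
            return false; // no more tokens in the stream
        }
        // replace the token's text with its stem
        String stem = stem(termAtt.toString());
        termAtt.setEmpty().append(stem);
        return true;
    }

    private String stem(String inflected) {
        // port of the Python stemmer goes here
        return inflected;
    }
}

You would then expose it through a TokenFilterFactory in an analysis plugin, 
or graft it onto one of the existing analysis plugins as a starting point.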

Otis
--
Performance Monitoring * Log Analytics * Search Analytics
Solr & Elasticsearch Support * http://sematext.com/



On Monday, July 7, 2014 8:57:09 PM UTC-4, Nandiya Bhikkhu wrote:
>
> I am interested in using elasticsearch for our website suttacentral.net. 
> I've tried ES and found it pleasant to use, with obvious power. The only 
> challenge is that on suttacentral we host many Buddhist texts in ancient 
> languages, particularly the Pali language; suffice it to say there are no 
> existing stemmers. Stemming is a vital step for searching because Pali is a 
> highly inflected language (like Latin). The actual stemming step is 
> straightforward enough; presently we use a custom stemmer I wrote in 
> Python. It's dead simple and I wouldn't have much trouble implementing the 
> same code in Java (i.e. as a function which takes an inflected word as a 
> string, and returns the stem as another string). Where I'm in the dark is 
> making ES call that code.
>
> All the example stemmer plugins I've found adapt existing stemmers to ES. 
> What I really want is a way to call a function on each token and use the 
> return value of that function. It seems to me that *should* be simple 
> enough, but I've not managed to find any simple, minimalistic code to use 
> as a template. Although it would be noble, at this point I'm not interested 
> in making a proper plugin; I would be happy with the barest bodge/hack that 
> would achieve the desired effect!
>
> If anyone could point me in the right direction, either to a minimalistic 
> code example or an outline of what it would involve, I would be greatly 
> appreciative.
>
> Kind regards,
> Nandiya Bhikkhu
>

-- 
You received this message because you are subscribed to the Google Groups 
"elasticsearch" group.
To unsubscribe from this group and stop receiving emails from it, send an email 
to elasticsearch+unsubscr...@googlegroups.com.
To view this discussion on the web visit 
https://groups.google.com/d/msgid/elasticsearch/f3b3a496-b434-41b4-84b9-733b3139202c%40googlegroups.com.
For more options, visit https://groups.google.com/d/optout.


Re: Kibana Elasticsearch Shards and replication

2014-07-07 Thread Mark Walkom
Once you have a cluster, all data on any node is accessible.
It does this by routing the query through the node that receives it, which
then collects the data as required from the shards on the other nodes.

Regards,
Mark Walkom

Infrastructure Engineer
Campaign Monitor
email: ma...@campaignmonitor.com
web: www.campaignmonitor.com


On 8 July 2014 14:35, Tony Chong  wrote:

> Hi everyone,
>
> Sorry if this has been covered but a few pages of searching through the
> group hasn't sprung an answer for this.
>
> If I decided to have 3 elasticsearch nodes, with 3 shards, and 0 replicas,
> would kibana be able to retrieve all the data in my ES cluster or just the
> data from the elasticsearch node listed in its configuration, considering not
> all the shards would live on that node?
>
> Thanks,
>
> Tony
>
> --
> You received this message because you are subscribed to the Google Groups
> "elasticsearch" group.
> To unsubscribe from this group and stop receiving emails from it, send an
> email to elasticsearch+unsubscr...@googlegroups.com.
> To view this discussion on the web visit
> https://groups.google.com/d/msgid/elasticsearch/1b7d0cdd-9689-4fb4-9cd5-09c907e1b9a6%40googlegroups.com
> 
> .
> For more options, visit https://groups.google.com/d/optout.
>

-- 
You received this message because you are subscribed to the Google Groups 
"elasticsearch" group.
To unsubscribe from this group and stop receiving emails from it, send an email 
to elasticsearch+unsubscr...@googlegroups.com.
To view this discussion on the web visit 
https://groups.google.com/d/msgid/elasticsearch/CAEM624bLQgeT0agRU%2B9yTpBOrmVGGZ_dh2Ahin6xf5pGCmkcVA%40mail.gmail.com.
For more options, visit https://groups.google.com/d/optout.


Kibana Elasticsearch Shards and replication

2014-07-07 Thread Tony Chong
Hi everyone,

Sorry if this has been covered but a few pages of searching through the 
group hasn't sprung an answer for this. 

If I decided to have 3 elasticsearch nodes, with 3 shards, and 0 replicas, 
would kibana be able to retrieve all the data in my ES cluster or just the 
data from the elasticsearch node listed in its configuration, considering not 
all the shards would live on that node?

Thanks,

Tony

-- 
You received this message because you are subscribed to the Google Groups 
"elasticsearch" group.
To unsubscribe from this group and stop receiving emails from it, send an email 
to elasticsearch+unsubscr...@googlegroups.com.
To view this discussion on the web visit 
https://groups.google.com/d/msgid/elasticsearch/1b7d0cdd-9689-4fb4-9cd5-09c907e1b9a6%40googlegroups.com.
For more options, visit https://groups.google.com/d/optout.


Re: Time range filter

2014-07-07 Thread vineeth mohan
Hello Tom ,

At this point, I can think of two approaches:


   1. Store an additional field with just the time and not the date
   information, and do a normal range query on it.
   2. Create a script filter: in the filter, extract the time component and
   check the range (see the sketch just below).
   
http://www.elasticsearch.org/guide/en/elasticsearch/reference/current/query-dsl-script-filter.html
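
A minimal sketch of approach 2, assuming the timestamp field is a date
called @timestamp (the field name and the MVEL-style script syntax are
assumptions):

{
  "query": {
    "filtered": {
      "filter": {
        "script": {
          "script": "m = doc['@timestamp'].date.hourOfDay * 60 + doc['@timestamp'].date.minuteOfHour; m >= 19*60 && m <= 23*60 + 30"
        }
      }
    }
  }
}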


But this is a common use case, and some elegant way to do it should
exist. If not, I will open an issue.

Thanks
Vineeth



On Tue, Jul 8, 2014 at 7:19 AM, Tom Miller  wrote:

> All of the examples I can find on the web relate to date-range filtering.
> What I need is a time-range filter, i.e.
> 19:00 - 23:30.
>
> So, in this example, I want all hits between 7 PM and 11:30 PM, regardless of
> the day...
>
> I'd do this in SQL by doing "Where TIME(column) BETWEEN x and y".
>
> Is this possible in elasticsearch?
>
> My only solution thus far is to date_histogram by hour, and then filter on
> the client and add them up, which is kinda lame...
>
> --
> You received this message because you are subscribed to the Google Groups
> "elasticsearch" group.
> To unsubscribe from this group and stop receiving emails from it, send an
> email to elasticsearch+unsubscr...@googlegroups.com.
> To view this discussion on the web visit
> https://groups.google.com/d/msgid/elasticsearch/943a4cca-ee2c-497a-840e-be39ad821a0f%40googlegroups.com
> 
> .
> For more options, visit https://groups.google.com/d/optout.
>

-- 
You received this message because you are subscribed to the Google Groups 
"elasticsearch" group.
To unsubscribe from this group and stop receiving emails from it, send an email 
to elasticsearch+unsubscr...@googlegroups.com.
To view this discussion on the web visit 
https://groups.google.com/d/msgid/elasticsearch/CAGdPd5kFZMt1nuUfBJkwPuFkBCGN4ZUHXESxPn6Ccy9F0QL5xA%40mail.gmail.com.
For more options, visit https://groups.google.com/d/optout.


Re: How can I do intersection or union operation with two facets filter?

2014-07-07 Thread 闫旭

It doesn't work; the result is:
{"term":"bbb","count":2},
{"term":"aaa","count":1},
{"term":"ccc","count":1}
On 2014-07-08 02:07, Harish Ved wrote:

Did you try the following query?

{
  "query": {
    "filtered": {
      "filter": {
        "bool": {
          "should": [
            { "term": { "field_B": "" } },
            { "term": { "field_B": "bbb" } }
          ]
        }
      }
    }
  },
  "facets": {
    "tag": {
      "terms": {
        "field": "field_A"
      }
    }
  }
}

Please confirm if it works for you.

On Monday, 7 July 2014 15:56:53 UTC+5:30, 闫旭 wrote:

Dear All!
I have some docs:
{"field_A":"aaa","field_B":"bbb"}
{"field_A":"aaa","field_B":"ccc"}
{"field_A":"bbb","field_B":"bbb"}
{"field_A":"bbb","field_B":"bbb"}
{"field_A":"bbb","field_B":"eee"}
{"field_A":"aaa","field_B":""}
{"field_A":"ccc","field_B":""}
first step:
{
  "query": {
    "filtered": {
      "filter": {
        "bool": {
          "must": {
            "term": { "field_B": "bbb" }
          }
        }
      }
    }
  },
  "facets": {
    "tag": {
      "terms": {
        "field": "field_A"
      }
    }
  }
}
first result:
{
...
{"term":"aaa","count":1},
{"term":"bbb","count":2},
...
}
-
second step:
the second facets:
{
  "query": {
    "filtered": {
      "filter": {
        "bool": {
          "must": {
            "term": { "field_B": "" }
          }
        }
      }
    }
  },
  "facets": {
    "tag": {
      "terms": {
        "field": "field_A"
      }
    }
  }
}
second result:
{
...
{"term":"aaa","count":1}.
{"term":"ccc","count":1}
...
}

-
third step:
combine the two results with an intersection operation on "term":
{"term":"aaa","count":"I don't care about the count value."}


-
Now, how can I combine the three steps into one filtered facet, or
some other method?


Thx All!!

--
You received this message because you are subscribed to the Google 
Groups "elasticsearch" group.
To unsubscribe from this group and stop receiving emails from it, send 
an email to elasticsearch+unsubscr...@googlegroups.com 
.
To view this discussion on the web visit 
https://groups.google.com/d/msgid/elasticsearch/93d5adb6-bcd1-49b6-8c28-be4d456f92d8%40googlegroups.com 
.

For more options, visit https://groups.google.com/d/optout.


--
You received this message because you are subscribed to the Google Groups 
"elasticsearch" group.
To unsubscribe from this group and stop receiving emails from it, send an email 
to elasticsearch+unsubscr...@googlegroups.com.
To view this discussion on the web visit 
https://groups.google.com/d/msgid/elasticsearch/53BB51CC.3080005%40gmail.com.
For more options, visit https://groups.google.com/d/optout.


Time range filter

2014-07-07 Thread Tom Miller
All of the examples I can find on the web relate to date-range filtering. 
What I need is a time-range filter, i.e.
19:00 - 23:30.

So, in this example, I want all hits between 7 PM and 11:30 PM, regardless of 
the day...

I'd do this in SQL by doing "Where TIME(column) BETWEEN x and y".

Is this possible in elasticsearch?

My only solution thus far is to date_histogram by hour, and then filter on 
the client and add them up, which is kinda lame...

-- 
You received this message because you are subscribed to the Google Groups 
"elasticsearch" group.
To unsubscribe from this group and stop receiving emails from it, send an email 
to elasticsearch+unsubscr...@googlegroups.com.
To view this discussion on the web visit 
https://groups.google.com/d/msgid/elasticsearch/943a4cca-ee2c-497a-840e-be39ad821a0f%40googlegroups.com.
For more options, visit https://groups.google.com/d/optout.


What does it take to make a custom stemmer for ES?

2014-07-07 Thread Nandiya Bhikkhu
I am interested in using elasticsearch for our website suttacentral.net. 
I've tried ES and found it pleasant to use, with obvious power. The only 
challenge is that on suttacentral we host many Buddhist texts in ancient 
languages, particularly the Pali language; suffice it to say there are no 
existing stemmers. Stemming is a vital step for searching because Pali is a 
highly inflected language (like Latin). The actual stemming step is 
straightforward enough; presently we use a custom stemmer I wrote in 
Python. It's dead simple and I wouldn't have much trouble implementing the 
same code in Java (i.e. as a function which takes an inflected word as a 
string, and returns the stem as another string). Where I'm in the dark is 
making ES call that code.

All the example stemmer plugins I've found adapt existing stemmers to ES. 
What I really want is a way to call a function on each token and use the 
return value of that function. It seems to me that *should* be simple 
enough, but I've not managed to find any simple, minimalistic code to use 
as a template. Although it would be noble, at this point I'm not interested 
in making a proper plugin; I would be happy with the barest bodge/hack that 
would achieve the desired effect!

If anyone could point me in the right direction, either to a minimalistic 
code example or an outline of what it would involve, I would be greatly 
appreciative.

Kind regards,
Nandiya Bhikkhu

-- 
You received this message because you are subscribed to the Google Groups 
"elasticsearch" group.
To unsubscribe from this group and stop receiving emails from it, send an email 
to elasticsearch+unsubscr...@googlegroups.com.
To view this discussion on the web visit 
https://groups.google.com/d/msgid/elasticsearch/fe2c777e-b823-4652-8f6c-ecf42ec36d33%40googlegroups.com.
For more options, visit https://groups.google.com/d/optout.


Re: Elasticsearch Not Working

2014-07-07 Thread shriyansh jain
Hey,
I tried executing the commands you specified, but Elasticsearch is still 
not working. It's still giving the same status message, "elasticsearch 
dead but subsys locked".

Thanks,
Shriyansh

On Monday, July 7, 2014 5:29:00 PM UTC-7, arshpreet singh wrote:
>
> On Tue, Jul 8, 2014 at 5:53 AM, shriyansh jain  > wrote: 
> > Hey Arshpreet, 
> > 
> > I am not getting anything in output. 
>
> Please avoid top-posting and rich-text formatting while replying on 
> mailing lists. IMHO your service is blocked and you need to kill the 
> daemon. 
>
> sudo /etc/init.d/elasticsearch restart 
> or 
> sudo /etc/init.d/elasticsearch stop 
> sudo /etc/init.d/elasticsearch start 
>
>
> -- 
>
> Thanks 
> Arshpreet singh 
> http://arshpreetsingh.wordpress.com/ 
> I have no special talents. Only passionately curious 
>

-- 
You received this message because you are subscribed to the Google Groups 
"elasticsearch" group.
To unsubscribe from this group and stop receiving emails from it, send an email 
to elasticsearch+unsubscr...@googlegroups.com.
To view this discussion on the web visit 
https://groups.google.com/d/msgid/elasticsearch/bc85faa2-406e-4fac-ac75-06649d1bf075%40googlegroups.com.
For more options, visit https://groups.google.com/d/optout.


Re: Elasticsearch Not Working

2014-07-07 Thread arshpreet singh
On Tue, Jul 8, 2014 at 5:53 AM, shriyansh jain  wrote:
> Hey Arshpreet,
>
> I am not getting anything in output.

Please avoid top-posting and rich-text formatting while replying on
mailing lists. IMHO your service is blocked and you need to kill the daemon.

sudo /etc/init.d/elasticsearch restart
or
sudo /etc/init.d/elasticsearch stop
sudo /etc/init.d/elasticsearch start
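
If the restart alone doesn't clear the "dead but subsys locked" state, the
stale lock file is usually the culprit; removing it and starting again
should help (the path assumes a RHEL/CentOS-style init layout):

sudo rm -f /var/lock/subsys/elasticsearch
sudo /etc/init.d/elasticsearch start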


-- 

Thanks
Arshpreet singh
http://arshpreetsingh.wordpress.com/
I have no special talents. Only passionately curious

-- 
You received this message because you are subscribed to the Google Groups 
"elasticsearch" group.
To unsubscribe from this group and stop receiving emails from it, send an email 
to elasticsearch+unsubscr...@googlegroups.com.
To view this discussion on the web visit 
https://groups.google.com/d/msgid/elasticsearch/CAAstK2Hg%3DGBCYL4kr32VVyN%2BC6rR9p%3Di3qTLF8Y_ajDa7WAiJw%40mail.gmail.com.
For more options, visit https://groups.google.com/d/optout.


Re: Elasticsearch Not Working

2014-07-07 Thread shriyansh jain
Hey,
I am running the following command from the terminal to verify the status:



sudo /etc/init.d/elasticsearch status

Thanks,
Shriyansh


On Monday, July 7, 2014 5:07:04 PM UTC-7, Mark Walkom wrote:
>
> You need to provide more details for people to be able to effectively help.
>
> How are you verifying this, what method are you using?
>
> Regards,
> Mark Walkom
>
> Infrastructure Engineer
> Campaign Monitor
> email: ma...@campaignmonitor.com 
> web: www.campaignmonitor.com
>  
>
> On 8 July 2014 10:03, shriyansh jain > 
> wrote:
>
>> When I am verifying the Elasticsearch status, it's giving me the 
>> following error message. 
>>
>> *elasticsearch dead but subsys locked*
>>
>> Please help me out solving this.
>>
>> Thank you.
>> Shriyansh Jain
>>
>> -- 
>> You received this message because you are subscribed to the Google Groups 
>> "elasticsearch" group.
>> To unsubscribe from this group and stop receiving emails from it, send an 
>> email to elasticsearc...@googlegroups.com .
>> To view this discussion on the web visit 
>> https://groups.google.com/d/msgid/elasticsearch/bbbf214b-909b-4819-a6ca-508e76c7af7b%40googlegroups.com
>>  
>> 
>> .
>> For more options, visit https://groups.google.com/d/optout.
>>
>
>

-- 
You received this message because you are subscribed to the Google Groups 
"elasticsearch" group.
To unsubscribe from this group and stop receiving emails from it, send an email 
to elasticsearch+unsubscr...@googlegroups.com.
To view this discussion on the web visit 
https://groups.google.com/d/msgid/elasticsearch/a37913e3-fd5a-45b1-85c1-9d20fad05c6b%40googlegroups.com.
For more options, visit https://groups.google.com/d/optout.


Re: Elasticsearch Not Working

2014-07-07 Thread Mark Walkom
What command?
Please be explicit, provide what you are running and the output.

Regards,
Mark Walkom

Infrastructure Engineer
Campaign Monitor
email: ma...@campaignmonitor.com
web: www.campaignmonitor.com


On 8 July 2014 10:25, shriyansh jain  wrote:

> I am just running a command from the terminal.
>
> Thanks,
> Shriyansh
>
> On Monday, July 7, 2014 5:07:04 PM UTC-7, Mark Walkom wrote:
>>
>> You need to provide more details for people to be able to effectively
>> help.
>>
>> How are you verifying this, what method are you using?
>>
>> Regards,
>> Mark Walkom
>>
>> Infrastructure Engineer
>> Campaign Monitor
>> email: ma...@campaignmonitor.com
>> web: www.campaignmonitor.com
>>
>>
>> On 8 July 2014 10:03, shriyansh jain  wrote:
>>
>>> When I am verifying the Elasticsearch status, it's giving me the
>>> following error message.
>>>
>>> *elasticsearch dead but subsys locked*
>>>
>>> Please help me out solving this.
>>>
>>> Thank you.
>>> Shriyansh Jain
>>>
>>> --
>>> You received this message because you are subscribed to the Google
>>> Groups "elasticsearch" group.
>>> To unsubscribe from this group and stop receiving emails from it, send
>>> an email to elasticsearc...@googlegroups.com.
>>> To view this discussion on the web visit https://groups.google.com/d/
>>> msgid/elasticsearch/bbbf214b-909b-4819-a6ca-508e76c7af7b%
>>> 40googlegroups.com
>>> 
>>> .
>>> For more options, visit https://groups.google.com/d/optout.
>>>
>>
>>  --
> You received this message because you are subscribed to the Google Groups
> "elasticsearch" group.
> To unsubscribe from this group and stop receiving emails from it, send an
> email to elasticsearch+unsubscr...@googlegroups.com.
> To view this discussion on the web visit
> https://groups.google.com/d/msgid/elasticsearch/269395b9-7d48-4f0c-93de-4d7b275dff6d%40googlegroups.com
> 
> .
> For more options, visit https://groups.google.com/d/optout.
>

-- 
You received this message because you are subscribed to the Google Groups 
"elasticsearch" group.
To unsubscribe from this group and stop receiving emails from it, send an email 
to elasticsearch+unsubscr...@googlegroups.com.
To view this discussion on the web visit 
https://groups.google.com/d/msgid/elasticsearch/CAEM624bY-27tu7iEoGy5rqU6E1po-2UX3KhMmvW6dGbfoEpkZQ%40mail.gmail.com.
For more options, visit https://groups.google.com/d/optout.


Re: Elasticsearch Not Working

2014-07-07 Thread shriyansh jain
I am just running a command from the terminal.

Thanks,
Shriyansh

On Monday, July 7, 2014 5:07:04 PM UTC-7, Mark Walkom wrote:
>
> You need to provide more details for people to be able to effectively help.
>
> How are you verifying this, what method are you using?
>
> Regards,
> Mark Walkom
>
> Infrastructure Engineer
> Campaign Monitor
> email: ma...@campaignmonitor.com 
> web: www.campaignmonitor.com
>  
>
> On 8 July 2014 10:03, shriyansh jain > 
> wrote:
>
>> When I am verifying the Elasticsearch status, it's giving me the 
>> following error message. 
>>
>> *elasticsearch dead but subsys locked*
>>
>> Please help me out solving this.
>>
>> Thank you.
>> Shriyansh Jain
>>
>> -- 
>> You received this message because you are subscribed to the Google Groups 
>> "elasticsearch" group.
>> To unsubscribe from this group and stop receiving emails from it, send an 
>> email to elasticsearc...@googlegroups.com .
>> To view this discussion on the web visit 
>> https://groups.google.com/d/msgid/elasticsearch/bbbf214b-909b-4819-a6ca-508e76c7af7b%40googlegroups.com
>>  
>> 
>> .
>> For more options, visit https://groups.google.com/d/optout.
>>
>
>

-- 
You received this message because you are subscribed to the Google Groups 
"elasticsearch" group.
To unsubscribe from this group and stop receiving emails from it, send an email 
to elasticsearch+unsubscr...@googlegroups.com.
To view this discussion on the web visit 
https://groups.google.com/d/msgid/elasticsearch/269395b9-7d48-4f0c-93de-4d7b275dff6d%40googlegroups.com.
For more options, visit https://groups.google.com/d/optout.


Re: Elasticsearch Not Working

2014-07-07 Thread shriyansh jain
Hey Arshpreet,

I am not getting anything in output.


Thank you,
Shriyansh




On Monday, July 7, 2014 5:12:05 PM UTC-7, arshpreet singh wrote:
>
> On Tue, Jul 8, 2014 at 5:33 AM, shriyansh jain  > wrote: 
> > When I am verifying the Elasticsearch status, it's giving me the 
> > following error message: 
> > error message. 
> > 
> > elasticsearch dead but subsys locked 
>
> ipcs -s | grep elasticsearch 
>
> Can you post output for the above command? 
>
> -- 
>
> Thanks 
> Arshpreet singh 
> http://arshpreetsingh.wordpress.com/ 
> I have no special talents. Only passionately curious 
>

-- 
You received this message because you are subscribed to the Google Groups 
"elasticsearch" group.
To unsubscribe from this group and stop receiving emails from it, send an email 
to elasticsearch+unsubscr...@googlegroups.com.
To view this discussion on the web visit 
https://groups.google.com/d/msgid/elasticsearch/e48f5f1e-dfe8-4f25-b9f0-b787762b631b%40googlegroups.com.
For more options, visit https://groups.google.com/d/optout.


Re: Elasticsearch Not Working

2014-07-07 Thread arshpreet singh
On Tue, Jul 8, 2014 at 5:33 AM, shriyansh jain  wrote:
> When I am verifying the Elasticsearch status, it's giving me the following
> error message.
>
> elasticsearch dead but subsys locked

ipcs -s | grep elasticsearch

Can you post output for the above command?

-- 

Thanks
Arshpreet singh
http://arshpreetsingh.wordpress.com/
I have no special talents. Only passionately curious

-- 
You received this message because you are subscribed to the Google Groups 
"elasticsearch" group.
To unsubscribe from this group and stop receiving emails from it, send an email 
to elasticsearch+unsubscr...@googlegroups.com.
To view this discussion on the web visit 
https://groups.google.com/d/msgid/elasticsearch/CAAstK2Ge%2B9CYeFYX0tiTQ2XL0Y%3Dcw49vUOLhGpC%2Bjn2i6Oyw7w%40mail.gmail.com.
For more options, visit https://groups.google.com/d/optout.


Re: Elasticsearch Not Working

2014-07-07 Thread Mark Walkom
You need to provide more details for people to be able to effectively help.

How are you verifying this, what method are you using?

Regards,
Mark Walkom

Infrastructure Engineer
Campaign Monitor
email: ma...@campaignmonitor.com
web: www.campaignmonitor.com


On 8 July 2014 10:03, shriyansh jain  wrote:

> When I am verifying the Elasticsearch status, it's giving me the following
> error message.
>
> *elasticsearch dead but subsys locked*
>
> Please help me out solving this.
>
> Thank you.
> Shriyansh Jain
>
> --
> You received this message because you are subscribed to the Google Groups
> "elasticsearch" group.
> To unsubscribe from this group and stop receiving emails from it, send an
> email to elasticsearch+unsubscr...@googlegroups.com.
> To view this discussion on the web visit
> https://groups.google.com/d/msgid/elasticsearch/bbbf214b-909b-4819-a6ca-508e76c7af7b%40googlegroups.com
> 
> .
> For more options, visit https://groups.google.com/d/optout.
>

-- 
You received this message because you are subscribed to the Google Groups 
"elasticsearch" group.
To unsubscribe from this group and stop receiving emails from it, send an email 
to elasticsearch+unsubscr...@googlegroups.com.
To view this discussion on the web visit 
https://groups.google.com/d/msgid/elasticsearch/CAEM624b5if1JbJ-BGEwZ31QwwmRsbmjTmj34infrMvh8MPHY7g%40mail.gmail.com.
For more options, visit https://groups.google.com/d/optout.


Elasticsearch Not Working

2014-07-07 Thread shriyansh jain
When I am verifying the Elasticsearch status, it's giving me the following 
error message. 

*elasticsearch dead but subsys locked*

Please help me out solving this.

Thank you.
Shriyansh Jain

-- 
You received this message because you are subscribed to the Google Groups 
"elasticsearch" group.
To unsubscribe from this group and stop receiving emails from it, send an email 
to elasticsearch+unsubscr...@googlegroups.com.
To view this discussion on the web visit 
https://groups.google.com/d/msgid/elasticsearch/bbbf214b-909b-4819-a6ca-508e76c7af7b%40googlegroups.com.
For more options, visit https://groups.google.com/d/optout.


Problems upgrading an existing field to a multi-field

2014-07-07 Thread Ryan Tanner
I'm having trouble upgrading an existing field to a multi-field.  I've done 
this before with no issues on other fields.

I think the issue here is that the original mapping specifically defines an 
analyzer:

  "mappings" : {
"person" : {
  "properties" : {
"domain_titles" : {
  "type" : "string",
  "analyzer" : "stop",
  "include_in_all" : true
}
  }
}
  }

The other fields that have been upgraded do not have an analyzer in the 
original mapping.

This is the upgrade I'm attempting:

{
  "settings" : {
"index.analysis.filter.shingle_filter.type" : "shingle",
"index.analysis.filter.shingle_filter.min_shingle_size" : 2,
"index.analysis.filter.shingle_filter.max_shingle_size" : 5,
"index.analysis.analyzer.shingle_analyzer.type" : "custom",
"index.analysis.analyzer.shingle_analyzer.tokenizer" : "standard",
"index.analysis.analyzer.shingle_analyzer.filter" : [ "lowercase", 
"shingle_filter" ]
  },
  "mappings" : {
"person" : {
  "properties" : {
"domain_titles" : {
  "type" : "string",
  "fields" : {
"suggestions" : {
  "type" : "string",
  "index" : "analyzed",
  "include_in_all" : false,
  "analyzer" : "nicknameAnalyzer"
}
  }
}
  }
}
  }
}

Is there any reason why this sort of upgrade should fail?  This is the 
error message I get:

{"error":"MergeMappingException[Merge failed with failures {[mapper 
[domain_titles] has different index_analyzer]}]","status":400}
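
(For what it's worth, I also wonder whether omitting the original
"analyzer" : "stop" in the upgrade mapping is what makes the merge think
the analyzer changed. A sketch that repeats it explicitly, which I haven't
verified:)

  "mappings" : {
    "person" : {
      "properties" : {
        "domain_titles" : {
          "type" : "string",
          "analyzer" : "stop",
          "include_in_all" : true,
          "fields" : {
            "suggestions" : {
              "type" : "string",
              "index" : "analyzed",
              "include_in_all" : false,
              "analyzer" : "nicknameAnalyzer"
            }
          }
        }
      }
    }
  }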

Thanks for the help.

-- 
You received this message because you are subscribed to the Google Groups 
"elasticsearch" group.
To unsubscribe from this group and stop receiving emails from it, send an email 
to elasticsearch+unsubscr...@googlegroups.com.
To view this discussion on the web visit 
https://groups.google.com/d/msgid/elasticsearch/e057498d-64ca-4f5f-a76c-0a4717b82b9b%40googlegroups.com.
For more options, visit https://groups.google.com/d/optout.


Re: excessive merging/small segment sizes

2014-07-07 Thread Michael McCandless
Could you pull all hot threads next time the problem happens?
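
For reference, something along these lines against any node should do it:

curl -s 'localhost:9200/_nodes/hot_threads'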

Mike McCandless

http://blog.mikemccandless.com


On Mon, Jul 7, 2014 at 3:47 PM, Kireet Reddy  wrote:

> All that seems correct (except I think this is for node 6, not node 5). We
> don't delete documents, but we do some updates. The vast majority of
> documents get indexed into the large shards, but the smaller ones take some
> writes as well.
>
> We aren't using virtualized hardware and elasticsearch is the only thing
> running on the machines, no scheduled jobs, etc. I don't think something is
> interfering, actually overall disk i/o rate and operations on the machine
> go down quite a bit during the problematic period, which is consistent with
> your observations about things taking longer.
>
> I went back and checked all our collected metrics again. I noticed that
> even though the heap usage and gc count seems smooth during the period in
> question, gc time spent goes way up. Also active indexing threads goes up,
> but since our ingest rate didn't go up I assumed this was a side effect.
> During a previous occurrence a few days ago on node5, I stopped all
> indexing activity for 15 minutes. Active merges and indexing requests went
> to zero as expected. Then I re-enabled indexing and immediately the
> increased cpu/gc/active merges went back up to the problematic rates.
>
> Overall this is pretty confusing to me as to what is a symptom vs a root
> cause here. A summary of what I think I know:
>
>1. Every few days, cpu usage on a node goes way above the other nodes
>and doesn't recover. We've let the node run in the elevated cpu state for a
>day with no improvement.
>2. It doesn't seem likely that it's data related. We use replicas=1
>and no other nodes have issues.
>3. It doesn't seem hardware related. We run on a dedicated h/w with
>elasticsearch being the only thing running. Also the problem appears on
>various nodes and machine load seems tied directly to the elasticsearch
>process.
>4. During the problematic period: cpu usage, active merge threads,
>active bulk (indexing) threads, and gc time are elevated.
>5. During the problematic period: i/o ops and i/o throughput decrease.
>6. overall heap usage size seems to smoothly increase, the extra gc
>time seems to be spent on the new gen. Interestingly, the gc count didn't
>seem to increase.
>7. In the hours beforehand, gc behavior of the problematic node was
>similar to the other nodes.
>8. If I pause indexing, machine load quickly returns to normal, merges
>and indexing requests complete.  if I then restart indexing the problem
>reoccurs immediately.
>9. If I disable automatic refreshes, the problem disappears within an
>hour or so.
>10. hot threads show merging activity as the hot threads.
>
> The first few points make me think the increased active merges is perhaps
> a side effect, but then the last 3 make me think merging is the root cause.
> The only additional things I can think of that may be relevant are:
>
>1. Our documents can vary greatly in size, they average a couple KB
>but can rarely be several MB.
>2. we do use language analysis plugins, perhaps one of these is acting
>up?
>3. We eagerly load one field into the field data cache. But the cache
>size is ok and the overall heap behavior is ok so I don't think this is the
>problem.
>
> That's a lot of information, but I am not sure where to go next from
> here...
>
> On Monday, July 7, 2014 8:23:20 AM UTC-7, Michael McCandless wrote:
>
>> Indeed there are no big merges during that time ...
>>
>> I can see on node5, ~14:45 suddenly merges are taking a long time,
>> refresh is taking much longer (4-5 seconds instead of < .4 sec), commit
>> time goes up from < 0.5 sec to ~1-2 sec, etc., but other metrics are fine
>> e.g. total merging GB, number of commits/refreshes is very low during this
>> time.
>>
>> Each node has 2 biggish (~17 GB) shards and then ~50 tiny shards.  The
>> biggish shards are indexing at a very slow rate and only have ~1%
>> deletions.  Are you explicitly deleting docs?
>>
>> I suspect something is suddenly cutting into the IO perf of this box, and
>> because merging/refreshing is so IO intensive, it causes these operations
>> to run slower / backlog.
>>
>> Are there any scheduled jobs, e.g. backups/snapshots, that start up?  Are
>> you running on virtualized hardware?
>>
>>
>> Mike McCandless
>>
>> http://blog.mikemccandless.com
>>
>>
>> On Sun, Jul 6, 2014 at 8:23 PM, Kireet Reddy  wrote:
>>
>>>  Just to reiterate, the problematic period is from 07/05 14:45 to 07/06
>>> 02:10. I included a couple hours before and after in the logs.
>>>
>>>
>>> On Sunday, July 6, 2014 5:17:06 PM UTC-7, Kireet Reddy wrote:

 They are linked below (node5 is the log of the normal node, node6 is
 the log of the problematic node).

 I don't think it was doing big merges, otherwise during the 

Re: Elasticsearch twitter river filtered stream question

2014-07-07 Thread David Pilato
It uses the filter functionality provided by the Twitter API.
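
In river terms, that's the filter section of the river settings; a minimal
sketch (river name, track term, and the omitted oauth credentials are
placeholders):

curl -XPUT 'localhost:9200/_river/my_twitter_river/_meta' -d '{
  "type": "twitter",
  "twitter": {
    "filter": {
      "tracks": "coffee"
    }
  }
}'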

-- 
David Pilato | Technical Advocate | Elasticsearch.com
@dadoonet | @elasticsearchfr


On 7 July 2014 at 21:54:02, Josh Harrison (hij...@gmail.com) wrote:

Quick question about the ES twitter river at 
https://github.com/elasticsearch/elasticsearch-river-twitter
The Twitter streaming API allows you to filter, and you apparently get up to 1% 
of the stream total matching your search queries. So, if I were filtering for 
"coffee", I'd get "coffee" tweets that I wouldn't get if I was just capturing 
the 1% stream passively.
Does the Twitter river use this filter functionality, or does it do its 
filtering on the ingestion side, ingesting the normal 1% stream and discarding 
anything that doesn't match?
--
You received this message because you are subscribed to the Google Groups 
"elasticsearch" group.
To unsubscribe from this group and stop receiving emails from it, send an email 
to elasticsearch+unsubscr...@googlegroups.com.
To view this discussion on the web visit 
https://groups.google.com/d/msgid/elasticsearch/b7adb5f1-49a1-4424-8f4e-1c75e15c4cb0%40googlegroups.com.
For more options, visit https://groups.google.com/d/optout.

-- 
You received this message because you are subscribed to the Google Groups 
"elasticsearch" group.
To unsubscribe from this group and stop receiving emails from it, send an email 
to elasticsearch+unsubscr...@googlegroups.com.
To view this discussion on the web visit 
https://groups.google.com/d/msgid/elasticsearch/etPan.53bb1e60.625558ec.5cf2%40MacBook-Air-de-David.local.
For more options, visit https://groups.google.com/d/optout.


Re: Best practice to backup index daily?

2014-07-07 Thread Ivan Brusic
The Elasticsearch curator now supports snapshots:

https://github.com/elasticsearch/curator
http://www.elasticsearch.org/blog/elasticsearch-curator-version-1-1-0-released/

You would still need to use cron to schedule tasks, but it would be a
curator task instead of a direct curl request.
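
If you do stick with plain cron + curl, a daily line could look like this
(repository name is a placeholder; the % signs must be escaped because cron
treats them as line endings):

0 0 * * * curl -s -XPUT "localhost:9200/_snapshot/my_s3_repo/snapshot_$(date +\%Y\%m\%d)"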

Cheers,

Ivan


On Mon, Jul 7, 2014 at 1:12 PM, sabdalla80  wrote:

> I am able to take a snapshot of the index and back it up to AWS S3. What
> is the best way to automate this approach and have it done daily, say every
> day at 12 midnight?
> I am aware that I can probably do it with crontab, but curious if others
> are doing it differently?
>
> --
> You received this message because you are subscribed to the Google Groups
> "elasticsearch" group.
> To unsubscribe from this group and stop receiving emails from it, send an
> email to elasticsearch+unsubscr...@googlegroups.com.
> To view this discussion on the web visit
> https://groups.google.com/d/msgid/elasticsearch/8620f7d9-b827-470d-8928-75c308e722cc%40googlegroups.com
> 
> .
> For more options, visit https://groups.google.com/d/optout.
>

-- 
You received this message because you are subscribed to the Google Groups 
"elasticsearch" group.
To unsubscribe from this group and stop receiving emails from it, send an email 
to elasticsearch+unsubscr...@googlegroups.com.
To view this discussion on the web visit 
https://groups.google.com/d/msgid/elasticsearch/CALY%3DcQB%3D%2BXEGQB3GybgiE3sniiFavc_NMyTK6LgjNnbus4J-jA%40mail.gmail.com.
For more options, visit https://groups.google.com/d/optout.


Re: How to limit the fields of response when I search a keyword?

2014-07-07 Thread Ivan Brusic
I responded differently to your other similar question, but you can also
limit the fields by explicitly asking for the set that you want:

http://www.elasticsearch.org/guide/en/elasticsearch/reference/current/search-request-fields.html
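
In your case that would be something like this (a sketch using the field
names from your doc):

{
  "fields": [ "city", "gender" ],
  "query": {
    "term": { "city": "1000" }
  }
}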

Cheers,

Ivan


On Sat, Jul 5, 2014 at 2:32 AM, 纪路  wrote:

> Dear all:
>
> I have a reasonable need, but I can't find how to deal with it in the
> official ES docs and books. Does anyone know? Please teach me, thank you!
>
> I have a large set of docs, which hold a lot of fields, such as:
>
> uid2 = {
> "id": 1404999597,
> "idstr": "1404999597",
> "class": 1,
> "screen_name": "",
> "name": "",
> "province": "11",
> "city": "1000",
> "location": "北京",
> "description": "在主流与非主流之间徘徊",
> "url": "",
> "profile_image_url": "http://tp2.sinaimg.cn/1404999597/50/5642385629/0";,
> "profile_url": "u/1404999597",
> "domain": "",
> "weihao": "",
> "gender": "f",
> "followers_count": 1030710,
> "friends_count": 272,
> "statuses_count": 1519,
> "favourites_count": 90,
> "created_at": "Wed Mar 23 23:59:40 +0800 2011",
> "following": false,
> "allow_all_act_msg": false,
> "geo_enabled": false,
> "verified": true,
> "verified_type": 0,
> "remark": "",
> "status": {
> "created_at": "Tue Jul 01 13:17:55 +0800 2014",
> "id": 3727513249206064,
> "mid": "3727513249206064",
> "idstr": "3727513249206064",
> "text": "听到她的声音,我更相信她和荷西在天堂,依旧幸福着。 //@东方尔雅:现在这种纯真的爱情还好找吗? //@晓玲-有话说:[心]",
> "source": "http://app.weibo.com/t/feed/9ksdit\";
> rel=\"nofollow\">iPhone客户端",
> "favorited": false,
> "truncated": false,
> "in_reply_to_status_id": "",
> "in_reply_to_user_id": "",
> "in_reply_to_screen_name": "",
> "pic_urls": [],
> "geo": null,
> "reposts_count": 0,
> "comments_count": 0,
> "attitudes_count": 0,
> "mlevel": 0,
> "visible": {
> "type": 0,
> "list_id": 0
> },
> "darwin_tags": []
> },
> "ptype": 1,
> "allow_all_comment": true,
> "avatar_large": "http://tp2.sinaimg.cn/1404999597/180/5642385629/0";,
> "avatar_hd": "http://tp2.sinaimg.cn/1404999597/180/5642385629/0";,
> "verified_reason": "电视台主持人梦桐",
> "verified_trade": "",
> "verified_reason_url": "",
> "verified_source": "",
> "verified_source_url": "",
> "follow_me": false,
> "online_status": 0,
> "bi_followers_count": 167,
> "lang": "zh-cn",
> "star": 0,
> "mbtype": 0,
> "mbrank": 0,
> "block_word": 0,
> "block_app": 0,
> "ability_tags": "主持人",
> "worldcup_guess": 0
> }
>
> This is a user-info doc. If I want to analyze the gender distribution of all
> users who live in "city": "1000" (1000 is a city code), I don't need any
> field except "city" and "gender". How can I exclude the meaningless fields
> before the docs are returned? Because there are lots of docs, transmitting
> the entire doc would waste a lot of time and bandwidth, and I would have to
> trim the extra information in my own program. So, is there a method that can
> deal with this problem for me? Thank you.
>
> --
> You received this message because you are subscribed to the Google Groups
> "elasticsearch" group.
> To unsubscribe from this group and stop receiving emails from it, send an
> email to elasticsearch+unsubscr...@googlegroups.com.
> To view this discussion on the web visit
> https://groups.google.com/d/msgid/elasticsearch/0f79f408-92d4-4806-8c47-02dd877ddaaf%40googlegroups.com
> 
> .
> For more options, visit https://groups.google.com/d/optout.
>

-- 
You received this message because you are subscribed to the Google Groups 
"elasticsearch" group.
To unsubscribe from this group and stop receiving emails from it, send an email 
to elasticsearch+unsubscr...@googlegroups.com.
To view this discussion on the web visit 
https://groups.google.com/d/msgid/elasticsearch/CALY%3DcQAF8fXiSFhpGiP31RiwpgPbwMrwszv_8chDGqopkXqyWQ%40mail.gmail.com.
For more options, visit https://groups.google.com/d/optout.


Re: How to limit fields of response doc when I search certain keyword?

2014-07-07 Thread Ivan Brusic
If I understand you correctly, you want to view the distribution of gender
based on the results of a query? In that case, you want to look into
aggregations, which work on top of the result set that is returned.

http://www.elasticsearch.org/guide/en/elasticsearch/reference/current/search-aggregations.html

http://www.elasticsearch.org/guide/en/elasticsearch/guide/current/aggregations.html

Here is a query that should work with your basic use case. Substitute
aggregations for facets if you have a newer version of Elasticsearch.

{
  "size": 0,
  "query": {
    "filtered": {
      "query": {
        "match_all": {}
      },
      "filter": {
        "term": {
          "city": "1000"
        }
      }
    }
  },
  "facets": {
    "gender": {
      "terms": {
        "field": "gender"
      }
    }
  }
}
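
The aggregations flavor of the same request would look roughly like this
(a sketch for ES 1.x):

{
  "size": 0,
  "query": {
    "filtered": {
      "filter": {
        "term": { "city": "1000" }
      }
    }
  },
  "aggs": {
    "gender": {
      "terms": { "field": "gender" }
    }
  }
}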

-- 
Ivan


On Sat, Jul 5, 2014 at 2:12 AM, 纪路  wrote:

> Dear all:
> There is a reasonable need, but I can't find a solution in the official
> docs or books. Can you help me?
>
> I have a large set of docs, which contains a lot of fields, such as:
>  {
> "id": 1404999597,
> "idstr": "1404999597",
> "class": 1,
> "screen_name": "主播梦桐",
> "name": "主播梦桐",
> "province": "11",
> "city": "1000",
> "location": "北京",
> "description": "在主流与非主流之间徘徊",
> "url": "",
> "profile_image_url": "http://tp2.sinaimg.cn/1404999597/50/5642385629/0";,
> "profile_url": "u/1404999597",
> "domain": "",
> "weihao": "",
> "gender": "f",
> "followers_count": 1030710,
> "friends_count": 272,
> "statuses_count": 1519,
> "favourites_count": 90,
> "created_at": "Wed Mar 23 23:59:40 +0800 2011",
> "following": false,
> "allow_all_act_msg": false,
> "geo_enabled": false,
> "verified": true,
> "verified_type": 0,
> "remark": "",
> "status": {
> "created_at": "Tue Jul 01 13:17:55 +0800 2014",
> "id": 3727513249206064,
> "mid": "3727513249206064",
> "idstr": "3727513249206064",
> "text": "听到她的声音,我更相信她和荷西在天堂,依旧幸福着。 //@东方尔雅:现在这种纯真的爱情还好找吗? //@晓玲-有话说:[心]",
> "source": "http://app.weibo.com/t/feed/9ksdit\";
> rel=\"nofollow\">iPhone客户端",
> "favorited": false,
> "truncated": false,
> "in_reply_to_status_id": "",
> "in_reply_to_user_id": "",
> "in_reply_to_screen_name": "",
> "pic_urls": [],
> "geo": null,
> "reposts_count": 0,
> "comments_count": 0,
> "attitudes_count": 0,
> "mlevel": 0,
> "visible": {
> "type": 0,
> "list_id": 0
> },
> "darwin_tags": []
> },
> "ptype": 1,
> "allow_all_comment": true,
> "avatar_large": "http://tp2.sinaimg.cn/1404999597/180/5642385629/0";,
> "avatar_hd": "http://tp2.sinaimg.cn/1404999597/180/5642385629/0";,
> "verified_reason": "电视台主持人梦桐",
> "verified_trade": "",
> "verified_reason_url": "",
> "verified_source": "",
> "verified_source_url": "",
> "follow_me": false,
> "online_status": 0,
> "bi_followers_count": 167,
> "lang": "zh-cn",
> "star": 0,
> "mbtype": 0,
> "mbrank": 0,
> "block_word": 0,
> "block_app": 0,
> "ability_tags": "主持人",
> "worldcup_guess": 0
> }
>
> My problem is: when I search (or scan & scroll) on a certain field, for
> example "city" = 1000 (1000 is its city code, which refers to a city name),
> there may be many results returned. But my goal is to detect how the gender
> of this city's users is distributed on my website; I don't need any
> information except the "gender" field. What can I do to exclude the
> meaningless data from the response JSON before it is returned? Because
> there are so many similar tasks for me, transmitting the entire doc would
> spend lots of time and bandwidth, and I would have to trim the additional
> data in my own program, which also wastes CPU time on the local computer.
> So if you know how to deal with this need, please teach me. Thank you!
>
> --
> You received this message because you are subscribed to the Google Groups
> "elasticsearch" group.
> To unsubscribe from this group and stop receiving emails from it, send an
> email to elasticsearch+unsubscr...@googlegroups.com.
> To view this discussion on the web visit
> https://groups.google.com/d/msgid/elasticsearch/2a01d5f4-67a5-493a-8e35-6f9a40a9998b%40googlegroups.com
> 
> .
> For more options, visit https://groups.google.com/d/optout.
>

-- 
You received this message because you are subscribed to the Google Groups 
"elasticsearch" group.
To unsubscribe from this group and stop receiving emails from it, send an email 
to elasticsearch+unsubscr...@googlegroups.com.
To view this discussion on the web visit 
https://groups.google.com/d/msgid/elasticsearch/CALY%3DcQCYFj8LGp%2B1jTaER10DrPbGVcbfnatkm8%2BNrOEvzbqfaQ%40mail.gmail.com.
For more options, visit https://groups.google.com/d/optout.


issues with using repository-hdfs plug in for snapshot/restore operation

2014-07-07 Thread Jinyuan Zhou
I am using elasticsearch 1.2.1 and the CDH 4.6 quick-start VM. My ES server is 
installed on the same VM. 
I have one successful scenario: I used the light version of the plugin and 
added the output of the `hadoop classpath` command to ES_CLASSPATH.

But I encountered errors with the default version and the hadoop2 version. 
Here are the details of the issues. 
#1. I installed the plugin with this command: 
bin/plugin --install elasticsearch/elasticsearch-repository-hdfs/2.0.0
and I sent the PUT request below: 
url: http://localhost:9200/_snapshot/hdfs_repo
data :{
  "type":"hdfs",
  "settings": 
  {
"uri":"hdfs://localhost.localdomain:8020",
"path":"/user/cloudera/es_snapshot"
  }
}

I got this response:

{
  "error": "RepositoryException[[hdfs_repo] failed to create repository]; 
nested: CreationException[Guice creation errors:

1) Error injecting constructor, 
org.elasticsearch.ElasticsearchGenerationException: Cannot create Hdfs 
file-system for uri [hdfs://localhost.localdomain:8020]
  at org.elasticsearch.repositories.hdfs.HdfsRepository.(Unknown Source)
  while locating org.elasticsearch.repositories.hdfs.HdfsRepository
  while locating org.elasticsearch.repositories.Repository

1 error]; nested: ElasticsearchGenerationException[Cannot create Hdfs 
file-system for uri [hdfs://localhost.localdomain:8020]]; nested: 
RemoteException[Server IPC version 7 cannot communicate with client version 4]; ",
  "status": 500
}


I noticed RemoteException: Server IPC version 7 cannot communicate with 
client version 4

#2. Then I tried the hadoop2 version, so I installed the plugin with this 
command:
bin/plugin --install 
elasticsearch/elasticsearch-repository-hdfs/2.0.0-hadoop2

I sent the same PUT request as above; this time I got an even stranger 
exception: 

NoClassDefFoundError[org/apache/commons/cli/ParseException]
Here is the response.

{
"error": "RepositoryException[[hdfs_repo] failed to create repository]; 
nested: CreationException[Guice creation errors:

1) Error injecting constructor, java.lang.NoClassDefFoundError: 
org/apache/commons/cli/ParseException
  at org.elasticsearch.repositories.hdfs.HdfsRepository.(Unknown Source)
  while locating org.elasticsearch.repositories.hdfs.HdfsRepository
  while locating org.elasticsearch.repositories.Repository

1 error]; nested: NoClassDefFoundError[org/apache/commons/cli/ParseException]; 
nested: ClassNotFoundException[org.apache.commons.cli.ParseException]; ",
"status": 500
}

I wonder if anyone has had similar experiences. Note that the failed cases 
are actually the more realistic deployment choices, because my Hadoop 
cluster will most likely not be on the same node as my ES server. 
Thanks,
Jack







-- 
You received this message because you are subscribed to the Google Groups 
"elasticsearch" group.
To unsubscribe from this group and stop receiving emails from it, send an email 
to elasticsearch+unsubscr...@googlegroups.com.
To view this discussion on the web visit 
https://groups.google.com/d/msgid/elasticsearch/acb15aff-299b-4e4b-bf20-b0ed5a891f60%40googlegroups.com.
For more options, visit https://groups.google.com/d/optout.


Re: Search thread pools not released

2014-07-07 Thread Ivan Brusic
Yeah, already traced it back myself. Been using Elasticsearch for years and
I have been only setting query timeouts. Need to re-architect a way to
incorporate client-based timeouts.

Had two different elasticsearch meltdowns this weekend, after a long period
of stability. Both of them different and unique!

-- 
Ivan


On Mon, Jul 7, 2014 at 1:50 PM, joergpra...@gmail.com  wrote:

> Yes, actionGet() can be traced down to AbstractQueueSynchronizer's
> acquireSharedInterruptibly(-1) call
>
>
> http://docs.oracle.com/javase/7/docs/api/java/util/concurrent/locks/AbstractQueuedSynchronizer.html#acquireSharedInterruptibly(int)
>
> in org.elasticsearch.common.util.concurrent.BaseFuture which "waits"
> forever until interrupted. But there are twin methods, like actionGet(long
> millis), that time out.
>
> Jörg
>
>
> On Mon, Jul 7, 2014 at 7:53 PM, Ivan Brusic  wrote:
>
>> Still analyzing all the logs and dumps that I have accumulated so far,
>> but it looks like the blocking socket appender might be the issue. After
>> that node exhausts all of its search threads, the TransportClient will
>> still issue requests to it, although other nodes do not have issues. After
>> a while, the client application will also be blocked waiting for
>> Elasticsearch to return.
>>
>> I removed logging for now, will re-implement it with a service that reads
>> directly from the duplicate file-based log. Although I have a timeout
>> specific for my query, my recollection of the search code is that it only
>> applies to the Lucene LimitedCollector (its been a while since I looked at
>> that code). The next step should be to add an explicit timeout
>> to actionGet(). Is the default basically no wait?
>>
>> It might be a challenge for the cluster engine to not delegate queries to
>> overloaded servers.
>>
>> Cheers,
>>
>> Ivan
>>
>>
>> On Sun, Jul 6, 2014 at 2:36 PM, joergpra...@gmail.com <
>> joergpra...@gmail.com> wrote:
>>
>>> Yes, socket appender blocks. Maybe the async appender of log4j can do
>>> better ...
>>>
>>> http://ricardozuasti.com/2009/asynchronous-logging-with-log4j/
>>>
>>> Jörg
>>>
>>>
>>> On Sun, Jul 6, 2014 at 11:22 PM, Ivan Brusic  wrote:
>>>
 Forgot to mention the thread dumps. I have taken them before, but not
 this time. Most of the block search thead pools are stuck in log4j.

 https://gist.github.com/brusic/fc12536d8e5706ec9c32

 I do have a socket appender to logstash (elasticsearch logs in
 elasticsearch!). Let me debug this connection.

 --
 Ivan


 On Sun, Jul 6, 2014 at 1:55 PM, joergpra...@gmail.com <
 joergpra...@gmail.com> wrote:

> Can be anything seen in a thread dump what looks like stray queries?
> Maybe some facet queries hanged while resources went low and never
> returned?
>
> Jörg
>
>
> On Sun, Jul 6, 2014 at 9:59 PM, Ivan Brusic  wrote:
>
>> Having an issue on one of my clusters running version 1.1.1 with 8
>> master/data nodes, unicast, connecting via the Java TransportClient. A 
>> few
>> REST queries are executed via monitoring services.
>>
>> Currently there is almost no traffic on this cluster. The few queries
>> that are currently running are either small test queries or large facet
>> queries (which are infrequent and the longest runs for 16 seconds). What 
>> I
>> am noticing is that the active search threads on some noded never 
>> decreases
>> and when it reaches the limit, the entire cluster will stop accepting
>> requests. The current max is the default (3 x 8).
>>
>> http://search06:9200/_cat/thread_pool
>>
>> search05 1.1.1.5 0 0 0 0 0 0 19 0 0
>> search07 1.1.1.7 0 0 0 0 0 0  0 0 0
>> search08 1.1.1.8 0 0 0 0 0 0  0 0 0
>> search09 1.1.1.9 0 0 0 0 0 0  0 0 0
>> search11 1.1.1.11 0 0 0 0 0 0  0 0 0
>> search06 1.1.1.6 0 0 0 0 0 0  2 0 0
>> search10 1.1.1.10 0 0 0 0 0 0  0 0 0
>> search12 1.1.1.12 0 0 0 0 0 0  0 0 0
>>
>> In this case, both search05 and search06 have an active thread count
>> that does not change. If I run a query against search05, the search will
>> respond quickly and the total number of active search threads does not
>> increase.
>>
>> So I have two related issues:
>> 1) the active thread count does not decrease
>> 2) the cluster will not accept requests if one node becomes unstable.
>>
>> I have seen the issue intermittently in the past, but the issue has
>> started again and cluster restarts does not fix the problem. At the log
>> level, there have been issues with the cluster state not propagating. Not
>> every node will acknowledge the cluster state ([discovery.zen.publish
>> ]
>> received cluster state version NNN) and the master would log a timeout
>> (awaiting all nodes to process published state NNN timed out, timeout 
>> 30s).
>> The nodes are fine and I can ping each other with no issues

Re: Search thread pools not released

2014-07-07 Thread joergpra...@gmail.com
Yes, actionGet() can be traced down to AbstractQueueSynchronizer's
acquireSharedInterruptibly(-1) call

http://docs.oracle.com/javase/7/docs/api/java/util/concurrent/locks/AbstractQueuedSynchronizer.html#acquireSharedInterruptibly(int)

in org.elasticsearch.common.util.concurrent.BaseFuture which "waits"
forever until interrupted. But there are twin methods, like actionGet(long
millis), that time out.
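
For example, instead of a bare client.search(request).actionGet(), something
like this (the 10-second value is arbitrary):

SearchResponse response = client.search(request).actionGet(10, TimeUnit.SECONDS);
// or, equivalently
SearchResponse response = client.search(request).actionGet(TimeValue.timeValueSeconds(10));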

Jörg


On Mon, Jul 7, 2014 at 7:53 PM, Ivan Brusic  wrote:

> Still analyzing all the logs and dumps that I have accumulated so far, but
> it looks like the blocking socket appender might be the issue. After that
> node exhausts all of its search threads, the TransportClient will still
> issue requests to it, although other nodes do not have issues. After a
> while, the client application will also be blocked waiting for
> Elasticsearch to return.
>
> I removed logging for now, will re-implement it with a service that reads
> directly from the duplicate file-based log. Although I have a timeout
> specific for my query, my recollection of the search code is that it only
> applies to the Lucene LimitedCollector (its been a while since I looked at
> that code). The next step should be to add an explicit timeout
> to actionGet(). Is the default basically no wait?
>
> It might be a challenge for the cluster engine to not delegate queries to
> overloaded servers.
>
> Cheers,
>
> Ivan
>
>
> On Sun, Jul 6, 2014 at 2:36 PM, joergpra...@gmail.com <
> joergpra...@gmail.com> wrote:
>
>> Yes, socket appender blocks. Maybe the async appender of log4j can do
>> better ...
>>
>> http://ricardozuasti.com/2009/asynchronous-logging-with-log4j/
>>
>> Jörg
>>
>>
>> On Sun, Jul 6, 2014 at 11:22 PM, Ivan Brusic  wrote:
>>
>>> Forgot to mention the thread dumps. I have taken them before, but not
>>> this time. Most of the block search thead pools are stuck in log4j.
>>>
>>> https://gist.github.com/brusic/fc12536d8e5706ec9c32
>>>
>>> I do have a socket appender to logstash (elasticsearch logs in
>>> elasticsearch!). Let me debug this connection.
>>>
>>> --
>>> Ivan
>>>
>>>
>>> On Sun, Jul 6, 2014 at 1:55 PM, joergpra...@gmail.com <
>>> joergpra...@gmail.com> wrote:
>>>
 Can be anything seen in a thread dump what looks like stray queries?
 Maybe some facet queries hanged while resources went low and never
 returned?

 Jörg


 On Sun, Jul 6, 2014 at 9:59 PM, Ivan Brusic  wrote:

> Having an issue on one of my clusters running version 1.1.1 with 8
> master/data nodes, unicast, connecting via the Java TransportClient. A few
> REST queries are executed via monitoring services.
>
> Currently there is almost no traffic on this cluster. The few queries
> that are currently running are either small test queries or large facet
> queries (which are infrequent and the longest runs for 16 seconds). What I
> am noticing is that the active search threads on some noded never 
> decreases
> and when it reaches the limit, the entire cluster will stop accepting
> requests. The current max is the default (3 x 8).
>
> http://search06:9200/_cat/thread_pool
>
> search05 1.1.1.5 0 0 0 0 0 0 19 0 0
> search07 1.1.1.7 0 0 0 0 0 0  0 0 0
> search08 1.1.1.8 0 0 0 0 0 0  0 0 0
> search09 1.1.1.9 0 0 0 0 0 0  0 0 0
> search11 1.1.1.11 0 0 0 0 0 0  0 0 0
> search06 1.1.1.6 0 0 0 0 0 0  2 0 0
> search10 1.1.1.10 0 0 0 0 0 0  0 0 0
> search12 1.1.1.12 0 0 0 0 0 0  0 0 0
>
> In this case, both search05 and search06 have an active thread count
> that does not change. If I run a query against search05, the search will
> respond quickly and the total number of active search threads does not
> increase.
>
> So I have two related issues:
> 1) the active thread count does not decrease
> 2) the cluster will not accept requests if one node becomes unstable.
>
> I have seen the issue intermittently in the past, but the issue has
> started again and cluster restarts does not fix the problem. At the log
> level, there have been issues with the cluster state not propagating. Not
> every node will acknowledge the cluster state ([discovery.zen.publish]
> received cluster state version NNN) and the master would log a timeout
> (awaiting all nodes to process published state NNN timed out, timeout 
> 30s).
> The nodes are fine and they can ping each other with no issues. Currently not
> seeing any log errors with the thread pool issue, so perhaps it is a red
> herring.
>
> Cheers,
>
> Ivan
>
>
> --
> You received this message because you are subscribed to the Google
> Groups "elasticsearch" group.
> To unsubscribe from this group and stop receiving emails from it, send
> an email to elasticsearch+unsubscr...@googlegroups.com.
> To view this discussion on the web visit
> https://groups.google.com/d/msgid/elasticsearch/CALY%3DcQCx91

Saved Kibana Dashboards & Tribe

2014-07-07 Thread crb89
I have a Tribe node and a Kibana instance for the Tribe node. When I try to 
save a dashboard on the Kibana instance for the Tribe node, I get the 
following errors:

PUT http://{tribe}:9200/kibana-int_{tribe}/dashboard/Logs%20Search
503 (Service Unavailable)

   1. error: "MasterNotDiscoveredException[waited for [1m]]"
   2. status: 503


I would like to be able to save dashboards from the Kibana instance for the 
Tribe node. Is this possible?




-- 
You received this message because you are subscribed to the Google Groups 
"elasticsearch" group.
To unsubscribe from this group and stop receiving emails from it, send an email 
to elasticsearch+unsubscr...@googlegroups.com.
To view this discussion on the web visit 
https://groups.google.com/d/msgid/elasticsearch/ba09008a-fed9-4553-be50-ec9f8f7bd0af%40googlegroups.com.
For more options, visit https://groups.google.com/d/optout.


Re: Custom Plugin for specifying custom filter attributes at query time

2014-07-07 Thread joergpra...@gmail.com
In Elasticsearch, you can extend the existing queries and filters with a
plugin, with the help of addQuery/addFilter on IndexQueryParserModule.

Each query or filter comes in a pair of classes, a builder and a parser.

A filter builder manages the syntax and the content serialization, with the
help of XContent classes, for the inner/outer representation of a filter
specification.

A filter parser parses such a structure and turns it into a Lucene Filter
for internal processing.

So one approach would be to look at how your bit set implementation can be
turned into a Lucene Filter. An instructive example of where to start
is org.elasticsearch.index.query.TermsFilterParser/TermsFilterBuilder.

An example where terms from fielddata cache are read and turned into a
filter is org.elasticsearch.index.search.FielddataTermsFilter

A key line is the method

public DocIdSet getDocIdSet(AtomicReaderContext context, Bits acceptDocs)
throws IOException

An example for caching filters
is org.elasticsearch.indices.cache.filter.terms.IndicesTermsFilterCache
(the caching of filters in ES is done with Guava's cache classes)

Also, it could be helpful to study helper classes in this context like in
package org.elasticsearch.common.lucene.docset

I am not aware of a filter plugin yet but it is possible that I could
sketch a demo filter plugin source code on github.
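
In the meantime, here is a minimal, hypothetical sketch of such a filter,
assuming the bit set is already loaded in memory. Note that it keys off raw
Lucene doc ids, which are not stable across merges; a real implementation
would map the external ids to stable terms or field data instead:

import java.io.IOException;
import org.apache.lucene.index.AtomicReaderContext;
import org.apache.lucene.search.DocIdSet;
import org.apache.lucene.search.Filter;
import org.apache.lucene.util.Bits;
import org.apache.lucene.util.FixedBitSet;

public class PrecomputedBitSetFilter extends Filter {

    private final FixedBitSet allowedDocs; // loaded/warmed at startup

    public PrecomputedBitSetFilter(FixedBitSet allowedDocs) {
        this.allowedDocs = allowedDocs;
    }

    @Override
    public DocIdSet getDocIdSet(AtomicReaderContext context, Bits acceptDocs)
            throws IOException {
        // Project the index-wide bit set onto this segment.
        int base = context.docBase;
        int maxDoc = context.reader().maxDoc();
        FixedBitSet segmentBits = new FixedBitSet(maxDoc);
        for (int i = 0; i < maxDoc; i++) {
            int globalDoc = base + i;
            if (globalDoc < allowedDocs.length() && allowedDocs.get(globalDoc)
                    && (acceptDocs == null || acceptDocs.get(i))) {
                segmentBits.set(i);
            }
        }
        return segmentBits; // FixedBitSet is a DocIdSet in Lucene 4.x
    }
}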

Jörg




On Mon, Jul 7, 2014 at 3:49 PM, Sandeep Ramesh Khanzode <
k.sandee...@gmail.com> wrote:

> Hi,
>
> A little clarification:
>
> Assume sample data set of 50M documents. The documents need to be filtered
> by a field, Field1. However, at indexing time, this field is NOT written to
> the document in Lucene through ES. Field1 is a frequently changing field
> and hence, we would like to maintain it outside.
>
> (This following paragraph can be skipped.)
> Now assume that there are a few such fields, Field1, ..., FieldN. For
> every document in the corpus, the value for Field1 may be from a pool of
> 100-odd values. Thus, for example, at max, FIeld1 can hold 1M documents
> that correspond to one of the 100-dd values, and at the fag-end, can
> probably correspond to 10 documents as well.
>
>
> (Continue reading) :-)
> I would, at system startup time, make sure that I have loaded all relevant
> BitSets that I plan to use for any Filters in memory, so that my cache
> framework is warm and I can lookup the relevant filter values for a
> particular query from this cache at query run time. The mechanisms for this
> loading are still unknown, but please assume that this BitSet will be
> available readily during query time.
>
> This BitSet will correspond to the DocIDs in Lucene for a particular value
> of Field1 that I want to filter. I plan to create a Filter class overridden
> in Lucene that will accept this DocIdSet.
>
> What I am unable to understand is how I can achieve this in ES. Now, I
> have been exploring the different mail threads on this forum, and it seems
> that certain plugins can achieve this. Please see the list below that I
> could find on this forum.
>
> Can you please tell me how an IndexQueryParserModule will serve my use
> case? If you can provide some pointers on writing a plugin that can
> leverage a CustomFilter, that will be immensely helpful. Thanks,
>
> 1.
> https://groups.google.com/forum/?fromgroups=#!searchin/elasticsearch/IndexQueryParserModule$20Plugin/elasticsearch/5Gqxx3UvN2s/FL4Lb2RxQt0J
> 2. https://groups.google.com/forum/#!topic/elasticsearch/1jiHl4kngJo
> 3. https://github.com/elasticsearch/elasticsearch/issues/208
> 4.
> http://elasticsearch-users.115913.n3.nabble.com/custom-filter-handler-plugin-td4051973.html
>
> Thanks,
> Sandeep
>
> On Mon, Jul 7, 2014 at 2:17 AM, joergpra...@gmail.com <
> joergpra...@gmail.com> wrote:
>
>> Thanks for being so patient with me :)
>>
>> I understand now the following: there are 50m documents in an external
>> DB, from which up to 1m are to be exported in the form of document identifiers
>> to work as a filter in ES. The idea is to use internal mechanisms like bit
>> sets. There is no API for manipulating filters in ES on that level, ES
>> receives the terms and passes them into Lucene TermFilter class according
>> to the type of the filter.
>>
>> What is a bit unclear to me: how is the filter set constructed? I assume
>> it should be a select statement on the database?
>>
>> Next, if you have this large set of document identifiers selected, I do
>> not understand what the base query is that you want to apply the filter on. Is
>> there a user-given query for ES? What does such a query look like? Is it
>> assumed there are other documents in ES that are related somehow to the 50m
>> documents? An illustrative example of the steps in the scenario would
>> really help to understand the data model.
>>
>> Just some food for thought: it is close to impossible to filter in ES on
>> 1m unique terms with a single step - the default setting of maximum clauses
>> in a Lucene Query is for good reason limited to 1024 terms

Best practice to backup index daily?

2014-07-07 Thread sabdalla80
I am able to take a snapshot of the index and back it up to AWS S3. What is 
the best way to automate this approach and have it done daily, say every 
day at 12 midnight? 
I am aware that I can probably do it with crontab, but curious if others are 
doing it differently?
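
For reference, a minimal crontab sketch of what I have working so far
(repository and snapshot names are made up; the S3 repository is assumed to
be registered already):

# Daily snapshot at midnight; cron requires % to be escaped as \%
0 0 * * * curl -s -XPUT "http://localhost:9200/_snapshot/s3_backup/snapshot_$(date +\%Y\%m\%d)?wait_for_completion=false"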

-- 
You received this message because you are subscribed to the Google Groups 
"elasticsearch" group.
To unsubscribe from this group and stop receiving emails from it, send an email 
to elasticsearch+unsubscr...@googlegroups.com.
To view this discussion on the web visit 
https://groups.google.com/d/msgid/elasticsearch/8620f7d9-b827-470d-8928-75c308e722cc%40googlegroups.com.
For more options, visit https://groups.google.com/d/optout.


Elasticsearch twitter river filtered stream question

2014-07-07 Thread Josh Harrison
Quick question about the ES twitter river at 
https://github.com/elasticsearch/elasticsearch-river-twitter
The Twitter streaming API allows you to filter, and you apparently get up 
to 1% of the total stream matching your search queries. So, if I were filtering 
for "coffee", I'd get "coffee" tweets that I wouldn't get if I was just 
capturing the 1% stream passively.
Does the Twitter river use this filter functionality, or does it do its 
filtering on the ingestion side, ingesting the normal 1% stream and 
discarding anything that doesn't match?
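
For reference, the filter block from the river README is what I am asking
about; a sketch of the _meta document I have in mind (OAuth values elided,
river and index names made up):

curl -XPUT 'localhost:9200/_river/my_twitter_river/_meta' -d '{
    "type" : "twitter",
    "twitter" : {
        "oauth" : { ... },
        "filter" : {
            "tracks" : "coffee"
        }
    },
    "index" : {
        "index" : "my_twitter_river"
    }
}'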

-- 
You received this message because you are subscribed to the Google Groups 
"elasticsearch" group.
To unsubscribe from this group and stop receiving emails from it, send an email 
to elasticsearch+unsubscr...@googlegroups.com.
To view this discussion on the web visit 
https://groups.google.com/d/msgid/elasticsearch/b7adb5f1-49a1-4424-8f4e-1c75e15c4cb0%40googlegroups.com.
For more options, visit https://groups.google.com/d/optout.


Setting id of document with elasticsearch-hadoop that is not in source document

2014-07-07 Thread Brian Thomas
I am trying to update an elasticsearch index using elasticsearch-hadoop.  I 
am aware of the *es.mapping.id* configuration where you can specify that 
field in the document to use as an id, but in my case the source document 
does not have the id (I used elasticsearch's autogenerated id when indexing 
the document).  Is it possible to specify the id to update without having 
to add a new field to the MapWritable object?


-- 
You received this message because you are subscribed to the Google Groups 
"elasticsearch" group.
To unsubscribe from this group and stop receiving emails from it, send an email 
to elasticsearch+unsubscr...@googlegroups.com.
To view this discussion on the web visit 
https://groups.google.com/d/msgid/elasticsearch/ce6161ad-d442-4ffb-9162-114cb8cd76dd%40googlegroups.com.
For more options, visit https://groups.google.com/d/optout.


Re: excessive merging/small segment sizes

2014-07-07 Thread Kireet Reddy
All that seems correct (except I think this is for node 6, not node 5). We 
don't delete documents, but we do some updates. The vast majority of 
documents get indexed into the large shards, but the smaller ones take some 
writes as well.

We aren't using virtualized hardware and elasticsearch is the only thing 
running on the machines, no scheduled jobs, etc. I don't think something is 
interfering; actually, overall disk i/o rate and operations on the machine 
go down quite a bit during the problematic period, which is consistent with 
your observations about things taking longer.

I went back and checked all our collected metrics again. I noticed that 
even though the heap usage and gc count seem smooth during the period in 
question, gc time spent goes way up. Also active indexing threads go up, 
but since our ingest rate didn't go up I assumed this was a side effect. 
During a previous occurrence a few days ago on node5, I stopped all 
indexing activity for 15 minutes. Active merges and indexing requests went 
to zero as expected. Then I re-enabled indexing and immediately the 
increased cpu/gc/active merges went back up to the problematic rates.

Overall this is pretty confusing to me as to what is a symptom vs a root 
cause here. A summary of what I think I know:

   1. Every few days, cpu usage on a node goes way above the other nodes 
   and doesn't recover. We've let the node run in the elevated cpu state for a 
   day with no improvement.
   2. It doesn't seem likely that it's data related. We use replicas=1 and 
   no other nodes have issues.
   3. It doesn't seem hardware related. We run on a dedicated h/w with 
   elasticsearch being the only thing running. Also the problem appears on 
   various nodes and machine load seems tied directly to the elasticsearch 
   process.
   4. During the problematic period: cpu usage, active merge threads, 
   active bulk (indexing) threads, and gc time are elevated.
   5. During the problematic period: i/o ops and i/o throughput decrease.
   6. overall heap usage size seems to smoothly increase, the extra gc time 
   seems to be spent on the new gen. Interestingly, the gc count didn't seem 
   to increase.
   7. In the hours beforehand, gc behavior of the problematic node was 
   similar to the other nodes.
   8. If I pause indexing, machine load quickly returns to normal, merges 
   and indexing requests complete.  If I then restart indexing, the problem 
   reoccurs immediately.
   9. If I disable automatic refreshes, the problem disappears within an 
   hour or so.
   10. hot threads show merging activity as the hot threads.

The first few points make me think the increased active merges is perhaps a 
side effect, but then the last 3 make me think merging is the root cause. 
The only additional things I can think of that may be relevant are:

   1. Our documents can vary greatly in size, they average a couple KB but 
   can rarely be several MB. 
   2. we do use language analysis plugins, perhaps one of these is acting 
   up? 
   3. We eagerly load one field into the field data cache. But the cache 
   size is ok and the overall heap behavior is ok so I don't think this is the 
   problem.

That's a lot of information, but I am not sure where to go next from here...
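
(For reference, when I say I disabled automatic refreshes in point 9, I mean
the live settings update below; index name is made up.)

curl -XPUT 'localhost:9200/myindex/_settings' -d '{
    "index" : { "refresh_interval" : "-1" }
}'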

On Monday, July 7, 2014 8:23:20 AM UTC-7, Michael McCandless wrote:
>
> Indeed there are no big merges during that time ...
>
> I can see on node5, ~14:45 suddenly merges are taking a long time, refresh 
> is taking much longer (4-5 seconds instead of < .4 sec), commit time goes 
> up from < 0.5 sec to ~1-2 sec, etc., but other metrics are fine e.g. total 
> merging GB, number of commits/refreshes is very low during this time.
>
> Each node has 2 biggish (~17 GB) shards and then ~50 tiny shards.  The 
> biggish shards are indexing at a very slow rate and only have ~1% 
> deletions.  Are you explicitly deleting docs?
>
> I suspect something is suddenly cutting into the IO perf of this box, and 
> because merging/refreshing is so IO intensive, it causes these operations 
> to run slower / backlog.
>
> Are there any scheduled jobs, e.g. backups/snapshots, that start up?  Are 
> you running on virtualized hardware?
>
>
> Mike McCandless
>
> http://blog.mikemccandless.com
>  
>
> On Sun, Jul 6, 2014 at 8:23 PM, Kireet Reddy  > wrote:
>
>> Just to reiterate, the problematic period is from 07/05 14:45 to 07/06 
>> 02:10. I included a couple hours before and after in the logs.
>>
>>
>> On Sunday, July 6, 2014 5:17:06 PM UTC-7, Kireet Reddy wrote:
>>>
>>> They are linked below (node5 is the log of the normal node, node6 is the 
>>> log of the problematic node). 
>>>
>>> I don't think it was doing big merges, otherwise during the high load 
>>> period, the merges graph line would have had a "floor" > 0, similar to the 
>>> time period after I disabled refresh. We don't do routing and use mostly 
>>> default settings. I think the only settings we changed are:
>>>
>>> indices.memory.index_buffer_size: 50%

Re: Opening TransportClient connection per Index

2014-07-07 Thread joergpra...@gmail.com
You can enlarge thread pools in TransportClient, and also the Netty worker
threads. User session state should be managed in the front-end service (a
reverse proxy or Java middleware, for example), so it is still ok to use a
singleton TransportClient since it is stateless. It handles requests and the
corresponding responses.

If 100 users are active at the same time, resources might get low, so you
should think about ramping up a line of front-end services on more than one
machine to balance the front-end load.
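
A minimal sketch of such a singleton (cluster name and address are made up;
ES 1.x API):

import org.elasticsearch.client.transport.TransportClient;
import org.elasticsearch.common.settings.ImmutableSettings;
import org.elasticsearch.common.settings.Settings;
import org.elasticsearch.common.transport.InetSocketTransportAddress;

public final class EsClientHolder {
    private static final TransportClient CLIENT;
    static {
        Settings settings = ImmutableSettings.settingsBuilder()
                .put("cluster.name", "my-cluster") // assumption
                .build();
        CLIENT = new TransportClient(settings)
                .addTransportAddress(new InetSocketTransportAddress("es-host", 9300));
    }
    private EsClientHolder() {}

    // One stateless client shared by all user sessions and all indices.
    public static TransportClient client() {
        return CLIENT;
    }
}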

Jörg


On Mon, Jul 7, 2014 at 5:45 PM, AsyncAwait  wrote:

> I have a use case in which I want to create an index per account (assume
> an account represents a user); all data belonging to that user will be kept
> in that index. My question is - what if we create a connection per index and
> keep it alive during the user session? This means for 100 active users,
> there will be 100 connections from the TransportClient to the ES cluster. I
> don't see a reason why 100 instances of TransportClient have to be
> initialized and kept in memory. Not sure if a singleton would unnecessarily
> bring thread locks.
>
> Any help is greatly appreciated.
>
> Thanks
>
> --
> You received this message because you are subscribed to the Google Groups
> "elasticsearch" group.
> To unsubscribe from this group and stop receiving emails from it, send an
> email to elasticsearch+unsubscr...@googlegroups.com.
> To view this discussion on the web visit
> https://groups.google.com/d/msgid/elasticsearch/ce54bd4b-1c11-429a-8998-76482f051375%40googlegroups.com
> 
> .
> For more options, visit https://groups.google.com/d/optout.
>

-- 
You received this message because you are subscribed to the Google Groups 
"elasticsearch" group.
To unsubscribe from this group and stop receiving emails from it, send an email 
to elasticsearch+unsubscr...@googlegroups.com.
To view this discussion on the web visit 
https://groups.google.com/d/msgid/elasticsearch/CAKdsXoEjt3qCUWD_5rxv84XL8BYNMzy2dKsfsfr8j-017cNbow%40mail.gmail.com.
For more options, visit https://groups.google.com/d/optout.


Re: How can I do intersection or union operation with two facets filter?

2014-07-07 Thread Harish Ved
Did you try the following query?
"query":{
"filtered" : {
"filter" : {
"bool" : {
"should" : [
{"term" : { "field_B" : "" }},
{"term": {"field_B": "bbb"}}
]
}
}
}
  },
"facets" : {
"tag" : {
"terms" : {
"field" : "field_A"
}
}
}
}


Please confirm whether it works for you.

On Monday, 7 July 2014 15:56:53 UTC+5:30, 闫旭 wrote:
>
>  Dear All!
> I have some docs:
> {"field_A":"aaa","field_B":"bbb"}
> {"field_A":"aaa","field_B":"ccc"}
> {"field_A":"bbb","field_B":"bbb"}
> {"field_A":"bbb","field_B":"bbb"}
> {"field_A":"bbb","field_B":"eee"}
> {"field_A":"aaa","field_B":""}
> {"field_A":"ccc","field_B":""}
> first step:
> {
>   "query":{
> "filtered" : {
> "filter" : {
> "bool" : {
> "must" : {
> "term" : { "field_B" : "bbb" }
> }
> }
> }
> }
>   },
> "facets" : {
> "tag" : {
> "terms" : {
> "field" : "field_A"
> }
> }
> }
> }
> first result:
> {
> ...
> {"term":"aaa","count":1},
> {"term":"bbb","count":2},
> ...
> }
> -
> second step:
> the second facets:
> {
>   "query":{
> "filtered" : {
> "filter" : {
> "bool" : {
> "must" : {
> "term" : { "field_B" : "" }
> }
> }
> }
> }
>   },
> "facets" : {
> "tag" : {
> "terms" : {
> "field" : "field_A"
> }
> }
> }
> }
> second result:
> {
> ...
> {"term":"aaa","count":1}.
> {"term":"ccc","count":1}
> ...
> }
>
> -
> third step:
> combine the two result with interesction operation with "term":
> {"term":"aaa","count":"I don't care the count value."}
>
>
> -
> Now, how can I combine the three steps in one filter facets call or other 
> method?
>
>
> Thx All!!
>
> 

-- 
You received this message because you are subscribed to the Google Groups 
"elasticsearch" group.
To unsubscribe from this group and stop receiving emails from it, send an email 
to elasticsearch+unsubscr...@googlegroups.com.
To view this discussion on the web visit 
https://groups.google.com/d/msgid/elasticsearch/93d5adb6-bcd1-49b6-8c28-be4d456f92d8%40googlegroups.com.
For more options, visit https://groups.google.com/d/optout.


Re: Search thread pools not released

2014-07-07 Thread Ivan Brusic
Still analyzing all the logs and dumps that I have accumulated so far, but
it looks like the blocking socket appender might be the issue. After that
node exhausts all of its search threads, the TransportClient will still
issue requests to it, although other nodes do not have issues. After a
while, the client application will also be blocked waiting for
Elasticsearch to return.

I removed logging for now, will re-implement it with a service that reads
directly from the duplicate file-based log. Although I have a timeout
specific for my query, my recollection of the search code is that it only
applies to the Lucene LimitedCollector (it's been a while since I looked at
that code). The next step should be to add an explicit timeout
to actionGet(). Is the default basically no wait?

It might be a challenge for the cluster engine to not delegate queries to
overloaded servers.

Cheers,

Ivan


On Sun, Jul 6, 2014 at 2:36 PM, joergpra...@gmail.com  wrote:

> Yes, socket appender blocks. Maybe the async appender of log4j can do
> better ...
>
> http://ricardozuasti.com/2009/asynchronous-logging-with-log4j/
>
> Jörg
>
>
> On Sun, Jul 6, 2014 at 11:22 PM, Ivan Brusic  wrote:
>
>> Forgot to mention the thread dumps. I have taken them before, but not
>> this time. Most of the blocked search thread pools are stuck in log4j.
>>
>> https://gist.github.com/brusic/fc12536d8e5706ec9c32
>>
>> I do have a socket appender to logstash (elasticsearch logs in
>> elasticsearch!). Let me debug this connection.
>>
>> --
>> Ivan
>>
>>
>> On Sun, Jul 6, 2014 at 1:55 PM, joergpra...@gmail.com <
>> joergpra...@gmail.com> wrote:
>>
>>> Can anything be seen in a thread dump that looks like stray queries?
>>> Maybe some facet queries hung while resources went low and never
>>> returned?
>>>
>>> Jörg
>>>
>>>
>>> On Sun, Jul 6, 2014 at 9:59 PM, Ivan Brusic  wrote:
>>>
 Having an issue on one of my clusters running version 1.1.1 with 8
 master/data nodes, unicast, connecting via the Java TransportClient. A few
 REST queries are executed via monitoring services.

 Currently there is almost no traffic on this cluster. The few queries
 that are currently running are either small test queries or large facet
 queries (which are infrequent and the longest runs for 16 seconds). What I
 am noticing is that the active search threads on some nodes never decreases
 and when it reaches the limit, the entire cluster will stop accepting
 requests. The current max is the default (3 x 8).

 http://search06:9200/_cat/thread_pool

 search05 1.1.1.5 0 0 0 0 0 0 19 0 0
 search07 1.1.1.7 0 0 0 0 0 0  0 0 0
 search08 1.1.1.8 0 0 0 0 0 0  0 0 0
 search09 1.1.1.9 0 0 0 0 0 0  0 0 0
 search11 1.1.1.11 0 0 0 0 0 0  0 0 0
 search06 1.1.1.6 0 0 0 0 0 0  2 0 0
 search10 1.1.1.10 0 0 0 0 0 0  0 0 0
 search12 1.1.1.12 0 0 0 0 0 0  0 0 0

 In this case, both search05 and search06 have an active thread count
 that does not change. If I run a query against search05, the search will
 respond quickly and the total number of active search threads does not
 increase.

 So I have two related issues:
 1) the active thread count does not decrease
 2) the cluster will not accept requests if one node becomes unstable.

 I have seen the issue intermittently in the past, but the issue has
 started again and cluster restarts does not fix the problem. At the log
 level, there have been issues with the cluster state not propagating. Not
 every node will acknowledge the cluster state ([discovery.zen.publish]
 received cluster state version NNN) and the master would log a timeout
 (awaiting all nodes to process published state NNN timed out, timeout 30s).
 The nodes are fine and they can ping each other with no issues. Currently not
 seeing any log errors with the thread pool issue, so perhaps it is a red
 herring.

 Cheers,

 Ivan


 --
 You received this message because you are subscribed to the Google
 Groups "elasticsearch" group.
 To unsubscribe from this group and stop receiving emails from it, send
 an email to elasticsearch+unsubscr...@googlegroups.com.
 To view this discussion on the web visit
 https://groups.google.com/d/msgid/elasticsearch/CALY%3DcQCx91LEXP0NxbgC4-mVR27DX%2BuOxyor5cqiM6ie2JExBw%40mail.gmail.com
 
 .
 For more options, visit https://groups.google.com/d/optout.

>>>
>>>  --
>>> You received this message because you are subscribed to the Google
>>> Groups "elasticsearch" group.
>>> To unsubscribe from this group and stop receiving emails from it, send
>>> an email to elasticsearch+unsubscr...@googlegroups.com.
>>> To view this discussion on the web visit
>>> https://groups.google.com/d/msgid

Re: java.lang.NoSuchFieldError: ALLOW_UNQUOTED_FIELD_NAMES when trying to query elasticsearch using spark

2014-07-07 Thread Costin Leau
Thanks for the analysis. It looks like Hadoop 1.0.4 has an invalid POM:
though it ships Jackson 1.8.8 (see the distro), the POM declares version
1.0.1 for some reason. Hadoop version 1.2 (the latest stable) and higher
has this fixed.

We don't mark the Jackson version within our POM since it's already
available at runtime - we can probably do so going forward in the Spark
integration.
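
As a side note, if upgrading Hadoop is not an option, Gradle can also force
a newer Jackson onto the classpath (the version below is just an example):

configurations.all {
    resolutionStrategy {
        force 'org.codehaus.jackson:jackson-core-asl:1.8.8'
        force 'org.codehaus.jackson:jackson-mapper-asl:1.8.8'
    }
}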


On Mon, Jul 7, 2014 at 6:39 PM, Brian Thomas 
wrote:

> Here is the gradle build I was using originally:
>
> apply plugin: 'java'
> apply plugin: 'eclipse'
>
> sourceCompatibility = 1.7
> version = '0.0.1'
> group = 'com.spark.testing'
>
> repositories {
>     mavenCentral()
> }
>
> dependencies {
>     compile 'org.apache.spark:spark-core_2.10:1.0.0'
>     compile 'edu.stanford.nlp:stanford-corenlp:3.3.1'
>     compile group: 'edu.stanford.nlp', name: 'stanford-corenlp', version: '3.3.1', classifier: 'models'
>     compile files('lib/elasticsearch-hadoop-2.0.0.jar')
>     testCompile 'junit:junit:4.+'
>     testCompile group: "com.github.tlrx", name: "elasticsearch-test", version: "1.2.1"
> }
>
>
> When I ran dependencyInsight on jackson, I got the following output:
>
> C:\dev\workspace\SparkProject>gradle dependencyInsight --dependency
> jackson-core
>
> :dependencyInsight
> com.fasterxml.jackson.core:jackson-core:2.3.0
> \--- com.fasterxml.jackson.core:jackson-databind:2.3.0
>      +--- org.json4s:json4s-jackson_2.10:3.2.6
>      |    \--- org.apache.spark:spark-core_2.10:1.0.0
>      |         \--- compile
>      \--- com.codahale.metrics:metrics-json:3.0.0
>           \--- org.apache.spark:spark-core_2.10:1.0.0 (*)
>
> org.codehaus.jackson:jackson-core-asl:1.0.1
> \--- org.codehaus.jackson:jackson-mapper-asl:1.0.1
>      \--- org.apache.hadoop:hadoop-core:1.0.4
>           \--- org.apache.hadoop:hadoop-client:1.0.4
>                \--- org.apache.spark:spark-core_2.10:1.0.0
>                     \--- compile
>
> Version 1.0.1 of jackson-core-asl does not have the field
> ALLOW_UNQUOTED_FIELD_NAMES, but later versions of it do.
>
> On Sunday, July 6, 2014 4:28:56 PM UTC-4, Costin Leau wrote:
>
>> Hi,
>>
>> Glad to see you sorted out the problem. Out of curiosity what version of
>> jackson were you using and what was pulling it in? Can you share you maven
>> pom/gradle build?
>>
>>
>> On Sun, Jul 6, 2014 at 10:27 PM, Brian Thomas 
>> wrote:
>>
>>> I figured it out, dependency issue in my classpath.  Maven was pulling
>>> down a very old version of the jackson jar.  I added the following line to
>>> my dependencies and the error went away:
>>>
>>> compile 'org.codehaus.jackson:jackson-mapper-asl:1.9.13'
>>>
>>>
>>> On Friday, July 4, 2014 3:22:30 PM UTC-4, Brian Thomas wrote:

  I am trying to test querying elasticsearch using Apache Spark using
 elasticsearch-hadoop.  I am just trying to do a query to the elasticsearch
 server and return the count of results.

 Below is my test class using the Java API:

 import org.apache.hadoop.conf.Configuration;
 import org.apache.hadoop.io.MapWritable;
 import org.apache.hadoop.io.Text;
 import org.apache.spark.SparkConf;
 import org.apache.spark.api.java.JavaPairRDD;
 import org.apache.spark.api.java.JavaSparkContext;
 import org.apache.spark.serializer.KryoSerializer;
 import org.elasticsearch.hadoop.mr.EsInputFormat;

 import scala.Tuple2;

 public class ElasticsearchSparkQuery{

 public static int query(String masterUrl, String
 elasticsearchHostPort) {
 SparkConf sparkConfig = new SparkConf().setAppName("ESQuery").setMaster(masterUrl);
 sparkConfig.set("spark.serializer",
 KryoSerializer.class.getName());
 JavaSparkContext sparkContext = new
 JavaSparkContext(sparkConfig);

 Configuration conf = new Configuration();
 conf.setBoolean("mapred.map.tasks.speculative.execution",
 false);
 conf.setBoolean("mapred.reduce.tasks.speculative.execution",
 false);
 conf.set("es.nodes", elasticsearchHostPort);
 conf.set("es.resource", "media/docs");
 conf.set("es.query", "?q=*");

 JavaPairRDD<Text, MapWritable> esRDD = sparkContext.newAPIHadoopRDD(conf,
 EsInputFormat.class, Text.class, MapWritable.class);
 return (int) esRDD.count();
 }
 }


 When I try to run this I get the following error:


 14/07/04 14:58:07 INFO executor.Executor: Running task ID 0
 14/07/04 14:58:07 INFO storage.BlockManager: Found block broadcast_0
 locally
 14/07/04 14:58:07 INFO rdd.NewHadoopRDD: Input split: ShardInputSplit
 [node=[5UATWUzmTUuNzhmGxXWy_w/S'byll|10.45.71.152:9200],shard=0]
 14/07/04 14:58:07 WARN mr.EsInputFormat: Cannot determine task id...
 14/07/04 14:58:07 ERROR executor.Executor: Exception in task ID 0
 java.lang.NoSuchFieldError: ALLOW_UNQUOTED_FIELD_NAMES
 at org.elasticsearch.hadoop.

Re: Memory issues on ES client node

2014-07-07 Thread joergpra...@gmail.com
I think this is not a concurrency problem; rather, the cluster wanted to
deliver a huge portion of data (just guessing about such query responses
because I do not know anything about the queries on your system).

The client timeout for receiving data is around 30 secs IIRC. It only means
that the client may have given up before the transfer of the data was done.

Jörg


On Mon, Jul 7, 2014 at 4:49 AM, Venkat Morampudi 
wrote:

> It is expected that nodes move huge volumes of data, but what I was wondering
> is why the objects are not being garbage collected. Also, there are 242
> TransportSearchQueryThenFetchAction$AsyncAction instances; I don’t think that
> kind of concurrency is expected. I couldn’t yet find from the code which
> object is holding those objects.
>
> The timeouts that you are referring to, are these between the client node and
> the data nodes, or between the client node and the consumer?  Is there anything
> the consumer needs to do to release objects?
>
> Thanks for your time,
> -VM
>
>
> On Jul 2, 2014, at 7:08 AM, joergpra...@gmail.com wrote:
>
> I'm not sure but it looks like a node tries to move some GB of document
> hits around. This might have triggered timeouts at other places (probably
> with node disconnects) and maybe the GB chunk is not yet GC collected, so
> you see this in your heap analyzer tool.
>
> It depends on the search results and search hits you generated if the
> heaviness of the search result is expected or not, so it would be useful to
> know more about your queries.
>
> Jörg
>
>
> On Wed, Jul 2, 2014 at 3:29 AM, Venkat Morampudi <
> venkatmoramp...@gmail.com> wrote:
>
>> Thanks for the reply, Jörg. I don't have any logs. I will try to enable them,
>> but it would take some time though. If there is anything in
>> particular that we need to enable, please let me know.
>>
>> -VM
>>
>>
>> On Tuesday, July 1, 2014 12:58:21 PM UTC-7, Jörg Prante wrote:
>>
>>> Do you have anything in your logs, i.e. many disconnects/reconnects?
>>>
>>> Jörg
>>>
>>>
>>> On Tue, Jul 1, 2014 at 7:59 PM, Venkat Morampudi 
>>> wrote:
>>>
 In the Elasticsearch deployment we are seeing random client node
 crashes due to out of memory exceptions. I got the memory dump from one of
 the crash and analysed using Eclipse memory analyzer. I have attached leak
 suspect report. Apparently 242 objects of type org.elasticsearch.action.
 search.type.TransportSearchQueryThenFetchAction$AsyncAction are
 holding almost 8gb of memory. I have spent some time on source code but
 couldn't find anything obvious.


 I would really appreciate any help with this issue.


 -VM

 --
 You received this message because you are subscribed to the Google
 Groups "elasticsearch" group.
 To unsubscribe from this group and stop receiving emails from it, send
 an email to elasticsearc...@googlegroups.com.

 To view this discussion on the web visit https://groups.google.com/d/
 msgid/elasticsearch/37881ead-70c2-40d8-89b6-a771b2a36bdd%
 40googlegroups.com
 
 .
 For more options, visit https://groups.google.com/d/optout.

>>>
>>>
>> --
>> You received this message because you are subscribed to the Google Groups
>> "elasticsearch" group.
>> To unsubscribe from this group and stop receiving emails from it, send an
>> email to elasticsearch+unsubscr...@googlegroups.com.
>> To view this discussion on the web visit
>> https://groups.google.com/d/msgid/elasticsearch/9930fcfd-d2d4-4f62-b8a0-8f1f989069f2%40googlegroups.com
>> 
>> .
>>
>> For more options, visit https://groups.google.com/d/optout.
>>
>
>
> --
> You received this message because you are subscribed to a topic in the
> Google Groups "elasticsearch" group.
> To unsubscribe from this topic, visit
> https://groups.google.com/d/topic/elasticsearch/EH76o1CIeQQ/unsubscribe.
> To unsubscribe from this group and all its topics, send an email to
> elasticsearch+unsubscr...@googlegroups.com.
> To view this discussion on the web visit
> https://groups.google.com/d/msgid/elasticsearch/CAKdsXoE_Xum%2BU%3D-M-X_R93qbDdOKx-QFS2PFCbxcik-uqtpBbw%40mail.gmail.com
> 
> .
>
> For more options, visit https://groups.google.com/d/optout.
>
>
>  --
> You received this message because you are subscribed to the Google Groups
> "elasticsearch" group.
> To unsubscribe from this group and stop receiving emails from it, send an
> email to elasticsearch+unsubscr...@googlegroups.com.
> To view this discussion on the web visit
> https://groups.google.com/d/msgid/elasticsearch/7E54C4B5-AE5A-4E64-8199-B923E83376

Re: New Logstash setup issue with iptables

2014-07-07 Thread Lois Bennett
Thank you, Linus!   That did the trick!  

Peace and Joy,
Lois

On Saturday, July 5, 2014 8:51:19 AM UTC-4, Linus Askengren wrote:
>
> Hi Lois,
>
> I had the exact same problem, the discovery is running on udp 54328 by 
> default; opening that port solved it for me.
>
> Hope it helps
> Linus
>

-- 
You received this message because you are subscribed to the Google Groups 
"elasticsearch" group.
To unsubscribe from this group and stop receiving emails from it, send an email 
to elasticsearch+unsubscr...@googlegroups.com.
To view this discussion on the web visit 
https://groups.google.com/d/msgid/elasticsearch/387bd4dd-cd80-41c2-815f-329eeac33f63%40googlegroups.com.
For more options, visit https://groups.google.com/d/optout.


Re: ingest performance degrades sharply along with the documents having more fileds

2014-07-07 Thread Mahesh Venkat
Thanks Shay for updating us with perf improvements.
Apart from using the default parameters, should we follow the guideline 
listed in 

http://elasticsearch-users.115913.n3.nabble.com/Is-ES-es-index-store-type-memory-equivalent-to-Lucene-s-RAMDirectory-td4057417.html
 

Lucene supports MMapDirectory during the data indexing phase (in a batch), 
with a switch to in-memory for queries to optimize search latency.

Should we use the JVM system parameter -Des.index.store.type=memory?  Isn't 
this equivalent to using RAMDirectory in Lucene for in-memory search 
queries?
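
For concreteness, the per-index equivalent I am asking about would 
presumably look like this (index name made up):

curl -XPUT 'localhost:9200/my_index' -d '{
    "settings" : { "index.store.type" : "memory" }
}'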
Thanks
--Mahesh

On Saturday, July 5, 2014 8:46:59 AM UTC-7, kimchy wrote:
>
> Heya, I worked a bit on it, and 1.x (upcoming 1.3) has some significant 
> perf improvements now for this case (including improvements Lucene wise, 
> that are for now in ES, but will be in Lucene next version). Those include:
>
> 6648: https://github.com/elasticsearch/elasticsearch/pull/6648
> 6714: https://github.com/elasticsearch/elasticsearch/pull/6714
> 6707: https://github.com/elasticsearch/elasticsearch/pull/6707
>
> It would be interesting if you can run the tests again with 1.x branch. 
> Note, also, please use default features in ES for now, no disable flushing 
> and such.
>
> On Friday, June 13, 2014 7:57:23 AM UTC+2, Maco Ma wrote:
>>
>> I try to measure the performance of ingesting the documents having lots 
>> of fields.
>>
>>
>> The latest elasticsearch 1.2.1:
>> Total docs count: 10k (a small set definitely)
>> ES_HEAP_SIZE: 48G
>> settings:
>>
>> {"doc":{"settings":{"index":{"uuid":"LiWHzE5uQrinYW1wW4E3nA","number_of_replicas":"0","translog":{"disable_flush":"true"},"number_of_shards":"5","refresh_interval":"-1","version":{"created":"1020199"}
>>
>> mappings:
>>
>> {"doc":{"mappings":{"type":{"dynamic_templates":[{"t1":{"mapping":{"store":false,"norms":{"enabled":false},"type":"string"},"match":"*_ss"}},{"t2":{"mapping":{"store":false,"type":"date"},"match":"*_dt"}},{"t3":{"mapping":{"store":false,"type":"integer"},"match":"*_i"}}],"_source":{"enabled":false},"properties":{}
>>
>> All fields in the documents mach the templates in the mappings.
>>
>> Since I disabled the flush & refresh, I submitted the flush command 
>> (along with optimize command after it) in the client program every 10 
>> seconds. (I tried the another interval 10mins and got the similar results)
>>
>> Scenario 0 - 10k docs have 1000 different fields:
>> Ingestion took 12 secs.  Only 1.08G heap mem is used (only states the used 
>> heap memory).
>>
>>
>> Scenario 1 - 10k docs have 10k different fields(10 times fields compared 
>> with scenario0):
>> This time ingestion took 29 secs.   Only 5.74G heap mem is used.
>>
>> Not sure why the performance degrades sharply.
>>
>> If I try to ingest the docs having 100k different fields, it will take 17 
>> mins 44 secs.  We only have 10k docs in total and I am not sure why ES performs so 
>> badly. 
>>
>> Anyone can give suggestion to improve the performance?
>>
>>
>>
>>
>>
>>
>>
>>

-- 
You received this message because you are subscribed to the Google Groups 
"elasticsearch" group.
To unsubscribe from this group and stop receiving emails from it, send an email 
to elasticsearch+unsubscr...@googlegroups.com.
To view this discussion on the web visit 
https://groups.google.com/d/msgid/elasticsearch/9456c6ab-1f0b-4021-b011-d8573032915a%40googlegroups.com.
For more options, visit https://groups.google.com/d/optout.


Warmer queries - many small or one large

2014-07-07 Thread Jonathan Foy
Hello

The subject pretty much says it all... Is there an advantage one way or the 
other to having several (or many) small (single-term) warmer queries rather 
than a single large query that searches all desired fields?
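
For reference, by "small warmer" I mean something like this (index, warmer 
name, and field are made up):

curl -XPUT 'localhost:9200/my_index/_warmer/warm_status_active' -d '{
    "query" : { "term" : { "status" : "active" } }
}'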

-- 
You received this message because you are subscribed to the Google Groups 
"elasticsearch" group.
To unsubscribe from this group and stop receiving emails from it, send an email 
to elasticsearch+unsubscr...@googlegroups.com.
To view this discussion on the web visit 
https://groups.google.com/d/msgid/elasticsearch/eb396abd-514d-4d81-9488-aa0fde98de18%40googlegroups.com.
For more options, visit https://groups.google.com/d/optout.


Opening TransportClient connection per Index

2014-07-07 Thread AsyncAwait
I have a use case in which I want to create an index per account (assume 
an account represents a user); all data belonging to that user will be kept 
in that index. My question is - what if we create a connection per index and 
keep it alive during the user session? This means for 100 active users, 
there will be 100 connections from the TransportClient to the ES cluster. I 
don't see a reason why 100 instances of TransportClient have to be 
initialized and kept in memory. Not sure if a singleton would unnecessarily 
bring thread locks.

Any help is greatly appreciated.

Thanks

-- 
You received this message because you are subscribed to the Google Groups 
"elasticsearch" group.
To unsubscribe from this group and stop receiving emails from it, send an email 
to elasticsearch+unsubscr...@googlegroups.com.
To view this discussion on the web visit 
https://groups.google.com/d/msgid/elasticsearch/ce54bd4b-1c11-429a-8998-76482f051375%40googlegroups.com.
For more options, visit https://groups.google.com/d/optout.


Re: java.lang.NoSuchFieldError: ALLOW_UNQUOTED_FIELD_NAMES when trying to query elasticsearch using spark

2014-07-07 Thread Brian Thomas
Here is the gradle build I was using originally:

apply plugin: 'java'
apply plugin: 'eclipse'

sourceCompatibility = 1.7
version = '0.0.1'
group = 'com.spark.testing'

repositories {
    mavenCentral()
}

dependencies {
    compile 'org.apache.spark:spark-core_2.10:1.0.0'
    compile 'edu.stanford.nlp:stanford-corenlp:3.3.1'
    compile group: 'edu.stanford.nlp', name: 'stanford-corenlp', version: '3.3.1', classifier: 'models'
    compile files('lib/elasticsearch-hadoop-2.0.0.jar')
    testCompile 'junit:junit:4.+'
    testCompile group: "com.github.tlrx", name: "elasticsearch-test", version: "1.2.1"
}


When I ran dependencyInsight on jackson, I got the following output:

C:\dev\workspace\SparkProject>gradle dependencyInsight --dependency 
jackson-core

:dependencyInsight
com.fasterxml.jackson.core:jackson-core:2.3.0
\--- com.fasterxml.jackson.core:jackson-databind:2.3.0
     +--- org.json4s:json4s-jackson_2.10:3.2.6
     |    \--- org.apache.spark:spark-core_2.10:1.0.0
     |         \--- compile
     \--- com.codahale.metrics:metrics-json:3.0.0
          \--- org.apache.spark:spark-core_2.10:1.0.0 (*)

org.codehaus.jackson:jackson-core-asl:1.0.1
\--- org.codehaus.jackson:jackson-mapper-asl:1.0.1
     \--- org.apache.hadoop:hadoop-core:1.0.4
          \--- org.apache.hadoop:hadoop-client:1.0.4
               \--- org.apache.spark:spark-core_2.10:1.0.0
                    \--- compile

Version 1.0.1 of jackson-core-asl does not have the field 
ALLOW_UNQUOTED_FIELD_NAMES, but later versions of it do.

On Sunday, July 6, 2014 4:28:56 PM UTC-4, Costin Leau wrote:
>
> Hi,
>
> Glad to see you sorted out the problem. Out of curiosity what version of 
> jackson were you using and what was pulling it in? Can you share you maven 
> pom/gradle build?
>
>
> On Sun, Jul 6, 2014 at 10:27 PM, Brian Thomas  > wrote:
>
>> I figured it out, dependency issue in my classpath.  Maven was pulling 
>> down a very old version of the jackson jar.  I added the following line to 
>> my dependencies and the error went away:
>>
>> compile 'org.codehaus.jackson:jackson-mapper-asl:1.9.13'
>>
>>
>> On Friday, July 4, 2014 3:22:30 PM UTC-4, Brian Thomas wrote:
>>>
>>>  I am trying to test querying elasticsearch using Apache Spark using 
>>> elasticsearch-hadoop.  I am just trying to do a query to the elasticsearch 
>>> server and return the count of results.
>>>
>>> Below is my test class using the Java API:
>>>
>>> import org.apache.hadoop.conf.Configuration;
>>> import org.apache.hadoop.io.MapWritable;
>>> import org.apache.hadoop.io.Text;
>>> import org.apache.spark.SparkConf;
>>> import org.apache.spark.api.java.JavaPairRDD;
>>> import org.apache.spark.api.java.JavaSparkContext;
>>> import org.apache.spark.serializer.KryoSerializer;
>>> import org.elasticsearch.hadoop.mr.EsInputFormat;
>>>
>>> import scala.Tuple2;
>>>
>>> public class ElasticsearchSparkQuery{
>>>
>>> public static int query(String masterUrl, String 
>>> elasticsearchHostPort) {
>>> SparkConf sparkConfig = new SparkConf().setAppName("ESQuery").setMaster(masterUrl);
>>> sparkConfig.set("spark.serializer", 
>>> KryoSerializer.class.getName());
>>> JavaSparkContext sparkContext = new 
>>> JavaSparkContext(sparkConfig);
>>>
>>> Configuration conf = new Configuration();
>>> conf.setBoolean("mapred.map.tasks.speculative.execution", 
>>> false);
>>> conf.setBoolean("mapred.reduce.tasks.speculative.execution", 
>>> false);
>>> conf.set("es.nodes", elasticsearchHostPort);
>>> conf.set("es.resource", "media/docs");
>>> conf.set("es.query", "?q=*");
>>>
>>> JavaPairRDD<Text, MapWritable> esRDD = sparkContext.newAPIHadoopRDD(conf,
>>> EsInputFormat.class, Text.class, MapWritable.class);
>>> return (int) esRDD.count();
>>> }
>>> }
>>>
>>>
>>> When I try to run this I get the following error:
>>>
>>>
>>> 14/07/04 14:58:07 INFO executor.Executor: Running task ID 0
>>> 14/07/04 14:58:07 INFO storage.BlockManager: Found block broadcast_0 
>>> locally
>>> 14/07/04 14:58:07 INFO rdd.NewHadoopRDD: Input split: ShardInputSplit 
>>> [node=[5UATWUzmTUuNzhmGxXWy_w/S'byll|10.45.71.152:9200],shard=0]
>>> 14/07/04 14:58:07 WARN mr.EsInputFormat: Cannot determine task id...
>>> 14/07/04 14:58:07 ERROR executor.Executor: Exception in task ID 0
>>> java.lang.NoSuchFieldError: ALLOW_UNQUOTED_FIELD_NAMES
>>> at org.elasticsearch.hadoop.serialization.json.JacksonJsonParser.<init>(JacksonJsonParser.java:38)
>>> at org.elasticsearch.hadoop.serialization.ScrollReader.read(ScrollReader.java:75)
>>> at org.elasticsearch.hadoop.rest.RestRepository.scroll(RestRepository.java:267)
>>> at org.elasticsearch.hadoop.rest.ScrollQuery.hasNext(ScrollQuery.java:75)
>>> at org.elasticsearch.hadoop.mr.EsInputFormat$ShardRecordReader.next(EsInputFormat.java:319)
>>> at org.elasticsearch.hadoop.mr.EsInputFormat$ShardRecordReader.nextKeyValue(EsInputFormat.java:255)
>>> at org.apache.spark.rdd.

Re: ES Hadoop--Index only new documents without killing job from exceptions?

2014-07-07 Thread James Campbell
Thanks, Costin.  That makes sense; I've also commented on the issue you
mentioned on github.

Having more control over when to fail a job or choose to ignore certain
errors would definitely be a great feature from my perspective. I've
encountered a few different areas where I think extra control would be
valuable:

(1) Ability to fail on indexing failures (that persist despite the retry
policy). Currently multiple failed bulk retries are reported only via
a counter. Since job control programs such as Oozie don't make it easy to
fail a workflow based on a counter, I think it makes more sense to be able
to fail a job that had batches completely fail, else the documents may
never be searchable from Elasticsearch.

(2) DocumentAlreadyExists exceptions with the "create" write mode. Given
the batch nature of hadoop, there are cases (e.g. building autocomplete)
where it may make sense to update an index only with new data. To avoid a
reindex cost, it would be nice to be able to have a job succeed even if ES
throws a DocumentAlreadyExists exception so we can just throw data over to
ES to check whether it exists and ignore the request if it does.

(3) Malformed/bad data. Despite (2) above, it would be ideal to still be
able to throw errors and fail a job in the case of invalid data,
particularly in case of legitimately invalid JSON (such as unescaped
special characters that may have occurred in data that is being batch
processed from a binary container format in HDFS).
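
For reference, the write mode in (2) refers to the es.write.operation
setting, in the same Hadoop Configuration style used elsewhere in this
thread (resource name is made up):

import org.apache.hadoop.conf.Configuration;

public class WriteModeExample {
    // Mirrors the Configuration style from earlier in this thread.
    public static Configuration createOnlyConf() {
        Configuration conf = new Configuration();
        conf.set("es.resource", "media/docs");     // hypothetical index/type
        conf.set("es.write.operation", "create");  // currently fails the job on existing ids
        return conf;
    }
}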


On Sun, Jul 6, 2014 at 4:38 PM, Costin Leau  wrote:

> I would recommend indexing the document since it's a 'cheap' operation per
> document and it covers the potential differences between the docs. Also
> from a performance POV you are not going to lose much since you are anyway
> sending the doc to ES, which does hashing and returns the error to the user.
> So the only thing that you save and _might_ potentially see is the actual
> indexing which should become a problem only when dealing with large amounts
> of docs.
>
> These being said, there's already an issue opened [1] for
> trapping/handling errors during a job (to prevent it from being cancelled)
> which potentially can be used for such a purpose as well. Free free to add
> your comments to it.
>
> [1] https://github.com/elasticsearch/elasticsearch-hadoop/issues/160
>
>
> On Thu, Jul 3, 2014 at 8:49 PM, James Campbell  > wrote:
>
>> Hi ES-Hadoop users--
>>
>> I have a large list of simple documents that I would like to index for an
>> auto complete feature. At batch processing time, I do not know which values
>> are new (never seen before) and which are not (some other part of the
>> update process changed, but the autocomplete-relevant portion of the
>> document did not).
>>
>> I believe I could simply write all of the documents to the index whenever
>> I run a new batch with the default es.write.operation=index, but that will
>> cause ES to reindex the document each time even if it wasn't updated.
>>
>> On the other hand, if I choose to use es.write.operation=create, then any
>> existing documents will cause the job to fail.
>>
>> Is there a way to combine those behaviors, so that I can allow
>> elasticsearch to simply ignore requests to reindex existing documents
>> (based on _id) but not to throw an exception that kills the entire job?
>>
>> James Campbell
>>
>> --
>> You received this message because you are subscribed to the Google Groups
>> "elasticsearch" group.
>> To unsubscribe from this group and stop receiving emails from it, send an
>> email to elasticsearch+unsubscr...@googlegroups.com.
>>
>> To view this discussion on the web visit
>> https://groups.google.com/d/msgid/elasticsearch/2e5b93ef-0c42-4068-bc2c-33e4efbe429b%40googlegroups.com
>> .
>> For more options, visit https://groups.google.com/d/optout.
>>
>
>  --
> You received this message because you are subscribed to a topic in the
> Google Groups "elasticsearch" group.
> To unsubscribe from this topic, visit
> https://groups.google.com/d/topic/elasticsearch/EHJQsxb-s4w/unsubscribe.
> To unsubscribe from this group and all its topics, send an email to
> elasticsearch+unsubscr...@googlegroups.com.
> To view this discussion on the web visit
> https://groups.google.com/d/msgid/elasticsearch/CAJogdmeX_Dc-LRNcgPxY4bQ6drz43eL%3DuQnRVYYD-kjZ8%3DJebw%40mail.gmail.com
> 
> .
>
> For more options, visit https://groups.google.com/d/optout.
>

-- 
You received this message because you are subscribed to the Google Groups 
"elasticsearch" group.
To unsubscribe from this group and stop receiving emails from it, send an email 
to elasticsearch+unsubscr...@googlegroups.com.
To view this discussion on the web visit 
https://groups.google.com/d/msgid/elasticsearch/CA%2BAQu3xrGMFhDV%2B%2B7SGshm%2ByLHof7DV-RRy3inLOz-DVsCaHXg%40mail.gmail.com.
For more options, visit https://groups.google.com/d/optout.

Re: excessive merging/small segment sizes

2014-07-07 Thread Michael McCandless
Indeed there are no big merges during that time ...

I can see on node5, ~14:45 suddenly merges are taking a long time, refresh
is taking much longer (4-5 seconds instead of < .4 sec), commit time goes
up from < 0.5 sec to ~1-2 sec, etc., but other metrics are fine e.g. total
merging GB, number of commits/refreshes is very low during this time.

Each node has 2 biggish (~17 GB) shards and then ~50 tiny shards.  The
biggish shards are indexing at a very slow rate and only have ~1%
deletions.  Are you explicitly deleting docs?

I suspect something is suddenly cutting into the IO perf of this box, and
because merging/refreshing is so IO intensive, it causes these operations
to run slower / backlog.

Are there any scheduled jobs, e.g. backups/snapshots, that start up?  Are
you running on virtualized hardware?


Mike McCandless

http://blog.mikemccandless.com


On Sun, Jul 6, 2014 at 8:23 PM, Kireet Reddy  wrote:

> Just to reiterate, the problematic period is from 07/05 14:45 to 07/06
> 02:10. I included a couple hours before and after in the logs.
>
>
> On Sunday, July 6, 2014 5:17:06 PM UTC-7, Kireet Reddy wrote:
>>
>> They are linked below (node5 is the log of the normal node, node6 is the
>> log of the problematic node).
>>
>> I don't think it was doing big merges, otherwise during the high load
>> period, the merges graph line would have had a "floor" > 0, similar to the
>> time period after I disabled refresh. We don't do routing and use mostly
>> default settings. I think the only settings we changed are:
>>
>> indices.memory.index_buffer_size: 50%
>> index.translog.flush_threshold_ops: 5
>>
>> We are running on a 6 cpu/12 cores machine with a 32GB heap and 96GB of
>> memory with 4 spinning disks.
>>
>> node 5 log (normal) 
>> node 6 log (high load)
>> 
>>
>> On Sunday, July 6, 2014 4:23:19 PM UTC-7, Michael McCandless wrote:
>>>
>>> Can you post the IndexWriter infoStream output?  I can see if anything
>>> stands out.
>>>
>>> Maybe it was just that this node was doing big merges?  I.e., if you
>>> waited long enough, the other shards would eventually do their big merges
>>> too?
>>>
>>> Have you changed any default settings, do custom routing, etc.?  Is
>>> there any reason to think that the docs that land on this node are
>>> "different" in any way?
>>>
>>> Mike McCandless
>>>
>>> http://blog.mikemccandless.com
>>>
>>>
>>> On Sun, Jul 6, 2014 at 6:48 PM, Kireet Reddy  wrote:
>>>
  From all the information I’ve collected, it seems to be the merging
 activity:


1. We capture the cluster stats into graphite and the current
merges stat seems to be about 10x higher on this node.
2. The actual node that the problem occurs on has happened on
different physical machines so a h/w issue seems unlikely. Once the 
 problem
starts it doesn't seem to stop. We have blown away the indices in the 
 past
and started indexing again after enabling more logging/stats.
3. I've stopped executing queries so the only thing that's
happening on the cluster is indexing.
4. Last night when the problem was ongoing, I disabled refresh
(index.refresh_interval = -1) around 2:10am. Within 1 hour, the load
returned to normal. The merge activity seemed to reduce, it seems like 2
very long running merges are executing but not much else.
5. I grepped an hour of logs of the 2 machines for "add merge=", it
was 540 on the high load node and 420 on a normal node. I pulled out the
size value from the log message and the merges seemed to be much 
 smaller on
the high load node.

 I just created the indices a few days ago, so the shards of each index
 are balanced across the nodes. We have external metrics around document
 ingest rate and there was no spike during this time period.



 Thanks
 Kireet


 On Sunday, July 6, 2014 1:32:00 PM UTC-7, Michael McCandless wrote:

> It's perfectly normal/healthy for many small merges below the floor
> size to happen.
>
> I think you should first figure out why this node is different from
> the others?  Are you sure it's merging CPU cost that's different?
>
> Mike McCandless
>
> http://blog.mikemccandless.com
>
>
> On Sat, Jul 5, 2014 at 9:51 PM, Kireet Reddy 
> wrote:
>
>>  We have a situation where one of the four nodes in our cluster
>> seems to get caught up endlessly merging.  However it seems to be
>> high CPU activity and not I/O constrainted. I have enabled the 
>> IndexWriter
>> info stream logs, and often times it seems to do merges of quite small
>> segments (100KB) that are much below the floor size (2MB). I suspect this
>> is due to frequent refreshes and/or using lots of threads concurrently to
>>

Re: Terms Aggregation and scope

2014-07-07 Thread Diederik Meijer | Ten Horses
Awesome, just ran the curl command below, works fine!

curl -XGET 'http://localhost:9200/_search?pretty=true' -d '{ "size": 0, "aggs": 
{ "datum_uitspraak_ymd": { "terms": { "field": "datum_uitspraak_ymd", 
"include":"2014.*", "size":1 } } } }'



Op Jul 7, 2014, om 3:10 PM heeft Colin Goodheart-Smithe 
 het volgende geschreven:

> Glad it worked.
> 
> Yes, there are options for includes and excludes patterns.  Take a look at 
> the following link for information on how to use them.
> http://www.elasticsearch.org/guide/en/elasticsearch/reference/current/search-aggregations-bucket-terms-aggregation.html#_filtering_values
> 
> Colin
> 
> On Monday, 7 July 2014 14:04:56 UTC+1, Diederik Meijer wrote:
> Hi Colin,
> 
> Thank you, that works perfectly!
> 
> Is there any way to limit the key-value pairs by a certain parameter, in the 
> example below: to limit the aggregation to "datum_uitspraak_ymd" keys that 
> start with "2014"?
> 
> Or does that require the combination of a filter and an aggregation?
> 
> 
> Many thanks,
> 
> Diederik
> 
> 
> 
> 
> Op Jul 7, 2014, om 2:47 PM heeft Colin Goodheart-Smithe 
>  het volgende geschreven:
> 
>> Diederik,
>> 
>> To increase the number of terms returned by the terms aggregation you will 
>> need to add the 'size' parameter to your aggregation. The below curl command 
>> will return you the top 200 terms (ordered by decending doc_count).
>> 
>> curl -XGET 'http://localhost:9200/_search?pretty=true' -d '{ "size": 0, 
>> "aggs": { "datum_uitspraak_ymd": { "terms": { "field": 
>> "datum_uitspraak_ymd", "size" : 200 } } } }'
>> 
>> You may also find the following link to the documentation useful regarding 
>> the size parameter of the terms aggregation.
>> 
>> http://www.elasticsearch.org/guide/en/elasticsearch/reference/current/search-aggregations-bucket-terms-aggregation.html#_size_amp_shard_size
>> 
>> 
>> Hope this helps,
>> 
>> Colin
>> 
>> On Monday, 7 July 2014 13:09:52 UTC+1, Diederik Meijer wrote:
>> Dear list,
>> 
>> I need to create an aggregation by a specific field, named 
>> "datum_uitspraak_ymd'. I am using the below curl command and it works fine 
>> in a sense that it returns the aggregation listed below. While this result 
>> seems OK enough, it seems that the keys listed in the aggregation are 
>> limited to those listed in 10 records.
>> 
>> As the number of unique values for this key is much higher than 10, it seems 
>> that the aggregation's scope is global as far as it searches for documents 
>> with a value identical to one of the 10 listed, but it is limited as far as 
>> the key values used in the aggregation is concerned.
>> 
>> How do I need to set up my curl command in order for the aggregation to 
>> return more key-value pairs?
>> 
>> Many thanks,
>> Diederik
>> 
>> 
>> Command:
>> curl -XGET 'http://localhost:9200/_search?pretty=true' -d '{ "size": 0, 
>> "aggs": { "datum_uitspraak_ymd": { "terms": { "field": "datum_uitspraak_ymd" 
>> } } } }'
>> 
>> Returns:
>> "aggregations" : { "datum_uitspraak_ymd" : { "buckets" : [ { "key" : 
>> "20121219", "doc_count" : 612 }, { "key" : "20120516", "doc_count" : 526 }, 
>> { "key" : "20110601", "doc_count" : 472 }, { "key" : "20121218", "doc_count" 
>> : 468 }, { "key" : "20090520", "doc_count" : 349 }, { "key" : "20101222", 
>> "doc_count" : 274 }, { "key" : "20120711", "doc_count" : 272 }, { "key" : 
>> "20090429", "doc_count" : 246 }, { "key" : "20120718", "doc_count" : 230 }, 
>> { "key" : "20120425", "doc_count" : 226 } ] } } 
>> 
>> 

Re: Latitude -> Lat, Longitude -> lon

2014-07-07 Thread Olivier B
Thanks for your reply. 
I know I tried that, but it's in an array, so I would need to iterate or 
something, because trying to map the path leads to an error if there is an 
array. Here, items is an array: ctx.doc.items.location.longitude

That's why I'm looking for an alternative solution.
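
For reference, David's script suggestion (quoted below) might look roughly 
like this with the 1.x couchdb river's "script" option. This is an untested 
sketch: the host/db/index names are made up, and the MVEL-style loop and map 
syntax is an assumption:

curl -XPUT 'http://localhost:9200/_river/my_couchdb/_meta' -d '{
  "type": "couchdb",
  "couchdb": {
    "host": "localhost",
    "port": 5984,
    "db": "my_db",
    "script": "foreach (item : ctx.doc.items) { item.location = [\"lat\" : item.location.latitude, \"lon\" : item.location.longitude] }"
  },
  "index": { "index": "my_db", "type": "my_db" }
}'

With location rewritten to lat/lon on the way in, the field could then be 
mapped as a geo_point in the usual way.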

On Monday, July 7, 2014 4:16:48 PM UTC+10, David Pilato wrote:
>
> You could maybe try to use script filters and add lat and lon fields on the 
> fly, or a String representing your point.
>
> See doc: https://github.com/elasticsearch/elasticsearch-river-couchdb
>
> HTH
>
> --
> David ;-)
> Twitter : @dadoonet / @elasticsearchfr / @scrutmydocs
>
>
> On July 7, 2014, at 02:23, Olivier B wrote:
>
> Hi there,
>
> I understand a geo-point can be mapped based on two fields: "lat" and "lon".
> However my fields are named "longitude" and "latitude".
> I'm using the river plugin for couchdb and I cannot really rename those 
> fields before indexing. And those fields are part of an item in an array:
>
> "items": [
>   { 
> "item_id" : "abcd",
> "location": 
> {
>   "longitude": 145.7711,
>   "latitude": -16.92359
> }
>   },
>   { 
> "item_id" : "efgh",
> "location": 
> {
>   "longitude": 149.6611,
>   "latitude": -19.94098
> }
>   }
> ]
>
> So, any idea how I can rename those fields? And eventually map it to a geo 
> point (
> http://www.elasticsearch.org/guide/en/elasticsearch/reference/current/mapping-geo-point-type.html
> )?
>
> Cheers
>


Clustering/Sharding impact on query performance

2014-07-07 Thread 'Fin Sekun' via elasticsearch

Hi,


*SCENARIO*

Our Elasticsearch database has ~2.5 million entries. Each entry has the 
three analyzed fields "match", "sec_match" and "thi_match" (each containing 
3-20 words) that will be used in this query:
https://gist.github.com/anonymous/a8d1142512e5625e4e91


ES runs on two *types of servers*:
(1) Real servers (system has direct access to real CPUs, no virtualization) 
of newest generation - Very performant!
(2) Cloud servers with virtualized CPUs - Poor CPUs, but this is generic 
for cloud services.

See https://gist.github.com/anonymous/3098b142c2bab51feecc for (1) and (2) 
CPU details.


*ES settings:*
ES version 1.2.0 (jdk1.8.0_05)
ES_HEAP_SIZE = 512m (we also tested with 1024m with same results)
vm.max_map_count = 262144
ulimit -n 64000
ulimit -l unlimited
index.number_of_shards: 1
index.number_of_replicas: 0
index.store.type: mmapfs
threadpool.search.type: fixed
threadpool.search.size: 75
threadpool.search.queue_size: 5000


*Infrastructure*:
As you can see above, we don't use the cluster feature of ES (1 shard, 0 
replicas). The reason is that our hosting infrastructure is based on 
different providers.
Upside: We aren't dependent on a single hosting provider. Downside: Our 
servers aren't in the same LAN.

This means:
- We cannot use ES sharding, because synchronisation via WAN (internet) 
does not seem like a useful solution.
- So, every ES-server has the complete dataset and we configured only one 
shard and no replicas for higher performance.
- We have a distribution process that updates the ES data on every host 
frequently. This process is fine for us, because updates aren't very frequent 
and perfect just-in-time ES synchronisation isn't necessary for our 
business case.
- If a server goes down/crashes, the central load balancer removes it (the 
resulting minimal packet loss is acceptable).
 



*PROBLEM*

For long query terms (6 or more keywords), we have very high CPU loads, 
even on the high performance server (1), and this leads to high response 
times: 1-4sec on server (1), 8-20sec on server (2). The system parameters 
while querying:
- Very high load (usually 100%) on the CPU running the responsible thread 
(the other CPUs are idle in our test scenario)
- No I/O load (the harddisks are fine)
- No RAM bottlenecks

So, we think the file caching is working fine, because we have no I/O 
problems and the garbage collector seems to be happy (jstat shows very few 
GCs). The CPU is the problem, and ES hot-threads point to the Scorer module:
https://gist.github.com/anonymous/9cecfd512cb533114b7d 




*SUMMARY/ASSUMPTIONS*

- Our database isn't very big and the query isn't very complex.
- ES is designed for huge amounts of data, but the key is 
clustering/sharding: distributing data across many servers means smaller 
indices, and smaller indices lead to lower CPU load and shorter response times.
- So, our database isn't big, but it is too big for a single CPU, which means 
that especially low-performance (virtual) CPUs can only be used in sharding 
environments.

If we don't want to lose provider independence, we have only the 
following two options:

1) Simpler query (I think not possible in our case)
2) Smaller database




*QUESTIONS*

Are our assumptions correct? Especially:

- Is clustering/sharding (i.e. small indices) the main key to performance, 
that is, the only way to prevent overloaded (virtual) CPUs?
- Is it right that clustering is only useful/possible in LANs?
- Do you have any ES configuration or architecture hints regarding our 
preference for using multiple hosting providers?



Thank you. Rgds
Fin



Re: Custom Plugin for specifying custom filter attributes at query time

2014-07-07 Thread Sandeep Ramesh Khanzode
Hi,

A little clarification:

Assume a sample data set of 50M documents. The documents need to be filtered
by a field, Field1. However, at indexing time, this field is NOT written to
the document in Lucene through ES. Field1 is a frequently changing field
and hence we would like to maintain it outside.

(The following paragraph can be skipped.)
Now assume that there are a few such fields, Field1, ..., FieldN. For every
document in the corpus, the value of Field1 may come from a pool of 100-odd
values. Thus, for example, at the high end, a Field1 value can cover 1M
documents, and at the low end, a value can correspond to as few as 10
documents.


(Continue reading) :-)
I would, at system startup time, make sure that I have loaded all relevant
BitSets that I plan to use for any Filters in memory, so that my cache
framework is warm and I can lookup the relevant filter values for a
particular query from this cache at query run time. The mechanisms for this
loading are still unknown, but please assume that this BitSet will be
available readily during query time.

This BitSet will correspond to the DocIDs in Lucene for a particular value
of Field1 that I want to filter. I plan to create a Filter class overridden
in Lucene that will accept this DocIdSet.

What I am unable to understand is how I can achieve this in ES. I have
been exploring the different mail threads on this forum, and it seems that
certain plugins can achieve this. Please see the list below of what I could
find on this forum.

Can you please tell me how an IndexQueryParserModule will serve my use
case? If you can provide some pointers on writing a plugin that can
leverage a CustomFilter, that will be immensely helpful. Thanks,

1.
https://groups.google.com/forum/?fromgroups=#!searchin/elasticsearch/IndexQueryParserModule$20Plugin/elasticsearch/5Gqxx3UvN2s/FL4Lb2RxQt0J
2. https://groups.google.com/forum/#!topic/elasticsearch/1jiHl4kngJo
3. https://github.com/elasticsearch/elasticsearch/issues/208
4.
http://elasticsearch-users.115913.n3.nabble.com/custom-filter-handler-plugin-td4051973.html

Thanks,
Sandeep

On Mon, Jul 7, 2014 at 2:17 AM, joergpra...@gmail.com  wrote:

> Thanks for being so patient with me :)
>
> I understand now the following: there are 50M documents in an external
> DB, from which up to 1M are to be exported in the form of document identifiers
> to work as a filter in ES. The idea is to use internal mechanisms like bit
> sets. There is no API for manipulating filters in ES on that level; ES
> receives the terms and passes them into the Lucene TermFilter class according
> to the type of the filter.
>
> What is a bit unclear to me: how is the filter set constructed? I assume
> it should be a select statement on the database?
>
> Next, if you have this large set of document identifiers selected, I do
> not understand what is the base query you want to apply the filter on? Is
> there a user given query for ES? How does such query looks like? Is it
> assumed there are other documents in ES that are related somehow to the 50m
> documents? An illustrative example of the steps in the scenario would
> really help to understand the data model.
>
> Just some food for thought: it is close to impossible to filter in ES on
> 1m unique terms with a single step - the default setting of maximum clauses
> in a Lucene Query is for good reason limited to 1024 terms. A workaround
> would be iterating over 1m terms and execute 1000 filter queries and add up
> the results. This takes a long time and may not be the desired solution.
>
> Fortunately, in most situations, it is possible to find more concise
> grouping to reduce the 1m document identifiers into fewer ones for more
> efficient filtering.
>
> Jörg
>
>
>
> On Sun, Jul 6, 2014 at 9:39 PM, 'Sandeep Ramesh Khanzode' via
> elasticsearch  wrote:
>
>> Hi,
>>
>> Appreciate your continued assistance. :) Thanks,
>>
>> Disclaimer: I am yet to sufficiently understand ES sources so as to
>> depict my scenario completely. Some info' below may be conjecture.
>>
>> I would have a corpus of 50M docs (actually a lot more, but for testing
>> now) out of which I would have, say, up to 1M DocIds to be used as a filter.
>> This set of 1M docs can be different for different use cases, the point
>> being that up to 1M docIds can form one logical set of documents for filtering
>> results. If I use a simple IdsFilter from the ES Java API, I would have to keep
>> adding these 1M docs to the List implementation internally, and I have a
>> feeling it may not scale very well, as they may change per use case and per
>> some combinations internal to a single use case as well.
>>
>> As I debug the code, the IdsFilter will be converted to a Lucene filter.
>> Lucene filters, on the other hand, operate on a docId bitset type. That
>> gels very well with my requirement, since I can scale with BitSets (I
>> assume).
>>
>> If I can find a way to directly plug this BitSet as a Lucene Filter to
>> the Lucene search() call bypass

Re: Use arrays as update parameters with elasticsearch-hadoop-mr

2014-07-07 Thread James Campbell
Costin--

Great news, thanks for the update!

James


On Sun, Jul 6, 2014 at 4:41 PM, Costin Leau  wrote:

> Hi James,
>
> Fwiw, I plan to address this bug shortly - as you pointed out, the JSON
> array needs to be handled separately before passing its content in.
>
>
> On Thu, Jul 3, 2014 at 8:58 PM, James Campbell  > wrote:
>
>> I would like to update an existing document that has an array from
>> elasticsearch hadoop.
>>
>> I notice that I can do that from curl directly, for example:
>>
>> PUT arraydemo/temp/1
>> {
>>   "counter" : 1,
>>   "tags" : [ "I am an array", "With Multiple values" ],
>>   "more_tags" : [ "I am a tag" ],
>>   "even_more_tags": "I am a tag too!"
>> }
>>
>> GET arraydemo/temp/1
>>
>> POST arraydemo/temp/1/_update
>> {
>>   "script" : "tmp = new HashSet(); tmp.addAll(ctx._source.tags); 
>> tmp.addAll(new_tags); ctx._source.tags = tmp.toArray()",
>>   "params" : {
>> "new_tags" : [ "add me", "and me" ]
>>   }
>> }
>>
>>
>> However, elasticsearch-hadoop appears to be unable to parse array
>> parameters, such that an upsert operation from within elasticsearch hadoop
>> using the same script and a document with the same JSON for parameters
>> fails.
>>
>> I created an issue on github (elasticsearch hadoop (#223)), but thought I
>> should post here for ideas or in case there's a workaround that someone
>> might know of.
>>
>> James Campbell
>>


Re: Terms Aggregation and scope

2014-07-07 Thread Colin Goodheart-Smithe
Glad it worked.

Yes, there are options for includes and excludes patterns.  Take a look at 
the following link for information on how to use them.
http://www.elasticsearch.org/guide/en/elasticsearch/reference/current/search-aggregations-bucket-terms-aggregation.html#_filtering_values
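
For example, to keep only the buckets whose keys start with "2014" (the 
form Diederik confirms elsewhere in this digest):

curl -XGET 'http://localhost:9200/_search?pretty=true' -d '{ "size": 0, "aggs": { "datum_uitspraak_ymd": { "terms": { "field": "datum_uitspraak_ymd", "include": "2014.*", "size": 200 } } } }'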

Colin

On Monday, 7 July 2014 14:04:56 UTC+1, Diederik Meijer wrote:
>
> Hi Colin,
>
> Thank you, that works perfectly!
>
> Is there any way to limit the key-value pairs by a certain parameter, in 
> the example below: to limit the aggregation to "datum_uitspraak_ymd" keys 
> that start with "2014"?
>
> Or does that require the combination of a filter and an aggregation?
>
>
> Many thanks,
>
> Diederik
>
>
>
>
> Op Jul 7, 2014, om 2:47 PM heeft Colin Goodheart-Smithe <
> colin.goodheart-smi...@elasticsearch.com> het volgende geschreven:
>
> Diederik,
>
> To increase the number of terms returned by the terms aggregation you will 
> need to add the 'size' parameter to your aggregation. The below curl 
> command will return you the top 200 terms (ordered by descending doc_count).
>
> curl -XGET 'http://localhost:9200/_search?pretty=true' -d '{ "size": 0, 
> "aggs": { "datum_uitspraak_ymd": { "terms": { "field": 
> "datum_uitspraak_ymd", "size" : 200 } } } }'
>
> You may also find the following link to the documentation useful regarding 
> the size parameter of the terms aggregation.
>
>
> http://www.elasticsearch.org/guide/en/elasticsearch/reference/current/search-aggregations-bucket-terms-aggregation.html#_size_amp_shard_size
>
>
> Hope this helps,
>
> Colin
>
> On Monday, 7 July 2014 13:09:52 UTC+1, Diederik Meijer wrote:
>>
>> Dear list,
>>
>> I need to create an aggregation by a specific field, named 
>> "datum_uitspraak_ymd'. I am using the below curl command and it works fine 
>> in a sense that it returns the aggregation listed below. While this result 
>> seems OK enough, it seems that the keys listed in the aggregation are 
>> limited to those listed in 10 records.
>>
>> As the number of unique values for this key is much higher than 10, it 
>> seems that the aggregation's scope is global as far as it searches for 
>> documents with a value identical to one of the 10 listed, but it is limited 
>> as far as the key values used in the aggregation is concerned.
>>
>> How do I need to set up my curl command in order for the aggregation to 
>> return more key-value pairs?
>>
>> Many thanks,
>> Diederik
>>
>>
>> Command:
>> curl -XGET 'http://localhost:9200/_search?pretty=true' -d '{ "size": 0, 
>> "aggs": { "datum_uitspraak_ymd": { "terms": { "field": 
>> "datum_uitspraak_ymd" } } } }'
>>
>> Returns:
>> "aggregations" : { "datum_uitspraak_ymd" : { "buckets" : [ { "key" : 
>> "20121219", "doc_count" : 612 }, { "key" : "20120516", "doc_count" : 526 }, 
>> { "key" : "20110601", "doc_count" : 472 }, { "key" : "20121218", 
>> "doc_count" : 468 }, { "key" : "20090520", "doc_count" : 349 }, { "key" : 
>> "20101222", "doc_count" : 274 }, { "key" : "20120711", "doc_count" : 272 }, 
>> { "key" : "20090429", "doc_count" : 246 }, { "key" : "20120718", 
>> "doc_count" : 230 }, { "key" : "20120425", "doc_count" : 226 } ] } } 
>>
>>


Re: Terms Aggregation and scope

2014-07-07 Thread Diederik Meijer | Ten Horses
Hi Colin,

Thank you, that works perfectly!

Is there any way to limit the key-value pairs by a certain parameter, in the 
example below: to limit the aggregation to "datum_uitspraak_ymd" keys that 
start with "2014"?

Or does that require the combination of a filter and an aggregation?


Many thanks,

Diederik




On Jul 7, 2014, at 2:47 PM, Colin Goodheart-Smithe wrote:

> Diederik,
> 
> To increase the number of terms returned by the terms aggregation you will 
> need to add the 'size' parameter to your aggregation. The below curl command 
> will return you the top 200 terms (ordered by descending doc_count).
> 
> curl -XGET 'http://localhost:9200/_search?pretty=true' -d '{ "size": 0, 
> "aggs": { "datum_uitspraak_ymd": { "terms": { "field": "datum_uitspraak_ymd", 
> "size" : 200 } } } }'
> 
> You may also find the following link to the documentation useful regarding 
> the size parameter of the terms aggregation.
> 
> http://www.elasticsearch.org/guide/en/elasticsearch/reference/current/search-aggregations-bucket-terms-aggregation.html#_size_amp_shard_size
> 
> 
> Hope this helps,
> 
> Colin
> 
> On Monday, 7 July 2014 13:09:52 UTC+1, Diederik Meijer wrote:
> Dear list,
> 
> I need to create an aggregation by a specific field, named 
> "datum_uitspraak_ymd'. I am using the below curl command and it works fine in 
> a sense that it returns the aggregation listed below. While this result seems 
> OK enough, it seems that the keys listed in the aggregation are limited to 
> those listed in 10 records.
> 
> As the number of unique values for this key is much higher than 10, it seems 
> that the aggregation's scope is global as far as it searches for documents 
> with a value identical to one of the 10 listed, but it is limited as far as 
> the key values used in the aggregation is concerned.
> 
> How do I need to set up my curl command in order for the aggregation to 
> return more key-value pairs?
> 
> Many thanks,
> Diederik
> 
> 
> Command:
> curl -XGET 'http://localhost:9200/_search?pretty=true' -d '{ "size": 0, 
> "aggs": { "datum_uitspraak_ymd": { "terms": { "field": "datum_uitspraak_ymd" 
> } } } }'
> 
> Returns:
> "aggregations" : { "datum_uitspraak_ymd" : { "buckets" : [ { "key" : 
> "20121219", "doc_count" : 612 }, { "key" : "20120516", "doc_count" : 526 }, { 
> "key" : "20110601", "doc_count" : 472 }, { "key" : "20121218", "doc_count" : 
> 468 }, { "key" : "20090520", "doc_count" : 349 }, { "key" : "20101222", 
> "doc_count" : 274 }, { "key" : "20120711", "doc_count" : 272 }, { "key" : 
> "20090429", "doc_count" : 246 }, { "key" : "20120718", "doc_count" : 230 }, { 
> "key" : "20120425", "doc_count" : 226 } ] } } 
> 
> 


Re: ES Hadoop--Index only new documents without killing job from exceptions?

2014-07-07 Thread Costin Leau
I would recommend indexing the document, since it's a 'cheap' operation per
document and it covers the potential differences between the docs. Also,
from a performance POV you are not going to lose much, since you are anyway
sending the doc to ES, which does hashing and returns the error to the user.
So the only thing that you save and _might_ potentially see is the actual
indexing, which should become a problem only when dealing with large amounts
of docs.

That being said, there's already an issue open [1] for trapping/handling
errors during a job (to prevent it from being cancelled) which potentially
can be used for such a purpose as well. Feel free to add your comments to
it.

[1] https://github.com/elasticsearch/elasticsearch-hadoop/issues/160


On Thu, Jul 3, 2014 at 8:49 PM, James Campbell 
wrote:

> Hi ES-Hadoop users--
>
> I have a large list of simple documents that I would like to index for an
> auto complete feature. At batch processing time, I do not know which values
> are new (never seen before) and which are not (some other part of the
> update process changed, but the autocomplete-relevant portion of the
> document did not).
>
> I believe I could simply write all of the documents to the index whenever
> I run a new batch with the default es.write.operation=index, but that will
> cause ES to reindex the document each time even if it wasn't updated.
>
> On the other hand, if I choose to use es.write.operation=create, then any
> existing documents will cause the job to fail.
>
> Is there a way to combine those behaviors, so that I can allow
> elasticsearch to simply ignore requests to reindex existing documents
> (based on _id) but not to throw an exception that kills the entire job?
>
> James Campbell
>


How to limit the fields of response when I search a keyword?

2014-07-07 Thread 纪路
Dear all:

I have a reasonable need, but I can't find how to deal with it in the ES 
official docs and books. Does anyone know? Please teach me, thank you!

I have a large set of docs, which hold a lot of fields, such as:

uid2 = {
"id": 1404999597,
"idstr": "1404999597",
"class": 1,
"screen_name": "",
"name": "",
"province": "11",
"city": "1000",
"location": "北京",
"description": "在主流与非主流之间徘徊",
"url": "",
"profile_image_url": "http://tp2.sinaimg.cn/1404999597/50/5642385629/0";,
"profile_url": "u/1404999597",
"domain": "",
"weihao": "",
"gender": "f",
"followers_count": 1030710,
"friends_count": 272,
"statuses_count": 1519,
"favourites_count": 90,
"created_at": "Wed Mar 23 23:59:40 +0800 2011",
"following": false,
"allow_all_act_msg": false,
"geo_enabled": false,
"verified": true,
"verified_type": 0,
"remark": "",
"status": {
"created_at": "Tue Jul 01 13:17:55 +0800 2014",
"id": 3727513249206064,
"mid": "3727513249206064",
"idstr": "3727513249206064",
"text": "听到她的声音,我更相信她和荷西在天堂,依旧幸福着。 //@东方尔雅:现在这种纯真的爱情还好找吗? //@晓玲-有话说:[心]",
"source": "http://app.weibo.com/t/feed/9ksdit\"; 
rel=\"nofollow\">iPhone客户端",
"favorited": false,
"truncated": false,
"in_reply_to_status_id": "",
"in_reply_to_user_id": "",
"in_reply_to_screen_name": "",
"pic_urls": [],
"geo": null,
"reposts_count": 0,
"comments_count": 0,
"attitudes_count": 0,
"mlevel": 0,
"visible": {
"type": 0,
"list_id": 0
},
"darwin_tags": []
},
"ptype": 1,
"allow_all_comment": true,
"avatar_large": "http://tp2.sinaimg.cn/1404999597/180/5642385629/0";,
"avatar_hd": "http://tp2.sinaimg.cn/1404999597/180/5642385629/0";,
"verified_reason": "电视台主持人梦桐",
"verified_trade": "",
"verified_reason_url": "",
"verified_source": "",
"verified_source_url": "",
"follow_me": false,
"online_status": 0,
"bi_followers_count": 167,
"lang": "zh-cn",
"star": 0,
"mbtype": 0,
"mbrank": 0,
"block_word": 0,
"block_app": 0,
"ability_tags": "主持人",
"worldcup_guess": 0
}

This is a user-info document. If I want to analyze the gender distribution of 
all users who live in "city": "1000" (1000 is a city code), I don't need any 
field except "city" and "gender". How can I exclude these meaningless fields 
before they are returned? Because there are lots of docs, transmitting the 
entire doc would waste a lot of time and bandwidth, and I would have to trim 
the additional information in my own program. So, is there a method that can 
deal with this problem for me? Thank you.
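
One way to do this is source filtering, which ES 1.x supports on search 
requests; a minimal sketch (the index name users is made up):

curl -XGET 'http://localhost:9200/users/_search?pretty=true' -d '{ "query": { "term": { "city": "1000" } }, "_source": ["city", "gender"] }'

Only the listed fields are returned inside _source, so the response stays 
small.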



How to limit fields of response doc when I search certain keyword?

2014-07-07 Thread 纪路
Dear all:
There is a reasonable need, but I can't find a solution in the official docs 
or books. Can you help me?

I have a large set of docs, which contains a lot of fields, such as:
 {
"id": 1404999597,
"idstr": "1404999597",
"class": 1,
"screen_name": "主播梦桐",
"name": "主播梦桐",
"province": "11",
"city": "1000",
"location": "北京",
"description": "在主流与非主流之间徘徊",
"url": "",
"profile_image_url": "http://tp2.sinaimg.cn/1404999597/50/5642385629/0";,
"profile_url": "u/1404999597",
"domain": "",
"weihao": "",
"gender": "f",
"followers_count": 1030710,
"friends_count": 272,
"statuses_count": 1519,
"favourites_count": 90,
"created_at": "Wed Mar 23 23:59:40 +0800 2011",
"following": false,
"allow_all_act_msg": false,
"geo_enabled": false,
"verified": true,
"verified_type": 0,
"remark": "",
"status": {
"created_at": "Tue Jul 01 13:17:55 +0800 2014",
"id": 3727513249206064,
"mid": "3727513249206064",
"idstr": "3727513249206064",
"text": "听到她的声音,我更相信她和荷西在天堂,依旧幸福着。 //@东方尔雅:现在这种纯真的爱情还好找吗? //@晓玲-有话说:[心]",
"source": "http://app.weibo.com/t/feed/9ksdit\"; 
rel=\"nofollow\">iPhone客户端",
"favorited": false,
"truncated": false,
"in_reply_to_status_id": "",
"in_reply_to_user_id": "",
"in_reply_to_screen_name": "",
"pic_urls": [],
"geo": null,
"reposts_count": 0,
"comments_count": 0,
"attitudes_count": 0,
"mlevel": 0,
"visible": {
"type": 0,
"list_id": 0
},
"darwin_tags": []
},
"ptype": 1,
"allow_all_comment": true,
"avatar_large": "http://tp2.sinaimg.cn/1404999597/180/5642385629/0";,
"avatar_hd": "http://tp2.sinaimg.cn/1404999597/180/5642385629/0";,
"verified_reason": "电视台主持人梦桐",
"verified_trade": "",
"verified_reason_url": "",
"verified_source": "",
"verified_source_url": "",
"follow_me": false,
"online_status": 0,
"bi_followers_count": 167,
"lang": "zh-cn",
"star": 0,
"mbtype": 0,
"mbrank": 0,
"block_word": 0,
"block_app": 0,
"ability_tags": "主持人",
"worldcup_guess": 0
}

My problem is that when I search (or scan & scroll) on a certain field, for 
example "city"=1000 (1000 is its city code, which refers to a city name), 
many results may be returned. But my goal is to detect how the gender of 
this city's users is distributed on my website; I don't need any information 
except the "gender" field. What can I do to exclude meaningless data from 
the response JSON before it is returned? Because there are many similar 
tasks for me, transmitting the entire doc would take a lot of time and 
bandwidth, and I would have to trim the additional data in my own program, 
which also wastes CPU time on the local machine. So if you know how to deal 
with this need, please teach me. Thank you!
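
Since the goal is a distribution rather than the documents themselves, a 
terms aggregation can avoid returning documents at all; a sketch (again with 
a made-up index name, and _source filtering as sketched under the previous 
message also works with scan & scroll):

curl -XGET 'http://localhost:9200/users/_search?pretty=true' -d '{ "size": 0, "query": { "filtered": { "filter": { "term": { "city": "1000" } } } }, "aggs": { "gender_distribution": { "terms": { "field": "gender" } } } }'

With "size": 0 no hits are transferred; the response contains just the 
doc_count per gender.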



Re: Inter-document Queries

2014-07-07 Thread Itamar Syn-Hershko
Hi, only saw this now

I wouldn't worry too much about high space complexity - storage comes cheap
nowadays, and the general practice in many systems is to store raw data and
do processing on demand (most commonly known approach is event sourcing).

I can understand an argument about high space complexity being a problem
when this is not the core of your business, and in those cases I'd indeed
try to find a way to store the data in different ways leveraging the
various advanced query types Elasticsearch offers - like the RegEx pattern
matching solution suggested by Theo
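
A minimal sketch of that idea (index and field names are made up; it assumes 
the concatenated steps live in a single not_analyzed string field, say 
path_history, on the session-ending document):

curl -XGET 'http://localhost:9200/events/_search?pretty=true' -d '{ "query": { "filtered": { "filter": { "regexp": { "path_history": ".*/page/1.*401.*/sale/C.*" } } } } }'

Lucene regular expressions match against the whole term, hence the leading 
and trailing .* here.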

HTH,

--

Itamar Syn-Hershko
http://code972.com | @synhershko 
Freelance Developer & Consultant
Author of RavenDB in Action 


On Wed, Jul 2, 2014 at 5:04 PM, Theo Harris  wrote:

> Together with Zennet we brainstormed a solution building on top of
> Itamar's proposal.
>
> In one string field we append the current path to all previous ones,
> and since we are talking about funnels, we need to store them only on the
> last event/document generated, e.g. SessionEndedEvent.
> Then we can use regex pattern matching to identify if the sequence of
> steps can be found anywhere in the stored paths string. This solution
> appears to be extremely fast.
>
>
> On Wednesday, June 11, 2014 1:14:59 AM UTC+3, Zennet Wheatcroft wrote:
>>
>> I simplified the actual problem in order to avoid explaining the domain
>> specific details. Allow me to add back more detail.
>>
>> We want to be able to search for multiple points of user action, towards
>> a conversion funnel, and condition on multiple fields. Let's add another
>> field (response) to the above model:
>> {.., "path":"/promo/A", "response": 200, ..}
>> {.., "path":"/page/1", "response": 401, ..}
>> {.., "path":"/promo/D","response": 200, ..}
>> {.., "path":"/page/23", "response": 301, ..}
>> {.., "path":"/page/2", "response": 418, ..}
>> Let's say we define three points through the conversion funnel:
>> A: Visited path=/page/1
>> B: Got response=401 from some path
>> C: Exited at path=/sale/C
>>
>> And we want to know how many users did steps A-B-C in that order. If we
>> add an array prev_response like we did for prev_path, then we can use a
>> term filter to find documents with term path=/sale/C and prev_path=/page/1
>> and prev_response=401. But this will not distinguish between A->B->C and
>> B->A->C. Perhaps I could use the script filter for the "last mile" and from
>> the term filtered results throw out B-A-C and it will run more quickly
>> because of the reduced document set.
>>
>> Is there another way to implement this query?
>>
>> Zennet
>>
>>
>> On Wednesday, June 4, 2014 5:01:19 PM UTC-7, Itamar Syn-Hershko wrote:
>>>
>>> You need to be able to form buckets that can be reduced again, either
>>> using the aggregations framework or a query. One model that will allow you
>>> to do that is something like this:
>>>
>>> { "userid": "xyz", "path":"/sale/B", "previous_paths":[...],
>>> "tstamp":"...", ... }
>>>
>>> So whenever you add a new path, you denormalize and add previous paths
>>> that could be relevant. This might bloat your storage a bit and be slower
>>> on writes, but it is very optimized for reads since now you can do an
>>> aggregation that queries for the desired "path" and buckets on the user. To
>>> check the condition of the previous path you should be able to bucket again
>>> using a script, or maybe even with a query on a nested type.
>>>
>>> This is just from the top of my head but should definitely work if you
>>> can get to that model
>>>
>>> --
>>>
>>> Itamar Syn-Hershko
>>> http://code972.com | @synhershko 
>>> Freelance Developer & Consultant
>>> Author of RavenDB in Action 
>>>
>>>
>>> On Thu, Jun 5, 2014 at 2:36 AM, Zennet Wheatcroft 
>>> wrote:
>>>
 Yes. I can re-index the data or transform it in any way to make this
 query efficient.

 What would you suggest?



 On Wednesday, June 4, 2014 2:14:09 PM UTC-7, Itamar Syn-Hershko wrote:

> This model is not efficient for this type of querying. You cannot do
> this in one query using this model, and the pre-processing work you do now
> + traversing all documents is very costly.
>
> Is it possible for you to index the data (even as a projection) into
> Elasticsearch using a different model, so you can use ES properly using
> queries or the aggregations framework?
>
> --
>
> Itamar Syn-Hershko
> http://code972.com | @synhershko 
> Freelance Developer & Consultant
> Author of RavenDB in Action 
>
>
> On Thu, Jun 5, 2014 at 12:04 AM, Zennet Wheatcroft <
> zwhea...@atypon.com> wrote:
>
>>  Hi,
>>
>> I am looking for an efficient way to do inter-document queries in
>> Elasticsearch. Specifically, I want to count th

Re: Terms Aggregation and scope

2014-07-07 Thread Colin Goodheart-Smithe
Diederik,

To increase the number of terms returned by the terms aggregation you will 
need to add the 'size' parameter to your aggregation. The below curl 
command will return you the top 200 terms (ordered by descending doc_count).

curl -XGET 'http://localhost:9200/_search?pretty=true' -d '{ "size": 0, 
"aggs": { "datum_uitspraak_ymd": { "terms": { "field": 
"datum_uitspraak_ymd", "size" : 200 } } } }'

You may also find the following link to the documentation useful regarding 
the size parameter of the terms aggregation.

http://www.elasticsearch.org/guide/en/elasticsearch/reference/current/search-aggregations-bucket-terms-aggregation.html#_size_amp_shard_size


Hope this helps,

Colin

On Monday, 7 July 2014 13:09:52 UTC+1, Diederik Meijer wrote:
>
> Dear list,
>
> I need to create an aggregation by a specific field, named 
> "datum_uitspraak_ymd'. I am using the below curl command and it works fine 
> in a sense that it returns the aggregation listed below. While this result 
> seems OK enough, it seems that the keys listed in the aggregation are 
> limited to those listed in 10 records.
>
> As the number of unique values for this key is much higher than 10, it 
> seems that the aggregation's scope is global as far as it searches for 
> documents with a value identical to one of the 10 listed, but it is limited 
> as far as the key values used in the aggregation is concerned.
>
> How do I need to set up my curl command in order for the aggregation to 
> return more key-value pairs?
>
> Many thanks,
> Diederik
>
>
> Command:
> curl -XGET 'http://localhost:9200/_search?pretty=true' -d '{ "size": 0, 
> "aggs": { "datum_uitspraak_ymd": { "terms": { "field": 
> "datum_uitspraak_ymd" } } } }'
>
> Returns:
> "aggregations" : { "datum_uitspraak_ymd" : { "buckets" : [ { "key" : 
> "20121219", "doc_count" : 612 }, { "key" : "20120516", "doc_count" : 526 }, 
> { "key" : "20110601", "doc_count" : 472 }, { "key" : "20121218", 
> "doc_count" : 468 }, { "key" : "20090520", "doc_count" : 349 }, { "key" : 
> "20101222", "doc_count" : 274 }, { "key" : "20120711", "doc_count" : 272 }, 
> { "key" : "20090429", "doc_count" : 246 }, { "key" : "20120718", 
> "doc_count" : 230 }, { "key" : "20120425", "doc_count" : 226 } ] } } 
>
>



Re: cluster.routing.allocation.enable behavior (sticky shard allocation not working as expected)

2014-07-07 Thread Andrew Davidoff
On Mon, Jul 7, 2014 at 4:16 AM, Grégoire Seux  wrote:
> Andrew,
>
> Have you found a solution (or explanation) to your issue?
> We are using elasticsearch 1.1.1, what about you ?

Hi,

I haven't learned anything new. To be clear about my problem, I am
aware that I must re-enable routing after having disabled it. My issue
is that I expect all the UNASSIGNED shards to go back to the same
node, but some do not, only to get rebalanced back there later. I am
running elasticsearch 1.2.1.

Andy



Terms Aggregation and scope

2014-07-07 Thread Diederik Meijer
Dear list,

I need to create an aggregation on a specific field, named 
"datum_uitspraak_ymd". I am using the curl command below and it works fine, 
in the sense that it returns the aggregation listed below. While this result 
seems OK enough, the keys listed in the aggregation are limited to 10.

As the number of unique values for this key is much higher than 10, it 
seems that the aggregation's scope is global in that it searches all 
documents for a value identical to one of the 10 listed, but it is limited 
in how many key values are used in the aggregation.

How do I need to set up my curl command for the aggregation to 
return more key-value pairs?

Many thanks,
Diederik


Command:
curl -XGET 'http://localhost:9200/_search?pretty=true' -d '{ "size": 0, 
"aggs": { "datum_uitspraak_ymd": { "terms": { "field": 
"datum_uitspraak_ymd" } } } }'

Returns:
"aggregations" : { "datum_uitspraak_ymd" : { "buckets" : [ { "key" : 
"20121219", "doc_count" : 612 }, { "key" : "20120516", "doc_count" : 526 }, 
{ "key" : "20110601", "doc_count" : 472 }, { "key" : "20121218", 
"doc_count" : 468 }, { "key" : "20090520", "doc_count" : 349 }, { "key" : 
"20101222", "doc_count" : 274 }, { "key" : "20120711", "doc_count" : 272 }, 
{ "key" : "20090429", "doc_count" : 246 }, { "key" : "20120718", 
"doc_count" : 230 }, { "key" : "20120425", "doc_count" : 226 } ] } } 



Re: have we a way to use highlight and fuzzy together ?

2014-07-07 Thread Tanguy Bernard
I want to combine like this:

GET my_index/my_type/_search?pretty=true
{
  "size": 50,
  "query": {
    "multi_match": {
      "query": "my words",
      "fields": ["title_doc"]
    },
    "fuzzy": 0.2
  },
  "highlight": {
    "fields": {
      "title_doc": { "fragment_size": 30 }
    }
  },
  "sort": [
    { "date_source": { "order": "desc" } }
  ]
}

Can you help me?

Tanguy
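
One form the 1.x query DSL does accept is to move the fuzziness inside the 
multi_match options instead of making it a sibling of the query (a sketch 
reusing the field names above):

curl -XGET 'http://localhost:9200/my_index/my_type/_search?pretty=true' -d '{ "size": 50, "query": { "multi_match": { "query": "my words", "fields": ["title_doc"], "fuzziness": 0.2 } }, "highlight": { "fields": { "title_doc": { "fragment_size": 30 } } }, "sort": [ { "date_source": { "order": "desc" } } ] }'

Highlighting should then apply to the fuzzy matches as usual.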



How can I do intersection or union operation with two facets filter?

2014-07-07 Thread 闫旭
Dear All!
I have some docs:
{"field_A":"aaa","field_B":"bbb"}
{"field_A":"aaa","field_B":"ccc"}
{"field_A":"bbb","field_B":"bbb"}
{"field_A":"bbb","field_B":"bbb"}
{"field_A":"bbb","field_B":"eee"}
{"field_A":"aaa","field_B":""}
{"field_A":"ccc","field_B":""}
first step:
{
"query":{
"filtered" : {
"filter" : {
"bool" : {
"must" : {
"term" : { "field_B" : "bbb" }
}
}
}
}
},
"facets" : {
"tag" : {
"terms" : {
"field" : "field_A"
}
}
}
}
first result:
{
...
{"term":"aaa","count":1},
{"term":"bbb","count":2},
...
}
-
second step:
the second facets:
{
"query":{
"filtered" : {
"filter" : {
"bool" : {
"must" : {
"term" : { "field_B" : "" }
}
}
}
}
},
"facets" : {
"tag" : {
"terms" : {
"field" : "field_A"
}
}
}
}
second result:
{
...
{"term":"aaa","count":1}.
{"term":"ccc","count":1}
...
}
-
third step:
combine the two results with an intersection operation on "term":
{"term":"aaa","count":"I don't care about the count value."}

-
Now, how can I combine the three steps in one filtered facet or some other
method?


Thx All!!
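
The closest single-request form is probably one query with two facets, each 
carrying its own facet_filter; the term intersection itself still has to be 
done client-side. A sketch with the fields above (the second filter value is 
a placeholder, since it was elided in the post):

curl -XGET 'http://localhost:9200/_search?pretty=true' -d '{ "size": 0, "facets": { "A_where_B_bbb": { "terms": { "field": "field_A" }, "facet_filter": { "term": { "field_B": "bbb" } } }, "A_where_B_other": { "terms": { "field": "field_A" }, "facet_filter": { "term": { "field_B": "ccc" } } } } }'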



Re: Elasticsearch with azure cloud plugin

2014-07-07 Thread Itamar Syn-Hershko
This doesn't sound like it's Azure-specific. For one, I'd try to use ES
1.2.1 as there has been a lot of work in that area (of GC and threads).

I'd also try to avoid using the Azure plugin as long as possible and use
Unicast instead - I've just blogged about exactly that, see
http://code972.com/blog/2014/07/74-the-definitive-guide-for-elasticsearch-on-windows-azure
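
For reference, switching to unicast comes down to two settings in 
elasticsearch.yml on each node (the host list here is made up):

discovery.zen.ping.multicast.enabled: false
discovery.zen.ping.unicast.hosts: ["10.0.0.4", "10.0.0.5"]

The post above walks through the full setup.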

HTH,

--

Itamar Syn-Hershko
http://code972.com | @synhershko 
Freelance Developer & Consultant
Author of RavenDB in Action 


On Mon, Jul 7, 2014 at 10:35 AM, NetaN  wrote:

> Hello,
> We are using ES with the azure cloud plugin for node communication. Our
> current set-up is:
> 2 data nodes (hosted on Azure Ubuntu 14.04 VMs), 3.5 GB RAM, 2 cores (AKA
> medium VMs), ES version 1.0.0.0, 5 shards and 2 replicas.
> The storing and querying are done by Kibana and logstash, which are hosted on
> separate machines.
> Logstash connects to the cluster via the node protocol.
> The problem we are having is that for some unknown reason one of the data
> nodes leaves the cluster, no errors are reported in the log of the machine
> that left, and the node keeps running independently (weird, I know). When
> restarting the elasticsearch service on the lonely node, it rejoins the
> cluster. After the node rejoins the cluster everything seems OK until it or
> the other data node leaves the cluster again (meaning that there is no
> specific node that has the problem; each one of the nodes can suddenly
> leave the cluster).
> Did anyone encounter this problem using the azure plugin for node
> discovery?
> Any input on how to approach this issue?
> Thanks
> Neta.


What is exactly flush in node stats?

2014-07-07 Thread Peeyush Chandel
Hi,

When I look at the node stats of my cluster, I see something like this:

"flush" : {
  "total" : 30,
  "total_time_in_millis" : 6964
}

My translog flush interval is 180 minutes or 1200 MB, so I think flushes 
should happen based on these values.

But if I keep checking my node stats, the flush value (total) keeps 
increasing. What does that mean?



Re: have we a way to use highlight and fuzzy together ?

2014-07-07 Thread Tanguy Bernard
I want to combine like this:

GET my_index/my_type/_search?pretty=true
{
  "size": 50,
  "query": {
    "fuzzy": 0.2,
    "multi_match": {
      "query": "my words",
      "fields": ["title_doc"]
    }
  },
  "highlight": {
    "order": "date_doc",
    "fields": {
      "title_doc": { "fragment_size": 30 }
    }
  }
}

Can you help me?

Tanguy



Query DSL problem

2014-07-07 Thread 闫旭
Dear All!
Can I do intersection or union operations on facet filter results?



How to Know Progress of Backup And Restore

2014-07-07 Thread sowjanya
Hi,
I need progress information for backup and restore in Elasticsearch.
How can I find out how much data has been processed?

Please help me with this.
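
Two APIs report per-shard progress: the snapshot status API for backups, 
and the cat recovery API for restores. A sketch, with made-up repository 
and snapshot names:

curl -XGET 'http://localhost:9200/_snapshot/my_backup/snapshot_1/_status?pretty'
curl -XGET 'http://localhost:9200/_cat/recovery?v'

The status response includes per-shard file and byte counters, which is the 
closest thing to a progress indicator.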






Re: Elasticsearch, how to make view with it?

2014-07-07 Thread Villiers Tientcheu Ngandjeuu

Thanks, David!
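
For reference, a filtered alias along the lines David suggests below might 
look like this (a sketch; the index name and team field are made up):

curl -XPOST 'http://localhost:9200/_aliases' -d '{ "actions": [ { "add": { "index": "logs", "alias": "logs_team_x", "filter": { "term": { "team": "x" } } } } ] }'

Each team then queries its own alias, and the NGinx layer only has to 
restrict which alias each team can reach.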
On Monday, July 7, 2014 10:20:19 UTC+2, David Pilato wrote:
>
> Just answered on the French ML as well:
>
> You can use aliases on top of your indices and add an NGinx layer, for 
> example, to filter URLs per user/group.
>
>
>
> -- 
> *David Pilato* | *Technical Advocate* | *Elasticsearch.com*
> @dadoonet  | @elasticsearchfr 
> 
>
>
> On July 7, 2014 at 09:41:11, Villiers Tientcheu Ngandjeuu (tientche...@gmail.com) wrote:
>
> Hello dear!
> I get three kinds of logs to index with Elasticsearch, let's say X, Y and Z 
> for three different teams in the business! With SQL, it was possible to 
> make views for customers. How can I make views with Elasticsearch for the 
> teams? Or how can I restrict access to data between them: team X should only 
> have access to logs X, etc.
> Thanks for your help!


Re: Elasticsearch with azure cloud plugin

2014-07-07 Thread NetaN
Yes this is the case. I will post


On Mon, Jul 7, 2014 at 11:24 AM, dadoonet [via ElasticSearch Users] <
ml-node+s115913n405934...@n3.nabble.com> wrote:

> So you are saying that when a node suddenly disappears for whatever reason
> (network, GC…), it can't rejoin the cluster automatically, so you have
> to restart it?
>
> If so, could you open an issue in cloud-azure plugin repo and if possible
> attach logs from the both nodes?
>
> Thanks
>
> --
> *David Pilato* | *Technical Advocate* | *Elasticsearch.com*
> @dadoonet  | @elasticsearchfr
> 
>
>
> On July 7, 2014 at 09:35:34, NetaN ([hidden email]) wrote:
>
> Hello,
> We are using ES with the azure cloud plugin for node communication. Our
> current set-up is:
> 2 data nodes (hosted on Azure Ubuntu 14.04 VMs), 3.5 GB RAM, 2 cores (AKA
> medium VMs), ES version 1.0.0.0, 5 shards and 2 replicas.
> The storing and querying are done by Kibana and logstash, which are hosted on
> separate machines.
> Logstash connects to the cluster via the node protocol.
> The problem we are having is that for some unknown reason one of the data
> nodes leaves the cluster, no errors are reported in the log of the machine
> that left, and the node keeps running independently (weird, I know). When
> restarting the elasticsearch service on the lonely node, it rejoins the
> cluster. After the node rejoins the cluster everything seems OK until it
> or
> the other data node leaves the cluster again (meaning that there is no
> specific node that has the problem; each one of the nodes can suddenly
> leave the cluster).
> Did anyone encounter this problem using the azure plugin for node
> discovery?
> Any input on how to approach this issue?
> Thanks
> Neta.
>
>
>
>
>
> --
> View this message in context:
> http://elasticsearch-users.115913.n3.nabble.com/Elasticsearch-with-azure-cloud-plugin-tp4059340.html
> Sent from the ElasticSearch Users mailing list archive at Nabble.com.
>
> --
> You received this message because you are subscribed to the Google Groups
> "elasticsearch" group.
> To unsubscribe from this group and stop receiving emails from it, send an
> email to [hidden email]
> .
> To view this discussion on the web visit
> https://groups.google.com/d/msgid/elasticsearch/1404718530391-4059340.post%40n3.nabble.com.
>
> For more options, visit https://groups.google.com/d/optout.
>
>  --
> You received this message because you are subscribed to the Google Groups
> "elasticsearch" group.
> To unsubscribe from this group and stop receiving emails from it, send an
> email to [hidden email]
> .
> To view this discussion on the web visit
> https://groups.google.com/d/msgid/elasticsearch/etPan.53ba58f9.440badfc.2fae%40MacBook-Air-de-David.local
> 
> .
> For more options, visit https://groups.google.com/d/optout.
>
>
> --
>  If you reply to this email, your message will be added to the discussion
> below:
>
> http://elasticsearch-users.115913.n3.nabble.com/Elasticsearch-with-azure-cloud-plugin-tp4059340p4059344.html
>  To unsubscribe from Elasticsearch with azure cloud plugin, click here
> 
> .
> NAML
> 
>






Re: Elasticsearch with azure cloud plugin

2014-07-07 Thread David Pilato
So you are saying that when a node suddenly disappears for whatever reason 
(network, GC…), it can't rejoin the cluster automatically, so you have to 
restart it?

If so, could you open an issue in the cloud-azure plugin repo and, if 
possible, attach logs from both nodes?
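
In the meantime, a quick way to see what each node currently believes — a 
minimal sketch, with node1/node2 as placeholders for your two VMs:

# Run against each node. If the cluster has split, each node will
# report itself as master of its own one-node cluster.
curl -XGET 'http://node1:9200/_cluster/health?pretty'
curl -XGET 'http://node1:9200/_cat/nodes?v'
curl -XGET 'http://node2:9200/_cluster/health?pretty'
curl -XGET 'http://node2:9200/_cat/nodes?v'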

Thanks

-- 
David Pilato | Technical Advocate | Elasticsearch.com
@dadoonet | @elasticsearchfr


On 7 July 2014 at 09:35:34, NetaN (n...@biocatch.com) wrote:

Hello,
We are using ES with the azure cloud plugin for node communication. Our 
current set-up is:
2 data nodes (hosted on Azure Ubuntu 14.04 VMs), 3.5 GB RAM, 2 cores (AKA 
medium VMs), ES version 1.0.0.0, 5 shards and 2 replicas.
Storing and querying are done by Kibana and Logstash, which are hosted on 
separate machines.
Logstash connects to the cluster via the node protocol.
The problem we are having is that, for some unknown reason, one of the data 
nodes leaves the cluster. No errors are reported in the log of the machine 
that left, and the node keeps running independently (weird, I know). When we 
restart the elasticsearch service on the lonely node, it rejoins the cluster. 
After the node rejoins the cluster, everything seems OK until it or the other 
data node leaves the cluster again (meaning there is no specific node that 
has the problem; each of the nodes can suddenly leave the cluster).
Did anyone encounter this problem using the azure plugin for node discovery?
Any input on how to approach this issue?
Thanks
Neta.


Re: Elasticsearch, how to make view with it?

2014-07-07 Thread David Pilato
Just answered on the French ML as well:

You can use aliases on top of your indices and add an NGinx layer, for 
example, to filter URLs per user/group.
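
A minimal sketch of the alias side, assuming a single "logs" index with a 
"team" field (index, alias and field names here are placeholders; ES 1.x 
syntax):

# Create a filtered alias per team: searches through "team-x-logs"
# only ever see documents whose "team" field is "x".
curl -XPOST 'localhost:9200/_aliases' -d '{
  "actions": [
    { "add": { "index": "logs", "alias": "team-x-logs",
               "filter": { "term": { "team": "x" } } } }
  ]
}'

# Team X then searches the alias instead of the raw index:
curl -XGET 'localhost:9200/team-x-logs/_search?q=failed'

On the NGinx side you would then only proxy URLs under /team-x-logs/ to 
users authenticated as team X, and deny direct access to the underlying 
index.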



-- 
David Pilato | Technical Advocate | Elasticsearch.com
@dadoonet | @elasticsearchfr


On 7 July 2014 at 09:41:11, Villiers Tientcheu Ngandjeuu 
(tientcheuvilli...@gmail.com) wrote:

Hello!
I have three kinds of logs to index with Elasticsearch, let's say X, Y and 
Z, for three different teams in the business. With SQL, it was possible to 
create views for customers. How can I create views with Elasticsearch for 
the teams? Or how can I restrict access to the data between the teams, so 
that team X only has access to logs X, etc.?
Thanks for your help!


RE: cluster.routing.allocation.enable behavior (sticky shard allocation not working as expected)

2014-07-07 Thread Grégoire Seux
Andrew,

Have you found a solution (or explanation) to your issue?
We are using Elasticsearch 1.1.1; what about you?
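
For anyone finding this thread later, the setting under discussion is 
toggled like this — a sketch with a placeholder host, 1.x transient-settings 
syntax:

# Disable shard allocation before taking a node down...
curl -XPUT 'localhost:9200/_cluster/settings' -d '{
  "transient": { "cluster.routing.allocation.enable": "none" }
}'

# ...and re-enable it afterwards. The expectation ("sticky" allocation)
# is that shards return to the nodes that already hold copies of them.
curl -XPUT 'localhost:9200/_cluster/settings' -d '{
  "transient": { "cluster.routing.allocation.enable": "all" }
}'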

-- 
Grégoire



Elasticsearch, how to make view with it?

2014-07-07 Thread Villiers Tientcheu Ngandjeuu
Hello!
I have three kinds of logs to index with Elasticsearch, let's say X, Y and 
Z, for three different teams in the business. With SQL, it was possible to 
create views for customers. How can I create views with Elasticsearch for 
the teams? Or how can I restrict access to the data between the teams, so 
that team X only has access to logs X, etc.?
Thanks for your help!



Elasticsearch with azure cloud plugin

2014-07-07 Thread NetaN
Hello,
We are using ES with the azure cloud plugin for node communication. Our 
current set-up is:
2 data nodes (hosted on Azure Ubuntu 14.04 VMs), 3.5 GB RAM, 2 cores (AKA 
medium VMs), ES version 1.0.0.0, 5 shards and 2 replicas.
Storing and querying are done by Kibana and Logstash, which are hosted on 
separate machines.
Logstash connects to the cluster via the node protocol.
The problem we are having is that, for some unknown reason, one of the data 
nodes leaves the cluster. No errors are reported in the log of the machine 
that left, and the node keeps running independently (weird, I know). When we 
restart the elasticsearch service on the lonely node, it rejoins the cluster. 
After the node rejoins the cluster, everything seems OK until it or the other 
data node leaves the cluster again (meaning there is no specific node that 
has the problem; each of the nodes can suddenly leave the cluster).
Did anyone encounter this problem using the azure plugin for node discovery?
Any input on how to approach this issue?
Thanks
Neta.
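
If both nodes are master-eligible, this pattern (a node keeps running on its 
own and only rejoins after a restart) is the classic split-brain symptom. A 
common guard, sketched here for a two-node setup with a placeholder host 
(values are illustrative, not from this thread):

# On a two-node cluster, require both master-eligible nodes before a
# master can be elected, so a cut-off node stops acting alone.
# (Can equally be set as discovery.zen.minimum_master_nodes in
# elasticsearch.yml on both nodes.)
curl -XPUT 'localhost:9200/_cluster/settings' -d '{
  "persistent": { "discovery.zen.minimum_master_nodes": 2 }
}'

The trade-off with only two master-eligible nodes: requiring a quorum of 2 
means no master can be elected while either node is down.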

 


