Re: Alerting in ELK stack?

2014-07-07 Thread Otis Gospodnetic
We have and use SPM for all our metrics (ES, Kafka, Apache, MySQL, Hadoop, 
everything) and we feed our logs to Logsene (it has a Kibana UI and a 
"native" UI).  SPM has alerting and anomaly detection, so we use that to get 
out of bed early (nah, not really), but we currently lack alerting in 
Logsene (i.e. alerting on numerical data in logs or on patterns).  Since 
Logsene has a Kibana UI, can be fed via Logstash, and has an Elasticsearch 
API and backend, that's the closest we've gotten to ELK+Alerts.

Otis
--
Performance Monitoring * Log Analytics * Search Analytics
Solr & Elasticsearch Support * http://sematext.com/



On Wednesday, June 25, 2014 11:18:01 AM UTC-4, Michael Hart wrote:
>
> We use Nagios for alerting. I originally was using the nsca output plugin 
> for logstash, but found that it took close to a second to execute the 
> command line nsca client, and if we got flooded with alert messages, 
> logstash would fall behind. I've since switched to use the http output and 
> send json to the nagios-api server (https://github.com/zorkian/nagios-api). 
> That seems to scale a lot better.
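>
> A minimal sketch of that http output, for reference (the nagios-api URL 
> and settings here are placeholders, not our exact config):
>
> output {
>   http {
>     # hypothetical nagios-api endpoint
>     url => "http://nagios-host:8080/"
>     http_method => "post"
>     format => "json"
>   }
> }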
>
> We do also have metrics sent from logstash to statsd/graphite, but mostly 
> so I can see message rates.
>
> mike
>
> On Monday, June 23, 2014 4:50:22 AM UTC-4, Siddharth Trikha wrote:
>>
>> We are using the `ELK stack (logstash, elasticsearch, kibana)` to analyze 
>> our logs. So far, so good.
>>
>> But now we want notifications generated for some particular kinds of logs, 
>> e.g. when a failed-login log appears more than 5 times (threshold crossed), 
>> an email should be sent to the sysadmin.
>>
>> I looked online and heard about `statsd`, `riemann`, `nagios`, and the 
>> logstash `metric` filter for achieving our requirement. 
>>
>> Can anyone suggest which fits best with the ELK stack? I am new to this. 
>> Thanks
>>
>

-- 
You received this message because you are subscribed to the Google Groups 
"elasticsearch" group.
To unsubscribe from this group and stop receiving emails from it, send an email 
to elasticsearch+unsubscr...@googlegroups.com.
To view this discussion on the web visit 
https://groups.google.com/d/msgid/elasticsearch/71f99e2b-6557-4be4-a68d-2df08e53e595%40googlegroups.com.
For more options, visit https://groups.google.com/d/optout.


Re: What does it take to make a custom stemmer for ES?

2014-07-07 Thread Otis Gospodnetic
Hi Nandiya,

Have a look at Lucene and its source code for token filters.  You'd 
implement a custom stemmer at the Lucene level, and then just use that in ES.
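
A minimal sketch of what that could look like at the Lucene level (class and 
method names here are made up; stem() would be the port of your Python code):

import java.io.IOException;
import org.apache.lucene.analysis.TokenFilter;
import org.apache.lucene.analysis.TokenStream;
import org.apache.lucene.analysis.tokenattributes.CharTermAttribute;

public final class PaliStemFilter extends TokenFilter {

    private final CharTermAttribute termAtt = addAttribute(CharTermAttribute.class);

    public PaliStemFilter(TokenStream input) {
        super(input);
    }

    @Override
    public boolean incrementToken() throws IOException {
        if (!input.incrementToken()) {
            return false; // no more tokens in the stream
        }
        // replace the token's text with its stem
        String stem = stem(termAtt.toString());
        termAtt.setEmpty().append(stem);
        return true;
    }

    private String stem(String inflected) {
        // port of the Python stemmer goes here
        return inflected;
    }
}

You would then expose it through a TokenFilterFactory in an analysis plugin, 
or graft it onto one of the existing analysis plugins as a starting point.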

Otis
--
Performance Monitoring * Log Analytics * Search Analytics
Solr & Elasticsearch Support * http://sematext.com/



On Monday, July 7, 2014 8:57:09 PM UTC-4, Nandiya Bhikkhu wrote:
>
> I am interested in using elasticsearch for our website suttacentral.net. 
> I've tried ES and found it pleasant to use, with obvious power. The only 
> challenge is that on suttacentral we host many Buddhist texts in ancient 
> languages, particularly the Pali language; suffice it to say there are no 
> existing stemmers. Stemming is a vital step for searching because Pali is a 
> highly inflected language (like Latin). The actual stemming step is 
> straightforward enough; presently we use a custom stemmer I wrote in 
> Python. It's dead simple and I wouldn't have much trouble implementing the 
> same code in Java (i.e. as a function which takes an inflected word as a 
> string, and returns the stem as another string). Where I'm in the dark is 
> making ES call that code.
>
> All the example stemmer plugins I've found adapt existing stemmers to ES. 
> What I really want is a way to call a function on each token and use the 
> return value of that function. It seems to me that *should* be simple 
> enough, but I've not managed to find any simple, minimalistic code to use 
> as a template. Although it would be noble, at this point I'm not interested 
> in making a proper plugin; I would be happy with the barest bodge/hack that 
> would achieve the desired effect!
>
> If anyone could point me in the right direction, either to a minimalistic 
> code example or an outline of what it would involve, I would be greatly 
> appreciative.
>
> Kind regards,
> Nandiya Bhikkhu
>

-- 
You received this message because you are subscribed to the Google Groups 
"elasticsearch" group.
To unsubscribe from this group and stop receiving emails from it, send an email 
to elasticsearch+unsubscr...@googlegroups.com.
To view this discussion on the web visit 
https://groups.google.com/d/msgid/elasticsearch/f3b3a496-b434-41b4-84b9-733b3139202c%40googlegroups.com.
For more options, visit https://groups.google.com/d/optout.


Re: Kibana Elasticsearch Shards and replication

2014-07-07 Thread Mark Walkom
Once you have a cluster, all data on any node is accessible.
It does this by routing the query through the node that receives it, which
then collects the data as required from the shards on the other nodes.

Regards,
Mark Walkom

Infrastructure Engineer
Campaign Monitor
email: ma...@campaignmonitor.com
web: www.campaignmonitor.com


On 8 July 2014 14:35, Tony Chong  wrote:

> Hi everyone,
>
> Sorry if this has been covered but a few pages of searching through the
> group hasn't sprung an answer for this.
>
> If I decided to have 3 elasticsearch nodes, with 3 shards, and 0 replicas,
> would kibana be able to retrieve all the data in my ES cluster or just the
> data from the elasticsearch node listed in its configuration, considering not
> all the shards would live on that node?
>
> Thanks,
>
> Tony
>
> --
> You received this message because you are subscribed to the Google Groups
> "elasticsearch" group.
> To unsubscribe from this group and stop receiving emails from it, send an
> email to elasticsearch+unsubscr...@googlegroups.com.
> To view this discussion on the web visit
> https://groups.google.com/d/msgid/elasticsearch/1b7d0cdd-9689-4fb4-9cd5-09c907e1b9a6%40googlegroups.com
> 
> .
> For more options, visit https://groups.google.com/d/optout.
>

-- 
You received this message because you are subscribed to the Google Groups 
"elasticsearch" group.
To unsubscribe from this group and stop receiving emails from it, send an email 
to elasticsearch+unsubscr...@googlegroups.com.
To view this discussion on the web visit 
https://groups.google.com/d/msgid/elasticsearch/CAEM624bLQgeT0agRU%2B9yTpBOrmVGGZ_dh2Ahin6xf5pGCmkcVA%40mail.gmail.com.
For more options, visit https://groups.google.com/d/optout.


Kibana Elasticsearch Shards and replication

2014-07-07 Thread Tony Chong
Hi everyone,

Sorry if this has been covered but a few pages of searching through the 
group hasn't sprung an answer for this. 

If I decided to have 3 elasticsearch nodes, with 3 shards, and 0 replicas, 
would kibana be able to retrieve all the data in my ES cluster or just the 
data from the elasticsearch node listed in its configuration, considering not 
all the shards would live on that node?

Thanks,

Tony

-- 
You received this message because you are subscribed to the Google Groups 
"elasticsearch" group.
To unsubscribe from this group and stop receiving emails from it, send an email 
to elasticsearch+unsubscr...@googlegroups.com.
To view this discussion on the web visit 
https://groups.google.com/d/msgid/elasticsearch/1b7d0cdd-9689-4fb4-9cd5-09c907e1b9a6%40googlegroups.com.
For more options, visit https://groups.google.com/d/optout.


Re: Time range filter

2014-07-07 Thread vineeth mohan
Hello Tom ,

At this point, I can think of two approaches:


   1. Store an additional field with just the time and not the date
   information, and do a normal range query on it.
   2. Create a script filter: in the filter, extract the time component and
   check the range (see the sketch just below).
   
http://www.elasticsearch.org/guide/en/elasticsearch/reference/current/query-dsl-script-filter.html
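
A minimal sketch of approach 2, assuming the timestamp field is a date
called @timestamp (the field name and the MVEL-style script syntax are
assumptions):

{
  "query": {
    "filtered": {
      "filter": {
        "script": {
          "script": "m = doc['@timestamp'].date.hourOfDay * 60 + doc['@timestamp'].date.minuteOfHour; m >= 19*60 && m <= 23*60 + 30"
        }
      }
    }
  }
}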


But this is a common use case, and some elegant way to do it should
exist. If not, I will open an issue.

Thanks
Vineeth



On Tue, Jul 8, 2014 at 7:19 AM, Tom Miller  wrote:

> All of the examples I can find on the web relate to date-range filtering.
> What I need is a time-range filter, i.e.
> 19:00 - 23:30.
>
> So, in this example, I want all hits between 7 PM and 11:30 PM, regardless of
> the day...
>
> I'd do this in SQL by doing "Where TIME(column) BETWEEN x and y".
>
> Is this possible in elasticsearch?
>
> My only solution thus far is to date_histogram by hour, and then filter on
> the client and add them up, which is kinda lame...
>
> --
> You received this message because you are subscribed to the Google Groups
> "elasticsearch" group.
> To unsubscribe from this group and stop receiving emails from it, send an
> email to elasticsearch+unsubscr...@googlegroups.com.
> To view this discussion on the web visit
> https://groups.google.com/d/msgid/elasticsearch/943a4cca-ee2c-497a-840e-be39ad821a0f%40googlegroups.com
> 
> .
> For more options, visit https://groups.google.com/d/optout.
>

-- 
You received this message because you are subscribed to the Google Groups 
"elasticsearch" group.
To unsubscribe from this group and stop receiving emails from it, send an email 
to elasticsearch+unsubscr...@googlegroups.com.
To view this discussion on the web visit 
https://groups.google.com/d/msgid/elasticsearch/CAGdPd5kFZMt1nuUfBJkwPuFkBCGN4ZUHXESxPn6Ccy9F0QL5xA%40mail.gmail.com.
For more options, visit https://groups.google.com/d/optout.


Re: How can I do intersection or union operation with two facets filter?

2014-07-07 Thread 闫旭

It doesn't work; the result is:
{"term":"bbb","count":2},
{"term":"aaa","count":1},
{"term":"ccc","count":1}
On 2014-07-08 02:07, Harish Ved wrote:

Did you try the following query?

{
  "query": {
    "filtered": {
      "filter": {
        "bool": {
          "should": [
            { "term": { "field_B": "" } },
            { "term": { "field_B": "bbb" } }
          ]
        }
      }
    }
  },
  "facets": {
    "tag": {
      "terms": {
        "field": "field_A"
      }
    }
  }
}

Please confirm if it works for you.

On Monday, 7 July 2014 15:56:53 UTC+5:30, 闫旭 wrote:

Dear All!
I have some docs:
{"field_A":"aaa","field_B":"bbb"}
{"field_A":"aaa","field_B":"ccc"}
{"field_A":"bbb","field_B":"bbb"}
{"field_A":"bbb","field_B":"bbb"}
{"field_A":"bbb","field_B":"eee"}
{"field_A":"aaa","field_B":""}
{"field_A":"ccc","field_B":""}
first step:
{
  "query": {
    "filtered": {
      "filter": {
        "bool": {
          "must": {
            "term": { "field_B": "bbb" }
          }
        }
      }
    }
  },
  "facets": {
    "tag": {
      "terms": {
        "field": "field_A"
      }
    }
  }
}
first result:
{
...
{"term":"aaa","count":1},
{"term":"bbb","count":2},
...
}
-
second step:
the second facets:
{
  "query": {
    "filtered": {
      "filter": {
        "bool": {
          "must": {
            "term": { "field_B": "" }
          }
        }
      }
    }
  },
  "facets": {
    "tag": {
      "terms": {
        "field": "field_A"
      }
    }
  }
}
second result:
{
...
{"term":"aaa","count":1}.
{"term":"ccc","count":1}
...
}

-
third step:
combine the two results with an intersection operation on "term":
{"term":"aaa","count":"I don't care about the count value."}


-
Now, how can I combine the three steps into one filtered facet, or
some other method?


Thx All!!

--
You received this message because you are subscribed to the Google 
Groups "elasticsearch" group.
To unsubscribe from this group and stop receiving emails from it, send 
an email to elasticsearch+unsubscr...@googlegroups.com 
.
To view this discussion on the web visit 
https://groups.google.com/d/msgid/elasticsearch/93d5adb6-bcd1-49b6-8c28-be4d456f92d8%40googlegroups.com 
.

For more options, visit https://groups.google.com/d/optout.


--
You received this message because you are subscribed to the Google Groups 
"elasticsearch" group.
To unsubscribe from this group and stop receiving emails from it, send an email 
to elasticsearch+unsubscr...@googlegroups.com.
To view this discussion on the web visit 
https://groups.google.com/d/msgid/elasticsearch/53BB51CC.3080005%40gmail.com.
For more options, visit https://groups.google.com/d/optout.


Time range filter

2014-07-07 Thread Tom Miller
All of the examples I can find on the web relate to date-range filtering. 
What I need is a time-range filter, i.e.
19:00 - 23:30.

So, in this example, I want all hits between 7 PM and 11:30 PM, regardless of 
the day...

I'd do this in SQL by doing "Where TIME(column) BETWEEN x and y".

Is this possible in elasticsearch?

My only solution thus far is to date_histogram by hour, and then filter on 
the client and add them up, which is kinda lame...

-- 
You received this message because you are subscribed to the Google Groups 
"elasticsearch" group.
To unsubscribe from this group and stop receiving emails from it, send an email 
to elasticsearch+unsubscr...@googlegroups.com.
To view this discussion on the web visit 
https://groups.google.com/d/msgid/elasticsearch/943a4cca-ee2c-497a-840e-be39ad821a0f%40googlegroups.com.
For more options, visit https://groups.google.com/d/optout.


What does it take to make a custom stemmer for ES?

2014-07-07 Thread Nandiya Bhikkhu
I am interested in using elasticsearch for our website suttacentral.net. 
I've tried ES and found it pleasant to use, with obvious power. The only 
challenge is that on suttacentral we host many Buddhist texts in ancient 
languages, particularly the Pali language; suffice it to say there are no 
existing stemmers. Stemming is a vital step for searching because Pali is a 
highly inflected language (like Latin). The actual stemming step is 
straightforward enough; presently we use a custom stemmer I wrote in 
Python. It's dead simple and I wouldn't have much trouble implementing the 
same code in Java (i.e. as a function which takes an inflected word as a 
string, and returns the stem as another string). Where I'm in the dark is 
making ES call that code.

All the example stemmer plugins I've found adapt existing stemmers to ES. 
What I really want is a way to call a function on each token and use the 
return value of that function. It seems to me that *should* be simple 
enough, but I've not managed to find any simple, minimalistic code to use 
as a template. Although it would be noble, at this point I'm not interested 
in making a proper plugin; I would be happy with the barest bodge/hack that 
would achieve the desired effect!

If anyone could point me in the right direction, either to a minimalistic 
code example or an outline of what it would involve, I would be greatly 
appreciative.

Kind regards,
Nandiya Bhikkhu

-- 
You received this message because you are subscribed to the Google Groups 
"elasticsearch" group.
To unsubscribe from this group and stop receiving emails from it, send an email 
to elasticsearch+unsubscr...@googlegroups.com.
To view this discussion on the web visit 
https://groups.google.com/d/msgid/elasticsearch/fe2c777e-b823-4652-8f6c-ecf42ec36d33%40googlegroups.com.
For more options, visit https://groups.google.com/d/optout.


Re: Elasticsearch Not Working

2014-07-07 Thread shriyansh jain
Hey,
I tried executing the commands you specified, but Elasticsearch is still 
not working. It's still giving the same status message, "elasticsearch 
dead but subsys locked".

Thanks,
Shriyansh

On Monday, July 7, 2014 5:29:00 PM UTC-7, arshpreet singh wrote:
>
> On Tue, Jul 8, 2014 at 5:53 AM, shriyansh jain  > wrote: 
> > Hey Arshpreet, 
> > 
> > I am not getting anything in output. 
>
> Please avoid top-posting and rich-text formatting while replying on 
> mailing lists. IMHO your service is blocked and you need to kill the 
> daemon. 
>
> sudo /etc/init.d/elasticsearch restart 
> or 
> sudo /etc/init.d/elasticsearch stop 
> sudo /etc/init.d/elasticsearch start 
>
>
> -- 
>
> Thanks 
> Arshpreet singh 
> http://arshpreetsingh.wordpress.com/ 
> I have no special talents. Only passionately curious 
>

-- 
You received this message because you are subscribed to the Google Groups 
"elasticsearch" group.
To unsubscribe from this group and stop receiving emails from it, send an email 
to elasticsearch+unsubscr...@googlegroups.com.
To view this discussion on the web visit 
https://groups.google.com/d/msgid/elasticsearch/bc85faa2-406e-4fac-ac75-06649d1bf075%40googlegroups.com.
For more options, visit https://groups.google.com/d/optout.


Re: Elasticsearch Not Working

2014-07-07 Thread arshpreet singh
On Tue, Jul 8, 2014 at 5:53 AM, shriyansh jain  wrote:
> Hey Arshpreet,
>
> I am not getting anything in output.

Please avoid top-posting and rich-text formatting while replying on
mailing lists. IMHO your service is blocked and you need to kill the daemon.

sudo /etc/init.d/elasticsearch restart
or
sudo /etc/init.d/elasticsearch stop
sudo /etc/init.d/elasticsearch start
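
If the restart alone doesn't clear the "dead but subsys locked" state, the
stale lock file is usually the culprit; removing it and starting again
should help (the path assumes a RHEL/CentOS-style init layout):

sudo rm -f /var/lock/subsys/elasticsearch
sudo /etc/init.d/elasticsearch start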


-- 

Thanks
Arshpreet singh
http://arshpreetsingh.wordpress.com/
I have no special talents. Only passionately curious

-- 
You received this message because you are subscribed to the Google Groups 
"elasticsearch" group.
To unsubscribe from this group and stop receiving emails from it, send an email 
to elasticsearch+unsubscr...@googlegroups.com.
To view this discussion on the web visit 
https://groups.google.com/d/msgid/elasticsearch/CAAstK2Hg%3DGBCYL4kr32VVyN%2BC6rR9p%3Di3qTLF8Y_ajDa7WAiJw%40mail.gmail.com.
For more options, visit https://groups.google.com/d/optout.


Re: Elasticsearch Not Working

2014-07-07 Thread shriyansh jain
Hey,
I am running the following command from the terminal to verify the status:



sudo /etc/init.d/elasticsearch status

Thanks,
Shriyansh


On Monday, July 7, 2014 5:07:04 PM UTC-7, Mark Walkom wrote:
>
> You need to provide more details for people to be able to effectively help.
>
> How are you verifying this, what method are you using?
>
> Regards,
> Mark Walkom
>
> Infrastructure Engineer
> Campaign Monitor
> email: ma...@campaignmonitor.com 
> web: www.campaignmonitor.com
>  
>
> On 8 July 2014 10:03, shriyansh jain > 
> wrote:
>
>> When I am verifying the Elasticsearch status, it's giving me the 
>> following error message. 
>>
>> *elasticsearch dead but subsys locked*
>>
>> Please help me out solving this.
>>
>> Thank you.
>> Shriyansh Jain
>>
>> -- 
>> You received this message because you are subscribed to the Google Groups 
>> "elasticsearch" group.
>> To unsubscribe from this group and stop receiving emails from it, send an 
>> email to elasticsearc...@googlegroups.com .
>> To view this discussion on the web visit 
>> https://groups.google.com/d/msgid/elasticsearch/bbbf214b-909b-4819-a6ca-508e76c7af7b%40googlegroups.com
>>  
>> 
>> .
>> For more options, visit https://groups.google.com/d/optout.
>>
>
>

-- 
You received this message because you are subscribed to the Google Groups 
"elasticsearch" group.
To unsubscribe from this group and stop receiving emails from it, send an email 
to elasticsearch+unsubscr...@googlegroups.com.
To view this discussion on the web visit 
https://groups.google.com/d/msgid/elasticsearch/a37913e3-fd5a-45b1-85c1-9d20fad05c6b%40googlegroups.com.
For more options, visit https://groups.google.com/d/optout.


Re: Elasticsearch Not Working

2014-07-07 Thread Mark Walkom
What command?
Please be explicit, provide what you are running and the output.

Regards,
Mark Walkom

Infrastructure Engineer
Campaign Monitor
email: ma...@campaignmonitor.com
web: www.campaignmonitor.com


On 8 July 2014 10:25, shriyansh jain  wrote:

> I am just running a command from the terminal.
>
> Thanks,
> Shriyansh
>
> On Monday, July 7, 2014 5:07:04 PM UTC-7, Mark Walkom wrote:
>>
>> You need to provide more details for people to be able to effectively
>> help.
>>
>> How are you verifying this, what method are you using?
>>
>> Regards,
>> Mark Walkom
>>
>> Infrastructure Engineer
>> Campaign Monitor
>> email: ma...@campaignmonitor.com
>> web: www.campaignmonitor.com
>>
>>
>> On 8 July 2014 10:03, shriyansh jain  wrote:
>>
>>> When I am verifying the Elasticsearch status, it's giving me the
>>> following error message.
>>>
>>> *elasticsearch dead but subsys locked*
>>>
>>> Please help me out solving this.
>>>
>>> Thank you.
>>> Shriyansh Jain
>>>
>>> --
>>> You received this message because you are subscribed to the Google
>>> Groups "elasticsearch" group.
>>> To unsubscribe from this group and stop receiving emails from it, send
>>> an email to elasticsearc...@googlegroups.com.
>>> To view this discussion on the web visit https://groups.google.com/d/
>>> msgid/elasticsearch/bbbf214b-909b-4819-a6ca-508e76c7af7b%
>>> 40googlegroups.com
>>> 
>>> .
>>> For more options, visit https://groups.google.com/d/optout.
>>>
>>
>>  --
> You received this message because you are subscribed to the Google Groups
> "elasticsearch" group.
> To unsubscribe from this group and stop receiving emails from it, send an
> email to elasticsearch+unsubscr...@googlegroups.com.
> To view this discussion on the web visit
> https://groups.google.com/d/msgid/elasticsearch/269395b9-7d48-4f0c-93de-4d7b275dff6d%40googlegroups.com
> 
> .
> For more options, visit https://groups.google.com/d/optout.
>

-- 
You received this message because you are subscribed to the Google Groups 
"elasticsearch" group.
To unsubscribe from this group and stop receiving emails from it, send an email 
to elasticsearch+unsubscr...@googlegroups.com.
To view this discussion on the web visit 
https://groups.google.com/d/msgid/elasticsearch/CAEM624bY-27tu7iEoGy5rqU6E1po-2UX3KhMmvW6dGbfoEpkZQ%40mail.gmail.com.
For more options, visit https://groups.google.com/d/optout.


Re: Elasticsearch Not Working

2014-07-07 Thread shriyansh jain
I am just running a command from the terminal.

Thanks,
Shriyansh

On Monday, July 7, 2014 5:07:04 PM UTC-7, Mark Walkom wrote:
>
> You need to provide more details for people to be able to effectively help.
>
> How are you verifying this, what method are you using?
>
> Regards,
> Mark Walkom
>
> Infrastructure Engineer
> Campaign Monitor
> email: ma...@campaignmonitor.com 
> web: www.campaignmonitor.com
>  
>
> On 8 July 2014 10:03, shriyansh jain > 
> wrote:
>
>> When I am verifying the Elasticsearch status, it's giving me the 
>> following error message. 
>>
>> *elasticsearch dead but subsys locked*
>>
>> Please help me out solving this.
>>
>> Thank you.
>> Shriyansh Jain
>>
>> -- 
>> You received this message because you are subscribed to the Google Groups 
>> "elasticsearch" group.
>> To unsubscribe from this group and stop receiving emails from it, send an 
>> email to elasticsearc...@googlegroups.com .
>> To view this discussion on the web visit 
>> https://groups.google.com/d/msgid/elasticsearch/bbbf214b-909b-4819-a6ca-508e76c7af7b%40googlegroups.com
>>  
>> 
>> .
>> For more options, visit https://groups.google.com/d/optout.
>>
>
>

-- 
You received this message because you are subscribed to the Google Groups 
"elasticsearch" group.
To unsubscribe from this group and stop receiving emails from it, send an email 
to elasticsearch+unsubscr...@googlegroups.com.
To view this discussion on the web visit 
https://groups.google.com/d/msgid/elasticsearch/269395b9-7d48-4f0c-93de-4d7b275dff6d%40googlegroups.com.
For more options, visit https://groups.google.com/d/optout.


Re: Elasticsearch Not Working

2014-07-07 Thread shriyansh jain
Hey Arshpreet,

I am not getting anything in output.


Thank you,
Shriyansh




On Monday, July 7, 2014 5:12:05 PM UTC-7, arshpreet singh wrote:
>
> On Tue, Jul 8, 2014 at 5:33 AM, shriyansh jain  > wrote: 
> > When I am verifying the Elasticsearch status, it's giving me the 
> > following error message: 
> > error message. 
> > 
> > elasticsearch dead but subsys locked 
>
> ipcs -s | grep elasticsearch 
>
> Can you post output for the above command? 
>
> -- 
>
> Thanks 
> Arshpreet singh 
> http://arshpreetsingh.wordpress.com/ 
> I have no special talents. Only passionately curious 
>

-- 
You received this message because you are subscribed to the Google Groups 
"elasticsearch" group.
To unsubscribe from this group and stop receiving emails from it, send an email 
to elasticsearch+unsubscr...@googlegroups.com.
To view this discussion on the web visit 
https://groups.google.com/d/msgid/elasticsearch/e48f5f1e-dfe8-4f25-b9f0-b787762b631b%40googlegroups.com.
For more options, visit https://groups.google.com/d/optout.


Re: Elasticsearch Not Working

2014-07-07 Thread arshpreet singh
On Tue, Jul 8, 2014 at 5:33 AM, shriyansh jain  wrote:
> When I am verifying the Elasticsearch status, it's giving me the following
> error message.
>
> elasticsearch dead but subsys locked

ipcs -s | grep elasticsearch

Can you post output for the above command?

-- 

Thanks
Arshpreet singh
http://arshpreetsingh.wordpress.com/
I have no special talents. Only passionately curious

-- 
You received this message because you are subscribed to the Google Groups 
"elasticsearch" group.
To unsubscribe from this group and stop receiving emails from it, send an email 
to elasticsearch+unsubscr...@googlegroups.com.
To view this discussion on the web visit 
https://groups.google.com/d/msgid/elasticsearch/CAAstK2Ge%2B9CYeFYX0tiTQ2XL0Y%3Dcw49vUOLhGpC%2Bjn2i6Oyw7w%40mail.gmail.com.
For more options, visit https://groups.google.com/d/optout.


Re: Elasticsearch Not Working

2014-07-07 Thread Mark Walkom
You need to provide more details for people to be able to effectively help.

How are you verifying this, what method are you using?

Regards,
Mark Walkom

Infrastructure Engineer
Campaign Monitor
email: ma...@campaignmonitor.com
web: www.campaignmonitor.com


On 8 July 2014 10:03, shriyansh jain  wrote:

> When I am verifying the Elasticsearch status, it's giving me the following
> error message.
>
> *elasticsearch dead but subsys locked*
>
> Please help me out solving this.
>
> Thank you.
> Shriyansh Jain
>
> --
> You received this message because you are subscribed to the Google Groups
> "elasticsearch" group.
> To unsubscribe from this group and stop receiving emails from it, send an
> email to elasticsearch+unsubscr...@googlegroups.com.
> To view this discussion on the web visit
> https://groups.google.com/d/msgid/elasticsearch/bbbf214b-909b-4819-a6ca-508e76c7af7b%40googlegroups.com
> 
> .
> For more options, visit https://groups.google.com/d/optout.
>

-- 
You received this message because you are subscribed to the Google Groups 
"elasticsearch" group.
To unsubscribe from this group and stop receiving emails from it, send an email 
to elasticsearch+unsubscr...@googlegroups.com.
To view this discussion on the web visit 
https://groups.google.com/d/msgid/elasticsearch/CAEM624b5if1JbJ-BGEwZ31QwwmRsbmjTmj34infrMvh8MPHY7g%40mail.gmail.com.
For more options, visit https://groups.google.com/d/optout.


Elasticsearch Not Working

2014-07-07 Thread shriyansh jain
When I am verifying the Elasticsearch status, it's giving me the following 
error message. 

*elasticsearch dead but subsys locked*

Please help me out solving this.

Thank you.
Shriyansh Jain

-- 
You received this message because you are subscribed to the Google Groups 
"elasticsearch" group.
To unsubscribe from this group and stop receiving emails from it, send an email 
to elasticsearch+unsubscr...@googlegroups.com.
To view this discussion on the web visit 
https://groups.google.com/d/msgid/elasticsearch/bbbf214b-909b-4819-a6ca-508e76c7af7b%40googlegroups.com.
For more options, visit https://groups.google.com/d/optout.


Problems upgrading an existing field to a multi-field

2014-07-07 Thread Ryan Tanner
I'm having trouble upgrading an existing field to a multi-field.  I've done 
this before with no issues on other fields.

I think the issue here is that the original mapping specifically defines an 
analyzer:

  "mappings" : {
"person" : {
  "properties" : {
"domain_titles" : {
  "type" : "string",
  "analyzer" : "stop",
  "include_in_all" : true
}
  }
}
  }

The other fields that have been upgraded do not have an analyzer in the 
original mapping.

This is the upgrade I'm attempting:

{
  "settings" : {
"index.analysis.filter.shingle_filter.type" : "shingle",
"index.analysis.filter.shingle_filter.min_shingle_size" : 2,
"index.analysis.filter.shingle_filter.max_shingle_size" : 5,
"index.analysis.analyzer.shingle_analyzer.type" : "custom",
"index.analysis.analyzer.shingle_analyzer.tokenizer" : "standard",
"index.analysis.analyzer.shingle_analyzer.filter" : [ "lowercase", 
"shingle_filter" ]
  },
  "mappings" : {
"person" : {
  "properties" : {
"domain_titles" : {
  "type" : "string",
  "fields" : {
"suggestions" : {
  "type" : "string",
  "index" : "analyzed",
  "include_in_all" : false,
  "analyzer" : "nicknameAnalyzer"
}
  }
}
  }
}
  }
}

Is there any reason why this sort of upgrade should fail?  This is the 
error message I get:

{"error":"MergeMappingException[Merge failed with failures {[mapper 
[domain_titles] has different index_analyzer]}]","status":400}
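
(For what it's worth, I also wonder whether omitting the original
"analyzer" : "stop" in the upgrade mapping is what makes the merge think
the analyzer changed. A sketch that repeats it explicitly, which I haven't
verified:)

  "mappings" : {
    "person" : {
      "properties" : {
        "domain_titles" : {
          "type" : "string",
          "analyzer" : "stop",
          "include_in_all" : true,
          "fields" : {
            "suggestions" : {
              "type" : "string",
              "index" : "analyzed",
              "include_in_all" : false,
              "analyzer" : "nicknameAnalyzer"
            }
          }
        }
      }
    }
  }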

Thanks for the help.

-- 
You received this message because you are subscribed to the Google Groups 
"elasticsearch" group.
To unsubscribe from this group and stop receiving emails from it, send an email 
to elasticsearch+unsubscr...@googlegroups.com.
To view this discussion on the web visit 
https://groups.google.com/d/msgid/elasticsearch/e057498d-64ca-4f5f-a76c-0a4717b82b9b%40googlegroups.com.
For more options, visit https://groups.google.com/d/optout.


Re: excessive merging/small segment sizes

2014-07-07 Thread Michael McCandless
Could you pull all hot threads next time the problem happens?
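
For reference, something along these lines against any node should do it:

curl -s 'localhost:9200/_nodes/hot_threads'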

Mike McCandless

http://blog.mikemccandless.com


On Mon, Jul 7, 2014 at 3:47 PM, Kireet Reddy  wrote:

> All that seems correct (except I think this is for node 6, not node 5). We
> don't delete documents, but we do some updates. The vast majority of
> documents get indexed into the large shards, but the smaller ones take some
> writes as well.
>
> We aren't using virtualized hardware and elasticsearch is the only thing
> running on the machines, no scheduled jobs, etc. I don't think something is
> interfering, actually overall disk i/o rate and operations on the machine
> go down quite a bit during the problematic period, which is consistent with
> your observations about things taking longer.
>
> I went back and checked all our collected metrics again. I noticed that
> even though the heap usage and gc count seems smooth during the period in
> question, gc time spent goes way up. Also active indexing threads goes up,
> but since our ingest rate didn't go up I assumed this was a side effect.
> During a previous occurrence a few days ago on node5, I stopped all
> indexing activity for 15 minutes. Active merges and indexing requests went
> to zero as expected. Then I re-enabled indexing and immediately the
> increased cpu/gc/active merges went back up to the problematic rates.
>
> Overall this is pretty confusing to me as to what is a symptom vs a root
> cause here. A summary of what I think I know:
>
>1. Every few days, cpu usage on a node goes way above the other nodes
>and doesn't recover. We've let the node run in the elevated cpu state for a
>day with no improvement.
>2. It doesn't seem likely that it's data related. We use replicas=1
>and no other nodes have issues.
>3. It doesn't seem hardware related. We run on a dedicated h/w with
>elasticsearch being the only thing running. Also the problem appears on
>various nodes and machine load seems tied directly to the elasticsearch
>process.
>4. During the problematic period: cpu usage, active merge threads,
>active bulk (indexing) threads, and gc time are elevated.
>5. During the problematic period: i/o ops and i/o throughput decrease.
>6. overall heap usage size seems to smoothly increase, the extra gc
>time seems to be spent on the new gen. Interestingly, the gc count didn't
>seem to increase.
>7. In the hours beforehand, gc behavior of the problematic node was
>similar to the other nodes.
>8. If I pause indexing, machine load quickly returns to normal, merges
>and indexing requests complete.  if I then restart indexing the problem
>reoccurs immediately.
>9. If I disable automatic refreshes, the problem disappears within an
>hour or so.
>10. hot threads show merging activity as the hot threads.
>
> The first few points make me think the increased active merges is perhaps
> a side effect, but then the last 3 make me think merging is the root cause.
> The only additional things I can think of that may be relevant are:
>
>1. Our documents can vary greatly in size, they average a couple KB
>but can rarely be several MB.
>2. we do use language analysis plugins, perhaps one of these is acting
>up?
>3. We eagerly load one field into the field data cache. But the cache
>size is ok and the overall heap behavior is ok so I don't think this is the
>problem.
>
> That's a lot of information, but I am not sure where to go next from
> here...
>
> On Monday, July 7, 2014 8:23:20 AM UTC-7, Michael McCandless wrote:
>
>> Indeed there are no big merges during that time ...
>>
>> I can see on node5, ~14:45 suddenly merges are taking a long time,
>> refresh is taking much longer (4-5 seconds instead of < .4 sec), commit
>> time goes up from < 0.5 sec to ~1-2 sec, etc., but other metrics are fine
>> e.g. total merging GB, number of commits/refreshes is very low during this
>> time.
>>
>> Each node has 2 biggish (~17 GB) shards and then ~50 tiny shards.  The
>> biggish shards are indexing at a very slow rate and only have ~1%
>> deletions.  Are you explicitly deleting docs?
>>
>> I suspect something is suddenly cutting into the IO perf of this box, and
>> because merging/refreshing is so IO intensive, it causes these operations
>> to run slower / backlog.
>>
>> Are there any scheduled jobs, e.g. backups/snapshots, that start up?  Are
>> you running on virtualized hardware?
>>
>>
>> Mike McCandless
>>
>> http://blog.mikemccandless.com
>>
>>
>> On Sun, Jul 6, 2014 at 8:23 PM, Kireet Reddy  wrote:
>>
>>>  Just to reiterate, the problematic period is from 07/05 14:45 to 07/06
>>> 02:10. I included a couple hours before and after in the logs.
>>>
>>>
>>> On Sunday, July 6, 2014 5:17:06 PM UTC-7, Kireet Reddy wrote:

 They are linked below (node5 is the log of the normal node, node6 is
 the log of the problematic node).

 I don't think it was doing big merges, otherwise during the 

Re: Elasticsearch twitter river filtered stream question

2014-07-07 Thread David Pilato
It uses the filter functionality provided by the Twitter API.
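
In river terms, that's the filter section of the river settings; a minimal
sketch (river name, track term, and the omitted oauth credentials are
placeholders):

curl -XPUT 'localhost:9200/_river/my_twitter_river/_meta' -d '{
  "type": "twitter",
  "twitter": {
    "filter": {
      "tracks": "coffee"
    }
  }
}'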

-- 
David Pilato | Technical Advocate | Elasticsearch.com
@dadoonet | @elasticsearchfr


On 7 July 2014 at 21:54:02, Josh Harrison (hij...@gmail.com) wrote:

Quick question about the ES twitter river at 
https://github.com/elasticsearch/elasticsearch-river-twitter
The Twitter streaming API allows you to filter, and you apparently get up to 1% 
of the stream total matching your search queries. So, if I were filtering for 
"coffee", I'd get "coffee" tweets that I wouldn't get if I was just capturing 
the 1% stream passively.
Does the Twitter river use this filter functionality, or does it do its 
filtering on the ingestion side, ingesting the normal 1% stream and discarding 
anything that doesn't match?
--
You received this message because you are subscribed to the Google Groups 
"elasticsearch" group.
To unsubscribe from this group and stop receiving emails from it, send an email 
to elasticsearch+unsubscr...@googlegroups.com.
To view this discussion on the web visit 
https://groups.google.com/d/msgid/elasticsearch/b7adb5f1-49a1-4424-8f4e-1c75e15c4cb0%40googlegroups.com.
For more options, visit https://groups.google.com/d/optout.

-- 
You received this message because you are subscribed to the Google Groups 
"elasticsearch" group.
To unsubscribe from this group and stop receiving emails from it, send an email 
to elasticsearch+unsubscr...@googlegroups.com.
To view this discussion on the web visit 
https://groups.google.com/d/msgid/elasticsearch/etPan.53bb1e60.625558ec.5cf2%40MacBook-Air-de-David.local.
For more options, visit https://groups.google.com/d/optout.


Re: Best practice to backup index daily?

2014-07-07 Thread Ivan Brusic
The Elasticsearch curator now supports snapshots:

https://github.com/elasticsearch/curator
http://www.elasticsearch.org/blog/elasticsearch-curator-version-1-1-0-released/

You would still need to use cron to schedule tasks, but it would be a
curator task instead of a direct curl request.
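
If you do stick with plain cron + curl, a daily line could look like this
(repository name is a placeholder; the % signs must be escaped because cron
treats them as line endings):

0 0 * * * curl -s -XPUT "localhost:9200/_snapshot/my_s3_repo/snapshot_$(date +\%Y\%m\%d)"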

Cheers,

Ivan


On Mon, Jul 7, 2014 at 1:12 PM, sabdalla80  wrote:

> I am able to take a snapshot of the index and back it up to AWS S3. What
> is the best way to automate this approach and have it done daily, say every
> day at 12 midnight?
> I am aware that I can probably do it with crontab, but curious if others
> are doing it differently?
>
> --
> You received this message because you are subscribed to the Google Groups
> "elasticsearch" group.
> To unsubscribe from this group and stop receiving emails from it, send an
> email to elasticsearch+unsubscr...@googlegroups.com.
> To view this discussion on the web visit
> https://groups.google.com/d/msgid/elasticsearch/8620f7d9-b827-470d-8928-75c308e722cc%40googlegroups.com
> 
> .
> For more options, visit https://groups.google.com/d/optout.
>

-- 
You received this message because you are subscribed to the Google Groups 
"elasticsearch" group.
To unsubscribe from this group and stop receiving emails from it, send an email 
to elasticsearch+unsubscr...@googlegroups.com.
To view this discussion on the web visit 
https://groups.google.com/d/msgid/elasticsearch/CALY%3DcQB%3D%2BXEGQB3GybgiE3sniiFavc_NMyTK6LgjNnbus4J-jA%40mail.gmail.com.
For more options, visit https://groups.google.com/d/optout.


Re: How to limit the fields of response when I search a keyword?

2014-07-07 Thread Ivan Brusic
I responded differently to your other similar question, but you can also
limit the fields by explicitly asking for the set that you want:

http://www.elasticsearch.org/guide/en/elasticsearch/reference/current/search-request-fields.html
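
In your case that would be something like this (a sketch using the field
names from your doc):

{
  "fields": [ "city", "gender" ],
  "query": {
    "term": { "city": "1000" }
  }
}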

Cheers,

Ivan


On Sat, Jul 5, 2014 at 2:32 AM, 纪路  wrote:

> Dear all:
>
> I have a reasonable need, but I can't find how to deal with it in the
> official ES docs and books. Does anyone know? Please teach me, thank you!
>
> I have a large set of docs, which hold a lot of fields, such as:
>
> uid2 = {
> "id": 1404999597,
> "idstr": "1404999597",
> "class": 1,
> "screen_name": "",
> "name": "",
> "province": "11",
> "city": "1000",
> "location": "北京",
> "description": "在主流与非主流之间徘徊",
> "url": "",
> "profile_image_url": "http://tp2.sinaimg.cn/1404999597/50/5642385629/0";,
> "profile_url": "u/1404999597",
> "domain": "",
> "weihao": "",
> "gender": "f",
> "followers_count": 1030710,
> "friends_count": 272,
> "statuses_count": 1519,
> "favourites_count": 90,
> "created_at": "Wed Mar 23 23:59:40 +0800 2011",
> "following": false,
> "allow_all_act_msg": false,
> "geo_enabled": false,
> "verified": true,
> "verified_type": 0,
> "remark": "",
> "status": {
> "created_at": "Tue Jul 01 13:17:55 +0800 2014",
> "id": 3727513249206064,
> "mid": "3727513249206064",
> "idstr": "3727513249206064",
> "text": "听到她的声音,我更相信她和荷西在天堂,依旧幸福着。 //@东方尔雅:现在这种纯真的爱情还好找吗? //@晓玲-有话说:[心]",
> "source": "http://app.weibo.com/t/feed/9ksdit\";
> rel=\"nofollow\">iPhone客户端",
> "favorited": false,
> "truncated": false,
> "in_reply_to_status_id": "",
> "in_reply_to_user_id": "",
> "in_reply_to_screen_name": "",
> "pic_urls": [],
> "geo": null,
> "reposts_count": 0,
> "comments_count": 0,
> "attitudes_count": 0,
> "mlevel": 0,
> "visible": {
> "type": 0,
> "list_id": 0
> },
> "darwin_tags": []
> },
> "ptype": 1,
> "allow_all_comment": true,
> "avatar_large": "http://tp2.sinaimg.cn/1404999597/180/5642385629/0";,
> "avatar_hd": "http://tp2.sinaimg.cn/1404999597/180/5642385629/0";,
> "verified_reason": "电视台主持人梦桐",
> "verified_trade": "",
> "verified_reason_url": "",
> "verified_source": "",
> "verified_source_url": "",
> "follow_me": false,
> "online_status": 0,
> "bi_followers_count": 167,
> "lang": "zh-cn",
> "star": 0,
> "mbtype": 0,
> "mbrank": 0,
> "block_word": 0,
> "block_app": 0,
> "ability_tags": "主持人",
> "worldcup_guess": 0
> }
>
> This is a user-info doc. If I want to analyze the gender distribution of all
> users who live in "city": "1000" (1000 is a city code), I don't need any
> field except "city" and "gender". How can I exclude the meaningless fields
> before the docs are returned? Because there are lots of docs, transmitting
> the entire doc would waste a lot of time and bandwidth, and I would have to
> trim the extra information in my own program. So, is there a method that can
> deal with this problem for me? Thank you.
>
> --
> You received this message because you are subscribed to the Google Groups
> "elasticsearch" group.
> To unsubscribe from this group and stop receiving emails from it, send an
> email to elasticsearch+unsubscr...@googlegroups.com.
> To view this discussion on the web visit
> https://groups.google.com/d/msgid/elasticsearch/0f79f408-92d4-4806-8c47-02dd877ddaaf%40googlegroups.com
> 
> .
> For more options, visit https://groups.google.com/d/optout.
>

-- 
You received this message because you are subscribed to the Google Groups 
"elasticsearch" group.
To unsubscribe from this group and stop receiving emails from it, send an email 
to elasticsearch+unsubscr...@googlegroups.com.
To view this discussion on the web visit 
https://groups.google.com/d/msgid/elasticsearch/CALY%3DcQAF8fXiSFhpGiP31RiwpgPbwMrwszv_8chDGqopkXqyWQ%40mail.gmail.com.
For more options, visit https://groups.google.com/d/optout.


Re: How to limit fields of response doc when I search certain keyword?

2014-07-07 Thread Ivan Brusic
If I understand you correctly, you want to view the distribution of gender
based on the results of a query? In that case, you want to look into
aggregations, which work on top of the result set that is returned.

http://www.elasticsearch.org/guide/en/elasticsearch/reference/current/search-aggregations.html

http://www.elasticsearch.org/guide/en/elasticsearch/guide/current/aggregations.html

Here is a query that should work with your basic use case. Substitute
aggregations for facets if you have a newer version of Elasticsearch.

{
  "size": 0,
  "query": {
    "filtered": {
      "query": {
        "match_all": {}
      },
      "filter": {
        "term": {
          "city": "1000"
        }
      }
    }
  },
  "facets": {
    "gender": {
      "terms": {
        "field": "gender"
      }
    }
  }
}
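
The aggregations flavor of the same request would look roughly like this
(a sketch for ES 1.x):

{
  "size": 0,
  "query": {
    "filtered": {
      "filter": {
        "term": { "city": "1000" }
      }
    }
  },
  "aggs": {
    "gender": {
      "terms": { "field": "gender" }
    }
  }
}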

-- 
Ivan


On Sat, Jul 5, 2014 at 2:12 AM, 纪路  wrote:

> Dear all:
> There is a reasonable need, but I can't find a solution in the official
> docs or books. Can you help me?
>
> I have a large set of docs, which contains a lot of fields, such as:
>  {
> "id": 1404999597,
> "idstr": "1404999597",
> "class": 1,
> "screen_name": "主播梦桐",
> "name": "主播梦桐",
> "province": "11",
> "city": "1000",
> "location": "北京",
> "description": "在主流与非主流之间徘徊",
> "url": "",
> "profile_image_url": "http://tp2.sinaimg.cn/1404999597/50/5642385629/0";,
> "profile_url": "u/1404999597",
> "domain": "",
> "weihao": "",
> "gender": "f",
> "followers_count": 1030710,
> "friends_count": 272,
> "statuses_count": 1519,
> "favourites_count": 90,
> "created_at": "Wed Mar 23 23:59:40 +0800 2011",
> "following": false,
> "allow_all_act_msg": false,
> "geo_enabled": false,
> "verified": true,
> "verified_type": 0,
> "remark": "",
> "status": {
> "created_at": "Tue Jul 01 13:17:55 +0800 2014",
> "id": 3727513249206064,
> "mid": "3727513249206064",
> "idstr": "3727513249206064",
> "text": "听到她的声音,我更相信她和荷西在天堂,依旧幸福着。 //@东方尔雅:现在这种纯真的爱情还好找吗? //@晓玲-有话说:[心]",
> "source": "http://app.weibo.com/t/feed/9ksdit\";
> rel=\"nofollow\">iPhone客户端",
> "favorited": false,
> "truncated": false,
> "in_reply_to_status_id": "",
> "in_reply_to_user_id": "",
> "in_reply_to_screen_name": "",
> "pic_urls": [],
> "geo": null,
> "reposts_count": 0,
> "comments_count": 0,
> "attitudes_count": 0,
> "mlevel": 0,
> "visible": {
> "type": 0,
> "list_id": 0
> },
> "darwin_tags": []
> },
> "ptype": 1,
> "allow_all_comment": true,
> "avatar_large": "http://tp2.sinaimg.cn/1404999597/180/5642385629/0";,
> "avatar_hd": "http://tp2.sinaimg.cn/1404999597/180/5642385629/0";,
> "verified_reason": "电视台主持人梦桐",
> "verified_trade": "",
> "verified_reason_url": "",
> "verified_source": "",
> "verified_source_url": "",
> "follow_me": false,
> "online_status": 0,
> "bi_followers_count": 167,
> "lang": "zh-cn",
> "star": 0,
> "mbtype": 0,
> "mbrank": 0,
> "block_word": 0,
> "block_app": 0,
> "ability_tags": "主持人",
> "worldcup_guess": 0
> }
>
> My problem is: when I search (or scan & scroll) on a certain field, for
> example "city" = 1000 (1000 is its city code, which refers to a city name),
> there may be many results returned. But my goal is to detect how the gender
> of this city's users is distributed on my website; I don't need any
> information except the "gender" field. What can I do to exclude the
> meaningless data from the response JSON before it is returned? Because
> there are so many similar tasks for me, transmitting the entire doc would
> spend lots of time and bandwidth, and I would have to trim the additional
> data in my own program, which also wastes CPU time on the local computer.
> So if you know how to deal with this need, please teach me. Thank you!
>
> --
> You received this message because you are subscribed to the Google Groups
> "elasticsearch" group.
> To unsubscribe from this group and stop receiving emails from it, send an
> email to elasticsearch+unsubscr...@googlegroups.com.
> To view this discussion on the web visit
> https://groups.google.com/d/msgid/elasticsearch/2a01d5f4-67a5-493a-8e35-6f9a40a9998b%40googlegroups.com
> 
> .
> For more options, visit https://groups.google.com/d/optout.
>

-- 
You received this message because you are subscribed to the Google Groups 
"elasticsearch" group.
To unsubscribe from this group and stop receiving emails from it, send an email 
to elasticsearch+unsubscr...@googlegroups.com.
To view this discussion on the web visit 
https://groups.google.com/d/msgid/elasticsearch/CALY%3DcQCYFj8LGp%2B1jTaER10DrPbGVcbfnatkm8%2BNrOEvzbqfaQ%40mail.gmail.com.
For more options, visit https://groups.google.com/d/optout.


issues with using repository-hdfs plug in for snapshot/restore operation

2014-07-07 Thread Jinyuan Zhou
I am using elasticsearch 1.2.1 and the CDH 4.6 quick-start VM. My ES server is 
installed on the same VM. 
I have one successful scenario: I used the light version of the plugin and 
added the output of the `hadoop classpath` command to ES_CLASSPATH.

But I encountered errors with the default version and the hadoop2 version. 
Here are the details of the issues. 
#1. I installed the plugin with this command: 
bin/plugin --install elasticsearch/elasticsearch-repository-hdfs/2.0.0
and I sent the PUT request below: 
url: http://localhost:9200/_snapshot/hdfs_repo
data :{
  "type":"hdfs",
  "settings": 
  {
"uri":"hdfs://localhost.localdomain:8020",
"path":"/user/cloudera/es_snapshot"
  }
}

I got this response:

{
  "error": "RepositoryException[[hdfs_repo] failed to create repository]; 
nested: CreationException[Guice creation errors:

1) Error injecting constructor, 
org.elasticsearch.ElasticsearchGenerationException: Cannot create Hdfs 
file-system for uri [hdfs://localhost.localdomain:8020]
  at org.elasticsearch.repositories.hdfs.HdfsRepository.(Unknown Source)
  while locating org.elasticsearch.repositories.hdfs.HdfsRepository
  while locating org.elasticsearch.repositories.Repository

1 error]; nested: ElasticsearchGenerationException[Cannot create Hdfs 
file-system for uri [hdfs://localhost.localdomain:8020]]; nested: 
RemoteException[Server IPC version 7 cannot communicate with client version 4]; ",
  "status": 500
}


I noticed RemoteException: Server IPC version 7 cannot communicate with 
client version 4

#2. Then I tried the hadoop2 version, so I installed the plugin with this 
command:
bin/plugin --install 
elasticsearch/elasticsearch-repository-hdfs/2.0.0-hadoop2

I sent the same PUT request as above; this time I got an even stranger 
exception: 

NoClassDefFoundError[org/apache/commons/cli/ParseException]
Here is the response.

{
"error": "RepositoryException[[hdfs_repo] failed to create repository]; 
nested: CreationException[Guice creation errors:

1) Error injecting constructor, java.lang.NoClassDefFoundError: 
org/apache/commons/cli/ParseException
  at org.elasticsearch.repositories.hdfs.HdfsRepository.(Unknown Source)
  while locating org.elasticsearch.repositories.hdfs.HdfsRepository
  while locating org.elasticsearch.repositories.Repository

1 error]; nested: NoClassDefFoundError[org/apache/commons/cli/ParseException]; 
nested: ClassNotFoundException[org.apache.commons.cli.ParseException]; ",
"status": 500
}

I wonder if anyone has had similar experiences. Note that the failed cases 
are actually the more realistic deployment choices, because my Hadoop 
cluster will most likely not be on the same node as my ES server. 
Thanks,
Jack







-- 
You received this message because you are subscribed to the Google Groups 
"elasticsearch" group.
To unsubscribe from this group and stop receiving emails from it, send an email 
to elasticsearch+unsubscr...@googlegroups.com.
To view this discussion on the web visit 
https://groups.google.com/d/msgid/elasticsearch/acb15aff-299b-4e4b-bf20-b0ed5a891f60%40googlegroups.com.
For more options, visit https://groups.google.com/d/optout.


Re: Search thread pools not released

2014-07-07 Thread Ivan Brusic
Yeah, already traced it back myself. Been using Elasticsearch for years and
I have been only setting query timeouts. Need to re-architect a way to
incorporate client-based timeouts.

Had two different elasticsearch meltdowns this weekend, after a long period
of stability. Both of them different and unique!

-- 
Ivan


On Mon, Jul 7, 2014 at 1:50 PM, joergpra...@gmail.com  wrote:

> Yes, actionGet() can be traced down to AbstractQueueSynchronizer's
> acquireSharedInterruptibly(-1) call
>
>
> http://docs.oracle.com/javase/7/docs/api/java/util/concurrent/locks/AbstractQueuedSynchronizer.html#acquireSharedInterruptibly(int)
>
> in org.elasticsearch.common.util.concurrent.BaseFuture which "waits"
> forever until interrupted. But there are twin methods, like actionGet(long
> millis), that time out.
>
> Jörg
>
>
> On Mon, Jul 7, 2014 at 7:53 PM, Ivan Brusic  wrote:
>
>> Still analyzing all the logs and dumps that I have accumulated so far,
>> but it looks like the blocking socket appender might be the issue. After
>> that node exhausts all of its search threads, the TransportClient will
>> still issue requests to it, although other nodes do not have issues. After
>> a while, the client application will also be blocked waiting for
>> Elasticsearch to return.
>>
>> I removed logging for now, will re-implement it with a service that reads
>> directly from the duplicate file-based log. Although I have a timeout
>> specific for my query, my recollection of the search code is that it only
>> applies to the Lucene LimitedCollector (its been a while since I looked at
>> that code). The next step should be to add an explicit timeout
>> to actionGet(). Is the default basically no wait?
>>
>> It might be a challenge for the cluster engine to not delegate queries to
>> overloaded servers.
>>
>> Cheers,
>>
>> Ivan
>>
>>
>> On Sun, Jul 6, 2014 at 2:36 PM, joergpra...@gmail.com <
>> joergpra...@gmail.com> wrote:
>>
>>> Yes, socket appender blocks. Maybe the async appender of log4j can do
>>> better ...
>>>
>>> http://ricardozuasti.com/2009/asynchronous-logging-with-log4j/
>>>
>>> Jörg
>>>
>>>
>>> On Sun, Jul 6, 2014 at 11:22 PM, Ivan Brusic  wrote:
>>>
 Forgot to mention the thread dumps. I have taken them before, but not
 this time. Most of the block search thead pools are stuck in log4j.

 https://gist.github.com/brusic/fc12536d8e5706ec9c32

 I do have a socket appender to logstash (elasticsearch logs in
 elasticsearch!). Let me debug this connection.

 --
 Ivan


 On Sun, Jul 6, 2014 at 1:55 PM, joergpra...@gmail.com <
 joergpra...@gmail.com> wrote:

> Can be anything seen in a thread dump what looks like stray queries?
> Maybe some facet queries hanged while resources went low and never
> returned?
>
> Jörg
>
>
> On Sun, Jul 6, 2014 at 9:59 PM, Ivan Brusic  wrote:
>
>> Having an issue on one of my clusters running version 1.1.1 with 8
>> master/data nodes, unicast, connecting via the Java TransportClient. A 
>> few
>> REST queries are executed via monitoring services.
>>
>> Currently there is almost no traffic on this cluster. The few queries
>> that are currently running are either small test queries or large facet
>> queries (which are infrequent and the longest runs for 16 seconds). What 
>> I
>> am noticing is that the active search threads on some noded never 
>> decreases
>> and when it reaches the limit, the entire cluster will stop accepting
>> requests. The current max is the default (3 x 8).
>>
>> http://search06:9200/_cat/thread_pool
>>
>> search05 1.1.1.5 0 0 0 0 0 0 19 0 0
>> search07 1.1.1.7 0 0 0 0 0 0  0 0 0
>> search08 1.1.1.8 0 0 0 0 0 0  0 0 0
>> search09 1.1.1.9 0 0 0 0 0 0  0 0 0
>> search11 1.1.1.11 0 0 0 0 0 0  0 0 0
>> search06 1.1.1.6 0 0 0 0 0 0  2 0 0
>> search10 1.1.1.10 0 0 0 0 0 0  0 0 0
>> search12 1.1.1.12 0 0 0 0 0 0  0 0 0
>>
>> In this case, both search05 and search06 have an active thread count
>> that does not change. If I run a query against search05, the search will
>> respond quickly and the total number of active search threads does not
>> increase.
>>
>> So I have two related issues:
>> 1) the active thread count does not decrease
>> 2) the cluster will not accept requests if one node becomes unstable.
>>
>> I have seen the issue intermittently in the past, but the issue has
>> started again and cluster restarts does not fix the problem. At the log
>> level, there have been issues with the cluster state not propagating. Not
>> every node will acknowledge the cluster state ([discovery.zen.publish
>> ]
>> received cluster state version NNN) and the master would log a timeout
>> (awaiting all nodes to process published state NNN timed out, timeout 
>> 30s).
>> The nodes are fine and I can ping each other with no issues

Re: Search thread pools not released

2014-07-07 Thread joergpra...@gmail.com
Yes, actionGet() can be traced down to AbstractQueueSynchronizer's
acquireSharedInterruptibly(-1) call

http://docs.oracle.com/javase/7/docs/api/java/util/concurrent/locks/AbstractQueuedSynchronizer.html#acquireSharedInterruptibly(int)

in org.elasticsearch.common.util.concurrent.BaseFuture which "waits"
forever until interrupted. But there are twin methods, like actionGet(long
millis), that time out.
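
For example, instead of a bare client.search(request).actionGet(), something
like this (the 10-second value is arbitrary):

SearchResponse response = client.search(request).actionGet(10, TimeUnit.SECONDS);
// or, equivalently
SearchResponse response = client.search(request).actionGet(TimeValue.timeValueSeconds(10));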

Jörg


On Mon, Jul 7, 2014 at 7:53 PM, Ivan Brusic  wrote:

> Still analyzing all the logs and dumps that I have accumulated so far, but
> it looks like the blocking socket appender might be the issue. After that
> node exhausts all of its search threads, the TransportClient will still
> issue requests to it, although other nodes do not have issues. After a
> while, the client application will also be blocked waiting for
> Elasticsearch to return.
>
> I removed logging for now, will re-implement it with a service that reads
> directly from the duplicate file-based log. Although I have a timeout
> specific for my query, my recollection of the search code is that it only
> applies to the Lucene LimitedCollector (its been a while since I looked at
> that code). The next step should be to add an explicit timeout
> to actionGet(). Is the default basically no wait?
>
> It might be a challenge for the cluster engine to not delegate queries to
> overloaded servers.
>
> Cheers,
>
> Ivan
>
>
> On Sun, Jul 6, 2014 at 2:36 PM, joergpra...@gmail.com <
> joergpra...@gmail.com> wrote:
>
>> Yes, socket appender blocks. Maybe the async appender of log4j can do
>> better ...
>>
>> http://ricardozuasti.com/2009/asynchronous-logging-with-log4j/
>>
>> Jörg
>>
>>
>> On Sun, Jul 6, 2014 at 11:22 PM, Ivan Brusic  wrote:
>>
>>> Forgot to mention the thread dumps. I have taken them before, but not
>>> this time. Most of the block search thead pools are stuck in log4j.
>>>
>>> https://gist.github.com/brusic/fc12536d8e5706ec9c32
>>>
>>> I do have a socket appender to logstash (elasticsearch logs in
>>> elasticsearch!). Let me debug this connection.
>>>
>>> --
>>> Ivan
>>>
>>>
>>> On Sun, Jul 6, 2014 at 1:55 PM, joergpra...@gmail.com <
>>> joergpra...@gmail.com> wrote:
>>>
 Can be anything seen in a thread dump what looks like stray queries?
 Maybe some facet queries hanged while resources went low and never
 returned?

 Jörg


 On Sun, Jul 6, 2014 at 9:59 PM, Ivan Brusic  wrote:

> Having an issue on one of my clusters running version 1.1.1 with 8
> master/data nodes, unicast, connecting via the Java TransportClient. A few
> REST queries are executed via monitoring services.
>
> Currently there is almost no traffic on this cluster. The few queries
> that are currently running are either small test queries or large facet
> queries (which are infrequent and the longest runs for 16 seconds). What I
> am noticing is that the active search threads on some noded never 
> decreases
> and when it reaches the limit, the entire cluster will stop accepting
> requests. The current max is the default (3 x 8).
>
> http://search06:9200/_cat/thread_pool
>
> search05 1.1.1.5 0 0 0 0 0 0 19 0 0
> search07 1.1.1.7 0 0 0 0 0 0  0 0 0
> search08 1.1.1.8 0 0 0 0 0 0  0 0 0
> search09 1.1.1.9 0 0 0 0 0 0  0 0 0
> search11 1.1.1.11 0 0 0 0 0 0  0 0 0
> search06 1.1.1.6 0 0 0 0 0 0  2 0 0
> search10 1.1.1.10 0 0 0 0 0 0  0 0 0
> search12 1.1.1.12 0 0 0 0 0 0  0 0 0
>
> In this case, both search05 and search06 have an active thread count
> that does not change. If I run a query against search05, the search will
> respond quickly and the total number of active search threads does not
> increase.
>
> So I have two related issues:
> 1) the active thread count does not decrease
> 2) the cluster will not accept requests if one node becomes unstable.
>
> I have seen the issue intermittently in the past, but the issue has
> started again and cluster restarts does not fix the problem. At the log
> level, there have been issues with the cluster state not propagating. Not
> every node will acknowledge the cluster state ([discovery.zen.publish]
> received cluster state version NNN) and the master would log a timeout
> (awaiting all nodes to process published state NNN timed out, timeout 
> 30s).
> The nodes are fine and they can ping each other with no issues. Currently not
> seeing any log errors with the thread pool issue, so perhaps it is a red
> herring.
>
> Cheers,
>
> Ivan
>
>
> --
> You received this message because you are subscribed to the Google
> Groups "elasticsearch" group.
> To unsubscribe from this group and stop receiving emails from it, send
> an email to elasticsearch+unsubscr...@googlegroups.com.
> To view this discussion on the web visit
> https://groups.google.com/d/msgid/elasticsearch/CALY%3DcQCx91

Saved Kibana Dashboards & Tribe

2014-07-07 Thread crb89
I have a Tribe node and a Kibana instance for the Tribe node. When I try to 
save a dashboard on the Kibana instance for the Tribe node, I get the 
following errors:

PUT http://{tribe}:9200/kibana-int_{tribe}/dashboard/Logs%20Search
503 (Service Unavailable)

   1. error: "MasterNotDiscoveredException[waited for [1m]]"
   2. status: 503


I would like to be able to save dashboards from the Kibana instance for the 
Tribe node. Is this possible?




-- 
You received this message because you are subscribed to the Google Groups 
"elasticsearch" group.
To unsubscribe from this group and stop receiving emails from it, send an email 
to elasticsearch+unsubscr...@googlegroups.com.
To view this discussion on the web visit 
https://groups.google.com/d/msgid/elasticsearch/ba09008a-fed9-4553-be50-ec9f8f7bd0af%40googlegroups.com.
For more options, visit https://groups.google.com/d/optout.


Re: Custom Plugin for specifying custom filter attributes at query time

2014-07-07 Thread joergpra...@gmail.com
In Elasticsearch, you can extend the existing queries and filters with a
plugin, with the help of addQuery/addFilter on IndexQueryParserModule.

Each query or filter comes in a pair of classes, a builder and a parser.

A filter builder manages the syntax and the content serialization, with the
help of XContent classes, for the inner/outer representation of a filter
specification.

A filter parser parses such a structure and turns it into a Lucene Filter
for internal processing.

So one approach would be to look at how your bit set implementation can be
turned into a Lucene Filter. An instructive example of where to start
is org.elasticsearch.index.query.TermsFilterParser/TermsFilterBuilder.

An example where terms from fielddata cache are read and turned into a
filter is org.elasticsearch.index.search.FielddataTermsFilter

A key line is the method

public DocIdSet getDocIdSet(AtomicReaderContext context, Bits acceptDocs)
throws IOException

An example for caching filters
is org.elasticsearch.indices.cache.filter.terms.IndicesTermsFilterCache
(the caching of filters in ES is done with Guava's cache classes)

Also, it could be helpful to study helper classes in this context like in
package org.elasticsearch.common.lucene.docset

I am not aware of a filter plugin yet but it is possible that I could
sketch a demo filter plugin source code on github.
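
In the meantime, here is a minimal, hypothetical sketch of such a filter,
assuming the bit set is already loaded in memory. Note that it keys off raw
Lucene doc ids, which are not stable across merges; a real implementation
would map the external ids to stable terms or field data instead:

import java.io.IOException;
import org.apache.lucene.index.AtomicReaderContext;
import org.apache.lucene.search.DocIdSet;
import org.apache.lucene.search.Filter;
import org.apache.lucene.util.Bits;
import org.apache.lucene.util.FixedBitSet;

public class PrecomputedBitSetFilter extends Filter {

    private final FixedBitSet allowedDocs; // loaded/warmed at startup

    public PrecomputedBitSetFilter(FixedBitSet allowedDocs) {
        this.allowedDocs = allowedDocs;
    }

    @Override
    public DocIdSet getDocIdSet(AtomicReaderContext context, Bits acceptDocs)
            throws IOException {
        // Project the index-wide bit set onto this segment.
        int base = context.docBase;
        int maxDoc = context.reader().maxDoc();
        FixedBitSet segmentBits = new FixedBitSet(maxDoc);
        for (int i = 0; i < maxDoc; i++) {
            int globalDoc = base + i;
            if (globalDoc < allowedDocs.length() && allowedDocs.get(globalDoc)
                    && (acceptDocs == null || acceptDocs.get(i))) {
                segmentBits.set(i);
            }
        }
        return segmentBits; // FixedBitSet is a DocIdSet in Lucene 4.x
    }
}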

Jörg




On Mon, Jul 7, 2014 at 3:49 PM, Sandeep Ramesh Khanzode <
k.sandee...@gmail.com> wrote:

> Hi,
>
> A little clarification:
>
> Assume sample data set of 50M documents. The documents need to be filtered
> by a field, Field1. However, at indexing time, this field is NOT written to
> the document in Lucene through ES. Field1 is a frequently changing field
> and hence, we would like to maintain it outside.
>
> (This following paragraph can be skipped.)
> Now assume that there are a few such fields, Field1, ..., FieldN. For
> every document in the corpus, the value for Field1 may be from a pool of
> 100-odd values. Thus, for example, at max, FIeld1 can hold 1M documents
> that correspond to one of the 100-dd values, and at the fag-end, can
> probably correspond to 10 documents as well.
>
>
> (Continue reading) :-)
> I would, at system startup time, make sure that I have loaded all relevant
> BitSets that I plan to use for any Filters in memory, so that my cache
> framework is warm and I can lookup the relevant filter values for a
> particular query from this cache at query run time. The mechanisms for this
> loading are still unknown, but please assume that this BitSet will be
> available readily during query time.
>
> This BitSet will correspond to the DocIDs in Lucene for a particular value
> of Field1 that I want to filter. I plan to create a Filter class overridden
> in Lucene that will accept this DocIdSet.
>
> What I am unable to understand is how I can achieve this in ES. Now, I
> have been exploring the different mail threads on this forum, and it seems
> that certain plugins can achieve this. Please see the list below that I
> could find on this forum.
>
> Can you please tell me how an IndexQueryParserModule will serve my use
> case? If you can provide some pointers on writing a plugin that can
> leverage a CustomFilter, that will be immensely helpful. Thanks,
>
> 1.
> https://groups.google.com/forum/?fromgroups=#!searchin/elasticsearch/IndexQueryParserModule$20Plugin/elasticsearch/5Gqxx3UvN2s/FL4Lb2RxQt0J
> 2. https://groups.google.com/forum/#!topic/elasticsearch/1jiHl4kngJo
> 3. https://github.com/elasticsearch/elasticsearch/issues/208
> 4.
> http://elasticsearch-users.115913.n3.nabble.com/custom-filter-handler-plugin-td4051973.html
>
> Thanks,
> Sandeep
>
> On Mon, Jul 7, 2014 at 2:17 AM, joergpra...@gmail.com <
> joergpra...@gmail.com> wrote:
>
>> Thanks for being so patient with me :)
>>
>> I understand now the following: there are 50m documents in an external
>> DB, from which up to 1m are to be exported in the form of document identifiers
>> to work as a filter in ES. The idea is to use internal mechanisms like bit
>> sets. There is no API for manipulating filters in ES on that level, ES
>> receives the terms and passes them into Lucene TermFilter class according
>> to the type of the filter.
>>
>> What is a bit unclear to me: how is the filter set constructed? I assume
>> it should be a select statement on the database?
>>
>> Next, if you have this large set of document identifiers selected, I do
>> not understand what the base query is that you want to apply the filter on. Is
>> there a user-given query for ES? What does such a query look like? Is it
>> assumed there are other documents in ES that are related somehow to the 50m
>> documents? An illustrative example of the steps in the scenario would
>> really help to understand the data model.
>>
>> Just some food for thought: it is close to impossible to filter in ES on
>> 1m unique terms with a single step - the default setting of maximum clauses
>> in a Lucene Query is for good reason limited to 1024 terms

Best practice to backup index daily?

2014-07-07 Thread sabdalla80
I am able to take a snapshot of the index and back it up to AWS S3. What is 
the best way to automate this approach and have it done daily, say every 
day at 12 midnight? 
I am aware that I can probably do it with crontab, but curious if others are 
doing it differently?
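
For reference, a minimal crontab sketch of what I have working so far
(repository and snapshot names are made up; the S3 repository is assumed to
be registered already):

# Daily snapshot at midnight; cron requires % to be escaped as \%
0 0 * * * curl -s -XPUT "http://localhost:9200/_snapshot/s3_backup/snapshot_$(date +\%Y\%m\%d)?wait_for_completion=false"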

-- 
You received this message because you are subscribed to the Google Groups 
"elasticsearch" group.
To unsubscribe from this group and stop receiving emails from it, send an email 
to elasticsearch+unsubscr...@googlegroups.com.
To view this discussion on the web visit 
https://groups.google.com/d/msgid/elasticsearch/8620f7d9-b827-470d-8928-75c308e722cc%40googlegroups.com.
For more options, visit https://groups.google.com/d/optout.


Elasticsearch twitter river filtered stream question

2014-07-07 Thread Josh Harrison
Quick question about the ES twitter river at 
https://github.com/elasticsearch/elasticsearch-river-twitter
The Twitter streaming API allows you to filter, and you apparently get up 
to 1% of the total stream matching your search queries. So, if I were filtering 
for "coffee", I'd get "coffee" tweets that I wouldn't get if I was just 
capturing the 1% stream passively.
Does the Twitter river use this filter functionality, or does it do its 
filtering on the ingestion side, ingesting the normal 1% stream and 
discarding anything that doesn't match?
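
For reference, the filter block from the river README is what I am asking
about; a sketch of the _meta document I have in mind (OAuth values elided,
river and index names made up):

curl -XPUT 'localhost:9200/_river/my_twitter_river/_meta' -d '{
    "type" : "twitter",
    "twitter" : {
        "oauth" : { ... },
        "filter" : {
            "tracks" : "coffee"
        }
    },
    "index" : {
        "index" : "my_twitter_river"
    }
}'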

-- 
You received this message because you are subscribed to the Google Groups 
"elasticsearch" group.
To unsubscribe from this group and stop receiving emails from it, send an email 
to elasticsearch+unsubscr...@googlegroups.com.
To view this discussion on the web visit 
https://groups.google.com/d/msgid/elasticsearch/b7adb5f1-49a1-4424-8f4e-1c75e15c4cb0%40googlegroups.com.
For more options, visit https://groups.google.com/d/optout.


Setting id of document with elasticsearch-hadoop that is not in source document

2014-07-07 Thread Brian Thomas
I am trying to update an elasticsearch index using elasticsearch-hadoop.  I 
am aware of the *es.mapping.id* configuration where you can specify that 
field in the document to use as an id, but in my case the source document 
does not have the id (I used elasticsearch's autogenerated id when indexing 
the document).  Is it possible to specify the id to update without having 
to add a new field to the MapWritable object?


-- 
You received this message because you are subscribed to the Google Groups 
"elasticsearch" group.
To unsubscribe from this group and stop receiving emails from it, send an email 
to elasticsearch+unsubscr...@googlegroups.com.
To view this discussion on the web visit 
https://groups.google.com/d/msgid/elasticsearch/ce6161ad-d442-4ffb-9162-114cb8cd76dd%40googlegroups.com.
For more options, visit https://groups.google.com/d/optout.


Re: excessive merging/small segment sizes

2014-07-07 Thread Kireet Reddy
All that seems correct (except I think this is for node 6, not node 5). We 
don't delete documents, but we do some updates. The vast majority of 
documents get indexed into the large shards, but the smaller ones take some 
writes as well.

We aren't using virtualized hardware and elasticsearch is the only thing 
running on the machines, no scheduled jobs, etc. I don't think something is 
interfering; actually, overall disk i/o rate and operations on the machine 
go down quite a bit during the problematic period, which is consistent with 
your observations about things taking longer.

I went back and checked all our collected metrics again. I noticed that 
even though the heap usage and gc count seem smooth during the period in 
question, gc time spent goes way up. Also active indexing threads go up, 
but since our ingest rate didn't go up I assumed this was a side effect. 
During a previous occurrence a few days ago on node5, I stopped all 
indexing activity for 15 minutes. Active merges and indexing requests went 
to zero as expected. Then I re-enabled indexing and immediately the 
increased cpu/gc/active merges went back up to the problematic rates.

Overall this is pretty confusing to me as to what is a symptom vs a root 
cause here. A summary of what I think I know:

   1. Every few days, cpu usage on a node goes way above the other nodes 
   and doesn't recover. We've let the node run in the elevated cpu state for a 
   day with no improvement.
   2. It doesn't seem likely that it's data related. We use replicas=1 and 
   no other nodes have issues.
   3. It doesn't seem hardware related. We run on a dedicated h/w with 
   elasticsearch being the only thing running. Also the problem appears on 
   various nodes and machine load seems tied directly to the elasticsearch 
   process.
   4. During the problematic period: cpu usage, active merge threads, 
   active bulk (indexing) threads, and gc time are elevated.
   5. During the problematic period: i/o ops and i/o throughput decrease.
   6. overall heap usage size seems to smoothly increase, the extra gc time 
   seems to be spent on the new gen. Interestingly, the gc count didn't seem 
   to increase.
   7. In the hours beforehand, gc behavior of the problematic node was 
   similar to the other nodes.
   8. If I pause indexing, machine load quickly returns to normal, merges 
   and indexing requests complete.  If I then restart indexing, the problem 
   reoccurs immediately.
   9. If I disable automatic refreshes, the problem disappears within an 
   hour or so.
   10. hot threads show merging activity as the hot threads.

The first few points make me think the increased active merges is perhaps a 
side effect, but then the last 3 make me think merging is the root cause. 
The only additional things I can think of that may be relevant are:

   1. Our documents can vary greatly in size, they average a couple KB but 
   can rarely be several MB. 
   2. we do use language analysis plugins, perhaps one of these is acting 
   up? 
   3. We eagerly load one field into the field data cache. But the cache 
   size is ok and the overall heap behavior is ok so I don't think this is the 
   problem.

That's a lot of information, but I am not sure where to go next from here...
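
(For reference, when I say I disabled automatic refreshes in point 9, I mean
the live settings update below; index name is made up.)

curl -XPUT 'localhost:9200/myindex/_settings' -d '{
    "index" : { "refresh_interval" : "-1" }
}'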

On Monday, July 7, 2014 8:23:20 AM UTC-7, Michael McCandless wrote:
>
> Indeed there are no big merges during that time ...
>
> I can see on node5, ~14:45 suddenly merges are taking a long time, refresh 
> is taking much longer (4-5 seconds instead of < .4 sec), commit time goes 
> up from < 0.5 sec to ~1-2 sec, etc., but other metrics are fine e.g. total 
> merging GB, number of commits/refreshes is very low during this time.
>
> Each node has 2 biggish (~17 GB) shards and then ~50 tiny shards.  The 
> biggish shards are indexing at a very slow rate and only have ~1% 
> deletions.  Are you explicitly deleting docs?
>
> I suspect something is suddenly cutting into the IO perf of this box, and 
> because merging/refreshing is so IO intensive, it causes these operations 
> to run slower / backlog.
>
> Are there any scheduled jobs, e.g. backups/snapshots, that start up?  Are 
> you running on virtualized hardware?
>
>
> Mike McCandless
>
> http://blog.mikemccandless.com
>  
>
> On Sun, Jul 6, 2014 at 8:23 PM, Kireet Reddy  > wrote:
>
>> Just to reiterate, the problematic period is from 07/05 14:45 to 07/06 
>> 02:10. I included a couple hours before and after in the logs.
>>
>>
>> On Sunday, July 6, 2014 5:17:06 PM UTC-7, Kireet Reddy wrote:
>>>
>>> They are linked below (node5 is the log of the normal node, node6 is the 
>>> log of the problematic node). 
>>>
>>> I don't think it was doing big merges, otherwise during the high load 
>>> period, the merges graph line would have had a "floor" > 0, similar to the 
>>> time period after I disabled refresh. We don't do routing and use mostly 
>>> default settings. I think the only settings we changed are:
>>>
>>> indices.memory.index_buffer_size: 50%

Re: Opening TransportClient connection per Index

2014-07-07 Thread joergpra...@gmail.com
You can enlarge thread pools in TransportClient, and also the Netty worker
threads. User session state should be managed in the front-end service (a
reverse proxy or Java middleware, for example), so it is still ok to use a
singleton TransportClient since it is stateless. It handles requests and the
corresponding responses.

If 100 users are active at the same time, resources might get low, so you
should think about ramping up a line of front-end services on more than one
machine to balance the front-end load.
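
A minimal sketch of such a singleton (cluster name and address are made up;
ES 1.x API):

import org.elasticsearch.client.transport.TransportClient;
import org.elasticsearch.common.settings.ImmutableSettings;
import org.elasticsearch.common.settings.Settings;
import org.elasticsearch.common.transport.InetSocketTransportAddress;

public final class EsClientHolder {
    private static final TransportClient CLIENT;
    static {
        Settings settings = ImmutableSettings.settingsBuilder()
                .put("cluster.name", "my-cluster") // assumption
                .build();
        CLIENT = new TransportClient(settings)
                .addTransportAddress(new InetSocketTransportAddress("es-host", 9300));
    }
    private EsClientHolder() {}

    // One stateless client shared by all user sessions and all indices.
    public static TransportClient client() {
        return CLIENT;
    }
}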

Jörg


On Mon, Jul 7, 2014 at 5:45 PM, AsyncAwait  wrote:

> I have a use case in which I want to create an index per account (assume
> an account represents a user); all data belonging to that user will be kept
> in that index. My question is - what if we create a connection per index and
> keep it alive during the user session? This means for 100 active users,
> there will be 100 connections from the TransportClient to the ES cluster. I
> don't see a reason why 100 instances of TransportClient have to be
> initialized and kept in memory. Not sure if a singleton would unnecessarily
> bring thread locks.
>
> Any help is greatly appreciated.
>
> Thanks
>
> --
> You received this message because you are subscribed to the Google Groups
> "elasticsearch" group.
> To unsubscribe from this group and stop receiving emails from it, send an
> email to elasticsearch+unsubscr...@googlegroups.com.
> To view this discussion on the web visit
> https://groups.google.com/d/msgid/elasticsearch/ce54bd4b-1c11-429a-8998-76482f051375%40googlegroups.com
> 
> .
> For more options, visit https://groups.google.com/d/optout.
>

-- 
You received this message because you are subscribed to the Google Groups 
"elasticsearch" group.
To unsubscribe from this group and stop receiving emails from it, send an email 
to elasticsearch+unsubscr...@googlegroups.com.
To view this discussion on the web visit 
https://groups.google.com/d/msgid/elasticsearch/CAKdsXoEjt3qCUWD_5rxv84XL8BYNMzy2dKsfsfr8j-017cNbow%40mail.gmail.com.
For more options, visit https://groups.google.com/d/optout.


Re: How can I do intersection or union operation with two facets filter?

2014-07-07 Thread Harish Ved
Did you try the following query?
"query":{
"filtered" : {
"filter" : {
"bool" : {
"should" : [
{"term" : { "field_B" : "" }},
{"term": {"field_B": "bbb"}}
]
}
}
}
  },
"facets" : {
"tag" : {
"terms" : {
"field" : "field_A"
}
}
}
}


Please confirm whether it works for you.

On Monday, 7 July 2014 15:56:53 UTC+5:30, 闫旭 wrote:
>
>  Dear All!
> I have some docs:
> {"field_A":"aaa","field_B":"bbb"}
> {"field_A":"aaa","field_B":"ccc"}
> {"field_A":"bbb","field_B":"bbb"}
> {"field_A":"bbb","field_B":"bbb"}
> {"field_A":"bbb","field_B":"eee"}
> {"field_A":"aaa","field_B":""}
> {"field_A":"ccc","field_B":""}
> first step:
> {
>   "query":{
> "filtered" : {
> "filter" : {
> "bool" : {
> "must" : {
> "term" : { "field_B" : "bbb" }
> }
> }
> }
> }
>   },
> "facets" : {
> "tag" : {
> "terms" : {
> "field" : "field_A"
> }
> }
> }
> }
> first result:
> {
> ...
> {"term":"aaa","count":1},
> {"term":"bbb","count":2},
> ...
> }
> -
> second step:
> the second facets:
> {
>   "query":{
> "filtered" : {
> "filter" : {
> "bool" : {
> "must" : {
> "term" : { "field_B" : "" }
> }
> }
> }
> }
>   },
> "facets" : {
> "tag" : {
> "terms" : {
> "field" : "field_A"
> }
> }
> }
> }
> second result:
> {
> ...
> {"term":"aaa","count":1}.
> {"term":"ccc","count":1}
> ...
> }
>
> -
> third step:
> combine the two result with interesction operation with "term":
> {"term":"aaa","count":"I don't care the count value."}
>
>
> -
> Now, how can I combine the three steps in one filter facets call or other 
> method?
>
>
> Thx All!!
>
> 

-- 
You received this message because you are subscribed to the Google Groups 
"elasticsearch" group.
To unsubscribe from this group and stop receiving emails from it, send an email 
to elasticsearch+unsubscr...@googlegroups.com.
To view this discussion on the web visit 
https://groups.google.com/d/msgid/elasticsearch/93d5adb6-bcd1-49b6-8c28-be4d456f92d8%40googlegroups.com.
For more options, visit https://groups.google.com/d/optout.


Re: Search thread pools not released

2014-07-07 Thread Ivan Brusic
Still analyzing all the logs and dumps that I have accumulated so far, but
it looks like the blocking socket appender might be the issue. After that
node exhausts all of its search threads, the TransportClient will still
issue requests to it, although other nodes do not have issues. After a
while, the client application will also be blocked waiting for
Elasticsearch to return.

I removed logging for now, will re-implement it with a service that reads
directly from the duplicate file-based log. Although I have a timeout
specific for my query, my recollection of the search code is that it only
applies to the Lucene LimitedCollector (it's been a while since I looked at
that code). The next step should be to add an explicit timeout
to actionGet(). Is the default basically no wait?

It might be a challenge for the cluster engine to not delegate queries to
overloaded servers.

Cheers,

Ivan


On Sun, Jul 6, 2014 at 2:36 PM, joergpra...@gmail.com  wrote:

> Yes, socket appender blocks. Maybe the async appender of log4j can do
> better ...
>
> http://ricardozuasti.com/2009/asynchronous-logging-with-log4j/
>
> Jörg
>
>
> On Sun, Jul 6, 2014 at 11:22 PM, Ivan Brusic  wrote:
>
>> Forgot to mention the thread dumps. I have taken them before, but not
>> this time. Most of the blocked search thread pools are stuck in log4j.
>>
>> https://gist.github.com/brusic/fc12536d8e5706ec9c32
>>
>> I do have a socket appender to logstash (elasticsearch logs in
>> elasticsearch!). Let me debug this connection.
>>
>> --
>> Ivan
>>
>>
>> On Sun, Jul 6, 2014 at 1:55 PM, joergpra...@gmail.com <
>> joergpra...@gmail.com> wrote:
>>
>>> Can anything be seen in a thread dump that looks like stray queries?
>>> Maybe some facet queries hung while resources went low and never
>>> returned?
>>>
>>> Jörg
>>>
>>>
>>> On Sun, Jul 6, 2014 at 9:59 PM, Ivan Brusic  wrote:
>>>
 Having an issue on one of my clusters running version 1.1.1 with 8
 master/data nodes, unicast, connecting via the Java TransportClient. A few
 REST queries are executed via monitoring services.

 Currently there is almost no traffic on this cluster. The few queries
 that are currently running are either small test queries or large facet
 queries (which are infrequent and the longest runs for 16 seconds). What I
 am noticing is that the active search threads on some nodes never decreases
 and when it reaches the limit, the entire cluster will stop accepting
 requests. The current max is the default (3 x 8).

 http://search06:9200/_cat/thread_pool

 search05 1.1.1.5 0 0 0 0 0 0 19 0 0
 search07 1.1.1.7 0 0 0 0 0 0  0 0 0
 search08 1.1.1.8 0 0 0 0 0 0  0 0 0
 search09 1.1.1.9 0 0 0 0 0 0  0 0 0
 search11 1.1.1.11 0 0 0 0 0 0  0 0 0
 search06 1.1.1.6 0 0 0 0 0 0  2 0 0
 search10 1.1.1.10 0 0 0 0 0 0  0 0 0
 search12 1.1.1.12 0 0 0 0 0 0  0 0 0

 In this case, both search05 and search06 have an active thread count
 that does not change. If I run a query against search05, the search will
 respond quickly and the total number of active search threads does not
 increase.

 So I have two related issues:
 1) the active thread count does not decrease
 2) the cluster will not accept requests if one node becomes unstable.

 I have seen the issue intermittently in the past, but the issue has
 started again and cluster restarts does not fix the problem. At the log
 level, there have been issues with the cluster state not propagating. Not
 every node will acknowledge the cluster state ([discovery.zen.publish]
 received cluster state version NNN) and the master would log a timeout
 (awaiting all nodes to process published state NNN timed out, timeout 30s).
 The nodes are fine and they can ping each other with no issues. Currently not
 seeing any log errors with the thread pool issue, so perhaps it is a red
 herring.

 Cheers,

 Ivan


 --
 You received this message because you are subscribed to the Google
 Groups "elasticsearch" group.
 To unsubscribe from this group and stop receiving emails from it, send
 an email to elasticsearch+unsubscr...@googlegroups.com.
 To view this discussion on the web visit
 https://groups.google.com/d/msgid/elasticsearch/CALY%3DcQCx91LEXP0NxbgC4-mVR27DX%2BuOxyor5cqiM6ie2JExBw%40mail.gmail.com
 
 .
 For more options, visit https://groups.google.com/d/optout.

>>>
>>>  --
>>> You received this message because you are subscribed to the Google
>>> Groups "elasticsearch" group.
>>> To unsubscribe from this group and stop receiving emails from it, send
>>> an email to elasticsearch+unsubscr...@googlegroups.com.
>>> To view this discussion on the web visit
>>> https://groups.google.com/d/msgid

Re: java.lang.NoSuchFieldError: ALLOW_UNQUOTED_FIELD_NAMES when trying to query elasticsearch using spark

2014-07-07 Thread Costin Leau
Thanks for the analysis. It looks like Hadoop 1.0.4 has an invalid POM:
though it ships Jackson 1.8.8 (see the distro), the POM declares version
1.0.1 for some reason. Hadoop version 1.2 (the latest stable) and higher
has this fixed.

We don't mark the Jackson version within our POM since it's already
available at runtime - we can probably do so going forward in the Spark
integration.
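
As a side note, if upgrading Hadoop is not an option, Gradle can also force
a newer Jackson onto the classpath (the version below is just an example):

configurations.all {
    resolutionStrategy {
        force 'org.codehaus.jackson:jackson-core-asl:1.8.8'
        force 'org.codehaus.jackson:jackson-mapper-asl:1.8.8'
    }
}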


On Mon, Jul 7, 2014 at 6:39 PM, Brian Thomas 
wrote:

> Here is the gradle build I was using originally:
>
> apply plugin: 'java'
> apply plugin: 'eclipse'
>
> sourceCompatibility = 1.7
> version = '0.0.1'
> group = 'com.spark.testing'
>
> repositories {
>     mavenCentral()
> }
>
> dependencies {
>     compile 'org.apache.spark:spark-core_2.10:1.0.0'
>     compile 'edu.stanford.nlp:stanford-corenlp:3.3.1'
>     compile group: 'edu.stanford.nlp', name: 'stanford-corenlp', version: '3.3.1', classifier: 'models'
>     compile files('lib/elasticsearch-hadoop-2.0.0.jar')
>     testCompile 'junit:junit:4.+'
>     testCompile group: "com.github.tlrx", name: "elasticsearch-test", version: "1.2.1"
> }
>
>
> When I ran dependencyInsight on jackson, I got the following output:
>
> C:\dev\workspace\SparkProject>gradle dependencyInsight --dependency
> jackson-core
>
> :dependencyInsight
> com.fasterxml.jackson.core:jackson-core:2.3.0
> \--- com.fasterxml.jackson.core:jackson-databind:2.3.0
>      +--- org.json4s:json4s-jackson_2.10:3.2.6
>      |    \--- org.apache.spark:spark-core_2.10:1.0.0
>      |         \--- compile
>      \--- com.codahale.metrics:metrics-json:3.0.0
>           \--- org.apache.spark:spark-core_2.10:1.0.0 (*)
>
> org.codehaus.jackson:jackson-core-asl:1.0.1
> \--- org.codehaus.jackson:jackson-mapper-asl:1.0.1
>      \--- org.apache.hadoop:hadoop-core:1.0.4
>           \--- org.apache.hadoop:hadoop-client:1.0.4
>                \--- org.apache.spark:spark-core_2.10:1.0.0
>                     \--- compile
>
> Version 1.0.1 of jackson-core-asl does not have the field
> ALLOW_UNQUOTED_FIELD_NAMES, but later versions of it do.
>
> On Sunday, July 6, 2014 4:28:56 PM UTC-4, Costin Leau wrote:
>
>> Hi,
>>
>> Glad to see you sorted out the problem. Out of curiosity what version of
>> jackson were you using and what was pulling it in? Can you share you maven
>> pom/gradle build?
>>
>>
>> On Sun, Jul 6, 2014 at 10:27 PM, Brian Thomas 
>> wrote:
>>
>>> I figured it out, dependency issue in my classpath.  Maven was pulling
>>> down a very old version of the jackson jar.  I added the following line to
>>> my dependencies and the error went away:
>>>
>>> compile 'org.codehaus.jackson:jackson-mapper-asl:1.9.13'
>>>
>>>
>>> On Friday, July 4, 2014 3:22:30 PM UTC-4, Brian Thomas wrote:

  I am trying to test querying elasticsearch using Apache Spark using
 elasticsearch-hadoop.  I am just trying to do a query to the elasticsearch
 server and return the count of results.

 Below is my test class using the Java API:

 import org.apache.hadoop.conf.Configuration;
 import org.apache.hadoop.io.MapWritable;
 import org.apache.hadoop.io.Text;
 import org.apache.spark.SparkConf;
 import org.apache.spark.api.java.JavaPairRDD;
 import org.apache.spark.api.java.JavaSparkContext;
 import org.apache.spark.serializer.KryoSerializer;
 import org.elasticsearch.hadoop.mr.EsInputFormat;

 import scala.Tuple2;

 public class ElasticsearchSparkQuery{

 public static int query(String masterUrl, String
 elasticsearchHostPort) {
 SparkConf sparkConfig = new SparkConf().setAppName("ESQuery").setMaster(masterUrl);
 sparkConfig.set("spark.serializer",
 KryoSerializer.class.getName());
 JavaSparkContext sparkContext = new
 JavaSparkContext(sparkConfig);

 Configuration conf = new Configuration();
 conf.setBoolean("mapred.map.tasks.speculative.execution",
 false);
 conf.setBoolean("mapred.reduce.tasks.speculative.execution",
 false);
 conf.set("es.nodes", elasticsearchHostPort);
 conf.set("es.resource", "media/docs");
 conf.set("es.query", "?q=*");

 JavaPairRDD<Text, MapWritable> esRDD = sparkContext.newAPIHadoopRDD(conf,
 EsInputFormat.class, Text.class, MapWritable.class);
 return (int) esRDD.count();
 }
 }


 When I try to run this I get the following error:


 14/07/04 14:58:07 INFO executor.Executor: Running task ID 0
 14/07/04 14:58:07 INFO storage.BlockManager: Found block broadcast_0
 locally
 14/07/04 14:58:07 INFO rdd.NewHadoopRDD: Input split: ShardInputSplit
 [node=[5UATWUzmTUuNzhmGxXWy_w/S'byll|10.45.71.152:9200],shard=0]
 14/07/04 14:58:07 WARN mr.EsInputFormat: Cannot determine task id...
 14/07/04 14:58:07 ERROR executor.Executor: Exception in task ID 0
 java.lang.NoSuchFieldError: ALLOW_UNQUOTED_FIELD_NAMES
 at org.elasticsearch.hadoop.

Re: Memory issues on ES client node

2014-07-07 Thread joergpra...@gmail.com
I think this is not a concurrency problem; rather, the cluster wanted to
deliver a huge portion of data (just guessing about such query responses
because I do not know anything about the queries on your system).

The client timeout for receiving data is around 30 secs IIRC. It only means
that the client may have given up before the transfer of the data was done.

Jörg


On Mon, Jul 7, 2014 at 4:49 AM, Venkat Morampudi 
wrote:

> It is expected that nodes move huge volumes of data, but what I was wondering
> is why the objects are not being garbage collected. Also, there are 242
> TransportSearchQueryThenFetchAction$AsyncAction instances; I don’t think that
> kind of concurrency is expected. I couldn’t yet find from the code which
> object is holding those objects.
>
> The timeouts that you are referring to, are these between the client node and
> the data nodes, or between the client node and the consumer?  Is there anything
> the consumer needs to do to release objects?
>
> Thanks for your time,
> -VM
>
>
> On Jul 2, 2014, at 7:08 AM, joergpra...@gmail.com wrote:
>
> I'm not sure but it looks like a node tries to move some GB of document
> hits around. This might have triggered timeouts at other places (probably
> with node disconnects) and maybe the GB chunk is not yet GC collected, so
> you see this in your heap analyzer tool.
>
> It depends on the search results and search hits you generated if the
> heaviness of the search result is expected or not, so it would be useful to
> know more about your queries.
>
> Jörg
>
>
> On Wed, Jul 2, 2014 at 3:29 AM, Venkat Morampudi <
> venkatmoramp...@gmail.com> wrote:
>
>> Thanks for the reply, Jörg. I don't have any logs. I will try to enable them,
>> but it would take some time though. If there is anything in
>> particular that we need to enable, please let me know.
>>
>> -VM
>>
>>
>> On Tuesday, July 1, 2014 12:58:21 PM UTC-7, Jörg Prante wrote:
>>
>>> Do you have anything in your logs, i.e. many disconnects/reconnects?
>>>
>>> Jörg
>>>
>>>
>>> On Tue, Jul 1, 2014 at 7:59 PM, Venkat Morampudi 
>>> wrote:
>>>
 In the Elasticsearch deployment we are seeing random client node
 crashes due to out of memory exceptions. I got the memory dump from one of
 the crash and analysed using Eclipse memory analyzer. I have attached leak
 suspect report. Apparently 242 objects of type org.elasticsearch.action.
 search.type.TransportSearchQueryThenFetchAction$AsyncAction are
 holding almost 8gb of memory. I have spent some time on source code but
 couldn't find anything obvious.


 I would really appreciate any help with this issue.


 -VM

 --
 You received this message because you are subscribed to the Google
 Groups "elasticsearch" group.
 To unsubscribe from this group and stop receiving emails from it, send
 an email to elasticsearc...@googlegroups.com.

 To view this discussion on the web visit https://groups.google.com/d/
 msgid/elasticsearch/37881ead-70c2-40d8-89b6-a771b2a36bdd%
 40googlegroups.com
 
 .
 For more options, visit https://groups.google.com/d/optout.

>>>
>>>
>> --
>> You received this message because you are subscribed to the Google Groups
>> "elasticsearch" group.
>> To unsubscribe from this group and stop receiving emails from it, send an
>> email to elasticsearch+unsubscr...@googlegroups.com.
>> To view this discussion on the web visit
>> https://groups.google.com/d/msgid/elasticsearch/9930fcfd-d2d4-4f62-b8a0-8f1f989069f2%40googlegroups.com
>> 
>> .
>>
>> For more options, visit https://groups.google.com/d/optout.
>>
>
>
> --
> You received this message because you are subscribed to a topic in the
> Google Groups "elasticsearch" group.
> To unsubscribe from this topic, visit
> https://groups.google.com/d/topic/elasticsearch/EH76o1CIeQQ/unsubscribe.
> To unsubscribe from this group and all its topics, send an email to
> elasticsearch+unsubscr...@googlegroups.com.
> To view this discussion on the web visit
> https://groups.google.com/d/msgid/elasticsearch/CAKdsXoE_Xum%2BU%3D-M-X_R93qbDdOKx-QFS2PFCbxcik-uqtpBbw%40mail.gmail.com
> 
> .
>
> For more options, visit https://groups.google.com/d/optout.
>
>
>  --
> You received this message because you are subscribed to the Google Groups
> "elasticsearch" group.
> To unsubscribe from this group and stop receiving emails from it, send an
> email to elasticsearch+unsubscr...@googlegroups.com.
> To view this discussion on the web visit
> https://groups.google.com/d/msgid/elasticsearch/7E54C4B5-AE5A-4E64-8199-B923E83376

Re: New Logstash setup issue with iptables

2014-07-07 Thread Lois Bennett
Thank you, Linus!   That did the trick!  

Peace and Joy,
Lois

On Saturday, July 5, 2014 8:51:19 AM UTC-4, Linus Askengren wrote:
>
> Hi Lois,
>
> I had the exact same problem, the discovery is running on udp 54328 by 
> default; opening that port solved it for me.
>
> Hope it helps
> Linus
>

-- 
You received this message because you are subscribed to the Google Groups 
"elasticsearch" group.
To unsubscribe from this group and stop receiving emails from it, send an email 
to elasticsearch+unsubscr...@googlegroups.com.
To view this discussion on the web visit 
https://groups.google.com/d/msgid/elasticsearch/387bd4dd-cd80-41c2-815f-329eeac33f63%40googlegroups.com.
For more options, visit https://groups.google.com/d/optout.


Re: ingest performance degrades sharply along with the documents having more fileds

2014-07-07 Thread Mahesh Venkat
Thanks Shay for updating us with perf improvements.
Apart from using the default parameters, should we follow the guideline 
listed in 

http://elasticsearch-users.115913.n3.nabble.com/Is-ES-es-index-store-type-memory-equivalent-to-Lucene-s-RAMDirectory-td4057417.html
 

Lucene supports MMapDirectory during the data indexing phase (in a batch), 
with a switch to in-memory for queries to optimize search latency.

Should we use the JVM system parameter -Des.index.store.type=memory?  Isn't 
this equivalent to using RAMDirectory in Lucene for in-memory search 
queries?
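
For concreteness, the per-index equivalent I am asking about would 
presumably look like this (index name made up):

curl -XPUT 'localhost:9200/my_index' -d '{
    "settings" : { "index.store.type" : "memory" }
}'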
Thanks
--Mahesh

On Saturday, July 5, 2014 8:46:59 AM UTC-7, kimchy wrote:
>
> Heya, I worked a bit on it, and 1.x (upcoming 1.3) has some significant 
> perf improvements now for this case (including improvements Lucene wise, 
> that are for now in ES, but will be in Lucene next version). Those include:
>
> 6648: https://github.com/elasticsearch/elasticsearch/pull/6648
> 6714: https://github.com/elasticsearch/elasticsearch/pull/6714
> 6707: https://github.com/elasticsearch/elasticsearch/pull/6707
>
> It would be interesting if you can run the tests again with 1.x branch. 
> Note, also, please use default features in ES for now, no disable flushing 
> and such.
>
> On Friday, June 13, 2014 7:57:23 AM UTC+2, Maco Ma wrote:
>>
>> I try to measure the performance of ingesting the documents having lots 
>> of fields.
>>
>>
>> The latest elasticsearch 1.2.1:
>> Total docs count: 10k (a small set definitely)
>> ES_HEAP_SIZE: 48G
>> settings:
>>
>> {"doc":{"settings":{"index":{"uuid":"LiWHzE5uQrinYW1wW4E3nA","number_of_replicas":"0","translog":{"disable_flush":"true"},"number_of_shards":"5","refresh_interval":"-1","version":{"created":"1020199"}
>>
>> mappings:
>>
>> {"doc":{"mappings":{"type":{"dynamic_templates":[{"t1":{"mapping":{"store":false,"norms":{"enabled":false},"type":"string"},"match":"*_ss"}},{"t2":{"mapping":{"store":false,"type":"date"},"match":"*_dt"}},{"t3":{"mapping":{"store":false,"type":"integer"},"match":"*_i"}}],"_source":{"enabled":false},"properties":{}
>>
>> All fields in the documents mach the templates in the mappings.
>>
>> Since I disabled the flush & refresh, I submitted the flush command 
>> (along with optimize command after it) in the client program every 10 
>> seconds. (I tried the another interval 10mins and got the similar results)
>>
>> Scenario 0 - 10k docs have 1000 different fields:
>> Ingestion took 12 secs.  Only 1.08G heap mem is used (only states the used 
>> heap memory).
>>
>>
>> Scenario 1 - 10k docs have 10k different fields(10 times fields compared 
>> with scenario0):
>> This time ingestion took 29 secs.   Only 5.74G heap mem is used.
>>
>> Not sure why the performance degrades sharply.
>>
>> If I try to ingest the docs having 100k different fields, it will take 17 
>> mins 44 secs.  We only have 10k docs in total and I am not sure why ES performs so 
>> badly. 
>>
>> Anyone can give suggestion to improve the performance?
>>
>>
>>
>>
>>
>>
>>
>>

-- 
You received this message because you are subscribed to the Google Groups 
"elasticsearch" group.
To unsubscribe from this group and stop receiving emails from it, send an email 
to elasticsearch+unsubscr...@googlegroups.com.
To view this discussion on the web visit 
https://groups.google.com/d/msgid/elasticsearch/9456c6ab-1f0b-4021-b011-d8573032915a%40googlegroups.com.
For more options, visit https://groups.google.com/d/optout.


Warmer queries - many small or one large

2014-07-07 Thread Jonathan Foy
Hello

The subject pretty much says it all... Is there an advantage one way or the 
other to having several (or many) small (single-term) warmer queries rather 
than a single large query that searches all desired fields?
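
For reference, by "small warmer" I mean something like this (index, warmer 
name, and field are made up):

curl -XPUT 'localhost:9200/my_index/_warmer/warm_status_active' -d '{
    "query" : { "term" : { "status" : "active" } }
}'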

-- 
You received this message because you are subscribed to the Google Groups 
"elasticsearch" group.
To unsubscribe from this group and stop receiving emails from it, send an email 
to elasticsearch+unsubscr...@googlegroups.com.
To view this discussion on the web visit 
https://groups.google.com/d/msgid/elasticsearch/eb396abd-514d-4d81-9488-aa0fde98de18%40googlegroups.com.
For more options, visit https://groups.google.com/d/optout.


Opening TransportClient connection per Index

2014-07-07 Thread AsyncAwait
I have a use case in which I want to create an index per account (assume 
an account represents a user); all data belonging to that user will be kept 
in that index. My question is - what if we create a connection per index and 
keep it alive during the user session? This means for 100 active users, 
there will be 100 connections from the TransportClient to the ES cluster. I 
don't see a reason why 100 instances of TransportClient have to be 
initialized and kept in memory. Not sure if a singleton would unnecessarily 
bring thread locks.

Any help is greatly appreciated.

Thanks

-- 
You received this message because you are subscribed to the Google Groups 
"elasticsearch" group.
To unsubscribe from this group and stop receiving emails from it, send an email 
to elasticsearch+unsubscr...@googlegroups.com.
To view this discussion on the web visit 
https://groups.google.com/d/msgid/elasticsearch/ce54bd4b-1c11-429a-8998-76482f051375%40googlegroups.com.
For more options, visit https://groups.google.com/d/optout.


Re: java.lang.NoSuchFieldError: ALLOW_UNQUOTED_FIELD_NAMES when trying to query elasticsearch using spark

2014-07-07 Thread Brian Thomas
Here is the gradle build I was using originally:

apply plugin: 'java'
apply plugin: 'eclipse'

sourceCompatibility = 1.7
version = '0.0.1'
group = 'com.spark.testing'

repositories {
    mavenCentral()
}

dependencies {
    compile 'org.apache.spark:spark-core_2.10:1.0.0'
    compile 'edu.stanford.nlp:stanford-corenlp:3.3.1'
    compile group: 'edu.stanford.nlp', name: 'stanford-corenlp', version: '3.3.1', classifier: 'models'
    compile files('lib/elasticsearch-hadoop-2.0.0.jar')
    testCompile 'junit:junit:4.+'
    testCompile group: "com.github.tlrx", name: "elasticsearch-test", version: "1.2.1"
}


When I ran dependencyInsight on jackson, I got the following output:

C:\dev\workspace\SparkProject>gradle dependencyInsight --dependency 
jackson-core

:dependencyInsight
com.fasterxml.jackson.core:jackson-core:2.3.0
\--- com.fasterxml.jackson.core:jackson-databind:2.3.0
     +--- org.json4s:json4s-jackson_2.10:3.2.6
     |    \--- org.apache.spark:spark-core_2.10:1.0.0
     |         \--- compile
     \--- com.codahale.metrics:metrics-json:3.0.0
          \--- org.apache.spark:spark-core_2.10:1.0.0 (*)

org.codehaus.jackson:jackson-core-asl:1.0.1
\--- org.codehaus.jackson:jackson-mapper-asl:1.0.1
     \--- org.apache.hadoop:hadoop-core:1.0.4
          \--- org.apache.hadoop:hadoop-client:1.0.4
               \--- org.apache.spark:spark-core_2.10:1.0.0
                    \--- compile

Version 1.0.1 of jackson-core-asl does not have the field 
ALLOW_UNQUOTED_FIELD_NAMES, but later versions of it do.

On Sunday, July 6, 2014 4:28:56 PM UTC-4, Costin Leau wrote:
>
> Hi,
>
> Glad to see you sorted out the problem. Out of curiosity what version of 
> jackson were you using and what was pulling it in? Can you share you maven 
> pom/gradle build?
>
>
> On Sun, Jul 6, 2014 at 10:27 PM, Brian Thomas  > wrote:
>
>> I figured it out, dependency issue in my classpath.  Maven was pulling 
>> down a very old version of the jackson jar.  I added the following line to 
>> my dependencies and the error went away:
>>
>> compile 'org.codehaus.jackson:jackson-mapper-asl:1.9.13'
>>
>>
>> On Friday, July 4, 2014 3:22:30 PM UTC-4, Brian Thomas wrote:
>>>
>>>  I am trying to test querying elasticsearch using Apache Spark using 
>>> elasticsearch-hadoop.  I am just trying to do a query to the elasticsearch 
>>> server and return the count of results.
>>>
>>> Below is my test class using the Java API:
>>>
>>> import org.apache.hadoop.conf.Configuration;
>>> import org.apache.hadoop.io.MapWritable;
>>> import org.apache.hadoop.io.Text;
>>> import org.apache.spark.SparkConf;
>>> import org.apache.spark.api.java.JavaPairRDD;
>>> import org.apache.spark.api.java.JavaSparkContext;
>>> import org.apache.spark.serializer.KryoSerializer;
>>> import org.elasticsearch.hadoop.mr.EsInputFormat;
>>>
>>> import scala.Tuple2;
>>>
>>> public class ElasticsearchSparkQuery{
>>>
>>> public static int query(String masterUrl, String 
>>> elasticsearchHostPort) {
>>> SparkConf sparkConfig = new SparkConf().setAppName("ESQuery").setMaster(masterUrl);
>>> sparkConfig.set("spark.serializer", 
>>> KryoSerializer.class.getName());
>>> JavaSparkContext sparkContext = new 
>>> JavaSparkContext(sparkConfig);
>>>
>>> Configuration conf = new Configuration();
>>> conf.setBoolean("mapred.map.tasks.speculative.execution", 
>>> false);
>>> conf.setBoolean("mapred.reduce.tasks.speculative.execution", 
>>> false);
>>> conf.set("es.nodes", elasticsearchHostPort);
>>> conf.set("es.resource", "media/docs");
>>> conf.set("es.query", "?q=*");
>>>
>>> JavaPairRDD<Text, MapWritable> esRDD = sparkContext.newAPIHadoopRDD(conf,
>>> EsInputFormat.class, Text.class, MapWritable.class);
>>> return (int) esRDD.count();
>>> }
>>> }
>>>
>>>
>>> When I try to run this I get the following error:
>>>
>>>
>>> 14/07/04 14:58:07 INFO executor.Executor: Running task ID 0
>>> 14/07/04 14:58:07 INFO storage.BlockManager: Found block broadcast_0 
>>> locally
>>> 14/07/04 14:58:07 INFO rdd.NewHadoopRDD: Input split: ShardInputSplit 
>>> [node=[5UATWUzmTUuNzhmGxXWy_w/S'byll|10.45.71.152:9200],shard=0]
>>> 14/07/04 14:58:07 WARN mr.EsInputFormat: Cannot determine task id...
>>> 14/07/04 14:58:07 ERROR executor.Executor: Exception in task ID 0
>>> java.lang.NoSuchFieldError: ALLOW_UNQUOTED_FIELD_NAMES
>>> at org.elasticsearch.hadoop.serialization.json.JacksonJsonParser.<init>(JacksonJsonParser.java:38)
>>> at org.elasticsearch.hadoop.serialization.ScrollReader.read(ScrollReader.java:75)
>>> at org.elasticsearch.hadoop.rest.RestRepository.scroll(RestRepository.java:267)
>>> at org.elasticsearch.hadoop.rest.ScrollQuery.hasNext(ScrollQuery.java:75)
>>> at org.elasticsearch.hadoop.mr.EsInputFormat$ShardRecordReader.next(EsInputFormat.java:319)
>>> at org.elasticsearch.hadoop.mr.EsInputFormat$ShardRecordReader.nextKeyValue(EsInputFormat.java:255)
>>> at org.apache.spark.rdd.

Re: ES Hadoop--Index only new documents without killing job from exceptions?

2014-07-07 Thread James Campbell
Thanks, Costin.  That makes sense; I've also commented on the issue you
mentioned on github.

Having more control over when to fail a job or choose to ignore certain
errors would definitely be a great feature from my perspective. I've
encountered a few different areas where I think extra control would be
valuable:

(1) Ability to fail on indexing failures (that persist despite the retry
policy). Currently multiple failed bulk retries are reported only via
a counter. Since job control programs such as Oozie don't make it easy to
fail a workflow based on a counter, I think it makes more sense to be able
to fail a job that had batches completely fail, else the documents may
never be searchable from Elasticsearch.

(2) DocumentAlreadyExists exceptions with the "create" write mode. Given
the batch nature of hadoop, there are cases (e.g. building autocomplete)
where it may make sense to update an index only with new data. To avoid a
reindex cost, it would be nice to be able to have a job succeed even if ES
throws a DocumentAlreadyExists exception so we can just throw data over to
ES to check whether it exists and ignore the request if it does.

(3) Malformed/bad data. Despite (2) above, it would be ideal to still be
able to throw errors and fail a job in the case of invalid data,
particularly in case of legitimately invalid JSON (such as unescaped
special characters that may have occurred in data that is being batch
processed from a binary container format in HDFS).
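
For reference, the write mode in (2) refers to the es.write.operation
setting, in the same Hadoop Configuration style used elsewhere in this
thread (resource name is made up):

import org.apache.hadoop.conf.Configuration;

public class WriteModeExample {
    // Mirrors the Configuration style from earlier in this thread.
    public static Configuration createOnlyConf() {
        Configuration conf = new Configuration();
        conf.set("es.resource", "media/docs");     // hypothetical index/type
        conf.set("es.write.operation", "create");  // currently fails the job on existing ids
        return conf;
    }
}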


On Sun, Jul 6, 2014 at 4:38 PM, Costin Leau  wrote:

> I would recommend indexing the document since it's a 'cheap' operation per
> document and it covers the potential differences between the docs. Also
> from a performance POV you are not going to lose much since you are anyway
> sending the doc to ES, which does hashing and returns the error to the user.
> So the only thing that you save and _might_ potentially see is the actual
> indexing which should become a problem only when dealing with large amounts
> of docs.
>
> These being said, there's already an issue opened [1] for
> trapping/handling errors during a job (to prevent it from being cancelled)
> which potentially can be used for such a purpose as well. Free free to add
> your comments to it.
>
> [1] https://github.com/elasticsearch/elasticsearch-hadoop/issues/160
>
>
> On Thu, Jul 3, 2014 at 8:49 PM, James Campbell  > wrote:
>
>> Hi ES-Hadoop users--
>>
>> I have a large list of simple documents that I would like to index for an
>> auto complete feature. At batch processing time, I do not know which values
>> are new (never seen before) and which are not (some other part of the
>> update process changed, but the autocomplete-relevant portion of the
>> document did not).
>>
>> I believe I could simply write all of the documents to the index whenever
>> I run a new batch with the default es.write.operation=index, but that will
>> cause ES to reindex the document each time even if it wasn't updated.
>>
>> On the other hand, if I choose to use es.write.operation=create, then any
>> existing documents will cause the job to fail.
>>
>> Is there a way to combine those behaviors, so that I can allow
>> elasticsearch to simply ignore requests to reindex existing documents
>> (based on _id) but not to throw an exception that kills the entire job?
>>
>> James Campbell
>>
>> --
>> You received this message because you are subscribed to the Google Groups
>> "elasticsearch" group.
>> To unsubscribe from this group and stop receiving emails from it, send an
>> email to elasticsearch+unsubscr...@googlegroups.com.
>>
>> To view this discussion on the web visit
>> https://groups.google.com/d/msgid/elasticsearch/2e5b93ef-0c42-4068-bc2c-33e4efbe429b%40googlegroups.com
>> .
>> For more options, visit https://groups.google.com/d/optout.
>>
>
>  --
> You received this message because you are subscribed to a topic in the
> Google Groups "elasticsearch" group.
> To unsubscribe from this topic, visit
> https://groups.google.com/d/topic/elasticsearch/EHJQsxb-s4w/unsubscribe.
> To unsubscribe from this group and all its topics, send an email to
> elasticsearch+unsubscr...@googlegroups.com.
> To view this discussion on the web visit
> https://groups.google.com/d/msgid/elasticsearch/CAJogdmeX_Dc-LRNcgPxY4bQ6drz43eL%3DuQnRVYYD-kjZ8%3DJebw%40mail.gmail.com
> 
> .
>
> For more options, visit https://groups.google.com/d/optout.
>

-- 
You received this message because you are subscribed to the Google Groups 
"elasticsearch" group.
To unsubscribe from this group and stop receiving emails from it, send an email 
to elasticsearch+unsubscr...@googlegroups.com.
To view this discussion on the web visit 
https://groups.google.com/d/msgid/elasticsearch/CA%2BAQu3xrGMFhDV%2B%2B7SGshm%2ByLHof7DV-RRy3inLOz-DVsCaHXg%40mail.gmail.com.
For more options, visit https://groups.google.com/d/optout.

Re: excessive merging/small segment sizes

2014-07-07 Thread Michael McCandless
Indeed there are no big merges during that time ...

I can see on node5, ~14:45 suddenly merges are taking a long time, refresh
is taking much longer (4-5 seconds instead of < .4 sec), commit time goes
up from < 0.5 sec to ~1-2 sec, etc., but other metrics are fine e.g. total
merging GB, number of commits/refreshes is very low during this time.

Each node has 2 biggish (~17 GB) shards and then ~50 tiny shards.  The
biggish shards are indexing at a very slow rate and only have ~1%
deletions.  Are you explicitly deleting docs?

I suspect something is suddenly cutting into the IO perf of this box, and
because merging/refreshing is so IO intensive, it causes these operations
to run slower / backlog.

Are there any scheduled jobs, e.g. backups/snapshots, that start up?  Are
you running on virtualized hardware?


Mike McCandless

http://blog.mikemccandless.com


On Sun, Jul 6, 2014 at 8:23 PM, Kireet Reddy  wrote:

> Just to reiterate, the problematic period is from 07/05 14:45 to 07/06
> 02:10. I included a couple hours before and after in the logs.
>
>
> On Sunday, July 6, 2014 5:17:06 PM UTC-7, Kireet Reddy wrote:
>>
>> They are linked below (node5 is the log of the normal node, node6 is the
>> log of the problematic node).
>>
>> I don't think it was doing big merges, otherwise during the high load
>> period, the merges graph line would have had a "floor" > 0, similar to the
>> time period after I disabled refresh. We don't do routing and use mostly
>> default settings. I think the only settings we changed are:
>>
>> indices.memory.index_buffer_size: 50%
>> index.translog.flush_threshold_ops: 5
>>
>> We are running on a 6 cpu/12 cores machine with a 32GB heap and 96GB of
>> memory with 4 spinning disks.
>>
>> node 5 log (normal) 
>> node 6 log (high load)
>> 
>>
>> On Sunday, July 6, 2014 4:23:19 PM UTC-7, Michael McCandless wrote:
>>>
>>> Can you post the IndexWriter infoStream output?  I can see if anything
>>> stands out.
>>>
>>> Maybe it was just that this node was doing big merges?  I.e., if you
>>> waited long enough, the other shards would eventually do their big merges
>>> too?
>>>
>>> Have you changed any default settings, do custom routing, etc.?  Is
>>> there any reason to think that the docs that land on this node are
>>> "different" in any way?
>>>
>>> Mike McCandless
>>>
>>> http://blog.mikemccandless.com
>>>
>>>
>>> On Sun, Jul 6, 2014 at 6:48 PM, Kireet Reddy  wrote:
>>>
  From all the information I’ve collected, it seems to be the merging
 activity:


1. We capture the cluster stats into graphite and the current
merges stat seems to be about 10x higher on this node.
2. The actual node that the problem occurs on has happened on
different physical machines so a h/w issue seems unlikely. Once the 
 problem
starts it doesn't seem to stop. We have blown away the indices in the 
 past
and started indexing again after enabling more logging/stats.
3. I've stopped executing queries so the only thing that's
happening on the cluster is indexing.
4. Last night when the problem was ongoing, I disabled refresh
(index.refresh_interval = -1) around 2:10am. Within 1 hour, the load
returned to normal. The merge activity seemed to reduce, it seems like 2
very long running merges are executing but not much else.
5. I grepped an hour of logs of the 2 machines for "add merge=", it
was 540 on the high load node and 420 on a normal node. I pulled out the
size value from the log message and the merges seemed to be much 
 smaller on
the high load node.

 I just created the indices a few days ago, so the shards of each index
 are balanced across the nodes. We have external metrics around document
 ingest rate and there was no spike during this time period.



 Thanks
 Kireet


 On Sunday, July 6, 2014 1:32:00 PM UTC-7, Michael McCandless wrote:

> It's perfectly normal/healthy for many small merges below the floor
> size to happen.
>
> I think you should first figure out why this node is different from
> the others?  Are you sure it's merging CPU cost that's different?
>
> Mike McCandless
>
> http://blog.mikemccandless.com
>
>
> On Sat, Jul 5, 2014 at 9:51 PM, Kireet Reddy 
> wrote:
>
>>  We have a situation where one of the four nodes in our cluster
>> seems to get caught up endlessly merging.  However it seems to be
>> high CPU activity and not I/O constrainted. I have enabled the 
>> IndexWriter
>> info stream logs, and often times it seems to do merges of quite small
>> segments (100KB) that are much below the floor size (2MB). I suspect this
>> is due to frequent refreshes and/or using lots of threads concurrently to
>>

Re: Terms Aggregation and scope

2014-07-07 Thread Diederik Meijer | Ten Horses
Awesome, just ran the curl command below, works fine!

curl -XGET 'http://localhost:9200/_search?pretty=true' -d '{ "size": 0, "aggs": 
{ "datum_uitspraak_ymd": { "terms": { "field": "datum_uitspraak_ymd", 
"include":"2014.*", "size":1 } } } }'



Op Jul 7, 2014, om 3:10 PM heeft Colin Goodheart-Smithe 
 het volgende geschreven:

> Glad it worked.
> 
> Yes, there are options for includes and excludes patterns.  Take a look at 
> the following link for information on how to use them.
> http://www.elasticsearch.org/guide/en/elasticsearch/reference/current/search-aggregations-bucket-terms-aggregation.html#_filtering_values
> 
> Colin
> 
> On Monday, 7 July 2014 14:04:56 UTC+1, Diederik Meijer wrote:
> Hi Colin,
> 
> Thank you, that works perfectly!
> 
> Is there any way to limit the key-value pairs by a certain parameter, in the 
> example below: to limit the aggregation to "datum_uitspraak_ymd" keys that 
> start with "2014"?
> 
> Or does that require the combination of a filter and an aggregation?
> 
> 
> Many thanks,
> 
> Diederik
> 
> 
> 
> 
> Op Jul 7, 2014, om 2:47 PM heeft Colin Goodheart-Smithe 
>  het volgende geschreven:
> 
>> Diederik,
>> 
>> To increase the number of terms returned by the terms aggregation you will 
>> need to add the 'size' parameter to your aggregation. The below curl command 
>> will return you the top 200 terms (ordered by decending doc_count).
>> 
>> curl -XGET 'http://localhost:9200/_search?pretty=true' -d '{ "size": 0, 
>> "aggs": { "datum_uitspraak_ymd": { "terms": { "field": 
>> "datum_uitspraak_ymd", "size" : 200 } } } }'
>> 
>> You may also find the following link to the documentation useful regarding 
>> the size parameter of the terms aggregation.
>> 
>> http://www.elasticsearch.org/guide/en/elasticsearch/reference/current/search-aggregations-bucket-terms-aggregation.html#_size_amp_shard_size
>> 
>> 
>> Hope this helps,
>> 
>> Colin
>> 
>> On Monday, 7 July 2014 13:09:52 UTC+1, Diederik Meijer wrote:
>> Dear list,
>> 
>> I need to create an aggregation by a specific field, named 
>> "datum_uitspraak_ymd'. I am using the below curl command and it works fine 
>> in a sense that it returns the aggregation listed below. While this result 
>> seems OK enough, it seems that the keys listed in the aggregation are 
>> limited to those listed in 10 records.
>> 
>> As the number of unique values for this key is much higher than 10, it seems 
>> that the aggregation's scope is global as far as it searches for documents 
>> with a value identical to one of the 10 listed, but it is limited as far as 
>> the key values used in the aggregation is concerned.
>> 
>> How do I need to set up my curl command in order for the aggregation to 
>> return more key-value pairs?
>> 
>> Many thanks,
>> Diederik
>> 
>> 
>> Command:
>> curl -XGET 'http://localhost:9200/_search?pretty=true' -d '{ "size": 0, 
>> "aggs": { "datum_uitspraak_ymd": { "terms": { "field": "datum_uitspraak_ymd" 
>> } } } }'
>> 
>> Returns:
>> "aggregations" : { "datum_uitspraak_ymd" : { "buckets" : [ { "key" : 
>> "20121219", "doc_count" : 612 }, { "key" : "20120516", "doc_count" : 526 }, 
>> { "key" : "20110601", "doc_count" : 472 }, { "key" : "20121218", "doc_count" 
>> : 468 }, { "key" : "20090520", "doc_count" : 349 }, { "key" : "20101222", 
>> "doc_count" : 274 }, { "key" : "20120711", "doc_count" : 272 }, { "key" : 
>> "20090429", "doc_count" : 246 }, { "key" : "20120718", "doc_count" : 230 }, 
>> { "key" : "20120425", "doc_count" : 226 } ] } } 
>> 
>> 

Re: Latitude -> Lat, Longitude -> lon

2014-07-07 Thread Olivier B
Thanks for your reply. 
I know I tried that, but it's in an array, so I would need to iterate or 
something, because trying to map the path leads to an error if there is an 
array. Here, items is an array: ctx.doc.items.location.longitude

That's why I'm looking for an alternative solution.
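
For reference, David's script suggestion (quoted below) might look roughly 
like this with the 1.x couchdb river's "script" option. This is an untested 
sketch: the host/db/index names are made up, and the MVEL-style loop and map 
syntax is an assumption:

curl -XPUT 'http://localhost:9200/_river/my_couchdb/_meta' -d '{
  "type": "couchdb",
  "couchdb": {
    "host": "localhost",
    "port": 5984,
    "db": "my_db",
    "script": "foreach (item : ctx.doc.items) { item.location = [\"lat\" : item.location.latitude, \"lon\" : item.location.longitude] }"
  },
  "index": { "index": "my_db", "type": "my_db" }
}'

With location rewritten to lat/lon on the way in, the field could then be 
mapped as a geo_point in the usual way.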

On Monday, July 7, 2014 4:16:48 PM UTC+10, David Pilato wrote:
>
> You could maybe try to use script filters and add lat and lon fields on the 
> fly, or a String representing your point.
>
> See doc: https://github.com/elasticsearch/elasticsearch-river-couchdb
>
> HTH
>
> --
> David ;-)
> Twitter : @dadoonet / @elasticsearchfr / @scrutmydocs
>
>
> On July 7, 2014, at 02:23, Olivier B wrote:
>
> Hi there,
>
> I understand a geo-point can be mapped based on two fields: "lat" and "lon".
> However my fields are named "longitude" and "latitude".
> I'm using the river plugin for couchdb and I cannot really rename those 
> fields before indexing. And those fields are part of an item in an array:
>
> "items": [
>   { 
> "item_id" : "abcd",
> "location": 
> {
>   "longitude": 145.7711,
>   "latitude": -16.92359
> }
>   },
>   { 
> "item_id" : "efgh",
> "location": 
> {
>   "longitude": 149.6611,
>   "latitude": -19.94098
> }
>   }
> ]
>
> So, any idea how I can rename those fields? And eventually map it to a geo 
> point (
> http://www.elasticsearch.org/guide/en/elasticsearch/reference/current/mapping-geo-point-type.html
> )?
>
> Cheers
>


Clustering/Sharding impact on query performance

2014-07-07 Thread 'Fin Sekun' via elasticsearch

Hi,


*SCENARIO*

Our Elasticsearch database has ~2.5 million entries. Each entry has the 
three analyzed fields "match", "sec_match" and "thi_match" (each containing 
3-20 words) that will be used in this query:
https://gist.github.com/anonymous/a8d1142512e5625e4e91


ES runs on two *types of servers*:
(1) Real servers (system has direct access to real CPUs, no virtualization) 
of newest generation - Very performant!
(2) Cloud servers with virtualized CPUs - Poor CPUs, but this is generic 
for cloud services.

See https://gist.github.com/anonymous/3098b142c2bab51feecc for (1) and (2) 
CPU details.


*ES settings:*
ES version 1.2.0 (jdk1.8.0_05)
ES_HEAP_SIZE = 512m (we also tested with 1024m with same results)
vm.max_map_count = 262144
ulimit -n 64000
ulimit -l unlimited
index.number_of_shards: 1
index.number_of_replicas: 0
index.store.type: mmapfs
threadpool.search.type: fixed
threadpool.search.size: 75
threadpool.search.queue_size: 5000


*Infrastructure*:
As you can see above, we don't use the cluster feature of ES (1 shard, 0 
replicas). The reason is that our hosting infrastructure is based on 
different providers.
Upside: We aren't dependent on a single hosting provider. Downside: Our 
servers aren't in the same LAN.

This means:
- We cannot use ES sharding, because synchronisation via WAN (internet) 
does not seem like a useful solution.
- So, every ES-server has the complete dataset and we configured only one 
shard and no replicas for higher performance.
- We have a distribution process that updates the ES data on every host 
frequently. This process is fine for us, because updates aren't very frequent 
and perfect just-in-time ES synchronisation isn't necessary for our 
business case.
- If a server goes down/crashes, the central load balancer removes it (the 
resulting minimal packet loss is acceptable).
 



*PROBLEM*

For long query terms (6 or more keywords), we have very high CPU loads, 
even on the high performance server (1), and this leads to high response 
times: 1-4sec on server (1), 8-20sec on server (2). The system parameters 
while querying:
- Very high load (usually 100%) on the CPU running the responsible thread 
(the other CPUs are idle in our test scenario)
- No I/O load (the harddisks are fine)
- No RAM bottlenecks

So, we think the file caching is working fine, because we have no I/O 
problems and the garbage collector seems to be happy (jstat shows very few 
GCs). The CPU is the problem, and ES hot-threads point to the Scorer module:
https://gist.github.com/anonymous/9cecfd512cb533114b7d 




*SUMMARY/ASSUMPTIONS*

- Our database isn't very big and the query isn't very complex.
- ES is designed for huge amounts of data, but the key is 
clustering/sharding: distributing data across many servers means smaller 
indices, and smaller indices lead to lower CPU load and shorter response times.
- So, our database isn't big, but it is too big for a single CPU, which means 
that especially low-performance (virtual) CPUs can only be used in sharding 
environments.

If we don't want to lose provider independence, we have only the 
following two options:

1) Simpler query (I think not possible in our case)
2) Smaller database




*QUESTIONS*

Are our assumptions correct? Especially:

- Is clustering/sharding (i.e. small indices) the main key to performance, 
that is, the only way to prevent overloaded (virtual) CPUs?
- Is it right that clustering is only useful/possible in LANs?
- Do you have any ES configuration or architecture hints regarding our 
preference for using multiple hosting providers?



Thank you. Rgds
Fin



Re: Custom Plugin for specifying custom filter attributes at query time

2014-07-07 Thread Sandeep Ramesh Khanzode
Hi,

A little clarification:

Assume a sample data set of 50M documents. The documents need to be filtered
by a field, Field1. However, at indexing time, this field is NOT written to
the document in Lucene through ES. Field1 is a frequently changing field
and hence we would like to maintain it outside.

(The following paragraph can be skipped.)
Now assume that there are a few such fields, Field1, ..., FieldN. For every
document in the corpus, the value of Field1 may come from a pool of 100-odd
values. Thus, for example, at the high end, a Field1 value can cover 1M
documents, and at the low end, a value can correspond to as few as 10
documents.


(Continue reading) :-)
I would, at system startup time, make sure that I have loaded all relevant
BitSets that I plan to use for any Filters in memory, so that my cache
framework is warm and I can lookup the relevant filter values for a
particular query from this cache at query run time. The mechanisms for this
loading are still unknown, but please assume that this BitSet will be
available readily during query time.

This BitSet will correspond to the DocIDs in Lucene for a particular value
of Field1 that I want to filter. I plan to create a Filter class overridden
in Lucene that will accept this DocIdSet.

What I am unable to understand is how I can achieve this in ES. I have
been exploring the different mail threads on this forum, and it seems that
certain plugins can achieve this. Please see the list below of what I could
find on this forum.

Can you please tell me how an IndexQueryParserModule will serve my use
case? If you can provide some pointers on writing a plugin that can
leverage a CustomFilter, that will be immensely helpful. Thanks,

1.
https://groups.google.com/forum/?fromgroups=#!searchin/elasticsearch/IndexQueryParserModule$20Plugin/elasticsearch/5Gqxx3UvN2s/FL4Lb2RxQt0J
2. https://groups.google.com/forum/#!topic/elasticsearch/1jiHl4kngJo
3. https://github.com/elasticsearch/elasticsearch/issues/208
4.
http://elasticsearch-users.115913.n3.nabble.com/custom-filter-handler-plugin-td4051973.html

Thanks,
Sandeep

On Mon, Jul 7, 2014 at 2:17 AM, joergpra...@gmail.com  wrote:

> Thanks for being so patient with me :)
>
> I understand now the following: there are 50M documents in an external
> DB, from which up to 1M are to be exported in the form of document identifiers
> to work as a filter in ES. The idea is to use internal mechanisms like bit
> sets. There is no API for manipulating filters in ES on that level; ES
> receives the terms and passes them into the Lucene TermFilter class according
> to the type of the filter.
>
> What is a bit unclear to me: how is the filter set constructed? I assume
> it should be a select statement on the database?
>
> Next, if you have this large set of document identifiers selected, I do
> not understand what is the base query you want to apply the filter on? Is
> there a user given query for ES? How does such query looks like? Is it
> assumed there are other documents in ES that are related somehow to the 50m
> documents? An illustrative example of the steps in the scenario would
> really help to understand the data model.
>
> Just some food for thought: it is close to impossible to filter in ES on
> 1m unique terms with a single step - the default setting of maximum clauses
> in a Lucene Query is for good reason limited to 1024 terms. A workaround
> would be iterating over 1m terms and execute 1000 filter queries and add up
> the results. This takes a long time and may not be the desired solution.
>
> Fortunately, in most situations, it is possible to find more concise
> grouping to reduce the 1m document identifiers into fewer ones for more
> efficient filtering.
>
> Jörg
>
>
>
> On Sun, Jul 6, 2014 at 9:39 PM, 'Sandeep Ramesh Khanzode' via
> elasticsearch  wrote:
>
>> Hi,
>>
>> Appreciate your continued assistance. :) Thanks,
>>
>> Disclaimer: I am yet to sufficiently understand ES sources so as to
>> depict my scenario completely. Some info' below may be conjecture.
>>
>> I would have a corpus of 50M docs (actually a lot more, but for testing
>> now) out of which I would have, say, up to 1M DocIds to be used as a filter.
>> This set of 1M docs can be different for different use cases, the point
>> being that up to 1M docIds can form one logical set of documents for filtering
>> results. If I use a simple IdsFilter from the ES Java API, I would have to keep
>> adding these 1M docs to the List implementation internally, and I have a
>> feeling it may not scale very well, as they may change per use case and per
>> some combinations internal to a single use case as well.
>>
>> As I debug the code, the IdsFilter will be converted to a Lucene filter.
>> Lucene filters, on the other hand, operate on a docId bitset type. That
>> gels very well with my requirement, since I can scale with BitSets (I
>> assume).
>>
>> If I can find a way to directly plug this BitSet as a Lucene Filter to
>> the Lucene search() call bypass

Re: Use arrays as update parameters with elasticsearch-hadoop-mr

2014-07-07 Thread James Campbell
Costin--

Great news, thanks for the update!

James


On Sun, Jul 6, 2014 at 4:41 PM, Costin Leau  wrote:

> Hi James,
>
> Fwiw, I plan to address this bug shortly - as you pointed out, the JSON
> array needs to be handled separately before passing its content in.
>
>
> On Thu, Jul 3, 2014 at 8:58 PM, James Campbell  > wrote:
>
>> I would like to update an existing document that has an array from
>> elasticsearch hadoop.
>>
>> I notice that I can do that from curl directly, for example:
>>
>> PUT arraydemo/temp/1
>> {
>>   "counter" : 1,
>>   "tags" : [ "I am an array", "With Multiple values" ],
>>   "more_tags" : [ "I am a tag" ],
>>   "even_more_tags": "I am a tag too!"
>> }
>>
>> GET arraydemo/temp/1
>>
>> POST arraydemo/temp/1/_update
>> {
>>   "script" : "tmp = new HashSet(); tmp.addAll(ctx._source.tags); 
>> tmp.addAll(new_tags); ctx._source.tags = tmp.toArray()",
>>   "params" : {
>> "new_tags" : [ "add me", "and me" ]
>>   }
>> }
>>
>>
>> However, elasticsearch-hadoop appears to be unable to parse array
>> parameters, such that an upsert operation from within elasticsearch hadoop
>> using the same script and a document with the same JSON for parameters
>> fails.
>>
>> I created an issue on github (elasticsearch hadoop (#223)), but thought I
>> should post here for ideas or in case there's a workaround that someone
>> might know of.
>>
>> James Campbell
>>


Re: Terms Aggregation and scope

2014-07-07 Thread Colin Goodheart-Smithe
Glad it worked.

Yes, there are options for includes and excludes patterns.  Take a look at 
the following link for information on how to use them.
http://www.elasticsearch.org/guide/en/elasticsearch/reference/current/search-aggregations-bucket-terms-aggregation.html#_filtering_values
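
For example, to keep only the buckets whose keys start with "2014" (the 
form Diederik confirms elsewhere in this digest):

curl -XGET 'http://localhost:9200/_search?pretty=true' -d '{ "size": 0, "aggs": { "datum_uitspraak_ymd": { "terms": { "field": "datum_uitspraak_ymd", "include": "2014.*", "size": 200 } } } }'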

Colin

On Monday, 7 July 2014 14:04:56 UTC+1, Diederik Meijer wrote:
>
> Hi Colin,
>
> Thank you, that works perfectly!
>
> Is there any way to limit the key-value pairs by a certain parameter, in 
> the example below: to limit the aggregation to "datum_uitspraak_ymd" keys 
> that start with "2014"?
>
> Or does that require the combination of a filter and an aggregation?
>
>
> Many thanks,
>
> Diederik
>
>
>
>
> Op Jul 7, 2014, om 2:47 PM heeft Colin Goodheart-Smithe <
> colin.goodheart-smi...@elasticsearch.com> het volgende geschreven:
>
> Diederik,
>
> To increase the number of terms returned by the terms aggregation you will 
> need to add the 'size' parameter to your aggregation. The below curl 
> command will return you the top 200 terms (ordered by descending doc_count).
>
> curl -XGET 'http://localhost:9200/_search?pretty=true' -d '{ "size": 0, 
> "aggs": { "datum_uitspraak_ymd": { "terms": { "field": 
> "datum_uitspraak_ymd", "size" : 200 } } } }'
>
> You may also find the following link to the documentation useful regarding 
> the size parameter of the terms aggregation.
>
>
> http://www.elasticsearch.org/guide/en/elasticsearch/reference/current/search-aggregations-bucket-terms-aggregation.html#_size_amp_shard_size
>
>
> Hope this helps,
>
> Colin
>
> On Monday, 7 July 2014 13:09:52 UTC+1, Diederik Meijer wrote:
>>
>> Dear list,
>>
>> I need to create an aggregation by a specific field, named 
>> "datum_uitspraak_ymd'. I am using the below curl command and it works fine 
>> in a sense that it returns the aggregation listed below. While this result 
>> seems OK enough, it seems that the keys listed in the aggregation are 
>> limited to those listed in 10 records.
>>
>> As the number of unique values for this key is much higher than 10, it 
>> seems that the aggregation's scope is global as far as it searches for 
>> documents with a value identical to one of the 10 listed, but it is limited 
>> as far as the key values used in the aggregation is concerned.
>>
>> How do I need to set up my curl command in order for the aggregation to 
>> return more key-value pairs?
>>
>> Many thanks,
>> Diederik
>>
>>
>> Command:
>> curl -XGET 'http://localhost:9200/_search?pretty=true' -d '{ "size": 0, 
>> "aggs": { "datum_uitspraak_ymd": { "terms": { "field": 
>> "datum_uitspraak_ymd" } } } }'
>>
>> Returns:
>> "aggregations" : { "datum_uitspraak_ymd" : { "buckets" : [ { "key" : 
>> "20121219", "doc_count" : 612 }, { "key" : "20120516", "doc_count" : 526 }, 
>> { "key" : "20110601", "doc_count" : 472 }, { "key" : "20121218", 
>> "doc_count" : 468 }, { "key" : "20090520", "doc_count" : 349 }, { "key" : 
>> "20101222", "doc_count" : 274 }, { "key" : "20120711", "doc_count" : 272 }, 
>> { "key" : "20090429", "doc_count" : 246 }, { "key" : "20120718", 
>> "doc_count" : 230 }, { "key" : "20120425", "doc_count" : 226 } ] } } 
>>
>>


Re: Terms Aggregation and scope

2014-07-07 Thread Diederik Meijer | Ten Horses
Hi Colin,

Thank you, that works perfectly!

Is there any way to limit the key-value pairs by a certain parameter, in the 
example below: to limit the aggregation to "datum_uitspraak_ymd" keys that 
start with "2014"?

Or does that require the combination of a filter and an aggregation?


Many thanks,

Diederik




On Jul 7, 2014, at 2:47 PM, Colin Goodheart-Smithe wrote:

> Diederik,
> 
> To increase the number of terms returned by the terms aggregation you will 
> need to add the 'size' parameter to your aggregation. The below curl command 
> will return you the top 200 terms (ordered by descending doc_count).
> 
> curl -XGET 'http://localhost:9200/_search?pretty=true' -d '{ "size": 0, 
> "aggs": { "datum_uitspraak_ymd": { "terms": { "field": "datum_uitspraak_ymd", 
> "size" : 200 } } } }'
> 
> You may also find the following link to the documentation useful regarding 
> the size parameter of the terms aggregation.
> 
> http://www.elasticsearch.org/guide/en/elasticsearch/reference/current/search-aggregations-bucket-terms-aggregation.html#_size_amp_shard_size
> 
> 
> Hope this helps,
> 
> Colin
> 
> On Monday, 7 July 2014 13:09:52 UTC+1, Diederik Meijer wrote:
> Dear list,
> 
> I need to create an aggregation by a specific field, named 
> "datum_uitspraak_ymd'. I am using the below curl command and it works fine in 
> a sense that it returns the aggregation listed below. While this result seems 
> OK enough, it seems that the keys listed in the aggregation are limited to 
> those listed in 10 records.
> 
> As the number of unique values for this key is much higher than 10, it seems 
> that the aggregation's scope is global as far as it searches for documents 
> with a value identical to one of the 10 listed, but it is limited as far as 
> the key values used in the aggregation is concerned.
> 
> How do I need to set up my curl command in order for the aggregation to 
> return more key-value pairs?
> 
> Many thanks,
> Diederik
> 
> 
> Command:
> curl -XGET 'http://localhost:9200/_search?pretty=true' -d '{ "size": 0, 
> "aggs": { "datum_uitspraak_ymd": { "terms": { "field": "datum_uitspraak_ymd" 
> } } } }'
> 
> Returns:
> "aggregations" : { "datum_uitspraak_ymd" : { "buckets" : [ { "key" : 
> "20121219", "doc_count" : 612 }, { "key" : "20120516", "doc_count" : 526 }, { 
> "key" : "20110601", "doc_count" : 472 }, { "key" : "20121218", "doc_count" : 
> 468 }, { "key" : "20090520", "doc_count" : 349 }, { "key" : "20101222", 
> "doc_count" : 274 }, { "key" : "20120711", "doc_count" : 272 }, { "key" : 
> "20090429", "doc_count" : 246 }, { "key" : "20120718", "doc_count" : 230 }, { 
> "key" : "20120425", "doc_count" : 226 } ] } } 
> 
> 


Re: ES Hadoop--Index only new documents without killing job from exceptions?

2014-07-07 Thread Costin Leau
I would recommend indexing the document, since it's a 'cheap' operation per
document and it covers the potential differences between the docs. Also,
from a performance POV you are not going to lose much, since you are anyway
sending the doc to ES, which does hashing and returns the error to the user.
So the only thing that you save and _might_ potentially see is the actual
indexing, which should become a problem only when dealing with large amounts
of docs.

That being said, there's already an issue open [1] for trapping/handling
errors during a job (to prevent it from being cancelled) which potentially
can be used for such a purpose as well. Feel free to add your comments to
it.

[1] https://github.com/elasticsearch/elasticsearch-hadoop/issues/160


On Thu, Jul 3, 2014 at 8:49 PM, James Campbell 
wrote:

> Hi ES-Hadoop users--
>
> I have a large list of simple documents that I would like to index for an
> auto complete feature. At batch processing time, I do not know which values
> are new (never seen before) and which are not (some other part of the
> update process changed, but the autocomplete-relevant portion of the
> document did not).
>
> I believe I could simply write all of the documents to the index whenever
> I run a new batch with the default es.write.operation=index, but that will
> cause ES to reindex the document each time even if it wasn't updated.
>
> On the other hand, if I choose to use es.write.operation=create, then any
> existing documents will cause the job to fail.
>
> Is there a way to combine those behaviors, so that I can allow
> elasticsearch to simply ignore requests to reindex existing documents
> (based on _id) but not to throw an exception that kills the entire job?
>
> James Campbell
>


How to limit the fields of response when I search a keyword?

2014-07-07 Thread 纪路
Dear all:

I have a reasonable need, but I can't find how to deal with it in the ES 
official docs and books. Does anyone know? Please teach me, thank you!

I have a large set of docs, which hold a lot of fields, such as:

uid2 = {
"id": 1404999597,
"idstr": "1404999597",
"class": 1,
"screen_name": "",
"name": "",
"province": "11",
"city": "1000",
"location": "北京",
"description": "在主流与非主流之间徘徊",
"url": "",
"profile_image_url": "http://tp2.sinaimg.cn/1404999597/50/5642385629/0";,
"profile_url": "u/1404999597",
"domain": "",
"weihao": "",
"gender": "f",
"followers_count": 1030710,
"friends_count": 272,
"statuses_count": 1519,
"favourites_count": 90,
"created_at": "Wed Mar 23 23:59:40 +0800 2011",
"following": false,
"allow_all_act_msg": false,
"geo_enabled": false,
"verified": true,
"verified_type": 0,
"remark": "",
"status": {
"created_at": "Tue Jul 01 13:17:55 +0800 2014",
"id": 3727513249206064,
"mid": "3727513249206064",
"idstr": "3727513249206064",
"text": "听到她的声音,我更相信她和荷西在天堂,依旧幸福着。 //@东方尔雅:现在这种纯真的爱情还好找吗? //@晓玲-有话说:[心]",
"source": "http://app.weibo.com/t/feed/9ksdit\"; 
rel=\"nofollow\">iPhone客户端",
"favorited": false,
"truncated": false,
"in_reply_to_status_id": "",
"in_reply_to_user_id": "",
"in_reply_to_screen_name": "",
"pic_urls": [],
"geo": null,
"reposts_count": 0,
"comments_count": 0,
"attitudes_count": 0,
"mlevel": 0,
"visible": {
"type": 0,
"list_id": 0
},
"darwin_tags": []
},
"ptype": 1,
"allow_all_comment": true,
"avatar_large": "http://tp2.sinaimg.cn/1404999597/180/5642385629/0";,
"avatar_hd": "http://tp2.sinaimg.cn/1404999597/180/5642385629/0";,
"verified_reason": "电视台主持人梦桐",
"verified_trade": "",
"verified_reason_url": "",
"verified_source": "",
"verified_source_url": "",
"follow_me": false,
"online_status": 0,
"bi_followers_count": 167,
"lang": "zh-cn",
"star": 0,
"mbtype": 0,
"mbrank": 0,
"block_word": 0,
"block_app": 0,
"ability_tags": "主持人",
"worldcup_guess": 0
}

This is a user-info document. If I want to analyze the gender distribution of 
all users who live in "city": "1000" (1000 is a city code), I don't need any 
field except "city" and "gender". How can I exclude these meaningless fields 
before they are returned? Because there are lots of docs, transmitting the 
entire doc would waste a lot of time and bandwidth, and I would have to trim 
the additional information in my own program. So, is there a method that can 
deal with this problem for me? Thank you.
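
One way to do this is source filtering, which ES 1.x supports on search 
requests; a minimal sketch (the index name users is made up):

curl -XGET 'http://localhost:9200/users/_search?pretty=true' -d '{ "query": { "term": { "city": "1000" } }, "_source": ["city", "gender"] }'

Only the listed fields are returned inside _source, so the response stays 
small.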



How to limit fields of response doc when I search certain keyword?

2014-07-07 Thread 纪路
Dear all:
There is a reasonable need, but I can't find a solution in the official docs 
or books. Can you help me?

I have a large set of docs, which contains a lot of fields, such as:
 {
"id": 1404999597,
"idstr": "1404999597",
"class": 1,
"screen_name": "主播梦桐",
"name": "主播梦桐",
"province": "11",
"city": "1000",
"location": "北京",
"description": "在主流与非主流之间徘徊",
"url": "",
"profile_image_url": "http://tp2.sinaimg.cn/1404999597/50/5642385629/0";,
"profile_url": "u/1404999597",
"domain": "",
"weihao": "",
"gender": "f",
"followers_count": 1030710,
"friends_count": 272,
"statuses_count": 1519,
"favourites_count": 90,
"created_at": "Wed Mar 23 23:59:40 +0800 2011",
"following": false,
"allow_all_act_msg": false,
"geo_enabled": false,
"verified": true,
"verified_type": 0,
"remark": "",
"status": {
"created_at": "Tue Jul 01 13:17:55 +0800 2014",
"id": 3727513249206064,
"mid": "3727513249206064",
"idstr": "3727513249206064",
"text": "听到她的声音,我更相信她和荷西在天堂,依旧幸福着。 //@东方尔雅:现在这种纯真的爱情还好找吗? //@晓玲-有话说:[心]",
"source": "http://app.weibo.com/t/feed/9ksdit\"; 
rel=\"nofollow\">iPhone客户端",
"favorited": false,
"truncated": false,
"in_reply_to_status_id": "",
"in_reply_to_user_id": "",
"in_reply_to_screen_name": "",
"pic_urls": [],
"geo": null,
"reposts_count": 0,
"comments_count": 0,
"attitudes_count": 0,
"mlevel": 0,
"visible": {
"type": 0,
"list_id": 0
},
"darwin_tags": []
},
"ptype": 1,
"allow_all_comment": true,
"avatar_large": "http://tp2.sinaimg.cn/1404999597/180/5642385629/0";,
"avatar_hd": "http://tp2.sinaimg.cn/1404999597/180/5642385629/0";,
"verified_reason": "电视台主持人梦桐",
"verified_trade": "",
"verified_reason_url": "",
"verified_source": "",
"verified_source_url": "",
"follow_me": false,
"online_status": 0,
"bi_followers_count": 167,
"lang": "zh-cn",
"star": 0,
"mbtype": 0,
"mbrank": 0,
"block_word": 0,
"block_app": 0,
"ability_tags": "主持人",
"worldcup_guess": 0
}

My problem is that when I search (or scan & scroll) on a certain field, for 
example "city"=1000 (1000 is its city code, which refers to a city name), 
many results may be returned. But my goal is to detect how the gender of 
this city's users is distributed on my website; I don't need any information 
except the "gender" field. What can I do to exclude meaningless data from 
the response JSON before it is returned? Because there are many similar 
tasks for me, transmitting the entire doc would take a lot of time and 
bandwidth, and I would have to trim the additional data in my own program, 
which also wastes CPU time on the local machine. So if you know how to deal 
with this need, please teach me. Thank you!
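
Since the goal is a distribution rather than the documents themselves, a 
terms aggregation can avoid returning documents at all; a sketch (again with 
a made-up index name, and _source filtering as sketched under the previous 
message also works with scan & scroll):

curl -XGET 'http://localhost:9200/users/_search?pretty=true' -d '{ "size": 0, "query": { "filtered": { "filter": { "term": { "city": "1000" } } } }, "aggs": { "gender_distribution": { "terms": { "field": "gender" } } } }'

With "size": 0 no hits are transferred; the response contains just the 
doc_count per gender.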



Re: Inter-document Queries

2014-07-07 Thread Itamar Syn-Hershko
Hi, only saw this now

I wouldn't worry too much about high space complexity - storage comes cheap
nowadays, and the general practice in many systems is to store raw data and
do processing on demand (most commonly known approach is event sourcing).

I can understand an argument about high space complexity being a problem
when this is not the core of your business, and in those cases I'd indeed
try to find a way to store the data in different ways leveraging the
various advanced query types Elasticsearch offers - like the RegEx pattern
matching solution suggested by Theo
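
A minimal sketch of that idea (index and field names are made up; it assumes 
the concatenated steps live in a single not_analyzed string field, say 
path_history, on the session-ending document):

curl -XGET 'http://localhost:9200/events/_search?pretty=true' -d '{ "query": { "filtered": { "filter": { "regexp": { "path_history": ".*/page/1.*401.*/sale/C.*" } } } } }'

Lucene regular expressions match against the whole term, hence the leading 
and trailing .* here.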

HTH,

--

Itamar Syn-Hershko
http://code972.com | @synhershko 
Freelance Developer & Consultant
Author of RavenDB in Action 


On Wed, Jul 2, 2014 at 5:04 PM, Theo Harris  wrote:

> Together with Zennet we brainstormed a solution building on top of
> Itamar's proposal.
>
> In one string field we append the current path to all previous ones,
> and since we are talking about funnels, we need to store them only on the
> last event/document generated, e.g. SessionEndedEvent.
> Then we can use regex pattern matching to identify if the sequence of
> steps can be found anywhere in the stored paths string. This solution
> appears to be extremely fast.
>
>
> On Wednesday, June 11, 2014 1:14:59 AM UTC+3, Zennet Wheatcroft wrote:
>>
>> I simplified the actual problem in order to avoid explaining the domain
>> specific details. Allow me to add back more detail.
>>
>> We want to be able to search for multiple points of user action, towards
>> a conversion funnel, and condition on multiple fields. Let's add another
>> field (response) to the above model:
>> {.., "path":"/promo/A", "response": 200, ..}
>> {.., "path":"/page/1", "response": 401, ..}
>> {.., "path":"/promo/D","response": 200, ..}
>> {.., "path":"/page/23", "response": 301, ..}
>> {.., "path":"/page/2", "response": 418, ..}
>> Let's say we define three points through the conversion funnel:
>> A: Visited path=/page/1
>> B: Got response=401 from some path
>> C: Exited at path=/sale/C
>>
>> And we want to know how many users did steps A-B-C in that order. If we
>> add an array prev_response like we did for prev_path, then we can use a
>> term filter to find documents with term path=/sale/C and prev_path=/page/1
>> and prev_response=401. But this will not distinguish between A->B->C and
>> B->A->C. Perhaps I could use the script filter for the "last mile" and from
>> the term filtered results throw out B-A-C and it will run more quickly
>> because of the reduced document set.
>>
>> Is there another way to implement this query?
>>
>> Zennet
>>
>>
>> On Wednesday, June 4, 2014 5:01:19 PM UTC-7, Itamar Syn-Hershko wrote:
>>>
>>> You need to be able to form buckets that can be reduced again, either
>>> using the aggregations framework or a query. One model that will allow you
>>> to do that is something like this:
>>>
>>> { "userid": "xyz", "path":"/sale/B", "previous_paths":[...],
>>> "tstamp":"...", ... }
>>>
>>> So whenever you add a new path, you denormalize and add previous paths
>>> that could be relevant. This might bloat your storage a bit and be slower
>>> on writes, but it is very optimized for reads since now you can do an
>>> aggregation that queries for the desired "path" and buckets on the user. To
>>> check the condition of the previous path you should be able to bucket again
>>> using a script, or maybe even with a query on a nested type.
>>>
>>> This is just from the top of my head but should definitely work if you
>>> can get to that model
>>>
>>> --
>>>
>>> Itamar Syn-Hershko
>>> http://code972.com | @synhershko 
>>> Freelance Developer & Consultant
>>> Author of RavenDB in Action 
>>>
>>>
>>> On Thu, Jun 5, 2014 at 2:36 AM, Zennet Wheatcroft 
>>> wrote:
>>>
 Yes. I can re-index the data or transform it in any way to make this
 query efficient.

 What would you suggest?



 On Wednesday, June 4, 2014 2:14:09 PM UTC-7, Itamar Syn-Hershko wrote:

> This model is not efficient for this type of querying. You cannot do
> this in one query using this model, and the pre-processing work you do now
> + traversing all documents is very costly.
>
> Is it possible for you to index the data (even as a projection) into
> Elasticsearch using a different model, so you can use ES properly using
> queries or the aggregations framework?
>
> --
>
> Itamar Syn-Hershko
> http://code972.com | @synhershko 
> Freelance Developer & Consultant
> Author of RavenDB in Action 
>
>
> On Thu, Jun 5, 2014 at 12:04 AM, Zennet Wheatcroft <
> zwhea...@atypon.com> wrote:
>
>>  Hi,
>>
>> I am looking for an efficient way to do inter-document queries in
>> Elasticsearch. Specifically, I want to count th

Re: Terms Aggregation and scope

2014-07-07 Thread Colin Goodheart-Smithe
Diederik,

To increase the number of terms returned by the terms aggregation you will 
need to add the 'size' parameter to your aggregation. The below curl 
command will return you the top 200 terms (ordered by descending doc_count).

curl -XGET 'http://localhost:9200/_search?pretty=true' -d '{ "size": 0, 
"aggs": { "datum_uitspraak_ymd": { "terms": { "field": 
"datum_uitspraak_ymd", "size" : 200 } } } }'

You may also find the following link to the documentation useful regarding 
the size parameter of the terms aggregation.

http://www.elasticsearch.org/guide/en/elasticsearch/reference/current/search-aggregations-bucket-terms-aggregation.html#_size_amp_shard_size


Hope this helps,

Colin

On Monday, 7 July 2014 13:09:52 UTC+1, Diederik Meijer wrote:
>
> Dear list,
>
> I need to create an aggregation by a specific field, named 
> "datum_uitspraak_ymd'. I am using the below curl command and it works fine 
> in a sense that it returns the aggregation listed below. While this result 
> seems OK enough, it seems that the keys listed in the aggregation are 
> limited to those listed in 10 records.
>
> As the number of unique values for this key is much higher than 10, it 
> seems that the aggregation's scope is global as far as it searches for 
> documents with a value identical to one of the 10 listed, but it is limited 
> as far as the key values used in the aggregation is concerned.
>
> How do I need to set up my curl command in order for the aggregation to 
> return more key-value pairs?
>
> Many thanks,
> Diederik
>
>
> Command:
> curl -XGET 'http://localhost:9200/_search?pretty=true' -d '{ "size": 0, 
> "aggs": { "datum_uitspraak_ymd": { "terms": { "field": 
> "datum_uitspraak_ymd" } } } }'
>
> Returns:
> "aggregations" : { "datum_uitspraak_ymd" : { "buckets" : [ { "key" : 
> "20121219", "doc_count" : 612 }, { "key" : "20120516", "doc_count" : 526 }, 
> { "key" : "20110601", "doc_count" : 472 }, { "key" : "20121218", 
> "doc_count" : 468 }, { "key" : "20090520", "doc_count" : 349 }, { "key" : 
> "20101222", "doc_count" : 274 }, { "key" : "20120711", "doc_count" : 272 }, 
> { "key" : "20090429", "doc_count" : 246 }, { "key" : "20120718", 
> "doc_count" : 230 }, { "key" : "20120425", "doc_count" : 226 } ] } } 
>
>



Re: cluster.routing.allocation.enable behavior (sticky shard allocation not working as expected)

2014-07-07 Thread Andrew Davidoff
On Mon, Jul 7, 2014 at 4:16 AM, Grégoire Seux  wrote:
> Andrew,
>
> Have you found a solution (or explanation) to your issue?
> We are using elasticsearch 1.1.1, what about you ?

Hi,

I haven't learned anything new. To be clear about my problem, I am
aware that I must re-enable routing after having disabled it. My issue
is that I expect all the UNASSIGNED shards to go back to the same
node, but some do not, only to get rebalanced back there later. I am
running elasticsearch 1.2.1.

Andy



Terms Aggregation and scope

2014-07-07 Thread Diederik Meijer
Dear list,

I need to create an aggregation on a specific field, named 
"datum_uitspraak_ymd". I am using the curl command below and it works fine, 
in the sense that it returns the aggregation listed below. While this result 
seems OK enough, the keys listed in the aggregation are limited to 10.

As the number of unique values for this key is much higher than 10, it 
seems that the aggregation's scope is global in that it searches all 
documents for a value identical to one of the 10 listed, but it is limited 
in how many key values are used in the aggregation.

How do I need to set up my curl command for the aggregation to 
return more key-value pairs?

Many thanks,
Diederik


Command:
curl -XGET 'http://localhost:9200/_search?pretty=true' -d '{ "size": 0, 
"aggs": { "datum_uitspraak_ymd": { "terms": { "field": 
"datum_uitspraak_ymd" } } } }'

Returns:
"aggregations" : { "datum_uitspraak_ymd" : { "buckets" : [ { "key" : 
"20121219", "doc_count" : 612 }, { "key" : "20120516", "doc_count" : 526 }, 
{ "key" : "20110601", "doc_count" : 472 }, { "key" : "20121218", 
"doc_count" : 468 }, { "key" : "20090520", "doc_count" : 349 }, { "key" : 
"20101222", "doc_count" : 274 }, { "key" : "20120711", "doc_count" : 272 }, 
{ "key" : "20090429", "doc_count" : 246 }, { "key" : "20120718", 
"doc_count" : 230 }, { "key" : "20120425", "doc_count" : 226 } ] } } 



Re: have we a way to use highlight and fuzzy together ?

2014-07-07 Thread Tanguy Bernard
I want to combine like this:

GET my_index/my_type/_search?pretty=true
{
  "size": 50,
  "query": {
    "multi_match": {
      "query": "my words",
      "fields": ["title_doc"]
    },
    "fuzzy": 0.2
  },
  "highlight": {
    "fields": {
      "title_doc": { "fragment_size": 30 }
    }
  },
  "sort": [
    { "date_source": { "order": "desc" } }
  ]
}

Can you help me?

Tanguy
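
One form the 1.x query DSL does accept is to move the fuzziness inside the 
multi_match options instead of making it a sibling of the query (a sketch 
reusing the field names above):

curl -XGET 'http://localhost:9200/my_index/my_type/_search?pretty=true' -d '{ "size": 50, "query": { "multi_match": { "query": "my words", "fields": ["title_doc"], "fuzziness": 0.2 } }, "highlight": { "fields": { "title_doc": { "fragment_size": 30 } } }, "sort": [ { "date_source": { "order": "desc" } } ] }'

Highlighting should then apply to the fuzzy matches as usual.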



How can I do intersection or union operation with two facets filter?

2014-07-07 Thread 闫旭
Dear All!
I have some docs:
{"field_A":"aaa","field_B":"bbb"}
{"field_A":"aaa","field_B":"ccc"}
{"field_A":"bbb","field_B":"bbb"}
{"field_A":"bbb","field_B":"bbb"}
{"field_A":"bbb","field_B":"eee"}
{"field_A":"aaa","field_B":""}
{"field_A":"ccc","field_B":""}
first step:
{
"query":{
"filtered" : {
"filter" : {
"bool" : {
"must" : {
"term" : { "field_B" : "bbb" }
}
}
}
}
},
"facets" : {
"tag" : {
"terms" : {
"field" : "field_A"
}
}
}
}
first result:
{
...
{"term":"aaa","count":1},
{"term":"bbb","count":2},
...
}
-
second step:
the second facets:
{
"query":{
"filtered" : {
"filter" : {
"bool" : {
"must" : {
"term" : { "field_B" : "" }
}
}
}
}
},
"facets" : {
"tag" : {
"terms" : {
"field" : "field_A"
}
}
}
}
second result:
{
...
{"term":"aaa","count":1}.
{"term":"ccc","count":1}
...
}
-
third step:
combine the two results with an intersection operation on "term":
{"term":"aaa","count":"I don't care about the count value."}

-
Now, how can I combine the three steps in one filtered facet or some other
method?


Thx All!!
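
The closest single-request form is probably one query with two facets, each 
carrying its own facet_filter; the term intersection itself still has to be 
done client-side. A sketch with the fields above (the second filter value is 
a placeholder, since it was elided in the post):

curl -XGET 'http://localhost:9200/_search?pretty=true' -d '{ "size": 0, "facets": { "A_where_B_bbb": { "terms": { "field": "field_A" }, "facet_filter": { "term": { "field_B": "bbb" } } }, "A_where_B_other": { "terms": { "field": "field_A" }, "facet_filter": { "term": { "field_B": "ccc" } } } } }'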



Re: Elasticsearch with azure cloud plugin

2014-07-07 Thread Itamar Syn-Hershko
This doesn't sound like it's Azure-specific. For one, I'd try to use ES
1.2.1 as there has been a lot of work in that area (of GC and threads).

I'd also try to avoid using the Azure plugin as long as possible and use
Unicast instead - I've just blogged about exactly that, see
http://code972.com/blog/2014/07/74-the-definitive-guide-for-elasticsearch-on-windows-azure
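
For reference, switching to unicast comes down to two settings in 
elasticsearch.yml on each node (the host list here is made up):

discovery.zen.ping.multicast.enabled: false
discovery.zen.ping.unicast.hosts: ["10.0.0.4", "10.0.0.5"]

The post above walks through the full setup.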

HTH,

--

Itamar Syn-Hershko
http://code972.com | @synhershko 
Freelance Developer & Consultant
Author of RavenDB in Action 


On Mon, Jul 7, 2014 at 10:35 AM, NetaN  wrote:

> Hello,
> We are using ES with the azure cloud plugin for node communication. Our
> current set-up is:
> 2 data nodes (hosted on Azure Ubuntu 14.04 VMs), 3.5 GB RAM, 2 cores (AKA
> medium VMs), ES version 1.0.0.0, 5 shards and 2 replicas.
> The storing and querying are done by Kibana and logstash, which are hosted on
> separate machines.
> Logstash connects to the cluster via the node protocol.
> The problem we are having is that for some unknown reason one of the data
> nodes leaves the cluster, no errors are reported in the log of the machine
> that left, and the node keeps running independently (weird, I know). When
> restarting the elasticsearch service on the lonely node, it rejoins the
> cluster. After the node rejoins the cluster everything seems OK until it or
> the other data node leaves the cluster again (meaning that there is no
> specific node that has the problem; each one of the nodes can suddenly
> leave the cluster).
> Did anyone encounter this problem using the azure plugin for node
> discovery?
> Any input on how to approach this issue?
> Thanks
> Neta.


What is exactly flush in node stats?

2014-07-07 Thread Peeyush Chandel
Hi,

When I look at the node stats of my cluster, I see something like this:

"flush" : {
  "total" : 30,
  "total_time_in_millis" : 6964
}

My translog flush interval is 180 minutes or 1200 MB, so I think flushes 
should happen based on these values.

But if I keep checking my node stats, the flush value (total) keeps 
increasing. What does that mean?



Re: have we a way to use highlight and fuzzy together ?

2014-07-07 Thread Tanguy Bernard
I want to combine like this:

GET my_index/my_type/_search?pretty=true
{
  "size": 50,
  "query": {
    "fuzzy": 0.2,
    "multi_match": {
      "query": "my words",
      "fields": ["title_doc"]
    }
  },
  "highlight": {
    "order": "date_doc",
    "fields": {
      "title_doc": { "fragment_size": 30 }
    }
  }
}

Can you help me?

Tanguy



Query DSL problem

2014-07-07 Thread 闫旭
Dear All!
Can I do intersection or union operations on facet filter results?



How to Know Progress of Backup And Restore

2014-07-07 Thread sowjanya
Hi,
I need progress information for backup and restore in Elasticsearch.
How can I find out how much data has been processed?

Please help me with this.
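
Two APIs report per-shard progress: the snapshot status API for backups, 
and the cat recovery API for restores. A sketch, with made-up repository 
and snapshot names:

curl -XGET 'http://localhost:9200/_snapshot/my_backup/snapshot_1/_status?pretty'
curl -XGET 'http://localhost:9200/_cat/recovery?v'

The status response includes per-shard file and byte counters, which is the 
closest thing to a progress indicator.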






Re: Elasticsearch, how to make view with it?

2014-07-07 Thread Villiers Tientcheu Ngandjeuu

Thanks, David!
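
For reference, a filtered alias along the lines David suggests below might 
look like this (a sketch; the index name and team field are made up):

curl -XPOST 'http://localhost:9200/_aliases' -d '{ "actions": [ { "add": { "index": "logs", "alias": "logs_team_x", "filter": { "term": { "team": "x" } } } } ] }'

Each team then queries its own alias, and the NGinx layer only has to 
restrict which alias each team can reach.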
On Monday, July 7, 2014 10:20:19 UTC+2, David Pilato wrote:
>
> Just answered on the French ML as well:
>
> You can use aliases on top of your indices and add an NGinx layer, for 
> example, to filter URLs per user/group.
>
>
>
> -- 
> *David Pilato* | *Technical Advocate* | *Elasticsearch.com*
> @dadoonet  | @elasticsearchfr 
> 
>
>
> On July 7, 2014 at 09:41:11, Villiers Tientcheu Ngandjeuu (tientche...@gmail.com) wrote:
>
> Hello dear!
> I get three kinds of logs to index with Elasticsearch, let's say X, Y and Z 
> for three different teams in the business! With SQL, it was possible to 
> make views for customers. How can I make views with Elasticsearch for the 
> teams? Or how can I restrict access to data between them: team X should only 
> have access to logs X, etc.
> Thanks for your help!


Re: Elasticsearch with azure cloud plugin

2014-07-07 Thread NetaN
Yes this is the case. I will post


On Mon, Jul 7, 2014 at 11:24 AM, dadoonet [via ElasticSearch Users] <
ml-node+s115913n405934...@n3.nabble.com> wrote:

> So you are saying that when a node suddenly disappears for whatever reason
> (network, GC…), it can't rejoin the cluster automatically, so you have
> to restart it?
>
> If so, could you open an issue in cloud-azure plugin repo and if possible
> attach logs from the both nodes?
>
> Thanks
>
> --
> *David Pilato* | *Technical Advocate* | *Elasticsearch.com*
> @dadoonet  | @elasticsearchfr
> 
>
>
> On July 7, 2014 at 09:35:34, NetaN ([hidden email]) wrote:
>
> Hello,
> We are using ES with the azure cloud plugin for node communication. Our
> current set-up is:
> 2 data nodes (hosted on Azure Ubuntu 14.04 VMs), 3.5 GB RAM, 2 cores (AKA
> medium VMs), ES version 1.0.0.0, 5 shards and 2 replicas.
> The storing and querying are done by Kibana and logstash, which are hosted on
> separate machines.
> Logstash connects to the cluster via the node protocol.
> The problem we are having is that for some unknown reason one of the data
> nodes leaves the cluster, no errors are reported in the log of the machine
> that left, and the node keeps running independently (weird, I know). When
> restarting the elasticsearch service on the lonely node, it rejoins the
> cluster. After the node rejoins the cluster everything seems OK until it
> or
> the other data node leaves the cluster again (meaning that there is no
> specific node that has the problem; each one of the nodes can suddenly
> leave the cluster).
> Did anyone encounter this problem using the azure plugin for node
> discovery?
> Any input on how to approach this issue?
> Thanks
> Neta.
>
>
>
>
>
> --
> View this message in context:
> http://elasticsearch-users.115913.n3.nabble.com/Elasticsearch-with-azure-cloud-plugin-tp4059340.html
> Sent from the ElasticSearch Users mailing list archive at Nabble.com.
>
> --
> You received this message because you are subscribed to the Google Groups
> "elasticsearch" group.
> To unsubscribe from this group and stop receiving emails from it, send an
> email to [hidden email]
> .
> To view this discussion on the web visit
> https://groups.google.com/d/msgid/elasticsearch/1404718530391-4059340.post%40n3.nabble.com.
>
> For more options, visit https://groups.google.com/d/optout.
>
>  --
> You received this message because you are subscribed to the Google Groups
> "elasticsearch" group.
> To unsubscribe from this group and stop receiving emails from it, send an
> email to [hidden email]
> .
> To view this discussion on the web visit
> https://groups.google.com/d/msgid/elasticsearch/etPan.53ba58f9.440badfc.2fae%40MacBook-Air-de-David.local
> 
> .
> For more options, visit https://groups.google.com/d/optout.
>
>
> --
>  If you reply to this email, your message will be added to the discussion
> below:
>
> http://elasticsearch-users.115913.n3.nabble.com/Elasticsearch-with-azure-cloud-plugin-tp4059340p4059344.html
>  To unsubscribe from Elasticsearch with azure cloud plugin, click here
> 
> .
> NAML
> 
>






Re: Elasticsearch with azure cloud plugin

2014-07-07 Thread David Pilato
So you are saying that when a node suddenly disappears for whatever reason 
(network, GC…), it can't rejoin the cluster automatically, so you have to 
restart it?

If so, could you open an issue in the cloud-azure plugin repo and, if 
possible, attach logs from both nodes?
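
In the meantime, a quick way to see what each node currently believes — a 
minimal sketch, with node1/node2 as placeholders for your two VMs:

# Run against each node. If the cluster has split, each node will
# report itself as master of its own one-node cluster.
curl -XGET 'http://node1:9200/_cluster/health?pretty'
curl -XGET 'http://node1:9200/_cat/nodes?v'
curl -XGET 'http://node2:9200/_cluster/health?pretty'
curl -XGET 'http://node2:9200/_cat/nodes?v'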

Thanks

-- 
David Pilato | Technical Advocate | Elasticsearch.com
@dadoonet | @elasticsearchfr


On 7 July 2014 at 09:35:34, NetaN (n...@biocatch.com) wrote:

Hello,
We are using ES with the azure cloud plugin for node communication. Our 
current set-up is:
2 data nodes (hosted on Azure Ubuntu 14.04 VMs), 3.5 GB RAM, 2 cores (AKA 
medium VMs), ES version 1.0.0.0, 5 shards and 2 replicas.
Storing and querying are done by Kibana and Logstash, which are hosted on 
separate machines.
Logstash connects to the cluster via the node protocol.
The problem we are having is that, for some unknown reason, one of the data 
nodes leaves the cluster. No errors are reported in the log of the machine 
that left, and the node keeps running independently (weird, I know). When we 
restart the elasticsearch service on the lonely node, it rejoins the cluster. 
After the node rejoins the cluster, everything seems OK until it or the other 
data node leaves the cluster again (meaning there is no specific node that 
has the problem; each of the nodes can suddenly leave the cluster).
Did anyone encounter this problem using the azure plugin for node discovery?
Any input on how to approach this issue?
Thanks
Neta.


Re: Elasticsearch, how to make view with it?

2014-07-07 Thread David Pilato
Just answered on the French ML as well:

You can use aliases on top of your indices and add an NGinx layer, for 
example, to filter URLs per user/group.
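
A minimal sketch of the alias side, assuming a single "logs" index with a 
"team" field (index, alias and field names here are placeholders; ES 1.x 
syntax):

# Create a filtered alias per team: searches through "team-x-logs"
# only ever see documents whose "team" field is "x".
curl -XPOST 'localhost:9200/_aliases' -d '{
  "actions": [
    { "add": { "index": "logs", "alias": "team-x-logs",
               "filter": { "term": { "team": "x" } } } }
  ]
}'

# Team X then searches the alias instead of the raw index:
curl -XGET 'localhost:9200/team-x-logs/_search?q=failed'

On the NGinx side you would then only proxy URLs under /team-x-logs/ to 
users authenticated as team X, and deny direct access to the underlying 
index.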



-- 
David Pilato | Technical Advocate | Elasticsearch.com
@dadoonet | @elasticsearchfr


On 7 July 2014 at 09:41:11, Villiers Tientcheu Ngandjeuu 
(tientcheuvilli...@gmail.com) wrote:

Hello!
I have three kinds of logs to index with Elasticsearch, let's say X, Y and 
Z, for three different teams in the business. With SQL, it was possible to 
create views for customers. How can I create views with Elasticsearch for 
the teams? Or how can I restrict access to the data between the teams, so 
that team X only has access to logs X, etc.?
Thanks for your help!


RE: cluster.routing.allocation.enable behavior (sticky shard allocation not working as expected)

2014-07-07 Thread Grégoire Seux
Andrew,

Have you found a solution (or explanation) to your issue?
We are using Elasticsearch 1.1.1; what about you?
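
For anyone finding this thread later, the setting under discussion is 
toggled like this — a sketch with a placeholder host, 1.x transient-settings 
syntax:

# Disable shard allocation before taking a node down...
curl -XPUT 'localhost:9200/_cluster/settings' -d '{
  "transient": { "cluster.routing.allocation.enable": "none" }
}'

# ...and re-enable it afterwards. The expectation ("sticky" allocation)
# is that shards return to the nodes that already hold copies of them.
curl -XPUT 'localhost:9200/_cluster/settings' -d '{
  "transient": { "cluster.routing.allocation.enable": "all" }
}'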

-- 
Grégoire



Elasticsearch, how to make view with it?

2014-07-07 Thread Villiers Tientcheu Ngandjeuu
Hello!
I have three kinds of logs to index with Elasticsearch, let's say X, Y and 
Z, for three different teams in the business. With SQL, it was possible to 
create views for customers. How can I create views with Elasticsearch for 
the teams? Or how can I restrict access to the data between the teams, so 
that team X only has access to logs X, etc.?
Thanks for your help!



Elasticsearch with azure cloud plugin

2014-07-07 Thread NetaN
Hello,
We are using ES with the azure cloud plugin for node communication. Our 
current set-up is:
2 data nodes (hosted on Azure Ubuntu 14.04 VMs), 3.5 GB RAM, 2 cores (AKA 
medium VMs), ES version 1.0.0.0, 5 shards and 2 replicas.
Storing and querying are done by Kibana and Logstash, which are hosted on 
separate machines.
Logstash connects to the cluster via the node protocol.
The problem we are having is that, for some unknown reason, one of the data 
nodes leaves the cluster. No errors are reported in the log of the machine 
that left, and the node keeps running independently (weird, I know). When we 
restart the elasticsearch service on the lonely node, it rejoins the cluster. 
After the node rejoins the cluster, everything seems OK until it or the other 
data node leaves the cluster again (meaning there is no specific node that 
has the problem; each of the nodes can suddenly leave the cluster).
Did anyone encounter this problem using the azure plugin for node discovery?
Any input on how to approach this issue?
Thanks
Neta.
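
If both nodes are master-eligible, this pattern (a node keeps running on its 
own and only rejoins after a restart) is the classic split-brain symptom. A 
common guard, sketched here for a two-node setup with a placeholder host 
(values are illustrative, not from this thread):

# On a two-node cluster, require both master-eligible nodes before a
# master can be elected, so a cut-off node stops acting alone.
# (Can equally be set as discovery.zen.minimum_master_nodes in
# elasticsearch.yml on both nodes.)
curl -XPUT 'localhost:9200/_cluster/settings' -d '{
  "persistent": { "discovery.zen.minimum_master_nodes": 2 }
}'

The trade-off with only two master-eligible nodes: requiring a quorum of 2 
means no master can be elected while either node is down.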

 


