Re: Splunk vs. Elasticsearch performance?

2014-06-19 Thread joergpra...@gmail.com
You are correct that Elasticsearch comes with developer settings -
that is exactly what a packaged ES is meant for.

If you run into issues when configuring and setting up ES for critical use, it
would be nice to post them so others can find help too, and maybe share
their solutions, because there are ES installations that run
successfully in critical environments.

By just quoting the "hate" of dev teams, it is rather impossible for me to
learn the reason why this is so. Facts matter more than emotions when
fixing software issues. The power of open source is that such
issues can be fixed with the help of a public discussion in the community. With
closed software products, you cannot rely on issues being discussed
publicly to find the best solutions.

Jörg



On Thu, Jun 19, 2014 at 2:48 PM, Thomas Paulsen 
wrote:

> We had a 2.2TB/day installation of Splunk and ran it on VMware with 12
> indexers and 2 search heads. Each indexer had 1000 IOPS guaranteed assigned.
> The system is slow but OK to use.
>
> We tried Elasticsearch and we were able to get the same performance with
> the same number of machines. Unfortunately, with Elasticsearch you need
> almost double the amount of storage, plus a LOT of patience to make it run. It
> took us six months to set it up properly, and even now the system is quite
> buggy and unstable, and from time to time we lose data with Elasticsearch.
>
> I don't recommend ELK for a critical production system; for just dev work
> it is OK, if you don't mind the hassle of setting up and operating it. The
> costs you save by not buying a Splunk license you have to invest into
> consultants to get it up and running. Our dev teams hate Elasticsearch and
> prefer Splunk.
>
> Am Samstag, 19. April 2014 00:07:44 UTC+2 schrieb Mark Walkom:
>>
>> That's a lot of data! I don't know of any installations that big but
>> someone else might.
>>
>> What sort of infrastructure are you running splunk on now, what's your
>> current and expected retention?
>>
>> Regards,
>> Mark Walkom
>>
>> Infrastructure Engineer
>> Campaign Monitor
>> email: ma...@campaignmonitor.com
>> web: www.campaignmonitor.com
>>
>>
>> On 19 April 2014 07:33, Frank Flynn  wrote:
>>
>>> We have a large Splunk instance.  We load about 1.25 TB of logs a day.
>>> We have about 1,300 loaders (servers that collect and load logs - they may
>>> do other things too).
>>>
>>> As I look at Elasticsearch / Logstash / Kibana does anyone know of a
>>> performance comparison guide?  Should I expect to run on very similar
>>> hardware?  More? or Less?
>>>
>>> Sure it depends on exactly what we're doing, the exact queries and the
>>> frequency we'd run them but I'm trying to get any kind of idea before we
>>> start.
>>>
>>> Are there any white papers or other documents about switching?  It seems
>>> an obvious choice, but I can find only very few performance comparisons
>>> (I did see that Elasticsearch just hired "the former VP of Products at
>>> Splunk, Gaurav Gupta" - but there were few numbers in that article either).
>>>
>>> Thanks,
>>> Frank
>>>


Re: Extremely slow indexing -- java throwing http exception errors

2014-06-19 Thread Alexander Reelsen
Hey.

judging from the exception, this looks like an unstable network connection?
Are you using persistent HTTP connections? Pinging the nodes from each other
is not a problem, I guess?
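
One quick way to check the connection behavior (a sketch, reusing the IP from
your log; the interesting numbers are http.current_open and http.total_opened
in the node stats):

curl -s 'http://192.168.6.21:9200/_nodes/stats/http?pretty'

If total_opened keeps climbing fast while current_open stays low, the clients
are opening a fresh connection per request instead of keeping them alive -
which would fit the once-per-second resets.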


--Alex


On Thu, Jun 19, 2014 at 12:12 AM,  wrote:

> Hello all,
>
> So here's the issue: our cluster was previously very underutilized as far
> as resource consumption goes, and after some config changes (see the complete config
> below) we were able to push resource consumption up, but we are still
> indexing documents at the same sluggish rate of < 400 docs/second.
>
> Redis and Logstash are definitely not the bottlenecks, and the indexing
> seems to be growing exponentially worse as we pull in more data.  We are
> using Elasticsearch v1.1.1.
>
> The Java HTTP exception errors would definitely explain the sluggishness,
> as there seems to be a socket timeout every second, like clockwork - but
> I'm at a loss as to what could be causing the errors in the first place.
>
> We are running Redis, Logstash, Kibana and the ES master (no data) on one
> node, and have our Elasticsearch data instance on another node.  Network
> latency is definitely not so atrocious that it would be an outright
> bottleneck, and data gets to the secondary node fast enough - but it gets
> backed up in indexing.
>
> Any help would greatly be appreciated, and I thank you all in advance!
>
> ### ES CONFIG ###
>
>
> index.indexing.slowlog.threshold.index.warn: 10s
> index.indexing.slowlog.threshold.index.info: 5s
> index.indexing.slowlog.threshold.index.debug: 2s
> index.indexing.slowlog.threshold.index.trace: 500ms
>
>
>
> monitor.jvm.gc.young.warn: 1000ms
> monitor.jvm.gc.young.info: 700ms
> #monitor.jvm.gc.young.debug: 400ms
>
> monitor.jvm.gc.old.warn: 10s
> monitor.jvm.gc.old.info: 5s
> #monitor.jvm.gc.old.debug: 2s
> cluster.name: iislog-cluster
> node.name: "VM-ELKIIS"
> discovery.zen.ping.multicast.enabled: true
> discovery.zen.ping.unicast.hosts: ["192.168.6.145"]
> discovery.zen.ping.timeout: 5
> node.master: true
> node.data: false
> index.number_of_shards: 10
> index.number_of_replicas: 0
> bootstrap.mlockall: true
> index.refresh_interval: 30
> indices.memory.index_buffer_size: 50%
> index.translog.flush_threshold_ops: 5
> index.store.type: mmapfs
> index.store.compress.stored: true
>
> threadpool.search.type: fixed
> threadpool.search.size: 20
> threadpool.search.queue_size: 100
>
> threadpool.index.type: fixed
> threadpool.index.size: 20
> threadpool.index.queue_size: 100
>
> ### JAVA ERRORS IN ES LOG ###
>
> [2014-06-18 09:39:09,565][DEBUG][http.netty   ] [VM-ELKIIS]
> Caught exception while handling client http traffic, closing connection
> [id: 0x7561184c, /192.168.6.3:6206 => /192.168.6.21:9200]
> java.io.IOException: Connection reset by peer
> at sun.nio.ch.FileDispatcherImpl.read0(Native Method)
> at sun.nio.ch.SocketDispatcher.read(SocketDispatcher.java:39)
> at sun.nio.ch.IOUtil.readIntoNativeBuffer(IOUtil.java:223)
> at sun.nio.ch.IOUtil.read(IOUtil.java:192)
> at sun.nio.ch.SocketChannelImpl.read(SocketChannelImpl.java:379)
> at
> org.elasticsearch.common.netty.channel.socket.nio.NioWorker.read(NioWorker.java:64)
> at
> org.elasticsearch.common.netty.channel.socket.nio.AbstractNioWorker.process(AbstractNioWorker.java:108)
> at
> org.elasticsearch.common.netty.channel.socket.nio.AbstractNioSelector.run(AbstractNioSelector.java:318)
> at
> org.elasticsearch.common.netty.channel.socket.nio.AbstractNioWorker.run(AbstractNioWorker.java:89)
> at
> org.elasticsearch.common.netty.channel.socket.nio.NioWorker.run(NioWorker.java:178)
> at
> org.elasticsearch.common.netty.util.ThreadRenamingRunnable.run(ThreadRenamingRunnable.java:108)
> at
> org.elasticsearch.common.netty.util.internal.DeadLockProofWorker$1.run(DeadLockProofWorker.java:42)
> at
> java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
> at
> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
> at java.lang.Thread.run(Thread.java:744)
>

storage use by attachment plugin

2014-06-19 Thread Izam Fahmi Alias
Dear All,

I have about 20GB of documents, and I want to index all of the document
content using the attachment plugin. My question is: what will the size of
the index be? Will it also be around 20GB?

Thank you



How does shingle filter work on match_phrase in query phase?

2014-06-19 Thread 陳智清
How does the shingle filter work with match_phrase in the query phase?

After analyzing the phrase "t1 t2 t3", the shingle filter produced five tokens:
  t1
  t2
  t3
  "t1 t2"
  "t2 t3"

Will match_phrase still give "t1 t2 t3" a match? How does it work? Thank you.
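
For reference, I looked at the tokens with the _analyze API (a sketch using a
default shingle filter, not my exact analyzer):

curl -XGET 'localhost:9200/_analyze?tokenizer=whitespace&filters=shingle&pretty' -d 't1 t2 t3'

My understanding so far (please correct me): the unigrams keep consecutive
positions (t1=0, t2=1, t3=2) and each shingle sits at the position of its
first token, so a match_phrase for "t1 t2 t3" could still match via the
unigram positions.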



Include specific terms that don't exist on a particular field when applying terms aggregations

2014-06-19 Thread Raghav Jalan
Hey,

We all know that the terms aggregator groups documents by a field and gives
us the count for each term. But suppose I want to display an extra term that
doesn't exist in the field counts and show it with a count of 0. Do we have
any provision for that?
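
To illustrate the shape of what I am after (index/field names hypothetical) -
the closest thing I know of is min_doc_count, though as far as I understand
it only surfaces terms that exist somewhere in the field, not arbitrary
extra terms:

curl -XGET 'localhost:9200/myindex/_search?pretty' -d '{
  "size": 0,
  "aggs": {
    "statuses": {
      "terms": { "field": "status", "min_doc_count": 0 }
    }
  }
}'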

-Raghav



Phrase Suggester with multiple Fields.

2014-06-19 Thread Bruno Miranda
I am using the phrase suggester to implement did-you-mean functionality. My
source field is named did_you_mean_source, which is a combination of first
and last name with a space in the middle.

When I search for, say, "allex blak" I do get fairly decent suggestions,
including the "alex black" I am hoping to get.

*The problem is I also get "alex [any other first and/or last name present
in the index that has a suggestion for blak]". Such a suggestion doesn't
correspond to any actual search result in my index.*

So you search for a first and last name which you happen to misspell, but
the suggestions propose another spelling which doesn't actually return any
index search results. In other words, I need to somehow preserve the
identity that one is a first name and the other is a last name.

What I am trying to achieve is did-you-mean functionality where I can
type a first and last name and get suggestions based off of the
did_you_mean_source field, but only where both typed words match a document.


Here's my analyzer:
did_you_mean:
  type: custom
  tokenizer: standard
  filter: ["lowercase", "trim",]


Here's the suggest part of my query:

"suggest": {
  "text": "allex blak",
  "did_you_mean" : {
"phrase" : {
  "field": "did_you_mean_source",
  "real_word_error_likelihood": 0.90,
  "max_errors": 1,
  "direct_generator" : [{
"field" : "did_you_mean_source",
"suggest_mode" : "always",
"min_word_length" : 3,
"size": 5,
"prefix_length": 2,
"min_doc_freq": 1
  }]
}
  }
}



Here's what my mappings look like: 

{
  "development_search_suggestions": {
"mappings": {
  "search_suggestion": {
"_all": {
  "enabled": false
},
"properties": {
  "did_you_mean_source": {
"type": "string",
"analyzer": "did_you_mean"
  },
  "keywords": {
"type": "string",
"index_options": "offsets",
"analyzer": "full",
"fields": {
  "partial": {
"type": "string",
"index_options": "offsets",
"index_analyzer": "partial_auto_suggest",
"search_analyzer": "full_with_auto_suggest_synonyms"
  },
  "synonymic": {
"type": "string",
"index_analyzer": "full_with_auto_suggest_synonyms",
"search_analyzer": "full"
  }
}
  },
  "keywords_auxiliary": {
"type": "string",
"index_options": "offsets",
"analyzer": "full",
"fields": {
  "partial": {
"type": "string",
"index_options": "offsets",
"index_analyzer": "partial_auto_suggest",
"search_analyzer": "full_with_auto_suggest_synonyms"
  },
  "synonymic": {
"type": "string",
"index_analyzer": "full_with_auto_suggest_synonyms",
"search_analyzer": "full"
  }
}
  }
}
  }
}
  }
}


*A few examples:*

Say my index contains the following documents with first and last name in 
respective orders:
bruno miranda
miranda bella
bran scott



If I search for "brno miranda" I should get a suggestion for "bruno
miranda", but I should not get a suggestion for "bran miranda", because that
document doesn't exist in the db. It's simply a mismatch of one document's
first name + a different document's last name.
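
One direction I am experimenting with (a sketch; I believe later 1.x releases
added a collate option to the phrase suggester that re-runs each suggestion
as a query and, by default, drops suggestions with no matching documents -
I have not pinned down the exact version that introduced it):

"suggest": {
  "text": "allex blak",
  "did_you_mean": {
    "phrase": {
      "field": "did_you_mean_source",
      "real_word_error_likelihood": 0.90,
      "max_errors": 1,
      "collate": {
        "query": {
          "match_phrase": {
            "did_you_mean_source": "{{suggestion}}"
          }
        }
      }
    }
  }
}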



Re: Logstash limitting ElasticSearch heap

2014-06-19 Thread Antonio Augusto Santos
Thanks for your response Mark.

I think I've finally fine-tuned my scenario.
For starters, it helped me A LOT to set Xms on Logstash to the same value
as LS_HEAP_SIZE. It really reduced the GC.

Second, I followed some tips from
http://jablonskis.org/2013/elasticsearch-and-logstash-tuning/index.html and
https://blog.codecentric.de/en/2014/05/elasticsearch-indexing-performance-cheatsheet/
for increasing my indexing speed (search is secondary here).

After that I increased the number of workers on LS (I had to change
/etc/init.d/logstash, since it was not respecting LS_WORKERS in
/etc/sysconfig/logstash). This made a big difference, and I could finally
see that the workers were my bottleneck (with 3 workers my 4 cores
were hitting 100% usage all the time). So I increased my VM cores to 8, set
LS_WORKERS to 6, and set workers to 3 on the elasticsearch output. The
major boost came from these changes. And I could see LS is heavily
CPU-dependent.

Last, but not least, I changed my log strategy. Instead of saving the logs
to disk with syslog and reading them back with LS, I set up a scenario
like http://cookbook.logstash.net/recipes/central-syslog/ and got myself a
redis server as temp storage (for these logs I don't need files on disk,
ES will do just fine).

After that I've bumped my indexing speed from about 500 TPS to about 4k
TPS.

Not bad ;)
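
For completeness, the ES-side change from Mark's earlier tip looked roughly
like this (a sketch; the index name is just an example of a daily logstash
index):

curl -XPUT 'localhost:9200/logstash-2014.06.19/_settings' -d '
{ "index": { "refresh_interval": "60s" } }'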

On Thursday, June 19, 2014 5:32:56 AM UTC-3, Mark Walkom wrote:
>
> Lots of GC isn't bad, you want to see a lot of small GCs rather than 
> stop-the-world sort of ones which can bring your cluster down.
>
> You can try increasing the index refresh interval 
> - index.refresh_interval. If you don't require "live" access, then 
> increasing it to 60 seconds or more will help.
> If you can gist/pastebin a bit more info on your cluster, node specs, 
> versions, total indexes and size etc it may help.
>
> Regards,
> Mark Walkom
>
> Infrastructure Engineer
> Campaign Monitor
> email: ma...@campaignmonitor.com 
> web: www.campaignmonitor.com
>  
>
> On 18 June 2014 22:53, Antonio Augusto Santos  > wrote:
>
>> Hello,
>>
>> I think I'm hitting some kind of wall here... I'm running logstash on a
>> syslog server. It receives logs from about 150 machines and also a LOT of
>> iptables logs, and sends them to ElasticSearch. But I think I'm not
>> hitting all the speed that I should. My Logstash throughput tops out at
>> about 1,000 events/s, and it looks like my ES servers (I have 2) are really
>> lightly loaded.
>>
>> On logstash I have three configs (syslog, ossec and iptables), so I get
>> three new nodes in my cluster. I've set the LS heap size to 2G, but
>> according to bigdesk, the ES module is getting only about 150MB, and it's
>> generating a LOT of GC.
>>
>> Bellow the screenshot for big desk:
>>
>> [image: bigdesk] 
>> 
>>
>> And here the logstash process I'm running:
>>
>> *#ps -ef | grep logstash
>> logstash 13371 1 99 14:42 pts/000:29:37 /usr/bin/java 
>> -Djava.io.tmpdir=/opt/logstash/tmp -Xmx2g -XX:+UseParNewGC 
>> -XX:+UseConcMarkSweepGC -Djava.awt.headless=true 
>> -XX:CMSInitiatingOccupancyFraction=75 -XX:+UseCMSInitiatingOccupancyOnly 
>> -XX:+PrintGCDetails -XX:+PrintGCTimeStamps -XX:+PrintClassHistogram 
>> -XX:+PrintTenuringDistribution -XX:+PrintGCApplicationStoppedTime 
>> -Xloggc:./logstash-gc.log -jar 
>> /opt/logstash/vendor/jar/jruby-complete-1.7.11.jar -I/opt/logstash/lib 
>> /opt/logstash/lib/logstash/runner.rb agent -f /etc/logstash/conf.d -l 
>> /var/log/logstash/logstash.log*
>>
>>
>>  
>>
>>  My syslog/LS memory usage seems very light as well (it's a 4-core VM), but
>> the logstash process is always topping out at about 150% - 200% CPU
>>
>>
>> *# free -m
>>  total   used   free sharedbuffers cached
>> Mem:  7872   2076   5795  0 39   1502
>> -/+ buffers/cache:534   7337
>> Swap: 1023  8   1015
>> # uptime
>>  15:02:04 up 23:52,  1 user,  load average: 1.39, 1.12, 0.96*
>>
>>  
>>
>> Any ideas what I can do to increase the indexing performance?
>>

Re: Splunk vs. Elasticsearch performance?

2014-06-19 Thread Mark Walkom
I'd be interested in knowing what problems you had with ELK, if you don't
mind sharing.

I understand the ease of Splunk, but ELK isn't that difficult if you have
some in-house Linux skills.

Regards,
Mark Walkom

Infrastructure Engineer
Campaign Monitor
email: ma...@campaignmonitor.com
web: www.campaignmonitor.com


On 19 June 2014 22:48, Thomas Paulsen  wrote:

> We had a 2.2TB/day installation of Splunk and ran it on VMware with 12
> indexers and 2 search heads. Each indexer had 1000 IOPS guaranteed assigned.
> The system is slow but OK to use.
>
> We tried Elasticsearch and we were able to get the same performance with
> the same number of machines. Unfortunately, with Elasticsearch you need
> almost double the amount of storage, plus a LOT of patience to make it run. It
> took us six months to set it up properly, and even now the system is quite
> buggy and unstable, and from time to time we lose data with Elasticsearch.
>
> I don't recommend ELK for a critical production system; for just dev work
> it is OK, if you don't mind the hassle of setting up and operating it. The
> costs you save by not buying a Splunk license you have to invest into
> consultants to get it up and running. Our dev teams hate Elasticsearch and
> prefer Splunk.
>
> Am Samstag, 19. April 2014 00:07:44 UTC+2 schrieb Mark Walkom:
>>
>> That's a lot of data! I don't know of any installations that big but
>> someone else might.
>>
>> What sort of infrastructure are you running splunk on now, what's your
>> current and expected retention?
>>
>> Regards,
>> Mark Walkom
>>
>> Infrastructure Engineer
>> Campaign Monitor
>> email: ma...@campaignmonitor.com
>> web: www.campaignmonitor.com
>>
>>
>> On 19 April 2014 07:33, Frank Flynn  wrote:
>>
>>> We have a large Splunk instance.  We load about 1.25 TB of logs a day.
>>> We have about 1,300 loaders (servers that collect and load logs - they may
>>> do other things too).
>>>
>>> As I look at Elasticsearch / Logstash / Kibana does anyone know of a
>>> performance comparison guide?  Should I expect to run on very similar
>>> hardware?  More? or Less?
>>>
>>> Sure it depends on exactly what we're doing, the exact queries and the
>>> frequency we'd run them but I'm trying to get any kind of idea before we
>>> start.
>>>
>>> Are there any white papers or other documents about switching?  It seems
>>> an obvious choice, but I can find only very few performance comparisons
>>> (I did see that Elasticsearch just hired "the former VP of Products at
>>> Splunk, Gaurav Gupta" - but there were few numbers in that article either).
>>>
>>> Thanks,
>>> Frank
>>>


Re: Mapping for Java HashMap

2014-06-19 Thread rpras82
Hi,

How would I define a custom mapping if I wanted to index/store only the 
keys in the map and discard the values?
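
To illustrate what I am after (field names are hypothetical): keep the map
itself out of the index entirely and add a sibling field that my client code
fills with just the keys before indexing -

"properties": {
  "hashMap":      { "type": "object", "enabled": false },
  "hashMap_keys": { "type": "string", "index": "not_analyzed" }
}

As I read the object-type docs, "enabled": false keeps the map in _source
without indexing anything from it, and my code would copy map.keySet() into
hashMap_keys. Is there a way to do this without the client-side step?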

Prasanna

On Monday, March 17, 2014 2:28:39 AM UTC-7, Tomislav Poljak wrote:
>
> Hi, 
> I think first you need to define what the requirement (or
> expectation) of the hashMap (sub)object in the index is -> do you need to search
> on key/value pairs?
>
> Like, for example, query = 'hashMap.N_10290607:XY' ?
>
>
> If not - if you only need to store the JSON serialization of the hashmap as
> part of a bigger object inside the index (_source) to retrieve it later - you
> can define the 'hashMap' (sub)object not to expand its mapping dynamically
> ("dynamic" : false), like below:
>
> ...
> "properties" : {
>   "hashMap" : {
>     "dynamic" : false,
>     "properties" : {
>       ...
>
>
> This is done with a custom mapping. If you do not define a custom
> mapping, elasticsearch will, by default, create and expand the mapping for
> hashMap for all keys, which will sooner or later create issues in the
> index. For more details check
>
> http://www.elasticsearch.org/guide/en/elasticsearch/reference/current/mapping-object-type.html#_dynamic
>  
>
> If you don't know how to create the mapping in the first place, you can get
> the existing mapping
> (
> http://www.elasticsearch.org/guide/en/elasticsearch/reference/current/indices-get-mapping.html),
>
> remove the fields added/recognised by elasticsearch automatically, define
> "dynamic" : false for the 'hashMap' object, and use the new mapping when
> reindexing the data.
>
>
> Hope this helps, 
>
> Tomislav 
>
> 2014-03-12 19:37 GMT+01:00  >: 
> > Hey Peter, 
> > 
> > Did you find a resolution for Hashmap mappings? Looking for the same 
> thing. 
> > 
> > Thanks, 
> > Sangita. 
> > 
> > 
> > On Tuesday, January 14, 2014 9:03:35 AM UTC-5, Peter Webber wrote: 
> >> 
> >> 
> >> Hi Alex, 
> >> 
> >> thanks for your reply first of all. The problem is that I will have 
> >> millions of keys, and that ES creates a mapping for each. That will 
> bloat 
> >> the mappings data structure and probably lead to some memory and/or 
> >> performance issues somewhere (I don't know enough about ES internals, 
> to 
> >> know precisely where, but it cannot be good to have a few million 
> entries in 
> >> the mapping where one would do.) 
> >> 
> >> Hope that helps! 
> >> Peter 
> >> 
> >> 
> >> 
> >> Am Dienstag, 14. Januar 2014 13:01:22 UTC+1 schrieb Alexander Reelsen: 
> >>> 
> >>> Hey Peter, 
> >>> 
> >>> can you tell me, where your problem with the above approach actually 
> is? 
> >>> You feed a number of key/value pairs into elasticsearch, each key and 
> each 
> >>> value is evaluated by its type and then put into the mapping, as each 
> key 
> >>> becomes an own field in elasticsearch, which can be searched for. 
> Wondering 
> >>> why this is a problem for you? Or why do you want to avoid that? 
> >>> 
> >>> Also, where and how do you want to change the mapping to? 
> >>> 
> >>> I am a bit confused what and why you are expecting to be different 
> than 
> >>> it actually is. Maybe you should not think in java data structures but 
> >>> rather in JSON, which is being indexed and needs to create a mapping 
> in 
> >>> order to be able to search for it. Happy to help, if I understand what 
> you 
> >>> are trying to do. Please elaborate. 
> >>> 
> >>> 
> >>> --Alex 
> >>> 
> >>> 
> >>> On Mon, Jan 13, 2014 at 11:15 AM, Oliver B. Fischer 
> >>>  wrote: 
>  
>  Hi Peter, 
>  
>  ES allows you to defined dynamic mappings there you can determine the 
>  mapping of a property based on the evaluation of some conditions. 
>  
>  
>  
> http://www.elasticsearch.org/guide/en/elasticsearch/reference/current/mapping-dynamic-mapping.html
>  
>  
>  Oliver 
>  
>  Am 12.01.14 17:16, schrieb Peter Webber: 
>  
> > 
> > I had a look at the mapping ES created automatically for one of my 
> > indices, and found something that's not quite right: 
> > 
> >  
> >   "annotations" : {
> >     "properties" : {
> >       "ids" : {
> >         "properties" : {
> >           "hashMap" : {
> >             "properties" : {
> >               "N_10290607" : { "type" : "double" },
> >               "A_1038408" : { "type" : "double" },
> >               "A_11585994" : { "type" : "double" },
> >               "B_1245677" : { "type" : "double" },
> >               "B_1269810" : { "type" : "double" },
> >               "C_15680034" : { "type" : "double" },
> >               "N_1654171" : { "type" : "double" },
> > ...
> > 
> > I use Gson to convert Java classes to JSON and then directly put them
> > into ES. One of the classes I use has a HashMap

Re: How to find the number of authors who have written between 2-3 books?

2014-06-19 Thread Itamar Syn-Hershko
This is a Map/Reduce operation; you'll be better off maintaining a
ref-count document, IMO, than trying to hack the aggregations framework to
support this.

Another reason for doing it that way is that in a distributed environment some
aggregations can't be computed to an exact value - the terms bucketing is
one example. So if you need exact values, I'd go for a model that provides them.
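
Something along these lines (a sketch; index/type names are examples, and the
update script assumes dynamic scripting is enabled - 1.2+ disables it by
default): bump a per-author counter on every book you index, then a simple
range query gives an exact answer:

curl -XPOST 'localhost:9200/authors/refcount/Mike/_update' -d '{
  "script": "ctx._source.books += 1",
  "upsert": { "books": 1 }
}'

curl -XGET 'localhost:9200/authors/refcount/_search?search_type=count' -d '{
  "query": { "range": { "books": { "gte": 2, "lte": 3 } } }
}'

The total hits of the second call is the number of authors with 2-3 books,
exact regardless of sharding.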

--

Itamar Syn-Hershko
http://code972.com | @synhershko 
Freelance Developer & Consultant
Author of RavenDB in Action 


On Fri, Jun 20, 2014 at 1:34 AM, Mike  wrote:

> Assume each document is a book:
> { title: "A", author: "Mike" }
> { title: "B", author: "Mike" }
> { title: "C", author: "Mike" }
> { title: "D", author: "Mike" }
>
> { title: "E", author: "John" }
> { title: "F", author: "John" }
> { title: "G", author: "John" }
>
> { title: "H", author: "Joe" }
> { title: "I", author: "Joe" }
>
> { title: "J", author: "Jack" }
>
>
> What is the best way to find the number of authors who have written between
> 2-3 books?  In this case it would be 2: John and Joe.
>
> I know I can do a terms aggregation on author, set size to be very very
> large, and then on the client side traverse through the thousands of
> authors and count how many had between 2-3.  Is there a more efficient way
> to do this?  The cardinality aggregation is almost what I want, if only I
> could specify a min and max term count.
>
>


How to find the number of authors who have written between 2-3 books?

2014-06-19 Thread Mike
Assume each document is a book:  
{ title: "A", author: "Mike" }
{ title: "B", author: "Mike" }
{ title: "C", author: "Mike" }
{ title: "D", author: "Mike" }

{ title: "E", author: "John" }
{ title: "F", author: "John" }
{ title: "G", author: "John" }

{ title: "H", author: "Joe" }
{ title: "I", author: "Joe" }

{ title: "J", author: "Jack" }


What is the best way to find the number of authors who have written between
2-3 books?  In this case it would be 2: John and Joe.

I know I can do a terms aggregation on author, set the size to be very, very
large, and then on the client side traverse the thousands of
authors and count how many had between 2-3.  Is there a more efficient way
to do this?  The cardinality aggregation is almost what I want, if only I
could specify a min and max term count.
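
For reference, the brute-force version I mean (a sketch; if I read the docs
right, size: 0 returns all terms here, and min_doc_count: 2 at least prunes
the one-book authors server-side, but I still have to count the 2-3 buckets
client-side):

curl -XGET 'localhost:9200/books/_search?search_type=count' -d '{
  "aggs": {
    "by_author": {
      "terms": { "field": "author", "size": 0, "min_doc_count": 2 }
    }
  }
}'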




Re: ElasticSearch Node.Client Options

2014-06-19 Thread VB
And this stack trace:

[2014-06-04 14:47:12,939][INFO ][cluster.service  ] [BUS2F2801F3] master {new [ELS-10.76.121.131][dg_r12_nQbqIT_oJfjTwTg][inet[/10.76.121.131:9300]]{data=false, max_local_storage_nodes=1, master=true}, previous [ELS-10.76.121.130][BlGygpFmRn6uQNbgiEfl0A][inet[/10.76.121.130:9300]]{data=false, max_local_storage_nodes=1, master=true}}, removed {[ELS-10.76.121.130][BlGygpFmRn6uQNbgiEfl0A][inet[/10.76.121.130:9300]]{data=false, max_local_storage_nodes=1, master=true},}, reason: zen-disco-master_failed ([ELS-10.76.121.130][BlGygpFmRn6uQNbgiEfl0A][inet[/10.76.121.130:9300]]{data=false, max_local_storage_nodes=1, master=true})
[2014-06-04 14:48:03,969][WARN ][monitor.jvm  ] [BUS2F2801F3] [gc][old][55503][489] duration [49.6s], collections [1]/[49.9s], total [49.6s]/[4.5h], memory [9.9gb]->[9.9gb]/[9.9gb], all_pools {[young] [532.5mb]->[532.5mb]/[532.5mb]}{[survivor] [51.3mb]->[42.8mb]/[66.5mb]}{[old] [9.3gb]->[9.3gb]/[9.3gb]}
[2014-06-04 14:48:40,256][WARN ][monitor.jvm  ] [BUS2F2801F3] [gc][old][55504][490] duration [35.7s], collections [1]/[36.2s], total [35.7s]/[4.5h], memory [9.9gb]->[9.9gb]/[9.9gb], all_pools {[young] [532.5mb]->[532.5mb]/[532.5mb]}{[survivor] [42.8mb]->[58.6mb]/[66.5mb]}{[old] [9.3gb]->[9.3gb]/[9.3gb]}
[2014-06-04 14:49:30,335][WARN ][monitor.jvm  ] [BUS2F2801F3] [gc][old][55505][491] duration [49.9s], collections [1]/[50s], total [49.9s]/[4.5h], memory [9.9gb]->[9.9gb]/[9.9gb], all_pools {[young] [532.5mb]->[532.5mb]/[532.5mb]}{[survivor] [58.6mb]->[63.7mb]/[66.5mb]}{[old] [9.3gb]->[9.3gb]/[9.3gb]}
[2014-06-04 14:49:30,350][INFO ][discovery.zen] [BUS2F2801F3] master_left [[ELS-10.76.121.131][dg_r12_nQbqIT_oJfjTwTg][inet[/10.76.121.131:9300]]{data=false, max_local_storage_nodes=1, master=true}], reason [failed to ping, tried [3] times, each with maximum [30s] timeout]
[2014-06-04 14:49:30,865][WARN ][discovery.zen] [BUS2F2801F3] not enough master nodes after master left (reason = failed to ping, tried [3] times, each with maximum [30s] timeout), current nodes:
{[ELS-10.76.125.37][j3VQFYDaQLujkprUnke02w][inet[/10.76.125.37:9300]]{max_local_storage_nodes=1, master=false},
[ELS-10.76.122.38][5V8bqkEzTP2TzMukB5_j-Q][inet[/10.76.122.38:9300]]{max_local_storage_nodes=1, master=false},
[ELS-10.76.125.48][TGlF1uv8Q5GpgBVvIcvRAQ][inet[/10.76.125.48:9300]]{max_local_storage_nodes=1, master=false},
[EDSFB1ABF7][MqLDnM5mSLqIicIuyJk7IQ][inet[/10.76.122.19:9300]]{client=true, data=false, master=false},
[ELS-10.76.120.62][evcNI2CqSs-Zz44Jdzn0aw][inet[/10.76.120.62:9300]]{client=true, data=false, max_local_storage_nodes=1, master=false},
[BUS9364B62][YZPjEsvhT6OjM9ti5Lxwkg][inet[/10.76.123.123:9300]]{client=true, data=false, master=false},
[ELS-10.76.125.38][RyeswSy8SquV5H8Vfsw75Q][inet[/10.76.125.38:9300]]{max_local_storage_nodes=1, master=false},
[EDSFB1200C][XUNaWVlYQUOVZlJMv3nHMA][inet[/10.76.122.18:9300]]{client=true, data=false, master=false},
[ELS-10.76.124.214][H8N9nIU0TKyGv_prKyRVCQ][inet[/10.76.124.214:9300]]{max_local_storage_nodes=1, master=false},
[EDS1A1F2240][ET2u1qImQCCvqc-1gRvQbQ][inet[/10.76.120.87:9300]]{client=true, data=false, master=false},
[ELS-10.76.125.40][hp4wvQxER-mMPygey2Iqgg][inet[/10.76.125.40:9300]]{max_local_storage_nodes=1, master=false},
[ELS-10.76.122.67][BiXop5iCRgGQyGvxazMkQg][inet[/10.76.122.67:9300]]{max_local_storage_nodes=1, master=false},
[ELS-10.76.121.129][pf9xpva7Q4izIy6Nj4S4iQ][inet[/10.76.121.129:9300]]{data=false, max_local_storage_nodes=1, master=true},
[EDSFB21E69][RabnwdLbT1WCp9gIE-_AXw][inet[/10.76.122.20:9300]]{client=true, data=false, master=false},
[EDI1AE4FD76][UF1RMWe6RYaZGp6BU3x-VA][inet[/10.76.124.228:9300]]{client=true, data=false, master=false},
[ELS-10.76.125.46][nXceQp40TjOSctChaGVtKw][inet[/10.76.125.46:9300]]{max_local_storage_nodes=1, master=false},
[EDI1A1EA928][rWlelgQuT7KHSfyIejmLPg][inet[/10.76.120.82:9300]]{client=true, data=false, master=false},
[ELS-10.76.121.188][oWldDeY4TJioki90moNySw][inet[/10.76.121.188:9300]]{max_local_storage_nodes=1, master=false},
[ELS-10.76.122.34][kPSYm9G8R8i_z2skK_jq1g][inet[/10.76.122.34:9300]]{max_local_storage_nodes=1, master=false},
[ELS-10.76.125.43][JMgOIZFBSzaQZ9bVagG57w][inet[/10.76.125.43:9300]]{max_local_storage_nodes=1, master=false},
[EDI1AE3EE57][7JHGaYjzS3uI7PLN8Ynm-Q][inet[/10.76.124.227:9300]]{client=true, data=false, master=false},
[ELS-10.76.124.225][nTPlE6IkTHOZ7EThX-hLeQ][inet[/10.76.124.225:9300]]{max_local_storage_nodes=1, master=false},
[ELS-10.76.120.61][_60f636_QsOPIWN0tKyN2A][inet[/10.76.120.61:9300]]{client=true, data=false, max_local_storage_nodes=1, master=false},
[ELS-10.76.125.47][MV8eSvpbRtCS1MAK2iAcVg][inet[/10.76.125.47:9300]]{max_local_storage_nodes=1, master=false},
[EDI1AB0123F][Di8rrVJMSYm6PVnAVuFnkw][inet[/10.76.124.18:9300]]{client=true, data=false, master=false},
[BUS936E1B3][Vnr_UCzOTtysBzM6NlhvFA][inet[/1

Re: searching on nested docs - geting back the nested docs as a response

2014-06-19 Thread Itamar Syn-Hershko
It is very hard to give you concrete advice without knowing more about your
domain and use cases, but here are 2 points that came to mind:

1. You can make use of the highlighting features to show the content that
matched. Highlighters can return whole blocks of text, and by using
positionIncrements correctly you can get this right.

2. Yes, Elasticsearch is document-oriented storage, but is it really
necessary for you to index entire books as one document? I'd most certainly
look at indexing sections or chapters, maybe even pages, as single documents,
and use string references to the book ID - unless you combine book-level data
with full-text searches on the text, and even then in some scenarios I would
consider denormalizing.
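
A rough sketch of the second option, using the field names from your example
below (index/type names are made up):

curl -XPUT 'localhost:9200/library/_mapping/page' -d '{
  "page": {
    "properties": {
      "book_id":    { "type": "string", "index": "not_analyzed" },
      "pageNum":    { "type": "integer" },
      "numofWords": { "type": "integer" },
      "text":       { "type": "string" }
    }
  }
}'

curl -XGET 'localhost:9200/library/page/_search' -d '{
  "query": {
    "bool": {
      "must": [
        { "match_phrase": { "text": "let my people" } },
        { "range": { "numofWords": { "gt": 100 } } }
      ]
    }
  }
}'

The hits are then the pages themselves, and book_id lets you fetch the book
metadata in a second lookup (or you denormalize the bits you display).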

--

Itamar Syn-Hershko
http://code972.com | @synhershko 
Freelance Developer & Consultant
Author of RavenDB in Action 


On Thu, Jun 19, 2014 at 10:13 PM, liorg  wrote:

> Well, assuming we have a book type. The book holds a lot of metadata; let's
> say something like the following:
> {
> "author": {
> "name": "Jose",
> "lastName": "Martin"
> },
> "sections": [{
> "chapters": [{
> "pages": [{
> "pageNum": 1,
> "numOfChars": 1000,
> "text": "let my people...",
> "numofWords": 125
> },
> {
> "pageNum": 2,
> "numOfChars": 1005,
> "text": "let my people go...",
> "numofWords": 150
>  }],
> "chapterName": "the start"
> },
> {
> "pages": [{
> "pageNum": 3,
> "numOfChars": 1000,
> "text": "will do...",
> "numofWords": 125
> },
> {
> "pageNum": 4,
> "numOfChars": 1005,
> "text": "will do later on...",
> "numofWords": 150
>  }],
> "chapterName": "the end"
> }],
> "sectionName": "prologue"
> }]
> }
>
> we want to search for all the pages that have "let my people" in their
> text and more than 100 words.
> So, when we use ES we can use nested objects and query on the nested page
> object - but the actual returned values are the books (parents) that have
> those matching pages.
> Now, if we want to show the user the pages he was looking for, we cannot
> do that, as we get the whole book type returned with all its metadata and
> not just the nested objects that matched the criteria. We need to
> search again (maybe in memory?) for the pages that matched the criteria in
> order to display the user's search results (the whole type is returned
> because ES does not yet support returning the nested objects that matched
> the criteria).
>
> I hope it is better understood now
>
> On Thursday, June 19, 2014 7:22:13 PM UTC+3, Itamar Syn-Hershko wrote:
>
>> This is usually something that's being solved using parent-child, but the
>> question here really is what do you mean by needing to retrieve both books
>> & pages.
>>
>> Can you describe the actual scenario and what you are trying to achieve?
>>
>> --
>>
>> Itamar Syn-Hershko
>> http://code972.com | @synhershko 
>> Freelance Developer & Consultant
>> Author of RavenDB in Action 
>>
>>
>> On Thu, Jun 19, 2014 at 7:12 PM, liorg  wrote:
>>
>>> Hi,
>>>
>>> we have a somewhat complex type holding some nested docs with arrays
>>> (let's assume a hierarchy of books, and for each book we have an array of
>>> pages containing its metadata).
>>>
>>> we want to search for the nested doc - search for all the books that
>>> have the term "XYZ" in one of their pages - but we want to get back not
>>> only the book, but the pages themselves.
>>>
>>> We've understood that it's problematic to achieve with ES (see
>>> https://github.com/elasticsearch/elasticsearch/issues/3022).
>>>
>>> We have a problem to achieve it with parent child model as the data
>>> model comes from our mongodb already existing model (and besides, not sure
>>> if a parent child model fits here).
>>>
>>> so...
>>>
>>> 1. Is there any workaround we can use to get the results of the nested
>>> doc? (the actual pages?)
>>> 2. If not, is there a recommended way we can search for the data again
>>> in memory after it was narrowed down by ES server?...
>>> 3. Any advice will be appreciated as this is quite a big obstacle in our
>>> way to implement a solution using ES.
>>>
>>> thanks,
>>>
>>> Lior
>>>

Re: Get X word before and after search word

2014-06-19 Thread Ivan Brusic
Span queries are another option, but the main drawback is that they use
non-analyzed term queries.
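
For example, something like this (a sketch; index and field names are
placeholders, and the terms must match the indexed form exactly, e.g.
already lowercased):

curl -XGET 'localhost:9200/myindex/_search' -d '{
  "query": {
    "span_near": {
      "clauses": [
        { "span_term": { "text": "best" } },
        { "span_term": { "text": "search" } }
      ],
      "slop": 2,
      "in_order": true
    }
  }
}'

slop is the number of positions allowed between the two terms, so slop: 0
with in_order: true would mean "search" directly follows "best".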

-- 
Ivan


On Thu, Jun 19, 2014 at 2:11 AM, Alexander Reelsen  wrote:

> Hey,
>
> you potentially could use the termvectors API for this, see
> http://www.elasticsearch.org/guide/en/elasticsearch/reference/current/docs-termvectors.html
>
> Not sure if this is exactly what you are after... maybe explain your
> use case a bit more.
>
>
> --Alex
>
>
>
> On Tue, Jun 17, 2014 at 2:19 PM, Petr Janský  wrote:
>
>> Hello,
>>
>> I'm trying to find a way to get the words/terms around a search word. E.g., let's
>> have a document with the text "The best search engine is ElasticSearch". I will
>> search for "best" and want to get the info that the word "search" appears x times
>> as the next word after the search word.
>>
>> Thx
>> Petr
>>


testing Elasticsearch performance - interesting read

2014-06-19 Thread Valentina Crisan
See this if you are interested in Elasticsearch performance on different
hardware configurations:

http://www.slideshare.net/bigstep-infrastructure/bigstep-partners-elasticsearch-scaling-benchmarks




Very frequent ES OOM's & potential segment merge problems

2014-06-19 Thread Paul Sabou
Hi,

*Situation:*
We are using ES 1.2.1 on a machine with 32GB RAM, a fast SSD and 12 cores. The
machine runs Ubuntu 14.04.x LTS.
The ES process has 12GB of RAM allocated.

We have an index into which we inserted 105 million small documents, so the ES
data folder is around 50GB in size
(we see this by running du -h . on the folder).

The new document insertion rate is rather small (i.e. 100-300 small docs per
second).

*The problem:*

We experienced rather frequent ES OOMs (out-of-memory errors), at a rate of
around one every 15 mins. To lower the load on the index
we deleted 104+ million docs (i.e. mostly small log entries) by deleting
everything in one type:
curl -XDELETE http://localhost:9200/index_xx/type_yy

so that we ended up with an ES index with several thousand docs.
After this we started to experience massive disk IO (10-20MB/s reads and
1MB/s writes) and even more frequent OOMs (at a rate of around
one every 7 minutes). We restarted ES after every OOM and kept monitoring the
data folder size. Over the next hour the size went down
to around 36GB, but now it's stuck there (it doesn't go down in size even
after several hours).

*Questions:*
Is this a problem related to segment merging running out of memory? If so,
how can it be solved?
If not, what could be the problem?
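
One thing we are considering trying, in case it helps others (a sketch; as
far as I understand, this asks Lucene to merge away segments that are mostly
deleted docs, but I have not confirmed how it behaves memory-wise on an
already struggling node):

curl -XPOST 'http://localhost:9200/index_xx/_optimize?only_expunge_deletes=true'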


Thanks
Paul.



Re: ES v1.1 continuous young gc pauses old gc, stops the world when old gc happens and splits cluster

2014-06-19 Thread Bruce Ritchie
Java 8 with G1GC perhaps? It'll have more overhead but perhaps it'll be 
more consistent wrt pauses.
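
Something along these lines to try it (a sketch, not verified on my end; I
believe the stock 1.x startup script honors ES_JAVA_OPTS, and the CMS flags
set in bin/elasticsearch.in.sh would need to be removed first, since the JVM
rejects conflicting collector flags):

# remove -XX:+UseConcMarkSweepGC / -XX:+UseParNewGC from elasticsearch.in.sh, then:
export ES_JAVA_OPTS="-XX:+UseG1GC -XX:MaxGCPauseMillis=200"
bin/elasticsearch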



On Wednesday, June 18, 2014 2:02:24 PM UTC-4, Eric Brandes wrote:
>
> I'd just like to chime in with a "me too".  Is the answer just more 
> nodes?  In my case this is happening every week or so.
>
> On Monday, April 21, 2014 9:04:33 PM UTC-5, Brian Flad wrote:
>
> My dataset currently is 100GB across a few "daily" indices (~5-6GB and 15 
> shards each). Data nodes are 12 CPU, 12GB RAM (6GB heap).
>
>
> On Mon, Apr 21, 2014 at 6:33 PM, Mark Walkom  
> wrote:
>
> How big are your data sets? How big are your nodes?
>
> Regards,
> Mark Walkom
>
> Infrastructure Engineer
> Campaign Monitor
> email: ma...@campaignmonitor.com
> web: www.campaignmonitor.com
>
>
> On 22 April 2014 00:32, Brian Flad  wrote:
>
> We're seeing the same behavior with 1.1.1, JDK 7u55, 3 master nodes (2 min 
> master), and 5 data nodes. Interestingly, we see the repeated young GCs 
> only on a node or two at a time. Cluster operations (such as recovering 
> unassigned shards) grind to a halt. After restarting a GCing node, 
> everything returns to normal operation in the cluster.
>
> Brian F
>
>
> On Wed, Apr 16, 2014 at 8:00 PM, Mark Walkom  
> wrote:
>
> In both your instances, if you can, have 3 master-eligible nodes, as that 
> will reduce the likelihood of a split cluster because you will always have a 
> majority quorum. Also look at discovery.zen.minimum_master_nodes to go with 
> that.
> However, you may just be reaching the limit of your nodes, which means the 
> best option is to add another node (which also neatly solves your split 
> brain!).
>
> Ankush, it would help if you can update Java; most people recommend u25 but 
> we run u51 with no problems.
>
>
>
> Regards,
> Mark Walkom
>
> Infrastructure Engineer
> Campaign Monitor
> email: ma...@campaignmonitor.com
> web: www.campaignmonitor.com
>
>
> On 17 April 2014 07:31, Dominiek ter Heide  wrote:
>
> We are seeing the same issue here. 
>
> Our environment:
>
> - 2 nodes
> - 30GB Heap allocated to ES
> - ~140GB of data
> - 639 indices, 10 shards per index
> - ~48M documents
>
> After starting ES everything is good, but after a couple of hours we see 
> the Heap build up towards 96% on one node and 80% on the other. We then see 
> the GC take very long on the 96% node:
>
>
> TOuKgmlzaVaFVA][elasticsearch1.trend1.bottlenose.com][inet[/192.99.45.125:9300]]])
> [2014-04-16 12:04:27,845][INFO ][discovery] [elasticsearch2.trend1] trend1/I3EHG_XjSayz2OsHyZpeZA
> [2014-04-16 12:04:27,850][INFO ][http ] [elasticsearch2.trend1] bound_address {inet[/0.0.0.0:9200]}, publish_address {inet[/192.99.45.126:9200]}
> [2014-04-16 12:04:27,851][INFO ][node ] [elasticsearch2.trend1] started
> [2014-04-16 12:04:32,669][INFO ][indices.store] [elasticsearch2.trend1] updating indices.store.throttle.max_bytes_per_sec from [20mb] to [1gb], note, type is [MERGE]
> [2014-04-16 12:04:32,669][INFO ][cluster.routing.allocation.decider] [elasticsearch2.trend1] updating [cluster.routing.allocation.node_initial_primaries_recoveries] from [4] to [50]
> [2014-04-16 12:04:32,670][INFO ][indices.recovery ] [elasticsearch2.trend1] updating [indices.recovery.max_bytes_per_sec] from [200mb] to [2gb]
> [2014-04-16 12:04:32,670][INFO ][cluster.routing.allocation.decider] [elasticsearch2.trend1] updating [cluster.routing.allocation.node_initial_primaries_recoveries] from [4] to [50]
> [2014-04-16 12:04:32,670][INFO ][cluster.routing.allocation.decider] [elasticsearch2.trend1] updating [cluster.routing.allocation.node_initial_primaries_recoveries] from [4] to [50]
> [2014-04-16 15:25:21,409][WARN ][monitor.jvm  ] [elasticsearch2.trend1] [gc][old][11876][106] duration [1.1m], collections [1]/[1.1m], total [1.1m]/[1.4m], memory [28.7gb]->[22gb]/[29.9gb], all_pools {[young] [67.9mb]->[268.9mb]/[665.6mb]}{[survivor] [60.5mb]->[0b]/[83.1mb]}{[old] [28.6gb]->[21.8gb]/[29.1gb]}
> [2014-04-16 16:02:32,523][WARN ][monitor.jvm  ] [elasticsearch2.trend1] [gc][old][13996][144] duration [1.4m], collections [1]/[1.4m], total [1.4m]/[3m], memory [28.8gb]->[23.5gb]/[29.9gb], all_pools {[young] [21.8mb]->[238.2mb]/[665.6mb]}{[survivor] [82.4mb]->[0b]/[83.1mb]}{[old] [28.7gb]->[23.3gb]/[29.1gb]}
> [2014-04-16 16:14:12,386][WARN ][monitor.jvm  ] [elasticsearch2.trend1] [gc][old][14603][155] duration [1.3m], collections [2]/[1.3m], total [1.3m]/[4.4m], memory [29.2gb]->[23.9gb]/[29.9gb], all_pools {[young] [289mb]->[161.3mb]/[665.6mb]}{[survivor] [58.3mb]->[0b]/[83.1mb]}{[old] [28.8gb]->[23.8gb]/[29.1gb]}
> [2014-04-16 16:17:55,480][WARN ][monitor.jvm  ] [elasticsearch2.trend1] [gc][old][14745][158] duration [1.3m], collections [1]/[1.3m], total [1.3m]/[5.7m], memory [29.7gb]->[24.1gb]/[29.9g

Re: searching on nested docs - geting back the nested docs as a response

2014-06-19 Thread liorg
Well, assuming we have a book type. The book holds a lot of metadata; let's 
say something like the following:
{
  "author": {
    "name": "Jose",
    "lastName": "Martin"
  },
  "sections": [{
    "chapters": [{
      "pages": [{
        "pageNum": 1,
        "numOfChars": 1000,
        "text": "let my people...",
        "numofWords": 125
      },
      {
        "pageNum": 2,
        "numOfChars": 1005,
        "text": "let my people go...",
        "numofWords": 150
      }],
      "chapterName": "the start"
    },
    {
      "pages": [{
        "pageNum": 3,
        "numOfChars": 1000,
        "text": "will do...",
        "numofWords": 125
      },
      {
        "pageNum": 4,
        "numOfChars": 1005,
        "text": "will do later on...",
        "numofWords": 150
      }],
      "chapterName": "the end"
    }],
    "sectionName": "prologue"
  }]
}

we want to search for all the pages that have "let my people" in their text 
and more than 100 words.
So, when we use ES we can use nested objects and query on the nested page 
object - but the actual returned values are the books (parents) that have 
those matching pages.
Now, if we want to show the user the pages he was looking for, we cannot 
do that, as we get the whole book type returned with all its metadata and 
not just the nested objects that matched the criteria. We need to 
search again (maybe in memory?) for the pages that matched the criteria in 
order to display the user's search results (the whole type is returned 
because ES does not yet support returning the nested objects that matched 
the criteria).

I hope it is better understood now

On Thursday, June 19, 2014 7:22:13 PM UTC+3, Itamar Syn-Hershko wrote:
>
> This is usually something that's being solved using parent-child, but the 
> question here really is what do you mean by needing to retrieve both books 
> & pages.
>
> Can you describe the actual scenario and what you are trying to achieve?
>
> --
>
> Itamar Syn-Hershko
> http://code972.com | @synhershko 
> Freelance Developer & Consultant
> Author of RavenDB in Action 
>
>
> On Thu, Jun 19, 2014 at 7:12 PM, liorg > 
> wrote:
>
>> Hi,
>>
>> we have a somewhat complex type holding some nested docs with arrays (let's 
>> assume a hierarchy of books, and for each book we have an array of pages 
>> containing its metadata).
>>
>> we want to search for the nested doc - search for all the books that have 
>> the term "XYZ" in one of their pages - but we want to get back not only the 
>> book, but the pages themselves.
>>
>> We've understood that it's problematic to achieve with ES (see 
>> https://github.com/elasticsearch/elasticsearch/issues/3022).
>>
>> We have a problem to achieve it with parent child model as the data model 
>> comes from our mongodb already existing model (and besides, not sure if a 
>> parent child model fits here).
>>
>> so...
>>
>> 1. Is there any workaround we can use to get the results of the nested 
>> doc? (the actual pages?)
>> 2. If not, is there a recommended way we can search for the data again in 
>> memory after it was narrowed down by ES server?...
>> 3. Any advice will be appreciated as this is quite a big obstacle in our 
>> way to implement a solution using ES.
>>
>> thanks,
>>
>> Lior
>>


javascript language plugin _index support

2014-06-19 Thread Neeraj Makam
Hi,

I've had a few issues with mvel scripting, so I was looking at other 
languages.
1) Is it true that I can't execute more than one statement in the python 
language?
2) When using the javascript language plugin, scripts calling functions 
like "doc['field_name'].value" OR "_source.obj1.test" work well.
But when I'm using functions of _index, I'm getting the following error:

[2014-06-20 00:06:25,736][DEBUG][action.search.type   ] [Elfqueen] 
[codeindex][0], node[sPYaZkF7TO6ZFpJ4sGv5Uw], [P], s[STARTED]: Failed to 
execute [org.elasticsearch.action.search.SearchRequest@3414a347]
org.mozilla.javascript.EvaluatorException: Invalid JavaScript value of 
type org.elasticsearch.search.lookup.IndexFieldTerm (Script6.js#10)
at 
org.mozilla.javascript.DefaultErrorReporter.runtimeError(DefaultErrorReporter.java:77)
at 
org.mozilla.javascript.Context.reportRuntimeError(Context.java:913)
at 
org.mozilla.javascript.Context.reportRuntimeError(Context.java:969)
at 
org.mozilla.javascript.Context.reportRuntimeError1(Context.java:932)
at 
org.mozilla.javascript.ScriptRuntime.errorWithClassName(ScriptRuntime.java:3964)
at 
org.mozilla.javascript.ScriptRuntime.typeof(ScriptRuntime.java:2530)
at 
org.mozilla.javascript.ScriptRuntime.notFunctionError(ScriptRuntime.java:3786)
at 
org.mozilla.javascript.ScriptRuntime.getPropFunctionAndThisHelper(ScriptRuntime.java:2269)
at 
org.mozilla.javascript.ScriptRuntime.getPropFunctionAndThis(ScriptRuntime.java:2251)
at 
org.mozilla.javascript.gen.Script6_js_2._c_script_0(Script6.js:10)
at org.mozilla.javascript.gen.Script6_js_2.call(Script6.js)
at 
org.mozilla.javascript.ContextFactory.doTopCall(ContextFactory.java:394)
at 
org.mozilla.javascript.ScriptRuntime.doTopCall(ScriptRuntime.java:3091)
at org.mozilla.javascript.gen.Script6_js_2.call(Script6.js)
at org.mozilla.javascript.gen.Script6_js_2.exec(Script6.js)
at 
org.elasticsearch.script.javascript.JavaScriptScriptEngineService$JavaScriptSearchScript.run(JavaScriptScriptEngineService.java:257)
at 
org.elasticsearch.search.fetch.script.ScriptFieldsFetchSubPhase.hitExecute(ScriptFieldsFetchSubPhase.java:74)
at 
org.elasticsearch.search.fetch.FetchPhase.execute(FetchPhase.java:211)
at 
org.elasticsearch.search.SearchService.executeFetchPhase(SearchService.java:340)
at 
org.elasticsearch.search.action.SearchServiceTransportAction$11.call(SearchServiceTransportAction.java:308)
at 
org.elasticsearch.search.action.SearchServiceTransportAction$11.call(SearchServiceTransportAction.java:305)
at 
org.elasticsearch.search.action.SearchServiceTransportAction$23.run(SearchServiceTransportAction.java:517)
at java.util.concurrent.ThreadPoolExecutor.runWorker(Unknown Source)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(Unknown 
Source)
at java.lang.Thread.run(Unknown Source)
[2014-06-20 00:06:25,738][DEBUG][action.search.type   ] [Elfqueen] All 
shards failed for phase: [query_fetch]


This is the line that causes the error:
var term = _index["content"].get("hello", _POSITIONS);
The same happens with _index.numDocs() or any other function called on 
_index.

Any solution for this?
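The same term lookup is part of the advanced text-scoring API for the 
default script language, so a minimal sketch of that route may help while 
the JavaScript wrapper issue is open (index, field and term reused from 
above; the script_fields name is invented):

curl -XGET 'localhost:9200/codeindex/_search' -d '{
  "query": { "match_all": {} },
  "script_fields": {
    "hello_tf": {
      "script": "term = _index[\"content\"].get(\"hello\", _POSITIONS); term.tf()"
    }
  }
}'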



mvel script_fields error

2014-06-19 Thread Neeraj Makam
Hi

I am running a script in MVEL (scripted fields) that returns a computed 
value by reading payload information.
When I run the script, the first few times I get the following error. 
After executing the query 3-4 times, the MVEL script starts working 
perfectly fine until I change the parameter value.
Once changed, it again takes a few attempts before it starts working 
(sometimes it works on the first attempt, sometimes on the 4th/5th).
Is there any solution for this?


Error:
{
   "error": "SearchPhaseExecutionException[Failed to execute phase 
[query_fetch], all shards failed; shardFailures 
{[SEKBsVXpS56AFe9Sshmxpg][codeindex][0]: ElasticsearchException[Bad type on 
operand stack\nException Details:\n  Location:\n   
 
ASMAccessorImpl_3185728311403177607390.getValue(Ljava/lang/Object;Ljava/lang/Object;Lorg/elasticsearch/common/mvel2/integration/VariableResolverFactory;)Ljava/lang/Object;
 
@27: invokeinterface\n  Reason:\nType 'java/lang/Object' (current 
frame, stack[1]) is not assignable to integer\n  Current Frame:\nbci: 
@27\nflags: { }\nlocals: { 
'ASMAccessorImpl_3185728311403177607390', 'java/lang/Object', 
'java/lang/Object', 
'org/elasticsearch/common/mvel2/integration/VariableResolverFactory' }\n   
 stack: { 'java/util/List', 'java/lang/Object' }\n  Bytecode:\n000: 
2d12 0eb9 0014 0200 b900 1901 00c0 001b\n010: 2ab4 001f 2c2d b900 
2403 00b9 0028 0200\n020: b0 
\n]; nested: VerifyError[Bad type on operand stack\nException Details:\n 
 Location:\n   
 
ASMAccessorImpl_3185728311403177607390.getValue(Ljava/lang/Object;Ljava/lang/Object;Lorg/elasticsearch/common/mvel2/integration/VariableResolverFactory;)Ljava/lang/Object;
 
@27: invokeinterface\n  Reason:\nType 'java/lang/Object' (current 
frame, stack[1]) is not assignable to integer\n  Current Frame:\nbci: 
@27\nflags: { }\nlocals: { 
'ASMAccessorImpl_3185728311403177607390', 'java/lang/Object', 
'java/lang/Object', 
'org/elasticsearch/common/mvel2/integration/VariableResolverFactory' }\n   
 stack: { 'java/util/List', 'java/lang/Object' }\n  Bytecode:\n000: 
2d12 0eb9 0014 0200 b900 1901 00c0 001b\n010: 2ab4 001f 2c2d b900 
2403 00b9 0028 0200\n020: b0 
\n]; }]",
   "status": 500
}


I've already tried ES 1.1.1 and 1.2.1, with JRE 7 and JRE 8. It is still the same.



Re: can the "url" repository type be used as a "read only" repo for the s3 repository?

2014-06-19 Thread David Pilato
You mean that you want to snapshot your data to S3 and then expose your 
snapshot over HTTP, right?

I think, though I might be wrong, that in that case using a fs repository to 
read your HTTP snapshot (even if it had been built with S3) should work.

But maybe I misunderstood your case here?


I hope this helps
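For reference, the read-only side is registered as a "url" repository like 
this (the repository name and endpoint URL are assumed placeholders):

curl -XPUT 'localhost:9200/_snapshot/my_readonly_repo' -d '{
  "type": "url",
  "settings": {
    "url": "http://snapshots.example.com/my_backup/"
  }
}'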

-- 
David Pilato | Technical Advocate | Elasticsearch.com
@dadoonet | @elasticsearchfr


Le 19 juin 2014 à 17:32:17, JoeZ99 (jzar...@gmail.com) a écrit:

the "url" type is used in combination with the "fs" type. some machines can 
write/read snapshots to a "fs" type repository, and same machines can only read 
for a "url" repository which points to the same location the "fs" repository 
points at.


Is this behavior by any chance possible using S3 repositories???


Re: 100% CPU on 1 Node with JMeter Tests

2014-06-19 Thread sairam
Bump

On Wednesday, June 18, 2014 6:20:58 PM UTC-7, sai...@roblox.com wrote:
>
> One out of 4 nodes always spikes to 100% CPU when we do some load tests 
> using JMeter (50 Threads, 50 Loops) with any query (Match_All, Filtered 
> Query etc.,). That particular node has 3 Shards with 2 Primary Shards. The 
> other nodes have less than 40% CPU on them at the same time. The heap is 
> set at 30GB on all of them. This is the gist for hot threads from when the 
> test was running. Is there anything else that can be done to improve the 
> performance? The query response times jump to 5-8 seconds when the CPU is 
> hammered.
>
>
>
> 
>
> I had previously posted the specs of the servers on another thread.
>  
> Here are the Server Specs:
> Machine Specs:
> Processor: Intel(R) Xeon(R) CPU E5-2630 0 @ 2.30GHz
> Number of CPU cores:24
> Number of Physical CPUs:  2
> Installed RAM:   [~256 GB Total] 128 GB 128 GB 16 MB 
> Drive:Two 278GB SAS Drive configured in 
> RAID 0
> OS:
> Arch:  64bit(x86_64)
> OS Type:Linux
> Kernel:2.6.32-431.5.1.el6.x86_64
> OS Version:Red Hat Enterprise Linux Server release 6.5 
> (Santiago)
> Java Version:  Java 1.7.0_51 (Java 7u51 x64 version for 
> Linux).
>
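For readers wanting to reproduce the capture, per-node hot threads come from 
the standard API; thread count and sampling interval are optional knobs (the 
values below are just examples):

curl -XGET 'localhost:9200/_nodes/hot_threads?threads=3&interval=500ms'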



[ANN] Elasticsearch Azure cloud plugin 2.3.0 released

2014-06-19 Thread Elasticsearch Team

Heya,


We are pleased to announce the release of the Elasticsearch Azure cloud plugin, 
version 2.3.0.

The Azure Cloud plugin allows using the Azure API for the unicast discovery 
mechanism and adds Azure storage repositories.

https://github.com/elasticsearch/elasticsearch-cloud-azure/

Release Notes - elasticsearch-cloud-azure - Version 2.3.0



Update:
 * [14] - Update to elasticsearch 1.2.0 
(https://github.com/elasticsearch/elasticsearch-cloud-azure/issues/14)
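Installation follows the usual 1.x plugin convention; a sketch (check the 
project README for the authoritative command):

bin/plugin --install elasticsearch/elasticsearch-cloud-azure/2.3.0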




Issues, Pull requests, Feature requests are warmly welcome on 
elasticsearch-cloud-azure project repository: 
https://github.com/elasticsearch/elasticsearch-cloud-azure/
For questions or comments around this plugin, feel free to use elasticsearch 
mailing list: https://groups.google.com/forum/#!forum/elasticsearch

Enjoy,

-The Elasticsearch team



Re: searching on nested docs - getting back the nested docs as a response

2014-06-19 Thread Itamar Syn-Hershko
This is usually solved using parent-child, but the real question here is
what you mean by needing to retrieve both books & pages.

Can you describe the actual scenario and what you are trying to achieve?

--

Itamar Syn-Hershko
http://code972.com | @synhershko 
Freelance Developer & Consultant
Author of RavenDB in Action 


On Thu, Jun 19, 2014 at 7:12 PM, liorg wrote:

> Hi,
>
> we have a somewhat complex type holding nested docs with arrays (let's
> assume a hierarchy of books, where each book has an array of pages
> containing its metadata).
>
> we want to search on the nested docs - search for all the books that have
> the term "XYZ" in one of their pages - but we want to get back not only the
> book, but the matching pages themselves.
>
> We've understood that it's problematic to achieve with ES (see
> https://github.com/elasticsearch/elasticsearch/issues/3022).
>
> We have a problem achieving it with the parent-child model, as the data
> model comes from our already existing MongoDB model (and besides, we are
> not sure a parent-child model fits here).
>
> so...
>
> 1. Is there any workaround we can use to get the results of the nested
> docs (the actual pages)?
> 2. If not, is there a recommended way to search the data again in
> memory after it was narrowed down by the ES server?
> 3. Any advice will be appreciated, as this is quite a big obstacle on our
> way to implementing a solution using ES.
>
> thanks,
>
> Lior
>
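To make the parent-child suggestion concrete, a minimal sketch (index, type 
and field names are invented for the book/pages example): pages are indexed 
as children of books, the search runs against the page type, and every hit 
is a matching page that also carries its parent book id:

curl -XPUT 'localhost:9200/library' -d '{
  "mappings": {
    "book": { "properties": { "title": { "type": "string" } } },
    "page": {
      "_parent": { "type": "book" },
      "properties": { "text": { "type": "string" } }
    }
  }
}'

curl -XGET 'localhost:9200/library/page/_search' -d '{
  "query": { "match": { "text": "XYZ" } },
  "fields": [ "_parent" ]
}'

Each hit is then a matching page together with its parent book id; the page 
body itself is available from _source if needed.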



searching on nested docs - getting back the nested docs as a response

2014-06-19 Thread liorg
Hi,

we have a somewhat complex type holding nested docs with arrays (let's 
assume a hierarchy of books, where each book has an array of pages 
containing its metadata).

we want to search on the nested docs - search for all the books that have 
the term "XYZ" in one of their pages - but we want to get back not only the 
book, but the matching pages themselves.

We've understood that it's problematic to achieve with ES 
(see https://github.com/elasticsearch/elasticsearch/issues/3022).

We have a problem achieving it with the parent-child model, as the data 
model comes from our already existing MongoDB model (and besides, we are 
not sure a parent-child model fits here).

so...

1. Is there any workaround we can use to get the results of the nested 
docs (the actual pages)?
2. If not, is there a recommended way to search the data again in 
memory after it was narrowed down by the ES server?
3. Any advice will be appreciated, as this is quite a big obstacle on our 
way to implementing a solution using ES.

thanks,

Lior



Re: Query Performance

2014-06-19 Thread Cédric Hourcade
Hello,

Can you isolate your slow queries and check if they are slow even when 
run independently? Check how many documents these queries match; if it is 
millions, that would explain it.

Also, you are using a terms filter with hundreds of entries. If these 
entries are different for each query, you may want to set the filter 
"execution" to "bool" (or "fielddata"?) to cache the terms individually 
rather than just the combination of them: { "terms" : { "execution": 
"bool", ... } }

Cédric Hourcade
c...@wal.fr

On Thursday, June 19, 2014 at 15:12:43 UTC+2, ravim...@gmail.com wrote:
>
> Hi All, 
>
> One thing I forgot to mention is that my 3rd query, which takes input from 
> the 2nd query, gets close to 500-1000 values from it. So the terms 
> query gets 500-1000 values. The 90th percentile for the third query comes 
> out to be ~350 ms. 
>
> Thanks!
> Ravi
>
> On Wednesday, 18 June 2014 17:49:45 UTC+1, ravim...@gmail.com wrote:
>>
>> Hello All,
>>
>> As per continued experimentation, i changed 
>>
>> indices.cache.filter.size from default 20% to 30% on all of my boxes. 
>>
>> I can now see increased memory usage and i see increased cache usage. I 
>> see my cache jumped from 2.9GB to 4.4 GB which is accurate as allocated RAM 
>> is 15GB. 
>>
>> Even though RAM usage has increased on all machines, i do not see any 
>> performance improvement. How is that possible? Any clues as to what i might 
>> be doing wrong here? 
>>
>>
>> Thanks!
>> Ravi
>>
>> On Wednesday, 18 June 2014 11:18:57 UTC+1, ravim...@gmail.com wrote:
>>>
>>> btw, changing search_type to count did not have much impact on the 
>>> timings. 
>>>
>>> On Tuesday, 17 June 2014 18:19:40 UTC+1, ravim...@gmail.com wrote:

 Hi Binh,

 Did some tests and here are the findings: 

 Moving to c3.4xlarge reduces time by 300 ms. So that takes overall 90th 
 percentile down to ~1.5 seconds. CPU still in high 80s-90s. 

 Making all queries filtered and removing script from 2nd queries'  2nd 
 aggregation reduced CPU footprint (high 50s-60s) and improved overall 
 timings by close to 200 ms. I am at ~1.3 seconds for all 3 queries.

 I guess only next steps now is to play with shard size? or more 
 machines? 

 Thanks!
 Ravi

 On Tuesday, 17 June 2014 15:52:31 UTC+1, ravim...@gmail.com wrote:
>
> Hi Binh,
>
> thanks for helping. 
>
> My record size for 1st query is 4 fields. 3 of them integers and a 
> date. so the _source is not big enough to raise concerns. I will anyways 
> try your suggestion and report any improvements here. 
>
> For the 2nd query: i have 15gb of RAM. only 20% of which gets utilised 
> during the tests. Thanks for all three suggestion, Will definitely try 
> that 
> and come back here. Good catch for using script in simSum, thanks. I need 
> just the sum of that field, which does not need a script. Will change 
> that 
> and see what happens. 
>
> For the 3rd query, i do not care about the _score of returned values. 
> Will give that a try as well. 
>
> Thanks a lot. 
>
> Ravi
>
>
> On Tuesday, 17 June 2014 15:28:21 UTC+1, Binh Ly wrote:
>>
>> For the first query, since you don't care about the _score, move the 
>> bool query into a filter. If you only need field1 and field2 and your 
>> _source is big, might be able to save some network payload using source 
>> filtering only for those 2 fields.
>>
>> For the second query, if you have a lot RAM and say col_a and col_b 
>> are not big values (long strings) and not high cardinality, you can try 
>> to 
>> switch all _source.col_a (or _source.blah) to doc['col_a'].value in your 
>> scripts. This syntax will load the field values into memory and should 
>> perform faster than _source.blah. And your last stats agg (simSum), not 
>> sure why that needs to be a script - can it just be a stats-field on 
>> col_x? 
>> Also if the second query does not need to return hits (i.e. you only 
>> need 
>> info from the aggs), you can set search_type=count to further optimize 
>> it.
>>
>> For the third query, if you don't care about _score, move the query 
>> part into the filter part.
>>
>



problem indexing with my analyzer

2014-06-19 Thread Tanguy Bernard
Hello,
I have an issue when I index a particular field, "note_source" (SQL 
longtext).
I use the same analyzer for every field (except date_source and id_source), 
but with "note_source" I get a "warn monitor.jvm".
When I remove "note_source", everything is fine. If I don't use the analyzer 
on "note_source", everything is fine, but if I use my analyzer on 
"note_source" I get crashes.

I think I have enough memory; I have set ES_HEAP_SIZE.
Maybe my problem is with accents (ASCII, UTF-8)?

Can you help me with this?



My Setting

 public function createSetting($pf){
$params = array('index' => $pf, 'body' => array(
'settings' => array(
'number_of_shards' => 5,
'number_of_replicas' => 0,
'analysis' => array(
'filter' => array(
'nGram' => array(
"token_chars" =>array(),
"type" => "nGram",
"min_gram" => 3,
"max_gram"  => 250
)
),
'analyzer' => array(
'reuters' => array(
'type' => 'custom',
'tokenizer' => 'standard',
'filter' => array('lowercase', 'asciifolding', 
'nGram')
)
)
)
)
));
$this->elasticsearchClient->indices()->create($params);
return;
}
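A likely cause of the monitor.jvm warnings is the nGram filter itself: with 
min_gram 3 and max_gram 250, every long token in a longtext column expands 
into a huge number of sub-tokens, which is very heavy on heap and index 
size. A sketch of the same filter with a smaller window (the ceiling of 10 
is an assumed value; tune it to the longest fragment you actually search 
for):

'filter' => array(
    'nGram' => array(
        'type'     => 'nGram',
        'min_gram' => 3,
        // 250 explodes long tokens into thousands of grams each;
        // a small ceiling keeps the token stream bounded
        'max_gram' => 10
    )
),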


My Indexing

public function indexTable($pf,$typeElement){
   
$params =array(
"index" =>'_river', 
"type" => $typeElement, 
"id" => "_meta", 
"body" =>array(
  
"type" => "jdbc",
"jdbc" => array(
"url" => "jdbc:mysql://ip/name",
"user" => 'root',
"password" => 'mdp',
"index" => $pf,
"type" => $typeElement,
"sql" => select id_source as _id, id_sous_theme, 
titre_source, desc_source, note_source, adresse_source, type_source, 
date_source from source,
"max_bulk_requests" => 5,  
)
)

);

 
$this->elasticsearchClient->index($params);
}

Thanks in advance.



ignore_missing flag in snapshot listing?

2014-06-19 Thread JoeZ99
When I want to list the snapshots that are within a certain repository, I 
issue the following command:
curl -XGET http://localhost:9200/_snapshot//_all


As I understand it, this is the only way of doing it.

However, chances are that while I'm issuing that command, some other 
process deletes a snapshot. I've found that, given the right conditions, 
this can provoke an error response from the list command. I've figured out 
that when the _all API call is made, Elasticsearch finds out about every 
snapshot it has in the repository, then fetches info from every one of 
them (by doing something similar to what 
curl -XGET http://localhost:9200/_snapshot// 
does), and then returns a listing with the collected info.

If a snapshot is deleted by some other process AFTER Elasticsearch has got 
all the snapshot names, and BEFORE Elasticsearch starts collecting info on 
every one of them, an error about Elasticsearch not being able to find 
that particular snapshot is thrown.


My questions are:

  - Is it possible to use a flag like "ignore_missing_snapshots" or 
something like that when making the curl -XGET 
http://localhost:9200/_snapshot//_all call?

  - Can I "list by prefix", telling Elasticsearch I want to list only the 
snapshots that start with a certain prefix? Something like -XGET 
http://localhost:9200/_snapshot//_all?prefix= 
  That way I could make sure the listing process doesn't interfere with the 
possible deletion process.



can the "url" repository type be used as a "ready only" repo for the s3 repository?

2014-06-19 Thread JoeZ99
the "url" type is used in combination with the "fs" type. some machines can 
write/read snapshots to a "fs" type repository, and same machines can only 
read for a "url" repository which points to the same location the "fs" 
repository points at.


Is this behavior by any chance possible using S3 repositories???



Re: elasticsearch/logstash/kibana with hadoop as Hunk alternative

2014-06-19 Thread kay rus
I've found another alternative: the new 3.6 version of Hue supports 
Kibana-style data visualization:
http://gethue.com/hadoop-ui-hue-3-6-and-the-search-dashboards-are-out/

On Thursday, June 19, 2014 at 10:42:50 UTC+4, kay rus wrote:
>
> Hi
>
> For performance improvement I'm trying to combine 
> Elasticsearch/Logstash/Kibana with Hadoop (cdh4) and configure opensource 
> alternative to Hunk. Unfortunately I'm familiar only with HDFS where I 
> store logs. In my opinion the combination of Elasticsearch and Hadoop 
> should use HDFS as storage and transparent Hadoop map/reduce functionality 
> for search.
>
> I ran through elasticsearch-hadoop documentation and unfortunately I 
> didn't understand how this combination could help me for Kibana log 
> analysis. Documentation says "Elasticsearch real-time search and analytics 
> natively integrated with Hadoop.". But what should I configure? Hadoop with 
> Elasticsearch or Elasticsearch with Hadoop? As for the first one, I found 
> only Java code parts, nothing about the Hadoop configuration, so it seems 
> that I should be familiar with Java programming. As for the last one I 
> found only "Hadoop HDFS Snapshot/Restore plugin", but I guess it was 
> developed for indexes backup/restore, am I right?
>
> Anyway, are my expectations right? Or elasticsearch-hadoop was developed 
> for Hadoop developers only and it is not suitable for 
> "elasticsearch/logstash/kibana + hadoop" (like Hunk).
>



Re: Securing Data in Elasticsearch

2014-06-19 Thread Harvii Dent
@Zennet: I was thinking of doing something similar via a reverse-proxy in 
front of Kibana, however I believe Kibana still uses DELETE, PUT, and POST 
requests to save its dashboards, so I'm not sure what to block exactly.
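A rough reverse-proxy sketch of that idea (nginx syntax; the paths and the 
kibana-int index name are assumptions based on Kibana 3 defaults): let 
Kibana keep saving its own dashboards while destructive verbs are blocked 
everywhere else:

# allow dashboard saves, but never DELETE, on Kibana's own index
location /kibana-int/ {
    if ($request_method = DELETE) { return 403; }
    proxy_pass http://localhost:9200;
}
# everywhere else: read/search traffic only
location / {
    if ($request_method !~ ^(GET|HEAD|POST)$) { return 403; }
    proxy_pass http://localhost:9200;
}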

@Jaguar: jetty plugin looks interesting, especially the 
"jetty-restrict-writes.xml" part, I'll be taking a look at that.

As Jörg said, it shouldn't be too difficult to create a DELETE request and 
spoof its source to appear as if coming from a trusted source; I just wish 
there was an option built into ES to disable deletes/updates or at least 
authenticate them first.

Thanks everyone



Snapshot & Restore in a cluster of two nodes

2014-06-19 Thread Daniel Bubenheim
Hello,

we have a cluster of two nodes. Every index in this cluster consists of 2 
shards and one replica. We want to make use of snapshot & restore to 
transfer data between two clusters. When we take our snapshot on node one, 
only the primary shard is included; the replica shard is missing. While 
restoring on the other cluster, the process breaks because of the missing 
second shard. 
Do we have to make a snapshot on each node to include both primary shards 
so that we can restore the whole index, or am I missing something here? 

Thanks in advance
Daniel



Count request does not support [filter]. Why?

2014-06-19 Thread Andrew Gaydenko
The count request does not support [filter]. Why? How can I count with the 
same filter (except for "size", "fields", "from") and query that I'm 
probably going to use to fetch hits after counting?
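One workaround on 1.x: _count does accept a query, so the filter can be 
wrapped in a filtered query (the index name and filter body below are 
placeholders):

curl -XGET 'localhost:9200/myindex/_count' -d '{
  "query": {
    "filtered": {
      "query":  { "match_all": {} },
      "filter": { "term": { "status": "active" } }
    }
  }
}'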



Re: Query Performance

2014-06-19 Thread ravimbhatt
Hi All, 

One thing I forgot to mention is that my 3rd query, which takes input from 
the 2nd query, gets close to 500-1000 values from it. So the terms query 
gets 500-1000 values. The 90th percentile for the third query comes out to 
be ~350 ms. 

Thanks!
Ravi

On Wednesday, 18 June 2014 17:49:45 UTC+1, ravim...@gmail.com wrote:
>
> Hello All,
>
> As per continued experimentation, i changed 
>
> indices.cache.filter.size from default 20% to 30% on all of my boxes. 
>
> I can now see increased memory usage and i see increased cache usage. I 
> see my cache jumped from 2.9GB to 4.4 GB which is accurate as allocated RAM 
> is 15GB. 
>
> Even though RAM usage has increased on all machines, i do not see any 
> performance improvement. How is that possible? Any clues as to what i might 
> be doing wrong here? 
>
>
> Thanks!
> Ravi
>
> On Wednesday, 18 June 2014 11:18:57 UTC+1, ravim...@gmail.com wrote:
>>
>> btw, changing search_type to count did not have much impact on the 
>> timings. 
>>
>> On Tuesday, 17 June 2014 18:19:40 UTC+1, ravim...@gmail.com wrote:
>>>
>>> Hi Binh,
>>>
>>> Did some tests and here are the findings: 
>>>
>>> Moving to c3.4xlarge reduces time by 300 ms. So that takes overall 90th 
>>> percentile down to ~1.5 seconds. CPU still in high 80s-90s. 
>>>
>>> Making all queries filtered and removing script from 2nd queries'  2nd 
>>> aggregation reduced CPU footprint (high 50s-60s) and improved overall 
>>> timings by close to 200 ms. I am at ~1.3 seconds for all 3 queries.
>>>
>>> I guess only next steps now is to play with shard size? or more 
>>> machines? 
>>>
>>> Thanks!
>>> Ravi
>>>
>>> On Tuesday, 17 June 2014 15:52:31 UTC+1, ravim...@gmail.com wrote:

 Hi Binh,

 thanks for helping. 

 My record size for 1st query is 4 fields. 3 of them integers and a 
 date. so the _source is not big enough to raise concerns. I will anyways 
 try your suggestion and report any improvements here. 

 For the 2nd query: i have 15gb of RAM. only 20% of which gets utilised 
 during the tests. Thanks for all three suggestion, Will definitely try 
 that 
 and come back here. Good catch for using script in simSum, thanks. I need 
 just the sum of that field, which does not need a script. Will change that 
 and see what happens. 

 For the 3rd query, i do not care about the _score of returned values. 
 Will give that a try as well. 

 Thanks a lot. 

 Ravi


 On Tuesday, 17 June 2014 15:28:21 UTC+1, Binh Ly wrote:
>
> For the first query, since you don't care about the _score, move the 
> bool query into a filter. If you only need field1 and field2 and your 
> _source is big, might be able to save some network payload using source 
> filtering only for those 2 fields.
>
> For the second query, if you have a lot RAM and say col_a and col_b 
> are not big values (long strings) and not high cardinality, you can try 
> to 
> switch all _source.col_a (or _source.blah) to doc['col_a'].value in your 
> scripts. This syntax will load the field values into memory and should 
> perform faster than _source.blah. And your last stats agg (simSum), not 
> sure why that needs to be a script - can it just be a stats-field on 
> col_x? 
> Also if the second query does not need to return hits (i.e. you only need 
> info from the aggs), you can set search_type=count to further optimize it.
>
> For the third query, if you don't care about _score, move the query 
> part into the filter part.
>




Re: Scroll Questions

2014-06-19 Thread mooky
Further to (2): would it be an improvement to have a different kind of 
request for a scrolling search? That way the API could exclude items that 
don't make sense (e.g. aggregations, facets, etc.).
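As context for the lifecycle discussed below, a minimal scan/scroll round 
trip with an explicit clear (the index name, sizes and scroll id are 
placeholders):

# open a scan scroll; the first response carries a _scroll_id and no hits
curl -XGET 'localhost:9200/myindex/_search?search_type=scan&scroll=1m' -d '{
  "query": { "match_all": {} },
  "size": 100
}'

# fetch the next batch, always passing the most recent scroll id
curl -XGET 'localhost:9200/_search/scroll?scroll=1m' -d 'SCROLL_ID'

# release the server-side resources as soon as iteration is done
curl -XDELETE 'localhost:9200/_search/scroll' -d 'SCROLL_ID'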




On Wednesday, 18 June 2014 10:28:06 UTC+1, mooky wrote:
>
> Many thanks Jörg.
>
> Further questions/comments inline:
>  
>
>> 1. yes
>
>
> Thanks,
>
> 2. facet/aggregations are not very useful while scrolling (I doubt they 
>> even work at all) because scrolling works on shard level and aggregations 
>> work on indices level
>
>
> If they are not expected to work, would it make sense to either:
>
>1. prevent aggregation/facet requests in conjunction with scroll 
>requests (ie give an error to the user)
>2. Simply not execute them? 
>
> If it doesn't make sense, would it be better to not return any 
> aggregation/facet results at all?
>
> 3. a scroll request takes resources. The purpose of ClearScrollRequest is 
>> to release those resources explicitly. This is indeed a rare situation when 
>> you need explicit clearing. The time delay of releasing scrolls implicitly 
>> can be controlled by the requests.
>
>
> Do you mean the keepAlive time? So, does the scroll (and its resources) 
> always remain for the duration of the keepAlive (since the last request on 
> that scroll) regardless of whether the end of the scroll was reached or not?
>
> I read the following (from the documentation) to imply that reading to the 
> end of the scroll had the effect of "aborting" and therefore cleaning up 
> resources.
>
> Besides consuming the scroll search until no hits has been returned a 
> scroll search can also be aborted by deleting the scroll_id
>
> So, just to confirm, reading to the end of the results does nothing in 
> terms of bringing about the cleanup of the scroll? Its either the TTL or 
> the ClearScrollRequest that brings about the cleanup of resources.
>
> Is there any downside to calling ClearScrollRequest explicitly?
> (I am inclined to call it explicitly when the end of the scroll is reached 
> in order clean up resources asap)
>
>
> 4. yes, the scroll id is an encoding of the combined state of all the 
>> shards that participate in the scroll. Even if the ID looks as if it has 
>> not changed, you should always use the latest reference to the scroll ID in 
>> the response, or you may clutter the nodes with unreleased scroll resources.
>
>
> Thanks for the explanation.
>
> A null scroll ID is a matter of API design. By using hit length check for 
>> 0, you can use the same condition for other queries, so it is convenient 
>> and not confusing. Null scroll IDs are always prone to NPEs.
>
>
> Agreed. Its a matter of API style/design.
> The only issue I have with checking hits.length is that depending on the 
> SearchType, sometimes hits.length==0 does not mean the end of the results 
> (e.g. SearchType.SCAN). Its the lack of consistency that bothers me about 
> it. It requires the code that handles results to be aware of a detail of 
> the request.
>
> My case for using scrollId is that:
> The scrollId is already null if no scroll is requested.
> For this reason, (IMO) scrollId==null would be a more consistent indicator 
> of no scrolling required - or no further scrolling required. Also it would 
> re-enforce the notion that the user should always use/observe the returned 
> scrollId - they would have to.
>
> Cheers,
> -Nick
>
>
> On Wednesday, 18 June 2014 00:04:06 UTC+1, Jörg Prante wrote:
>>
>> 1. yes
>>
>> 2. facet/aggregations are not very useful while scrolling (I doubt they 
>> even work at all) because scrolling works on shard level and aggregations 
>> work on indices level
>>
>> 3. a scroll request takes resources. The purpose of ClearScrollRequest is 
>> to release those resources explicitly. This is indeed a rare situation when 
>> you need explicit clearing. The time delay of releasing scrolls implicitly 
>> can be controlled by the requests.
>>
>> 4. yes, the scroll id is an encoding of the combined state of all the 
>> shards that participate in the scroll. Even if the ID looks as if it has 
>> not changed, you should always use the latest reference to the scroll ID in 
>> the response, or you may clutter the nodes with unreleased scroll resources.
>>
>> Scrolling is very different from search, because there is a shard-level 
>> machinery that iterates over the Lucene segments and keep them open. This 
>> tends to ramp up lots of server-side resources, which may long-lived - a 
>> challenge for resource management. There is a reaper thread that wakes up 
>> from time to time to take care of stray scroll searches. You observed this 
>> as a "time delay". Ordinary search actions never keep resources open at 
>> shard level.
>>
>> Using scroll search for creating large CSV exports is adequate because 
>> this iterates through the result set doc by doc. But replacing a 
>> full-fledged search that has facets/filters/aggregations/sorting with a 
>> scroll search, you will only create 

Re: Elasticsearch Embedded Index Exact Match Query Is Not Working

2014-06-19 Thread K.Samanth Kumar Reddy
It seems the exact match query is not working on an embedded index in 
Elasticsearch.

Is this an issue with Elasticsearch?

Is there anyone who can help me with this?

Thanks,
Samanth

On Wednesday, June 18, 2014 11:06:38 PM UTC+5:30, K.Samanth Kumar Reddy 
wrote:
>
> Hi,
>
> I have been using Lucene for the last year. I was successful in creating and 
> querying indexes using Lucene.
> Now I am new to Elasticsearch. As a beginner, I have 
> created an embedded index using the Java API. 
> And when I tried to query the index for an exact match, it failed; I am 
> getting zero results.
>
> Please find the sample code below:
>
> - Am I creating embedded index in the right way?
> - How to query exact match?
> - Can you please suggest me any example for embedded index  creating and 
> querying using java?
>
> I have an urgent requirement. Can anybody please help me on this? 
>
>
> Thanks,
> Samanth
>
> //
> /// Indexing 
> //
>
> // settings to initialize the node
> ImmutableSettings.Builder settings = ImmutableSettings.settingsBuilder();
> settings.put("path.home", 
> System.getProperty("user.home")+"/elasticSearch");
>
> Node node = 
> NodeBuilder.nodeBuilder().settings(settings).clusterName("temp_cluster").node();
>
> // Get the client
> Client client = node.client();
>
> String indexName = "index1";
> String indexType = "type1";
>
> String jsonString1 = " {\"name\":\"elasticsearch\"} ";
> client.prepareIndex(indexName, indexType, "1")
> .setSource(jsonString1).execute().actionGet();
> String jsonString2 = " {\"name\":\"search\"} ";
> client.prepareIndex(indexName, indexType, "2")
> .setSource(jsonString2).execute().actionGet();
> 
> // Searching
> 
>
> QueryBuilder queryBuilder = QueryBuilders.termQuery("name", 
> "elasticsearch");
> SearchRequestBuilder requestBuilder =
> client.prepareSearch(indexName)
> .setTypes(indexType)
> .setQuery(queryBuilder.toString());
> SearchResponse response1 = requestBuilder.execute().actionGet();
>
> SearchHit[] results = response1.getHits().getHits();
> System.out.println("Current Search Result Size: " + results.length);
>
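One step worth checking, since the code above searches immediately after 
indexing (an assumption about the cause, not a confirmed diagnosis): 
Elasticsearch is near-real-time, so newly indexed documents only become 
searchable after a refresh. A minimal addition before the search, reusing 
the client and indexName variables from the snippet:

// Force a refresh so freshly indexed docs are visible to search;
// without it, an immediate query can legitimately return 0 hits.
client.admin().indices().prepareRefresh(indexName).execute().actionGet();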



Any reason not to enable http.compression?

2014-06-19 Thread David Severski
I recently learned that ES's default for http.compression is false (no 
compression). A quick search through the archives finds several instances of 
folks turning this on. Are there counterindications to enabling 
compression? My main goal is to make some remote scan queries, executed via 
the elasticsearch-py helper, kinder to my network links. The only other 
things hitting my ES 1.2.1 cluster via HTTP are Kibana and Logstash.
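For reference, it is a node-level setting in elasticsearch.yml; the level 
is optional (6 below is an assumed middle-of-the-road value):

http.compression: true
http.compression_level: 6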

Thanks for the advice and warnings!

David



Re: Splunk vs. Elastic search performance?

2014-06-19 Thread Thomas Paulsen
We had a 2.2TB/day installation of Splunk and ran it on VMware with 12 
indexers and 2 search heads. Each indexer had 1000 IOPS guaranteed assigned. 
The system is slow but OK to use. 

We tried Elasticsearch and we were able to get the same performance with 
the same number of machines. Unfortunately, with Elasticsearch you need 
almost double the amount of storage, plus a LOT of patience to make it run. 
It took us six months to set it up properly, and even now the system is 
quite buggy and unstable, and from time to time we lose data with 
Elasticsearch. 

I don't recommend ELK for a critical production system; for just dev work 
it is OK, if you don't mind the hassle of setting up and operating it. The 
costs you save by not buying a Splunk license you have to invest into 
consultants to get it up and running. Our dev teams hate Elasticsearch and 
prefer Splunk.

On Saturday, April 19, 2014 at 00:07:44 UTC+2, Mark Walkom wrote:
>
> That's a lot of data! I don't know of any installations that big but 
> someone else might.
>
> What sort of infrastructure are you running splunk on now, what's your 
> current and expected retention?
>
> Regards,
> Mark Walkom
>
> Infrastructure Engineer
> Campaign Monitor
> email: ma...@campaignmonitor.com 
> web: www.campaignmonitor.com
>
>
> On 19 April 2014 07:33, Frank Flynn wrote:
>
>> We have a large Splunk instance.  We load about 1.25 Tb of logs a day. 
>>  We have about 1,300 loaders (servers that collect and load logs - they may 
>> do other things too).
>>
>> As I look at Elasticsearch / Logstash / Kibana does anyone know of a 
>> performance comparison guide?  Should I expect to run on very similar 
>> hardware?  More? or Less?
>>
>> Sure it depends on exactly what we're doing, the exact queries and the 
>> frequency we'd run them but I'm trying to get any kind of idea before we 
>> start.
>>
>> Are there any white papers or other documents about switching?  It seems 
>> an obvious choice but I can only find very little performance comparisons 
>> (I did see that Elasticsearch just hired "the former VP of Products at 
>> Splunk, Gaurav Gupta" - but there were few numbers in that article either).
>>
>> Thanks,
>> Frank
>>


Re: Different results for the same query

2014-06-19 Thread Mathias Gawlista
No strange entries in the log.

As a temporary solution I rebuilt the index through the Ruby gem tire: 
ModelName.rebuild_index



Re: I need to query parent which has not child attach to it

2014-06-19 Thread Ayache Khettar
Hi

I am replying to my query I sent last week for the benefit of all. Here is 
the answer:

{
    "query": {
        "match": { "status": "ERROR" }
    },
    "filter": {
        "not": {
            "filter": {
                "has_child": {
                    "type": "redelivery",
                    "query": {
                        "match_all": {}
                    }
                }
            }
        }
    }
}

All the best

Ayache

On Monday, June 16, 2014 2:25:07 PM UTC+1, Ayache Khettar wrote:
>
> Hi 
>
> The query below searches for all entries in an index that have a child with 
> the given stepUUID. I am interested in searching for all entries which don't 
> have a child item populated yet. I was looking at using "must_not" but am 
> not sure how to construct the query. Your help is very much appreciated.
>
> Regards,
>
> akhettar
>
>
>
> {
> "query": {
> "match": {
> "stepUUID": "fd7a5c5d-5254-4941-9c8a-e19a39be86b0"
> }
> },
> "filter": {
> "has_child": {
> "type": "redelivery",
> "query" : {
> "match": {
> "stepUUID": {
> "query" : "fd7a5c5d-5254-4941-9c8a-e19a39be86b0",
> "operator" : "and"
> }
> }
> }
> }
> }
> }
>



Any clues about transport connection issues on AWS HVM instances?

2014-06-19 Thread Radu Gheorghe
Hi Elasticsearch list :)

I'm having some trouble while running Elasticsearch on r3.large (HVM
virtualization) instances in AWS. The short story is that, as soon as I put
any significant load on them, some requests take a very long time (for
example, Indices Stats) and I see disconnected/timeout errors in the logs.
Did anyone else experience similar things or has any ideas of another
solution than avoiding HVM instances?

More detailed symptoms:
- if there's very little load on them (say, 2GB of data on each node, few
queries and indexing operations) all is well
- by "significant load", I mean some 10GB of data, a few queries per
minute, 100 docs indexed per second (4K per doc, <10 fields). By no means
"overload", CPU rarely tops 20%, no significant GC, nothing suspicious in
any of the metrics SPM  collects. The only clue
is that, for the time the problem appears, we get heartbeat alerts because
requests to the stats APIs take too long
- by "some requests take very long time", I mean that some queries take
miliseconds (as I would expect them), and some take 10 minutes or so.
Eventually succeeding (at least this was the case for the manual requests
I've sent)
- sometimes, nodes get temporarily dropped from the cluster, but then
things quickly come back to green. However, sometimes shards were stuck
while relocating

Things I've tried:
- different ES versions and machine sizes: the same problem seems to appear
on 0.90.7 with r3.xlarge instances, I'm on 1.1.1 with r3.large
- teared down all machines and launched other ones and redeployed. Same
thing
- different JVM (1.7) versions: Oracle u25, u45, u55, u60, OpenJDK u51.
Same thing everywhere
- spawned the same number of machines with m3.large (same specs as
r3.large, except for half of the RAM, paravirtual instead of HVM). The
problem magically went away with the same data and load

Here are some Node Disconnected exceptions:
[2014-06-18 13:05:35,058][WARN ][search.action] [es01] Failed
to send release search context
org.elasticsearch.transport.NodeDisconnectedException:
[es02][inet[/10.140.1.84:9300]][search/freeContext] disconnected
[2014-06-18 13:05:35,058][DEBUG][action.admin.indices.stats] [es01]
[83f0223f-4222-4a57-a918-ff424924f002_2014-05-20][1],
node[oOlO-iewR3qnAuQkT28vfw], [P], s[STARTED]: Failed to execute
[org.elasticsearch.action.admin.indices.stats.IndicesStatsRequest@3339f285]
org.elasticsearch.transport.NodeDisconnectedException:
[es02][inet[/10.140.1.84:9300]][indices/stats/s] disconnected

I've enabled TRACE logging on both transport and discovery and all I see is
connection timeouts and exceptions, like:

07:29:19,039][TRACE][transport.netty ] [es01] close connection exception
caught on transport layer [[id: 0x190d8444]], disconnecting from relevant
node

Or, more verbose:

[2014-06-16 07:29:19,060][TRACE][transport.netty  ] [es01] connect
exception caught on transport layer [[id: 0x6816c0fe]]
org.elasticsearch.common.netty.channel.ConnectTimeoutException: connection
timed out: es03/10.171.39.244:9300
at
org.elasticsearch.common.netty.channel.socket.nio.NioClientBoss.processConnectTimeout(NioClientBoss.java:137)
at
org.elasticsearch.common.netty.channel.socket.nio.NioClientBoss.process(NioClientBoss.java:83)
at
org.elasticsearch.common.netty.channel.socket.nio.AbstractNioSelector.run(AbstractNioSelector.java:318)
at
org.elasticsearch.common.netty.channel.socket.nio.NioClientBoss.run(NioClientBoss.java:42)
at
org.elasticsearch.common.netty.util.ThreadRenamingRunnable.run(ThreadRenamingRunnable.java:108)
at
org.elasticsearch.common.netty.util.internal.DeadLockProofWorker$1.run(DeadLockProofWorker.java:42)
at
java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
at
java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
at java.lang.Thread.run(Thread.java:745)
[2014-06-16 07:29:19,060][TRACE][discovery.zen.ping.unicast] [es01] [1]
failed to connect to [#zen_unicast_7#][es01][inet[es04/10.79.155.249:9300]]
org.elasticsearch.transport.ConnectTransportException: [][inet[es04/
10.79.155.249:9300]] connect_timeout[30s]
at
org.elasticsearch.transport.netty.NettyTransport.connectToChannelsLight(NettyTransport.java:683)
at
org.elasticsearch.transport.netty.NettyTransport.connectToNode(NettyTransport.java:643)
at
org.elasticsearch.transport.netty.NettyTransport.connectToNodeLight(NettyTransport.java:610)
at
org.elasticsearch.transport.TransportService.connectToNodeLight(TransportService.java:133)
at
org.elasticsearch.discovery.zen.ping.unicast.UnicastZenPing$3.run(UnicastZenPing.java:279)
at
java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
at
java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
at java.lang.Thread.run(Thread.java:745)
Caused by: org.elasticsearch.common.netty.channel.ConnectTimeoutException:
connection timed out: es03/10.171.39.244:9300
at
org.elasticsearch.common.netty.channel.socke

Re: How to create 2 elasticsearch cluster instances on two machines

2014-06-19 Thread JONBON DASH
Hi Mark,

I have tried unicast zen discovery, but I am getting the exception "failed
to send join request to master".

This is the below configuration I set in elasticsearch.yml

NodeA:(192.168.19.102)
cluster.name: elasticsearch_RM
node.name: "ElasticSearch1"
node.master: true
node.data: true
transport.tcp.port: 9301
http.port: 9201

output:

[2014-06-19 16:00:15,572][INFO ][cluster.service  ]
[ElasticSearch1] new_master
[ElasticSearch1][fStA_GANTZaiel_RREBAMg][DC-NG-01][inet[/192.168.19.102:9301]]{master=true},
reason: zen-disco-join (elected_as_master)

NodeB: (192.168.19.105)
cluster.name: elasticsearch_RM
node.name: "ElasticSearch2"
node.master: false
node.data: true
transport.tcp.port: 9302
http.port: 9202
gateway.expected_nodes: 2
discovery.zen.minimum_master_nodes: 1
discovery.zen.ping.multicast.enabled: false
discovery.zen.ping.unicast.hosts: ["192.168.19.102:9301"]

output:

failed to send join request to master
[[ElasticSearch1][fStA_GANTZaiel_RREBAMg][DC-NG-01][inet[/192.168.19.102:9301]]{master=true}],
reason [org.elasticsearch.ElasticsearchTimeoutException: Timeout waiting
for task.]


Please let me know if I have missed any configuration.

Thanks & Regards,
Jonbon Dash
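A minimal symmetric sketch that usually works with unicast discovery: both 
nodes, the master included, carry the same discovery settings, and listing 
both transport ports keeps the two files identical (values taken from the 
configs above; whether this fixes the join timeout is an assumption):

# elasticsearch.yml on both nodes
cluster.name: elasticsearch_RM
discovery.zen.ping.multicast.enabled: false
discovery.zen.ping.unicast.hosts: ["192.168.19.102:9301", "192.168.19.105:9302"]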




On Thu, Jun 19, 2014 at 1:32 PM, Mark Walkom 
wrote:

> Have a search for unicast zen discovery in the docs and you will be good
> to go.
>
> Regards,
> Mark Walkom
>
> Infrastructure Engineer
> Campaign Monitor
> email: ma...@campaignmonitor.com
> web: www.campaignmonitor.com
>
>
> On 19 June 2014 17:52, JONBON DASH  wrote:
>
>> Hi Pulkit,
>>
>> Thanks for the early response.
>>
>> I want to maintain the same cluster name between two different machine.
>>
>> For more clarification,
>>
>> Suppose "NodeA" started with cluster name "elasticsearch_RM" in machine 1
>> as a master and "NodeB" started with same cluster name "elasticsearch_RM"
>> in machine 2 as worker. Both are running in same network.
>>
>> What setup I need to define in elasticsearch.yml file, so that worker
>> node auto-discover the master node? If possible, can you please share some
>> sample YML configuration.
>>
>> Thanks & Regards,
>> Jonbon Dash
>>
>> On Thursday, June 19, 2014 12:16:41 PM UTC+5:30, Kartavya wrote:
>>>
>>> You can use different cluster name.
>>>
>>> Thanks,
>>> Pulkit Agrawal
>>>
>>> Sent from my iPhone
>>>
>>> On 19-Jun-2014, at 11:58 AM, JONBON DASH  wrote:
>>>
>>> Good morning
>>>
>>> Can anyone guide me how to setup 2 or more instances of ElasticSearch
>>> Cluster on two different window server machine on same network?
>>>
>>> Thanks & Regards,
>>> Jonbon Dash
>>>

Re: Topic Modeling Similarity

2014-06-19 Thread Alexander Reelsen
Hey,

check out the similarity module documentation at
http://www.elasticsearch.org/guide/en/elasticsearch/reference/current/index-modules-similarity.html
You need to create your own Lucene similarity and register it in a custom
Elasticsearch plugin; a sample is at
https://github.com/tlrx/elasticsearch-custom-similarity-provider
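
If a built-in similarity is enough, no plugin is needed: you can configure it
in the index settings and reference it from the mapping. A rough sketch
(index, type, and field names are made up for illustration):

curl -XPUT 'localhost:9200/myindex' -d '{
  "settings": {
    "similarity": {
      "my_bm25": { "type": "BM25", "k1": 1.2, "b": 0.75 }
    }
  },
  "mappings": {
    "doc": {
      "properties": {
        "title": { "type": "string", "similarity": "my_bm25" }
      }
    }
  }
}'

A custom scoring formula packaged as a jar still needs the plugin route above.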

Hope this helps


--Alex


On Wed, Jun 18, 2014 at 5:16 PM, Sri Harsha 
wrote:

> Hi All
>
> I have a custom similarity jar for a particular property/field. How would
> I make use of such a jar?
>
> I am new to ES and have gone through the basic documentation.
>
> Using another similarity metric from the available list in the mapping file
> is clear to me, but I am more interested in coding my own similarity metrics.
>
> Any pointers and help are hugely appreciated.
>
> Thank you
> Sri Harsha
>


Re: Regarding autosuggest completion type of field.

2014-06-19 Thread Alexander Reelsen
Hey,

just a wild guess: do you have more than one type in your mapping, where
not every type has this field configured as a completion field?
If that is not the cause, can you create a full-blown recreation as
mentioned at http://elasticsearch.org/help
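
As a quick sanity check once every type maps the field as completion, a
suggest request against it should work, roughly like this (the index name and
text are placeholders):

curl -XPOST 'localhost:9200/yourindex/_suggest?pretty' -d '{
  "item-suggest": {
    "text": "ip",
    "completion": { "field": "title_suggest" }
  }
}'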

Thanks a lot!


--Alex


On Wed, Jun 18, 2014 at 3:32 PM, Taral Oza  wrote:

> Hello All,
>
> I am getting an error that the field title_suggest is not of type
> completion while trying to call autosuggest using the completion type.
>
> *MAPPING :*
>
> {"catalog_search":{"dynamic":"true","properties": {"_delete":{"type":
> "string"},"id":  {"type": "integer"},"title":  {"type": "string"},"image":
>  {"type": "string"},"title_suggest": {"type":
> "completion","index_analyzer": "simple","search_analyzer":
> "simple","payloads":
> true},"is_category":{"type":"boolean","default":"false"},"description":{"type":
> "string"},"url":{"type": "string"},"sku":{"type":
> "string"},"status":{"type": "string"},"categories":
>  {"type":"object","dynamic":true},"attributes":
>  {"type":"object","dynamic":true},"price":{"type":
> "float"},"add_cart_link":{"type": "string"
>
>
> Please let me know what's wrong with my mapping.
>
> Thanks,
> Taral Oza
>
>
>
>
>


Re: Strange behavior: same id in different types

2014-06-19 Thread Alexander Reelsen
Hey,

your request is already ambiguous: you specify a single ID but expect two
results to be returned, since you did not specify a type. However, to
identify a document uniquely, you need the index/type/id tuple, and the
type is missing here.

So either specify all three, or perhaps Elasticsearch should reject such a
request and return an error message. Maybe create a GitHub issue?
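
For completeness, the multi-get API also accepts a docs array in which each
entry carries its own type, so both documents can be fetched in one call,
along these lines:

curl -XGET 'http://192.168.0.118:9200/test/_mget?pretty' -d '{
  "docs": [
    { "_type": "company", "_id": "apple" },
    { "_type": "fruit",   "_id": "apple" }
  ]
}'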


--Alex


On Wed, Jun 18, 2014 at 1:57 PM, Hovsep Avakyan 
wrote:

> Hi! I just started playing with ES and noticed some strange behavior:
>
> First I just indexed 2 documents:
> curl -XPUT http://192.168.0.118:9200/test/company/apple -d '{data:"Apple
> corp"}'
> curl -XPUT http://192.168.0.118:9200/test/fruit/apple -d '{data:"Just red
> apple"}'
>
> As you can see, they have the same id and are placed in the same index, but
> different types.
> OK, next I perform a multi-get request:
>
> curl -XGET http://192.168.0.118:9200/test/_mget?pretty -d
> '{ids:["apple"]}'
>
> Response:
>
> {
>   "docs" : [ {
> "_index" : "test",
> "_type" : "fruit",
> "_id" : "apple",
> "_version" : 1,
> "found" : true, "_source" : {data:"Just red apple"}
>   } ]
> }
>
>
> As you can see, only one document was returned. Why not both? I find such
> behavior very ambiguous. I think if we query the whole index, all
> documents with the given id should be returned regardless of type.
>


Re: Get X word before and after search word

2014-06-19 Thread Alexander Reelsen
Hey,

you could potentially use the termvectors API for this, see
http://www.elasticsearch.org/guide/en/elasticsearch/reference/current/docs-termvectors.html

Not sure if this is exactly what you are after... maybe explain your
use-case a bit more.
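
As a sketch, a termvector request that returns positions and offsets for one
field looks roughly like this (index, type, and id are placeholders, and the
field may need term vectors enabled in the mapping, depending on your version):

curl -XGET 'localhost:9200/myindex/mytype/1/_termvector?pretty' -d '{
  "fields": ["text"],
  "positions": true,
  "offsets": true
}'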


--Alex



On Tue, Jun 17, 2014 at 2:19 PM, Petr Janský  wrote:

> Hello,
>
> I'm trying to find a way to get the words/terms around a search word. E.g.,
> given a document with the text "The best search engine is ElasticSearch", I
> would search for "best" and want to learn that the word "search" occurs x
> times as the next word after the search word.
>
> Thx
> Petr
>


Re: Forcing sync of replicas

2014-06-19 Thread Michael Salmon
Moving the shard was a good idea but unfortunately:

{
   "error": "ElasticsearchIllegalArgumentException[[move_allocation] can't 
move [ds_infrastructure-storage-na-qtree][0], shard is not started (state = 
INITIALIZING]]",
   "status": 400
}

Allocate didn't work either as the shard was not unallocated.
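
For reference, the reroute move command we tried takes roughly this form (the
node names here are placeholders):

curl -XPOST 'localhost:9200/_cluster/reroute' -d '{
  "commands": [
    {
      "move": {
        "index": "ds_infrastructure-storage-na-qtree",
        "shard": 0,
        "from_node": "node1",
        "to_node": "node2"
      }
    }
  ]
}'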

On Wednesday, 11 June 2014 09:03:44 UTC+2, Boaz Leskes wrote:
>
> Hi Michael,
>
> The fix option of check_on_startup checks indices and removes the
> *segments* that are corrupted; this is a Lucene-level operation and is
> primarily meant to be used in extreme cases where you only had one copy of
> the shards and those got corrupted.
>
> In your case, since the primaries are good, the easiest would be to use 
> the reroute API to tell elasticsearch to move the replicas that have been 
> corrupted to another node. When moving replicas, ES actually makes a new 
> copy of the primary as it protects against exactly these kinds of 
> situations: 
> http://www.elasticsearch.org/guide/en/elasticsearch/reference/current/cluster-reroute.html#cluster-reroute
>
> Cheers,
> Boaz 
>
> On Tuesday, June 10, 2014 9:23:56 AM UTC+2, Michael Salmon wrote:
>>
>> I had a problem with corrupted shards so I restarted my cluster with 
>> "index.shard.check_on_startup: fix" and the corrupted shards were fixed 
>> (i.e. deleted). Unfortunately the replicas and primaries then had differing 
>> numbers of documents despite them all being green. Fortunately the 
>> primaries always had more than the replicas so that I hopefully haven't 
>> lost anything.
>>
>> To fix this I set the number of replicas to 0 then 1 on all the indices 
>> that had mismatches. Is there a better technique? I really didn't like 
>> having just one copy of my data even if it was for a short time.
>>
>> I am still running 1.1.1, is this addressed by a later release?
>>
>> /Michael
>>
>



puppet-elasticsearch options

2014-06-19 Thread Andrej Rosenheinrich
Hi,

I am playing around with puppet-elasticsearch 0.4.0; it works well so far
(thanks!), but I am missing a few options I haven't seen in the
documentation. As I couldn't figure it out immediately by reading the
scripts, maybe someone can help me quickly with this:

- there is an option to change the port (9200), but this is only the HTTP
port. Is there an option to change the TCP transport port as well?
- how can I configure logging? I am thinking of logfile names and loglevel,
maybe even thresholds for the slowlog. Maybe this is interesting enough to
add to the documentation?
- is there an option in the module to easily configure memory usage?
- how can I configure the discovery minimum?

I am aware that I could go ahead and manipulate the elasticsearch.yml file
with Puppet; I am just curious whether there are options for my questions
already implemented in the module that I have missed. If someone could give
me a hint or an example, it would be really helpful!

Thanks in advance!
Andrej



Re: Logstash limitting ElasticSearch heap

2014-06-19 Thread Mark Walkom
Lots of GC isn't necessarily bad; you want to see a lot of small GCs rather
than the stop-the-world sort, which can bring your cluster down.

You can try increasing the index refresh interval - index.refresh_interval.
If you don't require "live" access, then increasing it to 60 seconds or
more will help.
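
For example, to apply it to a live index (the index name is a placeholder):

curl -XPUT 'localhost:9200/logstash-2014.06.19/_settings' -d '{
  "index": { "refresh_interval": "60s" }
}'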
If you can gist/pastebin a bit more info on your cluster (node specs,
versions, total indexes, size, etc.), it may help.

Regards,
Mark Walkom

Infrastructure Engineer
Campaign Monitor
email: ma...@campaignmonitor.com
web: www.campaignmonitor.com


On 18 June 2014 22:53, Antonio Augusto Santos  wrote:

> Hello,
>
> I think I'm hitting some kind of wall here... I'm running logstash on a
> syslog server. It receives logs from about 150 machines, plus a LOT of
> iptables logs, and sends them to ElasticSearch. But I don't think I'm
> getting all the speed that I should. My Logstash throughput tops out at
> about 1,000 events/s, and it looks like my ES servers (I have 2) are really
> lightly loaded.
>
> On logstash I have three configs (syslog, ossec and iptables), so I get
> three new nodes on my cluster. I've set the LS heap size to 2G, but
> according to bigdesk, the ES module is getting only about 150MB, and it's
> generating a LOT of GC.
>
> Below is the screenshot from bigdesk:
>
> [image: bigdesk]
>
> And here is the logstash process I'm running:
>
> # ps -ef | grep logstash
> logstash 13371 1 99 14:42 pts/0 00:29:37 /usr/bin/java 
> -Djava.io.tmpdir=/opt/logstash/tmp -Xmx2g -XX:+UseParNewGC 
> -XX:+UseConcMarkSweepGC -Djava.awt.headless=true 
> -XX:CMSInitiatingOccupancyFraction=75 -XX:+UseCMSInitiatingOccupancyOnly 
> -XX:+PrintGCDetails -XX:+PrintGCTimeStamps -XX:+PrintClassHistogram 
> -XX:+PrintTenuringDistribution -XX:+PrintGCApplicationStoppedTime 
> -Xloggc:./logstash-gc.log -jar 
> /opt/logstash/vendor/jar/jruby-complete-1.7.11.jar -I/opt/logstash/lib 
> /opt/logstash/lib/logstash/runner.rb agent -f /etc/logstash/conf.d -l 
> /var/log/logstash/logstash.log
>
>
>
>  My Syslog/LS memory usage seens very light as well (its a 4 core VM), but
> the logstash process is always topping in about 150% - 200%
>
>
> # free -m
>              total    used    free   shared  buffers  cached
> Mem:  7872   2076   5795  0 39   1502
> -/+ buffers/cache:534   7337
> Swap: 1023  8   1015
> # uptime
>  15:02:04 up 23:52,  1 user,  load average: 1.39, 1.12, 0.96
>
>
>
> Any ideas what I can do to increase the indexing performance?
>


Re: Documents on Elastic search and Kibana

2014-06-19 Thread Mark Walkom
http://www.elasticsearch.org/resources/ has videos and documentation that
will help.

Regards,
Mark Walkom

Infrastructure Engineer
Campaign Monitor
email: ma...@campaignmonitor.com
web: www.campaignmonitor.com


On 19 June 2014 17:44, srinu konda  wrote:

> Hi,
>
> I am interested in learning Elasticsearch and Kibana, so please provide
> some useful documents or links so that I can gain knowledge of ES and
> Kibana.
>
>
> Thanks & Regards,
> Srinivas.
>


ElasticSearch and duplicate content

2014-06-19 Thread Joffrey Hercule
Hi all,
I'm an Elasticsearch newbie.

I have a small problem with duplicate content: deciding whether an item
must be inserted or updated.

Example:
- a car has a brand, a model, a color, ... (max 10 criteria)
- in order to know if a record already exists in ES, we use an algorithm
with a points system
- so if the brand matches, we count 1 point. If the brand and the model
match, we count 5 points, etc.
- after the search, we compute the sum. If the total is high, we merge the
record; otherwise, we create it.

I tried a method with bool/should match and top score (see the sketch below),
but it took a very long time (more than 2 seconds) to retrieve the data for
8 bool terms.
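
For reference, a sketch of that bool/should approach, with boosts standing in
for the points (field names and values are illustrative, not my real mapping):

curl -XGET 'localhost:9200/cars/_search?size=1' -d '{
  "query": {
    "bool": {
      "should": [
        { "term": { "brand": { "value": "renault", "boost": 1 } } },
        { "term": { "model": { "value": "clio", "boost": 5 } } },
        { "term": { "color": { "value": "red", "boost": 1 } } }
      ]
    }
  }
}'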

Do you have a better idea for my problem? Thanks in advance for your
help, and sorry for my bad English!



Re: How to create 2 elasticsearch cluster instances on two machine

2014-06-19 Thread Mark Walkom
Have a search for unicast zen discovery in the docs and you will be good to
go.

Regards,
Mark Walkom

Infrastructure Engineer
Campaign Monitor
email: ma...@campaignmonitor.com
web: www.campaignmonitor.com


On 19 June 2014 17:52, JONBON DASH  wrote:

> Hi Pulkit,
>
> Thanks for the prompt response.
>
> I want to maintain the same cluster name between two different machines.
>
> For more clarification:
>
> Suppose "NodeA" started with the cluster name "elasticsearch_RM" on machine 1
> as the master and "NodeB" started with the same cluster name "elasticsearch_RM"
> on machine 2 as a worker. Both are running on the same network.
>
> What settings do I need to define in the elasticsearch.yml file so that the
> worker node auto-discovers the master node? If possible, can you please share
> a sample YML configuration?
>
> Thanks & Regards,
> Jonbon Dash
>
> On Thursday, June 19, 2014 12:16:41 PM UTC+5:30, Kartavya wrote:
>>
>> You can use a different cluster name.
>>
>> Thanks,
>> Pulkit Agrawal
>>
>> Sent from my iPhone
>>
>> On 19-Jun-2014, at 11:58 AM, JONBON DASH  wrote:
>>
>> Good morning
>>
>> Can anyone guide me on how to set up 2 or more ElasticSearch cluster
>> instances on two different Windows Server machines on the same network?
>>
>> Thanks & Regards,
>> Jonbon Dash
>>


Re: How to create 2 elasticsearch cluster instances on two machine

2014-06-19 Thread JONBON DASH
Hi Pulkit,

Thanks for the prompt response.

I want to maintain the same cluster name between two different machines.

For more clarification:

Suppose "NodeA" started with the cluster name "elasticsearch_RM" on machine 1
as the master and "NodeB" started with the same cluster name "elasticsearch_RM"
on machine 2 as a worker. Both are running on the same network.

What settings do I need to define in the elasticsearch.yml file so that the
worker node auto-discovers the master node? If possible, can you please share
a sample YML configuration?

Thanks & Regards,
Jonbon Dash

On Thursday, June 19, 2014 12:16:41 PM UTC+5:30, Kartavya wrote:
>
> You can use a different cluster name.
>
> Thanks,
> Pulkit Agrawal
>
> Sent from my iPhone
>
> On 19-Jun-2014, at 11:58 AM, JONBON DASH > 
> wrote:
>
> Good morning 
>
> Can anyone guide me on how to set up 2 or more ElasticSearch cluster
> instances on two different Windows Server machines on the same network?
>
> Thanks & Regards,
> Jonbon Dash
>


Documents on Elastic search and Kibana

2014-06-19 Thread srinu konda
Hi,

I am interested in learning Elasticsearch and Kibana, so please provide some
useful documents or links so that I can gain knowledge of ES and Kibana.


Thanks & Regards,
Srinivas.



Terms facets top 10 stat,size:500 not valid

2014-06-19 Thread 钱志强
Hi,all:
I'm using Elasticsearch for a top-10 terms facet with a request size of 500 items, but it is not working as expected.

I would be glad if you could let me know whether you have any knowledge about this. Thanks.

{
  "facets": {
    "terms": {
      "terms": {
        "field": "agent",
        "size": 10,
        "order": "count",
        "exclude": []
      },
      "facet_filter": {
        "fquery": {
          "query": {
            "filtered": {
              "query": {
                "bool": {
                  "should": [
                    {
                      "query_string": {
                        "query": "*"
                      }
                    }
                  ]
                }
              },
              "filter": {
                "bool": {
                  "must": [
                    {
                      "range": {
                        "@timestamp": {
                          "from": null,
                          "to": null
                        }
                      }
                    }
                  ]
                }
              }
            }
          }
        }
      }
    }
  },
  "size": 500
}
Thanks,
Terry
