storage use by attachment plugin

2014-06-20 Thread Izam Fahmi Alias
Dear All,

I have about 20 GB of documents, and I want to index all of their content 
using the attachment plugin. My question is: how large will the index be? 
Will it also be about 20 GB?

Thank you

-- 
You received this message because you are subscribed to the Google Groups 
elasticsearch group.
To unsubscribe from this group and stop receiving emails from it, send an email 
to elasticsearch+unsubscr...@googlegroups.com.
To view this discussion on the web visit 
https://groups.google.com/d/msgid/elasticsearch/90ac9a54-6014-4db7-8583-6771aa2568c3%40googlegroups.com.
For more options, visit https://groups.google.com/d/optout.


Re: Extremely slow indexing -- java throwing http exception errors

2014-06-20 Thread Alexander Reelsen
Hey.

Judging from the exception, this looks like an unstable network connection.
Are you using persistent HTTP connections? I assume the nodes can ping each
other without problems?
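
To illustrate the persistent-connection question, here is a minimal Python sketch for a client that talks to the HTTP port directly. The function name `index_docs` is made up for illustration; only the standard-library `http.client` calls and the `POST /{index}/{type}` endpoint are real. It reuses one connection for every request instead of opening a new socket per document:

```python
import http.client
import json


def index_docs(conn, index, doc_type, docs):
    """Index each document over the same, already-open HTTP connection.

    `conn` is expected to behave like http.client.HTTPConnection: with
    HTTP/1.1 keep-alive the underlying socket is reused between requests.
    Returns the list of HTTP status codes, one per document.
    """
    statuses = []
    path = "/{}/{}".format(index, doc_type)
    headers = {"Content-Type": "application/json"}
    for doc in docs:
        conn.request("POST", path, body=json.dumps(doc), headers=headers)
        resp = conn.getresponse()
        resp.read()  # the response must be fully read before the next request
        statuses.append(resp.status)
    return statuses
```

A caller would create a single connection, for example `http.client.HTTPConnection("192.168.6.21", 9200, timeout=10)`, pass it to `index_docs`, and close it afterwards. Logstash manages its own connections, so this only illustrates the idea.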


--Alex


On Thu, Jun 19, 2014 at 12:12 AM, alekjouhar...@gmail.com wrote:

 Hello all,

 So here's the issue: our cluster was previously very underutilized as far
 as resource consumption goes, and after some config changes (see complete
 config below) we were able to increase resource consumption, but we are
 still indexing documents at the same sluggish rate of 400 docs/second.

 Redis and Logstash are definitely not the bottlenecks, and the indexing
 rate seems to be growing exponentially worse as we pull in more data. We are
 using Elasticsearch v1.1.1.

 The Java HTTP exception errors would definitely explain the sluggishness,
 as there seems to be a socket timeout every second, like clockwork -- but
 I'm at a loss for what could be causing the errors to begin with.

 We are running Redis, Logstash, Kibana and the ES master (no data) on one
 node, and have our elasticsearch data instance on another node.  Network
 latency is definitely not so atrocious that it would be an outright
 bottleneck, and data gets to the secondary node fast enough -- but is
 backed up in indexing.

 Any help would greatly be appreciated, and I thank you all in advance!

 ### ES CONFIG ###


 index.indexing.slowlog.threshold.index.warn: 10s
 index.indexing.slowlog.threshold.index.info: 5s
 index.indexing.slowlog.threshold.index.debug: 2s
 index.indexing.slowlog.threshold.index.trace: 500ms



 monitor.jvm.gc.young.warn: 1000ms
 monitor.jvm.gc.young.info: 700ms
 #monitor.jvm.gc.young.debug: 400ms

 monitor.jvm.gc.old.warn: 10s
 monitor.jvm.gc.old.info: 5s
 #monitor.jvm.gc.old.debug: 2s
 cluster.name: iislog-cluster
 node.name: VM-ELKIIS
 discovery.zen.ping.multicast.enabled: true
 discovery.zen.ping.unicast.hosts: [192.168.6.145]
 discovery.zen.ping.timeout: 5
 node.master: true
 node.data: false
 index.number_of_shards: 10
 index.number_of_replicas: 0
 bootstrap.mlockall: true
 index.refresh_interval: 30
 indices.memory.index_buffer_size: 50%
 index.translog.flush_threshold_ops: 5
 index.store.type: mmapfs
 index.store.compress.stored: true

 threadpool.search.type: fixed
 threadpool.search.size: 20
 threadpool.search.queue_size: 100

 threadpool.index.type: fixed
 threadpool.index.size: 20
 threadpool.index.queue_size: 100

 ### JAVA ERRORS IN ES LOG ###

 [2014-06-18 09:39:09,565][DEBUG][http.netty   ] [VM-ELKIIS]
 Caught exception while handling client http traffic, closing connection
 [id: 0x7561184c, /192.168.6.3:6206 => /192.168.6.21:9200]
 java.io.IOException: Connection reset by peer
 at sun.nio.ch.FileDispatcherImpl.read0(Native Method)
 at sun.nio.ch.SocketDispatcher.read(SocketDispatcher.java:39)
 at sun.nio.ch.IOUtil.readIntoNativeBuffer(IOUtil.java:223)
 at sun.nio.ch.IOUtil.read(IOUtil.java:192)
 at sun.nio.ch.SocketChannelImpl.read(SocketChannelImpl.java:379)
 at org.elasticsearch.common.netty.channel.socket.nio.NioWorker.read(NioWorker.java:64)
 at org.elasticsearch.common.netty.channel.socket.nio.AbstractNioWorker.process(AbstractNioWorker.java:108)
 at org.elasticsearch.common.netty.channel.socket.nio.AbstractNioSelector.run(AbstractNioSelector.java:318)
 at org.elasticsearch.common.netty.channel.socket.nio.AbstractNioWorker.run(AbstractNioWorker.java:89)
 at org.elasticsearch.common.netty.channel.socket.nio.NioWorker.run(NioWorker.java:178)
 at org.elasticsearch.common.netty.util.ThreadRenamingRunnable.run(ThreadRenamingRunnable.java:108)
 at org.elasticsearch.common.netty.util.internal.DeadLockProofWorker$1.run(DeadLockProofWorker.java:42)
 at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
 at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
 at java.lang.Thread.run(Thread.java:744)
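
One way to check the "every second, like clockwork" observation is to count the resets per second straight from the log file. The following is only a sketch: the function name `resets_per_second` is made up, and it assumes each log entry starts with a timestamp in the format shown in the excerpt.

```python
import re
from collections import Counter

# Matches the leading timestamp of an ES log entry,
# e.g. "[2014-06-18 09:39:09,565]".
TIMESTAMP = re.compile(r"\[(\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2}),\d{3}\]")


def resets_per_second(lines):
    """Count 'Connection reset by peer' errors per one-second bucket."""
    counts = Counter()
    current_second = None
    for line in lines:
        match = TIMESTAMP.search(line)
        if match:
            # Remember which log entry the following lines belong to.
            current_second = match.group(1)
        if "Connection reset by peer" in line and current_second is not None:
            counts[current_second] += 1
    return counts
```

Feeding it the lines of the ES log (for example `resets_per_second(open(path))`, with `path` pointing at the log file) gives a per-second histogram that can be compared against the client's request pattern.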




Re: Splunk vs. Elastic search performance?

2014-06-20 Thread joergpra...@gmail.com
You are correct to note that Elasticsearch comes with developer settings -
that is exactly what a packaged ES is meant for.

If you find issues when configuring and setting up ES for critical use, it
would be nice to post them so that others can find help and maybe share
their solutions, because there are ES installations that run successfully
in critical environments.

By just quoting the hate of dev teams, it is rather impossible for me to
learn the reason why this is so. Learning facts is more important than
emotions when fixing software issues. The power of open source is that such
issues can be fixed with the help of a public discussion in the community.
In closed software products, you cannot rely on issues being discussed
publicly to find the best solutions for fixing them.

Jörg



On Thu, Jun 19, 2014 at 2:48 PM, Thomas Paulsen monokit2...@googlemail.com
wrote:

 We had a 2.2 TB/day installation of Splunk and ran it on VMware with 12
 indexers and 2 search heads. Each indexer had 1000 IOPS guaranteed.
 The system is slow but OK to use.

 We tried Elasticsearch and we were able to get the same performance with
 the same number of machines. Unfortunately, with Elasticsearch you need
 almost double the amount of storage, plus a LOT of patience to make it run.
 It took us six months to set it up properly, and even now the system is
 quite buggy and unstable, and from time to time we lose data with
 Elasticsearch.

 I don't recommend ELK for a critical production system. For just dev work
 it is OK, if you don't mind the hassle of setting up and operating it. The
 costs you save by not buying a Splunk license you have to invest in
 consultants to get it up and running. Our dev teams hate Elasticsearch and
 prefer Splunk.

 On Saturday, April 19, 2014 00:07:44 UTC+2, Mark Walkom wrote:

 That's a lot of data! I don't know of any installations that big but
 someone else might.

 What sort of infrastructure are you running splunk on now, what's your
 current and expected retention?

 Regards,
 Mark Walkom

 Infrastructure Engineer
 Campaign Monitor
 email: ma...@campaignmonitor.com
 web: www.campaignmonitor.com


 On 19 April 2014 07:33, Frank Flynn faultle...@gmail.com wrote:

 We have a large Splunk instance.  We load about 1.25 TB of logs a day.
  We have about 1,300 loaders (servers that collect and load logs - they may
 do other things too).

 As I look at Elasticsearch / Logstash / Kibana does anyone know of a
 performance comparison guide?  Should I expect to run on very similar
 hardware?  More? or Less?

 Sure it depends on exactly what we're doing, the exact queries and the
 frequency we'd run them but I'm trying to get any kind of idea before we
 start.

 Are there any white papers or other documents about switching?  It seems
 an obvious choice but I can find only very few performance comparisons
 (I did see that Elasticsearch just hired the former VP of Products at
 Splunk, Gaurav Gupta - but there were few numbers in that article either).

 Thanks,
 Frank







Re: Clarification on has_child filter memory requirements

2014-06-20 Thread Alexander Reelsen
Hey,

not all parent documents (and not the data), just their ids. Still this can
accumulate, which is the reason why you should monitor the size of that
data structure (exposed in the nodes stats).
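
As a rough sketch of the monitoring suggested here, the snippet below pulls the id cache size per node out of an already-parsed nodes stats response. The field path `indices.id_cache.memory_size_in_bytes` is an assumption about the 1.x response layout and should be checked against your version; the function name is made up.

```python
def id_cache_bytes_per_node(nodes_stats):
    """Map node name -> id cache size in bytes from a parsed nodes stats response.

    Assumes the layout nodes.<id>.indices.id_cache.memory_size_in_bytes;
    nodes that do not report the field are counted as 0.
    """
    sizes = {}
    for node_id, node in nodes_stats.get("nodes", {}).items():
        id_cache = node.get("indices", {}).get("id_cache", {})
        # Fall back to the node id when the response carries no node name.
        sizes[node.get("name", node_id)] = id_cache.get("memory_size_in_bytes", 0)
    return sizes
```

Polling this periodically and alerting when the value approaches a chosen fraction of the heap is one simple way to notice the accumulation early.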

Hope that helps.


--Alex


On Thu, Jun 19, 2014 at 6:03 AM, Drew Kutcharian d...@venarc.com wrote:

 Based on the official docs (
 http://www.elasticsearch.org/guide/en/elasticsearch/reference/current/query-dsl-has-child-filter.html
 ):

 {quote}
 memory considerations

 With the current implementation, all _parent field values and all _id
 field values of parent documents are loaded into memory (heap) via field
 data in order to support fast lookups, so make sure there is enough memory
 for it.
 {/quote}

 Does this mean that all the parent docs will be loaded into memory or the
 ones matching the filter? If the former is true, then it would mean that
 one should keep the size of the parent objects to minimum, right? In
 addition, say has_child is a part of a conjunction (regular filter AND
 has_child), would ES still load all the parent docs, or only the ones that
 matched the first filter?

 Thanks,

 Drew





Re: problem indexing with my analyzer

2014-06-20 Thread Tanguy Bernard
More information:
My note_source contains pictures (.jpg, .png ...) in base64, as well as text.

For my mapping I have used:
type = string
analyzer = reuters (the name of my analyzer)


Any ideas?

On Thursday, June 19, 2014 17:57:46 UTC+2, Tanguy Bernard wrote:

 Hello,
 I have an issue when I index one particular field, note_source (SQL 
 longtext).
 I use the same analyzer for every field (except date_source and id_source), 
 but for note_source I get a monitor.jvm warning.
 When I remove note_source, everything is fine. If I don't use an analyzer on 
 note_source, everything is fine, but if I use my analyzer on note_source I 
 get crashes.

 I think I have enough memory; I have set ES_HEAP_SIZE.
 Maybe my problem is with accents (ASCII, UTF-8).

 Can you help me with this?



 *My Setting*

 public function createSetting($pf){
     // Create index $pf with a custom "reuters" analyzer that lowercases,
     // folds accents to ASCII, and then emits n-grams of 3 to 250 characters.
     $params = array('index' => $pf, 'body' => array(
         'settings' => array(
             'number_of_shards' => 5,
             'number_of_replicas' => 0,
             'analysis' => array(
                 'filter' => array(
                     'nGram' => array(
                         'token_chars' => array(),
                         'type' => 'nGram',
                         'min_gram' => 3,
                         'max_gram' => 250
                     )
                 ),
                 'analyzer' => array(
                     'reuters' => array(
                         'type' => 'custom',
                         'tokenizer' => 'standard',
                         'filter' => array('lowercase', 'asciifolding', 'nGram')
                     )
                 )
             )
         )
     ));
     $this->elasticsearchClient->indices()->create($params);
     return;
 }
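
To get a feeling for what the `min_gram = 3`, `max_gram = 250` setting means for long tokens, the following Python sketch counts how many n-grams a single token of a given length produces. Whether this is what triggers the monitor.jvm warnings is not established in this thread; the function name `ngram_count` is made up for illustration.

```python
def ngram_count(token_length, min_gram=3, max_gram=250):
    """Number of n-grams an nGram filter emits for one token of the given length.

    For each gram size k in [min_gram, max_gram] there are
    token_length - k + 1 substrings, as long as k does not exceed
    the token length.
    """
    largest = min(max_gram, token_length)
    return sum(token_length - k + 1 for k in range(min_gram, largest + 1))
```

With these settings a 10-character token yields 36 grams, while a single 255-character token yields 32116, so long unbroken strings multiply the number of indexed terms considerably.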


 *My Indexing*

 public function indexTable($pf, $typeElement){
     // Register a JDBC river that pulls rows from the MySQL "source" table
     // into index $pf under type $typeElement.
     $params = array(
         'index' => '_river',
         'type' => $typeElement,
         'id' => '_meta',
         'body' => array(
             'type' => 'jdbc',
             'jdbc' => array(
                 'url' => 'jdbc:mysql://ip/name',
                 'user' => 'root',
                 'password' => 'mdp',
                 'index' => $pf,
                 'type' => $typeElement,
                 'sql' => 'select id_source as _id, id_sous_theme, '
                     . 'titre_source, desc_source, note_source, adresse_source, '
                     . 'type_source, date_source from source',
                 'max_bulk_requests' => 5,
             )
         )
     );

     $this->elasticsearchClient->index($params);
 }

 Thanks in advance.




Re: Losing data after Elasticsearch restart

2014-06-20 Thread Alexander Reelsen
Hey,

The exception you showed can happen when you remove an alias.
However, you mentioned a NullPointerException in your first post, which is
not contained in the stack trace, so it seems that one is still missing.

Also, please retry with a newer version of Elasticsearch.


--Alex


On Thu, Jun 19, 2014 at 5:13 AM, Rohit Jaiswal rohit.jais...@gmail.com
wrote:

 Hi Alexander,
 We sent you the stack trace. Can you please enlighten us on this?

 Thanks,
 Rohit


 On Mon, Jun 16, 2014 at 10:25 AM, Rohit Jaiswal rohit.jais...@gmail.com
 wrote:

 Hi Alexander,
 Thanks for your reply. We plan to upgrade in the long run; however, we
 need to fix the data loss problem on 0.90.2 in the immediate term.

 Here is the stack trace -


 10:09:37.783 PM

 [22:09:37,783][WARN ][indices.cluster  ] [Storm]
 [b7a76aa06cfd4048987d1117f3e0433a][0] failed to start shard
 org.elasticsearch.indices.recovery.RecoveryFailedException:
 [b7a76aa06cfd4048987d1117f3e0433a][0]: Recovery failed from [Jeffrey
 Mace][_jjr5BYJQjO6QzzheyDmhw][inet[/10.4.35.200:9300]] into
 [Storm][FiW6mbR5ThqqSii5Wc28lQ][inet[/10.4.40.95:9300]]
 at org.elasticsearch.indices.recovery.RecoveryTarget.doRecovery(RecoveryTarget.java:293)
 at org.elasticsearch.indices.recovery.RecoveryTarget.access$300(RecoveryTarget.java:62)
 at org.elasticsearch.indices.recovery.RecoveryTarget$2.run(RecoveryTarget.java:163)
 at java.util.concurrent.ThreadPoolExecutor$Worker.runTask(Unknown Source)
 at java.util.concurrent.ThreadPoolExecutor$Worker.run(Unknown Source)
 at java.lang.Thread.run(Unknown Source)
 Caused by: org.elasticsearch.transport.RemoteTransportException: [Jeffrey
 Mace][inet[/10.4.35.200:9300]][index/shard/recovery/startRecovery]
 Caused by: org.elasticsearch.index.engine.RecoveryEngineException:
 [b7a76aa06cfd4048987d1117f3e0433a][0] Phase[2] Execution failed
 at org.elasticsearch.index.engine.robin.RobinEngine.recover(RobinEngine.java:1147)
 at org.elasticsearch.index.shard.service.InternalIndexShard.recover(InternalIndexShard.java:526)
 at org.elasticsearch.indices.recovery.RecoverySource.recover(RecoverySource.java:116)
 at org.elasticsearch.indices.recovery.RecoverySource.access$1600(RecoverySource.java:60)
 at org.elasticsearch.indices.recovery.RecoverySource$StartRecoveryTransportRequestHandler.messageReceived(RecoverySource.java:328)
 at org.elasticsearch.indices.recovery.RecoverySource$StartRecoveryTransportRequestHandler.messageReceived(RecoverySource.java:314)
 at org.elasticsearch.transport.netty.MessageChannelHandler$RequestHandler.run(MessageChannelHandler.java:265)
 at java.util.concurrent.ThreadPoolExecutor$Worker.runTask(Unknown Source)
 at java.util.concurrent.ThreadPoolExecutor$Worker.run(Unknown Source)
 at java.lang.Thread.run(Unknown Source)
 Caused by: org.elasticsearch.transport.RemoteTransportException:
 [Storm][inet[/10.4.40.95:9300]][index/shard/recovery/translogOps]
 Caused by: org.elasticsearch.indices.InvalidAliasNameException:
 [b7a76aa06cfd4048987d1117f3e0433a] Invalid alias name
 [1a4077872e41c0634cee780c1e5fc263bdd5f14b15ac9239480547ab2d3601eb], Unknown
 alias name was passed to alias Filter
 at org.elasticsearch.index.aliases.IndexAliasesService.aliasFilter(IndexAliasesService.java:99)
 at org.elasticsearch.index.shard.service.InternalIndexShard.prepareDeleteByQuery(InternalIndexShard.java:382)
 at org.elasticsearch.index.shard.service.InternalIndexShard.performRecoveryOperation(InternalIndexShard.java:628)
 at org.elasticsearch.indices.recovery.RecoveryTarget$TranslogOperationsRequestHandler.messageReceived(RecoveryTarget.java:447)
 at org.elasticsearch.indices.recovery.RecoveryTarget$TranslogOperationsRequestHandler.messageReceived(RecoveryTarget.java:416)
 at org.elasticsearch.transport.netty.MessageChannelHandler$RequestHandler.run(MessageChannelHandler.java:265)
 at java.util.concurrent.ThreadPoolExecutor$Worker.runTask(Unknown Source)
 at java.util.concurrent.ThreadPoolExecutor$Worker.run(Unknown Source)
 at java.lang.Thread.run(Unknown Source)
 [22:09:37,799][WARN ][cluster.action.shard ] [Storm] sending failed
 shard for [b7a76aa06cfd4048987d1117f3e0433a][0],
 node[FiW6mbR5ThqqSii5Wc28lQ], [R], s[INITIALIZING], reason [Failed to start
 shard, message
 [RecoveryFailedException[[b7a76aa06cfd4048987d1117f3e0433a][0]: Recovery
 failed from [Jeffrey Mace][_jjr5BYJQjO6QzzheyDmhw][inet[/10.4.35.200:9300]]
 into [Storm][FiW6mbR5ThqqSii5Wc28lQ][inet[/10.4.40.95:9300]]]; nested:
 RemoteTransportException[[Jeffrey 
 Mace][inet[/10.4.35.200:9300]][index/shard/recovery/startRecovery]];
 nested: RecoveryEngineException[[b7a76aa06cfd4048987d1117f3e0433a][0]
 Phase[2] Execution failed]; nested:
 RemoteTransportException[[Storm][inet[/10.4.40.95:9300]][index/shard/recovery/translogOps]];
 nested: InvalidAliasNameException[[b7a76aa06cfd4048987d1117f3e0433a]
 

Re: Count request does not support [filter]. Why?

2014-06-20 Thread Alexander Reelsen
Hey,

I'm not a hundred percent sure what you mean here. The post_filter setting?
There are two possibilities: either use search_type=count, or use a
filtered query in the count API. See
http://www.elasticsearch.org/guide/en/elasticsearch/reference/current/search-count.html
http://www.elasticsearch.org/guide/en/elasticsearch/reference/current/search-request-search-type.html#count

Also, be aware that the execution models are a bit different (which may
result in different performance), see
http://www.elasticsearch.org/guide/en/elasticsearch/reference/current/search-request-post-filter.html#search-request-post-filter
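
As a sketch of the filtered-query option, the snippet below builds one request body that carries both the query and the filter, so the same body can be sent to `_count` first and to `_search` afterwards. It assumes the 1.x request format with a top-level `query` key; the helper name `count_body` and the field names `title` and `status` are made up for illustration.

```python
import json


def count_body(query, filter_clause):
    """Wrap a query and a filter into a single `filtered` query.

    The same body can then be sent to _count or to _search, so both
    requests are restricted by the identical query and filter.
    """
    return {"query": {"filtered": {"query": query, "filter": filter_clause}}}


# Hypothetical query and filter, purely for illustration.
body = count_body({"match": {"title": "elasticsearch"}},
                  {"term": {"status": "published"}})
print(json.dumps(body, indent=2))
```

For the search request, `size`, `from` and `fields` would be added next to `query` in the same dictionary, leaving the counted condition untouched.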

Hope this helps; if not, please refine your question.


--Alex



On Thu, Jun 19, 2014 at 3:23 PM, Andrew Gaydenko andrew.gayde...@gmail.com
wrote:

 Count request does not support [filter]. Why? How to count with the same
 filter (except for size, fields, from) and query I'm probably going
 to search hits after counting?





Re: Very frequent ES OOM's potential segment merge problems

2014-06-20 Thread Alexander Reelsen
Hey,

Can you provide more information about the OOM exception? Also, you should
use the nodes stats API to monitor your system, so you can more easily spot
where this memory consumption stems from. Also, are you just indexing,
or doing searches/queries/gets as well?
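
A minimal sketch of that kind of monitoring, working on one node entry from an already-parsed nodes stats response. The field names (`jvm.mem.heap_used_in_bytes`, `jvm.mem.heap_max_in_bytes`, `indices.docs.count`, `indices.docs.deleted`) are assumptions about the response layout and should be verified for your version; `summarize_node` is a made-up helper name.

```python
def summarize_node(node):
    """Return heap usage and deleted-document share for one nodes stats entry."""
    mem = node.get("jvm", {}).get("mem", {})
    docs = node.get("indices", {}).get("docs", {})
    heap_max = mem.get("heap_max_in_bytes", 0)
    deleted = docs.get("deleted", 0)
    total_docs = docs.get("count", 0) + deleted
    return {
        # Share of the configured heap currently in use.
        "heap_used_ratio": mem.get("heap_used_in_bytes", 0) / heap_max if heap_max else 0.0,
        # Share of documents marked deleted but not yet merged away.
        "deleted_ratio": deleted / total_docs if total_docs else 0.0,
    }
```

Sampling both ratios over time shows whether the heap fills up while deleted documents are still waiting to be merged away.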


--Alex


On Thu, Jun 19, 2014 at 10:35 PM, Paul Sabou paul.sa...@gmail.com wrote:

 Hi,

 *Situation:*
 We are using ES 1.2.1 on a machine with 32GB RAM, fast SSD and 12 cores. The
 machine runs Ubuntu 14.0.x LTS.
 The ES process has 12GB of RAM allocated.

 We have an index in which we inserted 105 million small documents so the
 ES data folder is around 50GB in size
 (we see this by using du -h . on the folder)

 The new document insertion rate is rather small (ie. 100-300 small docs
 per second).

 *The problem:*

 We experienced rather frequent ES OOMs (Out of Memory) at a rate of around
 one every 15 minutes. To lower the load on the index
 we deleted 104+ million docs (i.e. mostly small log entries) by deleting
 everything in one type:
 curl -XDELETE http://localhost:9200/index_xx/type_yy

 so that we ended up with an ES index of several thousand docs.
 After this we started to experience massive disk IO (10-20Mbs reads and
 1MBs writes) and more frequent OOMs (at a rate of around
 one every 7 minutes). We restarted ES after every OOM and kept monitoring
 the data folder size. Over the next hour the size went down
 to around 36GB but now it's stuck there (it doesn't go down in size even
 after several hours).

 *Questions*:
 Is this a problem related to segment merging running out of memory? If so,
 how can it be solved?
 If not, what could be the problem?


 Thanks
 Paul.





Re: ElasticSearch Node.Client Options

2014-06-20 Thread Alexander Reelsen
Hey,

A client node with a full 10gb heap where garbage collection does not free
anything means those objects are still in use (which clearly explains THAT
the OOM happens, but not WHY). Do you have huge searches going on, spanning
a lot of shards with deep pagination, all the time? Do you have some sort
of backup mechanism which might be responsible for this? Anything from a
search perspective which might lead to excessive memory usage?
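
The "garbage collection does not free anything" observation can be read directly off the `monitor.jvm` lines. The sketch below extracts the heap size before and after each collection; the function name is made up, and the pattern only assumes the `memory [before]->[after]/[max]` layout visible in the quoted log (with the `>` optional, since some mail archives strip it).

```python
import re

# Matches e.g. "memory [9.9gb]->[9.9gb]/[9.9gb]"; the ">" is optional
# because some mail archives strip it.
GC_MEMORY = re.compile(
    r"memory \[([\d.]+)(mb|gb)\]->?\[([\d.]+)(mb|gb)\]/\[([\d.]+)(mb|gb)\]"
)
UNIT_MB = {"mb": 1.0, "gb": 1024.0}


def gc_reclaimed_mb(log_text):
    """Return (before, after, max, reclaimed) in MB for each GC entry found."""
    results = []
    for m in GC_MEMORY.finditer(log_text):
        before = float(m.group(1)) * UNIT_MB[m.group(2)]
        after = float(m.group(3)) * UNIT_MB[m.group(4)]
        maximum = float(m.group(5)) * UNIT_MB[m.group(6)]
        results.append((before, after, maximum, before - after))
    return results
```

A run of entries where the reclaimed amount stays at zero while the heap sits at its maximum matches the situation described above: the collector runs for tens of seconds but the objects remain referenced.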


--Alex


On Fri, Jun 20, 2014 at 12:15 AM, VB vishal.batgh...@gmail.com wrote:

 And this stack trace.

 [2014-06-04 14:47:12,939][INFO ][cluster.service  ] [BUS2F2801F3]
 master {new 
 [ELS-10.76.121.131][dg_r12_nQbqIT_oJfjTwTg][inet[/10.76.121.131:9300]]{data=false,
 max_local_storage_nodes=1, master=true}, previous [ELS-10.76.121.130][
 BlGygpFmRn6uQNbgiEfl0A][inet[/10.76.121.130:9300]]{data=false,
 max_local_storage_nodes=1, master=true}}, removed {[ELS-10.76.121.130][
 BlGygpFmRn6uQNbgiEfl0A][inet[/10.76.121.130:9300]]{data=false,
 max_local_storage_nodes=1, master=true},}, reason: zen-disco-master_failed
 ([ELS-10.76.121.130][BlGygpFmRn6uQNbgiEfl0A][inet[/10.76.121.130:9300
 ]]{data=false, max_local_storage_nodes=1, master=true})
 [2014-06-04 14:48:03,969][WARN ][monitor.jvm  ] [BUS2F2801F3]
 [gc][old][55503][489] duration [49.6s], collections [1]/[49.9s], total
 [49.6s]/[4.5h], memory [9.9gb]->[9.9gb]/[9.9gb], all_pools {[young]
 [532.5mb]->[532.5mb]/[532.5mb]}{[survivor] [51.3mb]->[42.8mb]/[66.5mb]}{[old]
 [9.3gb]->[9.3gb]/[9.3gb]}
 [2014-06-04 14:48:40,256][WARN ][monitor.jvm  ] [BUS2F2801F3]
 [gc][old][55504][490] duration [35.7s], collections [1]/[36.2s], total
 [35.7s]/[4.5h], memory [9.9gb]->[9.9gb]/[9.9gb], all_pools {[young]
 [532.5mb]->[532.5mb]/[532.5mb]}{[survivor] [42.8mb]->[58.6mb]/[66.5mb]}{[old]
 [9.3gb]->[9.3gb]/[9.3gb]}
 [2014-06-04 14:49:30,335][WARN ][monitor.jvm  ] [BUS2F2801F3]
 [gc][old][55505][491] duration [49.9s], collections [1]/[50s], total
 [49.9s]/[4.5h], memory [9.9gb]->[9.9gb]/[9.9gb], all_pools {[young]
 [532.5mb]->[532.5mb]/[532.5mb]}{[survivor] [58.6mb]->[63.7mb]/[66.5mb]}{[old]
 [9.3gb]->[9.3gb]/[9.3gb]}
 [2014-06-04 14:49:30,350][INFO ][discovery.zen] [BUS2F2801F3]
 master_left 
 [[ELS-10.76.121.131][dg_r12_nQbqIT_oJfjTwTg][inet[/10.76.121.131:9300]]{data=false,
 max_local_storage_nodes=1, master=true}], reason [failed to ping, tried [3]
 times, each with  maximum [30s] timeout]
 [2014-06-04 14:49:30,865][WARN ][discovery.zen] [BUS2F2801F3]
 not enough master nodes after master left (reason = failed to ping, tried
 [3] times, each with  maximum [30s] timeout), current nodes:
 {[ELS-10.76.125.37][j3VQFYDaQLujkprUnke02w][inet[/10.76.125.37:9300
 ]]{max_local_storage_nodes=1, master=false},[ELS-10.76.122.
 38][5V8bqkEzTP2TzMukB5_j-Q][inet[/10.76.122.38:9300]]{max_local_storage_nodes=1,
 master=false},[ELS-10.76.125.48][TGlF1uv8Q5GpgBVvIcvRAQ][
 inet[/10.76.125.48:9300]]{max_local_storage_nodes=1,
 master=false},[EDSFB1ABF7][MqLDnM5mSLqIicIuyJk7IQ][inet[/10.76.122.19:9300
 ]]{client=true, data=false, master=false},[ELS-10.76.120.
 62][evcNI2CqSs-Zz44Jdzn0aw][inet[/10.76.120.62:9300]]{client=true,
 data=false, max_local_storage_nodes=1, master=false},[BUS9364B62][
 YZPjEsvhT6OjM9ti5Lxwkg][inet[/10.76.123.123:9300]]{client=true,
 data=false, master=false},[ELS-10.76.125.38][RyeswSy8SquV5H8Vfsw75Q][
 inet[/10.76.125.38:9300]]{max_local_storage_nodes=1,
 master=false},[EDSFB1200C][XUNaWVlYQUOVZlJMv3nHMA][inet[/10.76.122.18:9300
 ]]{client=true, data=false, master=false},[ELS-10.76.124.
 214][H8N9nIU0TKyGv_prKyRVCQ][inet[/10.76.124.214:9300]]{max_local_storage_nodes=1,
 master=false},[EDS1A1F2240][ET2u1qImQCCvqc-1gRvQbQ][inet[/
 10.76.120.87:9300]]{client=true, data=false, master=false},[ELS-10.76.125.
 40][hp4wvQxER-mMPygey2Iqgg][inet[/10.76.125.40:9300]]{max_local_storage_nodes=1,
 master=false},[ELS-10.76.122.67][BiXop5iCRgGQyGvxazMkQg][
 inet[/10.76.122.67:9300]]{max_local_storage_nodes=1,
 master=false},[ELS-10.76.121.129][pf9xpva7Q4izIy6Nj4S4iQ][
 inet[/10.76.121.129:9300]]{data=false, max_local_storage_nodes=1,
 master=true},[EDSFB21E69][RabnwdLbT1WCp9gIE-_AXw][inet[/10.76.122.20:9300
 ]]{client=true, data=false, master=false},[EDI1AE4FD76][
 UF1RMWe6RYaZGp6BU3x-VA][inet[/10.76.124.228:9300]]{client=true,
 data=false, master=false},[ELS-10.76.125.46][nXceQp40TjOSctChaGVtKw][
 inet[/10.76.125.46:9300]]{max_local_storage_nodes=1,
 master=false},[EDI1A1EA928][rWlelgQuT7KHSfyIejmLPg][inet[/
 10.76.120.82:9300]]{client=true, data=false, master=false},[ELS-10.76.121.
 188][oWldDeY4TJioki90moNySw][inet[/10.76.121.188:9300]]{max_local_storage_nodes=1,
 master=false},[ELS-10.76.122.34][kPSYm9G8R8i_z2skK_jq1g][
 inet[/10.76.122.34:9300]]{max_local_storage_nodes=1,
 master=false},[ELS-10.76.125.43][JMgOIZFBSzaQZ9bVagG57w][
 inet[/10.76.125.43:9300]]{max_local_storage_nodes=1,
 master=false},[EDI1AE3EE57][7JHGaYjzS3uI7PLN8Ynm-Q][inet[/
 10.76.124.227:9300]]{client=true, data=false,
 

Re: puppet-elasticsearch options

2014-06-20 Thread Richard Pijnenburg
Hi Andrej,

Thank you for using the puppet module :-)

The 'port' and 'discovery minimum' settings are both configuration settings 
for the elasticsearch.yml file.
You can set those in the 'config' option variable, for example:

elasticsearch::instance { 'instancename':
  config => { 'http.port' => '9210', 'discovery.zen.minimum_master_nodes' => 3 }
}


For the logging part, management of the logging.yml file is very limited at 
the moment, but I hope to get some feedback on extending that.
The thresholds for the slowlogs can be set in the same config option 
variable.
See 
http://www.elasticsearch.org/guide/en/elasticsearch/reference/current/index-modules-slowlog.html#index-slow-log
 
for more information.

If you have any further questions, let me know.

Cheers

On Thursday, June 19, 2014 9:53:10 AM UTC+1, Andrej Rosenheinrich wrote:

 Hi,

 I am playing around with puppet-elasticsearch 0.4.0. It works well so far 
 (thanks!), but I am missing a few options that I haven't seen in the 
 documentation. As I couldn't figure it out immediately by reading the 
 scripts, maybe someone can help me quickly with this:

 - There is an option to change the port (9200), but this is only the HTTP 
 port. Is there an option to change the TCP transport port as well?
 - How can I configure logging? I am thinking of logfile names and log level, 
 maybe even thresholds for the slowlog. Maybe this is interesting enough to 
 add to the documentation?
 - Is there an option in the module to easily configure memory usage?
 - How can I configure the discovery minimum?

 I am aware that I could go ahead and manipulate the elasticsearch.yml file 
 with Puppet; I am just curious whether there are options for my questions 
 already implemented in the module that I have missed. So if someone could give 
 me a hint or an example it would be really helpful!

 Thanks in advance!
 Andrej




Efficient way to store the result of a large slow query

2014-06-20 Thread Chen Wang
Hi guys,
Just wondering what is the most efficient way of executing a query that
takes time (parent/child documents) and returns a large number of entries, and
storing the result in random, evenly divided blocks on HDFS? E.g., the query
will return 100 million records and I want every random 1 million stored in a
different location (file/folder) on HDFS.

I assume I could execute the query with scroll, and then whenever I have
received 1 million records back, spawn another thread to commit
them to HDFS? Is there a way to run the query in a distributed way and have 100
threads query ES at the same time, each getting a random 1 million
back (without duplicates)? Will ES hadoop help in this case?
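
To make the first idea concrete, here is a minimal Python sketch of the batching step only. It assumes the scroll results arrive as some iterable of records; the helper name `blocks` is made up for this example, and nothing here talks to Elasticsearch or HDFS. No shuffling is attempted, so each block simply follows the order in which records arrive.

```python
from itertools import islice


def blocks(records, block_size):
    """Yield consecutive lists of at most block_size records."""
    iterator = iter(records)
    while True:
        block = list(islice(iterator, block_size))
        if not block:
            return  # source exhausted
        yield block


# Tiny stand-in for a scroll: 10 records cut into blocks of 4.
example = list(blocks(range(10), 4))
```

Each yielded block could then be handed to a worker thread that writes it to its own file or folder.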

Appreciate your input!
Chen



Re: Type Ahead feature for contact list

2014-06-20 Thread Omi60
Thanks for the help.

I am able to see the correct results now, but could you please suggest how
to write the following query in Java?

curl -X POST 'localhost:9200/hotels/_suggest' -d '
{
  "hotels" : {
    "text" : "m",
    "completion" : {
      "field" : "name_suggest"
    }
  }
}'



--
View this message in context: 
http://elasticsearch-users.115913.n3.nabble.com/Re-Type-Ahead-feature-for-contact-list-tp4057883p4057889.html
Sent from the ElasticSearch Users mailing list archive at Nabble.com.



Re: Storing auto generated _id under different name

2014-06-20 Thread Johny Lam
I'm using elasticsearch as the database for a service. It would make things 
easier. For example, I could just return the _source field when other apps 
query my service. Related to that is that on the javascript client side, I 
am inserting the _id field into the _source JSON object as id and using 
that as the model for two way data-binding. If the id field was in the 
source already, I wouldn't have to keep track of this.

On Tuesday, June 17, 2014 4:26:07 PM UTC-7, Adrien Grand wrote:

 No, it isn't possible.

 Why would you like to have the id of the document included in _source?


 On Tue, Jun 17, 2014 at 8:16 PM, Johny Lam john...@gmail.com wrote:

 Is it possible to have the _id be auto-generated and store it so that 
 it's in the _source field under a different name, like say id instead of 
 _id?
  




 -- 
 Adrien Grand
  



Combine elasticsearch/logstash/kibana with hadoop

2014-06-20 Thread kay rus


Hi

To improve performance I'm trying to combine 
elasticsearch/logstash/kibana with hadoop (cdh4). Unfortunately I'm 
familiar only with HDFS, where I store logs. In my opinion the combination 
of elasticsearch and hadoop should use HDFS as storage and transparently use 
hadoop map/reduce functionality for search.

I went through the elasticsearch-hadoop documentation and unfortunately I 
didn't understand how this combination could help me with log analysis in 
kibana. The documentation says "Elasticsearch real-time search and analytics 
natively integrated with Hadoop". But what should I configure? Hadoop with 
Elasticsearch or Elasticsearch with Hadoop? As for the first one, I found only 
Java code snippets and nothing about the Hadoop configuration, so it seems 
that I would need to be familiar with Java programming. As for the second one, 
I found only the Hadoop HDFS Snapshot/Restore plugin, but I guess it was 
developed for index backup/restore, am I right?

Anyway, are my expectations right? Or was elasticsearch-hadoop developed 
for developers only, and is it not suitable for 
elasticsearch/logstash/kibana + hadoop?



Re: Snapshot Restore in a cluster of two nodes

2014-06-20 Thread Alexander Reelsen
Hey,

can you be more precise and create a fully fledged example (generating the
repository, executing the snapshot on cluster one, executing restore on
cluster 2, etc) and include the concrete error message in order to find out
what 'the process breaks' means here? Also provide info about elasticsearch
and jvm versions. Thanks!

Snapshots are always done per index (the primary shards) and not per node,
so there must be something else going on.
Is it possible that only one node has write access to the repository?


--Alex


On Thu, Jun 19, 2014 at 3:36 PM, Daniel Bubenheim 
daniel.bubenh...@googlemail.com wrote:

 Hello,

 we have a cluster of two nodes. Every index in this cluster consists of 2
 shards and one replica. We want to make use of snapshot & restore to
 transfer data between two clusters. When we make our snapshots on node one
 only the primary shard is included; the replica shard is missing. While
 restoring on the other cluster the process breaks because of the missing
 second shard.
 Do we have to make a snapshot for each node to include both primary shards
 so that we can restore the whole index, or am I missing something here?

 Thanks in advance
 Daniel





Re: How does shingle filter work on match_phrase in query phase?

2014-06-20 Thread Cédric Hourcade
Hello,

Let's say you have an indexed text "t1 t2 t3" with shingles. The token
positions are also indexed, so you get: "t1" (at pos 1), "t1 t2" (pos
1), "t2" (pos 2), "t2 t3" (pos 2) and "t3" (pos 3).

So if you search with a match_phrase for "t1 t2 t3" (even if the query is
not tokenized as shingles) it will match the document, because t1,
t2 and t3 are considered next to each other (based on their recorded
positions) for this document.
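
As a rough illustration of the position argument above, here is a minimal Python sketch. It is a simplified model and not Lucene's actual implementation: the names `shingle_tokens` and `phrase_matches` are made up for this example, and it assumes shingles of at most two words that share the position of their first word.

```python
def shingle_tokens(words, max_size=2):
    """Return (token, position) pairs: single words plus shingles."""
    tokens = []
    for index in range(len(words)):
        for size in range(1, max_size + 1):
            if index + size <= len(words):
                token = " ".join(words[index:index + size])
                tokens.append((token, index + 1))  # positions start at 1
    return tokens


def phrase_matches(indexed, phrase_terms):
    """True if the terms occur at consecutive positions in the index."""
    positions = {}
    for token, pos in indexed:
        positions.setdefault(token, set()).add(pos)
    starts = positions.get(phrase_terms[0], set())
    return any(
        all(start + offset in positions.get(term, set())
            for offset, term in enumerate(phrase_terms))
        for start in starts
    )


indexed = shingle_tokens(["t1", "t2", "t3"])
```

In this model the extra shingle tokens do not get in the way: the phrase check only looks at the positions of t1, t2 and t3.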

Cédric Hourcade
c...@wal.fr


On Fri, Jun 20, 2014 at 7:04 AM, 陳智清 walker0...@gmail.com wrote:
 How does shingle filter work on match_phrase in query phase?

 After analyzing phrase t1 t2 t3, shingle filter produced five tokens,
   t1
   t2
   t3
   t1 t2
   t2 t3

 Will match_phrase still give t1 t2 t3 a match? How it works? Thank you.




Re: Very frequent ES OOM's potential segment merge problems

2014-06-20 Thread Paul Sabou
java.lang.IllegalStateException: this writer hit an OutOfMemoryError; cannot complete merge
    at org.apache.lucene.index.IndexWriter.commitMerge(IndexWriter.java:3546)
    at org.apache.lucene.index.IndexWriter.mergeMiddle(IndexWriter.java:4272)
    at org.apache.lucene.index.IndexWriter.merge(IndexWriter.java:3728)
    at org.apache.lucene.index.ConcurrentMergeScheduler.doMerge(ConcurrentMergeScheduler.java:405)
    at org.apache.lucene.index.TrackingConcurrentMergeScheduler.doMerge(TrackingConcurrentMergeScheduler.java:106)
    at org.apache.lucene.index.ConcurrentMergeScheduler$MergeThread.run(ConcurrentMergeScheduler.java:482)


On Thursday, June 19, 2014 10:35:28 PM UTC+2, Paul Sabou wrote:

 Hi,

 *Situation:*
 We are using ES 1.2.1 on a machine with 32GB RAM, fast SSD and 12 cores. The 
 machine runs Ubuntu 14.0.x LTS.
 The ES process has 12GB of RAM allocated.

 We have an index in which we inserted 105 million small documents so the 
 ES data folder is around 50GB in size
 (we see this by using du -h . on the folder)

 The new document insertion rate is rather small (ie. 100-300 small docs 
 per second).

 *The problem:*

 We experienced rather frequent ES OOM (Out of Memory) at a rate of around 
 one every 15 mins. To lower the load on the index
 we deleted 104+ million docs (ie. mostly small log entries) by deleting 
 everything in one type :
 curl -XDELETE http://localhost:9200/index_xx/type_yy

 so that we ended up with an ES index with several thousand docs. 
 After this we started to experience massive disk IO (10-20Mbs reads and 
 1MBs writes) and more frequent OOMs (at a rate of around
 one every 7 minutes). We restarted ES after every OOM and kept monitoring 
 the data folder size. Over the next hour the size went down
 to around 36GB but now it's stuck there (it doesn't go down in size even 
 after several hours).

 *Questions*: 
 Is this a problem related to segment merging running out of memory? If so, 
 how can it be solved? 
 If not, what could be the problem? 


 Thanks
 Paul.





Re: 100% CPU on 1 Node with JMeter Tests

2014-06-20 Thread Cédric Hourcade
Hello,

It wouldn't surprise me if both Black Mamba and Slapstick were hitting
100%: they have more shards and have to handle more requests than the
other nodes. But in your case it's only one node.

First, are your HTTP requests evenly spread over the 4 nodes? You could also
check that all your shards are about the same size.

To check whether it's a hardware problem I would:
- disable shard rebalancing
- stop the cluster
- swap the whole data directories of Black Mamba and Slapstick
- start the cluster and rerun the benchmark

You'll then see whether the problem comes from the 3 shards or from the server
itself.

Cédric Hourcade
c...@wal.fr


On Thu, Jun 19, 2014 at 7:40 PM, sai...@roblox.com wrote:

 Bump

 On Wednesday, June 18, 2014 6:20:58 PM UTC-7, sai...@roblox.com wrote:

 One out of 4 nodes always spikes to 100% CPU when we do some load tests
 using JMeter (50 Threads, 50 Loops) with any query (Match_All, Filtered
 Query etc.,). That particular node has 3 Shards with 2 Primary Shards. The
 other nodes have less than 40% CPU on them at the same time. The heap is
 set at 30GB on all of them.  This is the GIST for Hot Threads
 https://gist.github.com/RobloxSai/9f040bbd5ab7b58f2b1d when the Test
 was running. Is there anything else that can be done to improve the
 performance? The Query Response times jump to 5-8 seconds when the CPU is
 hammered.



 https://lh3.googleusercontent.com/-EDnXAEg34cA/U6I5fb2zNOI/AB4/DqybJhq3Yhc/s1600/4+Nodes+Setup.png

 I had previously posted the specs of the Servers on another thread
 https://groups.google.com/forum/?utm_medium=emailutm_source=footer#!topic/elasticsearch/P1o_4bVvECA.
 Here are the Server Specs:
 *Machine Specs:*
 Processor:               Intel(R) Xeon(R) CPU E5-2630 0 @ 2.30GHz
 Number of CPU cores:     24
 Number of Physical CPUs: 2
 Installed RAM:           [~256 GB Total] 128 GB 128 GB 16 MB
 Drive:                   Two 278GB SAS Drive configured in RAID 0
 *OS:*
 Arch:                    64bit (x86_64)
 OS Type:                 Linux
 Kernel:                  2.6.32-431.5.1.el6.x86_64
 OS Version:              Red Hat Enterprise Linux Server release 6.5 (Santiago)
 Java Version:            Java 1.7.0_51 (Java 7u51 x64 version for Linux).





Re: Count request does not support [filter]. Why?

2014-06-20 Thread Andrew Gaydenko
Sorry, I wasn't clear enough. I mean Java client's CountRequest.source()'s 
argument content, { filter: ... } in particular.



Re: problem indexing with my analyzer

2014-06-20 Thread Cédric Hourcade
Does it mean you're applying the "reuters" analyzer to your base64
encoded pictures?

I guess it generates a really huge number of tokens for each entry
because of your nGram filter (with a max_gram of 250).
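
To get a feeling for the numbers, here is a small Python sketch that counts how many grams a single token produces. It assumes the filter emits every substring whose length lies between min_gram and max_gram; `ngram_count` is a hypothetical helper written for this mail, not part of any Elasticsearch client.

```python
def ngram_count(token_length, min_gram=3, max_gram=250):
    """Count the substrings of length min_gram..max_gram in one token."""
    longest = min(token_length, max_gram)
    # A token of length n contains (n - k + 1) substrings of length k.
    return sum(token_length - k + 1 for k in range(min_gram, longest + 1))


short_word = ngram_count(10)    # an ordinary 10-character word
long_token = ngram_count(255)   # one long run of base64 characters
```

Under this assumption a 10-character word yields 36 grams, while a single 255-character token yields 32116, so long base64 runs multiply the number of indexed terms by several orders of magnitude.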

Cédric Hourcade
c...@wal.fr


On Fri, Jun 20, 2014 at 9:09 AM, Tanguy Bernard
bernardtanguy1...@gmail.com wrote:
 Information
 My note_source contain picture (.jpg, .png ...) in base64 and text.

 For my mapping I have used :
 type = string
 analyzer = reuteurs (the name of my analyzer)


 Any idea ?

 On Thursday, June 19, 2014 5:57:46 PM UTC+2, Tanguy Bernard wrote:

 Hello
 I have an issue when I index one particular field, note_source (SQL
 longtext).
 I use the same analyzer for each field (except date_source and id_source),
 but for note_source I get a monitor.jvm warning.
 When I remove note_source, everything is fine. If I don't use an analyzer on
 note_source, everything is fine, but if I use my analyzer on note_source I
 get crashes.

 I think I have enough memory; I have set ES_HEAP_SIZE.
 Maybe my problem is with accents (ASCII, UTF-8).

 Can you help me with this?



 My Setting

 public function createSetting($pf){
     $params = array('index' => $pf, 'body' => array(
         'settings' => array(
             'number_of_shards' => 5,
             'number_of_replicas' => 0,
             'analysis' => array(
                 'filter' => array(
                     'nGram' => array(
                         "token_chars" => array(),
                         "type" => "nGram",
                         "min_gram" => 3,
                         "max_gram" => 250
                     )
                 ),
                 'analyzer' => array(
                     'reuters' => array(
                         'type' => 'custom',
                         'tokenizer' => 'standard',
                         'filter' => array('lowercase', 'asciifolding', 'nGram')
                     )
                 )
             )
         )
     ));
     $this->elasticsearchClient->indices()->create($params);
     return;
 }


 My Indexing

 public function indexTable($pf, $typeElement){
     $params = array(
         "index" => '_river',
         "type" => $typeElement,
         "id" => "_meta",
         "body" => array(
             "type" => "jdbc",
             "jdbc" => array(
                 "url" => "jdbc:mysql://ip/name",
                 "user" => 'root',
                 "password" => 'mdp',
                 "index" => $pf,
                 "type" => $typeElement,
                 "sql" => "select id_source as _id, id_sous_theme, titre_source, desc_source, note_source, adresse_source, type_source, date_source from source",
                 "max_bulk_requests" => 5,
             )
         )
     );

     $this->elasticsearchClient->index($params);
 }

 Thanks in advance.




ElasticSearch queries always return all the datas stored in the index

2014-06-20 Thread Alexandre Touret
hello,


https://stackoverflow.com/questions/24323480/elasticsearch-queries-always-return-all-the-datas-stored-in-the-index#
 
  
I'm trying to index and query an index stored in ES 1.2. I both create and 
populate the index with the Java API, using the TransportClient. I have 
the following mapping:

get /tp/carte/_mapping
{
   "tp": {
      "mappings": {
         "carte": {
            "properties": {
               "adherents": {
                  "properties": {
                     "birthday": {
                        "type": "date",
                        "format": "dateOptionalTime"
                     },
                     "firstname": {
                        "type": "string"
                     },
                     "lastname": {
                        "type": "string"
                     }
                  }
               },
               "dateEdition": {
                  "type": "date",
                  "format": "dateOptionalTime"
               }
            }
         }
      }
   }
}


When I search for an object by its ID, it works fine, but when I try to query 
the content of one of my nested objects, *ES always returns all the objects 
stored in the index*. I also tried to create the objects manually with 
Sense and I get the same behaviour.

Example of my insert

put /tp/carte/20454795
{
   "dateEdition": "2014-06-01T22:00:00.000Z",
   "adherents": [
      {
         "birthday": "1958-05-05T23:00:00.000Z",
         "firstname": "ANDREW",
         "lastname": "DOE"
      },
      {
         "birthday": "1964-03-01T23:00:00.000Z",
         "firstname": "ROBERT",
         "lastname": "DOE"
      },
      {
         "birthday": "1989-02-27T23:00:00.000Z",
         "firstname": "DAVID",
         "lastname": "DOE"
      },
      {
         "birthday": "1990-12-11T23:00:00.000Z",
         "firstname": "JOHN",
         "lastname": "DOE"
      }
   ]
}

Finally, here is a query executed in Sense:


get /tp/carte/_search
{
  "query": {
    "bool": {
      "must": [
        {
          "match": {
            "adherents.lastname": {
              "query": "DOE"
            }
          }
        }
      ]
    }
  }
}


How can I fix that ?

Thanks

Regards


Alexandre




How to set the query resultset size to infinite

2014-06-20 Thread Nuno Carvalho
Hi all,

I just joined the mailing list, so sorry if this topic was discussed before.

I would like to set the query size to infinite (or no limit).

http://www.elasticsearch.org/guide/en/elasticsearch/reference/current/search-request-from-size.html
This page explains what the parameters do, but there are no details on how 
to set the size to no limit or (if not possible) what is the max value 
accepted by ES for this parameter. I tried setting the value to -1, as I've 
read somewhere that this would be recognized as no limit, but instead it 
defaults to 10.

Any help?

Thanks,
Nuno

 



Re: ElasticSearch queries always return all the datas stored in the index

2014-06-20 Thread David Pilato
Hey Alexandre,


This is correct. You are searching for a carte which contains an adherent.
Elasticsearch gives you a carte object as an answer. And elasticsearch gives 
you back exactly what you have indexed.
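
A toy Python model of that behaviour, as a sketch only: it ignores analysis and scoring, the helper `search_by_lastname` is invented for this mail, and the second document ("other-carte") is made up purely for illustration. The point is that a hit is always the whole indexed document, not just the matching adherent.

```python
# Two toy "carte" documents reduced to the fields that matter here.
cartes = {
    "20454795": {"adherents": [
        {"firstname": "ANDREW", "lastname": "DOE"},
        {"firstname": "ROBERT", "lastname": "DOE"},
    ]},
    "other-carte": {"adherents": [  # hypothetical second document
        {"firstname": "JANE", "lastname": "SMITH"},
    ]},
}


def search_by_lastname(docs, lastname):
    """Return every whole document with at least one matching adherent."""
    return {
        doc_id: doc
        for doc_id, doc in docs.items()
        if any(a["lastname"] == lastname for a in doc["adherents"])
    }


hits = search_by_lastname(cartes, "DOE")
```

Here `hits` contains only the first carte, but with all of its adherents attached.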

That being said, I think you could look at parent/child feature for that use 
case.
Or you can have one carte object per adherent?

Makes sense?

-- 
David Pilato | Technical Advocate | Elasticsearch.com
@dadoonet | @elasticsearchfr


On 20 June 2014 at 11:06:40, Alexandre Touret (alexan...@touret.info) wrote:



Re: ElasticSearch queries always return all the datas stored in the index

2014-06-20 Thread Alexandre Touret
Hello,
thanks for your response

When I add another carte

put /tp/carte/20450813
{
   "dateEdition": "2014-06-01T22:00:00.000Z",
   "adherents": [
      {
         "birthday": "1963-03-22T23:00:00.000Z",
         "firstname": "FLORENCE",
         "lastname": "SMITH"
      },
      {
         "birthday": "2001-10-12T22:00:00.000Z",
         "firstname": "M ANGELO",
         "lastname": "SMITH"
      },
      {
         "birthday": "2003-07-30T22:00:00.000Z",
         "firstname": "M LILI",
         "lastname": "SMITH"
      }
   ]
}

and I run the query described above, I get both 'carte' documents back.

Is that normal?
Do you have an example or a link to illustrate the parent/child feature?


Thanks



On Friday, June 20, 2014 11:12:04 AM UTC+2, David Pilato wrote:


Re: ElasticSearch queries always return all the datas stored in the index

2014-06-20 Thread David Pilato
Searching for DOE gives you that answer? 
If so, it's not normal IMHO. You should try to reproduce it with a full SENSE 
script recreation so we can replay it and help you from here.

See http://www.elasticsearch.org/help/ for information.

About parent child, you could read this: 
http://www.elasticsearch.org/blog/managing-relations-inside-elasticsearch/



-- 
David Pilato | Technical Advocate | Elasticsearch.com
@dadoonet | @elasticsearchfr


On 20 June 2014 at 11:19:23, Alexandre Touret (alexan...@touret.info) wrote:

Hello,
thanks for your response

When I add an other carte

put /tp/carte/20450813
{
  dateEdition: 2014-06-01T22:00:00.000Z,
   adherents: [
  {
 birthday: 1963-03-22T23:00:00.000Z,
 firstname: FLORENCE,
 lastname: SMITH
  },
  {
 birthday: 2001-10-12T22:00:00.000Z,
 firstname: M ANGELO,
 lastname: SMITH  },
  {
 birthday: 2003-07-30T22:00:00.000Z,
 firstname: M LILI,
 lastname: SMITH
  }
   ]
}

and I run the query described above, I have both of the two 'carte'

Is it normal ?
Do you have an example or a link to illustrate the parent/child feature ?


Thanks



Le vendredi 20 juin 2014 11:12:04 UTC+2, David Pilato a écrit :
Hey Alexandre,


This is correct. You are searching for a carte which contains an adherent.
Elasticsearch gives you a carte object as an answer. And elasticsearch gives 
you back exactly what you have indexed.

That being said, I think you could look at parent/child feature for that use 
case.
Or you can have one carte object per adherent?

Makes sense?

-- 
David Pilato | Technical Advocate | Elasticsearch.com
@dadoonet | @elasticsearchfr


Le 20 juin 2014 à 11:06:40, Alexandre Touret (alex...@touret.info) a écrit:

hello,


I'm trying to index and query an index store in ES 1.2. I both create and 
populate the index with the JAVA API using the transportclient api. I have the 
following mapping:


get /tp/carte/_mapping
{
   tp: {
  mappings: {
 carte: {
properties: {
   adherents: {
  properties: {
 birthday: {
type: date,
format: dateOptionalTime
 },
 firstname: {
type: string
 },
 lastname: {
type: string
 }
  }
   },
   dateEdition: {
  type: date,
  format: dateOptionalTime
   }
}
 }
  }
   }
}



When I search for an object by its ID, it works fine, but when I try to query the 
content of one of my nested objects, ES always returns all the objects stored 
in the index. I also tried creating the objects manually with Sense and I see 
the same behaviour.

Example of my insert:

PUT /tp/carte/20454795
{
  "dateEdition": "2014-06-01T22:00:00.000Z",
  "adherents": [
    {
      "birthday": "1958-05-05T23:00:00.000Z",
      "firstname": "ANDREW",
      "lastname": "DOE"
    },
    {
      "birthday": "1964-03-01T23:00:00.000Z",
      "firstname": "ROBERT",
      "lastname": "DOE"
    },
    {
      "birthday": "1989-02-27T23:00:00.000Z",
      "firstname": "DAVID",
      "lastname": "DOE"
    },
    {
      "birthday": "1990-12-11T23:00:00.000Z",
      "firstname": "JOHN",
      "lastname": "DOE"
    }
  ]
}

Finally, here is a query executed in Sense:


GET /tp/carte/_search
{
  "query": {
    "bool": {
      "must": [
        {
          "match": {
            "adherents.lastname": {
              "query": "DOE"
            }
          }
        }
      ]
    }
  }
}



How can I fix that?

Thanks

Regards



Alexandre




Re: How to set the query resultset size to infinite

2014-06-20 Thread David Pilato
You don't want to do that!
If you need to extract (download) 1 000 000 000 records, use the 
scan & scroll API: 
http://www.elasticsearch.org/guide/en/elasticsearch/guide/current/scan-scroll.html#scan-scroll
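
To make the idea concrete, here is a minimal sketch of a scan & scroll loop. It assumes the 1.x-style endpoints described in the guide linked above (`search_type=scan` on the first request, then `/_search/scroll` with the scroll id as the body), and it takes the HTTP transport as a parameter: `send` and `scan_scroll` are hypothetical names for this example, not part of any Elasticsearch client.

```python
from typing import Any, Callable, Dict, Iterator

# (method, path, body) -> decoded JSON response; supplied by the caller.
Transport = Callable[[str, str, Any], Dict[str, Any]]


def scan_scroll(send: Transport, index: str, query: Dict[str, Any],
                page_size: int = 500, keep_alive: str = "1m") -> Iterator[Dict[str, Any]]:
    """Yield every hit for `query`, one page at a time, instead of one huge result set."""
    # Open the scan. In scan mode the first response is expected to carry
    # only a scroll id, not hits.
    response = send("GET", f"/{index}/_search?search_type=scan&scroll={keep_alive}",
                    {"query": query, "size": page_size})
    scroll_id = response["_scroll_id"]
    while True:
        # Each scroll call returns the next page and a scroll id for the following call.
        response = send("GET", f"/_search/scroll?scroll={keep_alive}", scroll_id)
        hits = response["hits"]["hits"]
        if not hits:
            break  # an empty page means the scroll is exhausted
        yield from hits
        scroll_id = response["_scroll_id"]
```

Because the function is a generator, the caller can stream records to a file or another system without ever holding the full result set in memory, which is the point of not asking for an "infinite" size.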

-- 
David Pilato | Technical Advocate | Elasticsearch.com
@dadoonet | @elasticsearchfr


On 20 June 2014 at 11:08:00, Nuno Carvalho (nuno...@gmail.com) wrote:

Hi all,

I just joined the mailing list, so sorry if this topic was discussed before.

I would like to set the query size to infinite (or no limit).

http://www.elasticsearch.org/guide/en/elasticsearch/reference/current/search-request-from-size.html
This page explains what the parameters do, but there are no details on how to 
set the size to no limit or (if not possible) what is the max value accepted 
by ES for this parameter. I tried setting the value to -1, as I've read 
somewhere that this would be recognized as no limit, but instead it defaults 
to 10.

Any help?

Thanks,
Nuno

 


Re: problem indexing with my analyzer

2014-06-20 Thread Tanguy Bernard
Yes, I am applying the reuters analyzer to my document (composed of text and pictures).
My goal is to search the text of the document by any word or 
part of a word.

Yes, the problem is my nGram filter.
How do I solve it? Decrease the nGram max? Switch to another analyzer 
that still meets my goal?
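
To get a feel for how much the nGram maximum matters, the small sketch below counts the character n-grams produced for a single token. It is a back-of-the-envelope model only: `ngram_count` is a hypothetical helper, it assumes every n-gram length from min_gram to max_gram is emitted at every position, and the 255-character token length is just an illustrative value for a long base64 fragment.

```python
def ngram_count(token_length: int, min_gram: int, max_gram: int) -> int:
    """Number of character n-grams emitted for one token of the given length."""
    # n-grams longer than the token itself cannot exist.
    longest = min(max_gram, token_length)
    # A token of length L contains (L - n + 1) n-grams of length n.
    return sum(token_length - n + 1 for n in range(min_gram, longest + 1))


for max_gram in (250, 20):
    print(f"max_gram={max_gram}: {ngram_count(255, 3, max_gram)} n-grams per 255-char token")
```

Under these assumptions a single 255-character token yields about 32,000 n-grams with max_gram = 250 and about 4,400 with max_gram = 20, so lowering the maximum helps a lot, but long base64 runs still produce far more terms than ordinary words do.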

On Friday, 20 June 2014 10:58:49 UTC+2, Cédric Hourcade wrote:

 Does that mean you're applying the reuters analyzer to your base64 
 encoded pictures? 

 I guess it generates a really huge number of tokens for each entry 
 because of your nGram filter (with a max of 250). 

 Cédric Hourcade 
 c...@wal.fr 


 On Fri, Jun 20, 2014 at 9:09 AM, Tanguy Bernard 
 bernardt...@gmail.com wrote: 
  Information 
  My note_source contains pictures (.jpg, .png ...) in base64 and text. 
  
  For my mapping I have used: 
  type = string 
  analyzer = reuters (the name of my analyzer) 
  
  
  Any idea ? 
  
  On Thursday, 19 June 2014 17:57:46 UTC+2, Tanguy Bernard wrote: 
  
  Hello 
  I have an issue when I index one particular field, note_source (SQL 
  longtext). 
  I use the same analyzer for every field (except date_source and 
  id_source), but for note_source I get monitor.jvm warnings. 
  When I remove note_source, everything is fine. If I don't use an analyzer 
  on note_source, everything is fine, but if I use my analyzer on 
  note_source, I get crashes. 
  
  I think I have enough memory; I have set ES_HEAP_SIZE. 
  Maybe my problem is with accents (ASCII, UTF-8). 
  
  Can you help me with this? 
  
  
  
  My Setting 
  
  public function createSetting($pf){ 
      // Index settings: custom "reuters" analyzer with an nGram token filter. 
      $params = array('index' => $pf, 'body' => array( 
          'settings' => array( 
              'number_of_shards' => 5, 
              'number_of_replicas' => 0, 
              'analysis' => array( 
                  'filter' => array( 
                      'nGram' => array( 
                          'token_chars' => array(), 
                          'type' => 'nGram', 
                          'min_gram' => 3, 
                          'max_gram' => 250 
                      ) 
                  ), 
                  'analyzer' => array( 
                      'reuters' => array( 
                          'type' => 'custom', 
                          'tokenizer' => 'standard', 
                          'filter' => array('lowercase', 'asciifolding', 'nGram') 
                      ) 
                  ) 
              ) 
          ) 
      )); 
      $this->elasticsearchClient->indices()->create($params); 
      return; 
  } 
  
  
  My Indexing 
  
  public function indexTable($pf, $typeElement){ 
      // Register a JDBC river that pulls rows from MySQL into the index. 
      $params = array( 
          'index' => '_river', 
          'type' => $typeElement, 
          'id' => '_meta', 
          'body' => array( 
              'type' => 'jdbc', 
              'jdbc' => array( 
                  'url' => 'jdbc:mysql://ip/name', 
                  'user' => 'root', 
                  'password' => 'mdp', 
                  'index' => $pf, 
                  'type' => $typeElement, 
                  'sql' => 'select id_source as _id, id_sous_theme, titre_source, desc_source, note_source, adresse_source, type_source, date_source from source', 
                  'max_bulk_requests' => 5 
              ) 
          ) 
      ); 
  
      $this->elasticsearchClient->index($params); 
  } 
  
  Thanks in advance. 
  


Re: ElasticSearch queries always return all the datas stored in the index

2014-06-20 Thread Alexandre Touret
Yes.
My search for DOE always returns that answer.



On Friday, 20 June 2014 11:24:33 UTC+2, David Pilato wrote:

 Searching for DOE gives you that answer? 
 If so, it's not normal IMHO. You should try to reproduce it with a full 
 SENSE script recreation so we can replay it and help you from here.

 See http://www.elasticsearch.org/help/ for information.

 About parent child, you could read this: 
 http://www.elasticsearch.org/blog/managing-relations-inside-elasticsearch/



 -- 
 *David Pilato* | *Technical Advocate* | *Elasticsearch.com*
 @dadoonet https://twitter.com/dadoonet | @elasticsearchfr 
 https://twitter.com/elasticsearchfr


 On 20 June 2014 at 11:06:40, Alexandre Touret (alex...@touret.info) wrote:

 https://stackoverflow.com/questions/24323480/elasticsearch-queries-always-return-all-the-datas-stored-in-the-index#

Re: ElasticSearch queries always return all the datas stored in the index

2014-06-20 Thread Cédric Hourcade
It looks like you are doing a GET rather than a POST; if so, your query
content is ignored.


Cédric Hourcade
c...@wal.fr



Re: ElasticSearch queries always return all the datas stored in the index

2014-06-20 Thread Alexandre Touret
That's right 
Thanks for your help :)

Regards


Re: ElasticSearch queries always return all the datas stored in the index

2014-06-20 Thread David Pilato
No. GET works for running searches.

It could be an issue if you are using an old version of Sense and not Marvel.
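
One way to rule the HTTP method out entirely is to send the same search as a POST, since some HTTP clients (browser-based ones in particular) do not send a body with a GET request. The sketch below only builds such a request with Python's standard library; `build_search_request` is a hypothetical helper and `http://localhost:9200` is an assumed address, so adjust both to your setup.

```python
import json
import urllib.request


def build_search_request(base_url: str, index: str, doc_type: str,
                         body: dict) -> urllib.request.Request:
    """Build a POST _search request so the query body cannot be dropped by the client."""
    url = f"{base_url}/{index}/{doc_type}/_search"
    return urllib.request.Request(
        url,
        data=json.dumps(body).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


query = {
    "query": {
        "bool": {
            "must": [
                {"match": {"adherents.lastname": {"query": "DOE"}}}
            ]
        }
    }
}

request = build_search_request("http://localhost:9200", "tp", "carte", query)
print(request.get_method(), request.full_url)
# To actually run the search: urllib.request.urlopen(request)
```

If the POST returns only the matching carte while the same body sent from the console returns everything, the console is the likely culprit rather than the query.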

-- 
David Pilato | Technical Advocate | Elasticsearch.com
@dadoonet | @elasticsearchfr



Re: problem indexing with my analyzer

2014-06-20 Thread Tanguy Bernard
I set max_gram=20. It's better, but at the end I get this warning many times:

[2014-06-20 11:42:14,201][WARN ][monitor.jvm  ] [ik-test2] 
[gc][young][528][263] duration [2s], collections [1]/[2.1s], total 
[2s]/[43.9s], memory [536mb]->[580.2mb]/[1015.6mb], all_pools {[young] 
[22.5mb]->[22.3mb]/[66.5mb]}{[survivor] [14.9kb]->[49.3kb]/[8.3mb]}{[old] 
[513.4mb]->[557.8mb]/[940.8mb]}

I set ES_HEAP_SIZE to 2G. I think that's enough.
Is something wrong?
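
For reading warnings like the one above, a small parser can pull out the heap figures. This is a sketch that assumes the usual `memory [before]->[after]/[max]` layout of monitor.jvm lines; `heap_from_gc_line` is a hypothetical helper written for this example.

```python
import re
from typing import Dict, Optional

_UNITS = {"b": 1, "kb": 1024, "mb": 1024 ** 2, "gb": 1024 ** 3}
_MEMORY = re.compile(
    r"memory \[([\d.]+)(b|kb|mb|gb)\]->\[([\d.]+)(b|kb|mb|gb)\]/\[([\d.]+)(b|kb|mb|gb)\]"
)


def heap_from_gc_line(line: str) -> Optional[Dict[str, float]]:
    """Return heap usage before/after the collection and the heap maximum, in bytes."""
    match = _MEMORY.search(line)
    if match is None:
        return None  # not a monitor.jvm memory line
    before, before_unit, after, after_unit, maximum, max_unit = match.groups()
    return {
        "before": float(before) * _UNITS[before_unit],
        "after": float(after) * _UNITS[after_unit],
        "max": float(maximum) * _UNITS[max_unit],
    }


line = ("[gc][young][528][263] duration [2s], collections [1]/[2.1s], "
        "total [2s]/[43.9s], memory [536mb]->[580.2mb]/[1015.6mb]")
heap = heap_from_gc_line(line)
print(f"heap max: {heap['max'] / 1024 ** 3:.2f} GB")
```

If that reading of the format is right, the last figure is the heap maximum, which here is about 1 GB rather than 2 GB, so it may be worth checking that the ES_HEAP_SIZE setting is actually picked up by the node.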




Re: ElasticSearch queries always return all the datas stored in the index

2014-06-20 Thread Alexandre Touret
I just upgraded to ES 1.2.1 and the latest release of Marvel.
I see the same behaviour.


Re: How to set the query resultset size to infinite

2014-06-20 Thread Nuno Carvalho
Right... that makes sense :)

I'll give it a try, thank you!

Nuno




Re: problem indexing with my analyzer

2014-06-20 Thread Tanguy Bernard
The user copy/paste the content of an html page and me, I index this 
information. I take the entire document with image. I can't change this 
behavior.

I set max_gram=20. It's better but at the end I have this many times :

[2014-06-20 11:42:14,201][WARN ][monitor.jvm  ] [ik-test2] 
[gc][young][528][263] duration [2s], collections [1]/[2.1s], total 
[2s]/[43.9s], memory [536mb]-[580.2mb]/[1015.6mb], all_pools {[young] 
[22.5mb]-[22.3mb]/[66.5mb]}{[survivor] [14.9kb]-[49.3kb]/[8.3mb]}{[old] 
[513.4mb]-[557.8mb]/[940.8mb]}

I set ES_HEAP_SIZE to 2G. I think that's enough.
Is something wrong?

On Friday, 20 June 2014 at 11:45:22 UTC+2, Cédric Hourcade wrote:

 If you are only searching in the text, you should index the images in 
 another field, with no analyzer (index: not_analyzed) or, 
 even better, index: no (not indexed). If you need to retrieve the 
 image data, it's still in the _source. 

 But to be honest, I wouldn't even store this kind of information in ES: 
 your index is going to be bigger, merges are going to be slower... I'd 
 keep the binary files stored elsewhere. 

 Cédric Hourcade 
 c...@wal.fr 


 On Fri, Jun 20, 2014 at 11:25 AM, Tanguy Bernard 
 bernardt...@gmail.com wrote: 
  Yes, I am applying the reuters analyzer to my document (composed of text 
  and pictures). 
  My goal is to search the text of the document by any word or part of a 
  word. 
  
  Yes, the problem is my nGram filter. 
  How do I solve this problem? Decrease the nGram max? Replace the analyzer 
  with another one that still satisfies my goal? 
  
  On Friday, 20 June 2014 at 10:58:49 UTC+2, Cédric Hourcade wrote: 
  
  Does it mean you're applying the reuters analyzer to your base64-encoded 
  pictures? 
  
  I guess it generates a really huge number of tokens for each entry 
  because of your nGram filter (with a max of 250). 
  
  Cédric Hourcade 
  c...@wal.fr 
  
  
  On Fri, Jun 20, 2014 at 9:09 AM, Tanguy Bernard 
  bernardt...@gmail.com wrote: 
   Additional information: 
   My note_source field contains pictures (.jpg, .png, ...) in base64, plus text. 
   
   For my mapping I used: 
   type = string 
   analyzer = reuters (the name of my analyzer) 
   
   Any idea? 
   
   On Thursday, 19 June 2014 at 17:57:46 UTC+2, Tanguy Bernard wrote: 
   
   Hello, 
   I have an issue when I index one particular field, note_source (SQL 
   longtext). 
   I use the same analyzer for every field (except date_source and 
   id_source), but for note_source I get a monitor.jvm warning. 
   When I remove note_source, everything is fine. If I don't use an analyzer 
   on note_source, everything is fine, but if I use my analyzer on 
   note_source, I get crashes. 
   
   I think I have enough memory; I have set ES_HEAP_SIZE. 
   Maybe my problem is with accents (ASCII, UTF-8)? 
   
   Can you help me with this? 
   
   
   
   My Setting 
   
   public function createSetting($pf){ 
       // Index settings: 5 shards, no replicas, and a custom "reuters" analyzer 
       $params = array('index' => $pf, 'body' => array( 
           'settings' => array( 
               'number_of_shards' => 5, 
               'number_of_replicas' => 0, 
               'analysis' => array( 
                   'filter' => array( 
                       'nGram' => array( 
                           'token_chars' => array(), 
                           'type' => 'nGram', 
                           'min_gram' => 3, 
                           'max_gram' => 250 
                       ) 
                   ), 
                   'analyzer' => array( 
                       'reuters' => array( 
                           'type' => 'custom', 
                           'tokenizer' => 'standard', 
                           'filter' => array('lowercase', 'asciifolding', 'nGram') 
                       ) 
                   ) 
               ) 
           ) 
       )); 
       $this->elasticsearchClient->indices()->create($params); 
       return; 
   } 
   
   
   My Indexing 
   
   public function indexTable($pf, $typeElement){ 
       // Register a JDBC river that pulls rows from the "source" table 
       $params = array( 
           'index' => '_river', 
           'type' => $typeElement, 
           'id' => '_meta', 
           'body' => array( 
               'type' => 'jdbc', 
               'jdbc' => array( 
                   'url' => 'jdbc:mysql://ip/name', 
                   'user' => 'root', 
                   'password' => 'mdp', 
                   'index' => $pf, 
                   'type' => $typeElement, 
                   'sql' => 'select id_source as _id, id_sous_theme, titre_source, desc_source, note_source, adresse_source, type_source, date_source from source', 
                   'max_bulk_requests' => 5, 
               ) 
           ) 
       ); 

       $this->elasticsearchClient->index($params); 
   } 
   
   Thanks in advance. 
   
 

Re: ElasticSearch queries always return all the datas stored in the index

2014-06-20 Thread Cédric Hourcade
Ah yes sorry you are right, I am using some old tools :)


Cédric Hourcade
c...@wal.fr


On Fri, Jun 20, 2014 at 11:49 AM, Alexandre Touret alexan...@touret.info
wrote:

 I just upgraded to ES 1.2.1 and the latest release of Marvel.
 I see the same behaviour.

 On Friday, 20 June 2014 at 11:34:59 UTC+2, David Pilato wrote:

 No. GET works for running searches.

 It could be an issue if you are using an OLD SENSE version and not Marvel.

  --
 *David Pilato* | *Technical Advocate* | *Elasticsearch.com*
 @dadoonet https://twitter.com/dadoonet | @elasticsearchfr
 https://twitter.com/elasticsearchfr


 On 20 June 2014 at 11:28:23, Cédric Hourcade (c...@wal.fr) wrote:

  It looks like you are doing a GET rather than a POST, if so your query
 content is ignored.


 Cédric Hourcade
 c...@wal.fr


 On Fri, Jun 20, 2014 at 11:26 AM, Alexandre Touret alex...@touret.info
 wrote:

 Yes.
 My request for DOE always returns that answer.



 On Friday, 20 June 2014 at 11:24:33 UTC+2, David Pilato wrote:

   Searching for DOE gives you that answer?
  If so, it's not normal IMHO. You should try to reproduce it with a
 full SENSE script recreation so we can replay it and help you from here.

  See http://www.elasticsearch.org/help/ for information.

  About parent/child, you could read this:
  http://www.elasticsearch.org/blog/managing-relations-inside-elasticsearch/



  --
 *David Pilato* | *Technical Advocate* | *Elasticsearch.com*
 @dadoonet https://twitter.com/dadoonet | @elasticsearchfr
 https://twitter.com/elasticsearchfr


 On 20 June 2014 at 11:19:23, Alexandre Touret (alex...@touret.info) wrote:

 Hello,
 thanks for your response.

 When I add another carte:

 put /tp/carte/20450813
 {
    "dateEdition": "2014-06-01T22:00:00.000Z",
    "adherents": [
       {
          "birthday": "1963-03-22T23:00:00.000Z",
          "firstname": "FLORENCE",
          "lastname": "SMITH"
       },
       {
          "birthday": "2001-10-12T22:00:00.000Z",
          "firstname": "M ANGELO",
          "lastname": "SMITH"
       },
       {
          "birthday": "2003-07-30T22:00:00.000Z",
          "firstname": "M LILI",
          "lastname": "SMITH"
       }
    ]
 }

 and I run the query described above, I get both 'carte' documents back.

 Is that normal?
 Do you have an example or a link that illustrates the parent/child feature?


 Thanks



 On Friday, 20 June 2014 at 11:12:04 UTC+2, David Pilato wrote:

  Hey Alexandre,


  This is correct. You are searching for a carte which contains an
 adherent.
  Elasticsearch gives you a carte object as an answer. And
 elasticsearch gives you back exactly what you have indexed.

  That being said, I think you could look at parent/child feature for
 that use case.
  Or you can have one carte object per adherent?

  Makes sense?
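
 To make the point above concrete, here is a rough in-memory sketch in Python (not Elasticsearch itself). It only models the behaviour described: a carte is returned whole as soon as one of its adherents matches, and the alternative is one document per adherent. The function names and the carte_id field are made up for illustration; the sample data is abridged from this thread.

```python
# Toy model of whole-document matching; this is not an Elasticsearch client.
cartes = [
    {"_id": "20454795", "adherents": [
        {"firstname": "ANDREW", "lastname": "DOE"},
        {"firstname": "ROBERT", "lastname": "DOE"},
    ]},
    {"_id": "20450813", "adherents": [
        {"firstname": "FLORENCE", "lastname": "SMITH"},
        {"firstname": "M ANGELO", "lastname": "SMITH"},
    ]},
]


def search_cartes(docs, lastname):
    """Return every carte, in full, that has at least one matching adherent."""
    return [d for d in docs if any(a["lastname"] == lastname for a in d["adherents"])]


def one_doc_per_adherent(docs):
    """Alternative layout: flatten so each adherent becomes its own document."""
    return [dict(a, carte_id=d["_id"]) for d in docs for a in d["adherents"]]


hits = search_cartes(cartes, "DOE")
flat = one_doc_per_adherent(cartes)
andrew = [a for a in flat if a["firstname"] == "ANDREW"]
```

 With the first layout a hit always carries all adherents of the carte; with the flattened layout a hit is a single adherent plus a reference back to its carte.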

  --
 *David Pilato* | *Technical Advocate* | *Elasticsearch.com*
 @dadoonet https://twitter.com/dadoonet | @elasticsearchfr
 https://twitter.com/elasticsearchfr


 On 20 June 2014 at 11:06:40, Alexandre Touret (alex...@touret.info) wrote:

  Hello,



 https://stackoverflow.com/questions/24323480/elasticsearch-queries-always-return-all-the-datas-stored-in-the-index#

 I'm trying to index and query an index stored in ES 1.2. I both create
 and populate the index with the Java API, using the TransportClient. I
 have the following mapping:

 get /tp/carte/_mapping
 {
    "tp": {
       "mappings": {
          "carte": {
             "properties": {
                "adherents": {
                   "properties": {
                      "birthday": {
                         "type": "date",
                         "format": "dateOptionalTime"
                      },
                      "firstname": {
                         "type": "string"
                      },
                      "lastname": {
                         "type": "string"
                      }
                   }
                },
                "dateEdition": {
                   "type": "date",
                   "format": "dateOptionalTime"
                }
             }
          }
       }
    }
 }


  When I search for an object by ID, it works fine, but when I try
 to query the content of one of my nested objects, *ES always returns
 all the objects stored in the index*. I also tried creating the
 objects manually with Sense and I see the same behaviour.

 Example of my insert:

 put /tp/carte/20454795
 {
    "dateEdition": "2014-06-01T22:00:00.000Z",
    "adherents": [
       {
          "birthday": "1958-05-05T23:00:00.000Z",
          "firstname": "ANDREW",
          "lastname": "DOE"
       },
       {
          "birthday": "1964-03-01T23:00:00.000Z",
          "firstname": "ROBERT",
          "lastname": "DOE"
       },
       {
   

Re: How does shingle filter work on match_phrase in query phase?

2014-06-20 Thread 陳智清
Hello Hourcade, thanks for your response.

Does that mean index_analyzer and search_analyzer should be set to 
different values (e.g. index_analyzer: shingle and 
search_analyzer: standard)?
What if I want to reuse the same shingle analyzer for both indexing and 
searching? Will a match_phrase for "t1 t2 t3" still give me a match?

I know that setting a different analyzer as search_analyzer makes 
match_phrase "t1 t2 t3" work, but if I do that, then I get no benefit 
from shingles, right? Instead I just get a bigger index.

I assume shingles are used for faster match_phrase searches. But with 
shingles, searching for a phrase of 3 tokens, "t1 t2 t3", becomes searching 
for a phrase of 5 tokens, and I don't know how the shingle filter arranges 
the positions for a correct phrase query. So how can match_phrase be faster? 
Thank you.

On Friday, 20 June 2014 at 16:18:03 UTC+8, Cédric Hourcade wrote:

 Hello, 

 Let's say you have indexed the text "t1 t2 t3" with shingles. The token 
 positions are also indexed, so you get: "t1" (at pos 1), "t1 t2" (pos 
 1), "t2" (pos 2), "t2 t3" (pos 2) and "t3" (pos 3). 

 So if you search with a match_phrase for "t1 t2 t3" (even if it is 
 not tokenized as shingles), it will match the document, because t1, 
 t2 and t3 are considered next to each other (based on their recorded 
 positions) in this document. 

 Cédric Hourcade 
 c...@wal.fr 


 On Fri, Jun 20, 2014 at 7:04 AM, 陳智清 walke...@gmail.com wrote: 
  How does the shingle filter work with match_phrase in the query phase? 
  
  After analyzing the phrase "t1 t2 t3", the shingle filter produced five 
  tokens: 
    t1 
    t2 
    t3 
    t1 t2 
    t2 t3 
  
  Will match_phrase still match "t1 t2 t3"? How does it work? Thank you. 
  




Re: How do people typically handle shard failures in their results?

2014-06-20 Thread Shay Banon
If it fails on the primary shard, then a failure is returned. If it worked, and 
a replica failed, then that replica is deemed a failed replica, and will get 
allocated somewhere else in the cluster. Maybe an example of where a failure on 
all shards would help here?

On Jun 18, 2014, at 11:45, mooky nick.minute...@gmail.com wrote:

 If I understand correctly, we can get an OK response from elastic (ie no 
 error) but if there are shard failures in the response, it potentially means 
 that results are incomplete/incorrect. From my observation, we can get 
 failures on all shards - and elastic still returns OK (which was a bit 
 surprising to me)
 
 What kinds of approaches do people typically use to deal with shard failures?
 
 For my application, if there are shard failures, my results are essentially 
 inaccurate/incorrect, so I need to return an error to the client. Returning 
 bad results is worse than returning an error. 
 
 I am inclined to turn any shard failure into an exception.
 Is this quite common? Would it make sense to add a feature to the elastic API 
 (i.e. request.setTreatShardFailuresAsErrors(true))?
 



Re: How do people typically handle shard failures in their results?

2014-06-20 Thread Nikolas Everett
On Fri, Jun 20, 2014 at 7:08 AM, Shay Banon kim...@gmail.com wrote:

 If it fails on the primary shard, then a failure is returned. If it
 worked, and a replica failed, then that replica is deemed a failed replica,
 and will get allocated somewhere else in the cluster. Maybe an example of
 where a failure on “all” shards would help here?


I think it's more about searches, which can fail on one shard but not
others for all sorts of reasons: queue full, unfortunate script, bug, or only
one shard had results and the query asked for something weird like using
the postings highlighter when postings aren't stored.  Lots of reasons.

I log the event and move on.  I toyed with outputting a warning to the user
but didn't have time to implement it.  We're pretty diligent with our logs
so we'd notice the log and run it down.

If the failure is caused by the queue being full only on one node, we'd
likely notice that real quick as ganglia would lose it.  This happened to
me recently when we put a node without an ssd into a cluster with ssds.  It
couldn't keep up and dropped a ton of searches.  In our defense, we didn't
know the rest of the cluster had ssds so we were double surprised.

Nik



Re: How does shingle filter work on match_phrase in query phase?

2014-06-20 Thread Cédric Hourcade
Yes, you can use two different analyzers. In your case what you can do is:
- for indexing, apply a shingle filter.
- for the query, also apply a shingle filter, but this time
disable the unigrams (output_unigrams: false), so it will only
generate the shingles, in your case: "t1 t2" and "t2 t3". It will
match your document.
Cédric Hourcade
c...@wal.fr
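
The idea above can be checked with a small toy model in Python. This is only a sketch of the token/position bookkeeping described in this thread, not Lucene's actual shingle filter or phrase query; shingle_tokens and phrase_matches are hypothetical helpers, and positions start at 1 to follow the numbering used earlier in the thread.

```python
def shingle_tokens(words, output_unigrams=True, size=2):
    """Return (token, position) pairs, roughly as a 2-word shingle filter would."""
    tokens = []
    for pos, word in enumerate(words, start=1):
        if output_unigrams:
            tokens.append((word, pos))
        # A shingle starting at this word shares the word's position.
        if pos - 1 + size <= len(words):
            tokens.append((" ".join(words[pos - 1:pos - 1 + size]), pos))
    return tokens


def phrase_matches(indexed, query):
    """True if the query tokens occur in the index at the same relative positions."""
    index = set(indexed)
    first_tok, first_pos = query[0]
    starts = [pos for tok, pos in indexed if tok == first_tok]
    return any(
        all((tok, start + qpos - first_pos) in index for tok, qpos in query)
        for start in starts
    )


indexed = shingle_tokens(["t1", "t2", "t3"])                       # index-time analyzer
query = shingle_tokens(["t1", "t2", "t3"], output_unigrams=False)  # search-time analyzer
print(indexed)
print(query)
print(phrase_matches(indexed, query))
```

In this model the search-time phrase has two tokens instead of three, and both are found at consecutive positions in the index, which is the behaviour the two-analyzer setup relies on.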





Re: How do people typically handle shard failures in their results?

2014-06-20 Thread Shay Banon
Ahh, I see. If it's related to searches, then yes, the search response includes 
details about the total number of shards the search was executed on, the 
successful shards, and the failed shards. They are important to check to 
understand whether you got partial results.

In the REST API, if there is a total failure, it will return the worst 
status code out of all the shards in the response. In the Java API, the search 
response will be returned (with no exception), so the content of the response 
has to be checked (which is good practice anyhow). It might make sense to raise 
an exception in the Java API if all shards failed; I am on the fence on this 
one, since a check needs to be performed on the result anyhow.
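
A minimal sketch of that check, written in Python rather than against the Java API. It assumes the search response has already been parsed into a dict containing the usual _shards section (total, successful, failed). ShardFailureError and check_shards are hypothetical names for illustration, not part of any Elasticsearch client.

```python
class ShardFailureError(Exception):
    """Raised when a search response reports failed shards (hypothetical helper)."""


def check_shards(response, allow_partial=False):
    """Return the response unchanged, or raise if shard failures make it unusable.

    With allow_partial=False any failed shard is an error.
    With allow_partial=True only a response with no successful shard is an error.
    """
    shards = response.get("_shards", {})
    total = shards.get("total", 0)
    successful = shards.get("successful", 0)
    failed = shards.get("failed", 0)
    if failed and (not allow_partial or successful == 0):
        raise ShardFailureError(f"{failed} of {total} shards failed")
    return response


ok = {"_shards": {"total": 5, "successful": 5, "failed": 0}}
partial = {"_shards": {"total": 5, "successful": 4, "failed": 1}}

check_shards(ok)                           # passes through
check_shards(partial, allow_partial=True)  # tolerated: partial results
try:
    check_shards(partial)                  # strict mode: treated as an error
except ShardFailureError as exc:
    print("search rejected:", exc)
```

The strict mode corresponds to the "turn any shard failure into an exception" approach asked about at the start of this thread; the tolerant mode matches "log the event and move on".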




Re: How does shingle filter work on match_phrase in query phase?

2014-06-20 Thread 陳智清
I got it! Thank you!





Re: problem indexing with my analyzer

2014-06-20 Thread Cédric Hourcade
If your base64-encoded strings are long, they are going to be split into a lot
of tokens by the standard tokenizer.

These tokens are often going to be a lot longer than standard words,
so your nGram filter will generate even more tokens, a lot more than
with standard text. That may be your problem.
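
To give a rough sense of scale, here is a small Python calculation of how many n-grams a single token produces for a given min_gram/max_gram. It is plain arithmetic (one n-gram per start offset for each gram length), not a call into Elasticsearch; the token lengths of 8 and 250 characters are arbitrary examples.

```python
def ngram_count(token_len, min_gram=3, max_gram=250):
    """Number of n-grams emitted for one token of the given length."""
    longest = min(max_gram, token_len)
    # For each gram length k there are (token_len - k + 1) possible start offsets.
    return sum(token_len - k + 1 for k in range(min_gram, longest + 1))


print(ngram_count(8))                 # an ordinary 8-character word -> 21
print(ngram_count(250))               # a 250-character token, max_gram=250 -> 30876
print(ngram_count(250, max_gram=20))  # same token, max_gram=20 -> 4311
```

So under these assumptions one very long token costs about as much as a thousand ordinary words, and lowering max_gram helps but does not remove the problem.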

You should really try to strip the encoded images from your documents
with a simple regex before indexing them. If you need to keep the
source, put the raw text in an unindexed field, and the cleaned text in
another.
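
A possible sketch of that clean-up step in Python. It assumes the pasted HTML embeds images as data URIs (data:image/...;base64,...), which this thread does not confirm; adjust the pattern to whatever the documents really contain. The field names note_raw and note_text are hypothetical, and the mapping uses the ES 1.x "index": "no" option mentioned earlier in the thread.

```python
import re

# Assumption: images are embedded as base64 data URIs inside the HTML.
DATA_URI_IMAGE = re.compile(r"data:image/[A-Za-z0-9.+-]+;base64,[A-Za-z0-9+/]+={0,2}")

# Hypothetical mapping: raw note kept but not indexed, cleaned note analyzed.
MAPPING = {
    "properties": {
        "note_raw": {"type": "string", "index": "no"},
        "note_text": {"type": "string", "analyzer": "reuters"},
    }
}


def split_note(raw_html):
    """Return a document body with the raw note and a copy without base64 images."""
    cleaned = DATA_URI_IMAGE.sub("", raw_html)
    return {"note_raw": raw_html, "note_text": cleaned}


doc = split_note('<p>Hello</p><img src="data:image/png;base64,iVBORw0KGgo="><p>world</p>')
print(doc["note_text"])
```

Only note_text goes through the nGram analyzer in this layout, so the base64 payload never reaches it.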



Re: ES v1.1 continuous young gc pauses old gc, stops the world when old gc happens and splits cluster

2014-06-20 Thread Ankush Jhalani
Mike - What you describe sounds like the machines are sending too many 
indexing requests and merging is unable to keep pace. The usual suspects would 
be insufficient CPU or disk bandwidth. 
This doesn't sound related to the memory constraints posted in the original 
issue of this thread. Do you see GC traces in the logs? 

On Friday, June 20, 2014 9:40:48 AM UTC-4, Michael Hart wrote:

 We're seeing the same thing. ES 1.1.0, JDK 7u55 on Ubuntu 12.04, 5 data 
 nodes, 3 separate masters, all are 15GB hosts with 7.5GB Heaps, storage is 
 SSD. Data set is ~1.6TB according to Marvel.

 Our daily indices are roughly 33GB in size, with 5 shards and 2 replicas. 
 I'm still investigating what happened yesterday, but I do see in Marvel a 
 large spike in the Indices Current Merges graph just before the node 
 dies, and a corresponding increase in JVM Heap. When Heap hits 99% 
 everything grinds to a halt. Restarting the node fixes the issue, but 
 this is the third or fourth time it's happened.

 I'm still researching how to deal with this, but a couple of things I am 
 looking at are:

 - increase the number of shards so that the segment merges stay 
   smaller (is that even a legitimate sentence?). I'm still reading through 
   the Index Module Merge page for more details: 
   http://www.elasticsearch.org/guide/en/elasticsearch/reference/current/index-modules-merge.html
 - look at store-level throttling: 
   http://www.elasticsearch.org/guide/en/elasticsearch/reference/current/index-modules-store.html#store-throttling
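
 For the store-level throttling item, a small Python sketch that only builds the JSON body one might send to the cluster update-settings endpoint (PUT /_cluster/settings). The setting name indices.store.throttle.max_bytes_per_sec and the 20mb value both appear in the log excerpt quoted further down this thread; the helper name is hypothetical, and whether raising or lowering the limit helps here is an open question, not a recommendation.

```python
import json


def store_throttle_settings(max_bytes_per_sec, throttle_type="merge", persistent=False):
    """Build a cluster-settings body for ES 1.x store-level throttling."""
    scope = "persistent" if persistent else "transient"
    return {
        scope: {
            "indices.store.throttle.type": throttle_type,
            "indices.store.throttle.max_bytes_per_sec": max_bytes_per_sec,
        }
    }


# Body for: PUT /_cluster/settings
body = json.dumps(store_throttle_settings("20mb"), indent=2)
print(body)
```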

 I would love to get some feedback on my ramblings. If I find anything more 
 I'll update this thread.

 cheers
 mike




 On Thursday, June 19, 2014 4:05:54 PM UTC-4, Bruce Ritchie wrote:

 Java 8 with G1GC perhaps? It'll have more overhead but perhaps it'll be 
 more consistent wrt pauses.



 On Wednesday, June 18, 2014 2:02:24 PM UTC-4, Eric Brandes wrote:

 I'd just like to chime in with a me too.  Is the answer just more 
 nodes?  In my case this is happening every week or so.

 On Monday, April 21, 2014 9:04:33 PM UTC-5, Brian Flad wrote:

 My dataset currently is 100GB across a few daily indices (~5-6GB and 15 
 shards each). Data nodes are 12 CPU, 12GB RAM (6GB heap).


 On Mon, Apr 21, 2014 at 6:33 PM, Mark Walkom ma...@campaignmonitor.com 
 wrote:

 How big are your data sets? How big are your nodes?

 Regards,
 Mark Walkom

 Infrastructure Engineer
 Campaign Monitor
 email: ma...@campaignmonitor.com
 web: www.campaignmonitor.com


 On 22 April 2014 00:32, Brian Flad bfla...@gmail.com wrote:

 We're seeing the same behavior with 1.1.1, JDK 7u55, 3 master nodes (2 min 
 master), and 5 data nodes. Interestingly, we see the repeated young GCs 
 only on a node or two at a time. Cluster operations (such as recovering 
 unassigned shards) grind to a halt. After restarting a GCing node, 
 everything returns to normal operation in the cluster.

 Brian F


 On Wed, Apr 16, 2014 at 8:00 PM, Mark Walkom ma...@campaignmonitor.com 
 wrote:

 In both your instances, if you can, have 3 master eligible nodes as it 
 will reduce the likelihood of a split cluster as you will always have a 
 majority quorum. Also look at discovery.zen.minimum_master_nodes to go with 
 that.
 However you may just be reaching the limit of your nodes, which means the 
 best option is to add another node (which also neatly solves your split 
 brain!).
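
 The majority-quorum rule behind discovery.zen.minimum_master_nodes is simple arithmetic, sketched here in Python. The function name is made up for illustration; the formula is the usual "more than half of the master-eligible nodes".

```python
def minimum_master_nodes(master_eligible):
    """Smallest number of master-eligible nodes that forms a strict majority."""
    if master_eligible < 1:
        raise ValueError("need at least one master-eligible node")
    return master_eligible // 2 + 1


for n in (2, 3, 5):
    print(n, "master-eligible nodes -> minimum_master_nodes =", minimum_master_nodes(n))
```

 With 3 master-eligible nodes the quorum is 2, so the cluster survives losing one of them without splitting. With only 2 nodes the quorum is also 2, which avoids a split but means no master can be elected once either node is down.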

 Ankush it would help if you can update java, most people recommend u25 but 
 we run u51 with no problems.



 Regards,
 Mark Walkom

 Infrastructure Engineer
 Campaign Monitor
 email: ma...@campaignmonitor.com
 web: www.campaignmonitor.com


 On 17 April 2014 07:31, Dominiek ter Heide domin...@gmail.com wrote:

 We are seeing the same issue here. 

 Our environment:

 - 2 nodes
 - 30GB Heap allocated to ES
 - ~140GB of data
 - 639 indices, 10 shards per index
 - ~48M documents

 After starting ES everything is good, but after a couple of hours we see 
 the Heap build up towards 96% on one node and 80% on the other. We then see 
 the GC take very long on the 96% node:









 TOuKgmlzaVaFVA][elasticsearch1.trend1.bottlenose.com][inet[/192.99.45.125:9300]]])
 [2014-04-16 12:04:27,845][INFO ][discovery] [elasticsearch2.trend1] trend1/I3EHG_XjSayz2OsHyZpeZA
 [2014-04-16 12:04:27,850][INFO ][http ] [elasticsearch2.trend1] bound_address {inet[/0.0.0.0:9200]}, publish_address {inet[/192.99.45.126:9200]}
 [2014-04-16 12:04:27,851][INFO ][node ] [elasticsearch2.trend1] started
 [2014-04-16 12:04:32,669][INFO ][indices.store] [elasticsearch2.trend1] updating indices.store.throttle.max_bytes_per_sec from [20mb] to [1gb], note, type is [MERGE]
 [2014-04-16 12:04:32,669][INFO ][cluster.routing.allocation.decider] [elasticsearch2.trend1] updating [cluster.routing.allocation.node_initial_primaries_recoveries] from [4] to [50]

 [2014-04-16 12:04:32,670][INFO 

Re: Losing data after Elasticsearch restart

2014-06-20 Thread Rohit Jaiswal
Hi Alexander,
 Here is the stack trace for the NullpointerException -

[23:24:38,929][DEBUG][action.bulk  ] [Rasputin, Mikhail] [17f85dcb67b64a13bfef2be74595087e][0], node[a-eZTR9XRiWq-o0QmsM2aA], [P], s[STARTED]: Failed to execute [org.elasticsearch.action.bulk.BulkShardRequest@22b11bbf]
java.lang.NullPointerException
    at org.elasticsearch.action.bulk.TransportBulkAction$2.onResponse(TransportBulkAction.java:247)
    at org.elasticsearch.action.bulk.TransportBulkAction$2.onResponse(TransportBulkAction.java:242)
    at org.elasticsearch.action.support.replication.TransportShardReplicationOperationAction$AsyncShardOperationAction.performReplicas(TransportShardReplicationOperationAction.java:607)
    at org.elasticsearch.action.support.replication.TransportShardReplicationOperationAction$AsyncShardOperationAction.performOnPrimary(TransportShardReplicationOperationAction.java:533)
    at org.elasticsearch.action.support.replication.TransportShardReplicationOperationAction$AsyncShardOperationAction$1.run(TransportShardReplicationOperationAction.java:430)
    at java.util.concurrent.ThreadPoolExecutor$Worker.runTask(Unknown Source)
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(Unknown Source)
    at java.lang.Thread.run(Unknown Source)
[23:24:38,940][DEBUG][action.bulk  ] [Rasputin, Mikhail] [17f85dcb67b64a13bfef2be74595087e][0], node[a-eZTR9XRiWq-o0QmsM2aA], [P], s[STARTED]: Failed to execute [org.elasticsearch.action.bulk.BulkShardRequest@768475c4]
java.lang.NullPointerException
    at org.elasticsearch.action.bulk.TransportBulkAction$2.onResponse(TransportBulkAction.java:247)
    at org.elasticsearch.action.bulk.TransportBulkAction$2.onResponse(TransportBulkAction.java:242)
    at org.elasticsearch.action.support.replication.TransportShardReplicationOperationAction$AsyncShardOperationAction.performReplicas(TransportShardReplicationOperationAction.java:607)
    at org.elasticsearch.action.support.replication.TransportShardReplicationOperationAction$AsyncShardOperationAction.performOnPrimary(TransportShardReplicationOperationAction.java:533)
    at org.elasticsearch.action.support.replication.TransportShardReplicationOperationAction$AsyncShardOperationAction$1.run(TransportShardReplicationOperationAction.java:430)
    at java.util.concurrent.ThreadPoolExecutor$Worker.runTask(Unknown Source)
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(Unknown Source)
    at java.lang.Thread.run(Unknown Source)


Thanks,
Rohit


On Fri, Jun 20, 2014 at 12:02 AM, Alexander Reelsen a...@spinscale.de
wrote:

 Hey,

 The exception you showed can happen when you remove an alias.
 However, you mentioned a NullPointerException in your first post, which is not
 contained in the stack trace, so it seems that one is still missing.

 Also, please retry with a newer version of Elasticsearch.


 --Alex


 On Thu, Jun 19, 2014 at 5:13 AM, Rohit Jaiswal rohit.jais...@gmail.com
 wrote:

 Hi Alexander,
We sent you the stack trace. Can you please enlighten us
 on this?

 Thanks,
 Rohit


 On Mon, Jun 16, 2014 at 10:25 AM, Rohit Jaiswal rohit.jais...@gmail.com
 wrote:

 Hi Alexander,
 Thanks for your reply. We plan to upgrade in the
 long run, however we need to fix the data loss problem on 0.90.2 in the
 immediate term.

 Here is the stack trace -


 10:09:37.783 PM

 [22:09:37,783][WARN ][indices.cluster  ] [Storm] [b7a76aa06cfd4048987d1117f3e0433a][0] failed to start shard
 org.elasticsearch.indices.recovery.RecoveryFailedException: [b7a76aa06cfd4048987d1117f3e0433a][0]: Recovery failed from [Jeffrey Mace][_jjr5BYJQjO6QzzheyDmhw][inet[/10.4.35.200:9300]] into [Storm][FiW6mbR5ThqqSii5Wc28lQ][inet[/10.4.40.95:9300]]
     at org.elasticsearch.indices.recovery.RecoveryTarget.doRecovery(RecoveryTarget.java:293)
     at org.elasticsearch.indices.recovery.RecoveryTarget.access$300(RecoveryTarget.java:62)
     at org.elasticsearch.indices.recovery.RecoveryTarget$2.run(RecoveryTarget.java:163)
     at java.util.concurrent.ThreadPoolExecutor$Worker.runTask(Unknown Source)
     at java.util.concurrent.ThreadPoolExecutor$Worker.run(Unknown Source)
     at java.lang.Thread.run(Unknown Source)
 Caused by: org.elasticsearch.transport.RemoteTransportException: [Jeffrey Mace][inet[/10.4.35.200:9300]][index/shard/recovery/startRecovery]
 Caused by: org.elasticsearch.index.engine.RecoveryEngineException: [b7a76aa06cfd4048987d1117f3e0433a][0] Phase[2] Execution failed
     at org.elasticsearch.index.engine.robin.RobinEngine.recover(RobinEngine.java:1147)
     at org.elasticsearch.index.shard.service.InternalIndexShard.recover(InternalIndexShard.java:526)
     at org.elasticsearch.indices.recovery.RecoverySource.recover(RecoverySource.java:116)
     at org.elasticsearch.indices.recovery.RecoverySource.access$1600(RecoverySource.java:60)
 at
 

Re: problem indexing with my analyzer

2014-06-20 Thread Tanguy Bernard
Thank you, Cédric Hourcade!

On Friday, June 20, 2014 15:32:29 UTC+2, Cédric Hourcade wrote:

 If your base64-encoded strings are long, the standard tokenizer is going to 
 split them into a lot of tokens. 

 These tokens will often be much longer than ordinary words, so your nGram 
 filter will generate even more tokens, far more than with standard text. 
 That may be your problem. 

 You should really try to strip the encoded images from your documents with 
 a simple regex before indexing them. If you need to keep the source, put 
 the raw text in an unindexed field and the cleaned text in another. 
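
A minimal sketch of the stripping step suggested above, in Python. It assumes the images are embedded as data URIs (for example `data:image/png;base64,...`); the pattern and the field names `raw_content` and `content` are hypothetical and would need adjusting to the real documents and mapping.

```python
import re

# Assumption: inline images look like "data:image/<type>;base64,<payload>".
DATA_URI_IMAGE = re.compile(
    r"data:image/[A-Za-z0-9.+-]+;base64,[A-Za-z0-9+/]+={0,2}"
)


def strip_encoded_images(text):
    """Remove base64-encoded inline images from the text."""
    return DATA_URI_IMAGE.sub("", text)


def build_document(raw_text):
    """Keep the raw source next to a cleaned copy meant for analysis."""
    return {
        # Hypothetical field names: map raw_content as unindexed and
        # run the nGram analyzer only on content.
        "raw_content": raw_text,
        "content": strip_encoded_images(raw_text),
    }


doc = build_document('Hello <img src="data:image/png;base64,iVBORw0KGgo="> world')
print(doc["content"])
```

Only the cleaned field reaches the analyzer this way, so the nGram filter never sees the long base64 runs.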


-- 
You received this message because you are subscribed to the Google Groups 
elasticsearch group.
To unsubscribe from this group and stop receiving emails from it, send an email 
to elasticsearch+unsubscr...@googlegroups.com.
To view this discussion on the web visit 
https://groups.google.com/d/msgid/elasticsearch/b62f4e12-1b54-4621-986a-93411404f7af%40googlegroups.com.
For more options, visit https://groups.google.com/d/optout.


guarding from double-start

2014-06-20 Thread Andrew Gaydenko
A couple of times during development I have accidentally started the ES 
script a second time. The cluster then goes to red status (I use Elastic HQ) 
and stops working, so I am forced to regenerate all indexes (with all test 
data) again, which takes a noticeable amount of time. 

At the moment I use this script

ES_MAX_MEM=512M
export ES_MAX_MEM
cd /ES-dir/bin
./elasticsearch.in.sh 
./elasticsearch -f 


under Linux to start ES. Can you please suggest a trick to avoid ending up 
in red status?



Re: searching on nested docs - geting back the nested docs as a response

2014-06-20 Thread liorg
I am not sure highlighting will work, as I suspect it will hit the same 
obstacle; see:
https://github.com/elasticsearch/elasticsearch/issues/5245

As for suggestion #2, it would break our current schema and require a 
significant model change (we store the data in MongoDB as well), so I wonder 
whether we are better off waiting until #3022 is resolved. In the 
meantime, any workaround would be appreciated.

Could we search the returned documents again in memory (perhaps using native Lucene somehow)?
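
As a rough illustration of the in-memory idea, here is a Python sketch that walks a returned book and keeps only the matching pages. It assumes the book structure quoted further down in this thread; the function name and the plain substring match are hypothetical simplifications and do not reproduce Elasticsearch analysis.

```python
def matching_pages(book, phrase, min_words):
    """Yield pages whose text contains `phrase` and has more than `min_words` words."""
    for section in book.get("sections", []):
        for chapter in section.get("chapters", []):
            for page in chapter.get("pages", []):
                has_phrase = phrase in page.get("text", "")
                if has_phrase and page.get("numofWords", 0) > min_words:
                    yield page


# Shortened copy of the example book from this thread.
book = {
    "author": {"name": "Jose", "lastName": "Martin"},
    "sections": [{
        "sectionName": "prologue",
        "chapters": [
            {"chapterName": "the start", "pages": [
                {"pageNum": 1, "text": "let my people...", "numofWords": 125},
                {"pageNum": 2, "text": "let my people go...", "numofWords": 150},
            ]},
            {"chapterName": "the end", "pages": [
                {"pageNum": 3, "text": "will do...", "numofWords": 125},
            ]},
        ],
    }],
}

hits = [page["pageNum"] for page in matching_pages(book, "let my people", 100)]
print(hits)
```

The nested query on the server still narrows the result down to the right books; this second pass only selects which pages to display.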

On Friday, June 20, 2014 1:13:42 AM UTC+3, Itamar Syn-Hershko wrote:

 It is very hard to give you concrete advice without knowing more about 
 your domain and use cases, but here are two points that came to mind:

 1. You can use the highlighting features to show the content that 
 matched. Highlighters can return whole blocks of text, and by using 
 positionIncrements correctly you can get this right.

 2. Yes, Elasticsearch is a document-oriented store, but is it really 
 necessary to index entire books as one document? I would most certainly 
 look at indexing sections, chapters, or maybe even pages as single documents 
 that reference the book ID as a string. The exception is when you use 
 book-level data along with full-text searches on the texts, and even then 
 in some scenarios I would consider denormalization.

 --

 Itamar Syn-Hershko
 http://code972.com | @synhershko https://twitter.com/synhershko
 Freelance Developer & Consultant
 Author of RavenDB in Action http://manning.com/synhershko/


 On Thu, Jun 19, 2014 at 10:13 PM, liorg lior...@gmail.com wrote:

 Well, assume we have a book type. The book holds a lot of metadata, 
 something like the following:
 {
   "author": {
     "name": "Jose",
     "lastName": "Martin"
   },
   "sections": [{
     "chapters": [{
       "pages": [{
         "pageNum": 1,
         "numOfChars": 1000,
         "text": "let my people...",
         "numofWords": 125
       },
       {
         "pageNum": 2,
         "numOfChars": 1005,
         "text": "let my people go...",
         "numofWords": 150
       }],
       "chapterName": "the start"
     },
     {
       "pages": [{
         "pageNum": 3,
         "numOfChars": 1000,
         "text": "will do...",
         "numofWords": 125
       },
       {
         "pageNum": 4,
         "numOfChars": 1005,
         "text": "will do later on...",
         "numofWords": 150
       }],
       "chapterName": "the end"
     }],
     "sectionName": "prologue"
   }]
 }

 We want to search for all the pages that have "let my people" in their 
 text and more than 100 words.
 With ES we can use nested objects and query on the nested page 
 object, but the values actually returned are the books (parents) that 
 contain the matching pages.
 If we then want to show the user the pages they were looking for, we cannot, 
 because we get the whole book type back with all its metadata and 
 not just the nested objects that matched the criteria. We would need to 
 search again (maybe in memory?) for the pages that matched the criteria in 
 order to display the search results to the user (the whole type is returned 
 because ES does not yet support returning only the nested objects that 
 matched the criteria).

 I hope this is clearer now.

 On Thursday, June 19, 2014 7:22:13 PM UTC+3, Itamar Syn-Hershko wrote:

 This is usually solved using parent-child, but the real question here 
 is what you mean by needing to retrieve both 
 books and pages.

 Can you describe the actual scenario and what you are trying to achieve?

 --

 Itamar Syn-Hershko
 http://code972.com | @synhershko https://twitter.com/synhershko
 Freelance Developer & Consultant
 Author of RavenDB in Action http://manning.com/synhershko/


 On Thu, Jun 19, 2014 at 7:12 PM, liorg lior...@gmail.com wrote:

  Hi,

 We have a somewhat complex type holding nested docs with arrays 
 (let's assume a hierarchy of books, where each book has an array of 
 pages containing its metadata).

 We want to search on the nested docs (for example, find all the books that 
 have the term XYZ in one of their pages), but we want to get back not 
 only the book, but the matching pages themselves.

 We understand that this is problematic to achieve with ES (see 
 https://github.com/elasticsearch/elasticsearch/issues/3022).

 A parent-child model is hard for us to adopt, as the data 
 model comes from our existing MongoDB model (and besides, we are not sure 
 a parent-child model fits here).

 So:

 1. Is there any workaround we can use to get the nested docs (the actual 
 pages) in the results?
 2. If not, is there a recommended way to search the data again 
 in memory after it has been narrowed down by the ES server?
 3. Any advice will be appreciated, as this is quite a big obstacle in 
 our way to implementing a solution using ES.

 thanks,

 Lior


Kibana Terms panel showing date fields as longs?

2014-06-20 Thread Chris Neal
Hello :)

I have some log data indexed in ES and am trying to visualize it in Kibana,
but I am getting strange behavior related to dates. I have a Terms panel with
the following settings:

Terms mode: terms
Field: date
Length: 10
Order: count

For some reason, the date column in the panel shows up as a long,
not a date:

COUNTBYDATE

Term        Count   Action
146629440   96597
146638080   60063
146620800   59480

The Table panel showing all my log entries knows that field is a date, and
it displays as a date correctly.
If I curl the request to ES, it appears ES is returning it as a long, not a
date:


curl -XGET 'http://localhost:9200/test/_search?pretty' -d '{
  "facets": {
    "terms": {
      "terms": {
        "field": "date",
        "size": 10,
        "order": "count",
        "exclude": []
      },
      "facet_filter": {
        "fquery": {
          "query": {
            "filtered": {
              "query": {
                "bool": {
                  "should": [
                    {
                      "query_string": {
                        "query": "_type:test_type"
                      }
                    }
                  ]
                }
              },
              "filter": {
                "bool": {
                  "must": [
                    {
                      "range": {
                        "date": {
                          "from": 1465925902106,
                          "to": 1466769177326
                        }
                      }
                    }
                  ]
                }
              }
            }
          }
        }
      }
    }
  },
  "size": 0
}'

returns:

{
  "took" : 387,
  "timed_out" : false,
  "_shards" : {
    "total" : 10,
    "successful" : 10,
    "failed" : 0
  },
  "hits" : {
    "total" : 48173413,
    "max_score" : 0.0,
    "hits" : [ ]
  },
  "facets" : {
    "terms" : {
      "_type" : "terms",
      "missing" : 0,
      "total" : 365090,
      "other" : 0,
      "terms" : [ {
        "term" : 146629440,
        "count" : 96697
      }, {
        "term" : 146638080,
        "count" : 60343
      }, {
        "term" : 146620800,
        "count" : 59579
      }, {
        "term" : 146612160,
        "count" : 51592
      }, {
        "term" : 146603520,
        "count" : 48859
      }, {
        "term" : 146594880,
        "count" : 48020
      } ]
    }
  }
}

Is there something I can do to have Kibana recognize the term is a date and
display it as 2014-06-17 like the Table panel does?

Thanks so much!
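
For context, Elasticsearch stores date fields internally as a long counting milliseconds since the epoch, which is why a terms facet on a date field reports long values. One possible workaround outside Kibana is to convert the terms on the client side; the Python sketch below assumes the terms are epoch milliseconds in UTC, and the function names and sample values are hypothetical.

```python
from datetime import datetime, timezone


def millis_to_date(epoch_millis):
    """Format epoch milliseconds as a UTC calendar date (YYYY-MM-DD)."""
    moment = datetime.fromtimestamp(epoch_millis / 1000.0, tz=timezone.utc)
    return moment.strftime("%Y-%m-%d")


def label_terms(facet_terms):
    """Turn facet entries ({"term": ..., "count": ...}) into (date, count) pairs."""
    return [(millis_to_date(entry["term"]), entry["count"]) for entry in facet_terms]


# Hypothetical facet entries, not taken from the response above.
sample = [{"term": 0, "count": 3}, {"term": 86400000, "count": 5}]
print(label_terms(sample))
```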



Result the number of matched terms for a given result.

2014-06-20 Thread Dan Harvey
Hi,

Is it possible to get Elasticsearch to return the number of terms matched 
per result in a query? I know these are evaluated, as they make up the score, 
but there doesn't seem to be a way to get a simple count.

For example, with :query => {:in => {:user_ids => [user_ids...], 
:minimum_should_match => 1}}

I would like to know how many user_ids were matched.
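
If no server-side count turns out to be available, one fallback is to count the overlap on the client once the hits come back. This Python sketch assumes each hit carries its `user_ids` in `_source`; the helper names and sample hits are hypothetical.

```python
def matched_term_count(doc_values, query_values):
    """Count how many distinct query values appear in the document's values."""
    return len(set(doc_values) & set(query_values))


def annotate_hits(hits, query_user_ids):
    """Return (document id, number of matched user_ids) for each hit."""
    return [
        (hit["_id"], matched_term_count(hit["_source"].get("user_ids", []), query_user_ids))
        for hit in hits
    ]


# Hypothetical hits shaped like the "hits" array of a search response.
hits = [
    {"_id": "a", "_source": {"user_ids": [1, 2, 3]}},
    {"_id": "b", "_source": {"user_ids": [3, 9]}},
]
print(annotate_hits(hits, [2, 3, 4]))
```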

Thanks,
Dan



Re: guarding from double-start

2014-06-20 Thread Maciej Dziardziel
Use start-stop-daemon, or adapt /etc/init.d/elasticsearch to set up a pidfile 
guarding the ES instance. Or just run it this way:
pgrep -f elasticsearch || ./start_es.sh
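
The pidfile idea can also be sketched with an exclusive lock file. The Python wrapper below is only an illustration for Linux: the lock path and function names are hypothetical, and the command in the usage note is the one from the original post.

```python
import fcntl
import os
import subprocess
import tempfile

# Hypothetical lock location; any path writable by the developer works.
DEFAULT_LOCK = os.path.join(tempfile.gettempdir(), "elasticsearch-dev.lock")


def acquire_single_instance_lock(lock_path=DEFAULT_LOCK):
    """Return an open file holding an exclusive lock, or None if it is taken."""
    handle = open(lock_path, "w")
    try:
        # LOCK_NB makes the call fail immediately instead of waiting.
        fcntl.flock(handle, fcntl.LOCK_EX | fcntl.LOCK_NB)
    except BlockingIOError:
        handle.close()
        return None
    return handle


def start_once(command, lock_path=DEFAULT_LOCK):
    """Run `command` in the foreground unless another wrapper holds the lock."""
    lock = acquire_single_instance_lock(lock_path)
    if lock is None:
        return False  # a second start was attempted; do nothing
    try:
        subprocess.call(command)
    finally:
        lock.close()  # closing the file releases the lock
    return True
```

With this in place, `start_once(["./elasticsearch", "-f"])` called from the ES bin directory would refuse to launch a second foreground instance while the first wrapper is still running.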




Re: guarding from double-start

2014-06-20 Thread Ivan Brusic
You can either use the startup scripts that come with the package when you
install via apt/yum [1] or use the service wrapper [2].

[1]
http://www.elasticsearch.org/guide/en/elasticsearch/reference/current/setup-repositories.html
[2] https://github.com/elasticsearch/elasticsearch-servicewrapper

-- 
Ivan




Re: guarding from double-start

2014-06-20 Thread Andrew Gaydenko
On Friday, June 20, 2014 6:49:04 PM UTC+4, Maciej Dziardziel wrote:

 use start-stop-daemon or adapt /etc/init.d/elasticsearch to set up pidfile 
 guarding es instance. Or just run this way:
 pgrep -f elasticsearch || ./start_es.sh


Aha, thanks! In my case pgrep is the most appropriate.





[ANN] Elasticsearch Thrift transport plugin 2.2.0 released

2014-06-20 Thread Elasticsearch Team

Heya,


We are pleased to announce the release of the Elasticsearch Thrift transport 
plugin, version 2.2.0.

The Thrift transport plugin lets you use the REST interface over Thrift, in 
addition to HTTP.

https://github.com/elasticsearch/elasticsearch-transport-thrift/

Release Notes - elasticsearch-transport-thrift - Version 2.2.0



Update:
 * [28] - Update to elasticsearch 1.2.0 
(https://github.com/elasticsearch/elasticsearch-transport-thrift/issues/28)


Doc:
 * [25] - Add documentation on missing settings 
(https://github.com/elasticsearch/elasticsearch-transport-thrift/issues/25)


Issues, pull requests, and feature requests are warmly welcome on the 
elasticsearch-transport-thrift project repository: 
https://github.com/elasticsearch/elasticsearch-transport-thrift/
For questions or comments about this plugin, feel free to use the Elasticsearch 
mailing list: https://groups.google.com/forum/#!forum/elasticsearch

Enjoy,

-The Elasticsearch team



Re: Splunk vs. Elastic search performance?

2014-06-20 Thread Brian
Thomas,

Thanks for your insights and experiences. As I am someone who has explored 
and used ES for over a year but is relatively new to the ELK stack, your 
data points are extremely valuable. Let me offer some of my own views.

Re: double the storage. I strongly recommend ELK users to disable the _all 
field. The entire text of the log events generated by logstash ends up in 
the message field (and not @message as many people incorrectly post). So 
the _all field is just redundant overhead with no value add. The result is 
a dramatic drop in database file sizes and dramatic increase in load 
performance. Of course, you need to configure ES to use the message field 
as the default for a Lucene Kibana query.
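
To make that concrete, here is a sketch of an index template body that disables `_all` and points the default query field at `message`, written as a Python dict. The template pattern `logstash-*` and the variable name are assumptions; `_all.enabled` and `index.query.default_field` are the 1.x-era settings as I understand them, so check them against the documentation for your version.

```python
import json

# Assumed template pattern for logstash-created daily indices.
logstash_template = {
    "template": "logstash-*",
    "settings": {
        # Make unqualified Lucene queries search the message field.
        "index.query.default_field": "message",
    },
    "mappings": {
        "_default_": {
            # Skip the redundant catch-all copy of every field.
            "_all": {"enabled": False},
        }
    },
}

# Body that would be sent with PUT /_template/<name>.
print(json.dumps(logstash_template, indent=2))
```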

During the year that I've used ES and watched this group, I have been on 
the front line of a brand new product with a smart and dedicated 
development team working steadily to improve the product. Six months ago, 
the ELK stack eluded me and reports weren't encouraging (with the sole 
exception of the Kibana web site's marketing pitch). But ES has come a long 
way since six months ago, and the ELK stack is much more closely integrated.

The Splunk UI is carefully crafted to isolate users from each other and 
prevent external (to the Splunk db itself, not to our company) users from 
causing harm to data. But Kibana seems to be meant for a small cadre of 
trusted users. What if I write a dashboard with the same name as someone 
else's? Kibana doesn't even begin to discuss user isolation. But I am 
confident that it will.

How can I tell Kibana to set the default Lucene query operator to AND 
instead of OR? Google is not my friend: I keep getting references to the 
Ruby versions of Kibana, and that's ancient history by now. Kibana is cool and 
promising, but it has a long way to go before it can be deployed to all of the 
folks in our company who currently have access to Splunk.

Logstash has a nice book that's been very helpful, and logstash itself has 
been an excellent tool for prototyping. The book has been invaluable in 
helping me extract dates from log events and handling all of our different 
multiline events. But it still doesn't explain why the date filter needs a 
different array of matching strings to get the date that the grok filter 
has already matched and isolated. And recommendations to avoid the 
elasticsearch_http output and use elasticsearch (via the Node client) 
directly contradict the fact that logstash's 1.1.1 version of the ES client 
library is not compatible with the most recent 1.2.1 version of ES.

And logstash is also a resource hog, so we eventually plan to replace it 
with Perl and Apache Flume (already in use) and pipe it into my Java bulk 
load tool (which is always kept up-to-date with the versions of ES we 
deploy!!). Because we send the data via Flume to our data warehouse, any 
losses in ES will be annoying but won't be catastrophic. And the front-end 
following of rotated log files will be done using the GNU tail command with 
its uppercase -F option, which follows rotated log files perfectly. I doubt 
that logstash can do the same, and we currently see that neither can Splunk 
(so we sporadically lose log events in Splunk too). So GNU tail -F piped 
into logstash with the stdin input works perfectly in my evaluation setup 
and will likely form the first stage of any log forwarder we end up deploying.

Brian

On Thursday, June 19, 2014 8:48:34 AM UTC-4, Thomas Paulsen wrote:

 We had a 2,2TB/d installation of Splunk and ran it on VMware with 12 
 indexers and 2 search heads. Each indexer had a guaranteed 1000 IOPS assigned. 
 The system is slow but OK to use. 

 We tried Elasticsearch and were able to get the same performance with 
 the same number of machines. Unfortunately, with Elasticsearch you need 
 almost double the amount of storage, plus a LOT of patience to make it run. It 
 took us six months to set it up properly, and even now the system is quite 
 buggy and unstable, and from time to time we lose data with Elasticsearch. 

 I don't recommend ELK for a critical production system. For dev work 
 it is OK, if you don't mind the hassle of setting it up and operating it. The 
 costs you save by not buying a Splunk license you have to invest in 
 consultants to get it up and running. Our dev teams hate Elasticsearch and 
 prefer Splunk.



boolean multi-field silently ignored in 1.2.1

2014-06-20 Thread Bruce Ritchie
I'm seeing multi-fields of type boolean silently being reduced to a normal 
boolean field in 1.2.1 which wasn't the behavior in 0.90.9. 
See https://gist.github.com/Omega359/0c2a93690b4db30693a1 for an example of 
this.

Is this expected? To me it seems like it should work - the boolean field 
mapper seems to be calling out to multiFieldsBuilder - but I'm not versed 
enough in the internals of ES to know where if at all it's broken.


Bruce



Re: Penalty or boost from a boolean property

2014-06-20 Thread David Pilato
Function_score is the way to go IMHO.

Best

--
David ;-)
Twitter : @dadoonet / @elasticsearchfr / @scrutmydocs



Penalty or boost from a boolean property

2014-06-20 Thread hugo lassiege
Hi,

I'm looking for help :) This may be trivial, but I can't find a good 
solution. 

I have some documents with two boolean properties, essentially thumbs up 
and thumbs down, which show whether the administrator approves of those 
documents or not. 
I am trying to boost a document if it is thumbsup and demote it if it 
is thumbsdown. It's not a filter: the document can still be retrieved, it's 
just more or less relevant. 

I tried adding two should clauses to the global request:


{
    "bool" : {
        "should" : [
            {
                "term" : { "champ1" : "valeur1" }
            },
            {
                "term" : { "champ2" : "valeur2" }
            },
            {
                "term" : { "thumbsup" : true }
            },
            {
                "term" : { "thumbsdown" : false }
            }
        ]
    }
}


But I get some irrelevant documents, because they match the last two conditions. 
What would be the best method for this use case?
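
One way to express this is a function_score query, as suggested in the reply in this thread. The Python sketch below builds such a request body under a few assumptions: the factors 2.0 and 0.5 are arbitrary illustrations, and `boost_factor` is the parameter name as I recall it from the 1.x documentation (later releases use `weight`), so verify it for your version.

```python
import json


def boosted_query(base_query, up_factor=2.0, down_factor=0.5):
    """Wrap base_query so thumbsup raises and thumbsdown lowers the score."""
    return {
        "query": {
            "function_score": {
                # Only this part decides which documents match.
                "query": base_query,
                "functions": [
                    {"filter": {"term": {"thumbsup": True}}, "boost_factor": up_factor},
                    {"filter": {"term": {"thumbsdown": True}}, "boost_factor": down_factor},
                ],
                # Combine the factors of all matching functions by multiplying.
                "score_mode": "multiply",
            }
        }
    }


base = {
    "bool": {
        "should": [
            {"term": {"champ1": "valeur1"}},
            {"term": {"champ2": "valeur2"}},
        ]
    }
}
print(json.dumps(boosted_query(base), indent=2))
```

Because the thumbs fields no longer sit in the should clauses, a document cannot match on them alone; they only rescale the score of documents that already match the base query.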



Getting complete value from ElasticSearch query

2014-06-20 Thread Vinay Pandey
I have the following structure on my ElasticSearch:

{
    "_index": "3_exposureindex",
    "_type": "exposuresearch",
    "_id": "12738",
    "_version": 4,
    "_score": 1,
    "_source": {
        "Name": "test2_update",
        "Description": "",
        "CreateUserId": 8,
        "SourceId": null,
        "Id": 12738,
        "ExposureId": 12738,
        "CreateDate": "2014-06-20T16:18:50.500",
        "UpdateDate": "2014-06-20T16:19:57.547",
        "UpdateUserId": 8
    },
    "fields": {
        "_parent": "1"
    }
}


I am trying to get both, the data in `_source` as well as that in `fields`, 
when I run the query:

{
  "query": {
    "terms": {
      "Id": [
        12738
      ]
    }
  }
}


All I get are the values contained in `_source`, whereas, if I run the 
query:

{
  "fields": [
    "_parent"
  ],
  "query": {
    "terms": {
      "Id": [
        12738
      ]
    }
  }
}


Then I only get the `fields`. Is there a way to get both? I will be grateful 
for any help.



Re: Getting complete value from ElasticSearch query

2014-06-20 Thread Vinay Pandey
I forgot to mention that I have asked the same question on Stack Overflow: 
http://stackoverflow.com/questions/24333655/getting-complete-value-from-elasticsearch-query



deleting documents that are missing fields

2014-06-20 Thread Jeff Dupont
I can easily query for documents that are missing a particular term field, 
however I'd like to free up that space and remove those documents. I've 
tried this with no luck:

DELETE /my_index/pages/_search
{
    "filter" : {
        "missing" : {
            "field" : "sentences",
            "existence" : true,
            "null_value" : true
        }
    }
}


It works fine to find them, but I can't find an easy way to remove them, and 
I have about 2 million to remove.



Re: Getting complete value from ElasticSearch query

2014-06-20 Thread Vinay Pandey
This just got answered:

You should be able to specify _source in the fields

Example:

{
  "fields": [
    "_parent",
    "_source"
  ],
  "query": {
    "terms": {
      "Id": [
        12738
      ]
    }
  }
}







Terms aggregation for multiple fields

2014-06-20 Thread Madhavan Ramachandran
Hi Team,
I am new to Elasticsearch and am learning about its search and query APIs.

I have a requirement to fetch data from ES. Assume my data looks like the 
following table:

Prop-Name  Type      Use
Place1     Sale      Office
Place2     Lease     Office
Place3     SubLease  Office
Place4     Sale      Industry


So for Type I have Sale, Lease, and SubLease as the distinct values for a 
property. Similarly, for Use I have 7 distinct values.

I have loaded the data into ES. On page load, I need to show the count for 
each Type and each Use.

When a Type is selected, I need to filter the Use counts, and vice versa.

For example, if there are 30 places in total with Type Sale, then Use might 
show Office 15 and Industry 15.

When I select Office (15), I need to find how many documents of each Type 
belong to Office.

1. I always have to populate the distinct values (3 Types and 7 Uses) and 
their counts based on the current selection.
2. How do I aggregate when the Use field has a value such as Multi-family 
and I want it shown as one aggregated value? My current query returns two 
results for this value.

Regards
Madhavan.TR
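
One way the counts described above might be requested is a pair of terms
aggregations combined with an optional filter for the current selection. The
sketch below only builds the request body. It assumes the indexed field names
are literally Type and Use (taken from the table header, not confirmed by the
poster) and that Use is mapped as not_analyzed so that a value like
Multi-family stays a single term instead of being split into two.

```python
import json

# Assumed mapping fragment: a not_analyzed string keeps "Multi-family" as one term.
USE_FIELD_MAPPING = {"Use": {"type": "string", "index": "not_analyzed"}}


def build_facet_counts_body(selected_type=None, selected_use=None):
    """Build a 1.x-style search body that counts documents per Type and per Use.

    The field names "Type" and "Use" are assumptions based on the table above.
    """
    filters = []
    if selected_type is not None:
        filters.append({"term": {"Type": selected_type}})
    if selected_use is not None:
        filters.append({"term": {"Use": selected_use}})

    if filters:
        # Restrict the documents that feed both aggregations to the selection.
        query = {
            "filtered": {
                "query": {"match_all": {}},
                "filter": {"bool": {"must": filters}},
            }
        }
    else:
        query = {"match_all": {}}

    return {
        "size": 0,  # only the aggregation counts are needed, not the hits
        "query": query,
        "aggs": {
            "by_type": {"terms": {"field": "Type"}},
            "by_use": {"terms": {"field": "Use"}},
        },
    }


if __name__ == "__main__":
    # Counts per Type among the places whose Use is Office.
    print(json.dumps(build_facet_counts_body(selected_use="Office"), indent=2))
```

With no selection the body counts every Type and Use; passing a selection
narrows both aggregations to the matching documents.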




Re: cassandra river plugin installation issue

2014-06-20 Thread Shams Haque
Hi,

The issue was not with the Hector API. It was fixed by using WITH 
COMPACT STORAGE when creating column families in Cassandra.
I have posted the details here: 
http://stackoverflow.com/questions/21089453/cassandra-column-name-trailing-with-blank-characters




Re: deleting documents that are missing fields

2014-06-20 Thread Ivan Brusic
I do not use delete by query, but have you tried using a fully formed query
and not just a filter? Perhaps an implicit match_all query is not being
set. Try using a filtered query with a match_all query and your filter.

http://www.elasticsearch.org/guide/en/elasticsearch/reference/current/query-dsl-filtered-query.html
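
The suggestion above could be sketched roughly as follows: wrap the missing
filter in a filtered query with an explicit match_all, and put the whole thing
under a top-level "query" key. The sketch only builds the body. As far as I
recall, the 1.x delete-by-query endpoint is _query rather than _search (for
example DELETE /my_index/pages/_query), but that is an assumption worth
checking against the documentation for the version in use.

```python
import json


def build_delete_missing_body(field):
    """Build a delete-by-query body matching documents that lack `field`.

    Combines an explicit match_all query with the poster's missing filter.
    """
    return {
        "query": {
            "filtered": {
                "query": {"match_all": {}},
                "filter": {
                    "missing": {
                        "field": field,
                        "existence": True,
                        "null_value": True,
                    }
                },
            }
        }
    }


if __name__ == "__main__":
    # Body to send with the delete-by-query request for the "sentences" field.
    print(json.dumps(build_delete_missing_body("sentences"), indent=2))
```

Running the same body through a normal search first is a cheap way to confirm
that it matches only the documents meant to be removed.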

-- 
Ivan






Re: get rid of _all to optimize storage and perfs (Re: Splunk vs. Elastic search performance?)

2014-06-20 Thread Brian
Patrick,

Here's my template, along with where the _all field is disabled. You may 
wish to add this setting to your own template, and then also add the index 
setting to ignore malformed data (if someone's log entry occasionally slips 
in null or no-data instead of the usual numeric value):

{
  "automap" : {
    "template" : "logstash-*",
    "settings" : {
      "index.mapping.ignore_malformed" : true
    },
    "mappings" : {
      "_default_" : {
        "numeric_detection" : true,
        "_all" : { "enabled" : false },
        "properties" : {
          "message" : { "type" : "string" },
          "host" : { "type" : "string" },
          "UUID" : { "type" : "string", "index" : "not_analyzed" },
          "logdate" : { "type" : "string", "index" : "no" }
        }
      }
    }
  }
}

Brian



Re: HIVE-Elasticsearch [mapr-elasticsearch] write to elasticsearch issue

2014-06-20 Thread shankarramshivram
Hi Costin,

Thanks for the tip. I replaced the old version of jackson and it works now 
:).

Cheers
Shankar

On Sunday, June 15, 2014 3:09:27 AM UTC-6, Costin Leau wrote:

 What version of MapR are you using? MapR uses an old version of jackson, 
 which es-hadoop should detect, switching to an appropriate code path. 
 There are various fixes: 

 1. I've pushed a fix on the 2.x branch which improves detection - you can 
 try the 2.0.1.BUILD-SNAPSHOT version here [a] 
 2. You can upgrade the jackson version in MapR to version 1.7 or higher 
 (vanilla Hadoop uses 1.8.8). This approach works with the current 
 es-hadoop and also gives you a performance boost for serializing data. 

 Cheers, 

 [a] 
 https://github.com/elasticsearch/elasticsearch-hadoop#development-snapshot 

 On 6/13/14 11:30 PM, shankarr...@gmail.com wrote: 
  Hi, 
  
  I am trying to integrate elasticsearch with a mapr hadoop cluster. I am 
  using the hive-elasticsearch integration document. I am able to read data 
  from the elasticsearch node. However I am not able to write data into the 
  elasticsearch node, which is my primary requirement. Request to kindly 
  guide me. 
  
  I always get the following errors: 
  
  2014-06-13 14:15:45,814 INFO org.apache.hadoop.hive.ql.exec.FileSinkOperator: New Final Path: FS maprfs:/user/hive/warehouse/dev.db/_tmp.shankar/02_0
  2014-06-13 14:15:45,947 FATAL org.apache.hadoop.hive.ql.exec.mr.ExecMapper: java.lang.NoSuchMethodError: org.codehaus.jackson.JsonGenerator.writeUTF8String([BII)V
      at org.elasticsearch.hadoop.serialization.json.JacksonJsonGenerator.writeUTF8String(JacksonJsonGenerator.java:123)
      at org.elasticsearch.hadoop.mr.WritableValueWriter.write(WritableValueWriter.java:47)
      at org.elasticsearch.hadoop.hive.HiveWritableValueWriter.write(HiveWritableValueWriter.java:83)
      at org.elasticsearch.hadoop.hive.HiveWritableValueWriter.write(HiveWritableValueWriter.java:38)
      at org.elasticsearch.hadoop.hive.HiveValueWriter.write(HiveValueWriter.java:69)
      at org.elasticsearch.hadoop.hive.HiveValueWriter.write(HiveValueWriter.java:111)
      at org.elasticsearch.hadoop.hive.HiveValueWriter.write(HiveValueWriter.java:55)
      at org.elasticsearch.hadoop.hive.HiveValueWriter.write(HiveValueWriter.java:41)
      at org.elasticsearch.hadoop.serialization.builder.ContentBuilder.value(ContentBuilder.java:258)
      at org.elasticsearch.hadoop.serialization.bulk.TemplatedBulk.doWriteObject(TemplatedBulk.java:92)
      at org.elasticsearch.hadoop.serialization.bulk.TemplatedBulk.write(TemplatedBulk.java:79)
      at org.elasticsearch.hadoop.hive.EsSerDe.serialize(EsSerDe.java:128)
      at org.apache.hadoop.hive.ql.exec.FileSinkOperator.processOp(FileSinkOperator.java:582)
      at org.apache.hadoop.hive.ql.exec.Operator.forward(Operator.java:793)
      at org.apache.hadoop.hive.ql.exec.SelectOperator.processOp(SelectOperator.java:87)
      at org.apache.hadoop.hive.ql.exec.Operator.forward(Operator.java:793)
      at org.apache.hadoop.hive.ql.exec.TableScanOperator.processOp(TableScanOperator.java:92)
      at org.apache.hadoop.hive.ql.exec.Operator.forward(Operator.java:793)
      at org.apache.hadoop.hive.ql.exec.MapOperator.process(MapOperator.java:540)
      at org.apache.hadoop.hive.ql.exec.mr.ExecMapper.map(ExecMapper.java:177)
      at org.apache.hadoop.mapred.MapRunner.run(MapRunner.java:50)
      at org.apache.hadoop.mapred.MapTask.runOldMapper(MapTask.java:417)
      at org.apache.hadoop.mapred.MapTask.run(MapTask.java:348)
      at org.apache.hadoop.mapred.Child$4.run(Child.java:282)
      at java.security.AccessController.doPrivileged(Native Method)
      at javax.security.auth.Subject.doAs(Subject.java:415)
      at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1117)
      at org.apache.hadoop.mapred.Child.main(Child.java:271)

  2014-06-13 14:15:45,947 INFO org.apache.hadoop.hive.ql.exec.MapOperator: 3 finished. closing...
  2014-06-13 14:15:45,947 INFO org.apache.hadoop.hive.ql.exec.MapOperator: DESERIALIZE_ERRORS:0
  2014-06-13 14:15:45,947 INFO org.apache.hadoop.hive.ql.exec.TableScanOperator: 0 finished. closing...
  2014-06-13 14:15:45,947 INFO org.apache.hadoop.hive.ql.exec.SelectOperator: 1 finished. closing...
  2014-06-13 14:15:45,947 INFO org.apache.hadoop.hive.ql.exec.FileSinkOperator: 2 finished. closing...
  2014-06-13 14:15:45,948 INFO org.apache.hadoop.hive.ql.exec.FileSinkOperator: 2 Close done
  2014-06-13 14:15:45,948 INFO org.apache.hadoop.hive.ql.exec.SelectOperator: 1 Close done
  2014-06-13 14:15:45,948 INFO org.apache.hadoop.hive.ql.exec.TableScanOperator: 0 Close done
  2014-06-13 14:15:45,948 INFO 

issues with file input from logstash to elastic - please read

2014-06-20 Thread Eitan Vesely
Guys,
It's been more than a week that I've been struggling with this issue.
If possible, please take a look and try to help :-( 

I have a config file that I run Logstash with; it is supposed to 
fetch the log file I specified in it and stream it to Elasticsearch.

The problem is that it worked twice and that's it. No changes were made to the 
file, yet most of the time it doesn't load the data and doesn't show any error 
message. When I change the input from file to stdin, it works fine.

This is the config file. I believe the syntax is correct, since it did 
work twice...

input {
  file {
    path => "C:\elasticsearch-1.2.0\testLog.txt"
    start_position => "beginning"
  }
}
output {
  elasticsearch {
    host => "localhost"
    index => "tester3"
    protocol => "http"
  }
}



Disabling date detection [Hive-Elasticsearch]

2014-06-20 Thread shankarramshivram
Hi ,

My write to ES from MapR fails because automatic date detection is 
enabled. Is there a way to disable date detection from the external 
Hive table properties?
Please guide me on this.
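
Whether the Hive table properties expose this is not something the thread
settles. A possible workaround, assuming the index can be created before the
Hive job writes to it, is to turn off date_detection in the type mapping. The
sketch below only builds the index-creation body; the type name "mytype" is a
placeholder, not something taken from the original post.

```python
import json


def build_index_body(type_name):
    """Build an index-creation body that disables automatic date detection.

    `type_name` is a hypothetical mapping type; use the one the Hive table
    writes to.
    """
    return {
        "mappings": {
            type_name: {
                "date_detection": False,
            }
        }
    }


if __name__ == "__main__":
    # Body to send when creating the index, before any documents are written.
    print(json.dumps(build_index_body("mytype"), indent=2))
```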



Re: boolean multi-field silently ignored in 1.2.1

2014-06-20 Thread Clinton Gormley
heya bruce

that looks like a bug  - please open an issue

clint


On 20 June 2014 19:41, Bruce Ritchie bruce.ritc...@gmail.com wrote:

 I'm seeing multi-fields of type boolean silently being reduced to a normal
 boolean field in 1.2.1 which wasn't the behavior in 0.90.9. See
 https://gist.github.com/Omega359/0c2a93690b4db30693a1 for an example of
 this.

 Is this expected? To me it seems like it should work - the boolean field
 mapper seems to be calling out to multiFieldsBuilder - but I'm not versed
 enough in the internals of ES to know where if at all it's broken.


 Bruce





Re: issues with file input from logstash to elastic - please read

2014-06-20 Thread Mark Walkom
You'll have better luck sending this to the Logstash mailing list :)

Regards,
Mark Walkom

Infrastructure Engineer
Campaign Monitor
email: ma...@campaignmonitor.com
web: www.campaignmonitor.com






Re: How to find the number of authors who have written between 2-3 books?

2014-06-20 Thread Clinton Gormley
Alternatively, if you model this with parent-child, then you can use
min_children/max_children, which is available in the next release.

http://www.elasticsearch.org/guide/en/elasticsearch/reference/current/query-dsl-has-child-filter.html#_min_max_children_2
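
A rough sketch of what such a request body might look like, assuming authors
are parent documents and books are child documents of a type named "book"
(both modelling choices are assumptions, not part of the thread). The total
hit count of the search would then be the number of matching authors.

```python
import json


def build_author_count_body(min_books, max_books, child_type="book"):
    """Build a has_child query matching parents with min..max children.

    `child_type` is a hypothetical child type name for the book documents.
    """
    return {
        "size": 0,  # only the total number of matching authors is needed
        "query": {
            "has_child": {
                "type": child_type,
                "min_children": min_books,
                "max_children": max_books,
                "query": {"match_all": {}},
            }
        },
    }


if __name__ == "__main__":
    # Authors who have written between 2 and 3 books.
    print(json.dumps(build_author_count_body(2, 3), indent=2))
```

Replacing the inner match_all with a query on the book title would cover the
"titles containing E, F, H, or I" variant asked about in the quoted message.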

clint


On 20 June 2014 17:15, Mike mnilsson2...@gmail.com wrote:

 I'm ok with the count returned being some estimate.  Say in this simple
 example if it returned 1 for just Joe, or 3 for John, Joe, and Jack that
 would be ok too.  I am also ok with restructuring my data in any way to
 more efficiently get this number.

 You mentioned creating a reference count document.  How would that look?
  1 doc per unique author, with a count of the total number of books he
 wrote so then I can do a range aggregation on that number?  What if I
 wanted to find the number of authors who have written between 2-3 books
 that have a title containing E, F, H, or I (still 2 in this case, John and
 Joe) ?




 On Thursday, June 19, 2014 6:43:41 PM UTC-4, Itamar Syn-Hershko wrote:

 This is a Map/Reduce operation, you'll be better off maintaining a
 ref-count document IMO then trying to hack the aggregations framework to
 support this

 Another reason for doing it that way is in a distributed environment some
 aggregations can't be computed to an exact value - the Terms bucketing is
 one example. So if you need exact values, I'd go for a model that does it.

 --

 Itamar Syn-Hershko
 http://code972.com | @synhershko https://twitter.com/synhershko
 Freelance Developer & Consultant
 Author of RavenDB in Action http://manning.com/synhershko/


 On Fri, Jun 20, 2014 at 1:34 AM, Mike mnilss...@gmail.com wrote:

 Assume each document is a book:
 { "title": "A", "author": "Mike" }
 { "title": "B", "author": "Mike" }
 { "title": "C", "author": "Mike" }
 { "title": "D", "author": "Mike" }

 { "title": "E", "author": "John" }
 { "title": "F", "author": "John" }
 { "title": "G", "author": "John" }

 { "title": "H", "author": "Joe" }
 { "title": "I", "author": "Joe" }

 { "title": "J", "author": "Jack" }

 What is the best way to find the number of authors who have written
 between 2-3 books?  In this case it would be 2, John and Joe.

 I know I can do a terms aggregation on author, set size to be very very
 large, and then on the client side traverse through the thousands of
 authors and count how many had between 2-3.  Is there a more efficient way
 to do this?  The cardinality aggregation is almost what I want, if only I
 could specify a min and max term count.
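
The client-side approach described above can be sketched as follows. The
bucket list is simulated locally from the ten sample books instead of being
fetched from a cluster; its shape (a "key" and a "doc_count" per bucket)
mirrors what a terms aggregation returns.

```python
from collections import Counter

BOOKS = [
    {"title": "A", "author": "Mike"},
    {"title": "B", "author": "Mike"},
    {"title": "C", "author": "Mike"},
    {"title": "D", "author": "Mike"},
    {"title": "E", "author": "John"},
    {"title": "F", "author": "John"},
    {"title": "G", "author": "John"},
    {"title": "H", "author": "Joe"},
    {"title": "I", "author": "Joe"},
    {"title": "J", "author": "Jack"},
]


def simulate_terms_buckets(docs):
    """Stand-in for the buckets of a terms aggregation on the author field."""
    counts = Counter(doc["author"] for doc in docs)
    return [{"key": author, "doc_count": n} for author, n in counts.most_common()]


def count_authors_in_range(buckets, low, high):
    """Count the buckets whose doc_count falls within [low, high]."""
    return sum(1 for bucket in buckets if low <= bucket["doc_count"] <= high)


if __name__ == "__main__":
    buckets = simulate_terms_buckets(BOOKS)
    print(count_authors_in_range(buckets, 2, 3))  # John (3) and Joe (2)
```

Against a real cluster the buckets would come from the aggregation response,
and the cost is transferring one bucket per author, which is the inefficiency
the question is about.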








Re: Splunk vs. Elastic search performance?

2014-06-20 Thread Mark Walkom
I wasn't aware that the elasticsearch_http output wasn't recommended?
When I spoke to a few of the ELK devs a few months ago, they indicated that
there was minimal performance difference, at the greater benefit of not
being locked to specific LS+ES versioning.

Regards,
Mark Walkom

Infrastructure Engineer
Campaign Monitor
email: ma...@campaignmonitor.com
web: www.campaignmonitor.com


On 21 June 2014 02:43, Brian brian.from...@gmail.com wrote:

 Thomas,

 Thanks for your insights and experiences. As I am someone who has explored
 and used ES for over a year but is relatively new to the ELK stack, your
 data points are extremely valuable. Let me offer some of my own views.

 Re: double the storage. I strongly recommend ELK users to disable the _all
 field. The entire text of the log events generated by logstash ends up in
 the message field (and not @message as many people incorrectly post). So
 the _all field is just redundant overhead with no value add. The result is
 a dramatic drop in database file sizes and dramatic increase in load
 performance. Of course, you need to configure ES to use the message field
 as the default for a Lucene Kibana query.

 During the year that I've used ES and watched this group, I have been on
 the front line of a brand new product with a smart and dedicated
 development team working steadily to improve the product. Six months ago,
 the ELK stack eluded me and reports weren't encouraging (with the sole
 exception of the Kibana web site's marketing pitch). But ES has come a long
 way since six months ago, and the ELK stack is much more closely integrated.

 The Splunk UI is carefully crafted to isolate users from each other and
 prevent external (to the Splunk db itself, not to our company) users from
 causing harm to data. But Kibana seems to be meant for a small cadre of
 trusted users. What if I write a dashboard with the same name as someone
 else's? Kibana doesn't even begin to discuss user isolation. But I am
 confident that it will.

 How can I tell Kibana to set the default Lucene query operator to AND
 instead of OR? Google is not my friend: I keep getting references to the
 Ruby versions of Kibana; that's ancient history by now. Kibana is cool and
 promising, but it has a long way to go for deployment to all of the folks
 in our company who currently have access to Splunk.

 Logstash has a nice book that's been very helpful, and logstash itself has
 been an excellent tool for prototyping. The book has been invaluable in
 helping me extract dates from log events and handling all of our different
 multiline events. But it still doesn't explain why the date filter needs a
 different array of matching strings to get the date that the grok filter
 has already matched and isolated. And recommendations to avoid the
 elasticsearch_http output and use elasticsearch (via the Node client)
 directly contradict the fact that logstash's 1.1.1 version of the ES client
 library is not compatible with the most recent 1.2.1 version of ES.

 And logstash is also a resource hog, so we eventually plan to replace it
 with Perl and Apache Flume (already in use) and pipe it into my Java bulk
 load tool (which is always kept up-to-date with the versions of ES we
 deploy!!). Because we send the data via Flume to our data warehouse, any
 losses in ES will be annoying but won't be catastrophic. And the front-end
 following of rotated log files will be done using the GNU *tail -F* command
 and option. This GNU tail command with its uppercase -F option follows
 rotated log files perfectly. I doubt that logstash can do the same, and we
 currently see that neither can Splunk (so we sporadically lose log events
 in Splunk too). So GNU tail -F piped into logstash with the stdin input
 works perfectly in my evaluation setup and will likely form the first stage
 of any log forwarder we end up deploying.

 Brian

 On Thursday, June 19, 2014 8:48:34 AM UTC-4, Thomas Paulsen wrote:

 We had a 2.2 TB/day installation of Splunk and ran it on VMware with 12
 indexers and 2 search heads. Each indexer had 1000 IOPS guaranteed.
 The system is slow but OK to use.

 We tried Elasticsearch and were able to get the same performance with
 the same number of machines. Unfortunately, with Elasticsearch you need
 almost double the amount of storage, plus a LOT of patience to make it run. It
 took us six months to set it up properly, and even now the system is quite
 buggy and unstable, and from time to time we lose data with Elasticsearch.

 I don't recommend ELK for a critical production system. For just dev
 work it is OK, if you don't mind the hassle of setting up and operating
 it. The costs you save by not buying a Splunk license you have to invest
 in consultants to get it up and running. Our dev teams hate Elasticsearch
 and prefer Splunk.



Re: guarding from double-start

2014-06-20 Thread Clinton Gormley
And in your config file, set:

node.max_local_storage_nodes: 1

that way you won't start two nodes on a single instance


On 20 June 2014 16:54, Andrew Gaydenko andrew.gayde...@gmail.com wrote:

 On Friday, June 20, 2014 6:49:04 PM UTC+4, Maciej Dziardziel wrote:

 use start-stop-daemon or adapt /etc/init.d/elasticsearch to set up
 pidfile guarding es instance. Or just run this way:
 pgrep -f elasticsearch || ./start_es.sh
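
The pidfile idea mentioned above can be sketched in a few lines: create the
file atomically and refuse to start if it already exists. This is only an
illustration of the guard itself; the path is a placeholder, and the sketch
does not deal with stale pidfiles left behind by a crashed process.

```python
import os
import tempfile


def acquire_pidfile(path):
    """Atomically create `path` and write our PID into it.

    Returns True if the pidfile was created by this call, or False if it
    already exists (another instance is presumably running).
    """
    try:
        # O_EXCL makes the call fail if the file already exists.
        fd = os.open(path, os.O_CREAT | os.O_EXCL | os.O_WRONLY)
    except FileExistsError:
        return False
    with os.fdopen(fd, "w") as handle:
        handle.write(str(os.getpid()))
    return True


if __name__ == "__main__":
    # Hypothetical pidfile location inside a scratch directory.
    pidfile = os.path.join(tempfile.mkdtemp(), "elasticsearch.pid")
    print(acquire_pidfile(pidfile))  # first start: pidfile created
    print(acquire_pidfile(pidfile))  # second start: refused
```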


 Aha, thanks! In my case pgrep is the most appropriate.





Re: problem indexing with my analyzer

2014-06-20 Thread Clinton Gormley
You seriously don't want 3..250 length ngrams. That's ENORMOUS.

Typically set min/max to 3 or 4, and that's it.

http://www.elasticsearch.org/guide/en/elasticsearch/guide/current/_ngrams_for_partial_matching.html#_ngrams_for_partial_matching
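
To get a feel for the difference, the number of n-grams produced for a single
token can be counted directly: a token of length L yields L - n + 1 grams of
each length n. The 1000-character token below is a made-up stand-in for a long
base64 run, not a figure from the thread.

```python
def ngram_count(token_length, min_gram, max_gram):
    """Count the n-grams of every length in [min_gram, max_gram] for one token."""
    total = 0
    for n in range(min_gram, max_gram + 1):
        if n > token_length:
            break  # grams longer than the token cannot be produced
        total += token_length - n + 1
    return total


if __name__ == "__main__":
    print(ngram_count(1000, 3, 250))  # 216876 grams for one token
    print(ngram_count(1000, 3, 4))    # 1995 grams for the same token
```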


On 20 June 2014 16:05, Tanguy Bernard bernardtanguy1...@gmail.com wrote:

 Thank you Cédric Hourcade !

 On Friday, June 20, 2014 15:32:29 UTC+2, Cédric Hourcade wrote:

 If your base64 encodes are long, they are going to be split into a lot
 of tokens by the standard tokenizer.

 These tokens are often going to be a lot longer than standard words,
 so your nGram filter will generate even more tokens, a lot more than
 with standard text. That may be your problem there.

 You should really try to strip the encoded images with a simple regex
 from your documents before indexing them. If you need to keep the
 source, put the raw text in an unindexed field, and the cleaned one in
 another.
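
The stripping step suggested above might look like the sketch below. It
assumes the images are embedded as data URIs (data:image/...;base64,...),
which the thread does not confirm, and the field names "raw" and "cleaned"
are placeholders for the unindexed and indexed fields.

```python
import re

# Assumed shape of an embedded image: a base64 data URI.
DATA_URI_IMAGE = re.compile(r"data:image/[a-zA-Z0-9.+-]+;base64,[A-Za-z0-9+/=]+")


def split_document(raw_text):
    """Return the raw text plus a copy with embedded base64 images removed."""
    return {
        "raw": raw_text,  # would go into a field that is stored but not indexed
        "cleaned": DATA_URI_IMAGE.sub("", raw_text),  # field to analyze
    }


if __name__ == "__main__":
    sample = 'Hello <img src="data:image/png;base64,iVBORw0KGgo="> world'
    print(split_document(sample)["cleaned"])
```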





Adding order to a terms aggregator results in ArrayIndexOutOfBoundsException

2014-06-20 Thread debo
I have a simple document schema on which I am trying to run the following 
query :

curl -XPOST 'localhost:9200/indexName/topn/_search?pretty' -d '{
  "aggregations" : {
    "applid" : {
      "terms" : {
        "field" : "applid",
        "size" : 3,
        "order" : {
          "tt>byt_sum" : "desc"
        }
      },
      "aggregations" : {
        "tt" : {
          "filter" : {
            "and" : {
              "filters" : [ {
                "range" : {
                  "t" : {
                    "from" : 140321160,
                    "to" : 140321610,
                    "include_lower" : true,
                    "include_upper" : true
                  }
                }
              }, {
                "terms" : {
                  "gid" : [ "abcd" ]
                }
              } ]
            }
          },
          "aggregations" : {
            "byt_sum" : {
              "sum" : {
                "field" : "byt"
              }
            }
          }
        }
      }
    }
  }
}'

This seems to give me back an error:

{
  "error" : "SearchPhaseExecutionException[Failed to execute phase [query], all shards failed; shardFailures
{[rcP5ncimTpmcUZgvn5cgSw][indexName][0]: ArrayIndexOutOfBoundsException[null]}
{[vauVf2XOQvOobpqIbp0REQ][indexName][2]: RemoteTransportException[[Bloodstorm][inet[/10.0.0.91:9300]][search/phase/query]]; nested: ArrayIndexOutOfBoundsException; }
{[vauVf2XOQvOobpqIbp0REQ][indexName][1]: RemoteTransportException[[Bloodstorm][inet[/10.0.0.91:9300]][search/phase/query]]; nested: ArrayIndexOutOfBoundsException; }
{[vauVf2XOQvOobpqIbp0REQ][indexName][4]: RemoteTransportException[[Bloodstorm][inet[/10.0.0.91:9300]][search/phase/query]]; nested: ArrayIndexOutOfBoundsException; }
{[vauVf2XOQvOobpqIbp0REQ][indexName][3]: RemoteTransportException[[Bloodstorm][inet[/10.0.0.91:9300]][search/phase/query]]; nested: ArrayIndexOutOfBoundsException; }]",
  "status" : 500
}

When I take the 
"order" : {
  "ttbyt_sum" : "desc"
}
out, this seems to work fine. Also, the error only occurs for certain gid 
values, such as "gid" : [ "abcd" ]. For example, it works for gid : [ 1234 ]. 
Could you suggest what is going wrong here?

Elasticsearch version :

{
  "status" : 200,
  "name" : "Kylun",
  "version" : {
    "number" : "1.1.1",
    "build_hash" : "f1585f096d3f3985e73456debdc1a0745f512bbc",
    "build_timestamp" : "2014-04-16T14:27:12Z",
    "build_snapshot" : false,
    "lucene_version" : "4.7"
  },
  "tagline" : "You Know, for Search"
}



Re: guarding from double-start

2014-06-20 Thread Andrew Gaydenko
On Saturday, June 21, 2014 2:33:28 AM UTC+4, Clinton Gormley wrote:

 And in your config file, set:

 node.max_local_storage_nodes: 1

 that way you won't start two nodes on a single instance


Great, thanks!



Re: Splunk vs. Elastic search performance?

2014-06-20 Thread Brian
Mark,

I've read one post (can't remember where) saying that the Node client was 
preferred, but I have also read that the HTTP interface adds minimal 
overhead. So yes, I am currently using logstash with the HTTP interface and 
it works fine.

I also performed some experiments with clustering (not many, due to 
resource and time constraints) and used unicast discovery. Then I read a 
post by someone who strongly recommended multicast discovery, and I started 
to feel like I'd gone down the wrong path. Then I watched the ELK webinar 
and heard that unicast discovery was preferred. I think it's not a big deal 
either way; use whatever works best for your particular networking 
infrastructure.

In addition, I was recently given this link: 
http://aphyr.com/posts/317-call-me-maybe-elasticsearch. It hasn't dissuaded 
me at all, but it is a thought-provoking read. I am a little confused by 
some things, though. In all of my high-performance banging on ES, even with 
my time-to-live test feature enabled, I never lost any documents at all. 
But I wasn't using auto-id; I was specifying my own unique ID. And when run 
in my 3-node cluster (slow due to being hosted by 3 VMs running on a 
dual-core machine), I still didn't lose any data. So I am not sure about 
the high data loss scenarios he describes in his missive; I have seen no 
evidence of any data loss due to false insert positives at all.
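
To illustrate indexing with caller-supplied IDs rather than auto-generated ones, here is a minimal Python sketch that builds a bulk API request body in which every action carries an explicit _id. Deriving the ID from a hash of the event is an assumption made for the example, not something described above, and the index and type names are hypothetical. With a deterministic ID, resending the same event after a timeout overwrites the existing document instead of creating a duplicate.

```python
import hashlib
import json


def event_id(event):
    """Derive a stable ID from the event content (assumption: content is unique)."""
    canonical = json.dumps(event, sort_keys=True, separators=(",", ":"))
    return hashlib.sha1(canonical.encode("utf-8")).hexdigest()


def bulk_body(events, index="logs", doc_type="event"):
    """Build a newline-delimited bulk body: one action line, then one source line."""
    lines = []
    for event in events:
        action = {"index": {"_index": index, "_type": doc_type, "_id": event_id(event)}}
        lines.append(json.dumps(action))
        lines.append(json.dumps(event))
    # The bulk API expects the body to end with a newline.
    return "\n".join(lines) + "\n"
```

The body would be sent to the _bulk endpoint with whichever HTTP client you already use.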

Brian

On Friday, June 20, 2014 6:30:27 PM UTC-4, Mark Walkom wrote:

 I wasn't aware that the elasticsearch_http output wasn't recommended?
 When I spoke to a few of the ELK devs a few months ago, they indicated 
 that there was minimal performance difference, at the greater benefit of 
 not being locked to specific LS+ES versioning.

 Regards,
 Mark Walkom





Re: issues with file input from logstash to elastic - please read

2014-06-20 Thread Brian
Eitan,

My recommendation is to use the stdin input in logstash and avoid its file 
input. Then, for testing, you pipe the file into your logstash instance. But 
in production, you should run the GNU version of *tail -F* (uppercase F 
option) to correctly follow all forms of rotated logs, and then pipe that 
output into your logstash instance.

I don't know just how robust logstash's file input is, but the GNU version 
of tail with the -F option is perfect, so there's no guesswork and no 
dependency on hope. Note that even Splunk has a currently open bug with 
losing data while trying to follow a rotated file.
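
To show what following a file by name involves, here is a minimal Python sketch of the idea behind tail -F: keep reading the open file, and reopen the path when it starts pointing at a different inode. This is an illustration only, not GNU tail's implementation; the class name is hypothetical, and it does not handle logs that are rotated by truncating in place.

```python
import os


class RotatingFollower:
    """Follow a file by name, reopening it after a rename-style rotation."""

    def __init__(self, path):
        self.path = path
        self.handle = None
        self.inode = None

    def _open(self):
        try:
            self.handle = open(self.path, "rb")
        except FileNotFoundError:
            self.handle = None
            self.inode = None
            return
        self.inode = os.fstat(self.handle.fileno()).st_ino

    def _drain(self):
        """Read every complete line currently available from the open handle."""
        lines = []
        while True:
            position = self.handle.tell()
            line = self.handle.readline()
            if not line:
                break
            if not line.endswith(b"\n"):
                # Partial line: rewind and wait for the writer to finish it.
                self.handle.seek(position)
                break
            lines.append(line[:-1].decode("utf-8", "replace"))
        return lines

    def poll(self):
        """Return new lines, switching to the new file if the path was rotated."""
        if self.handle is None:
            self._open()
            if self.handle is None:
                return []
        lines = self._drain()
        try:
            current_inode = os.stat(self.path).st_ino
        except FileNotFoundError:
            current_inode = None
        if current_inode != self.inode:
            # The old file was drained above, so nothing written before the
            # rotation is lost; now continue with the file at the same path.
            self.handle.close()
            self._open()
            if self.handle is not None:
                lines.extend(self._drain())
        return lines
```

Calling poll() in a loop with a short sleep and writing each line to stdout would approximate the pipeline described above.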

Also, I added the multiline processing to the filters; it didn't seem to 
work when applied as a stdin codec. Now it works very well together.

Anyway, that's what our group is doing.

And yes, the logstash-users group 
(https://groups.google.com/forum/#!forum/logstash-users) is also rather 
active and is a good place for logstash-specific help.

Brian



Elasticsearch cluster on Azure using ubuntu. The nodes don't see each other

2014-06-20 Thread Pedro Alonso


I just posted this question on Stackoverflow:

I have been setting up a cluster of Elasticsearch in Azure, using Ubuntu 
VM, following the tutorial on the plugin page (elasticsearch-cloud-azure) 
on github. I've managed to configure everything and I have elasticsearch 
running, but I have 3 clusters of 1 Node instead of 1 Cluster of 3 nodes. I 
guess that the problem comes from:

cloud:
    azure:
        keystore: /path/to/keystore
        password: your_password_for_keystore
        subscription_id: your_azure_subscription_id
        service_name: your_azure_cloud_service_name
discovery:
    type: azure

I'm not sure what your_azure_cloud_service_name should be. I have all 
my nodes inside a Virtual Network, so they can communicate with each other. 
By default, each time I create a VM on Azure, a new Cloud Service containing 
only that VM is created. Should that value be different for each of the 
nodes in my cluster?

I'm a bit lost on that one...



update field type in existing mapping in elastic search

2014-06-20 Thread srikanth ramineni
Hi ,

Can you please advise how to update the type of an existing field in a 
mapping? The requirement is below.

I have created an index, contractIndex, whose type is contract. It has the 
fields contractid (long) and contract number (long), but I want to change 
the type of contract number to string.


Thanks,
Srikanth.



Re: Elasticsearch cluster on Azure using ubuntu. The nodes don't see each other

2014-06-20 Thread David Pilato
You must create each VM under the same cloud service. For example, with
azure vm create azure-elasticsearch-cluster 
the cloud service name is azure-elasticsearch-cluster.

--
David ;-)
Twitter : @dadoonet / @elasticsearchfr / @scrutmydocs


On 21 June 2014 at 03:54, Pedro Alonso pedro@gmail.com wrote:

I just posted this question on Stackoverflow:

I have been setting up a cluster of Elasticsearch in Azure, using Ubuntu VM, 
following the tutorial on the plugin page (elasticsearch-cloud-azure) on 
github. I've managed to configure everything and I have elasticsearch running, 
but I have 3 clusters of 1 Node instead of 1 Cluster of 3 nodes. I guess that 
the problem comes from:

cloud:
    azure:
        keystore: /path/to/keystore
        password: your_password_for_keystore
        subscription_id: your_azure_subscription_id
        service_name: your_azure_cloud_service_name
discovery:
    type: azure

I'm not sure what your_azure_cloud_service_name should be. I have all my 
nodes inside a Virtual Network, so they can communicate with each other. By 
default, each time I create a VM on Azure, a new Cloud Service containing only 
that VM is created. Should that value be different for each of the nodes in my 
cluster?

I'm a bit lost on that one...



Re: update field type in existing mapping in elastic search

2014-06-20 Thread David Pilato
You can't.

You basically need to reindex.

That said, you can try to use a multifield, which adds a string representation 
of the same field. But old values (old docs) won't have this new field 
populated.
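
A minimal Python sketch of what reindexing involves here, under stated assumptions: the field is called contractNumber and the type is called contract (the exact names are not given in the thread), and documents are read from the old index and written to a new one by whatever client you use. The sketch only shows the new mapping and the per-document conversion; it makes no calls to Elasticsearch.

```python
# Mapping for the new index: contractNumber becomes a string.
# "not_analyzed" keeps the whole value as a single exact-match term.
# The field and type names here are assumptions for illustration.
NEW_MAPPING = {
    "contract": {
        "properties": {
            "contractid": {"type": "long"},
            "contractNumber": {"type": "string", "index": "not_analyzed"},
        }
    }
}


def convert(source):
    """Return a copy of an old document with contractNumber as a string."""
    doc = dict(source)
    if doc.get("contractNumber") is not None:
        doc["contractNumber"] = str(doc["contractNumber"])
    return doc
```

Each document from the old index would be passed through convert() and indexed into the new index under its original ID.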

HTH
--
David ;-)
Twitter : @dadoonet / @elasticsearchfr / @scrutmydocs


On 21 June 2014 at 06:00, srikanth ramineni ri.srika...@gmail.com wrote:

Hi ,

Can you please advise how to update the type of an existing field in a 
mapping? The requirement is below.

I have created an index, contractIndex, whose type is contract. It has the 
fields contractid (long) and contract number (long), but I want to change 
the type of contract number to string.


Thanks,
Srikanth.



Re: relation between snapshot restore and update_mapping

2014-06-20 Thread JoeZ99
I just discovered that these strange update_mapping log lines come from a 
completely unrelated thing, so please take this post as invalid and accept 
my apologies.

On Thursday, June 19, 2014 1:21:32 PM UTC-4, JoeZ99 wrote:

 This is a somewhat bizarre question. I really hope somebody jumps in, 
 because I'm losing my mind.

 We've set up a system by which our one-machine cluster gets updated indexes 
 that have been built in other clusters, by restoring snapshots.

 Long story short:

 For a few hours, the cluster is restoring snapshots, each one of them 
 containing information about two indexes. Of course, the global_state 
 flag is set to false, because we don't want to recover the cluster, just 
 those two indexes. 
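
For reference, a restore limited to specific indexes might be requested as in the following Python sketch. It only builds the URL path and JSON body for the snapshot restore API; include_global_state is the request parameter the "global_state flag" above is taken to refer to, and the snapshot and index names in the usage line are placeholders.

```python
import json


def restore_request(repository, snapshot, indices):
    """Build the path and body for restoring only the named indexes."""
    path = "/_snapshot/{0}/{1}/_restore".format(repository, snapshot)
    body = {
        "indices": ",".join(indices),
        # Leave cluster-wide state (templates, settings) untouched.
        "include_global_state": False,
    }
    return path, json.dumps(body, sort_keys=True)


# Placeholder names; the real snapshot and index names come from your setup.
path, body = restore_request("backups-1", "snapshot_name", ["index_a", "index_b"])
```

The path and body would then be sent as a POST request by your HTTP client.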

 Say that during those few hours the cluster has restored about 500 
 snapshots, one after another (there are never two restore processes at the 
 same time). Judging by the logs, it goes flawlessly:



 [2014-06-19 00:00:01,318][INFO ][snapshots] [Svarog] restore [backups-1:5e51361312cb68f41e1cb1fa5597672a_ts20140618235915350570] is done
 [2014-06-19 00:00:02,363][INFO ][repositories ] [Svarog] update repository [backups-1]
 [2014-06-19 00:00:08,653][INFO ][cluster.metadata ] [Svarog] [5e51361312cb68f41e1cb1fa5597672a_ts20140617220817522348] deleting index
 [2014-06-19 00:00:09,286][INFO ][cluster.metadata ] [Svarog] [5e51361312cb68f41e1cb1fa5597672a_phonetic_ts20140617220817904810] deleting index
 [2014-06-19 00:00:09,815][INFO ][repositories ] [Svarog] update repository [backups-1]
 [2014-06-19 00:00:15,570][INFO ][repositories ] [Svarog] update repository [backups-1]
 [2014-06-19 00:00:15,938][INFO ][repositories ] [Svarog] update repository [backups-1]
 [2014-06-19 00:00:16,208][INFO ][repositories ] [Svarog] update repository [backups-1]
 [2014-06-19 00:00:20,669][INFO ][snapshots] [Svarog] restore [backups-1:70e3583358803e70dc60a83953aaca9e_ts20140618235930121779] is done
 [2014-06-19 00:00:21,585][INFO ][repositories ] [Svarog] update repository [backups-1]
 [2014-06-19 00:00:26,992][INFO ][cluster.metadata ] [Svarog] [70e3583358803e70dc60a83953aaca9e_ts20140617220848057264] deleting index
 [2014-06-19 00:00:27,601][INFO ][cluster.metadata ] [Svarog] [70e3583358803e70dc60a83953aaca9e_phonetic_ts20140617220848563815] deleting index

 After restoring the snapshot, outdated versions of the indices are removed 
 (because the indices recovered from the snapshot are newer).

 This goes quite well, and there is no significant load on the machine 
 while doing this.

 But at some point the cluster starts to issue update_mapping commands 
 for no apparent reason (I'm almost sure there's been no interaction from 
 outside)...
 [2014-06-19 04:38:36,293][INFO ][snapshots] [Svarog] restore [backups-1:99cbf66451446e6fe770878e84b4349b_ts20140619043819745139] is done
 [2014-06-19 04:38:37,238][INFO ][repositories ] [Svarog] update repository [backups-1]
 [2014-06-19 04:38:44,016][INFO ][cluster.metadata ] [Svarog] [99cbf66451446e6fe770878e84b4349b_ts20140604042653951289] deleting index
 [2014-06-19 04:38:44,517][INFO ][cluster.metadata ] [Svarog] [99cbf66451446e6fe770878e84b4349b_phonetic_ts20140604042655159506] deleting index
 [2014-06-19 05:57:24,721][INFO ][repositories ] [Svarog] update repository [backups-1]
 [2014-06-19 05:57:34,869][INFO ][repositories ] [Svarog] update repository [backups-1]
 [2014-06-19 05:57:35,234
 ...



Re: Clarification on has_child filter memory requirements

2014-06-20 Thread Drew Kutcharian
Thanks Alex. What do you mean by "not all parent documents (and not the data), 
just their ids"? What decides which parent document ids get loaded? Also, are 
the ids that get loaded kept per query, or do they stay around longer? I ask 
because in our use case we're going to keep adding more and more parents and 
children.

- Drew

On Jun 20, 2014, at 12:04 AM, Alexander Reelsen a...@spinscale.de wrote:

 Hey,
 
 not all parent documents (and not the data), just their ids. Still this can 
 accumulate, which is the reason why you should monitor the size of that data 
 structure (exposed in the nodes stats).
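
A minimal Python sketch of that monitoring, assuming the nodes stats response (GET /_nodes/stats) reports the parent/child id cache under indices.id_cache.memory_size_in_bytes, as I believe it does in the 1.x releases; verify the key names against your version. The function takes an already-parsed response, so fetching it is left to your HTTP client.

```python
def id_cache_bytes(nodes_stats):
    """Map node name -> id cache size in bytes (0 if the key is absent)."""
    sizes = {}
    for node_id, node in nodes_stats.get("nodes", {}).items():
        # Assumed key path: nodes.<id>.indices.id_cache.memory_size_in_bytes
        cache = node.get("indices", {}).get("id_cache", {})
        sizes[node.get("name", node_id)] = cache.get("memory_size_in_bytes", 0)
    return sizes
```

Recording this value over time shows whether the cache keeps growing as parents and children are added.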
 
 Hope that helps.
 
 
 --Alex
 
 
 On Thu, Jun 19, 2014 at 6:03 AM, Drew Kutcharian d...@venarc.com wrote:
 Based on the official docs 
 (http://www.elasticsearch.org/guide/en/elasticsearch/reference/current/query-dsl-has-child-filter.html):
 
 {quote}
 memory considerations
 
 With the current implementation, all _parent field values and all _id field 
 values of parent documents are loaded into memory (heap) via field data in 
 order to support fast lookups, so make sure there is enough memory for it.
 {/quote}
 
 Does this mean that all the parent docs will be loaded into memory or the 
 ones matching the filter? If the former is true, then it would mean that one 
 should keep the size of the parent objects to minimum, right? In addition, 
 say has_child is a part of a conjunction (regular filter AND has_child), 
 would ES still load all the parent docs, or only the ones that matched the 
 first filter?
 
 Thanks,
 
 Drew
 
