date:20140620

Hey.

Judging from the exception, this looks like an unstable network connection.
Are you using persistent HTTP connections? I assume the nodes can ping each
other without problems?
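
One way to check for an unstable connection beyond ping is to open the HTTP port repeatedly and count failures. The following is only a minimal sketch: the helper names `probe` and `failure_rate` are made up for illustration, and the host and port would be whatever the client actually talks to (for example the 192.168.6.21:9200 endpoint seen in the log).

```python
import socket
import time


def probe(host, port, attempts=5, timeout=2.0):
    """Open and close a TCP connection `attempts` times.

    Returns one entry per attempt: the connect time in seconds,
    or None if the connection failed or timed out.
    """
    results = []
    for _ in range(attempts):
        start = time.monotonic()
        try:
            with socket.create_connection((host, port), timeout=timeout):
                results.append(time.monotonic() - start)
        except OSError:  # covers refused connections, resets and timeouts
            results.append(None)
    return results


def failure_rate(results):
    """Fraction of attempts that did not connect."""
    if not results:
        return 0.0
    return sum(r is None for r in results) / len(results)
```

A non-zero failure rate against the data node from the Logstash host would point at the network or at connections being dropped on one side, rather than at indexing itself.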


--Alex


On Thu, Jun 19, 2014 at 12:12 AM, alekjouhar...@gmail.com wrote:

 Hello all,

 So here's the issue, our cluster was previously very underwhelmed as far
 as resource consumption, and after some config changes (see complete config
 below) -- we were able to hike up resource consumption, but are still
 indexing documents at the same sluggish rate of  400 docs/second.

 Redis and Logstash are definitely not the bottlenecks, and the indexing
 seems to be growing exponentially worse as we pull in more data.  We are
 using elasticsearch v 1.1.1.

 The Java HTTP exceptions would definitely explain the sluggishness,
 as there seems to be a socket timeout every second, like clockwork, but
 I'm at a loss as to what could be causing the errors to begin with.

 We are running Redis, Logstash, Kibana and the ES master (no data) on one
 node, and have our Elasticsearch data instance on another node.  Network
 latency is definitely not so atrocious that it would be an outright
 bottleneck, and data gets to the secondary node fast enough, but it is
 backed up in indexing.

 Any help would greatly be appreciated, and I thank you all in advance!

 ### ES CONFIG ###


 index.indexing.slowlog.threshold.index.warn: 10s
 index.indexing.slowlog.threshold.index.info: 5s
 index.indexing.slowlog.threshold.index.debug: 2s
 index.indexing.slowlog.threshold.index.trace: 500ms



 monitor.jvm.gc.young.warn: 1000ms
 monitor.jvm.gc.young.info: 700ms
 #monitor.jvm.gc.young.debug: 400ms

 monitor.jvm.gc.old.warn: 10s
 monitor.jvm.gc.old.info: 5s
 #monitor.jvm.gc.old.debug: 2s
 cluster.name: iislog-cluster
 node.name: VM-ELKIIS
 discovery.zen.ping.multicast.enabled: true
 discovery.zen.ping.unicast.hosts: [192.168.6.145]
 discovery.zen.ping.timeout: 5
 node.master: true
 node.data: false
 index.number_of_shards: 10
 index.number_of_replicas: 0
 bootstrap.mlockall: true
 index.refresh_interval: 30
 indices.memory.index_buffer_size: 50%
 index.translog.flush_threshold_ops: 5
 index.store.type: mmapfs
 index.store.compress.stored: true

 threadpool.search.type: fixed
 threadpool.search.size: 20
 threadpool.search.queue_size: 100

 threadpool.index.type: fixed
 threadpool.index.size: 20
 threadpool.index.queue_size: 100

 ### JAVA ERRORS IN ES LOG ###

 [2014-06-18 09:39:09,565][DEBUG][http.netty   ] [VM-ELKIIS]
 Caught exception while handling client http traffic, closing connection
 [id: 0x7561184c, /192.168.6.3:6206 => /192.168.6.21:9200]
 java.io.IOException: Connection reset by peer
 at sun.nio.ch.FileDispatcherImpl.read0(Native Method)
 at sun.nio.ch.SocketDispatcher.read(SocketDispatcher.java:39)
 at sun.nio.ch.IOUtil.readIntoNativeBuffer(IOUtil.java:223)
 at sun.nio.ch.IOUtil.read(IOUtil.java:192)
 at sun.nio.ch.SocketChannelImpl.read(SocketChannelImpl.java:379)
 at
 org.elasticsearch.common.netty.channel.socket.nio.NioWorker.read(NioWorker.java:64)
 at
 org.elasticsearch.common.netty.channel.socket.nio.AbstractNioWorker.process(AbstractNioWorker.java:108)
 at
 org.elasticsearch.common.netty.channel.socket.nio.AbstractNioSelector.run(AbstractNioSelector.java:318)
 at
 org.elasticsearch.common.netty.channel.socket.nio.AbstractNioWorker.run(AbstractNioWorker.java:89)
 at
 org.elasticsearch.common.netty.channel.socket.nio.NioWorker.run(NioWorker.java:178)
 at
 org.elasticsearch.common.netty.util.ThreadRenamingRunnable.run(ThreadRenamingRunnable.java:108)
 at
 org.elasticsearch.common.netty.util.internal.DeadLockProofWorker$1.run(DeadLockProofWorker.java:42)
 at
 java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
 at
 java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
 at java.lang.Thread.run(Thread.java:744)

 --
 You received this message because you are subscribed to the Google Groups
 elasticsearch group.
 To unsubscribe from this group and stop receiving emails from it, send an
 email to elasticsearch+unsubscr...@googlegroups.com.
 To view this discussion on the web visit
 https://groups.google.com/d/msgid/elasticsearch/95e3bc66-b403-4844-a798-da0f25141ca6%40googlegroups.com
 https://groups.google.com/d/msgid/elasticsearch/95e3bc66-b403-4844-a798-da0f25141ca6%40googlegroups.com?utm_medium=emailutm_source=footer
 .
 For more options, visit https://groups.google.com/d/optout.



Re: Splunk vs. Elastic search performance?

2014-06-20 Thread joergpra...@gmail.com

You are correct that Elasticsearch comes with developer settings -
that is exactly what a packaged ES is meant for.

If you find issues when configuring and setting up ES for critical use, it
would be nice to post them so that others can find help too and maybe
share their solutions, because there are ES installations that run
successfully in critical environments.

Just quoting the hate of dev teams makes it nearly impossible for me to
learn the reason behind it. Learning the facts is more important than
emotions when fixing software issues. The power of open source is that such
issues can be fixed with the help of a public discussion in the community.
With closed software products, you cannot rely on issues being discussed
publicly to find the best way to fix them.

Jörg

On Thu, Jun 19, 2014 at 2:48 PM, Thomas Paulsen monokit2...@googlemail.com
wrote:

We had a 2.2 TB/day installation of Splunk and ran it on VMware with 12
indexers and 2 search heads. Each indexer had 1000 IOPS guaranteed.
The system is slow but OK to use.

We tried Elasticsearch and were able to get the same performance with
the same number of machines. Unfortunately, with Elasticsearch you need
almost double the amount of storage, plus a LOT of patience to make it run. It
took us six months to set it up properly, and even now the system is quite
buggy and unstable, and from time to time we lose data with Elasticsearch.

I don't recommend ELK for a critical production system. For just dev work
it is OK, if you don't mind the hassle of setting up and operating it. The
costs you save by not buying a Splunk license you have to invest in
consultants to get it up and running. Our dev teams hate Elasticsearch and
prefer Splunk.

On Saturday, April 19, 2014 00:07:44 UTC+2, Mark Walkom wrote:

That's a lot of data! I don't know of any installations that big but
someone else might.

What sort of infrastructure are you running splunk on now, what's your
current and expected retention?

Regards,
Mark Walkom

Infrastructure Engineer
Campaign Monitor
email: ma...@campaignmonitor.com
web: www.campaignmonitor.com

On 19 April 2014 07:33, Frank Flynn faultle...@gmail.com wrote:

We have a large Splunk instance. We load about 1.25 TB of logs a day.
We have about 1,300 loaders (servers that collect and load logs; they may
do other things too).

As I look at Elasticsearch / Logstash / Kibana does anyone know of a
performance comparison guide? Should I expect to run on very similar
hardware? More? or Less?

Sure it depends on exactly what we're doing, the exact queries and the
frequency we'd run them but I'm trying to get any kind of idea before we
start.

Are there any white papers or other documents about switching? It seems
an obvious choice, but I can find very few performance comparisons
(I did see that Elasticsearch just hired the former VP of Products at
Splunk, Gaurav Gupta, but there were few numbers in that article either).

Thanks,
Frank


Re: Clarification on has_child filter memory requirements

Hey,

not all parent documents (and not the data), just their ids. Still this can
accumulate, which is the reason why you should monitor the size of that
data structure (exposed in the nodes stats).
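
As a rough illustration of monitoring that structure, the sketch below reads the id cache size per node out of an already parsed nodes stats response. It assumes the response of `GET /_nodes/stats` has been loaded as a dict and that the size is reported under `indices.id_cache.memory_size_in_bytes`, as in the 1.x releases; the function name `id_cache_bytes` is made up for this example.

```python
def id_cache_bytes(nodes_stats):
    """Map node name -> id cache size in bytes.

    `nodes_stats` is the parsed JSON body of a nodes stats call.
    The field path used here is an assumption based on ES 1.x output;
    nodes that do not report it are listed with 0.
    """
    sizes = {}
    for node_id, node in nodes_stats.get("nodes", {}).items():
        id_cache = node.get("indices", {}).get("id_cache", {})
        name = node.get("name", node_id)
        sizes[name] = id_cache.get("memory_size_in_bytes", 0)
    return sizes
```

Polling this value over time shows whether the parent id data keeps growing as more parent documents are indexed.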

Hope that helps.

--Alex

On Thu, Jun 19, 2014 at 6:03 AM, Drew Kutcharian d...@venarc.com wrote:

Based on the official docs (
http://www.elasticsearch.org/guide/en/elasticsearch/reference/current/query-dsl-has-child-filter.html
):

{quote}
memory considerations

With the current implementation, all _parent field values and all _id
field values of parent documents are loaded into memory (heap) via field
data in order to support fast lookups, so make sure there is enough memory
for it.
{/quote}

Does this mean that all the parent docs will be loaded into memory or the
ones matching the filter? If the former is true, then it would mean that
one should keep the size of the parent objects to minimum, right? In
addition, say has_child is a part of a conjunction (regular filter AND
has_child), would ES still load all the parent docs, or only the ones that
matched the first filter?

Thanks,

Drew


Re: problem indexing with my analyzer

Additional information:
My note_source field contains pictures (.jpg, .png, ...) in base64 as well as text.

For my mapping I have used:
type = string
analyzer = reuters (the name of my analyzer)


Any ideas?

On Thursday, June 19, 2014 17:57:46 UTC+2, Tanguy Bernard wrote:

 Hello,
 I have an issue when I index one particular field, note_source (SQL
 longtext).
 I use the same analyzer for every field (except date_source and id_source),
 but for note_source I get monitor.jvm warnings.
 When I remove note_source, everything is fine. If I don't use an analyzer on
 note_source, everything is fine, but if I use my analyzer on note_source,
 I get crashes.

 I think I have enough memory; I have set ES_HEAP_SIZE.
 Maybe my problem is with accents (ASCII, UTF-8).

 Can you help me with this?



 *My Setting*

 public function createSetting($pf){
     // Create index $pf with a custom analyzer named 'reuters'.
     $params = array('index' => $pf, 'body' => array(
         'settings' => array(
             'number_of_shards' => 5,
             'number_of_replicas' => 0,
             'analysis' => array(
                 'filter' => array(
                     // nGram token filter producing grams of 3 to 250 characters
                     'nGram' => array(
                         'token_chars' => array(),
                         'type' => 'nGram',
                         'min_gram' => 3,
                         'max_gram' => 250
                     )
                 ),
                 'analyzer' => array(
                     'reuters' => array(
                         'type' => 'custom',
                         'tokenizer' => 'standard',
                         'filter' => array('lowercase', 'asciifolding', 'nGram')
                     )
                 )
             )
         )
     ));
     $this->elasticsearchClient->indices()->create($params);
     return;
 }


 *My Indexing*

 public function indexTable($pf, $typeElement){
     // Register a JDBC river that pulls rows from MySQL into index $pf.
     $params = array(
         'index' => '_river',
         'type' => $typeElement,
         'id' => '_meta',
         'body' => array(
             'type' => 'jdbc',
             'jdbc' => array(
                 'url' => 'jdbc:mysql://ip/name',
                 'user' => 'root',
                 'password' => 'mdp',
                 'index' => $pf,
                 'type' => $typeElement,
                 'sql' => 'select id_source as _id, id_sous_theme, titre_source, desc_source, note_source, adresse_source, type_source, date_source from source',
                 'max_bulk_requests' => 5,
             )
         )
     );

     $this->elasticsearchClient->index($params);
 }

 Thanks in advance.



Re: Losing data after Elasticsearch restart

Hey,

the exception you showed can happen when you remove an alias.
However, you mentioned a NullPointerException in your first post, which is not
contained in the stack trace, so it seems that one is still missing.

Also, please retry with a newer version of Elasticsearch.


--Alex


On Thu, Jun 19, 2014 at 5:13 AM, Rohit Jaiswal rohit.jais...@gmail.com
wrote:

 Hi Alexander,
We sent you the stack trace. Can you please enlighten us on
 this?

 Thanks,
 Rohit


 On Mon, Jun 16, 2014 at 10:25 AM, Rohit Jaiswal rohit.jais...@gmail.com
 wrote:

 Hi Alexander,
 Thanks for your reply. We plan to upgrade in the
 long run, however we need to fix the data loss problem on 0.90.2 in the
 immediate term.

 Here is the stack trace -


 10:09:37.783 PM

 [22:09:37,783][WARN ][indices.cluster  ] [Storm]
 [b7a76aa06cfd4048987d1117f3e0433a][0] failed to start shard
 org.elasticsearch.indices.recovery.RecoveryFailedException:
 [b7a76aa06cfd4048987d1117f3e0433a][0]: Recovery failed from [Jeffrey
 Mace][_jjr5BYJQjO6QzzheyDmhw][inet[/10.4.35.200:9300]] into
 [Storm][FiW6mbR5ThqqSii5Wc28lQ][inet[/10.4.40.95:9300]]
 at
 org.elasticsearch.indices.recovery.RecoveryTarget.doRecovery(RecoveryTarget.java:293)
 at
 org.elasticsearch.indices.recovery.RecoveryTarget.access$300(RecoveryTarget.java:62)
 at
 org.elasticsearch.indices.recovery.RecoveryTarget$2.run(RecoveryTarget.java:163)
 at java.util.concurrent.ThreadPoolExecutor$Worker.runTask(Unknown
 Source)
 at java.util.concurrent.ThreadPoolExecutor$Worker.run(Unknown Source)
 at java.lang.Thread.run(Unknown Source)
 Caused by: org.elasticsearch.transport.RemoteTransportException: [Jeffrey
 Mace][inet[/10.4.35.200:9300]][index/shard/recovery/startRecovery]
 Caused by: org.elasticsearch.index.engine.RecoveryEngineException:
 [b7a76aa06cfd4048987d1117f3e0433a][0] Phase[2] Execution failed
 at
 org.elasticsearch.index.engine.robin.RobinEngine.recover(RobinEngine.java:1147)
 at
 org.elasticsearch.index.shard.service.InternalIndexShard.recover(InternalIndexShard.java:526)
 at
 org.elasticsearch.indices.recovery.RecoverySource.recover(RecoverySource.java:116)
 at
 org.elasticsearch.indices.recovery.RecoverySource.access$1600(RecoverySource.java:60)
 at
 org.elasticsearch.indices.recovery.RecoverySource$StartRecoveryTransportRequestHandler.messageReceived(RecoverySource.java:328)
 at
 org.elasticsearch.indices.recovery.RecoverySource$StartRecoveryTransportRequestHandler.messageReceived(RecoverySource.java:314)
 at
 org.elasticsearch.transport.netty.MessageChannelHandler$RequestHandler.run(MessageChannelHandler.java:265)
 at java.util.concurrent.ThreadPoolExecutor$Worker.runTask(Unknown
 Source)
 at java.util.concurrent.ThreadPoolExecutor$Worker.run(Unknown Source)
 at java.lang.Thread.run(Unknown Source)
 Caused by: org.elasticsearch.transport.RemoteTransportException:
 [Storm][inet[/10.4.40.95:9300]][index/shard/recovery/translogOps]
 Caused by: org.elasticsearch.indices.InvalidAliasNameException:
 [b7a76aa06cfd4048987d1117f3e0433a] Invalid alias name
 [1a4077872e41c0634cee780c1e5fc263bdd5f14b15ac9239480547ab2d3601eb], Unknown
 alias name was passed to alias Filter
 at
 org.elasticsearch.index.aliases.IndexAliasesService.aliasFilter(IndexAliasesService.java:99)
 at
 org.elasticsearch.index.shard.service.InternalIndexShard.prepareDeleteByQuery(InternalIndexShard.java:382)
 at
 org.elasticsearch.index.shard.service.InternalIndexShard.performRecoveryOperation(InternalIndexShard.java:628)
 at
 org.elasticsearch.indices.recovery.RecoveryTarget$TranslogOperationsRequestHandler.messageReceived(RecoveryTarget.java:447)
 at
 org.elasticsearch.indices.recovery.RecoveryTarget$TranslogOperationsRequestHandler.messageReceived(RecoveryTarget.java:416)
 at
 org.elasticsearch.transport.netty.MessageChannelHandler$RequestHandler.run(MessageChannelHandler.java:265)
 at java.util.concurrent.ThreadPoolExecutor$Worker.runTask(Unknown
 Source)
 at java.util.concurrent.ThreadPoolExecutor$Worker.run(Unknown Source)
 at java.lang.Thread.run(Unknown Source)
 [22:09:37,799][WARN ][cluster.action.shard ] [Storm] sending failed
 shard for [b7a76aa06cfd4048987d1117f3e0433a][0],
 node[FiW6mbR5ThqqSii5Wc28lQ], [R], s[INITIALIZING], reason [Failed to start
 shard, message
 [RecoveryFailedException[[b7a76aa06cfd4048987d1117f3e0433a][0]: Recovery
 failed from [Jeffrey Mace][_jjr5BYJQjO6QzzheyDmhw][inet[/10.4.35.200:9300]]
 into [Storm][FiW6mbR5ThqqSii5Wc28lQ][inet[/10.4.40.95:9300]]]; nested:
 RemoteTransportException[[Jeffrey 
 Mace][inet[/10.4.35.200:9300]][index/shard/recovery/startRecovery]];
 nested: RecoveryEngineException[[b7a76aa06cfd4048987d1117f3e0433a][0]
 Phase[2] Execution failed]; nested:
 RemoteTransportException[[Storm][inet[/10.4.40.95:9300]][index/shard/recovery/translogOps]];
 nested: InvalidAliasNameException[[b7a76aa06cfd4048987d1117f3e0433a]

Re: Count request does not support [filter]. Why?

Hey,

I'm not a hundred percent sure what you mean here. The post_filter setting?
There are two possibilities: either use search_type=count, or use a
filtered query in the count API. See
http://www.elasticsearch.org/guide/en/elasticsearch/reference/current/search-count.html
http://www.elasticsearch.org/guide/en/elasticsearch/reference/current/search-request-search-type.html#count

Also, be aware that the execution models are a bit different (which may
result in different performance), see
http://www.elasticsearch.org/guide/en/elasticsearch/reference/current/search-request-post-filter.html#search-request-post-filter
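
The second option, a filtered query in the count API, amounts to folding the search request's query and filter into one `filtered` query. The sketch below shows that transformation on plain dicts; it assumes the 1.x `filtered` query syntax and a top-level `query` key in the count body, and the helper name `to_count_body` is invented for illustration.

```python
def to_count_body(search_body):
    """Build a count request body from a search request body.

    Paging and field options (size, from, fields) are dropped; the
    top-level filter (or post_filter) is folded into a filtered query.
    """
    query = search_body.get("query", {"match_all": {}})
    flt = search_body.get("filter") or search_body.get("post_filter")
    if flt is None:
        return {"query": query}
    return {"query": {"filtered": {"query": query, "filter": flt}}}
```

The same filtered query can then be reused for the search that follows the count, which keeps both requests consistent.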

Hope this helps; if not, please refine your question.

--Alex

On Thu, Jun 19, 2014 at 3:23 PM, Andrew Gaydenko andrew.gayde...@gmail.com
wrote:

Count request does not support [filter]. Why? How to count with the same
filter (except for size, fields, from) and query I'm probably going
to search hits after counting?


Re: Very frequent ES OOM's potential segment merge problems

Hey,

can you provide more information about the OOM exception? Also, you should
use the nodes stats API to monitor your system, so you can more easily
spot where this memory consumption stems from. Also, are you just indexing
or doing searches/queries/gets as well?
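
As a starting point for that kind of monitoring, the heap fill level per node can be derived from a parsed nodes stats response. This is only a sketch: it assumes the response has been loaded as a dict and that `jvm.mem.heap_used_in_bytes` and `jvm.mem.heap_max_in_bytes` are present, as they are in 1.x output; `heap_usage` is a name chosen for this example.

```python
def heap_usage(nodes_stats):
    """Map node name -> fraction of the JVM heap currently in use.

    Nodes without usable heap figures are skipped.
    """
    usage = {}
    for node_id, node in nodes_stats.get("nodes", {}).items():
        mem = node.get("jvm", {}).get("mem", {})
        used = mem.get("heap_used_in_bytes")
        limit = mem.get("heap_max_in_bytes")
        if used is not None and limit:
            usage[node.get("name", node_id)] = used / limit
    return usage
```

Sampling this every few seconds before an OOM shows whether the heap fills gradually or in sudden jumps, which helps narrow down the cause.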

--Alex

On Thu, Jun 19, 2014 at 10:35 PM, Paul Sabou paul.sa...@gmail.com wrote:

Hi,

*Situation:*
We are using ES 1.2.1 on a machine with 32GB RAM, fast SSD and 12 cores. The
machine runs Ubuntu 14.0.x LTS.
The ES process has 12GB of RAM allocated.

We have an index in which we inserted 105 million small documents so the
ES data folder is around 50GB in size
(we see this by using du -h . on the folder)

The new document insertion rate is rather small (ie. 100-300 small docs
per second).

*The problem:*

We experienced rather frequent ES OOM (Out of Memory) at a rate of around
one every 15 mins. To lower the load on the index
we deleted 104+ million docs (ie. mostly small log entries) by deleting
everything in one type :
curl -XDELETE http://localhost:9200/index_xx/type_yy

so that we ended up with an ES index with several thousands docs.
After this we started to experience massive disk IO (10-20Mbs reads and
1MBs writes) and more frequent OOMs (at a rate of around
one every 7 minutes). We restarted ES after every OOM and kept monitoring
the data folder size. Over the next hour the size went down
to around 36GB, but now it's stuck there (it doesn't go down in size even
after several hours).

*Questions* :
Is this a problem related to segment merging running out of memory? If so,
how can it be solved?
If not, what could be the problem?

Thanks
Paul.


Re: ElasticSearch Node.Client Options

Hey,

a client node with a full 10 GB heap where garbage collection does not free
anything means those objects are still in use (which clearly explains THAT
the OOM happens, but not WHY). Do you have huge searches going on, spanning
a lot of shards with deep pagination, all the time? Do you have some sort
of backup mechanism which might be responsible for this? Anything from a
search perspective which might lead to excessive memory usage?
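
To see why deep pagination across many shards is a suspect: in a regular query-then-fetch search every shard returns its top `from + size` entries, and the node coordinating the request has to hold and sort all of them. A back-of-the-envelope sketch (the function name and the example numbers are illustrative, not taken from this cluster):

```python
def coordinating_node_entries(num_shards, from_, size):
    """Entries the coordinating node must collect for one search request.

    Each shard contributes its own top (from_ + size) entries, which
    are then merged to find the requested page.
    """
    return num_shards * (from_ + size)


# Page of 10 hits starting at offset 100,000 across 50 shards.
entries = coordinating_node_entries(num_shards=50, from_=100_000, size=10)
```

Here `entries` is 5,000,500 for a page of only 10 hits, and several such requests in flight at once add up quickly on a client node.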


--Alex


On Fri, Jun 20, 2014 at 12:15 AM, VB vishal.batgh...@gmail.com wrote:

 And this stack trace.

 [2014-06-04 14:47:12,939][INFO ][cluster.service  ] [BUS2F2801F3]
 master {new 
 [ELS-10.76.121.131][dg_r12_nQbqIT_oJfjTwTg][inet[/10.76.121.131:9300]]{data=false,
 max_local_storage_nodes=1, master=true}, previous [ELS-10.76.121.130][
 BlGygpFmRn6uQNbgiEfl0A][inet[/10.76.121.130:9300]]{data=false,
 max_local_storage_nodes=1, master=true}}, removed {[ELS-10.76.121.130][
 BlGygpFmRn6uQNbgiEfl0A][inet[/10.76.121.130:9300]]{data=false,
 max_local_storage_nodes=1, master=true},}, reason: zen-disco-master_failed
 ([ELS-10.76.121.130][BlGygpFmRn6uQNbgiEfl0A][inet[/10.76.121.130:9300
 ]]{data=false, max_local_storage_nodes=1, master=true})
 [2014-06-04 14:48:03,969][WARN ][monitor.jvm  ] [BUS2F2801F3]
 [gc][old][55503][489] duration [49.6s], collections [1]/[49.9s], total
 [49.6s]/[4.5h], memory [9.9gb]->[9.9gb]/[9.9gb], all_pools {[young]
 [532.5mb]->[532.5mb]/[532.5mb]}{[survivor] [51.3mb]->[42.8mb]/[66.5mb]}{[old]
 [9.3gb]->[9.3gb]/[9.3gb]}
 [2014-06-04 14:48:40,256][WARN ][monitor.jvm  ] [BUS2F2801F3]
 [gc][old][55504][490] duration [35.7s], collections [1]/[36.2s], total
 [35.7s]/[4.5h], memory [9.9gb]->[9.9gb]/[9.9gb], all_pools {[young]
 [532.5mb]->[532.5mb]/[532.5mb]}{[survivor] [42.8mb]->[58.6mb]/[66.5mb]}{[old]
 [9.3gb]->[9.3gb]/[9.3gb]}
 [2014-06-04 14:49:30,335][WARN ][monitor.jvm  ] [BUS2F2801F3]
 [gc][old][55505][491] duration [49.9s], collections [1]/[50s], total
 [49.9s]/[4.5h], memory [9.9gb]->[9.9gb]/[9.9gb], all_pools {[young]
 [532.5mb]->[532.5mb]/[532.5mb]}{[survivor] [58.6mb]->[63.7mb]/[66.5mb]}{[old]
 [9.3gb]->[9.3gb]/[9.3gb]}
 [2014-06-04 14:49:30,350][INFO ][discovery.zen] [BUS2F2801F3]
 master_left 
 [[ELS-10.76.121.131][dg_r12_nQbqIT_oJfjTwTg][inet[/10.76.121.131:9300]]{data=false,
 max_local_storage_nodes=1, master=true}], reason [failed to ping, tried [3]
 times, each with  maximum [30s] timeout]
 [2014-06-04 14:49:30,865][WARN ][discovery.zen] [BUS2F2801F3]
 not enough master nodes after master left (reason = failed to ping, tried
 [3] times, each with  maximum [30s] timeout), current nodes:
 {[ELS-10.76.125.37][j3VQFYDaQLujkprUnke02w][inet[/10.76.125.37:9300
 ]]{max_local_storage_nodes=1, master=false},[ELS-10.76.122.
 38][5V8bqkEzTP2TzMukB5_j-Q][inet[/10.76.122.38:9300]]{max_local_storage_nodes=1,
 master=false},[ELS-10.76.125.48][TGlF1uv8Q5GpgBVvIcvRAQ][
 inet[/10.76.125.48:9300]]{max_local_storage_nodes=1,
 master=false},[EDSFB1ABF7][MqLDnM5mSLqIicIuyJk7IQ][inet[/10.76.122.19:9300
 ]]{client=true, data=false, master=false},[ELS-10.76.120.
 62][evcNI2CqSs-Zz44Jdzn0aw][inet[/10.76.120.62:9300]]{client=true,
 data=false, max_local_storage_nodes=1, master=false},[BUS9364B62][
 YZPjEsvhT6OjM9ti5Lxwkg][inet[/10.76.123.123:9300]]{client=true,
 data=false, master=false},[ELS-10.76.125.38][RyeswSy8SquV5H8Vfsw75Q][
 inet[/10.76.125.38:9300]]{max_local_storage_nodes=1,
 master=false},[EDSFB1200C][XUNaWVlYQUOVZlJMv3nHMA][inet[/10.76.122.18:9300
 ]]{client=true, data=false, master=false},[ELS-10.76.124.
 214][H8N9nIU0TKyGv_prKyRVCQ][inet[/10.76.124.214:9300]]{max_local_storage_nodes=1,
 master=false},[EDS1A1F2240][ET2u1qImQCCvqc-1gRvQbQ][inet[/
 10.76.120.87:9300]]{client=true, data=false, master=false},[ELS-10.76.125.
 40][hp4wvQxER-mMPygey2Iqgg][inet[/10.76.125.40:9300]]{max_local_storage_nodes=1,
 master=false},[ELS-10.76.122.67][BiXop5iCRgGQyGvxazMkQg][
 inet[/10.76.122.67:9300]]{max_local_storage_nodes=1,
 master=false},[ELS-10.76.121.129][pf9xpva7Q4izIy6Nj4S4iQ][
 inet[/10.76.121.129:9300]]{data=false, max_local_storage_nodes=1,
 master=true},[EDSFB21E69][RabnwdLbT1WCp9gIE-_AXw][inet[/10.76.122.20:9300
 ]]{client=true, data=false, master=false},[EDI1AE4FD76][
 UF1RMWe6RYaZGp6BU3x-VA][inet[/10.76.124.228:9300]]{client=true,
 data=false, master=false},[ELS-10.76.125.46][nXceQp40TjOSctChaGVtKw][
 inet[/10.76.125.46:9300]]{max_local_storage_nodes=1,
 master=false},[EDI1A1EA928][rWlelgQuT7KHSfyIejmLPg][inet[/
 10.76.120.82:9300]]{client=true, data=false, master=false},[ELS-10.76.121.
 188][oWldDeY4TJioki90moNySw][inet[/10.76.121.188:9300]]{max_local_storage_nodes=1,
 master=false},[ELS-10.76.122.34][kPSYm9G8R8i_z2skK_jq1g][
 inet[/10.76.122.34:9300]]{max_local_storage_nodes=1,
 master=false},[ELS-10.76.125.43][JMgOIZFBSzaQZ9bVagG57w][
 inet[/10.76.125.43:9300]]{max_local_storage_nodes=1,
 master=false},[EDI1AE3EE57][7JHGaYjzS3uI7PLN8Ynm-Q][inet[/
 10.76.124.227:9300]]{client=true, data=false,

Re: puppet-elasticsearch options

2014-06-20 Thread Richard Pijnenburg

Hi Andrej,

Thank you for using the puppet module :-)

The 'port' and 'discovery minimum' settings are both configuration settings
for the elasticsearch.yml file.
You can set those in the 'config' option variable, for example:

elasticsearch::instance { 'instancename':
  config => { 'http.port' => '9210', 'discovery.zen.minimum_master_nodes' => 3 }
}

For the logging part, management of the logging.yml file is very limited at
the moment, but I hope to get some feedback on extending that.
The thresholds for the slowlogs can be set in the same config option
variable.
See
http://www.elasticsearch.org/guide/en/elasticsearch/reference/current/index-modules-slowlog.html#index-slow-log

for more information.

If you have any further questions, let me know.

Cheers

On Thursday, June 19, 2014 9:53:10 AM UTC+1, Andrej Rosenheinrich wrote:

Hi,

I am playing around with puppet-elasticsearch 0.4.0. It works well so far
(thanks!), but I am missing a few options I haven't seen in the
documentation. As I couldn't figure it out immediately by reading the
scripts, maybe someone can help me quickly with this:

- there is an option to change the port (9200), but this is only the http
port. Is there an option to change the tcp transport port as well?
- how can I configure logging? I am thinking of logfile names and log level,
maybe even thresholds for the slowlog. Maybe this is interesting enough to
add to the documentation?
- is there an option in the module to easily configure memory usage?
- how can I configure the discovery minimum?

I am aware that I could go ahead and manipulate the elasticsearch.yml file
with puppet, I am just curious if there are options for my questions
already implemented in the module I have missed. So if someone could give
me a hint or an example it would be really helpful!

Thanks in advance!
Andrej


efficient way to store the result of a large slow query

2014-06-20 Thread Chen Wang

Hi guys,
Just wondering what is the most efficient way of executing a query that
takes time (parent/child documents) and returns a large number of entries, and
storing the result in random, evenly sized blocks on HDFS? E.g., the query
will return 100 million records and I want every random 1 million stored in a
different location (file/folder) on HDFS.

I assume I could execute the query with scroll, and then whenever I have
received 1 million records back, spawn another thread to commit
them to HDFS? Is there a way to run the query in a distributed way and have 100
threads query ES at the same time, each getting a random 1 million
back (without duplicates)? Will ES-Hadoop help in this case?
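
The scroll-and-commit idea can be sketched independently of any client library: treat the scroll as a stream of hits and cut it into fixed-size blocks, handing each block to whatever writes to HDFS. In the sketch below `blocks` is an illustrative helper and `hits` stands for any iterable of results; the scroll loop and the HDFS writer are deliberately left out because they depend on the client in use.

```python
from itertools import islice


def blocks(hits, block_size):
    """Yield consecutive, non-overlapping lists of up to block_size hits.

    `hits` can be any iterable, for example a generator that pages
    through a scroll; only one block is held in memory at a time.
    """
    iterator = iter(hits)
    while True:
        block = list(islice(iterator, block_size))
        if not block:
            return
        yield block
```

Blocks produced this way are disjoint, so no record is stored twice, but they follow scroll order and are not a random sample; if randomness matters, it would have to come from the query or from a shuffle step afterwards.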

Appreciate your input!
Chen


Re: Type Ahead feature for contact list

2014-06-20 Thread Omi60

Thanks for the help.

I am able to see the correct results now, but could you please suggest how
to write the following query in Java?

curl -X POST localhost:9200/hotels/_suggest -d '
{
  "hotels" : {
    "text" : "m",
    "completion" : {
      "field" : "name_suggest"
    }
  }
}'




Re: Storing auto generated _id under different name

2014-06-20 Thread Johny Lam

I'm using elasticsearch as the database for a service. It would make things
easier. For example, I could just return the _source field when other apps
query my service. Related to that is that on the javascript client side, I
am inserting the _id field into the _source JSON object as id and using
that as the model for two way data-binding. If the id field was in the
source already, I wouldn't have to keep track of this.
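
For reference, the client-side workaround described above can be sketched as follows. This is only an illustration in Python rather than JavaScript; the function name `to_model` is hypothetical, and the hit is assumed to have the usual `_id` and `_source` keys of a search response.

```python
from typing import Any, Dict


def to_model(hit: Dict[str, Any]) -> Dict[str, Any]:
    """Copy a hit's _source and expose its _id under the key "id"."""
    model = dict(hit["_source"])  # shallow copy, the hit itself is not modified
    model["id"] = hit["_id"]
    return model


hit = {"_id": "AUa1", "_source": {"name": "example"}}
print(to_model(hit))
```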

On Tuesday, June 17, 2014 4:26:07 PM UTC-7, Adrien Grand wrote:

No, it isn't possible.

Why would you like to have the id of the document included in _source?

On Tue, Jun 17, 2014 at 8:16 PM, Johny Lam john...@gmail.com wrote:

Is it possible to have the _id be auto-generated and store it so that
it's in the _source field under a different name, like say id instead of
_id?


--
Adrien Grand


Combine elasticsearch/logstash/kibana with hadoop

2014-06-20 Thread kay rus

To improve performance I'm trying to combine
elasticsearch/logstash/kibana with hadoop (cdh4). Unfortunately I'm
familiar only with HDFS, where I store logs. In my opinion the combination
of elasticsearch and hadoop should use HDFS as storage and transparent
hadoop map/reduce functionality for search.

I read through the elasticsearch-hadoop documentation and unfortunately I
didn't understand how this combination could help me with log analysis in
kibana. The documentation says "Elasticsearch real-time search and analytics
natively integrated with Hadoop". But what should I configure? Hadoop with
Elasticsearch or Elasticsearch with Hadoop? For the first, I found only
Java code fragments and nothing about Hadoop configuration, so it seems that
I would need to be familiar with Java programming. For the second, I found
only the Hadoop HDFS Snapshot/Restore plugin, but I guess it was developed
for index backup/restore, am I right?

Anyway, are my expectations right? Or was elasticsearch-hadoop developed
for developers only, and is it not suitable for
elasticsearch/logstash/kibana + hadoop?


Re: Snapshot Restore in a cluster of two nodes

Hey,

can you be more precise and create a fully fledged example (generating the
repository, executing the snapshot on cluster one, executing restore on
cluster 2, etc) and include the concrete error message in order to find out
what 'the process breaks' means here? Also provide info about elasticsearch
and jvm versions. Thanks!

Snapshots are always done per index (the primary shards) and not per node,
so there must be something else going on.
Is it possible that only one node has write access to the repository?

--Alex

On Thu, Jun 19, 2014 at 3:36 PM, Daniel Bubenheim
daniel.bubenh...@googlemail.com wrote:

Hello,

we have a cluster of two nodes. Every index in this cluster consists of 2
shards and one replica. We want to use snapshot/restore to transfer data
between two clusters. When we take our snapshots on node one, only the
primary shard is included; the replica shard is missing. While restoring on
the other cluster, the process breaks because of the missing second shard.
Do we have to take a snapshot on each node to include both primary shards
so that we can restore the whole index, or am I missing something here?

Thanks in advance
Daniel


Re: How does shingle filter work on match_phrase in query phase?

Hello,

Let's say you have indexed the text "t1 t2 t3" with shingles. The token
positions are also indexed, so you get: "t1" (at pos 1), "t1 t2" (pos
1), "t2" (pos 2), "t2 t3" (pos 2) and "t3" (pos 3).

So if you search with a match_phrase for "t1 t2 t3" (even if the query is
not tokenized as shingles), it will match the document, because "t1",
"t2" and "t3" are considered next to each other (based on their recorded
positions) for this document.
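
The position reasoning above can be sketched in a few lines of Python. This is a simplified model, not how Lucene is implemented: it assumes 2-word shingles that share the position of their first word, and a query that is analyzed into plain single-word terms. The function names are made up for illustration.

```python
from typing import Dict, List, Set


def shingle_tokens(words: List[str]) -> Dict[str, Set[int]]:
    """Map each unigram and 2-word shingle to the positions it is indexed at."""
    positions: Dict[str, Set[int]] = {}
    for i, word in enumerate(words, start=1):
        positions.setdefault(word, set()).add(i)
        if i < len(words):
            # The shingle is recorded at the position of its first word.
            positions.setdefault(f"{word} {words[i]}", set()).add(i)
    return positions


def phrase_matches(index: Dict[str, Set[int]], phrase: List[str]) -> bool:
    """True if the phrase terms occur at consecutive positions in the index."""
    if not phrase:
        return False
    first, rest = phrase[0], phrase[1:]
    return any(
        all(start + k in index.get(term, set()) for k, term in enumerate(rest, 1))
        for start in index.get(first, set())
    )


index = shingle_tokens(["t1", "t2", "t3"])
print(sorted(index.items()))
print(phrase_matches(index, ["t1", "t2", "t3"]))
```

The extra shingle tokens sit at the same positions as the single words, so they do not disturb the consecutive positions that the phrase check relies on.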

Cédric Hourcade
c...@wal.fr

On Fri, Jun 20, 2014 at 7:04 AM, 陳智清 walker0...@gmail.com wrote:
How does shingle filter work on match_phrase in query phase?

After analyzing phrase t1 t2 t3, shingle filter produced five tokens,
t1
t2
t3
t1 t2
t2 t3

Will match_phrase still match "t1 t2 t3"? How does it work? Thank you.


Re: Very frequent ES OOM's potential segment merge problems

2014-06-20 Thread Paul Sabou

java.lang.IllegalStateException: this writer hit an OutOfMemoryError; cannot complete merge
    at org.apache.lucene.index.IndexWriter.commitMerge(IndexWriter.java:3546)
    at org.apache.lucene.index.IndexWriter.mergeMiddle(IndexWriter.java:4272)
    at org.apache.lucene.index.IndexWriter.merge(IndexWriter.java:3728)
    at org.apache.lucene.index.ConcurrentMergeScheduler.doMerge(ConcurrentMergeScheduler.java:405)
    at org.apache.lucene.index.TrackingConcurrentMergeScheduler.doMerge(TrackingConcurrentMergeScheduler.java:106)
    at org.apache.lucene.index.ConcurrentMergeScheduler$MergeThread.run(ConcurrentMergeScheduler.java:482)

On Thursday, June 19, 2014 10:35:28 PM UTC+2, Paul Sabou wrote:

Hi,

*Situation:*
We are using ES 1.2.1 on a machine with 32GB RAM, fast SSD and 12 cores. The
machine runs Ubuntu 14.0.x LTS.
The ES process has 12GB of RAM allocated.

We have an index in which we inserted 105 million small documents so the
ES data folder is around 50GB in size
(we see this by using du -h . on the folder)

The new document insertion rate is rather small (ie. 100-300 small docs
per second).

*The problem:*

*Questions* :
Is this a problem related to segment merging running out of memory? If so,
how can it be solved?
If not, what could be the problem?

Thanks
Paul.


Re: 100% CPU on 1 Node with JMeter Tests

Hello,

It wouldn't surprise me if both Black Mamba and Slapstick were hitting
100%: they have more shards and have to handle more requests than the
other nodes. But in your case it's only one node.

First, are your HTTP requests evenly spread over the 4 nodes? You could also
check that all your shards are about the same size.

To check whether it's a hardware problem I would:
- disable shard rebalancing
- stop the cluster
- swap the whole data directories of Black Mamba and Slapstick
- start the cluster and rerun the benchmark

You'll then see whether the problem comes from the 3 shards or from the
server itself.
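
For the first step, one possible approach (an assumption, not something confirmed in this thread) is to switch off shard allocation through the cluster update settings API, i.e. a `PUT /_cluster/settings` request with the `cluster.routing.allocation.enable` setting available since Elasticsearch 1.0. Note that this stops allocation in general, which is broader than rebalancing alone. The sketch below only builds the request body; the helper name is hypothetical and nothing is sent to a cluster.

```python
import json


def allocation_settings_body(enabled: bool) -> str:
    """Build a transient cluster-settings body that turns shard allocation on or off."""
    value = "all" if enabled else "none"
    return json.dumps({"transient": {"cluster.routing.allocation.enable": value}})


# Body to send before stopping the cluster, and the one to send afterwards.
print(allocation_settings_body(False))
print(allocation_settings_body(True))
```

A transient setting is used here so that it does not survive a full cluster restart; it would need to be applied again once the nodes are back, before re-enabling allocation for the benchmark.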

Cédric Hourcade
c...@wal.fr

On Thu, Jun 19, 2014 at 7:40 PM, sai...@roblox.com wrote:

Bump

On Wednesday, June 18, 2014 6:20:58 PM UTC-7, sai...@roblox.com wrote:

One out of 4 nodes always spikes to 100% CPU when we do some load tests
using JMeter (50 Threads, 50 Loops) with any query (Match_All, Filtered
Query etc.,). That particular node has 3 Shards with 2 Primary Shards. The
other nodes have less than 40% CPU on them at the same time. The heap is
set at 30GB on all of them. This is the GIST for Hot Threads
https://gist.github.com/RobloxSai/9f040bbd5ab7b58f2b1d when the Test
was running. Is there anything else that can be done to improve the
performance? The Query Response times jump to 5-8 seconds when the CPU is
hammered.

https://lh3.googleusercontent.com/-EDnXAEg34cA/U6I5fb2zNOI/AB4/DqybJhq3Yhc/s1600/4+Nodes+Setup.png

I had previously posted the specs of the servers on another thread
https://groups.google.com/forum/#!topic/elasticsearch/P1o_4bVvECA.
Here are the server specs:
*Machine Specs:*
Processor: Intel(R) Xeon(R) CPU E5-2630 0 @ 2.30GHz
Number of CPU cores: 24
Number of Physical CPUs: 2
Installed RAM: [~256 GB Total] 128 GB 128 GB 16 MB
Drive: Two 278GB SAS Drives configured in RAID 0
*OS:*
Arch: 64bit (x86_64)
OS Type: Linux
Kernel: 2.6.32-431.5.1.el6.x86_64
OS Version: Red Hat Enterprise Linux Server release 6.5 (Santiago)
Java Version: Java 1.7.0_51 (Java 7u51 x64 version for Linux).


Re: Count request does not support [filter]. Why?

Sorry, I wasn't clear enough. I mean Java client's CountRequest.source()'s 
argument content, { filter: ... } in particular.


Re: problem indexing with my analyzer

Does it mean you're applying the reuters analyzer to your base64-encoded
pictures?

I guess it generates a really huge number of tokens for each entry
because of your nGram filter (with a max at 250).
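
To give a rough idea of the numbers, here is a small sketch that counts how many n-grams a single token produces for a given min_gram/max_gram. It assumes the filter emits every substring of each length between the two bounds; the function name is made up, and 250 characters is just an example length for one long base64 token.

```python
def ngram_count(token_length: int, min_gram: int, max_gram: int) -> int:
    """Number of n-grams emitted for one token of the given length."""
    longest = min(max_gram, token_length)
    # A token of length L contains L - n + 1 substrings of length n.
    return sum(token_length - n + 1 for n in range(min_gram, longest + 1))


for max_gram in (250, 20, 10):
    print(max_gram, ngram_count(250, 3, max_gram))
```

With min_gram 3 and max_gram 250, one 250-character token yields 30876 n-grams, against 1956 with max_gram 10, and a base64-encoded picture contains a great many such tokens.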

Cédric Hourcade
c...@wal.fr


On Fri, Jun 20, 2014 at 9:09 AM, Tanguy Bernard
bernardtanguy1...@gmail.com wrote:
 Information
 My note_source contains pictures (.jpg, .png ...) in base64 and text.

 For my mapping I have used:
 type = string
 analyzer = reuters (the name of my analyzer)


 Any idea?

 On Thursday, June 19, 2014 at 17:57:46 UTC+2, Tanguy Bernard wrote:

 Hello
 I have an issue when I index one particular field, note_source (SQL
 longtext).
 I use the same analyzer for every field (except date_source and id_source),
 but for note_source I get monitor.jvm warnings.
 When I remove note_source, everything is fine. If I don't use an analyzer on
 note_source, everything is fine, but if I use my analyzer on note_source I
 get crashes.

 I think I have enough memory; I have set ES_HEAP_SIZE.
 Maybe my problem is with accents (ASCII, UTF-8).

 Can you help me with this?



 My Setting

 public function createSetting($pf){
     $params = array('index' => $pf, 'body' => array(
         'settings' => array(
             'number_of_shards' => 5,
             'number_of_replicas' => 0,
             'analysis' => array(
                 'filter' => array(
                     'nGram' => array(
                         'token_chars' => array(),
                         'type' => 'nGram',
                         'min_gram' => 3,
                         'max_gram' => 250
                     )
                 ),
                 'analyzer' => array(
                     'reuters' => array(
                         'type' => 'custom',
                         'tokenizer' => 'standard',
                         'filter' => array('lowercase', 'asciifolding', 'nGram')
                     )
                 )
             )
         )
     ));
     $this->elasticsearchClient->indices()->create($params);
     return;
 }


 My Indexing

 public function indexTable($pf, $typeElement){
     $params = array(
         'index' => '_river',
         'type' => $typeElement,
         'id' => '_meta',
         'body' => array(
             'type' => 'jdbc',
             'jdbc' => array(
                 'url' => 'jdbc:mysql://ip/name',
                 'user' => 'root',
                 'password' => 'mdp',
                 'index' => $pf,
                 'type' => $typeElement,
                 'sql' => 'select id_source as _id, id_sous_theme, titre_source, desc_source, note_source, adresse_source, type_source, date_source from source',
                 'max_bulk_requests' => 5,
             )
         )
     );

     $this->elasticsearchClient->index($params);
 }

 Thanks in advance.


ElasticSearch queries always return all the datas stored in the index

hello,


https://stackoverflow.com/questions/24323480/elasticsearch-queries-always-return-all-the-datas-stored-in-the-index#
 
  
I'm trying to index and query an index stored in ES 1.2. I both create and
populate the index with the Java API, using the TransportClient. I have
the following mapping:

GET /tp/carte/_mapping
{
   "tp": {
      "mappings": {
         "carte": {
            "properties": {
               "adherents": {
                  "properties": {
                     "birthday": {
                        "type": "date",
                        "format": "dateOptionalTime"
                     },
                     "firstname": {
                        "type": "string"
                     },
                     "lastname": {
                        "type": "string"
                     }
                  }
               },
               "dateEdition": {
                  "type": "date",
                  "format": "dateOptionalTime"
               }
            }
         }
      }
   }
}


When I search for an object by ID, it works fine, but when I try to query
the content of one of my nested objects, *ES always returns all the objects
stored in the index*. I also tried creating the objects manually with
Sense and I see the same behaviour.

Example of my insert

PUT /tp/carte/20454795
{
   "dateEdition": "2014-06-01T22:00:00.000Z",
   "adherents": [
      {
         "birthday": "1958-05-05T23:00:00.000Z",
         "firstname": "ANDREW",
         "lastname": "DOE"
      },
      {
         "birthday": "1964-03-01T23:00:00.000Z",
         "firstname": "ROBERT",
         "lastname": "DOE"
      },
      {
         "birthday": "1989-02-27T23:00:00.000Z",
         "firstname": "DAVID",
         "lastname": "DOE"
      },
      {
         "birthday": "1990-12-11T23:00:00.000Z",
         "firstname": "JOHN",
         "lastname": "DOE"
      }
   ]
}

Finally, here is a query executed in Sense:


GET /tp/carte/_search
{
  "query": {
    "bool": {
      "must": [
        {
          "match": {
            "adherents.lastname": {
              "query": "DOE"
            }
          }
        }
      ]
    }
  }
}


How can I fix that ?

Thanks

Regards


Alexandre



How to set the query resultset size to infinite

2014-06-20 Thread Nuno Carvalho

Hi all,

I just joined the mailing list, so sorry if this topic was discussed before.

I would like to set the query size to infinite (or no limit).

http://www.elasticsearch.org/guide/en/elasticsearch/reference/current/search-request-from-size.html
This page explains what the parameters do, but there are no details on how 
to set the size to no limit or (if not possible) what is the max value 
accepted by ES for this parameter. I tried setting the value to -1, as I've 
read somewhere that this would be recognized as no limit, but instead it 
defaults to 10.

Any help?

Thanks,
Nuno

 


Re: ElasticSearch queries always return all the datas stored in the index

Hey Alexandre,


This is correct. You are searching for a carte which contains an adherent.
Elasticsearch gives you a carte object as an answer. And elasticsearch gives 
you back exactly what you have indexed.

That being said, I think you could look at parent/child feature for that use 
case.
Or you can have one carte object per adherent?

Makes sense?
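
To illustrate the point about getting back whole carte objects, here is a tiny in-memory model in Python. It is not Elasticsearch code: the function name and the second carte are invented for the example, and the match is reduced to a case-insensitive comparison of lastname.

```python
from typing import Any, Dict, List

Carte = Dict[str, Any]


def search_cartes(cartes: List[Carte], lastname: str) -> List[Carte]:
    """Return every carte that has at least one adherent with the given lastname."""
    wanted = lastname.lower()
    return [
        carte
        for carte in cartes
        if any(a["lastname"].lower() == wanted for a in carte["adherents"])
    ]


cartes = [
    {"id": "20454795", "adherents": [
        {"firstname": "ANDREW", "lastname": "DOE"},
        {"firstname": "ROBERT", "lastname": "DOE"},
    ]},
    {"id": "hypothetical-2", "adherents": [
        {"firstname": "JANE", "lastname": "ROE"},
    ]},
]

for carte in search_cartes(cartes, "doe"):
    # The whole carte comes back, including every adherent it contains.
    print(carte["id"], len(carte["adherents"]))
```

The unit that is returned is the carte, never a single adherent; a carte with no matching adherent is not expected in the result.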

-- 
David Pilato | Technical Advocate | Elasticsearch.com
@dadoonet | @elasticsearchfr



Re: ElasticSearch queries always return all the datas stored in the index

Hello,
thanks for your response

When I add another carte

PUT /tp/carte/20450813
{
   "dateEdition": "2014-06-01T22:00:00.000Z",
   "adherents": [
      {
         "birthday": "1963-03-22T23:00:00.000Z",
         "firstname": "FLORENCE",
         "lastname": "SMITH"
      },
      {
         "birthday": "2001-10-12T22:00:00.000Z",
         "firstname": "M ANGELO",
         "lastname": "SMITH"
      },
      {
         "birthday": "2003-07-30T22:00:00.000Z",
         "firstname": "M LILI",
         "lastname": "SMITH"
      }
   ]
}

and run the query described above, I get both cartes back.

Is that normal?
Do you have an example or a link that illustrates the parent/child feature?


Thanks




Re: ElasticSearch queries always return all the datas stored in the index

Does searching for DOE give you that answer?
If so, that's not normal IMHO. Try to reproduce it with a complete Sense
script so we can replay it and help you from there.

See http://www.elasticsearch.org/help/ for information.

About parent child, you could read this: 
http://www.elasticsearch.org/blog/managing-relations-inside-elasticsearch/



-- 
David Pilato | Technical Advocate | Elasticsearch.com
@dadoonet | @elasticsearchfr



Re: How to set the query resultset size to infinite

You don't want to do that!
If you need to extract (download) 1 000 000 000 records, you should use the
scan and scroll API:
http://www.elasticsearch.org/guide/en/elasticsearch/guide/current/scan-scroll.html#scan-scroll
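
The general shape of such an extraction loop is sketched below. It is deliberately not tied to a client library: `first_page` and `next_page` are hypothetical callables standing in for the initial search request and the follow-up scroll requests, and the stub at the bottom simulates them with an in-memory list.

```python
from typing import Callable, Iterator, List, Optional, Tuple

Page = Tuple[List[dict], Optional[str]]


def scroll_all(first_page: Callable[[], Page],
               next_page: Callable[[str], Page]) -> Iterator[dict]:
    """Yield every hit, fetching one page at a time until a page comes back empty."""
    hits, cursor = first_page()
    while hits:
        yield from hits
        if cursor is None:
            return
        hits, cursor = next_page(cursor)


def make_stub(docs: List[dict], size: int):
    """In-memory stand-in for a scrolling search, using an offset as the cursor."""
    def page(offset: int) -> Page:
        return docs[offset:offset + size], str(offset + size)
    return (lambda: page(0)), (lambda cursor: page(int(cursor)))


docs = [{"n": i} for i in range(7)]
first, following = make_stub(docs, 3)
print(sum(1 for _ in scroll_all(first, following)))
```

Only one page is held in memory at a time, which is the reason to prefer this over a single request with a huge size.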

--
David Pilato | Technical Advocate | Elasticsearch.com
@dadoonet | @elasticsearchfr


Re: problem indexing with my analyzer

Yes, I am applying the reuters analyzer to my whole document (which is
composed of text and pictures).
My goal is to search the text of the document by any word or part of a word.

Yes, the problem is my nGram filter.
How do I solve it? Decrease the nGram max? Switch to another analyzer that
still satisfies my goal?
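
To see why the token count explodes, it helps to count the grams a single token produces. A minimal sketch, assuming the filter emits every substring of each length from min_gram to max_gram (the helper name `ngram_count` is made up for illustration, and the 250-character token is just an example length):

```python
def ngram_count(token_len, min_gram, max_gram):
    """Number of n-grams one token of length token_len produces.

    For each gram length n there are (token_len - n + 1) substrings;
    lengths above the token length produce nothing.
    """
    upper = min(max_gram, token_len)
    return sum(token_len - n + 1 for n in range(min_gram, upper + 1))


# One 250-character token (for example a run of base64 data):
print(ngram_count(250, 3, 250))  # 30876 grams with max_gram = 250
print(ngram_count(250, 3, 20))   # 4311 grams with max_gram = 20
```

Lowering max_gram helps, but long base64 runs still produce thousands of grams each, so keeping image data out of the analyzed field matters more than the exact limit.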

On Friday, 20 June 2014 10:58:49 UTC+2, Cédric Hourcade wrote:

 Does it mean you're applying the reuters analyzer to your base64-encoded
 pictures?

 I guess it generates a really huge number of tokens for each entry
 because of your nGram filter (with a max at 250).

 Cédric Hourcade
 c...@wal.fr


 On Fri, Jun 20, 2014 at 9:09 AM, Tanguy Bernard
 bernardt...@gmail.com wrote:
  Information:
  My note_source contains pictures (.jpg, .png ...) in base64 and text.

  For my mapping I have used:
  type = string
  analyzer = reuteurs (the name of my analyzer)


  Any idea?

  On Thursday, 19 June 2014 17:57:46 UTC+2, Tanguy Bernard wrote:

  Hello,
  I have an issue when I index one particular field, note_source (SQL
  longtext).
  I use the same analyzer for every field (except date_source and
  id_source), but for note_source I get monitor.jvm warnings.
  When I remove note_source, everything is fine. If I don't use an analyzer
  on note_source, everything is fine, but if I use my analyzer on
  note_source I get crashes.

  I think I have enough memory; I have set ES_HEAP_SIZE.
  Maybe my problem is with accents (ASCII, UTF-8).

  Can you help me with this?
  
  
  
  My Setting

  public function createSetting($pf){
      $params = array('index' => $pf, 'body' => array(
          'settings' => array(
              'number_of_shards' => 5,
              'number_of_replicas' => 0,
              'analysis' => array(
                  'filter' => array(
                      'nGram' => array(
                          'token_chars' => array(),
                          'type' => 'nGram',
                          'min_gram' => 3,
                          'max_gram' => 250
                      )
                  ),
                  'analyzer' => array(
                      'reuters' => array(
                          'type' => 'custom',
                          'tokenizer' => 'standard',
                          'filter' => array('lowercase', 'asciifolding', 'nGram')
                      )
                  )
              )
          )
      ));
      $this->elasticsearchClient->indices()->create($params);
      return;
  }
  
  
  My Indexing

  public function indexTable($pf, $typeElement){
      $params = array(
          'index' => '_river',
          'type' => $typeElement,
          'id' => '_meta',
          'body' => array(
              'type' => 'jdbc',
              'jdbc' => array(
                  'url' => 'jdbc:mysql://ip/name',
                  'user' => 'root',
                  'password' => 'mdp',
                  'index' => $pf,
                  'type' => $typeElement,
                  'sql' => 'select id_source as _id, id_sous_theme, titre_source, desc_source, note_source, adresse_source, type_source, date_source from source',
                  'max_bulk_requests' => 5,
              )
          )
      );

      $this->elasticsearchClient->index($params);
  }
  
  Thanks in advance. 
  

Re: ElasticSearch queries always return all the datas stored in the index

Yes.
My request for DOE always returns that answer.



On Friday, 20 June 2014 11:24:33 UTC+2, David Pilato wrote:

 Searching for DOE gives you that answer? 
 If so, it's not normal IMHO. You should try to reproduce it with a full 
 SENSE script recreation so we can replay it and help you from here.

 See http://www.elasticsearch.org/help/ for information.

 About parent child, you could read this: 
 http://www.elasticsearch.org/blog/managing-relations-inside-elasticsearch/



 -- 
 *David Pilato* | *Technical Advocate* | *Elasticsearch.com*
 @dadoonet https://twitter.com/dadoonet | @elasticsearchfr 
 https://twitter.com/elasticsearchfr


 On 20 June 2014 at 11:19:23, Alexandre Touret (alex...@touret.info) wrote:

 Hello,
 thanks for your response.

 When I add another carte:

 put /tp/carte/20450813
 {
    "dateEdition": "2014-06-01T22:00:00.000Z",
    "adherents": [
       {
          "birthday": "1963-03-22T23:00:00.000Z",
          "firstname": "FLORENCE",
          "lastname": "SMITH"
       },
       {
          "birthday": "2001-10-12T22:00:00.000Z",
          "firstname": "M ANGELO",
          "lastname": "SMITH"
       },
       {
          "birthday": "2003-07-30T22:00:00.000Z",
          "firstname": "M LILI",
          "lastname": "SMITH"
       }
    ]
 }

 and I run the query described above, I get both 'carte' documents back.

 Is this normal?
 Do you have an example or a link that illustrates the parent/child feature?


 Thanks



 On Friday, 20 June 2014 11:12:04 UTC+2, David Pilato wrote:

  Hey Alexandre,
  
  
  This is correct. You are searching for a carte which contains an adherent.
  Elasticsearch gives you a carte object as an answer, and it gives you back
  exactly what you have indexed.

  That being said, I think you could look at the parent/child feature for
  that use case.
  Or you could have one carte object per adherent?
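
The "one carte object per adherent" alternative amounts to denormalizing before indexing. A minimal sketch of that step, assuming the document shape used in this thread; the `carteId` field, the `adherent` key and the id scheme are hypothetical choices made only for illustration:

```python
def one_carte_per_adherent(carte_id, carte):
    """Split a carte holding several adherents into one document per adherent.

    Returns a list of (document_id, document) pairs ready to be indexed.
    """
    docs = []
    for position, adherent in enumerate(carte["adherents"]):
        doc = {
            "carteId": carte_id,                  # hypothetical back-reference to the carte
            "dateEdition": carte["dateEdition"],  # copied onto every per-adherent document
            "adherent": adherent,
        }
        # Hypothetical id scheme: the carte id plus the adherent's position.
        docs.append(("%s-%d" % (carte_id, position), doc))
    return docs
```

With this layout a search on `adherent.lastname` returns only the matching people, at the cost of repeating the carte-level fields in every document.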
  
  Makes sense?

  -- 
 *David Pilato* | *Technical Advocate* | *Elasticsearch.com* 
  @dadoonet https://twitter.com/dadoonet | @elasticsearchfr 
 https://twitter.com/elasticsearchfr
  

 On 20 June 2014 at 11:06:40, Alexandre Touret (alex...@touret.info) wrote:

 Hello,

 https://stackoverflow.com/questions/24323480/elasticsearch-queries-always-return-all-the-datas-stored-in-the-index#

 I'm trying to index and query an index stored in ES 1.2. I both create and
 populate the index with the Java API, using the TransportClient. I have
 the following mapping:

 get /tp/carte/_mapping
 {
    "tp": {
       "mappings": {
          "carte": {
             "properties": {
                "adherents": {
                   "properties": {
                      "birthday": {
                         "type": "date",
                         "format": "dateOptionalTime"
                      },
                      "firstname": {
                         "type": "string"
                      },
                      "lastname": {
                         "type": "string"
                      }
                   }
                },
                "dateEdition": {
                   "type": "date",
                   "format": "dateOptionalTime"
                }
             }
          }
       }
    }
 }


 When I search for an object by its ID, it works fine, but when I try to
 query the content of one of my nested objects, *ES always returns all
 the objects stored in the index*. I also tried creating the objects
 manually with Sense and I see the same behaviour.

 Example of my insert:

 put /tp/carte/20454795
 {
    "dateEdition": "2014-06-01T22:00:00.000Z",
    "adherents": [
       {
          "birthday": "1958-05-05T23:00:00.000Z",
          "firstname": "ANDREW",
          "lastname": "DOE"
       },
       {
          "birthday": "1964-03-01T23:00:00.000Z",
          "firstname": "ROBERT",
          "lastname": "DOE"
       },
       {
          "birthday": "1989-02-27T23:00:00.000Z",
          "firstname": "DAVID",
          "lastname": "DOE"
       },
       {
          "birthday": "1990-12-11T23:00:00.000Z",
          "firstname": "JOHN",
          "lastname": "DOE"
       }
    ]
 }

 Finally, here is a query executed in Sense:


 get /tp/carte/_search
 {
    "query": {
       "bool": {
          "must": [
             {
                "match": {
                   "adherents.lastname": {
                      "query": "DOE"
                   }
                }
             }
          ]
       }
    }
 }


 How can I fix that?

 Thanks

 Regards


 Alexandre



Re: ElasticSearch queries always return all the datas stored in the index

It looks like you are doing a GET rather than a POST, if so your query
content is ignored.


Cédric Hourcade
c...@wal.fr



Re: ElasticSearch queries always return all the datas stored in the index

That's right 
Thanks for your help :)

Regards


Re: ElasticSearch queries always return all the datas stored in the index

No. GET works for running searches.

It could be an issue if you are using an OLD SENSE version and not Marvel.
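
The two replies are compatible: Elasticsearch accepts a search body on GET as well as POST, but some HTTP clients and older tools drop the body of a GET request, and a search with no body matches every document. Sending the same search as a POST sidesteps that. A minimal sketch using only the Python standard library; the node address is an assumption, and whether a dropped body is the cause in this particular case is not established by the thread:

```python
import json
from urllib.request import Request


def build_search_request(base_url, index, doc_type, query):
    """Build (but do not send) a POST search request that carries the query body."""
    url = "%s/%s/%s/_search" % (base_url.rstrip("/"), index, doc_type)
    body = json.dumps({"query": query}).encode("utf-8")
    return Request(
        url,
        data=body,
        method="POST",
        headers={"Content-Type": "application/json"},
    )


req = build_search_request(
    "http://localhost:9200",  # assumed address of a local node
    "tp",
    "carte",
    {"match": {"adherents.lastname": "DOE"}},
)
```

Passing `req` to `urllib.request.urlopen` would send it; comparing the hit count with the GET variant shows whether the body was being ignored.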

-- 
David Pilato | Technical Advocate | Elasticsearch.com
@dadoonet | @elasticsearchfr



Re: problem indexing with my analyzer

I set max_gram=20. It's better, but at the end I get this warning many times:

[2014-06-20 11:42:14,201][WARN ][monitor.jvm  ] [ik-test2]
[gc][young][528][263] duration [2s], collections [1]/[2.1s], total
[2s]/[43.9s], memory [536mb]->[580.2mb]/[1015.6mb], all_pools {[young]
[22.5mb]->[22.3mb]/[66.5mb]}{[survivor] [14.9kb]->[49.3kb]/[8.3mb]}{[old]
[513.4mb]->[557.8mb]/[940.8mb]}

I set ES_HEAP_SIZE to 2G. I think that's enough.
Is something wrong?
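
One thing worth checking in the warning above is the memory triple: it reads as heap used before the collection, heap used after it, and total heap. The total shown is 1015.6mb, which is about 1 GB rather than 2 GB, so the ES_HEAP_SIZE setting may not have been picked up by the running node. That reading is an inference from the log line, not something confirmed in the thread. A small sketch that pulls the three figures out of such a line (the regular expression and helper name are illustrative):

```python
import re

# Matches e.g. "memory [536mb]->[580.2mb]/[1015.6mb]". The ">" is optional
# because some mail archives strip it from the arrow.
MEMORY = re.compile(
    r"memory \[([\d.]+)(kb|mb|gb)\]->?\[([\d.]+)(kb|mb|gb)\]/\[([\d.]+)(kb|mb|gb)\]"
)
TO_MB = {"kb": 1.0 / 1024, "mb": 1.0, "gb": 1024.0}


def heap_from_gc_line(line):
    """Return (before_mb, after_mb, total_mb) from a monitor.jvm GC line, or None."""
    match = MEMORY.search(line)
    if match is None:
        return None
    before = float(match.group(1)) * TO_MB[match.group(2)]
    after = float(match.group(3)) * TO_MB[match.group(4)]
    total = float(match.group(5)) * TO_MB[match.group(6)]
    return before, after, total
```

If the total stays near 1 GB after a restart, the environment variable is probably not reaching the process that starts Elasticsearch.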



Re: ElasticSearch queries always return all the datas stored in the index

I just upgraded to ES 1.2.1 and the latest release of Marvel.
I see the same behaviour.


Re: How to set the query resultset size to infinite

2014-06-20 Thread Nuno Carvalho

Right... that makes sense :)

I'll give it a try, thank you!

Nuno


Re: problem indexing with my analyzer

The user copies and pastes the content of an HTML page, and I index that
content. I take the entire document, images included. I can't change this
behaviour.

I set max_gram=20. It's better, but at the end I get this warning many times:

[2014-06-20 11:42:14,201][WARN ][monitor.jvm  ] [ik-test2]
[gc][young][528][263] duration [2s], collections [1]/[2.1s], total
[2s]/[43.9s], memory [536mb]->[580.2mb]/[1015.6mb], all_pools {[young]
[22.5mb]->[22.3mb]/[66.5mb]}{[survivor] [14.9kb]->[49.3kb]/[8.3mb]}{[old]
[513.4mb]->[557.8mb]/[940.8mb]}

I set ES_HEAP_SIZE to 2G. I think that's enough.
Is something wrong?

On Friday, 20 June 2014 11:45:22 UTC+2, Cédric Hourcade wrote:

 If you are only searching the text, you should index the images in
 another field, with no analyzer ("index": "not_analyzed") or, even
 better, "index": "no" (not indexed). If you need to retrieve the
 image data, it's still in the _source.

 But to be honest I wouldn't even store this kind of information in ES:
 your index is going to be bigger, merges are going to be slower... I'd
 keep the binary files stored elsewhere.
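
A minimal sketch of that suggestion, assuming the pasted HTML embeds its pictures as `data:image/...;base64,` URIs (the thread does not say how they are embedded). The field names `note_text` and `note_images` and the helper `split_note` are hypothetical; the mapping uses the ES 1.x string options mentioned above.

```python
import re

# Assumed embedding format: <img src="data:image/png;base64,....">
DATA_URI = re.compile(r"data:image/[A-Za-z0-9.+-]+;base64,[A-Za-z0-9+/=]+")


def split_note(note):
    """Separate embedded base64 images from the searchable text of a note."""
    images = DATA_URI.findall(note)
    text = DATA_URI.sub(" ", note)
    return text, images


# Mapping fragment (ES 1.x style): only the text goes through the nGram analyzer.
properties = {
    "note_text": {"type": "string", "analyzer": "reuters"},
    "note_images": {"type": "string", "index": "no"},  # kept in _source, never tokenized
}
```

Because the image field is not indexed, it produces no tokens at all, so the nGram filter only ever sees real text.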

 Cédric Hourcade 
 c...@wal.fr


 On Fri, Jun 20, 2014 at 11:25 AM, Tanguy Bernard 
 bernardt...@gmail.com javascript: wrote: 
  Yes, I am applying reuters on my document (compose by text and 
 picture). 
  My goal is to do my research on the text of the document with any word 
 or 
  part of a word. 
  
  Yes the problem it's my nGram filter. 
  How do I solve this problem ? Deacrease nGram max ? Change Analyzer by 
 an 
  other but who satisfy my goal ? 
  
  Le vendredi 20 juin 2014 10:58:49 UTC+2, Cédric Hourcade a écrit : 
  
  Does it mean your applying the reuters analyzer on your base64 
  encoded pictures? 
  
  I guess it generates a really huge number of tokens for each entry 
  because of your nGram filter (with a max at 250). 
  
  Cédric Hourcade 
  c...@wal.fr 
  
  
  On Fri, Jun 20, 2014 at 9:09 AM, Tanguy Bernard 
  bernardt...@gmail.com wrote: 
   Information 
   My note_source contain picture (.jpg, .png ...) in base64 and text. 
   
   For my mapping I have used : 
   type = string 
   analyzer = reuteurs (the name of my analyzer) 
   
   
   Any idea ? 
   
   Le jeudi 19 juin 2014 17:57:46 UTC+2, Tanguy Bernard a écrit : 
   
   Hello 
   I have some issue, when I index a particular data note_source (sql 
   longtext). 
   I use the same analyzer for each fields (except date_source and 
   id_source) 
   but for note_source, I have a warn monitor.jvm. 
   When I remove note_source, everything fine. If I don't use 
 analyzer 
   on 
   note_source, everything fine, but if I use my analyzer on 
   note_source I 
   have some crash. 
   
   I think I have enough memory, I have used ES_HEAP_SIZE. 
   Maybe my problem it's with accent (ascii, utf-8) 
   
   Can you help me with this ? 
   
   
   
   My Setting 
   
public function createSetting($pf){ 
   $params = array('index' = $pf, 'body' = array( 
   'settings' = array( 
   'number_of_shards' = 5, 
   'number_of_replicas' = 0, 
   'analysis' = array( 
   'filter' = array( 
   'nGram' = array( 
   token_chars =array(), 
   type = nGram, 
   min_gram = 3, 
   max_gram  = 250 
   ) 
   ), 
   'analyzer' = array( 
   'reuters' = array( 
   'type' = 'custom', 
   'tokenizer' = 'standard', 
   'filter' = array('lowercase', 
 'asciifolding', 
   'nGram') 
   ) 
   ) 
   ) 
   ) 
   )); 
   $this-elasticsearchClient-indices()-create($params); 
   return; 
   } 
   
   
   My Indexing 
   
   public function indexTable($pf, $typeElement){
       // register a JDBC river by indexing its _meta document into the _river index
       $params = array(
           'index' => '_river',
           'type' => $typeElement,
           'id' => '_meta',
           'body' => array(
               'type' => 'jdbc',
               'jdbc' => array(
                   'url' => 'jdbc:mysql://ip/name',
                   'user' => 'root',
                   'password' => 'mdp',
                   'index' => $pf,
                   'type' => $typeElement,
                   'sql' => 'select id_source as _id, id_sous_theme, titre_source, desc_source, note_source, adresse_source, type_source, date_source from source',
                   'max_bulk_requests' => 5
               )
           )
       );

       $this->elasticsearchClient->index($params);
   }
   
   Thanks in advance. 
   

Re: ElasticSearch queries always return all the datas stored in the index

Ah yes sorry you are right, I am using some old tools :)


Cédric Hourcade
c...@wal.fr


On Fri, Jun 20, 2014 at 11:49 AM, Alexandre Touret alexan...@touret.info
wrote:

 I just upgraded to ES 1.2.1 and the latest release of Marvel.
 I see the same behaviour.

 On Friday, June 20, 2014 at 11:34:59 UTC+2, David Pilato wrote:

 No. GET works for running searches.

 It could be an issue if you are using an OLD SENSE version and not Marvel.

  --
 *David Pilato* | *Technical Advocate* | *Elasticsearch.com*
 @dadoonet https://twitter.com/dadoonet | @elasticsearchfr
 https://twitter.com/elasticsearchfr


 On June 20, 2014 at 11:28:23, Cédric Hourcade (c...@wal.fr) wrote:

  It looks like you are doing a GET rather than a POST, if so your query
 content is ignored.
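
 Editor's sketch: as David Pilato notes in this thread, Elasticsearch
 itself accepts a search body on GET, but some tools drop the body of a GET
 request. Sending the search as a POST sidesteps that. The snippet below
 only builds such a request and does not send it; the host, the helper name
 build_search_request and the match query on adherents.lastname are
 assumptions for illustration, not taken from the original posts.

```python
import json
import urllib.request


def build_search_request(base_url, index, doc_type, query):
    """Build (but do not send) a POST _search request carrying the query body."""
    body = json.dumps({"query": query}).encode("utf-8")
    url = "{}/{}/{}/_search".format(base_url.rstrip("/"), index, doc_type)
    return urllib.request.Request(
        url,
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


# Hypothetical query: cartes whose adherents have the last name "doe".
request = build_search_request(
    "http://localhost:9200", "tp", "carte",
    {"match": {"adherents.lastname": "doe"}},
)
print(request.get_method(), request.full_url)
```

 Passing the request to urllib.request.urlopen would actually execute it
 against a running node.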


 Cédric Hourcade
 c...@wal.fr


 On Fri, Jun 20, 2014 at 11:26 AM, Alexandre Touret alex...@touret.info
 wrote:

 Yes.
 My request for doe always returns that answer.



 On Friday, June 20, 2014 at 11:24:33 UTC+2, David Pilato wrote:

   Searching for DOE gives you that answer?
  If so, it's not normal IMHO. You should try to reproduce it with a
 full SENSE script recreation so we can replay it and help you from here.

  See http://www.elasticsearch.org/help/ for information.

  About parent/child, you could read this:
 http://www.elasticsearch.org/blog/managing-relations-inside-elasticsearch/



  --
 *David Pilato* | *Technical Advocate* | *Elasticsearch.com*
 @dadoonet https://twitter.com/dadoonet | @elasticsearchfr
 https://twitter.com/elasticsearchfr


 On June 20, 2014 at 11:19:23, Alexandre Touret (alex...@touret.info) wrote:

   Hello,
 thanks for your response

 When I add another carte

 put /tp/carte/20450813
 {
    "dateEdition": "2014-06-01T22:00:00.000Z",
    "adherents": [
       {
          "birthday": "1963-03-22T23:00:00.000Z",
          "firstname": "FLORENCE",
          "lastname": "SMITH"
       },
       {
          "birthday": "2001-10-12T22:00:00.000Z",
          "firstname": "M ANGELO",
          "lastname": "SMITH"
       },
       {
          "birthday": "2003-07-30T22:00:00.000Z",
          "firstname": "M LILI",
          "lastname": "SMITH"
       }
    ]
 }

 and I run the query described above, I get both 'carte' documents back.

 Is this normal?
 Do you have an example or a link that illustrates the parent/child feature?


 Thanks



 On Friday, June 20, 2014 at 11:12:04 UTC+2, David Pilato wrote:

  Hey Alexandre,


  This is correct. You are searching for a carte which contains an
 adherent.
  Elasticsearch gives you a carte object as an answer. And
 elasticsearch gives you back exactly what you have indexed.

  That being said, I think you could look at parent/child feature for
 that use case.
  Or you can have one carte object per adherent?

  Makes sense?

  --
 *David Pilato* | *Technical Advocate* | *Elasticsearch.com*
 @dadoonet https://twitter.com/dadoonet | @elasticsearchfr
 https://twitter.com/elasticsearchfr


 On June 20, 2014 at 11:06:40, Alexandre Touret (alex...@touret.info) wrote:

  hello,



 https://stackoverflow.com/questions/24323480/elasticsearch-queries-always-return-all-the-datas-stored-in-the-index#

 I'm trying to index and query an index stored in ES 1.2. I both create
 and populate the index with the Java API, using the TransportClient. I
 have the following mapping:

 get /tp/carte/_mapping
 {
    "tp": {
       "mappings": {
          "carte": {
             "properties": {
                "adherents": {
                   "properties": {
                      "birthday": {
                         "type": "date",
                         "format": "dateOptionalTime"
                      },
                      "firstname": {
                         "type": "string"
                      },
                      "lastname": {
                         "type": "string"
                      }
                   }
                },
                "dateEdition": {
                   "type": "date",
                   "format": "dateOptionalTime"
                }
             }
          }
       }
    }
 }


  When I search for an object by its ID, it works fine, but when I try
 to query the content of one of my nested objects, *ES always returns
 all the objects stored in the index*. I also tried to create the
 objects manually with Sense and I see the same behaviour.

 Example of my insert

 put /tp/carte/20454795
 {
 "dateEdition": "2014-06-01T22:00:00.000Z",
 "adherents": [
   {
      "birthday": "1958-05-05T23:00:00.000Z",
      "firstname": "ANDREW",
      "lastname": "DOE"
   },
   {
      "birthday": "1964-03-01T23:00:00.000Z",
      "firstname": "ROBERT",
      "lastname": "DOE"
   },
   {

Re: How does shingle filter work on match_phrase in query phase?

2014-06-20 Thread 陳智清

Hello Hourcade, Thanks for your response.

Does that mean index_analyzer and search_analyzer should be set to
different values (e.g. index_analyzer: shingle and
search_analyzer: standard)?
What if I want to reuse the same shingle analyzer for both indexing and
searching? Will match_phrase "t1 t2 t3" still give me a match?

I know that setting a different analyzer as search_analyzer makes
match_phrase "t1 t2 t3" searchable, but if I do that, I get no benefit
from the shingles, right? Instead I just get a bigger index.

I assume shingles are used for faster match_phrase searches. But after the
shingle filter, searching for a phrase of 3 tokens "t1 t2 t3" becomes
searching for a phrase of 5 tokens, and I don't know how the shingle
filter arranges the positions for a correct phrase query. So how can
match_phrase be faster? Thank you.

On Friday, June 20, 2014 at 4:18:03 PM UTC+8, Cédric Hourcade wrote:

Hello,

Let's say you have an indexed text "t1 t2 t3" with shingles. The token
positions are also indexed, so you get: "t1" (at pos 1), "t1 t2" (pos
1), "t2" (pos 2), "t2 t3" (pos 2) and "t3" (pos 3).
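
Editor's sketch: the position layout described above can be reproduced with
a few lines of Python. This is a simplified model of a shingle filter, not
the Lucene implementation; the function name shingle and its parameters are
made up for illustration. It assumes a shingle takes the position of its
first word, which is what the list above shows.

```python
def shingle(tokens, max_size=2, output_unigrams=True):
    """Return (token, position) pairs; positions start at 1."""
    out = []
    for i in range(len(tokens)):
        pos = i + 1
        if output_unigrams:
            out.append((tokens[i], pos))
        # Each shingle is emitted at the position of its first word.
        for size in range(2, max_size + 1):
            if i + size <= len(tokens):
                out.append((" ".join(tokens[i:i + size]), pos))
    return out


indexed = shingle(["t1", "t2", "t3"])
shingles_only = shingle(["t1", "t2", "t3"], output_unigrams=False)
print(indexed)
print(shingles_only)
```

With unigrams switched off, only "t1 t2" (pos 1) and "t2 t3" (pos 2)
remain, and both of those appear at consecutive positions in the indexed
list as well.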

Cédric Hourcade
c...@wal.fr

On Fri, Jun 20, 2014 at 7:04 AM, 陳智清 walke...@gmail.com
wrote:
How does shingle filter work on match_phrase in query phase?

After analyzing phrase t1 t2 t3, shingle filter produced five tokens,
t1
t2
t3
t1 t2
t2 t3

Will match_phrase still give "t1 t2 t3" a match? How does it work? Thank
you.

--
You received this message because you are subscribed to the Google Groups
elasticsearch group.
To unsubscribe from this group and stop receiving emails from it, send an email
to elasticsearch+unsubscr...@googlegroups.com.
To view this discussion on the web visit
https://groups.google.com/d/msgid/elasticsearch/602477cb-d8f4-459b--e6174662fbfd%40googlegroups.com.
For more options, visit https://groups.google.com/d/optout.

Re: How do people typically handle shard failures in their results?

2014-06-20 Thread Shay Banon

If it fails on the primary shard, then a failure is returned. If it worked, and
a replica failed, then that replica is deemed a failed replica, and will get
allocated somewhere else in the cluster. Maybe an example of where a failure on
all shards would help here?

On Jun 18, 2014, at 11:45, mooky nick.minute...@gmail.com wrote:

If I understand correctly, we can get an OK response from elastic (ie no
error) but if there are shard failures in the response, it potentially means
that results are incomplete/incorrect. From my observation, we can get
failures on all shards - and elastic still returns OK (which was a bit
surprising to me)

What kinds of approaches do people typically use to deal with shard failures?

For my application, if there are shard failures, essentially my results are
inaccurate/incorrect - so I need to return an error to the client. Returning
bad results is worse than returning an error.

I am inclined to turn any shard failure into an exception.
Is this quite common? Does it make sense to add a feature to the elastic API
(i.e. request.setTreatShardFailuresAsErrors(true))?


Re: How do people typically handle shard failures in their results?

2014-06-20 Thread Nikolas Everett

On Fri, Jun 20, 2014 at 7:08 AM, Shay Banon kim...@gmail.com wrote:

If it fails on the primary shard, then a failure is returned. If it
worked, and a replica failed, then that replica is deemed a failed replica,
and will get allocated somewhere else in the cluster. Maybe an example of
where a failure on “all” shards would help here?

I think it's more about searches, and they can fail on one shard but not
others for all sorts of reasons: queue full, unfortunate script, bug, only
one shard had results and the query asked for something weird like using
the postings highlighter when postings aren't stored. Lots of reasons.

I log the event and move on. I toyed with outputting a warning to the user
but didn't have time to implement it. We're pretty diligent with our logs
so we'd notice the log and run it down.

If the failure is caused by the queue being full only on one node, we'd
likely notice that real quick as ganglia would lose it. This happened to
me recently when we put a node without an ssd into a cluster with ssds. It
couldn't keep up and dropped a ton of searches. In our defense, we didn't
know the rest of the cluster had ssds so we were double surprised.

Nik


Re: How does shingle filter work on match_phrase in query phase?

Yes, you can use two different analyzers. In your case what you can do is:
- for indexing, you apply a shingle filter.
- for the query, you also apply a shingle filter, but this time you
disable the unigrams (output_unigrams: false), so it will only
generate the shingles, in your case: "t1 t2" and "t2 t3". It will
match your document.
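
Editor's sketch: the two-analyzer setup described above could be written
down roughly as follows. The settings are built as a Python dict and printed
as JSON. The filter and analyzer names (shingle_index, shingle_search and so
on) are invented for this example; only the shingle filter type, the
output_unigrams option and the index_analyzer / search_analyzer mapping keys
come from the thread.

```python
import json

index_settings = {
    "analysis": {
        "filter": {
            # Index side: shingles plus the original single tokens.
            "shingle_index": {"type": "shingle", "output_unigrams": True},
            # Query side: shingles only.
            "shingle_search": {"type": "shingle", "output_unigrams": False},
        },
        "analyzer": {
            "shingle_index_analyzer": {
                "type": "custom",
                "tokenizer": "standard",
                "filter": ["lowercase", "shingle_index"],
            },
            "shingle_search_analyzer": {
                "type": "custom",
                "tokenizer": "standard",
                "filter": ["lowercase", "shingle_search"],
            },
        },
    }
}

# Hypothetical string field using one analyzer per side.
field_mapping = {
    "type": "string",
    "index_analyzer": "shingle_index_analyzer",
    "search_analyzer": "shingle_search_analyzer",
}

print(json.dumps({"settings": index_settings}, indent=2))
print(json.dumps(field_mapping, indent=2))
```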
Cédric Hourcade
c...@wal.fr

On Fri, Jun 20, 2014 at 12:30 PM, 陳智清 walker0...@gmail.com wrote:
Hello Hourcade, Thanks for your response.

Does that mean index_analyzer and search_analyzer should be set to
different values (e.g. index_analyzer: shingle and search_analyzer:
standard)?
What if I want to reuse the same shingle analyzer for both indexing and
searching? Will match_phrase "t1 t2 t3" still give me a match?

I know that setting a different analyzer as search_analyzer makes match_phrase
"t1 t2 t3" searchable, but if I do that, I get no benefit from the
shingles, right? Instead I just get a bigger index.

On Friday, June 20, 2014 at 4:18:03 PM UTC+8, Cédric Hourcade wrote:

Hello,

Let's say you have an indexed text "t1 t2 t3" with shingles. The token
positions are also indexed, so you get: "t1" (at pos 1), "t1 t2" (pos
1), "t2" (pos 2), "t2 t3" (pos 2) and "t3" (pos 3).

Cédric Hourcade
c...@wal.fr

On Fri, Jun 20, 2014 at 7:04 AM, 陳智清 walke...@gmail.com wrote:
How does shingle filter work on match_phrase in query phase?

After analyzing phrase t1 t2 t3, shingle filter produced five tokens,
t1
t2
t3
t1 t2
t2 t3

Will match_phrase still give "t1 t2 t3" a match? How does it work? Thank
you.


Re: How do people typically handle shard failures in their results?

2014-06-20 Thread Shay Banon

Ahh, I see. If it's related to searches, then yes, the search response includes
details about the total number of shards the search was executed on, the
successful shards, and the failed shards. They are important to check in order
to understand whether one got partial results.

In the REST API, if there is a total failure, it will return the worst
status code out of all the shards in the response. In the Java API, the search
response will be returned (with no exception), so the content of the response
has to be checked (which is good practice anyhow). It might make sense to raise
an exception in the Java API if all shards failed; I am on the fence on this
one, since a check needs to be performed on the result anyhow.
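
Editor's sketch: the check described above can be applied to the parsed JSON
of a REST search response, which carries a _shards section with total,
successful and failed counts. The helper check_shards, the exception class
and the two sample responses below are made up for illustration; this
follows mooky's stated preference of turning any shard failure into an
error, and is not an Elasticsearch API.

```python
class ShardFailureError(Exception):
    """Raised when a search response reports one or more failed shards."""


def check_shards(response):
    """Return the response unchanged, or raise if any shard failed."""
    shards = response.get("_shards", {})
    failed = shards.get("failed", 0)
    if failed:
        raise ShardFailureError(
            "{} of {} shards failed: {}".format(
                failed, shards.get("total"), shards.get("failures", [])
            )
        )
    return response


# Hand-written sample responses (not real cluster output).
ok = {"_shards": {"total": 5, "successful": 5, "failed": 0},
      "hits": {"total": 3, "hits": []}}
partial = {"_shards": {"total": 5, "successful": 4, "failed": 1,
                       "failures": [{"index": "logs", "shard": 2,
                                     "reason": "queue full"}]},
           "hits": {"total": 1, "hits": []}}

check_shards(ok)
try:
    check_shards(partial)
except ShardFailureError as err:
    print("partial results:", err)
```

An application that tolerates partial results could log the failures
instead of raising, as Nik describes.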

On Jun 20, 2014, at 13:22, Nikolas Everett nik9...@gmail.com wrote:

On Fri, Jun 20, 2014 at 7:08 AM, Shay Banon kim...@gmail.com wrote:
If it fails on the primary shard, then a failure is returned. If it worked,
and a replica failed, then that replica is deemed a failed replica, and will
get allocated somewhere else in the cluster. Maybe an example of where a
failure on all shards would help here?

I think it's more about searches, and they can fail on one shard but not others
for all sorts of reasons: queue full, unfortunate script, bug, only one
shard had results and the query asked for something weird like using the
postings highlighter when postings aren't stored. Lots of reasons.

I log the event and move on. I toyed with outputting a warning to the user
but didn't have time to implement it. We're pretty diligent with our logs so
we'd notice the log and run it down.

If the failure is caused by the queue being full only on one node, we'd
likely notice that real quick as ganglia would lose it. This happened to me
recently when we put a node without an ssd into a cluster with ssds. It
couldn't keep up and dropped a ton of searches. In our defense, we didn't
know the rest of the cluster had ssds so we were double surprised.

Nik


Re: How does shingle filter work on match_phrase in query phase?

2014-06-20 Thread 陳智清

I got it! Thank you!

On Friday, June 20, 2014 at 8:00:36 PM UTC+8, Cédric Hourcade wrote:

On Fri, Jun 20, 2014 at 12:30 PM, 陳智清 walke...@gmail.com
wrote:
Hello Hourcade, Thanks for your response.

Does that mean index_analyzer and search_analyzer should be set to
different values (e.g. index_analyzer: shingle and
search_analyzer: standard)?
What if I want to reuse the same shingle analyzer for both indexing and
searching? Will match_phrase "t1 t2 t3" still give me a match?

I know that setting a different analyzer as search_analyzer makes
match_phrase "t1 t2 t3" searchable, but if I do that, I get no benefit
from the shingles, right? Instead I just get a bigger index.

I assume shingles are used for faster match_phrase searches. But after the
shingle filter, searching for a phrase of 3 tokens "t1 t2 t3" becomes
searching for a phrase of 5 tokens, and I don't know how the shingle
filter arranges the positions for a correct phrase query. So how can
match_phrase be faster? Thank you.

On Friday, June 20, 2014 at 4:18:03 PM UTC+8, Cédric Hourcade wrote:

Hello,

Let's say you have an indexed text "t1 t2 t3" with shingles. The token
positions are also indexed, so you get: "t1" (at pos 1), "t1 t2" (pos
1), "t2" (pos 2), "t2 t3" (pos 2) and "t3" (pos 3).

Cédric Hourcade
c...@wal.fr

On Fri, Jun 20, 2014 at 7:04 AM, 陳智清 walke...@gmail.com wrote:
How does shingle filter work on match_phrase in query phase?

After analyzing phrase t1 t2 t3, shingle filter produced five
tokens,
t1
t2
t3
t1 t2
t2 t3

Will match_phrase still give "t1 t2 t3" a match? How does it work? Thank
you.


Re: problem indexing with my analyzer

If your base64-encoded strings are long, they are going to be split into a
lot of tokens by the standard tokenizer.

These tokens are often going to be a lot longer than standard words,
so your nGram filter will generate even more tokens, a lot more than
with standard text. That may be your problem there.

You should really try to strip the encoded images from your documents
with a simple regex before indexing them. If you need to keep the
source, put the raw text in an unindexed field, and the cleaned text in
another.
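
Editor's sketch: one way to strip the encoded images before indexing. It
assumes the images are embedded as data URIs of the form
data:image/<type>;base64,<payload>; the original post does not say how they
are embedded, so the pattern and the helper name split_note are
assumptions and would need adjusting to the real data.

```python
import re

# Assumption: images appear as data URIs, e.g. data:image/png;base64,iVBOR...
DATA_URI_IMAGE = re.compile(r"data:image/[a-zA-Z0-9.+-]+;base64,[A-Za-z0-9+/]+=*")


def split_note(note):
    """Return (raw, cleaned): raw is kept as-is, cleaned is what gets analyzed."""
    cleaned = DATA_URI_IMAGE.sub(" ", note)
    cleaned = re.sub(r"\s+", " ", cleaned).strip()
    return note, cleaned


raw, cleaned = split_note(
    'Intro <img src="data:image/png;base64,iVBORw0KGgo+/AAA=="> outro'
)
print(cleaned)
```

The raw value would go into the field that is stored but not indexed, and
the cleaned value into the field that uses the nGram analyzer.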


Re: ES v1.1 continuous young gc pauses old gc, stops the world when old gc happens and splits cluster

2014-06-20 Thread Ankush Jhalani

Mike - The above sounds like it happened because machines were sending too
many indexing requests and merging was unable to keep pace. The usual
suspects would be insufficient CPU or disk bandwidth.
This doesn't sound related to the memory constraints described in the
original issue of this thread. Do you see GC traces in the logs?

On Friday, June 20, 2014 9:40:48 AM UTC-4, Michael Hart wrote:

 We're seeing the same thing. ES 1.1.0, JDK 7u55 on Ubuntu 12.04, 5 data 
 nodes, 3 separate masters, all are 15GB hosts with 7.5GB Heaps, storage is 
 SSD. Data set is ~1.6TB according to Marvel.

 Our daily indices are roughly 33GB in size, with 5 shards and 2 replicas. 
 I'm still investigating what happened yesterday, but I do see in Marvel a 
 large spike in the Indices Current Merges graph just before the node 
 dies, and a corresponding increase in JVM Heap. When Heap hits 99% 
 everything grinds to a halt. Restarting the node fixes the issue, but 
 this is third or fourth time it's happened.

 I'm still researching how to deal with this, but a couple of things I am 
 looking at are:

 - increase the number of shards so that the segment merges stay
   smaller (is that even a legitimate sentence?). I'm still reading through
   the Index Module Merge page for more details:
   http://www.elasticsearch.org/guide/en/elasticsearch/reference/current/index-modules-merge.html
 - look at store level throttling:
   http://www.elasticsearch.org/guide/en/elasticsearch/reference/current/index-modules-store.html#store-throttling
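
 Editor's sketch: store level throttling in this Elasticsearch generation is
 controlled by the indices.store.throttle.type and
 indices.store.throttle.max_bytes_per_sec settings (the second one also
 shows up in the log excerpt quoted further down in this thread). The
 snippet only builds a request body for the cluster settings API; the helper
 name throttle_settings and the 20mb value are illustrative choices, not a
 recommendation.

```python
import json


def throttle_settings(max_bytes_per_sec, throttle_type="merge"):
    """Build a transient cluster-settings body for store level throttling."""
    return {
        "transient": {
            "indices.store.throttle.type": throttle_type,
            "indices.store.throttle.max_bytes_per_sec": max_bytes_per_sec,
        }
    }


body = throttle_settings("20mb")
print("PUT /_cluster/settings")
print(json.dumps(body, indent=2))
```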

 I would love to get some feedback on my ramblings. If I find anything more 
 I'll update this thread.

 cheers
 mike




 On Thursday, June 19, 2014 4:05:54 PM UTC-4, Bruce Ritchie wrote:

 Java 8 with G1GC perhaps? It'll have more overhead but perhaps it'll be 
 more consistent wrt pauses.



 On Wednesday, June 18, 2014 2:02:24 PM UTC-4, Eric Brandes wrote:

 I'd just like to chime in with a me too.  Is the answer just more 
 nodes?  In my case this is happening every week or so.

 On Monday, April 21, 2014 9:04:33 PM UTC-5, Brian Flad wrote:

 My dataset currently is 100GB across a few daily indices (~5-6GB and 15 
 shards each). Data nodes are 12 CPU, 12GB RAM (6GB heap).


 On Mon, Apr 21, 2014 at 6:33 PM, Mark Walkom ma...@campaignmonitor.com 
 wrote:

 How big are your data sets? How big are your nodes?

 Regards,
 Mark Walkom

 Infrastructure Engineer
 Campaign Monitor
 email: ma...@campaignmonitor.com
 web: www.campaignmonitor.com


 On 22 April 2014 00:32, Brian Flad bfla...@gmail.com wrote:

 We're seeing the same behavior with 1.1.1, JDK 7u55, 3 master nodes (2 min
 master), and 5 data nodes. Interestingly, we see the repeated young GCs
 only on a node or two at a time. Cluster operations (such as recovering
 unassigned shards) grind to a halt. After restarting a GCing node,
 everything returns to normal operation in the cluster.

 Brian F


 On Wed, Apr 16, 2014 at 8:00 PM, Mark Walkom ma...@campaignmonitor.com 
 wrote:

 In both your instances, if you can, have 3 master eligible nodes as it 
 will reduce the likelihood of a split cluster as you will always have a 
 majority quorum. Also look at discovery.zen.minimum_master_nodes to go with 
 that.
 However you may just be reaching the limit of your nodes, which means the 
 best option is to add another node (which also neatly solves your split 
 brain!).
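
 Editor's sketch: the usual rule of thumb for
 discovery.zen.minimum_master_nodes is a strict majority of the
 master-eligible nodes, i.e. (N / 2) + 1 with integer division. The helper
 name below is made up for illustration.

```python
def minimum_master_nodes(master_eligible):
    """Strict majority of the master-eligible nodes."""
    if master_eligible < 1:
        raise ValueError("need at least one master-eligible node")
    return master_eligible // 2 + 1


for n in (1, 2, 3, 5):
    print(n, "master-eligible ->", minimum_master_nodes(n))
```

 With two master-eligible nodes the majority is 2, so losing either node
 leaves the cluster without a master; three nodes with a value of 2 can
 lose one node and still elect a master.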

 Ankush it would help if you can update java, most people recommend u25 but 
 we run u51 with no problems.



 Regards,
 Mark Walkom

 Infrastructure Engineer
 Campaign Monitor
 email: ma...@campaignmonitor.com
 web: www.campaignmonitor.com


 On 17 April 2014 07:31, Dominiek ter Heide domin...@gmail.com wrote:

 We are seeing the same issue here. 

 Our environment:

 - 2 nodes
 - 30GB Heap allocated to ES
 - ~140GB of data
 - 639 indices, 10 shards per index
 - ~48M documents

 After starting ES everything is good, but after a couple of hours we see 
 the Heap build up towards 96% on one node and 80% on the other. We then see 
 the GC take very long on the 96% node:









 TOuKgmlzaVaFVA][elasticsearch1.trend1.bottlenose.com][inet[/192.99.45.125:9300]]])

 [2014-04-16 12:04:27,845][INFO ][discovery] [elasticsearch2.trend1] trend1/I3EHG_XjSayz2OsHyZpeZA

 [2014-04-16 12:04:27,850][INFO ][http ] [elasticsearch2.trend1] bound_address {inet[/0.0.0.0:9200]}, publish_address {inet[/192.99.45.126:9200]}

 [2014-04-16 12:04:27,851][INFO ][node ] [elasticsearch2.trend1] started

 [2014-04-16 12:04:32,669][INFO ][indices.store] [elasticsearch2.trend1] updating indices.store.throttle.max_bytes_per_sec from [20mb] to [1gb], note, type is [MERGE]

 [2014-04-16 12:04:32,669][INFO ][cluster.routing.allocation.decider] [elasticsearch2.trend1] updating [cluster.routing.allocation.node_initial_primaries_recoveries] from [4] to [50]

 [2014-04-16 12:04:32,670][INFO

Re: Losing data after Elasticsearch restart

2014-06-20 Thread Rohit Jaiswal

Hi Alexander,
 Here is the stack trace for the NullPointerException:

[23:24:38,929][DEBUG][action.bulk  ] [Rasputin, Mikhail] [17f85dcb67b64a13bfef2be74595087e][0], node[a-eZTR9XRiWq-o0QmsM2aA], [P], s[STARTED]: Failed to execute [org.elasticsearch.action.bulk.BulkShardRequest@22b11bbf]
java.lang.NullPointerException
    at org.elasticsearch.action.bulk.TransportBulkAction$2.onResponse(TransportBulkAction.java:247)
    at org.elasticsearch.action.bulk.TransportBulkAction$2.onResponse(TransportBulkAction.java:242)
    at org.elasticsearch.action.support.replication.TransportShardReplicationOperationAction$AsyncShardOperationAction.performReplicas(TransportShardReplicationOperationAction.java:607)
    at org.elasticsearch.action.support.replication.TransportShardReplicationOperationAction$AsyncShardOperationAction.performOnPrimary(TransportShardReplicationOperationAction.java:533)
    at org.elasticsearch.action.support.replication.TransportShardReplicationOperationAction$AsyncShardOperationAction$1.run(TransportShardReplicationOperationAction.java:430)
    at java.util.concurrent.ThreadPoolExecutor$Worker.runTask(Unknown Source)
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(Unknown Source)
    at java.lang.Thread.run(Unknown Source)
[23:24:38,940][DEBUG][action.bulk  ] [Rasputin, Mikhail] [17f85dcb67b64a13bfef2be74595087e][0], node[a-eZTR9XRiWq-o0QmsM2aA], [P], s[STARTED]: Failed to execute [org.elasticsearch.action.bulk.BulkShardRequest@768475c4]
java.lang.NullPointerException
    at org.elasticsearch.action.bulk.TransportBulkAction$2.onResponse(TransportBulkAction.java:247)
    at org.elasticsearch.action.bulk.TransportBulkAction$2.onResponse(TransportBulkAction.java:242)
    at org.elasticsearch.action.support.replication.TransportShardReplicationOperationAction$AsyncShardOperationAction.performReplicas(TransportShardReplicationOperationAction.java:607)
    at org.elasticsearch.action.support.replication.TransportShardReplicationOperationAction$AsyncShardOperationAction.performOnPrimary(TransportShardReplicationOperationAction.java:533)
    at org.elasticsearch.action.support.replication.TransportShardReplicationOperationAction$AsyncShardOperationAction$1.run(TransportShardReplicationOperationAction.java:430)
    at java.util.concurrent.ThreadPoolExecutor$Worker.runTask(Unknown Source)
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(Unknown Source)
    at java.lang.Thread.run(Unknown Source)


Thanks,
Rohit


On Fri, Jun 20, 2014 at 12:02 AM, Alexander Reelsen a...@spinscale.de
wrote:

 Hey,

 the exception you showed can possibly happen when you remove an alias.
 However, you mentioned a NullPointerException in your first post, which is
 not contained in the stack trace, so it seems that one is still missing.

 Also, please retry with a newer version of Elasticsearch.


 --Alex


 On Thu, Jun 19, 2014 at 5:13 AM, Rohit Jaiswal rohit.jais...@gmail.com
 wrote:

 Hi Alexander,
We sent you the stack trace. Can you please enlighten us
 on this?

 Thanks,
 Rohit


 On Mon, Jun 16, 2014 at 10:25 AM, Rohit Jaiswal rohit.jais...@gmail.com
 wrote:

 Hi Alexander,
 Thanks for your reply. We plan to upgrade in the
 long run, however we need to fix the data loss problem on 0.90.2 in the
 immediate term.

 Here is the stack trace -


 10:09:37.783 PM

 [22:09:37,783][WARN ][indices.cluster  ] [Storm] [b7a76aa06cfd4048987d1117f3e0433a][0] failed to start shard
 org.elasticsearch.indices.recovery.RecoveryFailedException: [b7a76aa06cfd4048987d1117f3e0433a][0]: Recovery failed from [Jeffrey Mace][_jjr5BYJQjO6QzzheyDmhw][inet[/10.4.35.200:9300]] into [Storm][FiW6mbR5ThqqSii5Wc28lQ][inet[/10.4.40.95:9300]]
     at org.elasticsearch.indices.recovery.RecoveryTarget.doRecovery(RecoveryTarget.java:293)
     at org.elasticsearch.indices.recovery.RecoveryTarget.access$300(RecoveryTarget.java:62)
     at org.elasticsearch.indices.recovery.RecoveryTarget$2.run(RecoveryTarget.java:163)
     at java.util.concurrent.ThreadPoolExecutor$Worker.runTask(Unknown Source)
     at java.util.concurrent.ThreadPoolExecutor$Worker.run(Unknown Source)
     at java.lang.Thread.run(Unknown Source)
 Caused by: org.elasticsearch.transport.RemoteTransportException: [Jeffrey Mace][inet[/10.4.35.200:9300]][index/shard/recovery/startRecovery]
 Caused by: org.elasticsearch.index.engine.RecoveryEngineException: [b7a76aa06cfd4048987d1117f3e0433a][0] Phase[2] Execution failed
     at org.elasticsearch.index.engine.robin.RobinEngine.recover(RobinEngine.java:1147)
     at org.elasticsearch.index.shard.service.InternalIndexShard.recover(InternalIndexShard.java:526)
     at org.elasticsearch.indices.recovery.RecoverySource.recover(RecoverySource.java:116)
     at org.elasticsearch.indices.recovery.RecoverySource.access$1600(RecoverySource.java:60)
 at

Re: problem indexing with my analyzer

Thank you Cédric Hourcade !

On Friday, June 20, 2014 at 15:32:29 UTC+2, Cédric Hourcade wrote:

 If your base64-encoded strings are long, they are going to be split into a
 lot of tokens by the standard tokenizer.

 These tokens are often going to be a lot longer than standard words,
 so your nGram filter will generate even more tokens, a lot more than
 with standard text. That may be your problem there.

 You should really try to strip the encoded images from your documents
 with a simple regex before indexing them. If you need to keep the
 source, put the raw text in an unindexed field, and the cleaned text in
 another.



guarding from double-start

A couple of times during my development workflow I have started the ES
script a second time. This results in red status (I use Elastic HQ) and a
non-working cluster, so I'm forced to regenerate all indexes (with all test
data) again, which takes a noticeable amount of time.

At the moment I use this script

ES_MAX_MEM=512M
export ES_MAX_MEM
cd /ES-dir/bin
./elasticsearch.in.sh 
./elasticsearch -f 


under Linux to start ES. Can you please suggest a trick to avoid ending up
in red status?
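
Editor's sketch: one possible guard is to check whether something is
already listening on the node's HTTP port before launching a second copy.
This assumes the first node uses the default HTTP port 9200 on localhost;
the helper name port_in_use is made up, and this is only one of several
ways to detect a running instance.

```python
import socket


def port_in_use(host, port, timeout=1.0):
    """Return True if something already accepts TCP connections on host:port."""
    with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as sock:
        sock.settimeout(timeout)
        return sock.connect_ex((host, port)) == 0


if port_in_use("127.0.0.1", 9200):
    print("Elasticsearch already seems to be running; not starting a second node.")
else:
    print("Port 9200 is free; safe to launch ./elasticsearch")
```

The node.max_local_storage_nodes: 1 setting in elasticsearch.yml may also be
worth a look, since it is meant to stop a second node from sharing the same
data path.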


Re: searching on nested docs - geting back the nested docs as a response

2014-06-20 Thread liorg

I am not sure highlighting will work, as I suspect it will run into the same
obstacle; see:
https://github.com/elasticsearch/elasticsearch/issues/5245

As for suggestion #2, this would break our current schema and require a
significant model change (we store the data in MongoDB as well), so I wonder
whether we are better off waiting until #3022 is solved. In the
meantime, any workaround would be appreciated...

Can we do some in-memory searching again (using native Lucene somehow)?

On Friday, June 20, 2014 1:13:42 AM UTC+3, Itamar Syn-Hershko wrote:

 It is very hard to give you concrete advice without knowing more about 
 your domain and usecases, but here are 2 points that came to mind:

 1. You can make use of the highlighting features to show the content that 
 matched. Highlighters can return whole blocks of text, and by using 
 positionIncrements correctly you can get this right.

 2. Yes, Elasticsearch is a document-oriented storage, but is it really 
 necessary for you to index entire books as one document? I'd most certainly 
 look at indexing sections or chapters maybe even pages as single documents 
 and use string references to the book ID. Unless you use data from the book 
 level along with full-text searches on the texts, which even then in some 
 scenarios I would consider denormalization.

 --

 Itamar Syn-Hershko
 http://code972.com | @synhershko https://twitter.com/synhershko
 Freelance Developer & Consultant
 Author of RavenDB in Action http://manning.com/synhershko/


 On Thu, Jun 19, 2014 at 10:13 PM, liorg lior...@gmail.com wrote:

 Well, assume we have a book type. The book holds a lot of metadata, 
 something like the following:
 {
   "author": {
     "name": "Jose",
     "lastName": "Martin"
   },
   "sections": [{
     "chapters": [{
       "pages": [{
         "pageNum": 1,
         "numOfChars": 1000,
         "text": "let my people...",
         "numofWords": 125
       },
       {
         "pageNum": 2,
         "numOfChars": 1005,
         "text": "let my people go...",
         "numofWords": 150
       }],
       "chapterName": "the start"
     },
     {
       "pages": [{
         "pageNum": 3,
         "numOfChars": 1000,
         "text": "will do...",
         "numofWords": 125
       },
       {
         "pageNum": 4,
         "numOfChars": 1005,
         "text": "will do later on...",
         "numofWords": 150
       }],
       "chapterName": "the end"
     }],
     "sectionName": "prologue"
   }]
 }

 We want to search for all the pages that have "let my people" in their 
 text and more than 100 words.
 With ES we can use nested objects and query on the nested page object, 
 but the values actually returned are the books (parents) that contain the 
 matching pages.
 If we then want to show the user the pages he was looking for, we cannot: 
 we get the whole book type back with all its metadata, not just the nested 
 objects that matched the criteria. We would need to search again (maybe in 
 memory?) for the pages that matched in order to display the search results, 
 because ES does not yet support returning only the nested objects that 
 matched.

 I hope it is better understood now.

 On Thursday, June 19, 2014 7:22:13 PM UTC+3, Itamar Syn-Hershko wrote:

 This is usually solved using parent-child, but the real question here is 
 what you mean by needing to retrieve both books & pages.

 Can you describe the actual scenario and what you are trying to achieve?

 --

 Itamar Syn-Hershko
 http://code972.com | @synhershko https://twitter.com/synhershko
 Freelance Developer & Consultant
 Author of RavenDB in Action http://manning.com/synhershko/


 On Thu, Jun 19, 2014 at 7:12 PM, liorg lior...@gmail.com wrote:

  Hi,

 We have a somewhat complex type holding nested docs with arrays (assume a 
 hierarchy of books, where each book has an array of pages containing its 
 metadata).

 We want to search on the nested docs (find all the books that have the term 
 XYZ in one of their pages), but we want to get back not only the book but 
 the matching pages themselves.

 We've understood that this is problematic to achieve with ES (see 
 https://github.com/elasticsearch/elasticsearch/issues/3022).

 A parent-child model is a problem for us, as the data model comes from our 
 existing MongoDB model (and besides, we are not sure a parent-child model 
 fits here).

 So...

 1. Is there any workaround to get the results of the nested doc (the actual 
 pages)?
 2. If not, is there a recommended way to search the data again in memory 
 after it has been narrowed down by the ES server?
 3. Any advice will be appreciated, as this is quite a big obstacle on our 
 way to implementing a solution using ES.

 thanks,

 Lior


Kibana Terms panel showing date fields as longs?

2014-06-20 Thread Chris Neal

Hello :)

I have some log data indexed in ES and am trying to visualize it in Kibana,
but I am getting strange behavior related to dates.  I have a Terms panel
with the following settings:

Terms mode: terms
Field: date
Length: 10
Order: count

For some reason, the date column in the panel is showing up as a long,
not a date:

COUNTBYDATE

Term       Count  Action
146629440  96597
146638080  60063
146620800  59480
The Table panel showing all my log entries knows that field is a date, and
it displays as a date correctly.
If I curl the request to ES, it appears ES is returning it as a long, not a
date:


curl -XGET 'http://localhost:9200/test/_search?pretty' -d '{
  "facets": {
    "terms": {
      "terms": {
        "field": "date",
        "size": 10,
        "order": "count",
        "exclude": []
      },
      "facet_filter": {
        "fquery": {
          "query": {
            "filtered": {
              "query": {
                "bool": {
                  "should": [
                    {
                      "query_string": {
                        "query": "_type:test_type"
                      }
                    }
                  ]
                }
              },
              "filter": {
                "bool": {
                  "must": [
                    {
                      "range": {
                        "date": {
                          "from": 1465925902106,
                          "to": 1466769177326
                        }
                      }
                    }
                  ]
                }
              }
            }
          }
        }
      }
    }
  },
  "size": 0
}'

returns:

{
  "took" : 387,
  "timed_out" : false,
  "_shards" : {
    "total" : 10,
    "successful" : 10,
    "failed" : 0
  },
  "hits" : {
    "total" : 48173413,
    "max_score" : 0.0,
    "hits" : [ ]
  },
  "facets" : {
    "terms" : {
      "_type" : "terms",
      "missing" : 0,
      "total" : 365090,
      "other" : 0,
      "terms" : [ {
        "term" : 146629440,
        "count" : 96697
      }, {
        "term" : 146638080,
        "count" : 60343
      }, {
        "term" : 146620800,
        "count" : 59579
      }, {
        "term" : 146612160,
        "count" : 51592
      }, {
        "term" : 146603520,
        "count" : 48859
      }, {
        "term" : 146594880,
        "count" : 48020
      } ]
    }
  }
}

Is there something I can do to have Kibana recognize the term is a date and
display it as 2014-06-17 like the Table panel does?

Thanks so much!
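
For context: Elasticsearch stores date fields internally as a long (milliseconds since the epoch), and a terms facet returns that raw number, which is why the panel shows a long. Below is a minimal client-side sketch of turning such a value back into a date, assuming the term is epoch milliseconds in UTC. The helper name `term_to_date` and the sample value are hypothetical and are not taken from the output above.

```python
from datetime import datetime, timezone


def term_to_date(term_ms):
    """Format an epoch-milliseconds term as YYYY-MM-DD (UTC)."""
    moment = datetime.fromtimestamp(term_ms / 1000, tz=timezone.utc)
    return moment.strftime("%Y-%m-%d")


print(term_to_date(1403222400000))  # 2014-06-20
```

A date_histogram facet on the same field may be another option worth trying, since it buckets by time interval rather than by distinct value.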


Result the number of matched terms for a given result.

2014-06-20 Thread Dan Harvey

Hi,

Is it possible to get Elasticsearch to return the number of terms matched 
per result in a query? I know these are evaluated, as they make up the score, 
but there doesn't seem to be a way to get a simple count.

For example, with :query => {:in => {:user_ids => [user_ids...], 
:minimum_should_match => 1}}

I would like to know how many user_ids were matched.
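
One client-side fallback, sketched under the assumption that each hit's `_source` carries the document's `user_ids` array: count the overlap between the queried IDs and the IDs stored in each hit. The function name and the sample hits are hypothetical, and this does not rely on any Elasticsearch feature for the count itself.

```python
def matched_id_counts(hits, queried_ids):
    """Map each hit's _id to the number of queried IDs present in its user_ids."""
    queried = set(queried_ids)
    return {
        hit["_id"]: len(queried & set(hit["_source"].get("user_ids", [])))
        for hit in hits
    }


# Hypothetical search hits, reduced to the fields this sketch needs.
hits = [
    {"_id": "a", "_source": {"user_ids": [1, 2, 3]}},
    {"_id": "b", "_source": {"user_ids": [3, 9]}},
]
print(matched_id_counts(hits, [2, 3, 4]))
```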

Thanks,
Dan


Re: guarding from double-start

2014-06-20 Thread Maciej Dziardziel

Use start-stop-daemon, or adapt /etc/init.d/elasticsearch to set up a pidfile 
guarding the ES instance. Or just run it this way:
pgrep -f elasticsearch || ./start_es.sh
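
The pidfile idea can also be sketched as a lock-file guard. This is a minimal sketch, assuming a Linux/Unix host; the lock path and the `acquire_single_instance_lock` helper are hypothetical, and the actual Elasticsearch launch is left as a comment.

```python
import fcntl
import os
import tempfile


def acquire_single_instance_lock(path):
    """Try to take an exclusive, non-blocking lock on `path`.

    Returns the open file object (keep it open while the guarded process
    runs), or None if another holder already has the lock.
    """
    handle = open(path, "a+")  # "a+" avoids truncating a file someone else holds
    try:
        fcntl.flock(handle, fcntl.LOCK_EX | fcntl.LOCK_NB)
    except OSError:
        handle.close()
        return None
    # We own the lock: record our PID for anyone inspecting the file.
    handle.seek(0)
    handle.truncate()
    handle.write(str(os.getpid()))
    handle.flush()
    return handle


lock_path = os.path.join(tempfile.gettempdir(), "es-start.lock")  # hypothetical path
lock = acquire_single_instance_lock(lock_path)
if lock is None:
    print("start script already running; not starting Elasticsearch again")
else:
    print("lock acquired; start Elasticsearch here, e.g. via subprocess")
```

The lock is released when the holding process exits, so the guard has to stay alive for as long as Elasticsearch does, which fits a start script that runs Elasticsearch in the foreground.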


On Friday, June 20, 2014 3:21:08 PM UTC+1, Andrew Gaydenko wrote:

 There were a couple of times during development workflow I have started ES 
 script the second time. It results in red status (I use Elastic HQ) and 
 not-working. So I'm forced to regenerate all indexes (with all test data) 
 again. It takes noticeable time. 

 At the moment I use this script

 ES_MAX_MEM=512M
 export ES_MAX_MEM
 cd /ES-dir/bin
 ./elasticsearch.in.sh 
 ./elasticsearch -f 


 under Linux to start ES. Can you please suggest a trick to avoid ending up 
 in red status?




Re: guarding from double-start

2014-06-20 Thread Ivan Brusic

You can either use the startup scripts that come with the package when you
install via apt/yum [1] or use the service wrapper [2].

[1]
http://www.elasticsearch.org/guide/en/elasticsearch/reference/current/setup-repositories.html
[2] https://github.com/elasticsearch/elasticsearch-servicewrapper

--
Ivan

On Fri, Jun 20, 2014 at 7:49 AM, Maciej Dziardziel fied...@gmail.com
wrote:

use start-stop-daemon or adapt /etc/init.d/elasticsearch to set up pidfile
guarding es instance. Or just run this way:
pgrep -f elasticsearch || ./start_es.sh

On Friday, June 20, 2014 3:21:08 PM UTC+1, Andrew Gaydenko wrote:

There were a couple of times during development workflow I have started
ES script the second time. It results in red status (I use Elastic HQ) and
not-working. So I'm forced to regenerate all indexes (with all test data)
again. It takes noticeable time.

At the moment I use this script

ES_MAX_MEM=512M
export ES_MAX_MEM
cd /ES-dir/bin
./elasticsearch.in.sh
./elasticsearch -f

under Linux to start ES. Can you please suggest a trick to avoid
ending up in red status?


Re: guarding from double-start

On Friday, June 20, 2014 6:49:04 PM UTC+4, Maciej Dziardziel wrote:

 use start-stop-daemon or adapt /etc/init.d/elasticsearch to set up pidfile 
 guarding es instance. Or just run this way:
 pgrep -f elasticsearch || ./start_es.sh


Aha, thanks! In my case pgrep is the most appropriate.




[ANN] Elasticsearch Thrift transport plugin 2.2.0 released

2014-06-20 Thread Elasticsearch Team


Heya,


We are pleased to announce the release of the Elasticsearch Thrift transport 
plugin, version 2.2.0.

The Thrift transport plugin allows you to use the REST interface over Thrift, 
in addition to HTTP.

https://github.com/elasticsearch/elasticsearch-transport-thrift/

Release Notes - elasticsearch-transport-thrift - Version 2.2.0



Update:
 * [28] - Update to elasticsearch 1.2.0 
(https://github.com/elasticsearch/elasticsearch-transport-thrift/issues/28)


Doc:
 * [25] - Add documentation on missing settings 
(https://github.com/elasticsearch/elasticsearch-transport-thrift/issues/25)


Issues, Pull requests, Feature requests are warmly welcome on 
elasticsearch-transport-thrift project repository: 
https://github.com/elasticsearch/elasticsearch-transport-thrift/
For questions or comments around this plugin, feel free to use elasticsearch 
mailing list: https://groups.google.com/forum/#!forum/elasticsearch

Enjoy,

-The Elasticsearch team


Re: Splunk vs. Elastic search performance?

Thomas,

Thanks for your insights and experiences. As I am someone who has explored 
and used ES for over a year but is relatively new to the ELK stack, your 
data points are extremely valuable. Let me offer some of my own views.

Re: double the storage. I strongly recommend that ELK users disable the _all 
field. The entire text of the log events generated by logstash ends up in 
the message field (and not @message, as many people incorrectly post), so 
the _all field is just redundant overhead with no added value. Disabling it 
gives a dramatic drop in database file sizes and a dramatic increase in load 
performance. Of course, you then need to configure ES to use the message field 
as the default for a Lucene Kibana query.
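
A minimal sketch of what such a template body could look like, built in Python so it can be serialized and sent with any HTTP client. The `logstash-*` pattern and the `message` field come from this thread; using `index.query.default_field` as the setting that controls the default query field is an assumption here and should be checked against the documentation for the ES version in use.

```python
import json

# Index template body: _all disabled, "message" as the default query field.
# "index.query.default_field" is assumed to be the relevant index setting.
template = {
    "template": "logstash-*",
    "settings": {
        "index.query.default_field": "message",
    },
    "mappings": {
        "_default_": {
            "_all": {"enabled": False},
            "properties": {
                "message": {"type": "string"},
            },
        }
    },
}

print(json.dumps(template, indent=2))
```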

During the year that I've used ES and watched this group, I have been on 
the front line of a brand new product with a smart and dedicated 
development team working steadily to improve the product. Six months ago, 
the ELK stack eluded me and reports weren't encouraging (with the sole 
exception of the Kibana web site's marketing pitch). But ES has come a long 
way since six months ago, and the ELK stack is much more closely integrated.

The Splunk UI is carefully crafted to isolate users from each other and 
prevent external (to the Splunk db itself, not to our company) users from 
causing harm to data. But Kibana seems to be meant for a small cadre of 
trusted users. What if I write a dashboard with the same name as someone 
else's? Kibana doesn't even begin to discuss user isolation. But I am 
confident that it will.

How can I tell Kibana to set the default Lucene query operator to AND 
instead of OR? Google is not my friend: I keep getting references to the 
Ruby versions of Kibana; that's ancient history by now. Kibana is cool and 
promising, but it has a long way to go before deployment to all of the folks 
in our company who currently have access to Splunk.

Logstash has a nice book that's been very helpful, and logstash itself has 
been an excellent tool for prototyping. The book has been invaluable in 
helping me extract dates from log events and handling all of our different 
multiline events. But it still doesn't explain why the date filter needs a 
different array of matching strings to get the date that the grok filter 
has already matched and isolated. And recommendations to avoid the 
elasticsearch_http output and use elasticsearch (via the Node client) 
directly contradict the fact that logstash's 1.1.1 version of the ES client 
library is not compatible with the most recent 1.2.1 version of ES.

And logstash is also a resource hog, so we eventually plan to replace it 
with Perl and Apache Flume (already in use) and pipe it into my Java bulk 
load tool (which is always kept up-to-date with the versions of ES we 
deploy!!). Because we send the data via Flume to our data warehouse, any 
losses in ES will be annoying but won't be catastrophic. And the front-end 
following of rotated log files will be done using the GNU *tail -F* command 
and option. This GNU tail command with its uppercase -F option follows 
rotated log files perfectly. I doubt that logstash can do the same, and we 
currently see that neither can Splunk (so we sporadically lose log events 
in Splunk too). So GNU tail -F piped into logstash with the stdin input 
works perfectly in my evaluation setup and will likely form the first stage 
of any log forwarder we end up deploying.

Brian

On Thursday, June 19, 2014 8:48:34 AM UTC-4, Thomas Paulsen wrote:

 We had a 2,2TB/d installation of Splunk and ran it on VMWare with 12 
 indexers and 2 search heads. Each indexer had 1000 IOPS guaranteed. 
 The system is slow but OK to use. 

 We tried Elasticsearch and we were able to get the same performance with 
 the same number of machines. Unfortunately, with Elasticsearch you need 
 almost double the amount of storage, plus a LOT of patience to make it run. 
 It took us six months to set it up properly, and even now the system is quite 
 buggy and unstable, and from time to time we lose data with Elasticsearch. 

 I don't recommend ELK for a critical production system. For just dev work 
 it is OK, if you don't mind the hassle of setting it up and operating it. 
 The costs you save by not buying a Splunk license you have to invest in 
 consultants to get it up and running. Our dev teams hate Elasticsearch and 
 prefer Splunk.



boolean multi-field silently ignored in 1.2.1

2014-06-20 Thread Bruce Ritchie

I'm seeing multi-fields of type boolean silently being reduced to a normal 
boolean field in 1.2.1 which wasn't the behavior in 0.90.9. 
See https://gist.github.com/Omega359/0c2a93690b4db30693a1 for an example of 
this.

Is this expected? To me it seems like it should work - the boolean field 
mapper seems to be calling out to multiFieldsBuilder - but I'm not versed 
enough in the internals of ES to know where if at all it's broken.


Bruce


Re: Penalty or boost from a boolean property

Function_score is the way to go IMHO.
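
A minimal sketch of what that could look like, assuming the 1.x `function_score` syntax with per-function `filter` and `boost_factor`; the exact parameter names should be checked against the docs for the version in use. The helper name `boosted_query` and the factor values are hypothetical, and `champ1`/`champ2` are the field names from the question quoted below. The point is that the thumbs flags only scale the score of documents that already matched the base query, instead of acting as extra `should` clauses that can match on their own.

```python
import json


def boosted_query(base_query, up_factor=2.0, down_factor=0.5):
    """Wrap base_query so thumbsup multiplies the score up and thumbsdown down."""
    return {
        "query": {
            "function_score": {
                "query": base_query,
                "functions": [
                    {"filter": {"term": {"thumbsup": True}},
                     "boost_factor": up_factor},
                    {"filter": {"term": {"thumbsdown": True}},
                     "boost_factor": down_factor},
                ],
                # Combine the matching functions by multiplication.
                "score_mode": "multiply",
            }
        }
    }


base = {
    "bool": {
        "should": [
            {"term": {"champ1": "valeur1"}},
            {"term": {"champ2": "valeur2"}},
        ]
    }
}
print(json.dumps(boosted_query(base), indent=2))
```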

Best

--
David ;-)
Twitter : @dadoonet / @elasticsearchfr / @scrutmydocs

On 20 June 2014 at 19:50, hugo lassiege hlassi...@gmail.com wrote:

Hi,

I'm looking for help :) This is maybe trivial but I can't find a good
solution.

I have some documents, and those documents have two boolean properties,
basically thumbs up and thumbs down, to show whether the administrator
approves of those documents or not.
I am trying to boost a document if it is thumbsup or demote it if it is
thumbsdown. It's not a filter: the document can still be retrieved, it's just
more or less relevant.

I tried with two should clauses in the global request:

{
  "bool" : {
    "should" : [
      {
        "term" : { "champ1" : "valeur1" }
      },
      {
        "term" : { "champ2" : "valeur2" }
      },
      {
        "term" : { "thumbsup" : true }
      },
      {
        "term" : { "thumbsdown" : false }
      }
    ]
  }
}

But I get some irrelevant documents because they match the last conditions.
What would be the best method for this use case?

Penalty or boost from a boolean property

2014-06-20 Thread hugo lassiege

Hi,

I'm looking for help :) This is maybe trivial but I can't find a good 
solution. 

I have some documents, and those documents have two boolean properties, 
basically thumbs up and thumbs down, to show whether the administrator 
approves of those documents or not. 
I am trying to boost a document if it is thumbsup or demote it if it is 
thumbsdown. It's not a filter: the document can still be retrieved, it's 
just more or less relevant. 

I tried with two should clauses in the global request:


{
  "bool" : {
    "should" : [
      {
        "term" : { "champ1" : "valeur1" }
      },
      {
        "term" : { "champ2" : "valeur2" }
      },
      {
        "term" : { "thumbsup" : true }
      },
      {
        "term" : { "thumbsdown" : false }
      }
    ]
  }
}


But I get some irrelevant documents because they match the last conditions. 
What would be the best method for this use case?


Getting complete value from ElasticSearch query

2014-06-20 Thread Vinay Pandey

I have the following structure on my ElasticSearch:

{
  _index: 3_exposureindex
  _type: exposuresearch
  _id: 12738
  _version: 4
  _score: 1
  _source: {
    Name: test2_update
    Description:
    CreateUserId: 8
    SourceId: null
    Id: 12738
    ExposureId: 12738
    CreateDate: 2014-06-20T16:18:50.500
    UpdateDate: 2014-06-20T16:19:57.547
    UpdateUserId: 8
  }
  fields: {
    _parent: 1
  }
}


I am trying to get both the data in `_source` and the data in `fields`. 
When I run the query:

{
  "query": {
    "terms": {
      "Id": [
        12738
      ]
    }
  }
}


all I get are the values contained in `_source`, whereas if I run the 
query:

{
  "fields": [
    "_parent"
  ],
  "query": {
    "terms": {
      "Id": [
        12738
      ]
    }
  }
}


then I only get the `fields`. Is there a way to get both? I will be grateful 
for any help.


Re: Getting complete value from ElasticSearch query

2014-06-20 Thread Vinay Pandey

I forgot to mention that I have asked the same question in StackOverflow 
http://stackoverflow.com/questions/24333655/getting-complete-value-from-elasticsearch-query

On Friday, June 20, 2014 11:52:49 AM UTC-7, Vinay Pandey wrote:

 I have the following structure on my ElasticSearch:

 {
 _index: 3_exposureindex
 _type: exposuresearch
 _id: 12738
 _version: 4
 _score: 1
 _source: {
 Name: test2_update
 Description:
 CreateUserId: 8
 SourceId: null
 Id: 12738
 ExposureId: 12738
 CreateDate: 2014-06-20T16:18:50.500
 UpdateDate: 2014-06-20T16:19:57.547
 UpdateUserId: 8
 }
 fields: {
 _parent: 1
 }
 }


 I am trying to get both, the data in `_source` as well as that in 
 `fields`, when I run the query:

 {
   query: {
 terms: {
   Id: [
 12738
   ]
 }
   }
 }


 All I get are the values contained in `_source`, whereas, if I run the 
 query:

 {
   fields: [
 _parent
   ],
   query: {
 terms: {
   Id: [
 12738
   ]
 }
   }
 }


 Then I only get the `fields`. Is there a way to get both? I will be grateful 
 for any help.



deleting documents that are missing fields

2014-06-20 Thread Jeff Dupont

I can easily query for documents that are missing a particular term field, 
however I'd like to free up that space and remove those documents. I've 
tried this with no luck:

DELETE /my_index/pages/_search
{
  "filter" : {
    "missing" : {
      "field" : "sentences",
      "existence" : true,
      "null_value" : true
    }
  }
}


It works fine to find them, but I can't find an easy way to remove them, and 
I have about 2 million to remove as well.


Re: Getting complete value from ElasticSearch query

2014-06-20 Thread Vinay Pandey

This just got answered:

You should be able to specify _source in the fields

Example:

{
  "fields": [
    "_parent",
    "_source"
  ],
  "query": {
    "terms": {
      "Id": [
        12738
      ]
    }
  }
}



On Friday, June 20, 2014 11:52:49 AM UTC-7, Vinay Pandey wrote:

 I have the following structure on my ElasticSearch:

 {
 _index: 3_exposureindex
 _type: exposuresearch
 _id: 12738
 _version: 4
 _score: 1
 _source: {
 Name: test2_update
 Description:
 CreateUserId: 8
 SourceId: null
 Id: 12738
 ExposureId: 12738
 CreateDate: 2014-06-20T16:18:50.500
 UpdateDate: 2014-06-20T16:19:57.547
 UpdateUserId: 8
 }
 fields: {
 _parent: 1
 }
 }


 I am trying to get both, the data in `_source` as well as that in 
 `fields`, when I run the query:

 {
   query: {
 terms: {
   Id: [
 12738
   ]
 }
   }
 }


 All I get are the values contained in `_source`, whereas, if I run the 
 query:

 {
   fields: [
 _parent
   ],
   query: {
 terms: {
   Id: [
 12738
   ]
 }
   }
 }


 Then I only get the `fields`. Is there a way to get both? I will be grateful 
 for any help.



Terms aggregation for multiple fields

2014-06-20 Thread Madhavan Ramachandran

Hi Team,
I am new to Elasticsearch and am learning about its search/query API.

I have a requirement to fetch data from ES. Assume my data is in a table
format, as below:

Prop-Name  Type      Use
Place1     Sale      Office
Place2     Lease     Office
Place3     SubLease  Office
Place4     Sale      Industry

So for Type I have Sale, Lease and SubLease as distinct values for the
property. Similarly, for Use I have 7 distinct values.

I have loaded the data into ES. At page load, I need to show the count of
each Type and each Use.

Upon selection of a Type, I need to filter the Use values, and vice versa.

Assume we have 30 places in total for Type Sale; then Use might have
Office 15 and Industry 15.

When I select Office (15), I need to find how many documents of each Type
belong to Office.

1. At all times, I have to populate the distinct values (3 Types and 7 Uses)
and their counts based on the current selection.
2. How do I aggregate if the Use field has a value such as Multi-family
and I want it shown as one aggregated value? My current query returns two
results for this value.
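
The two points above can be sketched as follows. This is a minimal sketch, not a tested recipe: the field names `Type` and `Use` are taken from the table, while the helper names `facet_request` and `local_counts` are hypothetical. It assumes both fields are mapped as `not_analyzed` strings; with the default analyzer a value like Multi-family is most likely split into two tokens, which would explain the two buckets. `local_counts` is only a local stand-in that shows the counts one would expect back from the aggregations.

```python
import json
from collections import Counter


def facet_request(selected_type=None, selected_use=None):
    """Search body with one terms aggregation per field, narrowed by selection."""
    filters = []
    if selected_type is not None:
        filters.append({"term": {"Type": selected_type}})
    if selected_use is not None:
        filters.append({"term": {"Use": selected_use}})
    query = {"match_all": {}}
    if filters:
        query = {"filtered": {"query": {"match_all": {}},
                              "filter": {"bool": {"must": filters}}}}
    return {
        "size": 0,  # only the aggregation buckets are needed, not the hits
        "query": query,
        "aggs": {
            "types": {"terms": {"field": "Type"}},
            "uses": {"terms": {"field": "Use"}},
        },
    }


# The sample rows from the table above.
rows = [
    {"Prop-Name": "Place1", "Type": "Sale", "Use": "Office"},
    {"Prop-Name": "Place2", "Type": "Lease", "Use": "Office"},
    {"Prop-Name": "Place3", "Type": "SubLease", "Use": "Office"},
    {"Prop-Name": "Place4", "Type": "Sale", "Use": "Industry"},
]


def local_counts(rows, field, **selected):
    """Count distinct values of `field` among rows matching the selection."""
    return Counter(r[field] for r in rows
                   if all(r[k] == v for k, v in selected.items()))


print(json.dumps(facet_request(selected_use="Office"), indent=2))
print(dict(local_counts(rows, "Type", Use="Office")))
```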

Regards
Madhavan.TR


Re: cassandra river plugin installation issue

2014-06-20 Thread Shams Haque

Hi,

The issue was not with the Hector API; it was fixed by using WITH 
COMPACT STORAGE when creating the column families in Cassandra.
I have posted the details here: 
http://stackoverflow.com/questions/21089453/cassandra-column-name-trailing-with-blank-characters



Re: deleting documents that are missing fields

2014-06-20 Thread Ivan Brusic

I do not use delete by query, but have you tried using a fully formed query
and not just a filter? Perhaps an implicit match_all query is not being
set. Try using a filtered query with a match_all query and your filter.

http://www.elasticsearch.org/guide/en/elasticsearch/reference/current/query-dsl-filtered-query.html
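
The suggestion above can be sketched as a request body: a filtered query with an explicit match_all plus the missing filter from the original post. The helper name `delete_missing_field_body` is hypothetical. As far as I understand the 1.x API, delete by query is sent as a DELETE to the `_query` endpoint (not `_search`) with a top-level `query` element; treat that as an assumption and verify it against the docs for your version before running it on 2 million documents.

```python
import json


def delete_missing_field_body(field):
    """Delete-by-query body matching documents where `field` is missing or null."""
    return {
        "query": {
            "filtered": {
                "query": {"match_all": {}},
                "filter": {
                    "missing": {
                        "field": field,
                        "existence": True,
                        "null_value": True,
                    }
                },
            }
        }
    }


# Assumed target: DELETE /my_index/pages/_query with this body.
path = "/my_index/pages/_query"
print("DELETE", path)
print(json.dumps(delete_missing_field_body("sentences"), indent=2))
```

Running the same body against `_search` first, with a GET, is a cheap way to confirm that it matches only the documents intended for removal.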

--
Ivan

On Fri, Jun 20, 2014 at 12:13 PM, Jeff Dupont jeff.dup...@gmail.com wrote:

I can easily query for documents that are missing a particular term field,
however I'd like to free up that space and remove those documents. I've
tried this with no luck:

DELETE /my_index/pages/_search
{
filter : {
missing : {
field : sentences,
existence : true,
null_value : true
}
}
}

It works fine to find them, but i can't find an easy way to remove them
and I have about 2million to remove as well.


Re: get rid of _all to optimize storage and perfs (Re: Splunk vs. Elastic search performance?)

Patrick,

Here's my template, along with where the _all field is disabled. You may 
wish to add this setting to your own template, and then also add the index 
setting to ignore malformed data (if someone's log entry occasionally slips 
in null or no-data instead of the usual numeric value):

{
  "automap" : {
    "template" : "logstash-*",
    "settings" : {
      "index.mapping.ignore_malformed" : true
    },
    "mappings" : {
      "_default_" : {
        "numeric_detection" : true,
        "_all" : { "enabled" : false },
        "properties" : {
          "message" : { "type" : "string" },
          "host" : { "type" : "string" },
          "UUID" : { "type" : "string", "index" : "not_analyzed" },
          "logdate" : { "type" : "string", "index" : "no" }
        }
      }
    }
  }
}

Brian


Re: HIVE-Elasticsearch [mapr-elasticsearch] write to elasticsearch issue

2014-06-20 Thread shankarramshivram

Hi Costin,

Thanks for the tip. I replaced the old version of Jackson and it works now :).

Cheers
Shankar

On Sunday, June 15, 2014 3:09:27 AM UTC-6, Costin Leau wrote:

 What version of MapR are you using? MapR uses an old version of Jackson,
 which es-hadoop should detect and then use an appropriate code path.
 There are various fixes:

 1. I've pushed a fix on the 2.x branch which improves detection - you can
 try the 2.0.1.BUILD-SNAPSHOT version here [a]
 2. You can upgrade the Jackson version in MapR to version 1.7 or higher
 (vanilla Hadoop uses 1.8.8). This approach works with the current
 es-hadoop and also gives you a performance boost for serializing data.

 Cheers,

 [a]
 https://github.com/elasticsearch/elasticsearch-hadoop#development-snapshot

 On 6/13/14 11:30 PM, shankarr...@gmail.com wrote:
  Hi,

  I am trying to integrate Elasticsearch with a MapR Hadoop cluster. I am
  using the hive-elasticsearch integration document. I am able to read data
  from the Elasticsearch node. However, I am not able to write data into the
  Elasticsearch node, which is my primary requirement. Please guide me.
  
  I always get the following errors:

  2014-06-13 14:15:45,814 INFO org.apache.hadoop.hive.ql.exec.FileSinkOperator: New Final Path: FS maprfs:/user/hive/warehouse/dev.db/_tmp.shankar/02_0
  2014-06-13 14:15:45,947 FATAL org.apache.hadoop.hive.ql.exec.mr.ExecMapper: java.lang.NoSuchMethodError: org.codehaus.jackson.JsonGenerator.writeUTF8String([BII)V
      at org.elasticsearch.hadoop.serialization.json.JacksonJsonGenerator.writeUTF8String(JacksonJsonGenerator.java:123)
      at org.elasticsearch.hadoop.mr.WritableValueWriter.write(WritableValueWriter.java:47)
      at org.elasticsearch.hadoop.hive.HiveWritableValueWriter.write(HiveWritableValueWriter.java:83)
      at org.elasticsearch.hadoop.hive.HiveWritableValueWriter.write(HiveWritableValueWriter.java:38)
      at org.elasticsearch.hadoop.hive.HiveValueWriter.write(HiveValueWriter.java:69)
      at org.elasticsearch.hadoop.hive.HiveValueWriter.write(HiveValueWriter.java:111)
      at org.elasticsearch.hadoop.hive.HiveValueWriter.write(HiveValueWriter.java:55)
      at org.elasticsearch.hadoop.hive.HiveValueWriter.write(HiveValueWriter.java:41)
      at org.elasticsearch.hadoop.serialization.builder.ContentBuilder.value(ContentBuilder.java:258)
      at org.elasticsearch.hadoop.serialization.bulk.TemplatedBulk.doWriteObject(TemplatedBulk.java:92)
      at org.elasticsearch.hadoop.serialization.bulk.TemplatedBulk.write(TemplatedBulk.java:79)
      at org.elasticsearch.hadoop.hive.EsSerDe.serialize(EsSerDe.java:128)
      at org.apache.hadoop.hive.ql.exec.FileSinkOperator.processOp(FileSinkOperator.java:582)
      at org.apache.hadoop.hive.ql.exec.Operator.forward(Operator.java:793)
      at org.apache.hadoop.hive.ql.exec.SelectOperator.processOp(SelectOperator.java:87)
      at org.apache.hadoop.hive.ql.exec.Operator.forward(Operator.java:793)
      at org.apache.hadoop.hive.ql.exec.TableScanOperator.processOp(TableScanOperator.java:92)
      at org.apache.hadoop.hive.ql.exec.Operator.forward(Operator.java:793)
      at org.apache.hadoop.hive.ql.exec.MapOperator.process(MapOperator.java:540)
      at org.apache.hadoop.hive.ql.exec.mr.ExecMapper.map(ExecMapper.java:177)
      at org.apache.hadoop.mapred.MapRunner.run(MapRunner.java:50)
      at org.apache.hadoop.mapred.MapTask.runOldMapper(MapTask.java:417)
      at org.apache.hadoop.mapred.MapTask.run(MapTask.java:348)
      at org.apache.hadoop.mapred.Child$4.run(Child.java:282)
      at java.security.AccessController.doPrivileged(Native Method)
      at javax.security.auth.Subject.doAs(Subject.java:415)
      at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1117)
      at org.apache.hadoop.mapred.Child.main(Child.java:271)

  2014-06-13 14:15:45,947 INFO org.apache.hadoop.hive.ql.exec.MapOperator: 3 finished. closing...
  2014-06-13 14:15:45,947 INFO org.apache.hadoop.hive.ql.exec.MapOperator: DESERIALIZE_ERRORS:0
  2014-06-13 14:15:45,947 INFO org.apache.hadoop.hive.ql.exec.TableScanOperator: 0 finished. closing...
  2014-06-13 14:15:45,947 INFO org.apache.hadoop.hive.ql.exec.SelectOperator: 1 finished. closing...
  2014-06-13 14:15:45,947 INFO org.apache.hadoop.hive.ql.exec.FileSinkOperator: 2 finished. closing...
  2014-06-13 14:15:45,948 INFO org.apache.hadoop.hive.ql.exec.FileSinkOperator: 2 Close done
  2014-06-13 14:15:45,948 INFO org.apache.hadoop.hive.ql.exec.SelectOperator: 1 Close done
  2014-06-13 14:15:45,948 INFO org.apache.hadoop.hive.ql.exec.TableScanOperator: 0 Close done
  2014-06-13 14:15:45,948 INFO

issues with file input from logstash to elastic - please read

2014-06-20 Thread Eitan Vesely

Guys,
It's been more than a week that I've been struggling with this issue.
If possible, please take a look and try to help :-(

I have a config file that I'm running Logstash with, which is supposed to
fetch the log file I specified in it and stream it to Elasticsearch.

The problem is that it worked twice and that's it. No changes were made to
the file, and most of the time it doesn't load the data and doesn't show any
error message. When I change the input from file to stdin, it works fine.

This is the config file. I believe the syntax is correct, since it did
work twice...

input {
  file {
    path => "C:\elasticsearch-1.2.0\testLog.txt"
    start_position => "beginning"
  }
}
output {
  elasticsearch {
    host => "localhost"
    index => "tester3"
    protocol => "http"
  }
}

Disabling date detection [Hive-Elasticsearch]

2014-06-20 Thread shankarramshivram

Hi,

My write to ES from MapR fails because automatic date detection is
enabled. Is there a way to disable date detection from the external
Hive table properties?
Please guide me regarding this.

Re: boolean multi-field silently ignored in 1.2.1

heya bruce

that looks like a bug - please open an issue

clint

On 20 June 2014 19:41, Bruce Ritchie bruce.ritc...@gmail.com wrote:

I'm seeing multi-fields of type boolean silently being reduced to a normal
boolean field in 1.2.1 which wasn't the behavior in 0.90.9. See
https://gist.github.com/Omega359/0c2a93690b4db30693a1 for an example of
this.

Is this expected? To me it seems like it should work - the boolean field
mapper seems to be calling out to multiFieldsBuilder - but I'm not versed
enough in the internals of ES to know where, if at all, it's broken.

Bruce

Re: issues with file input from logstash to elastic - please read

2014-06-20 Thread Mark Walkom

You'll have better luck sending this to the Logstash mailing list :)

Regards,
Mark Walkom

Infrastructure Engineer
Campaign Monitor
email: ma...@campaignmonitor.com
web: www.campaignmonitor.com

On 21 June 2014 08:02, Eitan Vesely eitan...@gmail.com wrote:

Guys,
It's been more than a week that I've been struggling with this issue.
If possible, please take a look and try to help :-(

I have a config file that I'm running Logstash with, which is supposed to
fetch the log file I specified in it and stream it to Elasticsearch.

The problem is that it worked twice and that's it. No changes were made to
the file, and most of the time it doesn't load the data and doesn't show any
error message. When I change the input from file to stdin, it works fine.

This is the config file. I believe the syntax is correct, since it did
work twice...

input {
  file {
    path => "C:\elasticsearch-1.2.0\testLog.txt"
    start_position => "beginning"
  }
}
output {
  elasticsearch {
    host => "localhost"
    index => "tester3"
    protocol => "http"
  }
}

Re: How to find the number of authors who have written between 2-3 books?

Alternatively, if you model this with parent-child, then you can use
min_children/max_children, which will be available in the next release:

http://www.elasticsearch.org/guide/en/elasticsearch/reference/current/query-dsl-has-child-filter.html#_min_max_children_2

clint

On 20 June 2014 17:15, Mike mnilsson2...@gmail.com wrote:

I'm ok with the count returned being some estimate. Say in this simple
example if it returned 1 for just Joe, or 3 for John, Joe, and Jack that
would be ok too. I am also ok with restructuring my data in any way to
more efficiently get this number.

You mentioned creating a reference count document. How would that look?
1 doc per unique author, with a count of the total number of books he
wrote so then I can do a range aggregation on that number? What if I
wanted to find the number of authors who have written between 2-3 books
that have a title containing E, F, H, or I (still 2 in this case, John and
Joe) ?

On Thursday, June 19, 2014 6:43:41 PM UTC-4, Itamar Syn-Hershko wrote:

This is a Map/Reduce operation; you'll be better off maintaining a
ref-count document, IMO, than trying to hack the aggregations framework to
support this.

Another reason for doing it that way is in a distributed environment some
aggregations can't be computed to an exact value - the Terms bucketing is
one example. So if you need exact values, I'd go for a model that does it.
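
A minimal sketch of the ref-count idea in plain Python, with no Elasticsearch client involved: one small document per author carrying the number of books, so the "between 2 and 3 books" question becomes a simple range check on a stored number. The field name book_count and both function names are hypothetical, and the sample data is the set of books from the original question. In a real setup the counts would be updated at index time.

```python
from collections import Counter

# Sample data from the original question: one document per book.
BOOKS = (
    [{"title": t, "author": "Mike"} for t in "ABCD"]
    + [{"title": t, "author": "John"} for t in "EFG"]
    + [{"title": t, "author": "Joe"} for t in "HI"]
    + [{"title": "J", "author": "Jack"}]
)


def build_ref_counts(books):
    """Return one ref-count document per author (hypothetical field names)."""
    counts = Counter(book["author"] for book in books)
    return [{"author": a, "book_count": n} for a, n in sorted(counts.items())]


def authors_in_range(ref_docs, low, high):
    """Return authors whose stored count lies in [low, high], inclusive."""
    return [d["author"] for d in ref_docs if low <= d["book_count"] <= high]


if __name__ == "__main__":
    docs = build_ref_counts(BOOKS)
    print(authors_in_range(docs, 2, 3))
```

Because each author has exactly one ref-count document, the answer is exact and does not depend on how a terms aggregation is approximated across shards.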

Itamar Syn-Hershko
http://code972.com | @synhershko https://twitter.com/synhershko
Freelance Developer Consultant
Author of RavenDB in Action http://manning.com/synhershko/

On Fri, Jun 20, 2014 at 1:34 AM, Mike mnilss...@gmail.com wrote:

Assume each document is a book:
{ "title": "A", "author": "Mike" }
{ "title": "B", "author": "Mike" }
{ "title": "C", "author": "Mike" }
{ "title": "D", "author": "Mike" }

{ "title": "E", "author": "John" }
{ "title": "F", "author": "John" }
{ "title": "G", "author": "John" }

{ "title": "H", "author": "Joe" }
{ "title": "I", "author": "Joe" }

{ "title": "J", "author": "Jack" }

What is the best way to find the number of authors who have written
between 2-3 books? In this case it would be 2, John and Joe.

I know I can do a terms aggregation on author, set size to be very very
large, and then on the client side traverse through the thousands of
authors and count how many had between 2-3. Is there a more efficient way
to do this? The cardinality aggregation is almost what I want, if only I
could specify a min and max term count.
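
The client-side approach described above can be sketched in a few lines of Python. This assumes the search response has already been parsed into a dict and that the terms aggregation was named "authors" (a hypothetical name); the buckets list with key and doc_count entries follows the usual terms-aggregation response shape. The sample response is hand-written for illustration and is not real cluster output.

```python
def count_authors_between(response, agg_name, low, high):
    """Count terms buckets whose doc_count lies in [low, high], inclusive."""
    buckets = response["aggregations"][agg_name]["buckets"]
    return sum(1 for bucket in buckets if low <= bucket["doc_count"] <= high)


# Hand-written sample response matching the books listed in the question.
SAMPLE_RESPONSE = {
    "aggregations": {
        "authors": {
            "buckets": [
                {"key": "Mike", "doc_count": 4},
                {"key": "John", "doc_count": 3},
                {"key": "Joe", "doc_count": 2},
                {"key": "Jack", "doc_count": 1},
            ]
        }
    }
}

if __name__ == "__main__":
    print(count_authors_between(SAMPLE_RESPONSE, "authors", 2, 3))
```

The cost of this approach is that every author bucket has to travel to the client, which is exactly the inefficiency the question is about.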

Re: Splunk vs. Elastic search performance?

2014-06-20 Thread Mark Walkom

I wasn't aware that the elasticsearch_http output wasn't recommended?
When I spoke to a few of the ELK devs a few months ago, they indicated that
there was minimal performance difference, with the greater benefit of not
being locked to specific LS+ES versioning.

Regards,
Mark Walkom

Infrastructure Engineer
Campaign Monitor
email: ma...@campaignmonitor.com
web: www.campaignmonitor.com


On 21 June 2014 02:43, Brian brian.from...@gmail.com wrote:

 Thomas,

 Thanks for your insights and experiences. As I am someone who has explored
 and used ES for over a year but is relatively new to the ELK stack, your
 data points are extremely valuable. Let me offer some of my own views.

 Re: double the storage. I strongly recommend ELK users to disable the _all
 field. The entire text of the log events generated by logstash ends up in
 the message field (and not @message as many people incorrectly post). So
 the _all field is just redundant overhead with no value add. The result is
 a dramatic drop in database file sizes and dramatic increase in load
 performance. Of course, you need to configure ES to use the message field
 as the default for a Lucene Kibana query.

 During the year that I've used ES and watched this group, I have been on
 the front line of a brand new product with a smart and dedicated
 development team working steadily to improve the product. Six months ago,
 the ELK stack eluded me and reports weren't encouraging (with the sole
 exception of the Kibana web site's marketing pitch). But ES has come a long
 way since six months ago, and the ELK stack is much more closely integrated.

 The Splunk UI is carefully crafted to isolate users from each other and
 prevent external (to the Splunk db itself, not to our company) users from
 causing harm to data. But Kibana seems to be meant for a small cadre of
 trusted users. What if I write a dashboard with the same name as someone
 else's? Kibana doesn't even begin to discuss user isolation. But I am
 confident that it will.

 How can I tell Kibana to set the default Lucene query operator to AND
 instead of OR? Google is not my friend: I keep getting references to the
 Ruby versions of Kibana; that's ancient history by now. Kibana is cool and
 promising, but it has a long way to go for deployment to all of the folks
 in our company who currently have access to Splunk.

 Logstash has a nice book that's been very helpful, and logstash itself has
 been an excellent tool for prototyping. The book has been invaluable in
 helping me extract dates from log events and handling all of our different
 multiline events. But it still doesn't explain why the date filter needs a
 different array of matching strings to get the date that the grok filter
 has already matched and isolated. And recommendations to avoid the
 elasticsearch_http output and use elasticsearch (via the Node client)
 directly contradict the fact that logstash's 1.1.1 version of the ES client
 library is not compatible with the most recent 1.2.1 version of ES.

 And logstash is also a resource hog, so we eventually plan to replace it
 with Perl and Apache Flume (already in use) and pipe it into my Java bulk
 load tool (which is always kept up-to-date with the versions of ES we
 deploy!!). Because we send the data via Flume to our data warehouse, any
 losses in ES will be annoying but won't be catastrophic. And the front-end
 following of rotated log files will be done using the GNU *tail -F* command
 and option. This GNU tail command with its uppercase -F option follows
 rotated log files perfectly. I doubt that logstash can do the same, and we
 currently see that neither can Splunk (so we sporadically lose log events
 in Splunk too). So GNU tail -F piped into logstash with the stdin input
 works perfectly in my evaluation setup and will likely form the first stage
 of any log forwarder we end up deploying.

 Brian

 On Thursday, June 19, 2014 8:48:34 AM UTC-4, Thomas Paulsen wrote:

 We had a 2,2TB/d installation of Splunk and ran it on VMware with 12
 indexers and 2 search heads. Each indexer had 1000 IOPS guaranteed.
 The system is slow but OK to use.

 We tried Elasticsearch and we were able to get the same performance with
 the same number of machines. Unfortunately, with Elasticsearch you need
 almost double the amount of storage, plus a LOT of patience to make it run.
 It took us six months to set it up properly, and even now the system is quite
 buggy and unstable, and from time to time we lose data with Elasticsearch.

 I don't recommend ELK for a critical production system. For just dev
 work it is OK, if you don't mind the hassle of setting up and operating
 it. The costs you save by not buying a Splunk license you have to invest
 in consultants to get it up and running. Our dev teams hate Elasticsearch
 and prefer Splunk.


Re: guarding from double-start

And in your config file, set:

node.max_local_storage_nodes: 1

that way you won't start two nodes on a single instance

On 20 June 2014 16:54, Andrew Gaydenko andrew.gayde...@gmail.com wrote:

On Friday, June 20, 2014 6:49:04 PM UTC+4, Maciej Dziardziel wrote:

use start-stop-daemon or adapt /etc/init.d/elasticsearch to set up
pidfile guarding es instance. Or just run this way:
pgrep -f elasticsearch || ./start_es.sh

Aha, thanks! In my case pgrep is the most appropriate.

Re: problem indexing with my analyzer

You seriously don't want 3..250 length ngrams. That's ENORMOUS.

Typically set min/max to 3 or 4, and that's it:

http://www.elasticsearch.org/guide/en/elasticsearch/guide/current/_ngrams_for_partial_matching.html#_ngrams_for_partial_matching
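
To put a number on "enormous", here is a small arithmetic sketch in Python. It assumes the nGram filter emits every substring of each length from min_gram to max_gram for a token, so a token of length L yields L - n + 1 grams of length n. The function name ngram_count is hypothetical.

```python
def ngram_count(token_length, min_gram, max_gram):
    """Number of n-grams produced for one token of the given length.

    Assumes every substring of each length in [min_gram, max_gram] is emitted.
    Lengths longer than the token contribute nothing.
    """
    longest = min(max_gram, token_length)
    return sum(token_length - n + 1 for n in range(min_gram, longest + 1))


if __name__ == "__main__":
    for length in (10, 50, 250):
        wide = ngram_count(length, 3, 250)
        narrow = ngram_count(length, 3, 4)
        print(f"token length {length}: 3..250 -> {wide}, 3..4 -> {narrow}")
```

Under that assumption the count grows roughly with the square of the token length for a wide range, while a 3..4 range stays close to twice the token length. Long tokens such as base64 fragments are therefore the worst case.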

On 20 June 2014 16:05, Tanguy Bernard bernardtanguy1...@gmail.com wrote:

Thank you, Cédric Hourcade!

On Friday, June 20, 2014 3:32:29 PM UTC+2, Cédric Hourcade wrote:

If your base64-encoded strings are long, they are going to be split into a
lot of tokens by the standard tokenizer.

These tokens are often going to be a lot longer than standard words,
so your nGram filter will generate even more tokens, a lot more than
with standard text. That may be your problem there.

You should really try to strip the encoded images with a simple regex
from your documents before indexing them. If you need to keep the
source, put the raw text in an unindexed field, and the cleaned one in
another.
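
A minimal Python sketch of that suggestion follows. It assumes the images are embedded as data URIs of the form data:image/<type>;base64,<payload>, which the thread does not confirm, and the field names raw and clean are hypothetical. Only the clean field would be analyzed; the raw field would be mapped as unindexed.

```python
import re

# Assumption: embedded images look like data:image/png;base64,iVBORw0...
DATA_URI_IMAGE = re.compile(r"data:image/[a-zA-Z0-9.+-]+;base64,[A-Za-z0-9+/]+=*")


def split_raw_and_clean(text):
    """Return the original text plus a copy with embedded images removed.

    Field names are hypothetical: "raw" is meant for an unindexed field,
    "clean" for the field that goes through the analyzer.
    """
    return {"raw": text, "clean": DATA_URI_IMAGE.sub("", text)}


if __name__ == "__main__":
    doc = 'Hello <img src="data:image/png;base64,iVBORw0KGgo=">world'
    print(split_raw_and_clean(doc)["clean"])
```

If the encoded data appears in some other form (for example bare base64 blocks without the data: prefix), the pattern would need to be adjusted accordingly.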

Adding order to a terms aggregator results in ArrayIndexOutOfBoundsException

2014-06-20 Thread debo

I have a simple document schema on which I am trying to run the following 
query :

curl -XPOST 'localhost:9200/indexName/topn/_search?pretty' -d '{
  "aggregations" : {
    "applid" : {
      "terms" : {
        "field" : "applid",
        "size" : 3,
        "order" : {
          "tt>byt_sum" : "desc"
        }
      },
      "aggregations" : {
        "tt" : {
          "filter" : {
            "and" : {
              "filters" : [ {
                "range" : {
                  "t" : {
                    "from" : 140321160,
                    "to" : 140321610,
                    "include_lower" : true,
                    "include_upper" : true
                  }
                }
              }, {
                "terms" : {
                  "gid" : [ "abcd" ]
                }
              } ]
            }
          },
          "aggregations" : {
            "byt_sum" : {
              "sum" : {
                "field" : "byt"
              }
            }
          }
        }
      }
    }
  }
}'

This seems to give me back an error 

{
  "error" : "SearchPhaseExecutionException[Failed to execute phase [query], all shards failed; shardFailures
    {[rcP5ncimTpmcUZgvn5cgSw][indexName][0]: ArrayIndexOutOfBoundsException[null]}
    {[vauVf2XOQvOobpqIbp0REQ][indexName][2]: RemoteTransportException[[Bloodstorm][inet[/10.0.0.91:9300]][search/phase/query]]; nested: ArrayIndexOutOfBoundsException; }
    {[vauVf2XOQvOobpqIbp0REQ][indexName][1]: RemoteTransportException[[Bloodstorm][inet[/10.0.0.91:9300]][search/phase/query]]; nested: ArrayIndexOutOfBoundsException; }
    {[vauVf2XOQvOobpqIbp0REQ][indexName][4]: RemoteTransportException[[Bloodstorm][inet[/10.0.0.91:9300]][search/phase/query]]; nested: ArrayIndexOutOfBoundsException; }
    {[vauVf2XOQvOobpqIbp0REQ][indexName][3]: RemoteTransportException[[Bloodstorm][inet[/10.0.0.91:9300]][search/phase/query]]; nested: ArrayIndexOutOfBoundsException; }]",
  "status" : 500
}

When I take the
"order" : {
  "tt>byt_sum" : "desc"
}
out, this seems to work fine. Also, the error only occurs for certain
"gid" : [ "abcd" ] parameters. For example, it works for "gid" : [ 1234 ].
Could you suggest what is going wrong here?

Elasticsearch version :

{
  "status" : 200,
  "name" : "Kylun",
  "version" : {
    "number" : "1.1.1",
    "build_hash" : "f1585f096d3f3985e73456debdc1a0745f512bbc",
    "build_timestamp" : "2014-04-16T14:27:12Z",
    "build_snapshot" : false,
    "lucene_version" : "4.7"
  },
  "tagline" : "You Know, for Search"
}

Re: guarding from double-start

On Saturday, June 21, 2014 2:33:28 AM UTC+4, Clinton Gormley wrote:

 And in your config file, set:

 node.max_local_storage_nodes: 1

 that way you won't start two nodes on a single instance


Great, thanks!

Re: Splunk vs. Elastic search performance?

Mark,

I've read one post (can't remember where) that the Node client was
preferred, but have also read where the HTTP interface is minimal overhead.
So yes, I am currently using logstash with the HTTP interface and it works
fine.

I also performed some experiments with clustering (not much, due to
resource and time constraints) and used unicast discovery. Then I read
a post by someone who strongly recommended multicast discovery, and I
started to feel like I'd gone down the wrong path. Then I watched the ELK
webinar and heard that unicast discovery was preferred. I think it's not a
big deal either way; it's what works best for your particular networking
infrastructure.

In addition, I was recently given this link:
http://aphyr.com/posts/317-call-me-maybe-elasticsearch. It hasn't dissuaded
me at all, but it is a thought-provoking read. I am a little confused by
some things, though. In all of my high-performance banging on ES, even with
my time-to-live test feature enabled, I never lost any documents at all.
But I wasn't using auto-id; I was specifying my own unique ID. And when run
in my 3-node cluster (slow due to being hosted by 3 VMs running on a
dual-core machine), I still didn't lose any data. So I am not sure of the
high data loss scenarios he describes in his missive; I have seen no
evidence of any data loss due to false insert positives at all.

Brian

On Friday, June 20, 2014 6:30:27 PM UTC-4, Mark Walkom wrote:

I wasn't aware that the elasticsearch_http output wasn't recommended?
When I spoke to a few of the ELK devs a few months ago, they indicated
that there was minimal performance difference, with the greater benefit of
not being locked to specific LS+ES versioning.

Regards,
Mark Walkom

Re: issues with file input from logstash to elastic - please read

Eitan,

My recommendation is to use the stdin input in logstash and avoid its file
input. Then, for testing, you pipe the file into your logstash instance. But
in production, you should run the GNU version of *tail -F* (uppercase F
option) to correctly follow all forms of rotated logs, and then pipe that
output into your logstash instance.

I don't know just how robust logstash's file input is, but the GNU version 
of tail with the -F option is perfect, so there's no guesswork and no 
dependency on hope. Note that even Splunk has a currently open bug with 
losing data while trying to follow a rotated file.

Also, I added the multiline processing to the filters; it didn't seem to 
work when applied as a stdin codec. Now it works very well together.

Anyway, that's what our group is doing.

And yes, the logstash-users group
(https://groups.google.com/forum/#!forum/logstash-users) is also
rather active and is a good place for logstash-specific help.

Brian

Elasticsearch cluster on Azure using ubuntu. The nodes don't see each other

2014-06-20 Thread Pedro Alonso



I just posted this question on Stackoverflow:

I have been setting up an Elasticsearch cluster in Azure, using Ubuntu
VMs, following the tutorial on the plugin page (elasticsearch-cloud-azure)
on GitHub. I've managed to configure everything and I have Elasticsearch
running, but I have 3 clusters of 1 node instead of 1 cluster of 3 nodes. I
guess that the problem comes from:

cloud:
    azure:
        keystore: /path/to/keystore
        password: your_password_for_keystore
        subscription_id: your_azure_subscription_id
        service_name: your_azure_cloud_service_name
discovery:
    type: azure

I'm not sure what your_azure_cloud_service_name should be. I have all
my nodes inside a Virtual Network, so they can communicate with each other.
By default, on Azure, each time I create a VM, a new Cloud Service containing
only that VM is created. Should that value be different for each of the
nodes in my cluster?

I'm a bit lost on that one...

update field type in existing mapping in elastic search

2014-06-20 Thread srikanth ramineni

Hi ,

Could you please advise how to update the type of an existing field in a
mapping? Below is the requirement.

I have created an index named contractIndex, and its type is contract. It has
the fields contractid (long) and contract number (long), but I want to change
the type of contract number to string.


Thanks,
Srikanth.


Re: Elasticsearch cluster on Azure using ubuntu. The nodes don't see each other

You must create each VM under the same cloud service. For example, with
azure vm create azure-elasticsearch-cluster
the cloud service name is azure-elasticsearch-cluster.
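
As a minimal sketch, assuming every VM was created under the cloud service
named above, the elasticsearch.yml on each node would then carry the same
service_name. The keystore path, password, and subscription ID are the
placeholders from the plugin tutorial, not real values:

```yaml
# elasticsearch.yml, identical on every node (sketch with placeholder values)
cloud:
    azure:
        keystore: /path/to/keystore
        password: your_password_for_keystore
        subscription_id: your_azure_subscription_id
        # Assumption: all VMs share this one cloud service, as created above
        service_name: azure-elasticsearch-cluster
discovery:
    type: azure
```

With a different service_name on each node, each node would presumably only
look for peers inside its own single-VM cloud service, which would match the
three one-node clusters described in the question.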

--
David ;-)
Twitter : @dadoonet / @elasticsearchfr / @scrutmydocs

On 21 June 2014 at 03:54, Pedro Alonso pedro@gmail.com wrote:

I just posted this question on Stackoverflow:

I have been setting up an Elasticsearch cluster in Azure, using Ubuntu VMs,
following the tutorial on the plugin page (elasticsearch-cloud-azure) on
GitHub. I've managed to configure everything and Elasticsearch is running,
but I have 3 clusters of 1 node instead of 1 cluster of 3 nodes. I guess that
the problem comes from this configuration:

cloud:
    azure:
        keystore: /path/to/keystore
        password: your_password_for_keystore
        subscription_id: your_azure_subscription_id
        service_name: your_azure_cloud_service_name
discovery:
    type: azure

I'm not sure what your_azure_cloud_service_name should be. I have all my
nodes inside a Virtual Network, so they can communicate with each other. By
default, on Azure, each time I create a VM, a new Cloud Service containing
only that VM is created. Should that value be different for each of the nodes
in my cluster?

I'm a bit lost on that one...



Re: update field type in existing mapping in elastic search