MoreLikeThis query on parent/children documents !!

2015-01-23 Thread Evans
Hi all, 
I have parent/child documents.  How can I find parent documents that are
similar to a given parent document "A", such that the MoreLikeThis query
uses only the child documents of each parent to perform the
comparison?

Thank you very much !! 
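For illustration only, a sketch of the shape such a query can take in the 1.x DSL (not a verified answer from this thread; the index, type, and field names are hypothetical, and the like_text would have to be built from the children of parent "A"): a has_child query wrapping more_like_this matches against child documents but returns parent documents.

curl -XGET 'localhost:9200/myindex/parent_type/_search' -d '{
  "query": {
    "has_child": {
      "type": "child_type",
      "query": {
        "more_like_this": {
          "fields": ["body"],
          "like_text": "text gathered from the children of parent A",
          "min_term_freq": 1,
          "min_doc_freq": 1
        }
      }
    }
  }
}'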








Re: Better understanding Lucene/Shard overheads

2015-01-23 Thread Drew Kutcharian
Thanks Mike. I’m still a bit unclear on these comments:

> IndexReader requires some RAM for each segment to hold structures like live 
> docs, terms index, index data structures for doc values fields, and holds 
> open a number of file descriptors in proportion to how many segments are in 
> the index.
> There is also a per-indexed-field cost in Lucene; if you have a great many 
> unique indexed fields that may matter.


Aren’t these structures dependent on the size of the Lucene index? If I
have 1 large Lucene index vs. 10 small Lucene indices (assuming not much
duplicated data across indices), wouldn’t the total memory used be about the
same? I understand that there will be more file descriptors because there will
be more segments.

> IndexWriter has a RAM buffer (indices.memory.index_buffer_size in ES) to hold 
> recently indexed/deleted documents, and periodically opens readers (10 at a 
> time by default) to do merging, which bumps up RAM usage and file descriptors 
> while the merge runs.


According to the doc at
https://github.com/elasticsearch/elasticsearch/blob/master/docs/reference/modules/indices.asciidoc
it seems like indices.memory.index_buffer_size is the “total” size of the buffer
for all the shards on a node, so I'm not sure how this would matter in the case
of having too many shards. I understand that there will be more file descriptors
and a lot more “smaller” merge jobs running.
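For reference, a sketch of where that node-level setting lives (the values shown are the documented defaults for the 1.x line, not something from this thread):

# elasticsearch.yml -- one buffer budget shared by all shards on the node
indices.memory.index_buffer_size: 10%
# when a percentage is used, a lower bound can also be set
indices.memory.min_index_buffer_size: 48mb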

I’m going to test this myself, but I just wanted to understand the model better 
first so I have more accurate tests.


Thanks again,

Drew



> On Jan 23, 2015, at 2:18 AM, Michael McCandless  
> wrote:
> 
> There is definitely a non-trivial per-index cost.
> 
> From Lucene's standpoint, ES holds an IndexReader (for searching) and 
> IndexWriter (for indexing) open.
> 
> IndexReader requires some RAM for each segment to hold structures like live 
> docs, terms index, index data structures for doc values fields, and holds 
> open a number of file descriptors in proportion to how many segments are in 
> the index.
> 
> IndexWriter has a RAM buffer (indices.memory.index_buffer_size in ES) to hold 
> recently indexed/deleted documents, and periodically opens readers (10 at a 
> time by default) to do merging, which bumps up RAM usage and file descriptors 
> while the merge runs.
> 
> There is also a per-indexed-field cost in Lucene; if you have a great many 
> unique indexed fields that may matter.
> 
> If you use field data, it's entirely RAM resident (doc values is a better 
> choice since it uses much less RAM).
> 
> ES has common thread pools on the node which are shared for all ops across 
> all shards on that node, so I don't think more indices translates to more 
> threads.
> 
> Net/net you really should just conduct your own tests to get a feel of 
> resource consumption in your use case...
> 
> Mike McCandless
> 
> http://blog.mikemccandless.com 
> On Thu, Jan 22, 2015 at 4:07 PM, Drew Kutcharian wrote:
> Hi,
> 
> I just came across this blog post: 
> http://blog.mikemccandless.com/2010/07/lucenes-ram-usage-for-searching.html 
> 
> 
> Seems like there has been a lot of work done on Lucene to reduce its memory 
> requirements and even more on Lucene 5.0. This is specifically interesting to 
> me since I’m working on a project that uses Elasticsearch and we are planning 
> on using 1 index per customer model (each with 1 or maybe 2 shards and no 
> replicas) and shard allocation, mainly because:
> 
> 1. We are going to have few thousand customers at most
> 
> 2. Each customer will only need access to their own data (no global queries)
> 
> 3. The indices are going be relatively large (each with millions of small 
> docs)
> 
> 4. We are going to need to do a lot of parent/child type queries (and ES 
> doesn’t support cross-shard parent/child relationships and the parent id 
> cache seems not that efficient, see 
> http://www.elasticsearch.org/guide/en/elasticsearch/guide/current/parent-child.html
> and 
> https://github.com/elasticsearch/elasticsearch/issues/3516#issuecomment-23081662).
> This is the main reason we feel we can’t use time based (daily, monthly, …) 
> indices.
> 
> 5. Being able to easily “drop” an index if a customer leaves the initial 
> trial.
> 
> 
> I wanted to better understand the overheads of an Elasticsearch shard. Is it 
> just memory or CPU/threads too? Where can I find more information about this?
> 
> Thanks,
> 
> Drew
> 
> 

Re: Unassigned replica shards after cluster restart.

2015-01-23 Thread Darsh
One correction: the pending tasks queue is not empty, it shows one task:

18596   56.1m URGENT   reroute_after_cluster_update_settings

I think this is the cluster.routing.allocation.enable : "all" setting we
executed after all nodes joined the cluster. The setting got updated on all
the nodes, but why is it still in the queue after 56 minutes?
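For reference, a quick way to inspect this (a sketch assuming curl against a local node):

curl 'localhost:9200/_cat/pending_tasks?v'        # insertOrder, timeInQueue, priority, source
curl 'localhost:9200/_cluster/settings?pretty'    # confirms the transient allocation setting
curl 'localhost:9200/_cluster/health?pretty'      # active vs. unassigned shard counts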


On Friday, January 23, 2015 at 4:48:10 PM UTC-8, Darsh wrote:
>
> We are using ES version 1.4.1. We have hourly indices with 50 shards per
> index and replication set to 1. We restarted our cluster after some
> system-side maintenance; no, we don't have any disk failures.
> After the restart, I see all primary shards are assigned but all replicas
> are unassigned.
>
> After the cluster shutdown, these are the steps we followed to do the restart:
>
> 1) Start all master nodes
> 2) "cluster.routing.allocation.enable" : "none"
> 3) Start all data nodes
> 4) After all nodes joined the cluster we set
> "cluster.routing.allocation.enable" : "all"
>
> For the cluster settings it shows:
>
> {
>   "persistent": {},
>   "transient": {
>     "cluster": {
>       "routing": {
>         "allocation": {
>           "enable": "all"
>         }
>       }
>     }
>   }
> }
>
>  Cluster status is yellow
>



Unassigned replica shards after cluster restart.

2015-01-23 Thread Darsh
We are using ES version 1.4.1. We have hourly indices with 50 shards per
index and replication set to 1. We restarted our cluster after some
system-side maintenance; no, we don't have any disk failures.
After the restart, I see all primary shards are assigned but all replicas
are unassigned.

After the cluster shutdown, these are the steps we followed to do the restart:

1) Start all master nodes
2) "cluster.routing.allocation.enable" : "none"
3) Start all data nodes
4) After all nodes joined the cluster we set
"cluster.routing.allocation.enable" : "all"
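A sketch of the commands behind steps 2 and 4 (assuming curl against a local node):

curl -XPUT 'localhost:9200/_cluster/settings' -d '{
  "transient": { "cluster.routing.allocation.enable": "none" }
}'
# ... start the data nodes and wait for them to join ...
curl -XPUT 'localhost:9200/_cluster/settings' -d '{
  "transient": { "cluster.routing.allocation.enable": "all" }
}'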

For the cluster settings it shows:

{
  "persistent": {},
  "transient": {
    "cluster": {
      "routing": {
        "allocation": {
          "enable": "all"
        }
      }
    }
  }
}

The pending tasks queue is empty. Cluster status is yellow.



Re: Help creating a near real time streaming plugin to perform replication between clusters

2015-01-23 Thread Todd Nine
Thanks for the suggestion on the tribe nodes.  I'll take a look
at org.elasticsearch.index.snapshots.IndexShardSnapshotAndRestoreService
more in depth.  A reference implementation would be helpful in
understanding its usage; do you happen to know of any projects that use
it?

From an architecture perspective, I'm concerned with having the cluster
master initiate any replication operations aside from replaying index
modifications.  As we continue to increase our cluster size, I'm worried it
may become too much load on the master to keep up.  Our system is getting
larger every day; we have 12 c3.4xl instances in each region currently.
Our client to ES is a multi-tenant system
(http://usergrid.incubator.apache.org/), so each application created in the
system gets its own indexes in ES. This allows us to scale the indexes
using read/write aliases based on each application's usage.
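As an aside, a hedged sketch of the read/write alias pattern mentioned above (index and alias names are hypothetical, not from this thread):

# writes target only the newest index for an application; reads span all of them
curl -XPOST 'localhost:9200/_aliases' -d '{
  "actions": [
    { "add": { "index": "app_1234_v2", "alias": "app_1234_write" } },
    { "add": { "index": "app_1234_v1", "alias": "app_1234_read" } },
    { "add": { "index": "app_1234_v2", "alias": "app_1234_read" } }
  ]
}'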


To take a step back even further, is there a way we can use something
existing in ES to perform this work, possibly with routing rules etc.?  My
primary concerns are that we don't have to query across regions, that we can
recover from a region or network outage, and that replication can resume once
communication between regions is restored.  I seem to be venturing into
uncharted territory here by thinking I need to create a plugin, and I doubt
I'm the first user to encounter such a problem.  If there are any other
known solutions, that would be great.  I just need our replication to run
every N seconds.

Thanks again!
Todd

On Friday, January 23, 2015 at 2:43:08 PM UTC-7, Jörg Prante wrote:
>
> This looks promising.
>
> For admin operations, see also the tribe node. A special 
> "replication-aware tribe node" (or maybe more than one tribe node for 
> resiliency) could supervise the cluster-to-cluster replication.
>
> For the segment strategy, I think it is hard to go down to the level of 
> the index store and capture the files properly and put it over the wire to 
> a target. It should be better to replicate on shard level. Maybe by reusing 
> some of the code 
> of org.elasticsearch.index.snapshots.IndexShardSnapshotAndRestoreService so 
> that a tribe node can trigger a snapshot action on the source cluster 
> master, open a transactional connection from a node in the source cluster 
> to a node in the target cluster, and place a restore action on a queue on 
> the target cluster master, plus a rollback logic if shard transaction 
> fails. So in short, the ES cluster to cluster replication process could be 
> realized by a "primary shard replication protocol".
>
> Just my 2¢
>
> Jörg
>
>
> On Fri, Jan 23, 2015 at 7:42 PM, Todd Nine wrote:
>
>> Thanks for the pointers Jorg,
>>   We use Rx Java in our current application, so I'm familiar with 
>> backpressure and ensuring we don't overwhelm target systems.  I've been 
>> mulling over the high level design a bit more.  A common approach in all 
>> systems that perform multi region replication is the concept of "log 
>> shipping".  It's used heavily in SQL systems for replication, as well as in 
>> systems such as Megastore/HBase.  This seems like it would be the most 
>> efficient way to ship data from Region A to Region B with a reasonable 
>> amount of latency.  I was thinking something like the following.
>>
>> *Admin Operation Replication*
>>
>> This can get messy quickly.  I'm thinking I won't have any sort of 
>> "merge" logic since this can get very different for everyone's use case.  I 
>> was going to support broadcasting the following operations.
>>
>>
>>- Index creation
>>- Index deletion
>>- Index mapping updates
>>- Alias index addition
>>- Alias index removal
>>
>> This can also get tricky because it makes the assumption of unique index 
>> operations in each region.  Our indexes are Time UUID based, so I know we 
>> won't get conflicts.  I won't handle the case of an operation being 
>> replayed that conflicts with an existing index, I'll simply log it and drop 
>> it.  Handlers could be built in later so users could create their own 
>> resolution logic.  Also, this must be replayed in a very strict order.  I'm 
>> concerned that adding this additional master/master region communication 
>> could result in more load on the master.  This can be solved by running a 
>> dedicated master, but I don't really see any other solution.
>>
>>
>> *Data Replication*
>>
>> 1) Store last sent segments, probably in a system index.  Each region 
>> could be offline at different times, so for each segment I'll need to know 
>> where it's been sent.
>>
>> 2) Monitor segments as they're created.  I still need to figure this out 
>> a bit more in the context of latent sending. 
>>
>> Example.  Region us-east-1 ES nodes.
>>
>> We missed sending 5 segments to us-west-1 , and they were merged into 1.  
>> I now only need to send the 1 merged segment to us-west-1, since the other 
>> 5 segments will be removed.
>>
>> However, then a merged segm

Re: distributors vs raid0

2015-01-23 Thread Mark Walkom
There are plans to stripe data at a complete segment level, so that if a
disk/mount dies, only the segments on that disk are lost.
I'm not sure if there is an ETA on that.

On 24 January 2015 at 05:00, joergpra...@gmail.com 
wrote:

> Now that I re-read
>
>
> http://www.elasticsearch.org/guide/en/elasticsearch/reference/current/setup-dir-layout.html
>
>
> I see the possible misconception. "RAID0" in the text should give the
> picture that an ES data directory should be seen as a logical drive which
> contains many files spread over physical drives. RAID0 in striping mode on
> hardware controllers works differently from this: each word of block data
> is split into bits that are read/written simultaneously to different
> physical drives, where the filesystem or free space considerations has
> nothing to do with RAID.
>
> The ES store distributor was implemented to handle the situation where
> data dirs on a node may have different free storage capacity. With the
> setting "least_used" (which s the default, it really means "most_free"), ES
> selects the mount point for new files that has the most free space first,
> so the data paths are filled optimally by using all available space.
>
> I don't think the distributor is of any value for future index recovery
> strategies, it is too low level. Recovery will become more intelligent with
> the advent of numbered sequences in Lucene segments, which allows
> incremental recovery and replication of shards.
>
> Jörg
>
>
> On Fri, Jan 23, 2015 at 4:56 PM, Shaun Senecal  wrote:
>
>> Thanks for the confirmations Jörg, Mark
>>
>> It seems like a lot of development effort to implement this feature for
>> little to no gain over RAID-0, so I wonder if the folks at ElasticSearch
>> have bigger plans for it in the future.  Perhaps file based recovery and/or
>> a distributor that keeps all files for a given shard together on the same
>> drive so that a failed drive results in the loss of only a few shards
>> rather than an entire node.  For now though, it seems RAID is the way to go.
>>
>>
>> Shaun
>>
>>
>> On Friday, January 23, 2015 at 2:53:53 AM UTC-8, Jörg Prante wrote:
>>>
>>> There are no advantages for JBOD over RAID0. RAID0 is far superior when
>>> using striped reads/writes, that is, you can add up the read/write
>>> performance of all the physical drives when using a hardware RAID
>>> controller. JBOD is limited to single physical drive performance.
>>>
>>> There is only one rare case, if you want to mix physical drives with
>>> different volume capacity, where RAID0 striping can not be applied. Then
>>> JBOD adds up all the volumes of the drives where striped RAID0 uses the
>>> smallest drive capacity only.
>>>
>>> And you are correct, in either case losing a drive means failure of a
>>> machine. ES solves node failures by replica shards on other machines, not
>>> by a file-based repairing strategy.
>>>
>>> Jörg
>>>
>>> On Fri, Jan 23, 2015 at 12:38 AM, Mark Walkom 
>>> wrote:
>>>
 You will lose all data if one drive dies and you use ES striping or
 RAID0.

 I don't know if there is a practical (throughput) difference, but
 logically they are the same.

 On 23 January 2015 at 10:21, Shaun Senecal  wrote:

> What is the advantage of using ElasticSearch's distributors (JBOD)
> over using raid0?
>
> As far as I can tell, if I lose a drive in either case I lose the
> whole node until the data can be recovered.  Is the distributor smart 
> about
> which files it recovers and only recovers the files that were on the 
> failed
> drive?  Is there some other advantage I am missing?
>
>
> Shaun
>

Re: Elasticsearch on Cloudlinux Crashes

2015-01-23 Thread Mark Walkom
Cloudlinux isn't a supported platform, I'm afraid.
Do you know what OS it's based on: RHEL, Debian, or BSD?

On 24 January 2015 at 06:32, Max B  wrote:

> Hey all,
>
> I use elasticsearch for the forum software known as xenforo to index all
> the posts and all that good stuff.
>
> We recently converted the server to CloudLinux, and whenever anyone tries
> to rebuild the search index, it crashes; I can't seem to make it log
> anything either. CloudLinux support, after about a day of investigating the
> server, told me to come and ask my question here.
>
> I've been told constantly that nothing is wrong with our setup and it
> should just be working.
>
> Any ideas?
>


Re: Configuration of elasticsearch to index 300 Million documents

2015-01-23 Thread Mark Walkom
1 - Yes, that's not a lot of data for ES :)
2 - It depends; you should be able to do that on a single node with ~16GB
heap, but you should test it yourself.

On 23 January 2015 at 22:37, hitesh shekhada  wrote:

> Hi,
>
>
>    1. My goal is to index 300 million documents/products from an MS SQL
>    Server database.
>    2. To get all documents I need to join 14 different tables.
>    3. The total data size of the 300 million documents is 300GB.
>    4. There are 70 fields in one document.
>    5. One document is about 0.8 KB in size.
>    6. We need to update the values of 10 fields (out of 70) for almost 90
>    million documents every night.
>
>
> I need to know:
>
>    1. Has anybody indexed such a large amount of data into Elasticsearch?
>    2. How many clusters/nodes do I need to handle it?
>
>
> Please let me know if someone has used elasticsearch for this amount of
> data.
>
> Thanks,
> Hitesh
>
>


Re: Automatic index creation

2015-01-23 Thread Mark Walkom
By default ES will create an index automatically if it doesn't exist, so
just get your code to send to a new index every N hours and you should be
good.
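A minimal sketch of the idea (the index name pattern is hypothetical; auto-creation works unless action.auto_create_index has been disabled):

# the first write to a not-yet-existing, time-derived index name creates it
curl -XPOST 'localhost:9200/myapp-2015.01.24.08/event' -d '{
  "@timestamp": "2015-01-24T08:00:00Z",
  "message": "first document in a fresh 4-hour index"
}'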

On 24 January 2015 at 00:18, Abhimanyu Nagrath 
wrote:

> Hi, I am new to Elasticsearch and I want to know whether I can create indexes
> automatically based on time. Is there a feature in Elasticsearch that
> supports this? For example, I want a new index to be created automatically
> every 4 hours; does Elasticsearch support this?
>


Re: Bulk importing CSVs with different headers

2015-01-23 Thread Mark Walkom
You could use Logstash for this as well; it has a CSV filter.

On 24 January 2015 at 01:21, Ron  wrote:

> Hello all,
>
> So I'm trying to import a large number of CSV files into Elasticsearch.
> All the files have different content in them, with different headers.
>
> My goal is to have a directory we can drop CSVs into, and some plugin or
> process would pick them up, read the header and place the data in
> Elasticsearch mapping data to fields (gotten from the header).
>
> I've looked at the CSV River plugin and Fluentd. It looks like they both
> support half of the plan. The issue we run into is it looks like both want
> static field names before import.
>
> Am I wrong?  Anyone's help would be wonderful.
>
> Thanks
>
> Ron
>


Re: integrating ELK with SSO (CA Siteminder)

2015-01-23 Thread Mark Walkom
Can you post the applicable line from your kibana config that points to ES?

On 24 January 2015 at 07:50, Scott Lee  wrote:

> Hello, I am new to the ELK stack technology, and had a question.  My
> organization uses Siteminder to authenticate against their AD environment.
> In order to have this work with ELK, I was going to do the following:
>
> 1) Send log data to 1 of 5 different indices, based on source
> 2) Configure a separate Apache vhost and configure each based on what is
> accessible, i.e. using the LIMIT directives to limit everything except GET
> and POST for a certain index, for example.
> 3) Configure Siteminder for each vhost, allowing a certain subset of users
> access to each vhost based on what their permissions to each index should
> be (IE security gets access to the vhost that can send all methods, Network
> group can access the vhost that can only send GET and POST to the
> networking index, etc)
>
> I am in the process of testing this, and I got port 80 to work, but I
> can't get another port to work (in my test environment, I do not have
> access to the DNS server yet so I've been using IP vhosts).  I've allowed
> CORS to wildcard, I believe, and I've configured ES to bind to the
> localhost and use reverse proxy via apache.  It all works on port 80, but
> when I go on port 8080 for example I get the Kibana-ES "Connection Failed"
> error.
>
> Here are my configs (rough draft, not complete):
> elasticsearch.yml:
>  http.cors.enabled: true
>  http.cors.allow-origin: "/.*/"
>  network.host:"127.0.0.1"
>
> httpd-vhosts.conf:
>  
> DocumentRoot "/usr/local/data/www/docs/apache/"
> CustomLog "logs/access_log" combined
> ProxyRequests off
> ProxyPreserveHost on
> ServerName *ServerIP*
> ProxyPass /elasticsearch http://127.0.0.1:9200
> ProxyPassReverse /elasticsearch /
> 
> 
> Deny from all
> 
> 
> 
> 
> Deny from all
> 
> 
> 
> 
> Deny from all
> 
> 
> 
>
>
> 
> DocumentRoot "/usr/local/data/www/docs/apache/"
> CustomLog "logs/access_log" combined
> ProxyRequests off
> #ProxyPreserveHost on
> ServerName *ServerIP*
> ProxyPass /elasticsearch http://127.0.0.1:9200
> ProxyPassReverse /elasticsearch /
> 
>
>
>
> Does anybody have any feedback, and know why port 8080 isn't working to
> communicate with ES?
>
>


Re: Help creating a near real time streaming plugin to perform replication between clusters

2015-01-23 Thread joergpra...@gmail.com
This looks promising.

For admin operations, see also the tribe node. A special "replication-aware
tribe node" (or maybe more than one tribe node for resiliency) could
supervise the cluster-to-cluster replication.

For the segment strategy, I think it is hard to go down to the level of the
index store and capture the files properly and put it over the wire to a
target. It should be better to replicate on shard level. Maybe by reusing
some of the code
of org.elasticsearch.index.snapshots.IndexShardSnapshotAndRestoreService so
that a tribe node can trigger a snapshot action on the source cluster
master, open a transactional connection from a node in the source cluster
to a node in the target cluster, and place a restore action on a queue on
the target cluster master, plus a rollback logic if shard transaction
fails. So in short, the ES cluster to cluster replication process could be
realized by a "primary shard replication protocol".

Just my 2¢

Jörg


On Fri, Jan 23, 2015 at 7:42 PM, Todd Nine  wrote:

> Thanks for the pointers Jorg,
>   We use Rx Java in our current application, so I'm familiar with
> backpressure and ensuring we don't overwhelm target systems.  I've been
> mulling over the high level design a bit more.  A common approach in all
> systems that perform multi region replication is the concept of "log
> shipping".  It's used heavily in SQL systems for replication, as well as in
> systems such as Megastore/HBase.  This seems like it would be the most
> efficient way to ship data from Region A to Region B with a reasonable
> amount of latency.  I was thinking something like the following.
>
> *Admin Operation Replication*
>
> This can get messy quickly.  I'm thinking I won't have any sort of "merge"
> logic since this can get very different for everyone's use case.  I was
> going to support broadcasting the following operations.
>
>
>- Index creation
>- Index deletion
>- Index mapping updates
>- Alias index addition
>- Alias index removal
>
> This can also get tricky because it makes the assumption of unique index
> operations in each region.  Our indexes are Time UUID based, so I know we
> won't get conflicts.  I won't handle the case of an operation being
> replayed that conflicts with an existing index, I'll simply log it and drop
> it.  Handlers could be built in later so users could create their own
> resolution logic.  Also, this must be replayed in a very strict order.  I'm
> concerned that adding this additional master/master region communication
> could result in more load on the master.  This can be solved by running a
> dedicated master, but I don't really see any other solution.
>
>
> *Data Replication*
>
> 1) Store last sent segments, probably in a system index.  Each region
> could be offline at different times, so for each segment I'll need to know
> where it's been sent.
>
> 2) Monitor segments as they're created.  I still need to figure this out a
> bit more in the context of latent sending.
>
> Example.  Region us-east-1 ES nodes.
>
> We missed sending 5 segments to us-west-1 , and they were merged into 1.
> I now only need to send the 1 merged segment to us-west-1, since the other
> 5 segments will be removed.
>
> However, when a merged segment is created in us-east-1 from 5 segments
> I've already sent to us-west-1, I won't want to ship it, since it will
> already contain the data. As the tree is continually merged, I'll need to
> somehow sort out what contains shipped data, and what contains unshipped
> data.
>
>
> 3) As a new segment is created perform the following.
>   3.a) Replay any administrative operations since the last sync on the
> index to the target region, so the state is current.
>   3.b) Push the segment to the target region
>
> 4) The region receives the segment, and adds it to its current segments.
> When a segment merge happens in the receiving region, this will get merged
> in.
>
>
>
>
>
> Thoughts?
>
>
>
>
> On Thursday, January 15, 2015 at 5:29:10 PM UTC-7, Jörg Prante wrote:
>>
>> While it seems quite easy to attach listeners to an ES node to capture
>> operations in translog-style and push out index/delete operations on shard
>> level somehow, there will be more to consider for a reliable solution.
>>
>> The Couchbase developers have added a data replication protocol to their
>> product which is meant for transporting changes over long distances with
>> latency for in-memory processing.
>>
>> To learn about the most important features, see
>>
>> https://github.com/couchbaselabs/dcp-documentation
>>
>> and
>>
>> http://docs.couchbase.com/admin/admin/Concepts/dcp.html
>>
>> I think bringing such a concept of an inter cluster protocol into ES
>> could be a good starting point, to sketch the complete path for such an
>> ambitious project beforehand.
>>
>> Most challenging could be dealing with back pressure when receiving
>> nodes/clusters are becoming slow. For a solution to this, reactive Java /
>> reactive streams look like a vi

integrating ELK with SSO (CA Siteminder)

2015-01-23 Thread Scott Lee
Hello, I am new to the ELK stack technology, and had a question.  My 
organization uses Siteminder to authenticate against their AD environment. 
 In order to have this work with ELK, I was going to do the following:

1) Send log data to 1 of 5 different indices, based on source
2) Configure a separate Apache vhost and configure each based on what is 
accessible, i.e. using the LIMIT directives to limit everything except GET 
and POST for a certain index, for example.
3) Configure Siteminder for each vhost, allowing a certain subset of users 
access to each vhost based on what their permissions to each index should 
be (IE security gets access to the vhost that can send all methods, Network 
group can access the vhost that can only send GET and POST to the 
networking index, etc)

I am in the process of testing this, and I got port 80 to work, but I can't 
get another port to work (in my test environment, I do not have access to 
the DNS server yet so I've been using IP vhosts).  I've allowed CORS to 
wildcard, I believe, and I've configured ES to bind to the localhost and 
use reverse proxy via apache.  It all works on port 80, but when I go on 
port 8080 for example I get the Kibana-ES "Connection Failed" error. 

Here are my configs (rough draft, not complete):
elasticsearch.yml:
 http.cors.enabled: true
 http.cors.allow-origin: "/.*/"
 network.host: "127.0.0.1"

httpd-vhosts.conf:
 
DocumentRoot "/usr/local/data/www/docs/apache/"
CustomLog "logs/access_log" combined
ProxyRequests off
ProxyPreserveHost on
ServerName *ServerIP*
ProxyPass /elasticsearch http://127.0.0.1:9200
ProxyPassReverse /elasticsearch /


Deny from all




Deny from all




Deny from all






DocumentRoot "/usr/local/data/www/docs/apache/"
CustomLog "logs/access_log" combined
ProxyRequests off
#ProxyPreserveHost on
ServerName *ServerIP*
ProxyPass /elasticsearch http://127.0.0.1:9200
ProxyPassReverse /elasticsearch /




Does anybody have any feedback, and know why port 8080 isn't working to 
communicate with ES?  




Re: Release Schedules for Elasticsearch and plugins

2015-01-23 Thread David Ashby
I'm not wild about the idea of running snapshot code in production... but 
that would probably work.
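For reference, a rough sketch of the branch build Jörg describes below (assuming Git and Maven; the path and snapshot version are illustrative):

git clone https://github.com/elasticsearch/elasticsearch-cloud-aws.git
cd elasticsearch-cloud-aws
git checkout es-1.4
mvn clean package -DskipTests
# install the zip produced under target/releases into a 1.4.x node
bin/plugin --url file:///path/to/elasticsearch-cloud-aws-2.4.2-SNAPSHOT.zip --install cloud-aws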

On Thursday, January 22, 2015 at 1:10:30 PM UTC-5, Jörg Prante wrote:
>
> Yes, 1.5 will become a new branch with the status of 1.x once it is 
> released, and 1.x will then become the base for further 1.x releases (if 
> any)
>
> Jörg
>
>
> On Thu, Jan 22, 2015 at 5:50 PM, David Ashby wrote:
>
>> What's the difference between the es-1.4 and es-1.x branches? es-1.4 
>> stops updating once Elasticsearch 1.5 is released?
>>
>> On Wednesday, January 21, 2015 at 4:00:54 PM UTC-5, Jörg Prante wrote:
>>>
>>> I think the best thing about open source is, you can clone the github 
>>> repo branch es-1.4 of the plugin and build a snapshot version 
>>> 2.4.2-SNAPSHOT 
>>> for yourself.
>>>
>>> Last commit on es-1.4 was adding eu-central-1 so it should work then:
>>>
>>> https://github.com/elasticsearch/elasticsearch-cloud-aws/commit/
>>> 44d4c9a845d6fc1c0a3ce3b95b08eeef72ab3992
>>>
>>> This does not mean a release wouldn't be a good idea :)
>>>
>>> Jörg
>>>
>>>
>>> On Wed, Jan 21, 2015 at 9:35 PM, David Ashby  
>>> wrote:
>>>
 I'm curious what the release cycle for Elasticsearch and the 
 es-cloud-aws plugin are. https://github.com/
 elasticsearch/elasticsearch-cloud-aws/issues/165 lays out why, but 
 I'll include it here for completeness:

 Right now, it's very difficult to spin up a cluster in Frankfurt 
 (eu-central-1) because the released versions of the AWS plugin don't 
 support the region yet. I assume that when version 1.5.0 of Elasticsearch 
 is released, the plugin will get a version bump as well -- but when is 
 that 
 going to be?

 Thanks,
 -David



Re: Numberformat exception

2015-01-23 Thread joergpra...@gmail.com
This issue is being worked on:

https://github.com/elasticsearch/elasticsearch/issues/3975

Indexing all fields as strings seems like a plausible workaround.
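A minimal sketch of that workaround (index, type, and field names are hypothetical; they echo the example in the original post):

# map the numeric field as a string so cross-field text queries don't try to parse it
curl -XPUT 'localhost:9200/cars' -d '{
  "mappings": {
    "car": {
      "properties": {
        "year":  { "type": "string" },
        "model": { "type": "string" }
      }
    }
  }
}'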

Jörg

On Fri, Jan 23, 2015 at 7:54 PM, Prateek Asthana  wrote:

>
> I am indexing different type of fields that include number and strings.
> When I use multimatch queries (that includes both number and string fields)
> for terms such as "2001 camaro", I get NumberFormatException. I understand
> why this is happening but wondering if this can be avoided. Temporarily, I
> have fixed it by indexing all fields as strings instead of numbers.
>
> Example:
> Fields used : year (number) and model (string)
> multimatch cross type query includes both year and model.
> search query: 2001 camaro
>
>
>
>


Parent child documents query

2015-01-23 Thread bvnrwork

Is there a way we can get all children and parents if either the parent or a
child qualifies for a query?



Re: Elasticsearch crashes with Java 8

2015-01-23 Thread joergpra...@gmail.com
Check if you have more than one Elasticsearch jar on the classpath.
Maybe you have installed two versions in the same folder, without removing
the previous version?
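A quick way to check (a sketch; the install path below is an assumption):

REM Windows: list the Elasticsearch jars that end up on the classpath
dir C:\elasticsearch-1.4.2\lib\elasticsearch-*.jar
REM there should be exactly one elasticsearch-x.y.z.jar; a leftover jar from a
REM previous version in the same folder matches the duplicate-annotation symptom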

Jörg

On Fri, Jan 23, 2015 at 9:02 PM, Fabio  wrote:

> I get the following message at startup of Elasticsearch:
>
> AnnotationFormatError[Duplicate annotation for class: interface
> java.lang.Deprecated: @java.lang.Deprecated()]
>
> with the current version :
> - Version of Elasticsearch is 1.4.2
>
> - java version "1.8.0_25"
> Java(TM) SE Runtime Environment (build 1.8.0_25-b18)
> Java HotSpot(TM) 64-Bit Server VM (build 25.25-b02, mixed mode)
>
> -Windows 8
> Microsoft Windows [version 6.2.9200]
>
>
> Does Elasticsearch run with Java 8, 64-bit?
>
> Best Regards
> Fabio
>


ElasticSearch C# client (NEST): access nested aggregation results

2015-01-23 Thread Jay Hilden
I'm trying to use the C# plugin to retrieve data from a nested aggregation. 
 If anyone could help that would be most appreciated.  Here is the Stack 
Overflow question.

http://stackoverflow.com/questions/28096723/elasticsearch-c-sharp-client-nest-access-nested-aggregation-results



Elasticsearch crashes with Java 8

2015-01-23 Thread Fabio
I get the following message at startup of Elasticsearch:

AnnotationFormatError[Duplicate annotation for class: interface 
java.lang.Deprecated: @java.lang.Deprecated()]

with the current versions:
- Version of Elasticsearch is 1.4.2

- java version "1.8.0_25"
Java(TM) SE Runtime Environment (build 1.8.0_25-b18)
Java HotSpot(TM) 64-Bit Server VM (build 25.25-b02, mixed mode)

-Windows 8
Microsoft Windows [version 6.2.9200]


Does Elasticsearch run with Java 8, 64-bit?

Best Regards
Fabio



Elasticsearch on Cloudlinux Crashes

2015-01-23 Thread Max B
Hey all,

I use Elasticsearch with the forum software known as XenForo to index all
the posts and all that good stuff.

We recently converted the server to CloudLinux, and whenever anyone tries
to rebuild the search index, it crashes; I can't seem to make it log
anything either. CloudLinux support, after about a day of investigating the
server, told me to come and ask my question here.

I've been told constantly that nothing is wrong with our setup and it
should just be working.

Any ideas?



Re: _count differs (same index, different number of shards)

2015-01-23 Thread Daniel Gligorov
Yup, that's what I meant. I was editing the number of shards in the backup.
I thought it might work that way, with the data ending up the same size, just
in fewer shards. Thanks for your opinion.

I need to change the number of shards, but that is not editable on an existing
index. Re-indexing will take too long (3 days), so I'm looking for a way to end
up with the new shard count by backing up the old index and restoring it into a
modified one. I tried https://github.com/mallocator/Elasticsearch-Exporter, but
that one takes a long time too (1 day). The snapshot and restore functionality
does it in 1h. I have 40 million docs, 100GB.
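For reference, a hedged sketch of the documented snapshot/restore path being compared here (the repository name is hypothetical; OLD and NEW are the index names from this thread, and a restore keeps the shard count recorded in the snapshot):

curl -XPUT 'localhost:9200/_snapshot/my_repo/snap_1?wait_for_completion=true' -d '{
  "indices": "OLD"
}'
# restore under a different name; number_of_shards still comes from the snapshot
curl -XPOST 'localhost:9200/_snapshot/my_repo/snap_1/_restore' -d '{
  "indices": "OLD",
  "rename_pattern": "OLD",
  "rename_replacement": "NEW"
}'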

On Friday, January 23, 2015 at 1:34:21 AM UTC-8, David Pilato wrote:
>
> Something I don't get. You are changing the number of shards of an index? 
> This is something you should not be able to do IMHO.
> You cannot restore the full backup into fewer shards.
>
> That's probably the reason you are seeing less docs in your new index 
> because you restored only half of shards with what I would call somehow a 
> "hack".
>
> David
>
> On 23 Jan 2015, at 03:23, Daniel Gligorov wrote:
>
> ...some more clarification: NEW and NEW_10 are restored from the same snapshot
> taken from OLD.
> What differs is that before restoring NEW_10, I'm manually editing the number
> of shards in the mapping in the snapshot (hdfs_repository plugin).
>
> On Thursday, January 22, 2015 at 6:22:04 PM UTC-8, Daniel Gligorov wrote:
>>
>> Hi,
>>
>> Does anybody know how a different number of primary shards can affect the
>> number of docs (_count) in the same index?
>>
>> I mean: I have a read-only index OLD, took a snapshot of it (ES 1.2) and
>> restored it as NEW. Both have the same number of shards and docs.
>> Then I restored another index, NEW_10, with fewer primary shards, and I'm
>> getting a lower document count in it.
>> No matter how many times I retested snapshot and restore, if I don't edit
>> the number of shards I end up with the right count, so I wonder what messes
>> up my case.
>>
>> [:~] curl localhost:9200/_cat/indices/OLD,NEW,NEW_10?v
>> health index   pri rep docs.count docs.deleted store.size pri.store.size
>> green  OLD      20   2    1333718        78639     15.7gb          5.2gb
>> green  NEW      20   1    1333480        78624     10.5gb          5.2gb
>> green  NEW_10   10   1     666064        42696      5.2gb          2.6gb
>>
>> I tried simple math, (docs.count - docs.deleted) / pri, but I'm not getting
>> the same result.
>>
>> Thank you,
>>



Numberformat exception

2015-01-23 Thread Prateek Asthana

I am indexing different types of fields, including numbers and strings.
When I use multi_match queries (that include both number and string fields)
for terms such as "2001 camaro", I get a NumberFormatException. I understand
why this is happening, but I am wondering if it can be avoided. Temporarily, I
have fixed it by indexing all fields as strings instead of numbers.

Example:
Fields used: year (number) and model (string)
The multi_match cross-fields query includes both year and model.
Search query: 2001 camaro
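A sketch of the kind of query described above (the cars index name is hypothetical; "lenient": true is an option documented for match/multi_match in the 1.x line, mentioned here only as something worth checking, not as this thread's answer):

curl -XGET 'localhost:9200/cars/_search' -d '{
  "query": {
    "multi_match": {
      "query": "2001 camaro",
      "fields": ["year", "model"],
      "type": "cross_fields",
      "lenient": true
    }
  }
}'
# with year mapped as a number and without lenient, the token "camaro" cannot be
# parsed against the year field, which is where the NumberFormatException comes from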






Re: Help creating a near real time streaming plugin to perform replication between clusters

2015-01-23 Thread Todd Nine
Thanks for the pointers Jorg,
  We use Rx Java in our current application, so I'm familiar with 
backpressure and ensuring we don't overwhelm target systems.  I've been 
mulling over the high level design a bit more.  A common approach in all 
systems that perform multi region replication is the concept of "log 
shipping".  It's used heavily in SQL systems for replication, as well as in 
systems such as Megastore/HBase.  This seems like it would be the most 
efficient way to ship data from Region A to Region B with a reasonable 
amount of latency.  I was thinking something like the following.

*Admin Operation Replication*

This can get messy quickly.  I'm thinking I won't have any sort of "merge" 
logic since this can get very different for everyone's use case.  I was 
going to support broadcasting the following operations.


   - Index creation
   - Index deletion
   - Index mapping updates
   - Alias index addition
   - Alias index removal

This can also get tricky because it makes the assumption of unique index 
operations in each region.  Our indexes are Time UUID based, so I know we 
won't get conflicts.  I won't handle the case of an operation being 
replayed that conflicts with an existing index, I'll simply log it and drop 
it.  Handlers could be built in later so users could create their own 
resolution logic.  Also, this must be replayed in a very strict order.  I'm 
concerned that adding this additional master/master region communication 
could result in more load on the master.  This can be solved by running a 
dedicated master, but I don't really see any other solution.


*Data Replication*

1) Store last sent segments, probably in a system index.  Each region could 
be offline at different times, so for each segment I'll need to know where 
it's been sent.

2) Monitor segments as they're created.  I still need to figure this out a 
bit more in the context of latent sending. 

Example.  Region us-east-1 ES nodes.

We missed sending 5 segments to us-west-1 , and they were merged into 1.  I 
now only need to send the 1 merged segment to us-west-1, since the other 5 
segments will be removed.

However, when a merged segment is created in us-east-1 from 5 segments I've
already sent to us-west-1, I won't want to ship it, since it will already
contain the data.  As the tree is continually merged, I'll need to somehow
sort out what contains shipped data, and what contains unshipped data.


3) As a new segment is created perform the following.
  3.a) Replay any administrative operations since the last sync on the 
index to the target region, so the state is current.
  3.b) Push the segment to the target region

4) The region receives the segment, and adds it to its current segments.
 When a segment merge happens in the receiving region, this will get merged 
in.





Thoughts?




On Thursday, January 15, 2015 at 5:29:10 PM UTC-7, Jörg Prante wrote:
>
> While it seems quite easy to attach listeners to an ES node to capture 
> operations in translog-style and push out index/delete operations on shard 
> level somehow, there will be more to consider for a reliable solution.
>
> The Couchbase developers have added a data replication protocol to their 
> product which is meant for transporting changes over long distances with 
> latency for in-memory processing.
>
> To learn about the most important features, see
>
> https://github.com/couchbaselabs/dcp-documentation
>
> and
>
> http://docs.couchbase.com/admin/admin/Concepts/dcp.html
>
> I think bringing such a concept of an inter cluster protocol into ES could 
> be a good starting point, to sketch the complete path for such an ambitious 
> project beforehand.
>
> Most challenging could be dealing with back pressure when receiving 
> nodes/clusters are becoming slow. For a solution to this, reactive Java / 
> reactive streams look like a viable possibility.
>
> See also
>
> https://github.com/ReactiveX/RxJava/wiki/Backpressure
>
> http://www.ratpack.io/manual/current/streams.html
>
> I'm in favor of Ratpack since it comes with Java 8, Groovy, Google Guava, 
> and Netty, which has a resemblance to ES.
>
> In ES, for inter cluster communication, there is not much coded afaik, 
> except snapshot/restore. Maybe snapshot/restore can provide everything you 
> want, with incremental mode. Lucene will offer numbered segment files for 
> faster incremental snapshot/restore.
>
> Just my 2¢
>
> Jörg
>
>
>
> On Thu, Jan 15, 2015 at 7:00 PM, Todd Nine wrote:
>
>> Hey all,
>>   I would like to create a plugin, and I need a hand.  Below are the 
>> requirements I have.
>>
>>
>>- Our documents are immutable.  They are only ever created or 
>>deleted, updates do not apply.
>>- We want mirrors of our ES cluster in multiple AWS regions.  This 
>>way if the WAN between regions is severed for any reason, we do not 
>> suffer 
>>an outage, just a delay in consistency.
>>- As documents are added or removed they are rolled up then shipped 
>>

Re: Snapshot delete is executed but API call never returns

2015-01-23 Thread Nick Canzoneri
It looks like it is this issue:
https://github.com/elasticsearch/elasticsearch/issues/8958

On Thu, Jan 22, 2015 at 4:23 PM, Nick Canzoneri  wrote:

> Elasticsearch version 1.4.2
> The repository is of type "fs" and points to a NFS directory.
> Taking snapshots succeed just fine.
>
> When deleting a snapshot however, I can wait several minutes with no
> response. Pretty much immediately though if I GET the snapshot I get a 404
> in response (expected). Also, in the repository directory itself the
> folders related to the snapshot are deleted.
>
> The same behavior occurs if I shoot the request at a client node, the
> master or a data node.
>
> Let me know what more info I should provide.
>
> Thanks,
>
> --
> Nick Canzoneri
> Developer, Wildbit 
> Beanstalk , Postmark ,
> dploy.io
>



-- 
Nick Canzoneri
Developer, Wildbit 
Beanstalk , Postmark ,
dploy.io



Re: distributors vs raid0

2015-01-23 Thread joergpra...@gmail.com
Now that I re-read

http://www.elasticsearch.org/guide/en/elasticsearch/reference/current/setup-dir-layout.html


I see the possible misconception. "RAID0" in the text is meant to give the 
picture that an ES data directory should be seen as a logical drive which 
contains many files spread over physical drives. RAID0 striping on hardware 
controllers works differently from this: the data is split into small 
stripes that are read/written simultaneously across the physical drives, 
and filesystem or free-space considerations have nothing to do with RAID.

The ES store distributor was implemented to handle the situation where data
dirs on a node may have different free storage capacity. With the setting 
"least_used" (which is the default, it really means "most_free"), ES selects
the mount point for new files that has the most free space first, so the
data paths are filled optimally by using all available space.
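
As a sketch (the paths are made up), this is the kind of elasticsearch.yml 
setup the distributor is meant for: several data paths of different sizes 
under one node, with the default distributor named explicitly:

path.data: /mnt/disk1/es,/mnt/disk2/es,/mnt/disk3/es

# optional, "least_used" is already the default
index.store.distributor: least_used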

I don't think the distributor is of any value for future index recovery
strategies, it is too low level. Recovery will become more intelligent with
the advent of numbered sequences in Lucene segments, which allows
incremental recovery and replication of shards.

Jörg


On Fri, Jan 23, 2015 at 4:56 PM, Shaun Senecal  wrote:

> Thanks for the confirmations Jörg, Mark
>
> It seems like a lot of development effort to implement this feature for
> little to no gain over RAID-0, so I wonder if the folks at ElasticSearch
> have bigger plans for it in the future.  Perhaps file based recovery and/or
> a distributor that keeps all files for a given shard together on the same
> drive so that a failed drive results in the loss of only a few shards
> rather than an entire node.  For now though, it seems RAID is the way to go.
>
>
> Shaun
>
>
> On Friday, January 23, 2015 at 2:53:53 AM UTC-8, Jörg Prante wrote:
>>
>> There are no advantages for JBOD over RAID0. RAID0 is far superior when
>> using striped reads/writes, that is, you can add up the read/write
>> performance of all the physical drives when using a hardware RAID
>> controller. JBOD is limited to single physical drive performance .
>>
>> There is only one rare case, if you want to mix physical drives with
>> different volume capacity, where RAID0 striping can not be applied. Then
>> JBOD adds up all the volumes of the drives where striped RAID0 uses the
>> smallest drive capacity only.
>>
>> And you are correct, in either case losing a drive means failure of a
>> machine. ES solves node failures by replica shards on other machines, not
>> by a file-based repairing strategy.
>>
>> Jörg
>>
>> On Fri, Jan 23, 2015 at 12:38 AM, Mark Walkom  wrote:
>>
>>> You will lose all data if one drive dies and you use ES striping or
>>> RAID0.
>>>
>>> I don't know if there is a practical (throughput) difference, but
>>> logically they are the same.
>>>
>>> On 23 January 2015 at 10:21, Shaun Senecal  wrote:
>>>
 What is the advantage of using ElasticSearch's distributors (JBOD) over
 using raid0?

 As far as I can tell, if I lose a drive in either case I lose the whole
 node until the data can be recovered.  Is the distributor smart about which
 files it recovers and only recovers the files that were on the failed
 drive?  Is there some other advantage I am missing?


 Shaun

 --
 You received this message because you are subscribed to the Google
 Groups "elasticsearch" group.
 To unsubscribe from this group and stop receiving emails from it, send
 an email to elasticsearc...@googlegroups.com.
 To view this discussion on the web visit https://groups.google.com/d/
 msgid/elasticsearch/4d9091bb-a472-466d-9e68-83fad57c5449%
 40googlegroups.com.
 For more options, visit https://groups.google.com/d/optout.

>>>
>>>  --
>>> You received this message because you are subscribed to the Google
>>> Groups "elasticsearch" group.
>>> To unsubscribe from this group and stop receiving emails from it, send
>>> an email to elasticsearc...@googlegroups.com.
>>> To view this discussion on the web visit https://groups.google.com/d/
>>> msgid/elasticsearch/CAEYi1X807P06tm%2Bb7Umt_
>>> kFRfV5xNjxbtLghL3VAprG0w9SZnA%40mail.gmail.com
>>> 
>>> .
>>>
>>> For more options, visit https://groups.google.com/d/optout.
>>>
>>
>>  --
> You received this message because you are subscribed to the Google Groups
> "elasticsearch" group.
> To unsubscribe from this group and stop receiving emails from it, send an
> email to elasticsearch+unsubscr...@googlegroups.com.
> To view this discussion on the web visit
> https://groups.google.com/d/msgid/elasticsearch/b17c012e-15f4-4b64-bfbb-fcfcdda25fe1%40googlegroups.com
> 
> .
>
> For more options, visit https://groups.google.com/d/optout.

Kibana scripted dashboard click handler support?

2015-01-23 Thread Jeff Guth
Hello!  I am configuring my Kibana dashboard via scripted dashboard 
(http://www.elasticsearch.org/guide/en/kibana/current/templated-and-scripted-dashboards.html#scripted-dashboards-.js).
  
What I'm trying to do is dynamically change the field parameter when a 
sub-element in a pie chart is clicked.  This means that I need to implement 
a click handler.  Does a scripted dashboard have click handler support?  If 
so, could someone share example code?

Thank you!
-Jeff Guth

-- 
You received this message because you are subscribed to the Google Groups 
"elasticsearch" group.
To unsubscribe from this group and stop receiving emails from it, send an email 
to elasticsearch+unsubscr...@googlegroups.com.
To view this discussion on the web visit 
https://groups.google.com/d/msgid/elasticsearch/36cdccae-4655-4977-96c1-066fdd7518e4%40googlegroups.com.
For more options, visit https://groups.google.com/d/optout.


How to tell if connection good from JavaScript?

2015-01-23 Thread Blake McBride
Greetings,

I am using Node/JavaScript to access elasticsearch.  I am using the 1.4 API.

If a connection fails (using new elasticsearch.Client(...)), the 'new' 
succeeds.  You don't know the connection failed till you try to use it. 
 Okay, I can live with that.  What I can't live with is the backtrace I get 
when I try to use it rather than just an error return.

I get no error when I connect (to the wrong port). But I get the following 
when I try to use it:


Elasticsearch ERROR: 2015-01-23T17:14:40Z Error: Request error, retrying -- connect ECONNREFUSED
    at Log.error (/home/blake/ComponentServer/node_modules/elasticsearch/src/lib/log.js:213:60)
    at checkRespForFailure (/home/blake/ComponentServer/node_modules/elasticsearch/src/lib/transport.js:195:18)
    at HttpConnector. (/home/blake/ComponentServer/node_modules/elasticsearch/src/lib/connectors/http.js:154:7)
    at ClientRequest.bound (/home/blake/ComponentServer/node_modules/elasticsearch/node_modules/lodash-node/modern/internals/baseBind.js:56:17)
    at ClientRequest.emit (events.js:95:17)
    at Socket.socketErrorListener (http.js:1552:9)
    at Socket.emit (events.js:95:17)
    at net.js:441:14
    at process._tickCallback (node.js:442:13)
Elasticsearch WARNING: 2015-01-23T17:14:40Z Unable to revive connection: http://localhost:920/
Elasticsearch WARNING: 2015-01-23T17:14:40Z No living connections error: No Living connections

I tried using client.cluster.health and client.ping with the same results.

Thanks for the help!

Blake McBride

-- 
You received this message because you are subscribed to the Google Groups 
"elasticsearch" group.
To unsubscribe from this group and stop receiving emails from it, send an email 
to elasticsearch+unsubscr...@googlegroups.com.
To view this discussion on the web visit 
https://groups.google.com/d/msgid/elasticsearch/305807ef-2952-4ee9-88e9-653122269b3e%40googlegroups.com.
For more options, visit https://groups.google.com/d/optout.


Re: Cluster misconfigured or elastic is just SLOW!!!!?

2015-01-23 Thread Kimbro Staken
How many shards to use is a complicated question and depends on the
specific use case. For testing in this scenario though, it's likely that
just matching the number of nodes you have would be a good choice. Then you
will have 1 primary shard for each index on each node.

That said, it also looks like maybe you're creating more indices than is
ideal. 64 shards with 1 replica across 6 days of daily indices should only
produce 768 total shards (64 primaries x 2 copies x 6 days). Looks to me
like maybe you have 1 index per hour?
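
A minimal way to try that here (a sketch; the template name and index
pattern are assumptions, adjust them to whatever logstash is creating) is an
index template that overrides the shard count for new daily indices:

curl -XPUT 'localhost:9200/_template/nine_shards' -d '
{
  "template" : "logstash-*",
  "settings" : {
    "index.number_of_shards" : 9,
    "index.number_of_replicas" : 1
  }
}'

With 9 primaries and 1 replica per daily index, 6 days of data would be
9 x 2 x 6 = 108 shards in total instead of 768.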

On Fri, Jan 23, 2015 at 8:28 AM, Sam Flint  wrote:

> Here is the .yml part for the shards
>
> # Set the number of shards (splits) of an index (5 by default):
> #
> index.number_of_shards: 64
>
> # Set the number of replicas (additional copies) of an index (1 by
> default):
> #
> index.number_of_replicas: 1
>
>
> I have it set to 64 shards and 1 replica.   Is there a recommended amount
> of shards that I should be using?
>
> On Thursday, January 22, 2015 at 1:45:23 PM UTC-5, Sam Flint wrote:
>>
>> Hi,
>> I am evaluating elastic search for a data warehouse project.  I have
>> a 9 node cluster with 1 replica.  I have loaded 6 days worth of data.
>> Things seem sluggish all around.   Here is the health of the nodes
>>
>> {
>> "active_primary_shards": 9251,
>> "active_shards": 13509,
>> "cluster_name": "qa-elasticsearch",
>> "initializing_shards": 23,
>> "number_of_data_nodes": 9,
>> "number_of_nodes": 9,
>> "relocating_shards": 0,
>> "status": "red",
>> "timed_out": false,
>> "unassigned_shards": 5054
>> }
>>
>>
>> I am wondering if I am missing some configuration that will speed things
>> up or if I did something wrong?   I see the status is "RED" and it seems
>> that there are still node initializing and the cluster has been up for over
>> 24 hours.   I used logstash to push the data into elastic.
>> Any help would be appreciatedor if any additional information is
>> needed I can provide.
>>
>> Thanks
>>
>  --
> You received this message because you are subscribed to the Google Groups
> "elasticsearch" group.
> To unsubscribe from this group and stop receiving emails from it, send an
> email to elasticsearch+unsubscr...@googlegroups.com.
> To view this discussion on the web visit
> https://groups.google.com/d/msgid/elasticsearch/244fbf99-5ec4-4052-a860-16fad53159ed%40googlegroups.com
> 
> .
>
> For more options, visit https://groups.google.com/d/optout.
>

-- 
You received this message because you are subscribed to the Google Groups 
"elasticsearch" group.
To unsubscribe from this group and stop receiving emails from it, send an email 
to elasticsearch+unsubscr...@googlegroups.com.
To view this discussion on the web visit 
https://groups.google.com/d/msgid/elasticsearch/CAA0DmXbZRa-_05-eKuF7Tej88q4u1gtK4RB9b2ypTDn1gfacBw%40mail.gmail.com.
For more options, visit https://groups.google.com/d/optout.


Question on segments in memory

2015-01-23 Thread Han JU
Hi,

Recently we experienced some slow queries, and eventually lots of queries 
timed out. We found that on some data nodes the heap usage is above 90% and 
GC cannot free much space.
We have already marked many fields as doc values, so the fielddata cache 
size is small (~100MB), but we found that there are lots of segments in 
memory and GC does not seem able to free them.

More precisely, on the data nodes we have a 4GB heap and nearly 3GB is 
occupied by in-memory segments. At the same time we checked that only a few 
(~5) segments are not committed.

Our questions are:
  - why do we have these segments in memory when they are all committed?
  - why can GC not free them when there is pressure on the heap?
  - what brings these segments into memory? search, aggregation and maybe 
the recovery process?

Thanks!

-- 
You received this message because you are subscribed to the Google Groups 
"elasticsearch" group.
To unsubscribe from this group and stop receiving emails from it, send an email 
to elasticsearch+unsubscr...@googlegroups.com.
To view this discussion on the web visit 
https://groups.google.com/d/msgid/elasticsearch/05b2380d-7afe-4b43-822a-c5d1eeff2733%40googlegroups.com.
For more options, visit https://groups.google.com/d/optout.


Re: Kibana 4 no result found

2015-01-23 Thread Marian Valero
I have this when I search a field: "This field is present in your 
elasticsearch mapping, but not in any documents in the search results. You 
may still be able to visualize or search on it."

-- 
You received this message because you are subscribed to the Google Groups 
"elasticsearch" group.
To unsubscribe from this group and stop receiving emails from it, send an email 
to elasticsearch+unsubscr...@googlegroups.com.
To view this discussion on the web visit 
https://groups.google.com/d/msgid/elasticsearch/6c8b18a9-72cd-4383-9de3-7e0168117b06%40googlegroups.com.
For more options, visit https://groups.google.com/d/optout.


Kibana 4 no result found

2015-01-23 Thread Marian Valero
I have indexed data in ES and I can visualize it in Kibana 3, but I 
installed Kibana 4, and when I go to search or create a bar chart or any 
chart I get this: "No result found". But I can see my index in settings and 
all the fields that I have indexed.

-- 
You received this message because you are subscribed to the Google Groups 
"elasticsearch" group.
To unsubscribe from this group and stop receiving emails from it, send an email 
to elasticsearch+unsubscr...@googlegroups.com.
To view this discussion on the web visit 
https://groups.google.com/d/msgid/elasticsearch/56a1bc74-d563-4789-a1cf-7ef29d61a52b%40googlegroups.com.
For more options, visit https://groups.google.com/d/optout.


Re: distributors vs raid0

2015-01-23 Thread Shaun Senecal
Thanks for the confirmations Jörg, Mark

It seems like a lot of development effort to implement this feature for 
little to no gain over RAID-0, so I wonder if the folks at ElasticSearch 
have bigger plans for it in the future.  Perhaps file based recovery and/or 
a distributor that keeps all files for a given shard together on the same 
drive so that a failed drive results in the loss of only a few shards 
rather than an entire node.  For now though, it seems RAID is the way to go.


Shaun


On Friday, January 23, 2015 at 2:53:53 AM UTC-8, Jörg Prante wrote:
>
> There are no advantages for JBOD over RAID0. RAID0 is far superior when 
> using striped reads/writes, that is, you can add up the read/write 
> performance of all the physical drives when using a hardware RAID 
> controller. JBOD is limited to single physical drive performance . 
>
> There is only one rare case, if you want to mix physical drives with 
> different volume capacity, where RAID0 striping can not be applied. Then 
> JBOD adds up all the volumes of the drives where striped RAID0 uses the 
> smallest drive capacity only.
>
> And you are correct, in either case losing a drive means failure of a 
> machine. ES solves node failures by replica shards on other machines, not 
> by a file-based repairing strategy.
>
> Jörg
>
> On Fri, Jan 23, 2015 at 12:38 AM, Mark Walkom  > wrote:
>
>> You will lose all data if one drive dies and you use ES striping or RAID0.
>>
>> I don't know if there is a practical (throughput) difference, but 
>> logically they are the same.
>>
>> On 23 January 2015 at 10:21, Shaun Senecal > > wrote:
>>
>>> What is the advantage of using ElasticSearch's distributors (JBOD) over 
>>> using raid0?
>>>
>>> As far as I can tell, if I lose a drive in either case I lose the whole 
>>> node until the data can be recovered.  Is the distributor smart about which 
>>> files it recovers and only recovers the files that were on the failed 
>>> drive?  Is there some other advantage I am missing?
>>>
>>>
>>> Shaun
>>>
>>> --
>>> You received this message because you are subscribed to the Google 
>>> Groups "elasticsearch" group.
>>> To unsubscribe from this group and stop receiving emails from it, send 
>>> an email to elasticsearc...@googlegroups.com .
>>> To view this discussion on the web visit 
>>> https://groups.google.com/d/msgid/elasticsearch/4d9091bb-a472-466d-9e68-83fad57c5449%40googlegroups.com
>>> .
>>> For more options, visit https://groups.google.com/d/optout.
>>>
>>
>>  -- 
>> You received this message because you are subscribed to the Google Groups 
>> "elasticsearch" group.
>> To unsubscribe from this group and stop receiving emails from it, send an 
>> email to elasticsearc...@googlegroups.com .
>> To view this discussion on the web visit 
>> https://groups.google.com/d/msgid/elasticsearch/CAEYi1X807P06tm%2Bb7Umt_kFRfV5xNjxbtLghL3VAprG0w9SZnA%40mail.gmail.com
>>  
>> 
>> .
>>
>> For more options, visit https://groups.google.com/d/optout.
>>
>
>

-- 
You received this message because you are subscribed to the Google Groups 
"elasticsearch" group.
To unsubscribe from this group and stop receiving emails from it, send an email 
to elasticsearch+unsubscr...@googlegroups.com.
To view this discussion on the web visit 
https://groups.google.com/d/msgid/elasticsearch/b17c012e-15f4-4b64-bfbb-fcfcdda25fe1%40googlegroups.com.
For more options, visit https://groups.google.com/d/optout.


Re: Cluster misconfigured or elastic is just SLOW!!!!?

2015-01-23 Thread Sam Flint
Here is the .yml part for the shards

# Set the number of shards (splits) of an index (5 by default):
#
index.number_of_shards: 64

# Set the number of replicas (additional copies) of an index (1 by default):
#
index.number_of_replicas: 1


I have it set to 64 shards and 1 replica.   Is there a recommended amount 
of shards that I should be using? 

On Thursday, January 22, 2015 at 1:45:23 PM UTC-5, Sam Flint wrote:
>
> Hi, 
> I am evaluating elastic search for a data warehouse project.  I have a 
> 9 node cluster with 1 replica.  I have loaded 6 days worth of data.   
> Things seem sluggish all around.   Here is the health of the nodes 
>
> {
> "active_primary_shards": 9251,
> "active_shards": 13509,
> "cluster_name": "qa-elasticsearch",
> "initializing_shards": 23,
> "number_of_data_nodes": 9,
> "number_of_nodes": 9,
> "relocating_shards": 0,
> "status": "red",
> "timed_out": false,
> "unassigned_shards": 5054
> }
>
>
> I am wondering if I am missing some configuration that will speed things 
> up or if I did something wrong?   I see the status is "RED" and it seems 
> that there are still node initializing and the cluster has been up for over 
> 24 hours.   I used logstash to push the data into elastic.
> Any help would be appreciated, or if any additional information is 
> needed I can provide it. 
>
> Thanks
>

-- 
You received this message because you are subscribed to the Google Groups 
"elasticsearch" group.
To unsubscribe from this group and stop receiving emails from it, send an email 
to elasticsearch+unsubscr...@googlegroups.com.
To view this discussion on the web visit 
https://groups.google.com/d/msgid/elasticsearch/244fbf99-5ec4-4052-a860-16fad53159ed%40googlegroups.com.
For more options, visit https://groups.google.com/d/optout.


Need review for my REST query (template modification)

2015-01-23 Thread Aldian
Hi!

I am using the usual ELK stack with the default template (
http://pastebin.com/DtYiazVr 
).
 
In every log message, the date is stored in a field named "log_date", which 
the date filter converts into "@timestamp". I want to set the "log_date" 
field as "not_analyzed" so that I can sort it in Kibana without getting 
weird results.

I built the following query 

curl -XPUT localhost:9200/_template/template_1 -d '
{
"template" : "logstash-*",
"properties" : {
"log_date" : {
"type" : "string",
"index" : "not_analyzed"
}
}
}

Can you confirm that the request is correct? I have doubts about the 
template name. I thought about calling url 
localhost:9200/_template/logstash in order to modify the existing template 
rather than creating a new one, but I am afraid of what could happen the 
day I restart logstash, so my thinking is that if all works as intended, 
both logstash default template and that one will apply.
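
For comparison, the template examples I have seen nest the field settings 
under a "mappings" block rather than a top-level "properties", roughly like 
this (just a sketch; the "_default_" mapping is an assumption, an explicit 
type name works too):

curl -XPUT 'localhost:9200/_template/template_1' -d '
{
  "template" : "logstash-*",
  "mappings" : {
    "_default_" : {
      "properties" : {
        "log_date" : {
          "type" : "string",
          "index" : "not_analyzed"
        }
      }
    }
  }
}'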

Also, I believe that templates only apply to future data. Is there any way 
to retroactively apply it to existing indexes?

Thanks for your help

Aldian

-- 
You received this message because you are subscribed to the Google Groups 
"elasticsearch" group.
To unsubscribe from this group and stop receiving emails from it, send an email 
to elasticsearch+unsubscr...@googlegroups.com.
To view this discussion on the web visit 
https://groups.google.com/d/msgid/elasticsearch/1c7fe34e-7792-409d-83ab-3c39fe883e95%40googlegroups.com.
For more options, visit https://groups.google.com/d/optout.


Bulk importing CSVs with different headers

2015-01-23 Thread Ron
Hello all,

So I'm trying to import a large number of CSV files into Elasticsearch. All 
the files have different content in them, with different headers. 

My goal is to have a directory we can drop CSVs into, and some plugin or 
process would pick them up, read the header and place the data in 
Elasticsearch mapping data to fields (gotten from the header). 

I've looked at the CSV River plugin and Fluentd. It looks like they both 
support half of the plan. The issue we run into is it looks like both want 
static field names before import.  

Am I wrong?  Anyone's help would be wonderful.

Thanks

Ron

-- 
You received this message because you are subscribed to the Google Groups 
"elasticsearch" group.
To unsubscribe from this group and stop receiving emails from it, send an email 
to elasticsearch+unsubscr...@googlegroups.com.
To view this discussion on the web visit 
https://groups.google.com/d/msgid/elasticsearch/54f555d9-1d66-4b19-9334-90dde4fc658a%40googlegroups.com.
For more options, visit https://groups.google.com/d/optout.


Automatic index creation

2015-01-23 Thread Abhimanyu Nagrath
Hi, I am new to Elasticsearch and I want to know whether I can create 
indexes automatically based on time. Is there a feature in Elasticsearch 
that supports this? For example, I want to create a new index every 4 hours 
automatically; does Elasticsearch support this?
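
For reference, Elasticsearch creates an index automatically on the first 
write to a new index name (automatic index creation is enabled by default), 
so time-based indices are usually produced by having the indexing client 
put a time suffix in the index name. A sketch (index, type and field names 
are made up):

curl -XPOST 'localhost:9200/events-2015.01.23.08/event' -d '
{
  "message" : "something happened",
  "@timestamp" : "2015-01-23T08:15:00Z"
}'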

-- 
You received this message because you are subscribed to the Google Groups 
"elasticsearch" group.
To unsubscribe from this group and stop receiving emails from it, send an email 
to elasticsearch+unsubscr...@googlegroups.com.
To view this discussion on the web visit 
https://groups.google.com/d/msgid/elasticsearch/d91afc72-1c1d-4885-a98c-923ae808c4be%40googlegroups.com.
For more options, visit https://groups.google.com/d/optout.


beginner's question: matching field names exactly (or containing a substring)

2015-01-23 Thread r0b3rt
hello everybody!

I'm kinda new to elasticsearch and just joined this group. I started using 
it some months ago at work and am trying to get deeper into all this stuff.


I wondered if it is possible to do elasticsearch queries where I can match 
names of fields and not values of fields. 

In my case, I have a very large data model I am storing in elasticsearch 
with lots of field names sharing the same suffixes. 
I would like to know if I can query elasticsearch to give me a result with 
all documents containing all fields ending with "_xyz", and their values of 
course. 
I know I could do a simple query where I just write all the field names 
explicitly, but I am curious if I could save time (and spend the saved time 
here to ask this question...)
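
To illustrate, this is the kind of query I am hoping for, as a sketch (the 
index name is made up): a single query_string whose "fields" list uses a 
wildcard, so a value is searched across every field ending in "_xyz" 
without naming them all:

curl -XGET 'localhost:9200/myindex/_search' -d '
{
  "query" : {
    "query_string" : {
      "fields" : ["*_xyz"],
      "query" : "some_value"
    }
  }
}'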

thank you all ;)

robert

-- 
You received this message because you are subscribed to the Google Groups 
"elasticsearch" group.
To unsubscribe from this group and stop receiving emails from it, send an email 
to elasticsearch+unsubscr...@googlegroups.com.
To view this discussion on the web visit 
https://groups.google.com/d/msgid/elasticsearch/42dca496-5195-43fd-a755-375d691ecec0%40googlegroups.com.
For more options, visit https://groups.google.com/d/optout.


Re: Regarding data copy

2015-01-23 Thread David Pilato
I would copy the full data dir. Why? 
Are you looking for something special? Wondering what is behind your question...

David

> Le 23 janv. 2015 à 12:23, phani.nadimi...@goktree.com a écrit :
> 
> Thank you David for quick reply is it enough to copy the indices folder or 
> need to copy entire data folder?
> 
> phani
> 
> 
> 
>> On Friday, January 23, 2015 at 3:32:34 PM UTC+5:30, phani.n...@goktree.com 
>> wrote:
>> Hi All,
>> 
>>I have elastic search server of version 1.3.7 i have large indices on 
>> existing server.now i am establishing new servers with hardware 
>> up-gradation.I want to move or coy indices from  old elasticsearch server to 
>> new server new server also i am installing es 1.3.7 server.my question is 
>> will the indices work directly on other cluster if we copy data directly  or 
>> is there any other process or tools in elasticsearch
>> 
>> Thaks,
>> phani
> 
> -- 
> You received this message because you are subscribed to the Google Groups 
> "elasticsearch" group.
> To unsubscribe from this group and stop receiving emails from it, send an 
> email to elasticsearch+unsubscr...@googlegroups.com.
> To view this discussion on the web visit 
> https://groups.google.com/d/msgid/elasticsearch/ce16d52a-740d-461b-8a40-d7e40c16f3eb%40googlegroups.com.
> For more options, visit https://groups.google.com/d/optout.

-- 
You received this message because you are subscribed to the Google Groups 
"elasticsearch" group.
To unsubscribe from this group and stop receiving emails from it, send an email 
to elasticsearch+unsubscr...@googlegroups.com.
To view this discussion on the web visit 
https://groups.google.com/d/msgid/elasticsearch/C02EE18C-53C5-44B2-873E-6967FE16918C%40pilato.fr.
For more options, visit https://groups.google.com/d/optout.


ElasticSearch + Rails: how to refer to parent fields in nested / has_child query

2015-01-23 Thread Victor Zagorodny


Need help from guys with extensive experience in ElasticSearch or any other 
storage that supports full-text searching (matching phrases,tokenizers, 
analyzers), parent/child relationships, powerful and flexible queries and 
aggregations (sum/avg/min/max, group by specific field).

*Description of the problem*

There's a repository with 3 types of entities:

   - Search parameters (search)
   - Users (user)
   - Documents (document)

1 user has many documents

Having an instance of the search parameters object (search), which is 
created by a user (user), N documents are selected from storage based on 
these search parameters (target documents). For each of the N documents we 
must select 1 document of the same type (call this document co-document), 
on the basis of

   - Some parameters of the user who searches (so-called searcher)
   - An owner of target document of N found as a result of the initial 
   search (so-called owner)
   - Some of the fields of the document N (target document), at least 
   target_document.owner_id (mandatory condition)

The result is an array of pairs (target document; co-document) (the 
repository can contain documents that won't have a co-document under the 
current search conditions, and those are not in the array)

And finally, we need to build some aggregations against this array 
(aggregations in terms of ElasticSearch 1.0 and above), which are based on 
the fields of the target documents of these pairs

*Problems*

   - Using just data denormalization to store pairs (target document <-> 
   co-document) by itself does not help, because the relationship between the 
   target document and co-document is not defined statically, it dependent on 
   search params
   - Denormalizing data by storing a limited number of candidates to be 
   co-document of a particular target document (for example, with the same 
   owner_id, as a mandatory condition) won't help, because ElasticSearch does 
   not support referring parent object fields when querying for the child 
   objects (this is true for all types of queries related to parent-child 
   relations: nested query, has_child query)
   - You can solve the problem the dumb way by first selecting N target 
   documents, then for each of them you must choose co-document (N 
   subqueries), and then build aggregations against all these documents in the 
   application code; it's unacceptable in terms of UX, such a request takes 
   10-15 seconds on 15,000 documents, not taking into account the time to 
   build aggregations, and it's assumed to have hundreds of thousands or even 
   millions of documents
   - I've considered an option to return to the SQL-type storage like 
   PostgreSQL, but they do not support the necessary functions (eg search for 
   synonyms, removing stop-words), which are necessary in this case, or I 
   don't have info on such support

*Possible solutions*

   - Search for necessary, but perhaps unknown capabilities of 
   ElasticSearch which can help in solving this problem
   - Pick another storage having all of the ES capabilities + 
   joins/subqueries/you name it

Any help is appreciated.

-- 
You received this message because you are subscribed to the Google Groups 
"elasticsearch" group.
To unsubscribe from this group and stop receiving emails from it, send an email 
to elasticsearch+unsubscr...@googlegroups.com.
To view this discussion on the web visit 
https://groups.google.com/d/msgid/elasticsearch/a9955a7a-c04d-48f8-850a-7bb3c9dad52f%40googlegroups.com.
For more options, visit https://groups.google.com/d/optout.


Configuration of elasticsearch to index 300 Million documents

2015-01-23 Thread hitesh shekhada
Hi,


   1. My goal is to index 300 million documents/products from MS SQL server 
   database.
   2. To get all documents I need to join 14 different tables.
   3. Total data size of 300 million documents is 300GB
   4. There are 70 fields in one document
   5. One document is of size 0.8 kb
   6. Need to update value of 10 fields (out of 70) for almost 90 million 
   documents every night.


I need to know ...

   1. Has anybody indexed such a large amount of data into an elasticsearch 
   server?
   2. How many clusters/nodes do I need to handle it?


Please let me know if someone has used elasticsearch for this amount of 
data.

Thanks,
Hitesh


-- 
You received this message because you are subscribed to the Google Groups 
"elasticsearch" group.
To unsubscribe from this group and stop receiving emails from it, send an email 
to elasticsearch+unsubscr...@googlegroups.com.
To view this discussion on the web visit 
https://groups.google.com/d/msgid/elasticsearch/acc7bbd4-7abc-4451-bfd5-b21ab725e02e%40googlegroups.com.
For more options, visit https://groups.google.com/d/optout.


Re: Regarding data copy

2015-01-23 Thread phani . nadiminti
Thank you David for the quick reply. Is it enough to copy the indices folder, 
or do I need to copy the entire data folder?

phani



On Friday, January 23, 2015 at 3:32:34 PM UTC+5:30, phani.n...@goktree.com 
wrote:
>
> Hi All,
>
>I have elastic search server of version 1.3.7 i have large indices on 
> existing server.now i am establishing new servers with hardware 
> up-gradation.I want to move or coy indices from  old elasticsearch server 
> to new server new server also i am installing es 1.3.7 server.my question 
> is will the indices work directly on other cluster if we copy data 
> directly  or is there any other process or tools in elasticsearch
>
> Thaks,
> phani
>

-- 
You received this message because you are subscribed to the Google Groups 
"elasticsearch" group.
To unsubscribe from this group and stop receiving emails from it, send an email 
to elasticsearch+unsubscr...@googlegroups.com.
To view this discussion on the web visit 
https://groups.google.com/d/msgid/elasticsearch/ce16d52a-740d-461b-8a40-d7e40c16f3eb%40googlegroups.com.
For more options, visit https://groups.google.com/d/optout.


Re: need some n00b help: how do I install and set up synonyms file?

2015-01-23 Thread Thami Inaflas
Hello,

I guess you should re-index your data because you have changed the mapping, 
and check that this analysis is applied to your query AND to your indexed data.
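
One quick way to check the query-time side is the _analyze API; if the 
synonym filter is picked up, "clementine" should come back with the 
mandarin / orange / citrus tokens as well. A sketch using the index and 
analyzer names from your mail:

curl 'localhost:9200/test/_analyze?analyzer=synonym&text=clementine&pretty'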

I hope this helps you out,
TI

Le vendredi 23 janvier 2015 01:50:00 UTC+1, james rubinstein a écrit :
>
> Thanks Mark, 
> I believe that I have done that (but perhaps not)
>
> My file is located at
>
> /Users/jrubinstein/dev/elasticsearch/elasticsearch-1.4.1/config/analysis
>
> the YML files are in the 'config' directory. 
>
>
> On Thursday, January 22, 2015 at 3:37:39 PM UTC-8, Mark Walkom wrote:
>>
>> The path needs to be relative to the config home for ES.
>>
>> Take a look at 
>> http://www.elasticsearch.org/guide/en/elasticsearch/reference/current/setup-dir-layout.html#default-paths
>>  
>> for where that would be on your installation.
>>
>> On 23 January 2015 at 08:45, james rubinstein  
>> wrote:
>>
>>> Also, I tried this:
>>>
>>>
>>>
>>>
>>>
>>>
>>>
>>>
>>> curl -XPUT 'localhost:9200/test/_settings' -d '
>>>
>>> > {
>>>
>>> > "index" : {
>>>
>>> > "analysis" : {
>>>
>>> > "analyzer" : {
>>>
>>> > "synonym" : {
>>>
>>> > "tokenizer" : "whitespace",
>>>
>>> > "filter" : ["synonym"]
>>>
>>> > }
>>>
>>> > },
>>>
>>> > "filter" : {
>>>
>>> > "synonym" : {
>>>
>>> > "type" : "synonym",
>>>
>>> > "synonyms_path" : "analysis/synonym.txt"
>>>
>>> > }
>>>
>>> > }
>>>
>>> > }
>>>
>>> > }
>>>
>>> > }'
>>>
>>> Which gives the response : {"acknowledged":true} , but it still didn't 
>>> change the outcome when querying 
>>>
>>>
>>> On Thursday, January 22, 2015 at 12:57:51 PM UTC-8, james rubinstein 
>>> wrote:

 Hi all,
 I'm new to using ES, but have so far found the process quite intuitive. 
 I've installed ES, set up an index, indexed a bunch of JSON documents, and 
 I can search them. Hooray! 
 However, I want to start using synonyms for a few queries. I'd like to 
 do this at query time using a synonym file. How can I install the synonym 
 file and have ES read it? 
 I'm using elasticsearch-1.4.1 on Mac OSX locally. 

 I've read through the docs  
 
  but 
 I still don't understand what to do because I'm still not seeing queries 
 that use the synonym I've defined. 




 POST /test/_settings
 { 
 "index" : {
 "analysis" : {
 "analyzer" : {
 "synonym" : {
 "tokenizer" : "whitespace",
 "filter" : ["synonym"]
 }
 },
 "filter" : {
 "synonym" : {
 "type" : "synonym",
 "synonyms_path" : "analysis/synonym.txt"
 }
 }
 }
 }
 }

 My synonym file has one line:

 "clementine=> clementine,mandarin,orange,citrus"

 When I search for "Clementine" on my index I get 3 results, searching 
 for "Orange" gets me 66. 

 Thanks,
 JR


  -- 
>>> You received this message because you are subscribed to the Google 
>>> Groups "elasticsearch" group.
>>> To unsubscribe from this group and stop receiving emails from it, send 
>>> an email to elasticsearc...@googlegroups.com.
>>> To view this discussion on the web visit 
>>> https://groups.google.com/d/msgid/elasticsearch/a887f420-b04b-4d5d-80aa-14c120fb9da3%40googlegroups.com
>>>  
>>> 
>>> .
>>>
>>> For more options, visit https://groups.google.com/d/optout.
>>>
>>
>>

-- 
You received this message because you are subscribed to the Google Groups 
"elasticsearch" group.
To unsubscribe from this group and stop receiving emails from it, send an email 
to elasticsearch+unsubscr...@googlegroups.com.
To view this discussion on the web visit 
https://groups.google.com/d/msgid/elasticsearch/aadf03af-02f1-41f9-b6fc-f8963efb25c1%40googlegroups.com.
For more options, visit https://groups.google.com/d/optout.


Re: distributors vs raid0

2015-01-23 Thread joergpra...@gmail.com
There are no advantages for JBOD over RAID0. RAID0 is far superior when
using striped reads/writes, that is, you can add up the read/write
performance of all the physical drives when using a hardware RAID
controller. JBOD is limited to single physical drive performance .

There is only one rare case, if you want to mix physical drives with
different volume capacity, where RAID0 striping can not be applied. Then
JBOD adds up all the volumes of the drives where striped RAID0 uses the
smallest drive capacity only.

And you are correct, in either case losing a drive means failure of a
machine. ES solves node failures by replica shards on other machines, not
by a file-based repairing strategy.

Jörg

On Fri, Jan 23, 2015 at 12:38 AM, Mark Walkom  wrote:

> You will lose all data if one drive dies and you use ES striping or RAID0.
>
> I don't know if there is a practical (throughput) difference, but
> logically they are the same.
>
> On 23 January 2015 at 10:21, Shaun Senecal  wrote:
>
>> What is the advantage of using ElasticSearch's distributors (JBOD) over
>> using raid0?
>>
>> As far as I can tell, if I lose a drive in either case I lose the whole
>> node until the data can be recovered.  Is the distributor smart about which
>> files it recovers and only recovers the files that were on the failed
>> drive?  Is there some other advantage I am missing?
>>
>>
>> Shaun
>>
>> --
>> You received this message because you are subscribed to the Google Groups
>> "elasticsearch" group.
>> To unsubscribe from this group and stop receiving emails from it, send an
>> email to elasticsearch+unsubscr...@googlegroups.com.
>> To view this discussion on the web visit
>> https://groups.google.com/d/msgid/elasticsearch/4d9091bb-a472-466d-9e68-83fad57c5449%40googlegroups.com
>> .
>> For more options, visit https://groups.google.com/d/optout.
>>
>
>  --
> You received this message because you are subscribed to the Google Groups
> "elasticsearch" group.
> To unsubscribe from this group and stop receiving emails from it, send an
> email to elasticsearch+unsubscr...@googlegroups.com.
> To view this discussion on the web visit
> https://groups.google.com/d/msgid/elasticsearch/CAEYi1X807P06tm%2Bb7Umt_kFRfV5xNjxbtLghL3VAprG0w9SZnA%40mail.gmail.com
> 
> .
>
> For more options, visit https://groups.google.com/d/optout.
>

-- 
You received this message because you are subscribed to the Google Groups 
"elasticsearch" group.
To unsubscribe from this group and stop receiving emails from it, send an email 
to elasticsearch+unsubscr...@googlegroups.com.
To view this discussion on the web visit 
https://groups.google.com/d/msgid/elasticsearch/CAKdsXoG3tvLNqQ-iO%2B3Wdvexb34BoXB2wHoR1dcDhDB4R9x%3DHw%40mail.gmail.com.
For more options, visit https://groups.google.com/d/optout.


Re: Regarding data copy

2015-01-23 Thread David Pilato
Yes, you can copy the full data dir from one node to another.
Note that you should stop both clusters for that.

If you want to do that live, there are other strategies.
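
One of those strategies is snapshot and restore to a shared repository. A 
minimal sketch (repository name and path are placeholders; both clusters 
must be able to reach the location):

curl -XPUT 'localhost:9200/_snapshot/migration' -d '
{
  "type" : "fs",
  "settings" : { "location" : "/mnt/shared/es_backup" }
}'

curl -XPUT 'localhost:9200/_snapshot/migration/snap_1?wait_for_completion=true'

Then register the same repository on the new cluster and run:

curl -XPOST 'localhost:9200/_snapshot/migration/snap_1/_restore'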

David

> Le 23 janv. 2015 à 11:02, phani.nadimi...@goktree.com a écrit :
> 
> Hi All,
> 
>I have elastic search server of version 1.3.7 i have large indices on 
> existing server.now i am establishing new servers with hardware 
> up-gradation.I want to move or coy indices from  old elasticsearch server to 
> new server new server also i am installing es 1.3.7 server.my question is 
> will the indices work directly on other cluster if we copy data directly  or 
> is there any other process or tools in elasticsearch
> 
> Thaks,
> phani
> -- 
> You received this message because you are subscribed to the Google Groups 
> "elasticsearch" group.
> To unsubscribe from this group and stop receiving emails from it, send an 
> email to elasticsearch+unsubscr...@googlegroups.com.
> To view this discussion on the web visit 
> https://groups.google.com/d/msgid/elasticsearch/68223dde-82ef-4480-9897-ae9abf93131c%40googlegroups.com.
> For more options, visit https://groups.google.com/d/optout.

-- 
You received this message because you are subscribed to the Google Groups 
"elasticsearch" group.
To unsubscribe from this group and stop receiving emails from it, send an email 
to elasticsearch+unsubscr...@googlegroups.com.
To view this discussion on the web visit 
https://groups.google.com/d/msgid/elasticsearch/48B34D02-858A-4CAE-9AC3-24542EC32C54%40pilato.fr.
For more options, visit https://groups.google.com/d/optout.


Re: Better understanding Lucene/Shard overheads

2015-01-23 Thread Michael McCandless
There is definitely a non-trivial per-index cost.

From Lucene's standpoint, ES holds an IndexReader (for searching) and
IndexWriter (for indexing) open.

IndexReader requires some RAM for each segment to hold structures like live
docs, terms index, index data structures for doc values fields, and holds
open a number of file descriptors in proportion to how many segments are in
the index.

IndexWriter has a RAM buffer (indices.memory.index_buffer_size in ES) to
hold recently indexed/deleted documents, and periodically opens readers (10
at a time by default) to do merging, which bumps up RAM usage and file
descriptors while the merge runs.

There is also a per-indexed-field cost in Lucene; if you have a great many
unique indexed fields that may matter.

If you use field data, it's entirely RAM resident (doc values is a better
choice since it uses much less RAM).

ES has common thread pools on the node which are shared for all ops across
all shards on that node, so I don't think more indices translates to more
threads.

Net/net you really should just conduct your own tests to get a feel of
resource consumption in your use case...
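
For that kind of test, a few APIs give a quick per-node and per-shard view 
of where the memory and file descriptors go, e.g. (just a sketch, nothing 
specific to your setup): fielddata usage, shard layout, and the per-node 
indices/process stats:

curl 'localhost:9200/_cat/fielddata?v'
curl 'localhost:9200/_cat/shards?v'
curl 'localhost:9200/_nodes/stats/indices,process?pretty'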

Mike McCandless

http://blog.mikemccandless.com

On Thu, Jan 22, 2015 at 4:07 PM, Drew Kutcharian  wrote:

> Hi,
>
> I just came across this blog post:
> http://blog.mikemccandless.com/2010/07/lucenes-ram-usage-for-searching.html
>
> Seems like there has been a lot of work done on Lucene to reduce its
> memory requirements and even more on Lucene 5.0. This is specifically
> interesting to me since I’m working on a project that uses Elasticsearch
> and we are planning on using 1 index per customer model (each with 1 or
> maybe 2 shards and no replicas) and shard allocation, mainly because:
>
> 1. We are going to have few thousand customers at most
>
> 2. Each customer will only need access to their own data (no global
> queries)
>
> 3. The indices are going be relatively large (each with millions of small
> docs)
>
> 4. We are going to need to do a lot of parent/child type queries (and ES
> doesn’t support cross-shard parent/child relationships and the parent id
> cache seems not that efficient, see
> http://www.elasticsearch.org/guide/en/elasticsearch/guide/current/parent-child.html
>  and
> https://github.com/elasticsearch/elasticsearch/issues/3516#issuecomment-23081662).
> This is the main reason we feel we can’t use time based (daily, monthly, …)
> indices.
>
> 5. Being able to easily “drop” an index if a customer leaves the initial
> trial.
>
>
> I wanted to better understand the overheads of an Elasticsearch shard. Is
> it just memory or CPU/threads too? Where can I find more information about
> this?
>
> Thanks,
>
> Drew
>
>  --
> You received this message because you are subscribed to the Google Groups
> "elasticsearch" group.
> To unsubscribe from this group and stop receiving emails from it, send an
> email to elasticsearch+unsubscr...@googlegroups.com.
> To view this discussion on the web visit
> https://groups.google.com/d/msgid/elasticsearch/F59813A2-904C-4B29-BBC9-6174DD3C8DAF%40venarc.com
> 
> .
> For more options, visit https://groups.google.com/d/optout.
>

-- 
You received this message because you are subscribed to the Google Groups 
"elasticsearch" group.
To unsubscribe from this group and stop receiving emails from it, send an email 
to elasticsearch+unsubscr...@googlegroups.com.
To view this discussion on the web visit 
https://groups.google.com/d/msgid/elasticsearch/CAD7smRcpOy6RYgvi-GC6jpsuO1-qsRcTecUvr066Rkr3qxZijA%40mail.gmail.com.
For more options, visit https://groups.google.com/d/optout.


Re: stats aggregation on list length

2015-01-23 Thread Jilles van Gurp
Thanks! Now it works.

Best,

Jilles

On Friday, January 23, 2015 at 3:04:55 AM UTC+1, Masaru Hasegawa wrote:
>
> Hi, 
>
> Objects are flattened in index level. Nothing is indexed as “member” 
> that’s why you get the exception. 
> Using doc[‘members.name'] instead of doc[‘members’] in script should 
> work. 
>
>
> Masaru 
>
>
> On January 22, 2015 at 19:10:25, Jilles van Gurp (jilles...@gmail.com 
> ) wrote: 
> > I'm trying to do a stats aggregation on the list length using a script 
> but 
> > I'm getting errors. For this data, 
> >   
> > PUT test_groups/group/1 
> > { 
> > "name":"1", 
> > "members":[ 
> > { 
> > "name":"m1" 
> > } 
> > ] 
> > } 
> >   
> > PUT test_groups/group/2 
> > { 
> > "name":"2", 
> > "members":[ 
> > { 
> > "name":"m1" 
> > }, 
> > { 
> > "name":"m2" 
> > } 
> > ] 
> > } 
> >   
> > and this query: 
> >   
> > GET test_groups/group/_search 
> > { 
> > "aggs": { 
> > "group_members": { 
> > "filter": { 
> > "exists": { 
> > "field": "members" 
> > } 
> > }, 
> > "aggs": { 
> > "length": { 
> > "stats": { 
> > "script": "doc['members'].values.length" 
> > } 
> > } 
> > } 
> > } 
> > } 
> > } 
> >   
> > I get an error stating that the members field does not exist in type 
> group: 
> >   
> > { 
> > "took": 4, 
> > "timed_out": false, 
> > "_shards": { 
> > "total": 5, 
> > "successful": 3, 
> > "failed": 2, 
> > "failures": [ 
> > { 
> > "index": "test_groups", 
> > "shard": 2, 
> > "status": 500, 
> > "reason": "QueryPhaseExecutionException[[test_groups][2]: 
> > query[ConstantScore(cache(_type:group))],from[0],size[10]: Query Failed 
> > [Failed to execute main query]]; nested: 
> > GroovyScriptExecutionException[ElasticsearchIllegalArgumentException[No 
>   
> > field found for [members] in mapping with types [group]]]; " 
> > }, 
> > { 
> > "index": "test_groups", 
> > "shard": 3, 
> > "status": 500, 
> > "reason": "QueryPhaseExecutionException[[test_groups][3]: 
> > query[ConstantScore(cache(_type:group))],from[0],size[10]: Query Failed 
> > [Failed to execute main query]]; nested: 
> > GroovyScriptExecutionException[ElasticsearchIllegalArgumentException[No 
>   
> > field found for [members] in mapping with types [group]]]; " 
> > } 
> > ] 
> > }, 
> > "hits": { 
> > "total": 0, 
> > "max_score": null, 
> > "hits": [] 
> > }, 
> > "aggregations": { 
> > "group_members": { 
> > "doc_count": 0, 
> > "length": { 
> > "count": 0, 
> > "min": null, 
> > "max": null, 
> > "avg": null, 
> > "sum": null 
> > } 
> > } 
> > } 
> > } 
> >   
> > Is there a way to do this? 
> >   
> > Best regards, 
> >   
> > Jilles 
> >   
> > -- 
> > You received this message because you are subscribed to the Google 
> Groups "elasticsearch"   
> > group. 
> > To unsubscribe from this group and stop receiving emails from it, send 
> an email to elasticsearc...@googlegroups.com .   
> > To view this discussion on the web visit 
> https://groups.google.com/d/msgid/elasticsearch/fcbce46c-6556-4e4e-b74a-2a4cbea915c6%40googlegroups.com.
>  
>   
> > For more options, visit https://groups.google.com/d/optout. 
> >   
>
>

-- 
You received this message because you are subscribed to the Google Groups 
"elasticsearch" group.
To unsubscribe from this group and stop receiving emails from it, send an email 
to elasticsearch+unsubscr...@googlegroups.com.
To view this discussion on the web visit 
https://groups.google.com/d/msgid/elasticsearch/b3f877a9-633e-4fe8-b4ea-3869e833782f%40googlegroups.com.
For more options, visit https://groups.google.com/d/optout.


Regarding data copy

2015-01-23 Thread phani . nadiminti
Hi All,

   I have an elasticsearch server of version 1.3.7 with large indices on the 
existing server. Now I am setting up new servers with upgraded hardware, and 
I want to move or copy the indices from the old elasticsearch server to the 
new one; on the new server I am also installing ES 1.3.7. My question is: 
will the indices work directly on the other cluster if we copy the data 
directly, or is there another process or tool in elasticsearch for this?

Thanks,
phani

-- 
You received this message because you are subscribed to the Google Groups 
"elasticsearch" group.
To unsubscribe from this group and stop receiving emails from it, send an email 
to elasticsearch+unsubscr...@googlegroups.com.
To view this discussion on the web visit 
https://groups.google.com/d/msgid/elasticsearch/68223dde-82ef-4480-9897-ae9abf93131c%40googlegroups.com.
For more options, visit https://groups.google.com/d/optout.


Re: how to pass 2 different timestamp in RangeFilterBuilder for elasticsearch

2015-01-23 Thread David Pilato
Is your question about how I create a JODA DateTime object?
Or how I pass it to the RangeFilter?

Sorry I did not get your question.

David

> Le 23 janv. 2015 à 06:17, Subhadip Bagui  a écrit :
> 
> Hi,
> 
> Any ideas?
> -- 
> You received this message because you are subscribed to the Google Groups 
> "elasticsearch" group.
> To unsubscribe from this group and stop receiving emails from it, send an 
> email to elasticsearch+unsubscr...@googlegroups.com.
> To view this discussion on the web visit 
> https://groups.google.com/d/msgid/elasticsearch/0484e9f6-8f63-42fb-ad9d-2d3ec1629ce3%40googlegroups.com.
> For more options, visit https://groups.google.com/d/optout.

-- 
You received this message because you are subscribed to the Google Groups 
"elasticsearch" group.
To unsubscribe from this group and stop receiving emails from it, send an email 
to elasticsearch+unsubscr...@googlegroups.com.
To view this discussion on the web visit 
https://groups.google.com/d/msgid/elasticsearch/815F69B1-3EA3-4FB8-A2F9-8E6E098E4EF3%40pilato.fr.
For more options, visit https://groups.google.com/d/optout.


Re: _count differes (same index different number of shards)

2015-01-23 Thread David Pilato
Something I don't get. You are changing the number of shards of an index? This 
is something you should not be able to do IMHO.
You can not restore the full backup in less shards.

That's probably the reason you are seeing fewer docs in your new index: you 
restored only half of the shards, with what I would call a "hack".

David

> Le 23 janv. 2015 à 03:23, Daniel Gligorov  a écrit 
> :
> 
> ...some more clarification: NEW and NEW_10 are restored from the same snap 
> taken from OLD. 
> What differs is that before restoring NEW_10, I'm manually editing the number 
> of shards in the mapping in the snap (hdfs_repository plugin) 
> 
>> On Thursday, January 22, 2015 at 6:22:04 PM UTC-8, Daniel Gligorov wrote:
>> Hi,
>> 
>> Anybody knows how different number of primary shards can affect number of 
>> docs (_count) in same index?
>> 
>> I mean I have read-only index OLD, did a snap (ES 1.2) of it and restored it 
>> as NEW. Both have same number  shards and docs. 
>> Then I'm restoring another index NEW_10, with less primary shards and I'm 
>> getting less documents count in it?
>> Doesn't matter how many tried I retested snap and restore, if I don't edit 
>> number of shards I end with good counter, so wonder what messes my case
>> 
>> [:~] curl localhost:9200/_cat/indices/OLD,NEW,NEW_10?v
>> health index  pri rep docs.count docs.deleted store.size 
>> pri.store.size
>> green  OLD  20   2133371878639 15.7gb  5.2gb
>> green  NEW 20   1133348078624 10.5gb  5.2gb
>> green  NEW_10   10   1 66606442696  5.2gb  2.6gb
>> 
>> Tried simple math, (docs.count - docs.deleted) / pri, but not getting same 
>> result
>> 
>> Thank you,
> 
> -- 
> You received this message because you are subscribed to the Google Groups 
> "elasticsearch" group.
> To unsubscribe from this group and stop receiving emails from it, send an 
> email to elasticsearch+unsubscr...@googlegroups.com.
> To view this discussion on the web visit 
> https://groups.google.com/d/msgid/elasticsearch/fa9d4c35-6a3a-4a59-b7c9-9770eea454fe%40googlegroups.com.
> For more options, visit https://groups.google.com/d/optout.

-- 
You received this message because you are subscribed to the Google Groups 
"elasticsearch" group.
To unsubscribe from this group and stop receiving emails from it, send an email 
to elasticsearch+unsubscr...@googlegroups.com.
To view this discussion on the web visit 
https://groups.google.com/d/msgid/elasticsearch/70546D5D-2BFA-4405-8155-AA85515F7B3C%40pilato.fr.
For more options, visit https://groups.google.com/d/optout.


Re: Reverse nested aggregation within nested filter aggregation fails

2015-01-23 Thread Selvinaz Karahancer-Bouraga
Hi Masaru,

the data model is correct, attributes are in the root element; that was just 
a copy-paste failure.
But the reason why it failed was that the nested aggregation around “LINE” 
was missing.

Now I get the correct results.
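
For anyone who hits the same thing later, the piece I had to add looks 
roughly like this (just a sketch, using the field names from my earlier 
mail): after reverse_nested joins back to the root document, a new nested 
aggregation has to re-enter "attributes" before filtering on LINE.

"attributes" : {
  "reverse_nested" : { },
  "aggregations" : {
    "attributes_nested" : {
      "nested" : { "path" : "attributes" },
      "aggregations" : {
        "LINE" : {
          "filter" : { "term" : { "attributes.paramKey" : "LINE" } },
          "aggregations" : {
            "LINE" : { "terms" : { "field" : "attributes.value" } }
          }
        }
      }
    }
  }
}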

Thank you so much :)

2015-01-23 5:22 GMT+01:00 Masaru Hasegawa :

> Hi,
>
> Not sure if it solves your issue but I think there are a few things to fix:
> - “attributes" is under “source". nested aggregation’s “path” would be
> “source.attributes”. You’d need to update field names accordingly as well.
> - reverse_nested aggregation’s “path” would be empty since it’s joined
> back to root.
> - nested aggregation is needed around “LINE” aggregation since you are in
> root level.
>
>
> Masaru
>
> On January 22, 2015 at 19:13:10, Selvinaz Karahancer-Bouraga (
> selvinaz.karahance...@gmail.com) wrote:
> > I am using Elasticsearch 1.3.4.
> >
> > Nobody has an idea why the buckets of LINES are empty?
> > Is there another possibility to resolve this problem?
> >
> > Am Mittwoch, 21. Januar 2015 13:21:04 UTC+1 schrieb Selvinaz
> > Karahancer-Bouraga:
> > >
> > > Hello,
> > >
> > > I have to realize distinct data queries on data persisted in
> ElasticSearch.
> > > My data model looks like:
> > >
> > >
> {"took":15,"timed_out":false,"_shards":{"total":1,"successful":1,"failed":0},"hits":{"total":3,"max_score":1.0,"hits":[{"_index":"event_index_v_0_3","_type":"EventBean","_id":"o6tFCVGjS7mnyUV92d7tOQ","_score":1.0,"_source":{
> > > "severityLevel" : "SL_EVENT",
> > > "source" : {
> > > "sourceId" : "1",
> > > "sourceType" : "VEHICLE",
> > > "description" : null,
> > > "mandator" : {
> > > "mandatorId" : "DEF",
> > > "mandatorName" : null,
> > > "priority" : null
> > > },
> > > "eventTime" : 1410768722000,
> > > "version" : "Version_0_1",
> > > "attributes" : [ {
> > > "paramKey" : "COURSE",
> > > "value" : "123"
> > > }, {
> > > "paramKey" : "DRIVERNO",
> > > "value" : "111"
> > > }, {
> > > "paramKey" : "LINE",
> > > "value" : "101"
> > > }, {
> > > "paramKey" : "gps_x",
> > > "value" : ""
> > > }, {
> > > "paramKey" : "gps_y",
> > > "value" : "87654321"
> > > } ]
> > > }}
> > > where attributes are nested objects of EventBeans.
> > > Now I want to have all distinct values of mandatorId, LINE and gps_x.
> > > The aggregationbuilder looks like:
> > >
> > > "aggregations" : {
> > > "source.mandator.mandatorId" : {
> > > "terms" : {
> > > "field" : "source.mandator.mandatorId",
> > > "size" : 2147483647,
> > > "min_doc_count" : 1
> > > },
> > > "aggregations" : {
> > > "attributes" : {
> > > "nested" : {
> > > "path" : "attributes"
> > > },
> > > "aggregations" : {
> > > "gps_x" : {
> > > "filter" : {
> > > "term" : {
> > > "attributes.paramKey" : "gps_x"
> > > }
> > > },
> > > "aggregations" : {
> > > "gps_x" : {
> > > "terms" : {
> > > "field" : "attributes.value",
> > > "size" : 2147483647,
> > > "order" : {
> > > "_count" : "desc"
> > > }
> > > },
> > > "aggregations" : {
> > > "attributes" : {
> > > "reverse_nested" : {
> > > "path" : "attributes"
> > > },
> > > "aggregations" : {
> > > "LINE" : {
> > > "filter" : {
> > > "term" : {
> > > "attributes.paramKey" : "LINE"
> > > }
> > > },
> > > "aggregations" : {
> > > "LINE" : {
> > > "terms" : {
> > > "field" : "attributes.value",
> > > "size" : 2147483647,
> > > "order" : {
> > > "_count" : "desc"
> > > }
> > > }
> > > }
> > > }
> > > }
> > > }
> > > }
> > > }
> > > }
> > > }
> > > }
> > > }
> > > }
> > > }
> > > }
> > > }
> > >
> > > and the response looks like:
> > >
> > > "aggregations" : {
> > > "source.mandator.mandatorId" : {
> > > "doc_count_error_upper_bound" : 0,
> > > "sum_other_doc_count" : 0,
> > > "buckets" : [ {
> > > "key" : "def",
> > > "doc_count" : 3,
> > > "attributes" : {
> > > "doc_count" : 15,
> > > "gps_x" : {
> > > "doc_count" : 3,
> > > "gps_x" : {
> > > "doc_count_error_upper_bound" : 0,
> > > "sum_other_doc_count" : 0,
> > > "buckets" : [ {
> > > "key" : "",
> > > "doc_count" : 1,
> > > "attributes" : {
> > > "doc_count" : 1,
> > > "LINE" : {
> > > "doc_count" : 0,
> > > "LINE" : {
> > > "doc_count_error_upper_bound" : 0,
> > > "sum_other_doc_count" : 0,
> > > "buckets" : [ ]
> > > }
> > > }
> > > }
> > > }, {
> > > "key" : "",
> > > "doc_count" : 1,
> > > "attributes" : {
> > > "doc_count" : 1,
> > > "LINE" : {
> > > "doc_count" : 0,
> > > "LINE" : {
> > > "doc_count_error_upper_bound" : 0,
> > > "sum_other_doc_count" : 0,
> > > "buckets" : [ ]
> > > }
> > > }
> > > }
> > > }, {
> > > "key" : "",
> > > "doc_count" : 1,
> > > "attributes" : {
> > > "doc_count" : 1,
> > > "LINE" : {
> > > "doc_count" : 0,
> > > "LINE" : {
> > > "doc_count_error_upper_bound" : 0,
> > > "sum_other_doc_count" : 0,
> > > "buckets" : [ ]
> > > }
> > > }
> > > }
> > > } ]
> > > }
> > > }
> > > }
> > > } ]
> > > }
> > > }
> > >
> > > The buckets of LINEs are empty, I am using reverse_nested aggregation,
> but
> > > I think I am still in the filter of attributes.paramKey=gps_x.
> > > How can I solve thi

Re: Simultaneous indexing and searching in 2 threads gets "Failed to execute phase" exception...

2015-01-23 Thread David Pilato
Yes I am.

The client is thread safe. Any chance you could share on GitHub a small 
project which reproduces this error?

David

> Le 23 janv. 2015 à 01:18, TimOnGmail  a écrit :
> 
> I changed the code to use a Singleton.  Even so, when I made the indexing and 
> searching happen in 2 different threads (without waiting for responses - just 
> ignoring the returned future), it failed similarly - even if I waited for a 
> while before issuing the search.
> 
> If I did the same thing, but didn't do it in 2 threads, it succeeded.
> 
> So I am wondering if there is some issue when accessing the Singleton client 
> in different threads, some non-threadsafe issue at work.
> 
> David, are you a dev on the Elasticsearch team?  If not, I hope one of them 
> chimes in here.
> 
> - Tim
> 
>> On Thursday, January 22, 2015 at 2:30:07 PM UTC-8, David Pilato wrote:
>> Answers inlined 
>> 
>> --
>> David ;-)
>> Twitter : @dadoonet / @elasticsearchfr / @scrutmydocs
>> 
>>> Le 22 janv. 2015 à 20:19, TimOnGmail  a écrit :
>>> 
>>> Thanks for your suggestions!
>>> 
>>> I've generally been creating a fresh client for each index/search request.  So 
>>> that's not correct?  I had thought it was better to do it that way.
>> 
>> No. You should create a singleton.
>> 
>>> 
>>> Any problems with using separate clients in the same VM, different threads, 
>>> that you know of?
>>> 
>> 
>> You don't need them so that's a waste of resources IMHO.
>> 
>>> The indexes are already created, incidentally; the calls I'm making are in 
>>> adding items to the index or in searching the index only.
>>> 
>> 
>> That's super strange that shards are failing if the index was created 
>> successfully.
>> 
>>> - Tim
>>> 
>>> 
 On Wednesday, January 21, 2015 at 10:57:52 PM UTC-8, David Pilato wrote:
 Some ideas:
 
 You can/should share the same client across all threads, so only one 
 client for the whole JVM.
 You should first create the index and wait for it to be created, using 
 actionGet(). It's a quick operation. Then run your code as you wrote.
 
 My 2 cents.
 
 David
 
> Le 22 janv. 2015 à 06:12, TimOnGmail  a écrit :
> 
> I have a situation where, using the Java API, I initiate a bunch of 
> indexing operations, but throw away the Future object (I don't need the 
> return status).  This is so I can do a lot of indexing reasonably 
> asynchronously, so I don't have to hold up the GUI that triggers these 
> calls.
> 
> However, if I fire off these indexing operations, and immediately after 
> do a search operation on the same index, I get the following exception:
> 
> Caused by: org.elasticsearch.action.search.SearchPhaseExecutionException: 
> Failed to execute phase [init_scan], all shards failed
>   at 
> org.elasticsearch.action.search.type.TransportSearchTypeAction$BaseAsyncAction.onFirstPhaseResult(TransportSearchTypeAction.java:233)
>  [elasticsearch-1.4.1.jar:]
>   at 
> org.elasticsearch.action.search.type.TransportSearchTypeAction$BaseAsyncAction$1.onFailure(TransportSearchTypeAction.java:179)
>  [elasticsearch-1.4.1.jar:]
>   at 
> org.elasticsearch.search.action.SearchServiceTransportAction$23.run(SearchServiceTransportAction.java:565)
>  [elasticsearch-1.4.1.jar:]
>   at 
> java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
>  [rt.jar:1.7.0_60]
>   at 
> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
>  [rt.jar:1.7.0_60]
>   ... 1 more
> 
> ... however, if I wait awhile after indexing before searching, the search 
> succeeds, no errors.
> 
> Does anyone know what is going on there?  Shouldn't an index operation 
> initiated asynchronously leave search operations made from a different 
> client unaffected?
> 
> This is happening only when I use 2 separate threads (and two separate 
> TransportClients) for each operation... thus:
> 
> 1. Thread 1: Fork Thread 2
> 2. Thread 2: Create new TransportClient; Index list of items, not waiting 
> for Future objects (i.e. not calling actionGet())
> 3. Thread 1: While Thread 2 is running, create new TransportClient and do 
> search
> 4. Get exception above 
> 
> ... compared to:
> 
> 1. Thread 1: Fork Thread 2
> 2. Thread 2: Create new TransportClient; Index list of items, not waiting 
> for Future objects (i.e. not calling actionGet())
> 3. Wait a bit
> 4. Thread 1: (Thread 2 has presumably finished by now); create new 
> TransportClient and do search
> 5. Don't get exception above 
>  
> ... also, if, instead, I do it one thread:
> 
> 1. Create new TransportClient; Index list of items, not waiting for 
> Future objects (i.e. not calling actionGet())
> 2. Close TransportClient
> 3. Create new TransportClient; do search
> 
> ... it works fine.
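
A minimal sketch of the singleton approach suggested above, against the 1.4.x Java API. The cluster name, transport address, index name, type and document are placeholders, and the explicit create/refresh calls are just one way to avoid racing the indexing thread, not a confirmed fix for the exception in this thread.

import org.elasticsearch.client.Client;
import org.elasticsearch.client.transport.TransportClient;
import org.elasticsearch.common.settings.ImmutableSettings;
import org.elasticsearch.common.transport.InetSocketTransportAddress;
import org.elasticsearch.index.query.QueryBuilders;

public final class EsClientHolder {
    private static volatile Client client;

    private EsClientHolder() {}

    // One client per JVM, shared by the indexing and the search threads.
    public static Client get() {
        if (client == null) {
            synchronized (EsClientHolder.class) {
                if (client == null) {
                    client = new TransportClient(ImmutableSettings.settingsBuilder()
                            .put("cluster.name", "elasticsearch")              // assumption
                            .build())
                            .addTransportAddress(
                                    new InetSocketTransportAddress("localhost", 9300)); // assumption
                }
            }
        }
        return client;
    }

    public static void main(String[] args) {
        Client c = get();

        // Create the index once, up front, and block until the call returns.
        if (!c.admin().indices().prepareExists("myindex").execute().actionGet().isExists()) {
            c.admin().indices().prepareCreate("myindex").execute().actionGet();
        }

        // Indexing thread: fire-and-forget is fine; the returned future can be ignored.
        c.prepareIndex("myindex", "item").setSource("{\"field\":\"value\"}").execute();

        // Search thread: refresh first so freshly indexed documents are visible,
        // instead of sleeping for an arbitrary amount of time.
        c.admin().indices().prepareRefresh("myindex").execute().actionGet();
        c.prepareSearch("myindex").setQuery(QueryBuilders.matchAllQuery()).execute().actionGet();
    }
}

The double-checked locking on a volatile field is only one way to build the shared client lazily; an eagerly initialized static final field works just as well.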

RE: Heavy indexing cause severe delay for searching

2015-01-23 Thread Wang Yong
ES sends one query request to each shard of the index being searched. So, if 
the number of shards is very large, the number of shard-level requests can 
become large enough to use up all of the search threads.
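
A rough illustration of that fan-out; the numbers below are invented for the example, and the thread-pool hint at the end is just one way to check whether search requests are piling up:

public class ShardFanOut {
    public static void main(String[] args) {
        // Purely illustrative numbers, not taken from this thread.
        int indicesMatched = 30;          // e.g. a wildcard over daily indices
        int primaryShardsPerIndex = 20;

        // Each search fans out to one shard-level request per matching shard.
        int shardRequests = indicesMatched * primaryShardsPerIndex;
        System.out.println("One search = " + shardRequests + " shard-level requests");

        // If that exceeds the search thread pool plus its queue, requests wait or
        // get rejected; "GET /_cat/thread_pool" shows the active/queue/rejected
        // counters per node.
    }
}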

 

 

 

From: elasticsearch@googlegroups.com [mailto:elasticsearch@googlegroups.com] On 
Behalf Of Hajime
Sent: Friday, January 23, 2015 3:19 PM
To: elasticsearch@googlegroups.com
Subject: Re: Heavy indexing cause severe delay for searching

 

I still don't get why having many shards in one index matters. Since an index 
is just a logical grouping of shards (Lucene indices), shouldn't the total 
number of shards in the cluster be the more significant number? Does grouping 
the shards into an index cost Elasticsearch a lot?


-- 
You received this message because you are subscribed to the Google Groups 
"elasticsearch" group.
To unsubscribe from this group and stop receiving emails from it, send an email 
to elasticsearch+unsubscr...@googlegroups.com.
To view this discussion on the web visit 
https://groups.google.com/d/msgid/elasticsearch/003501d036ec%24e0868b40%24a193a1c0%24%40gmail.com.
For more options, visit https://groups.google.com/d/optout.


Node.js Client Performance

2015-01-23 Thread Klausen Schaefersinho
Hi,

I would like to use Elasticsearch as my primary database in a Node.js 
project. However, after playing with it a little I found that the performance 
is really bad. A write takes nearly 300 ms to complete!

I am using the official Node client. Is there a way to tune performance, 
for example with keep-alive?


Cheers,

Klaus

-- 
You received this message because you are subscribed to the Google Groups 
"elasticsearch" group.
To unsubscribe from this group and stop receiving emails from it, send an email 
to elasticsearch+unsubscr...@googlegroups.com.
To view this discussion on the web visit 
https://groups.google.com/d/msgid/elasticsearch/9a9a3983-e833-4330-8cc1-245a4f09f87b%40googlegroups.com.
For more options, visit https://groups.google.com/d/optout.