Re: Cassandra + Elasticsearch or Just Elasticsearch for Primary data store.
Thanks. We have 300 TB of data, and the average stored document is 512 KB. I just want to make sure I'm not missing anything by using ES as the primary data store. From your response, it looks to me like durability isn't a concern with ES.

Thanks
Pranav.

On Wednesday, July 16, 2014 6:53:52 AM UTC-4, mooky wrote:
> What do you mean by "durability"?
>
> It's highly likely that elastic has the same storage guarantees that Cassandra does.
> That said, some people like to have the flexibility of keeping the golden source elsewhere and the ability to blow away the index and re-index at a whim.
> There are a number of elastic users, however, for whom this is not viable: re-indexing their volume of data would take a week or two.
>
> How much data are you looking at storing/indexing? MB? GB? TB? PB?
>
> -M
>
> On Tuesday, 15 July 2014 15:27:15 UTC+1, pranav amin wrote:
>> Thanks Tim.
>>
>> Does that mean I can't get durability if I store my data in ES as a primary data store?
>>
>> Thanks
>> Pranav.
>>
>> On Monday, July 14, 2014 11:57:23 PM UTC-4, Tim Uckun wrote:
>>>> I'm just confused whether Cassandra can really make a difference here, since it looks to me like ES can suffice.
>>>
>>> If you are not going to be using Cassandra for indexing, then there is no reason to have it. If you want durability in case something goes wrong with ES, you can just store your data in a log file before pumping it into ES.
>>> If for whatever reason something happens to your ES cluster, you can reconstruct it using the log files.

-- You received this message because you are subscribed to the Google Groups "elasticsearch" group. To unsubscribe from this group and stop receiving emails from it, send an email to elasticsearch+unsubscr...@googlegroups.com. To view this discussion on the web visit https://groups.google.com/d/msgid/elasticsearch/fad94a5b-2378-4088-98f1-abb4366be230%40googlegroups.com. 
For more options, visit https://groups.google.com/d/optout.
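The sketch below puts 300 TB of 512 KB documents in perspective: it estimates how many documents that is and how long a full re-index would take. It assumes binary units (1 TB = 1024**4 bytes, 1 KB = 1024 bytes). Purely for illustration, it reuses the ~500 documents/second write rate reported in the "Cannot Increase Write TPS" thread. That rate was measured with 144 KB documents, so re-indexing 512 KB documents could well be slower. The function names are hypothetical.

```python
# Rough sizing sketch: how many documents is 300 TB, and how long would a
# full re-index take at a given sustained write rate? Binary units assumed.
KIB = 1024
TIB = 1024 ** 4
SECONDS_PER_DAY = 86_400


def document_count(total_bytes: int, avg_doc_bytes: int) -> int:
    """Approximate number of documents for a total size and an average doc size."""
    return total_bytes // avg_doc_bytes


def reindex_days(docs: int, docs_per_second: float) -> float:
    """Days needed to re-index `docs` documents at a constant write rate."""
    return docs / docs_per_second / SECONDS_PER_DAY


docs = document_count(300 * TIB, 512 * KIB)
print(docs)                               # 629145600
print(round(reindex_days(docs, 500), 1))  # 14.6
```

That works out to roughly 630 million documents and about two weeks per full rebuild, in the same range as mooky's "a week or 2" estimate.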
Re: Cassandra + Elasticsearch or Just Elasticsearch for Primary data store.
Thanks Tim.

Does that mean I can't get durability if I store my data in ES as the primary data store?

Thanks
Pranav.

On Monday, July 14, 2014 11:57:23 PM UTC-4, Tim Uckun wrote:
>> I'm just confused whether Cassandra can really make a difference here, since it looks to me like ES can suffice.
>
> If you are not going to be using Cassandra for indexing, then there is no reason to have it. If you want durability in case something goes wrong with ES, you can just store your data in a log file before pumping it into ES.
> If for whatever reason something happens to your ES cluster, you can reconstruct it using the log files.
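Tim's log-file suggestion can be sketched as follows. The sketch assumes JSON-lines files on local disk. `send_to_index` is a hypothetical callback standing in for whatever client actually writes to ES, and all function names are illustrative. The log is only as durable as the disk it lives on.

```python
# Minimal sketch of "log first, then index": every document is appended to a
# local JSON-lines log before it is sent to Elasticsearch, so the index can be
# rebuilt later by replaying the log. Nothing here talks to a real cluster;
# `send_to_index` is a hypothetical callback supplied by the caller.
import json
import os
from typing import Callable, Iterator

Doc = dict


def append_to_log(log_path: str, doc: Doc) -> None:
    """Append one document as a JSON line and fsync it to disk."""
    with open(log_path, "a", encoding="utf-8") as log:
        log.write(json.dumps(doc) + "\n")
        log.flush()
        os.fsync(log.fileno())  # make sure the bytes reached the disk


def ingest(log_path: str, doc: Doc, send_to_index: Callable[[Doc], None]) -> None:
    """Write to the durable log first, then to the index."""
    append_to_log(log_path, doc)
    send_to_index(doc)


def replay(log_path: str) -> Iterator[Doc]:
    """Yield every logged document in the order it was written."""
    with open(log_path, encoding="utf-8") as log:
        for line in log:
            if line.strip():
                yield json.loads(line)


def rebuild(log_path: str, send_to_index: Callable[[Doc], None]) -> int:
    """Re-send every logged document to a (new) index and return the count."""
    count = 0
    for doc in replay(log_path):
        send_to_index(doc)
        count += 1
    return count
```

Giving each document an explicit ID and sending it as the document `_id` makes replay idempotent: re-sending a logged document overwrites it rather than duplicating it. An fsync per document is slow, so batching log writes would be the obvious next step at 100 million documents a day.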
Cassandra + Elasticsearch or Just Elasticsearch for Primary data store.
Hi,

I'm struggling to choose between two options: using Elasticsearch as the primary data store, or using Cassandra as the primary data store with the data copied into ES for indexing.

The goal is to store documents of about 144 KB each, possibly growing to 512 KB. The load will be on the order of 100 million documents a day. Every field in a document must be indexed so that it can be searched as soon as it reaches the data store, and ad hoc queries on this data set are a must. The system must scale when the load grows to a billion, and data durability and availability are a must.

I'm just confused whether Cassandra can really make a difference here, since it looks to me like ES can suffice. Whether you agree or disagree, views and concerns are welcome.

Thanks
Pranav.
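The load described above translates into the following sustained rates. The sketch assumes decimal units (1 KB = 1000 bytes, 1 MB = 1000 KB, 1 TB = 10**9 KB) and an even spread of documents over the day, so peak rates will be higher. `copies` (1 primary plus the number of replicas) is an illustrative parameter for estimating cluster-wide bytes.

```python
# Back-of-the-envelope ingest rates for a steady daily document load.
# Decimal units assumed; `copies` = 1 primary + number of replicas.
SECONDS_PER_DAY = 86_400


def ingest_rates(docs_per_day: float, doc_kb: float, copies: int = 1):
    """Return (documents/second, MB/second, TB/day) for an evenly spread load."""
    docs_per_second = docs_per_day / SECONDS_PER_DAY
    mb_per_second = docs_per_second * doc_kb * copies / 1_000
    tb_per_day = docs_per_day * doc_kb * copies / 1e9
    return docs_per_second, mb_per_second, tb_per_day


for size_kb in (144, 512):
    dps, mbps, tbpd = ingest_rates(100_000_000, size_kb)
    print(f"{size_kb} KB: {dps:.0f} docs/s, {mbps:.1f} MB/s, {tbpd:.1f} TB/day")
# 144 KB: 1157 docs/s, 166.7 MB/s, 14.4 TB/day
# 512 KB: 1157 docs/s, 592.6 MB/s, 51.2 TB/day
```

These figures are for primaries only. With two replicas (`copies=3`), the cluster as a whole writes three times as many bytes.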
How to set replication type equal to Async for an index
Hi,

We are planning to create an index with 2 replicas, and for better performance we are thinking of making replication async. I'm creating the index this way:

curl -XPUT 'http://localhost:9200/xyz/' -d '{
  "settings" : {
    "number_of_shards" : 2,
    "number_of_replicas" : 2,
    "replication" : "async"
  }
}'

Is the above correct? How do I confirm that my replication is async? Is there a curl command for that?

Thanks
Pranav.
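As far as I recall from the 1.x documentation (please verify against your version), in Elasticsearch 1.x `replication` was a per-request parameter on write operations such as index and bulk, not an index setting. It defaulted to `sync`; `async` made the request return once the primary had indexed the document. It was deprecated in 1.5 and removed in 2.0. If that is right, the `"replication"` key in the create-index body would not make writes asynchronous. `GET /xyz/_settings` shows which settings the index actually stored. The sketch below only builds request strings to show where each piece would go. The host, the index name `xyz`, the type `doc` and the document are hypothetical.

```python
# Builds (method, URL, body) tuples only; nothing is sent to a cluster.
# Assumption: `replication` is a per-request query parameter (ES 1.x only).
import json
from urllib.parse import urlencode

BASE_URL = "http://localhost:9200"  # hypothetical node address


def create_index_request(index: str, shards: int, replicas: int):
    """Index-level settings (shards, replicas) go in the create-index body."""
    body = {"settings": {"number_of_shards": shards, "number_of_replicas": replicas}}
    return "PUT", f"{BASE_URL}/{index}/", json.dumps(body)


def index_document_request(index: str, doc_type: str, doc_id: str, doc: dict,
                           replication: str = "async"):
    """The replication mode travels with each individual write request."""
    query = urlencode({"replication": replication})
    return "PUT", f"{BASE_URL}/{index}/{doc_type}/{doc_id}?{query}", json.dumps(doc)


print(create_index_request("xyz", 2, 2))
print(index_document_request("xyz", "doc", "1", {"field": "value"}))
```

Async replication acknowledges a write before the replicas have it, so it gives up some of the durability discussed in the Cassandra thread in exchange for latency.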
Re: Cannot Increase Write TPS in Elasticsearch by adding more nodes
We used JMeter for this test.

On Friday, June 13, 2014 10:13:02 AM UTC-4, Greg Murnane wrote:
> I haven't seen it asked yet: what is feeding data into your Elasticsearch? Depending on what you're doing to get it there, a large document size could easily bottleneck some feeding mechanisms. It's also notable that some "green" spinning disks top out in the realm of 72 MB/s. It might be useful to make sure that your feeding mechanism can handle more than 500 TPS.
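Greg's 72 MB/s figure lines up with the numbers in the original "Cannot Increase Write TPS" post: at ~500 writes/s of 144 KB documents, the payload alone is about 72 MB/s. The sketch below reproduces that arithmetic. It assumes decimal units and ignores HTTP/JSON overhead, Lucene merges and the translog, all of which add I/O. The function name is hypothetical.

```python
# Payload bandwidth implied by a write rate and document size.
# Decimal units (1 MB = 1000 KB); protocol overhead, merges and translog ignored.


def write_bandwidth(tps: float, doc_kb: float, replicas: int = 0):
    """Return (client payload MB/s, cluster-wide MB/s including replica copies)."""
    client_mb_s = tps * doc_kb / 1_000
    cluster_mb_s = client_mb_s * (1 + replicas)
    return client_mb_s, cluster_mb_s


client, cluster = write_bandwidth(500, 144, replicas=2)
print(client, cluster)                                # 72.0 216.0
print(client * 8, "Mbit/s from the load generator")   # 576.0 Mbit/s from the load generator
```

At this rate, a single JMeter host would be pushing roughly 576 Mbit/s of payload, a large share of a 1 Gbit/s link if that is what it sits on. That supports Greg's suggestion to check the feeding side as well as the cluster.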
Training Information needed
Hi,

I live in Canada and am looking for Elasticsearch training (preferably on performance tuning) in Canada or the USA. If anyone has any information, please let me know.

Thanks
Pranav.
Linear Scaling with ES
Hi,

We have spent a considerable amount of time trying to figure out whether we can get linear scaling in ES by increasing the number of nodes, the number of shards, or some other parameter. We ran many experiments (changing shards, nodes, replicas, etc.), but with every configuration we seemed to hit a limit.

I know this is a very broad question, but does anyone know whether it is even possible? Is there a formula or magic mantra to achieve this? Thanks a lot in advance to anyone who can answer; it could save me some time.

Thanks
Pranav.
Re: Cannot Increase Write TPS in Elasticsearch by adding more nodes
Thanks Mark. We are using Java version 1.7.0_25.

What is your document size? I'm wondering whether our document size (144 KB) is causing the low TPS.

Thanks
Pranav.

On Monday, June 9, 2014 6:29:19 PM UTC-4, Mark Walkom wrote:
> One thing you never mentioned was what version of Java you are on, which can impact things as well.
>
> To give you some idea, we had a 12 node cluster of VMs with 30GB heap and were seeing 12000 TPS (incoming events), so what you are seeing is very low.
>
> Regards,
> Mark Walkom
>
> Infrastructure Engineer
> Campaign Monitor
> email: ma...@campaignmonitor.com
> web: www.campaignmonitor.com
>
> On 10 June 2014 06:49, joerg...@gmail.com <joerg...@gmail.com> wrote:
>> How are you trying to figure out whether you're hitting limits? I don't have enough information to help.
>>
>> Marvel, Elastic HQ, etc. are all very useful tools, but they should be combined with OS-related monitoring to get an overall picture.
>>
>> Jörg
>>
>> On Mon, Jun 9, 2014 at 9:31 PM, pranav amin wrote:
>>> Thanks Jörg for your help.
>>>
>>> Do you recommend any tool that can help me pinpoint bottlenecks in terms of I/O, memory, network, GC, etc.?
>>> I'm using some free tools (like Marvel, Elastic HQ, etc.), but I'm not able to figure out whether I'm hitting some limits.
>>>
>>> Thanks
>>> Pranav.
Re: Cannot Increase Write TPS in Elasticsearch by adding more nodes
Thanks Jörg for your help.

Do you recommend any tool that can help me pinpoint bottlenecks in terms of I/O, memory, network, GC, etc.? I'm using some free tools (like Marvel, Elastic HQ, etc.), but I'm not able to figure out whether I'm hitting some limits.

Thanks
Pranav.
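Before hunting for a bottleneck, it helps to be sure that indexing throughput is being measured in wall-clock terms. My understanding is that `index_time_in_millis` adds up the time of every indexing operation, including operations running concurrently on different threads and shards. If so, `index_total` divided by it gives documents per second of busy indexing time, roughly the inverse of the average per-document cost. That ratio would stay flat as nodes are added even when real throughput grows. The "Scaling Writes/Indexing" thread uses exactly this ratio, (Indexing Total)/(Indexing Time). A wall-clock rate instead needs two snapshots of `index_total` and the time between them. The response shape below is my assumption about `GET /_nodes/stats`. Node-level counters may also count operations on replica shards. The helper names and numbers are hypothetical.

```python
# Sketch: derive indexing rates from two snapshots of the nodes stats API.
# Assumed shape: {"nodes": {node_id: {"indices": {"indexing": {
#     "index_total": int, "index_time_in_millis": int}}}}}


def _sum_indexing(stats: dict, field: str) -> int:
    """Sum one indexing counter over every node in a stats snapshot."""
    return sum(node["indices"]["indexing"][field] for node in stats["nodes"].values())


def wall_clock_rate(before: dict, after: dict, elapsed_seconds: float) -> float:
    """Documents indexed per wall-clock second between the two snapshots."""
    delta = _sum_indexing(after, "index_total") - _sum_indexing(before, "index_total")
    return delta / elapsed_seconds


def per_busy_second_rate(before: dict, after: dict) -> float:
    """Documents per second of accumulated (thread-summed) indexing time."""
    docs = _sum_indexing(after, "index_total") - _sum_indexing(before, "index_total")
    millis = (_sum_indexing(after, "index_time_in_millis")
              - _sum_indexing(before, "index_time_in_millis"))
    return docs / (millis / 1_000)


def snapshot(**nodes):
    """Build a fake snapshot: node_id=(index_total, index_time_in_millis)."""
    return {"nodes": {nid: {"indices": {"indexing": {"index_total": total,
                                                     "index_time_in_millis": ms}}}
                      for nid, (total, ms) in nodes.items()}}


# Hypothetical: each node indexes 210,000 docs in 60 s with 4 busy threads.
before = snapshot(a=(0, 0), b=(0, 0))
after = snapshot(a=(210_000, 240_000), b=(210_000, 240_000))
print(wall_clock_rate(before, after, 60))   # 7000.0
print(per_busy_second_rate(before, after))  # 875.0
```

With one such node, the wall-clock rate would be 3500 docs/s while the per-busy-second figure stays at 875. This is why that ratio cannot show scaling. Pairing the wall-clock rate with OS-level disk, network and CPU figures, as Jörg suggests, shows which resource saturates first.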
Re: Scaling Writes/Indexing
Hi Chris,

Did you get better TPS after making some changes? If so, can you share your experience, and which parameters did you tweak to get linear scaling in ES? We are in much the same situation: we aren't able to get better write TPS by adding more nodes.

Thanks in advance for your help.
Pranav.

On Wednesday, December 4, 2013 12:54:23 PM UTC-5, Christopher Jones wrote:
> Dear Folks:
>
> I am running some scaling tests on Elasticsearch. We are considering using Elasticsearch to store a large volume of log file data. A key metric for us is write throughput. We've been trying to understand how Elasticsearch can scale. We were hoping to see some form of linear scaling: that is, add new nodes, increase the shard count, etc., in order to handle an increased number of writes. No matter what lever we pull, we're not seeing an increase in write throughput. Clearly we're missing something.
>
> We have tested several different clusters on AWS (details below). We have doubled the number of nodes; we have doubled (and doubled again) the number of shards. We have doubled the number of servers feeding the cluster log file data. We have put the nodes in the cluster behind an AWS load balancer. We have increased the number of open files allowed by Ubuntu. None of these changes has had any effect on the number of records indexed per second.
>
> Details:
>
> We are piping JSON data using fluentd and the fluentd Elasticsearch plugin, storing the log file entries in the logstash format, which the fluentd Elasticsearch plugin supports. Note that we are not using logstash as a data pipeline.
>
> We configured the cluster with only 1 replica, the default.
>
> We increased the ulimit for open files to 32000.
>
> Elasticsearch configuration:
>
>   discovery.type: ec2
>
>   cloud:
>     aws:
>       region: us-west-2
>       groups: elastic-search
>
>   cloud.node.auto_attributes: true
>
>   index.number_of_shards: 40 # we varied this between 10-40
>   index.refresh_interval: 60s
>
>   bootstrap.mlockall: true
>
>   discovery.zen.ping.multicast.enabled: false
>
> Here's how we set up the memory configuration in setup.sh:
>
>   #!/bin/sh
>   export ES_MIN_MEM=16g
>   export ES_MAX_MEM=30g
>   bin/elasticsearch -f
>
> Each node in the cluster is an m1.xlarge, which has the following characteristics:
>
>   Instance family: General purpose
>   Instance type: m1.xlarge
>   Processor arch: 64-bit
>   vCPU: 4
>   ECU: 8
>   Memory (GiB): 15
>   Instance storage (GB): 4 x 420
>   EBS-optimized available: Yes
>   Network performance: High
>
> We've monitored the cluster using http://www.elastichq.org. For each index, we calculated (Indexing Total)/(Indexing Time) to get the number of records indexed per second. Whether we have a single node with 10 shards or 6 nodes with 40 shards, we consistently see indexing occurring at a rate of around 3000-3500 records/second. It is this measure that we never seem to be able to increase.
>
> Here are some characteristic stats for an individual node:
>
>   Heap Used: 1.1gb
>   Heap Committed: 7.9gb
>   Non Heap Used: 42.3mb
>   Non Heap Committed: 66.2mb
>   JVM Uptime: 4h
>   Thread Count/Peak: 62 / 82
>   GC Count: 8156
>   GC Time: 4.8m
>   Java Version: 1.7.0_45
>   JVM Vendor: Oracle Corporation
>   JVM: Java HotSpot(TM) 64-Bit Server VM
>
> We have not seen memory contention or high CPU utilization in general (CPU utilization is around 80% out of 400% possible). We're not doing any reading of the database (that would be a subsequent test).
>
> Here are some more stats:
>
>   Open File Descriptors: 795
>   CPU Usage: 68% of 400%
>   CPU System: 1.5m
>   CPU User: 26.6m
>   CPU Total: 28.2m
>   Resident Memory: 8.4gb
>   Shared Memory: 19.6mb
>   Total Virtual Memory: 10.3gb
>
> Thanks for any help.
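One detail in the quoted setup looks worth checking. setup.sh lets the heap grow to 30g on m1.xlarge instances, which the table lists with 15 GiB of memory, and it sets min and max heap to different values. The commonly cited Elasticsearch guidance is to set min and max heap to the same value (in 1.x, `ES_HEAP_SIZE` sets both) and to give the heap no more than about half of physical RAM, leaving the rest to the OS file cache. The checker below encodes only those two rules of thumb plus a physical-RAM check; its names and the 50% threshold are assumptions, not hard limits.

```python
# Sanity-check JVM heap settings against machine RAM, using the common
# "min == max, at most half of RAM" rule of thumb (an assumption, not a hard limit).
UNITS = {"k": 1024, "m": 1024 ** 2, "g": 1024 ** 3}


def parse_size(value: str) -> int:
    """Convert a JVM-style size such as '16g' or '512m' to bytes."""
    value = value.strip().lower()
    if value[-1] in UNITS:
        return int(value[:-1]) * UNITS[value[-1]]
    return int(value)


def heap_warnings(min_heap: str, max_heap: str, ram_gib: float) -> list[str]:
    """Return human-readable warnings for a heap configuration."""
    low, high = parse_size(min_heap), parse_size(max_heap)
    ram = ram_gib * UNITS["g"]
    warnings = []
    if low != high:
        warnings.append("min and max heap differ; the heap will be resized at runtime")
    if high > ram / 2:
        warnings.append("max heap is more than half of RAM; little is left for the file cache")
    if high > ram:
        warnings.append("max heap exceeds physical RAM")
    return warnings


for warning in heap_warnings("16g", "30g", ram_gib=15):
    print("-", warning)
```

For the quoted settings, all three warnings fire. A setting such as 7g for both min and max passes all three checks on a 15 GiB machine.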
Cannot Increase Write TPS in Elasticsearch by adding more nodes
Hi all,

While doing some prototyping in ES using SSDs, we got good write TPS, but the write TPS saturated after we added more nodes. Here are the details of the prototype:

* Requirement: to read data as soon as possible, since the read is followed by a write
* ES version: 1.0.0
* Document size: 144 KB
* SSD storage: yes
* Benchmarking tool: SoapUI or JMeter
* VM: Ubuntu, 64-bit OS
* Total nodes: 12
* Total shards: 60
* Threads: 200
* Replicas: 2
* Index shards: 20
* Total indices: 1
* Hardware configuration: 4 CPUs, 6 GB RAM, 3 GB heap

Using the above setup we got write TPS ~= 500. We wanted to know whether we could increase write TPS by adding more nodes, but we couldn't:

* Adding 3 more nodes (i.e. 15 nodes in total) increased TPS by only 10, to ~510.
* Adding more hardware (CPU, RAM) and increasing the heap didn't help either [8 CPUs, 12 GB RAM, 5 GB heap].

Can someone help or suggest what might be wrong? Conceptually, ES should scale in write and read TPS as nodes are added, but we aren't seeing that. Much appreciated if someone can point us in the right direction. Let me know if more information is needed.

Thanks
Pranav.
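One way to put a number on "saturated" is marginal scaling efficiency: the fractional throughput gain divided by the fractional gain in nodes, where 1.0 would be perfectly linear. The sketch below applies it to the figures above (12 nodes at ~500 TPS, 15 nodes at ~510 TPS). The function name is hypothetical.

```python
# Marginal scaling efficiency: how much of the added capacity became added throughput.


def marginal_scaling_efficiency(nodes_before: int, tps_before: float,
                                nodes_after: int, tps_after: float) -> float:
    """Fractional throughput gain / fractional node gain; 1.0 means linear scaling."""
    throughput_gain = tps_after / tps_before - 1
    capacity_gain = nodes_after / nodes_before - 1
    return throughput_gain / capacity_gain


print(round(marginal_scaling_efficiency(12, 500, 15, 510), 2))  # 0.08
print(round(500 * 15 / 12))  # 625 (TPS expected at 15 nodes if scaling were linear)
```

An efficiency near zero means 25% more nodes produced only 2% more writes. That hints that something shared may be the limit, such as the load generator, the network, or a single node receiving every request, rather than the data nodes themselves.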
elastic search & hadoop for analytics
Hi,

I'm thinking of using Elasticsearch and Hadoop together for analytics. I know Elasticsearch has some good Hadoop integration, which is great news. I have some questions about it:

1) Are the Elasticsearch-Hadoop integration APIs free to use, or are they licensed?
2) Does that mean I need a Hadoop installation as well, and is that free or licensed?
3) Where is the actual data stored: in Elasticsearch, in Hadoop, or in both?
4) Where is the latest data? If an app writes data, which data store does it go to first, and which second? Or does the ES-Hadoop API replicate data from ES to Hadoop or vice versa?
5) Does a search request go to ES or to Hadoop?
6) From an architecture point of view, I'm not sure how ES and Hadoop work together for reads and writes. Any use-case scenario would be helpful.

Thanks in advance for answering.
Pranav.