Re: Cassandra + Elasticsearch or Just Elasticsearch for Primary data store.

2014-07-16 Thread pranav amin
Thanks. 

We have 300 TB of data, and the average size of a stored document is 512 KB. 
I just want to make sure I'm not missing anything by using ES as the primary 
data store. From your response, it looks to me like durability isn't a 
concern with ES.


Thanks
Pranav.

On Wednesday, July 16, 2014 6:53:52 AM UTC-4, mooky wrote:
>
> What do you mean by "durability"?
>
> It's highly likely that Elasticsearch has the same storage guarantees that 
> Cassandra does.
> That said, some people like to have the flexibility of having the golden 
> source elsewhere and the ability to blow away the index & re-index at a 
> whim.
> There are a number of elastic users, however, where this is not viable - 
> where reindexing their volume of data would take a week or 2.
>
> How much data are you looking at storing/indexing? MB? GB? TB? PB?
>
> -M
>
>
>
> On Tuesday, 15 July 2014 15:27:15 UTC+1, pranav amin wrote:
>>
>> Thanks Tim.
>>
>> Does that mean I can't get durability if I store my data in ES as the 
>> primary data store?
>>
>> Thanks
>> Pranav.
>>
>> On Monday, July 14, 2014 11:57:23 PM UTC-4, Tim Uckun wrote:
>>>
>>>
>>>> I'm just not sure whether Cassandra can really make a difference here, 
>>>> since it looks to me like ES can suffice.
>>>>
>>>>
>>>>
>>> If you are not going to be using Cassandra for indexing then there is no 
>>> reason to have it. If you want durability in case something goes wrong with 
>>> ES you can just store your data in a log file before pumping it into ES. 
>>>  If for whatever reason something happens to your ES cluster you can 
>>> reconstruct it using the log files.
>>>
>>>  
>>>
>>

-- 
You received this message because you are subscribed to the Google Groups 
"elasticsearch" group.
To unsubscribe from this group and stop receiving emails from it, send an email 
to elasticsearch+unsubscr...@googlegroups.com.
To view this discussion on the web visit 
https://groups.google.com/d/msgid/elasticsearch/fad94a5b-2378-4088-98f1-abb4366be230%40googlegroups.com.
For more options, visit https://groups.google.com/d/optout.


Re: Cassandra + Elasticsearch or Just Elasticsearch for Primary data store.

2014-07-15 Thread pranav amin
Thanks Tim.

Does that mean I can't get durability if I store my data in ES as the primary 
data store?

Thanks
Pranav.

On Monday, July 14, 2014 11:57:23 PM UTC-4, Tim Uckun wrote:
>
>
>> I'm just not sure whether Cassandra can really make a difference here, 
>> since it looks to me like ES can suffice.
>>
>>
>>
> If you are not going to be using Cassandra for indexing then there is no 
> reason to have it. If you want durability in case something goes wrong with 
> ES you can just store your data in a log file before pumping it into ES. 
>  If for whatever reason something happens to your ES cluster you can 
> reconstruct it using the log files.
>
>  
>
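
Tim's suggestion above (write each document to a log file before sending it 
to ES, then replay the log if the cluster has to be rebuilt) could be 
sketched roughly as follows. This is only an illustration under stated 
assumptions: the log is a local JSON-lines file, and `index_fn` is a 
hypothetical stand-in for whatever call actually sends a document to 
Elasticsearch. None of these names come from the thread or from an 
Elasticsearch client.

```python
import json
import os
from typing import Callable, Iterator

# Hypothetical: "index_fn" stands in for whatever call sends a document
# to Elasticsearch; it is not a real client API.
IndexFn = Callable[[dict], None]


def append_to_log(log_path: str, doc: dict) -> None:
    """Append one document as a JSON line and force it to disk."""
    with open(log_path, "a", encoding="utf-8") as f:
        f.write(json.dumps(doc) + "\n")
        f.flush()
        os.fsync(f.fileno())


def read_log(log_path: str) -> Iterator[dict]:
    """Yield every document previously written to the log, in order."""
    with open(log_path, encoding="utf-8") as f:
        for line in f:
            if line.strip():
                yield json.loads(line)


def ingest(log_path: str, doc: dict, index_fn: IndexFn) -> None:
    """Write to the log first, then index, so the log is never behind."""
    append_to_log(log_path, doc)
    index_fn(doc)


def rebuild(log_path: str, index_fn: IndexFn) -> int:
    """Re-index everything from the log; return the number of documents."""
    count = 0
    for doc in read_log(log_path):
        index_fn(doc)
        count += 1
    return count
```

At the volumes discussed in this thread a single local file would not be 
enough; the sketch only shows the ordering (log first, index second) and 
the replay step.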



Cassandra + Elasticsearch or Just Elasticsearch for Primary data store.

2014-07-14 Thread pranav amin
Hi,

I'm struggling to choose between these two options: using Elasticsearch as 
the primary data store, or using Cassandra as the primary data store and 
copying the data into ES for indexing.

The goal is to store documents of about 144 KB each, possibly increasing to 
512 KB. The load will be on the order of 100 million documents a day. Every 
field in a document must be indexed so that it can be searched as soon as it 
lands in the data store. Ad hoc queries are a must on this data set. The 
system must scale when the load grows to a billion, and data durability and 
availability are a must.

I'm just not sure whether Cassandra can really make a difference here, since 
it looks to me like ES can suffice.

Whether you agree or disagree, any views or concerns are welcome.

Thanks
Pranav.
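
For a sense of scale, the figures in this message (100 million documents a 
day at 144 KB, possibly growing to 512 KB) can be turned into a rough daily 
ingest volume. This is a back-of-the-envelope sketch: it assumes decimal 
units (1 KB = 1,000 bytes), counts raw document bytes only, and ignores 
replicas and index overhead. The function names are illustrative.

```python
# Rough ingest estimate: documents per day times document size.
# Assumes decimal units (1 KB = 1000 bytes, 1 TB = 10**12 bytes) and
# counts raw document bytes only (no replicas, no index overhead).

SECONDS_PER_DAY = 24 * 60 * 60


def daily_ingest_tb(docs_per_day: int, doc_size_kb: int) -> float:
    """Return the raw ingest volume in terabytes per day."""
    total_bytes = docs_per_day * doc_size_kb * 1000
    return total_bytes / 10**12


def average_docs_per_second(docs_per_day: int) -> float:
    """Return the average arrival rate if load is spread evenly over a day."""
    return docs_per_day / SECONDS_PER_DAY


if __name__ == "__main__":
    for size_kb in (144, 512):
        tb = daily_ingest_tb(100_000_000, size_kb)
        print(f"{size_kb} KB documents: {tb:.1f} TB/day")
    print(f"average rate: {average_docs_per_second(100_000_000):.0f} docs/s")
```

Under these assumptions, 100 million documents a day works out to about 
14.4 TB/day at 144 KB and about 51.2 TB/day at 512 KB, or roughly 1,157 
documents per second on average.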



How to set replication type equal to Async for an index

2014-06-17 Thread pranav amin
Hi,

We are planning to create an index with 2 replicas, and to get better 
performance we are thinking of making the replication asynchronous.

I'm creating the index this way:

curl -XPUT 'http://localhost:9200/xyz/' -d '{
"settings" : {
"number_of_shards" : 2,
"number_of_replicas" : 2,
"replication" : "async"
}
}'


Is the above correct? How do I confirm that my replication is async? Is 
there a curl command to check it?

Thanks
Pranav.
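
One way to inspect what the cluster actually stored for an index is the 
index settings endpoint (`GET /<index>/_settings`). The sketch below 
assumes a node at `http://localhost:9200` and the index name `xyz` from the 
command above; the helper names are illustrative. Note that a value echoed 
back by this endpoint only shows that it was stored, not that it changes 
replication behavior. To my understanding, ES 1.x treated async replication 
as a per-request `replication=async` parameter on write operations rather 
than as an index setting, so that part should be checked against the 
documentation for the version in use.

```python
import json
import urllib.request


def settings_url(base_url: str, index: str) -> str:
    """Build the URL of the index settings endpoint for one index."""
    return f"{base_url.rstrip('/')}/{index}/_settings"


def fetch_index_settings(base_url: str, index: str) -> dict:
    """Fetch and decode the stored settings (requires a reachable node)."""
    with urllib.request.urlopen(settings_url(base_url, index)) as resp:
        return json.loads(resp.read().decode("utf-8"))


if __name__ == "__main__":
    # Assumed host and index name, taken from the curl command above.
    print(settings_url("http://localhost:9200", "xyz"))
    # With a running node, the stored settings could be printed with:
    #   print(json.dumps(fetch_index_settings("http://localhost:9200", "xyz"),
    #                    indent=2))
```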



Re: Cannot Increase Write TPS in Elasticsearch by adding more nodes

2014-06-16 Thread pranav amin
We used JMeter for this test.

On Friday, June 13, 2014 10:13:02 AM UTC-4, Greg Murnane wrote:
>
> I haven't seen it asked yet; what is feeding data into your elasticsearch? 
> Depending on what you're doing to get it there, a large document size could 
> easily bottleneck some feeding mechanisms. It's also notable that some 
> "green" spinning disks top out in the realm of 72MB/s. It might be useful 
> to make sure that your feeding mechanism can handle more than 500 TPS.



Training Information needed

2014-06-15 Thread pranav amin
Hi,

I live in Canada and am looking for Elasticsearch training (preferably on 
performance tuning) held in Canada or the USA.
If anyone has any information, please let me know.


Thanks
Pranav.




Linear Scaling with ES

2014-06-13 Thread pranav amin
Hi,

We have spent a considerable amount of time trying to figure out whether 
we can get linear scaling in ES by increasing the number of nodes, the 
number of shards, or some other parameter. We ran many experiments, changing 
shards, nodes, replicas, etc., but it looks to me like we hit a limit in 
every case.

I know this is a very broad question, but does anyone know if linear scaling 
is even possible? Is there any formula or magic mantra to achieve it?

Thanks a lot in advance to anyone who can answer this. It would save me some 
time.

Thanks
Pranav.



Re: Cannot Increase Write TPS in Elasticsearch by adding more nodes

2014-06-10 Thread pranav amin
Thanks Mark.

We are using Java version - 1.7.0_25

What is your document size? I'm wondering if our document size, i.e. 144 KB, 
is causing the low TPS.

Thanks
Pranav.

On Monday, June 9, 2014 6:29:19 PM UTC-4, Mark Walkom wrote:
>
> One thing you never mentioned was what version of Java you are on, which 
> can impact things as well.
>
>
> To give you some idea, we had a 12 node cluster of VMs with 30GB heap and 
> were seeing 12000 TPS (incoming events), so what you are seeing is very low.
>
> Regards,
> Mark Walkom
>
> Infrastructure Engineer
> Campaign Monitor
> email: ma...@campaignmonitor.com 
> web: www.campaignmonitor.com
>
>
> On 10 June 2014 06:49, joerg...@gmail.com wrote:
>
>> How do you try to figure out you're hitting limits? I have not enough 
>> information to help.
>>
>>  Marvel, Elastic HQ, etc. are all very useful tools but should be 
>> combined with OS-related monitoring to get an overall picture.
>>
>> Jörg
>>
>>
>>
>> On Mon, Jun 9, 2014 at 9:31 PM, pranav amin wrote:
>>
>>> Thanks Jorg for your help.
>>>
>>> Can you recommend any tool that can help me pinpoint the bottlenecks 
>>> in terms of I/O, memory, network, GC, etc.?
>>> I'm using some free tools (like Marvel, Elastic HQ, etc.), but I'm not 
>>> able to figure out whether I'm hitting some limit.
>>>
>>> Thanks
>>> Pranav.
>>>
>>>
>>>
>>
>
>



Re: Cannot Increase Write TPS in Elasticsearch by adding more nodes

2014-06-09 Thread pranav amin
Thanks Jorg for your help.

Can you recommend any tool that can help me pinpoint the bottlenecks in 
terms of I/O, memory, network, GC, etc.?
I'm using some free tools (like Marvel, Elastic HQ, etc.), but I'm not able 
to figure out whether I'm hitting some limit.

Thanks
Pranav.





Re: Scaling Writes/Indexing

2014-06-09 Thread pranav amin
Hi Chris,

Did you get better TPS after making any changes? Can you share your 
experience? 
If so, which parameters did you tweak to get linear scaling out of ES?

We are in much the same situation: we aren't able to get better write 
TPS by adding more nodes.

Thanks in advance for your help.
Pranav.

On Wednesday, December 4, 2013 12:54:23 PM UTC-5, Christopher Jones wrote:
>
> Dear Folks:
>
> I am running some scaling tests on elastic search. We are considering 
> using elastic search to store a large volume of log file data. A key metric 
> for us is the write throughput. We've been trying to understand how 
> Elasticsearch can scale. We were hoping to see some form of linear scaling. 
> That is, add in new nodes, increase the shard count, etc, in order to be 
> able to handle an increased number of writes. No matter what lever we pull, 
> we're not seeing an increase in write throughput. Clearly we're missing 
> something.
>
>
> We have tested several different clusters on AWS (details below).  We have 
> doubled the number of nodes; we have doubled (and doubled again) the number 
> of shards. We have doubled the number of servers feeding the cluster log 
> file data. We have put the nodes in the cluster behind an AWS load 
> balancer. We have increased the number of open files allowed by ubuntu. 
> None of these changes has had any effect on the number of records indexed 
> per second. 
>
> Details:
>
>
> We are piping json data using fluentd, using the fluentd elastic search 
> plugin, storing the log file entries using the logstash format, which is 
> supported by the fluentd elastic search plugin. Note that we are not using 
> logstash as a data pipeline.
>
>
> We configured the cluster with only 1 replica, the default.
>
> We increased the ulimit for open files to 32000
>
> Elastic search configuration:
>
> discovery.type: ec2
>
> cloud:
>   aws:
>     region: us-west-2
>     groups: elastic-search
>
> cloud.node.auto_attributes: true
>
>
> index.number_of_shards: 40 # we varied this between 10-40
> index.refresh_interval: 60s
>
> bootstrap.mlockall: true
>
> discovery.zen.ping.multicast.enabled: false
>
> Here's how we setup the memory configuration in setup.sh
>
> #!/bin/sh
> export ES_MIN_MEM=16g
> export ES_MAX_MEM=30g
> bin/elasticsearch -f
>
> Each node in the cluster is m1.xlarge, which has the following 
> characteristics:
>
> Instance Family: General purpose
> Instance Type: m1.xlarge
> Processor Arch: 64-bit
> vCPU: 4
> ECU: 8
> Memory (GiB): 15
> Instance Storage (GB): 4 x 420
> EBS-optimized Available: Yes
> Network Performance: High
>
> We've monitored the cluster using http://www.elastichq.org. For each 
> index, we calculated (Indexing Total)/(Indexing Time) to get the number of 
> records indexed per second. Whether we have a single node with 10 shards or 
> 6 nodes with 40 shards, we consistently see indexing occurring at a rate of 
> around 3000-3500 records/second. It is this measure that we never seem to 
> be able to increase.
>
> Here's some characteristic stats for an individual node:
>
> Heap Used: 1.1gb
> Heap Committed: 7.9gb
> Non Heap Used: 42.3mb
> Non Heap Committed: 66.2mb
> JVM Uptime: 4h
> Thread Count/Peak: 62 / 82
> GC Count: 8156
> GC Time: 4.8m
> Java Version: 1.7.0_45
> JVM Vendor: Oracle Corporation
> JVM: Java HotSpot(TM) 64-Bit Server VM
>
> We have not seen memory contention, or high CPU utilization, in general 
> (CPU utilization is around 80% out of 400% possible). We're not doing any 
> reading of the database (that would be a subsequent test).
>
> Here's some more stats:
> Open File Descriptors: 795
> CPU Usage: 68% of 400%
> CPU System: 1.5m
> CPU User: 26.6m
> CPU Total: 28.2m
> Resident Memory: 8.4gb
> Shared Memory: 19.6mb
> Total Virtual Memory: 10.3gb
>
> Thanks for any help.
>
>
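
The rate Christopher describes above, (Indexing Total)/(Indexing Time), can 
be written out as a small calculation. This sketch assumes the two inputs 
are a document count and a cumulative indexing time in milliseconds; the 
names `index_total` and `index_time_ms` and the sample numbers are 
illustrative and were not taken from the thread. Because the time is 
accumulated indexing time rather than wall-clock time, the result may not 
match the end-to-end throughput seen by the feeder.

```python
def indexing_rate(index_total: int, index_time_ms: int) -> float:
    """Return documents indexed per second of accumulated indexing time."""
    if index_time_ms <= 0:
        raise ValueError("index_time_ms must be positive")
    return index_total / (index_time_ms / 1000.0)


if __name__ == "__main__":
    # Hypothetical inputs: 3.3 million documents over 1,000 seconds of
    # accumulated indexing time.
    print(indexing_rate(3_300_000, 1_000_000))
```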



Cannot Increase Write TPS in Elasticsearch by adding more nodes

2014-06-09 Thread pranav amin
Hi all,

While doing some prototyping in ES using SSDs, we got some good write TPS. 
But the write TPS saturated after we added more nodes!


Here are the details I used for prototyping:

Requirement: To read data as soon as possible, since the read is followed by 
a write. 
Version of ES: 1.0.0
Document Size: 144 KB
Use of SSD for Storage: Yes
Benchmarking Tool: SoapUI or JMeter
VM: Ubuntu, 64-bit OS
Total Nodes: 12
Total Shards: 60
Threads: 200
Replicas: 2
Index Shards: 20
Total Indices: 1
Hardware configuration: 4 CPU, 6 GB RAM, 3 GB Heap

Using the above setup we got Write TPS ~= 500. 

We wanted to know whether we could increase our write TPS by adding more 
nodes, but we couldn't. 
* Adding 3 more nodes (i.e. Total Nodes = 15) increased the TPS by only 
10, i.e. to ~= 510. 
* Adding more hardware (CPU, RAM) and increasing the heap didn't help 
either [8 CPU, 12 GB RAM, 5 GB Heap].

Can someone help out or suggest what might be wrong? Conceptually, ES 
should scale in terms of write and read TPS as nodes are added. However, we 
aren't seeing that.

Much appreciated if someone can point us in the right direction. Let me 
know if more information is needed.

Thanks
Pranav.
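
The numbers in this message can be turned into two quick checks: the raw 
payload bandwidth implied by the measured TPS, and the TPS that perfectly 
linear scaling would have predicted. This is a rough sketch that assumes 
decimal units (1 MB = 1,000 KB) and that each document is written once per 
copy (one primary plus two replicas). The function names are illustrative, 
and the result says nothing about where the bottleneck actually is.

```python
def payload_mb_per_s(tps: float, doc_kb: float, copies: int = 1) -> float:
    """Raw document bytes written per second, in MB/s (1 MB = 1000 KB)."""
    return tps * doc_kb * copies / 1000.0


def linear_tps(base_tps: float, base_nodes: int, new_nodes: int) -> float:
    """TPS expected if throughput scaled in proportion to node count."""
    return base_tps * new_nodes / base_nodes


if __name__ == "__main__":
    print(payload_mb_per_s(500, 144))            # client payload only
    print(payload_mb_per_s(500, 144, copies=3))  # primary + 2 replicas
    print(linear_tps(500, 12, 15))               # expectation at 15 nodes
```

With the figures from the post, 500 TPS at 144 KB is about 72 MB/s of raw 
document payload (about 216 MB/s across the cluster once the two replicas 
are counted), and linear scaling from 12 to 15 nodes would have predicted 
roughly 625 TPS rather than the observed 510.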



elastic search & hadoop for analytics

2014-03-27 Thread pranav amin
Hi,

I'm thinking of using Elasticsearch and Hadoop in combination for 
analytics. I know Elasticsearch has some good integration with Hadoop, 
which is great news.
I have some questions on that:

1) Are the Elasticsearch-Hadoop integration APIs free to use, or are they 
licensed?
2) Does that mean I need a Hadoop installation as well? Is that free or 
licensed?
3) Where is the actual data stored: in Elasticsearch, in Hadoop, or in 
both?
4) Where is the latest data? If an app writes data, which data store does 
it go to first and which second? Or does the ES-Hadoop API handle 
replicating data from ES to Hadoop or vice versa?
5) Does a search request go to ES or to Hadoop?
6) From an architecture point of view, I'm not sure how ES and Hadoop work 
together for data reads and writes. Any use case scenario would be 
helpful.

Thanks in advance for answering.

Thanks
Pranav.
