I recently worked on a minor project that nevertheless needed 10 GB of 
RAM. It ran on a reasonably powerful server, yet it still taxed that 
server. It made me wonder: how are people scaling up such processes? If my 
approach was naive, what does the less naive approach look like? 

I wrote a simple app that pulled data from a MySQL database, denormalized 
it, and then stored it in ElasticSearch. It pulled about 4 million 
documents from MySQL. Parts of the data needed to be built up into complex 
structures (maps, vectors) before being put into ElasticSearch. In the end, 
the 4 million rows from MySQL became 1.5 million documents in ElasticSearch.
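
To make the shape of the job concrete, here is a minimal sketch of the 
denormalizing step, written in Python. It is not the actual app: the field 
names (parent_id, parent_name, sku, qty) and the denormalize function are 
made up for illustration, and the MySQL read and ElasticSearch write are 
left out entirely. It only shows flat rows being folded into nested 
documents, with every document held in memory until all rows have been 
seen, which is presumably the kind of build-up that drives memory use.

```python
def denormalize(rows):
    """Fold flat rows into one nested document per parent id.

    Hypothetical schema: each row carries a parent_id, a parent_name,
    and one child item (sku, qty). All documents stay in memory until
    every row has been consumed.
    """
    docs = {}
    for row in rows:
        # Create the parent document the first time its id is seen.
        doc = docs.setdefault(
            row["parent_id"],
            {"id": row["parent_id"], "name": row["parent_name"], "items": []},
        )
        # Append this row's child data to the parent's vector of items.
        doc["items"].append({"sku": row["sku"], "qty": row["qty"]})
    return list(docs.values())


# Illustrative input: three flat rows that collapse into two documents.
rows = [
    {"parent_id": 1, "parent_name": "order-a", "sku": "x1", "qty": 2},
    {"parent_id": 1, "parent_name": "order-a", "sku": "x2", "qty": 1},
    {"parent_id": 2, "parent_name": "order-b", "sku": "x1", "qty": 5},
]
docs = denormalize(rows)
```

Here three rows become two documents, the same kind of reduction as the 
4 million rows becoming 1.5 million documents.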

What if, instead of 4 million documents, I needed to process 400 million? 
I assume I would have to distribute the work over several machines. What 
are the most common routes for doing so? Is this the situation where people 
start to use something like Onyx or Storm or Hadoop? I looked at Spark, but 
it seems to be for a different use case, more about querying than 
denormalizing. Likewise, dumping everything onto S3 and then using 
something like Athena seems to be more for querying than denormalizing. 

For unrelated reasons, I am moving toward an architecture where all data 
is stored in Kafka. I suppose I could write a denormalizing app that reads 
from Kafka, builds up the data, and then inserts it into ElasticSearch, 
though I suppose that, on the narrow issue of memory usage, using Kafka is 
no different than using MySQL.

So, I'm asking about common patterns here. When folks have an app that 
needs more RAM than a typical server has, what are the first and most 
common steps they take? 







-- 
You received this message because you are subscribed to the Google
Groups "Clojure" group.
To post to this group, send email to clojure@googlegroups.com
Note that posts from new members are moderated - please be patient with your 
first post.
To unsubscribe from this group, send email to
clojure+unsubscr...@googlegroups.com
For more options, visit this group at
http://groups.google.com/group/clojure?hl=en