I recently worked on a minor project that nevertheless needed to use 10 gigs of RAM. It ran on a reasonably powerful server, yet it taxed that server. And I wondered, how are people scaling up such processes? If my approach was naive, what does the less naive approach look like?
I wrote a simple app that pulled data from a MySQL database, denormalized it, and then stored it in ElasticSearch. It pulled about 4 million rows from MySQL. Parts of the data needed to be built up into complex structures (maps, vectors) before being put into ElasticSearch. In the end, the 4 million rows from MySQL became 1.5 million documents in ElasticSearch.

I was wondering: what if, instead of 4 million rows, I needed to process 400 million? I assume I would have to distribute the work over several machines. I'm curious what the most common routes for doing so are. Would this be the situation where people start to use something like Onyx or Storm or Hadoop? I looked at Spark, but it seems to be for a different use case, more about querying than denormalizing. Likewise, dumping everything onto S3 and then using something like Athena seems to be more for querying than denormalizing.

For unrelated reasons, I am moving toward an architecture where all data is stored in Kafka. I suppose I could write a denormalizing app that reads from Kafka, builds up the data, and then inserts it into ElasticSearch, though on the narrow issue of memory usage, using Kafka is no different than using MySQL.

So, I'm asking about common patterns here. When folks have an app that needs more RAM than a typical server has, what are the first and most common steps they take?
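To make the denormalizing step concrete, here is a minimal sketch in Python. It is not the actual app: the field names (`order_id`, `customer`, `sku`, `qty`), the `run` function, and the `sink` callback are all hypothetical stand-ins for the MySQL read and the ElasticSearch write. It also assumes the rows can be read already sorted by the grouping key, which may not hold for every data set. Under that assumption, only one group of rows and one batch of documents sit in memory at a time.

```python
from itertools import groupby, islice
from operator import itemgetter
from typing import Callable, Iterable, Iterator


def denormalize(rows: Iterable[dict]) -> Iterator[dict]:
    """Fold consecutive rows that share an order_id into one nested document.

    Assumes rows arrive sorted by order_id (for example via ORDER BY), so
    only the rows of the current group are held in memory.
    """
    for order_id, group in groupby(rows, key=itemgetter("order_id")):
        group_rows = list(group)
        yield {
            "id": order_id,
            "customer": group_rows[0]["customer"],
            "items": [{"sku": r["sku"], "qty": r["qty"]} for r in group_rows],
        }


def batches(docs: Iterable[dict], size: int) -> Iterator[list]:
    """Yield lists of at most `size` documents, pulling lazily from docs."""
    it = iter(docs)
    while batch := list(islice(it, size)):
        yield batch


def run(rows: Iterable[dict], sink: Callable[[list], None],
        batch_size: int = 1000) -> int:
    """Stream rows through denormalize and hand each batch to sink.

    sink is a hypothetical stand-in for a bulk write to the search index.
    Returns the number of documents emitted.
    """
    total = 0
    for batch in batches(denormalize(rows), batch_size):
        sink(batch)
        total += len(batch)
    return total


if __name__ == "__main__":
    sample = [
        {"order_id": 1, "customer": "ann", "sku": "a", "qty": 2},
        {"order_id": 1, "customer": "ann", "sku": "b", "qty": 1},
        {"order_id": 2, "customer": "bob", "sku": "c", "qty": 5},
    ]
    sent = []
    print(run(sample, sent.append, batch_size=1000), "documents written")
```

Because each group is independent, the same loop could in principle run on several machines, each handling a disjoint range of keys. Whether that is what people commonly do is exactly what I am asking.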