Hi JPD,

First of all - from the whole ArangoDB team - thank you so much for your 
awesome contribution to the ArangoDB community. Your support has helped us 
grow, period.

We know how important the topic of memory usage is, and that bootstrapped 
startups especially suffer from tight budgets. I have worked twice in such 
environments myself, so I know what you are talking about.

We see three different developments that will tackle the problem of 
ArangoDB's memory usage. (I have to say that we always compare ourselves 
with the best solutions out there, so ArangoDB's memory usage is higher 
relative to the best specialized solutions! Compared to all others we are 
quite good.)

A) *Prices for memory are falling:* This trend has held for decades and we 
haven't seen a stopping sign yet. AWS just announced a 2 TB machine which 
will be available starting summer 2016, and we are confident that low-cost 
standard machines will improve memory-wise as well. The performance and 
capabilities you'll get for $80 will increase significantly 
<https://pcpartpicker.com/trends/memory/#ram.204sodimm.ddr3_1600.2x8192>. 
In addition, there are promising new developments in the memory sphere, 
such as non-volatile memory <https://queue.acm.org/detail.cfm?id=2874238>. 
BUT the pace of this decline/development is not fast enough, so we know 
that we have to provide technical solutions within ArangoDB as well. 

B) *Memory usage of ArangoDB improves:* We just integrated VelocyPack (
VPack <https://github.com/arangodb/velocypack>) into our new release, 
ArangoDB 3.0. This binary storage format is even more compact than e.g. 
MessagePack and will reduce memory usage for query results, stored 
documents, and temporarily computed values.
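
To give you a feeling for it, here is a minimal sketch using the VPack 
Builder API from the repository linked above (the document contents and 
the printed numbers are purely illustrative):

  #include <velocypack/vpack.h>
  #include <iostream>

  using namespace arangodb::velocypack;

  int main() {
    // Build {"name":"JPD","scores":[1,2,3]} directly as compact binary,
    // without materializing a JSON string first.
    Builder b;
    b.add(Value(ValueType::Object));
    b.add("name", Value("JPD"));
    b.add("scores", Value(ValueType::Array));
    b.add(Value(1));
    b.add(Value(2));
    b.add(Value(3));
    b.close();            // end of "scores" array
    b.close();            // end of object
    Slice s = b.slice();  // zero-copy view into the builder's buffer
    std::cout << "binary size: " << s.byteSize() << " bytes\n";
    std::cout << "as JSON:     " << s.toJson() << "\n";
    return 0;
  }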

C) *Persistent indexes and pluggable storage engine:* The problem you just 
described is mainly caused by the memory dedicated to indexes. With our 
upcoming 3.0 release we will provide a solution for persistent indexes, 
which will partly reduce memory needs. This is the first step toward our 
pluggable storage engine, which will come with 3.x. With a pluggable 
storage engine it's up to you whether you want to optimize for performance 
at the cost of higher memory usage (keep everything in memory) or whether 
you are willing to sacrifice a bit of performance and store those indexes 
on disk.
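
To make this concrete, here is an illustrative sketch (not final 3.0 
syntax) of the request body one could send to ArangoDB's existing 
POST /_api/index?collection=<name> endpoint, assuming the new index type 
keyword is "persistent"; the indexed attribute "email" is just a 
placeholder:

  #include <velocypack/vpack.h>
  #include <iostream>

  using namespace arangodb::velocypack;

  int main() {
    // Request body {"type":"persistent","fields":["email"]} asking for
    // a disk-backed index instead of the default in-memory one.
    Builder body;
    body.add(Value(ValueType::Object));
    body.add("type", Value("persistent"));
    body.add("fields", Value(ValueType::Array));
    body.add(Value("email"));  // placeholder attribute to index
    body.close();              // end of "fields"
    body.close();              // end of object
    std::cout << body.slice().toJson() << "\n";
    return 0;
  }

(From arangosh, the same request should boil down to a one-line 
ensureIndex() call.)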

This is the versatility and flexibility we want to achieve to get closer to 
our vision of simplifying data work.

Summed up: we are on our way. The notion that we will "price ourselves out 
of the market" is a bit far-fetched from my personal perspective. Our 
current customer funnel tells another story.
But either way, we know about the problem, are already tackling it, and 
will provide performant solutions throughout 2016, starting with 3.0.

Hope this helps,

Jan


On Tuesday, May 10, 2016 at 04:37:50 UTC+2, JPatrick Davenport wrote:
>
> Hello,
> First, for those who don't know me, I'm a big fan of Arango. I've written 
> a Clojure driver, travesedo <https://github.com/deusdat/travesedo>, and 
> the only Hadoop/Cascading taps in existence, as far as I know 
> <https://github.com/deusdat/guacaphant>, which make Arango a first-class 
> citizen in the "big data" world. So what follows comes from love.
>
> Is there a plan to move Arango to a more efficient memory model than 
> essentially memory-mapped files? I've seen multi-GB databases work fine in 
> MySQL with just 1 GB of RAM. I don't think that Arango would do as well 
> under these tight, bootstrapped requirements. I've personally watched a 
> mere 700 MB collection bring Arango to its knees when the whole server 
> only has 1 GB of RAM.
>
> The reason I ask is that I think there's a large market of small 
> projects/bootstrapped startups out there that could really use an 
> ArangoDB-type store. Money is drying up in Silicon Valley. This means that 
> single-node, cloud-based systems with 96 GB are going away. We're going to 
> return to micro systems on various cloud providers. ArangoDB is pricing 
> itself out of the market.
>
> I appreciate the work on the shapes storage for the data. I understand 
> how it can condense documents much better than MongoDB. At the same time, 
> requiring collections to be read entirely into memory makes the system 
> difficult to run on small machines. The non-binary storage makes it 
> terribly inefficient, as shown by ArangoDB's own benchmarks.
>
> Is there an effort to support partial collection loads (or other 
> optimizations)? 
>
> Postgres does really well with the inefficient JSON format used by Arango 
> in the comparison tests. It could do even better in a relational-vs-document 
> showdown too, provided the proper indexes are in Postgres.
>
> How does ArangoDB's dev team view these issues? 
>
> I ask because I'm at a crossroads now. I can only afford about $80/month 
> for my data tier. I can get a MySQL or Postgres system that would hum along 
> for years in single-node mode. ArangoDB concerns me because I could quickly 
> need multiple systems just to happily process a few GB.
>
> Thanks,
> JPD
>
