Hello David,
Thank you for the quick reply! I appreciate it very much.
On 01.11.2011 01:01, David Montag wrote:
Hi Alican,
On Mon, Oct 31, 2011 at 6:26 AM, algecya <alican.gecya...@openconcept.ch> wrote:
Hello everyone,
We are relatively new to Neo4j and are evaluating some test scenarios in
order to decide whether to use Neo4j in production systems. We used the
latest stable release, 1.4.2.
I wrote an import script and generated some random data with the given tree
structure:
http://neo4j-community-discussions.438527.n3.nabble.com/file/n3467806/neo4j_nodes.png
Nodes Summary:
Nodes with Type A: 1
Nodes with Type B: 100
Nodes with Type C: 50'000 (100x500)
Nodes with Type D: 500'000 (50'000x10)
Nodes with Type E: 25'000'000 (500'000x50)
Nodes with Type F: 375'000'000 (25'000'000x15)
This all worked quite OK; the import took approx. 30 hours using the
batch inserter.
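For reference, the core of the import script looks roughly like this (a trimmed sketch against the 1.4 batch inserter API; the store path, property keys and the CHILD relationship type are placeholders of mine, and the real script nests the loop once per tree level):

import org.neo4j.graphdb.DynamicRelationshipType;
import org.neo4j.graphdb.RelationshipType;
import org.neo4j.helpers.collection.MapUtil;
import org.neo4j.kernel.impl.batchinsert.BatchInserter;
import org.neo4j.kernel.impl.batchinsert.BatchInserterImpl;

public class TreeImport {
    private static final RelationshipType CHILD =
            DynamicRelationshipType.withName("CHILD");

    public static void main(String[] args) {
        // The batch inserter skips transactions, which is what makes it fast.
        BatchInserter inserter = new BatchInserterImpl("target/testdb");
        try {
            long root = inserter.createNode(MapUtil.map("name", "A-1"));
            for (int i = 0; i < 100; i++) {
                // one level of the tree (A -> B); deeper levels work the same
                long child = inserter.createNode(MapUtil.map("name", "B-" + i));
                inserter.createRelationship(root, child, CHILD, null);
            }
        } finally {
            inserter.shutdown(); // flushes and closes the store files
        }
    }
}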
We have multiple indexes, but we also have one index where all nodes are
indexed.
My first question would be: does it make sense to index all nodes in the
same index?
It depends on how you intend to access the data. If you always know the
type, then it is beneficial to use separate indices; otherwise you might
want to put everything in a single index. Do remember that the index will
consume some disk space as well.
OK, we decided to create a type node for each type and relate the data
nodes to it (instead of having the type as an attribute on each node).
I guess I was thinking too much in relational database schemas.
We will therefore have an index per type.
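In code, the modelling we ended up with looks something like this (a minimal sketch against the 1.4 embedded API; the IS_A relationship type and the typeE index name are our own choices):

import org.neo4j.graphdb.DynamicRelationshipType;
import org.neo4j.graphdb.GraphDatabaseService;
import org.neo4j.graphdb.Node;
import org.neo4j.graphdb.Transaction;
import org.neo4j.graphdb.index.Index;
import org.neo4j.kernel.EmbeddedGraphDatabase;

public class TypePerNode {
    public static void main(String[] args) {
        GraphDatabaseService db = new EmbeddedGraphDatabase("target/testdb");
        Transaction tx = db.beginTx();
        try {
            Node typeE = db.createNode();   // one node representing the type itself
            typeE.setProperty("type", "E");

            Node e = db.createNode();       // a data node of type E
            e.setProperty("name", "abc");
            e.setProperty("created", 2011);
            e.createRelationshipTo(typeE, DynamicRelationshipType.withName("IS_A"));

            Index<Node> typeEIndex = db.index().forNodes("typeE"); // one index per type
            typeEIndex.add(e, "name", e.getProperty("name"));
            typeEIndex.add(e, "created", e.getProperty("created"));
            tx.success();
        } finally {
            tx.finish();
        }
        db.shutdown();
    }
}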
If I list all nodes with the property type:E, it is quite slow the first
time (~270s); the second time it is fast (~0.5s). I know this is normal
and most likely fixed in the current milestone version, but I am not sure
how long the query result will stay cached in memory. Are there any
configurations I should be concerned about?
The difference there is all about disk access time. Will fetching all 25
million E's be a common operation?
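If it will be, the settings to look at are the memory-mapped buffer sizes and the object cache in neo4j.properties: if the node and relationship store files fit into the mapped buffers, the first-hit disk penalty largely disappears. The values below are purely illustrative; size them to your actual store files:

neostore.nodestore.db.mapped_memory=2048M
neostore.relationshipstore.db.mapped_memory=8192M
neostore.propertystore.db.mapped_memory=4096M
neostore.propertystore.db.strings.mapped_memory=2048M
# soft references: cached objects are dropped only under memory pressure
cache_type=soft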
We will need to find type E nodes by common attributes, which may return
approx. 1 million results, but the search values will always differ.
E.g., nodes of type E have a creation date attribute and a name attribute.
I will need to find all nodes created in a given year (say 2011) with the
given name (abc). The second search will be date (2011) and name (def).
If some time passes and memory is used for other searches, I am afraid my
first search (2011, abc) will be evicted from memory and will take long
again the next time I run it.
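Concretely, continuing the sketch above, the lookup would be something like this (index and property names as in the modelling example):

// Lucene query syntax against the per-type index
IndexHits<Node> hits = typeEIndex.query("created:2011 AND name:abc");
try {
    for (Node hit : hits) {
        // process the ~1 million matches
    }
} finally {
    hits.close(); // releases the underlying index searcher
}

My worry is whether the disk pages and cached objects behind such a query stay warm between runs with different values.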
We also tried the hardware sizing calculator. See the result here:
http://neo4j-community-discussions.438527.n3.nabble.com/file/n3467806/neo4j_hardware.png
Are these realistic values? I guess 128GB RAM and 12TB of SSD storage
might be a bit cost-intensive.
The disk usage comes out at 12TB because you specified that each node on
average carries 10kB of data and each relationship 1kB. What kind of data
are you storing on the nodes and relationships? These are pretty rough
estimates that take into account neither the number of properties nor
their types. Also, if you decrease the property data by a factor of 100
(100B/node, 10B/rel), your database will only consume ~150-200GB.
OK, I see your point. I think I am getting the hang of graph databases
now, i.e. I might not want to put all my data into attributes but create
nodes instead...
My rough guess was to increase the number of nodes to 1'000'000'000 and
decrease the bytes consumed to 100B/node and 10B/rel. The result is
approx. 400GB (no problem at all).
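Back-of-envelope, if I read the 1.x record sizes right (9 bytes per node record, 33 bytes per relationship record) and assume roughly one relationship per node, the raw stores alone come to:

1'000'000'000 x (9 B + 100 B)  ~= 109 GB  (nodes + their property data)
1'000'000'000 x (33 B + 10 B)  ~=  43 GB  (relationships + their property data)
                        total  ~= 152 GB

so the calculator's ~400GB presumably also covers property-record overhead, the Lucene indexes and some headroom.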
But I am still a bit concerned about the 128GB of RAM...
Are there any reference applications with this number of nodes and
relationships?
We are in the process of adding case studies. Please get in touch with
sales for more info at this time.
Thank you, will do so.
Also, Neoclipse won't start/connect to the database anymore with this
amount of data.
Am I missing some configuration for Neoclipse?
Are you getting an error message?
No error messages. Is there an option to enable logging?
I let Neoclipse run for almost an hour and suddenly the graph appeared,
but I cannot navigate (it seems frozen, though there are calculations
going on...).
I am not sure why it takes so long: the initial traversal depth is 1,
there are 16 nodes and 15 relationships, and I also decreased the number
of nodes to be displayed to 50.
I thought it would load data lazily?
Best,
David
Best regards
--
alican