Re: [Neo] LuceneIndexBatchInserter doubt

2009-12-09 Thread Núria Trench
Hi Todd,

The sample code creates nodes and relationships by parsing 4 csv files.
Thank you for trying to trigger this behaviour with this sample.

Núria

2009/12/9 Mattias Persson matt...@neotechnology.com

 Could you provide me with some sample code which can trigger this
 behaviour with the latest index-util-0.9-SNAPSHOT Núria?

 2009/12/9 Núria Trench nuriatre...@gmail.com:
  Todd,
 
  I don't have the same problem. In my case, after indexing all the
  attributes/properties of each node, the application creates all the edges by
  looking up the tail node and the head node. So, it calls the method
  org.neo4j.util.index.LuceneIndexBatchInserterImpl.getSingleNode, which
  returns -1 (node not found) on many occasions.
 
  Does anyone have an alternative way to get a node by its indexed
  attributes/properties?
 
  Thank you,
 
  Núria.
 
 
  2009/12/7 Mattias Persson matt...@neotechnology.com
 
  Todd, are you sure you have the latest index-util 0.9-SNAPSHOT? This
  is a bug that we fixed yesterday... (assuming it's the same bug).
 
  2009/12/7 Todd Stavish toddstav...@gmail.com:
   Hi Mattias, Núria.
  
   I am also running into scalability problems with the Lucene batch
   inserter at much smaller numbers, 30,000 indexed nodes. I tried
   calling optimize more. Increasing ulimit didn't help.
  
   [INFO] Exception in thread "main" java.lang.RuntimeException:
   java.io.FileNotFoundException:
   /Users/todd/Code/neo4Jprototype/target/classes/data/graph/lucene/name/_0.cfx
   (Too many open files)
   [INFO]  at org.neo4j.util.index.LuceneIndexBatchInserterImpl.getNodes(LuceneIndexBatchInserterImpl.java:186)
   [INFO]  at org.neo4j.util.index.LuceneIndexBatchInserterImpl.getSingleNode(LuceneIndexBatchInserterImpl.java:238)
   [INFO]  at com.collectiveintelligence.QueryNeo.loadDataToGraph(QueryNeo.java:277)
   [INFO]  at com.collectiveintelligence.QueryNeo.main(QueryNeo.java:57)
   [INFO] Caused by: java.io.FileNotFoundException:
   /Users/todd/Code/neo4Jprototype/target/classes/data/graph/lucene/name/_0.cfx
   (Too many open files)
  
   I tried breaking it up into separate BatchInserter instances, and it hangs
   now. Can I create more than one batch inserter per process if they run
   sequentially and non-threaded?
  
   Thanks,
   Todd
  
  
  
  
  
   On Mon, Dec 7, 2009 at 7:28 AM, Núria Trench nuriatre...@gmail.com
  wrote:
   Hi again Mattias,
  
    I have tried to execute my application with the latest version available
    in the maven repository and I still have the same problem. After creating
    and indexing all the nodes, the application calls the optimize method and
    then creates all the edges by calling the method getNodes in order to
    select the tail and head node of each edge, but it doesn't work because
    many nodes are not found.
   
    I have tried to create only 30 nodes and 15 edges and it works properly,
    but if I try to create a big graph (180 million edges + 20 million nodes)
    it doesn't.
   
    I have also tried to call the optimize method every time the application
    has created 1 million nodes, but it doesn't work.
   
    Have you tried to create as many nodes as I have said with the newer
    index-util version?
  
   Thank you,
  
   Núria.
  
   2009/12/4 Núria Trench nuriatre...@gmail.com
  
   Hi Mattias,
  
    Thank you very much for fixing the problem so fast. I will try it as soon
    as the new changes are available in the maven repository.
  
   Núria.
  
  
   2009/12/4 Mattias Persson matt...@neotechnology.com
  
    I fixed the problem and also added a cache per key for faster
    getNodes/getSingleNode lookups during the insert process. However, the
    cache assumes that there's nothing in the index when the process starts
    (which will almost always be true) to speed things up even further.
   
    You can control whether the cache is used, and its size, by overriding
    these methods (also documented in the Javadoc) in your
    LuceneIndexBatchInserterImpl instance:
   
    boolean useCache()
    int getMaxCacheSizePerKey()
   
    The new changes should be available in the maven repository within an hour.
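
    [Editor's note: the actual LuceneIndexBatchInserterImpl cache is not shown
    in this thread. Purely as an illustration of the cache-per-key idea that
    useCache()/getMaxCacheSizePerKey() control — with every name below
    invented, not the real implementation — a bounded per-key LRU cache might
    look like this:]

    ```java
    import java.util.HashMap;
    import java.util.LinkedHashMap;
    import java.util.Map;

    // Hypothetical sketch: per index key (e.g. "name"), keep a bounded map
    // from indexed value to node id, evicting the least recently used entry
    // once the per-key limit is exceeded.
    class PerKeyCache {
        private final int maxSizePerKey;
        private final Map<String, Map<Object, Long>> cache = new HashMap<>();

        PerKeyCache(int maxSizePerKey) {
            this.maxSizePerKey = maxSizePerKey;
        }

        void put(String indexKey, Object value, long nodeId) {
            cache.computeIfAbsent(indexKey, k ->
                new LinkedHashMap<Object, Long>(16, 0.75f, true) {
                    @Override
                    protected boolean removeEldestEntry(Map.Entry<Object, Long> e) {
                        return size() > maxSizePerKey; // evict over the per-key limit
                    }
                }).put(value, nodeId);
        }

        Long get(String indexKey, Object value) {
            Map<Object, Long> perKey = cache.get(indexKey);
            return perKey == null ? null : perKey.get(value);
        }
    }
    ```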
  
   2009/12/4 Mattias Persson matt...@neotechnology.com:
I think I found the problem... it's indexing as it should, but it
isn't reflected in getNodes/getSingleNode properly until you
flush/optimize/shutdown the index. I'll try to fix it today!
   
2009/12/3 Núria Trench nuriatre...@gmail.com:
Thank you very much for your response.
If you need more information, just send an e-mail and I will try
to explain it better.
   
Núria.
   
2009/12/3 Mattias Persson matt...@neotechnology.com
   
This is something I'd like to reproduce, and I'll do some testing on
this tomorrow.
   
2009/12/3 Núria Trench nuriatre...@gmail.com:
 Hello,

 Last week, I decided to download your graph database core in order to use
 it. First, I created a new project to 

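[Editor's note: one workaround for the lookups failing mid-load (the
getSingleNode-returning--1 problem discussed above) is to avoid index reads
entirely while inserting: keep your own in-memory map from the indexed key to
the node id, and resolve edge endpoints from that map. Below is a minimal,
Neo4j-free sketch of that pattern; every class and method name is invented,
and a 180M-edge/20M-node load would need sufficient heap or a disk-backed
map.]

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical sketch: resolve edge endpoints from a map built during node
// insertion, instead of querying an index whose writes may not yet be
// visible until flush/optimize.
class BulkLoadSketch {
    private final Map<String, Long> idByKey = new HashMap<>();
    private long nextNodeId = 0;
    final List<long[]> edges = new ArrayList<>();

    long insertNode(String key) {
        long id = nextNodeId++;           // stands in for a createNode(...) call
        idByKey.put(key, id);             // remember the key we would have indexed
        return id;
    }

    void insertEdge(String tailKey, String headKey) {
        Long tail = idByKey.get(tailKey); // no index round-trip, no -1 misses
        Long head = idByKey.get(headKey);
        if (tail == null || head == null) {
            throw new IllegalStateException("unknown endpoint: " + tailKey + " / " + headKey);
        }
        edges.add(new long[] { tail, head });
    }
}
```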
Re: [Neo] LuceneIndexBatchInserter doubt

2009-12-09 Thread Mattias Persson
Hi again, Núria (it was I, Mattias who asked for the sample code).
Well... the fact that you parse 4 csv files doesn't really help me
set up a test for this... I mean, how can I know that my test will be
similar to yours? Would it be ok to attach your code/csv files as
well?

/ Mattias


Re: [Neo] LuceneIndexBatchInserter doubt

2009-12-09 Thread Núria Trench
Hi Mattias,

I attached the sample code to my last e-mail; didn't you receive it?
I will try to attach it again.

Núria.


Re: [Neo] LuceneIndexBatchInserter doubt

2009-12-09 Thread Mattias Persson
Oh ok, it could be our attachments filter / security or something...
could you try to mail them to me directly at matt...@neotechnology.com?


Re: [Neo] LuceneIndexBatchInserter doubt

2009-12-09 Thread Núria Trench
Hi Mattias,

I already did that 10 minutes ago. If you need an example to see the
format of the 4 csv files, I can send it to you.
Thanks again,

Núria.


Re: [Neo] Type metadata in properties/nodes

2009-12-09 Thread Tobias Ivarsson
Associating nodes with a type node is a good approach, especially if you
want to be able to do queries like "give me all nodes of type X". But for
knowing the semantic type of a node found through a general traversal, I
prefer to use the navigational context of the node. For example, if I have a
Person-node, I know that the node at the other end of a FRIEND-relationship
will be a Person-node as well. Or if I have a Car-node, I know that the node
at the other end of an OWNER-relationship will be either a Person or a
Company, both of which probably have enough in common for me to be able to
get an address (for sending them the parking ticket or whatever). If I need
to specifically know whether it's a Person or a Company, I could use some
property for that information (or check the relationship to a type node),
but most of the semantic information would be known from how I reached the
node.

I have added a note about this to the FAQ in the wiki.

Cheers,
Tobias

On Tue, Dec 8, 2009 at 10:22 PM, Rick Bullotta 
rick.bullo...@burningskysoftware.com wrote:

 Thanks, Peter.  Good info.  I think we ended up with a hybrid approach: we
 modeled a set of "Type" nodes (related to a master "Types" node), each of
 which includes the type metadata (property/type data) for a specific type.
 Instance nodes then maintain a two-way relationship with their associated
 Type node so that any node can quickly obtain its Type node, and so we can
 easily traverse all instances of a specific type... and we may end up
 extending this such that the properties themselves are each a node of their
 own, in some cases, where we need to be able to relate/search/traverse at a
 very detailed level.  I suppose that depends on the performance implications
 of having lots more nodes and relationships.

 In any case, it definitely seems do-able with Neo.




 -Original Message-
 From: user-boun...@lists.neo4j.org [mailto:user-boun...@lists.neo4j.org]
 On
 Behalf Of Peter Neubauer
 Sent: Tuesday, December 08, 2009 3:25 PM
 To: Neo user discussions
 Subject: Re: [Neo] Type metadata in properties/nodes

 Hi Rick,
 there are a number of interesting approaches to this, involving both
 ways to retain the metadata:

 1. RDF and OWL
 - basically, every node will maintain a relationship to its type node
 (your "shadow node"), something like x --RDF:TYPE--> type_node, which
 contains info on what the type is, what properties it has, etc.

 2. Neo4j Meta package (http://components.neo4j.org/neo-meta/)
 - this is the concept of describing the type of things in code (Java
 in this case) and thus in code enforce the restrictions and type
 conversions on properties through the code. This does not capture any
 meta info in the graph but is easy to do.

 3. Annotate the nodes with type info
 - in this approach, there is a "type" or "classname" property on any
 node that is used to derive the type to deserialize/serialize the
 object into; the rest of the meta info is contained in the upper code
 layers. Andreas Ronge's JRuby bindings use this approach.

 4. Encode everything into a String property
 - this approach means shuffling everything into a string property,
 basically treating properties as BLOBs. Works in some cases, but
 certainly locks down your data in these properties.
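
[Editor's note: approach 3 above can be sketched without any Neo4j
dependency — a "classname" property stored alongside the node's other
properties drives deserialization. All names below are invented for
illustration only:]

```java
import java.util.Map;

// Hypothetical sketch of approach 3: a "classname" property on each
// node-like record selects how the remaining properties are interpreted.
class TypedNodeSketch {
    static String describe(Map<String, Object> props) {
        String classname = (String) props.get("classname");
        switch (classname) {
            case "Person":  return "Person named " + props.get("name");
            case "Company": return "Company named " + props.get("name");
            default:
                throw new IllegalArgumentException("unknown type: " + classname);
        }
    }
}
```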

 What is best depends on your domain, and there might be more
 approaches out there. I sense that you are asking even for an
 extensible type system especially on properties. That is not in scope
 of the core graph engine, but I am not sure if in theory it would be
 possible to extend the property type system, we would need to discuss
 that separately.

 Cheers,

 /peter neubauer

 COO and Sales, Neo Technology

 GTalk:  neubauer.peter
 Skype   peter.neubauer
 Phone   +46 704 106975
 LinkedIn   http://www.linkedin.com/in/neubauer
 Twitter  http://twitter.com/peterneubauer

 http://www.neo4j.org- Relationships count.
 http://gremlin.tinkerpop.com - PageRank in 2 lines of code.



 On Tue, Dec 8, 2009 at 8:43 PM, Rick Bullotta
 rick.bullo...@burningskysoftware.com wrote:
  I can see how relationships could be used to map "is a"/duck typing, but
  I'm struggling with how to infer type from properties.  In particular,
  while anything could be stuffed into a String, it loses important
  semantics when you do so.  I'm not referring to *storage* as a String,
  which makes plenty of sense - it's that the type identity of the source
  property is lost if you do so.  I could maintain a "shadow" node of the
  type metadata that could be related to each instance with a property
  name/property type array, but that seems like something that would be
  useful within the node model itself.
 
 
 
  Types like DateTime, hyperlinks, and so on, while quite easily storable in
  Neo4J, lose useful semantics on the way in.  I'd welcome your thoughts on
  how others have managed this type of scenario and other techniques for
  meta-tagging nodes and properties with type or other 

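[Editor's note: the type-node pattern Tobias describes — "give me all nodes
of type X" via a type node, plus a back-reference from each instance — can
be illustrated with a plain in-memory sketch. No Neo4j API is used and all
names below are hypothetical:]

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical sketch: each instance "node" keeps a link to its type node,
// and the type node keeps the reverse links, so both "what type is this
// node?" and "all nodes of type X" are single lookups/traversals.
class TypeNodeSketch {
    private final Map<String, List<String>> instancesByType = new HashMap<>();
    private final Map<String, String> typeByInstance = new HashMap<>();

    void addInstance(String node, String typeNode) {
        instancesByType.computeIfAbsent(typeNode, t -> new ArrayList<>()).add(node);
        typeByInstance.put(node, typeNode); // the "two-way relationship"
    }

    List<String> allOfType(String typeNode) {
        return instancesByType.getOrDefault(typeNode, new ArrayList<>());
    }

    String typeOf(String node) {
        return typeByInstance.get(node);
    }
}
```
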
Re: [Neo] Type metadata in properties/nodes

2009-12-09 Thread rick . bullotta
Hi, Tobias.

Thanks for your thoughts and ideas.

My requirement is not only to know the type of something, but also to
store metadata for types so that I can catalog the property type of
each individual property in a node for a given type.  It's a bit
complicated, but we are allowing very dynamic declarative types that
will not have an explicit compiled Java class wrapper for each type
(we will have a generic wrapper that deals with the dynamic type, and
some explicit wrappers for pre-defined entities).  The main reason is
that we need to deal with a few data types beyond the Java primitives
and String(s).  For example, we want to be able to know contextually
that a property is a timestamp or a hyperlink.  Thus the need for
the extra (but relatively simple) metadata.

It might be useful to identify a commonly used subset of additional
property types that correspond to, for example, the most common RDBMS
data types and XML schema types.  This might include date, time,
datetime, link, and so on.  Since at the persistence level it appears
that a property is saved along with an integer enumeration of its
simple type, perhaps there is an extensibility model that could be
implemented to allow these application-specific types to be created and
managed.  I know that would be problematic, though, given that the
current implementation is an enumeration.  No worries though, since
there are perfectly good workarounds/alternatives using relationships.

Cheers,

Rick





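[Editor's note: Rick's idea — cataloging a semantic type (timestamp,
hyperlink, etc.) for each property of a declared type, on top of the
primitive/String storage — might be sketched, purely illustratively and
with invented names, as:]

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical semantic property types beyond the stored primitives.
enum PropertyType { STRING, DATETIME, HYPERLINK }

// Hypothetical sketch: a type's metadata maps each property name to its
// declared semantic type, so a String-stored value can still be interpreted
// contextually as e.g. a timestamp or a hyperlink.
class TypeMetadataSketch {
    private final Map<String, PropertyType> propertyTypes = new HashMap<>();

    void declare(String property, PropertyType type) {
        propertyTypes.put(property, type);
    }

    PropertyType typeOf(String property) {
        return propertyTypes.getOrDefault(property, PropertyType.STRING);
    }
}
```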
    Original Message 
   Subject: Re: [Neo] Type metadata in properties/nodes
   From: Tobias Ivarsson tobias.ivars...@neotechnology.com
   Date: Wed, December 09, 2009 5:39 am
   To: Neo user discussions user@lists.neo4j.org
    Associating nodes with a type node is a good approach, especially if
    you want to be able to do queries like "give me all nodes of type X".
    But for knowing the semantic type of a node when found through a
    general traversal I prefer to use the navigational context of the
    node. For example, if I have a Person-node I know that the node at
    the other end of a FRIEND-relationship will be a Person-node as well.
    Or if I have a Car-node I know that the node at the other end of an
    OWNER-relationship will be either a Person or a Company, both of
    which probably have enough in common for me to be able to get an
    address (for sending them the parking ticket or whatever). If I need
    to specifically know whether it's a Person or a Company, I could use
    some property for that information (or check the relationship to a
    type node), but most of the semantic information would be known from
    how I reached the node.
   I have added a note about this to the FAQ in the wiki.
   Cheers,
   Tobias
   On Tue, Dec 8, 2009 at 10:22 PM, Rick Bullotta 
   rick.bullo...@burningskysoftware.com wrote:
Thanks, Peter. Good info. I think we ended up with a hybrid approach: we
modeled a set of "Type" nodes (related to a master "Types" node), each of
which includes the type metadata (property/type data) for a specific
type. Instance nodes then maintain a two-way relationship with their
associated Type node so that any node can quickly obtain its Type node
and so we can easily traverse all instances of a specific type... and we
may end up extending this such that the properties themselves are each a
node of their own, in some cases, where we need to be able to
relate/search/traverse at a very detailed level. I suppose that depends
on the performance implications of having lots more nodes and
relationships.

In any case, it definitely seems do-able with Neo.
   
   
   
   
-Original Message-
From: user-boun...@lists.neo4j.org
   [[1]mailto:user-boun...@lists.neo4j.org]
On
Behalf Of Peter Neubauer
Sent: Tuesday, December 08, 2009 3:25 PM
To: Neo user discussions
Subject: Re: [Neo] Type metadata in properties/nodes
   
Hi Rick,
there are a number of interesting approaches to this, involving both
ways to retain the metadata:

1. RDF and OWL
- basically, every node will maintain a relationship to its type node
(your "shadow node"), something like x --RDF:TYPE--> type_node, which
contains info on what the type is, what properties it has, etc.

2. Neo4j Meta package ([2]http://components.neo4j.org/neo-meta/)
- this is the concept of describing the type of things in code (Java
in this case) and thus in code enforce the restrictions and type
conversions on properties through the code. This does not capture any
meta info in the graph but is easy to do.

3. Annotate the nodes with type info
- in this approach, there is a "type" or "classname" property on any
node that is used to derive the type to deserialize/serialize the
object into; the rest of the meta info is 

Re: [Neo] Type metadata in properties/nodes

2009-12-09 Thread Tobias Ivarsson
I see. I realized that this was what you were after. What I was proposing
was that you would know the types for the properties given the type of the
node. The types for the nodes in your case would be more abstract, perhaps
just defined by the set of properties. I used concrete types in my
explanation because it usually helps people understand what I mean with
utilizing the navigation context.
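The navigational-context idea can be reduced to a tiny lookup: the relationship type a traversal just followed already implies the semantic kind of the node at the far end, so no type property needs to be read. This is only an illustration of the idea; the names below are invented, not Neo API:

```java
import java.util.Map;

// Toy illustration: the relationship type followed during a traversal
// implies the kind of node at the other end (names are invented examples).
public class ContextTyping {
    // FRIEND always ends at a Person; OWNER may end at a Person or a Company.
    private static final Map<String, String> IMPLIED = Map.of(
            "FRIEND", "Person",
            "OWNER", "Person or Company");

    public static String impliedKind(String relationshipType) {
        return IMPLIED.getOrDefault(relationshipType, "unknown");
    }
}
```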

I had a suspicion that your particular application might not benefit from
this approach, but I wanted to throw it into the mix for the sake of
completeness of the discussion, since there are a lot more people reading
the list than writing in a particular thread.

Cheers,
Tobias

On Wed, Dec 9, 2009 at 2:02 PM, rick.bullo...@burningskysoftware.com wrote:

   Hi, Tobias.



   Thanks for your thoughts and ideas.



   My requirement is not only to know the type of something, but also to
   store metadata for types so that I can catalog the property type of
   each individual property in a node for a given type.  It's a bit
   complicated, but we are allowing very dynamic declarative types that
   will not have an explicit compiled Java class wrapper for each type
   (we will have a generic wrapper that deals with the dynamic type, and
   some explicit wrapper for pre-defined entities).   The main reason is
   that we need to deal with a few data types beyond the Java primitives
   and String(s).  For example, we want to be able to know contextually
   that a property is a timestamp or a hyperlink.  Thus the need for
   the extra (but relatively simple) metadata.



   It might be useful to identify a commonly used subset of additional
   property types that correspond to, for example, the most common RDBMS
   data types and XML schema types.  This might include date, time,
   datetime, link, and so on.  Since at the persistence level it appears
   that a property is saved along with an integer enumeration of its
   simple type, perhaps there is an extensibility model that could be
   implemented to allow these application-specific types to be created and
   managed.  I know that would be problematic, though, given that the
   current implementation is an enumeration.  No worries though, since
   there are perfectly good workarounds/alternatives using relationships.



   Cheers,



   Rick






[Neo] Noob questions/comments

2009-12-09 Thread Rick Bullotta
Hi, all.

 

Here are a few questions and comments that I'd welcome feedback on :

 

Questions:

 

-  If you delete the reference node (id = 0), how can you recreate
it?

-  If you have a number of loose or disjoint graphs structured as
trees with a single root node, is there a best practice for
tracking/iterating only the top level node(s) of these disjoint graphs?  Is
relating them to the reference node and doing a first level traversal the
best way?

-  We would like to treat our properties as slightly more complex
than a simple type (they might have a last modified date, validity flag, and
so on) - given the choice between adding properties to track this state or
using nodes and relationships for these entities, what are the pros and cons
of each approach?

-  One aspect of our application will store nodes that can be
considered similar to event logs.  There may be many thousands of these
nodes per event stream.  We would like to be able to traverse the entries
in chronological order, very quickly.  We were considering the following
design possibilities:

o   Simply create a node for each stream and a node for each entry, with a
relationship between the stream and the entry, then implement our own sort
routine

o   Similar to the above, but create a node for each day, and manage
relationships to allow traversal by stream and/or day

o   Create a node for each stream, a node for each entry and treat the
entries as a forward-only linked list using relationships between the
entries (and of course a relationship between the stream and the first
entry)

-  Has the fact that the node id is an int rather than a long
been an issue in any implementations?  Are node IDs reused if deleted? (I
suspect not, but just wanted to confirm.)

-  Any whitepaper/best practices for high availability/load-balanced
scenarios?  We were considering using a message queue to send deltas
around between nodes or something similar.

-  We'll be hosting Neo inside a servlet engine.  Plan was to start
up Neo within the init method of an autoloading servlet.  Any other
recommendations/suggestions?  Best practice for ensuring a clean shutdown?

-  Anyone used any kind of intermediate index or other approach to
bridge multiple Neo instances?

-  Any GUI tools for viewing/navigating the graph structure?  We are
prototyping one in Adobe Flex, curious if there are others.
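For the event-log question above, the third design option (forward-only linked list of entries) can be sketched with a toy in-memory model; in the graph, the `first` field would be a FIRST_ENTRY-style relationship from the stream node and `next` a NEXT-style relationship between entry nodes. All names are illustrative, not Neo API:

```java
import java.util.ArrayList;
import java.util.List;

// Toy in-memory model of the linked-list option (illustrative names only):
// appending keeps entries in arrival order, so chronological traversal is
// a straight walk along the chain with no sorting step.
public class EventStream {
    static final class Entry {
        final long timestamp;
        Entry next;                    // models a NEXT relationship
        Entry(long timestamp) { this.timestamp = timestamp; }
    }

    private Entry first;               // models the FIRST_ENTRY relationship
    private Entry last;

    // Appending at the tail is O(1) and preserves chronological order.
    public void append(long timestamp) {
        Entry e = new Entry(timestamp);
        if (first == null) first = e; else last.next = e;
        last = e;
    }

    // Chronological traversal simply follows the chain.
    public List<Long> chronological() {
        List<Long> out = new ArrayList<>();
        for (Entry e = first; e != null; e = e.next) out.add(e.timestamp);
        return out;
    }
}
```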

 

Comments/observations:

-  I love the fact that you can delete nodes and relationships from
inside an iterator.  I always hated the way I had to separately maintain a
list of things to be deleted when traversing XML DOMs, for example.  Nice
capability!

-  Neo seems FAST!

-  It's a bit of a major mindset change, but once the lightbulb goes
on, the potential seems limitless!

 

Thanks in advance for guidance.

 

Rick
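On the clean-shutdown question above, one hedged sketch (an assumption, not an official recommendation) is to pair the servlet's init()/destroy() with a JVM shutdown hook so that shutdown runs even if the container is killed. NeoHandle here is a stand-in interface, not the real NeoService API:

```java
// Hedged sketch: idempotent shutdown wrapper for an embedded Neo instance.
// NeoHandle is a stand-in for the real service handle, not Neo's API.
public class NeoLifecycle {
    public interface NeoHandle { void shutdown(); }

    private NeoHandle neo;
    private boolean shutDown = false;

    // Call from the autoloading servlet's init() after starting Neo.
    public void init(NeoHandle service) {
        this.neo = service;
        // Belt-and-braces: the container's destroy() may not run on JVM kill.
        Runtime.getRuntime().addShutdownHook(new Thread(this::destroy));
    }

    // Call from the servlet's destroy(); idempotent, so the hook is safe too.
    public synchronized void destroy() {
        if (!shutDown && neo != null) {
            neo.shutdown();   // flush logs and close the stores cleanly
            shutDown = true;
        }
    }
}
```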

___
Neo mailing list
User@lists.neo4j.org
https://lists.neo4j.org/mailman/listinfo/user


[Neo] Troubleshooting performance/memory issues

2009-12-09 Thread Rick Bullotta
Hi, all.

 

When trying to load a few hundred thousand nodes &amp; relationships (chunking
it in groups of 1000 nodes or so), we are getting an out of memory heap
error after 15-20 minutes or so.  No big deal, we expanded the heap settings
for the JVM.  But then we also noticed that the nioneo_logical_log.xxx file
was continuing to grow, even though we were wrapping each 1000 node inserts
in their own transaction (there is no other transaction active) and
committing w/success and finishing each group of 1000. Periodically
(seemingly unrelated to our transaction finishing), that file shrinks again
and the data is flushed to the other neo propertystore and relationshipstore
files.  I just wanted to check if that was normal behavior, or if there is
something wrong with the way we (or Neo) are handling the transactions, and thus
the reason we hit an out-of-memory error.
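The chunking pattern described above can be sketched as follows; the Tx interface is a stand-in for a transaction handle, not Neo's actual API. The point is the shape of the loop: each group of chunkSize inserts gets its own transaction, marked successful and finished before the next begins:

```java
import java.util.List;

// Sketch of chunked inserts: one transaction per group of chunkSize records,
// so transaction state can be released between groups. Tx is a stand-in.
public class ChunkedInsert {
    interface Tx { void success(); void finish(); }

    static Tx beginTx() {
        return new Tx() {
            public void success() { /* mark this chunk for commit */ }
            public void finish()  { /* commit or roll back, release memory */ }
        };
    }

    // Returns the number of transactions used, for illustration.
    static int insertInChunks(List<String> records, int chunkSize) {
        int chunks = 0;
        int i = 0;
        while (i < records.size()) {
            Tx tx = beginTx();
            try {
                int end = Math.min(i + chunkSize, records.size());
                for (; i < end; i++) {
                    // createNode(records.get(i)) would go here
                }
                tx.success();
            } finally {
                tx.finish();
            }
            chunks++;
        }
        return chunks;
    }
}
```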

 

Thanks,

 

Rick

 



Re: [Neo] Troubleshooting performance/memory issues

2009-12-09 Thread Rick Bullotta
FYI, we experimented with different heap size (1GB), along with different
chunk sizes, and were able to eliminate the heap error and get about a 10X
improvement in insert speed.  It would be helpful to better understand the
interactions of the various Neo startup parameters, transaction buffers, and
so on, and their impact on performance.  I read the performance guidelines,
which was some help, but perhaps some additional scenario-based
recommendations might help (frequent updates/frequent access, infrequent
update/frequent access, burst mode update vs steady update rate, etc...).  

Learning more about Neo every hour!




[Neo] I/O load in Neo during traversals

2009-12-09 Thread Rick Bullotta
When doing some large traversal testing (no writes/updates), I noticed that
the neostore.propertystore.db.strings file was seeing a lot of read I/O (as
expected) but also a huge amount of write I/O (almost 5X the read I/O rate).
Out of curiosity, what is the write activity that needs to occur when doing
traversals?

 

 
