Re: [Neo] LuceneIndexBatchInserter doubt

2009-12-26 Thread Peter Neubauer
Hi Núria,
the current ID scheme uses integers as IDs for nodes, relationships,
and properties, which limits the possible store size to 4 billion
nodes, 4 billion relationships, and 4 billion properties. Of course one
could switch to longs as IDs, but that would increase the number of
reserved bytes and could incur performance penalties. However, this is
the current limit; beyond it you have to start thinking about sharding
along a suitable domain-specific criterion.
What size and domain are you imagining?

However, when dealing with bigger node spaces you will probably want to
increase the RAM of your server machine and consider SSDs in order to
keep the often-used parts of your graph cached and minimize I/O cost.
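The 4-billion ceiling mentioned above is just the size of the 32-bit integer ID space; a quick sanity check of the arithmetic (plain Java, no Neo4j dependency):

```java
public class IdSpace {
    public static void main(String[] args) {
        // A 32-bit ID can take 2^32 distinct values: one per record.
        long intIds = 1L << 32;
        System.out.println(intIds); // 4294967296, i.e. ~4.3 billion

        // Switching to 64-bit long IDs would lift the ceiling enormously,
        // at the cost of extra reserved bytes per record, as noted above.
        System.out.println(Long.MAX_VALUE); // 9223372036854775807
    }
}
```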

HTH

Cheers,

/peter neubauer

COO and Sales, Neo Technology

GTalk:  neubauer.peter
Skype   peter.neubauer
Phone   +46 704 106975
LinkedIn   http://www.linkedin.com/in/neubauer
Twitter  http://twitter.com/peterneubauer

http://www.neo4j.org - Relationships count.
http://gremlin.tinkerpop.com - PageRank in 2 lines of code.
http://www.linkedprocess.org - Computing at LinkedData scale.



On Sat, Dec 26, 2009 at 4:10 PM, Núria Trench nuriatre...@gmail.com wrote:
 Hi,

 I have just finished parsing and creating the database with the latest
 index-util-0.9-SNAPSHOT available in your repository. It finished
 successfully, so I must thank you for your interest and useful help.
 And, finally, I have one last question. I have created 180 million edges
 and 20 million nodes. Is it possible to create a larger number of edges
 and nodes with Neo4j? Is there a limit?

 Thank you very much again.

 

Re: [Neo] LuceneIndexBatchInserter doubt

2009-12-21 Thread Núria Trench
Hi again Mattias,

I'm still trying to parse all the data in order to create the graph. I will
report the results as soon as possible.
Thank you very much for your interest.

Núria.

2009/12/21 Mattias Persson matt...@neotechnology.com

 Hi again,

 any luck with this yet?

 2009/12/11 Núria Trench nuriatre...@gmail.com:
  Thank you very much Mattias. I will test it as soon as possible and I'll
  tell you something.
 
  Núria.
 

___
Neo mailing list
User@lists.neo4j.org
https://lists.neo4j.org/mailman/listinfo/user


Re: [Neo] LuceneIndexBatchInserter doubt

2009-12-11 Thread Mattias Persson
I've tried this a couple of times now and first of all I see some
problems in your code:

1) In the method createRelationsTitleImage you have an inverted head
!= -1 check where it should be head == -1

2) You index relationships in createRelationsBetweenTitles method,
this isn't ok since the index can only manage nodes.

And I recently committed a fix which removed the caching layer in
the LuceneIndexBatchInserterImpl (and therefore also
LuceneFulltextIndexBatchInserter). This probably fixes your problems.
I'm also working on a performance fix which makes consecutive getNodes
calls faster.

So I think that with these fixes (1) and (2) and the latest index-util
0.9-SNAPSHOT your sample will run fine. Also you could try without
calling optimize. See more information at
http://wiki.neo4j.org/content/Indexing_with_BatchInserter
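For readers who land on this thread: the workflow described above — index every node during batch insertion, call optimize, then resolve endpoints with getSingleNode while creating relationships — looks roughly like the sketch below. Only LuceneIndexBatchInserterImpl, getSingleNode, getNodes and optimize are named in this thread; the remaining class names, constructors and signatures are assumptions about the 2009-era index-util 0.9 API and may not match your version.

```java
// Sketch only -- verify names against your index-util version.
BatchInserter inserter = new BatchInserterImpl("path/to/graphdb");
LuceneIndexBatchInserter index = new LuceneIndexBatchInserterImpl(inserter);

// Phase 1: create and index all nodes.
Map<String, Object> props = new HashMap<String, Object>();
props.put("title", "SomeTitle");
long nodeId = inserter.createNode(props);
index.index(nodeId, "title", "SomeTitle");

// Make the indexed entries visible before looking anything up.
index.optimize();

// Phase 2: resolve endpoints by indexed property; -1 means "not found".
long head = index.getSingleNode("title", "SomeTitle");
if (head == -1) {
    throw new IllegalStateException("head node not found in index");
}

index.shutdown();
inserter.shutdown();
```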




-- 
Mattias Persson, [matt...@neotechnology.com]
Neo Technology, www.neotechnology.com


Re: [Neo] LuceneIndexBatchInserter doubt

2009-12-10 Thread Mattias Persson
To continue this thread in the user list:

Thanks Núria, I've gotten your sample code/files and I'm running it
now to try to reproduce your problem.

2009/12/9 Núria Trench nuriatre...@gmail.com:
 I have finished uploading the 4 csv files. You'll see an e-mail with the
 other 3 csv files packed in a rar file.
 Thanks,

 Núria.

 2009/12/9 Núria Trench nuriatre...@gmail.com

 Yes, you are right. But there is one csv file that is too big to be packed
 with other files and I am reducing it.
 I am sending the other files now.

 2009/12/9 Mattias Persson matt...@neotechnology.com

 By the way, you might consider packing those files (with zip or tar.gz
 or something) cause they will shrink quite well

 2009/12/9 Mattias Persson matt...@neotechnology.com:
  Great, but I only got the images.csv file... I'm starting to test with
  that at least
 
  2009/12/9 Núria Trench nuriatre...@gmail.com:
  Hi again,
 
   The errors show up after parsing 2 csv files to create all the nodes,
   just at the moment of calling the method getSingleNode to look up the
   tail and head node for creating all the edges by reading the other two
   csv files.
 
   I am sending you, via Sprend, the four csv files that will help you
   trigger the index behaviour.
 
  Thank you,
 
  Núria.
 
  2009/12/9 Mattias Persson matt...@neotechnology.com
 
   Hmm, I've no idea... but do the errors show up early in the process,
   or do you have to insert a LOT of data to trigger it? In that case
   you could send me a part of the files... maybe using http://www.sprend.se ,
   WDYT?
 
  2009/12/9 Núria Trench nuriatre...@gmail.com:
   Hi Mattias,
  
    The data isn't confidential but the files are very big (5.5 GB).
    How can I send you this data?
  
   2009/12/9 Mattias Persson matt...@neotechnology.com
  
    Yep, I got the java code, thanks. If the data is confidential or
    sensitive you can just send me the formatting; otherwise consider
    sending the files as well (or a subset if they are big).
  
   2009/12/9 Núria Trench nuriatre...@gmail.com:
   
   
  
  
  






-- 
Mattias Persson, [matt...@neotechnology.com]
Neo Technology, www.neotechnology.com


Re: [Neo] LuceneIndexBatchInserter doubt

2009-12-09 Thread Núria Trench
Hi Todd,

The sample code creates nodes and relationships by parsing 4 csv files.
Thank you for trying to trigger this behaviour with this sample.

Núria

2009/12/9 Mattias Persson matt...@neotechnology.com

 Could you provide me with some sample code which can trigger this
 behaviour with the latest index-util-0.9-SNAPSHOT, Núria?

 2009/12/9 Núria Trench nuriatre...@gmail.com:
  Todd,
 
  I don't have the same problem. In my case, after indexing all the
  attributes/properties of each node, the application creates all the edges
  by looking up the tail node and the head node. To do so, it calls
  org.neo4j.util.index.LuceneIndexBatchInserterImpl.getSingleNode, which
  returns -1 (node not found) on many occasions.

  Does anyone have an alternative way to look up a node by its indexed
  attributes/properties?
 
  Thank you,
 
  Núria.
 
 
  2009/12/7 Mattias Persson matt...@neotechnology.com
 
  Todd, are you sure you have the latest index-util 0.9-SNAPSHOT? This
  is a bug that we fixed yesterday... (assuming it's the same bug).
 
  2009/12/7 Todd Stavish toddstav...@gmail.com:
   Hi Mattias, Núria.
  
   I am also running into scalability problems with the Lucene batch
   inserter at much smaller numbers, 30,000 indexed nodes. I tried
   calling optimize more. Increasing ulimit didn't help.
  
    [INFO] Exception in thread main java.lang.RuntimeException:
    java.io.FileNotFoundException:
    /Users/todd/Code/neo4Jprototype/target/classes/data/graph/lucene/name/_0.cfx
    (Too many open files)
    [INFO]  at org.neo4j.util.index.LuceneIndexBatchInserterImpl.getNodes(LuceneIndexBatchInserterImpl.java:186)
    [INFO]  at org.neo4j.util.index.LuceneIndexBatchInserterImpl.getSingleNode(LuceneIndexBatchInserterImpl.java:238)
    [INFO]  at com.collectiveintelligence.QueryNeo.loadDataToGraph(QueryNeo.java:277)
    [INFO]  at com.collectiveintelligence.QueryNeo.main(QueryNeo.java:57)
    [INFO] Caused by: java.io.FileNotFoundException:
    /Users/todd/Code/neo4Jprototype/target/classes/data/graph/lucene/name/_0.cfx
    (Too many open files)
  
    I tried breaking it up into separate BatchInserter instances, and it
    hangs now. Can I create more than one batch inserter per process if they
    run sequentially and single-threaded?
  
   Thanks,
   Todd
  
  
  
  
  
   On Mon, Dec 7, 2009 at 7:28 AM, Núria Trench nuriatre...@gmail.com
  wrote:
    Hi again Mattias,

    I have tried to execute my application with the latest version available
    in the maven repository and I still have the same problem. After creating
    and indexing all the nodes, the application calls the optimize method
    and, then, it creates all the edges by calling the method getNodes in
    order to select the tail and head node of each edge, but it doesn't work
    because many nodes are not found.

    I have tried to create only 30 nodes and 15 edges and it works properly,
    but if I try to create a big graph (180 million edges + 20 million nodes)
    it doesn't.

    I have also tried to call the optimize method every time the application
    has created 1 million nodes, but it doesn't work.

    Have you tried to create as many nodes as I have said with the newer
    index-util version?

    Thank you,

    Núria.
  
   2009/12/4 Núria Trench nuriatre...@gmail.com
  
   Hi Mattias,
  
    Thank you very much for fixing the problem so fast. I will try it as
    soon as the new changes are available in the maven repository.
  
   Núria.
  
  
   2009/12/4 Mattias Persson matt...@neotechnology.com
  
   I fixed the problem and also added a cache per key for faster
   getNodes/getSingleNode lookup during the insert process. However
 the
   cache assumes that there's nothing in the index when the process
   starts (which almost always will be true) to speed things up even
   further.
  
    You can control the cache size, and whether the cache is used at all, by
    overriding the following methods in your LuceneIndexBatchInserterImpl
    instance (they are also documented in the Javadoc):

    boolean useCache()
    int getMaxCacheSizePerKey()

    The new changes should be available in the maven repository within an
    hour.
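Overriding those two methods presumably means subclassing (or anonymously extending) the inserter; a hedged sketch, where only useCache() and getMaxCacheSizePerKey() come from this thread and everything else, including the methods' visibility and the cache-size value, is an assumption:

```java
// Sketch only -- signatures are assumptions beyond the two method names above.
LuceneIndexBatchInserter index = new LuceneIndexBatchInserterImpl(inserter) {
    @Override
    public boolean useCache() {
        // Return false to disable the cache, e.g. if the index
        // already contains data when the process starts.
        return true;
    }

    @Override
    public int getMaxCacheSizePerKey() {
        return 1000000; // assumed tuning value, not from the thread
    }
};
```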
  
   2009/12/4 Mattias Persson matt...@neotechnology.com:
I think I found the problem... it's indexing as it should, but it
isn't reflected in getNodes/getSingleNode properly until you
flush/optimize/shutdown the index. I'll try to fix it today!
   
2009/12/3 Núria Trench nuriatre...@gmail.com:
Thank you very much for your response.
If you need more information, just send an e-mail and I will try to
explain it better.
   
Núria.
   
2009/12/3 Mattias Persson matt...@neotechnology.com
   
This is something I'd like to reproduce, and I'll do some testing on
this tomorrow.
   
2009/12/3 Núria Trench nuriatre...@gmail.com:
 Hello,

 Last week, I decided to download your graph database core in order to
 use it. First, I created a new project to 

Re: [Neo] LuceneIndexBatchInserter doubt

2009-12-09 Thread Mattias Persson
Hi again, Núria (it was I, Mattias, who asked for the sample code).
Well... the fact that you parse 4 csv files doesn't really help me
set up a test for this... I mean, how can I know that my test will be
similar to yours? Would it be ok to attach your code/csv files as
well?

/ Mattias


Re: [Neo] LuceneIndexBatchInserter doubt

2009-12-09 Thread Núria Trench
Hi Mattias,

In my last e-mail I have attached the sample code, haven't you received it?
I will try to attach it again.

Núria.


Re: [Neo] LuceneIndexBatchInserter doubt

2009-12-09 Thread Mattias Persson
Oh ok, it could be our attachments filter / security or something...
Could you try to mail them to me directly at matt...@neotechnology.com
?

   
Núria.
   
   
2009/12/4 Mattias Persson matt...@neotechnology.com
   
I fixed the problem and also added a cache per key for faster
getNodes/getSingleNode lookup during the insert process. However
  the
cache assumes that there's nothing in the index when the process
starts (which almost always will be true) to speed things up
 even
further.
   
You can control the cache size and if it should be used by
  overriding
the (this is also documented in the Javadoc):
   
boolean useCache()
int getMaxCacheSizePerKey()
   
methods in your LuceneIndexBatchInserterImpl instance. The new
  changes
should be available 

Re: [Neo] LuceneIndexBatchInserter doubt

2009-12-09 Thread Núria Trench
Hi Mattias,

I already did it 10 minutes ago. If you need an example showing the
format of the 4 csv files, I can send it to you.
Thanks again,

Núria.

2009/12/9 Mattias Persson matt...@neotechnology.com

 Oh ok, it could be our attachment filter / security or something...
 could you try to mail them to me directly at matt...@neotechnology.com
 ?


Re: [Neo] LuceneIndexBatchInserter doubt

2009-12-07 Thread Núria Trench
Hi again Mattias,

I have tried to execute my application with the latest version available in
the maven repository and I still have the same problem. After creating and
indexing all the nodes, the application calls the optimize method and then
creates all the edges by calling the getNodes method to select the tail and
head node of each edge, but it doesn't work because many nodes are not found.

I have tried to create only 30 nodes and 15 edges and that works properly, but
if I try to create a big graph (180 million edges + 20 million nodes) it
doesn't.

I have also tried to call the optimize method every time the application
has created 1 million nodes, but it doesn't work.

Have you tried to create as many nodes as I have said with the newer
index-util version?

Thank you,

Núria.


Re: [Neo] LuceneIndexBatchInserter doubt

2009-12-07 Thread Todd Stavish
Hi Mattias, Núria.

I am also running into scalability problems with the Lucene batch
inserter at much smaller numbers, 30,000 indexed nodes. I tried
calling optimize more. Increasing ulimit didn't help.

[INFO] Exception in thread "main" java.lang.RuntimeException:
java.io.FileNotFoundException:
/Users/todd/Code/neo4Jprototype/target/classes/data/graph/lucene/name/_0.cfx
(Too many open files)
[INFO]  at org.neo4j.util.index.LuceneIndexBatchInserterImpl.getNodes(LuceneIndexBatchInserterImpl.java:186)
[INFO]  at org.neo4j.util.index.LuceneIndexBatchInserterImpl.getSingleNode(LuceneIndexBatchInserterImpl.java:238)
[INFO]  at com.collectiveintelligence.QueryNeo.loadDataToGraph(QueryNeo.java:277)
[INFO]  at com.collectiveintelligence.QueryNeo.main(QueryNeo.java:57)
[INFO] Caused by: java.io.FileNotFoundException:
/Users/todd/Code/neo4Jprototype/target/classes/data/graph/lucene/name/_0.cfx
(Too many open files)

I tried breaking it up into separate BatchInserter instances, and now it
hangs. Can I create more than one batch inserter per process if they run
sequentially and non-threaded?

Thanks,
Todd
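
The usual way to avoid leaking file handles is to close each inserter completely before opening the next one on the same store. A sketch of that sequential pattern, using the class and method names mentioned in this thread (BatchInserterImpl, LuceneIndexBatchInserterImpl); treat it as pseudocode for the 2009-era API, since exact constructors and signatures may differ:

```java
// Sketch only -- names follow this thread's 2009-era API, not a current one.
BatchInserter inserter = new BatchInserterImpl("target/graphdb");
LuceneIndexBatchInserter index = new LuceneIndexBatchInserterImpl(inserter);

// ... create nodes and index them, e.g. index.index(nodeId, "name", value) ...

index.optimize();    // make indexed entries visible to getNodes/getSingleNode
index.shutdown();    // closes Lucene's open segment files
inserter.shutdown(); // flushes and closes the store files

// Only after both shutdowns is it safe to open a second inserter
// on the same store directory within the same process.
```

If file handles are still exhausted after proper shutdowns, raising the per-process limit (ulimit -n) or calling optimize() periodically to merge Lucene segments are the usual mitigations.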






Re: [Neo] LuceneIndexBatchInserter doubt

2009-12-07 Thread Mattias Persson
Todd, are you sure you have the latest index-util 0.9-SNAPSHOT? This
is a bug that we fixed yesterday... (assuming it's the same bug).


Re: [Neo] LuceneIndexBatchInserter doubt

2009-12-04 Thread Mattias Persson
I think I found the problem... it's indexing as it should, but it
isn't reflected in getNodes/getSingleNode properly until you
flush/optimize/shutdown the index. I'll try to fix it today!
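
Until that fix landed, the practical workaround was to do all writes first, call optimize() once, and only then perform lookups — rather than optimizing after every insertion. A sketch of that two-phase load, again using this thread's method names as pseudocode for the old API (CsvRow and propertiesOf are hypothetical helpers):

```java
// Phase 1: insert and index all nodes. No lookups happen here.
for (CsvRow row : nodeRows) {
    long id = inserter.createNode(propertiesOf(row)); // propertiesOf: hypothetical
    index.index(id, "name", row.name);
}

// One optimize makes everything indexed so far visible to queries.
index.optimize();

// Phase 2: now getSingleNode can resolve edge endpoints.
for (CsvRow row : edgeRows) {
    long tail = index.getSingleNode("name", row.tailName);
    long head = index.getSingleNode("name", row.headName);
    if (tail != -1 && head != -1) { // -1 means not found, per this thread
        inserter.createRelationship(tail, head, relType, null);
    }
}
```

The point is to amortize the cost of optimize() over the whole node phase instead of paying it per insertion.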


Re: [Neo] LuceneIndexBatchInserter doubt

2009-12-04 Thread Núria Trench
Hi Mattias,

Thank you very much for fixing the problem so fast. I will try it as soon as
the new changes are available in the maven repository.

Núria.

2009/12/4 Mattias Persson matt...@neotechnology.com

 I fixed the problem and also added a cache per key for faster
 getNodes/getSingleNode lookup during the insert process. However the
 cache assumes that there's nothing in the index when the process
 starts (which almost always will be true) to speed things up even
 further.

 You can control the cache size, and whether the cache is used at all, by
 overriding the following methods in your LuceneIndexBatchInserterImpl
 instance (this is also documented in the Javadoc):

 boolean useCache()
 int getMaxCacheSizePerKey()

 The new changes should be available in the maven repository within an hour.
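
 Overriding those two hooks might look like the sketch below. This is an
 assumption about the 2009-era API (the method modifiers and the default
 return values are not stated in the thread), so treat it as pseudocode:

 ```java
 // Sketch: tuning the per-key lookup cache via the documented hooks.
 LuceneIndexBatchInserter index = new LuceneIndexBatchInserterImpl(inserter) {
     @Override
     protected boolean useCache() {
         return true; // only valid if the index is empty when the process starts
     }

     @Override
     protected int getMaxCacheSizePerKey() {
         return 1000000; // roughly one entry per node indexed under a key; assumed value
     }
 };
 ```

 Sizing the cache near the number of nodes indexed under a key keeps
 getNodes/getSingleNode lookups in memory during the insert, at the cost of
 heap.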


___
Neo mailing list
User@lists.neo4j.org
https://lists.neo4j.org/mailman/listinfo/user


Re: [Neo] LuceneIndexBatchInserter doubt

2009-12-03 Thread Núria Trench
Thank you very much for your response.
If you need more information, just send an e-mail and I will try
to explain it better.

Núria.

2009/12/3 Mattias Persson matt...@neotechnology.com

 This is something I'd like to reproduce and I'll do some testing on
 this tomorrow

 2009/12/3 Núria Trench nuriatre...@gmail.com:
  Hello,

  Last week, I decided to download your graph database core in order to use
  it. First, I created a new project to parse my CSV files and create a new
  graph database with Neo4j. These CSV files contain 150 million edges and 20
  million nodes.

  When I finished writing the code that creates the graph database, I
  executed it and, after six hours of execution, the program crashed because
  of a Lucene exception. The exception is related to index merging and
  has the following message:
  mergeFields produced an invalid result: docCount is 385282378 but fdx file
  size is 3082259028; now aborting this merge to prevent index corruption

  I have searched on the net and I found that it is a Lucene bug. The
  libraries used for executing my project were:
  neo-1.0-b10
  index-util-0.7
  lucene-core-2.4.0

  So, I decided to use a newer Lucene version. I found that you have a newer
  index-util version, so I updated the libraries:
  neo-1.0-b10
  index-util-0.9
  lucene-core-2.9.1

  When I had updated those libraries, I tried to execute my project again and
  I found that, on many occasions, it was not indexing properly. So, I tried
  to optimize the index every time I indexed something. That made the indexing
  work properly, but the execution time increased a lot.

  I am not using transactions; instead, I am using the Batch Inserter
  with the LuceneIndexBatchInserter.

  So, my question is: what can I do to solve this problem? If I use
  index-util-0.7 I cannot finish creating the graph database, and if I use
  index-util-0.9 I have to optimize the index on every insertion and the
  execution never ends.

  Thank you very much in advance,

  Núria.
 



 --
 Mattias Persson, [matt...@neotechnology.com]
 Neo Technology, www.neotechnology.com

___
Neo mailing list
User@lists.neo4j.org
https://lists.neo4j.org/mailman/listinfo/user