Re: [Neo] LuceneIndexBatchInserter doubt
Hi again Mattias,

I have tried to execute my application with the latest version available in the Maven repository and I still have the same problem. After creating and indexing all the nodes, the application calls the optimize method and then creates all the edges, calling getNodes to select the tail and head node of each edge, but it doesn't work because many nodes are not found. I have tried creating only 30 nodes and 15 edges, and that works properly, but if I try to create a big graph (180 million edges + 20 million nodes) it doesn't. I have also tried calling the optimize method every time the application has created 1 million nodes, but that doesn't work either. Have you tried to create as many nodes as I have described with the newer index-util version?

Thank you,
Núria.

2009/12/4 Núria Trench nuriatre...@gmail.com:

Hi Mattias,

Thank you very much for fixing the problem so fast. I will try it as soon as the new changes are available in the Maven repository.

Núria.

2009/12/4 Mattias Persson matt...@neotechnology.com:

I fixed the problem and also added a cache per key for faster getNodes/getSingleNode lookups during the insert process. However, the cache assumes that there's nothing in the index when the process starts (which will almost always be true) to speed things up even further. You can control the cache size, and whether the cache should be used at all, by overriding the following methods in your LuceneIndexBatchInserterImpl instance (this is also documented in the Javadoc):

boolean useCache()
int getMaxCacheSizePerKey()

The new changes should be available in the Maven repository within an hour.

2009/12/4 Mattias Persson matt...@neotechnology.com:

I think I found the problem... it's indexing as it should, but it isn't reflected in getNodes/getSingleNode properly until you flush/optimize/shutdown the index. I'll try to fix it today!

2009/12/3 Núria Trench nuriatre...@gmail.com:

Thank you very much for your response. If you need more information, you only have to send an e-mail and I will try to explain it better.

Núria.

2009/12/3 Mattias Persson matt...@neotechnology.com:

This is something I'd like to reproduce, and I'll do some testing on it tomorrow.

2009/12/3 Núria Trench nuriatre...@gmail.com:

Hello,

Last week, I decided to download your graph database core in order to use it. First, I created a new project to parse my CSV files and create a new graph database with Neo4j. These CSV files contain 150 million edges and 20 million nodes. When I finished writing the code that creates the graph database, I executed it and, after six hours of execution, the program crashed because of a Lucene exception. The exception is related to index merging and has the following message:

mergeFields produced an invalid result: docCount is 385282378 but fdx file size is 3082259028; now aborting this merge to prevent index corruption

I searched on the net and found that it is a Lucene bug. The libraries used for executing my project were:

neo-1.0-b10
index-util-0.7
lucene-core-2.4.0

So I decided to use a newer Lucene version. I found that you have a newer index-util version, so I updated the libraries:

neo-1.0-b10
index-util-0.9
lucene-core-2.9.1

When I had updated those libraries, I tried to execute my project again and found that, on many occasions, it was not indexing properly. So I tried to optimize the index after every insertion. That solved the indexing problem, but the execution time increased a lot.
I am not using transactions; instead, I am using the Batch Inserter together with the LuceneIndexBatchInserter. So, my question is: what can I do to solve this problem? With index-util-0.7 I cannot finish creating the graph database, and with index-util-0.9 I have to optimize the index on every insertion and the execution never ends.

Thank you very much in advance,
Núria.
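[Editor's note: for readers following the thread, here is a minimal, hypothetical sketch of the pattern being discussed — a batch inserter plus a LuceneIndexBatchInserterImpl, node creation and indexing, an optimize() call, then index lookups while creating the relationships, including the useCache()/getMaxCacheSizePerKey() overrides Mattias mentions above. The method names getNodes/getSingleNode, optimize, useCache and getMaxCacheSizePerKey come from the messages above; the package names, constructor signatures and return conventions are assumptions for the neo-1.0-b10 / index-util-0.9 era and may differ in your version.]

// Hypothetical sketch only -- package names and exact signatures are
// assumptions for the neo-1.0-b10 / index-util-0.9 libraries and may differ.
import java.util.Collections;

import org.neo4j.api.core.RelationshipType;               // assumed package
import org.neo4j.impl.batchinsert.BatchInserterImpl;       // assumed package
import org.neo4j.util.index.LuceneIndexBatchInserterImpl;  // package taken from the stack trace below

public class CsvImportSketch
{
    // Relationship type for the edges; any enum implementing RelationshipType will do.
    enum EdgeType implements RelationshipType { CONNECTED }

    public static void main( String[] args ) throws Exception
    {
        BatchInserterImpl inserter = new BatchInserterImpl( "path/to/graph-db" );

        // Override the cache hooks Mattias describes; the values here are arbitrary.
        LuceneIndexBatchInserterImpl index = new LuceneIndexBatchInserterImpl( inserter )
        {
            @Override
            public boolean useCache()
            {
                return true;
            }

            @Override
            public int getMaxCacheSizePerKey()
            {
                return 1000000;
            }
        };

        // Phase 1: create and index the nodes (one per CSV row, say).
        for ( long csvId = 0; csvId < 1000; csvId++ )
        {
            long node = inserter.createNode(
                    Collections.<String, Object>singletonMap( "csvId", csvId ) );
            index.index( node, "csvId", csvId );
        }

        // Make the index searchable before the lookups below
        // (the behaviour this thread is about).
        index.optimize();

        // Phase 2: create the edges, looking up tail and head through the index.
        for ( long csvId = 0; csvId < 999; csvId++ )
        {
            long tail = index.getSingleNode( "csvId", csvId );
            long head = index.getSingleNode( "csvId", csvId + 1 );
            if ( tail != -1 && head != -1 )   // -1 assumed to mean "not found"
            {
                inserter.createRelationship( tail, head, EdgeType.CONNECTED, null );
            }
        }

        index.shutdown();
        inserter.shutdown();
    }
}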
Re: [Neo] LuceneIndexBatchInserter doubt
Hi Mattias, Núria.

I am also running into scalability problems with the Lucene batch inserter at much smaller numbers (30,000 indexed nodes). I tried calling optimize more often. Increasing ulimit didn't help.

[INFO] Exception in thread "main" java.lang.RuntimeException: java.io.FileNotFoundException: /Users/todd/Code/neo4Jprototype/target/classes/data/graph/lucene/name/_0.cfx (Too many open files)
[INFO]   at org.neo4j.util.index.LuceneIndexBatchInserterImpl.getNodes(LuceneIndexBatchInserterImpl.java:186)
[INFO]   at org.neo4j.util.index.LuceneIndexBatchInserterImpl.getSingleNode(LuceneIndexBatchInserterImpl.java:238)
[INFO]   at com.collectiveintelligence.QueryNeo.loadDataToGraph(QueryNeo.java:277)
[INFO]   at com.collectiveintelligence.QueryNeo.main(QueryNeo.java:57)
[INFO] Caused by: java.io.FileNotFoundException: /Users/todd/Code/neo4Jprototype/target/classes/data/graph/lucene/name/_0.cfx (Too many open files)

I tried breaking it up into separate BatchInserter instances, and now it hangs. Can I create more than one batch inserter per process if they run sequentially and non-threaded?

Thanks,
Todd
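[Editor's note: below is a minimal, hypothetical sketch of the sequential, single-threaded pattern Todd asks about — two batch-inserter phases in one process, each shut down before the next is created. The thread does not establish whether this avoids the hang he reports; class names follow the ones used above and the package locations are again assumptions. Note also that the per-key cache Mattias describes assumes the index is empty when the process starts, which would not hold for the second phase here.]

// Hypothetical sketch of two sequential batch-inserter phases in one process.
// Package names are assumptions for this era of the libraries.
import org.neo4j.impl.batchinsert.BatchInserterImpl;       // assumed package
import org.neo4j.util.index.LuceneIndexBatchInserterImpl;

public class SequentialPhasesSketch
{
    public static void main( String[] args ) throws Exception
    {
        // Phase 1: insert and index one batch of data, then shut everything down.
        BatchInserterImpl inserter = new BatchInserterImpl( "path/to/graph-db" );
        LuceneIndexBatchInserterImpl index = new LuceneIndexBatchInserterImpl( inserter );
        // ... create and index the first chunk of nodes here ...
        index.optimize();
        index.shutdown();
        inserter.shutdown();

        // Phase 2: open a fresh inserter over the same store and continue,
        // still sequential and single-threaded.
        inserter = new BatchInserterImpl( "path/to/graph-db" );
        index = new LuceneIndexBatchInserterImpl( inserter );
        // ... create the relationships, looking nodes up through the index ...
        index.shutdown();
        inserter.shutdown();
    }
}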
Re: [Neo] LuceneIndexBatchInserter doubt
Todd, are you sure you have the latest index-util 0.9-SNAPSHOT? This is a bug that we fixed yesterday... (assuming it's the same bug).