Re: [Neo] LuceneIndexBatchInserter doubt

2009-12-31 Thread Núria Trench
Hi Peter,

The limits that you have specified are enough for me.
Thank you again.

Núria.

2009/12/26 Peter Neubauer 

> Hi Núria,
> the current ID scheme of using Integers for IDs for Nodes,
> Relationships and Properties limits the possible node space size to 4
> billion nodes, 4 billion relationships and 4 billion properties. Of
> course one could switch to Long as IDs, but that would increase the
> reserved amount of bytes and cause possible performance penalties.
> However, this is the current limit; after that you have to start
> thinking about sharding along a suitable domain-specific criterion.
> What size and domain are you imagining?
>
> However, when dealing with bigger nodespaces you probably want to
> increase the RAM of your server machine and think about SSDs in order
> to keep the often-used parts of your graph cached and minimize IO cost.
>
> HTH
>
> Cheers,
>
> /peter neubauer
>
> COO and Sales, Neo Technology
>
> GTalk:  neubauer.peter
> Skype   peter.neubauer
> Phone   +46 704 106975
> LinkedIn   http://www.linkedin.com/in/neubauer
> Twitter  http://twitter.com/peterneubauer
>
> http://www.neo4j.org - Relationships count.
> http://gremlin.tinkerpop.com - PageRank in 2 lines of code.
> http://www.linkedprocess.org - Computing at LinkedData scale.
>
>
>
> On Sat, Dec 26, 2009 at 4:10 PM, Núria Trench 
> wrote:
> > Hi,
> >
> > I have just finished parsing and creating the database with the latest
> > index-util-0.9-SNAPSHOT available in your repository. It finished
> > successfully, so I must thank you for your interest and useful help.
> > And finally, I have one last question. I have created 180 million
> > edges and 20 million nodes. Is it possible to create a larger number
> > of edges and nodes with Neo4j? Do you have a limit?
> >
> > Thank you very much again.
> >
> > 2009/12/21 Núria Trench 
> >
> >> Hi again Mattias,
> >>
> >> I'm still trying to parse all the data in order to create the graph. I
> will
> >> report the results as soon as possible.
> >> Thank you very much for your interest.
> >>
> >> Núria.
> >>
> >> 2009/12/21 Mattias Persson 
> >>
> >> Hi again,
> >>>
> >>> any luck with this yet?
> >>>
> >>> 2009/12/11 Núria Trench :
> >>> > Thank you very much Mattias. I will test it as soon as possible and
> >>> > I'll let you know.
> >>> >
> >>> > Núria.
> >>> >
> >>> > 2009/12/11 Mattias Persson 
> >>> >
> >>> >> I've tried this a couple of times now and first of all I see some
> >>> >> problems in your code:
> >>> >>
> >>> >> 1) In the method createRelationsTitleImage you have an inverted
> >>> >> "head != -1" check where it should be "head == -1"
> >>> >>
> >>> >> 2) You index relationships in the createRelationsBetweenTitles method;
> >>> >> this isn't OK since the index can only manage nodes.
> >>> >>
> >>> >> And I recently committed a "fix" which removed the caching layer in
> >>> >> the LuceneIndexBatchInserterImpl (and therefore also
> >>> >> LuceneFulltextIndexBatchInserter). This probably fixes your
> problems.
> >>> >> I'm also working on a performance fix which makes consecutive
> getNodes
> >>> >> calls faster.
> >>> >>
> >>> >> So I think that with these fixes (1) and (2) and the latest
> index-util
> >>> >> 0.9-SNAPSHOT your sample will run fine. Also you could try without
> >>> >> calling optimize. See more information at
> >>> >> http://wiki.neo4j.org/content/Indexing_with_BatchInserter
> >>> >>
> >>> >> 2009/12/10 Mattias Persson :
> >>> >> > To continue this thread in the user list:
> >>> >> >
> >>> >> > Thanks Núria, I've gotten your sample code/files and I'm running it
> >>> >> > now to try to reproduce your problem.
> >>> >> >
> >>> >> > 2009/12/9 Núria Trench :
> >>> >> >> I have finished uploading the 4 csv files. You'll see an e-mail
> with
> >>> the
> >>> >> >> other 3 csv files packed in a rar file.
> >>> >> >> Thanks,
> >>> >> >>
> >>> >> >> Núria.
> >>> >> >>
> >>> >> >> 2009/12/9 Núria Trench 
> >>> >> >>>
> >>> >> >>> Yes, you are right. But there is one csv file that is too big to
> be
> >>> >> packed
> >>> >> >>> with other files and I am reducing it.
> >>> >> >>> I am sending the other files now.
> >>> >> >>>
> >>> >> >>> 2009/12/9 Mattias Persson 
> >>> >> 
> >>> >>  By the way, you might consider packing those files (with zip or
> >>> tar.gz
> >>> >>  or something) cause they will shrink quite well
> >>> >> 
> >>> >>  2009/12/9 Mattias Persson :
> >>> >>  > Great, but I only got the images.csv file... I'm starting to
> >>> test
> >>> >> with
> >>> >>  > that at least
> >>> >>  >
> >>> >>  > 2009/12/9 Núria Trench :
> >>> >>  >> Hi again,
> >>> >>  >>
> >>> >>  >> The errors show up after parsing 2 csv files to create all the
> >>> >>  >> nodes, just at the moment of calling the method "getSingleNode" for
> >>> >>  >> looking up the tail and head node for creating all the edges by
> >>> >>  >> reading the other two csv files.

Re: [Neo] LuceneIndexBatchInserter doubt

2009-12-26 Thread Peter Neubauer
Hi Núria,
the current ID scheme of using Integers for IDs for Nodes,
Relationships and Properties limits the possible node space size to 4
billion nodes, 4 billion relationships and 4 billion properties. Of
course one could switch to Long as IDs, but that would increase the
reserved amount of bytes and cause possible performance penalties.
However, this is the current limit; after that you have to start
thinking about sharding along a suitable domain-specific criterion.
What size and domain are you imagining?

However, when dealing with bigger nodespaces you probably want to
increase the RAM of your server machine and think about SSDs in order
to keep the often-used parts of your graph cached and minimize IO cost.
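
As a back-of-the-envelope sketch of the arithmetic behind those numbers
(assuming plain unsigned 32-bit IDs; exact record layouts are
version-specific):

public class IdSpaceSketch
{
    public static void main( String[] args )
    {
        // One ID space per store (nodes, relationships, properties),
        // each capped at 2^32 records while IDs are 32-bit integers.
        long intIdSpace = 1L << 32;          // 4,294,967,296
        // Switching to long IDs would lift the cap to 2^63 - 1 at the
        // cost of extra reserved bytes per record on disk.
        long longIdSpace = Long.MAX_VALUE;
        System.out.println( "int ID space:  " + intIdSpace );
        System.out.println( "long ID space: " + longIdSpace );
    }
}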

HTH

Cheers,

/peter neubauer

COO and Sales, Neo Technology

GTalk:  neubauer.peter
Skype   peter.neubauer
Phone   +46 704 106975
LinkedIn   http://www.linkedin.com/in/neubauer
Twitter  http://twitter.com/peterneubauer

http://www.neo4j.org - Relationships count.
http://gremlin.tinkerpop.com - PageRank in 2 lines of code.
http://www.linkedprocess.org - Computing at LinkedData scale.



On Sat, Dec 26, 2009 at 4:10 PM, Núria Trench  wrote:
> Hi,
>
> I have just finished parsing and creating the database with the latest
> index-util-0.9-SNAPSHOT available in your repository. It finished
> successfully, so I must thank you for your interest and useful help.
> And finally, I have one last question. I have created 180 million
> edges and 20 million nodes. Is it possible to create a larger number
> of edges and nodes with Neo4j? Do you have a limit?
>
> Thank you very much again.
>
> 2009/12/21 Núria Trench 
>
>> Hi again Mattias,
>>
>> I'm still trying to parse all the data in order to create the graph. I will
>> report the results as soon as possible.
>> Thank you very much for your interest.
>>
>> Núria.
>>
>> 2009/12/21 Mattias Persson 
>>
>> Hi again,
>>>
>>> any luck with this yet?
>>>
>>> 2009/12/11 Núria Trench :
>>> > Thank you very much Mattias. I will test it as soon as possible and I'll
>>> > let you know.
>>> >
>>> > Núria.
>>> >
>>> > 2009/12/11 Mattias Persson 
>>> >
>>> >> I've tried this a couple of times now and first of all I see some
>>> >> problems in your code:
>>> >>
>>> >> 1) In the method createRelationsTitleImage you have an inverted "head
>>> >> != -1" check where it should be "head == -1"
>>> >>
>>> >> 2) You index relationships in the createRelationsBetweenTitles method;
>>> >> this isn't OK since the index can only manage nodes.
>>> >>
>>> >> And I recently committed a "fix" which removed the caching layer in
>>> >> the LuceneIndexBatchInserterImpl (and therefore also
>>> >> LuceneFulltextIndexBatchInserter). This probably fixes your problems.
>>> >> I'm also working on a performance fix which makes consecutive getNodes
>>> >> calls faster.
>>> >>
>>> >> So I think that with these fixes (1) and (2) and the latest index-util
>>> >> 0.9-SNAPSHOT your sample will run fine. Also you could try without
>>> >> calling optimize. See more information at
>>> >> http://wiki.neo4j.org/content/Indexing_with_BatchInserter
>>> >>
>>> >> 2009/12/10 Mattias Persson :
>>> >> > To continue this thread in the user list:
>>> >> >
>>> >> > Thanks Núria, I've gotten your sample code/files and I'm running it
>>> >> > now to try to reproduce your problem.
>>> >> >
>>> >> > 2009/12/9 Núria Trench :
>>> >> >> I have finished uploading the 4 csv files. You'll see an e-mail with
>>> the
>>> >> >> other 3 csv files packed in a rar file.
>>> >> >> Thanks,
>>> >> >>
>>> >> >> Núria.
>>> >> >>
>>> >> >> 2009/12/9 Núria Trench 
>>> >> >>>
>>> >> >>> Yes, you are right. But there is one csv file that is too big to be
>>> >> packed
>>> >> >>> with other files and I am reducing it.
>>> >> >>> I am sending the other files now.
>>> >> >>>
>>> >> >>> 2009/12/9 Mattias Persson 
>>> >> 
>>> >>  By the way, you might consider packing those files (with zip or
>>> tar.gz
>>> >>  or something) cause they will shrink quite well
>>> >> 
>>> >>  2009/12/9 Mattias Persson :
>>> >>  > Great, but I only got the images.csv file... I'm starting to
>>> test
>>> >> with
>>> >>  > that at least
>>> >>  >
>>> >>  > 2009/12/9 Núria Trench :
>>> >>  >> Hi again,
>>> >>  >>
>>> >>  >> The errors show up after parsing 2 csv files to create all the
>>> >>  >> nodes, just at the moment of calling the method "getSingleNode" for
>>> >>  >> looking up the tail and head node for creating all the edges by
>>> >>  >> reading the other two csv files.
>>> >>  >>
>>> >>  >> I am sending with Sprend the four csv files that will help you
>>> to
>>> >>  >> trigger
>>> >>  >> index behaviour.
>>> >>  >>
>>> >>  >> Thank you,
>>> >>  >>
>>> >>  >> Núria.
>>> >>  >>
>>> >>  >> 2009/12/9 Mattias Persson 
>>> >>  >>>

Re: [Neo] LuceneIndexBatchInserter doubt

2009-12-26 Thread Núria Trench
Hi,

I have just finished parsing and creating the database with the latest
index-util-0.9-SNAPSHOT available in your repository. It finished
successfully, so I must thank you for your interest and useful help.
And finally, I have one last question. I have created 180 million
edges and 20 million nodes. Is it possible to create a larger number
of edges and nodes with Neo4j? Do you have a limit?

Thank you very much again.

2009/12/21 Núria Trench 

> Hi again Mattias,
>
> I'm still trying to parse all the data in order to create the graph. I will
> report the results as soon as possible.
> Thank you very much for your interest.
>
> Núria.
>
> 2009/12/21 Mattias Persson 
>
> Hi again,
>>
>> any luck with this yet?
>>
>> 2009/12/11 Núria Trench :
>> > Thank you very much Mattias. I will test it as soon as possible and I'll
>> > let you know.
>> >
>> > Núria.
>> >
>> > 2009/12/11 Mattias Persson 
>> >
>> >> I've tried this a couple of times now and first of all I see some
>> >> problems in your code:
>> >>
>> >> 1) In the method createRelationsTitleImage you have an inverted "head
>> >> != -1" check where it should be "head == -1"
>> >>
>> >> 2) You index relationships in the createRelationsBetweenTitles method;
>> >> this isn't OK since the index can only manage nodes.
>> >>
>> >> And I recently committed a "fix" which removed the caching layer in
>> >> the LuceneIndexBatchInserterImpl (and therefore also
>> >> LuceneFulltextIndexBatchInserter). This probably fixes your problems.
>> >> I'm also working on a performance fix which makes consecutive getNodes
>> >> calls faster.
>> >>
>> >> So I think that with these fixes (1) and (2) and the latest index-util
>> >> 0.9-SNAPSHOT your sample will run fine. Also you could try without
>> >> calling optimize. See more information at
>> >> http://wiki.neo4j.org/content/Indexing_with_BatchInserter
>> >>
>> >> 2009/12/10 Mattias Persson :
>> >> > To continue this thread in the user list:
>> >> >
>> >> > Thanks Núria, I've gotten your sample code/files and I'm running it
>> >> > now to try to reproduce your problem.
>> >> >
>> >> > 2009/12/9 Núria Trench :
>> >> >> I have finished uploading the 4 csv files. You'll see an e-mail with
>> the
>> >> >> other 3 csv files packed in a rar file.
>> >> >> Thanks,
>> >> >>
>> >> >> Núria.
>> >> >>
>> >> >> 2009/12/9 Núria Trench 
>> >> >>>
>> >> >>> Yes, you are right. But there is one csv file that is too big to be
>> >> packed
>> >> >>> with other files and I am reducing it.
>> >> >>> I am sending the other files now.
>> >> >>>
>> >> >>> 2009/12/9 Mattias Persson 
>> >> 
>> >>  By the way, you might consider packing those files (with zip or
>> tar.gz
>> >>  or something) cause they will shrink quite well
>> >> 
>> >>  2009/12/9 Mattias Persson :
>> >>  > Great, but I only got the images.csv file... I'm starting to
>> test
>> >> with
>> >>  > that at least
>> >>  >
>> >>  > 2009/12/9 Núria Trench :
>> >>  >> Hi again,
>> >>  >>
>> >>  >> The errors show up after parsing 2 csv files to create all the
>> >>  >> nodes, just at the moment of calling the method "getSingleNode" for
>> >>  >> looking up the tail and head node for creating all the edges by
>> >>  >> reading the other two csv files.
>> >>  >>
>> >>  >> I am sending with Sprend the four csv files that will help you
>> to
>> >>  >> trigger
>> >>  >> index behaviour.
>> >>  >>
>> >>  >> Thank you,
>> >>  >>
>> >>  >> Núria.
>> >>  >>
>> >>  >> 2009/12/9 Mattias Persson 
>> >>  >>>
>> >>  >>> Hmm, I've no idea... but do the errors show up early in the process
>> >>  >>> or do you have to insert a LOT of data to trigger it? In that case
>> >>  >>> you could send me a part of them... maybe using http://www.sprend.se,
>> >>  >>> WDYT?
>> >>  >>>
>> >>  >>> 2009/12/9 Núria Trench :
>> >>  >>> > Hi Mattias,
>> >>  >>> >
>> >>  >>> > The data isn't confidential but the files are very big (5.5 GB).
>> >>  >>> > How can I send you this data?
>> >>  >>> >
>> >>  >>> > 2009/12/9 Mattias Persson 
>> >>  >>> >>
>> >>  >>> >> Yep I got the java code, thanks. Yeah, if the data is confidential
>> >>  >>> >> or sensitive you can just send me the formatting, else consider
>> >>  >>> >> sending the files as well (or a subset if they are big).
>> >>  >>> >>
>> >>  >>> >> 2009/12/9 Núria Trench :
>> >>
>> >>
>> >>
>> >> --
>> >> Mattias Persson, [matt...@neotechnology.com]
>> >> Neo Technology, www.neotechnology.com
>> >> ___
>> >> Neo mailing list
>> >> User@lists.neo4j.org
>> >> https://lists.neo4j.org/mailman/listinfo/user
>> >>
>> > ___
>> > Neo mailing list
>> > User@lists.neo4j.org
>> 

Re: [Neo] LuceneIndexBatchInserter doubt

2009-12-21 Thread Núria Trench
Hi again Mattias,

I'm still trying to parse all the data in order to create the graph. I will
report the results as soon as possible.
Thank you very much for your interest.

Núria.

2009/12/21 Mattias Persson 

> Hi again,
>
> any luck with this yet?
>
> 2009/12/11 Núria Trench :
> > Thank you very much Mattias. I will test it as soon as possible and I'll
> > let you know.
> >
> > Núria.
> >
> > 2009/12/11 Mattias Persson 
> >
> >> I've tried this a couple of times now and first of all I see some
> >> problems in your code:
> >>
> >> 1) In the method createRelationsTitleImage you have an inverted "head
> >> != -1" check where it should be "head == -1"
> >>
> >> 2) You index relationships in the createRelationsBetweenTitles method;
> >> this isn't OK since the index can only manage nodes.
> >>
> >> And I recently committed a "fix" which removed the caching layer in
> >> the LuceneIndexBatchInserterImpl (and therefore also
> >> LuceneFulltextIndexBatchInserter). This probably fixes your problems.
> >> I'm also working on a performance fix which makes consecutive getNodes
> >> calls faster.
> >>
> >> So I think that with these fixes (1) and (2) and the latest index-util
> >> 0.9-SNAPSHOT your sample will run fine. Also you could try without
> >> calling optimize. See more information at
> >> http://wiki.neo4j.org/content/Indexing_with_BatchInserter
> >>
> >> 2009/12/10 Mattias Persson :
> >> > To continue this thread in the user list:
> >> >
> >> > Thanks Núria, I've gotten your sample code/files and I'm running it
> >> > now to try to reproduce your problem.
> >> >
> >> > 2009/12/9 Núria Trench :
> >> >> I have finished uploading the 4 csv files. You'll see an e-mail with
> the
> >> >> other 3 csv files packed in a rar file.
> >> >> Thanks,
> >> >>
> >> >> Núria.
> >> >>
> >> >> 2009/12/9 Núria Trench 
> >> >>>
> >> >>> Yes, you are right. But there is one csv file that is too big to be
> >> packed
> >> >>> with other files and I am reducing it.
> >> >>> I am sending the other files now.
> >> >>>
> >> >>> 2009/12/9 Mattias Persson 
> >> 
> >>  By the way, you might consider packing those files (with zip or
> tar.gz
> >>  or something) cause they will shrink quite well
> >> 
> >>  2009/12/9 Mattias Persson :
> >>  > Great, but I only got the images.csv file... I'm starting to test
> >> with
> >>  > that at least
> >>  >
> >>  > 2009/12/9 Núria Trench :
> >>  >> Hi again,
> >>  >>
> >>  >> The errors show up after parsing 2 csv files to create all the
> >>  >> nodes, just at the moment of calling the method "getSingleNode" for
> >>  >> looking up the tail and head node for creating all the edges by
> >>  >> reading the other two csv files.
> >>  >>
> >>  >> I am sending with Sprend the four csv files that will help you
> to
> >>  >> trigger
> >>  >> index behaviour.
> >>  >>
> >>  >> Thank you,
> >>  >>
> >>  >> Núria.
> >>  >>
> >>  >> 2009/12/9 Mattias Persson 
> >>  >>>
> >>  >>> Hmm, I've no idea... but do the errors show up early in the process
> >>  >>> or do you have to insert a LOT of data to trigger it? In that case
> >>  >>> you could send me a part of them... maybe using http://www.sprend.se,
> >>  >>> WDYT?
> >>  >>>
> >>  >>> 2009/12/9 Núria Trench :
> >>  >>> > Hi Mattias,
> >>  >>> >
> >>  >>> > The data isn't confidential but the files are very big (5.5 GB).
> >>  >>> > How can I send you this data?
> >>  >>> >
> >>  >>> > 2009/12/9 Mattias Persson 
> >>  >>> >>
> >>  >>> >> Yep I got the java code, thanks. Yeah, if the data is confidential
> >>  >>> >> or sensitive you can just send me the formatting, else consider
> >>  >>> >> sending the files as well (or a subset if they are big).
> >>  >>> >>
> >>  >>> >> 2009/12/9 Núria Trench :
> >>
> >>
> >>
> >> --
> >> Mattias Persson, [matt...@neotechnology.com]
> >> Neo Technology, www.neotechnology.com
> >> ___
> >> Neo mailing list
> >> User@lists.neo4j.org
> >> https://lists.neo4j.org/mailman/listinfo/user
> >>
> > ___
> > Neo mailing list
> > User@lists.neo4j.org
> > https://lists.neo4j.org/mailman/listinfo/user
> >
>
>
>
> --
> Mattias Persson, [matt...@neotechnology.com]
> Neo Technology, www.neotechnology.com
> ___
> Neo mailing list
> User@lists.neo4j.org
> https://lists.neo4j.org/mailman/listinfo/user
>
___
Neo mailing list
User@lists.neo4j.org
https://lists.neo4j.org/mailman/listinfo/user


Re: [Neo] LuceneIndexBatchInserter doubt

2009-12-21 Thread Mattias Persson
Hi again,

any luck with this yet?

2009/12/11 Núria Trench :
> Thank you very much Mattias. I will test it as soon as possible and I'll
> let you know.
>
> Núria.
>
> 2009/12/11 Mattias Persson 
>
>> I've tried this a couple of times now and first of all I see some
>> problems in your code:
>>
>> 1) In the method createRelationsTitleImage you have an inverted "head
>> != -1" check where it should be "head == -1"
>>
>> 2) You index relationships in the createRelationsBetweenTitles method;
>> this isn't OK since the index can only manage nodes.
>>
>> And I recently committed a "fix" which removed the caching layer in
>> the LuceneIndexBatchInserterImpl (and therefore also
>> LuceneFulltextIndexBatchInserter). This probably fixes your problems.
>> I'm also working on a performance fix which makes consecutive getNodes
>> calls faster.
>>
>> So I think that with these fixes (1) and (2) and the latest index-util
>> 0.9-SNAPSHOT your sample will run fine. Also you could try without
>> calling optimize. See more information at
>> http://wiki.neo4j.org/content/Indexing_with_BatchInserter
>>
>> 2009/12/10 Mattias Persson :
>> > To continue this thread in the user list:
>> >
>> > Thanks Núria, I've gotten your sample code/files and I'm running it
>> > now to try to reproduce your problem.
>> >
>> > 2009/12/9 Núria Trench :
>> >> I have finished uploading the 4 csv files. You'll see an e-mail with the
>> >> other 3 csv files packed in a rar file.
>> >> Thanks,
>> >>
>> >> Núria.
>> >>
>> >> 2009/12/9 Núria Trench 
>> >>>
>> >>> Yes, you are right. But there is one csv file that is too big to be
>> packed
>> >>> with other files and I am reducing it.
>> >>> I am sending the other files now.
>> >>>
>> >>> 2009/12/9 Mattias Persson 
>> 
>>  By the way, you might consider packing those files (with zip or tar.gz
>>  or something) cause they will shrink quite well
>> 
>>  2009/12/9 Mattias Persson :
>>  > Great, but I only got the images.csv file... I'm starting to test
>> with
>>  > that at least
>>  >
>>  > 2009/12/9 Núria Trench :
>>  >> Hi again,
>>  >>
>>  >> The errors show up after parsing 2 csv files to create all the
>>  >> nodes, just at the moment of calling the method "getSingleNode" for
>>  >> looking up the tail and head node for creating all the edges by
>>  >> reading the other two csv files.
>>  >>
>>  >> I am sending with Sprend the four csv files that will help you to
>>  >> trigger
>>  >> index behaviour.
>>  >>
>>  >> Thank you,
>>  >>
>>  >> Núria.
>>  >>
>>  >> 2009/12/9 Mattias Persson 
>>  >>>
>>  >>> Hmm, I've no idea... but do the errors show up early in the process
>>  >>> or do you have to insert a LOT of data to trigger it? In that case
>>  >>> you could send me a part of them... maybe using http://www.sprend.se,
>>  >>> WDYT?
>>  >>>
>>  >>> 2009/12/9 Núria Trench :
>>  >>> > Hi Mattias,
>>  >>> >
>>  >>> > The data isn't confidential but the files are very big (5.5 GB).
>>  >>> > How can I send you this data?
>>  >>> >
>>  >>> > 2009/12/9 Mattias Persson 
>>  >>> >>
>>  >>> >> Yep I got the java code, thanks. Yeah, if the data is confidential
>>  >>> >> or sensitive you can just send me the formatting, else consider
>>  >>> >> sending the files as well (or a subset if they are big).
>>  >>> >>
>>  >>> >> 2009/12/9 Núria Trench :
>>
>>
>>
>> --
>> Mattias Persson, [matt...@neotechnology.com]
>> Neo Technology, www.neotechnology.com
>> ___
>> Neo mailing list
>> User@lists.neo4j.org
>> https://lists.neo4j.org/mailman/listinfo/user
>>
> ___
> Neo mailing list
> User@lists.neo4j.org
> https://lists.neo4j.org/mailman/listinfo/user
>



-- 
Mattias Persson, [matt...@neotechnology.com]
Neo Technology, www.neotechnology.com
___
Neo mailing list
User@lists.neo4j.org
https://lists.neo4j.org/mailman/listinfo/user


Re: [Neo] LuceneIndexBatchInserter doubt

2009-12-11 Thread Núria Trench
Thank you very much Mattias. I will test it as soon as possible and I'll
let you know.

Núria.

2009/12/11 Mattias Persson 

> I've tried this a couple of times now and first of all I see some
> problems in your code:
>
> 1) In the method createRelationsTitleImage you have an inverted "head
> != -1" check where it should be "head == -1"
>
> 2) You index relationships in the createRelationsBetweenTitles method;
> this isn't OK since the index can only manage nodes.
>
> And I recently committed a "fix" which removed the caching layer in
> the LuceneIndexBatchInserterImpl (and therefore also
> LuceneFulltextIndexBatchInserter). This probably fixes your problems.
> I'm also working on a performance fix which makes consecutive getNodes
> calls faster.
>
> So I think that with these fixes (1) and (2) and the latest index-util
> 0.9-SNAPSHOT your sample will run fine. Also you could try without
> calling optimize. See more information at
> http://wiki.neo4j.org/content/Indexing_with_BatchInserter
>
> 2009/12/10 Mattias Persson :
> > To continue this thread in the user list:
> >
> > Thanks Núria, I've gotten your sample code/files and I'm running it
> > now to try to reproduce your problem.
> >
> > 2009/12/9 Núria Trench :
> >> I have finished uploading the 4 csv files. You'll see an e-mail with the
> >> other 3 csv files packed in a rar file.
> >> Thanks,
> >>
> >> Núria.
> >>
> >> 2009/12/9 Núria Trench 
> >>>
> >>> Yes, you are right. But there is one csv file that is too big to be
> packed
> >>> with other files and I am reducing it.
> >>> I am sending the other files now.
> >>>
> >>> 2009/12/9 Mattias Persson 
> 
>  By the way, you might consider packing those files (with zip or tar.gz
>  or something) cause they will shrink quite well
> 
>  2009/12/9 Mattias Persson :
>  > Great, but I only got the images.csv file... I'm starting to test
> with
>  > that at least
>  >
>  > 2009/12/9 Núria Trench :
>  >> Hi again,
>  >>
>  >> The errors show up after parsing 2 csv files to create all the
>  >> nodes, just at the moment of calling the method "getSingleNode" for
>  >> looking up the tail and head node for creating all the edges by
>  >> reading the other two csv files.
>  >>
>  >> I am sending with Sprend the four csv files that will help you to
>  >> trigger
>  >> index behaviour.
>  >>
>  >> Thank you,
>  >>
>  >> Núria.
>  >>
>  >> 2009/12/9 Mattias Persson 
>  >>>
>  >>> Hmm, I've no idea... but do the errors show up early in the process
>  >>> or do you have to insert a LOT of data to trigger it? In that case
>  >>> you could send me a part of them... maybe using http://www.sprend.se,
>  >>> WDYT?
>  >>>
>  >>> 2009/12/9 Núria Trench :
>  >>> > Hi Mattias,
>  >>> >
>  >>> > The data isn't confidential but the files are very big (5.5 GB).
>  >>> > How can I send you this data?
>  >>> >
>  >>> > 2009/12/9 Mattias Persson 
>  >>> >>
>  >>> >> Yep I got the java code, thanks. Yeah, if the data is confidential
>  >>> >> or sensitive you can just send me the formatting, else consider
>  >>> >> sending the files as well (or a subset if they are big).
>  >>> >>
>  >>> >> 2009/12/9 Núria Trench :
>
>
>
> --
> Mattias Persson, [matt...@neotechnology.com]
> Neo Technology, www.neotechnology.com
> ___
> Neo mailing list
> User@lists.neo4j.org
> https://lists.neo4j.org/mailman/listinfo/user
>
___
Neo mailing list
User@lists.neo4j.org
https://lists.neo4j.org/mailman/listinfo/user


Re: [Neo] LuceneIndexBatchInserter doubt

2009-12-11 Thread Mattias Persson
I've tried this a couple of times now and first of all I see some
problems in your code:

1) In the method createRelationsTitleImage you have an inverted "head
!= -1" check where it should be "head == -1"

2) You index relationships in the createRelationsBetweenTitles method;
this isn't OK since the index can only manage nodes.

And I recently committed a "fix" which removed the caching layer in
the LuceneIndexBatchInserterImpl (and therefore also
LuceneFulltextIndexBatchInserter). This probably fixes your problems.
I'm also working on a performance fix which makes consecutive getNodes
calls faster.

So I think that with these fixes (1) and (2) and the latest index-util
0.9-SNAPSHOT your sample will run fine. Also you could try without
calling optimize. See more information at
http://wiki.neo4j.org/content/Indexing_with_BatchInserter
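
In code, the two fixes would look something like this minimal sketch. The
import paths are from memory of the 0.9-era API, and the "id" property and
the REFERS_TO relationship type are made-up placeholders, so adjust them
to your actual code:

import org.neo4j.api.core.RelationshipType;
import org.neo4j.impl.batchinsert.BatchInserter;
import org.neo4j.util.index.LuceneIndexBatchInserter;

public class RelationImportSketch
{
    // Declaring relationship types as an enum was the idiomatic style.
    enum Types implements RelationshipType { REFERS_TO }

    static void createRelation( BatchInserter inserter,
            LuceneIndexBatchInserter index, Object tailKey, Object headKey )
    {
        long tail = index.getSingleNode( "id", tailKey );
        long head = index.getSingleNode( "id", headKey );
        // Fix (1): -1 means "no node found", so the guard must be == -1.
        if ( tail == -1 || head == -1 )
        {
            return; // skip (or log) the miss instead of creating a bad edge
        }
        // Fix (2): create the relationship, but never pass its id to
        // index.index(...); the batch index only manages node ids.
        inserter.createRelationship( tail, head, Types.REFERS_TO, null );
    }
}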

2009/12/10 Mattias Persson :
> To continue this thread in the user list:
>
> Thanks Núria, I've gotten your sample code/files and I'm running it
> now to try to reproduce your problem.
>
> 2009/12/9 Núria Trench :
>> I have finished uploading the 4 csv files. You'll see an e-mail with the
>> other 3 csv files packed in a rar file.
>> Thanks,
>>
>> Núria.
>>
>> 2009/12/9 Núria Trench 
>>>
>>> Yes, you are right. But there is one csv file that is too big to be packed
>>> with other files and I am reducing it.
>>> I am sending the other files now.
>>>
>>> 2009/12/9 Mattias Persson 

 By the way, you might consider packing those files (with zip or tar.gz
 or something) cause they will shrink quite well

 2009/12/9 Mattias Persson :
 > Great, but I only got the images.csv file... I'm starting to test with
 > that at least
 >
 > 2009/12/9 Núria Trench :
 >> Hi again,
 >>
 >> The errors show up after parsing 2 csv files to create all the
 >> nodes, just at the moment of calling the method "getSingleNode" for
 >> looking up the tail and head node for creating all the edges by
 >> reading the other two csv files.
 >>
 >> I am sending with Sprend the four csv files that will help you to
 >> trigger
 >> index behaviour.
 >>
 >> Thank you,
 >>
 >> Núria.
 >>
 >> 2009/12/9 Mattias Persson 
 >>>
 >>> Hmm, I've no idea... but do the errors show up early in the process
 >>> or do you have to insert a LOT of data to trigger it? In that case
 >>> you could send me a part of them... maybe using http://www.sprend.se,
 >>> WDYT?
 >>>
 >>> 2009/12/9 Núria Trench :
 >>> > Hi Mattias,
 >>> >
 >>> > The data isn't confidential but the files are very big (5.5 GB).
 >>> > How can I send you this data?
 >>> >
 >>> > 2009/12/9 Mattias Persson 
 >>> >>
 >>> >> Yep I got the java code, thanks. Yeah, if the data is confidential or
 >>> >> sensitive you can just send me the formatting, else consider sending
 >>> >> the files as well (or a subset if they are big).
 >>> >>
 >>> >> 2009/12/9 Núria Trench :



-- 
Mattias Persson, [matt...@neotechnology.com]
Neo Technology, www.neotechnology.com
___
Neo mailing list
User@lists.neo4j.org
https://lists.neo4j.org/mailman/listinfo/user


Re: [Neo] LuceneIndexBatchInserter doubt

2009-12-10 Thread Mattias Persson
To continue this thread in the user list:

Thanks Núria, I've gotten your sample code/files and I'm running it
now to try to reproduce your problem.

2009/12/9 Núria Trench :
> I have finished uploading the 4 csv files. You'll see an e-mail with the
> other 3 csv files packed in a rar file.
> Thanks,
>
> Núria.
>
> 2009/12/9 Núria Trench 
>>
>> Yes, you are right. But there is one csv file that is too big to be packed
>> with other files and I am reducing it.
>> I am sending the other files now.
>>
>> 2009/12/9 Mattias Persson 
>>>
>>> By the way, you might consider packing those files (with zip or tar.gz
>>> or something) cause they will shrink quite well
>>>
>>> 2009/12/9 Mattias Persson :
>>> > Great, but I only got the images.csv file... I'm starting to test with
>>> > that at least
>>> >
>>> > 2009/12/9 Núria Trench :
>>> >> Hi again,
>>> >>
>>> >> The errors show up after parsing 2 csv files to create all the
>>> >> nodes, just at the moment of calling the method "getSingleNode" for
>>> >> looking up the tail and head node for creating all the edges by
>>> >> reading the other two csv files.
>>> >>
>>> >> I am sending with Sprend the four csv files that will help you to
>>> >> trigger
>>> >> index behaviour.
>>> >>
>>> >> Thank you,
>>> >>
>>> >> Núria.
>>> >>
>>> >> 2009/12/9 Mattias Persson 
>>> >>>
>>> >>> Hmm, I've no idea... but do the errors show up early in the process
>>> >>> or do you have to insert a LOT of data to trigger it? In that case
>>> >>> you could send me a part of them... maybe using http://www.sprend.se,
>>> >>> WDYT?
>>> >>>
>>> >>> 2009/12/9 Núria Trench :
>>> >>> > Hi Mattias,
>>> >>> >
>>> >>> > The data isn't confidential but the files are very big (5.5 GB).
>>> >>> > How can I send you this data?
>>> >>> >
>>> >>> > 2009/12/9 Mattias Persson 
>>> >>> >>
>>> >>> >> Yep I got the java code, thanks. Yeah, if the data is confidential
>>> >>> >> or sensitive you can just send me the formatting, else consider
>>> >>> >> sending the files as well (or a subset if they are big).
>>> >>> >>
>>> >>> >> 2009/12/9 Núria Trench :
>>> >>> >> >
>>> >>> >> >
>>> >>> >>
>>> >>> >>
>>> >>> >>
>>> >>> >> --
>>> >>> >> Mattias Persson, [matt...@neotechnology.com]
>>> >>> >> Neo Technology, www.neotechnology.com
>>> >>> >
>>> >>> >
>>> >>>
>>> >>>
>>> >>>
>>> >>> --
>>> >>> Mattias Persson, [matt...@neotechnology.com]
>>> >>> Neo Technology, www.neotechnology.com
>>> >>
>>> >>
>>> >
>>> >
>>> >
>>> > --
>>> > Mattias Persson, [matt...@neotechnology.com]
>>> > Neo Technology, www.neotechnology.com
>>> >
>>>
>>>
>>>
>>> --
>>> Mattias Persson, [matt...@neotechnology.com]
>>> Neo Technology, www.neotechnology.com
>>
>
>



-- 
Mattias Persson, [matt...@neotechnology.com]
Neo Technology, www.neotechnology.com
___
Neo mailing list
User@lists.neo4j.org
https://lists.neo4j.org/mailman/listinfo/user


Re: [Neo] LuceneIndexBatchInserter doubt

2009-12-09 Thread Núria Trench
Hi Mattias,

I already did it 10 minutes ago. If you need an example to see the
format of the 4 csv files, I can send it to you.
Thanks again,

Núria.

2009/12/9 Mattias Persson 

> Oh OK, it could be our attachments filter / security or something...
> could you try to mail them to me directly at matt...@neotechnology.com
> ?
>
> 2009/12/9 Núria Trench :
> > Hi Mattias,
> >
> > In my last e-mail I attached the sample code; didn't you receive it?
> > I will try to attach it again.
> >
> > Núria.
> >
> > 2009/12/9 Mattias Persson 
> >
> >> Hi again, Núria (it was I, Mattias, who asked for the sample code).
> >> Well... the fact that you parse 4 csv files doesn't really help me
> >> set up a test for this... I mean how can I know that my test will be
> >> similar to yours? Would it be ok to attach your code/csv files as
> >> well?
> >>
> >> / Mattias
> >>
> >> 2009/12/9 Núria Trench :
> >> > Hi Todd,
> >> >
> >> > The sample code creates nodes and relationships by parsing 4 csv
> files.
> >> > Thank you for trying to trigger this behaviour with this sample.
> >> >
> >> > Núria
> >> >
> >> > 2009/12/9 Mattias Persson 
> >> >
> >> >> Could you provide me with some sample code which can trigger this
> >> >> behaviour with the latest index-util-0.9-SNAPSHOT Núria?
> >> >>
> >> >> 2009/12/9 Núria Trench :
> >> >> > Todd,
> >> >> >
> >> >> > I don't have the same problem. In my case, after indexing all the
> >> >> > attributes/properties of each node, the application creates all the
> >> >> > edges by looking up the tail node and the head node. So, it calls the
> >> >> > method "org.neo4j.util.index.
> >> >> > LuceneIndexBatchInserterImpl.getSingleNode" which returns -1 (no node
> >> >> > found) on many occasions.
> >> >> >
> >> >> > Does anyone have an alternative for getting a node by its indexed
> >> >> > attributes/properties?
> >> >> >
> >> >> > Thank you,
> >> >> >
> >> >> > Núria.
> >> >> >
> >> >> >
> >> >> > 2009/12/7 Mattias Persson 
> >> >> >
> >> >> >> Todd, are you sure you have the latest index-util 0.9-SNAPSHOT?
> This
> >> >> >> is a bug that we fixed yesterday... (assuming it's the same bug).
> >> >> >>
> >> >> >> 2009/12/7 Todd Stavish :
> >> >> >> > Hi Mattias, Núria.
> >> >> >> >
> >> >> >> > I am also running into scalability problems with the Lucene
> batch
> >> >> >> > inserter at much smaller numbers, 30,000 indexed nodes. I tried
> >> >> >> > calling optimize more. Increasing ulimit didn't help.
> >> >> >> >
> >> >> >> > INFO] Exception in thread "main" java.lang.RuntimeException:
> >> >> >> > java.io.FileNotFoundException:
> >> >> >> >
> >> >> >>
> >> >>
> >>
> /Users/todd/Code/neo4Jprototype/target/classes/data/graph/lucene/name/_0.cfx
> >> >> >> > (Too many open files)
> >> >> >> > [INFO]  at
> >> >> >>
> >> >>
> >>
> org.neo4j.util.index.LuceneIndexBatchInserterImpl.getNodes(LuceneIndexBatchInserterImpl.java:186)
> >> >> >> > [INFO]  at
> >> >> >>
> >> >>
> >>
> org.neo4j.util.index.LuceneIndexBatchInserterImpl.getSingleNode(LuceneIndexBatchInserterImpl.java:238)
> >> >> >> > [INFO]  at
> >> >> >>
> >> com.collectiveintelligence.QueryNeo.loadDataToGraph(QueryNeo.java:277)
> >> >> >> > [INFO]  at
> >> com.collectiveintelligence.QueryNeo.main(QueryNeo.java:57)
> >> >> >> > [INFO] Caused by: java.io.FileNotFoundException:
> >> >> >> >
> >> >> >>
> >> >>
> >>
> /Users/todd/Code/neo4Jprototype/target/classes/data/graph/lucene/name/_0.cfx
> >> >> >> > (Too many open files)
> >> >> >> >
> >> >> >> > I tried breaking it up into separate BatchInserter instances, and
> >> >> >> > it now hangs. Can I create more than one batch inserter per process
> >> >> >> > if they run sequentially and non-threaded?
> >> >> >> >
> >> >> >> > Thanks,
> >> >> >> > Todd
> >> >> >> >
> >> >> >> >
> >> >> >> >
> >> >> >> >
> >> >> >> >
> >> >> >> > On Mon, Dec 7, 2009 at 7:28 AM, Núria Trench <
> >> nuriatre...@gmail.com>
> >> >> >> wrote:
> >> >> >> >> Hi again Mattias,
> >> >> >> >>
> >> >> >> >> I have tried to execute my application with the latest version
> >> >> >> >> available in the maven repository and I still have the same problem.
> >> >> >> >> After creating and indexing all the nodes, the application calls the
> >> >> >> >> "optimize" method and then it creates all the edges by calling the
> >> >> >> >> method "getNodes" in order to select the tail and head node of the
> >> >> >> >> edge, but it doesn't work because many nodes are not found.
> >> >> >> >>
> >> >> >> >> I have tried to create only 30 nodes and 15 edges and it works
> >> >> >> >> properly, but if I try to create a big graph (180 million edges +
> >> >> >> >> 20 million nodes) it doesn't.
> >> >> >> >>
> >> >> >> >> I have also tried to call the "optimize" method every time the
> >> >> >> >> application has created 1 million nodes, but it doesn't work.
> >> >> >> >>
> >> >> >> >
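
The lifecycle Todd asks about above (one batch inserter per process, with
optimize() before the lookup-heavy phase) would look roughly like this
sketch; the constructor, class names and paths follow the wiki pattern
cited in this thread as far as I recall it, so treat them as illustrative:

BatchInserter inserter = new BatchInserterImpl( "target/neodb" );
LuceneIndexBatchInserter index = new LuceneIndexBatchInserterImpl( inserter );
try
{
    // ... createNode(...) plus index.index( nodeId, key, value ) loop ...
    index.optimize(); // flush/merge once before the edge-creation phase
    // ... edge loop using index.getSingleNode(...) / getNodes(...) ...
}
finally
{
    index.shutdown();    // close the Lucene index files first
    inserter.shutdown(); // then the store itself
}

Merging segments via optimize() also reduces how many files Lucene keeps
open, which is relevant to the "Too many open files" error quoted above.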

Re: [Neo] LuceneIndexBatchInserter doubt

2009-12-09 Thread Mattias Persson
Oh OK, it could be our attachments filter / security or something...
could you try to mail them to me directly at matt...@neotechnology.com
?

2009/12/9 Núria Trench :
> Hi Mattias,
>
> In my last e-mail I attached the sample code; didn't you receive it?
> I will try to attach it again.
>
> Núria.
>
> 2009/12/9 Mattias Persson 
>
>> Hi again, Núria (it was I, Mattias, who asked for the sample code).
>> Well... the fact that you parse 4 csv files doesn't really help me
>> set up a test for this... I mean how can I know that my test will be
>> similar to yours? Would it be ok to attach your code/csv files as
>> well?
>>
>> / Mattias
>>
>> 2009/12/9 Núria Trench :
>> > Hi Todd,
>> >
>> > The sample code creates nodes and relationships by parsing 4 csv files.
>> > Thank you for trying to trigger this behaviour with this sample.
>> >
>> > Núria
>> >
>> > 2009/12/9 Mattias Persson 
>> >
>> >> Could you provide me with some sample code which can trigger this
>> >> behaviour with the latest index-util-0.9-SNAPSHOT Núria?
>> >>
>> >> 2009/12/9 Núria Trench :
>> >> > Todd,
>> >> >
>> >> > I don't have the same problem. In my case, after indexing all the
>> >> > attributes/properties of each node, the application creates all the
>> >> > edges by looking up the tail node and the head node. So, it calls the
>> >> > method "org.neo4j.util.index.
>> >> > LuceneIndexBatchInserterImpl.getSingleNode" which returns -1 (no node
>> >> > found) on many occasions.
>> >> >
>> >> > Does anyone have an alternative for getting a node by its indexed
>> >> > attributes/properties?
>> >> >
>> >> > Thank you,
>> >> >
>> >> > Núria.
>> >> >
>> >> >
>> >> > 2009/12/7 Mattias Persson 
>> >> >
>> >> >> Todd, are you sure you have the latest index-util 0.9-SNAPSHOT? This
>> >> >> is a bug that we fixed yesterday... (assuming it's the same bug).
>> >> >>
>> >> >> 2009/12/7 Todd Stavish :
>> >> >> > Hi Mattias, Núria.
>> >> >> >
>> >> >> > I am also running into scalability problems with the Lucene batch
>> >> >> > inserter at much smaller numbers, 30,000 indexed nodes. I tried
>> >> >> > calling optimize more. Increasing ulimit didn't help.
>> >> >> >
>> >> >> > INFO] Exception in thread "main" java.lang.RuntimeException:
>> >> >> > java.io.FileNotFoundException:
>> >> >> >
>> >> >>
>> >>
>> /Users/todd/Code/neo4Jprototype/target/classes/data/graph/lucene/name/_0.cfx
>> >> >> > (Too many open files)
>> >> >> > [INFO]  at
>> >> >>
>> >>
>> org.neo4j.util.index.LuceneIndexBatchInserterImpl.getNodes(LuceneIndexBatchInserterImpl.java:186)
>> >> >> > [INFO]  at
>> >> >>
>> >>
>> org.neo4j.util.index.LuceneIndexBatchInserterImpl.getSingleNode(LuceneIndexBatchInserterImpl.java:238)
>> >> >> > [INFO]  at
>> >> >>
>> com.collectiveintelligence.QueryNeo.loadDataToGraph(QueryNeo.java:277)
>> >> >> > [INFO]  at
>> com.collectiveintelligence.QueryNeo.main(QueryNeo.java:57)
>> >> >> > [INFO] Caused by: java.io.FileNotFoundException:
>> >> >> >
>> >> >>
>> >>
>> /Users/todd/Code/neo4Jprototype/target/classes/data/graph/lucene/name/_0.cfx
>> >> >> > (Too many open files)
>> >> >> >
>> >> >> > I tried breaking it up into separate BatchInserter instances, and
>> >> >> > it now hangs. Can I create more than one batch inserter per process
>> >> >> > if they run sequentially and non-threaded?
>> >> >> >
>> >> >> > Thanks,
>> >> >> > Todd
>> >> >> >
>> >> >> >
>> >> >> >
>> >> >> >
>> >> >> >
>> >> >> > On Mon, Dec 7, 2009 at 7:28 AM, Núria Trench <
>> nuriatre...@gmail.com>
>> >> >> wrote:
>> >> >> >> Hi again Mattias,
>> >> >> >>
>> >> >> >> I have tried to execute my application with the latest version
>> >> >> >> available in the maven repository and I still have the same problem.
>> >> >> >> After creating and indexing all the nodes, the application calls the
>> >> >> >> "optimize" method and then it creates all the edges by calling the
>> >> >> >> method "getNodes" in order to select the tail and head node of the
>> >> >> >> edge, but it doesn't work because many nodes are not found.
>> >> >> >>
>> >> >> >> I have tried to create only 30 nodes and 15 edges and it works
>> >> >> >> properly, but if I try to create a big graph (180 million edges +
>> >> >> >> 20 million nodes) it doesn't.
>> >> >> >>
>> >> >> >> I have also tried to call the "optimize" method every time the
>> >> >> >> application has created 1 million nodes, but it doesn't work.
>> >> >> >>
>> >> >> >> Have you tried to create as many nodes as I have said with the
>> newer
>> >> >> >> index-util version?
>> >> >> >>
>> >> >> >> Thank you,
>> >> >> >>
>> >> >> >> Núria.
>> >> >> >>
>> >> >> >> 2009/12/4 Núria Trench 
>> >> >> >>
>> >> >> >>> Hi Mattias,
>> >> >> >>>
>> >> >> >>> Thank you very much for fixing the problem so fast. I will try it
>> >> >> >>> as soon as the new changes are available in the maven repository.
>> >> >> >>>
>> >> >> >>> Núria.
>> >> >> >>>
>> 

Re: [Neo] LuceneIndexBatchInserter doubt

2009-12-09 Thread Núria Trench
Hi Mattias,

In my last e-mail I attached the sample code; didn't you receive it?
I will try to attach it again.

Núria.

2009/12/9 Mattias Persson 

> Hi again, Núria (it was I, Mattias, who asked for the sample code).
> Well... the fact that you parse 4 csv files doesn't really help me
> set up a test for this... I mean how can I know that my test will be
> similar to yours? Would it be ok to attach your code/csv files as
> well?
>
> / Mattias
>
> 2009/12/9 Núria Trench :
> > Hi Todd,
> >
> > The sample code creates nodes and relationships by parsing 4 csv files.
> > Thank you for trying to trigger this behaviour with this sample.
> >
> > Núria
> >
> > 2009/12/9 Mattias Persson 
> >
> >> Could you provide me with some sample code which can trigger this
> >> behaviour with the latest index-util-0.9-SNAPSHOT Núria?
> >>
> >> 2009/12/9 Núria Trench :
> >> > Todd,
> >> >
> >> > I don't have the same problem. In my case, after indexing all the
> >> > attributes/properties of each node, the application creates all the
> >> > edges by looking up the tail node and the head node. So, it calls the
> >> > method "org.neo4j.util.index.
> >> > LuceneIndexBatchInserterImpl.getSingleNode" which returns -1 (no node
> >> > found) on many occasions.
> >> >
> >> > Does anyone have an alternative for getting a node by its indexed
> >> > attributes/properties?
> >> >
> >> > Thank you,
> >> >
> >> > Núria.
> >> >
> >> >
> >> > 2009/12/7 Mattias Persson 
> >> >
> >> >> Todd, are you sure you have the latest index-util 0.9-SNAPSHOT? This
> >> >> is a bug that we fixed yesterday... (assuming it's the same bug).
> >> >>
> >> >> 2009/12/7 Todd Stavish :
> >> >> > Hi Mattias, Núria.
> >> >> >
> >> >> > I am also running into scalability problems with the Lucene batch
> >> >> > inserter at much smaller numbers, 30,000 indexed nodes. I tried
> >> >> > calling optimize more. Increasing ulimit didn't help.
> >> >> >
> >> >> > INFO] Exception in thread "main" java.lang.RuntimeException:
> >> >> > java.io.FileNotFoundException:
> >> >> >
> >> >>
> >>
> /Users/todd/Code/neo4Jprototype/target/classes/data/graph/lucene/name/_0.cfx
> >> >> > (Too many open files)
> >> >> > [INFO]  at
> >> >>
> >>
> org.neo4j.util.index.LuceneIndexBatchInserterImpl.getNodes(LuceneIndexBatchInserterImpl.java:186)
> >> >> > [INFO]  at
> >> >>
> >>
> org.neo4j.util.index.LuceneIndexBatchInserterImpl.getSingleNode(LuceneIndexBatchInserterImpl.java:238)
> >> >> > [INFO]  at
> >> >>
> com.collectiveintelligence.QueryNeo.loadDataToGraph(QueryNeo.java:277)
> >> >> > [INFO]  at
> com.collectiveintelligence.QueryNeo.main(QueryNeo.java:57)
> >> >> > [INFO] Caused by: java.io.FileNotFoundException:
> >> >> >
> >> >>
> >>
> /Users/todd/Code/neo4Jprototype/target/classes/data/graph/lucene/name/_0.cfx
> >> >> > (Too many open files)
> >> >> >
> >> >> > I tried breaking it up into separate BatchInserter instances, and
> >> >> > it now hangs. Can I create more than one batch inserter per process
> >> >> > if they run sequentially and non-threaded?
> >> >> >
> >> >> > Thanks,
> >> >> > Todd
> >> >> >
> >> >> >
> >> >> >
> >> >> >
> >> >> >
> >> >> > On Mon, Dec 7, 2009 at 7:28 AM, Núria Trench <
> nuriatre...@gmail.com>
> >> >> wrote:
> >> >> >> Hi again Mattias,
> >> >> >>
> >> >> >> I have tried to execute my application with the latest version
> >> >> >> available in the maven repository and I still have the same problem.
> >> >> >> After creating and indexing all the nodes, the application calls the
> >> >> >> "optimize" method and then it creates all the edges by calling the
> >> >> >> method "getNodes" in order to select the tail and head node of the
> >> >> >> edge, but it doesn't work because many nodes are not found.
> >> >> >>
> >> >> >> I have tried to create only 30 nodes and 15 edges and it works
> >> >> >> properly, but if I try to create a big graph (180 million edges +
> >> >> >> 20 million nodes) it doesn't.
> >> >> >>
> >> >> >> I have also tried to call the "optimize" method every time the
> >> >> >> application has created 1 million nodes, but it doesn't work.
> >> >> >>
> >> >> >> Have you tried to create as many nodes as I have said with the
> newer
> >> >> >> index-util version?
> >> >> >>
> >> >> >> Thank you,
> >> >> >>
> >> >> >> Núria.
> >> >> >>
> >> >> >> 2009/12/4 Núria Trench 
> >> >> >>
> >> >> >>> Hi Mattias,
> >> >> >>>
> >> >> >>> Thank you very much for fixing the problem so fast. I will try it
> >> >> >>> as soon as the new changes are available in the maven repository.
> >> >> >>>
> >> >> >>> Núria.
> >> >> >>>
> >> >> >>>
> >> >> >>> 2009/12/4 Mattias Persson 
> >> >> >>>
> >> >>  I fixed the problem and also added a cache per key for faster
> >> >>  getNodes/getSingleNode lookup during the insert process. However the
> >> >>  cache assumes that there's nothing in the index when the process
> >> >>  starts (which almost always will be true) to speed things up even
> >> >>  further.

Re: [Neo] LuceneIndexBatchInserter doubt

2009-12-09 Thread Mattias Persson
Hi again, Núria (it was I, Mattias, who asked for the sample code).
Well... the fact that you parse 4 csv files doesn't really help me
set up a test for this... I mean how can I know that my test will be
similar to yours? Would it be ok to attach your code/csv files as
well?

/ Mattias

2009/12/9 Núria Trench :
> Hi Todd,
>
> The sample code creates nodes and relationships by parsing 4 csv files.
> Thank you for trying to trigger this behaviour with this sample.
>
> Núria
>
> 2009/12/9 Mattias Persson 
>
>> Could you provide me with some sample code which can trigger this
>> behaviour with the latest index-util-0.9-SNAPSHOT Núria?
>>
>> 2009/12/9 Núria Trench :
>> > Todd,
>> >
>> > I don't have the same problem. In my case, after indexing all the
>> > attributes/properties of each node, the application creates all the
>> > edges by looking up the tail node and the head node. So, it calls the
>> > method "org.neo4j.util.index.
>> > LuceneIndexBatchInserterImpl.getSingleNode" which returns -1 (no node
>> > found) on many occasions.
>> >
>> > Does anyone have an alternative for getting a node by its indexed
>> > attributes/properties?
>> >
>> > Thank you,
>> >
>> > Núria.
>> >
>> >
>> > 2009/12/7 Mattias Persson 
>> >
>> >> Todd, are you sure you have the latest index-util 0.9-SNAPSHOT? This
>> >> is a bug that we fixed yesterday... (assuming it's the same bug).
>> >>
>> >> 2009/12/7 Todd Stavish :
>> >> > Hi Mattias, Núria.
>> >> >
>> >> > I am also running into scalability problems with the Lucene batch
>> >> > inserter at much smaller numbers, 30,000 indexed nodes. I tried
>> >> > calling optimize more. Increasing ulimit didn't help.
>> >> >
>> >> > INFO] Exception in thread "main" java.lang.RuntimeException:
>> >> > java.io.FileNotFoundException:
>> >> >
>> >>
>> /Users/todd/Code/neo4Jprototype/target/classes/data/graph/lucene/name/_0.cfx
>> >> > (Too many open files)
>> >> > [INFO]  at
>> >>
>> org.neo4j.util.index.LuceneIndexBatchInserterImpl.getNodes(LuceneIndexBatchInserterImpl.java:186)
>> >> > [INFO]  at
>> >>
>> org.neo4j.util.index.LuceneIndexBatchInserterImpl.getSingleNode(LuceneIndexBatchInserterImpl.java:238)
>> >> > [INFO]  at
>> >> com.collectiveintelligence.QueryNeo.loadDataToGraph(QueryNeo.java:277)
>> >> > [INFO]  at com.collectiveintelligence.QueryNeo.main(QueryNeo.java:57)
>> >> > [INFO] Caused by: java.io.FileNotFoundException:
>> >> >
>> >>
>> /Users/todd/Code/neo4Jprototype/target/classes/data/graph/lucene/name/_0.cfx
>> >> > (Too many open files)
>> >> >
>> >> > I tried breaking it up into separate BatchInserter instances, and it
>> >> > now hangs. Can I create more than one batch inserter per process if
>> >> > they run sequentially and non-threaded?
>> >> >
>> >> > Thanks,
>> >> > Todd
>> >> >
>> >> >
>> >> >
>> >> >
>> >> >
>> >> > On Mon, Dec 7, 2009 at 7:28 AM, Núria Trench 
>> >> wrote:
>> >> >> Hi again Mattias,
>> >> >>
>> >> >> I have tried to execute my application with the latest version
>> >> >> available in the maven repository and I still have the same problem.
>> >> >> After creating and indexing all the nodes, the application calls the
>> >> >> "optimize" method and then it creates all the edges by calling the
>> >> >> method "getNodes" in order to select the tail and head node of the
>> >> >> edge, but it doesn't work because many nodes are not found.
>> >> >>
>> >> >> I have tried to create only 30 nodes and 15 edges and it works
>> >> >> properly, but if I try to create a big graph (180 million edges +
>> >> >> 20 million nodes) it doesn't.
>> >> >>
>> >> >> I have also tried to call the "optimize" method every time the
>> >> >> application has created 1 million nodes, but it doesn't work.
>> >> >>
>> >> >> Have you tried to create as many nodes as I have said with the newer
>> >> >> index-util version?
>> >> >>
>> >> >> Thank you,
>> >> >>
>> >> >> Núria.
>> >> >>
>> >> >> 2009/12/4 Núria Trench 
>> >> >>
>> >> >>> Hi Mattias,
>> >> >>>
>> >> >>> Thank you very much for fixing the problem so fast. I will try it
>> >> >>> as soon as the new changes are available in the maven repository.
>> >> >>>
>> >> >>> Núria.
>> >> >>>
>> >> >>>
>> >> >>> 2009/12/4 Mattias Persson 
>> >> >>>
>> >>  I fixed the problem and also added a cache per key for faster
>> >>  getNodes/getSingleNode lookup during the insert process. However the
>> >>  cache assumes that there's nothing in the index when the process
>> >>  starts (which almost always will be true) to speed things up even
>> >>  further.
>> >> 
>> >>  You can control whether the cache is used, and its size, by
>> >>  overriding the following methods (this is also documented in the
>> >>  Javadoc):
>> >>
>> >>  boolean useCache()
>> >>  int getMaxCacheSizePerKey()
>> >>
>> >>  in your LuceneIndexBatchInserterImpl instance. The new changes
>> >>  should be available in the maven repository within an hour.
>> >> 
>> >> >>>
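
A sketch of what such an override could look like, assuming the two hooks
quoted above are overridable on LuceneIndexBatchInserterImpl and given an
existing BatchInserter named inserter (the 500000 figure is just an
example value):

LuceneIndexBatchInserter index = new LuceneIndexBatchInserterImpl( inserter )
{
    @Override
    public boolean useCache()
    {
        return true; // return false to switch the cache off entirely
    }

    @Override
    public int getMaxCacheSizePerKey()
    {
        return 500000; // cap on cached entries per indexed key
    }
};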

Re: [Neo] LuceneIndexBatchInserter doubt

2009-12-09 Thread Núria Trench
Hi Todd,

The sample code creates nodes and relationships by parsing 4 csv files.
Thank you for trying to trigger this behaviour with this sample.

Núria

2009/12/9 Mattias Persson 

> Could you provide me with some sample code which can trigger this
> behaviour with the latest index-util-0.9-SNAPSHOT Núria?
>
> 2009/12/9 Núria Trench :
> > Todd,
> >
> > I don't have the same problem. In my case, after indexing all the
> > attributes/properties of each node, the application creates all the
> > edges by looking up the tail node and the head node. So, it calls the
> > method "org.neo4j.util.index.
> > LuceneIndexBatchInserterImpl.getSingleNode" which returns -1 (no node
> > found) on many occasions.
> >
> > Does anyone have an alternative for getting a node by its indexed
> > attributes/properties?
> >
> > Thank you,
> >
> > Núria.
> >
> >
> > 2009/12/7 Mattias Persson 
> >
> >> Todd, are you sure you have the latest index-util 0.9-SNAPSHOT? This
> >> is a bug that we fixed yesterday... (assuming it's the same bug).
> >>
> >> 2009/12/7 Todd Stavish :
> >> > Hi Mattias, Núria.
> >> >
> >> > I am also running into scalability problems with the Lucene batch
> >> > inserter at much smaller numbers, 30,000 indexed nodes. I tried
> >> > calling optimize more. Increasing ulimit didn't help.
> >> >
> >> > INFO] Exception in thread "main" java.lang.RuntimeException:
> >> > java.io.FileNotFoundException:
> >> >
> >>
> /Users/todd/Code/neo4Jprototype/target/classes/data/graph/lucene/name/_0.cfx
> >> > (Too many open files)
> >> > [INFO]  at
> >>
> org.neo4j.util.index.LuceneIndexBatchInserterImpl.getNodes(LuceneIndexBatchInserterImpl.java:186)
> >> > [INFO]  at
> >>
> org.neo4j.util.index.LuceneIndexBatchInserterImpl.getSingleNode(LuceneIndexBatchInserterImpl.java:238)
> >> > [INFO]  at
> >> com.collectiveintelligence.QueryNeo.loadDataToGraph(QueryNeo.java:277)
> >> > [INFO]  at com.collectiveintelligence.QueryNeo.main(QueryNeo.java:57)
> >> > [INFO] Caused by: java.io.FileNotFoundException:
> >> >
> >>
> /Users/todd/Code/neo4Jprototype/target/classes/data/graph/lucene/name/_0.cfx
> >> > (Too many open files)
> >> >
> >> > I tried breaking it up into separate BatchInserter instances, and it
> >> > now hangs. Can I create more than one batch inserter per process if
> >> > they run sequentially and non-threaded?
> >> >
> >> > Thanks,
> >> > Todd
> >> >
> >> >
> >> >
> >> >
> >> >
> >> > On Mon, Dec 7, 2009 at 7:28 AM, Núria Trench 
> >> wrote:
> >> >> Hi again Mattias,
> >> >>
> >> >> I have tried to execute my application with the latest version
> >> >> available in the maven repository and I still have the same problem.
> >> >> After creating and indexing all the nodes, the application calls the
> >> >> "optimize" method and then it creates all the edges by calling the
> >> >> method "getNodes" in order to select the tail and head node of the
> >> >> edge, but it doesn't work because many nodes are not found.
> >> >>
> >> >> I have tried to create only 30 nodes and 15 edges and it works
> >> >> properly, but if I try to create a big graph (180 million edges +
> >> >> 20 million nodes) it doesn't.
> >> >>
> >> >> I have also tried to call the "optimize" method every time the
> >> >> application has created 1 million nodes, but it doesn't work.
> >> >>
> >> >> Have you tried to create as many nodes as I have said with the newer
> >> >> index-util version?
> >> >>
> >> >> Thank you,
> >> >>
> >> >> Núria.
> >> >>
> >> >> 2009/12/4 Núria Trench 
> >> >>
> >> >>> Hi Mattias,
> >> >>>
> >> >>> Thank you very much for fixing the problem so fast. I will try it
> >> >>> as soon as the new changes are available in the maven repository.
> >> >>>
> >> >>> Núria.
> >> >>>
> >> >>>
> >> >>> 2009/12/4 Mattias Persson 
> >> >>>
> >>  I fixed the problem and also added a cache per key for faster
> >>  getNodes/getSingleNode lookup during the insert process. However
> the
> >>  cache assumes that there's nothing in the index when the process
> >>  starts (which almost always will be true) to speed things up even
> >>  further.
> >> 
> >>  You can control the cache size and if it should be used by
> overriding
> >>  the (this is also documented in the Javadoc):
> >> 
> >>  boolean useCache()
> >>  int getMaxCacheSizePerKey()
> >> 
> >>  methods in your LuceneIndexBatchInserterImpl instance. The new
> changes
> >>  should be available in the maven repository within an hour.
> >> 
> >>  2009/12/4 Mattias Persson :
> >>  > I think I found the problem... it's indexing as it should, but it
> >>  > isn't reflected in getNodes/getSingleNode properly until you
> >>  > flush/optimize/shutdown the index. I'll try to fix it today!
> >>  >
> >>  > 2009/12/3 Núria Trench :
> >>  >> Thank you very much for your response.
> >>  >> If you need more information, you only have to send an e-mail
> and I
> >>  will try
> >> 

Re: [Neo] LuceneIndexBatchInserter doubt

2009-12-09 Thread Mattias Persson
Could you provide me with some sample code that can trigger this
behaviour with the latest index-util-0.9-SNAPSHOT, Núria?
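
For reference, a minimal sketch of the kind of test case being asked for
here, assuming the BatchInserter and index-util APIs as they are named in
this thread and on the Indexing_with_BatchInserter wiki page. The package
names follow the neo-1.0-b10 era layout and may differ; the store path,
property values and relationship type are made up:

import java.util.Collections;

import org.neo4j.api.core.RelationshipType;
import org.neo4j.impl.batchinsert.BatchInserter;
import org.neo4j.impl.batchinsert.BatchInserterImpl;
import org.neo4j.util.index.LuceneIndexBatchInserter;
import org.neo4j.util.index.LuceneIndexBatchInserterImpl;

public class GetSingleNodeRepro
{
    private enum RelTypes implements RelationshipType { CONNECTS }

    public static void main( String[] args )
    {
        BatchInserter inserter = new BatchInserterImpl( "target/repro-db" );
        LuceneIndexBatchInserter index =
            new LuceneIndexBatchInserterImpl( inserter );

        // Phase 1: create and index all nodes.
        long tailId = inserter.createNode(
            Collections.<String, Object>singletonMap( "name", "tail" ) );
        index.index( tailId, "name", "tail" );
        long headId = inserter.createNode(
            Collections.<String, Object>singletonMap( "name", "head" ) );
        index.index( headId, "name", "head" );

        // Make the indexed entries visible to lookups before phase 2.
        index.optimize();

        // Phase 2: resolve the endpoints by indexed property, create the edge.
        long tail = index.getSingleNode( "name", "tail" );
        long head = index.getSingleNode( "name", "head" );
        if ( tail == -1 || head == -1 )
        {
            throw new IllegalStateException( "lookup failed: " + tail + "/" + head );
        }
        inserter.createRelationship( tail, head, RelTypes.CONNECTS,
            null ); // null = no relationship properties (assumed allowed)

        index.shutdown();
        inserter.shutdown();
    }
}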


Re: [Neo] LuceneIndexBatchInserter doubt

2009-12-09 Thread Núria Trench
Todd,

I don't have the same problem. In my case, after indexing all the
attributes/properties of each node, the application creates all the edges by
looking up the tail node and the head node. To do so, it calls the method
"org.neo4j.util.index.LuceneIndexBatchInserterImpl.getSingleNode", which
returns -1 (node not found) on many occasions.

Does anyone have an alternative way to get a node by its indexed
attributes/properties?

Thank you,

Núria.
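
One workable alternative, sketched below, is to bypass the Lucene index
during the load phase entirely and keep your own key-to-node-ID map while
parsing the CSV files. The class below is hypothetical (nothing like it is
in index-util); it only needs the node IDs returned by the batch inserter,
and it works provided all keys fit in memory:

import java.util.HashMap;
import java.util.Map;

// Hypothetical helper: resolve edge endpoints from an in-memory map filled
// during node creation, instead of querying Lucene in the middle of the load.
public class NodeIdMap
{
    private final Map<String, Long> idByKey = new HashMap<String, Long>();

    // Node phase: remember each batch-inserter node ID under its CSV key.
    public void remember( String csvKey, long nodeId )
    {
        idByKey.put( csvKey, nodeId );
    }

    // Edge phase: returns -1, like getSingleNode, when the key is unknown.
    public long lookup( String csvKey )
    {
        Long id = idByKey.get( csvKey );
        return id == null ? -1 : id.longValue();
    }
}

For 20 million keys this costs a few gigabytes of heap, but it removes the
index-visibility problem from the edge-creation phase altogether.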



Re: [Neo] LuceneIndexBatchInserter doubt

2009-12-07 Thread Mattias Persson
Todd, are you sure you have the latest index-util 0.9-SNAPSHOT? This
is a bug that we fixed yesterday... (assuming it's the same bug).


Re: [Neo] LuceneIndexBatchInserter doubt

2009-12-07 Thread Todd Stavish
Hi Mattias, Núria.

I am also running into scalability problems with the Lucene batch
inserter at much smaller numbers: 30,000 indexed nodes. I tried
calling optimize more often; increasing ulimit didn't help.

[INFO] Exception in thread "main" java.lang.RuntimeException:
java.io.FileNotFoundException:
/Users/todd/Code/neo4Jprototype/target/classes/data/graph/lucene/name/_0.cfx
(Too many open files)
[INFO]  at
org.neo4j.util.index.LuceneIndexBatchInserterImpl.getNodes(LuceneIndexBatchInserterImpl.java:186)
[INFO]  at
org.neo4j.util.index.LuceneIndexBatchInserterImpl.getSingleNode(LuceneIndexBatchInserterImpl.java:238)
[INFO]  at
com.collectiveintelligence.QueryNeo.loadDataToGraph(QueryNeo.java:277)
[INFO]  at com.collectiveintelligence.QueryNeo.main(QueryNeo.java:57)
[INFO] Caused by: java.io.FileNotFoundException:
/Users/todd/Code/neo4Jprototype/target/classes/data/graph/lucene/name/_0.cfx
(Too many open files)

I tried breaking the work up into separate BatchInserter instances, and now
it hangs. Can I create more than one batch inserter per process if they run
sequentially and non-threaded?

Thanks,
Todd
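
On the "more than one batch inserter per process" question, the usual
pattern is the opposite: a single BatchInserter/index pair shared by all
sequential phases and shut down exactly once at the end. A sketch under
that assumption (package names from the neo-1.0-b10 era, phase bodies
stubbed out):

import org.neo4j.impl.batchinsert.BatchInserter;
import org.neo4j.impl.batchinsert.BatchInserterImpl;
import org.neo4j.util.index.LuceneIndexBatchInserter;
import org.neo4j.util.index.LuceneIndexBatchInserterImpl;

public class SingleInserterLoad
{
    public static void main( String[] args )
    {
        // One inserter and one index for the whole process; the phases run
        // sequentially and single-threaded against the same pair.
        BatchInserter inserter = new BatchInserterImpl( "data/graph" );
        LuceneIndexBatchInserter index =
            new LuceneIndexBatchInserterImpl( inserter );
        try
        {
            loadNodes( inserter, index );  // phase 1 (stub)
            index.optimize();              // make phase-1 entries searchable
            loadEdges( inserter, index );  // phase 2 (stub)
        }
        finally
        {
            // Shutting down exactly once, at the very end, also releases the
            // Lucene file handles that "Too many open files" points at.
            index.shutdown();
            inserter.shutdown();
        }
    }

    private static void loadNodes( BatchInserter inserter,
        LuceneIndexBatchInserter index )
    {
        // createNode(...) + index.index(...) per CSV row would go here.
    }

    private static void loadEdges( BatchInserter inserter,
        LuceneIndexBatchInserter index )
    {
        // getSingleNode(...) + createRelationship(...) would go here.
    }
}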






Re: [Neo] LuceneIndexBatchInserter doubt

2009-12-07 Thread Núria Trench
Hi again Mattias,

I have tried to execute my application with the latest version available in
the maven repository and I still have the same problem. After creating and
indexing all the nodes, the application calls the "optimize" method and
then creates all the edges, calling the method "getNodes" to select the
tail and head node of each edge, but it doesn't work because many nodes
are not found.

I have tried to create only 30 nodes and 15 edges and it works properly, but
if I try to create a big graph (180 million edges + 20 million nodes) it
doesn't.

I have also tried to call the "optimize" method every time the application
has created 1 million nodes, but that doesn't work either.

Have you tried to create as many nodes as I mentioned with the newer
index-util version?

Thank you,

Núria.
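
For the optimize-every-million-nodes variant described above, a small
counting wrapper is enough. This is a sketch, not code from the thread; the
interval is arbitrary, and the only index-util methods it relies on are
index() and optimize(), as named in this thread:

import org.neo4j.util.index.LuceneIndexBatchInserter;

// Indexes nodes as usual but only optimizes every OPTIMIZE_INTERVAL calls,
// amortizing the cost of optimize() over the whole load.
public class PeriodicOptimizer
{
    private static final int OPTIMIZE_INTERVAL = 1000000;

    private final LuceneIndexBatchInserter index;
    private long indexed = 0;

    public PeriodicOptimizer( LuceneIndexBatchInserter index )
    {
        this.index = index;
    }

    // Call this instead of index.index(...).
    public void index( long nodeId, String key, Object value )
    {
        index.index( nodeId, key, value );
        if ( ++indexed % OPTIMIZE_INTERVAL == 0 )
        {
            index.optimize();
        }
    }
}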


Re: [Neo] LuceneIndexBatchInserter doubt

2009-12-04 Thread Núria Trench
Hi Mattias,

Thank you very much for fixing the problem so fast. I will try it as soon as
the new changes are available in the maven repository.

Núria.



Re: [Neo] LuceneIndexBatchInserter doubt

2009-12-04 Thread Mattias Persson
I fixed the problem and also added a per-key cache for faster
getNodes/getSingleNode lookups during the insert process. However, the
cache assumes that there's nothing in the index when the process
starts (which will almost always be true) to speed things up even
further.

You can control the cache size, and whether the cache is used at all, by
overriding the following methods (also documented in the Javadoc) in your
LuceneIndexBatchInserterImpl instance:

boolean useCache()
int getMaxCacheSizePerKey()

The new changes should be available in the maven repository within an hour.
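
A sketch of that override as an anonymous subclass. The thread doesn't say
whether the two methods are public or protected in 0.9-SNAPSHOT; declaring
the overrides public is legal either way, and the values below are examples,
not recommendations:

import org.neo4j.impl.batchinsert.BatchInserter;
import org.neo4j.util.index.LuceneIndexBatchInserter;
import org.neo4j.util.index.LuceneIndexBatchInserterImpl;

public class TunedIndexSetup
{
    public static LuceneIndexBatchInserter create( BatchInserter inserter )
    {
        // Anonymous subclass overriding the two cache hooks named above.
        return new LuceneIndexBatchInserterImpl( inserter )
        {
            @Override
            public boolean useCache()
            {
                return true;      // keep the per-key lookup cache enabled
            }

            @Override
            public int getMaxCacheSizePerKey()
            {
                return 1000000;   // cache up to a million entries per key
            }
        };
    }
}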


-- 
Mattias Persson, [matt...@neotechnology.com]
Neo Technology, www.neotechnology.com


Re: [Neo] LuceneIndexBatchInserter doubt

2009-12-04 Thread Mattias Persson
I think I found the problem... it's indexing as it should, but it
isn't reflected in getNodes/getSingleNode properly until you
flush/optimize/shutdown the index. I'll try to fix it today!
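
In code, the symptom being described looks roughly like this (a sketch
reusing the names from the sketches above; the property values are
invented):

import java.util.HashMap;

import org.neo4j.impl.batchinsert.BatchInserter;
import org.neo4j.util.index.LuceneIndexBatchInserter;

public class VisibilityDemo
{
    // Assumes inserter and index are set up as in the earlier sketches.
    public static void demo( BatchInserter inserter,
        LuceneIndexBatchInserter index )
    {
        long id = inserter.createNode( new HashMap<String, Object>() );
        index.index( id, "name", "a" );

        // The bug described above: the fresh entry is not visible yet, so
        // this could return -1 even though the node was just indexed...
        long before = index.getSingleNode( "name", "a" );

        // ...until the index is flushed/optimized (or shut down).
        index.optimize();
        long after = index.getSingleNode( "name", "a" );  // finds the node
    }
}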


-- 
Mattias Persson, [matt...@neotechnology.com]
Neo Technology, www.neotechnology.com


Re: [Neo] LuceneIndexBatchInserter doubt

2009-12-03 Thread Núria Trench
Thank you very much for your response.
If you need more information, just send an e-mail and I will try to
explain it better.

Núria.



Re: [Neo] LuceneIndexBatchInserter doubt

2009-12-03 Thread Mattias Persson
This is something I'd like to reproduce, and I'll do some testing on
this tomorrow.

-- 
Mattias Persson, [matt...@neotechnology.com]
Neo Technology, www.neotechnology.com


[Neo] LuceneIndexBatchInserter doubt

2009-12-03 Thread Núria Trench
Hello,

Last week, I decided to download your graph database core in order to use
it. First, I created a new project to parse my CSV files and create a new
graph database with Neo4j. These CSV files contain 150 million edges and 20
million nodes.

When I finished writing the code that creates the graph database, I
executed it and, after six hours of execution, the program crashed because
of a Lucene exception. The exception is related to index merging and it
has the following message:
"mergeFields produced an invalid result: docCount is 385282378 but fdx file
size is 3082259028; now aborting this merge to prevent index corruption"

I searched the net and found that it is a Lucene bug. The libraries used
for executing my project were:
neo-1.0-b10
index-util-0.7
lucene-core-2.4.0

So, I decided to use a newer Lucene version. I found that you have a newer
index-util version, so I updated the libraries:
neo-1.0-b10
index-util-0.9
lucene-core-2.9.1

When I had updated those libraries, I tried to execute my project again and
found that, on many occasions, it was not indexing properly. So, I tried
optimizing the index every time I indexed something. That worked, in the
sense that it indexed properly afterwards, but the execution time
increased a lot.

I am not using transactions; instead, I am using the Batch Inserter
with the LuceneIndexBatchInserter.

So, my question is: what can I do to solve this problem? If I use
index-util-0.7 I cannot finish creating the graph database, and if I use
index-util-0.9 I have to optimize the index on every insertion and the
execution never ends.

Thank you very much in advance,

Núria.
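
For contrast with the batch inserter used throughout this thread, the
transactional insertion pattern of this era looked roughly as follows. This
is a sketch from the neo-1.0-b10-style API as best recalled (EmbeddedNeo and
friends), so treat the names as approximate:

import org.neo4j.api.core.EmbeddedNeo;
import org.neo4j.api.core.NeoService;
import org.neo4j.api.core.Node;
import org.neo4j.api.core.Transaction;

public class TransactionalInsert
{
    public static void main( String[] args )
    {
        NeoService neo = new EmbeddedNeo( "data/graph" );
        Transaction tx = neo.beginTx();
        try
        {
            // Every mutation happens inside a transaction; this per-write
            // overhead is exactly what the BatchInserter avoids in bulk loads.
            Node node = neo.createNode();
            node.setProperty( "name", "example" );
            tx.success();
        }
        finally
        {
            tx.finish();
        }
        neo.shutdown();
    }
}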