[Neo] I/O load in Neo during traversals

2009-12-09 Thread Rick Bullotta
When doing some large traversal testing (no writes/updates), I noticed that
the neostore.propertystore.db.strings file was seeing a lot of read I/O (as
expected) but also a huge amount of write I/O (almost 5X the read I/O rate).
Out of curiosity, what is the write activity that needs to occur when doing
traversals?

 

 

___
Neo mailing list
User@lists.neo4j.org
https://lists.neo4j.org/mailman/listinfo/user


Re: [Neo] Troubleshooting performance/memory issues

2009-12-09 Thread Rick Bullotta
FYI, we experimented with different heap size (1GB), along with different
"chunk sizes", and were able to eliminate the heap error and get about a 10X
improvement in insert speed.  It would be helpful to better understand the
interactions of the various Neo startup parameters, transaction buffers, and
so on, and their impact on performance.  I read the performance guidelines,
which were of some help, but perhaps some additional scenario-based
recommendations might help (frequent updates/frequent access, infrequent
update/frequent access, burst mode update vs steady update rate, etc...).  
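
A sketch of how such startup parameters can be passed, assuming the Neo4j 1.0-b10 EmbeddedNeo constructor that accepts a configuration map; the parameter names and sizes below are illustrative examples, not tuned recommendations:

```java
import java.util.HashMap;
import java.util.Map;

import org.neo4j.api.core.EmbeddedNeo;
import org.neo4j.api.core.NeoService;

public class TunedStartup
{
    public static void main( String[] args )
    {
        Map<String,String> config = new HashMap<String,String>();
        // Memory-mapped buffer sizes for the store files (illustrative values;
        // consult the performance guidelines for your workload)
        config.put( "neostore.nodestore.db.mapped_memory", "50M" );
        config.put( "neostore.relationshipstore.db.mapped_memory", "100M" );
        config.put( "neostore.propertystore.db.strings.mapped_memory", "100M" );

        NeoService neo = new EmbeddedNeo( "var/neo", config );
        // ... run the workload ...
        neo.shutdown(); // always shut down cleanly to flush the logical log
    }
}
```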

Learning more about Neo every hour!

-Original Message-
From: user-boun...@lists.neo4j.org [mailto:user-boun...@lists.neo4j.org] On
Behalf Of Rick Bullotta
Sent: Wednesday, December 09, 2009 2:57 PM
To: 'Neo user discussions'
Subject: [Neo] Troubleshooting performance/memory issues

Hi, all.

 

When trying to load a few hundred thousand nodes & relationships (chunking
it in groups of 1000 nodes or so), we are getting an out of memory heap
error after 15-20 minutes or so.  No big deal, we expanded the heap settings
for the JVM.  But then we also noticed that the nioneo_logical_log.xxx file
was continuing to grow, even though we were wrapping each 1000 node inserts
in their own transaction (there is no other transaction active) and
committing with success and finishing each group of 1000.  Periodically
(seemingly unrelated to our transaction finishing), that file shrinks again
and the data is flushed to the other neo propertystore and relationshipstore
files.  I just wanted to check if that was normal behavior, or if there is
something wrong with the way we (or Neo) are handling the transactions, and thus
the reason we hit an out-of-memory error.
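
For reference, the chunked-insert pattern described above looks roughly like this — a sketch against the Neo4j 1.0-b10 API, with the chunk size and property purely illustrative:

```java
import org.neo4j.api.core.NeoService;
import org.neo4j.api.core.Node;
import org.neo4j.api.core.Transaction;

public class ChunkedInsert
{
    private static final int CHUNK_SIZE = 1000;

    // Insert 'total' nodes, committing every CHUNK_SIZE in its own transaction.
    public static void insert( NeoService neo, int total )
    {
        int done = 0;
        while ( done < total )
        {
            Transaction tx = neo.beginTx();
            try
            {
                for ( int i = 0; i < CHUNK_SIZE && done < total; i++, done++ )
                {
                    Node node = neo.createNode();
                    node.setProperty( "index", done );
                }
                tx.success();
            }
            finally
            {
                tx.finish(); // commits this chunk; the logical log may still
                             // grow until Neo rotates/flushes it on its own
            }
        }
    }
}
```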

 

Thanks,

 

Rick

 





[Neo] Noob questions/comments

2009-12-09 Thread Rick Bullotta
Hi, all.

 

Here are a few questions and comments that I'd welcome feedback on :

 

Questions:

 

-  If you delete the reference node (id = 0), how can you recreate
it?

-  If you have a number of "loose" or disjoint graphs structured as
trees with a single root node, is there a best practice for
tracking/iterating only the top level node(s) of these disjoint graphs?  Is
relating them to the reference node and doing a first level traversal the
best way?

-  We would like to treat our properties as slightly more complex
than a simple type (they might have a last modified date, validity flag, and
so on) - given the choice between adding properties to track this state or
using nodes and relationships for these entities, what are the pros and cons
of each approach?

-  One aspect of our application will store nodes that can be
considered similar to event logs.  There may be many thousands of these
nodes per "event stream".  We would like to be able to traverse the entries
in chronological order, very quickly.  We were considering the following
design possibilities:

o   Simply create a node for each "stream" and a node for each entry, with a
relationship between the stream and the entry, then implement our own sort
routine

o   Similar to the above, but create a node for each "day", and manage
relationships to allow traversal by stream and/or day

o   Create a node for each stream, a node for each entry and treat the
entries as a forward-only linked list using relationships between the
entries (and of course a relationship between the stream and the "first"
entry)

-  Has the fact that the node id is an "int" rather than a "long"
been an issue in any implementations?  Are node ids reused after deletion? (I
suspect not, but just wanted to confirm.)

-  Any whitepaper/best practices for high availability/load-balanced
scenarios?  We were considering using a message queue to send "deltas"
around between nodes or something similar.

-  We'll be hosting Neo inside a servlet engine.  Plan was to start
up Neo within the init method of an autoloading servlet.  Any other
recommendations/suggestions?  Best practice for ensuring a clean shutdown?

-  Anyone used any kind of intermediate index or other approach to
bridge multiple Neo instances?

-  Any GUI tools for viewing/navigating the graph structure?  We are
prototyping one in Adobe Flex, curious if there are others.
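
The event-log design options above — in particular the third, linked-list variant — can be sketched like this against the Neo4j 1.0-b10 API. The relationship type names (FIRST_ENTRY, NEXT_ENTRY, LAST_ENTRY) are assumptions for illustration, not established conventions:

```java
import org.neo4j.api.core.Direction;
import org.neo4j.api.core.Node;
import org.neo4j.api.core.Relationship;
import org.neo4j.api.core.RelationshipType;

public class EventStream
{
    enum StreamTypes implements RelationshipType
    {
        FIRST_ENTRY, NEXT_ENTRY, LAST_ENTRY
    }

    // Append at the tail so entries stay in insertion (chronological) order.
    static void append( Node stream, Node entry )
    {
        Relationship last = stream.getSingleRelationship(
                StreamTypes.LAST_ENTRY, Direction.OUTGOING );
        if ( last == null )
        {
            stream.createRelationshipTo( entry, StreamTypes.FIRST_ENTRY );
        }
        else
        {
            last.getEndNode().createRelationshipTo( entry, StreamTypes.NEXT_ENTRY );
            last.delete(); // move the tail pointer to the new entry
        }
        stream.createRelationshipTo( entry, StreamTypes.LAST_ENTRY );
    }

    // Chronological traversal is one relationship hop per step.
    static void visitInOrder( Node stream )
    {
        Relationship first = stream.getSingleRelationship(
                StreamTypes.FIRST_ENTRY, Direction.OUTGOING );
        Node entry = ( first == null ) ? null : first.getEndNode();
        while ( entry != null )
        {
            // ... process entry ...
            Relationship next = entry.getSingleRelationship(
                    StreamTypes.NEXT_ENTRY, Direction.OUTGOING );
            entry = ( next == null ) ? null : next.getEndNode();
        }
    }
}
```

The "day" nodes of the second option could be layered on top of this by also relating each entry to a per-day node.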

 

Comments/observations:

-  I love the fact that you can delete nodes and relationships from
inside an iterator.  I always hated the way I had to separately maintain a
list of "things to be deleted" when traversing XML DOMs, for example.  Nice
capability!

-  Neo seems FAST!

-  It's a bit of a major mindset change, but once the lightbulb goes
on, the potential seems limitless!
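
The delete-during-iteration capability praised above, as a minimal sketch against the Neo4j 1.0-b10 API (detaching and deleting a node is an illustrative use case):

```java
import org.neo4j.api.core.Node;
import org.neo4j.api.core.Relationship;

public class DetachNode
{
    // Delete while iterating: no separate "things to be deleted" list needed.
    static void detachAndDelete( Node node )
    {
        for ( Relationship rel : node.getRelationships() )
        {
            rel.delete();
        }
        node.delete(); // a node may only be deleted once it has no relationships
    }
}
```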

 

Thanks in advance for guidance.

 

Rick



Re: [Neo] Lucene Index Corruption?

2009-12-09 Thread Adam Rabung
Hi,
Lots of questions :)

1. This iterator will just prevent duplicates from being returned from
the iterator?  If there's a condition (bug in my code) that causes
shutdown w/ open transactions, will the Lucene indexes continue to
double until they're huge?

2. Would it be possible to detect this situation, and rebuild the
indexes?  I guess this is a losing cause if the app is regularly
corrupting the data.

3. Could you allow me to close transactions from different threads?
Yesterday, I wrote something that tracks tx opens and closes, and
could iterate through all open transactions and call finish() on them.
 But TransactionImpl.finish seems to assume the calling thread is the
creating thread, which is not the case here.

4. Better yet, expose API for me to force-finish all open
transactions?  I'd rather have a botched transaction than a corrupt
index.

5. Is the only condition for this bug open transactions plus a Lucene
shutdown (via shutdown() OR abrupt process termination)?  In further
testing, it seems I can't reproduce the problem w/ a clean or dirty
shutdown if all transactions are closed.

6. I assume your iterator fix will make b11?  What are the chances the
root cause will be fixed in b11?  Do you have a tentative release date
for b11?

Thanks,
Adam


On Wed, Dec 9, 2009 at 9:02 AM, Mattias Persson
 wrote:
> Hi Adam,
>
> We're aware of such problems and I just now committed a fix which
> basically is a cover-up until those bugs are fixed... the iterable
> from getNodes() now runs through a filter (lazily before each next())
> so your problem should go away.
>
> 2009/12/8 Adam Rabung :
>> Hi,
>> I've recently run into problems with indexes becoming corrupt after
>> unclean shutdowns. Basically:
>> 1. Transaction 1 writes some data
>> 2. Transaction 2 reads some data, and is left open
>> 3. The database is shut down, with warnings about an open transaction
>> 4. The database is opened.  Recovery executes, but it appears the
>> Lucene indexes are "doubled" - that is, where we used to have key =>
>> (value1), we now have key => (value1, value1).
>>
>> I've attached a JUnit test case that hopefully reproduces this for
>> you.  I'm on Java 5, Mac OS 10.5, neo-1.0-b10.jar, and
>> index-util-0.8.jar
>>
>> Obviously, the first step on my end is to make sure any open
>> transactions are closed before attempting a shutdown.  However, I'm
>> able to pretty reliably reproduce this problem in a much scarier way -
>> just killing a running Neo process via the Eclipse "Console" view "red
>> square" process stop button.  Amazingly, Eclipse doesn't properly shut
>> down processes when this button is used, so I can't count on
>> shutdown hooks:
>> https://bugs.eclipse.org/bugs/show_bug.cgi?id=38016
>>
>> What expectations should I have for corruption when a database +
>> indexes are .shutDown() with open transactions?
>> What expectations should I have for corruption when a database +
>> indexes are terminated abruptly (Eclipse Console, power outage, etc)?
>> Beyond proper transaction management, and ensuring shutDown() is
>> called, is there anything I should be doing to help protect this data?
> I don't know if there's anything you could do. The problem is that we
> can't at the moment make lucene participate (I mean _really_
> participate) in a 2 phase commit together with the NeoService, but we
> will fix these issues in the near future.
>
> Until then, I think you'll be fine with this new fix.
>>
>> Thanks,
>> Adam
>>
>> ___
>> Neo mailing list
>> User@lists.neo4j.org
>> https://lists.neo4j.org/mailman/listinfo/user
>>
>>
>
>
>
> --
> Mattias Persson, [matt...@neotechnology.com]
> Neo Technology, www.neotechnology.com
> ___
> Neo mailing list
> User@lists.neo4j.org
> https://lists.neo4j.org/mailman/listinfo/user
>


Re: [Neo] Type metadata in properties/nodes

2009-12-09 Thread Rick Bullotta
Hi, Tobias.

Actually, I think we'll use your approach for the "known relationships" and
"known types" (there are quite a few in our domain model) in addition to the
dynamic approach.

Thanks for the help!

Rick

-Original Message-
From: user-boun...@lists.neo4j.org [mailto:user-boun...@lists.neo4j.org] On
Behalf Of Tobias Ivarsson
Sent: Wednesday, December 09, 2009 8:54 AM
To: Neo user discussions
Subject: Re: [Neo] Type metadata in properties/nodes

I see. I realized that this was what you were after. What I was proposing
was that you would know the types for the properties given the type of the
node. The types for the nodes in your case would be more abstract, perhaps
just defined by the set of properties. I used concrete types in my
explanation because it usually helps people understand what I mean with
utilizing the navigation context.

I had a suspicion that your particular application might not benefit from
this approach, but I wanted to throw it into the mix for the sake of
completeness of the discussion, since there are a lot more people reading
the list than writing in a particular thread.

Cheers,
Tobias



Re: [Neo] Lucene Index Corruption?

2009-12-09 Thread Mattias Persson
Hi Adam,

We're aware of such problems and I just now committed a fix which
basically is a cover-up until those bugs are fixed... the iterable
from getNodes() now runs through a filter (lazily before each next())
so your problem should go away.
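
A filter like the one described — lazily skipping duplicates before each next() — can be sketched in plain Java. This illustrates the idea only; it is not the committed fix:

```java
import java.util.HashSet;
import java.util.Iterator;
import java.util.NoSuchElementException;
import java.util.Set;

// Wraps an iterator and lazily skips items already seen, so duplicated
// index hits are filtered out as the caller consumes them.
public class DedupIterator<T> implements Iterator<T>
{
    private final Iterator<T> source;
    private final Set<T> seen = new HashSet<T>();
    private T next;
    private boolean hasNext;

    public DedupIterator( Iterator<T> source )
    {
        this.source = source;
        advance();
    }

    private void advance()
    {
        hasNext = false;
        while ( source.hasNext() )
        {
            T candidate = source.next();
            if ( seen.add( candidate ) ) // true only the first time
            {
                next = candidate;
                hasNext = true;
                return;
            }
        }
    }

    public boolean hasNext()
    {
        return hasNext;
    }

    public T next()
    {
        if ( !hasNext )
        {
            throw new NoSuchElementException();
        }
        T result = next;
        advance();
        return result;
    }

    public void remove()
    {
        throw new UnsupportedOperationException();
    }
}
```

Note that the set of seen items grows with the result size, which is fine for filtering a single query's hits.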

2009/12/8 Adam Rabung :
> Hi,
> I've recently run into problems with indexes becoming corrupt after
> unclean shutdowns. Basically:
> 1. Transaction 1 writes some data
> 2. Transaction 2 reads some data, and is left open
> 3. The database is shut down, with warnings about an open transaction
> 4. The database is opened.  Recovery executes, but it appears the
> Lucene indexes are "doubled" - that is, where we used to have key =>
> (value1), we now have key => (value1, value1).
>
> I've attached a JUnit test case that hopefully reproduces this for
> you.  I'm on Java 5, Mac OS 10.5, neo-1.0-b10.jar, and
> index-util-0.8.jar
>
> Obviously, the first step on my end is to make sure any open
> transactions are closed before attempting a shutdown.  However, I'm
> able to pretty reliably reproduce this problem in a much scarier way -
> just killing a running Neo process via the Eclipse "Console" view "red
> square" process stop button.  Amazingly, Eclipse doesn't properly shut
> down processes when this button is used, so I can't count on
> shutdown hooks:
> https://bugs.eclipse.org/bugs/show_bug.cgi?id=38016
>
> What expectations should I have for corruption when a database +
> indexes are .shutDown() with open transactions?
> What expectations should I have for corruption when a database +
> indexes are terminated abruptly (Eclipse Console, power outage, etc)?
> Beyond proper transaction management, and ensuring shutDown() is
> called, is there anything I should be doing to help protect this data?
I don't know if there's anything you could do. The problem is that we
can't at the moment make lucene participate (I mean _really_
participate) in a 2 phase commit together with the NeoService, but we
will fix these issues in the near future.

Until then, I think you'll be fine with this new fix.
>
> Thanks,
> Adam
>
> ___
> Neo mailing list
> User@lists.neo4j.org
> https://lists.neo4j.org/mailman/listinfo/user
>
>



-- 
Mattias Persson, [matt...@neotechnology.com]
Neo Technology, www.neotechnology.com


Re: [Neo] Type metadata in properties/nodes

2009-12-09 Thread Tobias Ivarsson
I see. I realized that this was what you were after. What I was proposing
was that you would know the types for the properties given the type of the
node. The types for the nodes in your case would be more abstract, perhaps
just defined by the set of properties. I used concrete types in my
explanation because it usually helps people understand what I mean with
utilizing the navigation context.

I had a suspicion that your particular application might not benefit from
this approach, but I wanted to throw it into the mix for the sake of
completeness of the discussion, since there are a lot more people reading
the list than writing in a particular thread.

Cheers,
Tobias

On Wed, Dec 9, 2009 at 2:02 PM, wrote:

>   Hi, Tobias.
>
>
>
>   Thanks for your thoughts and ideas.
>
>
>
>   My requirement is not only to know the "type" of something, but also to
>   store metadata for "types" so that I can catalog the "property type" of
>   each individual property in a node for a given "type".  It's a bit
>   complicated, but we are allowing very dynamic declarative "types" that
>   will not have an explicit compiled Java class wrapper for each "type"
>   (we will have a generic wrapper that deals with the "dynamic" type, and
>   some explicit wrapper for pre-defined entities).   The main reason is
>   that we need to deal with a few data types beyond the Java primitives
>   and String(s).  For example, we want to be able to know contextually
>   that a property is a "timestamp" or a "hyperlink".  Thus the need for
>   the extra (but relatively simple) metadata.
>
>
>
>   It might be useful to identify a commonly used subset of additional
>   property types that correspond to, for example, the most common RDBMS
>   data types and XML schema types.  This might include date, time,
>   datetime, link, and so on.  Since at the persistence level it appears
>   that a property is saved along with an integer enumeration of its
>   "simple type", perhaps there is an extensibility model that could be
>   implemented to allow these application-specific types to be created and
>   managed.  I know that would be problematic, though, given that the
>   current implementation is an enumeration.  No worries though, since
>   there are perfectly good workarounds/alternatives using relationships.
>
>
>
>   Cheers,
>
>
>
>   Rick
>
>
>
>
>
>    Original Message 
>   Subject: Re: [Neo] Type metadata in properties/nodes
>From: Tobias Ivarsson 
>   Date: Wed, December 09, 2009 5:39 am
>   To: Neo user discussions 
>   Associating nodes with a type node is a good approach, especially if
>   you
>   want to be able to do queries like "give me all nodes of type X". But
>   for
>   knowing the semantic type of a node when found through a general
>   traversal I
>   prefer to use the navigational context of the node. For example if I
>   have a
>   Person-node I know that the node at the other end of a
>   FRIEND-relationship
>   will be a Person-node as well. Or if I have a Car-node I know that the
>   node
>   at the other end of a OWNER-relationship will be either a Person or a
>   Company, both of which probably have enough in common for me to be able
>   to
>   get an address (for sending them the parking ticket or what ever), if I
>   need
>   to specifically know if it's a Person or a Company, I could use some
>   property for that information (or check the relationship to a type
>   node),
>   but most of the semantic information would be known from how I reached
>   the
>   node.
>   I have added a note about this to the FAQ in the wiki.
>   Cheers,
>   Tobias
>   On Tue, Dec 8, 2009 at 10:22 PM, Rick Bullotta <
>   rick.bullo...@burningskysoftware.com> wrote:
>   > Thanks, Peter. Good info. I think we ended up with a hybrid approach:
>   we
>   > modeled a set of "Type" nodes (related to a master "Types" node),
>   each of
>   > which includes the type metadata (property/type data) for a specific
>   > "type".
>   > "Instance" nodes then maintain a two-way relationship with their
>   associated
>   > "Type" node so that any node can quickly obtain its Type node and so
>   we can
>   > easily traverse all instances of a specific type...and we may end up
>   > extending this such that the properties themselves are each a node of
>   their
>   > own, in some cases, where we need to be able to
>   relate/search/traverse at a
>   > very detailed level. I suppose that depends on the performance
>   > implications
>   > of having lots more nodes and relationships.
>   >
>   > In any case, it definitely seems "do-able" with Neo.
>   >
>   >
>   >
>   >
>   > -Original Message-
>   > From: user-boun...@lists.neo4j.org
>[[1]mailto:user-boun...@lists.neo4j.org]
>   > On
>   > Behalf Of Peter Neubauer
>   > Sent: Tuesday, December 08, 2009 3:25 PM
>   > To: Neo user discussions
>   > Subject: Re: [Neo] Type metadata in properties/nodes
>   >
>   > Hi Rick,
>   > there are a number of interesting approaches to this, i

Re: [Neo] Type metadata in properties/nodes

2009-12-09 Thread rick . bullotta
   Hi, Tobias.



   Thanks for your thoughts and ideas.



   My requirement is not only to know the "type" of something, but also to
   store metadata for "types" so that I can catalog the "property type" of
   each individual property in a node for a given "type".  It's a bit
   complicated, but we are allowing very dynamic declarative "types" that
   will not have an explicit compiled Java class wrapper for each "type"
   (we will have a generic wrapper that deals with the "dynamic" type, and
   some explicit wrapper for pre-defined entities).   The main reason is
   that we need to deal with a few data types beyond the Java primitives
   and String(s).  For example, we want to be able to know contextually
   that a property is a "timestamp" or a "hyperlink".  Thus the need for
   the extra (but relatively simple) metadata.



   It might be useful to identify a commonly used subset of additional
   property types that correspond to, for example, the most common RDBMS
   data types and XML schema types.  This might include date, time,
   datetime, link, and so on.  Since at the persistence level it appears
   that a property is saved along with an integer enumeration of its
   "simple type", perhaps there is an extensibility model that could be
   implemented to allow these application-specific types to be created and
   managed.  I know that would be problematic, though, given that the
   current implementation is an enumeration.  No worries though, since
   there are perfectly good workarounds/alternatives using relationships.
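
   The "Type node" pattern discussed in this thread can be sketched like this against the Neo4j 1.0-b10 API. The relationship type, property keys, and the parallel-array layout are assumptions for illustration:

```java
import org.neo4j.api.core.NeoService;
import org.neo4j.api.core.Node;
import org.neo4j.api.core.RelationshipType;

public class TypeCatalog
{
    enum MetaTypes implements RelationshipType { INSTANCE_OF }

    // Store per-property extended types (e.g. "timestamp", "hyperlink")
    // as parallel string arrays on a type node.
    static Node createTypeNode( NeoService neo, String name,
            String[] propertyNames, String[] propertyTypes )
    {
        Node typeNode = neo.createNode();
        typeNode.setProperty( "name", name );
        typeNode.setProperty( "propertyNames", propertyNames );
        typeNode.setProperty( "propertyTypes", propertyTypes );
        return typeNode;
    }

    // Link an instance node to its type node so either can reach the other.
    static void markInstance( Node instance, Node typeNode )
    {
        instance.createRelationshipTo( typeNode, MetaTypes.INSTANCE_OF );
    }
}
```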



   Cheers,



   Rick





    Original Message 
   Subject: Re: [Neo] Type metadata in properties/nodes
   From: Tobias Ivarsson 
   Date: Wed, December 09, 2009 5:39 am
   To: Neo user discussions 
   Associating nodes with a type node is a good approach, especially if
   you
   want to be able to do queries like "give me all nodes of type X". But
   for
   knowing the semantic type of a node when found through a general
   traversal I
   prefer to use the navigational context of the node. For example if I
   have a
   Person-node I know that the node at the other end of a
   FRIEND-relationship
   will be a Person-node as well. Or if I have a Car-node I know that the
   node
   at the other end of a OWNER-relationship will be either a Person or a
   Company, both of which probably have enough in common for me to be able
   to
   get an address (for sending them the parking ticket or what ever), if I
   need
   to specifically know if it's a Person or a Company, I could use some
   property for that information (or check the relationship to a type
   node),
   but most of the semantic information would be known from how I reached
   the
   node.
   I have added a note about this to the FAQ in the wiki.
   Cheers,
   Tobias
   On Tue, Dec 8, 2009 at 10:22 PM, Rick Bullotta <
   rick.bullo...@burningskysoftware.com> wrote:
   > Thanks, Peter. Good info. I think we ended up with a hybrid approach:
   we
   > modeled a set of "Type" nodes (related to a master "Types" node),
   each of
   > which includes the type metadata (property/type data) for a specific
   > "type".
   > "Instance" nodes then maintain a two-way relationship with their
   associated
   > "Type" node so that any node can quickly obtain its Type node and so
   we can
   > easily traverse all instances of a specific type...and we may end up
   > extending this such that the properties themselves are each a node of
   their
   > own, in some cases, where we need to be able to
   relate/search/traverse at a
   > very detailed level. I suppose that depends on the performance
   > implications
   > of having lots more nodes and relationships.
   >
   > In any case, it definitely seems "do-able" with Neo.
   >
   >
   >
   >
   > -Original Message-
   > From: user-boun...@lists.neo4j.org
   [[1]mailto:user-boun...@lists.neo4j.org]
   > On
   > Behalf Of Peter Neubauer
   > Sent: Tuesday, December 08, 2009 3:25 PM
   > To: Neo user discussions
   > Subject: Re: [Neo] Type metadata in properties/nodes
   >
   > Hi Rick,
   > there are a number of interesting approaches to this, involving both
   > ways to retain the metadata:
   >
   > 1. RDF and OWL
   > - basically, every node will maintain a relationship to its type node
   > (your shadow node), something like x?--RDF:TYPE-->type_node which
   > contains info on what the type is, what properties etc.
   >
   > 2. Neo4j Meta package ([2]http://components.neo4j.org/neo-meta/)
   > - this is the concept of describing the type of things in code (Java
   > in this case) and thus in code enforce the restrictions and type
   > conversions on properties through the code. This does not capture any
   > meta info in the graph but is easy to do.
   >
   > 3. Annotate the nodes with type info
   > - in this approach, there is a "type" or "classname" property on any
   > node that is used to derive the type to deserialize/serialize the
   > object into, th

Re: [Neo] Type metadata in properties/nodes

2009-12-09 Thread Tobias Ivarsson
Associating nodes with a type node is a good approach, especially if you
want to be able to do queries like "give me all nodes of type X". But for
knowing the semantic type of a node when found through a general traversal I
prefer to use the navigational context of the node. For example if I have a
Person-node I know that the node at the other end of a FRIEND-relationship
will be a Person-node as well. Or if I have a Car-node I know that the node
at the other end of a OWNER-relationship will be either a Person or a
Company, both of which probably have enough in common for me to be able to
get an address (for sending them the parking ticket or what ever), if I need
to specifically know if it's a Person or a Company, I could use some
property for that information (or check the relationship to a type node),
but most of the semantic information would be known from how I reached the
node.
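
A minimal sketch of this navigational-context approach against the Neo4j 1.0-b10 API; the FRIEND relationship type and the "name" property are illustrative assumptions:

```java
import org.neo4j.api.core.Node;
import org.neo4j.api.core.Relationship;
import org.neo4j.api.core.RelationshipType;

public class NavigationalTyping
{
    enum SocialTypes implements RelationshipType { FRIEND }

    // No stored type marker is consulted: anything reached over a FRIEND
    // relationship from a person node is treated as a person.
    static void visitFriends( Node person )
    {
        for ( Relationship rel : person.getRelationships( SocialTypes.FRIEND ) )
        {
            Node friend = rel.getOtherNode( person );
            // semantic type is known from how we reached the node
            String name = (String) friend.getProperty( "name", "unknown" );
        }
    }
}
```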

I have added a note about this to the FAQ in the wiki.

Cheers,
Tobias

On Tue, Dec 8, 2009 at 10:22 PM, Rick Bullotta <
rick.bullo...@burningskysoftware.com> wrote:

> Thanks, Peter.  Good info.  I think we ended up with a hybrid approach: we
> modeled a set of "Type" nodes (related to a master "Types" node), each of
> which includes the type metadata (property/type data) for a specific
> "type".
> "Instance" nodes then maintain a two-way relationship with their associated
> "Type" node so that any node can quickly obtain its Type node and so we can
> easily traverse all instances of a specific type...and we may end up
> extending this such that the properties themselves are each a node of their
> own, in some cases, where we need to be able to relate/search/traverse at a
> very detailed level.  I suppose that depends on the performance
> implications
> of having lots more nodes and relationships.
>
> In any case, it definitely seems "do-able" with Neo.
>
>
>
>
> -Original Message-
> From: user-boun...@lists.neo4j.org [mailto:user-boun...@lists.neo4j.org]
> On
> Behalf Of Peter Neubauer
> Sent: Tuesday, December 08, 2009 3:25 PM
> To: Neo user discussions
> Subject: Re: [Neo] Type metadata in properties/nodes
>
> Hi Rick,
> there are a number of interesting approaches to this, involving both
> ways to retain the metadata:
>
> 1. RDF and OWL
> - basically, every node will maintain a relationship to its type node
> (your shadow node), something like x?--RDF:TYPE-->type_node which
> contains info on what the type is, what properties etc.
>
> 2. Neo4j Meta package (http://components.neo4j.org/neo-meta/)
> - this is the concept of describing the type of things in code (Java
> in this case) and thus in code enforce the restrictions and type
> conversions on properties through the code. This does not capture any
> meta info in the graph but is easy to do.
>
> 3. Annotate the nodes with type info
> - in this approach, there is a "type" or "classname" property on any
> node that is used to derive the type to deserialize/serialize the
> object into, the rest of the meta info is contained in the upper code
> layers. Andreas Ronge's JRuby bindings use this approach.
>
> 4. Encode everything into a String property
> - this approach means shuffling everything into a string property,
> basically treating properties as BLOBs. Works in some cases, but
> certainly locks down your data in these properties.
>
> What is best depends on your domain, and there might be more
> approaches out there. I sense that you are asking even for an
> extensible type system especially on properties. That is not in scope
> of the core graph engine, but I am not sure if in theory it would be
> possible to extend the property type system, we would need to discuss
> that separately.
>
> Cheers,
>
> /peter neubauer
>
> COO and Sales, Neo Technology
>
> GTalk:  neubauer.peter
> Skype   peter.neubauer
> Phone   +46 704 106975
> LinkedIn   http://www.linkedin.com/in/neubauer
> Twitter  http://twitter.com/peterneubauer
>
> http://www.neo4j.org- Relationships count.
> http://gremlin.tinkerpop.com - PageRank in 2 lines of code.
>
>
>
> On Tue, Dec 8, 2009 at 8:43 PM, Rick Bullotta
>  wrote:
> > I can see how relationships could be used to map "is a"/duck typing, but
> > I'm struggling with how to infer type from properties.  In particular,
> while
> > anything could be stuffed into a String, it loses important semantics
> when
> > you do so.  I'm not referring to *storage* as a String, which makes
> plenty
> > of sense - it's that the type identity of the source property is lost if
> you
> > do so.  I could maintain a "shadow node" of the type metadata that could
> be
> > related to each instance with a property name/property type array, but
> that
> > seems like something that would be useful within the node model itself.
> >
> >
> >
> > Types like DateTime, hyperlinks, and so on, while quite easily storable
> in
> > Neo4J, lose useful semantics on the way in.  I'd welcome your thoughts on
> > how others have managed thi

Re: [Neo] LuceneIndexBatchInserter doubt

2009-12-09 Thread Núria Trench
Hi Mattias,

I already did that 10 minutes ago. If you need an example showing the
format of the 4 csv files, I can send it to you.
Thanks again,

Núria.

2009/12/9 Mattias Persson 

> Oh ok, it could be our attachments filter / security or something...
> could you try to mail them to me directly at matt...@neotechnology.com
> ?
>
> 2009/12/9 Núria Trench :
> > Hi Mattias,
> >
> > In my last e-mail I have attached the sample code, haven't you received
> it?
> > I will try to attach it again.
> >
> > Núria.
> >
> > 2009/12/9 Mattias Persson 
> >

Re: [Neo] LuceneIndexBatchInserter doubt

2009-12-09 Thread Mattias Persson
Oh, OK. It could be our attachment filter or some security setting...
could you try mailing them to me directly at matt...@neotechnology.com?

2009/12/9 Núria Trench :
> Hi Mattias,
>
> In my last e-mail I have attached the sample code, haven't you received it?
> I will try to attach it again.
>
> Núria.

Re: [Neo] LuceneIndexBatchInserter doubt

2009-12-09 Thread Núria Trench
Hi Mattias,

I attached the sample code to my last e-mail; didn't you receive it?
I will try attaching it again.

Núria.

2009/12/9 Mattias Persson 

> Hi again, Núria (it was I, Mattias who asked for the sample code).
> Well... the fact that you parse 4 csv files doesn't really help me
> setup a test for this... I mean how can I know that my test will be
> similar to yours? Would it be ok to attach your code/csv files as
> well?
>
> / Mattias

Re: [Neo] LuceneIndexBatchInserter doubt

2009-12-09 Thread Mattias Persson
Hi again, Núria (it was I, Mattias, who asked for the sample code).
Well... the fact that you parse 4 CSV files doesn't really help me set
up a test for this... I mean, how can I know that my test will be
similar to yours? Would it be OK to attach your code/CSV files as
well?

/ Mattias

2009/12/9 Núria Trench :
> Hi Todd,
>
> The sample code creates nodes and relationships by parsing 4 csv files.
> Thank you for trying to trigger this behaviour with this sample.
>
> Núria

Re: [Neo] LuceneIndexBatchInserter doubt

2009-12-09 Thread Núria Trench
Hi Todd,

The sample code creates nodes and relationships by parsing 4 CSV files.
Thank you for trying to trigger this behaviour with this sample.

Núria
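The load Núria describes follows the chunked pattern discussed elsewhere in this thread: rows are read from the CSV files in fixed-size batches so that each group can be committed or flushed together. A JDK-only sketch of that pattern; the class and method names here are illustrative, and the worker callback stands in for the actual Neo4j node/edge creation code:

```java
import java.io.BufferedReader;
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import java.util.function.Consumer;

// Illustrative sketch (not Neo4j code): read a CSV line by line and hand
// the rows to a worker in fixed-size batches, e.g. one transaction or one
// index flush per batch. Returns the number of batches processed.
public class ChunkedCsvLoad {
    static int processInChunks(BufferedReader csv, int chunkSize,
                               Consumer<List<String>> worker) throws Exception {
        List<String> chunk = new ArrayList<>(chunkSize);
        int batches = 0;
        String line;
        while ((line = csv.readLine()) != null) {
            chunk.add(line);
            if (chunk.size() == chunkSize) {
                worker.accept(chunk);   // commit / flush point in the real loader
                chunk.clear();
                batches++;
            }
        }
        if (!chunk.isEmpty()) {         // final partial batch
            worker.accept(chunk);
            batches++;
        }
        return batches;
    }

    public static void main(String[] args) throws Exception {
        String csv = "a;1\nb;2\nc;3\nd;4\ne;5";
        int batches = processInChunks(new BufferedReader(new StringReader(csv)),
                                      2, rows -> {});
        System.out.println(batches);    // 3 batches: 2 + 2 + 1 rows
    }
}
```

Keeping the batch size fixed bounds memory per commit, which is what the earlier "chunks of 1000" discussion in this digest relies on.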

2009/12/9 Mattias Persson 

> Could you provide me with some sample code which can trigger this
> behaviour with the latest index-util-0.9-SNAPSHOT Núria?
Re: [Neo] LuceneIndexBatchInserter doubt

2009-12-09 Thread Mattias Persson
Could you provide me with some sample code which can trigger this
behaviour with the latest index-util-0.9-SNAPSHOT, Núria?

2009/12/9 Núria Trench :
> Todd,
>
> I don't have the same problem. In my case, after indexing all the
> attributes/properties of each node, the application creates all the edges by
> looking up the tail node and the head node. So it calls the method
> "org.neo4j.util.index.LuceneIndexBatchInserterImpl.getSingleNode", which
> returns -1 (no node found) in many cases.
>
> Does anyone have an alternative way to look up a node by its indexed
> attributes/properties?
>
> Thank you,
>
> Núria.

Re: [Neo] LuceneIndexBatchInserter doubt

2009-12-09 Thread Núria Trench
Todd,

I don't have the same problem. In my case, after indexing all the
attributes/properties of each node, the application creates all the edges by
looking up the tail node and the head node. So it calls the method
"org.neo4j.util.index.LuceneIndexBatchInserterImpl.getSingleNode", which
returns -1 (no node found) in many cases.

Does anyone have an alternative way to look up a node by its indexed
attributes/properties?

Thank you,

Núria.
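Mattias's diagnosis quoted further down (entries are indexed, but not visible to getNodes/getSingleNode until the index is flushed, optimized, or shut down) can be illustrated with a toy JDK-only stand-in. None of this is actual Neo4j code; the class is hypothetical and only mimics the write-buffer-vs-reader behaviour described in the thread:

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical stand-in for a batch index writer: entries are buffered and
// only become visible to lookups after flush(), mirroring the behaviour
// reported in this thread for LuceneIndexBatchInserterImpl.
class ToyBatchIndex {
    private final Map<String, Long> visible = new HashMap<>();
    private final List<Map.Entry<String, Long>> pending = new ArrayList<>();

    void index(long nodeId, String key) {
        pending.add(Map.entry(key, nodeId));   // buffered, not yet searchable
    }

    void flush() {                             // analogous to optimize()/shutdown
        for (Map.Entry<String, Long> e : pending) {
            visible.put(e.getKey(), e.getValue());
        }
        pending.clear();
    }

    long getSingleNode(String key) {
        return visible.getOrDefault(key, -1L); // -1 means "no node found"
    }
}

public class FlushDemo {
    public static void main(String[] args) {
        ToyBatchIndex index = new ToyBatchIndex();
        index.index(42L, "name=Alice");
        System.out.println(index.getSingleNode("name=Alice")); // -1: not flushed yet
        index.flush();
        System.out.println(index.getSingleNode("name=Alice")); // 42
    }
}
```

This is why looking up nodes for edge creation before the index has been flushed yields -1 even though the indexing calls themselves succeeded.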


2009/12/7 Mattias Persson 

> Todd, are you sure you have the latest index-util 0.9-SNAPSHOT? This
> is a bug that we fixed yesterday... (assuming it's the same bug).
>
> 2009/12/7 Todd Stavish :
> > Hi Mattias, Núria.
> >
> > I am also running into scalability problems with the Lucene batch
> > inserter at much smaller numbers, 30,000 indexed nodes. I tried
> > calling optimize more. Increasing ulimit didn't help.
> >
> > [INFO] Exception in thread "main" java.lang.RuntimeException:
> > java.io.FileNotFoundException:
> > /Users/todd/Code/neo4Jprototype/target/classes/data/graph/lucene/name/_0.cfx
> > (Too many open files)
> > [INFO]  at org.neo4j.util.index.LuceneIndexBatchInserterImpl.getNodes(LuceneIndexBatchInserterImpl.java:186)
> > [INFO]  at org.neo4j.util.index.LuceneIndexBatchInserterImpl.getSingleNode(LuceneIndexBatchInserterImpl.java:238)
> > [INFO]  at com.collectiveintelligence.QueryNeo.loadDataToGraph(QueryNeo.java:277)
> > [INFO]  at com.collectiveintelligence.QueryNeo.main(QueryNeo.java:57)
> > [INFO] Caused by: java.io.FileNotFoundException:
> > /Users/todd/Code/neo4Jprototype/target/classes/data/graph/lucene/name/_0.cfx
> > (Too many open files)
> >
> > I tried breaking it up into separate batch inserter instances, and it hangs
> > now. Can I create more than one batch inserter per process if they run
> > sequentially and non-threaded?
> >
> > Thanks,
> > Todd
> >
> > On Mon, Dec 7, 2009 at 7:28 AM, Núria Trench 
> wrote:
> >> Hi again Mattias,
> >>
> >> I have tried to execute my application with the last version available in
> >> the maven repository and I still have the same problem. After creating and
> >> indexing all the nodes, the application calls the "optimize" method and,
> >> then, it creates all the edges by calling the method "getNodes" in order to
> >> select the tail and head node of the edge, but it doesn't work because many
> >> nodes are not found.
> >>
> >> I have tried to create only 30 nodes and 15 edges and it works properly, but
> >> if I try to create a big graph (180 million edges + 20 million nodes) it
> >> doesn't.
> >>
> >> I have also tried to call the "optimize" method every time the application
> >> has created 1 million nodes, but it doesn't work.
> >>
> >> Have you tried to create as many nodes as I have said with the newer
> >> index-util version?
> >>
> >> Thank you,
> >>
> >> Núria.
> >>
> >> 2009/12/4 Núria Trench 
> >>
> >>> Hi Mattias,
> >>>
> >>> Thank you very much for fixing the problem so fast. I will try it as soon
> >>> as the new changes are available in the maven repository.
> >>>
> >>> Núria.
> >>>
> >>>
> >>> 2009/12/4 Mattias Persson 
> >>>
>  I fixed the problem and also added a cache per key for faster
>  getNodes/getSingleNode lookup during the insert process. However the
>  cache assumes that there's nothing in the index when the process
>  starts (which almost always will be true) to speed things up even
>  further.
> 
>  You can control the cache size, and whether the cache is used at all, by
>  overriding the following methods (also documented in the Javadoc):
> 
>  boolean useCache()
>  int getMaxCacheSizePerKey()
> 
>  methods in your LuceneIndexBatchInserterImpl instance. The new changes
>  should be available in the maven repository within an hour.
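The per-key cache Mattias describes above can be sketched with one bounded, access-ordered LinkedHashMap per indexed key, so repeated getSingleNode-style lookups during a batch insert avoid hitting the index. This is an illustrative JDK-only sketch under that assumption, not the actual index-util implementation:

```java
import java.util.HashMap;
import java.util.LinkedHashMap;
import java.util.Map;

// Illustrative "cache per key with a max size per key" (not index-util code):
// each indexed key gets its own LRU map from property value to node id.
class PerKeyLookupCache {
    private final int maxSizePerKey;
    private final Map<String, LinkedHashMap<Object, Long>> caches = new HashMap<>();

    PerKeyLookupCache(int maxSizePerKey) {
        this.maxSizePerKey = maxSizePerKey;
    }

    private LinkedHashMap<Object, Long> cacheFor(String key) {
        return caches.computeIfAbsent(key, k ->
            new LinkedHashMap<>(16, 0.75f, true) {   // access-order = LRU
                @Override
                protected boolean removeEldestEntry(Map.Entry<Object, Long> eldest) {
                    return size() > maxSizePerKey;   // evict least recently used
                }
            });
    }

    void put(String key, Object value, long nodeId) {
        cacheFor(key).put(value, nodeId);
    }

    // Cached node id, or -1 if this value must be looked up in the index.
    long get(String key, Object value) {
        return cacheFor(key).getOrDefault(value, -1L);
    }
}
```

Bounding the cache per key keeps memory predictable even when one key (say, "name") has millions of distinct values, at the cost of falling back to the index for evicted entries.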
> 
>  2009/12/4 Mattias Persson :
>  > I think I found the problem... it's indexing as it should, but it
>  > isn't reflected in getNodes/getSingleNode properly until you
>  > flush/optimize/shutdown the index. I'll try to fix it today!
>  >
>  > 2009/12/3 Núria Trench :
>  >> Thank you very much for your response.
>  >> If you need more information, you only have to send an e-mail and I will
>  >> try to explain it better.
>  >>
>  >> Núria.
>  >>
>  >> 2009/12/3 Mattias Persson 
>  >>
>  >>> This is something I'd like to reproduce and I'll do some testing on
>  >>> this tomorrow
>  >>>
>  >>> 2009/12/3 Núria Trench :
>  >>> > Hello,
>  >>> >
>  >>> > Last week, I decided to download your graph database core in order to
>  >>> > use it. First, I created a new project to parse my CSV files and create
>  >>> > a new graph database with Neo4j. These CSV files contain 150 million
>  >>> > edges and 20 million nodes.
>  >>> >
>  >>> > When I finished writing the code which will create the graph database, I
>  >>> > executed it and, after six