Re: [Neo4j] LuceneIndexService: NoSuchMethodError
As it turns out, this didn't resolve my problem, but actually broke the other part of the application that was using Lucene. The error is as follows:

Caused by: java.lang.NoSuchMethodError: org.apache.lucene.search.IndexSearcher.search(Lorg/apache/lucene/search/Query;)Lorg/apache/lucene/search/Hits;

In a previous message to this list, it was said that the Hits class, which causes the NoSuchMethodError, was copied into the index-core artifact, so it should be compatible with Lucene 3.0. Is there a way to make sure that the VM resolves to the included version of Hits rather than looking for it in the Lucene library, where it no longer exists?

So far the only solution I've seen is to revert to Lucene 2.9.2, which unfortunately I don't have the flexibility to do. Is there another way to deal with this? I'd rather not have to edit the indexing component and build from source, but it seems like it might come to this.

Thanks,
Alex
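One way to narrow down a clash like this is to ask the JVM which jar each class was actually loaded from. The following is a minimal sketch, not part of the original thread: ClassOrigin and describe are hypothetical names, and the code relies only on the standard java.lang.Class calls getProtectionDomain() and getCodeSource().

```java
import java.security.CodeSource;

class ClassOrigin {
    /** Returns "<class name> <- <location it was loaded from>". */
    static String describe(Class<?> type) {
        CodeSource source = type.getProtectionDomain().getCodeSource();
        // Classes loaded by the bootstrap loader typically report no code source.
        String origin = (source == null) ? "bootstrap or unknown" : String.valueOf(source.getLocation());
        return type.getName() + " <- " + origin;
    }

    /** Same as above, but looks the class up by name without initializing it. */
    static String describe(String className) {
        try {
            return describe(Class.forName(className, false, ClassOrigin.class.getClassLoader()));
        } catch (ClassNotFoundException e) {
            return className + " <- not on classpath";
        }
    }

    public static void main(String[] args) {
        // The Lucene class names are taken from the stack trace in this thread.
        System.out.println(describe("org.apache.lucene.search.IndexSearcher"));
        System.out.println(describe("org.apache.lucene.search.Hits"));
    }
}
```

Run from inside the application server, this shows which Lucene jar wins. Note that the error names a method on IndexSearcher, so the origin of IndexSearcher is probably the more telling of the two lines.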
Re: [Neo4j] LuceneIndexService: NoSuchMethodError
For future reference, if anybody else is in the situation I was in, one solution is to package up your library with dependency classes rolled up into the jar file. The following, added to the pom.xml under plugins, accomplishes this:

  <plugin>
    <artifactId>maven-assembly-plugin</artifactId>
    <executions>
      <execution>
        <phase>package</phase>
        <goals>
          <goal>attached</goal>
        </goals>
      </execution>
    </executions>
    <configuration>
      <descriptorRefs>
        <descriptorRef>jar-with-dependencies</descriptorRef>
      </descriptorRefs>
    </configuration>
  </plugin>

Alex

On Thu, Aug 5, 2010 at 8:25 PM, Alex D'Amour adam...@iq.harvard.edu wrote:

Hi all,

I'm getting this same error in an environment where a class I implemented using Neo4j and the LuceneIndexService is called by an application that's using Lucene 3.0. The application server is unfortunately above my abstraction layer (I'm just implementing the back end in Neo4j), so I can't change the version of Lucene that it's including.

Previous messages have suggested that the indexing component should work using Lucene 3.0, but is there an easy way for me to remove this version conflict without manually editing the pom.xml in the indexing component source?

Thanks,
Alex

On Mon, Aug 2, 2010 at 11:07 AM, Peter Neubauer peter.neuba...@neotechnology.com wrote:

Max,

since you are using neo4j-index, you should not be importing Lucene again, since it already is a dependency of the index components (and I think the version is higher there). So, upgrading to 1.1 and removing the Lucene dependency should fix it:

  <dependency>
    <groupId>org.neo4j</groupId>
    <artifactId>neo4j-kernel</artifactId>
    <version>1.1</version>
  </dependency>
  <dependency>
    <groupId>org.neo4j</groupId>
    <artifactId>neo4j-index</artifactId>
    <version>1.1</version>
  </dependency>

Cheers,

/peter neubauer

COO and Sales, Neo Technology
GTalk: neubauer.peter
Skype peter.neubauer
Phone +46 704 106975
LinkedIn http://www.linkedin.com/in/neubauer
Twitter http://twitter.com/peterneubauer

http://www.neo4j.org - Your high performance graph database.
http://www.thoughtmade.com - Scandinavia's coolest Bring-a-Thing party.

On Mon, Aug 2, 2010 at 3:11 PM, Max Jakob max.ja...@fu-berlin.de wrote:

Hi Peter,

this sounds like a version clash on Lucene. Can you check what version(s) of Lucene (and Neo4j-Index) you are running in the two scenarios?

That would make sense to me as well. But like I said, on the first run, the method is found. Running the exact same code a second time, without any changes, it complains that the method is not found. (?!)

Here are the versions I use (for both runs) from my pom.xml:

  <dependency>
    <groupId>org.neo4j</groupId>
    <artifactId>neo4j-kernel</artifactId>
    <version>1.1-SNAPSHOT</version>
  </dependency>
  <dependency>
    <groupId>org.neo4j</groupId>
    <artifactId>neo4j-index</artifactId>
    <version>1.1-SNAPSHOT</version>
  </dependency>
  <dependency>
    <groupId>org.apache.lucene</groupId>
    <artifactId>lucene-highlighter</artifactId>
    <version>2.9.1</version>
  </dependency>

Cheers,
Max

On Mon, Aug 2, 2010 at 3:01 PM, Peter Neubauer peter.neuba...@neotechnology.com wrote:

Max,

this sounds like a version clash on Lucene. Can you check what version(s) of Lucene (and Neo4j-Index) you are running in the two scenarios?

Cheers,

/peter neubauer

On Mon, Aug 2, 2010 at 2:38 PM, Max Jakob max.ja...@fu-berlin.de wrote:

Hi,

I have a problem with the LuceneIndexService. When I create an indexed graph base and I commit it to disk, next time I want to use it, I get a NoSuchMethodError for LuceneIndexService.getSingleNode:

Exception in thread "main" java.lang.NoSuchMethodError: org.apache.lucene.search.IndexSearcher.search(Lorg/apache/lucene/search/Query;)Lorg/apache/lucene/search/Hits;
    at org.neo4j.index.lucene.LuceneIndexService.searchForNodes(LuceneIndexService.java:430)
    at org.neo4j.index.lucene.LuceneIndexService.getNodes(LuceneIndexService.java:310)
    at org.neo4j.index.lucene.LuceneIndexService.getSingleNode(LuceneIndexService.java:469)
    at org.neo4j.index.lucene.LuceneIndexService.getSingleNode(LuceneIndexService.java:461)

To illustrate this in more detail: if I run the code below for the first time, everything goes fine. On a second run I get the exception. Could somebody give me a hint where I'm going wrong? (re-indexing does not work
[Neo4j] Attributes or Relationship Check During Traversal
Hello all,

I have a question regarding traversals over a large graph when that traversal depends on a discretely valued attribute of the nodes being traversed. As a small example, the nodes in my graph can have 2 states -- on and off. I'd like to traverse over paths that consist only of active ("on") nodes. Since this state attribute can only take 2 values, I see two possible approaches to implementing this:

1) Use node properties, and have the PruneEvaluator and filter Predicate check to see whether the current endNode has a property called "on".

2) Create a state node which represents the on state. Have all nodes that are in the on state have a relationship of type STATE_ON incoming from the on node. Have the PruneEvaluator and filter Predicate check whether the node has a single relationship of type STATE_ON, INCOMING.

Which is closer to what we might consider best practices for Neo4j?

The problem I see in implementation 1 is that the traversal has to hit the property store, which could slow things down. The problem with 2 is that there can be up to #nodes relationships coming from the on state node, and making this more efficient by setting up a tree of on state nodes seems to be manually replicating something that the indexing service has already accomplished.

Also, how efficiently would each of these two implementations exploit caching (or is this irrelevant)? Finally, would your answer change if we generalized this to a larger number of categories?

Thanks,
Alex

___
Neo4j mailing list
User@lists.neo4j.org
https://lists.neo4j.org/mailman/listinfo/user
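Both approaches above share the same traversal logic and differ only in how the "is this node on?" check is answered. The sketch below illustrates that on plain Java collections rather than the Neo4j API. StateFilteredTraversal, reachableOnNodes and the adjacency map are hypothetical stand-ins, and the predicate plays the role of the PruneEvaluator/filter check.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Deque;
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.function.Predicate;

class StateFilteredTraversal {
    /**
     * Breadth-first traversal that only follows paths made of "on" nodes.
     * The isOn predicate could be backed by a property lookup (approach 1)
     * or by a relationship/set membership check (approach 2).
     */
    static List<Long> reachableOnNodes(long start, Map<Long, List<Long>> adjacency, Predicate<Long> isOn) {
        List<Long> visitedInOrder = new ArrayList<>();
        if (!isOn.test(start)) {
            return visitedInOrder; // an "off" start node yields no paths at all
        }
        Set<Long> seen = new HashSet<>();
        Deque<Long> queue = new ArrayDeque<>();
        seen.add(start);
        queue.add(start);
        while (!queue.isEmpty()) {
            long current = queue.poll();
            visitedInOrder.add(current);
            for (Long next : adjacency.getOrDefault(current, List.of())) {
                // Prune: never step onto an "off" node, so nothing behind it is explored.
                if (isOn.test(next) && seen.add(next)) {
                    queue.add(next);
                }
            }
        }
        return visitedInOrder;
    }
}
```

With the check isolated behind one predicate, the two storage choices can be swapped and timed against each other without touching the traversal itself.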
[Neo4j] Stability of Iterators Across Transactions?
Hello all,

I have an application where I have a node that has several hundred thousand relationships (this probably needs to be changed). In the application I iterate over these relationships, and delete a large subset of them. Because there are so many writes, I want to commit the transaction every few thousand deletions. The problem is that the getAllRelationships iterator seems to halt after the first transaction commit.

Clearly, I should reduce the number of relationships that are connected to this node, but is this the expected behavior? Should iterators be made stable across transactions, or are they only supposed to be guaranteed within a transaction?

Thanks,
Alex
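Independently of how the iterator is meant to behave, one possible workaround is to finish iterating before any commit happens: take a snapshot of what should be deleted, then delete it batch by batch. The sketch below shows only that pattern on plain Java collections. BatchedDeletion and deleteInBatches are hypothetical names, and the deleteAndCommit callback stands in for "open a transaction, delete this batch, commit". In Neo4j the snapshot would presumably hold relationship IDs rather than live objects, which is an assumption here.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.Consumer;
import java.util.function.Predicate;

class BatchedDeletion {
    /**
     * Pass 1 drains the live iterator into a snapshot; pass 2 hands the snapshot
     * to the callback in fixed-size batches. Returns the number of batches.
     */
    static <T> int deleteInBatches(Iterable<T> live, Predicate<T> shouldDelete,
                                   int batchSize, Consumer<List<T>> deleteAndCommit) {
        if (batchSize <= 0) {
            throw new IllegalArgumentException("batchSize must be positive");
        }
        // Pass 1: no deletes or commits happen while the live iterator is open.
        List<T> doomed = new ArrayList<>();
        for (T item : live) {
            if (shouldDelete.test(item)) {
                doomed.add(item);
            }
        }
        // Pass 2: each batch is an independent unit of work.
        int batches = 0;
        for (int from = 0; from < doomed.size(); from += batchSize) {
            int to = Math.min(from + batchSize, doomed.size());
            deleteAndCommit.accept(new ArrayList<>(doomed.subList(from, to)));
            batches++;
        }
        return batches;
    }
}
```

The cost is holding the snapshot in memory, which for a few hundred thousand IDs should be modest.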
Re: [Neo4j] Question about labelling all connected components
One other option is to have a set of nodes, each of which represents a component. You can create a relationship of type OWNS (or whatever) to each of the nodes of a given component. This makes component lookup rather simple (just grab the node that represents the component, then traverse all of the OWNS relationships), and it makes merging components rather simple if you end up having two components that get linked to each other while the graph is evolving (transfer all of the relationships from one to the other).

On Sat, Jul 24, 2010 at 1:10 PM, Mattias Persson matt...@neotechnology.com wrote:

2010/7/23 Arijit Mukherjee ariji...@gmail.com

Thanx to both of you. Yes, I can just check whether the label exists on the node or not. In my case checking for Integer.MIN_VALUE which is what is assigned when the subscriber node is created.

To assign a temporary value (or a value representing the state "not assigned") seems unnecessary. A better way would be to not set that property on creating a node and then use:

    node.getProperty( "whatever key", Integer.MIN_VALUE );

when getting that property.

BTW - is it ever possible to label the components while creating the graph? I can't think of any way of doing this - but I might be missing something...

Regards
Arijit

On 22 July 2010 20:54, Vitor De Mario vitordema...@gmail.com wrote:

As far as the algorithm goes, I see nothing wrong. Connected components is a well known problem in graph theory, and you're doing just fine. I second the recommendations of Tobias, especially the second one, as you would get rid of the labelled collection completely, and that improves both time and memory.

[]'s
Vitor

On Thu, Jul 22, 2010 at 11:35 AM, Tobias Ivarsson tobias.ivars...@neotechnology.com wrote:

The first obvious thing is that labelled.contains(currentNode.getId()) is going to take more time as your dataset grows, since it's a linear search for the element in an ArrayList. A HashSet would be a much more appropriate data structure for your application.

The other thing that comes to mind is the memory overhead of the labelled-collection. Eventually it is going to contain every node in the graph, and be very large. This steals some of the memory that could have been used for caching the graph, forcing Neo4j to do more I/O than it would have if it could have used that memory for cache.

Would it be possible for you to replace the !labelled.contains(currentNode.getId())-check with currentNode.getProperty("componentID", null) == null? Or are there situations where the node could have that property and not be considered labeled?

Cheers,
Tobias

On Thu, Jul 22, 2010 at 3:35 PM, Arijit Mukherjee ariji...@gmail.com wrote:

Hi All

I'm trying to label all connected components in a graph - i.e. all nodes that are connected will have a common "componentID" property set. I'm using the Traverser to do this. For each node in the graph (unless it is already labelled, which I track by inserting the node ID in a list), the traverser finds out all the neighbours using BFS, and then the node and all the neighbours are labelled with a certain value. The code is something like this -

    Iterable<Node> allNodes = graphDbService.getAllNodes();
    ArrayList<Long> labelled = new ArrayList<Long>();
    for (Node currentNode : allNodes) {
        if (currentNode.hasProperty("number") && !labelled.contains(currentNode.getId())) {
            Traverser traverser = currentNode.traverse(Order.BREADTH_FIRST,
                    StopEvaluator.END_OF_GRAPH,
                    ReturnableEvaluator.ALL_BUT_START_NODE,
                    RelTypes.CALLS, Direction.BOTH);
            int currentID = initialID;
            initialID++;
            currentNode.setProperty("componentID", currentID);
            labelled.add(currentNode.getId());
            for (Node friend : traverser) {
                friend.setProperty("componentID", currentID);
                // mark each node as labelled
                labelled.add(friend.getId());
            }
        }
    }

This works well for a small graph (2000 nodes). But for a graph of about 1 million nodes, this is taking about 45 minutes on a 64-bit Intel 2.3GHz CPU, 4GB RAM (Java 1.6 update 21 and Neo4J 1.0). Is this normal? Or is the code I'm using faulty? Is there any other way to label the connected components?

Regards
Arijit

--
And when the night is cloudy,
There is still a light that shines on me,
Shine on until tomorrow, let it be.
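The suggestion in this thread (drop the separate ArrayList and let the label itself say whether a node has been visited) can be sketched as follows. This is an illustration on plain Java collections, not Neo4j code: ComponentLabeller is a hypothetical name, the adjacency map is assumed to list each undirected edge from both endpoints, and the returned map stands in for the componentID property.

```java
import java.util.ArrayDeque;
import java.util.Deque;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

class ComponentLabeller {
    /** Assigns one component ID per connected component; the map doubles as the "already labelled" check. */
    static Map<Long, Integer> label(Map<Long, List<Long>> adjacency) {
        Map<Long, Integer> componentOf = new HashMap<>();
        int nextId = 0;
        for (Long start : adjacency.keySet()) {
            if (componentOf.containsKey(start)) {
                continue; // constant-time check instead of a linear ArrayList scan
            }
            int id = nextId++;
            Deque<Long> queue = new ArrayDeque<>();
            componentOf.put(start, id);
            queue.add(start);
            while (!queue.isEmpty()) {
                Long current = queue.poll();
                for (Long neighbour : adjacency.getOrDefault(current, List.of())) {
                    // putIfAbsent returns null only the first time a node is labelled.
                    if (componentOf.putIfAbsent(neighbour, id) == null) {
                        queue.add(neighbour);
                    }
                }
            }
        }
        return componentOf;
    }
}
```

Each node is labelled once and each adjacency list is scanned once, so the work grows linearly with nodes plus edges rather than quadratically with the number of labelled nodes.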
Re: [Neo] Indexing Relationships?
First of all, I'd like to thank everybody who chimed in on this thread with their own use cases.

To answer Tobias' original question, I have two examples of use cases that we want to support.

In the first example, we'd like to represent the coauthorship network of inventors who hold patents registered in the United States, so each node is an author and each relationship is a coauthorship of a patent. We are particularly interested in how changes in authors' location or employment connect otherwise disjoint sets of coauthors (e.g. somebody moves from HP to Canon and provides a potential source of collaboration between engineers at these firms). Because a node's employer or location state changes with time, we choose to store this information in the relationships instead of the nodes. Often we are interested in collaborations that took place in a certain region during a certain timespan -- in this case, it makes sense to query the relationships.

In another example, we have data on conflicts between countries that we would like to represent as a network. Each node is a country and each relationship indicates that two countries had a conflict at a particular time. We have time data stored in the relationships, and we would like to query for conflicts that occurred during a particular timeframe.

In general, we want to provide data hosting for social scientists who have dyadic relational data. We'd like for them to be able to upload a graphml file that encodes their property graph, and then allow other researchers to query that graph for particular nodes and edges and download that subgraph. In particular, we would like the kind of per-edge and per-node querying support found in network analysis packages like igraph.

I realize that in most cases the relationship-querying problem can be solved with better domain modeling. In an extreme case we could make the graph bipartite and simply have each relationship be intercepted by a node that holds the properties that we would like to query. Of course, in many cases it would be possible to consolidate these relationship nodes or the properties they contain into nodes that represent other entities (a patent, a year, a place) and reduce the number of relationships in the graph. However, this is hard to generalize and in certain cases makes the storage engine much less efficient (2 times the edges, plus a new node for each edge in the worst case). Still, in requesting this feature, I should mention that we have also prototyped ways to make this relationship consolidation more general, but that it seems less straightforward than extending the indexing engine to include relationships.

It is possible that this very flat graph representation isn't a priority for the Neo4j team and that our use of the database is an abuse of the system, but this is the type of structure that network researchers have the most interest in, and Neo4j is by far the best database available to store it. And while network analysis techniques will eventually adapt to more advanced representations, they currently rely on the much flatter structure of one type of node and one type of relationship. Allowing this kind of indexing on relationships would greatly enhance Neo4j's usability in the network analysis community, and perhaps begin to push researchers to explore the properties of more complex graphs that Neo4j is capable of representing.

Thanks very much,
Alex D'Amour

On Sat, May 15, 2010 at 2:49 PM, Niels Hoogeveen pd_aficion...@hotmail.com wrote:

I use relationships to encode paths in the graph based on the meta model. For example:

Class(Article) -- Relationship(Author) -- Class(User) -- Property(Username)

Right now I encode this using an md5 encoding of the above path, add a property to the first entity in the path, using the md5 encoding as the key (the value is irrelevant), and relationships (with a DynamicRelationshipType with a name equal to the md5 key) are used to link the various items in the path.

Finding the path requires a traversal from the first Class node in the path, following the given relationships. This traversal can potentially be expensive when a class has many instances (all have a relationship to the class).

If relationships were indexed, the path could be encoded by giving each relationship making up the path a property encoding the path, then using the index to retrieve all relationships making up the path and laying those relationships head to toe to construct the path. A traversal would no longer be necessary, and the cost of the operation would depend only on the number of elements in the path, not on the number of relationships one of the elements in the path can potentially have.

Niels

From: tobias.ivars...@neotechnology.com
Date: Sat, 15 May 2010 13:32:36 +0200
To: user@lists.neo4j.org
Subject: Re: [Neo] Indexing Relationships?

There is no indexing component for Relationships and there has never been one. The interesting question
[Neo] Iterator over Relationships?
Hello all,

Is there an easy way to create an iterator over all relationships stored in a database? There's database.getAllNodes(). Why isn't there a getAllRelationships() method? There appears to have been one in the past, but it looks like it's protected now. Is there a specific reason why this might be a bad idea?

If so, would simply iterating over the nodes, and getting edges that way, be substantially faster despite touching each edge twice?

Thanks,
Alex
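The "iterate over the nodes and collect their edges" idea can avoid duplicates without a separate de-duplication set: every relationship has exactly one start node, so keeping an edge only when it is seen from its start node emits it once. The sketch below models this on plain Java collections. AllRelationships, Edge and once are hypothetical names, and in Neo4j the equivalent would presumably be to ask each node for its outgoing relationships only.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;

class AllRelationships {
    /** Minimal stand-in for a relationship: an ID plus its two endpoints. */
    static final class Edge {
        final long id;
        final long start;
        final long end;

        Edge(long id, long start, long end) {
            this.id = id;
            this.start = start;
            this.end = end;
        }
    }

    /**
     * incidentByNode maps each node to all edges touching it (both directions),
     * so every non-loop edge appears under two nodes. Only the copy found under
     * its start node is kept, so each edge is returned exactly once.
     */
    static List<Edge> once(Map<Long, List<Edge>> incidentByNode) {
        List<Edge> result = new ArrayList<>();
        for (Map.Entry<Long, List<Edge>> entry : incidentByNode.entrySet()) {
            long node = entry.getKey();
            for (Edge edge : entry.getValue()) {
                if (edge.start == node) {
                    result.add(edge);
                }
            }
        }
        return result;
    }
}
```

Asking only for outgoing edges in the first place would skip the second visit entirely, instead of filtering it out afterwards as this model does.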
Re: [Neo] Indexing on Doubles in Neo4j
Rick,

Thanks for the suggestion, but we need a wider range of measures than those supplied here, so I'm following the TinkerPop team's lead and implementing the JUNG Graph interface, but using traversers under the hood. JUNG has a much fuller set of algorithms and seems to be more actively supported.

However, we'll be dealing with network data that have real-valued covariates, and we'd prefer not to throw away information.

Thanks,
Alex

On Mon, Mar 22, 2010 at 5:17 PM, Rick Bullotta rick.bullo...@burningskysoftware.com wrote:

Maybe this?

http://components.neo4j.org/neo4j-graph-algo/apidocs/org/neo4j/graphalgo/centrality/package-summary.html

-----Original Message-----
From: user-boun...@lists.neo4j.org [mailto:user-boun...@lists.neo4j.org] On Behalf Of Alex D'Amour
Sent: Monday, March 22, 2010 5:06 PM
To: Neo user discussions
Subject: [Neo] Indexing on Doubles in Neo4j

Hello all,

I'm working on an application where it would be nice to perform lookups on a graph database based on real-valued properties. For example, if I have a social network, and have assigned real-valued centrality measures to each node, I'd like to be able to choose all vertices whose centrality measure is greater than some threshold.

I see that the Timeline index service offers this for integer-valued properties. Is there something similar (or in the pipeline) for doing the same with real-valued properties? Is there an easy way to adapt one of the current indexing utilities to do this (besides multiplying by 10^n for sufficiently large n and then rounding)?

Thanks,
Alex D'Amour
Harvard Institute for Quantitative Social Science
Re: [Neo] Indexing on Doubles in Neo4j
Craig,

Please keep me (or just the list) updated on this.

Thanks,
Alex

On Tue, Mar 23, 2010 at 5:43 AM, Craig Taverner cr...@amanzi.com wrote:

Last year we wrote a multi-dimensional index for floats, similar in principle to the timeline index, but working on multiple floats (and doubles). We used it to index locations. Now we are hoping to include the same concepts in the new Neo4j Spatial (http://wiki.neo4j.org/content/Neo4j_Spatial) project. Even though this is targeting map data, it seems viable for any float/double property index. We hope to have some usable code for this within the next few weeks.

On Mon, Mar 22, 2010 at 10:13 PM, Rick Bullotta rick.bullo...@burningskysoftware.com wrote:

Alex, due to floating point precision issues, you might be best off determining some type of integral rounded or scaled key as you suggest. If you end up using the Lucene indexing engine, you'd probably want to do something like this anyway, since indexing is string-based under the hood.

That said, I wonder if any of the graph algos available for Neo could be used to determine centrality during traversal rather than storing it statically?

-----Original Message-----
From: user-boun...@lists.neo4j.org [mailto:user-boun...@lists.neo4j.org] On Behalf Of Alex D'Amour
Sent: Monday, March 22, 2010 5:06 PM
To: Neo user discussions
Subject: [Neo] Indexing on Doubles in Neo4j

Hello all,

I'm working on an application where it would be nice to perform lookups on a graph database based on real-valued properties. For example, if I have a social network, and have assigned real-valued centrality measures to each node, I'd like to be able to choose all vertices whose centrality measure is greater than some threshold.

I see that the Timeline index service offers this for integer-valued properties. Is there something similar (or in the pipeline) for doing the same with real-valued properties? Is there an easy way to adapt one of the current indexing utilities to do this (besides multiplying by 10^n for sufficiently large n and then rounding)?

Thanks,
Alex D'Amour
Harvard Institute for Quantitative Social Science
[Neo] Indexing on Doubles in Neo4j
Hello all,

I'm working on an application where it would be nice to perform lookups on a graph database based on real-valued properties. For example, if I have a social network, and have assigned real-valued centrality measures to each node, I'd like to be able to choose all vertices whose centrality measure is greater than some threshold.

I see that the Timeline index service offers this for integer-valued properties. Is there something similar (or in the pipeline) for doing the same with real-valued properties? Is there an easy way to adapt one of the current indexing utilities to do this (besides multiplying by 10^n for sufficiently large n and then rounding)?

Thanks,
Alex D'Amour
Harvard Institute for Quantitative Social Science
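One alternative to scaling by 10^n and rounding is to map each double to a long that sorts in the same order, which loses no precision and could then feed any long-keyed index. The sketch below uses the standard IEEE 754 bit trick (flip the lower 63 bits of negative values) and a TreeMap as a stand-in for the index. SortableDoubleIndex and its methods are hypothetical names, not part of Neo4j or the Timeline index, and NaN gets no special handling.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.TreeMap;

class SortableDoubleIndex {
    // Sorted map from encoded value to node IDs; a stand-in for a long-keyed index.
    private final TreeMap<Long, List<Long>> index = new TreeMap<>();

    /** Encodes a double as a long whose signed ordering matches the double ordering. */
    static long toSortableLong(double value) {
        long bits = Double.doubleToLongBits(value);
        // For negative values (sign bit set) flip the other 63 bits; leave positives untouched.
        return bits ^ ((bits >> 63) & Long.MAX_VALUE);
    }

    /** Inverse of toSortableLong; the same XOR undoes the encoding. */
    static double fromSortableLong(long key) {
        return Double.longBitsToDouble(key ^ ((key >> 63) & Long.MAX_VALUE));
    }

    void add(long nodeId, double value) {
        index.computeIfAbsent(toSortableLong(value), k -> new ArrayList<>()).add(nodeId);
    }

    /** Returns node IDs whose value is strictly greater than the threshold, in ascending value order. */
    List<Long> greaterThan(double threshold) {
        List<Long> result = new ArrayList<>();
        for (List<Long> ids : index.tailMap(toSortableLong(threshold), false).values()) {
            result.addAll(ids);
        }
        return result;
    }
}
```

For the centrality example, each node would be added with its centrality score and greaterThan(threshold) would return the matching node IDs.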