Re: [Neo4j] Sampling a Neo4j instance?
Would this work in HA mode too (i.e. HighlyAvailableGraphDatabase)? I can see that the 'getConfig' is there -- but does the cast to NeoStoreXaDataSource work as well? Thanks. Date: Wed, 16 Nov 2011 21:40:32 +0200 From: chris.gio...@neotechnology.com To: user@lists.neo4j.org Subject: Re: [Neo4j] Sampling a Neo4j instance? No, GraphDatabaseService wisely hides those things away. I would suggest using instanceof and casting to EmbeddedGraphDatabase. cheers, CG 2011/11/16 Anders Lindström andli...@hotmail.com: Chris, thanks again for your replies. I realize now that I don't have the 'getConfig' method -- I'm writing a server plugin and I only get the GraphDatabaseService interface passed to my method, not an EmbeddedGraphDatabase. Is there an equivalent way of getting the highest node index through the interface? Thanks. Date: Thu, 10 Nov 2011 12:01:31 +0200 From: chris.gio...@neotechnology.com To: user@lists.neo4j.org Subject: Re: [Neo4j] Sampling a Neo4j instance? Answers inline. 2011/11/9 Anders Lindström andli...@hotmail.com: Thanks to both of you. I am very grateful that you took the time to put this into code -- how's that for community! I presume this way of getting 'highId' is constant in time? It looks rather messy though -- is it really the most straightforward way to do it? This is the safest way to do it, since it takes into consideration crashes and HA cluster membership. Another way to do it is long highId = db.getConfig().getIdGeneratorFactory().get( IdType.NODE ).getHighId(); which can return the same value as the first, if some conditions are met. It is shorter and cast-free but I'd still use the first way. getHighId() is a constant-time operation for both ways described - it is just a field access, with an additional long comparison for the first case. I am thinking about how efficient this will be. As I understand it, the sampling misses come from deleted nodes that once were there.
But if I remember correctly, Neo4j tries to reuse these unused node indices when new nodes are added. But is an unused node index _guaranteed_ to be used given that there is one, or could inserting another node result in increasing 'highId' even though some indices below it are not used? During the lifetime of a Neo4j instance there is no id reuse for Nodes and Relationships - deleted ids are saved however and will be reused the next time Neo4j starts. This means that if during run A you deleted nodes 3 and 5, the first two nodes returned by createNode() on the next run will have ids 3 and 5 - so highId will not change. Additionally, during run A, after deleting nodes 3 and 5, no new nodes would have the id 3 or 5. A crash (or improper shutdown) of the database will break this however, since the ids-to-recycle will probably not make it to disk. So, in short, it is guaranteed that ids *won't* be reused in the same run but not guaranteed to be reused between runs. My conclusion is that the sampling misses will increase with index usage sparseness and that we will have a high rate of sampling misses when we had many deletes and few insertions recently. Would you agree? Yes, that is true, especially given the cost of the wasted I/O and of handling the exception. However, this cost can go down significantly if you keep a hash set for the ids of nodes you have deleted and check that before asking for the node by id, instead of catching an exception. Persisting that between runs would move you away from encapsulated Neo4j constructs and would also be more efficient. Thanks again. Regards,Anders Date: Wed, 9 Nov 2011 19:30:36 +0200 From: chris.gio...@neotechnology.com To: user@lists.neo4j.org Subject: Re: [Neo4j] Sampling a Neo4j instance? 
Hi, Backing Jim's algorithm with some code:

public static void main( String[] args )
{
    long SAMPLE_SIZE = 1;
    EmbeddedGraphDatabase db = new EmbeddedGraphDatabase( "path/to/db/" );
    // Determine the highest possible id for the node store
    long highId = ( (NeoStoreXaDataSource) db.getConfig().getTxModule()
            .getXaDataSourceManager()
            .getXaDataSource( Config.DEFAULT_DATA_SOURCE_NAME ) )
            .getNeoStore().getNodeStore().getHighId();
    System.out.println( highId + " is the highest id" );
    long i = 0;
    long nextId;
    // Do the sampling
    Random random = new Random();
    while ( i < SAMPLE_SIZE )
    {
        nextId = Math.abs( random.nextLong() ) % highId;
        try
        {
            db.getNodeById( nextId );
            i++;
            System.out.println( "id " + nextId + " is there" );
        }
        catch ( NotFoundException e )
        {
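The deleted-id set optimization Chris suggests later in this thread (keep a hash set of deleted ids and check it before asking for the node) can be sketched without touching the database API. The class and method names below are hypothetical, and the real lookup (db.getNodeById) is replaced by a comment; the point is only that a cheap set lookup replaces the exception-driven miss handling:

```java
import java.util.HashSet;
import java.util.Random;
import java.util.Set;

public class SampleWithDeletedSet {

    // Draw random ids below highId, skipping known-deleted ids with a cheap
    // set lookup instead of catching NotFoundException on every miss.
    static int sample(long highId, Set<Long> deletedIds, int sampleSize) {
        Random random = new Random();
        int hits = 0;
        while (hits < sampleSize) {
            long nextId = Math.abs(random.nextLong()) % highId;
            if (deletedIds.contains(nextId)) {
                continue; // a real miss would have cost an exception plus I/O
            }
            // in real code: db.getNodeById(nextId)
            hits++;
        }
        return hits;
    }

    public static void main(String[] args) {
        Set<Long> deleted = new HashSet<>();
        deleted.add(3L);
        deleted.add(5L);
        System.out.println(sample(10, deleted, 4));
    }
}
```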
[Neo4j] best way to get all directly related nodes?
Hi everybody, what is the most performant way to get all directly related nodes? I know that there are the following possibilities:
- node.getRelationships()
- node.traverse(StopEvaluator.DEPTH_ONE)
- Cypher
In the first two cases I get the Relationship and still have to do relationship.getEndNode(), which seems to me like (a little) overhead. Naturally, I want to use the most performant way to realise the task. However, I am always puzzled about which way to use. Can someone please provide me some numbers or even a theoretical explanation? Thanks, Didi ___ Neo4j mailing list User@lists.neo4j.org https://lists.neo4j.org/mailman/listinfo/user
Re: [Neo4j] best way to get all directly related nodes?
On Fri, Nov 18, 2011 at 10:56 AM, D. Frej dieter_f...@gmx.net wrote: Hi everybody, what is the most performant way to get all directly related nodes? I know that there are the following possibilities:
- node.getRelationships()
- node.traverse(StopEvaluator.DEPTH_ONE)
- Cypher
In the first two cases I get the Relationship and still have to do relationship.getEndNode(), which seems to me like (a little) overhead. [...] I'm pretty sure that the core API is the most performant one. So, the first option should be the fastest. The traversal is next, and Cypher is the slowest. The way I see it: the trade-off is how much work the database does for you versus how much you have to do yourself. Cypher is an abstraction layer built on top of traversals and the core API. It can do more things than the core API, but you pay for this with extra CPU cycles. Andrés
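A side note on the getEndNode() overhead Didi mentions: the core API also offers Relationship.getOtherNode(node), which returns the node on the far side of the relationship regardless of direction. The toy classes below are stand-ins (not the real org.neo4j.graphdb types) that just mimic that contract:

```java
public class OtherNodeDemo {

    // Toy stand-ins for Neo4j's Node/Relationship, purely to illustrate
    // the direction-agnostic getOtherNode contract.
    static class Node {
        final long id;
        Node(long id) { this.id = id; }
    }

    static class Rel {
        final Node start, end;
        Rel(Node start, Node end) { this.start = start; this.end = end; }

        // Mirrors Relationship#getOtherNode: returns the far side whichever
        // end you pass in, so callers need not care about direction.
        Node getOtherNode(Node node) {
            if (node == start) return end;
            if (node == end) return start;
            throw new IllegalArgumentException("node not part of this relationship");
        }
    }

    public static void main(String[] args) {
        Node a = new Node(1), b = new Node(2);
        Rel r = new Rel(a, b);
        System.out.println(r.getOtherNode(a).id); // 2
        System.out.println(r.getOtherNode(b).id); // 1
    }
}
```

With the real API, node.getRelationships() plus rel.getOtherNode(node) covers the "all direct neighbours" case without caring whether the node is the start or end of each relationship.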
[Neo4j] Max flow using gremlin
Dear all, has anyone implemented any of the max flow algorithms using gremlin? Alfredas
Re: [Neo4j] Batch Insert : poooor performance
Anyone? -- View this message in context: http://neo4j-community-discussions.438527.n3.nabble.com/Batch-Insert-pr-performance-tp3513211p3518340.html Sent from the Neo4j Community Discussions mailing list archive at Nabble.com.
Re: [Neo4j] Batch Insert : poooor performance
Of course, providing some more context would be poor too? How are we supposed to know what the problem is?
Re: [Neo4j] Batch Insert : poooor performance
Yes, I think you should resend your original post that got stuck... On Nov 18, 2011 12:40 PM, Krzysztof Raczyński racz...@gmail.com wrote: Of course, providing some more context would be poor too? How are we supposed to know what the problem is?
Re: [Neo4j] Batch Insert : poooor performance
Btw, inserting 600k nodes over REST with about 8 properties, in batches of 100, takes 20-30 minutes for me. It's not awesomely fast, but it's not slow either. What settings are affecting insertion speeds, Peter?
Re: [Neo4j] Batch Insert : poooor performance
That seems about normal. The good news is that it is much faster (usually) than an RDBMS on the same hardware. -Original Message- From: user-boun...@lists.neo4j.org [mailto:user-boun...@lists.neo4j.org] On Behalf Of Krzysztof Raczynski Sent: Friday, November 18, 2011 6:47 AM To: Neo4j user discussions Subject: Re: [Neo4j] Batch Insert : pr performance Btw, inserting 600k nodes over REST with about 8 properties, in batches of 100, takes 20-30 minutes for me. It's not awesomely fast, but it's not slow either. What settings are affecting insertion speeds, Peter?
Re: [Neo4j] Batch Insert : poooor performance
Please try not to use lucene for lookups during batch-inserts: just index your nodes (for later use) but use a custom, in-memory cache for the insertion process, customID -> nodeId, like Map<String,Long>. Using lucene for lookups takes up to 1000 times longer during batch-inserts (probably because the merge threads in the background have to finish up before you can include their results in the query). The luceneBatchInserterIndex.setCacheCapacity() seems not to work as expected; we will investigate that. Cheers Michael Here is the original post: Hi, I am in almost the same case as a previous post concerning Batch Insert poor performance, but I still can't figure out how to do it correctly with good performance. Nodes: 30 million Relationships: 250 million I am on MacOSX 10.7.1, 4 cpus, 8GB RAM 1) Insert Nodes : JVM -server -d64 -Xmx4G -XX:+UseParNewGC -XX:+UseNUMA -XX:+UseConcMarkSweepGC from 80 000 down to 50 000 inserts/second with properties (customID, url), with LuceneIndexing on customID and url -- a bit disappointing 2) Insert Relationships JVM -server -d64 -Xmx6G -XX:+UseParNewGC -XX:+UseNUMA -XX:+UseConcMarkSweepGC Index cache capacity 30 000 000 (whole nodes) on customID
neostore.nodestore.db.mapped_memory=300M
neostore.relationshipstore.db.mapped_memory=1G
neostore.propertystore.db.mapped_memory=2.2G
neostore.propertystore.db.strings.mapped_memory=100M
neostore.propertystore.db.arrays.mapped_memory=10M
=> insertion rate ~50 relationships/second and going down ... (many many tests ... but always very poor performance) Do you have any idea how to make this work correctly? I am really stuck here. If you want to have a look at my code: no issues! :) Many many thanks for your help On 18.11.2011 at 12:47, Krzysztof Raczyński wrote: Btw, inserting 600k nodes over REST with about 8 properties, in batches of 100, takes 20-30 minutes for me. It's not awesomely fast, but it's not slow either. What settings are affecting insertion speeds, Peter?
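Michael's in-memory cache suggestion can be sketched as follows. CustomIdCache and getOrCreate are hypothetical names, and the call to the real batchInserter.createNode is replaced by a counter; the idea is just that relationship creation resolves customID -> nodeId from a plain HashMap instead of querying lucene during the insert:

```java
import java.util.HashMap;
import java.util.Map;

public class CustomIdCache {

    // customID -> neo4j node id, kept in memory so relationship creation
    // never has to ask lucene while the batch insert is running.
    private final Map<String, Long> cache = new HashMap<>();
    private long nextNodeId = 0; // stand-in for batchInserter.createNode(props)

    long getOrCreate(String customId) {
        Long nodeId = cache.get(customId);
        if (nodeId == null) {
            nodeId = nextNodeId++; // real code: batchInserter.createNode(props)
            cache.put(customId, nodeId); // still add to the lucene index for later use
        }
        return nodeId;
    }

    public static void main(String[] args) {
        CustomIdCache c = new CustomIdCache();
        System.out.println(c.getOrCreate("a")); // creates a new node id
        System.out.println(c.getOrCreate("b"));
        System.out.println(c.getOrCreate("a")); // cache hit, no index lookup
    }
}
```

For 30 million nodes this map fits in a few GB of heap; as Olivier notes below, a much bigger graph would need a different strategy.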
Re: [Neo4j] Max flow using gremlin
Alfredas, not that I know of. Do you have a good implementation idea? Cheers, /peter neubauer GTalk: neubauer.peter Skype peter.neubauer Phone +46 704 106975 LinkedIn http://www.linkedin.com/in/neubauer Twitter http://twitter.com/peterneubauer http://www.neo4j.org - NOSQL for the Enterprise. http://startupbootcamp.org/ - Öresund - Innovation happens HERE. On Fri, Nov 18, 2011 at 11:45 AM, Alfredas Chmieliauskas alfredas...@gmail.com wrote: Dear all, has anyone implemented any of the max flow algorithms using gremlin? Alfredas
Re: [Neo4j] Max flow using gremlin
Hi, has anyone implemented any of the max flow algorithms using gremlin? Most of the algorithms in my toolbox are flow-based algorithms. What in particular are you trying to do? Marko. http://markorodriguez.com
Re: [Neo4j] Batch Insert : poooor performance
Olivier, please let us know your progress, and feel free to issue a pull request when you get things working! Cheers, /peter neubauer GTalk: neubauer.peter Skype peter.neubauer Phone +46 704 106975 LinkedIn http://www.linkedin.com/in/neubauer Twitter http://twitter.com/peterneubauer http://www.neo4j.org - NOSQL for the Enterprise. http://startupbootcamp.org/ - Öresund - Innovation happens HERE. On Fri, Nov 18, 2011 at 2:16 PM, ov var...@echo.fr wrote: Thanks for your answer Michael, Indeed, when creating a relationship between 2 nodes, I need to retrieve the neo4j nodeID (from customID) for both nodes ... I expected the cache to have a really big effect on this mechanism, but alas ... For this small graph, I suppose I can fully work in RAM, but this surely won't do for a much bigger graph. Thanks a lot, I'll try with my own cache mechanism. Regards
Re: [Neo4j] Sampling a Neo4j instance?
They have a common abstract class, AbstractGraphDatabase. On 18 November 2011 09:46, Anders Lindström andli...@hotmail.com wrote: Would this work in HA mode too (i.e. HighlyAvailableGraphDatabase)? I can see that the 'getConfig' is there -- but does the cast to NeoStoreXaDataSource work as well? Thanks.
Re: [Neo4j] About Neo4j Indexing
Hello List, can anyone help me on this? Thanks and regards, Samuel On 14 November 2011 at 1:51 PM, Samuel Feng okos...@gmail.com wrote: Dear List, I have two questions about indexing. *Question 1* At the time of creation, extra configuration can be specified to control the behavior of the index and which backend to use, e.g.,

IndexManager index = graphDb.index();
Index<Node> movies = index.forNodes( "movies-fulltext",
    MapUtil.stringMap( IndexManager.PROVIDER, "lucene",
        "analyzer", "org.apache.lucene.analysis.cn.smart.SmartChineseAnalyzer" ) );
movies.add( theMatrix, "cTitle", "黑客帝国" );
movies.add( theMatrix, "date", "2000-01-01" );

When adding node theMatrix to the index, all the values will be analyzed/tokenized by SmartChineseAnalyzer. However, for some fields I do not want them to be analyzed/tokenized. Any interfaces to implement this? Could ValueContext be enhanced so that I can pass in something like Field.Index.NOT_ANALYZED when adding a node to the index? *Question 2* For Query,

IndexHits<Node> nodes = movies.query(new BooleanQuery(...));
Node currentNode = null;
List<Movie> result = new ArrayList<Movie>();
while (nodes.hasNext()) {
    currentNode = nodes.next();
    Movie m = new Movie(currentNode);
    if (m.getDate().equals("2001-01-01")) {
        result.add(m);
    }
}

I found that if the indexHits is large, say size() > 2, each m.getDate() will spend some time loading the value from the underlying node (especially on the first-time query), so the total elapsed time is very long. Any interface that lets me read the lucene document behind this node directly? Maybe you can use nodes.currentDoc() to expose it? Thanks and Regards, Samuel
Re: [Neo4j] About Neo4j Indexing
Samuel, in order to do this right, we would like to associate index-specific properties with nodes. This is planned for Neo4j 1.7, with a much more powerful (auto)indexing framework. Before that, things would be a hack, so I think we will be postponing this. However, this is very relevant even to Cypher query optimization. Thanks for bringing this up! If you like, please raise an issue on this so you can track it. Cheers, /peter neubauer GTalk: neubauer.peter Skype peter.neubauer Phone +46 704 106975 LinkedIn http://www.linkedin.com/in/neubauer Twitter http://twitter.com/peterneubauer http://www.neo4j.org - NOSQL for the Enterprise. http://startupbootcamp.org/ - Öresund - Innovation happens HERE. 2011/11/14 Samuel Feng okos...@gmail.com
[Neo4j] Lab day: Cypher queries in embedded python bindings
Hey all, Like we've mentioned before, we have lab-day Fridays at Neo4j, and today I hacked some stuff together that landed directly in trunk for the embedded python bindings. As of 1 minute ago, the following operations are now possible with the embedded python API:

from neo4j import GraphDatabase
db = GraphDatabase("/home/jake/db")

# Plain query
result = db.query("START n=node(0) RETURN n")

# Parameterized query
result = db.query("START n=node({id}) RETURN n", id=0)

# Pre-parsed query
get_node_by_id = db.prepare_query("START n=node({id}) RETURN n")
result = db.query(get_node_by_id, id=0)

# Read the result
for row in result:
    print row['n']
for value in result['n']:
    print value

node = db.query(get_node_by_id, id=0)['n'].single

Lemme know what you think :) This is not available on PyPI yet (it will be when the first 1.6 milestone is released), but you can build it super-easily yourself; instructions are in the readme at github: https://github.com/neo4j/python-embedded Cheers, -- Jacob Hansson Phone: +46 (0) 763503395 Twitter: @jakewins
[Neo4j] Scalability Roadmap
Will the following topics be treated in a future release (and when, if you know)? 1/ Supernodes I know there is a big downside in the handling of super-nodes, which can be a big issue in a twitter-like website with, for example, a user followed by more than 200k users (I have a real case in mind), or in a recommendation system with sophisticated rules. I would like to know if the super-node issue (as we name it) is planned to be investigated in future releases? 2/ Sharding and horizontal scalability I guess sharding is a complex problem to handle with a graph db, but is it planned to address the horizontal scalability goal? Even if it brings us towards a kind of inconsistent-but-acceptable situation (for example, there are many cases of synchronization latency a website can accept when it has a big load). Thanks -- View this message in context: http://neo4j-community-discussions.438527.n3.nabble.com/Scalability-Roadmap-tp3519034p3519034.html Sent from the Neo4j Community Discussions mailing list archive at Nabble.com.
Re: [Neo4j] Scalability Roadmap
Hi Serge, Regarding supernodes, I already opened an issue about this some time ago: https://github.com/neo4j/community/issues/19 and as you can read there, at the end of the conversation Peter said: we will hopefully be on it for 1.6! I really hope they keep thinking of fixing this for the 1.6 release; I'd actually say that this is one of the most urgent points that should be covered right now... Cheers, Pablo Pareja On Fri, Nov 18, 2011 at 5:38 PM, serge s.fedoro...@gmail.com wrote: [...] -- Pablo Pareja Tobes My site http://about.me/pablopareja LinkedIn http://www.linkedin.com/in/pabloparejatobes Twitter http://www.twitter.com/pablopareja Creator of Bio4j -- http://www.bio4j.com http://www.ohnosequences.com
Re: [Neo4j] Scalability Roadmap
Thanks, it sounds great :) Is there a release date for 1.6? -- View this message in context: http://neo4j-community-discussions.438527.n3.nabble.com/Scalability-Roadmap-tp3519034p3519137.html Sent from the Neo4j Community Discussions mailing list archive at Nabble.com.
[Neo4j] [URGENT] Recommended server configurations
Hi people, I'm creating a social network with a large number of expected hits and I need help with the recommended server configuration: 1 - Operating system (Linux or Windows? What specifically?) 2 - Hardware (How much memory is necessary?) Do you think the use of the Neo4j REST API will cause problems? I use it to develop my Asp.Net applications. I am open to suggestions! Thanks for the help. -- View this message in context: http://neo4j-community-discussions.438527.n3.nabble.com/URGENT-Recommended-server-configurations-tp3519328p3519328.html Sent from the Neo4j Community Discussions mailing list archive at Nabble.com.
Re: [Neo4j] Scalability Roadmap
1/ Supernode 2012, around Q2. 2/ Sharding and horizontal scalability 2013, around Q1. These are guesses, not promises :-) Jim PS - sharding graphs is NP-complete. In theory, no efficient general solution exists.
Re: [Neo4j] [URGENT] Recommended server configurations
On Fri, Nov 18, 2011 at 7:21 PM, gustavoboby gustavob...@gmail.com wrote: Hi people, I'm creating a social network with a large number of expected hits and I need help with the recommended server configuration: 1 - Operating system (Linux or Windows? What specifically?) If you have the choice, Linux is preferable. We fully support both platforms, but generally get higher performance on Linux, and fewer problems. 2 - Hardware (How much memory is necessary?) This completely depends on how much data you intend to store. Can you provide an estimate of how big your dataset would be? Number of nodes, number of relationships per node, how many properties (on both nodes and relationships), and what types of property values. Do you think the use of the Neo4j REST API will cause problems? I use it to develop my Asp.Net applications. It depends on how you use it. Generally, you will get reasonable insert speed if the client you use supports the batch operations part of the REST API; query speed will depend on the query, of course. You will get significantly better performance with the embedded database right now, but that is only available in JVM languages and Python. -- Jacob Hansson Phone: +46 (0) 763503395 Twitter: @jakewins
Re: [Neo4j] Max flow using gremlin
Hey Marko, I'm modeling the European gas transport/pipeline network. I need a good way to calculate maximum flow from source to sink and get the nodes in the path. Alfredas On Fri, Nov 18, 2011 at 2:48 PM, Marko Rodriguez okramma...@gmail.com wrote: Hi, has anyone implemented any of the max flow algorithms using Gremlin? Most of the algorithms in my toolbox are flow-based algorithms. What in particular are you trying to do? Marko. http://markorodriguez.com
Re: [Neo4j] Max flow using gremlin
Hey, Perhaps the simplest way to explore flow is to simply get the paths between source and sink and then calculate some function f over the path to determine its flow. For example:

def f = { List path ->
  // some function over the path, where every other element is an edge (see traversal below)
}

source.outE.inV.loop(2){it.object.equals(sink)}.paths.each{ println(it + " has a flow of " + f(it)) }

This assumes you have a determined source and a determined sink, and that there are no cycles in your gas pipeline. If there are cycles, then you can tweak the expression to make sure you break out of the loop when appropriate. From this basic idea you can then tweak it to simulate decay over time/step, or implement random walks through the gas line if you are interested in sampling or studying local eigenvectors in the pipeline. Hope that provides you a good starting point. Enjoy!, Marko http://markorodriguez.com On Nov 18, 2011, at 1:20 PM, Alfredas Chmieliauskas wrote: Hey Marko, I'm modeling the European gas transport/pipeline network. I need a good way to calculate maximum flow from source to sink and get the nodes in the path. Alfredas [...]
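Marko's suggestion -- enumerate source-to-sink paths, then apply a flow function f over each path -- can be sketched in plain Python. This is only an illustrative translation, not part of the thread: the graph layout, the `enumerate_paths` helper, and the bottleneck-capacity choice of f are all assumptions.

```python
# Sketch of the path-flow idea in plain Python (hypothetical data layout):
# a path is a list alternating nodes and edges, and path_flow(path) returns
# the bottleneck capacity -- the minimum capacity over the path's edges.

def path_flow(path):
    """Flow supported by a single path: min capacity over its edges.
    Edges are dicts with a 'capacity' key; nodes are plain strings."""
    return min(e["capacity"] for e in path if isinstance(e, dict))

def enumerate_paths(graph, source, sink, path=None):
    """All simple source->sink paths; graph maps node -> [(edge, neighbor)]."""
    path = path or [source]
    if source == sink:
        yield path
        return
    for edge, nxt in graph.get(source, []):
        if nxt not in path:  # skip already-visited nodes to avoid cycles
            yield from enumerate_paths(graph, nxt, sink, path + [edge, nxt])

# Toy pipeline: s -> a -> t and s -> b -> t
graph = {
    "s": [({"capacity": 4}, "a"), ({"capacity": 2}, "b")],
    "a": [({"capacity": 3}, "t")],
    "b": [({"capacity": 5}, "t")],
}
for p in enumerate_paths(graph, "s", "t"):
    print(p, "has a flow of", path_flow(p))
```

As in the Gremlin traversal, every other element of a yielded path is an edge, so f can inspect edge properties while still seeing the visited nodes.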
Re: [Neo4j] Scalability Roadmap
Jim, Not to nitpick, but that's for ideal graph partitioning, not graph sharding overall, right? E.g. the problem is solvable in many specific domains? - Matt On Nov 18, 2011 1:27 PM, Jim Webber j...@neotechnology.com wrote: 1/ Supernode 2012, around Q2. 2/ Sharding and horizontal scalability 2013, around Q1. These are guesses, not promises :-) Jim PS - sharding graphs is NP-complete. In theory no general solution exists.
Re: [Neo4j] Max flow using gremlin
Great! Thanks. Also, it's missing the "!"... it should be source.outE.inV.loop(2){!it.object.equals(sink)}.paths.each{ A On Fri, Nov 18, 2011 at 9:47 PM, Marko Rodriguez okramma...@gmail.com wrote: Hey, Perhaps the simplest way to explore flow is to simply get the paths between source and sink and then calculate some function f over the path to determine its flow. [...]
Re: [Neo4j] Max flow using gremlin
Great! Thanks. Also, it's missing the "!"... it should be source.outE.inV.loop(2){!it.object.equals(sink)}.paths.each{ Yes... good catch. Good luck, Marko. http://markorodriguez.com On Fri, Nov 18, 2011 at 9:47 PM, Marko Rodriguez okramma...@gmail.com wrote: [...]
Re: [Neo4j] [URGENT] Recommended server configurations
You can use Neo Technology's Hardware Sizing Calculator to estimate the CPU, RAM and disk space needed for your setup: http://neotechnology.com/calculator/trial.html If you're not doing a batch insertion, the REST API should be fine, I guess, especially if you put the database on a separate machine. On Fri, Nov 18, 2011 at 8:02 PM, Jacob Hansson jacob.hans...@neotechnology.com wrote: [...]
-- José Devezas - http://www.josedevezas.com MSc Informatics and Computing Engineering Social Media and Network Theory Research
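Jacob's sizing questions (node count, relationships per node, properties per entity) lend themselves to a back-of-the-envelope calculation. A rough Python sketch follows; the fixed record sizes are the approximate per-record figures of the Neo4j 1.x store files (node ~9 B, relationship ~33 B, property ~41 B), and the example numbers are made up. Long strings and arrays spill into separate dynamic stores, so treat this as a lower bound, not a quote from the thread or the calculator.

```python
# Back-of-the-envelope store sizing, assuming the approximate fixed record
# sizes of the Neo4j 1.x store files. Long strings and arrays go to extra
# dynamic stores, so real usage will be somewhat higher.

NODE_RECORD = 9    # bytes per node record (assumed, Neo4j 1.x)
REL_RECORD = 33    # bytes per relationship record (assumed, Neo4j 1.x)
PROP_RECORD = 41   # bytes per property record (assumed, Neo4j 1.x)

def estimate_store_bytes(nodes, rels_per_node, props_per_entity):
    """Rough on-disk store size from Jacob's three sizing questions."""
    rels = nodes * rels_per_node // 2          # each rel is shared by two nodes
    props = (nodes + rels) * props_per_entity  # properties on nodes and rels
    return nodes * NODE_RECORD + rels * REL_RECORD + props * PROP_RECORD

# e.g. 10M nodes, 10 rels per node, 5 properties on every node and rel
size = estimate_store_bytes(10_000_000, 10, 5)
print(f"~{size / 1024**3:.1f} GiB of store files")
```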
Re: [Neo4j] Scalability Roadmap
Hey Matt, Not to nitpick, but that's for ideal graph partitioning, not graph sharding overall, right? E.g. the problem is solvable in many specific domains? You're right - it's the general case. I was just making the point that sharding isn't something that's an afternoon's hacking to complete. Jim
Re: [Neo4j] Max flow using gremlin
This seems to calculate the max flow (edges have capacity): source.outE.inV.loop(2){!it.object.equals(sink)}.paths.each{flow = it.capacity.min(); maxFlow += flow; it.findAll{it.capacity}.each{it.capacity -= flow}}; I can't believe this is so short! A On Fri, Nov 18, 2011 at 10:51 PM, Marko A. Rodriguez okramma...@gmail.com wrote: [...]
Re: [Neo4j] Max flow using gremlin
This seems to calculate the max flow (edges have capacity): source.outE.inV.loop(2){!it.object.equals(sink)}.paths.each{flow = it.capacity.min(); maxFlow += flow; it.findAll{it.capacity}.each{it.capacity -= flow}}; I can't believe this is so short! That's the beauty of Gremlin. Once you get it, you can rip some very complex traversals in just a few characters. NOTES: For speed, change it.capacity to it.getProperty('capacity'). Some good notes here: https://github.com/tinkerpop/gremlin/wiki/Gremlin-Groovy-Path-Optimizations Glad we could help you with your problem. Enjoy!, Marko. http://markorodriguez.com
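Alfredas's one-liner saturates each enumerated source-to-sink path by its bottleneck capacity and decrements the capacities along the way. A plain-Python restatement of the same greedy idea follows (graph layout and names are assumptions, not from the thread). One caveat worth flagging: without residual (backward) edges this is a greedy heuristic in the Ford-Fulkerson family, and on some graphs it can undershoot the true maximum flow.

```python
# The greedy path-saturation idea from the Gremlin one-liner, restated in
# plain Python with a hypothetical adjacency structure. Each source->sink
# path is saturated by its bottleneck capacity. NOTE: with no residual
# edges this is a heuristic; a true max-flow algorithm also pushes flow
# back along already-used edges.

def greedy_max_flow(graph, source, sink):
    """graph maps node -> list of edge dicts {'to': node, 'capacity': int}."""
    def paths(node, seen):
        if node == sink:
            yield []
            return
        for edge in graph.get(node, []):
            if edge["to"] not in seen:
                for rest in paths(edge["to"], seen | {edge["to"]}):
                    yield [edge] + rest

    max_flow = 0
    for path in paths(source, {source}):
        flow = min(e["capacity"] for e in path)  # bottleneck of this path
        if flow <= 0:
            continue                             # path already saturated
        max_flow += flow
        for e in path:                           # decrement capacities
            e["capacity"] -= flow
    return max_flow

graph = {
    "s": [{"to": "a", "capacity": 4}, {"to": "b", "capacity": 2}],
    "a": [{"to": "t", "capacity": 3}],
    "b": [{"to": "t", "capacity": 5}],
}
print(greedy_max_flow(graph, "s", "t"))  # prints 5
```

Like the Gremlin version, this mutates capacities while iterating paths, so already-saturated edges limit later paths automatically.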
Re: [Neo4j] Scalability Roadmap
...but I'm sure the community will come up with a wide range of sharding patterns, code, and best practices! On Nov 18, 2011, at 5:46 PM, Jim Webber j...@neotechnology.com wrote: Hey Matt, Not to nitpick, but that's for ideal graph partitioning, not graph sharding overall, right? E.g. the problem is solvable in many specific domains? You're right - it's the general case. I was just making the point that sharding isn't something that's an afternoon's hacking to complete. Jim
Re: [Neo4j] Max flow using gremlin
Guys, I could put this into the docs, just for future reference. Great contributions, Marko and Alfredas! On Nov 18, 2011 11:58 PM, Marko Rodriguez okramma...@gmail.com wrote: [...]
[Neo4j] REST, Gremlin and transactions (neo4django's type hierarchy)
Guys, I'm trying to get neo4django's type hierarchy behaving in a safe way for multiprocessing. I ducked the REST API proper and am using the Gremlin extension, since I need the type creation operation to be atomic. The hierarchy is a simple single-inheritance system represented in-graph as a tree rooted at the reference node. Each node in the tree represents a type, including its name (`model_name`) and the module the type was defined in (`app_label`). I came up with the following script

g.setMaxBufferSize(0)
g.startTransaction()
cur_vertex = g.v(0)
for (def type_props : types) {
    candidate = cur_vertex.outE('TYPE').inV.find{ it.map.subMap(type_props.keySet()) == type_props }
    if (candidate == null) {
        new_type_node = g.addVertex(type_props)
        name = type_props['app_label'] + ":" + type_props['model_name']
        new_type_node.name = name
        g.addEdge(cur_vertex, new_type_node, 'TYPE')
        cur_vertex = new_type_node
    } else {
        cur_vertex = candidate
    }
}
g.stopTransaction(TransactionalGraph.Conclusion.SUCCESS)
result = cur_vertex

which searches for a type node that fits the type lineage sent in through the JSON-encoded `types` list. The code works fine as a replacement for how I was managing types in-graph. However, if I send this script (again, through REST) from three threads simultaneously, I don't get the expected behavior. Instead of the first request resulting in one new type node and the other two returning the node created by the first, three nodes are created and returned. Which is irksome. I'm pretty sure this is due to my own ignorance, but I've tried to do my homework. http://wiki.neo4j.org/content/Transactions#Isolation leads me to believe that code like the above won't work, because it only writes conditionally after a read, but doesn't hold a read lock. Could this be the case? And if so, is there a suggested fix in Gremlin? Any help/intuition would be greatly appreciated!
-- Matt Luongo Co-Founder, Scholr.ly
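The race Matt describes is the classic check-then-create problem and is not specific to Neo4j: two concurrent transactions can both read "no matching type node" and both create one. A minimal Python sketch (all names hypothetical, an in-memory dict standing in for the graph) shows why the read and the conditional write must happen under one lock -- analogous to taking a write lock on the parent node before checking for an existing child.

```python
# Minimal sketch of the check-then-create race, outside Neo4j (all names
# hypothetical). Without the lock, two threads can both see "no type node"
# and both create one; holding a lock across the read *and* the conditional
# write makes get-or-create atomic -- the same reason the Gremlin script
# needs a write lock (e.g. on the parent node) before the existence check.

import threading

type_nodes = {}                 # stands in for the in-graph type tree
tree_lock = threading.Lock()    # stands in for a write lock on the parent

def get_or_create(type_key):
    with tree_lock:             # read and conditional write, atomically
        node = type_nodes.get(type_key)
        if node is None:
            node = {"name": type_key}
            type_nodes[type_key] = node
        return node

threads = [threading.Thread(target=get_or_create, args=("app:Person",))
           for _ in range(3)]
for t in threads: t.start()
for t in threads: t.join()
print(len(type_nodes))   # one node, not three
```

The fix in the graph setting follows the same shape: serialize the get-or-create on some common ancestor (the parent type node) so that only one transaction at a time can run the existence check for a given lineage.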