[jira] [Commented] (GIRAPH-37) Implement Netty-backed rpc solution
[ https://issues.apache.org/jira/browse/GIRAPH-37?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13107165#comment-13107165 ]
Jake Mannix commented on GIRAPH-37:
We should make sure we don't all work on the same thing (note the discussion at the end of GIRAPH-12) - two at a time might be fine, but half of the developers all on RPC might be excessive. Do you want to take this one? I was going to go in and try to implement a Finagle-based solution, as it's already an async RPC system on top of Netty, but if you're already going to look at this, I can drop what I was doing and work on something else.
Implement Netty-backed rpc solution
Key: GIRAPH-37
URL: https://issues.apache.org/jira/browse/GIRAPH-37
Project: Giraph
Issue Type: New Feature
Reporter: Jakob Homan
Assignee: Jakob Homan
GIRAPH-12 considered replacing the current Hadoop based rpc method with Netty, but went in another direction. I think there is still value in this approach, and will also look at Finagle.
--
This message is automatically generated by JIRA. For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Assigned] (GIRAPH-12) Investigate communication improvements
[ https://issues.apache.org/jira/browse/GIRAPH-12?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Jake Mannix reassigned GIRAPH-12:
Assignee: Avery Ching (was: Hyunsik Choi)
Investigate communication improvements
Key: GIRAPH-12
URL: https://issues.apache.org/jira/browse/GIRAPH-12
Project: Giraph
Issue Type: Improvement
Components: bsp
Reporter: Avery Ching
Assignee: Avery Ching
Priority: Minor
Attachments: GIRAPH-12_1.patch
Currently every worker starts up a thread to communicate with every other worker, and Hadoop RPC is used for communication. For instance, if there are 400 workers, each worker will create 400 threads. This ends up using a lot of memory, even with the option -Dmapred.child.java.opts=-Xss64k. It would be good to investigate using a framework like Netty, or rolling our own, to improve this situation. Moving away from Hadoop RPC would also make compatibility across different Hadoop versions easier.
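A rough back-of-the-envelope sketch of the stack-space arithmetic behind the 400-worker example above. It assumes one thread per peer and that each thread reserves exactly the -Xss size; real memory use depends on the JVM and on RPC buffers, which are not modelled here. The class and method names are illustrative and are not Giraph code.

```java
public class ThreadStackEstimate {
    /** Bytes reserved for thread stacks on one worker, assuming one thread per peer. */
    static long stackBytesPerWorker(int workers, long stackSizeBytes) {
        return (long) workers * stackSizeBytes;
    }

    /** Total number of communication threads across the whole job. */
    static long threadsAcrossJob(int workers) {
        return (long) workers * workers;
    }

    public static void main(String[] args) {
        int workers = 400;          // from the issue description
        long xss = 64L * 1024;      // -Xss64k, from the issue description
        long perWorker = stackBytesPerWorker(workers, xss);
        System.out.println(perWorker / (1024 * 1024) + " MiB of stack space per worker");
        System.out.println(threadsAcrossJob(workers) + " threads across the job");
    }
}
```

Under these assumptions, 400 threads at 64 KiB each come to 25 MiB of stack space per worker, and the thread count grows quadratically with the number of workers.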
[jira] [Assigned] (GIRAPH-12) Investigate communication improvements
[ https://issues.apache.org/jira/browse/GIRAPH-12?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Jake Mannix reassigned GIRAPH-12:
Assignee: Hyunsik Choi (was: Avery Ching)
Sorry, my 4-year-old clicked when I was looking at this ticket. I didn't notice that it managed to make an actual assignment; reverting!
[jira] [Commented] (GIRAPH-36) Ensure that subclassing BasicVertex is possible by user apps
[ https://issues.apache.org/jira/browse/GIRAPH-36?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13107281#comment-13107281 ]
Jake Mannix commented on GIRAPH-36:
Initial thoughts: VertexReader defines a next(MutableVertex vertex) method, which does the sensible thing of filling in the vertex from the HDFS block, and because it takes a vertex object and messes with it, it's natural that the vertex be required to be a MutableVertex. But of course this implies that *everything* must be a MutableVertex, because if you can't be read in by a VertexReader, where do you get instantiated at all?
If BasicVertex implements Writable, we could always readFields() data in without allowing mutation, but this seems like it would interfere with the way VertexReader allows users to read straight from Text, etc. At the same time, it would allow VertexList to extend ArrayList<BasicVertex> instead of ArrayList<Vertex>.
Anyone have any thoughts/ideas? Are we wedded to making VertexReader implementations deal with MutableVertex, or can we swap them to handle Writable BasicVertex?
Ensure that subclassing BasicVertex is possible by user apps
Key: GIRAPH-36
URL: https://issues.apache.org/jira/browse/GIRAPH-36
Project: Giraph
Issue Type: Improvement
Components: graph
Affects Versions: 0.70.0
Reporter: Jake Mannix
Assignee: Jake Mannix
Priority: Blocker
Fix For: 0.70.0
Original assumptions in Giraph were that all users would subclass Vertex (which extended MutableVertex, which extended BasicVertex). Classes which wish to have application-specific data structures (i.e. not a TreeMap<I, Edge<I,E>>) may need to extend either MutableVertex or BasicVertex. Unfortunately VertexRange extends ArrayList<Vertex>, and there are other places where the assumption is that vertex classes are either Vertex, or at least MutableVertex. Let's make sure the internal APIs allow for BasicVertex to be the base class.
[jira] [Commented] (GIRAPH-36) Ensure that subclassing BasicVertex is possible by user apps
[ https://issues.apache.org/jira/browse/GIRAPH-36?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13107282#comment-13107282 ]
Jake Mannix commented on GIRAPH-36:
In fact, thinking about VertexReader further, it seems its entire API is a little backwards. Why are we *passing in* instantiated Vertices, and filling them in? Shouldn't they effectively be iterators over the InputSplit?
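A minimal sketch of the iterator-style reader suggested above, in which the reader creates vertices itself instead of filling in a MutableVertex passed by the caller. Everything here is hypothetical: ReadOnlyVertex stands in for BasicVertex, a list of tab-separated lines stands in for the InputSplit, and none of these names are Giraph's actual API.

```java
import java.util.Arrays;
import java.util.Iterator;
import java.util.List;
import java.util.NoSuchElementException;

public class IteratorStyleReaderSketch {
    /** Hypothetical read-only vertex; a stand-in for BasicVertex. */
    interface ReadOnlyVertex {
        long getId();
        double getValue();
    }

    /** Hypothetical reader that iterates over the records of one split. */
    static final class LineVertexReader implements Iterator<ReadOnlyVertex> {
        private final Iterator<String> lines;

        LineVertexReader(List<String> splitLines) {
            this.lines = splitLines.iterator();
        }

        @Override
        public boolean hasNext() {
            return lines.hasNext();
        }

        @Override
        public ReadOnlyVertex next() {
            if (!lines.hasNext()) {
                throw new NoSuchElementException();
            }
            // Each line is "<id>\t<value>"; the reader builds the vertex itself.
            String[] parts = lines.next().split("\t");
            final long id = Long.parseLong(parts[0]);
            final double value = Double.parseDouble(parts[1]);
            return new ReadOnlyVertex() {
                @Override public long getId() { return id; }
                @Override public double getValue() { return value; }
            };
        }
    }

    /** Consumes the reader without ever needing a mutable vertex. */
    static double sumValues(List<String> splitLines) {
        double sum = 0;
        Iterator<ReadOnlyVertex> reader = new LineVertexReader(splitLines);
        while (reader.hasNext()) {
            sum += reader.next().getValue();
        }
        return sum;
    }

    public static void main(String[] args) {
        System.out.println(sumValues(Arrays.asList("1\t0.5", "2\t1.5")));
    }
}
```

With this shape the framework never has to mutate a vertex it was handed, so the vertex type returned by the reader does not need to expose setters.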
[jira] [Commented] (GIRAPH-34) Failure of Vertex reflection for putVertexList from GIRAPH-27
[ https://issues.apache.org/jira/browse/GIRAPH-34?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13106717#comment-13106717 ]
Jake Mannix commented on GIRAPH-34:
Wait, why would the sending Vertex modify the message object they just sent? Why would they even have a reference to it anymore? It's a message, right? Could we not simply document that messages should be treated as ephemeral and not retained? It seems like doing a bunch of reflection and object copying for each message to be sent could get prohibitively expensive.
As I look through the VertexRangeBalance code, I also notice that VertexList extends ArrayListWritableVertexI, V, E, M. Yikes! Not everything needs to be a Vertex anymore - if we let people extend BasicVertex (or MutableVertex) instead of always extending Vertex, they'll get killed with runtime ClassCastExceptions if they try to do any balancing.
Failure of Vertex reflection for putVertexList from GIRAPH-27
Key: GIRAPH-34
URL: https://issues.apache.org/jira/browse/GIRAPH-34
Project: Giraph
Issue Type: Bug
Reporter: Christian Kunz
Assignee: Avery Ching
Attachments: GIRAPH-34.patch
Christian actually found this bug. I am filing the JIRA on his behalf. Here's my error when running TestVertexRangeBalancer.
java.lang.RuntimeException: java.io.IOException: Call to returnwhose-lm/10.72.107.231:30002 failed on local exception: java.io.EOFException
    at org.apache.giraph.comm.BasicRPCCommunications.sendVertexListReq(BasicRPCCommunications.java:768)
    at org.apache.giraph.graph.BspServiceWorker.exchangeVertexRanges(BspServiceWorker.java:1282)
    at org.apache.giraph.graph.GraphMapper.map(GraphMapper.java:589)
    at org.apache.hadoop.mapreduce.Mapper.run(Mapper.java:144)
    at org.apache.hadoop.mapred.MapTask.runNewMapper(MapTask.java:763)
    at org.apache.hadoop.mapred.MapTask.run(MapTask.java:369)
    at org.apache.hadoop.mapred.Child$4.run(Child.java:259)
    at java.security.AccessController.doPrivileged(Native Method)
    at javax.security.auth.Subject.doAs(Subject.java:396)
    at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1059)
    at org.apache.hadoop.mapred.Child.main(Child.java:253)
Caused by: java.io.IOException: Call to returnwhose-lm/10.72.107.231:30002 failed on local exception: java.io.EOFException
    at org.apache.hadoop.ipc.Client.wrapException(Client.java:1065)
    at org.apache.hadoop.ipc.Client.call(Client.java:1033)
    at org.apache.hadoop.ipc.RPC$Invoker.invoke(RPC.java:224)
    at $Proxy3.putVertexList(Unknown Source)
    at org.apache.giraph.comm.BasicRPCCommunications.sendVertexListReq(BasicRPCCommunications.java:766)
    ... 10 more
Caused by: java.io.EOFException
    at java.io.DataInputStream.readInt(DataInputStream.java:375)
    at org.apache.hadoop.ipc.Client$Connection.receiveResponse(Client.java:767)
    at org.apache.hadoop.ipc.Client$Connection.run(Client.java:712)
I identified and fixed the issue by making BasicVertex implement Configurable and setting the graph state in BasicRPCCommunications. There is one more error, though, that I'll try to solve before putting up a reviewboard.
[jira] [Commented] (GIRAPH-34) Failure of Vertex reflection for putVertexList from GIRAPH-27
[ https://issues.apache.org/jira/browse/GIRAPH-34?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13106748#comment-13106748 ]
Jake Mannix commented on GIRAPH-34:
I'll definitely open another JIRA for the Vertex subclasses, and dig into that a bit.
But on this current topic, I see how users could possibly do something like sendMsg(destVertex, getVertexValue()), yes. But isn't this analogous to regular Hadoop-land, where you simply cannot expect to hang onto your Writable instances and use them later? If you're in Mapper.map(SomethingWritableComparable key, SomethingWritable value, Context c), you should *never* just buffer up the key and value instances, as this is practically guaranteed to break - Hadoop will be re-using the key and value as container objects to read new bytes off of disk for the next invocation of map(), so that Java objects are rarely created; instead you're just constantly doing simple bit/byte operations on the disk stream, and setting values inside of Writable containers.
It seems like one of the basic contracts of Writables (at least in Hadoop-land) is that they are always to be considered containers: call get() or getSomeKindOfThing() on them as soon as you have a handle on one, and use whatever *that* is, assuming that the framework can and will reuse your original Writable.
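A small self-contained sketch of the container-reuse hazard described in the comment above. It does not use Hadoop; LongBox is a hypothetical stand-in for a Writable such as LongWritable, and the loops imitate a framework that refills a single instance for every record.

```java
import java.util.ArrayList;
import java.util.List;

public class ContainerReuseSketch {
    /** Hypothetical stand-in for a reusable Writable container. */
    static final class LongBox {
        private long value;
        void set(long v) { value = v; }
        long get() { return value; }
    }

    /** Buggy: buffers the container itself, which keeps being overwritten. */
    static List<Long> bufferContainers(long[] records) {
        LongBox reused = new LongBox();          // one instance for all records
        List<LongBox> held = new ArrayList<>();
        for (long r : records) {
            reused.set(r);                       // "framework" refills the container
            held.add(reused);                    // caller keeps the reference
        }
        List<Long> out = new ArrayList<>();
        for (LongBox box : held) {
            out.add(box.get());                  // every entry sees the last record
        }
        return out;
    }

    /** Safe: calls get() right away and keeps the extracted value. */
    static List<Long> bufferValues(long[] records) {
        LongBox reused = new LongBox();
        List<Long> out = new ArrayList<>();
        for (long r : records) {
            reused.set(r);
            out.add(reused.get());
        }
        return out;
    }

    public static void main(String[] args) {
        long[] records = {1, 2, 3};
        System.out.println("buffered containers: " + bufferContainers(records));
        System.out.println("buffered values:     " + bufferValues(records));
    }
}
```

Buffering the container yields the last record three times, while extracting the value immediately preserves each record.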
[jira] [Commented] (GIRAPH-34) Failure of Vertex reflection for putVertexList from GIRAPH-27
[ https://issues.apache.org/jira/browse/GIRAPH-34?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13106780#comment-13106780 ]
Jake Mannix commented on GIRAPH-34:
Yeah, how do you do that, Dmitriy?
[jira] [Commented] (GIRAPH-34) Failure of Vertex reflection for putVertexList from GIRAPH-27
[ https://issues.apache.org/jira/browse/GIRAPH-34?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13106804#comment-13106804 ]
Jake Mannix commented on GIRAPH-34:
+1 from me - although I haven't run it on an actual cluster, so I'm going by my reading of the code.
We should think further about ways we can be safe, though: it's possible that the right and efficient thing to do is analogous to your context.write() example: we take the Writable message, serialize it to a byte[], and pass that byte[] to the local recipient if there is one. That recipient should be able to inexpensively deserialize and rehydrate the messages on the fly when running the VertexCombiner (only using one container Writable at a time, doing the same thing that Hadoop does, essentially) and just before the call to compute().
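A minimal sketch of the serialize-then-rehydrate idea from the comment above, using only java.io. The Message interface is a hypothetical stand-in for Hadoop's Writable (same write/readFields shape), and the summing loop merely imitates what a combiner might do; none of this is Giraph's actual message path.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.DataInput;
import java.io.DataInputStream;
import java.io.DataOutput;
import java.io.DataOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;

public class SerializeAndRehydrateSketch {
    /** Hypothetical stand-in for the Writable contract. */
    interface Message {
        void write(DataOutput out) throws IOException;
        void readFields(DataInput in) throws IOException;
    }

    static final class DoubleMessage implements Message {
        double value;
        @Override public void write(DataOutput out) throws IOException { out.writeDouble(value); }
        @Override public void readFields(DataInput in) throws IOException { value = in.readDouble(); }
    }

    /** Sender side: copy each message into bytes, so later mutation by the sender is harmless. */
    static byte[] serialize(double[] values) {
        try {
            ByteArrayOutputStream bytes = new ByteArrayOutputStream();
            DataOutputStream out = new DataOutputStream(bytes);
            DoubleMessage msg = new DoubleMessage();   // the sender may reuse this freely
            for (double v : values) {
                msg.value = v;
                msg.write(out);
            }
            out.flush();
            return bytes.toByteArray();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    /** Recipient side: a single container is refilled for each message. */
    static double sum(byte[] data) {
        try {
            DataInputStream in = new DataInputStream(new ByteArrayInputStream(data));
            DoubleMessage container = new DoubleMessage();
            double total = 0;
            while (in.available() > 0) {
                container.readFields(in);
                total += container.value;
            }
            return total;
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        byte[] wire = serialize(new double[] {1.0, 2.0, 3.5});
        System.out.println(wire.length + " bytes, sum = " + sum(wire));
    }
}
```

The trade-off is one serialization pass per local message in exchange for never sharing a live object between sender and recipient.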
[jira] [Commented] (GIRAPH-12) Investigate communication improvements
[ https://issues.apache.org/jira/browse/GIRAPH-12?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13106842#comment-13106842 ]
Jake Mannix commented on GIRAPH-12:
Hey Hyunsik, if you're going to write a benchmark for the RPC stuff, that would be totally great. I'd like to start playing around with trying Finagle in here, and we can compare notes on which techniques from both approaches work better, unless I'd be stepping on your toes by doing so...
[jira] [Created] (GIRAPH-36) Ensure that subclassing BasicVertex is possible by user apps
Ensure that subclassing BasicVertex is possible by user apps
Key: GIRAPH-36
URL: https://issues.apache.org/jira/browse/GIRAPH-36
Project: Giraph
Issue Type: Improvement
Components: graph
Affects Versions: 0.70.0
Reporter: Jake Mannix
Assignee: Jake Mannix
Priority: Blocker
Fix For: 0.70.0
Original assumptions in Giraph were that all users would subclass Vertex (which extended MutableVertex, which extended BasicVertex). Classes which wish to have application-specific data structures (i.e. not a TreeMap<I, Edge<I,E>>) may need to extend either MutableVertex or BasicVertex. Unfortunately VertexRange extends ArrayList<Vertex>, and there are other places where the assumption is that vertex classes are either Vertex, or at least MutableVertex. Let's make sure the internal APIs allow for BasicVertex to be the base class.
[jira] [Commented] (GIRAPH-28) Introduce new primitive-specific MutableVertex subclasses
[ https://issues.apache.org/jira/browse/GIRAPH-28?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13105568#comment-13105568 ]
Jake Mannix commented on GIRAPH-28:
I don't know what it was, I just re-patched with current trunk, after the refactorings of the most recent few patches. Memory use dropped to what it should be!
Introduce new primitive-specific MutableVertex subclasses
Key: GIRAPH-28
URL: https://issues.apache.org/jira/browse/GIRAPH-28
Project: Giraph
Issue Type: New Feature
Components: graph
Affects Versions: 0.70.0
Reporter: Jake Mannix
Assignee: Jake Mannix
Attachments: GIRAPH-28.diff, GIRAPH-28.diff
As discussed on the list, MutableVertex<LongWritable,DoubleWritable,FloatWritable,DoubleWritable> (for example) could be highly optimized in its memory footprint if the vertex and edge data were held in a form which minimized Java object usage.
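A minimal sketch of what holding edge data "in a form which minimized Java object usage" could look like: parallel primitive arrays instead of one map entry and one boxed object per edge. This is an illustration under assumptions (long target ids, float edge values, linear lookup) and is not the code in the attached GIRAPH-28.diff.

```java
import java.util.Arrays;

public class PrimitiveEdgeStoreSketch {
    // Parallel arrays: targets[i] and values[i] together describe edge i.
    private long[] targets = new long[4];
    private float[] values = new float[4];
    private int size;

    /** Convenience factory for building a store from raw arrays. */
    static PrimitiveEdgeStoreSketch of(long[] targetIds, float[] edgeValues) {
        PrimitiveEdgeStoreSketch store = new PrimitiveEdgeStoreSketch();
        for (int i = 0; i < targetIds.length; i++) {
            store.addEdge(targetIds[i], edgeValues[i]);
        }
        return store;
    }

    void addEdge(long target, float value) {
        if (size == targets.length) {
            // Grow both arrays together; no per-edge objects are allocated.
            targets = Arrays.copyOf(targets, size * 2);
            values = Arrays.copyOf(values, size * 2);
        }
        targets[size] = target;
        values[size] = value;
        size++;
    }

    int getNumOutEdges() {
        return size;
    }

    /** Linear scan; returns NaN when there is no edge to the target. */
    float getEdgeValue(long target) {
        for (int i = 0; i < size; i++) {
            if (targets[i] == target) {
                return values[i];
            }
        }
        return Float.NaN;
    }

    public static void main(String[] args) {
        PrimitiveEdgeStoreSketch store =
            of(new long[] {1, 2, 3, 4, 5}, new float[] {0.5f, 1f, 1.5f, 2f, 2.5f});
        System.out.println(store.getNumOutEdges() + " edges, value to 3 = " + store.getEdgeValue(3L));
    }
}
```

The arrays cost a fixed number of bytes per edge plus unused capacity, whereas a map of boxed ids to edge objects pays object headers and references for every entry.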
[jira] [Updated] (GIRAPH-28) Introduce new primitive-specific MutableVertex subclasses
[ https://issues.apache.org/jira/browse/GIRAPH-28?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Jake Mannix updated GIRAPH-28:
Attachment: GIRAPH-28.diff
Newly regenerated against trunk.
[jira] [Commented] (GIRAPH-28) Introduce new primitive-specific MutableVertex subclasses
[ https://issues.apache.org/jira/browse/GIRAPH-28?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13105004#comment-13105004 ]
Jake Mannix commented on GIRAPH-28:
Ok, another patch coming soon for this, but good news: this is the output of the object size calculator now (key: Primitive is what Dmitriy put in that test code, LDFD is a trivial class which extends the new LongDoubleFloatDoubleVertex class, and shows exactly the same memory as this):
Tiny: 0 840         Object: 0 872           Primitive: 0 4536        LDFD: 0 4536
Tiny: 1 840         Object: 1 976           Primitive: 1 4536        LDFD: 1 4536
Tiny: 10 840        Object: 10 1912         Primitive: 10 4536       LDFD: 10 4536
Tiny: 100 2640      Object: 100 11272       Primitive: 100 4536      LDFD: 100 4536
Tiny: 1000 16080    Object: 1000 104872     Primitive: 1000 46784    LDFD: 1000 46784
Tiny: 10000 123600  Object: 10000 1040872   Primitive: 10000 302000  LDFD: 10000 302000
[jira] [Commented] (GIRAPH-31) Hide the SortedMap<I, Edge<I,E>> in Vertex from client visibility (impl. detail), replace with appropriate accessor methods
[ https://issues.apache.org/jira/browse/GIRAPH-31?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13103745#comment-13103745 ]
Jake Mannix commented on GIRAPH-31:
And implementations which can provide a sorted iterator that isn't prohibitively expensive, but which also have a much faster unsorted iterator, can choose whether to return true or false from the isSorted() method, and provide another method of the type you're suggesting.
Hide the SortedMap<I, Edge<I,E>> in Vertex from client visibility (impl. detail), replace with appropriate accessor methods
Key: GIRAPH-31
URL: https://issues.apache.org/jira/browse/GIRAPH-31
Project: Giraph
Issue Type: Improvement
Components: graph
Affects Versions: 0.70.0
Reporter: Jake Mannix
Assignee: Jake Mannix
Attachments: GIRAPH-31.diff
As discussed on the list, and on GIRAPH-28, the SortedMap<I, Edge<I,E>> is an implementation detail which need not be exposed to application developers - they need to iterate over the edges, and possibly access them one-by-one, and remove them (in the Mutable case), but they don't need the SortedMap, and creating primitive-optimized BasicVertex implementations is hampered by the fact that clients expect this Map to exist.
[jira] [Commented] (GIRAPH-31) Hide the SortedMap<I, Edge<I,E>> in Vertex from client visibility (impl. detail), replace with appropriate accessor methods
[ https://issues.apache.org/jira/browse/GIRAPH-31?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13103798#comment-13103798 ]
Jake Mannix commented on GIRAPH-31:
+1 to that, given your argument on the current use of the class. There may come a time when we have generic things going on in GraphMapper or BspServiceWorker which need to do special optimized things to sorted vertices, and at that time we can add an isSorted() or getSortedIterator() method.
[jira] [Updated] (GIRAPH-31) Hide the SortedMap<I, Edge<I,E>> in Vertex from client visibility (impl. detail), replace with appropriate accessor methods
[ https://issues.apache.org/jira/browse/GIRAPH-31?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Jake Mannix updated GIRAPH-31:
Attachment: GIRAPH-31.diff
Updated patch - remove isSorted(), document the fact that the iterator may or may not be sorted (and in fact is, in Vertex), and that users may subclass either Vertex *or* MutableVertex. I have not tested subclassing BasicVertex, which I suspect would fail in various ways, as VertexReader, GraphMapper, and some other classes may expect to get a MutableVertex for some methods.
[jira] [Commented] (GIRAPH-31) Hide the SortedMap<I, Edge<I,E>> in Vertex from client visibility (impl. detail), replace with appropriate accessor methods
[ https://issues.apache.org/jira/browse/GIRAPH-31?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13103948#comment-13103948 ]
Jake Mannix commented on GIRAPH-31:
Sounds good to me! Lazy consensus is pretty common to The Apache Way ( http://www.apache.org/foundation/voting.html#LazyConsensus ).
[jira] [Commented] (GIRAPH-28) Introduce new primitive-specific MutableVertex subclasses
[ https://issues.apache.org/jira/browse/GIRAPH-28?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=1310#comment-1310 ] Jake Mannix commented on GIRAPH-28: --- Ok, so I went ahead and made a 'straw-man' refactoring branch (on GitHub: https://github.com/jakemannix/giraph/tree/vertex_map_refactor ), removing the getDestEdgeMap() method, and having BasicVertex implement Iterable, as well as the random-access read method getEdgeValue(targetVertexId). I got it passing tests, but ran into a few things we may want to consider. First, testing for existence of a target vertex is no longer possible: getEdgeValue(targetVertexId) returns the *value* associated with the edge. Edges are allowed to have null values and still denote a connection between the source and target vertex, right? Maybe we should just have an Edge<I, E> getEdge(I targetVertexId) method instead? Secondly, and far less importantly, we need to have getNumOutEdges(), because clients often want to know the out-degree of the vertex, and they used to call getDestEdgeMap().size(). Thirdly: for the same reason that getEdgeValue() can return superfluous nulls, removeEdge(), used as a boolean, can trick the caller into thinking there was no connection to the target (because removeEdge() returned null), but really it's because I was trying to be clever and return the *value*, which could be null. Having removeEdge() return the actual Edge fixes this. I'll open another ticket for this stuff, as patching this patch seems a bit silly. 
Introduce new primitive-specific MutableVertex subclasses - Key: GIRAPH-28 URL: https://issues.apache.org/jira/browse/GIRAPH-28 Project: Giraph Issue Type: New Feature Components: graph Affects Versions: 0.70.0 Reporter: Jake Mannix Assignee: Jake Mannix Attachments: GIRAPH-28.diff, GIRAPH-28.diff As discussed on the list, MutableVertex<LongWritable,DoubleWritable,FloatWritable,DoubleWritable> (for example) could be highly optimized in its memory footprint if the vertex and edge data were held in a form which minimized Java object usage.
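The null ambiguity raised in the comment above can be reproduced with a plain TreeMap. What follows is only a minimal sketch: the class NullEdgeValueSketch, its nested Edge holder, and both remove variants are hypothetical stand-ins for illustration, not Giraph's actual classes or signatures.

```java
import java.util.TreeMap;

/** Hypothetical stand-in for a vertex's out-edges; not Giraph's actual API. */
class NullEdgeValueSketch {
    /** Illustrative edge holder: a target id plus a possibly-null value. */
    static final class Edge {
        final long targetId;
        final Double value; // null is assumed to be a legal edge value

        Edge(long targetId, Double value) {
            this.targetId = targetId;
            this.value = value;
        }
    }

    private final TreeMap<Long, Edge> edges = new TreeMap<>();

    void addEdge(long targetId, Double value) {
        edges.put(targetId, new Edge(targetId, value));
    }

    /** Value-returning variant: null means "no edge" OR "edge with a null value". */
    Double removeEdgeReturningValue(long targetId) {
        Edge removed = edges.remove(targetId);
        return removed == null ? null : removed.value;
    }

    /** Edge-returning variant: null can only mean "no edge". */
    Edge removeEdge(long targetId) {
        return edges.remove(targetId);
    }

    int getNumOutEdges() {
        return edges.size();
    }
}
```

With the value-returning variant a caller cannot tell a removed null-valued edge from a missing one; the Edge-returning variant removes that ambiguity at the cost of handing Edge objects to the caller.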
[jira] [Created] (GIRAPH-31) Hide the SortedMap<I, Edge<I,E>> in Vertex from client visibility (impl. detail), replace with appropriate accessor methods
Hide the SortedMap<I, Edge<I,E>> in Vertex from client visibility (impl. detail), replace with appropriate accessor methods --- Key: GIRAPH-31 URL: https://issues.apache.org/jira/browse/GIRAPH-31 Project: Giraph Issue Type: Improvement Components: graph Affects Versions: 0.70.0 Reporter: Jake Mannix Assignee: Jake Mannix As discussed on the list, and on GIRAPH-28, the SortedMap<I, Edge<I,E>> is an implementation detail which need not be exposed to application developers - they need to iterate over the edges, and possibly access them one-by-one, and remove them (in the Mutable case), but they don't need the SortedMap, and creating primitive-optimized BasicVertex implementations is hampered by the fact that clients expect this Map to exist.
[jira] [Commented] (GIRAPH-28) Introduce new primitive-specific MutableVertex subclasses
[ https://issues.apache.org/jira/browse/GIRAPH-28?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13103339#comment-13103339 ] Jake Mannix commented on GIRAPH-28: --- I'm suggesting that iterator() always be sorted. The views of a SortedMap (keySet(), values(), entrySet()) are Iterable (by way of Collection), and the iterators they return are always in sorted key order. We'd have BasicVertex do the same thing. Introduce new primitive-specific MutableVertex subclasses - Key: GIRAPH-28 URL: https://issues.apache.org/jira/browse/GIRAPH-28 Project: Giraph Issue Type: New Feature Components: graph Affects Versions: 0.70.0 Reporter: Jake Mannix Assignee: Jake Mannix Attachments: GIRAPH-28.diff, GIRAPH-28.diff As discussed on the list, MutableVertex<LongWritable,DoubleWritable,FloatWritable,DoubleWritable> (for example) could be highly optimized in its memory footprint if the vertex and edge data were held in a form which minimized Java object usage.
[jira] [Commented] (GIRAPH-28) Introduce new primitive-specific MutableVertex subclasses
[ https://issues.apache.org/jira/browse/GIRAPH-28?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13103341#comment-13103341 ] Jake Mannix commented on GIRAPH-28: --- Also, to contradict my 1st and 3rd points, Dmitriy pointed out (in an out-of-band chat) that if we don't want to expose Edge to the user, because a) we don't want to store it in memory (as his test showed that even switching TreeMap<I, Edge<I,E>> to TreeMap<I, E> reduced memory usage by a fair amount), and b) we don't want to have to instantiate tons of useless objects by lazily creating them, we could instead just keep getEdgeValue() and removeEdge() as they were, but also add a boolean hasEdge(I targetVertexId) to test for connection. Then you get everything you need without exposing the Edge class (which only gets used internally for its Writable nature): if (vertex.hasEdge(targetVertexId)) { E value = vertex.getEdgeValue(targetVertexId); vertex.removeEdge(targetVertexId); } etc... Introduce new primitive-specific MutableVertex subclasses - Key: GIRAPH-28 URL: https://issues.apache.org/jira/browse/GIRAPH-28 Project: Giraph Issue Type: New Feature Components: graph Affects Versions: 0.70.0 Reporter: Jake Mannix Assignee: Jake Mannix Attachments: GIRAPH-28.diff, GIRAPH-28.diff As discussed on the list, MutableVertex<LongWritable,DoubleWritable,FloatWritable,DoubleWritable> (for example) could be highly optimized in its memory footprint if the vertex and edge data were held in a form which minimized Java object usage.
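The hasEdge() proposal in the comment above might look roughly like the following. This is a sketch under assumptions: HasEdgeSketch is a hypothetical class, it stores values directly in a TreeMap<I, E> as the comment suggests, and the method names simply mirror the ones discussed in the thread rather than any committed Giraph interface.

```java
import java.util.TreeMap;

/** Hypothetical vertex-like class that stores edge values directly, with no Edge objects. */
class HasEdgeSketch<I extends Comparable<I>, E> {
    // Keys are target vertex ids; a null value still denotes an existing edge.
    private final TreeMap<I, E> edgeValues = new TreeMap<>();

    void addEdge(I targetVertexId, E value) {
        edgeValues.put(targetVertexId, value);
    }

    /** Existence test that does not depend on the edge value being non-null. */
    boolean hasEdge(I targetVertexId) {
        return edgeValues.containsKey(targetVertexId);
    }

    /** Returns the edge value, which may be null even when the edge exists. */
    E getEdgeValue(I targetVertexId) {
        return edgeValues.get(targetVertexId);
    }

    /** Removes the edge and returns its (possibly null) value. */
    E removeEdge(I targetVertexId) {
        return edgeValues.remove(targetVertexId);
    }

    int getNumOutEdges() {
        return edgeValues.size();
    }
}
```

Callers that care about connectivity check hasEdge() first, so a null from getEdgeValue() or removeEdge() is no longer ambiguous.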
[jira] [Commented] (GIRAPH-28) Introduce new primitive-specific MutableVertex subclasses
[ https://issues.apache.org/jira/browse/GIRAPH-28?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13103357#comment-13103357 ] Jake Mannix commented on GIRAPH-28: --- The alternative to Iterable<Edge<I, E>> is Iterable<I>, returning only the target vertices, and you can call getEdgeValue(targetVertexId) on any of these if you need it. Again, many algorithms will simply do something like for (I targetId : vertex) { sendMsg(targetId, someFunction(baseMsg, getEdgeValue(targetId))); } which is maybe a little nicer looking (or at least not uglier) than: for (Edge<I, E> edge : vertex) { sendMsg(edge.getVertexId(), someFunction(baseMsg, edge.getValue())); } And then there are no Edge objects hanging around. Alternatively, Edge could act just like a typical Writable, and the Iterator<Edge<I, E>> iterates over the *same* Edge object, setting different values on it as next() is called. Introduce new primitive-specific MutableVertex subclasses - Key: GIRAPH-28 URL: https://issues.apache.org/jira/browse/GIRAPH-28 Project: Giraph Issue Type: New Feature Components: graph Affects Versions: 0.70.0 Reporter: Jake Mannix Assignee: Jake Mannix Attachments: GIRAPH-28.diff, GIRAPH-28.diff As discussed on the list, MutableVertex<LongWritable,DoubleWritable,FloatWritable,DoubleWritable> (for example) could be highly optimized in its memory footprint if the vertex and edge data were held in a form which minimized Java object usage.
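The last alternative in the comment above, an iterator that hands back the *same* Edge object on every call to next(), could be sketched as follows. Everything here is hypothetical: ReusedEdgeIterationSketch, its nested Edge class, and the long/float storage are assumptions made for illustration, not the classes in the patch.

```java
import java.util.Iterator;
import java.util.NoSuchElementException;

/** Hypothetical vertex-like class whose iterator reuses a single Edge instance. */
class ReusedEdgeIterationSketch implements Iterable<ReusedEdgeIterationSketch.Edge> {
    /** Mutable edge view; its fields are overwritten on each call to next(). */
    static final class Edge {
        private long vertexId;
        private float value;

        long getVertexId() { return vertexId; }
        float getValue() { return value; }
    }

    private final long[] targetIds;
    private final float[] values;

    ReusedEdgeIterationSketch(long[] targetIds, float[] values) {
        if (targetIds.length != values.length) {
            throw new IllegalArgumentException("ids and values must have equal length");
        }
        this.targetIds = targetIds.clone();
        this.values = values.clone();
    }

    @Override
    public Iterator<Edge> iterator() {
        return new Iterator<Edge>() {
            private final Edge reused = new Edge(); // the only Edge ever allocated
            private int pos = 0;

            @Override
            public boolean hasNext() {
                return pos < targetIds.length;
            }

            @Override
            public Edge next() {
                if (!hasNext()) {
                    throw new NoSuchElementException();
                }
                reused.vertexId = targetIds[pos];
                reused.value = values[pos];
                pos++;
                return reused;
            }
        };
    }
}
```

The trade-off is the usual one for reused Writable-style objects: callers must copy out any fields they want to keep before calling next() again, because references to earlier edges all point at the same instance.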
[jira] [Commented] (GIRAPH-28) Introduce new primitive-specific MutableVertex subclasses
[ https://issues.apache.org/jira/browse/GIRAPH-28?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13103363#comment-13103363 ] Jake Mannix commented on GIRAPH-28: --- As for sorting, I'd imagine that assuming it always returns a sorted iterator is fine, but yes, some implementations I can imagine might not want to do that. I'd lean against having multiple iterators until it was known that they were needed, and maybe just document the ones which return nonsorted ones so that things don't get messed up? Vertex subclasses are where the algorithms are implemented, right? So a Vertex knows whether it has a sorted iterator or not... the only question would be: are there generic methods implemented in things like BspServiceWorker, or GraphMapper, which would be expected to need to do things to a sorted iterator? Currently there are no such places that I can see. Without such cases, we could easily leave Vertex implementations to decide whether they needed to return sorted iterators or not. Introduce new primitive-specific MutableVertex subclasses - Key: GIRAPH-28 URL: https://issues.apache.org/jira/browse/GIRAPH-28 Project: Giraph Issue Type: New Feature Components: graph Affects Versions: 0.70.0 Reporter: Jake Mannix Assignee: Jake Mannix Attachments: GIRAPH-28.diff, GIRAPH-28.diff As discussed on the list, MutableVertex<LongWritable,DoubleWritable,FloatWritable,DoubleWritable> (for example) could be highly optimized in its memory footprint if the vertex and edge data were held in a form which minimized Java object usage.
[jira] [Commented] (GIRAPH-28) Introduce new primitive-specific MutableVertex subclasses
[ https://issues.apache.org/jira/browse/GIRAPH-28?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13102430#comment-13102430 ] Jake Mannix commented on GIRAPH-28: --- So Avery, the question I have for you is regarding the getOutEdgeMap() method - if we get rid of that, and instead maybe offer something like the other methods discussed on the list thread: E getEdge(I targetVertexId); ImmutableList<I> getSortedOutVertices(); boolean removeEdge(I targetVertexId); we could do away with being tied to this TreeMap (although for now, keep it around in Vertex.java, as there's not much else possible in the generic object case, most likely), in addition to allowing me to remove my insane pretend SortedMap wrapper class. Introduce new primitive-specific MutableVertex subclasses - Key: GIRAPH-28 URL: https://issues.apache.org/jira/browse/GIRAPH-28 Project: Giraph Issue Type: New Feature Components: graph Affects Versions: 0.70.0 Reporter: Jake Mannix Assignee: Jake Mannix Attachments: GIRAPH-28.diff As discussed on the list, MutableVertex<LongWritable,DoubleWritable,FloatWritable,DoubleWritable> (for example) could be highly optimized in its memory footprint if the vertex and edge data were held in a form which minimized Java object usage.
[jira] [Commented] (GIRAPH-28) Introduce new primitive-specific MutableVertex subclasses
[ https://issues.apache.org/jira/browse/GIRAPH-28?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13102442#comment-13102442 ] Jake Mannix commented on GIRAPH-28: --- I like the Iterator more than ImmutableList, yeah, that's great. I wonder if then just making BasicVertex implement Iterable<Edge<I,E>> would be called for: for (Edge<I,E> edge : vertex) { ... } ? Not sure if that syntactic sugar is worth it. Introduce new primitive-specific MutableVertex subclasses - Key: GIRAPH-28 URL: https://issues.apache.org/jira/browse/GIRAPH-28 Project: Giraph Issue Type: New Feature Components: graph Affects Versions: 0.70.0 Reporter: Jake Mannix Assignee: Jake Mannix Attachments: GIRAPH-28.diff As discussed on the list, MutableVertex<LongWritable,DoubleWritable,FloatWritable,DoubleWritable> (for example) could be highly optimized in its memory footprint if the vertex and edge data were held in a form which minimized Java object usage.
[jira] [Created] (GIRAPH-28) Introduce new primitive-specific MutableVertex subclasses
Introduce new primitive-specific MutableVertex subclasses - Key: GIRAPH-28 URL: https://issues.apache.org/jira/browse/GIRAPH-28 Project: Giraph Issue Type: New Feature Components: graph Affects Versions: 0.70.0 Reporter: Jake Mannix As discussed on the list, MutableVertex<LongWritable,DoubleWritable,FloatWritable,DoubleWritable> (for example) could be highly optimized in its memory footprint if the vertex and edge data were held in a form which minimized Java object usage.
[jira] [Commented] (GIRAPH-28) Introduce new primitive-specific MutableVertex subclasses
[ https://issues.apache.org/jira/browse/GIRAPH-28?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13101021#comment-13101021 ] Jake Mannix commented on GIRAPH-28: --- This is a toy version of LongDoubleFloatDoubleVertex, a proof of concept that you can get SimplePageRankVertex extends LongDoubleFloatDoubleVertex to pass its current unit tests without subclassing Vertex (and only using primitives internally!). Introduce new primitive-specific MutableVertex subclasses - Key: GIRAPH-28 URL: https://issues.apache.org/jira/browse/GIRAPH-28 Project: Giraph Issue Type: New Feature Components: graph Affects Versions: 0.70.0 Reporter: Jake Mannix Attachments: GIRAPH-28.diff As discussed on the list, MutableVertex<LongWritable,DoubleWritable,FloatWritable,DoubleWritable> (for example) could be highly optimized in its memory footprint if the vertex and edge data were held in a form which minimized Java object usage.
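To make "only using primitives internally" concrete, here is one way edge storage could avoid per-edge Java objects. It is a sketch, not the contents of GIRAPH-28.diff: PrimitiveEdgesSketch and its parallel sorted long[] / float[] arrays are assumptions chosen for illustration.

```java
import java.util.Arrays;

/** Hypothetical edge store backed by parallel primitive arrays instead of a map of objects. */
class PrimitiveEdgesSketch {
    private long[] targetIds = new long[0]; // kept sorted ascending
    private float[] values = new float[0];  // values[i] belongs to targetIds[i]

    boolean hasEdge(long targetId) {
        return Arrays.binarySearch(targetIds, targetId) >= 0;
    }

    float getEdgeValue(long targetId) {
        int index = Arrays.binarySearch(targetIds, targetId);
        if (index < 0) {
            throw new IllegalArgumentException("no edge to " + targetId);
        }
        return values[index];
    }

    /** Inserts the edge at its sorted position, or overwrites the value if it exists. */
    void addEdge(long targetId, float value) {
        int index = Arrays.binarySearch(targetIds, targetId);
        if (index >= 0) {
            values[index] = value;
            return;
        }
        int at = -(index + 1); // insertion point reported by binarySearch
        long[] newIds = new long[targetIds.length + 1];
        float[] newValues = new float[values.length + 1];
        System.arraycopy(targetIds, 0, newIds, 0, at);
        System.arraycopy(values, 0, newValues, 0, at);
        newIds[at] = targetId;
        newValues[at] = value;
        System.arraycopy(targetIds, at, newIds, at + 1, targetIds.length - at);
        System.arraycopy(values, at, newValues, at + 1, values.length - at);
        targetIds = newIds;
        values = newValues;
    }

    int getNumOutEdges() {
        return targetIds.length;
    }

    /** Returns a copy of the target ids in sorted order. */
    long[] sortedTargetIds() {
        return targetIds.clone();
    }
}
```

Insertion is O(n) per edge in this sketch, so it only suits vertices whose edge lists are built once and read many times; a real implementation would likely grow the arrays geometrically or bulk-load and sort.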
[jira] [Commented] (GIRAPH-27) Mutable static global state in Vertex.java should be refactored
[ https://issues.apache.org/jira/browse/GIRAPH-27?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13100901#comment-13100901 ] Jake Mannix commented on GIRAPH-27: --- Awesome, thanks Avery. Looks good to me. In looking over the diff in more detail in reviewboard, I notice that there are still a bunch of places where Vertex is referred to, but really BasicVertex (or at most MutableVertex) is all that's needed. But I'll open another ticket for those changes once this has been merged in. Mutable static global state in Vertex.java should be refactored --- Key: GIRAPH-27 URL: https://issues.apache.org/jira/browse/GIRAPH-27 Project: Giraph Issue Type: Improvement Components: graph Affects Versions: 0.70.0 Reporter: Jake Mannix Assignee: Jake Mannix Attachments: GIRAPH-27.patch, GIRAPH-27.patch Vertex.java has a bunch of static methods for getting/setting global graph state (total number of vertices, edges, a reference to the GraphMapper, etc). This should be refactored into a GraphState object, to which every Vertex can hold a reference (yes, a tiny bit more memory per Vertex, but in comparison to what's already in there...)
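The refactoring described in the GIRAPH-27 summary, moving static global state into an object that each vertex references, could be sketched like this. The names GraphStateSketch, GraphState, and Vertex below, and the two counters they carry, are illustrative assumptions and do not reproduce the fields or signatures in GIRAPH-27.patch.

```java
/** Hypothetical illustration of replacing static graph state with a shared object. */
class GraphStateSketch {
    /** Holds graph-wide values that would otherwise live in static fields. */
    static final class GraphState {
        private long numVertices;
        private long numEdges;

        long getNumVertices() { return numVertices; }
        long getNumEdges() { return numEdges; }

        void setNumVertices(long numVertices) { this.numVertices = numVertices; }
        void setNumEdges(long numEdges) { this.numEdges = numEdges; }
    }

    /** Each vertex keeps one reference to the shared state instead of calling statics. */
    static class Vertex {
        private final GraphState graphState;

        Vertex(GraphState graphState) {
            this.graphState = graphState;
        }

        long getNumVertices() { return graphState.getNumVertices(); }
        long getNumEdges() { return graphState.getNumEdges(); }
    }
}
```

Because the state is an instance rather than a set of statics, two graphs (or two tests) in the same JVM can each have their own GraphState without interfering with one another, at the cost of one extra reference per vertex.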