[ https://issues.apache.org/jira/browse/GIRAPH-717?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13710028#comment-13710028 ]
Alessandro Presta commented on GIRAPH-717:
------------------------------------------

Quick comment on the design: having to call wrap()/unwrap() is inconvenient and defeats part of the point of this abstraction.
I know you're thinking of deeper changes in how we serialize the IVEM types, but if you want to ship this first, here's what I suggest in the interim:
- The user implements an interface similar to Computation, but that doesn't extend it, so that types don't have to be Writable.
- Internally, we do the wrapping/unwrapping magic.

> HiveJythonRunner with support for pure Jython value types.
> ----------------------------------------------------------
>
>                 Key: GIRAPH-717
>                 URL: https://issues.apache.org/jira/browse/GIRAPH-717
>             Project: Giraph
>          Issue Type: Bug
>            Reporter: Nitay Joffe
>            Assignee: Nitay Joffe
>
> This adds support for pure Jython jobs. Currently this runner is hooked up to
> work with Hive. I'll make it more generic later.
> A Jython job is made up of two Jython scripts:
> 1) launcher - this script is used to configure the job, it is only
> interpreted locally.
> 2) worker - this script is distributed to every worker and is used there.
> Running a Jython job is simply:
> HIVE_HOME=<x>
> HADOOP_HOME=<y>
> $HIVE_HOME/bin/hive --service jar <giraph-hive-jar>
> org.apache.giraph.hive.jython.HiveJythonRunner jython --launcher
> <launcher.py> --worker <worker.py>
> There are examples and tests in the diff. Here is one example:
> launcher: https://gist.github.com/nitay/a62e0a5d369a5e701fa3
> worker: https://gist.github.com/nitay/7834fd2b059527e65a36
> There are a few pieces to a Jython job, I'll go over each part here.
> The launcher defines the graph types (those IVEMM writables) and sets up the
> Hive vertex/edge inputs and output. Each graph type is one of the following:
> 1) A Java type. For example the user can specify simply IntWritable
> 2) A Jython type that implements Writable. In the example above the message
> value implements Writable.
> 3) A pure Jython type. The Java code will wrap these objects in a Writable
> wrapper that serializes Jython values using Pickle (jython IO framework).
> For Hive usage - if your value type is a primitive e.g. IntWritable or
> LongWritable, then you need not do anything. The Java code will automatically
> read/write the Hive table specified and convert between Hive types and the
> primitive Writable. The vertex_id type in the example works like this.
> If your value is a custom Jython type, you must create classes which
> implement JythonHiveReader/JythonHiveWriter (or JythonHiveIO which is both).
> These objects read/write Jython types from Hive. There are wrappers in the
> Java code which take the HiveIO data normally used in giraph-hive and turn it
> into Jython types. This means, for example, that getMap() will return a
> Jython dictionary instead of a Java Map.
> There is also a PageRankBenchmark (from previous diff) implemented in Jython.
> Here's a run for comparison / sanity check:
> PageRankBenchmark with 10 workers, 100M vertices, 10B edges, 10 compute
> threads
> trunk:
> https://gist.github.com/nitay/3170fa3b575d4d2e22a9
> total time: 302466
> with this diff:
> https://gist.github.com/nitay/a52b6d1d64e50ab9829e
> total time: 306517
> in jython:
> https://gist.github.com/nitay/3f2e758b2933c3521727
> total time: 434730
> So we see that existing things are not affected (is there something else I
> should test?) and that Jython has around 40% overhead.
> ReviewBoard: https://reviews.apache.org/r/12543/ (Sorry it's a big one, hard
> to split up :/)
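
To make the wrapping idea concrete, here is a rough sketch in plain Python of how a pure value type could be serialized with pickle behind a Writable-style wrapper, so that user code never calls wrap()/unwrap() itself. This is not the code from the patch: PickleWrapper, read_fields, PageRankValue and round_trip are hypothetical names, and the length-prefixed byte layout is an assumption made only for illustration.

```python
import io
import pickle
import struct


class PickleWrapper(object):
    """Hypothetical stand-in for a Writable wrapper around a pure value."""

    def __init__(self, value=None):
        self.value = value

    def write(self, out):
        # Pickle the wrapped value and prefix it with a 4-byte big-endian
        # length so the reader knows how many bytes to consume.
        payload = pickle.dumps(self.value, protocol=2)
        out.write(struct.pack(">i", len(payload)))
        out.write(payload)

    def read_fields(self, inp):
        # Read the length prefix, then unpickle exactly that many bytes.
        (length,) = struct.unpack(">i", inp.read(4))
        self.value = pickle.loads(inp.read(length))


class PageRankValue(object):
    """A pure value type: plain attributes, no serialization methods."""

    def __init__(self, rank=0.0, label=""):
        self.rank = rank
        self.label = label


def round_trip(value):
    """Framework side: wrap, serialize, deserialize, unwrap."""
    buf = io.BytesIO()
    PickleWrapper(value).write(buf)
    buf.seek(0)
    restored = PickleWrapper()
    restored.read_fields(buf)
    return restored.value
```

With something like this living inside the framework, a worker script would only define types such as PageRankValue and hand instances over; round_trip(PageRankValue(0.15, "a")) returns an equivalent object without the user touching the wrapper.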