RE: BUG ? SequenceFileVertexInputFormat with custom Writable : can not iterate over vertices
Anyone?

-----Original Message-----
From: VARENE Olivier DTSI/DSI
Sent: Thursday, 4 September 2014 11:47
To: user@giraph.apache.org
Subject: BUG ? SequenceFileVertexInputFormat with custom Writable : can not iterate over vertices

Hi,

First of all, many thanks to the community for the great work you are doing.

With Giraph 1.1-SNAPSHOT built for Hadoop 2.4.0, I am stalled on a bug when using a custom SequenceFile vertex input format. I have opened a project on GitHub so that you can run, play with, and view the bug:

https://github.com/ovarene/giraph_SequenceFileVertexInput

You will find everything needed there :D

I try to load/create my vertices from a SequenceFile<LongWritable, MyWritable>:
  LongWritable being the vertex ID
  MyWritable being a custom value Writable used to generate the vertex value and its edges

Vertices are created and loaded correctly (as seen in the log):

2014-09-04 10:44:44,683 DEBUG [load-0] MyReader (SequenceFileLongMyVertexInputFormat.java:getCurrentVertex(108)) - Create Vertex from 1.0 0.0 0: to : Vertex(id=1,value=1.0,#edges=0)
2014-09-04 10:44:44,694 DEBUG [load-0] MyReader (SequenceFileLongMyVertexInputFormat.java:getCurrentVertex(108)) - Create Vertex from 2.0 0.0 2:1|6| to : Vertex(id=2,value=2.0,#edges=2)
2014-09-04 10:44:44,695 DEBUG [load-0] MyReader (SequenceFileLongMyVertexInputFormat.java:getCurrentVertex(108)) - Create Vertex from 5.0 0.0 3:4|1|6| to : Vertex(id=5,value=5.0,#edges=3)
2014-09-04 10:44:44,697 INFO [load-0] worker.InputSplitsCallable (InputSplitsCallable.java:call(235)) - call: Loaded 1 input splits in 0.12198963 secs, (v=3, e=5) 24.592255 vertices/sec, 40.98709 edges/sec

But alas, when trying to iterate over them during a compute session, I get the following error when reading in PrintVertex (BasicComputation):

2014-09-04 10:44:44,701 ERROR [load-0] utils.LogStacktraceCallable (LogStacktraceCallable.java:call(57)) - Execution of callable failed
java.lang.IllegalStateException: next: IOException
    at org.apache.giraph.utils.VertexIterator.next(VertexIterator.java:101)
    at org.apache.giraph.partition.BasicPartition.addPartitionVertices(BasicPartition.java:99)
    at org.apache.giraph.comm.requests.SendWorkerVerticesRequest.doRequest(SendWorkerVerticesRequest.java:110)
    at org.apache.giraph.comm.netty.NettyWorkerClientRequestProcessor.doRequest(NettyWorkerClientRequestProcessor.java:482)
    at org.apache.giraph.comm.netty.NettyWorkerClientRequestProcessor.flush(NettyWorkerClientRequestProcessor.java:428)
    at org.apache.giraph.worker.InputSplitsCallable.call(InputSplitsCallable.java:241)
    at org.apache.giraph.worker.InputSplitsCallable.call(InputSplitsCallable.java:60)
    at org.apache.giraph.utils.LogStacktraceCallable.call(LogStacktraceCallable.java:51)
    at java.util.concurrent.FutureTask.run(FutureTask.java:266)
    at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
    at java.lang.Thread.run(Thread.java:745)
Caused by: java.io.IOException: ensureRemaining: Only 5 bytes remaining, trying to read 8
    at org.apache.giraph.utils.UnsafeByteArrayInputStream.ensureRemaining(UnsafeByteArrayInputStream.java:114)
    at org.apache.giraph.utils.UnsafeByteArrayInputStream.readLong(UnsafeByteArrayInputStream.java:197)
    at org.apache.hadoop.io.LongWritable.readFields(LongWritable.java:47)
    at org.apache.giraph.utils.WritableUtils.reinitializeVertexFromDataInput(WritableUtils.java:522)
    at org.apache.giraph.utils.VertexIterator.next(VertexIterator.java:98)
    ... 11 more

Any idea?

Thanks a lot,
Olivier
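The thread does not identify a cause for this exception. One thing that may be worth ruling out when a read runs past the end of the buffer ("Only 5 bytes remaining, trying to read 8") is a mismatch between what a custom type writes and what it reads back. The sketch below is only an illustration under that assumption: MyValueSketch and WritableRoundTripCheck are hypothetical names, the field layout is invented, and the class mirrors the write/readFields pair of Hadoop's Writable interface with plain java.io types so that it runs without Hadoop or Giraph on the classpath.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.DataInput;
import java.io.DataInputStream;
import java.io.DataOutput;
import java.io.DataOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;

/** Hypothetical stand-in for a custom value type (not the MyWritable from the thread). */
class MyValueSketch {
    double value;
    long[] targets = new long[0];

    /** Same shape as Writable.write: emit every field, with a length prefix for the array. */
    void write(DataOutput out) throws IOException {
        out.writeDouble(value);
        out.writeInt(targets.length);
        for (long t : targets) {
            out.writeLong(t);
        }
    }

    /** Same shape as Writable.readFields: read exactly what write emitted, in the same order. */
    void readFields(DataInput in) throws IOException {
        value = in.readDouble();
        int n = in.readInt();
        targets = new long[n]; // overwrite old state, since the object may be reused
        for (int i = 0; i < n; i++) {
            targets[i] = in.readLong();
        }
    }
}

class WritableRoundTripCheck {
    /**
     * Serializes v, reads the bytes back into a fresh object, and returns the
     * number of bytes left unread. Zero means the two methods consumed the same
     * amount; an EOFException (wrapped here) means readFields wanted more bytes
     * than write produced.
     */
    static int leftoverAfterRoundTrip(MyValueSketch v) {
        try {
            ByteArrayOutputStream bytes = new ByteArrayOutputStream();
            v.write(new DataOutputStream(bytes));
            DataInputStream in =
                new DataInputStream(new ByteArrayInputStream(bytes.toByteArray()));
            new MyValueSketch().readFields(in);
            return in.available();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        MyValueSketch v = new MyValueSketch();
        v.value = 2.0;
        v.targets = new long[] {1L, 6L};
        System.out.println("unread bytes: " + leftoverAfterRoundTrip(v));
    }
}
```

A nonzero result or a wrapped EOFException from a check like this would point at the custom type; a clean round trip would suggest looking elsewhere, for example at how the vertex and its edges are built in the reader.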
Re: How do I validate customArguments?
No worries. By the way, I realized after I sent that message that using the public static int numPreprocessingSteps to store the value in the MasterCompute class doesn't work; you need to register a persistent aggregator to hold on to it, if you need to.

Best,
Matthew

On Wed, Sep 10, 2014 at 3:15 PM, Matthew Cornell m...@matthewcornell.org wrote:

Sorry for the long delay, Matthew. That's really helpful. Right now I'm stuck on apparently running out of memory on our little cluster, but the log messages are confusing. I'm putting together a question, but in the meantime I'll try one of the simpler examples, such as degree count, to see if /anything/ will run against my graph, which is very small (100K nodes and edges).

-- matt

On Thu, Aug 28, 2014 at 2:26 PM, Matthew Saltz sal...@gmail.com wrote:

Matt,

I'm not sure whether you've already resolved this problem, but if you haven't: the initialize() method isn't limited to registering aggregators. In fact, in my project I use it to do exactly what you're describing, to check and load custom configuration parameters. Inside the initialize() method, I do this:

    String numPreprocessingStepsConf = getConf().get(NUMBER_OF_PREPROCESSING_STEPS_CONF_OPT);
    numPreprocessingSteps = (numPreprocessingStepsConf != null) ?
        Integer.parseInt(numPreprocessingStepsConf.trim()) :
        DEFAULT_NUMBER_OF_PREPROCESSING_STEPS;
    System.out.println("Number of preprocessing steps: " + numPreprocessingSteps);

where at the class level I declare:

    public static final String NUMBER_OF_PREPROCESSING_STEPS_CONF_OPT = "wcc.numPreprocessingSteps";
    public static final int DEFAULT_NUMBER_OF_PREPROCESSING_STEPS = 1;
    public static int numPreprocessingSteps;

To set the property, I use the option -ca wcc.numPreprocessingSteps=<number of steps I want>.

If you only need to check that the arguments are properly formatted, and not store them, this is a fine place to do it as well, given that it's run before the input superstep (see the Giraph code in BspServiceMaster, line 1617 in the stable 1.1.0 release). What happens is that on the master, the MasterThread calls coordinateSuperstep() on a BspServiceMaster object, which checks whether it's the input superstep and, if so, calls initialize() on the MasterCompute object (created in the becomeMaster() method of BspServiceMaster).

Hope this helps,
Matthew

On Tue, Aug 26, 2014 at 4:36 PM, Matthew Cornell m...@matthewcornell.org wrote:

Hi again. My application needs to pass in a String argument to the computation, which each Vertex needs access to. (The argument is a list of the form [item1, item2, ...].) I found --customArguments (which I set in my tests via conf.set(arg_name, arg_val)), but I need to check that it's properly formatted. Where do I do that? The only thing I thought of is to specify a DefaultMasterCompute subclass whose initialize() does the check, but all the initialize() examples do is register aggregators; none of them check args or do anything else.

Thanks in advance!

-- matt

--
Matthew Cornell | m...@matthewcornell.org | 413-626-3621 | 34 Dickinson Street, Amherst MA 01002 | matthewcornell.org
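For readers who want to try the format check outside a cluster, here is a minimal sketch of the kind of validation discussed in this thread. It is not code from either poster's project: CustomArgCheck, parseSteps, and parseList are hypothetical names, a plain Map stands in for the configuration object returned by getConf(), and the rules for what counts as a well-formed list (square brackets, comma-separated, no empty items) and for rejecting negative step counts are assumptions made for illustration.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

/** Hypothetical helper; a Map stands in for the job configuration. */
class CustomArgCheck {
    static final String STEPS_OPT = "wcc.numPreprocessingSteps";
    static final int DEFAULT_STEPS = 1;

    /** Returns the configured step count, or the default when the option is unset. */
    static int parseSteps(Map<String, String> conf) {
        String raw = conf.get(STEPS_OPT);
        if (raw == null) {
            return DEFAULT_STEPS;
        }
        int steps;
        try {
            steps = Integer.parseInt(raw.trim());
        } catch (NumberFormatException e) {
            throw new IllegalArgumentException(
                STEPS_OPT + " must be an integer, got: " + raw, e);
        }
        if (steps < 0) { // assumption: a negative count is never meaningful
            throw new IllegalArgumentException(
                STEPS_OPT + " must not be negative, got: " + steps);
        }
        return steps;
    }

    /** Parses a string of the form [item1, item2, ...]; rejects anything else. */
    static List<String> parseList(String raw) {
        if (raw == null) {
            throw new IllegalArgumentException("list argument is missing");
        }
        String s = raw.trim();
        if (s.length() < 2 || !s.startsWith("[") || !s.endsWith("]")) {
            throw new IllegalArgumentException("expected [item1, item2, ...], got: " + raw);
        }
        List<String> items = new ArrayList<>();
        String inner = s.substring(1, s.length() - 1).trim();
        if (inner.isEmpty()) {
            return items; // "[]" is treated as an empty list
        }
        // limit -1 keeps trailing empty strings so "[a,]" is rejected below
        for (String part : inner.split(",", -1)) {
            String item = part.trim();
            if (item.isEmpty()) {
                throw new IllegalArgumentException("empty item in list: " + raw);
            }
            items.add(item);
        }
        return items;
    }

    public static void main(String[] args) {
        Map<String, String> conf = new HashMap<>();
        conf.put(STEPS_OPT, " 3 ");
        System.out.println("steps = " + parseSteps(conf));
        System.out.println("items = " + parseList("[item1, item2]"));
    }
}
```

Throwing IllegalArgumentException from a check like this inside initialize() would surface a malformed argument before the input superstep, which is the timing described above; whether failing the job that way is the desired behavior is a design choice left to the application.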