[jira] [Commented] (HAMA-983) Hama runner for DataFlow
[ https://issues.apache.org/jira/browse/HAMA-983?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16009911#comment-16009911 ]

Edward J. Yoon commented on HAMA-983:
-------------------------------------

{code}
# create a new branch inside your directory 'current'
git checkout -b HAMA-983

# ... do some changes to the files ...

# commit changes to the branch
git commit -a -m '[HAMA-983] Hama runner for DataFlow'

# push the branch to the remote
git push origin HAMA-983
{code}

Then go to your GitHub HAMA page and open a pull request.

Hi JongYoon, you can create a new branch as shown above.

> Hama runner for DataFlow
> ------------------------
>
> Key: HAMA-983
> URL: https://issues.apache.org/jira/browse/HAMA-983
> Project: Hama
> Issue Type: Bug
> Reporter: Edward J. Yoon
> Labels: gsoc2016
>
> As you already know, Apache Beam provides a unified programming model for both
> batch and streaming inputs.
> The APIs are generally associated with data filtering and transforming. So
> we'll need to implement some data processing runner like
> https://github.com/dapurv5/MapReduce-BSP-Adapter/blob/master/src/main/java/org/apache/hama/mapreduce/examples/WordCount.java
> Also, implementing a similarity join could be fun. According to
> http://www.ruizhang.info/publications/TPDS2015-Heads_Join.pdf, Apache Hama is
> clearly the winner among Apache Hadoop and Apache Spark.
> Since it consists of transformation, aggregation, and partition computations,
> I think it's possible to implement using Apache Beam APIs.

--
This message was sent by Atlassian JIRA
(v6.3.15#6346)
[jira] [Commented] (HAMA-983) Hama runner for DataFlow
[ https://issues.apache.org/jira/browse/HAMA-983?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15984007#comment-15984007 ]

Edward J. Yoon commented on HAMA-983:
-------------------------------------

Sorry for the late reply.

{quote}could you create a branch called 'beam_support' on github?{quote}

Sure. Or, since you're a committer, you can also create the branch yourself. I can do it this weekend.
[jira] [Commented] (HAMA-999) Wrong size of MemoryQueue
[ https://issues.apache.org/jira/browse/HAMA-999?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15963871#comment-15963871 ]

Edward J. Yoon commented on HAMA-999:
-------------------------------------

Thanks for the report! I'll check what's wrong.

> Wrong size of MemoryQueue
> -------------------------
>
> Key: HAMA-999
> URL: https://issues.apache.org/jira/browse/HAMA-999
> Project: Hama
> Issue Type: Bug
> Components: bsp core
> Reporter: JongYoon Lim
>
> I found that the *SuperstepPiEstimator* example sometimes gives a wrong result when
> calling *peer.getNumCurrentMessages()*. That was because of a wrong *size()*
> of *MemoryQueue*. When I printed out the sizes of the queue from the example,
> it sometimes said,
> {noformat}
> bundle size: 20, numOfMsg: 19, deque size: 0
> {noformat}
> I think *deque*, *bundles* and *numOfMsg* of *MemoryQueue* should be properly
> synchronized to get the correct result.
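The mismatch reported above (bundle size 20, numOfMsg 19, deque size 0) is the kind of reading one would expect when several fields are updated without a common lock. The following is only a minimal sketch of that idea, not Hama's actual MemoryQueue: the class name `GuardedQueue` and its fields are hypothetical, and it simply guards the queue and its counter with one monitor so that size() always agrees with the contents.

```java
import java.util.ArrayDeque;
import java.util.Deque;

/** Hypothetical stand-in for a message queue; this is NOT Hama's MemoryQueue. */
class GuardedQueue<T> {
  private final Deque<T> deque = new ArrayDeque<>();
  private int numOfMsg = 0; // updated under the same lock as the deque

  public synchronized void add(T msg) {
    deque.addLast(msg);
    numOfMsg++;
  }

  public synchronized T poll() {
    T msg = deque.pollFirst();
    if (msg != null) {
      numOfMsg--;
    }
    return msg;
  }

  public synchronized int size() {
    return numOfMsg;
  }

  /** True when the counter and the underlying deque agree. */
  public synchronized boolean isConsistent() {
    return numOfMsg == deque.size();
  }

  public static void main(String[] args) throws InterruptedException {
    GuardedQueue<Integer> q = new GuardedQueue<>();
    Thread[] writers = new Thread[4];
    for (int t = 0; t < writers.length; t++) {
      writers[t] = new Thread(() -> {
        for (int i = 0; i < 1000; i++) {
          q.add(i);
        }
      });
      writers[t].start();
    }
    for (Thread w : writers) {
      w.join();
    }
    // prints "size=4000, consistent=true"
    System.out.println("size=" + q.size() + ", consistent=" + q.isConsistent());
  }
}
```

Whether a single coarse lock is acceptable for the real queue depends on how hot that path is; the sketch only shows that the counter and the contents cannot drift apart once they share one lock.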
[jira] [Commented] (HAMA-983) Hama runner for DataFlow
[ https://issues.apache.org/jira/browse/HAMA-983?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15730744#comment-15730744 ]

Edward J. Yoon commented on HAMA-983:
-------------------------------------

cool, let me check.
[jira] [Commented] (HAMA-983) Hama runner for DataFlow
[ https://issues.apache.org/jira/browse/HAMA-983?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15730450#comment-15730450 ]

Edward J. Yoon commented on HAMA-983:
-------------------------------------

Here's my skeleton code with an example that counts words. You should implement the HamaPipelineRunner: just translate and execute the batch job. I think you can find out how to translate the transforms from Flink's code:
https://github.com/dataArtisans/flink-dataflow/blob/aad5d936abd41240f3e15d294ea181fb9cca05e0/runner/src/main/java/com/dataartisans/flink/dataflow/translation/FlinkBatchTransformTranslators.java#L410

{code}
public class WordCountTest {

  static final String[] WORDS_ARRAY = new String[] { "hi there", "hi",
      "hi sue bob", "hi sue", "", "bob hi" };

  static final List<String> WORDS = Arrays.asList(WORDS_ARRAY);

  static final String[] COUNTS_ARRAY = new String[] { "hi: 5", "there: 1",
      "sue: 2", "bob: 2" };

  /**
   * Example test that tests a PTransform by using an in-memory input and
   * inspecting the output.
   */
  @Test
  @Category(RunnableOnService.class)
  public void testCountWords() throws Exception {
    HamaOptions options = PipelineOptionsFactory.as(HamaOptions.class);
    options.setRunner(HamaPipelineRunner.class);
    Pipeline p = Pipeline.create(options);

    PCollection<String> input = p.apply(Create.of(WORDS).withCoder(
        StringUtf8Coder.of()));

    // FormatAsTextFn is not defined in this skeleton.
    PCollection<String> output = input
        .apply(new WordCount())
        .apply(MapElements.via(new FormatAsTextFn()));
    //.apply(TextIO.Write.to("/tmp/result"));

    PAssert.that(output).containsInAnyOrder(COUNTS_ARRAY);
    p.run().waitUntilFinish();
  }

  public static class WordCount extends
      PTransform<PCollection<String>, PCollection<KV<String, Long>>> {
    private static final long serialVersionUID = 1L;

    @Override
    public PCollection<KV<String, Long>> apply(PCollection<String> lines) {

      // Convert lines of text into individual words.
      PCollection<String> words = lines.apply(ParDo.of(new DoFn<String, String>() {
        private static final long serialVersionUID = 1L;
        private final Aggregator<Long, Long> emptyLines = createAggregator(
            "emptyLines", new Sum.SumLongFn());

        @ProcessElement
        public void processElement(ProcessContext c) {
          if (c.element().trim().isEmpty()) {
            emptyLines.addValue(1L);
          }

          // Split the line into words.
          String[] words = c.element().split("[^a-zA-Z']+");

          // Output each word encountered into the output PCollection.
          for (String word : words) {
            if (!word.isEmpty()) {
              c.output(word);
            }
          }
        }
      }));

      // Count the number of times each word occurs.
      PCollection<KV<String, Long>> wordCounts = words.apply(Count
          .<String> perElement());

      return wordCounts;
    }
  }

  // TODO
  public static class HamaPipelineRunner extends
      PipelineRunner<HamaPipelineResult> {

    public static HamaPipelineRunner fromOptions(PipelineOptions x) {
      return new HamaPipelineRunner();
    }

    @Override
    public <Output extends POutput, Input extends PInput> Output apply(
        PTransform<Input, Output> transform, Input input) {
      return super.apply(transform, input);
    }

    @Override
    public HamaPipelineResult run(Pipeline pipeline) {
      // TODO Auto-generated method stub
      System.out.println("Executing pipeline using HamaPipelineRunner.");

      // TODO you need to translate pipeline to Hama program
      // and execute pipeline
      // return the result

      return null;
    }
  }

  public class HamaPipelineResult implements PipelineResult {

    @Override
    public State getState() {
      // TODO Auto-generated method stub
      return null;
    }

    @Override
    public State cancel() throws IOException {
      // TODO Auto-generated method stub
      return null;
    }

    @Override
    public State waitUntilFinish(Duration duration) {
      // TODO Auto-generated method stub
      return null;
    }

    @Override
    public State waitUntilFinish() {
      // TODO Auto-generated method stub
      return null;
    }

    @Override
    public <T> AggregatorValues<T> getAggregatorValues(
        Aggregator<?, T> aggregator) throws AggregatorRetrievalException {
      // TODO Auto-generated method stub
      return null;
    }

    @Override
    public MetricResults metrics() {
      // TODO Auto-generated method stub
      return null;
    }
  }

  public static interface HamaOptions extends PipelineOptions {
  }
}
{code}
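The expected counts in the skeleton's COUNTS_ARRAY can be checked without Beam or Hama. The sketch below is plain Java and makes no claim about how the runner should be built; the class name `WordCountCheck` is hypothetical. It reuses the input lines and the split pattern from the DoFn in the comment above and prints the counts in sorted key order.

```java
import java.util.Map;
import java.util.TreeMap;

/** Plain-Java check of the word counts expected by the skeleton test. */
class WordCountCheck {

  /** Counts words using the same split pattern as the DoFn in the skeleton. */
  static Map<String, Long> countWords(String[] lines) {
    Map<String, Long> counts = new TreeMap<>();
    for (String line : lines) {
      for (String word : line.split("[^a-zA-Z']+")) {
        if (!word.isEmpty()) { // an empty line splits into one empty string
          counts.merge(word, 1L, Long::sum);
        }
      }
    }
    return counts;
  }

  public static void main(String[] args) {
    String[] lines = { "hi there", "hi", "hi sue bob", "hi sue", "", "bob hi" };
    // prints "bob: 2", "hi: 5", "sue: 2", "there: 1", one per line
    countWords(lines).forEach((word, n) -> System.out.println(word + ": " + n));
  }
}
```

A runner that passes the PAssert in the skeleton has to produce the same four strings, in any order.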
[jira] [Created] (HAMA-997) Docker-compose for Hama Cluster
Edward J. Yoon created HAMA-997:
-----------------------------------

Summary: Docker-compose for Hama Cluster
Key: HAMA-997
URL: https://issues.apache.org/jira/browse/HAMA-997
Project: Hama
Issue Type: Task
Components: build, documentation
Affects Versions: 0.7.1
Reporter: Edward J. Yoon
Assignee: Edward J. Yoon
Fix For: 0.7.2

The current Dockerfile doesn't work correctly. Each service (e.g., master, groom servers) should have its own Dockerfile and be launched using docker-compose.
[jira] [Commented] (HAMA-983) Hama runner for DataFlow
[ https://issues.apache.org/jira/browse/HAMA-983?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15502033#comment-15502033 ]

Edward J. Yoon commented on HAMA-983:
-------------------------------------

>> once PoC is done

Great. If you need any help, feel free to let me know :-)
[jira] [Commented] (HAMA-983) Hama runner for DataFlow
[ https://issues.apache.org/jira/browse/HAMA-983?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15502017#comment-15502017 ]

Edward J. Yoon commented on HAMA-983:
-------------------------------------

Why don't we contribute this feature to Apache Beam directly?
https://github.com/apache/incubator-beam/tree/master/runners
[jira] [Commented] (HAMA-983) Hama runner for DataFlow
[ https://issues.apache.org/jira/browse/HAMA-983?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15501973#comment-15501973 ]

Edward J. Yoon commented on HAMA-983:
-------------------------------------

https://cloud.google.com/dataflow/examples/wordcount-example

This page describes the Beam concepts well. The flow is like below:

{code}
Creating the Pipeline
Applying transforms to the Pipeline
  Reading input (in this example: reading text files)
  Applying ParDo transforms
  Applying SDK-provided transforms (in this example: Count)
  Writing output (in this example: writing to Google Cloud Storage)
Running the Pipeline
{code}

Once we have created the Hama pipeline, we should be able to run the program like below:

{code}
public static void main(String[] args) {
  // Create a pipeline parameterized by commandline flags.
  Pipeline p = Pipeline.create(PipelineOptionsFactory.fromArgs(args).create());

  p.apply(TextIO.Read.from("gs://..."))   // Read input.
   .apply(new CountWords())               // Do some processing.
   .apply(TextIO.Write.to("gs://..."));   // Write output.

  // Run the pipeline.
  p.run();
}
{code}

For I/O operations, you can refer to https://github.com/apache/incubator-beam/blob/master/runners/spark/src/main/java/org/apache/beam/runners/spark/io/hadoop/HadoopIO.java (instead of org.apache.hadoop.mapreduce.lib.input.FileInputFormat you should use https://github.com/apache/hama/blob/master/core/src/main/java/org/apache/hama/bsp/FileInputFormat.java)

{quote}BSP for dataflow could be similar to SuperstepBSP{quote}

I think so. GroupByKey seems to be a built-in processor that groups records by key. We should implement it using a superstep.
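To make the "GroupByKey as a superstep" idea more concrete, here is a rough simulation in plain Java. It uses no Hama or Beam API; the class `GroupByKeySuperstepSketch`, the method names, and the hash-based owner rule are all assumptions made for illustration. Each simulated peer "sends" every (word, 1) pair to the peer that owns the key, and after the barrier each peer groups what it received.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

/** Simulates one shuffle superstep in memory; no real message passing. */
class GroupByKeySuperstepSketch {

  /** Picks the peer that owns a key; every sender must apply the same rule. */
  static int ownerOf(String key, int numPeers) {
    return Math.floorMod(key.hashCode(), numPeers);
  }

  /**
   * wordsPerPeer[p] holds the words local to peer p. The result holds, for
   * each peer, the values grouped by the keys that peer owns.
   */
  static List<Map<String, List<Long>>> groupByKey(String[][] wordsPerPeer) {
    int numPeers = wordsPerPeer.length;
    List<Map<String, List<Long>>> grouped = new ArrayList<>();
    for (int p = 0; p < numPeers; p++) {
      grouped.add(new TreeMap<>());
    }
    // "Send" phase: route each (word, 1) pair to the owning peer.
    for (String[] localWords : wordsPerPeer) {
      for (String word : localWords) {
        grouped.get(ownerOf(word, numPeers))
            .computeIfAbsent(word, k -> new ArrayList<>())
            .add(1L);
      }
    }
    // In a real BSP job a sync barrier would sit here, and each peer would
    // then read only its own received messages.
    return grouped;
  }

  public static void main(String[] args) {
    String[][] wordsPerPeer = {
        { "hi", "there", "hi" }, { "hi", "sue", "bob" }, { "bob", "sue" } };
    List<Map<String, List<Long>>> grouped = groupByKey(wordsPerPeer);
    for (int p = 0; p < grouped.size(); p++) {
      System.out.println("peer " + p + " -> " + grouped.get(p));
    }
  }
}
```

Because every sender applies the same owner rule, all values for a given key end up on one peer, which is the property a later combine step (such as Count) relies on.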
[jira] [Commented] (HAMA-983) Hama runner for DataFlow
[ https://issues.apache.org/jira/browse/HAMA-983?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15454239#comment-15454239 ]

Edward J. Yoon commented on HAMA-983:
-------------------------------------

Just FYI, Apache Beam's basic example is wordcount. I guess the batch mode can be similar to org.apache.hama.examples.PiEstimator: (n - 1) tasks parse and count the words, and 1 task aggregates the word counts and emits the final result. I'm not sure about the streaming mode, so you'll need to check how it handles I/O.
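The "(n - 1) tasks count, 1 task aggregates" layout described above can be sketched with ordinary Java threads. This is only an illustration of the pattern, not Hama code: the class `MasterWorkerWordCount`, the use of a BlockingQueue as the aggregator's inbox, and the way the input is split are assumptions.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.TreeMap;
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.LinkedBlockingQueue;

/** n - 1 counting threads send partial counts to one aggregating thread. */
class MasterWorkerWordCount {

  static Map<String, Long> run(String[][] splits) {
    BlockingQueue<Map<String, Long>> inbox = new LinkedBlockingQueue<>();

    // One counting task per split: these play the role of the (n - 1) tasks.
    for (String[] split : splits) {
      new Thread(() -> {
        Map<String, Long> partial = new HashMap<>();
        for (String line : split) {
          for (String word : line.split("[^a-zA-Z']+")) {
            if (!word.isEmpty()) {
              partial.merge(word, 1L, Long::sum);
            }
          }
        }
        inbox.add(partial); // "send" the partial result to the aggregator
      }).start();
    }

    // The calling thread plays the single aggregating task.
    Map<String, Long> total = new TreeMap<>();
    try {
      for (int i = 0; i < splits.length; i++) {
        inbox.take().forEach((word, n) -> total.merge(word, n, Long::sum));
      }
    } catch (InterruptedException e) {
      Thread.currentThread().interrupt();
      throw new IllegalStateException(e);
    }
    return total;
  }

  public static void main(String[] args) {
    String[][] splits = {
        { "hi there", "hi" }, { "hi sue bob", "hi sue", "", "bob hi" } };
    // prints "{bob=2, hi=5, sue=2, there=1}"
    System.out.println(run(splits));
  }
}
```

The single aggregator is simple but becomes a bottleneck as the number of distinct words grows, which is one reason a key-partitioned shuffle may be preferable for larger inputs.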
[jira] [Commented] (HAMA-983) Hama runner for DataFlow
[ https://issues.apache.org/jira/browse/HAMA-983?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15451100#comment-15451100 ]

Edward J. Yoon commented on HAMA-983:
-------------------------------------

Hi, I didn't look at Dataflow (Apache Beam) closely, but:

>> Do you mean that each superstep can be executed in data pipeline as a
>> pcollection?

I guess yes, or a single job can be executed, as the case may be. If you're interested in working on this, you can refer to https://github.com/dataArtisans/flink-dataflow/blob/master/runner/src/main/java/com/dataartisans/flink/dataflow/FlinkPipelineRunner.java

And, before we do this, I guess HAMA-940 and a data processing BSP may need to come first.

Please feel free to share your opinion and contribute patches. :-) If you have any questions, let me know.
[jira] [Commented] (HAMA-991) Add math classes for float16/float32
[ https://issues.apache.org/jira/browse/HAMA-991?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15438496#comment-15438496 ] Edward J. Yoon commented on HAMA-991: - NOTE: float16 is not implemented yet. > Add math classes for float16/float32 > > > Key: HAMA-991 > URL: https://issues.apache.org/jira/browse/HAMA-991 > Project: Hama > Issue Type: New Feature > Components: math >Affects Versions: 0.7.1 >Reporter: Edward J. Yoon >Assignee: Edward J. Yoon > Fix For: 0.7.2 > > > Implement Float32Writable, Vector, and Matrix etc. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Resolved] (HAMA-988) Allow to add additional no-input tasks as number user want
[ https://issues.apache.org/jira/browse/HAMA-988?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Edward J. Yoon resolved HAMA-988. - Resolution: Fixed solved > Allow to add additional no-input tasks as number user want > -- > > Key: HAMA-988 > URL: https://issues.apache.org/jira/browse/HAMA-988 > Project: Hama > Issue Type: Improvement > Components: bsp core >Affects Versions: 0.7.1 >Reporter: Edward J. Yoon >Assignee: Edward J. Yoon > Fix For: 0.7.2 > > > BSP framework basically launches the tasks as number of splits. And, > force-setting the number of tasks is also possible by setting > "hama.force.set.bsp.tasks" to true . > By the way, there's no way to add more specific tasks to the number of > splits. For example, if input has 5 splits, I want to launch 6 (1 more > no-input task to be acted as a master) tasks. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HAMA-996) Delete meaningless parameter
[ https://issues.apache.org/jira/browse/HAMA-996?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15438170#comment-15438170 ]

Edward J. Yoon commented on HAMA-996:
-------------------------------------

I think TaskInProgress.getRecoveryTask() is related to task recovery. If the ft service is enabled, the framework checkpoints statuses periodically. When tasks fail or crash, the framework recovers the tasks from the previous checkpoint automatically. It seems that this is the role of GroomServer.startRecoveryTask() and AsyncRcvdMsgCheckpointImpl.restartTask().

If TaskInProgress.getRecoveryTask() is useless code, we can remove it or mark it @Deprecated with a comment.

> Delete meaningless parameter
> ----------------------------
>
> Key: HAMA-996
> URL: https://issues.apache.org/jira/browse/HAMA-996
> Project: Hama
> Issue Type: Improvement
> Reporter: JongYoon Lim
> Priority: Trivial
> Attachments: HAMA-996.patch
>
> It seems that *taskid* param from *getGroomToSchedule()* of *TaskInProgress*
> is not essential for this function.
[jira] [Commented] (HAMA-799) Add a new BSP API that uses multiple threads
[ https://issues.apache.org/jira/browse/HAMA-799?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15438151#comment-15438151 ]

Edward J. Yoon commented on HAMA-799:
-------------------------------------

Hi, I originally thought that we could add something like https://hadoop.apache.org/docs/r2.4.1/api/org/apache/hadoop/mapreduce/lib/map/MultithreadedMapper.html and the goal was to support an easy-to-use multithreading API within BSP. But the two models differ slightly.

In the MapReduce case, the map(K, V) function processes the K, V of each line of the data chunks sequentially (as you might already know). The MultithreadedMapper processes lines concurrently and generates intermediate files.

The BSP model is more flexible. We can implement a MapReduce framework on the BSP model like below:

{code}
bsp(BSPPeer peer) {
  while (peer.readNext(key, value)) {
    map(key, value); // calls user-defined map function.
  }
  ...
}
{code}

Then, the MultithreadedMapper is just like below:

{code}
bsp(BSPPeer peer) {
  while (peer.readNext(key, value)) {
    executor.execute(new MultithreadedMapper(key, value)); // executes map function concurrently.
  }
  ...
}
{code}

After the while loop, the two approaches above will produce the same result but with different performance.

The BSP model is slightly different. The threads need to share the incoming and outgoing queues. Otherwise, it's just the same as increasing the number of BSP tasks (which is meaningless). So, multithreading should be used only to parallelize some sequential computation part, not the whole bsp() function. For example,

{code}
bsp() {
  ...
  for (int i = 0; i < 1000; i++) {
    ... // this part can be multi-threaded.
  }
  ...
}
{code}

In GraphJobRunner, I used multithreading like below:

{code}
private void doSuperstep(GraphJobMessage currentMessage,
    BSPPeer<Writable, Writable, Writable, Writable, GraphJobMessage> peer)
    throws IOException {
  this.errorCount.set(0);
  long startTime = System.currentTimeMillis();

  this.changedVertexCnt = 0;
  vertices.startSuperstep();

  ThreadPoolExecutor executor = (ThreadPoolExecutor) Executors
      .newCachedThreadPool();
  executor.setMaximumPoolSize(conf.getInt(DEFAULT_THREAD_POOL_SIZE, 64));
  executor.setRejectedExecutionHandler(retryHandler);

  long loopStartTime = System.currentTimeMillis();
  while (currentMessage != null) {
    executor.execute(new ComputeRunnable(currentMessage));

    currentMessage = peer.getCurrentMessage();
  }
  LOG.info("Total time spent for superstep-" + peer.getSuperstepCount()
      + " looping: " + (System.currentTimeMillis() - loopStartTime) + " ms");

  executor.shutdown();
  try {
    executor.awaitTermination(60, TimeUnit.SECONDS);
  } catch (InterruptedException e) {
    throw new IOException(e);
  }

  if (errorCount.get() > 0) {
    throw new IOException("there were " + errorCount
        + " exceptions during compute vertices.");
  }

  Iterator it = vertices.iterator();

  while (it.hasNext()) {
    Vertex<V, E, M> vertex = (Vertex<V, E, M>) it.next();
    if (!vertex.isHalted() && !vertex.isComputed()) {
      vertex.compute(Collections.<M> emptyList());
      vertices.finishVertexComputation(vertex);
    }
  }

  getAggregationRunner().sendAggregatorValues(peer,
      vertices.getActiveVerticesNum(), this.changedVertexCnt);
  this.iteration++;

  LOG.info("Total time spent for superstep-" + peer.getSuperstepCount()
      + " computing vertices: " + (System.currentTimeMillis() - startTime)
      + " ms");

  startTime = System.currentTimeMillis();
  finishSuperstep();
  LOG.info("Total time spent for superstep-" + peer.getSuperstepCount()
      + " synchronizing: " + (System.currentTimeMillis() - startTime) + " ms");
}
{code}

If there's a more elegant way to use multithreading in the bsp() function, we can do it. Otherwise, we should close this issue.

> Add a new BSP API that uses multiple threads
> --------------------------------------------
>
> Key: HAMA-799
> URL: https://issues.apache.org/jira/browse/HAMA-799
> Project: Hama
> Issue Type: New Feature
> Components: bsp core
> Reporter: Edward J. Yoon
> Assignee: Edward J. Yoon
>
> Add a new (additional) BSP API that uses multiple threads, called
> MultithreadedBSP. This could help in speeding up highly CPU-intensive
> tasks.
> And, I personally would like to re-design the GraphJobRunner based on this
> MultithreadedBSP, because computing one vertex at a time is a cause of slow
> performance.
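The point made in the comment above, that only an independent, CPU-bound part inside bsp() should be parallelized while message handling stays on the task's own thread, can be sketched as follows. This is plain Java with a fixed thread pool; the class `ParallelLoopSketch`, the method `compute`, and the choice of summing squares are placeholders, not anything taken from Hama.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

/** Parallelizes only the loop body; results are combined on the caller. */
class ParallelLoopSketch {

  /** Placeholder for some CPU-heavy computation that is independent per item. */
  static long compute(int i) {
    return (long) i * i;
  }

  /** Runs the loop body on a pool and sums the results on the calling thread. */
  static long sumOfSquares(int n, int threads) {
    ExecutorService pool = Executors.newFixedThreadPool(threads);
    try {
      List<Future<Long>> parts = new ArrayList<>();
      for (int i = 0; i < n; i++) {
        final int item = i;
        Callable<Long> task = () -> compute(item);
        parts.add(pool.submit(task));
      }
      long sum = 0;
      for (Future<Long> part : parts) {
        sum += part.get(); // waits for that task to finish
      }
      return sum;
    } catch (InterruptedException e) {
      Thread.currentThread().interrupt();
      throw new IllegalStateException(e);
    } catch (ExecutionException e) {
      throw new IllegalStateException(e.getCause());
    } finally {
      pool.shutdown();
    }
  }

  public static void main(String[] args) {
    // Same loop bound as the pseudo-code in the comment; prints 332833500.
    System.out.println(sumOfSquares(1000, 4));
  }
}
```

Collecting the futures before touching any shared state keeps the outgoing queue single-threaded, which sidesteps the queue-sharing concern raised in the comment.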
[jira] [Commented] (HAMA-996) Delete meaningless parameter
[ https://issues.apache.org/jira/browse/HAMA-996?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15436419#comment-15436419 ]

Edward J. Yoon commented on HAMA-996:
-------------------------------------

It looks like the getGroomToSchedule() method is useless.
[jira] [Commented] (HAMA-994) Support GPU for math operations
[ https://issues.apache.org/jira/browse/HAMA-994?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15436238#comment-15436238 ]

Edward J. Yoon commented on HAMA-994:
-------------------------------------

Status: I checked the license of aparapi, and it's not suitable for an Apache project - https://github.com/aparapi/aparapi/issues/37

If AMD's official reply is the same, I'll look into other options.

> Support GPU for math operations
> -------------------------------
>
> Key: HAMA-994
> URL: https://issues.apache.org/jira/browse/HAMA-994
> Project: Hama
> Issue Type: New Feature
> Affects Versions: 0.7.1
> Reporter: Edward J. Yoon
> Assignee: Edward J. Yoon
> Fix For: 0.7.2
>
> Support GPU for matrix/vector operations using aparapi.
[jira] [Created] (HAMA-994) Support GPU for math operations
Edward J. Yoon created HAMA-994: --- Summary: Support GPU for math operations Key: HAMA-994 URL: https://issues.apache.org/jira/browse/HAMA-994 Project: Hama Issue Type: New Feature Affects Versions: 0.7.1 Reporter: Edward J. Yoon Assignee: Edward J. Yoon Fix For: 0.7.2 Support GPU for matrix/vector operations using aparapi. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HAMA-993) HAMA Cluster is not running pi example
[ https://issues.apache.org/jira/browse/HAMA-993?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15398974#comment-15398974 ] Edward J. Yoon commented on HAMA-993: - Hi, Can you please provide your error logs? > HAMA Cluster is not running pi example > -- > > Key: HAMA-993 > URL: https://issues.apache.org/jira/browse/HAMA-993 > Project: Hama > Issue Type: Bug > Components: examples >Affects Versions: 0.7.1 >Reporter: Jatinder Goyal > > I have setup hama cluster of 9 nodes. I have used all the recommended > settings given on the site, but when I try to run pi example on hama it gets > stuck there. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Comment Edited] (HAMA-990) GSoC'16: Apache Hama benchmark against Spark and Flink
[ https://issues.apache.org/jira/browse/HAMA-990?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15341557#comment-15341557 ]

Edward J. Yoon edited comment on HAMA-990 at 6/21/16 10:59 AM:
---------------------------------------------------------------

Generally looks good! As you already planned, it'd be nice if you could add more functions that dump the output and plot 2D charts (gnuplot or Google Chart API?).

Why don't you create a simple benchmark-tool project on GitHub? That makes it easier to review and share the code.

was (Author: udanax):
Generally looks good! As you already planned, it'd be nice if you could add more functions that dump the output and plot 2D charts.

Why don't you create a simple benchmark-tool project on GitHub? That makes it easier to review and share the code.

> GSoC'16: Apache Hama benchmark against Spark and Flink
> ------------------------------------------------------
>
> Key: HAMA-990
> URL: https://issues.apache.org/jira/browse/HAMA-990
> Project: Hama
> Issue Type: Documentation
> Reporter: Behroz Sikander
> Priority: Minor
> Attachments: Benchmark_script.sh, ver1.1_benchmark_script.sh
[jira] [Commented] (HAMA-990) GSoC'16: Apache Hama benchmark against Spark and Flink
[ https://issues.apache.org/jira/browse/HAMA-990?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15341557#comment-15341557 ]

Edward J. Yoon commented on HAMA-990:
-------------------------------------

Generally looks good! As you already planned, it'd be nice if you could add more functions that dump the output and plot 2D charts.

Why don't you create a simple benchmark-tool project on GitHub? That makes it easier to review and share the code.
[jira] [Commented] (HAMA-990) GSoC'16: Apache Hama benchmark against Spark and Flink
[ https://issues.apache.org/jira/browse/HAMA-990?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15331107#comment-15331107 ]

Edward J. Yoon commented on HAMA-990:
-------------------------------------

Sorry for the late review; I'm on a business trip until the 21st :/

It'd be nice if you could write some documentation and share the test results with me by next week.
[jira] [Commented] (HAMA-992) Hama streaming
[ https://issues.apache.org/jira/browse/HAMA-992?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15311757#comment-15311757 ] Edward J. Yoon commented on HAMA-992: - Hi, if possible please attach your bsp python code here. And, I mean, you should update the BinaryProtocol.py and copy to hdfs again (instead of change the hama project). > Hama streaming > -- > > Key: HAMA-992 > URL: https://issues.apache.org/jira/browse/HAMA-992 > Project: Hama > Issue Type: Question > Components: bsp core, pipes >Affects Versions: 0.7.1 > Environment: RASPBIAN JESSIE > Full desktop image based on Debian Jessie >Reporter: Chaitanya > Labels: features, github-import, newbie > > Hello all, > I am trying to implement apache hama on Raspberry pi model 3 to establish a > distributed computing platform for scientific computation. I am trying to run > hama streaming over hadoop on a single namenode but I am facing a bit of a > difficulty in streaming my python code. I have downloaded the hama streaming > repository from :- > https://github.com/thomasjungblut/HamaStreaming > I ran the examples and also HelloWorldBSP.py on Hama and they work well. But > as soon as I switch to running my python code, the job fails. > I am trying to run the code with the following command:- > hama pipes -streaming true -bspTasks 1 -interpreter python -output > /tmp/pystream-out_2/ -program /tmp/PyStreaming/BSPRunner.py -programArgs > python.py > Below is the log file for your reference. I hope you can find time to help me > in this minor project:- > 16/06/01 14:48:46 WARN util.NativeCodeLoader: Unable to load native-hadoop > library for your platform... 
using builtin-java classes where applicable > 16/06/01 14:48:49 INFO ipc.Server: Starting Socket Reader #1 for port 61001 > 16/06/01 14:48:49 INFO ipc.Server: IPC Server listener on 61001: starting > 16/06/01 14:48:49 INFO ipc.Server: IPC Server Responder: starting > 16/06/01 14:48:49 INFO ipc.Server: IPC Server handler 0 on 61001: starting > 16/06/01 14:48:49 INFO ipc.Server: IPC Server handler 1 on 61001: starting > 16/06/01 14:48:49 INFO ipc.Server: IPC Server handler 3 on 61001: starting > 16/06/01 14:48:49 INFO ipc.Server: IPC Server handler 2 on 61001: starting > 16/06/01 14:48:49 INFO ipc.Server: IPC Server handler 4 on 61001: starting > 16/06/01 14:48:49 INFO message.HamaMessageManagerImpl: BSPPeer > address:localhost port:61001 > 16/06/01 14:48:51 INFO Configuration.deprecation: mapred.cache.localFiles is > deprecated. Instead, use mapreduce.job.cache.local.files > 16/06/01 14:48:51 INFO sync.ZKSyncClient: Initializing ZK Sync Client > 16/06/01 14:48:51 INFO sync.ZooKeeperSyncClientImpl: Start connecting to > Zookeeper! 
At localhost/127.0.0.1:61001 > java.lang.NumberFormatException: For input string: "Traceback (most recent > call last):" > at > java.lang.NumberFormatException.forInputString(NumberFormatException.java:65) > at java.lang.Integer.parseInt(Integer.java:580) > at java.lang.Integer.parseInt(Integer.java:615) > at > org.apache.hama.pipes.protocol.StreamingProtocol$StreamingUplinkReaderThread.readCommand(StreamingProtocol.java:174) > at > org.apache.hama.pipes.protocol.UplinkReader.run(UplinkReader.java:106) > 16/06/01 14:48:52 ERROR protocol.UplinkReader: java.lang.Exception: Bad > command code: -2 > at > org.apache.hama.pipes.protocol.UplinkReader.run(UplinkReader.java:174) > java.util.concurrent.BrokenBarrierException > at java.util.concurrent.CyclicBarrier.dowait(CyclicBarrier.java:250) > at java.util.concurrent.CyclicBarrier.await(CyclicBarrier.java:362) > at > org.apache.hama.pipes.protocol.StreamingProtocol.start(StreamingProtocol.java:223) > at > org.apache.hama.pipes.PipesApplication.start(PipesApplication.java:293) > at org.apache.hama.pipes.PipesBSP.setup(PipesBSP.java:43) > at org.apache.hama.bsp.BSPTask.runBSP(BSPTask.java:170) > at org.apache.hama.bsp.BSPTask.run(BSPTask.java:144) > at > org.apache.hama.bsp.GroomServer$BSPPeerChild.main(GroomServer.java:1255) > Exception in thread "pipe-uplink-handler" java.lang.RuntimeException: > java.lang.Exception: Bad command code: -2 > at > org.apache.hama.pipes.protocol.UplinkReader.run(UplinkReader.java:182) > Caused by: java.lang.Exception: Bad command code: -2 > at > org.apache.hama.pipes.protocol.UplinkReader.run(UplinkReader.java:174) > 16/06/01 14:48:52 ERROR bsp.BSPTask: Error running bsp setup and bsp function. > java.io.IOException: Stream closed > at > java.lang.ProcessBuilder$NullOutputStream.write(ProcessBuilder.java:433) > at java.io.OutputStream.write(OutputStream.java:116) > at > java.io.BufferedOutputStream.flushBuffer(BufferedOutputStream.java:82) > at
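The NumberFormatException in the log above shows that the Java side received the text "Traceback (most recent call last):" where StreamingProtocol's readCommand expected an integer command code, which suggests the Python program failed and its traceback reached the stream that the uplink reader parses. A minimal sketch of one way to keep such output off that stream while debugging locally is given below. The names run_guarded and capture_failure are hypothetical helpers, not part of Hama or HamaStreaming, and how the traceback actually reached the pipe in this job is not known from the log.

```python
import io
import traceback


def run_guarded(user_main, log_stream):
    """Call user_main(); on failure write the traceback to log_stream.

    Hypothetical helper: nothing is printed to stdout here, so a crash
    cannot be mistaken for protocol data by whoever reads stdout.
    Returns 0 on success and 1 on failure.
    """
    try:
        user_main()
        return 0
    except Exception:
        traceback.print_exc(file=log_stream)
        log_stream.flush()
        return 1


def capture_failure(user_main):
    """Convenience for local testing: return (exit_code, traceback_text)."""
    buf = io.StringIO()
    code = run_guarded(user_main, buf)
    return code, buf.getvalue()
```

Running the failing script through capture_failure outside Hama first should reveal the underlying Python error, which is hidden in the job log behind the Java parse failure.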
[jira] [Issue Comment Deleted] (HAMA-992) Hama streaming
[ https://issues.apache.org/jira/browse/HAMA-992?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Edward J. Yoon updated HAMA-992: Comment: was deleted (was: +1. Thanks for your opinion and action.) > Hama streaming > -- > > Key: HAMA-992 > URL: https://issues.apache.org/jira/browse/HAMA-992 > Project: Hama > Issue Type: Question > Components: bsp core, pipes >Affects Versions: 0.7.1 > Environment: RASPBIAN JESSIE > Full desktop image based on Debian Jessie >Reporter: Chaitanya > Labels: features, github-import, newbie > > Hello all, > I am trying to implement apache hama on Raspberry pi model 3 to establish a > distributed computing platform for scientific computation. I am trying to run > hama streaming over hadoop on a single namenode but I am facing a bit of a > difficulty in streaming my python code. I have downloaded the hama streaming > repository from :- > https://github.com/thomasjungblut/HamaStreaming > I ran the examples and also HelloWorldBSP.py on Hama and they work well. But > as soon as I switch to running my python code, the job fails. > I am trying to run the code with the following command:- > hama pipes -streaming true -bspTasks 1 -interpreter python -output > /tmp/pystream-out_2/ -program /tmp/PyStreaming/BSPRunner.py -programArgs > python.py > Below is the log file for your reference. I hope you can find time to help me > in this minor project:- > 16/06/01 14:48:46 WARN util.NativeCodeLoader: Unable to load native-hadoop > library for your platform... 
using builtin-java classes where applicable > 16/06/01 14:48:49 INFO ipc.Server: Starting Socket Reader #1 for port 61001 > 16/06/01 14:48:49 INFO ipc.Server: IPC Server listener on 61001: starting > 16/06/01 14:48:49 INFO ipc.Server: IPC Server Responder: starting > 16/06/01 14:48:49 INFO ipc.Server: IPC Server handler 0 on 61001: starting > 16/06/01 14:48:49 INFO ipc.Server: IPC Server handler 1 on 61001: starting > 16/06/01 14:48:49 INFO ipc.Server: IPC Server handler 3 on 61001: starting > 16/06/01 14:48:49 INFO ipc.Server: IPC Server handler 2 on 61001: starting > 16/06/01 14:48:49 INFO ipc.Server: IPC Server handler 4 on 61001: starting > 16/06/01 14:48:49 INFO message.HamaMessageManagerImpl: BSPPeer > address:localhost port:61001 > 16/06/01 14:48:51 INFO Configuration.deprecation: mapred.cache.localFiles is > deprecated. Instead, use mapreduce.job.cache.local.files > 16/06/01 14:48:51 INFO sync.ZKSyncClient: Initializing ZK Sync Client > 16/06/01 14:48:51 INFO sync.ZooKeeperSyncClientImpl: Start connecting to > Zookeeper! 
At localhost/127.0.0.1:61001 > java.lang.NumberFormatException: For input string: "Traceback (most recent > call last):" > at > java.lang.NumberFormatException.forInputString(NumberFormatException.java:65) > at java.lang.Integer.parseInt(Integer.java:580) > at java.lang.Integer.parseInt(Integer.java:615) > at > org.apache.hama.pipes.protocol.StreamingProtocol$StreamingUplinkReaderThread.readCommand(StreamingProtocol.java:174) > at > org.apache.hama.pipes.protocol.UplinkReader.run(UplinkReader.java:106) > 16/06/01 14:48:52 ERROR protocol.UplinkReader: java.lang.Exception: Bad > command code: -2 > at > org.apache.hama.pipes.protocol.UplinkReader.run(UplinkReader.java:174) > java.util.concurrent.BrokenBarrierException > at java.util.concurrent.CyclicBarrier.dowait(CyclicBarrier.java:250) > at java.util.concurrent.CyclicBarrier.await(CyclicBarrier.java:362) > at > org.apache.hama.pipes.protocol.StreamingProtocol.start(StreamingProtocol.java:223) > at > org.apache.hama.pipes.PipesApplication.start(PipesApplication.java:293) > at org.apache.hama.pipes.PipesBSP.setup(PipesBSP.java:43) > at org.apache.hama.bsp.BSPTask.runBSP(BSPTask.java:170) > at org.apache.hama.bsp.BSPTask.run(BSPTask.java:144) > at > org.apache.hama.bsp.GroomServer$BSPPeerChild.main(GroomServer.java:1255) > Exception in thread "pipe-uplink-handler" java.lang.RuntimeException: > java.lang.Exception: Bad command code: -2 > at > org.apache.hama.pipes.protocol.UplinkReader.run(UplinkReader.java:182) > Caused by: java.lang.Exception: Bad command code: -2 > at > org.apache.hama.pipes.protocol.UplinkReader.run(UplinkReader.java:174) > 16/06/01 14:48:52 ERROR bsp.BSPTask: Error running bsp setup and bsp function. 
> java.io.IOException: Stream closed > at > java.lang.ProcessBuilder$NullOutputStream.write(ProcessBuilder.java:433) > at java.io.OutputStream.write(OutputStream.java:116) > at > java.io.BufferedOutputStream.flushBuffer(BufferedOutputStream.java:82) > at java.io.BufferedOutputStream.flush(BufferedOutputStream.java:140) > at java.io.BufferedOutputStream.flush(BufferedOutputStream.java:141) > at
[jira] [Commented] (HAMA-992) Hama streaming
[ https://issues.apache.org/jira/browse/HAMA-992?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15310141#comment-15310141 ] Edward J. Yoon commented on HAMA-992: - +1. Thanks for your opinion and action. > Hama streaming > -- > > Key: HAMA-992 > URL: https://issues.apache.org/jira/browse/HAMA-992 > Project: Hama > Issue Type: Question > Components: bsp core, pipes >Affects Versions: 0.7.1 > Environment: RASPBIAN JESSIE > Full desktop image based on Debian Jessie >Reporter: Chaitanya > Labels: features, github-import, newbie > > Hello all, > I am trying to implement apache hama on Raspberry pi model 3 to establish a > distributed computing platform for scientific computation. I am trying to run > hama streaming over hadoop on a single namenode but I am facing a bit of a > difficulty in streaming my python code. I have downloaded the hama streaming > repository from :- > https://github.com/thomasjungblut/HamaStreaming > I ran the examples and also HelloWorldBSP.py on Hama and they work well. But > as soon as I switch to running my python code, the job fails. > I am trying to run the code with the following command:- > hama pipes -streaming true -bspTasks 1 -interpreter python -output > /tmp/pystream-out_2/ -program /tmp/PyStreaming/BSPRunner.py -programArgs > python.py > Below is the log file for your reference. I hope you can find time to help me > in this minor project:- > 16/06/01 14:48:46 WARN util.NativeCodeLoader: Unable to load native-hadoop > library for your platform... 
using builtin-java classes where applicable > 16/06/01 14:48:49 INFO ipc.Server: Starting Socket Reader #1 for port 61001 > 16/06/01 14:48:49 INFO ipc.Server: IPC Server listener on 61001: starting > 16/06/01 14:48:49 INFO ipc.Server: IPC Server Responder: starting > 16/06/01 14:48:49 INFO ipc.Server: IPC Server handler 0 on 61001: starting > 16/06/01 14:48:49 INFO ipc.Server: IPC Server handler 1 on 61001: starting > 16/06/01 14:48:49 INFO ipc.Server: IPC Server handler 3 on 61001: starting > 16/06/01 14:48:49 INFO ipc.Server: IPC Server handler 2 on 61001: starting > 16/06/01 14:48:49 INFO ipc.Server: IPC Server handler 4 on 61001: starting > 16/06/01 14:48:49 INFO message.HamaMessageManagerImpl: BSPPeer > address:localhost port:61001 > 16/06/01 14:48:51 INFO Configuration.deprecation: mapred.cache.localFiles is > deprecated. Instead, use mapreduce.job.cache.local.files > 16/06/01 14:48:51 INFO sync.ZKSyncClient: Initializing ZK Sync Client > 16/06/01 14:48:51 INFO sync.ZooKeeperSyncClientImpl: Start connecting to > Zookeeper! 
At localhost/127.0.0.1:61001 > java.lang.NumberFormatException: For input string: "Traceback (most recent > call last):" > at > java.lang.NumberFormatException.forInputString(NumberFormatException.java:65) > at java.lang.Integer.parseInt(Integer.java:580) > at java.lang.Integer.parseInt(Integer.java:615) > at > org.apache.hama.pipes.protocol.StreamingProtocol$StreamingUplinkReaderThread.readCommand(StreamingProtocol.java:174) > at > org.apache.hama.pipes.protocol.UplinkReader.run(UplinkReader.java:106) > 16/06/01 14:48:52 ERROR protocol.UplinkReader: java.lang.Exception: Bad > command code: -2 > at > org.apache.hama.pipes.protocol.UplinkReader.run(UplinkReader.java:174) > java.util.concurrent.BrokenBarrierException > at java.util.concurrent.CyclicBarrier.dowait(CyclicBarrier.java:250) > at java.util.concurrent.CyclicBarrier.await(CyclicBarrier.java:362) > at > org.apache.hama.pipes.protocol.StreamingProtocol.start(StreamingProtocol.java:223) > at > org.apache.hama.pipes.PipesApplication.start(PipesApplication.java:293) > at org.apache.hama.pipes.PipesBSP.setup(PipesBSP.java:43) > at org.apache.hama.bsp.BSPTask.runBSP(BSPTask.java:170) > at org.apache.hama.bsp.BSPTask.run(BSPTask.java:144) > at > org.apache.hama.bsp.GroomServer$BSPPeerChild.main(GroomServer.java:1255) > Exception in thread "pipe-uplink-handler" java.lang.RuntimeException: > java.lang.Exception: Bad command code: -2 > at > org.apache.hama.pipes.protocol.UplinkReader.run(UplinkReader.java:182) > Caused by: java.lang.Exception: Bad command code: -2 > at > org.apache.hama.pipes.protocol.UplinkReader.run(UplinkReader.java:174) > 16/06/01 14:48:52 ERROR bsp.BSPTask: Error running bsp setup and bsp function. 
> java.io.IOException: Stream closed > at > java.lang.ProcessBuilder$NullOutputStream.write(ProcessBuilder.java:433) > at java.io.OutputStream.write(OutputStream.java:116) > at > java.io.BufferedOutputStream.flushBuffer(BufferedOutputStream.java:82) > at java.io.BufferedOutputStream.flush(BufferedOutputStream.java:140) > at java.io.BufferedOutputStream.flush(BufferedOutputStream.java:141) > at
[jira] [Commented] (HAMA-992) Hama streaming
[ https://issues.apache.org/jira/browse/HAMA-992?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15310140#comment-15310140 ] Edward J. Yoon commented on HAMA-992: - +1. Thanks for your opinion and action. > Hama streaming > -- > > Key: HAMA-992 > URL: https://issues.apache.org/jira/browse/HAMA-992 > Project: Hama > Issue Type: Question > Components: bsp core, pipes >Affects Versions: 0.7.1 > Environment: RASPBIAN JESSIE > Full desktop image based on Debian Jessie >Reporter: Chaitanya > Labels: features, github-import, newbie > > Hello all, > I am trying to implement apache hama on Raspberry pi model 3 to establish a > distributed computing platform for scientific computation. I am trying to run > hama streaming over hadoop on a single namenode but I am facing a bit of a > difficulty in streaming my python code. I have downloaded the hama streaming > repository from :- > https://github.com/thomasjungblut/HamaStreaming > I ran the examples and also HelloWorldBSP.py on Hama and they work well. But > as soon as I switch to running my python code, the job fails. > I am trying to run the code with the following command:- > hama pipes -streaming true -bspTasks 1 -interpreter python -output > /tmp/pystream-out_2/ -program /tmp/PyStreaming/BSPRunner.py -programArgs > python.py > Below is the log file for your reference. I hope you can find time to help me > in this minor project:- > 16/06/01 14:48:46 WARN util.NativeCodeLoader: Unable to load native-hadoop > library for your platform... 
using builtin-java classes where applicable > 16/06/01 14:48:49 INFO ipc.Server: Starting Socket Reader #1 for port 61001 > 16/06/01 14:48:49 INFO ipc.Server: IPC Server listener on 61001: starting > 16/06/01 14:48:49 INFO ipc.Server: IPC Server Responder: starting > 16/06/01 14:48:49 INFO ipc.Server: IPC Server handler 0 on 61001: starting > 16/06/01 14:48:49 INFO ipc.Server: IPC Server handler 1 on 61001: starting > 16/06/01 14:48:49 INFO ipc.Server: IPC Server handler 3 on 61001: starting > 16/06/01 14:48:49 INFO ipc.Server: IPC Server handler 2 on 61001: starting > 16/06/01 14:48:49 INFO ipc.Server: IPC Server handler 4 on 61001: starting > 16/06/01 14:48:49 INFO message.HamaMessageManagerImpl: BSPPeer > address:localhost port:61001 > 16/06/01 14:48:51 INFO Configuration.deprecation: mapred.cache.localFiles is > deprecated. Instead, use mapreduce.job.cache.local.files > 16/06/01 14:48:51 INFO sync.ZKSyncClient: Initializing ZK Sync Client > 16/06/01 14:48:51 INFO sync.ZooKeeperSyncClientImpl: Start connecting to > Zookeeper! 
At localhost/127.0.0.1:61001 > java.lang.NumberFormatException: For input string: "Traceback (most recent > call last):" > at > java.lang.NumberFormatException.forInputString(NumberFormatException.java:65) > at java.lang.Integer.parseInt(Integer.java:580) > at java.lang.Integer.parseInt(Integer.java:615) > at > org.apache.hama.pipes.protocol.StreamingProtocol$StreamingUplinkReaderThread.readCommand(StreamingProtocol.java:174) > at > org.apache.hama.pipes.protocol.UplinkReader.run(UplinkReader.java:106) > 16/06/01 14:48:52 ERROR protocol.UplinkReader: java.lang.Exception: Bad > command code: -2 > at > org.apache.hama.pipes.protocol.UplinkReader.run(UplinkReader.java:174) > java.util.concurrent.BrokenBarrierException > at java.util.concurrent.CyclicBarrier.dowait(CyclicBarrier.java:250) > at java.util.concurrent.CyclicBarrier.await(CyclicBarrier.java:362) > at > org.apache.hama.pipes.protocol.StreamingProtocol.start(StreamingProtocol.java:223) > at > org.apache.hama.pipes.PipesApplication.start(PipesApplication.java:293) > at org.apache.hama.pipes.PipesBSP.setup(PipesBSP.java:43) > at org.apache.hama.bsp.BSPTask.runBSP(BSPTask.java:170) > at org.apache.hama.bsp.BSPTask.run(BSPTask.java:144) > at > org.apache.hama.bsp.GroomServer$BSPPeerChild.main(GroomServer.java:1255) > Exception in thread "pipe-uplink-handler" java.lang.RuntimeException: > java.lang.Exception: Bad command code: -2 > at > org.apache.hama.pipes.protocol.UplinkReader.run(UplinkReader.java:182) > Caused by: java.lang.Exception: Bad command code: -2 > at > org.apache.hama.pipes.protocol.UplinkReader.run(UplinkReader.java:174) > 16/06/01 14:48:52 ERROR bsp.BSPTask: Error running bsp setup and bsp function. 
> java.io.IOException: Stream closed > at > java.lang.ProcessBuilder$NullOutputStream.write(ProcessBuilder.java:433) > at java.io.OutputStream.write(OutputStream.java:116) > at > java.io.BufferedOutputStream.flushBuffer(BufferedOutputStream.java:82) > at java.io.BufferedOutputStream.flush(BufferedOutputStream.java:140) > at java.io.BufferedOutputStream.flush(BufferedOutputStream.java:141) > at
[jira] [Commented] (HAMA-992) Hama streaming
[ https://issues.apache.org/jira/browse/HAMA-992?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15310142#comment-15310142 ] Edward J. Yoon commented on HAMA-992: - +1. Thanks for your opinion and action. > Hama streaming > -- > > Key: HAMA-992 > URL: https://issues.apache.org/jira/browse/HAMA-992 > Project: Hama > Issue Type: Question > Components: bsp core, pipes >Affects Versions: 0.7.1 > Environment: RASPBIAN JESSIE > Full desktop image based on Debian Jessie >Reporter: Chaitanya > Labels: features, github-import, newbie > > Hello all, > I am trying to implement apache hama on Raspberry pi model 3 to establish a > distributed computing platform for scientific computation. I am trying to run > hama streaming over hadoop on a single namenode but I am facing a bit of a > difficulty in streaming my python code. I have downloaded the hama streaming > repository from :- > https://github.com/thomasjungblut/HamaStreaming > I ran the examples and also HelloWorldBSP.py on Hama and they work well. But > as soon as I switch to running my python code, the job fails. > I am trying to run the code with the following command:- > hama pipes -streaming true -bspTasks 1 -interpreter python -output > /tmp/pystream-out_2/ -program /tmp/PyStreaming/BSPRunner.py -programArgs > python.py > Below is the log file for your reference. I hope you can find time to help me > in this minor project:- > 16/06/01 14:48:46 WARN util.NativeCodeLoader: Unable to load native-hadoop > library for your platform... 
using builtin-java classes where applicable > 16/06/01 14:48:49 INFO ipc.Server: Starting Socket Reader #1 for port 61001 > 16/06/01 14:48:49 INFO ipc.Server: IPC Server listener on 61001: starting > 16/06/01 14:48:49 INFO ipc.Server: IPC Server Responder: starting > 16/06/01 14:48:49 INFO ipc.Server: IPC Server handler 0 on 61001: starting > 16/06/01 14:48:49 INFO ipc.Server: IPC Server handler 1 on 61001: starting > 16/06/01 14:48:49 INFO ipc.Server: IPC Server handler 3 on 61001: starting > 16/06/01 14:48:49 INFO ipc.Server: IPC Server handler 2 on 61001: starting > 16/06/01 14:48:49 INFO ipc.Server: IPC Server handler 4 on 61001: starting > 16/06/01 14:48:49 INFO message.HamaMessageManagerImpl: BSPPeer > address:localhost port:61001 > 16/06/01 14:48:51 INFO Configuration.deprecation: mapred.cache.localFiles is > deprecated. Instead, use mapreduce.job.cache.local.files > 16/06/01 14:48:51 INFO sync.ZKSyncClient: Initializing ZK Sync Client > 16/06/01 14:48:51 INFO sync.ZooKeeperSyncClientImpl: Start connecting to > Zookeeper! 
At localhost/127.0.0.1:61001 > java.lang.NumberFormatException: For input string: "Traceback (most recent > call last):" > at > java.lang.NumberFormatException.forInputString(NumberFormatException.java:65) > at java.lang.Integer.parseInt(Integer.java:580) > at java.lang.Integer.parseInt(Integer.java:615) > at > org.apache.hama.pipes.protocol.StreamingProtocol$StreamingUplinkReaderThread.readCommand(StreamingProtocol.java:174) > at > org.apache.hama.pipes.protocol.UplinkReader.run(UplinkReader.java:106) > 16/06/01 14:48:52 ERROR protocol.UplinkReader: java.lang.Exception: Bad > command code: -2 > at > org.apache.hama.pipes.protocol.UplinkReader.run(UplinkReader.java:174) > java.util.concurrent.BrokenBarrierException > at java.util.concurrent.CyclicBarrier.dowait(CyclicBarrier.java:250) > at java.util.concurrent.CyclicBarrier.await(CyclicBarrier.java:362) > at > org.apache.hama.pipes.protocol.StreamingProtocol.start(StreamingProtocol.java:223) > at > org.apache.hama.pipes.PipesApplication.start(PipesApplication.java:293) > at org.apache.hama.pipes.PipesBSP.setup(PipesBSP.java:43) > at org.apache.hama.bsp.BSPTask.runBSP(BSPTask.java:170) > at org.apache.hama.bsp.BSPTask.run(BSPTask.java:144) > at > org.apache.hama.bsp.GroomServer$BSPPeerChild.main(GroomServer.java:1255) > Exception in thread "pipe-uplink-handler" java.lang.RuntimeException: > java.lang.Exception: Bad command code: -2 > at > org.apache.hama.pipes.protocol.UplinkReader.run(UplinkReader.java:182) > Caused by: java.lang.Exception: Bad command code: -2 > at > org.apache.hama.pipes.protocol.UplinkReader.run(UplinkReader.java:174) > 16/06/01 14:48:52 ERROR bsp.BSPTask: Error running bsp setup and bsp function. 
> java.io.IOException: Stream closed > at > java.lang.ProcessBuilder$NullOutputStream.write(ProcessBuilder.java:433) > at java.io.OutputStream.write(OutputStream.java:116) > at > java.io.BufferedOutputStream.flushBuffer(BufferedOutputStream.java:82) > at java.io.BufferedOutputStream.flush(BufferedOutputStream.java:140) > at java.io.BufferedOutputStream.flush(BufferedOutputStream.java:141) > at
[jira] [Commented] (HAMA-992) Hama streaming
[ https://issues.apache.org/jira/browse/HAMA-992?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15310101#comment-15310101 ] Edward J. Yoon commented on HAMA-992: - Thomas, I'm fine either way, but adding it to the Hama repo would be a good idea if we can put some effort into improving Python support and attracting Python users. > Hama streaming > -- > > Key: HAMA-992 > URL: https://issues.apache.org/jira/browse/HAMA-992 > Project: Hama > Issue Type: Question > Components: bsp core, pipes >Affects Versions: 0.7.1 > Environment: RASPBIAN JESSIE > Full desktop image based on Debian Jessie >Reporter: Chaitanya > Labels: features, github-import, newbie > > Hello all, > I am trying to implement apache hama on Raspberry pi model 3 to establish a > distributed computing platform for scientific computation. I am trying to run > hama streaming over hadoop on a single namenode but I am facing a bit of a > difficulty in streaming my python code. I have downloaded the hama streaming > repository from :- > https://github.com/thomasjungblut/HamaStreaming > I ran the examples and also HelloWorldBSP.py on Hama and they work well. But > as soon as I switch to running my python code, the job fails. > I am trying to run the code with the following command:- > hama pipes -streaming true -bspTasks 1 -interpreter python -output > /tmp/pystream-out_2/ -program /tmp/PyStreaming/BSPRunner.py -programArgs > python.py > Below is the log file for your reference. I hope you can find time to help me > in this minor project:- > 16/06/01 14:48:46 WARN util.NativeCodeLoader: Unable to load native-hadoop > library for your platform... 
using builtin-java classes where applicable > 16/06/01 14:48:49 INFO ipc.Server: Starting Socket Reader #1 for port 61001 > 16/06/01 14:48:49 INFO ipc.Server: IPC Server listener on 61001: starting > 16/06/01 14:48:49 INFO ipc.Server: IPC Server Responder: starting > 16/06/01 14:48:49 INFO ipc.Server: IPC Server handler 0 on 61001: starting > 16/06/01 14:48:49 INFO ipc.Server: IPC Server handler 1 on 61001: starting > 16/06/01 14:48:49 INFO ipc.Server: IPC Server handler 3 on 61001: starting > 16/06/01 14:48:49 INFO ipc.Server: IPC Server handler 2 on 61001: starting > 16/06/01 14:48:49 INFO ipc.Server: IPC Server handler 4 on 61001: starting > 16/06/01 14:48:49 INFO message.HamaMessageManagerImpl: BSPPeer > address:localhost port:61001 > 16/06/01 14:48:51 INFO Configuration.deprecation: mapred.cache.localFiles is > deprecated. Instead, use mapreduce.job.cache.local.files > 16/06/01 14:48:51 INFO sync.ZKSyncClient: Initializing ZK Sync Client > 16/06/01 14:48:51 INFO sync.ZooKeeperSyncClientImpl: Start connecting to > Zookeeper! 
At localhost/127.0.0.1:61001 > java.lang.NumberFormatException: For input string: "Traceback (most recent > call last):" > at > java.lang.NumberFormatException.forInputString(NumberFormatException.java:65) > at java.lang.Integer.parseInt(Integer.java:580) > at java.lang.Integer.parseInt(Integer.java:615) > at > org.apache.hama.pipes.protocol.StreamingProtocol$StreamingUplinkReaderThread.readCommand(StreamingProtocol.java:174) > at > org.apache.hama.pipes.protocol.UplinkReader.run(UplinkReader.java:106) > 16/06/01 14:48:52 ERROR protocol.UplinkReader: java.lang.Exception: Bad > command code: -2 > at > org.apache.hama.pipes.protocol.UplinkReader.run(UplinkReader.java:174) > java.util.concurrent.BrokenBarrierException > at java.util.concurrent.CyclicBarrier.dowait(CyclicBarrier.java:250) > at java.util.concurrent.CyclicBarrier.await(CyclicBarrier.java:362) > at > org.apache.hama.pipes.protocol.StreamingProtocol.start(StreamingProtocol.java:223) > at > org.apache.hama.pipes.PipesApplication.start(PipesApplication.java:293) > at org.apache.hama.pipes.PipesBSP.setup(PipesBSP.java:43) > at org.apache.hama.bsp.BSPTask.runBSP(BSPTask.java:170) > at org.apache.hama.bsp.BSPTask.run(BSPTask.java:144) > at > org.apache.hama.bsp.GroomServer$BSPPeerChild.main(GroomServer.java:1255) > Exception in thread "pipe-uplink-handler" java.lang.RuntimeException: > java.lang.Exception: Bad command code: -2 > at > org.apache.hama.pipes.protocol.UplinkReader.run(UplinkReader.java:182) > Caused by: java.lang.Exception: Bad command code: -2 > at > org.apache.hama.pipes.protocol.UplinkReader.run(UplinkReader.java:174) > 16/06/01 14:48:52 ERROR bsp.BSPTask: Error running bsp setup and bsp function. > java.io.IOException: Stream closed > at > java.lang.ProcessBuilder$NullOutputStream.write(ProcessBuilder.java:433) > at java.io.OutputStream.write(OutputStream.java:116) > at > java.io.BufferedOutputStream.flushBuffer(BufferedOutputStream.java:82) > at
[jira] [Commented] (HAMA-991) Add math classes for float16/float32
[ https://issues.apache.org/jira/browse/HAMA-991?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15299374#comment-15299374 ] Edward J. Yoon commented on HAMA-991: - I'll push these 32-bit float classes first. Thanks. > Add math classes for float16/float32 > > > Key: HAMA-991 > URL: https://issues.apache.org/jira/browse/HAMA-991 > Project: Hama > Issue Type: New Feature > Components: math >Affects Versions: 0.7.1 >Reporter: Edward J. Yoon >Assignee: Edward J. Yoon > Fix For: 0.7.2 > > > Implement Float32Writable, Vector, and Matrix etc. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Comment Edited] (HAMA-990) GSoC'16: Apache Hama benchmark against Spark and Flink
[ https://issues.apache.org/jira/browse/HAMA-990?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15292552#comment-15292552 ] Edward J. Yoon edited comment on HAMA-990 at 5/20/16 2:02 AM: -- Yes. We assume that there's existing HAMA/FLINK/SPARK cluster. And, your project provides a shell script that auto-produce benchmark results. For example, {code}% ${Behroz_project}/bin/run benchmarks [all | kmeans | pagerank | others.. ]{code} If MRQL is good for us and works well, we can leverage it. was (Author: udanax): Yes. We assume that there's existing HAMA/FLINK/SPARK cluster. And, your project provides a shell script that auto-produce benchmark results. For example, ${Behroz_project}/bin/run benchmarks [all | kmeans | pagerank | others.. ] If MRQL is good for us and works well, we can leverage it. > GSoC'16: Apache Hama benchmark against Spark and Flink > -- > > Key: HAMA-990 > URL: https://issues.apache.org/jira/browse/HAMA-990 > Project: Hama > Issue Type: Documentation >Reporter: Behroz Sikander >Priority: Minor > -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HAMA-990) GSoC'16: Apache Hama benchmark against Spark and Flink
[ https://issues.apache.org/jira/browse/HAMA-990?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15292552#comment-15292552 ] Edward J. Yoon commented on HAMA-990: - Yes. We assume that there's existing HAMA/FLINK/SPARK cluster. And, your project provides a shell script that auto-produce benchmark results. For example, ${Behroz_project}/bin/run benchmarks [all | kmeans | pagerank | others.. ] If MRQL is good for us and works well, we can leverage it. > GSoC'16: Apache Hama benchmark against Spark and Flink > -- > > Key: HAMA-990 > URL: https://issues.apache.org/jira/browse/HAMA-990 > Project: Hama > Issue Type: Documentation >Reporter: Behroz Sikander >Priority: Minor > -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Comment Edited] (HAMA-990) GSoC'16: Apache Hama benchmark against Spark and Flink
[ https://issues.apache.org/jira/browse/HAMA-990?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15292530#comment-15292530 ] Edward J. Yoon edited comment on HAMA-990 at 5/20/16 1:39 AM: -- The Hadoop + HAMA + Flink + Spark cluster boot scripts are already on both Amazon and Google clouds https://github.com/GoogleCloudPlatform/bdutil/tree/master/extensions/hama https://github.com/awslabs/emr-bootstrap-actions/tree/master/hama So, if we use MRQL, shell script (that generates some input data, schedules the jobs, and collects performance results) will be enough. was (Author: udanax): The Hadoop + HAMA + Flink + Spark cluster boot scripts are already on both Amazon and Google clouds https://github.com/GoogleCloudPlatform/bdutil/tree/master/extensions/hama https://github.com/awslabs/emr-bootstrap-actions/tree/master/hama So, if we use MRQL, shell script will be enough. > GSoC'16: Apache Hama benchmark against Spark and Flink > -- > > Key: HAMA-990 > URL: https://issues.apache.org/jira/browse/HAMA-990 > Project: Hama > Issue Type: Documentation >Reporter: Behroz Sikander >Priority: Minor > -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HAMA-990) GSoC'16: Apache Hama benchmark against Spark and Flink
[ https://issues.apache.org/jira/browse/HAMA-990?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15292530#comment-15292530 ] Edward J. Yoon commented on HAMA-990: - The Hadoop + HAMA + Flink + Spark cluster boot scripts are already on both Amazon and Google clouds https://github.com/GoogleCloudPlatform/bdutil/tree/master/extensions/hama https://github.com/awslabs/emr-bootstrap-actions/tree/master/hama So, if we use MRQL, shell script will be enough. > GSoC'16: Apache Hama benchmark against Spark and Flink > -- > > Key: HAMA-990 > URL: https://issues.apache.org/jira/browse/HAMA-990 > Project: Hama > Issue Type: Documentation >Reporter: Behroz Sikander >Priority: Minor > -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Comment Edited] (HAMA-990) GSoC'16: Apache Hama benchmark against Spark and Flink
[ https://issues.apache.org/jira/browse/HAMA-990?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15292508#comment-15292508 ] Edward J. Yoon edited comment on HAMA-990 at 5/20/16 1:23 AM: -- {code} According to [1] and [3], Apache Flink is faster than Spark in K-Means, Page Rank and Query Processing whereas Spark is faster in Word Count. We can reproduce these results in our cluster and then can calculate the results for Hama. Once we have all the results we can compare all the systems. {code} I think this is a good idea. With this, we may be able to derive insight from the results (this should be our goal). I think I heard that Flink uses its own serialization techniques and shows good performance but is unstable. Just FYI, MRQL can also be used for K-Means and PageRank. Regarding the cluster, my current cluster (used for my research) consists of only a few high-end machines equipped with GPUs, so it is not a good fit for a large-scale distributed computing benchmark. If you can write some scripts that make it possible to auto-produce benchmark results on clouds such as Amazon or Google Cloud, I can help. was (Author: udanax): {qoute} According to [1] and [3], Apache Flink is faster than Spark in K-Means, Page Rank and Query Processing whereas Spark is faster in Word Count. We can reproduce these results in our cluster and then can calculate the results for Hama. Once we have all the results we can compare all the systems. {qoute} I think good idea. With this, we may able to derive insight from the results (this should be our goal). I think I heard that flink uses own serialization techniques and shows good performance but unstable. Just FYI, MRQL also can be used for K-Means and PageRank. Regarding cluster, current my cluster (used for my research) is consist of only few high-end machines equipped gpu and so somewhat not fit for large-scale distributed computing benchmark. If you can write some scripts that make it possible to auto-produce benchmark results on clouds such as Amazon or Google cloud, I can help. > GSoC'16: Apache Hama benchmark against Spark and Flink > -- > > Key: HAMA-990 > URL: https://issues.apache.org/jira/browse/HAMA-990 > Project: Hama > Issue Type: Documentation >Reporter: Behroz Sikander >Priority: Minor > -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HAMA-990) GSoC'16: Apache Hama benchmark against Spark and Flink
[ https://issues.apache.org/jira/browse/HAMA-990?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15292508#comment-15292508 ] Edward J. Yoon commented on HAMA-990: - {qoute} According to [1] and [3], Apache Flink is faster than Spark in K-Means, Page Rank and Query Processing whereas Spark is faster in Word Count. We can reproduce these results in our cluster and then can calculate the results for Hama. Once we have all the results we can compare all the systems. {qoute} I think good idea. With this, we may able to derive insight from the results (this should be our goal). I think I heard that flink uses own serialization techniques and shows good performance but unstable. Just FYI, MRQL also can be used for K-Means and PageRank. Regarding cluster, current my cluster (used for my research) is consist of only few high-end machines equipped gpu and so somewhat not fit for large-scale distributed computing benchmark. If you can write some scripts that make it possible to auto-produce benchmark results on clouds such as Amazon or Google cloud, I can help. > GSoC'16: Apache Hama benchmark against Spark and Flink > -- > > Key: HAMA-990 > URL: https://issues.apache.org/jira/browse/HAMA-990 > Project: Hama > Issue Type: Documentation >Reporter: Behroz Sikander >Priority: Minor > -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HAMA-990) GSoC'16: Apache Hama benchmark against Spark and Flink
[ https://issues.apache.org/jira/browse/HAMA-990?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15288809#comment-15288809 ] Edward J. Yoon commented on HAMA-990: - How's your work going, and what is the main goal? I personally recommend that you don't spend much time on other trivial bug fixes. > GSoC'16: Apache Hama benchmark against Spark and Flink > -- > > Key: HAMA-990 > URL: https://issues.apache.org/jira/browse/HAMA-990 > Project: Hama > Issue Type: Documentation >Reporter: Behroz Sikander >Priority: Minor > -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HAMA-941) Semiclustering Termination
[ https://issues.apache.org/jira/browse/HAMA-941?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15285684#comment-15285684 ] Edward J. Yoon commented on HAMA-941: - Quick comment from Greg Malewicz -- "There are many clustering algorithms. Perhaps it's better to start with why you need to group items, and then look at papers for an algorithm that has the desired grouping properties." > Semiclustering Termination > -- > > Key: HAMA-941 > URL: https://issues.apache.org/jira/browse/HAMA-941 > Project: Hama > Issue Type: Improvement > Components: examples, graph >Reporter: Edward J. Yoon >Priority: Minor > > Currently Semiclustering example will be terminated when the number of > iterations exceeded the predefined threshold max iteration. > App should be stopped if there's no cluster changes (I guess). Please check > and improve it. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HAMA-941) Semiclustering Termination
[ https://issues.apache.org/jira/browse/HAMA-941?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15285259#comment-15285259 ] Edward J. Yoon commented on HAMA-941: - Sure, I'll check. Greg, an original author, also sits near me. :-) -- Best Regards, Edward J. Yoon > Semiclustering Termination > -- > > Key: HAMA-941 > URL: https://issues.apache.org/jira/browse/HAMA-941 > Project: Hama > Issue Type: Improvement > Components: examples, graph >Reporter: Edward J. Yoon >Priority: Minor > > Currently Semiclustering example will be terminated when the number of > iterations exceeded the predefined threshold max iteration. > App should be stopped if there's no cluster changes (I guess). Please check > and improve it. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Created] (HAMA-991) Add math classes for float16/float32
Edward J. Yoon created HAMA-991: --- Summary: Add math classes for float16/float32 Key: HAMA-991 URL: https://issues.apache.org/jira/browse/HAMA-991 Project: Hama Issue Type: New Feature Components: math Affects Versions: 0.7.1 Reporter: Edward J. Yoon Assignee: Edward J. Yoon Fix For: 0.7.2 Implement Float32Writable, Vector, and Matrix etc. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HAMA-941) Semiclustering Termination
[ https://issues.apache.org/jira/browse/HAMA-941?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15273903#comment-15273903 ] Edward J. Yoon commented on HAMA-941: - Sorry for the slow review; it's a Korean holiday and I'll be back next week. Can you please try to find the bug in the implementation? :-) > Semiclustering Termination > -- > > Key: HAMA-941 > URL: https://issues.apache.org/jira/browse/HAMA-941 > Project: Hama > Issue Type: Improvement > Components: examples, graph >Reporter: Edward J. Yoon >Priority: Minor > > Currently Semiclustering example will be terminated when the number of > iterations exceeded the predefined threshold max iteration. > App should be stopped if there's no cluster changes (I guess). Please check > and improve it. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HAMA-940) Add StreamInputFormat
[ https://issues.apache.org/jira/browse/HAMA-940?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15266013#comment-15266013 ] Edward J. Yoon commented on HAMA-940: - If we can hide these implementations and provide simplified APIs for processing stream data, I think this way is better. > Add StreamInputFormat > - > > Key: HAMA-940 > URL: https://issues.apache.org/jira/browse/HAMA-940 > Project: Hama > Issue Type: New Feature > Components: bsp core >Reporter: Edward J. Yoon > > Add StreamInputFormat that reads newly appended records from previous > superstep. > I roughly guess it will be possible using reopen() method and file offset. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HAMA-940) Add StreamInputFormat
[ https://issues.apache.org/jira/browse/HAMA-940?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15266012#comment-15266012 ] Edward J. Yoon commented on HAMA-940: - As I mentioned in the description, we can simply check whether there are newly appended records in the input file by keeping the last read offset. To implement this, you should first look at the InputFormat interface. The tricky issue is how to implement the getSplits() method and handle multiple tasks. At the moment, my simple idea is that one BSP task acts as a "stream input queue", without implementing StreamInputFormat or changing the framework core. For example, we set the file path in the job configuration. The master task acts like below:
{code}
if (isMaster(peer.me)) {
  while (true) {
    peer.reopen();      // reopen the input
    peer.skip(offset);  // jump to the last read offset
    if (peer.readNext()) {
      // load-balance here
      sendTo("send a newly appended record to free slave tasks");
    } else {
      Thread.sleep(1000); // no new record yet; the interval is only illustrative
    }
  }
}
{code}
> Add StreamInputFormat > - > > Key: HAMA-940 > URL: https://issues.apache.org/jira/browse/HAMA-940 > Project: Hama > Issue Type: New Feature > Components: bsp core >Reporter: Edward J. Yoon > > Add StreamInputFormat that reads newly appended records from previous > superstep. > I roughly guess it will be possible using reopen() method and file offset. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
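The "keep the last read offset" idea from the HAMA-940 comment can be sketched in plain Java, outside of Hama. This is only an illustration under assumptions: OffsetTailReader and readNewRecords() are hypothetical names, not Hama APIs, and the sketch uses java.io.RandomAccessFile on a local file rather than the peer's reopen/skip calls mentioned in the comment.

```java
import java.io.IOException;
import java.io.RandomAccessFile;
import java.nio.file.Path;
import java.util.ArrayList;
import java.util.List;

/** Hypothetical helper (not part of Hama): returns only the lines appended since the last call. */
final class OffsetTailReader {
  private final Path path;
  private long offset = 0L; // file position just past the last byte already consumed

  OffsetTailReader(Path path) {
    this.path = path;
  }

  /** Reopens the file, seeks to the saved offset, and reads whatever was appended since. */
  List<String> readNewRecords() throws IOException {
    List<String> records = new ArrayList<>();
    try (RandomAccessFile file = new RandomAccessFile(path.toFile(), "r")) {
      file.seek(offset); // jump to the last read offset
      String line;
      // readLine() maps each byte to one char, so this sketch assumes ASCII records;
      // a trailing line without a newline is returned as-is and counted as consumed.
      while ((line = file.readLine()) != null) {
        records.add(line);
      }
      offset = file.getFilePointer(); // remember where to resume next time
    }
    return records;
  }
}
```

A polling task would call readNewRecords() in a loop, forward any returned records to idle workers, and sleep when the list is empty.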
[jira] [Commented] (HAMA-941) Semiclustering Termination
[ https://issues.apache.org/jira/browse/HAMA-941?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15265625#comment-15265625 ] Edward J. Yoon commented on HAMA-941: - I just used \{code\} patch copied to clipboard \{code\} tag. > Semiclustering Termination > -- > > Key: HAMA-941 > URL: https://issues.apache.org/jira/browse/HAMA-941 > Project: Hama > Issue Type: Improvement > Components: examples, graph >Reporter: Edward J. Yoon >Priority: Minor > > Currently Semiclustering example will be terminated when the number of > iterations exceeded the predefined threshold max iteration. > App should be stopped if there's no cluster changes (I guess). Please check > and improve it. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HAMA-941) Semiclustering Termination
[ https://issues.apache.org/jira/browse/HAMA-941?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15265533#comment-15265533 ] Edward J. Yoon commented on HAMA-941: - P.S. The initial code can be found at HAMA-594, and I changed a few things because it didn't work correctly. > Semiclustering Termination > -- > > Key: HAMA-941 > URL: https://issues.apache.org/jira/browse/HAMA-941 > Project: Hama > Issue Type: Improvement > Components: examples, graph >Reporter: Edward J. Yoon >Priority: Minor > > Currently Semiclustering example will be terminated when the number of > iterations exceeded the predefined threshold max iteration. > App should be stopped if there's no cluster changes (I guess). Please check > and improve it. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HAMA-941) Semiclustering Termination
[ https://issues.apache.org/jira/browse/HAMA-941?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15265532#comment-15265532 ] Edward J. Yoon commented on HAMA-941: - First of all, it looks like boundary score factor seems always 0.0. This is the user-defined parameter. 2nd, if vertex count is (vC <= 1), score should be 1.0. Please apply my patch and test again. Do you see more bugs? {code} diff --git a/ml/src/main/java/org/apache/hama/ml/semiclustering/SemiClusteringVertex.java b/ml/src/main/java/org/apache/hama/ml/semiclustering/SemiClusteringVertex.java index 9a905c1..38481fd 100644 --- a/ml/src/main/java/org/apache/hama/ml/semiclustering/SemiClusteringVertex.java +++ b/ml/src/main/java/org/apache/hama/ml/semiclustering/SemiClusteringVertex.java @@ -71,7 +71,7 @@ candidates.add(msg); if (!msg.contains(this.getVertexID()) -&& msg.size() == semiClusterMaximumVertexCount) { +&& msg.size() < semiClusterMaximumVertexCount) { SemiClusterMessage msgNew = WritableUtils.clone(msg, this.getConf()); msgNew.addVertex(this); msgNew.setSemiClusterId("C" @@ -149,14 +149,15 @@ * @return the value to calcualte the Score of a semi-cluster. */ public double semiClusterScoreCalcuation(SemiClusterMessage message) { -double iC = 0.0, bC = 0.0, fB = 0.0, sC = 0.0; -int vC = 0, eC = 0; +// TODO fB is the bounday score factor. 
This should be configurable by user +// the default is 0.5 +double iC = 0.0, bC = 0.0, fB = 0.5, sC = 0.0; +int vC = 0; vC = message.size(); for (Vertexv : message .getVertexList()) { List > eL = v.getEdges(); for (Edge e : eL) { -eC++; if (message.contains(e.getDestinationVertexID()) && e.getValue() != null) { iC = iC + e.getValue().get(); @@ -165,8 +166,12 @@ } } } + if (vC > 1) - sC = ((iC - fB * bC) / ((vC * (vC - 1)) / 2)) / eC; + sC = ((iC - fB * bC) / ((vC * (vC - 1)) / 2)); +else + sC = 1.0; + return sC; } {code} > Semiclustering Termination > -- > > Key: HAMA-941 > URL: https://issues.apache.org/jira/browse/HAMA-941 > Project: Hama > Issue Type: Improvement > Components: examples, graph >Reporter: Edward J. Yoon >Priority: Minor > > Currently Semiclustering example will be terminated when the number of > iterations exceeded the predefined threshold max iteration. > App should be stopped if there's no cluster changes (I guess). Please check > and improve it. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
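The scoring change in the patch above can be read as the following standalone sketch. It is an illustration under assumptions, not the committed code: SemiClusterScore and its parameter names are hypothetical, and the reading of iC as the weight of edges inside the cluster and bC as the weight of edges crossing its boundary is inferred from the diff.

```java
/** Hypothetical standalone version of the score computed in the patch (not Hama code). */
final class SemiClusterScore {
  private SemiClusterScore() {}

  /**
   * @param innerWeight    sum of weights of edges whose destination is inside the cluster (iC)
   * @param boundaryWeight sum of weights of edges leaving the cluster (bC), as assumed here
   * @param boundaryFactor user-defined boundary score factor (fB); the patch defaults it to 0.5
   * @param vertexCount    number of vertices in the semi-cluster (vC)
   */
  static double score(double innerWeight, double boundaryWeight,
                      double boundaryFactor, int vertexCount) {
    if (vertexCount <= 1) {
      return 1.0; // a cluster of at most one vertex gets score 1.0, as in the patch
    }
    // Number of vertex pairs, vC * (vC - 1) / 2, computed in double to avoid int overflow.
    double vertexPairs = (double) vertexCount * (vertexCount - 1) / 2.0;
    return (innerWeight - boundaryFactor * boundaryWeight) / vertexPairs;
  }
}
```

For example, with iC = 4.0, bC = 2.0, fB = 0.5 and vC = 3, the numerator is 4.0 - 1.0 = 3.0 and there are 3 vertex pairs, so the score is 1.0.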
[jira] [Resolved] (HAMA-989) Build fails on non-Linux systems
[ https://issues.apache.org/jira/browse/HAMA-989?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Edward J. Yoon resolved HAMA-989. - Resolution: Fixed Assignee: Behroz Sikander Fixed. > Build fails on non-Linux systems > > > Key: HAMA-989 > URL: https://issues.apache.org/jira/browse/HAMA-989 > Project: Hama > Issue Type: Bug > Components: bsp core, build >Affects Versions: 0.7.1 >Reporter: Edward J. Yoon >Assignee: Behroz Sikander > Fix For: 0.7.2 > > > http://markmail.org/message/ipgc5fjs57xdmtr2 -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Comment Edited] (HAMA-989) Build fails on non-Linux systems
[ https://issues.apache.org/jira/browse/HAMA-989?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15261553#comment-15261553 ] Edward J. Yoon edited comment on HAMA-989 at 4/28/16 5:14 AM: -- When you write commit log, you should follow the format: HAMA-989: your commit log Then, apache infra and github will be integrated automatically by issue ID. Also, you have to merge into 1 commit before pull request. You can use rebase command for example, git rebase -i HEAD~3. Thanks. was (Author: udanax): When you write commit log, you should follow below format: HAMA-989: commitlog Then, apache infra and github will be integrated automatically by issue ID. Also, you have to merge into 1 commit before pull request. You can use rebase command for example, git rebase -i HEAD~3. Thanks. > Build fails on non-Linux systems > > > Key: HAMA-989 > URL: https://issues.apache.org/jira/browse/HAMA-989 > Project: Hama > Issue Type: Bug > Components: bsp core, build >Affects Versions: 0.7.1 >Reporter: Edward J. Yoon > Fix For: 0.7.2 > > > http://markmail.org/message/ipgc5fjs57xdmtr2 -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HAMA-989) Build fails on non-Linux systems
[ https://issues.apache.org/jira/browse/HAMA-989?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15261553#comment-15261553 ] Edward J. Yoon commented on HAMA-989: - When you write commit log, you should follow below format: HAMA-989: commitlog Then, apache infra and github will be integrated automatically by issue ID. Also, you have to merge into 1 commit before pull request. You can use rebase command for example, git rebase -i HEAD~3. Thanks. > Build fails on non-Linux systems > > > Key: HAMA-989 > URL: https://issues.apache.org/jira/browse/HAMA-989 > Project: Hama > Issue Type: Bug > Components: bsp core, build >Affects Versions: 0.7.1 >Reporter: Edward J. Yoon > Fix For: 0.7.2 > > > http://markmail.org/message/ipgc5fjs57xdmtr2 -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HAMA-989) Build fails on non-Linux systems
[ https://issues.apache.org/jira/browse/HAMA-989?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15257343#comment-15257343 ] Edward J. Yoon commented on HAMA-989: - We can either catch and ignore the exceptions or check the OS with SystemUtils. {code} diff --git a/core/src/test/java/org/apache/hama/bsp/message/TestHamaAsyncMessageManager.java b/core/src/test/java/org/apache/hama/bsp/message/TestHamaAsyncMessageManager.java index f4f89b9..b7bc9c8 100644 --- a/core/src/test/java/org/apache/hama/bsp/message/TestHamaAsyncMessageManager.java +++ b/core/src/test/java/org/apache/hama/bsp/message/TestHamaAsyncMessageManager.java @@ -23,6 +23,7 @@ import junit.framework.TestCase; +import org.apache.commons.lang.SystemUtils; import org.apache.hadoop.fs.FileSystem; import org.apache.hadoop.io.IntWritable; import org.apache.hadoop.io.NullWritable; @@ -45,10 +46,14 @@ public static volatile int increment = 1; public void testMemoryMessaging() throws Exception { -HamaConfiguration conf = new HamaConfiguration(); -conf.setClass(MessageManager.RECEIVE_QUEUE_TYPE_CLASS, MemoryQueue.class, -MessageQueue.class); -messagingInternal(conf); +if (SystemUtils.IS_OS_LINUX) { + HamaConfiguration conf = new HamaConfiguration(); + conf.setClass(MessageManager.RECEIVE_QUEUE_TYPE_CLASS, MemoryQueue.class, + MessageQueue.class); + messagingInternal(conf); +} else { + // we skip this test bc AsyncRPC is currently support only linux +} } private static void messagingInternal(HamaConfiguration conf) {code} WDYT? > Build fails on non-Linux systems > > > Key: HAMA-989 > URL: https://issues.apache.org/jira/browse/HAMA-989 > Project: Hama > Issue Type: Bug > Components: bsp core, build >Affects Versions: 0.7.1 >Reporter: Edward J. Yoon > Fix For: 0.7.2 > > > http://markmail.org/message/ipgc5fjs57xdmtr2 -- This message was sent by Atlassian JIRA (v6.3.4#6332)
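The "run only on Linux, otherwise skip" guard from the HAMA-989 patch can be sketched without the commons-lang dependency. This is a hedged illustration: LinuxOnlyGuard, isLinux() and runIfLinux() are hypothetical names, and the check on the os.name string is an assumption about how such a guard could be written, not the code that was committed.

```java
import java.util.Locale;

/** Hypothetical guard (not Hama code): runs a body only when the OS name looks like Linux. */
final class LinuxOnlyGuard {
  private LinuxOnlyGuard() {}

  /** Returns true when the given os.name value starts with "linux", ignoring case. */
  static boolean isLinux(String osName) {
    return osName != null && osName.toLowerCase(Locale.ROOT).startsWith("linux");
  }

  /** Runs the body on Linux and reports whether it ran; on other systems it is skipped. */
  static boolean runIfLinux(String osName, Runnable body) {
    if (!isLinux(osName)) {
      return false; // skipped, mirroring the empty else-branch in the patch
    }
    body.run();
    return true;
  }
}
```

In a test, the caller would pass System.getProperty("os.name") as osName and wrap the Linux-only assertions in the Runnable.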
[jira] [Updated] (HAMA-989) Build fails on non-Linux systems
[ https://issues.apache.org/jira/browse/HAMA-989?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Edward J. Yoon updated HAMA-989: Summary: Build fails on non-Linux systems (was: Build fails on non-Linux OS) > Build fails on non-Linux systems > > > Key: HAMA-989 > URL: https://issues.apache.org/jira/browse/HAMA-989 > Project: Hama > Issue Type: Bug > Components: bsp core, build >Affects Versions: 0.7.1 >Reporter: Edward J. Yoon > Fix For: 0.7.2 > > > http://markmail.org/message/ipgc5fjs57xdmtr2 -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Created] (HAMA-989) Build fails on non-Linux OS
Edward J. Yoon created HAMA-989: --- Summary: Build fails on non-Linux OS Key: HAMA-989 URL: https://issues.apache.org/jira/browse/HAMA-989 Project: Hama Issue Type: Bug Components: bsp core, build Affects Versions: 0.7.1 Reporter: Edward J. Yoon Fix For: 0.7.2 http://markmail.org/message/ipgc5fjs57xdmtr2 -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Created] (HAMA-988) Allow to add additional no-input tasks as number user want
Edward J. Yoon created HAMA-988: --- Summary: Allow to add additional no-input tasks as number user want Key: HAMA-988 URL: https://issues.apache.org/jira/browse/HAMA-988 Project: Hama Issue Type: Improvement Components: bsp core Affects Versions: 0.7.1 Reporter: Edward J. Yoon Assignee: Edward J. Yoon Fix For: 0.7.2 The BSP framework basically launches as many tasks as there are input splits. Force-setting the number of tasks is also possible by setting "hama.force.set.bsp.tasks" to true. However, there's no way to add extra tasks on top of the number of splits. For example, if the input has 5 splits, I want to launch 6 tasks (1 additional no-input task that acts as a master). -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Resolved] (HAMA-984) Support AWS S3 schema in Hadoop 2.6+
[ https://issues.apache.org/jira/browse/HAMA-984?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Edward J. Yoon resolved HAMA-984. - Resolution: Fixed I just committed this, Thanks Cazen! Please use the TRUNK version on your environments, and feel free to report your problems. Thanks. > Support AWS S3 schema in Hadoop 2.6+ > > > Key: HAMA-984 > URL: https://issues.apache.org/jira/browse/HAMA-984 > Project: Hama > Issue Type: Improvement > Components: build >Reporter: Cazen Lee >Assignee: Cazen Lee > > Hadoop 2.6+ does not contain AWS S3 related filesystem by default. > So, IOException(No FileSystem for scheme) occurred while trying to access S3 > via s3 or s3n schema. > I know it's not a Hama bug but it will be helpful to Hama users who using AWS > S3 because it can be used by previous version(includes 1.x) without manual > setting. Of course, we can also guide through the changes, without > modification any source code. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HAMA-984) Support AWS S3 schema in Hadoop 2.6+
[ https://issues.apache.org/jira/browse/HAMA-984?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Edward J. Yoon updated HAMA-984: Assignee: Cazen Lee > Support AWS S3 schema in Hadoop 2.6+ > > > Key: HAMA-984 > URL: https://issues.apache.org/jira/browse/HAMA-984 > Project: Hama > Issue Type: Improvement > Components: build >Reporter: Cazen Lee >Assignee: Cazen Lee > > Hadoop 2.6+ does not contain AWS S3 related filesystem by default. > So, IOException(No FileSystem for scheme) occurred while trying to access S3 > via s3 or s3n schema. > I know it's not a Hama bug but it will be helpful to Hama users who using AWS > S3 because it can be used by previous version(includes 1.x) without manual > setting. Of course, we can also guide through the changes, without > modification any source code. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HAMA-986) Hashcode calculation
[ https://issues.apache.org/jira/browse/HAMA-986?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Edward J. Yoon updated HAMA-986: Fix Version/s: 0.7.2 > Hashcode calculation > - > > Key: HAMA-986 > URL: https://issues.apache.org/jira/browse/HAMA-986 > Project: Hama > Issue Type: Bug > Components: bsp core >Affects Versions: 0.7.1 >Reporter: JongYoon Lim >Priority: Trivial > Fix For: 0.7.2 > > Attachments: HAMA-986.patch > > > There is a missing value when calculating hashcode of AsyncClient. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Resolved] (HAMA-986) Hashcode calculation
[ https://issues.apache.org/jira/browse/HAMA-986?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Edward J. Yoon resolved HAMA-986. - Resolution: Fixed Assignee: JongYoon Lim I just committed this! Thanks JongYoon. > Hashcode calculation > - > > Key: HAMA-986 > URL: https://issues.apache.org/jira/browse/HAMA-986 > Project: Hama > Issue Type: Bug > Components: bsp core >Affects Versions: 0.7.1 >Reporter: JongYoon Lim >Assignee: JongYoon Lim >Priority: Trivial > Fix For: 0.7.2 > > Attachments: HAMA-986.patch > > > There is a missing value when calculating hashcode of AsyncClient. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
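The kind of bug described in HAMA-986, a field left out of a hash code calculation, can be illustrated with a minimal sketch. Everything here is an assumption made for illustration: ConnectionKey and its host, port and timeoutMs fields are hypothetical and are not taken from AsyncClient or from the attached patch.

```java
import java.util.Objects;

/** Hypothetical key class (not Hama code) whose hashCode covers every field used by equals. */
final class ConnectionKey {
  private final String host;
  private final int port;
  private final int timeoutMs;

  ConnectionKey(String host, int port, int timeoutMs) {
    this.host = host;
    this.port = port;
    this.timeoutMs = timeoutMs;
  }

  @Override
  public boolean equals(Object o) {
    if (this == o) {
      return true;
    }
    if (!(o instanceof ConnectionKey)) {
      return false;
    }
    ConnectionKey other = (ConnectionKey) o;
    return port == other.port
        && timeoutMs == other.timeoutMs
        && Objects.equals(host, other.host);
  }

  @Override
  public int hashCode() {
    // All three fields compared in equals() also feed the hash. Leaving one out is
    // still legal, but keys differing only in that field would always collide.
    return Objects.hash(host, port, timeoutMs);
  }
}
```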
[jira] [Updated] (HAMA-986) Hashcode calculation
[ https://issues.apache.org/jira/browse/HAMA-986?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Edward J. Yoon updated HAMA-986: Affects Version/s: 0.7.1 > Hashcode calculation > - > > Key: HAMA-986 > URL: https://issues.apache.org/jira/browse/HAMA-986 > Project: Hama > Issue Type: Bug > Components: bsp core >Affects Versions: 0.7.1 >Reporter: JongYoon Lim >Priority: Trivial > Fix For: 0.7.2 > > Attachments: HAMA-986.patch > > > There is a missing value when calculating hashcode of AsyncClient. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Resolved] (HAMA-982) Vertex.read/writeState() method throws NullPointerException
[ https://issues.apache.org/jira/browse/HAMA-982?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Edward J. Yoon resolved HAMA-982. - Resolution: Fixed Fix Version/s: (was: 0.7.2) 0.7.1 Fixed. > Vertex.read/writeState() method throws NullPointerException > --- > > Key: HAMA-982 > URL: https://issues.apache.org/jira/browse/HAMA-982 > Project: Hama > Issue Type: Bug > Components: graph >Affects Versions: 0.7.0 >Reporter: Edward J. Yoon >Assignee: Edward J. Yoon > Fix For: 0.7.1 > > > It occurs at partitioning and initial supersteps. > > at org.apache.hama.graph.GraphJobRunner$Parser.run(GraphJobRunner.java:557) -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HAMA-982) Vertex.read/writeState() method throws NullPointerException
[ https://issues.apache.org/jira/browse/HAMA-982?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Edward J. Yoon updated HAMA-982: Fix Version/s: (was: 0.7.1) 0.7.2 > Vertex.read/writeState() method throws NullPointerException > --- > > Key: HAMA-982 > URL: https://issues.apache.org/jira/browse/HAMA-982 > Project: Hama > Issue Type: Bug > Components: graph >Affects Versions: 0.7.0 >Reporter: Edward J. Yoon >Assignee: Edward J. Yoon > Fix For: 0.7.1 > > > It occurs at partitioning and initial supersteps. > > at org.apache.hama.graph.GraphJobRunner$Parser.run(GraphJobRunner.java:557) -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HAMA-986) Hashcode calculation
[ https://issues.apache.org/jira/browse/HAMA-986?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15192751#comment-15192751 ] Edward J. Yoon commented on HAMA-986: - Thanks for your contribution! Since we're currently in the release process, I can commit it a few days later. > Hashcode calculation > - > > Key: HAMA-986 > URL: https://issues.apache.org/jira/browse/HAMA-986 > Project: Hama > Issue Type: Bug > Components: bsp core >Reporter: JongYoon Lim >Priority: Trivial > Attachments: HAMA-986.patch > > > There is a missing value when calculating hashcode of AsyncClient. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Created] (HAMA-985) Update git scm provider dependency
Edward J. Yoon created HAMA-985: --- Summary: Update git scm provider dependency Key: HAMA-985 URL: https://issues.apache.org/jira/browse/HAMA-985 Project: Hama Issue Type: Bug Components: build Reporter: Edward J. Yoon Symptom: mvn release:prepare or release:perform does not commit changes to pom.xml. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HAMA-984) Support AWS S3 schema in Hadoop 2.6+
[ https://issues.apache.org/jira/browse/HAMA-984?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15177081#comment-15177081 ] Edward J. Yoon commented on HAMA-984: - Hi, the hadoop-auth package is used for security/auth in o.a.h.ipc package and YARN module. > Support AWS S3 schema in Hadoop 2.6+ > > > Key: HAMA-984 > URL: https://issues.apache.org/jira/browse/HAMA-984 > Project: Hama > Issue Type: Improvement > Components: build >Reporter: Cazen Lee > > Hadoop 2.6+ does not contain AWS S3 related filesystem by default. > So, IOException(No FileSystem for scheme) occurred while trying to access S3 > via s3 or s3n schema. > I know it's not a Hama bug but it will be helpful to Hama users who using AWS > S3 because it can be used by previous version(includes 1.x) without manual > setting. Of course, we can also guide through the changes, without > modification any source code. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HAMA-984) Support AWS S3 schema in Hadoop 2.6+
[ https://issues.apache.org/jira/browse/HAMA-984?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15170592#comment-15170592 ] Edward J. Yoon commented on HAMA-984: - Thanks for pull request. I can check next week. > Support AWS S3 schema in Hadoop 2.6+ > > > Key: HAMA-984 > URL: https://issues.apache.org/jira/browse/HAMA-984 > Project: Hama > Issue Type: Improvement > Components: build >Reporter: Cazen Lee > > Hadoop 2.6+ does not contain AWS S3 related filesystem by default. > So, IOException(No FileSystem for scheme) occurred while trying to access S3 > via s3 or s3n schema. > I know it's not a Hama bug but it will be helpful to Hama users who using AWS > S3 because it can be used by previous version(includes 1.x) without manual > setting. Of course, we can also guide through the changes, without > modification any source code. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HAMA-983) Hama runner for DataFlow
[ https://issues.apache.org/jira/browse/HAMA-983?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Edward J. Yoon updated HAMA-983: Labels: gsoc2016 (was: ) > Hama runner for DataFlow > > > Key: HAMA-983 > URL: https://issues.apache.org/jira/browse/HAMA-983 > Project: Hama > Issue Type: Bug >Reporter: Edward J. Yoon > Labels: gsoc2016 > > As you already know, Apache Beam provides unified programming model for both > batch and streaming inputs. > The APIs are generally associated with data filtering and transforming. So > we'll need to implement some data processing runner like > https://github.com/dapurv5/MapReduce-BSP-Adapter/blob/master/src/main/java/org/apache/hama/mapreduce/examples/WordCount.java > Also, implementing similarity join can be funny. According to > http://www.ruizhang.info/publications/TPDS2015-Heads_Join.pdf, Apache Hama is > clearly winner among Apache Hadoop and Apache Spark. > Since it consists of transformation, aggregation, and partition computations, > I think it's possible to implement using Apache Beam APIs. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Created] (HAMA-983) Hama runner for DataFlow
Edward J. Yoon created HAMA-983: --- Summary: Hama runner for DataFlow Key: HAMA-983 URL: https://issues.apache.org/jira/browse/HAMA-983 Project: Hama Issue Type: Bug Reporter: Edward J. Yoon As you already know, Apache Beam provides unified programming model for both batch and streaming inputs. The APIs are generally associated with data filtering and transforming. So we'll need to implement some data processing runner like https://github.com/dapurv5/MapReduce-BSP-Adapter/blob/master/src/main/java/org/apache/hama/mapreduce/examples/WordCount.java Also, implementing similarity join can be funny. According to http://www.ruizhang.info/publications/TPDS2015-Heads_Join.pdf, Apache Hama is clearly winner among Apache Hadoop and Apache Spark. Since it consists of transformation, aggregation, and partition computations, I think it's possible to implement using Apache Beam APIs. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HAMA-982) Vertex.read/writeState() method throws NullPointerException
[ https://issues.apache.org/jira/browse/HAMA-982?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Edward J. Yoon updated HAMA-982: Description: It occurs at partitioning and initial supersteps. > at org.apache.hama.graph.GraphJobRunner$Parser.run(GraphJobRunner.java:557) was: It occurs when partitioning and initial supersteps. > at org.apache.hama.graph.GraphJobRunner$Parser.run(GraphJobRunner.java:557) > Vertex.read/writeState() method throws NullPointerException > --- > > Key: HAMA-982 > URL: https://issues.apache.org/jira/browse/HAMA-982 > Project: Hama > Issue Type: Bug > Components: graph >Affects Versions: 0.7.0 >Reporter: Edward J. Yoon >Assignee: Edward J. Yoon > Fix For: 0.7.1 > > > It occurs at partitioning and initial supersteps. > > at org.apache.hama.graph.GraphJobRunner$Parser.run(GraphJobRunner.java:557) -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Created] (HAMA-982) Vertex.read/writeState() method throws NullPointerException
Edward J. Yoon created HAMA-982: --- Summary: Vertex.read/writeState() method throws NullPointerException Key: HAMA-982 URL: https://issues.apache.org/jira/browse/HAMA-982 Project: Hama Issue Type: Bug Components: graph Affects Versions: 0.7.0 Reporter: Edward J. Yoon Assignee: Edward J. Yoon Fix For: 0.7.1 It occurs when partitioning and initial supersteps. > at org.apache.hama.graph.GraphJobRunner$Parser.run(GraphJobRunner.java:557) -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HAMA-900) Rotation task scheduler
[ https://issues.apache.org/jira/browse/HAMA-900?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15104020#comment-15104020 ] Edward J. Yoon commented on HAMA-900: - I just merged it into master, Thanks Behroz :-) > Rotation task scheduler > --- > > Key: HAMA-900 > URL: https://issues.apache.org/jira/browse/HAMA-900 > Project: Hama > Issue Type: New Feature > Components: bsp core >Reporter: Edward J. Yoon > > To spread tasks widely, I need a FIFO job scheduler that assign tasks one at > a time in rotation of groom servers (a method of dealing cards). -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Resolved] (HAMA-900) Rotation task scheduler
[ https://issues.apache.org/jira/browse/HAMA-900?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Edward J. Yoon resolved HAMA-900. - Resolution: Fixed Assignee: Behroz Sikander Fix Version/s: 0.7.1 > Rotation task scheduler > --- > > Key: HAMA-900 > URL: https://issues.apache.org/jira/browse/HAMA-900 > Project: Hama > Issue Type: New Feature > Components: bsp core >Reporter: Edward J. Yoon >Assignee: Behroz Sikander > Fix For: 0.7.1 > > > To spread tasks widely, I need a FIFO job scheduler that assign tasks one at > a time in rotation of groom servers (a method of dealing cards). -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HAMA-900) Rotation task scheduler
[ https://issues.apache.org/jira/browse/HAMA-900?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15099245#comment-15099245 ] Edward J. Yoon commented on HAMA-900: - Cool~ 1) % mvn clean install 2) I personally think data locality should have 2nd priority in round-robin. 3) In the getClass(String name, Class defaultValue, Class interface) method, "BestEffortDataLocalTaskAllocator.class" is just a default value. If you define the "bsp.taskalloc.class" property in hama-site.xml, the class you set there is used instead. We may also want to add a default configuration to hama-default.xml like below:
{code}
<property>
  <name>bsp.taskalloc.class</name>
  <value>org.apache.hama.bsp.taskallocation.BestEffortDataLocalTaskAllocator</value>
  <description>The task allocator to choose. Default is BestEffortDataLocalTaskAllocator that takes in only the data locality as a constraint for allocating tasks.</description>
</property>
{code}
> Rotation task scheduler > --- > > Key: HAMA-900 > URL: https://issues.apache.org/jira/browse/HAMA-900 > Project: Hama > Issue Type: New Feature > Components: bsp core >Reporter: Edward J. Yoon > > To spread tasks widely, I need a FIFO job scheduler that assign tasks one at > a time in rotation of groom servers (a method of dealing cards). -- This message was sent by Atlassian JIRA (v6.3.4#6332)
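The lookup described in point 3 of the comment above (a configured class name with a compiled-in default) can be sketched in plain Java. This is only an illustration under assumptions: `TaskAllocatorLookupSketch`, its nested allocator types, and the `lookup` helper are hypothetical stand-ins, and `java.util.Properties` stands in for the real configuration object. Only the property key `bsp.taskalloc.class` comes from the comment.

```java
import java.util.Properties;

/** Hypothetical sketch of a "configured class or default" lookup; not Hama code. */
final class TaskAllocatorLookupSketch {

  /** Stand-in for the allocator interface (assumed name). */
  interface TaskAllocator {}

  /** Stand-in for the default allocator. */
  static final class BestEffortAllocator implements TaskAllocator {}

  /** Stand-in for a user-supplied allocator. */
  static final class RoundRobinAllocator implements TaskAllocator {}

  /** Returns the class named under key, or defaultValue when the key is unset. */
  static Class<? extends TaskAllocator> lookup(
      Properties conf, String key, Class<? extends TaskAllocator> defaultValue) {
    String className = conf.getProperty(key);
    if (className == null) {
      return defaultValue; // nothing configured, so the default applies
    }
    try {
      // asSubclass rejects classes that do not implement the interface.
      return Class.forName(className).asSubclass(TaskAllocator.class);
    } catch (ClassNotFoundException e) {
      throw new IllegalArgumentException("Unknown allocator class: " + className, e);
    }
  }

  public static void main(String[] args) {
    Properties conf = new Properties();
    // Key unset: the default class is returned.
    System.out.println(
        lookup(conf, "bsp.taskalloc.class", BestEffortAllocator.class).getSimpleName());
    // Key set: the configured class wins over the default.
    conf.setProperty("bsp.taskalloc.class", RoundRobinAllocator.class.getName());
    System.out.println(
        lookup(conf, "bsp.taskalloc.class", BestEffortAllocator.class).getSimpleName());
  }
}
```

With this pattern, shipping a default entry in a defaults file changes nothing at runtime; it mainly documents the key so that users know what to override.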
[jira] [Commented] (HAMA-900) Rotation task scheduler
[ https://issues.apache.org/jira/browse/HAMA-900?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15097150#comment-15097150 ] Edward J. Yoon commented on HAMA-900: - If needed, I can work on this this week. > Rotation task scheduler > --- > > Key: HAMA-900 > URL: https://issues.apache.org/jira/browse/HAMA-900 > Project: Hama > Issue Type: New Feature > Components: bsp core >Reporter: Edward J. Yoon > > To spread tasks widely, I need a FIFO job scheduler that assign tasks one at > a time in rotation of groom servers (a method of dealing cards). -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HAMA-900) Rotation task scheduler
[ https://issues.apache.org/jira/browse/HAMA-900?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15097144#comment-15097144 ] Edward J. Yoon commented on HAMA-900: - Thanks for reminding me. In JobInProgress.java, you can see the two obtainNewTask() methods: obtainNewTask(Map groomStatuses) and obtainNewTask(TaskInProgress task, Map groomStatuses, BSPResource[] resources). The latter API uses a taskAllocationStrategy. As far as I know, it was originally created for task recovery and re-allocation. The default scheduler, SimpleTaskWorkerManager.java, still uses the former API, as shown below. So this issue is still a TODO.
{code}
while ((t = jip.obtainNewTask(this.groomStatuses)) != null) {
  taskSet.add(t);
  // Scheduled all tasks
  if (++cnt == this.jip.tasks.length) {
    break;
  }
}
..
// assembly into actions
for (Task task : taskSet) {
  GroomServerStatus groomStatus = jip.getGroomStatusForTask(task);
{code}
> Rotation task scheduler > --- > > Key: HAMA-900 > URL: https://issues.apache.org/jira/browse/HAMA-900 > Project: Hama > Issue Type: New Feature > Components: bsp core >Reporter: Edward J. Yoon > > To spread tasks widely, I need a FIFO job scheduler that assign tasks one at > a time in rotation of groom servers (a method of dealing cards). -- This message was sent by Atlassian JIRA (v6.3.4#6332)
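The "dealing cards" rotation requested in the issue description can be sketched as a plain round-robin assignment. This is a minimal sketch under assumptions, not the scheduler that was later merged: `RoundRobinSketch`, `assign`, and the groom names are hypothetical, groom names are assumed to be distinct, and data locality and per-groom capacity are ignored.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

/** Hypothetical sketch of rotation (round-robin) task assignment; not Hama code. */
final class RoundRobinSketch {

  /** Deals task ids 0..numTasks-1 to the grooms one at a time, in rotation. */
  static Map<String, List<Integer>> assign(List<String> grooms, int numTasks) {
    if (grooms.isEmpty()) {
      throw new IllegalArgumentException("at least one groom server is required");
    }
    // LinkedHashMap keeps the grooms in the order they were given.
    Map<String, List<Integer>> plan = new LinkedHashMap<>();
    for (String groom : grooms) {
      plan.put(groom, new ArrayList<>());
    }
    for (int task = 0; task < numTasks; task++) {
      // The modulo wraps back to the first groom after the last one.
      String groom = grooms.get(task % grooms.size());
      plan.get(groom).add(task);
    }
    return plan;
  }

  public static void main(String[] args) {
    // prints {groom1=[0, 3, 6], groom2=[1, 4], groom3=[2, 5]}
    System.out.println(assign(Arrays.asList("groom1", "groom2", "groom3"), 7));
  }
}
```

No groom ends up with more than one task above any other, which is the "spread tasks widely" property the issue asks for.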
[jira] [Commented] (HAMA-900) Rotation task scheduler
[ https://issues.apache.org/jira/browse/HAMA-900?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15097390#comment-15097390 ] Edward J. Yoon commented on HAMA-900: - Oh, ... you're right. My mis-read. >> if just the interface needs to be implemented then I can give it a try since >> I already have the cluster where I can duplicate this issue. It'll be great if you can try this and share result with me. Thanks :-) > Rotation task scheduler > --- > > Key: HAMA-900 > URL: https://issues.apache.org/jira/browse/HAMA-900 > Project: Hama > Issue Type: New Feature > Components: bsp core >Reporter: Edward J. Yoon > > To spread tasks widely, I need a FIFO job scheduler that assign tasks one at > a time in rotation of groom servers (a method of dealing cards). -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HAMA-981) Set maven scm to git
[ https://issues.apache.org/jira/browse/HAMA-981?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15089030#comment-15089030 ] Edward J. Yoon commented on HAMA-981: - Yes, we migrated to GIT and SVN is now read-only. > Set maven scm to git > > > Key: HAMA-981 > URL: https://issues.apache.org/jira/browse/HAMA-981 > Project: Hama > Issue Type: Bug > Components: build >Affects Versions: 0.7.0 >Reporter: Edward J. Yoon >Assignee: Edward J. Yoon > Fix For: 0.7.1 > > > SCM still uses svn repository. We need to change before release. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HAMA-981) Set maven scm to git
[ https://issues.apache.org/jira/browse/HAMA-981?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15080775#comment-15080775 ] Edward J. Yoon commented on HAMA-981: - {code} -scm:svn:https://svn.apache.org/repos/asf/hama/ - - scm:svn:http://svn.apache.org/repos/asf/hama/trunk/ - - - scm:svn:https://svn.apache.org/repos/asf/hama/trunk/ - +https://git-wip-us.apache.org/repos/asf/hama.git + scm:git:https://git-wip-us.apache.org/repos/asf/hama.git + scm:git:https://git-wip-us.apache.org/repos/asf/hama.git +HEAD {code} Here are my changes. If there are no objections, I'll commit directly within 3 days. :-) > Set maven scm to git > > > Key: HAMA-981 > URL: https://issues.apache.org/jira/browse/HAMA-981 > Project: Hama > Issue Type: Bug > Components: build >Affects Versions: 0.7.0 >Reporter: Edward J. Yoon >Assignee: Edward J. Yoon > Fix For: 0.7.1 > > > SCM still uses svn repository. We need to change before release. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Comment Edited] (HAMA-970) Exception can occur if the size of splits is bigger than numBSPTasks
[ https://issues.apache.org/jira/browse/HAMA-970?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15047998#comment-15047998 ] Edward J. Yoon edited comment on HAMA-970 at 12/9/15 4:17 AM: -- Hi, To launch more tasks than the number of splits, you should use an input partitioner (see the example at https://github.com/apache/hama/blob/master/core/src/test/java/org/apache/hama/bsp/TestPartitioning.java). For example, if you have a 10MB file and set the number of tasks to 10 with a partitioner, the framework automatically partitions the 10MB file into 10 files and then launches your main BSP program with 10 tasks. {quote} Previously in my Input Paths, I was adding 2 files, one empty file and one 70 MB file. This is working but Hama only opens up 2 tasks, one for empty file (which becomes the master) and one for 70 MB file (which becomes my only slave). Now, since I want to divide the 70 MB file into 4-5 tasks if I try to do this solution, I get an exception. {quote} You can do it like this: 1) partition the one 70MB file into 9 files (manually) and then launch the BSP program with setNumOfTasks(10); was (Author: udanax): Hi, To launch more tasks than num of splits, you should use input partitioner - https://github.com/apache/hama/blob/master/core/src/test/java/org/apache/hama/bsp/TestPartitioning.java example, you should use input partitioner. For example, if you have a 10MB file and set the number of tasks 10 with partitioner, the framework automatically partition 10MB file into 10 files and then launch your main BSP program with 10 tasks. {qoute} Previously in my Input Paths, I was adding 2 files, one empty file and one 70 MB file. This is working but Hama only opens up 2 tasks, one for empty file (which becomes the master) and one for 70 MB file (which becomes my only slave). Now, since I want to divide the 70 MB file into 4-5 tasks if I try to do this solution, I get an exception. {qoute} You can do like this: 1) partition one 70MB file into 9 files (manually) and then launch the BSP program with setNumOfTasks(10); > Exception can occur if the size of splits is bigger than numBSPTasks > > > Key: HAMA-970 > URL: https://issues.apache.org/jira/browse/HAMA-970 > Project: Hama > Issue Type: Bug > Components: bsp core >Affects Versions: 0.7.0 >Reporter: JongYoon Lim >Priority: Trivial > Attachments: HAMA-970.patch > > > In JobInProgress, it's possible to get an Exception in initTasks().
> {code:java}
> this.tasks = new TaskInProgress[numBSPTasks];
> for (int i = 0; i < splits.length; i++) {
>   tasks[i] = new TaskInProgress(getJobID(), this.jobFile.toString(), splits[i], this.conf, this, i);
> }
> {code}
> I'm not sure that *numBSPTask* is always bigger than *splits.length*. > So, I think it's better to use bigger value to assign the *tasks* array. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HAMA-970) Exception can occur if the size of splits is bigger than numBSPTasks
[ https://issues.apache.org/jira/browse/HAMA-970?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15047998#comment-15047998 ] Edward J. Yoon commented on HAMA-970: - Hi, To launch more tasks than the number of splits, you should use an input partitioner (see the example at https://github.com/apache/hama/blob/master/core/src/test/java/org/apache/hama/bsp/TestPartitioning.java). For example, if you have a 10MB file and set the number of tasks to 10 with a partitioner, the framework automatically partitions the 10MB file into 10 files and then launches your main BSP program with 10 tasks. {quote} Previously in my Input Paths, I was adding 2 files, one empty file and one 70 MB file. This is working but Hama only opens up 2 tasks, one for empty file (which becomes the master) and one for 70 MB file (which becomes my only slave). Now, since I want to divide the 70 MB file into 4-5 tasks if I try to do this solution, I get an exception. {quote} You can do it like this: 1) partition the one 70MB file into 9 files (manually) and then launch the BSP program with setNumOfTasks(10); > Exception can occur if the size of splits is bigger than numBSPTasks > > > Key: HAMA-970 > URL: https://issues.apache.org/jira/browse/HAMA-970 > Project: Hama > Issue Type: Bug > Components: bsp core >Affects Versions: 0.7.0 >Reporter: JongYoon Lim >Priority: Trivial > Attachments: HAMA-970.patch > > > In JobInProgress, it's possible to get an Exception in initTasks().
> {code:java}
> this.tasks = new TaskInProgress[numBSPTasks];
> for (int i = 0; i < splits.length; i++) {
>   tasks[i] = new TaskInProgress(getJobID(), this.jobFile.toString(), splits[i], this.conf, this, i);
> }
> {code}
> I'm not sure that *numBSPTask* is always bigger than *splits.length*. > So, I think it's better to use bigger value to assign the *tasks* array. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
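The reporter's suggestion in the issue description (size the *tasks* array by the bigger of the two values) can be sketched as below. This is an illustration under assumptions and is not known to match the attached HAMA-970.patch: `TaskArraySketch` and `initTasks` are hypothetical, and plain strings stand in for `TaskInProgress` objects.

```java
/** Hypothetical sketch of the array-sizing idea from the issue; not Hama code. */
final class TaskArraySketch {

  /** Mirrors the shape of the quoted loop, with strings standing in for tasks. */
  static String[] initTasks(int numBSPTasks, String[] splits) {
    // Sizing by the larger value keeps tasks[i] in bounds for every split,
    // even when there are more splits than requested tasks.
    String[] tasks = new String[Math.max(numBSPTasks, splits.length)];
    for (int i = 0; i < splits.length; i++) {
      tasks[i] = "task-" + i + ":" + splits[i];
    }
    // Slots beyond splits.length stay null; the caller must still fill them.
    return tasks;
  }

  public static void main(String[] args) {
    // Three splits but only two requested tasks: an array of length 2 would
    // overflow at i == 2, while this one has length 3.
    String[] tasks = initTasks(2, new String[] {"s0", "s1", "s2"});
    System.out.println(tasks.length); // prints 3
  }
}
```

The sketch only removes the out-of-bounds write. Getting more tasks than splits is a separate matter, which the comment above addresses with an input partitioner.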
[jira] [Created] (HAMA-981) Set maven scm to git
Edward J. Yoon created HAMA-981: --- Summary: Set maven scm to git Key: HAMA-981 URL: https://issues.apache.org/jira/browse/HAMA-981 Project: Hama Issue Type: Bug Components: build Affects Versions: 0.7.0 Reporter: Edward J. Yoon Assignee: Edward J. Yoon Fix For: 0.7.1 SCM still uses svn repository. We need to change before release. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Resolved] (HAMA-978) NumberFormatException at StreamingUplinkReaderThread.readCommand
[ https://issues.apache.org/jira/browse/HAMA-978?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Edward J. Yoon resolved HAMA-978. - Resolution: Fixed Assignee: Edward J. Yoon This bug was fixed by https://github.com/apache/hama/commit/c095c7da2e529256579429da9bb2d534a96da873 > NumberFormatException at StreamingUplinkReaderThread.readCommand > > > Key: HAMA-978 > URL: https://issues.apache.org/jira/browse/HAMA-978 > Project: Hama > Issue Type: Bug > Components: pipes >Affects Versions: 0.7.0 >Reporter: Edward J. Yoon >Assignee: Edward J. Yoon > Fix For: 0.7.1 > > >
> {code}
> Hi to all,
> this is my first mail to this mailing list so please have patience if I make some bad choice in the format.
> I have a kubuntu-14.04 on an old Intel Core 2 Duo processor T7500 with 2 GB of RAM.
> I have properly installed Hadoop-2.7.1, Sun Java JDK 1.8.0_60 and Hama-0.7.0, as you can see from the following lines:
>
> > > $ hadoop version
> > > Hadoop 2.7.1
> > > Subversion https://git-wip-us.apache.org/repos/asf/hadoop.git -r 15ecc87ccf4a0228f35af08fc56de536e6ce657a
> > > Compiled by jenkins on 2015-06-29T06:04Z
> > > Compiled with protoc 2.5.0
> > > From source with checksum fc0a1a23fc1868e4d5ee7fa2b28a58a
> > > This command was run using /home/tora/Downloads/hadoop-2.7.1/share/hadoop/common/hadoop-common-2.7.1.jar
> > > $ java -version
> > > java version "1.8.0_60"
> > > Java(TM) SE Runtime Environment (build 1.8.0_60-b27)
> > > Java HotSpot(TM) 64-Bit Server VM (build 25.60-b23, mixed mode)
>
> I am able to properly run the basic Hadoop and Hama examples like the following:
> bin/hadoop jar share/hadoop/mapreduce/hadoop-mapreduce-examples-2.7.1.jar grep input output 'dfs[a-z.]+'
> bin/hama jar hama-examples-0.7.0.jar pi
> My problem is that I receive the following error message when I try to run the HelloWorld Hama Streaming example with this instruction:
> bin/hama pipes -streaming true -bspTasks 2 -interpreter python3.2 -cachefiles /tmp/PyStreaming/*.py -output /tmp/pystream-out/ -program /tmp/PyStreaming/BSPRunner.py -programArgs HelloWorldBSP
> The default python interpreters of my OS were python-2.7 and python-3.4; since I had problems with this example I also tried to install python-3.2 with the following instructions but it didn't solve the problem:
>
> > > sudo apt-get install software-properties-common
> > > sudo apt-add-repository ppa:fkrull/deadsnakes
> > > sudo apt-get update
> > > sudo apt-get install python3.2
>
> The installed python version is 3.2.6 as you can see from the following lines:
>
> > > $ python3.2
> > > Python 3.2.6 (default, Oct 21 2014, 12:50:03)
> > > [GCC 4.8.2] on linux2
> > > Type "help", "copyright", "credits" or "license" for more information.
> > > >>>
>
> The error message is the following (I am working in local mode so I didn't run bin/start-bspd.sh):
>
> > > $ clear;bin/hama pipes -streaming true -bspTasks 2 -interpreter python3.2 -cachefiles /tmp/PyStreaming/*.py -output /tmp/pystream-out/ -program /tmp/PyStreaming/BSPRunner.py -programArgs HelloWorldBSP
> > > 15/09/30 12:39:11 WARN util.NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
> > > 15/09/30 12:39:11 INFO pipes.Submitter: Streaming enabled!
> > > 15/09/30 12:39:11 INFO Configuration.deprecation: fs.default.name is deprecated. Instead, use fs.defaultFS
> > > 15/09/30 12:39:11 INFO Configuration.deprecation: user.name is deprecated. Instead, use mapreduce.job.user.name
> > > 15/09/30 12:39:11 WARN conf.Configuration: org.apache.hadoop.fs.ChecksumFileSystem$FSDataBoundedInputStream@4ddced80:an attempt to override final parameter: mapreduce.job.end-notification.max.retry.interval; Ignoring.
> > > 15/09/30 12:39:11 WARN conf.Configuration: org.apache.hadoop.fs.ChecksumFileSystem$FSDataBoundedInputStream@4ddced80:an attempt to override final parameter: mapreduce.job.end-notification.max.attempts; Ignoring.
> > > 15/09/30 12:39:12 INFO Configuration.deprecation: user.name is deprecated. Instead, use mapreduce.job.user.name
> > > 15/09/30 12:39:12 INFO bsp.BSPJobClient: Running job: job_localrunner_0001
> > > 15/09/30 12:39:12 INFO Configuration.deprecation: mapred.cache.localFiles is deprecated. Instead, use mapreduce.job.cache.local.files
> > > 15/09/30 12:39:12 INFO bsp.LocalBSPRunner: Setting up a new barrier for 2 tasks!
> > > java.lang.NumberFormatException: For input string: "Traceback (most recent call last):"
> > > at java.lang.NumberFormatException.forInputString(NumberFormatException.java:65)
> > > at java.lang.Integer.parseInt(Integer.java:580)
> > > at
[jira] [Resolved] (HAMA-980) Modify configuration value from "hama.sync.client.class" to "hama.sync.peer.class"
[ https://issues.apache.org/jira/browse/HAMA-980?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Edward J. Yoon resolved HAMA-980. - Resolution: Fixed I just committed this! Thanks! > Modify configuration value from "hama.sync.client.class" to > "hama.sync.peer.class" > -- > > Key: HAMA-980 > URL: https://issues.apache.org/jira/browse/HAMA-980 > Project: Hama > Issue Type: Bug > Components: test >Affects Versions: 0.7.0 >Reporter: Minho Kim >Assignee: Minho Kim >Priority: Blocker > Fix For: 0.7.1 > > > Configuration value, "hama.sync.client.class", is never used. Because > configuration value to run test code is not "hama.sync.client.classe" but > "hama.sync.peer.class". > In BSPPeerImpl.java, configuration value refer to SYNC_PEER_CLASS so as to > initialize syncClient. But SYNC_PEER_CLASS is "hama.sync.peer.class" so it's > no use setting "hama.sync.client.class". > {code:title=SyncServiceFactory.java} > public static final String SYNC_SERVER_CLASS = "hama.sync.server.class"; > public static final String SYNC_PEER_CLASS = "hama.sync.peer.class"; > public static final String SYNC_MASTER_CLASS = "hama.sync.master.class"; > /** >* Returns a sync client via reflection based on what was configured. >*/ > public static PeerSyncClient getPeerSyncClient(Configuration conf) > throws ClassNotFoundException { > return (PeerSyncClient) ReflectionUtils.newInstance(conf > .getClassByName(conf.get(SYNC_PEER_CLASS, > ZooKeeperSyncClientImpl.class.getName())), conf); > } > {code} > We need to modify configuration value from "hama.sync.client.class" to > "hama.sync.peer.class" in test codes. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Resolved] (HAMA-961) Remove ANN package
[ https://issues.apache.org/jira/browse/HAMA-961?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Edward J. Yoon resolved HAMA-961. - Resolution: Fixed I just committed this! > Remove ANN package > -- > > Key: HAMA-961 > URL: https://issues.apache.org/jira/browse/HAMA-961 > Project: Hama > Issue Type: Improvement > Components: machine learning >Affects Versions: 0.7.0 >Reporter: Edward J. Yoon >Assignee: Edward J. Yoon > Fix For: 0.8.0 > > > I've recently started to review the MLP source codes closely, and I'm > thinking about some improvement and API refactoring e.g., APIs for > user-defined neuron and synapse models, data structure, ..., etc. > This issue is one of them, and related to train large models. I'm considering > distributed parameter server (http://parameterserver.org) for managing > parameters. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HAMA-961) Remove ANN package
[ https://issues.apache.org/jira/browse/HAMA-961?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15013224#comment-15013224 ] Edward J. Yoon commented on HAMA-961: - As we decided, this effort has moved to the Apache Horn podling. > Remove ANN package > -- > > Key: HAMA-961 > URL: https://issues.apache.org/jira/browse/HAMA-961 > Project: Hama > Issue Type: Improvement > Components: machine learning >Affects Versions: 0.7.0 >Reporter: Edward J. Yoon >Assignee: Edward J. Yoon > Fix For: 0.8.0 > > > I've recently started to review the MLP source codes closely, and I'm > thinking about some improvement and API refactoring e.g., APIs for > user-defined neuron and synapse models, data structure, ..., etc. > This issue is one of them, and related to train large models. I'm considering > distributed parameter server (http://parameterserver.org) for managing > parameters. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HAMA-961) Remove ANN package
[ https://issues.apache.org/jira/browse/HAMA-961?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Edward J. Yoon updated HAMA-961: Summary: Remove ANN package (was: Parameter Server for large scale MLP) > Remove ANN package > -- > > Key: HAMA-961 > URL: https://issues.apache.org/jira/browse/HAMA-961 > Project: Hama > Issue Type: Improvement > Components: machine learning >Affects Versions: 0.7.0 >Reporter: Edward J. Yoon >Assignee: Edward J. Yoon > Fix For: 0.8.0 > > > I've recently started to review the MLP source codes closely, and I'm > thinking about some improvement and API refactoring e.g., APIs for > user-defined neuron and synapse models, data structure, ..., etc. > This issue is one of them, and related to train large models. I'm considering > distributed parameter server (http://parameterserver.org) for managing > parameters. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HAMA-980) Modify configuration value from "hama.sync.client.class" to "hama.sync.peer.class"
[ https://issues.apache.org/jira/browse/HAMA-980?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Edward J. Yoon updated HAMA-980: Priority: Blocker (was: Minor) > Modify configuration value from "hama.sync.client.class" to > "hama.sync.peer.class" > -- > > Key: HAMA-980 > URL: https://issues.apache.org/jira/browse/HAMA-980 > Project: Hama > Issue Type: Bug > Components: test >Affects Versions: 0.7.0 >Reporter: Minho Kim >Assignee: Minho Kim >Priority: Blocker > Fix For: 0.7.1 > > > Configuration value, "hama.sync.client.class", is never used. Because > configuration value to run test code is not "hama.sync.client.classe" but > "hama.sync.peer.class". > In BSPPeerImpl.java, configuration value refer to SYNC_PEER_CLASS so as to > initialize syncClient. But SYNC_PEER_CLASS is "hama.sync.peer.class" so it's > no use setting "hama.sync.client.class". > {code:title=SyncServiceFactory.java} > public static final String SYNC_SERVER_CLASS = "hama.sync.server.class"; > public static final String SYNC_PEER_CLASS = "hama.sync.peer.class"; > public static final String SYNC_MASTER_CLASS = "hama.sync.master.class"; > /** >* Returns a sync client via reflection based on what was configured. >*/ > public static PeerSyncClient getPeerSyncClient(Configuration conf) > throws ClassNotFoundException { > return (PeerSyncClient) ReflectionUtils.newInstance(conf > .getClassByName(conf.get(SYNC_PEER_CLASS, > ZooKeeperSyncClientImpl.class.getName())), conf); > } > {code} > We need to modify configuration value from "hama.sync.client.class" to > "hama.sync.peer.class" in test codes. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HAMA-980) Modify configuration value from "hama.sync.client.class" to "hama.sync.peer.class"
[ https://issues.apache.org/jira/browse/HAMA-980?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15015132#comment-15015132 ] Edward J. Yoon commented on HAMA-980: - Hi, really nice catch! BTW, why don't we change the SyncServiceFactory like below:
{code}
  public static final String SYNC_SERVER_CLASS = "hama.sync.server.class";
- public static final String SYNC_PEER_CLASS = "hama.sync.peer.class";
+ public static final String SYNC_PEER_CLASS = "hama.sync.client.class";
{code}
Server and client always come as a pair. > Modify configuration value from "hama.sync.client.class" to > "hama.sync.peer.class" > -- > > Key: HAMA-980 > URL: https://issues.apache.org/jira/browse/HAMA-980 > Project: Hama > Issue Type: Bug > Components: test >Affects Versions: 0.7.0 >Reporter: Minho Kim >Assignee: Minho Kim >Priority: Minor > Fix For: 0.7.1 > > > Configuration value, "hama.sync.client.class", is never used. Because > configuration value to run test code is not "hama.sync.client.classe" but > "hama.sync.peer.class". > In BSPPeerImpl.java, configuration value refer to SYNC_PEER_CLASS so as to > initialize syncClient. But SYNC_PEER_CLASS is "hama.sync.peer.class" so it's > no use setting "hama.sync.client.class". > {code:title=SyncServiceFactory.java} > public static final String SYNC_SERVER_CLASS = "hama.sync.server.class"; > public static final String SYNC_PEER_CLASS = "hama.sync.peer.class"; > public static final String SYNC_MASTER_CLASS = "hama.sync.master.class"; > /** >* Returns a sync client via reflection based on what was configured. >*/ > public static PeerSyncClient getPeerSyncClient(Configuration conf) > throws ClassNotFoundException { > return (PeerSyncClient) ReflectionUtils.newInstance(conf > .getClassByName(conf.get(SYNC_PEER_CLASS, > ZooKeeperSyncClientImpl.class.getName())), conf); > } > {code} > We need to modify configuration value from "hama.sync.client.class" to > "hama.sync.peer.class" in test codes. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HAMA-980) Modify configuration value from "hama.sync.client.class" to "hama.sync.peer.class"
[ https://issues.apache.org/jira/browse/HAMA-980?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15015345#comment-15015345 ] Edward J. Yoon commented on HAMA-980: - +1 > Modify configuration value from "hama.sync.client.class" to > "hama.sync.peer.class" > -- > > Key: HAMA-980 > URL: https://issues.apache.org/jira/browse/HAMA-980 > Project: Hama > Issue Type: Bug > Components: test >Affects Versions: 0.7.0 >Reporter: Minho Kim >Assignee: Minho Kim >Priority: Blocker > Fix For: 0.7.1 > > > Configuration value, "hama.sync.client.class", is never used. Because > configuration value to run test code is not "hama.sync.client.classe" but > "hama.sync.peer.class". > In BSPPeerImpl.java, configuration value refer to SYNC_PEER_CLASS so as to > initialize syncClient. But SYNC_PEER_CLASS is "hama.sync.peer.class" so it's > no use setting "hama.sync.client.class". > {code:title=SyncServiceFactory.java} > public static final String SYNC_SERVER_CLASS = "hama.sync.server.class"; > public static final String SYNC_PEER_CLASS = "hama.sync.peer.class"; > public static final String SYNC_MASTER_CLASS = "hama.sync.master.class"; > /** >* Returns a sync client via reflection based on what was configured. >*/ > public static PeerSyncClient getPeerSyncClient(Configuration conf) > throws ClassNotFoundException { > return (PeerSyncClient) ReflectionUtils.newInstance(conf > .getClassByName(conf.get(SYNC_PEER_CLASS, > ZooKeeperSyncClientImpl.class.getName())), conf); > } > {code} > We need to modify configuration value from "hama.sync.client.class" to > "hama.sync.peer.class" in test codes. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HAMA-980) Modify configuration value from "hama.sync.client.class" to "hama.sync.peer.class"
[ https://issues.apache.org/jira/browse/HAMA-980?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Edward J. Yoon updated HAMA-980:
--------------------------------
Fix Version/s: (was: 0.7.0) 0.7.1

> Modify configuration value from "hama.sync.client.class" to "hama.sync.peer.class"
> ----------------------------------------------------------------------------------
>
> Key: HAMA-980
> URL: https://issues.apache.org/jira/browse/HAMA-980
> Project: Hama
> Issue Type: Bug
> Components: test
> Affects Versions: 0.7.0
> Reporter: Minho Kim
> Assignee: Minho Kim
> Priority: Minor
> Fix For: 0.7.1
[jira] [Resolved] (HAMA-979) Change the setting the -source and -target of the Java Compiler to 1.7
[ https://issues.apache.org/jira/browse/HAMA-979?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Edward J. Yoon resolved HAMA-979.
---------------------------------
Resolution: Fixed
Assignee: Edward J. Yoon

I've just committed this!

> Change the setting the -source and -target of the Java Compiler to 1.7
> ----------------------------------------------------------------------
>
> Key: HAMA-979
> URL: https://issues.apache.org/jira/browse/HAMA-979
> Project: Hama
> Issue Type: Bug
> Components: build
> Affects Versions: 0.7.0
> Reporter: Edward J. Yoon
> Assignee: Edward J. Yoon
> Fix For: 0.7.1
>
> As discussed earlier (http://markmail.org/message/xjpwn7uiit64vcd4), we decided to move to Java 7.
> We need to change the -source and -target settings of the Java compiler from 1.6 to 1.7.
[jira] [Created] (HAMA-979) Change the setting the -source and -target of the Java Compiler to 1.7
Edward J. Yoon created HAMA-979:
--------------------------------

Summary: Change the setting the -source and -target of the Java Compiler to 1.7
Key: HAMA-979
URL: https://issues.apache.org/jira/browse/HAMA-979
Project: Hama
Issue Type: Bug
Components: build
Affects Versions: 0.7.0
Reporter: Edward J. Yoon
Fix For: 0.7.1

As discussed earlier (http://markmail.org/message/xjpwn7uiit64vcd4), we decided to move to Java 7.

We need to change the -source and -target settings of the Java compiler from 1.6 to 1.7.
[jira] [Created] (HAMA-978) NumberFormatException at StreamingUplinkReaderThread.readCommand
Edward J. Yoon created HAMA-978:
--------------------------------

Summary: NumberFormatException at StreamingUplinkReaderThread.readCommand
Key: HAMA-978
URL: https://issues.apache.org/jira/browse/HAMA-978
Project: Hama
Issue Type: Bug
Components: pipes
Affects Versions: 0.7.0
Reporter: Edward J. Yoon
Fix For: 0.7.1

{code}
Hi to all,

this is my first mail to this mailing list, so please have patience if I make some bad choices in the format.

I have kubuntu-14.04 on an old Intel Core 2 Duo processor T7500 with 2 GB of RAM. I have properly installed Hadoop-2.7.1, Sun Java JDK 1.8.0_60 and Hama-0.7.0, as you can see from the following lines:

> > $ hadoop version
> > Hadoop 2.7.1
> > Subversion https://git-wip-us.apache.org/repos/asf/hadoop.git -r 15ecc87ccf4a0228f35af08fc56de536e6ce657a
> > Compiled by jenkins on 2015-06-29T06:04Z
> > Compiled with protoc 2.5.0
> > From source with checksum fc0a1a23fc1868e4d5ee7fa2b28a58a
> > This command was run using /home/tora/Downloads/hadoop-2.7.1/share/hadoop/common/hadoop-common-2.7.1.jar
> >
> > $ java -version
> > java version "1.8.0_60"
> > Java(TM) SE Runtime Environment (build 1.8.0_60-b27)
> > Java HotSpot(TM) 64-Bit Server VM (build 25.60-b23, mixed mode)

I am able to properly run the basic Hadoop and Hama examples like the following:

bin/hadoop jar share/hadoop/mapreduce/hadoop-mapreduce-examples-2.7.1.jar grep input output 'dfs[a-z.]+'
bin/hama jar hama-examples-0.7.0.jar pi

My problem is that I receive the following error message when I try to run the HelloWorld Hama Streaming example with this command:

bin/hama pipes -streaming true -bspTasks 2 -interpreter python3.2 -cachefiles /tmp/PyStreaming/*.py -output /tmp/pystream-out/ -program /tmp/PyStreaming/BSPRunner.py -programArgs HelloWorldBSP

The default Python interpreters of my OS were python-2.7 and python-3.4; since I had problems with this example, I also tried to install python-3.2 with the following commands, but it didn't solve the problem:

> > sudo apt-get install software-properties-common
> > sudo apt-add-repository ppa:fkrull/deadsnakes
> > sudo apt-get update
> > sudo apt-get install python3.2

The installed Python version is 3.2.6, as you can see from the following lines:

> > $ python3.2
> > Python 3.2.6 (default, Oct 21 2014, 12:50:03)
> > [GCC 4.8.2] on linux2
> > Type "help", "copyright", "credits" or "license" for more information.
> > >>>

The error message is the following (I am working in local mode, so I didn't run bin/start-bspd.sh):

> > $ clear;bin/hama pipes -streaming true -bspTasks 2 -interpreter python3.2 -cachefiles /tmp/PyStreaming/*.py -output /tmp/pystream-out/ -program /tmp/PyStreaming/BSPRunner.py -programArgs HelloWorldBSP
> > 15/09/30 12:39:11 WARN util.NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
> > 15/09/30 12:39:11 INFO pipes.Submitter: Streaming enabled!
> > 15/09/30 12:39:11 INFO Configuration.deprecation: fs.default.name is deprecated. Instead, use fs.defaultFS
> > 15/09/30 12:39:11 INFO Configuration.deprecation: user.name is deprecated. Instead, use mapreduce.job.user.name
> > 15/09/30 12:39:11 WARN conf.Configuration: org.apache.hadoop.fs.ChecksumFileSystem$FSDataBoundedInputStream@4ddced80:an attempt to override final parameter: mapreduce.job.end-notification.max.retry.interval; Ignoring.
> > 15/09/30 12:39:11 WARN conf.Configuration: org.apache.hadoop.fs.ChecksumFileSystem$FSDataBoundedInputStream@4ddced80:an attempt to override final parameter: mapreduce.job.end-notification.max.attempts; Ignoring.
> > 15/09/30 12:39:12 INFO Configuration.deprecation: user.name is deprecated. Instead, use mapreduce.job.user.name
> > 15/09/30 12:39:12 INFO bsp.BSPJobClient: Running job: job_localrunner_0001
> > 15/09/30 12:39:12 INFO Configuration.deprecation: mapred.cache.localFiles is deprecated. Instead, use mapreduce.job.cache.local.files
> > 15/09/30 12:39:12 INFO bsp.LocalBSPRunner: Setting up a new barrier for 2 tasks!
> > java.lang.NumberFormatException: For input string: "Traceback (most recent call last):"
> > at java.lang.NumberFormatException.forInputString(NumberFormatException.java:65)
> > at java.lang.Integer.parseInt(Integer.java:580)
> > at java.lang.Integer.parseInt(Integer.java:615)
> > at org.apache.hama.pipes.protocol.StreamingProtocol$StreamingUplinkReaderThread.readCommand(StreamingProtocol.java:174)
> > at org.apache.hama.pipes.protocol.UplinkReader.run(UplinkReader.java:106)
> > java.lang.NumberFormatException: For input string: "Traceback (most recent call last):"
> > at java.lang.NumberFormatException.forInputString(NumberFormatException.java:65)
> > at java.lang.Integer.parseInt(Integer.java:580)
> > at java.lang.Integer.parseInt(Integer.java:615)
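The stack trace above shows Integer.parseInt receiving the text "Traceback (most recent call last):", which is the first line Python prints for an uncaught exception. This suggests the Python side failed and its error text arrived where the reader expected a numeric command. The following is a minimal sketch, not Hama's actual code, of a parse that tells the two cases apart. The helper name parseCommandId is hypothetical.

```java
import java.util.OptionalInt;

/** Sketch only: a defensive parse for a line that should hold a numeric command id. */
final class CommandLineParseSketch {

  /** Hypothetical helper: returns the command id, or empty if the line is not an integer. */
  static OptionalInt parseCommandId(String line) {
    try {
      return OptionalInt.of(Integer.parseInt(line.trim()));
    } catch (NumberFormatException e) {
      // Anything non-numeric (for example interpreter error output) ends up here.
      return OptionalInt.empty();
    }
  }

  public static void main(String[] args) {
    String[] lines = {"42", "Traceback (most recent call last):"};
    for (String line : lines) {
      OptionalInt id = parseCommandId(line);
      if (id.isPresent()) {
        System.out.println("command " + id.getAsInt());
      } else {
        System.out.println("not a command, possibly interpreter output: " + line);
      }
    }
  }
}
```

With a check like this, the reader could log the unexpected line in full, which would surface the underlying Python error instead of hiding it behind a NumberFormatException.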
[jira] [Created] (HAMA-977) Migrate from SVN to GIT
Edward J. Yoon created HAMA-977:
--------------------------------

Summary: Migrate from SVN to GIT
Key: HAMA-977
URL: https://issues.apache.org/jira/browse/HAMA-977
Project: Hama
Issue Type: Task
Reporter: Edward J. Yoon
Assignee: Edward J. Yoon

INFRA ticket: https://issues.apache.org/jira/servicedesk/agent/INFRA/issue/INFRA-10466

- Need to update the Wiki and website contents (HowToContribute, HowToCommit, etc.)
- Need to change the settings of the nightly build jobs
[jira] [Commented] (HAMA-976) Add the GraphJob example on YARN
[ https://issues.apache.org/jira/browse/HAMA-976?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14738369#comment-14738369 ]

Edward J. Yoon commented on HAMA-976:
-------------------------------------

Hi, just FYI: before committing, we usually wait some time for review from other committers. If no one objects, you can commit by lazy consensus, like this: https://issues.apache.org/jira/browse/HAMA-818?focusedCommentId=13831194=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-13831194

> Add the GraphJob example on YARN
> --------------------------------
>
> Key: HAMA-976
> URL: https://issues.apache.org/jira/browse/HAMA-976
> Project: Hama
> Issue Type: New Feature
> Components: yarn
> Affects Versions: 0.7.1
> Reporter: Minho Kim
> Assignee: Minho Kim
> Fix For: 0.7.1
>
> I'll add a graph example that works on a YARN cluster. The example is PageRank.
> I'll test whether it runs normally.
[jira] [Commented] (HAMA-975) Improvement of Async RPC
[ https://issues.apache.org/jira/browse/HAMA-975?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14734691#comment-14734691 ]

Edward J. Yoon commented on HAMA-975:
-------------------------------------

{quote}I'd like to divide this issue into small subtasks.{quote}

That's a good idea. Please keep up the great work!

> Improvement of Async RPC
> ------------------------
>
> Key: HAMA-975
> URL: https://issues.apache.org/jira/browse/HAMA-975
> Project: Hama
> Issue Type: Improvement
> Components: bsp core
> Reporter: JongYoon Lim
>
> Hama has an async IPC feature.
> I found some points that could be improved, as listed below.
> 1. Add a Netty encoder and decoder to lighten the load on the handler.
> 2. Consider using the native transport, EpollEventLoopGroup, instead of NioEventLoopGroup.
> 3. Use pooled buffers.
> 4. Use ctx.* instead of channel.*
> 5. Find and remove blocking code.
> 6. Async-fashioned RPC response
> We can also consider compression or JSON-style marshalling (unmarshalling).
> But I'm not sure these changes always improve performance, so a benchmark should be provided to prove the improvement.
> I'd like to divide this issue into small subtasks.
> WDYT?
[jira] [Updated] (HAMA-975) Improvement of Async RPC
[ https://issues.apache.org/jira/browse/HAMA-975?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Edward J. Yoon updated HAMA-975:
--------------------------------
Assignee: JongYoon Lim

> Improvement of Async RPC
> ------------------------
>
> Key: HAMA-975
> URL: https://issues.apache.org/jira/browse/HAMA-975
> Project: Hama
> Issue Type: Improvement
> Components: bsp core
> Reporter: JongYoon Lim
> Assignee: JongYoon Lim
[jira] [Commented] (HAMA-974) Support fault tolerance for Graph job
[ https://issues.apache.org/jira/browse/HAMA-974?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14734184#comment-14734184 ]

Edward J. Yoon commented on HAMA-974:
-------------------------------------

Isn't HAMA-881 already addressed? I tested AsyncRcvdMsgCheckpointImpl, and it works fine. The problem is the state of the variables at the last checkpoint.

I think providing a custom checkpoint function is the best option. For example, we can add a checkpointState() method to BSPInterface.

{code}
public setup() {
}

public bsp() {
  // your program
}

public checkpointState() {
  // define variables to be checkpointed
}

public close() {
}
{code}

> Support fault tolerance for Graph job
> -------------------------------------
>
> Key: HAMA-974
> URL: https://issues.apache.org/jira/browse/HAMA-974
> Project: Hama
> Issue Type: Improvement
> Components: graph
> Affects Versions: 0.7.0
> Reporter: Edward J. Yoon
> Fix For: 0.8.0
>
> Currently we only checkpoint messages. To support FT for graph jobs, aggregators, assigned vertices, and their statuses must be checkpointed together.
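One possible shape for the checkpointState() idea in the comment above, as a self-contained sketch. The interface name CheckpointableBsp, the Map return type, and the CountingBsp example are assumptions made for illustration; Hama's real interfaces and signatures are not reproduced here.

```java
import java.util.LinkedHashMap;
import java.util.Map;

/** Hypothetical interface: a BSP-style program with a user-defined checkpoint hook. */
interface CheckpointableBsp {
  void setup();

  void bsp();

  /** Returns the variables that should be saved together with the message checkpoint. */
  Map<String, Object> checkpointState();

  void close();
}

/** Toy program whose only state is an iteration counter and a running sum. */
final class CountingBsp implements CheckpointableBsp {
  private int iteration;
  private double partialSum;

  @Override
  public void setup() {
    iteration = 0;
    partialSum = 0.0;
  }

  @Override
  public void bsp() {
    // Stands in for one superstep of real work.
    iteration++;
    partialSum += 0.5;
  }

  @Override
  public Map<String, Object> checkpointState() {
    // The program, not the framework, decides which variables matter for recovery.
    Map<String, Object> state = new LinkedHashMap<>();
    state.put("iteration", iteration);
    state.put("partialSum", partialSum);
    return state;
  }

  @Override
  public void close() {
    // Nothing to release in this toy example.
  }

  public static void main(String[] args) {
    CountingBsp program = new CountingBsp();
    program.setup();
    program.bsp();
    program.bsp();
    System.out.println(program.checkpointState());
    program.close();
  }
}
```

A framework built around such a hook would call checkpointState() at each checkpoint and hand the saved map back to the program on recovery; how that restore step should look is left open here.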
[jira] [Created] (HAMA-974) Support fault tolerance for Graph job
Edward J. Yoon created HAMA-974:
--------------------------------

Summary: Support fault tolerance for Graph job
Key: HAMA-974
URL: https://issues.apache.org/jira/browse/HAMA-974
Project: Hama
Issue Type: Improvement
Components: graph
Affects Versions: 0.7.0
Reporter: Edward J. Yoon
Fix For: 0.8.0

Currently we only checkpoint messages. To support FT for graph jobs, aggregators, assigned vertices, and their statuses must be checkpointed together.
[jira] [Commented] (HAMA-974) Support fault tolerance for Graph job
[ https://issues.apache.org/jira/browse/HAMA-974?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14734229#comment-14734229 ]

Edward J. Yoon commented on HAMA-974:
-------------------------------------

Yes, we need to vote. BTW, your branch is quite old, so it's hard to compare with trunk. Could you please summarize the changes you've made?

> Support fault tolerance for Graph job
> -------------------------------------
>
> Key: HAMA-974
> URL: https://issues.apache.org/jira/browse/HAMA-974
> Project: Hama
> Issue Type: Improvement
> Components: graph
> Affects Versions: 0.7.0
> Reporter: Edward J. Yoon
> Fix For: 0.8.0
>
> Currently we only checkpoint messages. To support FT for graph jobs, aggregators, assigned vertices, and their statuses must be checkpointed together.
[jira] [Resolved] (HAMA-973) GraphJob and RandBench example works incorrectly when FT is enabled.
[ https://issues.apache.org/jira/browse/HAMA-973?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Edward J. Yoon resolved HAMA-973.
---------------------------------
Resolution: Fixed
Fix Version/s: 0.7.1

> GraphJob and RandBench example works incorrectly when FT is enabled.
> --------------------------------------------------------------------
>
> Key: HAMA-973
> URL: https://issues.apache.org/jira/browse/HAMA-973
> Project: Hama
> Issue Type: Bug
> Components: bsp core
> Affects Versions: 0.7.0
> Reporter: Edward J. Yoon
> Assignee: Edward J. Yoon
> Priority: Critical
> Fix For: 0.7.1
> Attachments: patch.txt
>
> Today I tested the fault tolerance function with RandBench. FT works fine, but I found a bug in the RandBench program.
> {code}
> [root@cluster-0 hama-0.7.0]# bin/hama jar hama-examples-0.7.0.jar bench 100 100 100
> 15/09/03 12:59:57 WARN util.NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
> 15/09/03 12:59:58 INFO Configuration.deprecation: user.name is deprecated. Instead, use mapreduce.job.user.name
> 15/09/03 12:59:58 INFO bsp.BSPJobClient: Running job: job_201509031258_0002
> 15/09/03 13:00:01 INFO bsp.BSPJobClient: Current supersteps number: 0
> 15/09/03 13:00:22 INFO bsp.BSPJobClient: Current supersteps number: 2
> 15/09/03 13:00:26 INFO bsp.BSPJobClient: Current supersteps number: 5
> 15/09/03 13:00:29 INFO bsp.BSPJobClient: Current supersteps number: 11
> 15/09/03 13:00:32 INFO bsp.BSPJobClient: Current supersteps number: 16
> 15/09/03 13:00:35 INFO bsp.BSPJobClient: Current supersteps number: 21
> 15/09/03 13:00:38 INFO bsp.BSPJobClient: Current supersteps number: 28
> 15/09/03 13:00:41 INFO bsp.BSPJobClient: Current supersteps number: 35
> 15/09/03 13:00:44 INFO bsp.BSPJobClient: Current supersteps number: 42
> 15/09/03 13:00:47 INFO bsp.BSPJobClient: Current supersteps number: 49
> 15/09/03 13:00:50 INFO bsp.BSPJobClient: Current supersteps number: 56
> 15/09/03 13:02:05 INFO bsp.BSPJobClient: Current supersteps number: 0
> 15/09/03 13:02:08 INFO bsp.BSPJobClient: Current supersteps number: 56
> 15/09/03 13:02:11 INFO bsp.BSPJobClient: Current supersteps number: 0
> 15/09/03 13:02:20 INFO bsp.BSPJobClient: Current supersteps number: 57
> 15/09/03 13:02:23 INFO bsp.BSPJobClient: Current supersteps number: 61
> 15/09/03 13:02:26 INFO bsp.BSPJobClient: Current supersteps number: 67
> 15/09/03 13:02:29 INFO bsp.BSPJobClient: Current supersteps number: 72
> 15/09/03 13:02:32 INFO bsp.BSPJobClient: Current supersteps number: 77
> 15/09/03 13:02:35 INFO bsp.BSPJobClient: Current supersteps number: 84
> 15/09/03 13:02:38 INFO bsp.BSPJobClient: Current supersteps number: 91
> 15/09/03 13:02:41 INFO bsp.BSPJobClient: Current supersteps number: 97
> 15/09/03 13:02:44 INFO bsp.BSPJobClient: Current supersteps number: 106
> 15/09/03 13:02:47 INFO bsp.BSPJobClient: Current supersteps number: 113
> 15/09/03 13:02:50 INFO bsp.BSPJobClient: Current supersteps number: 125
> 15/09/03 13:02:53 INFO bsp.BSPJobClient: Current supersteps number: 134
> 15/09/03 13:02:56 INFO bsp.BSPJobClient: Current supersteps number: 144
> 15/09/03 13:02:59 INFO bsp.BSPJobClient: Current supersteps number: 152
> 15/09/03 13:03:02 INFO bsp.BSPJobClient: Current supersteps number: 156
> 15/09/03 13:03:05 INFO bsp.BSPJobClient: The total number of supersteps: 156
> 15/09/03 13:03:05 INFO bsp.BSPJobClient: Counters: 6
> 15/09/03 13:03:05 INFO bsp.BSPJobClient: org.apache.hama.bsp.JobInProgress$JobCounter
> 15/09/03 13:03:05 INFO bsp.BSPJobClient: SUPERSTEPS=156
> 15/09/03 13:03:05 INFO bsp.BSPJobClient: LAUNCHED_TASKS=160
> 15/09/03 13:03:05 INFO bsp.BSPJobClient: org.apache.hama.bsp.BSPPeerImpl$PeerCounter
> 15/09/03 13:03:05 INFO bsp.BSPJobClient: SUPERSTEP_SUM=24960
> 15/09/03 13:03:05 INFO bsp.BSPJobClient: TIME_IN_SYNC_MS=1943366
> 15/09/03 13:03:05 INFO bsp.BSPJobClient: TOTAL_MESSAGES_SENT=160
> 15/09/03 13:03:05 INFO bsp.BSPJobClient: TOTAL_MESSAGES_RECEIVED=160
> Job Finished in 187.453 seconds
> {code}
> I ran it with the max iteration set to 100. At superstep 56, I killed one task manually and confirmed that the failed task recovered automatically. However, the total number of supersteps was 156, not 100.
> The reason is simple: i always starts from 0. To fix this issue, we have to initialize i to (int) peer.getSuperstepCount().
> {code}
> public void bsp(
>     BSPPeer<NullWritable, NullWritable, NullWritable, NullWritable, BytesWritable> peer)
>     throws IOException, SyncException, InterruptedException {
>   byte[] dummyData = new byte[sizeOfMsg];
>   String[] peers = peer.getAllPeerNames();
>   for (int i = 0; i < nSupersteps; i++) {
> {code}
> GraphJobRunner has a similar problem.
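The arithmetic behind the 156 can be checked with a small sketch. It is not the RandBench code: restoredSuperstepCount stands in for the value of peer.getSuperstepCount() after recovery, and the loop body is reduced to a counter.

```java
/** Sketch only: counts loop iterations after a task is recovered at a given superstep. */
final class ResumeLoopSketch {

  /**
   * Returns how many iterations the benchmark loop runs after recovery.
   * With resume == false the loop starts at 0 again (the reported bug);
   * with resume == true it starts at the restored superstep count (the proposed fix).
   */
  static int iterationsAfterRecovery(long restoredSuperstepCount, int nSupersteps, boolean resume) {
    int start = resume ? (int) restoredSuperstepCount : 0;
    int run = 0;
    for (int i = start; i < nSupersteps; i++) {
      run++; // in the real program each iteration sends messages and syncs
    }
    return run;
  }

  public static void main(String[] args) {
    long completedBeforeFailure = 56; // superstep at which the task was killed
    int nSupersteps = 100;            // requested number of supersteps

    long buggyTotal = completedBeforeFailure
        + iterationsAfterRecovery(completedBeforeFailure, nSupersteps, false);
    long fixedTotal = completedBeforeFailure
        + iterationsAfterRecovery(completedBeforeFailure, nSupersteps, true);

    System.out.println("without fix: " + buggyTotal); // prints "without fix: 156"
    System.out.println("with fix: " + fixedTotal);    // prints "with fix: 100"
  }
}
```

The first total matches the SUPERSTEPS=156 counter in the log above: 56 supersteps before the failure plus a full 100 after the restart.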
[jira] [Commented] (HAMA-973) GraphJob and RandBench example works incorrectly when FT is enabled.
[ https://issues.apache.org/jira/browse/HAMA-973?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14734126#comment-14734126 ]

Edward J. Yoon commented on HAMA-973:
-------------------------------------

I just committed my changes.

For GraphJob FT, there is more to fix. For example, vertex status must also be checkpointed. I'll fix that later.

> GraphJob and RandBench example works incorrectly when FT is enabled.
> --------------------------------------------------------------------
>
> Key: HAMA-973
> URL: https://issues.apache.org/jira/browse/HAMA-973
> Project: Hama
> Issue Type: Bug
> Components: bsp core
> Affects Versions: 0.7.0
> Reporter: Edward J. Yoon
> Assignee: Edward J. Yoon
> Priority: Critical
> Attachments: patch.txt
>
> Today I tested the fault tolerance function with RandBench. FT works fine, but I found a bug in the RandBench program.
> {code}
> [root@cluster-0 hama-0.7.0]# bin/hama jar hama-examples-0.7.0.jar bench 100 100 100
> 15/09/03 12:59:57 WARN util.NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
> 15/09/03 12:59:58 INFO Configuration.deprecation: user.name is deprecated. Instead, use mapreduce.job.user.name
> 15/09/03 12:59:58 INFO bsp.BSPJobClient: Running job: job_201509031258_0002
> 15/09/03 13:00:01 INFO bsp.BSPJobClient: Current supersteps number: 0
> 15/09/03 13:00:22 INFO bsp.BSPJobClient: Current supersteps number: 2
> 15/09/03 13:00:26 INFO bsp.BSPJobClient: Current supersteps number: 5
> 15/09/03 13:00:29 INFO bsp.BSPJobClient: Current supersteps number: 11
> 15/09/03 13:00:32 INFO bsp.BSPJobClient: Current supersteps number: 16
> 15/09/03 13:00:35 INFO bsp.BSPJobClient: Current supersteps number: 21
> 15/09/03 13:00:38 INFO bsp.BSPJobClient: Current supersteps number: 28
> 15/09/03 13:00:41 INFO bsp.BSPJobClient: Current supersteps number: 35
> 15/09/03 13:00:44 INFO bsp.BSPJobClient: Current supersteps number: 42
> 15/09/03 13:00:47 INFO bsp.BSPJobClient: Current supersteps number: 49
> 15/09/03 13:00:50 INFO bsp.BSPJobClient: Current supersteps number: 56
> 15/09/03 13:02:05 INFO bsp.BSPJobClient: Current supersteps number: 0
> 15/09/03 13:02:08 INFO bsp.BSPJobClient: Current supersteps number: 56
> 15/09/03 13:02:11 INFO bsp.BSPJobClient: Current supersteps number: 0
> 15/09/03 13:02:20 INFO bsp.BSPJobClient: Current supersteps number: 57
> 15/09/03 13:02:23 INFO bsp.BSPJobClient: Current supersteps number: 61
> 15/09/03 13:02:26 INFO bsp.BSPJobClient: Current supersteps number: 67
> 15/09/03 13:02:29 INFO bsp.BSPJobClient: Current supersteps number: 72
> 15/09/03 13:02:32 INFO bsp.BSPJobClient: Current supersteps number: 77
> 15/09/03 13:02:35 INFO bsp.BSPJobClient: Current supersteps number: 84
> 15/09/03 13:02:38 INFO bsp.BSPJobClient: Current supersteps number: 91
> 15/09/03 13:02:41 INFO bsp.BSPJobClient: Current supersteps number: 97
> 15/09/03 13:02:44 INFO bsp.BSPJobClient: Current supersteps number: 106
> 15/09/03 13:02:47 INFO bsp.BSPJobClient: Current supersteps number: 113
> 15/09/03 13:02:50 INFO bsp.BSPJobClient: Current supersteps number: 125
> 15/09/03 13:02:53 INFO bsp.BSPJobClient: Current supersteps number: 134
> 15/09/03 13:02:56 INFO bsp.BSPJobClient: Current supersteps number: 144
> 15/09/03 13:02:59 INFO bsp.BSPJobClient: Current supersteps number: 152
> 15/09/03 13:03:02 INFO bsp.BSPJobClient: Current supersteps number: 156
> 15/09/03 13:03:05 INFO bsp.BSPJobClient: The total number of supersteps: 156
> 15/09/03 13:03:05 INFO bsp.BSPJobClient: Counters: 6
> 15/09/03 13:03:05 INFO bsp.BSPJobClient: org.apache.hama.bsp.JobInProgress$JobCounter
> 15/09/03 13:03:05 INFO bsp.BSPJobClient: SUPERSTEPS=156
> 15/09/03 13:03:05 INFO bsp.BSPJobClient: LAUNCHED_TASKS=160
> 15/09/03 13:03:05 INFO bsp.BSPJobClient: org.apache.hama.bsp.BSPPeerImpl$PeerCounter
> 15/09/03 13:03:05 INFO bsp.BSPJobClient: SUPERSTEP_SUM=24960
> 15/09/03 13:03:05 INFO bsp.BSPJobClient: TIME_IN_SYNC_MS=1943366
> 15/09/03 13:03:05 INFO bsp.BSPJobClient: TOTAL_MESSAGES_SENT=160
> 15/09/03 13:03:05 INFO bsp.BSPJobClient: TOTAL_MESSAGES_RECEIVED=160
> Job Finished in 187.453 seconds
> {code}
> I ran it with the max iteration set to 100. At superstep 56, I killed one task manually and confirmed that the failed task recovered automatically. However, the total number of supersteps was 156, not 100.
> The reason is simple: i always starts from 0. To fix this issue, we have to initialize i to (int) peer.getSuperstepCount().
> {code}
> public void bsp(
>     BSPPeer<NullWritable, NullWritable, NullWritable, NullWritable, BytesWritable> peer)
>     throws IOException, SyncException, InterruptedException {
>   byte[] dummyData = new byte[sizeOfMsg];
>   String[] peers =