[jira] [Commented] (HAMA-983) Hama runner for DataFlow

2017-05-14 Thread Edward J. Yoon (JIRA)

[ 
https://issues.apache.org/jira/browse/HAMA-983?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16009911#comment-16009911
 ] 

Edward J. Yoon commented on HAMA-983:
-

{code}
# create a new branch inside your directory 'current'
git checkout -b HAMA-983
# ... do some changes to the files ...
# commit changes to the branch
git commit -a -m '[HAMA-983] Hama runner for DataFlow'
# push the branch to your remote repository
git push origin HAMA-983
{code}

Hi JongYoon, you can create a new branch as shown above. Then go to your 
GitHub HAMA page and open a pull request.

> Hama runner for DataFlow
> 
>
> Key: HAMA-983
> URL: https://issues.apache.org/jira/browse/HAMA-983
> Project: Hama
>  Issue Type: Bug
>Reporter: Edward J. Yoon
>  Labels: gsoc2016
>
> As you already know, Apache Beam provides a unified programming model for 
> both batch and streaming inputs.
> The APIs are generally associated with data filtering and transforming, so 
> we'll need to implement a data processing runner like 
> https://github.com/dapurv5/MapReduce-BSP-Adapter/blob/master/src/main/java/org/apache/hama/mapreduce/examples/WordCount.java
> Also, implementing a similarity join could be fun. According to 
> http://www.ruizhang.info/publications/TPDS2015-Heads_Join.pdf, Apache Hama is 
> the clear winner over Apache Hadoop and Apache Spark.
> Since it consists of transformation, aggregation, and partition computations, 
> I think it's possible to implement it using the Apache Beam APIs.



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)


[jira] [Commented] (HAMA-983) Hama runner for DataFlow

2017-04-25 Thread Edward J. Yoon (JIRA)

[ 
https://issues.apache.org/jira/browse/HAMA-983?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15984007#comment-15984007
 ] 

Edward J. Yoon commented on HAMA-983:
-

Sorry for the late reply. 

{quote}could you create a branch called 'beam_support' on github?{quote} 

Sure. Alternatively, you can create the branch yourself, since you're a 
committer. Otherwise, I can do it this weekend.



[jira] [Commented] (HAMA-999) Wrong size of MemoryQueue

2017-04-10 Thread Edward J. Yoon (JIRA)

[ 
https://issues.apache.org/jira/browse/HAMA-999?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15963871#comment-15963871
 ] 

Edward J. Yoon commented on HAMA-999:
-

Thanks for the report! I'll check what's wrong.

> Wrong size of MemoryQueue
> -
>
> Key: HAMA-999
> URL: https://issues.apache.org/jira/browse/HAMA-999
> Project: Hama
>  Issue Type: Bug
>  Components: bsp core
>Reporter: JongYoon Lim
>
> I found that the *SuperstepPiEstimator* example sometimes gives a wrong 
> result when calling *peer.getNumCurrentMessages()*. This was caused by a 
> wrong *size()* of *MemoryQueue*. When I printed out the sizes of the queue 
> from the example, it sometimes said, 
> {noformat}
> bundle size: 20, numOfMsg: 19, deque size: 0
> {noformat}
> I think *deque*, *bundles* and *numOfMsg* of *MemoryQueue* should be properly 
> synchronized to get the correct result. 
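
One way to picture the synchronization suggested in the report is to guard every field that contributes to the size with the same lock. The sketch below is not Hama's MemoryQueue: the class name GuardedBundleQueue and its methods are hypothetical, and it only assumes that messages may arrive in bundles that are unpacked into a deque later.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Deque;
import java.util.List;

// Hypothetical illustration only; this is NOT Hama's MemoryQueue.
class GuardedBundleQueue<M> {
  private final Deque<M> deque = new ArrayDeque<>();
  private final List<List<M>> bundles = new ArrayList<>();
  private int numOfMsg = 0; // messages still packed inside bundles

  // Adds a whole bundle; the bundle list and the counter change together.
  public synchronized void addBundle(List<M> bundle) {
    bundles.add(new ArrayList<>(bundle));
    numOfMsg += bundle.size();
  }

  // Adds a single, already unpacked message.
  public synchronized void add(M msg) {
    deque.add(msg);
  }

  // Unpacks bundles into the deque whenever the deque has run dry.
  public synchronized M poll() {
    while (deque.isEmpty() && !bundles.isEmpty()) {
      List<M> next = bundles.remove(0);
      numOfMsg -= next.size();
      deque.addAll(next);
    }
    return deque.poll();
  }

  // Consistent because it runs under the same lock as every mutation.
  public synchronized int size() {
    return deque.size() + numOfMsg;
  }
}
```

Because size() takes the same monitor as addBundle(), add() and poll(), a reader can never observe the moment between removing a bundle and filling the deque, which is the kind of window that would print a deque size of 0 while messages are still pending.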





[jira] [Commented] (HAMA-983) Hama runner for DataFlow

2016-12-07 Thread Edward J. Yoon (JIRA)

[ 
https://issues.apache.org/jira/browse/HAMA-983?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15730744#comment-15730744
 ] 

Edward J. Yoon commented on HAMA-983:
-

cool, let me check.




--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HAMA-983) Hama runner for DataFlow

2016-12-07 Thread Edward J. Yoon (JIRA)

[ 
https://issues.apache.org/jira/browse/HAMA-983?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15730450#comment-15730450
 ] 

Edward J. Yoon commented on HAMA-983:
-

Here's my skeleton code, with an example that counts words. You should 
implement the HamaPipelineRunner: just translate the pipeline and execute it 
as a batch job. I think you can find out how to translate the transforms from 
Flink's code: 
https://github.com/dataArtisans/flink-dataflow/blob/aad5d936abd41240f3e15d294ea181fb9cca05e0/runner/src/main/java/com/dataartisans/flink/dataflow/translation/FlinkBatchTransformTranslators.java#L410

{code}
public class WordCountTest {

  static final String[] WORDS_ARRAY = new String[] { "hi there", "hi",
      "hi sue bob", "hi sue", "", "bob hi" };

  static final List<String> WORDS = Arrays.asList(WORDS_ARRAY);

  static final String[] COUNTS_ARRAY = new String[] { "hi: 5", "there: 1",
      "sue: 2", "bob: 2" };

  /**
   * Example test that tests a PTransform by using an in-memory input and
   * inspecting the output.
   */
  @Test
  @Category(RunnableOnService.class)
  public void testCountWords() throws Exception {
    HamaOptions options = PipelineOptionsFactory.as(HamaOptions.class);
    options.setRunner(HamaPipelineRunner.class);
    Pipeline p = Pipeline.create(options);

    PCollection<String> input = p.apply(Create.of(WORDS).withCoder(
        StringUtf8Coder.of()));

    PCollection<String> output = input
        .apply(new WordCount())
        .apply(MapElements.via(new FormatAsTextFn()));
    //.apply(TextIO.Write.to("/tmp/result"));

    PAssert.that(output).containsInAnyOrder(COUNTS_ARRAY);
    p.run().waitUntilFinish();
  }

  public static class WordCount extends
      PTransform<PCollection<String>, PCollection<KV<String, Long>>> {

    private static final long serialVersionUID = 1L;

    @Override
    public PCollection<KV<String, Long>> apply(PCollection<String> lines) {

      // Convert lines of text into individual words.
      PCollection<String> words = lines.apply(ParDo.of(
          new DoFn<String, String>() {
            private static final long serialVersionUID = 1L;
            private final Aggregator<Long, Long> emptyLines =
                createAggregator("emptyLines", new Sum.SumLongFn());

            @ProcessElement
            public void processElement(ProcessContext c) {
              if (c.element().trim().isEmpty()) {
                emptyLines.addValue(1L);
              }

              // Split the line into words.
              String[] words = c.element().split("[^a-zA-Z']+");

              // Output each word encountered into the output PCollection.
              for (String word : words) {
                if (!word.isEmpty()) {
                  c.output(word);
                }
              }
            }
          }));

      // Count the number of times each word occurs.
      PCollection<KV<String, Long>> wordCounts = words.apply(Count
          .<String> perElement());

      return wordCounts;
    }
  }

  // TODO
  public static class HamaPipelineRunner extends
      PipelineRunner<HamaPipelineResult> {

    public static HamaPipelineRunner fromOptions(PipelineOptions x) {
      return new HamaPipelineRunner();
    }

    @Override
    public <Output extends POutput, Input extends PInput> Output apply(
        PTransform<Input, Output> transform, Input input) {
      return super.apply(transform, input);
    }

    @Override
    public HamaPipelineResult run(Pipeline pipeline) {
      // TODO Auto-generated method stub
      System.out.println("Executing pipeline using HamaPipelineRunner.");

      // TODO you need to translate pipeline to Hama program
      // and execute pipeline
      // return the result
      return null;
    }

  }

  public class HamaPipelineResult implements PipelineResult {

    @Override
    public State getState() {
      // TODO Auto-generated method stub
      return null;
    }

    @Override
    public State cancel() throws IOException {
      // TODO Auto-generated method stub
      return null;
    }

    @Override
    public State waitUntilFinish(Duration duration) {
      // TODO Auto-generated method stub
      return null;
    }

    @Override
    public State waitUntilFinish() {
      // TODO Auto-generated method stub
      return null;
    }

    @Override
    public <T> AggregatorValues<T> getAggregatorValues(
        Aggregator<?, T> aggregator) throws AggregatorRetrievalException {
      // TODO Auto-generated method stub
      return null;
    }

    @Override
    public MetricResults metrics() {
      // TODO Auto-generated method stub
      return null;
    }

  }

  public static interface HamaOptions extends PipelineOptions {

  }

}
{code}


[jira] [Created] (HAMA-997) Docker-compose for Hama Cluster

2016-11-13 Thread Edward J. Yoon (JIRA)
Edward J. Yoon created HAMA-997:
---

 Summary: Docker-compose for Hama Cluster
 Key: HAMA-997
 URL: https://issues.apache.org/jira/browse/HAMA-997
 Project: Hama
  Issue Type: Task
  Components: build , documentation 
Affects Versions: 0.7.1
Reporter: Edward J. Yoon
Assignee: Edward J. Yoon
 Fix For: 0.7.2


The current Dockerfile doesn't work correctly. Each service (e.g., the master 
and the groom servers) should have its own Dockerfile and be launched using 
docker-compose.





[jira] [Commented] (HAMA-983) Hama runner for DataFlow

2016-09-18 Thread Edward J. Yoon (JIRA)

[ 
https://issues.apache.org/jira/browse/HAMA-983?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15502033#comment-15502033
 ] 

Edward J. Yoon commented on HAMA-983:
-

>> once PoC is done

Great. If you need any help, feel free to let me know :-)



[jira] [Commented] (HAMA-983) Hama runner for DataFlow

2016-09-18 Thread Edward J. Yoon (JIRA)

[ 
https://issues.apache.org/jira/browse/HAMA-983?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15502017#comment-15502017
 ] 

Edward J. Yoon commented on HAMA-983:
-

Why don't we contribute this feature to Apache Beam directly? 
https://github.com/apache/incubator-beam/tree/master/runners



[jira] [Commented] (HAMA-983) Hama runner for DataFlow

2016-09-18 Thread Edward J. Yoon (JIRA)

[ 
https://issues.apache.org/jira/browse/HAMA-983?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15501973#comment-15501973
 ] 

Edward J. Yoon commented on HAMA-983:
-

https://cloud.google.com/dataflow/examples/wordcount-example

This page describes the Beam concepts well. The flow is as follows:

{code}
Creating the Pipeline
Applying transforms to the Pipeline
Reading input (in this example: reading text files)
Applying ParDo transforms
Applying SDK-provided transforms (in this example: Count)
Writing output (in this example: writing to Google Cloud Storage)
Running the Pipeline
{code}

Once we have created the Hama pipeline runner, we should be able to run a 
program like the one below:

{code}
  public static void main(String[] args) {
    // Create a pipeline parameterized by command-line flags.
    Pipeline p = Pipeline.create(PipelineOptionsFactory.fromArgs(args).create());

    p.apply(TextIO.Read.from("gs://..."))   // Read input.
     .apply(new CountWords())               // Do some processing.
     .apply(TextIO.Write.to("gs://..."));   // Write output.

    // Run the pipeline.
    p.run();
  }
{code}

For I/O operations, you can refer to 
https://github.com/apache/incubator-beam/blob/master/runners/spark/src/main/java/org/apache/beam/runners/spark/io/hadoop/HadoopIO.java
 (instead of org.apache.hadoop.mapreduce.lib.input.FileInputFormat, you should 
use 
https://github.com/apache/hama/blob/master/core/src/main/java/org/apache/hama/bsp/FileInputFormat.java)

{quote}BSP for dataflow could be similar to SuperstepBSP{quote}

I think so. GroupByKey seems to be a built-in transform that groups records by 
key. We should implement it using a superstep.
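
To make the GroupByKey idea concrete, here is a rough sketch in plain Java that only simulates the superstep, without any Hama or Beam classes. The class SuperstepGroupByKey, the kv helper and the in-memory inboxes are all hypothetical stand-ins: adding to an inbox plays the role of sending a message to another peer, and the gap between the two loops plays the role of the barrier synchronization.

```java
import java.util.AbstractMap.SimpleEntry;
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Hypothetical simulation of a GroupByKey superstep; no Hama or Beam API is used.
class SuperstepGroupByKey {

  // Small helper to build a key/value record.
  static Map.Entry<String, Long> kv(String key, long value) {
    return new SimpleEntry<>(key, value);
  }

  // peersInput.get(i) holds the records that are local to peer i.
  // The result holds, per peer, the values grouped by the keys that peer owns.
  static List<Map<String, List<Long>>> groupByKey(
      List<List<Map.Entry<String, Long>>> peersInput) {
    int numPeers = peersInput.size();
    List<List<Map.Entry<String, Long>>> inboxes = new ArrayList<>();
    for (int i = 0; i < numPeers; i++) {
      inboxes.add(new ArrayList<>());
    }

    // Before the barrier: every peer "sends" each record to the peer that
    // owns the record's key (hash partitioning).
    for (List<Map.Entry<String, Long>> local : peersInput) {
      for (Map.Entry<String, Long> record : local) {
        int owner = Math.floorMod(record.getKey().hashCode(), numPeers);
        inboxes.get(owner).add(record);
      }
    }

    // (A real BSP job would synchronize at the barrier here.)

    // After the barrier: every peer groups the records it received by key.
    List<Map<String, List<Long>>> grouped = new ArrayList<>();
    for (List<Map.Entry<String, Long>> inbox : inboxes) {
      Map<String, List<Long>> byKey = new TreeMap<>();
      for (Map.Entry<String, Long> record : inbox) {
        byKey.computeIfAbsent(record.getKey(), k -> new ArrayList<>())
            .add(record.getValue());
      }
      grouped.add(byKey);
    }
    return grouped;
  }
}
```

The point of the sketch is that one exchange of messages is enough: after the barrier, all values for a given key sit on exactly one peer, so the grouping itself is purely local.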







[jira] [Commented] (HAMA-983) Hama runner for DataFlow

2016-08-31 Thread Edward J. Yoon (JIRA)

[ 
https://issues.apache.org/jira/browse/HAMA-983?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15454239#comment-15454239
 ] 

Edward J. Yoon commented on HAMA-983:
-

Just FYI, Apache Beam's basic example is word count. I guess the batch mode 
can be similar to org.apache.hama.examples.PiEstimator: (n - 1) tasks parse 
and count the words, and 1 task aggregates the word counts and emits the final 
result. I'm not sure about the streaming mode, so you'll need to check how it 
handles I/O.
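
The batch layout described above can be sketched in plain Java as a simulation. Nothing below is Hama or Beam code: the class MasterWorkerWordCount and its run method are hypothetical, the round-robin assignment of lines to tasks is an assumption made for the example, and the list masterInbox stands in for the messages that the (n - 1) counting tasks would send to the aggregating task.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Hypothetical simulation: (n - 1) tasks count words, 1 task aggregates.
class MasterWorkerWordCount {

  static Map<String, Long> run(List<String> lines, int numTasks) {
    if (numTasks < 2) {
      throw new IllegalArgumentException("need at least one worker and one master");
    }
    int workers = numTasks - 1;

    // Partial counts "sent" by the workers to the master task.
    List<Map<String, Long>> masterInbox = new ArrayList<>();
    for (int task = 0; task < workers; task++) {
      Map<String, Long> partial = new HashMap<>();
      // Assumed input assignment: lines are dealt out round-robin.
      for (int i = task; i < lines.size(); i += workers) {
        for (String word : lines.get(i).split("[^a-zA-Z']+")) {
          if (!word.isEmpty()) {
            partial.merge(word, 1L, Long::sum);
          }
        }
      }
      masterInbox.add(partial);
    }

    // (A real BSP job would synchronize at the barrier here.)

    // The master task merges the partial counts into the final result.
    Map<String, Long> total = new TreeMap<>();
    for (Map<String, Long> partial : masterInbox) {
      partial.forEach((word, count) -> total.merge(word, count, Long::sum));
    }
    return total;
  }
}
```

With the six input lines used in the skeleton test ("hi there", "hi", "hi sue bob", "hi sue", "", "bob hi") and three tasks, the merged counts come out as hi: 5, there: 1, sue: 2 and bob: 2.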



[jira] [Commented] (HAMA-983) Hama runner for DataFlow

2016-08-30 Thread Edward J. Yoon (JIRA)

[ 
https://issues.apache.org/jira/browse/HAMA-983?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15451100#comment-15451100
 ] 

Edward J. Yoon commented on HAMA-983:
-

Hi, I haven't looked at Dataflow (Apache Beam) closely, but:

>> Do you mean that each superstep can be executed in data pipeline as a 
>> pcollection? 

I guess so, or it can be executed as a single job, as the case may be.

If you're interested in working on this, you can refer to 
https://github.com/dataArtisans/flink-dataflow/blob/master/runner/src/main/java/com/dataartisans/flink/dataflow/FlinkPipelineRunner.java

Before we do this, I guess HAMA-940 and a data processing BSP should come 
first. Please feel free to share your opinion and contribute patches. :-)

If you have any questions, let me know.



[jira] [Commented] (HAMA-991) Add math classes for float16/float32

2016-08-25 Thread Edward J. Yoon (JIRA)

[ 
https://issues.apache.org/jira/browse/HAMA-991?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15438496#comment-15438496
 ] 

Edward J. Yoon commented on HAMA-991:
-

NOTE: float16 is not implemented yet.

> Add math classes for float16/float32
> 
>
> Key: HAMA-991
> URL: https://issues.apache.org/jira/browse/HAMA-991
> Project: Hama
>  Issue Type: New Feature
>  Components: math
>Affects Versions: 0.7.1
>Reporter: Edward J. Yoon
>Assignee: Edward J. Yoon
> Fix For: 0.7.2
>
>
> Implement Float32Writable, Vector, and Matrix etc. 





[jira] [Resolved] (HAMA-988) Allow to add additional no-input tasks as number user want

2016-08-25 Thread Edward J. Yoon (JIRA)

 [ 
https://issues.apache.org/jira/browse/HAMA-988?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Edward J. Yoon resolved HAMA-988.
-
Resolution: Fixed

solved

> Allow to add additional no-input tasks as number user want
> --
>
> Key: HAMA-988
> URL: https://issues.apache.org/jira/browse/HAMA-988
> Project: Hama
>  Issue Type: Improvement
>  Components: bsp core
>Affects Versions: 0.7.1
>Reporter: Edward J. Yoon
>Assignee: Edward J. Yoon
> Fix For: 0.7.2
>
>
> The BSP framework basically launches as many tasks as there are splits. 
> Force-setting the number of tasks is also possible by setting 
> "hama.force.set.bsp.tasks" to true.
> However, there's no way to add extra tasks on top of the number of 
> splits. For example, if the input has 5 splits, I want to launch 6 tasks (1 
> more no-input task to act as a master). 
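
The requested behaviour can be illustrated with a small planning function. This is only a sketch of the idea, not the change that resolved the issue: the class TaskPlanSketch, the method planTasks and the NO_INPUT marker are hypothetical names, and no Hama configuration key or API is implied.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical illustration of "tasks = splits + extra no-input tasks".
class TaskPlanSketch {

  // Marker for a task that has no input split assigned.
  static final int NO_INPUT = -1;

  // Returns, for each task id, the index of the split it reads,
  // or NO_INPUT for the additional tasks (e.g., a master task).
  static List<Integer> planTasks(int numSplits, int extraNoInputTasks) {
    if (numSplits < 0 || extraNoInputTasks < 0) {
      throw new IllegalArgumentException("counts must not be negative");
    }
    List<Integer> plan = new ArrayList<>();
    for (int split = 0; split < numSplits; split++) {
      plan.add(split); // one task per input split
    }
    for (int extra = 0; extra < extraNoInputTasks; extra++) {
      plan.add(NO_INPUT); // additional task without input
    }
    return plan;
  }
}
```

For the example in the description, planTasks(5, 1) yields six tasks: tasks 0 to 4 each read one split, and task 5 has no input and could act as the master.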





[jira] [Commented] (HAMA-996) Delete meaningless parameter

2016-08-25 Thread Edward J. Yoon (JIRA)

[ 
https://issues.apache.org/jira/browse/HAMA-996?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15438170#comment-15438170
 ] 

Edward J. Yoon commented on HAMA-996:
-

I think TaskInProgress.getRecoveryTask() is related to task recovery. If the 
fault-tolerance (ft) service is enabled, the framework checkpoints statuses 
periodically. When tasks fail or crash, the framework recovers them from the 
previous checkpoint automatically. That seems to be the role of 
GroomServer.startRecoveryTask() and AsyncRcvdMsgCheckpointImpl.restartTask().

If TaskInProgress.getRecoveryTask() is useless code, we can remove it or mark 
it @Deprecated with some comments.



> Delete meaningless parameter
> 
>
> Key: HAMA-996
> URL: https://issues.apache.org/jira/browse/HAMA-996
> Project: Hama
>  Issue Type: Improvement
>Reporter: JongYoon Lim
>Priority: Trivial
> Attachments: HAMA-996.patch
>
>
> It seems that *taskid* param from *getGroomToSchedule()* of *TaskInProgress* 
> is not essential for this function. 





[jira] [Commented] (HAMA-799) Add a new BSP API that uses multiple threads

2016-08-25 Thread Edward J. Yoon (JIRA)

[ 
https://issues.apache.org/jira/browse/HAMA-799?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15438151#comment-15438151
 ] 

Edward J. Yoon commented on HAMA-799:
-

Hi,

I originally thought that we could add something like 
https://hadoop.apache.org/docs/r2.4.1/api/org/apache/hadoop/mapreduce/lib/map/MultithreadedMapper.html
 with the goal of supporting an easy-to-use multithreading API within BSP. But 
our case may differ slightly.

In the MapReduce case, the map(K, V) function processes the K, V of each line 
of a chunk of data sequentially (as you might already know). The 
MultithreadedMapper processes lines concurrently and generates intermediate 
files. 

The BSP model is more flexible. We can implement a MapReduce framework on the 
BSP model like below:

{code}
bsp(BSPPeer peer) {
  while (peer.readNext(key, value)) {
    map(key, value); // calls user-defined map function.
  }
  ...
}
{code}

Then, the MultithreadedMapper would look just like this:

{code}
bsp(BSPPeer peer) {
  while (peer.readNext(key, value)) {
    // executes map function concurrently.
    executor.execute(new MultithreadedMapper(key, value));
  }
  ...
}
{code}

After the while loop, the two approaches above will produce the same result 
but with different performance.

The BSP model is slightly different. The threads need to share the incoming 
and outgoing queues. Otherwise, it's just the same as increasing the number of 
bsp tasks (which is meaningless). So, multithreading should be used only to 
parallelize some sequential part of the computation, not the whole bsp() 
function. For example, 

{code}
bsp() {
   ...
   for(int i = 0; i < 1000; i++) {
  ... // this part can be multi-threaded.
   }
   ...
}
{code}
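
As a self-contained illustration of parallelizing only the inner loop, here is a minimal sketch in plain Java. It is not Hama code: the class ParallelLoopSketch and its computeSquares method are hypothetical, squaring a number stands in for the real per-item computation, and a ConcurrentLinkedQueue stands in for the outgoing message queue that all threads share.

```java
import java.util.Queue;
import java.util.concurrent.ConcurrentLinkedQueue;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;

// Hypothetical sketch: only the loop body runs on multiple threads.
class ParallelLoopSketch {

  static Queue<Integer> computeSquares(int n, int threads) {
    // Thread-safe queue shared by all worker threads
    // (stand-in for a shared outgoing message queue).
    Queue<Integer> outgoing = new ConcurrentLinkedQueue<>();
    ExecutorService executor = Executors.newFixedThreadPool(threads);

    for (int i = 0; i < n; i++) {
      final int item = i;
      // Each iteration becomes one small task.
      executor.execute(() -> outgoing.add(item * item));
    }

    // Wait for all iterations to finish before continuing sequentially.
    executor.shutdown();
    try {
      if (!executor.awaitTermination(60, TimeUnit.SECONDS)) {
        throw new IllegalStateException("workers did not finish in time");
      }
    } catch (InterruptedException e) {
      Thread.currentThread().interrupt();
      throw new IllegalStateException(e);
    }
    return outgoing;
  }
}
```

A fixed-size pool is used here only to keep the sketch small. The order of the results in the queue is not deterministic, so code that consumes them must not rely on it.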

In GraphJobRunner, I used multithreading like below:

{code}
  private void doSuperstep(GraphJobMessage currentMessage,
      BSPPeer<Writable, Writable, Writable, Writable, GraphJobMessage> peer)
      throws IOException {
    this.errorCount.set(0);
    long startTime = System.currentTimeMillis();

    this.changedVertexCnt = 0;
    vertices.startSuperstep();

    ThreadPoolExecutor executor = (ThreadPoolExecutor) Executors
        .newCachedThreadPool();
    executor.setMaximumPoolSize(conf.getInt(DEFAULT_THREAD_POOL_SIZE, 64));
    executor.setRejectedExecutionHandler(retryHandler);

    long loopStartTime = System.currentTimeMillis();
    while (currentMessage != null) {
      executor.execute(new ComputeRunnable(currentMessage));

      currentMessage = peer.getCurrentMessage();
    }
    LOG.info("Total time spent for superstep-" + peer.getSuperstepCount()
        + " looping: " + (System.currentTimeMillis() - loopStartTime) + " ms");

    executor.shutdown();
    try {
      executor.awaitTermination(60, TimeUnit.SECONDS);
    } catch (InterruptedException e) {
      throw new IOException(e);
    }

    if (errorCount.get() > 0) {
      throw new IOException("there were " + errorCount
          + " exceptions during compute vertices.");
    }

    Iterator it = vertices.iterator();
    while (it.hasNext()) {
      Vertex vertex = (Vertex) it.next();
      if (!vertex.isHalted() && !vertex.isComputed()) {
        vertex.compute(Collections.<M> emptyList());
        vertices.finishVertexComputation(vertex);
      }
    }

    getAggregationRunner().sendAggregatorValues(peer,
        vertices.getActiveVerticesNum(), this.changedVertexCnt);
    this.iteration++;

    LOG.info("Total time spent for superstep-" + peer.getSuperstepCount()
        + " computing vertices: " + (System.currentTimeMillis() - startTime)
        + " ms");

    startTime = System.currentTimeMillis();
    finishSuperstep();
    LOG.info("Total time spent for superstep-" + peer.getSuperstepCount()
        + " synchronizing: " + (System.currentTimeMillis() - startTime)
        + " ms");
  }
{code}

If there's a more elegant way to use multithreading in the bsp() function, we 
can do it. Otherwise, we should close this issue.

> Add a new BSP API that uses multiple threads
> 
>
> Key: HAMA-799
> URL: https://issues.apache.org/jira/browse/HAMA-799
> Project: Hama
>  Issue Type: New Feature
>  Components: bsp core
>Reporter: Edward J. Yoon
>Assignee: Edward J. Yoon
>
> Add a new (additional) BSP API that uses multiple threads, called 
> MultithreadedBSP. This could help speed up highly CPU-intensive 
> tasks.
> Also, I personally would like to redesign the GraphJobRunner based on this 
> MultithreadedBSP, because computing one vertex at a time is a cause of slow 
> performance.





[jira] [Commented] (HAMA-996) Delete meaningless parameter

2016-08-25 Thread Edward J. Yoon (JIRA)

[ 
https://issues.apache.org/jira/browse/HAMA-996?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15436419#comment-15436419
 ] 

Edward J. Yoon commented on HAMA-996:
-

It looks like the getGroomToSchedule() method is useless.



[jira] [Commented] (HAMA-994) Support GPU for math operations

2016-08-24 Thread Edward J. Yoon (JIRA)

[ 
https://issues.apache.org/jira/browse/HAMA-994?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15436238#comment-15436238
 ] 

Edward J. Yoon commented on HAMA-994:
-

Status: I checked the license of aparapi, and it's not suitable for an Apache 
project (see https://github.com/aparapi/aparapi/issues/37 ). If AMD's official 
reply is the same, I'll look at other options.

> Support GPU for math operations
> ---
>
> Key: HAMA-994
> URL: https://issues.apache.org/jira/browse/HAMA-994
> Project: Hama
>  Issue Type: New Feature
>Affects Versions: 0.7.1
>Reporter: Edward J. Yoon
>Assignee: Edward J. Yoon
> Fix For: 0.7.2
>
>
> Support GPU for matrix/vector operations using aparapi.





[jira] [Created] (HAMA-994) Support GPU for math operations

2016-08-09 Thread Edward J. Yoon (JIRA)
Edward J. Yoon created HAMA-994:
---

 Summary: Support GPU for math operations
 Key: HAMA-994
 URL: https://issues.apache.org/jira/browse/HAMA-994
 Project: Hama
  Issue Type: New Feature
Affects Versions: 0.7.1
Reporter: Edward J. Yoon
Assignee: Edward J. Yoon
 Fix For: 0.7.2


Support GPU for matrix/vector operations using aparapi.





[jira] [Commented] (HAMA-993) HAMA Cluster is not running pi example

2016-07-29 Thread Edward J. Yoon (JIRA)

[ 
https://issues.apache.org/jira/browse/HAMA-993?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15398974#comment-15398974
 ] 

Edward J. Yoon commented on HAMA-993:
-

Hi,

Can you please provide your error logs?

> HAMA Cluster is not running pi example
> --
>
> Key: HAMA-993
> URL: https://issues.apache.org/jira/browse/HAMA-993
> Project: Hama
>  Issue Type: Bug
>  Components: examples
>Affects Versions: 0.7.1
>Reporter: Jatinder Goyal
>
> I have set up a Hama cluster of 9 nodes. I have used all the recommended 
> settings given on the site, but when I try to run the pi example on Hama, it 
> gets stuck.





[jira] [Comment Edited] (HAMA-990) GSoC'16: Apache Hama benchmark against Spark and Flink

2016-06-21 Thread Edward J. Yoon (JIRA)

[ 
https://issues.apache.org/jira/browse/HAMA-990?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15341557#comment-15341557
 ] 

Edward J. Yoon edited comment on HAMA-990 at 6/21/16 10:59 AM:
---

Generally looks good! As you already planned, it'd be nice if you could add 
more functions that dump the output and plot 2D charts (gnuplot or Google Chart 
API?). Why don't you create a simple benchmark-tool project on GitHub? That 
would make code review and sharing easier. 



was (Author: udanax):
Generally looks good! As you already planned, it'd be nice if you could add 
more functions that dump the output and plot 2D charts. Why don't you create a 
simple benchmark-tool project on GitHub? That would make code review and 
sharing easier. 


> GSoC'16: Apache Hama benchmark against Spark and Flink
> --
>
> Key: HAMA-990
> URL: https://issues.apache.org/jira/browse/HAMA-990
> Project: Hama
>  Issue Type: Documentation
>Reporter: Behroz Sikander
>Priority: Minor
> Attachments: Benchmark_script.sh, ver1.1_benchmark_script.sh
>
>






[jira] [Commented] (HAMA-990) GSoC'16: Apache Hama benchmark against Spark and Flink

2016-06-21 Thread Edward J. Yoon (JIRA)

[ 
https://issues.apache.org/jira/browse/HAMA-990?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15341557#comment-15341557
 ] 

Edward J. Yoon commented on HAMA-990:
-

Generally looks good! As you already planned, it'd be nice if you could add 
more functions that dump the output and plot 2D charts. Why don't you create a 
simple benchmark-tool project on GitHub? That would make code review and 
sharing easier. 




[jira] [Commented] (HAMA-990) GSoC'16: Apache Hama benchmark against Spark and Flink

2016-06-14 Thread Edward J. Yoon (JIRA)

[ 
https://issues.apache.org/jira/browse/HAMA-990?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15331107#comment-15331107
 ] 

Edward J. Yoon commented on HAMA-990:
-

Sorry for the late review; I'm on a business trip until the 21st :/ By next 
week, it'd be nice if you could write some documentation and share the test 
results with me.



[jira] [Commented] (HAMA-992) Hama streaming

2016-06-01 Thread Edward J. Yoon (JIRA)

[ 
https://issues.apache.org/jira/browse/HAMA-992?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15311757#comment-15311757
 ] 

Edward J. Yoon commented on HAMA-992:
-

Hi, if possible, please attach your BSP Python code here.

Also, what I meant is that you should update BinaryProtocol.py and copy it to 
HDFS again (instead of changing the Hama project).
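
A minimal sketch of the re-upload step, assuming BinaryProtocol.py sits next to 
BSPRunner.py under /tmp/PyStreaming/ on HDFS. The directory is taken from the 
-program path in the reported command; that BinaryProtocol.py lives there is an 
assumption, and build_put_command and upload are hypothetical helper names. 
The sketch relies only on the standard `hdfs dfs -put -f` shell command, which 
overwrites an existing destination file.

```python
import subprocess


def build_put_command(local_file, hdfs_dir):
    """Build the HDFS shell command that overwrites the remote copy of a file."""
    return ["hdfs", "dfs", "-put", "-f", local_file, hdfs_dir]


def upload(local_file, hdfs_dir, dry_run=True):
    """Return the command line; run it as well unless dry_run is set."""
    command = build_put_command(local_file, hdfs_dir)
    if not dry_run:
        # check=True raises CalledProcessError if the copy fails.
        subprocess.run(command, check=True)
    return " ".join(command)


if __name__ == "__main__":
    # Dry run: only prints the command, so no Hadoop installation is needed.
    print(upload("BinaryProtocol.py", "/tmp/PyStreaming/"))
```

The dry-run default makes it easy to check the target path before touching the 
files that the running cluster reads.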


> Hama streaming
> --
>
> Key: HAMA-992
> URL: https://issues.apache.org/jira/browse/HAMA-992
> Project: Hama
>  Issue Type: Question
>  Components: bsp core, pipes
>Affects Versions: 0.7.1
> Environment: RASPBIAN JESSIE
> Full desktop image based on Debian Jessie
>Reporter: Chaitanya
>  Labels: features, github-import, newbie
>
> Hello all,
> I am trying to implement Apache Hama on a Raspberry Pi Model 3 to establish a 
> distributed computing platform for scientific computation. I am trying to run 
> Hama streaming over Hadoop on a single namenode, but I am having some 
> difficulty streaming my Python code. I downloaded the Hama streaming 
> repository from:
> https://github.com/thomasjungblut/HamaStreaming
> I ran the examples and also HelloWorldBSP.py on Hama and they work well. But 
> as soon as I switch to running my own Python code, the job fails.
> I am trying to run the code with the following command:
> hama pipes -streaming true -bspTasks 1 -interpreter python -output /tmp/pystream-out_2/ -program /tmp/PyStreaming/BSPRunner.py -programArgs python.py
> Below is the log file for your reference. I hope you can find time to help me 
> with this minor project:
> 16/06/01 14:48:46 WARN util.NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
> 16/06/01 14:48:49 INFO ipc.Server: Starting Socket Reader #1 for port 61001
> 16/06/01 14:48:49 INFO ipc.Server: IPC Server listener on 61001: starting
> 16/06/01 14:48:49 INFO ipc.Server: IPC Server Responder: starting
> 16/06/01 14:48:49 INFO ipc.Server: IPC Server handler 0 on 61001: starting
> 16/06/01 14:48:49 INFO ipc.Server: IPC Server handler 1 on 61001: starting
> 16/06/01 14:48:49 INFO ipc.Server: IPC Server handler 3 on 61001: starting
> 16/06/01 14:48:49 INFO ipc.Server: IPC Server handler 2 on 61001: starting
> 16/06/01 14:48:49 INFO ipc.Server: IPC Server handler 4 on 61001: starting
> 16/06/01 14:48:49 INFO message.HamaMessageManagerImpl: BSPPeer address:localhost port:61001
> 16/06/01 14:48:51 INFO Configuration.deprecation: mapred.cache.localFiles is deprecated. Instead, use mapreduce.job.cache.local.files
> 16/06/01 14:48:51 INFO sync.ZKSyncClient: Initializing ZK Sync Client
> 16/06/01 14:48:51 INFO sync.ZooKeeperSyncClientImpl: Start connecting to Zookeeper! At localhost/127.0.0.1:61001
> java.lang.NumberFormatException: For input string: "Traceback (most recent call last):"
>   at java.lang.NumberFormatException.forInputString(NumberFormatException.java:65)
>   at java.lang.Integer.parseInt(Integer.java:580)
>   at java.lang.Integer.parseInt(Integer.java:615)
>   at org.apache.hama.pipes.protocol.StreamingProtocol$StreamingUplinkReaderThread.readCommand(StreamingProtocol.java:174)
>   at org.apache.hama.pipes.protocol.UplinkReader.run(UplinkReader.java:106)
> 16/06/01 14:48:52 ERROR protocol.UplinkReader: java.lang.Exception: Bad command code: -2
>   at org.apache.hama.pipes.protocol.UplinkReader.run(UplinkReader.java:174)
> java.util.concurrent.BrokenBarrierException
>   at java.util.concurrent.CyclicBarrier.dowait(CyclicBarrier.java:250)
>   at java.util.concurrent.CyclicBarrier.await(CyclicBarrier.java:362)
>   at org.apache.hama.pipes.protocol.StreamingProtocol.start(StreamingProtocol.java:223)
>   at org.apache.hama.pipes.PipesApplication.start(PipesApplication.java:293)
>   at org.apache.hama.pipes.PipesBSP.setup(PipesBSP.java:43)
>   at org.apache.hama.bsp.BSPTask.runBSP(BSPTask.java:170)
>   at org.apache.hama.bsp.BSPTask.run(BSPTask.java:144)
>   at org.apache.hama.bsp.GroomServer$BSPPeerChild.main(GroomServer.java:1255)
> Exception in thread "pipe-uplink-handler" java.lang.RuntimeException: java.lang.Exception: Bad command code: -2
>   at org.apache.hama.pipes.protocol.UplinkReader.run(UplinkReader.java:182)
> Caused by: java.lang.Exception: Bad command code: -2
>   at org.apache.hama.pipes.protocol.UplinkReader.run(UplinkReader.java:174)
> 16/06/01 14:48:52 ERROR bsp.BSPTask: Error running bsp setup and bsp function.
> java.io.IOException: Stream closed
>   at java.lang.ProcessBuilder$NullOutputStream.write(ProcessBuilder.java:433)
>   at java.io.OutputStream.write(OutputStream.java:116)
>   at java.io.BufferedOutputStream.flushBuffer(BufferedOutputStream.java:82)
>   at 
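
The first exception in the log above reports the input string "Traceback (most 
recent call last):", which suggests that the Python program crashed on start-up 
and that its traceback text reached the stream the Java reader parses as an 
integer command code. The sketch below imitates that failure mode in plain 
Python. It is an illustration under stated assumptions, not Hama's 
implementation: read_first_command, diagnose, and CRASHING_SCRIPT are 
hypothetical names, and merging stderr into stdout is an assumption made for 
the demonstration, not a statement about how Hama pipes wires its streams.

```python
import subprocess
import sys

# Hypothetical stand-in for a user script that fails before speaking the protocol.
CRASHING_SCRIPT = "import does_not_exist_module"


def read_first_command(script):
    """Run `script` in a child interpreter and parse its first output line
    as an integer command code, as a line-based protocol reader might."""
    proc = subprocess.run(
        [sys.executable, "-c", script],
        stdout=subprocess.PIPE,
        stderr=subprocess.STDOUT,  # assumption: errors share the protocol stream
        text=True,
    )
    lines = proc.stdout.splitlines()
    first_line = lines[0] if lines else ""
    return int(first_line)  # raises ValueError on non-numeric text


def diagnose(script):
    """Describe what a reader would see for the given child script."""
    try:
        return f"command code {read_first_command(script)}"
    except ValueError as err:
        return f"not a command code: {err}"


if __name__ == "__main__":
    print(diagnose("print(7)"))
    print(diagnose(CRASHING_SCRIPT))
```

If this reading is right, running the user script directly with the Python 
interpreter, outside Hama, should show the full traceback that the Java side 
only sees the first line of.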

[jira] [Issue Comment Deleted] (HAMA-992) Hama streaming

2016-06-01 Thread Edward J. Yoon (JIRA)

 [ 
https://issues.apache.org/jira/browse/HAMA-992?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Edward J. Yoon updated HAMA-992:

Comment: was deleted

(was: +1. Thanks for your opinion and action.)


[jira] [Commented] (HAMA-992) Hama streaming

2016-06-01 Thread Edward J. Yoon (JIRA)

[ 
https://issues.apache.org/jira/browse/HAMA-992?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15310141#comment-15310141
 ] 

Edward J. Yoon commented on HAMA-992:
-

+1. Thanks for your opinion and action.


[jira] [Commented] (HAMA-992) Hama streaming

2016-06-01 Thread Edward J. Yoon (JIRA)

[ 
https://issues.apache.org/jira/browse/HAMA-992?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15310140#comment-15310140
 ] 

Edward J. Yoon commented on HAMA-992:
-

+1. Thanks for your opinion and action.


[jira] [Commented] (HAMA-992) Hama streaming

2016-06-01 Thread Edward J. Yoon (JIRA)

[ 
https://issues.apache.org/jira/browse/HAMA-992?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15310142#comment-15310142
 ] 

Edward J. Yoon commented on HAMA-992:
-

+1. Thanks for your opinion and action.


[jira] [Commented] (HAMA-992) Hama streaming

2016-06-01 Thread Edward J. Yoon (JIRA)

[ 
https://issues.apache.org/jira/browse/HAMA-992?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15310101#comment-15310101
 ] 

Edward J. Yoon commented on HAMA-992:
-

Thomas, I'm fine either way, but adding it to the Hama repo would be a good idea 
if we can put some effort into improving Python support and attracting Python 
users.

> Hama streaming
> --
>
> Key: HAMA-992
> URL: https://issues.apache.org/jira/browse/HAMA-992
> Project: Hama
>  Issue Type: Question
>  Components: bsp core, pipes
>Affects Versions: 0.7.1
> Environment: RASPBIAN JESSIE
> Full desktop image based on Debian Jessie
>Reporter: Chaitanya
>  Labels: features, github-import, newbie
>
> Hello all,
> I am trying to implement Apache Hama on a Raspberry Pi Model 3 to establish a 
> distributed computing platform for scientific computation. I am trying to run 
> Hama streaming over Hadoop on a single namenode, but I am having some 
> difficulty streaming my Python code. I downloaded the Hama streaming 
> repository from:
> https://github.com/thomasjungblut/HamaStreaming
> I ran the examples and also HelloWorldBSP.py on Hama and they work well. But 
> as soon as I switch to running my own Python code, the job fails.
> I am trying to run the code with the following command:
> hama pipes -streaming true -bspTasks 1 -interpreter python -output /tmp/pystream-out_2/ -program /tmp/PyStreaming/BSPRunner.py -programArgs python.py
> Below is the log file for your reference. I hope you can find time to help me 
> with this minor project:
> 16/06/01 14:48:46 WARN util.NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
> 16/06/01 14:48:49 INFO ipc.Server: Starting Socket Reader #1 for port 61001
> 16/06/01 14:48:49 INFO ipc.Server: IPC Server listener on 61001: starting
> 16/06/01 14:48:49 INFO ipc.Server: IPC Server Responder: starting
> 16/06/01 14:48:49 INFO ipc.Server: IPC Server handler 0 on 61001: starting
> 16/06/01 14:48:49 INFO ipc.Server: IPC Server handler 1 on 61001: starting
> 16/06/01 14:48:49 INFO ipc.Server: IPC Server handler 3 on 61001: starting
> 16/06/01 14:48:49 INFO ipc.Server: IPC Server handler 2 on 61001: starting
> 16/06/01 14:48:49 INFO ipc.Server: IPC Server handler 4 on 61001: starting
> 16/06/01 14:48:49 INFO message.HamaMessageManagerImpl: BSPPeer address:localhost port:61001
> 16/06/01 14:48:51 INFO Configuration.deprecation: mapred.cache.localFiles is deprecated. Instead, use mapreduce.job.cache.local.files
> 16/06/01 14:48:51 INFO sync.ZKSyncClient: Initializing ZK Sync Client
> 16/06/01 14:48:51 INFO sync.ZooKeeperSyncClientImpl: Start connecting to Zookeeper! At localhost/127.0.0.1:61001
> java.lang.NumberFormatException: For input string: "Traceback (most recent call last):"
>   at java.lang.NumberFormatException.forInputString(NumberFormatException.java:65)
>   at java.lang.Integer.parseInt(Integer.java:580)
>   at java.lang.Integer.parseInt(Integer.java:615)
>   at org.apache.hama.pipes.protocol.StreamingProtocol$StreamingUplinkReaderThread.readCommand(StreamingProtocol.java:174)
>   at org.apache.hama.pipes.protocol.UplinkReader.run(UplinkReader.java:106)
> 16/06/01 14:48:52 ERROR protocol.UplinkReader: java.lang.Exception: Bad command code: -2
>   at org.apache.hama.pipes.protocol.UplinkReader.run(UplinkReader.java:174)
> java.util.concurrent.BrokenBarrierException
>   at java.util.concurrent.CyclicBarrier.dowait(CyclicBarrier.java:250)
>   at java.util.concurrent.CyclicBarrier.await(CyclicBarrier.java:362)
>   at org.apache.hama.pipes.protocol.StreamingProtocol.start(StreamingProtocol.java:223)
>   at org.apache.hama.pipes.PipesApplication.start(PipesApplication.java:293)
>   at org.apache.hama.pipes.PipesBSP.setup(PipesBSP.java:43)
>   at org.apache.hama.bsp.BSPTask.runBSP(BSPTask.java:170)
>   at org.apache.hama.bsp.BSPTask.run(BSPTask.java:144)
>   at org.apache.hama.bsp.GroomServer$BSPPeerChild.main(GroomServer.java:1255)
> Exception in thread "pipe-uplink-handler" java.lang.RuntimeException: java.lang.Exception: Bad command code: -2
>   at org.apache.hama.pipes.protocol.UplinkReader.run(UplinkReader.java:182)
> Caused by: java.lang.Exception: Bad command code: -2
>   at org.apache.hama.pipes.protocol.UplinkReader.run(UplinkReader.java:174)
> 16/06/01 14:48:52 ERROR bsp.BSPTask: Error running bsp setup and bsp function.
> java.io.IOException: Stream closed
>   at 
> java.lang.ProcessBuilder$NullOutputStream.write(ProcessBuilder.java:433)
>   at java.io.OutputStream.write(OutputStream.java:116)
>   at 
> java.io.BufferedOutputStream.flushBuffer(BufferedOutputStream.java:82)
>   at 

[jira] [Commented] (HAMA-991) Add math classes for float16/float32

2016-05-24 Thread Edward J. Yoon (JIRA)

[ 
https://issues.apache.org/jira/browse/HAMA-991?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15299374#comment-15299374
 ] 

Edward J. Yoon commented on HAMA-991:
-

I'll push the 32-bit float classes first. Thanks.

> Add math classes for float16/float32
> 
>
> Key: HAMA-991
> URL: https://issues.apache.org/jira/browse/HAMA-991
> Project: Hama
>  Issue Type: New Feature
>  Components: math
>Affects Versions: 0.7.1
>Reporter: Edward J. Yoon
>Assignee: Edward J. Yoon
> Fix For: 0.7.2
>
>
> Implement Float32Writable, Vector, and Matrix etc. 



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Comment Edited] (HAMA-990) GSoC'16: Apache Hama benchmark against Spark and Flink

2016-05-19 Thread Edward J. Yoon (JIRA)

[ 
https://issues.apache.org/jira/browse/HAMA-990?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15292552#comment-15292552
 ] 

Edward J. Yoon edited comment on HAMA-990 at 5/20/16 2:02 AM:
--

Yes.

We assume that there's an existing HAMA/FLINK/SPARK cluster, and that your project 
provides a shell script that auto-produces benchmark results. For example,
 {code}% ${Behroz_project}/bin/run benchmarks [all | kmeans | pagerank | 
others.. ]{code}

If MRQL is good for us and works well, we can leverage it.


was (Author: udanax):
Yes.

We assume that there's an existing HAMA/FLINK/SPARK cluster, and that your project 
provides a shell script that auto-produces benchmark results. For example, 
${Behroz_project}/bin/run benchmarks [all | kmeans | pagerank | others.. ]

If MRQL is good for us and works well, we can leverage it.

> GSoC'16: Apache Hama benchmark against Spark and Flink
> --
>
> Key: HAMA-990
> URL: https://issues.apache.org/jira/browse/HAMA-990
> Project: Hama
>  Issue Type: Documentation
>Reporter: Behroz Sikander
>Priority: Minor
>






[jira] [Commented] (HAMA-990) GSoC'16: Apache Hama benchmark against Spark and Flink

2016-05-19 Thread Edward J. Yoon (JIRA)

[ 
https://issues.apache.org/jira/browse/HAMA-990?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15292552#comment-15292552
 ] 

Edward J. Yoon commented on HAMA-990:
-

Yes.

We assume that there's an existing HAMA/FLINK/SPARK cluster, and that your project 
provides a shell script that auto-produces benchmark results. For example, 
${Behroz_project}/bin/run benchmarks [all | kmeans | pagerank | others.. ]

If MRQL is good for us and works well, we can leverage it.

> GSoC'16: Apache Hama benchmark against Spark and Flink
> --
>
> Key: HAMA-990
> URL: https://issues.apache.org/jira/browse/HAMA-990
> Project: Hama
>  Issue Type: Documentation
>Reporter: Behroz Sikander
>Priority: Minor
>






[jira] [Comment Edited] (HAMA-990) GSoC'16: Apache Hama benchmark against Spark and Flink

2016-05-19 Thread Edward J. Yoon (JIRA)

[ 
https://issues.apache.org/jira/browse/HAMA-990?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15292530#comment-15292530
 ] 

Edward J. Yoon edited comment on HAMA-990 at 5/20/16 1:39 AM:
--

The Hadoop + HAMA + Flink + Spark cluster boot scripts are already on both 
Amazon and Google clouds

https://github.com/GoogleCloudPlatform/bdutil/tree/master/extensions/hama
https://github.com/awslabs/emr-bootstrap-actions/tree/master/hama

So, if we use MRQL, a shell script (that generates some input data, schedules the 
jobs, and collects performance results) will be enough.


was (Author: udanax):
The Hadoop + HAMA + Flink + Spark cluster boot scripts are already on both 
Amazon and Google clouds

https://github.com/GoogleCloudPlatform/bdutil/tree/master/extensions/hama
https://github.com/awslabs/emr-bootstrap-actions/tree/master/hama

So, if we use MRQL, a shell script will be enough.

> GSoC'16: Apache Hama benchmark against Spark and Flink
> --
>
> Key: HAMA-990
> URL: https://issues.apache.org/jira/browse/HAMA-990
> Project: Hama
>  Issue Type: Documentation
>Reporter: Behroz Sikander
>Priority: Minor
>






[jira] [Commented] (HAMA-990) GSoC'16: Apache Hama benchmark against Spark and Flink

2016-05-19 Thread Edward J. Yoon (JIRA)

[ 
https://issues.apache.org/jira/browse/HAMA-990?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15292530#comment-15292530
 ] 

Edward J. Yoon commented on HAMA-990:
-

The Hadoop + HAMA + Flink + Spark cluster boot scripts are already on both 
Amazon and Google clouds

https://github.com/GoogleCloudPlatform/bdutil/tree/master/extensions/hama
https://github.com/awslabs/emr-bootstrap-actions/tree/master/hama

So, if we use MRQL, a shell script will be enough.

> GSoC'16: Apache Hama benchmark against Spark and Flink
> --
>
> Key: HAMA-990
> URL: https://issues.apache.org/jira/browse/HAMA-990
> Project: Hama
>  Issue Type: Documentation
>Reporter: Behroz Sikander
>Priority: Minor
>






[jira] [Comment Edited] (HAMA-990) GSoC'16: Apache Hama benchmark against Spark and Flink

2016-05-19 Thread Edward J. Yoon (JIRA)

[ 
https://issues.apache.org/jira/browse/HAMA-990?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15292508#comment-15292508
 ] 

Edward J. Yoon edited comment on HAMA-990 at 5/20/16 1:23 AM:
--

{code}
According to [1] and [3], Apache Flink is faster than Spark in K-Means, Page 
Rank and Query Processing whereas Spark is faster in Word Count. We can 
reproduce these results in our cluster and then can calculate the results for 
Hama. Once we have all the results we can compare all the systems.
{code}

I think it's a good idea. With this, we may be able to derive insight from the 
results (this should be our goal). I heard that Flink uses its own serialization 
techniques and shows good performance but is unstable. Just FYI, MRQL can also be 
used for K-Means and PageRank.

Regarding the cluster, my current cluster (used for my research) consists of only 
a few high-end machines equipped with GPUs, so it is not well suited to a 
large-scale distributed computing benchmark. If you can write some scripts that 
make it possible to auto-produce benchmark results on clouds such as Amazon or 
Google Cloud, I can help.



was (Author: udanax):
{quote}
According to [1] and [3], Apache Flink is faster than Spark in K-Means, Page 
Rank and Query Processing whereas Spark is faster in Word Count. We can 
reproduce these results in our cluster and then can calculate the results for 
Hama. Once we have all the results we can compare all the systems.
{quote}

I think it's a good idea. With this, we may be able to derive insight from the 
results (this should be our goal). I heard that Flink uses its own serialization 
techniques and shows good performance but is unstable. Just FYI, MRQL can also be 
used for K-Means and PageRank.

Regarding the cluster, my current cluster (used for my research) consists of only 
a few high-end machines equipped with GPUs, so it is not well suited to a 
large-scale distributed computing benchmark. If you can write some scripts that 
make it possible to auto-produce benchmark results on clouds such as Amazon or 
Google Cloud, I can help.


> GSoC'16: Apache Hama benchmark against Spark and Flink
> --
>
> Key: HAMA-990
> URL: https://issues.apache.org/jira/browse/HAMA-990
> Project: Hama
>  Issue Type: Documentation
>Reporter: Behroz Sikander
>Priority: Minor
>






[jira] [Commented] (HAMA-990) GSoC'16: Apache Hama benchmark against Spark and Flink

2016-05-19 Thread Edward J. Yoon (JIRA)

[ 
https://issues.apache.org/jira/browse/HAMA-990?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15292508#comment-15292508
 ] 

Edward J. Yoon commented on HAMA-990:
-

{quote}
According to [1] and [3], Apache Flink is faster than Spark in K-Means, Page 
Rank and Query Processing whereas Spark is faster in Word Count. We can 
reproduce these results in our cluster and then can calculate the results for 
Hama. Once we have all the results we can compare all the systems.
{quote}

I think it's a good idea. With this, we may be able to derive insight from the 
results (this should be our goal). I heard that Flink uses its own serialization 
techniques and shows good performance but is unstable. Just FYI, MRQL can also be 
used for K-Means and PageRank.

Regarding the cluster, my current cluster (used for my research) consists of only 
a few high-end machines equipped with GPUs, so it is not well suited to a 
large-scale distributed computing benchmark. If you can write some scripts that 
make it possible to auto-produce benchmark results on clouds such as Amazon or 
Google Cloud, I can help.


> GSoC'16: Apache Hama benchmark against Spark and Flink
> --
>
> Key: HAMA-990
> URL: https://issues.apache.org/jira/browse/HAMA-990
> Project: Hama
>  Issue Type: Documentation
>Reporter: Behroz Sikander
>Priority: Minor
>






[jira] [Commented] (HAMA-990) GSoC'16: Apache Hama benchmark against Spark and Flink

2016-05-18 Thread Edward J. Yoon (JIRA)

[ 
https://issues.apache.org/jira/browse/HAMA-990?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15288809#comment-15288809
 ] 

Edward J. Yoon commented on HAMA-990:
-

How's your work going? And what is the main goal? I personally recommend that you 
don't spend much time on other trivial bug fixes.

> GSoC'16: Apache Hama benchmark against Spark and Flink
> --
>
> Key: HAMA-990
> URL: https://issues.apache.org/jira/browse/HAMA-990
> Project: Hama
>  Issue Type: Documentation
>Reporter: Behroz Sikander
>Priority: Minor
>






[jira] [Commented] (HAMA-941) Semiclustering Termination

2016-05-16 Thread Edward J. Yoon (JIRA)

[ 
https://issues.apache.org/jira/browse/HAMA-941?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15285684#comment-15285684
 ] 

Edward J. Yoon commented on HAMA-941:
-

Quick comment from Greg Malewicz -- "There are many clustering algorithms. 
Perhaps it's better to start with why you need to group items, and then look at 
papers for an algorithm that has the desired grouping properties."

> Semiclustering Termination
> --
>
> Key: HAMA-941
> URL: https://issues.apache.org/jira/browse/HAMA-941
> Project: Hama
>  Issue Type: Improvement
>  Components: examples, graph
>Reporter: Edward J. Yoon
>Priority: Minor
>
> Currently the Semiclustering example terminates when the number of 
> iterations exceeds the predefined maximum-iteration threshold.
> The app should stop if there are no cluster changes (I guess). Please check 
> and improve it.





[jira] [Commented] (HAMA-941) Semiclustering Termination

2016-05-16 Thread Edward J. Yoon (JIRA)

[ 
https://issues.apache.org/jira/browse/HAMA-941?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15285259#comment-15285259
 ] 

Edward J. Yoon commented on HAMA-941:
-

Sure, I'll check. Greg, the original author, also sits near me. :-)




-- 
Best Regards, Edward J. Yoon


> Semiclustering Termination
> --
>
> Key: HAMA-941
> URL: https://issues.apache.org/jira/browse/HAMA-941
> Project: Hama
>  Issue Type: Improvement
>  Components: examples, graph
>Reporter: Edward J. Yoon
>Priority: Minor
>
> Currently the Semiclustering example terminates when the number of 
> iterations exceeds the predefined maximum-iteration threshold.
> The app should stop if there are no cluster changes (I guess). Please check 
> and improve it.





[jira] [Created] (HAMA-991) Add math classes for float16/float32

2016-05-10 Thread Edward J. Yoon (JIRA)
Edward J. Yoon created HAMA-991:
---

 Summary: Add math classes for float16/float32
 Key: HAMA-991
 URL: https://issues.apache.org/jira/browse/HAMA-991
 Project: Hama
  Issue Type: New Feature
  Components: math
Affects Versions: 0.7.1
Reporter: Edward J. Yoon
Assignee: Edward J. Yoon
 Fix For: 0.7.2


Implement Float32Writable, Vector, and Matrix etc. 





[jira] [Commented] (HAMA-941) Semiclustering Termination

2016-05-06 Thread Edward J. Yoon (JIRA)

[ 
https://issues.apache.org/jira/browse/HAMA-941?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15273903#comment-15273903
 ] 

Edward J. Yoon commented on HAMA-941:
-

Sorry for the slow review; it's the Korean holidays and I'll be back next week. 
Can you please try to find the bug in the implementation? :-)

> Semiclustering Termination
> --
>
> Key: HAMA-941
> URL: https://issues.apache.org/jira/browse/HAMA-941
> Project: Hama
>  Issue Type: Improvement
>  Components: examples, graph
>Reporter: Edward J. Yoon
>Priority: Minor
>
> Currently the Semiclustering example terminates when the number of 
> iterations exceeds the predefined maximum-iteration threshold.
> The app should stop if there are no cluster changes (I guess). Please check 
> and improve it.





[jira] [Commented] (HAMA-940) Add StreamInputFormat

2016-05-01 Thread Edward J. Yoon (JIRA)

[ 
https://issues.apache.org/jira/browse/HAMA-940?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15266013#comment-15266013
 ] 

Edward J. Yoon commented on HAMA-940:
-

If we can hide these implementations and provide simplified APIs for processing 
stream data, I think this way is better.

> Add StreamInputFormat
> -
>
> Key: HAMA-940
> URL: https://issues.apache.org/jira/browse/HAMA-940
> Project: Hama
>  Issue Type: New Feature
>  Components: bsp core
>Reporter: Edward J. Yoon
>
> Add StreamInputFormat that reads newly appended records from previous 
> superstep. 
> I roughly guess it will be possible using reopen() method and file offset.





[jira] [Commented] (HAMA-940) Add StreamInputFormat

2016-05-01 Thread Edward J. Yoon (JIRA)

[ 
https://issues.apache.org/jira/browse/HAMA-940?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15266012#comment-15266012
 ] 

Edward J. Yoon commented on HAMA-940:
-

As I mentioned in the description, we can simply check whether there are newly 
appended records in the input file, keeping the last read offset. 

To implement this, first of all, you should look at the InputFormat interface. 
The tricky issue is how we implement the getSplits() method and handle multiple 
tasks. 

At the moment, my simple idea is that one BSP task acts as a "stream input 
queue", without implementing StreamInputFormat or changing the framework core. 
For example, we set the file path in the job configuration. The master task acts 
like below: 

{code}
if (isMaster(peer.me)) {
  while (true) {
    peer.reopen();      // reopen
    peer.skip(offset);  // jump to last offset
    if (peer.readNext()) {
      // at here we do load-balance.
      sendTo("send a newly appended record to free slave tasks");
    } else {
      Thread.sleep();
    }
  }
}
{code}
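
The polling idea above (reopen, skip to the last offset, read whatever was appended) can be sketched outside of Hama with plain JDK file I/O. This is only a minimal illustration under assumptions: AppendedRecordReader and poll() are hypothetical names, not Hama APIs; records are assumed to be newline-terminated ASCII lines; and a real master task would forward each returned record to a free slave task instead of returning a list.

```java
import java.io.IOException;
import java.io.RandomAccessFile;
import java.util.ArrayList;
import java.util.List;

/** Hypothetical helper (not a Hama class): returns lines appended since the last poll. */
final class AppendedRecordReader {
  private final String path;
  // Byte offset just past the last line that was handed out.
  private long offset = 0L;

  AppendedRecordReader(String path) {
    this.path = path;
  }

  /**
   * Reopens the file, seeks to the saved offset, and reads every line that is
   * currently past it. A partially written last line would be returned as-is,
   * so writers are assumed to append whole lines.
   */
  List<String> poll() throws IOException {
    List<String> records = new ArrayList<>();
    try (RandomAccessFile file = new RandomAccessFile(path, "r")) {
      file.seek(offset);
      String line;
      while ((line = file.readLine()) != null) {
        records.add(line);
        // Remember how far we got so the next poll starts after this line.
        offset = file.getFilePointer();
      }
    }
    return records;
  }
}
```

A caller would invoke poll() in a loop and sleep for some interval whenever the returned list is empty, which mirrors the else branch of the pseudocode.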



> Add StreamInputFormat
> -
>
> Key: HAMA-940
> URL: https://issues.apache.org/jira/browse/HAMA-940
> Project: Hama
>  Issue Type: New Feature
>  Components: bsp core
>Reporter: Edward J. Yoon
>
> Add StreamInputFormat that reads newly appended records from previous 
> superstep. 
> I roughly guess it will be possible using reopen() method and file offset.





[jira] [Commented] (HAMA-941) Semiclustering Termination

2016-05-01 Thread Edward J. Yoon (JIRA)

[ 
https://issues.apache.org/jira/browse/HAMA-941?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15265625#comment-15265625
 ] 

Edward J. Yoon commented on HAMA-941:
-

I just used \{code\} patch copied to clipboard \{code\} tag.

> Semiclustering Termination
> --
>
> Key: HAMA-941
> URL: https://issues.apache.org/jira/browse/HAMA-941
> Project: Hama
>  Issue Type: Improvement
>  Components: examples, graph
>Reporter: Edward J. Yoon
>Priority: Minor
>
> Currently the Semiclustering example terminates when the number of 
> iterations exceeds the predefined maximum-iteration threshold.
> The app should stop if there are no cluster changes (I guess). Please check 
> and improve it.





[jira] [Commented] (HAMA-941) Semiclustering Termination

2016-04-30 Thread Edward J. Yoon (JIRA)

[ 
https://issues.apache.org/jira/browse/HAMA-941?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15265533#comment-15265533
 ] 

Edward J. Yoon commented on HAMA-941:
-

P.S. The initial code can be found at HAMA-594, and I changed a few things 
because it didn't work correctly.

> Semiclustering Termination
> --
>
> Key: HAMA-941
> URL: https://issues.apache.org/jira/browse/HAMA-941
> Project: Hama
>  Issue Type: Improvement
>  Components: examples, graph
>Reporter: Edward J. Yoon
>Priority: Minor
>
> Currently the Semiclustering example terminates when the number of 
> iterations exceeds the predefined maximum-iteration threshold.
> The app should stop if there are no cluster changes (I guess). Please check 
> and improve it.





[jira] [Commented] (HAMA-941) Semiclustering Termination

2016-04-30 Thread Edward J. Yoon (JIRA)

[ 
https://issues.apache.org/jira/browse/HAMA-941?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15265532#comment-15265532
 ] 

Edward J. Yoon commented on HAMA-941:
-

First of all, it looks like the boundary score factor is always 0.0. This is a 
user-defined parameter. Second, if the vertex count is at most 1 (vC <= 1), the 
score should be 1.0. Please apply my patch and test again. Do you see more bugs? 

{code}
diff --git 
a/ml/src/main/java/org/apache/hama/ml/semiclustering/SemiClusteringVertex.java 
b/ml/src/main/java/org/apache/hama/ml/semiclustering/SemiClusteringVertex.java
index 9a905c1..38481fd 100644
--- 
a/ml/src/main/java/org/apache/hama/ml/semiclustering/SemiClusteringVertex.java
+++ 
b/ml/src/main/java/org/apache/hama/ml/semiclustering/SemiClusteringVertex.java
@@ -71,7 +71,7 @@
 candidates.add(msg);
 
 if (!msg.contains(this.getVertexID())
-&& msg.size() == semiClusterMaximumVertexCount) {
+&& msg.size() < semiClusterMaximumVertexCount) {
   SemiClusterMessage msgNew = WritableUtils.clone(msg, this.getConf());
   msgNew.addVertex(this);
   msgNew.setSemiClusterId("C"
@@ -149,14 +149,15 @@
* @return the value to calcualte the Score of a semi-cluster.
*/
   public double semiClusterScoreCalcuation(SemiClusterMessage message) {
-double iC = 0.0, bC = 0.0, fB = 0.0, sC = 0.0;
-int vC = 0, eC = 0;
+// TODO fB is the bounday score factor. This should be configurable by user
+// the default is 0.5
+double iC = 0.0, bC = 0.0, fB = 0.5, sC = 0.0;
+int vC = 0;
 vC = message.size();
 for (Vertex v : message
 .getVertexList()) {
   List> eL = v.getEdges();
   for (Edge e : eL) {
-eC++;
 if (message.contains(e.getDestinationVertexID())
 && e.getValue() != null) {
   iC = iC + e.getValue().get();
@@ -165,8 +166,12 @@
 }
   }
 }
+
 if (vC > 1)
-  sC = ((iC - fB * bC) / ((vC * (vC - 1)) / 2)) / eC;
+  sC = ((iC - fB * bC) / ((vC * (vC - 1)) / 2));
+else
+  sC = 1.0;
+
 return sC;
   }
{code}
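
The score computed by the patched method can be restated as a small standalone function. This is a sketch, not the Hama class itself: SemiClusterScore and its parameter names are hypothetical, and the reading of iC as the total weight of edges inside the cluster and bC as the total weight of edges leaving it is an assumption taken from the semi-clustering definition in the Pregel paper, since the patch context does not show where bC is accumulated.

```java
/** Hypothetical standalone version of the score formula used in the patch. */
final class SemiClusterScore {
  /**
   * @param innerWeight    iC: sum of edge weights with both endpoints in the cluster (assumed meaning)
   * @param boundaryWeight bC: sum of edge weights leaving the cluster (assumed meaning)
   * @param boundaryFactor fB: user-defined boundary score factor, 0.5 by default in the patch
   * @param vertexCount    vC: number of vertices in the semi-cluster
   */
  static double score(double innerWeight, double boundaryWeight,
                      double boundaryFactor, int vertexCount) {
    // A cluster with zero or one vertex has no vertex pairs; the patch defines its score as 1.0.
    if (vertexCount <= 1) {
      return 1.0;
    }
    // Normalize by the number of vertex pairs, vC * (vC - 1) / 2, computed in floating point.
    double vertexPairs = vertexCount * (vertexCount - 1.0) / 2.0;
    return (innerWeight - boundaryFactor * boundaryWeight) / vertexPairs;
  }
}
```

For example, three vertices with inner weight 3.0, boundary weight 2.0 and fB = 0.5 give (3.0 - 0.5 * 2.0) / 3, that is 2/3.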

> Semiclustering Termination
> --
>
> Key: HAMA-941
> URL: https://issues.apache.org/jira/browse/HAMA-941
> Project: Hama
>  Issue Type: Improvement
>  Components: examples, graph
>Reporter: Edward J. Yoon
>Priority: Minor
>
> Currently the Semiclustering example terminates when the number of 
> iterations exceeds the predefined maximum-iteration threshold.
> The app should stop if there are no cluster changes (I guess). Please check 
> and improve it.





[jira] [Resolved] (HAMA-989) Build fails on non-Linux systems

2016-04-28 Thread Edward J. Yoon (JIRA)

 [ 
https://issues.apache.org/jira/browse/HAMA-989?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Edward J. Yoon resolved HAMA-989.
-
Resolution: Fixed
  Assignee: Behroz Sikander

Fixed.

> Build fails on non-Linux systems
> 
>
> Key: HAMA-989
> URL: https://issues.apache.org/jira/browse/HAMA-989
> Project: Hama
>  Issue Type: Bug
>  Components: bsp core, build 
>Affects Versions: 0.7.1
>Reporter: Edward J. Yoon
>Assignee: Behroz Sikander
> Fix For: 0.7.2
>
>
> http://markmail.org/message/ipgc5fjs57xdmtr2





[jira] [Comment Edited] (HAMA-989) Build fails on non-Linux systems

2016-04-27 Thread Edward J. Yoon (JIRA)

[ 
https://issues.apache.org/jira/browse/HAMA-989?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15261553#comment-15261553
 ] 

Edward J. Yoon edited comment on HAMA-989 at 4/28/16 5:14 AM:
--

When you write a commit log, you should follow this format: HAMA-989: your commit 
log
Then Apache infra and GitHub will be linked automatically by issue ID.

Also, you have to squash your changes into one commit before the pull request. 
You can use the rebase command, for example, git rebase -i HEAD~3.

Thanks.



was (Author: udanax):
When you write a commit log, you should follow this format: HAMA-989: commitlog
Then Apache infra and GitHub will be linked automatically by issue ID.

Also, you have to squash your changes into one commit before the pull request. 
You can use the rebase command, for example, git rebase -i HEAD~3.

Thanks.


> Build fails on non-Linux systems
> 
>
> Key: HAMA-989
> URL: https://issues.apache.org/jira/browse/HAMA-989
> Project: Hama
>  Issue Type: Bug
>  Components: bsp core, build 
>Affects Versions: 0.7.1
>Reporter: Edward J. Yoon
> Fix For: 0.7.2
>
>
> http://markmail.org/message/ipgc5fjs57xdmtr2





[jira] [Commented] (HAMA-989) Build fails on non-Linux systems

2016-04-27 Thread Edward J. Yoon (JIRA)

[ 
https://issues.apache.org/jira/browse/HAMA-989?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15261553#comment-15261553
 ] 

Edward J. Yoon commented on HAMA-989:
-

When you write a commit log, you should follow this format: HAMA-989: commitlog
Then Apache infra and GitHub will be linked automatically by issue ID.

Also, you have to squash your changes into one commit before the pull request. 
You can use the rebase command, for example, git rebase -i HEAD~3.

Thanks.


> Build fails on non-Linux systems
> 
>
> Key: HAMA-989
> URL: https://issues.apache.org/jira/browse/HAMA-989
> Project: Hama
>  Issue Type: Bug
>  Components: bsp core, build 
>Affects Versions: 0.7.1
>Reporter: Edward J. Yoon
> Fix For: 0.7.2
>
>
> http://markmail.org/message/ipgc5fjs57xdmtr2





[jira] [Commented] (HAMA-989) Build fails on non-Linux systems

2016-04-25 Thread Edward J. Yoon (JIRA)

[ 
https://issues.apache.org/jira/browse/HAMA-989?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15257343#comment-15257343
 ] 

Edward J. Yoon commented on HAMA-989:
-

We can either catch and ignore the exceptions or use SystemUtils to skip the test on non-Linux systems.

{code}
diff --git 
a/core/src/test/java/org/apache/hama/bsp/message/TestHamaAsyncMessageManager.java
 
b/core/src/test/java/org/apache/hama/bsp/message/TestHamaAsyncMessageManager.java
index f4f89b9..b7bc9c8 100644
--- 
a/core/src/test/java/org/apache/hama/bsp/message/TestHamaAsyncMessageManager.java
+++ 
b/core/src/test/java/org/apache/hama/bsp/message/TestHamaAsyncMessageManager.java
@@ -23,6 +23,7 @@
 
 import junit.framework.TestCase;
 
+import org.apache.commons.lang.SystemUtils;
 import org.apache.hadoop.fs.FileSystem;
 import org.apache.hadoop.io.IntWritable;
 import org.apache.hadoop.io.NullWritable;
@@ -45,10 +46,14 @@
   public static volatile int increment = 1;
 
   public void testMemoryMessaging() throws Exception {
-HamaConfiguration conf = new HamaConfiguration();
-conf.setClass(MessageManager.RECEIVE_QUEUE_TYPE_CLASS, MemoryQueue.class,
-MessageQueue.class);
-messagingInternal(conf);
+if (SystemUtils.IS_OS_LINUX) {
+  HamaConfiguration conf = new HamaConfiguration();
+  conf.setClass(MessageManager.RECEIVE_QUEUE_TYPE_CLASS, MemoryQueue.class,
+  MessageQueue.class);
+  messagingInternal(conf);
+} else {
+  // we skip this test bc AsyncRPC is currently support only linux
+}
   }
 
   private static void messagingInternal(HamaConfiguration conf)
{code}
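
For reference, the same kind of guard can be written with the JDK alone by inspecting the os.name system property. This is a minimal sketch rather than what the patch does: OsCheck is a hypothetical helper name, and matching on a case-insensitive "linux" prefix is an assumption about how os.name is reported.

```java
import java.util.Locale;

/** Hypothetical JDK-only helper for deciding whether a Linux-only test should run. */
final class OsCheck {
  /** Returns true when the given os.name value denotes Linux. */
  static boolean isLinux(String osName) {
    return osName != null && osName.toLowerCase(Locale.ROOT).startsWith("linux");
  }

  /** Checks the operating system of the running JVM. */
  static boolean isLinux() {
    return isLinux(System.getProperty("os.name"));
  }
}
```

A test body would then be wrapped in if (OsCheck.isLinux()) { ... } in the same way the patch uses SystemUtils.IS_OS_LINUX.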

WDYT?

> Build fails on non-Linux systems
> 
>
> Key: HAMA-989
> URL: https://issues.apache.org/jira/browse/HAMA-989
> Project: Hama
>  Issue Type: Bug
>  Components: bsp core, build 
>Affects Versions: 0.7.1
>Reporter: Edward J. Yoon
> Fix For: 0.7.2
>
>
> http://markmail.org/message/ipgc5fjs57xdmtr2





[jira] [Updated] (HAMA-989) Build fails on non-Linux systems

2016-04-25 Thread Edward J. Yoon (JIRA)

 [ 
https://issues.apache.org/jira/browse/HAMA-989?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Edward J. Yoon updated HAMA-989:

Summary: Build fails on non-Linux systems  (was: Build fails on non-Linux 
OS)

> Build fails on non-Linux systems
> 
>
> Key: HAMA-989
> URL: https://issues.apache.org/jira/browse/HAMA-989
> Project: Hama
>  Issue Type: Bug
>  Components: bsp core, build 
>Affects Versions: 0.7.1
>Reporter: Edward J. Yoon
> Fix For: 0.7.2
>
>
> http://markmail.org/message/ipgc5fjs57xdmtr2





[jira] [Created] (HAMA-989) Build fails on non-Linux OS

2016-04-25 Thread Edward J. Yoon (JIRA)
Edward J. Yoon created HAMA-989:
---

 Summary: Build fails on non-Linux OS
 Key: HAMA-989
 URL: https://issues.apache.org/jira/browse/HAMA-989
 Project: Hama
  Issue Type: Bug
  Components: bsp core, build 
Affects Versions: 0.7.1
Reporter: Edward J. Yoon
 Fix For: 0.7.2


http://markmail.org/message/ipgc5fjs57xdmtr2





[jira] [Created] (HAMA-988) Allow to add additional no-input tasks as number user want

2016-04-21 Thread Edward J. Yoon (JIRA)
Edward J. Yoon created HAMA-988:
---

 Summary: Allow to add additional no-input tasks as number user want
 Key: HAMA-988
 URL: https://issues.apache.org/jira/browse/HAMA-988
 Project: Hama
  Issue Type: Improvement
  Components: bsp core
Affects Versions: 0.7.1
Reporter: Edward J. Yoon
Assignee: Edward J. Yoon
 Fix For: 0.7.2


The BSP framework basically launches as many tasks as there are splits. 
Force-setting the number of tasks is also possible by setting 
"hama.force.set.bsp.tasks" to true.

However, there's no way to add extra tasks on top of the number of splits. 
For example, if the input has 5 splits, I want to launch 6 tasks (1 additional 
no-input task acting as a master). 






[jira] [Resolved] (HAMA-984) Support AWS S3 schema in Hadoop 2.6+

2016-03-21 Thread Edward J. Yoon (JIRA)

 [ 
https://issues.apache.org/jira/browse/HAMA-984?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Edward J. Yoon resolved HAMA-984.
-
Resolution: Fixed

I just committed this. Thanks, Cazen!

Please use the TRUNK version in your environments, and feel free to report any 
problems. Thanks.

> Support AWS S3 schema in Hadoop 2.6+
> 
>
> Key: HAMA-984
> URL: https://issues.apache.org/jira/browse/HAMA-984
> Project: Hama
>  Issue Type: Improvement
>  Components: build 
>Reporter: Cazen Lee
>Assignee: Cazen Lee
>
> Hadoop 2.6+ does not include the AWS S3 filesystem by default,
> so an IOException (No FileSystem for scheme) occurs when trying to access S3 
> via the s3 or s3n scheme.
> I know it's not a Hama bug, but fixing it will help Hama users who use AWS 
> S3, because S3 could be used in previous versions (including 1.x) without 
> manual setup. Of course, we could also just document the changes, without 
> modifying any source code.





[jira] [Updated] (HAMA-984) Support AWS S3 schema in Hadoop 2.6+

2016-03-15 Thread Edward J. Yoon (JIRA)

 [ 
https://issues.apache.org/jira/browse/HAMA-984?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Edward J. Yoon updated HAMA-984:

Assignee: Cazen Lee

> Support AWS S3 schema in Hadoop 2.6+
> 
>
> Key: HAMA-984
> URL: https://issues.apache.org/jira/browse/HAMA-984
> Project: Hama
>  Issue Type: Improvement
>  Components: build 
>Reporter: Cazen Lee
>Assignee: Cazen Lee
>
> Hadoop 2.6+ does not include the AWS S3 filesystem by default,
> so an IOException (No FileSystem for scheme) occurs when trying to access S3 
> via the s3 or s3n scheme.
> I know it's not a Hama bug, but fixing it will help Hama users who use AWS 
> S3, because S3 could be used in previous versions (including 1.x) without 
> manual setup. Of course, we could also just document the changes, without 
> modifying any source code.





[jira] [Updated] (HAMA-986) Hashcode calculation

2016-03-15 Thread Edward J. Yoon (JIRA)

 [ 
https://issues.apache.org/jira/browse/HAMA-986?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Edward J. Yoon updated HAMA-986:

Fix Version/s: 0.7.2

> Hashcode calculation 
> -
>
> Key: HAMA-986
> URL: https://issues.apache.org/jira/browse/HAMA-986
> Project: Hama
>  Issue Type: Bug
>  Components: bsp core
>Affects Versions: 0.7.1
>Reporter: JongYoon Lim
>Priority: Trivial
> Fix For: 0.7.2
>
> Attachments: HAMA-986.patch
>
>
> There is a missing value when calculating hashcode of AsyncClient. 





[jira] [Resolved] (HAMA-986) Hashcode calculation

2016-03-15 Thread Edward J. Yoon (JIRA)

 [ 
https://issues.apache.org/jira/browse/HAMA-986?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Edward J. Yoon resolved HAMA-986.
-
Resolution: Fixed
  Assignee: JongYoon Lim

I just committed this! Thanks JongYoon.

> Hashcode calculation 
> -
>
> Key: HAMA-986
> URL: https://issues.apache.org/jira/browse/HAMA-986
> Project: Hama
>  Issue Type: Bug
>  Components: bsp core
>Affects Versions: 0.7.1
>Reporter: JongYoon Lim
>Assignee: JongYoon Lim
>Priority: Trivial
> Fix For: 0.7.2
>
> Attachments: HAMA-986.patch
>
>
> There is a missing value when calculating hashcode of AsyncClient. 





[jira] [Updated] (HAMA-986) Hashcode calculation

2016-03-15 Thread Edward J. Yoon (JIRA)

 [ 
https://issues.apache.org/jira/browse/HAMA-986?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Edward J. Yoon updated HAMA-986:

Affects Version/s: 0.7.1

> Hashcode calculation 
> -
>
> Key: HAMA-986
> URL: https://issues.apache.org/jira/browse/HAMA-986
> Project: Hama
>  Issue Type: Bug
>  Components: bsp core
>Affects Versions: 0.7.1
>Reporter: JongYoon Lim
>Priority: Trivial
> Fix For: 0.7.2
>
> Attachments: HAMA-986.patch
>
>
> There is a missing value when calculating hashcode of AsyncClient. 





[jira] [Resolved] (HAMA-982) Vertex.read/writeState() method throws NullPointerException

2016-03-15 Thread Edward J. Yoon (JIRA)

 [ 
https://issues.apache.org/jira/browse/HAMA-982?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Edward J. Yoon resolved HAMA-982.
-
   Resolution: Fixed
Fix Version/s: (was: 0.7.2)
   0.7.1

Fixed.

> Vertex.read/writeState() method throws NullPointerException
> ---
>
> Key: HAMA-982
> URL: https://issues.apache.org/jira/browse/HAMA-982
> Project: Hama
>  Issue Type: Bug
>  Components: graph
>Affects Versions: 0.7.0
>Reporter: Edward J. Yoon
>Assignee: Edward J. Yoon
> Fix For: 0.7.1
>
>
> It occurs at partitioning and initial supersteps.
> >  at org.apache.hama.graph.GraphJobRunner$Parser.run(GraphJobRunner.java:557)





[jira] [Updated] (HAMA-982) Vertex.read/writeState() method throws NullPointerException

2016-03-15 Thread Edward J. Yoon (JIRA)

 [ 
https://issues.apache.org/jira/browse/HAMA-982?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Edward J. Yoon updated HAMA-982:

Fix Version/s: (was: 0.7.1)
   0.7.2

> Vertex.read/writeState() method throws NullPointerException
> ---
>
> Key: HAMA-982
> URL: https://issues.apache.org/jira/browse/HAMA-982
> Project: Hama
>  Issue Type: Bug
>  Components: graph
>Affects Versions: 0.7.0
>Reporter: Edward J. Yoon
>Assignee: Edward J. Yoon
> Fix For: 0.7.1
>
>
> It occurs at partitioning and initial supersteps.
> >  at org.apache.hama.graph.GraphJobRunner$Parser.run(GraphJobRunner.java:557)





[jira] [Commented] (HAMA-986) Hashcode calculation

2016-03-13 Thread Edward J. Yoon (JIRA)

[ 
https://issues.apache.org/jira/browse/HAMA-986?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15192751#comment-15192751
 ] 

Edward J. Yoon commented on HAMA-986:
-

Thanks for your contribution! Since we're currently in the release process, I 
can commit it in a few days. 

> Hashcode calculation 
> -
>
> Key: HAMA-986
> URL: https://issues.apache.org/jira/browse/HAMA-986
> Project: Hama
>  Issue Type: Bug
>  Components: bsp core
>Reporter: JongYoon Lim
>Priority: Trivial
> Attachments: HAMA-986.patch
>
>
> There is a missing value when calculating hashcode of AsyncClient. 





[jira] [Created] (HAMA-985) Update git scm provider dependency

2016-03-06 Thread Edward J. Yoon (JIRA)
Edward J. Yoon created HAMA-985:
---

 Summary: Update git scm provider dependency
 Key: HAMA-985
 URL: https://issues.apache.org/jira/browse/HAMA-985
 Project: Hama
  Issue Type: Bug
  Components: build 
Reporter: Edward J. Yoon


Symptom: mvn release:prepare or release:perform does not commit changes to pom.xml.





[jira] [Commented] (HAMA-984) Support AWS S3 schema in Hadoop 2.6+

2016-03-02 Thread Edward J. Yoon (JIRA)

[ 
https://issues.apache.org/jira/browse/HAMA-984?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15177081#comment-15177081
 ] 

Edward J. Yoon commented on HAMA-984:
-

Hi, the hadoop-auth package is used for security/auth in the o.a.h.ipc package 
and the YARN module.

> Support AWS S3 schema in Hadoop 2.6+
> 
>
> Key: HAMA-984
> URL: https://issues.apache.org/jira/browse/HAMA-984
> Project: Hama
>  Issue Type: Improvement
>  Components: build 
>Reporter: Cazen Lee
>
> Hadoop 2.6+ does not include the AWS S3 filesystems by default.
> So an IOException (No FileSystem for scheme) occurs when trying to access S3 
> via the s3 or s3n scheme.
> I know it's not a Hama bug, but a fix would help Hama users who use AWS S3, 
> because S3 worked in previous versions (including 1.x) without manual 
> setup. Of course, we could instead just document the required changes, 
> without modifying any source code.





[jira] [Commented] (HAMA-984) Support AWS S3 schema in Hadoop 2.6+

2016-02-27 Thread Edward J. Yoon (JIRA)

[ 
https://issues.apache.org/jira/browse/HAMA-984?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15170592#comment-15170592
 ] 

Edward J. Yoon commented on HAMA-984:
-

Thanks for the pull request. I can check it next week.

> Support AWS S3 schema in Hadoop 2.6+
> 
>
> Key: HAMA-984
> URL: https://issues.apache.org/jira/browse/HAMA-984
> Project: Hama
>  Issue Type: Improvement
>  Components: build 
>Reporter: Cazen Lee
>
> Hadoop 2.6+ does not include the AWS S3 filesystems by default.
> So an IOException (No FileSystem for scheme) occurs when trying to access S3 
> via the s3 or s3n scheme.
> I know it's not a Hama bug, but a fix would help Hama users who use AWS S3, 
> because S3 worked in previous versions (including 1.x) without manual 
> setup. Of course, we could instead just document the required changes, 
> without modifying any source code.





[jira] [Updated] (HAMA-983) Hama runner for DataFlow

2016-02-16 Thread Edward J. Yoon (JIRA)

 [ 
https://issues.apache.org/jira/browse/HAMA-983?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Edward J. Yoon updated HAMA-983:

Labels: gsoc2016  (was: )

> Hama runner for DataFlow
> 
>
> Key: HAMA-983
> URL: https://issues.apache.org/jira/browse/HAMA-983
> Project: Hama
>  Issue Type: Bug
>Reporter: Edward J. Yoon
>  Labels: gsoc2016
>
> As you already know, Apache Beam provides a unified programming model for both 
> batch and streaming inputs.
> The APIs are generally associated with data filtering and transforming. So 
> we'll need to implement a data processing runner like 
> https://github.com/dapurv5/MapReduce-BSP-Adapter/blob/master/src/main/java/org/apache/hama/mapreduce/examples/WordCount.java
> Also, implementing similarity join could be fun. According to 
> http://www.ruizhang.info/publications/TPDS2015-Heads_Join.pdf, Apache Hama is 
> the clear winner over Apache Hadoop and Apache Spark.
> Since it consists of transformation, aggregation, and partition computations, 
> I think it's possible to implement it using the Apache Beam APIs.





[jira] [Created] (HAMA-983) Hama runner for DataFlow

2016-02-14 Thread Edward J. Yoon (JIRA)
Edward J. Yoon created HAMA-983:
---

 Summary: Hama runner for DataFlow
 Key: HAMA-983
 URL: https://issues.apache.org/jira/browse/HAMA-983
 Project: Hama
  Issue Type: Bug
Reporter: Edward J. Yoon


As you already know, Apache Beam provides a unified programming model for both 
batch and streaming inputs.

The APIs are generally associated with data filtering and transforming. So 
we'll need to implement a data processing runner like 
https://github.com/dapurv5/MapReduce-BSP-Adapter/blob/master/src/main/java/org/apache/hama/mapreduce/examples/WordCount.java

Also, implementing similarity join could be fun. According to 
http://www.ruizhang.info/publications/TPDS2015-Heads_Join.pdf, Apache Hama is 
the clear winner over Apache Hadoop and Apache Spark.

Since it consists of transformation, aggregation, and partition computations, I 
think it's possible to implement it using the Apache Beam APIs.





[jira] [Updated] (HAMA-982) Vertex.read/writeState() method throws NullPointerException

2016-01-28 Thread Edward J. Yoon (JIRA)

 [ 
https://issues.apache.org/jira/browse/HAMA-982?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Edward J. Yoon updated HAMA-982:

Description: 
It occurs at partitioning and initial supersteps.

>  at org.apache.hama.graph.GraphJobRunner$Parser.run(GraphJobRunner.java:557)

  was:
It occurs when partitioning and initial supersteps.

>  at org.apache.hama.graph.GraphJobRunner$Parser.run(GraphJobRunner.java:557)


> Vertex.read/writeState() method throws NullPointerException
> ---
>
> Key: HAMA-982
> URL: https://issues.apache.org/jira/browse/HAMA-982
> Project: Hama
>  Issue Type: Bug
>  Components: graph
>Affects Versions: 0.7.0
>Reporter: Edward J. Yoon
>Assignee: Edward J. Yoon
> Fix For: 0.7.1
>
>
> It occurs at partitioning and initial supersteps.
> >  at org.apache.hama.graph.GraphJobRunner$Parser.run(GraphJobRunner.java:557)





[jira] [Created] (HAMA-982) Vertex.read/writeState() method throws NullPointerException

2016-01-28 Thread Edward J. Yoon (JIRA)
Edward J. Yoon created HAMA-982:
---

 Summary: Vertex.read/writeState() method throws 
NullPointerException
 Key: HAMA-982
 URL: https://issues.apache.org/jira/browse/HAMA-982
 Project: Hama
  Issue Type: Bug
  Components: graph
Affects Versions: 0.7.0
Reporter: Edward J. Yoon
Assignee: Edward J. Yoon
 Fix For: 0.7.1


It occurs when partitioning and initial supersteps.

>  at org.apache.hama.graph.GraphJobRunner$Parser.run(GraphJobRunner.java:557)





[jira] [Commented] (HAMA-900) Rotation task scheduler

2016-01-17 Thread Edward J. Yoon (JIRA)

[ 
https://issues.apache.org/jira/browse/HAMA-900?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15104020#comment-15104020
 ] 

Edward J. Yoon commented on HAMA-900:
-

I just merged it into master. Thanks, Behroz :-)

> Rotation task scheduler
> ---
>
> Key: HAMA-900
> URL: https://issues.apache.org/jira/browse/HAMA-900
> Project: Hama
>  Issue Type: New Feature
>  Components: bsp core
>Reporter: Edward J. Yoon
>
> To spread tasks widely, I need a FIFO job scheduler that assigns tasks one at 
> a time, rotating through the groom servers (like dealing cards).





[jira] [Resolved] (HAMA-900) Rotation task scheduler

2016-01-17 Thread Edward J. Yoon (JIRA)

 [ 
https://issues.apache.org/jira/browse/HAMA-900?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Edward J. Yoon resolved HAMA-900.
-
   Resolution: Fixed
 Assignee: Behroz Sikander
Fix Version/s: 0.7.1

> Rotation task scheduler
> ---
>
> Key: HAMA-900
> URL: https://issues.apache.org/jira/browse/HAMA-900
> Project: Hama
>  Issue Type: New Feature
>  Components: bsp core
>Reporter: Edward J. Yoon
>Assignee: Behroz Sikander
> Fix For: 0.7.1
>
>
> To spread tasks widely, I need a FIFO job scheduler that assigns tasks one at 
> a time, rotating through the groom servers (like dealing cards).





[jira] [Commented] (HAMA-900) Rotation task scheduler

2016-01-14 Thread Edward J. Yoon (JIRA)

[ 
https://issues.apache.org/jira/browse/HAMA-900?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15099245#comment-15099245
 ] 

Edward J. Yoon commented on HAMA-900:
-

Cool~

1) % mvn clean install
2) I personally think data locality should be the second priority in 
round-robin allocation.
3) In the getClass(String name, Class defaultValue, Class 
interface) method, "BestEffortDataLocalTaskAllocator.class" is just the default 
value. If you define the "bsp.taskalloc.class" property in hama-site.xml, your 
own allocator will be used instead. We may also want to add the default 
configuration to hama-default.xml, like below:

{code}
  <property>
    <name>bsp.taskalloc.class</name>
    <value>org.apache.hama.bsp.taskallocation.BestEffortDataLocalTaskAllocator</value>
    <description>The task allocator to choose. Default is BestEffortDataLocalTaskAllocator
    that takes in only the data locality as a constraint for allocating tasks.
    </description>
  </property>
{code}
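
The default-value lookup described in point 3 can be sketched in plain Java. This is only an illustration, not Hama or Hadoop code: it uses java.util.Properties in place of the real Configuration class, resolveClass is a hypothetical helper, and the JDK list classes stand in for task allocators.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Properties;

class ClassLookupSketch {
  /**
   * Returns the class named by the given key, or defaultValue when the key
   * is not set. Hypothetical helper; it only mimics the default-value idea.
   */
  static <T> Class<? extends T> resolveClass(Properties conf, String key,
      Class<? extends T> defaultValue, Class<T> iface) {
    String name = conf.getProperty(key);
    if (name == null) {
      return defaultValue; // nothing configured: fall back to the default
    }
    try {
      // asSubclass fails fast if the configured class does not implement iface
      return Class.forName(name).asSubclass(iface);
    } catch (ClassNotFoundException e) {
      throw new IllegalArgumentException("Class not found: " + name, e);
    }
  }

  public static void main(String[] args) {
    Properties conf = new Properties();
    // Key not set: the default class is returned.
    System.out.println(resolveClass(conf, "bsp.taskalloc.class",
        ArrayList.class, List.class).getName()); // java.util.ArrayList
    // Key set (as it would be in a site file): the configured class wins.
    conf.setProperty("bsp.taskalloc.class", "java.util.LinkedList");
    System.out.println(resolveClass(conf, "bsp.taskalloc.class",
        ArrayList.class, List.class).getName()); // java.util.LinkedList
  }
}
```

In other words, the class passed as the default only matters while the property is undefined.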

> Rotation task scheduler
> ---
>
> Key: HAMA-900
> URL: https://issues.apache.org/jira/browse/HAMA-900
> Project: Hama
>  Issue Type: New Feature
>  Components: bsp core
>Reporter: Edward J. Yoon
>
> To spread tasks widely, I need a FIFO job scheduler that assigns tasks one at 
> a time, rotating through the groom servers (like dealing cards).
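
As a rough illustration of the "dealing cards" idea in the description, here is a minimal round-robin sketch. It is not the allocator that was eventually merged: RotationSketch and assign are hypothetical names, groom servers are plain strings assumed to be unique, and data locality is ignored.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

class RotationSketch {
  /** Deals task ids 0..numTasks-1 to the grooms one at a time, in rotation. */
  static Map<String, List<Integer>> assign(List<String> grooms, int numTasks) {
    if (grooms.isEmpty()) {
      throw new IllegalArgumentException("no groom servers available");
    }
    // LinkedHashMap keeps the grooms in the order they were given.
    Map<String, List<Integer>> plan = new LinkedHashMap<>();
    for (String groom : grooms) {
      plan.put(groom, new ArrayList<>());
    }
    for (int task = 0; task < numTasks; task++) {
      // The next task always goes to the next groom, wrapping around.
      String groom = grooms.get(task % grooms.size());
      plan.get(groom).add(task);
    }
    return plan;
  }

  public static void main(String[] args) {
    System.out.println(assign(Arrays.asList("groom1", "groom2", "groom3"), 5));
    // prints {groom1=[0, 3], groom2=[1, 4], groom3=[2]}
  }
}
```

A real allocator would also have to respect free task slots on each groom, and could use data locality as a secondary criterion as suggested in point 2 above.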





[jira] [Commented] (HAMA-900) Rotation task scheduler

2016-01-13 Thread Edward J. Yoon (JIRA)

[ 
https://issues.apache.org/jira/browse/HAMA-900?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15097150#comment-15097150
 ] 

Edward J. Yoon commented on HAMA-900:
-

If needed, I can work on this this week.

> Rotation task scheduler
> ---
>
> Key: HAMA-900
> URL: https://issues.apache.org/jira/browse/HAMA-900
> Project: Hama
>  Issue Type: New Feature
>  Components: bsp core
>Reporter: Edward J. Yoon
>
> To spread tasks widely, I need a FIFO job scheduler that assigns tasks one at 
> a time, rotating through the groom servers (like dealing cards).





[jira] [Commented] (HAMA-900) Rotation task scheduler

2016-01-13 Thread Edward J. Yoon (JIRA)

[ 
https://issues.apache.org/jira/browse/HAMA-900?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15097144#comment-15097144
 ] 

Edward J. Yoon commented on HAMA-900:
-

Thanks for reminding me.

In JobInProgress.java, you can see the two obtainNewTask() methods: 
obtainNewTask(Map groomStatuses) and 
obtainNewTask(TaskInProgress task, Map groomStatuses, BSPResource[] resources).

The latter API uses a taskAllocationStrategy. As far as I know, it was 
originally created for task recovery and re-allocation. The default scheduler, 
SimpleTaskWorkerManager.java, still uses the former API, as shown below. So this 
issue is still a TODO.

{code}
  while ((t = jip.obtainNewTask(this.groomStatuses)) != null) {
    taskSet.add(t);
    // Scheduled all tasks
    if (++cnt == this.jip.tasks.length) {
      break;
    }
  }

  ..
  // assembly into actions
  for (Task task : taskSet) {
    GroomServerStatus groomStatus = jip.getGroomStatusForTask(task);
{code}

> Rotation task scheduler
> ---
>
> Key: HAMA-900
> URL: https://issues.apache.org/jira/browse/HAMA-900
> Project: Hama
>  Issue Type: New Feature
>  Components: bsp core
>Reporter: Edward J. Yoon
>
> To spread tasks widely, I need a FIFO job scheduler that assigns tasks one at 
> a time, rotating through the groom servers (like dealing cards).





[jira] [Commented] (HAMA-900) Rotation task scheduler

2016-01-13 Thread Edward J. Yoon (JIRA)

[ 
https://issues.apache.org/jira/browse/HAMA-900?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15097390#comment-15097390
 ] 

Edward J. Yoon commented on HAMA-900:
-

Oh, you're right. I misread it. 

>> if just the interface needs to be implemented then I can give it a try since 
>> I already have the cluster where I can duplicate this issue.

It'll be great if you can try this and share the results with me. Thanks :-)

> Rotation task scheduler
> ---
>
> Key: HAMA-900
> URL: https://issues.apache.org/jira/browse/HAMA-900
> Project: Hama
>  Issue Type: New Feature
>  Components: bsp core
>Reporter: Edward J. Yoon
>
> To spread tasks widely, I need a FIFO job scheduler that assigns tasks one at 
> a time, rotating through the groom servers (like dealing cards).





[jira] [Commented] (HAMA-981) Set maven scm to git

2016-01-08 Thread Edward J. Yoon (JIRA)

[ 
https://issues.apache.org/jira/browse/HAMA-981?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15089030#comment-15089030
 ] 

Edward J. Yoon commented on HAMA-981:
-

Yes, we migrated to Git, and SVN is now read-only.

> Set maven scm to git
> 
>
> Key: HAMA-981
> URL: https://issues.apache.org/jira/browse/HAMA-981
> Project: Hama
>  Issue Type: Bug
>  Components: build 
>Affects Versions: 0.7.0
>Reporter: Edward J. Yoon
>Assignee: Edward J. Yoon
> Fix For: 0.7.1
>
>
> The Maven SCM settings still point to the svn repository. We need to change them before the release.





[jira] [Commented] (HAMA-981) Set maven scm to git

2016-01-03 Thread Edward J. Yoon (JIRA)

[ 
https://issues.apache.org/jira/browse/HAMA-981?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15080775#comment-15080775
 ] 

Edward J. Yoon commented on HAMA-981:
-

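{code}
   <scm>
-    <url>scm:svn:https://svn.apache.org/repos/asf/hama/</url>
-    <connection>
-      scm:svn:http://svn.apache.org/repos/asf/hama/trunk/
-    </connection>
-    <developerConnection>
-      scm:svn:https://svn.apache.org/repos/asf/hama/trunk/
-    </developerConnection>
+    <url>https://git-wip-us.apache.org/repos/asf/hama.git</url>
+    <connection>scm:git:https://git-wip-us.apache.org/repos/asf/hama.git</connection>
+    <developerConnection>scm:git:https://git-wip-us.apache.org/repos/asf/hama.git</developerConnection>
+    <tag>HEAD</tag>
   </scm>
{code}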

Here are my changes. If there are no objections, I'll commit them directly within 3 days. :-)

> Set maven scm to git
> 
>
> Key: HAMA-981
> URL: https://issues.apache.org/jira/browse/HAMA-981
> Project: Hama
>  Issue Type: Bug
>  Components: build 
>Affects Versions: 0.7.0
>Reporter: Edward J. Yoon
>Assignee: Edward J. Yoon
> Fix For: 0.7.1
>
>
> The Maven SCM settings still point to the svn repository. We need to change them before the release.





[jira] [Comment Edited] (HAMA-970) Exception can occur if the size of splits is bigger than numBSPTasks

2015-12-08 Thread Edward J. Yoon (JIRA)

[ 
https://issues.apache.org/jira/browse/HAMA-970?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15047998#comment-15047998
 ] 

Edward J. Yoon edited comment on HAMA-970 at 12/9/15 4:17 AM:
--

Hi,

To launch more tasks than the number of splits, you should use an input 
partitioner. See this example: 
https://github.com/apache/hama/blob/master/core/src/test/java/org/apache/hama/bsp/TestPartitioning.java

For example, if you have a 10MB file and set the number of tasks to 10 with a 
partitioner, the framework automatically partitions the 10MB file into 10 files 
and then launches your main BSP program with 10 tasks.

{quote}
Previously in my Input Paths, I was adding 2 files, one empty file and one 70 
MB file. This is working but Hama only opens up 2 tasks, one for empty file 
(which becomes the master) and one for 70 MB file (which becomes my only 
slave). Now, since I want to divide the 70 MB file into 4-5 tasks if I try to 
do this solution, I get an exception.
{quote}

You can do it like this: partition the 70MB file into 9 files (manually) and 
then launch the BSP program with setNumOfTasks(10);



was (Author: udanax):
Hi,

To launch more tasks than num of splits,  you should use input partitioner  - 
https://github.com/apache/hama/blob/master/core/src/test/java/org/apache/hama/bsp/TestPartitioning.java
 example, you should use input partitioner. 

For example, if you have a 10MB file and set the number of tasks 10 with 
partitioner, the framework automatically partition 10MB file into 10 files and 
then launch your main BSP program with 10 tasks.

{qoute}
Previously in my Input Paths, I was adding 2 files, one empty file and one 70 
MB file. This is working but Hama only opens up 2 tasks, one for empty file 
(which becomes the master) and one for 70 MB file (which becomes my only 
slave). Now, since I want to divide the 70 MB file into 4-5 tasks if I try to 
do this solution, I get an exception.
{qoute}

You can do like this: 1) partition one 70MB file into 9 files (manually) and 
then launch the BSP program with setNumOfTasks(10);


> Exception can occur if the size of splits is bigger than numBSPTasks
> 
>
> Key: HAMA-970
> URL: https://issues.apache.org/jira/browse/HAMA-970
> Project: Hama
>  Issue Type: Bug
>  Components: bsp core
>Affects Versions: 0.7.0
>Reporter: JongYoon Lim
>Priority: Trivial
> Attachments: HAMA-970.patch
>
>
> In JobInProgress, it's possible to get an exception in initTasks(). 
> {code:java}
> this.tasks = new TaskInProgress[numBSPTasks];
> for (int i = 0; i < splits.length; i++) {
>   tasks[i] = new TaskInProgress(getJobID(), this.jobFile.toString(), 
> splits[i], this.conf, this, i);
> }
> {code}
> I'm not sure that *numBSPTasks* is always bigger than *splits.length*. 
> So, I think it's better to use the bigger value to size the *tasks* array. 





[jira] [Commented] (HAMA-970) Exception can occur if the size of splits is bigger than numBSPTasks

2015-12-08 Thread Edward J. Yoon (JIRA)

[ 
https://issues.apache.org/jira/browse/HAMA-970?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15047998#comment-15047998
 ] 

Edward J. Yoon commented on HAMA-970:
-

Hi,

To launch more tasks than the number of splits, you should use an input 
partitioner. See this example: 
https://github.com/apache/hama/blob/master/core/src/test/java/org/apache/hama/bsp/TestPartitioning.java

For example, if you have a 10MB file and set the number of tasks to 10 with a 
partitioner, the framework automatically partitions the 10MB file into 10 files 
and then launches your main BSP program with 10 tasks.

{quote}
Previously in my Input Paths, I was adding 2 files, one empty file and one 70 
MB file. This is working but Hama only opens up 2 tasks, one for empty file 
(which becomes the master) and one for 70 MB file (which becomes my only 
slave). Now, since I want to divide the 70 MB file into 4-5 tasks if I try to 
do this solution, I get an exception.
{quote}

You can do it like this: partition the 70MB file into 9 files (manually) and 
then launch the BSP program with setNumOfTasks(10);


> Exception can occur if the size of splits is bigger than numBSPTasks
> 
>
> Key: HAMA-970
> URL: https://issues.apache.org/jira/browse/HAMA-970
> Project: Hama
>  Issue Type: Bug
>  Components: bsp core
>Affects Versions: 0.7.0
>Reporter: JongYoon Lim
>Priority: Trivial
> Attachments: HAMA-970.patch
>
>
> In JobInProgress, it's possible to get an exception in initTasks(). 
> {code:java}
> this.tasks = new TaskInProgress[numBSPTasks];
> for (int i = 0; i < splits.length; i++) {
>   tasks[i] = new TaskInProgress(getJobID(), this.jobFile.toString(), 
> splits[i], this.conf, this, i);
> }
> {code}
> I'm not sure that *numBSPTasks* is always bigger than *splits.length*. 
> So, I think it's better to use the bigger value to size the *tasks* array. 
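
The reporter's suggestion can be sketched as follows. This is an assumption-laden illustration and not the content of HAMA-970.patch: TaskArraySketch is a hypothetical class, and plain strings stand in for TaskInProgress objects and splits.

```java
class TaskArraySketch {
  /**
   * Sizes the task array with the larger of the two values, so that the loop
   * over the splits can never index past the end of the array.
   */
  static String[] initTasks(int numBSPTasks, String[] splits) {
    String[] tasks = new String[Math.max(numBSPTasks, splits.length)];
    for (int i = 0; i < splits.length; i++) {
      tasks[i] = "task-" + i + ":" + splits[i];
    }
    return tasks;
  }

  public static void main(String[] args) {
    // Three splits but only two requested tasks: with an array of length
    // numBSPTasks the loop would throw ArrayIndexOutOfBoundsException.
    String[] tasks = initTasks(2, new String[] {"a", "b", "c"});
    System.out.println(tasks.length); // 3
  }
}
```

When the number of tasks is larger than the number of splits, the trailing entries stay null in this sketch, which is why the comment above recommends an input partitioner for that case.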





[jira] [Created] (HAMA-981) Set maven scm to git

2015-12-08 Thread Edward J. Yoon (JIRA)
Edward J. Yoon created HAMA-981:
---

 Summary: Set maven scm to git
 Key: HAMA-981
 URL: https://issues.apache.org/jira/browse/HAMA-981
 Project: Hama
  Issue Type: Bug
  Components: build 
Affects Versions: 0.7.0
Reporter: Edward J. Yoon
Assignee: Edward J. Yoon
 Fix For: 0.7.1


The Maven SCM settings still point to the svn repository. We need to change them before the release.





[jira] [Resolved] (HAMA-978) NumberFormatException at StreamingUplinkReaderThread.readCommand

2015-11-30 Thread Edward J. Yoon (JIRA)

 [ 
https://issues.apache.org/jira/browse/HAMA-978?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Edward J. Yoon resolved HAMA-978.
-
Resolution: Fixed
  Assignee: Edward J. Yoon

This bug was fixed by 
https://github.com/apache/hama/commit/c095c7da2e529256579429da9bb2d534a96da873
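
The symptom in the report (Integer.parseInt receiving the line "Traceback (most recent call last):") can be reproduced and guarded against as in the sketch below. This is not the committed fix referenced above, whose content is not shown here; CommandParseSketch and parseCommand are hypothetical names used only to illustrate defensive parsing of a command line.

```java
import java.util.OptionalInt;

class CommandParseSketch {
  /**
   * Tries to read a numeric command id from one line of child-process output.
   * Returns an empty value instead of throwing when the line is not a number,
   * for example when the interpreter printed a traceback.
   */
  static OptionalInt parseCommand(String line) {
    if (line == null) {
      return OptionalInt.empty();
    }
    try {
      return OptionalInt.of(Integer.parseInt(line.trim()));
    } catch (NumberFormatException e) {
      return OptionalInt.empty();
    }
  }

  public static void main(String[] args) {
    System.out.println(parseCommand("7").isPresent()); // true
    System.out.println(
        parseCommand("Traceback (most recent call last):").isPresent()); // false
  }
}
```

With a guard like this, the caller could log the unexpected line (here, the Python error) instead of failing with a NumberFormatException that hides the real cause.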

> NumberFormatException at StreamingUplinkReaderThread.readCommand
> 
>
> Key: HAMA-978
> URL: https://issues.apache.org/jira/browse/HAMA-978
> Project: Hama
>  Issue Type: Bug
>  Components: pipes
>Affects Versions: 0.7.0
>Reporter: Edward J. Yoon
>Assignee: Edward J. Yoon
> Fix For: 0.7.1
>
>
> {code}
> Hi to all,
> this is my first mail to this mailing list so please have patience if I
> make some bad choice in the format.
> I have a kubuntu-14.04 on an old Intel Core 2 Duo processor T7500 with 2 GB
> of RAM.
> I have properly installed Hadoop-2.7.1, Sun Java JDK 1.8.0_60 e Hama-0.7.0
> as you can see from the following lines:
> >
> > > $ hadoop version
> > > Hadoop 2.7.1
> > > Subversion https://git-wip-us.apache.org/repos/asf/hadoop.git -r
> 15ecc87ccf4a0228f35af08fc56de536e6ce657a
> > > Compiled by jenkins on 2015-06-29T06:04Z
> > > Compiled with protoc 2.5.0
> > > From source with checksum fc0a1a23fc1868e4d5ee7fa2b28a58a
> > > This command was run using
> /home/tora/Downloads/hadoop-2.7.1/share/hadoop/common/hadoop-common-2.7.1.jar
> > > $ java -version
> > > java version "1.8.0_60"
> > > Java(TM) SE Runtime Environment (build 1.8.0_60-b27)
> > > Java HotSpot(TM) 64-Bit Server VM (build 25.60-b23, mixed mode)
> > >
> I am able to properly run the basic Hadoop and Hama examples like the
> following:
> bin/hadoop jar share/hadoop/mapreduce/hadoop-mapreduce-examples-2.7.1.jar
> grep input output 'dfs[a-z.]+'
> bin/hama jar hama-examples-0.7.0.jar pi
> My problem is that I receive the following error message when I try to run
> the HelloWorld Hama Streaming example with this instruction:
> bin/hama pipes -streaming true -bspTasks 2 -interpreter python3.2
> -cachefiles /tmp/PyStreaming/*.py -output /tmp/pystream-out/ -program
> /tmp/PyStreaming/BSPRunner.py -programArgs HelloWorldBSP
> The default python interpreters of my OS were python-2.7 and python-3.4;
> since I had problems with this example I also tried to install python-3.2
> with the following instructions but it didn't solved the problem:
> >
> > > sudo apt-get install software-properties-common
> > > sudo apt-add-repository ppa:fkrull/deadsnakes
> > > sudo apt-get update
> > > sudo apt-get install python3.2
> > >
> The installed python version is 3.2.6 as you can see from the following
> lines:
> >
> > > $ python3.2
> > > Python 3.2.6 (default, Oct 21 2014, 12:50:03)
> > > [GCC 4.8.2] on linux2
> > > Type "help", "copyright", "credits" or "license" for more information.
> > > >>>
> > >
> The error message is the following (I am working in local mode so I didn't
> run bin/start-bspd.sh):
> >
> >
> > >
> > > $ clear;bin/hama pipes -streaming true -bspTasks 2 -interpreter
> python3.2 -cachefiles /tmp/PyStreaming/*.py -output /tmp/pystream-out/
> -program /tmp/PyStreaming/BSPRunner.py -programArgs HelloWorldBSP
> > > 15/09/30 12:39:11 WARN util.NativeCodeLoader: Unable to load
> native-hadoop library for your platform... using builtin-java classes where
> applicable
> > > 15/09/30 12:39:11 INFO pipes.Submitter: Streaming enabled!
> > > 15/09/30 12:39:11 INFO Configuration.deprecation: fs.default.name is
> deprecated. Instead, use fs.defaultFS
> > > 15/09/30 12:39:11 INFO Configuration.deprecation: user.name is
> deprecated. Instead, use mapreduce.job.user.name
> > > 15/09/30 12:39:11 WARN conf.Configuration:
> org.apache.hadoop.fs.ChecksumFileSystem$FSDataBoundedInputStream@4ddced80:an
> attempt to override final parameter:
> mapreduce.job.end-notification.max.retry.interval;  Ignoring.
> > > 15/09/30 12:39:11 WARN conf.Configuration:
> org.apache.hadoop.fs.ChecksumFileSystem$FSDataBoundedInputStream@4ddced80:an
> attempt to override final parameter:
> mapreduce.job.end-notification.max.attempts;  Ignoring.
> > > 15/09/30 12:39:12 INFO Configuration.deprecation: user.name is
> deprecated. Instead, use mapreduce.job.user.name
> > > 15/09/30 12:39:12 INFO bsp.BSPJobClient: Running job:
> job_localrunner_0001
> > > 15/09/30 12:39:12 INFO Configuration.deprecation:
> mapred.cache.localFiles is deprecated. Instead, use
> mapreduce.job.cache.local.files
> > > 15/09/30 12:39:12 INFO bsp.LocalBSPRunner: Setting up a new barrier for
> 2 tasks!
> > > java.lang.NumberFormatException: For input string: "Traceback (most
> recent call last):"
> > > at
> java.lang.NumberFormatException.forInputString(NumberFormatException.java:65)
> > > at java.lang.Integer.parseInt(Integer.java:580)
> > > at 

[jira] [Resolved] (HAMA-980) Modify configuration value from "hama.sync.client.class" to "hama.sync.peer.class"

2015-11-22 Thread Edward J. Yoon (JIRA)

 [ 
https://issues.apache.org/jira/browse/HAMA-980?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Edward J. Yoon resolved HAMA-980.
-
Resolution: Fixed

I just committed this! Thanks!

> Modify configuration value from "hama.sync.client.class" to 
> "hama.sync.peer.class"
> --
>
> Key: HAMA-980
> URL: https://issues.apache.org/jira/browse/HAMA-980
> Project: Hama
>  Issue Type: Bug
>  Components: test 
>Affects Versions: 0.7.0
>Reporter: Minho Kim
>Assignee: Minho Kim
>Priority: Blocker
> Fix For: 0.7.1
>
>
> The configuration key "hama.sync.client.class" is never read. The key that 
> the test code needs is not "hama.sync.client.class" but 
> "hama.sync.peer.class".
> In BSPPeerImpl.java, syncClient is initialized from SYNC_PEER_CLASS, whose 
> value is "hama.sync.peer.class", so setting "hama.sync.client.class" has no 
> effect.
> {code:title=SyncServiceFactory.java}
>   public static final String SYNC_SERVER_CLASS = "hama.sync.server.class";
>   public static final String SYNC_PEER_CLASS = "hama.sync.peer.class";
>   public static final String SYNC_MASTER_CLASS = "hama.sync.master.class";
>   /**
>    * Returns a sync client via reflection based on what was configured.
>    */
>   public static PeerSyncClient getPeerSyncClient(Configuration conf)
>       throws ClassNotFoundException {
>     return (PeerSyncClient) ReflectionUtils.newInstance(conf
>         .getClassByName(conf.get(SYNC_PEER_CLASS,
>             ZooKeeperSyncClientImpl.class.getName())), conf);
>   }
> {code}
> We need to change the configuration key from "hama.sync.client.class" to 
> "hama.sync.peer.class" in the test code.
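
A minimal sketch of why the old key has no effect, using java.util.Properties instead of the real Configuration class. SyncKeySketch, peerSyncClientName and the value "MyTestSyncClient" are hypothetical; only the two key names and the default class name come from the description above.

```java
import java.util.Properties;

class SyncKeySketch {
  static final String SYNC_PEER_CLASS = "hama.sync.peer.class";
  static final String DEFAULT_CLIENT = "ZooKeeperSyncClientImpl";

  /** Looks up the sync client name the same way the factory does: by SYNC_PEER_CLASS only. */
  static String peerSyncClientName(Properties conf) {
    return conf.getProperty(SYNC_PEER_CLASS, DEFAULT_CLIENT);
  }

  public static void main(String[] args) {
    Properties conf = new Properties();
    // Setting the key that nobody reads: the default is still used.
    conf.setProperty("hama.sync.client.class", "MyTestSyncClient");
    System.out.println(peerSyncClientName(conf)); // ZooKeeperSyncClientImpl
    // Setting the key that is actually read: the test client is picked up.
    conf.setProperty("hama.sync.peer.class", "MyTestSyncClient");
    System.out.println(peerSyncClientName(conf)); // MyTestSyncClient
  }
}
```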





[jira] [Resolved] (HAMA-961) Remove ANN package

2015-11-22 Thread Edward J. Yoon (JIRA)

 [ 
https://issues.apache.org/jira/browse/HAMA-961?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Edward J. Yoon resolved HAMA-961.
-
Resolution: Fixed

I just committed this! 

> Remove ANN package
> --
>
> Key: HAMA-961
> URL: https://issues.apache.org/jira/browse/HAMA-961
> Project: Hama
>  Issue Type: Improvement
>  Components: machine learning
>Affects Versions: 0.7.0
>Reporter: Edward J. Yoon
>Assignee: Edward J. Yoon
> Fix For: 0.8.0
>
>
> I've recently started to review the MLP source code closely, and I'm 
> thinking about some improvements and API refactoring, e.g., APIs for 
> user-defined neuron and synapse models, data structures, etc.
> This issue is one of them, and it relates to training large models. I'm 
> considering a distributed parameter server (http://parameterserver.org) for 
> managing parameters. 





[jira] [Commented] (HAMA-961) Remove ANN package

2015-11-19 Thread Edward J. Yoon (JIRA)

[ 
https://issues.apache.org/jira/browse/HAMA-961?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15013224#comment-15013224
 ] 

Edward J. Yoon commented on HAMA-961:
-

As we decided, this effort has moved to the Apache Horn podling.

> Remove ANN package
> --
>
> Key: HAMA-961
> URL: https://issues.apache.org/jira/browse/HAMA-961
> Project: Hama
>  Issue Type: Improvement
>  Components: machine learning
>Affects Versions: 0.7.0
>Reporter: Edward J. Yoon
>Assignee: Edward J. Yoon
> Fix For: 0.8.0
>
>
> I've recently started to review the MLP source code closely, and I'm 
> thinking about some improvements and API refactoring, e.g., APIs for 
> user-defined neuron and synapse models, data structures, etc.
> This issue is one of them, and it relates to training large models. I'm considering 
> a distributed parameter server (http://parameterserver.org) for managing 
> parameters. 





[jira] [Updated] (HAMA-961) Remove ANN package

2015-11-19 Thread Edward J. Yoon (JIRA)

 [ 
https://issues.apache.org/jira/browse/HAMA-961?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Edward J. Yoon updated HAMA-961:

Summary: Remove ANN package  (was: Parameter Server for large scale MLP)

> Remove ANN package
> --
>
> Key: HAMA-961
> URL: https://issues.apache.org/jira/browse/HAMA-961
> Project: Hama
>  Issue Type: Improvement
>  Components: machine learning
>Affects Versions: 0.7.0
>Reporter: Edward J. Yoon
>Assignee: Edward J. Yoon
> Fix For: 0.8.0
>
>
> I've recently started to review the MLP source code closely, and I'm 
> thinking about some improvements and API refactoring, e.g., APIs for 
> user-defined neuron and synapse models, data structures, etc.
> This issue is one of them, and it relates to training large models. I'm considering 
> a distributed parameter server (http://parameterserver.org) for managing 
> parameters. 





[jira] [Updated] (HAMA-980) Modify configuration value from "hama.sync.client.class" to "hama.sync.peer.class"

2015-11-19 Thread Edward J. Yoon (JIRA)

 [ 
https://issues.apache.org/jira/browse/HAMA-980?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Edward J. Yoon updated HAMA-980:

Priority: Blocker  (was: Minor)

> Modify configuration value from "hama.sync.client.class" to 
> "hama.sync.peer.class"
> --
>
> Key: HAMA-980
> URL: https://issues.apache.org/jira/browse/HAMA-980
> Project: Hama
>  Issue Type: Bug
>  Components: test 
>Affects Versions: 0.7.0
>Reporter: Minho Kim
>Assignee: Minho Kim
>Priority: Blocker
> Fix For: 0.7.1
>
>
> The configuration key "hama.sync.client.class" is never used, because the 
> key that the test code must set is not "hama.sync.client.class" but 
> "hama.sync.peer.class".
> BSPPeerImpl.java reads SYNC_PEER_CLASS to initialize syncClient. Since 
> SYNC_PEER_CLASS is "hama.sync.peer.class", setting "hama.sync.client.class" 
> has no effect.
> {code:title=SyncServiceFactory.java}
>   public static final String SYNC_SERVER_CLASS = "hama.sync.server.class";
>   public static final String SYNC_PEER_CLASS = "hama.sync.peer.class";
>   public static final String SYNC_MASTER_CLASS = "hama.sync.master.class";
>   /**
>    * Returns a sync client via reflection based on what was configured.
>    */
>   public static PeerSyncClient getPeerSyncClient(Configuration conf)
>       throws ClassNotFoundException {
>     return (PeerSyncClient) ReflectionUtils.newInstance(conf
>         .getClassByName(conf.get(SYNC_PEER_CLASS,
>             ZooKeeperSyncClientImpl.class.getName())), conf);
>   }
> {code}
> We need to change the configuration key from "hama.sync.client.class" to 
> "hama.sync.peer.class" in the test code.





[jira] [Commented] (HAMA-980) Modify configuration value from "hama.sync.client.class" to "hama.sync.peer.class"

2015-11-19 Thread Edward J. Yoon (JIRA)

[ 
https://issues.apache.org/jira/browse/HAMA-980?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15015132#comment-15015132
 ] 

Edward J. Yoon commented on HAMA-980:
-

Hi,

really nice catch! BTW, why don't we change the SyncServiceFactory like below: 

{code}
public static final String SYNC_SERVER_CLASS = "hama.sync.server.class";
  - public static final String SYNC_PEER_CLASS = "hama.sync.peer.class";
  + public static final String SYNC_PEER_CLASS = "hama.sync.client.class";
{code}

Server and client always come as a pair.
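
To make the mismatch concrete, here is a minimal sketch in plain Java. It is only an illustration: java.util.Properties stands in for Hadoop's Configuration, the default is reduced to a simple class name, and SyncKeyDemo, resolvePeerClass and "my.CustomSyncClient" are hypothetical names that do not exist in Hama. It shows that a value stored under "hama.sync.client.class" is silently ignored when the lookup uses SYNC_PEER_CLASS with a default.

```java
import java.util.Properties;

// Illustrative only: Properties stands in for Hadoop's Configuration.
class SyncKeyDemo {
  static final String SYNC_PEER_CLASS = "hama.sync.peer.class";
  // Simplified stand-in for ZooKeeperSyncClientImpl.class.getName().
  static final String DEFAULT_CLIENT = "ZooKeeperSyncClientImpl";

  // Mirrors conf.get(SYNC_PEER_CLASS, default): only this one key is read.
  static String resolvePeerClass(Properties conf) {
    return conf.getProperty(SYNC_PEER_CLASS, DEFAULT_CLIENT);
  }

  public static void main(String[] args) {
    Properties conf = new Properties();

    // Setting the unused key does not change what the factory resolves.
    conf.setProperty("hama.sync.client.class", "my.CustomSyncClient");
    System.out.println(resolvePeerClass(conf));

    // Setting the key that is actually read takes effect.
    conf.setProperty(SYNC_PEER_CLASS, "my.CustomSyncClient");
    System.out.println(resolvePeerClass(conf));
  }
}
```

Whichever key name we settle on, the constant and the tests have to agree, otherwise the test silently falls back to the default client.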

> Modify configuration value from "hama.sync.client.class" to 
> "hama.sync.peer.class"
> --
>
> Key: HAMA-980
> URL: https://issues.apache.org/jira/browse/HAMA-980
> Project: Hama
>  Issue Type: Bug
>  Components: test 
>Affects Versions: 0.7.0
>Reporter: Minho Kim
>Assignee: Minho Kim
>Priority: Minor
> Fix For: 0.7.1
>
>
> The configuration key "hama.sync.client.class" is never used, because the 
> key that the test code must set is not "hama.sync.client.class" but 
> "hama.sync.peer.class".
> BSPPeerImpl.java reads SYNC_PEER_CLASS to initialize syncClient. Since 
> SYNC_PEER_CLASS is "hama.sync.peer.class", setting "hama.sync.client.class" 
> has no effect.
> {code:title=SyncServiceFactory.java}
>   public static final String SYNC_SERVER_CLASS = "hama.sync.server.class";
>   public static final String SYNC_PEER_CLASS = "hama.sync.peer.class";
>   public static final String SYNC_MASTER_CLASS = "hama.sync.master.class";
>   /**
>    * Returns a sync client via reflection based on what was configured.
>    */
>   public static PeerSyncClient getPeerSyncClient(Configuration conf)
>       throws ClassNotFoundException {
>     return (PeerSyncClient) ReflectionUtils.newInstance(conf
>         .getClassByName(conf.get(SYNC_PEER_CLASS,
>             ZooKeeperSyncClientImpl.class.getName())), conf);
>   }
> {code}
> We need to change the configuration key from "hama.sync.client.class" to 
> "hama.sync.peer.class" in the test code.





[jira] [Commented] (HAMA-980) Modify configuration value from "hama.sync.client.class" to "hama.sync.peer.class"

2015-11-19 Thread Edward J. Yoon (JIRA)

[ 
https://issues.apache.org/jira/browse/HAMA-980?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15015345#comment-15015345
 ] 

Edward J. Yoon commented on HAMA-980:
-

+1

> Modify configuration value from "hama.sync.client.class" to 
> "hama.sync.peer.class"
> --
>
> Key: HAMA-980
> URL: https://issues.apache.org/jira/browse/HAMA-980
> Project: Hama
>  Issue Type: Bug
>  Components: test 
>Affects Versions: 0.7.0
>Reporter: Minho Kim
>Assignee: Minho Kim
>Priority: Blocker
> Fix For: 0.7.1
>
>
> The configuration key "hama.sync.client.class" is never used, because the 
> key that the test code must set is not "hama.sync.client.class" but 
> "hama.sync.peer.class".
> BSPPeerImpl.java reads SYNC_PEER_CLASS to initialize syncClient. Since 
> SYNC_PEER_CLASS is "hama.sync.peer.class", setting "hama.sync.client.class" 
> has no effect.
> {code:title=SyncServiceFactory.java}
>   public static final String SYNC_SERVER_CLASS = "hama.sync.server.class";
>   public static final String SYNC_PEER_CLASS = "hama.sync.peer.class";
>   public static final String SYNC_MASTER_CLASS = "hama.sync.master.class";
>   /**
>    * Returns a sync client via reflection based on what was configured.
>    */
>   public static PeerSyncClient getPeerSyncClient(Configuration conf)
>       throws ClassNotFoundException {
>     return (PeerSyncClient) ReflectionUtils.newInstance(conf
>         .getClassByName(conf.get(SYNC_PEER_CLASS,
>             ZooKeeperSyncClientImpl.class.getName())), conf);
>   }
> {code}
> We need to change the configuration key from "hama.sync.client.class" to 
> "hama.sync.peer.class" in the test code.





[jira] [Updated] (HAMA-980) Modify configuration value from "hama.sync.client.class" to "hama.sync.peer.class"

2015-11-18 Thread Edward J. Yoon (JIRA)

 [ 
https://issues.apache.org/jira/browse/HAMA-980?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Edward J. Yoon updated HAMA-980:

Fix Version/s: (was: 0.7.0)
   0.7.1

> Modify configuration value from "hama.sync.client.class" to 
> "hama.sync.peer.class"
> --
>
> Key: HAMA-980
> URL: https://issues.apache.org/jira/browse/HAMA-980
> Project: Hama
>  Issue Type: Bug
>  Components: test 
>Affects Versions: 0.7.0
>Reporter: Minho Kim
>Assignee: Minho Kim
>Priority: Minor
> Fix For: 0.7.1
>
>
> The configuration key "hama.sync.client.class" is never used, because the 
> key that the test code must set is not "hama.sync.client.class" but 
> "hama.sync.peer.class".
> BSPPeerImpl.java reads SYNC_PEER_CLASS to initialize syncClient. Since 
> SYNC_PEER_CLASS is "hama.sync.peer.class", setting "hama.sync.client.class" 
> has no effect.
> {code:title=SyncServiceFactory.java}
>   public static final String SYNC_SERVER_CLASS = "hama.sync.server.class";
>   public static final String SYNC_PEER_CLASS = "hama.sync.peer.class";
>   public static final String SYNC_MASTER_CLASS = "hama.sync.master.class";
>   /**
>    * Returns a sync client via reflection based on what was configured.
>    */
>   public static PeerSyncClient getPeerSyncClient(Configuration conf)
>       throws ClassNotFoundException {
>     return (PeerSyncClient) ReflectionUtils.newInstance(conf
>         .getClassByName(conf.get(SYNC_PEER_CLASS,
>             ZooKeeperSyncClientImpl.class.getName())), conf);
>   }
> {code}
> We need to change the configuration key from "hama.sync.client.class" to 
> "hama.sync.peer.class" in the test code.





[jira] [Resolved] (HAMA-979) Change the setting the -source and -target of the Java Compiler to 1.7

2015-11-04 Thread Edward J. Yoon (JIRA)

 [ 
https://issues.apache.org/jira/browse/HAMA-979?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Edward J. Yoon resolved HAMA-979.
-
Resolution: Fixed
  Assignee: Edward J. Yoon

I've just committed this!

> Change the setting the -source and -target of the Java Compiler to 1.7
> --
>
> Key: HAMA-979
> URL: https://issues.apache.org/jira/browse/HAMA-979
> Project: Hama
>  Issue Type: Bug
>  Components: build 
>Affects Versions: 0.7.0
>Reporter: Edward J. Yoon
>Assignee: Edward J. Yoon
> Fix For: 0.7.1
>
>
> As discussed before (http://markmail.org/message/xjpwn7uiit64vcd4), we 
> decided to move to Java 7. 
> We need to change the -source and -target settings of the Java compiler from 
> 1.6 to 1.7.





[jira] [Created] (HAMA-979) Change the setting the -source and -target of the Java Compiler to 1.7

2015-10-22 Thread Edward J. Yoon (JIRA)
Edward J. Yoon created HAMA-979:
---

 Summary: Change the setting the -source and -target of the Java 
Compiler to 1.7
 Key: HAMA-979
 URL: https://issues.apache.org/jira/browse/HAMA-979
 Project: Hama
  Issue Type: Bug
  Components: build 
Affects Versions: 0.7.0
Reporter: Edward J. Yoon
 Fix For: 0.7.1


As discussed before (http://markmail.org/message/xjpwn7uiit64vcd4), we decided 
to move to Java 7. 

We need to change the -source and -target settings of the Java compiler from 
1.6 to 1.7.





[jira] [Created] (HAMA-978) NumberFormatException at StreamingUplinkReaderThread.readCommand

2015-09-30 Thread Edward J. Yoon (JIRA)
Edward J. Yoon created HAMA-978:
---

 Summary: NumberFormatException at 
StreamingUplinkReaderThread.readCommand
 Key: HAMA-978
 URL: https://issues.apache.org/jira/browse/HAMA-978
 Project: Hama
  Issue Type: Bug
  Components: pipes
Affects Versions: 0.7.0
Reporter: Edward J. Yoon
 Fix For: 0.7.1


{code}
Hi to all,
this is my first mail to this mailing list so please have patience if I
make some bad choice in the format.

I have a kubuntu-14.04 on an old Intel Core 2 Duo processor T7500 with 2 GB
of RAM.

I have properly installed Hadoop-2.7.1, Sun Java JDK 1.8.0_60 and Hama-0.7.0
as you can see from the following lines:


>
> > $ hadoop version
> > Hadoop 2.7.1
> > Subversion https://git-wip-us.apache.org/repos/asf/hadoop.git -r
15ecc87ccf4a0228f35af08fc56de536e6ce657a
> > Compiled by jenkins on 2015-06-29T06:04Z
> > Compiled with protoc 2.5.0
> > From source with checksum fc0a1a23fc1868e4d5ee7fa2b28a58a
> > This command was run using
/home/tora/Downloads/hadoop-2.7.1/share/hadoop/common/hadoop-common-2.7.1.jar


> > $ java -version
> > java version "1.8.0_60"
> > Java(TM) SE Runtime Environment (build 1.8.0_60-b27)
> > Java HotSpot(TM) 64-Bit Server VM (build 25.60-b23, mixed mode)
> >

I am able to properly run the basic Hadoop and Hama examples like the
following:
bin/hadoop jar share/hadoop/mapreduce/hadoop-mapreduce-examples-2.7.1.jar
grep input output 'dfs[a-z.]+'
bin/hama jar hama-examples-0.7.0.jar pi
My problem is that I receive the following error message when I try to run
the HelloWorld Hama Streaming example with this instruction:
bin/hama pipes -streaming true -bspTasks 2 -interpreter python3.2
-cachefiles /tmp/PyStreaming/*.py -output /tmp/pystream-out/ -program
/tmp/PyStreaming/BSPRunner.py -programArgs HelloWorldBSP
The default python interpreters of my OS were python-2.7 and python-3.4;
since I had problems with this example I also tried to install python-3.2
with the following instructions but it didn't solve the problem:


>
> > sudo apt-get install software-properties-common
> > sudo apt-add-repository ppa:fkrull/deadsnakes
> > sudo apt-get update
> > sudo apt-get install python3.2
> >


The installed python version is 3.2.6 as you can see from the following
lines:


>
> > $ python3.2
> > Python 3.2.6 (default, Oct 21 2014, 12:50:03)
> > [GCC 4.8.2] on linux2
> > Type "help", "copyright", "credits" or "license" for more information.
> > >>>
> >


The error message is the following (I am working in local mode so I didn't
run bin/start-bspd.sh):


>
>
> >
> > $ clear;bin/hama pipes -streaming true -bspTasks 2 -interpreter
python3.2 -cachefiles /tmp/PyStreaming/*.py -output /tmp/pystream-out/
-program /tmp/PyStreaming/BSPRunner.py -programArgs HelloWorldBSP
> > 15/09/30 12:39:11 WARN util.NativeCodeLoader: Unable to load
native-hadoop library for your platform... using builtin-java classes where
applicable
> > 15/09/30 12:39:11 INFO pipes.Submitter: Streaming enabled!
> > 15/09/30 12:39:11 INFO Configuration.deprecation: fs.default.name is
deprecated. Instead, use fs.defaultFS
> > 15/09/30 12:39:11 INFO Configuration.deprecation: user.name is
deprecated. Instead, use mapreduce.job.user.name
> > 15/09/30 12:39:11 WARN conf.Configuration:
org.apache.hadoop.fs.ChecksumFileSystem$FSDataBoundedInputStream@4ddced80:an
attempt to override final parameter:
mapreduce.job.end-notification.max.retry.interval;  Ignoring.
> > 15/09/30 12:39:11 WARN conf.Configuration:
org.apache.hadoop.fs.ChecksumFileSystem$FSDataBoundedInputStream@4ddced80:an
attempt to override final parameter:
mapreduce.job.end-notification.max.attempts;  Ignoring.
> > 15/09/30 12:39:12 INFO Configuration.deprecation: user.name is
deprecated. Instead, use mapreduce.job.user.name
> > 15/09/30 12:39:12 INFO bsp.BSPJobClient: Running job:
job_localrunner_0001
> > 15/09/30 12:39:12 INFO Configuration.deprecation:
mapred.cache.localFiles is deprecated. Instead, use
mapreduce.job.cache.local.files
> > 15/09/30 12:39:12 INFO bsp.LocalBSPRunner: Setting up a new barrier for
2 tasks!
> > java.lang.NumberFormatException: For input string: "Traceback (most
recent call last):"
> > at
java.lang.NumberFormatException.forInputString(NumberFormatException.java:65)
> > at java.lang.Integer.parseInt(Integer.java:580)
> > at java.lang.Integer.parseInt(Integer.java:615)
> > at
org.apache.hama.pipes.protocol.StreamingProtocol$StreamingUplinkReaderThread.readCommand(StreamingProtocol.java:174)
> > at
org.apache.hama.pipes.protocol.UplinkReader.run(UplinkReader.java:106)
> > java.lang.NumberFormatException: For input string: "Traceback (most
recent call last):"
> > at
java.lang.NumberFormatException.forInputString(NumberFormatException.java:65)
> > at java.lang.Integer.parseInt(Integer.java:580)
> > at java.lang.Integer.parseInt(Integer.java:615)
> > 

[jira] [Created] (HAMA-977) Migrate from SVN to GIT

2015-09-21 Thread Edward J. Yoon (JIRA)
Edward J. Yoon created HAMA-977:
---

 Summary: Migrate from SVN to GIT
 Key: HAMA-977
 URL: https://issues.apache.org/jira/browse/HAMA-977
 Project: Hama
  Issue Type: Task
Reporter: Edward J. Yoon
Assignee: Edward J. Yoon


INFRA ticket: 
https://issues.apache.org/jira/servicedesk/agent/INFRA/issue/INFRA-10466

- need to update the Wiki and website contents
  - HowToContribute, HowToCommit, etc.
- need to change settings of nightly build jobs





[jira] [Commented] (HAMA-976) Add the GraphJob example on YARN

2015-09-10 Thread Edward J. Yoon (JIRA)

[ 
https://issues.apache.org/jira/browse/HAMA-976?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14738369#comment-14738369
 ] 

Edward J. Yoon commented on HAMA-976:
-

Hi, just FYI,

Before committing, we usually wait a while for reviews from other committers. If 
anyone comments there, you can commit by lazy consensus, like this: 
https://issues.apache.org/jira/browse/HAMA-818?focusedCommentId=13831194=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-13831194

> Add the GraphJob example on YARN
> 
>
> Key: HAMA-976
> URL: https://issues.apache.org/jira/browse/HAMA-976
> Project: Hama
>  Issue Type: New Feature
>  Components: yarn
>Affects Versions: 0.7.1
>Reporter: Minho Kim
>Assignee: Minho Kim
> Fix For: 0.7.1
>
>
> I'll add a graph example that works on a YARN cluster. The example is PageRank. 
> I'll test whether it runs correctly.





[jira] [Commented] (HAMA-975) Improvement of Async RPC

2015-09-08 Thread Edward J. Yoon (JIRA)

[ 
https://issues.apache.org/jira/browse/HAMA-975?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14734691#comment-14734691
 ] 

Edward J. Yoon commented on HAMA-975:
-

{quote}I'd like to divide this issue to small sub tasks.{quote}

It's a good idea. Please keep up the great work!

> Improvement of Async RPC
> 
>
> Key: HAMA-975
> URL: https://issues.apache.org/jira/browse/HAMA-975
> Project: Hama
>  Issue Type: Improvement
>  Components: bsp core
>Reporter: JongYoon Lim
>
> Hama has an async IPC feature. 
> I found some points that could be improved, as listed below.
> 1. Add netty encoder and decoder to lighten the load on the handler.
> 2. Consider using the native transport, EpollEventLoopGroup, instead of 
> NioEventLoopGroup. 
> 3. Use pooled buffers. 
> 4. Use ctx.* instead of channel.* 
> 5. Find and remove blocking code. 
> 6. Async-style RPC responses 
> Also, we can consider compression or JSON-style marshalling (unmarshalling). 
> But I'm not sure these changes always improve 
> performance, so a benchmark should be provided to prove the improvement. 
> I'd like to divide this issue to small sub tasks. 
> WDYT? 





[jira] [Updated] (HAMA-975) Improvement of Async RPC

2015-09-08 Thread Edward J. Yoon (JIRA)

 [ 
https://issues.apache.org/jira/browse/HAMA-975?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Edward J. Yoon updated HAMA-975:

Assignee: JongYoon Lim

> Improvement of Async RPC
> 
>
> Key: HAMA-975
> URL: https://issues.apache.org/jira/browse/HAMA-975
> Project: Hama
>  Issue Type: Improvement
>  Components: bsp core
>Reporter: JongYoon Lim
>Assignee: JongYoon Lim
>
> Hama has an async IPC feature. 
> I found some points that could be improved, as listed below.
> 1. Add netty encoder and decoder to lighten the load on the handler.
> 2. Consider using the native transport, EpollEventLoopGroup, instead of 
> NioEventLoopGroup. 
> 3. Use pooled buffers. 
> 4. Use ctx.* instead of channel.* 
> 5. Find and remove blocking code. 
> 6. Async-style RPC responses 
> Also, we can consider compression or JSON-style marshalling (unmarshalling). 
> But I'm not sure these changes always improve 
> performance, so a benchmark should be provided to prove the improvement. 
> I'd like to divide this issue to small sub tasks. 
> WDYT? 





[jira] [Commented] (HAMA-974) Support fault tolerance for Graph job

2015-09-07 Thread Edward J. Yoon (JIRA)

[ 
https://issues.apache.org/jira/browse/HAMA-974?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14734184#comment-14734184
 ] 

Edward J. Yoon commented on HAMA-974:
-

Isn't HAMA-881 already addressed? I tested AsyncRcvdMsgCheckpointImpl, and it 
works fine.

The problem is the state of the variables at the last checkpoint. I think providing a 
custom checkpoint function is best. For example, we can add a checkpointState() 
method to BSPInterface.

{code}
public void setup() { }

public void bsp() {
  // your program
}

public void checkpointState() {
  // define variables to be checkpointed
}

public void close() { }
{code}
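
To show how such a hook might be used, here is a rough, self-contained sketch. Everything in it is hypothetical: CheckpointableBsp, SumBsp, restoreState() and CheckpointDemo are made-up names, not Hama APIs, and the return type of checkpointState() is my assumption. The idea is that the framework asks the user program for its variables after each superstep and hands them back to a replacement task after a failure.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical driver: simulates one task failure and a recovery from the last checkpoint.
class CheckpointDemo {
  static long runWithFailure(int totalSupersteps, int failAt) {
    SumBsp task = new SumBsp();
    Map<String, Long> lastCheckpoint = task.checkpointState();
    int superstep = 0;
    for (; superstep < failAt; superstep++) {
      task.bsp(superstep);
      lastCheckpoint = task.checkpointState(); // taken after every superstep
    }
    // Simulated failure: a fresh task restores the saved variables and continues.
    SumBsp recovered = new SumBsp();
    recovered.restoreState(lastCheckpoint);
    for (; superstep < totalSupersteps; superstep++) {
      recovered.bsp(superstep);
    }
    return recovered.runningSum;
  }

  public static void main(String[] args) {
    System.out.println(runWithFailure(10, 4));
  }
}

// Hypothetical interface: not part of Hama.
interface CheckpointableBsp {
  void bsp(int superstep);
  Map<String, Long> checkpointState();        // variables to be checkpointed
  void restoreState(Map<String, Long> saved); // called on a recovered task
}

// Toy user program whose only state is a running sum of superstep numbers.
class SumBsp implements CheckpointableBsp {
  long runningSum = 0;

  @Override
  public void bsp(int superstep) {
    runningSum += superstep;
  }

  @Override
  public Map<String, Long> checkpointState() {
    Map<String, Long> state = new HashMap<>();
    state.put("runningSum", runningSum);
    return state;
  }

  @Override
  public void restoreState(Map<String, Long> saved) {
    runningSum = saved.get("runningSum");
  }
}
```

With this shape the recovered task ends with the same result as an uninterrupted run, because the user-defined variables travel with the checkpoint.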

> Support fault tolerance for Graph job
> -
>
> Key: HAMA-974
> URL: https://issues.apache.org/jira/browse/HAMA-974
> Project: Hama
>  Issue Type: Improvement
>  Components: graph
>Affects Versions: 0.7.0
>Reporter: Edward J. Yoon
> Fix For: 0.8.0
>
>
> Currently we only checkpoint messages. To support FT for graph jobs, 
> aggregators, assigned vertices, and their statuses must be checkpointed together.





[jira] [Created] (HAMA-974) Support fault tolerance for Graph job

2015-09-07 Thread Edward J. Yoon (JIRA)
Edward J. Yoon created HAMA-974:
---

 Summary: Support fault tolerance for Graph job
 Key: HAMA-974
 URL: https://issues.apache.org/jira/browse/HAMA-974
 Project: Hama
  Issue Type: Improvement
  Components: graph
Affects Versions: 0.7.0
Reporter: Edward J. Yoon
 Fix For: 0.8.0


Currently we only checkpoint messages. To support FT for graph jobs, 
aggregators, assigned vertices, and their statuses must be checkpointed together.





[jira] [Commented] (HAMA-974) Support fault tolerance for Graph job

2015-09-07 Thread Edward J. Yoon (JIRA)

[ 
https://issues.apache.org/jira/browse/HAMA-974?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14734229#comment-14734229
 ] 

Edward J. Yoon commented on HAMA-974:
-

Yes, we need to vote. BTW, your branch is quite old, so it's hard to compare with 
trunk. Can you please summarize the changes you've made?

> Support fault tolerance for Graph job
> -
>
> Key: HAMA-974
> URL: https://issues.apache.org/jira/browse/HAMA-974
> Project: Hama
>  Issue Type: Improvement
>  Components: graph
>Affects Versions: 0.7.0
>Reporter: Edward J. Yoon
> Fix For: 0.8.0
>
>
> Currently we only checkpoint messages. To support FT for graph jobs, 
> aggregators, assigned vertices, and their statuses must be checkpointed together.





[jira] [Resolved] (HAMA-973) GraphJob and RandBench example works incorrectly when FT is enabled.

2015-09-07 Thread Edward J. Yoon (JIRA)

 [ 
https://issues.apache.org/jira/browse/HAMA-973?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Edward J. Yoon resolved HAMA-973.
-
   Resolution: Fixed
Fix Version/s: 0.7.1

> GraphJob and RandBench example works incorrectly when FT is enabled.
> 
>
> Key: HAMA-973
> URL: https://issues.apache.org/jira/browse/HAMA-973
> Project: Hama
>  Issue Type: Bug
>  Components: bsp core
>Affects Versions: 0.7.0
>Reporter: Edward J. Yoon
>Assignee: Edward J. Yoon
>Priority: Critical
> Fix For: 0.7.1
>
> Attachments: patch.txt
>
>
> Today I tested the fault tolerance function with RandBench. FT works fine, but I 
> just found that there is a bug in the RandBench program.
> {code}
> [root@cluster-0 hama-0.7.0]# bin/hama jar hama-examples-0.7.0.jar bench 100 
> 100 100
> 15/09/03 12:59:57 WARN util.NativeCodeLoader: Unable to load native-hadoop 
> library for your platform... using builtin-java classes where applicable
> 15/09/03 12:59:58 INFO Configuration.deprecation: user.name is deprecated. 
> Instead, use mapreduce.job.user.name
> 15/09/03 12:59:58 INFO bsp.BSPJobClient: Running job: job_201509031258_0002
> 15/09/03 13:00:01 INFO bsp.BSPJobClient: Current supersteps number: 0
> 15/09/03 13:00:22 INFO bsp.BSPJobClient: Current supersteps number: 2
> 15/09/03 13:00:26 INFO bsp.BSPJobClient: Current supersteps number: 5
> 15/09/03 13:00:29 INFO bsp.BSPJobClient: Current supersteps number: 11
> 15/09/03 13:00:32 INFO bsp.BSPJobClient: Current supersteps number: 16
> 15/09/03 13:00:35 INFO bsp.BSPJobClient: Current supersteps number: 21
> 15/09/03 13:00:38 INFO bsp.BSPJobClient: Current supersteps number: 28
> 15/09/03 13:00:41 INFO bsp.BSPJobClient: Current supersteps number: 35
> 15/09/03 13:00:44 INFO bsp.BSPJobClient: Current supersteps number: 42
> 15/09/03 13:00:47 INFO bsp.BSPJobClient: Current supersteps number: 49
> 15/09/03 13:00:50 INFO bsp.BSPJobClient: Current supersteps number: 56
> 15/09/03 13:02:05 INFO bsp.BSPJobClient: Current supersteps number: 0
> 15/09/03 13:02:08 INFO bsp.BSPJobClient: Current supersteps number: 56
> 15/09/03 13:02:11 INFO bsp.BSPJobClient: Current supersteps number: 0
> 15/09/03 13:02:20 INFO bsp.BSPJobClient: Current supersteps number: 57
> 15/09/03 13:02:23 INFO bsp.BSPJobClient: Current supersteps number: 61
> 15/09/03 13:02:26 INFO bsp.BSPJobClient: Current supersteps number: 67
> 15/09/03 13:02:29 INFO bsp.BSPJobClient: Current supersteps number: 72
> 15/09/03 13:02:32 INFO bsp.BSPJobClient: Current supersteps number: 77
> 15/09/03 13:02:35 INFO bsp.BSPJobClient: Current supersteps number: 84
> 15/09/03 13:02:38 INFO bsp.BSPJobClient: Current supersteps number: 91
> 15/09/03 13:02:41 INFO bsp.BSPJobClient: Current supersteps number: 97
> 15/09/03 13:02:44 INFO bsp.BSPJobClient: Current supersteps number: 106
> 15/09/03 13:02:47 INFO bsp.BSPJobClient: Current supersteps number: 113
> 15/09/03 13:02:50 INFO bsp.BSPJobClient: Current supersteps number: 125
> 15/09/03 13:02:53 INFO bsp.BSPJobClient: Current supersteps number: 134
> 15/09/03 13:02:56 INFO bsp.BSPJobClient: Current supersteps number: 144
> 15/09/03 13:02:59 INFO bsp.BSPJobClient: Current supersteps number: 152
> 15/09/03 13:03:02 INFO bsp.BSPJobClient: Current supersteps number: 156
> 15/09/03 13:03:05 INFO bsp.BSPJobClient: The total number of supersteps: 156
> 15/09/03 13:03:05 INFO bsp.BSPJobClient: Counters: 6
> 15/09/03 13:03:05 INFO bsp.BSPJobClient:   
> org.apache.hama.bsp.JobInProgress$JobCounter
> 15/09/03 13:03:05 INFO bsp.BSPJobClient: SUPERSTEPS=156
> 15/09/03 13:03:05 INFO bsp.BSPJobClient: LAUNCHED_TASKS=160
> 15/09/03 13:03:05 INFO bsp.BSPJobClient:   
> org.apache.hama.bsp.BSPPeerImpl$PeerCounter
> 15/09/03 13:03:05 INFO bsp.BSPJobClient: SUPERSTEP_SUM=24960
> 15/09/03 13:03:05 INFO bsp.BSPJobClient: TIME_IN_SYNC_MS=1943366
> 15/09/03 13:03:05 INFO bsp.BSPJobClient: TOTAL_MESSAGES_SENT=160
> 15/09/03 13:03:05 INFO bsp.BSPJobClient: TOTAL_MESSAGES_RECEIVED=160
> Job Finished in 187.453 seconds
> {code}
> I ran it with the max iteration set to 100. At superstep 56, I killed one task 
> manually and confirmed that the failed task recovered automatically. However, 
> the total number of supersteps was 156, not 100.
> The reason is simple: i always starts from 0. To fix this issue, we have to 
> initialize i to (int) peer.getSuperstepCount().
> {code}
> public void bsp(
>     BSPPeer<NullWritable, NullWritable, NullWritable, NullWritable, BytesWritable> peer)
>     throws IOException, SyncException, InterruptedException {
>   byte[] dummyData = new byte[sizeOfMsg];
>   String[] peers = peer.getAllPeerNames();
>   for (int i = 0; i < nSupersteps; i++) {
> {code}
> GraphJobRunner also has a similar problem. 

[jira] [Commented] (HAMA-973) GraphJob and RandBench example works incorrectly when FT is enabled.

2015-09-07 Thread Edward J. Yoon (JIRA)

[ 
https://issues.apache.org/jira/browse/HAMA-973?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14734126#comment-14734126
 ] 

Edward J. Yoon commented on HAMA-973:
-

I just committed my changes. For graphjob FT, there are more things to be 
fixed. For example, Vertex status also must be checkpointed. I'll fix them 
later.
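
For the record, the RandBench part of the problem can be reproduced with a small stand-alone simulation. This is only a sketch under assumptions: ResumeLoopDemo and totalSupersteps are hypothetical names, and the local counter `executed` stands in for the value that peer.getSuperstepCount() would report after recovery. It counts the supersteps executed when a task is restarted once, with and without starting the loop from the superstep counter.

```java
// Hypothetical simulation of the loop in RandBench; not Hama code.
class ResumeLoopDemo {
  static int totalSupersteps(int nSupersteps, int failAt, boolean resumeFromCounter) {
    int executed = 0; // stands in for peer.getSuperstepCount()

    // First attempt: runs until the task is killed at superstep `failAt`.
    for (int i = 0; i < nSupersteps; i++) {
      if (executed == failAt) {
        break; // simulated task kill
      }
      executed++;
    }

    // Recovered attempt: the buggy version restarts i at 0,
    // the fixed version starts from the superstep counter.
    int start = resumeFromCounter ? executed : 0;
    for (int i = start; i < nSupersteps; i++) {
      executed++;
    }
    return executed;
  }

  public static void main(String[] args) {
    System.out.println(totalSupersteps(100, 56, false)); // loop restarts from 0
    System.out.println(totalSupersteps(100, 56, true));  // loop resumes from the counter
  }
}
```

With 100 iterations and a kill at superstep 56, the first call gives 156 supersteps in total, which matches the log in the description, and the second gives 100.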

> GraphJob and RandBench example works incorrectly when FT is enabled.
> 
>
> Key: HAMA-973
> URL: https://issues.apache.org/jira/browse/HAMA-973
> Project: Hama
>  Issue Type: Bug
>  Components: bsp core
>Affects Versions: 0.7.0
>Reporter: Edward J. Yoon
>Assignee: Edward J. Yoon
>Priority: Critical
> Attachments: patch.txt
>
>
> Today I tested the fault tolerance function with RandBench. FT works fine, but I 
> just found that there is a bug in the RandBench program.
> {code}
> [root@cluster-0 hama-0.7.0]# bin/hama jar hama-examples-0.7.0.jar bench 100 
> 100 100
> 15/09/03 12:59:57 WARN util.NativeCodeLoader: Unable to load native-hadoop 
> library for your platform... using builtin-java classes where applicable
> 15/09/03 12:59:58 INFO Configuration.deprecation: user.name is deprecated. 
> Instead, use mapreduce.job.user.name
> 15/09/03 12:59:58 INFO bsp.BSPJobClient: Running job: job_201509031258_0002
> 15/09/03 13:00:01 INFO bsp.BSPJobClient: Current supersteps number: 0
> 15/09/03 13:00:22 INFO bsp.BSPJobClient: Current supersteps number: 2
> 15/09/03 13:00:26 INFO bsp.BSPJobClient: Current supersteps number: 5
> 15/09/03 13:00:29 INFO bsp.BSPJobClient: Current supersteps number: 11
> 15/09/03 13:00:32 INFO bsp.BSPJobClient: Current supersteps number: 16
> 15/09/03 13:00:35 INFO bsp.BSPJobClient: Current supersteps number: 21
> 15/09/03 13:00:38 INFO bsp.BSPJobClient: Current supersteps number: 28
> 15/09/03 13:00:41 INFO bsp.BSPJobClient: Current supersteps number: 35
> 15/09/03 13:00:44 INFO bsp.BSPJobClient: Current supersteps number: 42
> 15/09/03 13:00:47 INFO bsp.BSPJobClient: Current supersteps number: 49
> 15/09/03 13:00:50 INFO bsp.BSPJobClient: Current supersteps number: 56
> 15/09/03 13:02:05 INFO bsp.BSPJobClient: Current supersteps number: 0
> 15/09/03 13:02:08 INFO bsp.BSPJobClient: Current supersteps number: 56
> 15/09/03 13:02:11 INFO bsp.BSPJobClient: Current supersteps number: 0
> 15/09/03 13:02:20 INFO bsp.BSPJobClient: Current supersteps number: 57
> 15/09/03 13:02:23 INFO bsp.BSPJobClient: Current supersteps number: 61
> 15/09/03 13:02:26 INFO bsp.BSPJobClient: Current supersteps number: 67
> 15/09/03 13:02:29 INFO bsp.BSPJobClient: Current supersteps number: 72
> 15/09/03 13:02:32 INFO bsp.BSPJobClient: Current supersteps number: 77
> 15/09/03 13:02:35 INFO bsp.BSPJobClient: Current supersteps number: 84
> 15/09/03 13:02:38 INFO bsp.BSPJobClient: Current supersteps number: 91
> 15/09/03 13:02:41 INFO bsp.BSPJobClient: Current supersteps number: 97
> 15/09/03 13:02:44 INFO bsp.BSPJobClient: Current supersteps number: 106
> 15/09/03 13:02:47 INFO bsp.BSPJobClient: Current supersteps number: 113
> 15/09/03 13:02:50 INFO bsp.BSPJobClient: Current supersteps number: 125
> 15/09/03 13:02:53 INFO bsp.BSPJobClient: Current supersteps number: 134
> 15/09/03 13:02:56 INFO bsp.BSPJobClient: Current supersteps number: 144
> 15/09/03 13:02:59 INFO bsp.BSPJobClient: Current supersteps number: 152
> 15/09/03 13:03:02 INFO bsp.BSPJobClient: Current supersteps number: 156
> 15/09/03 13:03:05 INFO bsp.BSPJobClient: The total number of supersteps: 156
> 15/09/03 13:03:05 INFO bsp.BSPJobClient: Counters: 6
> 15/09/03 13:03:05 INFO bsp.BSPJobClient:   
> org.apache.hama.bsp.JobInProgress$JobCounter
> 15/09/03 13:03:05 INFO bsp.BSPJobClient: SUPERSTEPS=156
> 15/09/03 13:03:05 INFO bsp.BSPJobClient: LAUNCHED_TASKS=160
> 15/09/03 13:03:05 INFO bsp.BSPJobClient:   
> org.apache.hama.bsp.BSPPeerImpl$PeerCounter
> 15/09/03 13:03:05 INFO bsp.BSPJobClient: SUPERSTEP_SUM=24960
> 15/09/03 13:03:05 INFO bsp.BSPJobClient: TIME_IN_SYNC_MS=1943366
> 15/09/03 13:03:05 INFO bsp.BSPJobClient: TOTAL_MESSAGES_SENT=160
> 15/09/03 13:03:05 INFO bsp.BSPJobClient: TOTAL_MESSAGES_RECEIVED=160
> Job Finished in 187.453 seconds
> {code}
> I ran it with the max iteration set to 100. At superstep 56, I killed one task 
> manually and confirmed that the failed task recovered automatically. However, 
> the total number of supersteps was 156, not 100.
> The reason is simple: i always starts from 0. To fix this issue, we have to 
> initialize i to (int) peer.getSuperstepCount().
> {code}
> public void bsp(
>     BSPPeer<NullWritable, NullWritable, NullWritable, NullWritable, BytesWritable> peer)
>     throws IOException, SyncException, InterruptedException {
>   byte[] dummyData = new byte[sizeOfMsg];
>   String[] peers = 
