[
https://issues.apache.org/jira/browse/HAMA-941?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15265532#comment-15265532
]
Edward J. Yoon commented on HAMA-941:
-------------------------------------
First of all, the boundary score factor (fB) appears to always be 0.0. It should
be a user-defined parameter. Second, if the vertex count is at most 1 (vC <= 1),
the score should be 1.0.
Please apply my patch and test again. Do you see more bugs?
{code}
diff --git a/ml/src/main/java/org/apache/hama/ml/semiclustering/SemiClusteringVertex.java b/ml/src/main/java/org/apache/hama/ml/semiclustering/SemiClusteringVertex.java
index 9a905c1..38481fd 100644
--- a/ml/src/main/java/org/apache/hama/ml/semiclustering/SemiClusteringVertex.java
+++ b/ml/src/main/java/org/apache/hama/ml/semiclustering/SemiClusteringVertex.java
@@ -71,7 +71,7 @@
candidates.add(msg);
if (!msg.contains(this.getVertexID())
- && msg.size() == semiClusterMaximumVertexCount) {
+ && msg.size() < semiClusterMaximumVertexCount) {
SemiClusterMessage msgNew = WritableUtils.clone(msg, this.getConf());
msgNew.addVertex(this);
msgNew.setSemiClusterId("C"
@@ -149,14 +149,15 @@
* @return the value to calcualte the Score of a semi-cluster.
*/
public double semiClusterScoreCalcuation(SemiClusterMessage message) {
- double iC = 0.0, bC = 0.0, fB = 0.0, sC = 0.0;
- int vC = 0, eC = 0;
+ // TODO fB is the bounday score factor. This should be configurable by user
+ // the default is 0.5
+ double iC = 0.0, bC = 0.0, fB = 0.5, sC = 0.0;
+ int vC = 0;
vC = message.size();
for (Vertex<Text, DoubleWritable, SemiClusterMessage> v : message
.getVertexList()) {
List<Edge<Text, DoubleWritable>> eL = v.getEdges();
for (Edge<Text, DoubleWritable> e : eL) {
- eC++;
if (message.contains(e.getDestinationVertexID())
&& e.getValue() != null) {
iC = iC + e.getValue().get();
@@ -165,8 +166,12 @@
}
}
}
+
if (vC > 1)
- sC = ((iC - fB * bC) / ((vC * (vC - 1)) / 2)) / eC;
+ sC = ((iC - fB * bC) / ((vC * (vC - 1)) / 2));
+ else
+ sC = 1.0;
+
return sC;
}
{code}
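For reference, the score formula after the patch can be sketched as a standalone method. This is only an illustration, not the Hama class itself: the class name SemiClusterScoreSketch and the parameter names are hypothetical, and it assumes bC holds the summed weight of edges leaving the cluster, since that part of the method is elided in the hunk above. The boundary factor is passed as an argument here, which is one way the TODO about making fB configurable might be addressed.

```java
/** Hypothetical standalone sketch of the patched score formula; not Hama's actual class. */
final class SemiClusterScoreSketch {

  private SemiClusterScoreSketch() {
  }

  /**
   * @param innerWeight    summed weight of edges whose destination is inside the cluster (iC)
   * @param boundaryWeight summed weight of edges leaving the cluster (bC, assumed)
   * @param boundaryFactor boundary score factor (fB); the patch hard-codes 0.5
   * @param vertexCount    number of vertices in the semi-cluster (vC)
   * @return the semi-cluster score (sC)
   */
  static double score(double innerWeight, double boundaryWeight,
      double boundaryFactor, int vertexCount) {
    // A cluster with zero or one vertex has no possible internal edges,
    // so the patch assigns it a fixed score of 1.0.
    if (vertexCount <= 1) {
      return 1.0;
    }
    // Number of possible vertex pairs, vC * (vC - 1) / 2, computed in
    // floating point to avoid int overflow for large clusters.
    double possiblePairs = (double) vertexCount * (vertexCount - 1) / 2.0;
    return (innerWeight - boundaryFactor * boundaryWeight) / possiblePairs;
  }
}
```

With iC = 3.0, bC = 2.0, fB = 0.5 and vC = 3, this gives (3.0 - 1.0) / 3, roughly 0.667. Unlike the old code, the result is no longer divided by the edge count eC.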
> Semiclustering Termination
> --------------------------
>
> Key: HAMA-941
> URL: https://issues.apache.org/jira/browse/HAMA-941
> Project: Hama
> Issue Type: Improvement
> Components: examples, graph
> Reporter: Edward J. Yoon
> Priority: Minor
>
> Currently the Semiclustering example terminates when the number of
> iterations exceeds the predefined maximum iteration threshold.
> The app should also stop when there are no cluster changes (I guess). Please
> check and improve it.