[GitHub] flink pull request: Hits
Github user mfahimazizi closed the pull request at: https://github.com/apache/flink/pull/765 --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. ---
[GitHub] flink pull request: Hits
Github user vasia commented on the pull request: https://github.com/apache/flink/pull/765#issuecomment-149148599 Hi @mfahimazizi, is there any update regarding this PR? If you're not planning to keep working on it, could you please close this? Thanks a lot! --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. ---
[GitHub] flink pull request: Hits
Github user vasia commented on the pull request: https://github.com/apache/flink/pull/765#issuecomment-140903149 Thanks for the explanation @mfahimazizi. So, if you put the edge label inside the message that you send in the messaging function, then you will know which values to add during odd and even supersteps. All you have to do is create a message that contains both the node value and the edge label. --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. ---
[GitHub] flink pull request: Hits
Github user mfahimazizi commented on the pull request: https://github.com/apache/flink/pull/765#issuecomment-140871748 Hey @vasia, Thank you. sorry it is "calculate node values". first we label the edge as 'H' (hub score), then undirected the graph and label the new edges as 'A' (authority score). in VertexUpdateFunction, in odd iteration we first find edges which has 'H' label and sum its nodes value and even iteration we find edges which has 'A' label and sum its nodes value. --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. ---
[GitHub] flink pull request: Hits
Github user vasia commented on the pull request: https://github.com/apache/flink/pull/765#issuecomment-140669850 Hey @mfahimazizi, thank you for your reply! Could you give me some details on what you mean by "calculate edges values"? What is the computation that you want to do and why can't you include it in the messaging function? Do you want to update the edge values or simply read them? Let me know if I can help :) --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. ---
[GitHub] flink pull request: Hits
Github user mfahimazizi commented on the pull request: https://github.com/apache/flink/pull/765#issuecomment-140633835 Hi @vasia, Thank you. for this algorithm, we need to calculate edges values inside VertexUpdateFunction, so I will try to build my own delta iteration. --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. ---
[GitHub] flink pull request: Hits
Github user vasia commented on the pull request: https://github.com/apache/flink/pull/765#issuecomment-140314739 Hi @mfahimazizi, as I said, it is not possible to access the edge values inside the `VertexUpdateFunction`. However, you can get the same functionality if you add the edge value inside the message that you create in the `MessagingFunction`. Alternatively, you can build your own delta iteration, instead of using the vertex-centric model. If you're planning to finish this PR, then let us know and we can even sync on skype or so to help you out! Otherwise, please close this PR and hopefully someone else will pick up this issue. Thank you! --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. ---
[GitHub] flink pull request: Hits
Github user mfahimazizi commented on the pull request: https://github.com/apache/flink/pull/765#issuecomment-140280883 @andralungu and @vasia , we are outside germany :( we need to access edges inside vertexupdate function (not inside messaging function), as we mentioned in above comments, --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. ---
[GitHub] flink pull request: Hits
Github user andralungu commented on the pull request: https://github.com/apache/flink/pull/765#issuecomment-130016029 @vasia , I guess the guys already finished the IMPRO3 course. What is the status of this PR @mfahimazizi? Do you have time to address the comments? Are you still interested in contributing? We would be happy to answer your questions :) --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. ---
[GitHub] flink pull request: Hits
Github user vasia commented on the pull request: https://github.com/apache/flink/pull/765#issuecomment-119744587 Hi @mfahimazizi! Thank you for the update. It seems Travis is still complaining about checkstyle. Could you check again please? You can see these errors locally as well, if you run `mvn verify` inside `flink-gelly`. You are right that inside the `VertexUpdateFunction` you cannot access the edge values. However, you could add the edge value inside the message that you create in the `MessagingFunction` (you can access the edges there with `getEdges()`. Would that work for you? --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. ---
[GitHub] flink pull request: Hits
Github user mfahimazizi commented on the pull request: https://github.com/apache/flink/pull/765#issuecomment-119188834 Hi @vasia, sorry for late reply. I set indentation with tabs instead of spaces in IDE. and push new updated files. as I discuss in jira issue about vertexupdate function, where we couldn't access edges inside this function. if we overload this function with following, then we can identify Hub and Authority edges in undirected graph. ```public void updateVertex(Vertex vertex, MessageIterator,Double>> inMessages)``` --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. ---
[GitHub] flink pull request: Hits
Github user vasia commented on the pull request: https://github.com/apache/flink/pull/765#issuecomment-118161158 Hi there! What is the status of this PR? It seems that travis is failing with checkstyle violations related to using spaces instead of tabs. Let us know if you need further help! --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. ---
[GitHub] flink pull request: Hits
Github user andralungu commented on the pull request: https://github.com/apache/flink/pull/765#issuecomment-108634942 Hi @mfahimazizi, The common practice is to prefix the commit with the Jira issue, e.g. [FLINK-20XX][gelly]. Also, you should play with your IDE's settings. Right now, indentation is performed with spaces and it should be with tabs. This is why Travis is failing. To make sure the code works do a cd flink-staging/flink-gelly and then mvn verify. A success there equals a travis success 98% of the time :) After implementing an algorithm, it's always good to check whether it works correctly by writing a test (check the test/example/ folder :) ). I would also update the documentation here. In the Library method section add an entry "*HITS", perhaps with the full name of the acronym as well. I left some comments in-line. Overall this does not look bad at all! --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. ---
[GitHub] flink pull request: Hits
Github user andralungu commented on a diff in the pull request: https://github.com/apache/flink/pull/765#discussion_r31678387 --- Diff: flink-staging/flink-gelly/src/main/java/org/apache/flink/graph/example/HITSExample.java --- @@ -0,0 +1,131 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one + * or more contributor license agreements. See the NOTICE file + * distributed with this work for additional information + * regarding copyright ownership. The ASF licenses this file + * to you under the Apache License, Version 2.0 (the + * "License"); you may not use this file except in compliance + * with the License. You may obtain a copy of the License at + * + * http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, software + * distributed under the License is distributed on an "AS IS" BASIS, + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + * See the License for the specific language governing permissions and + * limitations under the License. + */ + +package org.apache.flink.graph.example; + +import org.apache.flink.api.common.ProgramDescription; +import org.apache.flink.api.common.functions.FlatMapFunction; +import org.apache.flink.api.common.functions.MapFunction; +import org.apache.flink.api.java.DataSet; +import org.apache.flink.api.java.ExecutionEnvironment; +import org.apache.flink.graph.Edge; +import org.apache.flink.graph.Graph; +import org.apache.flink.graph.Vertex; +import org.apache.flink.graph.library.HITS; +import org.apache.flink.graph.utils.Hits; +import org.apache.flink.graph.utils.Tuple3ToEdgeMap; +import org.apache.flink.util.Collector; + +/** + * This program is an example for HITS algorithm. + * the result is either a hub value or authority value base user selection. + * + * If no arguments are provided, the example runs with a random graph of 10 vertices + * and random edge weights. + */ + +public class HITSExample implements ProgramDescription { +@SuppressWarnings("serial") +public static void main(String[] args) throws Exception { + +if(!parseParameters(args)) { +return; +} + +ExecutionEnvironment env = ExecutionEnvironment.getExecutionEnvironment(); +DataSet> links = getEdgesDataSet(env); + +Graph network = Graph.fromDataSet(links, new MapFunction() { + +public Double map(Long value) throws Exception { +return 1.0; +} +}, env); + +// add graph to HITS class with iteration value and hub or authority enum value. +DataSet> HitsValue =network.run( +new HITS(Hits.AUTHORITY,maxIterations)).getVertices(); + +if (fileOutput) { +HitsValue.writeAsCsv(outputPath, "\n", "\t"); +} else { +HitsValue.print(); +} +//env.execute("HITS algorithm"); +} + +@Override +public String getDescription() { +return "HITS algorithm example"; +} + +// * +// UTIL METHODS +// * + +private static boolean fileOutput = false; +private static long numPages = 10; +private static String edgeInputPath = null; +private static String outputPath = null; +private static int maxIterations = 5; + +private static boolean parseParameters(String[] args) { + +if(args.length > 0) { +if(args.length != 3) { +System.err.println("Usage: PageRank "); --- End diff -- It's no longer Page Rank ;) let's not confuse people! --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. ---
[GitHub] flink pull request: Hits
Github user andralungu commented on a diff in the pull request: https://github.com/apache/flink/pull/765#discussion_r31678355 --- Diff: flink-staging/flink-gelly/src/main/java/org/apache/flink/graph/example/HITSExample.java --- @@ -0,0 +1,131 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one + * or more contributor license agreements. See the NOTICE file + * distributed with this work for additional information + * regarding copyright ownership. The ASF licenses this file + * to you under the Apache License, Version 2.0 (the + * "License"); you may not use this file except in compliance + * with the License. You may obtain a copy of the License at + * + * http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, software + * distributed under the License is distributed on an "AS IS" BASIS, + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + * See the License for the specific language governing permissions and + * limitations under the License. + */ + +package org.apache.flink.graph.example; + +import org.apache.flink.api.common.ProgramDescription; +import org.apache.flink.api.common.functions.FlatMapFunction; +import org.apache.flink.api.common.functions.MapFunction; +import org.apache.flink.api.java.DataSet; +import org.apache.flink.api.java.ExecutionEnvironment; +import org.apache.flink.graph.Edge; +import org.apache.flink.graph.Graph; +import org.apache.flink.graph.Vertex; +import org.apache.flink.graph.library.HITS; +import org.apache.flink.graph.utils.Hits; +import org.apache.flink.graph.utils.Tuple3ToEdgeMap; +import org.apache.flink.util.Collector; + +/** + * This program is an example for HITS algorithm. + * the result is either a hub value or authority value base user selection. + * + * If no arguments are provided, the example runs with a random graph of 10 vertices + * and random edge weights. + */ + +public class HITSExample implements ProgramDescription { +@SuppressWarnings("serial") +public static void main(String[] args) throws Exception { + +if(!parseParameters(args)) { +return; +} + +ExecutionEnvironment env = ExecutionEnvironment.getExecutionEnvironment(); +DataSet> links = getEdgesDataSet(env); + +Graph network = Graph.fromDataSet(links, new MapFunction() { + +public Double map(Long value) throws Exception { +return 1.0; +} +}, env); + +// add graph to HITS class with iteration value and hub or authority enum value. +DataSet> HitsValue =network.run( +new HITS(Hits.AUTHORITY,maxIterations)).getVertices(); + +if (fileOutput) { +HitsValue.writeAsCsv(outputPath, "\n", "\t"); +} else { +HitsValue.print(); +} +//env.execute("HITS algorithm"); --- End diff -- The env.execute() caused problems for you because you were only testing print(). You don't need to do an env.execute() on the else branch as print() now has the same status as count() and collect(), but this command is still needed on the then branch(i.e. after writeAsCsv). --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. ---
[GitHub] flink pull request: Hits
Github user andralungu commented on a diff in the pull request: https://github.com/apache/flink/pull/765#discussion_r31677964 --- Diff: flink-staging/flink-gelly/src/main/java/org/apache/flink/graph/library/HITS.java --- @@ -0,0 +1,180 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one + * or more contributor license agreements. See the NOTICE file + * distributed with this work for additional information + * regarding copyright ownership. The ASF licenses this file + * to you under the Apache License, Version 2.0 (the + * "License"); you may not use this file except in compliance + * with the License. You may obtain a copy of the License at + * + * http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, software + * distributed under the License is distributed on an "AS IS" BASIS, + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + * See the License for the specific language governing permissions and + * limitations under the License. + */ + +package org.apache.flink.graph.library; +import org.apache.flink.api.common.aggregators.LongSumAggregator; +import org.apache.flink.api.common.functions.MapFunction; +import org.apache.flink.graph.Edge; +import org.apache.flink.graph.Graph; +import org.apache.flink.graph.GraphAlgorithm; +import org.apache.flink.graph.Vertex; +import org.apache.flink.graph.spargel.MessageIterator; +import org.apache.flink.graph.spargel.MessagingFunction; +import org.apache.flink.graph.spargel.VertexUpdateFunction; +import org.apache.flink.graph.utils.Hits; + +import java.io.Serializable; + + +/** + * + * This class implements the HITS algorithm by using flink Gelly API + *Hyperlink-Induced Topic Search (HITS; also known as hubs and authorities) is a link analysis algorithm that rates Web pages, + *developed by Jon Kleinberg. + * + * The algorithm performs a series of iterations, each consisting of two basic steps: + * + * Authority Update: Update each node's Authority score to be equal to the sum of the Hub Scores of each node that + * points to it. + * That is, a node is given a high authority score by being linked from pages that are recognized as Hubs for information. + * Hub Update: Update each node's Hub Score to be equal to the sum of the Authority Scores of each node that it + * points to. + * That is, a node is given a high hub score by linking to nodes that are considered to be authorities on the subject. + * + * The Hub score and Authority score for a node is calculated with the following algorithm: + * *Start with each node having a hub score and authority score of 1. + * *Run the Authority Update Rule + * *Run the Hub Update Rule + * *Normalize the values by dividing each Hub score by square root of the sum of the squares of all Hub scores, and + * dividing each Authority score by square root of the sum of the squares of all Authority scores. + * *Repeat from the second step as necessary. + * + * http://en.wikipedia.org/wiki/HITS_algorithm --- End diff -- I would add an @see annotation for the link here, check the community detection library method. --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. ---
[GitHub] flink pull request: Hits
Github user hsaputra commented on a diff in the pull request: https://github.com/apache/flink/pull/765#discussion_r31596508 --- Diff: flink-staging/flink-gelly/src/main/java/org/apache/flink/graph/example/HITSExample.java --- @@ -0,0 +1,113 @@ +package org.apache.flink.graph.example; --- End diff -- Missing Apache license header --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. ---
[GitHub] flink pull request: Hits
Github user hsaputra commented on a diff in the pull request: https://github.com/apache/flink/pull/765#discussion_r31596519 --- Diff: flink-staging/flink-gelly/src/main/java/org/apache/flink/graph/library/HITS.java --- @@ -0,0 +1,163 @@ +package org.apache.flink.graph.library; --- End diff -- Missing Apache license header --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. ---
[GitHub] flink pull request: Hits
Github user hsaputra commented on a diff in the pull request: https://github.com/apache/flink/pull/765#discussion_r31596533 --- Diff: flink-staging/flink-gelly/src/main/java/org/apache/flink/graph/utils/Hits.java --- @@ -0,0 +1,14 @@ +package org.apache.flink.graph.utils; --- End diff -- Missing Apache license header --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. ---
[GitHub] flink pull request: Hits
GitHub user mfahimazizi opened a pull request: https://github.com/apache/flink/pull/765 Hits HITS algorithm in Gelly. You can merge this pull request into a Git repository by running: $ git pull https://github.com/mfahimazizi/flink HITS Alternatively you can review and apply these changes as the patch at: https://github.com/apache/flink/pull/765.patch To close this pull request, make a commit to your master/trunk branch with (at least) the following in the commit message: This closes #765 commit e3fe353a5a4fbca35a65672214f451cda58de6ab Author: mfahimazizi Date: 2015-05-30T21:53:30Z HITS algorithm added commit 646660272c4c9b4152d2ebfc4bfbd6e04d891408 Author: mfahimazizi Date: 2015-06-02T23:23:15Z HITS algorithm added_ --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. ---
[GitHub] flink pull request: Hits
Github user JavidMayar closed the pull request at: https://github.com/apache/flink/pull/694 --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. ---
[GitHub] flink pull request: Hits
GitHub user JavidMayar opened a pull request: https://github.com/apache/flink/pull/694 Hits You can merge this pull request into a Git repository by running: $ git pull https://github.com/JavidMayar/flink HITS Alternatively you can review and apply these changes as the patch at: https://github.com/apache/flink/pull/694.patch To close this pull request, make a commit to your master/trunk branch with (at least) the following in the commit message: This closes #694 commit 724be153d751d409e149a874fbd962e0fd52e782 Author: JavidMayar Date: 2015-05-19T00:46:33Z HITS algorithm added commit 73d5e1dde32714d774ff11b0ec209d90537bea39 Author: JavidMayar Date: 2015-05-19T11:13:03Z Updated Version of Hits algorithm --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. ---