[ https://issues.apache.org/jira/browse/TINKERPOP-2376?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17123617#comment-17123617 ]
zjxian commented on TINKERPOP-2376:
-----------------------------------

[^SampleGlobalStep.java]

Here is a modified SampleGlobalStep file. I'm not familiar with the original algorithm, but the modified version seems to work.

> Probability distribution controlled by weight when using sample step
> --------------------------------------------------------------------
>
>                 Key: TINKERPOP-2376
>                 URL: https://issues.apache.org/jira/browse/TINKERPOP-2376
>             Project: TinkerPop
>          Issue Type: Improvement
>          Components: process
>    Affects Versions: 3.4.6
>         Environment: Gremlin-Tinkerpop 3.4.6 on Fedora 32
>            Reporter: zjxian
>            Priority: Critical
>         Attachments: SampleGlobalStep.java, out.csv
>
>
> Create a simple graph with 1 central node and 3 surrounding nodes.
> Add 3 edges with equal weight (1) to form a star graph.
> Traverse from the center ( v[0] ) to the other 3 nodes, sample(1), and record the
> destination node.
> Do that 10000 times.
> Expected probability distribution:
> v[1]:v[2]:v[3] = 3333:3333:3333 (1:1:1)
> What I got:
> v[1]:v[2]:v[3] = 3320:4439:2241
> I've checked some of the source files, such as
> ([https://github.com/apache/tinkerpop/blob/master/gremlin-core/src/main/java/org/apache/tinkerpop/gremlin/process/traversal/step/filter/SampleGlobalStep.java]).
> From reading that code, the distribution it produces should be about
> 1/3:4/9:2/9, which is very close to the results I got.
> I think some improvements are needed here to make "random walk" in TinkerPop
> really useful.
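For comparison, here is a minimal Java sketch of weight-proportional sampling with a single uniform draw, plus a Pearson chi-square helper for checking the reported counts against the 1:1:1 and 1/3:4/9:2/9 hypotheses. This is only an illustration: it is not TinkerPop's SampleGlobalStep and not the attached patch. The class and method names (WeightedSampleSketch, sampleIndex, countSamples, chiSquare) are made up for this example, and it assumes all weights are positive.

```java
import java.util.Arrays;
import java.util.Random;

/**
 * Illustrative sketch only. NOT TinkerPop's SampleGlobalStep and NOT the
 * attached patch; all names here are hypothetical.
 */
public class WeightedSampleSketch {

    /** Picks one index with probability weights[i] / sum(weights). Assumes positive weights. */
    public static int sampleIndex(double[] weights, Random rng) {
        double total = 0.0;
        for (double w : weights) {
            total += w;
        }
        // One uniform draw in [0, total), then walk the cumulative weights.
        double target = rng.nextDouble() * total;
        double cumulative = 0.0;
        for (int i = 0; i < weights.length; i++) {
            cumulative += weights[i];
            if (target < cumulative) {
                return i;
            }
        }
        return weights.length - 1; // guard against floating-point round-off
    }

    /** Repeats sampleIndex and counts how often each index is chosen. */
    public static int[] countSamples(double[] weights, int trials, Random rng) {
        int[] counts = new int[weights.length];
        for (int t = 0; t < trials; t++) {
            counts[sampleIndex(weights, rng)]++;
        }
        return counts;
    }

    /** Pearson chi-square statistic of observed counts against expected proportions. */
    public static double chiSquare(int[] observed, double[] proportions) {
        long n = 0;
        for (int o : observed) {
            n += o;
        }
        double stat = 0.0;
        for (int i = 0; i < observed.length; i++) {
            double expected = n * proportions[i];
            double diff = observed[i] - expected;
            stat += diff * diff / expected;
        }
        return stat;
    }

    public static void main(String[] args) {
        // Three equal weights, 10000 trials, as in the report.
        int[] counts = countSamples(new double[] {1, 1, 1}, 10000, new Random());
        System.out.println("sketch counts: " + Arrays.toString(counts));

        // Counts reported in the issue for v[1], v[2], v[3].
        int[] reported = {3320, 4439, 2241};
        System.out.println("vs 1:1:1       -> "
                + chiSquare(reported, new double[] {1 / 3.0, 1 / 3.0, 1 / 3.0}));
        System.out.println("vs 1/3:4/9:2/9 -> "
                + chiSquare(reported, new double[] {1 / 3.0, 4 / 9.0, 2 / 9.0}));
    }
}
```

With the reported counts, the statistic against 1:1:1 comes out in the several hundreds, while against 1/3:4/9:2/9 it stays well below 1, which fits the reading of the source given above.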
> The script I use:
> {code:java}
> // code placeholder
> conf = new BaseConfiguration()
> conf.setProperty("gremlin.tinkergraph.vertexIdManager","LONG")
> conf.setProperty("gremlin.tinkergraph.edgeIdManager","LONG")
> conf.setProperty("gremlin.tinkergraph.vertexPropertyIdManager","LONG");
> graph = TinkerGraph.open(conf)
> g = graph.traversal()
> // one center vertex (id 0) and three surrounding vertices (ids 1..3)
> for(i=0;i<=3;i++){
>     g.addV().iterate()
> }
> // connect the center to each surrounding vertex with weight 1
> for(i=1;i<=3;i++){
>     g.V(0).addE("connect").property("weight",1).to(g.V(i)).iterate()
> }
> ["bash", "-c", "rm -f out.csv"].execute().waitFor()
> file = new File("out.csv")
> file.append("id\r\n")
> // sample one outgoing edge by weight and record the destination vertex id
> for(i=0;i<10000;i++){
>     g.V(0).outE().sample(1).by("weight").otherV().map{file.append(it.get().id()+"\r\n")}.iterate()
> }
> {code}
> See the result in the attached out.csv.

--
This message was sent by Atlassian Jira
(v8.3.4#803005)