zjxian created TINKERPOP-2376:
---------------------------------
Summary: Probability distribution controlled by weight when using
sample step
Key: TINKERPOP-2376
URL: https://issues.apache.org/jira/browse/TINKERPOP-2376
Project: TinkerPop
Issue Type: New Feature
Components: process
Affects Versions: 3.4.6
Environment: Gremlin-Tinkerpop 3.4.6 on Fedora 32
Reporter: zjxian
Attachments: out.csv
Create a simple graph with 1 central node and 3 surrounding nodes.
Add 3 edges with equal weight (1) to form a star graph.
Traverse from the center (v[0]) to the other 3 nodes, apply sample(1) weighted by
"weight", and record the destination node.
Repeat this 10000 times.
Expected probability distribution:
v[1]:v[2]:v[3] = 3333:3333:3333 (1:1:1)
What I got:
v[1]:v[2]:v[3] = 3320:4439:2241
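To put a number on how far these counts are from 1:1:1, a Pearson chi-square goodness-of-fit check can be run on them. The following is only a minimal sketch: the class and method names are hypothetical and not part of TinkerPop, and the threshold used is the standard 5% critical value for 2 degrees of freedom (about 5.991).

```java
// Sketch: Pearson chi-square goodness-of-fit check for the counts reported above.
// GoodnessOfFitSketch and chiSquare are hypothetical names, not TinkerPop API.
final class GoodnessOfFitSketch {

    /** Returns sum((observed - expected)^2 / expected) for the given expected proportions. */
    static double chiSquare(long[] observed, double[] expectedProportions) {
        long total = 0;
        for (long count : observed) {
            total += count;
        }
        double statistic = 0.0;
        for (int i = 0; i < observed.length; i++) {
            double expected = total * expectedProportions[i];
            double diff = observed[i] - expected;
            statistic += diff * diff / expected;
        }
        return statistic;
    }

    public static void main(String[] args) {
        long[] observed = {3320, 4439, 2241};            // v[1], v[2], v[3] from the report
        double[] uniform = {1.0 / 3, 1.0 / 3, 1.0 / 3};  // expected 1:1:1
        double statistic = chiSquare(observed, uniform);
        // With 2 degrees of freedom, the 5% critical value is about 5.991.
        System.out.printf("chi-square = %.2f, rejects uniform at 5%%: %b%n",
                statistic, statistic > 5.991);
    }
}
```

A statistic far above the critical value indicates that the skew is very unlikely to be sampling noise from a truly uniform choice.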
I've checked the source, in particular SampleGlobalStep
([https://github.com/apache/tinkerpop/blob/master/gremlin-core/src/main/java/org/apache/tinkerpop/gremlin/process/traversal/step/filter/SampleGlobalStep.java]).
According to that code, the probability distribution should be about 1/3:4/9:2/9,
which is very close to the results I got.
I think some improvement is needed here to make "random walk" in TinkerPop
really useful.
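One selection rule that yields exactly 1/3:4/9:2/9 for three equal weights is to walk the candidates in order, draw a fresh random number at each one, and accept it when that number is at most (running cumulative weight / total weight). This is an assumption about how such a bias can arise, not a verified reading of SampleGlobalStep. The sketch below computes the exact probabilities under that assumed rule and contrasts it with a single-draw weighted pick; all class and method names are hypothetical.

```java
import java.util.Random;

// Sketch only: WeightedSampleSketch and its methods are hypothetical, not TinkerPop API.
final class WeightedSampleSketch {

    /**
     * Exact selection probabilities when a fresh random number is compared
     * against the running cumulative weight at every candidate (assumed rule).
     */
    static double[] sequentialAcceptProbabilities(double[] weights) {
        double total = 0.0;
        for (double w : weights) {
            total += w;
        }
        double[] probabilities = new double[weights.length];
        double notYetChosen = 1.0; // probability that no earlier candidate was accepted
        double running = 0.0;
        for (int i = 0; i < weights.length; i++) {
            running += weights[i];
            double accept = Math.min(1.0, running / total);
            probabilities[i] = notYetChosen * accept;
            notYetChosen *= (1.0 - accept);
        }
        return probabilities;
    }

    /** Single-draw weighted pick: one random number in [0, total), then walk the cumulative weights. */
    static int pickWeighted(double[] weights, Random random) {
        double total = 0.0;
        for (double w : weights) {
            total += w;
        }
        double target = random.nextDouble() * total;
        double running = 0.0;
        for (int i = 0; i < weights.length; i++) {
            running += weights[i];
            if (target < running) {
                return i;
            }
        }
        return weights.length - 1; // guard against floating-point rounding
    }

    public static void main(String[] args) {
        double[] weights = {1, 1, 1};
        double[] biased = sequentialAcceptProbabilities(weights);
        System.out.printf("sequential accept: %.4f %.4f %.4f%n", biased[0], biased[1], biased[2]);

        int[] counts = new int[weights.length];
        Random random = new Random(42);
        for (int i = 0; i < 10000; i++) {
            counts[pickWeighted(weights, random)]++;
        }
        System.out.printf("single draw: %d %d %d%n", counts[0], counts[1], counts[2]);
    }
}
```

With the single-draw approach each candidate is chosen with probability weight / total, independent of its position in the iteration order, which is the behavior a weighted random walk needs.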
the script i use:
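{code:java}
// Use LONG ids for vertices, edges, and vertex properties
conf = new BaseConfiguration()
conf.setProperty("gremlin.tinkergraph.vertexIdManager","LONG")
conf.setProperty("gremlin.tinkergraph.edgeIdManager","LONG")
conf.setProperty("gremlin.tinkergraph.vertexPropertyIdManager","LONG")
graph = TinkerGraph.open(conf)
g = graph.traversal()
// Create 4 vertices: v[0] is the center, v[1]..v[3] are the leaves
for(i=0;i<=3;i++){
g.addV().iterate()
}
// Connect the center to each leaf with an edge of weight 1
for(i=1;i<=3;i++){
g.V(0).addE("connect").property("weight",1).to(g.V(i)).iterate()
}
// Start from a fresh out.csv containing only the header line
["bash", "-c", "rm -f out.csv"].execute().waitFor()
file = new File("out.csv")
file.append("id\r\n")
// Sample one outgoing edge by weight 10000 times and record the destination id
for(i=0;i<10000;i++){
g.V(0).outE().sample(1).by("weight").otherV().map{file.append(it.get().id()+"\r\n")}.iterate()
}
{code}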
See the results in the attached out.csv.
--
This message was sent by Atlassian Jira
(v8.3.4#803005)