[jira] [Updated] (TINKERPOP-3074) The sample() step is largely unusable with large graphs

Yang Xia (Jira) Fri, 30 Aug 2024 09:30:28 -0700


     [ https://issues.apache.org/jira/browse/TINKERPOP-3074?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]


Yang Xia updated TINKERPOP-3074:
--------------------------------
    Affects Version/s: 3.7.2
                       3.6.7

> The sample() step is largely unusable with large graphs
> -------------------------------------------------------
>
>                 Key: TINKERPOP-3074
>                 URL: https://issues.apache.org/jira/browse/TINKERPOP-3074
>             Project: TinkerPop
>          Issue Type: Improvement
>          Components: process
>    Affects Versions: 3.6.7, 3.7.2
>            Reporter: Kelvin Lawrence
>            Priority: Major
>
> While the `sample` step can be useful with small amounts of data, for 
> random walks and similar, its current implementation makes it unusable with 
> large graphs if you are looking to sample, say, one node from a graph with 
> millions or billions of nodes in it.
> {code:java}
> // This generally works, assuming each out() step yields a limited number
> // of nodes
> g.V(1).out().sample(1).out().sample(1) // etc.
> // This fails for a large graph, usually with an OOM error
> g.V().sample(1)
> {code}
> The current implementation of sample() is quite naive and assumes it can 
> fetch everything into memory before computing a result. I have seen many 
> users who want to start a walk from a random place, and they always try 
> queries such as `g.V().sample(1)` or `g.E().sample(1)`.
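
For illustration only: one common way to draw a uniform sample without holding the whole input in memory is reservoir sampling (Algorithm R). The sketch below is not TinkerPop's code and is not a proposed patch. The class name ReservoirSample and its sample() method are hypothetical, and it assumes the candidates can be consumed as a plain java.util.Iterator. Memory use is proportional to the sample size k, not to the number of elements seen.

```java
import java.util.ArrayList;
import java.util.Iterator;
import java.util.List;
import java.util.concurrent.ThreadLocalRandom;

// Hypothetical helper, not part of TinkerPop: uniform sampling over a stream
// of unknown length using reservoir sampling (Algorithm R).
final class ReservoirSample {

    // Returns up to k items chosen uniformly at random from the iterator.
    // Only the k-element reservoir is kept in memory.
    static <T> List<T> sample(Iterator<T> items, int k) {
        if (k < 0) {
            throw new IllegalArgumentException("k must be non-negative");
        }
        List<T> reservoir = new ArrayList<>();
        long seen = 0; // number of items consumed so far
        while (items.hasNext()) {
            T item = items.next();
            seen++;
            if (reservoir.size() < k) {
                // Fill the reservoir with the first k items.
                reservoir.add(item);
            } else {
                // Pick j uniformly from [0, seen); the new item replaces
                // slot j only when j falls inside the reservoir, which
                // happens with probability k / seen.
                long j = ThreadLocalRandom.current().nextLong(seen);
                if (j < k) {
                    reservoir.set((int) j, item);
                }
            }
        }
        return reservoir;
    }
}
```

With k = 1 this corresponds to the "pick one random starting point" case described above: each element is read once and then discarded unless it is the current pick. It still has to iterate over every candidate, so it addresses the memory problem but not the cost of a full scan.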



--
This message was sent by Atlassian Jira
(v8.20.10#820010)
