[jira] [Updated] (TINKERPOP-3074) The sample() step is largely unusable with large graphs

Kelvin Lawrence (Jira) Mon, 22 Apr 2024 09:49:45 -0700


     [ 
https://issues.apache.org/jira/browse/TINKERPOP-3074?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]


Kelvin Lawrence updated TINKERPOP-3074:
---------------------------------------
    Description: 
While the `sample` step can be useful with small amounts of data for 
random walks and similar, its current implementation makes it unusable with 
large graphs if you are looking to sample, say, one node from a graph with 
millions or billions of nodes in it.
{code:java}
// This generally works assuming the out() step yields limited numbers of nodes
g.V(1).out().sample(1).out().sample(1) //etc

// This fails for a large graph, usually with an OOM error
g.V().sample(1){code}
The current implementation of sample() is quite naive and assumes it can fetch 
everything into memory before computing a result. I have seen many users 
wanting to start a walk from a random place, and they always try to do 
{color:#0747a6}_g.V().sample(1)_{color} or 
_{color:#0747a6}g.E().sample(1){color}_ types of queries.

  was:
While the `sample` step can be useful with smallish sized amounts of data for 
random walks and similar, its current implementation makes it unusable with 
large graphs if you are looking to sample, say, one node, from a graph with a 
millions or billions of nodes in it.


{code:java}
// This generally works assuming the out() step yields limited numbers of nodes
g.V(1).out().sample(1).out.sample(1) //etc

// This fails for a large graph, usually with an OOM error
g.V().sample(1){code}
The current implementation of sample() is quite naive and assumes it can fetch 
everything into memory before computing a result. I have seen many users 
wanting to start a walk from a random place, and they always try to do 
{color:#0747a6}_g.V().sample(1)_{color} or 
_{color:#0747a6}g.E().sample(1){color}_ types of queries.


> The sample() step is largely unusable with large graphs
> -------------------------------------------------------
>
>                 Key: TINKERPOP-3074
>                 URL: https://issues.apache.org/jira/browse/TINKERPOP-3074
>             Project: TinkerPop
>          Issue Type: Improvement
>          Components: process
>            Reporter: Kelvin Lawrence
>            Priority: Major
>
> While the `sample` step can be useful with small amounts of data for 
> random walks and similar, its current implementation makes it unusable with 
> large graphs if you are looking to sample, say, one node from a graph with 
> millions or billions of nodes in it.
> {code:java}
> // This generally works assuming the out() step yields limited numbers of nodes
> g.V(1).out().sample(1).out().sample(1) //etc
> // This fails for a large graph, usually with an OOM error
> g.V().sample(1){code}
> The current implementation of sample() is quite naive and assumes it can 
> fetch everything into memory before computing a result. I have seen many 
> users wanting to start a walk from a random place, and they always try to do 
> {color:#0747a6}_g.V().sample(1)_{color} or 
> _{color:#0747a6}g.E().sample(1){color}_ types of queries.
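
For illustration only, here is a minimal sketch of reservoir sampling (Algorithm R), one well-known way to pick k items uniformly from a stream while holding only k of them in memory. This is not TinkerPop code and not a proposed patch: the class name ReservoirSample and its sample method are hypothetical, the sketch is unweighted, and it assumes the input can be consumed as a plain java.util.Iterator.

```java
import java.util.ArrayList;
import java.util.Iterator;
import java.util.List;
import java.util.Random;

/** Hypothetical helper, not part of TinkerPop: unweighted reservoir sampling (Algorithm R). */
public final class ReservoirSample {
    private ReservoirSample() {}

    /**
     * Returns up to k items chosen uniformly at random from the iterator.
     * At most k items are held in memory at any time, however long the stream is.
     */
    public static <T> List<T> sample(Iterator<T> items, int k, Random random) {
        if (k < 0) {
            throw new IllegalArgumentException("k must be non-negative");
        }
        List<T> reservoir = new ArrayList<>(k);
        long seen = 0;
        while (items.hasNext()) {
            T item = items.next();
            seen++;
            if (reservoir.size() < k) {
                // Fill phase: the first k items always enter the reservoir.
                reservoir.add(item);
            } else {
                // Pick j uniformly in [0, seen); the new item replaces slot j
                // only when j < k, which happens with probability k / seen.
                long j = (long) (random.nextDouble() * seen);
                if (j < k) {
                    reservoir.set((int) j, item);
                }
            }
        }
        return reservoir;
    }
}
```

A call such as ReservoirSample.sample(iterator, 1, new Random()) still has to visit every element once, so it addresses the memory problem described above but not the cost of a full scan.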



--
This message was sent by Atlassian Jira
(v8.20.10#820010)
