[jira] [Updated] (SPARK-3190) Creation of large graph(over 2.5B nodes) seems to be broken:possible overflow somewhere
[ https://issues.apache.org/jira/browse/SPARK-3190?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

npanj updated SPARK-3190:
-------------------------
    Description:
While creating a graph with 6B nodes and 12B edges, I noticed that the 'numVertices' API returns an incorrect result; 'numEdges' reports the correct number. A few times (with different datasets of more than 2.5B nodes) I have also noticed that numVertices is returned as a negative number, so I suspect that there is an overflow somewhere (maybe we are using Int for some field?).

Here are some details of the experiments I have done so far:

1. Input: numNodes=6101995593 ; noEdges=12163784626
   Graph returns: numVertices=1807028297 ; numEdges=12163784626
2. Input: numNodes=2157586441 ; noEdges=2747322705
   Graph returns: numVertices=-2137380855 ; numEdges=2747322705
3. Input: numNodes=1725060105 ; noEdges=204176821
   Graph returns: numVertices=1725060105 ; numEdges=2041768213

You can find the code to reproduce this bug here:
https://gist.github.com/npanj/92e949d86d08715bf4bf

Note: Nodes are labeled 1...6B.

  was: the same description without the final note on node labels.

> Creation of large graph(over 2.5B nodes) seems to be broken:possible overflow somewhere
> ---------------------------------------------------------------------------------------
>
>                 Key: SPARK-3190
>                 URL: https://issues.apache.org/jira/browse/SPARK-3190
>             Project: Spark
>          Issue Type: Bug
>          Components: GraphX
>    Affects Versions: 1.0.3
>         Environment: Standalone mode running on EC2. Using latest code from master branch up to commit #db56f2df1b8027171da1b8d2571d1f2ef1e103b6.
>            Reporter: npanj
>            Priority: Critical

--
This message was sent by Atlassian JIRA
(v6.2#6252)

---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org
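The vertex counts in the report are consistent with a 64-bit count being narrowed to a 32-bit signed integer somewhere along the way. The sketch below is only arithmetic on the numbers quoted in the report, not a diagnosis of where GraphX does the narrowing. `to_int32` is a hypothetical helper (not part of Spark or GraphX) that imitates what a Long-to-Int conversion does on the JVM: keep the low 32 bits and read them as two's complement.

```python
def to_int32(n: int) -> int:
    """Keep the low 32 bits of n and reinterpret them as a signed value.

    Hypothetical helper: imitates a JVM Long-to-Int narrowing conversion.
    """
    low = n & 0xFFFFFFFF          # discard everything above bit 31
    if low >= 1 << 31:            # sign bit set: value is negative
        return low - (1 << 32)
    return low


# (input numNodes, numVertices the graph returned), copied from the report
reported = [
    (6101995593, 1807028297),
    (2157586441, -2137380855),
    (1725060105, 1725060105),
]

for num_nodes, num_vertices in reported:
    wrapped = to_int32(num_nodes)
    print(num_nodes, "->", wrapped, "| matches report:", wrapped == num_vertices)
```

All three cases match: the first input exceeds 2^32 and wraps to a smaller positive number, the second lies between 2^31 and 2^32 and wraps to a negative number, and the third is below 2^31 and comes back unchanged. That pattern would fit the reporter's suspicion that an Int is used for some field, while 'numEdges' appears to be carried as a 64-bit value throughout.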
[jira] [Updated] (SPARK-3190) Creation of large graph(over 2.5B nodes) seems to be broken:possible overflow somewhere
[ https://issues.apache.org/jira/browse/SPARK-3190?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

npanj updated SPARK-3190:
-------------------------
    Environment: Standalone mode running on EC2. Using latest code from master branch up to commit #db56f2df1b8027171da1b8d2571d1f2ef1e103b6.  (was: Standalone mode running on EC2)

> Creation of large graph(over 2.5B nodes) seems to be broken:possible overflow somewhere
> ---------------------------------------------------------------------------------------
>
>                 Key: SPARK-3190
>                 URL: https://issues.apache.org/jira/browse/SPARK-3190
>             Project: Spark
>          Issue Type: Bug
>          Components: GraphX
>    Affects Versions: 1.0.3
>         Environment: Standalone mode running on EC2. Using latest code from master branch up to commit #db56f2df1b8027171da1b8d2571d1f2ef1e103b6.
>            Reporter: npanj
>            Priority: Critical
[jira] [Updated] (SPARK-3190) Creation of large graph(over 2.5B nodes) seems to be broken:possible overflow somewhere
[ https://issues.apache.org/jira/browse/SPARK-3190?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

npanj updated SPARK-3190:
-------------------------
    Description:
While creating a graph with 6B nodes and 12B edges, I noticed that the 'numVertices' API returns an incorrect result; 'numEdges' reports the correct number. A few times (with different datasets of more than 2.5B nodes) I have also noticed that numVertices is returned as a negative number, so I suspect that there is an overflow somewhere (maybe we are using Int for some field?).

Here are some details of the experiments I have done so far:

1. Input: numNodes=6101995593 ; noEdges=12163784626
   Graph returns: numVertices=1807028297 ; numEdges=12163784626
2. Input: numNodes=2157586441 ; noEdges=2747322705
   Graph returns: numVertices=-2137380855 ; numEdges=2747322705
3. Input: numNodes=1725060105 ; noEdges=204176821
   Graph returns: numVertices=1725060105 ; numEdges=2041768213

You can find the code to reproduce this bug here:
https://gist.github.com/npanj/92e949d86d08715bf4bf

  was: identical wording.

> Creation of large graph(over 2.5B nodes) seems to be broken:possible overflow somewhere
> ---------------------------------------------------------------------------------------
>
>                 Key: SPARK-3190
>                 URL: https://issues.apache.org/jira/browse/SPARK-3190
>             Project: Spark
>          Issue Type: Bug
>          Components: GraphX
>    Affects Versions: 1.0.3
>         Environment: Standalone mode running on EC2
>            Reporter: npanj
>            Priority: Critical