[jira] [Updated] (SPARK-3190) Creation of large graph(over 2.5B nodes) seems to be broken:possible overflow somewhere

2014-08-22 Thread npanj (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-3190?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

npanj updated SPARK-3190:
-

Description: 
While creating a graph with 6B nodes and 12B edges, I noticed that the 
'numVertices' API returns an incorrect result; 'numEdges' reports the correct 
number. A few times (with different datasets of > 2.5B nodes) I have also 
noticed that numVertices is returned as a negative number, so I suspect that 
there is an overflow somewhere (maybe we are using Int for some field?).

Here are some details of the experiments I have done so far: 
1. Input: numNodes=6101995593 ; noEdges=12163784626
   Graph returns: numVertices=1807028297 ; numEdges=12163784626

2. Input: numNodes=2157586441 ; noEdges=2747322705
   Graph returns: numVertices=-2137380855 ; numEdges=2747322705

3. Input: numNodes=1725060105 ; noEdges=204176821
   Graph returns: numVertices=1725060105 ; numEdges=2041768213
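
A quick arithmetic check supports the overflow suspicion: keeping only the low 32 bits of each input node count and reading them as a signed integer reproduces the reported numVertices in all three experiments. The sketch below is in Python rather than Spark code; `to_int32` is a hypothetical helper that mimics narrowing to a JVM Int, and it does not identify where in GraphX the truncation might happen.

```python
def to_int32(n: int) -> int:
    """Interpret the low 32 bits of n as a signed two's-complement integer.

    Hypothetical helper: mimics what narrowing a 64-bit count to a
    32-bit JVM Int would produce.
    """
    n &= 0xFFFFFFFF
    return n - (1 << 32) if n >= (1 << 31) else n


# (input node count, numVertices reported by the graph), copied from the report
cases = [
    (6101995593, 1807028297),
    (2157586441, -2137380855),
    (1725060105, 1725060105),
]

for num_nodes, reported in cases:
    wrapped = to_int32(num_nodes)
    print(num_nodes, "->", wrapped, "matches report:", wrapped == reported)
```

All three cases print `True`, including case 3, where the node count is below 2^31 and passes through unchanged.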

You can find the code to reproduce this bug here: 

https://gist.github.com/npanj/92e949d86d08715bf4bf

Note: Nodes are labeled 1...6B.

> Creation of large graph(over 2.5B nodes) seems to be broken:possible overflow 
> somewhere
> ---
>
> Key: SPARK-3190
> URL: https://issues.apache.org/jira/browse/SPARK-3190
> Project: Spark
> Issue Type: Bug
> Components: GraphX
> Affects Versions: 1.0.3
> Environment: Standalone mode running on EC2. Using latest code from 
> master branch up to commit #db56f2df1b8027171da1b8d2571d1f2ef1e103b6.
> Reporter: npanj
> Priority: Critical



--
This message was sent by Atlassian JIRA
(v6.2#6252)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Updated] (SPARK-3190) Creation of large graph(over 2.5B nodes) seems to be broken:possible overflow somewhere

2014-08-22 Thread npanj (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-3190?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

npanj updated SPARK-3190:
-

Environment: Standalone mode running on EC2. Using latest code from master 
branch up to commit #db56f2df1b8027171da1b8d2571d1f2ef1e103b6.  (was: 
Standalone mode running on EC2)




[jira] [Updated] (SPARK-3190) Creation of large graph(over 2.5B nodes) seems to be broken:possible overflow somewhere

2014-08-22 Thread npanj (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-3190?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

npanj updated SPARK-3190:
-

Description: 
While creating a graph with 6B nodes and 12B edges, I noticed that the 
'numVertices' API returns an incorrect result; 'numEdges' reports the correct 
number. A few times (with different datasets of > 2.5B nodes) I have also 
noticed that numVertices is returned as a negative number, so I suspect that 
there is an overflow somewhere (maybe we are using Int for some field?).

Here are some details of the experiments I have done so far: 
1. Input: numNodes=6101995593 ; noEdges=12163784626
   Graph returns: numVertices=1807028297 ; numEdges=12163784626

2. Input: numNodes=2157586441 ; noEdges=2747322705
   Graph returns: numVertices=-2137380855 ; numEdges=2747322705

3. Input: numNodes=1725060105 ; noEdges=204176821
   Graph returns: numVertices=1725060105 ; numEdges=2041768213

You can find the code to reproduce this bug here: 

https://gist.github.com/npanj/92e949d86d08715bf4bf

> Creation of large graph(over 2.5B nodes) seems to be broken:possible overflow 
> somewhere
> ---
>
> Key: SPARK-3190
> URL: https://issues.apache.org/jira/browse/SPARK-3190
> Project: Spark
> Issue Type: Bug
> Components: GraphX
> Affects Versions: 1.0.3
> Environment: Standalone mode running on EC2
> Reporter: npanj
> Priority: Critical


