[jira] [Commented] (SPARK-1153) Generalize VertexId in GraphX so that UUIDs can be used as vertex IDs.

2017-12-06 Thread Brain (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-1153?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16281271#comment-16281271
 ] 

Brain commented on SPARK-1153:
--

Why not add String or UUID support? :)

> Generalize VertexId in GraphX so that UUIDs can be used as vertex IDs.
> ----------------------------------------------------------------------
>
>                 Key: SPARK-1153
>                 URL: https://issues.apache.org/jira/browse/SPARK-1153
>             Project: Spark
>          Issue Type: Improvement
>          Components: GraphX
>    Affects Versions: 0.9.0
>            Reporter: Deepak Nulu
>
> Currently, {{VertexId}} is a type synonym for {{Long}}. I would like to be 
> able to use {{UUID}} as the vertex ID type because the data I want to process 
> with GraphX uses that type for its primary keys. Others might have a different 
> type for their primary keys. Generalizing {{VertexId}} (with a type class) 
> will help in such cases.



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-1153) Generalize VertexId in GraphX so that UUIDs can be used as vertex IDs.

2016-11-09 Thread Nicholas Tietz (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-1153?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15651453#comment-15651453
 ] 

Nicholas Tietz commented on SPARK-1153:
---

The decision we eventually made was to migrate as much of our code out of 
GraphX as we could (moving to writing more directly in Spark). We were running 
into other potential performance issues with GraphX and we could not do the 
kind of checkpointing we wanted to, so it was a workable solution for us.

In the end, we just dealt with the pain of managing consistent IDs ourselves 
and joining them in. It was not ideal, but it worked and the performance hit 
was made up for in other areas where we were able to migrate off of GraphX.




[jira] [Commented] (SPARK-1153) Generalize VertexId in GraphX so that UUIDs can be used as vertex IDs.

2016-11-09 Thread Guillem LEFAIT (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-1153?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15651367#comment-15651367
 ] 

Guillem LEFAIT commented on SPARK-1153:
---

Hi Nicholas, we have the same need here. We put off a fix until today, when we 
found that collisions had reached an arbitrary (low) level.
As JJ Zhang said, I'm not comfortable with a solution that produces a new 
ordering (and consequently new IDs) every day, but keeping a key/value 
dictionary seems costly given the amount of data we're dealing with.

Have you had a chance to experiment with the best way to solve this problem?




[jira] [Commented] (SPARK-1153) Generalize VertexId in GraphX so that UUIDs can be used as vertex IDs.

2016-03-28 Thread Reynold Xin (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-1153?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15214951#comment-15214951
 ] 

Reynold Xin commented on SPARK-1153:


The main thing is that we encode the data assuming integer IDs and use 
specialized data structures for integer IDs. If we change to generic types, the 
memory footprint will increase and performance will decrease too.
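
To make the footprint argument concrete, here is a back-of-the-envelope sketch in Java. It is not taken from GraphX: the class name IdFootprint and the layout constants (12-byte object header, 4-byte compressed references, 8-byte alignment, as is typical for a 64-bit HotSpot JVM) are assumptions, the UUID object is assumed to hold just two long fields, and array headers are ignored.

```java
/** Rough per-ID memory estimate; all layout constants are assumptions, not measurements. */
final class IdFootprint {
    static final long HEADER_BYTES = 12;   // assumed object header size
    static final long REFERENCE_BYTES = 4; // assumed compressed reference size
    static final long ALIGNMENT = 8;       // assumed object alignment

    /** Rounds an object size up to the next multiple of ALIGNMENT. */
    static long align(long size) {
        return (size + ALIGNMENT - 1) / ALIGNMENT * ALIGNMENT;
    }

    /** n IDs stored in a primitive long[]: 8 bytes each. */
    static long primitiveLongBytes(long n) {
        return n * Long.BYTES;
    }

    /** n IDs stored as UUID objects in an array: one reference plus one object each. */
    static long boxedUuidBytes(long n) {
        long uuidObject = align(HEADER_BYTES + 2L * Long.BYTES); // two long fields
        return n * (REFERENCE_BYTES + uuidObject);
    }

    public static void main(String[] args) {
        long n = 1_000_000L;
        System.out.println("long[]: " + primitiveLongBytes(n) + " bytes");
        System.out.println("UUID[]: " + boxedUuidBytes(n) + " bytes");
    }
}
```

Under these assumptions a generic UUID key costs roughly 36 bytes against 8 for a primitive long, before counting any extra cost from hashing or following references.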





[jira] [Commented] (SPARK-1153) Generalize VertexId in GraphX so that UUIDs can be used as vertex IDs.

2016-03-28 Thread Nicholas Tietz (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-1153?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15214948#comment-15214948
 ] 

Nicholas Tietz commented on SPARK-1153:
---

Thanks for the reply.

I think that GraphFrames is not quite sufficient to meet our needs here but I 
will dive in further. My focus this week is on addressing our problem with hash 
collisions in forming graph vertex ids, so you may hear more from me.

Could you say some more about where it will likely make performance regress? I 
am diving into the source this week, but pointers toward specific things to 
watch out for would be helpful.




[jira] [Commented] (SPARK-1153) Generalize VertexId in GraphX so that UUIDs can be used as vertex IDs.

2016-03-25 Thread Reynold Xin (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-1153?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15212670#comment-15212670
 ] 

Reynold Xin commented on SPARK-1153:


[~ntietz] Changing this will very likely make performance regress for long IDs, 
due to the lack of specialization. 

You might also want to look into GraphFrames for more general graph 
functionality: https://github.com/graphframes/graphframes 




[jira] [Commented] (SPARK-1153) Generalize VertexId in GraphX so that UUIDs can be used as vertex IDs.

2016-03-25 Thread Nicholas Tietz (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-1153?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15211948#comment-15211948
 ] 

Nicholas Tietz commented on SPARK-1153:
---

We are also running into this issue and would like this feature. For our use 
case and our data size, even a low risk of hash collisions is not acceptable, 
so we have to have a reliable way to form unique ids from our current unique 
string ids.

I'm going to work on a patch next week.

Since this is marked "won't fix" due to inactivity, what's the process if a PR 
is submitted? (Sorry, new to the Apache contribution process.)
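
For a sense of scale, the collision risk mentioned above can be estimated with the birthday bound. The sketch below assumes IDs are uniformly random over the full 64-bit space, which a real hash of string IDs only approximates; the class and method names are illustrative.

```java
/** Birthday-bound estimate for random 64-bit IDs (an idealized model, not a measurement). */
final class CollisionEstimate {
    /**
     * Approximate probability that at least two of n uniformly random
     * 64-bit IDs collide: 1 - exp(-n(n-1) / (2 * 2^64)).
     */
    static double collisionProbability(double n) {
        double space = Math.pow(2.0, 64);
        // expm1 keeps precision when the probability is very small
        return -Math.expm1(-n * (n - 1.0) / (2.0 * space));
    }

    public static void main(String[] args) {
        double[] sizes = { 1e6, 1e8, 1e9, 1e10 };
        for (double n : sizes) {
            System.out.println(n + " ids -> p = " + collisionProbability(n));
        }
    }
}
```

In this model the probability grows roughly with the square of the number of IDs, so a risk that is negligible for millions of vertices can become a few percent at around a billion.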




[jira] [Commented] (SPARK-1153) Generalize VertexId in GraphX so that UUIDs can be used as vertex IDs.

2015-08-21 Thread JJ Zhang (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-1153?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14706877#comment-14706877
 ] 

JJ Zhang commented on SPARK-1153:
-

We would also really like a general, customizable ID type for vertices. We've 
been using zipWithIndex to create IDs for now. However, it is a hassle 
process-wise because we never have a stable ID: any update to a new version of 
the graph with incremental input data requires a total rebuild of the vertices 
and edges, or we need separate infrastructure to serve as an ID service, with 
additional cost and maintenance. We already have unique IDs for all of our data 
entities. It would make processing and maintenance much easier if our stable 
IDs could be used directly.
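
The "ID service" idea above can be sketched as a registry that only ever appends: keys seen before keep their ID and new keys get the next free one, so an incremental load does not renumber existing vertices. This is a single-process illustration with hypothetical names (StableIdRegistry, idFor); a real deployment would need to persist and share the mapping, which is exactly the extra infrastructure described above.

```java
import java.util.HashMap;
import java.util.Map;

/** Append-only mapping from external string keys to long IDs (illustrative only). */
final class StableIdRegistry {
    private final Map<String, Long> ids = new HashMap<>();
    private long nextId = 0L;

    /** Returns the existing ID for key, or assigns the next unused one. */
    long idFor(String key) {
        Long existing = ids.get(key);
        if (existing != null) {
            return existing;
        }
        long assigned = nextId++;
        ids.put(key, assigned);
        return assigned;
    }

    public static void main(String[] args) {
        StableIdRegistry registry = new StableIdRegistry();
        // First batch of entities.
        System.out.println("user-a -> " + registry.idFor("user-a"));
        System.out.println("user-c -> " + registry.idFor("user-c"));
        // A later batch adds user-b; the earlier IDs stay the same.
        System.out.println("user-b -> " + registry.idFor("user-b"));
        System.out.println("user-a -> " + registry.idFor("user-a"));
    }
}
```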




[jira] [Commented] (SPARK-1153) Generalize VertexId in GraphX so that UUIDs can be used as vertex IDs.

2015-04-21 Thread Carlos Balduz (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-1153?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14505071#comment-14505071
 ] 

Carlos Balduz commented on SPARK-1153:
--

I am currently using zipWithUniqueId() to get a VertexId for my data, but that 
means that after getting the VertexIds, I have to go through the edge data, 
look up each of those strings, and assign the ID I got from the previous step. 

I agree it would be nice to be able to choose a different type of ID, leaving 
the user to decide whether to favor performance or usability.
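
The edge rewrite described above amounts to looking up both endpoints in the string-to-ID map. Below is a minimal, Spark-free sketch in Java with hypothetical names (EdgeTranslator, translate); on a cluster the same step would more likely be expressed as joins against the vertex-ID data than as an in-memory map.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

/** Rewrites string-keyed edges into long-keyed edges using a key-to-ID map (illustrative). */
final class EdgeTranslator {
    /** Each input edge is {srcKey, dstKey}; each output edge is {srcId, dstId}. */
    static List<long[]> translate(List<String[]> edges, Map<String, Long> ids) {
        List<long[]> out = new ArrayList<>(edges.size());
        for (String[] edge : edges) {
            out.add(new long[] { lookup(ids, edge[0]), lookup(ids, edge[1]) });
        }
        return out;
    }

    /** Fails loudly if an edge refers to a key that never received an ID. */
    private static long lookup(Map<String, Long> ids, String key) {
        Long id = ids.get(key);
        if (id == null) {
            throw new IllegalArgumentException("No vertex ID for key: " + key);
        }
        return id;
    }

    public static void main(String[] args) {
        Map<String, Long> ids = new HashMap<>();
        ids.put("alice", 0L);
        ids.put("bob", 1L);
        List<String[]> edges = new ArrayList<>();
        edges.add(new String[] { "alice", "bob" });
        long[] edge = translate(edges, ids).get(0);
        System.out.println(edge[0] + " -> " + edge[1]);
    }
}
```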




[jira] [Commented] (SPARK-1153) Generalize VertexId in GraphX so that UUIDs can be used as vertex IDs.

2014-10-30 Thread Dan Osipov (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-1153?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14190766#comment-14190766
 ] 

Dan Osipov commented on SPARK-1153:
---

FWIW, UUID.getMostSignificantBits() or getLeastSignificantBits() can be used to 
generate a Long, with low collision probability.

Using any type for the ID is still preferred.
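
The suggestion above can be sketched in a few lines of Java. getMostSignificantBits() and getLeastSignificantBits() are standard java.util.UUID methods; the wrapper class UuidToLong and the xorFold variant are hypothetical additions for illustration, not something proposed in this thread.

```java
import java.util.UUID;

/** Ways to derive a 64-bit value from a 128-bit UUID; all of them can collide. */
final class UuidToLong {
    /** Upper 64 bits of the UUID. */
    static long mostSignificant(UUID id) {
        return id.getMostSignificantBits();
    }

    /** Lower 64 bits of the UUID. */
    static long leastSignificant(UUID id) {
        return id.getLeastSignificantBits();
    }

    /** Hypothetical variant: XOR of both halves, so every input bit influences the result. */
    static long xorFold(UUID id) {
        return id.getMostSignificantBits() ^ id.getLeastSignificantBits();
    }

    public static void main(String[] args) {
        UUID id = UUID.fromString("123e4567-e89b-12d3-a456-426614174000");
        System.out.println("most significant:  " + Long.toHexString(mostSignificant(id)));
        System.out.println("least significant: " + Long.toHexString(leastSignificant(id)));
        System.out.println("xor fold:          " + Long.toHexString(xorFold(id)));
    }
}
```

Which half is safer depends on the UUID version. For random (version 4) UUIDs both halves are mostly random bits, apart from the fixed version and variant bits. For time-based (version 1) UUIDs the lower half holds the clock sequence and node, so IDs generated on one machine can share it and collide heavily.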




[jira] [Commented] (SPARK-1153) Generalize VertexId in GraphX so that UUIDs can be used as vertex IDs.

2014-08-04 Thread Larry Xiao (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-1153?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14085676#comment-14085676
 ] 

Larry Xiao commented on SPARK-1153:
---

I like npanj's approach.
It's universal: you treat the UUID as a vertex attribute.

It fits the procedure from 
http://spark.apache.org/docs/latest/graphx-programming-guide.html

// Connect to the Spark cluster
== Build Graph (build VertexId if necessary)
// Load my user data and parse into tuples of user id and attribute list
// Parse the edge data which is already in userId -> userId format
// Attach the user attributes
== Clean Graph
// Some users may not have attributes so we set them as empty
// Restrict the graph to users with usernames and names
== Compute
// Compute the PageRank
== Get Result
// Get the attributes of the top pagerank users




[jira] [Commented] (SPARK-1153) Generalize VertexId in GraphX so that UUIDs can be used as vertex IDs.

2014-05-27 Thread npanj (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-1153?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14010846#comment-14010846
 ] 

npanj commented on SPARK-1153:
--

An alternative approach that I have been using: 
1. Use a preprocessing step that maps each UUID to a Long.
2. Build the graph based on the Longs.

For the mapping in step 1:
- Rank your UUIDs.
- Use some kind of hash function?

For step 1, GraphX could provide a tool to generate the map.

I would like to hear how others are building graphs out of non-Long node types.
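
The ranking option for step 1 can be sketched as follows: sort the distinct UUIDs and use each one's position as its Long ID. This is an in-memory illustration with hypothetical names (UuidRanker, rank), not a GraphX tool; on a cluster the equivalent would be a distributed sort followed by indexing.

```java
import java.util.Arrays;
import java.util.Collection;
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.TreeSet;
import java.util.UUID;

/** Assigns dense, collision-free long IDs to UUIDs by rank (illustrative only). */
final class UuidRanker {
    /** Maps each distinct UUID to its position in UUID's natural ordering. */
    static Map<UUID, Long> rank(Collection<UUID> ids) {
        Map<UUID, Long> ranks = new LinkedHashMap<>();
        long next = 0L;
        // TreeSet removes duplicates and iterates in sorted order.
        for (UUID id : new TreeSet<>(ids)) {
            ranks.put(id, next++);
        }
        return ranks;
    }

    public static void main(String[] args) {
        Collection<UUID> ids = Arrays.asList(
                UUID.randomUUID(), UUID.randomUUID(), UUID.randomUUID());
        for (Map.Entry<UUID, Long> entry : rank(ids).entrySet()) {
            System.out.println(entry.getKey() + " -> " + entry.getValue());
        }
    }
}
```

Ranks are unique by construction, so unlike hashing there is no collision risk. The trade-off is that a rank depends on the whole input set: adding UUIDs later can shift the ranks of existing ones.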



