[jira] [Commented] (SPARK-1153) Generalize VertexId in GraphX so that UUIDs can be used as vertex IDs.
[ https://issues.apache.org/jira/browse/SPARK-1153?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16281271#comment-16281271 ]

Brain commented on SPARK-1153:
------------------------------

:) Why not add String or UUID support?

> Generalize VertexId in GraphX so that UUIDs can be used as vertex IDs.
> ----------------------------------------------------------------------
>
>                 Key: SPARK-1153
>                 URL: https://issues.apache.org/jira/browse/SPARK-1153
>             Project: Spark
>          Issue Type: Improvement
>          Components: GraphX
>    Affects Versions: 0.9.0
>            Reporter: Deepak Nulu
>
> Currently, {{VertexId}} is a type-synonym for {{Long}}. I would like to be
> able to use {{UUID}} as the vertex ID type because the data I want to process
> with GraphX uses that type for its primary keys. Others might have a different
> type for their primary keys. Generalizing {{VertexId}} (with a type class)
> will help in such cases.

--
This message was sent by Atlassian JIRA
(v6.4.14#64029)

---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-1153) Generalize VertexId in GraphX so that UUIDs can be used as vertex IDs.
[ https://issues.apache.org/jira/browse/SPARK-1153?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15651453#comment-15651453 ]

Nicholas Tietz commented on SPARK-1153:
---------------------------------------

The decision we eventually made was to migrate as much of our code out of GraphX as we could (moving to writing more directly in Spark). We were running into other potential performance issues with GraphX and we could not do the kind of checkpointing we wanted to, so it was a workable solution for us.

In the end, we just dealt with the pain of managing consistent IDs ourselves and joining them in. It was not ideal, but it worked, and the performance hit was made up for in other areas where we were able to migrate off of GraphX.
[jira] [Commented] (SPARK-1153) Generalize VertexId in GraphX so that UUIDs can be used as vertex IDs.
[ https://issues.apache.org/jira/browse/SPARK-1153?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15651367#comment-15651367 ]

Guillem LEFAIT commented on SPARK-1153:
---------------------------------------

Hi Nicholas, we have the same need here, and we put off a fix until today, when we found that collisions had reached an arbitrary (low) level.

As JJ Zhang said, I'm not comfortable with a solution that produces a new ordering (and consequently new IDs) every day, but keeping a key/value dictionary seems costly given the amount of data we're dealing with.

Have you had a chance to experiment with the best way to solve this problem?
[jira] [Commented] (SPARK-1153) Generalize VertexId in GraphX so that UUIDs can be used as vertex IDs.
[ https://issues.apache.org/jira/browse/SPARK-1153?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15214951#comment-15214951 ]

Reynold Xin commented on SPARK-1153:
------------------------------------

The main thing is that we encode the data assuming integer ids, and we use specialized data structures for integer ids. If we change to generic types, the memory footprint will increase and performance will decrease too.
[jira] [Commented] (SPARK-1153) Generalize VertexId in GraphX so that UUIDs can be used as vertex IDs.
[ https://issues.apache.org/jira/browse/SPARK-1153?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15214948#comment-15214948 ]

Nicholas Tietz commented on SPARK-1153:
---------------------------------------

Thanks for the reply. I think that GraphFrames is not quite sufficient to meet our needs here, but I will dive in further. My focus this week is on addressing our problem with hash collisions in forming graph vertex ids, so you may hear more from me.

Could you say some more about where performance is likely to regress? I am diving into the source this week, but pointers toward specific things to watch out for would be helpful.
[jira] [Commented] (SPARK-1153) Generalize VertexId in GraphX so that UUIDs can be used as vertex IDs.
[ https://issues.apache.org/jira/browse/SPARK-1153?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15212670#comment-15212670 ]

Reynold Xin commented on SPARK-1153:
------------------------------------

[~ntietz] Changing this will very likely make performance regress for long ids, due to the lack of specialization.

You might want to look into GraphFrames for more general graph functionality too: https://github.com/graphframes/graphframes
[jira] [Commented] (SPARK-1153) Generalize VertexId in GraphX so that UUIDs can be used as vertex IDs.
[ https://issues.apache.org/jira/browse/SPARK-1153?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15211948#comment-15211948 ]

Nicholas Tietz commented on SPARK-1153:
---------------------------------------

We are also running into this issue and would like this feature. For our use case and our data size, even a low risk of hash collisions is not acceptable, so we have to have a reliable way to form unique ids from our current unique string ids.

I'm going to work on a patch next week. Since this is marked "won't fix" due to inactivity, what's the process if a PR is submitted? (Sorry, new to the Apache contribution process.)
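To put a rough number on the collision risk raised in this comment, here is a minimal sketch in plain Python (no Spark involved). It uses the standard birthday approximation and assumes the hash spreads ids uniformly over the target space, which real string ids and real hash functions may not satisfy. The function name `collision_probability` is hypothetical, not part of any library.

```python
import math


def collision_probability(n_ids: int, bits: int = 64) -> float:
    """Approximate chance that at least two of n_ids distinct keys
    hash to the same value in a space of 2**bits values.

    Birthday approximation: p ~= 1 - exp(-n(n-1) / (2 * 2**bits)).
    Assumes a uniformly distributed hash (an idealisation).
    """
    if n_ids < 2:
        return 0.0
    exponent = n_ids * (n_ids - 1) / (2 * 2 ** bits)
    # expm1 keeps precision when the exponent is very small.
    return -math.expm1(-exponent)


if __name__ == "__main__":
    for n in (10**6, 10**8, 10**9):
        print(n, collision_probability(n, bits=64))
```

Under that uniformity assumption the risk for a 64-bit id stays small until the vertex count approaches the billions, but it is never zero, which is the point being made here.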
[jira] [Commented] (SPARK-1153) Generalize VertexId in GraphX so that UUIDs can be used as vertex IDs.
[ https://issues.apache.org/jira/browse/SPARK-1153?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14706877#comment-14706877 ]

JJ Zhang commented on SPARK-1153:
---------------------------------

We would also really like a general, customizable ID type for vertices. We've been using zipWithIndex to create IDs for now. However, it is a hassle process-wise because we never have a stable ID: any update to a new version of the graph with incremental input data requires either a total rebuild of the vertices and edges, or another piece of infrastructure to serve as an ID service, with additional cost and maintenance.

We already have unique IDs for all of our data entities. It would make processing and maintenance much easier if our stable IDs could be used directly.
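The instability described in this comment can be reproduced without Spark. The sketch below is plain Python under stated assumptions: `positional_ids` only imitates position-based assignment (it sorts the keys to be deterministic, whereas the real zipWithIndex follows partition order), and `extend_mapping` is a hypothetical stand-in for the separate "ID service" mentioned above.

```python
def positional_ids(keys):
    """Position-based IDs: a key's ID is its index in the sorted input.

    Imitation only; sorting is an assumption made for determinism.
    """
    return {key: idx for idx, key in enumerate(sorted(set(keys)))}


def extend_mapping(mapping, new_keys):
    """Append-only alternative: existing keys keep their ID,
    unseen keys receive the next free IDs."""
    result = dict(mapping)
    next_id = max(result.values(), default=-1) + 1
    for key in sorted(set(new_keys)):
        if key not in result:
            result[key] = next_id
            next_id += 1
    return result


if __name__ == "__main__":
    day1 = positional_ids(["b", "d"])
    day2 = positional_ids(["a", "b", "d"])    # "b" and "d" shift
    stable = extend_mapping(day1, ["a", "b", "d"])
    print(day1, day2, stable)
```

The append-only mapping keeps old IDs valid, at the price of storing and maintaining the mapping between runs.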
[jira] [Commented] (SPARK-1153) Generalize VertexId in GraphX so that UUIDs can be used as vertex IDs.
[ https://issues.apache.org/jira/browse/SPARK-1153?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14505071#comment-14505071 ]

Carlos Balduz commented on SPARK-1153:
--------------------------------------

I am currently using zipWithUniqueId() to get a VertexId for my data, but that means that after getting the VertexIds, I have to go through the edge data, look up each of those strings, and assign the ID I got from the previous step.

I agree it would be nice to be able to choose a different type of ID, leaving users to decide whether they prefer performance or usability.
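The two-step workflow in this comment (assign IDs, then rewrite the edges) might look roughly like the following plain-Python sketch. `zip_with_unique_id` imitates the ID scheme documented for Spark's RDD.zipWithUniqueId, where items in the k-th of n partitions get k, n+k, 2n+k, and so on; the partitions here are ordinary lists and `remap_edges` is a hypothetical helper standing in for the join against the edge data.

```python
def zip_with_unique_id(partitions):
    """Assign IDs as k, n+k, 2n+k, ... to items of the k-th partition,
    where n is the number of partitions."""
    n = len(partitions)
    return [
        (item, i * n + k)
        for k, part in enumerate(partitions)
        for i, item in enumerate(part)
    ]


def remap_edges(edges, ids):
    """Replace string endpoints with numeric IDs.

    In Spark this step would be a join; here it is a dictionary lookup.
    """
    lookup = dict(ids)
    return [(lookup[src], lookup[dst]) for src, dst in edges]


if __name__ == "__main__":
    ids = zip_with_unique_id([["a", "b"], ["c"]])
    print(ids)
    print(remap_edges([("a", "c"), ("c", "b")], ids))
```

The IDs are unique but not contiguous, and they depend on how the data is partitioned, so they are no more stable across runs than index-based IDs.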
[jira] [Commented] (SPARK-1153) Generalize VertexId in GraphX so that UUIDs can be used as vertex IDs.
[ https://issues.apache.org/jira/browse/SPARK-1153?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14190766#comment-14190766 ]

Dan Osipov commented on SPARK-1153:
-----------------------------------

FWIW, UUID.getMostSignificantBits() or getLeastSignificantBits() can be used to generate a Long, with low collision probability. Being able to use any type for the ID would still be preferable.
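For readers who want to see what this workaround computes, here is a plain-Python imitation of the two java.util.UUID accessors named above. The helper names are hypothetical; the logic splits the 128-bit value into halves and reinterprets each as a signed 64-bit integer, which is how a Java long behaves.

```python
import uuid

MASK_64 = (1 << 64) - 1


def to_signed_64(value: int) -> int:
    """Reinterpret an unsigned 64-bit value as a two's-complement long."""
    value &= MASK_64
    return value - (1 << 64) if value >= (1 << 63) else value


def most_significant_bits(u: uuid.UUID) -> int:
    """Upper 64 bits of the UUID as a signed long."""
    return to_signed_64(u.int >> 64)


def least_significant_bits(u: uuid.UUID) -> int:
    """Lower 64 bits of the UUID as a signed long."""
    return to_signed_64(u.int & MASK_64)


if __name__ == "__main__":
    u = uuid.uuid4()
    print(u, most_significant_bits(u), least_significant_bits(u))
```

Either half discards 64 bits of the original ID, and for random (version 4) UUIDs a few bits in each half are fixed version or variant bits, so the effective space is slightly smaller than 2^64.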
[jira] [Commented] (SPARK-1153) Generalize VertexId in GraphX so that UUIDs can be used as vertex IDs.
[ https://issues.apache.org/jira/browse/SPARK-1153?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14085676#comment-14085676 ]

Larry Xiao commented on SPARK-1153:
-----------------------------------

I like npanj's approach. It's universal. You treat the UUID as an attribute, as in the procedure from http://spark.apache.org/docs/latest/graphx-programming-guide.html:

// Connect to the Spark cluster

== Build Graph (build VertexID if necessary)
// Load my user data and parse into tuples of user id and attribute list
// Parse the edge data which is already in userId -> userId format
// Attach the user attributes

== Clean Graph
// Some users may not have attributes so we set them as empty
// Restrict the graph to users with usernames and names

== Compute
// Compute the PageRank

== Get Result
// Get the attributes of the top pagerank users
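The outline above can be miniaturised in plain Python to show the round trip: numeric IDs are used for the computation, and the original UUID travels along as a vertex attribute so results can be reported in terms of it. This is only a sketch under assumptions: `build_graph` and `top_by_in_degree` are hypothetical helpers, dictionaries and lists stand in for GraphX vertex and edge collections, and in-degree is used as a deliberately simple substitute for PageRank.

```python
from collections import Counter


def build_graph(uuid_edges):
    """Build (vertices, edges) where vertices maps numeric ID -> UUID
    attribute and edges use numeric IDs only."""
    uuids = sorted({u for edge in uuid_edges for u in edge})
    to_long = {u: i for i, u in enumerate(uuids)}
    vertices = {i: u for u, i in to_long.items()}
    edges = [(to_long[src], to_long[dst]) for src, dst in uuid_edges]
    return vertices, edges


def top_by_in_degree(vertices, edges, k=1):
    """Compute on numeric IDs, then translate the winners back to UUIDs.

    In-degree is a stand-in for PageRank, chosen to keep the sketch short.
    """
    counts = Counter(dst for _, dst in edges)
    ranked = sorted(counts.items(), key=lambda kv: (-kv[1], kv[0]))
    return [(vertices[vid], count) for vid, count in ranked[:k]]


if __name__ == "__main__":
    vertices, edges = build_graph(
        [("u-b", "u-a"), ("u-c", "u-a"), ("u-a", "u-b")]
    )
    print(top_by_in_degree(vertices, edges, k=2))
```

The numeric IDs never leave the computation; callers only see the UUIDs they supplied.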
[jira] [Commented] (SPARK-1153) Generalize VertexId in GraphX so that UUIDs can be used as vertex IDs.
[ https://issues.apache.org/jira/browse/SPARK-1153?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14010846#comment-14010846 ]

npanj commented on SPARK-1153:
------------------------------

An alternative approach that I have been using:

1. Use a preprocessing step that maps each UUID to a Long.
2. Build the graph based on the Longs.

For the mapping in step 1:
- Rank your UUIDs.
- Some kind of hash function?

For step 1, GraphX could provide a tool to generate the map.

I would like to hear how others are building graphs out of non-Long node types.
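The two mapping options listed for step 1 trade off differently, which a small plain-Python sketch can make concrete. Both helper names are hypothetical. `rank_to_long` cannot collide but needs the full key set up front; `hash_to_long` needs no shared state and is stable across runs, but can collide. Truncated SHA-256 is just one possible choice of hash, picked here as an assumption.

```python
import hashlib


def rank_to_long(keys):
    """Rank-based mapping: sort the distinct keys and use the rank as ID.

    Collision-free, but every key must be known before IDs are assigned.
    """
    return {key: rank for rank, key in enumerate(sorted(set(keys)))}


def hash_to_long(key: str) -> int:
    """Hash-based mapping: first 8 bytes of SHA-256 as a signed 64-bit int.

    Stateless and repeatable, but two keys may map to the same ID.
    """
    digest = hashlib.sha256(key.encode("utf-8")).digest()
    return int.from_bytes(digest[:8], byteorder="big", signed=True)


if __name__ == "__main__":
    keys = ["id-3", "id-1", "id-2", "id-1"]
    print(rank_to_long(keys))
    print({k: hash_to_long(k) for k in set(keys)})
```

With the hash variant, a collision check over the actual key set is still worth running before the graph is built.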