RE: Best way to know the assignment of vertices to workers

2014-11-28 Thread Pavan Kumar A
I looked at the code again, and it does not seem that workerList is sorted, so 
knowing a worker number gives no consistent way to identify the actual worker 
each time. Lukas was working on such a diff some time back; perhaps he can say 
more.

RE: Best way to know the assignment of vertices to workers

2014-11-28 Thread Pavan Kumar A
I wrote a diff some time ago that lets you do this easily.
You can find the implementation details at
https://issues.apache.org/jira/browse/GIRAPH-908 and
https://reviews.apache.org/r/22234/

Some options you can use are:

-Dgiraph.mappingStoreClass=org.apache.giraph.mapping.LongByteMappingStore
-Dgiraph.lbMappingStoreUpper=1987000
-Dgiraph.lbMappingStoreLower=4096
# Mapping store ops information
-Dgiraph.mappingStoreOpsClass=org.apache.giraph.mapping.DefaultEmbeddedLongByteOps
# Embed mapping information
-Dgiraph.edgeTranslationClass=org.apache.giraph.mapping.translate.LongByteTranslateEdge
# PartitionerFactory to be used
-Dgiraph.graphPartitionerFactoryClass=org.apache.giraph.partition.LongMappingStorePartitionerFactory

And, like vertex input and edge input, we now have a mapping input. I only
implemented all of this for giraph-hive, so if you have a Hive table with the
mapping vertexId -> workerNum, then you can pass the mapping input like
"org.apache.giraph.hive.input.mapping.examples.LongInt2ByteHiveToMapping, $mapping_table, $mapping_partition"

You can go through the code for each of these options to see what they do.
Using this you can, in effect, pre-assign vertex ids to workers: if you assign
two vertices to the same worker number, say worker-1, they are guaranteed to
end up on the same worker. The numbering (i.e. identification/naming) of
workers is consistent in that sense (if a and b are both assigned worker-x,
they are guaranteed to be on the same worker), but we do not know ahead of time
which actual worker that will be, and the user cannot set it explicitly (which
is what you want to do, from what I can tell).

If you are using something other than Hive, you will have to implement all the
interfaces of MappingInputFormat, and then you can easily achieve what you
want.
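
The co-location guarantee described above can be pictured with a small toy model. This is only a sketch under stated assumptions, not the GIRAPH-908 implementation: the class MappingSketch, its methods, and the hash fallback for unmapped ids are all hypothetical names and choices made for illustration.

```java
import java.util.HashMap;
import java.util.Map;

/** Toy model of a vertexId -> worker-number mapping. Not Giraph code. */
public class MappingSketch {
  // Hypothetical stand-in for the table loaded from a mapping input.
  private final Map<Long, Byte> mapping = new HashMap<>();
  private final int partitionCount;

  public MappingSketch(int partitionCount) {
    this.partitionCount = partitionCount;
  }

  /** Records that vertexId was pre-assigned the given worker number. */
  public void assign(long vertexId, byte workerNum) {
    mapping.put(vertexId, workerNum);
  }

  /** Vertices that share a mapped value always land in the same partition. */
  public int partitionFor(long vertexId) {
    Byte target = mapping.get(vertexId);
    if (target != null) {
      return Math.floorMod(target.intValue(), partitionCount);
    }
    // Assumed fallback for unmapped ids: plain hash partitioning.
    return Math.floorMod(Long.hashCode(vertexId), partitionCount);
  }

  public static void main(String[] args) {
    MappingSketch sketch = new MappingSketch(4);
    sketch.assign(10L, (byte) 1);
    sketch.assign(99L, (byte) 1);
    sketch.assign(42L, (byte) 2);
    // Same mapped value, same partition.
    System.out.println(sketch.partitionFor(10L) == sketch.partitionFor(99L));
    // Different mapped values, different partitions in this setup.
    System.out.println(sketch.partitionFor(10L) == sketch.partitionFor(42L));
  }
}
```

In this model the mapped value only says "these vertices go together". Which physical worker ends up owning a given partition is decided elsewhere, which matches the caveat that the actual worker is not known ahead of time.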

Re: Best way to know the assignment of vertices to workers

2014-11-28 Thread Matthew Saltz
Kiran,

To answer your question directly: in an AbstractComputation class (or
whatever descendant you're using), you may call
getWorkerContext().getMyWorkerIndex().
However, if each vertex has metadata associated with it, I think the best way
to go would be to define a custom VertexReader and a custom Vertex type that
take the metadata into account when reading the vertex.
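
As a rough illustration of the second suggestion, the sketch below parses one input line that already carries the metadata next to the vertex id, so whichever worker reads the vertex gets its metadata without needing to know the assignment. It is plain Java, not a Giraph VertexReader: the class VertexLineSketch, the ParsedVertex type, and the tab-separated line format are assumptions made for this example, and it presumes the metadata can be joined into the vertex input beforehand.

```java
import java.util.ArrayList;
import java.util.List;

/** Parses "id<TAB>metadata<TAB>n1,n2,..." lines. The format is assumed. */
public class VertexLineSketch {

  /** Hypothetical in-memory form of one input record. */
  public static final class ParsedVertex {
    public final long id;
    public final String metadata;
    public final List<Long> neighbors;

    ParsedVertex(long id, String metadata, List<Long> neighbors) {
      this.id = id;
      this.metadata = metadata;
      this.neighbors = neighbors;
    }
  }

  public static ParsedVertex parse(String line) {
    // Limit -1 keeps a trailing empty field (a vertex with no neighbors).
    String[] fields = line.split("\t", -1);
    if (fields.length != 3) {
      throw new IllegalArgumentException("expected 3 tab-separated fields: " + line);
    }
    long id = Long.parseLong(fields[0].trim());
    List<Long> neighbors = new ArrayList<>();
    for (String token : fields[2].split(",")) {
      String trimmed = token.trim();
      if (!trimmed.isEmpty()) {
        neighbors.add(Long.parseLong(trimmed));
      }
    }
    return new ParsedVertex(id, fields[1], neighbors);
  }

  public static void main(String[] args) {
    ParsedVertex v = parse("7\tcolor=red\t1,2,3");
    System.out.println(v.id + " " + v.metadata + " " + v.neighbors);
  }
}
```

A real reader would build the custom vertex value from the metadata field inside the input format instead of returning a standalone object.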

Best,
Matthew

On Fri, Nov 28, 2014 at 1:02 PM, Garimella Kiran wrote:

>  Hi all,
>
>  Is there a clean way to find out which worker a particular vertex is
> assigned to?
>
>  From what I tried out, I found that given n workers, each node is
> assigned to the worker with id (vertex_id % n  ). Is that a safe way to do
> this?
>
>  I’ve had a look at previous discussions, but most of them have no answer.
>
>  —
>
>  Why I need it:
>
>  In my application, each vertex needs to know some additional meta data,
> which is loaded from file. This metadata file is huge (>50 G) and so, on
> each worker, I only want to load the metadata corresponding to the vertices
> present on that worker.
>
>  —
>
>
>  Previous discussions:
> 1.
> http://mail-archives.apache.org/mod_mbox/giraph-user/201310.mbox/%3C7EC16F82718A6D4A920A99FE46CE7F4E2861F779%40MERCMBX19R.na.SAS.com%3E
> 2.
> http://mail-archives.apache.org/mod_mbox/giraph-user/201403.mbox/%3CCAMf08QYE%2BRgUv9otXT6oPJorTNjQ-Ay8p4NUiuhds8%2BzgDzs1w%40mail.gmail.com%3E
>
>
>
>  Regards,
> Kiran
>


Best way to know the assignment of vertices to workers

2014-11-28 Thread Garimella Kiran
Hi all,

Is there a clean way to find out which worker a particular vertex is assigned 
to?

From what I tried out, I found that given n workers, each node is assigned to 
the worker with id (vertex_id % n). Is that a safe way to do this?

I’ve had a look at previous discussions, but most of them have no answer.
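
One way to see why the modulo rule may not be safe is a two-level model: vertices are hashed to partitions, and partitions are assigned to workers separately. The sketch below assumes a hash-style rule (absolute value of the id's hash code modulo the partition count) and a hypothetical partition-to-worker table. ModuloSketch and both tables are illustrative assumptions, not Giraph's actual assignment.

```java
/** Toy two-level model: vertex -> partition -> worker. Not Giraph code. */
public class ModuloSketch {

  /** Assumed hash-style rule: partition = |hashCode(id) % partitionCount|. */
  public static int partitionOf(long vertexId, int partitionCount) {
    return Math.abs(Long.hashCode(vertexId) % partitionCount);
  }

  /** Looks up the owning worker in a hypothetical partition -> worker table. */
  public static int workerOf(long vertexId, int[] partitionToWorker) {
    return partitionToWorker[partitionOf(vertexId, partitionToWorker.length)];
  }

  public static void main(String[] args) {
    int workers = 2;
    long id = 2L;
    // Two equally valid ways to spread 4 partitions over 2 workers.
    int[] roundRobin = {0, 1, 0, 1};
    int[] blocks = {0, 0, 1, 1};
    System.out.println("id % n      = " + (id % workers));
    System.out.println("round-robin = " + workerOf(id, roundRobin));
    System.out.println("blocks      = " + workerOf(id, blocks));
  }
}
```

With the round-robin table the result happens to agree with vertex_id % n, while the block table places the same vertex on the other worker. In this model the modulo rule holds only when the partition-to-worker assignment happens to line up with it.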

—

Why I need it:

In my application, each vertex needs to know some additional metadata, which 
is loaded from a file. This metadata file is huge (>50 G), so on each worker 
I only want to load the metadata corresponding to the vertices present on that 
worker.

—


Previous discussions:
1. 
http://mail-archives.apache.org/mod_mbox/giraph-user/201310.mbox/%3C7EC16F82718A6D4A920A99FE46CE7F4E2861F779%40MERCMBX19R.na.SAS.com%3E
2. 
http://mail-archives.apache.org/mod_mbox/giraph-user/201403.mbox/%3CCAMf08QYE%2BRgUv9otXT6oPJorTNjQ-Ay8p4NUiuhds8%2BzgDzs1w%40mail.gmail.com%3E



Regards,
Kiran