Re: Giraph Partitioning

2015-03-24 Thread Ing. Alessio Arleo
Hello everybody.

Almost a month later, I am bumping this topic because there is still no clear 
answer about the fate of the PartitionContext class, introduced in Giraph-504 
and included in Giraph-1.0.0. It seems that this feature was not ported 
to the new version (1.1.0). Although I strongly believe that the new Giraph 
design fulfils the purpose of PartitionContext and so makes it unnecessary, I 
do not have any evidence to support that. 

Does anybody have a clue?

~~~

Ing. Alessio Arleo

PhD student in Industrial and Information Engineering

MSc in Computer and Automation Engineering
BSc in Computer and Electronic Engineering

Linkedin: it.linkedin.com/in/IngArleo 
Skype: Ing. Alessio Arleo

Tel: +39 075 5853920
Cell: +39 349 0575782

~~~



> On 25 Feb 2015, at 19:56, Arjun Sharma wrote:
> 
> Thanks Matthew for your replies! They are quite helpful. Regarding question 
> number 4, I see a commit of PartitionContext here by Maja 
> http://mail-archives.apache.org/mod_mbox/giraph-commits/201302.mbox/%3c20130209001122.ddad73a...@tyr.zones.apache.org%3E,
> but it seems to be removed from the current version?
> 
> 
> On Wed, Feb 25, 2015 at 3:30 AM, Matthew Saltz wrote:
> Hi,
> 
> 1) The partitions are processed in parallel based on the number of threads 
> you specify. The vertices within a partition are processed sequentially. You 
> may want to use more partitions than threads, so that if one partition takes 
> a particularly long time to be processed, the other threads can continue 
> processing the remaining partitions. If you have four machines with 12 
> threads each for example, with one worker per machine, the default number of 
> partitions will be 4^2 = 16 partitions, whereas you actually have 48 threads 
> available, so you'd probably want to specify the number of partitions 
> manually to a larger number to take advantage of parallelism. 
> 2) Yes 
> 3) If you are only doing single threading, there's no reason to do multiple 
> partitions per worker
> 3 (the second one)) I'm not familiar with the out-of-core functionality
> 4) I'm not sure
> 
> I'm basing this on the version of Giraph from this summer, not the most 
> recent release, but I don't think this part has changed. May want to verify 
> by looking at the code.  
> 
> Best,
> Matthew
> 
> On Wed, Feb 25, 2015 at 3:25 AM, Arjun Sharma wrote:
> Hi,
> 
> I understand that by default, the number of partitions = number of workers ^ 
> 2. So, if we have N workers, each worker will process N partitions. I have a 
> number of questions:
> 
> 1- By default, does Giraph process the N partitions within a single worker 
> sequentially? If yes, when setting the parameter giraph.numComputeThreads, 
> will partitions within each thread be computed sequentially?
> 
> 2- By default, does Giraph keep all partitions in memory?
> 
> 3- If the answers to 1 and 2 are yes and yes, is there any advantage from 
> using multiple partitions versus a single partition in the case of single 
> threading per worker?
> 
> 3- How do the out-of-core partitions affect out-of-core messages? Are they 
> completely independent? For example, if the number of partitions to be kept 
> in memory is set to a number less than N, and at the same time all messages 
> are set to be kept in memory, will ALL messages be kept in memory, even those 
> from out-of-core partitions? If the situation is reversed, where all 
> partitions are kept in memory, and out-of-core messaging is set, will 
> messages from memory-based partitions be saved on disk?
> 
> 4- Is there a class like a PartitionContext, where you can access 
> preSuperstep and postSuperstep *per partition*, along the lines of 
> WorkerContext?
> 
> 
> 



Re: Giraph Partitioning

2015-02-25 Thread Arjun Sharma
Thanks Matthew for your replies! They are quite helpful. Regarding question
number 4, I see a commit of PartitionContext here by Maja
http://mail-archives.apache.org/mod_mbox/giraph-commits/201302.mbox/%3c20130209001122.ddad73a...@tyr.zones.apache.org%3E,
but it seems to be removed from the current version?


On Wed, Feb 25, 2015 at 3:30 AM, Matthew Saltz wrote:

> Hi,
>
> 1) The partitions are processed in parallel based on the number of threads
> you specify. The vertices within a partition are processed sequentially.
> You may want to use more partitions than threads, so that if one partition
> takes a particularly long time to be processed, the other threads can
> continue processing the remaining partitions. If you have four machines
> with 12 threads each for example, with one worker per machine, the default
> number of partitions will be 4^2 = 16 partitions, whereas you actually have
> 48 threads available, so you'd probably want to specify the number of
> partitions manually to a larger number to take advantage of parallelism.
> 2) Yes
> 3) If you are only doing single threading, there's no reason to do
> multiple partitions per worker
> 3 (the second one)) I'm not familiar with the out-of-core functionality
> 4) I'm not sure
>
> I'm basing this on the version of Giraph from this summer, not the most
> recent release, but I don't think this part has changed. May want to verify
> by looking at the code.
>
> Best,
> Matthew
>
> On Wed, Feb 25, 2015 at 3:25 AM, Arjun Sharma wrote:
>
>> Hi,
>>
>> I understand that by default, the number of partitions = number of
>> workers ^ 2. So, if we have N workers, each worker will process N
>> partitions. I have a number of questions:
>>
>> 1- By default, does Giraph process the N partitions within a single
>> worker sequentially? If yes, when setting the parameter
>> giraph.numComputeThreads, will partitions within each thread be computed
>> sequentially?
>>
>> 2- By default, does Giraph keep all partitions in memory?
>>
>> 3- If the answers to 1 and 2 are yes and yes, is there any advantage from
>> using multiple partitions versus a single partition in the case of single
>> threading per worker?
>>
>> 3- How do the out-of-core partitions affect out-of-core messages? Are
>> they completely independent? For example, if the number of partitions to be
>> kept in memory is set to a number less than N, and at the same time all
>> messages are set to be kept in memory, will ALL messages be kept in memory,
>> even those from out-of-core partitions? If the situation is reversed, where
>> all partitions are kept in memory, and out-of-core messaging is set, will
>> messages from memory-based partitions be saved on disk?
>>
>> 4- Is there a class like a PartitionContext, where you can access
>> preSuperstep and postSuperstep *per partition*, along the lines of
>> WorkerContext?
>>
>>
>


Re: Giraph Partitioning

2015-02-25 Thread Matthew Saltz
Hi,

1) The partitions are processed in parallel based on the number of threads
you specify. The vertices within a partition are processed sequentially.
You may want to use more partitions than threads, so that if one partition
takes a particularly long time to be processed, the other threads can
continue processing the remaining partitions. If you have four machines
with 12 threads each for example, with one worker per machine, the default
number of partitions will be 4^2 = 16 partitions, whereas you actually have
48 threads available, so you'd probably want to specify the number of
partitions manually to a larger number to take advantage of parallelism.
2) Yes
3) If you are only doing single threading, there's no reason to do multiple
partitions per worker
3 (the second one)) I'm not familiar with the out-of-core functionality
4) I'm not sure

I'm basing this on the version of Giraph from this summer, not the most
recent release, but I don't think this part has changed. May want to verify
by looking at the code.
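
To make the arithmetic in point 1 concrete, here is a small standalone
sketch. It is plain Java, not Giraph code: the class and method names are
hypothetical, and the "two partitions per thread" rule of thumb is only an
assumption for illustration, not a Giraph default.

```java
// Illustrative arithmetic only; nothing here calls into Giraph.
final class PartitionSizing {

    /** Default described in this thread: partitions = workers ^ 2. */
    static int defaultPartitionCount(int workers) {
        return workers * workers;
    }

    /** Total compute threads, assuming every worker runs the same number. */
    static int totalComputeThreads(int workers, int threadsPerWorker) {
        return workers * threadsPerWorker;
    }

    /**
     * Hypothetical rule of thumb (an assumption, not a Giraph default):
     * give each compute thread several partitions so that a slow
     * partition does not leave the other threads idle.
     */
    static int suggestedPartitionCount(int workers, int threadsPerWorker,
                                       int partitionsPerThread) {
        return totalComputeThreads(workers, threadsPerWorker)
            * partitionsPerThread;
    }

    public static void main(String[] args) {
        int workers = 4;           // one worker per machine
        int threadsPerWorker = 12; // value given to giraph.numComputeThreads

        System.out.println("default partitions: "
            + defaultPartitionCount(workers));
        System.out.println("compute threads: "
            + totalComputeThreads(workers, threadsPerWorker));
        System.out.println("suggested partitions (2 per thread): "
            + suggestedPartitionCount(workers, threadsPerWorker, 2));
    }
}
```

With four workers and 12 threads each, the default of 16 partitions leaves
32 of the 48 threads with nothing to do. As far as I know the partition
count can be overridden with the giraph.userPartitionCount option (for
example via -ca giraph.userPartitionCount=96 on the GiraphRunner command
line), but please check the option name against GiraphConstants in your
version.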

Best,
Matthew

On Wed, Feb 25, 2015 at 3:25 AM, Arjun Sharma wrote:

> Hi,
>
> I understand that by default, the number of partitions = number of workers
> ^ 2. So, if we have N workers, each worker will process N partitions. I
> have a number of questions:
>
> 1- By default, does Giraph process the N partitions within a single worker
> sequentially? If yes, when setting the parameter giraph.numComputeThreads,
> will partitions within each thread be computed sequentially?
>
> 2- By default, does Giraph keep all partitions in memory?
>
> 3- If the answers to 1 and 2 are yes and yes, is there any advantage from
> using multiple partitions versus a single partition in the case of single
> threading per worker?
>
> 3- How do the out-of-core partitions affect out-of-core messages? Are
> they completely independent? For example, if the number of partitions to be
> kept in memory is set to a number less than N, and at the same time all
> messages are set to be kept in memory, will ALL messages be kept in memory,
> even those from out-of-core partitions? If the situation is reversed, where
> all partitions are kept in memory, and out-of-core messaging is set, will
> messages from memory-based partitions be saved on disk?
>
> 4- Is there a class like a PartitionContext, where you can access
> preSuperstep and postSuperstep *per partition*, along the lines of
> WorkerContext?
>
>