[jira] [Commented] (GIRAPH-42) The MapReduce counter 'Sent Messages' doesn't work.

2011-09-27 Thread Severin Corsten (Commented) (JIRA)

[ 
https://issues.apache.org/jira/browse/GIRAPH-42?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13116196#comment-13116196
 ] 

Severin Corsten commented on GIRAPH-42:
---

I looked into this problem today for another reason, but I don't know how to 
fix it right:

1. Increment the counter in the sentMsg method in Vertex -> if messages are 
combined, they do not show up in the counter.
2. Increment the counter after applying the combiner, which is currently 
impossible because the RPC connection does not know the context.
3. Increment the counter in the map method when executing the next superstep -> 
delays the counting by one superstep.

At the moment I like 3 the most.
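Option 3 can be sketched with a toy model of the delayed-counting semantics. 
This is not actual Giraph code; the class and method names here are 
illustrative only:

```java
// Toy model of option 3: messages sent during superstep N are tallied into
// the counter only when the map method starts superstep N+1, so the reported
// value lags one superstep behind. Illustrative names, not the Giraph API.
public class SentMessageCounter {
    private long counter = 0;           // mirrors the MapReduce counter value
    private long pendingFromLastStep = 0;

    // Called at the top of the map method when a new superstep begins.
    public void startSuperstep() {
        counter += pendingFromLastStep; // count last superstep's messages now
        pendingFromLastStep = 0;
    }

    // Called from sentMsg(); the tally is independent of later combining.
    public void recordSent(long n) {
        pendingFromLastStep += n;
    }

    public long value() {
        return counter;
    }
}
```

The cost of this scheme is exactly the one-superstep lag described above.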

Greetings Severin

> The MapReduce counter 'Sent Messages' doesn't work.
> ---
>
> Key: GIRAPH-42
> URL: https://issues.apache.org/jira/browse/GIRAPH-42
> Project: Giraph
>  Issue Type: Bug
>  Components: bsp
>Reporter: Hyunsik Choi
>Priority: Minor
>
> The MapReduce counter 'Sent Messages' doesn't work. It always shows 0.
> {noformat}
> .
> .
> 11/09/28 10:51:22 INFO mapred.JobClient: Current workers=20
> 11/09/28 10:51:22 INFO mapred.JobClient: Current master task partition=0
> 11/09/28 10:51:22 INFO mapred.JobClient: Sent messages=0
> 11/09/28 10:51:22 INFO mapred.JobClient: Aggregate finished 
> vertices=60
> 11/09/28 10:51:22 INFO mapred.JobClient: Aggregate vertices=60
> .
> .
> {noformat}

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: 
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira




[jira] [Created] (GIRAPH-42) The MapReduce counter 'Sent Messages' doesn't work.

2011-09-27 Thread Hyunsik Choi (Created) (JIRA)
The MapReduce counter 'Sent Messages' doesn't work.
---

 Key: GIRAPH-42
 URL: https://issues.apache.org/jira/browse/GIRAPH-42
 Project: Giraph
  Issue Type: Bug
  Components: bsp
Reporter: Hyunsik Choi
Priority: Minor


The MapReduce counter 'Sent Messages' doesn't work. It always shows 0.

{noformat}
.
.
11/09/28 10:51:22 INFO mapred.JobClient: Current workers=20
11/09/28 10:51:22 INFO mapred.JobClient: Current master task partition=0
11/09/28 10:51:22 INFO mapred.JobClient: Sent messages=0
11/09/28 10:51:22 INFO mapred.JobClient: Aggregate finished vertices=60
11/09/28 10:51:22 INFO mapred.JobClient: Aggregate vertices=60
.
.
{noformat}





[jira] [Commented] (GIRAPH-12) Investigate communication improvements

2011-09-27 Thread Hyunsik Choi (Commented) (JIRA)

[ 
https://issues.apache.org/jira/browse/GIRAPH-12?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13116177#comment-13116177
 ] 

Hyunsik Choi commented on GIRAPH-12:


FYI, here is how I collected the memory log messages from Runtime.

{noformat}
grep "totalMem" hadoop/logs/userlogs/job_201109281028_0007/ -r > orig_1.log

cat orig_1.log \
  | awk '{print $6" "$7" "$8}' \
  | sed 's/totalMem=//g' \
  | sed 's/maxMem=//' \
  | sed 's/freeMem=//' \
  | awk '{total=total+$1; max=max+$2; free=free+$3} END {print total/NR, max/NR, free/NR}'
{noformat}

> Investigate communication improvements
> --
>
> Key: GIRAPH-12
> URL: https://issues.apache.org/jira/browse/GIRAPH-12
> Project: Giraph
>  Issue Type: Improvement
>  Components: bsp
>Reporter: Avery Ching
>Assignee: Hyunsik Choi
>Priority: Minor
> Attachments: GIRAPH-12_1.patch, GIRAPH-12_2.patch
>
>
> Currently every worker will start up a thread to communicate with every 
> other worker.  Hadoop RPC is used for communication.  For instance, if there 
> are 400 workers, each worker will create 400 threads.  This ends up using a 
> lot of memory, even with the option 
> -Dmapred.child.java.opts="-Xss64k".  
> It would be good to investigate using frameworks like Netty, or rolling our 
> own, to improve this situation.  By moving away from Hadoop RPC, we would 
> also make compatibility across different Hadoop versions easier.





[jira] [Commented] (GIRAPH-12) Investigate communication improvements

2011-09-27 Thread Hyunsik Choi (Commented) (JIRA)

[ 
https://issues.apache.org/jira/browse/GIRAPH-12?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13116166#comment-13116166
 ] 

Hyunsik Choi commented on GIRAPH-12:


Avery,
Thank you for your comments. I decided to use Runtime; it seems sufficient for 
investigating this issue.

Again, I ran a benchmark to measure memory consumption with 
RandomMessageBenchmark, as follows:

{noformat}
hadoop jar giraph-0.70-jar-with-dependencies.jar 
org.apache.giraph.benchmark.RandomMessageBenchmark -e 2 -s 3 -w 20 -b 4 -n 150 
-V ${V} -v -f ${f}
{noformat}
where the 'f' option indicates the number of threads in the thread pool. I 
also changed the thread executor to a FixedThreadPool.
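The fixed-size pool comes from the standard java.util.concurrent API 
(Executors.newFixedThreadPool). Here is a minimal sketch of capping the 
communication work at f threads instead of one thread per peer; this is not 
the actual Giraph code path, and FixedPoolDemo/runTasks are names I made up:

```java
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicInteger;

// Sketch: run nTasks peer-communication tasks on a pool of f threads,
// rather than spawning one long-lived thread per peer.
public class FixedPoolDemo {
    public static int runTasks(int f, int nTasks) {
        ExecutorService pool = Executors.newFixedThreadPool(f);
        AtomicInteger done = new AtomicInteger();
        for (int i = 0; i < nTasks; i++) {
            pool.submit(done::incrementAndGet); // stands in for an RPC call
        }
        pool.shutdown();
        try {
            pool.awaitTermination(10, TimeUnit.SECONDS);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
        return done.get();
    }
}
```

With f = 5 and 400 peers, at most 5 threads exist at any time, which is the 
stack-memory saving the benchmark above is probing.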

I ran each experiment twice and took the average. You can see the results at 
the link below:
http://goo.gl/arP62

The experiments were conducted on two cluster nodes, each with 24 cores and 
64 GB of memory, connected over 1 Gbps Ethernet. I measured the memory 
footprint from Runtime in GraphMapper, as Avery recommended.

In sum, the thread pool approach is better than the original approach in terms 
of processing time. I guess this is because the thread pool reduces the 
context-switching cost and narrows the synchronization area.

Unfortunately, however, the thread pool approach doesn't reduce memory 
consumption, which is the main focus of this issue. Rather, it needs slightly 
more memory, as shown in Figures 3 and 4. Note, however, the experiments with 
f = 5 and f = 20: there, the number of threads has only a small effect on 
memory consumption.

We still face the memory problem, so we may need to approach it from another 
angle. I think it is mainly caused by the current message flushing strategy.

In the current implementation, outgoing messages are transmitted to other 
peers in only two cases:
1) When the number of outgoing messages buffered for a specific peer exceeds a 
threshold (i.e., maxSize), the buffered messages for that peer are transmitted.
2) When a superstep finishes, all remaining messages are flushed to the other 
peers.

The full flush (case 2) is only triggered at the end of a superstep, so during 
processing, message flushing depends entirely on case 1. This may not be 
effective because case 1 only considers the number of messages buffered for 
each specific peer; it never takes the real memory occupation into account. If 
the destinations of outgoing messages are uniformly distributed, an 
out-of-memory error may occur before case 1 is ever triggered.

To overcome this problem, we may need a more eager message flushing strategy, 
or some way to spill overflow messages to disk.
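One possible eager strategy is to keep the per-peer threshold (case 1) but 
also flush the largest buffer whenever the total number of buffered messages 
crosses a global cap, so uniformly spread destinations cannot grow memory 
without bound. A minimal sketch under those assumptions; EagerFlusher and its 
fields are hypothetical, not Giraph's actual classes:

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Sketch: per-peer buffers with two flush triggers. Case 1 fires when one
// peer's buffer reaches maxSizePerPeer; the eager trigger fires when the
// total buffered message count reaches globalCap, flushing the largest
// buffer regardless of which peer owns it.
public class EagerFlusher {
    private final int maxSizePerPeer;
    private final int globalCap;
    private final Map<String, List<String>> buffers = new HashMap<>();
    private int totalBuffered = 0;
    public int flushes = 0;

    public EagerFlusher(int maxSizePerPeer, int globalCap) {
        this.maxSizePerPeer = maxSizePerPeer;
        this.globalCap = globalCap;
    }

    public void send(String peer, String msg) {
        buffers.computeIfAbsent(peer, p -> new ArrayList<>()).add(msg);
        totalBuffered++;
        if (buffers.get(peer).size() >= maxSizePerPeer) {
            flush(peer);              // case 1: per-peer threshold reached
        } else if (totalBuffered >= globalCap) {
            flush(largestPeer());     // eager: global memory pressure
        }
    }

    private String largestPeer() {
        String best = null;
        for (Map.Entry<String, List<String>> e : buffers.entrySet()) {
            if (best == null || e.getValue().size() > buffers.get(best).size()) {
                best = e.getKey();
            }
        }
        return best;
    }

    private void flush(String peer) {
        totalBuffered -= buffers.get(peer).size();
        buffers.get(peer).clear();    // stands in for the RPC transmission
        flushes++;
    }
}
```

With maxSize = 100 and a global cap of 10, ten messages to ten distinct peers 
trigger one flush even though no single peer ever reaches maxSize, which is 
exactly the uniform-destination situation case 1 misses.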

Let me know what you think.

> Investigate communication improvements
> --
>
> Key: GIRAPH-12
> URL: https://issues.apache.org/jira/browse/GIRAPH-12
> Project: Giraph
>  Issue Type: Improvement
>  Components: bsp
>Reporter: Avery Ching
>Assignee: Hyunsik Choi
>Priority: Minor
> Attachments: GIRAPH-12_1.patch, GIRAPH-12_2.patch
>
>
> Currently every worker will start up a thread to communicate with every 
> other worker.  Hadoop RPC is used for communication.  For instance, if there 
> are 400 workers, each worker will create 400 threads.  This ends up using a 
> lot of memory, even with the option 
> -Dmapred.child.java.opts="-Xss64k".  
> It would be good to investigate using frameworks like Netty, or rolling our 
> own, to improve this situation.  By moving away from Hadoop RPC, we would 
> also make compatibility across different Hadoop versions easier.
