I assume you have also tried running locally and using the JDK performance
tools (e.g. jmap) to gain insight, after configuring Hadoop to run the absolute
minimum number of tasks?
Perhaps the discussion
http://grokbase.com/t/hadoop/common-user/11ahm67z47/how-do-i-connect-java-visual-vm-to-a-remote-task
might be relevant?
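For the remote-port question in the thread below, one common approach is to pass JMX flags to the child task JVMs through mapred.child.java.opts. A hedged sketch (property names are from the Hadoop 1.x era; myjob.jar and MyJob are placeholder names; note that a fixed port only works when at most one child JVM runs per node, since every child gets the same options):

```shell
# Sketch: expose JMX on child task JVMs (Hadoop 1.x-era property name;
# requires the job's driver to support generic -D options via Tool/ToolRunner).
hadoop jar myjob.jar MyJob \
  -D "mapred.child.java.opts=-Xmx512m \
      -Dcom.sun.management.jmxremote \
      -Dcom.sun.management.jmxremote.port=8004 \
      -Dcom.sun.management.jmxremote.authenticate=false \
      -Dcom.sun.management.jmxremote.ssl=false" \
  input output

# Then, from a workstation:
#   jconsole <tasktracker-host>:8004
# or, on the node itself, attach to the child pid directly:
#   jps            # child task JVMs show up as "Child" on 1.x
#   jmap -histo <child-pid>
```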
On Feb 29, 2012, at 3:53 PM, Mark question wrote:

> I've used Hadoop profiling (.prof) to show the stack trace, but it was hard
> to follow. I've only used jConsole locally, since I couldn't find a way to set
> a port number for child processes when running them remotely. Linux commands
> (top, /proc) showed me that virtual memory is almost twice my
> physical memory, which means swapping is happening, which is what I'm trying to
> avoid.
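(A sketch of the kind of /proc check described above; the pid selection is illustrative, and on a worker node you would substitute the child task JVM's pid, e.g. from jps:)

```shell
# Compare a process's virtual (VmSize) and resident (VmRSS) memory via /proc.
# pid=$$ (this shell) is just for illustration.
pid=$$
grep -E 'VmSize|VmRSS' /proc/$pid/status

# VmSize far above VmRSS on its own is not proof of paging; nonzero si/so
# columns in `vmstat 1` confirm actual swap-in/swap-out activity.
```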
> 
> So basically, is there a way to assign a port to child processes to monitor
> them remotely (asked before by Xun), or would you recommend another
> monitoring tool?
> 
> Thank you,
> Mark
> 
> 
> On Wed, Feb 29, 2012 at 11:35 AM, Charles Earl <charles.ce...@gmail.com> wrote:
> 
>> Mark,
>> So if I understand, it is more the memory management that you are
>> interested in, rather than a need to run an existing C or C++ application
>> on the MapReduce platform?
>> Have you done profiling of the application?
>> C
>> On Feb 29, 2012, at 2:19 PM, Mark question wrote:
>> 
>>> Thanks Charles .. I'm running Hadoop for research, to perform duplicate
>>> detection methods. To go deeper, I need to understand what's slowing my
>>> program, which usually starts with analyzing memory to predict the best input
>>> size for a map task. So you're saying Pipes can help me control memory even
>>> though it's eventually running on a VM?
>>> 
>>> Thanks,
>>> Mark
>>> 
>>> On Wed, Feb 29, 2012 at 11:03 AM, Charles Earl <charles.ce...@gmail.com>
>>> wrote:
>>> 
>>>> Mark,
>>>> Both streaming and Pipes allow this, perhaps more so Pipes, at the level of
>>>> the MapReduce task. Can you provide more details on the application?
>>>> On Feb 29, 2012, at 1:56 PM, Mark question wrote:
>>>> 
>>>>> Hi guys, I thought I should ask this before I use it ... will using C over
>>>>> Hadoop give me the usual C memory management? For example, malloc(),
>>>>> sizeof()? My guess is no, since this will all eventually be turned into
>>>>> bytecode, but I need more control over memory, which is obviously hard for
>>>>> me to do with Java.
>>>>> 
>>>>> Let me know of any advantages you know of for streaming in C over
>>>>> Hadoop.
>>>>> Thank you,
>>>>> Mark
>>>> 
>>>> 
>> 
>> 
