Re: Streaming Hadoop using C

2012-03-01 Thread Mark question
Starfish worked great for wordcount, but I didn't run it on my application
because it has only map tasks.

Mark

On Thu, Mar 1, 2012 at 4:34 AM, Charles Earl wrote:

> How was your experience with Starfish?
> C


Re: Streaming Hadoop using C

2012-03-01 Thread Charles Earl
How was your experience with Starfish?
C
On Mar 1, 2012, at 12:35 AM, Mark question wrote:

> Thank you for your time and suggestions. I've already tried Starfish, but
> not jmap; I'll check it out.
> Thanks again,
> Mark
> 


Re: Streaming Hadoop using C

2012-02-29 Thread Mark question
Thank you for your time and suggestions. I've already tried Starfish, but not
jmap; I'll check it out.
Thanks again,
Mark

On Wed, Feb 29, 2012 at 1:17 PM, Charles Earl wrote:

> I assume you have also just tried running locally and using the JDK
> performance tools (e.g. jmap) to gain insight, with Hadoop configured to run
> the absolute minimum number of tasks?
> Perhaps the discussion
>
> http://grokbase.com/t/hadoop/common-user/11ahm67z47/how-do-i-connect-java-visual-vm-to-a-remote-task
> might be relevant?


Re: Streaming Hadoop using C

2012-02-29 Thread Charles Earl
I assume you have also just tried running locally and using the JDK performance
tools (e.g. jmap) to gain insight, with Hadoop configured to run the absolute
minimum number of tasks?
Perhaps the discussion
http://grokbase.com/t/hadoop/common-user/11ahm67z47/how-do-i-connect-java-visual-vm-to-a-remote-task
might be relevant?
On Feb 29, 2012, at 3:53 PM, Mark question wrote:

> I've used Hadoop profiling (.prof) to show the stack trace, but it was hard
> to follow. I've only used jConsole locally, since I couldn't find a way to
> set a port number for the child processes when running them remotely. Linux
> commands (top, /proc) showed me that virtual memory is almost twice my
> physical memory, which means swapping is happening, which is what I'm trying
> to avoid.
> 
> So basically, is there a way to assign a port to the child processes to
> monitor them remotely (asked before by Xun), or would you recommend another
> monitoring tool?
> 
> Thank you,
> Mark


Re: Streaming Hadoop using C

2012-02-29 Thread Charles Earl
The documentation on Starfish http://www.cs.duke.edu/starfish/index.html
looks promising, though I have not used it myself. I wonder if others on the
list have found it more useful than setting mapred.task.profile.
C
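For reference, this is roughly what turning on the built-in per-task profiler
looks like (a sketch only, assuming Hadoop 1.x property names and a driver
that uses ToolRunner so -D options are honored; the jar and class names are
hypothetical placeholders):

```shell
# Profile the first three map tasks with hprof (Hadoop 1.x property names).
# "myjob.jar" and "MyJob" stand in for the actual job jar and main class.
hadoop jar myjob.jar MyJob \
  -D mapred.task.profile=true \
  -D mapred.task.profile.maps=0-2 \
  -D "mapred.task.profile.params=-agentlib:hprof=cpu=samples,heap=sites,force=n,thread=y,verbose=n,file=%s" \
  input/ output/
```

The hprof output for the selected task attempts is typically retrieved to the
directory the job was launched from.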
On Feb 29, 2012, at 3:53 PM, Mark question wrote:

> I've used Hadoop profiling (.prof) to show the stack trace, but it was hard
> to follow. I've only used jConsole locally, since I couldn't find a way to
> set a port number for the child processes when running them remotely. Linux
> commands (top, /proc) showed me that virtual memory is almost twice my
> physical memory, which means swapping is happening, which is what I'm trying
> to avoid.
> 
> So basically, is there a way to assign a port to the child processes to
> monitor them remotely (asked before by Xun), or would you recommend another
> monitoring tool?
> 
> Thank you,
> Mark


Re: Streaming Hadoop using C

2012-02-29 Thread Mark question
I've used Hadoop profiling (.prof) to show the stack trace, but it was hard to
follow. I've only used jConsole locally, since I couldn't find a way to set a
port number for the child processes when running them remotely. Linux commands
(top, /proc) showed me that virtual memory is almost twice my physical memory,
which means swapping is happening, which is what I'm trying to avoid.

So basically, is there a way to assign a port to the child processes to monitor
them remotely (asked before by Xun), or would you recommend another monitoring
tool?

Thank you,
Mark
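One way a JMX port is commonly opened on the task JVMs (a sketch, not an
answer from this thread; mapred.child.java.opts is the Hadoop 1.x property,
and the jar/class names are hypothetical):

```shell
# Open a JMX port on each child task JVM so jConsole/VisualVM can attach.
# Caveat: a fixed port only works if at most one child JVM runs per node,
# otherwise the second task fails to bind the port.
hadoop jar myjob.jar MyJob \
  -D mapred.child.java.opts="-Dcom.sun.management.jmxremote \
-Dcom.sun.management.jmxremote.port=8010 \
-Dcom.sun.management.jmxremote.authenticate=false \
-Dcom.sun.management.jmxremote.ssl=false" \
  input/ output/
```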


On Wed, Feb 29, 2012 at 11:35 AM, Charles Earl wrote:

> Mark,
> So if I understand, it is more the memory management that you are
> interested in, rather than a need to run an existing C or C++ application
> on the MapReduce platform?
> Have you done profiling of the application?
> C


Re: Streaming Hadoop using C

2012-02-29 Thread Charles Earl
Mark,
So if I understand, it is more the memory management that you are interested 
in, rather than a need to run an existing C or C++ application on the MapReduce
platform?
Have you done profiling of the application?
C
On Feb 29, 2012, at 2:19 PM, Mark question wrote:

> Thanks Charles. I'm running Hadoop for research, to perform
> duplicate-detection methods. To go deeper, I need to understand what's
> slowing my program down, which usually starts with analyzing memory to
> predict the best input size for a map task. So you're saying piping can help
> me control memory, even though it's eventually running on a VM?
> 
> Thanks,
> Mark
> 


Re: Streaming Hadoop using C

2012-02-29 Thread Mark question
Thanks Charles. I'm running Hadoop for research, to perform duplicate-detection
methods. To go deeper, I need to understand what's slowing my program down,
which usually starts with analyzing memory to predict the best input size for a
map task. So you're saying piping can help me control memory, even though it's
eventually running on a VM?

Thanks,
Mark

On Wed, Feb 29, 2012 at 11:03 AM, Charles Earl wrote:

> Mark,
> Both streaming and pipes allow this, perhaps pipes more so, at the level of
> the MapReduce task. Can you provide more details on the application?


Re: Streaming Hadoop using C

2012-02-29 Thread Charles Earl
Mark,
Both streaming and pipes allow this, perhaps pipes more so, at the level of the
MapReduce task. Can you provide more details on the application?
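For concreteness, the streaming variant runs a native executable as the map
task; a map-only invocation might look like this (a sketch: the streaming jar
path varies by Hadoop version, and my_mapper is a hypothetical compiled C
binary):

```shell
# Map-only streaming job that runs a native C binary as the mapper.
# The jar path is illustrative; adjust it to the installed Hadoop version.
hadoop jar $HADOOP_HOME/contrib/streaming/hadoop-streaming-*.jar \
  -D mapred.reduce.tasks=0 \
  -input input/ \
  -output output/ \
  -mapper ./my_mapper \
  -file my_mapper
```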
On Feb 29, 2012, at 1:56 PM, Mark question wrote:

> Hi guys, I thought I should ask this before I use it: will using C over
> Hadoop give me the usual C memory management, for example malloc() and
> sizeof()? My guess is no, since all of this will eventually be turned into
> bytecode, but I need more control over memory, which is obviously hard for
> me to do in Java.
> 
> Let me know of any advantages you know of to streaming in C over Hadoop.
> Thank you,
> Mark



Streaming Hadoop using C

2012-02-29 Thread Mark question
Hi guys, I thought I should ask this before I use it: will using C over
Hadoop give me the usual C memory management, for example malloc() and
sizeof()? My guess is no, since all of this will eventually be turned into
bytecode, but I need more control over memory, which is obviously hard for me
to do in Java.

Let me know of any advantages you know of to streaming in C over Hadoop.
Thank you,
Mark