[jira] Commented: (HDFS-621) Exposing MiniDFS and MiniMR clusters as a single process command-line

2009-09-15 Thread Tsz Wo (Nicholas), SZE (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-621?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12755716#action_12755716
 ] 

Tsz Wo (Nicholas), SZE commented on HDFS-621:
-

I guess you need a separate MAPREDUCE patch for MiniMR.

> Exposing MiniDFS and MiniMR clusters as a single process command-line
> -
>
> Key: HDFS-621
> URL: https://issues.apache.org/jira/browse/HDFS-621
> Project: Hadoop HDFS
> Issue Type: New Feature
> Components: test, tools
> Reporter: Philip Zeyliger
> Priority: Minor
>
> It's hard to test non-Java programs that rely on significant mapreduce
> functionality.  The patch I'm proposing shortly will let you just type
> "bin/hadoop jar hadoop-hdfs-hdfswithmr-test.jar minicluster" to start a
> cluster (internally, it's using Mini{MR,HDFS}Cluster) with a specified number
> of daemons, etc.  A test that checks how some external process interacts with
> Hadoop might start minicluster as a subprocess, run through its thing, and
> then simply kill the java subprocess.
>
> I've been using just such a system for a couple of weeks, and I like it.
> It's significantly easier than developing a lot of scripts to start a
> pseudo-distributed cluster, and then clean up after it.  I figure others
> might find it useful as well.
>
> I'm at a bit of a loss as to where to put it in 0.21.  hdfs-with-mr tests
> have all the required libraries, so I've put it there.  I could conceivably
> split this into "minimr" and "minihdfs", but it's specifically the fact that
> they're configured to talk to each other that I like about having them
> together.  And one JVM is better than two for my test programs.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Commented: (HDFS-621) Exposing MiniDFS and MiniMR clusters as a single process command-line

2009-09-15 Thread Konstantin Boudnik (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-621?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12755718#action_12755718
 ] 

Konstantin Boudnik commented on HDFS-621:
-

bq. And one JVM is better than two for my test programs.

Is it better for testing the product? ;-)




[jira] Commented: (HDFS-621) Exposing MiniDFS and MiniMR clusters as a single process command-line

2009-09-15 Thread Owen O'Malley (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-621?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12755733#action_12755733
 ] 

Owen O'Malley commented on HDFS-621:


I would rather make the local runner work better than fake a mini cluster.
For testing the framework, the mini cluster makes sense. But I think that
for testing applications, the local runner is a much better fit.

> Exposing MiniDFS and MiniMR clusters as a single process command-line
> -
>
> Key: HDFS-621
> URL: https://issues.apache.org/jira/browse/HDFS-621
> Project: Hadoop HDFS
> Issue Type: New Feature
> Components: test, tools
> Reporter: Philip Zeyliger
> Priority: Minor
> Attachments: HDFS-621.patch



[jira] Commented: (HDFS-621) Exposing MiniDFS and MiniMR clusters as a single process command-line

2009-09-15 Thread Philip Zeyliger (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-621?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12755732#action_12755732
 ] 

Philip Zeyliger commented on HDFS-621:
--

bq. I guess you need a separate MAPREDUCE patch for MiniMR.

If you would, take a quick look to see how I use MiniMRCluster.  Do you feel 
I'm abusing the fact that hdfs-hdfswithmr-test exists?

bq. Is it better for the testing of the product?

It doesn't contribute anything to testing Hadoop.  It does make it easier to 
write tests for programs that are downstream of Hadoop.  One might argue that 
such facilities ought to be "top-level" and not shunted into a "-test" jar.  
I'm open to suggestions.  (I could also see this being shunted to contrib, if 
folks have a strong preference.)




[jira] Commented: (HDFS-621) Exposing MiniDFS and MiniMR clusters as a single process command-line

2009-09-15 Thread Philip Zeyliger (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-621?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12755735#action_12755735
 ] 

Philip Zeyliger commented on HDFS-621:
--

Owen,

I totally agree that LocalJobRunner should be maximally useful.  That's great 
for testing jobs.

Let's say I have a Python class that knows how to interact with HDFS and MR.  
It knows how to look at files, start jobs, etc.  I call out to hadoop binaries 
to interact with HDFS, and I want to capture all the details that occur when I 
talk to my real cluster.  For this, if I were in Java, I'd spin up a Mini* 
cluster.  Since I'm not in Java, I resort to spinning up a subprocess.  I could 
also mock everything out, but at the end of the day, I want an integration 
test, and I really don't want to run it against a cluster that has to be set up 
externally: I'd rather the cluster be spun up and shut down by my test itself.
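
A minimal sketch of the pattern described above (start the cluster as a subprocess, run the test, shut the subprocess down), written in Python. The helper name running_subprocess, the shutdown timeout, and the location of the jar are assumptions made for illustration and are not taken from the patch. The sketch also does not wait for the cluster to finish starting, since how readiness is signalled is not stated in this thread.

```python
import contextlib
import subprocess

# Command quoted in the issue description. The working directory and jar
# location are assumptions and depend on the local Hadoop build.
MINICLUSTER_CMD = [
    "bin/hadoop", "jar", "hadoop-hdfs-hdfswithmr-test.jar", "minicluster",
]


@contextlib.contextmanager
def running_subprocess(cmd, shutdown_timeout=30.0):
    """Start cmd as a child process and make sure it is gone on exit.

    Hypothetical helper: the caller's test body runs inside the with-block,
    and the child is stopped afterwards even if the test raises.
    """
    proc = subprocess.Popen(cmd)
    try:
        yield proc
    finally:
        if proc.poll() is None:        # child is still running
            proc.terminate()           # ask it to stop first
            try:
                proc.wait(timeout=shutdown_timeout)
            except subprocess.TimeoutExpired:
                proc.kill()            # force it if it ignores the request
                proc.wait()
```

A test would then wrap its body in "with running_subprocess(MINICLUSTER_CMD):" and talk to the cluster from inside the block. Using a context manager keeps the cleanup in one place, so a failing assertion does not leave a stray JVM behind.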

I'm happy to throw this into contrib/ if you feel strongly about it.  I figure it'd 
be useful to other folks.

-- Philip

> Exposing MiniDFS and MiniMR clusters as a single process command-line
> -
>
> Key: HDFS-621
> URL: https://issues.apache.org/jira/browse/HDFS-621
> Project: Hadoop HDFS
> Issue Type: New Feature
> Components: test, tools
> Reporter: Philip Zeyliger
> Assignee: Philip Zeyliger
> Priority: Minor
> Attachments: HDFS-621.patch



[jira] Commented: (HDFS-621) Exposing MiniDFS and MiniMR clusters as a single process command-line

2009-09-15 Thread Tsz Wo (Nicholas), SZE (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-621?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12755743#action_12755743
 ] 

Tsz Wo (Nicholas), SZE commented on HDFS-621:
-

> If you would, take a quick look to see how I use MiniMRCluster. Do you feel 
> I'm abusing the fact that hdfs-hdfswithmr-test exists?

No more MapReduce code in HDFS, please.  Having hdfs-with-mr in HDFS is a 
mistake.  It leads to a circular dependency.  Indeed, we should move 
hdfs-with-mr to MapReduce.

